The relationships between words in a sentence often tell us more about the underlying semantic content of a document than the individual words themselves. Recent publications in natural language processing, more specifically those using word embeddings, try to incorporate semantic aspects into their word vector representations by considering the context of words and how they are distributed in a document collection. In this work, we propose two novel algorithms, called Flexible Lexical Chain II and Fixed Lexical Chain II, that combine the semantic relations derived from lexical chains, prior knowledge from lexical databases, and the robustness of the distributional hypothesis in word embeddings into a single decoupled system. In short, our approach has three main contributions: (i) unsupervised techniques that fully integrate word embeddings and lexical chains; (ii) a more solid semantic representation that considers the latent relations between words in a document; and (iii) lightweight word embedding models that can be extended to any natural language task. Knowledge-based systems that use natural language text can benefit from our approach to mitigate the ambiguous semantic representations produced by traditional statistical approaches. The proposed techniques are tested against seven word embedding algorithms using five different machine learning classifiers over six scenarios in the document classification task. Our results show that integrating lexical chains with word embedding representations sustains state-of-the-art results, even against more complex systems.
GitHub: https://github.com/truas/LexicalChain_Builder
Terry Ruas, Charles Henrique Porto Ferreira, William Grosky, Fabrício Olivetti de França, Débora Maria Rossi de Medeiros, "Enhanced word embeddings using multi-semantic representation through lexical chains", Information Sciences, 2020, https://doi.org/10.1016/j.ins.2020.04.048
This data set is a collection of word similarity benchmarks (RG65, MEN3K, Wordsim 353, simlex999, SCWS, yp130, simverb3500), provided both in their original format and converted to a cosine similarity scale.
In addition, we provide two Wikipedia dumps, from April 2010 and January 2018. For each dump, we provide the original text (raw words), versions converted using the techniques described in the paper (MSSA, MSSA-D, and MSSA-NR; see the paper title in this repository), and word embedding models with 300 and 1000 dimensions trained using a word2vec implementation. A readme.txt file gives more details on each file.
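Word similarity benchmarks like these are commonly used by comparing human ratings against the cosine similarity of word vectors and reporting Spearman's rank correlation. The sketch below illustrates that general workflow. It assumes a simple linear (min-max) mapping of each benchmark's rating range onto a target interval; the exact conversion used for this data set is not stated here, so the target interval is a parameter. The names `rescale`, `evaluate`, and the toy vectors are hypothetical and are not part of the released files.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rescale(score: float, old_min: float, old_max: float,
            new_min: float = 0.0, new_max: float = 1.0) -> float:
    """Linearly map a rating from [old_min, old_max] onto [new_min, new_max] (assumed conversion)."""
    return new_min + (score - old_min) * (new_max - new_min) / (old_max - old_min)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("constant input: Spearman correlation is undefined")
    return cov / (sx * sy)


def evaluate(pairs: Sequence[Tuple[str, str, float]],
             vectors: Dict[str, Sequence[float]]) -> Tuple[float, int]:
    """Correlate gold scores with cosine similarities, skipping out-of-vocabulary pairs."""
    gold, predicted = [], []
    for w1, w2, score in pairs:
        if w1 in vectors and w2 in vectors:
            gold.append(score)
            predicted.append(cosine(vectors[w1], vectors[w2]))
    return spearman(gold, predicted), len(gold)


# Hypothetical toy data, not taken from the benchmarks or models in this data set.
toy_vectors = {"cat": [1.0, 0.9], "dog": [0.9, 1.0], "car": [0.0, 1.0]}
toy_pairs = [("cat", "dog", 9.0), ("cat", "car", 2.0),
             ("dog", "car", 1.0), ("cat", "zebra", 5.0)]
rho, used = evaluate(toy_pairs, toy_vectors)
print(round(rho, 6), used)  # rho is about 0.5; 3 of the 4 pairs are in vocabulary
```

Because Spearman's rho depends only on ranks, any monotonic rescaling of the gold scores leaves it unchanged; the rescaling matters only when raw cosine values are compared with the gold scores directly.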
This merged Global Precipitation Measurement (GPM) Core Observatory and atmospheric river dataset contains gridded Goddard Profiling (GPROF) algorithm v7 precipitation rates (Kummerow et al. 2015; Randel et al. 2020), Remote Sensing Systems (RSS) atmospheric water vapor (Meissner et al. 2012), and Mattingly et al. (2018) atmospheric rivers in the North Atlantic and North Pacific oceans. The GPROF precipitation rates and RSS atmospheric water vapor are both derived using the GPM Microwave Imager (GMI) brightness temperature observations. The atmospheric river data is derived from MERRA-2 (Modern-Era Retrospective analysis for Research and Applications Reanalysis, Version 2) integrated water vapor transport (Mattingly et al. 2018).
The data coverage starts at the beginning of the GPM data record (GPM launched in February 2014, and the processed data coverage starts in May 2014). Subsequent years will be added throughout the lifetime of the project.
The monthly files are grouped by year and basin, either the North Atlantic (NA) or the North Pacific (NP), and zipped (e.g., NA_2014). Each file name indicates the basin, year, and month (e.g., gridded_atlantic_201405.nc). The files are in NetCDF format (https://www.unidata.ucar.edu/software/netcdf/) and conform to all standard NetCDF metadata conventions (http://cfconventions.org/cf-conventions/cf-conventions.html).
Kummerow, C. D., Randel, D. L., Kulie, M., Wang, N. Y., Ferraro, R., Joseph Munchak, S., & Petkovic, V. (2015). The evolution of the Goddard profiling algorithm to a fully parametric scheme. Journal of Atmospheric and Oceanic Technology, 32(12), 2265-2280. https://doi.org/10.1175/JTECH-D-15-0039.1
Mattingly, K. S., Mote, T. L., & Fettweis, X. (2018). Atmospheric river impacts on Greenland Ice Sheet surface mass balance. Journal of Geophysical Research: Atmospheres, 123(16), 8538-8560. https://doi.org/10.1029/2018JD028714
Meissner, T., Wentz, F. J., & Draper, D. (2012). GMI calibration algorithm and analysis theoretical basis document (Report No. 041912). Remote Sensing Systems, Santa Rosa, CA, 124 pp.
Randel, D. L., Kummerow, C. D., & Ringerud, S. (2020). The Goddard Profiling (GPROF) precipitation retrieval algorithm. Satellite Precipitation Measurement: Volume 1, 141-152. https://doi.org/10.1007/978-3-030-24568-9_8
Research data supporting 'Flexible Synthesis Scheme and Application of AuNP Surface-Conjugatable Meta-Iodobenzylguanidine Derivatives for Enhanced Cellular Internalization' (DOI: 10.1021/acsmaterialslett.3c00781). AuNP = Au nanoparticle.
In this article (DOI: 10.1021/acsmaterialslett.3c00781), we report the synthesis and application of two meta-iodobenzylguanidine (MIBG) derivatives decorated onto the surface of AuNPs at high molar ratios; greatly enhanced cellular uptake is observed across neuroblastoma (NB), HeLa, and HEK cell lines.
This dataset consists of NMR and mass spectrometry data from the synthesis of the small molecules and their intermediates, dark-field microscopy data, inductively coupled plasma mass spectrometry (ICP-MS) data, and raw data characterizing the AuNPs (TEM, DLS, and zeta potential).
Natalie S. Potter, Alan McLean, Evan C. Bornowski, Thomas Hopkins, Jingyi Luo, John P. Wolfe, Wei Qian, and Raoul Kopelman, "Flexible Synthesis Scheme and Application of AuNP Surface-Conjugatable Metaiodobenzylguanidine Derivatives for Enhanced Cellular Internalization", ACS Materials Letters, Article ASAP, DOI: 10.1021/acsmaterialslett.3c00781
The data consist largely of UV-Vis spectra, both raw and analyzed, that were used to calibrate the relevant sensor. A more detailed description of each file's contents can be found in the ReadMe Word document.
Images of plants of the family Molluginaceae, photographed in nature or as specimens. The common species in Central Mali are Mollugo nudicaulis (from which a spontaneous "soap" can be made) and Glinus lotoides. See also Aizoaceae (Trianthema, Zaleya), Gisekiaceae (Gisekia), and Limeaceae (Limeum). These families have been combined in various ways in previous classifications.
Bacteria live in a broad range of environmental temperatures that require adaptations of their RNA sequences to maintain function. Riboswitches are regulatory RNAs that change conformation upon binding typical metabolite ligands to control bacterial gene expression. The paradigmatic small class-I preQ1 riboswitches from the mesophile Bacillus subtilis (Bsu) and the thermophile Thermoanaerobacter tengcongensis (Tte) adopt similar pseudoknot structures when bound to preQ1. Here, we use single-molecule-detected chemical denaturation by urea to compare the thermodynamic and kinetic folding properties of the two riboswitches, as well as the urea-countering effects of trimethylamine N-oxide (TMAO). This dataset includes the experimental findings and associated analyses detailed in the research article "Single-molecule FRET observes opposing effects of urea and TMAO on structurally similar meso- and thermophilic riboswitch RNAs". It consists of multiple zip files, each representing an experiment that corresponds to a key result in the publication. Each experiment includes movies, qualifying smFRET trajectories, and analysis files for the various conditions within that experimental group.
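For readers working with the smFRET trajectories, the sketch below shows two standard calculations that such analyses often rely on. The first is the frame-by-frame FRET efficiency E = I_A / (I_A + γ·I_D) from donor and acceptor intensities (γ = 1 gives the uncorrected "apparent" efficiency). The second is the two-state linear extrapolation model for chemical denaturation, ΔG(c) = ΔG₀ − m·c. This is a generic illustration under stated assumptions. The trajectory file format inside the zip archives is not described here, so the input lists are hypothetical. The 0.5 threshold for a "folded" high-FRET state is also a hypothetical choice, and the description does not say whether the publication uses the linear extrapolation model.

```python
import math
from typing import List, Sequence

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def apparent_fret(donor: Sequence[float], acceptor: Sequence[float],
                  gamma: float = 1.0) -> List[float]:
    """Frame-by-frame FRET efficiency E = I_A / (I_A + gamma * I_D).

    gamma = 1.0 gives the uncorrected apparent efficiency; frames with no
    signal are returned as NaN.
    """
    efficiencies = []
    for i_d, i_a in zip(donor, acceptor):
        total = i_a + gamma * i_d
        efficiencies.append(i_a / total if total > 0 else math.nan)
    return efficiencies


def fraction_high_fret(efficiencies: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of valid frames above a (hypothetical) folded-state threshold."""
    valid = [e for e in efficiencies if not math.isnan(e)]
    if not valid:
        raise ValueError("no frames with signal")
    return sum(e > threshold for e in valid) / len(valid)


def fraction_folded(urea_molar: float, dg0_kcal: float, m_kcal_per_molar: float,
                    temp_k: float = 298.15) -> float:
    """Two-state folded fraction under the linear extrapolation model.

    Folding stability dG(c) = dG0 - m * c; K = [F]/[U] = exp(dG / RT);
    fraction folded = K / (1 + K). The midpoint is at c = dG0 / m.
    """
    dg = dg0_kcal - m_kcal_per_molar * urea_molar
    return 1.0 / (1.0 + math.exp(-dg / (R_KCAL * temp_k)))


# Hypothetical intensities and parameters, not values from this dataset.
trace = apparent_fret(donor=[100.0, 50.0, 20.0, 0.0], acceptor=[0.0, 50.0, 80.0, 0.0])
print(trace)                      # [0.0, 0.5, 0.8, nan]
print(fraction_high_fret(trace))  # 1 of 3 valid frames exceeds 0.5
print(fraction_folded(4.0, dg0_kcal=2.0, m_kcal_per_molar=0.5))  # 0.5 at the midpoint
```

Under this model, fitting the folded fraction measured at several urea concentrations yields ΔG₀ and m. A stabilizing osmolyte such as TMAO would appear as a shift of the midpoint to higher urea concentration.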