Index Catalog // Deep Blue Data

Enhanced word embeddings using multi-semantic representation through lexical chains

Creator:: Ruas, Terry, Ferreira, Charles H. P., Grosky, William, França, Fabrício O., and Medeiros, Débora M. R.
Description:: The relationships between words in a sentence often tell us more about the underlying semantic content of a document than its actual words do individually. Recent publications in the natural language processing arena, more specifically those using word embeddings, try to incorporate semantic aspects into their word vector representation by considering the context of words and how they are distributed in a document collection. In this work, we propose two novel algorithms, called Flexible Lexical Chain II and Fixed Lexical Chain II, that combine the semantic relations derived from lexical chains, prior knowledge from lexical databases, and the robustness of the distributional hypothesis in word embeddings into a single decoupled system. In short, our approach has three main contributions: (i) unsupervised techniques that fully integrate word embeddings and lexical chains; (ii) a more solid semantic representation that considers the latent relation between words in a document; and (iii) lightweight word embeddings models that can be extended to any natural language task. Knowledge-based systems that use natural language text can benefit from our approach to mitigate ambiguous semantic representations provided by traditional statistical approaches. The proposed techniques are tested against seven word embeddings algorithms using five different machine learning classifiers over six scenarios in the document classification task. Our results show that the integration between lexical chains and word embeddings representations sustains state-of-the-art results, even against more complex systems. GitHub: https://github.com/truas/LexicalChain_Builder
Keyword:: document classification, lexical chains, word embeddings, synset embeddings, chain2vec, and natural language processing
Citation to related publication:: Terry Ruas, Charles Henrique Porto Ferreira, William Grosky, Fabrício Olivetti de França, Débora Maria Rossi de Medeiros, "Enhanced word embeddings using multi-semantic representation through lexical chains", Information Sciences, 2020, https://doi.org/10.1016/j.ins.2020.04.048
Discipline:: Other, Science, and Engineering

Multi-Sense embeddings through a word sense disambiguation process

Creator:: Ruas, Terry, Grosky, William, and Aizawa, Akiko
Description:: This data set is a collection of word similarity benchmarks (RG65, MEN3K, Wordsim 353, simlex999, SCWS, yp130, simverb3500) in their original format and converted into a cosine similarity scale. In addition, we have two Wikipedia dumps from 2010 (April) and 2018 (January), for which we provide the original format (raw words), the versions converted using the techniques described in the paper (MSSA, MSSA-D and MSSA-NR) (title in this repository), and the word embeddings models for 300d and 1000d using a word2vec implementation. A readme.txt is provided with more details for each file.
Keyword:: multi-sense embeddings, MSSA, word2vec, wikipedia dump, synset, and natural language processing
Citation to related publication:: https://doi.org/10.1016/j.eswa.2019.06.026
Discipline:: Other
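
The description above mentions benchmark scores "converted into a cosine similarity scale" without stating the conversion; the deposit's readme.txt is the authority on that. As a minimal sketch only, the Python below shows the standard cosine similarity between two vectors and one plausible linear rescaling of a human rating onto [-1, 1]. The function names and the choice of a linear mapping are assumptions made for illustration, not the authors' documented procedure.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rescale(score, old_min, old_max, new_min=-1.0, new_max=1.0):
    """Linearly map a rating from [old_min, old_max] to [new_min, new_max].

    Assumption: a plain linear mapping; the dataset may use a different one.
    """
    fraction = (score - old_min) / (old_max - old_min)
    return new_min + fraction * (new_max - new_min)
```

With such a mapping, a benchmark rating and a model's cosine score for the same word pair live on the same range, although rank-based evaluation (for example Spearman correlation) does not depend on the rescaling.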

Marquette, Michigan Snowfall Instrument Site: Micro Rain Radar Dataset

Creator:: Richter, Jack and Pettersen, Claire
Description:: Radar observations supply detailed information about the structure and evolution of precipitation. These observations allow one to evaluate the macro- and/or micro-physical properties of precipitation at high spatial and temporal resolution. This dataset provides a nearly continuous collection of radar observations from a Metek Micro Rain Radar 2 (MRR) in Marquette, Michigan, USA (MQT). The MRR is a relatively low-cost, low-power K-band (24 GHz) profiling radar that observes the atmosphere at a fixed 90° elevation angle (i.e., pointing directly overhead). The MRR in MQT is configured such that observations are provided every minute at a vertical resolution of 100 m up to 3000 m AGL (note: due to ground clutter, the effective operating range is 400 m to 3000 m AGL). The MRR data are processed using IMProToo (Maahn and Kollias, 2012; https://doi.org/10.5194/amt-5-2661-2012) to increase the sensitivity of the radar to -10 dBZ and are “de-noised” using a principal component analysis method on the MRR raw power spectra to remove interference from a nearby broadcasting tower (Pettersen et al., 2020; https://doi.org/10.1175/JAMC-D-19-0099.1). Within this dataset, users will find observations such as the equivalent reflectivity factor, Doppler velocity, and reflectivity power spectra.
Keyword:: radar, snowfall, precipitation, microphysics, and in situ
Citation to related publication:: https://doi.org/10.1175/JAMC-D-19-0099.1, https://doi.org/10.1175/BAMS-D-19-0128.1, and https://doi.org/10.1029/2022JD037132
Discipline:: Other and Science
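
The vertical sampling described above (100 m resolution up to 3000 m AGL, with an effective range starting at 400 m) can be sketched as a simple range-gate calculation. This is an illustrative sketch, not code from the dataset: the constant and function names are hypothetical, and it assumes gates are labeled at multiples of the gate spacing starting at 100 m, which the record does not state.

```python
GATE_SPACING_M = 100   # vertical resolution stated in the record
MAX_HEIGHT_M = 3000    # top of the profile (AGL)
MIN_USABLE_M = 400     # lowest height not affected by ground clutter


def gate_heights(spacing=GATE_SPACING_M, max_height=MAX_HEIGHT_M):
    """Heights (m AGL) of all range gates, assuming labels at multiples of spacing."""
    return list(range(spacing, max_height + 1, spacing))


def usable_gates(heights, min_usable=MIN_USABLE_M):
    """Keep only gates at or above the clutter-free minimum height."""
    return [h for h in heights if h >= min_usable]


all_gates = gate_heights()
clear_gates = usable_gates(all_gates)
print(len(all_gates), len(clear_gates))  # 30 gates in total, 27 usable
```

Under these assumptions, one-minute sampling yields 1440 profiles per day, each with 27 clutter-free gates.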

A Comprehensive Northern Hemisphere Particle Microphysics Dataset from the Precipitation Imaging Package

Creator:: King, Fraser and Pettersen, Claire
Description:: Microphysical observations of precipitating particles are crucial for numerical weather prediction models and remote sensing retrieval algorithms. This dataset provides a unified, comprehensive collection of particle microphysical observations from the Precipitation Imaging Package (PIP) over the Northern Hemisphere. The data span 2014 to 2023 across 10 measurement sites and encompass over 775 thousand precipitating minutes. Within this dataset, users will find a range of microphysical attributes for rain and snow, along with higher-order products.
Keyword:: precipitation, imaging, package, PIP, snowfall, rainfall, disdrometer, particle, microphysics
Discipline:: Other

Classification of complex local environments in systems of particle shapes through shape-symmetry encoded data augmentation

Creator:: Lee, Shih Kuang, Tsai, Sun Ting, and Glotzer, Sharon C.
Description:: The trajectory data and code were generated for our work "Classification of complex local environments in systems of particle shapes through shape-symmetry encoded data augmentation" (currently under peer review). The data sets contain trajectory data in GSD file format for 7 test systems, including cubic structures, two-dimensional and three-dimensional patchy particle shape systems, hexagonal bipyramids with two aspect ratios, and truncated shapes with two degrees of truncation. In addition, the corresponding Python code and Jupyter notebook used to perform data augmentation, MLP classifier training, and MLP classifier testing are included.
Keyword:: Machine Learning, Colloids Self-Assembly, Crystallization, and Order Parameter
Citation to related publication:: https://doi.org/10.48550/arXiv.2312.11822
Discipline:: Other, Science, and Engineering

Human Rights and Large-Scale Sport Events: A Scoping Review [Literature Search Files]

Creator:: Sant, Stacy-Lynn, Maleske, Christine, and Vanderboll, Kathryn
Description:: This dataset includes the full list of journals searched in this review and the complete literature search strategies. No proprietary software is required to open any of these files.
Keyword:: Sport Management, Sport Events, Human Rights, and Scoping Review
Discipline:: Other

Research on Human Rights and Large-Scale Sport Events from 1990-2022: A Scoping Review [Literature Search and Citation Files]

Creator:: Sant, Stacy-Lynn, Maleske, Christine, and Vanderboll, Kathryn
Description:: This dataset includes the list of journals searched in this review and the complete literature search strategies, as well as a full citation list and journal analysis of all studies included in the review. No proprietary software is required to open any of these files.
Keyword:: Sport Management, Sport Events, Human Rights, and Scoping Review
Discipline:: Other

Data for: Global Service-Learning - A systematic review of principle and practice

Creator:: Hawes, Jason K, Johnson, Rebecca, Payne, Lindsey, Ley, Christian, Grady, Caitlin A., Domenech, Jennifer, Evich, Carly D., Kanach, Andrew, Koeppen, Allison, Roe, Kristen, Caprio, Audrey, Puente Castro, Jessica, LeMaster, Paige, and Blatchley, Ernest R. III
Description:: Global service-learning brings students, instructors, and communities together to support learning and community development across borders. In doing so, global service-learning practitioners act at the intersection of two fields: service-learning and international development. Critical scholarship in all three domains has highlighted the tensions inherent in defining and tracking “success” in community development. In response, service-learning and international development have turned considerable attention to documenting project characteristics, also known as best practices or success factors, which support equitable, sustainable community development. This database accompanies the article "Global Service-Learning - A systematic review of principle and practice," which presents a systematic synthesis of these fields’ best practices in the context of global service-learning. We propose 18 guiding principles for project design which aim to support practitioners in creating and maintaining justice-oriented, stakeholder-driven projects. This database contains the necessary reference material to trace the path of our analysis from abstract review to thematic synthesis. It also contains the final results of the thematic synthesis. To respect copyright restrictions, we have not made PDFs of all articles analyzed publicly accessible. Please contact the authors of this database or of the original article if you seek to access one of the articles we reference. For more information, see: Hawes, J. K., et al. “Global Service-Learning - A Systematic Review of Principle and Practice.” International Journal of Research on Service-Learning and Community Engagement 10, no. 1 (2022).
Keyword:: service-learning, international development, global service-learning, best practices, equitable development, higher education, community engagement, and student-friendly
Citation to related publication:: Hawes, J. K. (2021). Global Service-Learning—A systematic review of principle and practice. International Journal of Research on Service-Learning and Community Engagement, 10(1). https://doi.org/10.37333/001c.31383
Discipline:: International Studies and Other

Trimester-specific phthalate exposures in pregnancy are associated with circulating metabolites in children

Creator:: Goodrich, Jaclyn M., Tang, Lu, Rodríguez-Carmona, Yanelli, Meijer, J. L., Perng, Wei, Watkins, Deborah J., Meeker, John D., Mercado-García, Adriana, Cantoral, Alejandra, Song, Peter X., Téllez-Rojo, Martha M., and Peterson, Karen E.
Description:: Phthalates are chemicals found in many products that humans are exposed to. Prenatal exposure to phthalates has been associated with adverse outcomes that are detected in childhood, adolescence, and even adulthood. In this study, we sought to identify subtle biological changes in the metabolome of children that were exposed to phthalates during gestation. We hypothesized that prenatal phthalate exposures would alter metabolic pathways related to adiposity and cardiometabolic health. The article is under review (citation to be added when paper is published). The data included here encompass all exposure, demographic, and untargeted metabolomics data needed for the analysis described in the manuscript.
Keyword:: Phthalates, Prenatal, and Metabolomics
Citation to related publication:: Goodrich J.M., Tang L., Rodríguez-Carmona Y., Meijer J.L., Perng W., Watkins D.J., Meeker J.D., Mercado-García A., Cantoral A., Song P.X., Téllez-Rojo M.M., Peterson K.E. Trimester-specific phthalate exposures in pregnancy are associated with circulating metabolites in children. PLoS One. (Under revision; forthcoming.)
Discipline:: Other and Health Sciences

Itineraries of Tent Maps up to Orbit Length 34

Creator:: Buckley, Robert, O'Brien, Grace, and Zhou, Zoe
Description:: The purpose of the research is to better understand and approximate the Thurston Set. This project was computational in nature, and Python was used to collect our data. The data set contains encoded itineraries that can be used to compute values that are elements of the Thurston Set. A visual approximation of the Thurston Set can be found on the first page of Thurston’s own paper (https://arxiv.org/abs/1402.2008). The data can also be used to study the distribution of superattracting beta values within the interval (1, 2] and to explore an analogous Mandelbrot-Julia Correspondence. This research was conducted through the Lab of Geometry at Michigan under the advisement of Harrison Bray during the Fall semester of 2019. The Python 3.x scripts in this deposit are the exact versions used to create the *.txt files that are in the zip archive. As the project continues, any expansion to the work, such as further analysis or visualization scripts, will be posted to the project's GitHub: https://github.com/Tent-Maps-Team/Thurston-Set. A user can also reproduce our results and generate bigger datasets on machines with large amounts of memory. The data consist of zipped folders, one per tent map itinerary orbit length. These orbit files can be used to create visualizations and to create and explore conjectures, such as refining proposed bounds on the Thurston Set and supporting an analogous Mandelbrot-Julia Correspondence. Within these zipped folders are .txt files in CSV format, named xx_y, containing admissible itineraries up to the length indicated by the folder name, where xx is the length of the encoded itineraries included. Each .txt file has a single column, and each line (row) is an array representing an encoding of an itinerary. Some of the .txt files have been split into multiple parts (whenever there are more than 200 MB of itinerary data), and these parts are numbered using the y after the underscore.
As we exclude the degenerate tent map (where β = 1), we cannot have orbit length 1 or 2, which is why the orbits start with length 3 (i.e., start with 3.zip).
Keyword:: Math, mathematics, tent maps, thurston, milnor, Milnor-Thurston, superattracting, entropy, orbit, and itineraries
Citation to related publication:: Buckley R, O’Brien G, Zhou Z (2021). On Itineraries of Tent Maps. Forthcoming.
Discipline:: Other
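
To make the notion of a tent map itinerary in the record above concrete, here is a minimal Python sketch. It assumes the standard tent map on [0, 1] with critical point 1/2 and the common L/C/R symbol convention; the function names are hypothetical, and the encoding used in the deposited .txt files is not described here and may differ.

```python
def tent_map(x, beta):
    """Tent map on [0, 1]: slope +beta left of 1/2, slope -beta right of it."""
    return beta * x if x <= 0.5 else beta * (1.0 - x)


def itinerary(x0, beta, length):
    """Symbols (L, C, R) for the first `length` points of the orbit of x0.

    L: point is left of the critical point 1/2
    C: point is exactly the critical point
    R: point is right of the critical point
    """
    symbols = []
    x = x0
    for _ in range(length):
        if x < 0.5:
            symbols.append("L")
        elif x > 0.5:
            symbols.append("R")
        else:
            symbols.append("C")
        x = tent_map(x, beta)
    return "".join(symbols)


# Orbit of the critical point for beta = 2: 0.5 -> 1.0 -> 0.0 -> 0.0
print(itinerary(0.5, 2.0, 4))  # CRLL
```

A superattracting beta is one for which the critical point returns to itself; for example, with beta equal to the golden ratio the critical orbit has length 3, which matches the shortest orbit length in the dataset. Floating-point iteration will not land on 1/2 exactly in that case, so exact arithmetic or a tolerance would be needed to detect periodicity.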
