The relationships between words in a sentence often tell us more about the underlying semantic content of a document than its individual words do. Recent publications in natural language processing, specifically those using word embeddings, try to incorporate semantic aspects into their word vector representations by considering the context of words and how they are distributed in a document collection. In this work, we propose two novel algorithms, called Flexible Lexical Chain II and Fixed Lexical Chain II, that combine the semantic relations derived from lexical chains, prior knowledge from lexical databases, and the robustness of the distributional hypothesis in word embeddings into a single decoupled system. In short, our approach has three main contributions: (i) unsupervised techniques that fully integrate word embeddings and lexical chains; (ii) a more solid semantic representation that considers the latent relation between words in a document; and (iii) lightweight word embedding models that can be extended to any natural language task. Knowledge-based systems that use natural language text can benefit from our approach to mitigate ambiguous semantic representations provided by traditional statistical approaches. The proposed techniques are tested against seven word embedding algorithms using five different machine learning classifiers over six scenarios in the document classification task. Our results show that the integration of lexical chains and word embedding representations sustains state-of-the-art results, even against more complex systems.
This data set is a collection of word similarity benchmarks (RG65, MEN3K, Wordsim 353, simlex999, SCWS, yp130, simverb3500) in their original format and converted into a cosine similarity scale.
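The conversion of benchmark ratings to a cosine similarity scale can be illustrated with a minimal sketch. It assumes a simple linear rescaling into [-1, 1]; the exact mapping used for these files is not stated here, and the function name, parameters, and sample ratings are hypothetical.

```python
def rescale_scores(scores, src_min, src_max, dst_min=-1.0, dst_max=1.0):
    """Linearly map ratings from [src_min, src_max] to [dst_min, dst_max].

    Assumption: a plain min-max style linear mapping; the target range
    defaults to the cosine similarity range [-1, 1].
    """
    if src_max <= src_min:
        raise ValueError("src_max must be greater than src_min")
    span = src_max - src_min
    return [dst_min + (s - src_min) * (dst_max - dst_min) / span for s in scores]


# Hypothetical human ratings on a 0-10 scale (not taken from any benchmark).
example = rescale_scores([0.0, 5.0, 10.0], src_min=0.0, src_max=10.0)
```

Each benchmark would need its own `src_min` and `src_max`, since the original rating scales differ between benchmarks.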
In addition, we provide two Wikipedia dumps from 2010 (April) and 2018 (January), both in their original format (raw words) and converted using the techniques described in the paper (MSSA, MSSA-D and MSSA-NR) (title in this repository), along with the word embedding models for 300d and 1000d built with a word2vec implementation. A readme.txt is provided with more details for each file.
This collection represents various raw data and analysis of cores extracted during the November 2008 mission of R/V Melville in the Santa Barbara Basin. The core included is the jumbo piston core MV0811-14JC. Core photos, physical properties and magnetic susceptibility from the multisensor track (MST), and the scanning X-ray fluorescence (XRF) data are included in the collection. Cruise DOI: 10.7284/903459
The research is funded by NSF OCE-1304327.
These data and scripts are meant to test and show seizure differentiation based on bifurcation theory. A zip file is included which contains real and simulated seizure waveforms, MATLAB scripts, and metadata. The MATLAB scripts allow for visual review validation and objective feature analysis. The file “README.txt” provides more detail about each individual file within the zip file. Data citation: Crisp, D.N., Saggio, M.L., Scott, J., Stacey, W.C., Nakatani, M., Gliske, S.F., Lin, J. (2019). Epidynamics: Navigating the map of seizure dynamics - Code & Data [Data set]. University of Michigan Deep Blue Data Repository. https://doi.org/10.7302/ejhy-5h41
This collection represents various raw data and analysis of cores extracted during the January 2009 mission of the research vessel Sproul in the Santa Barbara Basin. Cores included: box core SPR0901-04BC, box core SPR0901-unnamed, and Kasten core SPR0901-03KC. Core photos, physical properties and magnetic susceptibility from the multisensor track (MST), and the scanning X-ray fluorescence (XRF) data are included in the collection. Cruise DOI: 10.7284/901089
This research is funded by NSF-OCE 0752093.
This is the flora-fauna lexical material obtained in the course of more general lexical and grammatical fieldwork on languages of central-eastern Mali (Dogon, Songhay, Bangime, Bozo). The spreadsheets in this work, duplicated in xlsx and csv formats, present our flora-fauna lexicons as of early 2019 for many languages of central-eastern Mali, and certain languages of southwestern Burkina Faso. The Malian data is in two spreadsheets (flora, fauna), while the Burkina data is in separate spreadsheets for flora, birds, fish, insects, lizards and snakes, and mammals. Please begin with the “readme” document.
Our project, mainly on Dogon languages of Mali, has branched out to Burkina Faso with emphasis on documentation of the most endangered languages. Tiefo-N was studied on an emergency basis since it was down to two aging competent speakers. For additional comments and links to a reference grammar, see the readme file.
The work on the Bangime language, spoken by the Bangande people, was carried out as part of a larger linguistic fieldwork project focused on Dogon languages. Bangime is confirmed as a language isolate with no demonstrable linguistic relatives—possibly the only such isolate in West Africa.
Jalkunan is an endangered language of the Mande family, spoken in the village cluster of Blédougou in southwestern Burkina Faso. The lexical work complements a published grammar with texts. See the readme for further information.
The research adheres to PRISMA-HARM recommendations for systematic reviews. The reproducible search strategies for all databases, the citation export files from all databases, and the eligibility screening decisions are included in the dataset.
The search data supports a literature review project on lifestyle therapies for the management of atrial fibrillation. The data included in the dataset are the reproducible search strategies (in docx) and the exported results of all citations from all databases (txt and ris files). These searches and exported result files contain all citations originating from the database searches that were considered for inclusion.
Three sensitivity analyses were performed. First, a second matching step was performed in which two controls were selected for each case where possible, using a nearest-neighbor and caliper metric. Controls needed to have propensity scores within 0.1 of the case's score to be selected. Thirty-eight of the 39 cases had at least one control using this method, and for 36 cases two controls could be selected. The average difference between case and control propensity scores for adjuvant RT was 0.008 (range 0.00003-0.095).
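The nearest-neighbor caliper matching described above can be sketched as follows. This is an illustration only: it assumes greedy matching without replacement, processes cases in input order, and treats the 0.1 caliper as inclusive, none of which is specified in the description. All identifiers and scores are hypothetical.

```python
def caliper_match(case_scores, control_scores, caliper=0.1, max_controls=2):
    """Greedy nearest-neighbor matching on propensity scores.

    case_scores / control_scores: dict mapping an id to a propensity score.
    Returns a dict mapping each case id to a list of matched control ids
    (up to max_controls, possibly empty).
    Assumption: controls are used at most once (no replacement).
    """
    available = dict(control_scores)
    matches = {}
    for case_id, p_case in case_scores.items():
        # Keep only controls inside the caliper, nearest first.
        candidates = sorted(
            (abs(p_ctrl - p_case), ctrl_id)
            for ctrl_id, p_ctrl in available.items()
            if abs(p_ctrl - p_case) <= caliper
        )
        chosen = [ctrl_id for _, ctrl_id in candidates[:max_controls]]
        for ctrl_id in chosen:
            del available[ctrl_id]  # a matched control cannot be reused
        matches[case_id] = chosen
    return matches


# Hypothetical propensity scores, for illustration only.
cases = {"case_a": 0.30, "case_b": 0.80}
controls = {"ctrl_1": 0.31, "ctrl_2": 0.36, "ctrl_3": 0.55, "ctrl_4": 0.83}
matched = caliper_match(cases, controls)
```

In this toy input, one case receives two controls and the other only one, mirroring the situation in which not every case can be matched to two controls.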
A second sensitivity analysis was performed to guard against immortal time bias. To mitigate the possibility of this effect, cases known not to have undergone adjuvant RT were screened for suitable follow-up without a recurrence (local or regional recurrence, metastatic failure, and/or death) to ensure that, if adjuvant RT had been prescribed as part of the multi-modality treatment regimen, it would have been initiated. Three months was selected as the mandatory follow-up time. One-to-one matching was carried out and all 39 cases were matched to a control. A third sensitivity analysis was performed to account for stage migration seen in control patients who presented to the University of Michigan with more advanced disease. Patients who underwent adjuvant radiation were matched one-to-one with control group patients who did not receive adjuvant radiation and who had the same stage at diagnosis as at University of Michigan presentation.
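The three-month follow-up screen used in the second sensitivity analysis can be sketched as a simple filter. This is a minimal illustration, assuming follow-up and event times are recorded in months from a common start point; the field names and patient records are hypothetical and do not come from the study data.

```python
def passes_landmark(followup_months, event_month=None, landmark_months=3.0):
    """Return True if a patient has enough event-free follow-up.

    followup_months: total follow-up time for the patient.
    event_month: time of first recurrence, metastatic failure, or death,
                 or None if no such event was recorded.
    Assumption: an event exactly at the landmark still counts as passing.
    """
    if followup_months < landmark_months:
        return False
    return event_month is None or event_month >= landmark_months


# Hypothetical patients who did not receive adjuvant RT.
controls = [
    {"id": "ctrl_1", "followup_months": 12.0, "event_month": None},
    {"id": "ctrl_2", "followup_months": 2.0, "event_month": 2.0},
    {"id": "ctrl_3", "followup_months": 8.0, "event_month": 1.5},
    {"id": "ctrl_4", "followup_months": 6.0, "event_month": 5.0},
]
eligible = [
    c["id"]
    for c in controls
    if passes_landmark(c["followup_months"], c["event_month"])
]
```

Only patients who pass this screen would then be offered as candidates for one-to-one matching.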
Estimated phylogenetic relationships based on more than 18,000 loci in 93 individuals (full data) or 21 individuals (subset data) representing 19 described species and two putative undescribed species. Nine files are part of this dataset, including all input files to infer the phylogenetic reconstructions and the outputs obtained, in addition to a pruned tree used to infer the ancestral state reconstructions.
Short documentary videos of practical activities and cultural events of Dogon, Fulbe, Songhay, and Bangande ethnic groups of eastern Central Mali. The videos were byproducts of linguistic research on the local language. They are presented here in three formats: wmv, avi, and either qt or mov. See the "readme" files in each work for a summary of the videos in it. The footage was shot with various digital cameras. The oldest videos (2010 and one or two from 2011) were edited using iMovie. The later videos were edited using AVS editing software. Several of the 2010 videos, referred to as "compilations," are simple sequences of short clips that combine to illustrate a complex activity such as extracting oil from nuts. The later videos are in more flowing documentary form with overlaid titles in English. In some cases, vocabulary from the relevant native language is included in the titles.
This is a collection of photos of villages located primarily in Central Mali. These photos are primarily of Dogon villages, but there are village photos of other nearby ethnicities, including Bangande, Fulbe, Tuareg, Songhay, and Bozo. These photos were taken to document the villages Professor Jeffrey Heath worked in and the people he worked with while documenting languages throughout the region. For interactive geographical maps involving these villages see: http://dogonlanguages.org/geography.cfm.
This collection was produced as part of the project, “A ‘Big Data’ Approach to Understanding Neighborhood Effects in Chronic Illness Disparities.” The investigators for the project are Tiffany Veinot, Veronica Berrocal, Phillipa Clarke, Robert Goodspeed, Daniel Romero, and VG Vinod Vydiswaran from the University of Michigan. The study took place from 2015 to 2016, with funding from the University of Michigan’s Social Sciences Annual Institute, MCubed, and the Sloan and Moore Foundations.
Contact: Tiffany Veinot, MLS, PhD
Office: 3443 North Quad
This eportfolio was created for the Gateway course of the Sweetland Minor in Writing to provide an opportunity for students to reflect on their growing identities as writers, as captured in their text-based and multimodal compositions produced over the Gateway semester. The title of the work contains the pseudonym created for the study while the creator field lists the student's given name to allow proper attribution for their work. The eportfolio is collected here as an artifact in the Sweetland Writing Development Study, which has been published as Developing Writers in Higher Education: A Longitudinal Study (University of Michigan Press, 2019). To learn more about this study, please see the epublication https://doi.org/10.3998/mpub.10079890, and to learn more about the Minor in Writing program and the eportfolio prompts, please see Appendix 2a (https://doi.org/10.3998/mpub.10079890.cmp.1) to the publication.