Show simple item record

Understanding Word Embedding Stability Across Languages and Applications

dc.contributor.author: Burdick, Laura
dc.date.accessioned: 2020-10-04T23:22:36Z
dc.date.available: NO_RESTRICTION
dc.date.available: 2020-10-04T23:22:36Z
dc.date.issued: 2020
dc.identifier.uri: https://hdl.handle.net/2027.42/162917
dc.description.abstract: Despite the recent popularity of word embedding methods, there is only a small body of work exploring the limitations of these representations. In this thesis, we consider several aspects of embedding spaces, including their stability. First, we propose a definition of stability, and show that common English word embeddings are surprisingly unstable. We explore how properties of data, words, and algorithms relate to instability. We extend this work to approximately 100 world languages, considering how linguistic typology relates to stability. Additionally, we consider contextualized output embedding spaces. Using paraphrases, we explore properties and assumptions of BERT, a popular embedding algorithm. Second, we consider how stability and other word embedding properties affect tasks where embeddings are commonly used. We consider both word embeddings used as features in downstream applications and corpus-centered applications, where embeddings are used to study characteristics of language and individual writers. In addition to stability, we also consider other word embedding properties, specifically batching and curriculum learning, and how methodological choices made for these properties affect downstream tasks. Finally, we consider how knowledge of stability affects how we use word embeddings. Throughout this thesis, we discuss strategies to mitigate instability and provide analyses highlighting the strengths and weaknesses of word embeddings in different scenarios and languages. We show areas where more work is needed to improve embeddings, and we show where embeddings are already a strong tool.
dc.language.iso: en_US
dc.subject: natural language processing
dc.subject: word embeddings
dc.subject: machine learning
dc.subject: stability
dc.subject: multilingual
dc.subject: word semantics
dc.title: Understanding Word Embedding Stability Across Languages and Applications
dc.type: Thesis
dc.description.thesisdegreename: PhD
dc.description.thesisdegreediscipline: Computer Science & Engineering
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Mihalcea, Rada
dc.contributor.committeemember: Jurgens, David
dc.contributor.committeemember: Chai, Joyce
dc.contributor.committeemember: Kummerfeld, Jonathan K.
dc.contributor.committeemember: Mimno, David
dc.subject.hlbsecondlevel: Computer Science
dc.subject.hlbtoplevel: Engineering
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/162917/1/lburdick_1.pdf
dc.identifier.orcid: 0000-0002-9953-4592
dc.identifier.name-orcid: (Wendlandt) Burdick, Laura; 0000-0002-9953-4592
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)
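The abstract above mentions proposing a definition of word embedding stability. The thesis itself is not reproduced here, so as a minimal illustrative sketch only, one common way such a definition is operationalized is as the percent overlap between a word's nearest neighbors in two embedding spaces (e.g. trained with different random seeds). The function names, toy vectors, and choice of k below are assumptions, not the thesis's actual code:

```python
# Hedged sketch: measure "stability" of a word as the overlap of its
# k nearest neighbors (by cosine similarity) across two embedding spaces.
# Toy 2-d vectors stand in for real trained embeddings.
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm


def nearest_neighbors(word, space, k):
    """The k words closest to `word` in the given embedding space."""
    others = [w for w in space if w != word]
    others.sort(key=lambda w: cosine(space[word], space[w]), reverse=True)
    return set(others[:k])


def stability(word, space_a, space_b, k=2):
    """Fraction of `word`'s k nearest neighbors shared by both spaces."""
    shared = nearest_neighbors(word, space_a, k) & nearest_neighbors(word, space_b, k)
    return len(shared) / k


# Two toy embedding spaces, e.g. from two training runs with different seeds.
space_a = {"cat": (1.0, 0.1), "dog": (0.9, 0.2), "car": (0.1, 1.0), "bus": (0.2, 0.9)}
space_b = {"cat": (0.1, 1.0), "dog": (0.2, 0.9), "car": (1.0, 0.1), "bus": (0.9, 0.2)}

print(stability("cat", space_a, space_b))  # 1.0: neighbors preserved across spaces
```

In this toy example the second space is simply a rotation of the first, so neighborhoods are fully preserved and stability is 1.0; the abstract's finding is that, for real trained embeddings, such overlap is often surprisingly low.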

