Leveraging Longitudinal Data for Personalized Prediction and Word Representations

dc.contributor.author: Welch, Charles
dc.date.accessioned: 2021-06-08T23:10:18Z
dc.date.available: 2021-06-08T23:10:18Z
dc.date.issued: 2021
dc.date.submitted: 2021
dc.identifier.uri: https://hdl.handle.net/2027.42/167971
dc.description.abstract: This thesis focuses on personalization, word representations, and longitudinal dialog. We first look at users' expressions of individual preferences. In this targeted sentiment task, we find that we can improve entity extraction and sentiment classification using domain lexicons and linear term weighting. This task is important to personalization and dialog systems, as targets need to be identified in conversation and personal preferences affect how the system should react. We then examine individuals with large amounts of personal conversational data in order to better predict what people will say. We consider extra-linguistic features that can be used to predict behavior and the relationship between interlocutors. We show that these features improve over using message content alone, and that training on personal data leads to much better performance than training on a sample from all other users. We look not only at using personal data for these end tasks, but also at constructing personalized word representations. When we have a large amount of data for an individual, we create personalized word embeddings that improve performance on language modeling and authorship attribution. When we have limited data but know user demographics, we can instead construct demographic word embeddings. We show that these representations improve language modeling and word association performance. When we do not have demographic information, we show that, using a small amount of data from an individual, we can calculate similarity to existing users and interpolate or leverage data from those users to improve language modeling performance. Using these types of personalized word representations, we are able to provide insight into which words vary more across users and demographics. The kinds of personalized representations introduced in this work allow for applications such as predictive typing, style transfer, and dialog systems. Importantly, they also have the potential to enable more equitable language models, with improved performance for demographic groups that have little representation in the data.
dc.language.iso: en_US
dc.subject: longitudinal dialog, personalization, natural language processing
dc.title: Leveraging Longitudinal Data for Personalized Prediction and Word Representations
dc.type: Thesis
dc.description.thesisdegreename: PhD
dc.description.thesisdegreediscipline: Computer Science & Engineering
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Mihalcea, Rada
dc.contributor.committeemember: Jurgens, David
dc.contributor.committeemember: Chai, Joyce
dc.contributor.committeemember: Pavlick, Ellie
dc.subject.hlbsecondlevel: Computer Science
dc.subject.hlbtoplevel: Engineering
dc.subject.hlbtoplevel: Science
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/167971/1/cfwelch_1.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/1398
dc.identifier.orcid: 0000-0002-3489-2882
dc.identifier.name-orcid: Welch, Charles; 0000-0002-3489-2882
dc.working.doi: 10.7302/1398
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)
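
The abstract above notes that, given only a small amount of text from a new user, similarity to existing users can be computed and their data or models interpolated to improve language modeling. The sketch below is a minimal illustration of that general idea, not the thesis's actual implementation; the names (new_user_vec, user_vecs, user_lm_probs) and the softmax weighting are assumptions introduced here for illustration.

    import numpy as np

    def softmax(x):
        # Numerically stable softmax over a 1-D array.
        e = np.exp(x - np.max(x))
        return e / e.sum()

    def interpolated_next_word_probs(new_user_vec, user_vecs, user_lm_probs):
        # Cosine similarity between the new user's vector (built from
        # their small amount of data) and each existing user's vector.
        sims = np.array([
            np.dot(new_user_vec, v)
            / (np.linalg.norm(new_user_vec) * np.linalg.norm(v))
            for v in user_vecs
        ])
        weights = softmax(sims)  # similarities -> mixture weights
        # user_lm_probs: (n_users, vocab_size) per-user next-word
        # distributions; return their similarity-weighted mixture.
        return weights @ np.asarray(user_lm_probs)

A softmax over cosine similarities is just one plausible weighting scheme; the thesis may combine user models or data differently.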

