Leveraging Longitudinal Data for Personalized Prediction and Word Representations

dc.contributor.author: Welch, Charles
dc.date.accessioned: 2021-06-08T23:10:18Z
dc.date.available: 2021-06-08T23:10:18Z
dc.date.issued: 2021
dc.date.submitted: 2021
dc.identifier.uri: https://hdl.handle.net/2027.42/167971
dc.description.abstract: This thesis focuses on personalization, word representations, and longitudinal dialog. We first look at users' expressions of individual preferences. In this targeted sentiment task, we find that we can improve entity extraction and sentiment classification using domain lexicons and linear term weighting. This task is important to personalization and dialog systems, as targets need to be identified in conversation and personal preferences affect how the system should react. We then examine individuals with large amounts of personal conversational data in order to better predict what people will say. We consider extra-linguistic features that can be used to predict behavior and the relationship between interlocutors. We show that these features improve over using message content alone, and that training on personal data leads to much better performance than training on a sample from all other users. We look not only at using personal data for these end tasks, but also at constructing personalized word representations. When we have a large amount of data for an individual, we create personalized word embeddings that improve performance on language modeling and authorship attribution. When we have limited data but know user demographics, we can instead construct demographic word embeddings. We show that these representations improve language modeling and word association performance. When we do not have demographic information, we show that, using a small amount of data from an individual, we can calculate similarity to existing users and interpolate or leverage data from those users to improve language modeling performance. Using these types of personalized word representations, we are able to provide insight into which words vary more across users and demographics. The kinds of personalized representations introduced in this work allow for applications such as predictive typing, style transfer, and dialog systems. Importantly, they also have the potential to enable more equitable language models, with improved performance for demographic groups that have little representation in the data.
dc.language.iso: en_US
dc.subject: longitudinal dialog, personalization, natural language processing
dc.title: Leveraging Longitudinal Data for Personalized Prediction and Word Representations
dc.type: Thesis
dc.description.thesisdegreename: PhD
dc.description.thesisdegreediscipline: Computer Science & Engineering
dc.description.thesisdegreegrantor: University of Michigan, Horace H. Rackham School of Graduate Studies
dc.contributor.committeemember: Mihalcea, Rada
dc.contributor.committeemember: Jurgens, David
dc.contributor.committeemember: Chai, Joyce
dc.contributor.committeemember: Pavlick, Ellie
dc.subject.hlbsecondlevel: Computer Science
dc.subject.hlbtoplevel: Engineering
dc.subject.hlbtoplevel: Science
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/167971/1/cfwelch_1.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/1398
dc.identifier.orcid: 0000-0002-3489-2882
dc.identifier.name-orcid: Welch, Charles; 0000-0002-3489-2882
dc.working.doi: 10.7302/1398
dc.owningcollname: Dissertations and Theses (Ph.D. and Master's)
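
The abstract above notes that, given only a small amount of text from a new user, similarity to existing users can be computed and their data or models interpolated to improve language modeling. The sketch below is a minimal illustration of that general idea, not the thesis's actual implementation; the names (new_user_vec, user_vecs, user_lm_probs) and the softmax weighting are assumptions introduced here for illustration.

    import numpy as np

    def softmax(x):
        # Numerically stable softmax over a 1-D array.
        e = np.exp(x - np.max(x))
        return e / e.sum()

    def interpolated_next_word_probs(new_user_vec, user_vecs, user_lm_probs):
        # Cosine similarity between the new user's vector (built from
        # their small amount of data) and each existing user's vector.
        sims = np.array([
            np.dot(new_user_vec, v)
            / (np.linalg.norm(new_user_vec) * np.linalg.norm(v))
            for v in user_vecs
        ])
        weights = softmax(sims)  # similarities -> mixture weights
        # user_lm_probs: (n_users, vocab_size) per-user next-word
        # distributions; return their similarity-weighted mixture.
        return weights @ np.asarray(user_lm_probs)

A softmax over cosine similarities is just one plausible weighting scheme; the thesis may combine user models or data differently.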

