Demographic-Aware Natural Language Processing
Garimella, Aparna
2020
Abstract
The underlying traits of our demographic group affect and shape our thoughts, and therefore surface in the way we express ourselves and employ language in our day-to-day life. Understanding and analyzing language use in people from different demographic backgrounds helps uncover their demographic particularities. Conversely, leveraging these differences could lead to the development of better language representations, thus enabling further demographic-focused refinements in natural language processing (NLP) tasks. In this thesis, I employ methods rooted in computational linguistics to better understand various demographic groups through their language use. The thesis makes two main contributions. First, it provides empirical evidence that words are indeed used differently by different demographic groups in naturally occurring text. Through experiments conducted on large datasets that display usage scenarios for hundreds of frequent words, I show that automatic classification methods can be effective in distinguishing between word usages of different demographic groups. I compare the encoding ability of the utilized features by conducting feature analyses, and shed light on how various attributes contribute to highlighting the differences. Second, the thesis explores whether demographic differences in word usage can inform the development of more refined approaches to NLP tasks. Specifically, I start by investigating the task of word association prediction. The thesis shows that, going beyond the traditional "one-size-fits-all" approach, demographic-aware models achieve better performance than generic ones in predicting word associations for different demographic groups. Next, I investigate the impact of demographic information on part-of-speech tagging and syntactic parsing; the experiments reveal numerous part-of-speech tags and syntactic relations whose prediction benefits from the prevalence of a specific group in the training data.
Finally, I explore demographic-specific humor generation and develop a humor generation framework that fills in the blanks to generate funny stories while taking into account people's demographic backgrounds.
Subjects
Demographic-Aware Natural Language Processing
Identifying Demographic Differences in Word Usage
Demographic-Aware Word Associations
Gender-Bias in Part-of-Speech Tagging and Dependency Parsing
Demographic-Aware Humor Generation in Mad Libs
Personalization in Language
Types
Thesis