Demographic-Aware Natural Language Processing

Garimella, Aparna

Demographic-Aware Natural Language Processing

Garimella, Aparna

2020

View/Open

gaparna_1.pdf

(1.9MB

PDF)

Abstract

The underlying traits of our demographic group affect and shape our thoughts, and therefore surface in the way we express ourselves and employ language in our day-to-day life. Understanding and analyzing language use in people from different demographic backgrounds help uncover their demographic particularities. Conversely, leveraging these differences could lead to the development of better language representations, thus enabling further demographic-focused refinements in natural language processing (NLP) tasks. In this thesis, I employ methods rooted in computational linguistics to better understand various demographic groups through their language use. The thesis makes two main contributions. First, it provides empirical evidence that words are indeed used differently by different demographic groups in naturally occurring text. Through experiments conducted on large datasets which display usage scenarios for hundreds of frequent words, I show that automatic classification methods can be effective in distinguishing between word usages of different demographic groups. I compare the encoding ability of the utilized features by conducting feature analyses, and shed light on how various attributes contribute to highlighting the differences. Second, the thesis explores whether demographic differences in word usage by different groups can inform the development of more refined approaches to NLP tasks. Specifically, I start by investigating the task of word association prediction. The thesis shows that going beyond the traditional ``one-size-fits-all'' approach, demographic-aware models achieve better performances in predicting word associations for different demographic groups than generic ones. Next, I investigate the impact of demographic information on part-of-speech tagging and syntactic parsing, and the experiments reveal numerous part-of-speech tags and syntactic relations, whose predictions benefit from the prevalence of a specific group in the training data. Finally, I explore demographic-specific humor generation, and develop a humor generation framework to fill-in the blanks to generate funny stories, while taking into account people's demographic backgrounds.

Subjects

Demographic-Aware Natural Language Processing

Identifying Demographic Differences in Word Usage

Demographic-Aware Word Associations

Gender-Bias in Part-of-Speech Tagging and Dependency Parsing

Demographic-Aware Humor Generation in Mad Libs

Personalization in Language

Types

Thesis

Handle

https://hdl.handle.net/2027.42/155164

Metadata

Show full item record

Collections

Dissertations and Theses (Ph.D. and Master's)

Remediation of Harmful Language

The University of Michigan Library aims to describe its collections in a way that respects the people and communities who create, use, and are represented in them. We encourage you to Contact Us anonymously if you encounter harmful or problematic language in catalog records or finding aids. More information about our policies and practices is available at Remediation of Harmful Language.

Accessibility

If you are unable to use this file in its current format, please select the Contact Us link and we can modify it to make it more accessible to you.