C-NARS: An Open-Source Tool for Classification of Narratives in Survey Data
dc.contributor.author | Abramowitz, Joelle | |
dc.contributor.author | Kim, Jinseok | |
dc.date.accessioned | 2023-10-12T00:44:16Z | |
dc.date.available | 2023-10-12T00:44:16Z | |
dc.date.issued | 2021-09-28 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/178294 | en |
dc.description | The code set is available at https://github.com/TEEDLab/CNARS | en_US |
dc.description.abstract | To help researchers and policy-makers better understand how different types of self-employment contribute to older adults’ income, retirement, and quality of life, this project develops a computational method to classify self-employment narratives in survey data. Of the 17,854 job narratives in the Health and Retirement Study between 1994 and 2018, about 4,500 are labeled by human coders as one of three categories: Owner, Manager, or Independent. A variety of machine learning algorithms are trained and tested on the labeled data, in which each narrative text is pre-processed (lemmatization, stemming, etc.) and transformed into a vector of word tokens for cosine similarity calculation among narratives. The best-performing classification model (Gradient Boosting Trees) is applied to all 17,854 instances to produce, for each instance, a probability score for each of the three categories. A total of 14,748 instances with a probability score of 0.9 or above for ‘Independent’ or 0.6 or above for ‘Owner’ are retained as accurately tagged, because instances meeting these thresholds are highly likely to be assigned the correct category (97.3% for Independent and 99.0% for Owner) when evaluated on 10 random subsets of the labeled data (each 20% of the 4,500 instances). The remaining instances are passed to manual inspection and correction before the full data set is used for statistical analyses. The classification code sets, Classification of Narratives in Survey Data (C-NARS), are made publicly available so that researchers can implement machine learning methods for classifying narratives in survey data. | en_US |
dc.description.sponsorship | Michigan Retirement & Disability Research Center (MRDRC) UM21-14. "What We Talk about When We Talk About Self-Employment: Examining Self-Employment and the Transition to Retirement among Older Adults in the United States" (2020.10 - 2021.9; PI - Joelle Abramowitz, Co-PI - Jinseok Kim) | en_US |
dc.language.iso | en_US | en_US |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | survey narratives | en_US |
dc.subject | job type prediction | en_US |
dc.subject | survey methodology | en_US |
dc.subject | machine learning for classification | en_US |
dc.title | C-NARS: An Open-Source Tool for Classification of Narratives in Survey Data | en_US |
dc.type | Technical Report | en_US |
dc.subject.hlbsecondlevel | Social Sciences (General) | |
dc.subject.hlbtoplevel | Social Sciences | |
dc.contributor.affiliationum | Institute for Social Research (ISR) | en_US |
dc.contributor.affiliationum | Survey Research Center, Institute for Social Research | en_US |
dc.contributor.affiliationum | School of Information | en_US |
dc.contributor.affiliationumcampus | Ann Arbor | en_US |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/178294/1/Kim&Abramowitz_CNARS_technical report.pdf | |
dc.identifier.doi | https://dx.doi.org/10.7302/8683 | |
dc.identifier.orcid | 0000-0001-6481-2065 | en_US |
dc.description.filedescription | Description of Kim&Abramowitz_CNARS_technical report.pdf : Technical Report | |
dc.description.depositor | SELF | en_US |
dc.identifier.name-orcid | Kim, Jinseok; 0000-0001-6481-2065 | en_US |
dc.working.doi | 10.7302/8683 | en_US |
dc.owningcollname | Institute for Social Research (ISR) |
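The confidence filter described in the abstract can be sketched as follows. This is a minimal illustration, not the C-NARS code itself: the function name `split_by_confidence`, the dictionary layout of the scores, and the sample values are all assumptions made for the example. Only the two thresholds (0.9 for Independent, 0.6 for Owner) come from the abstract.

```python
from typing import Dict, List, Tuple

# Thresholds reported in the abstract. 'Manager' has no threshold there,
# so Manager predictions always go to manual review in this sketch.
THRESHOLDS: Dict[str, float] = {"Independent": 0.9, "Owner": 0.6}


def split_by_confidence(
    scores: List[Dict[str, float]],
) -> Tuple[List[Tuple[int, str]], List[int]]:
    """Split instances into auto-accepted (index, label) pairs and
    indices that need manual inspection.

    `scores` is a hypothetical structure: one dict per narrative,
    mapping each category name to its predicted probability.
    """
    accepted: List[Tuple[int, str]] = []
    review: List[int] = []
    for i, probs in enumerate(scores):
        for label, threshold in THRESHOLDS.items():
            if probs.get(label, 0.0) >= threshold:
                accepted.append((i, label))
                break
        else:
            # No threshold was met: route to manual inspection.
            review.append(i)
    return accepted, review


# Made-up probability scores for four narratives (illustrative only).
example_scores = [
    {"Owner": 0.70, "Manager": 0.20, "Independent": 0.10},
    {"Owner": 0.05, "Manager": 0.03, "Independent": 0.92},
    {"Owner": 0.50, "Manager": 0.30, "Independent": 0.20},
    {"Owner": 0.10, "Manager": 0.85, "Independent": 0.05},
]
accepted, review = split_by_confidence(example_scores)
```

With these sample scores, the first two narratives meet a threshold and are accepted with their labels, while the last two are set aside for manual inspection. In practice the scores would come from the trained classifier's per-class probability output rather than hand-written dictionaries.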