The Lannang Corpus (LanCorp): A POS-tagged, sociolinguistic corpus containing recordings and transcriptions of Lannang speech collected from the metropolitan Manila Lannangs between 2016 and 2020
dc.contributor.author | Gonzales, Wilkinson Daniel Wong | |
dc.date.accessioned | 2022-09-02T20:52:03Z | |
dc.date.available | 2022-09-02T20:52:03Z | |
dc.date.issued | 2022-09-05 | |
dc.identifier.uri | https://hdl.handle.net/2027.42/174160 | en |
dc.description | This work is a description of a data set hosted by the Deep Blue Data repository. This data set can be accessed at: https://doi.org/10.7302/66g9-e028. | en_US |
dc.description.abstract | The Lannang Corpus (LanCorp) is a sociolinguistic, POS-tagged, 375,000-word speech-and-text corpus of Lannang languages based on audio recordings collected in metropolitan Manila between 2016 and 2020. It aims to furnish scholars interested in Sino-Philippine (socio)linguistics with a contemporary, multilingual corpus (i.e., Hokkien, Tagalog, English, Lánnang-uè, Mandarin) compiled from recorded oral data collected primarily from, and by members of, a Sino-Philippine community in metropolitan Manila: the Manila Lannangs. The publicly available corpus contains manual transcriptions (time-aligned to the audio), source-language and part-of-speech tags derived using a mix of manual and computational methods, and a wide range of social metadata; it is also organized and stored systematically for easy data retrieval and (socio)linguistic analysis. Although sociolinguistic corpora of this kind exist, they are small in scale and were not released publicly due to lack of informant consent; LanCorp fills this gap. | en_US |
dc.language.iso | en_US | en_US |
dc.rights | Attribution-NonCommercial 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc/4.0/ | * |
dc.subject | Corpus | en_US |
dc.subject | Lannang languages | en_US |
dc.subject | Manila | en_US |
dc.title | The Lannang Corpus (LanCorp): A POS-tagged, sociolinguistic corpus containing recordings and transcriptions of Lannang speech collected from the metropolitan Manila Lannangs between 2016 and 2020 | en_US |
dc.type | Dataset | en_US |
dc.subject.hlbsecondlevel | Linguistics | |
dc.subject.hlbtoplevel | Humanities | |
dc.contributor.affiliationum | Linguistics, Department of | en_US |
dc.contributor.affiliationother | Department of English, The Chinese University of Hong Kong | en_US |
dc.contributor.affiliationumcampus | Ann Arbor | en_US |
dc.description.bitstreamurl | http://deepblue.lib.umich.edu/bitstream/2027.42/174160/1/LanCorp.pdf | |
dc.identifier.doi | https://dx.doi.org/10.7302/5891 | |
dc.description.filedescription | Description of LanCorp.pdf : Main Article | |
dc.description.depositor | SELF | en_US |
dc.working.doi | 10.7302/5891 | en_US |
dc.owningcollname | Linguistics, Department of |