
The Lannang Corpus (LanCorp): A POS-tagged, sociolinguistic corpus containing recordings and transcriptions of Lannang speech collected from the metropolitan Manila Lannangs between 2016 and 2020

dc.contributor.author: Gonzales, Wilkinson Daniel Wong
dc.date.accessioned: 2022-09-02T20:52:03Z
dc.date.available: 2022-09-02T20:52:03Z
dc.date.issued: 2022-09-05
dc.identifier.uri: https://hdl.handle.net/2027.42/174160 [en]
dc.description: This work is a description of a data set hosted by the Deep Blue Data repository. This data set can be accessed at: https://doi.org/10.7302/66g9-e028. [en_US]
dc.description.abstract: The Lannang Corpus (LanCorp) is a sociolinguistic POS-tagged 375,000-word speech-and-text corpus of Lannang languages based on audio recordings collected in metropolitan Manila between 2016 and 2020. It hopes to furnish scholars interested in Sino-Philippine (socio)linguistics with a contemporary, multilingual corpus (i.e., Hokkien, Tagalog, English, Lánnang-uè, Mandarin) compiled using recorded oral data primarily collected from a Sino-Philippine community in metropolitan Manila by the community: the Manila Lannangs. The publicly available corpus contains manual transcriptions (time-aligned to the audio), source language and part-of-speech tags derived using a mix of manual and computational methods, and a wide range of social metadata; it is also organized and stored systematically for easy data retrieval and (socio)linguistic analysis. Although there are existing sociolinguistic corpora, they are small in scale and were not released publicly due to lack of informant consent – LanCorp readily fills the gap. [en_US]
dc.language.iso: en_US [en_US]
dc.rights: Attribution-NonCommercial 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/ [*]
dc.subject: Corpus [en_US]
dc.subject: Lannang languages [en_US]
dc.subject: Manila [en_US]
dc.title: The Lannang Corpus (LanCorp): A POS-tagged, sociolinguistic corpus containing recordings and transcriptions of Lannang speech collected from the metropolitan Manila Lannangs between 2016 and 2020 [en_US]
dc.type: Dataset [en_US]
dc.subject.hlbsecondlevel: Linguistics
dc.subject.hlbtoplevel: Humanities
dc.contributor.affiliationum: Linguistics, Department of [en_US]
dc.contributor.affiliationother: Department of English, The Chinese University of Hong Kong [en_US]
dc.contributor.affiliationumcampus: Ann Arbor [en_US]
dc.description.bitstreamurl: http://deepblue.lib.umich.edu/bitstream/2027.42/174160/1/LanCorp.pdf
dc.identifier.doi: https://dx.doi.org/10.7302/5891
dc.description.filedescription: Description of LanCorp.pdf : Main Article
dc.description.depositor: SELF [en_US]
dc.working.doi: 10.7302/5891 [en_US]
dc.owningcollname: Linguistics, Department of


Except where otherwise noted, this item's license is described as Attribution-NonCommercial 4.0 International
