The Lannang Corpus (LanCorp) is a sociolinguistic, POS-tagged, 375,000-word speech-and-text corpus of Lannang languages, built from audio recordings collected in metropolitan Manila between 2016 and 2020. It aims to give scholars interested in Sino-Philippine (socio)linguistics a contemporary, multilingual corpus (i.e., Hokkien, Tagalog, English, Lánnang-uè, and Mandarin). The corpus is compiled from recorded oral data collected primarily from, and by, a Sino-Philippine community in metropolitan Manila: the Manila Lannangs. The publicly available corpus contains manual transcriptions time-aligned to the audio, source-language and part-of-speech tags derived through a mix of manual and computational methods, and a wide range of social metadata. It is also organized and stored systematically to support data retrieval and (socio)linguistic analysis. Existing sociolinguistic corpora of these languages are small in scale and have not been released publicly because informants did not consent to release; LanCorp fills this gap.
[1] Gonzales, Wilkinson Daniel Wong. 2021. Interactions of Sinitic languages in the Philippines: Sinicization, Filipinization, and Sino-Philippine language creation. The Palgrave handbook of Chinese language studies, ed. by Zhengdao Ye. London: Palgrave Macmillan.
[2] Gonzales, Wilkinson Daniel Wong. 2021. Filipino, Chinese, neither, or both? The Lannang identity and its relationship with language. Language & Communication 77.
[3] Gonzales, Wilkinson Daniel Wong. 2022. “Truly a Language of Our Own”: A Corpus-Based, Experimental, and Variationist Account of Lánnang-uè in Manila. Ann Arbor: University of Michigan Ph.D. dissertation.
[4] Gonzales, Wilkinson Daniel Wong. 2022. Hybridization. Philippine English: Development, Structure, and Sociology of English in the Philippines, ed. by Ariane Macalinga Borlongan. London: Routledge.
[5] Gonzales, Wilkinson Daniel Wong. In preparation. Advancing Sino-Philippine (socio)linguistics using the Lannang Corpus (LanCorp) – a multilingual, POS-tagged, and audio-textual databank.