Article Title: Advancing Sino-Philippine linguistics and sociolinguistics using the Lannang Corpus (LanCorp)
Abstract:
This paper introduces the Lannang Corpus (LanCorp), a public 375,000-word collection of raw and transcribed recordings of Lannang languages spoken in metropolitan Manila, which have been annotated with part-of-speech tags and linked to 40 types of sociolinguistic metadata. It begins by providing an overview of LanCorp (e.g. design, formats, accessibility). Then, it goes on to show various examples of how the corpus can be used for variationist sociolinguistic research, using Lánnang-uè data as a case study. The findings from the exploratory studies indicate that Lannang languages are influenced by sociolinguistic factors, demonstrating the intricate nature of the Sino-Philippine sociolinguistic ecology. Due to its large size, sociolinguistic metadata, and various formats, LanCorp can be used to study Lannang languages in general and how they are used by specific social groups. It enables scholars to investigate multilingual interactions across a wide range of sociolinguistic factors, furthering the field of Sino-Philippine (socio)linguistics.
Keywords: Sino-Philippine sociolinguistics; language variation and change; mixed language and multilingual corpora; computational methods for analyzing multilingual phenomena; language documentation through corpora
DOI: 10.1075/ijcl.22096.gon
Source: International Journal of Corpus Linguistics