This article presents a project to build an annotated bilingual speech corpus of the historical Albanian diaspora located in four Albanian-speaking villages in the Priazov'ye and Budjak regions. Studying bilingual speech behaviour with the corpus should help address several open problems in bilingualism research. These problems concern two types of contact-related phenomena: code-switching and borrowing. We analyze the main approaches to interpreting single-word insertions from one language into speech in another, and we review the methods used to transcribe linguistic data and annotate contact phenomena in several existing corpora of bilingual speech. We propose theoretical and practical solutions for the selection, metatextual markup, and transcription of texts, and then attempt to develop a standard for annotating contact-related phenomena in the spontaneous oral speech of bilinguals that can subsequently be applied to similar corpora. In conclusion, we discuss the prospects for using the corpus and for further developing corpus resources for the Albanian language.
Translated title of the contribution: The Bilingual Speech Corpus of Albanians in Budjak and Priazov'ye: Towards the Annotation of Contact-Related Phenomena
Original language: Russian
Pages (from-to): 78-106
Number of pages: 29
Journal: Индоевропейское языкознание и классическая филология
Volume: 29
Issue number: 2
State: Published - 2025

Research areas: annotation, borrowing, code-switching, corpus, integration, single-word switches

ID: 137681779