The Yakut language includes a significant number of Russian loanwords. During assimilation, Russian roots may be adapted to Yakut phonetics or may retain their original spelling. Both spellings are often equally common for the same loanword, so Yakut speakers face the question of which variant to use. The study of Russian loanwords in Yakut is relevant not only to language policy and planning and to efforts to standardize spelling, but also to research that seeks to reveal usage trends. Identifying Russian loanwords in Yakut texts can be treated as a problem of automatic language identification (LI), that is, determining the language in which a document or part of it is written. In general, LI can be considered a text classification task: assigning a document to one of a set of predefined classes. This paper presents the results of an experiment on training a classifier for the automatic identification of Russian loanwords that have preserved their original spelling in Yakut texts. The classifier was trained using a 3-gram model.
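To make the idea of a 3-gram classifier concrete, here is a minimal sketch in Python. The abstract does not say whether the 3-grams are taken over characters or words, nor which classifier was used, so everything below is an assumption: character trigrams, naive-Bayes-style scoring with add-one smoothing, and a uniform class prior. The names `char_trigrams`, `TrigramClassifier`, `TOY_DATA` and `tag_tokens` are hypothetical, and the handful of training words is toy data chosen for illustration, not the data from the paper.

```python
import math
from collections import Counter


def char_trigrams(word):
    """Return the character trigrams of a word, padded to mark its boundaries."""
    padded = f"^^{word.lower()}$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class TrigramClassifier:
    """Naive-Bayes-style scorer over character trigrams (illustrative assumption)."""

    def __init__(self):
        self.counts = {}   # label -> Counter of trigram frequencies
        self.totals = {}   # label -> total number of trigram tokens
        self.vocab = set() # all trigrams seen during training

    def fit(self, labelled_words):
        for word, label in labelled_words:
            grams = char_trigrams(word)
            self.counts.setdefault(label, Counter()).update(grams)
            self.vocab.update(grams)
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def log_score(self, word, label):
        # Add-one smoothing; the extra +1 reserves mass for unseen trigrams.
        v = len(self.vocab) + 1
        counts, total = self.counts[label], self.totals[label]
        return sum(math.log((counts[g] + 1) / (total + v))
                   for g in char_trigrams(word))

    def predict(self, word):
        # Uniform prior over labels: pick the label with the highest log score.
        return max(self.counts, key=lambda lab: self.log_score(word, lab))


# Toy data (assumption): "ru" = Russian spelling kept, "sah" = Yakut or adapted form.
TOY_DATA = [
    ("школа", "ru"), ("машина", "ru"), ("газета", "ru"),
    ("телефон", "ru"), ("компьютер", "ru"),
    ("оскуола", "sah"), ("массыына", "sah"), ("дьиэ", "sah"),
    ("үлэ", "sah"), ("оҕо", "sah"), ("киһи", "sah"), ("сурук", "sah"),
]

clf = TrigramClassifier().fit(TOY_DATA)


def tag_tokens(text):
    """Label each whitespace-separated token of a text."""
    return [(token, clf.predict(token)) for token in text.split()]


print(tag_tokens("оҕо компьютер"))
```

With so little training data the labels are only indicative; a realistic setup would need a much larger word list, proper tokenization, and a held-out evaluation set.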
Translated title of the contribution: IDENTIFICATION OF RUSSIAN BORROWINGS IN YAKUT TEXTS
Original language: Russian
Pages (from-to): 41-54
Journal: Компьютерная лингвистика и вычислительные онтологии
Issue number: 6
State: Published - 2022
Event: International Conference "Internet and Modern Society" (IMS-2022): International Workshop «Computational Linguistics» (CompLing-2022) - ITMO University, Saint Petersburg, Russian Federation
Duration: 23 Jun 2022 - 24 Jun 2022
Conference number: XXIV
http://ims.ifmo.ru/ru/pages/2/programma.htm
http://ims.ifmo.ru/ru

    Research areas

  • YAKUT LANGUAGE, RUSSIAN LOANWORDS, language identification, 3-GRAM MODEL, lexicography

ID: 104746232