Two methods for identifying Russian words in Yakut texts

Standard

Two methods for identifying Russian words in Yakut texts. / Zakharov, Victor; Vissio, Nicolas Cortegoso.

In: International Journal of Open Information Technologies, Vol. 10, No. 11, 2022, p. 26-34.

Research output: Contribution to journal › Article › peer-review

Harvard

Zakharov, V & Vissio, NC 2022, 'Two methods for identifying Russian words in Yakut texts', International Journal of Open Information Technologies, vol. 10, no. 11, pp. 26-34.

APA

Zakharov, V., & Vissio, N. C. (2022). Two methods for identifying Russian words in Yakut texts. International Journal of Open Information Technologies, 10(11), 26-34.

Vancouver

Zakharov V, Vissio NC. Two methods for identifying Russian words in Yakut texts. International Journal of Open Information Technologies. 2022;10(11):26-34.

Author

Zakharov, Victor; Vissio, Nicolas Cortegoso. / Two methods for identifying Russian words in Yakut texts. In: International Journal of Open Information Technologies. 2022; Vol. 10, No. 11. pp. 26-34.

BibTeX

@article{9c660ffa52d84329b9ad9cae85e92606,

title = "Two methods for identifying Russian words in Yakut texts",

abstract = "The article discusses two methods for extracting foreign words from Yakut texts. Foreign words refer to non-integrated lexical units, which have not been adapted to Yakut orthography and are therefore written as in the original language. Based on the fact that most foreign words in Yakut texts come from the Russian language, it is assumed that they have a particular form by which they can be distinguished from the Yakut word forms. The first method reviewed here is based on rules. It implements an algorithm that detects letter combinations that are foreign to the Yakut language. The second method applies a statistical approach to model and differentiate Yakut and Russian letter combinations. The effectiveness of both methods in extracting Russian foreign words is compared with the results of manual highlighting performed by Russian speakers on 6 Yakut texts. This work is a continuation of the article “Identification of Russian borrowings in Yakut texts”, published in “Computer Linguistics and Computational Ontologies. Number 5 (Proceedings of the XXIV Joint International Conference {"}Internet and Modern Society{"}, IMS-2022)”.",

author = "Victor Zakharov and Vissio, {Nicolas Cortegoso}",

note = "Cortegoso Vissio N., Zakharov V. Two methods for identifying Russian words in Yakut texts // International Journal of Open Information Technologies, ISSN: 2307-8162, vol. 10, no. 11, 2022, p. 26-34.",

year = "2022",

language = "English",

volume = "10",

pages = "26--34",

journal = "International Journal of Open Information Technologies",

issn = "2307-8162",

publisher = "Moscow University Press",

number = "11",

}

RIS

TY - JOUR

T1 - Two methods for identifying Russian words in Yakut texts

AU - Zakharov, Victor

AU - Vissio, Nicolas Cortegoso

N1 - Cortegoso Vissio N., Zakharov V. Two methods for identifying Russian words in Yakut texts // International Journal of Open Information Technologies, ISSN: 2307-8162, vol. 10, no. 11, 2022, p. 26-34.

PY - 2022

Y1 - 2022

N2 - The article discusses two methods for extracting foreign words from Yakut texts. Foreign words refer to non-integrated lexical units, which have not been adapted to Yakut orthography and are therefore written as in the original language. Based on the fact that most foreign words in Yakut texts come from the Russian language, it is assumed that they have a particular form by which they can be distinguished from the Yakut word forms. The first method reviewed here is based on rules. It implements an algorithm that detects letter combinations that are foreign to the Yakut language. The second method applies a statistical approach to model and differentiate Yakut and Russian letter combinations. The effectiveness of both methods in extracting Russian foreign words is compared with the results of manual highlighting performed by Russian speakers on 6 Yakut texts. This work is a continuation of the article “Identification of Russian borrowings in Yakut texts”, published in “Computer Linguistics and Computational Ontologies. Number 5 (Proceedings of the XXIV Joint International Conference "Internet and Modern Society", IMS-2022)”.

AB - The article discusses two methods for extracting foreign words from Yakut texts. Foreign words refer to non-integrated lexical units, which have not been adapted to Yakut orthography and are therefore written as in the original language. Based on the fact that most foreign words in Yakut texts come from the Russian language, it is assumed that they have a particular form by which they can be distinguished from the Yakut word forms. The first method reviewed here is based on rules. It implements an algorithm that detects letter combinations that are foreign to the Yakut language. The second method applies a statistical approach to model and differentiate Yakut and Russian letter combinations. The effectiveness of both methods in extracting Russian foreign words is compared with the results of manual highlighting performed by Russian speakers on 6 Yakut texts. This work is a continuation of the article “Identification of Russian borrowings in Yakut texts”, published in “Computer Linguistics and Computational Ontologies. Number 5 (Proceedings of the XXIV Joint International Conference "Internet and Modern Society", IMS-2022)”.

UR - http://injoit.org/index.php/j1/article/view/1425

M3 - Article

VL - 10

SP - 26

EP - 34

JO - International Journal of Open Information Technologies

JF - International Journal of Open Information Technologies

SN - 2307-8162

IS - 11

ER -
