Standard

Disambiguation of Russian Homographs with Transformers. / Столяров, Иван Игоревич; Митрофанова, Ольга Александровна.

Literature, Language and Computing (LiLaC): Russian Contribution from the LiLaC-2023. Springer Nature, 2025. p. 73-83.

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Harvard

Столяров, ИИ & Митрофанова, ОА 2025, Disambiguation of Russian Homographs with Transformers. in Literature, Language and Computing (LiLaC): Russian Contribution from the LiLaC-2023. Springer Nature, pp. 73-83, Литература, язык и компьютерные технологии LiLaC, 9/11/23. https://doi.org/10.1007/978-981-96-0990-1_7

APA

Столяров, И. И., & Митрофанова, О. А. (2025). Disambiguation of Russian Homographs with Transformers. In Literature, Language and Computing (LiLaC): Russian Contribution from the LiLaC-2023 (pp. 73-83). Springer Nature. https://doi.org/10.1007/978-981-96-0990-1_7

Vancouver

Столяров ИИ, Митрофанова ОА. Disambiguation of Russian Homographs with Transformers. In Literature, Language and Computing (LiLaC): Russian Contribution from the LiLaC-2023. Springer Nature. 2025. p. 73-83. https://doi.org/10.1007/978-981-96-0990-1_7

Author

Столяров, Иван Игоревич ; Митрофанова, Ольга Александровна. / Disambiguation of Russian Homographs with Transformers. Literature, Language and Computing (LiLaC): Russian Contribution from the LiLaC-2023. Springer Nature, 2025. pp. 73-83

BibTeX

@inbook{96c65261db19444798ea414ac2addd45,
title = "Disambiguation of Russian Homographs with Transformers",
abstract = "The purpose of this work was to test BERT transformer-based models for homograph disambiguation in Russian—a long-standing issue in Text-To-Speech systems. The paper presents different types of Russian homographs and offers an in-depth analysis of existing methods for their disambiguation. A dataset of contexts from the Russian National Corpus for 28 homograph pairs was created and manually annotated. Three BERT models for the Russian language were selected and tested in two experiments. The results have shown that these models could achieve and outperform SOTA results in disambiguating homographs of all types on a relatively small training dataset. The pretrained models could also be used to disambiguate new pairs of intraparadigmatic homographs absent from the original dataset.",
keywords = "BERT, Homograph disambiguation, Russian homographs, Text-to-Speech",
author = "Столяров, {Иван Игоревич} and Митрофанова, {Ольга Александровна}",
year = "2025",
doi = "10.1007/978-981-96-0990-1_7",
language = "English",
isbn = "9789819609901",
pages = "73--83",
booktitle = "Literature, Language and Computing (LiLaC): Russian Contribution from the LiLaC-2023",
publisher = "Springer Nature",
address = "Germany",
note = "Conference date: 09-11-2023 Through 11-11-2023",
}

RIS

TY - CHAP

T1 - Disambiguation of Russian Homographs with Transformers

AU - Столяров, Иван Игоревич

AU - Митрофанова, Ольга Александровна

N1 - Conference code: 2

PY - 2025

Y1 - 2025

N2 - The purpose of this work was to test BERT transformer-based models for homograph disambiguation in Russian—a long-standing issue in Text-To-Speech systems. The paper presents different types of Russian homographs and offers an in-depth analysis of existing methods for their disambiguation. A dataset of contexts from the Russian National Corpus for 28 homograph pairs was created and manually annotated. Three BERT models for the Russian language were selected and tested in two experiments. The results have shown that these models could achieve and outperform SOTA results in disambiguating homographs of all types on a relatively small training dataset. The pretrained models could also be used to disambiguate new pairs of intraparadigmatic homographs absent from the original dataset.

AB - The purpose of this work was to test BERT transformer-based models for homograph disambiguation in Russian—a long-standing issue in Text-To-Speech systems. The paper presents different types of Russian homographs and offers an in-depth analysis of existing methods for their disambiguation. A dataset of contexts from the Russian National Corpus for 28 homograph pairs was created and manually annotated. Three BERT models for the Russian language were selected and tested in two experiments. The results have shown that these models could achieve and outperform SOTA results in disambiguating homographs of all types on a relatively small training dataset. The pretrained models could also be used to disambiguate new pairs of intraparadigmatic homographs absent from the original dataset.

KW - BERT

KW - Homograph disambiguation

KW - Russian homographs

KW - Text-to-Speech

UR - https://link.springer.com/chapter/10.1007/978-981-96-0990-1_7

UR - https://www.mendeley.com/catalogue/dac99d81-a374-3048-a802-04698bdc98ed/

U2 - 10.1007/978-981-96-0990-1_7

DO - 10.1007/978-981-96-0990-1_7

M3 - Chapter

SN - 9789819609901

SP - 73

EP - 83

BT - Literature, Language and Computing (LiLaC): Russian Contribution from the LiLaC-2023

PB - Springer Nature

Y2 - 9 November 2023 through 11 November 2023

ER -
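The abstract above describes a manually annotated dataset of corpus contexts for homograph pairs, where each occurrence of an ambiguous form is labeled with its intended reading. A minimal, purely illustrative sketch of how such training examples might be represented (the class, function, and sentences here are hypothetical, not taken from the authors' dataset or code):

```python
# Hypothetical sketch: structuring annotated homograph contexts as
# classification examples. All names and sentences are illustrative;
# this is not the paper's actual preprocessing code.
from dataclasses import dataclass


@dataclass
class HomographExample:
    text: str        # sentence drawn from a corpus
    homograph: str   # surface form shared by both readings
    start: int       # character offset of the homograph in `text`
    label: str       # annotated reading, e.g. a stress-marked form


def make_example(text: str, homograph: str, label: str) -> HomographExample:
    """Locate the homograph in the sentence and attach its annotated reading."""
    start = text.lower().find(homograph.lower())
    if start == -1:
        raise ValueError(f"{homograph!r} not found in: {text!r}")
    return HomographExample(text, homograph, start, label)


# Toy contexts for the pair за́мок ('castle') vs замо́к ('lock'):
examples = [
    make_example("Старинный замок стоит на холме.", "замок", "за́мок"),
    make_example("Он вставил ключ в замок.", "замок", "замо́к"),
]
for ex in examples:
    print(ex.label, "@", ex.start, "in:", ex.text)
```

Examples of this shape could then be fed to a token- or sequence-classification head on top of a Russian BERT model, with the character offset mapped to the corresponding subword tokens.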
