Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review
Disambiguation of Russian Homographs with Transformers. / Столяров, Иван Игоревич; Митрофанова, Ольга Александровна.
Literature, Language and Computing (LiLaC): Russian Contribution from the LiLaC-2023. Springer Nature, 2025. p. 73-83.
TY - CHAP
T1 - Disambiguation of Russian Homographs with Transformers
AU - Столяров, Иван Игоревич
AU - Митрофанова, Ольга Александровна
N1 - Conference code: 2
PY - 2025
Y1 - 2025
N2 - The purpose of this work was to test transformer-based BERT models for homograph disambiguation in Russian, a long-standing issue in Text-To-Speech systems. The paper presents the different types of Russian homographs and offers an in-depth analysis of existing methods for their disambiguation. A dataset of contexts from the Russian National Corpus covering 28 homograph pairs was created and manually annotated. Three BERT models for the Russian language were selected and tested in two experiments. The results show that these models can match and even outperform state-of-the-art (SOTA) results in disambiguating homographs of all types on a relatively small training dataset. The pretrained models can also be used to disambiguate new pairs of intraparadigmatic homographs absent from the original dataset.
AB - The purpose of this work was to test transformer-based BERT models for homograph disambiguation in Russian, a long-standing issue in Text-To-Speech systems. The paper presents the different types of Russian homographs and offers an in-depth analysis of existing methods for their disambiguation. A dataset of contexts from the Russian National Corpus covering 28 homograph pairs was created and manually annotated. Three BERT models for the Russian language were selected and tested in two experiments. The results show that these models can match and even outperform state-of-the-art (SOTA) results in disambiguating homographs of all types on a relatively small training dataset. The pretrained models can also be used to disambiguate new pairs of intraparadigmatic homographs absent from the original dataset.
KW - BERT
KW - Homograph disambiguation
KW - Russian homographs
KW - Text-to-Speech
UR - https://link.springer.com/chapter/10.1007/978-981-96-0990-1_7
UR - https://www.mendeley.com/catalogue/dac99d81-a374-3048-a802-04698bdc98ed/
U2 - 10.1007/978-981-96-0990-1_7
DO - 10.1007/978-981-96-0990-1_7
M3 - Chapter
SN - 9789819609901
SP - 73
EP - 83
BT - Literature, Language and Computing (LiLaC): Russian Contribution from the LiLaC-2023
PB - Springer Nature
Y2 - 9 November 2023 through 11 November 2023
ER -
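
For readers who want a concrete picture of the approach the abstract describes, the sketch below shows one common way to frame homograph disambiguation as binary sequence classification over a context sentence, using a Russian BERT via the Hugging Face transformers library. This is a minimal illustration, not the authors' code: the model choice (DeepPavlov/rubert-base-cased), the example pair (зАмок "castle" vs. замОк "lock"), and the two-label scheme are assumptions; the paper's own models, dataset, and training setup are not reproduced here.

# Minimal sketch (not the authors' code): homograph disambiguation framed as
# binary sequence classification over a context sentence. The model name,
# example pair, and label scheme are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "DeepPavlov/rubert-base-cased"  # a public Russian BERT; illustrative choice
tokenizer = AutoTokenizer.from_pretrained(MODEL)
# num_labels=2: one binary classifier per homograph pair, e.g.
# label 0 = зАмок ("castle"), label 1 = замОк ("lock").
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# The classification head is randomly initialized at this point; meaningful
# predictions require fine-tuning on manually annotated contexts, as the
# paper's dataset of Russian National Corpus examples is used for.
context = "Старинный замок стоял на высоком холме."
inputs = tokenizer(context, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(dim=-1).item()  # 0 or 1, i.e. the predicted reading
print(pred)

In this framing, a separate classifier is fine-tuned per homograph pair on its annotated contexts; a single shared model across pairs is another possible design, not sketched here.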