In this paper, we describe the second stage of a study that uses machine learning algorithms to identify the factors influencing the phonetic reduction of words in Russian speech. We discuss the limitations of the first stage and try to overcome some of them by enlarging the dataset and applying new algorithms such as random forest, gradient boosting, and the perceptron. The data come from texts in the Corpus of Russian Speech. We split the data into two separate datasets: one consisting of single words and the other of multiword units from our corpus. For single words, the most important features were the number of syllables and whether the word is an adjective, as both were selected by all algorithms. For multiword units, the main features were the number of syllables, frequency in Russian spoken texts (in ipm), and token frequency in a given text. In further research, we plan to expand the dataset and look more closely at features such as text type and token frequency in a given text.
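
The two frequency features named in the abstract can be illustrated with a minimal sketch. It assumes that "ipm" stands for instances per million tokens, the usual corpus-linguistics normalisation. The function names, the toy token list, and the counts are hypothetical and are not taken from the paper or from the Corpus of Russian Speech.

```python
from collections import Counter


def ipm(count: int, corpus_size: int) -> float:
    """Normalise a raw count to instances per million tokens (assumed meaning of 'ipm')."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count * 1_000_000 / corpus_size


def token_frequency(tokens: list[str]) -> Counter:
    """Count how often each (lower-cased) token occurs within a single text."""
    return Counter(token.lower() for token in tokens)


# Hypothetical toy text, already tokenised.
text_tokens = ["потому", "что", "я", "сейчас", "потому", "что"]
freqs = token_frequency(text_tokens)

# Hypothetical corpus figures: 250 occurrences in a 500,000-token corpus.
example_ipm = ipm(250, 500_000)
```

Here `freqs` holds the per-text token frequency, while `example_ipm` is a corpus-level value. The paper's actual feature extraction may differ.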

Original language: English
Title of host publication: Speech and Computer - 23rd International Conference, SPECOM 2021, Proceedings
Editors: Alexey Karpov, Rodmonga Potapova
Publisher: Springer Nature
Pages: 146-156
Number of pages: 11
ISBN (Print): 9783030878016
State: Published - 2021
Event: 23rd International Conference on Speech and Computer, SPECOM 2021 - Virtual, Online, Russian Federation
Duration: 27 Sep 2021 - 30 Sep 2021
Conference number: 23
http://specom.nw.ru/2021/

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12997 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 23rd International Conference on Speech and Computer, SPECOM 2021
Abbreviated title: SPECOM 2021
Country/Territory: Russian Federation
City: Virtual, Online
Period: 27/09/21 - 30/09/21

Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Research areas

  • Machine learning, Phonetic reduction, Russian, Speech

ID: 87566335