In this paper, we describe the second stage of a study that uses machine learning algorithms to identify the factors influencing the phonetic reduction of words in Russian speech. We discuss the limitations of the first stage of the study and try to overcome some of them by enlarging the dataset and applying new algorithms such as random forest, gradient boosting, and perceptron. The data were drawn from texts in the Corpus of Russian Speech. The dataset was split into two separate datasets: one consisting of single words and the other of multiword units from our corpus. According to the results, the most important features for single words were the number of syllables and whether the word is an adjective, as both were selected by all algorithms. For multiword units, the main features were the number of syllables, frequency in Russian spoken texts (in ipm), and token frequency in a given text. In further research, we plan to expand the dataset and look more closely at features such as text type and token frequency in a given text.

Original language: English
Title of host publication: Speech and Computer - 23rd International Conference, SPECOM 2021, Proceedings
Editors: Alexey Karpov, Rodmonga Potapova
Publisher: Springer Nature
Pages: 146-156
Number of pages: 11
ISBN (print): 9783030878016
DOI
Publication status: Published - 2021
Event: 23rd International Conference on Speech and Computer - Virtual, Online, Russian Federation
Duration: 27 Sep 2021 - 30 Sep 2021
Conference number: 23
http://specom.nw.ru/2021/

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12997 LNAI
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: 23rd International Conference on Speech and Computer
Abbreviated title: SPECOM 2021
Country/Territory: Russian Federation
City: Virtual, Online
Period: 27/09/21 - 30/09/21
Website

    Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

ID: 87566335