Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
What Causes Phonetic Reduction in Russian Speech: New Evidence from Machine Learning Algorithms. / Dayter, Maria; Riekhakaynen, Elena.
Speech and Computer - 23rd International Conference, SPECOM 2021, Proceedings. ed. / Alexey Karpov; Rodmonga Potapova. Springer Nature, 2021. p. 146-156 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12997 LNAI).
TY - GEN
T1 - What Causes Phonetic Reduction in Russian Speech: New Evidence from Machine Learning Algorithms
T2 - 23rd International Conference on Speech and Computer, SPECOM 2021
AU - Dayter, Maria
AU - Riekhakaynen, Elena
N1 - Conference code: 23
PY - 2021
Y1 - 2021
AB - In this paper, we describe the second stage of a study that uses machine learning algorithms to identify the factors influencing the phonetic reduction of words in Russian speech. We discuss the limitations of the first stage of our study and try to overcome some of them by enlarging the dataset and using new algorithms such as random forest, gradient boosting, and perceptron. The data were texts from the Corpus of Russian Speech. We divided the data into two separate datasets: one consisting of single words and the other of multiword units from our corpus. For single words, the most important features turned out to be the number of syllables and whether the word is an adjective, as all algorithms selected them. For multiword units, the main features were the number of syllables, frequency in Russian spoken texts (in ipm), and token frequency in a given text. In further research, we plan to expand the dataset and look more closely at features such as text type and token frequency in a given text.
KW - Machine learning
KW - Phonetic reduction
KW - Russian
KW - Speech
UR - http://www.scopus.com/inward/record.url?scp=85116342082&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/f8bceca4-1368-359d-a283-305fa6528c35/
U2 - 10.1007/978-3-030-87802-3_14
DO - 10.1007/978-3-030-87802-3_14
M3 - Conference contribution
AN - SCOPUS:85116342082
SN - 9783030878016
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 146
EP - 156
BT - Speech and Computer - 23rd International Conference, SPECOM 2021, Proceedings
A2 - Karpov, Alexey
A2 - Potapova, Rodmonga
PB - Springer Nature
Y2 - 27 September 2021 through 30 September 2021
ER -
ID: 87566335