For creating a linguistically annotated speech corpus, it is useful to have a tool for automatic phonetic transcription. We used the Kaldi toolkit to transcribe recordings of radio interviews and talk shows from the Corpus of Spoken Russian. The training set included 2466 interpausal intervals (speech fragments between two pauses), and the test set included 617. Fifteen monophone-trained models and fifteen triphone-trained models were tested using a low-dimensional dictionary that contained only allophones. Error rates ranged from 39% to 44%, and triphone-based training performed better than monophone-based training. Increasing the N-gram order improved the results, reducing the error rate to 36%. How frequently an allophone occurs does not seem to affect how accurately it is recognized. Vowels are recognized less accurately than consonants, which is consistent with what is known about how trained phoneticians transcribe spontaneous speech.
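The abstract reports error rates without defining the metric. A common choice, and the one Kaldi's `compute-wer` scoring applies to whatever tokens the transcripts contain, is the edit-distance rate (S + D + I) / N, where S, D and I are substitutions, deletions and insertions in a minimum-cost alignment and N is the number of reference units. The sketch below assumes the units are allophone symbols; the function names and the allophone labels in the example are hypothetical and are not taken from the paper or the corpus.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    substitutions: int
    deletions: int
    insertions: int
    ref_len: int

    @property
    def rate(self) -> float:
        """Error rate in percent: (S + D + I) / N * 100."""
        if self.ref_len == 0:
            raise ValueError("reference transcription is empty")
        errors = self.substitutions + self.deletions + self.insertions
        return 100.0 * errors / self.ref_len


def align_errors(ref: list[str], hyp: list[str]) -> ErrorCounts:
    """Align a hypothesis to a reference with minimum edit distance and
    return the S/D/I counts of one optimal alignment (hypothetical helper)."""
    n, m = len(ref), len(hyp)
    # dp[i][j] holds (cost, S, D, I) for aligning ref[:i] with hyp[:j].
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # only deletions
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, s, d, ins = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, s, d, ins)  # match, no cost
            else:
                diag = (c + 1, s + 1, d, ins)  # substitution
            c, s, d, ins = dp[i - 1][j]
            dele = (c + 1, s, d + 1, ins)  # reference unit missing
            c, s, d, ins = dp[i][j - 1]
            inse = (c + 1, s, d, ins + 1)  # extra hypothesis unit
            dp[i][j] = min(diag, dele, inse)  # lowest cost wins
    _, s, d, ins = dp[n][m]
    return ErrorCounts(s, d, ins, n)


def corpus_error_rate(pairs: list[tuple[list[str], list[str]]]) -> float:
    """Pool errors over all intervals, then divide by total reference length."""
    errors = ref_total = 0
    for ref, hyp in pairs:
        c = align_errors(ref, hyp)
        errors += c.substitutions + c.deletions + c.insertions
        ref_total += c.ref_len
    return 100.0 * errors / ref_total


if __name__ == "__main__":
    # Hypothetical allophone labels for one interpausal interval.
    ref = ["p", "r'", "i", "v'", "e", "t"]
    hyp = ["p", "r'", "e", "v'", "t"]
    c = align_errors(ref, hyp)
    print(c.substitutions, c.deletions, c.insertions, round(c.rate, 2))
```

Here one vowel is substituted and one is deleted, giving 2 errors over 6 reference units (about 33%). `corpus_error_rate` pools counts across intervals instead of averaging per-interval rates, so longer intervals carry proportionally more weight.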
Original language: English
Title of host publication: Proceedings of the Third International Conference on Advances in Computing Research (ACR’25)
Publisher: Springer Nature
Pages: 168-178
Number of pages: 11
ISBN (Print): 9783031876462
State: Published, 16 Apr 2025
Event: The 2025 International Conference on Advances in Computing Research, Nice, France
Duration: 7 Jul 2025 to 9 Jul 2025
Conference number: 3
https://iicser.org/ACR25/

Publication series

Name: Lecture Notes in Networks and Systems
Publisher: Springer Nature
Volume: 1346
ISSN (Print): 2367-3389

Conference

Conference: The 2025 International Conference on Advances in Computing Research
Abbreviated title: ACR'25
Country/Territory: France
City: Nice
Period: 7/07/25 to 9/07/25

    Research areas

  • Acoustic Transcription, Automatic Speech Recognition, Natural Language Processing, Phonetic Transcription, Russian Speech

ID: 138031353