Corpora of Russian Spontaneous Speech as a Tool for Modelling Natural Speech Production and Recognition

DOI

https://doi.org/10.1109/CCWC47524.2020.9031251
Final published version

Elena I. Riekhakaynen

The paper presents two corpora of spontaneous Russian. The aim of the study is to describe the speech signal in a way close to what a listener has to cope with while processing natural speech, and to use the corpora for further computer simulation of spoken word recognition. The corpus of adult speech includes around two hours of recordings with orthographic and acoustic-phonetic transcriptions made manually by trained phoneticians. Based on this corpus, a word list imitating the mental lexicon of a listener was created, in which each phonetic realization is linked to all of its possible interpretations found in the corpus. The analysis of the adult speech shows how often reduced word forms occur in spontaneous speech and makes it possible to develop and test an algorithm for restoring grammatical information in noun phrases. The corpus of children's speech includes both longitudinal and experimental data (around 18 hours altogether) and is the first corpus of Russian children's speech provided with phonetic annotation. The preliminary analysis of the children's speech shows that at least some reduced variants can be stored in the mental lexicon of a native speaker.
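
The word list described in the abstract, where each phonetic realization points to every interpretation attested for it, can be pictured as a simple mapping. The Python sketch below is only an illustration under stated assumptions: the function name `build_lexicon`, the input format (pairs of orthographic word and realization), and the toy tokens are all hypothetical and are not taken from the corpora or from the paper's implementation.

```python
from collections import defaultdict


def build_lexicon(aligned_tokens):
    """Map each phonetic realization to all orthographic words it was aligned with.

    `aligned_tokens` is assumed to be an iterable of
    (orthographic_word, phonetic_realization) pairs.
    """
    lexicon = defaultdict(set)
    for orthographic, realization in aligned_tokens:
        lexicon[realization].add(orthographic)
    # Sort the candidate words so that lookups give a stable order.
    return {realization: sorted(words) for realization, words in lexicon.items()}


# Invented toy data for illustration only; not drawn from the corpora.
toy_tokens = [
    ("сейчас", "щас"),     # a reduced realization
    ("сейчас", "сейчас"),  # a full realization of the same word
    ("что", "чо"),
    ("чего", "чо"),        # the same realization, a second interpretation
]

lexicon = build_lexicon(toy_tokens)
candidates = lexicon["чо"]  # every interpretation attested for this realization
```

In this sketch an ambiguous realization returns several candidate words, which is the situation a listener, or a simulated recognizer, would then have to resolve from context.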

Original language	English
Title of host publication	10th Annual Computing and Communication Workshop and Conference (CCWC)
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	0406-0411
ISBN (Electronic)	978-172813783-4
State	Published - 12 Mar 2020
Event	10th Annual Computing and Communication Workshop and Conference, University of Nevada, Las Vegas, United States. Duration: 6 Jan 2020 → 8 Jan 2020. Conference number: 10. http://ieee-ccwc.org/

Conference

Conference	10th Annual Computing and Communication Workshop and Conference
Abbreviated title	IEEE CCWC
Country/Territory	United States
City	Las Vegas
Period	6 Jan 2020 → 8 Jan 2020
Internet address	http://ieee-ccwc.org/

Research areas

Spontaneous speech, Children's speech, Russian, Phonetic reduction, Speech processing, Corpus linguistics
