Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Corpora of Russian Spontaneous Speech as a Tool for Modelling Natural Speech Production and Recognition. / Riekhakaynen, Elena I.
10th Annual Computing and Communication Workshop and Conference (CCWC). Institute of Electrical and Electronics Engineers Inc., 2020. p. 0406-0411.
TY - GEN
T1 - Corpora of Russian Spontaneous Speech as a Tool for Modelling Natural Speech Production and Recognition
AU - Riekhakaynen, Elena I.
N1 - Conference code: 10
PY - 2020/3/12
Y1 - 2020/3/12
N2 - The paper presents two corpora of spontaneous Russian. The aim of the study is to describe the speech signal in a way close to what a listener has to cope with while processing natural speech, and to use the corpora for further computer simulation of spoken word recognition. The corpus of adult speech includes around two hours of recordings with orthographic and acoustic-phonetic transcriptions performed manually by trained phoneticians. Based on this corpus, a word list imitating the mental lexicon of a listener was created, in which each phonetic realization corresponds to all possible variants of its interpretation found in the corpus. The analysis of the adult speech shows how often reduced word forms occur in spontaneous speech and makes it possible to develop and test an algorithm for restoring grammatical information in noun phrases. The corpus of children's speech includes both longitudinal and experimental data (around 18 hours altogether) and is the first corpus of Russian children's speech provided with phonetic annotation. The preliminary analysis of the children's speech shows that at least some reduced variants can be stored in the mental lexicon of a native speaker.
AB - The paper presents two corpora of spontaneous Russian. The aim of the study is to describe the speech signal in a way close to what a listener has to cope with while processing natural speech, and to use the corpora for further computer simulation of spoken word recognition. The corpus of adult speech includes around two hours of recordings with orthographic and acoustic-phonetic transcriptions performed manually by trained phoneticians. Based on this corpus, a word list imitating the mental lexicon of a listener was created, in which each phonetic realization corresponds to all possible variants of its interpretation found in the corpus. The analysis of the adult speech shows how often reduced word forms occur in spontaneous speech and makes it possible to develop and test an algorithm for restoring grammatical information in noun phrases. The corpus of children's speech includes both longitudinal and experimental data (around 18 hours altogether) and is the first corpus of Russian children's speech provided with phonetic annotation. The preliminary analysis of the children's speech shows that at least some reduced variants can be stored in the mental lexicon of a native speaker.
KW - Spontaneous speech
KW - Children's speech
KW - Russian
KW - Phonetic reduction
KW - Speech processing
KW - Corpus linguistics
U2 - 10.1109/CCWC47524.2020.9031251
DO - 10.1109/CCWC47524.2020.9031251
M3 - Conference contribution
SP - 406
EP - 411
BT - 10th Annual Computing and Communication Workshop and Conference (CCWC)
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 6 January 2020 through 8 January 2020
ER -
ID: 72568850