The paper presents two corpora of spontaneous Russian speech. The study aims to describe the speech signal in a form close to what a listener has to cope with when processing natural speech, and to use the corpora for subsequent computer simulation of spoken word recognition. The corpus of adult speech comprises around two hours of recordings with orthographic and acoustic-phonetic transcriptions made manually by trained phoneticians. From this corpus, a word list was created that imitates a listener's mental lexicon: each phonetic realization is linked to all of its possible interpretations found in the corpus. The analysis of the adult speech shows how often reduced word forms occur in spontaneous speech and makes it possible to develop and test an algorithm for restoring grammatical information in noun phrases. The corpus of children's speech includes both longitudinal and experimental data (around 18 hours altogether) and is the first corpus of Russian children's speech provided with phonetic annotation. A preliminary analysis of the children's speech shows that at least some reduced variants can be stored in the mental lexicon of a native speaker.
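The word list described in the abstract, in which each phonetic realization points to every interpretation attested for it, can be pictured as an inverted index. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `build_lexicon`, the input format of (orthographic word, phonetic realization) pairs, and the toy tokens with their simplified ad hoc transcriptions are all hypothetical and are not taken from the corpus.

```python
from collections import defaultdict


def build_lexicon(tokens):
    """Map each phonetic realization to the words it realizes, with counts.

    `tokens` is an iterable of (orthographic_word, phonetic_realization)
    pairs, one pair per word token in a transcribed corpus.
    """
    lexicon = defaultdict(lambda: defaultdict(int))
    for word, realization in tokens:
        lexicon[realization][word] += 1
    # Convert nested defaultdicts to plain dicts for predictable lookups.
    return {r: dict(words) for r, words in lexicon.items()}


# Hypothetical toy data (NOT from the corpus). Final devoicing makes
# "код" and "кот" sound alike, so one realization has two readings;
# "сейчас" appears in a full and a reduced form.
tokens = [
    ("кот", "kot"),
    ("код", "kot"),
    ("кот", "kot"),
    ("сейчас", "sijchas"),
    ("сейчас", "shas"),
]

lexicon = build_lexicon(tokens)

# All interpretations attested for one realization, with token counts.
candidates = lexicon["kot"]
```

Looking up a realization returns every word it was observed to stand for, which is the kind of ambiguity a simulated listener would have to resolve from context. Keeping token counts alongside each interpretation is a design choice of this sketch; the abstract does not say whether the original word list stores frequencies.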
Original language: English
Title of host publication: 10th Annual Computing and Communication Workshop and Conference (CCWC)
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 0406-0411
ISBN (Electronic): 978-172813783-4
State: Published - 12 Mar 2020
Event: 10th Annual Computing and Communication Workshop and Conference - University of Nevada, Las Vegas, United States
Duration: 6 Jan 2020 to 8 Jan 2020
Conference number: 10
http://ieee-ccwc.org/

Conference

Conference: 10th Annual Computing and Communication Workshop and Conference
Abbreviated title: IEEE CCWC
Country/Territory: United States
City: Las Vegas
Period: 6/01/20 to 8/01/20

    Research areas

  • Spontaneous speech, Children's Speech, Russian, Phonetic Reduction, Speech Processing, corpus linguistics

ID: 72568850