DOI

The paper presents two corpora of spontaneous Russian. The aim of the study is to describe the speech signal in a way close to what a listener has to cope with when processing natural speech, and to use the corpora for subsequent computer simulation of spoken word recognition. The corpus of adult speech includes around two hours of recordings with orthographic and acoustic-phonetic transcriptions made manually by trained phoneticians. Based on this corpus, a word list imitating a listener's mental lexicon was created, in which each phonetic realization is linked to all of its possible interpretations found in the corpus. The analysis of the adult speech shows how often reduced word forms occur in spontaneous speech and makes it possible to develop and test an algorithm for restoring grammatical information in noun phrases. The corpus of children's speech includes both longitudinal and experimental data (around 18 hours altogether) and is the first corpus of Russian children's speech provided with phonetic annotation. The preliminary analysis of the children's speech shows that at least some reduced variants can be stored in the mental lexicon of a native speaker.
Original language: English
Title of host publication: 10th Annual Computing and Communication Workshop and Conference (CCWC)
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 0406-0411
ISBN (electronic): 978-172813783-4
Status: Published - 12 Mar 2020
Event: 10th Annual Computing and Communication Workshop and Conference - University of Nevada, Las Vegas, United States
Duration: 6 Jan 2020 to 8 Jan 2020
Conference number: 10
http://ieee-ccwc.org/

Conference

Conference: 10th Annual Computing and Communication Workshop and Conference
Abbreviated title: IEEE CCWC
Country/Territory: United States
City: Las Vegas
Period: 6/01/20 to 8/01/20
Website

ID: 72568850