End-to-end speech recognition systems incorporating deep neural networks (DNNs) have achieved good results. We propose applying Connectionist Temporal Classification (CTC) models and attention-based encoder-decoder models to automatic recognition of continuous Russian speech. We conducted experiments with several neural network models, including long short-term memory (LSTM), bidirectional LSTM, and residual networks. Recognition accuracy was slightly lower than that of hybrid models, but our models can work without a large language model and achieved a better average decoding speed, which can be helpful in real systems. The experiments were performed with an extra-large vocabulary (more than 150K words) of Russian speech.

Original language: English
Title of host publication: Speech and Computer - 20th International Conference, SPECOM 2018, Proceedings
Editors: Rodmonga Potapova, Oliver Jokisch, Alexey Karpov
Publisher: Springer Nature
Pages: 377-386
Number of pages: 10
ISBN (print): 9783319995786
DOI
Publication status: Published - 1 Sep 2018
Event: 20th International Conference on Speech and Computer - Leipzig, Germany
Duration: 18 Sep 2018 to 22 Sep 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11096 LNAI
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: 20th International Conference on Speech and Computer
Abbreviated title: SPECOM 2018
Country/Territory: Germany
City: Leipzig
Period: 18/09/18 to 22/09/18

Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

ID: 36521378