End-to-end speech recognition systems incorporating deep neural networks (DNNs) have achieved good results. We propose applying CTC (Connectionist Temporal Classification) models and attention-based encoder-decoder models to automatic recognition of continuous Russian speech. In our experiments we used different neural network architectures, such as long short-term memory (LSTM), bidirectional LSTM, and residual networks. We obtained recognition accuracy slightly worse than that of hybrid models, but our models can work without a large language model and show better average decoding speed, which can be helpful in real systems. Experiments are performed with an extra-large vocabulary (more than 150K words) of Russian speech.
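
The abstract describes the architecture only at a high level, so the following is a minimal PyTorch sketch of a CTC-trained bidirectional-LSTM acoustic model, not the authors' implementation; the feature dimension, layer sizes, and character-level label set are illustrative assumptions.

```python
import torch
import torch.nn as nn


class BiLSTMCTC(nn.Module):
    """Bidirectional-LSTM acoustic model trained with the CTC objective.

    Illustrative sketch only: the feature dimension, hidden size, and
    label inventory below are assumptions, not values from the paper.
    """

    def __init__(self, num_features=40, hidden_size=320, num_layers=3, num_labels=34):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=num_features,
            hidden_size=hidden_size,
            num_layers=num_layers,
            bidirectional=True,
            batch_first=True,
        )
        # Project concatenated forward/backward states onto the label set
        # (e.g. characters plus the CTC blank symbol at index 0).
        self.proj = nn.Linear(2 * hidden_size, num_labels)

    def forward(self, features):
        # features: (batch, time, num_features)
        outputs, _ = self.lstm(features)
        # nn.CTCLoss expects log-probabilities shaped (time, batch, num_labels).
        return self.proj(outputs).log_softmax(dim=-1).transpose(0, 1)


if __name__ == "__main__":
    model = BiLSTMCTC()
    ctc_loss = nn.CTCLoss(blank=0)

    # Dummy batch: 2 utterances of 100 frames each, targets of length 20 and 15.
    feats = torch.randn(2, 100, 40)
    targets = torch.randint(1, 34, (2, 20))          # label indices exclude the blank (0)
    input_lengths = torch.full((2,), 100, dtype=torch.long)
    target_lengths = torch.tensor([20, 15], dtype=torch.long)

    log_probs = model(feats)
    loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
    loss.backward()
    print(loss.item())
```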

Original language: English
Title of host publication: Speech and Computer - 20th International Conference, SPECOM 2018, Proceedings
Editors: Rodmonga Potapova, Oliver Jokisch, Alexey Karpov
Publisher: Springer Nature
Pages: 377-386
Number of pages: 10
ISBN (Print): 9783319995786
DOIs
State: Published - 1 Sep 2018
Event: 20th International Conference on Speech and Computer - Leipzig, Germany
Duration: 18 Sep 2018 - 22 Sep 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11096 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 20th International Conference on Speech and Computer
Abbreviated title: SPECOM 2018
Country/Territory: Germany
City: Leipzig
Period: 18/09/18 - 22/09/18

    Research areas

  • Deep learning, End-to-end models, Russian speech, Speech recognition

    Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
