End-to-end speech recognition systems incorporating deep neural networks (DNNs) have achieved good results. We propose applying CTC (Connectionist Temporal Classification) models and attention-based encoder-decoder models to automatic recognition of continuous Russian speech. In our experiments we used different neural network architectures, such as long short-term memory (LSTM), bidirectional LSTM, and residual networks. We obtained recognition accuracy slightly worse than that of hybrid models, but our models can work without a large language model and show better average decoding speed, which can be helpful in real systems. Experiments are performed with an extra-large vocabulary (more than 150K words) of Russian speech.
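
The abstract describes the architecture only at a high level, so the following is a minimal PyTorch sketch of a CTC-trained bidirectional-LSTM acoustic model, not the authors' implementation; the feature dimension, layer sizes, and character-level label set are illustrative assumptions.

```python
import torch
import torch.nn as nn


class BiLSTMCTC(nn.Module):
    """Bidirectional-LSTM acoustic model trained with the CTC objective.

    Illustrative sketch only: the feature dimension, hidden size, and
    label inventory below are assumptions, not values from the paper.
    """

    def __init__(self, num_features=40, hidden_size=320, num_layers=3, num_labels=34):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=num_features,
            hidden_size=hidden_size,
            num_layers=num_layers,
            bidirectional=True,
            batch_first=True,
        )
        # Project concatenated forward/backward states onto the label set
        # (e.g. characters plus the CTC blank symbol at index 0).
        self.proj = nn.Linear(2 * hidden_size, num_labels)

    def forward(self, features):
        # features: (batch, time, num_features)
        outputs, _ = self.lstm(features)
        # nn.CTCLoss expects log-probabilities shaped (time, batch, num_labels).
        return self.proj(outputs).log_softmax(dim=-1).transpose(0, 1)


if __name__ == "__main__":
    model = BiLSTMCTC()
    ctc_loss = nn.CTCLoss(blank=0)

    # Dummy batch: 2 utterances of 100 frames each, targets of length 20 and 15.
    feats = torch.randn(2, 100, 40)
    targets = torch.randint(1, 34, (2, 20))          # label indices exclude the blank (0)
    input_lengths = torch.full((2,), 100, dtype=torch.long)
    target_lengths = torch.tensor([20, 15], dtype=torch.long)

    log_probs = model(feats)
    loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
    loss.backward()
    print(loss.item())
```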

Original language: English
Title of host publication: Speech and Computer - 20th International Conference, SPECOM 2018, Proceedings
Editors: Rodmonga Potapova, Oliver Jokisch, Alexey Karpov
Publisher: Springer Nature
Pages: 377-386
Number of pages: 10
ISBN (Print): 9783319995786
DOIs
State: Published - 1 Sep 2018
Event: 20th International Conference on Speech and Computer - Leipzig, Germany
Duration: 18 Sep 2018 - 22 Sep 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11096 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 20th International Conference on Speech and Computer
Abbreviated title: SPECOM 2018
Country/Territory: Germany
City: Leipzig
Period: 18/09/18 - 22/09/18

    Research areas

  • Deep learning, End-to-end models, Russian speech, Speech recognition

    Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
