Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review
End-to-end speech recognition systems incorporating deep neural networks (DNNs) have achieved good results. We propose applying Connectionist Temporal Classification (CTC) models and attention-based encoder-decoder models to automatic recognition of continuous Russian speech. In our experiments we used several neural network models, such as long short-term memory (LSTM), bidirectional LSTM, and residual networks. Recognition accuracy was slightly lower than that of hybrid models, but our models can work without a large language model and showed better average decoding speed, which can be helpful in real systems. Experiments were performed on Russian speech with an extra-large vocabulary (more than 150K words).
Original language | English |
---|---|
Title of host publication | Speech and Computer - 20th International Conference, SPECOM 2018, Proceedings |
Editors | Rodmonga Potapova, Oliver Jokisch, Alexey Karpov |
Publisher | Springer Nature |
Pages | 377-386 |
Number of pages | 10 |
ISBN (Print) | 9783319995786 |
State | Published - 1 Sep 2018 |
Event | 20th International Conference on Speech and Computer, Leipzig, Germany. Duration: 18 Sep 2018 → 22 Sep 2018 |

Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 11096 LNAI |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |

Conference | 20th International Conference on Speech and Computer |
---|---|
Abbreviated title | SPECOM 2018 |
Country/Territory | Germany |
City | Leipzig |
Period | 18/09/18 → 22/09/18 |
ID: 36521378