Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
End-to-End Speech Recognition in Russian. / Markovnikov, Nikita; Kipyatkova, Irina; Lyakso, Elena.
Speech and Computer - 20th International Conference, SPECOM 2018, Proceedings. ed. / Rodmonga Potapova; Oliver Jokisch; Alexey Karpov. Springer Nature, 2018. p. 377-386 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11096 LNAI).
TY - GEN
T1 - End-to-End Speech Recognition in Russian
AU - Markovnikov, Nikita
AU - Kipyatkova, Irina
AU - Lyakso, Elena
PY - 2018/9/1
Y1 - 2018/9/1
N2 - End-to-end speech recognition systems incorporating deep neural networks (DNNs) have achieved good results. We propose applying CTC (Connectionist Temporal Classification) models and an attention-based encoder-decoder model to the automatic recognition of continuous Russian speech. In our experiments we used different neural network models, such as long short-term memory (LSTM), bidirectional LSTM and residual networks. Our models achieved slightly lower recognition accuracy than hybrid models, but they can work without a large language model and showed a higher average decoding speed, which can be helpful in real-world systems. Experiments were performed with an extra-large vocabulary (more than 150K words) of Russian speech.
AB - End-to-end speech recognition systems incorporating deep neural networks (DNNs) have achieved good results. We propose applying CTC (Connectionist Temporal Classification) models and an attention-based encoder-decoder model to the automatic recognition of continuous Russian speech. In our experiments we used different neural network models, such as long short-term memory (LSTM), bidirectional LSTM and residual networks. Our models achieved slightly lower recognition accuracy than hybrid models, but they can work without a large language model and showed a higher average decoding speed, which can be helpful in real-world systems. Experiments were performed with an extra-large vocabulary (more than 150K words) of Russian speech.
KW - Deep learning
KW - End-to-end models
KW - Russian speech
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=85053774772&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-99579-3_40
DO - 10.1007/978-3-319-99579-3_40
M3 - Conference contribution
AN - SCOPUS:85053774772
SN - 9783319995786
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 377
EP - 386
BT - Speech and Computer - 20th International Conference, SPECOM 2018, Proceedings
A2 - Potapova, Rodmonga
A2 - Jokisch, Oliver
A2 - Karpov, Alexey
PB - Springer Nature
T2 - 20th International Conference on Speech and Computer
Y2 - 18 September 2018 through 22 September 2018
ER -
ID: 36521378