Research output: Publications in books, reports, collections, and conference proceedings › Chapter/section › Peer-reviewed
Russian-Language speech recognition system based on DeepSpeech. / Iakushkin, Oleg; Degtyarev, Alexander; Sedova, Olga; Fedoseev, Georgy; Shaleva, Anna.
Distributed Computing and Grid-technologies in Science and Education 2018. Vol. 2267 2018. pp. 470-474 (CEUR Workshop Proceedings).
TY - CHAP
T1 - Russian-Language speech recognition system based on DeepSpeech
AU - Iakushkin, Oleg
AU - Degtyarev, Alexander
AU - Sedova, Olga
AU - Fedoseev, Georgy
AU - Shaleva, Anna
PY - 2018/12/30
Y1 - 2018/12/30
N2 - The paper examines the practical issues in developing a speech-to-text system using deep neural networks. The development of a Russian-language speech recognition system based on the DeepSpeech architecture is described. Mozilla's open-source implementation of DeepSpeech for the English language was used as a starting point. The system was trained in a containerized environment using Docker. This made it possible to describe the entire process of assembling the components from source code, including a number of optimization techniques for CPU and GPU. Docker also makes it easy to reproduce the computation optimization tests on alternative infrastructures. We examined the use of TensorFlow XLA, a technology that optimizes linear algebra computations during neural network training. The number of nodes in the internal layers of the neural network was optimized based on the word error rate (WER) obtained on a test data set, taking GPU memory limitations into account. We studied probabilistic language models with various maximum word-sequence lengths and selected the model that yields the best WER. As a result, a Russian-language acoustic model was trained on a data set comprising audio and subtitles from YouTube video clips. The language model was built from the subtitle texts and a publicly available Russian-language corpus of popular Wikipedia articles. The resulting system was tested on a data set of audio recordings of Russian literature available on voxforge.com; the best WER achieved by the system was 18%.
AB - The paper examines the practical issues in developing a speech-to-text system using deep neural networks. The development of a Russian-language speech recognition system based on the DeepSpeech architecture is described. Mozilla's open-source implementation of DeepSpeech for the English language was used as a starting point. The system was trained in a containerized environment using Docker. This made it possible to describe the entire process of assembling the components from source code, including a number of optimization techniques for CPU and GPU. Docker also makes it easy to reproduce the computation optimization tests on alternative infrastructures. We examined the use of TensorFlow XLA, a technology that optimizes linear algebra computations during neural network training. The number of nodes in the internal layers of the neural network was optimized based on the word error rate (WER) obtained on a test data set, taking GPU memory limitations into account. We studied probabilistic language models with various maximum word-sequence lengths and selected the model that yields the best WER. As a result, a Russian-language acoustic model was trained on a data set comprising audio and subtitles from YouTube video clips. The language model was built from the subtitle texts and a publicly available Russian-language corpus of popular Wikipedia articles. The resulting system was tested on a data set of audio recordings of Russian literature available on voxforge.com; the best WER achieved by the system was 18%.
M3 - Chapter
VL - 2267
T3 - CEUR Workshop Proceedings
SP - 470
EP - 474
BT - Distributed Computing and Grid-technologies in Science and Education 2018.
ER -
ID: 36505048