The paper examines the practical issues in developing a speech-to-text system using deep neural networks. The development of a Russian-language speech recognition system based on the DeepSpeech architecture is described. Mozilla's open-source implementation of DeepSpeech for the English language was used as a starting point. The system was trained in a containerized environment using Docker. This made it possible to describe the entire process of building the components from source code, including a number of optimization techniques for CPU and GPU. Docker also makes it easy to reproduce the computation optimization tests on alternative infrastructures. We examined the use of TensorFlow XLA, which optimizes linear algebra computations during neural network training. The number of nodes in the internal layers of the neural network was optimized based on the word error rate (WER) obtained on a test data set, subject to GPU memory limitations. We studied the use of probabilistic language models with various maximum word-sequence lengths and selected the model that showed the best WER. Our study resulted in a Russian-language acoustic model trained on a data set comprising audio and subtitles from YouTube video clips. The language model was built from the subtitle texts and a publicly available Russian-language corpus of popular Wikipedia articles. The resulting system was tested on a data set consisting of audio recordings of Russian literature available on voxforge.com; the best WER demonstrated by the system was 18%.
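The abstract reports results as word error rate (WER) but does not say how the metric was computed. As a minimal sketch, assuming the standard definition (word-level edit distance divided by the number of reference words), a per-utterance WER could be calculated as follows. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Per-utterance WER: (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace and compared exactly;
    the paper does not describe its text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For a whole test set, WER is usually reported as total edits divided by total reference words rather than as the mean of per-utterance values; the sketch above covers only the single-utterance case.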
Original language: English
Title of host publication: Distributed Computing and Grid-technologies in Science and Education 2018
Pages: 470-474
Volume: 2267
Status: Published - 30 Dec 2018

Publication series

Name: CEUR Workshop Proceedings
Publisher: RWTH Aachen University
ISSN (Print): 1613-0073

ID: 36505048