Russian-Language speech recognition system based on DeepSpeech

Standard

Russian-Language speech recognition system based on DeepSpeech. / Iakushkin, Oleg; Degtyarev, Alexander; Sedova, Olga; Fedoseev, Georgy; Shaleva, Anna.

Distributed Computing and Grid-technologies in Science and Education 2018. Vol. 2267 2018. pp. 470-474 (CEUR Workshop Proceedings).

Research output: Publications in books, reports, collections, conference proceedings › chapter/section › research › peer-reviewed

Harvard

Iakushkin, O, Degtyarev, A, Sedova, O, Fedoseev, G & Shaleva, A 2018, Russian-Language speech recognition system based on DeepSpeech. in Distributed Computing and Grid-technologies in Science and Education 2018. vol. 2267, CEUR Workshop Proceedings, pp. 470-474. <http://ceur-ws.org/Vol-2267/470-474-paper-90.pdf>

APA

Iakushkin, O., Degtyarev, A., Sedova, O., Fedoseev, G., & Shaleva, A. (2018). Russian-Language speech recognition system based on DeepSpeech. In Distributed Computing and Grid-technologies in Science and Education 2018. (Vol. 2267, pp. 470-474). (CEUR Workshop Proceedings). http://ceur-ws.org/Vol-2267/470-474-paper-90.pdf

Vancouver

Iakushkin O, Degtyarev A, Sedova O, Fedoseev G, Shaleva A. Russian-Language speech recognition system based on DeepSpeech. In Distributed Computing and Grid-technologies in Science and Education 2018. Vol. 2267. 2018. p. 470-474. (CEUR Workshop Proceedings).

Author

Iakushkin, Oleg; Degtyarev, Alexander; Sedova, Olga; Fedoseev, Georgy; Shaleva, Anna. / Russian-Language speech recognition system based on DeepSpeech. Distributed Computing and Grid-technologies in Science and Education 2018. Vol. 2267 2018. pp. 470-474 (CEUR Workshop Proceedings).

BibTeX

@inbook{f53c2d4d408c4c13aaef3a898da65f73,

title = "Russian-Language speech recognition system based on DeepSpeech",

abstract = "The paper examines the practical issues in developing a speech-to-text system using deep neural networks. The development of a Russian-language speech recognition system based on DeepSpeech architecture is described. The Mozilla company{\textquoteright}s open source implementation of DeepSpeech for the English language was used as a starting point. The system was trained in a containerized environment using the Docker technology. It allowed to describe the entire process of component assembly from the source code, including a number of optimization techniques for CPU and GPU. Docker also allows to easily reproduce computation optimization tests on alternative infrastructures. We examined the use of TensorFlow XLA technology that optimizes linear algebra computations in the course of neural network training. The number of nodes in the internal layers of neural network was optimized based on the word error rate (WER) obtained on a test data set, having regard to GPU memory limitations. We studied the use of probabilistic language models with various maximum lengths of word sequences and selected the model that shows the best WER. Our study resulted in a Russian-language acoustic model having been trained based on a data set comprising audio and subtitles from YouTube video clips. The language model was built based on the texts of subtitles and publicly available Russian-language corpus of Wikipedia{\textquoteright}s popular articles. The resulting system was tested on a data set consisting of audio recordings of Russian literature available on voxforge.com; the best WER demonstrated by the system was 18\%.",

author = "Oleg Iakushkin and Alexander Degtyarev and Olga Sedova and Georgy Fedoseev and Anna Shaleva",

year = "2018",

month = dec,

day = "30",

language = "English",

volume = "2267",

series = "CEUR Workshop Proceedings",

publisher = "RWTH Aachen University",

pages = "470--474",

booktitle = "Distributed Computing and Grid-technologies in Science and Education 2018.",

}

RIS

TY - CHAP

T1 - Russian-Language speech recognition system based on DeepSpeech

AU - Iakushkin, Oleg

AU - Degtyarev, Alexander

AU - Sedova, Olga

AU - Fedoseev, Georgy

AU - Shaleva, Anna

PY - 2018/12/30

Y1 - 2018/12/30

N2 - The paper examines the practical issues in developing a speech-to-text system using deep neural networks. The development of a Russian-language speech recognition system based on DeepSpeech architecture is described. The Mozilla company’s open source implementation of DeepSpeech for the English language was used as a starting point. The system was trained in a containerized environment using the Docker technology. It allowed to describe the entire process of component assembly from the source code, including a number of optimization techniques for CPU and GPU. Docker also allows to easily reproduce computation optimization tests on alternative infrastructures. We examined the use of TensorFlow XLA technology that optimizes linear algebra computations in the course of neural network training. The number of nodes in the internal layers of neural network was optimized based on the word error rate (WER) obtained on a test data set, having regard to GPU memory limitations. We studied the use of probabilistic language models with various maximum lengths of word sequences and selected the model that shows the best WER. Our study resulted in a Russian-language acoustic model having been trained based on a data set comprising audio and subtitles from YouTube video clips. The language model was built based on the texts of subtitles and publicly available Russian-language corpus of Wikipedia’s popular articles. The resulting system was tested on a data set consisting of audio recordings of Russian literature available on voxforge.com; the best WER demonstrated by the system was 18%.

M3 - Chapter

VL - 2267

T3 - CEUR Workshop Proceedings

SP - 470

EP - 474

BT - Distributed Computing and Grid-technologies in Science and Education 2018.

ER -

ID: 36505048