
Recent studies applying generative adversarial networks (GANs) to speech synthesis have shown improvements in the naturalness of synthesized speech compared to conventional approaches. In this article, we present a new GAN framework for training an acoustic model for speech synthesis. The proposed GAN consists of a generator and a pair of agent discriminators: the generator produces acoustic parameters, taking linguistic parameters into account, and the pair of agent discriminators is introduced to improve the naturalness of the synthesized speech. We feed the agents both acoustic and linguistic parameters, so that they examine not only the acoustic distribution but also the relationship between linguistic and acoustic parameters. Training and testing were conducted on the Kazakh speech corpus. The results show that the proposed GAN framework improves the accuracy of the acoustic model for the Kazakh text-to-speech system.
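
The abstract does not give the network details, so the following is only a minimal sketch of the idea that each agent sees linguistic and acoustic parameters together. The names `joint_input`, `make_agent`, and `generator_adversarial_loss` are hypothetical, the single logistic unit stands in for a real discriminator network, and summing a non-saturating loss over the two agents is an assumption rather than the authors' stated objective.

```python
import math
import random


def joint_input(linguistic, acoustic):
    """Concatenate linguistic and acoustic parameters for one frame."""
    return list(linguistic) + list(acoustic)


def make_agent(input_dim, seed):
    """Hypothetical agent: one logistic unit with small random weights."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in range(input_dim)]

    def agent(linguistic, acoustic):
        x = joint_input(linguistic, acoustic)
        score = sum(w * v for w, v in zip(weights, x))
        # Probability that the (linguistic, acoustic) pair looks natural.
        return 1.0 / (1.0 + math.exp(-score))

    return agent


def generator_adversarial_loss(agents, linguistic, generated_acoustic):
    """Non-saturating generator loss, summed over the agents (an assumption)."""
    return sum(-math.log(agent(linguistic, generated_acoustic)) for agent in agents)


linguistic = [1.0, 0.0, 0.5]   # toy linguistic parameters
generated = [0.2, -0.3]        # toy generator output (acoustic parameters)
agents = [make_agent(5, seed=0), make_agent(5, seed=1)]
loss = generator_adversarial_loss(agents, linguistic, generated)
```

Because each agent scores the concatenated vector, its decision can depend on whether the acoustic parameters fit the linguistic context, not only on whether they look plausible in isolation.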

Original language: English
Article number: 3
Pages (from-to): 729-735
Number of pages: 7
Journal: International Journal of Speech Technology
Volume: 24
Issue number: 3
Early online date: 15 Apr 2021
DOI
Status: Published - Sep 2021

    Scopus subject areas

  • Software
  • Language and Linguistics
  • Human-Computer Interaction
  • Linguistics and Language
  • Computer Vision and Pattern Recognition

ID: 76651638