Research output: Contribution to journal › Article › peer-review
GAN acoustic model for Kazakh speech synthesis. / Kaliyev, Arman; Zeno, Bassel; Rybin, Sergey V.; Matveev, Yuri N.; Lyakso, Elena E.
In: International Journal of Speech Technology, Vol. 24, No. 3, 3, 09.2021, p. 729-735.
TY - JOUR
T1 - GAN acoustic model for Kazakh speech synthesis
AU - Kaliyev, Arman
AU - Zeno, Bassel
AU - Rybin, Sergey V.
AU - Matveev, Yuri N.
AU - Lyakso, Elena E.
N1 - Publisher Copyright: © 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2021/9
Y1 - 2021/9
N2 - Recent studies on applying generative adversarial networks (GANs) to speech synthesis have shown improvements in the naturalness of synthesized speech compared to conventional approaches. In this article, we present a new GAN framework for training an acoustic model for speech synthesis. The proposed GAN consists of a generator, which produces acoustic parameters from linguistic parameters, and a pair of agent discriminators, which are introduced to improve the naturalness of the synthesized speech. We feed the agents both acoustic and linguistic parameters, so that they examine not only the acoustic distribution but also the relationship between linguistic and acoustic parameters. Training and testing were conducted on the Kazakh speech corpus. The results show that the proposed GAN framework improves the accuracy of the acoustic model for the Kazakh text-to-speech system.
AB - Recent studies on applying generative adversarial networks (GANs) to speech synthesis have shown improvements in the naturalness of synthesized speech compared to conventional approaches. In this article, we present a new GAN framework for training an acoustic model for speech synthesis. The proposed GAN consists of a generator, which produces acoustic parameters from linguistic parameters, and a pair of agent discriminators, which are introduced to improve the naturalness of the synthesized speech. We feed the agents both acoustic and linguistic parameters, so that they examine not only the acoustic distribution but also the relationship between linguistic and acoustic parameters. Training and testing were conducted on the Kazakh speech corpus. The results show that the proposed GAN framework improves the accuracy of the acoustic model for the Kazakh text-to-speech system.
KW - Acoustic model
KW - CAAG-GAN
KW - GAN
KW - Kazakh language
KW - Text-to-speech
UR - http://www.scopus.com/inward/record.url?scp=85104745943&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/1f7fdc37-ee0f-3e53-860b-68291efc3dd8/
U2 - 10.1007/s10772-021-09840-0
DO - 10.1007/s10772-021-09840-0
M3 - Article
AN - SCOPUS:85104745943
VL - 24
SP - 729
EP - 735
JO - International Journal of Speech Technology
JF - International Journal of Speech Technology
SN - 1381-2416
IS - 3
M1 - 3
ER -
ID: 76651638
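The abstract says the discriminators ("agents") receive linguistic and acoustic parameters together. This lets them judge the acoustic output and also whether it fits its linguistic input. The sketch below illustrates only that general conditional-discriminator idea. It is NOT the authors' CAAG-GAN. It assumes hypothetical pieces: `ConditionalDiscriminator` is a toy linear scorer with a sigmoid, and the losses are the standard binary cross-entropy discriminator loss and the non-saturating generator loss, summed over the agents. This record does not describe how the two agents differ, so here they are simply two independently initialized scorers.

```python
import math
import random
from typing import List, Sequence

EPS = 1e-12  # guards log(0)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class ConditionalDiscriminator:
    """Hypothetical toy agent: scores the joint (linguistic, acoustic) vector."""

    def __init__(self, dim: int, seed: int) -> None:
        rng = random.Random(seed)
        self.w: List[float] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        self.b = 0.0

    def prob_real(self, linguistic: Sequence[float], acoustic: Sequence[float]) -> float:
        # Concatenating the condition with the sample lets the score depend on
        # the linguistic/acoustic relationship, not on acoustics alone.
        x = list(linguistic) + list(acoustic)
        if len(x) != len(self.w):
            raise ValueError("feature size mismatch")
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)


def discriminator_loss(d: ConditionalDiscriminator, linguistic: Sequence[float],
                       real_acoustic: Sequence[float],
                       fake_acoustic: Sequence[float]) -> float:
    # Binary cross-entropy: push real pairs toward 1 and generated pairs toward 0.
    p_real = d.prob_real(linguistic, real_acoustic)
    p_fake = d.prob_real(linguistic, fake_acoustic)
    return -(math.log(p_real + EPS) + math.log(1.0 - p_fake + EPS))


def generator_adv_loss(agents: Sequence[ConditionalDiscriminator],
                       linguistic: Sequence[float],
                       fake_acoustic: Sequence[float]) -> float:
    # Non-saturating generator loss, summed over every agent (an assumption).
    return sum(-math.log(d.prob_real(linguistic, fake_acoustic) + EPS) for d in agents)


if __name__ == "__main__":
    ling = [1.0, 0.0, 0.5]  # made-up linguistic features
    real = [0.2, -0.1]      # made-up natural acoustic frame
    fake = [0.0, 0.0]       # made-up generated acoustic frame
    agents = [ConditionalDiscriminator(5, seed=0), ConditionalDiscriminator(5, seed=1)]
    print(generator_adv_loss(agents, ling, fake))
    print(discriminator_loss(agents[0], ling, real, fake))
```

In a real system, each agent would be a neural network. The generator's total loss would typically combine this adversarial term with a regression loss against natural acoustic parameters. This sketch only shows how the conditioning enters the scores.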