GAN acoustic model for Kazakh speech synthesis › Research at St. Petersburg State University

Standard

GAN acoustic model for Kazakh speech synthesis. / Kaliyev, Arman; Zeno, Bassel; Rybin, Sergey V.; Matveev, Yuri N.; Lyakso, Elena E.

In: International Journal of Speech Technology, Vol. 24, No. 3, 3, 09.2021, p. 729-735.

Research output: Contribution to journal › Article › peer-review

Harvard

Kaliyev, A, Zeno, B, Rybin, SV, Matveev, YN & Lyakso, EE 2021, 'GAN acoustic model for Kazakh speech synthesis', International Journal of Speech Technology, vol. 24, no. 3, 3, pp. 729-735. https://doi.org/10.1007/s10772-021-09840-0

APA

Kaliyev, A., Zeno, B., Rybin, S. V., Matveev, Y. N., & Lyakso, E. E. (2021). GAN acoustic model for Kazakh speech synthesis. International Journal of Speech Technology, 24(3), 729-735. [3]. https://doi.org/10.1007/s10772-021-09840-0

Vancouver

Kaliyev A, Zeno B, Rybin SV, Matveev YN, Lyakso EE. GAN acoustic model for Kazakh speech synthesis. International Journal of Speech Technology. 2021 Sep.;24(3):729-735. 3. https://doi.org/10.1007/s10772-021-09840-0

Author

Kaliyev, Arman ; Zeno, Bassel ; Rybin, Sergey V. ; Matveev, Yuri N. ; Lyakso, Elena E. / GAN acoustic model for Kazakh speech synthesis. In: International Journal of Speech Technology. 2021 ; Vol. 24, No. 3. pp. 729-735.

BibTeX

@article{dec6650d1b4e4ac0ac617279f00be293,

title = "GAN acoustic model for Kazakh speech synthesis",

abstract = "Recent studies on the application of generative adversarial networks (GAN) for speech synthesis have shown improvements in the naturalness of synthesized speech, compared to the conventional approaches. In this article, we present a new framework of GAN to train an acoustic model for speech synthesis. The proposed GAN consists of a generator and a pair of agent discriminators, where the generator produces acoustic parameters taking into account linguistic parameters; and the pair of agent discriminators are introduced to improve the naturalness of the synthesized speech. We feed the agents with acoustic and linguistic parameters, thereby the agents do not only examine the acoustic distribution, but also the relationship between linguistic and acoustic parameters. Training and testing were conducted on the Kazakh speech corpus. According to the results of this research, the proposed framework of GAN improves the accuracy of the acoustic model for the Kazakh text-to-speech system.",

keywords = "Acoustic model, CAAG-GAN, GAN, Kazakh language, Text-to-speech",

author = "Arman Kaliyev and Bassel Zeno and Rybin, {Sergey V.} and Matveev, {Yuri N.} and Lyakso, {Elena E.}",

note = "Publisher Copyright: {\textcopyright} 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.",

year = "2021",

month = sep,

doi = "10.1007/s10772-021-09840-0",

language = "English",

volume = "24",

pages = "729--735",

journal = "International Journal of Speech Technology",

issn = "1381-2416",

publisher = "Springer Nature",

number = "3",

}

RIS

TY - JOUR

T1 - GAN acoustic model for Kazakh speech synthesis

AU - Kaliyev, Arman

AU - Zeno, Bassel

AU - Rybin, Sergey V.

AU - Matveev, Yuri N.

AU - Lyakso, Elena E.

PY - 2021/9

Y1 - 2021/9

N2 - Recent studies on the application of generative adversarial networks (GAN) for speech synthesis have shown improvements in the naturalness of synthesized speech, compared to the conventional approaches. In this article, we present a new framework of GAN to train an acoustic model for speech synthesis. The proposed GAN consists of a generator and a pair of agent discriminators, where the generator produces acoustic parameters taking into account linguistic parameters; and the pair of agent discriminators are introduced to improve the naturalness of the synthesized speech. We feed the agents with acoustic and linguistic parameters, thereby the agents do not only examine the acoustic distribution, but also the relationship between linguistic and acoustic parameters. Training and testing were conducted on the Kazakh speech corpus. According to the results of this research, the proposed framework of GAN improves the accuracy of the acoustic model for the Kazakh text-to-speech system.

AB - Recent studies on the application of generative adversarial networks (GAN) for speech synthesis have shown improvements in the naturalness of synthesized speech, compared to the conventional approaches. In this article, we present a new framework of GAN to train an acoustic model for speech synthesis. The proposed GAN consists of a generator and a pair of agent discriminators, where the generator produces acoustic parameters taking into account linguistic parameters; and the pair of agent discriminators are introduced to improve the naturalness of the synthesized speech. We feed the agents with acoustic and linguistic parameters, thereby the agents do not only examine the acoustic distribution, but also the relationship between linguistic and acoustic parameters. Training and testing were conducted on the Kazakh speech corpus. According to the results of this research, the proposed framework of GAN improves the accuracy of the acoustic model for the Kazakh text-to-speech system.

KW - Acoustic model

KW - CAAG-GAN

KW - GAN

KW - Kazakh language

KW - Text-to-speech

UR - http://www.scopus.com/inward/record.url?scp=85104745943&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/1f7fdc37-ee0f-3e53-860b-68291efc3dd8/

U2 - 10.1007/s10772-021-09840-0

DO - 10.1007/s10772-021-09840-0

M3 - Article

AN - SCOPUS:85104745943

VL - 24

SP - 729

EP - 735

JO - International Journal of Speech Technology

JF - International Journal of Speech Technology

SN - 1381-2416

IS - 3

M1 - 3

ER -

ID: 76651638