Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
The sound database formation for the allophone-based model for English concatenative speech synthesis. / Evgrafova, Karina.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2005. p. 219-225 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3658 LNAI).
TY - GEN
T1 - The sound database formation for the allophone-based model for English concatenative speech synthesis
AU - Evgrafova, Karina
PY - 2005/12/1
Y1 - 2005/12/1
AB - The goal of this paper is to describe the development of the sound database for the allophone-based model for English concatenative speech synthesis. The procedure for constructing the sound unit inventory is described and its main results are presented. At present, the optimized sound unit inventory of the allophonic database for English concatenative speech synthesis contains 1200 elements (1000 vowel allophones and 200 consonant allophones). The smoothness of the junctions between the allophones indicates the high quality of the segmentation. The reduction in the number of database components as a result of optimization does not affect the quality of the resulting synthesized speech. At the segmental level, this quality can be evaluated as fairly high in terms of both naturalness and intelligibility.
UR - http://www.scopus.com/inward/record.url?scp=33646061350&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:33646061350
SN - 3540287892
SN - 9783540287896
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 219
EP - 225
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
T2 - 8th International Conference on Text, Speech and Dialogue, TSD 2005
Y2 - 12 September 2005 through 15 September 2005
ER -
ID: 41279942