Research output: Contribution to journal › Article › peer-review
Emotion, age, and gender classification in children's speech by humans and machines. / Kaya, Heysem; Salah, Albert Ali; Karpov, Alexey; Frolova, Olga; Grigorev, Aleksey; Lyakso, Elena.
In: Computer Speech and Language, Vol. 46, 01.11.2017, p. 268-283.
TY - JOUR
T1 - Emotion, age, and gender classification in children's speech by humans and machines
AU - Kaya, Heysem
AU - Salah, Albert Ali
AU - Karpov, Alexey
AU - Frolova, Olga
AU - Grigorev, Aleksey
AU - Lyakso, Elena
PY - 2017/11/1
Y1 - 2017/11/1
N2 - In this article, we present the first emotional child speech corpus in Russian, called “EmoChildRu”, collected from children aged 3 to 7 years. The base corpus includes over 20K recordings (approx. 30 h) from 120 children. Audio recordings were carried out in three controlled settings that elicit different emotional states in children: playing with a standard set of toys; repeating words after a toy parrot in a game store setting; and watching a cartoon and retelling its story. The corpus is designed to study how emotional state is reflected in voice and speech characteristics, and to support studies of the formation of emotional states in ontogenesis. A portion of the corpus is annotated for three emotional states (comfort, discomfort, neutral). Additional data include the results of adult listeners’ analysis of the child speech, questionnaires, and annotations for gender and age in months. We also provide several baselines, comparing human and machine performance on this corpus in predicting age, gender, and comfort state. While acoustics-based automatic systems outperform humans in age estimation, they do not reach human perception levels in comfort-state and gender classification. These comparative results indicate the importance of further developing linguistic models for discrimination.
KW - Age recognition
KW - Computational paralinguistics
KW - Emotional child speech
KW - Emotional states
KW - Gender recognition
KW - Perception experiments
KW - Spectrographic analysis
UR - http://www.scopus.com/inward/record.url?scp=85021761471&partnerID=8YFLogxK
U2 - 10.1016/j.csl.2017.06.002
DO - 10.1016/j.csl.2017.06.002
M3 - Article
AN - SCOPUS:85021761471
VL - 46
SP - 268
EP - 283
JO - Computer Speech and Language
JF - Computer Speech and Language
SN - 0885-2308
ER -