Emotion, age, and gender classification in children's speech by humans and machines

Heysem Kaya, Albert Ali Salah, Alexey Karpov, Olga Frolova, Aleksey Grigorev, Elena Lyakso

Research output: Scientific publications in periodicals, article, peer-reviewed

12 Citations (Scopus)

Abstract

In this article, we present the first child emotional speech corpus in Russian, called “EmoChildRu”, collected from 3 to 7 years old children. The base corpus includes over 20 K recordings (approx. 30 h), collected from 120 children. Audio recordings are carried out in three controlled settings by creating different emotional states for children: playing with a standard set of toys; repetition of words from a toy-parrot in a game store setting; watching a cartoon and retelling of the story, respectively. This corpus is designed to study the reflection of the emotional state in the characteristics of voice and speech and for studies of the formation of emotional states in ontogenesis. A portion of the corpus is annotated for three emotional states (comfort, discomfort, neutral). Additional data include the results of the adult listeners’ analysis of child speech, questionnaires, as well as annotation for gender and age in months. We also provide several baselines, comparing human and machine estimation on this corpus for prediction of age, gender and comfort state. While in age estimation, the acoustics-based automatic systems show higher performance, they do not reach human perception levels in comfort state and gender classification. The comparative results indicate the importance and necessity of developing further linguistic models for discrimination.

Original language: English
Pages (from-to): 268-283
Number of pages: 16
Journal: Computer Speech and Language
Volume: 46
DOI: 10.1016/j.csl.2017.06.002
Status: Published - 1 Nov 2017

Fingerprint

Audio recordings
Emotional Speech
Linguistics
Human Perception
Acoustics
Questionnaire
Discrimination
Annotation
System Performance
Baseline
High Performance
Speech
Emotion
Children
Gender
Corpus
Human
Game
Prediction
Model

Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Human-Computer Interaction

Cite this

@article{7089413468ea43d1b91434d22428b8d8,
title = "Emotion, age, and gender classification in children's speech by humans and machines",
abstract = "In this article, we present the first child emotional speech corpus in Russian, called “EmoChildRu”, collected from 3 to 7 years old children. The base corpus includes over 20 K recordings (approx. 30 h), collected from 120 children. Audio recordings are carried out in three controlled settings by creating different emotional states for children: playing with a standard set of toys; repetition of words from a toy-parrot in a game store setting; watching a cartoon and retelling of the story, respectively. This corpus is designed to study the reflection of the emotional state in the characteristics of voice and speech and for studies of the formation of emotional states in ontogenesis. A portion of the corpus is annotated for three emotional states (comfort, discomfort, neutral). Additional data include the results of the adult listeners’ analysis of child speech, questionnaires, as well as annotation for gender and age in months. We also provide several baselines, comparing human and machine estimation on this corpus for prediction of age, gender and comfort state. While in age estimation, the acoustics-based automatic systems show higher performance, they do not reach human perception levels in comfort state and gender classification. The comparative results indicate the importance and necessity of developing further linguistic models for discrimination.",
keywords = "Age recognition, Computational paralinguistics, Emotional child speech, Emotional states, Gender recognition, Perception experiments, Spectrographic analysis",
author = "Heysem Kaya and Salah, {Albert Ali} and Alexey Karpov and Olga Frolova and Aleksey Grigorev and Elena Lyakso",
year = "2017",
month = "11",
day = "1",
doi = "10.1016/j.csl.2017.06.002",
language = "English",
volume = "46",
pages = "268--283",
journal = "Computer Speech and Language",
issn = "0885-2308",
publisher = "Elsevier",

}

Emotion, age, and gender classification in children's speech by humans and machines. / Kaya, Heysem; Salah, Albert Ali; Karpov, Alexey; Frolova, Olga; Grigorev, Aleksey; Lyakso, Elena.

In: Computer Speech and Language, Vol. 46, 01.11.2017, p. 268-283.

TY - JOUR

T1 - Emotion, age, and gender classification in children's speech by humans and machines

AU - Kaya, Heysem

AU - Salah, Albert Ali

AU - Karpov, Alexey

AU - Frolova, Olga

AU - Grigorev, Aleksey

AU - Lyakso, Elena

PY - 2017/11/1

Y1 - 2017/11/1

AB - In this article, we present the first child emotional speech corpus in Russian, called “EmoChildRu”, collected from 3 to 7 years old children. The base corpus includes over 20 K recordings (approx. 30 h), collected from 120 children. Audio recordings are carried out in three controlled settings by creating different emotional states for children: playing with a standard set of toys; repetition of words from a toy-parrot in a game store setting; watching a cartoon and retelling of the story, respectively. This corpus is designed to study the reflection of the emotional state in the characteristics of voice and speech and for studies of the formation of emotional states in ontogenesis. A portion of the corpus is annotated for three emotional states (comfort, discomfort, neutral). Additional data include the results of the adult listeners’ analysis of child speech, questionnaires, as well as annotation for gender and age in months. We also provide several baselines, comparing human and machine estimation on this corpus for prediction of age, gender and comfort state. While in age estimation, the acoustics-based automatic systems show higher performance, they do not reach human perception levels in comfort state and gender classification. The comparative results indicate the importance and necessity of developing further linguistic models for discrimination.

KW - Age recognition

KW - Computational paralinguistics

KW - Emotional child speech

KW - Emotional states

KW - Gender recognition

KW - Perception experiments

KW - Spectrographic analysis

UR - http://www.scopus.com/inward/record.url?scp=85021761471&partnerID=8YFLogxK

U2 - 10.1016/j.csl.2017.06.002

DO - 10.1016/j.csl.2017.06.002

M3 - Article

VL - 46

SP - 268

EP - 283

JO - Computer Speech and Language

JF - Computer Speech and Language

SN - 0885-2308

ER -