Automatic Detection of Backchannels in Russian Dialogue Speech

Research output: Publications in books, reports, collections and conference proceedings; article in conference proceedings


This paper deals with the acoustic properties of backchannels, i.e., those turns within a dialogue that do not convey information but signal that the speaker is listening to his/her interlocutor (uh-huh, hm, etc.). The research is based on SibLing, a Russian corpus of dialogue speech, part of which (339 min of speech) was manually segmented into backchannels and non-backchannels. A number of acoustic parameters were then calculated: duration, intensity, fundamental frequency, and pause duration. Our data show that in Russian speech backchannels are shorter and have lower loudness and pitch than non-backchannels. Two classifiers were then tested: CART and SVM. The best performance was achieved with SVM (F1 = 0.651) using the following feature set: duration, maximum fundamental frequency, and melodic slope. The most valuable feature was duration.
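The abstract names the best-performing features but not how they were extracted. Below is a minimal sketch in Python that computes them for one turn and scores predictions with F1. It rests on several assumptions. The input is a per-frame F0 contour at a fixed frame step, with `None` marking unvoiced frames. "Melodic slope" is taken to be the least-squares linear slope of F0 over time in Hz/s, and the paper's exact definition may differ. F1 is computed for the backchannel class. `TurnFeatures`, `turn_features` and `f1_score` are hypothetical helpers, not the authors' code, and the CART/SVM classifiers themselves are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TurnFeatures:
    duration_s: float         # turn duration in seconds
    f0_max_hz: float          # maximum F0 over voiced frames
    f0_slope_hz_per_s: float  # least-squares slope of F0 over time (assumed definition)


def turn_features(f0_hz: Sequence[Optional[float]], frame_step_s: float) -> TurnFeatures:
    """Compute duration, maximum F0 and melodic slope for one turn.

    Hypothetical input format: one F0 value per analysis frame, with None
    for unvoiced frames (e.g. the output of some pitch tracker).
    """
    duration = len(f0_hz) * frame_step_s
    # Keep (time, F0) pairs for voiced frames only.
    voiced = [(i * frame_step_s, f) for i, f in enumerate(f0_hz) if f is not None]
    if not voiced:
        return TurnFeatures(duration, 0.0, 0.0)
    f0_max = max(f for _, f in voiced)
    if len(voiced) < 2:
        return TurnFeatures(duration, f0_max, 0.0)
    # Ordinary least-squares fit F0 = a + b * t; b is taken as the melodic slope.
    n = len(voiced)
    mean_t = sum(t for t, _ in voiced) / n
    mean_f = sum(f for _, f in voiced) / n
    cov = sum((t - mean_t) * (f - mean_f) for t, f in voiced)
    var = sum((t - mean_t) ** 2 for t, _ in voiced)
    slope = cov / var if var > 0 else 0.0
    return TurnFeatures(duration, f0_max, slope)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive class (assumed here: 1 = backchannel)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, `turn_features([200.0, 190.0, None, 170.0], 0.01)` gives a duration of 0.04 s, a maximum F0 of 200 Hz and a falling slope of about -1000 Hz/s. A classifier such as an SVM would then be trained on vectors of these three values per turn.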
Original language: English
Title of host publication: Speech and Computer
Subtitle of host publication: 22nd International Conference, SPECOM 2020, St. Petersburg, Russia, October 7–9, 2020, Proceedings
Editors: Alexey Karpov, Rodmonga Potapova
Place of publication: Cham
Publisher: Springer Nature
ISBN (electronic): 978-3-030-60276-5
ISBN (print): 978-3-030-60275-8
Status: Published, 2020
Event: 22nd International Conference on Speech and Computer, St. Petersburg, Russian Federation
Duration: 7 Oct 2020 to 9 Oct 2020

Publication series

Name: Lecture Notes in Computer Science
ISSN (print): 0302-9743


Conference: 22nd International Conference on Speech and Computer
Abbreviated title: SPECOM 2020
Country: Russian Federation
City: St. Petersburg

Keywords

  • dialogue speech
  • backchannel
  • turn-taking
  • speech acoustics
  • Russian
