Voice Activity Detector (VAD) Based on Long-Term Mel Frequency Band Features

Links

http://link.springer.com/chapter/10.1007/978-3-319-45510-5_40

DOI

https://doi.org/10.1007/978-3-319-45510-5_40

Sergey Salishev
Andrey Barabanov
Daniil Kocharov
Pavel Skrelin
Mikhail Moiseev

We propose a voice activity detector (VAD) that uses long-term (200 ms) Mel frequency band statistics, auditory masking, and a pre-trained classifier based on a two-level decision tree ensemble. This combination captures the syllable-level structure of speech and discriminates it from common noises. On the test dataset, the proposed algorithm demonstrates almost 100% acceptance of clear voice for English, Chinese, Russian, and Polish speech and 100% rejection of stationary noises, independently of loudness. The algorithm is intended to be used as a trigger for automatic speech recognition (ASR). It reuses the short-term FFT analysis (STFFT) from the ASR front end, with an additional 2 KB of memory and a 15% complexity overhead.
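The abstract does not define the features exactly, so the following is only a minimal sketch of what long-term Mel band statistics over a 200 ms window could look like. It assumes per-frame Mel band energies are already available from an existing short-term FFT front end, a 10 ms frame hop, and per-band mean and variance of log energy as the statistics. The function name, parameters, and choice of statistics are all illustrative assumptions, not the authors' method.

```python
import math


def long_term_band_stats(band_energies, frame_hop_ms=10, window_ms=200):
    """Per-band mean and variance of log energy over a sliding long-term window.

    band_energies: list of frames, each a list of non-negative Mel band
    energies (assumed to come from an existing STFT front end).
    Returns one (means, variances) pair per full window position.
    All names and the choice of statistics are hypothetical.
    """
    frames_per_window = max(1, window_ms // frame_hop_ms)
    eps = 1e-10  # avoids log(0) on silent bands
    log_e = [[math.log(e + eps) for e in frame] for frame in band_energies]

    stats = []
    for end in range(frames_per_window, len(log_e) + 1):
        window = log_e[end - frames_per_window:end]
        n_bands = len(window[0])
        means = [
            sum(frame[b] for frame in window) / frames_per_window
            for b in range(n_bands)
        ]
        variances = [
            sum((frame[b] - means[b]) ** 2 for frame in window) / frames_per_window
            for b in range(n_bands)
        ]
        stats.append((means, variances))
    return stats
```

Under these assumptions, a stationary noise yields near-zero log-energy variance in every band, whereas a signal whose band energy is modulated at roughly the syllable rate yields a clearly larger variance. A classifier could use such statistics as input, though the paper's actual feature set and classifier inputs may differ.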

Original language	English
Pages (from-to)	352-358
Journal	Lecture Notes in Computer Science
Volume	9924
DOI	https://doi.org/10.1007/978-3-319-45510-5_40
Status	Published - 2016
Event	International Conference on Text, Speech, and Dialogue 2016 - Brno, Czech Republic. Duration: 12 Apr 2016 → 16 Apr 2016. Conference number: 19. https://www.tsdconference.org/tsd2016/

ID: 7595429
