Voice Activity Detector (VAD) Based on Long-Term Mel Frequency Band Features

Links

http://link.springer.com/chapter/10.1007/978-3-319-45510-5_40

DOI

https://doi.org/10.1007/978-3-319-45510-5_40

Sergey Salishev
Andrey Barabanov
Daniil Kocharov
Pavel Skrelin
Mikhail Moiseev

We propose a VAD that uses long-term (200 ms) Mel frequency band statistics, auditory masking, and a pre-trained two-level decision tree ensemble classifier. This combination captures the syllable-level structure of speech and discriminates it from common noises. On the test dataset, the proposed algorithm demonstrates almost 100 % acceptance of clear voice for English, Chinese, Russian, and Polish speech and 100 % rejection of stationary noises, independently of loudness. The algorithm is intended as a trigger for ASR. It reuses the short-term FFT analysis (STFFT) of the ASR frontend, at the cost of an additional 2 KB of memory and a 15 % complexity overhead.
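
The abstract does not specify the band layout, the statistics used, or the masking model, so the following is only a minimal sketch of the general idea of long-term Mel band features, not the authors' implementation. It assumes power spectra are already available from a short-term FFT, a 10 ms frame hop (so that 20 frames span roughly 200 ms), 8 triangular Mel bands, and per-band mean and variance as the long-term statistics. All function names and parameter values are hypothetical.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_bands, n_bins, sample_rate):
    """Triangular filters over n_bins FFT bins covering 0..sample_rate/2."""
    nyquist = sample_rate / 2.0
    max_mel = hz_to_mel(nyquist)
    # n_bands + 2 edges, equally spaced in Mel, as fractional bin positions.
    edges = [mel_to_hz(max_mel * i / (n_bands + 1)) / nyquist * (n_bins - 1)
             for i in range(n_bands + 2)]
    bank = []
    for b in range(n_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        weights = []
        for k in range(n_bins):
            if lo < k <= mid:
                weights.append((k - lo) / (mid - lo))
            elif mid < k < hi:
                weights.append((hi - k) / (hi - mid))
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank

def log_band_energies(power_frame, bank, floor=1e-10):
    """Log energy per Mel band for one power-spectrum frame."""
    return [math.log(max(sum(w * p for w, p in zip(weights, power_frame)), floor))
            for weights in bank]

def long_term_stats(band_frames, window):
    """Per-band mean and variance over each sliding window of `window` frames."""
    stats = []
    for end in range(window, len(band_frames) + 1):
        chunk = band_frames[end - window:end]
        n_bands = len(chunk[0])
        means = [sum(f[b] for f in chunk) / window for b in range(n_bands)]
        variances = [sum((f[b] - means[b]) ** 2 for f in chunk) / window
                     for b in range(n_bands)]
        stats.append(means + variances)  # one feature vector per window
    return stats
```

Under these assumptions, a stationary input yields near-zero variance in every band, while syllable-rate energy modulation in speech would raise it. Feature vectors of this kind are what a tree-based classifier could then be trained on.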

Original language	English
Pages (from-to)	352-358
Journal	Lecture Notes in Computer Science
Volume	9924
DOIs	https://doi.org/10.1007/978-3-319-45510-5_40
State	Published - 2016
Event	International Conference on Text, Speech, and Dialogue 2016, Brno, Czech Republic. Duration: 12 Apr 2016 → 16 Apr 2016. Conference number: 19. https://www.tsdconference.org/tsd2016/

Research areas

Voice Activity Detector, Classification, Decision tree ensemble, Auditory masking

ID: 7595429