Research output: Contribution to journal › Article › Peer-reviewed
Voice Activity Detector (VAD) Based on Long-Term Mel Frequency Band Features. / Salishev, Sergey; Barabanov, Andrey; Kocharov, Daniil; Skrelin, Pavel; Moiseev, Mikhail.
In: Lecture Notes in Computer Science, Vol. 9924, 2016, pp. 352-358.
TY - JOUR
T1 - Voice Activity Detector (VAD) Based on Long-Term Mel Frequency Band Features
AU - Salishev, Sergey
AU - Barabanov, Andrey
AU - Kocharov, Daniil
AU - Skrelin, Pavel
AU - Moiseev, Mikhail
N1 - Conference code: 19
PY - 2016
Y1 - 2016
AB - We propose a VAD that uses long-term (200 ms) Mel frequency band statistics, auditory masking, and a pre-trained two-level classifier based on a decision tree ensemble. This combination captures the syllable-level structure of speech and discriminates it from common noises. On the test dataset, the proposed algorithm achieves almost 100 % acceptance of clean speech in English, Chinese, Russian, and Polish, and 100 % rejection of stationary noises regardless of loudness. The algorithm is intended to serve as a trigger for ASR. It reuses the short-term FFT (STFFT) analysis of the ASR front end, adding 2 KB of memory and a 15 % computational complexity overhead.
KW - Voice Activity Detector
KW - Classification
KW - Decision tree ensemble
KW - Auditory masking
U2 - 10.1007/978-3-319-45510-5_40
DO - 10.1007/978-3-319-45510-5_40
M3 - Article
VL - 9924
SP - 352
EP - 358
JO - Lecture Notes in Computer Science
JF - Lecture Notes in Computer Science
SN - 0302-9743
T2 - International Conference on Text, Speech, and Dialogue 2016
Y2 - 12 September 2016 through 16 September 2016
ER -
ID: 7595429
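The abstract describes the VAD front end only at a high level. The sketch below is an illustration under assumptions that are not taken from the paper. It shows one way to compute long-term (about 200 ms) statistics of Mel band energies on top of an existing STFFT. Assumed parameters: 16 kHz audio, a 256-point FFT with a 10 ms hop (so 200 ms is 20 frames), and 16 triangular bands on the HTK-style Mel scale. The "statistics" are taken here as the per-band mean and standard deviation of log energy. The names `mel_filterbank` and `LongTermMelStats` are hypothetical. Auditory masking and the decision tree ensemble classifier are deliberately omitted.

```python
import math
from collections import deque


def hz_to_mel(f):
    # HTK-style Mel scale (an assumption; the paper's exact scale is not stated here)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_bands, n_fft, sample_rate):
    """Hypothetical helper: triangular Mel filters over the n_fft//2 + 1 STFFT bins."""
    n_bins = n_fft // 2 + 1
    mel_lo, mel_hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    # n_bands + 2 equally spaced points on the Mel axis give each band a lo/mid/hi edge
    edges = [mel_to_hz(mel_lo + (mel_hi - mel_lo) * i / (n_bands + 1))
             for i in range(n_bands + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for b in range(n_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        weights = []
        for f in bin_hz:
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))  # rising slope
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))  # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


class LongTermMelStats:
    """Hypothetical sketch: sliding-window mean/std of log Mel band energies."""

    def __init__(self, bank, window_frames):
        self.bank = bank
        # Only the last window_frames frames are kept (e.g. 20 frames = 200 ms at a 10 ms hop)
        self.frames = deque(maxlen=window_frames)

    def push(self, power_spectrum):
        # Reuse one STFFT power spectrum frame: project it onto the Mel bands, take logs
        log_bands = [math.log(sum(w * p for w, p in zip(weights, power_spectrum)) + 1e-10)
                     for weights in self.bank]
        self.frames.append(log_bands)
        return self.stats()

    def stats(self):
        # Recomputed from scratch for clarity; running sums would be cheaper
        n = len(self.frames)
        means, stds = [], []
        for b in range(len(self.bank)):
            vals = [frame[b] for frame in self.frames]
            m = sum(vals) / n
            var = sum((v - m) ** 2 for v in vals) / n
            means.append(m)
            stds.append(math.sqrt(var))
        return means, stds


if __name__ == "__main__":
    bank = mel_filterbank(16, 256, 16000)
    stationary = LongTermMelStats(bank, 20)
    modulated = LongTermMelStats(bank, 20)
    for i in range(25):
        _, s_std = stationary.push([1.0] * 129)
        # Energy modulated at 4 Hz, a rough stand-in for syllable-rate fluctuation
        gain = 1.0 + 0.9 * math.sin(2 * math.pi * 4.0 * i * 0.01)
        _, m_std = modulated.push([gain] * 129)
    print(max(s_std), min(m_std))
```

With constant (stationary) input the per-band standard deviation stays at essentially zero. The 4 Hz modulated input gives a clearly nonzero spread. This illustrates why statistics over a window long enough to span a syllable can separate speech-like energy fluctuation from stationary noise. With the assumed parameters, the history holds 20 × 16 values. At 4 bytes each (for example float32 in an embedded build) that is 1280 bytes, so a buffer of this size fits within a 2 KB budget.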