Voice Activity Detector (VAD) Based on Long-Term Mel Frequency Band Features

Sergey Salishev, Andrey Barabanov, Daniil Kocharov, Pavel Skrelin, Mikhail Moiseev

Research output: Publication in a periodical, article, peer-reviewed

3 Citations (Scopus)


We propose a VAD that uses long-term (200 ms) Mel frequency band statistics, auditory masking, and a pre-trained classifier based on a two-level decision tree ensemble. This design captures the syllable-level structure of speech and discriminates it from common noises. On the test dataset, the proposed algorithm demonstrates almost 100% acceptance of clear voice for English, Chinese, Russian, and Polish speech and 100% rejection of stationary noises, independently of loudness. The algorithm is intended as a trigger for ASR. It reuses the short-term FFT analysis (STFFT) from the ASR frontend, at the cost of an additional 2 KB of memory and a 15% complexity overhead.
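The long-term band statistics mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact features, the masking model, or the classifier, so everything below is an assumption made for illustration: the 16 kHz sample rate, 20 ms frames with a 10 ms hop, 8 Mel bands, and the choice of mean and range as the per-band statistics over a 200 ms window. The function names `mel_filterbank` and `long_term_band_stats` are hypothetical.

```python
import numpy as np


def mel_filterbank(n_bands, n_fft, sample_rate):
    """Triangular Mel filterbank of shape (n_bands, n_fft // 2 + 1)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # Band edges are equally spaced on the Mel scale from 0 Hz to Nyquist.
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_bands + 2)
    hz_edges = mel_to_hz(mel_edges)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)

    bank = np.zeros((n_bands, freqs.size))
    for b in range(n_bands):
        lo, mid, hi = hz_edges[b], hz_edges[b + 1], hz_edges[b + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[b] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def long_term_band_stats(signal, sample_rate=16000, frame_ms=20, hop_ms=10,
                         window_ms=200, n_bands=8):
    """Per-band mean and range of log Mel energy over sliding 200 ms windows.

    Assumed feature set (not taken from the paper): for each window the
    output row holds n_bands means followed by n_bands ranges (max - min).
    """
    frame = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + (len(signal) - frame) // hop

    # Short-term FFT analysis, the part an ASR frontend would already compute.
    taper = np.hanning(frame)
    frames = np.stack([signal[i * hop:i * hop + frame] * taper
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    bank = mel_filterbank(n_bands, frame, sample_rate)
    log_energy = np.log(power @ bank.T + 1e-10)  # shape (n_frames, n_bands)

    span = window_ms // hop_ms  # number of frames per long-term window
    n_windows = n_frames - span + 1
    stats = np.empty((n_windows, 2 * n_bands))
    for t in range(n_windows):
        block = log_energy[t:t + span]
        stats[t, :n_bands] = block.mean(axis=0)
        stats[t, n_bands:] = block.max(axis=0) - block.min(axis=0)
    return stats
```

Under these assumptions, a stationary tone yields near-zero per-band range, while a signal whose envelope varies at a syllable-like rate (a few hertz) yields a large range. This is the kind of contrast a downstream classifier could exploit. The sketch omits auditory masking and the decision tree ensemble entirely.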
Original language: English
Pages (from-to): 352-358
Journal: Lecture Notes in Computer Science
Publication status: Published - 2016
Event: International Conference on Text, Speech, and Dialogue 2016 - Brno, Czech Republic
Duration: 12 Apr 2016 to 16 Apr 2016
Conference number: 19
