Voice Activity Detector (VAD) Based on Long-Term Mel Frequency Band Features

Standard

Voice Activity Detector (VAD) Based on Long-Term Mel Frequency Band Features. / Salishev, Sergey; Barabanov, Andrey; Kocharov, Daniil; Skrelin, Pavel; Moiseev, Mikhail.

In: Lecture Notes in Computer Science, Vol. 9924, 2016, pp. 352-358.

Research output: Scientific publications in periodicals › article › peer-review

BibTeX

@article{e24556bba37242f58b07cd8ba645b8eb,

title = "Voice Activity Detector (VAD) Based on Long-Term Mel Frequency Band Features",

abstract = "We propose a VAD that uses long-term (200 ms) Mel frequency band statistics, auditory masking, and a pre-trained two-level classifier based on a decision tree ensemble, which allows it to capture the syllable-level structure of speech and discriminate it from common noises. On the test dataset, the proposed algorithm demonstrates almost 100\% acceptance of clear voice for English, Chinese, Russian, and Polish speech and 100\% rejection of stationary noises, independently of loudness. The algorithm is intended to be used as a trigger for ASR. It reuses the short-term FFT analysis (STFFT) from the ASR front end, with 2 KB of additional memory and 15\% complexity overhead.",

keywords = "Voice Activity Detector, Classification, Decision tree ensemble, Auditory masking",

author = "Sergey Salishev and Andrey Barabanov and Daniil Kocharov and Pavel Skrelin and Mikhail Moiseev",

year = "2016",

doi = "10.1007/978-3-319-45510-5_40",

language = "English",

volume = "9924",

pages = "352--358",

journal = "Lecture Notes in Computer Science",

issn = "0302-9743",

publisher = "Springer Nature",

note = "International Conference on Text, Speech, and Dialogue 2016, TSD 2016; Conference date: 12-09-2016 through 16-09-2016",

url = "https://www.tsdconference.org/tsd2016/",

}

RIS

TY - JOUR

T1 - Voice Activity Detector (VAD) Based on Long-Term Mel Frequency Band Features

AU - Salishev, Sergey

AU - Barabanov, Andrey

AU - Kocharov, Daniil

AU - Skrelin, Pavel

AU - Moiseev, Mikhail

N1 - Conference code: 19

PY - 2016

Y1 - 2016

N2 - We propose a VAD that uses long-term (200 ms) Mel frequency band statistics, auditory masking, and a pre-trained two-level classifier based on a decision tree ensemble, which allows it to capture the syllable-level structure of speech and discriminate it from common noises. On the test dataset, the proposed algorithm demonstrates almost 100% acceptance of clear voice for English, Chinese, Russian, and Polish speech and 100% rejection of stationary noises, independently of loudness. The algorithm is intended to be used as a trigger for ASR. It reuses the short-term FFT analysis (STFFT) from the ASR front end, with 2 KB of additional memory and 15% complexity overhead.

AB - We propose a VAD that uses long-term (200 ms) Mel frequency band statistics, auditory masking, and a pre-trained two-level classifier based on a decision tree ensemble, which allows it to capture the syllable-level structure of speech and discriminate it from common noises. On the test dataset, the proposed algorithm demonstrates almost 100% acceptance of clear voice for English, Chinese, Russian, and Polish speech and 100% rejection of stationary noises, independently of loudness. The algorithm is intended to be used as a trigger for ASR. It reuses the short-term FFT analysis (STFFT) from the ASR front end, with 2 KB of additional memory and 15% complexity overhead.

KW - Voice Activity Detector

KW - Classification

KW - Decision tree ensemble

KW - Auditory masking

U2 - 10.1007/978-3-319-45510-5_40

DO - 10.1007/978-3-319-45510-5_40

M3 - Article

VL - 9924

SP - 352

EP - 358

JO - Lecture Notes in Computer Science

JF - Lecture Notes in Computer Science

SN - 0302-9743

T2 - International Conference on Text, Speech, and Dialogue 2016

Y2 - 12 September 2016 through 16 September 2016

ER -

ID: 7595429
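The abstract names the ingredients of the detector but not its formulas. As a rough illustration only, the Python sketch below computes one plausible form of "long-term Mel frequency band statistics": the per-band mean and standard deviation of log Mel energies over the most recent 200 ms of short-term power spectra, such as those an ASR front end's STFFT already produces. Everything except the 200 ms window is a hypothetical assumption, not a value from the paper: the 16 kHz sample rate, the 129-bin (256-point) FFT, the 10 ms hop, the 20 bands, the HTK-style Mel scale, and the use of mean and standard deviation as the statistics. Auditory masking and the decision tree ensemble classifier are deliberately omitted.

```python
import math
from statistics import mean, pstdev

# Hypothetical front-end parameters (assumptions, not taken from the paper)
SAMPLE_RATE = 16000        # Hz
N_FFT_BINS = 129           # one-sided bins of a 256-point FFT
HOP_SECONDS = 0.010        # assumed 10 ms frame hop
N_BANDS = 20               # assumed number of Mel bands
LONG_TERM_SECONDS = 0.200  # long-term window length stated in the abstract


def hz_to_mel(f):
    # HTK-style Mel scale (an assumption; other Mel variants exist)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_bands=N_BANDS, n_bins=N_FFT_BINS, sample_rate=SAMPLE_RATE):
    """Triangular Mel filters; returns one list of weights over FFT bins per band."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # n_bands + 2 equally spaced Mel points, mapped to fractional FFT bin positions
    edges = [mel_to_hz(top * i / (n_bands + 1)) / nyquist * (n_bins - 1)
             for i in range(n_bands + 2)]
    filters = []
    for left, center, right in zip(edges, edges[1:], edges[2:]):
        weights = []
        for k in range(n_bins):
            if left <= k <= center:
                weights.append((k - left) / (center - left))    # rising slope
            elif center < k <= right:
                weights.append((right - k) / (right - center))  # falling slope
            else:
                weights.append(0.0)
        filters.append(weights)
    return filters


def band_log_energies(power_frame, filters, floor=1e-10):
    """Log Mel band energies of one short-term power spectrum frame."""
    return [math.log(sum(w * p for w, p in zip(f, power_frame)) + floor)
            for f in filters]


def long_term_band_stats(log_frames, hop_s=HOP_SECONDS, window_s=LONG_TERM_SECONDS):
    """Per-band (mean, std) of log energy over the most recent long-term window."""
    n = round(window_s / hop_s)  # 200 ms / 10 ms = 20 frames
    window = log_frames[-n:]
    stats = []
    for b in range(len(window[0])):
        track = [frame[b] for frame in window]
        stats.append((mean(track), pstdev(track)))
    return stats
```

For example, feeding 30 identical frames (`[1.0] * N_FFT_BINS`) gives a per-band standard deviation of zero, while frames whose energy jumps every fourth hop give a nonzero deviation in every band. The intuition, which is an assumption of this sketch rather than a claim from the record, is that a 200 ms window spans roughly a syllable, so energy modulation across it separates speech from stationary noise regardless of overall loudness, because a constant gain only shifts the log-energy means.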
