We propose a voice activity detector (VAD) that uses long-term (200 ms) Mel frequency band statistics, auditory masking, and a pre-trained classifier based on a two-level decision tree ensemble. This design allows it to capture the syllable-level structure of speech and to discriminate speech from common noises. The proposed algorithm demonstrates almost 100% acceptance of clear voice for English, Chinese, Russian, and Polish speech and 100% rejection of stationary noises, independently of loudness.
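To make the feature stage more concrete, the following is a minimal sketch of long-term Mel band statistics, not the authors' implementation. It assumes NumPy, 16 kHz audio, 20 ms frames with a 10 ms hop, 8 triangular Mel bands, and the per-band mean and standard deviation of log energy as the statistics; none of these choices are stated in the abstract, and the function names are hypothetical. Auditory masking and the decision tree ensemble are not shown.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_bands, n_fft, sample_rate):
    """Triangular filters evenly spaced on the Mel scale.

    Returns an array of shape (n_bands, n_fft // 2 + 1).
    """
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_bands + 2))
    bank = np.zeros((n_bands, freqs.size))
    for b in range(n_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[b] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def long_term_mel_stats(signal, sample_rate=16000, frame_ms=20, hop_ms=10,
                        window_ms=200, n_bands=8):
    """Per-band mean and std of log Mel energy over sliding long-term windows.

    All parameter defaults are illustrative assumptions. Returns an array of
    shape (n_windows, 2 * n_bands): means first, then standard deviations.
    """
    signal = np.asarray(signal, dtype=float)
    frame = sample_rate * frame_ms // 1000
    hop = sample_rate * hop_ms // 1000
    n_frames = 1 + (len(signal) - frame) // hop

    # Slice the signal into overlapping, Hann-windowed frames.
    idx = np.arange(frame)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame)

    # Short-term log Mel band energies, one row per frame.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    bank = mel_filterbank(n_bands, frame, sample_rate)
    log_mel = np.log(power @ bank.T + 1e-10)

    # Group consecutive frames into windows of about window_ms.
    per_window = window_ms // hop_ms
    n_windows = n_frames - per_window + 1
    stats = np.empty((n_windows, 2 * n_bands))
    for w in range(n_windows):
        block = log_mel[w:w + per_window]
        stats[w, :n_bands] = block.mean(axis=0)
        stats[w, n_bands:] = block.std(axis=0)
    return stats
```

In this sketch a change in loudness shifts only the per-band means, while the standard deviations stay the same. This is one plausible way to obtain loudness-independent features, and slow amplitude modulation at a syllable-like rate raises the standard deviations relative to stationary noise. A classifier such as a decision tree ensemble could then be trained on these rows.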
Original language: English
Title of host publication: Proceedings of the 7th Tutorial and Research Workshop on Experimental Linguistics ExLing 2016
Place of Publication: Athens
Publisher: National and Kapodistrian University of Athens
ISBN (Print): 2529-1092; 978-960-466-161-9
State: Published - 2016
Event: 7th Tutorial and Research Workshop on Experimental Linguistics: ExLing 2016 - Saint Petersburg, Russian Federation
Duration: 27 Jun 2016 to 2 Jul 2016
Conference number: 7


Conference: 7th Tutorial and Research Workshop on Experimental Linguistics
Abbreviated title: ExLing 2016
Country/Territory: Russian Federation

    Research areas

  • Voice Activity Detector, classification, decision tree ensemble, auditory masking, phonetic features

    Scopus subject areas

  • Signal Processing

