Using multiple acoustic feature sets for speech recognition

Research output: Contribution to journal › Article › peer-review

Department of Phonetics and Methods for Teaching Foreign Languages

DOI

https://doi.org/10.1016/j.specom.2007.04.005
Final published version

András Zolnay
Daniil Kocharov
Ralf Schlüter
Hermann Ney

In this paper, the use of multiple acoustic feature sets for speech recognition is investigated. The combination of auditory and articulatory motivated features is considered. In addition to a voicing feature, we introduce a recently developed articulatory motivated feature, the spectrum derivative feature. Features are combined both directly, using linear discriminant analysis (LDA), and indirectly at the model level, using discriminative model combination (DMC). Experimental results are presented for both small- and large-vocabulary tasks. The results show that the accuracy of automatic speech recognition systems can be significantly improved by combining auditory and articulatory motivated features. The word error rate is reduced from 1.8% to 1.5% on the SieTill task for German digit string recognition. Consistent improvements in word error rate are also obtained on two large-vocabulary corpora: the word error rate is reduced from 19.1% to 18.4% on the VerbMobil II corpus, a German large-vocabulary conversational speech task, and from 14.1% to 13.5% on the British English part of the European Parliament Plenary Sessions (EPPS) task from the 2005 TC-STAR ASR evaluation campaign.
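
To put the reported gains on a common scale, the absolute word error rates quoted in the abstract can be converted into relative reductions. The sketch below is only an illustration: the function name `relative_wer_reduction` and the `results` dictionary are hypothetical helpers introduced here, not part of the paper, and the only inputs are the baseline and combined-feature error rates stated above.

```python
def relative_wer_reduction(baseline: float, combined: float) -> float:
    """Return the relative word error rate reduction in percent."""
    return 100.0 * (baseline - combined) / baseline


# (baseline WER %, WER % with combined features), as quoted in the abstract.
results = {
    "SieTill": (1.8, 1.5),
    "VerbMobil II": (19.1, 18.4),
    "EPPS": (14.1, 13.5),
}

# Relative reduction per task, rounded to one decimal place.
reductions = {
    task: round(relative_wer_reduction(baseline, combined), 1)
    for task, (baseline, combined) in results.items()
}

for task, reduction in reductions.items():
    print(f"{task}: {reduction}% relative WER reduction")
```

Under these figures, the relative reduction comes out at roughly 16.7% on SieTill, 3.7% on VerbMobil II, and 4.3% on EPPS, so the small-vocabulary digit task shows the largest relative gain.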

Original language	English
Pages (from-to)	514-525
Number of pages	12
Journal	Speech Communication
Volume	49
Issue number	6
State	Published - 1 Jun 2007

Scopus subject areas

Software
Modelling and Simulation
Communication
Language and Linguistics
Linguistics and Language
Computer Vision and Pattern Recognition
Computer Science Applications

Research areas

Acoustic feature extraction, Articulatory features, Auditory features, Discriminative model combination, Linear discriminant analysis, Spectrum derivative feature, Voicing
