Standard

Using multiple acoustic feature sets for speech recognition. / Zolnay, András; Kocharov, Daniil; Schlüter, Ralf; Ney, Hermann.

In: Speech Communication, Vol. 49, No. 6, 01.06.2007, p. 514-525.

Research output: Contribution to journal > Article > peer-review

Harvard

Zolnay, A, Kocharov, D, Schlüter, R & Ney, H 2007, 'Using multiple acoustic feature sets for speech recognition', Speech Communication, vol. 49, no. 6, pp. 514-525. https://doi.org/10.1016/j.specom.2007.04.005

APA

Zolnay, A., Kocharov, D., Schlüter, R., & Ney, H. (2007). Using multiple acoustic feature sets for speech recognition. Speech Communication, 49(6), 514-525. https://doi.org/10.1016/j.specom.2007.04.005

Vancouver

Zolnay A, Kocharov D, Schlüter R, Ney H. Using multiple acoustic feature sets for speech recognition. Speech Communication. 2007 Jun 1;49(6):514-525. https://doi.org/10.1016/j.specom.2007.04.005

Author

Zolnay, András ; Kocharov, Daniil ; Schlüter, Ralf ; Ney, Hermann. / Using multiple acoustic feature sets for speech recognition. In: Speech Communication. 2007 ; Vol. 49, No. 6. pp. 514-525.

BibTeX

@article{8a46aa23f18e42d591e9d2bd2777e0fa,
title = "Using multiple acoustic feature sets for speech recognition",
abstract = "In this paper, the use of multiple acoustic feature sets for speech recognition is investigated. The combination of both auditory as well as articulatory motivated features is considered. In addition to a voicing feature, we introduce a recently developed articulatory motivated feature, the spectrum derivative feature. Features are combined both directly using linear discriminant analysis (LDA) as well as indirectly on model level using discriminative model combination (DMC). Experimental results are presented for both small- and large-vocabulary tasks. The results show that the accuracy of automatic speech recognition systems can be significantly improved by the combination of auditory and articulatory motivated features. The word error rate is reduced from 1.8\% to 1.5\% on the SieTill task for German digit string recognition. Consistent improvements in word error rate have been obtained on two large-vocabulary corpora. The word error rate is reduced from 19.1\% to 18.4\% on the VerbMobil II corpus, a German large-vocabulary conversational speech task, and from 14.1\% to 13.5\% on the British English part of the European parliament plenary sessions (EPPS) task from the 2005 TC-STAR ASR evaluation campaign.",
keywords = "Acoustic feature extraction, Articulatory features, Auditory features, Discriminative model combination, Linear discriminant analysis, Spectrum derivative feature, Voicing",
author = "Andr{\'a}s Zolnay and Daniil Kocharov and Ralf Schl{\"u}ter and Hermann Ney",
year = "2007",
month = jun,
day = "1",
doi = "10.1016/j.specom.2007.04.005",
language = "English",
volume = "49",
pages = "514--525",
journal = "Speech Communication",
issn = "0167-6393",
publisher = "Elsevier",
number = "6",

}

RIS

TY - JOUR

T1 - Using multiple acoustic feature sets for speech recognition

AU - Zolnay, András

AU - Kocharov, Daniil

AU - Schlüter, Ralf

AU - Ney, Hermann

PY - 2007/6/1

Y1 - 2007/6/1

N2 - In this paper, the use of multiple acoustic feature sets for speech recognition is investigated. The combination of both auditory as well as articulatory motivated features is considered. In addition to a voicing feature, we introduce a recently developed articulatory motivated feature, the spectrum derivative feature. Features are combined both directly using linear discriminant analysis (LDA) as well as indirectly on model level using discriminative model combination (DMC). Experimental results are presented for both small- and large-vocabulary tasks. The results show that the accuracy of automatic speech recognition systems can be significantly improved by the combination of auditory and articulatory motivated features. The word error rate is reduced from 1.8% to 1.5% on the SieTill task for German digit string recognition. Consistent improvements in word error rate have been obtained on two large-vocabulary corpora. The word error rate is reduced from 19.1% to 18.4% on the VerbMobil II corpus, a German large-vocabulary conversational speech task, and from 14.1% to 13.5% on the British English part of the European parliament plenary sessions (EPPS) task from the 2005 TC-STAR ASR evaluation campaign.

AB - In this paper, the use of multiple acoustic feature sets for speech recognition is investigated. The combination of both auditory as well as articulatory motivated features is considered. In addition to a voicing feature, we introduce a recently developed articulatory motivated feature, the spectrum derivative feature. Features are combined both directly using linear discriminant analysis (LDA) as well as indirectly on model level using discriminative model combination (DMC). Experimental results are presented for both small- and large-vocabulary tasks. The results show that the accuracy of automatic speech recognition systems can be significantly improved by the combination of auditory and articulatory motivated features. The word error rate is reduced from 1.8% to 1.5% on the SieTill task for German digit string recognition. Consistent improvements in word error rate have been obtained on two large-vocabulary corpora. The word error rate is reduced from 19.1% to 18.4% on the VerbMobil II corpus, a German large-vocabulary conversational speech task, and from 14.1% to 13.5% on the British English part of the European parliament plenary sessions (EPPS) task from the 2005 TC-STAR ASR evaluation campaign.

KW - Acoustic feature extraction

KW - Articulatory features

KW - Auditory features

KW - Discriminative model combination

KW - Linear discriminant analysis

KW - Spectrum derivative feature

KW - Voicing

UR - http://www.scopus.com/inward/record.url?scp=34250015828&partnerID=8YFLogxK

U2 - 10.1016/j.specom.2007.04.005

DO - 10.1016/j.specom.2007.04.005

M3 - Article

AN - SCOPUS:34250015828

VL - 49

SP - 514

EP - 525

JO - Speech Communication

JF - Speech Communication

SN - 0167-6393

IS - 6

ER -
