It is well known that the performance of speech recognition systems degrades in the presence of the Lombard effect, which refers to a change in speaking style when the speaker is in a noisy environment. This thesis describes a feature compensation method that reduces the Lombard effect in the test data so that the mismatch between the training and test data is reduced.
In this thesis, Lombard speech is assumed to be the output of a linear time-invariant system whose input is normal speech. The Lombard generating system is denoted G, and the objective of the thesis is to find the inverse $G^{-1}$, which we call the Lombard compensation filter, and to evaluate its performance.
The magnitude response of the proposed Lombard compensation filter is obtained by taking the ratio of the spectrum of normal speech to that of Lombard speech. The compensation filters are obtained under three separate assumptions: 1) G is independent of speech characteristics, 2) G depends on the voiced/unvoiced frame decision, and 3) G depends on a classification of speech into 7 phoneme groups. To exclude errors due to classification or frame decisions from the evaluation, the correct frame decisions were assumed to be known.
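The spectral-ratio idea can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the Hann window, the FFT size, the averaging over frames, and the small constant `eps` are all assumptions made for illustration. Frames are assumed to be rows of a NumPy array, and each class label is assumed to occur in both the normal and the Lombard data.

```python
import numpy as np


def average_magnitude_spectrum(frames, n_fft=512):
    """Average magnitude spectrum over frames (one frame per row)."""
    window = np.hanning(frames.shape[1])  # assumed analysis window
    spectra = np.abs(np.fft.rfft(frames * window, n=n_fft, axis=1))
    return spectra.mean(axis=0)


def compensation_magnitude(normal_frames, lombard_frames, n_fft=512, eps=1e-10):
    """Magnitude response of the compensation filter: normal / Lombard."""
    normal = average_magnitude_spectrum(normal_frames, n_fft)
    lombard = average_magnitude_spectrum(lombard_frames, n_fft)
    return normal / (lombard + eps)  # eps guards against division by zero


def class_dependent_filters(normal_frames, normal_labels,
                            lombard_frames, lombard_labels, n_fft=512):
    """One filter per class label (e.g. voiced/unvoiced or phoneme group)."""
    filters = {}
    for label in np.unique(normal_labels).tolist():
        filters[label] = compensation_magnitude(
            normal_frames[normal_labels == label],
            lombard_frames[lombard_labels == label],
            n_fft,
        )
    return filters
```

Calling `compensation_magnitude` on all frames corresponds to the first assumption (a single filter for all speech). Calling `class_dependent_filters` with two labels or seven labels corresponds to the second and third assumptions, with the labels taken as known.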
The proposed Lombard compensation method was found to reduce the word error rate (WER) on 50 Korean words spoken with the Lombard effect. Using 13th-order mel-frequency cepstral coefficient (MFCC) features, the baseline WER of 35.25% was reduced to 29.85%, 29.2%, and 28.42%, respectively, under the three assumptions.
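To put these figures in perspective, the short sketch below computes the absolute and relative WER reductions from the numbers reported above. The dictionary keys and the helper name `relative_reduction` are illustrative labels, not terminology from the thesis.

```python
baseline_wer = 35.25  # baseline WER in percent, as reported above

# WER in percent under each of the three assumptions about G
compensated_wer = {
    "speech-independent G": 29.85,
    "voiced/unvoiced-dependent G": 29.2,
    "phoneme-group-dependent G": 28.42,
}


def relative_reduction(baseline, improved):
    """Relative WER reduction in percent of the baseline."""
    return 100.0 * (baseline - improved) / baseline


for name, wer in compensated_wer.items():
    absolute = baseline_wer - wer
    relative = relative_reduction(baseline_wer, wer)
    print(f"{name}: {absolute:.2f} points absolute, {relative:.1f}% relative")
```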