Hidden Markov models (HMMs) have been established as one of the most successful methods for speech recognition. However, a mismatch between the acoustic conditions of training and recognition degrades performance in real applications of speech recognition systems. Two important sources of mismatch are stationary background noise and the frequency response of the transmission channel from the speaker to the audio input of the recognizer.
There have been many efforts to compensate for noise effects. Among them, model-based techniques are a very effective approach to compensating for environmental mismatch. They retain model parameters that discriminate among different classes of signals. However, they require heavy computation and precise noise information.
This thesis proposes fast covariance compensation methods and real-time noise estimation techniques for on-line model compensation. To reduce the compensation cost, the proposed method compensates the covariances of an HMM directly in the log-spectral domain, based on the observation that the energy in a frequency band is dominated either by clean speech energy or by noise energy. To estimate the background noise from each input signal, the proposed method uses a modified weighted average method. In addition, to estimate the frequency response of the transmission channel, the proposed method extracts the clean speech signal from the clean HMM using a distortion measure that is robust for noisy speech: the angle between the clean and noisy speech signal vectors. Using the resulting information and a spectral-subtraction-based technique, the frequency response is estimated. Experiments on artificially produced speech data confirm that the proposed method is a fast and effective technique for on-line model compensation.
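
As a rough illustration of three ideas named above, the following Python sketch shows a per-band dominance rule for compensating log-spectral means and variances, a recursive weighted average for tracking noise, and the angle between two vectors as a distortion measure. It is a minimal sketch under stated assumptions, not the method developed in this thesis: the function names, the "larger mean dominates" selection rule, the smoothing weight of 0.9, and the use of a plain (unmodified) weighted average are all hypothetical choices made for illustration.

```python
import math


def compensate_band(speech_mean, speech_var, noise_mean, noise_var):
    """Return the (mean, variance) of whichever source dominates one band.

    Assumption: dominance is decided by comparing log-spectral means.
    """
    if speech_mean >= noise_mean:
        return speech_mean, speech_var
    return noise_mean, noise_var


def compensate_state(speech_means, speech_vars, noise_means, noise_vars):
    """Apply the dominance rule to every band of one diagonal-covariance state."""
    pairs = [
        compensate_band(sm, sv, nm, nv)
        for sm, sv, nm, nv in zip(speech_means, speech_vars,
                                  noise_means, noise_vars)
    ]
    means = [mean for mean, _ in pairs]
    variances = [var for _, var in pairs]
    return means, variances


def update_noise_estimate(previous, frame, weight=0.9):
    """Plain recursive weighted average of noise frames.

    Assumption: this is the ordinary form, not the modified variant
    proposed in the thesis; `weight` is an illustrative value.
    """
    return [weight * p + (1.0 - weight) * x for p, x in zip(previous, frame)]


def angle_between(u, v):
    """Angle in radians between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)
```

Because the angle depends only on the direction of the vectors, scaling one vector by a positive constant leaves the measure unchanged, which is one plausible reason such a measure tolerates level differences between clean and noisy signals. The dominance rule needs only one comparison per band, which illustrates why compensating directly in the log-spectral domain can be cheap.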