Much effort has been devoted to realizing speech recognition in mobile communication environments. Research on how to extract the feature parameters used for recognition can be classified into three approaches, according to where the features are taken from: the input speech of the codec, the packet created by the encoder, or the output speech of the codec. Of these, the first approach is of limited use because the speech recognition system must then run on the phone, which has low computational power and little memory.
An earlier study reported that recognition parameters extracted from the packet give higher performance than those extracted from the output speech. In experiments in noisy environments, however, I found that parameters extracted from the output speech perform better. This is because the QCELP coder decreases the energy in silence (non-speech) frames. Noise components still remain in the speech frames, so performance can be improved further by reducing the noise in those output speech frames. In this thesis, adaptive comb filtering is employed for this purpose.
An adaptive comb filter has two constraints: it can be applied only to voiced frames, and it requires a reliable estimate of the pitch period. The proposed method uses two kinds of information extracted from the packet to address these constraints. One is the frame rate, which reflects the coder's decision on whether a frame is speech or silence; the other is the pitch period, which the QCELP coder measures precisely. Experimental results show that, in noisy environments, a speech recognition system employing the proposed front-end outperforms both a recognizer using feature parameters extracted directly from the output speech and one using parameters obtained by applying spectral subtraction to the output speech.
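
As a rough illustration of the idea only, and not the implementation used in the thesis, the Python sketch below applies a three-tap comb filter to decoded speech, but only in frames whose packet rate marks them as speech, and it uses the pitch lag carried in the packet rather than re-estimating it. The function name, the rate labels, the tap weights (0.25, 0.5, 0.25), the 160-sample frame length, and the choice to filter only full-rate frames are all assumptions made for the example.

```python
from typing import List, Sequence

FRAME_LEN = 160  # assumed frame length: 20 ms at 8 kHz sampling


def comb_filter_frames(
    speech: Sequence[float],
    rates: Sequence[str],
    pitch_lags: Sequence[int],
    frame_len: int = FRAME_LEN,
    coeffs: Sequence[float] = (0.25, 0.5, 0.25),  # assumed tap weights, sum to 1
    speech_rates: Sequence[str] = ("full",),      # assumed labels for speech frames
) -> List[float]:
    """Comb-filter decoded speech using per-frame rate and pitch lag.

    rates[f] and pitch_lags[f] are hypothetical per-frame values taken
    from the packet. Frames whose rate is not in speech_rates, or whose
    lag is not positive, are copied through unchanged.
    """
    out = list(speech)
    n_total = len(speech)
    for f, (rate, lag) in enumerate(zip(rates, pitch_lags)):
        if rate not in speech_rates or lag <= 0:
            continue  # silence frame or no usable pitch: leave as is
        start = f * frame_len
        end = min(start + frame_len, n_total)
        for n in range(start, end):
            prev_i, next_i = n - lag, n + lag
            if prev_i < 0 or next_i >= n_total:
                continue  # no neighbour one pitch period away: leave as is
            # Average samples one pitch period apart: components periodic
            # at the pitch lag pass, components between harmonics shrink.
            out[n] = (coeffs[0] * speech[prev_i]
                      + coeffs[1] * speech[n]
                      + coeffs[2] * speech[next_i])
    return out
```

With these tap weights, a signal that repeats exactly every `lag` samples passes through unchanged, while a component that flips sign every `lag` samples is cancelled. This is the sense in which the filter keeps the pitch harmonics and suppresses noise between them. A practical front-end would also need a rule for unvoiced speech frames, which this sketch does not address.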