Telephone speech recognition has many applications, such as remote control systems and reservation systems. However, its performance is not yet sufficient for real-world applications because telephone speech is subject to many degrading effects. Channel distortion introduced by the telephone line is one of the major effects that severely degrade recognition performance, and there have been many efforts to compensate for it. Among them, RASTA (RelAtive SpecTrAl) processing is widely used and known to be very effective, and much recent research has focused on improving the RASTA method.
In RASTA processing, the RASTA filter, a band-pass filter applied to the time trajectory of each spectral component, suppresses both slowly varying factors such as channel distortion and rapidly changing components of the speech signal, thereby emphasizing the parts of the signal that matter most for recognition. This behavior resembles human auditory perception. However, because the same fixed filter is applied to every telephone utterance even though channel distortions differ from call to call, its ability to remove channel distortion is limited.
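To make the filtering step concrete, here is a minimal sketch that applies a commonly cited form of the RASTA filter, H(z) = 0.1 (2 + z^-1 - z^-3 - 2z^-4) / (1 - 0.98 z^-1), along time to each band of a log-spectral feature matrix. It assumes log spectral features (so a stationary channel shows up as a constant additive offset per band) and a pole of 0.98; the name `rasta_filter` and the demo data are illustrative only. Because the numerator taps sum to zero, the filter has zero gain at DC, so a constant per-band offset contributes only a start-up transient that decays with the pole.

```python
import numpy as np

# Commonly cited RASTA filter (assumed form):
#   H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1)
# The z^4 advance of the original formulation is omitted, so the output
# lags the input by four frames.
RASTA_B = np.array([0.2, 0.1, 0.0, -0.1, -0.2])  # numerator taps, sum to zero
RASTA_POLE = 0.98                                # single IIR pole


def rasta_filter(log_spec, b=RASTA_B, pole=RASTA_POLE):
    """Filter each band's trajectory along time.

    log_spec: (num_frames, num_bands) array of log spectral values.
    Returns an array of the same shape; frames before t = 0 are treated as zero.
    """
    log_spec = np.asarray(log_spec, dtype=float)
    out = np.zeros_like(log_spec)
    for t in range(log_spec.shape[0]):
        # FIR part: sum_k b[k] * x[t - k]
        acc = np.zeros(log_spec.shape[1])
        for k, bk in enumerate(b):
            if t - k >= 0:
                acc += bk * log_spec[t - k]
        # IIR part: y[t] = FIR output + pole * y[t - 1]
        if t > 0:
            acc += pole * out[t - 1]
        out[t] = acc
    return out


# Demo: a fixed per-band offset (a stand-in for a stationary channel in the
# log domain) is removed once the start-up transient has decayed.
rng = np.random.default_rng(0)
clean = rng.normal(size=(500, 20))
channel = rng.normal(size=20)
diff = rasta_filter(clean + channel) - rasta_filter(clean)
print(np.abs(diff[0]).max(), np.abs(diff[-1]).max())
```

Since the filter is linear, `diff` is exactly the filter's response to the constant offset: large on the first frames and decaying toward zero afterwards. A channel that is not a pure per-band constant would not be cancelled this cleanly by the fixed filter.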
This thesis addresses this drawback of the RASTA method and proposes a new channel-robust RASTA filter adaptation method that reduces channel distortion in telephone speech. For each telephone utterance, the proposed method estimates the channel bias by maximum likelihood estimation and extracts channel-free reference signals. From these reference signals, it determines shifted RASTA filter coefficients using gradient descent. The resulting RASTA filter adaptively removes a variety of telephone channel distortions while still emphasizing the important parts of the speech signal. Experiments on real telephone speech data confirmed the effectiveness of the proposed method.
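The two adaptation steps can be illustrated with a toy sketch, which is not the thesis's actual formulation. It assumes: (1) the channel is an additive bias in the log-spectral domain; (2) clean speech is modeled by a diagonal-covariance Gaussian mixture with known parameters, so the maximum likelihood bias can be found by EM; (3) only the five numerator taps are adapted, with the pole fixed at 0.98; and (4) the gradient descent objective, chosen here purely for illustration, is the mean-squared error between the adapted filter's output on the observed signal and the standard RASTA filter's output on the channel-free reference. The names `estimate_channel_bias`, `filter_basis`, and `adapt_rasta` are hypothetical.

```python
import numpy as np

RASTA_B = np.array([0.2, 0.1, 0.0, -0.1, -0.2])  # standard RASTA numerator (assumed)
POLE = 0.98                                      # fixed IIR pole (not adapted here)


def estimate_channel_bias(x, weights, means, variances, num_iters=10):
    """EM estimate of an additive bias b such that x - b fits a clean-speech GMM.

    x: (T, D) observed log-spectral frames; weights: (K,);
    means, variances: (K, D) parameters of a hypothetical clean-speech model.
    """
    inv_var = 1.0 / variances
    b = x.mean(axis=0) - weights @ means            # moment-based starting point
    for _ in range(num_iters):
        diff = x[:, None, :] - b - means[None]      # (T, K, D)
        log_p = np.log(weights)[None] - 0.5 * np.sum(
            np.log(2 * np.pi * variances)[None] + diff ** 2 * inv_var[None], axis=2)
        log_p -= log_p.max(axis=1, keepdims=True)   # numerical stability
        gamma = np.exp(log_p)
        gamma /= gamma.sum(axis=1, keepdims=True)   # E-step posteriors, (T, K)
        occ = gamma.sum(axis=0)                     # soft counts, (K,)
        gx = gamma.T @ x                            # (K, D)
        num = np.sum((gx - occ[:, None] * means) * inv_var, axis=0)
        den = np.sum(occ[:, None] * inv_var, axis=0)
        b = num / den                               # M-step: closed-form bias
    return b


def filter_basis(x, num_taps, pole):
    """W[k] = one-pole IIR applied to x delayed by k frames, so the RASTA
    output for numerator taps c equals sum_k c[k] * W[k]."""
    T = x.shape[0]
    W = np.zeros((num_taps,) + x.shape)
    for k in range(num_taps):
        delayed = np.zeros_like(x)
        delayed[k:] = x[:T - k]
        for t in range(T):
            W[k, t] = delayed[t] + (pole * W[k, t - 1] if t > 0 else 0.0)
    return W


def adapt_rasta(observed, reference, num_steps=200):
    """Gradient descent on the numerator taps, starting from standard RASTA,
    so that filtering `observed` matches standard RASTA applied to `reference`
    (an illustrative objective, not taken from the thesis)."""
    W_obs = filter_basis(observed, len(RASTA_B), POLE)
    target = np.tensordot(RASTA_B, filter_basis(reference, len(RASTA_B), POLE), axes=1)
    # The loss is quadratic with Hessian 2 * G; a step of 1 / (2 * lambda_max(G))
    # guarantees the loss never increases.
    flat = W_obs.reshape(len(RASTA_B), -1)
    G = flat @ flat.T / flat.shape[1]
    step = 1.0 / (2.0 * np.linalg.eigvalsh(G).max())
    c = RASTA_B.copy()
    for _ in range(num_steps):
        err = np.tensordot(c, W_obs, axes=1) - target
        grad = 2.0 * np.tensordot(W_obs, err, axes=([1, 2], [0, 1])) / err.size
        c -= step * grad
    return c


# Toy demo with synthetic data drawn from the assumed clean-speech GMM.
rng = np.random.default_rng(0)
weights = np.array([0.5, 0.5])
means = np.array([[-2.0, 0.0, 1.0], [2.0, 1.0, -1.0]])
variances = np.ones((2, 3))
comp = rng.choice(2, size=1500, p=weights)
clean = means[comp] + rng.normal(size=(1500, 3))
true_bias = np.array([0.7, -1.2, 0.4])
observed = clean + true_bias

b_hat = estimate_channel_bias(observed, weights, means, variances)
reference = observed - b_hat                    # channel-free reference signal
c = adapt_rasta(observed, reference)
print(b_hat, c)
```

In this toy run the estimated bias should land close to `true_bias`. Because the synthetic channel is a pure constant, which standard RASTA already removes apart from the start-up transient, the adapted taps `c` move only slightly away from `RASTA_B`. A richer channel model would be needed to see larger adjustments.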