Recognizing noisy speech is one of the most important problems in speech recognition. This thesis proposes an algorithm that efficiently removes noise mixed with speech in the feature representation domain. The proposed method uses independent component analysis to separate the mixed noise. In addition, a new feature extraction method is proposed for mel-frequency cepstral coefficients: it sums several adjacent FFT point values and computes each band energy from the summed values. A simple analysis using the sample variances of speech and noise shows that this method is robust to noise, and isolated-word recognition experiments confirm improved performance on noisy speech. With this feature extraction method, one unmixing network is required for each summed value.
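The summation step of the feature extraction can be illustrated with a minimal sketch. It is not the implementation used in this thesis: the function names `sum_adjacent_bins` and `band_energies`, the group size, and the band edges are hypothetical, the complex FFT values are assumed to be summed before the squared magnitude is taken, and rectangular bands stand in for a mel filterbank.

```python
import numpy as np

def sum_adjacent_bins(spectrum, group_size):
    """Sum each run of `group_size` adjacent FFT point values.

    Assumption: leftover points that do not fill a whole group are dropped.
    """
    spectrum = np.asarray(spectrum)
    n_groups = len(spectrum) // group_size
    trimmed = spectrum[: n_groups * group_size]
    return trimmed.reshape(n_groups, group_size).sum(axis=1)

def band_energies(summed, band_edges):
    """Energy of each band, computed from the summed values only.

    Assumption: bands are rectangular and `band_edges` indexes summed values.
    """
    power = np.abs(summed) ** 2
    return np.array([power[lo:hi].sum()
                     for lo, hi in zip(band_edges[:-1], band_edges[1:])])

# Toy frame: a constant signal, so all spectral energy sits at DC.
frame = np.ones(16)
spectrum = np.fft.rfft(frame)                       # 9 complex FFT point values
summed = sum_adjacent_bins(spectrum, group_size=2)  # 4 summed values
energies = band_energies(summed, band_edges=[0, 2, 4])
```

Under these assumptions the number of values passed on to later stages falls from 9 to 4 for this toy frame, which illustrates why needing one unmixing network per summed value lowers the computational load.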
For instantaneous mixtures of speech and noise, the noise components are removed almost completely, so the same performance as for clean speech can be obtained with less computational load. For delayed mixtures, the performance improvement decreases as more FFT point values are summed or as the time delay becomes longer. However, when the distances from the noise sources to the sensors are made equal, almost the same performance improvement as for instantaneous mixtures can be obtained, because the time delay of the speech signal has little influence on recognition performance.
Finally, a solution for real-time speech recognition using independent component analysis is proposed.