This thesis studies the performance of speaker adaptation methods that use a probabilistic spectral mapping matrix in HMM-based isolated word recognition. The HMM parameters of a speaker-dependent isolated word recognition system are adapted to a new speaker through the probabilistic spectral mapping matrix, which can be obtained by any of three algorithms: the Viterbi, DTW, and forward-backward algorithms. Of these, the Viterbi and forward-backward algorithms are also used in HMM-based speech recognition itself, as decoding or training methods.
In the HMM-based isolated word recognition system, the input speech signal is first passed through a digital filter bank and then quantized by a vector quantizer. In the training phase, the quantized outputs are used to estimate the model parameters by either the maximum mutual information estimation (MMIE) technique or the maximum likelihood estimation (MLE) technique. In the testing phase, the likelihood of each model is calculated by applying the Viterbi method to the quantized outputs, and the model that yields the maximum likelihood is selected as the recognized model (phoneme or word).
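The testing phase described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the system used in the thesis: it assumes discrete HMMs given as an initial distribution `pi`, a transition matrix `A`, and an observation symbol probability matrix `B`, and it takes a non-empty sequence of vector quantizer indices. The function names `viterbi_log_likelihood` and `recognize` are hypothetical.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF


def viterbi_log_likelihood(pi, A, B, obs):
    """Log probability of the best state path of a discrete HMM.

    pi[s]   : initial probability of state s
    A[r][s] : transition probability from state r to state s
    B[s][k] : probability of emitting codebook symbol k in state s
    obs     : non-empty list of codebook indices (quantizer outputs)
    """
    n = len(pi)
    # Initialization with the first observation.
    delta = [safe_log(pi[s]) + safe_log(B[s][obs[0]]) for s in range(n)]
    # Recursion: keep only the best predecessor for each state.
    for o in obs[1:]:
        delta = [
            max(delta[r] + safe_log(A[r][s]) for r in range(n))
            + safe_log(B[s][o])
            for s in range(n)
        ]
    return max(delta)


def recognize(models, obs):
    """Return the name of the model with the highest Viterbi score.

    models maps a word (or phoneme) name to a (pi, A, B) tuple.
    """
    return max(
        models,
        key=lambda name: viterbi_log_likelihood(*models[name], obs),
    )
```

Working in the log domain avoids numerical underflow on long observation sequences, which is why the sketch sums log probabilities instead of multiplying probabilities.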
The goal of speaker adaptation is to minimize the amount of training speech required from a new speaker to achieve acceptable recognition performance. To this end, a prototype speaker's well-trained phonetic hidden Markov models are modified using a probabilistic spectral mapping matrix. Computing this matrix is the central part of the speaker adaptation problem, and it is done by the three approaches named above. The Viterbi approach uses the prototype model parameters, namely the observation symbol probability matrix, together with the HMM state sequence. The forward-backward approach uses the forward and backward parameters to obtain the probabilistic spectral mapping matrix. The DTW approach is chiefly concerned with the correspondence between the prototype speech and the new input speech.
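As a rough illustration of how such a mapping matrix might be estimated and applied, the Python sketch below makes several assumptions that are not stated in the abstract: the matrix entry `M[k][l]` is read as the probability that the new speaker produces codebook symbol `l` where the prototype speaker produced symbol `k`; it is estimated from co-occurrence counts over aligned symbol pairs (for example, pairs obtained from a DTW alignment); and the adapted observation matrix is the product of the prototype observation matrix with `M`. The function names and the `init` parameter are hypothetical.

```python
def estimate_mapping(pairs, num_symbols, init=1e-3):
    """Estimate a row-stochastic mapping matrix from aligned symbol pairs.

    pairs       : iterable of (prototype_symbol, new_speaker_symbol) indices
    num_symbols : codebook size
    init        : positive initial value of every entry (assumed smoothing),
                  so that rows with no aligned data still normalize
    """
    counts = [[init] * num_symbols for _ in range(num_symbols)]
    for proto, new in pairs:
        counts[proto][new] += 1.0
    # Row normalization: each row becomes a probability distribution.
    return [[c / sum(row) for c in row] for row in counts]


def adapt_observation_matrix(B, M):
    """Map prototype observation probabilities to the new speaker.

    B[s][k] : prototype probability of symbol k in state s
    M[k][l] : probability of new-speaker symbol l given prototype symbol k
    Returns B_new with B_new[s][l] = sum_k B[s][k] * M[k][l].
    """
    num_new = len(M[0])
    return [
        [sum(b_row[k] * M[k][l] for k in range(len(M))) for l in range(num_new)]
        for b_row in B
    ]
```

Because every row of `M` sums to one, each row of the adapted matrix remains a valid probability distribution. The choice of `init` and of the normalization are exactly the kind of design decisions that can change the resulting recognition accuracy.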
Computer simulations were carried out to measure the performance of isolated word recognition with speaker adaptation. The results show that the Viterbi approach yields the best performance. The improvement in recognition accuracy for isolated word recognition ranges from 42.6% to 68.8%. In addition, the choice of initial values for the matrix and the normalization method used in computing it both affect the recognition accuracy.
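A probabilistic spectral mapping matrix was used for speaker adaptation of the HMM parameters. This method overcomes the difference between the characteristics of the new speaker and those of the original speaker by means of a suitable parameter, the probabilistic spectral mapping matrix.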
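Depending on how the probabilistic spectral mapping matrix is obtained, the method is called the forward-backward approach, the DTW approach, or the Viterbi approach. There is also a method that adapts the HMMs to the speaker by simple re-estimation with the new speaker's data. Speaker adaptation experiments with all of these methods showed that the Viterbi approach, with a recognition rate of around 80%, performed best.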
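The DTW approach gives a slightly lower recognition rate than the Viterbi approach, but speaker adaptation may work better if a suitable variant is chosen from among the various DTW algorithms.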
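The effects of the normalization and the initial values of the probabilistic spectral mapping matrix were also examined, and suitable initial values and normalization were found to improve the recognition rate.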