Speaker adaptation is an efficient way to reduce the mismatch that typically occurs between the training and test conditions of a speech recognizer. For HMM-based speech recognizers, adaptation is typically performed on the acoustic models themselves, and such model adaptation techniques can usually be divided into two families of approaches for estimating model parameters.
On the one hand, MAP adaptation directly estimates the model parameters so as to maximize the a posteriori probability. Since MAP adaptation re-estimates only the parameters of the units that appear in the adaptation data, a large amount of such data is needed before any significant improvement in performance is observed. However, MAP adaptation has nice asymptotic properties: performance continues to improve as the amount of adaptation data increases. On the other hand, MLLR adaptation applies a shared transformation to clusters of model parameters so as to maximize the likelihood of the adaptation data. Because every model in a cluster is transformed, including models unseen in the adaptation data, the approach is quite effective when only a small amount of adaptation data is available. However, as the amount of adaptation data increases, the performance improvement quickly saturates.
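The two families can be illustrated on Gaussian means. The sketch below is not the thesis's own derivation; it uses a common textbook formulation of the MAP mean update (with a hypothetical prior-weight parameter tau) and the standard MLLR affine transform of the extended mean vector:

```python
import numpy as np

def map_adapt_mean(mu_si, frames, gammas, tau=10.0):
    """MAP mean update (common formulation, not the thesis's exact one):
    the SI prior mean is shrunk toward the weighted data mean.
    tau controls how strongly the prior resists the adaptation data."""
    gamma_sum = gammas.sum()
    weighted_obs = (gammas[:, None] * frames).sum(axis=0)
    return (tau * mu_si + weighted_obs) / (tau + gamma_sum)

def mllr_adapt_mean(mu_si, W):
    """MLLR mean update: a shared affine transform W = [b, A] is applied
    to the extended mean vector [1, mu], so all means in a cluster move
    together even if a particular unit was never observed."""
    xi = np.concatenate(([1.0], mu_si))
    return W @ xi
```

With little data (small `gamma_sum`), the MAP estimate stays close to the SI mean, which matches the slow start described above; the MLLR transform moves every mean at once, which matches its effectiveness on small adaptation sets.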
In this thesis, I propose to estimate model parameters using a combination of MAP and MLLR adaptation. To obtain better performance regardless of the amount of adaptation data, each mean is estimated, per state, as an interpolation of the SI mean, the MAP-adapted mean, and the MLLR-adapted mean. The interpolation weight vectors are computed to maximize the likelihood of the adaptation data, and I use the Lagrange multiplier method to solve this constrained optimization efficiently. Experimental results show that the proposed method outperforms either MAP or MLLR adaptation alone.
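The per-state interpolation can be sketched as a convex combination of the three candidate means, with the sum-to-one constraint that the Lagrange multiplier enforces. The weight search below is only a brute-force stand-in for the thesis's closed-form Lagrange solution, scoring weights by a simple diagonal-Gaussian log-likelihood; all function names are illustrative:

```python
import numpy as np

def interpolated_mean(mu_si, mu_map, mu_mllr, w):
    """Per-state convex combination of the three candidate means.
    w = (w_si, w_map, w_mllr) is non-negative and sums to one."""
    w = np.asarray(w, dtype=float)
    assert w.min() >= 0.0 and np.isclose(w.sum(), 1.0)
    return w[0] * mu_si + w[1] * mu_map + w[2] * mu_mllr

def pick_weights(mu_si, mu_map, mu_mllr, frames, var, step=0.1):
    """Brute-force stand-in for the Lagrange-based solution: scan a grid
    on the weight simplex and keep the weights maximizing the
    diagonal-Gaussian log-likelihood of the adaptation frames."""
    n = int(round(1.0 / step))
    best_w, best_ll = None, -np.inf
    for i in range(n + 1):
        for j in range(n + 1 - i):
            w = (i * step, j * step, (n - i - j) * step)
            mu = interpolated_mean(mu_si, mu_map, mu_mllr, w)
            ll = -0.5 * np.sum((frames - mu) ** 2 / var)
            if ll > best_ll:
                best_w, best_ll = w, ll
    return best_w
```

Because the weights are fitted to the adaptation data, the combination can fall back on the MLLR-adapted mean when data is scarce and shift toward the MAP-adapted mean as data accumulates, which is the behavior the proposed method aims for.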