In this thesis, we study Korean speech recognition using generalized triphone models obtained by an information-theoretic context-merging procedure.
Large-vocabulary continuous speech recognition requires subword units, such as phonemes or triphones, as the basic recognition units. Triphones are known to be advantageous over phonemes because they model the influence of the left and right neighboring phonemes, which context-independent phoneme models ignore. However, triphone models can be poorly trained when training data are insufficient. In addition, acoustically similar triphones may be treated as distinct units, so a triphone model can become unnecessarily specific.
To alleviate these problems, we construct a generalized triphone set by merging similar triphones that share the same central phoneme. We use two information-theoretic merging methods: one based on VQ codewords and one based on speech parameter vectors.
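The abstract does not spell out the merging criterion. As an illustration only, the following Python sketch assumes a count-weighted entropy-increase criterion of the kind commonly used for generalized triphones with discrete (VQ) models. Each triphone of one central phoneme is summarized by a histogram of VQ codeword counts, and the pair whose pooling raises total entropy the least is merged greedily until a target number of clusters remains. The function names (`entropy`, `merge_cost`, `generalize`), the triphone labels, and the counts are hypothetical. The parameter-vector method would replace the codeword histogram with a continuous density, which is not shown here.

```python
import math
from collections import Counter


def entropy(counts):
    """Entropy in bits of a discrete distribution given as codeword counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def merge_cost(a, b):
    """Count-weighted entropy increase caused by pooling two codeword histograms.

    Assumed criterion (illustrative): identical distributions cost 0,
    and dissimilar ones cost more.
    """
    na, nb = sum(a.values()), sum(b.values())
    pooled = a + b  # Counter addition sums counts per codeword
    return (na + nb) * entropy(pooled) - na * entropy(a) - nb * entropy(b)


def generalize(triphones, target):
    """Greedily merge the cheapest pair of triphones until `target` clusters remain.

    `triphones` maps a (hypothetical) triphone label to its codeword counts;
    all labels are assumed to share the same central phoneme.
    """
    clusters = {name: Counter(c) for name, c in triphones.items()}
    while len(clusters) > target:
        names = sorted(clusters)
        pairs = [(x, y) for i, x in enumerate(names) for y in names[i + 1:]]
        x, y = min(pairs, key=lambda p: merge_cost(clusters[p[0]], clusters[p[1]]))
        # "|" joins the merged labels; "+" and "-" already appear in triphone names.
        clusters[x + "|" + y] = clusters.pop(x) + clusters.pop(y)
    return clusters


# Hypothetical codeword counts for three context variants of /a/.
example = {
    "k-a+n": {0: 8, 1: 2},
    "g-a+n": {0: 7, 1: 3},
    "s-a+m": {2: 9, 3: 1},
}
result = generalize(example, target=2)
```

With these counts, the two `+n` contexts have similar codeword histograms and are merged into `g-a+n|k-a+n`. The `s-a+m` context uses disjoint codewords, so merging it would cost far more. The greedy pairwise search is chosen for clarity, not efficiency.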
We performed simulations using 74 isolated Korean words. The results show that the generalized triphone models considerably improve recognition performance compared with both phoneme models and triphone models without generalization.