This paper compares multilayer perceptrons (MLPs) and Gaussian mixtures for phoneme recognition, with a view to developing an MLP/HMM hybrid recognition system. Recognition results are presented at the frame level for the TIMIT task. The 44-phone set, based on the one used by Kai-Fu Lee in the development of the SPHINX system, is used, and each phone is modeled by a 1-state left-to-right HMM. The MLP is trained with the error back-propagation (EBP) algorithm using a mean square error (MSE) criterion. The outputs of the MLP estimate the {\it a posteriori} probabilities of the output classes conditioned on the input. The parameters of the Gaussian mixtures, namely the means and variances, are estimated by simple averaging without decoding.
Context is very important in speech recognition. In this paper, contextual information is handled by increasing the dimensionality of the observation vector to include some parameterization of the neighboring acoustic vectors. A context of 9 consecutive frames is used.
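
As an illustration of how the observation vector can be enlarged with context, the following minimal sketch concatenates each acoustic frame with its neighbors. It is only a sketch under stated assumptions: the 9-frame window is taken to be symmetric (4 frames on each side of the current frame), utterance edges are padded by repeating the first or last frame, and the function name `stack_context` and its parameters are hypothetical. None of these details are specified in the text.

```python
from typing import List


def stack_context(frames: List[List[float]],
                  left: int = 4,
                  right: int = 4) -> List[List[float]]:
    """Concatenate each frame with its neighbouring frames.

    Assumption: a symmetric window (4 left + current + 4 right = 9 frames)
    and edge padding by repeating the first/last frame of the utterance.
    """
    if not frames:
        return []
    n = len(frames)
    stacked = []
    for t in range(n):
        window: List[float] = []
        for offset in range(-left, right + 1):
            # Clamp the index so that edge frames reuse the boundary frame.
            idx = min(max(t + offset, 0), n - 1)
            window.extend(frames[idx])
        stacked.append(window)
    return stacked


# Toy utterance: 5 frames with 2 features each (values are illustrative only).
frames = [[float(t), float(10 * t)] for t in range(5)]
stacked = stack_context(frames)
```

With this choice, a frame of dimension d becomes an observation vector of dimension 9d, which is the input that the classifier (MLP or Gaussian mixture) would see for that frame.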