Probabilistic models have been widely used for natural language processing. Part-of-speech tagging, which assigns the most likely tag to each word in a given sentence, is one of the problems that can be solved by a statistical approach. Many researchers have addressed the problem using the hidden Markov model (HMM), a well-known statistical model, but it has several difficulties: integrating heterogeneous information, coping with the data sparseness problem, and adapting to new environments.
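To make the HMM baseline concrete, the following is a minimal sketch of Viterbi decoding for POS tagging. The tag set, transition table, and emission table are toy values invented for illustration, not taken from this paper's experiments; a real tagger would estimate them from a tagged corpus.

```python
import math

# Toy HMM POS tagger via Viterbi decoding.
# All probabilities below are illustrative, hand-picked numbers.
TAGS = ["DET", "NOUN", "VERB"]

# Transition probabilities P(tag_i | tag_{i-1}); "<s>" is the start state.
trans = {
    "<s>":  {"DET": 0.60, "NOUN": 0.30, "VERB": 0.10},
    "DET":  {"DET": 0.05, "NOUN": 0.85, "VERB": 0.10},
    "NOUN": {"DET": 0.10, "NOUN": 0.30, "VERB": 0.60},
    "VERB": {"DET": 0.50, "NOUN": 0.30, "VERB": 0.20},
}

# Emission probabilities P(word | tag); unseen words get a tiny floor,
# which is exactly where the data sparseness problem shows up.
emit = {
    "DET":  {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.4, "cat": 0.4, "walk": 0.2},
    "VERB": {"walk": 0.5, "barks": 0.5},
}
UNSEEN = 1e-8

def viterbi(words):
    """Return the most likely tag sequence under the toy HMM."""
    # best[t] = (log-probability, path) of the best sequence ending in tag t
    best = {
        t: (math.log(trans["<s>"][t]) + math.log(emit[t].get(words[0], UNSEEN)),
            [t])
        for t in TAGS
    }
    for w in words[1:]:
        new = {}
        for t in TAGS:
            e = math.log(emit[t].get(w, UNSEEN))
            score, path = max(
                (best[p][0] + math.log(trans[p][t]) + e, best[p][1] + [t])
                for p in TAGS
            )
            new[t] = (score, path)
        best = new
    return max(best.values())[1]

print(viterbi(["the", "dog", "walk"]))
```

The emission floor for unseen words illustrates why the HMM degrades under sparse data: every out-of-vocabulary word receives the same uninformative probability regardless of context.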
In this paper, we propose a Markov random field (MRF) model based approach to the tagging problem. The MRF provides the base frame for combining various kinds of statistical information through the maximum entropy (ME) method. Since the Gibbs distribution can describe the posterior probability of a tag sequence, we use it for maximum a posteriori (MAP) estimation in the optimization process. In addition, several tagging models are developed to show the effect of adding information.
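For reference, the Gibbs form of the posterior and the MAP decision rule mentioned above can be written in their standard textbook form (the notation below is generic, not this paper's own equations):

```latex
P(t \mid w) \;=\; \frac{1}{Z(w)}\,\exp\!\Big(-\sum_{c \in C} V_c(t, w)\Big),
\qquad
\hat{t}_{\mathrm{MAP}} \;=\; \operatorname*{arg\,max}_{t}\; P(t \mid w),
```

where $t$ is a tag sequence, $w$ the observed sentence, $C$ the set of cliques of the underlying graph, $V_c$ the clique potential functions, and $Z(w)$ the normalizing partition function. The heterogeneous information sources combine additively in the exponent, which is what allows the ME method to weight them within one model.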
Experimental results show that the tagger's performance improves as more statistical information is added, and that the MRF based tagging model handles the data sparseness problem better than the HMM based tagging model.