Part-of-speech tagging in Korean differs from tagging in English. In Korean, a phrase is made up of one or more morphemes, so a phrase must be analyzed morphologically before its morphemes can be tagged. The analysis of a single phrase can yield candidate results that differ in both the number and the types of morphemes. Tagging can be done on a phrase basis, as suggested in [LEE 1993]. There are, however, some problems with phrase-based tagging. First, since not all possible tags can be predicted in advance, the number of states in the HMM must be increased whenever a new tag is found, which requires modifying the whole HMM.
Second, since the number of tags in the phrase-based method is considerably large, the bigram and trigram tables for this method become very large, which decreases the efficiency of the system. Third, since this method needs a tagged corpus, considerable manual effort is required to build that corpus.
In this thesis, a morpheme-based method that does not require a tagged corpus is suggested to avoid these problems. We also suggest a revised Viterbi algorithm for the HMM that reduces duplicated computation for better efficiency. Experimental results indicate that morpheme-based tagging has an advantage over phrase-based tagging when there are many categories for morphemes.
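
To make the morpheme-based setting concrete, the following is a minimal sketch of standard Viterbi decoding over a lattice in which each phrase has several candidate morpheme analyses. It is not the revised algorithm of the thesis, whose details are not given here. The function names, the toy lattice, the tag labels (N, J, V, E), and all probabilities are invented for illustration. The sketch assumes a bigram HMM over morpheme tags, so only the last tag of each partial path matters at a phrase boundary, and keeping one best score per last tag avoids rescoring paths that end in the same tag.

```python
import math

def logp(table, key, floor=1e-6):
    """Log probability with a small floor for unseen events (an assumption)."""
    return math.log(table.get(key, floor))

def viterbi_lattice(phrases, trans, emit, start="<s>"):
    """Pick the best analysis per phrase under a bigram HMM over morpheme tags.

    phrases: list of phrases; each phrase is a list of candidate analyses;
             each analysis is a list of (morpheme, tag) pairs.
    trans:   dict mapping (previous_tag, tag) to a probability.
    emit:    dict mapping (tag, morpheme) to a probability.
    """
    # best maps the last tag of a partial path to (log score, chosen analyses)
    best = {start: (0.0, [])}
    for candidates in phrases:
        new_best = {}
        for analysis in candidates:
            first_tag = analysis[0][1]
            last_tag = analysis[-1][1]
            # Score internal to the phrase: emissions plus within-phrase transitions.
            inner = sum(logp(emit, (t, m)) for m, t in analysis)
            inner += sum(logp(trans, (a[1], b[1]))
                         for a, b in zip(analysis, analysis[1:]))
            for prev_tag, (score, path) in best.items():
                total = score + logp(trans, (prev_tag, first_tag)) + inner
                if last_tag not in new_best or total > new_best[last_tag][0]:
                    new_best[last_tag] = (total, path + [analysis])
        best = new_best
    return max(best.values(), key=lambda entry: entry[0])[1]

# Toy lattice: the first phrase has two competing analyses, the second has one.
phrases = [
    [[("na", "N"), ("neun", "J")], [("nal", "V"), ("neun", "E")]],
    [[("sae", "N")]],
]
trans = {("<s>", "N"): 0.6, ("<s>", "V"): 0.4, ("N", "J"): 0.5,
         ("V", "E"): 0.9, ("J", "N"): 0.2, ("E", "N"): 0.8}
emit = {("N", "na"): 0.1, ("J", "neun"): 0.5, ("V", "nal"): 0.1,
        ("E", "neun"): 0.5, ("N", "sae"): 0.1}

result = viterbi_lattice(phrases, trans, emit)
tags = [tag for analysis in result for _, tag in analysis]
print(tags)
```

Because the states are morpheme tags rather than whole-phrase tag combinations, a new combination of morphemes within a phrase adds a new path through the lattice but no new HMM state.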