When a tagging system is applied to open-ended text, many problems arise that make the system brittle. Unknown words are the major source of this brittleness, so handling them successfully is the key to a robust tagging system.
In this thesis, we propose a robust Korean part-of-speech tagging method that consists of four modules: a morphological analyzer, an unknown word guessing module, an unknown word candidate filter, and a tagging module. When morphological analysis fails, the unknown word guessing module suggests all possible candidates for the unknown word from the given partial morpheme lattice, so that a full morpheme lattice of the word phrase can be constructed. The unknown word candidate filter discards implausible candidates using the characteristics of Korean syllables and clue morphemes. The tagging module takes into account the possibility that some tags are used to explain unknown words, and it performs tagging by selecting the most probable morphological analysis among the multiple results, based on the relations between unknown words and the words that follow them.
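The flow through the four modules can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the implementation described in this thesis: the lexicon, the romanized word forms, the tag names, the scores, and all function names are hypothetical, a single stem-plus-suffix split stands in for a morpheme lattice, and the filter uses only a following clue morpheme (the syllable characteristics are not modeled).

```python
# Toy data; every entry below is an illustrative assumption, not thesis data.
LEXICON = {"hakkyo": ["NOUN"]}                        # known stems and their tags
SUFFIXES = {"ey": "PARTICLE", "ul": "PARTICLE", "ta": "ENDING"}  # clue morphemes
OPEN_CLASS_TAGS = ["NOUN", "VERB", "ADVERB"]          # tags an unknown stem may take
# Hypothetical scores for (stem tag, following morpheme tag) pairs.
TRANSITION = {
    ("NOUN", "PARTICLE"): 0.8,
    ("VERB", "ENDING"): 0.9,
    ("NOUN", "ENDING"): 0.1,
}


def analyze(word):
    """Morphological analyzer: split off a known suffix and look up the stem.

    Returns (stem, suffix, tags); tags is None when the stem is unknown.
    """
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            stem = word[: -len(suffix)]
            return stem, suffix, LEXICON.get(stem)
    return word, None, LEXICON.get(word)


def guess(stem):
    """Unknown word guessing: propose every open-class tag as a candidate."""
    return list(OPEN_CLASS_TAGS)


def filter_candidates(candidates, suffix):
    """Candidate filter: drop tags that cannot precede the clue morpheme."""
    if suffix is None:
        return candidates
    kept = [t for t in candidates if (t, SUFFIXES[suffix]) in TRANSITION]
    return kept or candidates  # never filter down to an empty list


def tag(word):
    """Tagging: pick the candidate that scores best with the following morpheme."""
    stem, suffix, tags = analyze(word)
    if tags is None:  # analysis failed, so fall back to guessing and filtering
        tags = filter_candidates(guess(stem), suffix)
    if suffix is None:
        return [(stem, tags[0])]
    next_tag = SUFFIXES[suffix]
    best = max(tags, key=lambda t: TRANSITION.get((t, next_tag), 0.0))
    return [(stem, best), (suffix, next_tag)]
```

In this sketch, a known stem such as the hypothetical `hakkyo` bypasses guessing entirely, while an unknown stem is assigned whichever surviving candidate tag scores best against the morpheme that follows it.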
Experimental results show that the proposed system outperforms existing systems that do not use an unknown word handling model. An interesting observation is that the accuracy of tagging known words increases as the accuracy of tagging unknown words increases.