In natural language processing, statistical methods based on large annotated corpora are now widely used. For a corpus to be effective for such methods, its quality matters as much as its quantity. However, building a large, error-free annotated corpus is very difficult and labor-intensive.
This thesis presents a method that overcomes problems in building a part-of-speech tagged corpus. For the unknown word problem, we modify the morphological analyzer so that it recognizes unknown words in the input text and reports them to the user, who can then semi-automatically register them in the dictionary and re-analyze the text correctly. For errors made by automatic tagging, we propose a rule-based error correction method that finds and corrects errors semi-automatically based on user-defined rules. We also use the user's error correction log to reflect the user's feedback during the manual correction process.
Experiments were carried out on 10,000 Korean words to show the efficiency of the error correction process of this workbench. The results show that about 63.2% of tagging errors can be corrected semi-automatically and in a user-friendly manner.
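
As an illustration of the rule-based error correction described above, the following is a minimal Python sketch and not the workbench's actual implementation. The rule format (an optional word form, a wrong tag, a replacement tag, and an optional tag for the preceding token), the function names, and the toy English tags are all assumptions made for this example. The `accept` callback stands in for the user's confirmation, which is what makes the process semi-automatic.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Token = Tuple[str, str]  # (word, tag)


@dataclass(frozen=True)
class Rule:
    """Hypothetical user-defined correction rule."""
    word: Optional[str]             # surface form to match, or None for any word
    wrong_tag: str                  # tag assigned by the automatic tagger
    right_tag: str                  # tag proposed as the correction
    prev_tag: Optional[str] = None  # required tag of the preceding token, if any


def propose_corrections(tokens: List[Token], rules: List[Rule]) -> List[Tuple[int, str]]:
    """Return (index, proposed_tag) pairs for tokens matched by a rule."""
    proposals = []
    for i, (word, tag) in enumerate(tokens):
        prev = tokens[i - 1][1] if i > 0 else None
        for rule in rules:
            if rule.wrong_tag != tag:
                continue
            if rule.word is not None and rule.word != word:
                continue
            if rule.prev_tag is not None and rule.prev_tag != prev:
                continue
            proposals.append((i, rule.right_tag))
            break  # the first matching rule wins
    return proposals


def apply_corrections(
    tokens: List[Token],
    proposals: List[Tuple[int, str]],
    accept: Callable[[str, str, str], bool],
) -> List[Token]:
    """Apply only the proposals that the user (via `accept`) confirms."""
    corrected = list(tokens)
    for i, new_tag in proposals:
        word, old_tag = corrected[i]
        if accept(word, old_tag, new_tag):
            corrected[i] = (word, new_tag)
    return corrected
```

In use, a caller would pass the tagger output and the rule list to `propose_corrections`, show each proposal to the user, and pass the user's decision as `accept`. Recording those decisions would give a correction log of the kind mentioned above, although how the workbench actually uses its log is not specified here.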