In natural language processing, statistical methods based on large annotated corpora are now widely used. A corpus used in these approaches must be tagged accurately, but building a large and accurately annotated corpus is difficult and labor-intensive. Tagging Workbench was implemented as an integrated tool that helps users build an accurate part-of-speech (POS) tagged corpus with less labor. However, many of its tagging errors resulted from failures in analyzing complex endings, and these errors caused unnecessary manual error-correction work.
This thesis presents a method that improves the morphological analyzer in Tagging Workbench by using predicative phrase segmentation knowledge. The proposed method extracts this knowledge automatically, as a form of pre-analyzed knowledge, from POS-tagged documents built by Tagging Workbench, using the structure of predicate phrases.
Applying predicative phrase segmentation knowledge during morphological analysis improved automatic POS tagging results by 7% and reduced the amount of manual error correction needed to build a POS-tagged corpus.
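The extraction and lookup idea can be illustrated with a minimal sketch. It is not the implementation used in Tagging Workbench: the tag names (`VV`, `VA`, `EP`, `EF`, `EC`), the data layout (each word as a list of `(morpheme, tag)` pairs), the function names, and the romanized toy corpus are all assumptions made for illustration, and the sketch assumes that the surface form of an ending sequence is the plain concatenation of its morphemes.

```python
from collections import Counter, defaultdict

# Hypothetical tag sets; the tag set used by Tagging Workbench may differ.
PREDICATE_STEM_TAGS = {"VV", "VA"}
ENDING_TAGS = {"EP", "EF", "EC"}


def extract_ending_knowledge(tagged_words):
    """Collect, for each ending surface string, its most frequent segmentation.

    tagged_words: iterable of words, each a list of (morpheme, tag) pairs.
    Only words shaped like "predicate stem + endings" contribute.
    """
    counts = defaultdict(Counter)
    for morphemes in tagged_words:
        if len(morphemes) < 2 or morphemes[0][1] not in PREDICATE_STEM_TAGS:
            continue
        endings = tuple(morphemes[1:])
        if not all(tag in ENDING_TAGS for _, tag in endings):
            continue
        # Assumption: the surface form is the plain concatenation of morphemes.
        surface = "".join(form for form, _ in endings)
        counts[surface][endings] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def segment_predicate(word, knowledge, stem_tag="VV"):
    """Split a word into stem + known endings, trying the longest suffix first.

    Returns a list of (morpheme, tag) pairs, or None if no suffix is known.
    """
    for i in range(1, len(word)):  # i >= 1 keeps the stem non-empty
        suffix = word[i:]
        if suffix in knowledge:
            return [(word[:i], stem_tag)] + list(knowledge[suffix])
    return None


# Toy, romanized corpus invented for this example.
corpus = [
    [("meok", "VV"), ("eoss", "EP"), ("da", "EF")],
    [("ilg", "VV"), ("eoss", "EP"), ("da", "EF")],
    [("chaek", "NN")],  # not a predicate, so it is ignored
]
knowledge = extract_ending_knowledge(corpus)
result = segment_predicate("ipeossda", knowledge)
```

In this sketch, a complex ending that the analyzer has seen in already corrected documents is looked up as a unit instead of being re-analyzed, which is the kind of reuse the method relies on to reduce repeated manual corrections.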