A baseNP is a non-recursive noun phrase, that is, a noun phrase that contains no other noun phrase. Because baseNPs can be identified from simple word patterns and part-of-speech patterns alone, baseNP identification is used in many areas, including information extraction and the preprocessing phase of parsing. BaseNP identification in Korean has been studied with approaches similar to those used for English. Because Korean is a head-final language (Cho, 1986), more global information is considered useful in Korean baseNP identification. In this thesis, we apply a state-based model, which readily takes global information into account, to Korean baseNP identification.
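As an illustration of what identification by part-of-speech patterns can look like, the following minimal sketch marks every maximal run of optional determiners and adjectives followed by one or more nouns as a baseNP. The English tag names (`DT`, `JJ`, `NN`, ...), the pattern itself, and the function `find_base_nps` are assumptions made for this example only; they are not the patterns or the tag set used in this thesis.

```python
from typing import List, Tuple

# Assumed, English-style tag sets used only for this illustration.
PRE_TAGS = {"DT", "JJ"}          # determiners and adjectives
NOUN_TAGS = {"NN", "NNS", "NNP"}  # nouns

def find_base_nps(tagged: List[Tuple[str, str]]) -> List[Tuple[int, int]]:
    """Return (start, end) spans, end-exclusive, matching PRE* NOUN+."""
    spans = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        # Consume optional pre-modifiers.
        while j < n and tagged[j][1] in PRE_TAGS:
            j += 1
        k = j
        # Consume one or more nouns.
        while k < n and tagged[k][1] in NOUN_TAGS:
            k += 1
        if k > j:
            spans.append((i, k))   # pre-modifiers plus nouns form one baseNP
            i = k
        else:
            i = max(j, i + 1)      # no noun here: skip past what was scanned
    return spans

sentence = [("the", "DT"), ("old", "JJ"), ("man", "NN"),
            ("saw", "VBD"), ("a", "DT"), ("dog", "NN")]
print(find_base_nps(sentence))  # [(0, 3), (4, 6)]
```

A purely local pattern of this kind has no view of the wider context, which is the limitation that motivates using more global information.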
This thesis focuses on the direction in which state transitions are processed in the state-based baseNP identification model. Because Korean is an agglutinative language, the beginning of a noun phrase is difficult to identify in Korean baseNP identification, whereas the end is easy to identify. This suggests that the properties and performance of the model may change if the state transitions are processed from right to left. Following this intuition, we consider not only the forward-processing state-based model but also a newly proposed backward-processing state-based model. Moreover, the two models are combined in several ways. One is to choose between the two models' results where they differ. Another is to apply the two processing models sequentially.
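The effect of processing direction can be sketched with a toy greedy tagger in which each decision depends on the state chosen for the previously processed token. Everything in this sketch is hypothetical: the tag names (`M` modifier, `N` noun, `J` particle, `V` verb), the two-state label set, the rule-based `toy_scorer`, and the `pick` function used for combination are illustrative assumptions, not the model or the features of this thesis. Only the first combination method (choosing between differing results) is sketched.

```python
from typing import Callable, List, Sequence

STATES = ("I", "O")  # assumed labels: inside / outside a baseNP

# Hypothetical scorer: (previous_state, pos_tag, candidate_state) -> score
Scorer = Callable[[str, str, str], float]

def toy_scorer(prev: str, pos: str, cand: str) -> float:
    """Toy rule: nouns are inside; a modifier is inside only if the
    previously processed token was inside; everything else is outside."""
    if pos == "N" or (pos == "M" and prev == "I"):
        want = "I"
    else:
        want = "O"
    return 1.0 if cand == want else 0.0

def tag(pos_tags: Sequence[str], scorer: Scorer, backward: bool = False) -> List[str]:
    """Greedy state-based tagging, left-to-right or right-to-left."""
    n = len(pos_tags)
    order = range(n - 1, -1, -1) if backward else range(n)
    states = [""] * n
    prev = "O"  # boundary state before the first processed token
    for i in order:
        prev = max(STATES, key=lambda s: scorer(prev, pos_tags[i], s))
        states[i] = prev
    return states

def combine_by_choice(fwd: List[str], bwd: List[str],
                      pick: Callable[[int, str, str], str]) -> List[str]:
    """Keep agreeing labels; where the two results differ, let `pick` decide."""
    return [f if f == b else pick(i, f, b)
            for i, (f, b) in enumerate(zip(fwd, bwd))]

pos = ["M", "N", "J", "V"]
fwd = tag(pos, toy_scorer)                 # ['O', 'I', 'O', 'O']
bwd = tag(pos, toy_scorer, backward=True)  # ['I', 'I', 'O', 'O']
print(combine_by_choice(fwd, bwd, lambda i, f, b: b))
```

In this toy setting the forward pass reaches the modifier before it has seen the noun and leaves it outside the phrase, while the backward pass has already entered the phrase at the noun and extends it leftward. This mirrors, in a very simplified way, the intuition that the end of a noun phrase is the easier boundary to detect.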
Both the forward-processing model and the backward-processing model perform better than previous models for Korean baseNP identification. Combining the two models yields a further improvement. The precision is 92.55% and the recall is 90.90%.
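For readers who prefer a single figure, the reported precision and recall can be combined into a balanced F-measure. The abstract itself does not report an F-measure; the value below is derived here from the two reported numbers using the standard harmonic-mean formula, and the function name `f1` is chosen for this example.

```python
def f1(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reported values, in percent.
print(round(f1(92.55, 90.90), 2))  # 91.72
```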