A baseNP is a non-recursive noun phrase, that is, a noun phrase that does not contain another noun phrase. BaseNP recognition is a basic component of many large-scale natural language processing applications. It is the first step in many partial parsers, information retrieval systems, and information extraction systems.
This thesis proposes a Korean baseNP recognition model that uses context window variation and reflects the characteristics of the Korean language. By varying the length and position of the context window, we train a memory-based learning algorithm to produce a large number of classifiers, each of which uses different left and right context information. Combining their outputs by majority voting yields better results than a single classifier. When learning with context window variation, we also address several properties of Korean: it is analyzed in morpheme units, and it makes heavy use of derivation and inflection. These characteristics tend to confuse a baseNP recognizer during learning. To address them, we present special processing of derived words, use of the stem and the final ending of inflected words, and use of modification relations.
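The combination step described above can be illustrated with a minimal sketch. It is not the implementation used in this thesis: the function names (`context_windows`, `extract_features`, `majority_vote`), the window limits, the padding symbol, and the B/I/O tag set are all assumptions made for illustration, and the memory-based learner itself is omitted. Each (left, right) window size would define the features seen by one classifier, and the per-token tags predicted by all classifiers are then merged by majority voting.

```python
from collections import Counter
from itertools import product


def context_windows(max_left=2, max_right=2):
    """Enumerate (left, right) window sizes; each pair defines one classifier's view."""
    return list(product(range(max_left + 1), range(max_right + 1)))


def extract_features(tokens, i, left, right, pad="_"):
    """Return the tokens from position i-left to i+right, padding past sentence edges."""
    return tuple(
        tokens[j] if 0 <= j < len(tokens) else pad
        for j in range(i - left, i + right + 1)
    )


def majority_vote(predictions):
    """Merge several tag sequences by picking the most frequent tag at each position."""
    return [Counter(tags).most_common(1)[0][0] for tags in zip(*predictions)]


# Hypothetical outputs of three classifiers trained with different windows.
predictions = [
    ["B", "I", "O"],
    ["B", "O", "O"],
    ["B", "I", "I"],
]
print(majority_vote(predictions))  # ['B', 'I', 'O']
```

Using an odd number of classifiers avoids ties in a two-way disagreement; how ties are resolved in the actual system is not specified here.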
With the proposed method, precision is 92.19%, recall is 91.08%, and the F-rate is 91.63%.
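The reported figures are consistent with the F-rate being the balanced harmonic mean of precision and recall. The exact definition is not stated here, so this is an assumption, and the helper name `f_rate` below is hypothetical. A short check:

```python
def f_rate(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported above, in percent.
print(round(f_rate(92.19, 91.08), 2))  # 91.63
```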