Named-entity recognition and classification identifies proper names in text and assigns each a type, such as person, organization, or location. It is an important subtask in most natural language processing applications, particularly information retrieval and information extraction.
There are two major approaches to classifying named entities (NEs): one based on rules and the other based on supervised learning. The rule-based approach is costly because the rules and dictionaries have to be revised for each application. The supervised approach requires hand-tagged training data, which is expensive to produce.
This thesis proposes an unsupervised learning model that classifies NEs without hand-tagged training data. Initial features are created from a small NE dictionary and refined by a small number of rules extracted automatically from corpora. The model then uses these features as training data for machine learning. Classification is learned by combining three different learning methods: a Maximum Entropy Model, Memory-based Learning, and a Sparse Network of Winnows.
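The dictionary-seeding step described above can be illustrated with a minimal sketch. It assumes a toy dictionary and a simple word-plus-neighbors feature set; the names `SEED_DICT` and `seed_label` are hypothetical and do not come from the thesis, whose actual features and rules are not given here.

```python
from typing import Dict, List, Tuple

# Hypothetical seed dictionary; the real one is small but not shown in the abstract.
SEED_DICT: Dict[str, str] = {"Seoul": "LOCATION", "Samsung": "ORGANIZATION"}


def seed_label(
    tokens: List[str], seed_dict: Dict[str, str]
) -> List[Tuple[Dict[str, str], str]]:
    """Turn dictionary matches into (features, label) training examples."""
    examples = []
    for i, tok in enumerate(tokens):
        label = seed_dict.get(tok)
        if label is None:
            continue  # tokens outside the dictionary yield no example
        # Assumed feature set: the word itself and its immediate neighbors.
        features = {
            "word": tok,
            "prev": tokens[i - 1] if i > 0 else "<s>",
            "next": tokens[i + 1] if i + 1 < len(tokens) else "</s>",
        }
        examples.append((features, label))
    return examples


examples = seed_label(["Samsung", "opened", "an", "office", "in", "Seoul"], SEED_DICT)
```

Examples produced this way stand in for hand-tagged data: a learner trained on the context features can then label names that are absent from the dictionary.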
Unlike previous unsupervised models, this model takes the semantic ambiguity of NEs into account and uses a special ensemble method to resolve it.
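The abstract does not specify the ensemble method, so the following is only a sketch of one common way to combine three classifiers: plain majority voting that abstains when no label wins outright. It is an assumption used for illustration, not the method of the thesis, and `majority_vote` is a hypothetical name.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(predictions: Sequence[str]) -> Optional[str]:
    """Return the most frequent label, or None when the top labels tie."""
    if not predictions:
        return None
    counts = Counter(predictions).most_common()
    # A tie between the two most frequent labels signals an ambiguous entity.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


# One prediction per learner (e.g. maximum entropy, memory-based, SNoW).
agreed = majority_vote(["PERSON", "PERSON", "LOCATION"])
ambiguous = majority_vote(["PERSON", "LOCATION", "ORGANIZATION"])
```

Abstaining on ties leaves ambiguous entities unlabeled so that a later step can resolve them, rather than committing to an arbitrary label.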
In experiments on Korean news articles, the model achieves 77.0% precision and 78.8% recall.
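The abstract reports precision and recall only. As a convenience, the sketch below combines them into the balanced F1 score (their harmonic mean); the resulting value is derived here from the two reported figures and is not a number stated in the thesis.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (balanced F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported figures for Korean news articles, expressed as fractions.
f1 = f1_score(0.770, 0.788)
```

Because the two figures are close together, the harmonic mean falls between them, at roughly 77.9%.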