Word clustering is the process of assigning words to their corresponding classes. There are two main approaches to classifying words: manual and automatic. A manual method requires too much time and effort, whereas an automatic method is less accurate because of the data sparseness problem. We therefore need a method in which the two approaches complement each other.
In this thesis, we propose two techniques to improve the accuracy of automatic noun clustering. The first is to filter the attributes of nouns on which the clustering process is based. Using category utility, which combines measures of intra-class similarity and inter-class difference, we retain only those attributes that improve the clustering results. The second is to correct the errors that an automatic method may produce by incorporating the knowledge of an expert through a post-editor. The post-editor enables an expert to identify misclassified words so that the system can re-cluster them.
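As an illustration of the category utility measure mentioned above, the following is a minimal sketch. The abstract does not give the exact formula used in the thesis, so this sketch assumes the standard category utility definition for binary attributes (a noun either has an attribute or does not). The function name `category_utility` and the set-based noun representation are hypothetical choices made for this example.

```python
from collections import Counter


def category_utility(clusters):
    """Category utility of a partition, assuming binary attributes.

    clusters: list of clusters; each cluster is a list of nouns, and each
    noun is represented as a set of attribute labels (an assumption of
    this sketch, not necessarily the thesis's representation).
    """
    clusters = [c for c in clusters if c]  # ignore empty clusters
    nouns = [noun for cluster in clusters for noun in cluster]
    total = len(nouns)
    attributes = set().union(*nouns)

    def expected_correct_guesses(items):
        # Sum over attributes of P(present)^2 + P(absent)^2 within `items`.
        counts = Counter(a for noun in items for a in noun)
        score = 0.0
        for a in attributes:
            p = counts[a] / len(items)
            score += p * p + (1.0 - p) * (1.0 - p)
        return score

    baseline = expected_correct_guesses(nouns)
    gain = sum(
        (len(c) / total) * (expected_correct_guesses(c) - baseline)
        for c in clusters
    )
    return gain / len(clusters)
```

A partition whose classes are internally homogeneous and mutually distinct scores higher than one that mixes dissimilar nouns, which is the property that makes the measure usable as a criterion for comparing clusterings obtained with different attribute sets.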
The noun clustering system that we implemented consists of four modules: a partial parser, a noun-attribute representation module, a noun clustering module, and a post-editor. In automatic clustering, we use restrictions on which words can appear together in the same context and, in particular, on which words can be arguments of which predicates.
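The idea of describing nouns by the predicates they can be arguments of can be sketched as follows. This is only an illustration: the triple format `(predicate, role, noun)`, the role labels, and the function name `build_noun_attributes` are assumptions made for this example, not the actual output format of the partial parser or the representation module.

```python
from collections import Counter, defaultdict


def build_noun_attributes(triples):
    """Build a noun-attribute table from predicate-argument observations.

    triples: iterable of (predicate, role, noun) tuples, a hypothetical
    stand-in for what a partial parser might extract from a corpus.
    Returns a mapping from each noun to a Counter of (predicate, role)
    attributes with their co-occurrence frequencies.
    """
    table = defaultdict(Counter)
    for predicate, role, noun in triples:
        table[noun][(predicate, role)] += 1
    return table
```

Nouns that share many (predicate, role) attributes, such as two nouns that both occur as the object of the same verbs, end up with similar rows in this table, which is what a clustering step can then exploit.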
To evaluate the performance of our system, we carried out two experiments. The first experiment, which compared our attribute-filtering method with a method based only on frequencies, showed that our method is promising. The second experiment evaluated the effectiveness of the post-editor and showed that the proposed system increased both the efficiency and the accuracy of word clustering.