As the number of syntactic units in a sentence increases, so does the ambiguity in parsing. Compound words, such as compound nouns and compound predicates, account for much of this ambiguity. Compound words must first be identified and separated as units, and then their internal structure must be revealed. This thesis focuses on reducing the number of syntactic units through preprocessing, with particular attention to compound nouns.
All the preprocessing steps other than compound noun processing are rather simple, yet they contribute to reducing the number of syntactic units. Compound nouns require more knowledge to reveal their structure than the other preprocessing tasks do, but this knowledge is not easy to acquire. To obtain it, we propose a modified dependency model that uses heuristics together with mutual information estimated from a training corpus.
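To make the idea concrete, the following is a minimal sketch of how a dependency-style decision for a three-noun compound could be driven by mutual information. It is an illustration under stated assumptions, not the modified model proposed in this thesis: the association score is plain pointwise mutual information, the tie-breaking default stands in for the heuristics, and the names `make_pmi_scorer`, `bracket`, `PAIRS`, and `WORDS`, as well as all counts, are hypothetical.

```python
import math
from collections import Counter


def make_pmi_scorer(pair_counts, word_counts):
    """Build a pointwise mutual information scorer from corpus counts.

    pair_counts maps (modifier, head) noun pairs to frequencies and
    word_counts maps single nouns to frequencies (both assumed inputs).
    """
    total_pairs = sum(pair_counts.values())
    total_words = sum(word_counts.values())

    def score(w1, w2):
        joint = pair_counts.get((w1, w2), 0)
        c1 = word_counts.get(w1, 0)
        c2 = word_counts.get(w2, 0)
        if joint == 0 or c1 == 0 or c2 == 0:
            # Unseen pair or word: treat as the weakest possible association.
            return float("-inf")
        p_joint = joint / total_pairs
        p1 = c1 / total_words
        p2 = c2 / total_words
        return math.log2(p_joint / (p1 * p2))

    return score


def bracket(n1, n2, n3, score):
    """Dependency-style choice: does n1 attach to n2 or to n3?

    Ties (including two unseen pairs) default to left-branching; this
    default is an assumed heuristic, not one taken from the thesis.
    """
    if score(n1, n2) >= score(n1, n3):
        return ((n1, n2), n3)  # left-branching: [[n1 n2] n3]
    return (n1, (n2, n3))      # right-branching: [n1 [n2 n3]]


# Toy counts, invented purely for illustration.
PAIRS = Counter({
    ("computer", "science"): 8,
    ("science", "department"): 3,
    ("computer", "department"): 1,
    ("water", "bottle"): 6,
    ("plastic", "bottle"): 5,
})
WORDS = Counter({
    "computer": 10, "science": 12, "department": 6,
    "plastic": 7, "water": 9, "bottle": 12,
})

score = make_pmi_scorer(PAIRS, WORDS)
print(bracket("computer", "science", "department", score))
print(bracket("plastic", "water", "bottle", score))
```

With these toy counts, the first compound comes out left-branching and the second right-branching, because "plastic water" never occurs as a pair. A real system would need smoothing for sparse counts and a way to extend the decision to compounds longer than three nouns.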
Our evaluation shows that preprocessing reduces the number of syntactic units by 62%, and that the accuracy of the structural analysis of compound nouns is 77.9%.