Cross-Language Information Retrieval (CLIR) refers to retrieval in which the query and the document collection are in different languages. Query translation is a less expensive and more practical approach to CLIR than full document translation.
In this thesis, we propose a Korean-English query translation method that combines a bilingual dictionary-based method with a parallel corpora-based method. In the dictionary-based method, we present a technique that reduces translation ambiguity using a multilingual ontology and co-occurrence statistics. The parallel corpora-based method supplements the English query when the Korean query is not found in the dictionary or when the quality of the dictionary-based translation is poor. As the first step of the parallel corpora-based module, the original query is expanded by retrieval against a large Korean document collection. An English query is then generated by retrieving from the parallel corpora with the expanded Korean queries.
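As a rough illustration of the co-occurrence part of the dictionary-based step, the sketch below picks one English candidate per Korean term so that the chosen candidates co-occur as often as possible. It is only a minimal sketch under stated assumptions, not the method of the thesis: the function name `select_translations`, the toy dictionary, and the co-occurrence counts are all hypothetical, the exhaustive search over candidate combinations is a simplification, and the multilingual ontology is left out entirely. Terms without a dictionary entry are returned separately, since those are the ones the parallel corpora-based module would have to handle.

```python
from itertools import product

# Hypothetical toy resources, invented for illustration only.
TOY_DICTIONARY = {
    "은행": ["bank", "ginkgo"],
    "대출": ["loan", "lending"],
}
TOY_COOCCURRENCE = {
    frozenset(("bank", "loan")): 120,
    frozenset(("bank", "lending")): 45,
}


def select_translations(query_terms, dictionary, cooccurrence):
    """Choose one English candidate per Korean term by total pairwise co-occurrence.

    Returns (chosen_translations, terms_missing_from_dictionary).
    """
    covered = [t for t in query_terms if dictionary.get(t)]
    missing = [t for t in query_terms if not dictionary.get(t)]

    best, best_score = (), -1
    # Exhaustively score every combination of candidates (fine for short queries).
    for combo in product(*(dictionary[t] for t in covered)):
        score = sum(
            cooccurrence.get(frozenset((a, b)), 0)
            for i, a in enumerate(combo)
            for b in combo[i + 1:]
        )
        if score > best_score:
            best, best_score = combo, score
    return list(best), missing


if __name__ == "__main__":
    chosen, missing = select_translations(
        ["은행", "대출", "블록체인"], TOY_DICTIONARY, TOY_COOCCURRENCE
    )
    print(chosen, missing)
```

With this toy data, the ambiguous term 은행 resolves to "bank" and not "ginkgo" because "bank" co-occurs with the candidates for 대출, while 블록체인 is reported as missing from the dictionary.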
Experiments were carried out with the TREC-6 collections and English CLIR topics. The results show that the proposed system compensates for the weaknesses of each method and performs better when the queries from the two methods are merged effectively.