A parallel corpus is a set of texts in two or more languages that share the same content. Studying parallel corpora yields linguistic resources such as bilingual dictionaries, bilingual grammars, and translation examples. Alignment is the task of establishing correspondences between matching elements in a parallel corpus. Methods that rely on collocation probabilities and relative positions, developed for English/French alignment, do not apply directly to Korean/English. The matching units of Korean and English vary more, and word order is not a reliable clue.
This thesis presents a method that overcomes these two problems of Korean/English alignment. To address the unit-matching problem, we extend the alignment unit from words to phrases. To recover a usable position clue, we associate Korean function words with positions in the English sentence.
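To make phrase-level units concrete, here is a minimal sketch. It assumes, hypothetically, that a phrase is a run of up to `max_len` contiguous Korean tokens and that each phrase links to its single most probable English word under a given score table `phrase_prob`. Dynamic programming picks the segmentation with the highest total log-probability. Because each phrase chooses its English partner independently, the links are not forced to follow English word order. The function name, the romanized toy sentence, the probabilities, and the `floor` value for unseen pairs are illustrative assumptions. The sketch does not model the thesis's position clues from Korean function words and is not the thesis's algorithm.

```python
import math


def segment_and_align(korean, english, phrase_prob, max_len=3, floor=1e-6):
    """Hypothetical sketch: split `korean` into contiguous phrases of at most
    `max_len` tokens, linking each phrase to the English word that maximizes
    phrase_prob[(phrase, english_word)]. Unseen pairs get probability `floor`.
    Assumes `english` is non-empty. Returns (log_score, links)."""
    n = len(korean)
    # best[i] = (best log-score covering korean[:i], backpointer)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for i in range(1, n + 1):
        for k in range(1, min(max_len, i) + 1):
            phrase = " ".join(korean[i - k:i])
            # Each phrase picks its best English partner independently,
            # so links may cross (no monotone word-order assumption).
            e_best, p_best = max(
                ((e, phrase_prob.get((phrase, e), floor)) for e in english),
                key=lambda pair: pair[1],
            )
            score = best[i - k][0] + math.log(p_best)
            if score > best[i][0]:
                best[i] = (score, (i - k, phrase, e_best))
    # Follow backpointers from the end to recover the phrase links.
    links = []
    i = n
    while i > 0:
        start, phrase, e = best[i][1]
        links.append((phrase, e))
        i = start
    links.reverse()
    return best[n][0], links


# Toy example (romanized Korean, made-up probabilities).
korean = ["hakgyo", "e", "gassda"]
english = ["went", "to", "school"]
phrase_prob = {
    ("hakgyo e", "school"): 0.6,
    ("hakgyo", "school"): 0.7,
    ("e", "to"): 0.5,
    ("gassda", "went"): 0.8,
}
score, links = segment_and_align(korean, english, phrase_prob)
print(links)  # [('hakgyo e', 'school'), ('gassda', 'went')]
```

Here the two-token phrase "hakgyo e" (score 0.6 × 0.8 = 0.48) beats the word-by-word split (0.7 × 0.5 × 0.8 = 0.28). Note that summing log-probabilities favors fewer, longer phrases. A real model would need to account for that bias.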
We use the EM algorithm to estimate the parameters of our model and propose an alignment algorithm based on dynamic programming. Experiments were carried out on 253,000 English words and their Korean translations. The proposed model achieves 68.7% accuracy at the phrase level, and the bilingual dictionary induced from the alignment is 89.2% accurate.
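The abstract does not spell out the model's parameters. To illustrate the E-step/M-step pattern of EM parameter estimation, the sketch below swaps in IBM Model 1, a standard word-level lexical model without a NULL word. It is not the thesis's phrase-and-position model. The function name `train_model1` and the romanized toy corpus are hypothetical.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """Stand-in sketch (IBM Model 1, no NULL word): estimate lexical
    probabilities t[(k, e)] ~ P(k | e) by EM. `pairs` is a list of
    (korean_tokens, english_tokens)."""
    k_vocab = {k for ks, _ in pairs for k in ks}
    uniform = 1.0 / len(k_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(k, e)
        total = defaultdict(float)  # expected counts c(e)
        # E-step: spread each Korean token's mass over the English words
        # in its sentence, in proportion to the current t(k | e).
        for ks, es in pairs:
            for k in ks:
                z = sum(t[(k, e)] for e in es)
                for e in es:
                    frac = t[(k, e)] / z
                    count[(k, e)] += frac
                    total[e] += frac
        # M-step: renormalize expected counts per English word.
        t = defaultdict(float, {(k, e): c / total[e] for (k, e), c in count.items()})
    return t


# Toy parallel corpus (romanized Korean, hypothetical).
pairs = [
    (["geu", "jip"], ["the", "house"]),
    (["geu", "chaek"], ["the", "book"]),
    (["han", "chaek"], ["a", "book"]),
]
print(train_model1(pairs, iterations=1)[("geu", "the")])  # 0.5
t = train_model1(pairs, iterations=10)
```

After one iteration t(geu | the) rises from the uniform 0.25 to 0.5. With more iterations, probability mass concentrates on the co-occurring pairs geu/the, jip/house, and chaek/book.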