Linguistic knowledge can be obtained from a bilingual corpus by using alignment techniques. Bilingual alignment is harder for languages with different structures than for languages with similar structures.
In this thesis, I present a bilingual alignment technique for extracting compound nouns from languages with different structures, specifically from an English-Korean bilingual corpus.
Because the words of a compound noun occupy neighboring positions in both English and Korean sentences, I exploit this property: at the estimation stage, I use only content words and combine neighboring words to form compound noun candidates. At the initial stage, the probability of simple word translation is calculated from cooccurrence information. At subsequent stages, the probability of compound word translation is calculated from the previous simple word translation probabilities and cooccurrence information, and it is updated repeatedly using a reestimation method.
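As a rough illustration of the procedure described above, the following Python sketch builds candidates from neighboring content words, initializes translation probabilities from sentence-level cooccurrence counts, and refines them iteratively. It is a minimal sketch under stated assumptions, not the method used in the thesis: the function names (`candidates`, `cooccurrence_probs`, `reestimate`) and the `max_len` parameter are hypothetical, the input is assumed to be sentence-aligned pairs already reduced to content words, and the refinement step substitutes a generic IBM Model 1 style expectation-maximization update because the exact reestimation formula is not given here.

```python
from collections import Counter
from itertools import product


def candidates(content_words, max_len=2):
    """Single words plus runs of up to max_len neighboring content words."""
    units = []
    for n in range(1, max_len + 1):
        for i in range(len(content_words) - n + 1):
            units.append(tuple(content_words[i:i + n]))
    return units


def cooccurrence_probs(pairs, max_len=2):
    """Initial p(target unit | source unit) from sentence-level cooccurrence.

    Assumption: plain relative frequency of cooccurrence counts.
    """
    co = Counter()
    src_total = Counter()
    for src, tgt in pairs:
        src_units = set(candidates(src, max_len))
        tgt_units = set(candidates(tgt, max_len))
        for s, t in product(src_units, tgt_units):
            co[(s, t)] += 1
            src_total[s] += 1
    return {(s, t): c / src_total[s] for (s, t), c in co.items()}


def reestimate(pairs, probs, max_len=2):
    """One refinement pass (assumption: IBM Model 1 style EM update)."""
    counts = Counter()
    totals = Counter()
    for src, tgt in pairs:
        src_units = set(candidates(src, max_len))
        tgt_units = set(candidates(tgt, max_len))
        for t in tgt_units:
            # Share each target unit among the source units of the same
            # sentence pair, in proportion to the current probabilities.
            z = sum(probs[(s, t)] for s in src_units)
            for s in src_units:
                frac = probs[(s, t)] / z
                counts[(s, t)] += frac
                totals[s] += frac
    return {(s, t): c / totals[s] for (s, t), c in counts.items()}
```

Calling `cooccurrence_probs` once and then `reestimate` several times on the same sentence pairs mirrors the initial and subsequent stages described above. Units that cooccur consistently across sentence pairs gain probability mass with each pass, while chance pairings lose it.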
The experiment was carried out on the UR Final Act English-Korean parallel corpus. A total of 2,290 translation pairs were acquired, with a precision of 74%.