Translation selection is the process of selecting, from the set of target language words corresponding to a source language word, the one that conveys the correct sense of the source word and yields a more fluent target sentence. It is a key problem in machine translation because translation quality varies significantly with the result of translation selection.
A source language word has various senses, and each sense can be mapped to multiple target words. Based on this `word-to-sense and sense-to-word' relationship between a source word and its translations, we propose to select a translation in two steps: disambiguation of the source word sense, followed by selection of a target word. Dividing translation selection into these two sub-problems allows us to select translations using automatically obtained knowledge. Knowledge for sense disambiguation and word selection is extracted from a mono-bilingual dictionary and target language monolingual corpora. From the dictionary, we extract clue words for sense disambiguation, frequency information for senses, sense-to-word mappings, and syntactic relation mapping information. From the target language corpora, we extract target word co-occurrence statistics. For translation selection, we introduce three measures: sense preference and sense probability for sense disambiguation, and word probability for word selection. The first is based on knowledge from the bilingual dictionary, and the other two are calculated using statistics from a target language corpus.
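As an illustration of the two-step procedure described above, the following is a minimal sketch, not the actual formulation of the measures. The scoring formulas, the additive combination of sense preference and sense probability, the names `Sense`, `select_translation`, `SENSES`, and `COOC`, and all toy data (romanized placeholder target words and counts) are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass
class Sense:
    name: str
    clues: set      # clue words for this sense (from the dictionary)
    freq: float     # relative sense frequency (from the dictionary)
    targets: list   # target words mapped to this sense


def sense_preference(sense, context):
    # Dictionary-based score (assumed form): sense frequency boosted by
    # the number of clue words that appear in the source context.
    return sense.freq * (1 + len(sense.clues & set(context)))


def word_probability(word, context_targets, cooc, total):
    # Corpus-based score (assumed form): relative co-occurrence frequency
    # of a target word with the target-side context words.
    return sum(cooc.get((word, c), 0) for c in context_targets) / total


def sense_probability(sense, context_targets, cooc, total):
    # Corpus-based score (assumed form): co-occurrence mass of all target
    # words that belong to the sense.
    return sum(word_probability(w, context_targets, cooc, total)
               for w in sense.targets)


def select_translation(senses, context, context_targets, cooc):
    total = sum(cooc.values()) or 1
    # Step 1: disambiguate the source word sense.
    # Adding the two sense measures is an assumption of this sketch.
    best = max(senses, key=lambda s: sense_preference(s, context)
               + sense_probability(s, context_targets, cooc, total))
    # Step 2: select the target word within the chosen sense.
    return max(best.targets,
               key=lambda w: word_probability(w, context_targets, cooc, total))


# Toy knowledge for the source word "bank" (invented for illustration).
SENSES = [
    Sense("financial", {"money", "deposit"}, 0.7, ["eunhaeng"]),
    Sense("riverside", {"river", "water"}, 0.3, ["duk", "gangduk"]),
]
COOC = {("eunhaeng", "don"): 8, ("gangduk", "gang"): 5, ("duk", "gang"): 2}

print(select_translation(SENSES, ["river"], ["gang"], COOC))
print(select_translation(SENSES, ["deposit", "money"], ["don"], COOC))
```

In this sketch the dictionary contributes only to the sense decision, while the target language statistics contribute to both decisions, which mirrors the division of knowledge sources stated above.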
We applied our method to English-to-Korean translation. Translations of all content words (nouns, verbs, adjectives and adverbs) were obtained and evaluated. To exclude any human intervention during evaluation, we extracted the evaluation sets from bilingual corpora using a bilingual dictionary. The experiments were conducted with varying combinations of clues, and our results were compared with those of previous methods. The experiments show that our method outperforms previous methods by 9.47% to 28.35%, even though it uses only knowledge extracted from easily obtainable resources.