This paper describes corpus-based modality generation in a Korean synthesizer. Modalities may be expressed by modality morphemes such as auxiliary verbs and verb endings. To form a complete predicate, these morphemes are concatenated with a main-verb stem and arranged in a Korean-specific modality order, which, mathematically, is neither a linear order nor a partial order. To lexicalize a modality, the synthesizer must choose the best of several candidate morphemes whose meanings are very similar, because each candidate differs subtly from the others in stylistic naturalness.
To cope with these difficulties, we propose corpus-based modality generation, in which a large corpus is analyzed to acquire reliable linguistic knowledge about modalities. The corpus analysis proceeds in three steps. First, auxiliary verbs are classified into modality groups according to their modal meanings and grammatical functions. Second, a representative of each modality group is selected, mainly on the basis of its frequency in the corpus. Third, the corpus-based ordering relation among the modality groups is transformed into a partial order by removing some pairwise orderings, and topological sorting then derives a linear modality order that preserves as much of the observed ordering information as possible.
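The third step can be illustrated with a minimal Python sketch. The edge-removal criterion here is an assumption, not necessarily the one used in this work: for each pair of groups observed in both orders, only the majority direction is kept (exact ties are dropped), and any remaining cycle is broken by deleting its least frequent edge. The group labels `A` to `D` and their counts are hypothetical, not corpus data, and the helper names `find_cycle`, `linear_modality_order`, and `coverage` are illustrative. The transitive closure of the resulting acyclic relation is a strict partial order, and Kahn's topological sort then yields one linear order compatible with it.

```python
from collections import defaultdict
import heapq


def find_cycle(edges):
    """Return the edges of one directed cycle, or None if `edges` is acyclic."""
    graph = defaultdict(list)
    for a, b in edges:
        graph[a].append(b)
    color = defaultdict(int)  # 0 = unvisited, 1 = on DFS stack, 2 = finished
    parent = {}

    def dfs(u):
        color[u] = 1
        for v in graph[u]:
            if color[v] == 1:
                # Back edge u -> v closes a cycle: follow parent links from u up to v.
                cycle, x = [(u, v)], u
                while x != v:
                    cycle.append((parent[x], x))
                    x = parent[x]
                return cycle
            if color[v] == 0:
                parent[v] = u
                found = dfs(v)
                if found:
                    return found
        color[u] = 2
        return None

    for node in list(graph):
        if color[node] == 0:
            found = dfs(node)
            if found:
                return found
    return None


def linear_modality_order(counts):
    """counts[(a, b)] = how often group a precedes group b (hypothetical input)."""
    # Assumed step 1: keep only the majority direction of each pair; ties are dropped.
    edges = {}
    for (a, b), n in counts.items():
        if a != b and n > counts.get((b, a), 0):
            edges[(a, b)] = n
    # Assumed step 2: break remaining cycles by deleting the least frequent edge.
    while (cycle := find_cycle(edges)) is not None:
        del edges[min(cycle, key=lambda e: (edges[e], e))]
    # Step 3: Kahn's topological sort; the heap makes tie-breaking deterministic.
    nodes = {g for pair in counts for g in pair}
    indegree = {g: 0 for g in nodes}
    succ = defaultdict(list)
    for a, b in edges:
        succ[a].append(b)
        indegree[b] += 1
    ready = [g for g in nodes if indegree[g] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        g = heapq.heappop(ready)
        order.append(g)
        for h in succ[g]:
            indegree[h] -= 1
            if indegree[h] == 0:
                heapq.heappush(ready, h)
    return order, edges


def coverage(order, counts):
    """Fraction of observed pairwise occurrences that agree with `order`."""
    pos = {g: i for i, g in enumerate(order)}
    total = sum(counts.values())
    return sum(n for (a, b), n in counts.items() if pos[a] < pos[b]) / total


# Hypothetical pairwise counts: A->B->C->A is a cycle, and C/D is an exact tie.
counts = {
    ("A", "B"): 50, ("B", "A"): 5,
    ("B", "C"): 40,
    ("C", "A"): 3,
    ("C", "D"): 30, ("D", "C"): 30,
}
order, kept = linear_modality_order(counts)
print(order)                              # ['A', 'B', 'C', 'D']
print(kept)                               # {('A', 'B'): 50, ('B', 'C'): 40}
print(round(coverage(order, counts), 3))  # 0.759
```

Because finding the smallest set of edges whose removal makes a directed graph acyclic (the minimum feedback arc set problem) is NP-hard, this greedy heuristic does not guarantee maximal coverage; `coverage` only reports how much of the observed ordering evidence the derived linear order agrees with.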
Finally, a performance evaluation shows that the corpus-based approach can substantially improve a conventional rule-based Korean synthesizer.