Recently, there has been considerable research on the use of speech as a medium for man-machine communication. Much of this effort has been directed toward speech output systems that can respond appropriately, with high-quality speech, to users' spoken inquiries. Such systems in particular need dialogue-style prosody control to make the synthesized speech sound natural.
This paper proposes an intonation model that generates Korean dialogue-style intonation. The model uses a new intonation labeling method, referred to as the RFC (Rise/Fall/Connection) model, which represents intonation patterns with 7 F0 labels and 3 pause labels, and it generates intonation patterns with a stochastic method. This paper also presents an efficient method for selecting the F0 feature parameters used in F0 contour generation, which uses clustered feature parameters instead of the mean values of the feature parameters.
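To illustrate why clustered feature parameters can outperform mean values, the sketch below assumes that the training values of one F0 feature parameter (for example, the rise amplitude of a single F0 label) fall into more than one distinct group. When that happens, the mean can land between the groups and represent none of them. The abstract does not specify the clustering algorithm, the number of clusters, or how a cluster is chosen, so this sketch assumes plain one-dimensional k-means and picks the centroid of the most populous cluster. The function names `kmeans_1d` and `select_parameter` and the sample values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def kmeans_1d(values, k, iters=100):
    """Plain 1-D k-means (Lloyd's algorithm); an assumed stand-in for the paper's clustering."""
    data = sorted(values)
    k = min(k, len(data))
    # Deterministic initialisation: k evenly spaced order statistics of the data.
    centroids = [data[(i * (len(data) - 1)) // max(k - 1, 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in data:
            # Assign each value to its nearest centroid.
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Recompute centroids; keep the old centroid if a cluster becomes empty.
        new_centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return centroids, clusters


def select_parameter(values, k=2):
    """Hypothetical selection rule: centroid of the cluster holding the most samples."""
    centroids, clusters = kmeans_1d(values, k)
    largest = max(range(len(clusters)), key=lambda i: len(clusters[i]))
    return centroids[largest]


# Hypothetical rise amplitudes (Hz) observed for one F0 label; two distinct groups.
samples = [18, 20, 22, 19, 21, 58, 60, 62]
print("mean:", mean(samples))
print("clustered:", select_parameter(samples))
```

With these made-up values, the mean falls in the gap between the two groups, whereas the clustered choice returns a value typical of the larger group. A real system might instead choose among clusters using contextual features rather than cluster size; that choice is also an assumption here.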
With the proposed intonation model, we achieved 80.78% accuracy in F0 label generation and 96.66% accuracy in pause label generation. In addition, speech synthesized using clustered feature parameters was judged more natural than speech synthesized using the mean values of the feature parameters.