The speech waveform carries various kinds of information. Among them, voice individuality plays an important role in everyday speech communication between people. Voice conversion is a technique for changing or modifying speaker individuality: speech uttered by one speaker is transformed so that it sounds as if it had been spoken by another speaker. Conventional voice conversion methods use only the static characteristics of a speaker (the spectral envelope) and do not consider dynamic characteristics such as formant trajectories.
In this thesis, we propose a new voice conversion method based on the hidden Markov model (HMM) that exploits the dynamic characteristics of a speaker. Each HMM state represents part of the speaker information, namely a set of acoustically similar feature parameters. A speaker is modeled by state-dependent codebooks linked by a first-order stochastic Markov chain, and the mapping between speakers is defined separately in each state. We developed two voice conversion systems: one uses the source speaker's transition probabilities to obtain the optimal representation of the input speech, and the other uses the target speaker's transition probabilities to produce converted speech that is similar to the target speech.
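The idea of per-state codebook mapping driven by the source speaker's transition probabilities can be illustrated with a small sketch. It is not the thesis's implementation; it assumes (hypothetically) that each state holds a source codebook and a paired target codebook of equal size (for example, built from time-aligned training utterances), that the emission score of a frame in a state is a Gaussian-style score around the nearest source codeword, and that the state sequence is found by Viterbi decoding. All function names, the `variance` parameter, and the toy data are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def emission_log_score(frame: Vector, codebook: List[Vector], variance: float = 1.0) -> float:
    """Hypothetical emission score: isotropic Gaussian around the nearest codeword."""
    d = min(sq_dist(frame, c) for c in codebook)
    return -d / (2.0 * variance)


def viterbi(frames, log_init, log_trans, codebooks) -> List[int]:
    """Most likely state sequence under the source speaker's transition probabilities."""
    n, T = len(codebooks), len(frames)
    score = [[float("-inf")] * n for _ in range(T)]
    back = [[0] * n for _ in range(T)]
    for s in range(n):
        score[0][s] = log_init[s] + emission_log_score(frames[0], codebooks[s])
    for t in range(1, T):
        for s in range(n):
            prev = max(range(n), key=lambda p: score[t - 1][p] + log_trans[p][s])
            score[t][s] = (score[t - 1][prev] + log_trans[prev][s]
                           + emission_log_score(frames[t], codebooks[s]))
            back[t][s] = prev
    # Backtrack from the best final state.
    path = [max(range(n), key=lambda s: score[T - 1][s])]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def convert(frames, log_init, log_trans, src_codebooks, tgt_codebooks) -> Tuple[List[int], List[List[float]]]:
    """Decode states, then map each frame to the paired target codeword within its state."""
    states = viterbi(frames, log_init, log_trans, src_codebooks)
    converted = []
    for frame, s in zip(frames, states):
        k = min(range(len(src_codebooks[s])), key=lambda i: sq_dist(frame, src_codebooks[s][i]))
        converted.append(list(tgt_codebooks[s][k]))
    return states, converted


# Toy two-state example with 2-D features (illustrative data only).
src_codebooks = [[(0.0, 0.0), (1.0, 0.0)], [(5.0, 5.0), (6.0, 5.0)]]
tgt_codebooks = [[(10.0, 10.0), (11.0, 10.0)], [(20.0, 20.0), (21.0, 20.0)]]
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.8), math.log(0.2)], [math.log(0.2), math.log(0.8)]]
frames = [(0.1, 0.0), (0.9, 0.1), (5.2, 5.0), (6.0, 5.1)]

states, converted = convert(frames, log_init, log_trans, src_codebooks, tgt_codebooks)
print(states)     # [0, 0, 1, 1]
print(converted)  # [[10.0, 10.0], [11.0, 10.0], [20.0, 20.0], [21.0, 20.0]]
```

Restricting the codeword search to the decoded state is what distinguishes this from plain vector-quantization codebook mapping: the transition probabilities constrain which part of the codebook a frame may map to. The second system described above would instead use the target speaker's transition probabilities when choosing the output sequence; that variant is not sketched here.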
To evaluate the performance of the proposed systems, objective and subjective quality tests were carried out. Experimental results showed that the proposed systems outperform the conventional codebook mapping method based on vector quantization and succeed in changing speaker individuality reasonably well.