Speaker recognition is the process of automatically recognizing who is speaking on the basis of speaker-specific information contained in speech waves. This technique makes it possible to verify the identity of people accessing systems, that is, to control access by voice, in a variety of services. Most text-dependent speaker recognition methods assume that every part of an utterance carries an equal amount of speaker information, although different parts in fact contribute differently to recognition. To improve performance, we make efficient use of these recognition cues, that is, of the speaker information.
This thesis proposes a speaker recognition method that assigns a different importance to each basic portion of a sampled speech waveform. To quantify the speaker information contained in each frame of the speech signal, we use the F-ratio measure, a technique conventionally used to select suitable feature parameters in speaker recognition. The measured quantity is used as a weighting factor and incorporated into the scoring method for speaker recognition. To reject impostors efficiently in a speaker identification system, a post-processing algorithm is also proposed; it combines the scoring method for speaker identification with the speaker information measurement.
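As an illustration only, the frame-weighting idea can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis's actual formulation: it assumes time-aligned utterances of the same phrase, takes the F-ratio as the variance of the speaker means divided by the average within-speaker variance, and normalizes the per-frame values into weights. The function names (`f_ratio_per_frame`, `frame_weights`, `weighted_score`) and the array layout are hypothetical.

```python
import numpy as np

def f_ratio_per_frame(features):
    """Per-frame F-ratio (assumed definition: between-speaker variance of the
    speaker means divided by the mean within-speaker variance).

    features: array of shape (n_speakers, n_utterances, n_frames, n_dims),
    assumed time-aligned so that frame t is the same part of the phrase
    for every utterance. Returns an array of shape (n_frames,).
    """
    features = np.asarray(features, dtype=float)
    speaker_means = features.mean(axis=1)        # (speakers, frames, dims)
    between = speaker_means.var(axis=0)          # (frames, dims)
    within = features.var(axis=1).mean(axis=0)   # (frames, dims)
    eps = 1e-12                                  # guards against division by zero
    # Average over feature dimensions to get one value per frame.
    return (between / (within + eps)).mean(axis=-1)

def frame_weights(f_ratio):
    """Normalize per-frame F-ratios so the weights sum to one."""
    f_ratio = np.asarray(f_ratio, dtype=float)
    return f_ratio / f_ratio.sum()

def weighted_score(frame_scores, weights):
    """Combine per-frame scores (e.g. log-likelihoods) using the weights."""
    return float(np.dot(weights, frame_scores))
```

With uniform weights, `weighted_score` reduces to the plain average of the frame scores, which corresponds to the conventional assumption that every frame is equally informative.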
In speaker verification experiments, the proposed method reduced the equal error rate considerably compared with a conventional method that treats all speech segments as equally important. In speaker identification experiments, the proposed method achieved a recognition rate 28% higher, in relative terms, than the baseline system and was more robust to long-term variation. These results demonstrate that the proposed method measures speaker information effectively and is better suited to speaker recognition.
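For readers unfamiliar with the equal error rate used in the verification experiments, the following is a minimal sketch of how it is commonly estimated from trial scores. It is not the thesis's evaluation code. It assumes that higher scores indicate a better match to the claimed speaker; the function name `equal_error_rate` and its inputs are hypothetical.

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the equal error rate from verification trial scores.

    A claim is accepted when its score is >= the threshold. The threshold is
    swept over all observed scores, and the point where the false acceptance
    rate and the false rejection rate are closest is reported.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    best_gap, eer = np.inf, 1.0
    for th in thresholds:
        far = np.mean(impostor >= th)   # impostors wrongly accepted
        frr = np.mean(genuine < th)     # true speakers wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return float(eer)
```

A lower value means the genuine and impostor score distributions are better separated, which is the sense in which a scoring method can be said to reduce the equal error rate.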