Speaker identification selects, from among the enrolled speakers, the speaker whose model best matches the input speech. It is mainly used in telephone services because it requires only speech as its input.
In real environments, correct speaker identification is difficult for two main reasons. First, the number of enrolled speakers is large, so the subspace represented by one speaker model can overlap with the subspaces represented by other speaker models. Second, mismatches occur between speaker models and input speech because of insufficient training data, differences between training and testing environments, and the effects of noise. Normalisation and scoring methods that reduce these mismatches are therefore needed.
To address the overlapping of speaker subspaces, this thesis proposes a confidence measure based on significance testing, which is used to select candidates for the identification result. If the confidence value obtained for an input is greater than a predefined threshold, the identification system accepts the identification result. If the confidence value is less than the threshold for the client set, the system rejects the identification result and selects the proper candidates. To address mismatches between speaker models and input speech, this thesis also proposes a scoring method which, after candidate selection, eliminates the frames in which the selected candidates have a lower average rank. As a result, the normalised score of every speaker is calculated over the same selected frames.
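The decision rule and the frame elimination step described above can be illustrated with a minimal Python sketch. It is not the implementation used in this thesis, and it rests on several assumptions: the confidence value is supplied by the caller (the significance test itself is not reproduced), candidates are taken to be the speakers with the highest utterance scores, "lower average rank" is read as a worse rank among all speakers on that frame, and the names `decide`, `select_frames`, `rescore`, `n_candidates` and `keep_fraction` are hypothetical.

```python
def decide(confidence, utterance_scores, threshold=0.95, n_candidates=3):
    """Accept the top speaker if confidence exceeds the threshold,
    otherwise reject and return a candidate list (assumed: top-n by score)."""
    ranked = sorted(range(len(utterance_scores)),
                    key=lambda s: utterance_scores[s], reverse=True)
    if confidence > threshold:
        return "accept", ranked[:1]
    return "reject", ranked[:n_candidates]


def frame_ranks(frame_scores):
    """Rank every speaker on every frame; rank 0 is the best score.
    frame_scores[s][t] is the score of speaker s on frame t."""
    n_speakers, n_frames = len(frame_scores), len(frame_scores[0])
    ranks = [[0] * n_frames for _ in range(n_speakers)]
    for t in range(n_frames):
        order = sorted(range(n_speakers),
                       key=lambda s: frame_scores[s][t], reverse=True)
        for rank, speaker in enumerate(order):
            ranks[speaker][t] = rank
    return ranks


def select_frames(frame_scores, candidates, keep_fraction=0.8):
    """Keep the frames on which the candidates rank best on average.
    keep_fraction is a hypothetical tuning parameter."""
    ranks = frame_ranks(frame_scores)
    n_frames = len(frame_scores[0])
    avg_rank = [sum(ranks[c][t] for c in candidates) / len(candidates)
                for t in range(n_frames)]
    n_keep = max(1, round(keep_fraction * n_frames))
    best = sorted(range(n_frames), key=lambda t: avg_rank[t])[:n_keep]
    return sorted(best)


def rescore(frame_scores, candidates, kept):
    """Average each candidate's scores over the same kept frames."""
    return {c: sum(frame_scores[c][t] for t in kept) / len(kept)
            for c in candidates}


if __name__ == "__main__":
    # Toy per-frame scores for 3 speakers and 4 frames (made-up values).
    scores = [[-1.0, -1.0, -9.0, -1.0],
              [-2.0, -2.0, -1.0, -2.0],
              [-3.0, -3.0, -2.0, -3.0]]
    kept = select_frames(scores, candidates=[0, 1], keep_fraction=0.75)
    print(kept, rescore(scores, [0, 1], kept))
```

Because `select_frames` returns a single list of frame indices, every candidate is rescored over identical frames, which mirrors the property stated above that all speakers share the same selected frames.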
To verify whether the proposed confidence measure accepts and rejects correctly, the identification rate for all inputs is compared with the identification rate for those inputs that exceed the predefined confidence level. Inputs exceeding the predefined confidence level (0.95) show an identification rate that is, on average, 28.71 percent higher than that for all inputs.
To verify the candidate selection method, the identification rate for all inputs is compared with the probability that the input speaker exists among the selected candidates. This probability is, on average, 10.44 percent higher than the identification rate for all inputs.
The proposed scoring and normalisation method with candidates was compared with other scoring and normalisation methods. The proposed method shows an identification rate that is, on average, 2.78 percent higher than that of the conventional method when there are many client speakers and only a small amount of training data.