KAIST Library

Main bibliographic information
A fast speaker identification method using HMMs and phonetic GMMs = HMM과 음소별 GMM을 결합한 고속 화자식별 방법
Title / Author	A fast speaker identification method using HMMs and phonetic GMMs = HMM과 음소별 GMM을 결합한 고속 화자식별 방법 / Suk-Bong Kwon.
Publication	[Daejeon : Information and Communications University (한국정보통신대학교), 2004].

Holdings

Accession number	DM0000504
Location / Call number	Academic Cultural Complex (Cultural Hall), preservation stacks / ICU/MS04-77 2004
Status	Available (not for loan)

Abstract

This thesis proposes a fast text-independent speaker identification method using phonetic GMMs. The individual Gaussian components of a GMM can accurately represent the acoustic characteristics of a speaker, so the GMM is an effective speaker model under text-independent conditions. In text-dependent speaker identification, the content of the input speech is determined a priori. In this setting, using the hidden Markov model (HMM) as the speaker model gives better accuracy, since the HMM can model the temporal structure of the input speech as well as the speaker identity. Building a speaker GMM for text-independent speaker identification requires sufficient training data to estimate the GMM parameters precisely. On the other hand, the HMM-based text-independent speaker model does not demand as much training data to build the speaker HMMs. To combine the advantages of GMMs and HMMs in text-independent speaker identification, we propose a system architecture using phonetic GMMs. Speaker identification with phonetic GMMs uses three types of models: a speaker-independent phone HMM, a baseline speaker GMM, and a phonetic speaker GMM. The HMM is used to obtain the phone segmentation, the baseline GMM is used to obtain the N-best speakers from all registered speakers, and the phonetic GMM is finally used to identify the person speaking to the system. In the experiments, when the number of mixtures of the baseline GMM was increased to 320, we obtained an identification accuracy similar to that of the phonetic GMM with 14 mixtures for 45 phones, but identification took five times longer than with the phonetic GMM. Hence the phonetic GMMs reduce the identification time, but the number of parameters is much greater than that of the baseline GMM because three model types are used. This problem can be mitigated by tying phones into classes.
This is based on the fact that the likelihood of an input utterance is dominated by the vowel intervals, since they are much longer than the consonant intervals. In this thesis, we did not tie the vowels but tied the consonants into 4 classes. As a result, the phonetic class GMMs retain an identification accuracy similar to that of the phonetic GMMs, while the number of parameters is reduced by half.
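
The two-pass scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: it assumes diagonal-covariance GMMs stored as lists of `(weight, mean, variance)` tuples, and it takes the per-frame phone labels as given rather than running a phone HMM. The function names (`log_gauss_diag`, `gmm_loglik`, `identify`) and the toy speakers and features are all hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM given as [(weight, mean, var), ...]."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    # log-sum-exp over the mixture components for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def identify(frames, phone_labels, baseline_gmms, phonetic_gmms, n_best=2):
    """Return (winner, shortlist) for one utterance.

    frames        : list of feature vectors
    phone_labels  : one phone label per frame (assumed to come from the
                    speaker-independent phone HMM segmentation)
    baseline_gmms : {speaker: gmm}
    phonetic_gmms : {speaker: {phone: gmm}}
    """
    # Pass 1: score every registered speaker with the baseline GMM, keep the N best.
    first = {s: sum(gmm_loglik(x, g) for x in frames)
             for s, g in baseline_gmms.items()}
    shortlist = sorted(first, key=first.get, reverse=True)[:n_best]
    # Pass 2: rescore only the shortlisted speakers, using for each frame
    # the GMM of the phone that frame was assigned to.
    second = {
        s: sum(gmm_loglik(x, phonetic_gmms[s][p])
               for x, p in zip(frames, phone_labels))
        for s in shortlist
    }
    return max(second, key=second.get), shortlist

# Toy data (hypothetical): 1-D features, two phones, three speakers.
baseline_gmms = {
    "spk1": [(1.0, [0.0], [1.0])],
    "spk2": [(1.0, [0.5], [1.0])],
    "spk3": [(1.0, [5.0], [1.0])],
}
phonetic_gmms = {
    "spk1": {"a": [(1.0, [-1.0], [0.25])], "s": [(1.0, [1.0], [0.25])]},
    "spk2": {"a": [(1.0, [1.0], [0.25])], "s": [(1.0, [-1.0], [0.25])]},
    "spk3": {"a": [(1.0, [5.0], [0.25])], "s": [(1.0, [5.0], [0.25])]},
}
frames = [[1.0], [1.1], [-0.9], [-1.0]]
phone_labels = ["a", "a", "s", "s"]

winner, shortlist = identify(frames, phone_labels, baseline_gmms, phonetic_gmms)
print(winner, shortlist)
```

On this toy data the baseline pass ranks spk1 first and drops spk3, but the phone-dependent rescoring selects spk2, whose per-phone models match the frames more closely.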

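GMMs (Gaussian mixture models) and HMMs (hidden Markov models) have been widely used as effective models in speaker recognition systems. GMMs perform well in text-independent speaker identification systems, and HMMs are used effectively in text-dependent ones. Recently, to obtain better performance, recognition rates have been improved either by using additional information beyond GMMs and HMMs or by applying various other algorithms in the identification process. This thesis proposes a fast text-independent speaker recognition method that combines the advantages of GMMs and HMMs. The proposed system consists of three main subsystems. The speaker identification system using the baseline GMM produces a list of the N-best speakers, and the HMM-based phone recognizer produces phone segmentation information for the input speech signal. Using the N-best speakers and the phone segmentation information, a rescoring pass is carried out with phonetic GMMs, which are built to represent each speaker's per-phone acoustic characteristics more precisely, to obtain the final identification result. In the experiments, the phonetic GMM with 14 mixtures performed comparably to the baseline GMM with 320 mixtures and responded about five times faster. However, it used far more parameters than the baseline GMM. In speaker identification, the likelihood of the discriminant function is generally dominated by the vowel intervals, and in actual input speech the vowel intervals are more than twice as long as the consonant intervals. Exploiting this property, the consonants in the phonetic GMM were divided into four groups by phonological characteristics to build phonetic class GMMs. Speaker identification with the phonetic class GMMs showed almost no difference in accuracy from the phonetic GMMs, while the parameter memory was reduced by about half. The identification speed was the same as that of the system using phonetic GMMs. Thus, rather than increasing the number of GMM mixtures to improve speaker identification performance, building phonetic GMMs made it possible to implement a system that achieves good identification performance in less time.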
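
The parameter trade-off described above can be checked with rough arithmetic. This is a minimal sketch, assuming diagonal-covariance Gaussians and a 39-dimensional feature vector; neither is stated in the abstract, and `FEAT_DIM` and `gmm_param_count` are illustrative names. It counts per-speaker GMM parameters only and ignores the shared phone HMM.

```python
def gmm_param_count(n_mixtures, feat_dim):
    """Parameters of a diagonal-covariance GMM:
    per mixture, one mean vector, one variance vector, and one weight."""
    return n_mixtures * (2 * feat_dim + 1)

FEAT_DIM = 39  # assumption: a common MFCC + delta + delta-delta size, not from the thesis

# Figures taken from the abstract: 320 baseline mixtures; 14 mixtures for 45 phones.
baseline = gmm_param_count(320, FEAT_DIM)       # one baseline GMM per speaker
phonetic = 45 * gmm_param_count(14, FEAT_DIM)   # 45 phone GMMs per speaker

print(baseline, phonetic, phonetic / baseline)

# Gaussians evaluated per frame when scoring one candidate speaker:
# the baseline GMM touches all of its mixtures, a phonetic GMM only
# those of the phone the frame was assigned to.
print(320, 14)
```

Under these assumptions a speaker's 45 phonetic GMMs hold about twice as many parameters as one 320-mixture baseline GMM, while each frame is scored against 14 Gaussians per candidate instead of 320. The reported fivefold speed-up covers the whole system, so it need not match this per-frame ratio.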

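Other bibliographic information

Call number	ICU/MS04-77 2004
Physical description	vii, 44 p. : illustrations ; 26 cm
Language	English
General note	Author's name in Korean: 권석봉; advisor's name in English: Hoi-Rin Kim; advisor's name in Korean: 김회린
Dissertation note	Thesis (Master's) - Information and Communications University (한국정보통신대학원대학교) : School of Engineering,
Bibliography note	References : p. 39-41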
