Korea Advanced Institute of Science and Technology (KAIST) Library

Main bibliographic information
Diphone단위의 hidden Markov model을 이용하는 음성인식 시스템의 성능 향상에 관한 연구 = A study on the performance improvement of the speech recognition system based on the diphone-level hidden Markov model
Title / Author	Diphone단위의 hidden Markov model을 이용하는 음성인식 시스템의 성능 향상에 관한 연구 = A study on the performance improvement of the speech recognition system based on the diphone-level hidden Markov model / 박현상.
Publication	[Daejeon : Korea Advanced Institute of Science and Technology, 1993].
Online Access	Restricted (full text viewable after login)

Holdings information
Registration number	8004122
Location / Call number	Academic Cultural Complex (Cultural Hall), preservation stacks / MEE 93097
Status	Available (not for loan)

Abstract

This thesis studies speech units suited to recognition of Korean. Among well-known recognition units such as the phoneme, syllable, triphone, and diphone, the work concentrates on the diphone. For better speech recognition, the choice of recognition unit should account for co-articulatory effects within an utterance. One way to model such effects is to use larger units of speech. The diphone was found to be a good recognition unit because it models the transitional regions between phonemes explicitly. When diphones are used, stationary phoneme models may be inserted between them. Isolated-word recognition was simulated on a 74-word database spoken by seven male speakers. The best performance was obtained when the transition regions between phonemes were modeled by a two-state HMM and the stationary phoneme regions, excluding /b/, /d/, and /g/, by a one-state HMM. Merging rarely occurring diphone units raised the recognition rate from 93.98% to 96.29%. In the training phase, clustering the feature vectors assigned to an HMM state together with the left-adjacent feature vector raised the recognition rate from 96.29% to 96.76%. In addition, a minimum smoothing technique was proposed to smooth a poorly modeled HMM with a well-trained one. With this technique, a recognition rate of 97.22% was obtained after merging some diphone units.
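The unit layout described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function `to_units`, the `sil` boundary symbol, the `left-right` naming of diphones, and the romanized phoneme labels are all assumptions, as is treating silence like /b/, /d/, and /g/ (no stationary model).

```python
# Hypothetical sketch: all names here are illustrative, not taken from the thesis.
SILENCE = "sil"  # assumed symbol for the silence before and after a word

# Phonemes that get no stationary model. /b/, /d/, /g/ come from the abstract;
# adding silence to this set is an assumption of this sketch.
NO_STATIONARY = {"b", "d", "g", SILENCE}


def to_units(phonemes):
    """Expand a phoneme sequence into (unit name, HMM state count) pairs.

    Each transition between neighbouring phonemes becomes a 2-state diphone
    unit; a 1-state stationary unit is inserted between two diphones unless
    the shared phoneme is in NO_STATIONARY.
    """
    padded = [SILENCE] + list(phonemes) + [SILENCE]
    units = []
    for left, right in zip(padded, padded[1:]):
        if left not in NO_STATIONARY:
            units.append((left, 1))            # stationary region of `left`
        units.append((f"{left}-{right}", 2))   # transition from `left` to `right`
    return units


if __name__ == "__main__":
    for name, n_states in to_units(["g", "a", "m"]):
        print(name, n_states)
```

Concatenating the per-unit state counts gives the length of the left-to-right word model that would be built from these units under the same assumptions.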

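This thesis studies speech recognition units suited to Korean speech recognition. Various units are used in speech recognition, such as the phoneme, syllable, word, triphone, and diphone; this thesis mainly studies recognition based on the diphone. A good speech recognition system requires a recognition unit that can handle coarticulation within the spoken utterance. A unit conceptually larger than the phoneme is therefore needed, and the diphone can be a good recognition unit because it models the transition region between phonemes. When the diphone is used as the recognition unit, stationary phoneme regions may also be inserted between diphones. In isolated-word recognition experiments on 74 words, the highest recognition rate was obtained when diphones were represented by 2-state HMMs and phoneme models, excluding plosives and silence, by 1-state HMMs. In this configuration, merging rarely occurring diphones into a single unit improved the recognition rate from 93.98% to 96.29%. In addition, clustering the feature vectors belonging to an HMM state together with the nearest feature vector on the left during training increased the recognition rate from 96.29% to 96.76%. Furthermore, a minimum smoothing technique was proposed for smoothing an insufficiently trained HMM with a sufficiently trained one. This method yielded a recognition rate of up to 97.22%.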
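The abstract does not state how rare diphones were selected or merged. As one possible reading, the sketch below assumes a simple occurrence-count threshold over the training lexicon and a single shared label for every diphone below it. The function name `merge_rare_diphones`, the `min_count` parameter, the `merged` label, and the toy lexicon are all hypothetical.

```python
from collections import Counter


def merge_rare_diphones(lexicon, min_count, merged_label="merged"):
    """Relabel diphones that occur fewer than `min_count` times.

    `lexicon` maps each word to its list of diphone unit names. Every diphone
    whose total count across the lexicon is below `min_count` is replaced by
    `merged_label`, so all rare diphones share one model. The thresholding
    rule is an assumption of this sketch, not the thesis procedure.
    """
    counts = Counter(d for units in lexicon.values() for d in units)
    return {
        word: [d if counts[d] >= min_count else merged_label for d in units]
        for word, units in lexicon.items()
    }


if __name__ == "__main__":
    # Toy lexicon with made-up unit names, for illustration only.
    toy = {
        "w1": ["a-b", "b-c"],
        "w2": ["c-d", "d-e"],
        "w3": ["a-b", "b-c"],
    }
    merged = merge_rare_diphones(toy, min_count=2)
    print(merged)
```

Sharing one model among rare units trades some discriminability for more training data per model, which is the usual motivation for this kind of merging.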

Other bibliographic information

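Call number	MEE 93097
Physical description	iv, 58 p. : illustrations ; 26 cm
Language	Korean
General note	Author's name in English: Hyun-Sang Park; Advisor's name in Korean: 은종관; Advisor's name in English: Chong-Kwan Un
Thesis	Thesis (Master's) - Korea Advanced Institute of Science and Technology : Department of Electrical and Electronic Engineering,
Bibliography note	References: p. 54-57
Subject	Markov processes. Speech perception. Phonemics. Markov processes. --Science and Technology Terminology Thesaurus; Speech recognition. --Science and Technology Terminology Thesaurus; Phoneme. --Science and Technology Terminology Thesaurus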
