Main bibliographic information
HMM에 기반한 음성인식에서 음향학적 문맥 정보의 결합 = The use of acoustic contextual information in HMM-based speech recognition
Title / Author HMM에 기반한 음성인식에서 음향학적 문맥 정보의 결합 = The use of acoustic contextual information in HMM-based speech recognition / 최인정 (In-Jeong Choi).
Publication [Daejeon : Korea Advanced Institute of Science and Technology, 1999].

Holdings

Registration number

8009886

Location / Call number

Academic Cultural Complex (Cultural Hall), preservation stacks

DEE 99031

Item status

Available (not for loan)

Abstract

Acoustic and phonetic contexts are very important in speech recognition. Achieving the highest possible levels of speech recognition performance means making efficient use of all the contextual information. However, current hidden Markov model (HMM) technology primarily approaches the problem from a top-down perspective by modeling phonetic context. In this dissertation, we present various methods to incorporate acoustic contextual information in HMM-based speech recognition. To evaluate the performance of the proposed methods, we use three kinds of speech databases.
First, we propose a variable information rate (VIR) model which applies different information rates to all basic portions of the sampled speech waveform. As a special case of the VIR model, we use context-dependent state weights as a scaling factor to reflect the informational importance within each portion of the signal. The discriminating power of the individual states is evaluated based on the acoustic context. Context-dependent state weights are intended to reduce the influence of non-characteristic feature vectors and to raise the influence of typical feature vectors on the observation probability of an HMM state. The additional parameters are estimated with the generalized probabilistic descent (GPD) training algorithm. The proposed method does not increase the complexity of the recognizer and can be implemented with minor modification of the conventional recognition algorithm. In speaker-independent speech recognition experiments, the proposed method performs considerably better than the conventional method, which treats all speech segments with the same importance.
Second, we propose a new approach that uses multi-layer perceptrons (MLPs) to estimate context-dependent state weights. MLPs have a very flexible architecture which can easily accommodate contextual inputs, and we exploit this property to obtain state weights that reflect a wider acoustic context. In this approach, MLP outputs are used as state-dependent weights of HMM log state-likelihoods. The MLP is trained in two steps. In the first step, context-dependent state weights with explicit context classification are used as the desired outputs, and the MLP is trained with the error back-propagation (EBP) algorithm. In the second step, the MLP parameters are adapted by discriminative training in order to further improve the discriminability of competing HMM states. The proposed method reduces the error rate considerably compared with the conventional HMM in three kinds of speech recognition tasks.
Third, we propose a novel method to incorporate acoustic contextual information into HMM-based speech recognizers. Conventional HMMs hardly take acoustic contextual information into account except through higher-order derivative features, and thus the possible correlation of successive acoustic vectors is overlooked. We investigate the effects of contextual inputs in HMM-based speech recognizers, and these effects are incorporated into contextual information parameters under simplifying assumptions. The contextual information parameters are shown to measure both the degree of correlation of the input features and the boundary uncertainties between HMM states. The parameter estimation and recognition algorithms can be implemented without extensive modification or increased complexity. Experimental results show that the recognizer with contextual information performs much better than the conventional HMM speech recognizer.
Finally, we propose a VIR analysis in which the amount of information within a basic period of the speech signal determines the number of features to be extracted. We also formulate HMMs that incorporate the VIR analysis. The information rate parameters, which determine the number of acoustic vectors to be extracted in the period, depend on both an HMM state and the neighboring feature vectors. The VIR analysis is incorporated into conventional HMMs in two ways. In the first approach, so as not to influence the calculation of the Viterbi path within an HMM, variable information rates are applied only within the model selection steps. In the other approach, the parameters are directly incorporated into the calculation of state observation probabilities. The information rate parameters are estimated based on the minimum classification error criterion. The HMM recognizers with VIR analysis achieve a 10-47% decrease in word error rate for two kinds of continuous speech recognition tasks.
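
The state-weighting idea in the abstract (weights applied to HMM log state-likelihoods) can be illustrated with a small Viterbi sketch. This is only an illustration under stated assumptions, not the dissertation's implementation: the function name `weighted_viterbi`, the argument layout, and the toy numbers are all hypothetical, and the weights are simply given as a per-frame, per-state table, whereas the abstract describes estimating them by GPD training or with an MLP.

```python
import math

def weighted_viterbi(log_b, log_a, log_pi, weights):
    """Viterbi decoding with per-frame state weights (hypothetical sketch).

    log_b[t][j]   : log-likelihood of frame t under state j
    log_a[i][j]   : log transition probability from state i to state j
    log_pi[j]     : log initial probability of state j
    weights[t][j] : assumed context-dependent weight for state j at frame t
    """
    T, N = len(log_b), len(log_pi)
    # Scale each log state-likelihood by its weight; a weight of 1.0
    # everywhere reduces this to ordinary Viterbi decoding.
    scaled = [[weights[t][j] * log_b[t][j] for j in range(N)] for t in range(T)]
    delta = [log_pi[j] + scaled[0][j] for j in range(N)]
    back = []
    for t in range(1, T):
        new_delta, ptr = [], []
        for j in range(N):
            # Best predecessor state for arriving in state j at frame t.
            best_i = max(range(N), key=lambda i: delta[i] + log_a[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log_a[best_i][j] + scaled[t][j])
        delta = new_delta
        back.append(ptr)
    # Trace the best path backwards from the best final state.
    last = max(range(N), key=lambda j: delta[j])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]

# Toy two-state example (all numbers invented for illustration).
log = math.log
log_pi = [log(0.5), log(0.5)]
log_a = [[log(0.8), log(0.2)], [log(0.2), log(0.8)]]
log_b = [[log(0.9), log(0.1)], [log(0.1), log(0.9)], [log(0.6), log(0.4)]]
uniform = [[1.0, 1.0]] * 3                      # conventional HMM
damped = [[1.0, 1.0], [0.2, 0.2], [1.0, 1.0]]   # middle frame down-weighted

print(weighted_viterbi(log_b, log_a, log_pi, uniform)[0])  # [0, 1, 1]
print(weighted_viterbi(log_b, log_a, log_pi, damped)[0])   # [0, 0, 0]
```

In the toy data, down-weighting the middle frame keeps the decoder in state 0, which is the kind of effect the abstract attributes to reducing the influence of non-characteristic feature vectors. Because the weights only rescale the log-likelihood term, the recursion itself is unchanged, consistent with the abstract's remark that the recognizer's complexity does not increase.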

Other bibliographic information

Call number DEE 99031
Physical description xi, 128 p. : illustrations ; 26 cm
Language Korean
General note Author's name in English : In-Jeong Choi
Advisor's name in Korean : 이수영
Advisor's name in English : Soo-Young Lee
Published in : "Speech recognition based on variable information rate model". Electronics Letters. The Institution of Electrical Engineers, vol. 33, no. 9, pp. 749-750 (1997)
Published in : "The use of acoustic contextual information in HMM-based speech recognition". IEEE Signal Processing Letters. The Institute of Electrical and Electronics Engineers, vol. 5, no. 5, pp. 108-110 (1998)
Thesis Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology : Department of Electrical and Electronic Engineering,
Bibliography note References : p. 122-128