Main bibliographic information
시간축 변환과 가변 프레임률 분석에 기반한 음성인식을 위한 특징 추출법 = A feature extraction method for ASR based on time-scale modification and variable frame rate analysis
Title / Author 시간축 변환과 가변 프레임률 분석에 기반한 음성인식을 위한 특징 추출법 = A feature extraction method for ASR based on time-scale modification and variable frame rate analysis / 정영숙.
Author 정영숙 ; Jung, Young-Sook
Publication [Daejeon : Korea Advanced Institute of Science and Technology, 2002].

Holdings

Registration number 8013558
Location / Call number Academic Cultural Complex (Cultural Hall), preservation stacks / MEE 02100

Item status Available ; Available for loan
Due date

Abstract

In this thesis, we investigate how speaking-rate variability in speech signals affects the performance of automatic speech recognition (ASR). To reduce this variability, we propose two methods, one in the feature space and one in the hidden Markov model (HMM) space. First, we propose a feature extraction method in which each speech analysis frame has a different time resolution depending on the characteristics of the speech region it belongs to. The method gives frames in transient regions higher resolution than frames in steady regions by combining a time-scale modification (TSM) technique with variable frame rate (VFR) analysis: TSM increases the resolution of the speech signal in transient regions, and VFR analysis reduces the resolution of steady regions by discarding steady frames. In speech recognition experiments on a Korean connected-digit task, the proposed method reduced the word error rate by 14.1% compared with the conventional feature extraction method. Second, we propose a speaking-rate normalization method to reduce the acoustic variability caused by differences in speaking rate between speakers. The method searches for the optimal speaking rate for each utterance, that is, the rate at which the best word accuracy is obtained. The maximum a posteriori (MAP) criterion in the HMM space is used to search for the optimal speaking rate, and the utterance is modified to that rate using TSM. Employing the proposed speaking-rate normalization method reduced the word error rate of the connected-digit recognition system by 10.14%.
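
The idea of VFR analysis described in the abstract, discarding frames from steady regions, can be illustrated with a minimal sketch. The abstract does not state the thesis's actual frame-selection criterion, so the rule below (keep a frame only when its Euclidean distance from the last kept frame exceeds a threshold) is an assumption, and the names `select_frames_vfr` and `threshold` are hypothetical. The TSM step is not shown.

```python
import numpy as np

def select_frames_vfr(features, threshold):
    """Return indices of frames kept by a simple distance-based VFR rule.

    features:  array of shape (num_frames, dim), e.g. per-frame cepstra.
    threshold: hypothetical tuning parameter; a frame is kept only if it
               differs from the last kept frame by more than this
               Euclidean distance (an assumed criterion, not the thesis's).
    """
    features = np.asarray(features, dtype=float)
    if features.shape[0] == 0:
        return []
    kept = [0]  # always keep the first frame
    for t in range(1, features.shape[0]):
        dist = np.linalg.norm(features[t] - features[kept[-1]])
        if dist > threshold:
            kept.append(t)
    return kept

# Toy 1-D "features": a steady stretch, a fast transition, a steady stretch.
frames = np.array([[0.0], [0.1], [0.05], [1.0], [2.0], [3.0], [3.05], [3.1]])
kept = select_frames_vfr(frames, threshold=0.5)
print(kept)  # [0, 3, 4, 5]
```

In the toy input, the frames inside the two steady stretches are dropped while every frame in the transition is kept, which is the resolution trade-off the abstract describes.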

Other bibliographic information

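Call number MEE 02100
Physical description v, 50 p. : ill. ; 26 cm
Language Korean
General note Author's name in English : Young-Sook Jung
Advisor's name in Korean : 이황수
Advisor's name in English : Hwang-Soo Lee
Thesis note Thesis (Master's) - Korea Advanced Institute of Science and Technology : Major in Electrical and Electronic Engineering,
Bibliography note References : p. 46-48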
Subject feature extraction (특징추출)
speech recognition (음성인식)
time scale modification (시간축 변환)
variable frame rate analysis (가변 프레임률)