The visual image of a talker provides information complementary to the acoustic speech waveform and enables improved recognition accuracy, especially in environments corrupted by strong acoustic noise or multiple talkers. Because most of the phonologically relevant visual information can be obtained from the mouth and lips, it is important to measure their dynamics accurately and robustly. Moreover, it is desirable to extract information on the mouth and lips without the use of artificial invasive markers or patterned illumination. In this thesis, a new method is proposed to extract lip features from a color image. The color image is transformed into one that sharply represents the inner and outer lip contours despite the presence of the tongue, teeth, and shadows in the image. As a result, more exact length information, such as the height and width of the lips, can be obtained from the talker's image.
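The abstract does not specify the color transform itself; as one illustrative possibility only, a pseudo-hue transform R/(R+G), which is commonly used in lip-segmentation work to accentuate the reddish lips against skin, could serve this role. The sketch below is an assumption for clarity, not the thesis's actual method; the function name `pseudo_hue` and the epsilon constant are hypothetical.

```python
import numpy as np

def pseudo_hue(rgb_image):
    """Emphasize lip pixels via the pseudo-hue R / (R + G).

    Lips are typically redder than the surrounding skin, so this ratio
    tends to produce strong contrast at the lip contours even when the
    tongue, teeth, or shadows appear inside the mouth region.
    """
    rgb = rgb_image.astype(np.float64)
    r, g = rgb[..., 0], rgb[..., 1]
    return r / (r + g + 1e-8)  # epsilon avoids division by zero
```

The transformed image can then be thresholded or edge-detected to locate the inner and outer lip contours, from which lip height and width follow directly.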
Visual speech features, consisting of shape and color information of the lips, are extracted from the lip-tracking results of many words spoken by many people. Hidden Markov models (HMMs) are then trained using these features. Both visual-only and audio-visual speech recognition are performed for isolated digit recognition, using late integration of the audio and visual features. All speech recognition experiments are speaker-dependent, using HMMs with 20 states and 10 mixture components and the same speakers for training and testing. The visual-only recognition system achieves a recognition rate of up to 91.5%. Additionally, white Gaussian noise is added to the audio signal, yielding signal-to-noise ratios (SNRs) from 20 dB to -20 dB. The experimental results show that, although the audio-only recognition system performs poorly in the presence of noise, the performance of the audio-visual recognition system remains close to that of the visual-only system. The developed method can also be applied to deformable templates, with better results than conventional approaches.
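As a minimal sketch of the two experimental ingredients named above, the snippet below shows (a) corrupting audio with white Gaussian noise at a target SNR and (b) late integration, i.e., combining per-word log-likelihoods from separate audio and visual HMMs. The combination rule, the reliability weight `weight`, and the function names are assumptions; the abstract states that late integration is used but not its exact form.

```python
import numpy as np

def add_awgn(signal, snr_db):
    """Corrupt an audio signal with white Gaussian noise at a target SNR (dB)."""
    sig_power = np.mean(signal ** 2)
    noise_power = sig_power / (10.0 ** (snr_db / 10.0))
    noise = np.random.normal(0.0, np.sqrt(noise_power), signal.shape)
    return signal + noise

def late_integration(audio_loglik, visual_loglik, weight):
    """Combine per-word scores from independent audio and visual HMM sets.

    audio_loglik, visual_loglik: arrays of shape (num_words,) holding
    log P(observations | word model) for each modality.
    weight: audio reliability weight in [0, 1]; lowered as SNR drops.
    Returns the index of the recognized word.
    """
    combined = weight * audio_loglik + (1.0 - weight) * visual_loglik
    return int(np.argmax(combined))
```

Under such a scheme, as the audio SNR falls the audio weight approaches zero and the combined system degrades gracefully toward visual-only performance, consistent with the behavior reported above.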