In this thesis, an algorithm for reducing recognition time in large-vocabulary speech recognition was studied. To reduce recognition time, vowels were classified according to their formant information, obtained by linear prediction analysis.
Formant extraction was performed only in the assumed vowel regions, so these regions had to be segmented accurately from the input speech beforehand. To segment vowel regions, we used acoustic features such as energy, zero crossing rate, and filter bank outputs. To extract formants, we located the resonant peaks of the vocal tract transfer function. A knowledge-based decision method was then used to select the true formants from among these peaks. Each vowel was classified using these formants, after which words could be distinguished from one another by their vowels.
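The segmentation features named above can be illustrated with a minimal sketch. It computes short-time energy and zero crossing rate for each frame and flags frames that look vowel-like (high energy, low zero crossing rate). The frame length, hop size, thresholds, and the function names `frame_features` and `vowel_like` are assumptions made for illustration and are not taken from this work; the filter bank outputs are omitted.

```python
def frame_features(signal, frame_len=200, hop=100):
    """Return a list of (energy, zcr) pairs, one per frame.

    signal is a list of float samples. frame_len and hop are
    illustrative values (25 ms and 12.5 ms at an assumed 8 kHz rate).
    """
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Short-time energy: mean squared amplitude of the frame.
        energy = sum(x * x for x in frame) / frame_len
        # Zero crossing rate: fraction of adjacent sample pairs
        # whose signs differ.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        zcr = crossings / (frame_len - 1)
        feats.append((energy, zcr))
    return feats


def vowel_like(feats, energy_min, zcr_max):
    """Flag frames with high energy and low ZCR as vowel candidates.

    Both thresholds are hypothetical and would need tuning on real data.
    """
    return [e >= energy_min and z <= zcr_max for e, z in feats]
```

Runs of consecutive flagged frames would then be treated as candidate vowel regions and passed on to formant extraction.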
A computer simulation was carried out to test the time reduction algorithm on a vocabulary of 1160 words. The words were spoken by one male speaker under ordinary ambient conditions. The simulation results show a recognition rate of about 97% and a time reduction rate of about 80%.
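This study investigated an algorithm for reducing recognition time in a large-vocabulary word recognition system. To reduce recognition time, vowel classification was attempted, using the formant characteristics of the vowels.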
The work is divided broadly into vowel-region segmentation and formant extraction within those regions. Vowel-region segmentation used acoustic features such as filter bank energy and ZCR, and formant extraction used peak picking based on linear prediction.
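Peak picking based on linear prediction can be sketched as follows. This is a generic illustration under stated assumptions, not the implementation used in this work: it uses the autocorrelation method with the Levinson-Durbin recursion, evaluates the all-pole spectrum 1/|A| on a frequency grid, and returns the frequencies of its local maxima. The function names, the optional Hamming taper, the LPC order, and the grid size are all hypothetical choices, and the knowledge-based selection of true formants from the peaks is not shown.

```python
import cmath
import math


def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a list of samples."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]


def levinson(r, order):
    """Solve for a[0..order] (a[0] = 1) in A(z) = 1 + sum a_k z^-k."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # updated prediction error
    return a


def lpc_peaks(frame, order, fs, n_bins=512, taper=True):
    """Return frequencies (Hz) of local maxima of the LPC spectrum."""
    if taper:
        # Hamming window, a common (assumed) choice before LPC analysis.
        n_last = len(frame) - 1
        frame = [s * (0.54 - 0.46 * math.cos(2 * math.pi * n / n_last))
                 for n, s in enumerate(frame)]
    r = autocorr(frame, order)
    if r[0] == 0.0:
        return []                           # silent frame: no peaks
    a = levinson(r, order)
    mag = []
    for b in range(n_bins):
        w = math.pi * b / n_bins            # 0 .. pi (0 .. fs/2)
        A = sum(a[k] * cmath.exp(-1j * w * k) for k in range(order + 1))
        mag.append(1.0 / abs(A))
    return [b * fs / (2 * n_bins)
            for b in range(1, n_bins - 1)
            if mag[b - 1] < mag[b] > mag[b + 1]]
```

A call such as `lpc_peaks(frame, order=10, fs=8000)` on a vowel frame would return candidate resonance frequencies in ascending order; deciding which of them are true formants is left to a separate step.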
In a test on the 1160 words of a telephone directory assistance system, the overall recognition rate was about 97% and the recognition-time reduction rate was about 80%.