This thesis describes a Korean large-vocabulary isolated word recognition system and studies its performance.
In the isolated word recognition system, the input speech signal is first processed by a digital filter bank. The filter bank is implemented with the fast Fourier transform (FFT), and its outputs are quantized by a vector quantizer (VQ). Hidden Markov modeling (HMM) is then applied to the VQ output sequence of each phoneme.
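The front end described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names (`fft`, `filter_bank`, `quantize`), the equal-width bands, the log-energy features and the squared Euclidean distance are all assumptions made for illustration, since the abstract does not specify the band layout or the distortion measure.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def filter_bank(frame, num_bands):
    """Log energy in num_bands equal-width bands (assumed layout)
    of the positive-frequency half of the spectrum."""
    spectrum = fft(frame)
    half = len(frame) // 2
    power = [abs(c) ** 2 for c in spectrum[:half]]
    width = half // num_bands
    # A small floor keeps the logarithm finite for empty bands.
    return [
        math.log(sum(power[b * width:(b + 1) * width]) + 1e-10)
        for b in range(num_bands)
    ]


def quantize(vector, codebook):
    """Index of the nearest codeword (squared Euclidean distance, assumed)."""
    def distance(codeword):
        return sum((v - c) ** 2 for v, c in zip(vector, codeword))
    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))
```

Applying `filter_bank` to each speech frame and passing the result to `quantize` turns an utterance into a sequence of codeword indices, which is the kind of discrete observation sequence the phoneme HMMs are trained on. In the reported simulation the codebook would hold 256 codewords.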
Word recognition is performed in two ways: the nonlinear word matching method and the word-level Viterbi scoring method. The former requires phoneme segmentation and recognizes the word whose phoneme sequence has the maximum likelihood with respect to the input test word. The latter chooses the word whose word-level HMM gives the largest probability for the input test word, without explicit phoneme segmentation.
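The word-level Viterbi scoring method can be sketched as follows. This is an illustrative example under stated assumptions, not the thesis code: each phoneme model is taken to be a list of left-to-right states, each state being a pair of a self-loop probability and a list of emission probabilities over VQ codewords, and a word HMM is formed by concatenating the phoneme models of the word. The names `build_word_hmm`, `viterbi_score` and `recognize`, and the structure of `lexicon`, are hypothetical.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Natural log that maps zero probability to -inf."""
    return math.log(p) if p > 0 else NEG_INF


def build_word_hmm(phonemes, phoneme_models):
    """Concatenate phoneme state lists into one left-to-right word HMM."""
    states = []
    for phoneme in phonemes:
        states.extend(phoneme_models[phoneme])
    return states


def viterbi_score(observations, states):
    """Log probability of the best path that starts in the first state
    and ends in the last state (the exit probability is ignored)."""
    n = len(states)
    score = [NEG_INF] * n
    score[0] = safe_log(states[0][1][observations[0]])
    for obs in observations[1:]:
        new_score = [NEG_INF] * n
        for j in range(n):
            stay = score[j] + safe_log(states[j][0])
            move = NEG_INF
            if j > 0:
                move = score[j - 1] + safe_log(1.0 - states[j - 1][0])
            new_score[j] = max(stay, move) + safe_log(states[j][1][obs])
        score = new_score
    return score[-1]


def recognize(observations, lexicon, phoneme_models):
    """Return the lexicon word whose word HMM scores highest."""
    return max(
        lexicon,
        key=lambda word: viterbi_score(
            observations, build_word_hmm(lexicon[word], phoneme_models)
        ),
    )
```

Because the best path through the concatenated model implicitly decides where each phoneme begins and ends, no explicit phoneme segmentation of the test word is needed, which is the property that distinguishes this method from nonlinear word matching.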
Computer simulations were carried out to measure the performance of the two recognition methods. The test data are a 418-word vocabulary extracted from the 1160-word database used in the 114 telephone-number query system. In speaker-dependent recognition, accuracies of 92.61% and 94.78% were obtained for the nonlinear word matching method and the word-level Viterbi scoring method, respectively. In this simulation study, the VQ codebook size was 256 and the number of phoneme models was 49.
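In this thesis, algorithms for implementing a large-vocabulary isolated word recognition system were studied. In the training stage, the data were first sorted by phoneme through manual segmentation and vector-quantized, and the HMM parameters were then estimated.
To evaluate the performance of this word recognition system, a speaker-dependent computer simulation was carried out on 418 words suitably selected from the 1160 words of the 114 telephone-number query system, all pronounced by a single speaker.
With 256 codewords and 49 phoneme models, the nonlinear word matching method, in which the test word is segmented into phonemes, gave a recognition rate of 92.61%, and the word-level Viterbi scoring method, in which the test word itself is scored against the reconstructed word HMMs, gave a recognition rate of 94.78%.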