In this thesis, we propose a Weighted Time State Neural Network (WTSNN) for phoneme recognition and a neural network architecture for phoneme-based isolated word recognition, and evaluate their performance. The states of a TSNN reflect the temporal structure of phonemic features. However, since the contribution of each state to phoneme recognition varies from state to state, we propose a new algorithm, the weighted TSNN, in which each state is weighted according to its effect on recognition performance. The weights can be obtained by training a multi-layer perceptron on top of the TSNN.
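The state-weighting idea can be illustrated with a minimal sketch. This is not the thesis implementation; the function names, the two-state example, and the fixed weights are all hypothetical, and in the actual WTSNN the weights would be learned by the MLP stacked on top of the TSNN.

```python
def weighted_state_scores(state_scores, state_weights):
    """Combine per-state phoneme scores into one score vector.

    state_scores:  list of S score vectors, one per temporal state,
                   each of length P (number of phoneme classes).
    state_weights: list of S scalars, one learned weight per state.
    Returns a length-P vector of weighted phoneme scores.
    """
    num_classes = len(state_scores[0])
    combined = [0.0] * num_classes
    for scores, w in zip(state_scores, state_weights):
        for p in range(num_classes):
            combined[p] += w * scores[p]
    return combined

# Hypothetical example: two states, three phoneme classes; the second
# state contributes more to recognition, so it gets a larger weight.
scores = [[0.2, 0.5, 0.3], [0.6, 0.1, 0.3]]
weights = [0.4, 0.6]
combined = weighted_state_scores(scores, weights)
```

The plain (unweighted) TSNN corresponds to all weights being equal; the weighted variant lets discriminative states dominate the final decision.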
In recognizing initial stop consonants in syllables, the proposed algorithm outperforms the conventional Time Delay Neural Network (TDNN) and Time State Neural Network (TSNN) algorithms by 8% and 6%, respectively, in a speaker-independent mode.
For phoneme-based isolated word recognition, we propose an architecture based on TDNN and TSNN. The weighted TSNN or TSNN achieves higher phoneme recognition rates than TDNN; however, the TSNN architecture alone cannot spot phonemes in a speech signal. The proposed architecture therefore combines the phoneme spotting ability of TDNN with the higher recognition performance of TSNN. Using this architecture, we can reduce the learning time and obtain recognition rates comparable to those of a large TDNN architecture with an increased number of hidden nodes.
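The two-stage combination can be sketched as a simple pipeline. This is an illustrative stand-in, not the thesis architecture: the window size, the dummy classification rule, and all function names are assumptions made only to show the spot-then-classify flow.

```python
def spot_segments(frames, window=3):
    """Stand-in for TDNN spotting: slide a fixed window over the
    frame sequence and return (start, end) candidate segments."""
    return [(i, i + window) for i in range(0, len(frames) - window + 1, window)]

def classify_segment(frames, segment):
    """Stand-in for TSNN classification: label a spotted segment.
    Here a dummy rule on the mean frame value replaces the network."""
    start, end = segment
    mean = sum(frames[start:end]) / (end - start)
    return "V" if mean > 0.5 else "C"

def recognize(frames):
    """Spot candidate phoneme segments, then classify each one."""
    return [classify_segment(frames, seg) for seg in spot_segments(frames)]

# Toy frame sequence: first half high-energy, second half low-energy.
labels = recognize([0.9, 0.8, 0.7, 0.1, 0.2, 0.3])  # ['V', 'C']
```

The design point is the division of labor: the spotting stage only locates candidate regions, so the higher-accuracy classifier need not scan every frame position.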
From computer simulation results on speaker-dependent, phoneme-based Korean digit recognition, the conventional TDNN architecture achieves recognition accuracies of 80.95% for phoneme recognition and 96.0% for word recognition. The proposed architecture achieves 86.67% and 96.0% for phoneme and word recognition, respectively, while requiring less computation time than the large TDNN in the learning phase.