Recently, various neural networks have been widely used in speech recognition; however, most of them have been applied to classifying isolated phonemes or words. Among them, a recurrent network is well suited to connected speech recognition because it naturally accommodates patterns of variable length. This thesis proposes and evaluates a structured recurrent neural network that recognizes connected phoneme sequences hierarchically. A neural network with self-recurrent connections is useful for capturing the temporally distributed features of temporally distorted patterns. We improve it to recognize connected sequences of the trained units by means of an analog target function and a hierarchical organization of the network, in which one network performs broad classification and segmentation simultaneously while other networks perform detailed classification within each broadly classified phoneme class.
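The hierarchical organization above can be sketched in Python/NumPy, assuming an Elman-style network with self-recurrent hidden units. All weights, dimensions, and the per-frame routing scheme are illustrative assumptions for exposition, not the thesis's actual implementation or trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Numerically stable softmax over one output vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def elman_forward(x_seq, W_in, W_rec, W_out):
    """Forward pass of a simple self-recurrent (Elman-style) network.

    The hidden state h feeds back into itself at every frame, so the
    network handles input sequences of any length naturally.
    """
    h = np.zeros(W_rec.shape[0])
    outputs = []
    for x in x_seq:
        h = np.tanh(W_in @ x + W_rec @ h)   # self-recurrent hidden update
        outputs.append(softmax(W_out @ h))  # per-frame class posteriors
    return np.array(outputs)

# Illustrative dimensions (hypothetical, not from the thesis).
n_feat, n_hid = 8, 16
broad_classes = ["stop", "vowel", "nasal"]
detail_classes = {"stop": list("ptkbdg"),
                  "vowel": ["a", "eo", "o", "u", "i"],
                  "nasal": list("mn")}

def make_net(n_out):
    """Random (untrained) weights; only the data flow is demonstrated."""
    return (rng.normal(0, 0.1, (n_hid, n_feat)),
            rng.normal(0, 0.1, (n_hid, n_hid)),
            rng.normal(0, 0.1, (n_out, n_hid)))

broad_net = make_net(len(broad_classes))
detail_nets = {c: make_net(len(detail_classes[c])) for c in broad_classes}

# A 12-frame synthetic "utterance" of feature vectors.
x_seq = rng.normal(size=(12, n_feat))

# Stage 1: broad classification/segmentation over the whole sequence.
broad_out = elman_forward(x_seq, *broad_net)              # shape (12, 3)
frame_broad = [broad_classes[i] for i in broad_out.argmax(axis=1)]

# Stage 2: each frame is routed to the detailed classifier of its
# broad class for fine phoneme classification.
detail_labels = []
for x, c in zip(x_seq, frame_broad):
    d_out = elman_forward(x[np.newaxis, :], *detail_nets[c])
    detail_labels.append(detail_classes[c][int(d_out[0].argmax())])
```

In a trained system the broad network's segmentation would delimit phoneme regions before detailed classification; here the per-frame routing merely shows how the two stages compose.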
Speaker-dependent, phoneme-based word recognition experiments were performed on 35 phonetically balanced Korean words uttered by one male and one female speaker; the phonemes consist of the stop (p, t, k, b, d, g), vowel (a, ʌ, o, u, i), and nasal (m, n) classes. Of 8 repeated utterances, 2 were used as training data and the remaining 6 as test data. With input parameters of log energy, zero-crossing rate, 5 frequency-band energies, and the first cepstrum coefficients, a broad classification accuracy of 96% was obtained. With 14 LPC cepstrum coefficients, the detailed phoneme classification accuracy was 94%. The improved recognition results, compared with those of a recurrent neural network without the broad classification/segmentation stage, demonstrate the utility of the proposed structured recurrent neural network for connected phoneme sequences.