Recently, various neural networks have been widely used in speech recognition. Among them, networks with recurrent connections, which give the network memory, have been studied for the recognition of time-varying sequences. A successful neural network for speech recognition must not only capture temporally distributed features but also accommodate the temporal distortion that results in length variation. Although recurrent connections provide some capability for sequence recognition, it is too burdensome for them alone to memorize all the dynamics in the speech signal.
We therefore extended Elman's network [Elman88], which has fully recurrent connections in the hidden layer, to enhance the dynamic memory capacity of the recurrent network. The input layer of the extended Elman network is aligned with n (n > 1) context buffers instead of the single buffer in Elman's original network, which is useful for extracting context-sensitive features from the input. The target function in the output layer is an analog function instead of a binary one; it reflects the confidence level of the output for the current input in the context buffers.
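The forward dynamics of such an extended Elman network can be sketched as follows. This is only an illustrative reconstruction, not the paper's implementation: the dimensions, random weights, and function names (`forward`, `n_buffers`, etc.) are assumptions chosen to show the structure — n shifted input frames held in context buffers, Elman-style copy-back of the hidden state, and analog (sigmoid) outputs.

```python
import numpy as np

# Illustrative dimensions (assumed, not from the paper's experiments).
rng = np.random.default_rng(0)
frame_dim = 14      # e.g. 14 LPC cepstral coefficients per frame
n_buffers = 3       # n > 1 context buffers (the proposed extension)
hidden_dim = 20
output_dim = 5      # number of syllable classes (illustrative)

# Weights: concatenated context buffers -> hidden,
# recurrent context units -> hidden, hidden -> output.
W_in = rng.normal(0, 0.1, (hidden_dim, frame_dim * n_buffers))
W_rec = rng.normal(0, 0.1, (hidden_dim, hidden_dim))
W_out = rng.normal(0, 0.1, (output_dim, hidden_dim))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(frames):
    """Run the extended Elman network over a frame sequence.

    At each step the input layer holds the n most recent frames
    (the context buffers), while the context units hold a copy of
    the previous hidden state, as in Elman's architecture. Returns
    the analog output activations at every time step.
    """
    context = np.zeros(hidden_dim)            # previous hidden state
    buffers = [np.zeros(frame_dim)] * n_buffers
    outputs = []
    for frame in frames:
        buffers = buffers[1:] + [frame]       # shift in the newest frame
        x = np.concatenate(buffers)
        hidden = sigmoid(W_in @ x + W_rec @ context)
        outputs.append(sigmoid(W_out @ hidden))
        context = hidden                      # Elman copy-back
    return np.array(outputs)

seq = rng.normal(size=(10, frame_dim))        # 10 random stand-in frames
y = forward(seq)
print(y.shape)                                # one analog output vector per frame
```

Training (e.g. backpropagation through the unrolled recurrence against analog targets) is omitted; the sketch only shows how the context buffers and recurrent connections combine at the input and hidden layers.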
Speaker-dependent CV syllable recognition was performed using 14th-order LPC cepstral coefficients. The experimental results show that the extended Elman network outperforms both Elman's original network and a multi-layer perceptron (MLP) without recurrent connections but with a maximal input buffer; that is, there exists an optimal number of input context buffers that maximizes performance. This may be because the recurrent connections and the context buffers work cooperatively, giving the network more discriminative capability than either the recurrent connections or the context buffers alone. With this cooperation, the segmentation-free nature of the recurrent network makes it possible to extend the proposed network to connected speech recognition.