Speaker-independent keyword spotting is widely used in continuous speech recognition. Because of its complexity and heavy computational load, however, it is difficult to implement on simple hardware. In this paper, I propose a solution based on Dynamic Time Warping (DTW). For simple and fast computation, a Single-End-Point DTW algorithm is proposed. With this algorithm, real-time hardware can be implemented, and keyword spotting can be performed with fewer computations than conventional methods. The proposed algorithm searches local areas successively and requires only a single end point of a speech segment. Some of the slope weights and distance measures are modified for better performance.
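The idea of anchoring the alignment at a single end point can be illustrated with a short Python sketch. It is only one possible reading of the description above, not the algorithm as implemented in this work: the function name `end_anchored_dtw`, the `max_stretch` window parameter, the unweighted step pattern, and the normalization by reference length are all assumptions made for illustration, and the modified slope weights are not reproduced.

```python
import numpy as np

def end_anchored_dtw(ref, test, end, max_stretch=2.0):
    """Align `ref` so that its last frame is matched to test[end].

    Hypothetical sketch: the start frame in `test` is left free, and only a
    local window of at most max_stretch * len(ref) frames ending at `end`
    is searched. Returns (distance / len(ref), start frame index).
    """
    ref = np.asarray(ref, dtype=float)
    test = np.asarray(test, dtype=float)
    n_ref = len(ref)
    win = min(end + 1, int(max_stretch * n_ref))
    offset = end - win + 1
    seg = test[offset:end + 1]                      # local search area
    # Frame-to-frame Euclidean distances, shape (n_ref, win)
    d = np.linalg.norm(ref[:, None, :] - seg[None, :, :], axis=2)

    cost = np.full((n_ref, win), np.inf)
    start = np.zeros((n_ref, win), dtype=int)       # window column where each path began
    cost[0] = d[0]                                  # free start: first ref frame may match any column
    start[0] = np.arange(win)
    for i in range(1, n_ref):
        cost[i, 0] = cost[i - 1, 0] + d[i, 0]
        start[i, 0] = start[i - 1, 0]
        for j in range(1, win):
            # Unweighted diagonal, vertical, and horizontal steps (an assumption)
            prev = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
            k = int(np.argmin([cost[p] for p in prev]))
            cost[i, j] = d[i, j] + cost[prev[k]]
            start[i, j] = start[prev[k]]
    # The path is forced to end at the last column, i.e. at test[end]
    return cost[-1, -1] / n_ref, offset + int(start[-1, -1])
```

In a spotting loop, such a function would be called for successive values of `end` as new frames arrive, so that each call touches only a bounded local area of the input.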
For efficient processing, the input speech is first preprocessed. ZCPA cepstra, which model the mammalian auditory periphery and are robust to noise, are used as the speech feature parameters. Feature normalization is then applied so that all dimensions have the same standard deviation and all frames have the same energy. Several reference patterns per keyword are created by averaging patterns, and these references are compared with the test input speech. If an utterance is similar to a keyword, the Euclidean distance between the corresponding reference pattern and the test input speech is relatively small. When the distance falls below a predefined threshold, the system detects the keyword.
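The normalization, reference averaging, and threshold test described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the system's actual code: the function names are hypothetical, the standard deviation is computed per utterance, the patterns being averaged are assumed to be already time-aligned to equal length, and the distance is a plain frame-wise Euclidean mean rather than a DTW score.

```python
import numpy as np

def normalize_features(frames, eps=1e-8):
    """frames: (T, D) array. Equalize per-dimension std, then per-frame energy."""
    frames = np.asarray(frames, dtype=float)
    x = frames / (frames.std(axis=0, keepdims=True) + eps)   # same std in every dimension
    energy = np.linalg.norm(x, axis=1, keepdims=True)
    return x / (energy + eps)                                # same energy in every frame

def average_reference(patterns):
    """Average equal-length (T, D) patterns into one reference (alignment assumed done)."""
    return np.mean(np.stack(patterns), axis=0)

def detect_keyword(reference, segment, threshold):
    """Report a detection when the mean frame-wise Euclidean distance is below threshold."""
    dist = float(np.linalg.norm(reference - segment, axis=1).mean())
    return dist < threshold, dist
```

The threshold value itself has to be chosen empirically; nothing in this sketch suggests a particular setting.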
The proposed algorithm is tested on isolated speech recognition and keyword spotting under various conditions. The detection results are compared with those of the original DTW and a simple Hidden Markov Model (HMM). The results indicate that the proposed algorithm performs well.