The goal of vocabulary-independent speech recognition (VISR) is to transcribe an input speech waveform into words using subword models, for either continuous or isolated utterances, regardless of whether the vocabulary set is fixed. VISR systems have generally been developed with large speech databases in order to model diverse phonetic and acoustic phenomena. In this thesis, however, we propose a VISR system that uses a back-off phoneme decision tree with a small speech database.
The back-off phoneme decision tree, which shows better performance when only a small speech database is available, is a binary decision tree for automatic phoneme clustering. In this tree, each parent node continues to be split, with no stopping criterion, until no more data elements can be assigned to the child nodes. After the tree is constructed, the type of each node is determined by the number of elements belonging to it. The node type indicates whether the acoustic model belonging to that node needs to be trained. According to the node type, acoustic subword models are trained with data not only from terminal nodes but also from internal nodes. In contrast to previous phoneme decision trees, the back-off phoneme decision tree has the advantage of using more phonetic information for acoustic subword modeling. In addition, if an unseen model is requested, the parent node is referenced without additional computation.
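The construction and back-off lookup described above can be illustrated with a minimal Python sketch. It is not the implementation used in this thesis. The names `Node`, `build`, `lookup`, and `min_count`, the two phonetic questions, and the toy triphone contexts are all hypothetical. As a simplification, the sketch takes the first question that separates the data, whereas a real phoneme decision tree would normally select the question by a likelihood-based criterion.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Context = Tuple[str, str]            # (left phoneme, right phoneme); hypothetical
Question = Callable[[Context], bool]

@dataclass
class Node:
    samples: List[Context]
    parent: Optional["Node"] = None
    question: Optional[Question] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    trainable: bool = False          # node type: does this node get its own model?

def build(samples, questions, min_count, parent=None):
    """Split with no stopping criterion until no question separates the data."""
    node = Node(samples=samples, parent=parent)
    # Assumption: the node type is decided by a simple count threshold.
    node.trainable = len(samples) >= min_count
    for i, q in enumerate(questions):
        yes = [s for s in samples if q(s)]
        no = [s for s in samples if not q(s)]
        if yes and no:               # both children receive data, so split
            rest = questions[:i] + questions[i + 1:]
            node.question = q
            node.yes = build(yes, rest, min_count, node)
            node.no = build(no, rest, min_count, node)
            break
    return node

def lookup(root, ctx):
    """Descend to a terminal node, then back off to the nearest trainable ancestor."""
    node = root
    while node.question is not None:
        node = node.yes if node.question(ctx) else node.no
    while not node.trainable and node.parent is not None:
        node = node.parent           # back-off: follow the stored parent link
    return node

# Toy data and questions (hypothetical).
VOWELS, NASALS = {"a", "e", "i", "o", "u"}, {"m", "n"}
left_is_vowel = lambda c: c[0] in VOWELS
right_is_nasal = lambda c: c[1] in NASALS

data = [("a", "b")] * 5 + [("a", "n")] + [("m", "b")] * 4
root = build(data, [left_is_vowel, right_is_nasal], min_count=3)

seen = lookup(root, ("a", "b"))      # well-populated terminal node
unseen = lookup(root, ("e", "m"))    # context absent from the training data
```

In this toy setup, the terminal node reached by the unseen context `("e", "m")` holds too few samples to be trainable, so `lookup` returns its parent, an internal node that has its own model. This reflects the point that internal nodes are trained as well as terminal nodes.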
The effectiveness of the back-off phoneme decision tree with a small speech database was shown through several experiments. By combining models with the deleted interpolation method, we were able to estimate reliable HMM parameters. In future work, the back-off phoneme decision tree will be applied to continuous speech recognition systems with large-scale speech corpora.
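As a rough illustration of model combination by deleted interpolation, the following Python sketch estimates a single interpolation weight on held-out data and uses it to mix two parameter sets. It assumes that a detailed model (for example, a node's model) is combined with a more general one (for example, its parent's model). The abstract does not specify this pairing, and the function names and toy numbers are hypothetical. Only the weight estimation for one held-out partition is shown, whereas full deleted interpolation rotates the held-out part over several partitions of the training data.

```python
from typing import List, Sequence

def estimate_weight(p_detail: Sequence[float],
                    p_general: Sequence[float],
                    iters: int = 50,
                    lam: float = 0.5) -> float:
    """EM estimate of the weight given to the detailed model.

    p_detail[i] and p_general[i] are the likelihoods of held-out sample i
    under each model; both are assumed to be strictly positive.
    """
    for _ in range(iters):
        # E-step: posterior probability that each sample came from the detailed model.
        post = [lam * d / (lam * d + (1.0 - lam) * g)
                for d, g in zip(p_detail, p_general)]
        # M-step: the new weight is the average posterior.
        lam = sum(post) / len(post)
    return lam

def interpolate(detail: Sequence[float],
                general: Sequence[float],
                lam: float) -> List[float]:
    """Linear interpolation of two parameter vectors (e.g. output probabilities)."""
    return [lam * d + (1.0 - lam) * g for d, g in zip(detail, general)]

# Toy held-out likelihoods (hypothetical).
lam = estimate_weight([0.4, 0.4], [0.1, 0.1])
mixed = interpolate([0.6, 0.4], [0.2, 0.8], 0.5)
```

When the detailed model explains the held-out data better than the general one, the estimated weight moves toward 1. When the two models are equally good, the weight stays at its initial value.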