Morphological analysis is an essential component of Korean document processing. Because the analysis requires frequent dictionary access, the electronic dictionary needs an efficient index structure that reflects the linguistic features of the Korean language.
In this thesis, we design and implement a database index structure based on a TRIE for the Korean electronic dictionary. To make the best use of main memory, the index structure uses four types of arrays, classified as type 0, type 1, type 2, and type 3 according to the sizes of the nodes in the TRIE. The type 0 array is introduced for the 2400 first syllables of words so that they can be accessed directly by their KS code. Direct access is faster than searching for the first syllable. The type 0 array also makes it easy to remove the last phoneme of a syllable, which obviates time-consuming code translation.
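The direct access described above can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in this abstract: that "KS code" means the two-byte KS X 1001 (KS C 5601) code as used in EUC-KR, that each type 0 slot stores a precomputed link to the same syllable without its final consonant, and that Unicode arithmetic may stand in for the thesis's own method when that link is built. The names `type0_index`, `strip_final`, `Type0Entry`, and `build_type0` are hypothetical. The table size of 2400 is the figure given above; under the assumed encoding, only slots 0 to 2349 are filled.

```python
from dataclasses import dataclass
from typing import List, Optional

TYPE0_SIZE = 2400  # figure from the abstract; the assumed encoding fills slots 0..2349


def type0_index(syllable: str) -> Optional[int]:
    """Map one Hangul syllable to a type 0 slot using its assumed 2-byte KS code."""
    try:
        code = syllable.encode("euc_kr")
    except UnicodeEncodeError:
        return None
    if len(code) != 2:
        return None  # ASCII, or a syllable outside the precomposed set
    lead, trail = code
    if not (0xB0 <= lead <= 0xC8 and 0xA1 <= trail <= 0xFE):
        return None  # not in the Hangul rows of the code table
    # Each row of the code table holds 94 cells, so the slot is computed, not searched.
    return (lead - 0xB0) * 94 + (trail - 0xA1)


def strip_final(syllable: str) -> str:
    """Drop the final consonant; Unicode arithmetic is used here only as a stand-in."""
    offset = ord(syllable) - 0xAC00
    if not 0 <= offset < 11172:
        return syllable
    return chr(0xAC00 + offset - offset % 28)


@dataclass
class Type0Entry:
    syllable: str
    no_final: Optional[int]      # slot of the same syllable without its final consonant
    child: Optional[int] = None  # hypothetical link to a lower-level TRIE node


def build_type0() -> List[Optional[Type0Entry]]:
    """Fill the table once, so lookups never need code translation afterwards."""
    table: List[Optional[Type0Entry]] = [None] * TYPE0_SIZE
    for lead in range(0xB0, 0xC9):
        for trail in range(0xA1, 0xFF):
            s = bytes([lead, trail]).decode("euc_kr")
            table[type0_index(s)] = Type0Entry(s, type0_index(strip_final(s)))
    return table


table = build_type0()
entry = table[type0_index("한")]
base = table[entry.no_final]  # one array access instead of a code conversion
```

In this sketch, looking up a first syllable costs one index computation and one array access, and reaching the syllable without its last phoneme costs one further array access through `no_final`.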
The characteristics of the database index structure are as follows: (1) operations for accessing, constructing, loading, and saving the dictionary run much faster than with existing index structures, (2) the TRIE structure requires near-optimal main memory space, and (3) dictionary entries can be inserted and deleted incrementally.
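Incremental insertion and deletion, characteristic (3), can be sketched on a generic TRIE. This is an illustration only: it uses Python dictionaries for child links rather than the typed arrays of the thesis, and the class and method names are hypothetical.

```python
class TrieNode:
    __slots__ = ("children", "is_entry")

    def __init__(self):
        self.children = {}     # next syllable -> child node
        self.is_entry = False  # True if a dictionary entry ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        """Add one entry, creating only the nodes that are missing."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_entry = True

    def contains(self, word: str) -> bool:
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_entry

    def delete(self, word: str) -> bool:
        """Remove one entry and prune nodes that no other entry uses."""
        path = [self.root]
        for ch in word:
            nxt = path[-1].children.get(ch)
            if nxt is None:
                return False
            path.append(nxt)
        if not path[-1].is_entry:
            return False
        path[-1].is_entry = False
        # Walk back toward the root, stopping at the first node still in use.
        for depth in range(len(word), 0, -1):
            node = path[depth]
            if node.children or node.is_entry:
                break
            del path[depth - 1].children[word[depth - 1]]
        return True


trie = Trie()
trie.insert("학교")
trie.insert("학생")
trie.delete("학교")  # "학생" stays reachable through the shared first syllable
```

Neither operation rebuilds the structure: an insertion touches only the path of the new entry, and a deletion removes only the nodes that belonged to the deleted entry alone.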