Boyer and Moore presented a fast string search algorithm that efficiently finds a pattern string in a text. It is faster than the linear-time search algorithm proposed by Knuth, Morris, and Pratt.
From a data processing point of view, HANGUL has two peculiarities. One is the representation of a HANGUL syllable; the other is the mixed representation of HANGUL sentences, which combine HANGUL with the English alphabet in informal writing.
HANGUL text is best processed in units of syllables, and each syllable consists of a variable number of HANGUL characters. Since a HANGUL syllable cannot be represented by a 1-byte code, several code systems have been proposed, including 3-byte and 2-byte codes.
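As a rough illustration of why a syllable is a composite unit that does not fit in one byte, the sketch below uses today's Unicode Hangul Syllables block and Python's `euc_kr` codec. Both are assumptions chosen for convenience and are not the code systems discussed in this thesis; the names `decompose` and `letter_count` are hypothetical.

```python
# Illustrative sketch only: Unicode and EUC-KR stand in for the code systems
# discussed in the text. All function names here are hypothetical.
S_BASE = 0xAC00                      # first precomposed syllable in Unicode
V_COUNT, T_COUNT = 21, 28            # vowels; finals (index 0 = no final)
S_COUNT = 19 * V_COUNT * T_COUNT     # 19 initials -> 11172 syllables


def decompose(syllable: str) -> tuple[int, int, int]:
    """Return (initial, vowel, final) indices of one precomposed syllable."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lead = index // (V_COUNT * T_COUNT)
    vowel = (index % (V_COUNT * T_COUNT)) // T_COUNT
    final = index % T_COUNT
    return lead, vowel, final


def letter_count(syllable: str) -> int:
    """Two letters without a final consonant, three with one (in this model)."""
    return 2 if decompose(syllable)[2] == 0 else 3


if __name__ == "__main__":
    for ch in "가한":
        print(ch, decompose(ch), letter_count(ch), len(ch.encode("euc_kr")))
    # A Latin letter occupies one byte in the same encoding.
    print("A", len("A".encode("euc_kr")))
```

In this model a syllable has two or three letters, and the same byte stream holds both 1-byte and 2-byte codes, which is the mixed-width situation the search algorithm has to handle.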
In HANGUL sentences, Chinese characters are most frequently intermixed with HANGUL syllables within a sentence. To make matters worse, English characters are also intermixed with HANGUL syllables in informal sentences.
In this thesis, a string search algorithm is proposed for finding a pattern string in a text of mixed codes. The proposed algorithm is based on the Boyer-Moore algorithm. A new 2-byte HANGUL code is also proposed, which alleviates the problems that existing 2-byte code systems cause in string searching.
The algorithm proposed in this thesis is proved to be very efficient. However, the current version has the shortcoming of a large memory space requirement for the pattern string.