Most previous techniques for Hangul character recognition cannot be applied directly to complex documents because they handle only single-font, single-size characters. This thesis describes an attempt to develop a multifont and multisize Hangul recognition system.
The system consists of three kinds of networks: a character extraction network, a type classification network, and six character recognition networks. First, the system extracts each character image from a line image using the character extraction network, which in turn is composed of a character segmenting network and a character merging network. The type classification network then classifies each character image into one of six types. Finally, the recognition networks classify it as a specific Hangul character. The recognition networks are trained with the backpropagation algorithm using descending epsilon. The descending-epsilon approach can absorb shape variations across font types because it trains the neural networks tolerantly.
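The descending-epsilon idea can be sketched as a small training loop in which output errors below the current tolerance epsilon are zeroed before backpropagation, and epsilon shrinks each epoch toward a floor. All data, network sizes, and schedule values below are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

# Sketch of backpropagation with a descending-epsilon schedule (assumed values).
# Errors smaller than the current epsilon contribute no gradient, so early
# training is tolerant of variation; epsilon then descends toward eps_min.

rng = np.random.default_rng(0)

# Toy, linearly separable data: 4 input patterns, 2 one-hot classes (hypothetical).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])

W1 = rng.normal(0, 0.5, (2, 4))   # input -> hidden weights
W2 = rng.normal(0, 0.5, (4, 2))   # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
epsilon, eps_min, eps_decay = 0.4, 0.05, 0.98   # descending schedule (assumed)

for epoch in range(2000):
    H = sigmoid(X @ W1)                 # hidden activations
    Y = sigmoid(H @ W2)                 # output activations
    err = T - Y
    err[np.abs(err) < epsilon] = 0.0    # tolerate errors within epsilon
    dY = err * Y * (1 - Y)              # output deltas
    dH = (dY @ W2.T) * H * (1 - H)      # hidden deltas
    W2 += lr * H.T @ dY
    W1 += lr * X.T @ dH
    epsilon = max(eps_min, epsilon * eps_decay)   # descend epsilon

pred = sigmoid(sigmoid(X @ W1) @ W2).argmax(axis=1)
```

Because the tolerance is wide at first, the network is not forced to match targets exactly on highly variable shapes; as epsilon descends, the fit is gradually tightened.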
The proposed system is evaluated on two data sets: a collection of the 520 most frequently used characters in eight font types, and images of a multifont magazine. The recognition rates are about 94% for the first experiment and 94.5% for the second. Judging from these results, we may conclude that the neural-network approach can handle multifont and multisize Hangul documents.