This thesis presents an algorithm for document image binarization and a recognition algorithm for off-line handwritten numerals.
For document image binarization, we estimate the intensity value representing the background, subtract it from the original image, and apply global thresholding to the resulting image. This extracts text regions from images with noise caused by uneven illumination. To estimate the background intensity, we develop several methods: a least-squares-based method, a smoothing-based method, and an edge-based method.
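The background-subtraction idea can be illustrated with a minimal sketch. It assumes the simplest least-squares background model, a plane z = a*x + b*y + c fitted to all pixel intensities, dark text on a lighter background, and a fixed global threshold. The function names `fit_plane` and `binarize`, the plane model, and the default threshold of 30 are illustrative assumptions, not the estimators or parameters used in the thesis.

```python
def fit_plane(image):
    """Least-squares fit of z = a*x + b*y + c to all pixel intensities.

    `image` is a list of rows of numbers, at least 2x2 in size
    (assumption: a smaller image makes the system singular).
    """
    h, w = len(image), len(image[0])
    # Accumulate the sums that form the 3x3 normal equations.
    sxx = sxy = sx = syy = sy = n = 0.0
    sxz = syz = sz = 0.0
    for y in range(h):
        for x in range(w):
            z = image[y][x]
            sxx += x * x
            sxy += x * y
            sx += x
            syy += y * y
            sy += y
            n += 1
            sxz += x * z
            syz += y * z
            sz += z
    # Augmented matrix [A^T A | A^T z] for the unknowns (a, b, c).
    m = [[sxx, sxy, sx, sxz],
         [sxy, syy, sy, syz],
         [sx, sy, n, sz]]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(3):
            if r != i:
                f = m[r][i] / m[i][i]
                m[r] = [vr - f * vi for vr, vi in zip(m[r], m[i])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def binarize(image, threshold=30.0):
    """Subtract the fitted background plane, then apply one global threshold.

    A pixel is marked as text (1) when it is darker than the estimated
    background by more than `threshold` (hypothetical default value).
    """
    a, b, c = fit_plane(image)
    h, w = len(image), len(image[0])
    return [[1 if image[y][x] - (a * x + b * y + c) < -threshold else 0
             for x in range(w)]
            for y in range(h)]
```

Because the threshold is applied to the difference image rather than to the raw intensities, a single global value can separate text from background even when the illumination varies steadily across the page.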
For handwritten numeral recognition, we first divide the ten numerals into two classes, (0, 2, 6, 8, 9) and (1, 3, 4, 5, 7), and build one classifier for each class. A one-to-one competition classifier then selects the final result from the two numerals recognized by the two classifiers. Each classifier is a three-layer neural network. The feature vectors consist of the length or gradient of the longest line crossing the numeral pixels on the image plane, together with the values obtained through 8×5 and 5×8 nonlinear normalization. With this structure we achieved a 95% correct recognition rate on the Concordia numeral database and reduced the number of confusions caused by interference from other numerals.
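The decision flow of this structure can be sketched as follows, under one reading of the description: each group classifier proposes its best numeral, and a pairwise classifier chooses between the two candidates. The names `recognize`, `net_a`, `net_b`, and `pairwise` are hypothetical placeholders passed in as callables. The sketch does not reproduce the thesis's networks or features.

```python
GROUP_A = (0, 2, 6, 8, 9)
GROUP_B = (1, 3, 4, 5, 7)


def argmax_label(scores, labels):
    """Return the label whose score is highest."""
    best = max(range(len(labels)), key=lambda i: scores[i])
    return labels[best]


def recognize(features, net_a, net_b, pairwise):
    """Two group classifiers followed by a one-to-one competition.

    net_a, net_b: callables mapping a feature vector to five scores,
        one per numeral of GROUP_A and GROUP_B respectively.
    pairwise: callable (cand_a, cand_b, features) -> final numeral,
        standing in for the one-to-one competition classifier.
    """
    cand_a = argmax_label(net_a(features), GROUP_A)
    cand_b = argmax_label(net_b(features), GROUP_B)
    return pairwise(cand_a, cand_b, features)
```

In this arrangement each group classifier only has to discriminate among five numerals, and the final decision is always a two-way choice.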
In a real experiment, the numeral recognition system combining the proposed binarization method and recognition structure achieved a correct recognition rate of 86.5% on 600 handwritten test numerals.
In addition, as a preprocessing step for recognizing Korean characters, we briefly present a Jaso decomposition method for handwritten Korean characters based on extracting the vertical and horizontal components scanned in the text region.