In this thesis, we describe a system for extracting articles from Korean newspapers. The system is composed of three modules: 1) newspaper region identification, which segments a newspaper image into regions such as titles, article bodies, pictures, and figures; 2) newspaper article tracing, which forms an article by connecting separated newspaper regions; and 3) character extraction, which segments text into character images.
Both the statistical characteristics of regions and the ruled lines are used in the region identification process. By utilizing ruled-line information, regions are identified much faster and more accurately than with statistical information alone. In the article extraction process, knowledge of the general layout principles of Korean newspapers is utilized. Since characters may appear in different sizes, the size of a single character is calculated from the given line width or line height, taking into account the typical height/width ratio of Korean fonts. The performance of the system was confirmed through a set of experiments.
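
The character-size calculation mentioned above can be illustrated with a minimal sketch. It is not the implementation used in this thesis. It assumes that the line height is known for horizontally set text and the line width for vertically set text, and that the height/width ratio is supplied by the caller. The function name `estimate_char_size`, its parameters, and the ratio values are hypothetical and chosen only for illustration.

```python
def estimate_char_size(line_extent, orientation="horizontal", hw_ratio=1.0):
    """Estimate (char_width, char_height) for one text line.

    line_extent: line height for horizontal text, line width for vertical text
                 (assumption: one of the two is measured from the image).
    hw_ratio:    assumed typical height/width ratio of the font; the value
                 is illustrative, not taken from the thesis.
    """
    if line_extent <= 0 or hw_ratio <= 0:
        raise ValueError("line_extent and hw_ratio must be positive")

    if orientation == "horizontal":
        # The line height fixes the character height; derive the width.
        char_height = float(line_extent)
        char_width = char_height / hw_ratio
    elif orientation == "vertical":
        # The line width fixes the character width; derive the height.
        char_width = float(line_extent)
        char_height = char_width * hw_ratio
    else:
        raise ValueError("orientation must be 'horizontal' or 'vertical'")

    return char_width, char_height


# Example: a horizontal line 30 pixels high with an assumed ratio of 1.25.
width, height = estimate_char_size(30, "horizontal", hw_ratio=1.25)
```

Under these assumptions, the estimated size could serve as the expected pitch when cutting a text line into individual character images.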