KAIST (Korea Advanced Institute of Science and Technology) Library

Main bibliographic information
(A) hierarchical probabilistic framework for scene text separation = 영상 내 텍스트 분할을 위한 확률적 계층 프레임워크에 관한 연구
Title / Author	(A) hierarchical probabilistic framework for scene text separation = 영상 내 텍스트 분할을 위한 확률적 계층 프레임워크에 관한 연구 / Young-Hee Kwon.
Publication	[Daejeon : Korea Advanced Institute of Science and Technology, 2009].

Holdings
Registration no.	8020368
Location / Call number	Academic Cultural Complex (학술문화관) preservation stacks / DCS 09007
Status	Available (not for loan)

Abstract

Automated image understanding is useful because the information it extracts can be put to further use. Since text has long been the most significant information medium for humankind, systems that read text automatically have been one of the most popular research targets. However, when text reading is applied to $\textit{scene images}$ taken by digital cameras, we encounter a class of text that differs from the traditional ones, known as $\textit{scene text}$, whose characteristics vary widely. A common process for recognizing text in an image is first to separate the text pixels from the background and then to recognize the text-only image. Separating text pixels from scene images remains an open problem, because scene images vary greatly and text properties such as position, size, and color are not constrained. In this research, we cast scene text separation as a probabilistic labeling problem in which we seek the labels that maximize their conditional probability given the image. Based on a hierarchical model of scene text with four layers (image, text line, stroke, and label), we provide a pixel-oriented representation of the objects in each layer and build a random field that describes the possible objects in that layer. Following this hierarchy, the proposed framework decomposes the labeling problem into three sub-problems, concerning text lines, strokes, and labels respectively; the first two are modeled using Kernel Ridge Regression (KRR) and the last using a Conditional Random Field (CRF). The optimal labels are identified through a stochastic gradient method. The framework is designed to be learnable, so that its parameters can be trained to improve performance. The experimental results showed promising performance for the framework.
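The abstract names Kernel Ridge Regression as the model for the text-line and stroke layers but gives no details. As a rough illustration of the general technique only, the sketch below fits a generic KRR scorer with NumPy. The RBF kernel, the toy two-dimensional features, and the names `rbf_kernel`, `krr_fit`, and `krr_predict` are all assumptions made for this example and are not taken from the thesis.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """RBF kernel matrix between the rows of a and the rows of b (assumed kernel choice)."""
    # Squared Euclidean distance between every row of a and every row of b.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq, 0.0))

def krr_fit(x_train, y_train, lam=1e-3, gamma=1.0):
    """Return dual coefficients alpha = (K + lam * I)^-1 y."""
    k = rbf_kernel(x_train, x_train, gamma)
    return np.linalg.solve(k + lam * np.eye(len(x_train)), y_train)

def krr_predict(x_train, alpha, x_new, gamma=1.0):
    """Score new points as a kernel-weighted sum over the training points."""
    return rbf_kernel(x_new, x_train, gamma) @ alpha

# Toy data (hypothetical): 2-D features with target 1.0 inside a disc, 0.0 outside.
rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, size=(40, 2))
y_train = (np.linalg.norm(x_train, axis=1) < 0.5).astype(float)

alpha = krr_fit(x_train, y_train)
scores = krr_predict(x_train, alpha, x_train)
```

In a pipeline like the one the abstract outlines, scores of this kind could serve as per-pixel evidence passed to the next layer; how the thesis actually defines its features and kernels is not stated in this record.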

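If a system that understands images automatically were realized, the information it obtains could be processed and put to good use. Because text has been the most important information medium, research on reading it automatically has been carried out for a long time. As digital cameras have recently come into wider use, processing text through images has become more convenient, but reading text from such images requires handling a class of characters with distinctive properties called 'scene text'. The usual way to read text from an image is first to separate the pixels that belong to text from the image and then to run a recognizer on the binary image that contains only the text. Segmenting scene text from an image remains a hard problem that has not yet been solved easily, because images taken with digital cameras vary so widely that the position, size, and color of the text cannot be known in advance. In this study, we recast scene text segmentation as a probabilistic label decision problem, so that the label of each pixel is chosen according to the highest probability given the image. We build a hierarchical scene text model with four levels (image, text line, stroke, label) and, through a pixel-based component representation, express the components of each level through a random field. Following the hierarchy, we divide the whole problem into three smaller problems and express the text line, stroke, and label levels through conditional probabilities, which provides an organic connection between the levels. The conditional probabilities at the text line and stroke levels were implemented with Kernel Ridge Regression (KRR), and the conditional probability at the label level with a Conditional Random Field (CRF). The optimal labels were obtained with a stochastic gradient technique. Because the framework is constructed to be trainable, its parameters can be trained to improve performance. The experimental results confirmed the practicality of the framework.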
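The labeling objective described in the abstract can be written compactly. In the sketch below, the symbols $\mathbf{x}$ (image), $\mathbf{t}$ (text-line field), $\mathbf{s}$ (stroke field), and $\mathbf{y}$ (pixel labels) are notation assumed here rather than taken from the thesis. The second expression is simply the chain rule of probability; reading its three factors as the three sub-problems is one plausible interpretation of the hierarchy, not a statement of the author's exact model.

```latex
\hat{\mathbf{y}} = \arg\max_{\mathbf{y}} P(\mathbf{y} \mid \mathbf{x}),
\qquad
P(\mathbf{y}, \mathbf{s}, \mathbf{t} \mid \mathbf{x})
  = P(\mathbf{t} \mid \mathbf{x})\,
    P(\mathbf{s} \mid \mathbf{t}, \mathbf{x})\,
    P(\mathbf{y} \mid \mathbf{s}, \mathbf{t}, \mathbf{x})
```

Under this reading, the first two factors would correspond to the KRR-based text-line and stroke models and the last to the CRF over labels.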

Other bibliographic information
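Call number	DCS 09007
Physical description	vi, 53 p. : illustrations ; 26 cm
Language	English
General note	Author's name in Korean: 권영희. Advisor's name in English: Jin-Hyung Kim. Advisor's name in Korean: 김진형.
Thesis	Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology : Computer Science,
Bibliography note	References : p. 47-53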
