KAIST (Korea Advanced Institute of Science and Technology) Library

Main Bibliographic Information
Study on design of neural network for location-aware scene text recognizer = 위치 인지 기반 이미지 문자 인식기를 위한 신경망 디자인에 관한 연구
Title / Author	Study on design of neural network for location-aware scene text recognizer = 위치 인지 기반 이미지 문자 인식기를 위한 신경망 디자인에 관한 연구 / Huiwon Yun.
Publication	[Daejeon : Korea Advanced Institute of Science and Technology (KAIST), 2023].
Online Access	Full text available (view / print)

Holdings

Registration No.	8041149
Location / Call No.	Academic Cultural Complex (Library), 2nd floor, Theses and Dissertations / MAI 23032

Status	Available (not for loan)

Abstract

In scene text recognition (STR), it is important to identify where each character is located in the visual scene when a language decoder generates the text sequence. Previous STR models with autoregressive architectures (e.g., RNNs and Transformers) learn only implicitly to separate the region of each character. Because these models receive no supervision on localization, the region activated in the visual feature can still be misaligned with the ground-truth text. To resolve this issue, we present a novel STR method, visual LocAlization LeverAged to LANguage Decoding (vLaLa-Land), which explicitly learns localization through a character detection task in the Transformer decoder. To train localization and recognition jointly, we developed two novel mechanisms in the decoder. First, to capture the overall semantic relationship between linguistic and visual information, we apply bidirectional reference-guided Transformer decoder layers on top of unidirectional autoregressive Transformer decoder layers. Second, to properly recognize irregularly shaped text, we consider the height, width, and rotation of each character when computing the cross-attention score. We train our model on synthetic datasets and evaluate it on real datasets. The experiments show that our method improves text recognition accuracy while simultaneously improving the model's localization ability. Moreover, our model performs especially well on irregular datasets and achieves competitive performance on multiple STR benchmarks.
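
The abstract stacks bidirectional decoder layers on top of unidirectional autoregressive ones; mechanically, the difference between the two comes down to the self-attention mask. The NumPy sketch below illustrates only that distinction and is not the thesis implementation: `attention`, `causal_mask`, and `decoder_stack` are hypothetical names, the layers have no learned projections, cross-attention to visual features, or feed-forward blocks, and the "reference guidance" mentioned in the abstract is not modelled.

```python
import numpy as np

def attention(q, k, v, mask=None):
    """Scaled dot-product attention; mask[i, j] == False blocks query i from key j."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def causal_mask(n):
    # Lower-triangular mask: position i may attend only to positions 0..i.
    return np.tril(np.ones((n, n), dtype=bool))

def decoder_stack(x, n_causal=2, n_bidir=1):
    """Hypothetical stack: causal self-attention layers, then unmasked (bidirectional) ones.

    Weight-free illustration: residual self-attention only, no learned parameters.
    """
    n = x.shape[0]
    h = x
    for _ in range(n_causal):
        h = h + attention(h, h, h, causal_mask(n))
    for _ in range(n_bidir):
        h = h + attention(h, h, h)  # no mask: every position sees the whole sequence
    return h

# Demo: perturb only the last of 5 character embeddings and compare earlier positions.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
x_perturbed = x.copy()
x_perturbed[-1] += 1.0

causal_unchanged = np.allclose(decoder_stack(x, n_bidir=0)[:-1],
                               decoder_stack(x_perturbed, n_bidir=0)[:-1])
bidir_unchanged = np.allclose(decoder_stack(x)[:-1], decoder_stack(x_perturbed)[:-1])
print(causal_unchanged, bidir_unchanged)
```

In the causal-only stack, changing the last character leaves every earlier position untouched; once a bidirectional layer is added on top, earlier positions change too. That is the sense in which upper bidirectional layers can use right-hand context that the autoregressive layers below them cannot.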

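Korean abstract (English translation): In scene text recognition, when a language decoder generates a character sequence, it is important to identify where each character is located in the visual scene. Previously proposed scene text recognition models use an autoregressive architecture that learns the region of each character implicitly. Because these models are not trained with supervision on character locations, they suffer from misalignment between the ground-truth text and the activated visual regions. To solve this problem, we propose a new scene text recognition model that explicitly learns character detection in the Transformer decoder. To train character detection and recognition together, we introduce two structures into the decoder. First, to capture the overall semantic relationship between linguistic and visual information, we apply bidirectional reference-guided Transformer decoder layers on top of unidirectional autoregressive Transformer decoder layers. Second, to properly recognize irregularly shaped text, we take the height, width, and rotation angle of each character into account when computing the cross-attention score. We trained our model on synthetic datasets and evaluated it on real datasets. The experimental results show that our model is effective at increasing character recognition accuracy while simultaneously improving its detection ability. In addition, our model was effective on irregular datasets and showed competitive results on multiple scene text recognition benchmarks.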
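
The abstract does not say how height, width, and rotation enter the cross-attention score. One plausible approach, shown here purely as an assumption, is to add the log of a 2-D Gaussian aligned with each character's rotated box to the content logits, so that attention concentrates inside that box. The box parameterisation `(cx, cy, w, h, theta)`, the choice of half-extents as standard deviations, and all function names are hypothetical.

```python
import numpy as np

def rotated_gaussian_bias(grid_xy, box):
    """Log of an unnormalised 2-D Gaussian centred on a rotated character box.

    grid_xy: (N, 2) array of (x, y) feature-map coordinates.
    box: hypothetical (cx, cy, w, h, theta) in the same units; theta in radians.
    """
    cx, cy, bw, bh, theta = box
    dx = grid_xy[:, 0] - cx
    dy = grid_xy[:, 1] - cy
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    # Rotate offsets into the box's local frame: u along its width, v along its height.
    u = cos_t * dx + sin_t * dy
    v = -sin_t * dx + cos_t * dy
    # Assumption: half the width/height serves as the standard deviation on each axis.
    return -0.5 * ((u / (bw / 2)) ** 2 + (v / (bh / 2)) ** 2)

def box_aware_cross_attention(query, keys, values, grid_xy, box):
    """Single-query cross-attention whose logits are biased toward the character box."""
    content = keys @ query / np.sqrt(query.shape[-1])  # (N,) content scores
    logits = content + rotated_gaussian_bias(grid_xy, box)
    logits = logits - logits.max()  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()
    return weights @ values, weights

# Demo: 16x16 feature map; a zero query removes the content term, leaving only the box prior.
H = W = 16
ys, xs = np.mgrid[0:H, 0:W]
grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
rng = np.random.default_rng(0)
keys = rng.normal(size=(H * W, 32))
values = rng.normal(size=(H * W, 32))
query = np.zeros(32)

def weight_at(weights, x, y):
    # Flattened index of feature-map cell (x, y) in row-major order.
    return weights[y * W + x]

_, upright = box_aware_cross_attention(query, keys, values, grid, (8, 8, 8, 2, 0.0))
_, rotated = box_aware_cross_attention(query, keys, values, grid, (8, 8, 8, 2, np.pi / 2))
```

An upright 8×2 box spreads attention horizontally around its centre; rotating the same box by 90° spreads it vertically. A bias of this kind is one way a rotated or otherwise irregular character could draw features from its own region rather than from an axis-aligned window.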

Additional Bibliographic Information
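Call number	MAI 23032
Physical description	iii, 24 p. : ill. ; 30 cm
Language	English
General note	Author's name in Korean: 윤희원; advisor's name in English: Jaegul Choo; advisor's name in Korean: 주재걸. Includes appendix.
Dissertation note	Thesis (Master's) - Korea Advanced Institute of Science and Technology : Kim Jaechul Graduate School of AI
Bibliography note	References: p. 21-24
Subjects	Scene text recognition; Object detection; Artificial neural network; Computer vision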