Main Bibliographic Information
Towards universal visual scene understanding in the wild = 강인한 딥러닝 기반 범용적 장면 이해 (Robust, deep-learning-based universal scene understanding)
Title / Author: Towards universal visual scene understanding in the wild = 강인한 딥러닝 기반 범용적 장면 이해 / Kwanyong Park.
Publication: [Daejeon : Korea Advanced Institute of Science and Technology, 2023].

Holdings

Registration number: 8041549

Location / Call number: Academic Cultural Complex (Library), 2nd floor, Theses / DEE 23070

Status: Available (not for loan)

Abstract

Despite significant advances in deep learning, sophisticated deep models often fail to perform well in real-world scenarios. The fundamental cause of these failures is a lack of training data or bias in it. While scaling up data is the fundamental and ideal solution, constructing large datasets for numerous target tasks is impractical. In this thesis, we explore practical learning frameworks for training robust deep learning models under data constraints. Specifically, we aim to build generalized models that are robust to new objects and environments not included in the training data. To achieve this level of generality, we leverage large-scale pre-existing or easily obtainable data as additional training data. We expect such data to generalize and regularize the task knowledge learned from the limited target-task dataset.

Under this problem definition, we mainly investigate the task of universal scene understanding. The goal of the task is to provide high-quality scene understanding for an input image or video, that is, a comprehensive description of which objects the scene contains and where they are. Unlike traditional recognition models, which interpret a scene in a predefined way, a universal scene understanding model flexibly handles different task definitions. Task formats can be defined through predefined dataset-level semantic concepts, arbitrary user-provided semantic concepts, or user interactions. To address this challenging task, we design the universal scene understanding model from a modular perspective, with each module serving a distinct function. The model consists of three key modules: the segmenter, the refiner, and the classifier. The segmenter first generates a set of coarse region masks that identify potential objects in the scene. The refiner then refines these regions to preserve the fine details of the scene structure; this module is central to a more comprehensive understanding of the scene. Finally, the classifier assigns a class label to each refined region: it analyzes the content of each region and assigns it an appropriate class name, enabling detailed categorization of the objects in the scene.

In Chapter 2, we start with the generalization of the image segmenter, which we study in the context of unsupervised domain adaptation. In Chapter 3, we turn to the generalization of the video segmenter. To build a robust video segmenter, we jointly utilize image and video data and explore how to bridge these two distinct data sources, mainly in the semi-supervised video object segmentation task. In Chapter 4, we formulate the refiner module as a mask-guided matting problem. In Chapter 5, we first present our effort to build a general classifier module. Inspired by the recent success of vision-language foundation models (e.g., CLIP), we investigate how to use these foundation models as a generic knowledge base for vision tasks. Finally, we connect all the modules to build a universal scene understanding model. Instantiating the model in different configurations yields novel recognition tasks: panoptic soft segmentation, mask-guided video soft segmentation, and open-vocabulary instance soft segmentation.

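Despite the rapid progress of deep learning technology, even sophisticated deep learning models still fail frequently in real-world settings. The most fundamental cause of these failures is insufficient or biased training data. Scaling up the data is the ideal solution, but building large-scale datasets for many different problems is difficult in practice. This thesis addresses how to train robust, deep-learning-based universal scene understanding models under such data constraints. In particular, it aims to build models that are robust to new objects and new environments not included during training. To this end, we use datasets that already exist or are relatively easy to obtain, and we propose learning methods that effectively handle the discrepancies among multiple datasets.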
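To make the modular design described in the abstract concrete, here is a toy sketch of a segmenter, refiner, and classifier chained together. Everything below is an illustrative assumption, not the thesis's implementation: images are small grayscale grids, the segmenter is a simple threshold, the refiner feathers the coarse mask with a 3x3 box filter as a crude stand-in for mask-guided matting, and the classifier compares a toy 2-D region embedding with hypothetical "text embeddings" by cosine similarity, loosely mimicking CLIP-style open-vocabulary classification. The names `toy_segmenter`, `toy_refiner`, `make_open_vocab_classifier`, and `run_pipeline` are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Optional, Sequence

Image = List[List[float]]  # grayscale intensities in [0, 1]
Mask = List[List[float]]   # per-pixel mask values in [0, 1]


@dataclass
class Region:
    coarse: Mask                   # coarse binary mask from the segmenter
    alpha: Optional[Mask] = None   # soft mask (alpha matte) from the refiner
    label: Optional[str] = None    # class name from the classifier


def toy_segmenter(image: Image, thresh: float = 0.5) -> List[Mask]:
    """Hypothetical segmenter: one coarse foreground mask by thresholding."""
    mask = [[1.0 if v > thresh else 0.0 for v in row] for row in image]
    return [mask] if any(v for row in mask for v in row) else []


def toy_refiner(image: Image, coarse: Mask) -> Mask:
    """Hypothetical stand-in for mask-guided matting: feather the coarse mask
    with a 3x3 box filter while keeping foreground pixels fully opaque."""
    h, w = len(coarse), len(coarse[0])
    alpha = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            nbrs = [coarse[y][x]
                    for y in range(max(0, i - 1), min(h, i + 2))
                    for x in range(max(0, j - 1), min(w, j + 2))]
            alpha[i][j] = max(coarse[i][j], sum(nbrs) / len(nbrs))
    return alpha


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def make_open_vocab_classifier(text_embeddings: Dict[str, Sequence[float]]):
    """Pick the concept whose (hypothetical) text embedding is most
    cosine-similar to the region embedding; the vocabulary is user-supplied."""
    def classify(image: Image, alpha: Mask) -> str:
        total = sum(a for row in alpha for a in row)
        # Alpha-weighted mean intensity of the region.
        mean = sum(a * v for arow, vrow in zip(alpha, image)
                   for a, v in zip(arow, vrow)) / total
        region_embedding = (mean, 1.0 - mean)  # toy 2-D "image embedding"
        return max(text_embeddings,
                   key=lambda name: cosine(region_embedding, text_embeddings[name]))
    return classify


def run_pipeline(image: Image, segmenter, refiner, classifier) -> List[Region]:
    """Segmenter -> refiner -> classifier, applied region by region."""
    regions = [Region(coarse=m) for m in segmenter(image)]
    for region in regions:
        region.alpha = refiner(image, region.coarse)
        region.label = classifier(image, region.alpha)
    return regions


# Usage: a 6x6 image with a bright 4x4 block on a dark background.
image = [[0.9 if 1 <= i <= 4 and 1 <= j <= 4 else 0.1 for j in range(6)]
         for i in range(6)]
vocab = {"bright object": (1.0, 0.0), "dark object": (0.0, 1.0)}
regions = run_pipeline(image, toy_segmenter, toy_refiner,
                       make_open_vocab_classifier(vocab))
```

Keeping each stage behind a plain function interface mirrors the modular view in the abstract: in this sketch, any one module (for example, the box-filter refiner) could be swapped for a learned model without changing `run_pipeline`, and the classifier's vocabulary is chosen at call time rather than fixed in advance.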

Additional Bibliographic Information
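Call number: DEE 23070
Physical description: vi, 76 p. : ill. ; 30 cm
Language: English
General notes: Author's name in Korean: 박관용
Advisor's name in English: In So Kweon
Advisor's name in Korean: 권인소
Including appendix
Dissertation note: Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology : School of Electrical Engineering
Bibliography note: References: p. 60-73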
Subjects: Universal scene understanding
Generalization
Data hungry
범용적 장면 이해 (universal scene understanding)
일반화 (generalization)
데이터 부족 문제 (data scarcity)