KAIST (Korea Advanced Institute of Science and Technology) Library

Main Bibliographic Information
From unimodal to multimodal learning with adaptive alignment for 2D-3D visual recognition = 2D-3D 시각 인식을 위한 적응적 정렬을 활용한 유니모달부터 멀티모달 학습 기법 연구
Title / Author	From unimodal to multimodal learning with adaptive alignment for 2D-3D visual recognition = 2D-3D 시각 인식을 위한 적응적 정렬을 활용한 유니모달부터 멀티모달 학습 기법 연구 / Thang Vu.
Publication	[Daejeon : Korea Advanced Institute of Science and Technology, 2023].
Online Access	Full text available (view, print)

Holdings

Registration No.	8040326
Location / Call No.	Academic Cultural Complex (Library), 2nd floor, Theses and Dissertations / DEE 23055
Status	Available (not for loan)

Abstract

This dissertation considers unimodal and multimodal learning with adaptive alignment for 2D-3D visual recognition on images and point clouds. Regarding unimodality on 2D images, we investigate object detection and instance segmentation, which are commonly formulated as a two-stage pipeline of a region proposal network (RPN) followed by a region-based CNN (R-CNN). We propose Cascade RPN with Adaptive Convolution to ensure alignment between features and reference boxes, which is required for progressive refinement. For the R-CNN stage, we revisit Cascade Mask R-CNN and propose SCNet to align the sample distribution between training and inference in existing cascade architectures. For unimodality on 3D point clouds, we propose SoftGroup, which performs grouping on soft semantic scores to avoid propagating errors from hard semantic predictions into instance segmentation. SoftGroup is further extended to SoftGroup++ for scalable 3D instance segmentation, using an adaptive strategy to reduce time complexity and search space. Finally, we propose Bird's-Eye View (BEV) fusion for multimodal object detection, which aligns image and point features via BEV projection followed by weighted fusion to address the sparsity of points on distant objects. Extensive experiments on various standard benchmark datasets demonstrate the superiority and generality of the proposed methods.

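This dissertation addresses unimodal and multimodal learning techniques with adaptive alignment for 2D-3D visual perception based on image and point cloud data. First, for unimodality on 2D images, we study object detection and instance segmentation, which are mainly built on a two-stage pipeline of RPN and R-CNN. To improve on existing methods, we propose Cascade RPN, which improves the conventional RPN by ensuring alignment between the learned image features and the reference boxes, as required for progressive refinement. To improve R-CNN, we propose SCNet for aligning sample distributions, which improves existing cascade-based architectures by ensuring sample consistency between training and inference. For unimodality on 3D point cloud data, we propose SoftGroup, which performs grouping on soft scores to avoid error propagation from hard semantic predictions into instance segmentation. SoftGroup is further extended to SoftGroup++ for scalable 3D instance segmentation, with an adaptive strategy that reduces time complexity and search space. Finally, we propose Bird's-Eye View (BEV) fusion for multimodal object detection, which aligns image and point features through BEV projection and uses weighted fusion to address the limitation of sparse points on distant objects. Various experiments on official benchmark datasets demonstrate the superiority and generality of the proposed methods.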
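The abstract describes SoftGroup as grouping points on soft semantic scores instead of hard (argmax) labels. A minimal sketch of that idea in plain Python follows, assuming the points have already been shifted toward their instance centers and that grouping is a breadth-first search over candidate points within a fixed radius. The function `soft_group` and its parameters `score_thr`, `radius`, and `min_points` are hypothetical illustrations, not the dissertation's released code.

```python
import math
from collections import deque


def soft_group(coords, sem_scores, score_thr=0.2, radius=0.04, min_points=1):
    """Hypothetical sketch of grouping on soft semantic scores.

    coords: list of (x, y, z) tuples, assumed already shifted toward
        instance centers (an assumption, not taken from the source).
    sem_scores: per-point lists of class probabilities.
    Returns a list of (class_id, sorted point indices) proposals.
    """
    num_classes = len(sem_scores[0])
    proposals = []
    for c in range(num_classes):
        # Soft selection: a point is a candidate for class c whenever its
        # score exceeds score_thr, so it may be a candidate for several
        # classes (unlike a single argmax label).
        cand = [i for i, s in enumerate(sem_scores) if s[c] > score_thr]
        visited = set()
        for seed in cand:
            if seed in visited:
                continue
            # Breadth-first search over candidates closer than `radius`.
            cluster = []
            queue = deque([seed])
            visited.add(seed)
            while queue:
                i = queue.popleft()
                cluster.append(i)
                for j in cand:
                    if j not in visited and math.dist(coords[i], coords[j]) < radius:
                        visited.add(j)
                        queue.append(j)
            # Discard clusters too small to count as an instance proposal.
            if len(cluster) >= min_points:
                proposals.append((c, sorted(cluster)))
    return proposals


coords = [(0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (1.0, 0.0, 0.0)]
scores = [[0.9, 0.1], [0.45, 0.55], [0.1, 0.9]]
print(soft_group(coords, scores))                 # [(0, [0, 1]), (1, [1]), (1, [2])]
print(soft_group(coords, scores, score_thr=0.5))  # [(0, [0]), (1, [1]), (1, [2])]
```

With two classes, a threshold of 0.5 behaves like hard argmax assignment: point 1, whose class-0 score is only slightly lower, drops out of the class-0 group. The lower threshold lets it join that group as well, which illustrates the kind of error propagation from hard semantic predictions that the abstract says SoftGroup avoids.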

Other Bibliographic Information
Call No.	DEE 23055
Physical Description	vi, 88 p. : ill. ; 30 cm
Language	English
General Note	Author's name in Korean: 부탕; Advisor's name in English: Chang Dong Yoo; Advisor's name in Korean: 유창동; Including appendix
Dissertation Note	Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology : School of Electrical Engineering
Bibliography Note	References: p. 75-85
Subjects	Unimodal; Multimodal; Adaptive alignment; 2D-3D visual recognition; Deep neural network
