KAIST (Korea Advanced Institute of Science and Technology) Library

Main bibliographic information
Audio-visual learning with semantically similar samples = 의미론적 유사성을 이용한 청각-시각 연관학습
Title / Author	Audio-visual learning with semantically similar samples = 의미론적 유사성을 이용한 청각-시각 연관학습 / Hyeonggon Ryu.
Publication	[Daejeon : Korea Advanced Institute of Science and Technology, 2023].

Holdings

Registration number

8040579

Location / Call number

Academic Cultural Complex (Library), 2nd floor, Theses

MPD 23003

Status

Available (not for loan)

Abstract

Instance discrimination-based contrastive learning is a learning method that contrasts positive and negative pairs. It assumes that a negative pair contains different semantic information. However, because training batches are constructed randomly, this assumption does not always hold: two samples grouped as a negative pair may in fact be semantically similar. Intuitively, such faulty negative pairs disturb training and degrade model performance. This work aims to solve the faulty negative problem for instance discrimination-based audio-visual learning. Existing audio-visual works employ contrastive learning by treating corresponding audio-visual pairs from the same source as positives and randomly mismatched pairs as negatives. As in the general instance discrimination-based contrastive learning described above, these negative pairs may contain semantically matched audio-visual information. The key contribution of this work is showing that semantically similar samples can compensate for the effect of faulty negative pairs. Our approach incorporates semantically similar samples directly into the contrastive learning objective. It is applied to two tasks: audio-visual sound source localization and visually grounded speech. We demonstrate the effectiveness of our approach on both tasks.

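Instance discrimination-based contrastive learning is a technique that learns by contrasting positive pairs with negative pairs. It presupposes that a negative pair always carries different meanings. However, because batches are composed randomly during training, this assumption does not always hold, since two data samples grouped as a negative pair can be semantically similar. Such faulty negative pairs disturb training and degrade model performance. This thesis addresses the faulty negative problem in instance discrimination-based audio-visual learning. Existing audio-visual learning methods perform contrastive learning by assigning corresponding audio-visual pairs as positive pairs and other randomly sampled pairs as negative pairs. As with the general instance discrimination-based contrastive learning mentioned above, these negative pairs may contain semantically similar audio-visual information. The main contribution of this thesis is showing that semantically similar samples offset the effect of faulty negative pairs. The thesis uses semantically similar samples directly in the contrastive learning objective and applies the resulting methods to two problems, audio-visual sound source localization and visually grounded speech, demonstrating their effectiveness.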
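
The abstract does not state the objective itself, so the following is only an illustrative sketch of the general idea, not the thesis's formulation. It assumes a batch of paired audio and visual embeddings and a hypothetical boolean matrix `similar` marking cross-modal pairs judged semantically similar (how such pairs are found is not specified here). Those pairs are counted as extra positives in an InfoNCE-style loss instead of being pushed apart as negatives. The function names `positive_mask` and `multi_positive_nce` and the temperature value are assumptions made for this example.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def positive_mask(n, similar=None):
    """Boolean (n, n) mask: True where audio i and visual j count as a positive pair.

    The diagonal holds the true audio-visual pairs; `similar` (hypothetical input)
    adds pairs judged semantically similar.
    """
    mask = np.eye(n, dtype=bool)
    if similar is not None:
        mask |= np.asarray(similar, dtype=bool)
    return mask

def multi_positive_nce(audio, visual, mask, temperature=0.07):
    """InfoNCE-style loss in which every masked pair is treated as a positive."""
    a = l2_normalize(np.asarray(audio, dtype=float))
    v = l2_normalize(np.asarray(visual, dtype=float))
    logits = a @ v.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    pos = (exp * mask).sum(axis=1)   # mass on positives (true + similar)
    total = exp.sum(axis=1)          # mass on all candidates in the batch
    return float(np.mean(-np.log(pos / total)))

# Toy batch: 4 audio-visual pairs with 8-dimensional embeddings.
rng = np.random.default_rng(0)
audio = rng.normal(size=(4, 8))
visual = rng.normal(size=(4, 8))

# Assume samples 0 and 1 depict the same kind of sound source.
similar = np.zeros((4, 4), dtype=bool)
similar[0, 1] = similar[1, 0] = True

plain = multi_positive_nce(audio, visual, positive_mask(4))
relaxed = multi_positive_nce(audio, visual, positive_mask(4, similar))
print(plain, relaxed)
```

With the extra positives, the pairs (0, 1) and (1, 0) no longer contribute to the loss as negatives, so the relaxed loss is never larger than the plain one on the same batch.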

Other bibliographic information
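Call number	MPD 23003
Physical description	v, 30 p. : illustrations ; 30 cm
Language	English
General note	Author's name in Korean: 류형곤; Advisor's name in English: In So Kweon; Advisor's name in Korean: 권인소; Including appendix
Thesis	Thesis (Master's) - Korea Advanced Institute of Science and Technology : Division of Future Vehicle,
Bibliography note	References : p. 25-28
Subject	Audio-visual learning; Sound source localization; Visually grounded speech; 청각-시각연관학습; 음원위치탐색; 음성의시각적이해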
