Main bibliographic information
Learning audio-visual relationships and correspondences in the visual scenes = 시각과 청각 정보 간의 관련성 학습 기법
Title / Author: Learning audio-visual relationships and correspondences in the visual scenes = 시각과 청각 정보 간의 관련성 학습 기법 / Arda Senocak.
Publication: [Daejeon : Korea Advanced Institute of Science and Technology, 2022].

Holdings information

Registration number

8039557

Location / Call number

Academic Cultural Complex (Library), 2nd floor, Theses

DEE 22047


Item status

Available (not for loan)


Abstract

When we consider how we, as human beings, experience the world around us, we realize that we continuously use all of our senses. With these different sensory signals, we learn about and understand scenes. Regardless of whether a person sees an image of a “lion”, hears a “lion roaring” sound, or hears someone say the word “lion”, the same response is triggered in the human brain. Although human perception uses multimodal information, most existing models for understanding the scene around us deal with only a single modality, such as vision. Developing machine perception that uses multimodal data is therefore essential. Among these sensory signals, inarguably the most dominant ones are vision and audition. Sound is not only complementary to visual information but also correlated with visual events: when we see a car moving, we hear the engine sound at the same time. In this thesis, I introduce computational models to find the correspondence and complementary information between audio and visual signals. I introduce several tasks that benefit from correspondence information, such as sound source localization, audio-visual cross-modal retrieval, and audio-visual-driven selection of important moments in videos. I propose effective self-supervised, semi-supervised, and weakly-supervised methods to learn audio-visual correspondence. I also discuss the different relationships between audio and visual signals, as they do not follow a single type of relationship, and I leverage the two signals as complementary information for each other in video understanding tasks through different audio-visual formations.

No Korean abstract is available.

Additional bibliographic information

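Call number: DEE 22047
Physical description: ix, 73 p. : illustrations ; 30 cm
Language: English
General notes: Author's name in Korean : 세노자크 아르다
Advisor's name in English : In So Kweon
Advisor's name in Korean : 권인소
Including appendix
Thesis: Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology : School of Electrical Engineering,
Bibliography note: References : p. 67-72
Subjects: Audio-visual learning
Self-supervision
Multimodal learning
시각-청각 학습 (audio-visual learning)
자가학습 (self-supervised learning)
다중모달 학습 (multimodal learning)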