Korea Advanced Institute of Science and Technology (KAIST) Library

Main bibliographic information
Cross-modal alignment and translation for missing modality action recognition = 모달리티 소실을 고려하는 행동 인식을 위한 모달리티 간 정렬 및 변환
Title / Author	Cross-modal alignment and translation for missing modality action recognition = 모달리티 소실을 고려하는 행동 인식을 위한 모달리티 간 정렬 및 변환 / Yeonju Park.
Publication	[Daejeon : Korea Advanced Institute of Science and Technology, 2023].

Holdings

Registration number	8040753
Location / call number	Academic Cultural Complex (Library), 2nd floor, Theses / MEE 23046
Status	Available (not for loan)

Abstract

Multimodal data provides complementary information about the same context, which improves performance in video action recognition. In practice, however, not all modalities are available at test time. To address this, we propose the Cross-Modal Alignment and Translation (CMAT) framework for action recognition that is robust to missing modalities. Specifically, our framework first aligns the representations of multiple modalities from the same video sample through contrastive learning, which effectively alleviates the bias with respect to the type of missing modality. CMAT then learns to translate the representations of one modality into those of another. This allows the representations of missing modalities to be generated from the remaining modalities during testing. Consequently, CMAT fully utilizes the multimodal information obtained through abundant interactions across modalities. The proposed CMAT achieves state-of-the-art performance in both complete and missing modality settings on the NTU RGB+D, NTU RGB+D 120, and Northwestern-UCLA datasets. Moreover, extensive ablation studies demonstrate the effectiveness of our design.

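Multimodal data provides complementary information about the same situation and improves performance in video-based action recognition. In real-world conditions, however, some modalities may be unavailable at inference time. To address this, we propose the Cross-Modal Alignment and Translation (CMAT) framework for action recognition that is robust to missing modalities. Specifically, the features of multiple modalities for the same video are first aligned through contrastive learning. This step effectively mitigates the bias that depends on the type of missing modality. Next, the framework learns to translate the features of one modality into those of another. This makes it possible to generate the features of the missing modalities from the given modalities during inference. As a result, CMAT fully exploits the multimodal information obtained through rich interactions between modalities. The proposed framework achieves state-of-the-art performance on the NTU RGB+D, NTU RGB+D 120, and Northwestern-UCLA datasets in both the complete-modality and missing-modality settings. In addition, extensive ablation studies demonstrate the effectiveness of our design.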

Other bibliographic information
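Call number	MEE 23046
Physical description	iii, 24 p. : illustrations ; 30 cm
Language	English
General note	Korean form of author's name: 박연주; English form of advisor's name: Changick Kim; Korean form of advisor's name: 김창익
Thesis note	Thesis (Master's) - Korea Advanced Institute of Science and Technology : School of Electrical Engineering,
Bibliography note	References : p. 18-22
Subject	Action recognition; Multi-modal learning; Missing modality; Contrastive learning; Feature translation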
