KAIST Library

Main Bibliographic Information
Attention-based video masking for improving open set action recognition = 오픈셋 행동 인식 향상을 위한 어텐션 기반 비디오 마스킹
Title / Author	Attention-based video masking for improving open set action recognition = 오픈셋 행동 인식 향상을 위한 어텐션 기반 비디오 마스킹 / Minho Sim.
Publication	[Daejeon : Korea Advanced Institute of Science and Technology, 2023].

Holdings
Registration No.	8041346
Location / Call No.	Academic Cultural Complex (Cultural Hall), Preservation Stacks / MCS 23055
Status	Available (not for loan)

Abstract

In real-world scenarios, human action recognition (HAR) is essentially an open set problem that requires a model to classify actions from known classes and detect actions from unknown classes simultaneously. However, HAR models are easily biased to static information in the video (e.g., background), which can lead to performance degradation of open set action recognition (OSAR) models. In this paper, we propose a simple framework for improving OSAR based on the video attention map extracted from the video vision transformer model. Specifically, our framework eliminates patches with static bias in video using two debiasing steps: (1) frame selection and (2) patch masking. Experimental results show that our framework achieves consistent performance improvement on multiple OSAR methods and challenging benchmarks. Furthermore, we introduce two new OSAR tasks, Kinetics-400 vs. Kinetics-600 exclusive and Kinetics-400 vs. Kinetics-700 exclusive, to validate our method in a setting close to the real-world scenario. With extensive experiments, we demonstrate the effectiveness of our attention-based masking, and in-depth analysis validates the effect of static bias on OSAR.
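The two debiasing steps named in the abstract, frame selection followed by patch masking, can be sketched as follows. The abstract does not state the selection or masking criteria, so everything here is an assumption made for illustration: the function names `select_frames`, `patch_keep_mask`, and `debias` are hypothetical, frames are ranked by total attention, and the lowest-attention patches in each kept frame are the ones masked. It is a minimal sketch of the general idea, not the thesis's implementation.

```python
from typing import Dict, List


def select_frames(attn: List[List[float]], num_keep: int) -> List[int]:
    """Return indices (in temporal order) of the num_keep frames
    with the largest total attention. ASSUMED criterion."""
    totals = [sum(frame) for frame in attn]
    ranked = sorted(range(len(attn)), key=lambda t: totals[t], reverse=True)
    return sorted(ranked[:num_keep])


def patch_keep_mask(frame_attn: List[float], keep_ratio: float) -> List[bool]:
    """Return True for patches that are kept and False for masked ones.
    ASSUMED rule: keep the top keep_ratio fraction of patches by attention."""
    num_keep = max(1, round(keep_ratio * len(frame_attn)))
    ranked = sorted(range(len(frame_attn)),
                    key=lambda p: frame_attn[p], reverse=True)
    kept = set(ranked[:num_keep])
    return [p in kept for p in range(len(frame_attn))]


def debias(attn: List[List[float]], num_frames: int,
           keep_ratio: float) -> Dict[int, List[bool]]:
    """Step 1: select frames. Step 2: build a patch mask per selected frame."""
    frames = select_frames(attn, num_frames)
    return {t: patch_keep_mask(attn[t], keep_ratio) for t in frames}


# Toy attention map: 3 frames x 4 patches (made-up values).
attn = [
    [0.05, 0.05, 0.05, 0.05],  # low total attention, dropped in step 1
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.5, 0.3, 0.1],
]
masks = debias(attn, num_frames=2, keep_ratio=0.5)
for t, mask in masks.items():
    print(t, mask)
```

In this toy input, frame 0 is dropped and each remaining frame keeps its two highest-attention patches. A real pipeline would take the attention map from a video vision transformer and apply the mask to the input patches before they reach the recognition model.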

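In real-world scenarios, action recognition is an open set problem in which a model must classify actions from known classes while also detecting actions from unknown classes. However, action recognition models are easily biased toward static information in the video, such as the background, which can degrade the performance of open set action recognition models. In this thesis, we therefore propose a framework that improves open set action recognition based on attention maps extracted from the video. The proposed framework removes patches with static bias from the video through two debiasing steps: (1) frame selection and (2) patch masking. Experimental results show that the proposed method achieves consistent performance improvements across various open set action recognition methods and challenging benchmarks. In addition, to validate the proposed method in a setting close to real-world scenarios, we introduce two new open set action recognition benchmark tasks, Kinetics-400 vs. Kinetics-600 exclusive and Kinetics-400 vs. Kinetics-700 exclusive. Through a range of experiments, we demonstrate the effectiveness of attention-based masking, and through in-depth analysis we verify the effect of static bias on open set action recognition.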
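The open set setting described in the abstract (classify known actions, flag unknown ones) can be illustrated with a common baseline: thresholding the maximum softmax probability. The abstract does not say which open set methods the thesis evaluates, so this rule, the function name `open_set_predict`, and the threshold value are assumptions chosen only to make the task concrete.

```python
import math
from typing import List, Optional


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def open_set_predict(logits: List[float], threshold: float) -> Optional[int]:
    """Return the predicted known-class index, or None for "unknown".
    ASSUMED rule: reject when the top softmax probability is below threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= threshold else None


# Made-up logits over three known classes.
confident = open_set_predict([5.0, 0.0, 0.0], threshold=0.7)  # peaked scores
uncertain = open_set_predict([0.1, 0.0, 0.1], threshold=0.7)  # flat scores
print(confident, uncertain)
```

A peaked score vector is accepted as a known class, while a nearly flat one is rejected as unknown. In practice the threshold would be tuned on held-out data.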

Additional Bibliographic Information
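Call No.	MCS 23055
Physical Description	iv, 35 p. : ill. ; 30 cm
Language	English
General Note	Author's name in Korean : 심민호. Advisor's name in English : Ho-Jin Choi. Advisor's name in Korean : 최호진. Includes appendix.
Dissertation Note	Thesis (Master's) - Korea Advanced Institute of Science and Technology : School of Computing,
Bibliography Note	References : p. 29-33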
Subject	Open set action recognition ; video masking ; attention map ; video vision transformer