KAIST Library

Main bibliographic information
Network fusion based video summarization using visual-semantic features = 시각적 의미적 특징점을 이용한 네트워크 퓨전 기반 비디오 요약
Title / Author	Network fusion based video summarization using visual-semantic features = 시각적 의미적 특징점을 이용한 네트워크 퓨전 기반 비디오 요약 / Hyunwoo Nam.
Publication	[Daejeon : Korea Advanced Institute of Science and Technology, 2018].

Holdings
Registration number	8032057
Location / Call number	Academic Cultural Complex (Cultural Complex), preservation stacks / MEE 18038
Status	Available (not for loan)

Abstract

This paper proposes a video summarization method based on network fusion. The goal is to create a meaningful video summary that consists of representative scenes without duplication, using both visual and semantic features. To this end, the final summary is generated by considering the visual and semantic similarity among shots. More specifically, the method uses convolutional neural networks (CNNs) to extract visual and semantic features. Visual features are the image features from the top layer of the CNNs. Semantic features are the word vectors represented by the Word2Vec descriptor. Key frames are generated by shot segmentation and used as input to the CNNs. A visual network and a semantic network are then constructed by computing the cosine similarity among the visual features and among the semantic features, respectively. Once the similarities are computed, the two networks are combined into a single fused network by the network fusion process. An optimal video summary is then computed by applying spectral clustering to the fused network. The method is evaluated on two datasets, and the results show that it achieves better performance than state-of-the-art video summarization methods.
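The similarity-network step described in the abstract can be sketched in a few lines. This is only a minimal illustration, assuming each shot is already represented by one feature vector; the names `cosine_similarity`, `similarity_network`, and the toy `shot_features` are hypothetical and do not come from the thesis.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def similarity_network(features):
    """Dense n-by-n matrix of pairwise cosine similarities among shots."""
    n = len(features)
    return [[cosine_similarity(features[i], features[j]) for j in range(n)]
            for i in range(n)]

# Toy stand-ins for per-shot feature vectors (hypothetical values).
shot_features = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
network = similarity_network(shot_features)
```

The same function would be applied once to the visual features and once to the semantic features, giving the two networks that are later fused.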

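This paper proposes a network fusion based video summarization technique. The goal of the technique is to produce a meaningful video summary composed of representative scenes without duplication by using visual and semantic features. To achieve this goal, we generate the final video summary by considering the visual and semantic similarity among shots. Specifically, our technique uses convolutional neural networks to extract visual and semantic features. A visual feature is the image feature vector from the top layer of the convolutional neural network. A semantic feature is a word vector represented by the Word2Vec descriptor. In our technique, a number of key frames are generated by shot segmentation, and these key frames are used as input to the convolutional neural networks. A visual or semantic network is constructed by computing the cosine similarity among the visual or semantic features. After the cosine similarity is computed, the two networks are combined into a single network by network fusion. The optimal video summary is computed by applying spectral clustering to the combined network. The performance of our technique was evaluated on two datasets, and it achieved better performance than the latest video summarization techniques.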
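The fusion and clustering steps might look roughly like the sketch below. It rests on several assumptions that are not taken from the thesis: the abstract does not specify the fusion process, so a plain element-wise average is used here as a stand-in; the clustering is a two-way spectral partition (sign of the second eigenvector of the normalized affinity, found by power iteration) rather than a general k-way spectral clustering; similarities are assumed non-negative with positive row sums; and choosing one representative shot per cluster is an illustrative choice. All function names and matrix values are hypothetical.

```python
import math

def fuse_networks(visual, semantic):
    """Stand-in fusion (assumption): element-wise mean of two similarity matrices."""
    n = len(visual)
    return [[0.5 * (visual[i][j] + semantic[i][j]) for j in range(n)]
            for i in range(n)]

def spectral_bipartition(w, iterations=200):
    """Two-way spectral partition of a non-negative similarity matrix."""
    n = len(w)
    degree = [sum(row) for row in w]
    # Normalized affinity S = D^(-1/2) W D^(-1/2).
    s = [[w[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
         for i in range(n)]
    # The leading eigenvector of S is proportional to sqrt(degree).
    lead = [math.sqrt(d) for d in degree]
    lead_norm = math.sqrt(sum(v * v for v in lead))
    lead = [v / lead_norm for v in lead]
    x = [float(i + 1) for i in range(n)]  # deterministic start vector
    for _ in range(iterations):
        # Deflate: remove the component along the leading eigenvector.
        proj = sum(a * b for a, b in zip(x, lead))
        x = [a - proj * b for a, b in zip(x, lead)]
        # Power step on S + I (the shift keeps all eigenvalues non-negative).
        x = [x[i] + sum(s[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(a * a for a in x))
        x = [a / norm for a in x]
    # The sign pattern of the second eigenvector defines the two clusters.
    return [1 if a >= 0 else 0 for a in x]

def pick_representatives(w, labels):
    """Per cluster, keep the shot most similar to its own cluster members."""
    summary = []
    for cluster in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == cluster]
        best = max(members, key=lambda i: sum(w[i][j] for j in members))
        summary.append(best)
    return sorted(summary)

# Toy 4-shot networks (hypothetical values): shots 0-1 and 2-3 resemble each other.
visual = [[1.0, 0.9, 0.1, 0.2], [0.9, 1.0, 0.2, 0.1],
          [0.1, 0.2, 1.0, 0.8], [0.2, 0.1, 0.8, 1.0]]
semantic = [[1.0, 0.8, 0.3, 0.1], [0.8, 1.0, 0.1, 0.3],
            [0.3, 0.1, 1.0, 0.9], [0.1, 0.3, 0.9, 1.0]]
fused = fuse_networks(visual, semantic)
labels = spectral_bipartition(fused)
summary = pick_representatives(fused, labels)
```

On this toy input the partition separates shots 0 and 1 from shots 2 and 3, and the summary keeps one shot from each group, which mirrors the stated aim of representative scenes without duplication.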

Other bibliographic information
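Call number	MEE 18038
Physical description	iii, 26 p. : illustrations ; 30 cm
Language	English
General note	Author's name in Korean : 남현우. Advisor's name in English : Chang Dong Yoo. Advisor's name in Korean : 유창동.
Thesis	Thesis (Master's) - Korea Advanced Institute of Science and Technology : School of Electrical Engineering,
Bibliography note	References : p. 19-23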
