KAIST Library

Main bibliographic information
Deep predictive video compression using mode-selective uni- and bi-directional predictions based on multi-frame hypothesis = 심층 신경망 기반 다중 프레임을 이용한 단방향 및 양방향 예측을 이용한 비디오 압축
Title / Author	Deep predictive video compression using mode-selective uni- and bi-directional predictions based on multi-frame hypothesis = 심층 신경망 기반 다중 프레임을 이용한 단방향 및 양방향 예측을 이용한 비디오 압축 / Woonsung Park.
Publication	[Daejeon : Korea Advanced Institute of Science and Technology, 2021].

Holdings
Registration number	8037648
Location / Call number	Academic Cultural Complex (Cultural Hall), preservation stacks / DEE 21044
Status	Available (not for loan)

Abstract

Deep learning-based research in image processing has recently been very active. Likewise, recent image and video compression methods use deep neural networks for non-linear transforms and motion estimation. Deep learning-based image compression has shown significant improvements in coding efficiency and subjective quality, but comparatively little work has addressed video compression with deep neural networks. In this study, we propose an end-to-end deep predictive video compression network, called DeepPVCnet, that uses mode-selective uni- and bi-directional predictions based on a multi-frame hypothesis, together with a multi-scale structure and a temporal-context-adaptive entropy model. First, we propose a structure that compresses the current frame using multiple reference frames rather than a single reference frame as in recent methods. Methods based on a single reference frame are limited in coding efficiency because they make only limited use of neighboring-frame information when compressing the current frame. Second, drawing on lessons from conventional video codecs, we incorporate, for the first time, a mode-selective framework into our DeepPVCnet with uni- and bi-directional predictive modes, selected in a rate-distortion minimization sense. Because recent methods use either uni-directional or bi-directional prediction alone for the current frame, their coding efficiency can be limited. Third, we propose an entropy model that uses temporal context information from the reference frames when coding the current frame. Autoregressive entropy models for CNN-based image and video compression are difficult to compute in parallel. In contrast, our entropy model uses temporally coherent context from the reference frames, so the context information can be computed in parallel. Finally, DeepPVCnet jointly compresses the motion information and residual data generated by the multi-scale structure via feature transformation layers, which is advantageous in computational complexity and in removing redundancy between the two kinds of information. Extensive experiments show that DeepPVCnet outperforms AVC/H.264, HEVC/H.265, and state-of-the-art methods in terms of MS-SSIM.
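The mode selection "in a rate-distortion minimization sense" described in the abstract can be illustrated with a generic Lagrangian cost J = D + λR. The following is only a minimal sketch of that general idea, not the thesis's implementation: the names `ModeResult` and `select_mode`, the mode labels, the trade-off weight `lam`, and all numeric values are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModeResult:
    """Outcome of coding one frame with one candidate mode (illustrative only)."""
    name: str          # hypothetical label, e.g. "uni" or "bi"
    rate_bpp: float    # estimated bits per pixel for this mode
    distortion: float  # e.g. 1 - MS-SSIM, so lower is better


def select_mode(candidates, lam):
    """Return the candidate with the smallest cost J = D + lam * R."""
    if not candidates:
        raise ValueError("need at least one candidate mode")
    return min(candidates, key=lambda c: c.distortion + lam * c.rate_bpp)


# Made-up numbers: the bi-directional mode costs more bits but distorts less.
candidates = [
    ModeResult("uni", rate_bpp=0.10, distortion=0.020),
    ModeResult("bi", rate_bpp=0.12, distortion=0.015),
]

best_low = select_mode(candidates, lam=0.1)   # rate is cheap, quality wins
best_high = select_mode(candidates, lam=1.0)  # rate is expensive, fewer bits win
```

With these made-up numbers, a small `lam` favors the lower-distortion candidate and a large `lam` favors the lower-rate candidate, which is the trade-off a mode-selective codec has to resolve for each frame.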

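Deep learning research in image processing has recently been very active, and image and video compression is likewise being studied with deep learning-based techniques such as non-linear transforms and motion information extraction. However, while recent deep learning-based image compression has been studied actively and achieves high compression performance, deep learning-based video compression has received comparatively little study. This thesis proposes a deep neural network-based video compression method as follows. First, unlike existing structures that compress the current frame by referring to only a single frame, we propose a structure that compresses the current frame using multiple frames. Existing methods compress the current frame by predicting it from a single neighboring frame. Because such methods make only limited use of neighboring information, there is a limit to how far they can raise compression efficiency. The proposed method therefore improves compression performance by exploiting more neighboring information. Second, unlike existing methods that use only uni-directional or only bi-directional prediction, we propose a method that considers both cases and selects the one that achieves higher reconstruction quality with less information from a compression standpoint. Neighboring frames are used to compress the current frame, and deciding which frames to use from a compression standpoint can further improve compression performance. Third, for entropy coding of the quantized latent space, we assume that the latent distribution is Gaussian and propose a method that infers its parameters using both context information obtained from the current frame and context information obtained from neighboring frames. Existing methods predict the distribution at a given position from neighboring latent values already generated at the decoder, so parallel processing is impossible; the proposed method predicts the latent distribution from information that is always available at the decoder, so it can be processed in parallel. Fourth, we propose a method that effectively compresses the inter-frame motion information generated by the multi-scale structure, and the corresponding residual image, into the latent space at the same time. This has advantages in computational complexity and in structure over compressing the motion information and residual image separately. With these methods, the proposed approach achieved higher compression performance in terms of MS-SSIM than recent deep neural network-based video compression methods and the existing standard video codecs (AVC/H.264, HEVC/H.265).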
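The Gaussian assumption on the quantized latents can be made concrete with the bit-cost estimate commonly used in learned compression: the probability of an integer symbol is the Gaussian mass on a unit-width bin around it, and its cost is the negative base-2 logarithm of that mass. The sketch below assumes that standard formulation and is not taken from the thesis; the function names and the plain lists `mus` and `sigmas` are hypothetical stand-ins for parameters that a network would predict from context.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def symbol_bits(y, mu, sigma, eps=1e-9):
    """Estimated bits for integer symbol y: -log2 of the mass on [y-0.5, y+0.5]."""
    p = gaussian_cdf(y + 0.5, mu, sigma) - gaussian_cdf(y - 0.5, mu, sigma)
    return -math.log2(max(p, eps))  # eps guards against log2(0)


def total_bits(latents, mus, sigmas):
    """Sum of per-symbol costs; each term uses only its own (mu, sigma)."""
    return sum(symbol_bits(y, m, s) for y, m, s in zip(latents, mus, sigmas))


# Made-up quantized latents with well-matched and poorly matched means.
latents = [0, 1, -2]
sigmas = [1.0, 1.0, 1.0]
bits_matched = total_bits(latents, [0.0, 1.0, -2.0], sigmas)
bits_unmatched = total_bits(latents, [0.0, 0.0, 0.0], sigmas)
```

Each term in `total_bits` depends only on its own mean and scale. When those parameters come from information already available at the decoder, rather than from previously decoded latent values, the terms can be evaluated independently, which is what makes parallel evaluation possible.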

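Other bibliographic information
Call number	DEE 21044
Physical description	vii, 60 p. : illustrations ; 30 cm
Language	English
General note	Author's name in Korean: 박운성. Advisor's name in English: Munchurl Kim. Advisor's name in Korean: 김문철. Includes appendix.
Thesis	Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology : School of Electrical Engineering
Bibliography note	References : p. 54-57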
