KAIST Library

Main bibliographic information
Deep spiral convolutional neural network for single image super-resolution and image enhancement
Title / Author	Deep spiral convolutional neural network for single image super-resolution and image enhancement = 초해상도 단일 영상 복원 및 영상 화질 개선을 위한 심층 나선형 컨볼루셔널 신경망 연구 / Sanghyuk Park.
Publication	[Daejeon : KAIST, 2018].

Holdings
Registration number	8032431
Location / call number	Academic Cultural Complex (Cultural Complex), preservation stacks / DEE 18002
Status	Available (not for loan)

Abstract

Convolutional neural network (CNN) based super-resolution (SR) and restoration algorithms have recently achieved significant improvements on single image super-resolution (SISR) and various image enhancement (IE) tasks. The main objective of SR and IE is to generate a high-quality, high-resolution (HR) image from a given single low-resolution (LR) image or a corrupted, noisy image. Despite the powerful learning capacity of deep networks, previous CNN-based SR and IE algorithms still have limitations in recovering fine-textured HR results, even though they achieve high numerical similarity scores such as peak signal-to-noise ratio (PSNR). This dissertation considers a fully end-to-end trainable texture-enhanced multi-scale SR network (TE-MSRN) and IE networks (e.g., a multi-scale denoising network (MsDNN), a multi-scale deblurring network (MsDBN), and a video quality enhancement network (VQENet)) based on a deep spiral CNN, and mitigates the limitations of previous deep SR and IE networks in terms of SR and IE performance, training efficiency, and recovery of fine-textured details. As SR and IE networks get deeper, learning the long-range dependencies in the complex relationship between a corrupted LR image and its HR image becomes more difficult. In general, deeper networks suffer not only from increased computational complexity and memory cost but also from difficulty in training due to over-fitting and exploding, vanishing, or shattered gradients. To overcome these difficulties, this dissertation investigates six extensions: an upscaling network with multi-scale feature embedding, a multi-scale restoration network, both global and local residual learning, a texture evaluating network, a deep spiral CNN, and a combination of multiple losses.
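The peak signal-to-noise ratio mentioned above is a purely pixel-wise score, which is why a prediction can score well and still lack fine texture. The following is a minimal sketch of the standard PSNR definition, not code from the dissertation. It assumes images are flat lists of pixel values in the range 0 to 1; the function name `psnr` and the sample values are illustrative.

```python
import math


def psnr(reference, estimate, peak=1.0):
    """Return the PSNR in dB between two equally sized images.

    Images are flat lists of pixel values; `peak` is the largest
    possible pixel value (1.0 here by assumption).
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Illustrative values only, not data from the dissertation.
ground_truth = [0.0, 0.5, 1.0, 0.5]
prediction = [0.1, 0.5, 0.9, 0.5]
print(round(psnr(ground_truth, prediction), 2))  # about 23.01 dB
```

Because the score depends only on the mean squared error, it says nothing about where the error sits, so blurred texture and sharp but slightly shifted texture can receive similar values.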
The TE-MSRN takes an LR image and reconstructs an HR image using an upscaling network and a restoration network, not only minimizing the corresponding residuals but also enhancing texture by forcing the HR prediction to reproduce the ground-truth texture through a texture evaluating network. The global residual between the intermediate HR prediction and the ground truth is minimized in a recurrent manner while each local residual is reduced using a deep spiral CNN, and the intermediate output of each recurrent state in the restoration network is supervised by an intermediate auxiliary loss. While the HR output is reconstructed, the texture evaluating network is cascaded onto the restoration network so that an accurate texture prediction can be made from the output of the restoration network during training; it is removed during testing. The deep spiral CNN is realized as a recurrent structure that minimizes the restoration residual in multiple stages: a multi-scale recurrent CNN takes its previous output as input and produces an output that is closer to the ground-truth residual. With each iteration, the residual is gradually reduced and the HR prediction becomes closer to the ground truth. The entire process is reminiscent of a spiraling staircase reaching its destination. All subnetworks, each specialized for its own purpose, are nevertheless jointly optimized from scratch in a single unified architecture, which increases training efficiency and yields superior SR performance. The TE-MSRN is trained to produce a fine-textured HR image with a suitable combination of losses: $l_1$-loss, $l_2$-loss, perceptual structural similarity loss, and intermediate auxiliary loss. With this combination of loss functions, the TE-MSRN is explicitly trained to further reduce visually implausible artifacts, leading to a more accurate HR result.
This combination of losses is shown to be effective in further reducing visually implausible artifacts. The TE-MSRN is completely end-to-end trainable and integrated into a single unified architecture. The main architecture of the TE-MSRN is modified into the MsDNN, MsDBN, and VQENet for their respective tasks. Performance is evaluated on six standard benchmark datasets for SR, including two datasets consisting only of textures, four benchmark datasets for IE, and three benchmark datasets for complex video scene analysis (VSA) with video quality enhancement (VQE). Extensive experimental results show that the TE-MSRN, MsDNN, and MsDBN achieve the best performance and make better texture predictions than current state-of-the-art SR and IE algorithms, and that the VQENet helps to increase VSA performance.
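The abstract describes recurrent refinement, where each stage takes the previous output and removes part of the remaining residual, trained with a combination of losses that also supervises the intermediate outputs. The toy sketch below illustrates only that control flow. Everything in it is an assumption made for illustration: `toy_residual_predictor` is a hypothetical stand-in for the trained multi-scale recurrent CNN (it peeks at the ground truth, which a real network cannot do at test time), the loss weights are arbitrary, and the perceptual structural similarity loss and the texture evaluating network are omitted.

```python
def l1_loss(pred, target):
    """Mean absolute error between two flat images."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def l2_loss(pred, target):
    """Mean squared error between two flat images."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def toy_residual_predictor(current, target, rate=0.5):
    # Hypothetical stand-in for a learned network: it returns a fixed
    # fraction of the true remaining residual. A trained CNN would
    # estimate this from the image alone.
    return [rate * (t - c) for c, t in zip(current, target)]


def spiral_refine(initial, target, stages=4):
    """Run several refinement stages and keep every intermediate output."""
    predictions = []
    current = list(initial)
    for _ in range(stages):
        residual = toy_residual_predictor(current, target)
        # Each stage feeds its own previous output back in.
        current = [c + r for c, r in zip(current, residual)]
        predictions.append(current)
    return predictions


def combined_loss(predictions, target, w_l1=1.0, w_l2=1.0, w_aux=0.1):
    """Weighted l1 + l2 on the final output plus an auxiliary term that
    averages the l2 loss of the intermediate outputs (weights are assumed)."""
    final = predictions[-1]
    intermediates = predictions[:-1]
    aux = sum(l2_loss(p, target) for p in intermediates) / max(len(intermediates), 1)
    return w_l1 * l1_loss(final, target) + w_l2 * l2_loss(final, target) + w_aux * aux


# Illustrative values only, not data from the dissertation.
target = [0.0, 0.5, 1.0, 0.5]
upscaled = [0.2, 0.4, 0.7, 0.6]
stage_outputs = spiral_refine(upscaled, target)
errors = [l1_loss(p, target) for p in stage_outputs]
print(errors)
print(combined_loss(stage_outputs, target))
```

In this toy the printed errors shrink with every stage, mirroring the spiraling-staircase picture. The auxiliary term is what lets a training signal reach each intermediate stage rather than only the final output.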

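Convolutional neural network based super-resolution image restoration algorithms have recently achieved substantial performance gains in single image super-resolution and image enhancement. The main objective of super-resolution and image enhancement tasks is to generate a high-quality, high-resolution image from a corrupted low-resolution input image. Despite the powerful learning capacity of recent deep networks, many previously proposed CNN-based super-resolution and image enhancement methods still have limitations in restoring high-resolution images with visually fine texture, even though they score highly on numerical reconstruction-accuracy measures such as peak signal-to-noise ratio. To overcome the limitations of previously proposed deep CNN-based methods in terms of super-resolution and enhancement performance, network training efficiency, and recovery of fine texture details, this dissertation proposes a fully end-to-end trainable texture-enhanced multi-scale SR network (TE-MSRN) for super-resolution and image enhancement, based on a deep spiral convolutional neural network. When the network for super-resolution and image enhancement becomes deeper, not only do computational complexity and memory cost increase, but training the deep network also becomes difficult because of problems such as vanishing, exploding, and shattered gradients. To overcome these difficulties, this dissertation investigates performance improvements through six extensions: an upscaling network with multi-scale feature embedding, a multi-scale restoration network, global and local residual learning, a texture evaluating network, a deep spiral neural network structure, and training with a combination of multiple losses. The proposed TE-MSRN restores a high-resolution image from the input low-resolution image through the upscaling network and the restoration network while reducing the residuals, and uses the texture evaluating network to guide the texture of the restored high-resolution image toward the real texture. High-resolution restoration proceeds by minimizing the global residual between the intermediate prediction and the ground-truth image in a recurrent manner, while the deep spiral network simultaneously minimizes the local residuals through the intermediate prediction of each recurrent state and its auxiliary loss supervision. During restoration, the texture evaluating network is attached to the end of the restoration network, where it helps the output of the restoration network reproduce the ground-truth texture during training; it is removed at test time. The deep spiral convolutional neural network takes the output of the previous recurrent state as the input of the next state and generates a new output closer to the ground-truth residual, so that the remaining residual decreases steadily as the recurrence is repeated and the high-resolution prediction gradually approaches the ground truth. The proposed TE-MSRN is fully end-to-end trainable: all subnetworks, each specialized for its own purpose, are integrated into a single architecture and jointly optimized, which improves training efficiency and delivers superior super-resolution performance. The proposed TE-MSRN is trained to restore finely textured high-resolution images using a combination of multiple loss functions: $l_1$-loss, $l_2$-loss, perceptual structural similarity loss, and intermediate auxiliary loss. In addition, the base architecture of the TE-MSRN is adapted into a denoising network, a deblurring network, and a video quality enhancement network for other image enhancement tasks. The performance of the TE-MSRN is evaluated on six standard benchmark datasets, including two datasets consisting only of textures; four benchmark datasets are used for image enhancement, and three benchmark datasets for video quality enhancement and video scene analysis. The experimental evaluation confirms that the proposed method achieves the best performance compared with state-of-the-art super-resolution and image enhancement algorithms, and that the video quality enhancement network helps video scene analysis.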

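Other bibliographic information
Call number	DEE 18002
Physical description	vii, 102 p. : illustrations ; 30 cm
Language	English
General note	Author's name in Korean: 박상혁; advisor's name in English: Changdong Yoo; advisor's name in Korean: 유창동
Thesis	Doctoral thesis (Ph.D.) - KAIST : School of Electrical Engineering,
Bibliography note	References : p. 89-98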
