Omni-directional cameras have significant advantages over conventional cameras, chief among them a much wider field of view (FOV). Accordingly, several approaches have recently been proposed to apply convolutional neural networks (CNNs) to omni-directional images for various visual tasks. However, most of them transform the omni-directional views, which are originally formed in a non-Euclidean space, into image representations defined in the Euclidean space. This transformation introduces shape distortion, caused by non-uniform spatial resolving power, and destroys the continuity of the view. Both effects make it difficult for conventional convolution kernels to extract meaningful information.
This paper presents a novel method to resolve these problems of applying CNNs to omni-directional images. The proposed method represents omni-directional views with a spherical polyhedron, which minimizes the variance of spatial resolving power over the sphere's surface, and introduces convolution and pooling operations designed for this representation. The proposed method can also be adopted by any existing CNN-based method. Its feasibility is demonstrated through classification, detection, and semantic segmentation tasks on synthetic and real datasets.
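For concreteness, a standard way to build such a spherical polyhedron with nearly uniform triangular pixels is geodesic subdivision of an icosahedron: each triangle is split into four by its edge midpoints, and the new vertices are projected back onto the sphere. The sketch below is a minimal illustration of this construction only, not the authors' implementation; the function names and subdivision depth are chosen here for illustration.

```python
import numpy as np

def icosahedron():
    """Return the 12 vertices and 20 triangular faces of a unit icosahedron."""
    p = (1.0 + np.sqrt(5.0)) / 2.0  # golden ratio
    verts = np.array([
        [-1,  p, 0], [ 1,  p, 0], [-1, -p, 0], [ 1, -p, 0],
        [ 0, -1,  p], [ 0,  1,  p], [ 0, -1, -p], [ 0,  1, -p],
        [ p, 0, -1], [ p, 0,  1], [-p, 0, -1], [-p, 0,  1],
    ], dtype=float)
    verts /= np.linalg.norm(verts, axis=1, keepdims=True)  # project onto unit sphere
    faces = np.array([
        [0, 11, 5], [0, 5, 1], [0, 1, 7], [0, 7, 10], [0, 10, 11],
        [1, 5, 9], [5, 11, 4], [11, 10, 2], [10, 7, 6], [7, 1, 8],
        [3, 9, 4], [3, 4, 2], [3, 2, 6], [3, 6, 8], [3, 8, 9],
        [4, 9, 5], [2, 4, 11], [6, 2, 10], [8, 6, 7], [9, 8, 1],
    ])
    return verts, faces

def subdivide(verts, faces):
    """One level of geodesic subdivision: split every triangle into four
    and re-project the new midpoint vertices onto the unit sphere."""
    verts = list(verts)
    midpoint_cache = {}  # share midpoints between adjacent triangles

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoint_cache:
            m = (np.asarray(verts[i]) + np.asarray(verts[j])) / 2.0
            verts.append(m / np.linalg.norm(m))
            midpoint_cache[key] = len(verts) - 1
        return midpoint_cache[key]

    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [[a, ab, ca], [ab, b, bc], [ca, bc, c], [ab, bc, ca]]
    return np.array(verts), np.array(new_faces)

# Each subdivision level multiplies the face count by four,
# yielding 20 -> 80 -> 320 -> 1280 triangles of nearly equal area.
verts, faces = icosahedron()
for _ in range(3):
    verts, faces = subdivide(verts, faces)
print(verts.shape, faces.shape)  # (642, 3) (1280, 3)
```

Because every triangle covers almost the same solid angle, the spatial resolving power varies far less over the sphere than in an equirectangular projection, which is the property the proposed representation exploits.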
This paper analyzes in depth the problems that arise when omni-directional images are fed into convolutional neural networks, in relation to the geometric distortion of the images, and proposes SpherePHD, a new omni-directional image representation, to resolve this distortion. It also proposes new algorithms that allow images in the proposed SpherePHD representation to be processed by convolutional neural networks. Finally, various experiments verify that the proposed SpherePHD representation is valid and achieves higher performance than conventional omni-directional image representations when applied to convolutional neural networks. The experiments used for this verification are object classification, object detection, semantic segmentation, and depth map estimation.