Cell-based multi-view sequence encoding for representation of 3D object = 3차원 물체의 표현을 위한 셀 기반 다중 시점 영상의 부호화
Representation and storage of 3D objects or scenes are essential parts of many applications. Recently, there has been growing demand for more sophisticated and intelligent representations of 3D objects or scenes in applications such as computer graphics communication, natural and synthetic scene coding and composition, virtual reality and telepresence, and various multimedia Internet applications. Geometry-based rendering has been the typical approach to these problems, but it has several major drawbacks. To overcome them, image-based rendering has been introduced and developed by many researchers in recent years. Image-based rendering is a powerful and promising approach to 3D object representation, and it is fundamentally different from traditional geometry-based rendering: it treats a 3D object or scene as a collection of images, whereas geometry-based rendering uses a mathematical description of the object. The images in this collection are called key frames and are taken from predefined reference viewpoints. An image-based rendering system must therefore be able to generate intermediate views of the object from these key frames. In such a system, the computation required to render an object depends only on the number of pixels in the key frames, not on the geometric complexity of the object. However, the conventional image-based rendering approaches described in the literature have some problems. First, some of them need exact pixel-by-pixel correspondences and the depth of each pixel, which are difficult to calculate or may not be available. These methods also require the disparity map to be encoded and stored along with the key frames, which is undesirable because the disparity map adds to the storage burden.
Second, other approaches do not require exact pixel-by-pixel correspondences or per-pixel depth, but they need an enormous set of key frames, which makes the database huge. Above all, these approaches share one problem: they consider only the generation of intermediate views, so most research is confined to view synthesis. Other parts of a system implementation are not fully considered: how to select a key frame set, how to compress the key frames, and how to encode additional side information efficiently. In particular, encoding of the key frames, that is, encoding of multi-view sequences, is extremely important. An image-based rendering system has three major problems to solve: key frame selection, key frame encoding, and view synthesis. As noted above, view synthesis algorithms have been studied intensively by many researchers, while the other two problems have seldom been considered. Because a complete description of an object needs a large number of key frames, and transmitting or storing them is impractical owing to the increased cost and large bandwidth, efficient coding techniques should be employed to reduce the data rate. Our work aims at a complete implementation and integration of an image-based rendering system, with a particular focus on the encoding of key frames. The key frames of an object are acquired from different viewpoints, so there is inter-viewpoint redundancy among them. We show how to exploit this redundancy in encoding the key frames. In this thesis, we present an object-based encoding system for multi-view sequences of a 3D object and a view synthesis algorithm. We also provide two improved solutions for our multi-view encoder system.
First, we propose an efficient key frame rearrangement method, warping parameter estimation by total least squares, texture encoding, and shape information encoding. We report encoding results from computer simulations and examine them carefully. Second, we propose two additional improvements to our multi-view encoding system. One is a hierarchical encoding scheme for fast decoding and progressive transmission of key frames at various resolutions. The other is an enhanced outlier handling method for robust prediction of inter-viewpoint redundancy. Finally, we give a simple view synthesis algorithm based on the warping parameters, together with experimental results. Our view synthesis algorithm is based on image warping and consists of disparity field regeneration, region partitioning, and reverse warping.
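
To make the idea of warping parameter estimation by total least squares more concrete, the following is a minimal sketch and not the thesis's actual estimator. It assumes an affine warp between two key frames and a set of matched points, neither of which is specified in this abstract. The function names `tls_solve` and `estimate_affine_tls` are hypothetical. For simplicity, classical total least squares is applied to the whole design matrix, including its constant column.

```python
import numpy as np


def tls_solve(A, b):
    """Classical total least squares solution of A @ x ~ b.

    Unlike ordinary least squares, errors are allowed in both A and b.
    The solution is read off the right singular vector that belongs to
    the smallest singular value of the augmented matrix [A | b].
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    m, n = A.shape
    if m < n + 1:
        raise ValueError("need at least n + 1 equations")
    _, _, vt = np.linalg.svd(np.hstack([A, b]), full_matrices=False)
    v = vt[-1]  # singular values are sorted in descending order
    if abs(v[n]) < 1e-12:
        raise ValueError("TLS solution does not exist for this data")
    return -v[:n] / v[n]


def estimate_affine_tls(src, dst):
    """Estimate a 2x3 affine warp mapping src points to dst points.

    Assumption for illustration only: the warp is affine, so that
    dst = M[:, :2] @ src + M[:, 2] for each matched point pair.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    A = np.column_stack([src, np.ones(len(src))])
    row_x = tls_solve(A, dst[:, 0])  # parameters producing x'
    row_y = tls_solve(A, dst[:, 1])  # parameters producing y'
    return np.vstack([row_x, row_y])
```

Given at least four matched points that are not all collinear, `estimate_affine_tls(src, dst)` returns the six affine parameters as a 2x3 matrix. With noise-free matches the result coincides with the true warp. With noisy matches it differs from an ordinary least squares fit because the point coordinates in both frames are treated as uncertain.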

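The representation and storage of 3D objects have become essential elements in many applications. In particular, with the recent spread of the Internet, applications such as remote video conferencing, virtual reality, Internet shopping, and multimedia retrieval have become increasingly active. Geometry-based rendering has been regarded as the typical method for achieving these goals. However, geometry-based rendering has two drawbacks: modeling is difficult, and the large amount of computation it demands makes it hard to apply in fields that need real-time processing. Image-based rendering emerged as an effort to overcome these drawbacks, and it is now being studied actively. Image-based rendering represents an object with a set of images, without any need for mathematical modeling. That is, images are acquired while the viewpoint on the object is varied, and the object is represented by the resulting set of images. Image-based rendering has begun to attract attention as a powerful technique that makes up for the shortcomings of conventional geometry-based rendering, but most research so far has been confined to view synthesis. Other components of image-based rendering, such as the acquisition and selection of key frames or their efficient storage, have hardly been studied. This thesis describes compression coding of key frames for an image-based rendering system. First, by distinguishing key frame coding in image-based rendering from conventional video coding, we present a system that not only achieves good compression gain but also displays 3D objects more efficiently. We present a method of rearranging the key frames to facilitate random access, and a hierarchical coding approach suited to selective transmission according to image resolution, together with its experimental results. In addition, to raise coding efficiency, we present a study of weighting as an improved outlier handling method for reliable parameter estimation, along with its results.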


