3D computer graphics has become one of the most important tools for scientific visualization and military simulation, as well as an indispensable medium for personal communication. Although computing technology has improved, the current demand for higher quality in real-time applications exceeds the performance of existing graphics computers. To improve rendering performance, many researchers have worked on parallelization methods, dedicated processing elements, and special memory-processor organizations. As one such effort, this thesis contributes to the development of graphics computers by introducing new ideas for parallel rendering architecture.
A scalable parallel rendering architecture based on interleaved scanline rasterization is presented. It removes the sorting overhead inherent in conventional scanline-based parallel rendering while retaining all the advantages of scanline-based rendering: there is no frame memory bottleneck; the clipping overhead and load imbalance of other region-based approaches do not appear; and the memory requirement is constant regardless of the number of rasterizers. The performance of the proposed architecture is evaluated with a discrete event-based hardware simulator (DEVHS), which was developed for modeling and simulating combinational circuits. As expected, the architecture scales well up to the limit imposed by the polygon bus. A PC-based rendering system using 64 rasterizers and the AGP 2x bus is expected to exceed 8 Mtriangles/s. A first prototype of the rasterization board has been implemented, and its functional verification is complete.
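The scanline interleaving idea can be illustrated with a small software sketch. It assumes that scanline y is owned by rasterizer y mod N and that every triangle is broadcast to all rasterizers over the shared polygon bus; the function names (`span_at`, `rasterize_interleaved`) and the pixel-centre sampling rule are illustrative choices, not taken from the thesis hardware. Because each rasterizer simply skips the rows it does not own, no screen-space sorting of primitives is needed, and each rasterizer's frame memory holds only its own rows.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def span_at(tri: List[Point], yc: float) -> Optional[Tuple[float, float]]:
    """Return the [x_left, x_right) extent of the triangle on the line y = yc."""
    xs = []
    for (x0, y0), (x1, y1) in zip(tri, tri[1:] + tri[:1]):
        if y0 == y1:
            continue  # horizontal edges contribute no crossing
        # Half-open test: include the lower endpoint, exclude the upper one.
        if min(y0, y1) <= yc < max(y0, y1):
            t = (yc - y0) / (y1 - y0)
            xs.append(x0 + t * (x1 - x0))
    if len(xs) < 2:
        return None
    return min(xs), max(xs)


def rasterize_interleaved(tri: List[Point], rid: int, n: int,
                          height: int) -> List[Tuple[int, int]]:
    """Pixels of `tri` written by rasterizer `rid` out of `n` (hypothetical model).

    Assumption: scanline y belongs to rasterizer y % n. Every rasterizer receives
    the same broadcast triangle, so no bucket sorting by screen region is needed.
    """
    ys = [p[1] for p in tri]
    y_start = max(0, math.floor(min(ys)))
    y_end = min(height, math.ceil(max(ys)))
    y = y_start + (rid - y_start) % n  # first scanline >= y_start owned by rid
    pixels = []
    while y < y_end:
        span = span_at(tri, y + 0.5)  # sample at the pixel-centre row
        if span is not None:
            xl, xr = span
            # Pixel x is covered when its centre x + 0.5 lies in [xl, xr).
            for x in range(math.ceil(xl - 0.5), math.ceil(xr - 0.5)):
                pixels.append((x, y))
        y += n  # skip the rows owned by the other n - 1 rasterizers
    return pixels
```

For example, splitting `[(1.0, 0.5), (9.0, 2.0), (4.0, 11.0)]` across four rasterizers yields disjoint pixel sets whose union equals what a single rasterizer would produce; x is not clipped to the screen here, to keep the sketch short.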
Communication overhead is also a major issue in parallel rendering architectures. Exploiting the fact that the vertical motion of objects in computer animation is very smooth, the proposed primitive distribution algorithms effectively reduce network bandwidth and thus increase the overall performance of a parallel rendering system. The algorithms were developed for region-based parallel rendering architectures with a ring topology, including the interleaved scanline rasterization approach. Experimental results show that network performance scales well over a wide range of node counts.
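Why smooth vertical motion keeps traffic local can be seen with a toy cost model. The sketch below assumes contiguous horizontal bands (one per node) on a bidirectional ring and a hypothetical forwarding policy in which each node that newly needs a primitive receives it from the nearest node that already holds it; the names `owners`, `forwarding_hops`, and `frame_traffic` are illustrative, and this is not presented as the thesis's distribution algorithm.

```python
from typing import List, Set, Tuple

Extent = Tuple[float, float]  # (y_min, y_max) of a primitive's screen footprint


def owners(y_min: float, y_max: float, height: int, n: int) -> Set[int]:
    """Nodes whose band [k*height/n, (k+1)*height/n) overlaps [y_min, y_max]."""
    lo = min(n - 1, max(0, int(y_min * n / height)))
    hi = min(n - 1, max(0, int(y_max * n / height)))
    return set(range(lo, hi + 1))


def ring_distance(a: int, b: int, n: int) -> int:
    """Hop count between nodes a and b on a bidirectional ring of n nodes."""
    d = (b - a) % n
    return min(d, n - d)


def forwarding_hops(old: Extent, new: Extent, height: int, n: int) -> int:
    """Ring hops to deliver one primitive after it moves from `old` to `new`.

    Hypothetical policy: each node that newly needs the primitive gets it from
    the nearest node that already holds it; nodes that no longer need it drop it.
    """
    had = owners(*old, height, n)
    need = owners(*new, height, n)
    return sum(min(ring_distance(src, dst, n) for src in had) for dst in need - had)


def frame_traffic(prev: List[Extent], curr: List[Extent], height: int, n: int) -> int:
    """Total ring hops for one frame-to-frame update of all primitives."""
    return sum(forwarding_hops(o, c, height, n) for o, c in zip(prev, curr))
```

With 8 nodes on an 800-line screen, a primitive that stays inside its band costs 0 hops and one that drifts across a band boundary costs 1, whereas a large jump (say from band 0 to band 4) costs 4; under smooth motion, most updates fall into the first two cases.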
Several algorithms, and the hardware derived from them, for increasing rendering performance are also proposed in this thesis. A {\it memory embedded interpolator (MEI)} is presented for fast linear interpolation. The MEI uses memory as a look-up table to calculate linearly interpolated values. It is a simple and fast interpolator that can easily be implemented on general programmable devices as well as on ASICs. To overcome the quality degradation caused by linear interpolation, a perspective decomposition algorithm is also proposed. This algorithm effectively reduces perspective distortion and helps generate more realistic images, especially when texture filtering is applied.
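The two ideas in the paragraph above can be sketched in software under explicit assumptions. For the look-up-table interpolator, we assume a hypothetical table layout `table[d][i] = round(d * i / steps)`, so that each interpolated value costs one table read and one addition; the actual MEI organization may differ. For reducing perspective distortion we use a common stand-in (exact perspective-correct values at a few break points along the span, linear interpolation in between), which illustrates the general idea but may not match the thesis's decomposition. All function names are illustrative.

```python
from typing import List


def build_mei_table(max_delta: int, steps: int) -> List[List[int]]:
    """Hypothetical LUT: table[d][i] = round(d * i / steps)."""
    return [[(d * i + steps // 2) // steps for i in range(steps + 1)]
            for d in range(max_delta + 1)]


def mei_lerp(table: List[List[int]], a: int, b: int, i: int) -> int:
    """Integer lerp from a (i = 0) to b (i = steps): one table read, one add."""
    d = b - a
    return a + table[d][i] if d >= 0 else a - table[-d][i]


def perspective_value(u0: float, w0: float, u1: float, w1: float, s: float) -> float:
    """Perspective-correct attribute at screen parameter s in [0, 1]."""
    return ((1 - s) * u0 / w0 + s * u1 / w1) / ((1 - s) / w0 + s / w1)


def subdivided_span(u0, w0, u1, w1, length: int, segments: int) -> List[float]:
    """Exact values at segments + 1 break points, linear interpolation between."""
    out = []
    for x in range(length):
        s = x / (length - 1)
        k = min(int(s * segments), segments - 1)
        s0, s1 = k / segments, (k + 1) / segments
        a = perspective_value(u0, w0, u1, w1, s0)
        b = perspective_value(u0, w0, u1, w1, s1)
        out.append(a + (s - s0) / (s1 - s0) * (b - a))
    return out


def max_error(u0, w0, u1, w1, length: int, segments: int) -> float:
    """Largest deviation of the subdivided span from exact perspective values."""
    exact = [perspective_value(u0, w0, u1, w1, x / (length - 1)) for x in range(length)]
    approx = subdivided_span(u0, w0, u1, w1, length, segments)
    return max(abs(e - a) for e, a in zip(exact, approx))
```

For a 64-pixel span whose depth changes by a factor of four (`w0 = 1.0`, `w1 = 4.0`), `max_error(..., segments=8)` is far smaller than the purely linear `segments=1` case, while the per-pixel inner step remains a plain linear interpolation that a table-based unit could perform.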
An efficient bump mapping algorithm based on angular perturbation is proposed. It calculates $\mathbf{n}'\cdot\mathbf{h}$ and $\mathbf{n}'\cdot\mathbf{l}$ directly from the perturbation functions, in contrast to conventional approaches, which first perturb the normal vector. By using vector relations expressed in spherical polar form, it considerably reduces the number of operations involved in bump mapping. A tabular method is combined with the proposed bump mapping algorithm to provide real-time Phong shading. Since the algorithm produces $\mathbf{n}'\cdot\mathbf{h}$ and $\mathbf{n}'\cdot\mathbf{l}$ values directly, the bit widths required to index the diffuse and specular look-up tables are much smaller than in previous approaches, which leads to a very small memory requirement. Phong shading on bumped surfaces can therefore be performed in real time, and the resulting hardware is simple enough to be integrated into a single-chip rendering accelerator. In addition, several approximation methods for the angular perturbation algorithm are developed to simulate normal vector perturbation in non-orthonormal coordinates. This removes some constraints on modeling object geometry, which makes the method useful for rendering non-uniform surfaces or deformable objects without large computational overhead.
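The spherical polar relation behind this approach can be checked numerically. If two unit vectors have polar angles $(\theta_a, \phi_a)$ and $(\theta_b, \phi_b)$, their dot product is $\sin\theta_a \sin\theta_b \cos(\phi_a - \phi_b) + \cos\theta_a \cos\theta_b$, so a perturbation applied to the normal's angles yields $\mathbf{n}'\cdot\mathbf{h}$ and $\mathbf{n}'\cdot\mathbf{l}$ without building $\mathbf{n}'$ explicitly. The sketch below assumes the perturbation is given directly as angle offsets `(d_theta, d_phi)` and uses hypothetical one-dimensional diffuse and specular tables indexed by the quantized dot products; the names, bit width, and shininess exponent are illustrative, not the thesis's hardware parameters.

```python
import math
from typing import List, Tuple


def sph_to_vec(theta: float, phi: float) -> Tuple[float, float, float]:
    """Unit vector from polar angle theta and azimuth phi."""
    return (math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi), math.cos(theta))


def angular_dot(theta_a: float, phi_a: float, theta_b: float, phi_b: float) -> float:
    """Dot product of two unit vectors computed from their spherical angles."""
    return (math.sin(theta_a) * math.sin(theta_b) * math.cos(phi_a - phi_b)
            + math.cos(theta_a) * math.cos(theta_b))


def perturbed_dots(theta_n, phi_n, d_theta, d_phi, h_angles, l_angles):
    """(n'.h, n'.l) with the perturbation applied to the normal's angles."""
    t, p = theta_n + d_theta, phi_n + d_phi
    return angular_dot(t, p, *h_angles), angular_dot(t, p, *l_angles)


def build_tables(bits: int, shininess: float) -> Tuple[List[float], List[float]]:
    """Hypothetical LUTs: entry k stands for a dot product of k / (2**bits - 1)."""
    size = 1 << bits
    diffuse = [k / (size - 1) for k in range(size)]
    specular = [(k / (size - 1)) ** shininess for k in range(size)]
    return diffuse, specular


def quantize(x: float, bits: int) -> int:
    """Clamp a dot product to [0, 1] and convert it to a table index."""
    return round(max(0.0, min(1.0, x)) * ((1 << bits) - 1))


def shade(nh: float, nl: float, tables, bits: int, kd: float, ks: float) -> float:
    """Phong-style intensity from two table reads (ambient term omitted)."""
    diffuse, specular = tables
    return kd * diffuse[quantize(nl, bits)] + ks * specular[quantize(nh, bits)]
```

With 8-bit indices, each table has only 256 entries, and `shade` stays within a few hundredths of the unquantized value `kd * nl + ks * nh ** 20` for typical angles; finer indices trade memory for accuracy.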
All the proposed methods were developed for high-performance rendering computers. The interleaved scanline rasterization and the primitive distribution algorithms on the ring topology together form a scalable parallel rendering architecture. The MEI, perspective decomposition, and bump mapping algorithms further increase the rendering performance of this architecture. I believe that the proposed architecture is well suited to large-scale parallel rendering systems that must rapidly generate complex scenes containing many moving objects.
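3D computer graphics has become a tool used in many fields, such as games, virtual reality, scientific visualization, and simulated military training, and it is also an important medium for personal communication. Although computer performance has advanced, the demand to produce higher-quality images in less time exceeds the performance of existing graphics computers. Much research is being conducted to increase rendering performance, and this thesis likewise presents new methods for parallelization and for accelerating existing algorithms.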
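A scalable parallel rendering architecture is proposed, based on a partitioning method that assigns scanlines to processors. It improves performance by eliminating the polygon sorting problem while retaining the advantages of existing architectures, namely the absence of a frame memory bottleneck, a relatively even load distribution, and a small frame memory requirement. The performance of the proposed architecture was confirmed through discrete event simulation, which showed 8 Mtriangles/s when 64 processors are used with a PC as the host.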
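In addition, a rendering data distribution algorithm is proposed that reduces inter-processor communication, a fundamental problem of parallel computers. Assuming smooth motion, it lets rendering data be exchanged through local communication, which effectively reduces the total communication volume. Each processor is connected to a ring network and exchanges data with its neighboring processors; combined with the scanline-assignment parallel rendering method described above, this yields rendering and network performance that scale linearly overall.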