Voxel coloring is a popular method of shape estimation. One of its limitations, however, is that the approximated shape can be very coarse when only a few cameras are available, and processing a large number of silhouettes is computationally expensive. Previous works improve the shape approximation by combining multiple silhouette images captured across time, but as silhouettes accumulate over time the computing cost grows excessively. This motivates the need to select an optimal subset of silhouettes for efficient voxel coloring.
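The silhouette-based carving that voxel coloring builds on can be sketched as follows. This is a minimal, generic visual-hull sketch, not the paper's implementation; the function name, the 3x4 projection-matrix interface, and the voxel layout are all assumptions made for illustration.

```python
import numpy as np

def carve_visual_hull(voxels, cameras):
    """Keep a voxel only if it projects inside every silhouette.

    voxels  : (N, 3) array of voxel centers in world coordinates
    cameras : list of (P, silhouette) pairs, where P is a 3x4 projection
              matrix and silhouette is a 2D boolean mask
    """
    keep = np.ones(len(voxels), dtype=bool)
    homog = np.hstack([voxels, np.ones((len(voxels), 1))])  # (N, 4) homogeneous
    for P, sil in cameras:
        proj = homog @ P.T                                  # (N, 3) pixel coords
        px = (proj[:, 0] / proj[:, 2]).round().astype(int)
        py = (proj[:, 1] / proj[:, 2]).round().astype(int)
        inside = (px >= 0) & (px < sil.shape[1]) & (py >= 0) & (py < sil.shape[0])
        hit = np.zeros(len(voxels), dtype=bool)
        hit[inside] = sil[py[inside], px[inside]]
        keep &= hit   # carve any voxel that falls outside some silhouette
    return voxels[keep]
```

With few cameras this intersection of silhouette cones stays coarse, which is the limitation the abstract addresses by accumulating silhouettes over time.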
Our algorithm first computes a motion homography using stereo techniques such as 2D correspondence matching and triangulation. We then analyze the homography to recover the motion parameters and the number of independent motions in the scene. Once the rigid motion of each part between two time frames is known, all silhouettes can be treated as if they were captured at the same instant, which refines the visual hull of the objects.
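The abstract does not spell out how the motion homography is computed, but the standard route from 2D correspondences is the direct linear transform (DLT). The sketch below is that generic method under assumed names, not the paper's exact procedure:

```python
import numpy as np

def dlt_homography(src, dst):
    """Fit H such that dst ~ H @ src in homogeneous coordinates.

    src, dst : (N, 2) arrays of corresponding points, N >= 4.
    Each correspondence contributes two rows of the DLT system A h = 0;
    the solution is the right singular vector of A with smallest value.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]   # normalize so H[2, 2] == 1
```

Decomposing the recovered homography (e.g. into rotation, translation, and plane normal) then yields the motion parameters the abstract refers to.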
When many silhouettes are available, we draw on human visual system theory to find optimal views for voxel coloring. This process has two steps: we first select basis views for shape estimation, then locate the most complicated region using an F-distribution test and select additional views around that region.
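The abstract does not define the F-distribution criterion, so the following is only a guessed sketch: score each candidate view by a complexity measure, form an F-like statistic as a ratio of sample variances, and keep the k highest-scoring views. All names and the scoring scheme are assumptions.

```python
import numpy as np

def f_statistic(sample_a, sample_b):
    """Ratio of unbiased sample variances, as compared against an
    F critical value when testing whether one region is more variable
    (i.e. more complicated) than another."""
    a = np.asarray(sample_a, dtype=float)
    b = np.asarray(sample_b, dtype=float)
    return a.var(ddof=1) / b.var(ddof=1)

def select_views(scores, k):
    """Return the indices of the k views with the largest complexity
    score, in ascending index order."""
    order = np.argsort(scores)[::-1]
    return sorted(order[:k].tolist())
```

Views clustered around the region whose statistic exceeds the chosen F critical value would then be preferred, matching the abstract's idea of concentrating views on the complicated area.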
We validate our algorithm on real objects and compare it with shape-from-silhouette (SFS). Finally, we validate our optimal-view selection by comparing the 3D reconstruction obtained from 18 optimal images with that obtained from 36 images.