Segmenting and tracking moving objects in monocular gray-level images can normally be divided into three sub-problems: feature extraction, motion estimation, and 3D pose estimation.
Several previous approaches, developed specifically for road scenes, have had only limited success in outdoor environments because of dynamically changing illumination and the complexity and diversity of the scene. As a result, most of them work only in particular situations and produce unexpected, erroneous output outside them.
This thesis proposes evidential reasoning and a probabilistic representation of extracted features for robustly extracting vehicles from a road scene. In general, evidential reasoning searches an image for perceptually known evidence that a target is present.
Because a noisy image produces unreliable features and degrades detection and localization, it is important to select image primitives that are less sensitive to noise and represent the evidence well. We address this problem by integrating image features probabilistically under the maximum a posteriori (MAP) criterion, which combines the prior and likelihood probabilities using Bayes' rule. The resulting accurate segmentation provides a sound basis for estimating geometry and motion as well as 3D pose.
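As an illustration of MAP integration, the following is a minimal sketch, not the thesis's implementation. It assumes two hypothetical labels ("vehicle" and "background"), features that are conditionally independent given the label, and Gaussian likelihoods whose means and standard deviations are invented for the example; the function names `gaussian_pdf` and `map_label` are likewise hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Univariate normal density N(x; mean, std^2)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def map_label(features, priors, models):
    """Pick the label that maximizes the posterior P(label | features).

    Assumes the features are conditionally independent given the label,
    so the likelihood is a product of per-feature Gaussian densities.
    priors: dict label -> P(label)
    models: dict label -> list of (mean, std), one pair per feature
    """
    unnormalized = {}
    for label, prior in priors.items():
        likelihood = 1.0
        for x, (mean, std) in zip(features, models[label]):
            likelihood *= gaussian_pdf(x, mean, std)
        # Bayes' rule numerator: prior times likelihood.
        unnormalized[label] = prior * likelihood
    evidence = sum(unnormalized.values())  # P(features), the normalizer
    posteriors = {k: v / evidence for k, v in unnormalized.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors


if __name__ == "__main__":
    # Hypothetical per-region features, e.g. normalized edge strength
    # and normalized contrast against the road surface.
    priors = {"vehicle": 0.3, "background": 0.7}
    models = {
        "vehicle": [(0.8, 0.1), (0.6, 0.2)],
        "background": [(0.2, 0.15), (0.3, 0.2)],
    }
    label, post = map_label([0.75, 0.55], priors, models)
    print(label, post)
```

Because the evidence term is the same for every label, normalizing does not change which label wins; it only turns the scores into probabilities. With many features, summing log-densities instead of multiplying densities would avoid numerical underflow.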
Using the observed geometric data and motion parameters, we develop a Kalman-filter-based tracking algorithm that recursively estimates the geometry and motion of a target. Under an affine transformation assumption, each image region representing a vehicle evolves smoothly from frame to frame.
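The recursive estimation can be illustrated with a standard linear Kalman filter. The sketch below is a deliberate simplification and an assumption on our part: it tracks only the center of gravity of one region with a constant-velocity model, rather than the full affine geometry and motion, and the class name `CentroidKalmanFilter` and the noise settings `q` and `r` are hypothetical.

```python
import numpy as np


class CentroidKalmanFilter:
    """Constant-velocity Kalman filter for one region's centroid (hypothetical)."""

    def __init__(self, x0, y0, dt=1.0, q=1e-2, r=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])  # state: [x, y, vx, vy]
        self.P = np.eye(4) * 10.0  # initial state covariance (uncertain)
        # State transition: position advances by velocity * dt.
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        # Only the centroid position is measured.
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * q  # process noise covariance
        self.R = np.eye(2) * r  # measurement noise covariance

    def predict(self):
        """Propagate the state and covariance one frame ahead."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Correct the prediction with a measured centroid z = (x, y)."""
        z = np.asarray(z, dtype=float)
        innovation = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R  # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x


if __name__ == "__main__":
    kf = CentroidKalmanFilter(0.0, 0.0)
    for t in range(1, 21):
        kf.predict()
        kf.update([2.0 * t, 1.0 * t])  # region moving with velocity (2, 1)
    print(kf.x)
```

Feeding it the noise-free centroid (2t, t) for t = 1 to 20 drives the velocity estimate toward (2, 1). Under the same assumptions, tracking the affine region parameters instead would keep this predict/update structure with larger F, H, Q, and R matrices.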
By integrating parameter estimation with image segmentation, we improve the accuracy of both segmentation and tracking and reduce the effects of noise, large motion, and abrupt changes in motion.
Relative motion between an observer and an object deforms the image, and this deformation can be described by first-order differential invariants. We obtain the image deformation and the direction of relative motion from the changes, between two images, in the orientation of extracted region boundaries and in the centers of gravity of the extracted regions. These two quantities yield the time-to-contact between the observer and the target and the normal vector of the viewed surface.
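The first-order differential invariants of an image motion field are its divergence, curl, and deformation. The sketch below is illustrative only and swaps in a different measurement from the one described above: instead of boundary orientations and centers of gravity, it fits an affine map to hypothetical point correspondences of one region between two frames, approximates the velocity gradient as (M - I)/dt, and decomposes it into the invariants. It then estimates time-to-contact as tau = 2/div, which holds only under the added assumption of pure approach toward a fronto-parallel surface (image flow u = x/tau, v = y/tau); it does not recover the surface normal. The function names `differential_invariants` and `time_to_contact` are hypothetical.

```python
import numpy as np


def differential_invariants(pts0, pts1, dt=1.0):
    """Fit x1 = M x0 + t by least squares and return (div, curl, deform).

    pts0, pts1: (N, 2) arrays of corresponding image points, N >= 3,
    not all collinear. The velocity gradient is approximated as (M - I) / dt.
    """
    pts0 = np.asarray(pts0, dtype=float)
    pts1 = np.asarray(pts1, dtype=float)
    X = np.column_stack([pts0, np.ones(len(pts0))])  # rows [x, y, 1]
    sol, *_ = np.linalg.lstsq(X, pts1, rcond=None)  # shape (3, 2)
    M = sol[:2].T  # linear part of the affine map
    A = (M - np.eye(2)) / dt  # approximate velocity gradient
    div = A[0, 0] + A[1, 1]
    curl = A[1, 0] - A[0, 1]
    # Magnitude of the pure-shear (deformation) component.
    deform = np.hypot(A[0, 0] - A[1, 1], A[0, 1] + A[1, 0])
    return div, curl, deform


def time_to_contact(div):
    """tau = 2 / div, valid only for pure approach toward a fronto-parallel surface."""
    if div <= 0:
        return float("inf")  # not approaching
    return 2.0 / div


if __name__ == "__main__":
    square = np.array([[0, 0], [10, 0], [10, 10], [0, 10], [5, 3]], dtype=float)
    moved = 1.05 * square + np.array([2.0, -1.0])  # expansion plus translation
    div, curl, deform = differential_invariants(square, moved)
    print(div, curl, deform, time_to_contact(div))
```

With a region that expands by a factor of 1.05 between consecutive frames (dt = 1), the fitted divergence is 0.1 and the estimate is tau = 20 frames; the translation is absorbed by the affine offset and does not affect the invariants.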
We have successfully performed visual tracking of a toy vehicle moving on a three-dimensionally shaped rail using an X-Y Cartesian manipulator, and we have demonstrated the accuracy of segmentation and tracking of multiple moving vehicles in experiments on a variety of road scenes.