Stereo, or the determination of 3D structure from multiple 2D images of a scene, is one of the fundamental problems of computer vision. Although steady progress has been made in recent algorithms, producing accurate results in the neighborhood of depth discontinuities remains a challenge. Moreover, the techniques that best localize depth discontinuities commonly work with only a discrete set of disparity values, which hinders the modeling of smooth, non-fronto-parallel surfaces.
This dissertation proposes a three-axis categorization of binocular stereo algorithms according to their modeling of smooth surfaces, depth discontinuities, and occlusion regions, and describes a new algorithm that simultaneously lies in the most accurate category along each axis. To the author's knowledge, it is the first such algorithm for binocular stereo.
The proposed method estimates scene structure as a collection of smooth surface patches. The disparities within each patch are modeled by a continuous-valued spline, while the extent of each patch is represented via a labeled, pixelwise segmentation of the source images. Disparities and extents are alternately estimated by surface fitting and graph cuts, respectively, in an iterative energy-minimization framework. Input images are treated symmetrically, and occlusions are addressed explicitly. Boundary localization is aided by image gradients.
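
The alternation between estimating disparities and estimating extents can be illustrated with a deliberately simplified sketch. It is not the dissertation's algorithm, and it makes several assumptions: each patch is a least-squares plane rather than a spline, the relabeling step assigns each pixel independently to the best-fitting surface rather than solving a graph cut with a smoothness term, the input is a single precomputed disparity map rather than a symmetric image pair, and occlusions and image gradients are ignored. All function and variable names are hypothetical.

```python
import numpy as np

def fit_planes(xy, d, labels, params):
    """Fitting step: for each label, fit d = a*x + b*y + c by least squares.

    Planes are an assumed stand-in for the continuous-valued splines.
    Labels with fewer than 3 pixels keep their previous parameters.
    """
    A = np.column_stack([xy, np.ones(len(xy))])
    params = params.copy()
    for j in range(len(params)):
        mask = labels == j
        if mask.sum() >= 3:
            params[j] = np.linalg.lstsq(A[mask], d[mask], rcond=None)[0]
    return params

def assign_labels(xy, d, params):
    """Labeling step: give each pixel the label of the surface that fits it best.

    This per-pixel choice minimizes only a data term; it is an assumed
    stand-in for the graph-cut step and has no smoothness or occlusion terms.
    """
    A = np.column_stack([xy, np.ones(len(xy))])
    residual = np.abs(A @ params.T - d[:, None])  # shape (n_pixels, n_labels)
    return residual.argmin(axis=1)

def alternate(xy, d, labels, n_labels, max_iters=10):
    """Alternate the two steps until the labeling stops changing."""
    params = np.zeros((n_labels, 3))
    for _ in range(max_iters):
        params = fit_planes(xy, d, labels, params)
        new_labels = assign_labels(xy, d, params)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, params
```

Here `xy` is an `(n, 2)` array of pixel coordinates, `d` holds one disparity per pixel, and `labels` is an initial integer segmentation. Each of the two steps can only lower or keep the sum of absolute residuals when the other quantity is held fixed, which is the sense in which such an alternation fits an energy-minimization framework.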
Qualitative and quantitative experimental results demonstrate that, for scenes consisting of smooth surfaces, the proposed algorithm significantly improves upon the state of the art, localizing both the depth of surface interiors and the position of surface boundaries more accurately. Finally, limitations of the proposed method are discussed, and directions for future research are suggested.