M. J. Black
D. J. Fleet
We present a new method for modeling and tracking human motion from a sequence of 2D video images. The analysis has two parts. First, we estimate a statistical model of typical activities from a large set of 3D human motion data. Second, we use this probabilistic model as a prior distribution for Bayesian propagation with particle filters.
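For orientation, here is the generic Bayesian filtering recursion that this kind of propagation approximates. The notation is ours, not the source's: phi_t stands for the body-model parameters at time t and I_t for the image at time t. We assume the learned motion model enters through the temporal prior p(phi_t | phi_{t-1}), and that a particle filter represents each posterior with a weighted set of samples.

```latex
p(\phi_t \mid I_{1:t}) \;\propto\; p(I_t \mid \phi_t)
  \int p(\phi_t \mid \phi_{t-1})\, p(\phi_{t-1} \mid I_{1:t-1})\, d\phi_{t-1}
```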
From a statistical modeling perspective, 3D human motion can be thought of as a collection of time series. The human body is represented as a set of articulated cylinders with 25 degrees of freedom, and the evolution of each joint angle is described by one of the time series. A key difficulty in modeling these data is that each time series must be decomposed into suitable temporal primitives before statistical analysis. For repetitive motion such as walking, for example, the sequences decompose naturally into a series of identical "motion cycles". In this work, we present a new set of tools for automatically segmenting the training data. Specifically, we propose an iterative procedure that finds the segmentation that is best with respect to the signal-to-noise ratio of the data in an aligned reference domain. This lets us use the mean and the principal components of the individual cycles in the reference domain as a statistical model. Technical difficulties here include missing information in the motion time series and the need to enforce smooth transitions between cycles. To address them, we develop a new iterative method for data imputation and functional Principal Component Analysis (PCA) based on periodic regression splines.
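A minimal sketch of the modeling step, assuming the cycles of one joint angle have already been segmented and resampled to a common number of phases in the reference domain (the SNR-driven segmentation itself is not shown). Two substitutions are made for brevity: a truncated Fourier basis stands in for the periodic regression splines, and the imputation loop is a generic iterative low-rank PCA reconstruction rather than the authors' exact procedure. All names (`learn_cycle_model`, `periodic_basis`, `n_harmonics`) are hypothetical.

```python
import numpy as np


def periodic_basis(m, n_harmonics):
    """Truncated Fourier basis on m equally spaced phases in [0, 1).

    Every basis function wraps around smoothly, so the end of a cycle joins
    its start. Stand-in for periodic regression splines (assumption).
    """
    t = np.arange(m) / m
    cols = [np.ones(m)]
    for h in range(1, n_harmonics + 1):
        cols.append(np.cos(2 * np.pi * h * t))
        cols.append(np.sin(2 * np.pi * h * t))
    return np.column_stack(cols)  # shape (m, 2 * n_harmonics + 1)


def _smooth_pca(filled, proj, k):
    """Smooth every cycle, then return (mean, top-k components, rank-k reconstruction)."""
    smooth = filled @ proj  # proj is symmetric, so this projects each row
    mean = smooth.mean(axis=0)
    _, _, vt = np.linalg.svd(smooth - mean, full_matrices=False)
    recon = mean + (smooth - mean) @ vt[:k].T @ vt[:k]
    return mean, vt[:k], recon


def learn_cycle_model(X, n_components=2, n_harmonics=4, n_iter=100, tol=1e-8):
    """Mean and principal components of aligned cycles with missing samples.

    X: (n_cycles, m) array; row i is cycle i of one joint angle resampled to
    m phases of the reference domain; NaN marks a missing sample.
    Assumes every phase is observed in at least one cycle.
    Returns (mean, components, X with missing samples imputed).
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    basis = periodic_basis(X.shape[1], n_harmonics)
    proj = basis @ np.linalg.pinv(basis)  # orthogonal projection onto the basis span
    filled = np.where(missing, np.nanmean(X, axis=0), X)  # start from per-phase means
    for _ in range(n_iter):
        _, _, recon = _smooth_pca(filled, proj, n_components)
        updated = np.where(missing, recon, X)  # observed samples are never changed
        change = np.max(np.abs(updated - filled))
        filled = updated
        if change < tol:
            break
    mean, components, _ = _smooth_pca(filled, proj, n_components)
    return mean, components, filled
```

Projecting onto a periodic basis before the PCA is what keeps each reconstructed cycle smooth across its wrap-around point, which is the role the source assigns to the periodic splines.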
The learned temporal model provides a prior probability distribution over human motions which can be used in a Bayesian framework for tracking. For this purpose, we specify a generative model of image appearance and the likelihood of observing image data given the model. The high dimensionality and non-linearity of the articulated human body model and the ambiguities in matching the generative model to the image result in a posterior distribution that cannot be represented in closed form. Hence, the posterior is represented using a discrete set of samples and is propagated over time using particle filtering. The learned temporal prior helps constrain the sampled posterior to regions of the parameter space with a high probability of corresponding to human motions. The resulting algorithm is able to track human subjects in monocular video sequences and recover their 3D motion under changes in their pose and against complex unknown backgrounds.
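The propagation described above can be sketched as a sampling-importance-resampling (SIR) particle filter on a deliberately tiny problem. This is a toy under stated assumptions, not the authors' tracker: the state is a single joint angle's (phase, first-PC coefficient) rather than 25 degrees of freedom, the learned model is replaced by hypothetical functions `mean_curve` and `first_pc`, and a Gaussian likelihood on a directly observed angle stands in for the image likelihood of the generative appearance model. All noise parameters are illustrative.

```python
import numpy as np

TWO_PI = 2 * np.pi


def mean_curve(phase):
    """Hypothetical learned mean cycle of one joint angle (degrees)."""
    return 10 + 5 * np.sin(TWO_PI * phase)


def first_pc(phase):
    """Hypothetical first principal component of the same joint angle."""
    return np.cos(TWO_PI * phase)


def predicted_angle(particles):
    """Joint angle implied by each particle's (phase, coefficient) pair."""
    phase, coef = particles[:, 0], particles[:, 1]
    return mean_curve(phase) + coef * first_pc(phase)


def propagate(particles, rng, speed=0.02, phase_sd=0.01, coef_sd=0.05):
    """Sample from the temporal prior: advance the phase, jitter the coefficient."""
    out = particles.copy()
    n = len(out)
    out[:, 0] = (out[:, 0] + speed + rng.normal(0, phase_sd, n)) % 1.0
    out[:, 1] = out[:, 1] + rng.normal(0, coef_sd, n)
    return out


def sir_step(particles, observation, rng, obs_sd=0.2):
    """One SIR step; returns (resampled particles, posterior-mean angle)."""
    particles = propagate(particles, rng)
    pred = predicted_angle(particles)
    # Gaussian log-likelihood; subtracting the max avoids underflow to all zeros
    log_w = -0.5 * ((observation - pred) / obs_sd) ** 2
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    estimate = float(np.sum(w * pred))
    idx = rng.choice(len(particles), size=len(particles), p=w)  # multinomial resampling
    return particles[idx], estimate
```

Because the prior only moves particles along learned cycles, samples stay in the region of the state space that corresponds to plausible motion, which is the constraining effect the paragraph describes.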
MPEG movies of learned models. The movies show the mean and the first five principal components of the learned walking model; the principal components illustrate the main sources of variation from the mean walking behaviour. We also show several samples of artificially synthesized motions, generated using different noise levels.
Principal components: 1, 2, 3, 4, 5.
Synthesized samples: low, moderate, large, and huge noise.
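Synthesizing new motions from the learned mean and principal components can be sketched as below. The source does not say how its "noise level" is defined; here we assume it scales the variance of Gaussian principal-component coefficients, and the function and parameter names are hypothetical.

```python
import numpy as np


def synthesize_cycles(mean, components, variances, noise_scale, n_samples, rng):
    """Draw cycles mean + sum_k z_k * sqrt(noise_scale * variance_k) * component_k.

    mean: (m,) mean cycle; components: (K, m) orthonormal principal components;
    variances: (K,) variance of each component's coefficient in the training data.
    """
    z = rng.standard_normal((n_samples, len(variances)))
    coeffs = z * np.sqrt(noise_scale * np.asarray(variances, dtype=float))
    return mean + coeffs @ components  # shape (n_samples, m)
```

A noise scale of zero reproduces the mean cycle exactly; larger scales move the samples further along the principal directions, from near-mean walking toward increasingly exaggerated variations.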
MPEG movies of tracking results.
Ormoneit, D., Sidenbladh, H., Black, M. J., and Hastie, T., Learning and tracking cyclic human motion, Working Paper, Department of Statistics, Stanford University, June 2000.
Sidenbladh, H., Black, M. J., and Fleet, D. J., Stochastic tracking of 3D human figures using 2D image motion, to appear in European Conference on Computer Vision, Dublin, Ireland, June 2000.
Ormoneit, D., Sidenbladh, H., Black, M. J., and Hastie, T., Learning and tracking human motion using functional analysis, to appear in IEEE Workshop on Human Modeling, Analysis and Synthesis, Hilton Head Island, South Carolina, June 2000.
Sidenbladh, H., De la Torre, F., and Black, M. J., A framework for modeling the appearance of 3D articulated figures, Int. Conf. on Automatic Face and Gesture Recognition, Grenoble, France, April 2000.