Motivation from Content-Based Image Retrieval

Suppose we are comparing scenes using the EMD between color distributions. One problem is that the EMD can be large between the color distributions of two images of the same scene taken under different illuminants, even if the camera location and orientation are fixed. This is because the pixel colors in the two images can be quite different, as illustrated in the images below.

[color_g.jpg]

Under certain assumptions on the reflectance functions of scene objects, a change in the spectral power distribution (SPD) of the illuminant from a(w) to b(w) causes a linear transformation $A_{a,b}$ of image pixel colors ([2]):

$$\text{lighting SPD } a(w) \rightarrow b(w) \quad\Longrightarrow\quad [R\ G\ B]^T \rightarrow A_{a,b}\,[R\ G\ B]^T.$$

By allowing for a linear transformation in the comparison between color distributions, the EMD can reflect, for example, that the scene under the white illuminant is similar to the same scene under the red illuminant.
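A minimal Python sketch of the idea, under simplifying assumptions that are ours and not the text's: both color distributions have the same number of points with equal weights (so an optimal flow can be taken to be a one-to-one matching, found here by brute force), the ground distance is Euclidean in RGB, and the illuminant change is a hypothetical diagonal matrix `A_ab`. The helper names `emd_equal_weights` and `apply_linear` are illustrative. The EMD between the raw distributions is positive, but after applying the inverse of `A_ab` to the red-illuminant colors it drops to zero.

```python
import itertools
import math

def emd_equal_weights(xs, ys):
    """EMD between two n-point distributions in which every point has weight 1/n.

    With equal weights an optimal flow can be taken to be a one-to-one
    matching, so the EMD is the smallest mean ground distance over all
    permutations. Brute force is used, so keep n small.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal-size distributions")
    n = len(xs)
    return min(
        sum(math.dist(xs[i], ys[j]) for i, j in enumerate(perm)) / n
        for perm in itertools.permutations(range(n))
    )

def apply_linear(A, points):
    """Map every color p = (R, G, B) to A p."""
    return [tuple(sum(A[r][k] * p[k] for k in range(3)) for r in range(3))
            for p in points]

# Hypothetical pixel colors of one scene under a white illuminant.
white = [(200.0, 100.0, 50.0), (40.0, 180.0, 120.0), (90.0, 90.0, 220.0)]

# Hypothetical illuminant change, modelled as a diagonal matrix that
# halves the green and blue channels (a reddish light).
A_ab = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
red = apply_linear(A_ab, white)

# Inverse of A_ab; exact here because A_ab is diagonal.
A_inv = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]

print(emd_equal_weights(white, red))                       # positive
print(emd_equal_weights(white, apply_linear(A_inv, red)))  # 0.0
```

In practice the matrix is unknown, so a comparison that allows linear transformations has to search over the matrix as well as over the flow; this sketch only plugs in the true inverse to show what that search is looking for.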

Texture comparison is another example in which allowing transformations is useful. Excellent results using the EMD to compare textures were obtained by Y. Rubner ([4]). The main idea is to summarize a texture by a distribution of energy in the spatial frequency plane. A distribution point $x_i$ is a point in the spatial frequency plane, and its weight $w_i$ is the fraction of the total energy at that frequency. The textures shown below contain energy at only one spatial frequency, but this is enough to make our point clear.

[texture_g.jpg]

Suppose we want the EMD to be small between the energy distributions for the left and right textures because these differ only by a scaling and rotation. As shown above, let $q = (f_x, f_y)$ denote a point in spatial frequency space. If we work in log-polar spatial frequency space, recording

$$p = (\log\|q\|,\ \operatorname{angle}(q)),$$

then scaling and rotating the texture results in a translation of the point p:

$$\text{scale texture by } c,\ \text{rotate by } \theta \quad\Longrightarrow\quad p \rightarrow p + (\log(1/c),\ \theta).$$
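The translation property can be checked numerically. The sketch below (helper names and the example frequency are hypothetical) maps a frequency q to log-polar coordinates, applies the effect of scaling the texture by c and rotating it by `theta` (its frequencies are divided by c and rotated by `theta`), and confirms that p moves by (log(1/c), `theta`). The angle difference is reduced modulo 2π, since angle(q) is only defined up to multiples of 2π.

```python
import math

def log_polar(q):
    """Map a spatial frequency q = (fx, fy) to p = (log ||q||, angle(q))."""
    fx, fy = q
    return (math.log(math.hypot(fx, fy)), math.atan2(fy, fx))

def scale_rotate_frequency(q, c, theta):
    """Where frequency q moves when the texture is scaled by c and rotated by theta.

    Scaling a texture by c divides its spatial frequencies by c; rotating the
    texture by theta rotates its spectrum by the same angle.
    """
    fx, fy = q
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return ((cos_t * fx - sin_t * fy) / c, (sin_t * fx + cos_t * fy) / c)

def wrap_angle(a):
    """Reduce an angle to the interval [-pi, pi]."""
    return math.remainder(a, 2 * math.pi)

q = (3.0, 4.0)      # hypothetical dominant frequency of the left texture
c, theta = 2.0, 0.5

p = log_polar(q)
p_new = log_polar(scale_rotate_frequency(q, c, theta))
shift = (p_new[0] - p[0], wrap_angle(p_new[1] - p[1]))
print(shift)                      # approximately (log(1/2), 0.5)
print((math.log(1 / c), theta))
```

Because the shift does not depend on q, every point of the energy distribution moves by the same vector, which is why a single translation suffices to align the two distributions.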

By allowing for a translation in log-polar spatial frequency space, the EMD captures the similarity between textures that differ primarily in scale and orientation.

The need for transformations might be more direct than in the previous two applications, in the sense that distribution points may be points in the image plane instead of points in a color space or a spatial frequency space. Suppose, for example, that we wish to match features in a stereo pair of images as shown below.

[ht1_feat.jpg] [ht30_feat.jpg]

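The change in location of a feature point between the two images can be modelled (approximately) by an affine transformation

$$p_l = A\,p_r + t,$$

where $p_r$ and $p_l$ are the feature's locations in the right and left images, provided the thickness of the object is small in comparison to its distance from the camera center of projection.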
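The affine model has six unknowns (the four entries of A and the two of t), so three non-collinear correspondences determine it exactly. In the stereo problem the correspondences are what we are looking for, so the sketch below is only a check of the model, not a matching method: it generates left-image points from a hypothetical A and t, then recovers A and t from three point pairs by Cramer's rule. All names and values are illustrative.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(m, b):
    """Solve the 3x3 system m v = b by Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("the three points are (nearly) collinear")
    v = []
    for col in range(3):
        # Replace column `col` of m by b.
        mc = [row[:col] + [b[r]] + row[col + 1:] for r, row in enumerate(m)]
        v.append(det3(mc) / d)
    return v

def affine_from_three(pr, pl):
    """Find A (2x2) and t (2-vector) with pl[i] = A pr[i] + t for i = 0, 1, 2."""
    m = [[x, y, 1.0] for (x, y) in pr]
    a0, a1, t0 = solve3(m, [p[0] for p in pl])  # first row of A, first entry of t
    b0, b1, t1 = solve3(m, [p[1] for p in pl])  # second row of A, second entry of t
    return [[a0, a1], [b0, b1]], [t0, t1]

def apply_affine(A, t, p):
    """Return A p + t for a 2D point p."""
    return (A[0][0] * p[0] + A[0][1] * p[1] + t[0],
            A[1][0] * p[0] + A[1][1] * p[1] + t[1])

# Hypothetical ground-truth map from right-image to left-image coordinates.
A_true = [[1.1, 0.2], [-0.1, 0.9]]
t_true = [5.0, -3.0]

pr = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]         # feature locations, right image
pl = [apply_affine(A_true, t_true, p) for p in pr]  # matching locations, left image

A_est, t_est = affine_from_three(pr, pl)
print(A_est, t_est)  # approximately A_true and t_true
```

With more than three correspondences, or noisy feature locations, one would instead fit A and t by least squares; the exact three-point solve is used here only because it needs no external library.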




The ideas and results contained in this document are part of my thesis, which will be published as a Stanford computer science technical report in June 1999.

S. Cohen. Finding Color and Shape Patterns in Images. Thesis Technical Report STAN-CS-TR-99-?. To be published June 1999.

Similar ideas applied to the EMD under translation have already been published in the technical report [1].

Email comments to scohen@cs.stanford.edu.