## Motivation from Content-Based Image Retrieval

Suppose we are comparing scenes using the EMD between color distributions. One problem is that the EMD can be large between the color distributions for two images of the same scene taken under different illuminants, even if the camera location and orientation are fixed. This is because the pixel colors in the two images can be quite different, as illustrated in the images below.

Under certain assumptions on the reflectance functions of scene objects, a change in the illuminant's spectral power distribution (SPD) from a(w) to b(w) causes a linear transformation A_{a,b} of the image pixel colors ([2]):

lighting SPD a(w) --> b(w)    ===>    [R G B]^T --> A_{a,b} [R G B]^T.

By allowing for a linear transformation in the comparison between color distributions, the EMD can show, for example, that the scene under the white illuminant is similar to the scene under the red illuminant.
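As a minimal sketch of this idea, the snippet below uses hypothetical pixel values and a diagonal illuminant matrix (a common special case of the linear transformation above); all numbers are illustrative, not from the source:

```python
import numpy as np

# Assumed diagonal "gain" matrix modeling a shift toward a reddish
# illuminant (illustrative values, not measured data).
A = np.diag([1.0, 0.6, 0.4])

# Two hypothetical pixel colors under the white illuminant a(w).
white_colors = np.array([[200.0, 180.0, 160.0],
                         [ 90.0, 120.0, 150.0]])

# The same scene points under the red illuminant b(w):
# each color is mapped linearly by A.
red_colors = white_colors @ A.T

# Compared directly, the two color sets are far apart...
direct_distance = np.linalg.norm(white_colors - red_colors)

# ...but if the comparison is allowed to apply the linear map first,
# the transformed white-illuminant colors match the red-illuminant ones.
aligned_distance = np.linalg.norm(white_colors @ A.T - red_colors)
```

A transformation-aware EMD would search over such linear maps rather than being handed the correct one, but the effect on the final distance is the same: the illuminant change is explained away.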

Texture comparison is another example in which allowing transformations is useful. Excellent results using the EMD to compare textures were obtained by Y. Rubner ([4]). The main idea is to summarize a texture by a distribution of energy in the spatial frequency plane. A distribution point x_i is a point in the spatial frequency plane, and its weight w_i is the fraction of the total energy at that frequency. The textures shown below contain energy at only one spatial frequency, but this will be enough to make our point clear.

Suppose we want the EMD to be small between the energy distributions for the left and right textures because these differ only by a scaling and rotation. As shown above, let q = (f_x, f_y) denote a point in spatial frequency space. If we work in log-polar spatial frequency space, recording

p=(log ||q||,angle(q)),

then scaling and rotating the texture results in a translation of the point p:

scale texture by c, rotate by theta    ===>    p --> p + (log(1/c),theta).

By allowing for a translation in log-polar spatial frequency space, the EMD captures the similarity between textures that differ primarily in scale and orientation.
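The scale-and-rotation-to-translation property can be checked numerically. The sketch below uses an arbitrary frequency point and an arbitrary scale and rotation (all values are illustrative); note that scaling a texture by c scales its spatial frequencies by 1/c:

```python
import numpy as np

def log_polar(q):
    """Map a spatial-frequency point q = (fx, fy) to p = (log||q||, angle(q))."""
    return np.array([np.log(np.hypot(q[0], q[1])), np.arctan2(q[1], q[0])])

c, theta = 2.0, np.pi / 6        # assumed scale factor and rotation angle
q = np.array([3.0, 4.0])         # a frequency carrying energy (illustrative)

# Scaling the texture by c scales its frequencies by 1/c;
# rotating the texture by theta rotates its frequencies by theta.
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
q_new = (R @ q) / c

# In log-polar coordinates the combined effect is a pure translation.
shift = log_polar(q_new) - log_polar(q)   # equals (log(1/c), theta)
```

Because every energy point shifts by the same vector, the whole log-polar distribution translates rigidly, which is exactly the transformation class the EMD is allowed to absorb here.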

In some applications, the need for transformations is even more direct than in the previous two examples, in the sense that the distribution points may be points in the image plane itself rather than points in a color space or a spatial frequency space. Suppose, for example, that we wish to match features in a stereo pair of images as shown below.

The change in location of a feature point can be modelled (approximately) by an affine transformation

p_l = A p_r + t

if the thickness of the object is small in comparison to its distance from the camera center of projection.
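A small sketch of this affine model follows; the matrix, translation, and feature locations are all hypothetical values chosen for illustration:

```python
import numpy as np

# Assumed affine map taking right-image feature points to the left image
# (illustrative values: a near-identity matrix plus a disparity-like shift).
A = np.array([[1.05, 0.02],
              [0.01, 0.98]])
t = np.array([12.0, -3.0])

# Hypothetical feature locations in the right image.
right_points = np.array([[100.0, 50.0],
                         [140.0, 80.0],
                         [110.0, 95.0]])

# Predicted locations of the same features in the left image: p_l = A p_r + t.
left_points = right_points @ A.T + t
```

Matching point distributions under an EMD that allows an affine transformation would recover A and t rather than assume them, but this is the geometric model being searched over.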