The Earth Mover's Distance under a transformation set *G* is defined
as

In words, we want to find the transformation of one distribution that minimizes its EMD to another distribution.

The general motivation for allowing transformations is to have a distance function which measures similarity modulo some factors. Consider the example below in which the two distributions have equal total weight.

The EMD between the distributions on the left is large because mass in the darker distributions must be transported large distances to change the darker distribution into the lighter distribution (lighter in terms of gray level, not total weight). The two distributions, however, appear visually similar. If we allow the darker distribution to be translated before comparing it to the lighter distribution, then the EMD is small. This is clearly shown in the right half of the above figure. We shall give more motivation for allowing transformations in the next section.

One class of distribution transformations contains transformations g that change a distribution's weights, but leave its points fixed:

Here **y**=(Y,u) splits the distribution into its points
contained in the matrix Y, and its weights contained in the vector u.
A useful transformation set of this class arises in the
scale estimation phase of the
SEDL image retrieval system. In this
setting, transformations are parametrized by a scale factor c, and

Here, the distributions **x** and **y** are the color distributions
for a database image and a query pattern, respectively, and the goal is to
compute an estimate for the scale (size) at which the pattern may occur
within the database image. See the description of the
SEDL scale estimation phase for more details.

Another class of distribution transformations consists of transformations g that change the points of a distribution, but leave its weights fixed:

For example, suppose *G*=*E*, the group of Euclidean
transformations. Distributions in *E* are defined by a rotation matrix
R and a translation vector t, and

In words, g_{R,t} rotates each distribution point by R, and then
translates the result by t. For the case when transformations change only
distribution points, we shall give an iteration that is guaranteed to
converge, although it may converge to only a locally optimal transformation.
Before we do this,
however, let us give some more motivation for allowing transformations.

top | Title, Table of Contents, The EMD |

prev | Some Notes |

next | Motivation from Content-Based Image Retrieval |

The ideas and results contained in this document are part of my thesis, which will be published as a Stanford computer science technical report in June 1999.

S. Cohen.
**Finding Color and Shape Patterns in Images**.
*Thesis Technical Report STAN-CS-TR-99-?*.
To be published June 1999.

Similar ideas applied to the EMD under translation have already been published in the technical report [1].

Email comments to scohen@cs.stanford.edu.