The Earth Mover's Distance Under Transformations

We begin with a statement of the problem and the basic motivation for its study.

The Problem

The Earth Mover's Distance under a transformation set G is defined as

EMDG(x,y) = ming in G EMD(x,g(y)).

In words, we want to find the transformation of one distribution that minimizes its EMD to another distribution.

The general motivation for allowing transformations is to have a distance function which measures similarity modulo some factors. Consider the example below in which the two distributions have equal total weight.


The EMD between the distributions on the left is large because mass in the darker distributions must be transported large distances to change the darker distribution into the lighter distribution (lighter in terms of gray level, not total weight). The two distributions, however, appear visually similar. If we allow the darker distribution to be translated before comparing it to the lighter distribution, then the EMD is small. This is clearly shown in the right half of the above figure. We shall give more motivation for allowing transformations in the next section.

One class of distribution transformations contains transformations g that change a distribution's weights, but leave its points fixed:

g(y) = g(Y,u) = (Y,g(u)).

Here y=(Y,u) splits the distribution into its points contained in the matrix Y, and its weights contained in the vector u. A useful transformation set of this class arises in the scale estimation phase of the SEDL image retrieval system. In this setting, transformations are parametrized by a scale factor c, and

gc(y) = gc(Y,u) = (Y,cu).

Here, the distributions x and y are the color distributions for a database image and a query pattern, respectively, and the goal is to compute an estimate for the scale (size) at which the pattern may occur within the database image. See the description of the SEDL scale estimation phase for more details.

Another class of distribution transformations consists of transformations g that change the points of a distribution, but leave its weights fixed:

g(y) = g(Y,u) = (g(Y),u).

For example, suppose G=E, the group of Euclidean transformations. Distributions in E are defined by a rotation matrix R and a translation vector t, and

gR,t(Y) = gR,t([y1 ... yn]) = [Ry1+t ... Ryn+t].

In words, gR,t rotates each distribution point by R, and then translates the result by t. For the case when transformations change only distribution points, we shall give an iteration that is guaranteed to converge, although it may converge to only a locally optimal transformation. Before we do this, however, let us give some more motivation for allowing transformations.

top Title, Table of Contents, The EMD
prev Some Notes
next Motivation from Content-Based Image Retrieval

The ideas and results contained in this document are part of my thesis, which will be published as a Stanford computer science technical report in June 1999.

S. Cohen. Finding Color and Shape Patterns in Images. Thesis Technical Report STAN-CS-TR-99-?. To be published June 1999.

Similar ideas applied to the EMD under translation have already been published in the technical report [1].

Email comments to