Formal Definition

Although we have described and informally defined the EMD for the two separate cases of (i) equal-weight distributions, and (ii) unequal-weight distributions, only one linear program is needed to formally define the EMD. Recall the notion for the input distributions:

x = { (x1,w1), (x2,w2), ..., (xm,wm) } and
y = { (y1,u1), (y2,u2), ..., (yn,un) },

wS is the total weight of x, and uS is the total weight of y. The EMD is defined by the linear program

EMD(x,y) = minF in F(x,y) WORK(F,x,y) / min(wS,uS)
= minF=(fij) in F(x,y) sumi=1..m, j=1..n fij d(xi,yj) / min(wS,uS),

where the minimum is taken over the set of feasible flows F(x,y) defined by

(1) fij >= 0 i=1..m, j=1..n
(2) sumj=1..n fij <= wi i=1..m
(3) sumi=1..m fij <= uj j=1..n
(4) sumi=1..m, j=1..n fij = min(wS,uS).

Here the flow variable fij represents the amount of mass matched between xi and yj. Constraint (1) forces this amount to be nonnegative. Constraint (2) requires that the total amount of y mass matched to x mass at xi does not exceed the amount wi of x mass at this location. Similarly, constraint (3) requires that the total amount of x mass matched to y mass at yj does not exceed the amount uj of y mass at this location. Finally, constraint (4) forces the total amount of mass matched to be the amount of mass in the lighter distribution. Without this constraint, the minimum would be achieved by setting fij=0 for all i,j.

The distance function d between points is called the ground distance. In contrast, the EMD is a distance between distributions which is built on top of this point distance function. The total amount of mass moved during any feasible flow (i.e. a flow satisfying (1), (2), (3), and (4)) is equal to the amount of mass min(wS,uS) in the lighter distribution. The EMD is equal to the average distance travelled (in ground distance units) by mass during an optimal flow. A simple units analysis shows that EMD units are the same as ground distance units.

When x and y are equal weight distributions, then all the mass in x must be matched to mass in y, and vice-versa. In this case, the previously given definition for the set of feasible flows F(x,y) can be rewritten as

(1) fij >= 0    i=1..m, j=1..n
(5) sumj=1..n fij = wi    i=1..m
(6) sumi=1..m fij = uj    j=1..n.

From (5) and (6), it follows that the total amount of mass matched in the equal-weight case is
sumi=1..m, j=1..n fij = wS=uS(=min(wS,uS)).

The linear program (LP) which defines the EMD is a special type of LP known as the transportation problem. The transportation simplex algorithm is a specialization of the simplex algorithm for solving LPs that takes advantage of the structure present in transportation problems ([3]). Y. Rubner has written a C code package to compute the EMD using the transportation simplex algorithm.


top Title, Table of Contents, The EMD
prev Intuition and Examples
next Some Notes


Email comments to scohen@cs.stanford.edu.