Although we have described and informally defined the EMD for the two separate cases of (i) equal-weight distributions, and (ii) unequal-weight distributions, only one linear program is needed to formally define the EMD. Recall the notion for the input distributions:
x = { (x_{1},w_{1}), (x_{2},w_{2}), ..., (x_{m},w_{m}) } | and |
y = { (y_{1},u_{1}), (y_{2},u_{2}), ..., (y_{n},u_{n}) }, |
w^{S} is the total weight of x, and u^{S} is the total weight of y. The EMD is defined by the linear program
EMD(x,y) | = | min_{F in F(x,y)} | WORK(F,x,y) | / | min(w^{S},u^{S}) |
= | min_{F=(fij) in F(x,y)} | sum_{i=1..m, j=1..n} f_{ij} d(x_{i},y_{j}) | / | min(w^{S},u^{S}), |
where the minimum is taken over the set of feasible flows F(x,y) defined by
(1) | f_{ij} | >= | 0 | i=1..m, j=1..n |
(2) | sum_{j=1..n} f_{ij} | <= | w_{i} | i=1..m |
(3) | sum_{i=1..m} f_{ij} | <= | u_{j} | j=1..n |
(4) | sum_{i=1..m, j=1..n} f_{ij} | = | min(w^{S},u^{S}). |
Here the flow variable f_{ij} represents the amount of mass matched between x_{i} and y_{j}. Constraint (1) forces this amount to be nonnegative. Constraint (2) requires that the total amount of y mass matched to x mass at x_{i} does not exceed the amount w_{i} of x mass at this location. Similarly, constraint (3) requires that the total amount of x mass matched to y mass at y_{j} does not exceed the amount u_{j} of y mass at this location. Finally, constraint (4) forces the total amount of mass matched to be the amount of mass in the lighter distribution. Without this constraint, the minimum would be achieved by setting f_{ij}=0 for all i,j.
The distance function d between points is called the ground distance. In contrast, the EMD is a distance between distributions which is built on top of this point distance function. The total amount of mass moved during any feasible flow (i.e. a flow satisfying (1), (2), (3), and (4)) is equal to the amount of mass min(w^{S},u^{S}) in the lighter distribution. The EMD is equal to the average distance travelled (in ground distance units) by mass during an optimal flow. A simple units analysis shows that EMD units are the same as ground distance units.
When x and y are equal weight distributions, then all the mass in x must be matched to mass in y, and vice-versa. In this case, the previously given definition for the set of feasible flows F(x,y) can be rewritten as
(1) | f_{ij} | >= | 0 | i=1..m, j=1..n |
(5) | sum_{j=1..n} f_{ij} | = | w_{i} | i=1..m |
(6) | sum_{i=1..m} f_{ij} | = | u_{j} | j=1..n. |
From (5) and (6), it follows that the total amount of mass matched in
the equal-weight case is
sum_{i=1..m, j=1..n} f_{ij} =
w^{S}=u^{S}(=min(w^{S},u^{S})).
The linear program (LP) which defines the EMD is a special type of LP known as the transportation problem. The transportation simplex algorithm is a specialization of the simplex algorithm for solving LPs that takes advantage of the structure present in transportation problems ([3]). Y. Rubner has written a C code package to compute the EMD using the transportation simplex algorithm.
top | Title, Table of Contents, The EMD |
prev | Intuition and Examples |
next | Some Notes |
Email comments to scohen@cs.stanford.edu.