The Scale Estimation Phase

During the scale estimation phase, SEDL computes an estimate for the scale at which the pattern may occur within an image. It does so using only the amounts of colors within the image and query pattern, and not the locations of the colors.

The Main Ideas

The inputs to the scale estimation phase are the color signatures of the query pattern and the database image. In the example shown below, the query pattern (left) is a comet product label and the database image (right) is a comet advertisement. The center image is the pattern scaled according to the computed scale estimate.

[scale-colorsigs.jpg] [emdc.comet1.zoom.jpg]

A color signature is a collection of (color,weight) pairs, where the weight of a color is the fraction of image pixels classified as that color. For example the query color signature (Y,u), shows that about u1=23% of the query is y1=white, u2=18% is y2=dark green, and u3=17% is y3=yellow. It is convenient to think of color signatures as discrete distributions of mass/weight in some color space with total weight 1=100%.

In the example above, the query is about 10% of the total database image area. Suppose we scale down the query signature to have total weight 0.10 instead of 1.0, as shown in the above color signature labelled "Scaled Comet Logo Signature". Then all the color mass in the scaled query signature can be matched to similar-color mass in the image signature (90% of the color mass in the image is not needed to perform this match). If we decrease the total weight of the query to less than 0.10, the match does not improve because we already had enough image mass of each query color to match all the query mass when the total was 0.10. If we increase the total weight of the query to more than 0.10, then, the match will get worse because, for example, there will not be enough red and yellow mass in the image to match the red and yellow in the scaled query.

There is a distance measure between discrete distributions of mass called the Earth Mover's Distance (EMD). One can prove that the EMD between the image distribution and the scaled query distribution decreases as the scale parameter c is decreased, and furthermore that this distance becomes constant for all c <= c0. This coincides with the intuition given in the previous paragraph. A portion of the graph of EMD versus c for the comet example is shown above, to the right of the color signatures. The value c0 is used as an estimate for the scale at which the pattern appears in the image.

In the comet example, the scale estimate is very accurate. If, however, there were a bit more red and yellow in the image than just in the query occurence, then the scale estimate would be a bit too large. This is because "background" red and yellow in the image matches the red and yellow in the query just as well as the red and yellow from the query occurrence. In general, the scale estimation algorithm has the attractive property that the accuracy of the scale estimate is determined by the query color with the least background clutter within the image. The larger this minimum background clutter, the more overestimated the scale. Since there is no background clutter for red and yellow in the comet example, the scale estimate were very accurate, and this is despite the fact that there is a lot of background clutter for white (since there is a lot of white in the image that is not part of the query occurrence). In summary, just one query color with a small amount of background clutter in the image is enough to obtain an accurate scale estimate.

Some Details

We can find the breakpoint c0 efficiently (i.e. with as few EMD computations as possible) by performing a binary search along the c-axis.

[se_binsearch.jpg]

The user provides some minimum scale cmin that he/she is interested in finding the pattern. For simplicity of description, assume c0 >= cmin. Then initially we know is that c0 is somewhere in the interval [cmin,1]. If we check the value of the EMD at the midpoint cmid of this interval, and, as in the example above, it is greater than the EMD at cmin, then we know that c0 is somewhere in the smaller interval [cmin,cmid] (since the EMD decreases as c decreases). By checking the value of the EMD at the midpoint of the interval in which we are sure that c0 occurs, we can keep reducing the size of this interval (shown as a thick line segment in the graphs above) until it becomes smaller than some user-specified precision for the scale estimate.

Although the EMD versus c curve must become constant for c small enough, this constant value may be large. This can happen, for example, if a substantial portion of the query pattern is red, but there is no color in the image which is at all similar to red. When the EMD curve flattens out at a large EMD value, the system asserts that the pattern does not occur within the image, and it does not perform the the initial placement and refinement stages.

Results

Some results from the scale estimation phase are shown below. To the right of each image is the pattern scaled according to the estimate.

The scale estimate is fairly accurate in all of these examples, although it is a little too small for all but the final tide example. The scale is overestimated in the tide example because the pattern appears twice within the image. Since the scale estimation phase uses only the amounts of colors, and not their positions within the image, the algorithm cannot tell the difference between one pattern occurrence of scale c and two pattern occurrences of scales d and e, where c=d+e.
[se-results.jpg]

More results for the scale estimation are shown below.

As above, the scale estimate is fairly accurate in all of these examples. The estimate is a little too small for all but the final misty example in which the pattern occurs twice. In this example, the scale estimate is close to the sum of the scales of the two pattern occurrences.
[se-more-results.jpg]


top Title, Table of Contents, Introduction
prev Image Signatures
next The Initial Placement Phase


The ideas and results contained in this document are part of my thesis, which will be published as a Stanford computer science technical report in June 1999.

S. Cohen. Finding Color and Shape Patterns in Images. Thesis Technical Report STAN-CS-TR-99-?. To be published June 1999.

Email comments to scohen@cs.stanford.edu.