EMD距离(Earth Mover's Distance)

EMD(搬土距离)(Earth Mover's Distance)

参考:

http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/RUBNER/emd.htm
https://www.cnblogs.com/jackyzzy/p/3314667.html
http://blog.csdn.net/wangdonggg/article/details/35280735

详细的数据推导: http://blog.csdn.net/zhangping1987/article/details/25368183

Earth Mover's Distance,是2000年IJCV期刊文章《The Earth Mover's Distance as a Metric for Image Retrieval》基于运输问题的效率 提出的一种直方图相似度量。 它是归一化的从一个分布变为另一个分布的最小代价, 可以用来测量两个分布(multi-dimensional distributions)之间的距离。

EMD需要求解运输问题,其运算复杂度较高,平均而言至少是二次方级别。但是它作为距离函数,有一个非常好的特点是存在下界——两个分布的质心之间的距离,因此在粗略计算时,可以考虑用分布质心之间的距离代替EMD。

两个分布:

    (位置, 权重)

    P = {(p1, wp1), (p2, wp2), ..., (pm, wpm)}

    Q = {(q1, wq1), (q2, wq2), ..., (qn, wqn))}

    D = [dij]  表示位置 pi 到 位置 qj的距离

    F = [fij]  表示权重 wpi 到 wqj 的权重

代价函数:

    WORK(P, Q, F) =  sum(dij * fij for i in range(1:m) for j in range(1:n))

限制条件:

    fij > 0

    sum(fij for j in range(1:n)) <= wpi    从i处搬出来的重量不能超过wpi所有的总和
    
    sum(fij for i in range(1:m)) <= wqj    搬到j处的重量不能超过wqj能存放的总重量
    
    WORK = min(sum(wpi for i in range(1:m)), sum(wqj for j in range(1:n)))
    当仓库总重量和工厂货物总重量不一致时, 才需要限制第四个条件

动态规划的问题:

    F是线性优化问题的最优解,earth mover's distance 定义为归一化后的WORK:
    EMD(P, Q) = WORK(P, Q, F)
                    Divide
                sum(fij for i in range(1:m) for j in range(1:n))
    为了使EMD的值不随着运输总量的大小变化而变化, 引入归一化因子-总的运输量,  (The normalization factor is introduced in order to avoid favoring smaller signatures in the case of partial matching.)

EMD距离ground distance的优势:

  • Naturally extends the notion of a distance between single elements to that of a distance between sets, or distributions, of elements.
    将距离的概念扩展到集合/概率分布

  • Can be applied to the more general variable-size signatures, which subsume histograms. Signatures are more compact, and the cost of moving ``earth'' reflects the notion of nearness properly, without the quantization problems of most other measures.

  • Allows for partial matches in a very natural way. This is important, for instance, for image retrieval and in order to deal with occlusions and clutter.
    允许计算部分的相似度, 对于图像检索这很重要

  • Is a true metric if the ground distance is metric and if the total weights of two signatures are equal. This allows endowing image spaces with a metric structure.

  • Is bounded from below by the distance between the centers of mass of the two signatures when the ground distance is induced by a norm. Using this lower bound in retrieval systems significantly reduced the number of EMD computations.
    有下界-两个分布的质心之间的距离, 粗略估计可以减少大量计算

  • Matches perceptual similarity better than other measures, when the ground distance is perceptually meaningful. This was shown by [2] for color- and texture-based image retrieval.
    分布在视觉上有意义时, EMD距离比其它距离估计更有好, 可以参照基于色彩以及纹理的图像检索

使用:

C代码包: emd.h, emd.c, emd.i

OpenCV:实现了EMD api, 
    pip install --upgrade setuptools
    pip install numpy Matplotlib
    pip install opencv-python

import numpy as np
import cv

#p、q是两个矩阵,第一列表示权值,后面三列表示直方图或数量
p=np.asarray([[0.4,100,40,22],
        [0.3,211,20,2],
        [0.2,32,190,150],
        [0.1,2,100,100]],np.float32)
q=np.array([[0.5,0,0,0],
        [0.3,50,100,80],
        [0.2,255,255,255]],np.float32)
pp=cv.fromarray(p)
qq=cv.fromarray(q)
emd=cv.CalcEMD2(pp,qq,cv.CV_DIST_L2)

你可能感兴趣的:(EMD距离(Earth Mover's Distance))