Jaccard distance: a worked example with concrete numbers plugged into the formula

Definition of the Jaccard distance:

      J(A,B) = |A∩B| / |A∪B|
      dj(A,B) = 1 - J(A,B)

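How to compute it: we need A∩B and A∪B. Here A and B are 0/1 vectors rather than sets, so the computation differs slightly: A∩B is the element-wise AND and A∪B is the element-wise OR.
      Suppose A=(1,0,0) and B=(0,1,0). Then A∩B=(0,0,0) and A∪B=(1,1,0).

      So J(A,B) = 0 / 2 = 0.    Note: the 2 is the number of 1s in A∪B. If instead A∩B were (1,1,0), the result would be J(A,B) = 2 / 2 = 1.

              dj(A,B) = 1 - J(A,B) = 1 - 0  = 1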
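
The arithmetic above can be reproduced in a few lines of plain Python. This is only a minimal sketch: the function names `jaccard_similarity` and `jaccard_distance` are my own, and returning a similarity of 1.0 for two all-zero vectors is an assumption, since the ratio is undefined in that case.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity of two equal-length 0/1 vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Element-wise AND gives A∩B, element-wise OR gives A∪B.
    intersection = [int(bool(x) and bool(y)) for x, y in zip(a, b)]
    union = [int(bool(x) or bool(y)) for x, y in zip(a, b)]
    ones_in_union = sum(union)
    if ones_in_union == 0:
        # Both vectors are all zeros: the ratio is undefined, so this
        # sketch treats them as identical (an assumption).
        return 1.0
    # |x| in the formula is simply the number of 1s.
    return sum(intersection) / ones_in_union


def jaccard_distance(a, b):
    """dj(A,B) = 1 - J(A,B)."""
    return 1.0 - jaccard_similarity(a, b)


A = (1, 0, 0)
B = (0, 1, 0)
print(jaccard_similarity(A, B))  # 0.0
print(jaccard_distance(A, B))    # 1.0
```

Counting the 1s of the AND and OR vectors is the same as taking the sizes of the intersection and union of the index sets where each vector is 1, which is why the set formula carries over.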


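Here is the Python code that implements this distance. It is the jaccard function from scipy.spatial.distance, with print statements added so the intermediate values can be inspected:

import numpy as np

# _validate_vector and _validate_weights are private helpers defined in
# scipy.spatial.distance, the module this function comes from.

def jaccard(u, v, w=None):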
    """
    Compute the Jaccard-Needham dissimilarity between two boolean 1-D arrays.

    The Jaccard-Needham dissimilarity between 1-D boolean arrays `u` and `v`,
    is defined as

    .. math::

       \\frac{c_{TF} + c_{FT}}
            {c_{TT} + c_{FT} + c_{TF}}

    where :math:`c_{ij}` is the number of occurrences of
    :math:`\\mathtt{u[k]} = i` and :math:`\\mathtt{v[k]} = j` for
    :math:`k < n`.

    Parameters
    ----------
    u : (N,) array_like, bool
        Input array.
    v : (N,) array_like, bool
        Input array.
    w : (N,) array_like, optional
        The weights for each value in `u` and `v`. Default is None,
        which gives each value a weight of 1.0

    Returns
    -------
    jaccard : double
        The Jaccard distance between vectors `u` and `v`.

    Examples
    --------
    >>> from scipy.spatial import distance
    >>> distance.jaccard([1, 0, 0], [0, 1, 0])
    1.0
    >>> distance.jaccard([1, 0, 0], [1, 1, 0])
    0.5
    >>> distance.jaccard([1, 0, 0], [1, 2, 0])
    0.5
    >>> distance.jaccard([1, 0, 0], [1, 1, 1])
    0.66666666666666663

    """
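    # Convert the inputs to 1-D NumPy arrays.
    u = _validate_vector(u)
    v = _validate_vector(v)
    print('u, v:', u, v)
    print('u != 0, v != 0:', u != 0, v != 0)
    # Positions where at least one vector is nonzero (the union).
    nonzero = np.bitwise_or(u != 0, v != 0)
    print('u != v:', u != v)
    # Positions inside the union where the two vectors disagree.
    unequal_nonzero = np.bitwise_and((u != v), nonzero)
    print('nonzero, unequal_nonzero:', nonzero, unequal_nonzero)
    if w is not None:
        # Weight each position instead of counting it as 1.
        w = _validate_weights(w)
        nonzero = w * nonzero
        unequal_nonzero = w * unequal_nonzero
    a = unequal_nonzero.sum()  # numerator: (weighted) count of disagreements
    b = nonzero.sum()          # denominator: (weighted) size of the union
    print('numerator, denominator:', a, b)
    dist = np.double(a) / np.double(b)
    return dist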
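
To see what the optional `w` argument does, here is a dependency-free sketch that mirrors the steps of the function above. The name `weighted_jaccard_distance` is hypothetical, and this is my reading of the code rather than SciPy's own implementation; it also does not guard against two all-zero inputs, which would divide by zero.

```python
def weighted_jaccard_distance(u, v, w=None):
    """Mirror of the steps above using plain Python lists."""
    if w is None:
        # Default: every position has weight 1.0.
        w = [1.0] * len(u)
    if not (len(u) == len(v) == len(w)):
        raise ValueError("u, v and w must have the same length")
    # Positions where at least one vector is nonzero.
    nonzero = [(x != 0) or (y != 0) for x, y in zip(u, v)]
    # Positions in that union where the two vectors differ.
    unequal_nonzero = [(x != y) and nz for x, y, nz in zip(u, v, nonzero)]
    # Sum the weights of the flagged positions instead of counting them.
    numerator = sum(wk for wk, flag in zip(w, unequal_nonzero) if flag)
    denominator = sum(wk for wk, flag in zip(w, nonzero) if flag)
    return numerator / denominator


u = [1, 0, 0]
v = [1, 1, 1]
# Unweighted: 2 disagreeing positions out of 3 in the union.
print(weighted_jaccard_distance(u, v))
# Weights 2, 1, 1: numerator 1 + 1 = 2, denominator 2 + 1 + 1 = 4.
print(weighted_jaccard_distance(u, v, w=[2, 1, 1]))  # 0.5
```

Giving the first position a larger weight lowers the distance here because that is the one position where the two vectors agree.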


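The formula this code ultimately computes is

   | (u XOR v) AND (u OR v) |  /  | (u OR v) |    Note: |x| here means the number of 1s in x.
 For example: A=(1,0,0), B=(0,1,0), so A∩B=(0,0,0) and A∪B=(1,1,0).

 | (u XOR v) AND (u OR v) |  = | (1,1,0) & (1,1,0) |  = | (1,1,0) | = 2

 | (u OR v) |  =  | (1,1,0) |  = 2

So dj(A,B) = 2 / 2 = 1

This matches the first example in the function's docstring. I arrived at this formula by reading the code; if it is wrong, readers can check the source themselves and compute the distance.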
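
As a sanity check, the formula read off the code can be compared with the definition dj(A,B) = 1 - J(A,B) on the inputs used in the docstring. The sketch below assumes 0/1 vectors with at least one 1 in the union, and the helper names are my own. It uses `fractions.Fraction` so the comparison is exact rather than subject to floating-point rounding.

```python
from fractions import Fraction


def count_ones(bits):
    """Number of truthy entries, i.e. the |x| used in the formula."""
    return sum(1 for b in bits if b)


def distance_from_code_formula(u, v):
    """| (u XOR v) AND (u OR v) | / | (u OR v) | for 0/1 vectors."""
    u_or_v = [bool(x) or bool(y) for x, y in zip(u, v)]
    u_xor_v = [bool(x) != bool(y) for x, y in zip(u, v)]
    masked = [x and o for x, o in zip(u_xor_v, u_or_v)]
    return Fraction(count_ones(masked), count_ones(u_or_v))


def distance_from_definition(u, v):
    """1 - |A AND B| / |A OR B| for 0/1 vectors."""
    both = count_ones(bool(x) and bool(y) for x, y in zip(u, v))
    either = count_ones(bool(x) or bool(y) for x, y in zip(u, v))
    return 1 - Fraction(both, either)


pairs = [
    ((1, 0, 0), (0, 1, 0)),
    ((1, 0, 0), (1, 1, 0)),
    ((1, 0, 0), (1, 1, 1)),
]
for u, v in pairs:
    d_code = distance_from_code_formula(u, v)
    d_def = distance_from_definition(u, v)
    # The two expressions should agree on every pair.
    print(u, v, d_code, d_def)
```

For these three pairs both expressions give 1, 1/2 and 2/3, in line with the docstring values 1.0, 0.5 and 0.666.... For 0/1 inputs the AND with (u OR v) changes nothing, because any position where u XOR v is 1 already has u OR v equal to 1.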


  










