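Definition of the Jaccard distance:
How to compute it: find A∩B and A∪B. Here A and B are 0/1 vectors rather than sets, so the calculation differs slightly from the set version: the intersection and union are taken element by element.
Suppose A = (1,0,0) and B = (0,1,0). Then A∩B = (0,0,0) and A∪B = (1,1,0).
So J(A,B) = 0 / 2 = 0. Note: the 0 is the number of 1s in A∩B and the 2 is the number of 1s in A∪B. If A∩B were also (1,1,0), the result would be J(A,B) = 2 / 2 = 1.
dj(A,B) = 1 - J(A,B) = 1 - 0 = 1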
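The worked example above can be reproduced with a few lines of plain Python. This is only a minimal sketch for 0/1 vectors of equal length; the function names `jaccard_similarity` and `jaccard_distance` are made up for illustration, and treating two all-zero vectors as identical is an assumption of this sketch, not something the definition above specifies.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity J(A,B) of two equal-length 0/1 vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # |A ∩ B|: positions where both vectors are 1
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    # |A ∪ B|: positions where at least one vector is 1
    union = sum(1 for x, y in zip(a, b) if x or y)
    if union == 0:
        # Assumption of this sketch: two all-zero vectors count as identical
        return 1.0
    return intersection / union


def jaccard_distance(a, b):
    """Jaccard distance dj(A,B) = 1 - J(A,B)."""
    return 1.0 - jaccard_similarity(a, b)


A = (1, 0, 0)
B = (0, 1, 0)
print(jaccard_similarity(A, B))  # 0.0
print(jaccard_distance(A, B))    # 1.0
```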
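Below is SciPy's Python implementation of this distance (scipy.spatial.distance.jaccard), with a few print statements added to trace the intermediate values: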
import numpy as np
# _validate_vector and _validate_weights are private helpers of
# scipy.spatial.distance; this import is an assumption and may break
# in other SciPy versions.
from scipy.spatial.distance import _validate_vector, _validate_weights


def jaccard(u, v, w=None):
    """
    Compute the Jaccard-Needham dissimilarity between two boolean 1-D arrays.

    The Jaccard-Needham dissimilarity between 1-D boolean arrays `u` and `v`,
    is defined as

    .. math::

       \\frac{c_{TF} + c_{FT}}
            {c_{TT} + c_{FT} + c_{TF}}

    where :math:`c_{ij}` is the number of occurrences of
    :math:`\\mathtt{u[k]} = i` and :math:`\\mathtt{v[k]} = j` for
    :math:`k < n`.

    Parameters
    ----------
    u : (N,) array_like, bool
        Input array.
    v : (N,) array_like, bool
        Input array.
    w : (N,) array_like, optional
        The weights for each value in `u` and `v`. Default is None,
        which gives each value a weight of 1.0

    Returns
    -------
    jaccard : double
        The Jaccard distance between vectors `u` and `v`.

    Examples
    --------
    >>> from scipy.spatial import distance
    >>> distance.jaccard([1, 0, 0], [0, 1, 0])
    1.0
    >>> distance.jaccard([1, 0, 0], [1, 1, 0])
    0.5
    >>> distance.jaccard([1, 0, 0], [1, 2, 0])
    0.5
    >>> distance.jaccard([1, 0, 0], [1, 1, 1])
    0.66666666666666663

    """
    u = _validate_vector(u)
    v = _validate_vector(v)
    print(u, v)                  # trace: the validated inputs
    print(u != 0, v != 0)        # trace: inputs as boolean masks
    # Positions where at least one of u, v is nonzero (u OR v)
    nonzero = np.bitwise_or(u != 0, v != 0)
    print(u != v)                # trace: positions where u and v differ
    # Positions where u and v differ and at least one is nonzero
    unequal_nonzero = np.bitwise_and((u != v), nonzero)
    print(nonzero, unequal_nonzero)
    if w is not None:
        w = _validate_weights(w)
        nonzero = w * nonzero
        unequal_nonzero = w * unequal_nonzero
    a = unequal_nonzero.sum()    # numerator, computed only for the trace
    b = nonzero.sum()            # denominator, computed only for the trace
    print(a, b, 'ss')
    dist = np.double(unequal_nonzero.sum()) / np.double(nonzero.sum())
    return dist
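In other words, the code computes | (u XOR v) AND (u OR v) | / | (u OR v) |. Note that | · | here means the number of 1s in the result, not an absolute value.
Example: A = (1,0,0), B = (0,1,0), so A∩B = (0,0,0) and A∪B = (1,1,0).
| (u XOR v) AND (u OR v) | = | (1,1,0) & (1,1,0) | = | (1,1,0) | = 2
| (u OR v) | = | (1,1,0) | = 2
So dj(A,B) = 2 / 2 = 1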
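This result matches the first example in the Python docstring. I arrived at this final formula by reading the code, so if it is wrong, readers can check the source and compute the distance themselves.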
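The XOR/OR formula can also be checked without SciPy. The sketch below is a plain-Python illustration for 0/1 vectors only; `jaccard_distance_bitwise` is a hypothetical name, and returning 0.0 when both vectors are all zeros is an assumption of this sketch. Unlike the SciPy code, which compares `u != v` on the raw values, it converts each entry to a boolean first, so the two agree only on 0/1 input.

```python
def jaccard_distance_bitwise(u, v):
    """|(u XOR v) AND (u OR v)| / |(u OR v)| for equal-length 0/1 vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    # u XOR v, element by element
    xor = [bool(x) != bool(y) for x, y in zip(u, v)]
    # u OR v, element by element
    either = [bool(x) or bool(y) for x, y in zip(u, v)]
    # |(u XOR v) AND (u OR v)|: count positions set in both masks
    numerator = sum(1 for d, e in zip(xor, either) if d and e)
    # |(u OR v)|: count positions where at least one vector is 1
    denominator = sum(either)
    if denominator == 0:
        # Assumption of this sketch: two all-zero vectors have distance 0
        return 0.0
    return numerator / denominator


print(jaccard_distance_bitwise((1, 0, 0), (0, 1, 0)))  # 1.0
print(jaccard_distance_bitwise((1, 0, 0), (1, 1, 0)))  # 0.5
```

For 0/1 vectors, every position where u XOR v is 1 is also a position where u OR v is 1, so the AND with (u OR v) does not change the numerator; it is kept here only to mirror the formula.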