Common distances
Euclidean distance:
$$d(\boldsymbol{x},\boldsymbol{y})=\sqrt{\sum_{i=1}^n(x_i-y_i)^2}=\Vert\boldsymbol{x}-\boldsymbol{y}\Vert_2$$
Inner product (strictly speaking a similarity measure rather than a distance):
$$d(\boldsymbol{x},\boldsymbol{y})=\boldsymbol{x}^\top\cdot\boldsymbol{y}$$
Manhattan distance:
$$d(\boldsymbol{x},\boldsymbol{y})=\sum_{i=1}^n\vert x_i-y_i\vert=\Vert\boldsymbol{x}-\boldsymbol{y}\Vert_1$$
Minkowski distance:
$$d(\boldsymbol{x},\boldsymbol{y})=\sqrt[p]{\sum_{i=1}^n\vert x_i-y_i\vert^p}=\Vert\boldsymbol{x}-\boldsymbol{y}\Vert_p$$
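As a quick illustration of the four formulas above, the following sketch evaluates each of them with NumPy. The vectors `x` and `y` and the exponent `p = 3` are arbitrary values assumed for this example only; they do not come from the original text.

```python
import numpy as np

# Illustrative vectors (assumed values, chosen only for this example)
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 3.0])

d_euclid = np.linalg.norm(x - y, ord=2)     # sqrt(sum((x_i - y_i)^2))
d_inner = x @ y                             # x^T y
d_manhattan = np.linalg.norm(x - y, ord=1)  # sum(|x_i - y_i|)

p = 3                                       # assumed exponent for the Minkowski distance
d_minkowski = np.sum(np.abs(x - y) ** p) ** (1.0 / p)

print(d_euclid, d_inner, d_manhattan, d_minkowski)
```

With `p = 2` the Minkowski expression reduces to the Euclidean distance, and with `p = 1` to the Manhattan distance.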
import numpy as np

def distance_1d(s):
    # Pairwise absolute differences between the elements of a 1-D array
    n = len(s)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            A[i, j] = abs(s[i] - s[j])
    return A

if __name__ == "__main__":
    x = np.array([1, 2, 3, 4, 5])
    D = distance_1d(x)
    print(D)
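The double loop in `distance_1d` can also be expressed with NumPy broadcasting. The sketch below is one possible vectorized variant; the function name `distance_1d_vectorized` is hypothetical and introduced only for this illustration.

```python
import numpy as np

def distance_1d_vectorized(s):
    """Pairwise absolute differences of a 1-D array via broadcasting (illustrative helper)."""
    s = np.asarray(s)
    # s[:, None] has shape (n, 1) and s[None, :] has shape (1, n);
    # subtracting them broadcasts to an (n, n) matrix of differences.
    return np.abs(s[:, None] - s[None, :])

if __name__ == "__main__":
    x = np.array([1, 2, 3, 4, 5])
    print(distance_1d_vectorized(x))
```

This should produce the same matrix as the loop version while avoiding Python-level iteration, which tends to matter for larger inputs.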
import numpy as np

def distance_nd(S):
    # Pairwise Euclidean distances between the rows of S
    nrow, ncol = S.shape  # number of rows and columns of the input matrix
    A = np.zeros((nrow, nrow))
    for i in range(nrow):
        for j in range(nrow):
            summ = 0
            for k in range(ncol):
                summ = summ + (S[i, k] - S[j, k])**2
            A[i, j] = np.sqrt(summ)
    return A

if __name__ == "__main__":
    X = np.random.randn(5, 10)
    D = distance_nd(X)
    print(D)
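The triple loop in `distance_nd` can likewise be replaced by broadcasting. The following is a minimal sketch, assuming the rows of the input are the samples as in the loop version; the name `distance_nd_vectorized` is hypothetical.

```python
import numpy as np

def distance_nd_vectorized(S):
    """Pairwise Euclidean distances between the rows of S (illustrative helper)."""
    S = np.asarray(S, dtype=float)
    # diff[i, j, :] = S[i, :] - S[j, :], so diff has shape (nrow, nrow, ncol)
    diff = S[:, None, :] - S[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

if __name__ == "__main__":
    X = np.random.randn(5, 10)
    print(distance_nd_vectorized(X))
```

Note that the intermediate array `diff` needs memory proportional to nrow × nrow × ncol, so this form is best suited to moderately sized inputs.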
import numpy as np

def Euclid_dist(x, y):
    # Euclidean distance between two vectors of equal length
    dist = np.sqrt(np.sum(np.square(x - y)))
    return dist

if __name__ == "__main__":
    X = np.random.randn(5, 10)
    nrow, ncol = X.shape  # number of rows and columns of the input matrix
    A = np.zeros((nrow, nrow))
    for i in range(nrow):
        for j in range(nrow):
            A[i, j] = Euclid_dist(X[i, :], X[j, :])
    print(A)
The squared Euclidean distance between two column vectors (samples) can be written as
$$d^2(\boldsymbol{x},\boldsymbol{y})=\sum_k(x_k-y_k)^2=(\boldsymbol{x}-\boldsymbol{y})^\top(\boldsymbol{x}-\boldsymbol{y})=\boldsymbol{x}^\top\boldsymbol{x}-2\boldsymbol{x}^\top\boldsymbol{y}+\boldsymbol{y}^\top\boldsymbol{y}$$
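The expansion above can be checked numerically. The sketch below compares both sides of the identity for two random vectors; the dimension 10 and the seed 0 are arbitrary choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice for reproducibility
x = rng.standard_normal(10)
y = rng.standard_normal(10)

lhs = np.sum((x - y) ** 2)            # sum_k (x_k - y_k)^2
rhs = x @ x - 2 * (x @ y) + y @ y     # x^T x - 2 x^T y + y^T y

print(np.isclose(lhs, rhs))
```

The two sides agree only up to floating-point round-off, which is why the comparison uses `np.isclose` rather than exact equality.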
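For two matrices $X=[\boldsymbol{x}_1,\boldsymbol{x}_2,\cdots,\boldsymbol{x}_M]$ and $Y=[\boldsymbol{y}_1,\boldsymbol{y}_2,\cdots,\boldsymbol{y}_N]$ whose columns are samples, the squared Euclidean distances between every column of $X$ and every column of $Y$ are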
$$
\begin{aligned}
d^2(X,Y)&=
\left[\begin{array}{cccc}
\boldsymbol{x}_1^\top\boldsymbol{x}_1-2\boldsymbol{x}_1^\top\boldsymbol{y}_1+\boldsymbol{y}_1^\top\boldsymbol{y}_1&\boldsymbol{x}_1^\top\boldsymbol{x}_1-2\boldsymbol{x}_1^\top\boldsymbol{y}_2+\boldsymbol{y}_2^\top\boldsymbol{y}_2&\cdots&\boldsymbol{x}_1^\top\boldsymbol{x}_1-2\boldsymbol{x}_1^\top\boldsymbol{y}_N+\boldsymbol{y}_N^\top\boldsymbol{y}_N\\
\boldsymbol{x}_2^\top\boldsymbol{x}_2-2\boldsymbol{x}_2^\top\boldsymbol{y}_1+\boldsymbol{y}_1^\top\boldsymbol{y}_1&\boldsymbol{x}_2^\top\boldsymbol{x}_2-2\boldsymbol{x}_2^\top\boldsymbol{y}_2+\boldsymbol{y}_2^\top\boldsymbol{y}_2&\cdots&\boldsymbol{x}_2^\top\boldsymbol{x}_2-2\boldsymbol{x}_2^\top\boldsymbol{y}_N+\boldsymbol{y}_N^\top\boldsymbol{y}_N\\
\vdots&\vdots&\ddots&\vdots\\
\boldsymbol{x}_M^\top\boldsymbol{x}_M-2\boldsymbol{x}_M^\top\boldsymbol{y}_1+\boldsymbol{y}_1^\top\boldsymbol{y}_1&\boldsymbol{x}_M^\top\boldsymbol{x}_M-2\boldsymbol{x}_M^\top\boldsymbol{y}_2+\boldsymbol{y}_2^\top\boldsymbol{y}_2&\cdots&\boldsymbol{x}_M^\top\boldsymbol{x}_M-2\boldsymbol{x}_M^\top\boldsymbol{y}_N+\boldsymbol{y}_N^\top\boldsymbol{y}_N
\end{array}\right]\\[2ex]
&=\left[\begin{array}{cccc}
\boldsymbol{x}_1^\top\boldsymbol{x}_1&\boldsymbol{x}_1^\top\boldsymbol{x}_1&\cdots&\boldsymbol{x}_1^\top\boldsymbol{x}_1\\
\boldsymbol{x}_2^\top\boldsymbol{x}_2&\boldsymbol{x}_2^\top\boldsymbol{x}_2&\cdots&\boldsymbol{x}_2^\top\boldsymbol{x}_2\\
\vdots&\vdots&\ddots&\vdots\\
\boldsymbol{x}_M^\top\boldsymbol{x}_M&\boldsymbol{x}_M^\top\boldsymbol{x}_M&\cdots&\boldsymbol{x}_M^\top\boldsymbol{x}_M
\end{array}\right]
+\left[\begin{array}{cccc}
\boldsymbol{y}_1^\top\boldsymbol{y}_1&\boldsymbol{y}_2^\top\boldsymbol{y}_2&\cdots&\boldsymbol{y}_N^\top\boldsymbol{y}_N\\
\boldsymbol{y}_1^\top\boldsymbol{y}_1&\boldsymbol{y}_2^\top\boldsymbol{y}_2&\cdots&\boldsymbol{y}_N^\top\boldsymbol{y}_N\\
\vdots&\vdots&\ddots&\vdots\\
\boldsymbol{y}_1^\top\boldsymbol{y}_1&\boldsymbol{y}_2^\top\boldsymbol{y}_2&\cdots&\boldsymbol{y}_N^\top\boldsymbol{y}_N
\end{array}\right]
-2\left[\begin{array}{cccc}
\boldsymbol{x}_1^\top\boldsymbol{y}_1&\boldsymbol{x}_1^\top\boldsymbol{y}_2&\cdots&\boldsymbol{x}_1^\top\boldsymbol{y}_N\\
\boldsymbol{x}_2^\top\boldsymbol{y}_1&\boldsymbol{x}_2^\top\boldsymbol{y}_2&\cdots&\boldsymbol{x}_2^\top\boldsymbol{y}_N\\
\vdots&\vdots&\ddots&\vdots\\
\boldsymbol{x}_M^\top\boldsymbol{y}_1&\boldsymbol{x}_M^\top\boldsymbol{y}_2&\cdots&\boldsymbol{x}_M^\top\boldsymbol{y}_N
\end{array}\right]\\[2ex]
&=\left[\begin{array}{c}
\boldsymbol{x}_1^\top\boldsymbol{x}_1\\
\boldsymbol{x}_2^\top\boldsymbol{x}_2\\
\vdots\\
\boldsymbol{x}_M^\top\boldsymbol{x}_M
\end{array}\right]\cdot[1,1,\cdots,1]_N
+\left[\begin{array}{c}
1\\
1\\
\vdots\\
1
\end{array}\right]_M\cdot[\boldsymbol{y}_1^\top\boldsymbol{y}_1,\boldsymbol{y}_2^\top\boldsymbol{y}_2,\cdots,\boldsymbol{y}_N^\top\boldsymbol{y}_N]
-2\left[\begin{array}{c}
\boldsymbol{x}_1^\top\\
\boldsymbol{x}_2^\top\\
\vdots\\
\boldsymbol{x}_M^\top
\end{array}\right]\cdot[\boldsymbol{y}_1,\boldsymbol{y}_2,\cdots,\boldsymbol{y}_N]
\end{aligned}
$$
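The last line of this decomposition maps directly onto matrix operations. The sketch below is one way to write it in NumPy, assuming that the columns of `X` (shape d × M) and `Y` (shape d × N) are the samples; the function name `sq_dist_columns` and the clipping of small negative values are additions made for this example.

```python
import numpy as np

def sq_dist_columns(X, Y):
    """Squared Euclidean distances between columns of X (d x M) and Y (d x N).

    Illustrative helper; returns an (M, N) matrix.
    """
    M, N = X.shape[1], Y.shape[1]
    xx = np.sum(X * X, axis=0).reshape(-1, 1)  # (M, 1): entries x_i^T x_i
    yy = np.sum(Y * Y, axis=0).reshape(1, -1)  # (1, N): entries y_j^T y_j
    D2 = xx @ np.ones((1, N)) + np.ones((M, 1)) @ yy - 2 * X.T @ Y
    # Round-off can leave tiny negative entries, so clip them at zero (assumed safeguard)
    return np.maximum(D2, 0.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)  # arbitrary seed
    X = rng.standard_normal((3, 4))
    Y = rng.standard_normal((3, 6))
    print(sq_dist_columns(X, Y).shape)
```

In practice the explicit products with all-ones vectors can be dropped, since `xx + yy - 2 * X.T @ Y` broadcasts to the same (M, N) result.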
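If the pairwise Euclidean distances are computed within a single data matrix (each column is a sample point), the expression simplifies to
$$d^2(X,X)=\text{diag}(X^\top X)\cdot\boldsymbol{1}^\top+\boldsymbol{1}\cdot\text{diag}(X^\top X)^\top-2X^\top X$$
where $\text{diag}(X^\top X)$ denotes the column vector of diagonal entries of $X^\top X$ and $\boldsymbol{1}$ is the all-ones column vector.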
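import numpy as np

# Each column of X is one sample: D = 2 dimensions, N = 9 points
X = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
              [1, 1, 1, 1, 1, 1, 1, 1, 1]])
D, N = X.shape
print('LLE running on {} points in {} dimensions\n'.format(N, D))

G = X.T @ X                                      # Gram matrix X^T X, shape (N, N)
H = np.diag(G).reshape(-1, 1) @ np.ones((1, N))  # diag(X^T X) * 1^T
dist = H + H.T - 2 * G                           # squared pairwise distances
dist = np.sqrt(np.maximum(dist, 0))              # clip round-off negatives before the square root
print(dist)
print('\n')

index = np.argsort(dist, axis=0)  # sort each column by increasing distance
neighborhood = index[1:5, :]      # skip row 0 (the point itself) and keep the 4 nearest neighbors
print(neighborhood)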
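import numpy as np
from sklearn.metrics.pairwise import pairwise_distances

X = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
              [1, 1, 1, 1, 1, 1, 1, 1, 1]])
# pairwise_distances treats rows as samples, so pass X.T (the columns of X are the samples).
# paired_distances would only compare the i-th row of one input with the i-th row of the other.
dist = pairwise_distances(X.T)
print(dist)
print('\n')
index = np.argsort(dist, axis=0)  # sort each column by increasing distance
print(index)
print('\n')
neighborhood = index[1:5, :]      # skip row 0 (the point itself) and keep the 4 nearest neighbors
print(neighborhood)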