Python 实现 距离公式 欧式距离、余弦距离、曼哈顿距离

距离公式 python

1、欧式距离(Euclidean Distance)

计算公式:

( x 1 − x 2 ) 2 + ( y 1 − y 2 ) 2 \sqrt{(x_1-x_2)^2+(y_1-y_2)^2} (x1x2)2+(y1y2)2

Python

# 计算欧氏距离
def distEclud(vecA, vecB):
    return np.sqrt(np.sum(np.power((vecA - vecB), 2)))

2、曼哈顿距离(Manhattan Distance)

计算公式:

∣ x 1 − x 2 ∣ + ∣ y 1 − y 2 ∣ |x_1-x_2|+|y_1-y_2| x1x2+y1y2

优点:计算速度快

Python

def manhattan(rating1, rating2):
    """Computes the Manhattan distance. Both rating1 and rating2 are dictionaries
       of the form {'The Strokes': 3.0, 'Slightly Stoopid': 2.5}"""
    distance = 0
    commonRatings = False 
    for key in rating1:
        if key in rating2:
            distance += abs(rating1[key] - rating2[key])
            commonRatings = True
    if commonRatings:
        return distance
    else:
        return -1 #Indicates no ratings in common

3、一般化

​ 曼哈顿距离和偶是距离可以一般化为如下的明氏距离(Minkowski Distance):
d ( x , y ) = ( ∑ k = 1 n ∣ x k − y k ∣ r ) 1 r d(x,y)=(\sum_{k=1}^{n}|x_k-y_k|^r)^{\frac{1}{r}} d(x,y)=(k=1nxkykr)r1
​ 其中:
r = 1 r=1 r=1 时,上述公式计算的就是曼哈顿距离。
r = 2 r=2 r=2 时,上述公式计算的就是欧式距离。
r = ∞ r=\infty r= 时,上述公式计算的是上确界距离(Supermum Distance)

r r r 越大,某一维上的较大差异对最终差值的影响也就越大。

Python

def minkowski(rating1, rating2, r):
    """Computes the Minkowski distance. Both rating1 and rating2 are dictionaries
       of the form {'The Strokes': 3.0, 'Slightly Stoopid': 2.5}"""
    distance = 0
    commonRatings = False 
    for key in rating1:
        if key in rating2:
            distance += pow(abs(rating1[key] - rating2[key]), r)
            commonRatings = True
    if commonRatings:
        return pow(distance, 1/r)
    else:
        return -1 #Indicates no ratings in common

4、余弦距离(Cosine Distance)

计算公式:

d = ∑ i = 1 N x i y i ∑ i N x i 2 ∑ i N y i 2 d=\frac{\sum_{i=1}^Nx_iy_i}{\sqrt{\sum_{i}^Nx_i^2}\sqrt{\sum_i^Ny_i^2}} d=iNxi2 iNyi2 i=1Nxiyi

优点:余弦距离根据向量方向来判断向量相似度,与向量各个维度的相对大小有关,不受各个维度直接数值影响。
某种程度上,归一化后的欧氏距离和余弦相似性表征能力相同

Python

# 余弦距离
def distCos(vecA, vecB):
    # print(vecA)
    # print(vecB)
    return float(np.sum(np.array(vecA) * np.array(vecB))) / (distEclud(vecA,np.mat(np.zeros(len(vecA[0])))) * distEclud(vecB,np.mat(np.zeros(len(vecB[0])))))

5、相关测试代码

#
#  FILTERINGDATA.py
#

from math import sqrt

users = {"Angelica": {"Blues Traveler": 3.5, "Broken Bells": 2.0, "Norah Jones": 4.5, "Phoenix": 5.0, "Slightly Stoopid": 1.5, "The Strokes": 2.5, "Vampire Weekend": 2.0},
         "Bill":{"Blues Traveler": 2.0, "Broken Bells": 3.5, "Deadmau5": 4.0, "Phoenix": 2.0, "Slightly Stoopid": 3.5, "Vampire Weekend": 3.0},
         "Chan": {"Blues Traveler": 5.0, "Broken Bells": 1.0, "Deadmau5": 1.0, "Norah Jones": 3.0, "Phoenix": 5, "Slightly Stoopid": 1.0},
         "Dan": {"Blues Traveler": 3.0, "Broken Bells": 4.0, "Deadmau5": 4.5, "Phoenix": 3.0, "Slightly Stoopid": 4.5, "The Strokes": 4.0, "Vampire Weekend": 2.0},
         "Hailey": {"Broken Bells": 4.0, "Deadmau5": 1.0, "Norah Jones": 4.0, "The Strokes": 4.0, "Vampire Weekend": 1.0},
         "Jordyn":  {"Broken Bells": 4.5, "Deadmau5": 4.0, "Norah Jones": 5.0, "Phoenix": 5.0, "Slightly Stoopid": 4.5, "The Strokes": 4.0, "Vampire Weekend": 4.0},
         "Sam": {"Blues Traveler": 5.0, "Broken Bells": 2.0, "Norah Jones": 3.0, "Phoenix": 5.0, "Slightly Stoopid": 4.0, "The Strokes": 5.0},
         "Veronica": {"Blues Traveler": 3.0, "Norah Jones": 5.0, "Phoenix": 4.0, "Slightly Stoopid": 2.5, "The Strokes": 3.0}
        }



def manhattan(rating1, rating2):
    """Computes the Manhattan distance. Both rating1 and rating2 are dictionaries
       of the form {'The Strokes': 3.0, 'Slightly Stoopid': 2.5}"""
    distance = 0
    commonRatings = False 
    for key in rating1:
        if key in rating2:
            distance += abs(rating1[key] - rating2[key])
            commonRatings = True
    if commonRatings:
        return distance
    else:
        return -1 #Indicates no ratings in common


def computeNearestNeighbor(username, users):
    """creates a sorted list of users based on their distance to username"""
    distances = []
    for user in users:
        if user != username:
            distance = manhattan(users[user], users[username])
            distances.append((distance, user))
    # sort based on distance -- closest first
    distances.sort()
    return distances

def recommend(username, users):
    """Give list of recommendations"""
    # first find nearest neighbor
    nearest = computeNearestNeighbor(username, users)[0][1]

    recommendations = []
    # now find bands neighbor rated that user didn't
    neighborRatings = users[nearest]
    userRatings = users[username]
    for artist in neighborRatings:
        if not artist in userRatings:
            recommendations.append((artist, neighborRatings[artist]))
    # using the fn sorted for variety - sort is more efficient
    return sorted(recommendations, key=lambda artistTuple: artistTuple[1], reverse = True)

# examples - uncomment to run

print( recommend('Hailey', users))
#print( recommend('Chan', users))

你可能感兴趣的:(数据分析)