[Programming Collective Intelligence Study Notes] Euclidean Distance and the Pearson Correlation Coefficient

Euclidean Distance

Definition: the distance between two points x = (x1, …, xn) and y = (y1, …, yn) in Euclidean space is

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
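To make the definition concrete, here is a minimal sketch that computes the distance between two points given as plain sequences. The helper name `euclidean_distance` is made up for this illustration and is not part of the original notes.

```python
from math import sqrt

def euclidean_distance(x, y):
    """Distance between two points with the same number of coordinates."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    # Square each coordinate difference, add them up, take the square root
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# The classic 3-4-5 right triangle: (0, 0) and (3, 4) are 5 apart
print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```

A smaller distance means the two points, for example two critics' rating vectors, are more alike.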

Pearson Correlation Coefficient

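The larger the absolute value of the correlation coefficient between two variables, the more accurately one variable can be predicted from the other. A larger coefficient means the two variables share more of their variation, so a change in one tells you more about the change in the other. If the coefficient is exactly 1 or -1, the two variables are perfectly linearly related and the value of Y can be determined completely from X.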

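· When the coefficient is 0, X and Y have no linear relationship.

· When Y increases as X increases, the correlation is positive and the coefficient lies between 0.00 and 1.00.

· When Y decreases as X decreases, the correlation is also positive and the coefficient lies between 0.00 and 1.00.

· When Y decreases as X increases, the correlation is negative and the coefficient lies between -1.00 and 0.00.

· When Y increases as X decreases, the correlation is also negative and the coefficient lies between -1.00 and 0.00.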

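The larger the absolute value of the coefficient, the stronger the correlation: values close to 1 or -1 indicate a strong linear relationship, and values close to 0 indicate a weak one.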
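The cases above can be checked on small made-up series. The sketch below uses the mean-centered definition of the coefficient; the helper `pearson_r` and the sample data are assumptions for illustration only and do not come from the original notes.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson coefficient from the mean-centered definition (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

xs = [1.0, 2.0, 3.0, 4.0]
rising = [2 * x + 1 for x in xs]    # Y grows as X grows
falling = [9 - 2 * x for x in xs]   # Y shrinks as X grows

print(pearson_r(xs, rising))    # 1.0  (perfect positive correlation)
print(pearson_r(xs, falling))   # -1.0 (perfect negative correlation)

# Y = X^2 on a symmetric range: Y depends on X, but not linearly
sym_x = [-2.0, -1.0, 0.0, 1.0, 2.0]
parabola = [x * x for x in sym_x]
print(pearson_r(sym_x, parabola))  # 0.0
```

The last case shows why a coefficient of 0 only rules out a linear relationship: Y is fully determined by X there, yet the coefficient is zero.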

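$$r = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{n}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{n}\right)}}$$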
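The coefficient is commonly defined through mean-centered sums, while the implementation works with raw sums. As a short derivation sketch, writing \bar{x} and \bar{y} for the sample means (symbols introduced here, not used in the original notes), the two forms agree:

```latex
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
  &= \sum_{i} x_i y_i - \bar{y} \sum_{i} x_i - \bar{x} \sum_{i} y_i + n \bar{x} \bar{y}
   = \sum_{i} x_i y_i - \frac{\sum_{i} x_i \sum_{i} y_i}{n}, \\
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i} x_i^2 - \frac{\left(\sum_{i} x_i\right)^2}{n}, \\
r &= \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}
          {\sqrt{\sum_{i} (x_i - \bar{x})^2 \, \sum_{i} (y_i - \bar{y})^2}} .
\end{aligned}
```

The raw-sum form needs only one pass over the data, which is presumably why the code uses it.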

Implementation:

from math import sqrt

# A dictionary of movie critics and their ratings of a small
# set of movies
movies = {'Lisa Rose': {'Lady in the Water': 2.5,
                        'Snakes on a Plane': 3.5,
                        'Just My Luck': 3.0,
                        'Superman Returns': 3.5,
                        'You, Me and Dupree': 2.5,
                        'The Night Listener': 3.0},
          'Gene Seymour': {'Lady in the Water': 3.0,
                           'Snakes on a Plane': 3.5,
                           'Just My Luck': 1.5,
                           'Superman Returns': 5.0,
                           'The Night Listener': 3.0,
                           'You, Me and Dupree': 3.5},
          'Michael Phillips': {'Lady in the Water': 2.5,
                               'Snakes on a Plane': 3.0,
                               'Superman Returns': 3.5,
                               'The Night Listener': 4.0},
          'Claudia Puig': {'Snakes on a Plane': 3.5,
                           'Just My Luck': 3.0,
                           'The Night Listener': 4.5,
                           'Superman Returns': 4.0,
                           'You, Me and Dupree': 2.5},
          'Mick LaSalle': {'Lady in the Water': 3.0,
                           'Snakes on a Plane': 4.0,
                           'Just My Luck': 2.0,
                           'Superman Returns': 3.0,
                           'The Night Listener': 3.0,
                           'You, Me and Dupree': 2.0},
          'Jack Matthews': {'Lady in the Water': 3.0,
                            'Snakes on a Plane': 4.0,
                            'The Night Listener': 3.0,
                            'Superman Returns': 5.0,
                            'You, Me and Dupree': 3.5},
          'Toby': {'Snakes on a Plane': 4.5,
                   'You, Me and Dupree': 1.0,
                   'Superman Returns': 4.0}}

def euclidean(data, p1, p2):
    "Calculate Euclidean distance over the items both people rated"
    sumOfSquares = sum([pow(data[p1][item] - data[p2][item], 2)
                        for item in data[p1] if item in data[p2]])

    # The distance is the square root of the summed squared differences
    return sqrt(sumOfSquares)

def pearson(data, p1, p2):
    "Calculate Pearson correlation coefficient"
    # Only items rated by both people take part in the calculation
    corrItems = [item for item in data[p1] if item in data[p2]]

    n = len(corrItems)
    if n == 0:
        return 0

    sumX = sum([data[p1][item] for item in corrItems])
    sumY = sum([data[p2][item] for item in corrItems])
    sumXY = sum([data[p1][item] * data[p2][item] for item in corrItems])
    sumXsq = sum([pow(data[p1][item], 2) for item in corrItems])
    sumYsq = sum([pow(data[p2][item], 2) for item in corrItems])

    numerator = sumXY - sumX * sumY / n
    denominator = sqrt((sumXsq - pow(sumX, 2) / n) * (sumYsq - pow(sumY, 2) / n))
    # If either person gave identical ratings throughout, the coefficient is undefined
    if denominator == 0:
        return 0
    return numerator / denominator
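As a quick check, the sketch below runs both measures on two critics from the data set. To stay self-contained it restates the two functions compactly and copies only the ratings for 'Lisa Rose' and 'Gene Seymour'. The names `ratings` and `shared_items` are introduced here for illustration, and returning 0.0 when the denominator vanishes is an assumption of this sketch.

```python
from math import sqrt

# Ratings for two of the critics, copied from the movies dictionary
ratings = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                  'Just My Luck': 3.0, 'Superman Returns': 3.5,
                  'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
                     'Just My Luck': 1.5, 'Superman Returns': 5.0,
                     'The Night Listener': 3.0, 'You, Me and Dupree': 3.5},
}

def shared_items(data, p1, p2):
    """Items rated by both people (hypothetical helper)."""
    return [item for item in data[p1] if item in data[p2]]

def euclidean(data, p1, p2):
    items = shared_items(data, p1, p2)
    return sqrt(sum((data[p1][i] - data[p2][i]) ** 2 for i in items))

def pearson(data, p1, p2):
    items = shared_items(data, p1, p2)
    n = len(items)
    if n == 0:
        return 0.0
    xs = [data[p1][i] for i in items]
    ys = [data[p2][i] for i in items]
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x_sq = sum(x * x for x in xs)
    sum_y_sq = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x_sq - sum_x ** 2 / n) * (sum_y_sq - sum_y ** 2 / n))
    # Assumption: treat the undefined zero-variance case as "no correlation"
    return numerator / denominator if denominator else 0.0

print(round(euclidean(ratings, 'Lisa Rose', 'Gene Seymour'), 4))  # 2.3979
print(round(pearson(ratings, 'Lisa Rose', 'Gene Seymour'), 4))    # 0.3961
```

The two critics are a moderate distance apart and only weakly positively correlated. The two numbers are not on the same scale: for distance, smaller means more similar, while for the Pearson coefficient, closer to 1 means more similar.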

 

When reposting, please credit the source: 阿龙の异度空间

Permalink: http://blog.yidooo.net/archives/3190.html
