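I. Pearson Correlation Coefficient
1. It measures the strength of the linear relationship between two numeric variables.
2. Its value lies in [-1, 1]: positive values indicate positive correlation, negative values indicate negative correlation, and 0 indicates no linear correlation.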
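3. Formula, for samples $(x_i, y_i)$, $i = 1, \dots, n$, with means $\bar{x}$ and $\bar{y}$:
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$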
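The three cases above (positive, negative, and no linear correlation) can be illustrated with a minimal sketch. The helper name `pearson_r` and the sample data are assumptions chosen for illustration. Note that the third case has a perfect non-linear relationship (y = x^2) yet r is 0, because r only measures linear association.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson r computed from deviations about the means (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))


x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

r_pos = pearson_r(x, 3 * x + 1)    # increasing straight line
r_neg = pearson_r(x, -3 * x + 1)   # decreasing straight line
r_zero = pearson_r(x, x ** 2)      # symmetric parabola: no linear trend

print('r_pos:', r_pos)
print('r_neg:', r_neg)
print('r_zero:', r_zero)
```

The same values can be obtained from `np.corrcoef(x, y)[0, 1]`, which is usually the more convenient call in practice.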
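II. R^2
1. Definition: the coefficient of determination, that is, the proportion of the total variation in the dependent variable that the independent variable(s) explain through the regression relationship.
2. Interpretation: R^2 = 0.8 means the regression relationship explains 80% of the variation in the dependent variable. In other words, if the independent variable could be held constant, the variation in the dependent variable would drop by 80%.
3. For simple linear regression: R^2 = r * r, where r is the Pearson correlation coefficient.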
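4. R^2 has limitations: it never decreases as more independent variables are added, even when they contribute nothing, and it also depends on the sample size. To compensate, R^2 is adjusted as follows:
$$R^2_{adj} = 1 - \frac{(1 - R^2)(N - 1)}{N - P - 1}$$
where P is the number of predictors (independent variables) and
N is the total number of instances (samples).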
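The adjustment can be sketched in a few lines, assuming the standard adjusted R^2 formula 1 - (1 - R^2)(N - 1)/(N - P - 1). The function name `adjusted_r2` and the numbers (R^2 = 0.8, 20 samples, 3 versus 10 predictors) are hypothetical, chosen only to show that the same raw R^2 is penalized more heavily when more predictors are used.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predictors fitted on n samples."""
    if n - p - 1 <= 0:
        # The formula is undefined unless there are more samples than parameters.
        raise ValueError('need n > p + 1')
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Same raw R^2, same sample size, different numbers of predictors.
adj_few = adjusted_r2(0.8, n=20, p=3)    # about 0.7625
adj_many = adjusted_r2(0.8, n=20, p=10)  # about 0.578

print('adjusted R^2 with 3 predictors:', adj_few)
print('adjusted R^2 with 10 predictors:', adj_many)
```

The adjusted value is always at most the raw R^2, and the gap widens as P approaches N.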
III. Worked example: code and results
import math

import numpy as np


def computerCorrelation(X, Y):
    """Return the Pearson correlation coefficient r of X and Y."""
    XBar = np.mean(X)  # mean of X
    YBar = np.mean(Y)  # mean of Y
    SSR = 0   # running sum of (x - XBar) * (y - YBar), the numerator of r
    varX = 0  # running sum of squared deviations of X
    varY = 0  # running sum of squared deviations of Y
    for i in range(0, len(X)):
        diffXXBar = X[i] - XBar
        diffYYBar = Y[i] - YBar
        SSR += (diffXXBar * diffYYBar)
        varX += diffXXBar**2
        varY += diffYYBar**2
    SST = math.sqrt(varX * varY)  # denominator of r
    return SSR / SST


def polyfit(x, y, degree):  # degree is the degree of the fitted polynomial
    results = {}  # collects the fit results
    coffs = np.polyfit(x, y, degree)  # least-squares polynomial coefficients, highest power first
    results['polynomfot'] = coffs.tolist()
    p = np.poly1d(coffs)  # polynomial built from the coefficients
    yhat = p(x)  # predicted values
    ybar = np.sum(y) / len(y)  # mean of y
    ssreg = np.sum((yhat - ybar)**2)  # regression (explained) sum of squares
    print('ssreg:', str(ssreg))
    sstot = np.sum((y - ybar)**2)  # total sum of squares
    print('sstot:', str(sstot))
    results['determinotion'] = ssreg / sstot  # R^2
    print('results:', results)
    return results


textX = [1, 3, 5, 7, 9]
textY = [10, 13, 15, 20, 35]
print('r:', computerCorrelation(textX, textY))
print('r^2:', str(computerCorrelation(textX, textY)**2))
print(polyfit(textX, textY, 1), 'determinotion')
Output:
r: 0.9136680531834395
r^2: 0.8347893114080164
ssreg: 324.9000000000002
sstot: 389.20000000000005
results: {'polynomfot': [2.8500000000000005, 4.350000000000008], 'determinotion': 0.8347893114080168}
{'polynomfot': [2.8500000000000005, 4.350000000000008], 'determinotion': 0.8347893114080168} determinotion
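As a cross-check on the results above, the same data can be run through NumPy's built-in `np.corrcoef`, and R^2 can be computed the other common way, as 1 - SSres/SStot. This is a minimal sketch, not part of the original code. The names `ss_res`, `ss_tot`, and `r2_from_residuals` are introduced here for illustration. For a least-squares line with an intercept, both definitions of R^2 should agree with r squared up to floating-point rounding.

```python
import numpy as np

x = np.array([1, 3, 5, 7, 9], dtype=float)
y = np.array([10, 13, 15, 20, 35], dtype=float)

# Pearson r from the off-diagonal entry of the 2x2 correlation matrix.
r = np.corrcoef(x, y)[0, 1]

# Fit a straight line and compute R^2 from the residuals.
slope, intercept = np.polyfit(x, y, 1)
yhat = slope * x + intercept
ss_res = np.sum((y - yhat) ** 2)      # residual (unexplained) sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
r2_from_residuals = 1 - ss_res / ss_tot

print('r:', r)
print('r^2:', r ** 2)
print('1 - SSres/SStot:', r2_from_residuals)
```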