Correlation (Corr) and R^2 in Regression Analysis

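I. Pearson Correlation Coefficient

1. It measures the strength of the linear relationship between two numeric variables.

2. Its value lies in [-1, 1]: positive values indicate positive correlation, negative values indicate negative correlation, and 0 indicates no linear correlation.

3. Formulas:

\rho =Cor(X,Y)=\frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}}

r_{xy}=\frac{\sum (x_{i}-\bar{x})(y_{i}-\bar{y})}{\sqrt{\sum (x_{i}-\bar{x})^2\sum (y_{i}-\bar{y})^2}}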
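As a quick check of the sample formula above, the sketch below computes r directly from the deviations and compares it with NumPy's built-in `np.corrcoef`. The helper name `pearson_r` and the small data sets are illustrative assumptions, not part of the original example.

```python
import numpy as np


def pearson_r(x, y):
    """Sample Pearson correlation computed directly from the deviation formula."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))


# Illustrative data (assumed for this sketch)
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # exactly linear and increasing
y_down = [10, 8, 6, 4, 2]  # exactly linear and decreasing

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
r_numpy = np.corrcoef(x, y_up)[0, 1]  # off-diagonal entry of the 2x2 correlation matrix

print(r_up, r_down, r_numpy)
```

A perfectly increasing line should give r close to 1, a perfectly decreasing line r close to -1, and the hand-written version should agree with `np.corrcoef` up to floating-point rounding.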

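II. R^2

1. Definition: the coefficient of determination, i.e. the proportion of the total variation in the dependent variable that is explained by the independent variable(s) through the regression relationship.

2. Interpretation: if R^2 = 0.8, the regression relationship explains 80% of the variation in the dependent variable. In other words, if the independent variable could be held constant, the variation in the dependent variable would decrease by 80%.

3. For simple linear regression: R^2 = r*r, where r is the Pearson correlation coefficient between x and y.

4. For multiple linear regression: R^{2}=\frac{SSR}{SST}=\frac{\sum (\hat{y}_{i}-\bar{y})^2}{\sum (y_{i}-\bar{y})^2}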

SST=\sum_{i}(y_{i}-\bar{y})^2

SSR=\sum_{i}(\hat{y}_{i}-\bar{y})^2

SSE=\sum_{i}(y_{i}-\hat{y}_{i})^2
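The three sums of squares can be computed in a few lines. The sketch below uses a small made-up data set and `np.polyfit` for a straight-line least-squares fit; for such a fit with an intercept, SST = SSR + SSE holds, so SSR/SST and 1 - SSE/SST should give the same R^2. The data and variable names are assumptions chosen for illustration.

```python
import numpy as np

# Illustrative data (assumed for this sketch)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Least-squares straight line: y_hat = slope * x + intercept
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept
y_bar = y.mean()

sst = np.sum((y - y_bar) ** 2)      # total sum of squares
ssr = np.sum((y_hat - y_bar) ** 2)  # sum of squares explained by the regression
sse = np.sum((y - y_hat) ** 2)      # residual (error) sum of squares

r2_from_ssr = ssr / sst
r2_from_sse = 1 - sse / sst

print('SST:', sst, 'SSR:', ssr, 'SSE:', sse)
print('R^2:', r2_from_ssr, r2_from_sse)
```

For this data the fitted line works out to a slope of about 0.6 and an intercept of about 2.2, giving an R^2 of about 0.6 by either route.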

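5. R^2 has limitations: it grows as the number of independent variables increases (adding a variable never lowers it), and it also depends on the sample size. To compensate for this, R^2 is adjusted as follows:

R^{2}_{adjust}=1-\frac{(1-R^2)(N-1)}{N-P-1}

where P is the number of predictors (independent variables)

and N is the total number of instances (samples).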
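The adjustment formula translates directly into a small function. This is a minimal sketch; the function name `adjusted_r2`, the guard against a non-positive denominator, and the sample numbers are assumptions added for illustration.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predictors fitted on n instances."""
    if n - p - 1 <= 0:
        # The formula is undefined when N - P - 1 is not positive
        raise ValueError('need n > p + 1 to compute adjusted R^2')
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Same raw R^2 and sample size, different numbers of predictors (illustrative values)
adj_2 = adjusted_r2(0.8, n=21, p=2)
adj_5 = adjusted_r2(0.8, n=21, p=5)

print(adj_2, adj_5)
```

With the raw R^2 fixed at 0.8, the model with more predictors gets the lower adjusted value, which is exactly the penalty the correction is meant to apply.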

III. Example Code and Results

import math

import numpy as np


def computerCorrelation(X, Y):
    """Return the Pearson correlation coefficient r of X and Y."""
    XBar = np.mean(X)  # mean of X
    YBar = np.mean(Y)  # mean of Y
    covXY = 0  # running sum of (x - XBar) * (y - YBar)
    varX = 0   # running sum of (x - XBar)^2
    varY = 0   # running sum of (y - YBar)^2
    for i in range(len(X)):
        diffXXBar = X[i] - XBar
        diffYYBar = Y[i] - YBar
        covXY += diffXXBar * diffYYBar
        varX += diffXXBar ** 2
        varY += diffYYBar ** 2
    # The denominator only needs the final sums, so compute it once after the loop
    denom = math.sqrt(varX * varY)
    return covXY / denom


def polyfit(x, y, degree):  # degree is the degree of the fitted polynomial
    results = {}  # the fit results are stored in results
    y = np.asarray(y, dtype=float)
    coffs = np.polyfit(x, y, degree)  # polynomial coefficients, highest power first
    results['polynomfot'] = coffs.tolist()
    p = np.poly1d(coffs)  # callable polynomial built from the coefficients
    yhat = p(x)  # predicted values
    ybar = np.sum(y) / len(y)
    ssreg = np.sum((yhat - ybar) ** 2)  # SSR
    print('ssreg:', str(ssreg))
    sstot = np.sum((y - ybar) ** 2)  # SST
    print('sstot:', str(sstot))
    results['determinotion'] = ssreg / sstot  # R^2 = SSR / SST
    print('results:', results)
    return results


textX = [1, 3, 5, 7, 9]
textY = [10, 13, 15, 20, 35]
print('r:', computerCorrelation(textX, textY))
print('r^2:', str(computerCorrelation(textX, textY)**2))
print(polyfit(textX, textY, 1), 'determinotion')

Output:

r: 0.9136680531834395
r^2: 0.8347893114080164
ssreg: 324.9000000000002
sstot: 389.20000000000005
results: {'polynomfot': [2.8500000000000005, 4.350000000000008], 'determinotion': 0.8347893114080168}
{'polynomfot': [2.8500000000000005, 4.350000000000008], 'determinotion': 0.8347893114080168} determinotion
