计算皮尔逊相关系数

皮尔逊系数

在统计学中,皮尔逊相关系数( Pearson correlation coefficient),又称皮尔逊积矩相关系数(Pearson product-moment correlation coefficient,简称 PPMCC或PCCs)。用于衡量两个变量X和Y之间的线性相关相关关系,值域在-1与1之间。


计算皮尔逊相关系数_第1张图片

1 python计算方法

几种方式

1.1根据公式手写

def cal_pccs(x, y, n):
    """
    warning: data format must be narray
    :param x: Variable 1
    :param y: The variable 2
    :param n: The number of elements in x
    :return: pccs
    """
    sum_xy = np.sum(np.sum(x*y))
    sum_x = np.sum(np.sum(x))
    sum_y = np.sum(np.sum(y))
    sum_x2 = np.sum(np.sum(x*x))
    sum_y2 = np.sum(np.sum(y*y))
    pcc = (n*sum_xy-sum_x*sum_y)/np.sqrt((n*sum_x2-sum_x*sum_x)*(n*sum_y2-sum_y*sum_y))
    return pcc

1.2 numpy的函数

pccs = np.corrcoef(x, y)

1.3 scipy.stats中的函数

from scipy.stats import pearsonr
pccs = pearsonr(x, y)

1.4细化分解解释

from math import sqrt


def multipl(a, b):
    sumofab = 0.0
    for i in range(len(a)):
        temp = a[i] * b[i]
        sumofab += temp
    return sumofab


def corrcoef(x, y):
    n = len(x)
    # 求和
    sum1 = sum(x)
    sum2 = sum(y)
    # 求乘积之和
    sumofxy = multipl(x, y)
    # 求平方和
    sumofx2 = sum([pow(i, 2) for i in x])
    sumofy2 = sum([pow(j, 2) for j in y])
    num = sumofxy - (float(sum1) * float(sum2) / n)
    # 计算皮尔逊相关系数
    den = sqrt((sumofx2 - float(sum1 ** 2) / n) * (sumofy2 - float(sum2 ** 2) / n))
    return num / den


x = [0, 1, 0, 3]
y = [0, 1, 1, 1]

print(corrcoef(x, y))  # 0.471404520791

你可能感兴趣的:(计算皮尔逊相关系数)