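This post shows how to compute the correlation coefficients between the columns of a pandas DataFrame in Python, and how to test whether the correlation between two DataFrame columns is statistically significant.
For example: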
>>> df.head()
    Guba     XQ    BCI  Count  Value
0  0.021  0.098  0.175  0.077  0.057
1  0.031  0.097  0.192  0.087  0.069
2  0.018  0.101  0.193  0.075  0.069
3  0.017  0.112  0.203  0.077  0.063
4  0.042  0.158  0.222  0.335  0.567
1. Pearson correlation coefficient
>>> df.corr()
           Guba        XQ       BCI     Count     Value
Guba   1.000000  0.175604 -0.014611  0.200896  0.256166
XQ     0.175604  1.000000 -0.390358  0.654250  0.482809
BCI   -0.014611 -0.390358  1.000000 -0.259319 -0.156440
Count  0.200896  0.654250 -0.259319  1.000000  0.832961
Value  0.256166  0.482809 -0.156440  0.832961  1.000000
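As a rough cross-check of what the default `df.corr()` call computes, the sketch below writes the Pearson formula out by hand and compares it with pandas. The data is synthetic (it is not the `df` from this post), and the names `demo` and `pearson_manual` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): NOT the df shown in this post.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
demo = pd.DataFrame({"a": x, "b": 0.5 * x + rng.normal(size=200)})


def pearson_manual(u, v):
    """Pearson r: sum of co-deviations divided by the product of the
    square roots of the summed squared deviations."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    du = u - u.mean()
    dv = v - v.mean()
    return float((du * dv).sum() / np.sqrt((du ** 2).sum() * (dv ** 2).sum()))


r_manual = pearson_manual(demo["a"], demo["b"])
r_pandas = demo.corr().loc["a", "b"]  # method="pearson" is the default

# The two numbers should agree up to floating-point error.
print(r_manual, r_pandas)
```

Pearson measures linear association only, which is why the rank-based methods in the following sections can give noticeably different values on the same columns.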
2. Kendall's tau correlation coefficient
>>> df.corr('kendall')
           Guba        XQ       BCI     Count     Value
Guba   1.000000  0.153904 -0.012438  0.133122  0.090707
XQ     0.153904  1.000000 -0.244304  0.374908  0.255377
BCI   -0.012438 -0.244304  1.000000 -0.157442 -0.091950
Count  0.133122  0.374908 -0.157442  1.000000  0.720916
Value  0.090707  0.255377 -0.091950  0.720916  1.000000
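To make the Kendall figure less of a black box, here is a minimal sketch that counts concordant and discordant pairs directly and compares the result with pandas. It assumes tie-free data (continuous synthetic values, not the `df` from this post); with ties, the tie-corrected variant would differ from this simple count. `demo` and `kendall_tau_manual` are illustrative names.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Synthetic, tie-free stand-in data (assumption): NOT the df shown in this post.
rng = np.random.default_rng(2)
x = rng.normal(size=60)
demo = pd.DataFrame({"a": x, "b": x + rng.normal(size=60)})


def kendall_tau_manual(u, v):
    """Kendall's tau without tie correction:
    (concordant - discordant) / total number of pairs."""
    u = list(u)
    v = list(v)
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(u)), 2):
        sign = (u[i] - u[j]) * (v[i] - v[j])
        if sign > 0:
            concordant += 1  # both columns order the pair the same way
        elif sign < 0:
            discordant += 1  # the columns order the pair oppositely
    n_pairs = len(u) * (len(u) - 1) // 2
    return (concordant - discordant) / n_pairs


tau_manual = kendall_tau_manual(demo["a"], demo["b"])
tau_pandas = demo.corr(method="kendall").loc["a", "b"]

# On tie-free data the two values should match up to floating-point error.
print(tau_manual, tau_pandas)
```

The pairwise loop is quadratic in the number of rows, so it is only meant for understanding the definition, not for large data.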
3. Spearman rank correlation
>>> df.corr('spearman')
           Guba        XQ       BCI     Count     Value
Guba   1.000000  0.219124 -0.017204  0.189752  0.143163
XQ     0.219124  1.000000 -0.358981  0.563938  0.427756
BCI   -0.017204 -0.358981  1.000000 -0.241880 -0.140010
Count  0.189752  0.563938 -0.241880  1.000000  0.877732
Value  0.143163  0.427756 -0.140010  0.877732  1.000000
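Spearman's coefficient is the Pearson coefficient computed on ranks rather than on raw values. The sketch below illustrates that equivalence on synthetic data (again not the `df` from this post; `demo` is an illustrative name), using a deliberately nonlinear relationship.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): NOT the df shown in this post.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
demo = pd.DataFrame({"a": x, "b": np.exp(x) + rng.normal(scale=0.5, size=200)})

# Spearman as provided by pandas.
spearman_direct = demo.corr(method="spearman")

# The same thing by hand: rank every column, then take the Pearson correlation.
spearman_via_ranks = demo.rank().corr(method="pearson")

print(np.allclose(spearman_direct, spearman_via_ranks))
```

Because only the ordering of the values matters, Spearman is less sensitive than Pearson to outliers and to monotonic but nonlinear relationships.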
4. Significance test
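>>> import scipy.stats as stats
>>> # The first value returned is the Pearson correlation coefficient and
>>> # the second is the p-value. The p-value here is far below 0.05, so the
>>> # correlation between the Guba and Value columns is statistically significant.
>>> stats.pearsonr(df['Guba'], df['Value'])
(0.256165703418037, 8.10519823509109e-07)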
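`stats.pearsonr` tests one pair of columns at a time. If p-values are wanted for every pair, one possible approach is to loop over the column combinations, as in the sketch below. The helper name `pearson_pvalues`, the synthetic `demo` data, and the 0.05 threshold are assumptions for illustration, not part of pandas or SciPy.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in data (assumption): NOT the df shown in this post.
rng = np.random.default_rng(3)
x = rng.normal(size=200)
demo = pd.DataFrame({
    "a": x,
    "b": 0.8 * x + rng.normal(scale=0.5, size=200),  # built to depend on a
    "c": rng.normal(size=200),                        # generated independently
})


def pearson_pvalues(frame):
    """Return a symmetric DataFrame of two-sided Pearson p-values.
    The diagonal is left as NaN because a column is not tested against itself."""
    cols = list(frame.columns)
    out = pd.DataFrame(np.nan, index=cols, columns=cols)
    for a, b in combinations(cols, 2):
        pair = frame[[a, b]].dropna()  # pearsonr does not accept NaN values
        _, p = stats.pearsonr(pair[a], pair[b])
        out.loc[a, b] = p
        out.loc[b, a] = p
    return out


pvals = pearson_pvalues(demo)
print(pvals)

# Flag pairs below a conventional 0.05 threshold (the threshold is a choice).
print(pvals < 0.05)
```

When many pairs are tested at once, some will fall below 0.05 by chance alone, so a multiple-comparison correction may be worth considering before drawing conclusions.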