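To compute the correlation coefficients between variables, use data[['variable1','variable2','variable3']].corr(method='pearson').
The result is returned as a correlation coefficient matrix.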
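As a minimal sketch of that matrix form, the snippet below runs corr() on a small made-up DataFrame. The column names a, b, c and their values are hypothetical and are not taken from creditcard_exp.csv; they are chosen so that the coefficients are easy to predict.

```python
import pandas as pd

# Hypothetical toy data, not taken from creditcard_exp.csv
toy = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, 4.0],
    'b': [2.0, 4.0, 6.0, 8.0],  # b = 2 * a, perfectly positively correlated with a
    'c': [4.0, 3.0, 2.0, 1.0],  # c = 5 - a, perfectly negatively correlated with a
})

# corr() returns a square DataFrame whose rows and columns are the selected variables
toy_corr = toy[['a', 'b', 'c']].corr(method='pearson')
print(toy_corr)
```

The diagonal is 1 because every variable is perfectly correlated with itself, and the matrix is symmetric because the Pearson coefficient does not depend on the order of the two variables.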
import os
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
"""
Correlation coefficient between two continuous variables; no hypothesis test is needed
"""
os.chdir(r'D:\pycharm程序文件\练习1')
data = pd.read_csv('creditcard_exp.csv', skipinitialspace=True)  # skipinitialspace=True skips spaces after the delimiter
matplotlib.rcParams['axes.unicode_minus'] = False  # keep the minus sign '-' from rendering as a box in saved figures
plt.rcParams['font.sans-serif'] = ['SimHei']  # set the default font
# Two continuous variables: Income (y) ~ avg_exp_ln (x)
data_two = data[['Income','avg_exp_ln']].copy()
def scatter_fig():
    x = data_two['avg_exp_ln']
    y = data_two['Income']
    plt.scatter(x, y)
    plt.xticks(rotation=45)
    plt.show()
# scatter_fig()
# Correlation coefficient between the two variables: data[['variable1','variable2']].corr(method='pearson')
coefficient = data_two.corr(method = 'pearson')
print(coefficient)
# Correlation coefficients among several variables, returned as a coefficient matrix
coefficient1 = data[['avg_exp','Income','avg_exp_ln']].corr(method = 'pearson')
print(coefficient1)
Output:
             Income  avg_exp_ln
Income      1.00000     0.63489
avg_exp_ln  0.63489     1.00000

             avg_exp    Income  avg_exp_ln
avg_exp     1.000000  0.674011    0.941926
Income      0.674011  1.000000    0.634890
avg_exp_ln  0.941926  0.634890    1.000000
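
Because the result is an ordinary DataFrame, individual coefficients can be read by label. The sketch below rebuilds the three-variable matrix from the printed values above instead of re-reading the CSV, so it runs on its own; the names r and ranked are introduced here purely for illustration.

```python
import pandas as pd

# Values copied from the printed output above (not recomputed from creditcard_exp.csv)
cols = ['avg_exp', 'Income', 'avg_exp_ln']
coefficient1 = pd.DataFrame(
    [[1.000000, 0.674011, 0.941926],
     [0.674011, 1.000000, 0.634890],
     [0.941926, 0.634890, 1.000000]],
    index=cols, columns=cols,
)

# Look up a single coefficient by row and column label
r = coefficient1.loc['Income', 'avg_exp_ln']
print(r)

# Rank the other variables by their correlation with avg_exp, strongest first
ranked = coefficient1['avg_exp'].drop('avg_exp').sort_values(ascending=False)
print(ranked)
```

With these values, avg_exp_ln ranks ahead of Income in its correlation with avg_exp, which is expected since avg_exp_ln appears to be the log transform of avg_exp.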