Linear regression with OLS in Python

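Linear regression: y = β₁x + β₂. We use Python to estimate the coefficients β₁ and β₂, and use the p-value to judge whether the model is reliable.
ols('dependent_variable ~ independent_variable', data=data_source).fit(), where fit() fits the model to the data.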
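To make concrete what "estimating β₁ and β₂" means, here is a minimal sketch of the closed-form least-squares solution for a single predictor. The helper name `simple_ols` and the toy data are assumptions made for illustration. They are not part of statsmodels or of the dataset used in this post.

```python
# Closed-form least squares for y = b1*x + b2.
# simple_ols and the toy data are hypothetical, for illustration only.
def simple_ols(xs, ys):
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx                      # b1
    intercept = y_mean - slope * x_mean    # b2
    return slope, intercept


toy_x = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_y = [5.0, 7.0, 9.0, 11.0, 13.0]  # generated exactly as 2*x + 3
slope, intercept = simple_ols(toy_x, toy_y)
print(slope, intercept)  # prints 2.0 3.0
```

On real, noisy data the fitted line will not pass through every point, which is why the standard errors and p-values that ols reports also matter.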

import os
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib

os.chdir(r'D:\pycharm程序文件\练习1')
data = pd.read_csv('creditcard_exp.csv', skipinitialspace=True)  # skipinitialspace=True skips spaces after each delimiter

matplotlib.rcParams['axes.unicode_minus'] = False  # show the minus sign '-' correctly instead of as a box in saved figures
plt.rcParams['font.sans-serif'] = ['SimHei']  # set the default font

data_two = data[['Income','avg_exp']].copy()

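# Look at the scatter plot first
def scatter_fig():
    x = data_two['Income']   # independent variable on the x-axis
    y = data_two['avg_exp']  # dependent variable on the y-axis
    plt.scatter(x, y)
    plt.xlabel('Income')
    plt.ylabel('avg_exp')
    plt.xticks(rotation=45)
    plt.show()

scatter_fig()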


# Linear regression: y = β₁x + β₂;  ols('dependent_variable ~ independent_variable', data=data_source).fit(); fit() fits the model
from statsmodels.formula.api import ols

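linear_regression = ols('avg_exp ~ Income', data=data).fit()
# Second model adding Age and an Income-by-Age interaction; 'Income*Age' alone already expands to Income + Age + Income:Age
linear_regression1 = ols('avg_exp ~ Income + Age + Income*Age', data=data).fit()
# The summary analysed below (Df Model: 1, Income only) is that of the simple model
print(linear_regression.summary())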




Interpreting the output:

                            OLS Regression Results
==============================================================================
Dep. Variable:                avg_exp   R-squared:                       0.454
Model:                            OLS   Adj. R-squared:                  0.446
Method:                 Least Squares   F-statistic:                     56.61
Date:                Wed, 16 Oct 2019   Prob (F-statistic):           1.60e-10
Time:                        13:48:55   Log-Likelihood:                -504.69
No. Observations:                  70   AIC:                             1013.
Df Residuals:                      68   BIC:                             1018.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    258.0495    104.290      2.474      0.016      49.942     466.157
Income        97.7286     12.989      7.524      0.000      71.809     123.648
==============================================================================
Omnibus:                        3.714   Durbin-Watson:                   1.424
Prob(Omnibus):                  0.156   Jarque-Bera (JB):                3.507
Skew:                           0.485   Prob(JB):                        0.173
Kurtosis:                       2.490   Cond. No.                         21.4
==============================================================================


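1. First look at P>|t| for Income and check whether it is smaller than the significance level α; this tells us whether the model is usable. Here it is 0.000, below the commonly used α = 0.05.
2. The coef of Income, 97.7286, is the slope β₁; the coef of Intercept, 258.0495, is the intercept β₂ (of little practical meaning).
3. R-squared: 0.454 describes how good the model is; it lies between 0 and 1, and larger is better.
4. AIC: 1013 and BIC: 1018 are used to choose between models; lower is better.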
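Several numbers in the summary table are tied together by simple formulas, and re-deriving them helps when reading the output. The sketch below recomputes them from the rounded values printed above, so small rounding differences are expected. The variable names are illustrative, and the AIC/BIC expressions are the standard definitions, with the intercept and the slope counted as two parameters.

```python
import math

# Values copied from the printed summary table (rounded as shown there)
n_obs = 70                 # No. Observations
n_params = 2               # Intercept + Income
coef_income = 97.7286
se_income = 12.989
r_squared = 0.454
log_likelihood = -504.69

# t statistic = coefficient / standard error
t_income = coef_income / se_income
# With a single predictor, the F statistic equals the squared t statistic
f_stat = t_income ** 2
# Adjusted R-squared penalises the number of estimated parameters
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_params)
# Information criteria: lower values indicate a better trade-off of fit and complexity
aic = 2 * n_params - 2 * log_likelihood
bic = n_params * math.log(n_obs) - 2 * log_likelihood

print(round(t_income, 3), round(f_stat, 2), round(adj_r_squared, 3),
      round(aic), round(bic))
```

The results agree with the t, F-statistic, Adj. R-squared, AIC and BIC entries of the table to the precision shown there.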




