Python statsmodels module: regression analysis and the multicollinearity warning

Problem: when running a single-variable OLS regression with statsmodels, the results report possible multicollinearity.

Warning message:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                eci_mid   R-squared:                       0.197
Model:                            OLS   Adj. R-squared:                  0.195
Method:                 Least Squares   F-statistic:                     82.93
Date:                Wed, 23 Jun 2021   Prob (F-statistic):           7.68e-18
Time:                        15:27:28   Log-Likelihood:                -441.98
No. Observations:                 339   AIC:                             888.0
Df Residuals:                     337   BIC:                             895.6
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.3352      0.060     -5.592      0.000      -0.453      -0.217
in_degree   4.736e-05    5.2e-06      9.107      0.000    3.71e-05    5.76e-05
==============================================================================
Omnibus:                       32.398   Durbin-Watson:                   1.613
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               12.251
Skew:                           0.204   Prob(JB):                      0.00219
Kurtosis:                       2.163   Cond. No.                     1.42e+04
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.42e+04. This might indicate that there are
strong multicollinearity or other numerical problems.

Cause: this is not actually collinearity (there is only one explanatory variable); the large condition number comes from the scale of the explanatory variable relative to the constant column.
Fix: center and/or rescale (e.g. standardize) the explanatory variable. See the short example in the answer linked below; a minimal sketch also follows here.
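The sketch below does not use the original data: it simulates values for eci_mid and in_degree (the variable names are taken from the summary above) to show that centering and rescaling the regressor shrinks the reported condition number without changing the quality of the fit (R-squared and t-statistics are unaffected; only the scale of the coefficient changes).

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
# Hypothetical data: a regressor several orders of magnitude larger than the constant column
in_degree = rng.uniform(0, 30000, size=339)
eci_mid = -0.3 + 5e-05 * in_degree + rng.normal(scale=0.5, size=339)

# Raw regressor: large condition number, which triggers the warning in the summary
X_raw = sm.add_constant(in_degree)
print(sm.OLS(eci_mid, X_raw).fit().condition_number)

# Centered and scaled (standardized) regressor: same fit, condition number close to 1
x_std = (in_degree - in_degree.mean()) / in_degree.std()
X_std = sm.add_constant(x_std)
print(sm.OLS(eci_mid, X_std).fit().condition_number)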

Other ways to check for collinearity: inspect the eigenvalues of the correlation matrix. An eigenvalue close to zero indicates collinearity, and the corresponding eigenvector shows which variables are involved. (In regression analysis the variance inflation factor (VIF) is also commonly used for this purpose.)

You can detect high-multi-collinearity by inspecting the eigen values of correlation matrix. A very low eigen value shows that the data are collinear, and the corresponding eigen vector shows which variables are collinear. If there is no collinearity in the data, you would expect that none of the eigen values are close to zero.
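A minimal sketch of both diagnostics on a small simulated dataset (the column names x1, x2, x3 are hypothetical): the smallest eigenvalue of the correlation matrix flags the near-collinear combination, and variance_inflation_factor from statsmodels computes the VIFs.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = 2 * x1 - x2 + rng.normal(scale=0.01, size=200)   # nearly collinear with x1 and x2
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# Eigenvalues of the correlation matrix: one near zero signals collinearity;
# the corresponding eigenvector shows which variables take part in it.
eigvals, eigvecs = np.linalg.eigh(X.corr())
print(eigvals)            # ascending order; the first one is close to zero here
print(eigvecs[:, 0])      # eigenvector belonging to the smallest eigenvalue

# VIF: values far above ~10 are commonly read as a sign of multicollinearity.
exog = sm.add_constant(X)
for i, name in enumerate(exog.columns):
    print(name, variance_inflation_factor(exog.values, i))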

For a detailed explanation see: https://stackoverflow.com/questions/25676145/capturing-high-multi-collinearity-in-statsmodels
