Computing the variance inflation factor (VIF) in Python

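As mentioned by others, and in this post by Josef Perktold, the function's author, `variance_inflation_factor` expects a constant to be present in the matrix of explanatory variables. You can use `add_constant` from statsmodels to add the required constant column to the DataFrame before passing its values to the function:

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

df = pd.DataFrame(
    {'a': [1, 1, 2, 3, 4],
     'b': [2, 2, 3, 2, 1],
     'c': [4, 6, 7, 8, 9],
     'd': [4, 3, 4, 5, 4]}
)

# add_constant prepends a column of ones named 'const'
X = add_constant(df)

>>> pd.Series([variance_inflation_factor(X.values, i)
...            for i in range(X.shape[1])],
...           index=X.columns)
const    136.875
a         22.950
b          3.000
c         12.950
d          3.000
dtype: float64
```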
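Why the constant matters: according to the statsmodels documentation, `OLS` results report a centered R² when the model contains a constant and an uncentered R² when it does not, so leaving the constant out changes every VIF. The sketch below reproduces both cases with plain NumPy least squares on the same data. The helper `vif_lstsq` and its `intercept` flag are hypothetical, written only for illustration; the `intercept=False` branch is meant to approximate what happens when `df.values` is passed without a constant.

```python
import numpy as np

# Same data as the DataFrame above, columns a, b, c, d
data = np.array([
    [1, 2, 4, 4],
    [1, 2, 6, 3],
    [2, 3, 7, 4],
    [3, 2, 8, 5],
    [4, 1, 9, 4],
], dtype=float)


def vif_lstsq(X, j, intercept=True):
    """Hypothetical helper: VIF of column j of X via least squares.

    intercept=True  -> regress on a constant plus the other columns, centered R^2
    intercept=False -> regress on the other columns only, uncentered R^2
    """
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    if intercept:
        others = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(others, y, rcond=None)
    ssr = np.sum((y - others @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2) if intercept else np.sum(y ** 2)
    return tss / ssr  # 1 / (1 - R^2) simplifies to tss / ssr


with_const = [vif_lstsq(data, j) for j in range(4)]
without_const = [vif_lstsq(data, j, intercept=False) for j in range(4)]
print(np.round(with_const, 3))     # VIFs of a, b, c, d with a constant
print(np.round(without_const, 3))  # VIFs based on uncentered R^2, no constant
```

The first line should agree with the `a` to `d` values in the output above; the second shows the numbers you would be looking at if the constant were forgotten.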

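I believe you can also use `assign` to add the constant as the rightmost column of the DataFrame; the VIFs are unchanged, only their order differs:

```python
X = df.assign(const=1)

>>> pd.Series([variance_inflation_factor(X.values, i)
...            for i in range(X.shape[1])],
...           index=X.columns)
a         22.950
b          3.000
c         12.950
d          3.000
const    136.875
dtype: float64
```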
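Since each VIF comes from regressing one column on all the others, the position of the constant in the matrix should not affect the values, only their order. A small NumPy check of that reasoning follows. The helper `vif_as_given` is hypothetical; like `variance_inflation_factor` it adds no constant itself, and it assumes (mirroring statsmodels' documented `rsquared` behaviour) a centered R² when a constant column is among the regressors and an uncentered one otherwise.

```python
import numpy as np

# Columns a, b, c, d from the example DataFrame
data = np.array([
    [1, 2, 4, 4],
    [1, 2, 6, 3],
    [2, 3, 7, 4],
    [3, 2, 8, 5],
    [4, 1, 9, 4],
], dtype=float)
ones = np.ones((len(data), 1))


def vif_as_given(X, j):
    """Hypothetical helper: VIF of column j regressed on the other columns of X.

    No constant is added here, so X must already contain one.
    """
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    beta, *_ = np.linalg.lstsq(others, y, rcond=None)
    ssr = np.sum((y - others @ beta) ** 2)
    # Centered R^2 only if some regressor is a constant column
    has_const = np.any(np.ptp(others, axis=0) == 0)
    tss = np.sum((y - y.mean()) ** 2) if has_const else np.sum(y ** 2)
    return tss / ssr


X_first = np.hstack([ones, data])  # like add_constant: const, a, b, c, d
X_last = np.hstack([data, ones])   # like df.assign(const=1): a, b, c, d, const

vif_first = np.array([vif_as_given(X_first, j) for j in range(5)])
vif_last = np.array([vif_as_given(X_last, j) for j in range(5)])

# np.roll moves the constant's VIF from the end to the front before comparing
print(np.allclose(vif_first, np.roll(vif_last, 1)))  # True
```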

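The source code itself is quite concise (imports added here so the snippet runs on its own):

```python
import numpy as np
from statsmodels.regression.linear_model import OLS


def variance_inflation_factor(exog, exog_idx):
    """
    exog : ndarray, (nobs, k_vars)
        design matrix with all explanatory variables, as for example used in
        regression
    exog_idx : int
        index of the exogenous variable in the columns of exog
    """
    k_vars = exog.shape[1]
    x_i = exog[:, exog_idx]
    mask = np.arange(k_vars) != exog_idx
    x_noti = exog[:, mask]
    # Regress column exog_idx on all other columns; no constant is added here,
    # which is why exog itself must contain one
    r_squared_i = OLS(x_i, x_noti).fit().rsquared
    vif = 1. / (1. - r_squared_i)
    return vif
```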
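In formula form, the function regresses column i on the remaining columns of `exog` and converts the resulting R² into the VIF. As a worked check against the output above, column b has a VIF of 3.000, which corresponds to an R² of 2/3 when b is regressed on the constant, a, c and d:

```latex
\mathrm{VIF}_i = \frac{1}{1 - R_i^2},
\qquad
\mathrm{VIF}_b = 3 \;\Longleftrightarrow\; R_b^2 = 1 - \frac{1}{3} = \frac{2}{3}.
```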

Modifying the code to return all VIFs as a Series is also fairly simple:

```python
import pandas as pd
from statsmodels.regression.linear_model import OLS
from statsmodels.tools.tools import add_constant


def variance_inflation_factors(exog_df):
    '''
    Parameters
    ----------
    exog_df : dataframe, (nobs, k_vars)
        design matrix with all explanatory variables, as for example used in
        regression.

    Returns
    -------
    vif : Series
        variance inflation factors
    '''
    # By default add_constant skips adding a column if a constant already exists
    exog_df = add_constant(exog_df)
    # Regress each column on all the others and convert R^2 into a VIF
    vifs = pd.Series(
        [1 / (1. - OLS(exog_df[col].values,
                       exog_df.loc[:, exog_df.columns != col].values).fit().rsquared)
         for col in exog_df],
        index=exog_df.columns,
        name='VIF'
    )
    return vifs

>>> variance_inflation_factors(df)
const    136.875
a         22.950
b          3.000
c         12.950
d          3.000
Name: VIF, dtype: float64
```
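As a cross-check on the non-constant entries, a standard identity can be used: when the regression includes an intercept, the VIFs of the explanatory variables equal the diagonal of the inverse of their correlation matrix. The sketch below uses only pandas and NumPy; the function name `vif_from_corr` is hypothetical, and by construction it reports no VIF for `const`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'a': [1, 1, 2, 3, 4],
     'b': [2, 2, 3, 2, 1],
     'c': [4, 6, 7, 8, 9],
     'd': [4, 3, 4, 5, 4]}
)


def vif_from_corr(frame):
    """Hypothetical helper: VIFs as the diagonal of the inverse correlation matrix.

    Assumes the model has an intercept; the constant itself gets no VIF here.
    """
    corr = frame.corr().to_numpy()
    return pd.Series(np.diag(np.linalg.inv(corr)), index=frame.columns, name='VIF')


print(vif_from_corr(df).round(3))
```

This needs a single matrix inversion instead of one regression per column, at the cost of assuming an intercept and omitting the constant's own VIF.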
