Exercise URL:
https://nbviewer.jupyter.org/github/schmit/cme193-ipython-notebooks-lecture/blob/master/Exercises.ipynb
Software to install: IPython, Jupyter
Python libraries to configure: pandas, seaborn, statsmodels
Install them with pip (pip install pandas seaborn statsmodels).
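The snippets below assume numpy, pandas, statsmodels, seaborn, and matplotlib are imported, and that Anscombe's quartet is loaded into a DataFrame named anascombe with columns dataset, x, y. A minimal setup sketch (one option is sns.load_dataset("anscombe"); here the well-known published quartet values are entered directly so the snippet needs no download):

```python
import numpy as np
import pandas as pd

# Anscombe's quartet, entered directly from the published values
# (seaborn users can instead call sns.load_dataset("anscombe"))
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
data = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}
anascombe = pd.concat(
    pd.DataFrame({"dataset": name, "x": xs, "y": ys}) for name, (xs, ys) in data.items()
).reset_index(drop=True)
print(anascombe.shape)  # (44, 3)
```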
For each of the four datasets…
1.1 code:
# 1.1
variable = 'xy'
print("mean:")
for var in variable:
    print(anascombe.groupby("dataset")[var].mean())
print("var:")
for var in variable:
    print(anascombe.groupby("dataset")[var].var())
1.1 result:
mean:
dataset
I 9.0
II 9.0
III 9.0
IV 9.0
Name: x, dtype: float64
dataset
I 7.500909
II 7.500909
III 7.500000
IV 7.500909
Name: y, dtype: float64
var:
dataset
I 11.0
II 11.0
III 11.0
IV 11.0
Name: x, dtype: float64
dataset
I 4.127269
II 4.127629
III 4.122620
IV 4.123249
Name: y, dtype: float64
[Finished in 11.2s]
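The two loops above can be collapsed into a single groupby/agg call, which returns both statistics for both columns in one table. A sketch on a toy frame with the same layout as anascombe (the toy values are illustrative, not the quartet):

```python
import pandas as pd

# toy frame shaped like the exercise's `anascombe` DataFrame
df = pd.DataFrame({
    "dataset": ["I", "I", "II", "II"],
    "x": [10.0, 8.0, 10.0, 8.0],
    "y": [8.04, 6.95, 9.14, 8.14],
})

# one call instead of two loops: mean and variance of x and y per dataset
stats = df.groupby("dataset")[["x", "y"]].agg(["mean", "var"])
print(stats)
```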
1.2 code:
# 1.2
corrs = anascombe.groupby("dataset").corr()
print("\n1.2\ncorrelation coefficient between x and y")
for key in anascombe.groupby('dataset').indices:
    print("%-6s" % key, corrs['x'][key]['y'])
1.2 result:
1.2
correlation coefficient between x and y
I 0.816420516345
II 0.816236506
III 0.81628673949
IV 0.816521436889
[Finished in 9.5s]
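Instead of building the full per-group correlation matrix and indexing into it, the x–y correlation can also be computed per group directly with Series.corr. A sketch on a toy frame shaped like anascombe:

```python
import pandas as pd

# toy frame: dataset I is perfectly positively correlated, II negatively
df = pd.DataFrame({
    "dataset": ["I"] * 3 + ["II"] * 3,
    "x": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "y": [1.0, 2.0, 3.0, 3.0, 2.0, 1.0],
})

# Pearson correlation of x and y within each dataset
r = df.groupby("dataset")[["x", "y"]].apply(lambda g: g["x"].corr(g["y"]))
print(r)
```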
1.3 code:
print('\n1.3\nlinear regression:')
for key in anascombe.groupby('dataset').indices:
    group = anascombe.groupby('dataset').get_group(key)
    n = len(group)
    # np.random.rand(n) > -np.inf is always True, so every row ends up in the
    # training set; the mask is kept only as a placeholder for a real split
    is_train = np.random.rand(n) > -np.inf
    train = group[is_train].reset_index(drop=True)
    lin_model = smf.ols('y ~ x', train).fit()
    print(key)
    print(lin_model.summary(), end='\n')
1.3 result:
1.3
linear regression:
I
D:\Anaconda\lib\site-packages\scipy\stats\stats.py:1390: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=11
"anyway, n=%i" % int(n))
OLS Regression Results
(all four fits: Model: OLS, Method: Least Squares, Covariance Type: nonrobust,
Date: Thu, 26 Jul 2018, Time: 15:27:51)

I
  R-squared: 0.667          Adj. R-squared: 0.629
  F-statistic: 17.99        Prob (F-statistic): 0.00217
  Log-Likelihood: -16.841   AIC: 37.68   BIC: 38.48
  No. Observations: 11      Df Residuals: 9   Df Model: 1
  Intercept: 3.0001  (std err 1.125, t 2.667, P>|t| 0.026, 95% CI [0.456, 5.544])
  x:         0.5001  (std err 0.118, t 4.241, P>|t| 0.002, 95% CI [0.233, 0.767])
  Omnibus: 0.082   Prob(Omnibus): 0.960   Durbin-Watson: 3.212
  Jarque-Bera (JB): 0.289   Prob(JB): 0.865   Skew: -0.122   Kurtosis: 2.244   Cond. No.: 29.1

II
  R-squared: 0.666          Adj. R-squared: 0.629
  F-statistic: 17.97        Prob (F-statistic): 0.00218
  Log-Likelihood: -16.846   AIC: 37.69   BIC: 38.49
  No. Observations: 11      Df Residuals: 9   Df Model: 1
  Intercept: 3.0009  (std err 1.125, t 2.667, P>|t| 0.026, 95% CI [0.455, 5.547])
  x:         0.5000  (std err 0.118, t 4.239, P>|t| 0.002, 95% CI [0.233, 0.767])
  Omnibus: 1.594   Prob(Omnibus): 0.451   Durbin-Watson: 2.188
  Jarque-Bera (JB): 1.108   Prob(JB): 0.575   Skew: -0.567   Kurtosis: 1.936   Cond. No.: 29.1

III
  R-squared: 0.666          Adj. R-squared: 0.629
  F-statistic: 17.97        Prob (F-statistic): 0.00218
  Log-Likelihood: -16.838   AIC: 37.68   BIC: 38.47
  No. Observations: 11      Df Residuals: 9   Df Model: 1
  Intercept: 3.0025  (std err 1.124, t 2.670, P>|t| 0.026, 95% CI [0.459, 5.546])
  x:         0.4997  (std err 0.118, t 4.239, P>|t| 0.002, 95% CI [0.233, 0.766])
  Omnibus: 19.540   Prob(Omnibus): 0.000   Durbin-Watson: 2.144
  Jarque-Bera (JB): 13.478   Prob(JB): 0.00118   Skew: 2.041   Kurtosis: 6.571   Cond. No.: 29.1

IV
  R-squared: 0.667          Adj. R-squared: 0.630
  F-statistic: 18.00        Prob (F-statistic): 0.00216
  Log-Likelihood: -16.833   AIC: 37.67   BIC: 38.46
  No. Observations: 11      Df Residuals: 9   Df Model: 1
  Intercept: 3.0017  (std err 1.124, t 2.671, P>|t| 0.026, 95% CI [0.459, 5.544])
  x:         0.4999  (std err 0.118, t 4.243, P>|t| 0.002, 95% CI [0.233, 0.766])
  Omnibus: 0.555   Prob(Omnibus): 0.758   Durbin-Watson: 1.662
  Jarque-Bera (JB): 0.524   Prob(JB): 0.769   Skew: 0.010   Kurtosis: 1.931   Cond. No.: 29.1

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[Finished in 9.3s]
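The coefficients statsmodels reports can be cross-checked against the closed-form least-squares solution, slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). A numpy-only sketch on dataset I's published values:

```python
import numpy as np

# dataset I of Anscombe's quartet (published values)
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

# ordinary least squares in closed form
slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()
print(round(slope, 4), round(intercept, 4))  # matches the 0.5001 and 3.0001 above
```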
Using Seaborn, visualize all four datasets.
hint: use sns.FacetGrid combined with plt.scatter
This part requires using the Seaborn library to display the data. Seaborn is a higher-level wrapper around matplotlib.
code:
# fit one pooled OLS model on a random ~50% subsample drawn across all four datasets
ac = np.random.rand(len(anascombe)) < 0.5
print(smf.ols('y ~ x', anascombe[ac].reset_index(drop=True)).fit().summary())
# scatter-plot each dataset in its own facet
sns.FacetGrid(anascombe, col="dataset").map(plt.scatter, 'x', 'y')
plt.show()
result:
OLS Regression Results
(Model: OLS, Method: Least Squares, Covariance Type: nonrobust,
Date: Thu, 26 Jul 2018, Time: 15:43:01)
  R-squared: 0.511          Adj. R-squared: 0.487
  F-statistic: 20.94        Prob (F-statistic): 0.000183
  Log-Likelihood: -29.420   AIC: 62.84   BIC: 65.02
  No. Observations: 22      Df Residuals: 20   Df Model: 1
  Intercept: 3.9729  (std err 0.814, t 4.884, P>|t| 0.000, 95% CI [2.276, 5.670])
  x:         0.4022  (std err 0.088, t 4.576, P>|t| 0.000, 95% CI [0.219, 0.585])
  Omnibus: 0.143   Prob(Omnibus): 0.931   Durbin-Watson: 2.152
  Jarque-Bera (JB): 0.170   Prob(JB): 0.918   Skew: 0.148   Kurtosis: 2.687   Cond. No.: 36.9

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
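sns.lmplot is a one-line variant of the FacetGrid + plt.scatter combination: it draws the scatter and the fitted regression line per facet, which makes the quartet's point vivid (nearly identical fitted lines, very different point clouds). A sketch on a small toy frame shaped like anascombe; with the real data, pass that frame instead:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import pandas as pd
import seaborn as sns

# toy frame with the same dataset/x/y layout as `anascombe`
df = pd.DataFrame({
    "dataset": ["I"] * 4 + ["II"] * 4,
    "x": [1.0, 2.0, 3.0, 4.0] * 2,
    "y": [1.1, 1.9, 3.2, 3.8, 4.0, 3.1, 2.1, 0.9],
})

# one facet per dataset: scatter plus fitted regression line
g = sns.lmplot(data=df, x="x", y="y", col="dataset", height=3)
print(len(g.axes.flat))  # one axes object per dataset
```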