Anscombe's quartet: study notes

Exercise URL:

https://nbviewer.jupyter.org/github/schmit/cme193-ipython-notebooks-lecture/blob/master/Exercises.ipynb

Setup

Software to install: IPython, Jupyter

Python libraries to install: pandas, seaborn, statsmodels (the code below also uses numpy and matplotlib)

Install everything with pip.

Part 1

For each of the four datasets…

  • Compute the mean and variance of both x and y
  • Compute the correlation coefficient between x and y
  • Compute the linear regression line: y=β0+β1x+ϵ (hint: use statsmodels and look at the Statsmodels notebook)
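The exercise notebook loads the quartet from a CSV file. As a self-contained alternative, the four datasets can be built inline as a tidy DataFrame with the same columns (`dataset`, `x`, `y`) and the same variable name (`anascombe`) used in the code below; the numbers are the standard Anscombe (1973) values:

```python
import pandas as pd

# The standard Anscombe quartet; datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8] * 7 + [19] + [8] * 3
ys = {
    "I":   [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II":  [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV":  [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}

# One 11-row frame per dataset, stacked into a single tidy frame.
anascombe = pd.concat(
    [pd.DataFrame({"dataset": k, "x": x4 if k == "IV" else x123, "y": v})
     for k, v in ys.items()],
    ignore_index=True,
)
print(anascombe.shape)  # (44, 3)
```

(`sns.load_dataset("anscombe")` yields the same frame, but it downloads the CSV from the seaborn data repository, so the inline version is more reliable offline.)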

1.1 code:

# 1.1 -- per-dataset mean and variance of x and y
# (assumes anascombe is a DataFrame with columns dataset, x, y,
#  loaded beforehand, e.g. via pd.read_csv on the exercise's CSV)
print("mean:")
for var in ("x", "y"):
    print(anascombe.groupby("dataset")[var].mean())
print("var:")
for var in ("x", "y"):
    print(anascombe.groupby("dataset")[var].var())

1.1 result:

mean:
dataset
I      9.0
II     9.0
III    9.0
IV     9.0
Name: x, dtype: float64
dataset
I      7.500909
II     7.500909
III    7.500000
IV     7.500909
Name: y, dtype: float64
var:
dataset
I      11.0
II     11.0
III    11.0
IV     11.0
Name: x, dtype: float64
dataset
I      4.127269
II     4.127629
III    4.122620
IV     4.123249
Name: y, dtype: float64
[Finished in 11.2s]

1.2 code:

# 1.2 -- Pearson correlation between x and y, per dataset
corrs = anascombe.groupby("dataset").corr()
print("\n1.2\ncorrelation coefficient between x and y")
for key in anascombe["dataset"].unique():
    print("%-6s" % key, corrs.loc[(key, "x"), "y"])

1.2 result:

1.2
correlation coefficient between x and y
I      0.816420516345
II     0.816236506
III    0.81628673949
IV     0.816521436889
[Finished in 9.5s]
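The full per-group correlation matrix is not actually needed: `Series.corr` gives the pairwise Pearson r for each group directly (quartet again rebuilt inline so the snippet is self-contained):

```python
import pandas as pd

# Quartet rebuilt inline (standard Anscombe values).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8] * 7 + [19] + [8] * 3
ys = {
    "I":   [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II":  [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV":  [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}
anascombe = pd.concat(
    [pd.DataFrame({"dataset": k, "x": x4 if k == "IV" else x123, "y": v})
     for k, v in ys.items()],
    ignore_index=True,
)

# One scalar r per dataset; all four come out near 0.816.
r = anascombe.groupby("dataset")[["x", "y"]].apply(lambda g: g["x"].corr(g["y"]))
print(r.round(3))
```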

1.3 code

# 1.3 -- linear regression y ~ x, fit per dataset with statsmodels
import statsmodels.formula.api as smf

print('\n1.3\nlinear regression:')
for key, group in anascombe.groupby('dataset'):
    lin_model = smf.ols('y ~ x', group).fit()
    print(key)
    print(lin_model.summary())

1.3 result:

1.3
linear regression:
I
D:\Anaconda\lib\site-packages\scipy\stats\stats.py:1390: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=11
 "anyway, n=%i" % int(n))
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.667
Model:                            OLS   Adj. R-squared:                  0.629
Method:                 Least Squares   F-statistic:                     17.99
Date:                Thu, 26 Jul 2018   Prob (F-statistic):            0.00217
Time:                        15:27:51   Log-Likelihood:                -16.841
No. Observations:                  11   AIC:                             37.68
Df Residuals:                       9   BIC:                             38.48
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      3.0001      1.125      2.667      0.026       0.456       5.544
x              0.5001      0.118      4.241      0.002       0.233       0.767
==============================================================================
Omnibus:                        0.082   Durbin-Watson:                   3.212
Prob(Omnibus):                  0.960   Jarque-Bera (JB):                0.289
Skew:                          -0.122   Prob(JB):                        0.865
Kurtosis:                       2.244   Cond. No.                         29.1
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

II
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.666
Model:                            OLS   Adj. R-squared:                  0.629
Method:                 Least Squares   F-statistic:                     17.97
Date:                Thu, 26 Jul 2018   Prob (F-statistic):            0.00218
Time:                        15:27:51   Log-Likelihood:                -16.846
No. Observations:                  11   AIC:                             37.69
Df Residuals:                       9   BIC:                             38.49
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      3.0009      1.125      2.667      0.026       0.455       5.547
x              0.5000      0.118      4.239      0.002       0.233       0.767
==============================================================================
Omnibus:                        1.594   Durbin-Watson:                   2.188
Prob(Omnibus):                  0.451   Jarque-Bera (JB):                1.108
Skew:                          -0.567   Prob(JB):                        0.575
Kurtosis:                       1.936   Cond. No.                         29.1
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
III
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.666
Model:                            OLS   Adj. R-squared:                  0.629
Method:                 Least Squares   F-statistic:                     17.97
Date:                Thu, 26 Jul 2018   Prob (F-statistic):            0.00218
Time:                        15:27:51   Log-Likelihood:                -16.838
No. Observations:                  11   AIC:                             37.68
Df Residuals:                       9   BIC:                             38.47
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      3.0025      1.124      2.670      0.026       0.459       5.546
x              0.4997      0.118      4.239      0.002       0.233       0.766
==============================================================================
Omnibus:                       19.540   Durbin-Watson:                   2.144
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               13.478
Skew:                           2.041   Prob(JB):                      0.00118
Kurtosis:                       6.571   Cond. No.                         29.1
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

IV
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.667
Model:                            OLS   Adj. R-squared:                  0.630
Method:                 Least Squares   F-statistic:                     18.00
Date:                Thu, 26 Jul 2018   Prob (F-statistic):            0.00216
Time:                        15:27:51   Log-Likelihood:                -16.833
No. Observations:                  11   AIC:                             37.67
Df Residuals:                       9   BIC:                             38.46
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      3.0017      1.124      2.671      0.026       0.459       5.544
x              0.4999      0.118      4.243      0.002       0.233       0.766
==============================================================================
Omnibus:                        0.555   Durbin-Watson:                   1.662
Prob(Omnibus):                  0.758   Jarque-Bera (JB):                0.524
Skew:                           0.010   Prob(JB):                        0.769
Kurtosis:                       1.931   Cond. No.                         29.1
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[Finished in 9.3s]
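The fitted coefficients can be cross-checked against the closed-form least-squares solution, β1 = cov(x, y) / var(x) and β0 = mean(y) − β1·mean(x); all four fits land on essentially the same line, y ≈ 3.00 + 0.50·x, which is the point of the quartet. A self-contained check (quartet rebuilt inline with the standard values):

```python
import pandas as pd

# Quartet rebuilt inline (standard Anscombe values).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8] * 7 + [19] + [8] * 3
ys = {
    "I":   [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II":  [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV":  [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}
anascombe = pd.concat(
    [pd.DataFrame({"dataset": k, "x": x4 if k == "IV" else x123, "y": v})
     for k, v in ys.items()],
    ignore_index=True,
)

# Closed-form OLS per dataset: beta1 = cov/var, beta0 = mean residual.
for name, g in anascombe.groupby("dataset"):
    beta1 = g["x"].cov(g["y"]) / g["x"].var()
    beta0 = g["y"].mean() - beta1 * g["x"].mean()
    print(f"{name}: y = {beta0:.2f} + {beta1:.3f} x")
```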

Part 2

Using Seaborn, visualize all four datasets.

hint: use sns.FacetGrid combined with plt.scatter

The exercise asks for the data to be visualized with Seaborn, a higher-level plotting library built on top of matplotlib.

code:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# OLS on a random half of the pooled data (the mask changes every run)
ac = np.random.rand(len(anascombe)) < 0.5
print(smf.ols('y ~ x', anascombe[ac].reset_index(drop=True)).fit().summary())

# one scatter panel per dataset
sns.FacetGrid(anascombe, col="dataset").map(plt.scatter, 'x', 'y')
plt.show()

result:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.511
Model:                            OLS   Adj. R-squared:                  0.487
Method:                 Least Squares   F-statistic:                     20.94
Date:                Thu, 26 Jul 2018   Prob (F-statistic):           0.000183
Time:                        15:43:01   Log-Likelihood:                -29.420
No. Observations:                  22   AIC:                             62.84
Df Residuals:                      20   BIC:                             65.02
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      3.9729      0.814      4.884      0.000       2.276       5.670
x              0.4022      0.088      4.576      0.000       0.219       0.585
==============================================================================
Omnibus:                        0.143   Durbin-Watson:                   2.152
Prob(Omnibus):                  0.931   Jarque-Bera (JB):                0.170
Skew:                           0.148   Prob(JB):                        0.918
Kurtosis:                       2.687   Cond. No.                         36.9
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Figure: FacetGrid scatter plots, one panel per dataset.
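As a variant, `sns.lmplot` facets the data like `FacetGrid` but also overlays each panel's fitted regression line, which makes the "same line, very different data" point visible at a glance. A sketch (headless `Agg` backend and the output filename are assumptions, and the quartet is rebuilt inline so the snippet runs on its own):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: running without a display, save to file
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Quartet rebuilt inline (standard Anscombe values).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8] * 7 + [19] + [8] * 3
ys = {
    "I":   [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II":  [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV":  [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}
anascombe = pd.concat(
    [pd.DataFrame({"dataset": k, "x": x4 if k == "IV" else x123, "y": v})
     for k, v in ys.items()],
    ignore_index=True,
)

# Scatter plus per-panel OLS line, 2x2 grid, no confidence band.
grid = sns.lmplot(data=anascombe, x="x", y="y", col="dataset", col_wrap=2, ci=None)
grid.savefig("anscombe_lmplot.png")
```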
