Python Study Notes: Pandas and Seaborn

Exercise source: https://nbviewer.jupyter.org/github/schmit/cme193-ipython-notebooks-lecture/blob/master/Exercises.ipynb#Anscombe's-quartet

Part 1

For each of the four datasets...

  • Compute the mean and variance of both x and y
  • Compute the correlation coefficient between x and y
  • Compute the linear regression line: y = β0 + β1x + ϵ (hint: use statsmodels and look at the Statsmodels notebook)

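In other words: for each of the four datasets, compute the mean and variance of x and y, the correlation coefficient between them, and the linear regression equation (the two β values).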
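As a cross-check on what Part 1 asks for, here is a minimal sketch that computes the same quantities by hand with NumPy for a single dataset. It assumes the commonly published values of dataset I from Anscombe's quartet (they are typed in here and are not read from anscombe.csv), and it uses the closed-form least-squares solution β1 = cov(x, y) / var(x), β0 = mean(y) − β1·mean(x) instead of statsmodels.

```python
import numpy as np

# Dataset I of Anscombe's quartet as commonly published
# (assumption: anscombe.csv holds the same values for dataset 'I').
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
              7.24, 4.26, 10.84, 4.82, 5.68])

# Mean and sample variance (ddof=1 matches the pandas default for var()).
mean_x, mean_y = x.mean(), y.mean()
var_x, var_y = x.var(ddof=1), y.var(ddof=1)

# Pearson correlation coefficient: off-diagonal entry of the 2x2 matrix.
r = np.corrcoef(x, y)[0, 1]

# Closed-form simple linear regression y = beta0 + beta1 * x.
cov_xy = np.cov(x, y, ddof=1)[0, 1]
beta1 = cov_xy / var_x
beta0 = mean_y - beta1 * mean_x

print('mean x, mean y :', mean_x, mean_y)
print('var x, var y   :', var_x, var_y)
print('correlation    :', r)
print('beta0, beta1   :', beta0, beta1)
```

The slope and intercept obtained this way should agree with the OLS fit from statsmodels up to floating-point error, which makes it a convenient sanity check.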

Part 2

Using Seaborn, visualize all four datasets.

hint: use sns.FacetGrid combined with plt.scatter

In other words: visualize the four datasets.
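The exercise hint points to sns.FacetGrid with plt.scatter. As a possible variation, and not what the exercise prescribes, sns.lmplot draws the scatter points together with a fitted regression line in each panel, which ties the plot back to the line from Part 1. The sketch below uses a tiny made-up DataFrame (the names demo and the groups 'A' and 'B' are hypothetical) so that it runs without anscombe.csv; with the real data, pass anscombe instead.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical toy data with the same column layout as anscombe.csv.
demo = pd.DataFrame({
    "dataset": ["A"] * 4 + ["B"] * 4,
    "x": [1, 2, 3, 4, 1, 2, 3, 4],
    "y": [1.1, 1.9, 3.2, 3.8, 4.0, 3.1, 1.9, 1.2],
})

# One panel per dataset; ci=None turns off the confidence band around the line.
grid = sns.lmplot(data=demo, x="x", y="y", col="dataset", col_wrap=2, ci=None)
grid.savefig("lmplot_demo.png")
```

lmplot returns a FacetGrid, so the result can be customized in the same way as the grid built by hand with sns.FacetGrid.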


Code implementation:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm

sns.set_context("talk")
anscombe = pd.read_csv('anscombe.csv')

# Part 1: mean and variance of x and y within each dataset
grouped = anscombe.groupby('dataset')
mx = grouped['x'].mean()
my = grouped['y'].mean()
vx = grouped['x'].var()  # sample variance (pandas default, ddof=1)
vy = grouped['y'].var()
print('x mean : \n', mx, '\n')
print('y mean : \n', my, '\n')
print('x var : \n', vx, '\n')
print('y var : \n', vy, '\n')

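# Correlation matrix of x and y within each dataset;
# the off-diagonal entry is the correlation coefficient
cor = anscombe.groupby('dataset').corr()
print('correlation : \n', cor, '\n')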

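# Fit y = β0 + β1*x by ordinary least squares for each dataset
for name in ['I', 'II', 'III', 'IV']:
    a = anscombe[anscombe.dataset == name]
    s_x = sm.add_constant(np.array(a.x))  # prepend an intercept column of ones
    s_y = np.array(a.y)
    beta_pair = sm.OLS(s_y, s_x).fit()
    # params is ordered [intercept, slope], i.e. [β0, β1]
    print(name, ': β0, β1 = ', beta_pair.params)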


# Part 2: one scatter plot per dataset, two panels per row
temp = sns.FacetGrid(data=anscombe, col='dataset', col_wrap=2)
temp.map(plt.scatter, 'x', 'y')
plt.show()


2018/6/11
