First, some plotting practice with Python's matplotlib and seaborn modules:
%matplotlib inline
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

# Fixed seed so any random data is reproducible
np.random.seed(sum(map(ord, "aesthetics")))

def sinplot(flip=1):
    # Draw 7 phase-shifted sine waves with decreasing amplitude
    x = np.linspace(0, 16, 100)
    for i in range(1, 8):
        plt.plot(x, np.sin(x + i * .5) * (8 - i) * flip)

sinplot()
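The %matplotlib inline line only works in Jupyter/IPython; in a plain .py script it is a syntax error. A minimal script version, assuming the non-interactive Agg backend and a hypothetical output file name sinplot.png:

```python
import matplotlib
matplotlib.use("Agg")  # render to files; no GUI window needed
import matplotlib.pyplot as plt
import numpy as np


def sinplot(flip=1):
    # Same drawing code as above: 7 phase-shifted sine waves
    x = np.linspace(0, 16, 100)
    for i in range(1, 8):
        plt.plot(x, np.sin(x + i * .5) * (8 - i) * flip)


fig, ax = plt.subplots()
sinplot()
fig.savefig("sinplot.png")  # hypothetical output path
```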
The result:
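Now switch to seaborn's styling:
import seaborn as sns
# In seaborn 0.8 and later, importing no longer applies the seaborn style, so set it explicitly
sns.set()
sinplot()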
It instantly looks much more polished!
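sns.set() changes matplotlib's defaults for every figure drawn afterwards. To compare the two looks in one session, seaborn's axes_style can also be used as a context manager, which applies a style only to figures created inside the with block. A sketch, assuming seaborn 0.8 or later:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

x = np.linspace(0, 16, 100)

# Axes created inside the context pick up the "darkgrid" style
with sns.axes_style("darkgrid"):
    fig_dark, ax_dark = plt.subplots()
    ax_dark.plot(x, np.sin(x))

# Axes created after the context use the unchanged matplotlib defaults
fig_plain, ax_plain = plt.subplots()
ax_plain.plot(x, np.sin(x))
```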
Next, following a kernel by one of the top Kagglers, let's do some data analysis and processing.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style="white", color_codes=True)

# Load the competition data (expects train.csv and test.csv under input/)
train = pd.read_csv("input/train.csv")
test = pd.read_csv("input/test.csv")
train.tail(5)
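Before working with TARGET, a quick sanity check is to compare the columns of the two files: in a Kaggle competition the test file normally lacks the label column. A minimal sketch, using small hypothetical frames in place of the real CSV files:

```python
import pandas as pd

# Hypothetical toy frames standing in for input/train.csv and input/test.csv
train = pd.DataFrame({"ID": [1, 2, 3], "var3": [2, 2, 0], "TARGET": [0, 0, 1]})
test = pd.DataFrame({"ID": [4, 5], "var3": [2, 1]})

# Columns present in train but not in test: normally just the label
label_cols = sorted(set(train.columns) - set(test.columns))
print(label_cols)

# Check for the column before indexing it, instead of hitting a KeyError
has_target = "TARGET" in train.columns
print(has_target)
```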
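Running the next step exactly as written in the kernel raised an error about a missing key, and at first I couldn't work out why: KeyError: u'no item named TARGET'
The likely cause is that the Series returned by value_counts() is not always named TARGET (some older pandas versions leave it unnamed, and pandas 2.0 and later name it "count"), so pd.DataFrame(...) builds a frame without a TARGET column and df['TARGET'] fails. Naming the column explicitly works regardless of version:
# Count each TARGET class and its share of all training rows
df = train.TARGET.value_counts().to_frame('count')
df['Percentage'] = 100 * df['count'] / train.shape[0]
df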
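If only the percentages are needed, value_counts(normalize=True) returns each class's share of the total directly. A minimal sketch on hypothetical toy labels, putting counts and percentages side by side with explicitly named columns:

```python
import pandas as pd

# Hypothetical toy labels standing in for train.TARGET
target = pd.Series([0, 0, 0, 1, 0, 0, 1, 0], name="TARGET")

# normalize=True gives fractions of the total instead of raw counts
pct = target.value_counts(normalize=True) * 100

# Both Series share the same index (the class values), so they align
summary = pd.DataFrame({"count": target.value_counts(), "Percentage": pct})
print(summary)
```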
For each row, count how many feature values are 0 (every column except the last one, TARGET):
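# Features: every column except the last one (TARGET); copy to avoid SettingWithCopyWarning
x = train.iloc[:, :-1].copy()
y = train.TARGET
# n0: number of zero-valued features in each row
x['n0'] = (x == 0).sum(axis=1)
train['n0'] = x['n0']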
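The comparison x == 0 produces a Boolean frame of the same shape, and sum(axis=1) adds up the True values across columns, giving one zero count per row. A small check on a hypothetical toy frame whose last column is the label:

```python
import pandas as pd

# Hypothetical toy frame: three feature columns plus TARGET as the last column
train = pd.DataFrame({
    "var3": [0, 2, 0],
    "var15": [23, 0, 0],
    "num_var4": [1, 0, 0],
    "TARGET": [0, 0, 1],
})

features = train.iloc[:, :-1]              # every column except TARGET
train["n0"] = (features == 0).sum(axis=1)  # zeros per row, TARGET not counted
print(train)
```

Row 0 has TARGET equal to 0, but that value is not counted because the label column is sliced off first.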
Next, look at how many bank products each customer holds (column num_var4) and plot it as a histogram:
train.num_var4.hist(bins=100)
plt.xlabel('Number of bank products')
plt.ylabel('Number of customers in train')
plt.title('Most customers have 1 product with the bank')
plt.show()
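Because num_var4 takes only a handful of small integer values, the exact count per value (or a bar chart of it) can be easier to read than a 100-bin histogram. A sketch on hypothetical toy values, saving to a hypothetical file name:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy values standing in for train.num_var4
num_var4 = pd.Series([1, 1, 1, 0, 2, 1, 3, 1, 2, 0])

# Exact number of customers for each product count, ordered by product count
counts = num_var4.value_counts().sort_index()
print(counts)

ax = counts.plot(kind="bar")
ax.set_xlabel("Number of bank products")
ax.set_ylabel("Number of customers in train")
plt.savefig("num_var4_counts.png")  # hypothetical output path
```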