2019-05-23

Creating a MultiIndex

import numpy as np
import pandas as pd

letters = ['A', 'B', 'C']
numbers = list(range(10))
mi = pd.MultiIndex.from_product([letters, numbers])
s = pd.Series(np.random.rand(30), index=mi)
s

Selecting the values whose second-level index is 1, 3, or 6

s.loc[:,[1,3,6]]

Slicing a MultiIndex Series

s.loc[pd.IndexSlice[:'B',5:]]
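To see which rows the two selections above actually pick, here is a minimal sketch that swaps the random values for the integers 0 to 29 (an assumption made only so the results are easy to read). The names `picked` and `sliced` are illustrative.

```python
import numpy as np
import pandas as pd

letters = ['A', 'B', 'C']
numbers = list(range(10))
mi = pd.MultiIndex.from_product([letters, numbers])

# Deterministic values instead of np.random.rand, so each value reveals its position.
s = pd.Series(np.arange(30), index=mi)

# Second-level labels 1, 3 and 6 under every first-level label.
picked = s.loc[:, [1, 3, 6]]

# First level up to and including 'B', second level from 5 onward.
sliced = s.loc[pd.IndexSlice[:'B', 5:]]

print(picked.tolist())
print(sliced.tolist())
```

Note that label slices are inclusive at both ends, so `:'B'` keeps the 'B' group. Slicing like this relies on the index being sorted, which `from_product` gives here because both input lists are already in order.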

A MultiIndex DataFrame

frame = pd.DataFrame(np.arange(12).reshape(6, 2),
                     index=[list('AAABBB'), list('123123')],
                     columns=['hello', 'shiyanlou'])
frame

Naming the levels of a MultiIndex

frame.index.names=['first','second']
frame

Moving the column labels into the row index (stack)

frame.stack()

Moving the innermost index level into the columns (unstack)

frame.unstack()
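The shapes make the difference between the two calls easier to see. The sketch below rebuilds the same `frame` and checks what `stack` and `unstack` return; `stacked`, `unstacked`, and `restored` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape(6, 2),
                     index=[list('AAABBB'), list('123123')],
                     columns=['hello', 'shiyanlou'])
frame.index.names = ['first', 'second']

# stack: the two column labels become a third, innermost index level.
stacked = frame.stack()

# unstack: the innermost index level ('second') becomes an inner column level.
unstacked = frame.unstack()

# Unstacking the stacked Series brings back the original layout.
restored = stacked.unstack()

print(stacked.shape, stacked.index.nlevels)
print(unstacked.shape)
print(restored)
```

In this case `stack` produces a 12-element Series with three index levels, while `unstack` produces a 2 x 6 DataFrame indexed by `first` only.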

Filtering a DataFrame by value

df[df['name'].isin(['cat','dog'])]

Selecting by row label and column name

df.loc[df.index[[1,3,4]],['animal','age']]
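The notes do not show how `df` is built for these two queries, so the sketch below uses a small made-up table. The data, the row labels, and the choice of `animal` as the column to filter on are all assumptions (the filtering snippet above uses a column called `name`).

```python
import pandas as pd

# Hypothetical sample data; not from the original notes.
df = pd.DataFrame(
    {'animal': ['cat', 'snake', 'dog', 'cat', 'dog'],
     'age': [2.5, 3.0, 0.5, 5.0, 7.0],
     'priority': ['yes', 'no', 'no', 'yes', 'no']},
    index=list('abcde'),
)

# Keep the rows whose 'animal' value is one of the listed values.
by_value = df[df['animal'].isin(['cat', 'dog'])]

# Turn positions 1, 3, 4 into labels first, then select by label and column name.
by_position = df.loc[df.index[[1, 3, 4]], ['animal', 'age']]

print(by_value)
print(by_position)
```

Going through `df.index[[1, 3, 4]]` lets `loc` mix positional row choices with column names, which plain `iloc` cannot do.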

Replacing 'yes' with True and 'no' with False in the priority column

df['priority'] = df['priority'].map({'yes': True, 'no': False})
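One thing worth watching with this mapping is whether the targets are real booleans or the strings 'True' and 'False'. A short sketch on a made-up Series (the sample values are an assumption) shows why it matters:

```python
import pandas as pd

# Hypothetical sample values for the priority column.
priority = pd.Series(['yes', 'no', 'no', 'yes'])

# Map to real booleans so the result can be used directly as a mask.
as_bool = priority.map({'yes': True, 'no': False})

# Mapping to strings looks similar when printed but behaves differently:
# any non-empty string, including 'False', is truthy in Python.
as_text = priority.map({'yes': 'True', 'no': 'False'})

print(as_bool.tolist())
print(as_text.tolist())
print(bool('False'))
```

Values missing from the dictionary would come back as NaN, so the mapping should cover every value the column can hold.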

Subtracting each row's mean from every element of a DataFrame

df=pd.DataFrame(np.random.random(size=(5,3)))
print(df)
df.sub(df.mean(axis=1),axis=0)
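A quick way to check the result is to confirm that every row of the output averages to zero. The sketch below uses a seeded generator for reproducibility, which is an assumption; `centered` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Seeded generator so the run is reproducible (the notes use np.random.random).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random(size=(5, 3)))

# axis=1 computes one mean per row; axis=0 in sub aligns those means on the row index.
centered = df.sub(df.mean(axis=1), axis=0)

# Each row should now average to zero, up to floating-point error.
print(np.allclose(centered.mean(axis=1), 0))
```

The `axis=0` argument is what makes this work: a plain `df - df.mean(axis=1)` would align the row means against the column labels instead.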

Grouping a DataFrame and summing the three largest values in each group

df = pd.DataFrame({'A': list('aaabbcaabcccbbc'),
                   'B': [12, 345, 3, 1, 45, 14, 4, 52, 54, 23, 235, 21, 57, 3, 87]})
print(df)
df.groupby('A')['B'].nlargest(3).groupby(level=0).sum()
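For reference, here is the same computation written with a second `groupby` on the first index level. This form avoids the `level` argument of `sum`, which was removed in pandas 2.0. The `apply` variant is an alternative shown for comparison, not part of the original notes.

```python
import pandas as pd

df = pd.DataFrame({'A': list('aaabbcaabcccbbc'),
                   'B': [12, 345, 3, 1, 45, 14, 4, 52, 54, 23, 235, 21, 57, 3, 87]})

# nlargest(3) keeps the top three values per group, indexed by (group, original row).
top3 = df.groupby('A')['B'].nlargest(3)

# Group again on the first index level (the group key) and add the values up.
totals = top3.groupby(level=0).sum()

# Equivalent alternative: do both steps inside each group.
totals_alt = df.groupby('A')['B'].apply(lambda x: x.nlargest(3).sum())

print(totals)  # a: 409, b: 156, c: 345
```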

Data cleaning

df['Airline'] = df['Airline'].str.extract(r'([a-zA-Z\s]+)', expand=False).str.strip()
df
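The notes do not include the raw `Airline` values, so the sketch below runs the same pattern on a few made-up messy strings (the sample data is an assumption) to show what the extraction keeps.

```python
import pandas as pd

# Hypothetical messy airline names; not from the original notes.
df = pd.DataFrame({'Airline': ['KLM(!)', '<Air France> (12)', '(British Airways. )',
                               '12. Air France', '"Swiss Air"']})

# Capture the first run of letters and whitespace, then trim the edges.
# A raw string keeps \s from being read as a Python string escape.
df['Airline'] = (df['Airline']
                 .str.extract(r'([a-zA-Z\s]+)', expand=False)
                 .str.strip())

print(df['Airline'].tolist())
```

Because `extract` returns only the first match, a name containing digits or punctuation in the middle would be cut off at that point.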

Normalizing the format: expanding the RecentDelays lists into separate columns

delays=df['RecentDelays'].apply(pd.Series)
delays.columns=['delay_{}'.format(n) for n in range(1,len(delays.columns)+1)]
df=df.drop('RecentDelays',axis=1).join(delays)
df
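A small sketch of the same reshaping on made-up data, assuming `RecentDelays` holds one list of numbers per row. It builds the wide table with `pd.DataFrame(list_of_lists)` rather than `apply(pd.Series)`. That is a swapped-in technique, generally faster on large frames, and it gives the same layout here.

```python
import pandas as pd

# Hypothetical sample data; not from the original notes.
df = pd.DataFrame({'Airline': ['KLM', 'Air France', 'Swiss Air'],
                   'RecentDelays': [[23, 47], [24, 43, 87], [13]]})

# One column per list position; shorter lists are padded with NaN.
delays = pd.DataFrame(df['RecentDelays'].tolist(), index=df.index)
delays.columns = ['delay_{}'.format(n) for n in range(1, delays.shape[1] + 1)]

# Replace the list column with the expanded columns.
wide = df.drop('RecentDelays', axis=1).join(delays)

print(wide)
```

The padding means the delay columns come out as floats whenever the lists have different lengths.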

Plotting a combined bar and line chart from a DataFrame

df = pd.DataFrame({"revenue": [57, 68, 63, 71, 72, 90, 80, 62, 59, 51, 47, 52],
                   "advertising": [2.1, 1.9, 2.7, 3.0, 3.6, 3.2, 2.7, 2.4, 1.8, 1.6, 1.3, 1.9],
                   "month": range(12)
                   })

ax = df.plot.bar("month", "revenue", color="yellow")
df.plot("month", "advertising", secondary_y=True, ax=ax)
