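groupby is easiest to use when the grouping keys are ordinary columns. It can also group on index levels through its level= argument, but a simple alternative is to call reset_index() to turn the index into columns and then run the groupby calculation: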
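import pandas as pd
# First run the MMR table through pivot_table:
pivot = pd.pivot_table(mmr, index=['Sales', 'strat.business field'],
                       columns='Period/year', values='Net sales in CHF',
                       aggfunc='sum')  # the string 'sum' avoids the FutureWarning that np.sum raises in recent pandas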
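Remove the original index with df.reset_index(inplace = True). The index values are not lost: they move back into ordinary columns and a default RangeIndex takes their place.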
pivot.reset_index(inplace = True)
pivot.columns
Out[22]: Index(['Sales', 'strat.business field', 2020.001, 2020.0020000000002], dtype='object', name='Period/year')
pivot.index
Out[23]: RangeIndex(start=0, stop=45, step=1)
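The following is a rough sketch, on made-up data, of the two ways to group a frame whose keys live in the index: flatten with reset_index() first, or pass level= to groupby. The frame `toy`, its string column labels, and its numbers are assumptions for illustration only and do not come from the MMR table.

```python
import pandas as pd

# Hypothetical toy data standing in for a pivoted table with a two-level index.
toy = pd.DataFrame(
    {"2020.001": [10, 20, 30], "2020.002": [1, 2, 3]},
    index=pd.MultiIndex.from_tuples(
        [("A", "x"), ("A", "y"), ("B", "x")],
        names=["Sales", "strat.business field"],
    ),
)

# Option 1: move the index levels into columns, then group by a column.
flat = toy.reset_index()
by_column = flat.groupby("Sales")[["2020.001", "2020.002"]].sum()

# Option 2: group on the index level directly, without reset_index().
by_level = toy.groupby(level="Sales").sum()

print(by_column)
print(by_level)
```

Both options should give the same totals per 'Sales' value; which one reads better depends on whether the flattened frame is needed afterwards.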
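Set one column, or several columns, as the new index. The two calls below are alternatives: run one or the other, because after the first call 'Sales' is no longer a column and the second call would raise a KeyError.
# one column as the index:
pivot.set_index('Sales', inplace=True)
# or several columns as a MultiIndex:
pivot.set_index(['Sales', 'strat.business field'], inplace=True)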
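As a minimal sketch of the two set_index variants, the example below uses a small made-up frame (the name `flat`, the column `net`, and its values are illustrative assumptions, not the MMR data). Dropping inplace=True returns a new frame each time, so both variants can be tried on the same input.

```python
import pandas as pd

# Hypothetical stand-in for the flattened pivot table.
flat = pd.DataFrame({
    "Sales": ["A", "A", "B"],
    "strat.business field": ["x", "y", "x"],
    "net": [10, 20, 30],
})

# Without inplace=True, set_index returns a new frame and leaves `flat` unchanged.
single = flat.set_index("Sales")
multi = flat.set_index(["Sales", "strat.business field"])

print(single.index.name)
print(list(multi.index.names))

# With a MultiIndex, a row is addressed by a tuple of index values.
print(multi.loc[("A", "y"), "net"])
```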
name1 = 'jack jack bob bob tom'.split()
date1 = 'day1 day2 day1 day2 day1'.split()
df = pd.DataFrame({
'name':name1,'date':date1,'values':[2,4,3,2,5]})
Use groupby to total the values for each person:
df.groupby('name')['values'].sum()
Out[30]:
name
bob 5
jack 6
tom 5
Name: values, dtype: int64
Compute the same totals with pivot_table:
pd.pivot_table(df, index='name', values='values', aggfunc='sum')
Out[32]:
values
name
bob 5
jack 6
tom 5
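To make the comparison concrete, here is a small sketch that rebuilds the same df and checks that the two approaches agree. The only real difference is the return type: groupby gives a Series here, while pivot_table gives a one-column DataFrame. The variable names `grouped`, `pivoted`, and `same` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": "jack jack bob bob tom".split(),
    "date": "day1 day2 day1 day2 day1".split(),
    "values": [2, 4, 3, 2, 5],
})

# groupby on one column, summing one column: returns a Series indexed by name.
grouped = df.groupby("name")["values"].sum()

# pivot_table with the same key and aggregation: returns a DataFrame.
pivoted = pd.pivot_table(df, index="name", values="values", aggfunc="sum")

# Selecting the single column of the pivot table yields a comparable Series.
same = grouped.equals(pivoted["values"])
print(same)
```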
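# Bring in date as the columns:
df1 = pd.pivot_table(df, index='name', values='values', columns='date', aggfunc='sum')
# tom has no day2 row, so that cell is NaN; fillna(0) returns a new frame and leaves df1 itself unchanged
df1.fillna(0)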
Out[36]:
date day1 day2
name
bob 3.0 2.0
jack 2.0 4.0
tom 5.0 0.0
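As a sketch of two alternatives to calling fillna(0) afterwards, the missing cell can be filled while the table is built: pivot_table accepts fill_value, and the same layout can be reached with groupby on both keys followed by unstack. The names `filled` and `unstacked` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": "jack jack bob bob tom".split(),
    "date": "day1 day2 day1 day2 day1".split(),
    "values": [2, 4, 3, 2, 5],
})

# Fill the missing (tom, day2) cell directly inside pivot_table.
filled = pd.pivot_table(
    df, index="name", columns="date", values="values",
    aggfunc="sum", fill_value=0,
)

# Equivalent layout: group on both keys, then move 'date' from the index to the columns.
unstacked = df.groupby(["name", "date"])["values"].sum().unstack(fill_value=0)

print(filled)
print(unstacked)
```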