pandas.DataFrame.pivot_table
Create a spreadsheet-style pivot table as a DataFrame. The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.
DataFrame.pivot_table(values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False, sort=True)
Parameters:
values column to aggregate, optional
The column(s) to aggregate. Use it when you do not want every remaining column in the result, only the ones you need.
index column, Grouper, array, or list of the previous
The key(s) to group by on the rows of the pivot table; can be a list.
columns column, Grouper, array, or list of the previous
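The key(s) to group by on the column axis: each distinct value of the given column (typically a string column) becomes its own result column. Combinations that never occur come out as NaN, which is why columns is often paired with fill_value.
aggfunc function, list of functions, dict, default 'mean'
The aggregation to apply; the default, aggfunc='mean', takes the mean. A list applies several functions, and a dict maps each column name to a function or a list of functions.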
fill_value scalar, default None
The value used to replace missing values after aggregation; by default they stay NaN.
margins bool, default False
Add all row / columns (e.g. for subtotal / grand totals).
dropna bool, default True
Do not include columns whose entries are all NaN.
margins_name str, default ‘All’
Name of the row / column that will contain the totals when margins is True.
observed bool, default False
This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.
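To see what observed changes, here is a minimal sketch on a hypothetical toy frame (sales, whose Categorical size column declares a category 'L' that never occurs); it is not part of the example data used later. Summing is used so the unused category shows up as a row with 0 rather than as an all-NaN row.

```python
import pandas as pd

# Hypothetical toy data: category 'L' is declared but never occurs
sales = pd.DataFrame({
    'size': pd.Categorical(['S', 'M', 'S'], categories=['S', 'M', 'L']),
    'qty': [1, 2, 3],
})

# observed=False: one row per declared category, so 'L' appears (its sum is 0)
all_levels = sales.pivot_table(index='size', values='qty', aggfunc='sum', observed=False)

# observed=True: only the categories that actually occur in the data
seen_levels = sales.pivot_table(index='size', values='qty', aggfunc='sum', observed=True)

print(all_levels)
print(seen_levels)
```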
Returns:
DataFrame
An Excel style pivot table.
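A quick way to check the description at the top (an ordinary DataFrame whose index and columns become MultiIndex objects when several keys are involved) is to inspect a small table built from the example data introduced below. A sketch; the name table is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Leonard', 'Sheldon', 'Raj', 'Howard', 'Sheldon', 'Penny', 'Penny'],
    'item': ['water', 'coke', 'soda', 'wine', 'water', 'coke', 'wine'],
    'num': [1, 1, 1, 3, 1, 1, 2],
    'price': [2, 3, 4, 15, 2, 3, 10],
    'time': ['2022.8.20', '2022.8.20', '2022.8.21', '2022.8.21', '2022.8.21', '2022.8.20', '2022.8.19'],
    'operator': ['Penny', 'Bernadette', 'Penny', 'Bernadette', 'Penny', 'Bernadette', 'Bernadette'],
})

# Two index keys, plus a list-valued values with columns, give two-level labels on both axes
table = df.pivot_table(index=['operator', 'name'], columns='item', values=['price'], aggfunc='sum')

print(type(table).__name__)   # DataFrame
print(table.index.names)      # the two row levels: operator and name
print(table.columns.nlevels)  # 2: the value name ('price') above the item values
```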
Import the basic modules and create a DataFrame (the dataset could also be read from a file):
import numpy as np
import pandas as pd
df = pd.DataFrame({
'name': ['Leonard', 'Sheldon', 'Raj', 'Howard', 'Sheldon', 'Penny', 'Penny'],
'item': ['water', 'coke', 'soda', 'wine', 'water', 'coke', 'wine'],
'num': [1, 1, 1, 3, 1, 1, 2],
'price':[2, 3, 4, 15, 2, 3, 10],
'time': ['2022.8.20', '2022.8.20', '2022.8.21', '2022.8.21', '2022.8.21', '2022.8.20', '2022.8.19'],
'operator': ['Penny', 'Bernadette', 'Penny', 'Bernadette', 'Penny', 'Bernadette', 'Bernadette'],
})
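As noted above, the same data can come from a file. A minimal sketch, assuming a CSV with the same columns; an in-memory io.StringIO buffer stands in for the file so the snippet runs on its own (with a real file you would pass its path instead, e.g. a hypothetical orders.csv). It also lists which columns pandas parses as numeric, which matters for the default mean aggregation used below.

```python
import io
import pandas as pd

# Stand-in for a file on disk; pd.read_csv accepts a path or any file-like object
csv_text = """name,item,num,price,time,operator
Leonard,water,1,2,2022.8.20,Penny
Sheldon,coke,1,3,2022.8.20,Bernadette
Raj,soda,1,4,2022.8.21,Penny
Howard,wine,3,15,2022.8.21,Bernadette
Sheldon,water,1,2,2022.8.21,Penny
Penny,coke,1,3,2022.8.20,Bernadette
Penny,wine,2,10,2022.8.19,Bernadette
"""

df = pd.read_csv(io.StringIO(csv_text))

# Only num and price are parsed as numbers; the other columns stay as strings
numeric_cols = df.select_dtypes('number').columns.tolist()
print(numeric_cols)  # ['num', 'price']
```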
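Summarize by name:
# name as the index; the numeric columns are averaged (default aggfunc='mean').
# Older pandas silently dropped the string columns; pandas 2.x raises a TypeError instead, so list the numeric columns in values
df1 = df.pivot_table(index=['name'], values=['num', 'price'])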
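Using only index= with the default aggfunc='mean' amounts to a groupby followed by mean. A sketch that checks this on the example data, assuming the numeric columns are named explicitly in values; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Leonard', 'Sheldon', 'Raj', 'Howard', 'Sheldon', 'Penny', 'Penny'],
    'item': ['water', 'coke', 'soda', 'wine', 'water', 'coke', 'wine'],
    'num': [1, 1, 1, 3, 1, 1, 2],
    'price': [2, 3, 4, 15, 2, 3, 10],
    'time': ['2022.8.20', '2022.8.20', '2022.8.21', '2022.8.21', '2022.8.21', '2022.8.20', '2022.8.19'],
    'operator': ['Penny', 'Bernadette', 'Penny', 'Bernadette', 'Penny', 'Bernadette', 'Bernadette'],
})

# index only, default aggfunc='mean', numeric columns named explicitly
by_name = df.pivot_table(index='name', values=['num', 'price'])

# the same numbers via groupby: group by name, then average each column
via_groupby = df.groupby('name')[['num', 'price']].mean()

# raises AssertionError if the two frames differ (dtype differences are ignored)
pd.testing.assert_frame_equal(by_name, via_groupby, check_dtype=False)
print(by_name)
```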
Multiple index levels:
# multi-level index: operator first, then name
df2 = df.pivot_table(index=['operator', 'name'], values=['num', 'price'])
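With two index keys the rows form a two-level MultiIndex. A sketch of reading it back on the example data: .loc selects on the outer level (operator) and DataFrame.xs on an inner level (name). The variable names are illustrative, and the numeric columns are listed explicitly in values.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Leonard', 'Sheldon', 'Raj', 'Howard', 'Sheldon', 'Penny', 'Penny'],
    'item': ['water', 'coke', 'soda', 'wine', 'water', 'coke', 'wine'],
    'num': [1, 1, 1, 3, 1, 1, 2],
    'price': [2, 3, 4, 15, 2, 3, 10],
    'time': ['2022.8.20', '2022.8.20', '2022.8.21', '2022.8.21', '2022.8.21', '2022.8.20', '2022.8.19'],
    'operator': ['Penny', 'Bernadette', 'Penny', 'Bernadette', 'Penny', 'Bernadette', 'Bernadette'],
})

table = df.pivot_table(index=['operator', 'name'], values=['num', 'price'])

# outer level: every name recorded by operator 'Penny'
penny_rows = table.loc['Penny']

# inner level: Sheldon's rows under each operator
sheldon_rows = table.xs('Sheldon', level='name')

print(penny_rows)
print(sheldon_rows)
```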
Select the columns to display:
# With a numeric aggregation such as the default mean, values must name numeric columns; here only price is shown
df3 = df.pivot_table(index=['operator', 'name'], values=['price'])
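The restriction in the comment above applies to numeric aggregations like the default mean. A counting aggregation accepts a string column in values; a minimal sketch on the example data, counting how many items each person bought:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Leonard', 'Sheldon', 'Raj', 'Howard', 'Sheldon', 'Penny', 'Penny'],
    'item': ['water', 'coke', 'soda', 'wine', 'water', 'coke', 'wine'],
    'num': [1, 1, 1, 3, 1, 1, 2],
    'price': [2, 3, 4, 15, 2, 3, 10],
    'time': ['2022.8.20', '2022.8.20', '2022.8.21', '2022.8.21', '2022.8.21', '2022.8.20', '2022.8.19'],
    'operator': ['Penny', 'Bernadette', 'Penny', 'Bernadette', 'Penny', 'Bernadette', 'Bernadette'],
})

# 'item' holds strings; 'count' only counts non-missing entries, so no numbers are needed
counts = df.pivot_table(index='name', values='item', aggfunc='count')
print(counts)
```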
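Specify the aggregation method:
# per-column aggregation: sum for num, both sum and mean for price
# (string names avoid the FutureWarning that newer pandas versions emit for np.sum / np.mean)
df4 = df.pivot_table(index=['operator', 'name'], aggfunc={'num': 'sum', 'price': ['sum', 'mean']})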
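A dict aggfunc that maps a column to a list of functions produces two-level column labels such as ('price', 'mean'). A sketch that builds such a table on the example data (with values named explicitly) and flattens the labels into single strings; the flat naming scheme below is just one common choice, not a pandas feature.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Leonard', 'Sheldon', 'Raj', 'Howard', 'Sheldon', 'Penny', 'Penny'],
    'item': ['water', 'coke', 'soda', 'wine', 'water', 'coke', 'wine'],
    'num': [1, 1, 1, 3, 1, 1, 2],
    'price': [2, 3, 4, 15, 2, 3, 10],
    'time': ['2022.8.20', '2022.8.20', '2022.8.21', '2022.8.21', '2022.8.21', '2022.8.20', '2022.8.19'],
    'operator': ['Penny', 'Bernadette', 'Penny', 'Bernadette', 'Penny', 'Bernadette', 'Bernadette'],
})

# num is summed; price is both summed and averaged
table = df.pivot_table(
    index=['operator', 'name'],
    values=['num', 'price'],
    aggfunc={'num': 'sum', 'price': ['sum', 'mean']},
)

# join each (column, function) label pair into one string, e.g. 'price_mean'
flat = table.copy()
flat.columns = ['_'.join(col) for col in table.columns]
print(flat)
```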
Show the values of a string column as columns:
# columns: each distinct item becomes its own column of mean prices
df5 = df.pivot_table(index=['operator', 'name'], values=['price'], columns='item')
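In this simple case, passing columns= amounts to a groupby on both the index and column keys followed by unstacking the column key. A sketch that checks this equivalence on the example data, using a single index level for brevity:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Leonard', 'Sheldon', 'Raj', 'Howard', 'Sheldon', 'Penny', 'Penny'],
    'item': ['water', 'coke', 'soda', 'wine', 'water', 'coke', 'wine'],
    'num': [1, 1, 1, 3, 1, 1, 2],
    'price': [2, 3, 4, 15, 2, 3, 10],
    'time': ['2022.8.20', '2022.8.20', '2022.8.21', '2022.8.21', '2022.8.21', '2022.8.20', '2022.8.19'],
    'operator': ['Penny', 'Bernadette', 'Penny', 'Bernadette', 'Penny', 'Bernadette', 'Bernadette'],
})

# one row per name, one column per distinct item
wide = df.pivot_table(index='name', columns='item', values='price', aggfunc='mean')

# manual version: average per (name, item), then move 'item' from the rows to the columns
manual = df.groupby(['name', 'item'])['price'].mean().unstack('item')

pd.testing.assert_frame_equal(wide, manual, check_dtype=False)
print(wide)
```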
Fill the empty cells with 0:
# fill_value turns the NaN entries (name/item combinations that never occur) into 0
df6 = df.pivot_table(index=['operator', 'name'], values=['price'], columns='item', fill_value=0)
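fill_value only touches cells that are empty after aggregation; it does not change the input data. A minimal sketch showing that, on the example data, it gives the same numbers as calling fillna(0) on the unfilled table:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Leonard', 'Sheldon', 'Raj', 'Howard', 'Sheldon', 'Penny', 'Penny'],
    'item': ['water', 'coke', 'soda', 'wine', 'water', 'coke', 'wine'],
    'num': [1, 1, 1, 3, 1, 1, 2],
    'price': [2, 3, 4, 15, 2, 3, 10],
    'time': ['2022.8.20', '2022.8.20', '2022.8.21', '2022.8.21', '2022.8.21', '2022.8.20', '2022.8.19'],
    'operator': ['Penny', 'Bernadette', 'Penny', 'Bernadette', 'Penny', 'Bernadette', 'Bernadette'],
})

unfilled = df.pivot_table(index='name', columns='item', values='price', aggfunc='sum')
filled = df.pivot_table(index='name', columns='item', values='price', aggfunc='sum', fill_value=0)

# every cell matches: fill_value=0 only replaced the NaN holes left by aggregation
same = (filled == unfilled.fillna(0)).all().all()
print(same)
```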
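Combining the options:
# combined: three index levels, two value columns, summed, NaN filled with 0, plus an 'All' totals row
df7 = df.pivot_table(index=['operator', 'name', 'item'], values=['num', 'price'], aggfunc=['sum'], fill_value=0, margins=True)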
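margins=True appends a row named by margins_name (default 'All') that holds the aggregate over all rows. A sketch on the example data with a single index level, where that row is easy to check against plain column sums; 'Total' is an arbitrary replacement name.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Leonard', 'Sheldon', 'Raj', 'Howard', 'Sheldon', 'Penny', 'Penny'],
    'item': ['water', 'coke', 'soda', 'wine', 'water', 'coke', 'wine'],
    'num': [1, 1, 1, 3, 1, 1, 2],
    'price': [2, 3, 4, 15, 2, 3, 10],
    'time': ['2022.8.20', '2022.8.20', '2022.8.21', '2022.8.21', '2022.8.21', '2022.8.20', '2022.8.19'],
    'operator': ['Penny', 'Bernadette', 'Penny', 'Bernadette', 'Penny', 'Bernadette', 'Bernadette'],
})

# the extra 'All' row sums every row of the original data
totals = df.pivot_table(index='name', values=['num', 'price'], aggfunc='sum', margins=True)
print(totals.loc['All'])

# margins_name renames that row
renamed = df.pivot_table(index='name', values='price', aggfunc='sum',
                         margins=True, margins_name='Total')
print(renamed.tail(1))
```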
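Beyond this, the result is an ordinary DataFrame, so the usual DataFrame methods still apply to it, such as filtering rows and selecting columns.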
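Because the result is a regular DataFrame, ordinary selection, filtering and sorting work on it directly. A sketch on the example data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Leonard', 'Sheldon', 'Raj', 'Howard', 'Sheldon', 'Penny', 'Penny'],
    'item': ['water', 'coke', 'soda', 'wine', 'water', 'coke', 'wine'],
    'num': [1, 1, 1, 3, 1, 1, 2],
    'price': [2, 3, 4, 15, 2, 3, 10],
    'time': ['2022.8.20', '2022.8.20', '2022.8.21', '2022.8.21', '2022.8.21', '2022.8.20', '2022.8.19'],
    'operator': ['Penny', 'Bernadette', 'Penny', 'Bernadette', 'Penny', 'Bernadette', 'Bernadette'],
})

table = df.pivot_table(index=['operator', 'name'], values=['num', 'price'], aggfunc='sum')

# row selection on the outer index level
bernadette = table.loc['Bernadette']

# boolean filtering on an aggregated column
big_spenders = table[table['price'] > 5]

# single-column selection plus sorting
ranked = table['price'].sort_values(ascending=False)

print(bernadette)
print(big_spenders)
print(ranked)
```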