Playing with Data in Pandas (13) -- Pivot Tables

Data analysis study notes (index of the series):

https://blog.csdn.net/weixin_39778570/article/details/81157884

import pandas as pd
import numpy as np
from pandas import Series, DataFrame
# Open the Excel file
xlsx = pd.ExcelFile('sales-funnel.xlsx')
df = pd.read_excel(xlsx)

# Build a pivot table
# aggfunc defaults to the mean
pd.pivot_table(df,index=['Name'])
Out[13]: 
                               Account     Price  Quantity
Name                                                      
Barton LLC                    740150.0   35000.0  1.000000
Fritsch, Russel and Anderson  737550.0   35000.0  1.000000
Herman LLC                    141962.0   65000.0  2.000000
Jerde-Hilpert                 412290.0    5000.0  2.000000
Kassulke, Ondricka and Metz   307599.0    7000.0  3.000000
Keeling LLC                   688981.0  100000.0  5.000000
Kiehn-Spinka                  146832.0   65000.0  2.000000
Koepp Ltd                     729833.0   35000.0  2.000000
Kulas Inc                     218895.0   25000.0  1.500000
Purdy-Kunde                   163416.0   30000.0  1.000000
Stokes LLC                    239344.0    7500.0  1.000000
Trantow-Barrows               714466.0   15000.0  1.333333
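
To make the default aggregation concrete, here is a minimal sketch showing that a pivot table with no aggfunc gives the same numbers as a groupby followed by mean. The small DataFrame is made up for illustration (sales-funnel.xlsx is not reproduced here), and values is passed explicitly because recent pandas versions may raise an error instead of silently dropping non-numeric columns.

```python
import pandas as pd

# Hypothetical stand-in for sales-funnel.xlsx: made-up rows, same column names
df = pd.DataFrame({
    "Name": ["Kulas Inc", "Kulas Inc", "Barton LLC"],
    "Price": [40000, 10000, 35000],
    "Quantity": [2, 1, 1],
})

# No aggfunc given, so each group is averaged
pivot = pd.pivot_table(df, index=["Name"], values=["Price", "Quantity"])

# The same aggregation written as an explicit groupby
grouped = df.groupby("Name")[["Price", "Quantity"]].mean()

print(pivot)
print(grouped)
```

Both tables have one row per Name, and Kulas Inc shows the average of its two rows.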

# Several index columns can be given, producing a multi-level index
pd.pivot_table(df, index=['Name','Rep','Manager'])
Out[15]: 
                                                           Account     Price  \
Name                         Rep           Manager                             
Barton LLC                   John Smith    Debra Henley   740150.0   35000.0   
Fritsch, Russel and Anderson Craig Booker  Debra Henley   737550.0   35000.0   
Herman LLC                   Cedric Moss   Fred Anderson  141962.0   65000.0   
Jerde-Hilpert                John Smith    Debra Henley   412290.0    5000.0   
Kassulke, Ondricka and Metz  Wendy Yule    Fred Anderson  307599.0    7000.0   
Keeling LLC                  Wendy Yule    Fred Anderson  688981.0  100000.0   
Kiehn-Spinka                 Daniel Hilton Debra Henley   146832.0   65000.0   
Koepp Ltd                    Wendy Yule    Fred Anderson  729833.0   35000.0   
Kulas Inc                    Daniel Hilton Debra Henley   218895.0   25000.0   
Purdy-Kunde                  Cedric Moss   Fred Anderson  163416.0   30000.0   
Stokes LLC                   Cedric Moss   Fred Anderson  239344.0    7500.0   
Trantow-Barrows              Craig Booker  Debra Henley   714466.0   15000.0   

                                                          Quantity  
Name                         Rep           Manager                  
Barton LLC                   John Smith    Debra Henley   1.000000  
Fritsch, Russel and Anderson Craig Booker  Debra Henley   1.000000  
Herman LLC                   Cedric Moss   Fred Anderson  2.000000  
Jerde-Hilpert                John Smith    Debra Henley   2.000000  
Kassulke, Ondricka and Metz  Wendy Yule    Fred Anderson  3.000000  
Keeling LLC                  Wendy Yule    Fred Anderson  5.000000  
Kiehn-Spinka                 Daniel Hilton Debra Henley   2.000000  
Koepp Ltd                    Wendy Yule    Fred Anderson  2.000000  
Kulas Inc                    Daniel Hilton Debra Henley   1.500000  
Purdy-Kunde                  Cedric Moss   Fred Anderson  1.000000  
Stokes LLC                   Cedric Moss   Fred Anderson  1.000000  
Trantow-Barrows              Craig Booker  Debra Henley   1.333333  

# Multi-level index: manager and sales rep
pd.pivot_table(df, index=['Manager','Rep'])
Out[17]: 
                              Account         Price  Quantity
Manager       Rep                                            
Debra Henley  Craig Booker   720237.0  20000.000000  1.250000
              Daniel Hilton  194874.0  38333.333333  1.666667
              John Smith     576220.0  20000.000000  1.500000
Fred Anderson Cedric Moss    196016.5  27500.000000  1.250000
              Wendy Yule     614061.5  44250.000000  3.000000

# values selects which columns to aggregate
pd.pivot_table(df, index=['Manager','Rep'], values=['Price'])
Out[18]: 
                                    Price
Manager       Rep                        
Debra Henley  Craig Booker   20000.000000
              Daniel Hilton  38333.333333
              John Smith     20000.000000
Fred Anderson Cedric Moss    27500.000000
              Wendy Yule     44250.000000

# Several values columns can be given, and the aggregation function can be changed
pd.pivot_table(df, index=['Manager','Rep'], values=['Price','Quantity'], aggfunc='sum')
Out[19]: 
                              Price  Quantity
Manager       Rep                            
Debra Henley  Craig Booker    80000         5
              Daniel Hilton  115000         5
              John Smith      40000         3
Fred Anderson Cedric Moss    110000         5
              Wendy Yule     177000        12
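
The effect of aggfunc can be checked on a small example. The sketch below uses made-up rows (not the real sales-funnel data) and builds the same pivot table twice, once with the default mean and once with aggfunc='sum'.

```python
import pandas as pd

# Hypothetical rows for illustration only
df = pd.DataFrame({
    "Manager": ["Debra Henley", "Debra Henley", "Debra Henley", "Fred Anderson"],
    "Rep": ["Craig Booker", "Craig Booker", "John Smith", "Wendy Yule"],
    "Price": [30000, 10000, 5000, 7000],
    "Quantity": [1, 1, 2, 3],
})

# Default aggregation: mean of each (Manager, Rep) group
mean_table = pd.pivot_table(df, index=["Manager", "Rep"],
                            values=["Price", "Quantity"])

# Same grouping, but totals instead of averages
sum_table = pd.pivot_table(df, index=["Manager", "Rep"],
                           values=["Price", "Quantity"], aggfunc="sum")

print(mean_table)
print(sum_table)
```

Craig Booker has two rows here, so his Price is averaged in the first table and totalled in the second.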

# columns can be added as well, producing multi-level columns
pd.pivot_table(df, index=['Manager','Rep'], values=['Price','Quantity'], columns=['Product'],aggfunc='sum')
Out[20]: 
                                Price                              Quantity  \
Product                           CPU Maintenance Monitor Software      CPU   
Manager       Rep                                                             
Debra Henley  Craig Booker    65000.0      5000.0     NaN  10000.0      2.0   
              Daniel Hilton  105000.0         NaN     NaN  10000.0      4.0   
              John Smith      35000.0      5000.0     NaN      NaN      1.0   
Fred Anderson Cedric Moss     95000.0      5000.0     NaN  10000.0      3.0   
              Wendy Yule     165000.0      7000.0  5000.0      NaN      7.0   


Product                     Maintenance Monitor Software  
Manager       Rep                                         
Debra Henley  Craig Booker          2.0     NaN      1.0  
              Daniel Hilton         NaN     NaN      1.0  
              John Smith            2.0     NaN      NaN  
Fred Anderson Cedric Moss           1.0     NaN      1.0  
              Wendy Yule            3.0     2.0      NaN  

# Fill combinations that have no data with a value (here 0)
df_pivot = pd.pivot_table(df, index=['Manager','Rep'], values=['Price','Quantity'], columns=['Product'],fill_value=0,aggfunc='sum')
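
As a rough illustration of what fill_value changes, the sketch below pivots a tiny made-up DataFrame (hypothetical rows, not the real file) with and without fill_value=0. A rep who sold no units of a product gets NaN in the first table and 0 in the second.

```python
import pandas as pd

# Hypothetical rows: John Smith has no Software sale
df = pd.DataFrame({
    "Rep": ["Craig Booker", "Craig Booker", "John Smith"],
    "Product": ["CPU", "Software", "CPU"],
    "Price": [30000, 10000, 35000],
})

# Without fill_value, missing (Rep, Product) combinations become NaN
with_nan = pd.pivot_table(df, index=["Rep"], columns=["Product"],
                          values=["Price"], aggfunc="sum")

# With fill_value=0, the same cells are filled with 0
filled = pd.pivot_table(df, index=["Rep"], columns=["Product"],
                        values=["Price"], aggfunc="sum", fill_value=0)

print(with_nan)
print(filled)

# Columns are two-level, so a single cell is addressed with a tuple
print(filled.loc["John Smith", ("Price", "Software")])
```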

[Image: the df data table]
[Image: the df_pivot table]
