Python 3.6: pivot tables (pivot_table) and cross-tabulations (crosstab)

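Summary:

Many pandas parameters have changed between releases, so upgrade to the latest pandas version from the official site first, then keep a detailed copy of the release notes (RN) at hand for reference.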

  1. Pivot table: pivot_table
  2. Cross-tabulation: crosstab

1. Pivot table: pivot_table

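A pivot table aggregates data by one or more keys. By default it computes the mean of each group defined by the key or combination of keys.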
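To make the default behaviour concrete, here is a minimal sketch on a small made-up DataFrame (the column names and values are assumptions for illustration, not the tips dataset used below). It suggests that pivot_table with its default aggregation returns the same group means as an explicit groupby followed by mean().

```python
import pandas as pd

# Hypothetical toy data, not the tips dataset used in this post.
df = pd.DataFrame({
    "sex": ["F", "F", "M", "M", "M"],
    "smoker": ["No", "Yes", "No", "No", "Yes"],
    "tip": [1.0, 3.0, 2.0, 4.0, 5.0],
})

# pivot_table with the default aggfunc (mean) ...
pivoted = df.pivot_table(values="tip", index=["sex", "smoker"])

# ... compared with an explicit groupby on the same keys.
grouped = df.groupby(["sex", "smoker"])[["tip"]].mean()

same_means = pivoted["tip"].tolist() == grouped["tip"].tolist()
print(pivoted)
print(same_means)
```

Both results are indexed by the (sex, smoker) pairs, so the group means can be compared position by position.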

The usual imports:

import pandas as pd
import numpy as np
from pandas import DataFrame,Series

Note that pivot_table can be called either as a DataFrame method or as the top-level function pd.pivot_table. The examples below call it on a DataFrame.
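As a quick check of the two calling styles, the sketch below builds a small hypothetical DataFrame (made-up values, for illustration only) and calls pivot_table both ways with the same arguments.

```python
import pandas as pd

# Hypothetical toy data for illustration.
df = pd.DataFrame({
    "day": ["Fri", "Fri", "Sat", "Sat"],
    "smoker": ["No", "Yes", "No", "Yes"],
    "tip": [1.0, 2.0, 3.0, 4.0],
})

# Called as a DataFrame method.
as_method = df.pivot_table(values="tip", index="day", columns="smoker")

# Called as the top-level pandas function, with the DataFrame as first argument.
as_function = pd.pivot_table(df, values="tip", index="day", columns="smoker")

same = as_method.equals(as_function)
print(same)
```

Which form to use is mostly a matter of style; the method form reads more naturally in a chain of DataFrame operations.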

# Read the data and confirm that it is a DataFrame
  
In [10]: tips = pd.read_csv('ch08/tips.csv')

In [11]: tips['tip_pct'] = tips['tip'] / tips['total_bill']

In [12]: tips[:10]
Out[12]:
   total_bill   tip     sex smoker  day    time  size   tip_pct
0       16.99  1.01  Female     No  Sun  Dinner     2  0.059447
1       10.34  1.66    Male     No  Sun  Dinner     3  0.160542
2       21.01  3.50    Male     No  Sun  Dinner     3  0.166587
3       23.68  3.31    Male     No  Sun  Dinner     2  0.139780
4       24.59  3.61  Female     No  Sun  Dinner     4  0.146808
5       25.29  4.71    Male     No  Sun  Dinner     4  0.186240
6        8.77  2.00    Male     No  Sun  Dinner     2  0.228050
7       26.88  3.12    Male     No  Sun  Dinner     4  0.116071
8       15.04  1.96    Male     No  Sun  Dinner     2  0.130319
9       14.78  3.23    Male     No  Sun  Dinner     2  0.218539


In [5]: type(tips)
Out[5]: pandas.core.frame.DataFrame

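# Basic pivot table: only index is set, so every numeric column is averaged
# (on recent pandas versions, non-numeric columns such as day and time may need to be excluded by passing values= explicitly)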

In [6]: tips.pivot_table(index=['sex','smoker'])
Out[6]:
                   size       tip   tip_pct  total_bill
sex    smoker
Female No      2.592593  2.773519  0.156921   18.105185
       Yes     2.242424  2.931515  0.182150   17.977879
Male   No      2.711340  3.113402  0.160669   19.791237
       Yes     2.500000  3.051167  0.152771   22.284500

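# Multi-level pivot table; margins=True adds an All row and an All column holding subtotals that ignore the distinctions within a grouping level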

In [9]: tips.pivot_table( ['tip_pct','size'], index = ['sex','day'],
   ...:                    columns ='smoker')
   ...:
   ...:
Out[9]:
                 size             tip_pct
smoker             No       Yes        No       Yes
sex    day
Female Fri   2.500000  2.000000  0.165296  0.209129
       Sat   2.307692  2.200000  0.147993  0.163817
       Sun   3.071429  2.500000  0.165710  0.237075
       Thur  2.480000  2.428571  0.155971  0.163073
Male   Fri   2.000000  2.125000  0.138005  0.144730
       Sat   2.656250  2.629630  0.162132  0.139067
       Sun   2.883721  2.600000  0.158291  0.173964
       Thur  2.500000  2.300000  0.165706  0.164417

In [10]: tips.pivot_table( ['tip_pct','size'], index = ['sex','day'],
    ...:                    columns ='smoker', margins = True)
    ...:
    ...:
    ...:
Out[10]:
                 size                       tip_pct
smoker             No       Yes       All        No       Yes       All
sex    day
Female Fri   2.500000  2.000000  2.111111  0.165296  0.209129  0.199388
       Sat   2.307692  2.200000  2.250000  0.147993  0.163817  0.156470
       Sun   3.071429  2.500000  2.944444  0.165710  0.237075  0.181569
       Thur  2.480000  2.428571  2.468750  0.155971  0.163073  0.157525
Male   Fri   2.000000  2.125000  2.100000  0.138005  0.144730  0.143385
       Sat   2.656250  2.629630  2.644068  0.162132  0.139067  0.151577
       Sun   2.883721  2.600000  2.810345  0.158291  0.173964  0.162344
       Thur  2.500000  2.300000  2.433333  0.165706  0.164417  0.165276
All          2.668874  2.408602  2.569672  0.159328  0.163196  0.160803
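The All values are computed from the underlying rows, not by averaging the cells already in the table. The sketch below uses a small made-up DataFrame (hypothetical values) where the two would differ, to show which one margins=True reports.

```python
import pandas as pd

# Hypothetical toy data: group sizes differ, so "mean of cell means"
# and "mean of all rows" are not the same number.
df = pd.DataFrame({
    "sex": ["F", "F", "F", "M", "M"],
    "smoker": ["No", "No", "Yes", "No", "Yes"],
    "tip": [1.0, 2.0, 6.0, 3.0, 4.0],
})

table = df.pivot_table(values="tip", index="sex", columns="smoker", margins=True)
print(table)

# Mean over all rows with sex == "F": (1 + 2 + 6) / 3 = 3.0
f_all_rows = df.loc[df["sex"] == "F", "tip"].mean()

# Mean of the two cell means for "F": (1.5 + 6.0) / 2 = 3.75
f_mean_of_cells = (table.loc["F", "No"] + table.loc["F", "Yes"]) / 2

print(table.loc["F", "All"], f_all_rows, f_mean_of_cells)
```

Here the All column for F matches the mean over all F rows, and the bottom-right All cell matches the mean of the whole tip column.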


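# A comparison: a plain pivot table, then the margins parameter, fill_value to fill missing values, and aggfunc to use another aggregation function (a lambda or a custom function also works)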

In [11]: tips.pivot_table('tip_pct',index=['sex','smoker'],columns='day')
Out[11]:
day                 Fri       Sat       Sun      Thur
sex    smoker
Female No      0.165296  0.147993  0.165710  0.155971
       Yes     0.209129  0.163817  0.237075  0.163073
Male   No      0.138005  0.162132  0.158291  0.165706
       Yes     0.144730  0.139067  0.173964  0.164417

In [12]: tips.pivot_table('tip_pct',index=['sex','smoker'],columns='day', margins= True)
Out[12]:
day                 Fri       Sat       Sun      Thur       All
sex    smoker
Female No      0.165296  0.147993  0.165710  0.155971  0.156921
       Yes     0.209129  0.163817  0.237075  0.163073  0.182150
Male   No      0.138005  0.162132  0.158291  0.165706  0.160669
       Yes     0.144730  0.139067  0.173964  0.164417  0.152771
All            0.169913  0.153152  0.166897  0.161276  0.160803

In [13]: tips.pivot_table('tip_pct',index=['sex','smoker'],columns='day', margins= True, aggfunc=len)
Out[13]:
day             Fri   Sat   Sun  Thur    All
sex    smoker
Female No       2.0  13.0  14.0  25.0   54.0
       Yes      7.0  15.0   4.0   7.0   33.0
Male   No       2.0  32.0  43.0  20.0   97.0
       Yes      8.0  27.0  15.0  10.0   60.0
All            19.0  87.0  76.0  62.0  244.0
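Since aggfunc also accepts a lambda or a user-defined function, here is a minimal sketch of all three styles on a small made-up DataFrame. The data and the helper name tip_range are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical toy data for illustration.
df = pd.DataFrame({
    "sex": ["F", "F", "M", "M", "M"],
    "day": ["Fri", "Sat", "Fri", "Fri", "Sat"],
    "tip": [1.0, 3.0, 2.0, 5.0, 4.0],
})


def tip_range(values):
    """Hypothetical custom aggregation: largest value minus smallest value."""
    return values.max() - values.min()


# A built-in aggregation referenced by name.
counts = df.pivot_table(values="tip", index="sex", columns="day", aggfunc="count")

# A user-defined function.
spread = df.pivot_table(values="tip", index="sex", columns="day", aggfunc=tip_range)

# A lambda.
largest = df.pivot_table(values="tip", index="sex", columns="day", aggfunc=lambda s: s.max())

print(counts)
print(spread)
print(largest)
```

For simple cases a string name such as 'count' or 'sum' is usually the more readable choice; a custom function is mainly useful when no built-in aggregation fits.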

In [14]: tips.pivot_table('size',index=['time','sex','smoker'],columns='day',aggfunc='sum')
Out[14]:
day                    Fri   Sat    Sun  Thur
time   sex    smoker
Dinner Female No       2.0  30.0   43.0   2.0
              Yes      8.0  33.0   10.0   NaN
       Male   No       4.0  85.0  124.0   NaN
              Yes     12.0  71.0   39.0   NaN
Lunch  Female No       3.0   NaN    NaN  60.0
              Yes      6.0   NaN    NaN  17.0
       Male   No       NaN   NaN    NaN  50.0
              Yes      5.0   NaN    NaN  23.0

In [15]: tips.pivot_table('size',index=['time','sex','smoker'],columns='day',aggfunc='sum', fill_value=0)
Out[15]:
day                   Fri  Sat  Sun  Thur
time   sex    smoker
Dinner Female No        2   30   43     2
              Yes       8   33   10     0
       Male   No        4   85  124     0
              Yes      12   71   39     0
Lunch  Female No        3    0    0    60
              Yes       6    0    0    17
       Male   No        0    0    0    50
              Yes       5    0    0    23
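The effect of fill_value can be reproduced on a much smaller table. The sketch below uses made-up data (the column name party_size is a hypothetical stand-in for size) in which some time/day combinations never occur.

```python
import pandas as pd

# Hypothetical toy data: no Lunch rows on Sat/Sun and no Dinner rows on Thur.
df = pd.DataFrame({
    "time": ["Dinner", "Dinner", "Lunch"],
    "day": ["Sat", "Sun", "Thur"],
    "party_size": [2, 3, 4],
})

# Without fill_value, combinations that never occur show up as NaN.
with_nan = df.pivot_table(values="party_size", index="time", columns="day", aggfunc="sum")

# With fill_value=0, those empty cells are replaced by 0.
filled = df.pivot_table(values="party_size", index="time", columns="day",
                        aggfunc="sum", fill_value=0)

print(with_nan)
print(filled)
```

Filling with 0 is reasonable for sums and counts; for means it could be misleading, since an empty group would then look like a group whose average is zero.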

2. Cross-tabulation: crosstab

A cross-tabulation is a special kind of pivot table that computes group frequencies.

In [20]: data = pd.DataFrame({'Sample': range(1, 11),
    ...:                      'Gender': ['Female', 'Male', 'Female', 'Male', 'Male', 'Male',
    ...:                                 'Female', 'Female', 'Male', 'Female'],
    ...:                      'Handedness': ['Right-handed', 'Left-handed', 'Right-handed', 'Right-handed',
    ...:                                     'Left-handed', 'Right-handed', 'Right-handed', 'Left-handed',
    ...:                                     'Right-handed', 'Right-handed']})
    ...:

In [21]: data
Out[21]:
   Sample  Gender    Handedness
0       1  Female  Right-handed
1       2    Male   Left-handed
2       3  Female  Right-handed
3       4    Male  Right-handed
4       5    Male   Left-handed
5       6    Male  Right-handed
6       7  Female  Right-handed
7       8  Female   Left-handed
8       9    Male  Right-handed
9      10  Female  Right-handed

In [22]: pd.crosstab(data.Gender,data.Handedness)
Out[22]:
Handedness  Left-handed  Right-handed
Gender
Female                1             4
Male                  2             3

In [23]: pd.crosstab(data.Gender,data.Handedness, margins=True)
Out[23]:
Handedness  Left-handed  Right-handed  All
Gender
Female                1             4    5
Male                  2             3    5
All                   3             7   10
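To show why crosstab can be seen as a special pivot table, the sketch below rebuilds the same Gender/Handedness data and compares crosstab with two alternatives: a groupby with size(), and pivot_table with aggfunc='count'. The last call uses the normalize parameter of crosstab to turn counts into row proportions. The variable names are my own and this is only one possible way to draw the comparison.

```python
import pandas as pd

# Same data as in the example above.
data = pd.DataFrame({
    "Sample": range(1, 11),
    "Gender": ["Female", "Male", "Female", "Male", "Male", "Male",
               "Female", "Female", "Male", "Female"],
    "Handedness": ["Right-handed", "Left-handed", "Right-handed", "Right-handed",
                   "Left-handed", "Right-handed", "Right-handed", "Left-handed",
                   "Right-handed", "Right-handed"],
})

# Frequency table from crosstab.
counts = pd.crosstab(data["Gender"], data["Handedness"])

# The same counts from groupby + size, reshaped to a table.
via_groupby = data.groupby(["Gender", "Handedness"]).size().unstack(fill_value=0)

# The same counts from pivot_table, counting the Sample column.
via_pivot = data.pivot_table(values="Sample", index="Gender", columns="Handedness",
                             aggfunc="count", fill_value=0)

# Row proportions instead of raw counts.
row_share = pd.crosstab(data["Gender"], data["Handedness"], normalize="index")

print(counts)
print(row_share)
```

With normalize="index" each row sums to 1, which makes the two genders easier to compare when the group sizes differ.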

2018.8.25
