Pandas玩转数据(十二) -- 数据聚合技术Aggregation

数据分析汇总学习

https://blog.csdn.net/weixin_39778570/article/details/81157884

import pandas as pd
import numpy as np
from pandas import Series, DataFrame

# 打开csv文件
f = open('city_weather.csv')
df = pd.read_csv(f)
df
Out[12]: 
          date city  temperature  wind
0   03/01/2016   BJ            8     5
1   17/01/2016   BJ           12     2
2   31/01/2016   BJ           19     2
3   14/02/2016   BJ           -3     3
4   28/02/2016   BJ           19     2
5   13/03/2016   BJ            5     3
6   27/03/2016   SH           -4     4
7   10/04/2016   SH           19     3
8   24/04/2016   SH           20     3
9   08/05/2016   SH           17     3
10  22/05/2016   SH            4     2
11  05/06/2016   SH          -10     4
12  19/06/2016   SH            0     5
13  03/07/2016   SH           -9     5
14  17/07/2016   GZ           10     2
15  31/07/2016   GZ           -1     5
16  14/08/2016   GZ            1     5
17  28/08/2016   GZ           25     4
18  11/09/2016   SZ           20     1
19  25/09/2016   SZ          -10     4

# 对df进行分组
g = df.groupby('city')

# 实验聚合函数min
g.agg('min')
Out[7]: 
            date  temperature  wind
city                               
BJ    03/01/2016           -3     2
GZ    14/08/2016           -1     2
SH    03/07/2016          -10     2
SZ    11/09/2016          -10     1

# 自定义函数,目的查看传如函数的数据类型
def foo (attr):
    print(type(attr))
    return np.nan

#传入一个Sreies对象
g.agg(foo)
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
Out[9]: 
      date  temperature  wind
city                         
BJ     NaN          NaN   NaN
GZ     NaN          NaN   NaN
SH     NaN          NaN   NaN
SZ     NaN          NaN   NaN

# 自定义函数进行聚合
def foo (attr):
    return attr.max() - attr.min()

g.agg(foo)
Out[11]: 
      temperature  wind
city                   
BJ             22     3
GZ             26     3
SH             30     3
SZ             30     3

# 可以对两列进行分组
g_new = df.groupby(['city', 'wind'])

g_new
Out[14]: 0x000001C7B3DCDE10>

g_new.groups
Out[15]: 
{('BJ', 2): Int64Index([1, 2, 4], dtype='int64'),
 ('BJ', 3): Int64Index([3, 5], dtype='int64'),
 ('BJ', 5): Int64Index([0], dtype='int64'),
 ('GZ', 2): Int64Index([14], dtype='int64'),
 ('GZ', 4): Int64Index([17], dtype='int64'),
 ('GZ', 5): Int64Index([15, 16], dtype='int64'),
 ('SH', 2): Int64Index([10], dtype='int64'),
 ('SH', 3): Int64Index([7, 8, 9], dtype='int64'),
 ('SH', 4): Int64Index([6, 11], dtype='int64'),
 ('SH', 5): Int64Index([12, 13], dtype='int64'),
 ('SZ', 1): Int64Index([18], dtype='int64'),
 ('SZ', 4): Int64Index([19], dtype='int64')}

# 获取某一组
g_new.get_group(('BJ',2))
Out[18]: 
         date city  temperature  wind
1  17/01/2016   BJ           12     2
2  31/01/2016   BJ           19     2
4  28/02/2016   BJ           19     2

# 打印每个组
n [19]: for (name_1,name_2), group in g_new:
    print(name_1,name_2)
    print(group)

BJ 2
         date city  temperature  wind
1  17/01/2016   BJ           12     2
2  31/01/2016   BJ           19     2
4  28/02/2016   BJ           19     2
BJ 3
         date city  temperature  wind
3  14/02/2016   BJ           -3     3
5  13/03/2016   BJ            5     3
BJ 5
         date city  temperature  wind
0  03/01/2016   BJ            8     5
GZ 2
          date city  temperature  wind
14  17/07/2016   GZ           10     2
GZ 4
          date city  temperature  wind
17  28/08/2016   GZ           25     4
GZ 5
          date city  temperature  wind
15  31/07/2016   GZ           -1     5
16  14/08/2016   GZ            1     5
SH 2
          date city  temperature  wind
10  22/05/2016   SH            4     2
SH 3
         date city  temperature  wind
7  10/04/2016   SH           19     3
8  24/04/2016   SH           20     3
9  08/05/2016   SH           17     3
SH 4
          date city  temperature  wind
6   27/03/2016   SH           -4     4
11  05/06/2016   SH          -10     4
SH 5
          date city  temperature  wind
12  19/06/2016   SH            0     5
13  03/07/2016   SH           -9     5
SZ 1
          date city  temperature  wind
18  11/09/2016   SZ           20     1
SZ 4
          date city  temperature  wind
19  25/09/2016   SZ          -10     4

你可能感兴趣的:(python数据科学)