Python agg function: grouping data in Pandas with aggregate

30. Grouping data in Pandas: aggregation with aggregate

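Once the data has been grouped, the groups can be aggregated to compute summary statistics.

The agg function takes a function as its argument and applies that function to every column of each group.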

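import pandas as pd
import numpy as np

# Supplier IDs: three suppliers with three fruits each, plus one extra row per supplier
idx = [101, 101, 101, 102, 102, 102, 103, 103, 103]
idx += [101, 102, 103]
name = ["apple", "pearl", "orange", "apple", "pearl", "orange", "apple", "pearl", "orange"]
# The three extra rows are all apples
name += ["apple"] * 3
price = [1.0, 2.0, 3.0, 4.00, 5.0, 6.0, 7.0, 8.0, 9.0]
price += [4] * 3
df0 = pd.DataFrame({"fruit": name, "price": price, "supplier": idx})
print("*" * 30)
print(df0)
print("*" * 30)
# Group by two keys; each group is identified by a (fruit, supplier) tuple
dg1 = df0.groupby(["fruit", "supplier"])
for n, g in dg1:
    print("multiGroup on:", n, "\n|", g, "|")
print("*" * 30)
# Apply the mean to every remaining column (here only price) of each group.
# Newer pandas versions may warn here and suggest the string "mean" instead.
print(dg1.agg(np.mean))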

Program output:

******************************
     fruit  price  supplier
0    apple      1       101
1    pearl      2       101
2   orange      3       101
3    apple      4       102
4    pearl      5       102
5   orange      6       102
6    apple      7       103
7    pearl      8       103
8   orange      9       103
9    apple      4       101
10   apple      4       102
11   apple      4       103
******************************
multiGroup on: ('apple', 101)
|    fruit  price  supplier
0   apple      1       101
9   apple      4       101 |
...
multiGroup on: ('pearl', 103)
|    fruit  price  supplier
7   pearl      8       103 |
******************************
                 price
fruit  supplier
apple  101         2.5
       102         4.0
       103         5.5
orange 101         3.0
       102         6.0
       103         9.0
pearl  101         2.0
       102         5.0
       103         8.0

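Note the output for apple: each (apple, supplier) group contains two rows, so the value shown is the mean of two prices, for example (1.0 + 4.0) / 2 = 2.5 for supplier 101.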
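One way to check the apple rows is to compare the size of each group with its mean. The following is a minimal sketch that rebuilds the same data as the example above; the names `sizes` and `means` are chosen here for illustration only.

```python
import pandas as pd

# Same data as the example above: three extra apple rows priced at 4.
supplier = [101, 101, 101, 102, 102, 102, 103, 103, 103, 101, 102, 103]
fruit = ["apple", "pearl", "orange"] * 3 + ["apple"] * 3
price = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 4.0, 4.0, 4.0]
df0 = pd.DataFrame({"fruit": fruit, "price": price, "supplier": supplier})

dg1 = df0.groupby(["fruit", "supplier"])

# Number of rows in each (fruit, supplier) group.
sizes = dg1.size()
# Mean price per group, requested by the string name of the aggregation.
means = dg1["price"].agg("mean")

print(sizes.loc["apple"])  # row counts of the apple groups
print(means.loc["apple"])  # mean prices of the apple groups
```

The apple groups are the only ones with more than one row, which is why their means differ from the individual prices.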

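agg also accepts a list of functions, so several statistics such as the mean, standard deviation, minimum and sum can be computed in one call.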

import pandas as pd
import numpy as np

# Each (fruit, supplier) pair now appears twice
idx = [101, 101, 101, 102, 102, 102, 103, 103, 103]
idx += [101, 102, 103] * 3
name = ["apple", "pearl", "orange", "apple", "pearl", "orange", "apple", "pearl", "orange"]
name += ["apple"] * 3 + ["pearl"] * 3 + ["orange"] * 3
price = [4.1, 5.3, 6.3, 4.20, 5.4, 6.0, 4.5, 5.5, 6.8]
price += [4] * 3 + [5] * 3 + [6] * 3
df0 = pd.DataFrame({"fruit": name, "price": price, "supplier": idx})
print("*" * 30)
print(df0)
print("*" * 30)
dg1 = df0.groupby(["fruit", "supplier"])
# A single function produces one result column per input column
print(dg1.agg(np.mean))
print("*" * 30)
# A list of functions produces one result column per function
print(dg1.agg([np.mean, np.std, np.min, np.sum]))

Program output:

******************************
     fruit  price  supplier
0    apple    4.1       101
...
17  orange    6.0       103
******************************
                 price
fruit  supplier
apple  101        4.05
       102        4.10
       103        4.25
orange 101        6.15
       102        6.00
       103        6.40
pearl  101        5.15
       102        5.20
       103        5.25
******************************
                 price
                  mean       std  amin   sum
fruit  supplier
apple  101        4.05  0.070711     4   8.1
       102        4.10  0.141421     4   8.2
       103        4.25  0.353553     4   8.5
orange 101        6.15  0.212132     6  12.3
       102        6.00  0.000000     6  12.0
       103        6.40  0.565685     6  12.8
pearl  101        5.15  0.212132     5  10.3
       102        5.20  0.282843     5  10.4
       103        5.25  0.353553     5  10.5
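The same statistics can also be requested by name. The following is a minimal sketch, assuming a reasonably recent pandas release; it passes the strings "mean", "std", "min" and "sum" instead of the NumPy functions, and the variable name `stats` is only illustrative. The "std" aggregation is the sample standard deviation (ddof=1), which is consistent with the figures above.

```python
import pandas as pd

# Same data as the example above.
supplier = [101, 101, 101, 102, 102, 102, 103, 103, 103] + [101, 102, 103] * 3
fruit = ["apple", "pearl", "orange"] * 3 + ["apple"] * 3 + ["pearl"] * 3 + ["orange"] * 3
price = [4.1, 5.3, 6.3, 4.2, 5.4, 6.0, 4.5, 5.5, 6.8] + [4.0] * 3 + [5.0] * 3 + [6.0] * 3
df0 = pd.DataFrame({"fruit": fruit, "price": price, "supplier": supplier})

# Select the price column first so the result has flat column names.
stats = df0.groupby(["fruit", "supplier"])["price"].agg(["mean", "std", "min", "sum"])
print(stats)
```

With string names the result columns are labelled exactly as passed, so the minimum appears as min.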

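Different columns can be processed with different functions. To do this, pass agg a dictionary that maps each column of the grouped data to the function that should be applied to it.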

import pandas as pd
import numpy as np

idx = [101, 101, 101, 102, 102, 102, 103, 103, 103]
idx += [101, 102, 103] * 3
name = ["apple", "pearl", "orange", "apple", "pearl", "orange", "apple", "pearl", "orange"]
name += ["apple"] * 3 + ["pearl"] * 3 + ["orange"] * 3
price = [4.1, 5.3, 6.3, 4.20, 5.4, 6.0, 4.5, 5.5, 6.8]
price += [4] * 3 + [5] * 3 + [6] * 3
df0 = pd.DataFrame({"fruit": name, "price": price, "supplier": idx})
print("*" * 30)
print(df0)
print("*" * 30)
# Group by fruit only, so both price and supplier are aggregated
dg1 = df0.groupby(["fruit"])
print(dg1.agg(np.mean))
print("*" * 30)
print(dg1.agg([np.mean, np.std, np.min, np.sum]))
print("*" * 30)
# Dictionary form: mean of price, maximum of supplier
print(dg1.agg({"price": np.mean, "supplier": np.max}))

Program output:

******************************
     fruit  price  supplier
0    apple    4.1       101
1    pearl    5.3       101
2   orange    6.3       101
3    apple    4.2       102
4    pearl    5.4       102
5   orange    6.0       102
6    apple    4.5       103
7    pearl    5.5       103
8   orange    6.8       103
9    apple    4.0       101
10   apple    4.0       102
11   apple    4.0       103
12   pearl    5.0       101
13   pearl    5.0       102
14   pearl    5.0       103
15  orange    6.0       101
16  orange    6.0       102
17  orange    6.0       103
******************************
           price  supplier
fruit
apple   4.133333       102
orange  6.183333       102
pearl   5.200000       102
******************************
           price                        supplier
            mean       std  amin   sum      mean       std  amin  sum
fruit
apple   4.133333  0.196638     4  24.8       102  0.894427   101  612
orange  6.183333  0.325064     6  37.1       102  0.894427   101  612
pearl   5.200000  0.228035     5  31.2       102  0.894427   101  612
******************************
        supplier     price
fruit
apple        103  4.133333
orange       103  6.183333
pearl        103  5.200000
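The dictionary form keeps the original column names. pandas 0.25 and later also support named aggregation, where each keyword argument names an output column and gives the input column and the function to apply. The following is a minimal sketch of the same per-column aggregation; the output names `avg_price` and `max_supplier` are made up for this illustration.

```python
import pandas as pd

# Same data as the example above.
supplier = [101, 101, 101, 102, 102, 102, 103, 103, 103] + [101, 102, 103] * 3
fruit = ["apple", "pearl", "orange"] * 3 + ["apple"] * 3 + ["pearl"] * 3 + ["orange"] * 3
price = [4.1, 5.3, 6.3, 4.2, 5.4, 6.0, 4.5, 5.5, 6.8] + [4.0] * 3 + [5.0] * 3 + [6.0] * 3
df0 = pd.DataFrame({"fruit": fruit, "price": price, "supplier": supplier})

# Each keyword is an output column: (input column, aggregation name).
result = df0.groupby("fruit").agg(
    avg_price=("price", "mean"),
    max_supplier=("supplier", "max"),
)
print(result)
```

Here the output columns follow the order of the keyword arguments, and their names say which statistic each one holds.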

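agg operates on whole columns and produces one aggregated value per group. To process the individual values of the grouped columns instead, use the transform function; see the next chapter.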
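The difference in result shape can be seen in a small sketch. The data frame below is hypothetical and much smaller than the examples above; `per_group` and `per_row` are illustrative names.

```python
import pandas as pd

# Hypothetical data: two fruits with two prices each.
df = pd.DataFrame({
    "fruit": ["apple", "apple", "pearl", "pearl"],
    "price": [4.0, 5.0, 2.0, 4.0],
})
grouped = df.groupby("fruit")["price"]

# agg returns one value per group.
per_group = grouped.agg("mean")
# transform returns one value per original row, aligned with df.
per_row = grouped.transform("mean")

print(per_group)
print(per_row)
```

Because the transform result is aligned with the original rows, it can be assigned back to the data frame as a new column.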
