Python: the groupby function, and computing per-group values with a custom function

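How groupby works:

(1) The grouping key can be an array, a list, a dict, a Series, or a function. An array or list key must have the same length as the axis being grouped; a dict or a function is instead applied to the index labels (or to the column labels when grouping columns) and maps each label to its group;

(2) Grouping can be done by row or by column: axis=0 groups the rows and axis=1 groups the columns. Rows are grouped by default;

(3) A function can be applied to each of the resulting groups, either a built-in one or one you define yourself;

(4) The per-group results are then combined into a single output.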
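The four points above can be sketched on a small frame with fixed numbers, so that every result can be checked by hand. The frame, the group labels 'x' and 'y', and the variable names are assumptions made up for this illustration; they are not part of the examples that follow.

```python
import pandas as pd

# Small deterministic frame (hypothetical data, chosen for easy arithmetic).
df = pd.DataFrame(
    {"key1": ["a", "a", "b", "b", "a"], "data1": [1.0, 2.0, 3.0, 4.0, 6.0]},
    index=["joe", "steve", "wes", "jim", "travis"],
)

# Key given as a Series: aligned with the data on the index.
by_series = df["data1"].groupby(df["key1"]).mean()

# Key given as a plain list: one entry per row, so the lengths must match.
by_list = df["data1"].groupby(["x", "x", "y", "y", "x"]).mean()

# Key given as a dict: maps each index label to a group name.
label_to_group = {"joe": "x", "steve": "x", "wes": "y", "jim": "y", "travis": "x"}
by_dict = df["data1"].groupby(label_to_group).mean()

# Key given as a function: called once per index label (here, the name length).
by_func = df["data1"].groupby(len).mean()

print(by_series)
print(by_list)
print(by_dict)
print(by_func)
```

The Series, list, and dict keys all describe the same split here, so the three means agree; the function key groups the rows by the length of the index label instead.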

The examples below illustrate groupby in detail.

First, create an example DataFrame:

import numpy as np
import pandas as pd


def GroupbyDemo():
    # Two key columns plus two columns of random data
    df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                       'key2': ['one', 'two', 'one', 'two', 'one'],
                       'data1': np.random.randn(5),
                       'data2': np.random.randn(5)})
    print(df)


if __name__ == '__main__':
    GroupbyDemo()

Printed output:

  key1 key2     data1     data2
0    a  one  0.921248  1.090957
1    a  two  0.211169 -1.826231
2    b  one  0.058034  0.978667
3    b  two  0.163153  0.835136
4    a  one -0.231977  0.645021

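(1) Use key1 as the grouping key, group data1 by it, and compute the mean of each group. This snippet and the ones after it assume that the df created above is in scope, for example inside GroupbyDemo:

grouped = df['data1'].groupby(df['key1']).mean()

Result (the values do not match the table printed above, presumably because the random data was regenerated on a later run):

key1
a    0.924545
b   -0.148181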

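To inspect the groups themselves, iterate over the grouped object:

grouped = df['data1'].groupby(df['key1'])
for i in grouped:
    print(i)

This prints every group. Each item is a tuple holding the group key and the part of data1 that belongs to that group.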
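Because each item is a (key, group) tuple, it can be unpacked directly in the loop. A minimal sketch, using made-up fixed values in place of the random data; the dict named pieces is an assumption added for illustration:

```python
import pandas as pd

# Hypothetical fixed data so the groups are easy to read.
df = pd.DataFrame({"key1": ["a", "a", "b", "b", "a"],
                   "data1": [1.0, 2.0, 3.0, 4.0, 6.0]})

grouped = df["data1"].groupby(df["key1"])

pieces = {}
for name, group in grouped:
    # name is the group key; group is the sub-Series for that key
    print(name, group.tolist())
    pieces[name] = group
```

Collecting the groups in a dict like this is convenient when a single group needs to be looked at on its own.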

(2) Use both key1 and key2 as grouping keys for data1 and compute the mean:

grouped = df['data1'].groupby([df['key1'], df['key2']]).mean()

Result:

key1  key2
a     one    -0.276938
      two     1.882745
b     one    -0.679474
      two    -0.269018
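The result of grouping by two keys is a Series with a two-level index. As a sketch of how such a result can be handled, the example below uses fixed values (an assumption, so the means can be verified by hand), passes the column names instead of the Series themselves, and then calls unstack to turn the inner key level into columns:

```python
import pandas as pd

# Hypothetical fixed data in the same shape as the example above.
df = pd.DataFrame({"key1": ["a", "a", "b", "b", "a"],
                   "key2": ["one", "two", "one", "two", "one"],
                   "data1": [1.0, 2.0, 3.0, 4.0, 6.0]})

# Grouping by column names gives the same split as passing the Series.
means = df.groupby(["key1", "key2"])["data1"].mean()
print(means)

# Move the key2 level into the columns to get a 2 x 2 table.
table = means.unstack()
print(table)
```

Whether the long or the table form is preferable depends on what is done with the result afterwards; both hold the same four means.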

All of the groupings above are by row. Grouping by column works as follows.

Create a DataFrame that contains key columns:

import numpy as np
import pandas as pd


def GroupbyDemo():
    # Numeric key columns and random data, indexed by name
    df = pd.DataFrame({'key1': [1, 2, 3, 4, 5],
                       'key2': [10, 20, 30, 40, 50],
                       'data1': np.random.randn(5),
                       'data2': np.random.randn(5)},
                      index=['joe', 'steve', 'wes', 'jim', 'travis'])
    print(df)


if __name__ == '__main__':
    GroupbyDemo()

Printed output:

        key1  key2     data1     data2
joe        1    10  1.467131  0.760701
steve      2    20  1.631652  1.518505
wes        3    30 -0.058462 -0.244320
jim        4    40 -0.595540 -2.083987
travis     5    50 -0.587168  0.795081

(1) Group by column, using a dict that maps each column name to a group label:

groupBy = {'key1': 'red', 'key2': 'red', 'data1': 'blue',
           'data2': 'blue'}
grouped = df.groupby(groupBy, axis=1).mean()
print(grouped)

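Printed output (red is the row-wise mean of key1 and key2; the blue values come from random data and do not match the table above, presumably because they were produced by a different run):

            blue   red
joe     0.016355   5.5
steve   0.379583  11.0
wes     0.474951  16.5
jim     0.692162  22.0
travis -1.670801  27.5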
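As far as I know, recent pandas releases (2.1 and later) deprecate the axis=1 argument of groupby, so the call above may emit a warning or fail depending on the installed version. One way to get the same column grouping without axis=1 is to transpose, group the rows, and transpose back. The sketch below assumes a smaller two-row frame with fixed values so the means can be checked by hand:

```python
import pandas as pd

# Hypothetical two-row frame with fixed values.
df = pd.DataFrame({"key1": [1, 2],
                   "key2": [10, 20],
                   "data1": [0.5, 1.5],
                   "data2": [1.5, 2.5]},
                  index=["joe", "steve"])

# Same column-name-to-group mapping as in the example above.
mapping = {"key1": "red", "key2": "red", "data1": "blue", "data2": "blue"}

# After the transpose the column names are row labels, so an ordinary
# row-wise groupby applies the mapping; the final .T restores the layout.
grouped = df.T.groupby(mapping).mean().T
print(grouped)
```

The transpose converts all values to a common dtype (float here), which is harmless for a mean but worth keeping in mind for frames that mix numbers and strings.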

Computing per-group values with a custom function:

import numpy as np
import pandas as pd


def peak_peak(arr):
    # Range of a group: largest value minus smallest value
    return arr.max() - arr.min()


def GroupbyDemo():
    df = pd.DataFrame({'key1': [1, 2, 1, 2, 1],
                       'key2': [10, 20, 30, 40, 50],
                       'data1': np.random.randn(5),
                       'data2': np.random.randn(5)},
                      index=['joe', 'steve', 'wes', 'jim', 'travis'])
    print(df)
    # agg calls peak_peak once per group and collects the results
    grouped = df['data1'].groupby(df['key1']).agg(peak_peak)
    print("#################################################")
    print(grouped)


if __name__ == '__main__':
    GroupbyDemo()

Printed output:

        key1  key2     data1     data2
joe        1    10 -1.064144 -1.419688
steve      2    20 -0.191633 -0.254214
wes        1    30  0.911625 -1.258709
jim        2    40  0.100250  0.445733
travis     1    50 -0.980806  1.710197
#################################################
key1
1    1.975770
2    0.291883
Name: data1, dtype: float64
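The same custom function can be passed to agg together with built-in aggregations. A minimal sketch with fixed values (an assumption, so the ranges can be checked by hand), where the names ranges and summary are made up for this illustration:

```python
import pandas as pd


def peak_peak(arr):
    # Range of a group: largest value minus smallest value
    return arr.max() - arr.min()


# Hypothetical fixed data in place of the random columns.
df = pd.DataFrame({"key1": [1, 2, 1, 2, 1],
                   "data1": [-1.0, 0.5, 2.0, 1.5, 0.0]})

# A single custom function returns one value per group.
ranges = df.groupby("key1")["data1"].agg(peak_peak)
print(ranges)

# A list of aggregations returns one column per function; the column for
# the custom function is named after the function itself.
summary = df.groupby("key1")["data1"].agg(["mean", peak_peak])
print(summary)
```

For key1 equal to 1 the range is 2.0 minus -1.0, and for key1 equal to 2 it is 1.5 minus 0.5, which is what the first print shows.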
