Pandas: Aggregating Data with groupby() (1)

Aggregating data with group operations: DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False, **kwargs)

1. by: a string or a list of strings (internal group keys)

df.groupby('subject').mean()

df.groupby(['subject']).mean()

df.groupby(['subject','teacher']).mean()
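
The calls above assume a DataFrame df that is not shown. A minimal sketch with hypothetical sample data (the teacher names and the score column are assumptions made for illustration) might look like the following. It selects the numeric column before averaging because, as far as I know, recent pandas versions (2.0 and later) raise an error when mean() meets a string column such as teacher.

```python
import pandas as pd

# Hypothetical sample data; only the "subject" and "teacher" column
# names come from the calls above, everything else is made up.
df = pd.DataFrame({
    "subject": ["math", "math", "art", "art"],
    "teacher": ["Li", "Wang", "Li", "Li"],
    "score": [90.0, 80.0, 70.0, 60.0],
})

# One column name as the key: one group per distinct subject.
by_subject = df.groupby("subject")["score"].mean()

# A list of column names as the key: one group per (subject, teacher) pair.
by_both = df.groupby(["subject", "teacher"])["score"].mean()

print(by_subject)
print(by_both)
```

The second result is indexed by a two-level MultiIndex whose levels are named after the key columns.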

2. by: a Series or a list of Series (internal group keys)

df.groupby(df['subject']).mean()

df.groupby([df['subject']]).mean()

df.groupby([df['subject'],df['teacher']]).mean()
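
Passing the column itself as a Series should give the same groups as passing its name. It also lets the key be a derived Series that is not stored in the frame. The sketch below uses hypothetical data; the score column and the passed key are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration.
df = pd.DataFrame({
    "subject": ["math", "math", "art", "art"],
    "score": [90.0, 80.0, 70.0, 60.0],
})

# Grouping by the column name and by the column Series is equivalent.
by_name = df.groupby("subject")["score"].mean()
by_series = df.groupby(df["subject"])["score"].mean()
same_result = by_name.equals(by_series)

# A derived Series works as a key too: here, whether the score is at least 75.
passed = (df["score"] >= 75).rename("passed")
counts = df.groupby(passed)["score"].count()

print(same_result)
print(counts)
```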

3. by: a list (external group keys)
import numpy as np
import pandas as pd
data = pd.DataFrame(np.random.randn(6, 6),
                    columns=["a", "b", "c", "d", "e", "f"],
                    index=["first", "second", "third", "forth", "fifth", "sixth"])

data
	a	        b	        c	        d	        e	        f
first	-0.835339	0.736753	1.081034	-1.628442	-1.757788	-1.577404
second	-1.052507	-0.256520	0.128088	-0.004466	0.371417	2.010037
third	2.731690	-0.760283	-0.394960	-0.365931	-1.296562	1.947408
forth	-0.837671	0.828986	-0.319208	0.267911	0.357810	1.075699
fifth	-0.824047	-1.276668	1.020152	0.060037	-0.386238	0.979591
sixth	-0.515060	0.140053	-0.955941	1.173538	-1.373465	-0.730290

lst = ["one", "one", "one", "two", "two", "two"]
data.groupby(lst).mean()
	a	        b	        c	        d	        e	        f
one	0.281281	-0.093350	0.271387	-0.666280	-0.894311	0.793347
two	-0.725593	-0.102543	-0.084999	0.500495	-0.467298	0.441667
# Default: axis = 0
# By default, the list is mapped onto the index (rows).

data.groupby(lst, axis=1).mean()
	one	            two
first	0.327483	-1.654545
second	-0.393647	0.792329
third	0.525482	0.094971
forth	-0.109298	0.567140
fifth	-0.360188	0.217797
sixth	-0.443650	-0.310072
# With axis = 1
# The list is mapped onto the columns.
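
Newer pandas releases have, to my knowledge, deprecated the axis argument of groupby (from about version 2.1). A hedged alternative for grouping columns is to transpose, group the rows, and transpose back. The sketch below uses deterministic values instead of random ones so the results can be checked by hand.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random frame above: row i holds 6*i .. 6*i+5.
data = pd.DataFrame(
    np.arange(36, dtype=float).reshape(6, 6),
    columns=["a", "b", "c", "d", "e", "f"],
    index=["first", "second", "third", "forth", "fifth", "sixth"],
)
lst = ["one", "one", "one", "two", "two", "two"]

# Group rows: the list is matched to the index by position.
row_means = data.groupby(lst).mean()

# Group columns without axis=1: transpose, group the rows, transpose back.
col_means = data.T.groupby(lst).mean().T

print(row_means)
print(col_means)
```

The list must have one entry per row (or, after transposing, one entry per original column), because the labels are assigned by position.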

4. by: a mapping (external group keys)
mapping = {'a':"one", 'b':"one", 'c':"one", 'd':"two", 'e':"two", 'f':"two"}
data.groupby(mapping).mean()
	a	        b	        c	        d	        e	        f
# Default: axis = 0
# The mapping is applied to the index.
# None of the index labels appear in the mapping, so the result is empty.
 
data.groupby(mapping, axis=1).mean()
	one	        two
first	0.327483	-1.654545
second	-0.393647	0.792329
third	0.525482	0.094971
forth	-0.109298	0.567140
fifth	-0.360188	0.217797
sixth	-0.443650	-0.310072
# With axis = 1
# The mapping is applied to the columns.
# The column labels appear in the mapping, so groups are formed.
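
For comparison, a mapping whose keys are row labels does produce groups on the default axis. The sketch below uses hypothetical group names ("odd", "even", "x") and a small deterministic frame. It also suggests that labels missing from the mapping are simply left out of the result, which is consistent with the empty output above.

```python
import numpy as np
import pandas as pd

# Small deterministic frame: row i holds 2*i and 2*i+1.
data = pd.DataFrame(
    np.arange(12).reshape(6, 2),
    columns=["a", "b"],
    index=["first", "second", "third", "forth", "fifth", "sixth"],
)

# Hypothetical mapping from row labels to group names.
row_mapping = {
    "first": "odd", "second": "even", "third": "odd",
    "forth": "even", "fifth": "odd", "sixth": "even",
}
by_rows = data.groupby(row_mapping).sum()

# Labels absent from the mapping get no group and are dropped.
partial_mapping = {"first": "x", "second": "x"}
by_partial = data.groupby(partial_mapping).sum()

print(by_rows)
print(by_partial)
```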

5. by: a Series (external group keys)
series_map = pd.Series(mapping)
data.groupby(series_map).mean()
	a	        b	        c	        d	        e	        f
# Default: axis = 0
# The Series is matched against the index.
# None of the index labels appear in the Series index, so the result is empty.

data.groupby(series_map, axis=1).mean()
	one	        two
first	0.327483	-1.654545
second	-0.393647	0.792329
third	0.525482	0.094971
forth	-0.109298	0.567140
fifth	-0.360188	0.217797
sixth	-0.443650	-0.310072
# With axis = 1
# The Series is matched against the columns.
# The column labels appear in the Series index, so groups are formed.
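
A Series key is matched by label, not by position, which is why the index of series_map matters above. The following sketch, with hypothetical labels and values, builds a key Series whose index is in a different order from the frame's index to illustrate that alignment.

```python
import pandas as pd

# Hypothetical three-row frame.
small = pd.DataFrame({"a": [1, 2, 4]}, index=["first", "second", "third"])

# Key Series listed in a different order than the frame's index.
row_key = pd.Series(["g2", "g1", "g1"], index=["third", "first", "second"])

# Alignment is by label: first and second fall in g1, third in g2.
totals = small.groupby(row_key)["a"].sum()

print(totals)
```

If the key were applied by position instead, "first" would land in g2, and the totals would differ.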

6. by: a function
def is_t(name):    # Return True if the label contains the letter "t".
    return "t" in name
data.groupby(is_t).mean()
	a	        b	        c	        d	        e	        f
False	-1.052507	-0.256520	0.128088	-0.004466	0.371417	2.010037
True	-0.056085	-0.066232	0.086215	-0.098578	-0.891248	0.339001

data.groupby(is_t,axis=1).mean()
	    False
first	-0.663531
second	0.199341
third	0.310227
forth	0.228921
fifth	-0.071195
sixth	-0.376861
# Default (axis = 0): the function is called on each row index label.
# With axis = 1: the function is called on each column label.

data.groupby(len).mean()
	a	        b	        c	        d	        e	        f
5	-0.056085	-0.066232	0.086215	-0.098578	-0.891248	0.339001
6	-1.052507	-0.256520	0.128088	-0.004466	0.371417	2.010037

data.groupby(len,axis=1).mean()
    	1
first	-0.663531
second	0.199341
third	0.310227
forth	0.228921
fifth	-0.071195
sixth	-0.376861
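
As the outputs above suggest, the function is called once per label, and its return value becomes the group key. A minimal sketch with a deterministic frame follows. The lambda is a made-up key function, and the column variant uses transposition rather than axis=1, on the assumption that axis is deprecated in newer pandas.

```python
import numpy as np
import pandas as pd

# Small deterministic frame: row i holds 2*i and 2*i+1.
data = pd.DataFrame(
    np.arange(12).reshape(6, 2),
    columns=["a", "b"],
    index=["first", "second", "third", "forth", "fifth", "sixth"],
)

# Group rows by the first letter of each index label.
by_initial = data.groupby(lambda label: label[0]).sum()

# Group columns by the upper-cased column label via transposition.
col_groups = data.T.groupby(str.upper).sum().T

print(by_initial)
print(col_groups)
```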

Parameters

Parameter	Type	Description
by	mapping, function, label, or list of labels	default None
axis	int	default 0
level	int, level name, or sequence of such	default None
as_index	boolean	default True
sort	boolean	default True; False can give better performance
group_keys	boolean	default True
squeeze	boolean	default False
observed	boolean	default False
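
The effect of two of these parameters, sort and as_index, can be sketched on hypothetical data (the score column and its values are assumptions made for illustration):

```python
import pandas as pd

# Hypothetical sample data; "math" appears before "art" on purpose.
df = pd.DataFrame({
    "subject": ["math", "art", "math", "art"],
    "score": [90.0, 70.0, 80.0, 60.0],
})

# sort=True (default): group keys come out in sorted order.
sorted_means = df.groupby("subject")["score"].mean()

# sort=False: group keys keep their order of first appearance.
unsorted_means = df.groupby("subject", sort=False)["score"].mean()

# as_index=False: the key stays an ordinary column instead of the index.
flat = df.groupby("subject", as_index=False).mean()

print(sorted_means)
print(unsorted_means)
print(flat)
```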

 
