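R's apply family of functions is very powerful. I used to assume Python's equivalent was a watered-down version, but it turns out to be quite powerful as well. I clearly need to read the documentation more carefully.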
Environment and package versions:
import sys
import pandas as pd
import numpy as np
print("Python version:", sys.version)
print("pandas version:", pd.__version__)
print("numpy version:", np.__version__)
Python version: 3.7.0 (default, Jun 28 2018, 08:04:48) [MSC v.1912 64 bit (AMD64)]
pandas version: 0.23.4
numpy version: 1.17.4
Parameters
func:function
Function to apply to each column or row.
axis:{0 or ‘index’, 1 or ‘columns’}, default 0
Axis along which the function is applied:
0 or ‘index’: apply the function to each column.
1 or ‘columns’: apply the function to each row.
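raw:bool, default False
Determines if row or column is passed as a Series or ndarray object:
False: passes each row or column as a Series to the function.
True: the passed function will receive ndarray objects instead.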
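result_type:{‘expand’, ‘reduce’, ‘broadcast’, None}, default None
These only act when axis=1 (columns):
‘expand’: list-like results will be turned into columns.
‘reduce’: returns a Series if possible rather than expanding list-like results. This is the opposite of ‘expand’.
‘broadcast’: results will be broadcast to the original shape of the DataFrame; the original index and columns will be retained.
None: the behaviour depends on the return value of the applied function. List-like results will be returned as a Series of those; however, if the applied function returns a Series, these are expanded to columns.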
args:tuple
Positional arguments to pass to func in addition to the array/series.
**kwds
Additional keyword arguments to pass as keyword arguments to func.
Returns
Series or DataFrame
Result of applying func along the given axis of the DataFrame.
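The parameter list mentions args, **kwds and raw, but the examples below do not use them. Here is a minimal sketch of all three. The helper scale_and_shift and its factor/offset parameters are hypothetical names chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

def scale_and_shift(col, factor, offset=0):
    # hypothetical helper; with the default axis=0, col is one column of df as a Series
    return col * factor + offset

# args supplies extra positional arguments; any other keyword argument is forwarded to func
shifted = df.apply(scale_and_shift, args=(2,), offset=1)
print(shifted)

# raw=True hands func a plain NumPy ndarray instead of a Series
got_ndarray = df.apply(lambda row: isinstance(row, np.ndarray), axis=1, raw=True)
got_series = df.apply(lambda row: isinstance(row, pd.Series), axis=1)
print(got_ndarray.all(), got_series.all())  # True True
```

Every value in column A becomes 4 * 2 + 1 = 9, and every value in column B becomes 9 * 2 + 1 = 19.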
df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
df
| | A | B |
|---|---|---|
| 0 | 4 | 9 |
| 1 | 4 | 9 |
| 2 | 4 | 9 |
df.apply(np.sqrt)
| | A | B |
|---|---|---|
| 0 | 2.0 | 3.0 |
| 1 | 2.0 | 3.0 |
| 2 | 2.0 | 3.0 |
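np.sqrt is an element-wise NumPy ufunc, so apply is not strictly needed here. A small check, assuming a standard pandas/NumPy install, that calling the ufunc on the DataFrame directly gives the same labelled result:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

via_apply = df.apply(np.sqrt)
direct = np.sqrt(df)  # NumPy ufuncs accept a DataFrame and keep its index and columns
print(via_apply.equals(direct))  # True
```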
Row sums:
df.apply(np.sum,axis=1)
0 13
1 13
2 13
dtype: int64
Largest element in each row:
df.apply(np.max,axis=1)
0 9
1 9
2 9
dtype: int64
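Simple row-wise reductions like these also have built-in DataFrame methods. The following sketch compares the two approaches on the same data, assuming the defaults shown above. The built-in methods are usually the simpler choice when they exist.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# Row-wise reductions through apply ...
row_sum = df.apply(np.sum, axis=1)
row_max = df.apply(np.max, axis=1)

# ... and the built-in DataFrame methods that compute the same values directly
print(row_sum.equals(df.sum(axis=1)))  # True
print(row_max.equals(df.max(axis=1)))  # True
```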
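Passing result_type="expand" has no visible effect here. np.sum returns one scalar per column, so there is nothing list-like to expand, and the result is the same Series of column sums:
df.apply(np.sum,axis=0,result_type="expand")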
A 12
B 27
dtype: int64
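To confirm that "expand" changes nothing when the function returns scalars, the sketch below compares the call with and without it. It assumes the same toy DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

default = df.apply(np.sum, axis=0)
expanded = df.apply(np.sum, axis=0, result_type="expand")
# np.sum returns one scalar per column, so there is nothing list-like to expand
print(default.equals(expanded))  # True
```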
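With result_type="broadcast", each column's sum is broadcast back to the original shape of the DataFrame, keeping the original index and columns:
df.apply(np.sum,axis=0,result_type="broadcast")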
| | A | B |
|---|---|---|
| 0 | 12 | 27 |
| 1 | 12 | 27 |
| 2 | 12 | 27 |
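To make "broadcast back to the original shape" concrete, this sketch builds the same table by hand. It repeats the vector of column sums once per row with np.tile. The by_hand name is only illustrative, and .values is used instead of .to_numpy() so the code also runs on pandas 0.23.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

broadcast = df.apply(np.sum, axis=0, result_type="broadcast")

# Same table built manually: one copy of the column sums for every original row
by_hand = pd.DataFrame(np.tile(df.sum().values, (len(df), 1)),
                       index=df.index, columns=df.columns)
print((broadcast == by_hand).all().all())  # True
```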
Some operations are more involved, for example combining specific columns within each row. In that case you can pass in your own function. For example:
def test_f(row):
    return row["A"] + 10 - row["B"]
df.apply(test_f,axis=1)
0 5
1 5
2 5
dtype: int64
def test_f2(row):
    return [1, 2, 3, 4]
df.apply(test_f2,axis=1)
0 [1, 2, 3, 4]
1 [1, 2, 3, 4]
2 [1, 2, 3, 4]
dtype: object
df.apply(test_f2,axis=1,result_type="expand")
| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 |
| 1 | 1 | 2 | 3 | 4 |
| 2 | 1 | 2 | 3 | 4 |
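"expand" labels the new columns 0, 1, 2, 3. According to the description of result_type=None above, a function that returns a pandas Series is expanded into columns named after that Series' index. A minimal sketch, with a hypothetical helper sum_and_diff:

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

def sum_and_diff(row):
    # hypothetical helper; the Series index becomes the column names of the result
    return pd.Series([row["A"] + row["B"], row["B"] - row["A"]],
                     index=["total", "diff"])

named = df.apply(sum_and_diff, axis=1)
print(named)
```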
df.apply(test_f2,axis=1,result_type="reduce")
0 [1, 2, 3, 4]
1 [1, 2, 3, 4]
2 [1, 2, 3, 4]
dtype: object
def test_f3(row):
    return [1, 2]
df.apply(test_f3,axis=1,result_type="broadcast")
| | A | B |
|---|---|---|
| 0 | 1 | 2 |
| 1 | 1 | 2 |
| 2 | 1 | 2 |
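With "broadcast", a list-like return value must have the same length as the number of columns (two here). A scalar is also broadcast across the row, but a list of any other length raises a ValueError.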
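A small sketch of that length rule, assuming a pandas version whose broadcast check behaves as described above. The helper three_values is hypothetical. Only the exception type is relied on, not its message.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

def three_values(row):
    # hypothetical helper: df has two columns, so three values cannot fill a row
    return [1, 2, 3]

try:
    df.apply(three_values, axis=1, result_type="broadcast")
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)

# A scalar return value is broadcast to every column of its row
zeros = df.apply(lambda row: 0, axis=1, result_type="broadcast")
print(zeros)
```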
2020-07-08, written at Jiulonghu, Jiangning District, Nanjing