PyPackage01---Pandas10: Using the apply Method

Intro

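  The apply family of functions in R is very powerful. I used to assume the Python counterpart was a stripped-down version, but it turns out to be just as capable. I really should read the documentation more often.
Environment and package information: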

import sys
import pandas as pd
import numpy as np 
print("Python version:", sys.version)
print("pandas version:", pd.__version__)
print("numpy version:", np.__version__)
Python version: 3.7.0 (default, Jun 28 2018, 08:04:48) [MSC v.1912 64 bit (AMD64)]
pandas version: 0.23.4
numpy version: 1.17.4

Parameter description

Parameters
func:function
Function to apply to each column or row.

axis:{0 or ‘index’, 1 or ‘columns’}, default 0
Axis along which the function is applied:

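  • 0 or ‘index’: apply function to each column, i.e. each column is passed to func in turn.
  • 1 or ‘columns’: apply function to each row. The name is confusing at first: the function is applied along the column axis, so what it receives is one row at a time.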
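
The axis naming is easier to remember after checking what the function actually receives. The sketch below is only an illustration: `size_of` is a hypothetical helper, and the small DataFrame is the same one used in the examples further down.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

def size_of(values):
    # Hypothetical helper: report how many elements apply() passed in.
    return len(values)

# axis=0: func is called once per column, so each call sees all 3 rows.
col_lengths = df.apply(size_of, axis=0)   # A -> 3, B -> 3

# axis=1: func is called once per row, so each call sees the 2 columns.
row_lengths = df.apply(size_of, axis=1)   # 0 -> 2, 1 -> 2, 2 -> 2
```

The index of the result tells the same story: with axis=0 the result is labelled by column names, with axis=1 by row labels.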

raw:bool, default False
Determines if row or column is passed as a Series or ndarray object:

  • False : passes each row or column as a Series to the function.
  • True : the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance.
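  • Note: for a plain NumPy reduction such as np.sum the result is the same either way, so this parameter rarely affects everyday use. It matters when func relies on Series features such as label-based indexing, which an ndarray does not have.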
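
To see what raw actually changes, it helps to inspect the object handed to the function. This is a minimal sketch, assuming the same small DataFrame as in the examples of this post; `is_ndarray` and `by_position` are hypothetical helper names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

def is_ndarray(values):
    # Hypothetical helper: True when apply() passed a bare NumPy array.
    return isinstance(values, np.ndarray)

default_rows = df.apply(is_ndarray, axis=1)          # raw=False: each row is a Series
raw_rows = df.apply(is_ndarray, axis=1, raw=True)    # raw=True: each row is an ndarray

def by_position(values):
    # With raw=True there are no labels, so use positions instead of "A"/"B".
    return values[0] + 10 - values[1]

positional = df.apply(by_position, axis=1, raw=True)  # 4 + 10 - 9 = 5 for every row
```

In other words, a function written as row["A"] + 10 - row["B"] needs the default raw=False, while a purely numeric function can take the ndarray path.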

result_type:{‘expand’, ‘reduce’, ‘broadcast’, None}, default None
These only act when axis=1 (columns):

  • ‘expand’ : list-like results will be turned into columns.
  • ‘reduce’ : returns a Series if possible rather than expanding list-like results. This is the opposite of ‘expand’.
  • ‘broadcast’ : results will be broadcast to the original shape of the DataFrame, the original index and columns will be retained.
    The default behaviour (None) depends on the return value of the applied function: list-like results will be returned as a Series of those. However if the apply function returns a Series these are expanded to columns.
    New in version 0.23.0.
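    Author's note: this controls the shape of the returned result. In the examples further down, ‘expand’ turns a returned list into columns, ‘reduce’ keeps it as a single Series of lists, and ‘broadcast’ forces the result back into the original shape.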
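
The default behaviour (result_type=None) described above depends on what the function returns. The sketch below contrasts a list return with a Series return; `as_list` and `as_series` are hypothetical helpers, and the column names "total" and "diff" are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

def as_list(row):
    # A list-like return value is kept as one object per row.
    return [row["A"], row["B"], row["A"] + row["B"]]

def as_series(row):
    # A Series return value is expanded: its index becomes the column names.
    return pd.Series({"total": row["A"] + row["B"], "diff": row["B"] - row["A"]})

lists = df.apply(as_list, axis=1)     # a Series whose elements are lists
frame = df.apply(as_series, axis=1)   # a DataFrame with columns "total" and "diff"
```

Returning a Series is a convenient way to get named columns without having to pass result_type="expand".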

args:tuple
Positional arguments to pass to func in addition to the array/series.

**kwds:
Additional keyword arguments to pass to func.
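
Extra arguments are forwarded to func after the row or column itself. A minimal sketch follows; `scaled_range`, `factor` and `offset` are hypothetical names chosen for this example.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

def scaled_range(values, factor, offset=0):
    # values is the row (a Series); factor and offset come from apply().
    return (values.max() - values.min()) * factor + offset

# args supplies extra positional arguments, keyword arguments are passed through.
result = df.apply(scaled_range, axis=1, args=(2,), offset=1)
# Each row: (9 - 4) * 2 + 1 = 11
```

Note that args must be a tuple, so a single extra argument is written as (2,).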

Returns
Series or DataFrame
Result of applying func along the given axis of the DataFrame.

Operating on all elements

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
df
   A  B
0  4  9
1  4  9
2  4  9
df.apply(np.sqrt)
     A    B
0  2.0  3.0
1  2.0  3.0
2  2.0  3.0
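
For a NumPy ufunc such as np.sqrt, going through apply is not strictly necessary, since the ufunc can be called on the DataFrame directly. The sketch below checks that both routes agree on this small example; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

via_apply = df.apply(np.sqrt)   # element-wise square root through apply
direct = np.sqrt(df)            # the same ufunc called on the DataFrame itself

same_result = via_apply.equals(direct)
```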

Row operations

Row sums

df.apply(np.sum,axis=1)
0    13
1    13
2    13
dtype: int64

Taking the largest element in each row

df.apply(np.max,axis=1)
0    9
1    9
2    9
dtype: int64
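
When only a standard reduction is needed, the built-in DataFrame methods should give the same values as apply and are usually the more direct choice. A small comparison, assuming the same DataFrame as above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# Row-wise sum: through apply and through the built-in method.
sum_via_apply = df.apply(np.sum, axis=1)
sum_builtin = df.sum(axis=1)

# Row-wise maximum: same comparison.
max_via_apply = df.apply(np.max, axis=1)
max_builtin = df.max(axis=1)
```

apply becomes worthwhile once the per-row logic is something the built-in methods do not cover.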

Column operations

df.apply(np.sum,axis=0,result_type="expand")
A    12
B    27
dtype: int64
df.apply(np.sum,axis=0,result_type="broadcast")
    A   B
0  12  27
1  12  27
2  12  27

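Other, more complex operations

Some operations are more involved, for example computing on a chosen subset of columns within each row. In that case, simply pass in your own function. For example: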

def test_f(row):
    return row["A"]+10-row["B"]
df.apply(test_f,axis=1)
0    5
1    5
2    5
dtype: int64
def test_f2(row):
    return [1,2,3,4]
df.apply(test_f2,axis=1)
0    [1, 2, 3, 4]
1    [1, 2, 3, 4]
2    [1, 2, 3, 4]
dtype: object
df.apply(test_f2,axis=1,result_type="expand")
   0  1  2  3
0  1  2  3  4
1  1  2  3  4
2  1  2  3  4
df.apply(test_f2,axis=1,result_type="reduce")
0    [1, 2, 3, 4]
1    [1, 2, 3, 4]
2    [1, 2, 3, 4]
dtype: object
def test_f3(row):
    return [1,2]
df.apply(test_f3,axis=1,result_type="broadcast")
   A  B
0  1  2
1  1  2
2  1  2

‘broadcast’ appears to work only when the lengths match, i.e. the length of the returned list must equal the number of columns.
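
The length restriction can be checked directly. In this sketch, `two_values` and `four_values` are hypothetical functions; the assumption being tested is that pandas raises a ValueError when the returned list does not fit the original shape.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

def two_values(row):
    # Same length as the number of columns, so it fits the original shape.
    return [1, 2]

def four_values(row):
    # Longer than the number of columns, so it cannot be broadcast.
    return [1, 2, 3, 4]

fits = df.apply(two_values, axis=1, result_type="broadcast")

try:
    df.apply(four_values, axis=1, result_type="broadcast")
    mismatch_error = None
except ValueError as exc:
    # Raised because the 4-element result does not match the 2 columns.
    mismatch_error = exc
```

For a list of a different length, result_type="expand" is the option that creates new columns instead.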

Ref

[1] https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
[2] https://blog.csdn.net/qq_19528953/article/details/79348929

                             2020-07-08, Jiulonghu, Jiangning District, Nanjing
