Pandas - Aggregation

Applying aggregation to a DataFrame

Example:

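import pandas as pd
import numpy as np

# 10 rows of random data indexed by consecutive days
df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('1/1/2019', periods=10),
                  columns=['A', 'B', 'C', 'D'])

print(df)
print("=======================================")
# Rolling window of 3 rows; min_periods=1 allows partial windows at the start
r = df.rolling(window=3, min_periods=1)
print(r)

The output is as follows: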

                   A         B         C         D
2019-01-01  0.439879 -0.620716  0.384183 -0.745009
2019-01-02 -0.739876 -1.496333 -0.303799  0.986643
2019-01-03 -0.987521  0.582238 -0.533543 -0.276241
2019-01-04 -1.907731  0.291339  0.454158  0.299288
2019-01-05  1.336021  0.930051 -1.251177  0.148594
2019-01-06 -0.149214  0.490910 -0.087143 -1.070752
2019-01-07 -1.522815 -0.269420  0.086573 -0.622118
2019-01-08  1.506759  1.024990 -1.706531  1.464352
2019-01-09 -0.615030  0.500708 -0.414950 -1.003106
2019-01-10 -0.158443 -1.024776 -1.423664 -1.277663
=======================================
Rolling [window=3,min_periods=1,center=False,axis=0]
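To make the role of min_periods concrete, here is a minimal sketch on a small fixed Series (the values 1 to 5 are illustrative and are not taken from the example above). With min_periods=1 a result is produced as soon as one observation is in the window; with the default for an integer window, the first two rows are NaN because their windows are incomplete.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Partial windows allowed: rows 0 and 1 use 1 and 2 values respectively.
partial = s.rolling(window=3, min_periods=1).sum()

# Default min_periods for a fixed integer window is the window size,
# so rows 0 and 1 are NaN.
strict = s.rolling(window=3).sum()

print(partial.tolist())  # [1.0, 3.0, 6.0, 9.0, 12.0]
print(strict.tolist())   # [nan, nan, 6.0, 9.0, 12.0]
```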

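You can aggregate by passing a function to the rolling object built from the whole DataFrame, or first select a column with the standard get-item syntax (for example r['A']) and aggregate only that column.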


Applying aggregation to the whole DataFrame

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('1/1/2000', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)

r = df.rolling(window=3, min_periods=1)
# Rolling sum over every column
print(r.aggregate(np.sum))

The output is as follows:

                   A         B         C         D
2000-01-01  0.773141  2.292985 -0.904677  0.242353
2000-01-02 -1.925769  0.728835 -2.424622 -1.171005
2000-01-03 -0.111028 -0.078429 -1.549092 -0.847151
2000-01-04 -1.501588  1.198424  0.656210 -0.626713
2000-01-05 -0.294595 -0.641386  0.102715  0.413191
2000-01-06 -0.335482  1.506107  0.922520  0.481978
2000-01-07  0.196372  0.859856  0.063748 -1.007660
2000-01-08  0.758099  0.669792 -0.549431  2.022156
2000-01-09  1.894996 -0.454106 -0.752306 -0.653473
2000-01-10 -0.293079 -0.900467 -0.667537  0.115899

                   A         B         C         D
2000-01-01  0.773141  2.292985 -0.904677  0.242353
2000-01-02 -1.152628  3.021820 -3.329299 -0.928653
2000-01-03 -1.263657  2.943391 -4.878391 -1.775804
2000-01-04 -3.538385  1.848830 -3.317504 -2.644870
2000-01-05 -1.907211  0.478608 -0.790168 -1.060673
2000-01-06 -2.131665  2.063145  1.681445  0.268455
2000-01-07 -0.433705  1.724576  1.088983 -0.112492
2000-01-08  0.618989  3.035755  0.436837  1.496473
2000-01-09  2.849466  1.075542 -1.237989  0.361023
2000-01-10  2.360015 -0.684781 -1.969273  1.484582
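The numbers above change on every run because the input is random. The following sketch is a reproducible variant under two assumptions of mine: a seeded generator (np.random.default_rng(0), seed chosen arbitrarily) replaces np.random.randn, and the string 'sum' is used in place of np.sum as an alternative spelling that dispatches to the built-in rolling sum. It also cross-checks one cell by hand.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
df = pd.DataFrame(rng.standard_normal((10, 4)),
                  index=pd.date_range('2000-01-01', periods=10),
                  columns=['A', 'B', 'C', 'D'])

r = df.rolling(window=3, min_periods=1)

by_name = r.aggregate('sum')  # aggregation selected by name
direct = r.sum()              # the built-in rolling method

# Both routes should produce the same rolling sums.
pd.testing.assert_frame_equal(by_name, direct)

# Row 2 of column 'A' covers rows 0..2, so it should equal their plain sum.
manual = df['A'].iloc[0:3].sum()
print(np.isclose(by_name['A'].iloc[2], manual))
```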

Applying aggregation to a single column of a DataFrame

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 4),
                  index=pd.date_range('1/1/2000', periods=5),
                  columns=['A', 'B', 'C', 'D'])
print(df)
print("====================================")
r = df.rolling(window=3, min_periods=1)
# Select column 'A' on the rolling object; the result is a Series
print(r['A'].aggregate(np.sum))

The output is as follows:

                   A         B         C         D
2000-01-01 -0.274997  0.407783  0.376416 -0.379465
2000-01-02 -1.168787  0.447412  0.595449 -0.251049
2000-01-03  1.894652 -1.213812 -0.179113  0.373034
2000-01-04  0.573073 -0.564909  0.112975  0.870387
2000-01-05  0.430524  1.020612 -1.200894 -0.030805
====================================
2000-01-01   -0.274997
2000-01-02   -1.443784
2000-01-03    0.450868
2000-01-04    1.298938
2000-01-05    2.898250
Freq: D, Name: A, dtype: float64
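As a small illustration of the column selection used above, the sketch below (with made-up values, not the random data from the example) compares selecting the column on the rolling object with selecting it on the DataFrame before rolling. Both orders are expected to give the same Series.

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0],
                   'B': [10.0, 20.0, 30.0, 40.0]})

# Select the column on the rolling object ...
from_rolling = df.rolling(window=2, min_periods=1)['A'].sum()
# ... or select the column first and then roll.
from_column = df['A'].rolling(window=2, min_periods=1).sum()

print(from_rolling.tolist())             # [1.0, 3.0, 5.0, 7.0]
print(from_rolling.equals(from_column))  # True
```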

Applying aggregation to multiple columns of a DataFrame

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 4),
                  index=pd.date_range('1/1/2018', periods=5),
                  columns=['A', 'B', 'C', 'D'])
print(df)
print("==========================================")
r = df.rolling(window=3, min_periods=1)
# A list of column names selects several columns; the result is a DataFrame
print(r[['A', 'B']].aggregate(np.sum))

The output is as follows:

                   A         B         C         D
2018-01-01  0.396790  1.047652 -0.366593 -1.007342
2018-01-02  0.347995  0.760883 -2.792697  0.275284
2018-01-03 -1.683239 -0.106222  1.288306  1.401471
2018-01-04  0.693949  0.940266  0.567763 -0.266491
2018-01-05 -1.622852 -2.358109  0.004795  0.741159
==========================================
                   A         B
2018-01-01  0.396790  1.047652
2018-01-02  0.744784  1.808535
2018-01-03 -0.938454  1.702312
2018-01-04 -0.641295  1.594927
2018-01-05 -2.612142 -1.524065

Applying multiple functions to a single column of a DataFrame

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 4),
                  index=pd.date_range('2019/01/01', periods=5),
                  columns=['A', 'B', 'C', 'D'])
print(df)

print("==========================================")

r = df.rolling(window=3, min_periods=1)
# A list of functions yields one result column per function
print(r['A'].aggregate([np.sum, np.mean]))

The output is as follows:

                   A         B         C         D
2019-01-01 -0.152684  0.579792 -0.931197 -1.284427
2019-01-02  1.236191  0.147896  0.143260  0.696489
2019-01-03  0.579045  0.659531  0.806744 -1.222311
2019-01-04 -0.307059  2.119788 -1.640019  0.116214
2019-01-05 -2.199344  0.223366  0.722573  0.736895
==========================================
                 sum      mean
2019-01-01 -0.152684 -0.152684
2019-01-02  1.083508  0.541754
2019-01-03  1.662552  0.554184
2019-01-04  1.508177  0.502726
2019-01-05 -1.927358 -0.642453
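The list passed to aggregate can presumably also mix a built-in name with a user-defined function. The sketch below is an illustration under assumptions: value_range is a hypothetical helper that does not appear in the original, the data is made up, and the result column is expected to take the function's name.

```python
import pandas as pd

def value_range(window):
    """Hypothetical helper: spread between the largest and smallest value."""
    return window.max() - window.min()

s = pd.Series([1.0, 4.0, 2.0, 8.0], name='A')
r = s.rolling(window=2, min_periods=1)

# One output column per entry in the list.
result = r.aggregate(['sum', value_range])

print(result['sum'].tolist())          # [1.0, 5.0, 6.0, 10.0]
print(result['value_range'].tolist())  # [0.0, 3.0, 2.0, 6.0]
```

A custom Python function is evaluated window by window, so on large data it will likely be slower than the built-in aggregations.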

Applying multiple functions to multiple columns of a DataFrame

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 4),
                  index=pd.date_range('2020/01/01', periods=5),
                  columns=['A', 'B', 'C', 'D'])

print(df)
print("==========================================")
r = df.rolling(window=3, min_periods=1)
# Several columns and several functions give two-level column labels
print(r[['A', 'B']].aggregate([np.sum, np.mean]))

The output is as follows:

                   A         B         C         D
2020-01-01 -0.028071 -0.420028 -1.349171  1.143043
2020-01-02 -1.453048  0.713694  0.582064  0.254938
2020-01-03  0.102893  1.160601 -0.043761 -0.295737
2020-01-04  0.241136  0.151186 -0.282294 -0.667249
2020-01-05 -1.208541  0.301091 -0.121198  0.650411
==========================================
                   A                   B          
                 sum      mean       sum      mean
2020-01-01 -0.028071 -0.028071 -0.420028 -0.420028
2020-01-02 -1.481119 -0.740559  0.293666  0.146833
2020-01-03 -1.378226 -0.459409  1.454267  0.484756
2020-01-04 -1.109019 -0.369673  2.025481  0.675160
2020-01-05 -0.864512 -0.288171  1.612878  0.537626
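The result above has two-level column labels, one level for the source column and one for the function. The following sketch, on made-up data and with string function names, shows one way to pick a single (column, function) pair and one possible way to flatten the labels. The underscore naming scheme is my own choice, not something the original prescribes.

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 6.0, 8.0]})
result = df.rolling(window=2, min_periods=1).aggregate(['sum', 'mean'])

# Each column label is a (column, function) pair.
print(result.columns.tolist())
# [('A', 'sum'), ('A', 'mean'), ('B', 'sum'), ('B', 'mean')]

# Pick one pair with a tuple key.
a_mean = result[('A', 'mean')]
print(a_mean.tolist())  # [1.0, 1.5, 2.5]

# Flatten the two levels into single strings such as 'A_sum'.
flat = result.copy()
flat.columns = ['_'.join(pair) for pair in flat.columns]
print(flat.columns.tolist())  # ['A_sum', 'A_mean', 'B_sum', 'B_mean']
```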

Applying different functions to different columns of a DataFrame

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(3, 4),
                  index=pd.date_range('2020/01/01', periods=3),
                  columns=['A', 'B', 'C', 'D'])
print(df)
print("==========================================")
r = df.rolling(window=3, min_periods=1)
# A dict maps each column to its own function; unlisted columns are dropped
print(r.aggregate({'A': np.sum, 'B': np.mean}))

The output is as follows:

                   A         B         C         D
2020-01-01  0.412677  0.120010  0.074705 -0.735974
2020-01-02  0.508431 -0.383076 -0.754675 -0.509018
2020-01-03 -0.681609  0.405636 -0.451324  0.646307
==========================================
                   A         B
2020-01-01  0.412677  0.120010
2020-01-02  0.921107 -0.131533
2020-01-03  0.239498  0.047523
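The dict form can also map a column to a list of functions. The sketch below uses made-up data and string function names, and the extra column C is there only to show that columns missing from the dict do not appear in the result.

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0],
                   'B': [4.0, 6.0, 8.0],
                   'C': [0.0, 0.0, 0.0]})
r = df.rolling(window=2, min_periods=1)

# Each key is a column; each value lists the functions for that column.
result = r.aggregate({'A': ['sum', 'min'], 'B': ['mean']})

print(result.columns.tolist())
# [('A', 'sum'), ('A', 'min'), ('B', 'mean')]
print(result[('A', 'min')].tolist())   # [1.0, 1.0, 2.0]
print(result[('B', 'mean')].tolist())  # [4.0, 5.0, 7.0]
```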
