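Pandas - Aggregation
Applying Aggregations on a DataFrame
Example:
import pandas as pd
import numpy as np

# 10 days of random data in four columns
df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('1/1/2019', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)
print("=======================================")
# Rolling window of 3 rows; min_periods=1 allows partial windows at the start
r = df.rolling(window=3, min_periods=1)
print(r)  # prints a description of the Rolling object, not data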
Output:
                   A         B         C         D
2019-01-01  0.439879 -0.620716  0.384183 -0.745009
2019-01-02 -0.739876 -1.496333 -0.303799  0.986643
2019-01-03 -0.987521  0.582238 -0.533543 -0.276241
2019-01-04 -1.907731  0.291339  0.454158  0.299288
2019-01-05  1.336021  0.930051 -1.251177  0.148594
2019-01-06 -0.149214  0.490910 -0.087143 -1.070752
2019-01-07 -1.522815 -0.269420  0.086573 -0.622118
2019-01-08  1.506759  1.024990 -1.706531  1.464352
2019-01-09 -0.615030  0.500708 -0.414950 -1.003106
2019-01-10 -0.158443 -1.024776 -1.423664 -1.277663
=======================================
Rolling [window=3,min_periods=1,center=False,axis=0]
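An aggregation can be applied by passing a function to the entire Rolling object, or to a single column selected with the standard get-item method (for example, r['A']).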
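The two options described above can be sketched on a small deterministic frame. The values below are illustrative and not taken from the original examples; the string 'sum' is used as the aggregation name, which pandas maps to its built-in rolling sum.

```python
import pandas as pd

# Small deterministic frame (illustrative values, not from the original text)
df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]},
    index=pd.date_range("2020-01-01", periods=4),
)
r = df.rolling(window=3, min_periods=1)

whole = r.aggregate("sum")        # every column is aggregated -> DataFrame
single = r["A"].aggregate("sum")  # one selected column        -> Series

print(type(whole).__name__)   # DataFrame
print(type(single).__name__)  # Series
print(single.tolist())        # [1.0, 3.0, 6.0, 9.0]
```

The result type follows the selection: aggregating the whole object yields a DataFrame, while a single selected column yields a Series.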
Applying an Aggregation to the Whole DataFrame
Example:
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('1/1/2000', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)
r = df.rolling(window=3, min_periods=1)
# Rolling sum over every column
print(r.aggregate(np.sum))
Output:
                   A         B         C         D
2000-01-01  0.773141  2.292985 -0.904677  0.242353
2000-01-02 -1.925769  0.728835 -2.424622 -1.171005
2000-01-03 -0.111028 -0.078429 -1.549092 -0.847151
2000-01-04 -1.501588  1.198424  0.656210 -0.626713
2000-01-05 -0.294595 -0.641386  0.102715  0.413191
2000-01-06 -0.335482  1.506107  0.922520  0.481978
2000-01-07  0.196372  0.859856  0.063748 -1.007660
2000-01-08  0.758099  0.669792 -0.549431  2.022156
2000-01-09  1.894996 -0.454106 -0.752306 -0.653473
2000-01-10 -0.293079 -0.900467 -0.667537  0.115899
                   A         B         C         D
2000-01-01  0.773141  2.292985 -0.904677  0.242353
2000-01-02 -1.152628  3.021820 -3.329299 -0.928653
2000-01-03 -1.263657  2.943391 -4.878391 -1.775804
2000-01-04 -3.538385  1.848830 -3.317504 -2.644870
2000-01-05 -1.907211  0.478608 -0.790168 -1.060673
2000-01-06 -2.131665  2.063145  1.681445  0.268455
2000-01-07 -0.433705  1.724576  1.088983 -0.112492
2000-01-08  0.618989  3.035755  0.436837  1.496473
2000-01-09  2.849466  1.075542 -1.237989  0.361023
2000-01-10  2.360015 -0.684781 -1.969273  1.484582
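In the second table above, the first two rows are sums over fewer than three values because min_periods=1 allows partial windows. The following sketch, on an illustrative Series that is not part of the original example, contrasts this with the default min_periods and recomputes the rolling sum by hand.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])  # illustrative values

# min_periods=1: partial windows at the start still produce a value
lenient = s.rolling(window=3, min_periods=1).sum()
# Default min_periods for an integer window equals the window size,
# so the first two rows are NaN
strict = s.rolling(window=3).sum()

# Manual check: sum the current value and up to two preceding values
manual = [float(s.iloc[max(0, i - 2): i + 1].sum()) for i in range(len(s))]

print(lenient.tolist())  # [1.0, 3.0, 6.0, 9.0, 12.0]
print(strict.tolist())   # [nan, nan, 6.0, 9.0, 12.0]
print(manual)            # [1.0, 3.0, 6.0, 9.0, 12.0]
```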
Applying an Aggregation to a Single Column of the DataFrame
Example:
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 4),
                  index=pd.date_range('1/1/2000', periods=5),
                  columns=['A', 'B', 'C', 'D'])
print(df)
print("====================================")
r = df.rolling(window=3, min_periods=1)
# Select column A from the Rolling object, then take the rolling sum
print(r['A'].aggregate(np.sum))
Output:
                   A         B         C         D
2000-01-01 -0.274997  0.407783  0.376416 -0.379465
2000-01-02 -1.168787  0.447412  0.595449 -0.251049
2000-01-03  1.894652 -1.213812 -0.179113  0.373034
2000-01-04  0.573073 -0.564909  0.112975  0.870387
2000-01-05  0.430524  1.020612 -1.200894 -0.030805
====================================
2000-01-01   -0.274997
2000-01-02   -1.443784
2000-01-03    0.450868
2000-01-04    1.298938
2000-01-05    2.898250
Freq: D, Name: A, dtype: float64
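As a hedged aside, the same single-column rolling sum can also be requested with the string name 'sum' or by calling the sum method directly. Some recent pandas releases emit a FutureWarning when a NumPy function such as np.sum is passed and suggest the string form instead. The frame below uses illustrative values.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0, 5.0], "B": [5.0, 4.0, 3.0, 2.0, 1.0]},
    index=pd.date_range("2000-01-01", periods=5),
)
r = df.rolling(window=3, min_periods=1)

via_aggregate = r["A"].aggregate("sum")  # string name of the built-in rolling sum
via_agg = r["A"].agg("sum")              # agg is an alias of aggregate
via_method = r["A"].sum()                # direct method call

print(via_aggregate.equals(via_method))  # True
print(via_agg.equals(via_method))        # True
print(via_aggregate.name)                # A
```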
Applying an Aggregation to Multiple Columns of a DataFrame
Example:
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 4),
                  index=pd.date_range('1/1/2018', periods=5),
                  columns=['A', 'B', 'C', 'D'])
print(df)
print("==========================================")
r = df.rolling(window=3, min_periods=1)
# A list of column names selects several columns at once
print(r[['A', 'B']].aggregate(np.sum))
Output:
                   A         B         C         D
2018-01-01  0.396790  1.047652 -0.366593 -1.007342
2018-01-02  0.347995  0.760883 -2.792697  0.275284
2018-01-03 -1.683239 -0.106222  1.288306  1.401471
2018-01-04  0.693949  0.940266  0.567763 -0.266491
2018-01-05 -1.622852 -2.358109  0.004795  0.741159
==========================================
                   A         B
2018-01-01  0.396790  1.047652
2018-01-02  0.744784  1.808535
2018-01-03 -0.938454  1.702312
2018-01-04 -0.641295  1.594927
2018-01-05 -2.612142 -1.524065
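A brief sketch, using illustrative values, suggesting that the columns can be selected either on the Rolling object (as above) or on the DataFrame before rolling; both routes are expected to give the same result.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0], "C": [7.0, 8.0, 9.0]},
    index=pd.date_range("2018-01-01", periods=3),
)

# Select the columns on the Rolling object ...
after = df.rolling(window=3, min_periods=1)[["A", "B"]].aggregate("sum")
# ... or on the DataFrame before creating the Rolling object
before = df[["A", "B"]].rolling(window=3, min_periods=1).aggregate("sum")

print(list(after.columns))   # ['A', 'B']
print(after.equals(before))  # True
print(after["B"].tolist())   # [10.0, 30.0, 60.0]
```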
Applying Multiple Functions to a Single Column of a DataFrame
Example:
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 4),
                  index=pd.date_range('2019/01/01', periods=5),
                  columns=['A', 'B', 'C', 'D'])
print(df)
print("==========================================")
r = df.rolling(window=3, min_periods=1)
# A list of functions yields one result column per function
print(r['A'].aggregate([np.sum, np.mean]))
Output:
                   A         B         C         D
2019-01-01 -0.152684  0.579792 -0.931197 -1.284427
2019-01-02  1.236191  0.147896  0.143260  0.696489
2019-01-03  0.579045  0.659531  0.806744 -1.222311
2019-01-04 -0.307059  2.119788 -1.640019  0.116214
2019-01-05 -2.199344  0.223366  0.722573  0.736895
==========================================
                 sum      mean
2019-01-01 -0.152684 -0.152684
2019-01-02  1.083508  0.541754
2019-01-03  1.662552  0.554184
2019-01-04  1.508177  0.502726
2019-01-05 -1.927358 -0.642453
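As the output above shows, each function becomes a column named after it. A minimal sketch with illustrative values and string function names makes the numbers easy to verify by hand.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [2.0, 4.0, 6.0, 8.0]},
    index=pd.date_range("2019-01-01", periods=4),
)
r = df.rolling(window=3, min_periods=1)

# One result column per requested function, named after the function
stats = r["A"].aggregate(["sum", "mean", "max"])

print(list(stats.columns))     # ['sum', 'mean', 'max']
print(stats["sum"].tolist())   # [2.0, 6.0, 12.0, 18.0]
print(stats["mean"].tolist())  # [2.0, 3.0, 4.0, 6.0]
```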
Applying Multiple Functions to Multiple Columns of a DataFrame
Example:
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 4),
                  index=pd.date_range('2020/01/01', periods=5),
                  columns=['A', 'B', 'C', 'D'])
print(df)
print("==========================================")
r = df.rolling(window=3, min_periods=1)
# Every listed function is applied to every selected column
print(r[['A', 'B']].aggregate([np.sum, np.mean]))
Output:
                   A         B         C         D
2020-01-01 -0.028071 -0.420028 -1.349171  1.143043
2020-01-02 -1.453048  0.713694  0.582064  0.254938
2020-01-03  0.102893  1.160601 -0.043761 -0.295737
2020-01-04  0.241136  0.151186 -0.282294 -0.667249
2020-01-05 -1.208541  0.301091 -0.121198  0.650411
==========================================
                   A                   B
                 sum      mean       sum      mean
2020-01-01 -0.028071 -0.028071 -0.420028 -0.420028
2020-01-02 -1.481119 -0.740559  0.293666  0.146833
2020-01-03 -1.378226 -0.459409  1.454267  0.484756
2020-01-04 -1.109019 -0.369673  2.025481  0.675160
2020-01-05 -0.864512 -0.288171  1.612878  0.537626
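The result above has two-level column labels (column name, then function name). The sketch below, on illustrative data, shows one way to read individual results out of such a frame and to flatten the labels; the 'A_sum' naming scheme is an assumption made for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]},
    index=pd.date_range("2020-01-01", periods=3),
)
result = df.rolling(window=3, min_periods=1)[["A", "B"]].aggregate(["sum", "mean"])

print(result.columns.tolist())
# [('A', 'sum'), ('A', 'mean'), ('B', 'sum'), ('B', 'mean')]

a_sum = result[("A", "sum")]  # one (column, function) pair -> Series
a_all = result["A"]           # every function for column A -> DataFrame
print(a_sum.tolist())         # [1.0, 3.0, 6.0]
print(list(a_all.columns))    # ['sum', 'mean']

# Flatten to single-level names such as 'A_sum' (illustrative naming scheme)
flat = result.copy()
flat.columns = ["_".join(pair) for pair in result.columns]
print(list(flat.columns))     # ['A_sum', 'A_mean', 'B_sum', 'B_mean']
```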
Applying Different Functions to Different Columns of a DataFrame
Example:
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(3, 4),
                  index=pd.date_range('2020/01/01', periods=3),
                  columns=['A', 'B', 'C', 'D'])
print(df)
print("==========================================")
r = df.rolling(window=3, min_periods=1)
# A dict maps each column name to the function applied to it
print(r.aggregate({'A': np.sum, 'B': np.mean}))
Output:
                   A         B         C         D
2020-01-01  0.412677  0.120010  0.074705 -0.735974
2020-01-02  0.508431 -0.383076 -0.754675 -0.509018
2020-01-03 -0.681609  0.405636 -0.451324  0.646307
==========================================
                   A         B
2020-01-01  0.412677  0.120010
2020-01-02  0.921107 -0.131533
2020-01-03  0.239498  0.047523
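Note that the result above contains only the columns named in the dict; C and D are left out. The sketch below reproduces this on illustrative data and also shows a dict whose values are lists of functions, which is expected to produce two-level column labels.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0], "C": [0.0, 0.0, 0.0]},
    index=pd.date_range("2020-01-01", periods=3),
)
r = df.rolling(window=3, min_periods=1)

# One function per column; columns missing from the dict are omitted
simple = r.aggregate({"A": "sum", "B": "mean"})
print(list(simple.columns))  # ['A', 'B']
print(simple["A"].tolist())  # [1.0, 3.0, 6.0]
print(simple["B"].tolist())  # [10.0, 15.0, 20.0]

# Lists of functions per column give (column, function) labels
nested = r.aggregate({"A": ["sum", "min"], "B": ["mean"]})
print(nested.columns.tolist())
# [('A', 'sum'), ('A', 'min'), ('B', 'mean')]
```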