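Pandas data operations, numeric (part 2): cumulative statistics functions [cumsum, cumprod, cummax, cummin] [computing the sum, product, maximum and minimum of the first 1/2/3/…/n values]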

1. Cumulative statistics functions

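Function   Purpose
cumsum     running sum of the first 1/2/3/…/n values
cummax     running maximum of the first 1/2/3/…/n values
cummin     running minimum of the first 1/2/3/…/n values
cumprod    running product of the first 1/2/3/…/n values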
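To make the "first 1/2/3/…/n values" idea concrete, here is a small sketch on a hand-picked Series (the values [3, 1, 4, 1, 5] are arbitrary, chosen only for illustration). A plain Python loop computes the same running sum for comparison, showing that element i of the result covers the first i+1 values.

```python
import pandas as pd

# Arbitrary example values, chosen only for illustration
s = pd.Series([3, 1, 4, 1, 5])

running_sum = s.cumsum()    # 3, 4, 8, 9, 14
running_prod = s.cumprod()  # 3, 3, 12, 12, 60
running_max = s.cummax()    # 3, 3, 4, 4, 5
running_min = s.cummin()    # 3, 1, 1, 1, 1

# Element i of cumsum is the sum of the first i+1 values;
# a manual loop produces the same sequence
manual = []
total = 0
for value in s:
    total += int(value)
    manual.append(total)

print(running_sum.tolist())  # [3, 4, 8, 9, 14]
print(manual)                # [3, 4, 8, 9, 14]
```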
import numpy as np
import pandas as pd

# No missing values (np.nan) here; key2 is random, so your numbers will differ from the output below
df = pd.DataFrame({'key1': np.arange(10),
                   'key2': np.random.rand(10) * 10})
print("df = \n", df)
print('-' * 200)

key1_cumsum = df['key1'].cumsum()
key2_cumsum = df['key2'].cumsum()

print("key1_cumsum = \n{0} \ntype(key1_cumsum) = {1}".format(key1_cumsum, type(key1_cumsum)))
print('-' * 50)
print("key2_cumsum = \n{0} \ntype(key2_cumsum) = {1}".format(key2_cumsum, type(key2_cumsum)))
print('-' * 50)
# Reuse the results computed above instead of recomputing them
df['key1_cumsum'] = key1_cumsum
df['key2_cumsum'] = key2_cumsum
print("df after adding the cumulative-sum columns: df = \n", df)
print('-' * 200)

# key1 starts at 0, so every cumulative product of key1 is 0
key1_cumprod = df['key1'].cumprod()
key2_cumprod = df['key2'].cumprod()

print("key1_cumprod = \n{0} \ntype(key1_cumprod) = {1}".format(key1_cumprod, type(key1_cumprod)))
print('-' * 50)
print("key2_cumprod = \n{0} \ntype(key2_cumprod) = {1}".format(key2_cumprod, type(key2_cumprod)))
print('-' * 50)
df['key1_cumprod'] = key1_cumprod
df['key2_cumprod'] = key2_cumprod
print("df after adding the cumulative-product columns: df = \n", df)
print('-' * 200)

# cummax/cummin compute the running maximum/minimum of every column (key1, key2 and the new columns) and return a new DataFrame; df itself is unchanged
df1 = df.cummax()
df2 = df.cummin()

print("df = \n", df)
print('-' * 50)
print("df1 = df.cummax() = \n", df1)
print('-' * 50)
print("df2 = df.cummin() = \n", df2)
print('-' * 200)

Output:

df = 
    key1      key2
0     0  5.946567
1     1  6.500338
2     2  0.517269
3     3  6.888832
4     4  0.029891
5     5  6.908777
6     6  4.522801
7     7  6.755125
8     8  6.676930
9     9  3.002233
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
key1_cumsum = 
0     0
1     1
2     3
3     6
4    10
5    15
6    21
7    28
8    36
9    45
Name: key1, dtype: int32 
type(key1_cumsum) = <class 'pandas.core.series.Series'>
--------------------------------------------------
key2_cumsum = 
0     5.946567
1    12.446905
2    12.964174
3    19.853006
4    19.882897
5    26.791673
6    31.314474
7    38.069599
8    44.746529
9    47.748762
Name: key2, dtype: float64 
type(key2_cumsum) = <class 'pandas.core.series.Series'>
--------------------------------------------------
df after adding the cumulative-sum columns: df = 
    key1      key2  key1_cumsum  key2_cumsum
0     0  5.946567            0     5.946567
1     1  6.500338            1    12.446905
2     2  0.517269            3    12.964174
3     3  6.888832            6    19.853006
4     4  0.029891           10    19.882897
5     5  6.908777           15    26.791673
6     6  4.522801           21    31.314474
7     7  6.755125           28    38.069599
8     8  6.676930           36    44.746529
9     9  3.002233           45    47.748762
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
key1_cumprod = 
0    0
1    0
2    0
3    0
4    0
5    0
6    0
7    0
8    0
9    0
Name: key1, dtype: int32 
type(key1_cumprod) = <class 'pandas.core.series.Series'>
--------------------------------------------------
key2_cumprod = 
0        5.946567
1       38.654696
2       19.994865
3      137.741271
4        4.117176
5       28.444652
6      128.649488
7      869.043329
8     5802.541623
9    17420.580379
Name: key2, dtype: float64 
type(key2_cumprod) = <class 'pandas.core.series.Series'>
--------------------------------------------------
df after adding the cumulative-product columns: df = 
    key1      key2  key1_cumsum  key2_cumsum  key1_cumprod  key2_cumprod
0     0  5.946567            0     5.946567             0      5.946567
1     1  6.500338            1    12.446905             0     38.654696
2     2  0.517269            3    12.964174             0     19.994865
3     3  6.888832            6    19.853006             0    137.741271
4     4  0.029891           10    19.882897             0      4.117176
5     5  6.908777           15    26.791673             0     28.444652
6     6  4.522801           21    31.314474             0    128.649488
7     7  6.755125           28    38.069599             0    869.043329
8     8  6.676930           36    44.746529             0   5802.541623
9     9  3.002233           45    47.748762             0  17420.580379
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
df = 
    key1      key2  key1_cumsum  key2_cumsum  key1_cumprod  key2_cumprod
0     0  5.946567            0     5.946567             0      5.946567
1     1  6.500338            1    12.446905             0     38.654696
2     2  0.517269            3    12.964174             0     19.994865
3     3  6.888832            6    19.853006             0    137.741271
4     4  0.029891           10    19.882897             0      4.117176
5     5  6.908777           15    26.791673             0     28.444652
6     6  4.522801           21    31.314474             0    128.649488
7     7  6.755125           28    38.069599             0    869.043329
8     8  6.676930           36    44.746529             0   5802.541623
9     9  3.002233           45    47.748762             0  17420.580379
--------------------------------------------------
df1 = df.cummax() = 
    key1      key2  key1_cumsum  key2_cumsum  key1_cumprod  key2_cumprod
0     0  5.946567            0     5.946567             0      5.946567
1     1  6.500338            1    12.446905             0     38.654696
2     2  6.500338            3    12.964174             0     38.654696
3     3  6.888832            6    19.853006             0    137.741271
4     4  6.888832           10    19.882897             0    137.741271
5     5  6.908777           15    26.791673             0    137.741271
6     6  6.908777           21    31.314474             0    137.741271
7     7  6.908777           28    38.069599             0    869.043329
8     8  6.908777           36    44.746529             0   5802.541623
9     9  6.908777           45    47.748762             0  17420.580379
--------------------------------------------------
df2 = df.cummin() = 
    key1      key2  key1_cumsum  key2_cumsum  key1_cumprod  key2_cumprod
0     0  5.946567            0     5.946567             0      5.946567
1     0  5.946567            0     5.946567             0      5.946567
2     0  0.517269            0     5.946567             0      5.946567
3     0  0.517269            0     5.946567             0      5.946567
4     0  0.029891            0     5.946567             0      4.117176
5     0  0.029891            0     5.946567             0      4.117176
6     0  0.029891            0     5.946567             0      4.117176
7     0  0.029891            0     5.946567             0      4.117176
8     0  0.029891            0     5.946567             0      4.117176
9     0  0.029891            0     5.946567             0      4.117176
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
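The sample DataFrame above contains no missing values. The sketch below (arbitrary values) shows how the cumulative functions treat NaN: with the default skipna=True a NaN position stays NaN in the result while accumulation continues past it, and with skipna=False every position after the first NaN becomes NaN. It also shows that, as with key1_cumprod above, a single zero makes every later cumulative product zero.

```python
import numpy as np
import pandas as pd

# Arbitrary values with one missing entry
s = pd.Series([1.0, np.nan, 2.0, 3.0])

# Default skipna=True: the NaN slot stays NaN, but the sum continues after it
skipped = s.cumsum()
print(skipped.tolist())      # [1.0, nan, 3.0, 6.0]

# skipna=False: NaN propagates to every later position
propagated = s.cumsum(skipna=False)
print(propagated.tolist())   # [1.0, nan, nan, nan]

# A leading zero makes every cumulative product 0 (as with key1 above);
# starting from 1 instead gives factorials
zero_prod = pd.Series(range(5)).cumprod()
factorials = pd.Series(range(1, 6)).cumprod()
print(zero_prod.tolist())    # [0, 0, 0, 0, 0]
print(factorials.tolist())   # [1, 2, 6, 24, 120]
```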


2. How do you use the cumulative statistics functions?

All of these functions work on both Series and DataFrame objects.
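On a DataFrame the functions accumulate down each column by default (axis=0), which is what df.cummax() and df.cummin() did in section 1; passing axis=1 accumulates across each row instead. A small sketch with made-up columns a and b:

```python
import pandas as pd

# Made-up example data
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# Default axis=0: running total down each column
by_column = df.cumsum()
print(by_column['a'].tolist(), by_column['b'].tolist())  # [1, 3, 6] [10, 30, 60]

# axis=1: running total across each row (column b becomes a + b)
by_row = df.cumsum(axis=1)
print(by_row['a'].tolist(), by_row['b'].tolist())        # [1, 2, 3] [11, 22, 33]
```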

Here we accumulate in chronological order, from the earliest date to the latest.

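  • Sort by date
    # `data` is assumed to be the stock DataFrame from earlier, indexed by date, with a p_change column
    # Sort by the date index first so the running total starts from the earliest day
    data = data.sort_index()
    
  • Compute the cumulative sum of p_change
    stock_rise = data['p_change']
    # Note: the plot method used later wraps the histogram, bar, pie and line charts introduced earlier
    stock_rise.cumsum()
    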
    
    2015-03-02      2.62
    2015-03-03      4.06
    2015-03-04      5.63
    2015-03-05      7.65
    2015-03-06     16.16
    2015-03-09     16.37
    2015-03-10     18.75
    2015-03-11     16.36
    2015-03-12     15.03
    2015-03-13     17.58
    2015-03-16     20.34
    2015-03-17     22.42
    2015-03-18     23.28
    2015-03-19     23.74
    2015-03-20     23.48
    2015-03-23     23.74
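A self-contained sketch of the two steps above, using a tiny hypothetical stand-in for data: the p_change values are back-computed from the first five cumulative sums printed above, and the rows are deliberately stored newest-first (an assumption about why the original sorts) so that sort_index() visibly matters.

```python
import pandas as pd

# Hypothetical stand-in for `data`: daily p_change values (back-computed from
# the first five cumulative sums above), deliberately stored newest-first
data = pd.DataFrame(
    {'p_change': [8.51, 2.02, 1.57, 1.44, 2.62]},
    index=pd.to_datetime(['2015-03-06', '2015-03-05', '2015-03-04',
                          '2015-03-03', '2015-03-02']),
)

# Step 1: sort by date so the accumulation runs from the earliest day
data = data.sort_index()

# Step 2: running total of the daily change
stock_rise = data['p_change']
cumulative = stock_rise.cumsum()
print(cumulative.round(2).tolist())  # [2.62, 4.06, 5.63, 7.65, 16.16]
```

Without the sort, the running total would start from the newest day (8.51) and accumulate backwards in time.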
    

Plot the cumulative sum with matplotlib:


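The plot method needs matplotlib to be installed; import matplotlib.pyplot so that you can call plt.show() to display the figure.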

import matplotlib.pyplot as plt
# Draw the cumulative sum as a line chart
stock_rise.cumsum().plot()
# Call show() to actually display the figure (needed outside Jupyter)
plt.show()
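If you run without a display (for example a script on a server), one option is to select matplotlib's non-interactive Agg backend and save the figure to a file instead of calling plt.show(). In this sketch the series is a hypothetical stand-in for stock_rise, the file name cumsum.png is arbitrary, and the title and axis labels are optional extras.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for stock_rise (daily p_change, already sorted by date)
stock_rise = pd.Series(
    [2.62, 1.44, 1.57, 2.02, 8.51],
    index=pd.date_range('2015-03-02', periods=5, freq='B'),
)

# Series.plot returns the matplotlib Axes it drew on
ax = stock_rise.cumsum().plot(title='Cumulative p_change')
ax.set_xlabel('date')
ax.set_ylabel('cumulative p_change')

# Save to a file instead of plt.show(), then release the figure
ax.figure.savefig('cumsum.png')
plt.close(ax.figure)
```

Saving rather than showing keeps the script usable in batch jobs; in an interactive session the plt.show() version above is simpler.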
