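10-Time Series - Resampling

Resampling is the process of converting a time series from one frequency to another. Along the way the data has to be combined (aggregated) or filled in.

Downsampling: high-frequency data → low-frequency data, e.g. converting daily data to monthly data
Upsampling: low-frequency data → high-frequency data, e.g. converting yearly data to monthly data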
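A minimal sketch of both directions, using the daily → monthly and yearly → monthly examples from the definitions above. The data (a series of ones, two made-up yearly values) is purely illustrative, and 'MS' (label each month by its first day) is just one reasonable choice of monthly frequency.

```python
import numpy as np
import pandas as pd

# Downsampling: 59 daily values (2017-01-01 to 2017-02-28) -> one value per month
daily = pd.Series(np.ones(59), index=pd.date_range('2017-01-01', periods=59, freq='D'))
monthly = daily.resample('MS').sum()  # several rows are combined into one
print(monthly)

# Upsampling: 2 yearly values -> 13 monthly rows; the new rows have no data yet (NaN)
yearly = pd.Series([100.0, 200.0], index=pd.to_datetime(['2016-01-01', '2017-01-01']))
monthly_up = yearly.resample('MS').asfreq()
print(monthly_up)
```

Downsampling always needs an aggregation (here `.sum()`), while upsampling needs a rule for the empty rows, which is why the sections below treat them separately.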

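# Resampling: .resample()
# Create a daily TimeSeries and resample it to a 5-day frequency

import numpy as np
import pandas as pd

rng = pd.date_range('20170101', periods = 12)
ts = pd.Series(np.arange(12), index = rng)
print("1".center(40,'*'))
print(ts)

ts_re = ts.resample('5D')
ts_re2 = ts.resample('5D').sum()
print("2".center(40,'*'))
print(ts_re, type(ts_re))
print(ts_re2, type(ts_re2))
# ts.resample('5D'): returns a Resampler object (a lazy, groupby-like builder) with a 5-day frequency; nothing is computed yet
# ts.resample('5D').sum(): returns a new, aggregated Series; the aggregation here is a sum
# freq: the target resampling frequency → ts.resample('5D')
# .sum(): the aggregation method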
print("3".center(40,'*'))
print(ts.resample('5D').mean(),'→ mean\n')
print(ts.resample('5D').max(),'→ max\n')
print(ts.resample('5D').min(),'→ min\n')
print(ts.resample('5D').median(),'→ median\n')
print(ts.resample('5D').first(),'→ first value\n')
print(ts.resample('5D').last(),'→ last value\n')
print(ts.resample('5D').ohlc(),'→ OHLC resampling\n')
# OHLC: an aggregation common for financial time series → open (first value), high (maximum), low (minimum), close (last value)
# Output
*******************1********************
2017-01-01     0
2017-01-02     1
2017-01-03     2
2017-01-04     3
2017-01-05     4
2017-01-06     5
2017-01-07     6
2017-01-08     7
2017-01-09     8
2017-01-10     9
2017-01-11    10
2017-01-12    11
Freq: D, dtype: int32
*******************2********************
DatetimeIndexResampler [freq=<5 * Days>, axis=0, closed=left, label=left, convention=start, base=0] <class 'pandas.core.resample.DatetimeIndexResampler'>
2017-01-01    10
2017-01-06    35
2017-01-11    21
dtype: int32 <class 'pandas.core.series.Series'>
*******************3********************
2017-01-01     2.0
2017-01-06     7.0
2017-01-11    10.5
dtype: float64 → mean

2017-01-01     4
2017-01-06     9
2017-01-11    11
dtype: int32 → max

2017-01-01     0
2017-01-06     5
2017-01-11    10
dtype: int32 → min

2017-01-01     2.0
2017-01-06     7.0
2017-01-11    10.5
dtype: float64 → median

2017-01-01     0
2017-01-06     5
2017-01-11    10
dtype: int32 → first value

2017-01-01     4
2017-01-06     9
2017-01-11    11
dtype: int32 → last value

            open  high  low  close
2017-01-01     0     4    0      4
2017-01-06     5     9    5      9
2017-01-11    10    11   10     11 → OHLC resampling
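A resampler behaves much like a groupby over time bins. As a sketch (the particular combination of functions is arbitrary), several aggregations can be requested in one call with `.agg()`, and the same 5-day bins can also be produced with `groupby` and `pd.Grouper`:

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(12), index=pd.date_range('20170101', periods=12))

# One column per aggregation function, one row per 5-day bin
summary = ts.resample('5D').agg(['sum', 'mean', 'max'])
print(summary)

# The same 5-day bins expressed as a groupby on a time-based Grouper
via_groupby = ts.groupby(pd.Grouper(freq='5D')).sum()
print(via_groupby)
```

`.agg()` is handy when several statistics are needed at once, instead of calling `ts.resample('5D')` separately for each one as in the example above.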
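# Downsampling

import numpy as np
import pandas as pd

rng = pd.date_range('20170101', periods = 12)
ts = pd.Series(np.arange(1,13), index = rng)
print("1".center(40,'*'))
print(ts)

print("2".center(40,'*'))
print(ts.resample('5D').sum(),'→ default\n')
print(ts.resample('5D', closed = 'left').sum(),'→ left\n')
print(ts.resample('5D', closed = 'right').sum(),'→ right\n')
# closed: which end of each bin is closed (i.e. included). For a fixed frequency such as '5D' the default is 'left'
# (month-end, quarter-end, year-end and weekly frequencies default to 'right')
# In detail: the values here are 1-12, and resampling by 5D splits them into 5-day bins
# closed='left': each bin includes its left edge, [start, start+5D) → [1,2,3,4,5],[6,7,8,9,10],[11,12]
# closed='right': each bin includes its right edge, (start, start+5D], so 2017-01-01 falls into the bin
#   (2016-12-27, 2017-01-01] → [1],[2,3,4,5,6],[7,8,9,10,11],[12]
print("3".center(40,'*'))
print(ts.resample('5D', label = 'left').sum(),'→ leftlabel\n')
print(ts.resample('5D', label = 'right').sum(),'→ rightlabel\n')
# label: which bin edge is used as the index of the aggregated value; the default is 'left'
# label only changes the index: the values are still grouped with the default closed='left', so the sums are the same
# Output
*******************1********************
2017-01-01     1
2017-01-02     2
2017-01-03     3
2017-01-04     4
2017-01-05     5
2017-01-06     6
2017-01-07     7
2017-01-08     8
2017-01-09     9
2017-01-10    10
2017-01-11    11
2017-01-12    12
Freq: D, dtype: int32
*******************2********************
2017-01-01    15
2017-01-06    40
2017-01-11    23
dtype: int32 → default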

2017-01-01    15
2017-01-06    40
2017-01-11    23
dtype: int32 → left

2016-12-27     1
2017-01-01    20
2017-01-06    45
2017-01-11    12
dtype: int32 → right

*******************3********************
2017-01-01    15
2017-01-06    40
2017-01-11    23
dtype: int32 → leftlabel

2017-01-06    15
2017-01-11    40
2017-01-16    23
dtype: int32 → rightlabel
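To see exactly which values land in each bin, the groups can be iterated. This sketch uses `groupby` with `pd.Grouper(freq='5D', closed='right')`, assuming it builds the same bins as `resample('5D', closed='right')` (for this data the members add up to the sums shown above). It also combines `closed='right'` with `label='right'`, which stamps each sum with the end date of its window, i.e. "total of the 5 days ending on this date".

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(1, 13), index=pd.date_range('20170101', periods=12))

# Members of each right-closed 5-day bin, keyed by the bin's (default, left) label
right_bins = {
    label.strftime('%Y-%m-%d'): grp.tolist()
    for label, grp in ts.groupby(pd.Grouper(freq='5D', closed='right'))
}
for label, members in right_bins.items():
    print(label, members)

# Right-closed bins labelled by their right edge
ending = ts.resample('5D', closed='right', label='right').sum()
print(ending)
```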
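# Upsampling and filling

import numpy as np
import pandas as pd

rng = pd.date_range('2017/1/1 0:0:0', periods = 5, freq = 'H')
ts = pd.DataFrame(np.arange(15).reshape(5,3),
                  index = rng,
                  columns = ['a','b','c'])
print(ts)

print(ts.resample('30T').asfreq())
print(ts.resample('30T').ffill())
print(ts.resample('30T').bfill())
# Going from low to high frequency is mainly a question of how to fill the new rows
# 'H' = hourly, '30T' = 30 minutes (pandas 2.2+ prefers the spellings 'h' and '30min')
# .asfreq(): no filling; the new rows are NaN
# .ffill(): forward fill, each new row takes the previous existing value
# .bfill(): backward fill, each new row takes the next existing value
# Output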
                     a   b   c
2017-01-01 00:00:00   0   1   2
2017-01-01 01:00:00   3   4   5
2017-01-01 02:00:00   6   7   8
2017-01-01 03:00:00   9  10  11
2017-01-01 04:00:00  12  13  14
                        a     b     c
2017-01-01 00:00:00   0.0   1.0   2.0
2017-01-01 00:30:00   NaN   NaN   NaN
2017-01-01 01:00:00   3.0   4.0   5.0
2017-01-01 01:30:00   NaN   NaN   NaN
2017-01-01 02:00:00   6.0   7.0   8.0
2017-01-01 02:30:00   NaN   NaN   NaN
2017-01-01 03:00:00   9.0  10.0  11.0
2017-01-01 03:30:00   NaN   NaN   NaN
2017-01-01 04:00:00  12.0  13.0  14.0
                      a   b   c
2017-01-01 00:00:00   0   1   2
2017-01-01 00:30:00   0   1   2
2017-01-01 01:00:00   3   4   5
2017-01-01 01:30:00   3   4   5
2017-01-01 02:00:00   6   7   8
2017-01-01 02:30:00   6   7   8
2017-01-01 03:00:00   9  10  11
2017-01-01 03:30:00   9  10  11
2017-01-01 04:00:00  12  13  14
                      a   b   c
2017-01-01 00:00:00   0   1   2
2017-01-01 00:30:00   3   4   5
2017-01-01 01:00:00   3   4   5
2017-01-01 01:30:00   6   7   8
2017-01-01 02:00:00   6   7   8
2017-01-01 02:30:00   9  10  11
2017-01-01 03:00:00   9  10  11
2017-01-01 03:30:00  12  13  14
2017-01-01 04:00:00  12  13  14
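Forward and backward filling just copy neighbouring rows. Two other options are sketched below on the same DataFrame: linear interpolation (`.asfreq()` followed by `DataFrame.interpolate()`) and `ffill` with a `limit`, so that only a bounded number of new rows get filled. The 15-minute step in the second example is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2017/1/1 0:0:0', periods=5, freq='h')  # 'h' = hourly ('H' in older pandas)
df = pd.DataFrame(np.arange(15).reshape(5, 3), index=rng, columns=['a', 'b', 'c'])

# .asfreq() inserts the new 30-minute rows as NaN; .interpolate() then fills
# each NaN linearly from its neighbours (the default method is 'linear')
half_hourly = df.resample('30min').asfreq().interpolate()
print(half_hourly)

# Forward fill at most one new row after each original row: with 15-minute
# steps, 00:15 is filled but 00:30 and 00:45 stay NaN
quarter = df.resample('15min').ffill(limit=1)
print(quarter.head(5))
```

Interpolation suits smoothly varying measurements, while a limited `ffill` avoids stretching one reading across a long gap.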
# Resampling periods - Period

import numpy as np
import pandas as pd

prng = pd.period_range('2016','2017',freq = 'M')
ts = pd.Series(np.arange(len(prng)), index = prng)
print(ts)

print(ts.resample('90D').sum())  # downsampling
print(ts.resample('20D').ffill())  # upsampling
# Output
2016-01     0
2016-02     1
2016-03     2
2016-04     3
2016-05     4
2016-06     5
2016-07     6
2016-08     7
2016-09     8
2016-10     9
2016-11    10
2016-12    11
2017-01    12
Freq: M, dtype: int32
2016-01-01    0.0
2016-03-31    NaN
2016-06-29    NaN
2016-09-27    NaN
2016-12-26    NaN
Freq: 90D, dtype: float64
2016-01-01     0
2016-01-21     0
2016-02-10     1
2016-03-01     2
2016-03-21     2
2016-04-10     3
2016-04-30     3
2016-05-20     4
2016-06-09     5
2016-06-29     5
2016-07-19     6
2016-08-08     7
2016-08-28     7
2016-09-17     8
2016-10-07     9
2016-10-27     9
2016-11-16    10
2016-12-06    11
2016-12-26    11
2017-01-15    12
Freq: 20D, dtype: int32
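The 90D result above does not look like a real sum: the first bin shows 0.0 although January to March hold 0 + 1 + 2 = 3, and the later bins are NaN even though months fall into them. As far as I know, pandas 2.2 also started warning that resampling directly on a PeriodIndex is deprecated. A minimal sketch of two workarounds, assuming the goal is simply to add up the monthly values: group the periods by quarter, or convert them to timestamps with `.to_timestamp()` before resampling.

```python
import numpy as np
import pandas as pd

prng = pd.period_range('2016', '2017', freq='M')
ts = pd.Series(np.arange(len(prng)), index=prng)

# Option 1: stay with periods and group each month by the quarter it belongs to
quarterly = ts.groupby(ts.index.asfreq('Q')).sum()
print(quarterly)

# Option 2: turn each month into its start timestamp, then resample as usual
by_90d = ts.to_timestamp().resample('90D').sum()
print(by_90d)
```

Option 2 keeps the same 90-day bin labels as the output above (2016-01-01, 2016-03-31, ...), but each bin now holds the sum of the months that start inside it.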
