Pandas.DataFrame.resample: the resampled time series starts at a different time than the original

pandas version: 0.20.1
DataFrame.resample(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0, on=None, level=None)
From the documentation: Convenience method for frequency conversion and resampling of time series. Object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or pass datetime-like values to the on or level keyword.

The time-series handling behind resample is implemented in /pandas/tseries/resample.py. Source: https://github.com/pandas-dev/pandas/issues/4197

When resample processes a datetime-like index, the following can happen:

In [3]: import numpy as np
In [4]: import pandas as pd
In [5]: rng = pd.date_range('1/1/2000 00:09:00', periods=12, freq='T')
In [6]: ts = pd.Series(np.arange(12), index=rng)
In [7]: ts
Out[7]:
2000-01-01 00:09:00 0
2000-01-01 00:10:00 1
2000-01-01 00:11:00 2
2000-01-01 00:12:00 3
2000-01-01 00:13:00 4
2000-01-01 00:14:00 5
2000-01-01 00:15:00 6
2000-01-01 00:16:00 7
2000-01-01 00:17:00 8
2000-01-01 00:18:00 9
2000-01-01 00:19:00 10
2000-01-01 00:20:00 11
Freq: T, dtype: int64
In [12]: ts.resample('4T').mean()
Out[12]:
2000-01-01 00:08:00 1.0
2000-01-01 00:12:00 4.5
2000-01-01 00:16:00 8.5
2000-01-01 00:20:00 11.0
Freq: 4T, dtype: float64
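That is, the DatetimeIndex after resampling does not start at the same time as the index before resampling. The cause is that base defaults to 0, so the bins are aligned to times whose offset from midnight is an exact multiple of the resampling frequency. The first bin therefore starts at 00:08:00, even though the original series has no value at 00:08:00, and it only averages the values at 00:09:00 through 00:11:00. To make the bins start at 00:09:00, set base to 1 (that is, 9 % 4).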
In [140]: ts.resample('4T', base=1).mean()
Out[140]:
2000-01-01 00:09:00 1.5
2000-01-01 00:13:00 5.5
2000-01-01 00:17:00 9.5
Freq: 4T, dtype: float64
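
The alignment rule can be illustrated without pandas. The following is a minimal sketch, not the pandas implementation: it assumes minute-based frequencies, bins that are closed and labelled on the left, and an anchor at midnight of the first timestamp's day. The name resample_mean is a hypothetical helper introduced only for this illustration.

```python
from datetime import datetime, timedelta


def resample_mean(points, rule_minutes, base=0):
    """Average (timestamp, value) pairs in left-closed, left-labelled bins.

    Simplified model (an assumption, not pandas code): bin edges sit at
    midnight + base minutes + k * rule_minutes for integer k.
    """
    step = timedelta(minutes=rule_minutes)
    shift = timedelta(minutes=base)
    first = points[0][0]
    anchor = first.replace(hour=0, minute=0, second=0, microsecond=0)
    bins = {}
    for stamp, value in points:
        # Distance from the nearest bin edge at or before this timestamp.
        offset = (stamp - anchor - shift) % step
        bins.setdefault(stamp - offset, []).append(value)
    return [(label, sum(vals) / len(vals)) for label, vals in bins.items()]


# Same data as the session above: 12 one-minute samples starting at 00:09.
start = datetime(2000, 1, 1, 0, 9)
points = [(start + timedelta(minutes=i), i) for i in range(12)]

for label, mean in resample_mean(points, 4):
    print(label, mean)          # first label is 00:08:00
for label, mean in resample_mean(points, 4, base=1):
    print(label, mean)          # first label is 00:09:00
```

With base=0 the sketch yields four bins labelled 00:08, 00:12, 00:16 and 00:20; with base=1 it yields three bins labelled 00:09, 00:13 and 00:17, matching the two sessions above.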
Therefore, when using resample, compute the value of base from the start time and the resampling frequency, and pass it to resample.
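
The base calculation can be wrapped in a small helper. This is a minimal sketch under the assumption that the resampling rule is a whole number of minutes and the start time falls on a whole minute; compute_base is a hypothetical name, not part of pandas. It applies to the base keyword of the pandas version discussed here (0.20.1); later releases changed this keyword, so check the documentation for the version in use.

```python
from datetime import datetime


def compute_base(start, rule_minutes):
    """Return the base (in minutes) that makes the first bin begin at `start`.

    Assumes a minute-based rule and a start time on a whole minute.
    """
    minutes_since_midnight = start.hour * 60 + start.minute
    return minutes_since_midnight % rule_minutes


base = compute_base(datetime(2000, 1, 1, 0, 9), 4)
print(base)  # 1, i.e. 9 % 4

# With the series from this article (pandas 0.20.1), the result would be used as:
#   ts.resample('4T', base=base).mean()
```

If the start time is already a multiple of the rule (for example 00:08 with a 4-minute rule), the helper returns 0, which is the default.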

Reposted from: https://my.oschina.net/u/3363145/blog/913232
