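The previous post covered basic use of fbprophet, but all of its sample data was at daily granularity. To support hourly granularity, the source code has to be modified. This post walks through that modification; only forecaster.py needs to change.
File location: .local/lib/python2.7/site-packages/fbprophet or /usr/lib/python2.7/site-packages
Add daily_seasonality=True to the parameter list of the constructor, and initialize the attribute with self.daily_seasonality = daily_seasonality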
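As a minimal sketch of that change, the class below is a hypothetical stand-in (ForecasterSketch is my own name, not the real Prophet class) that only shows the new flag sitting next to an existing one. The real constructor takes more parameters, which are omitted here.

```python
class ForecasterSketch(object):
    """Hypothetical stand-in for the forecaster class; only seasonality flags are shown."""

    def __init__(self, weekly_seasonality=True, daily_seasonality=True):
        # Existing flag, included only for comparison.
        self.weekly_seasonality = weekly_seasonality
        # New flag: stored on the instance so later methods can test it.
        self.daily_seasonality = daily_seasonality


model = ForecasterSketch()
hourly_off = ForecasterSketch(daily_seasonality=False)
```

Because the default is True, existing calls that pass no arguments pick up the daily component automatically.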
This function prepares the feature data for each kind of seasonality:
First, look at what happens when weekly_seasonality = True. The following code is called:
if self.weekly_seasonality:
    seasonal_features.append(self.make_seasonality_features(
        df['ds'],
        7,
        3,
        'weekly',
    ))
df['ds'] is the time-series column of the data frame that was passed in; 7 is the period; 3 is the series_order (the number of Fourier terms); 'weekly' is a prefix.
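To make those three arguments concrete, here is a dependency-free sketch (standard library only, written for Python 3) of what a period of 7 and a series_order of 3 produce for one row. The helper names fourier_row and column_names are my own, and I am assuming the prefix is used to name columns as prefix_1, prefix_2, and so on.

```python
import math


def fourier_row(t, period, series_order):
    """Return the sin/cos pair of harmonics 1..series_order at time t."""
    row = []
    for i in range(series_order):
        angle = 2.0 * (i + 1) * math.pi * t / period
        row.extend([math.sin(angle), math.cos(angle)])
    return row


def column_names(prefix, n_columns):
    """Name the columns prefix_1 .. prefix_n (assumed naming scheme)."""
    return ['{}_{}'.format(prefix, i + 1) for i in range(n_columns)]


row = fourier_row(t=3, period=7, series_order=3)
names = column_names('weekly', len(row))
# Seven days later the same values come back, which is the weekly cycle.
same_weekday = fourier_row(t=10, period=7, series_order=3)
```

Each harmonic contributes one sine and one cosine column, so a series_order of 3 yields 6 columns.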
"ds","y"
2017-03-01 00:00:00,24
2017-03-01 01:00:00,17
2017-03-01 02:00:00,18
2017-03-01 03:00:00,19
2017-03-01 04:00:00,23
2017-03-01 05:00:00,23
2017-03-01 06:00:00,23
2017-03-01 07:00:00,21
2017-03-01 08:00:00,124
Note that df['ds'] must strictly follow the datestamp format; the pandas documentation describes the requirements.
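One way to check a file before handing it to the model is to parse every ds value up front. The sketch below uses only the standard library (Python 3) and assumes the format string '%Y-%m-%d %H:%M:%S', which matches the sample rows above; pandas itself accepts more formats than this. The parse_rows helper is hypothetical.

```python
import csv
import io
from datetime import datetime

SAMPLE = '''"ds","y"
2017-03-01 00:00:00,24
2017-03-01 01:00:00,17
2017-03-01 02:00:00,18
'''

# Assumed format, taken from the sample rows.
DS_FORMAT = '%Y-%m-%d %H:%M:%S'


def parse_rows(text):
    """Parse CSV text into (datetime, float) pairs; raises ValueError on a bad ds."""
    reader = csv.DictReader(io.StringIO(text))
    return [(datetime.strptime(r['ds'], DS_FORMAT), float(r['y'])) for r in reader]


rows = parse_rows(SAMPLE)
```

A malformed timestamp fails loudly here instead of surfacing later as a confusing error during fitting.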
if self.daily_seasonality:
    seasonal_features.append(self.make_hourly_seasonality_features(
        df['ds'],
        24,
        5,
        'daily'
    ))
Similarly, add the code above to this function. It calls the custom function make_hourly_seasonality_features, which is explained next:
Here the period is 24, series_order is 5, and the prefix is daily.
@classmethod
def make_hourly_seasonality_features(cls, dates, period, series_order, prefix):
    features = cls.hourly_fourier_series(dates, period, series_order)
    columns = [
        '{}_{}'.format(prefix, i + 1)
        for i in range(features.shape[1])
    ]
    return pd.DataFrame(features, columns=columns)
This mirrors make_seasonality_features, but it calls hourly_fourier_series instead. The fourier_series function it is modeled on is shown first, followed by hourly_fourier_series:
def fourier_series(dates, period, series_order):
    """Generate a Fourier expansion for a fixed frequency and order.

    Parameters
    ----------
    dates: a pd.Series containing timestamps
    period: an integer frequency (number of days)
    series_order: number of components to generate

    Returns
    -------
    a 2-dimensional np.array with one row per row in `dt`
    """
    # convert to days since epoch
    t = np.array(
        (dates - pd.datetime(1970, 1, 1))
        .apply(lambda x: x.days)
        .astype(np.float)
    )
    return np.column_stack([
        fun((2.0 * (i + 1) * np.pi * t / period))
        for i in range(series_order)
        for fun in (np.sin, np.cos)
    ])
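The reason this function cannot be reused for hourly data is the conversion to whole days: every timestamp within the same day gets the same t, so its features are identical. A small standard-library sketch of that effect (daily_t and first_harmonic are hypothetical helpers, not library functions):

```python
import math
from datetime import datetime

EPOCH = datetime(1970, 1, 1)


def daily_t(ts):
    """Whole days since the epoch, like the .days conversion in fourier_series."""
    return float((ts - EPOCH).days)


def first_harmonic(t, period):
    """sin/cos of the first Fourier harmonic at time t."""
    angle = 2.0 * math.pi * t / period
    return (math.sin(angle), math.cos(angle))


morning = datetime(2017, 3, 1, 8)
evening = datetime(2017, 3, 1, 20)
# Both timestamps collapse to the same day index, so the hour is lost.
collapsed = daily_t(morning) == daily_t(evening)
```

Since 08:00 and 20:00 map to the same input, no choice of period or series_order can separate them; the time variable itself has to carry the hour.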
@staticmethod
def hourly_fourier_series(dates, period, series_order):
    t_d = np.array((dates - pd.datetime(1970,1,1)).apply(lambda x: x.days).astype(np.float))
    t_s = np.array((dates - pd.datetime(1970,1,1)).apply(lambda x: x.seconds/3600).astype(np.float))
    t_h = t_d * 24 + t_s
    return np.column_stack([
        fun((2.0 * (i + 1) * np.pi * t_h / period))
        for i in range(series_order)
        for fun in (np.sin, np.cos)
    ])
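Explanation:
t_d = np.array((dates - pd.datetime(1970,1,1)).apply(lambda x: x.days).astype(np.float))
This gives the number of whole days between each date and 1970-1-1.
t_s = np.array((dates - pd.datetime(1970,1,1)).apply(lambda x: x.seconds/3600).astype(np.float))
This gives the hour offset within the day: it takes each datestamp's number of seconds within its day and divides by 3600 to convert to hours. Under Python 2 this is integer division, which is exact for timestamps that fall on the hour.
t_h = t_d * 24 + t_s
This gives the number of hours between each datestamp and 1970-1-1.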
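The same arithmetic can be checked by hand on a single timestamp. This sketch uses only the standard library (Python 3); hours_since_epoch and daily_pair are hypothetical helpers that mirror the three steps above, with float division used so that sub-hour timestamps would also work.

```python
import math
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def hours_since_epoch(ts):
    """Mirror t_h = t_d * 24 + t_s using timedelta.days and timedelta.seconds."""
    delta = ts - EPOCH
    t_d = float(delta.days)       # whole days since the epoch
    t_s = delta.seconds / 3600.0  # hours elapsed within the current day
    return t_d * 24 + t_s


def daily_pair(ts, harmonic=1, period=24):
    """sin/cos of one harmonic of the daily cycle at timestamp ts."""
    angle = 2.0 * harmonic * math.pi * hours_since_epoch(ts) / period
    return (math.sin(angle), math.cos(angle))


stamp = datetime(2017, 3, 1, 8)
t_h = hours_since_epoch(stamp)
today = daily_pair(stamp)
tomorrow = daily_pair(stamp + timedelta(days=1))
next_hour = daily_pair(stamp + timedelta(hours=1))
```

With a period of 24, the features at 08:00 today and 08:00 tomorrow agree up to floating-point error, while 08:00 and 09:00 differ, which is exactly the daily pattern the model needs to see.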
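The input data is a .csv file with two columns, in the same format as the sample rows shown earlier:
ds: the timestamp (hours 0 to 23); y: the observed value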
import pandas as pd
import numpy as np
from fbprophet import Prophet

file_path = '...'

if __name__ == '__main__':
    df = pd.read_csv(file_path)
    df['y'] = np.log(df['y'])
    prophet = Prophet()
    prophet.fit(df)
    future = prophet.make_future_dataframe(freq='H', periods=24)
    forecast = prophet.predict(future)
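One thing to keep in mind with the script above: y is log-transformed before fitting, so the forecast values come back on the log scale and need to be exponentiated to get back to the original units. The sketch below shows the round trip with the standard library only; to_log_scale and from_log_scale are hypothetical helpers, and with pandas the equivalent would presumably be np.exp applied to the forecast's yhat column.

```python
import math


def to_log_scale(values):
    """Forward transform applied before fitting; every value must be > 0."""
    return [math.log(v) for v in values]


def from_log_scale(values):
    """Inverse transform to apply to forecast values."""
    return [math.exp(v) for v in values]


observed = [24.0, 17.0, 18.0, 124.0]
restored = from_log_scale(to_log_scale(observed))
```

The log transform also fails on zero or negative values, so data containing zeros would need to be handled before this step.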
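At the time of writing, fbprophet did not support time-series forecasting at hourly granularity; see https://github.com/facebookincubator/prophet/issues/29
That covers the source changes. My own knowledge is limited and there may be mistakes, so corrections in the comments are welcome. Thanks.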
Previous post: Basic usage of fbprophet, my exploration of fbprophet