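If the period covered by your data includes holidays or other recurring special events, you must create a dataframe for them. The dataframe needs two columns, holiday and ds, with one row for each occurrence of a holiday or special event.
The dataframe must include every occurrence of the holiday, both in the past (the historical data) and in the future (the forecast period).
Because the effect of a holiday or event is not necessarily confined to a single day, you can add two more columns, lower_window and upper_window, which extend the holiday effect to the dates in the range [lower_window, upper_window] around the holiday.
Both lower_window and upper_window are offsets in days relative to the holiday date.
For example:
To include Christmas Eve along with Christmas, set lower_window=-1, upper_window=0.
To include Black Friday along with Thanksgiving, set lower_window=0, upper_window=1.
In addition, a prior_scale column can be used to set the prior scale separately for each holiday.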
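To make the window semantics concrete, here is a minimal sketch in plain Python (no Prophet required). The helper name affected_dates is hypothetical and is not part of Prophet's API; it only illustrates how the offsets in [lower_window, upper_window] map to calendar days.

```python
from datetime import date, timedelta

def affected_dates(holiday, lower_window, upper_window):
    """Return every date covered by a holiday and its window (hypothetical helper)."""
    return [holiday + timedelta(days=offset)
            for offset in range(lower_window, upper_window + 1)]

# Christmas plus Christmas Eve: one day before, nothing after.
christmas = affected_dates(date(2015, 12, 25), lower_window=-1, upper_window=0)
# Thanksgiving plus Black Friday: nothing before, one day after.
thanksgiving = affected_dates(date(2015, 11, 26), lower_window=0, upper_window=1)

print(christmas)
print(thanksgiving)
```

A negative lower_window reaches back before the holiday, and a positive upper_window reaches forward past it.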
Create a dataframe that defines 'holiday', 'ds', 'lower_window', and 'upper_window', recording the dates of all of Peyton Manning's playoff appearances:
(Is a dataframe the equivalent of a class/struct in C++?)
import pandas as pd

# One row per playoff appearance; each game affects the game day and the day after.
playoffs = pd.DataFrame({
    'holiday': 'playoff',
    'ds': pd.to_datetime(['2008-01-13', '2009-01-03', '2010-01-16',
                          '2010-01-24', '2010-02-07', '2011-01-08',
                          '2013-01-12', '2014-01-12', '2014-01-19',
                          '2014-02-02', '2015-01-11', '2016-01-17',
                          '2016-01-24', '2016-02-07']),
    'lower_window': 0,
    'upper_window': 1,
})
Output:
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Initial log joint probability = -19.4685
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
99 8124.7 0.0024575 617.553 0.9029 0.9029 133
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
199 8142.36 0.000450005 130.571 0.9794 0.9794 244
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
299 8145.89 0.000143046 261.785 0.502 0.502 362
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
347 8147.56 7.64187e-005 234.984 1.791e-007 0.001 455 LS failed, Hessian reset
399 8149.51 0.00013137 93.0648 0.9855 0.9855 517
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
409 8149.6 8.99008e-005 127.585 1.039e-006 0.001 573 LS failed, Hessian reset
499 8151.16 0.00129732 104.076 1 1 695
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
599 8152.2 0.0165716 467.02 0.2815 1 822
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
699 8154.09 7.88016e-005 120.573 1 1 946
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
745 8154.45 5.01699e-005 165.713 1.797e-007 0.001 1062 LS failed, Hessian reset
799 8154.87 8.15711e-006 72.1416 1 1 1135
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
899 8154.97 9.47403e-006 68.2667 0.4538 0.4538 1263
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
993 8155.11 1.93662e-006 71.1706 2.817e-008 0.001 1418 LS failed, Hessian reset
999 8155.11 7.04455e-007 49.8225 0.6467 0.06467 1427
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
1001 8155.11 3.11142e-007 57.4737 0.1968 0.1968 1430
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
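Super Bowl days are included among the playoff games, so the superbowl effect is an additional effect on top of the playoff effect. The holidays are included in the forecast through the holidays argument.
Playoff games affect both the game day and the day after, so upper_window is set to 1. The Super Bowl dates also appear in the playoff list.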
superbowls = pd.DataFrame({
    'holiday': 'superbowl',
    'ds': pd.to_datetime(['2010-02-07', '2014-02-02', '2016-02-07']),
    'lower_window': 0,
    'upper_window': 1,
})
Combine the two dataframes into holidays, then fit the model and forecast:
holidays = pd.concat((playoffs, superbowls))
m = Prophet(holidays=holidays)
forecast = m.fit(df).predict(future)
Show the holiday effects for the last ten affected dates:
forecast[(forecast['playoff'] + forecast['superbowl']).abs() > 0][['ds', 'playoff', 'superbowl']][-10:]
             ds   playoff  superbowl
2190 2014-02-02  1.227170   1.203772
2191 2014-02-03  1.913055   1.460153
2532 2015-01-11  1.227170   0.000000
2533 2015-01-12  1.913055   0.000000
2901 2016-01-17  1.227170   0.000000
2902 2016-01-18  1.913055   0.000000
2908 2016-01-24  1.227170   0.000000
2909 2016-01-25  1.913055   0.000000
2922 2016-02-07  1.227170   1.203772
2923 2016-02-08  1.913055   1.460153
Plot the components to see the holiday effects more clearly:
fig = m.plot_components(forecast)
fig.show()
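Individual holidays can be plotted with the plot_forecast_component function (imported from fbprophet.plot in Python). For example, plot_forecast_component(m, forecast, 'superbowl') plots only the superbowl holiday component.
You can also specify a country name, in which case the major holidays of that country are included in addition to the holidays specified manually above: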
m = Prophet(holidays=holidays)
m.add_country_holidays(country_name='US')
m.fit(df)
Output:
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Initial log joint probability = -19.4685
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
99 8150.87 0.00255627 418.81 1 1 126
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
199 8172.63 0.0049679 224.047 1 1 240
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
299 8178 0.000150487 208.217 0.03316 0.3846 354
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
375 8182.64 6.76923e-005 232.737 1.993e-007 0.001 490 LS failed, Hessian reset
399 8183.31 0.000543458 137.264 1 1 518
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
499 8185.01 0.000500865 88.2412 0.8946 0.8946 649
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
541 8186.07 4.97572e-005 178.393 2.22e-007 0.001 776 LS failed, Hessian reset
599 8187.19 0.000445848 175.194 1 1 844
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
648 8187.56 1.79703e-005 67.1318 1.912e-007 0.001 940 LS failed, Hessian reset
682 8187.66 2.1179e-005 103.741 2.31e-007 0.001 1025 LS failed, Hessian reset
699 8187.69 1.24666e-005 57.8593 0.2416 0.8861 1047
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
740 8187.7 1.13533e-005 61.4801 1.238e-007 0.001 1148 LS failed, Hessian reset
780 8187.71 4.94779e-006 63.4785 0.3536 1 1197
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
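To see which holidays the model includes, among them Prophet's built-in holidays for that country, inspect train_holiday_names: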
m.train_holiday_names
Output:
0 playoff
1 superbowl
2 New Year's Day
3 Martin Luther King Jr. Day
4 Washington's Birthday
5 Memorial Day
6 Independence Day
7 Labor Day
8 Columbus Day
9 Veterans Day
10 Thanksgiving
11 Christmas Day
12 Christmas Day (Observed)
13 Veterans Day (Observed)
14 Independence Day (Observed)
15 New Year's Day (Observed)
dtype: object
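Codes for other countries/regions
Holidays are also included for the following countries/regions:
Brazil (BR), Indonesia (ID), India (IN), Malaysia (MY), Vietnam (VN), Thailand (TH), Philippines (PH), Pakistan (PK), Bangladesh (BD), Egypt (EG), China (CN), Russia (RU), Korea (KR), Belarus (BY), and United Arab Emirates (AE).
With the country holidays added, forecast with the fitted model and plot the components: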
forecast = m.predict(future)
fig = m.plot_components(forecast)
fig.show()
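(I am not sure why my trend plot differs from the one on the official site.)
Seasonalities are estimated using a partial Fourier sum. The number of terms in the partial sum (the Fourier order) is a parameter that determines how quickly the seasonality can change. The default Fourier order for yearly seasonality is 10, which produces the following fit: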
from fbprophet.plot import plot_yearly
m = Prophet().fit(df)
a = plot_yearly(m)
Output:
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Initial log joint probability = -19.4685
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
99 7974.84 0.00139678 451.299 1 1 127
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
199 7993.33 0.00250392 122.806 1 1 249
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
257 7996.11 6.49542e-005 196.815 3.664e-007 0.001 352 LS failed, Hessian reset
299 7997.14 0.000274336 138.333 0.4632 0.4632 400
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
353 7998.5 8.92333e-005 165.006 1.585e-007 0.001 508 LS failed, Hessian reset
399 7999.92 0.000379877 87.9033 1 1 568
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
499 8001.45 0.000925247 81.3781 1 1 690
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
501 8001.47 8.16398e-005 187.283 7.257e-007 0.001 737 LS failed, Hessian reset
599 8003.19 0.000309044 85.014 0.2 1 865
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
629 8003.45 0.000142042 190.253 1.703e-006 0.001 943 LS failed, Hessian reset
699 8003.94 3.56514e-005 75.4715 1 1 1030
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
730 8003.97 5.09062e-005 160.324 5.924e-007 0.001 1115 LS failed, Hessian reset
772 8003.99 4.1733e-007 59.2228 1 1 1167
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
(I could not get the plot to display.)
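The partial Fourier sum can be sketched without Prophet. The function below is an illustrative assumption about the shape of the features (one sine and one cosine term per harmonic); the name fourier_features is hypothetical and is not Prophet's API. It shows why a Fourier order of N yields 2N features and why a higher order lets the seasonal curve change faster.

```python
import math

def fourier_features(t_days, period, order):
    """Return sin/cos pairs for harmonics 1..order at time t_days (illustrative)."""
    features = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t_days / period
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features

yearly = fourier_features(t_days=100.0, period=365.25, order=10)
weekly = fourier_features(t_days=100.0, period=7.0, order=3)
print(len(yearly), len(weekly))  # 20 6
```

The fitted seasonal curve is a weighted sum of these features, so adding harmonics adds higher-frequency terms and with them more flexibility, along with a greater risk of overfitting.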
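The add_seasonality method takes a name, the period of the seasonality in days, and the Fourier order of the seasonality.
By default, Prophet uses a Fourier order of 3 for weekly seasonality and 10 for yearly seasonality. The example below replaces the weekly seasonality with a monthly one: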
m = Prophet(weekly_seasonality=False)
m.add_seasonality(name='monthly', period=30.5, fourier_order=5)
forecast = m.fit(df).predict(future)
fig = m.plot_components(forecast)
fig.show()
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Initial log joint probability = -19.4685
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
99 7784.8 0.00227015 614.046 0.2349 0.2349 128
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
199 7798.4 0.00143127 162.739 1 1 246
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
261 7799.8 3.61659e-005 104.539 2.659e-007 0.001 364 LS failed, Hessian reset
299 7800.62 0.000379718 118.058 1 1 413
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
333 7801.56 6.31933e-005 176.544 2.696e-007 0.001 513 LS failed, Hessian reset
391 7802.52 3.96175e-005 99.8835 5.005e-007 0.001 629 LS failed, Hessian reset
399 7802.54 1.67195e-005 62.7667 1.213 0.3624 640
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
499 7804.51 0.00131752 59.3718 1 1 763
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
599 7805.4 7.98569e-005 126.1 0.5951 0.5951 884
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
636 7806.01 5.72452e-005 126.767 2.072e-007 0.001 970 LS failed, Hessian reset
699 7806.67 0.000369987 149.683 1 1 1053
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
705 7806.69 4.38233e-005 107.309 3.084e-007 0.001 1106 LS failed, Hessian reset
734 7806.7 4.6617e-007 66.7769 0.1981 1 1149
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
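Within a yearly pattern, some parts of the year may have seasonality of their own. For example, the summer months may deviate from the yearly seasonality yet follow a weekly seasonality specific to those months, or weekdays and weekends may follow different daily seasonalities.
In other words, different factors may drive the fluctuations at different times, so the underlying pattern differs.
Suppose the on-season and off-season affect the weekly pattern. Add columns to the data that indicate whether each date falls in the on-season or the off-season: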
def is_nfl_season(ds):
    date = pd.to_datetime(ds)
    return (date.month > 8 or date.month < 2)

df['on_season'] = df['ds'].apply(is_nfl_season)
df['off_season'] = ~df['ds'].apply(is_nfl_season)
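Then disable the built-in weekly seasonality and replace it with two weekly seasonalities that use these columns as conditions.
The new condition columns must also be added to the future dataframe used for prediction.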
m = Prophet(weekly_seasonality=False)
m.add_seasonality(name='weekly_on_season', period=7, fourier_order=3, condition_name='on_season')
m.add_seasonality(name='weekly_off_season', period=7, fourier_order=3, condition_name='off_season')
future['on_season'] = future['ds'].apply(is_nfl_season)
future['off_season'] = ~future['ds'].apply(is_nfl_season)
forecast = m.fit(df).predict(future)
fig = m.plot_components(forecast)
fig.show()
Each seasonality applies only to dates where its condition_name column is True.
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Initial log joint probability = -19.4685
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
99 8178.83 0.00761637 397.467 1 1 131
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
199 8200.19 0.0090541 1704.06 1 1 245
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
299 8206.39 0.0115594 300.104 1 1 365
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
399 8208.97 0.00450322 109.836 1 1 477
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
411 8209.38 9.41353e-005 285.111 5.633e-007 0.001 535 LS failed, Hessian reset
499 8210.88 0.0317132 548.857 1 1 637
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
536 8211.74 8.11792e-005 160.718 1.342e-007 0.001 726 LS failed, Hessian reset
599 8213.11 0.00170017 240.641 1 1 802
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
699 8213.7 0.00100337 221.527 0.1273 0.1273 929
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
799 8214.69 2.70914e-005 59.7178 1 1 1061
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
863 8214.75 6.49405e-007 56.7574 1 1 1140
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
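The plot shows that during the on-season, when games are played every Sunday, there are large increases on Sunday and Monday that are entirely absent during the off-season.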
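One way to picture a conditional seasonality is as a mask on the seasonal features. The sketch below assumes that the features are simply set to zero on dates where the condition is False; the helper conditional_weekly_features is hypothetical and only illustrates the idea, not Prophet's internals.

```python
import math
from datetime import date

def is_nfl_season(d):
    # Same rule as above: September through January counts as on-season.
    return d.month > 8 or d.month < 2

def conditional_weekly_features(d, order, active):
    """Weekly Fourier features for date d, zeroed when the condition is False."""
    t = d.toordinal()
    feats = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t / 7.0
        feats.extend([math.sin(angle), math.cos(angle)])
    return feats if active else [0.0] * len(feats)

for d in (date(2015, 10, 4), date(2015, 6, 7)):
    on = conditional_weekly_features(d, 3, is_nfl_season(d))
    off = conditional_weekly_features(d, 3, not is_nfl_season(d))
    print(d, "on-season active:", any(on), "off-season active:", any(off))
```

Because exactly one of the two conditions is True on any date, the two weekly seasonalities never overlap, which is why separate Sunday/Monday patterns can be learned for each part of the year.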
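If the holiday terms are overfitting, adjust their prior scale with the holidays_prior_scale parameter to smooth them.
By default this parameter is 10; reducing it dampens the holiday effects: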
m = Prophet(holidays=holidays, holidays_prior_scale=0.05).fit(df)
forecast = m.predict(future)
forecast[(forecast['playoff'] + forecast['superbowl']).abs() > 0][
['ds', 'playoff', 'superbowl']][-10:]
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Initial log joint probability = -19.4685
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
99 8116.8 0.00283116 470.914 0.7188 0.7188 132
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
199 8133.96 9.92653e-005 213.035 1 1 256
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
299 8137.94 0.000532056 213.994 0.7966 0.7966 377
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
337 8139.66 6.948e-005 167.129 1.509e-007 0.001 463 LS failed, Hessian reset
399 8141.99 0.00645722 672.979 1 1 543
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
404 8142.09 0.0001227 165.145 1.817e-006 0.001 589 LS failed, Hessian reset
499 8142.87 0.0028015 181.008 1 1 709
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
599 8145.14 0.000794858 177.759 1 1 843
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
680 8145.35 9.61321e-005 193.035 1.276e-006 0.001 980 LS failed, Hessian reset
699 8145.38 2.62971e-006 64.4397 0.1976 0.4691 1003
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
743 8145.52 0.000119296 302.311 9.074e-007 0.001 1109 LS failed, Hessian reset
797 8145.7 6.13508e-005 60.8905 7.609e-007 0.001 1219 LS failed, Hessian reset
799 8145.7 1.82998e-005 63.9972 0.9115 0.9115 1221
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
854 8145.71 4.56264e-007 54.811 0.5457 0.5457 1301
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
ds playoff superbowl
2190 2014-02-02 1.205772 0.962992
2191 2014-02-03 1.852125 0.992403
2532 2015-01-11 1.205772 0.000000
2533 2015-01-12 1.852125 0.000000
2901 2016-01-17 1.205772 0.000000
2902 2016-01-18 1.852125 0.000000
2908 2016-01-24 1.205772 0.000000
2909 2016-01-25 1.852125 0.000000
2922 2016-02-07 1.205772 0.962992
2923 2016-02-08 1.852125 0.992403
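Compared with before, the values in the superbowl column are clearly smaller and the playoff values are slightly smaller, so the magnitude of the holiday effect has been reduced.
Similarly, the seasonality_prior_scale parameter adjusts how closely the seasonality model fits the data. Prior scales can be set separately for individual holidays by including a prior_scale column in the holidays dataframe. Prior scales for individual seasonalities can be passed as an argument to add_seasonality. For example, the prior scale for just the weekly seasonality can be set with: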
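Why a smaller prior scale shrinks an effect can be seen in a one-parameter toy model. The sketch below assumes a constant effect observed with Gaussian noise and a zero-mean normal prior whose standard deviation plays the role of the prior scale. The function map_effect and the numbers are made up for illustration; Prophet fits all of its parameters jointly, so this is not its actual computation.

```python
def map_effect(observations, noise_sd, prior_scale):
    """Posterior mean of a constant effect under a Normal(0, prior_scale) prior."""
    n = len(observations)
    precision_data = n / noise_sd ** 2
    precision_prior = 1.0 / prior_scale ** 2
    return (sum(observations) / noise_sd ** 2) / (precision_data + precision_prior)

obs = [1.2, 1.0, 1.3]  # made-up holiday residuals, illustrative only
loose = map_effect(obs, noise_sd=0.5, prior_scale=10.0)
tight = map_effect(obs, noise_sd=0.5, prior_scale=0.05)
print(loose, tight)
```

With a loose prior the estimate stays close to the sample mean, while a tight prior pulls it strongly toward zero, mirroring the reduced holiday effects above.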
m = Prophet()
m.add_seasonality(name='weekly', period=7, fourier_order=3, prior_scale=0.1)
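Additional regressors can be added to the linear part of the model with the add_regressor method or function. A column with the regressor value must be present in both the fitting and the prediction dataframes.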
def nfl_sunday(ds):
    date = pd.to_datetime(ds)
    if date.weekday() == 6 and (date.month > 8 or date.month < 2):
        return 1
    else:
        return 0

df['nfl_sunday'] = df['ds'].apply(nfl_sunday)
m = Prophet()
m.add_regressor('nfl_sunday')
m.fit(df)
future['nfl_sunday'] = future['ds'].apply(nfl_sunday)
forecast = m.predict(future)
fig = m.plot_components(forecast)
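As a quick sanity check of the regressor column, the same rule can be exercised on a few known dates with only the standard library. The function regressor_contribution and the coefficient beta are hypothetical; they illustrate that a regressor in the linear part contributes its coefficient times its value, here assuming a simple additive effect.

```python
from datetime import date

def nfl_sunday(d):
    """1 for Sundays during the NFL season (September through January), else 0."""
    return 1 if d.weekday() == 6 and (d.month > 8 or d.month < 2) else 0

def regressor_contribution(d, beta):
    # Linear term: coefficient times regressor value (beta is a made-up number).
    return beta * nfl_sunday(d)

for d in (date(2016, 1, 17), date(2016, 1, 18), date(2016, 7, 17)):
    print(d, nfl_sunday(d), regressor_contribution(d, beta=0.8))
```

Only the in-season Sunday receives a nonzero contribution; the Monday and the off-season Sunday are left unchanged.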