[Study Notes - Time Series Forecasting] Prophet Usage, Part 3: Holidays and Special Events

1. Introduction

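If the period covered by the data includes holidays or other recurring special events, you must create a dataframe for them. It needs two columns, holiday and ds, with one row for each occurrence of a holiday or special event.

The dataframe must list every occurrence of each holiday, both in the past (throughout the historical data) and in the future (throughout the forecast horizon).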
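The table has to cover both the history and the forecast horizon, so recurring fixed-date holidays are convenient to generate in a loop. Below is a minimal stdlib sketch that assumes a holiday on a fixed calendar date. `fixed_date_occurrences` is a hypothetical helper and is not part of Prophet, and the year range is only an example. Moving holidays such as Thanksgiving need a calendar rule instead.

```python
from datetime import date

def fixed_date_occurrences(name, month, day, first_year, last_year):
    """Hypothetical helper: one (holiday, ds) row per year for a fixed-date holiday.

    The year range should span the whole history AND the forecast horizon.
    """
    return [{"holiday": name, "ds": date(year, month, day)}
            for year in range(first_year, last_year + 1)]

# Example range (assumed): history through 2015 plus a forecast into 2016
rows = fixed_date_occurrences("christmas", 12, 25, 2008, 2016)
print(len(rows), rows[0]["ds"], rows[-1]["ds"])
```

You can pass the resulting list of dicts to `pd.DataFrame(rows)`. Convert the `ds` column with `pd.to_datetime` before handing it to Prophet.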

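2. Holiday Windows (range of effect)

A holiday or event does not necessarily affect only a single day. You can give each holiday a window of effect by adding two columns to the dataframe, lower_window and upper_window. The holiday effect then extends to every day in the range [lower_window, upper_window] around the holiday date.

Both values are offsets in days from the holiday date. Negative values refer to days before it.

For example:

Christmas plus Christmas Eve: lower_window=-1, upper_window=0.

Thanksgiving plus Black Friday: lower_window=0, upper_window=1.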
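The sketch below makes the window semantics concrete. It uses only the stdlib and expands each holiday date into one indicator per offset in [lower_window, upper_window]. As far as I know, Prophet likewise builds a separate 0/1 feature, with its own coefficient, for each offset day, using names like `christmas_-1` and `christmas_+0`. Treat that naming as an assumption. `window_features` is a hypothetical helper.

```python
from datetime import date, timedelta

def window_features(holiday, dates, lower_window, upper_window):
    """Hypothetical sketch: map '<holiday>_<signed offset>' -> set of affected days."""
    features = {}
    for offset in range(lower_window, upper_window + 1):
        key = f"{holiday}_{offset:+d}"  # e.g. christmas_-1, christmas_+0
        features[key] = {d + timedelta(days=offset) for d in dates}
    return features

# Christmas plus Christmas Eve: lower_window=-1, upper_window=0
xmas = window_features("christmas", [date(2015, 12, 25)], -1, 0)
# Thanksgiving plus Black Friday: lower_window=0, upper_window=1
thanks = window_features("thanksgiving", [date(2015, 11, 26)], 0, 1)
print(sorted(xmas))    # ['christmas_+0', 'christmas_-1']
print(sorted(thanks))  # ['thanksgiving_+0', 'thanksgiving_+1']
```

Each offset gets its own coefficient, so the effect on the holiday itself and on the day after can differ.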

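You can also add a prior_scale column to set the prior scale separately for each holiday (see section 8).

The example below creates a dataframe with the columns 'holiday', 'ds', 'lower_window' and 'upper_window'. It records the dates of all of Peyton Manning's playoff appearances.

(A pandas DataFrame is a two-dimensional table with labeled columns. It is closer to a spreadsheet or an SQL table than to a C++ class/struct.)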

import pandas as pd

# One row per playoff game; lower_window=0 and upper_window=1 extend
# each effect to the game day and the day after
playoffs = pd.DataFrame({
  'holiday': 'playoff',
  'ds': pd.to_datetime(['2008-01-13', '2009-01-03', '2010-01-16',
                        '2010-01-24', '2010-02-07', '2011-01-08',
                        '2013-01-12', '2014-01-12', '2014-01-19',
                        '2014-02-02', '2015-01-11', '2016-01-17',
                        '2016-01-24', '2016-02-07']),
  'lower_window': 0,
  'upper_window': 1,
})
Output (Stan optimizer log printed while fitting the model):
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Initial log joint probability = -19.4685
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
      99        8124.7     0.0024575       617.553      0.9029      0.9029      133
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     199       8142.36   0.000450005       130.571      0.9794      0.9794      244
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     299       8145.89   0.000143046       261.785       0.502       0.502      362
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     347       8147.56  7.64187e-005       234.984  1.791e-007       0.001      455  LS failed, Hessian reset
     399       8149.51    0.00013137       93.0648      0.9855      0.9855      517
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     409        8149.6  8.99008e-005       127.585  1.039e-006       0.001      573  LS failed, Hessian reset
     499       8151.16    0.00129732       104.076           1           1      695
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     599        8152.2     0.0165716        467.02      0.2815           1      822
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     699       8154.09  7.88016e-005       120.573           1           1      946
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     745       8154.45  5.01699e-005       165.713  1.797e-007       0.001     1062  LS failed, Hessian reset
     799       8154.87  8.15711e-006       72.1416           1           1     1135
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     899       8154.97  9.47403e-006       68.2667      0.4538      0.4538     1263
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     993       8155.11  1.93662e-006       71.1706  2.817e-008       0.001     1418  LS failed, Hessian reset
     999       8155.11  7.04455e-007       49.8225      0.6467     0.06467     1427
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
    1001       8155.11  3.11142e-007       57.4737      0.1968      0.1968     1430
Optimization terminated normally:
  Convergence detected: relative gradient magnitude is below tolerance

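The Super Bowl days are included among the playoff games and also get their own holiday entry. The Super Bowl effect is therefore an additional effect on top of the playoff effect. The holidays are passed to the model through the holidays argument.

A playoff game is assumed to affect the game day and the following day, so upper_window is set to 1. The Super Bowl dates are also listed among the playoff dates.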

superbowls = pd.DataFrame({
  'holiday': 'superbowl',
  'ds': pd.to_datetime(['2010-02-07', '2014-02-02', '2016-02-07']),
  'lower_window': 0,
  'upper_window': 1,
})

3. Forecasting

Combine the two tables into a single holidays dataframe and forecast:

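from fbprophet import Prophet  # 'from prophet import Prophet' in Prophet >= 1.0

holidays = pd.concat((playoffs, superbowls))  # stack both event tables
m = Prophet(holidays=holidays)
# df (history) and future (make_future_dataframe output) come from the earlier notes
forecast = m.fit(df).predict(future)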

Show the last 10 rows that have a nonzero holiday effect:

forecast[(forecast['playoff'] + forecast['superbowl']).abs() > 0][['ds', 'playoff', 'superbowl']][-10:]
             ds   playoff  superbowl
2190 2014-02-02  1.227170   1.203772
2191 2014-02-03  1.913055   1.460153
2532 2015-01-11  1.227170   0.000000
2533 2015-01-12  1.913055   0.000000
2901 2016-01-17  1.227170   0.000000
2902 2016-01-18  1.913055   0.000000
2908 2016-01-24  1.227170   0.000000
2909 2016-01-25  1.913055   0.000000
2922 2016-02-07  1.227170   1.203772
2923 2016-02-08  1.913055   1.460153
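Every Super Bowl date also appears in the playoffs table. On those days both indicators fire and the two effects simply add up, because holiday effects are additive in the default additive mode. Here is a small worked check using values copied from the table above:

```python
# (playoff, superbowl) effects copied from the table above
effects = {
    "2014-02-02": (1.227170, 1.203772),  # Super Bowl day: both entries active
    "2014-02-03": (1.913055, 1.460153),  # day after (upper_window=1)
    "2015-01-11": (1.227170, 0.000000),  # ordinary playoff game
}

# Total holiday contribution per day = sum of the individual holiday effects
total = {ds: playoff + superbowl for ds, (playoff, superbowl) in effects.items()}
for ds, value in total.items():
    print(ds, f"{value:.6f}")
```

On the Super Bowl day the combined effect is about 2.43. An ordinary playoff game day gets about 1.23.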

Plot the components to see the holiday effects:

fig = m.plot_components(forecast)
fig.show()

[Figure 1: forecast components]

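Individual holidays can be plotted with the plot_forecast_component function, imported from fbprophet.plot in Python (the package is named prophet in newer releases). For example, plot_forecast_component(m, forecast, 'superbowl') plots only the Super Bowl holiday component.

4. Built-in Country Holidays

If you specify a country/region name, the model includes that country's major holidays in addition to the custom holidays specified above: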

m = Prophet(holidays=holidays)
m.add_country_holidays(country_name='US')
m.fit(df)

Output:

INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Initial log joint probability = -19.4685
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
      99       8150.87    0.00255627        418.81           1           1      126
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     199       8172.63     0.0049679       224.047           1           1      240
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     299          8178   0.000150487       208.217     0.03316      0.3846      354
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     375       8182.64  6.76923e-005       232.737  1.993e-007       0.001      490  LS failed, Hessian reset
     399       8183.31   0.000543458       137.264           1           1      518
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     499       8185.01   0.000500865       88.2412      0.8946      0.8946      649
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     541       8186.07  4.97572e-005       178.393   2.22e-007       0.001      776  LS failed, Hessian reset
     599       8187.19   0.000445848       175.194           1           1      844
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     648       8187.56  1.79703e-005       67.1318  1.912e-007       0.001      940  LS failed, Hessian reset
     682       8187.66   2.1179e-005       103.741   2.31e-007       0.001     1025  LS failed, Hessian reset
     699       8187.69  1.24666e-005       57.8593      0.2416      0.8861     1047
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     740        8187.7  1.13533e-005       61.4801  1.238e-007       0.001     1148  LS failed, Hessian reset
     780       8187.71  4.94779e-006       63.4785      0.3536           1     1197
Optimization terminated normally:
  Convergence detected: relative gradient magnitude is below tolerance

To see which holidays the fitted model uses, including the custom ones:

m.train_holiday_names

Output:

0                         playoff
1                       superbowl
2                  New Year's Day
3      Martin Luther King Jr. Day
4           Washington's Birthday
5                    Memorial Day
6                Independence Day
7                       Labor Day
8                    Columbus Day
9                    Veterans Day
10                   Thanksgiving
11                  Christmas Day
12       Christmas Day (Observed)
13        Veterans Day (Observed)
14    Independence Day (Observed)
15      New Year's Day (Observed)
dtype: object
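The "(Observed)" entries follow the US federal rule for holidays that fall on a weekend. A Saturday holiday is observed on the preceding Friday, and a Sunday holiday on the following Monday. The stdlib sketch below implements that rule. `observed_date` is a hypothetical helper. Prophet takes these dates from the holidays package, whose per-holiday rules may have exceptions.

```python
from datetime import date, timedelta

def observed_date(actual):
    """Hypothetical sketch of the US federal weekend-observance rule."""
    if actual.weekday() == 5:      # Saturday -> preceding Friday
        return actual - timedelta(days=1)
    if actual.weekday() == 6:      # Sunday -> following Monday
        return actual + timedelta(days=1)
    return actual                  # weekday: no separate observed day

print(observed_date(date(2010, 12, 25)))  # Christmas 2010 fell on a Saturday
print(observed_date(date(2011, 12, 25)))  # Christmas 2011 fell on a Sunday
```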

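Country holidays come from the Python holidays package. Its documentation lists the supported countries and the country name to pass for each one.

In addition, Prophet includes holidays for the following countries/regions:

Brazil (BR), Indonesia (ID), India (IN), Malaysia (MY), Vietnam (VN), Thailand (TH), Philippines (PH), Pakistan (PK), Bangladesh (BD), Egypt (EG), China (CN), Russia (RU), Korea (KR), Belarus (BY) and United Arab Emirates (AE).

With the country holidays added (the model was fitted above), forecast and plot the components: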

forecast = m.predict(future)
fig = m.plot_components(forecast)
fig.show()

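[Figure 2: forecast components with US holidays added]

(For some reason, my trend plot differs from the one in the official documentation.)

5. Fourier Order for Seasonalities

Seasonalities are estimated with a partial Fourier sum. The number of terms in the partial sum (the Fourier order) determines how quickly the seasonality can change. The default Fourier order for yearly seasonality is 10, which produces the following fit: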

from fbprophet.plot import plot_yearly
m = Prophet().fit(df)
a = plot_yearly(m)

Output:

INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Initial log joint probability = -19.4685
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
      99       7974.84    0.00139678       451.299           1           1      127
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     199       7993.33    0.00250392       122.806           1           1      249
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     257       7996.11  6.49542e-005       196.815  3.664e-007       0.001      352  LS failed, Hessian reset
     299       7997.14   0.000274336       138.333      0.4632      0.4632      400
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     353        7998.5  8.92333e-005       165.006  1.585e-007       0.001      508  LS failed, Hessian reset
     399       7999.92   0.000379877       87.9033           1           1      568
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     499       8001.45   0.000925247       81.3781           1           1      690
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     501       8001.47  8.16398e-005       187.283  7.257e-007       0.001      737  LS failed, Hessian reset
     599       8003.19   0.000309044        85.014         0.2           1      865
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     629       8003.45   0.000142042       190.253  1.703e-006       0.001      943  LS failed, Hessian reset
     699       8003.94  3.56514e-005       75.4715           1           1     1030
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     730       8003.97  5.09062e-005       160.324  5.924e-007       0.001     1115  LS failed, Hessian reset
     772       8003.99   4.1733e-007       59.2228           1           1     1167
Optimization terminated normally:
  Convergence detected: relative gradient magnitude is below tolerance

(The resulting plot would not display for me.)
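For reference, the partial Fourier sum has the form below. P is the period in days (365.25 for yearly, 7 for weekly) and N is the Fourier order. An order of N adds 2N coefficients, so the default yearly order of 10 fits 20 parameters. A larger N lets the seasonality change faster, at a higher risk of overfitting.

```latex
s(t) = \sum_{n=1}^{N} \left( a_n \cos\!\left(\frac{2\pi n t}{P}\right) + b_n \sin\!\left(\frac{2\pi n t}{P}\right) \right)
```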

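6. Custom Seasonalities

Custom seasonalities are added with add_seasonality. It takes a name, the period of the seasonality in days, and the Fourier order.

By default, Prophet uses a Fourier order of 3 for weekly seasonality and 10 for yearly seasonality. The example below turns off weekly seasonality and adds a monthly seasonality instead: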

m = Prophet(weekly_seasonality=False)
m.add_seasonality(name='monthly', period=30.5, fourier_order=5)
forecast = m.fit(df).predict(future)
fig = m.plot_components(forecast)
fig.show()
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Initial log joint probability = -19.4685
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
      99        7784.8    0.00227015       614.046      0.2349      0.2349      128
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     199        7798.4    0.00143127       162.739           1           1      246
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     261        7799.8  3.61659e-005       104.539  2.659e-007       0.001      364  LS failed, Hessian reset
     299       7800.62   0.000379718       118.058           1           1      413
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     333       7801.56  6.31933e-005       176.544  2.696e-007       0.001      513  LS failed, Hessian reset
     391       7802.52  3.96175e-005       99.8835  5.005e-007       0.001      629  LS failed, Hessian reset
     399       7802.54  1.67195e-005       62.7667       1.213      0.3624      640
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     499       7804.51    0.00131752       59.3718           1           1      763
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     599        7805.4  7.98569e-005         126.1      0.5951      0.5951      884
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     636       7806.01  5.72452e-005       126.767  2.072e-007       0.001      970  LS failed, Hessian reset
     699       7806.67   0.000369987       149.683           1           1     1053
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     705       7806.69  4.38233e-005       107.309  3.084e-007       0.001     1106  LS failed, Hessian reset
     734        7806.7   4.6617e-007       66.7769      0.1981           1     1149
Optimization terminated normally:
  Convergence detected: relative gradient magnitude is below tolerance

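[Figure 3: forecast components with the monthly seasonality]

7. Seasonalities That Depend on Other Factors (conditional seasonalities)

A seasonal pattern may itself depend on other factors. For example, the weekly pattern during the summer months may differ from the weekly pattern in the rest of the year. Likewise, the daily pattern on weekends may differ from the daily pattern on weekdays.

In other words, the data may be driven by different factors at different times and therefore follow different underlying patterns.

Here we assume that the NFL season (on-season vs. off-season) changes the weekly pattern. We add columns to the data that indicate whether each date falls in the season: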
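The sketch below shows what the monthly seasonality above (period=30.5, fourier_order=5) adds to the model: one sin/cos pair per order, evaluated at t in days. To my understanding, this mirrors how Prophet builds its seasonality regressors, measuring t in days since 1970-01-01, but treat the details as an assumption. `fourier_features` is a hypothetical helper.

```python
import math
from datetime import date

def fourier_features(day, period, order, epoch=date(1970, 1, 1)):
    """Sketch: 2*order sin/cos features for a seasonality of `period` days.

    t is measured in days since an epoch (1970-01-01 assumed here).
    """
    t = (day - epoch).days
    features = []
    for n in range(1, order + 1):
        angle = 2 * math.pi * n * t / period
        features.extend([math.sin(angle), math.cos(angle)])
    return features

# Monthly seasonality as above: period=30.5 days, fourier_order=5
monthly = fourier_features(date(2015, 3, 1), period=30.5, order=5)
print(len(monthly))  # 10 features -> 10 coefficients to fit

# A weekly seasonality (period 7) repeats exactly every 7 days
w1 = fourier_features(date(2015, 3, 1), period=7, order=3)
w2 = fourier_features(date(2015, 3, 8), period=7, order=3)
print(all(abs(a - b) < 1e-9 for a, b in zip(w1, w2)))  # True
```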

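import pandas as pd

def is_nfl_season(ds):
    # NFL season: September through January
    date = pd.to_datetime(ds)
    return (date.month > 8 or date.month < 2)

# Boolean condition columns: on_season marks Sep-Jan, off_season is its complement
df['on_season'] = df['ds'].apply(is_nfl_season)
df['off_season'] = ~df['ds'].apply(is_nfl_season)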

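Then disable the built-in weekly seasonality and replace it with two weekly seasonalities, each conditioned on one of these columns.

The condition columns must also be added to the future dataframe used for prediction.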

m = Prophet(weekly_seasonality=False)
m.add_seasonality(name='weekly_on_season', period=7, fourier_order=3, condition_name='on_season')
m.add_seasonality(name='weekly_off_season', period=7, fourier_order=3, condition_name='off_season')

future['on_season'] = future['ds'].apply(is_nfl_season)
future['off_season'] = ~future['ds'].apply(is_nfl_season)
forecast = m.fit(df).predict(future)
fig = m.plot_components(forecast)
fig.show()

Each conditional seasonality is applied only on dates where its condition_name column is True.

INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Initial log joint probability = -19.4685
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
      99       8178.83    0.00761637       397.467           1           1      131
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     199       8200.19     0.0090541       1704.06           1           1      245
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     299       8206.39     0.0115594       300.104           1           1      365
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     399       8208.97    0.00450322       109.836           1           1      477
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     411       8209.38  9.41353e-005       285.111  5.633e-007       0.001      535  LS failed, Hessian reset
     499       8210.88     0.0317132       548.857           1           1      637
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     536       8211.74  8.11792e-005       160.718  1.342e-007       0.001      726  LS failed, Hessian reset
     599       8213.11    0.00170017       240.641           1           1      802
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     699        8213.7    0.00100337       221.527      0.1273      0.1273      929
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     799       8214.69  2.70914e-005       59.7178           1           1     1061
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     863       8214.75  6.49405e-007       56.7574           1           1     1140
Optimization terminated normally:
  Convergence detected: relative gradient magnitude is below tolerance

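[Figure 4: forecast components with on-season and off-season weekly seasonalities]

The plot shows large increases on Sundays and Mondays during the on-season, when games are played on Sundays. These increases are completely absent during the off-season.

8. Prior Scale for Holidays and Seasonality

If the holidays seem to be overfitting, you can reduce the prior scale holidays_prior_scale to smooth them.

By default this parameter is 10, which provides very little regularization. Reducing it dampens the holiday effects: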
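To my understanding, Prophet implements condition_name by zeroing the seasonality's Fourier features on rows where the condition column is False. Each conditional seasonality therefore contributes only on its own dates. The stdlib sketch below shows that masking. `weekly_features` and `conditional_features` are hypothetical helpers.

```python
import math
from datetime import date

def is_nfl_season(day):
    # Same rule as in the notes: September through January is the season
    return day.month > 8 or day.month < 2

def weekly_features(day, order=3, epoch=date(1970, 1, 1)):
    # sin/cos pairs for a 7-day period, with t measured in days since the epoch
    t = (day - epoch).days
    feats = []
    for n in range(1, order + 1):
        angle = 2 * math.pi * n * t / 7
        feats += [math.sin(angle), math.cos(angle)]
    return feats

def conditional_features(day, condition):
    # Keep the features where the condition holds; zero them elsewhere,
    # so the seasonality contributes nothing on those dates
    feats = weekly_features(day)
    return feats if condition else [0.0] * len(feats)

for day in (date(2015, 9, 13), date(2015, 7, 12)):
    on = conditional_features(day, is_nfl_season(day))       # weekly_on_season
    off = conditional_features(day, not is_nfl_season(day))  # weekly_off_season
    print(day, "on_season active:", any(on), "off_season active:", any(off))
```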

m = Prophet(holidays=holidays, holidays_prior_scale=0.05).fit(df)
forecast = m.predict(future)
forecast[(forecast['playoff'] + forecast['superbowl']).abs() > 0][
    ['ds', 'playoff', 'superbowl']][-10:]
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Initial log joint probability = -19.4685
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
      99        8116.8    0.00283116       470.914      0.7188      0.7188      132
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     199       8133.96  9.92653e-005       213.035           1           1      256
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     299       8137.94   0.000532056       213.994      0.7966      0.7966      377
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     337       8139.66    6.948e-005       167.129  1.509e-007       0.001      463  LS failed, Hessian reset
     399       8141.99    0.00645722       672.979           1           1      543
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     404       8142.09     0.0001227       165.145  1.817e-006       0.001      589  LS failed, Hessian reset
     499       8142.87     0.0028015       181.008           1           1      709
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     599       8145.14   0.000794858       177.759           1           1      843
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     680       8145.35  9.61321e-005       193.035  1.276e-006       0.001      980  LS failed, Hessian reset
     699       8145.38  2.62971e-006       64.4397      0.1976      0.4691     1003
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     743       8145.52   0.000119296       302.311  9.074e-007       0.001     1109  LS failed, Hessian reset
     797        8145.7  6.13508e-005       60.8905  7.609e-007       0.001     1219  LS failed, Hessian reset
     799        8145.7  1.82998e-005       63.9972      0.9115      0.9115     1221
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
     854       8145.71  4.56264e-007        54.811      0.5457      0.5457     1301
Optimization terminated normally:
  Convergence detected: relative gradient magnitude is below tolerance

             ds   playoff  superbowl
2190 2014-02-02  1.205772   0.962992
2191 2014-02-03  1.852125   0.992403
2532 2015-01-11  1.205772   0.000000
2533 2015-01-12  1.852125   0.000000
2901 2016-01-17  1.205772   0.000000
2902 2016-01-18  1.852125   0.000000
2908 2016-01-24  1.205772   0.000000
2909 2016-01-25  1.852125   0.000000
2922 2016-02-07  1.205772   0.962992
2923 2016-02-08  1.852125   0.992403

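Compared with before, the holiday effects are smaller. The playoff values shrink only slightly (for example, 1.227 to 1.206). The superbowl values drop much more (1.204 to 0.963), because the Super Bowl has the fewest occurrences in the data.

Similarly, seasonality_prior_scale controls how flexibly the seasonality components fit the data. Prior scales can be set for individual holidays by including a prior_scale column in the holidays dataframe. The prior scale of an individual seasonality can be passed as an argument to add_seasonality. For example, the prior scale of just the weekly seasonality can be set with: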
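Why does a smaller prior scale shrink the effects, and the rare Super Bowl effect most of all? As I understand it, Prophet puts a zero-mean normal prior, whose standard deviation is the prior scale, on each holiday coefficient. The toy conjugate model below estimates one effect in isolation. It is not Prophet's actual Stan fit, which estimates all coefficients jointly, and the data are made up.

```python
def map_effect(observations, noise_sd, prior_scale):
    """Toy sketch: posterior mode of a single holiday effect beta.

    Assumed model (illustration only): y_i = beta + noise_i,
    noise_i ~ Normal(0, noise_sd**2), prior beta ~ Normal(0, prior_scale**2).
    """
    n = len(observations)
    precision = n / noise_sd**2 + 1 / prior_scale**2  # data precision + prior precision
    return (sum(observations) / noise_sd**2) / precision

# Made-up holiday-day residuals: a rare event (3 days) vs. a frequent one (14 days)
rare = [1.2] * 3
frequent = [1.2] * 14
for scale in (10, 0.5, 0.05):
    print(f"prior_scale={scale}: rare={map_effect(rare, 0.5, scale):.3f}, "
          f"frequent={map_effect(frequent, 0.5, scale):.3f}")
```

Both estimates move toward 0 as prior_scale falls, and the event with fewer observations moves further. This matches the pattern in the table above.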

m = Prophet()
m.add_seasonality(name='weekly', period=7, fourier_order=3, prior_scale=0.1)
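For the per-holiday prior_scale column, my understanding of Prophet's behavior is as follows. A row's prior_scale overrides holidays_prior_scale, rows without a value fall back to the model-wide default, and all rows of one holiday must agree. The stdlib sketch below implements that resolution logic. `resolve_prior_scales` is a hypothetical helper, not a Prophet function.

```python
def resolve_prior_scales(rows, holidays_prior_scale=10.0):
    """Hypothetical sketch: resolve one prior scale per holiday name."""
    scales = {}
    for row in rows:
        ps = row.get("prior_scale")
        if ps is None:
            ps = holidays_prior_scale  # fall back to the model-wide default
        if scales.setdefault(row["holiday"], ps) != ps:
            raise ValueError(f"inconsistent prior_scale for {row['holiday']!r}")
    return scales

rows = [
    {"holiday": "playoff", "ds": "2016-01-17"},  # no per-row scale
    {"holiday": "superbowl", "ds": "2016-02-07", "prior_scale": 0.05},
]
print(resolve_prior_scales(rows))
```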

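9. Additional Regressors

Additional regressors can be added to the linear part of the model with the add_regressor method. A column with the regressor values must be present in both the fitting dataframe and the prediction dataframe.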

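import pandas as pd

def nfl_sunday(ds):
    # 1 on Sundays during the NFL season (September through January), otherwise 0
    date = pd.to_datetime(ds)
    return int(date.weekday() == 6 and (date.month > 8 or date.month < 2))

# The regressor column must exist in the training dataframe (and later in future)
df['nfl_sunday'] = df['ds'].apply(nfl_sunday)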

m = Prophet()
m.add_regressor('nfl_sunday')
m.fit(df)

future['nfl_sunday'] = future['ds'].apply(nfl_sunday)

forecast = m.predict(future)
fig = m.plot_components(forecast)
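The regressor column has to be filled for every row of both frames. As far as I know, Prophet raises an error when a declared regressor is missing at prediction time. The stdlib sketch below fills the column for a small history and future and checks it. `check_regressors` is a hypothetical helper, and the dates are an arbitrary example.

```python
from datetime import date, timedelta

def nfl_sunday(day):
    # 1 on Sundays (weekday() == 6) during the NFL season (September-January), else 0
    return int(day.weekday() == 6 and (day.month > 8 or day.month < 2))

def check_regressors(rows, regressors):
    """Hypothetical check: every row must carry a value for every extra regressor."""
    for name in regressors:
        missing = [r["ds"] for r in rows if r.get(name) is None]
        if missing:
            raise ValueError(f"regressor {name!r} missing for {len(missing)} row(s)")

history = [{"ds": date(2015, 9, 6) + timedelta(days=i)} for i in range(14)]
future = [{"ds": date(2015, 9, 20) + timedelta(days=i)} for i in range(7)]
for row in history + future:  # fill the column in BOTH frames
    row["nfl_sunday"] = nfl_sunday(row["ds"])
check_regressors(history + future, ["nfl_sunday"])
print([r["ds"].isoformat() for r in history + future if r["nfl_sunday"]])
```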
