[python] Time Series Forecasting

  1. https://blog.csdn.net/ZVAyIVqt0UFji/article/details/81784639
    Forecasting periodic time series
    (with Python code)

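For a series with periodic (seasonal) behaviour, the periodic component has to be extracted first. The statsmodels package provides seasonal_decompose for this, which splits a series into trend, seasonal and residual components. seasonal_decompose has a two_sided parameter that defaults to True. The trend is computed with a moving average. Anyone familiar with that method will notice that afterwards some data is missing at both the beginning and the end of the series. If the parameter is set to False, a one-sided moving average is used that looks only at past values, and the missing values appear only at the beginning.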

  1. Some time series theory
    https://wiki.mbalib.com/wiki/时间序列预测法
    https://blog.csdn.net/webzjuyujun/article/details/50618617
    http://www.faushine.com/2018/09/30/2018-10-01-time-serise-forecasting/

  2. A study of Prophet, Facebook's time series forecasting algorithm
    https://zhuanlan.zhihu.com/p/52330017

  3. Some answers on Zhihu (to read again later)
    https://www.zhihu.com/question/21229371

  4. https://juejin.im/entry/5bba1af56fb9a05ce469df75
    A guide to forecasting and modeling multivariate time series (with Python code)

  5. https://www.leiphone.com/news/201702/QjrKc9cLWAiqRGhT.html
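    Time series forecasting tutorial: how to forecast the monthly number of armed robberies in Boston with Python
    It demonstrates many analysis techniques.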


ARIMA

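Choosing p, d, q
d is the order of differencing.
If the series is already stationary without differencing, d = 0.
If it only becomes stationary after first-order differencing, d = 1: ts_diff_1 = ts.diff(1)
And so on for higher orders.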

Ways to choose p and q:

  1. arma_order_select_ic
  2. Plot the ACF and PACF
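#https://blog.csdn.net/u012735708/article/details/82460962
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

def draw_acf_pacf(ts, lags):
    # top panel: autocorrelation (ACF), bottom panel: partial autocorrelation (PACF)
    ts = ts.dropna()  # differencing leaves NaNs at the start; the estimators need complete data
    f = plt.figure(facecolor='white')
    ax1 = f.add_subplot(211)
    plot_acf(ts, ax=ax1, lags=lags)
    ax2 = f.add_subplot(212)
    plot_pacf(ts, ax=ax2, lags=lags)
    plt.subplots_adjust(hspace=0.5)
    plt.show()

# ts_diff_2: the differenced series (not defined in this snippet)
draw_acf_pacf(ts_diff_2, 30)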

https://blog.csdn.net/duxu24/article/details/52079901

seasonal_decompose

https://blog.csdn.net/ZVAyIVqt0UFji/article/details/81784639
https://blog.csdn.net/u012735708/article/details/82460962
https://blog.csdn.net/u011596455/article/details/78650517
https://blog.csdn.net/shine19930820/article/details/72667656
https://www.jianshu.com/p/09e5218f58b4
https://www.biaodianfu.com/time-series-forecasting-codes-python.html (this one seems very detailed!)

test

trainfile2.csv

date_time,pop
2018-11-12 00:00:00,8933.946387360553
2018-11-12 01:00:00,8988.490918493022
2018-11-12 02:00:00,9094.357585159689
2018-11-12 03:00:00,8988.490918493022
2018-11-12 04:00:00,8988.490918493024
2018-11-12 05:00:00,9074.14913063393
2018-11-12 06:00:00,10789.571764691244
2018-11-12 07:00:00,17151.58225829059
2018-11-12 08:00:00,34289.93003731031
2018-11-12 09:00:00,48453.44305799665
2018-11-12 10:00:00,55580.46013546843
2018-11-12 11:00:00,55285.64358646245
2018-11-12 12:00:00,53865.35841064149
2018-11-12 13:00:00,50751.98869523418
2018-11-12 14:00:00,48903.26192598338
2018-11-12 15:00:00,46048.75882075741
2018-11-12 16:00:00,42885.46736698381
2018-11-12 17:00:00,41089.513610739654
2018-11-12 18:00:00,32269.78924473016
2018-11-12 19:00:00,22994.202113475803
2018-11-12 20:00:00,15594.583963821966
2018-11-12 21:00:00,12348.689899740204
2018-11-12 22:00:00,9784.401824944971
2018-11-12 23:00:00,8684.962482819637
2018-11-13 00:00:00,9760.470497980394
2018-11-13 01:00:00,9371.725741882194
2018-11-13 02:00:00,9180.913563863385
2018-11-13 03:00:00,9021.205016507976
2018-11-13 04:00:00,9036.371051780854
2018-11-13 05:00:00,9849.773893538411
2018-11-13 06:00:00,11087.773655083816
2018-11-13 07:00:00,18758.84646297733
2018-11-13 08:00:00,32385.59143733428
2018-11-13 09:00:00,44902.54818034982
2018-11-13 10:00:00,47670.412754751254
2018-11-13 11:00:00,45573.23987926087
2018-11-13 12:00:00,42653.37685965751
2018-11-13 13:00:00,40504.422095592265
2018-11-13 14:00:00,42447.28367976887
2018-11-13 15:00:00,43520.866867409066
2018-11-13 16:00:00,41468.202076732196
2018-11-13 17:00:00,36698.02002347518
2018-11-13 18:00:00,30469.876226131535
2018-11-13 19:00:00,24439.12709756465
2018-11-13 20:00:00,17882.479550801847
2018-11-13 21:00:00,14457.957302516888
2018-11-13 22:00:00,12215.74874918257
2018-11-13 23:00:00,10647.026258806318
2018-11-14 00:00:00,6105.014513591727
2018-11-14 01:00:00,6825.578909884451
2018-11-14 02:00:00,7195.535556190217
2018-11-14 03:00:00,7087.721375598388
2018-11-14 04:00:00,7191.984020174522
2018-11-14 05:00:00,7265.483438736998
2018-11-14 06:00:00,8953.707210120705
2018-11-14 07:00:00,14589.511370989232
2018-11-14 08:00:00,31129.262851727544
2018-11-14 09:00:00,43311.91787561204
2018-11-14 10:00:00,47682.29373178458
2018-11-14 11:00:00,47402.49616644144
2018-11-14 12:00:00,45983.51758548242
2018-11-14 13:00:00,43637.92015605583
2018-11-14 14:00:00,43593.34105468072
2018-11-14 15:00:00,42262.561897424115
2018-11-14 16:00:00,41668.398586024356
2018-11-14 17:00:00,39405.83698887056
2018-11-14 18:00:00,30205.499610635372
2018-11-14 19:00:00,23025.13488288297
2018-11-14 20:00:00,18970.521105334265
2018-11-14 21:00:00,12742.530973464449
2018-11-14 22:00:00,8313.024366198704
2018-11-14 23:00:00,8982.590410152236
2018-11-15 00:00:00,8800.428253606111
2018-11-15 01:00:00,8811.189572200441
2018-11-15 02:00:00,8560.963274739343
2018-11-15 03:00:00,8660.59112690735
2018-11-15 04:00:00,8564.148434345312
2018-11-15 05:00:00,8851.501392228774
2018-11-15 06:00:00,10683.962940181626
2018-11-15 07:00:00,16952.609664650266
2018-11-15 08:00:00,32972.226845337456
2018-11-15 09:00:00,46118.75507854618
2018-11-15 10:00:00,47710.057854152205
2018-11-15 11:00:00,48305.77169129533
2018-11-15 12:00:00,45611.23911046107
2018-11-15 13:00:00,43636.33981682223
2018-11-15 14:00:00,44147.62983544601
2018-11-15 15:00:00,42076.843892962206
2018-11-15 16:00:00,41687.184638055514
2018-11-15 17:00:00,41172.49185094152
2018-11-15 18:00:00,35613.68535869582
2018-11-15 19:00:00,23564.64247908088
2018-11-15 20:00:00,19395.11023229511
2018-11-15 21:00:00,18196.63463327124
2018-11-15 22:00:00,13864.813290840497
2018-11-15 23:00:00,11704.740731628715
2018-11-16 00:00:00,9014.878261660757
2018-11-16 01:00:00,10379.633074227368
2018-11-16 02:00:00,10064.551109189064
2018-11-16 03:00:00,9606.290927464583
2018-11-16 04:00:00,9889.70216790543
2018-11-16 05:00:00,10231.92403036694
2018-11-16 06:00:00,12195.27278773964
2018-11-16 07:00:00,18136.043075006713
2018-11-16 08:00:00,33279.532310524584
2018-11-16 09:00:00,46046.38785307562
2018-11-16 10:00:00,51282.3453017939
2018-11-16 11:00:00,48926.98202235306
2018-11-16 12:00:00,47343.08963096526
2018-11-16 13:00:00,43644.404224324855
2018-11-16 14:00:00,44032.39025873879
2018-11-16 15:00:00,46844.29477636265
2018-11-16 16:00:00,46606.49476495913
2018-11-16 17:00:00,42034.73431723829
2018-11-16 18:00:00,33064.98309447841
2018-11-16 19:00:00,23832.555381010698
2018-11-16 20:00:00,16845.969273631083
2018-11-16 21:00:00,12057.925581073036
2018-11-16 22:00:00,9355.530601129482
2018-11-16 23:00:00,8700.08454664213

python

# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
# statsmodels.tsa.arima_model.ARIMA was removed in statsmodels 0.13;
# statsmodels.tsa.arima.model.ARIMA is its replacement
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.stattools import adfuller, arma_order_select_ic

def draw_acf_pacf(ts, lags):
    ts = ts.dropna()  # differenced series start with NaNs
    f = plt.figure(facecolor='white')
    ax1 = f.add_subplot(211)
    plot_acf(ts, ax=ax1, lags=lags)
    ax2 = f.add_subplot(212)
    plot_pacf(ts, ax=ax2, lags=lags)
    plt.subplots_adjust(hspace=0.5)
    plt.show()

filename = "trainfile2.csv"
df = pd.read_csv(filename)
df.index = pd.to_datetime(df.date_time)
df = df.drop(columns='date_time')  # drop() returns a new DataFrame, so reassign it

ts = df['pop']
#ts.plot()

#draw_acf_pacf(ts,30)
#resDiff = arma_order_select_ic(ts, ic='aic', trend='n')

num_day = 4               # train on the first 4 days
num_train = 24*num_day    # 96 hourly observations
horizon = 24              # forecast the following day
data_ = df["pop"][0:num_train]
# one-sided moving average: the trend only has NaNs at the start
decomposition = seasonal_decompose(data_, period=24, two_sided=False)

trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid
#decomposition.plot()

# model the trend with ARIMA(4,0,1) and forecast the next 24 hours
trend = trend.dropna()
model = ARIMA(trend.to_numpy(), order=(4,0,1))
predict_model = model.fit()
trend_predict = predict_model.forecast(horizon)

# the seasonal component repeats every 24 hours, so its mean over the training
# days is one daily cycle; the forecast day starts at hour 0, like day 1
season_mean = np.mean(np.reshape(np.array(seasonal), [num_day,24]), 0)
pop = np.zeros(num_train + horizon)
pop[0:num_train] = data_
#pop[num_train:]=trend_predict+seasonal[0:24]
pop[num_train:] = trend_predict + season_mean
df_pred = pd.DataFrame(data=pop, index=df.index[:num_train + horizon], columns=["pop"], dtype="float")
plt.plot(df_pred["pop"], 'r')  # red: training data followed by the forecast
plt.plot(df["pop"], 'b')       # blue: observed data
plt.show()

others

A colleague who joined at the same time as me recommended an ARIMA library:
https://www.alkaline-ml.com/pmdarima/index.html
It picks the best order (p, q) by itself. Great!

import numpy as np
import pmdarima as pm
from pmdarima.datasets import load_wineind

# this is a dataset from R
wineind = load_wineind().astype(np.float64)

# fit stepwise auto-ARIMA
stepwise_fit = pm.auto_arima(wineind, start_p=1, start_q=1,
                             max_p=3, max_q=3, m=12,
                             start_P=0, seasonal=True,
                             d=1, D=1, trace=True,
                             error_action='ignore',  # don't want to know if an order does not work
                             suppress_warnings=True,  # don't want convergence warnings
                             stepwise=True)  # set to stepwise

Output:

Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 1, 1, 12); AIC=3066.760, BIC=3082.229, Fit time=0.720 seconds
Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 1, 0, 12); AIC=3133.376, BIC=3139.564, Fit time=0.025 seconds
Fit ARIMA: order=(1, 1, 0) seasonal_order=(1, 1, 0, 12); AIC=3099.734, BIC=3112.109, Fit time=0.166 seconds
Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 1, 1, 12); AIC=3066.930, BIC=3079.305, Fit time=0.165 seconds
Fit ARIMA: order=(1, 1, 1) seasonal_order=(1, 1, 1, 12); AIC=3067.548, BIC=3086.110, Fit time=0.834 seconds
Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 1, 0, 12); AIC=3088.088, BIC=3100.463, Fit time=0.125 seconds
Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 1, 2, 12); AIC=3068.000, BIC=3086.563, Fit time=0.768 seconds
Fit ARIMA: order=(1, 1, 1) seasonal_order=(1, 1, 2, 12); AIC=3068.915, BIC=3090.571, Fit time=2.064 seconds
Fit ARIMA: order=(2, 1, 1) seasonal_order=(0, 1, 1, 12); AIC=3067.447, BIC=3086.010, Fit time=0.359 seconds
Fit ARIMA: order=(1, 1, 0) seasonal_order=(0, 1, 1, 12); AIC=3094.571, BIC=3106.946, Fit time=0.189 seconds
Fit ARIMA: order=(1, 1, 2) seasonal_order=(0, 1, 1, 12); AIC=3066.742, BIC=3085.305, Fit time=0.435 seconds
Fit ARIMA: order=(2, 1, 3) seasonal_order=(0, 1, 1, 12); AIC=3070.634, BIC=3095.384, Fit time=1.244 seconds
Fit ARIMA: order=(1, 1, 2) seasonal_order=(1, 1, 1, 12); AIC=3068.010, BIC=3089.666, Fit time=0.636 seconds
Fit ARIMA: order=(1, 1, 2) seasonal_order=(0, 1, 0, 12); AIC=3090.957, BIC=3106.426, Fit time=0.167 seconds
Fit ARIMA: order=(1, 1, 2) seasonal_order=(0, 1, 2, 12); AIC=3067.731, BIC=3089.387, Fit time=1.125 seconds
Fit ARIMA: order=(1, 1, 2) seasonal_order=(1, 1, 2, 12); AIC=3069.703, BIC=3094.453, Fit time=2.624 seconds
Fit ARIMA: order=(0, 1, 2) seasonal_order=(0, 1, 1, 12); AIC=nan, BIC=nan, Fit time=nan seconds
Fit ARIMA: order=(2, 1, 2) seasonal_order=(0, 1, 1, 12); AIC=3068.673, BIC=3090.330, Fit time=0.633 seconds
Fit ARIMA: order=(1, 1, 3) seasonal_order=(0, 1, 1, 12); AIC=3068.810, BIC=3090.467, Fit time=0.636 seconds
Total fit time: 12.923 seconds

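When AIC and BIC come out as nan (an order that failed to fit), no error is raised, which is nice. That is what error_action='ignore' is for.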

The auto_arima function itself operates a bit like a grid search, in that it tries various sets of p and q (also P and Q for seasonal models) parameters, selecting the model that minimizes the AIC (or BIC, or whatever information criterion you select).
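For reference, these are the two criteria reported in the trace output above. Here k is the number of estimated parameters, n the number of observations and \hat{L} the maximized likelihood. Both reward goodness of fit and penalize extra parameters. BIC penalizes more heavily once \ln n > 2, that is, for n ≥ 8:

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}
```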

Back when I used the ARIMA model from statsmodels.tsa.arima_model (from statsmodels.tsa.arima_model import ARIMA), I had to write my own function to search for p and q:

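import warnings
# ported to statsmodels.tsa.arima.model.ARIMA; the old statsmodels.tsa.arima_model.ARIMA
# (and its fit(transparams=...) option) was removed in statsmodels 0.13
from statsmodels.tsa.arima.model import ARIMA

def get_order(df, p_range, d_range, q_range):
    aic = []
    pdq = []
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # silence convergence warnings during the search only
        for p in range(p_range):
            for d in range(d_range):
                for q in range(q_range):
                    try:
                        arima_mod = ARIMA(df, order=(p, d, q)).fit()
                        aic.append(arima_mod.aic)
                        pdq.append((p, d, q))
                    except Exception:
                        pass  # skip orders that fail to fit
    #When comparing two models, the one with the lower AIC is generally better
    ix = aic.index(min(aic))
    return pdq[ix]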
