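Contents
3.1 Data preparation
3.2 Differencing analysis
3.3 Determining p and q in ARIMA(p,d,q)
3.4 BIC test
3.4.1 Grid search for suitable parameters
3.4.2 Obtaining the optimal p and q directly
3.5 Model diagnostics
3.6 Model prediction
3.6.1 Predicting closing prices for 2021-04 to 2021-05
3.6.2 Predicting closing prices for 2021-07 to 2021-09
3.6.3 Forecasting Xingrong Environment closing prices for the next 15 trading days
For the theory behind the ARIMA model, see "Principles of the ARIMA(p,d,q) Model"; the hands-on implementation follows.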
# -*- coding: utf-8 -*-
"""
Created on Sat Sep 4 11:39:06 2021
@author: Zhuchunqiang
"""
import pandas as pd
Stock_XRHJ = pd.read_csv('XRHJ000598.csv',index_col = 'Date',parse_dates=['Date'])
>>Stock_XRHJ
Out[7]:
Unnamed: 0 Open Close High Low Volume
Date
2019-01-17 0 4.18 4.15 4.18 4.14 78508
2019-01-18 1 4.16 4.18 4.19 4.15 88076
2019-01-21 2 4.20 4.17 4.22 4.17 78809
2019-01-22 3 4.17 4.18 4.21 4.16 77226
2019-01-23 4 4.17 4.19 4.20 4.15 71678
... ... ... ... ... ...
2021-08-30 636 5.50 5.44 5.51 5.35 257074
2021-08-31 637 5.47 5.54 5.58 5.43 252643
2021-09-01 638 5.54 5.55 5.63 5.51 265328
2021-09-02 639 5.55 5.60 5.61 5.51 189830
2021-09-03 640 5.59 5.62 5.72 5.56 211018
[641 rows x 6 columns]
Figure 16: Full Xingrong Environment dataset    Figure 17: Xingrong Environment training set
Figure 18: Xingrong Environment test set
# -*- coding: utf-8 -*-
"""
Created on Sat Sep 4 11:39:06 2021
@author: Zhuchunqiang
"""
import pandas as pd
import matplotlib.pyplot as plt
Stock_XRHJ = pd.read_csv('XRHJ000598.csv',index_col = 'Date',parse_dates=['Date'])
df = pd.DataFrame(Stock_XRHJ)
#1. Data preparation
df.index = pd.to_datetime(df.index)
sub = df.loc['2019-01':'2021-08', 'Close']    # full sample used for the final fit
train = df.loc['2019-01':'2020-12', 'Close']  # training set
test = df.loc['2021-01':'2021-08', 'Close']   # test set
plt.figure(figsize=(20,10))
print(test)
plt.plot(test)
plt.grid()
plt.show()
#2. Differencing
df['Close_diff_1'] = df['Close'].diff(1)  # first-order difference
df['Close_diff_2'] = df['Close_diff_1'].diff(1)  # second-order difference
fig = plt.figure(figsize=(20,6))
ax1 = fig.add_subplot(131)
ax1.plot(df['Close'])
ax2 = fig.add_subplot(132)
ax2.plot(df['Close_diff_1'])
ax3 = fig.add_subplot(133)
ax3.plot(df['Close_diff_2'])
plt.show()
Figure 19: The original series with its first-order and second-order differences
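To make the differencing step concrete, the following is a minimal sketch in plain Python (no pandas) of what a lag-1 difference computes. The helper `diff` is a hypothetical stand-in for `Series.diff(1)`, with `None` playing the role of pandas' NaN; the five prices are the first five closing values from the data listing above.

```python
# First five closing prices from the data listing (2019-01-17 to 2019-01-23).
close = [4.15, 4.18, 4.17, 4.18, 4.19]

def diff(values, lag=1):
    """Return values[t] - values[t - lag]; the first `lag` slots have no predecessor."""
    return [None] * lag + [values[t] - values[t - lag] for t in range(lag, len(values))]

# First-order difference: day-to-day change in the closing price.
first = diff(close)

# Second-order difference: difference of the first difference,
# re-aligned so that positions match the original series.
second = [None] + diff(first[1:])

print([None if v is None else round(v, 2) for v in first])
print([None if v is None else round(v, 2) for v in second])
```

Each round of differencing costs one observation at the start of the series, which is why the second-order column begins with two missing values.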
import statsmodels.api as sm
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(train, lags=20,ax=ax1)
ax1.xaxis.set_ticks_position('bottom')
fig.tight_layout()
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(train, lags=20, ax=ax2)
ax2.xaxis.set_ticks_position('bottom')
fig.tight_layout()
plt.show()
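## The autocorrelation tails off and the partial autocorrelation cuts off after lag 2, so an AR(2) model is a candidate.
Figure 20: Autocorrelation and partial autocorrelation plots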
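As a rough illustration of the quantity on the vertical axis of an autocorrelation plot, the sketch below computes the usual sample autocorrelation by hand. The function name `sample_acf` and the ten-point series are hypothetical; the series is only meant to imitate a slowly drifting price.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (lag-k covariance over lag-0 covariance)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]

# Hypothetical slowly drifting series (illustrative values, not the training set).
series = [4.15, 4.18, 4.17, 4.18, 4.19, 4.22, 4.25, 4.24, 4.27, 4.30]

acf = sample_acf(series, 3)
print([round(r, 3) for r in acf])
```

For a trending series like this one the coefficients shrink gradually as the lag grows, which is the "tailing off" pattern referred to in the comment above.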
#4. BIC test
#4.1 Grid search for suitable parameters
import itertools
import numpy as np
import seaborn as sns
p_min = 0
d_min = 0
q_min = 0
p_max = 5
d_max = 0
q_max = 5
# Initialize a DataFrame to store the results, using the BIC criterion
results_bic = pd.DataFrame(index=['AR{}'.format(i) for i in range(p_min,p_max+1)],
                           columns=['MA{}'.format(i) for i in range(q_min,q_max+1)])
for p,d,q in itertools.product(range(p_min,p_max+1),
                               range(d_min,d_max+1),
                               range(q_min,q_max+1)):
    if p==0 and d==0 and q==0:
        results_bic.loc['AR{}'.format(p), 'MA{}'.format(q)] = np.nan
        continue
    try:
        model = sm.tsa.ARIMA(train, order=(p, d, q))
        results = model.fit()
        results_bic.loc['AR{}'.format(p), 'MA{}'.format(q)] = results.bic
    except Exception:
        # Skip orders that fail to fit; the cell stays NaN
        continue
results_bic = results_bic[results_bic.columns].astype(float)
fig, ax = plt.subplots(figsize=(10, 8))
ax = sns.heatmap(results_bic,
mask=results_bic.isnull(),
ax=ax,
annot=True,
fmt='.2f',
)
ax.set_title('BIC')
plt.show()
# The heatmap shows the minimum BIC at (p, q) = (1, 0), which suggests choosing an AR(1) model
Figure 21: BIC heatmap
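Reading the minimum off a heatmap by eye can also be done programmatically. The sketch below shows one way to pick the smallest finite value from a grid of criterion values; the dictionary `bic_grid` and its numbers are entirely hypothetical and are not the values from the figure.

```python
import math

# Hypothetical BIC values keyed by (p, q); NaN marks a combination that was
# skipped or failed to fit, as in the grid search above.
bic_grid = {
    (0, 0): math.nan,
    (0, 1): -512.4,
    (1, 0): -1530.2,
    (1, 1): -1524.1,
    (2, 0): -1524.3,
}

def best_order(grid):
    """Return the (p, q) pair with the smallest finite criterion value."""
    finite = {order: value for order, value in grid.items() if not math.isnan(value)}
    return min(finite, key=finite.get)

print(best_order(bic_grid))
```

Filtering out NaN first matters because comparisons against NaN are always false, so a NaN entry could otherwise be returned as the minimum depending on iteration order.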
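#4.2 Direct method for obtaining the optimal p and q
import statsmodels.api as sm
# trend='nc' (no constant) is the spelling used by older statsmodels releases; newer releases may expect trend='n'
train_results = sm.tsa.arma_order_select_ic(train, ic=['aic', 'bic'], trend='nc', max_ar=8, max_ma=8)
>>print('AIC', train_results.aic_min_order)
>>print('BIC', train_results.bic_min_order)
Output: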
AIC (1, 4)
BIC (1, 0)
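AIC and BIC can disagree, as they do here, because BIC charges more per parameter once the sample is larger than about eight observations. The sketch below uses the textbook definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L; the log-likelihoods, parameter counts, and sample size are hypothetical and chosen only to show the effect.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical fits on n = 480 observations: a small model and a larger one
# that improves the log-likelihood only slightly.
n = 480
small = {"k": 2, "llf": 770.0}
large = {"k": 6, "llf": 776.0}

aic_small, aic_large = aic(small["llf"], small["k"]), aic(large["llf"], large["k"])
bic_small, bic_large = bic(small["llf"], small["k"], n), bic(large["llf"], large["k"], n)

print("AIC prefers", "large" if aic_large < aic_small else "small")
print("BIC prefers", "large" if bic_large < bic_small else "small")
```

With these made-up numbers the larger model wins on AIC and the smaller one wins on BIC, the same kind of split as AIC (1, 4) versus BIC (1, 0) above.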
#5. Model diagnostics
import statsmodels.api as sm
model = sm.tsa.ARIMA(train, order=(1, 0, 0))
results = model.fit()
resid = results.resid  # residuals of the fitted model
fig = plt.figure(figsize=(12,8))
fig = sm.graphics.tsa.plot_acf(resid.values.squeeze(), lags=40)
plt.show()
Figure 22: Residual autocorrelation of the ARIMA(1,0,0) model for Xingrong Environment closing prices
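When reading a residual autocorrelation plot, a common rule of thumb is that white-noise residuals should have autocorrelations within roughly ±1.96/√n at nonzero lags. The sketch below applies that rule; the function names, the residual autocorrelations, and the sample size are hypothetical, and the band drawn by a plotting library may be computed differently.

```python
import math

def white_noise_band(n_obs, z=1.96):
    """Approximate 95% band for the sample autocorrelation of white noise."""
    return z / math.sqrt(n_obs)

def lags_outside_band(acf_values, n_obs):
    """Return the nonzero lags whose autocorrelation exceeds the band in absolute value."""
    band = white_noise_band(n_obs)
    return [lag for lag, r in enumerate(acf_values) if lag > 0 and abs(r) > band]

# Hypothetical residual autocorrelations for lags 0..5 and a hypothetical sample size.
resid_acf = [1.0, 0.03, -0.05, 0.02, 0.11, -0.01]
n_obs = 480

print(round(white_noise_band(n_obs), 4))
print(lags_outside_band(resid_acf, n_obs))
```

A few isolated lags outside the band are expected by chance; a run of them would suggest that the model has left structure in the residuals.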
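#6. Model prediction
# Two functions are used for prediction. predict is used here for dates that lie inside the data the ARIMA model was fitted on; forecast estimates the values that follow the end of that data.
import statsmodels.api as sm
model = sm.tsa.ARIMA(sub, order=(1, 0, 0))  # ARIMA(1,0,0) model
results = model.fit()
#6.1 Predict on the training data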
predict_sunspots = results.predict(start=str('2021-04'),end=str('2021-05'),dynamic=False)
print(predict_sunspots)
fig, ax = plt.subplots(figsize=(12, 8))
ax = sub.plot(ax=ax)
predict_sunspots.plot(ax=ax)
plt.show()
Figure 23: Predicted Xingrong Environment closing prices, 2021-04 to 2021-05
>>print(predict_sunspots)
Date Predicted (actual) Date Predicted (actual)
2021-04-01 5.378886(5.24) 2021-04-02 5.231921(5.26)
2021-04-06 5.251516(5.17) 2021-04-07 5.163337(5.19)
2021-04-08 5.182932(5.08) 2021-04-09 5.075157(5.04)
2021-04-12 5.035966(5.09) 2021-04-13 5.084955(5.09)
2021-04-14 5.084955(5.05) 2021-04-15 5.045764(5.04)
2021-04-16 5.035966(5.07) 2021-04-19 5.065360(5.06)
2021-04-20 5.055562(5.07) 2021-04-21 5.065360(5.05)
2021-04-22 5.045764(5.06) 2021-04-23 5.055562(5.18)
2021-04-26 5.173134(5.07) 2021-04-27 5.065360(5.05)
2021-04-28 5.045764(5.20) 2021-04-29 5.192730(5.19)
2021-04-30 5.182932(5.20) 2021-05-06 5.192730(5.21)
dtype: float64
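With dynamic=False, each prediction of an AR(1) model is a one-step-ahead value built from the previous actual close, so it has the form const + phi * previous_close. The sketch below recovers the implied `phi` and `const` from two rows of the listing above and checks them against two further rows. The variable names are hypothetical, and because the actual prices are printed to two decimals the recovered coefficients are approximate.

```python
# (previous actual close, one-step prediction) pairs read from the listing above:
# the 2021-04-01 close of 5.24 feeds the prediction for 2021-04-02, and so on.
pairs = [
    (5.24, 5.231921),  # prediction for 2021-04-02
    (5.26, 5.251516),  # prediction for 2021-04-06
    (5.17, 5.163337),  # prediction for 2021-04-07
    (5.19, 5.182932),  # prediction for 2021-04-08
]

# Solve the two-point linear system for the implied AR(1) coefficient and intercept.
(x0, y0), (x1, y1) = pairs[0], pairs[1]
phi = (y1 - y0) / (x1 - x0)
const = y0 - phi * x0

# Rebuild the remaining predictions from the previous actual close.
rebuilt_values = [const + phi * prev_close for prev_close, _ in pairs[2:]]

for (prev_close, listed), rebuilt in zip(pairs[2:], rebuilt_values):
    print(prev_close, listed, round(rebuilt, 4))
```

Because the coefficient is close to 1, each prediction sits very near the previous day's close, which is why the predicted curve in the figure looks like the actual series shifted by one day.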
#6.2 Prediction on the training data (data through 2021-09)
# -*- coding: utf-8 -*-
"""
Created on Sat Sep 4 11:39:06 2021
@author: Zhuchunqiang
"""
import pandas as pd
import matplotlib.pyplot as plt
Stock_XRHJ = pd.read_csv('XRHJ000598.csv',index_col = 'Date',parse_dates=['Date'])
df = pd.DataFrame(Stock_XRHJ)
#1. Data preparation
df.index = pd.to_datetime(df.index)
sub = df.loc['2019-01':'2021-09', 'Close']    # full sample used for the final fit
train = df.loc['2019-01':'2020-12', 'Close']  # training set
test = df.loc['2021-01':'2021-09', 'Close']   # test set
plt.figure(figsize=(10,10))
print(test)
plt.plot(test)
plt.grid()
plt.show()
Figure 24: Stock data from 2021-01 to 2021-09-01
#6. Model prediction
import statsmodels.api as sm
model = sm.tsa.ARIMA(sub, order=(1, 0, 0))
results = model.fit()
#6.2 Predict on the training data (2021-07 to 2021-09)
predict_sunspots = results.predict(start=str('2021-07'),end=str('2021-09'),dynamic=False)
print(predict_sunspots)
fig, ax = plt.subplots(figsize=(12, 8))
ax = sub.plot(ax=ax)
predict_sunspots.plot(ax=ax)
plt.show()
Figure 25: Predicted Xingrong Environment closing prices, 2021-07-01 to 2021-09-01
>>print(predict_sunspots)
Date Predicted (actual) Date Predicted (actual)
2021-07-01 5.203269(5.22) 2021-07-02 5.213083(5.22)
2021-07-05 5.213083(5.19) 2021-07-06 5.183642(5.11)
2021-07-07 5.105135(5.11) 2021-07-08 5.105135(5.12)
2021-07-09 5.114949(5.19) 2021-07-12 5.183642(5.19)
2021-07-13 5.183642(5.22) 2021-07-14 5.213083(5.21)
2021-07-15 5.203269(5.18) 2021-07-16 5.173829(5.29)
2021-07-19 5.281776(5.26) 2021-07-20 5.252336(5.19)
2021-07-21 5.183642(5.14) 2021-07-22 5.134575(5.16)
2021-07-23 5.154202(5.19) 2021-07-26 5.183642(5.12)
2021-07-27 5.114949(5.12) 2021-07-28 5.114949(4.89)
2021-07-29 4.889241(4.89) 2021-07-30 4.889241(4.96)
2021-08-02 4.957934(4.96) 2021-08-03 4.957934(4.95)
2021-08-04 4.948121(4.97) 2021-08-05 4.967748(4.96)
2021-08-06 4.957934(5.03) 2021-08-09 5.026628(5.02)
2021-08-10 5.016815(5.07) 2021-08-11 5.065882(5.09)
2021-08-12 5.085508(5.07) 2021-08-13 5.065882(5.14)
2021-08-16 5.134575(5.19) 2021-08-17 5.183642(5.12)
2021-08-18 5.114949(5.12) 2021-08-19 5.114949(5.14)
2021-08-20 5.134575(5.19) 2021-08-23 5.183642(5.20)
2021-08-24 5.193456(5.31) 2021-08-25 5.301403(5.33)
2021-08-26 5.321030(5.30) 2021-08-27 5.291590(5.47)
2021-08-30 5.458417(5.44) 2021-08-31 5.428977(5.54)
2021-09-01 5.527111(5.55)
dtype: float64
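One simple way to summarize how close the predictions are to the actual closes is the mean absolute error (MAE) and the root mean squared error (RMSE). The sketch below computes both for the last five rows of the listing above; it is an illustrative check on a small subset, not an evaluation carried out in the original walkthrough.

```python
import math

# (predicted, actual) pairs for 2021-08-26 to 2021-09-01, copied from the listing above.
rows = [
    (5.321030, 5.30),
    (5.291590, 5.47),
    (5.458417, 5.44),
    (5.428977, 5.54),
    (5.527111, 5.55),
]

errors = [predicted - actual for predicted, actual in rows]

# Mean absolute error: average size of the miss, in price units.
mae = sum(abs(e) for e in errors) / len(errors)

# Root mean squared error: penalizes large misses more heavily than MAE.
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

print(round(mae, 4), round(rmse, 4))
```

The largest misses fall on days with a sharp price move (2021-08-27 and 2021-08-31), which a one-step AR(1) prediction cannot anticipate.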
#6.3 Forecast the next values (results.forecast()[0] holds the point forecasts)
>>results.forecast(15)
Out[58]:
(array([5.6056182 , 5.59150478, 5.57765473, 5.56406313, 5.55072516,
5.53763609, 5.52479127, 5.51218615, 5.49981625, 5.48767718,
5.47576463, 5.46407438, 5.45260228, 5.44134426, 5.43029633]),
array([0.06556483, 0.09186161, 0.11146884, 0.12753291, 0.14128705,
0.15337117, 0.16416988, 0.17393677, 0.18285008, 0.19104117,
0.19861034, 0.20563647, 0.21218303, 0.21830204, 0.2240369 ]),
array([[5.47711349, 5.73412292], [5.41145933, 5.77155023],
[5.35917983, 5.79612963], [5.31410322, 5.81402304],
[5.27380763, 5.8276427 ], [5.23703413, 5.83823806],
[5.20302423, 5.84655832], [5.17127635, 5.85309594],
[5.14143667, 5.85819582], [5.11324337, 5.86211098],
[5.08649552, 5.86503374], [5.0610343 , 5.86711447],
[5.03673119, 5.86847338], [5.01348012, 5.86920841],
[4.99119207, 5.86940059]]))
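The three arrays in the output above can be read as point forecasts, standard errors, and interval bounds. As a sanity check, the sketch below rebuilds the first three intervals as forecast ± z × standard error with the two-sided 95% normal quantile; the 95% level is an assumption that the numbers happen to confirm.

```python
from statistics import NormalDist

# First three entries of each array in the forecast output above.
forecast = [5.6056182, 5.59150478, 5.57765473]
stderr = [0.06556483, 0.09186161, 0.11146884]
listed = [
    (5.47711349, 5.73412292),
    (5.41145933, 5.77155023),
    (5.35917983, 5.79612963),
]

# Two-sided 95% normal quantile (about 1.96).
z = NormalDist().inv_cdf(0.975)

# Rebuild each interval from its point forecast and standard error.
rebuilt = [(f - z * s, f + z * s) for f, s in zip(forecast, stderr)]

for (low, high), (listed_low, listed_high) in zip(rebuilt, listed):
    print(round(low, 6), round(high, 6), listed_low, listed_high)
```

The rebuilt bounds agree with the listed ones to about six decimals, so the middle array is a standard error and the last array is the corresponding 95% interval.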
Table 3.6: 15-day forecast for Xingrong Environment
Date Forecast close Standard error 95% confidence interval
2021-09-02 5.6056182 0.06556483 [5.47711349, 5.73412292]
2021-09-03 5.59150478 0.09186161 [5.41145933, 5.77155023]
2021-09-06 5.57765473 0.11146884 [5.35917983, 5.79612963]
2021-09-07 5.56406313 0.12753291 [5.31410322, 5.81402304]
2021-09-08 5.55072516 0.14128705 [5.27380763, 5.8276427 ]
2021-09-09 5.53763609 0.15337117 [5.23703413, 5.83823806]
2021-09-10 5.52479127 0.16416988 [5.20302423, 5.84655832]
2021-09-13 5.51218615 0.17393677 [5.17127635, 5.85309594]
2021-09-14 5.49981625 0.18285008 [5.14143667, 5.85819582]
2021-09-15 5.48767718 0.19104117 [5.11324337, 5.86211098]
2021-09-16 5.47576463 0.19861034 [5.08649552, 5.86503374]
2021-09-17 5.46407438 0.20563647 [5.0610343 , 5.86711447]
2021-09-22 5.45260228 0.21218303 [5.03673119, 5.86847338]
2021-09-23 5.44134426 0.21830204 [5.01348012, 5.86920841]
2021-09-24 5.43029633 0.2240369 [4.99119207, 5.86940059]
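The pattern in the table, forecasts drifting steadily in one direction while the standard error widens, is what an AR(1) model produces: each forecast moves a fixed fraction phi of the remaining distance toward the long-run mean, and the h-step standard error grows as sigma × sqrt(1 + phi² + ... + phi^(2(h-1))). The sketch below estimates `phi` from the first three forecasts and checks both relations against the table; the variable names are hypothetical and the recovered values are approximate.

```python
import math

# First four point forecasts and first three standard errors from the table above.
forecast = [5.6056182, 5.59150478, 5.57765473, 5.56406313]
stderr = [0.06556483, 0.09186161, 0.11146884]

# Successive forecast changes shrink by the factor phi.
phi = (forecast[2] - forecast[1]) / (forecast[1] - forecast[0])

# Implied long-run mean that the forecasts decay toward.
long_run_mean = forecast[0] + (forecast[1] - forecast[0]) / (1 - phi)

# Rebuild the fourth forecast from the second and third.
rebuilt_f4 = forecast[2] + phi * (forecast[2] - forecast[1])

# Rebuild the 2-step and 3-step standard errors from the 1-step one.
sigma = stderr[0]
rebuilt_se = [
    sigma * math.sqrt(sum(phi ** (2 * j) for j in range(h)))
    for h in (2, 3)
]

print(round(phi, 4), round(long_run_mean, 2))
print(round(rebuilt_f4, 6), [round(s, 5) for s in rebuilt_se])
```

Because phi is close to 1, the pull toward the mean is slow and the interval keeps widening over the 15-day horizon, so the later rows of the table carry much more uncertainty than the first.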