chapter15.4 时间序列3--ARIMA预测模型

15.4.1 平稳性(stationarity)

平稳的时间序列的性质不随观测时间的变化而变化【样本时间序列所得到的拟合曲线在未来一段时间内仍能顺着现有的形态“惯性”地延续下去】，平稳性要求序列的均值和方差不发生明显的变化

因此具有趋势或季节性的时间序列不是平稳时间序列——趋势和季节性使得时间序列在不同时段呈现不同性质。与它们相反，白噪声序列（white noise series）则是平稳的【严平稳】——不管观测的时间如何变化，它看起来都应该是一样的

1. 白噪声（white noise）

A series purely random in nature is called as white noise
Mean = 0, constant variance and uncorrelated
Average is best forecast for this series

y <- rnorm(100, mean = 0, sd = 1)  # 生成均值为0，方差为1的100个随机数
par(mfrow = c(1, 2))
plot(y, type = "l")
acf(y)

2. Augmented Dickey-Fuller Test(adf-test)

用于检验时间序列的平稳性，原假设是时间序列不平稳，在R下可以使用adf.test(ts)来检验

dNile <- diff(Nile)# 一阶差分
adf.test(dNile)

15.4.2 差分(d)（differencing)

A series which is not stationary can be made stationary after differencing

只有平稳的时间序列才可以预测，我们可以使用“差分”方法来使数据变平稳,差分可以让数据变得平稳，可以对数据差分多次。差分几次则d就为几

在R中，可以使用diff()函数对序列进行差分diff(ts, differences=d)

15.4.3 AR(p)(Autoregressive Model)

在一个p阶自回归模型中，序列中的每一个值都可以用它之前p个值的线性组合来表示【p阶表示滞后多少】

ß0: 常数项，也可以看成µ, 序列的均值

ß：表示自相关系数

p：阶数

varepsilon：误差

AR模型相关性质的推导如下

E(Yt | Yt-1, Yt-2,..., Yt-p) = ß0 + ß1Yt-1 + ß2Yt-2 + .. + ßpYt-p

Var(Yt | Yt-1, Yt-2,..., Yt-p) = sigma^2

E(Xt) = beta0 / 1 - beta1 - beta2 - ... - betap

$\begin{align} &Assume \ the \ series \ in \ equation \ is \ weakly \ stationary ,\ we \ hava \ E(Y_{t}) = \mu \ Var(Y_{t}) = \gamma0 \ and \ Cov(Y_{t},Y_{t-j}) = \gamma_{j} \\ \\&where \ \mu \ and \ \gamma_{0} \ are \ constant \\ \\&we \ have \ E(Y_{t}) = \beta0 + \beta_{1}E(Y_{t-1}) + \beta_{2}E(Y_{t-2}) + ... + \beta_{p}E(Y_{t-p}) + 0 \\ \\& \mu = \beta0 + \beta_{1}\mu + \beta_{2}\mu + ... + \beta_{p}\mu \\ \\& then,\quad \mu =\frac{\beta_{0}}{1-\beta_{1}-\beta_{2}-...-\beta_{p}} \\ \\& Var(Y_{t}) = \beta_{1}^2Var(Y_{t-1}) + \beta_{2}^2Var(Y_{t-2}) + \sigma^2 \\ \\& so \ that, Var(Y_{t}) = \frac{\sigma^2}{1-\beta_{1}^2-\beta_{2}^2-...-\beta_{p}^2} \\ \\& The \ AR \ model \ can \ be \ writtern \ as \ follow: \\ \\& \gamma_{j} = \begin{cases}\frac{\sigma^2}{1-\beta^2},\text{if} \ j = 0\\ \alpha_{1}^j\gamma_{0},\text{if} \ j >0, \end{cases} \\ \\& then \ the \ ACF \ of \ stationary \ process \ is \\ \\& \rho_{j} \equiv \frac{\gamma_{j}}{\gamma_{0}} = \begin{cases}1,\text{if} j=0 \\ \beta_{1}^j,\text{if} j >0, \end{cases} \end{align}$

15.4.4 MA(q)(moving Average Model)

移动平均模型关注的是误差项的累加，此方法能有效消除随机波动

在一个q阶移动平均模型中，时序中的每个值都可以用之前的q个残差的线性组合来表

Varepsilon: 预测的残差

theta: 权重

q：q阶移动平均模型

15.4.5 ARMA(p,q)

将AR模型和MA模型混合便得到ARMA模型

序列中的每个观测值用过去的p个观测值和q个残差的线性组合来表示

15.4.6 ARIMA(p,d,q)

ARIMA模型是对差分后的数据使用ARMA模型，意味着时序被差分了d次，且序列中的每个观测值都是用过去的p个观测值和q个残差的线性组合表示的

在R中，可使用arima(ts,order=c(p,d,q))

15.4.7 ACF & PACF

1. ACF(Autocorrelation function)

Autocorrelation is the similarity between observartions as a function of the time lag between them

ACF可以近似使用OLS来近似估计，下列代码为1阶滞后(lag=1)所计算出的相关系数和OLS方法估计的参数进行对比

x <- c(22,17,16,14,13,10,12,15,21,19,18,16,19,20,24)
y <- c(17,16,14,13,10,12,15,21,19,18,16,19,20,24,21)  # 1 lag shifting
cor(x,y)  #0.68132
lm(y~x)   # parameter of x is close to correlation, then cor(xt, xt-k) is close to OLS(xt,xt-k)
acf(x)

2. PACF(Partial Autocorrelation function)

对比ACF，PACF不单单考虑xt和xt-k的关系，还考虑了xt和xt-k之间的点对xt的影响，可以理解成计算出去除xt和xt-k之间的数对xt造成的影响后的ACF
$\begin{align} \\&Xt = \beta0 + \beta_{1}X_{t-1} + \varepsilon_{1t} \\&Xt = \beta0 + \beta_{1}X_{t-1} + \beta_{2}X_{t-2} +\varepsilon_{2t} \\&Xt = \beta0 + \beta_{1}X_{t-1} + \beta_{2}X_{t-2}+\beta_{3}X_{t-3}+\varepsilon_{3t} \\&Xt = \beta0 + \beta_{1}X_{t-1} + \beta_{2}X_{t-2}+\beta_{3}X_{t-3}+ \beta_{4}X_{t-4}+ \varepsilon_{4t} \\& ... \\& where \ \beta_{i} are\ the\ coefficient\ of\ AR\ model \\ \\& These\ models\ are\ in\ the\ form\ of\ a\ multiple\ linear\ regression\ model\ and\ can\ be\ estimated\ by\ OLS\ method \\ \\& the\ estimation\ of\ \beta_{1}\ is\ called\ the\ lag1\ sample PACF\ of\ Xt \end{align}$
PACF同样可以使用OLS来近似估计，xt可以表示成不同滞后阶数xt-k的线性组合，对于的系数即为对于xt-k的PACF【思路参考ACF】

# R中计算ACF&PACF
x <- c(22,17,16,14,13,10,12,15,21,19,18,16,19,20,24)
acf(x,type = 'correlation') # acf
acf(x,type='partial')       # pacf
pacf(x)

使用PACF可以决定AR模型的p值，ACF可以决定MA模型的q值，选取阈值以外的值作为对于的p值和q值【显著的值】

15.4.8 模型的评价 AIC & BIC

1. AIC(Akaike information criteria)
$\begin{align} \\&AIC = -\frac{2}{T}ln(likelihood) +\frac{2}{T}(number \ of \ parameter) \\ \\&For \ AR \ model , AIC = ln(\sigma_{l}^2) + \frac{2l}{T} \\ \\& \frac{2}{T} \ measure \ the \ goodness\ fit\ of\ the\ model \\ \\& \frac{2}{T}(number \ of \ parameter)\ is\ a\ penalty \end{align}$

2. BIC( Schwarz-Bayesian information criterion)

AIC和BIC都是越小越好，因此有多个模型可供选择的时候，BIC和AIC小的模型应该优先考虑

AIC和BIC的惩罚项的区别在于，对AIC来说，使用2作为惩罚项，而BIC使用了ln(T),因此当样本量比较大的时候，BIC更倾向于使用更简单的AR模型

使用不同的指标来挑选模型(AIC or BIC)，有时会得到不一样的结果

15.4.9 模型评价

一般来说，一个模型如果合适，那模型的残差应该满足均值为0的正态分布，并且对于任意的滞后阶数，残差自相关系数都应该为零。换句话说，模型的残差应该满足独立正态分布（即残差间没有关联）

使用qq-plot来检验正态性；使用Ljung-Box Q来检验残差是否彼此独立

qqnorm(fit$residuals) 
qqline(fit$residuals)  
Box.test(fit$residuals, type="Ljung-Box")

残差的点越近似于对角线，效果越好，越满足正态分布。

Ljung-Box检验的原假设为相关系数为0，P-value>0.05则不能拒绝原假设，残差满足独立的原假设

R语言案例

library(tseries)
library(forecast)
Nile
plot(Nile)
dNile <- diff(Nile)# 一阶差分
plot(dNile) 
adf.test(dNile) # 平稳性检验

# 通过pacf和acf来选择模型
acf(dNile)  # q = 2
pacf(dNile) # p=2

# 使用ARIMA模型预测
fit <- arima(Nile, order=c(2,1,2)) 
fit

# AIC和BIC
AIC(fit)
BIC(fit)
accuracy(fit)

# 模型评价
qqnorm(fit$residuals) 
qqline(fit$residuals)  
Box.test(fit$residuals, type="Ljung-Box")

# 预测
forecast(fit,3)
plot(forecast(fit, 3), xlab="Year", ylab="Annual Flow")

15.4.10 ARIMA的自动预测

程序包中的auto.arima()函数可以实现最优ARIMA模型的自动选取

# ARIMA的自动预测
fit<-auto.arima(sunspots)
fit
forecast(fit,3)
accuracy(fit)