Autoregressive models can be developed for univariate time series data that is stationary (AR), has a trend (ARIMA), or has a seasonal component (SARIMA).
One aspect of a univariate time series that these autoregressive models do not model is a change in the variance over time.
Classically, a time series with modest changes in variance can sometimes be adjusted using a power transform, such as taking the log or applying a Box-Cox transform.
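As a rough illustration of why a log transform can help, the sketch below builds a hypothetical series with exponential growth and multiplicative noise (the growth rate, noise level, and helper name spread_ratio are assumptions made for this example). It compares the spread of the deviations from the known level in the first and second half of the series, before and after taking the log.

```python
from math import exp, log
from random import gauss, seed
from statistics import pstdev

seed(1)
n = 200
# hypothetical series: exponential growth with multiplicative noise
level = [exp(0.02 * t) for t in range(n)]
series = [level[t] * exp(gauss(0, 0.1)) for t in range(n)]

# deviations from the known level, before and after the log transform
raw_dev = [series[t] - level[t] for t in range(n)]
log_dev = [log(series[t]) - log(level[t]) for t in range(n)]

def spread_ratio(values):
    """Standard deviation of the second half divided by that of the first half."""
    half = len(values) // 2
    return pstdev(values[half:]) / pstdev(values[:half])

raw_ratio = spread_ratio(raw_dev)
log_ratio = spread_ratio(log_dev)
print('raw spread ratio: %.2f' % raw_ratio)
print('log spread ratio: %.2f' % log_ratio)
```

A ratio well above 1 for the raw deviations and close to 1 for the log deviations suggests the transform has stabilized the variance. This only works when the noise scales with the level of the series, which is an assumption built into the example.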
There are some time series where the variance changes consistently over time. In the context of a time series in the financial domain, this would be called increasing and decreasing volatility.
In time series where the variance is increasing in a systematic way, such as an increasing trend, this property of the series is called heteroskedasticity. It’s a fancy word from statistics that means changing or unequal variance across the series.
If the change in variance can be correlated over time, then it can be modeled using an autoregressive process, such as ARCH.
Autoregressive Conditional Heteroskedasticity, or ARCH, is a method that explicitly models the change in variance over time in a time series.
Specifically, an ARCH method models the variance at a time step as a function of the residual errors from a mean process (e.g. a zero mean).
The ARCH process introduced by Engle (1982) explicitly recognizes the difference between the unconditional and the conditional variance allowing the latter to change over time as a function of past errors.
A lag parameter must be specified to define the number of prior residual errors to include in the model. Using the notation of the GARCH model (discussed later), we can refer to this parameter as “q”. Originally, this parameter was called “p”, and it is also called “p” in the arch Python package used later in this tutorial.
q: The number of lag squared residual errors to include in the ARCH model.
A generally accepted notation for an ARCH model is to specify the ARCH() function with the q parameter ARCH(q); for example, ARCH(1) would be a first order ARCH model.
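To make the definition concrete, here is a minimal sketch that simulates an ARCH(q) process using only the standard library. The function name simulate_arch, the coefficient values, and the burn-in length are assumptions chosen for illustration; the variance at each step is a constant plus a weighted sum of the q most recent squared errors.

```python
from math import sqrt
from random import gauss, seed

def simulate_arch(omega, alphas, n, burn=100):
    """Simulate n steps of an ARCH(q) process, where q = len(alphas)."""
    q = len(alphas)
    errors = [0.0] * q  # start-up values for the lagged errors
    variances = []
    for _ in range(n + burn):
        recent = errors[-q:][::-1]  # most recent error first
        # conditional variance: constant plus weighted lagged squared errors
        sigma2 = omega + sum(a * e**2 for a, e in zip(alphas, recent))
        variances.append(sigma2)
        errors.append(sqrt(sigma2) * gauss(0, 1))
    # drop the start-up values and the burn-in period
    return errors[q + burn:], variances[burn:]

seed(1)
# ARCH(1) with hypothetical coefficients
errs, cond_var = simulate_arch(omega=0.2, alphas=[0.5], n=1000)
print('first five conditional variances:', [round(v, 3) for v in cond_var[:5]])
```

In this sketch, a large error at one step raises the conditional variance at the next step, which is how periods of high volatility cluster together.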
The approach expects the series is stationary, other than the change in variance, meaning it does not have a trend or seasonal component. An ARCH model is used to predict the variance at future time steps.
[ARCH] are mean zero, serially uncorrelated processes with nonconstant variances conditional on the past, but constant unconditional variances. For such processes, the recent past gives information about the one-period forecast variance.
In practice, this can be used to model the expected variance on the residuals after another autoregressive model has been used, such as an ARMA or similar.
The model should only be applied to a prewhitened residual series {e_t} that is uncorrelated and contains no trends or seasonal changes, such as might be obtained after fitting a satisfactory SARIMA model.
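The idea of working on residuals can be sketched as follows. The example assumes a hypothetical zero-mean AR(1) series and estimates its coefficient by least squares rather than with a full ARMA or SARIMA fit. The residuals, and their squares, are what a variance model would then be applied to.

```python
from random import gauss, seed

seed(1)
# hypothetical AR(1) series: x[t] = 0.7 * x[t-1] + noise
true_phi = 0.7
series = [0.0]
for _ in range(2000):
    series.append(true_phi * series[-1] + gauss(0, 1))

# least-squares estimate of the AR(1) coefficient (zero mean assumed)
num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
den = sum(series[t - 1]**2 for t in range(1, len(series)))
phi_hat = num / den

# residuals of the mean model; these are what an ARCH model would be fit to
residuals = [series[t] - phi_hat * series[t - 1] for t in range(1, len(series))]
squared_residuals = [e**2 for e in residuals]

def lag1_autocorrelation(values):
    """Sample autocorrelation at lag 1."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    cov = sum(centered[t] * centered[t - 1] for t in range(1, len(centered)))
    return cov / sum(c**2 for c in centered)

series_r1 = lag1_autocorrelation(series)
residual_r1 = lag1_autocorrelation(residuals)
print('estimated phi: %.3f' % phi_hat)
print('lag-1 autocorrelation of series: %.3f' % series_r1)
print('lag-1 autocorrelation of residuals: %.3f' % residual_r1)
```

The series itself is strongly autocorrelated, while the residuals should be close to uncorrelated, which is the prewhitened condition the quote above asks for.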
Generalized Autoregressive Conditional Heteroskedasticity, or GARCH, is an extension of the ARCH model that incorporates a moving average component together with the autoregressive component.
Specifically, the model includes lag variance terms (e.g. the observations if modeling the white noise residual errors of another process), together with lag residual errors from a mean process.
The introduction of a moving average component allows the model to capture both the conditional change in variance over time and changes in the time-dependent variance itself. Examples include conditional increases and decreases in variance.
As such, the model introduces a new parameter “p” that describes the number of lag variance terms:
p: The number of lag variances to include in the GARCH model.
q: The number of lag residual errors to include in the GARCH model.
A generally accepted notation for a GARCH model is to specify the GARCH() function with the p and q parameters GARCH(p, q); for example GARCH(1, 1) would be a first order GARCH model.
A GARCH model subsumes ARCH models, where a GARCH(0, q) is equivalent to an ARCH(q) model.
For p = 0 the process reduces to the ARCH(q) process, and for p = q = 0 ε_t is simply white noise.
In the ARCH(q) process the conditional variance is specified as a linear function of past sample variances only, whereas the GARCH(p, q) process allows lagged conditional variances to enter as well. This corresponds to some sort of adaptive learning mechanism.
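The relationship between the two models can be sketched as a single recursion. The function below is an illustrative helper, not part of any library, and it uses the notation of this tutorial: alphas weight the q lag squared errors and betas weight the p lag variances. The coefficient values and the choice of the sample variance as the start-up value are assumptions.

```python
from random import gauss, seed

def conditional_variance(errors, omega, alphas, betas):
    """Conditional variance of a GARCH(p, q) model, p = len(betas), q = len(alphas)."""
    # assumption: lags before the start of the series use the sample variance
    start = sum(e**2 for e in errors) / len(errors)
    sigma2 = []
    for t in range(len(errors)):
        value = omega
        # ARCH part: lag squared residual errors
        for i, alpha in enumerate(alphas):
            lag = t - 1 - i
            value += alpha * (errors[lag]**2 if lag >= 0 else start)
        # GARCH part: lag conditional variances
        for j, beta in enumerate(betas):
            lag = t - 1 - j
            value += beta * (sigma2[lag] if lag >= 0 else start)
        sigma2.append(value)
    return sigma2

seed(1)
errors = [gauss(0, 1) for _ in range(50)]
# with no lag variance terms, GARCH(0, 1) is the same as ARCH(1)
arch_var = conditional_variance(errors, 0.1, [0.3], [])
garch_var = conditional_variance(errors, 0.1, [0.3], [0.6])
print('ARCH(1)     :', [round(v, 3) for v in arch_var[:5]])
print('GARCH(1, 1) :', [round(v, 3) for v in garch_var[:5]])
```

Because each GARCH variance carries forward a share of the previous variance, a shock decays gradually instead of dropping out after q steps.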
As with ARCH, GARCH predicts the future variance and expects that the series is stationary, other than the change in variance, meaning it does not have a trend or seasonal component.
The configuration for an ARCH model is best understood in the context of ACF and PACF plots of the variance of the time series.
This can be achieved by subtracting the mean from each observation in the series and squaring the result, or just squaring the observation if you’re already working with white noise residuals from another model.
If a correlogram appears to be white noise […], then volatility can be detected by looking at the correlogram of the squared values since the squared values are equivalent to the variance (provided the series is adjusted to have a mean of zero).
The ACF and PACF plots can then be interpreted to estimate values for p and q, in a similar way as is done for the ARMA model.
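The preparation step described above can be sketched without any plotting library. The series here is hypothetical, and the autocorrelation helper and the approximate 95% band of 1.96 divided by the square root of the series length are assumptions made for this example.

```python
from math import sqrt
from random import gauss, seed

def autocorrelation(values, max_lag):
    """Sample autocorrelation of values for lags 0..max_lag."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    denom = sum(c**2 for c in centered)
    return [sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]

seed(1)
# hypothetical zero-mean series whose standard deviation grows over time
series = [gauss(0, 0.01 * t) for t in range(1, 201)]

# subtract the mean and square to get a proxy for the variance at each step
mean = sum(series) / len(series)
squared = [(x - mean)**2 for x in series]

acf = autocorrelation(squared, max_lag=10)
# approximate 95% band for a white noise series
band = 1.96 / sqrt(len(squared))
for lag, value in enumerate(acf):
    flag = '*' if lag > 0 and abs(value) > band else ''
    print('lag %2d: %6.3f %s' % (lag, value, flag))
```

Lags marked with an asterisk fall outside the approximate band and are candidates when choosing the lag parameters.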
In this section, we will look at how we can develop ARCH and GARCH models in Python using the arch library.
First, let’s prepare a dataset we can use for these examples.
Test Dataset
We can create a dataset with a controlled model of variance.
The simplest case would be a series of random noise where the mean is zero and the variance starts at 0.0 and steadily increases.
We can achieve this in Python using the gauss() function that generates a Gaussian random number with the specified mean and standard deviation.
# create a simple white noise with increasing variance
from random import gauss
from random import seed
from matplotlib import pyplot
# seed pseudorandom number generator
seed(1)
# create dataset
data = [gauss(0, i*0.01) for i in range(0,100)]
# plot
pyplot.plot(data)
pyplot.show()
We know there is an autocorrelation in the variance of the contrived dataset.
Nevertheless, we can look at an autocorrelation plot to confirm this expectation. The complete example is listed below.
# check correlations of squared observations
from random import gauss
from random import seed
from matplotlib import pyplot
from statsmodels.graphics.tsaplots import plot_acf
import numpy as np
# seed pseudorandom number generator
seed(1)
# create dataset
data = [gauss(0, i*0.01) for i in range(0,100)]
# square the dataset
squared_data = [x**2 for x in data]
# create acf plot
plot_acf(np.array(squared_data))
pyplot.show()
Developing an ARCH model involves three steps:
Define the model
Fit the model
Make a forecast
Before fitting and forecasting, we can split the dataset into a train and test set so that we can fit the model on the train and evaluate its performance on the test set.
A model can be defined by calling the arch_model() function. We can specify a model for the mean of the series: in this case mean='Zero' is an appropriate model. We can then specify the model for the variance: in this case vol='ARCH'. We can also specify the lag parameter for the ARCH model: in this case p=15.
Note that the arch library reverses the names of the p and q parameters relative to the notation used above: in arch_model(), p is the number of lag squared residual errors and q is the number of lag variances.
# example of ARCH model
from random import gauss
from random import seed
from matplotlib import pyplot
from arch import arch_model
# seed pseudorandom number generator
seed(1)
# create dataset
data = [gauss(0, i*0.01) for i in range(0,100)]
# split into train/test
n_test = 10
train, test = data[:-n_test], data[-n_test:]
# define model
model = arch_model(train, mean='Zero', vol='ARCH', p=15)
# fit model
model_fit = model.fit()
# forecast the test set
yhat = model_fit.forecast(horizon=n_test)
# plot the actual variance (gauss() takes a standard deviation, so square it)
var = [(i*0.01)**2 for i in range(0,100)]
pyplot.plot(var[-n_test:])
# plot forecast variance
pyplot.plot(yhat.variance.values[-1, :])
pyplot.show()
We can fit a GARCH model just as easily using the arch library.
The arch_model() function can specify a GARCH model instead of an ARCH model by setting vol='GARCH', along with the lag parameters for both components.
The dataset may not be a good fit for a GARCH model given the linearly increasing variance. Nevertheless, the complete example is listed below.
p: The number of lag variances to include in the GARCH model.
q: The number of lag residual errors to include in the GARCH model.
# example of GARCH model
from random import gauss
from random import seed
from matplotlib import pyplot
from arch import arch_model
# seed pseudorandom number generator
seed(1)
# create dataset
data = [gauss(0, i*0.01) for i in range(0,100)]
# split into train/test
n_test = 10
train, test = data[:-n_test], data[-n_test:]
# define model
model = arch_model(train, mean='Zero', vol='GARCH', p=15, q=15)
# fit model
model_fit = model.fit()
# forecast the test set
yhat = model_fit.forecast(horizon=n_test)
# plot the actual variance (gauss() takes a standard deviation, so square it)
var = [(i*0.01)**2 for i in range(0,100)]
pyplot.plot(var[-n_test:])
# plot forecast variance
pyplot.plot(yhat.variance.values[-1, :])
pyplot.show()
