Forecasting

Forecasts in time series models are predictions of future observations conditioned on information that is known at the time of forecast.

Theorem (cf. Granger)

Granger states that the optimal forecast, ft,h, of a future value is the conditional expectation of that value, provided the cost function is symmetric and convex. In other words,

f_{t, h} = E[X_{t+h} | I_t]

where It denotes the information available at time t.
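As a quick check of the quadratic-cost special case, the following minimal Python sketch (the distribution, grid, and variable names are arbitrary choices of mine) simulates draws of a future value and confirms that the forecast minimizing the average squared cost sits at the sample mean, i.e., at the expectation.

import numpy as np

rng = np.random.default_rng(0)

# Draws of a future value from some skewed distribution (a stand-in for X_{t+h} given I_t).
x_future = rng.gamma(shape=2.0, scale=1.5, size=100_000)

# Candidate point forecasts on a grid.
candidates = np.linspace(x_future.min(), x_future.max(), 2001)

# Average quadratic cost c * e^2 for each candidate forecast (c = 1 without loss of generality).
avg_cost = [np.mean((x_future - f) ** 2) for f in candidates]

best = candidates[int(np.argmin(avg_cost))]
print(f"cost-minimizing forecast: {best:.3f}")
print(f"sample mean (the expectation): {x_future.mean():.3f}")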

Computing Conditional Expectations

To compute conditional expectations, let’s consider the AR(1) model: 

r_t - \mu = -\lambda(r_{t-1} - \mu) + \sigma z_t

Suppose we let

Y_t = \frac{r_t-\mu}{\sigma}

we can rewrite the model as

Y_t = -\lambda Y_{t-1} + z_t

(Note that σ is not the standard deviation of r; it’s just a coefficient that we put in front of zt. Hence, Yt is not the standardized version of rt).

Letting It denote the information known at time t, we can compute the forecasts of Yt+1 and Yt+2 as conditional expectations given It.

\begin{aligned}
Y_{t+1} &= -\lambda Y_t + z_{t+1}\\
\\
E[Y_{t+1} | I_t] 
&= E[-\lambda Y_t + z_{t+1} | I_t]\\
&= E[-\lambda Y_t | I_t]+ E[z_{t+1} | I_t]\\
&=-\lambda Y_t
\end{aligned}

Explanation

E[−λYt | It] = −λYt because It includes the value of Yt, so Yt is a known constant at time t. The second term vanishes because zt+1 is independent of It and has mean zero, so E[zt+1 | It] = 0.

\begin{aligned}
Y_{t+2} 
&= -\lambda Y_{t+1} + z_{t+2}\\
&= -\lambda(-\lambda Y_t + z_{t+1}) + z_{t+2}\\
&= \lambda^2 Y_t - \lambda z_{t+1} + z_{t+2}\\
\\
E[Y_{t+2} | I_t] 
&= E[\lambda^2 Y_t - \lambda z_{t+1} + z_{t+2} | I_t]\\
&=\lambda^2 Y_t
\end{aligned}
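Iterating the same argument gives, for any horizon h,

E[Y_{t+h} \mid I_t] = (-\lambda)^h \, Y_t

As an illustration, the following Python sketch (the function names, λ, and starting value are my own choices) simulates the model Yt = −λYt−1 + zt and checks the one- and two-step analytic forecasts against Monte Carlo averages of simulated continuations.

import numpy as np

rng = np.random.default_rng(1)
lam = 0.6      # lambda in Y_t = -lambda * Y_{t-1} + z_t
y_t = 1.3      # value of Y_t known at time t (part of I_t)

def analytic_forecast(y_t, lam, h):
    # E[Y_{t+h} | I_t] = (-lambda)^h * Y_t
    return (-lam) ** h * y_t

def monte_carlo_forecast(y_t, lam, h, n_paths=200_000):
    # Average of simulated continuations, drawing z_{t+1}, ..., z_{t+h} ~ N(0, 1).
    y = np.full(n_paths, y_t)
    for _ in range(h):
        y = -lam * y + rng.standard_normal(n_paths)
    return y.mean()

for h in (1, 2):
    print(h, analytic_forecast(y_t, lam, h), monte_carlo_forecast(y_t, lam, h))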

Cost Function

A commonly used cost function when evaluating models is the mean squared forecast error (MSFE). 

Forecast error, et+h, is defined as the difference between the outcome and the forecast. In other words,

e_{t+h} = x_{t+h} - f_{t, h}

where ft,h denotes the forecast made at time t for horizon h, and xt+h is the realized value.

MSFE is defined as E[(et+h)2]. More generally, one can use the quadratic cost function c(et+h)2, where c is some positive scale coefficient; the constant c does not affect which forecast is optimal.
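In sample terms, the MSFE is just the average of squared forecast errors; a tiny Python sketch (illustrative only):

import numpy as np

def msfe(outcomes, forecasts):
    # Sample mean squared forecast error: average of (x_{t+h} - f_{t,h})^2.
    errors = np.asarray(outcomes, dtype=float) - np.asarray(forecasts, dtype=float)
    return float(np.mean(errors ** 2))

print(msfe([1.0, 0.5, -0.2], [0.8, 0.7, 0.0]))  # 0.04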

MSFE is a symmetric and convex cost function: it penalizes forecasts above and below the actual outcome equally. It therefore satisfies the conditions of Granger’s theorem, so the conditional expectation is the optimal forecast under MSFE.

However, one problem with using MSFE as the cost function is that it is not invariant. That is, the size of the forecast errors, and the relative ranking of different forecasts, can change depending on which transformation of the variable is forecast (for example, the level versus the first difference) and on whether parameters have to be estimated.

Consider an AR(1) model given by

\begin{align}
x_t &= -\lambda x_{t-1} + \epsilon_t \tag{1}
\end{align}
\text{where } x_t = r_t - \mu \text{ and } \epsilon_t = \sigma z_t

If we subtract xt−1 from both sides, we obtain Δxt, where

\begin{aligned}
\Delta x_t &= x_t - x_{t-1}\\
&= (-\lambda x_{t-1} + \epsilon_t) - x_{t-1}
\end{aligned}
\begin{align}
\text{Therefore, }
\Delta x_t &= -(1+\lambda)x_{t-1} + \epsilon_t \tag{2}
\end{align}

Equations (1) and (2) are equivalent: they describe the same process, one in levels and one in first differences.

Let ft,2 and f’t,2 be the two-step-ahead forecasts of xt+2 and Δxt+2 respectively.

It can be shown that 

\begin{aligned}
E[(x_{t+2} - f_{t, 2})^2] &= (1+\lambda^2)\sigma^2\\
E[(\Delta x_{t+2} - f'_{t, 2})^2] &= [1+(1+\lambda)^2]\sigma^2
\end{aligned}
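To see where these come from, note that with λ known the two-step forecasts and their errors are

\begin{aligned}
f_{t, 2} &= E[x_{t+2} \mid I_t] = \lambda^2 x_t,
\qquad
f'_{t, 2} = E[\Delta x_{t+2} \mid I_t] = \lambda(1+\lambda)x_t\\
x_{t+2} - f_{t, 2} &= -\lambda\epsilon_{t+1} + \epsilon_{t+2}\\
\Delta x_{t+2} - f'_{t, 2} &= -(1+\lambda)\epsilon_{t+1} + \epsilon_{t+2}
\end{aligned}

Since εt+1 and εt+2 are uncorrelated with variance σ2, the expected squared errors are (1+λ2)σ2 and [1+(1+λ)2]σ2 respectively.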

The two MSFEs are not the same even though (1) and (2) describe the same process. In addition, if the parameter λ has to be estimated, the two MSFEs differ even further:

\begin{aligned}
\text{MSFE for }f_{t, 2} &= \sigma^2(1+\lambda^2) + x_t^2(\lambda^2-\hat{\lambda}^2)^2\\
\text{MSFE for }f'_{t, 2} &= \sigma^2[1+(1+\lambda)^2] + x_t^2(\lambda-\hat{\lambda})^2(1+\lambda+\hat{\lambda})^2
\end{aligned}

where λ is the true parameter and λ-hat is the estimated parameter. 
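For instance, for the first line: replacing λ with λ-hat in the two-step forecast of xt+2, and treating xt and λ-hat as given (with λ-hat independent of the future shocks), a sketch of the calculation is

\begin{aligned}
x_{t+2} - \hat{\lambda}^2 x_t
&= (\lambda^2 - \hat{\lambda}^2)\,x_t - \lambda\epsilon_{t+1} + \epsilon_{t+2}\\
E\big[(x_{t+2} - \hat{\lambda}^2 x_t)^2\big]
&= x_t^2(\lambda^2 - \hat{\lambda}^2)^2 + \sigma^2(1+\lambda^2)
\end{aligned}

The second line follows in the same way from the forecast λ-hat(1+λ-hat)xt of Δxt+2.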

Because MSFE is not invariant, we need to be aware of this issue when comparing models. For instance, we should not compare the MSFE for ft,2 of one model with the MSFE for f’t,2 of another, even though (1) and (2) describe the same process.
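To make the non-invariance concrete, here is a minimal Python sketch (the parameter values, seed, and variable names are arbitrary choices of mine) that simulates the AR(1) in (1), forms the two-step forecasts of xt+2 and Δxt+2 with the true λ (no estimation), and compares the two empirical MSFEs with the formulas derived earlier.

import numpy as np

rng = np.random.default_rng(2)
lam, sigma, n = 0.6, 1.0, 200_000

# Simulate x_t = -lam * x_{t-1} + eps_t, with eps_t ~ N(0, sigma^2).
eps = sigma * rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = -lam * x[t - 1] + eps[t]

# Two-step-ahead forecasts made at each time t (true lambda, no estimation).
f_x  = lam**2 * x[:-2]              # forecast of x_{t+2}
f_dx = lam * (1 + lam) * x[:-2]     # forecast of delta x_{t+2} = x_{t+2} - x_{t+1}

msfe_x  = np.mean((x[2:] - f_x) ** 2)
msfe_dx = np.mean(((x[2:] - x[1:-1]) - f_dx) ** 2)

print(f"MSFE for x:  {msfe_x:.3f}  (theory: {(1 + lam**2) * sigma**2:.3f})")
print(f"MSFE for dx: {msfe_dx:.3f}  (theory: {(1 + (1 + lam)**2) * sigma**2:.3f})")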

