When we make a forecast about the future, we can measure the deviation of that forecast from the actual event, once it is observed. We take the example of an [[Autoregressive Model]] $\mathrm{AR}(1)$, defined as follows:

$
x_{t+1}=\lambda x_t + \epsilon_{t+1}
$

**Non-Invariance:** While the one-step forecast error is the same whether we predict the level or the difference, this invariance does not hold for longer horizons. When forecasting multiple steps ahead, the forecast errors for levels and differences evolve differently because the weights on the error terms change with the forecast horizon.

## One Step Ahead (Forecasting Level)

**Forecast:** The one-step-ahead forecast $f_{t,1}$ is the expectation of $x_{t+1}$ conditional on the information $I_t$ available at time $t$.

$
\begin{align}
f_{t,1} &= \mathbb E[x_{t+1} \vert I_t] \tag{1}\\[2pt]
&= \mathbb E[\lambda x_t + \epsilon_{t+1}] \tag{2}\\[2pt]
&= \lambda \mathbb E[x_t] + \mathbb E[\epsilon_{t+1}] \tag{3}\\[2pt]
&= \lambda x_t \tag{4}
\end{align}
$

where:

- (2) Using the recursive definition to express $x_{t+1}$ in terms of variables that we already know at time $t$. Note that the error terms are assumed to satisfy [[Stationarity]], so we already know the distribution of the [[Random Variable]] $\epsilon_{t+1}$ at time $t$.
- (3) By [[Linearity of Expectations]] we can split the expression into a sum of expectations, where the expectation of the error term is zero.
- (4) As $x_t$ is already observed at time $t$, it is a constant and no longer a [[Random Variable]].

**Forecast error:** The forecast error $e_{t+1}$ at $(t+1)$ is the difference between the realized value and the forecast. By substituting $\lambda x_t$ for the forecast, and by recursively expressing $x_{t+1}$ in terms of $x_t$, only the error term remains.

$
\begin{align}
e_{t+1} &= x_{t+1} - f_{t,1} \\
&= x_{t+1} - \lambda x_t \\
&= \lambda x_t + \epsilon_{t+1} - \lambda x_t \\
&= \epsilon_{t+1}
\end{align}
$

**Mean-squared forecast error:** The MSFE equals the variance of $e_{t+1}$, since the expectation of the underlying $\epsilon$ is $0$.

$
\begin{align}
\mathbb E[e_{t+1}^2] = \mathrm{Var}(e_{t+1}) &= \mathbb E[e_{t+1}^2] - (\mathbb E[e_{t+1}])^2 \\[2pt]
&= \mathbb E[\epsilon_{t+1}^2] \\[2pt]
&= \sigma^2
\end{align}
$

## One Step Ahead (Forecasting Differences)

We reach the same result when we define $\Delta x_{t+1}$ as the change in $x_t$ to the next period, and $\mathbb E[\Delta x_{t+1} \vert I_t]$ as the forecast $f^\prime_{t,1}$.

Forecast:

$
\begin{align}
f^\prime_{t,1} &= \mathbb E[\Delta x_{t+1} \vert I_t]\\[2pt]
&= \mathbb E[x_{t+1} - x_t \vert I_t]\\[2pt]
&= \mathbb E[\lambda x_t + \epsilon_{t+1} - x_t]\\[2pt]
&= \mathbb E[x_t(\lambda - 1) + \epsilon_{t+1}]\\[2pt]
&= x_t(\lambda - 1)
\end{align}
$

Forecast error:

$
\begin{align}
e^\prime_{t+1} &= \Delta x_{t+1} - f^\prime_{t,1}\\
&= x_{t+1} - x_t - x_t(\lambda - 1)\\
&= \lambda x_t + \epsilon_{t+1} - x_t - x_t(\lambda - 1)\\
&= \lambda x_t + \epsilon_{t+1} - x_t - (\lambda x_t - x_t)\\
&= \epsilon_{t+1}
\end{align}
$

We can see that the forecast error is the same whether we directly predict the level $x_{t+1}$ or the difference $\Delta x_{t+1}$. Note that this does not hold in general for horizons beyond one step.
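As a quick sanity check (not part of the derivation), here is a minimal Monte Carlo sketch in Python, assuming `numpy`; the values $\lambda = 0.8$ and $\sigma = 1$ are arbitrary illustration choices, not from the text above.

```python
import numpy as np

# Simulate an AR(1), forecast both the level and the difference one step
# ahead, and compare the resulting forecast errors.
rng = np.random.default_rng(0)
lam, sigma, T = 0.8, 1.0, 100_000  # illustrative values

eps = rng.normal(0.0, sigma, T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = lam * x[t - 1] + eps[t]  # x_{t+1} = lambda * x_t + eps_{t+1}

x_t, x_next = x[:-1], x[1:]

# Level forecast f_{t,1} = lambda * x_t and its error
err_level = x_next - lam * x_t
# Difference forecast f'_{t,1} = x_t (lambda - 1) and its error
err_diff = (x_next - x_t) - x_t * (lam - 1.0)

assert np.allclose(err_level, err_diff)  # errors coincide observation by observation
print(np.mean(err_level**2), sigma**2)   # sample MSFE should be close to sigma^2
```

The assertion confirms the one-step invariance numerically, and the sample MSFE should land near $\sigma^2 = 1$.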
## Two Steps Ahead (Forecasting Level)

Forecast:

$
\begin{align}
f_{t,2} &= \mathbb E[x_{t+2} \vert I_t]\\
&= \mathbb E[\lambda x_{t+1} + \epsilon_{t+2} \vert I_t]\\
&= \mathbb E[\lambda (\lambda x_t + \epsilon_{t+1}) + \epsilon_{t+2}]\\
&= \mathbb E[\lambda^2 x_t] + \lambda \mathbb E[\epsilon_{t+1}] + \mathbb E[\epsilon_{t+2}]\\
&= \lambda^2 x_t
\end{align}
$

Forecast error:

$
\begin{align}
e_{t,2} &= x_{t+2} - f_{t,2}\\
&= x_{t+2} - \lambda^2 x_t\\
&= \lambda x_{t+1} + \epsilon_{t+2} - \lambda^2 x_t \\
&= \lambda (\lambda x_t + \epsilon_{t+1}) + \epsilon_{t+2} - \lambda^2 x_t \\
&= \lambda^2 x_t + \lambda \epsilon_{t+1} + \epsilon_{t+2} - \lambda^2 x_t \\
&= \lambda \epsilon_{t+1} + \epsilon_{t+2}
\end{align}
$

MSFE: Since the error terms of different periods are uncorrelated, the variance of the sum is the sum of the variances.

$
\begin{align}
\mathbb E[e_{t,2}^2] &= \mathrm{Var}(e_{t,2}) \\[2pt]
&= \mathrm{Var}(\lambda \epsilon_{t+1} + \epsilon_{t+2})\\[2pt]
&= \mathrm{Var}(\lambda \epsilon_{t+1}) + \mathrm{Var}(\epsilon_{t+2})\\[2pt]
&= \lambda^2\sigma^2 + \sigma^2\\[2pt]
&= \sigma^2(\lambda^2 + 1)
\end{align}
$

## Two Steps Ahead (Forecasting Differences)

Forecast:

$
\begin{align}
f^\prime_{t,2} &= \mathbb E[\Delta x_{t+2} \vert I_t]\\[2pt]
&= \mathbb E[x_{t+2} - x_{t+1} \vert I_t]\\[2pt]
&= \mathbb E[\lambda x_{t+1} + \epsilon_{t+2} - (\lambda x_t + \epsilon_{t+1}) \vert I_t]\\[2pt]
&= \mathbb E[\lambda (\lambda x_t + \epsilon_{t+1}) + \epsilon_{t+2} - \lambda x_t - \epsilon_{t+1}]\\[2pt]
&= \lambda^2 x_t - \lambda x_t \\[2pt]
&= \lambda x_t(\lambda - 1)
\end{align}
$

Forecast error:

$
\begin{align}
e^\prime_{t,2} &= \Delta x_{t+2} - f^\prime_{t,2} \\
&= x_{t+2} - x_{t+1} - \lambda x_t(\lambda - 1)\\
&= \lambda x_{t+1} + \epsilon_{t+2} - (\lambda x_t + \epsilon_{t+1}) - \lambda x_t(\lambda - 1)\\
&= \lambda (\lambda x_t + \epsilon_{t+1}) + \epsilon_{t+2} - \lambda x_t - \epsilon_{t+1} - \lambda x_t(\lambda - 1)\\
&= \lambda^2 x_t + \lambda \epsilon_{t+1} + \epsilon_{t+2} - \lambda x_t - \epsilon_{t+1} - \lambda^2 x_t + \lambda x_t\\
&= (\lambda - 1)\epsilon_{t+1} + \epsilon_{t+2}
\end{align}
$

MSFE: The cross term vanishes because $\epsilon_{t+1}$ and $\epsilon_{t+2}$ are uncorrelated.

$
\begin{align}
\mathbb E[e^{\prime 2}_{t,2}] &= \mathbb E\Big[\big((\lambda - 1)\epsilon_{t+1} + \epsilon_{t+2}\big)^2\Big]\\[2pt]
&= \mathbb E\Big[(\lambda - 1)^2\epsilon_{t+1}^2 + \epsilon^2_{t+2} + 2(\lambda - 1)\epsilon_{t+1}\epsilon_{t+2}\Big]\\[2pt]
&= (\lambda - 1)^2 \mathbb E[\epsilon_{t+1}^2] + \mathbb E[\epsilon_{t+2}^2] + 2(\lambda - 1)\,\mathbb E[\epsilon_{t+1}\epsilon_{t+2}]\\[6pt]
&= \sigma^2\big((\lambda - 1)^2 + 1\big)
\end{align}
$

Since in general $(\lambda - 1)^2 \neq \lambda^2$, the two-step MSFE of the differences differs from that of the level: this is the non-invariance stated at the top.

## Parameter Uncertainty

If $\lambda$ is only estimated, we additionally need to account for the estimation uncertainty. For the one-step-ahead forecast of the level, we can write:

$
f_{t,1} = \hat \lambda x_t
$

Forecast error:

$
\begin{align}
e_{t+1} &= x_{t+1} - f_{t,1}\\[2pt]
&= \lambda x_t + \epsilon_{t+1} - \hat \lambda x_t\\[2pt]
&= x_t(\lambda - \hat \lambda) + \epsilon_{t+1}
\end{align}
$

This makes intuitive sense: it is the forecast error from the previous setting, plus the deviation of the estimate $\hat \lambda$ from the true parameter, scaled by $x_t$.
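To make the non-invariance concrete, here is a minimal Monte Carlo sketch in Python, again assuming `numpy` and the arbitrary illustration values $\lambda = 0.8$, $\sigma = 1$.

```python
import numpy as np

# Check the two-step MSFE formulas by simulation: the level and difference
# forecasts now have different mean-squared errors.
rng = np.random.default_rng(1)
lam, sigma, T = 0.8, 1.0, 500_000  # illustrative values

eps = rng.normal(0.0, sigma, T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = lam * x[t - 1] + eps[t]

x_t, x_1, x_2 = x[:-2], x[1:-1], x[2:]

# Two-step level error:      e_{t,2}  = x_{t+2} - lambda^2 x_t
# Two-step difference error: e'_{t,2} = (x_{t+2} - x_{t+1}) - lambda x_t (lambda - 1)
e_level = x_2 - lam**2 * x_t
e_diff = (x_2 - x_1) - lam * x_t * (lam - 1.0)

print(np.mean(e_level**2), sigma**2 * (lam**2 + 1))         # ~ sigma^2 (lambda^2 + 1)
print(np.mean(e_diff**2), sigma**2 * ((lam - 1.0)**2 + 1))  # ~ sigma^2 ((lambda-1)^2 + 1)
```

With these values the simulated MSFEs should come out near $\sigma^2(\lambda^2 + 1) = 1.64$ for the level and $\sigma^2\big((\lambda - 1)^2 + 1\big) = 1.04$ for the difference, in line with the derivations above.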