Yule-Walker equations are a set of linear equations to estimate the coefficients $\phi=(\phi_1, \dots, \phi_p)$ and the noise variance $\sigma_W^2$ of an [[Autoregressive Model|Autoregressive Model]] in closed form.

- *Autocovariance equations:* The autocovariance $\gamma(h)$ at lag $h$ is a weighted sum of the autocovariances at smaller lags (using the symmetry $\gamma(-k)=\gamma(k)$). We get one equation for each $h=1,\dots ,p$.
- *Variance equation:* The [[Variance]] of the [[White Noise Model|White Noise]] remainder $\sigma_W^2$ is the overall variance of the series, minus a weighted sum of autocovariances.

$
\begin{aligned}
\gamma(h) &= \phi_1 \gamma(h-1)+\phi_2 \gamma(h-2)+\dots+\phi_p \gamma(h-p) \\[6pt]
\sigma_W^2&=\gamma(0)-\phi_1 \gamma(1) - \phi_2\gamma(2)-\dots-\phi_p \gamma(p)
\end{aligned}
$

The autocovariance equations can also be written all at once in matrix form.

$
\underbrace{
\begin{bmatrix}
\gamma(1) \\ \gamma(2) \\ \vdots \\ \gamma(p)
\end{bmatrix}
}_{\gamma_p}
=
\underbrace{\begin{bmatrix}
\gamma(0) & \gamma(1) & \cdots & \gamma(p-1) \\
\gamma(1) & \ddots & &\vdots\\
\vdots & &\ddots& \vdots \\
\gamma(p-1) & \cdots & \cdots& \gamma(0)
\end{bmatrix}}_{\Gamma_p}
\underbrace{\begin{bmatrix}
\phi_1 \\ \phi_2 \\ \vdots \\ \phi_p
\end{bmatrix}}_{\phi}
$

This leads to a simplified notation:

$
\gamma_p= \Gamma_p \phi \quad \implies \quad \phi=\Gamma_p^{-1}\gamma_p
$

The equation for the noise variance can also be simplified by writing all $\phi$ terms as a [[Vector Operations|Vector]]. Finally, we substitute the solution $\phi=\Gamma_p^{-1}\gamma_p$ from above.

$
\begin{align}
\sigma_W^2 &= \gamma(0) - \phi_1 \gamma(1)-\dots - \phi_p \gamma(p) \tag{1}\\[4pt]
&=\gamma(0)- \phi^T\gamma_p \tag{2}\\[4pt]
&=\gamma(0)-\gamma_p^T\, \Gamma_p^{-1}\gamma_p \tag{3}
\end{align}
$

- (3) Since the [[Covariance Matrix]] $\Gamma_p$ is [[Convexity of Multivariate Functions#^f4dc92|positive-semidefinite]] (and, when invertible, so is $\Gamma_p^{-1}$), the quadratic form $\gamma_p^T\,\Gamma_p^{-1}\gamma_p$ is non-negative, because $x^TAx \geq 0$ for any positive-semidefinite $A$. This means the noise variance $\sigma_W^2$ is less than or equal to the variance of the overall time series $\gamma(0)$, the autocovariance at lag $0$.

>[!note]
>Now we have expressed everything in terms of the autocovariances, which we just need to estimate from the data.
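As a minimal sketch of that estimation step, the closed form $\phi=\Gamma_p^{-1}\gamma_p$ and $\sigma_W^2=\gamma(0)-\phi^T\gamma_p$ translates directly into code. The function name `yule_walker`, the biased sample autocovariance, and the generic `numpy` solve below are my own illustrative choices, not prescribed by the equations.

```python
import numpy as np

def yule_walker(x, p):
    """Estimate AR(p) coefficients phi and noise variance via Yule-Walker.

    A minimal sketch: gamma(h) is estimated by the biased sample
    autocovariance, and Gamma_p is the Toeplitz matrix gamma(|i-j|).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                     # work with a zero-mean series
    n = len(x)
    # sample autocovariances gamma(0), ..., gamma(p)
    gamma = np.array([x[:n - h] @ x[h:] / n for h in range(p + 1)])
    # Gamma_p[i, j] = gamma(|i - j|)  (symmetric Toeplitz matrix)
    Gamma_p = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
    gamma_p = gamma[1:]                  # right-hand side (gamma(1), ..., gamma(p))
    phi = np.linalg.solve(Gamma_p, gamma_p)   # phi = Gamma_p^{-1} gamma_p
    sigma2 = gamma[0] - phi @ gamma_p         # sigma_W^2 = gamma(0) - phi^T gamma_p
    return phi, sigma2
```

In practice one would typically exploit the Toeplitz structure of $\Gamma_p$ (e.g. via `scipy.linalg.solve_toeplitz` or the Levinson-Durbin recursion) instead of a generic solve.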
## Derivation of Yule-Walker Equations

To derive the set of equations, we treat the $\text{AR}(p)$ model as a regression problem, which we solve via the [[LMS Estimator]].

$
\hat X_t = \hat \phi_1 X_{t-1}+\hat \phi_2 X_{t-2}+\dots + \hat \phi_p X_{t-p}
$

**Properties of the LMS Estimator:**
1. *Lowest variance:* The estimator minimizes the expected squared error of the residuals (by definition of "least squares") among all linear estimators.
2. *Unbiased estimator:* The estimator is [[Properties of an Estimator#Key Properties of an Estimator|unbiased]], which means that the residuals have zero expectation.
$
\mathbb E[\hat X_t-X_t]=0
$
3. *Uncorrelated residuals:* The residuals are uncorrelated with all of the regressors; otherwise we could still improve the model, and the solution would not be the least-squares one. Because of this uncorrelatedness (and the zero-mean residuals), multiplying the residuals with any of the lagged variables of the $\text{AR}(p)$ model yields zero expectation.
$
\mathbb E[(\hat X_t-X_t)X_{t-k}]=0 \quad \text{for} \; k=1, \dots, p
$

**Derivation Using LMS Properties:**

Example for $p=2,\; k=2$:

$
\begin{align}
\mathbb E[(\hat X_t-X_t)X_{t-k}]&=0 \tag{1}\\[2pt]
\mathbb E[(\hat \phi_1 X_{t-1}+ \hat \phi_2 X_{t-2}-X_t)X_{t-2}]&=0 \tag{2}\\[2pt]
\mathbb E[\hat \phi_1 X_{t-1}X_{t-2}]+ \mathbb E[\hat \phi_2 X_{t-2}X_{t-2}]- \mathbb E[X_tX_{t-2}]&=0 \tag{3}\\[2pt]
\hat \phi_1 \mathbb E[X_{t-1}X_{t-2}]+ \hat \phi_2\mathbb E[X_{t-2}X_{t-2}]- \mathbb E[X_tX_{t-2}]&=0 \tag{4}\\[2pt]
\hat \phi_1\gamma(1)+\hat \phi_2\gamma(0)-\gamma(2)&=0 \tag{5}\\[2pt]
\hat \phi_1\gamma(1)+\hat \phi_2\gamma(0)&=\gamma(2) \tag{6}
\end{align}
$

where:
- (2) Plug in the $\text{AR}(p)$ model for $\hat X_t$.
- (3) Distribute the expectation over the sum (linearity of expectation).
- (4) Factor the scalar multipliers out of the expectations.
- (5) The expectation of the product of two $X$ terms is the autocovariance at the corresponding lag difference, given that both $X_t$ and $X_{t-h}$ have zero expectation.

This generalizes to the equations from above, with the lag differences written in terms of $h$:

$
\gamma(h)=\phi_1\gamma(h-1)+ \phi_2 \gamma(h-2)+ \dots+\phi_p\gamma(h-p)
$
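As a quick numerical sanity check of the derivation, we can simulate an AR(2) process and verify that solving the Yule-Walker equations approximately recovers the true parameters. This is a sketch that reuses the `yule_walker` function from above; the coefficients $\phi=(0.6, 0.3)$ and the unit noise variance are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# simulate a stationary AR(2): X_t = 0.6 X_{t-1} + 0.3 X_{t-2} + W_t
phi_true, sigma_true = np.array([0.6, 0.3]), 1.0
n = 100_000
x = np.zeros(n)
w = rng.normal(0.0, sigma_true, n)
for t in range(2, n):
    x[t] = phi_true[0] * x[t - 1] + phi_true[1] * x[t - 2] + w[t]

phi_hat, sigma2_hat = yule_walker(x, p=2)
print(phi_hat)     # close to [0.6, 0.3]
print(sigma2_hat)  # close to 1.0
```

With a long enough series the estimates converge to the true values, since the sample autocovariances are consistent estimators of $\gamma(h)$.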