**Motivation:**
- In an [[Autoregressive Model|AR Model]] $\text{AR}(p)$ we should only include the lags that individually contribute, i.e. those that help describe the underlying data-generating process.
- When we only look at the ACF, we see correlations between $X_t$ and prior values $X_{t-h}$ due to the dependency structure of the autoregressive process, even though some of these variables do not contribute individually.
**Example:**
By definition, an $\text{AR}(1)$ model only needs $X_{t-1}$ to be fully defined. However, you can rewrite it in terms of $X_{t-2}$ and some white noise terms:
$
\begin{align}
\text{AR}(1): X_t &= \phi X_{t-1}+W_t \\[2pt]
&= \phi (\phi X_{t-2}+W_{t-1})+W_t\\[2pt]
&=\phi^2X_{t-2}+ \phi W_{t-1}+W_t
\end{align}
$
Below is a proof that $X_{t-2}$ is correlated with $X_t$ in an $\text{AR}(1)$ model. Since future noise is uncorrelated with past values, $\operatorname{Cov}(W_{t-1}, X_{t-2})=\operatorname{Cov}(W_t, X_{t-2})=0$, so the autocovariance at lag 2 is
$
\begin{align}
\gamma(2)=\operatorname{Cov}(X_t, X_{t-2})&=\operatorname{Cov}(\phi^2X_{t-2}+ \phi W_{t-1}+W_t, \; X_{t-2}) \\[4pt]
&= \phi^2\operatorname{Cov}(X_{t-2}, X_{t-2}) + \phi \operatorname{Cov}(W_{t-1},X_{t-2}) + \operatorname{Cov}(W_t,X_{t-2}) \\[4pt]
&=\phi^2 \gamma(0)
\end{align}
$
and hence $\rho(X_t, X_{t-2})=\gamma(2)/\gamma(0)=\phi^2 \neq 0$, even though $X_{t-2}$ contributes nothing beyond $X_{t-1}$. Therefore we instead need to look at the partial autocorrelation function (PACF).
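To see this numerically, here is a minimal sketch, assuming `numpy` and `statsmodels` are installed: it simulates an $\text{AR}(1)$ with $\phi=0.7$, where the sample ACF at lag 2 lands near $\phi^2=0.49$ while the sample PACF at lag 2 lands near zero.
```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(0)
phi, n = 0.7, 5000

# Generate AR(1): X_t = phi * X_{t-1} + W_t
x = np.zeros(n)
w = rng.standard_normal(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + w[t]

print("ACF  at lag 2:", acf(x, nlags=2)[2])   # ~ phi^2 = 0.49
print("PACF at lag 2:", pacf(x, nlags=2)[2])  # ~ 0
```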
## Partial Correlation
**General Setup:**
We have [[Random Variable|r.v's.]] $X,Y$ and both are influenced by $Z$. The partial correlation of $X,Y$ tells us their correlation beyond the common impact that $Z$ has on both. To factor out $Z$, we simply condition the correlation on it $(\rho_{X,Y | Z})$.
Therefore we regress each of $X$ and $Y$ separately on $Z$ to isolate the impact of $Z$ on each of them.
- Regress $X$ on $Z$ to get $\hat X$ ($X$ is the dependent variable)
- Regress $Y$ on $Z$ to get $\hat Y$ ($Y$ is the dependent variable)
Finally, we subtract these fitted values inside the [[Correlation]], so that we are effectively correlating the residuals, i.e. the parts of $X$ and $Y$ not explained by $Z$.
$ \rho_{X,Y | Z}=\rho(X-\hat X, \, Y-\hat Y) $
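A minimal sketch of this, assuming only `numpy` (the driver strengths 2.0 and −1.5 are arbitrary choices): $X$ and $Y$ are both driven by $Z$, so their raw correlation is strong, but correlating the residuals after regressing each on $Z$ gives a value near zero.
```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
z = rng.standard_normal(n)
x = 2.0 * z + rng.standard_normal(n)   # X driven by Z
y = -1.5 * z + rng.standard_normal(n)  # Y driven by Z

def residuals(a, b):
    """Residuals of regressing a on b (with intercept)."""
    B = np.column_stack([np.ones_like(b), b])
    coef, *_ = np.linalg.lstsq(B, a, rcond=None)
    return a - B @ coef

print("raw correlation:    ", np.corrcoef(x, y)[0, 1])  # strongly negative
print("partial correlation:", np.corrcoef(residuals(x, z), residuals(y, z))[0, 1])  # ~ 0
```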
**Time Series Setup:**
For time series, we apply the same concept to the autocorrelation between $X_t$ and $X_{t-h}$. We need to factor out all lags “in between”, $X_{t-1}, \dots , X_{t-h+1}$, to identify the individual contribution of $X_{t-h}$.
Again, to factor out variables we regress on them (see the sketch after this list):
- Regress $X_t$ on $X_{t-1}, \dots, X_{t-h+1}$ (i.e. fit an $\text{AR}(h-1)$) to get the fitted values $\hat X_t$
- Regress $X_{t-h}$ on the same in-between lags to get $\hat X_{t-h}$
- Compute the correlation of the residuals, $\rho(X_t-\hat X_t, \, X_{t-h}-\hat X_{t-h})$
- If this correlation is significantly different from zero (e.g. outside the $\pm 1.96/\sqrt{n}$ band in a PACF plot), the additional lag $h$ has descriptive power
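Here is a minimal sketch of these steps, assuming only `numpy` (the helper name `pacf_at_lag` is just illustrative):
```python
import numpy as np

def pacf_at_lag(x, h):
    """Partial autocorrelation at lag h via the two-regression definition."""
    n = len(x)
    # In-between lags X_{t-1}, ..., X_{t-h+1}, plus an intercept column
    Z = np.column_stack([np.ones(n - h)] + [x[h - j : n - j] for j in range(1, h)])
    x_t, x_th = x[h:], x[: n - h]  # X_t and X_{t-h}, aligned row by row

    def resid(a):
        # Residuals of regressing a on the in-between lags
        coef, *_ = np.linalg.lstsq(Z, a, rcond=None)
        return a - Z @ coef

    return np.corrcoef(resid(x_t), resid(x_th))[0, 1]
```
For an $\text{AR}(1)$ series, `pacf_at_lag(x, 2)` lands near zero while the plain ACF at lag 2 does not, matching the motivating example above.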
**Frisch-Waugh-Lovell theorem:** The above approach is equivalent to fitting an $\text{AR}(h)$ model, where the coefficient $\phi_h$ on $X_{t-h}$ is the partial autocorrelation at lag $h$. The interpretation of a linear regression coefficient is: “How much does the output ($X_t$) change when I change this regressor ($X_{t-h}$), holding everything else equal?”
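A sketch of this equivalent route, again assuming only `numpy` (`ar_last_coef` is an illustrative name): fit the $\text{AR}(h)$ by ordinary least squares and read off the coefficient on $X_{t-h}$; for a stationary series this agrees with the residual correlation from the sketch above up to finite-sample noise.
```python
import numpy as np

def ar_last_coef(x, h):
    """Fit an AR(h) by OLS and return the coefficient phi_h on X_{t-h}."""
    n = len(x)
    # Regressors: intercept plus X_{t-1}, ..., X_{t-h}
    Z = np.column_stack([np.ones(n - h)] + [x[h - j : n - j] for j in range(1, h + 1)])
    coef, *_ = np.linalg.lstsq(Z, x[h:], rcond=None)
    return coef[-1]  # the last column is X_{t-h}, so this is phi_h
```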
>[!Note:]
>The partial autocorrelation function is then the collection of partial autocorrelations plotted for all lags.
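As a short usage sketch, assuming `statsmodels` and `matplotlib` are installed, statsmodels can draw this plot directly; bars outside the shaded confidence band mark lags with significant partial autocorrelation.
```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_pacf

rng = np.random.default_rng(2)
n, phi = 1000, 0.7
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()

plot_pacf(x, lags=20)  # for an AR(1), only the lag-1 bar should stand out
plt.show()
```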