The moving average $\text{MA}(q)$ model looks similar in structure to the [[Autoregressive Model]] $\text{AR}(p)$. However, it is fundamentally different in terms of [[Stationarity]].
$
\begin{align}
X_t&=W_t+\phi_1X_{t-1}+ \dots+\phi_pX_{t-p} && \text{(AR)} \\[6pt]
X_t&=W_t+\theta_1W_{t-1}+ \dots+\theta_qW_{t-q} && \text{(MA)}
\end{align}
$
The $\text{MA}(q)$ is a weighted average of the current and the last $q$ [[White Noise Model|White Noise]] terms. As we move from $X_t$ to $X_{t+1}$, the averaging window largely overlaps with the previous one. The larger $q$, the more $W_i$ terms overlap in consecutive observations, and hence the stronger the dependency structure in the [[Time Series as Stochastic Process|Time Series]].
![[moving-average-model.svg|center]]
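To make this concrete, below is a minimal simulation sketch (assuming NumPy; the function name `simulate_ma` and the coefficients $\theta_1=0.5,\ \theta_2=0.25$ are illustrative choices, not part of the model definition):
```python
import numpy as np

def simulate_ma(theta, n, sigma=1.0, seed=0):
    """Simulate X_t = W_t + theta_1*W_{t-1} + ... + theta_q*W_{t-q}."""
    rng = np.random.default_rng(seed)
    q = len(theta)
    w = rng.normal(0.0, sigma, size=n + q)      # white noise, with q extra pre-sample terms
    x = np.empty(n)
    for t in range(n):
        # current noise term plus a weighted window over the previous q noise terms
        x[t] = w[t + q] + sum(theta[j] * w[t + q - 1 - j] for j in range(q))
    return x

x = simulate_ma(theta=[0.5, 0.25], n=1_000)     # an MA(2) example
```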
## Checking for Stationarity
To satisfy [[Stationarity#Weak Stationarity|Weak Stationarity]], we need to show that the mean is constant, the variance is constant, and the covariance depends only on the lag.
**Expectation:**
The expectation of $X_t$ is a weighted average of $W_i$ terms, which all have zero mean.
$ \mathbb E[X_t]= \underbrace{\mathbb E[W_t]}_{=0}+ \theta_1 \underbrace{\mathbb E[W_{t-1}]}_{=0}+\dots+\theta_q \underbrace{\mathbb E[W_{t-q}]}_{=0} =0$
Compared to the $\text{AR}(p)$ model, the $\text{MA}(q)$ model does not capture the full path (how it got from $X_0$ to $X_t$). It forgets everything that happened more than $q$ steps before.
>[!Note:]
>The expectation is constant over time.
**Variance:**
Assume the following moving average model, which weights the three most recent noise terms equally.
$ X_t = \frac{1}{3}(W_t+W_{t-1}+W_{t-2}) $
The variance of $X_t$ can be written as follows:
$
\begin{align}
\mathrm{cov}(X_t, X_t) =\mathrm{var}(X_t)& = \mathrm{var}\Big(\frac{1}{3}W_t\Big)+\mathrm{var}\Big(\frac{1}{3}W_{t-1}\Big)+\mathrm{var}\Big(\frac{1}{3}W_{t-2}\Big) \tag{1}\\[8pt]
&=\frac{1}{9}\mathrm{var}(W_t)+\frac{1}{9}\mathrm{var}(W_{t-1})+\frac{1}{9}\mathrm{var}(W_{t-2}) \tag{2}\\[8pt]
&=\frac{3}{9} \sigma^2=\frac{1}{3}\sigma^2 \tag{3}
\end{align}
$
where:
- (1) We know that the variance of a [[Sum of Independent Random Variables]] equals the sum of the variances, and the noise terms are independent.
- (2) We can factor out the $\frac{1}{3}$ from the variance (it gets squared).
- (3) Each $W_i$ has the same variance $\sigma^2$.
>[!note:]
>We conclude that the variance is constant over time.
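As a quick sanity check of the $\frac{3}{9}\sigma^2$ result, we can compare the sample variance of a simulated path against the theoretical value (a minimal sketch assuming NumPy and the illustrative choice $\sigma=1$):
```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0
w = rng.normal(0.0, sigma, size=100_002)        # white noise W_t
x = (w[2:] + w[1:-1] + w[:-2]) / 3              # X_t = (W_t + W_{t-1} + W_{t-2}) / 3

print(np.var(x))                                # sample variance, roughly 0.333
print(3 / 9 * sigma**2)                         # theoretical variance 3/9 * sigma^2
```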
**Covariance:**
Assume the same moving average model as above. Here we rely on the [[Covariance#Covariance after Linear Transformation|linearity of covariance]] property.
$
\begin{align}
\mathrm{cov}(X_t, X_{t-1})&=\mathrm{cov}\Big(
\frac{1}{3}(W_t+W_{t-1}+W_{t-2}), \frac{1}{3} (W_{t-1}+W_{t-2}+W_{t-3})\Big) \tag{1}\\[8pt]
&=\frac{1}{9}\mathrm{cov}(W_{t-1}, W_{t-1}+W_{t-2})+\frac{1}{9}\mathrm{cov}(W_{t-2}, W_{t-1}+W_{t-2}) \tag{2}\\[8pt]
&=\frac{1}{9}\mathrm{cov}(W_{t-1}, W_{t-1})+\frac{1}{9}\mathrm{cov}(W_{t-1}, W_{t-2})\\[4pt]
&\quad \,\,+\frac{1}{9}\mathrm{cov}(W_{t-2}, W_{t-2})+\frac{1}{9}\mathrm{cov}(W_{t-2}, W_{t-1}) \tag{3}\\[8pt]
&=\frac{1}{9}\mathrm{var}(W_{t-1})+\frac{1}{9}\mathrm{var}(W_{t-2}) \tag{4}\\[8pt]
&=\frac{2}{9}\sigma^2 \tag{5}
\end{align}
$
where:
- (2) The noise terms that appear on only one side ($W_t$ and $W_{t-3}$) are independent of every term on the other side, so they do not contribute to the covariance; the constants $\frac{1}{3}\cdot\frac{1}{3}=\frac{1}{9}$ are factored out.
- (3) Applying linearity of covariance to expand the remaining terms.
- (4) Covariances of independent noise terms are zero; covariances of a term with itself turn into variances.
- (5) Each $W_i$ has the same variance $\sigma^2$.
>[!note:]
>We conclude that the autocovariance $\gamma$ depends only on the gap between the two time points $s$ and $t$, not on their absolute position in the time series. This satisfies weak stationarity.
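The lag-1 result $\frac{2}{9}\sigma^2$ can be checked the same way, along with the fact that the covariance vanishes once the averaging windows no longer overlap (again a sketch assuming NumPy and $\sigma=1$; `sample_autocov` is a hypothetical helper):
```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0
w = rng.normal(0.0, sigma, size=100_002)
x = (w[2:] + w[1:-1] + w[:-2]) / 3              # X_t = (W_t + W_{t-1} + W_{t-2}) / 3

def sample_autocov(series, lag):
    """Sample covariance between X_t and X_{t-lag}."""
    return np.cov(series[lag:], series[:-lag])[0, 1]

print(sample_autocov(x, 1))                     # ~2/9, the windows share two noise terms
print(sample_autocov(x, 3))                     # ~0, the windows no longer overlap
```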
## Autocovariance for MA(1)
The moving average model of order $1$, denoted as $\text{MA}(1)$, is defined as:
$ X_t=W_t+ \theta W_{t-1} $
The autocovariance at lag $0$ corresponds to the variance of $X_t$. Since the noise terms $W_i$ are [[Independence and Identical Distribution|i.i.d.]], the expression simplifies.
$
\begin{align}
\mathrm{Cov}(X_t, X_t)
&=\mathrm{Cov}(W_t+ \theta W_{t-1},\,W_t+ \theta W_{t-1})\\[6pt]
&=\mathrm{Cov}(W_t,W_t) + 2\,\mathrm{Cov}(W_t,\theta W_{t-1}) + \mathrm{Cov}(\theta W_{t-1},\theta W_{t-1})\\[6pt]
&=\sigma^2+2\theta\, \mathrm{Cov}(W_t, W_{t-1})+\theta^2 \mathrm{Cov}(W_{t-1},W_{t-1})\\[6pt]
&=\sigma^2+\theta^2\sigma^2\\[6pt]
&=\sigma^2(1+\theta^2)
\end{align}
$
The autocovariance at lag $1$ uses the same properties and can be written as:
$
\begin{align}
\mathrm{Cov}(X_t, X_{t-1}) &=\mathrm{Cov}(W_t+ \theta W_{t-1},\, W_{t-1}+ \theta W_{t-2})\\[6pt]
&=\mathrm{Cov}(W_t,W_{t-1}) + \theta\,\mathrm{Cov}(W_t, W_{t-2}) \\[6pt]
&\phantom{==} +\, \theta\, \mathrm{Cov}(W_{t-1}, W_{t-1})+ \theta^2 \mathrm{Cov}(W_{t-1}, W_{t-2})\\[6pt]
&= \theta \sigma^2
\end{align}
$
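As a worked example with the illustrative values $\theta=0.5$ and $\sigma^2=1$ (not taken from any particular dataset), the two autocovariances evaluate to:
$
\gamma(0)=\sigma^2(1+\theta^2)=1\cdot(1+0.25)=1.25, \qquad \gamma(1)=\theta\sigma^2=0.5\cdot 1=0.5
$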