The Linear Least Mean Squares (LLMS) estimator minimizes the [[MSE]] among all linear estimators of the form:
$ \hat \Theta_{\text{LLMS}} = aX+b $
Here $a$ and $b$ are parameters to be optimized. While the general [[LMS Estimator]] directly computes the conditional expectation $\hat \Theta_{\text{LMS}}=\mathbb E[\Theta \vert X]$, the LLMS estimator $\hat \Theta_{\text{LLMS}}$ is constrained to linear functions of $X$.
To find the optimal linear estimator (best $a,b$ parameters), we minimize the MSE.
$ \mathrm{MSE} = \mathbb{E}[(\Theta - \hat \Theta)^2] = \mathbb{E} \big [(\Theta - aX -b)^2 \big] $
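As a concrete running example (not part of the derivation itself), the sketch below assumes a hypothetical toy model with $\Theta \sim \mathcal N(1, 1)$ and $X = \Theta + W$, where $W \sim \mathcal N(0, 4)$ is independent noise, and simply evaluates this objective for arbitrary choices of $a$ and $b$:
```python
# Hedged sketch, assuming the toy model Theta ~ N(1, 1), X = Theta + W,
# W ~ N(0, 4) independent. It only evaluates the MSE objective for a given
# linear estimator a*X + b; the derivation below finds the minimizing a, b.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(loc=1.0, scale=1.0, size=100_000)         # Theta samples
x = theta + rng.normal(loc=0.0, scale=2.0, size=theta.size)  # noisy observation X

def mse(a: float, b: float) -> float:
    """Empirical MSE of the linear estimator a*x + b."""
    return float(np.mean((theta - a * x - b) ** 2))

print(mse(1.0, 0.0))  # a naive guess: estimate Theta by X itself
print(mse(0.2, 0.8))  # the values the derivation below identifies as optimal
```
For this toy model $\mathrm{Cov}(\Theta, X) = \mathrm{Var}(\Theta) = 1$ and $\mathrm{Var}(X) = 5$, which makes the later steps easy to check numerically.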
## Derivation of Optimal Estimator
*Step 1: Minimizing the Intercept*
Assume for now that $a$ is fixed, so that we can optimize the intercept $b$ first. We treat $(\Theta - aX)$ as a [[Random Variable|r.v.]] $Z$.
$
\min \Big (\mathbb{E} \big [(\underbrace{\Theta - aX}_{=Z} -b)^2 \big ] \Big )=
\min \Big (\mathbb{E} \big [(Z -b)^2 \big ] \Big )
$
The expectation $\mathbb{E}[(Z-b)^2]$ is minimized when $b$ equals the mean of $Z$.
$
\begin{align}
\min(\text{MSE}) \to b &= \mathbb{E}[Z] \tag{1}\\[2pt]
b &= \mathbb{E}[\Theta - aX] \tag{2}\\[2pt]
b &= \mathbb{E}[\Theta]-a\mathbb{E}[X] \tag{3}
\end{align}
$
where:
- (2) Insert definition of $Z$.
- (3) [[Linearity of Expectations]] allows us to split up the expectation. Also, since $a$ is treated as a fixed constant, we can pull it out of the expectation.
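As a quick numerical check of Step 1 (using the same hypothetical toy model as in the earlier sketch), fixing a slope $a$ and scanning over candidate intercepts should bottom out near $b = \mathbb{E}[\Theta] - a\,\mathbb{E}[X]$:
```python
# Hedged check of Step 1, assuming the toy model Theta ~ N(1, 1),
# X = Theta + W, W ~ N(0, 4): for a fixed slope a, the empirical MSE over a
# grid of intercepts b should be minimized near b = E[Theta] - a * E[X].
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(loc=1.0, scale=1.0, size=100_000)
x = theta + rng.normal(loc=0.0, scale=2.0, size=theta.size)

a = 0.3                               # any fixed slope
b_grid = np.linspace(-2.0, 3.0, 501)  # candidate intercepts
mse = [np.mean((theta - a * x - b) ** 2) for b in b_grid]

print(b_grid[np.argmin(mse)])         # numeric minimizer
print(theta.mean() - a * x.mean())    # formula from Step 1; should nearly match
```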
*Step 2: Minimizing the Slope Parameter*
Substituting the minimizing choice of $b$ back in shows that the remaining MSE is the variance of $Z$.
$
\begin{align}
\mathrm{MSE}&=\mathbb E\big [(\overbrace{\Theta - aX}^{=Z} - \mathbb E[\overbrace{\Theta-aX}^{=Z}])^2 \big ] \\[2pt]
&=\mathbb{E}\big [(Z - \mathbb{E}[Z])^2 \big] \\[2pt]
&=\mathrm{Var}(Z) \\[2pt]
&=\mathrm{Var}(\Theta - aX)
\end{align}
$
We expand this using the rule for the [[Variance of Sum of Random Variables]]. Note that, since $\Theta$ and $X$ are in general not independent, we cannot simply add up their individual variances; the covariance term must be kept.
$
\begin{align}
\mathrm{Var}(\Theta -aX)&=\mathrm{Var}(\Theta) + \mathrm{Var}(aX)-2*\mathrm{Cov}(\Theta, aX) \\
\mathrm{Var}(\Theta -aX)&=\mathrm{Var}(\Theta) + a^2\mathrm{Var}(X)-2a*\mathrm{Cov}(\Theta, X)
\end{align}
$
Since we want to find $a$ that minimizes the MSE, we take the derivative w.r.t $a$ and set it to zero.
$
\begin{aligned}
\frac{d}{da}\Big[\mathrm{Var}(\Theta) + a^2\mathrm{Var}(X)-2a*\mathrm{Cov}(\Theta, X)\Big]&\stackrel{!}{=}0 \\[8pt]
2a*\mathrm{Var}(X) - 2*\mathrm{Cov}(\Theta, X)&=0 \\[8pt]
\frac{\mathrm{Cov}(\Theta, X)}{\mathrm{Var}(X)}&=a
\end{aligned}
$
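A corresponding check of Step 2 (same hypothetical toy model): scanning over slopes, $\mathrm{Var}(\Theta - aX)$ should be smallest near $a = \mathrm{Cov}(\Theta, X)/\mathrm{Var}(X)$:
```python
# Hedged check of Step 2, assuming the toy model Theta ~ N(1, 1),
# X = Theta + W, W ~ N(0, 4): Var(Theta - a*X) should be minimized near
# a = Cov(Theta, X) / Var(X), which is 1/5 for this model.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(loc=1.0, scale=1.0, size=100_000)
x = theta + rng.normal(loc=0.0, scale=2.0, size=theta.size)

a_grid = np.linspace(-1.0, 1.0, 401)
var_z = [np.var(theta - a * x) for a in a_grid]

print(a_grid[np.argmin(var_z)])            # numeric minimizer
print(np.cov(theta, x)[0, 1] / np.var(x))  # Cov/Var formula; should nearly match
```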
We can also express $a$ in terms of [[Correlation#^8e7d31|Correlation]].
$
a = \frac{\mathrm{Cov}(\Theta, X)}{\mathrm{Var}(X)}
= \frac{\rho \sigma_X \sigma_\Theta}{\sigma_X^2}
= \rho\frac{\sigma_\Theta}{\sigma_X}
$
This lets us express $b$ as:
$
\begin{align}
b &= \mathbb{E}[\Theta]-a\mathbb{E}[X] \\[2pt]
b &= \mathbb{E}[\Theta]-\rho\frac{\sigma_\Theta}{\sigma_X} \mathbb{E}[X]
\end{align}
$
*Step 3: Final Estimator*
Plugging the optimal values of $a$ and $b$ into $\hat \Theta_{\text{LLMS}} = aX + b$ gives the final estimator.
$
\begin{align}
\hat \Theta_{\text{LLMS}} &= aX+b \\[2pt]
\hat \Theta_{\text{LLMS}} &=\mathbb{E}[\Theta]+\rho\frac{\sigma_\Theta}{\sigma_X} *(X-\mathbb{E}[X])
\end{align}
$
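On the same hypothetical toy model, the closed-form coefficients should essentially coincide with an ordinary least-squares line fitted to the samples, since both describe the best linear predictor of $\Theta$ from $X$:
```python
# Hedged sketch of the final estimator, assuming the toy model Theta ~ N(1, 1),
# X = Theta + W, W ~ N(0, 4). The closed-form slope and intercept are compared
# against a fitted least-squares line, which targets the same linear predictor.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(loc=1.0, scale=1.0, size=100_000)
x = theta + rng.normal(loc=0.0, scale=2.0, size=theta.size)

rho = np.corrcoef(theta, x)[0, 1]
a = rho * theta.std() / x.std()           # optimal slope  rho * sigma_Theta / sigma_X
b = theta.mean() - a * x.mean()           # optimal intercept
theta_hat = a * x + b                     # LLMS estimate for every observation

a_ls, b_ls = np.polyfit(x, theta, deg=1)  # least-squares line for comparison
print((a, b), (a_ls, b_ls))               # the two pairs should nearly agree
print(np.mean((theta - theta_hat) ** 2))  # empirical MSE, about 0.8 here
```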
**Interpretation:**
- *Zero correlation:* When $\rho=0$, the LLMS estimator reduces to $\mathbb E[\Theta]$. Since $X$ and $\Theta$ are uncorrelated, an observation $x$ does not help in estimating $\Theta$.
- *Positive correlation:* When $\rho>0$ and an observation $x$ is bigger than its mean $\mathbb E[X]$, the estimate $\hat \Theta$ is pushed above its mean $\mathbb E[\Theta]$.
- *Negative correlation:* When $\rho<0$ and an observation $x$ is bigger than its mean $\mathbb E[X]$, the estimate $\hat \Theta$ is pushed below its mean $\mathbb E[\Theta]$.
## Mean Squared Error
Now that we have a formula for the estimator $\hat \Theta$, we can derive a simpler expression for the MSE as well. For simplicity we assume that $\mathbb{E}[\Theta]=0$ and $\mathbb{E}[X]=0$; however, the final result remains valid without this assumption.
$
\begin{align}
\mathrm{MSE}
&=\mathbb{E}\left [(\hat \Theta_{\text{LLMS}}- \Theta)^2 \right] \tag{1}\\[2pt]
&=\mathbb{E}\left [\big(\mathbb E[\Theta]+\rho \frac{ \sigma_\Theta}{\sigma_X}*(X-\mathbb E [X]) - \Theta \big)^2\right] \tag{2}\\[2pt]
&=\mathbb{E}\left [ \left(\rho\frac{ \sigma_\Theta}{\sigma_X}*X - \Theta \right)^2\right] \tag{3}\\[2pt]
&=\mathbb{E} \left[\Theta^2 - 2\rho \frac{\sigma_\Theta }{\sigma_X}*X \Theta +\rho^2 \frac{\sigma_\Theta^2}{\sigma_X^2}*X^2\right] \tag{4}\\[2pt]
&=\mathbb{E} \left[\Theta^2\right] -\mathbb{E} \left[ 2\rho * \frac{\sigma_\Theta }{\sigma_X}*X \Theta\right] +\mathbb{E}\left[\rho^2\frac{\sigma_\Theta^2}{\sigma_X^2}*X^2\right] \tag{5}\\[2pt]
&=\sigma_\Theta^2-2\rho\frac{\sigma_\Theta}{\sigma_X}*\mathbb{E}[X \Theta]+ \rho^2\frac{ \sigma_\Theta^2}{\sigma_X^2}*\sigma_X^2 \tag{6}\\[2pt]
&=\sigma_\Theta^2-2\rho\frac{ \sigma_\Theta}{\sigma_X} *\rho \sigma_\Theta \sigma_X + \rho^2\sigma_\Theta^2 \tag{7}\\[6pt]
&=\sigma_\Theta^2-2\rho^2\sigma_\Theta^2 + \rho^2\sigma_\Theta^2 \\[10pt]
\mathrm{MSE}&=(1- \rho^2)* \mathrm{Var}(\Theta) \tag{8}
\end{align}
$
where:
- (2) Plug in the LLMS estimator.
- (3) Drop the terms $\mathbb E[\Theta]$ and $\mathbb E[X]$, which are zero by assumption.
- (4) Expand the quadratic term.
- (5) Apply [[Linearity of Expectations]] to split up the expectation.
- (6) Since $\mathbb E[\Theta]=0$ by assumption, the second moment $\mathbb E[\Theta^2]$ is equal to the second central moment (i.e. variance). The same is true for $\mathbb E[X^2]$.
- (7) Since both means are zero, $\mathbb E[X \Theta] = \mathrm{Cov}(X, \Theta) = \rho \sigma_X \sigma_\Theta$.
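A Monte Carlo check of this formula (same hypothetical toy model as before, deliberately not zero-mean, to illustrate that the final result does not need the centering assumption):
```python
# Hedged check of MSE = (1 - rho^2) * Var(Theta), assuming the toy model
# Theta ~ N(1, 1), X = Theta + W, W ~ N(0, 4), which is not zero-mean.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(loc=1.0, scale=1.0, size=1_000_000)
x = theta + rng.normal(loc=0.0, scale=2.0, size=theta.size)

rho = np.corrcoef(theta, x)[0, 1]
theta_hat = theta.mean() + rho * theta.std() / x.std() * (x - x.mean())

print(np.mean((theta_hat - theta) ** 2))  # empirical MSE of the LLMS estimator
print((1 - rho ** 2) * np.var(theta))     # formula; both should be close to 0.8
```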
**Interpretation:**
- *Zero correlation:* When $\rho=0$, observations $x$ do not improve the estimate, so the estimation error is $\mathrm{Var}(\Theta)$ itself.
- *Nonzero correlation:* When $\vert \rho \vert > 0$, observations $x$ improve the estimate. In the extreme case of $\vert \rho \vert = 1$, $\Theta$ is fully determined by a linear function of $X$ and the estimation error is zero.