The Linear Least Mean Squares (LLMS) estimator minimizes the [[MSE]] among all linear estimators of the form:
$ \hat \Theta_{\text{LLMS}} = aX+b $
Here $a$ and $b$ are parameters to be optimized. While the general [[LMS Estimator]] directly computes the conditional expectation $\hat \Theta_{\text{LMS}}=\mathbb E[\Theta \vert X]$, the LLMS estimator $\hat \Theta_{\text{LLMS}}$ is constrained to linear functions of $X$.
To find the optimal linear estimator (best $a,b$ parameters), we minimize the MSE.
$ \mathrm{MSE} = \mathbb{E}[(\Theta - \hat \Theta)^2] = \mathbb{E} \big [(\Theta - aX -b)^2 \big] $
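As a concrete running example (not part of the derivation itself), the sketch below assumes a hypothetical toy model with $\Theta \sim \mathcal N(1, 1)$ and $X = \Theta + W$, where $W \sim \mathcal N(0, 4)$ is independent noise, and simply evaluates this objective for arbitrary choices of $a$ and $b$:
```python
# Hedged sketch, assuming the toy model Theta ~ N(1, 1), X = Theta + W,
# W ~ N(0, 4) independent. It only evaluates the MSE objective for a given
# linear estimator a*X + b; the derivation below finds the minimizing a, b.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(loc=1.0, scale=1.0, size=100_000)         # Theta samples
x = theta + rng.normal(loc=0.0, scale=2.0, size=theta.size)  # noisy observation X

def mse(a: float, b: float) -> float:
    """Empirical MSE of the linear estimator a*x + b."""
    return float(np.mean((theta - a * x - b) ** 2))

print(mse(1.0, 0.0))  # a naive guess: estimate Theta by X itself
print(mse(0.2, 0.8))  # the values the derivation below identifies as optimal
```
For this toy model $\mathrm{Cov}(\Theta, X) = \mathrm{Var}(\Theta) = 1$ and $\mathrm{Var}(X) = 5$, which makes the later steps easy to check numerically.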
## Derivation of Optimal Estimator
*Step 1: Minimizing the Intercept*
Assume for now that $a$ is fixed, so that we can optimize the intercept $b$ first. We treat $(\Theta - aX)$ as a [[Random Variable|r.v.]] $Z$.
$
\min \Big (\mathbb{E} \big [(\underbrace{\Theta - aX}_{=Z} -b)^2 \big ] \Big )=
\min \Big (\mathbb{E} \big [(Z -b)^2 \big ] \Big )
$
The expectation $\mathbb{E}[(Z-b)^2]$ is minimized when $b$ equals the mean of $Z$.
$
\begin{align}
\min(\text{MSE}) \to b &= \mathbb{E}[Z] \tag{1}\\[2pt]
b &= \mathbb{E}[\Theta - aX] \tag{2}\\[2pt]
b &= \mathbb{E}[\Theta]-a\mathbb{E}[X] \tag{3}
\end{align}
$
where:
- (2) Insert definition of $Z$.
- (3) [[Linearity of Expectations]] allows us to split up the expectation. Also, since $a$ is treated as a fixed constant, we can pull it out of the expectation.
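As a quick numerical check of Step 1 (using the same hypothetical toy model as in the earlier sketch), fixing a slope $a$ and scanning over candidate intercepts should bottom out near $b = \mathbb{E}[\Theta] - a\,\mathbb{E}[X]$:
```python
# Hedged check of Step 1, assuming the toy model Theta ~ N(1, 1),
# X = Theta + W, W ~ N(0, 4): for a fixed slope a, the empirical MSE over a
# grid of intercepts b should be minimized near b = E[Theta] - a * E[X].
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(loc=1.0, scale=1.0, size=100_000)
x = theta + rng.normal(loc=0.0, scale=2.0, size=theta.size)

a = 0.3                               # any fixed slope
b_grid = np.linspace(-2.0, 3.0, 501)  # candidate intercepts
mse = [np.mean((theta - a * x - b) ** 2) for b in b_grid]

print(b_grid[np.argmin(mse)])         # numeric minimizer
print(theta.mean() - a * x.mean())    # formula from Step 1; should nearly match
```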
*Step 2: Minimizing the Slope Parameter*
Substituting the minimizing choice of $b$ back in shows that the remaining MSE is the variance of $Z$.
$
\begin{align}
\mathrm{MSE}&=\mathbb E\big [(\overbrace{\Theta - aX}^{=Z} - \mathbb E[\overbrace{\Theta-aX}^{=Z}])^2 \big ] \\[2pt]
&=\mathbb{E}\big [(Z - \mathbb{E}[Z])^2 \big] \\[2pt]
&=\mathrm{Var}(Z) \\[2pt]
&=\mathrm{Var}(\Theta - aX)
\end{align}
$
We expand this using the rule for the [[Variance of Sum of Random Variables]]. Note that, since $\Theta$ and $X$ are in general not independent, we cannot simply add up their individual variances; the covariance term must be kept.
$
\begin{align}
\mathrm{Var}(\Theta -aX)&=\mathrm{Var}(\Theta) + \mathrm{Var}(aX)-2*\mathrm{Cov}(\Theta, aX) \\
\mathrm{Var}(\Theta -aX)&=\mathrm{Var}(\Theta) + a^2\mathrm{Var}(X)-2a*\mathrm{Cov}(\Theta, X)
\end{align}
$
Since we want to find $a$ that minimizes the MSE, we take the derivative w.r.t $a$ and set it to zero.
$
\begin{aligned}
\frac{d}{da}\Big[\mathrm{Var}(\Theta) + a^2\mathrm{Var}(X)-2a*\mathrm{Cov}(\Theta, X)\Big]&\stackrel{!}{=}0 \\[8pt]
2a*\mathrm{Var}(X) - 2*\mathrm{Cov}(\Theta, X)&=0 \\[8pt]
\frac{\mathrm{Cov}(\Theta, X)}{\mathrm{Var}(X)}&=a
\end{aligned}
$
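A corresponding check of Step 2 (same hypothetical toy model): scanning over slopes, $\mathrm{Var}(\Theta - aX)$ should be smallest near $a = \mathrm{Cov}(\Theta, X)/\mathrm{Var}(X)$:
```python
# Hedged check of Step 2, assuming the toy model Theta ~ N(1, 1),
# X = Theta + W, W ~ N(0, 4): Var(Theta - a*X) should be minimized near
# a = Cov(Theta, X) / Var(X), which is 1/5 for this model.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(loc=1.0, scale=1.0, size=100_000)
x = theta + rng.normal(loc=0.0, scale=2.0, size=theta.size)

a_grid = np.linspace(-1.0, 1.0, 401)
var_z = [np.var(theta - a * x) for a in a_grid]

print(a_grid[np.argmin(var_z)])            # numeric minimizer
print(np.cov(theta, x)[0, 1] / np.var(x))  # Cov/Var formula; should nearly match
```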
We can also express $a$ in terms of [[Correlation#^8e7d31|Correlation]].
$
a = \frac{\mathrm{Cov}(\Theta, X)}{\mathrm{Var}(X)}
= \frac{\rho \sigma_X \sigma_\Theta}{\sigma_X^2}
= \rho\frac{\sigma_\Theta}{\sigma_X}
$
This lets us express $b$ as:
$
\begin{align}
b &= \mathbb{E}[\Theta]-a\mathbb{E}[X] \\[2pt]
b &= \mathbb{E}[\Theta]-\rho\frac{\sigma_\Theta}{\sigma_X} \mathbb{E}[X]
\end{align}
$
*Step 3: Final Estimator*
Plugging the optimal values of $a$ and $b$ into $\hat \Theta_{\text{LLMS}} = aX + b$ gives the final estimator.
$
\begin{align}
\hat \Theta_{\text{LLMS}} &= aX+b \\[2pt]
\hat \Theta_{\text{LLMS}} &=\mathbb{E}[\Theta]+\rho\frac{\sigma_\Theta}{\sigma_X} *(X-\mathbb{E}[X])
\end{align}
$
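On the same hypothetical toy model, the closed-form coefficients should essentially coincide with an ordinary least-squares line fitted to the samples, since both describe the best linear predictor of $\Theta$ from $X$:
```python
# Hedged sketch of the final estimator, assuming the toy model Theta ~ N(1, 1),
# X = Theta + W, W ~ N(0, 4). The closed-form slope and intercept are compared
# against a fitted least-squares line, which targets the same linear predictor.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(loc=1.0, scale=1.0, size=100_000)
x = theta + rng.normal(loc=0.0, scale=2.0, size=theta.size)

rho = np.corrcoef(theta, x)[0, 1]
a = rho * theta.std() / x.std()           # optimal slope  rho * sigma_Theta / sigma_X
b = theta.mean() - a * x.mean()           # optimal intercept
theta_hat = a * x + b                     # LLMS estimate for every observation

a_ls, b_ls = np.polyfit(x, theta, deg=1)  # least-squares line for comparison
print((a, b), (a_ls, b_ls))               # the two pairs should nearly agree
print(np.mean((theta - theta_hat) ** 2))  # empirical MSE, about 0.8 here
```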
**Interpretation:**
- *Zero correlation:* When $\rho=0$, the LLMS estimator reduces to $\mathbb E[\Theta]$. Since $X$ and $\Theta$ are uncorrelated, an observation $x$ does not help in estimating $\Theta$.
- *Positive correlation:* When $\rho>0$ and an observation $x$ is bigger than its mean $\mathbb E[X]$, the estimate $\hat \Theta$ is pushed above its mean $\mathbb E[\Theta]$.
- *Negative correlation:* When $\rho<0$ and an observation $x$ is bigger than its mean $\mathbb E[X]$, the estimate $\hat \Theta$ is pushed below its mean $\mathbb E[\Theta]$.
## Mean Squared Error
Now that we have a formula for the estimator $\hat \Theta$, we can derive a simpler expression for the MSE as well. For simplicity we assume that $\mathbb{E}[\Theta]=0$ and $\mathbb{E}[X]=0$; however, the final result remains valid without this assumption.
$
\begin{align}
\mathrm{MSE}
&=\mathbb{E}\left [(\hat \Theta_{\text{LLMS}}- \Theta)^2 \right] \tag{1}\\[2pt]
&=\mathbb{E}\left [\big(\mathbb E[\Theta]+\rho \frac{ \sigma_\Theta}{\sigma_X}*(X-\mathbb E [X]) - \Theta \big)^2\right] \tag{2}\\[2pt]
&=\mathbb{E}\left [ \left(\rho\frac{ \sigma_\Theta}{\sigma_X}*X - \Theta \right)^2\right] \tag{3}\\[2pt]
&=\mathbb{E} \left[\Theta^2 - 2\rho \frac{\sigma_\Theta }{\sigma_X}*X \Theta +\rho^2 \frac{\sigma_\Theta^2}{\sigma_X^2}*X^2\right] \tag{4}\\[2pt]
&=\mathbb{E} \left[\Theta^2\right] -\mathbb{E} \left[ 2\rho * \frac{\sigma_\Theta }{\sigma_X}*X \Theta\right] +\mathbb{E}\left[\rho^2\frac{\sigma_\Theta^2}{\sigma_X^2}*X^2\right] \tag{5}\\[2pt]
&=\sigma_\Theta^2-2\rho\frac{\sigma_\Theta}{\sigma_X}*\mathbb{E}[X \Theta]+ \rho^2\frac{ \sigma_\Theta^2}{\sigma_X^2}*\sigma_X^2 \tag{6}\\[2pt]
&=\sigma_\Theta^2-2\rho\frac{ \sigma_\Theta}{\sigma_X} *\rho \sigma_\Theta \sigma_X + \rho^2\sigma_\Theta^2 \tag{7}\\[6pt]
&=\sigma_\Theta^2-2\rho^2\sigma_\Theta^2 + \rho^2\sigma_\Theta^2 \\[10pt]
\mathrm{MSE}&=(1- \rho^2)* \mathrm{Var}(\Theta) \tag{8}
\end{align}
$
where:
- (2) Plug in the LLMS estimator.
- (3) Drop the terms $\mathbb E[\Theta]$ and $\mathbb E[X]$, which are zero by assumption.
- (4) Expand the quadratic term.
- (5) Apply [[Linearity of Expectations]] to split up the expectation.
- (6) Since $\mathbb E[\Theta]=0$ by assumption, the second moment $\mathbb E[\Theta^2]$ is equal to the second central moment (i.e. variance). The same is true for $\mathbb E[X^2]$.
- (7) Since both means are zero, $\mathbb E[X \Theta] = \mathrm{Cov}(X, \Theta) = \rho \sigma_X \sigma_\Theta$.
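A Monte Carlo check of this formula (same hypothetical toy model as before, deliberately not zero-mean, to illustrate that the final result does not need the centering assumption):
```python
# Hedged check of MSE = (1 - rho^2) * Var(Theta), assuming the toy model
# Theta ~ N(1, 1), X = Theta + W, W ~ N(0, 4), which is not zero-mean.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(loc=1.0, scale=1.0, size=1_000_000)
x = theta + rng.normal(loc=0.0, scale=2.0, size=theta.size)

rho = np.corrcoef(theta, x)[0, 1]
theta_hat = theta.mean() + rho * theta.std() / x.std() * (x - x.mean())

print(np.mean((theta_hat - theta) ** 2))  # empirical MSE of the LLMS estimator
print((1 - rho ** 2) * np.var(theta))     # formula; both should be close to 0.8
```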
**Interpretation:**
- *Zero correlation:* When $\rho=0$, observations $x$ do not improve the estimate, so the estimation error is $\mathrm{Var}(\Theta)$ itself.
- *Nonzero correlation:* When $\vert \rho \vert > 0$, observations $x$ improve the estimate. In the extreme case of $\vert \rho \vert = 1$, $\Theta$ is fully determined by a linear function of $X$ and the estimation error is zero.