Conditional variance can be expressed as a function $g$ that takes any value $y$ and returns the variance of $X$ given $Y=y$. Evaluating it at the random variable $Y$ gives $g(Y)=\mathrm{Var}(X \vert Y)$, which is itself a random variable.
$ g(y)=\mathrm{Var}(X \vert Y=y)$
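To make the "function of $y$" view concrete, here is a minimal sketch that computes $g(y)$ from a small discrete joint distribution. The joint pmf `JOINT` is a hypothetical example (not from the note), and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction as F

# Hypothetical joint pmf P(X=x, Y=y), stored as {(x, y): probability}.
JOINT = {
    (0, 1): F(1, 8), (2, 1): F(1, 8),
    (1, 2): F(1, 4), (3, 2): F(1, 4), (5, 2): F(1, 4),
}

def g(y):
    """Return Var(X | Y=y) from the definition E[(X - E[X|Y=y])^2 | Y=y]."""
    p_y = sum(p for (_, yy), p in JOINT.items() if yy == y)          # marginal P(Y=y)
    cond = {x: p / p_y for (x, yy), p in JOINT.items() if yy == y}   # P(X=x | Y=y)
    mean = sum(x * q for x, q in cond.items())                      # E[X | Y=y]
    return sum((x - mean) ** 2 * q for x, q in cond.items())

for y in (1, 2):
    print(y, g(y))  # prints "1 1" and then "2 8/3"
```

Each input $y$ yields its own variance, which is exactly what makes $g(Y)$ a random variable.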
To get from the conditional variance to the unconditional variance, we use the law of total variance:
$
\mathrm{Var}(X) =
\underbrace{\mathbb{E}[\mathrm{Var}(X \vert Y)]}_{\text{Within-group variability}}+
\underbrace{\mathrm{Var}(\mathbb{E}[X \vert Y])}_{\text{Between-group variability}}
$
The two terms are:
- *Within-group variability:* The average [[Variance]] of $X$ within each group $y$, weighted by the group probabilities $P(Y=y)$.
- *Between-group variability:* The variance of the group means $\mathbb E[X \vert Y=y]$ across all groups $y$.
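The decomposition can be checked on data. A minimal sketch, assuming two hypothetical groups drawn from normal distributions: when group weights are the group shares $n_y/n$ and variances are population variances (dividing by $n$), within plus between reproduces the overall variance of the pooled data up to floating-point error.

```python
import random
from statistics import fmean, pvariance

random.seed(0)
# Hypothetical data: two groups with different means and spreads.
groups = {
    "a": [random.gauss(10, 2) for _ in range(300)],
    "b": [random.gauss(20, 5) for _ in range(700)],
}
all_x = [x for xs in groups.values() for x in xs]
n = len(all_x)

# Group shares play the role of P(Y=y).
weights = {name: len(xs) / n for name, xs in groups.items()}
grand_mean = fmean(all_x)

within = sum(weights[name] * pvariance(xs) for name, xs in groups.items())
between = sum(weights[name] * (fmean(xs) - grand_mean) ** 2 for name, xs in groups.items())
total = pvariance(all_x)

print(f"within={within:.4f} between={between:.4f} sum={within + between:.4f} total={total:.4f}")
```

Using sample variances (dividing by $n-1$) instead would break the exact equality, which is why population variances are used here.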
## Derivation of Law of Total Variance
**Conditional Variance:** The general variance formula and its counterpart for $X$ conditioned on $Y$ are:
$
\begin{align}
\mathrm{Var}(X)&= \mathbb{E}[X^2] - (\mathbb{E}[X])^2 \tag{1}\\
\mathrm{Var}(X \vert Y) &= \mathbb{E}[X^2 \vert Y] - (\mathbb{E}[X \vert Y])^2 \tag{2}
\end{align}
$
where:
- (1) General variance formula.
- (2) Conditioning on $Y$ yields a valid probability distribution for $X$, so the same formula applies with every expectation replaced by a conditional expectation.
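Equation (2) can be checked against the definition of variance applied to the conditional distribution. This sketch reuses a hypothetical joint pmf `JOINT` and two illustrative helpers, `var_shortcut` (equation (2)) and `var_definition` (mean squared deviation); both names are made up for this example.

```python
from fractions import Fraction as F

# Hypothetical joint pmf P(X=x, Y=y).
JOINT = {
    (0, 1): F(1, 8), (2, 1): F(1, 8),
    (1, 2): F(1, 4), (3, 2): F(1, 4), (5, 2): F(1, 4),
}

def conditional_pmf(y):
    """P(X=x | Y=y) as a dict {x: probability}."""
    p_y = sum(p for (_, yy), p in JOINT.items() if yy == y)
    return {x: p / p_y for (x, yy), p in JOINT.items() if yy == y}

def var_shortcut(y):
    """Equation (2): E[X^2 | Y=y] - (E[X | Y=y])^2."""
    cond = conditional_pmf(y)
    m1 = sum(x * q for x, q in cond.items())
    m2 = sum(x * x * q for x, q in cond.items())
    return m2 - m1 ** 2

def var_definition(y):
    """Definition: E[(X - E[X | Y=y])^2 | Y=y]."""
    cond = conditional_pmf(y)
    mean = sum(x * q for x, q in cond.items())
    return sum((x - mean) ** 2 * q for x, q in cond.items())

for y in (1, 2):
    print(y, var_shortcut(y), var_definition(y))  # prints "1 1 1" and then "2 8/3 8/3"
```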
**Expectation of Conditional Variance:** Each value $y$ gives a possibly different variance of $X$. Taking the expectation of these variances over $Y$:
$
\begin{align}
\mathbb E[\mathrm{Var}(X \vert Y)] &= \mathbb E \big[\mathbb{E}[X^2 \vert Y] \big] - \mathbb E \big[(\mathbb{E}[X \vert Y])^2 \big] \tag{3}\\
\mathbb E [\mathrm{Var}(X \vert Y)] &= \mathbb E[X^2] - \mathbb E \big[(\mathbb{E}[X \vert Y])^2 \big] \tag{4}
\end{align}
$
where:
- (3) Taking the [[Expectation]] of both sides of (2). [[Linearity of Expectations]] allows us to split the right-hand side into separate expectations.
- (4) Applying the [[Law of Iterated Expectations]] to the first term: $\mathbb E\big[\mathbb E[X^2 \vert Y]\big]=\mathbb E[X^2]$.
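Equation (4) can be verified exactly on a small example. A sketch assuming the same kind of hypothetical joint pmf; `cond_moment(y, k)` is an illustrative helper returning $\mathbb E[X^k \vert Y=y]$.

```python
from fractions import Fraction as F

# Hypothetical joint pmf P(X=x, Y=y).
JOINT = {
    (0, 1): F(1, 8), (2, 1): F(1, 8),
    (1, 2): F(1, 4), (3, 2): F(1, 4), (5, 2): F(1, 4),
}

# Marginal distribution P(Y=y).
P_Y = {}
for (_, y), p in JOINT.items():
    P_Y[y] = P_Y.get(y, 0) + p

def cond_moment(y, k):
    """E[X^k | Y=y]."""
    return sum(x ** k * p for (x, yy), p in JOINT.items() if yy == y) / P_Y[y]

# Left side of (4): average each conditional variance over P(Y=y).
lhs = sum(P_Y[y] * (cond_moment(y, 2) - cond_moment(y, 1) ** 2) for y in P_Y)

# Right side of (4): E[X^2] - E[(E[X|Y])^2].
e_x2 = sum(x ** 2 * p for (x, _), p in JOINT.items())
e_condmean_sq = sum(P_Y[y] * cond_moment(y, 1) ** 2 for y in P_Y)
rhs = e_x2 - e_condmean_sq

print(lhs, rhs)  # prints "9/4 9/4"
```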
**Variance of Conditional Expectation:** Each value $y$ gives a possibly different expectation of $X$. Taking the variance of these expectations:
$
\begin{align}
\mathrm{Var}(X)&= \mathbb{E}[X^2] - (\mathbb{E}[X])^2 \tag{5}\\
\mathrm{Var}(\mathbb{E}[X \vert Y]) &= \mathbb{E} \big[(\mathbb{E}[X \vert Y])^2 \big]
- \Big(\mathbb{E} \big[\mathbb{E}[X \vert Y] \big] \Big)^2 \tag{6}\\
&= \mathbb{E} \big[(\mathbb{E}[X \vert Y])^2 \big]
-(\mathbb{E}[X])^2 \tag{7}
\end{align}
$
where:
- (6) Applying the general variance formula (5) to the random variable $\mathbb{E}[X \vert Y]$.
- (7) Applying the law of iterated expectations to the second term: $\mathbb{E}\big[\mathbb{E}[X \vert Y]\big]=\mathbb{E}[X]$.
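Equations (6) and (7) can be checked by treating $\mathbb E[X \vert Y]$ as a random variable over the values of $Y$. A sketch on a hypothetical joint pmf: `var_direct` uses the definition $\mathbb E[(Z-\mathbb E[Z])^2]$ with $Z=\mathbb E[X \vert Y]$, and `var_eq7` uses (7), where $\mathbb E[\mathbb E[X \vert Y]]$ has been replaced by $\mathbb E[X]$.

```python
from fractions import Fraction as F

# Hypothetical joint pmf P(X=x, Y=y).
JOINT = {
    (0, 1): F(1, 8), (2, 1): F(1, 8),
    (1, 2): F(1, 4), (3, 2): F(1, 4), (5, 2): F(1, 4),
}

# Marginal distribution P(Y=y).
P_Y = {}
for (_, y), p in JOINT.items():
    P_Y[y] = P_Y.get(y, 0) + p

# The random variable Z = E[X | Y], stored as {y: E[X | Y=y]}.
cond_mean = {y: sum(x * p for (x, yy), p in JOINT.items() if yy == y) / P_Y[y] for y in P_Y}

# Var(Z) from the definition E[(Z - E[Z])^2].
mean_of_cond_mean = sum(P_Y[y] * m for y, m in cond_mean.items())
var_direct = sum(P_Y[y] * (m - mean_of_cond_mean) ** 2 for y, m in cond_mean.items())

# Equation (7): E[Z^2] - (E[X])^2.
e_x = sum(x * p for (x, _), p in JOINT.items())
var_eq7 = sum(P_Y[y] * m ** 2 for y, m in cond_mean.items()) - e_x ** 2

print(mean_of_cond_mean == e_x, var_direct, var_eq7)  # prints "True 3/4 3/4"
```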
**Combining Components:** Adding (4) and (7), the $\mathbb E\big[(\mathbb{E}[X \vert Y])^2\big]$ terms cancel and what remains is the general variance formula (1), which proves the law of total variance:
$
\begin{align}
\mathrm{Var}(X)
&= \mathbb E [\mathrm{Var}(X \vert Y)]+\mathrm{Var}(\mathbb{E}[X \vert Y]) \\[2pt]
&= \mathbb{E}[X^2] - \mathbb E \big[(\mathbb{E}[X \vert Y])^2\big] +
\mathbb{E} \big[(\mathbb{E}[X\vert Y])^2 \big]-(\mathbb{E}[X])^2 \\[2pt]
&= \mathbb{E}[X^2] - (\mathbb{E}[X])^2
\end{align}
$
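Since the cancellation does not depend on the particular distribution, the identity should hold for any joint pmf. A sketch that checks this on random discrete distributions; `decompose` and `random_joint` are hypothetical helpers written for this example, and `Fraction` makes the comparison exact rather than approximate.

```python
import random
from fractions import Fraction

def decompose(joint):
    """Return (E[Var(X|Y)], Var(E[X|Y]), Var(X)) for a pmf {(x, y): p}."""
    p_y = {}
    for (_, y), p in joint.items():
        p_y[y] = p_y.get(y, 0) + p
    # Conditional first and second moments E[X | Y=y] and E[X^2 | Y=y].
    m1 = {y: sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y[y] for y in p_y}
    m2 = {y: sum(x * x * p for (x, yy), p in joint.items() if yy == y) / p_y[y] for y in p_y}
    within = sum(p_y[y] * (m2[y] - m1[y] ** 2) for y in p_y)
    e_x = sum(x * p for (x, _), p in joint.items())
    between = sum(p_y[y] * (m1[y] - e_x) ** 2 for y in p_y)
    total = sum(x * x * p for (x, _), p in joint.items()) - e_x ** 2
    return within, between, total

def random_joint(rng, n_x=4, n_y=3):
    """Random pmf on {0..n_x-1} x {0..n_y-1} with positive integer weights, normalized exactly."""
    weights = {(x, y): rng.randint(1, 9) for x in range(n_x) for y in range(n_y)}
    s = sum(weights.values())
    return {k: Fraction(w, s) for k, w in weights.items()}

rng = random.Random(42)
for _ in range(100):
    w, b, t = decompose(random_joint(rng))
    assert w + b == t  # exact equality thanks to Fraction arithmetic
print("law of total variance held for 100 random pmfs")
```

A numerical check like this is not a proof, but it is a quick way to catch a sign or indexing slip in the derivation above.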
## Example: Student Scores in Two Sections
Let $X$ be the score of a student picked uniformly at random from a class of 30 students. The class is split into two sections, $Y=1$ (10 students) and $Y=2$ (20 students), so $P(Y=1)=\frac{1}{3}$ and $P(Y=2)=\frac{2}{3}$. We know that:
**Conditional Expectations:**
- Mean student score in section 1 → $\mathbb{E}[X \vert Y=1]=90$
- Mean student score in section 2 → $\mathbb{E}[X \vert Y=2]=60$
**Conditional Variances:**
- Variance of student scores in section 1 → $\mathrm{Var}(X \vert Y=1) = 15$
- Variance of student scores in section 2 → $\mathrm{Var}(X \vert Y=2) = 30$
**Unconditional Expectation:**
- $\mathbb{E}[X \vert Y]$ is a random variable that takes the value $90$ with probability $\frac{1}{3}$ and $60$ with probability $\frac{2}{3}$
- $\mathbb{E}\big[\mathbb{E}[X \vert Y]\big]$ is the expectation of the r.v. above → $\frac{1}{3}\cdot 90 + \frac{2}{3}\cdot 60=70$
- $\mathbb{E}[X]=70$, since it equals $\mathbb{E}\big[\mathbb{E}[X \vert Y]\big]$ by the law of iterated expectations
**Calculating Components:**
- *Within-group variability:* Each section has a variance of scores. Take the expectation of these section variances → $\mathbb{E}[\mathrm{Var}(X \vert Y)] = \frac{1}{3}\cdot 15+ \frac{2}{3}\cdot 30=25$
- *Between-group variability:* Each section has an expectation (average) of scores. Take the variance of these averages → $\mathrm{Var}(\mathbb{E}[X \vert Y]) = \frac{1}{3}\cdot(90-70)^2+\frac{2}{3}\cdot(60-70)^2=200$
- *Total variance:* By the law of total variance → $\mathrm{Var}(X) = 25 + 200 = 225$
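The arithmetic of the example can be reproduced in a few lines. A sketch that takes only the section sizes, means, and variances given above (the `sections` dict layout is an illustrative choice) and uses exact fractions for the weights $P(Y=y)$.

```python
from fractions import Fraction

# Values from the example: section -> (size, mean score, score variance).
sections = {1: (10, 90, 15), 2: (20, 60, 30)}
n = sum(size for size, _, _ in sections.values())
p = {y: Fraction(size, n) for y, (size, _, _) in sections.items()}  # P(Y=y)

e_x = sum(p[y] * mean for y, (_, mean, _) in sections.items())                 # E[X]
within = sum(p[y] * var for y, (_, _, var) in sections.items())                # E[Var(X|Y)]
between = sum(p[y] * (mean - e_x) ** 2 for y, (_, mean, _) in sections.items())  # Var(E[X|Y])

print(e_x, within, between, within + between)  # prints "70 25 200 225"
```

Most of the spread in this class comes from the gap between the section means (200) rather than from the spread within each section (25).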