Assume we want to test the effectiveness of a drug by comparing some measure from the test group $X$ with the control group $Y$.
- Test group: $X_1, \dots, X_n \stackrel{iid}{\sim} \mathcal N(\mu_d, \sigma_d^2)$
- Control group: $Y_1, \dots, Y_m \stackrel{iid}{\sim} \mathcal N(\mu_c, \sigma_c^2)$
## Hypothesis
The goal is to determine if $\mu_d > \mu_c$, i.e. if the difference in means is significantly greater than $0$. This is in contrast to the one-sample [[T-Test]], where a single population mean $\mu$ is compared to a fixed constant.
$
\begin{align}
H_0&: \mu_d-\mu_c \le 0\\
H_1&: \mu_d-\mu_c > 0
\end{align}
$
We begin by considering the normalized difference in sample means. Assuming we know the population variances, the test statistic follows a standard [[Gaussian Distribution]]:
$
\frac{\bar X_n- \bar Y_m-(\mu_d- \mu_c)}{\sqrt{\frac{\sigma_d^2}{n}+\frac{\sigma_c^2}{m}}} \sim \mathcal N(0,1)
$
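As a minimal sketch, this statistic could be computed as follows; the arrays `x`, `y` and the known variances `sigma_d2`, `sigma_c2` are assumed inputs, and the test is evaluated at the $H_0$ boundary $\mu_d - \mu_c = 0$:
```python
import numpy as np
from scipy.stats import norm

def two_sample_z(x, y, sigma_d2, sigma_c2):
    """Z-statistic and one-sided p-value for H1: mu_d - mu_c > 0,
    assuming the population variances sigma_d2, sigma_c2 are known."""
    n, m = len(x), len(y)
    z = (np.mean(x) - np.mean(y)) / np.sqrt(sigma_d2 / n + sigma_c2 / m)
    p_value = norm.sf(z)  # P(Z > z) under the standard normal
    return z, p_value
```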
## Why is this Gaussian?
**Distribution Mean:**
The sample means $\bar X_n$ and $\bar Y_m$ are [[Properties of an Estimator#Key Properties of an Estimator|unbiased]] estimators, i.e. $\mathbb E[\bar X_n] = \mu_d$ and $\mathbb E[\bar Y_m] = \mu_c$. Subtracting the true difference $\mu_d - \mu_c$ in the numerator therefore centres the statistic at $0$; at the boundary of $H_0$, where $\mu_d = \mu_c$, the difference in sample means itself already has expectation $0$.
$ \mathbb E[\bar X_n- \bar Y_m] = \mu_d-\mu_c$
**Distribution Variance:**
The [[Variance of Sum of Random Variables#Special Case of Independence|variance of the sum of independent r.v's.]] is just the sum of the variances. Thus the variance of the difference is:
$ \mathrm{Var}(\bar X_n-\bar Y_m)= \mathrm{Var}(\bar X_n)+\mathrm{Var}(\bar Y_m)$
The variances of the sample means are:
$ \mathrm{Var}(\bar X_n)= \frac{\sigma^2_d}{n}, \quad \mathrm{Var}(\bar Y_m)= \frac{\sigma^2_c}{m}$
Thus, the total variance is $\frac{\sigma^2_d}{n} + \frac{\sigma^2_c}{m}$. Dividing the difference by the square root of this variance scales it to unit variance.
**Distribution Shape:**
Both $\bar X_n$ and $\bar Y_m$ are Gaussian since the underlying observations $X_i$ and $Y_i$ are Gaussian by assumption. We know that the [[Sum of Independent Random Variables#Gaussian Random Variables|sum of independent Gaussian r.v's.]] is Gaussian as well, so no reference to the [[Central Limit Theorem|CLT]] is required.
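A quick simulation can make this concrete: repeatedly drawing both groups and normalizing the difference in sample means should yield values that look standard normal. The sample sizes and parameters below are arbitrary choices for illustration:
```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 12, 8                          # arbitrary sample sizes
mu, sigma_d, sigma_c = 1.0, 2.0, 3.0  # equal means, as at the H0 boundary

stats = []
for _ in range(50_000):
    x = rng.normal(mu, sigma_d, n)
    y = rng.normal(mu, sigma_c, m)
    z = (x.mean() - y.mean()) / np.sqrt(sigma_d**2 / n + sigma_c**2 / m)
    stats.append(z)

print(np.mean(stats), np.std(stats))  # should be close to 0 and 1
```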
## Unknown Population Variance
When the sample sizes are small and the population variances are unknown, we substitute the population variances with the unbiased sample variances. By [[T-Test#Cochran’s Theorem|Cochran’s Theorem]] this additional uncertainty means the statistic (approximately) follows a [[Student T-Distribution]] instead of the Gaussian.
$
\frac{\bar X_n- \bar Y_m-(\mu_d- \mu_c)}{\sqrt{\frac{S_d^2}{n}+\frac{S_c^2}{m}}} \sim t_N
$
Here, $S_d^2$ and $S_c^2$ are the sample variances for the test and control groups, respectively.
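In practice this is Welch's two-sample t-test; SciPy implements it via `ttest_ind` with `equal_var=False`. The arrays `x` and `y` below are illustrative stand-ins for the observed groups:
```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
x = rng.normal(1.2, 2.0, size=15)  # test group (illustrative data)
y = rng.normal(1.0, 3.0, size=20)  # control group (illustrative data)

# Welch's t-test: unequal variances, one-sided alternative mu_d - mu_c > 0
result = ttest_ind(x, y, equal_var=False, alternative="greater")
print(result.statistic, result.pvalue)
```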
## Welch-Satterthwaite Approximation
Since we now have two sample sizes $n$ and $m$, it is no longer straightforward which degrees of freedom to use for the Student’s t-distribution. The Welch-Satterthwaite ("WS") formula provides an approximation.
$
N=\frac{\big(\frac{S_d^2}{n}+\frac{S_c^2}{m}\big)^2} {\frac{S_d^4}{n^2(n-1)}+\frac{S_c^4}{m^2(m-1)}} \ge \min(n-1,\, m-1)
$
>[!note]
>We have to round down the result of the WS formula, which is the conservative choice (fewer effective observations mean broader confidence intervals).
>[!note]
>If we do not want to compute the WS formula, the most conservative choice is to use $\min(n-1, m-1)$ instead.
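A sketch of the WS degrees of freedom, rounded down as suggested above; the sample variances are computed with `ddof=1` so they are the unbiased estimates $S_d^2$ and $S_c^2$:
```python
import numpy as np

def welch_satterthwaite_df(x, y):
    """Approximate degrees of freedom for the two-sample t-statistic
    with unknown, unequal variances (rounded down to be conservative)."""
    n, m = len(x), len(y)
    s_d2, s_c2 = np.var(x, ddof=1), np.var(y, ddof=1)
    num = (s_d2 / n + s_c2 / m) ** 2
    den = s_d2**2 / (n**2 * (n - 1)) + s_c2**2 / (m**2 * (m - 1))
    return int(np.floor(num / den))
```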