The [[Wald Test]] is not applicable for small sample sizes, as it relies on asymptotic statements ([[Combining Limits#Slutsky’s Theorem|Slutsky’s Theorem]]). For such small sample sizes we can use the T-test instead, which requires the additional assumption that our raw data already comes from a [[Gaussian Distribution|Gaussian]] $\mathcal N(\mu, \sigma^2)$. The T-test is designed to test $\mu$ of a Gaussian Distribution under a [[Hypothesis Tests|hypothesis]].

$ X_1,\dots, X_n \stackrel{iid}{\sim} \mathcal N(\mu, \sigma^2) $

When all $X_i$ are [[Independence and Identical Distribution|i.i.d.]] Gaussian, then $\bar X_n$ is a [[Sum of Independent Random Variables#Gaussian Random Variable|Sum of Independent Gaussians]] (scaled by $1 \over n$), which is also Gaussian. Since all other variables $(\mu, \sigma)$ are fixed (unknown) values, we can claim that the whole expression below is exactly $\mathcal N(0,1)$. We do not need to rely on asymptotic statements for that!

$ \sqrt n *\frac{\bar X_n- \mu}{\sigma}\sim \mathcal N(0,1) $

This time, we cannot simply replace $\sigma^2$ with $\hat \sigma^2$ and still assume $\mathcal N(0,1)$ like before, where we relied on Slutsky. We can of course still use the sample variance, which is a computable statistic, but plugging it in will not lead to $\mathcal N(0,1)$.

$ S_n^2 = \frac{1}{n-1} \sum_{i=1}^n(X_i- \bar X_n)^2 $

By plugging in the sample variance and rearranging, we see that the numerator is standard Gaussian. The distribution of the denominator follows from Cochran’s theorem.

$ \frac{\sqrt n*(\bar X_n - \mu)}{S_n} = \frac{\sqrt n*(\bar X_n - \mu)}{\sqrt{S_n^2}} = \frac{\frac{\sqrt n*(\bar X_n -\mu)}{\sigma}}{\sqrt{\frac{S_n^2}{{\sigma^2}}}} \implies \frac{\sim \mathcal N(0,1)}{\sim \text{?}} $

## Cochran’s Theorem

If $X_1,\dots, X_n \stackrel{iid}{\sim} \mathcal N(\mu, \sigma^2)$, then the sample variance scaled by $\frac{n-1}{\sigma^2}$ follows a [[Chi-Square Distribution]] with $(n-1)$ degrees of freedom, independently of $\bar X_n$.

$ \begin{align} \frac{(n-1)*S_n^2}{\sigma^2}\sim \chi_{n-1}^2 \\[10pt] \frac{(n-1)*\frac{1}{n-1} \sum_{i=1}^n(X_i- \bar X_n)^2}{\sigma^2}\sim \chi_{n-1}^2 \\[10pt] \sum_{i=1}^n\Big(\frac{X_i- \bar X_n}{\sigma}\Big)^2 \sim \chi_{n-1}^2 \end{align} $

>[!note]
>If we centered the $X_i$ by $\mu$ instead of $\bar X_n$, the summands would be i.i.d. $\mathcal N(0,1)$ and the sum would, by definition, follow a chi-squared distribution with $n$ degrees of freedom. Centering by the estimate $\bar X_n$ costs one degree of freedom, which is why we end up with $\chi_{n-1}^2$.

$ \frac{ \frac{\sqrt n*(\bar X_n - \mu)}{\sigma} } {\sqrt{\frac{S_n^2}{\sigma^2}}} \implies \frac{\sim\mathcal N(0,1)}{\sim \sqrt{\frac{\chi_{n-1}^2}{n-1}}} \implies \text{Student's T} $

## Conclusion

- The T-test replaces the unknown $\sigma$ with the sample standard deviation $S_n$, but because the sample size is small we cannot invoke Slutsky to argue that the resulting statistic is still $\mathcal N(0,1)$.
- By relying on a sample statistic, we introduce additional uncertainty (different sample draws give different sample variances), which is reflected in the fatter tails of the [[Student T-Distribution]].
- When $n$ increases, the standard error of the sample variance decreases, and the Student’s T-distribution converges to $\mathcal N(0,1)$.

The T-test statistic looks as follows, where $\mu$ is the mean claimed by the null hypothesis.

$ T_n=\sqrt n* \frac{\bar X_n-\mu}{S_n} \sim t_{n-1} \quad \forall \,n $
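
Since the whole point is that these distributional statements are exact and not asymptotic, a short simulation can make them tangible. The sketch below is not from the note itself; the sample size, parameters, and replication count are arbitrary illustrative choices. It draws many small Gaussian samples, checks Cochran’s theorem for $(n-1)S_n^2/\sigma^2$, and then checks that the quantiles of $T_n$ match $t_{n-1}$ rather than $\mathcal N(0,1)$.

```python
# Minimal simulation sketch: verify the exact distributions used above.
# n, mu, sigma, and reps are arbitrary illustrative choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, mu, sigma, reps = 5, 2.0, 3.0, 100_000

# reps independent samples of size n from N(mu, sigma^2)
X = rng.normal(loc=mu, scale=sigma, size=(reps, n))

x_bar = X.mean(axis=1)
s_n = X.std(axis=1, ddof=1)          # sample std with the 1/(n-1) correction
T = np.sqrt(n) * (x_bar - mu) / s_n  # T statistic, computed with the true mu

# Cochran's theorem: (n-1) * S_n^2 / sigma^2 should follow chi^2 with n-1 df.
chi2_stat = (n - 1) * s_n**2 / sigma**2
for q in (0.90, 0.95, 0.99):
    print(f"q={q:.2f}  empirical={np.quantile(chi2_stat, q):6.3f}  "
          f"chi2_{n-1}={stats.chi2.ppf(q, df=n - 1):6.3f}")

# T statistic: empirical quantiles should match t_{n-1}, while the
# N(0,1) quantiles are visibly too narrow for such a small n.
for q in (0.90, 0.95, 0.99):
    print(f"q={q:.2f}  empirical={np.quantile(T, q):6.3f}  "
          f"t_{n-1}={stats.t.ppf(q, df=n - 1):6.3f}  "
          f"N(0,1)={stats.norm.ppf(q):6.3f}")
```

With $n=5$ the gap between the $t_4$ and $\mathcal N(0,1)$ quantiles is clearly visible, which is exactly the fatter-tail effect mentioned in the conclusion; increasing $n$ shrinks that gap.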