We have demonstrated in [[Linear Regression with LSE]] that $\hat \beta$ follows a [[Gaussian Distribution]]. To test the significance of a single coefficient from the $\hat \beta$ vector, we leverage this distribution and standardize it.
$ \hat \beta \sim \mathcal N_p\Big(\beta^\star,\sigma^2(\mathbb X^T \mathbb X)^{-1}\Big) $
**Isolating Beta Element:**
To extract a single element $\hat \beta_j$, we take the product $u^T\hat \beta$, where $u$ is a zero vector with a $1$ at the $j$-th position (shown here for $j=1$).
$
\hat \beta_1=u^T\hat \beta, \quad \text{where }
u^T= \begin{bmatrix} 1&0& \cdots &0 \end{bmatrix}
$
The distribution of $\hat \beta_j$ can be expressed as:
$
\begin{align}
\hat \beta_j &\sim \mathcal N \Big( u^T \beta^\star, u^T\big[\sigma^2(\mathbb X^T\mathbb X)^{-1}\big]\, u \Big) \tag{1}\\[10pt]
\hat \beta_j &\sim \mathcal N\Big(\beta^\star_j,\sigma^2 \underbrace{\Big[(\mathbb X^T \mathbb X)^{-1}\Big]_{jj}}_{\gamma_j}\Big) \tag{2}
\end{align}
$
In (2), the [[Variance]] term of $\mathcal N$ simplifies to $\sigma^2 \gamma_j$, where $\gamma_j$ is the $(j,j)$-th entry of the matrix $(\mathbb X^T \mathbb X)^{-1}$, which has dimension $p \times p$.
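A minimal numerical sketch of extracting $\gamma_j$, assuming a simulated design matrix `X` (the data and the index `j` are illustrative, not from the source):

```python
import numpy as np

# Hypothetical design matrix with n observations and p predictors
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))

# M = (X^T X)^{-1} is p x p; gamma_j is its (j, j) entry
M = np.linalg.inv(X.T @ X)
j = 1  # 0-based index of the coefficient of interest
gamma_j = M[j, j]

# Under the model, beta_hat[j] ~ N(beta_star[j], sigma^2 * gamma_j)
print(gamma_j)
```

Note that $\gamma_j$ depends only on the design matrix, not on the response or on $\sigma^2$.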
**Scaling Beta Element:**
To test the null hypothesis $H_0: \beta^\star_j=0$, we need to standardize $\hat \beta_j$.
$ \frac{\hat \beta_j-\beta^\star_j}{\sqrt{\sigma^2 \gamma_j}} \sim \mathcal N(0,1) $
However, we do not observe the population parameter $\sigma^2$. Replacing $\sigma^2$ with its unbiased estimator $\hat \sigma^2$ and applying [[T-Test#Cochran’s Theorem|Cochran’s Theorem]], we obtain:
$
\frac{\hat \beta_j-\beta^\star_j}{\sqrt{\hat \sigma^2\, \gamma_j}} =
\frac{\frac{\hat \beta_j-\beta^\star_j}{\sqrt{\sigma^2\, \gamma_j}}}{\sqrt{\frac{\hat \sigma^2}{\sigma^2}}} \sim \frac{\mathcal N(0,1)}{\sqrt{\frac{\chi^2_{n-p}}{n-p}}} \sim t_{n-p}
$
By definition, the ratio of a standard Gaussian to the square root of a [[Chi-Square Distribution]] divided by its degrees of freedom follows a $t_{n-p}$ [[Student T-Distribution]]. This allows us to perform a [[T-Test]] on the significance of each parameter $\hat \beta_j$.
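The whole procedure can be sketched end to end on simulated data. Everything here (the data, the true coefficients, the tested index `j`) is an illustrative assumption; the critical value would come from $t_{n-p}$ quantiles, e.g. via `scipy.stats.t`:

```python
import numpy as np

# Hypothetical data: the first true coefficient is nonzero
rng = np.random.default_rng(1)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_star = np.array([2.0, 0.0, -1.0])
y = X @ beta_star + rng.normal(size=n)

# Least-squares estimate and unbiased variance estimator
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)

# Test H0: beta_star_j = 0 for j = 0 (0-based)
M = np.linalg.inv(X.T @ X)
j = 0
t_stat = beta_hat[j] / np.sqrt(sigma2_hat * M[j, j])

# Compare |t_stat| against a t_{n-p} quantile for significance
print(t_stat)
```

Since the true $\beta^\star_1 = 2$ here, the statistic lands far in the tail of $t_{n-p}$ and the null is rejected.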
>[!note]
>If we knew the population parameter $\sigma^2$, the test statistic would follow a $\mathcal N(0,1)$ distribution, and we could simply look up the quantiles in a standard Gaussian table (”Z-test”) for significance testing.
## Two Sample-Test
When we compare the difference between two coefficients, the matrix multiplication in the variance term gets a bit more involved. Assume we test the null hypothesis against the one-sided alternative:
$ H_0: \beta^\star_2 = \beta^\star_1 \quad \text{vs.} \quad H_1: \beta^\star_2 > \beta^\star_1 $
We adjust the vector $u$ to single out only the parameters of interest, the first and second elements, with their respective signs.
$ u^T=\begin{bmatrix} -1 &1&0& \cdots&0 \end{bmatrix} $
The difference $\hat \beta_2-\hat \beta_1$ follows:
$
\begin{aligned}
\hat \beta_2- \hat \beta_1 &\sim \mathcal N \Big( u^T \beta^\star, u^T\big[\sigma^2\overbrace{(\mathbb X^T\mathbb X)^{-1}}^{M}\big]\, u \Big) \\[8pt]
\hat \beta_2- \hat \beta_1&\sim \mathcal N \Big( \beta_2^\star- \beta_1^\star, \sigma^2 (M_{11}+M_{22}-2M_{12}) \Big)
\end{aligned}
$
The subsequent steps follow the regular procedure of a [[Two Sample T-Test]].
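The contrast variance $u^T M u = M_{11}+M_{22}-2M_{12}$ can be verified numerically. A minimal sketch on simulated data (all names and values are illustrative assumptions):

```python
import numpy as np

# Hypothetical data with beta_2 > beta_1 (0-based indices 1 and 0)
rng = np.random.default_rng(2)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_star = np.array([1.0, 1.5, 0.0])
y = X @ beta_star + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)

# u singles out beta_2 - beta_1
u = np.array([-1.0, 1.0, 0.0])
M = np.linalg.inv(X.T @ X)
var_contrast = sigma2_hat * (u @ M @ u)  # sigma2_hat * (M11 + M22 - 2*M12)

# Test statistic for H0: beta_2 = beta_1, compared against t_{n-p} quantiles
t_stat = (u @ beta_hat) / np.sqrt(var_contrast)
print(t_stat)
```

The quadratic form `u @ M @ u` expands exactly to $M_{11}+M_{22}-2M_{12}$, matching the derivation above.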