We already stated in the [[Linear Regression Assumptions#Dependent Variable|Linear Regression Assumptions]] that if $\epsilon$ is [[Gaussian Distribution|Gaussian]], then $\mathbf Y$ is Gaussian as well. Given this setup, we can prove that [[Linear Regression with LSE]] is equivalent to [[Maximum Likelihood Estimation|MLE]].
$ \epsilon_i \sim \mathcal N(0, \sigma^2)\implies Y_i \sim \mathcal N(X_i^T\beta^\star, \sigma^2) $
We plug the prediction $\hat y_i = X_i^T\beta^\star$ in as the mean of the [[Probability Density Function|PDF]] of such a Gaussian and derive the log-likelihood function for MLE estimation.
$
\begin{align}
f(y_i) &= \frac{1}{\sqrt{2\sigma ^2 \pi}}* \exp \Bigg\{-\frac{1}{2\sigma^2}*(y_i-\overbrace{X_i^T\beta^\star}^{\hat y_i})^2\Bigg \} \tag{1}\\[12pt]
L(Y_1, \dots, Y_n; \beta) &= \frac{1}{(\sqrt{2 \sigma ^2\pi})^{n}}* \exp \left \{-\frac{1}{2\sigma^2}*{\sum_{i=1}^n}(Y_i-X_i^T\beta^\star)^2\right \} \tag{2}\\[12pt]
\ell(Y_1, \dots, Y_n; \beta^\star) &= -n \ln(\sqrt{2 \sigma ^2\pi}) -\frac{1}{2\sigma^2}* \sum_{i=1}^n(Y_i-X_i^T\beta^\star)^2 \tag{3}\\[12pt]
\ell(Y_1, \dots, Y_n; \beta^\star) &= -\frac{n}{2} \ln(\sigma^2)-\frac{n}{2} \ln(2 \pi) -\frac{1}{2\sigma^2}* \big \| Y-\mathbb X\beta^\star \big \|_2^2 \tag{4}
\end{align}
$
where:
- (2) The [[Likelihood Functions|Likelihood Function]] is just the product of pdfs, as observations are assumed to be [[Independence and Identical Distribution|i.i.d.]]
- (4) The sum of squared differences between $Y_i$ and the predictions $X_i^T\beta^\star$ can also be expressed as a squared [[Vector Length#Norm Notation|Vector Norm]].
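
To see that the algebra from (2) to (4) lines up, here is a minimal numpy sketch on simulated data (the design matrix, coefficients, and noise level are made up for illustration): it evaluates the closed form (4) and the sum of per-observation Gaussian log-densities, and checks that the two agree.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))                      # hypothetical design matrix
beta = np.array([1.0, -2.0, 0.5])                # hypothetical true coefficients
sigma2 = 0.8                                     # hypothetical noise variance
Y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)

# Closed form (4): -n/2 ln(sigma^2) - n/2 ln(2 pi) - ||Y - X beta||^2 / (2 sigma^2)
resid = Y - X @ beta
ll_closed = -n/2 * np.log(sigma2) - n/2 * np.log(2*np.pi) - resid @ resid / (2*sigma2)

# Log of the product of per-observation Gaussian densities, as in (2)
ll_sum = np.sum(-0.5 * np.log(2*np.pi*sigma2) - (Y - X @ beta)**2 / (2*sigma2))

assert np.isclose(ll_closed, ll_sum)
```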
Based on this log-likelihood, we take the partial derivative with respect to each of the parameters $\beta^\star$ and $\sigma^2$ and set it to zero to obtain the MLE estimators.
$
\begin{align}
\frac{\partial \ell}{\partial \beta^\star}:
\frac{2}{2\sigma^2}* \mathbb X^T( Y-\mathbb X\beta^\star ) &\stackrel{!}{=}0 \\[2pt]
\mathbb X^T(Y- \mathbb X\hat \beta)&=0 \\[2pt]
\mathbb X^TY &= \mathbb X^T\mathbb X \hat \beta \\[2pt]
\hat \beta &= (\mathbb X^T\mathbb X)^{-1}\mathbb X^TY
\end{align}
$
**Equivalence:** The MLE estimator $\hat \beta$ is identical to the estimator under Least Squares estimation.
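
A quick numerical check of this equivalence, again on simulated data with hypothetical parameter values: the normal-equations solution $(\mathbb X^T\mathbb X)^{-1}\mathbb X^TY$ matches the output of numpy's dedicated least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.normal(size=(n, p))                      # hypothetical design matrix
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.9, size=n)

# MLE via the normal equations, solved without forming an explicit inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# numpy's least-squares solver minimizes ||Y - X b||_2^2 directly
beta_lsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

assert np.allclose(beta_hat, beta_lsq)
```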
$
\begin{align}
\frac{\partial \ell}{\partial \sigma^2}:-\frac{n}{2\sigma^2}+\frac{1}{2\sigma^4}\big \| Y-\mathbb X\beta^\star \big \|_2^2 &\stackrel{!}{=}0 \\[6pt]
-n+\frac{1}{\hat\sigma^2} \big \| Y-\mathbb X\hat\beta \big \|_2^2&=0\\[6pt]
n\hat \sigma^2&= \big \| Y-\mathbb X\hat\beta \big \|_2^2 \\[6pt]
\hat \sigma^2&=\frac{1}{n}\big \| Y-\mathbb X\hat\beta \big \|_2^2
\end{align}
$
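
As a sanity check that $\hat \sigma^2 = \frac{1}{n}\|Y-\mathbb X\hat\beta\|_2^2$ really maximizes the log-likelihood, the following sketch (simulated data, hypothetical values) evaluates (4) at $\hat\beta$ over a grid of $\sigma^2$ values and confirms the grid maximum sits at $\mathrm{RSS}/n$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 3
X = rng.normal(size=(n, p))                      # hypothetical design matrix
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.9, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
rss = np.sum((Y - X @ beta_hat) ** 2)
sigma2_hat = rss / n                             # MLE: RSS / n

# Profile log-likelihood (4) at beta_hat over a grid of sigma^2 values;
# the maximum should land (up to grid resolution) at sigma2_hat.
grid = np.linspace(0.5 * sigma2_hat, 2.0 * sigma2_hat, 1001)
ll = -n/2 * np.log(grid) - n/2 * np.log(2*np.pi) - rss / (2*grid)
assert np.isclose(grid[np.argmax(ll)], sigma2_hat, rtol=1e-2)
```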
> [!note]
> This MLE estimator $\hat \sigma^2$ is biased: the unbiased estimator divides by the degrees of freedom $n-p$ instead of $n$, where $p$ is the number of parameters in $\beta$.
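
The bias is easy to see in simulation. The sketch below (hypothetical sample size, dimension, and noise level) repeatedly draws data from a fixed design and averages the MLE $\hat \sigma^2$; the average comes out near $\sigma^2(n-p)/n$ rather than $\sigma^2$, and rescaling by $n/(n-p)$ removes the bias.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma2, reps = 30, 3, 2.0, 10_000          # all values hypothetical
X = rng.normal(size=(n, p))                      # fixed design across replications
beta = np.array([1.0, -2.0, 0.5])

mle_estimates = np.empty(reps)
for r in range(reps):
    Y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    mle_estimates[r] = np.sum((Y - X @ beta_hat) ** 2) / n   # MLE: RSS / n

print(mle_estimates.mean())                  # ~ sigma^2 * (n-p)/n = 1.8, not 2.0
print(mle_estimates.mean() * n / (n - p))    # bias-corrected: ~ 2.0
```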