We already stated in the [[Linear Regression Assumptions#Dependent Variable|Linear Regression Assumptions]] that if $\epsilon$ is [[Gaussian Distribution|Gaussian]], then $\mathbf Y$ is Gaussian as well. Given this setup, we can prove that [[Linear Regression with LSE]] is equivalent to [[Maximum Likelihood Estimation|MLE]].

$ \epsilon_i \sim \mathcal N(0, \sigma^2)\implies Y_i \sim \mathcal N(X_i^T\beta^\star, \sigma^2) $

We evaluate the [[Probability Density Function|PDF]] of such a Gaussian at each observation $y_i$, with the prediction $\hat y_i = X_i^T\beta^\star$ as its mean, and derive the log-likelihood function for MLE estimation.

$ \begin{align} f(y_i) &= \frac{1}{\sqrt{2\sigma ^2 \pi}}\cdot \exp \Bigg\{-\frac{1}{2\sigma^2}\cdot(y_i-\overbrace{X_i^T\beta^\star}^{\hat y_i})^2\Bigg \} \tag{1}\\[12pt] L(Y_1, \dots, Y_n; \beta^\star) &= \frac{1}{(\sqrt{2 \sigma ^2\pi})^{n}}\cdot \exp \left \{-\frac{1}{2\sigma^2}\cdot\sum_{i=1}^n(Y_i-X_i^T\beta^\star)^2\right \} \tag{2}\\[12pt] \ell(Y_1, \dots, Y_n; \beta^\star) &= -n \ln(\sqrt{2 \sigma ^2\pi}) -\frac{1}{2\sigma^2}\cdot \sum_{i=1}^n(Y_i-X_i^T\beta^\star)^2 \tag{3}\\[12pt] \ell(Y_1, \dots, Y_n; \beta^\star) &= -\frac{n}{2} \ln(\sigma^2)-\frac{n}{2} \ln(2 \pi) -\frac{1}{2\sigma^2}\cdot \big \| Y-\mathbb X\beta^\star \big \|_2^2 \tag{4} \end{align} $

where:

- (2) The [[Likelihood Functions|Likelihood Function]] is just the product of the individual PDFs, as the observations are assumed to be [[Independence and Identical Distribution|i.i.d.]]
- (4) The sum of squared differences between $Y_i$ and the predictions $X_i^T\beta^\star$ can also be expressed as a squared [[Vector Length#Norm Notation|Vector Norm]].

Based on this log-likelihood, we can take the derivative with respect to each parameter $\beta^\star, \sigma^2$ and set it to zero to obtain the MLE estimators.

$ \begin{align} \frac{\partial \ell}{\partial \beta^\star} :\frac{2}{2\sigma^2}\cdot\mathbb X^T( Y-\mathbb X\beta^\star ) &\stackrel{!}{=}0 \\[2pt] \mathbb X^T(Y- \mathbb X\hat \beta)&=0 \\[2pt] \mathbb X^TY &= \mathbb X^T\mathbb X \hat \beta \\[2pt] \hat \beta &= (\mathbb X^T\mathbb X)^{-1}\mathbb X^TY \end{align} $

**Equivalence:** We see that the MLE estimator $\hat \beta$ is identical to the estimator under Least Squares estimation.

For $\sigma^2$, we plug $\hat \beta$ back in and solve analogously (multiplying by $2\hat\sigma^2$ in the second step).

$ \begin{align} \frac{\partial \ell}{\partial \sigma^2}:-\frac{n}{2\sigma^2}+\frac{1}{2\sigma^4}\big \| Y-\mathbb X\hat\beta \big \|_2^2 &\stackrel{!}{=}0 \\[6pt] -n+\frac{1}{\hat\sigma^2} \big \| Y-\mathbb X\hat\beta \big \|_2^2&=0\\[6pt] n\hat \sigma^2&= \big \| Y-\mathbb X\hat\beta \big \|_2^2 \\[6pt] \hat \sigma^2&=\frac{1}{n}\big \| Y-\mathbb X\hat\beta \big \|_2^2 \end{align} $

> [!note]
> This MLE estimator $\hat \sigma^2$ is biased: the unbiased estimator divides the squared norm by the degrees of freedom $n-p$ instead of $n$.
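
To sanity-check the equivalence numerically, here is a minimal sketch (not part of the derivation; the simulation setup and all names are assumptions for illustration). It simulates data from the Gaussian model, compares the closed-form estimator $(\mathbb X^T\mathbb X)^{-1}\mathbb X^TY$ against the $\beta$ found by numerically maximizing the log-likelihood from (4), and checks that the $\sigma^2$ MLE divides the residual sum of squares by $n$ while the unbiased estimator divides by $n-p$.

```python
# Hypothetical numerical check of LSE == MLE under the Gaussian model.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # design matrix with intercept
beta_true = np.array([1.0, -2.0, 0.5])                          # assumed true coefficients
sigma2_true = 4.0
Y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2_true), size=n)  # Y_i ~ N(X_i^T beta*, sigma^2)

# Closed-form least-squares estimator: beta_hat = (X^T X)^{-1} X^T Y
beta_lse = np.linalg.solve(X.T @ X, X.T @ Y)

# Negative log-likelihood from equation (4); minimizing it maximizes ell.
def neg_log_lik(theta):
    beta, log_sigma2 = theta[:p], theta[p]
    sigma2 = np.exp(log_sigma2)          # log-parameterization keeps sigma^2 > 0
    rss = np.sum((Y - X @ beta) ** 2)
    return 0.5 * n * np.log(sigma2) + 0.5 * n * np.log(2 * np.pi) + rss / (2 * sigma2)

res = minimize(neg_log_lik, np.zeros(p + 1), method="BFGS")
beta_mle, sigma2_mle = res.x[:p], np.exp(res.x[p])

rss = np.sum((Y - X @ beta_lse) ** 2)
print(np.allclose(beta_mle, beta_lse, atol=1e-4))  # True: MLE coincides with LSE
print(sigma2_mle, rss / n)                         # both ~ RSS / n (the biased MLE)
print(rss / (n - p))                               # unbiased estimator divides by n - p
```

Optimizing over $\log\sigma^2$ rather than $\sigma^2$ keeps the variance positive during the search, so the unconstrained BFGS run stays well-defined.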