The generalized linear model relaxes two main assumptions of the linear regression model.
## Linear Regression
- *Distribution of target:* We assumed that the conditional [[Random Variable|r.v.]] $(Y \vert X)$ follows a [[Gaussian Distribution]]. Its mean is the [[Univariate Linear Regression#Regression Function|Regression Function]] $\mu(x)$. Its [[Variance]] is governed by $\sigma^2$, which is the variance of residuals.
$ (Y \vert X=x) \sim \mathcal N\left (\mu(x), \sigma^2\right) $
- *Regression function:* It is defined as the conditional expectation $\mathbb E[Y \vert X=x]$. We restricted its form to linear combinations for both the [[Univariate Linear Regression]] and [[Multivariate Linear Regression]].
$
\begin{align}
\mu(x) &= \mathbb E[Y \vert X=x]\\[4pt]
&= a+bx &&\text{(Univariate)}\\[6pt]
&= \mathbb X^T\beta && \text{(Multivariate)}
\end{align}
$
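As a quick sanity check of these assumptions, a minimal numpy sketch (the true parameters $a$, $b$, $\sigma$ and the sample size are made up for illustration) that simulates data from the univariate model and recovers $\hat\beta$ and $\hat\sigma^2$ by least squares:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true parameters of the univariate model mu(x) = a + b*x,
# with residuals ~ N(0, sigma^2).
a, b, sigma = 1.0, 2.0, 0.5
x = rng.uniform(-1, 1, size=200)
y = a + b * x + rng.normal(0, sigma, size=200)

# Least-squares estimate of beta = (a, b) via the design matrix X = [1, x].
X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Unbiased estimate of the residual variance sigma^2.
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (len(y) - X.shape[1])
print(beta_hat, sigma2_hat)   # roughly (1.0, 2.0) and 0.25
```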
## Generalized Linear Model
- *Distribution of target:* The conditional r.v. $(Y \vert X)$ can have any distribution $\mathbf P_\theta$, as long as it is part of the [[Exponential Family]].
$ (Y \vert X=x) \sim\mathbf P_\theta $
- *Regression function:* When $Y$ takes values only in a subset of $\mathbb R$ (e.g. [[Bernoulli Distribution|Bernoulli]]), then $\mathbb E[Y \vert X] \equiv \mu(x)$ also lies in this subset. Since the [[Sample Space]] of the linear predictor $\mathbb X^T \beta$ is all of $\mathbb R$, a [[Canonical Link Functions|Link Function]] $g$ needs to be introduced to bridge the two ranges; a fitted sketch follows the note below.
$
\overbrace{g(\underbrace{\mu(x)}_{[0,1]})}^{\mathbb R} = \underbrace{\mathbb X^T\beta}_{\mathbb R} \quad \implies \quad
\underbrace{\mu(x)}_{[0,1]} = \overbrace{g^{-1}(\underbrace{\mathbb X^T\beta}_{\mathbb R})}^{[0,1]}
$
>[!note]
>The link function needs two properties: it must be strictly [[Monotonicity|monotone]] increasing, so that it is invertible, and it must be differentiable, since the parameters are estimated via [[Maximum Likelihood Estimation|MLE]], which requires computing derivatives.
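A minimal fitted sketch of this recipe (exponential-family distribution plus link), assuming a Poisson-distributed target with the canonical log link and using statsmodels; the coefficients and sample size below are made up:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical Poisson example: (Y | X=x) ~ Poisson(mu(x)),
# with log link, i.e. mu(x) = exp(beta0 + beta1 * x).
beta0, beta1 = 0.5, 1.2
x = rng.uniform(-1, 1, size=300)
mu = np.exp(beta0 + beta1 * x)
y = rng.poisson(mu)

X = sm.add_constant(x)          # design matrix [1, x]
# The Poisson family uses the log link by default: g(mu) = log(mu).
result = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(result.params)            # estimates of (beta0, beta1)
```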
**Example #1:**
When $\mu(x) \in [0,1]$, a link function $g$ needs to be chosen such that it maps values from $[0,1]$ to $\mathbb R$. A common choice is the logit function $g(\mu) = \log\frac{\mu}{1-\mu}$, whose inverse is the logistic (sigmoid) function.
$
\begin{aligned} g&: [0,1] \to \mathbb R \\[4pt] g^{-1}&: \mathbb R \to [0,1]
\end{aligned}
$
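A small numpy sketch of the logit and its inverse, the logistic (sigmoid) function, showing how they map between the unit interval and $\mathbb R$ (the sample values are arbitrary):

```python
import numpy as np

def logit(mu):
    """Link g: (0, 1) -> R, g(mu) = log(mu / (1 - mu))."""
    return np.log(mu / (1 - mu))

def inv_logit(eta):
    """Inverse link g^{-1}: R -> (0, 1), the logistic/sigmoid function."""
    return 1 / (1 + np.exp(-eta))

mu = np.array([0.05, 0.5, 0.95])
eta = logit(mu)         # spread over the real line, approx. [-2.94, 0.0, 2.94]
print(inv_logit(eta))   # recovers the original mu in (0, 1)
```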
**Example #2:**
Even when we know that the direct relationship between $X$ and $Y$ is not linear, we can correct for this by choosing a suitable link function so that $g(\mu(x))$ equals the linear predictor.
$ \mu(x) = (\mathbb X^T\beta)^3 \quad \implies \quad g(\mu(x)) = \mathbb X^T\beta \quad \text{for} \quad g(\mu)=\sqrt[3] \mu $
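A quick numpy check of this example (the coefficients and design matrix below are made up): applying the cube-root link to the cubic mean recovers the linear predictor exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: the mean is a cubic function of the linear predictor,
# mu(x) = (x^T beta)^3.
beta = np.array([0.5, 2.0])
X = np.column_stack([np.ones(5), rng.uniform(-1, 1, size=5)])
eta = X @ beta                   # linear predictor
mu = eta ** 3                    # non-linear mean

# Cube-root link g(mu) = mu^(1/3) recovers the linear predictor.
g_mu = np.cbrt(mu)
print(np.allclose(g_mu, eta))    # True
```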