The generalized linear model relaxes two main assumptions of the linear regression model.
## Linear Regression
- *Distribution of target:* We assumed that the conditional [[Random Variable|r.v.]] $(Y \vert X)$ follows a [[Gaussian Distribution]]. Its mean is the [[Univariate Linear Regression#Regression Function|Regression Function]] $\mu(x)$. Its [[Variance]] is governed by $\sigma^2$, which is the variance of residuals.
$ (Y \vert X=x) \sim \mathcal N\left (\mu(x), \sigma^2\right) $
- *Regression function:* It is defined as the conditional expectation $\mathbb E[Y \vert X=x]$. We restricted its form to linear combinations for both the [[Univariate Linear Regression]] and [[Multivariate Linear Regression]].
$
\begin{align}
\mu(x) &= \mathbb E[Y \vert X=x]\\[4pt]
&= a+bx &&\text{(Univariate)}\\[6pt]
&= \mathbb X^T\beta && \text{(Multivariate)}
\end{align}
$
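As a quick sanity check of these assumptions, a minimal numpy sketch (the true parameters $a$, $b$, $\sigma$ and the sample size are made up for illustration) that simulates data from the univariate model and recovers $\hat\beta$ and $\hat\sigma^2$ by least squares:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true parameters of the univariate model mu(x) = a + b*x,
# with residuals ~ N(0, sigma^2).
a, b, sigma = 1.0, 2.0, 0.5
x = rng.uniform(-1, 1, size=200)
y = a + b * x + rng.normal(0, sigma, size=200)

# Least-squares estimate of beta = (a, b) via the design matrix X = [1, x].
X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Unbiased estimate of the residual variance sigma^2.
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (len(y) - X.shape[1])
print(beta_hat, sigma2_hat)   # roughly (1.0, 2.0) and 0.25
```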
## Generalized Linear Model
- *Distribution of target:* The conditional r.v. $(Y \vert X)$ can have any distribution $\mathbf P_\theta$, as long as it is part of the [[Exponential Family]].
$ (Y \vert X=x) \sim\mathbf P_\theta $
- *Regression function:* When $Y$ takes values only in a subset of $\mathbb R$ (e.g. [[Bernoulli Distribution|Bernoulli]]), then $\mathbb E[Y \vert X] \equiv \mu(x)$ also lies in this subset. Since the [[Sample Space]] of the linear predictor $\mathbb X^T \beta$ is all of $\mathbb R$, a [[Canonical Link Functions|Link Function]] $g$ needs to be introduced to bridge the two ranges; a fitted sketch follows the note below.
$
\overbrace{g(\underbrace{\mu(x)}_{[0,1]})}^{\mathbb R} = \underbrace{\mathbb X^T\beta}_{\mathbb R} \quad \implies \quad
\underbrace{\mu(x)}_{[0,1]} = \overbrace{g^{-1}(\underbrace{\mathbb X^T\beta}_{\mathbb R})}^{[0,1]}
$
>[!note]
>The link function needs two properties: it must be strictly [[Monotonicity|monotone]] increasing, so that it is invertible, and it must be differentiable, since the parameters are estimated via [[Maximum Likelihood Estimation|MLE]], which requires computing derivatives.
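A minimal fitted sketch of this recipe (exponential-family distribution plus link), assuming a Poisson-distributed target with the canonical log link and using statsmodels; the coefficients and sample size below are made up:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical Poisson example: (Y | X=x) ~ Poisson(mu(x)),
# with log link, i.e. mu(x) = exp(beta0 + beta1 * x).
beta0, beta1 = 0.5, 1.2
x = rng.uniform(-1, 1, size=300)
mu = np.exp(beta0 + beta1 * x)
y = rng.poisson(mu)

X = sm.add_constant(x)          # design matrix [1, x]
# The Poisson family uses the log link by default: g(mu) = log(mu).
result = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(result.params)            # estimates of (beta0, beta1)
```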
**Example #1:**
When $\mu(x) \in [0,1]$, a link function $g$ needs to be chosen such that it maps values from $[0,1]$ to $\mathbb R$. A common choice is the logit function $g(\mu) = \log\frac{\mu}{1-\mu}$, whose inverse is the logistic (sigmoid) function.
$
\begin{aligned} g&: [0,1] \to \mathbb R \\[4pt] g^{-1}&: \mathbb R \to [0,1]
\end{aligned}
$
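A small numpy sketch of the logit and its inverse, the logistic (sigmoid) function, showing how they map between the unit interval and $\mathbb R$ (the sample values are arbitrary):

```python
import numpy as np

def logit(mu):
    """Link g: (0, 1) -> R, g(mu) = log(mu / (1 - mu))."""
    return np.log(mu / (1 - mu))

def inv_logit(eta):
    """Inverse link g^{-1}: R -> (0, 1), the logistic/sigmoid function."""
    return 1 / (1 + np.exp(-eta))

mu = np.array([0.05, 0.5, 0.95])
eta = logit(mu)         # spread over the real line, approx. [-2.94, 0.0, 2.94]
print(inv_logit(eta))   # recovers the original mu in (0, 1)
```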
**Example #2:**
Even when we know that the direct relationship between $X$ and $Y$ is not linear, we can correct for this by choosing a suitable link function so that $g(\mu(x))$ equals the linear predictor.
$ \mu(x) = (\mathbb X^T\beta)^3 \quad \implies \quad g(\mu(x)) = \mathbb X^T\beta \quad \text{for} \quad g(\mu)=\sqrt[3] \mu $
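A quick numpy check of this example (the coefficients and design matrix below are made up): applying the cube-root link to the cubic mean recovers the linear predictor exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: the mean is a cubic function of the linear predictor,
# mu(x) = (x^T beta)^3.
beta = np.array([0.5, 2.0])
X = np.column_stack([np.ones(5), rng.uniform(-1, 1, size=5)])
eta = X @ beta                   # linear predictor
mu = eta ** 3                    # non-linear mean

# Cube-root link g(mu) = mu^(1/3) recovers the linear predictor.
g_mu = np.cbrt(mu)
print(np.allclose(g_mu, eta))    # True
```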