**Exponential Family:**
A distribution belongs to the [[Exponential Family]] if its density has the following form:
$ f_p (y) = h(y)* \exp \Big\{ T(y)\eta (p)- B(p) \Big\} $
**Canonical Exponential Family:**
When there is only a single parameter $\theta$ and $T(y)=y$, the distribution is a member of the canonical exponential family:
$ f_\theta(y) = \exp \left\{ \frac{y \theta - b(\theta )}{\phi } + c(y,\phi ) \right\} $
| Symbol | Name | Comment |
| ------------ | ---------------------- | -------------------------------------------------------------------------- |
| $\phi$ | Dispersion parameter | Assumed to be known for simplicity, so we treat it as a constant. |
| $\theta$ | Canonical parameter | Formerly denoted $\eta(p)$. |
| $b(\theta)$ | Log-partition function | When $\theta=\eta(p)$ is itself a function of $p$, $b(\theta)$ is a composition of functions. |
| $c(y, \phi)$ | Normalization | Formerly $h(y)$, now moved inside the exponential function. |
## General to Canonical Form
To understand what $b(\theta)$ is in terms of $B(p)$, we need to express $p$ in terms of $\theta$. Since $\theta = \eta(p)$, we take the inverse $p = \eta^{-1}(\theta)$ and substitute it into $B$.
$
\begin{aligned}
\eta(p):p &\mapsto \theta \\[2pt] \eta^{-1}(\theta):\theta &\mapsto p \\[4pt]
b(\theta) &= b\big(\underbrace{\eta(p)}_{\theta}\big) = B(\underbrace{\eta^{-1}(\theta)}_{p})
\end{aligned}
$
**Example:**
The density of a [[Bernoulli Distribution|Bernoulli]] $f_\theta(y)$ can be written as follows:
$
f_\theta(y) = \exp \left\{y*\underbrace{\ln(\frac{p}{1-p})}_{\theta}-(\underbrace{-\ln(1-p)}_{B(p)})\right \}
$
Taking the inverse of the canonical parameter:
$
\begin{align}
\eta(p):\theta&=\ln\Big(\frac{p}{1-p}\Big)\\[8pt]
e^\theta&=\frac{p}{1-p}\\[10pt]
e^\theta*(1-p)&=p\\[2pt]
e^\theta&=p*(1+e^\theta)\\[2pt]
\eta^{-1}(\theta):p&=\frac{e^\theta}{1+e^\theta}
\end{align}
$
Substituting the inverse into $B(p)$ to get to the log-partition function $b(\theta)$:
$
\begin{align}
b(\theta) &= -\ln\big(1-\eta^{-1}(\theta)\big) \\
&=-\ln\Big(1-\frac{e^\theta}{1+e^\theta}\Big) \\
&=-\ln\Big(\frac{1+e^\theta}{1+e^\theta} -\frac{e^\theta}{1+e^\theta} \Big) \\
&=-\ln\Big(\frac{1}{1+e^\theta}\Big) \\[10pt]
&=\ln(1+e^\theta)
\end{align}
$
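The Bernoulli derivation above can be checked numerically. The following is a minimal sketch, assuming $\phi = 1$ and $c(y,\phi) = 0$ for the Bernoulli case; the function names (`eta`, `eta_inv`, `b`, `canonical_pmf`, `bernoulli_pmf`) are illustrative and not taken from any library.

```python
import math

def bernoulli_pmf(y, p):
    """Standard Bernoulli mass function p^y * (1 - p)^(1 - y)."""
    return p**y * (1 - p)**(1 - y)

def eta(p):
    """Canonical parameter theta = ln(p / (1 - p))."""
    return math.log(p / (1 - p))

def eta_inv(theta):
    """Inverse map p = e^theta / (1 + e^theta)."""
    return math.exp(theta) / (1 + math.exp(theta))

def b(theta):
    """Log-partition function b(theta) = ln(1 + e^theta)."""
    return math.log(1 + math.exp(theta))

def canonical_pmf(y, theta):
    """Canonical form exp{y*theta - b(theta)}, assuming phi = 1 and c(y, phi) = 0."""
    return math.exp(y * theta - b(theta))

for p in (0.1, 0.5, 0.9):
    theta = eta(p)
    # The inverse recovers p, and b(theta) agrees with B(p) = -ln(1 - p).
    assert math.isclose(eta_inv(theta), p)
    assert math.isclose(b(theta), -math.log(1 - p))
    # Both parameterizations give the same probabilities for y in {0, 1}.
    for y in (0, 1):
        assert math.isclose(canonical_pmf(y, theta), bernoulli_pmf(y, p))
```

If the assertions pass silently, the canonical form and the usual $p^y(1-p)^{1-y}$ form agree on the tested values of $p$.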
## Calculate Moments
Taking the logarithm of the canonical density for a single observation $y_i$ removes the exponential:
$ \ell(\theta) = \log\big(f_\theta(y_i)\big)=\frac{y_i \theta - b(\theta )}{\phi } + c(y_i,\phi ) $
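By relying on the first two [[Identities of Log-Likelihood]], namely $\mathbb E\big[\frac{\partial \ell}{\partial \theta}\big]=0$ and $\mathbb E\big[\frac{\partial^2 \ell}{\partial \theta^2}+\big(\frac{\partial \ell}{\partial \theta}\big)^2\big]=0$, we can derive the [[Expectation]] and [[Variance]] from this form of the [[Probability Density Function|PDF]].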
**Expectation:**
$
\begin{align}
\frac{\partial \ell}{\partial \theta} &= \frac{Y-b^\prime(\theta)}{\phi} \\[6pt]
0=\mathbb E\Big[\frac{\partial \ell}{\partial \theta}\Big] &= \frac{\mathbb E[Y]-b^\prime(\theta)}{\phi} \\[6pt]
\mathbb E[Y] &= b^\prime(\theta)
\end{align}
$
**Variance:**
$
\begin{aligned}
\frac{\partial^2 \ell}{\partial \theta^2}+ \Big(\frac{\partial \ell}{\partial \theta}\Big)^2 &=
-\frac{b^{\prime \prime}(\theta)}{\phi}+\Big(\frac{Y-\overbrace{b^\prime(\theta)}^{\mathbb E[Y]}}{\phi}\Big)^2 \\[6pt]
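0=\mathbb E\Big[\frac{\partial^2 \ell}{\partial \theta^2}+ \Big(\frac{\partial \ell}{\partial \theta}\Big)^2\Big]&=-\frac{b^{\prime \prime}(\theta)}{\phi}+\frac{\mathrm{var}(Y)}{\phi^2} \\[8pt]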
\mathrm{var}(Y)&=b^{\prime \prime}(\theta)*\phi
\end{aligned}
$
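As a sanity check, the two moment formulas can be applied to the Bernoulli log-partition function $b(\theta)=\ln(1+e^\theta)$, where we expect $\mathbb E[Y]=p$ and $\mathrm{var}(Y)=p(1-p)$. The sketch below assumes $\phi=1$ and approximates the derivatives with central finite differences; the helper names and the step size `h` are illustrative choices.

```python
import math

def b(theta):
    """Bernoulli log-partition function b(theta) = ln(1 + e^theta)."""
    return math.log(1 + math.exp(theta))

def derivative(f, x, h=1e-4):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

phi = 1.0                          # dispersion, assumed to be 1 for the Bernoulli case
p = 0.3                            # example success probability
theta = math.log(p / (1 - p))      # canonical parameter theta = eta(p)

mean_from_b = derivative(b, theta)               # E[Y] = b'(theta)
var_from_b = second_derivative(b, theta) * phi   # var(Y) = b''(theta) * phi

# Compare against the known Bernoulli moments, up to finite-difference error.
assert math.isclose(mean_from_b, p, abs_tol=1e-6)
assert math.isclose(var_from_b, p * (1 - p), abs_tol=1e-6)
```

Finite differences are used here only to avoid hard-coding the closed-form derivatives; differentiating $b(\theta)$ by hand gives $b^\prime(\theta)=\frac{e^\theta}{1+e^\theta}=p$ directly.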