## Asymptotic Normality
**Univariate:** When we estimate $\theta \in \mathbb R$, the [[Maximum Likelihood Estimation#Estimator|MLE estimator]] converges to a univariate [[Gaussian Distribution|Gaussian]]. The asymptotic [[Variance]] is the inverse of the Fisher information.
$
\sqrt n\,\big(\hat\theta_n^{MLE}-\theta^\star\big) \xrightarrow[n \to \infty]{(d)}\mathcal N\Big(0,\underbrace{I(\theta^\star)^{-1}}_{\text{scalar}}\Big) $
**Multivariate:** Assume we now look at a [[Vector Operations|Vector]] $X \in \mathbb R^d$, where the sample average of each vector element is collected in another vector $\bar X_n \in \mathbb R^d$.
$
\begin{align}
\sqrt n \,(\bar X_n-\mu) &\xrightarrow[n \to \infty]{(d)}\mathcal N_d(0, \Sigma) \tag{1}\\[4pt]
\sqrt n \,(\hat \theta_n^{MLE}-\theta^\star) &\xrightarrow[n \to \infty]{(d)}\mathcal N_d\Big(0, \underbrace{I(\theta^\star)^{-1}}_{(d \times d)}\Big) \tag{2}
\end{align}
$
where:
- (1) The [[Multivariate Central Limit Theorem]] states that, as $n \to \infty$, the centered and scaled sample mean $\sqrt n\,(\bar X_n-\mu)$ converges to a [[Multivariate Gaussian]] with mean $\begin{bmatrix}0 &\cdots&0\end{bmatrix}^T$ and [[Covariance Matrix]] $\Sigma$.
- (2) When the estimator is the MLE, the asymptotic covariance matrix is the inverse Fisher information matrix.
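A quick Monte Carlo sketch of (2) in the scalar case, with an Exponential($\lambda$) model (the distribution and parameter values here are assumed purely for illustration): the MLE is $\hat\lambda_n = 1/\bar X_n$ and $I(\lambda) = 1/\lambda^2$, so the scaled estimation error should be roughly centered at zero with variance close to $\lambda^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
lam_true = 2.0          # illustrative true rate parameter
n, reps = 5_000, 2_000  # sample size and number of replications

# MLE for Exponential(lambda) is 1 / sample mean
samples = rng.exponential(scale=1 / lam_true, size=(reps, n))
mle = 1 / samples.mean(axis=1)

# Theory: sqrt(n) * (mle - lambda) -> N(0, I(lambda)^-1) with I(lambda) = 1/lambda^2
scaled = np.sqrt(n) * (mle - lam_true)
print(scaled.mean())  # should be close to 0
print(scaled.var())   # should be close to lam_true**2 = 4
```

Increasing `n` tightens the Gaussian approximation; increasing `reps` reduces the Monte Carlo noise in the two printed summaries.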
## Calculate Fisher Information
**Univariate:** The Fisher information $I(\theta)$ can be computed from derivatives of $\ell(\theta)$, the log-likelihood of a single observation: it is the variance of the score $\ell^\prime(\theta)$, or equivalently the negative expectation of $\ell^{\prime\prime}(\theta)$.
$ I(\theta)=\mathrm{Var}(\ell^\prime(\theta)) = - \mathbb E[\ell^{\prime\prime}(\theta)] $
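As a worked example (a Bernoulli model, chosen here for illustration), a single observation $x \in \{0,1\}$ with parameter $p$ gives:
$
\begin{aligned}
\ell(p) &= x \log p + (1-x)\log(1-p) \\[4pt]
\ell^{\prime\prime}(p) &= -\frac{x}{p^2} - \frac{1-x}{(1-p)^2} \\[4pt]
I(p) = -\mathbb E\big[\ell^{\prime\prime}(p)\big] &= \frac{p}{p^2} + \frac{1-p}{(1-p)^2} = \frac{1}{p(1-p)}
\end{aligned}
$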
**Multivariate:** Equivalently we derive the Fisher information matrix $I(\theta)$ by the covariance of the [[Gradient Descent#Gradient Vector|Gradient Vector]], or the [[Expectation]] of the [[Hessian]].
$ I(\theta)=\mathrm{Cov}\big(\nabla \ell(\theta)\big)= - \mathbb E\big[\mathbf H \,\ell(\theta)\big] $
Applying the shortcut formula for the covariance (multivariate case):
$
\begin{aligned}
\mathrm{Cov}(X)&= \mathbb E[XX^T]-\mathbb E[X]\,\mathbb E[X]^T \\[6pt] \mathrm{Cov}\big(\nabla \ell(\theta)\big)&= \mathbb E\big[\nabla \ell(\theta)\,\nabla \ell(\theta)^T\big]\, - \,\mathbb E\big[\nabla \ell(\theta)\big]\,\mathbb E\big[\nabla \ell(\theta)\big]^T
\end{aligned}
$
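At the true parameter the expected score $\mathbb E[\nabla \ell(\theta^\star)]$ is zero, so the second term drops out. A minimal numerical sketch (model and parameter values assumed for illustration): for $X \sim \mathcal N_d(\mu, \Sigma)$ with known $\Sigma$, the gradient of the per-observation log-likelihood in $\mu$ is $\Sigma^{-1}(x-\mu)$, and its sample covariance should recover $I(\mu) = \Sigma^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])        # illustrative covariance, assumed known
Sigma_inv = np.linalg.inv(Sigma)
mu = np.zeros(2)

# Gradient of the log-likelihood in mu: Sigma^-1 (x - mu), one row per draw
x = rng.multivariate_normal(mu, Sigma, size=200_000)
grads = (x - mu) @ Sigma_inv          # Sigma_inv is symmetric, so no transpose needed

# Cov(grad) should match the Fisher information matrix I(mu) = Sigma^-1
I_hat = np.cov(grads, rowvar=False)
print(np.round(I_hat, 2))
print(np.round(Sigma_inv, 2))
```

The mean of `grads` is close to zero, which is why dropping the second term of the covariance shortcut costs essentially nothing here.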