## Asymptotic Normality

**Univariate:** When we estimate $\theta \in \mathbb R$, the [[Maximum Likelihood Estimation#Estimator|MLE estimator]] converges to a univariate [[Gaussian Distribution|Gaussian]]. The [[Variance]] is the inverse Fisher information value.

$$
\sqrt n\,\big(\hat\theta_n^{MLE}-\theta^\star\big) \xrightarrow[n \to \infty]{(d)}\mathcal N\Big(0,\underbrace{I(\theta^\star)^{-1}}_{\text{scalar}}\Big)
$$

**Multivariate:** Assume we now look at a [[Vector Operations|Vector]] $X \in \mathbb R^d$, where the sample average of each vector element is collected in another vector $\bar X_n \in \mathbb R^d$.

$$
\begin{align}
\sqrt n \,(\bar X_n-\mu) &\xrightarrow[n \to \infty]{(d)}\mathcal N_d(0, \Sigma) \tag{1}\\[4pt]
\sqrt n \,(\hat \theta_n^{MLE}-\theta^\star) &\xrightarrow[n \to \infty]{(d)}\mathcal N_d\Big(0, \underbrace{I(\theta^\star)^{-1}}_{(d \times d)}\Big) \tag{2}
\end{align}
$$

where:

- (1) The [[Multivariate Central Limit Theorem]] states that, as $n \to \infty$, the centered and scaled vector $\bar X_n$ converges to a [[Multivariate Gaussian]] with mean $\begin{bmatrix}0 &\cdots&0\end{bmatrix}^T$ and [[Covariance Matrix]] $\Sigma$.
- (2) When the estimator is the MLE, the limiting covariance is the inverse Fisher information matrix $I(\theta^\star)^{-1}$.

## Calculate Fisher Information

**Univariate:** The Fisher information $I(\theta)$ can be derived from the derivatives of $\ell(\theta)$, the log-likelihood of a single observation.

$$
I(\theta)=\mathrm{Var}\big(\ell^\prime(\theta)\big) = - \mathbb E[\ell^{\prime\prime}(\theta)]
$$

**Multivariate:** Equivalently, we derive the Fisher information matrix $I(\theta)$ as the covariance of the [[Gradient Descent#Gradient Vector|Gradient Vector]], or the negative [[Expectation]] of the [[Hessian]].
$$
I(\theta)=\mathrm{Cov}\big(\nabla \ell(\theta)\big)= - \mathbb E\big[\mathbf H \,\ell(\theta)\big]
$$

Applying the shortcut formula for covariance (multivariate case):

$$
\begin{aligned}
\mathrm{Cov}(X)&= \mathbb E[XX^T]-\mathbb E[X]\,\mathbb E[X]^T \\[6pt]
\mathrm{Cov}\big(\nabla \ell(\theta)\big)&= \mathbb E\big[\nabla \ell(\theta)\,\nabla \ell(\theta)^T\big]\, - \,\mathbb E\big[\nabla \ell(\theta)\big]\,\mathbb E\big[\nabla \ell(\theta)\big]^T
\end{aligned}
$$

At the true parameter $\theta^\star$ the score has mean zero, $\mathbb E\big[\nabla \ell(\theta^\star)\big]=0$, so the second term vanishes and $I(\theta^\star)=\mathbb E\big[\nabla \ell(\theta^\star)\,\nabla \ell(\theta^\star)^T\big]$.
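Both identities can be checked numerically. The sketch below (an illustration, not part of the derivation; the Bernoulli model and all variable names are chosen for this example) uses a $\mathrm{Bernoulli}(p)$ observation, where $\ell(p)=x\log p+(1-x)\log(1-p)$ and the closed-form Fisher information is $I(p)=\tfrac{1}{p(1-p)}$. It estimates $I(p)$ both as $\mathrm{Var}(\ell^\prime(p))$ and as $-\mathbb E[\ell^{\prime\prime}(p)]$, then verifies the asymptotic-normality claim that $\mathrm{Var}\big(\sqrt n\,(\hat p_n^{MLE}-p^\star)\big)\approx I(p^\star)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
p_star = 0.3                                  # hypothetical true parameter
I_theory = 1.0 / (p_star * (1 - p_star))      # closed-form Fisher information

# --- I(p) = Var(l'(p)) = -E[l''(p)], checked by Monte Carlo ---
x = rng.binomial(1, p_star, size=200_000).astype(float)
score = x / p_star - (1 - x) / (1 - p_star)             # l'(p) per observation
neg_hess = x / p_star**2 + (1 - x) / (1 - p_star)**2    # -l''(p) per observation

print(score.var(), neg_hess.mean(), I_theory)           # all three are close

# --- asymptotic normality: Var(sqrt(n) * (p_hat - p*)) -> I(p*)^{-1} ---
n, reps = 500, 20_000
p_hat = rng.binomial(n, p_star, size=reps) / n          # Bernoulli MLE = sample mean
print(np.var(np.sqrt(n) * (p_hat - p_star)), 1 / I_theory)
```

The MLE for the Bernoulli parameter is simply the sample mean, which keeps the simulation free of any numerical optimization; for models without a closed-form MLE the same check works with a numerical maximizer in the last step.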