When we build the [[Wald Test]] statistic with an [[Maximum Likelihood Estimation#Estimator|MLE Estimator]] $\hat\theta^{MLE}_n$, we can rely on its property of [[Properties of an Estimator#Key Properties of an Estimator|asymptotic normality]].
$ \sqrt n*\big(\hat\theta_n^{MLE}-\theta\big) \xrightarrow[n \to \infty]{(d)}\mathcal N\Big(0,\frac{1}{I(\theta)}\Big) $
Thereby we have ensured convergence to a [[Gaussian Distribution|Gaussian]] $\mathcal N$, and we can use the asymptotic variance to approximate the squared standard error of the parameter estimate, $\widehat{\text{var}}(\hat \theta)\approx\frac{1}{n*I(\theta)}$. Under the null hypothesis $H_0: \theta=\theta_0$, standardizing gives:
$ \overbrace{\sqrt n*\sqrt{I(\theta)}}^{1/\text{SE}}*\big(\hat \theta_n^{MLE}-\theta_0\big) \xrightarrow[n \to \infty]{(d)}\mathcal N(0,1) $
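For instance, in a Bernoulli model $X_1,\dots,X_n \overset{iid}{\sim}\text{Ber}(\theta)$ we have $\hat\theta^{MLE}_n=\bar X_n$ and $I(\theta)=\frac{1}{\theta(1-\theta)}$, so the statement above reads:
$ \sqrt n*\sqrt{\frac{1}{\theta(1-\theta)}}*\big(\bar X_n-\theta_0\big) \xrightarrow[n \to \infty]{(d)}\mathcal N(0,1) $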
However, this is not a test statistic yet, as $I(\theta)$ depends on the unknown true parameter $\theta$. By relying on the following two concepts, we can validly replace $I(\theta)$ with $I(\hat \theta)$.
1. *Continuous mapping theorem:* $\hat\theta^{MLE}$ is a [[Properties of an Estimator#Key Properties of an Estimator|consistent]] estimator, $\hat\theta^{MLE}\xrightarrow[]{\mathbf P}\theta$, and the [[Combining Limits#Continuous Mapping Theorem|Continuous Mapping Theorem]] states that a continuous function of a converging sequence converges to the function of the limit. Hence, as long as $I(\cdot)$ is continuous:
$
\text{if:}\quad
\hat \theta \xrightarrow[n \to \infty]{\mathbf P}\theta \quad \text{then:} \quad
I(\hat \theta) \xrightarrow[n \to \infty]{\mathbf P}I(\theta)
$
2. *Slutsky's theorem:* [[Combining Limits#Slutsky’s Theorem|Slutsky’s Theorem]] states that when one sequence has [[Modes of Convergence#Convergence in Distribution|Convergence in Distribution]] and the other has [[Modes of Convergence#Convergence in Probability|Convergence in Probability]] to a constant, their product (or sum) converges in distribution as well. Since we rely on asymptotic statements here, the Wald test is only applicable for larger sample sizes; the Bernoulli example is continued right after this list.
$
\begin{rcases} \sqrt{I(\hat \theta)} &\xrightarrow[n \to \infty]{\mathbf P}\sqrt{I(\theta)} \\[10pt]
\sqrt n * (\hat \theta^{MLE}_n - \theta) &\xrightarrow[n \to \infty]{(d)}\mathcal N \big(0, I(\theta)^{-1}\big)
\end{rcases} \implies \sqrt n*\sqrt{I(\hat \theta)}*\big(\hat \theta_n^{MLE}-\theta_0\big) \xrightarrow[n \to \infty]{(d)}\mathcal N(0,1)
$
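Applied to the Bernoulli example: $\bar X_n$ is consistent, so by the continuous mapping theorem $I(\bar X_n)=\frac{1}{\bar X_n(1-\bar X_n)}\xrightarrow[n \to \infty]{\mathbf P}\frac{1}{\theta(1-\theta)}=I(\theta)$, and Slutsky's theorem then licenses the plug-in statistic:
$ \sqrt n*\sqrt{\frac{1}{\bar X_n(1-\bar X_n)}}*\big(\bar X_n-\theta_0\big) \xrightarrow[n \to \infty]{(d)}\mathcal N(0,1) $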
So the final Wald test statistic with an MLE estimator looks as follows:
$
\begin{align}
W=\sqrt{n*I(\hat \theta)}*\big(\hat \theta_n^{MLE}-\theta_0\big) &\xrightarrow[n \to \infty]{(d)}\mathcal N(0,1) \\[10pt]
W^2=n* I(\hat \theta)*(\hat \theta^{MLE}_n-\theta_0)^2 &\xrightarrow[n \to \infty]{(d)} \chi^2_1
\end{align}
$
In this one-dimensional setting, the squared statistic converges to a $\chi^2_1$ distribution; for a $k$-dimensional parameter, the analogous quadratic form converges to $\chi^2_k$.
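As a minimal computational sketch of this statistic, assuming the Bernoulli example from above (the helper `wald_test_bernoulli` is illustrative, not a library function):
```python
import numpy as np
from scipy import stats

def wald_test_bernoulli(x, theta_0):
    """Two-sided Wald test of H0: theta = theta_0 for i.i.d. Bernoulli data.

    Uses the MLE theta_hat = mean(x) and the plug-in Fisher
    information I(theta_hat) = 1 / (theta_hat * (1 - theta_hat)).
    """
    x = np.asarray(x)
    n = len(x)
    theta_hat = x.mean()  # MLE of theta
    # Plug-in Fisher information; degenerates if theta_hat is exactly 0 or 1
    fisher = 1.0 / (theta_hat * (1.0 - theta_hat))
    # W is asymptotically N(0, 1) under H0 (continuous mapping + Slutsky)
    W = np.sqrt(n * fisher) * (theta_hat - theta_0)
    p_value = 2 * stats.norm.sf(abs(W))  # two-sided p-value
    return W, p_value

# Example: 100 flips of a slightly biased coin, testing H0: theta = 0.5
rng = np.random.default_rng(0)
x = rng.binomial(1, 0.6, size=100)
W, p = wald_test_bernoulli(x, theta_0=0.5)
print(f"W = {W:.3f}, p-value = {p:.4f}")
```
Squaring `W` and comparing against a $\chi^2_1$ quantile yields the identical two-sided decision.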