In [[Wald Test]], we have so far directly compared $\hat \theta$ with some $\theta_0$. In the Likelihood-Ratio test, we instead compare the [[Likelihood Functions|likelihood]] values obtained at these $\theta$ values.

![[likelihood-ratio-test.png|center|400]]

The Wald test and the Likelihood-Ratio test are asymptotically equivalent, which means that for large sample sizes they should give similar results.

## Univariate Form

The most basic setup is the assessment of the likelihood ratio between two simple [[Hypothesis Tests|hypotheses]].

$$
\begin{cases}
H_0: \theta = \theta_0 \\
H_1: \theta = \theta_1
\end{cases}
$$

We compute the likelihood of our data $(x_1, \dots, x_n)$ assuming $\theta_0$ or $\theta_1$, and then take the ratio of the two. The [[Statistical Test]] $\psi_C$ is an indicator variable that returns $1$ if the ratio exceeds a certain threshold value $C$.

$$
\psi_C = \mathbf 1 \left( \frac{L_n(x_1, \dots, x_n; \theta_1)}{L_n(x_1, \dots, x_n; \theta_0)}>C \right)
$$

When $H_1$ is a [[Hypothesis Tests#Types of Hypotheses|composite hypothesis]], the numerator of the test $\psi$ takes the maximum (actually, supremum) likelihood among all $\theta \in \Theta_1$. Since $H_0$ covers only the single point $\theta_0$, $\Theta_1$ covers almost the entire parameter space, so the supremum over $\Theta_1$ is effectively the maximum of the likelihood, attained at the MLE.

$$
\psi_C = \mathbf 1 \left( \frac{\sup_{\theta \in \Theta_1} L_n(x_1, \dots, x_n; \theta)}{L_n(x_1, \dots, x_n; \theta_0)}>C \right) \implies \mathbf 1 \left( \frac{ L_n(X; \hat\theta^{\mathrm{MLE}})}{L_n(X; \theta_0)}>C \right)
$$

Taking $2\log$ of this likelihood ratio yields a test statistic $\Lambda$ that follows a [[Chi-Square Distribution]] $\chi^2$ according to Wilks' theorem. This allows us to set $C$ at the $(1-\alpha)$ quantile of that distribution for a level-$\alpha$ test.

$$
\Lambda = 2\log \left(\frac{ L_n(X; \hat\theta^{\mathrm{MLE}})}{L_n(X; \theta_0)}\right)
$$

## Multivariate Form

We have an unknown parameter [[Vector Operations|Vector]] $\theta \in \mathbb R^d$, and we are deciding between two hypotheses. However, $H_0$ only makes statements about a subset of the parameters, namely $(\theta_{r+1}, \dots, \theta_d)$; the remaining components $(\theta_1, \dots, \theta_r)$ stay free to vary.

$$
\begin{cases}
H_0: (\theta_{r+1}, \dots,\theta_d)=(\theta_{r+1}^{(0)}, \dots,\theta_d^{(0)}) \\[4pt]
H_1: (\theta_{r+1}, \dots,\theta_d) \not =(\theta_{r+1}^{(0)}, \dots,\theta_d^{(0)})
\end{cases}
$$

Now we perform [[Maximum Likelihood Estimation]] two times:

| Symbol | Estimator | Description |
| --- | --- | --- |
| $\hat \theta^{\mathrm{MLE}}_n$ | $\arg \max_{\theta \in \Theta} \ell(\theta)$ | Unconstrained setup (regular MLE) |
| $\hat \theta^c_n$ | $\arg \max_{\theta \in \Theta_0} \ell(\theta)$ | Constrained setup, where $(\theta_{r+1}, \dots , \theta_d)$ are fixed according to $H_0$. |

The test statistic, where $\ell_n$ is the log-likelihood:

$$
T_n=2\left(\ell_n(\hat \theta_n^{\mathrm{MLE}})- \ell_n(\hat \theta_n^c) \right)
$$

If $H_0$ is true, both estimators should give approximately the same likelihood, and therefore $T_n$ should be small. Thus, we reject $H_0$ when $T_n$ is greater than some threshold value $C$.
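As a minimal sketch of the two-fit procedure (the concrete setup below is my own illustration, not from the note): assume i.i.d. Gaussian data with unknown $(\mu, \sigma^2)$, so $d = 2$, and let $H_0$ fix the mean at some $\mu_0$ while leaving $\sigma^2$ free, so $r = 1$. Both MLEs have closed forms, which makes $T_n$ easy to compute.

```python
import numpy as np

def normal_loglik(x, mu, sigma2):
    """Gaussian log-likelihood ell_n(mu, sigma^2) for the sample x."""
    n = len(x)
    return -0.5 * n * np.log(2 * np.pi * sigma2) - np.sum((x - mu) ** 2) / (2 * sigma2)

rng = np.random.default_rng(0)
x = rng.normal(loc=0.3, scale=1.0, size=200)  # illustrative data, true mean 0.3
mu0 = 0.0                                     # H0: mu = mu0 (sigma^2 remains free)

# Unconstrained MLE over Theta (d = 2 free parameters: mu and sigma^2)
mu_hat = x.mean()
sigma2_hat = np.mean((x - mu_hat) ** 2)

# Constrained MLE over Theta_0 (r = 1 free parameter: sigma^2 only)
sigma2_c = np.mean((x - mu0) ** 2)

# T_n = 2 * (ell_n(unconstrained MLE) - ell_n(constrained MLE))
T_n = 2 * (normal_loglik(x, mu_hat, sigma2_hat) - normal_loglik(x, mu0, sigma2_c))
print(f"T_n = {T_n:.3f}")  # compare against a chi-square threshold (see Wilks Theorem below)
```

In this Gaussian case the statistic simplifies to $T_n = n \log(\hat\sigma^2_c / \hat\sigma^2)$, which is a handy sanity check for the code above.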
## Wilks Theorem

When $H_0$ is true, the test statistic converges in distribution to a chi-square with $(d-r)$ degrees of freedom.

$$
T_n \xrightarrow[n \to \infty]{(d)}\chi^2_{d-r}
$$

- $d$: number of parameters that are free to vary under $H_1$ (i.e. the total number of parameters)
- $r$: number of parameters that are free to vary under $H_0$

The difference $(d-r)$ is the number of restricted parameters (i.e. parameters explicitly fixed under $H_0$). These parameters have to be estimated additionally under $H_1$ compared to $H_0$, so the $\chi^2$ distribution reflects the statistical variability associated with estimating these additional parameters.
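Continuing the hypothetical Gaussian sketch above ($d = 2$, $r = 1$), the threshold $C$ is the $(1-\alpha)$ quantile of $\chi^2_{d-r}$; the value of $T_n$ below is just a placeholder for illustration.

```python
from scipy import stats

alpha = 0.05
d, r = 2, 1        # free parameters under H1 and under H0 (Gaussian sketch above)
T_n = 4.21         # placeholder value of the LR statistic for illustration

C = stats.chi2.ppf(1 - alpha, df=d - r)  # rejection threshold at level alpha
p_value = stats.chi2.sf(T_n, df=d - r)   # P(chi^2_{d-r} > T_n)

print(f"reject H0: {T_n > C}  (C = {C:.3f}, p-value = {p_value:.4f})")
```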