The chi-square goodness of fit (GoF) test assesses whether observed discrete data matches:
1. A specific *parameterized distribution* (all parameters fixed in advance), or
2. A distribution from a broader *distribution family* $\mathcal{F}$, whose parameters must first be estimated from the data.
## Fit of Parameterized Distributions
**Example:**
We want to see whether a die is fair, so we roll it many times and record how often each number comes up. The resulting counts follow a [[Multinomial Distribution]].
$
\begin{cases}
H_0:\mathbf P=\mathcal U(\{1,\dots,6\}) &\text{The die is fair, i.e. each } p_j^{(0)}=\tfrac{1}{6}\\
H_1:\mathbf P\neq\mathcal U(\{1,\dots,6\}) &\text{The die is not fair}
\end{cases}
$
**Test Statistic:**
$T_n$ compares observed and expected frequencies, scaled by the hypothesized probabilities.
$
T_n=n \sum_{j=1}^K \frac{\left(\hat{\mathbf p}_j-\mathbf p_j^{(0)}\right)^2}{\mathbf p_j^{(0)}} \xrightarrow[n \to \infty]{(d)}\chi^2_{K-1}
$
where:
- $\mathbf p^{(0)}=[p_1^{(0)}, \dots, p_K^{(0)}]$: The parameter vector of the hypothesized distribution.
- $\hat {\mathbf p}_j=\frac{N_j}{n}$: The observed relative frequency of the $j$-th class, where $N_j$ is the number of observations in class $j$. For a multinomial distribution this is equal to the [[Maximum Likelihood Estimation#Estimator|MLE Estimator]], as shown [[Multinomial Distribution#Equivalence to Relative Frequencies|here]].
**Interpretation:**
- The squared differences $(\hat{\mathbf{p}}_j - \mathbf{p}_j^{(0)})^2$ measure the deviation between observed and hypothesized distributions.
- Dividing by $\mathbf{p}_j^{(0)}$ gives greater weight to differences where $\mathbf{p}_j^{(0)}$ is small, making the test sensitive to rare events.
- The null hypothesis $H_0$ is rejected if $T_n$ is large compared to the [[Chi-Square Distribution]] with $K-1$ degrees of freedom, i.e. if it exceeds the $(1-\alpha)$-quantile for the chosen significance level $\alpha$.
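As a quick illustration, here is a minimal Python sketch of the fair-die test above. The roll counts are hypothetical example data; the hand-computed statistic is compared against `scipy.stats.chisquare` as a sanity check.

```python
import numpy as np
from scipy import stats

counts = np.array([44, 56, 50, 38, 62, 50])  # hypothetical counts N_j from n = 300 rolls
n = counts.sum()
p0 = np.full(6, 1 / 6)                       # hypothesized probabilities p^(0) (fair die)

p_hat = counts / n                           # observed relative frequencies N_j / n
T_n = n * np.sum((p_hat - p0) ** 2 / p0)     # chi-square test statistic
df = len(counts) - 1                         # K - 1 degrees of freedom
p_value = stats.chi2.sf(T_n, df)             # P(chi2_{K-1} > T_n)

# scipy computes the same statistic from observed and expected counts
T_scipy, p_scipy = stats.chisquare(counts, f_exp=n * p0)

print(T_n, p_value)                          # reject H_0 at level alpha if p_value < alpha
```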
## Fit of a Distribution Family
**Example:**
We check whether our data come from some [[Binomial Distribution]], without fixing its parameter in advance. Instead of fixed values $\mathbf p^{(0)}$, we use the estimated parameter $\hat \theta$ to define $f_{\hat \theta}(j)$, the probability mass at value $j$.
$
\begin{cases}
H_0:& \mathbf P \in \{\text{Binom}(K, \theta )\}_{\theta \in (0,1)} \\[4pt]
H_1:&\mathbf P \notin \{\text{Binom}(K, \theta)\}_{\theta \in (0,1)}
\end{cases}
$
**Test Statistic:**
$
\begin{align}
T_n &= n\sum_{j=0}^{K} \frac{\left(\hat\theta_j^{MLE} - f_{\hat\theta}(j)\right)^2}{f_{\hat\theta}(j)} \xrightarrow[n \to \infty]{(d)} \chi^2_{(K+1)-d-1} \\
&= n\sum_{j=0}^{K} \frac{\left(\frac{N_j}{n} - f_{\hat\theta}(j)\right)^2}{f_{\hat\theta}(j)} \xrightarrow[n \to \infty]{(d)} \chi^2_{(K+1)-d-1}
\end{align}
$
where:
- $\hat \theta_j^{MLE}$ is the MLE of the $j$-th class probability under the multinomial model, equal to the observed relative frequency $\frac{N_j}{n}$ (not to be confused with the estimated binomial parameter $\hat\theta$).
- $f_{\hat \theta}(j)$ is the [[Probability Mass Function|PMF]] of the hypothesized distribution evaluated at $j$, with the parameters replaced by their MLE estimates.
>[!note]
>We use the observed data to derive the parameter estimate $\hat \theta$ for the PMF under the null, $f_{\hat \theta}(j)$. Since this naturally makes the $\hat \theta^{MLE}_j$ closer to the expected distribution under the null, we need to reduce the number of degrees of freedom by the number of estimated parameters $d$ (here $d=1$, the binomial parameter $\theta$).
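The following Python sketch applies this to a simulated sample and the binomial family, using the MLE $\hat\theta = \bar X / K$ for a $\text{Binom}(K, \theta)$ with known $K$; the sample, seed, and sample size are assumptions made purely for illustration.

```python
import numpy as np
from scipy import stats

K = 5
rng = np.random.default_rng(0)
x = rng.binomial(K, 0.4, size=500)           # hypothetical sample; H_0 holds here by construction

n = x.size
theta_hat = x.mean() / K                     # MLE of theta for Binom(K, theta) with known K
N_j = np.bincount(x, minlength=K + 1)        # observed counts N_j for j = 0, ..., K
f_hat = stats.binom.pmf(np.arange(K + 1), K, theta_hat)  # f_theta_hat(j) under the null

# in practice, classes with very small expected counts n * f_hat are often merged first
T_n = n * np.sum((N_j / n - f_hat) ** 2 / f_hat)
df = (K + 1) - 1 - 1                         # (K + 1) classes, minus d = 1 estimated parameter, minus 1
p_value = stats.chi2.sf(T_n, df)

print(T_n, p_value)                          # reject H_0 at level alpha if p_value < alpha
```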