The chi-square goodness of fit (GoF) test assesses whether observed discrete data matches:
1. A specific *parameterized distribution* (all parameters fixed in advance), or
2. A distribution from a broader *distribution family* $\mathcal{F}$, whose parameters must first be estimated from the data.
## Fit of Parameterized Distributions
**Example:**
We want to see whether a die is fair, so we roll it many times and record how often each number comes up. The resulting counts follow a [[Multinomial Distribution]].
$
\begin{cases}
H_0:\mathbf P=\mathcal U(\{1,\dots,6\}) &\text{The die is fair, i.e. each } p_j^{(0)}=\tfrac{1}{6}\\
H_1:\mathbf P\neq\mathcal U(\{1,\dots,6\}) &\text{The die is not fair}
\end{cases}
$
**Test Statistic:**
$T_n$ compares observed and expected frequencies, scaled by the hypothesized probabilities.
$
T_n=n \sum_{j=1}^K \frac{\left(\hat{\mathbf p}_j-\mathbf p_j^{(0)}\right)^2}{\mathbf p_j^{(0)}} \xrightarrow[n \to \infty]{(d)}\chi^2_{K-1}
$
where:
- $\mathbf p^{(0)}=[p_1^{(0)}, \dots, p_K^{(0)}]$: The parameter vector of the hypothesized distribution.
- $\hat {\mathbf p}_j=\frac{N_j}{n}$: The observed relative frequency of the $j$-th class, where $N_j$ is the number of observations in class $j$. For a multinomial distribution this is equal to the [[Maximum Likelihood Estimation#Estimator|MLE Estimator]], as shown [[Multinomial Distribution#Equivalence to Relative Frequencies|here]].
**Interpretation:**
- The squared differences $(\hat{\mathbf{p}}_j - \mathbf{p}_j^{(0)})^2$ measure the deviation between observed and hypothesized distributions.
- Dividing by $\mathbf{p}_j^{(0)}$ gives greater weight to differences where $\mathbf{p}_j^{(0)}$ is small, making the test sensitive to rare events.
- The null hypothesis $H_0$ is rejected if $T_n$ is large compared to the [[Chi-Square Distribution]] with $K-1$ degrees of freedom, i.e. if it exceeds the $(1-\alpha)$-quantile for the chosen significance level $\alpha$.
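As a quick illustration, here is a minimal Python sketch of the fair-die test above. The roll counts are hypothetical example data; the hand-computed statistic is compared against `scipy.stats.chisquare` as a sanity check.

```python
import numpy as np
from scipy import stats

counts = np.array([44, 56, 50, 38, 62, 50])  # hypothetical counts N_j from n = 300 rolls
n = counts.sum()
p0 = np.full(6, 1 / 6)                       # hypothesized probabilities p^(0) (fair die)

p_hat = counts / n                           # observed relative frequencies N_j / n
T_n = n * np.sum((p_hat - p0) ** 2 / p0)     # chi-square test statistic
df = len(counts) - 1                         # K - 1 degrees of freedom
p_value = stats.chi2.sf(T_n, df)             # P(chi2_{K-1} > T_n)

# scipy computes the same statistic from observed and expected counts
T_scipy, p_scipy = stats.chisquare(counts, f_exp=n * p0)

print(T_n, p_value)                          # reject H_0 at level alpha if p_value < alpha
```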
## Fit of a Distribution Family
**Example:**
We check whether our data come from some [[Binomial Distribution]], without fixing its parameter in advance. Instead of fixed values $\mathbf p^{(0)}$, we use the estimated parameter $\hat \theta$ to define $f_{\hat \theta}(j)$, the probability mass at value $j$.
$
\begin{cases}
H_0:& \mathbf P \in \{\text{Binom}(K, \theta )\}_{\theta \in (0,1)} \\[4pt]
H_1:&\mathbf P \notin \{\text{Binom}(K, \theta)\}_{\theta \in (0,1)}
\end{cases}
$
**Test Statistic:**
$
\begin{align}
T_n &= n\sum_{j=0}^{K} \frac{\left(\hat\theta_j^{MLE} - f_{\hat\theta}(j)\right)^2}{f_{\hat\theta}(j)} \xrightarrow[n \to \infty]{(d)} \chi^2_{(K+1)-d-1} \\
&= n\sum_{j=0}^{K} \frac{\left(\frac{N_j}{n} - f_{\hat\theta}(j)\right)^2}{f_{\hat\theta}(j)} \xrightarrow[n \to \infty]{(d)} \chi^2_{(K+1)-d-1}
\end{align}
$
where:
- $\hat \theta_j^{MLE}$ is the MLE of the $j$-th class probability under the multinomial model, equal to the observed relative frequency $\frac{N_j}{n}$ (not to be confused with the estimated binomial parameter $\hat\theta$).
- $f_{\hat \theta}(j)$ is the [[Probability Mass Function|PMF]] of the hypothesized distribution evaluated at $j$, with the parameters replaced by their MLE estimates.
>[!note]
>We use the observed data to derive the parameter estimate $\hat \theta$ for the PMF under the null, $f_{\hat \theta}(j)$. Since this naturally makes the $\hat \theta^{MLE}_j$ closer to the expected distribution under the null, we need to reduce the number of degrees of freedom by the number of estimated parameters $d$ (here $d=1$, the binomial parameter $\theta$).
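The following Python sketch applies this to a simulated sample and the binomial family, using the MLE $\hat\theta = \bar X / K$ for a $\text{Binom}(K, \theta)$ with known $K$; the sample, seed, and sample size are assumptions made purely for illustration.

```python
import numpy as np
from scipy import stats

K = 5
rng = np.random.default_rng(0)
x = rng.binomial(K, 0.4, size=500)           # hypothetical sample; H_0 holds here by construction

n = x.size
theta_hat = x.mean() / K                     # MLE of theta for Binom(K, theta) with known K
N_j = np.bincount(x, minlength=K + 1)        # observed counts N_j for j = 0, ..., K
f_hat = stats.binom.pmf(np.arange(K + 1), K, theta_hat)  # f_theta_hat(j) under the null

# in practice, classes with very small expected counts n * f_hat are often merged first
T_n = n * np.sum((N_j / n - f_hat) ** 2 / f_hat)
df = (K + 1) - 1 - 1                         # (K + 1) classes, minus d = 1 estimated parameter, minus 1
p_value = stats.chi2.sf(T_n, df)

print(T_n, p_value)                          # reject H_0 at level alpha if p_value < alpha
```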