In the [[Kolmogorov-Smirnov Test]], we test whether the data comes from a fully specified, fixed [[Cumulative Density Function|CDF]] $F^{(0)}$. However, there are situations where we only want to test whether the data comes from a *distribution family* $\mathcal{F}$ (e.g. any [[Gaussian Distribution]]), without specifying the exact parameters of the distribution.

## Modifications to the Kolmogorov-Smirnov Test

- Parameterize $F^{(0)}$ using in-sample estimates of the parameters (e.g. sample mean $\hat \mu$ and sample variance $\hat \sigma^2$ when testing for a Gaussian).
- Replace $F^{(0)}(t)$ with the estimated CDF based on these parameters (e.g. $\Phi_{\hat \mu, \hat \sigma^2}$ for a Gaussian).

The modified test statistic becomes:

$$
T_n = \sqrt{n} \, \sup_{t \in \mathbb R} \Big\lvert F_n(t) - \underbrace{\Phi_{\hat \mu, \hat \sigma^2}(t)}_{F^{(0)}(t)} \Big\rvert
$$

## Challenges with Parameter Estimation

Using *in-sample estimates introduces bias* because the data is used to fit the very model being tested. The fitted parameters (e.g. $\hat \mu, \hat \sigma^2$) make the hypothesized CDF inherently dependent on the sample data. This dependency *breaks the assumptions* required for [[Empirical Cumulative Density Function#Donsker’s Theorem|Donsker’s Theorem]], which guarantees the asymptotic distribution of the KS statistic under $H_0$. Thus, we have to rely on different critical values. Since the fitted CDF tracks the sample more closely than any fixed $F^{(0)}$ would, $T_n$ is stochastically *smaller* under $H_0$; the corrected critical values (e.g. those of the Lilliefors test in the Gaussian case) are therefore smaller than the standard KS quantiles, and reusing the standard table would make the test overly conservative.
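Below is a minimal Python sketch of this procedure for the Gaussian case, assuming NumPy and SciPy are available: it computes the plug-in statistic $T_n$ and approximates the corrected critical value by a parametric bootstrap (the construction behind the Lilliefors table). The names `ks_stat_plugin` and `bootstrap_critical_value` are illustrative, not from any library.

```python
import numpy as np
from scipy import stats

def ks_stat_plugin(x):
    """Plug-in KS statistic: T_n = sqrt(n) * sup_t |F_n(t) - Phi_{mu_hat, sigma_hat^2}(t)|,
    with the Gaussian parameters estimated from the same sample."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    mu_hat, sigma_hat = x.mean(), x.std(ddof=1)  # in-sample estimates
    cdf = stats.norm.cdf(x, loc=mu_hat, scale=sigma_hat)
    # The supremum is attained at the sample points: compare the fitted CDF
    # to the ECDF just after (i/n) and just before ((i-1)/n) each jump.
    d_plus = np.max(np.arange(1, n + 1) / n - cdf)
    d_minus = np.max(cdf - np.arange(0, n) / n)
    return np.sqrt(n) * max(d_plus, d_minus)

def bootstrap_critical_value(n, alpha=0.05, n_boot=10_000, seed=0):
    """Approximate the (1 - alpha) quantile of T_n under H_0 by simulation.
    The Gaussian family is location-scale, so the null distribution of the
    plug-in statistic does not depend on (mu, sigma); simulating standard
    normals is enough. This is the Lilliefors construction."""
    rng = np.random.default_rng(seed)
    sims = [ks_stat_plugin(rng.standard_normal(n)) for _ in range(n_boot)]
    return float(np.quantile(sims, 1 - alpha))

rng = np.random.default_rng(42)
x = rng.normal(loc=3.0, scale=2.0, size=100)   # H_0 holds: data is Gaussian
t_n = ks_stat_plugin(x)
c_corrected = bootstrap_critical_value(len(x))
print(f"T_n = {t_n:.3f}, corrected 5% critical value = {c_corrected:.3f}")
# For comparison: the classical KS critical value for sqrt(n) * D_n at the
# 5% level is about 1.358, noticeably larger than the corrected value, so
# reusing the classical table here would make the test too conservative.
print("reject H_0:", t_n > c_corrected)
```

A design note: the bootstrap must re-estimate the parameters on every simulated sample with the same estimator used in the test itself, since that re-fitting is exactly what shrinks $T_n$ under $H_0$. For the Gaussian case, a ready-made version of this test is available as `statsmodels.stats.diagnostic.lilliefors`.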