The [[Cumulative Density Function|CDF]] can be expressed as the expectation of an indicator variable.
$ F(t)= \mathbf P(X \le t)=\mathbb E\big[\mathbf 1\{X \le t\}\big] $ ^ddb620
The empirical CDF ("sample CDF") replaces the expectation with a sample average, effectively counting the number of observations where $X_i \le t$ and dividing by the total number of observations $n$.
$ F_n(t)=\frac{1}{n} \sum_{i=1}^n \mathbf 1 \{X_i \le t\} $ ^5957fc
We can compute this empirical CDF at every value of $t$, which results in a step function: $F_n$ jumps by $\frac{1}{n}$ at every observation $X_i$.
![[empirical-cdf.png|center|500]]
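As a quick illustration (not part of the derivation), here is a minimal NumPy sketch of $F_n$; the sample and evaluation points are made up.
```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)  # hypothetical i.i.d. sample X_1, ..., X_n

def ecdf(sample, t):
    """Empirical CDF F_n(t): fraction of observations with X_i <= t."""
    sample = np.asarray(sample)
    t = np.atleast_1d(t)
    # indicator 1{X_i <= t}, averaged over the sample for each value of t
    return (sample[None, :] <= t[:, None]).mean(axis=1)

# Evaluating at the sorted observations shows the jumps of size 1/n.
print(ecdf(x, np.sort(x))[:5])  # 0.01, 0.02, 0.03, ...
```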
## Convergence Properties
**Pointwise Convergence:**
For every fixed $t$, the empirical CDF [[Modes of Convergence#Convergence Almost Surely|converges almost surely]] to the true CDF. This is because of [[Law of Large Numbers|LLN]], which says that a sample average converges to the expectation as $n$ grows larger.
$
\begin{align}
\frac{1}{n} \sum_{i=1}^n \mathbf 1 \{X_i \le t\} &\xrightarrow[n \to \infty]{a.s.} \mathbb E\big[\mathbf 1\{X \le t\}\big] \\[12pt]
F_n(t) &\xrightarrow[n \to \infty]{a.s.}F(t)
\end{align}
$
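A small simulation sketch of the pointwise statement (my own, assuming a standard normal $X$, so $F(t)=\Phi(t)$): for a fixed $t$, $F_n(t)$ gets closer to $F(t)$ as $n$ grows.
```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)
t = 1.0
F_t = 0.5 * (1 + erf(t / sqrt(2)))  # standard normal CDF at t, ~0.8413

# F_n(t) is a sample average of indicators, so the LLN applies for each fixed t.
for n in [100, 1_000, 10_000, 100_000]:
    x = rng.normal(size=n)
    Fn_t = np.mean(x <= t)
    print(f"n={n:>6}  F_n(t)={Fn_t:.4f}  |F_n(t)-F(t)|={abs(Fn_t - F_t):.4f}")
```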
**Uniform Convergence:**
This states that the function $F_n$ converges to the function $F$ as a whole, at a rate that is uniform across all $t$. This behavior is *not implied* by pointwise convergence alone.
- *Example:* At $n=100$ we can see that $F_n(t=1.6)$ has already converged, while $F_n(t=2.5)$ has not, due to the few observations in that region. Since there are infinitely many points $t \in \mathbb R$, convergence at each fixed $t$ does not by itself control the worst-case gap over all $t$, which is what uniform convergence requires.
**Glivenko-Cantelli Theorem:**
The theorem states that if the empirical CDF comes from [[Independence and Identical Distribution|i.i.d.]] [[Random Variable|r.v's.]], then it converges uniformly to the true CDF. Technically speaking, the supremum of the absolute difference between the empirical and the true CDF goes to $0$ almost surely, even for the “worst” choice of $t$. ^d0ca51
$ \sup_{t \in \mathbb R} \Big \lvert F_n(t) -F(t) \Big \rvert \xrightarrow[n \to \infty]{a.s.}0 $
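Numerically, the uniform statement can be checked by tracking the worst-case gap; for a continuous $F$ the supremum is attained at the order statistics, where $F_n$ jumps. A sketch under the same standard-normal assumption as above:
```python
import numpy as np
from math import erf, sqrt

def norm_cdf(t):
    return 0.5 * (1 + erf(t / sqrt(2)))  # standard normal CDF

def sup_diff(sample):
    """sup over t of |F_n(t) - F(t)|, evaluated at the jump points of F_n."""
    x = np.sort(np.asarray(sample))
    n = len(x)
    F = np.array([norm_cdf(v) for v in x])
    d_plus = np.arange(1, n + 1) / n - F   # gap just after each jump
    d_minus = F - np.arange(0, n) / n      # gap just before each jump
    return max(d_plus.max(), d_minus.max())

rng = np.random.default_rng(2)
for n in [100, 1_000, 10_000]:
    print(n, round(sup_diff(rng.normal(size=n)), 4))  # shrinks toward 0
```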
## Asymptotic Properties
**Normality:**
We showed above that $F_n(t)$ is the sample average of indicator (i.e. [[Bernoulli Distribution|Bernoulli]]) r.v’s. for a given $t$, each with success probability $p=\mathbf P(X_i \le t)=F(t)$. Therefore the count $n\,F_n(t)$ follows a [[Binomial Distribution]], and $F_n(t)$ itself is a Binomial scaled by $1\over n$.
$ F_n(t) = \frac{1}{n} \sum_{i=1}^n \mathbf 1 \{X_i \le t\} \sim \frac{1}{n}*\mathrm{Binom}\big(n,F(t)\big) $
With large enough $n$, this sample average is approximately distributed as a [[Gaussian Distribution]] by the [[Central Limit Theorem|CLT]]; the precise centered and scaled statement is derived below.
$ F_n(t) \overset{\text{approx.}}{\sim} \mathcal N(\mu, \sigma^2) $
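A sketch checking the Binomial claim by simulation (my own, assuming a standard normal $X$, $t=1$, and that SciPy is available for the reference pmf): the count $n\,F_n(t)$ should be distributed as $\mathrm{Binom}(n, F(t))$.
```python
import numpy as np
from math import erf, sqrt
from scipy.stats import binom  # reference Binomial pmf (assumes SciPy is installed)

rng = np.random.default_rng(3)
n, t = 20, 1.0
p = 0.5 * (1 + erf(t / sqrt(2)))  # p = F(t) for a standard normal X

# Simulate the count n * F_n(t) many times and compare frequencies to Binom(n, p).
reps = 100_000
counts = (rng.normal(size=(reps, n)) <= t).sum(axis=1)
for k in [14, 16, 18]:
    print(k, round(np.mean(counts == k), 4), round(binom.pmf(k, n, p), 4))
```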
**Variance:**
$
\begin{align}
\mathrm{Var}\big(F_n(t)\big)
&=\mathrm{Var}\Big(\text{Binom}(n,p)*\frac{1}{n}\Big) \tag{1}\\[8pt]
&=\frac{1}{n^2}*\mathrm{Var}\big(\text{Binom}(n,p)\big) \tag{2}\\[12pt]
&=\frac{np(1-p)}{n^{2}} \tag{3}\\[10pt]
&=\frac{F(t)\big(1-F(t)\big)}{n} \tag{4}
\end{align}
$
where:
- (1) The variance of $F_n(t)$ is just the variance of a scaled Binomial.
- (2) Constant factors are pulled out of the variance squared.
- (3) Apply the [[Binomial Distribution#Variance|Variance of a Binomial]] with parameters $n,p$.
- (4) The success probability of each indicator is $p=\mathbf P(X_i \le t)=F(t)$.
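A quick simulation check of result (4) (my own sketch, again assuming a standard normal $X$): the variance of $F_n(t)$ across independent replications should match $F(t)\big(1-F(t)\big)/n$.
```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(4)
n, t = 200, 1.0
F_t = 0.5 * (1 + erf(t / sqrt(2)))  # F(t) for a standard normal X

# Estimate Var(F_n(t)) from many independent samples of size n.
reps = 50_000
Fn_t = (rng.normal(size=(reps, n)) <= t).mean(axis=1)
print("simulated: ", round(Fn_t.var(), 6))
print("formula:   ", round(F_t * (1 - F_t) / n, 6))
```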
$
\begin{align}
\big(F_n(t) -F(t) \big) &\overset{\text{approx.}}{\sim} \mathcal N(0, \sigma^2) \tag{5}\\[10pt]
\big(F_n(t) -F(t) \big) &\overset{\text{approx.}}{\sim} \mathcal N\Big(0, \frac{F(t)\big(1-F(t)\big)}{n}\Big) \tag{6}\\[10pt]
\sqrt n* \big(F_n(t) -F(t) \big) &\xrightarrow[n \to \infty]{(d)} \mathcal N\Big(0,\, F(t)\big(1-F(t)\big)\Big) \tag{7}
\end{align}
$
where:
- (5) By the CLT, $F_n(t)$ is approximately normal around its mean $F(t)$ for a fixed $t$. Subtracting the deterministic $F(t)$ centers the distribution at $0$ and has no effect on the variance.
- (6) Insert the asymptotic variance derived above.
- (7) Scaling by $\sqrt n$ multiplies the variance by $n$, which cancels the $\frac{1}{n}$ and yields a non-degenerate limiting distribution.
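The scaled statement (7) can be checked the same way (my own sketch, standard-normal assumption): the standard deviation of $\sqrt n\,\big(F_n(t)-F(t)\big)$ across replications should approach $\sqrt{F(t)\big(1-F(t)\big)}$.
```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(5)
n, t = 1_000, 1.0
F_t = 0.5 * (1 + erf(t / sqrt(2)))  # F(t) for a standard normal X

reps = 10_000
Fn_t = (rng.normal(size=(reps, n)) <= t).mean(axis=1)
z = sqrt(n) * (Fn_t - F_t)  # should be approximately N(0, F(t)(1 - F(t)))

print("simulated std: ", round(z.std(), 4))
print("asymptotic std:", round(sqrt(F_t * (1 - F_t)), 4))
```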
## Donsker’s Theorem
Above we have established pointwise asymptotic normality. To make uniform statements for every $t \in \mathbb R$, we need to rely on Donsker’s theorem.
Given $F$ is continuous, the supremum of the absolute difference, scaled by $\sqrt n$, converges in distribution to the supremum of the absolute value of a Brownian bridge $\mathbb B$ over the fixed interval $[0,1]$. Since this limiting distribution does not depend on anything unknown (i.e. it is pivotal), we can look up its quantiles in tables (or software).
$
\sqrt{n} * \sup_{t \in \mathbb R} \Big \lvert F_n(t) - F(t) \Big \rvert
\xrightarrow [n \to \infty ]{(d)}
\sup_{0 \le x \le 1} \Big \lvert \mathbb B(x) \Big \rvert
$
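This limit is the basis of the Kolmogorov-Smirnov statistic: $\sqrt n\,\sup_t|F_n(t)-F(t)|$ can be compared against the quantiles of $\sup|\mathbb B|$. A sketch (my own, standard-normal assumption), using SciPy's `kstwobign` for the limiting distribution, assuming that distribution object is available:
```python
import numpy as np
from math import erf, sqrt
from scipy.stats import kstwobign  # limiting distribution of sqrt(n) * sup|F_n - F|

def norm_cdf(t):
    return 0.5 * (1 + erf(t / sqrt(2)))

rng = np.random.default_rng(6)
n = 1_000
x = np.sort(rng.normal(size=n))
F = np.array([norm_cdf(v) for v in x])

# sup_t |F_n(t) - F(t)|, attained at the jump points of F_n
D_n = max((np.arange(1, n + 1) / n - F).max(), (F - np.arange(0, n) / n).max())

stat = sqrt(n) * D_n
print("scaled statistic:      ", round(stat, 3))
print("95% quantile of sup|B|:", round(float(kstwobign.ppf(0.95)), 3))  # ~1.358
```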