A confidence interval ("CI") at level $(1- \alpha)$ states that the true parameter $\theta$ lies within a random interval $\mathcal I$ with probability at least $(1- \alpha)$. This needs to hold for all possible values that $\theta$ can take.

$ \mathbf P_\theta(\theta \in \mathcal I) \ge 1- \alpha, \quad \forall \theta \in \Theta $

However, the interval $\mathcal I_n$ is a [[Random Variable]] since it is based on the sampled data (indicated by subscript $n$). Due to that randomness we cannot make exact statements about the CI based on finitely many data points. Instead, we talk about the asymptotic CI, where we can rely on the [[Central Limit Theorem|CLT]]. With infinitely many observations, the inequality turns into an equality.

$ \lim_{n \to \infty} \space \mathbf P_\theta[\theta \in \mathcal I_n] = 1- \alpha, \quad \forall \theta \in \Theta $

**Interpretation:** When we repeat the same experiment many times and compute an interval $\mathcal I_n$ each time based on the sampled data, then the true parameter $\theta$ will lie within a fraction $(1- \alpha)$ (e.g. $95\%$) of these different CIs.

>[!note:]
>In the [[Bayesian Framework]] we treat $\theta$ as a r.v. with a posterior distribution, given the observed data. Hence, we can report a "credible interval" covering the central mass of the posterior.

## Derivation

Let $\hat\Theta_n$ be our estimator of the unknown parameter $\theta$ (e.g. an estimator could be the sample mean of observations $\bar X_n$). By the CLT, the estimator converges to a [[Gaussian Distribution]] $\mathcal N$ with mean $\theta$ and variance $\frac{\sigma^2}{n}$. After centering and scaling, the estimator converges to a standard Gaussian $\mathcal N (0,1)$.

$ \begin{align} \hat \Theta_n &\xrightarrow[n \to \infty]{(d)} \mathcal N \Big(\theta, \frac{\sigma^2}{n}\Big) \\[6pt] \sqrt n * \frac{\hat \Theta_n -\theta}{\sigma} &\xrightarrow[n \to \infty]{(d)}\mathcal N(0,1) \\[6pt] \end{align} $

We can express both sides in terms of a [[Cumulative Density Function|CDF]]. The standard Gaussian CDF is denoted as $\Phi(x)$ by convention.

$ \Phi_n(x) = \mathbf P\left( \sqrt n * \frac{\hat \Theta_n -\theta}{\sigma}\le x\right) $

Because of [[Modes of Convergence#Convergence in Distribution|Convergence in Distribution]], both terms are approximately equal when $n$ is large enough.

$ \begin{align} \Phi_n(x) &\approx \Phi(x) \tag{1} \\[6pt] \mathbf P\Big(\sqrt n*\frac{\hat \Theta_n -\theta}{\sigma} \le x \Big) &\approx \Phi(x) \tag{2} \\[8pt] \mathbf P\Big(\hat \Theta_n -\theta \le \frac{x \sigma}{\sqrt n} \Big) &\approx \Phi(x) \tag{3} \\[8pt] \mathbf P\Big(\lvert \hat \Theta_n -\theta \rvert > \frac{x \sigma}{\sqrt n} \Big) &\approx 2\Phi(-x) \tag{4} \\[6pt] \mathbf P \left(\lvert \hat \Theta_n -\theta \rvert > x \right) &\approx 2\Phi \Big(\frac{-x \sqrt n}{\sigma} \Big) \tag{5} \\[6pt] \mathbf P\left(\lvert \hat \Theta_n -\theta \rvert > x\right) &\approx \underbrace{2*\Big(1-\Phi \big(\frac{x \sqrt n}{\sigma} \big)\Big)}_{\alpha} \tag{6} \end{align} $

where:
- (4) The probability of $\lvert \hat \Theta_n-\theta\rvert$ being greater than some value represents the two outer tails of the distribution. Due to [[Gaussian Distribution#^4ff648|symmetry]] this is the same as two times the left tail.
- (5) Rescaling the threshold by $\frac{\sqrt n}{\sigma}$ moves $x$ from the standardized scale to the scale of $\hat \Theta_n - \theta$.
- (6) Because of symmetry, $\Phi(-x)$ is equal to $(1- \Phi(x))$.

The left side of our equation states the probability that our estimator $\hat \Theta_n$ differs from the true parameter $\theta$ by more than some value $x$.
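To make the tail approximation in step (6) concrete, here is a minimal simulation sketch (my own illustration, not from the original material; it assumes $\mathrm{Ber}(p)$ data with the sample mean as estimator). It compares the empirical two-tail probability against the CLT approximation $2\big(1-\Phi(x\sqrt n/\sigma)\big)$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, p, x = 500, 0.4, 0.03       # sample size, true parameter, threshold
sigma = np.sqrt(p * (1 - p))   # std. dev. of Ber(p)

# Empirical two-tail probability P(|xbar - p| > x) over many repeated experiments
xbars = rng.binomial(1, p, size=(20_000, n)).mean(axis=1)
empirical = np.mean(np.abs(xbars - p) > x)

# CLT approximation 2 * (1 - Phi(x * sqrt(n) / sigma))
approx = 2 * (1 - norm.cdf(x * np.sqrt(n) / sigma))

print(f"empirical: {empirical:.4f}  CLT approx: {approx:.4f}")
```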
For a $95\%$ CI, for example, we need to find the $x$ that makes this probability equal to $0.05$ (denoted as $\alpha$).

$ \begin{align} 2*\Big(1-\Phi \big(\frac{x*\sqrt n}{\sigma} \big)\Big) &= \alpha \tag{1} \\[6pt] \Phi \Big(\frac{x*\sqrt n}{\sigma} \Big) &= 1-\frac{\alpha}{2} \tag{2} \\[6pt] \frac{x*\sqrt n}{\sigma} &= \underbrace{\Phi^{-1} \big(1-\frac{\alpha}{2}\big)}_{q_{\alpha/2}} \tag{3} \end{align} $

(3) The inverse CDF $\Phi^{-1}$ takes a cumulative probability $\in [0,1]$ and outputs the corresponding $x$-value. In our case $\Phi^{-1}(1-\frac{\alpha}{2})$ corresponds to the upper $x$-value of the CI.

Now, solving for $x$, we get the distance from the estimate to the true parameter $(\vert \hat \theta_n - \theta \vert)$ that marks the threshold of the confidence interval. It turns out to be the z-value $q_\frac{\alpha}{2}$ times the standard error of the estimator $\frac{\sigma}{\sqrt n}$.

$ x= q_{\frac{\alpha}{2}}*\frac{\sigma}{\sqrt n} $

This tells us the range of the interval around the true parameter at the given $\alpha$ level.

$ \lim_{n \to \infty} \mathbf P\left(\hat \Theta_n \in \Big [\theta-\overbrace{\frac{q_\frac{\alpha}{2}* \sigma}{\sqrt n}}^{x}, \space \theta+ \overbrace{\frac{q_\frac{\alpha}{2}* \sigma}{\sqrt n}}^{x}\Big]\right)=1- \alpha $

The event $\lvert \hat \Theta_n - \theta \rvert \le x$ is symmetric in $\hat \Theta_n$ and $\theta$: the estimator is within $x$ of the true parameter exactly when the true parameter is within $x$ of the estimator. This allows us to flip the two terms and center the interval around $\hat \Theta_n$.

$ \mathcal I=\left[ \hat \Theta_n - \frac{q_\frac{\alpha}{2}*\sigma}{\sqrt n}, \, \hat \Theta_n + \frac{q_\frac{\alpha}{2}*\sigma}{\sqrt n} \right] $

Now we have an interval $\mathcal I$ at an asymptotic level.

$ \lim_{n \to \infty} \mathbf P \left( \mathcal I \ni \theta \right)= 1- \alpha $

Note that $\mathcal I$ still contains the unknown parameter $\sigma$. Therefore we cannot directly compute numerical intervals, and need to rely on one of the following techniques:

- Conservative Bound
- Solving the Quadratic
- Plug-In Estimator

>[!note:]
>The symbol $\ni$ signifies that the random interval $\mathcal I$ is covering the unknown fixed parameter $\theta$. This is just to highlight where the randomness lies.

## Conservative Bound

Since $\sigma$ is unknown and we want to compute a numerical CI, we can assume the maximum possible [[Variance]], leading to the widest (most conservative) interval $\mathcal I$. E.g. when data is coming from a [[Bernoulli Distribution]] $\mathrm{Ber}(p)$, we know that the variance is maximal at $p=1/2$.

$ \sigma^2 =\mathrm{Var}\big(\mathrm{Ber}(p)\big)=p*(1-p)\le 1/4 $

We plug this into the definition of the confidence interval.

$ \begin{align} \mathcal I &= \left[ \hat \Theta_n - \frac{q_\frac{\alpha}{2}*\sqrt{1/4}}{\sqrt n}, \space \hat \Theta_n + \frac{q_\frac{\alpha}{2}*\sqrt{1/4}}{\sqrt n} \right] \\[6pt] \mathcal I &= \left[ \hat \Theta_n - \frac{q_\frac{\alpha}{2}}{2\sqrt n}, \space \hat \Theta_n + \frac{q_\frac{\alpha}{2}}{2\sqrt n} \right] \end{align} $

Since this interval is at least as wide as necessary, the coverage guarantee becomes an inequality of at least $(1- \alpha)$.

$ \lim_{n \to \infty} \mathbf P(\theta \in \mathcal I) \ge 1 - \alpha $
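As a numerical sketch (my own illustration; the helper name and inputs are assumptions, not from the note), the conservative interval for Bernoulli data only needs $\bar X_n$, $n$ and the quantile $q_{\alpha/2}$:

```python
import numpy as np
from scipy.stats import norm

def conservative_ci(x_bar: float, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Conservative asymptotic CI for a Bernoulli p, using sigma^2 <= 1/4."""
    q = norm.ppf(1 - alpha / 2)         # q_{alpha/2} = Phi^{-1}(1 - alpha/2)
    half_width = q / (2 * np.sqrt(n))   # q_{alpha/2} * sqrt(1/4) / sqrt(n)
    return x_bar - half_width, x_bar + half_width

print(conservative_ci(x_bar=0.62, n=200))  # approx. (0.551, 0.689)
```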
## Solving the Quadratic

We can formulate the probability of being inside the interval via inequalities.

$ \begin{align} \lim_{n \to \infty}\mathbf P&\left(\theta \in \Big[ \hat \Theta_n - \frac{q_\frac{\alpha}{2}*\sigma}{\sqrt n}, \space \hat \Theta_n + \frac{q_\frac{\alpha}{2}*\sigma}{\sqrt n} \Big] \right)\\[8pt] \lim_{n \to \infty}\mathbf P& \left(\hat \Theta_n - \frac{q_\frac{\alpha}{2}*\sigma}{\sqrt n} \le \theta \le \hat \Theta_n + \frac{q_\frac{\alpha}{2}*\sigma}{\sqrt n} \right) \end{align} $

Writing the inequalities separately:

$ \theta \ge \hat \Theta_n - \frac{q_\frac{\alpha}{2}*\sigma}{\sqrt n} \\[8pt] \theta \le \hat \Theta_n + \frac{q_\frac{\alpha}{2}*\sigma}{\sqrt n} $

For some distributions (e.g. Bernoulli), combining the two inequalities and squaring results in a quadratic inequality in the parameter $p$, which we can solve directly via the [[Quadratic Formula]] (see the sketch after this section).

>[!note:]
>The two inequalities combine into the single condition $\lvert \theta - \hat \Theta_n \rvert \le \frac{q_{\alpha/2}*\sigma}{\sqrt n}$. Both sides are non-negative, so squaring preserves the direction of the inequality.

**Bernoulli Example:** When the data $X\sim\mathrm{Ber}(p)$, the parameter of interest is $p$. An example of an unbiased estimator $\hat \Theta_n$ would be the sample average $\bar X_n$. The standard deviation $\sigma$ of $\mathrm{Ber}(p)$ is known to be $\sqrt{p(1-p)}$.

$ \begin{align} \theta &\mapsto p \\ \hat \Theta_n &\mapsto \bar X_n \\ \sigma &\mapsto \sqrt{p(1-p)} \end{align} $

$ \begin{align} \lvert p- \bar X_n \rvert &\le \frac{q_{\frac{\alpha}{2}}\sqrt{p(1-p)}}{\sqrt n} \\[8pt] (p- \bar X_n)^2 &\le \Big(\frac{q_{\frac{\alpha}{2}}\sqrt{p(1-p)}}{\sqrt n}\Big)^2 \\[8pt] (p- \bar X_n)^2 &\le \frac{(q_{\frac{\alpha}{2}})^2*p(1-p)}n \end{align} $

By expanding the quadratic $(p- \bar X_n)^2$ on the left side and moving every term from the right to the left, we get a quadratic inequality in $p$. It holds between the two roots of the corresponding equation, and these roots are the thresholds of our CI.

$ \underbrace{\Big(1+ \frac{q_{\alpha/2}^2}{n}\Big)}_{A}*p^2 - \underbrace{\Big(2\bar X_n + \frac{q_{\alpha/2}^2}{n}\Big)}_{B}*p + \underbrace{\bar X_n^2}_{C} = 0 $

This leads to a new confidence interval $\mathcal I = [p_1, p_2]$, which covers $p$ with asymptotically exact probability $1-\alpha$.

$ \begin{align} \mathcal I &= \Big[p_1, p_2\Big] \\ \lim_{n \to \infty} \mathbf P\left(\mathcal I \ni p\right)&= 1- \alpha \end{align} $

>[!note:]
>This time we did not bound the variance like in the conservative approach, so the asymptotic statement is exact. Therefore this interval is better (more narrow) than the conservative bound.
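A minimal sketch of this computation (again my own illustration; `quadratic_ci` and its inputs are assumed names). Solving $Ap^2 - Bp + C = 0$ via the quadratic formula yields what is also known as the Wilson score interval:

```python
import numpy as np
from scipy.stats import norm

def quadratic_ci(x_bar: float, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """CI for a Bernoulli p, obtained as the two roots of A*p^2 - B*p + C = 0."""
    q2 = norm.ppf(1 - alpha / 2) ** 2
    A = 1 + q2 / n
    B = 2 * x_bar + q2 / n
    C = x_bar ** 2
    disc = np.sqrt(B ** 2 - 4 * A * C)  # discriminant, non-negative for 0 <= x_bar <= 1
    return (B - disc) / (2 * A), (B + disc) / (2 * A)

print(quadratic_ci(x_bar=0.62, n=200))  # approx. (0.551, 0.684), narrower than the conservative CI
```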
## Plug-In Estimator

In this approach, we simply replace the unknown $\theta$ inside $\sigma$ by its estimate $\hat \theta_n$, giving the plug-in estimate $\hat \sigma$ (e.g. $\hat \sigma = \sqrt{\bar X_n(1-\bar X_n)}$ in the Bernoulli case). We can do this replacement as long as our estimator $\hat \Theta_n$ is [[Properties of an Estimator#Key Properties of an Estimator|consistent]], which means that it converges in probability to $\theta$.

$ \hat \theta_n \xrightarrow[n \to \infty]{\mathbf P} \theta $

By the CLT we have already shown that the centered and scaled estimator $\hat \Theta_n$ converges in distribution to a standard Gaussian.

$ \sqrt n * \frac{\hat \Theta_n -\theta}{\sigma} \xrightarrow[n \to \infty]{(d)}\mathcal N(0,1) $

Now we claim that the same convergence holds true when we insert our estimate $\hat \sigma$ instead of the true unknown $\sigma$.

$ \sqrt n * \frac{\hat \Theta_n -\theta}{\hat \sigma} = \underbrace{\sqrt n * \frac{\hat \Theta_n -\theta}{\sigma}}_{\xrightarrow{(d)} \mathcal N(0,1)} * \underbrace{\frac{\sigma}{\hat \sigma}}_{\xrightarrow{\mathbf P} 1} $

In the equation above we have the product of a term converging in distribution and a term converging in probability to a constant. By [[Combining Limits#Slutsky’s Theorem|Slutsky’s Theorem]] the product converges in distribution to the same limit as well. Hence we may use $\hat \sigma$ in place of $\sigma$ in the interval.

$ \mathcal I=\Big[ \hat \Theta_n - \frac{q_\frac{\alpha}{2}* \hat \sigma}{\sqrt n}, \, \hat \Theta_n + \frac{q_\frac{\alpha}{2}* \hat \sigma}{\sqrt n} \Big] $

>[!note:]
>Here we are also getting an exact equality for the CI. However, we rely on an additional asymptotic assumption from Slutsky's theorem. So when $n$ is finite, this is another source of uncertainty, and solving the quadratic might be the preferred solution for the confidence interval.

$ \lim_{n \to \infty} \mathbf P\left(\mathcal I \ni \theta\right)= 1- \alpha $

## Remarks

**Centering intervals:** By definition, confidence intervals do not have to be centered around the mean. However, we achieve the narrowest intervals for a given $\alpha$ when they are centered. That is at least true for Gaussian r.v.s, where the highest density is around the mean $\mu$.

**Computed intervals:** Assume the interval $\mathcal I$ centered around the estimator $\hat \Theta_n$:

$ \mathcal I= \Big[\hat \Theta_n \pm \frac{q_{\frac{\alpha}{2}}}{2 \sqrt n}\Big] $

Once we have run the numbers and determined the bounds of a confidence interval, it is no longer valid to say that $\theta$ is within that interval with probability $(1-\alpha)$. Since both the interval bounds and $\theta$ are then deterministic numbers, $\theta$ is either inside or outside the interval with certainty.

$ \begin{align} \lim_{n \to \infty} \mathbf P \left(\theta \in \mathcal I\right) &\ge 1 - \alpha \\[12pt] \mathbf P(\theta \in[0.3, 0.7])&=0 \text{ or } 1 \end{align} $

Only as long as we talk about $\theta$ lying within a random interval around $\hat \Theta_n$ is it valid to associate a probability with the coverage, because the interval is still a r.v.. However, in both cases, we can call the interval an e.g. $95\%$ confidence interval.
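To tie the frequentist interpretation together, a final simulation sketch (my own illustration, assuming Bernoulli data; `p_true`, `trials` and the loop structure are made-up, not from the note). It estimates how often the conservative and plug-in intervals cover the true parameter across repeated experiments:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
p_true, n, alpha, trials = 0.3, 100, 0.05, 10_000
q = norm.ppf(1 - alpha / 2)

covered = {"conservative": 0, "plug-in": 0}
for _ in range(trials):
    x_bar = rng.binomial(1, p_true, size=n).mean()
    # Conservative: bound sigma by sqrt(1/4)
    half = q / (2 * np.sqrt(n))
    covered["conservative"] += (x_bar - half <= p_true <= x_bar + half)
    # Plug-in: sigma_hat = sqrt(x_bar * (1 - x_bar))
    se = np.sqrt(x_bar * (1 - x_bar) / n)
    covered["plug-in"] += (x_bar - q * se <= p_true <= x_bar + q * se)

for name, count in covered.items():
    # Expect coverage above 0.95 for the conservative CI, close to 0.95 for plug-in
    print(f"{name}: {count / trials:.3f}")
```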