When conducting many different [[Statistical Test|Statistical Tests]] simultaneously but evaluating them individually, the likelihood of false discoveries increases. For example, if 100 tests are conducted at a 5% significance level ($\alpha = 0.05$), we expect, on average, 5 false discoveries purely by chance. To mitigate this problem, two main approaches are used:

- Family-wise error rate
- False discovery rate

## Family-Wise Error Rate

The Family-Wise Error Rate ("FWER") controls the probability of making at least 1 false discovery among $n$ tests. It corresponds to the probability of the union of the $n$ rejection events. We want to set the critical values $C_1, \dots, C_n$ such that this probability is $\le \alpha$. This is a very conservative approach.

$$
\mathbf P_{\mu=0}\Big(\bigcup_{i=1}^n \, \{\vert T_i \vert > C_i\}\Big)\le \alpha
$$

We control the FWER through the Bonferroni correction, which tells us to reject each individual test at level $\alpha \over n$ instead of $\alpha$.

$$
\begin{align}
\mathbf P_{\mu=0}\Big(\bigcup_{i=1}^n \, \{\vert T_i \vert > q_{\alpha \over 2n}\}\Big) &\le\sum_{i=1}^n \overbrace{\mathbf P_{\mu=0}\big(\lvert T_i \rvert > q_{\alpha\over 2n}\big)}^{\alpha/n} \tag{1}\\[12pt]
\mathbf P_{\mu=0}\Big(\bigcup_{i=1}^n \, \{\vert T_i \vert > q_{\alpha \over 2n}\}\Big)&\le \sum_{i=1}^n \frac{\alpha}{n} \tag{2}\\[12pt]
\mathbf P_{\mu=0}\Big(\bigcup_{i=1}^n \, \{\vert T_i \vert > q_{\alpha \over 2n}\}\Big) &\le \alpha \tag{3}
\end{align}
$$

where:

- (1) By the union bound, the probability of making at least 1 false discovery in $n$ tests is less than or equal to the sum of the probabilities of a false discovery from each individual test. Equality holds only if the events (i.e. false discoveries) are completely disjoint (cannot happen together).
- (2) We set the new $\alpha$-level at $\alpha \over n$, which ensures that the probability of making at least 1 false discovery is $\le\alpha$.

**Example:** 2 independent tests, each performed at level $\alpha = 0.05$ (without correction).

$$
\begin{align}
\mathbf P_{\mu=0}\Big(\bigcup_{i=1}^2 \, \{\vert T_i \vert > q_{\alpha \over 2}\}\Big)&=1-(1-\alpha)^2=0.0975 \\
\sum_{i=1}^2 \mathbf P_{\mu=0}\big(\lvert T_i \rvert > q_{\alpha\over 2}\big) &= 2\alpha=0.10
\end{align}
$$

The probability of correctly not rejecting both times is $(1-\alpha)^2$; the probability of making at least one false rejection is its complement, which the union bound only slightly overestimates here.

**Example:** 2 perfectly correlated tests, each performed at level $\alpha=0.05$. Because the tests always reject together, the union event has the same probability as a single test, and the union bound is twice too large.

$$
\begin{aligned}
\mathbf P_{\mu=0}\Big(\bigcup_{i=1}^2 \, \{\vert T_i \vert > q_{\alpha \over 2}\}\Big)&=1-\alpha=0.05 \\
\sum_{i=1}^2 \mathbf P_{\mu=0}\big(\lvert T_i \rvert > q_{\alpha\over 2}\big) &= 2\alpha=0.10
\end{aligned}
$$
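As a quick numerical illustration, here is a minimal Python sketch of the Bonferroni correction (assuming two-sided z-tests and `scipy.stats`; the simulated statistics are hypothetical, with every null hypothesis true):

```python
import numpy as np
from scipy import stats

# Hypothetical setup: n two-sided z-tests, all null hypotheses true.
rng = np.random.default_rng(0)
n = 100
alpha = 0.05
test_statistics = rng.standard_normal(n)  # T_i ~ N(0, 1) under the null

# Uncorrected critical value q_{alpha/2} vs. Bonferroni-corrected q_{alpha/(2n)}
q_uncorrected = stats.norm.ppf(1 - alpha / 2)
q_bonferroni = stats.norm.ppf(1 - alpha / (2 * n))

# Every rejection here is a false discovery, since mu = 0 for all tests.
print(f"Uncorrected false discoveries: {(np.abs(test_statistics) > q_uncorrected).sum()}")
print(f"Bonferroni false discoveries:  {(np.abs(test_statistics) > q_bonferroni).sum()}")
```

With the corrected critical value, the probability of even a single false discovery across all 100 tests stays below $\alpha$, at the cost of much lower power per test.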
## False Discovery Rate

The False Discovery Rate ("FDR") is the expected ratio of the number of false rejections (type I errors) to the total number of rejections, with the convention that the ratio is $0$ when nothing is rejected. This is less conservative than the FWER, as only the rejected tests are counted in the denominator.

$$
\text{FDR}= \mathbb E\left [\frac{\sum_{i=1}^n \mathbf 1\{\lvert T_i \rvert > C_i,\ \mu_i=0\}}{\sum_{i=1}^n \mathbf 1 \{\lvert T_i \rvert > C_i\}} \right ]
$$

To control the FDR, we first sort the [[P-Value|P-Values]] of all conducted tests ascendingly.

$$
P_{(1)}<P_{(2)}<\dots <P_{(n)}
$$

Then we compare each p-value with $i\cdot\frac{\alpha}{n}$, where $i$ is the rank of the p-value. This correction is called the *Benjamini-Hochberg ("BH") correction*. Finally, we identify the largest rank $i_{\max}$ whose p-value falls below its respective threshold. We reject all tests $(i)$ with $i \le i_{\max}$, and fail to reject otherwise.

$$
i_{\max}=\max\Big\{i: P_{(i)}\le i\cdot\frac{\alpha}{n}\Big\}
$$

It turns out that this procedure guarantees $\text{FDR} \le \alpha$. Note that this guarantee holds only for a series of independent tests.

![[false-discover-rate.png|center|400]]
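Below is a minimal sketch of the BH procedure in Python (the function name `benjamini_hochberg` and the simulated p-values are illustrative; as noted above, the FDR guarantee assumes independent tests):

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Sketch of the BH procedure; returns a boolean mask of rejected tests."""
    p = np.asarray(p_values)
    n = p.size
    order = np.argsort(p)                         # sort p-values ascendingly
    thresholds = np.arange(1, n + 1) * alpha / n  # threshold i * alpha / n for rank i
    below = p[order] <= thresholds
    reject = np.zeros(n, dtype=bool)
    if below.any():
        i_max = np.nonzero(below)[0].max()        # largest rank with P_(i) <= i*alpha/n
        reject[order[: i_max + 1]] = True         # reject every test ranked <= i_max
    return reject

# Illustrative mix: 90 true nulls (uniform p-values) and 10 strong signals.
rng = np.random.default_rng(1)
p_vals = np.concatenate([rng.uniform(size=90), rng.uniform(0, 1e-3, size=10)])
print(f"Rejected {benjamini_hochberg(p_vals).sum()} of {p_vals.size} tests")
```

Sorting once and cutting at the largest qualifying rank mirrors the $i_{\max}$ definition above: every test ranked at or below $i_{\max}$ is rejected, even if some of those individual p-values sit above their own thresholds.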