The Markov inequality is useful for bounding the probability that a non-negative [[Random Variable]] $X$ takes on extreme values. It applies when we know the [[Expectation]] of $X$ but lack specific details about its distribution, especially about its tails.
If $X \ge 0$ and $a > 0$, then:
$ \mathbf P(X \ge a) \leq \frac{\mathbb{E}[X]}{a} $
The larger the constant $a$, the smaller the bound on the probability that $X$ exceeds it, and vice versa.
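For instance, if $\mathbb{E}[X] = 2$, then $\mathbf P(X \ge 10) \leq \frac{2}{10} = 0.2$: whatever the shape of the distribution, at most $20\%$ of the probability mass can lie at or above $10$.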
**Derivation:**
$
\begin{align}
\mathbb{E}[X] = \int_0^\infty xf_X(x) \, dx
&\ge \int_a^\infty xf_X(x) \,dx \tag{1}\\
&\ge \int_a^\infty a f_X(x) \,dx \tag{2}\\
&= a \int_a^\infty f_X(x) \,dx \tag{3}\\[10pt]
&= a \, \mathbf P(X \ge a) \tag{4}
\end{align}
$
where:
- (1) Since $X \ge 0$, its expectation is the integral of $x f_X(x)$ over $[0, \infty)$. Because the integrand is non-negative, this is at least as large as the same integral over the shorter range $[a, \infty)$.
- (2) When integrating over $[a, \infty)$, the smallest value $x$ can take is $a$. Replacing $x$ with $a$ can therefore only shrink the integral, so the inequality still holds.
- (3) As $a$ is a constant, it can be pulled out of the integral; this step is an equality.
- (4) The remaining integral is exactly the tail probability $\mathbf P(X \ge a)$.

Dividing both sides by $a > 0$ yields the stated bound.
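As a sanity check, the chain (1)–(4) can be verified numerically. Below is a minimal sketch for $X \sim \text{Exponential}(\lambda = 1)$ with $a = 2$; both choices are arbitrary, and `scipy.integrate.quad` is used for the integrals:

```python
import numpy as np
from scipy.integrate import quad

# Verify steps (1)-(4) for X ~ Exponential(rate=1) with a = 2.
lam, a = 1.0, 2.0
pdf = lambda x: lam * np.exp(-lam * x)  # f_X(x) for x >= 0

e_x    = quad(lambda x: x * pdf(x), 0, np.inf)[0]  # E[X] = 1
step1  = quad(lambda x: x * pdf(x), a, np.inf)[0]  # (1): same integrand, shorter range
step2  = quad(lambda x: a * pdf(x), a, np.inf)[0]  # (2): x replaced by its minimum a
step34 = a * quad(pdf, a, np.inf)[0]               # (3)+(4): a * P(X >= a)

assert e_x >= step1 >= step2 and np.isclose(step2, step34)
print(f"{e_x:.3f} >= {step1:.3f} >= {step2:.3f} = {step34:.3f}")
```

This prints `1.000 >= 0.406 >= 0.271 = 0.271`, matching the ordering of the derivation.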
**Example:**
Let $X$ follow an [[Exponential Distribution]] with rate $\lambda = 1$. Since we know its full distribution, we can compare the exact tail probability with the bound given by the Markov inequality.
Exact tail probability:
$ \mathbf P(X \ge a) = e^{-a\lambda}$
Markov bound:
$ \mathbf P(X \ge a) \leq \frac{\mathbb{E}[X]}{a} = \frac{1}{\lambda a}$
With $\lambda = 1$, as $a$ increases:
* The exact tail probability decreases exponentially.
* The Markov bound decreases only like $1/a$ (see the numerical sketch below).
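A quick tabulation makes the gap concrete. This is a minimal NumPy sketch; the grid of $a$ values is arbitrary:

```python
import numpy as np

# Exact tail vs. Markov bound for X ~ Exponential(rate=1), where E[X] = 1.
for a in [1, 2, 5, 10, 20]:
    exact = np.exp(-a)   # exact tail: P(X >= a) = e^{-a}
    bound = 1.0 / a      # Markov bound: E[X]/a
    print(f"a={a:>2}  exact={exact:.2e}  bound={bound:.3f}  bound/exact={bound/exact:.1e}")
```

Already at $a = 10$ the bound ($0.1$) exceeds the exact tail ($\approx 4.5 \times 10^{-5}$) by more than three orders of magnitude.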
> [!note]
> Since exponential decay is much faster than the $1/a$ decay of the bound, the Markov inequality provides only a loose bound for large $a$: the larger $a$ gets, the further the bound lies above the true tail probability.