In maximum a posteriori (MAP) estimation, we want to choose the $\theta$ that maximizes the posterior distribution. So given the data $X$ that we have seen, we choose the $\theta$ that is most likely under that condition.
- For a discrete r.v., $\hat \Theta_{\mathrm{MAP}}$ is equal to the mode of the posterior PMF.
- For a continuous r.v., $\hat \Theta_{\mathrm{MAP}}$ lies at the point of the posterior's maximum density.
Maximizing mass/density by trying different $\theta$ values:
$ \begin{align}
p_{\Theta \vert X}(\theta^\star \vert x) &= \max_{\theta} \Big(p_{\Theta \vert X}(\theta \vert x)\Big) \\
f_{\Theta \vert X}(\theta^\star \vert x) &= \max_{\theta} \Big(f_{\Theta \vert X}(\theta \vert x)\Big)
\end{align} $
The left-hand side of the above equations is the PMF/PDF of the posterior distribution at a specific point $\theta^\star$, given the realized observations $x$. The MAP estimate $\theta^\star$ is the $\theta$ at which this distribution attains its maximum. In the continuous case we can find it by differentiating the PDF with respect to $\theta$ and setting the derivative to $0$; in the discrete case we simply compare the posterior masses.
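As a minimal numeric sketch of the discrete case (all numbers here are hypothetical, not from the text): suppose a coin's bias $\Theta$ can take one of three candidate values, we observe $x$ heads in $n$ flips, and we report the candidate with the largest posterior mass.

```python
import numpy as np
from scipy.stats import binom

# Hypothetical setup: the coin's bias Theta takes one of three values.
thetas = np.array([0.2, 0.5, 0.8])   # candidate parameter values
prior  = np.array([0.3, 0.4, 0.3])   # prior PMF p_Theta(theta)

n, x = 10, 7                         # observed data: 7 heads in 10 flips

# Bayes' rule: posterior mass is proportional to likelihood * prior.
likelihood = binom.pmf(x, n, thetas)     # p_{X|Theta}(x | theta)
posterior  = likelihood * prior
posterior /= posterior.sum()             # divide by p_X(x) to normalize

# MAP estimate: the theta with maximum posterior mass (the posterior mode).
theta_map = thetas[np.argmax(posterior)]
print(theta_map)                         # 0.8 for these numbers
```

For a continuous posterior the same idea applies to a density: either solve $\frac{\partial}{\partial\theta} f_{\Theta \vert X}(\theta \vert x) = 0$ analytically or maximize numerically over a grid of $\theta$ values.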
## Conditional Probability of Error
**Definition:** It is the probability that the estimator $\hat \Theta$ does not equal the true parameter $\Theta$, given that the data $X=x$ have been observed.
$ \mathbf P(\hat\Theta \neq \Theta \vert X=x) $
Since the observations $x$ are fixed, the estimator yields a specific estimate $\hat \theta$. However, $\Theta$ remains a r.v., since it is still unknown to us.
$ \mathbf P(\Theta \neq \hat \theta \vert X=x) = 1- \mathbf P(\Theta=\hat \theta \vert X=x) $
> [!note:]
> Under the MAP-estimator $\hat \Theta_{\mathrm{MAP}}$, we choose the $\hat \theta$ that has the maximum mass (or density) conditional on the data $x$. Equivalently, since the conditional probability of error is $1$ minus that mass, this estimator always attains the minimum conditional probability of error.
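A self-contained numeric sketch of this claim (the posterior masses are made up for illustration): whichever $\hat \theta$ we report, the conditional probability of error is one minus the posterior mass we bet on, so the MAP choice makes that error as small as possible.

```python
import numpy as np

# Hypothetical posterior masses over three candidate theta values, given x.
posterior = np.array([0.05, 0.35, 0.60])   # p_{Theta|X}(theta | x)

# Conditional probability of error: 1 - P(Theta = theta_hat | X = x).
err_map = 1.0 - posterior.max()            # MAP pick      -> 0.40
err_alt = 1.0 - posterior[1]               # any other pick -> 0.65, strictly worse
print(err_map, err_alt)
```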
## Overall Probability of Error
**Definition:** It is the probability that the estimator $\hat \Theta$ does not equal the true parameter $\Theta$, irrespective of the realized value of $X$.
$ \mathbf P(\Theta \neq \hat \Theta) $
This measures the estimator's performance before observing any data. It provides a general assessment of its accuracy across all possible scenarios. To obtain this unconditional probability, we use the total probability theorem (TPT).
- *Approach 1:* Do a weighted sum over the probabilities of being wrong conditional on $x$. We sum over all possible $x$-values. The weights are the probability mass of each respective $x$-value.
$ \mathbf P(\Theta \neq \hat \Theta)= \sum_{x} \mathbf P(\Theta \neq \hat \Theta \vert X=x)\cdot p_X(x) $
- *Approach 2:* Do a weighted sum over the probabilities of being wrong conditional on what the true parameter $\theta$ is. We sum over all possible $\theta$ values. The weights come from the PMF of the prior distribution of $\Theta$ (both approaches are compared numerically in the sketch after this list).
$ \mathbf P(\Theta \neq \hat \Theta) =\sum_{\theta} \mathbf P(\Theta \neq \hat\Theta \vert \Theta=\theta)\cdot p_{\Theta}(\theta) $
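A small self-contained sketch (all probabilities hypothetical) that computes the overall error both ways for a two-valued $\Theta$ and $X$ and confirms the two approaches agree:

```python
import numpy as np

# Hypothetical model: Theta and X each take two values.
prior = np.array([0.6, 0.4])          # p_Theta(theta)
lik   = np.array([[0.8, 0.2],         # p_{X|Theta}(x | theta), rows = theta,
                  [0.3, 0.7]])        #                         cols = x

joint = prior[:, None] * lik          # p_{Theta,X}(theta, x)
p_x   = joint.sum(axis=0)             # marginal p_X(x)
posterior = joint / p_x               # p_{Theta|X}(theta | x), one column per x

map_idx = posterior.argmax(axis=0)    # index of the MAP estimate for each x

# Approach 1: weight the conditional errors by the mass p_X(x) of each x.
cond_err = 1.0 - posterior.max(axis=0)
err1 = np.sum(cond_err * p_x)

# Approach 2: weight P(error | Theta = theta) by the prior p_Theta(theta).
wrong = map_idx[None, :] != np.arange(2)[:, None]  # estimate wrong at (theta, x)?
err2 = np.sum((lik * wrong).sum(axis=1) * prior)

print(err1, err2)                     # identical: 0.24 for these numbers
```

Both routes are the TPT applied to the same event $\{\Theta \neq \hat \Theta\}$, conditioning on $X$ in the first and on $\Theta$ in the second.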
> [!note:]
> The MAP-estimator minimizes the conditional probability of error for every $x$. Since the overall probability of error is just a weighted sum of these conditional probabilities with nonnegative weights, $\hat \Theta_{\text{MAP}}$ also attains the minimum overall probability of error.