In maximum a posteriori (MAP) estimation, we choose the $\theta$ that maximizes the posterior distribution: given the data $X$ that we have observed, we pick the $\theta$ that is most likely under that condition.

- For a discrete r.v., $\hat\Theta_{\mathrm{MAP}}$ is equal to the mode of the posterior.
- For a continuous r.v., $\hat\Theta_{\mathrm{MAP}}$ is at the point of the posterior's maximum density.

Maximizing mass/density over candidate $\theta$ values:

$
\begin{align}
p_{\Theta \vert X}(\theta^\star \vert x) &= \max_{\theta} \Big( p_{\Theta \vert X}(\theta \vert x) \Big) \\
f_{\Theta \vert X}(\theta^\star \vert x) &= \max_{\theta} \Big( f_{\Theta \vert X}(\theta \vert x) \Big)
\end{align}
$

The left side of each equation is the posterior PMF/PDF evaluated at the specific point $\theta^\star$, given the realized observations $x$; the right side searches over all $\theta$ for the one with maximal mass/density. In the continuous case, when the posterior density is differentiable, we find the maximum by taking the derivative of the PDF (or of its logarithm) and setting it to $0$.

## Conditional Probability of Error

**Definition:** It is the probability that the estimator $\hat\Theta$ does not equal the true parameter $\Theta$ when data $X=x$ is observed.

$
\mathbf P(\hat\Theta \neq \Theta \vert X=x)
$

Since the observations $x$ are fixed, the estimator yields a specific estimate $\hat\theta$. However, $\Theta$ remains a r.v., since it is still unknown to us.

$
\mathbf P(\Theta \neq \hat\theta \vert X=x) = 1 - \mathbf P(\Theta = \hat\theta \vert X=x)
$

> [!note:]
> The MAP estimator $\hat\Theta_{\mathrm{MAP}}$ chooses the $\hat\theta$ with the maximum mass (or density) conditional on the data $x$. Consequently, it always achieves the minimum conditional probability of error: no other estimator can make $1 - \mathbf P(\Theta = \hat\theta \vert X=x)$ smaller.

## Overall Probability of Error

**Definition:** It is the probability that the estimator $\hat\Theta$ does not equal the true parameter $\Theta$, irrespective of the realized values of $X$.

$
\mathbf P(\Theta \neq \hat\Theta)
$

This measures the estimator's performance before observing any data and provides a general assessment of its accuracy across all possible scenarios. To obtain this unconditional probability we use the total probability theorem (TPT); a numerical sketch of both approaches appears at the end of this section.

- *Approach 1:* Take a weighted sum of the probabilities of being wrong conditional on $x$, summing over all possible $x$-values. The weights are the probability mass of each respective $x$-value.

$
\mathbf P(\Theta \neq \hat\Theta) = \sum_{x} \mathbf P(\Theta \neq \hat\Theta \vert X=x) \cdot p_X(x)
$

- *Approach 2:* Take a weighted sum of the probabilities of being wrong conditional on what the true parameter $\theta$ is, summing over all possible $\theta$-values. The weights come from the PMF of the prior distribution of $\Theta$.

$
\mathbf P(\Theta \neq \hat\Theta) = \sum_{\theta} \mathbf P(\Theta \neq \hat\Theta \vert \Theta=\theta) \cdot p_{\Theta}(\theta)
$

> [!note:]
> The MAP estimator minimizes the conditional probability of error for every $x$. Since the overall probability of error is just a weighted sum of these conditional probabilities, $\hat\Theta_{\mathrm{MAP}}$ also attains the minimum overall error.
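To make the MAP estimate, the conditional error, and both TPT computations of the overall error concrete, here is a minimal numerical sketch in Python. It assumes a hypothetical discrete setup, not one from the source: a coin bias $\Theta$ drawn uniformly from $\{0.3, 0.5, 0.7\}$, with $X$ the number of heads in $n = 5$ flips; all variable names are illustrative.

```python
import numpy as np
from scipy.stats import binom

# Hypothetical setup (assumption, not from the source): Theta is a coin
# bias in {0.3, 0.5, 0.7} with a uniform prior; X = number of heads in n flips.
thetas = np.array([0.3, 0.5, 0.7])   # support of the prior
prior = np.array([1/3, 1/3, 1/3])    # p_Theta(theta)
n = 5                                # number of coin flips
xs = np.arange(n + 1)                # possible observations x = 0..n

# Likelihood matrix p_{X|Theta}(x | theta): rows indexed by x, columns by theta
likelihood = np.array([[binom.pmf(x, n, t) for t in thetas] for x in xs])

# Bayes' rule: p_{Theta|X}(theta | x) = p_{X|Theta}(x | theta) p_Theta(theta) / p_X(x)
joint = likelihood * prior           # p_{X,Theta}(x, theta)
p_x = joint.sum(axis=1)              # marginal p_X(x), the normalizing constant
posterior = joint / p_x[:, None]     # posterior PMF for every possible x

# MAP estimate for each x: the theta with the maximum posterior mass
map_idx = posterior.argmax(axis=1)
theta_map = thetas[map_idx]

# Conditional probability of error: 1 - max_theta p_{Theta|X}(theta | x)
cond_error = 1.0 - posterior.max(axis=1)

# Overall error, Approach 1: weight the conditional errors by p_X(x)
overall_error_1 = np.sum(cond_error * p_x)

# Overall error, Approach 2: condition on the true theta and weight by the
# prior; P(error | Theta = theta_j) sums p_{X|Theta}(x | theta_j) over all x
# for which the MAP rule picks a different theta.
overall_error_2 = sum(
    prior[j] * sum(likelihood[x, j] for x in xs if map_idx[x] != j)
    for j in range(len(thetas))
)

print(theta_map)                         # MAP estimate for each observation x
print(overall_error_1, overall_error_2)  # the two TPT computations agree
```

The two printed overall-error values coincide, illustrating that summing the conditional errors weighted by $p_X(x)$ and conditioning on the true $\theta$ weighted by the prior are just two applications of the total probability theorem to the same event $\{\Theta \neq \hat\Theta\}$.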