In classical (“frequentist”) statistics we view $\theta$ as an unknown but fixed value that we want to estimate. In Bayesian statistics, $\Theta$ is a [[Random Variable]] itself. Therefore, the result of a Bayesian analysis is a (posterior) distribution over that random variable.
By including a prior belief about $\Theta$ (e.g. expert knowledge) in the analysis, we can obtain sharper estimates with fewer observations from the data.
[[Bayes Rule]], in the standard notation with observations $X$ and parameters $\Theta$:
$$
\mathbf P(\theta \vert X) = \frac{\mathbf P(X \vert \theta) \cdot \mathbf P(\theta)}{\mathbf P(X)}
$$
| Notation | Name | Description |
| --------------------------- | ------------- | ----------------------------------------------------------------------------------------------------------- |
| $\mathbf P(\theta)$ | Prior | Belief about $\theta$ before having seen any data. |
| $\mathbf P(X)$ | Normalization | Probability of seeing the data. It does not depend on $\theta$, so it can be ignored when maximizing the posterior over $\theta$. |
| $\mathbf P(X \vert \theta)$ | Likelihood | Probability of seeing the data, under the assumption that $\Theta$ takes the specific value $\theta$. |
| $\mathbf P(\theta \vert X)$ | Posterior | Probability that $\Theta$ takes a specific value $\theta$, given the data I have seen so far. |
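As a minimal numeric sketch (the model and all variable names are illustrative: i.i.d. Bernoulli coin tosses with unknown bias $\theta$ and a uniform prior on a discretized grid), Bayes rule turns the prior into a posterior as follows:

```python
import numpy as np

# Grid of candidate values for theta (the coin's heads probability).
theta_grid = np.linspace(0.0, 1.0, 1001)

# Prior P(theta): uniform belief before seeing any data.
prior = np.ones_like(theta_grid)
prior /= prior.sum()

# Observed tosses (1 = heads, 0 = tails).
x = np.array([1, 0, 1, 1, 0, 1, 1, 1])
k, n = x.sum(), x.size

# Likelihood P(X | theta) for i.i.d. Bernoulli tosses.
likelihood = theta_grid**k * (1.0 - theta_grid)**(n - k)

# Bayes rule: posterior ∝ likelihood * prior; dividing by the sum
# plays the role of the normalization term P(X).
posterior = likelihood * prior
posterior /= posterior.sum()
```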
The result of a [[Bayes Rule#Bayesian Inference|Bayesian inference]] is the posterior distribution $\mathbf P(\theta\vert X)$. However, sometimes we want to summarize our findings about the posterior in a single estimate $\hat \theta$. The function that transforms the posterior into an estimate is called the estimator $\hat \Theta$.
![[bayesian framework.png|center|500]]
**Estimators:**
Depending on the chosen estimator, a different error metric is minimized (formal definitions follow below):
- [[MAP Estimator]] ($\hat \Theta_{\mathrm{MAP}}$): minimizes the probability of error
- [[LMS Estimator]] ($\hat \Theta_{\mathrm{LMS}}$): minimizes the mean squared error
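Formally, both estimators are functions of the observed data $x$: MAP picks the mode of the posterior, LMS its mean (the conditional expectation):
$$
\hat \theta_{\mathrm{MAP}} = \arg\max_{\theta}\, \mathbf P(\theta \vert X = x), \qquad \hat \theta_{\mathrm{LMS}} = \mathbf E[\Theta \vert X = x]
$$
Continuing the grid sketch above (same illustrative names):

```python
# MAP: value of theta where the posterior is largest.
theta_map = theta_grid[np.argmax(posterior)]

# LMS: posterior mean E[Theta | X = x].
theta_lms = np.sum(theta_grid * posterior)
```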
**Choice of Estimator:**
- If the posterior distribution is unimodal and symmetric (e.g. a normal distribution), $\hat \Theta_{\mathrm{MAP}}$ and $\hat \Theta_{\mathrm{LMS}}$ lead to the same result; in general they do not.
- In cases where the posterior is multimodal (has more than one mode), the [[MAP Estimator]] can be inconclusive, since several values of $\theta$ may (nearly) maximize the posterior; see the sketch after this list.
- While LMS is the natural choice for estimation problems, MAP is the natural choice for decision problems with discrete actions (cf. [[Total Variation Distance]]), where the [[MAP Estimator#Conditional Probability of Error|probability of error]] matters most.
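A small sketch of the multimodal case (the mixture posterior below is made up purely for illustration): with two unequal bumps, MAP commits to the taller mode while LMS averages over both, possibly landing in a region of low posterior probability.

```python
import numpy as np

theta_grid = np.linspace(-5.0, 5.0, 2001)

def bump(mu):
    # Unnormalized normal bump centered at mu with sigma = 0.5.
    return np.exp(-0.5 * ((theta_grid - mu) / 0.5) ** 2)

# Hypothetical bimodal posterior: a 60/40 mixture of two narrow bumps.
posterior = 0.6 * bump(-2.0) + 0.4 * bump(2.0)
posterior /= posterior.sum()

theta_map = theta_grid[np.argmax(posterior)]   # ≈ -2.0, the taller mode
theta_lms = np.sum(theta_grid * posterior)     # ≈ -0.4, between the modes
```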
**Terminology:**
| Term | Explanation |
| ------------- | ------------------------------------------------------------------------------------- |
| $X$ | Random variable of the data-generating process. |
| $x$ | Actually observed data. |
| $\Theta$ | Random variable of the unknown parameter. |
| $\theta$ | A specific value that $\Theta$ can take. |
| $\hat \Theta$ | The estimator. It is a function of $X$. |
| $\hat \theta$ | The specific estimate provided by the estimator function $\hat \Theta(x)$ when $X=x$. |