In classical ("frequentist") statistics we view $\theta$ as an unknown but fixed value that we want to estimate. In Bayesian statistics $\Theta$ is a [[Random Variable]] itself. The result of a Bayesian analysis is therefore a (posterior) distribution over that random variable. By including a prior belief about $\Theta$ (expert knowledge) in the analysis, we can obtain sharper estimates with fewer observations from the data.

[[Bayes Rule]] in the standard notation of observations $X$ and parameters $\Theta$:

$$ \mathbf P(\theta \vert X) = \frac{\mathbf P(X \vert \theta) \cdot \mathbf P(\theta)}{\mathbf P(X)} $$

| Notation                    | Name          | Description                                                                                                    |
| --------------------------- | ------------- | -------------------------------------------------------------------------------------------------------------- |
| $\mathbf P(\theta)$         | Prior         | Belief about $\theta$ before having seen any data.                                                              |
| $\mathbf P(X)$              | Normalization | Probability of seeing the data (can be neglected, as we are only looking to maximize the numerator).            |
| $\mathbf P(X \vert \theta)$ | Likelihood    | Probability of seeing the specific data, under the assumption that $\Theta$ takes the specific value $\theta$.  |
| $\mathbf P(\theta \vert X)$ | Posterior     | Probability that $\Theta$ takes a specific value $\theta$, given the data seen so far.                          |

The result of a [[Bayes Rule#Bayesian Inference|Bayesian inference]] is the posterior distribution $\mathbf P(\theta \vert X)$. However, sometimes we want to summarize our findings about the posterior into a single estimate $\hat \theta$. The function that transforms the posterior into an estimate is called the "estimator" $\hat \Theta$.

![[bayesian framework.png|center|500]]

**Estimators:** Depending on the chosen estimator, there are different metrics to quantify the estimation error.

- [[MAP Estimator]] ($\hat \Theta_{\mathrm{MAP}}$): probability of error
- [[LMS Estimator]] ($\hat \Theta_{\mathrm{LMS}}$): mean squared error

**Choice of Estimator:**

- If the posterior distribution has a single mode and is symmetric (e.g. a normal distribution), both $\hat \Theta_{\mathrm{MAP}}$ and $\hat \Theta_{\mathrm{LMS}}$ lead to the same result (otherwise they generally do not).
- If the posterior is not unimodal (has more than one mode), the [[MAP Estimator]] is inconclusive.
- While LMS is relevant for estimation problems, MAP can be used for [[Total Variation Distance]], where the [[MAP Estimator#Conditional Probability of Error|probability of error]] for discrete decisions is what matters most.

**Terminology:**

| Term          | Explanation                                                                             |
| ------------- | --------------------------------------------------------------------------------------- |
| $X$           | Random variable of the data-generating process.                                          |
| $x$           | Actually observed data.                                                                  |
| $\Theta$      | Random variable of the unknown parameter.                                                |
| $\theta$      | A specific value that $\Theta$ can take.                                                 |
| $\hat \Theta$ | The estimator. It is a function of $X$.                                                  |
| $\hat \theta$ | The specific estimate provided by the estimator function $\hat \Theta(x)$ when $X = x$.  |
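**Worked sketch:** To make the framework above concrete, here is a minimal sketch in Python (NumPy only), assuming a Bernoulli coin-flip model with a Beta(2, 2) prior; the data and prior parameters are purely illustrative, not prescribed anywhere above. It evaluates the numerator of Bayes' rule on a grid of candidate $\theta$ values, normalizes (which plays the role of $\mathbf P(X)$), and reads off both estimators.

```python
import numpy as np

# Grid of candidate values for theta (the values the r.v. Theta can take).
thetas = np.linspace(0, 1, 1001)

# Prior P(theta): unnormalized Beta(2, 2) density (an illustrative assumption).
prior = thetas**(2 - 1) * (1 - thetas)**(2 - 1)

# Observed coin flips x (illustrative data): k successes out of n trials.
x = np.array([1, 0, 1, 1, 0, 1, 1, 1])
k, n = x.sum(), len(x)

# Likelihood P(X | theta), evaluated at each grid point.
likelihood = thetas**k * (1 - thetas)**(n - k)

# Numerator of Bayes' rule, then normalize so it sums to 1 (replaces P(X)).
posterior = likelihood * prior
posterior /= posterior.sum()

# MAP estimate: the mode of the posterior.
theta_map = thetas[np.argmax(posterior)]
# LMS estimate: the posterior mean E[Theta | X].
theta_lms = np.sum(thetas * posterior)

print(f"MAP estimate: {theta_map:.3f}")   # 0.700
print(f"LMS estimate: {theta_lms:.3f}")   # ~0.667
```

Note that this posterior is unimodal but *not* symmetric, so the two estimates differ (0.700 vs. ≈0.667), illustrating the point under **Choice of Estimator** that MAP and LMS only coincide for a symmetric, single-mode posterior.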