**Trinity of Statistical Inference:**
- *Estimation:* Choose an estimator $\hat \Theta$ to compute a good estimate $\hat \theta$ of the parameter $\theta$.
- *Confidence intervals:* Construct error bounds around the estimate $\hat \theta$ to quantify uncertainty.
- *Hypothesis testing:* Assess whether the data provide statistically significant evidence against a null hypothesis about $\theta$ (reject or fail to reject the null hypothesis).
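
As a rough illustration of the three tasks, the following sketch runs them on a single sample of coin flips. The data (58 heads in 100 flips), the Wald interval, the two-sided z-test and the 5% level are all assumptions chosen for the example, not something this note prescribes.

```python
import math

# Hypothetical data: 100 coin flips with 58 heads (made up for illustration).
n, heads = 100, 58

# Estimation: the sample proportion is the estimate of p.
p_hat = heads / n

# Confidence interval: Wald interval at roughly 95% (normal approximation).
z = 1.96
se = math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - z * se, p_hat + z * se)

# Hypothesis testing: H0: p = 0.5 against H1: p != 0.5 with a z-test.
p0 = 0.5
z_stat = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
# Two-sided p-value; the standard normal CDF is written in terms of erf.
normal_cdf = 0.5 * (1 + math.erf(abs(z_stat) / math.sqrt(2)))
p_value = 2 * (1 - normal_cdf)
reject = p_value < 0.05

print(f"estimate={p_hat:.3f}")
print(f"CI=({ci[0]:.3f}, {ci[1]:.3f})")
print(f"p-value={p_value:.3f}, reject H0: {reject}")
```

All three steps use the same estimate $\hat \theta$; they differ only in the question they answer about it.
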
## Statistical Model
A statistical experiment yields a sample of $n$ observations $X_1, \dots, X_n$. The statistical model we build on top of it is a pair, consisting of a sample space and a family of probability distributions indexed by a parameter:
$ (E, \{\mathbf P_\theta\}_{\theta \in \Theta}) $
| Variable | Description |
| ------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| $E$ | The [[Sample Space]] is the set of all possible outcomes of the experiment. We choose the smallest sample space that covers all possibilities. |
| $\{\mathbf P_\theta\}_{\theta \in \Theta}$ | A family of probability distributions on the sample space (e.g. if $X$ takes only non-negative integer values, then the [[Poisson Distribution]] is a valid distribution family). |
| $\Theta$ | The parameter space: the set of all possible parameter values. E.g. in a [[Bernoulli Distribution]] the parameter $p$ lies in $[0,1]$. |
| $\theta$ | The true parameter: the unknown value in $\Theta$ for which $\mathbf P_\theta$ is the distribution that generated the data. |
**Model Specification:**
We say that a model is well-specified if the actual distribution $\mathbf P$ that generated the data belongs to the chosen family of probability distributions $\{\mathbf P_\theta\}_{\theta \in \Theta}$, i.e. if $\mathbf P = \mathbf P_\theta$ for some $\theta \in \Theta$.
>[!note:]
>We write $\theta$ when we talk about the parameter of an abstract probability model. Once a concrete model is chosen, we use its usual parameter names, e.g. $p, \space \mu, \space\lambda, \space \dots$
## Model Types
- *Parametric model:* The parameter space is a subset of a finite-dimensional space, $\Theta \subseteq \mathbb R^d$ for some finite $d$. E.g. a Gaussian has a two-dimensional parameter space:
$ \Theta = \Big\{(\mu, \sigma^2) : \mu \in (-\infty, \infty),\space \sigma^2 \in (0, \infty)\Big\} $
- *Nonparametric model:* The parameter space $\Theta$ is infinite-dimensional. We do not commit to a specific distribution family that would reduce the problem to a finite number of parameters.
- *Semiparametric model:* The parameter space has two parts, $\Theta = \Theta_1 \times \Theta_2$, where $\Theta_1$ is finite-dimensional and $\Theta_2$ is infinite-dimensional. We only want to estimate the component in $\Theta_1$; the component in $\Theta_2$ is called the nuisance parameter.
## Examples of Parametric Models
- For $n$ Bernoulli trials each observation is either $0$ or $1$, while the parameter $p$ lies in the open interval $(0,1)$, excluding the degenerate cases $p = 0$ and $p = 1$.
$ \Big(\{0,1\}, \{\text{Ber}(p)\}_{p \in (0,1)}\Big) $
- $X_1, \dots , X_n$ are i.i.d. Poisson random variables. The sample space is the set of non-negative integers (written $\mathbb N$ here, including $0$), and the parameter $\lambda$ can be any positive number.
$ \Big(\mathbb N, \{\text{Poiss}(\lambda)\}_{\lambda >0 }\Big) $
- $X_1, \dots , X_n$ are i.i.d. normal random variables. The sample space is the real line, and the two parameters have different ranges: $\mu$ can be any real number, while $\sigma^2$ must be positive.
$ \Big(\mathbb R, \{\mathcal N(\mu, \sigma^2)\}_{(\mu, \sigma^2) \in \mathbb R \times (0, \infty)}\Big) $
>[!note:]
>The symbol $\times$ denotes a [Cartesian product](https://en.wikipedia.org/wiki/Cartesian_product). Here the parameter space is the set of all pairs $(\mu, \sigma^2)$ with $\mu \in \mathbb R$ and $\sigma^2 \in (0, \infty)$.
>[!note:]
>We only write out the part of the parameter space that is unknown to us. E.g. if we already know $\sigma^2$ (or take its value as given), we do not include it in the specification.
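
To make the pair of sample space and distribution family concrete, here is a minimal sketch that encodes the two discrete examples above in Python. The `ParametricModel` class, its field names and the helper `sample_is_valid` are hypothetical, introduced only for this illustration; they are not part of any library.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ParametricModel:
    """Hypothetical container for the pair (E, {P_theta})."""
    in_sample_space: Callable[[object], bool]     # membership test for E
    in_parameter_space: Callable[[float], bool]   # membership test for Theta
    pmf: Callable[[int, float], float]            # P_theta(X = x)


# ({0, 1}, {Ber(p)} with p in (0, 1))
bernoulli = ParametricModel(
    in_sample_space=lambda x: x in (0, 1),
    in_parameter_space=lambda p: 0 < p < 1,
    pmf=lambda x, p: p if x == 1 else 1 - p,
)

# (non-negative integers, {Poiss(lambda)} with lambda > 0)
poisson = ParametricModel(
    in_sample_space=lambda x: isinstance(x, int) and x >= 0,
    in_parameter_space=lambda lam: lam > 0,
    pmf=lambda x, lam: math.exp(-lam) * lam**x / math.factorial(x),
)


def sample_is_valid(model, sample):
    """Check that every observation lies in the model's sample space E."""
    return all(model.in_sample_space(x) for x in sample)


print(sample_is_valid(bernoulli, [0, 1, 1, 0]))   # all observations in {0, 1}
print(sample_is_valid(bernoulli, [0, 2]))         # 2 is outside {0, 1}
print(poisson.in_parameter_space(-1.0))           # lambda must be positive
```

The normal model is left out because it is continuous and would need a density instead of a probability mass function, with a two-dimensional parameter $(\mu, \sigma^2)$.
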
## Example of Nonparametric Model
$X_1, \dots , X_n \in \mathbb R$ are [[Independence and Identical Distribution|i.i.d.]] with an unknown PDF, about which we only know that it is unimodal (it has a single mode). There is no finite number of parameters to estimate, since $\Theta$ is the set of all unimodal distributions.
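
One way to see what "no finite number of parameters" means in practice is that a nonparametric estimate is a whole function and not a short vector of numbers. The sketch below uses the empirical CDF as an example of such an estimate. This is a swapped-in standard technique that this note does not discuss, and it does not make use of the unimodality assumption; the data values are made up.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return F_n, where F_n(t) is the fraction of observations <= t."""
    ordered = sorted(sample)
    n = len(ordered)

    def F_n(t):
        # bisect_right counts how many sorted observations are <= t.
        return bisect_right(ordered, t) / n

    return F_n


# Hypothetical observations, made up for illustration.
data = [2.1, 0.4, 1.7, 3.3, 1.9, 2.4]
F = empirical_cdf(data)

for t in (0.0, 1.9, 10.0):
    print(t, F(t))
```

The estimate `F` is defined for every real `t`, so describing it takes the entire sample and not a fixed number of parameters.
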