**Trinity of Statistical Inference:**
- *Estimation:* Choose an estimator $\hat \Theta$ to compute a good estimate $\hat \theta$ of the parameter $\theta$.
- *Confidence intervals:* Construct error bounds around the estimate $\hat \theta$ to quantify uncertainty.
- *Hypothesis testing:* Assess whether the data provide statistically significant evidence against a null hypothesis about $\theta$ (reject or fail to reject the null hypothesis).

## Statistical Model

We have a statistical experiment that gives a sample of $n$ observations $X_1, \dots, X_n$. The statistical model that we build on top of this sample is the pair:

$$
(E, \{\mathbf P_\theta\}_{\theta \in \Theta})
$$

| Variable | Description |
| --- | --- |
| $E$ | The [[Sample Space]] is the set of all possible outcomes of the experiment. We try to define the smallest possible sample space that covers all possibilities. |
| $\{\mathbf P_\theta\}_{\theta \in \Theta}$ | A specific family of probability distributions on the sample space (e.g. if $X$ only takes non-negative integer values, then the [[Poisson Distribution]] is a valid distribution family). |
| $\Theta$ | The space of all possible parameter values. E.g. in a [[Bernoulli Distribution]] the parameter $p$ lies within $[0,1]$. |
| $\theta$ | A parameter value that identifies one distribution $\mathbf P_\theta$ in the family. The true parameter is the value whose distribution generated the data (assuming a well-specified model). |

**Model Specification:** We say that a model is well-specified if the actual distribution $\mathbf P$ that generated the data is a member of the chosen family of probability distributions $\{\mathbf P_\theta\}_{\theta \in \Theta}$, i.e. $\mathbf P = \mathbf P_\theta$ for some $\theta \in \Theta$.

>[!note:]
>We use the notation $\theta$ when we talk about the abstract parameter of a probability model.
>However, as soon as the model is defined, we use the respective symbol ($p$, $\mu$, $\lambda$, $\dots$).

## Model Types

- *Parametric model:* Here we assume that the parameter space $\Theta$ is a subset of a finite-dimensional space, $\Theta \subseteq \mathbb R^d$ for some finite $d$. E.g. a Gaussian has a two-dimensional parameter space:

$$
\Theta = \Big\{(\mu, \sigma^2) : \mu \in (-\infty, \infty),\ \sigma^2 \in (0, \infty)\Big\}
$$

- *Nonparametric model:* Here we assume that the parameter space $\Theta$ is infinite-dimensional. We do not have a specific distribution family in mind that would reduce the parameter space to finitely many numbers.
- *Semiparametric model:* The parameter space consists of two parts, $\Theta = \Theta_1 \times \Theta_2$, where $\Theta_1$ is finite-dimensional and $\Theta_2$ is infinite-dimensional. We only want to estimate the component in $\Theta_1$ and call the component in $\Theta_2$ the nuisance parameter.

## Examples of Parametric Models

- For $n$ Bernoulli trials each observation is either $0$ or $1$, while the parameter $p$ lies in the continuous interval $(0,1)$. The endpoints are excluded here because they give a deterministic outcome.

$$
\Big(\{0,1\}, \{\text{Ber}(p)\}_{p \in (0,1)}\Big)
$$

- $X_1, \dots , X_n$ are i.i.d. and follow a Poisson distribution. While the sample space is the set of non-negative integers (here $\mathbb N$ includes $0$), the parameter space for $\lambda$ is the set of positive real numbers.

$$
\Big(\mathbb N, \{\text{Poiss}(\lambda)\}_{\lambda >0 }\Big)
$$

- $X_1, \dots , X_n$ are i.i.d. and follow a normal distribution. While the sample space is the set of real numbers, the parameter space differs between the two parameters: $\mu$ can be any real number, whereas $\sigma^2$ must be positive.

$$
\Big(\mathbb R, \{\mathcal N(\mu, \sigma^2)\}_{(\mu, \sigma^2) \in \mathbb R \times (0, \infty)}\Big)
$$

>[!note:]
>The symbol $\times$ denotes a [Cartesian product](https://en.wikipedia.org/wiki/Cartesian_product). In this case the parameter is an ordered pair $(\mu, \sigma^2)$ with $\mu \in \mathbb R$ and $\sigma^2 \in (0, \infty)$.

>[!note:]
>We only write out the part of the parameter space that is unknown to us. E.g. if we already know $\sigma^2$ (or take an assumed value as given), we do not include it in this specification.

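
The three parametric models above can be sketched in code as a sample-space check, a parameter-space check, and a density. This is a minimal Python sketch under stated assumptions: the `StatisticalModel` class, its field names, and the `sample` values are hypothetical and chosen only for illustration; they do not come from any library.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class StatisticalModel:
    """Hypothetical container for the pair (E, {P_theta}); not a library API."""
    name: str
    in_sample_space: Callable[[float], bool]                 # is x in E?
    in_parameter_space: Callable[[Tuple[float, ...]], bool]  # is theta in Theta?
    density: Callable[[float, Tuple[float, ...]], float]     # PMF or PDF of P_theta at x


def is_nonneg_int(x: float) -> bool:
    """Membership test for the non-negative integers."""
    return x >= 0 and float(x).is_integer()


# ({0, 1}, {Ber(p)} with p in (0, 1))
bernoulli = StatisticalModel(
    name="Ber(p)",
    in_sample_space=lambda x: x in (0, 1),
    in_parameter_space=lambda th: 0 < th[0] < 1,
    density=lambda x, th: th[0] if x == 1 else 1 - th[0],
)

# (N, {Poiss(lambda)} with lambda > 0)
poisson = StatisticalModel(
    name="Poiss(lambda)",
    in_sample_space=is_nonneg_int,
    in_parameter_space=lambda th: th[0] > 0,
    density=lambda x, th: math.exp(-th[0]) * th[0] ** x / math.factorial(int(x)),
)

# (R, {N(mu, sigma^2)} with (mu, sigma^2) in R x (0, inf))
gaussian = StatisticalModel(
    name="N(mu, sigma^2)",
    in_sample_space=lambda x: math.isfinite(x),
    in_parameter_space=lambda th: math.isfinite(th[0]) and th[1] > 0,
    density=lambda x, th: math.exp(-((x - th[0]) ** 2) / (2 * th[1]))
    / math.sqrt(2 * math.pi * th[1]),
)

sample = [0, 3, 1, 2, 2]  # made-up observations, for illustration only
print(all(poisson.in_sample_space(x) for x in sample))  # every observation lies in E
print(poisson.in_parameter_space((0.0,)))               # lambda = 0 is outside Theta
```

Parameters are passed as tuples so that the one-dimensional models and the two-dimensional Gaussian share the same interface; this is a design choice of the sketch, not a convention from the text.
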
## Example of Nonparametric Model

$X_1, \dots , X_n \in \mathbb R$ are [[Independence and Identical Distribution|i.i.d.]] with an unknown PDF, where we only know that it is unimodal (it has only one mode). We do not have a finite number of parameters to estimate, since $\Theta$ is the set of all unimodal distributions.
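
To make the unimodality constraint concrete, the following Python sketch checks whether a density evaluated on a grid rises to a single peak and then falls. The helper names `is_unimodal` and `normal_pdf`, the grid, and the two example densities are assumptions made for illustration. A grid check of this kind is only a numerical heuristic, not a proof of unimodality.

```python
import math


def is_unimodal(values, tol=1e-12):
    """Return True if the sequence weakly rises to one peak and then weakly falls."""
    falling = False
    for prev, curr in zip(values, values[1:]):
        if curr > prev + tol:
            if falling:
                return False  # rising again after a descent: a second mode
        elif curr < prev - tol:
            falling = True
    return True


def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


grid = [i / 10 for i in range(-60, 61)]  # evaluation points from -6 to 6

# A single Gaussian has one mode.
single = [normal_pdf(x, 0.0, 1.0) for x in grid]

# An equal mixture of two well-separated Gaussians has two modes.
mixture = [0.5 * normal_pdf(x, -3.0, 1.0) + 0.5 * normal_pdf(x, 3.0, 1.0) for x in grid]

print(is_unimodal(single))
print(is_unimodal(mixture))
```

Both example densities would be candidates in a model over all PDFs, but only the first belongs to the set of unimodal distributions described above. No finite list of numbers describes every density that passes this check.
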