**Statistic:** A statistic is a measurable function of the sample data (e.g., sample mean, sample variance, maximum). It depends only on the observed data and not on any unknown parameters.

**Estimator of $\theta$:** An estimator $\hat \theta$ is a statistic used to approximate the unknown parameter $\theta$. It is solely a function of the sample data and does not depend on $\theta$. For example, the sample mean and the sample variance are common estimators.

> [!note]
> The expectation, e.g. $\mathbb E[X]$, is not an estimator: it is not a function of the data but a characteristic of the random variable (potentially $\theta$ itself). It is a theoretical quantity that we do not know, since it depends on the population and not on the sample data.

## Key Properties of an Estimator

**Consistency:** An estimator $\hat \theta_n$ is consistent if it converges to the true parameter $\theta$ as the sample size $n \to \infty$. The convergence can be understood as [[Modes of Convergence#Convergence in Probability|Convergence in Probability]] or [[Modes of Convergence#Convergence Almost Surely|Convergence Almost Surely]].

$ \hat \theta_n \xrightarrow[n \to \infty]{\mathbf P} \theta $

**Asymptotic normality:** An estimator $\hat \theta_n$ is asymptotically normal if its scaled difference from the true $\theta$ converges in distribution to a [[Gaussian Distribution]] $\mathcal N(0, \sigma^2)$. We call $\sigma^2$ the asymptotic variance.

$ \sqrt n (\hat \theta_n- \theta) \xrightarrow[n \to \infty]{(d)}\mathcal N(0, \sigma^2) $

Informally, for large $n$ the estimator itself is approximately normally distributed with mean $\theta$ and variance $\frac{\sigma^2}{n}$.

$ \hat \theta_n \overset{\text{approx.}}{\sim} \mathcal N\Big(\theta, \frac{\sigma^2}{n}\Big) $

**Unbiasedness:** An estimator is unbiased if its [[Expectation]] equals the true parameter, i.e. if its bias is zero.

$ \text{bias}(\hat \theta_n)= \mathbb E[\hat \theta_n] - \theta $

**Variance:** The variance of an estimator $\hat \theta_n$ measures its variability across different samples. While unbiasedness ensures correctness "on average," low variance ensures reliability.

## Bias of an Estimator

Note that an estimator can be unbiased and still not be well suited (e.g. because of high variance). Assume that $X_1, \dots, X_n \stackrel{iid}{\sim} \text{Ber}(p)$ and we need to choose an estimator for $p$. All three estimators below are unbiased, but not all of them make sense; a small simulation after the table illustrates this.

| Estimator | Estimator Description | Expectation | Bias of Estimator |
| --- | --- | --- | --- |
| $\hat p_n = \bar X_n$ | Sample mean | $\mathbb E[\hat p_n]=p$ | $\text{bias}(\hat p_n)=p-p=0$ |
| $\hat p_n=X_1$ | First observation $\{0,1\}$ | $\mathbb E[\hat p_n]=p$ | $\text{bias}(\hat p_n)=p-p=0$ |
| $\hat p_n=\frac{X_1+X_2}{2}$ | Average of the first two observations | $\mathbb E[\hat p_n]=\frac{p+p}{2}=p$ | $\text{bias}(\hat p_n)=p-p=0$ |
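Here is a minimal simulation sketch of this comparison (assuming NumPy; the values $p=0.3$, $n=100$ and the number of repetitions are illustrative choices, not from the note):

```python
import numpy as np

rng = np.random.default_rng(0)
p_true, n, n_reps = 0.3, 100, 20_000  # illustrative values

# Each row is one simulated dataset X_1, ..., X_n ~ Ber(p_true)
X = rng.binomial(1, p_true, size=(n_reps, n))

estimates = {
    "sample mean X_bar_n": X.mean(axis=1),
    "first observation X_1": X[:, 0],
    "(X_1 + X_2)/2": (X[:, 0] + X[:, 1]) / 2,
}

for name, est in estimates.items():
    bias = est.mean() - p_true  # Monte Carlo estimate of the bias
    print(f"{name:24s} bias ~ {bias:+.4f}   variance ~ {est.var():.4f}")
```

All three empirical biases come out close to zero, while the empirical variances already hint at the comparison in the next section.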
## Variance of an Estimator

Although all three estimators are unbiased, the sample mean has the lowest variance and is therefore the best choice when $n>2$. We are still considering $X_1, \dots, X_n \stackrel{iid}{\sim} \text{Ber}(p)$.

| Estimator | Estimator Description | Variance of Estimator |
| --- | --- | --- |
| $\hat p_n = \bar X_n$ | Sample mean | $\frac{p(1-p)}{n}$ |
| $\hat p_n=X_1$ | First observation $\{0,1\}$ | $p(1-p)$ |
| $\hat p_n=\frac{X_1+X_2}{2}$ | Average of the first two observations | $\frac{p(1-p)}{2}$ |

## Quadratic Risk

The quadratic risk of an estimator combines its bias and variance into a single metric:

$ R(\hat \theta_n)= \mathbb E\Big [( \hat \theta_n - \theta)^2 \Big] $

By expanding the square, the quadratic risk can be decomposed into the variance and the squared bias:

$
\begin{align}
R(\hat \theta_n)&= \mathbb E\Big [ \lvert \hat \theta_n - \theta \rvert^2 \Big ] \tag{1}\\[6pt]
&= \mathbb E \Big[\big(\overbrace{\hat \theta_n - \mathbb E[\hat \theta_n]}^{a}+ \overbrace{\mathbb E[\hat \theta_n]-\theta}^{b}\big)^2 \Big] \tag{2}\\[6pt]
&= \overbrace{\mathbb E \Big[(\hat \theta_n - \mathbb E[\hat \theta_n])^2 \Big ]}^{a^2} + \overbrace{\mathbb E \Big[(\mathbb E[\hat \theta_n]-\theta)^2 \Big]}^{b^2} + \overbrace{2\,\mathbb E\Big [(\hat \theta_n-\mathbb E[\hat \theta_n])(\mathbb E[\hat \theta_n]- \theta)\Big]}^{2ab} \tag{3}\\[6pt]
&=\text{var}(\hat \theta_n)+ (\mathbb E[\hat \theta_n]-\theta)^2+ 2\,\underbrace{\mathbb E\Big [\hat \theta_n-\mathbb E[\hat \theta_n]\Big]}_{=0}\,(\mathbb E[\hat \theta_n]- \theta) \tag{4}\\
&=\text{var}(\hat \theta_n) + (\text{bias})^2 \tag{5}
\end{align}
$

where:

- (2) Subtract and add the expectation of the estimator $\mathbb E[\hat \theta_n]$ inside the square.
- (3) Expand the square into the two squared terms and the cross term, and distribute the expectation over the sum.
- (4) Since both $\mathbb E[\hat \theta_n]$ and the true $\theta$ are constants, the expectation around the second square can be dropped and the constant factor $\mathbb E[\hat \theta_n]-\theta$ pulled out of the cross term. We then recognize the definitions of variance and bias, while the cross term vanishes because $\mathbb E\big[\hat \theta_n-\mathbb E[\hat \theta_n]\big]=0$.

A low quadratic risk means that both the bias and the variance are low. If the quadratic risk $R(\hat \theta_n) \to 0$ as $n\to \infty$, then the estimator converges to $\theta$ in probability.
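As a sanity check, here is a minimal Monte Carlo sketch of the decomposition (assuming NumPy; the shrunken estimator $\hat \theta_n = \frac{n}{n+1}\bar X_n$ is an illustrative choice, deliberately biased so that both terms show up):

```python
import numpy as np

rng = np.random.default_rng(1)
p_true, n, n_reps = 0.3, 50, 200_000  # illustrative values

X = rng.binomial(1, p_true, size=(n_reps, n))

# Deliberately biased estimator: shrink the sample mean slightly towards 0
theta_hat = X.mean(axis=1) * n / (n + 1)

risk_direct = np.mean((theta_hat - p_true) ** 2)  # E[(theta_hat - theta)^2]
variance = theta_hat.var()                        # var(theta_hat)
bias_sq = (theta_hat.mean() - p_true) ** 2        # (E[theta_hat] - theta)^2

print(f"direct risk       : {risk_direct:.6f}")
print(f"variance + bias^2 : {variance + bias_sq:.6f}")  # agrees with the direct risk
```

On the simulated estimates the two numbers agree exactly (the decomposition is an algebraic identity); both are Monte Carlo approximations of the true quadratic risk.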