To do statistical estimation and inference on time series data, we need some technical conditions about the underlying [[Time Series as Stochastic Process|Stochastic Process]] to hold. These conditions ensure the following:

- *Representativeness:* Observations (which show a single realized path of the process) are representative of all possible (but unseen) realizations of the process.
- *Parameter estimation:* Population parameters of the data generating process (e.g. expectations, variances, correlations) can be estimated via averages from the single realized path.
- *Model extrapolation:* [[Statistical Model|Statistical Models]] fitted to past observations can be reliably extrapolated into the future.

## Weak Stationarity

There are 3 conditions that need to be satisfied for weak stationarity.

1. *Constant mean:* The marginal [[Expectation]] at a single point $t$ is the same for all $t$.
2. *Constant variance:* The marginal [[Variance]] at a single point $t$ is the same for all $t$.
3. *Covariance only depending on lag:* The [[Covariance]] between any two points at a fixed distance from each other is always the same. This means the covariance is not a function of time, but only a function of the size of the gap $(s-t)$ (e.g. the 30-day autocovariance is the same in January and March).

$
\begin{align}
\mu_X(t) &= \mu_X \tag{1} \\[2pt]
\mathrm{var}_X(t) &= \sigma_X^2 \tag{2} \\[2pt]
\mathrm{cov}(X_s, X_t) &= \gamma_X(|s-t|) \tag{3}
\end{align}
$

The first criterion allows us to compute averages over multiple observations. However, the dependency structure still affects the quality of the estimates.

**Autocorrelation:** The autocorrelation is simply the autocovariance divided by the variance.

$
\rho_X(h) = \frac{\mathrm{cov}(X_s,X_t)}{\sigma_{X_s}\sigma_{X_t}} = \frac{\gamma_X(|s-t|)}{\sigma_{X}^2} = \frac{\gamma_X(h)}{\gamma_{X}(0)}
$

>[!note]
>Basically, weak stationarity requires the first two moments of the probability distribution of each $X_t$ to stay constant for the whole process.

## Strong Stationarity

For strong stationarity to hold, the joint distribution of $X_t, \dots, X_{t+n}$ needs to be equal to the joint distribution of $X_{t+h}, \dots, X_{t+h+n}$, for any $t, n, h$.

- $t:$ Point in time
- $n:$ Length of the two series
- $h:$ Shift between the two series

$
(X_t, \dots, X_{t+n}) \stackrel{(d)}{=} (X_{t+h}, \dots, X_{t+h+n})
$

![[stationarity.png|center|400]]

Remember that the $X$ are [[Random Variable|r.v's.]] and not realizations, so the [[Joint Probability Density Function]] lives in multi-dimensional space $\mathbb R^n$. For strong stationarity, all moments (not only the first two) of this joint distribution need to be the same.

**Implications of Strong Stationarity:**

- *Consistent dependency structure:* The dependency structure in the time series stays the same for any new adjacent term. Therefore we can predict new terms with a fitted model.
- *Identically distributed marginal variables:* The marginal distribution of each $X_t$ is the same for all $t$. This gives us identically distributed observations, which are still dependent, though.
- *Decaying dependency:* If the dependency structure in the series dies down quickly as the time gap between observations increases, then we can treat the observations as close to independent. This allows us to use appropriate generalizations of the LLN and CLT.

## Sample Statistics

If the series is stationary, then each observation contributes statistical information about the same common parameters, which can therefore be estimated via the sample averages below.
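To make the weak-stationarity conditions concrete, here is a minimal simulation sketch (assuming NumPy; the AR(1) coefficient $0.6$ and the sample size are arbitrary illustration choices, not from the source): a stationary AR(1) process keeps roughly constant mean and variance across time windows, while a random walk does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Weakly stationary AR(1): X_t = 0.6 * X_{t-1} + eps_t  (|0.6| < 1)
eps = rng.normal(size=n)
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.6 * ar1[t - 1] + eps[t]

# Non-stationary comparison: a random walk, whose variance grows with t
walk = np.cumsum(rng.normal(size=n))

# Mean and variance over the first vs. second half of the sample:
# roughly constant for the AR(1), drifting for the random walk.
for name, x in [("AR(1)", ar1), ("random walk", walk)]:
    early, late = x[: n // 2], x[n // 2 :]
    print(f"{name:12s} mean: {early.mean():6.2f} -> {late.mean():6.2f}   "
          f"var: {early.var():6.2f} -> {late.var():6.2f}")
```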
**Weak Dependence:** Although most time series are dependent, a sufficiently fast decay of the dependence (e.g. [[Correlation]]) as the time distance between terms gets large allows sample statistics to behave asymptotically as they would for [[Independence and Identical Distribution|i.i.d.]] data (using the [[Law of Large Numbers|LLN]] and [[Central Limit Theorem|CLT]]).

**Key Estimators:**

$
\begin{align}
\hat \mu &= \bar X_n = \frac{1}{n}\sum_{t=1}^n X_t \\[6pt]
\hat \sigma^2 &= \frac{1}{n} \sum_{t=1}^n (X_t - \hat \mu)^2 \\[6pt]
\hat \gamma(h) &= \frac{1}{n}\sum_{t=1}^{n-h} (X_t - \hat \mu)(X_{t+h} - \hat \mu) \quad \text{for } 1 \le h < n
\end{align}
$

**Auto-Covariance:**

- *Normalization:* To obtain the sample statistic, we divide the autocovariance sum by $n$, although there are only $n-h$ terms to sum over. Dividing by $n-h$ instead would make the estimates at large lags $h$ unstable, since only a few terms are averaged there. Also, the number of variables entering the estimator is $n$, not $n-h$, so the degrees of freedom are $n$.
- *Consistency:* Under the assumption that the time series is stationary, the sample autocovariance is a consistent estimator and asymptotically normal. Consistency means that the estimator $\hat \gamma(h)$ converges to the true parameter $\gamma(h)$ as $n \to \infty$. For the sample autocovariance this follows because the estimator of the mean, $\hat \mu$, is itself consistent, so asymptotically $\hat \mu$ can be replaced by $\mu$ in the sum:

$
\begin{align}
\hat \gamma(h) &= \frac{1}{n}\sum_{t=1}^{n-h} (X_t - \hat \mu)(X_{t+h} - \hat \mu) \\[8pt]
\mathbb E[\hat \gamma(h)] &\approx \frac{1}{n}\sum_{t=1}^{n-h} \mathbb E\big[(X_t - \mu)(X_{t+h} - \mu)\big] = \frac{n-h}{n}\,\gamma(h) \\[8pt]
\mathbb E[\hat \gamma(h)] &\to \gamma(h) \quad \text{as } n \to \infty
\end{align}
$

The [[Central Limit Theorem|CLT]] says that a sum (or mean) of independent r.v.'s converges in distribution to a [[Gaussian Distribution|Gaussian]]. In the case of the autocovariance this is not as straightforward, since we have a sum of products whose terms are dependent.
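As a sanity check on these estimators, here is a minimal sketch (assuming NumPy; the helper names `sample_autocovariance` and `sample_autocorrelation` and the AR(1) parameters are illustrative, not from the source). It implements $\hat\gamma(h)$ with the $1/n$ normalization and compares the resulting $\hat\rho(h)$ against the known autocorrelation $\phi^h$ of a simulated AR(1) process.

```python
import numpy as np

def sample_autocovariance(x, h):
    """gamma_hat(h) with the 1/n normalization from the formulas above."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mu_hat = x.mean()
    return np.sum((x[: n - h] - mu_hat) * (x[h:] - mu_hat)) / n

def sample_autocorrelation(x, h):
    """rho_hat(h) = gamma_hat(h) / gamma_hat(0)."""
    return sample_autocovariance(x, h) / sample_autocovariance(x, 0)

# Simulated AR(1) with phi = 0.6, whose true autocorrelation is rho(h) = phi ** h.
rng = np.random.default_rng(1)
n, phi = 5000, 0.6
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()

for h in range(1, 5):
    print(f"h={h}: rho_hat = {sample_autocorrelation(x, h):.3f}, "
          f"true rho = {phi ** h:.3f}")
```

With growing $n$ the estimates approach the true values, illustrating the consistency argument above; the $1/n$ normalization slightly shrinks the estimates at large lags, but this bias vanishes as $n \to \infty$.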