Assume we have a two-dimensional joint [[Multivariate Gaussian]] $[X_A, X_B]$ with realizations $x_A, x_B$. In the following we show that after conditioning on $X_B$, the result is still a Gaussian, $X_{A \vert B} \sim \mathcal N$.

$ \begin{bmatrix}X_A\\X_B \end{bmatrix} \sim \mathcal N \left( \begin{bmatrix}\mu_A\\\mu_B \end{bmatrix}, \begin{bmatrix}\sigma_A^2 & \sigma_{AB}\\ \sigma_{AB} & \sigma_B^2 \end{bmatrix} \right) $

**Joint distribution:** The [[Probability Density Function|PDF]] of that joint Gaussian is as follows.

$ f(x_A,x_B)= \frac{1}{2\pi \sqrt{\det(\Sigma)}} \exp \left(-\frac{1}{2} \begin{bmatrix} x_A-\mu_A\\ x_B-\mu_B\\ \end{bmatrix}^T \Sigma^{-1} \begin{bmatrix} x_A-\mu_A\\ x_B-\mu_B\\ \end{bmatrix} \right) $

where $\Sigma^{-1}$ is the inverse of the [[Covariance Matrix]].

$ \Sigma^{-1}=\frac{1}{\det(\Sigma)}\begin{bmatrix} \Sigma_{BB} & -\Sigma_{AB}\\[2pt] -\Sigma_{BA} & \Sigma_{AA} \end{bmatrix}=\frac{1}{\det(\Sigma)}\begin{bmatrix} \sigma_B^2 & -\sigma_{AB}\\[2pt] -\sigma_{AB} & \sigma_A^2 \end{bmatrix} $

**Marginal distribution:** The univariate PDF for $X_B$ is as follows.

$ f(x_B)=\frac{1}{\sqrt{2 \pi \Sigma_{BB}}} \exp\left( -\frac{1}{2} \frac{(x_B-\mu_B)^2}{\Sigma_{BB}}\right) $

**Conditional distribution:** Conditioning divides the joint by the marginal of the conditioning variable (here $X_B$).

$ f(X_A \vert X_B) = \frac{f(X_A,X_B)}{f(X_B)} $

$ f(x_A \vert x_B)= \frac{f(x_A,x_B)}{f(x_B)}=\frac { \frac{1}{2\pi \sqrt{\det(\Sigma)}} \exp \left(-\frac{1}{2} \begin{bmatrix} x_A-\mu_A\\ x_B-\mu_B\\ \end{bmatrix}^T \Sigma^{-1} \begin{bmatrix} x_A-\mu_A\\ x_B-\mu_B\\ \end{bmatrix} \right) } { \frac{1}{\sqrt{2 \pi \Sigma_{BB}}} \exp\left( -\frac{1}{2} \frac{(x_B-\mu_B)^2}{\Sigma_{BB}}\right) } $

This simplifies down to:

$ f(x_A \vert x_B)=\frac{1}{\sqrt{2\pi \Sigma_{A \vert B}}} \exp \left(-\frac{1}{2} \frac{(x_A- \mu_{A \vert B})^2}{\Sigma_{A\vert B}} \right ) $

Note that the conditional distribution is also a (univariate) Gaussian. Solving for $\mu_{A \vert B}$ and $\Sigma_{A \vert B}$ we get:

$ \begin{aligned} \mu_{A \vert B}&= \mu_A + \frac{\sigma_{AB}}{\sigma_B^2}(x_B-\mu_B) \\[8pt] \Sigma_{A \vert B}=\sigma_{A \vert B}^2 &= \sigma_A^2- \frac{\sigma_{AB}^2}{\sigma_B^{2}} \end{aligned} $
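As a quick sanity check, here is a minimal NumPy sketch of the bivariate case; the concrete numbers for $\mu$, $\sigma^2$, $\sigma_{AB}$ and the observation $x_B$ are made up purely for illustration.

```python
import numpy as np

# Illustrative parameters of the joint Gaussian [X_A, X_B] (made-up values).
mu_A, mu_B = 1.0, -2.0
var_A, var_B = 2.0, 1.5   # sigma_A^2, sigma_B^2
cov_AB = 0.8              # sigma_AB

x_B = -1.0                # observed value of X_B

# Conditional mean and variance from the formulas above.
mu_cond = mu_A + (cov_AB / var_B) * (x_B - mu_B)
var_cond = var_A - cov_AB**2 / var_B

print(f"X_A | X_B = {x_B}:  mean = {mu_cond:.3f}, variance = {var_cond:.3f}")
```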
## Conditioning on Multiple Variables

Now we generalize to the case where $X$ is a joint Gaussian $X = \begin{bmatrix}X_A&X_B\end{bmatrix}^T$, but both blocks are themselves multivariate Gaussians. Again we want to condition on $X_B$ to obtain a conditional estimate $X_{A \vert B}$. For simplicity of visualization, assume that $X_A \in \mathbb R^k$ with $k=1$.

$ X = \begin{bmatrix}X_A \in \mathbb R^k\\X_B \in \mathbb R^d \end{bmatrix} \sim \mathcal N(\mu, \Sigma), \quad \mu = \begin{bmatrix}\mu_A \in \mathbb R^k\\ \mu_B \in \mathbb R^d \end{bmatrix}, \quad \Sigma=\begin{bmatrix}\Sigma_{AA} \in\mathbb R^{k \times k} & \Sigma_{AB} \in\mathbb R^{k \times d} \\ \Sigma_{BA} \in\mathbb R^{d \times k} & \Sigma_{BB} \in\mathbb R^{d \times d} \end{bmatrix} $

Again we divide the joint by the marginal to obtain the conditional PDF.

$ f(X_A \vert X_B) = \frac{f(X_A,X_B)}{f(X_B)} $

where the general form of a multivariate Gaussian in $\mathbb R^d$ looks as follows.

$ f(x)=f\Big(x^{(1)},\dots, x^{(d)}\Big)= \frac{1}{(2 \pi)^{d/2} \sqrt{\det (\Sigma)}} \exp\left(-\frac{1}{2}(x-\mu)^T \, \Sigma^{-1}(x-\mu)\right) $

See the matrix shapes of the respective $\Sigma$ terms.

![[conditional-multivariate-gaussian.jpeg|center|500]]

Finally the mean and covariance of the conditional Gaussian can be derived as follows.

$ \begin{aligned} \mu_{A\vert B}&=\mu_A+ \Sigma_{AB}\Sigma_{BB}^{-1}\,(x_B-\mu_B)\\[6pt] \Sigma_{A\vert B}&=\Sigma_{AA}-\Sigma_{AB}\, \Sigma_{BB}^{-1}\, \Sigma_{BA} \end{aligned} $

## Kernel Functions

The above assumed that we already know the full covariance matrix $\Sigma_N$ over all $N$ variables. When that is not the case, we can use a kernel function to estimate it.

**Advantages of Kernel Functions:**

- Compact encoding of assumptions about the correlation structure.
- Allows prediction at any $x$ (while a covariance matrix is only defined at the observed points).

**Requirements for a kernel function to act as a covariance matrix:**

- The function must be symmetric: $k(x_i,x_j) = k(x_j, x_i)$
- The function must yield a positive semi-definite matrix: it must be some transformed version of the inner product $\langle x_i, x_j \rangle$.

**Radial Basis Function:** The covariance is modeled as a function of the radial distance between the two measurement points. So the closer two points are, the more strongly their respective Gaussians covary (i.e. look similar).

$ \mathrm{Cov}(X_i, X_j)= k(x_i, x_j) = \exp\left(- \frac{\|x_i-x_j\|^2}{2 \ell^2} \right) $

By evaluating the kernel at every pair of measurement points, we obtain a kernel matrix $K_N$, which we then use in place of $\Sigma_N$ for further computation.

$ \Sigma_N=\begin{bmatrix} \mathrm{Cov}(X_1,X_1) & \cdots & \mathrm{Cov}(X_1,X_N) \\ \vdots & \ddots &\vdots \\ \mathrm{Cov}(X_N,X_1) &\cdots & \mathrm{Cov}(X_N, X_N) \end{bmatrix} = \underbrace{ \begin{bmatrix} k(x_1,x_1) & \cdots & k(x_1,x_N) \\ \vdots & \ddots &\vdots \\ k(x_N,x_1) &\cdots & k(x_N, x_N) \end{bmatrix}}_{K_N} $

Conditional estimate of the Gaussian parameters based on the kernel matrix:

$ \begin{align} \mu_{A\vert B}&=\mu_A+ K_{AB} K_{BB}^{-1}\,(x_B-\mu_B)\\[6pt] \Sigma_{A\vert B}&=K_{AA}- K_{AB}\, K_{BB}^{-1}\, K_{BA} \end{align} $

>[!note]
>A Gaussian process is a collection of infinitely many random variables, where any finite number of them are jointly Gaussian.
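To see the kernel-based conditioning in action, here is a minimal NumPy sketch that assumes a zero prior mean ($\mu_A = \mu_B = 0$); the training points, the length scale $\ell$, and the jitter added to $K_{BB}$ for numerical stability are illustrative choices, not part of the derivation.

```python
import numpy as np

def rbf_kernel(xs1, xs2, ell=1.0):
    """Radial basis function kernel k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 ell^2))."""
    sq_dist = (xs1[:, None] - xs2[None, :]) ** 2
    return np.exp(-sq_dist / (2 * ell**2))

# Observed (conditioning) points X_B with values y_B, and query points X_A (made-up data).
x_B = np.array([-2.0, -0.5, 1.0, 2.5])
y_B = np.sin(x_B)
x_A = np.array([0.0, 1.5])

# Kernel blocks, assuming a zero prior mean (mu_A = mu_B = 0).
K_AA = rbf_kernel(x_A, x_A)
K_AB = rbf_kernel(x_A, x_B)
K_BB = rbf_kernel(x_B, x_B) + 1e-8 * np.eye(len(x_B))  # jitter for numerical stability

# Conditional mean and covariance, mirroring the formulas above.
mu_cond = K_AB @ np.linalg.solve(K_BB, y_B)
cov_cond = K_AA - K_AB @ np.linalg.solve(K_BB, K_AB.T)

print("conditional mean:", mu_cond)
print("conditional variances:", np.diag(cov_cond))
```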