Correlation - Bernhard Pfann, CFA

While [[Covariance]] measures the dependence between two [[Random Variable|random variables]] $X$ and $Y$, its magnitude is difficult to interpret because it depends on the units in which $X$ and $Y$ are measured. The correlation coefficient, denoted $\rho(X,Y)$, is a dimensionless version of covariance that takes values in $[-1,1]$. It is defined as:

$$
\rho(X,Y)= \mathbb E \left[\frac{X-\mathbb E[X]}{\sigma_X} \cdot \frac{Y-\mathbb E[Y]}{\sigma_Y} \right] =\frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}
$$

Since the numerator and the denominator of each of the two fractions have the same units, both quotients are dimensionless.

## Properties of Correlation

**Symmetry:** The correlation coefficient is symmetric.

$$
\rho(X,Y)= \rho(Y,X)
$$

**Correlation of a r.v. with itself:**

$$
\rho(X,X)=\frac{\mathrm{Cov}(X,X)}{\sigma_X \sigma_X} = \frac{\mathrm{Var}(X)}{\mathrm{Var}(X)}=1
$$

**Undefined correlation for constants:** The correlation coefficient is not defined when at least one of the two r.v.s is a constant. A constant r.v. has $\sigma=0$, which leads to a division by zero.

## Interpretation of Correlation

A correlation of $0.5$ means that, along the regression line of $Y$ on $X$, a move of one standard deviation $\sigma_X$ on the x-axis corresponds to a move of $\frac{1}{2} \sigma_Y$ on the y-axis.

![[correlation.jpeg|center|300]]

Correlation $\rho$ is related to the regression slope coefficient $\beta_1$:

$$
\rho = \frac{\mathrm{Cov}(X,Y)}{\sigma_X\sigma_Y}, \quad \beta_1 = \frac{\mathrm{Cov}(X,Y)}{\sigma_X^2}
$$

Solving the first equation for $\mathrm{Cov}(X,Y)$ and substituting it into the second gives:

$$
\beta_1 = \rho \cdot \frac{\sigma_Y}{\sigma_X}
$$

> [!note]
> Unlike correlation, the slope coefficient $\beta_1$ is not symmetric, because it depends on which variable is regressed on the other. Regressing $X$ on $Y$ instead gives a slope of $\rho \cdot \frac{\sigma_X}{\sigma_Y}$, and $\frac{\sigma_Y}{\sigma_X} \neq \frac{\sigma_X}{\sigma_Y}$ unless $\sigma_X = \sigma_Y$.

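
The definition, the properties and the link to $\beta_1$ can be checked numerically. Below is a minimal plain-Python sketch, assuming population moments (dividing by $n$ rather than $n-1$; the factor cancels in both $\rho$ and $\beta_1$, so either convention gives the same values). The helper names `cov`, `std`, `corr` and `slope` and the sample data are hypothetical, chosen only for illustration.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Population covariance: E[(X - E[X]) * (Y - E[Y])]."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal length")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def std(xs):
    """Population standard deviation, sqrt(Cov(X, X)) = sqrt(Var(X))."""
    return math.sqrt(cov(xs, xs))


def corr(xs, ys):
    """rho(X, Y) = Cov(X, Y) / (sigma_X * sigma_Y); undefined if either sigma is 0."""
    sx, sy = std(xs), std(ys)
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov(xs, ys) / (sx * sy)


def slope(xs, ys):
    """OLS slope of Y regressed on X: beta_1 = Cov(X, Y) / sigma_X^2."""
    return cov(xs, ys) / cov(xs, xs)


# Illustrative sample (made-up numbers, not from the text).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

rho = corr(x, y)
print(round(rho, 4))                  # 0.7746
print(math.isclose(rho, corr(y, x)))  # True: symmetry
print(math.isclose(corr(x, x), 1.0))  # True: rho(X, X) = 1

# beta_1 = rho * sigma_Y / sigma_X; swapping X and Y changes the slope.
print(math.isclose(slope(x, y), rho * std(y) / std(x)))  # True
print(round(slope(x, y), 4), round(slope(y, x), 4))       # 0.6 1.0

# A constant variable has sigma = 0, so the correlation is undefined.
try:
    corr(x, [3.0] * 5)
except ValueError as err:
    print(err)
```

The last lines mirror the note above: $\rho$ is the same in both directions, while the slope of $Y$ on $X$ (0.6) differs from the slope of $X$ on $Y$ (1.0). Multiplying the first slope by `std(x)` also reproduces the interpretation: one $\sigma_X$ step on the x-axis moves the line by $\rho\,\sigma_Y$.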
## Correlation after Linear Transformation

We already know how [[Variance after Linear Transformation]] and [[Covariance#Covariance after Linear Transformation|Covariance after Linear Transformation]] behave. The behavior of the correlation coefficient follows directly from them. For $a \neq 0$:

$$
\begin{align}
\rho (aX+b,Y) &= \frac{\mathrm{Cov}(aX+b,Y)}{\sigma_{aX+b} \cdot \sigma_Y} \\[4pt]
&=\frac{a \cdot \mathrm{Cov}(X,Y)}{\sqrt{a^2\sigma_X^2} \cdot \sigma_Y} \\[4pt]
&=\frac{a \cdot \mathrm{Cov}(X,Y)}{\vert a \vert \cdot \sigma_X \cdot \sigma_Y} \\[4pt]
&=\mathrm{sign}(a) \cdot \frac{\mathrm{Cov}(X,Y)}{\sigma_X \cdot \sigma_Y} \\[4pt]
&=\mathrm{sign}(a) \cdot \rho(X,Y)
\end{align}
$$

For $a=0$, $aX+b$ is a constant, so the correlation is undefined.

* *Shifting by a constant:* Adding a constant $b$ to $X$ or $Y$ changes neither their variances nor their covariance. Hence, it has no effect on the correlation either.

  $$
  \rho(X+b,Y)=\rho(X,Y)
  $$

* *Scaling by a factor:* Multiplying $X$ by a factor $a$ scales the covariance by $a$ and the variance of $X$ by $a^2$ (i.e., its standard deviation by $\sqrt{a^2}=\vert a \vert$). Because the numerator and the denominator are scaled by the same magnitude $\vert a \vert$, only the sign of $a$remains.

  $$
  \rho(aX,Y)= \begin{cases} \rho(X,Y) & \text{if } a>0 \\ -\rho(X,Y) & \text{if } a<0 \end{cases}
  $$
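
The invariance under shifting and the sign flip under negative scaling can be verified on a small sample. This is a minimal sketch, assuming population moments and made-up data; the helper names `corr` and `affine` are hypothetical. `math.copysign(1.0, a)` stands in for $\mathrm{sign}(a)$, which is valid here only because $a \neq 0$ is assumed.

```python
import math


def corr(xs, ys):
    """Population correlation: Cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cxy / (sx * sy)


def affine(xs, a, b):
    """Apply the linear transformation aX + b element-wise."""
    return [a * x + b for x in xs]


# Illustrative sample (made-up numbers, not from the text).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
base = corr(x, y)

# Compare rho(aX + b, Y) against sign(a) * rho(X, Y) for a few (a, b) pairs.
for a, b in [(1.0, 10.0), (3.0, -7.0), (-2.0, 5.0), (-0.5, 0.0)]:
    transformed = corr(affine(x, a, b), y)
    expected = math.copysign(1.0, a) * base
    print(a, b, round(transformed, 4), math.isclose(transformed, expected))
```

The shift $b$ never changes the result, and the magnitude of $a$ cancels; only the sign of $a$ survives. Setting `a = 0.0` makes `affine(x, a, b)` constant, and `corr` raises a `ValueError`, matching the $a \neq 0$ condition.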