While [[Covariance]] measures the linear dependence between two [[Random Variable|random variables]] $X$ and $Y$, its magnitude is difficult to interpret because it depends on the units in which $X$ and $Y$ are measured. The correlation coefficient, denoted $\rho(X,Y)$, is a dimensionless version of covariance that takes values in $[-1,1]$. It is defined as:
$ \rho(X,Y)=
\mathbb E \left[\frac{(X-\mathbb E[X])}{\sigma_X}
\cdot \frac{(Y-\mathbb E[Y])}{\sigma_Y} \right]
=\frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}
$
Since the numerator and the denominator of each of the two fractions are of the same units, both of the resulting quotients are dimensionless.
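As a rough numerical illustration of the definition, the sketch below computes the sample analogue of $\rho$ using only the Python standard library. The function name `correlation` and the height/weight numbers are made up for illustration, and plain averages over the sample stand in for the expectations. Converting one variable from centimetres to metres changes its covariance with the other variable, but not the correlation.

```python
import math
from statistics import fmean


def correlation(xs, ys):
    """Sample analogue of rho(X, Y) = Cov(X, Y) / (sigma_X * sigma_Y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x, mean_y = fmean(xs), fmean(ys)
    # Averages over the sample replace the expectations in the definition.
    cov = fmean((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sigma_x = math.sqrt(fmean((x - mean_x) ** 2 for x in xs))
    sigma_y = math.sqrt(fmean((y - mean_y) ** 2 for y in ys))
    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sigma_x * sigma_y)


# Hypothetical data, chosen only for illustration.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
weights_kg = [52.0, 58.0, 69.0, 75.0, 88.0]
heights_m = [h / 100 for h in heights_cm]  # same heights in another unit

rho_cm = correlation(heights_cm, weights_kg)
rho_m = correlation(heights_m, weights_kg)
print(rho_cm, rho_m)  # the two values agree up to floating-point rounding
```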
## Properties of Correlation
**Symmetry:** The correlation coefficient is symmetric.
$ \rho(X,Y)= \rho(Y,X)$
**Correlation of a r.v. with itself:**
$ \rho(X,X)=\frac{\mathrm{Cov}(X,X)}{\sigma_X \sigma_X} = \frac{\mathrm{Var}(X)}{\mathrm{Var}(X)}=1 $
**Undefined correlation for constants:** The correlation coefficient is not defined when at least one of the two r.v.s is a constant. A constant r.v. has $\sigma=0$, which leads to a division by zero.
## Interpretation of Correlation
A correlation of $0.5$ means that the regression line of $Y$ on $X$ rises by $\frac{1}{2} \sigma_Y$ on the y-axis for every $\sigma_X$ it moves along the x-axis.
![[correlation.jpeg|center|300]]
Correlation $\rho$ is closely related to the slope coefficient $\beta_1$ of the regression of $Y$ on $X$:
$ \rho = \frac{\mathrm{Cov}(X,Y)}{\sigma_X\sigma_Y}, \quad \beta_1 = \frac{\mathrm{Cov}(X,Y)}{\sigma_X^2} $
Substituting one equation into the other yields the following relationship:
$ \beta_1 = \rho \cdot \frac{\sigma_Y}{\sigma_X} $
> [!note]
> Unlike correlation, the slope coefficient $\beta_1$ is not symmetric: it depends on which variable is regressed on the other, because in general $\frac{\sigma_Y}{\sigma_X} \neq \frac{\sigma_X}{\sigma_Y}$.
## Correlation after Linear Transformation
We already know how [[Variance after Linear Transformation]] and [[Covariance#Covariance after Linear Transformation|Covariance after Linear Transformation]] behave. The behavior of the correlation coefficient follows directly from them (for $a \neq 0$):
$
\begin{align}
\rho (aX+b,Y) &= \frac{\mathrm{Cov}(aX+b,Y)}{\sigma_{aX+b} \cdot \sigma_Y} \\[4pt]
&=\frac{a \cdot \mathrm{Cov}(X,Y)}{\sqrt{a^2\sigma_X^2} \cdot \sigma_Y} \\[4pt]
&=\frac{a \cdot \mathrm{Cov}(X,Y)}{\vert a \vert \cdot \sigma_X \cdot \sigma_Y} \\[4pt]
&=\mathrm{sign}(a) \cdot \frac{\mathrm{Cov}(X,Y)}{\sigma_X \cdot \sigma_Y} \\[4pt]
&=\mathrm{sign}(a) \cdot \rho(X,Y)
\end{align}
$
* *Shifting by a constant:* Adding a constant $b$ to $X$ or $Y$ affects neither their variances nor their covariance. Hence, it has no effect on the correlation either.
$ \rho(X+b,Y)=\rho(X,Y)$
* *Scaling by a factor:* Multiplying $X$ by a factor $a \neq 0$ scales its covariance with $Y$ by $a$ and its variance by $a^2$ (i.e. its standard deviation by $\sqrt{a^2}=\vert a \vert$). The numerator and the denominator are therefore scaled by the same magnitude, so only the sign of $a$ remains. For $a=0$, $aX$ is a constant and the correlation is undefined.
$
\rho(aX,Y)=
\begin{cases}
\rho(X,Y) & \text{if } a>0 \\
-\rho(X,Y) & \text{if } a<0
\end{cases}
$