## Kernel Trick
The kernel tricks shows that computing a higher order feature space (i.e. [[Polynomial Feature Transformation]] of $x$) is equivalent to just using the [[Dot Product]] of $x$ raised to the power $d$. It is called a trick because we do not need to explicitly compute the features, which makes is computationally efficient.
## Example: Quadratic Feature Transformation
Assume we have an original feature space $x \in [x_1, x_2]$ and we explicitly added quadratic terms and interactions. We made two observations $x$ and $x^\prime$.
$
\begin{align}
\phi(x)= \big [x_1, \, x_2,\, {x_1}^2,\, \sqrt{2}x_1x_2,\, {x_2}^2 \big ] \\[6pt] \phi(x^\prime) = \big [x_1^\prime ,\, x_2^\prime ,\, {x_1^\prime }^2,\, \sqrt{2}x_1^\prime x_2^\prime ,\, {x_2^\prime }^2 \big ]
\end{align}
$
>[!note:]
>The factor $\sqrt 2$ for the interaction terms was necessary to make the equation below fit. However such a scalar transformation does not matter to us, as the respective coefficient will just be scaled accordingly.
We can now take the dot product between these two vectors.
$
\begin{align}
\phi(x) \cdot \phi(x^\prime) &= x_1 x_1^\prime + x_2 x_2^\prime + x_1^2 {x_1^\prime}^2 + 2*x_1 {x_1^\prime} x_2 x_2^\prime+ x_2^2 {x_2^\prime}^2 \\[6pt]
&=\left({x_1}{x_1^\prime } + {x_2}{x_2^\prime }\right)+ \left({x_1}{x_1^\prime } + {x_2}{x_2^\prime }\right)^2 \\[6pt]
&=\underbrace{x \cdot x^\prime + (x \cdot x^\prime )^2}_{K(x, x^\prime)}
\end{align}
$
So we see that the explicitly computed feature space $\phi(x)$ can be also achieve by taking dot products on the original $x$ [[Vector Operations|Vector]], which is computationally more efficient. We call this transformation function $K(x,x^\prime)$ a *Kernel*.
$ K(x, x^\prime)=x \cdot x^\prime + (x \cdot x^\prime )^2 \equiv \phi(x) \cdot \phi(x^\prime) $
## Polynomial Kernel Function
The polynomial kernel this extends to higher orders than $d=2$. The Kernel function of a $n^{th}$ term polynomial feature vector can be written as follows..
$
\begin{align}
K(x, x^\prime)&= \phi(x) \cdot \phi(x^\prime)\\[4pt] &=(x\cdot x^\prime)+(x\cdot x^\prime)^2+\cdots+(x\cdot x^\prime)^n
\end{align}
$
The above is called a *homogeneous polynomial kernel*. In practice we often only use the highest order dot product in the kernel, and disregard all lower term items.
## Features Captured by Polynomial Kernels
Quadratic Kernel (n=2):
| Feature Type | General Polynomial Kernel | Homogeneous Polynomial Kernel |
| --------------- | ------------------------- | ----------------------------- |
| Linear Terms | - | $x_1, x_2$ |
| Quadratic Terms | $x_1^2, x_2^2, x_1*x_2$ | $x_1^2, x_2^2, x_1*x_2$ |
Cubic Kernel (n=3):
| Feature Type | General Polynomial Kernel | Homogeneous Polynomial Kernel |
| --------------- | ------------------------------------ | ---------------------------------- |
| Linear Terms | - | $x_1,x_2$ |
| Quadratic Terms | - | $x_1^2, x_2^2, x_1*x_2$ |
| Cubic Terms | $x_1^3, x_2^3, x_1^2*x_2, x_1*x_2^2$ | $x_1^3, x_2^3, x_1^2x_2, x_1x_2^2$ |