## Kernel Trick The kernel tricks shows that computing a higher order feature space (i.e. [[Polynomial Feature Transformation]] of $x$) is equivalent to just using the [[Dot Product]] of $x$ raised to the power $d$. It is called a trick because we do not need to explicitly compute the features, which makes is computationally efficient. ## Example: Quadratic Feature Transformation Assume we have an original feature space $x \in [x_1, x_2]$ and we explicitly added quadratic terms and interactions. We made two observations $x$ and $x^\prime$. $ \begin{align} \phi(x)= \big [x_1, \, x_2,\, {x_1}^2,\, \sqrt{2}x_1x_2,\, {x_2}^2 \big ] \\[6pt] \phi(x^\prime) = \big [x_1^\prime ,\, x_2^\prime ,\, {x_1^\prime }^2,\, \sqrt{2}x_1^\prime x_2^\prime ,\, {x_2^\prime }^2 \big ] \end{align} $ >[!note:] >The factor $\sqrt 2$ for the interaction terms was necessary to make the equation below fit. However such a scalar transformation does not matter to us, as the respective coefficient will just be scaled accordingly. We can now take the dot product between these two vectors. $ \begin{align} \phi(x) \cdot \phi(x^\prime) &= x_1 x_1^\prime + x_2 x_2^\prime + x_1^2 {x_1^\prime}^2 + 2*x_1 {x_1^\prime} x_2 x_2^\prime+ x_2^2 {x_2^\prime}^2 \\[6pt] &=\left({x_1}{x_1^\prime } + {x_2}{x_2^\prime }\right)+ \left({x_1}{x_1^\prime } + {x_2}{x_2^\prime }\right)^2 \\[6pt] &=\underbrace{x \cdot x^\prime + (x \cdot x^\prime )^2}_{K(x, x^\prime)} \end{align} $ So we see that the explicitly computed feature space $\phi(x)$ can be also achieve by taking dot products on the original $x$ [[Vector Operations|Vector]], which is computationally more efficient. We call this transformation function $K(x,x^\prime)$ a *Kernel*. $ K(x, x^\prime)=x \cdot x^\prime + (x \cdot x^\prime )^2 \equiv \phi(x) \cdot \phi(x^\prime) $ ## Polynomial Kernel Function The polynomial kernel this extends to higher orders than $d=2$. The Kernel function of a $n^{th}$ term polynomial feature vector can be written as follows.. $ \begin{align} K(x, x^\prime)&= \phi(x) \cdot \phi(x^\prime)\\[4pt] &=(x\cdot x^\prime)+(x\cdot x^\prime)^2+\cdots+(x\cdot x^\prime)^n \end{align} $ The above is called a *homogeneous polynomial kernel*. In practice we often only use the highest order dot product in the kernel, and disregard all lower term items. ## Features Captured by Polynomial Kernels Quadratic Kernel (n=2): | Feature Type | General Polynomial Kernel | Homogeneous Polynomial Kernel | | --------------- | ------------------------- | ----------------------------- | | Linear Terms | - | $x_1, x_2$ | | Quadratic Terms | $x_1^2, x_2^2, x_1*x_2$ | $x_1^2, x_2^2, x_1*x_2$ | Cubic Kernel (n=3): | Feature Type | General Polynomial Kernel | Homogeneous Polynomial Kernel | | --------------- | ------------------------------------ | ---------------------------------- | | Linear Terms | - | $x_1,x_2$ | | Quadratic Terms | - | $x_1^2, x_2^2, x_1*x_2$ | | Cubic Terms | $x_1^3, x_2^3, x_1^2*x_2, x_1*x_2^2$ | $x_1^3, x_2^3, x_1^2x_2, x_1x_2^2$ |