Kernel methods can be thought of as instance-based learners: rather than learning a fixed set of parameters $\theta$ corresponding to the input features, they compare training examples with one another and learn a corresponding weight $\alpha_j$ for each.
Prediction for an unlabeled input $x^\prime$ is made by applying a similarity function $K$, called a kernel, between $x^\prime$ and each of the training inputs $x_j$:
$ f(x^\prime) = \sum_j \alpha_j K(x_j, x^\prime) $
Different kernel functions capture different notions of similarity. The most popular kernels are the following (a code sketch of both follows the list):
- *Polynomial kernel:* The [[Polynomial Kernel Function]] corresponds to an inner product in a feature space containing all combinations of the input features up to degree $d$ (for $c > 0$).
$ K(x,x^\prime)=(x\cdot x^\prime+c)^d $
- *Radial basis function (RBF) kernel:* The parameter $\gamma > 0$ controls the width of the kernel. For the Gaussian RBF, $\gamma = 1/(2\sigma^2)$, so $\gamma = 0.5$ corresponds to unit variance ($\sigma = 1$).
$ K(x,x^\prime) = \exp(-\gamma \lVert x-x^\prime \rVert^2) $ ^b82da0
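
As a concrete illustration, here is a minimal NumPy sketch of both kernels and the instance-based prediction rule above. The function names, the toy training points, and the $\alpha_j$ values are illustrative assumptions, not part of this note; in practice the weights would come out of a training procedure such as an SVM or kernel ridge regression.

```python
import numpy as np

def polynomial_kernel(x, x_prime, c=1.0, d=3):
    """K(x, x') = (x . x' + c)^d -- feature combinations up to degree d (for c > 0)."""
    return (np.dot(x, x_prime) + c) ** d

def rbf_kernel(x, x_prime, gamma=0.5):
    """K(x, x') = exp(-gamma * ||x - x'||^2); gamma = 0.5 corresponds to sigma = 1."""
    return np.exp(-gamma * np.sum((x - x_prime) ** 2))

def kernel_predict(x_prime, X_train, alphas, kernel):
    """Instance-based prediction: weighted sum of similarities to the training inputs."""
    return sum(a * kernel(x_j, x_prime) for a, x_j in zip(alphas, X_train))

# Toy usage; the alphas here are made up, standing in for learned weights.
X_train = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
alphas = np.array([0.5, -0.2, 0.8])
x_new = np.array([0.9, 0.1])
print(kernel_predict(x_new, X_train, alphas, rbf_kernel))
```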