We can use [[Kernel Based Methods]] to run the [[Perceptron Algorithm]] purely in terms of kernel evaluations, without ever computing the feature vectors explicitly.
**Regular Perceptron Algorithm:**
![[Perceptron Algorithm#^3acd31]]
The parameter $\theta$ (if initialized at $0$) can be written as a sum over all $n$ observations, where each term of the sum is the product of:
- $\alpha_j:$ Number of times observation $j$ has been misclassified (and therefore triggered an update).
- $y^{(j)}:$ Target label of observation $j$.
- $\phi(x^{(j)}):$ Feature vector of the original data point $x^{(j)}$.
$ \theta = \sum _{j=1}^{n} \alpha _ j y^{(j)} \phi \left(x^{(j)}\right) $
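For intuition, a small made-up example: if $n = 3$ and point $1$ was misclassified twice, point $2$ never, and point $3$ once, then
$ \theta = 2\,y^{(1)}\phi\left(x^{(1)}\right) + 0 \cdot y^{(2)}\phi\left(x^{(2)}\right) + 1 \cdot y^{(3)}\phi\left(x^{(3)}\right) $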
## Evaluation Step
Substituting this new definition lets us eliminate the $\theta$ parameter and check the mistake condition using the kernel function instead.
$
\begin{align}
y^{(i)}(\theta \cdot \phi(x^{(i)})) &\le 0 \\[8pt]
y^{(i)}\sum_{j=1}^n \alpha_j y^{(j)} \phi (x^{(j)})\cdot \phi(x^{(i)}) &\le 0 \\
y^{(i)} \sum_{j=1}^n \alpha_j y^{(j)} \underbrace{K(x^{(j)}, x^{(i)})}_{\text{similarity}} &\le 0
\end{align}
$
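A minimal sketch of this evaluation step in Python, assuming a Gaussian (RBF) kernel as the similarity measure (the kernel choice and all names here are my own, not fixed by the algorithm):
```python
import numpy as np

def rbf_kernel(x_j, x_i, gamma=1.0):
    # One possible similarity measure: K(x_j, x_i) = exp(-gamma * ||x_j - x_i||^2)
    return np.exp(-gamma * np.sum((x_j - x_i) ** 2))

def is_mistake(i, X, y, alpha, kernel=rbf_kernel):
    # Mistake condition: y_i * sum_j alpha_j * y_j * K(x_j, x_i) <= 0
    s = sum(alpha[j] * y[j] * kernel(X[j], X[i]) for j in range(len(X)))
    return y[i] * s <= 0
```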
The summation can be interpreted as a weighted vote over the training labels. The weighting comes from two sources:
- The kernel function $K(x^{(j)}, x^{(i)})$ measures similarity between the two input vectors; larger values indicate greater similarity.
- $\alpha_j$ records the number of misclassifications of observation $j$. A large $\alpha_j$ indicates that $j$ is difficult to classify and therefore lies close to the decision boundary, making it an important point for shaping that boundary.
## Update Step
We can show that the update step simplifies to incrementing the mistake counter $\alpha_i$ by one.
$
\begin{align}
\theta &= \theta + y^{(i)}\phi (x^{(i)}) \tag{1}\\[4pt]
\sum_{j=1}^n\alpha_jy^{(j)}\phi(x^{(j)}) &=\sum_{j=1}^n \alpha_j y^{(j)} \phi(x^{(j)})+y^{(i)}\phi(x^{(i)}) \tag{2}\\[4pt]
\alpha_iy^{(i)} \phi(x^{(i)})+ \sum_{j=1;\, j \neq i}^n\alpha_j y^{(j)}\phi(x^{(j)}) &= \alpha_i y^{(i)} \phi(x^{(i)})+\sum_{j=1;\, j \neq i}^n \alpha_j y^{(j)} \phi(x^{(j)})+y^{(i)}\phi(x^{(i)}) \tag{3}\\[8pt]
\alpha_i y^{(i)} \phi(x^{(i)}) &= (\alpha_i+1)\, y^{(i)} \phi(x^{(i)}) \tag{4}\\[8pt]
\alpha_i &= \alpha_i+1 \tag{5}
\end{align}
$
where:
- (2) Replace $\theta$ with new definition.
- (3) Split the summation into the case where $j=i$ and the cases where $j \neq i$.
- (4) Cancel the identical $j \neq i$ sums on both sides and factor out $y^{(i)}\phi(x^{(i)})$ on the right.
- (5) Divide both sides by $y^{(i)}\phi(x^{(i)})$ and compare coefficients.
**Kernel Perceptron Algorithm:**
```
Initialize alpha_j = 0 for j = 1 to n
for t = 1 to T:
    for i = 1 to n:
        if y_i * sum_j( alpha_j * y_j * Kernel(x_j, x_i) ) <= 0:
            alpha_i += 1
```
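Putting the pieces together, a runnable sketch in Python, reusing `rbf_kernel` and `is_mistake` from the evaluation sketch above (the toy data and function names are illustrative assumptions):
```python
def kernel_perceptron_train(X, y, T=10, kernel=rbf_kernel):
    alpha = np.zeros(len(X))           # alpha_j counts mistakes on point j
    for _ in range(T):                 # T passes over the training data
        for i in range(len(X)):
            if is_mistake(i, X, y, alpha, kernel):
                alpha[i] += 1          # the entire update step
    return alpha

def kernel_perceptron_predict(x_new, X, y, alpha, kernel=rbf_kernel):
    # Sign of the kernel-weighted vote over training labels decides the class
    s = sum(alpha[j] * y[j] * kernel(X[j], x_new) for j in range(len(X)))
    return 1 if s > 0 else -1

# XOR-like toy data: not linearly separable, but separable with the RBF kernel
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([-1, -1, 1, 1])
alpha = kernel_perceptron_train(X, y, T=20)
print(kernel_perceptron_predict(np.array([0.1, 0.9]), X, y, alpha))  # prints 1
```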