We can use [[Kernel Based Methods]] to perform the [[Perceptron Algorithm]].

**Regular Perceptron Algorithm:**
![[Perceptron Algorithm#^3acd31]]

The parameter $\theta$ (if initialized at $0$) can be written as a sum over all $n$ observations, where for each observation $j$:

- $x^{(j)}:$ Data point of observation $j$.
- $\phi(x^{(j)}):$ Feature vector of the original data point $x^{(j)}$.
- $y^{(j)}:$ Target label of observation $j$.
- $\alpha_j:$ Number of times observation $j$ has been misclassified.

$
\theta = \sum _{j=1}^{n} \alpha _j y^{(j)} \phi \left(x^{(j)}\right)
$

## Evaluation Step

Substituting this new definition of $\theta$, we can get rid of the $\theta$ parameter and evaluate the mistake condition using the kernel function instead.

$
\begin{align}
y^{(i)}(\theta \cdot \phi(x^{(i)})) &\le 0 \\[8pt]
y^{(i)}\sum_{j=1}^n \alpha_j y^{(j)} \phi (x^{(j)})\cdot \phi(x^{(i)}) &\le 0 \\
y^{(i)} \sum_{j=1}^n \alpha_j y^{(j)} \underbrace{K(x^{(j)}, x^{(i)})}_{\text{similarity}} &\le 0
\end{align}
$

The summation term can be interpreted as a weighted $y$ label. The weighting comes from two sources:

- The kernel function $K(x^{(j)}, x^{(i)})$ is a measure of similarity between the two input vectors. Larger values mean greater similarity.
- $\alpha_j$ records the number of misclassifications of observation $j$. A large $\alpha_j$ indicates that $j$ is difficult to classify and is therefore close to the decision boundary, which makes it an important data point for shaping that boundary.

## Update Step

We can show that the update step simplifies to a simple increment of the $\alpha$ parameter.

$
\begin{align}
\theta &= \theta + y^{(i)}\phi (x^{(i)}) \tag{1}\\[4pt]
\sum_{j=1}^n\alpha_jy^{(j)}\phi(x^{(j)}) &=\sum_{j=1}^n \alpha_j y^{(j)} \phi(x^{(j)})+y^{(i)}\phi(x^{(i)}) \tag{2}\\[4pt]
\alpha_iy^{(i)} \phi(x^{(i)})+ \sum_{j=1 ; \, j \neq i}^n\alpha_j y^{(j)}\phi(x^{(j)}) &= \alpha_i y^{(i)} \phi(x^{(i)})+\sum_{j=1; \, j \neq i}^n \alpha_j y^{(j)} \phi(x^{(j)})+y^{(i)}\phi(x^{(i)}) \tag{3}\\[8pt]
\alpha_i y^{(i)} \phi(x^{(i)}) &= (\alpha_i+1)\, y^{(i)} \phi(x^{(i)}) \tag{4}\\[8pt]
\alpha_i &= \alpha_i+1 \tag{5}
\end{align}
$

where:

- (2) Replace $\theta$ with its new definition.
- (3) Split the summation into the case where $j=i$ and the case where $j \neq i$.
- (4) Cancel the identical sums over $j \neq i$ on both sides and factor out $y^{(i)}\phi(x^{(i)})$ on the right.
- (5) Both sides are multiples of the same vector $y^{(i)}\phi(x^{(i)})$, so the coefficients must match.

**Kernel Perceptron Algorithm:**

```
Initialize alpha_1, ..., alpha_n = 0
for t = 1 to T:
    for i = 1 to n:
        if y_i * sum_j(alpha_j * y_j * Kernel(x_j, x_i)) <= 0:
            alpha_i = alpha_i + 1
```
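
To make the evaluation and update steps concrete, here is a minimal NumPy sketch of the kernel perceptron. The choice of an RBF kernel as the similarity measure and the names `rbf_kernel`, `kernel_perceptron_train`, and `kernel_perceptron_predict` are illustrative assumptions, not fixed by the notes above; any valid kernel could be plugged in.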
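
```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    # RBF kernel as an example similarity measure K(x, z);
    # this is an illustrative choice, not the only valid kernel.
    return np.exp(-gamma * np.sum((x - z) ** 2))

def kernel_perceptron_train(X, y, kernel=rbf_kernel, T=10):
    # alpha[j] counts how often observation j has been misclassified.
    n = len(X)
    alpha = np.zeros(n)
    # Precompute the Gram matrix K[j, i] = K(x_j, x_i).
    K = np.array([[kernel(X[j], X[i]) for i in range(n)] for j in range(n)])
    for _ in range(T):
        for i in range(n):
            # Evaluation step: mistake if y_i * sum_j alpha_j y_j K(x_j, x_i) <= 0.
            if y[i] * np.sum(alpha * y * K[:, i]) <= 0:
                # Update step: increment the mistake count of point i.
                alpha[i] += 1
    return alpha

def kernel_perceptron_predict(X_train, y_train, alpha, x, kernel=rbf_kernel):
    # Classify a new point by the sign of the kernel-weighted vote.
    score = sum(alpha[j] * y_train[j] * kernel(X_train[j], x)
                for j in range(len(X_train)))
    return np.sign(score)
```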
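
On an XOR-style toy set (not linearly separable in the original space, but separable under the RBF feature map), training converges after a few epochs:

```python
X = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
y = np.array([-1., -1., 1., 1.])
alpha = kernel_perceptron_train(X, y)
print(kernel_perceptron_predict(X, y, alpha, np.array([0.9, 0.1])))  # 1.0
```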