A simple updating algorithm that finds parameters $\theta, \theta_0$ for a [[Linear Classifier]] when the training set $S_n$ is linearly separable. Here $\theta_0$ is a scalar offset, while $\theta$ is a parameter vector of the same dimension as the feature vectors $x^{(i)}$.

$
S_n = \Big \{(x^{(i)}, y^{(i)}),\ i=1,\dots,n\Big\}
$

**Algorithm:**

```text
Initialize theta = 0, theta_0 = 0
for t = 1 to T:                                  # epochs
    for i = 1 to n:                              # training examples
        if y_i * (theta . x_i + theta_0) <= 0:   # misclassified (or on boundary)
            theta   = theta + y_i * x_i
            theta_0 = theta_0 + y_i
```

^3acd31

**Description:**

1. *Initialization:* We initialize $\theta=0$ and $\theta_0=0$.
2. *Epochs:* We iterate $T$ times over the complete training set. This is necessary because updates triggered by early training examples can be partially undone by later ones. After several passes over the training set, the ordering of $S_n$ no longer plays a role.
3. *Iteration over observations:* In each epoch we iterate over all $n$ training examples.
4. *Update:* For each training example we check whether it is misclassified, i.e. whether $y^{(i)}(\theta \cdot x^{(i)}+\theta_0) \le 0$. Only in that case are $\theta$ and $\theta_0$ updated.

**Update:** An update always increases the agreement score $\mathrm{eval}=y^{(i)}(\theta \cdot x^{(i)}+\theta_0)$ for the observation that triggered it.

- *Before update:*
$
\mathrm{eval}=y^{(i)}(\theta \cdot x^{(i)}+\theta_0)
$
- *After update:*
$
\begin{align}
\mathrm{eval}&= y^{(i)}\Big((\overbrace{\theta+y^{(i)}x^{(i)}}^{\theta^\prime}) \cdot x^{(i)}+(\overbrace{\theta_0+y^{(i)}}^{\theta_0^\prime})\Big) \tag{1}\\[4pt]
&=y^{(i)}\Big((y^{(i)}x^{(i)}) \cdot x^{(i)}+y^{(i)}\Big) \tag{2}\\[4pt]
&=x^{(i)} \cdot x^{(i)}+1 \tag{3}\\[4pt]
&=\lVert x^{(i)} \rVert^2 +1 \tag{4}
\end{align}
$

where:

- (1) Replace $(\theta, \theta_0)$ by their updated values $(\theta^\prime, \theta_0^\prime)$.
- (2) To isolate the effect of the update, we consider the very first update, where the before-update parameters $(\theta, \theta_0)$ are both $0$.
- (3) $y^{(i)}$ multiplied by itself always equals $1$, since $y^{(i)} \in \{-1,1\}$.
- (4) The remaining term is strictly positive, so each update strictly improves the agreement score. The same algebra for arbitrary $(\theta, \theta_0)$ shows that every update raises the score by exactly $\lVert x^{(i)} \rVert^2+1$ on top of its old value.

To evaluate the performance of the classifier, we can define the training error $\epsilon_n$ as the fraction of misclassified training examples, using the indicator $\mathbf I^{(i)}$:

$
\begin{align}
\mathbf I^{(i)} &= \begin{cases} 1 & \text{if } y^{(i)}(\theta \cdot x^{(i)}+\theta_0) \le 0 \\ 0 & \text{otherwise} \end{cases}\\
\epsilon_n(\theta, \theta_0)&= \frac{1}{n}\sum_{i=1}^n \mathbf I^{(i)}
\end{align}
$
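
For concreteness, here is a minimal NumPy sketch of the pseudocode and the training error above. The function names (`perceptron`, `training_error`) and the toy dataset are illustrative and not part of the original note.

```python
import numpy as np

def perceptron(X, y, T=100):
    """Perceptron updates as in the pseudocode above.

    X: (n, d) feature matrix, y: (n,) labels in {-1, +1}.
    Returns (theta, theta_0).
    """
    n, d = X.shape
    theta = np.zeros(d)
    theta_0 = 0.0
    for _ in range(T):                                  # epochs
        for i in range(n):                              # training examples
            if y[i] * (theta @ X[i] + theta_0) <= 0:    # misclassified (or on boundary)
                theta += y[i] * X[i]
                theta_0 += y[i]
    return theta, theta_0

def training_error(theta, theta_0, X, y):
    """Fraction of examples with y_i * (theta . x_i + theta_0) <= 0."""
    return float(np.mean(y * (X @ theta + theta_0) <= 0))

# Toy linearly separable data (illustrative).
X = np.array([[1.0, 2.0], [2.0, 3.0], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

theta, theta_0 = perceptron(X, y)
print(theta, theta_0, training_error(theta, theta_0, X, y))  # error should reach 0.0
```

Because the data above is linearly separable, the updates eventually stop firing and the training error reaches zero.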
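
The update derivation can also be checked numerically for arbitrary starting parameters, not only the $(\theta, \theta_0)=(0,0)$ case worked out above: a single update raises the agreement score by exactly $\lVert x^{(i)} \rVert^2 + 1$. The variable names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, theta_0 = rng.normal(size=3), rng.normal()  # arbitrary current parameters
x, y = rng.normal(size=3), -1.0                    # one observation, label in {-1, +1}

before = y * (theta @ x + theta_0)
after = y * ((theta + y * x) @ x + (theta_0 + y))  # score after one update

# eval' - eval = ||x||^2 + 1, independent of the old parameters.
assert np.isclose(after - before, x @ x + 1)
```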