**Feature vector:** $x^{(i)}$ collects the characteristics of a single observation $i$ and is mapped to a classification label $y$.
$ x \in \mathbb R^d, \;y \in \{-1,1\} $
**Training set:** A collection of $n$ pairs $(x^{(i)}, y^{(i)})$.
$ S_n=\left \{(x^{(i)}, y^{(i)}), \, i=1, \dots,n \right \} $
**Model:** The classification model $h$ maps a feature vector in $\mathbb R^d$ to one of the two labels in $\{-1,1\}$.
$ h: \mathbb R^d \to \{-1,1\} $
## Decision Boundary
The decision boundary is a hyperplane in $\mathbb R^d$. Observations fall on one of its two sides and are classified accordingly.
Points lie directly on the decision boundary when the [[Dot Product]] of the parameter vector $\theta$ with the observation $\mathbf x$ equals $0$. When the decision boundary does not pass through the origin, an offset $\theta_0$ is added to the equation.
$
\begin{aligned}
\{x&:\theta_1x_1+ \theta_2x_2+\theta_0=0 \} \\[6pt]
\{x&: \theta \cdot \mathbf x +\theta_0=0 \}
\end{aligned}
$
**Note:**
1. A dot product of $0$ between two [[Vector Operations|Vectors]] means that they are orthogonal to each other.
2. The parameter vector $\theta$ is orthogonal to the decision boundary. Therefore, if $\theta \cdot \mathbf x=0$ (for a boundary through the origin), $\mathbf x$ must point along the decision boundary itself.
3. The sign of $\theta \cdot \mathbf x + \theta_0$ indicates on which side of the decision boundary a point $\mathbf x$ lies. Hence the model classifies $x^{(i)}$ based on this sign.
## Linear Classifier
The general form of a linear classifier $h$ takes an observation $x$ and the decision-boundary parameters $\theta, \theta_0$ as input. The output is the sign of the dot product plus offset.
$ h(x; \theta, \theta_0)= \mathrm{sign}(\theta \cdot x + \theta_0), \quad \text{where } \theta\in \mathbb R^d,\ \theta_0\in \mathbb R $
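A minimal sketch of this classifier in Python using NumPy; the parameter values and the test point below are made up for illustration:

```python
import numpy as np

def h(x, theta, theta_0):
    """Linear classifier: sign of the dot product plus offset."""
    # Note: np.sign returns 0 for points exactly on the boundary;
    # a convention (e.g. mapping 0 to +1) would be needed there.
    return int(np.sign(theta @ x + theta_0))

# Hypothetical parameters and observation.
theta = np.array([0.5, 1.0])
theta_0 = -3.0
x = np.array([2.0, 5.0])

print(h(x, theta, theta_0))  # 1, since 0.5*2 + 1.0*5 - 3.0 = 3 > 0
```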
**Linear separation:** Training examples are linearly separable if there exists a parameter vector $\hat \theta$ and offset parameter $\hat \theta_0$ such that
$ \underbrace{y^{(i)}}_{\text{label}} \, \underbrace{(\hat \theta \cdot x^{(i)}+ \hat \theta_0)}_{\text{model output}}>0 \quad \text{for all } i=1,\dots, n $
The inequality holds when the label $y^{(i)}$ and the model output share the same sign, i.e. when the example is classified correctly.
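This condition can be checked directly on a training set. A short sketch, assuming NumPy arrays `X` (shape $n \times d$) and `y` (labels in $\{-1,1\}$); the toy data here is made up:

```python
import numpy as np

def is_separated(X, y, theta, theta_0):
    """True if every example satisfies y * (theta . x + theta_0) > 0."""
    margins = y * (X @ theta + theta_0)
    return bool(np.all(margins > 0))

# Hypothetical toy data.
X = np.array([[3.0, 2.0], [1.0, 4.0]])
y = np.array([1, -1])
print(is_separated(X, y, np.array([1.0, -1.0]), 0.0))  # True
```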
## Example
Assume a decision boundary through the origin along the $45°$ direction $[1,1]^T$. The parameter vector $\theta$ is orthogonal to it. Observations that fall on the side $\theta$ points towards have a positive sign.
![[linear-classifier.png|center|400]]
$
\begin{aligned}
\text{dec. bound} = \begin{bmatrix} 1 \\1\end{bmatrix}\quad
\theta = \begin{bmatrix} \phantom{-}1 \\-1\end{bmatrix}\quad
x^{(1)} = \begin{bmatrix} 3\\ 2\end{bmatrix} \quad
x^{(2)} = \begin{bmatrix} 1\\ 4\end{bmatrix}
\end{aligned}
$
$
\begin{aligned}
\theta \cdot x^{(1)} &= (1 \cdot 3)+(-1 \cdot 2)=1>0 &&\Rightarrow h(x^{(1)})=+1 \\
\theta \cdot x^{(2)} &= (1 \cdot 1)+(-1 \cdot 4)=-3<0 &&\Rightarrow h(x^{(2)})=-1
\end{aligned}
$
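The same numbers can be reproduced quickly in code (a sketch using NumPy; variable names are chosen for illustration):

```python
import numpy as np

theta = np.array([1, -1])   # orthogonal to the 45° boundary direction [1, 1]
x1 = np.array([3, 2])
x2 = np.array([1, 4])

print(theta @ x1, np.sign(theta @ x1))  # 1, 1   -> classified as +1
print(theta @ x2, np.sign(theta @ x2))  # -3, -1 -> classified as -1
```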