**Feature vectors:** A feature vector $x^{(i)}$ collects the characteristics of a single observation $i$ and is mapped to a classification label $y^{(i)}$.

$
x \in \mathbb R^d, \quad y \in \{-1,1\}
$

**Training set:** A collection of $n$ pairs $(x^{(i)}, y^{(i)})$.

$
S_n=\left \{ \left(x^{(i)}, y^{(i)}\right), \; i=1, \dots, n \right \}
$

**Model:** The classification model $h$ maps a feature vector in $\mathbb R^d$ to a label in $\{-1,1\}$.

$
h: \mathbb R^d \to \{-1,1\}
$

## Decision Boundary

The decision boundary is a hyperplane in $\mathbb R^d$. Observations fall on one of its two sides and are classified accordingly. A point lies directly on the decision boundary when the [[Dot Product]] of the parameter vector $\theta$ with the observation $\mathbf x$ equals $0$. When the decision boundary does not pass through the origin, an offset $\theta_0$ is added to the equation.

$
\begin{aligned}
\{x&: \theta_1 x_1 + \theta_2 x_2 + \theta_0 = 0 \} \\[6pt]
\{x&: \theta \cdot \mathbf x + \theta_0 = 0 \}
\end{aligned}
$

**Note:**
1. A dot product of $0$ between two [[Vector Operations|Vectors]] means that they are orthogonal to each other.
2. The parameter vector $\theta$ is orthogonal to the decision boundary. Hence, if $\theta \cdot \mathbf x = 0$ (boundary through the origin), $\mathbf x$ points along the decision boundary itself.
3. The sign of $\theta \cdot \mathbf x + \theta_0$ indicates on which side of the decision boundary a point $\mathbf x$ lies. The model therefore classifies $x^{(i)}$ based on this sign.

## Linear Classifier

The general form of a linear classifier $h$ takes an observation $x$ and the decision-boundary parameters $\theta, \theta_0$ as input. Its output is the sign of the dot product plus offset.

$
h(x; \theta, \theta_0) = \mathrm{sign}(\theta \cdot x + \theta_0), \quad \text{where} \; \theta \in \mathbb R^d, \; \theta_0 \in \mathbb R
$

**Linear separation:** Training examples are linearly separable if there exist a parameter vector $\hat \theta$ and an offset parameter $\hat \theta_0$ such that

$
\underbrace{y^{(i)}}_{\text{label}} \, \underbrace{(\hat \theta \cdot x^{(i)}+ \hat \theta_0)}_{\text{model output}}>0 \quad \text{for all } i=1,\dots, n
$

The inequality holds when the label $y^{(i)}$ and the model output share the same sign, i.e. when example $i$ is classified correctly.

## Example

Assume a decision boundary through the origin along the direction $[1,1]^\top$ (at $45°$). The parameter vector $\theta$ is orthogonal to it. Observations that fall on the side $\theta$ points towards get a positive sign.

![[linear-classifier.png|center|400]]

$
\begin{aligned}
\text{dec. bound} = \begin{bmatrix} 1 \\ 1\end{bmatrix}\quad
\theta = \begin{bmatrix} \phantom{-}1 \\ -1\end{bmatrix}\quad
x^{(1)} = \begin{bmatrix} 3\\ 2\end{bmatrix} \quad
x^{(2)} = \begin{bmatrix} 1\\ 4\end{bmatrix}
\end{aligned}
$

$
\begin{aligned}
\theta \cdot x^{(1)} &= (1\cdot 3)+(-1\cdot 2)=1>0 &&\implies h(x^{(1)})=+1 \\
\theta \cdot x^{(2)} &= (1\cdot 1)+(-1\cdot 4)=-3<0 &&\implies h(x^{(2)})=-1
\end{aligned}
$
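The following Python sketch mirrors the classifier definition above: `h(x, theta, theta_0)` returns the sign of $\theta \cdot x + \theta_0$ and reproduces the numbers from the worked example. The function and variable names are illustrative, not from the source, and points falling exactly on the boundary (dot product of zero) are assigned $-1$ here purely by convention.

```python
import numpy as np

def h(x, theta, theta_0=0.0):
    """Linear classifier: return the sign of theta . x + theta_0 as +1 or -1.

    Points exactly on the boundary (value 0) are mapped to -1 by convention.
    """
    return 1 if np.dot(theta, x) + theta_0 > 0 else -1

# Worked example: boundary through the origin, theta orthogonal to [1, 1]
theta = np.array([1.0, -1.0])
x1 = np.array([3.0, 2.0])   # theta . x1 =  1  -> classified +1
x2 = np.array([1.0, 4.0])   # theta . x2 = -3  -> classified -1

print(h(x1, theta))  # 1
print(h(x2, theta))  # -1
```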
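Similarly, a minimal check of the linear-separation condition, assuming (for illustration only) that the two example points carry the labels $y^{(1)} = +1$ and $y^{(2)} = -1$: the condition holds exactly when every margin $y^{(i)}(\hat\theta \cdot x^{(i)} + \hat\theta_0)$ is positive.

```python
import numpy as np

def is_linearly_separated(X, y, theta, theta_0=0.0):
    """True if y_i * (theta . x_i + theta_0) > 0 for every training example."""
    margins = y * (X @ theta + theta_0)
    return bool(np.all(margins > 0))

# Hypothetical training set built from the worked example (labels assumed)
X = np.array([[3.0, 2.0],
              [1.0, 4.0]])
y = np.array([1, -1])
theta = np.array([1.0, -1.0])

print(is_linearly_separated(X, y, theta))  # True: both margins are positive
```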