A neural network consists of nodes that are connected to each other. Each node takes scalar inputs from other nodes and returns a scalar output.

## Node with Single Input

In the simplest case, node $a^{(1)}$ receives input only from $a^{(0)}$, where the superscript indicates the layer within the network.

![[feed-forward-nn-1.png|center|400]]

$$
a^{(1)} = \sigma (w a^{(0)}+b)
$$

|Symbol|Name|Comment|
|---|---|---|
|$a^{(i)}$|activity at layer $i$|value of the node|
|$w$|weight|scalar to be optimized|
|$b$|bias|scalar to be optimized|
|$\sigma$|activation function|e.g. tanh|

## Node with Multiple Inputs

There can also be multiple input nodes to $a^{(1)}$, each weighted by its own parameter $w_i$.

![[feed-forward-nn-2.png|center|400]]

$$
a^{(1)} = \sigma (w_0 a_0^{(0)}+ \cdots + w_n a_n^{(0)}+b)
$$

**Summation notation:**

$$
a^{(1)}= \sigma\Big(\big(\sum_{i=0}^n w_i a_i^{(0)}\big)+b\Big)
$$

**Vector notation:**

$$
\mathbf w= \begin{bmatrix} w_0 \\ \vdots \\ w_n \end{bmatrix}, \quad
\mathbf a^{(0)}=\begin{bmatrix} a_0^{(0)} \\ \vdots \\ a_n^{(0)} \end{bmatrix}, \quad
a^{(1)} = \sigma(\mathbf w \cdot \mathbf a^{(0)} +b)
$$

## Layer with Multiple Nodes

Sticking with the vector notation, every node within a layer can be expressed as above, each with its own weight vector $\mathbf w_i$ and bias $b_i$.

$$
\begin{aligned}
a_0^{(1)} &= \sigma(\mathbf w_0 \cdot \mathbf a^{(0)} +b_0) \\
&\;\;\vdots \\
a_m^{(1)} &= \sigma(\mathbf w_m \cdot \mathbf a^{(0)} +b_m)
\end{aligned}
$$

![[feed-forward-nn-3.png|center|400]]

When we combine all nodes of a layer (e.g. layer 1) into a vector-valued function (e.g. 2 outputs), we can write the whole layer with one equation.

$$
\mathbf a^{(1)} = \sigma \big(\mathbf W^{(1)} \cdot \mathbf a^{(0)}+\mathbf b^{(1)} \big)
$$

Row $i$ of $\mathbf W^{(1)}$ is the weight vector $\mathbf w_i$ of node $i$ in layer 1, i.e. it holds the weights coming from all nodes of layer 0 into that node. Stacking these row vectors for every node of layer 1 yields the matrix; the bold $\mathbf W$ indicates that we deal with a matrix instead of a vector.

$$
\mathbf W^{(1)}=
\begin{bmatrix}
w_{0,0}^{(1)} & \cdots & w_{0,n}^{(1)} \\
\vdots & \ddots & \vdots \\
w_{m,0}^{(1)} & \cdots & w_{m,n}^{(1)}
\end{bmatrix}
\quad
\mathbf a^{(0)}=
\begin{bmatrix}
a_0^{(0)}\\ \vdots \\ a_n^{(0)}
\end{bmatrix}
\quad
\mathbf b^{(1)}=
\begin{bmatrix}
b_0^{(1)}\\ \vdots \\ b_m^{(1)}
\end{bmatrix}
$$

I.e. a column $j$ of $\mathbf W^{(1)}$ holds the weights leaving the single node $j$ of layer 0.

## Activation Functions

ReLU:

$$
f(z)= \max\{0,z\}
$$

Hyperbolic tangent:

$$
f(z) =\tanh(z) = \frac{e^z-e^{-z}}{e^z+e^{-z}} = 1-\frac{2}{e^{2z}+1}
$$
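A quick numeric check of the two activation functions, as a minimal NumPy sketch (the names `relu` and `tanh_via_identity` are my own choice); it verifies the tanh identity above against `np.tanh`.

```python
import numpy as np

def relu(z):
    """ReLU: max{0, z}, applied elementwise."""
    return np.maximum(0.0, z)

def tanh_via_identity(z):
    """tanh written as 1 - 2 / (e^{2z} + 1), matching the identity above."""
    return 1.0 - 2.0 / (np.exp(2.0 * z) + 1.0)

z = np.linspace(-3.0, 3.0, 7)
assert np.allclose(tanh_via_identity(z), np.tanh(z))  # identity holds numerically
print(relu(z))  # negative inputs are clipped to 0
```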
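To tie the single-node, multi-input, and layer equations together, here is a minimal NumPy sketch of one forward pass, assuming tanh as $\sigma$, three input nodes, and two output nodes; all concrete numbers are made up for illustration.

```python
import numpy as np

sigma = np.tanh  # activation function

# Node with a single input: a^(1) = sigma(w * a^(0) + b)
a0, w, b = 0.5, 1.2, -0.3
a1_single = sigma(w * a0 + b)

# Node with multiple inputs: a^(1) = sigma(w . a^(0) + b)
a0_vec = np.array([0.5, -1.0, 2.0])   # activities of layer 0
w_vec = np.array([0.7, 0.1, -0.4])    # one weight per input node
a1_node = sigma(np.dot(w_vec, a0_vec) + b)

# Layer with multiple nodes: a^(1) = sigma(W^(1) a^(0) + b^(1))
W1 = np.array([[0.7, 0.1, -0.4],      # row 0: weights into node 0 of layer 1
               [0.2, -0.5, 0.3]])     # row 1: weights into node 1 of layer 1
b1 = np.array([-0.3, 0.1])
a1_layer = sigma(W1 @ a0_vec + b1)

# Node 0 of the layer result must match the single multi-input node above.
assert np.isclose(a1_layer[0], a1_node)
print(a1_layer)
```

Reading off the shapes: $\mathbf W^{(1)}$ has one row per node in layer 1 and one column per node in layer 0, so the matrix-vector product reproduces every node's dot product in a single step.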