## Jacobian Vector

A row vector that consists of all [[Partial Derivative|partial derivatives]] of a scalar-valued multivariate function, e.g. $f(x_1, x_2, x_3)$. It represents the gradient of the function.

$$
J=\Big[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \frac{\partial f}{\partial x_3}, \dots \Big]
$$

The Jacobian vector points in the direction of the steepest ascent of $f$ at a given point $[x_1, x_2, x_3]$. This is because the partial derivative $\frac{\partial f}{\partial x_n}$ along which $f$ changes the most has the largest magnitude, so the vector as a whole tilts toward the direction of greatest increase.

> [!note]
> By convention the Jacobian is written as a row vector.

## Jacobian Matrix

Vector-valued functions return a vector of outputs, e.g. $f(x,y) \to \{z_1, z_2\}$. To get the full picture of such a function, we can stack the row-wise Jacobians of each output $\{z_1, z_2\}$ below each other, and thereby obtain a Jacobian matrix.

$$
J=
\begin{bmatrix}
\frac{\partial f_1}{\partial x} & \frac{\partial f_1}{\partial y} \\
\frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y}
\end{bmatrix}
$$

where:

- The first row is the Jacobian vector $J_{[11]}, J_{[12]}$ of the first output element $f_1$.
- The second row is the Jacobian vector $J_{[21]}, J_{[22]}$ of the second output element $f_2$.

To see how the outputs change around some specific point, we evaluate the Jacobian matrix at that point and multiply it with a small displacement vector $\delta$, which gives the linear approximation $f(p + \delta) \approx f(p) + J\delta$.

> [!note]
> Even if the function we are evaluating is not linear, a very small step $\delta$ is still approximately linear, as long as the function is smooth.

## Optimization

When we want to find the minimum or maximum of a function, we could simply evaluate the function at every possible point. However, there are more efficient ways:

- *Critical points:* A function's maximum or minimum occurs at points where its gradient (Jacobian vector) is zero.
  If the function has multiple minima/maxima, we need to evaluate each of these critical points to determine whether it is the global minimum/maximum.
- *Gradient descent:* When we do not have an analytical expression for the critical points of our problem, we start at a random point and repeatedly take a small step against the gradient, i.e. in the direction of steepest descent ([[Gradient Descent]]).
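The Jacobian matrix and the small-step approximation above can be sketched numerically with finite differences. The example function `f`, the step size `h`, and the displacement are illustrative assumptions, not part of the original note:

```python
def f(x, y):
    # Example vector-valued function: f(x, y) = (x^2 * y, x + y^2)
    return [x**2 * y, x + y**2]

def jacobian(func, x, y, h=1e-6):
    """Approximate the 2x2 Jacobian of func at (x, y) via central differences."""
    fx_plus, fx_minus = func(x + h, y), func(x - h, y)
    fy_plus, fy_minus = func(x, y + h), func(x, y - h)
    return [
        [(fx_plus[i] - fx_minus[i]) / (2 * h),   # d f_i / dx
         (fy_plus[i] - fy_minus[i]) / (2 * h)]   # d f_i / dy
        for i in range(2)
    ]

# At (1, 2) the analytical Jacobian is [[2xy, x^2], [1, 2y]] = [[4, 1], [1, 4]].
J = jacobian(f, 1.0, 2.0)

# Small step: f(p + delta) is approximately f(p) + J @ delta.
dx, dy = 1e-3, -2e-3
actual = f(1.0 + dx, 2.0 + dy)
predicted = [f(1.0, 2.0)[i] + J[i][0] * dx + J[i][1] * dy for i in range(2)]
```

Here `predicted` matches `actual` to within roughly the square of the step size, illustrating the note above: over a small enough step, a smooth function behaves linearly.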
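The gradient-descent idea can be sketched in a few lines. The bowl-shaped example function, the analytically derived gradient, the learning rate, and the iteration count are all illustrative assumptions:

```python
def f(x, y):
    # Example convex function with its minimum at (3, -1).
    return (x - 3)**2 + (y + 1)**2

def grad(x, y):
    # Gradient (Jacobian vector) of f, derived analytically.
    return (2 * (x - 3), 2 * (y + 1))

x, y = 0.0, 0.0          # start at an arbitrary point
lr = 0.1                 # step size
for _ in range(200):
    gx, gy = grad(x, y)
    # Step *against* the gradient to decrease f.
    x -= lr * gx
    y -= lr * gy
# (x, y) is now very close to the minimum (3, -1)
```

For non-convex functions this only finds a local minimum, which is why the multiple-minima caveat above matters.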