## Jacobian Vector
A row vector consisting of all [[Partial Derivative|partial derivatives]] of a scalar-valued multivariate function, e.g. $f(x_1, x_2, x_3)$. It corresponds to the gradient of the function, written as a row vector.
$ J=\Big[
\frac{\partial f}{\partial x_1},
\frac{\partial f}{\partial x_2},
\frac{\partial f}{\partial x_3},
\dots \Big ]
$
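For example (an arbitrary function chosen for illustration), for $f(x_1, x_2, x_3) = x_1^2 + 3x_2 x_3$:
$
J=\Big[ 2x_1, \; 3x_3, \; 3x_2 \Big]
$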
The Jacobian vector points in the direction of the steepest ascent of $f$ at a given point $[x_1, x_2, x_3]$. This is because the slope of $f$ in an arbitrary unit direction $u$ is the directional derivative $J \cdot u$, and this dot product is largest when $u$ points in the same direction as $J$ itself.
> [!note]
> By convention the Jacobian is written as a row vector.
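A minimal numerical sketch of the Jacobian vector (assuming NumPy; the function `f` below is just the example from above, and `jacobian_vector` is a hypothetical helper based on central differences):

```python
import numpy as np

def f(x):
    # Example scalar-valued function f(x1, x2, x3) = x1^2 + 3*x2*x3
    return x[0] ** 2 + 3 * x[1] * x[2]

def jacobian_vector(f, x, eps=1e-6):
    """Approximate the Jacobian (row) vector of f at x via central differences."""
    J = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        step = np.zeros_like(x, dtype=float)
        step[i] = eps
        J[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return J

x = np.array([1.0, 2.0, -1.0])
print(jacobian_vector(f, x))  # ~ [2*1, 3*(-1), 3*2] = [2, -3, 6]
```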
## Jacobian Matrix
Vector-valued functions return a vector of outputs, e.g. $f(x,y) \to (z_1, z_2)$. To get the full picture of such a function, we can stack the row-wise Jacobian vectors of the outputs $z_1, z_2$ one below the other, and thereby obtain a Jacobian matrix.
$
\begin{align}
J&=
\begin{bmatrix}
\frac{\partial f_1}{\partial x} & \frac{\partial f_1}{\partial y} \\
\frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y}
\end{bmatrix}
\end{align}
$
where:
- The first row $[J_{11}, J_{12}]$ is the Jacobian vector of the first output element $f_1$.
- The second row $[J_{21}, J_{22}]$ is the Jacobian vector of the second output element $f_2$.
To see how the function behaves around some specific point $[x,y]$, we evaluate the Jacobian matrix at that point; multiplying it with a small displacement vector $\delta$ then gives the approximate change in the outputs: $f([x,y] + \delta) \approx f([x,y]) + J\delta$.
> [!note]
> Even if the function we are evaluating is not linear, it behaves approximately linearly over a very small step $\delta$, as long as it is smooth.
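A small sketch of both ideas (again assuming NumPy; `f` is an arbitrary example function and `jacobian_matrix` a hypothetical helper):

```python
import numpy as np

def f(p):
    # Example vector-valued function f(x, y) -> (z1, z2)
    x, y = p
    return np.array([x * y, x + np.sin(y)])

def jacobian_matrix(f, p, eps=1e-6):
    """Stack the Jacobian row vectors of each output, via central differences."""
    p = np.asarray(p, dtype=float)
    J = np.zeros((len(f(p)), len(p)))
    for j in range(len(p)):
        step = np.zeros_like(p)
        step[j] = eps
        J[:, j] = (f(p + step) - f(p - step)) / (2 * eps)
    return J

p = np.array([1.0, 2.0])
delta = np.array([0.001, -0.002])
J = jacobian_matrix(f, p)
# Linearization: f(p + delta) ~ f(p) + J @ delta
print(f(p + delta))
print(f(p) + J @ delta)  # nearly identical for small delta
```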
## Optimization
When we want to find the minimum or maximum of a function, we could evaluate it by brute force at many candidate points. However, there are more efficient ways:
- *Critical points:* For a smooth function, a maximum or minimum can only occur at a point where its gradient (Jacobian vector) is zero, e.g. for $f(x,y) = x^2 + y^2$ the gradient $[2x, 2y]$ vanishes only at $(0,0)$. If the function has multiple such critical points, we evaluate each of them to determine which one is the global min/max.
- *Gradient descent:* When we cannot solve for the gradient's zeros analytically, we start at a random point and repeatedly take a small step in the direction of steepest descent, i.e. against the gradient ([[Gradient Descent]]); see the sketch below.
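A minimal gradient-descent sketch (assuming NumPy; the function, starting point, learning rate, and iteration count are arbitrary choices for illustration):

```python
import numpy as np

def grad(p):
    # Gradient of f(x, y) = x^2 + y^2, whose minimum is at (0, 0)
    return 2 * p

p = np.array([3.0, -4.0])   # arbitrary starting point
lr = 0.1                    # step size (learning rate)
for _ in range(100):
    p = p - lr * grad(p)    # step against the gradient, i.e. downhill
print(p)  # converges towards [0, 0]
```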