Lagrange multipliers provide a method to find the minima or maxima of a multivariate function $f(x,y)$ subject to constraints. This is useful for optimization problems where the solution is restricted to a specific line, surface, or hyperplane.
**Key Idea:**
To minimize or maximize a function $f(x,y)$ under a constraint, we first write the constraint as an equation $g(x,y)=0$. Then we can combine $f$ and $g$ into a single condition.
The critical insight is that the constrained minima/maxima are found where the gradient of $f$ is parallel to the gradient of $g$:
$\nabla f(x,y)=\lambda \nabla g(x,y)$
Here $\lambda$ is the *Lagrange multiplier*, a scalar that accounts for the different lengths (and possibly opposite orientation) of the two gradients.
## Example
Maximize the function $f(x,y)=x^2y$ subject to the constraint $x^2+y^2=1$. Below, each light-blue contour line connects the $[x,y]$ points that yield the same value of $f$.
![[lagrange-multipliers-1.png|center|400]]
**Constraint as a function:** To combine the function $f$ with the constraint, we need to express the constraint as a function $g(x,y)$ that returns $0$ exactly on the constraint.
$x^2+y^2=1 \quad \mapsto \quad g(x,y)=x^2+y^2-1$
**Intersection of contours:** From the contour lines above we can see that the minima/maxima lie where the contours “kiss” the constraint line. At these points of tangency, the contour of $f$ and the constraint curve $g(x,y)=0$ share the same tangent line.
![[lagrange-multipliers-2.png|center|400]]
Since the gradient of a function is perpendicular to its contour lines, a shared tangent means that the [[Gradient Descent#Gradient Vector|Gradient vectors]] $\nabla f$ and $\nabla g$ are parallel: they point in the same or in exactly opposite directions.
Set up the gradients:
$
\begin{align}
\nabla f(x,y) &=
\begin{bmatrix} \frac{\partial f}{\partial x} \\[6pt] \frac{\partial f}{\partial y} \end{bmatrix} =
\begin{bmatrix}2xy \\ x^2\end{bmatrix}\\[6pt]
\nabla g(x,y) &=
\begin{bmatrix} \frac{\partial g}{\partial x} \\[6pt] \frac{\partial g}{\partial y} \end{bmatrix} =
\begin{bmatrix}2x \\ 2y\end{bmatrix}
\end{align}
$
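The partial derivatives above can be sanity-checked numerically. The following is a minimal sketch, not part of the original note: the helper names `numeric_grad` and `max_abs_diff`, the step size `h`, and the test point are all assumptions chosen for illustration. It compares the analytic gradients with a central finite-difference approximation.

```python
def f(x, y):
    # Objective from the example
    return x**2 * y

def g(x, y):
    # Constraint written as g(x, y) = 0
    return x**2 + y**2 - 1

def grad_f(x, y):
    # Analytic gradient of f: [2xy, x^2]
    return (2 * x * y, x**2)

def grad_g(x, y):
    # Analytic gradient of g: [2x, 2y]
    return (2 * x, 2 * y)

def numeric_grad(func, x, y, h=1e-6):
    # Central finite-difference approximation of the gradient (hypothetical helper)
    d_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    d_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return (d_dx, d_dy)

def max_abs_diff(a, b):
    # Largest component-wise difference between two vectors (hypothetical helper)
    return max(abs(ai - bi) for ai, bi in zip(a, b))

point = (0.7, -1.3)  # arbitrary test point, chosen for illustration
err_f = max_abs_diff(grad_f(*point), numeric_grad(f, *point))
err_g = max_abs_diff(grad_g(*point), numeric_grad(g, *point))
print("max error grad f:", err_f)
print("max error grad g:", err_g)
```

Both errors should be tiny (limited only by floating-point rounding), which suggests the analytic gradients are correct.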
Applying the Lagrange multiplier condition:
$
\begin{align}
\frac{\partial f}{\partial x} &= \lambda \frac{\partial g}{\partial x}\\[6pt]
\frac{\partial f}{\partial y} &= \lambda \frac{\partial g}{\partial y}
\end{align}
$
Substituting the gradients:
$\begin{align}2xy &= 2x\lambda \\ x^2 &= 2y\lambda \end{align}$
By adding the constraint $x^2+y^2-1=0$ as a third equation, we obtain a system of three equations that can be solved for $x$, $y$ and $\lambda$.
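One way to solve this system is a case split on the first equation. The following worked derivation is a sketch added for illustration; it assumes the three equations $2xy=2x\lambda$, $x^2=2y\lambda$ and $x^2+y^2=1$.
$
\begin{align}
2xy = 2x\lambda \;&\Rightarrow\; x\,(y-\lambda)=0 \;\Rightarrow\; x=0 \ \text{ or } \ y=\lambda \\[6pt]
x=0:&\quad y^2=1 \Rightarrow y=\pm 1, \quad 0=2y\lambda \Rightarrow \lambda=0, \quad f(0,\pm 1)=0 \\[6pt]
y=\lambda:&\quad x^2=2y^2, \quad 3y^2=1 \Rightarrow y=\pm\tfrac{1}{\sqrt{3}}, \quad x=\pm\sqrt{\tfrac{2}{3}}, \quad f=x^2y=\pm\tfrac{2}{3\sqrt{3}}\approx \pm 0.385
\end{align}
$
In the second case the sign of $x$ is independent of the sign of $y$, which gives four points. The two points with $x=0$ satisfy the system only because $\nabla f$ vanishes there ($\lambda=0$), so they are candidates as well, even though the contour of $f$ is not tangent to the circle at those points.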
> [!note]
> In this plot we see 4 tangent points, but we do not know beforehand which of them are minima or maxima. After identifying their $[x,y]$ values, we have to evaluate $f(x,y)$ at each of them to see which points give the lowest/highest function value.
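
The classification step described in the note can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original note: the candidate list is an assumption obtained by solving the system by hand (four points with $y=\lambda$, plus two points with $x=0$ where $\nabla f$ vanishes), and `cross_of_gradients` is a hypothetical helper that is zero whenever the two gradients are parallel.

```python
import math

def f(x, y):
    # Objective from the example
    return x**2 * y

def g(x, y):
    # Constraint written as g(x, y) = 0
    return x**2 + y**2 - 1

def cross_of_gradients(x, y):
    # 2D cross product of grad f = [2xy, x^2] and grad g = [2x, 2y].
    # It is zero when the gradients are parallel (or when grad f is zero).
    fx, fy = 2 * x * y, x**2
    gx, gy = 2 * x, 2 * y
    return fx * gy - fy * gx

# Candidate points, assumed from solving the system by hand
a = math.sqrt(2 / 3)
b = 1 / math.sqrt(3)
candidates = [(a, b), (-a, b), (a, -b), (-a, -b), (0.0, 1.0), (0.0, -1.0)]

# Largest violation of the constraint and of the parallel-gradient condition
worst_g = max(abs(g(x, y)) for x, y in candidates)
worst_cross = max(abs(cross_of_gradients(x, y)) for x, y in candidates)

# Evaluate f at every candidate and pick out the extremes
values = {pt: f(*pt) for pt in candidates}
f_max = max(values.values())
f_min = min(values.values())
maxima = [pt for pt, v in values.items() if math.isclose(v, f_max)]
minima = [pt for pt, v in values.items() if math.isclose(v, f_min)]

print("constraint residual:", worst_g, "parallel residual:", worst_cross)
print("max f =", f_max, "at", maxima)
print("min f =", f_min, "at", minima)
```

Under these assumptions the two candidates with positive $y$ are the constrained maxima and the two with negative $y$ are the minima, with $f \approx \pm 0.385$; the points with $x=0$ give $f=0$ and are neither the global maximum nor the global minimum.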
**Useful links:**
* [Lagrange multipliers, using tangency to solve constrained optimization](https://www.youtube.com/watch?v=yuqB-d5MjZA)
* [Lagrange Multipliers | Geometric Meaning & Full Example](https://www.youtube.com/watch?v=8mjcnxGMwFo)