**Univariate Functions:** For a function $f(\theta): \mathbb{R} \to \mathbb{R}$, the sign of the second derivative w.r.t. $\theta$ determines whether the function is [[Identify Convex and Concave Functions|convex or concave]]:
- *Convex:* $f''(\theta) \geq 0$ for all $\theta$.
- *Concave:* $f''(\theta) \leq 0$ for all $\theta$.

**Multivariate Functions:** For a scalar-valued function with multiple inputs $f(\theta): \mathbb{R}^d \to \mathbb{R}$, analyzing convexity or concavity requires computing the [[Gradient Descent#Gradient Vector|gradient vector]] $\nabla f$ and the [[Hessian]] matrix $\mathbf{H}(f)$.

$
\theta = \begin{bmatrix} \theta_1 \\ \vdots \\ \theta_d \end{bmatrix}, \quad
\nabla f(\theta) = \begin{bmatrix} \frac{\partial f}{\partial \theta_1} \\ \vdots \\[4pt] \frac{\partial f}{\partial \theta_d} \end{bmatrix}, \quad
\mathbf H(f(\theta)) = \begin{bmatrix} \frac{\partial^2 f}{\partial \theta_1 \partial \theta_1} & \cdots & \frac{\partial^2 f}{\partial \theta_1 \partial \theta_d} \\ \vdots & \ddots & \vdots \\[4pt] \frac{\partial^2 f}{\partial \theta_d \partial \theta_1} & \cdots & \frac{\partial^2 f}{\partial \theta_d \partial \theta_d} \end{bmatrix}
$

**Definition:** For a symmetric matrix $\mathbf A$ with shape $d \times d$, definiteness is determined by the sign of the following quadratic form for all vectors $x$ (for the strict cases, all $x \neq 0$). Note that this outcome is a scalar value. Applied to the Hessian $\mathbf A = \mathbf H(f)$, the definiteness of $\mathbf A$ determines the convexity or concavity of $f$.

$
x^T \mathbf A x, \quad \text{for all } x \in \mathbb R^d
$

| Cases            | Type                   | Property                |
| ---------------- | ---------------------- | ----------------------- |
| Concave          | Negative semi-definite | $x^T \mathbf A x \le 0$ |
| Strictly concave | Negative definite      | $x^T \mathbf A x < 0$   |
| Convex           | Positive semi-definite | $x^T \mathbf A x \ge 0$ |
| Strictly convex  | Positive definite      | $x^T \mathbf A x > 0$   |

^f4dc92

>[!note:]
>For a diagonal matrix $\mathbf A$, the signs of the diagonal elements (which are its eigenvalues) determine definiteness.

>[!note:]
>Strictly convex or concave functions cannot have saddle points, whereas non-strict versions might.

A strictly concave (convex) function has at most one maximum (minimum). We can find it by taking the first derivative and setting it to zero; in the multivariate case, we set the gradient to zero.

$
\overbrace{\begin{bmatrix} \frac{\partial f}{\partial \theta_1} \\ \vdots \\[4pt] \frac{\partial f}{\partial \theta_d} \end{bmatrix}}^{\nabla f(\theta)} = \begin{bmatrix} 0 \\ \vdots \\[4pt] 0 \end{bmatrix}
$

Since this results in $d$ equations with $d$ unknowns, the problem can often be solved analytically.
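
To make the gradient and Hessian definitions above concrete, here is a minimal symbolic sketch with SymPy. The example function $f(\theta) = \theta_1^2 + \theta_1\theta_2 + 2\theta_2^2$ is an illustrative assumption, not part of the note.

```python
import sympy as sp

# Illustrative example (assumed): f(theta) = theta_1^2 + theta_1*theta_2 + 2*theta_2^2
t1, t2 = sp.symbols("theta_1 theta_2")
f = t1**2 + t1 * t2 + 2 * t2**2
theta = [t1, t2]

# Gradient: vector of first partial derivatives.
grad = [sp.diff(f, v) for v in theta]

# Hessian: matrix of second partial derivatives (sp.hessian(f, theta) does the same).
H = sp.Matrix([[sp.diff(f, u, v) for v in theta] for u in theta])

print(grad)  # [2*theta_1 + theta_2, theta_1 + 4*theta_2]
print(H)     # Matrix([[2, 1], [1, 4]])
```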
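
The definiteness table can also be checked numerically: for a symmetric matrix, the sign of $x^T \mathbf A x$ over all $x$ is governed by the eigenvalues, so inspecting them replaces testing every vector. Below is a sketch with NumPy; the example Hessian and the tolerance `tol` are assumptions for illustration.

```python
import numpy as np

def classify_definiteness(A: np.ndarray, tol: float = 1e-10) -> str:
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    eigvals = np.linalg.eigvalsh(A)  # eigvalsh assumes A is symmetric
    if np.all(eigvals > tol):
        return "positive definite (strictly convex)"
    if np.all(eigvals >= -tol):
        return "positive semi-definite (convex)"
    if np.all(eigvals < -tol):
        return "negative definite (strictly concave)"
    if np.all(eigvals <= tol):
        return "negative semi-definite (concave)"
    return "indefinite (saddle point possible)"

# Hessian of f(theta) = theta_1^2 + 2*theta_2^2, an assumed strictly convex example.
H = np.array([[2.0, 0.0],
              [0.0, 4.0]])
print(classify_definiteness(H))  # -> positive definite (strictly convex)
```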
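
For a quadratic function, the system $\nabla f(\theta) = 0$ is linear and can be solved in closed form. A minimal sketch, assuming the illustrative choice $f(\theta) = \tfrac{1}{2}\theta^T \mathbf A \theta - b^T \theta$ with $\mathbf A$ positive definite, so the gradient is $\mathbf A\theta - b$:

```python
import numpy as np

# Assumed example: f(theta) = 1/2 * theta^T A theta - b^T theta,
# with A positive definite, so f is strictly convex and has a unique minimum.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, -1.0])

# Setting the gradient A theta - b to zero gives d linear equations
# in d unknowns: A theta = b.
theta_star = np.linalg.solve(A, b)

# Sanity check: the gradient at theta_star should be (numerically) zero.
grad = A @ theta_star - b
print(theta_star)              # unique minimizer
print(np.allclose(grad, 0.0))  # -> True
```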