Extrema (Real Scalar Fields)#
Local Extrema#
Definition: Local Minimum
Let \(f: \mathcal{D} \subseteq \mathbb{R}^n \to \mathbb{R}\) be a real scalar field.
We say that \(\boldsymbol{p} \in \mathcal{D}\) is a place of a local minimum of \(f\) if there exists some open neighborhood \(\mathcal{N}\) of \(\boldsymbol{p}\) such that
for all \(\boldsymbol{x} \in \mathcal{N} \cap \mathcal{D}\).
We call \(f(\boldsymbol{p})\) the corresponding local minimum.
If the inequality is strict, we say (place of) strict local minimum.
Definition: Local Maximum
Let \(f: \mathcal{D} \subseteq \mathbb{R}^n \to \mathbb{R}\) be a real scalar field.
We say that \(\boldsymbol{p} \in \mathcal{D}\) is a place of a local maximum of \(f\) if there exists some open neighborhood \(\mathcal{N}\) of \(\boldsymbol{p}\) such that
for all \(\boldsymbol{x} \in \mathcal{N} \cap \mathcal{D}\).
We call \(f(\boldsymbol{p})\) the corresponding local maximum.
If the inequality is strict, we say (place of) strict local maximum.
Definition: Local Extremum
A local extremum is a local minimum or a local maximum.
Theorem: First-Order Necessary Optimality Condition
Let \(f: \mathcal{D} \subseteq \mathbb{R}^n \to \mathbb{R}\) be a real scalar field, let \(\boldsymbol{p}\) be an interior point of \(\mathcal{D}\) and let \(\boldsymbol{r} \in \mathbb{R}^n\).
If \(f\) is directionally differentiable at \(\boldsymbol{p}\) along \(\boldsymbol{r}\) and has a local extremum at \(\boldsymbol{p}\), then this directional derivative is zero:
Example: \(f(x,y) = x^2 + 3y^4\)
Consider the real scalar field \(f: \mathbb{R}^2 \to \mathbb{R}\) defined as follows:
It is partially differentiable on \(\mathbb{R}^2\):
The only possible place for a local extremum is thus \((0, 0)\) because it is the only place where both \(\partial_x f(x, y)\) and \(\partial_y f(x, y)\) are zero. Indeed, we see that it is a place of a local minimum, since \(f(x, y) \ge 0 = f(0, 0)\) for all \((x, y) \in \mathbb{R}^2\).
Proof
Since \(\boldsymbol{p}\) is an interior point of \(\mathcal{D}\), there exists some \(\varepsilon \gt 0\) such that \(\boldsymbol{p} + t \boldsymbol{r} \in \mathcal{D}\) for all \(t \in (-\varepsilon, \varepsilon)\). Let \(g : (-\varepsilon, \varepsilon) \to \mathbb{R}\) be a real function defined as follows:
Since \(f\) has a local extremum at \(\boldsymbol{p}\), we know that \(g\) must have a local extremum at \(t = 0\). Furthermore, since \(f\) is directionally differentiable at \(\boldsymbol{p}\) along \(\boldsymbol{r}\) we know the following [limit](../Real%20Functions/Limits%20(Real%20Functions.md) exists:
This is just the derivative \(g'(0)\). Since \(g\) has a local extremum at \(t = 0\), we know that \(g'(0)\) must be zero. Therefore, \(\partial_{\boldsymbol{r}}f(\boldsymbol{p})\) is also zero.
Theorem: Second-Order Necessary Optimality Condition
Let \(f: \mathcal{D} \subseteq \mathbb{R}^n \to \mathbb{R}\) be a real scalar field and let \(\boldsymbol{p}\) be an interior point of \(\mathcal{D}\) such that \(f\) is twice totally differentiable at \(\boldsymbol{p}\).
If \(f\) has a local minimum at \(\boldsymbol{p}\), then its Hessian matrix \(H_f(\boldsymbol{p})\) is positive semidefinite.
If \(f\) has a local minimum at \(\boldsymbol{p}\), then its Hessian matrix \(H_f(\boldsymbol{p})\) is negative semidefinite.
Proof
We need to prove two things:
- (I) If \(f\) is twice totally differentiable at \(\boldsymbol{p}\) and has a local minimum there, then its Hessian matrix \(H_f(\boldsymbol{p})\) is positive semidefinite.
- (II) If \(f\) is twice totally differentiable at \(\boldsymbol{p}\) and has a local maximum there, then its Hessian matrix \(H_f(\boldsymbol{p})\) is negative semidefinite.
Let \(\boldsymbol{r} \in \mathbb{R}^n\). Since \(\boldsymbol{p}\) is an interior point of \(\mathcal{D}\), there exists some \(\varepsilon \gt 0\) such that \(\boldsymbol{p} + t \boldsymbol{r} \in \mathcal{D}\) for all \(t \in (-\varepsilon, \varepsilon)\). Let \(g : (-\varepsilon, \varepsilon) \to \mathbb{R}\) be a real function defined as follows:
Since \(f\) is twice totally differentiable at \(\boldsymbol{p}\) and the curve \(\gamma(t) = \boldsymbol{p} + t \boldsymbol{r}\) is infinitely differentiable, we know that \(g\) is differentiable around \(t = 0\). The chain rule gives us the following:
Since \(f\) is twice totally differentiable at \(\boldsymbol{p}\), we know that \(g'\) is differentiable at \(t = 0\):
The chain rule gives us the following:
Proof of (I):
Since \(f\) has a local minimum at \(\boldsymbol{p}\), we know that \(g\) must have a local minimum at \(t = 0\). Therefore, \(g''(0) \ge 0\) and so
Since \(\boldsymbol{r} \in \mathbb{R}^n\) was chosen arbitrarily, the Hessian matrix \(H_f(\boldsymbol{p})\) is positive semidefinite.
Proof of (II):
Since \(f\) has a local maximum at \(\boldsymbol{p}\), we know that \(g\) must have a local maximum at \(t = 0\). Therefore, \(g''(0) \le 0\) and so
Since \(\boldsymbol{r} \in \mathbb{R}^n\) was chosen arbitrarily, the Hessian matrix \(H_f(\boldsymbol{p})\) is negative semidefinite.
Theorem: Second-Order Sufficient Optimality Condition
Let \(f: \mathcal{D} \subseteq \mathbb{R}^n \to \mathbb{R}\) be a real scalar field and let \(\boldsymbol{p}\) be an interior point of \(\mathcal{D}\) such that \(f\) is twice totally differentiable at \(\boldsymbol{p}\).
If the gradient \(\nabla f(\boldsymbol{p})\) is zero and:
-
the Hessian matrix \(H_f(\boldsymbol{p})\) is positive definite, then \(f\) has a strict local minimum at \(\boldsymbol{p}\).
-
the Hessian matrix \(H_f(\boldsymbol{p})\) is positive semi-definite with at least one zero and one non-zero eigenvalue, then \(f\) has either a local minimum or saddle point at \(\boldsymbol{p}\).
-
the Hessian matrix \(H_f(\boldsymbol{p})\) is negative definite, then \(f\) has a strict local maximum at \(\boldsymbol{p}\).
-
the Hessian matrix \(H_f(\boldsymbol{p})\) is negative semi-definite with at least one zero and one non-zero eigenvalue, then \(f\) has either a local maximum or saddle point at \(\boldsymbol{p}\).
-
the Hessian matrix \(H_f(\boldsymbol{p})\) is indefinite, then \(f\) has a saddle point at \(\boldsymbol{p}\).
Example: \(f(x, y) = x^2 + 3y^2\)
Let \(f: \mathbb{R}^2 \to \mathbb{R}\) be the real scalar field defined as follows:
It is twice totally differentiable on \(\mathbb{R}^2\) with the following gradient and Hessian matrix:
The gradient is zero if and only if \(x = y = 0\). We have
with the eigenvalues \(2\) and \(6\). Therefore, \(H_f (0, 0)\) is positive definite and so \(f\) has a local minimum at \((0,0)\).
Example: \(f(x, y) = (x-1)^2 - y^4 -4y\)
Let \(f: \mathbb{R}^2 \to \mathbb{R}\) be the real scalar field defined as follows:
It is twice totally differentiable on \(\mathbb{R}^2\) with the following gradient and Hessian matrix:
The gradient is zero if and only if \(x = 1\) and \(y = -1\). We have
with the eigenvalues \(2\) and \(-12\). Therefore, \(H_f (1, -1)\) is indefinite and so \(f\) has a saddle point at \((1,-1)\).
Example: \(f(x, y) = x^2 + y^4\)
Let \(f: \mathbb{R}^2 \to \mathbb{R}\) be the real scalar field defined as follows:
It is twice totally differentiable on \(\mathbb{R}^2\) with the following gradient and Hessian matrix:
The gradient is zero if and only if \(x = y = 0\). We have
with the eigenvalues \(2\) and \(0\). Therefore, \(H_f (0, 0)\) is positive semi-definite and so \(f\) has either a local minimum or a saddle point at \((0,0)\). Since \(x^2 \ge 0\) and \(y^4 \ge 0\) for all real \(x\) and \(y\), \(f(x,y) \ge f(0,0) = 0\), so \(f\) has a local minimum at \((0,0)\), but not a strict local minimum.
Example: \(f(x, y) = x^2\)
Let \(f: \mathbb{R}^2 \to \mathbb{R}\) be the real scalar field defined as follows:
It is twice totally differentiable on \(\mathbb{R}^2\) with the following gradient and Hessian matrix:
The gradient is zero if and only if \(x = 0\). We have
with the eigenvalues \(2\) and \(0\). Therefore, \(H_f (0, y)\) is positive semi-definite and so \(f\) has either a local minimum or a saddle point at \((0,y)\). Since \(x^2 \ge 0\) for all real \(x\) and \(y\), \(f(x,y) \ge f(0,y) = 0\), so \(f\) has a local minimum at \((0,y)\), but not a strict local minimum.
Example: \(f(x, y) = x^2 + y^3\)
Let \(f: \mathbb{R}^2 \to \mathbb{R}\) be the real scalar field defined as follows:
It is twice totally differentiable on \(\mathbb{R}^2\) with the following gradient and Hessian matrix:
The gradient is zero if and only if \(x = y = 0\). We have
with the eigenvalues \(2\) and \(0\). Therefore, \(H_f (0, 0)\) is positive semi-definite and so \(f\) has either a local minimum or a saddle point at \((0,0)\). For \(x=0\), we have \(f(0,y) = y^3\). For all \(\delta \gt 0\), we have \(f(0, y) \lt 0\) if \(y \in (-\delta, 0)\) and \(f(0, y) \gt 0\) if \(y \in (0, +\delta)\). Therefore, \(f\) cannot have a local minimum at \((0,0)\) and so it must have a saddle point there.
Proof
We need to prove two things:
- (I) If the Hessian matrix \(H_f(\boldsymbol{p})\) is positive definite, then \(f\) has a strict local minimum at \(\boldsymbol{p}\).
- (II) If the Hessian matrix \(H_f(\boldsymbol{p})\) is negative definite, then \(f\) has a strict local maximum at \(\boldsymbol{p}\).
Since \(f\) is twice totally differentiable at \(\boldsymbol{p}\) it has a second-order Taylor expansion around \(\boldsymbol{p}\)
with
Since \(\nabla f(\boldsymbol{p}) = \boldsymbol{0}\), we have:
Since \(H_f({\boldsymbol{p}})\) is a real symmetric matrix, we know that
where \(\lambda_n \ge \cdots \ge \lambda_1\) are the eigenvalues of \(H_f(\boldsymbol{p})\).
Proof of (I):
Since \(H_f(\boldsymbol{p})\) is positive definite, we get
and so
Since \(R_3 (\boldsymbol{p}; \boldsymbol{x} - \boldsymbol{p})\) is little o of \(||\boldsymbol{x} - \boldsymbol{p}||^2\) for \(\boldsymbol{x} \to \boldsymbol{p}\), we know that for \(\varepsilon = \frac{1}{4}\lambda_1\), there exists some deleted neighborhood \(\mathcal{N}(\boldsymbol{p})\) such that
for all \(\boldsymbol{x} \in \mathcal{N}(\boldsymbol{p}) \cap \mathcal{D}\). More specifically, we know that
for all \(\boldsymbol{x} \in \mathcal{N}(\boldsymbol{p}) \cap \mathcal{D}\). Therefore,
for all \(\boldsymbol{x} \in \mathcal{N}(\boldsymbol{p}) \cap \mathcal{D}\). In others words, we know that
for all \(\boldsymbol{x} \in \mathcal{N}(\boldsymbol{p}) \cap \mathcal{D}\) and so \(f\) has a strict local minimum at \(\boldsymbol{p}\).
Proof of (II):
Since \(H_f(\boldsymbol{p})\) is negative definite, we get
and so
Since \(R_3 (\boldsymbol{p}; \boldsymbol{x} - \boldsymbol{p})\) is little o of \(||\boldsymbol{x} - \boldsymbol{p}||^2\) for \(\boldsymbol{x} \to \boldsymbol{p}\), we know that for \(\varepsilon = -\frac{1}{4}\lambda_n\), there exists some deleted neighborhood \(\mathcal{N}(\boldsymbol{p})\) such that
for all \(\boldsymbol{x} \in \mathcal{N}(\boldsymbol{p}) \cap \mathcal{D}\). More specifically, we know that
for all \(\boldsymbol{x} \in \mathcal{N}(\boldsymbol{p}) \cap \mathcal{D}\). Therefore,
for all \(\boldsymbol{x} \in \mathcal{N}(\boldsymbol{p}) \cap \mathcal{D}\). In others words, we know that
for all \(\boldsymbol{x} \in \mathcal{N}(\boldsymbol{p}) \cap \mathcal{D}\) and so \(f\) has a strict local maximum at \(\boldsymbol{p}\).