Optimization Problems#
We are often interested in finding extrema of a real scalar field under specific conditions.
Imagine we are tasked with constructing a box with a lid that should hold exactly \(1\) cubic meter of stuff. We want to minimize the cost and thus the amount of material used for the box. In other words, we want to find the dimensions of the box which require the least material. Suppose the amount of material is given by
\[
f(x, y, z) = 2xy + 2yz + 2xz
\]
We see that \(f\) has no extrema on its own. However, we also have another condition: the box should have a volume of \(1\) cubic meter. This gives us a constraint for the dimensions \(x, y, z\):
\[
xyz = 1
\]
We are now looking for the dimensions \(x, y, z\) with \(xyz = 1\) for which \(f(x, y, z)\) is minimal. Mathematically, we are no longer searching for the extrema of \(f\) but rather of \(f\)'s restriction to \(\Omega = \{(x, y, z) \in \mathbb{R}_{\gt 0}^3 \mid xyz = 1\}\). It turns out that \(f\vert_{\Omega}\) does have extrema.
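To make this concrete, here is a quick numerical sketch. It assumes the material is the surface area \(2xy + 2yz + 2xz\) of the closed box and eliminates \(z\) via the constraint \(z = \frac{1}{xy}\), then searches a coarse grid:

```python
def material(x, y):
    # Surface area of a closed box with volume 1 (z eliminated via z = 1/(x*y)).
    z = 1.0 / (x * y)
    return 2 * x * y + 2 * y * z + 2 * x * z

# Coarse grid search over dimensions 0.5 <= x, y <= 2.
steps = [0.5 + 0.01 * i for i in range(151)]
best_value, best_x, best_y = min(
    (material(x, y), x, y) for x in steps for y in steps
)
```

The search settles near \(x = y = 1\) (hence \(z = 1\)) with material \(6\): the unit cube. The machinery developed below lets us find such candidates analytically.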
Definition: Optimization Problem
An optimization problem \((f, \Omega_{\text{ad}})\) consists of an objective function which is a real scalar field \(f: \mathcal{D} \subseteq \mathbb{R}^n \to \mathbb{R}\) and a set \(\Omega_{\text{ad}} \subseteq \mathcal{D}\) of admissible values.
Definition: Equality Constraint
An equality constraint is a real vector function \(g: \mathcal{D}_g \subseteq \mathbb{R}^n \to \mathbb{R}^m\) such that \(g(\boldsymbol{p}) = 0\) for all \(\boldsymbol{p} \in \Omega_{\text{ad}}\).
Example: \(f(x, y) = x^2 + y^2\) with \(-x^2 + y = -3\)
Consider the optimization problem \((f, \Omega_{\text{ad}})\) consisting of the real scalar field \(f: \mathbb{R}^2 \to \mathbb{R}\) defined as
\[
f(x, y) = x^2 + y^2
\]
and the following admissible values:
\[
\Omega_{\text{ad}} = \{(x, y) \in \mathbb{R}^2 \mid -x^2 + y = -3\}
\]
The real scalar field \(g: \mathbb{R}^2 \to \mathbb{R}\) defined as
\[
g(x, y) = -x^2 + y + 3
\]
is an equality constraint.
In fact, we could have written \(\Omega_{\text{ad}}\) as follows:
\[
\Omega_{\text{ad}} = \{(x, y) \in \mathbb{R}^2 \mid g(x, y) = 0\}
\]
Definition: Local Solution
We say that \(\boldsymbol{p} \in \Omega_{\text{ad}}\) is a local solution of \((f, \Omega_{\text{ad}})\) if the restriction \(f\vert_{\Omega_{\text{ad}}}\) has a local extremum at \(\boldsymbol{p}\).
Definition: Global Solution
We say that \(\boldsymbol{p} \in \Omega_{\text{ad}}\) is a global solution of \((f, \Omega_{\text{ad}})\) if the restriction \(f\vert_{\Omega_{\text{ad}}}\) has a global extremum at \(\boldsymbol{p}\).
In the vast majority of cases, we are interested in only one type of local extremum: either local minima or local maxima.
Example: \(f(x, y) = x^2 + y^2\) with \(-x^2 + y + 3 = 0\)
Consider the optimization problem \((f, \Omega_{\text{ad}})\) consisting of the real scalar field \(f: \mathbb{R}^2 \to \mathbb{R}\) defined as
\[
f(x, y) = x^2 + y^2
\]
and the following admissible values:
\[
\Omega_{\text{ad}} = \{(x, y) \in \mathbb{R}^2 \mid -x^2 + y + 3 = 0\}
\]
We have \((x, y) \in \Omega_{\text{ad}}\) if and only if \(y = x^2 - 3\). Therefore:
\[
\Omega_{\text{ad}} = \{(x, x^2 - 3) \mid x \in \mathbb{R}\}
\]
Let \(\tilde{f}: \mathbb{R} \to \mathbb{R}\) be the real function defined as follows:
\[
\tilde{f}(x) = f(x, x^2 - 3) = x^2 + (x^2 - 3)^2
\]
We see that \(f|_{\Omega_{\text{ad}}}\) has a local extremum at \((x, y)\) if and only if \(\tilde{f}\) has a local extremum at \(x\). The derivative of \(\tilde{f}\) is the following:
\[
\tilde{f}'(x) = 2x + 4x(x^2 - 3) = 4x^3 - 10x = 2x(2x^2 - 5)
\]
This is \(0\) at the following points:
\[
x_1 = -\frac{\sqrt{10}}{2}, \quad x_2 = 0, \quad x_3 = \frac{\sqrt{10}}{2}
\]
The second derivative of \(\tilde{f}\) is the following:
\[
\tilde{f}''(x) = 12x^2 - 10
\]
We have:
\[
\tilde{f}''(x_1) = 20 \gt 0, \quad \tilde{f}''(x_2) = -10 \lt 0, \quad \tilde{f}''(x_3) = 20 \gt 0
\]
Therefore, \(\tilde{f}\) has local minima at \(x_1\) and \(x_3\) and a local maximum at \(x_2\). This implies that \(f|_{\Omega_{\text{ad}}}(x, y)\) has local minima at \(\left(-\frac{\sqrt{10}}{2}, -\frac{1}{2}\right)\) and \(\left(\frac{\sqrt{10}}{2}, -\frac{1}{2}\right)\) and a local maximum at \((0, -3)\).
If we are interested in local minima, then \(\left(-\frac{\sqrt{10}}{2}, -\frac{1}{2}\right)\) and \(\left(\frac{\sqrt{10}}{2}, -\frac{1}{2}\right)\) are the solutions; if we are interested in local maxima, then the solution is \((0, -3)\).
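These results are easy to sanity-check numerically. The sketch below uses the substituted function \(\tilde{f}(x) = x^2 + (x^2 - 3)^2\) and approximates \(\tilde{f}'\) and \(\tilde{f}''\) at the three critical points by central differences:

```python
import math

def f_tilde(x):
    # f restricted to the constraint, with y = x**2 - 3 substituted in.
    return x**2 + (x**2 - 3)**2

def d1(x, h=1e-6):
    # Central-difference approximation of the first derivative.
    return (f_tilde(x + h) - f_tilde(x - h)) / (2 * h)

def d2(x, h=1e-4):
    # Central-difference approximation of the second derivative.
    return (f_tilde(x + h) - 2 * f_tilde(x) + f_tilde(x - h)) / h**2

critical = [-math.sqrt(10) / 2, 0.0, math.sqrt(10) / 2]
first = [d1(x) for x in critical]   # all approximately 0
second = [d2(x) for x in critical]  # signs: +, -, + (min, max, min)
```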
Lagrange Multipliers#
The method of Lagrange multipliers allows us to find possible local solutions to an optimization problem specified by an equality constraint.
Definition: Lagrangian Function
Let \((f, \Omega_{\text{ad}})\) be an optimization problem, where \(f: \mathcal{D}_f \subseteq \mathbb{R}^n \to \mathbb{R}\) is a real scalar field and \(\Omega_{\text{ad}}\) can be specified via an equality constraint \(g: \mathcal{D}_g \subseteq \mathbb{R}^n \to \mathbb{R}^m\) with \(m \lt n\):
\[
\Omega_{\text{ad}} = \{\boldsymbol{x} \in \mathcal{D}_f \cap \mathcal{D}_g \mid g(\boldsymbol{x}) = \boldsymbol{0}\}
\]
The Lagrangian function is the real scalar field \(\mathcal{L}: (\mathcal{D}_f \cap \mathcal{D}_g) \times \mathbb{R}^m \to \mathbb{R}\) defined as
\[
\mathcal{L}(\boldsymbol{x}, \boldsymbol{\lambda}) = f(\boldsymbol{x}) + \boldsymbol{\lambda}^{\mathsf{T}} g(\boldsymbol{x})
\]
for all \(\boldsymbol{x} \in \mathcal{D}_f \cap \mathcal{D}_g\) and all \(\boldsymbol{\lambda} \in \mathbb{R}^m\).
Notation
If we denote \(\boldsymbol{\lambda} = \begin{bmatrix}\lambda_1 & \cdots & \lambda_m\end{bmatrix}^{\mathsf{T}}\), we also write \(\mathcal{L}(\boldsymbol{x}, \lambda_1, \dotsc, \lambda_m)\).
Theorem: First-Order Necessary Optimality Condition for Optimization Problems
Let \((f, \Omega_{\text{ad}})\) be an optimization problem, where \(f: \mathcal{D}_f \subseteq \mathbb{R}^n \to \mathbb{R}\) is a real scalar field and \(\Omega_{\text{ad}}\) can be specified via an equality constraint \(g: \mathcal{D}_g \subseteq \mathbb{R}^n \to \mathbb{R}^m\) with \(m \lt n\):
\[
\Omega_{\text{ad}} = \{\boldsymbol{x} \in \mathcal{D}_f \cap \mathcal{D}_g \mid g(\boldsymbol{x}) = \boldsymbol{0}\}
\]
If \(f\vert_{\Omega_{\text{ad}}}\) has a local extremum at \(\boldsymbol{p} \in \Omega_{\text{ad}}\) and \(f\) and the component functions \(g_1, \dotsc, g_m\) are continuously partially differentiable on an open neighborhood of \(\boldsymbol{p}\) such that the gradients \(\nabla g_1 (\boldsymbol{p}), \dotsc, \nabla g_m (\boldsymbol{p})\) are linearly independent, then there exist \(\tilde{\lambda}_1, \dotsc, \tilde{\lambda}_m \in \mathbb{R}\) such that
\[
\nabla f(\boldsymbol{p}) + \tilde{\lambda}_1 \nabla g_1(\boldsymbol{p}) + \dotsb + \tilde{\lambda}_m \nabla g_m(\boldsymbol{p}) = \boldsymbol{0}
\]
Alternatively, using the Lagrangian function: If \(f\vert_{\Omega_{\text{ad}}}\) has a local extremum at \(\boldsymbol{p} \in \Omega_{\text{ad}}\) and \(f\) and \(g\) are continuously partially differentiable on an open neighborhood of \(\boldsymbol{p}\) such that the Jacobian matrix \(J_g(\boldsymbol{p})\) has rank \(m\), then there exists some \(\boldsymbol{\tilde{\lambda}} = \begin{bmatrix} \tilde{\lambda}_1 & \cdots & \tilde{\lambda}_m \end{bmatrix}^{\mathsf{T}} \in \mathbb{R}^m\) such that the gradient of the Lagrangian function is zero at \((\boldsymbol{p}, \boldsymbol{\tilde{\lambda}})\):
\[
\nabla \mathcal{L}(\boldsymbol{p}, \boldsymbol{\tilde{\lambda}}) = \boldsymbol{0}
\]
Definition: Lagrange Multipliers
We call \(\tilde{\lambda}_1, \dotsc, \tilde{\lambda}_m\) Lagrange multipliers.
Notation
An alternative convention is sometimes used, where the Lagrange multipliers are defined as \(-\tilde{\lambda}_1, \dotsc, -\tilde{\lambda}_m\).
Proof
TODO
This theorem gives us a way to find solution candidates for optimization problems by solving the system
\[
\begin{cases}
\nabla f(\boldsymbol{x}) + \lambda_1 \nabla g_1(\boldsymbol{x}) + \dotsb + \lambda_m \nabla g_m(\boldsymbol{x}) = \boldsymbol{0} \\
g(\boldsymbol{x}) = \boldsymbol{0}
\end{cases}
\]
for \(\boldsymbol{x}\) and \(\lambda_1, \dotsc, \lambda_m\).
Notation
We can also write this system using the Lagrangian function. The condition \(\nabla \mathcal{L}(\boldsymbol{x}, \boldsymbol{\lambda}) = \boldsymbol{0}\)
is equivalent to the following:
\[
\frac{\partial \mathcal{L}}{\partial x_i}(\boldsymbol{x}, \boldsymbol{\lambda}) = 0 \quad (i = 1, \dotsc, n), \qquad \frac{\partial \mathcal{L}}{\partial \lambda_j}(\boldsymbol{x}, \boldsymbol{\lambda}) = 0 \quad (j = 1, \dotsc, m)
\]
Therefore, the solutions to the system
\[
\nabla \mathcal{L}(\boldsymbol{x}, \boldsymbol{\lambda}) = \boldsymbol{0}
\]
are candidates for solutions of the optimization problem. Since \(\mathcal{L}(\boldsymbol{x}, \boldsymbol{\lambda}) = f(\boldsymbol{x}) + \boldsymbol{\lambda}^{\mathsf{T}} g(\boldsymbol{x})\) can be expressed as
\[
\mathcal{L}(\boldsymbol{x}, \boldsymbol{\lambda}) = f(\boldsymbol{x}) + \lambda_1 g_1(\boldsymbol{x}) + \dotsb + \lambda_m g_m(\boldsymbol{x}),
\]
the first \(n\) equations are equivalent to the following:
\[
\frac{\partial f}{\partial x_i}(\boldsymbol{x}) + \lambda_1 \frac{\partial g_1}{\partial x_i}(\boldsymbol{x}) + \dotsb + \lambda_m \frac{\partial g_m}{\partial x_i}(\boldsymbol{x}) = 0 \quad (i = 1, \dotsc, n)
\]
This is often written as
\[
\nabla f(\boldsymbol{x}) + \lambda_1 \nabla g_1(\boldsymbol{x}) + \dotsb + \lambda_m \nabla g_m(\boldsymbol{x}) = \boldsymbol{0}
\]
and we get:
\[
\begin{cases}
\nabla f(\boldsymbol{x}) + \lambda_1 \nabla g_1(\boldsymbol{x}) + \dotsb + \lambda_m \nabla g_m(\boldsymbol{x}) = \boldsymbol{0} \\
g(\boldsymbol{x}) = \boldsymbol{0}
\end{cases}
\]
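As a numerical illustration, the sketch below checks the first-order condition \(\nabla f(\boldsymbol{x}) + \lambda \nabla g(\boldsymbol{x}) = \boldsymbol{0}\) at a candidate point via central differences. It revisits the box problem from the introduction, assuming the material function \(f(x, y, z) = 2xy + 2yz + 2xz\), the constraint \(g(x, y, z) = xyz - 1\), and the candidate \((1, 1, 1)\) with \(\lambda = -4\):

```python
def grad(fun, p, h=1e-6):
    # Central-difference gradient of a scalar field at point p.
    out = []
    for i in range(len(p)):
        up = list(p); up[i] += h
        down = list(p); down[i] -= h
        out.append((fun(up) - fun(down)) / (2 * h))
    return out

f = lambda p: 2 * p[0] * p[1] + 2 * p[1] * p[2] + 2 * p[0] * p[2]
g = lambda p: p[0] * p[1] * p[2] - 1  # volume constraint g(x, y, z) = 0

p, lam = [1.0, 1.0, 1.0], -4.0
# Residual of grad f + lambda * grad g: it should vanish at a candidate.
residual = [df + lam * dg for df, dg in zip(grad(f, p), grad(g, p))]
```

Here \(\nabla f(1,1,1) = (4, 4, 4)\) and \(\nabla g(1,1,1) = (1, 1, 1)\), so the residual vanishes and \((1, 1, 1)\) is indeed a candidate.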
Example: \(f(x, y) = x^3 y^3\) under \(g(x, y) = x^2 + 2y^2 - 1 = 0\)
Consider the optimization problem \((f, \Omega_{\text{ad}})\) consisting of the real scalar field \(f: \mathbb{R}^2 \to \mathbb{R}\) defined as
\[
f(x, y) = x^3 y^3
\]
and the admissible values due to the equality constraint \(g: \mathbb{R}^2 \to \mathbb{R}\):
\[
\Omega_{\text{ad}} = \{(x, y) \in \mathbb{R}^2 \mid g(x, y) = x^2 + 2y^2 - 1 = 0\}
\]
Both \(f\) and \(g\) are continuously partially differentiable on \(\mathbb{R}^2\). For the Jacobian matrix \(J_g(x, y)\), we have:
\[
J_g(x, y) = \begin{bmatrix} 2x & 4y \end{bmatrix}
\]
Its rank is less than \(1\) (namely \(0\)) if and only if \((x,y) = (0,0)\). However, \((0,0) \notin \Omega_{\text{ad}}\), since \(g(0,0) = 0^2 + 2\cdot 0^2 - 1 = -1 \ne 0\). Therefore, the Jacobian matrix has full rank for all \((x, y) \in \Omega_{\text{ad}}\). We can thus use the theorem to find solution candidates.
The Lagrangian function is
\[
\mathcal{L}(x, y, \lambda) = x^3 y^3 + \lambda \left(x^2 + 2y^2 - 1\right)
\]
with the following gradient:
\[
\nabla \mathcal{L}(x, y, \lambda) = \begin{bmatrix} 3x^2 y^3 + 2\lambda x \\ 3x^3 y^2 + 4\lambda y \\ x^2 + 2y^2 - 1 \end{bmatrix}
\]
Therefore, we want to solve the following system:
\[
\begin{cases}
3x^2 y^3 + 2\lambda x = 0 \\
3x^3 y^2 + 4\lambda y = 0 \\
x^2 + 2y^2 - 1 = 0
\end{cases}
\]
We multiply the first equation by \(x\), the second by \(y\) and examine the difference:
\[
\left(3x^3 y^3 + 2\lambda x^2\right) - \left(3x^3 y^3 + 4\lambda y^2\right) = 2\lambda \left(x^2 - 2y^2\right) = 0
\]
We thus have \(\lambda = 0\) or \(x = -\sqrt{2}y\) or \(x = +\sqrt{2}y\).
If \(\lambda = 0\), the first equation gives \(3x^2y^3 = 0\), so \(x\) or \(y\) must also be zero. For \(x = 0\), we get the following solutions by plugging \(x\) into the third equation:
\[
(x, y, \lambda) = \left(0, \frac{\sqrt{2}}{2}, 0\right), \quad (x, y, \lambda) = \left(0, -\frac{\sqrt{2}}{2}, 0\right)
\]
For \(y = 0\), we get the following solutions by plugging \(y\) into the third equation:
\[
(x, y, \lambda) = (1, 0, 0), \quad (x, y, \lambda) = (-1, 0, 0)
\]
If \(\lambda \ne 0\), and \(x = -\sqrt{2}y\), plugging this into the third equation gives \((-\sqrt{2}y)^2 + 2y^2 - 1 = 0 \implies 4y^2 = 1 \implies y = \pm\frac{1}{2}\).
For \(y = \frac{1}{2}\), we get \(x = -\frac{\sqrt{2}}{2}\). We can find \(\lambda\) by substituting into the first equation (\(2\lambda x = -3x^2y^3 \implies \lambda = -\frac{3}{2}xy^3\) since \(x \ne 0\)):
\[
\lambda = -\frac{3}{2} \cdot \left(-\frac{\sqrt{2}}{2}\right) \cdot \left(\frac{1}{2}\right)^3 = \frac{3\sqrt{2}}{32}
\]
For \(y = -\frac{1}{2}\), we get \(x = \frac{\sqrt{2}}{2}\). Finding \(\lambda\):
\[
\lambda = -\frac{3}{2} \cdot \frac{\sqrt{2}}{2} \cdot \left(-\frac{1}{2}\right)^3 = \frac{3\sqrt{2}}{32}
\]
This yields the following solutions:
\[
(x, y, \lambda) = \left(-\frac{\sqrt{2}}{2}, \frac{1}{2}, \frac{3\sqrt{2}}{32}\right), \quad (x, y, \lambda) = \left(\frac{\sqrt{2}}{2}, -\frac{1}{2}, \frac{3\sqrt{2}}{32}\right)
\]
If \(\lambda \ne 0\), and \(x = +\sqrt{2}y\), plugging this into the third equation similarly gives \(4y^2 = 1 \implies y = \pm\frac{1}{2}\).
For \(y = \frac{1}{2}\), we get \(x = \frac{\sqrt{2}}{2}\). Finding \(\lambda\):
\[
\lambda = -\frac{3}{2} \cdot \frac{\sqrt{2}}{2} \cdot \left(\frac{1}{2}\right)^3 = -\frac{3\sqrt{2}}{32}
\]
For \(y = -\frac{1}{2}\), we get \(x = -\frac{\sqrt{2}}{2}\). Finding \(\lambda\):
\[
\lambda = -\frac{3}{2} \cdot \left(-\frac{\sqrt{2}}{2}\right) \cdot \left(-\frac{1}{2}\right)^3 = -\frac{3\sqrt{2}}{32}
\]
This yields the following solutions:
\[
(x, y, \lambda) = \left(\frac{\sqrt{2}}{2}, \frac{1}{2}, -\frac{3\sqrt{2}}{32}\right), \quad (x, y, \lambda) = \left(-\frac{\sqrt{2}}{2}, -\frac{1}{2}, -\frac{3\sqrt{2}}{32}\right)
\]
Our candidates are thus the following:
\[
\left(0, \frac{\sqrt{2}}{2}\right), \left(0, -\frac{\sqrt{2}}{2}\right), (1, 0), (-1, 0), \left(-\frac{\sqrt{2}}{2}, \frac{1}{2}\right), \left(\frac{\sqrt{2}}{2}, -\frac{1}{2}\right), \left(\frac{\sqrt{2}}{2}, \frac{1}{2}\right), \left(-\frac{\sqrt{2}}{2}, -\frac{1}{2}\right)
\]
However, these are not guaranteed to be solutions to the optimization problem without further investigation.
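We can probe the candidates numerically. The sketch below verifies that each candidate satisfies the constraint and samples \(f\) along the parametrization \(x = \cos t\), \(y = \frac{\sin t}{\sqrt{2}}\) of the ellipse; the extreme values \(\pm\frac{\sqrt{2}}{32}\) are attained at the candidates with \(\lambda \ne 0\):

```python
import math

f = lambda x, y: x**3 * y**3
g = lambda x, y: x**2 + 2 * y**2 - 1

s = math.sqrt(2) / 2
candidates = [(0, s), (0, -s), (1, 0), (-1, 0),
              (-s, 0.5), (s, -0.5), (s, 0.5), (-s, -0.5)]
feasible = all(abs(g(x, y)) < 1e-12 for x, y in candidates)

# Sample f along the constraint, parametrized as x = cos(t), y = sin(t)/sqrt(2).
ts = [2 * math.pi * k / 100_000 for k in range(100_000)]
values = [f(math.cos(t), math.sin(t) / math.sqrt(2)) for t in ts]
extreme = math.sqrt(2) / 32  # value of f at the candidates with lambda != 0
```

The candidates with \(\lambda = 0\) all have \(f = 0\), and \(f\) changes sign in every neighborhood of them on the ellipse, so they are not local extrema of \(f\vert_{\Omega_{\text{ad}}}\); this is exactly why the final classification step is still required.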
Example: \(f(x, y, z) = x^2\) under \(g_1(x, y, z) = x^2 + y^2 + z^2 - 1 = 0\) and \(g_2 (x, y, z) = x - z = 0\)
Consider the optimization problem \((f, \Omega_{\text{ad}})\) consisting of the real scalar field \(f: \mathbb{R}^3 \to \mathbb{R}\) defined as
\[
f(x, y, z) = x^2
\]
and the admissible values due to the equality constraints \(g_1: \mathbb{R}^3 \to \mathbb{R}\) and \(g_2: \mathbb{R}^3 \to \mathbb{R}\):
\[
\Omega_{\text{ad}} = \{(x, y, z) \in \mathbb{R}^3 \mid x^2 + y^2 + z^2 - 1 = 0 \text{ and } x - z = 0\}
\]
The functions \(f\), \(g_1\), and \(g_2\) are all continuously partially differentiable on \(\mathbb{R}^3\). For the Jacobian matrix \(J_g(x, y, z)\) where \(g = (g_1, g_2)\), we have:
\[
J_g(x, y, z) = \begin{bmatrix} 2x & 2y & 2z \\ 1 & 0 & -1 \end{bmatrix}
\]
The rank of \(J_g(x, y, z)\) is strictly less than 2 if and only if its rows are linearly dependent, which occurs when \(y = 0\) and \(x = -z\). We must check if any such points exist in \(\Omega_{\text{ad}}\). If \(x = -z\), the second constraint \(g_2(x, y, z) = x - z = 0\) forces \(x = 0\) and \(z = 0\). Plugging \((0, 0, 0)\) into the first constraint yields \(g_1(0,0,0) = -1 \ne 0\). Therefore, no point where the rank drops exists in \(\Omega_{\text{ad}}\), meaning the Jacobian matrix has full rank for all \((x, y, z) \in \Omega_{\text{ad}}\). We can thus use the theorem to find solution candidates.
The Lagrangian function is
\[
\mathcal{L}(x, y, z, \lambda_1, \lambda_2) = x^2 + \lambda_1 \left(x^2 + y^2 + z^2 - 1\right) + \lambda_2 (x - z)
\]
with the following gradient:
\[
\nabla \mathcal{L}(x, y, z, \lambda_1, \lambda_2) = \begin{bmatrix} 2x + 2\lambda_1 x + \lambda_2 \\ 2\lambda_1 y \\ 2\lambda_1 z - \lambda_2 \\ x^2 + y^2 + z^2 - 1 \\ x - z \end{bmatrix}
\]
Therefore, we want to solve the following system:
\[
\begin{cases}
2x + 2\lambda_1 x + \lambda_2 = 0 \\
2\lambda_1 y = 0 \\
2\lambda_1 z - \lambda_2 = 0 \\
x^2 + y^2 + z^2 - 1 = 0 \\
x - z = 0
\end{cases}
\]
From the second equation, we have \(\lambda_1 = 0\) or \(y = 0\).
If \(\lambda_1 = 0\), we get \(-\lambda_2 = 0 \implies \lambda_2 = 0\) from the third equation. Substituting these into the first equation yields \(2x = 0 \implies x = 0\). From the fifth equation, we get \(x - z = 0 \implies z = 0\). For \(x = 0\) and \(z = 0\), we get the following solutions by plugging into the fourth equation (\(0^2 + y^2 + 0^2 - 1 = 0 \implies y = \pm 1\)):
\[
(x, y, z, \lambda_1, \lambda_2) = (0, 1, 0, 0, 0), \quad (x, y, z, \lambda_1, \lambda_2) = (0, -1, 0, 0, 0)
\]
If \(y = 0\), we get \(x - z = 0 \implies z = x\) from the fifth equation. Plugging \(y=0\) and \(z=x\) into the fourth equation gives \(x^2 + 0^2 + x^2 - 1 = 0 \implies 2x^2 = 1 \implies x = \pm\frac{1}{\sqrt{2}}\). Since \(z = x\), this means \(z = \pm\frac{1}{\sqrt{2}}\) as well.
Adding the first and third equations gives \(2x + 2\lambda_1(x+z) = 0\). Substituting \(z=x\) yields \(2x + 4\lambda_1 x = 0 \implies 2x(1 + 2\lambda_1) = 0\). Since \(x \ne 0\), we have \(1 + 2\lambda_1 = 0 \implies \lambda_1 = -\frac{1}{2}\).
For \(x = \frac{1}{\sqrt{2}}\) and \(z = \frac{1}{\sqrt{2}}\), we find \(\lambda_2\) by substituting into the third equation (\(\lambda_2 = 2\lambda_1 z\)):
\[
\lambda_2 = 2 \cdot \left(-\frac{1}{2}\right) \cdot \frac{1}{\sqrt{2}} = -\frac{1}{\sqrt{2}}
\]
For \(x = -\frac{1}{\sqrt{2}}\) and \(z = -\frac{1}{\sqrt{2}}\), finding \(\lambda_2\):
\[
\lambda_2 = 2 \cdot \left(-\frac{1}{2}\right) \cdot \left(-\frac{1}{\sqrt{2}}\right) = \frac{1}{\sqrt{2}}
\]
This yields the following solutions:
\[
(x, y, z, \lambda_1, \lambda_2) = \left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}, -\frac{1}{2}, -\frac{1}{\sqrt{2}}\right), \quad (x, y, z, \lambda_1, \lambda_2) = \left(-\frac{1}{\sqrt{2}}, 0, -\frac{1}{\sqrt{2}}, -\frac{1}{2}, \frac{1}{\sqrt{2}}\right)
\]
Our candidates are thus the following:
\[
(0, 1, 0), \quad (0, -1, 0), \quad \left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right), \quad \left(-\frac{1}{\sqrt{2}}, 0, -\frac{1}{\sqrt{2}}\right)
\]
However, these are not guaranteed to be solutions to the optimization problem without further investigation.
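As with the previous example, a numerical sketch can probe the candidates: every candidate satisfies both constraints, and along the parametrization \(x = z = \frac{\cos t}{\sqrt{2}}\), \(y = \sin t\) of the feasible curve, \(f = \frac{\cos^2 t}{2}\) ranges between \(0\) (attained at the first two candidates) and \(\frac{1}{2}\) (attained at the last two):

```python
import math

f = lambda x, y, z: x**2
r = 1 / math.sqrt(2)
candidates = [(0.0, 1.0, 0.0), (0.0, -1.0, 0.0), (r, 0.0, r), (-r, 0.0, -r)]

# Both equality constraints hold at each candidate.
feasible = all(
    abs(x * x + y * y + z * z - 1) < 1e-12 and abs(x - z) < 1e-12
    for x, y, z in candidates
)

# Sample f along the feasible curve x = z = cos(t)/sqrt(2), y = sin(t).
ts = [2 * math.pi * k / 10_000 for k in range(10_000)]
values = [f(math.cos(t) * r, math.sin(t), math.cos(t) * r) for t in ts]
```

This suggests the first two candidates are the minima and the last two are the maxima of \(f\vert_{\Omega_{\text{ad}}}\), though a rigorous classification still requires further investigation.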