Gradient

Theorem: Total Differentiability of Real Scalar Fields

Let \(f: \mathcal{D} \subseteq \mathbb{R}^n \to \mathbb{R}\) be a real scalar field and let \(\boldsymbol{p} \in \mathbb{R}^n\) be an interior point of \(\mathcal{D}\).

Then \(f\) is totally differentiable at \(\boldsymbol{p}\) if and only if there exists some real vector \(\boldsymbol{v} \in \mathbb{R}^n\) such that the following limit is zero:

\[\lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{f(\boldsymbol{x}) - f(\boldsymbol{p}) - \boldsymbol{v} \cdot (\boldsymbol{x} - \boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||} = 0,\]

where \(\cdot\) is the dot product.

Definition: Gradient

We call \(\boldsymbol{v}\) the gradient of \(f\) at \(\boldsymbol{p}\).

Notation

\[\nabla f(\boldsymbol{p}) \qquad \operatorname{grad} f(\boldsymbol{p})\]
Example: \(f(\boldsymbol{x}) = \boldsymbol{a}^{\mathsf{T}}\boldsymbol{x}\)

Consider the real scalar field \(f: \mathbb{R}^n \to \mathbb{R}\) defined as

\[f(\boldsymbol{x}) = \boldsymbol{a}^{\mathsf{T}}\boldsymbol{x}\]

for some fixed real vector \(\boldsymbol{a} = \begin{bmatrix} a^1 & \cdots & a^n \end{bmatrix}^{\mathsf{T}}\in \mathbb{R}^n\).

For each \(\boldsymbol{p} \in \mathbb{R}^n\), if we let \(\boldsymbol{v} = \boldsymbol{a}\) we have:

\[\begin{aligned}\lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{f(\boldsymbol{x}) - f(\boldsymbol{p}) - \boldsymbol{a} \cdot (\boldsymbol{x} - \boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||} & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{\boldsymbol{a}^{\mathsf{T}}\boldsymbol{x} - \boldsymbol{a}^{\mathsf{T}}\boldsymbol{p} - \boldsymbol{a}^{\mathsf{T}}(\boldsymbol{x} - \boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||} \\ & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{\boldsymbol{a}^{\mathsf{T}}(\boldsymbol{x} - \boldsymbol{p}) - \boldsymbol{a}^{\mathsf{T}}(\boldsymbol{x} - \boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||} \\ & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{0}{||\boldsymbol{x} - \boldsymbol{p}||} \\ & = 0\end{aligned}\]

Therefore, \(f\) is totally differentiable on \(\mathbb{R}^n\) and its gradient at each \(\boldsymbol{p} \in \mathbb{R}^n\) is equal to \(\boldsymbol{a}\):

\[\nabla f(\boldsymbol{p}) = \boldsymbol{a}\]
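This can be sanity-checked numerically with central finite differences (a minimal sketch assuming NumPy is available; the vectors `a` and `p` below are arbitrary illustrative choices):

```python
import numpy as np

a = np.array([1.0, -2.0, 3.0])   # arbitrary fixed vector a
p = np.array([0.5, 0.25, -1.0])  # arbitrary evaluation point p

def f(x):
    return a @ x  # f(x) = a^T x

def numerical_gradient(fun, p, eps=1e-6):
    """Approximate the gradient via central differences along each axis."""
    grad = np.zeros_like(p)
    for k in range(p.size):
        e = np.zeros_like(p)
        e[k] = eps
        grad[k] = (fun(p + e) - fun(p - e)) / (2 * eps)
    return grad

print(numerical_gradient(f, p))  # close to a = [1, -2, 3]
```

Since \(f\) is linear, the finite-difference quotient recovers \(\boldsymbol{a}\) up to floating-point rounding.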
Example: \(f(\boldsymbol{x}) = \boldsymbol{x}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{x}\)

Consider the real scalar field \(f: \mathbb{R}^n \to \mathbb{R}\) defined as

\[f(\boldsymbol{x}) = \boldsymbol{x}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{x}\]

for some real matrix \(\boldsymbol{A} \in \mathbb{R}^{n \times n}\).

For each \(\boldsymbol{p} \in \mathbb{R}^n\), if we let \(\boldsymbol{v} = (\boldsymbol{A} + \boldsymbol{A}^{\mathsf{T}})\boldsymbol{p}\), we have:

\[\begin{aligned}\lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{f(\boldsymbol{x}) - f(\boldsymbol{p}) - \boldsymbol{v} \cdot (\boldsymbol{x} - \boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||} & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{\boldsymbol{x}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{x} - \boldsymbol{p}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{p} - ((\boldsymbol{A} + \boldsymbol{A}^{\mathsf{T}})\boldsymbol{p})^{\mathsf{T}}(\boldsymbol{x} - \boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||} \\ & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{\boldsymbol{x}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{x} - \boldsymbol{p}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{p} - \boldsymbol{p}^{\mathsf{T}}(\boldsymbol{A}^{\mathsf{T}} + \boldsymbol{A})(\boldsymbol{x} - \boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||} \\ & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{\boldsymbol{x}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{x} - \boldsymbol{p}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{p} - \boldsymbol{p}^{\mathsf{T}} (\boldsymbol{A}^{\mathsf{T}}\boldsymbol{x} - \boldsymbol{A}^{\mathsf{T}}\boldsymbol{p} + \boldsymbol{A}\boldsymbol{x} - \boldsymbol{A}\boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||} \\ & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{\boldsymbol{x}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{x} - \boldsymbol{p}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{p} - \boldsymbol{p}^{\mathsf{T}}\boldsymbol{A}^{\mathsf{T}}\boldsymbol{x} + \boldsymbol{p}^{\mathsf{T}}\boldsymbol{A}^{\mathsf{T}}\boldsymbol{p} - \boldsymbol{p}^{\mathsf{T}}\boldsymbol{A}\boldsymbol{x} + \boldsymbol{p}^{\mathsf{T}}\boldsymbol{A}\boldsymbol{p}}{||\boldsymbol{x} - \boldsymbol{p}||} \\ & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{\boldsymbol{x}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{x} - \boldsymbol{p}^{\mathsf{T}}\boldsymbol{A}^{\mathsf{T}}\boldsymbol{x} + \boldsymbol{p}^{\mathsf{T}}\boldsymbol{A}^{\mathsf{T}}\boldsymbol{p} - 
\boldsymbol{p}^{\mathsf{T}}\boldsymbol{A}\boldsymbol{x}}{||\boldsymbol{x} - \boldsymbol{p}||} \\ & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{\boldsymbol{x}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{x} - (\boldsymbol{x}^{\mathsf{T}}\boldsymbol{A}\boldsymbol{p})^{\mathsf{T}} + (\boldsymbol{p}^{\mathsf{T}}\boldsymbol{A}\boldsymbol{p})^{\mathsf{T}} - \boldsymbol{p}^{\mathsf{T}}\boldsymbol{A}\boldsymbol{x}}{||\boldsymbol{x} - \boldsymbol{p}||} \\ & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{\boldsymbol{x}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{x} - \boldsymbol{x}^{\mathsf{T}}\boldsymbol{A}\boldsymbol{p} + \boldsymbol{p}^{\mathsf{T}}\boldsymbol{A}\boldsymbol{p} - \boldsymbol{p}^{\mathsf{T}}\boldsymbol{A}\boldsymbol{x}}{||\boldsymbol{x} - \boldsymbol{p}||} \\ & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{\boldsymbol{x}^{\mathsf{T}} \boldsymbol{A} (\boldsymbol{x} - \boldsymbol{p}) - \boldsymbol{p}^{\mathsf{T}}\boldsymbol{A} (\boldsymbol{x} - \boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||} \\ & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{(\boldsymbol{x}^{\mathsf{T}} - \boldsymbol{p}^{\mathsf{T}}) \boldsymbol{A} (\boldsymbol{x} - \boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||} \\ & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{(\boldsymbol{x} - \boldsymbol{p})^{\mathsf{T}} \boldsymbol{A} (\boldsymbol{x} - \boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||}\end{aligned}\]

With the substitution \(\boldsymbol{h} = \boldsymbol{x} - \boldsymbol{p} = \begin{bmatrix}h^1 & \cdots & h^n \end{bmatrix}^{\mathsf{T}}\) we get:

\[\lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{(\boldsymbol{x} - \boldsymbol{p})^{\mathsf{T}} \boldsymbol{A} (\boldsymbol{x} - \boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||} = \lim_{\boldsymbol{h} \to \boldsymbol{0}} \frac{\boldsymbol{h}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{h}}{||\boldsymbol{h}||}\]

The product \(\boldsymbol{h}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{h}\) is given by the following:

\[\begin{aligned}\boldsymbol{h}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{h} & = \boldsymbol{h}^{\mathsf{T}}(\boldsymbol{A}\boldsymbol{h}) = \begin{bmatrix} h^1 & h^2 & \dots & h^n \end{bmatrix} \begin{bmatrix} \sum_{j=1}^{n} A_{1j} h^j \\ \sum_{j=1}^{n} A_{2j} h^j \\ \vdots \\ \sum_{j=1}^{n} A_{nj} h^j \end{bmatrix} \\ & = \sum_{i=1}^{n} h^i \left( \sum_{j=1}^{n} A_{ij} h^j \right) \\ & = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} h^i h^j\end{aligned}\]

We thus have:

\[|\boldsymbol{h}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{h}| \leq \sum_{i=1}^{n} \sum_{j=1}^{n} |A_{ij}| |h^i| |h^j|\]

From \(||\boldsymbol{h}|| = \sqrt{\sum_{k = 1}^n (h^k)^2}\), we know that

\[|h^k| \le ||\boldsymbol{h}||\]

for all \(k \in \{1,\dotsc,n\}\).

Therefore,

\[|h^i||h^j| \le ||\boldsymbol{h}||^2\]

for all \(i,j \in \{1,\dotsc,n\}\).

From this, we obtain the following:

\[|\boldsymbol{h}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{h}| \leq \sum_{i=1}^{n} \sum_{j=1}^{n} |A_{ij}| ||\boldsymbol{h}||^2 = ||\boldsymbol{h}||^2 \sum_{i=1}^{n} \sum_{j=1}^{n} |A_{ij}|\]

The sum \(\sum_{i=1}^{n} \sum_{j=1}^{n} |A_{ij}|\) is just a constant and so

\[|\boldsymbol{h}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{h}| \leq C ||\boldsymbol{h}||^2\]

with \(C = \sum_{i=1}^{n} \sum_{j=1}^{n} |A_{ij}|\). Divide by \(||\boldsymbol{h}||\):

\[0 \le \frac{|\boldsymbol{h}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{h}|}{||\boldsymbol{h}||} \leq C ||\boldsymbol{h}||\]

We know that the limit of \(C ||\boldsymbol{h}||\) for \(\boldsymbol{h} \to \boldsymbol{0}\) is zero. By the squeeze theorem we get:

\[\lim_{\boldsymbol{h} \to \boldsymbol{0}} \frac{|\boldsymbol{h}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{h}|}{||\boldsymbol{h}||} = 0,\]

i.e.

\[\lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{(\boldsymbol{x} - \boldsymbol{p})^{\mathsf{T}} \boldsymbol{A} (\boldsymbol{x} - \boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||} = 0.\]

Therefore, \(f\) is totally differentiable on \(\mathbb{R}^n\) and its gradient at each \(\boldsymbol{p} \in \mathbb{R}^n\) is equal to \((\boldsymbol{A}+\boldsymbol{A}^{\mathsf{T}})\boldsymbol{p}\):

\[\nabla f(\boldsymbol{p}) = (\boldsymbol{A} + \boldsymbol{A}^{\mathsf{T}})\boldsymbol{p}\]
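As a numerical sanity check (a minimal sketch assuming NumPy; the matrix \(\boldsymbol{A}\) and point \(\boldsymbol{p}\) are arbitrary, seeded random choices), central finite differences reproduce \((\boldsymbol{A} + \boldsymbol{A}^{\mathsf{T}})\boldsymbol{p}\):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))  # deliberately non-symmetric test matrix
p = rng.normal(size=4)       # arbitrary evaluation point

def f(x):
    return x @ A @ x  # f(x) = x^T A x

def numerical_gradient(fun, p, eps=1e-6):
    """Approximate the gradient via central differences along each axis."""
    grad = np.zeros_like(p)
    for k in range(p.size):
        e = np.zeros_like(p)
        e[k] = eps
        grad[k] = (fun(p + e) - fun(p - e)) / (2 * eps)
    return grad

# Compare against the closed-form gradient (A + A^T) p.
print(np.max(np.abs(numerical_gradient(f, p) - (A + A.T) @ p)))  # tiny
```

Note that for a symmetric \(\boldsymbol{A}\) the formula reduces to the familiar \(2\boldsymbol{A}\boldsymbol{p}\).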
Proof

TODO

Theorem: Gradient via Jacobian Matrix

Let \(f: \mathcal{D} \subseteq \mathbb{R}^n \to \mathbb{R}\) be a real scalar field and let \(\boldsymbol{p}\) be an interior point of \(\mathcal{D}\).

If \(f\) is totally differentiable at \(\boldsymbol{p}\), then its gradient there is the transpose of \(f\)'s Jacobian matrix:

\[\nabla f(\boldsymbol{p}) = (J_f(\boldsymbol{p}))^{\mathsf{T}}\]
Proof

TODO

Theorem: Gradient of Linear Combination

Let \(f: \mathcal{D}_f \subseteq \mathbb{R}^n \to \mathbb{R}\) and \(g: \mathcal{D}_g \subseteq \mathbb{R}^n \to \mathbb{R}\) be real scalar fields.

If \(f\) and \(g\) are totally differentiable at \(\boldsymbol{p} \in \mathcal{D}_f \cap \mathcal{D}_g\), then so is \(\lambda f + \mu g\) for all \(\lambda, \mu \in \mathbb{R}\) and its gradient is given by the gradients of \(f\) and \(g\) as follows:

\[\nabla (\lambda f + \mu g)(\boldsymbol{p}) = \lambda \nabla f(\boldsymbol{p}) + \mu \nabla g(\boldsymbol{p})\]
Proof

TODO

Theorem: Product Rule with Gradients

Let \(f: \mathcal{D}_f \subseteq \mathbb{R}^n \to \mathbb{R}\) and \(g: \mathcal{D}_g \subseteq \mathbb{R}^n \to \mathbb{R}\) be real scalar fields.

If \(f\) and \(g\) are totally differentiable at \(\boldsymbol{p} \in \mathcal{D}_f \cap \mathcal{D}_g\), then \(fg\) is also totally differentiable there and its gradient is given by the gradients of \(f\) and \(g\) as follows:

\[\nabla(fg)(\boldsymbol{p}) = g(\boldsymbol{p}) \nabla f(\boldsymbol{p}) + f(\boldsymbol{p}) \nabla g(\boldsymbol{p})\]
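The identity can be checked numerically for a concrete pair of fields (an illustrative sketch assuming NumPy; \(f\), \(g\), and the point \(\boldsymbol{p}\) are arbitrary choices with hand-computed gradients):

```python
import numpy as np

# f(x, y) = sin(x) + y^2,  with grad f = [cos(x), 2y]
def f(v):
    return np.sin(v[0]) + v[1] ** 2

def grad_f(v):
    return np.array([np.cos(v[0]), 2.0 * v[1]])

# g(x, y) = x*y + 1,  with grad g = [y, x]
def g(v):
    return v[0] * v[1] + 1.0

def grad_g(v):
    return np.array([v[1], v[0]])

def numerical_gradient(fun, p, eps=1e-6):
    """Approximate the gradient via central differences along each axis."""
    grad = np.zeros_like(p)
    for k in range(p.size):
        e = np.zeros_like(p)
        e[k] = eps
        grad[k] = (fun(p + e) - fun(p - e)) / (2 * eps)
    return grad

p = np.array([0.4, -1.2])
lhs = numerical_gradient(lambda v: f(v) * g(v), p)  # grad of the product fg
rhs = g(p) * grad_f(p) + f(p) * grad_g(p)           # product rule
print(lhs, rhs)  # agree up to finite-difference error
```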
Proof

TODO

Theorem: Quotient Rule with Gradients

Let \(f: \mathcal{D}_f \subseteq \mathbb{R}^n \to \mathbb{R}\) and \(g: \mathcal{D}_g \subseteq \mathbb{R}^n \to \mathbb{R}\) be real scalar fields.

If \(f\) and \(g\) are totally differentiable at \(\boldsymbol{p} \in \mathcal{D}_f \cap \mathcal{D}_g\) and \(g(\boldsymbol{p}) \ne 0\), then \(f/g\) is also totally differentiable there and its gradient is given by the gradients of \(f\) and \(g\) as follows:

\[\nabla(f/g)(\boldsymbol{p}) = \frac{g(\boldsymbol{p}) \nabla f(\boldsymbol{p}) - f(\boldsymbol{p}) \nabla g(\boldsymbol{p})}{g(\boldsymbol{p})^2}\]
Proof

TODO

Theorem: Dot Product Rule

Let \(f: \mathcal{D}_f \subseteq \mathbb{R}^m \to \mathbb{R}^n\) and \(g: \mathcal{D}_g \subseteq \mathbb{R}^m \to \mathbb{R}^n\) be real vector functions.

If \(f\) and \(g\) are totally differentiable at \(\boldsymbol{p} \in \mathop{\operatorname{int}} (\mathcal{D}_f \cap \mathcal{D}_g)\), then their dot product is also totally differentiable at \(\boldsymbol{p}\) and its gradient is given by the Jacobian matrices of \(f\) and \(g\) as follows:

\[\nabla (f\cdot g)(\boldsymbol{p}) = J_f(\boldsymbol{p})^{\mathsf{T}}g(\boldsymbol{p}) + J_g(\boldsymbol{p})^{\mathsf{T}}f(\boldsymbol{p})\]
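For a concrete pair of vector functions \(f, g: \mathbb{R}^2 \to \mathbb{R}^3\) with hand-computed Jacobians, the identity can be verified against a finite-difference gradient (a sketch assuming NumPy; the functions and test point are arbitrary illustrative choices):

```python
import numpy as np

# f(x, y) = (x^2, y, x*y) with its Jacobian written out by hand.
def f(v):
    x, y = v
    return np.array([x**2, y, x * y])

def J_f(v):
    x, y = v
    return np.array([[2 * x, 0.0],
                     [0.0,   1.0],
                     [y,     x  ]])

# g(x, y) = (sin x, cos y, x + y) with its Jacobian.
def g(v):
    x, y = v
    return np.array([np.sin(x), np.cos(y), x + y])

def J_g(v):
    x, y = v
    return np.array([[np.cos(x), 0.0],
                     [0.0, -np.sin(y)],
                     [1.0, 1.0]])

def numerical_gradient(fun, p, eps=1e-6):
    """Approximate the gradient via central differences along each axis."""
    grad = np.zeros_like(p)
    for k in range(p.size):
        e = np.zeros_like(p)
        e[k] = eps
        grad[k] = (fun(p + e) - fun(p - e)) / (2 * eps)
    return grad

p = np.array([0.9, -0.3])
lhs = numerical_gradient(lambda v: f(v) @ g(v), p)  # grad of the dot product
rhs = J_f(p).T @ g(p) + J_g(p).T @ f(p)             # dot product rule
print(lhs, rhs)
```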
Proof

TODO

Theorem: Chain Rule with Real Functions

Let \(f: \mathcal{D}_f \subseteq \mathbb{R} \to \mathbb{R}\) be a real function, let \(g: \mathcal{D}_g \subseteq \mathbb{R}^n \to \mathbb{R}\) be a real scalar field and let \(\boldsymbol{x}\) be an interior point of \(\mathcal{D}_g\).

If \(g\) is totally differentiable at \(\boldsymbol{x}\) and \(g(\boldsymbol{x})\) is an interior point of \(\mathcal{D}_f\) and \(f\) is differentiable at \(g(\boldsymbol{x})\), then the composition \(f\circ g\) is totally differentiable at \(\boldsymbol{x}\) with the following gradient:

\[\nabla (f \circ g)(\boldsymbol{x}) = f'(g(\boldsymbol{x}))\nabla g(\boldsymbol{x})\]
Example

Let \(f: \mathbb{R}_{\gt 0} \to \mathbb{R}\) be a real function which is differentiable on \(\mathbb{R}_{\gt 0}\) and consider the real scalar field \(f(||\boldsymbol{x}||)\).

We have:

\[f(||\boldsymbol{x}||) = f\left(\sqrt{\sum_{i=1}^n x_i^2}\right)\]

For the partial derivative of \(\sqrt{\sum_{i=1}^n x_i^2}\) at \(\boldsymbol{x} \in \mathbb{R}^n \setminus \{\boldsymbol{0}\}\) w.r.t. the \(k\)-th Cartesian coordinate (\(k \in \{1, \dotsc, n\}\)) we have the following:

\[\begin{aligned}\partial_k\sqrt{\sum_{i=1}^n x_i^2} = \frac{2 x_k}{2\sqrt{\sum_{i=1}^n x_i^2}} = \frac{x_k}{||\boldsymbol{x}||}\end{aligned}\]

Since these are all continuous on \(\mathbb{R}^n \setminus \{\boldsymbol{0}\}\), we know that \(\sqrt{\sum_{i=1}^n x_i^2}\) is totally differentiable on \(\mathbb{R}^n \setminus \{\boldsymbol{0}\}\) and its gradient at each \(\boldsymbol{x} \in \mathbb{R}^n \setminus \{\boldsymbol{0}\}\) is the following:

\[\nabla \left(\sqrt{\sum_{i=1}^n x_i^2}\right) (\boldsymbol{x}) = \frac{1}{||\boldsymbol{x}||}\boldsymbol{x}\]

Since \(f\) is differentiable on \(\mathbb{R}_{\gt 0}\), we know that \(f(||\boldsymbol{x}||)\) is totally differentiable on \(\mathbb{R}^n \setminus \{\boldsymbol{0}\}\) and its gradient at each \(\boldsymbol{x} \in \mathbb{R}^n \setminus \{\boldsymbol{0}\}\) is the following:

\[\begin{aligned}\nabla f(||\boldsymbol{x}||) = \frac{f'(||\boldsymbol{x}||)}{||\boldsymbol{x}||} \boldsymbol{x}\end{aligned}\]
Example: \(f(\boldsymbol{x}) = \ln (||\boldsymbol{x}||)\)

From the above example, we know that \(f(\boldsymbol{x}) = \ln (||\boldsymbol{x}||)\) is totally differentiable on \(\mathbb{R}^n \setminus \{\boldsymbol{0}\}\) with the following gradient:

\[\nabla \ln (||\boldsymbol{x}||) = \frac{1}{||\boldsymbol{x}||} \frac{1}{||\boldsymbol{x}||} \boldsymbol{x} = \frac{1}{||\boldsymbol{x}||^2}\boldsymbol{x}\]
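A quick finite-difference check of this formula (a minimal sketch assuming NumPy; the test point is an arbitrary nonzero vector):

```python
import numpy as np

def f(v):
    return np.log(np.linalg.norm(v))  # f(x) = ln(||x||)

def grad_formula(v):
    return v / (np.linalg.norm(v) ** 2)  # claimed gradient x / ||x||^2

def numerical_gradient(fun, p, eps=1e-6):
    """Approximate the gradient via central differences along each axis."""
    grad = np.zeros_like(p)
    for k in range(p.size):
        e = np.zeros_like(p)
        e[k] = eps
        grad[k] = (fun(p + e) - fun(p - e)) / (2 * eps)
    return grad

p = np.array([1.0, 2.0, -2.0])  # ||p|| = 3, safely away from the origin
print(numerical_gradient(f, p), grad_formula(p))
```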
Proof

TODO

Theorem: Chain Rule with Curves

Let \(f: \mathcal{D}_f \subseteq \mathbb{R}^n \to \mathbb{R}\) be a real scalar field, and let \(g: \mathcal{D}_g \subseteq \mathbb{R} \to \mathbb{R}^n\) be vector-valued.

If \(g\) is totally differentiable at \(t \in \mathcal{D}_g\) and \(f\) is totally differentiable at \(g(t) \in \mathcal{D}_f\), then the derivative of the composition \(f \circ g\) is the dot product of \(f\)'s gradient and \(g\)'s derivative:

\[(f\circ g)'(t) = \nabla f(g(t))\cdot g'(t)\]
Example

Let \(k: [0,2\uppi] \to \mathbb{R}^2\) be the curve defined as follows:

\[k(t) = \begin{bmatrix} \cos t \\ \sin t \end{bmatrix}\]

Let \(f: \mathbb{R}^2 \to \mathbb{R}\) be the real scalar field defined as follows:

\[f(x,y) = x^2 + xy + y^2\]

For the derivative of \(k\), we have the following:

\[k'(t) = \begin{bmatrix}-\sin t \\ \cos t \end{bmatrix}\]

For the gradient of \(f\), we have:

\[\nabla f (x,y) = \begin{bmatrix}2x + y \\ x + 2y\end{bmatrix}\]

For the derivative of \(g(t) = f(k(t))\), we then have:

\[\begin{aligned}g'(t) & = \nabla f(k(t))^{\mathsf{T}}k'(t) \\ & = \begin{bmatrix}2 \cos t + \sin t & \cos t + 2\sin t\end{bmatrix} \begin{bmatrix}-\sin t \\ \cos t \end{bmatrix} \\ & = (2 \cos t + \sin t)(-\sin t) + (\cos t + 2\sin t) \cos t \\ & = \cos^2 t - \sin^2 t \\ & = \cos (2t)\end{aligned}\]
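The result \(g'(t) = \cos(2t)\) can be confirmed by differencing the composition directly (a minimal sketch assuming NumPy; the evaluation point \(t\) is arbitrary):

```python
import numpy as np

def k(t):
    return np.array([np.cos(t), np.sin(t)])  # the curve k(t)

def f(v):
    x, y = v
    return x**2 + x * y + y**2  # the scalar field f(x, y)

def g(t):
    return f(k(t))  # composition g = f ∘ k

t = 0.7  # arbitrary test point in (0, 2*pi)
eps = 1e-6
numeric = (g(t + eps) - g(t - eps)) / (2 * eps)  # central difference
print(numeric, np.cos(2 * t))  # both close to cos(1.4)
```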
Proof

Since \(f\) is totally differentiable at \(g(t) \in \mathcal{D}_f\), we know that \(f(g(t) + \boldsymbol{h}) - f(g(t)) - \nabla f(g(t))^{\mathsf{T}} \boldsymbol{h}\) is little o of \(||\boldsymbol{h}||\) for \(\boldsymbol{h} \to \boldsymbol{0}\):

\[f(g(t) + \boldsymbol{h}) - f(g(t)) - \nabla f(g(t))^{\mathsf{T}} \boldsymbol{h} = o(||\boldsymbol{h}||) \qquad \text{for} \qquad \boldsymbol{h} \to \boldsymbol{0}\]

TODO

Theorem: Curl of Gradient

Let \(f: \mathcal{D} \subseteq \mathbb{R}^3 \to \mathbb{R}\) be a real scalar field which is twice totally differentiable at an interior point \(\boldsymbol{p}\) of \(\mathcal{D}\).

Then the curl of \(f\)'s gradient at \(\boldsymbol{p}\) is zero:

\[\operatorname{curl} \operatorname{grad} f(\boldsymbol{p}) = \boldsymbol{0}\]
Proof

TODO

Theorem: Mean Value Theorem via Gradient

Let \(f: \mathcal{D} \subseteq \mathbb{R}^n \to \mathbb{R}\) be a real scalar field and let \(\boldsymbol{a}, \boldsymbol{b} \in \mathcal{D}\) such that \(L = \{\boldsymbol{a} + t(\boldsymbol{b} - \boldsymbol{a}) \mid t \in [0,1]\} \subseteq \mathcal{D}\).

If \(f\) is continuous on \(L\) and totally differentiable on \(\operatorname{int} L\), then there exists some \(\boldsymbol{\xi} \in \operatorname{int} L\) such that \(f(\boldsymbol{b}) - f(\boldsymbol{a})\) is equal to the dot product of \(f\)'s gradient at \(\boldsymbol{\xi}\) and \(\boldsymbol{b} - \boldsymbol{a}\):

\[f(\boldsymbol{b}) - f(\boldsymbol{a}) = \nabla f(\boldsymbol{\xi}) \cdot (\boldsymbol{b} - \boldsymbol{a})\]
Example: \(f(x, y, z) = x^2 - 3y^2 + z\)

Consider the real scalar field \(f: \mathcal{D} \to \mathbb{R}\) defined on \(\mathcal{D} = \{\boldsymbol{p} \in \mathbb{R}^3 \mid ||\boldsymbol{p}|| \le 1\}\) as follows:

\[f(x, y, z) = x^2 - 3y^2 + z\]

For all \(\boldsymbol{a}, \boldsymbol{b} \in \mathcal{D}\) and \(t \in [0,1]\), we have:

\[||\boldsymbol{a} + t(\boldsymbol{b} - \boldsymbol{a})|| = ||(1 - t)\boldsymbol{a} + t \boldsymbol{b}|| \le ||(1 - t)\boldsymbol{a}|| + ||t \boldsymbol{b}|| = |1 - t| ||\boldsymbol{a}|| + |t| ||\boldsymbol{b}||\]

Since \(t \in [0,1]\), we get:

\[||\boldsymbol{a} + t(\boldsymbol{b} - \boldsymbol{a})|| \le (1 - t) ||\boldsymbol{a}|| + t ||\boldsymbol{b}||\]

Since \(\boldsymbol{a}, \boldsymbol{b} \in \mathcal{D}\), we know that \(||\boldsymbol{a}|| \le 1\) and \(||\boldsymbol{b}|| \le 1\). Therefore:

\[||\boldsymbol{a} + t(\boldsymbol{b} - \boldsymbol{a})|| \le (1 - t) + t = 1\]

and so \(L = \{\boldsymbol{a} + t(\boldsymbol{b} - \boldsymbol{a}) \mid t \in [0, 1]\} \subseteq \mathcal{D}\).

We see that \(f\) is continuous on \(L\) and totally differentiable on \(\operatorname{int} L\). Therefore, there exists some \(\boldsymbol{\xi} \in \operatorname{int} L\) such that

\[f(\boldsymbol{b}) - f(\boldsymbol{a}) = \nabla f(\boldsymbol{\xi}) \cdot (\boldsymbol{b} - \boldsymbol{a})\]

For the gradient of \(f\), we have

\[\nabla f(x, y, z) = \begin{bmatrix} 2 x \\ -6 y \\ 1 \end{bmatrix}\]

and so

\[||\nabla f(x, y, z)|| = \sqrt{4x^2 + 36y^2 + 1} \le \sqrt{41}\]

for all \((x, y, z) \in \operatorname{int} \mathcal{D}\). In particular, \(||\nabla f(\boldsymbol{\xi})|| \le \sqrt{41}\).

By applying the Cauchy-Schwarz inequality to

\[\nabla f(\boldsymbol{\xi}) \cdot (\boldsymbol{b} - \boldsymbol{a}),\]

we get the following:

\[|\nabla f(\boldsymbol{\xi}) \cdot (\boldsymbol{b} - \boldsymbol{a})| \le ||\nabla f(\boldsymbol{\xi})|| \, ||\boldsymbol{b} - \boldsymbol{a}|| = \sqrt{41} \, ||\boldsymbol{b} - \boldsymbol{a}||\]

Since \(\nabla f(\boldsymbol{\xi}) \cdot (\boldsymbol{b} - \boldsymbol{a}) = f(\boldsymbol{b}) - f(\boldsymbol{a})\), we get:

\[|f(\boldsymbol{b}) - f(\boldsymbol{a})| \le \sqrt{41} ||\boldsymbol{b} - \boldsymbol{a}||\]
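This Lipschitz-type bound can be probed empirically by sampling random pairs of points in the closed unit ball (an illustrative sketch assuming NumPy; the seed and sample count are arbitrary):

```python
import numpy as np

def f(v):
    x, y, z = v
    return x**2 - 3 * y**2 + z

rng = np.random.default_rng(0)

def random_point_in_unit_ball():
    """Rejection-sample a uniform point from the closed unit ball."""
    while True:
        v = rng.uniform(-1.0, 1.0, size=3)
        if np.linalg.norm(v) <= 1.0:
            return v

max_ratio = 0.0
for _ in range(10_000):
    a = random_point_in_unit_ball()
    b = random_point_in_unit_ball()
    d = np.linalg.norm(b - a)
    if d > 1e-9:
        max_ratio = max(max_ratio, abs(f(b) - f(a)) / d)

print(max_ratio, np.sqrt(41))  # the observed ratio stays below sqrt(41)
```

The observed maximum stays below \(\sqrt{41} \approx 6.40\), consistent with the bound (which is not tight).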
Proof

From the mean value theorem via total differentials, we know that there exists some \(\boldsymbol{\xi} \in \operatorname{int} L\) such that

\[f(\boldsymbol{b}) - f(\boldsymbol{a}) = \mathrm{d}f_{\boldsymbol{\xi}}(\boldsymbol{b} - \boldsymbol{a})\]

Using the matrix representation of the total differential \(\mathrm{d}f_{\boldsymbol{\xi}}\), we get:

\[f(\boldsymbol{b}) - f(\boldsymbol{a}) = J_f(\boldsymbol{\xi}) (\boldsymbol{b} - \boldsymbol{a})\]

The Jacobian matrix \(J_f(\boldsymbol{\xi})\) is just the transpose of the gradient \(\nabla f(\boldsymbol{\xi})\):

\[f(\boldsymbol{b}) - f(\boldsymbol{a}) = \nabla f(\boldsymbol{\xi})^{\mathsf{T}} (\boldsymbol{b} - \boldsymbol{a})\]

Rewriting the matrix product as a dot product gives the claim:

\[f(\boldsymbol{b}) - f(\boldsymbol{a}) = \nabla f(\boldsymbol{\xi}) \cdot (\boldsymbol{b} - \boldsymbol{a})\]

Theorem: Gradient via Partial Derivatives

Let \(f: \mathcal{D} \subseteq \mathbb{R}^n \to \mathbb{R}\) be a real scalar field and let \(\boldsymbol{p}\) be an interior point of \(\mathcal{D}\).

If \(f\) is totally differentiable at \(\boldsymbol{p}\), then its gradient there is given by \(f\)'s partial derivatives as follows:

\[\nabla f(\boldsymbol{p}) = \begin{bmatrix} \partial_1 f (\boldsymbol{p}) \\ \vdots \\ \partial_n f(\boldsymbol{p}) \end{bmatrix}\]
Example: \(f(x, y) = x^2 y^3 + x\)

Consider the real scalar field \(f: \mathbb{R}^2 \to \mathbb{R}\) defined as follows:

\[f\left(x, y\right) = x^2 y^3 + x\]

Its partial derivatives are the following:

\[\frac{\partial f}{\partial x} \left(x, y\right) = 2x y^3 + 1 \qquad \frac{\partial f}{\partial y} \left(x, y\right) = 3 x^2 y^2\]

Since these are continuous on \(\mathbb{R}^2\), \(f\) is totally differentiable there and its gradient is the following:

\[\nabla f(x, y) = \begin{bmatrix} 2x y^3 + 1 \\ 3 x^2 y^2 \end{bmatrix}\]
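The partial derivatives computed above can be verified with central differences (a minimal sketch assuming NumPy; the test point is an arbitrary choice):

```python
import numpy as np

def f(v):
    x, y = v
    return x**2 * y**3 + x  # f(x, y) = x^2 y^3 + x

def grad_formula(v):
    x, y = v
    return np.array([2 * x * y**3 + 1, 3 * x**2 * y**2])  # hand-computed partials

def numerical_gradient(fun, p, eps=1e-6):
    """Approximate the gradient via central differences along each axis."""
    grad = np.zeros_like(p)
    for k in range(p.size):
        e = np.zeros_like(p)
        e[k] = eps
        grad[k] = (fun(p + e) - fun(p - e)) / (2 * eps)
    return grad

p = np.array([1.5, -0.5])
print(numerical_gradient(f, p), grad_formula(p))
```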
Proof

TODO

Theorem: Gradient in Polar Coordinates

Let \(f: \mathcal{D} \subseteq \mathbb{R}^2 \to \mathbb{R}\) be a real scalar field and let \(\mathcal{T}: (0, +\infty) \times (0, 2\uppi) \to \mathbb{R}^2\) be the coordinate transformation from polar coordinates:

\[\mathcal{T}(\rho, \varphi) = \begin{bmatrix} \rho \cos \varphi \\ \rho \sin \varphi\end{bmatrix}\]

Let \(\tilde{f} = f \circ \mathcal{T}\).

If \(f\) is totally differentiable at \(\mathcal{T}(\rho, \varphi)\), then the gradient of \(f\) at \(\mathcal{T}(\rho, \varphi)\) is given in the local coordinate basis of polar coordinates using \(\tilde{f}\)'s partial derivatives as follows:

\[\nabla f(\mathcal{T}(\rho, \varphi)) = \frac{\partial \tilde{f}}{\partial \rho}(\rho, \varphi) \boldsymbol{\hat{\rho}}(\rho, \varphi) + \frac{1}{\rho} \frac{\partial \tilde{f}}{\partial \varphi}(\rho, \varphi)\boldsymbol{\hat{\varphi}}(\rho, \varphi)\]
Proof

TODO

Theorem: Gradient in Cylindrical Coordinates

Let \(f: \mathcal{D} \subseteq \mathbb{R}^3 \to \mathbb{R}\) be a real scalar field and let \(\mathcal{T}: (0,+\infty) \times (0,2\uppi) \times \mathbb{R} \to \mathbb{R}^3\) be the coordinate transformation from cylindrical coordinates:

\[\mathcal{T}(\rho, \varphi, z) = \begin{bmatrix}\rho \cos \varphi \\ \rho \sin \varphi \\ z\end{bmatrix}\]

Let \(\tilde{f} = f \circ \mathcal{T}\).

If \(f\) is totally differentiable at \(\mathcal{T}(\rho, \varphi, z)\), then the gradient of \(f\) at \(\mathcal{T}(\rho, \varphi, z)\) is given in the local coordinate basis of cylindrical coordinates using \(\tilde{f}\)'s partial derivatives as follows:

\[\nabla f(\mathcal{T}(\rho, \varphi, z)) = \frac{\partial \tilde{f}}{\partial \rho}(\rho, \varphi, z) \boldsymbol{\hat{\rho}}(\rho, \varphi, z) + \frac{1}{\rho} \frac{\partial \tilde{f}}{\partial \varphi}(\rho, \varphi, z)\boldsymbol{\hat{\varphi}}(\rho, \varphi, z) + \frac{\partial \tilde{f}}{\partial z}(\rho, \varphi, z)\boldsymbol{\hat{z}}(\rho, \varphi, z)\]
Proof

TODO

Theorem: Gradient in Spherical Coordinates

Let \(f: \mathcal{D} \subseteq \mathbb{R}^3 \to \mathbb{R}\) be a real scalar field and let \(\mathcal{T}: (0,+\infty) \times (0,\uppi) \times (0,2\uppi) \to \mathbb{R}^3\) be the coordinate transformation from spherical coordinates:

\[\mathcal{T}(r, \theta, \varphi) = \begin{bmatrix}r \sin \theta \cos \varphi \\ r \sin \theta \sin \varphi \\ r \cos \theta\end{bmatrix}\]

Let \(\tilde{f} = f \circ \mathcal{T}\).

If \(f\) is totally differentiable at \(\mathcal{T}(r, \theta, \varphi)\), then the gradient of \(f\) at \(\mathcal{T}(r, \theta, \varphi)\) is given in the local coordinate basis of spherical coordinates using \(\tilde{f}\)'s partial derivatives as follows:

\[\nabla f(\mathcal{T}(r, \theta, \varphi)) = \frac{\partial \tilde{f}}{\partial r}(r, \theta, \varphi) \boldsymbol{\hat{r}}(r, \theta, \varphi) + \frac{1}{r} \frac{\partial \tilde{f}}{\partial \theta}(r, \theta, \varphi)\boldsymbol{\hat{\theta}}(r, \theta, \varphi) + \frac{1}{r \sin \theta} \frac{\partial \tilde{f}}{\partial \varphi}(r, \theta, \varphi)\boldsymbol{\hat{\varphi}}(r, \theta, \varphi)\]
Example: \(\ln (x^2 + y^2 +z^2)\)

Consider the real scalar field \(f: \mathbb{R}^3 \setminus \{\boldsymbol{0}\} \to \mathbb{R}\) defined as follows:

\[f(x, y, z) \overset{\text{def}}{=}\ln (x^2 + y^2 + z^2)\]

In spherical coordinates, we have:

\[\begin{aligned}\tilde{f}(r, \theta, \varphi) & = (f \circ \mathcal{T})(r, \theta, \varphi) \\ & = \ln \left(r^2 \sin^2 \theta \cos^2 \varphi + r^2 \sin^2 \theta \sin^2 \varphi + r^2 \cos^2 \theta \right) \\ & = \ln (r^2 \sin^2 \theta (\cos^2 \varphi + \sin^2 \varphi) + r^2 \cos^2 \theta) \\ & = \ln (r^2 \sin^2 \theta + r^2 \cos^2 \theta) \\ & = \ln (r^2) \\ & = 2 \ln r \end{aligned}\]

For its gradient, we have:

\[\begin{aligned}\nabla f(\mathcal{T}(r, \theta, \varphi)) & = \frac{\partial \tilde{f}}{\partial r}(r, \theta, \varphi) \boldsymbol{\hat{r}}(r, \theta, \varphi) + \frac{1}{r} \frac{\partial \tilde{f}}{\partial \theta}(r, \theta, \varphi)\boldsymbol{\hat{\theta}}(r, \theta, \varphi) + \frac{1}{r \sin \theta} \frac{\partial \tilde{f}}{\partial \varphi}(r, \theta, \varphi)\boldsymbol{\hat{\varphi}}(r, \theta, \varphi) \\ & = \frac{2}{r} \cdot \boldsymbol{\hat{r}} + \frac{1}{r} \cdot 0 \cdot \boldsymbol{\hat{\theta}}(r, \theta, \varphi) + \frac{1}{r \sin \theta} \cdot 0 \cdot \boldsymbol{\hat{\varphi}}(r, \theta, \varphi) \\ & = \frac{2}{r} \boldsymbol{\hat{r}}(r, \theta, \varphi) \\ & = \frac{2}{r}\begin{bmatrix} \sin \theta \cos \varphi \\ \sin \theta \sin \varphi \\ \cos \theta \end{bmatrix} \end{aligned}\]
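As a cross-check, the Cartesian finite-difference gradient of \(f\) at a point with spherical coordinates \((r, \theta, \varphi)\) should match \(\frac{2}{r}\boldsymbol{\hat{r}}\) (a minimal sketch assuming NumPy; the coordinates are arbitrary illustrative values):

```python
import numpy as np

def f(v):
    return np.log(v @ v)  # f(x, y, z) = ln(x^2 + y^2 + z^2)

def numerical_gradient(fun, p, eps=1e-6):
    """Approximate the gradient via central differences along each axis."""
    grad = np.zeros_like(p)
    for k in range(p.size):
        e = np.zeros_like(p)
        e[k] = eps
        grad[k] = (fun(p + e) - fun(p - e)) / (2 * eps)
    return grad

r, theta, phi = 1.3, 0.8, 2.1  # arbitrary spherical coordinates
r_hat = np.array([np.sin(theta) * np.cos(phi),
                  np.sin(theta) * np.sin(phi),
                  np.cos(theta)])
p = r * r_hat  # the corresponding Cartesian point T(r, theta, phi)

print(numerical_gradient(f, p), (2.0 / r) * r_hat)  # both equal (2/r) r_hat
```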
Proof

TODO