Gradient Theorem: Total Differentiability of Real Scalar Fields
Let \(f: \mathcal{D} \subseteq \mathbb{R}^n \to \mathbb{R}\) be a real scalar field and let \(\boldsymbol{p} \in \mathbb{R}^n\) be an interior point of \(\mathcal{D}\).
Then \(f\) is totally differentiable at \(\boldsymbol{p}\) if and only if there exists some real vector \(\boldsymbol{v} \in \mathbb{R}^n\) such that the following limit is zero:
\[\lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{f(\boldsymbol{x}) - f(\boldsymbol{p}) - \boldsymbol{v} \cdot (\boldsymbol{x} - \boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||} = 0,\]
where \(\cdot\) denotes the dot product.
Definition: Gradient
We call \(\boldsymbol{v}\) the gradient of \(f\) at \(\boldsymbol{p}\).
Notation
\[\nabla f(\boldsymbol{p}) \qquad \operatorname{grad} f(\boldsymbol{p})\]
Example: \(f(\boldsymbol{x}) = \boldsymbol{a}^{\mathsf{T}}\boldsymbol{x}\) Consider the real scalar field \(f: \mathbb{R}^n \to \mathbb{R}\) defined as
\[f(\boldsymbol{x}) = \boldsymbol{a}^{\mathsf{T}}\boldsymbol{x}\]
for some fixed real vector \(\boldsymbol{a} = \begin{bmatrix} a^1 & \cdots & a^n \end{bmatrix}^{\mathsf{T}} \in \mathbb{R}^n\).
For each \(\boldsymbol{p} \in \mathbb{R}^n\), if we let \(\boldsymbol{v} = \boldsymbol{a}\), we have:
\[\begin{aligned}\lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{f(\boldsymbol{x}) - f(\boldsymbol{p}) - \boldsymbol{a} \cdot (\boldsymbol{x} - \boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||} & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{\boldsymbol{a}^{\mathsf{T}}\boldsymbol{x} - \boldsymbol{a}^{\mathsf{T}}\boldsymbol{p} - \boldsymbol{a}^{\mathsf{T}}(\boldsymbol{x} - \boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||} \\ & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{\boldsymbol{a}^{\mathsf{T}}(\boldsymbol{x} - \boldsymbol{p}) - \boldsymbol{a}^{\mathsf{T}}(\boldsymbol{x} - \boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||} \\ & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{0}{||\boldsymbol{x} - \boldsymbol{p}||} \\ & = 0\end{aligned}\]
Therefore, \(f\) is totally differentiable on \(\mathbb{R}^n\) and its gradient at each \(\boldsymbol{p} \in \mathbb{R}^n\) is equal to \(\boldsymbol{a}\):
\[\nabla f(\boldsymbol{p}) = \boldsymbol{a}\]
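This result can be sanity-checked numerically. The following sketch (assuming NumPy is available; `num_grad` is an ad-hoc central-difference helper introduced here for illustration, not part of the text) compares a finite-difference gradient of \(f(\boldsymbol{x}) = \boldsymbol{a}^{\mathsf{T}}\boldsymbol{x}\) against \(\boldsymbol{a}\):

```python
# Numerical sanity check that the gradient of f(x) = a^T x is a.
# Assumes NumPy; num_grad is a hypothetical helper, not from the text.
import numpy as np

def num_grad(f, p, eps=1e-6):
    """Approximate the gradient of f at p by central differences."""
    p = np.asarray(p, dtype=float)
    g = np.zeros_like(p)
    for k in range(p.size):
        e = np.zeros_like(p)
        e[k] = eps
        g[k] = (f(p + e) - f(p - e)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
a = rng.standard_normal(4)
p = rng.standard_normal(4)

f = lambda x: a @ x
err = np.max(np.abs(num_grad(f, p) - a))  # only finite-difference noise remains
```

Since \(f\) is linear, the central difference is exact up to rounding, so `err` is tiny for any choice of \(\boldsymbol{a}\) and \(\boldsymbol{p}\).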
Example: \(f(\boldsymbol{x}) = \boldsymbol{x}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{x}\) Consider the real scalar field \(f: \mathbb{R}^n \to \mathbb{R}\) defined as
\[f(\boldsymbol{x}) = \boldsymbol{x}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{x}\]
for some real matrix \(\boldsymbol{A} \in \mathbb{R}^{n \times n}\).
For each \(\boldsymbol{p} \in \mathbb{R}^n\), if we let \(\boldsymbol{v} = (\boldsymbol{A} + \boldsymbol{A}^{\mathsf{T}})\boldsymbol{p}\), we have:
\[\begin{aligned}\lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{f(\boldsymbol{x}) - f(\boldsymbol{p}) - \boldsymbol{v} \cdot (\boldsymbol{x} - \boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||} & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{\boldsymbol{x}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{x} - \boldsymbol{p}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{p} - ((\boldsymbol{A} + \boldsymbol{A}^{\mathsf{T}})\boldsymbol{p})^{\mathsf{T}}(\boldsymbol{x} - \boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||} \\ & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{\boldsymbol{x}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{x} - \boldsymbol{p}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{p} - \boldsymbol{p}^{\mathsf{T}}(\boldsymbol{A}^{\mathsf{T}} + \boldsymbol{A})(\boldsymbol{x} - \boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||} \\ & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{\boldsymbol{x}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{x} - \boldsymbol{p}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{p} - \boldsymbol{p}^{\mathsf{T}} (\boldsymbol{A}^{\mathsf{T}}\boldsymbol{x} - \boldsymbol{A}^{\mathsf{T}}\boldsymbol{p} + \boldsymbol{A}\boldsymbol{x} - \boldsymbol{A}\boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||} \\ & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{\boldsymbol{x}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{x} - \boldsymbol{p}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{p} - \boldsymbol{p}^{\mathsf{T}}\boldsymbol{A}^{\mathsf{T}}\boldsymbol{x} + \boldsymbol{p}^{\mathsf{T}}\boldsymbol{A}^{\mathsf{T}}\boldsymbol{p} - \boldsymbol{p}^{\mathsf{T}}\boldsymbol{A}\boldsymbol{x} + \boldsymbol{p}^{\mathsf{T}}\boldsymbol{A}\boldsymbol{p}}{||\boldsymbol{x} - \boldsymbol{p}||} \\ & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{\boldsymbol{x}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{x} - \boldsymbol{p}^{\mathsf{T}}\boldsymbol{A}^{\mathsf{T}}\boldsymbol{x} + \boldsymbol{p}^{\mathsf{T}}\boldsymbol{A}^{\mathsf{T}}\boldsymbol{p} - \boldsymbol{p}^{\mathsf{T}}\boldsymbol{A}\boldsymbol{x}}{||\boldsymbol{x} - \boldsymbol{p}||} \\ & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{\boldsymbol{x}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{x} - (\boldsymbol{x}^{\mathsf{T}}\boldsymbol{A}\boldsymbol{p})^{\mathsf{T}} + (\boldsymbol{p}^{\mathsf{T}}\boldsymbol{A}\boldsymbol{p})^{\mathsf{T}} - \boldsymbol{p}^{\mathsf{T}}\boldsymbol{A}\boldsymbol{x}}{||\boldsymbol{x} - \boldsymbol{p}||} \\ & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{\boldsymbol{x}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{x} - \boldsymbol{x}^{\mathsf{T}}\boldsymbol{A}\boldsymbol{p} + \boldsymbol{p}^{\mathsf{T}}\boldsymbol{A}\boldsymbol{p} - \boldsymbol{p}^{\mathsf{T}}\boldsymbol{A}\boldsymbol{x}}{||\boldsymbol{x} - \boldsymbol{p}||} \\ & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{\boldsymbol{x}^{\mathsf{T}} \boldsymbol{A} (\boldsymbol{x} - \boldsymbol{p}) - \boldsymbol{p}^{\mathsf{T}}\boldsymbol{A} (\boldsymbol{x} - \boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||} \\ & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{(\boldsymbol{x}^{\mathsf{T}} - \boldsymbol{p}^{\mathsf{T}}) \boldsymbol{A} (\boldsymbol{x} - \boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||} \\ & = \lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{(\boldsymbol{x} - \boldsymbol{p})^{\mathsf{T}} \boldsymbol{A} (\boldsymbol{x} - \boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||}\end{aligned}\]
With the substitution \(\boldsymbol{h} = \boldsymbol{x} - \boldsymbol{p} = \begin{bmatrix} h^1 & \cdots & h^n \end{bmatrix}^{\mathsf{T}}\), we get:
\[\lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{(\boldsymbol{x} - \boldsymbol{p})^{\mathsf{T}} \boldsymbol{A} (\boldsymbol{x} - \boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||} = \lim_{\boldsymbol{h} \to \boldsymbol{0}} \frac{\boldsymbol{h}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{h}}{||\boldsymbol{h}||}\]
The product \(\boldsymbol{h}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{h}\) is given by the following:
\[\begin{aligned}\boldsymbol{h}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{h} & = \boldsymbol{h}^{\mathsf{T}}(\boldsymbol{A}\boldsymbol{h}) = \begin{bmatrix} h^1 & h^2 & \dots & h^n \end{bmatrix} \begin{bmatrix} \sum_{j=1}^{n} A_{1j} h^j \\ \sum_{j=1}^{n} A_{2j} h^j \\ \vdots \\ \sum_{j=1}^{n} A_{nj} h^j \end{bmatrix} \\ & = \sum_{i=1}^{n} h^i \left( \sum_{j=1}^{n} A_{ij} h^j \right) \\ & = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} h^i h^j\end{aligned}\]
By the triangle inequality, we thus have:
\[|\boldsymbol{h}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{h}| \leq \sum_{i=1}^{n} \sum_{j=1}^{n} |A_{ij}| |h^i| |h^j|\]
From \(||\boldsymbol{h}|| = \sqrt{\sum_{k = 1}^n (h^k)^2}\), we know that
\[|h^k| \le ||\boldsymbol{h}||\]
for all \(k \in \{1,\dotsc,n\}\).
Therefore,
\[|h^i||h^j| \le ||\boldsymbol{h}||^2\]
for all \(i,j \in \{1,\dotsc,n\}\).
From this, we obtain the following:
\[|\boldsymbol{h}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{h}| \leq \sum_{i=1}^{n} \sum_{j=1}^{n} |A_{ij}| ||\boldsymbol{h}||^2 = ||\boldsymbol{h}||^2 \sum_{i=1}^{n} \sum_{j=1}^{n} |A_{ij}|\]
The sum \(\sum_{i=1}^{n} \sum_{j=1}^{n} |A_{ij}|\) is a constant, so
\[|\boldsymbol{h}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{h}| \leq C ||\boldsymbol{h}||^2\]
with \(C = \sum_{i=1}^{n} \sum_{j=1}^{n} |A_{ij}|\). Dividing by \(||\boldsymbol{h}||\) (for \(\boldsymbol{h} \ne \boldsymbol{0}\)) yields:
\[0 \le \frac{|\boldsymbol{h}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{h}|}{||\boldsymbol{h}||} \leq C ||\boldsymbol{h}||\]
We know that \(C ||\boldsymbol{h}|| \to 0\) as \(\boldsymbol{h} \to \boldsymbol{0}\). By the squeeze theorem, we get:
\[\lim_{\boldsymbol{h} \to \boldsymbol{0}} \frac{|\boldsymbol{h}^{\mathsf{T}} \boldsymbol{A} \boldsymbol{h}|}{||\boldsymbol{h}||} = 0,\]
i.e.
\[\lim_{\boldsymbol{x} \to \boldsymbol{p}} \frac{(\boldsymbol{x} - \boldsymbol{p})^{\mathsf{T}} \boldsymbol{A} (\boldsymbol{x} - \boldsymbol{p})}{||\boldsymbol{x} - \boldsymbol{p}||} = 0.\]
Therefore, \(f\) is totally differentiable on \(\mathbb{R}^n\) and its gradient at each \(\boldsymbol{p} \in \mathbb{R}^n\) is equal to \((\boldsymbol{A}+\boldsymbol{A}^{\mathsf{T}})\boldsymbol{p}\):
\[\nabla f(\boldsymbol{p}) = (\boldsymbol{A} + \boldsymbol{A}^{\mathsf{T}})\boldsymbol{p}\]
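The formula \(\nabla f(\boldsymbol{p}) = (\boldsymbol{A} + \boldsymbol{A}^{\mathsf{T}})\boldsymbol{p}\) can likewise be checked numerically. The sketch below (assuming NumPy; `num_grad` is a hypothetical finite-difference helper) uses a deliberately non-symmetric \(\boldsymbol{A}\), since that is where \(\boldsymbol{A} + \boldsymbol{A}^{\mathsf{T}}\) differs from \(2\boldsymbol{A}\):

```python
# Numerical sanity check that the gradient of f(x) = x^T A x is (A + A^T) x.
# Assumes NumPy; num_grad is a hypothetical helper, not from the text.
import numpy as np

def num_grad(f, p, eps=1e-6):
    """Approximate the gradient of f at p by central differences."""
    p = np.asarray(p, dtype=float)
    g = np.zeros_like(p)
    for k in range(p.size):
        e = np.zeros_like(p)
        e[k] = eps
        g[k] = (f(p + e) - f(p - e)) / (2 * eps)
    return g

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))   # deliberately non-symmetric
p = rng.standard_normal(4)

f = lambda x: x @ A @ x
expected = (A + A.T) @ p
err = np.max(np.abs(num_grad(f, p) - expected))
```

Because \(f\) is quadratic, the central difference has no truncation error here, so `err` reflects only floating-point rounding.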
Proof TODO
Theorem: Gradient via Jacobian Matrix
Let \(f: \mathcal{D} \subseteq \mathbb{R}^n \to \mathbb{R}\) be a real scalar field and let \(\boldsymbol{p}\) be an interior point of \(\mathcal{D}\).
If \(f\) is totally differentiable at \(\boldsymbol{p}\), then its gradient there is the transpose of \(f\)'s Jacobian matrix:
\[\nabla f(\boldsymbol{p}) = (J_f(\boldsymbol{p}))^{\mathsf{T}}\]
Proof TODO
Theorem: Gradient of Linear Combination
Let \(f: \mathcal{D}_f \subseteq \mathbb{R}^n \to \mathbb{R}\) and \(g: \mathcal{D}_g \subseteq \mathbb{R}^n \to \mathbb{R}\) be real scalar fields.
If \(f\) and \(g\) are totally differentiable at \(\boldsymbol{p} \in \mathcal{D}_f \cap \mathcal{D}_g\), then so is \(\lambda f + \mu g\) for all \(\lambda, \mu \in \mathbb{R}\), and its gradient is given by the gradients of \(f\) and \(g\) as follows:
\[\nabla (\lambda f + \mu g)(\boldsymbol{p}) = \lambda \nabla f(\boldsymbol{p}) + \mu \nabla g(\boldsymbol{p})\]
Proof TODO
Theorem: Product Rule with Gradients
Let \(f: \mathcal{D}_f \subseteq \mathbb{R}^n \to \mathbb{R}\) and \(g: \mathcal{D}_g \subseteq \mathbb{R}^n \to \mathbb{R}\) be real scalar fields.
If \(f\) and \(g\) are totally differentiable at \(\boldsymbol{p} \in \mathcal{D}_f \cap \mathcal{D}_g\), then \(fg\) is also totally differentiable there and its gradient is given by the gradients of \(f\) and \(g\) as follows:
\[\nabla(fg)(\boldsymbol{p}) = g(\boldsymbol{p}) \nabla f(\boldsymbol{p}) + f(\boldsymbol{p}) \nabla g(\boldsymbol{p})\]
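As a quick numerical illustration of the product rule, the sketch below (assuming NumPy; `num_grad` and the sample fields are hypothetical, chosen only for this check) compares both sides of the identity at a sample point:

```python
# Numerical sanity check of grad(fg)(p) = g(p) grad f(p) + f(p) grad g(p).
# Assumes NumPy; num_grad and the sample fields f, g are illustrative only.
import numpy as np

def num_grad(f, p, eps=1e-6):
    """Approximate the gradient of f at p by central differences."""
    p = np.asarray(p, dtype=float)
    g = np.zeros_like(p)
    for k in range(p.size):
        e = np.zeros_like(p)
        e[k] = eps
        g[k] = (f(p + e) - f(p - e)) / (2 * eps)
    return g

f = lambda x: np.sin(x[0]) + x[0] * x[1]
g = lambda x: np.exp(x[1]) + x[0] ** 2
p = np.array([0.4, -0.8])

lhs = num_grad(lambda x: f(x) * g(x), p)           # grad of the product
rhs = g(p) * num_grad(f, p) + f(p) * num_grad(g, p)  # product rule
err = np.max(np.abs(lhs - rhs))
```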
Proof TODO
Theorem: Quotient Rule with Gradients
Let \(f: \mathcal{D}_f \subseteq \mathbb{R}^n \to \mathbb{R}\) and \(g: \mathcal{D}_g \subseteq \mathbb{R}^n \to \mathbb{R}\) be real scalar fields.
If \(f\) and \(g\) are totally differentiable at \(\boldsymbol{p} \in \mathcal{D}_f \cap \mathcal{D}_g\) and \(g(\boldsymbol{p}) \ne 0\), then \(f/g\) is also totally differentiable there and its gradient is given by the gradients of \(f\) and \(g\) as follows:
\[\nabla(f/g)(\boldsymbol{p}) = \frac{g(\boldsymbol{p}) \nabla f(\boldsymbol{p}) - f(\boldsymbol{p}) \nabla g(\boldsymbol{p})}{g(\boldsymbol{p})^2}\]
Proof TODO
Theorem: Dot Product Rule
Let \(f: \mathcal{D}_f \subseteq \mathbb{R}^m \to \mathbb{R}^n\) and \(g: \mathcal{D}_g \subseteq \mathbb{R}^m \to \mathbb{R}^n\) be real vector functions.
If \(f\) and \(g\) are totally differentiable at \(\boldsymbol{p} \in \mathop{\operatorname{int}} (\mathcal{D}_f \cap \mathcal{D}_g)\), then their dot product \(f \cdot g\) is also totally differentiable at \(\boldsymbol{p}\) and its gradient is given by the Jacobian matrices of \(f\) and \(g\) as follows:
\[\nabla (f\cdot g)(\boldsymbol{p}) = J_f(\boldsymbol{p})^{\mathsf{T}}g(\boldsymbol{p}) + J_g(\boldsymbol{p})^{\mathsf{T}}f(\boldsymbol{p})\]
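The dot product rule can also be checked numerically. In the sketch below (assuming NumPy; `num_grad`, `num_jac`, and the sample functions are hypothetical, introduced only for this check), the Jacobians are approximated column by column:

```python
# Numerical sanity check of grad(f·g)(p) = J_f(p)^T g(p) + J_g(p)^T f(p)
# for F, G: R^2 -> R^3. Assumes NumPy; helpers and sample maps are
# illustrative only.
import numpy as np

def num_grad(h, p, eps=1e-6):
    """Central-difference gradient of a scalar field h at p."""
    p = np.asarray(p, dtype=float)
    g = np.zeros_like(p)
    for k in range(p.size):
        e = np.zeros_like(p)
        e[k] = eps
        g[k] = (h(p + e) - h(p - e)) / (2 * eps)
    return g

def num_jac(F, p, eps=1e-6):
    """Central-difference Jacobian of a vector function F at p (n x m)."""
    p = np.asarray(p, dtype=float)
    cols = []
    for k in range(p.size):
        e = np.zeros_like(p)
        e[k] = eps
        cols.append((F(p + e) - F(p - e)) / (2 * eps))
    return np.column_stack(cols)

F = lambda x: np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])
G = lambda x: np.array([x[1], x[0] ** 2, np.cos(x[1])])
p = np.array([0.3, -0.7])

lhs = num_grad(lambda x: F(x) @ G(x), p)                 # grad of F·G
rhs = num_jac(F, p).T @ G(p) + num_jac(G, p).T @ F(p)    # dot product rule
err = np.max(np.abs(lhs - rhs))
```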
Proof TODO
Theorem: Chain Rule with Real Functions
Let \(f: \mathcal{D}_f \subseteq \mathbb{R} \to \mathbb{R}\) be a real function, let \(g: \mathcal{D}_g \subseteq \mathbb{R}^n \to \mathbb{R}\) be a real scalar field and let \(\boldsymbol{x}\) be an interior point of \(\mathcal{D}_g\).
If \(g\) is totally differentiable at \(\boldsymbol{x}\), \(g(\boldsymbol{x})\) is an interior point of \(\mathcal{D}_f\), and \(f\) is differentiable at \(g(\boldsymbol{x})\), then the composition \(f \circ g\) is totally differentiable at \(\boldsymbol{x}\) with the following gradient:
\[\nabla (f \circ g)(\boldsymbol{x}) = f'(g(\boldsymbol{x}))\nabla g(\boldsymbol{x})\]
Example Let \(f: \mathbb{R}_{\gt 0} \to \mathbb{R}\) be a real function which is differentiable on \(\mathbb{R}_{\gt 0}\) and consider the real scalar field \(f(||\boldsymbol{x}||)\).
We have:
\[f(||\boldsymbol{x}||) = f\left(\sqrt{\sum_{i=1}^n x_i^2}\right)\]
For the partial derivative of \(\sqrt{\sum_{i=1}^n x_i^2}\) at \(\boldsymbol{x} \in \mathbb{R}^n \setminus \{\boldsymbol{0}\}\) with respect to the \(k\)-th Cartesian coordinate (\(k \in \{1, \dotsc, n\}\)), we have the following:
\[\begin{aligned}\partial_k\sqrt{\sum_{i=1}^n x_i^2} = \frac{2 x_k}{2\sqrt{\sum_{i=1}^n x_i^2}} = \frac{x_k}{||\boldsymbol{x}||}\end{aligned}\]
Since these partial derivatives are all continuous on \(\mathbb{R}^n \setminus \{\boldsymbol{0}\}\), we know that \(\sqrt{\sum_{i=1}^n x_i^2}\) is totally differentiable on \(\mathbb{R}^n \setminus \{\boldsymbol{0}\}\) and its gradient at each \(\boldsymbol{x} \in \mathbb{R}^n \setminus \{\boldsymbol{0}\}\) is the following:
\[\nabla \left(\sqrt{\sum_{i=1}^n x_i^2}\right) (\boldsymbol{x}) = \frac{1}{||\boldsymbol{x}||}\boldsymbol{x}\]
Since \(f\) is differentiable on \(\mathbb{R}_{\gt 0}\), we know by the chain rule above that \(f(||\boldsymbol{x}||)\) is totally differentiable on \(\mathbb{R}^n \setminus \{\boldsymbol{0}\}\) and its gradient at each \(\boldsymbol{x} \in \mathbb{R}^n \setminus \{\boldsymbol{0}\}\) is the following:
\[\begin{aligned}\nabla f(||\boldsymbol{x}||) = \frac{f'(||\boldsymbol{x}||)}{||\boldsymbol{x}||} \boldsymbol{x}\end{aligned}\]
Example: \(f(\boldsymbol{x}) = \ln (||\boldsymbol{x}||)\) From the above example, we know that \(f(\boldsymbol{x}) = \ln (||\boldsymbol{x}||)\) is totally differentiable on \(\mathbb{R}^n \setminus \{\boldsymbol{0}\}\) with the following gradient:
\[\nabla \ln (||\boldsymbol{x}||) = \frac{1}{||\boldsymbol{x}||} \frac{1}{||\boldsymbol{x}||} \boldsymbol{x} = \frac{1}{||\boldsymbol{x}||^2}\boldsymbol{x}\]
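A quick numerical check of this gradient at a sample point (assuming NumPy; `num_grad` is a hypothetical finite-difference helper):

```python
# Numerical sanity check that grad ln(||x||) = x / ||x||^2.
# Assumes NumPy; num_grad is a hypothetical helper, not from the text.
import numpy as np

def num_grad(f, p, eps=1e-6):
    """Approximate the gradient of f at p by central differences."""
    p = np.asarray(p, dtype=float)
    g = np.zeros_like(p)
    for k in range(p.size):
        e = np.zeros_like(p)
        e[k] = eps
        g[k] = (f(p + e) - f(p - e)) / (2 * eps)
    return g

x = np.array([1.0, -2.0, 0.5])
lhs = num_grad(lambda v: np.log(np.linalg.norm(v)), x)
rhs = x / (x @ x)   # x / ||x||^2
err = np.max(np.abs(lhs - rhs))
```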
Proof TODO
Theorem: Chain Rule with Curves
Let \(f: \mathcal{D}_f \subseteq \mathbb{R}^n \to \mathbb{R}\) be a real scalar field and let \(g: \mathcal{D}_g \subseteq \mathbb{R} \to \mathbb{R}^n\) be a vector-valued function (a curve).
If \(g\) is totally differentiable at \(t \in \mathcal{D}_g\) and \(f\) is totally differentiable at \(g(t) \in \mathcal{D}_f\), then the derivative of the composition \(f \circ g\) is the dot product of \(f\)'s gradient and \(g\)'s derivative:
\[(f\circ g)'(t) = \nabla f(g(t))\cdot g'(t)\]
Example Let \(k: [0,2\uppi] \to \mathbb{R}^2\) be the curve defined as follows:
\[k(t) = \begin{bmatrix} \cos t \\ \sin t \end{bmatrix}\]
Let \(f: \mathbb{R}^2 \to \mathbb{R}\) be the real scalar field defined as follows:
\[f(x,y) = x^2 + xy + y^2\]
For the derivative of \(k\) , we have the following:
\[k'(t) = \begin{bmatrix}-\sin t \\ \cos t \end{bmatrix}\]
For the gradient of \(f\) , we have:
\[\nabla f (x,y) = \begin{bmatrix}2x + y \\ x + 2y\end{bmatrix}\]
For the derivative of the composition \(g(t) = f(k(t))\), we then have:
\[\begin{aligned}g'(t) & = \nabla f(k(t))^{\mathsf{T}}k'(t) \\ & = \begin{bmatrix}2 \cos t + \sin t & \cos t + 2\sin t\end{bmatrix} \begin{bmatrix}-\sin t \\ \cos t \end{bmatrix} \\ & = (2 \cos t + \sin t)(-\sin t) + (\cos t + 2\sin t) \cos t \\ & = \cos^2 t - \sin^2 t \\ & = \cos (2t)\end{aligned}\]
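The closed form \(g'(t) = \cos(2t)\) can be confirmed numerically. The sketch below (assuming NumPy; the central-difference step is an ad-hoc check, not part of the text) differentiates the composition directly:

```python
# Numerical sanity check that (f ∘ k)'(t) = cos(2t) for
# k(t) = (cos t, sin t) and f(x, y) = x^2 + xy + y^2.
# Assumes NumPy; the derivative is approximated by a central difference.
import numpy as np

k = lambda t: np.array([np.cos(t), np.sin(t)])
f = lambda v: v[0] ** 2 + v[0] * v[1] + v[1] ** 2
comp = lambda t: f(k(t))

t = 0.8
eps = 1e-6
numeric = (comp(t + eps) - comp(t - eps)) / (2 * eps)
err = abs(numeric - np.cos(2 * t))
```

Indeed, \(f(k(t)) = 1 + \sin t \cos t = 1 + \tfrac{1}{2}\sin(2t)\), whose derivative is \(\cos(2t)\).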
Proof Since \(f\) is totally differentiable at \(g(t) \in \mathcal{D}_f\), we know that \(f(g(t) + \boldsymbol{h}) - f(g(t)) - \nabla f(g(t))^{\mathsf{T}} \boldsymbol{h}\) is little o of \(||\boldsymbol{h}||\) for \(\boldsymbol{h} \to \boldsymbol{0}\):
\[f(g(t) + \boldsymbol{h}) - f(g(t)) - \nabla f(g(t))^{\mathsf{T}} \boldsymbol{h} = o(||\boldsymbol{h}||) \qquad \text{for} \qquad \boldsymbol{h} \to \boldsymbol{0}\]
TODO
Theorem: Curl of Gradient
Let \(f: \mathcal{D} \subseteq \mathbb{R}^3 \to \mathbb{R}\) be a real scalar field which is twice totally differentiable at an interior point \(\boldsymbol{p}\) of \(\mathcal{D}\).
Then the curl of \(f\)'s gradient at \(\boldsymbol{p}\) is zero:
\[\operatorname{curl} \operatorname{grad} f(\boldsymbol{p}) = \boldsymbol{0}\]
Proof TODO
Theorem: Mean Value Theorem via Gradient
Let \(f: \mathcal{D} \subseteq \mathbb{R}^n \to \mathbb{R}\) be a real scalar field and let \(\boldsymbol{a}, \boldsymbol{b} \in \mathcal{D}\) be such that \(L = \{\boldsymbol{a} + t(\boldsymbol{b} - \boldsymbol{a}) \mid t \in [0,1]\} \subseteq \mathcal{D}\).
If \(f\) is continuous on \(L\) and totally differentiable on \(\operatorname{int} L\), then there exists some \(\boldsymbol{\xi} \in \operatorname{int} L\) such that \(f(\boldsymbol{b}) - f(\boldsymbol{a})\) is equal to the dot product of \(f\)'s gradient at \(\boldsymbol{\xi}\) and \(\boldsymbol{b} - \boldsymbol{a}\):
\[f(\boldsymbol{b}) - f(\boldsymbol{a}) = \nabla f(\boldsymbol{\xi}) \cdot (\boldsymbol{b} - \boldsymbol{a})\]
Example: \(f(x, y, z) = x^2 - 3y^2 + z\) Consider the real scalar field \(f: \mathcal{D} \to \mathbb{R}\) defined on \(\mathcal{D} = \{\boldsymbol{p} \in \mathbb{R}^3 \mid ||\boldsymbol{p}|| \le 1\}\) as follows:
\[f(x, y, z) = x^2 - 3y^2 + z\]
For all \(\boldsymbol{a}, \boldsymbol{b} \in \mathcal{D}\) and \(t \in [0,1]\), we have:
\[||\boldsymbol{a} + t(\boldsymbol{b} - \boldsymbol{a})|| = ||(1 - t)\boldsymbol{a} + t \boldsymbol{b}|| \le ||(1 - t)\boldsymbol{a}|| + ||t \boldsymbol{b}|| = |1 - t| ||\boldsymbol{a}|| + |t| ||\boldsymbol{b}||\]
Since \(t \in [0,1]\), we get:
\[||\boldsymbol{a} + t(\boldsymbol{b} - \boldsymbol{a})|| \le (1 - t) ||\boldsymbol{a}|| + t ||\boldsymbol{b}||\]
Since \(\boldsymbol{a}, \boldsymbol{b} \in \mathcal{D}\), we know that \(||\boldsymbol{a}|| \le 1\) and \(||\boldsymbol{b}|| \le 1\). Therefore:
\[||\boldsymbol{a} + t(\boldsymbol{b} - \boldsymbol{a})|| \le (1 - t) + t = 1\]
and so \(L = \{\boldsymbol{a} + t(\boldsymbol{b} - \boldsymbol{a}) \mid t \in [0, 1]\} \subseteq \mathcal{D}\).
As a polynomial, \(f\) is continuous on \(L\) and totally differentiable on \(\operatorname{int} L\). Therefore, there exists some \(\boldsymbol{\xi} \in \operatorname{int} L\) such that
\[f(\boldsymbol{b}) - f(\boldsymbol{a}) = \nabla f(\boldsymbol{\xi}) \cdot (\boldsymbol{b} - \boldsymbol{a})\]
For the gradient of \(f\) , we have
\[\nabla f(x, y, z) = \begin{bmatrix} 2 x \\ -6 y \\ 1 \end{bmatrix}\]
and so
\[||\nabla f(x, y, z)|| = \sqrt{4x^2 + 36y^2 + 1} \le \sqrt{4 + 36 + 1} = \sqrt{41}\]
for all \((x, y, z) \in \operatorname{int} \mathcal{D}\), since \(x^2 \le 1\) and \(y^2 \le 1\) there. In particular, \(||\nabla f(\boldsymbol{\xi})|| \le \sqrt{41}\).
By applying the Cauchy-Schwarz inequality to
\[\nabla f(\boldsymbol{\xi}) \cdot (\boldsymbol{b} - \boldsymbol{a}),\]
we get the following:
\[|\nabla f(\boldsymbol{\xi}) \cdot (\boldsymbol{b} - \boldsymbol{a})| \le ||\nabla f(\boldsymbol{\xi})|| \, ||\boldsymbol{b} - \boldsymbol{a}|| = \sqrt{41} \, ||\boldsymbol{b} - \boldsymbol{a}||\]
Since \(\nabla f(\boldsymbol{\xi}) \cdot (\boldsymbol{b} - \boldsymbol{a}) = f(\boldsymbol{b}) - f(\boldsymbol{a})\), we get:
\[|f(\boldsymbol{b}) - f(\boldsymbol{a})| \le \sqrt{41} ||\boldsymbol{b} - \boldsymbol{a}||\]
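This Lipschitz-type bound can be spot-checked numerically. The sketch below (assuming NumPy; the sampling scheme is an ad-hoc choice for illustration) evaluates the inequality at many random pairs of points in the closed unit ball:

```python
# Numerical spot check of |f(b) - f(a)| <= sqrt(41) ||b - a|| for
# f(x, y, z) = x^2 - 3y^2 + z on the closed unit ball.
# Assumes NumPy; random points are projected into the ball.
import numpy as np

f = lambda p: p[0] ** 2 - 3 * p[1] ** 2 + p[2]

rng = np.random.default_rng(2)
C = np.sqrt(41)
ok = True
for _ in range(1000):
    a = rng.standard_normal(3)
    b = rng.standard_normal(3)
    a /= max(1.0, np.linalg.norm(a))  # project into the unit ball
    b /= max(1.0, np.linalg.norm(b))
    ok = ok and abs(f(b) - f(a)) <= C * np.linalg.norm(b - a) + 1e-12
```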
Proof From the mean value theorem via total differentials, we know that there exists some \(\boldsymbol{\xi} \in \operatorname{int} L\) such that
\[f(\boldsymbol{b}) - f(\boldsymbol{a}) = \mathrm{d}f_{\boldsymbol{\xi}}(\boldsymbol{b} - \boldsymbol{a})\]
Using the matrix representation of the total differential \(\mathrm{d}f_{\boldsymbol{\xi}}\) , we get:
\[f(\boldsymbol{b}) - f(\boldsymbol{a}) = J_f(\boldsymbol{\xi}) (\boldsymbol{b} - \boldsymbol{a})\]
The Jacobian matrix \(J_f(\boldsymbol{\xi})\) is just the transpose of the gradient \(\nabla f(\boldsymbol{\xi})\):
\[f(\boldsymbol{b}) - f(\boldsymbol{a}) = \nabla f(\boldsymbol{\xi})^{\mathsf{T}} (\boldsymbol{b} - \boldsymbol{a})\]
Rewriting the matrix product as a dot product gives the claim:
\[f(\boldsymbol{b}) - f(\boldsymbol{a}) = \nabla f(\boldsymbol{\xi}) \cdot (\boldsymbol{b} - \boldsymbol{a})\]
Theorem: Gradient via Partial Derivatives
Let \(f: \mathcal{D} \subseteq \mathbb{R}^n \to \mathbb{R}\) be a real scalar field and let \(\boldsymbol{p}\) be an interior point of \(\mathcal{D}\).
If \(f\) is totally differentiable at \(\boldsymbol{p}\), then its gradient there is given by \(f\)'s partial derivatives as follows:
\[\nabla f(\boldsymbol{p}) = \begin{bmatrix} \partial_1 f (\boldsymbol{p}) \\ \vdots \\ \partial_n f(\boldsymbol{p}) \end{bmatrix}\]
Example: \(f(x, y) = x^2 y^3 + x\) Consider the real scalar field \(f: \mathbb{R}^2 \to \mathbb{R}\) defined as follows:
\[f\left(x, y\right) = x^2 y^3 + x\]
Its partial derivatives are the following:
\[\frac{\partial f}{\partial x} \left(x, y\right) = 2x y^3 + 1 \qquad \frac{\partial f}{\partial y} \left(x, y\right) = 3 x^2 y^2\]
Since both partial derivatives are continuous on \(\mathbb{R}^2\), \(f\) is totally differentiable there and its gradient is:
\[\nabla f(x, y) = \begin{bmatrix} 2x y^3 + 1 \\ 3 x^2 y^2 \end{bmatrix}\]
Proof TODO
Theorem: Gradient in Polar Coordinates
Let \(f: \mathcal{D} \subseteq \mathbb{R}^2 \to \mathbb{R}\) be a real scalar field and let \(\mathcal{T}: (0, +\infty) \times (0, 2\uppi) \to \mathbb{R}^2\) be the coordinate transformation from polar coordinates:
\[\mathcal{T}(\rho, \varphi) = \begin{bmatrix} \rho \cos \varphi \\ \rho \sin \varphi\end{bmatrix}\]
Let \(\tilde{f} = f \circ \mathcal{T}\).
If \(f\) is totally differentiable at \(\mathcal{T}(\rho, \varphi)\), then the gradient of \(f\) at \(\mathcal{T}(\rho, \varphi)\) is given in the local coordinate basis of polar coordinates using \(\tilde{f}\)'s partial derivatives as follows:
\[\nabla f(\mathcal{T}(\rho, \varphi)) = \frac{\partial \tilde{f}}{\partial \rho}(\rho, \varphi) \boldsymbol{\hat{\rho}}(\rho, \varphi) + \frac{1}{\rho} \frac{\partial \tilde{f}}{\partial \varphi}(\rho, \varphi)\boldsymbol{\hat{\varphi}}(\rho, \varphi)\]
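The polar formula can be validated against the Cartesian gradient for a sample field. In the sketch below (assuming NumPy; `num_grad`, `dnum`, and the field \(f(x,y) = x^2 y\) are hypothetical choices for this check), \(\boldsymbol{\hat{\rho}} = (\cos\varphi, \sin\varphi)\) and \(\boldsymbol{\hat{\varphi}} = (-\sin\varphi, \cos\varphi)\):

```python
# Numerical check of the polar gradient formula against the Cartesian
# gradient for f(x, y) = x^2 y. Assumes NumPy; helpers are illustrative.
import numpy as np

def num_grad(f, p, eps=1e-6):
    """Central-difference gradient of a scalar field f at p."""
    p = np.asarray(p, dtype=float)
    g = np.zeros_like(p)
    for k in range(p.size):
        e = np.zeros_like(p)
        e[k] = eps
        g[k] = (f(p + e) - f(p - e)) / (2 * eps)
    return g

def dnum(h, t, eps=1e-6):
    """Central-difference derivative of a function of one variable."""
    return (h(t + eps) - h(t - eps)) / (2 * eps)

f = lambda v: v[0] ** 2 * v[1]
T = lambda rho, phi: np.array([rho * np.cos(phi), rho * np.sin(phi)])

rho, phi = 1.3, 0.9
ftil = lambda r, p: f(T(r, p))
rho_hat = np.array([np.cos(phi), np.sin(phi)])
phi_hat = np.array([-np.sin(phi), np.cos(phi)])

polar = dnum(lambda r: ftil(r, phi), rho) * rho_hat \
      + dnum(lambda p: ftil(rho, p), phi) / rho * phi_hat
cartesian = num_grad(f, T(rho, phi))
err = np.max(np.abs(polar - cartesian))
```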
Proof TODO
Theorem: Gradient in Cylindrical Coordinates
Let \(f: \mathcal{D} \subseteq \mathbb{R}^3 \to \mathbb{R}\) be a real scalar field and let \(\mathcal{T}: (0,+\infty) \times (0,2\uppi) \times \mathbb{R} \to \mathbb{R}^3\) be the coordinate transformation from cylindrical coordinates:
\[\mathcal{T}(\rho, \varphi, z) = \begin{bmatrix}\rho \cos \varphi \\ \rho \sin \varphi \\ z\end{bmatrix}\]
Let \(\tilde{f} = f \circ \mathcal{T}\).
If \(f\) is totally differentiable at \(\mathcal{T}(\rho, \varphi, z)\), then the gradient of \(f\) at \(\mathcal{T}(\rho, \varphi, z)\) is given in the local coordinate basis of cylindrical coordinates using \(\tilde{f}\)'s partial derivatives as follows:
\[\nabla f(\mathcal{T}(\rho, \varphi, z)) = \frac{\partial \tilde{f}}{\partial \rho}(\rho, \varphi, z) \boldsymbol{\hat{\rho}}(\rho, \varphi, z) + \frac{1}{\rho} \frac{\partial \tilde{f}}{\partial \varphi}(\rho, \varphi, z)\boldsymbol{\hat{\varphi}}(\rho, \varphi, z) + \frac{\partial \tilde{f}}{\partial z}(\rho, \varphi, z)\boldsymbol{\hat{z}}(\rho, \varphi, z)\]
Proof TODO
Theorem: Gradient in Spherical Coordinates
Let \(f: \mathcal{D} \subseteq \mathbb{R}^3 \to \mathbb{R}\) be a real scalar field and let \(\mathcal{T}: (0,+\infty) \times (0,\uppi) \times (0,2\uppi) \to \mathbb{R}^3\) be the coordinate transformation from spherical coordinates:
\[\mathcal{T}(r, \theta, \varphi) = \begin{bmatrix}r \sin \theta \cos \varphi \\ r \sin \theta \sin \varphi \\ r \cos \theta\end{bmatrix}\]
Let \(\tilde{f} = f \circ \mathcal{T}\).
If \(f\) is totally differentiable at \(\mathcal{T}(r, \theta, \varphi)\), then the gradient of \(f\) at \(\mathcal{T}(r, \theta, \varphi)\) is given in the local coordinate basis of spherical coordinates using \(\tilde{f}\)'s partial derivatives as follows:
\[\nabla f(\mathcal{T}(r, \theta, \varphi)) = \frac{\partial \tilde{f}}{\partial r}(r, \theta, \varphi) \boldsymbol{\hat{r}}(r, \theta, \varphi) + \frac{1}{r} \frac{\partial \tilde{f}}{\partial \theta}(r, \theta, \varphi)\boldsymbol{\hat{\theta}}(r, \theta, \varphi) + \frac{1}{r \sin \theta} \frac{\partial \tilde{f}}{\partial \varphi}(r, \theta, \varphi)\boldsymbol{\hat{\varphi}}(r, \theta, \varphi)\]
Example: \(\ln (x^2 + y^2 +z^2)\) Consider the real scalar field \(f: \mathbb{R}^3 \setminus \{\boldsymbol{0}\} \to \mathbb{R}\) defined as follows:
\[f(x, y, z) \overset{\text{def}}{=}\ln (x^2 + y^2 + z^2)\]
In spherical coordinates, we have:
\[\begin{aligned}\tilde{f}(r, \theta, \varphi) & = (f \circ \mathcal{T})(r, \theta, \varphi) \\ & = \ln \left(r^2 \sin^2 \theta \cos^2 \varphi + r^2 \sin^2 \theta \sin^2 \varphi + r^2 \cos^2 \theta \right) \\ & = \ln (r^2 \sin^2 \theta (\cos^2 \varphi + \sin^2 \varphi) + r^2 \cos^2 \theta) \\ & = \ln (r^2 \sin^2 \theta + r^2 \cos^2 \theta) \\ & = \ln (r^2) \\ & = 2 \ln r \end{aligned}\]
For its gradient, we have:
\[\begin{aligned}\nabla f(\mathcal{T}(r, \theta, \varphi)) & = \frac{\partial \tilde{f}}{\partial r}(r, \theta, \varphi) \boldsymbol{\hat{r}}(r, \theta, \varphi) + \frac{1}{r} \frac{\partial \tilde{f}}{\partial \theta}(r, \theta, \varphi)\boldsymbol{\hat{\theta}}(r, \theta, \varphi) + \frac{1}{r \sin \theta} \frac{\partial \tilde{f}}{\partial \varphi}(r, \theta, \varphi)\boldsymbol{\hat{\varphi}}(r, \theta, \varphi) \\ & = \frac{2}{r} \cdot \boldsymbol{\hat{r}}(r, \theta, \varphi) + \frac{1}{r} \cdot 0 \cdot \boldsymbol{\hat{\theta}}(r, \theta, \varphi) + \frac{1}{r \sin \theta} \cdot 0 \cdot \boldsymbol{\hat{\varphi}}(r, \theta, \varphi) \\ & = \frac{2}{r} \boldsymbol{\hat{r}}(r, \theta, \varphi) \\ & = \frac{2}{r}\begin{bmatrix} \sin \theta \cos \varphi \\ \sin \theta \sin \varphi \\ \cos \theta \end{bmatrix} \end{aligned}\]
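The result \(\nabla f = \frac{2}{r}\boldsymbol{\hat{r}}\) can be cross-checked against the Cartesian gradient of \(\ln(x^2 + y^2 + z^2)\) at a sample point (assuming NumPy; `num_grad` is a hypothetical finite-difference helper):

```python
# Numerical check that (2/r) r_hat matches the Cartesian gradient of
# ln(x^2 + y^2 + z^2). Assumes NumPy; num_grad is illustrative only.
import numpy as np

def num_grad(f, p, eps=1e-6):
    """Approximate the gradient of f at p by central differences."""
    p = np.asarray(p, dtype=float)
    g = np.zeros_like(p)
    for k in range(p.size):
        e = np.zeros_like(p)
        e[k] = eps
        g[k] = (f(p + e) - f(p - e)) / (2 * eps)
    return g

r, theta, phi = 1.7, 0.6, 2.1
x = np.array([
    r * np.sin(theta) * np.cos(phi),   # T(r, theta, phi)
    r * np.sin(theta) * np.sin(phi),
    r * np.cos(theta),
])
r_hat = x / r

f = lambda v: np.log(v @ v)            # ln(x^2 + y^2 + z^2)
spherical = (2.0 / r) * r_hat
cartesian = num_grad(f, x)
err = np.max(np.abs(spherical - cartesian))
```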
Proof TODO
April 3, 2026