# Appendix D THE CALCULUS OF MANY VARIABLES

We summarize here some standard results from the calculus of many variables. These theorems will be invoked frequently throughout the text. In particular, the reader should review this material before beginning the chapter on Lagrangian methods. More background on this topic can be found in many standard calculus texts; a particularly accessible source is the two volumes of Courant (1936a,b). The mathematical appendices of Desloge (1982) are also very useful.

# D.1 Basic Properties of Functions

A function *y* of the set of variables *x*_{1}, *x*_{2},…, *x_{N}* will be written in either of the two equivalent forms,

$$y = y(x_1, x_2, \ldots, x_N) \qquad \text{or} \qquad y = y(x) \tag{D.1}$$

In the second form, the single unsubscripted variable *x* denotes the whole set, *x* = *x*_{1}, *x*_{2},…, *x_{N}*. Note that the same letter *y* is used to denote both quantity and function. Whereas a calculus book might write *y* = *f*(*x*), we write *y* = *y*(*x*). This notation has the advantage of clarity and alphabetic economy. It will be clear from context when letters like *y* denote the function and when they denote the value of the function. Usually, a letter on the left of an equal sign denotes the value; the same letter before the (*x*) denotes the function that produces that value.

# D.2 Regions of Definition of Functions

Functions are assumed to be single valued for all *x*_{1}, *x*_{2},…, *x_{N}* in a *region R*. A simple example of such a region is an *open rectangle R*_{⊥} defined by *a_{k}* < *x_{k}* < *b_{k}* for *k* = 1,…, *N*, where the *a_{k}* < *b_{k}* are some constants.

The region *R* is *connected* if any two points in *R* can be joined by a curve lying entirely in *R*. The region is *simply connected* if it is connected, and if every closed curve lying entirely in *R* can be continuously shrunk to a point without leaving *R*. For example, the interior of a ball is simply connected; the interior of a doughnut is not.

Region *R* is called *open* if, for every point in *R*, there is some (possibly quite small) open rectangle lying entirely in *R* and containing the point. Such an open rectangle will be called an *open neighborhood* of the point *x* and will be denoted *N _{x}*. See Chapter I of Spivak (1965) for more definitions.

# D.3 Continuity of Functions

The definition of continuity is similar to that for functions of one variable.^{128} Define an arbitrary increment *h_{k}* for each variable *x_{k}* for *k* = 1,…, *N*. Then function *y* is continuous at point *x* if, for *every* possible choice of the *h_{k}*,

$$\lim_{h_1, \ldots, h_N \to 0} y(x_1 + h_1, x_2 + h_2, \ldots, x_N + h_N) = y(x_1, x_2, \ldots, x_N) \tag{D.2}$$

For example, there is no assigned value of *y*(0, 0) for which $y=y({x}_{1},{x}_{2})=({x}_{1}^{2}-{x}_{2}^{2})/({x}_{1}^{2}+{x}_{2}^{2})$ would be continuous at the point *x*_{1} = 0, *x*_{2} = 0, since the left side of eqn (D.2) would have the limit +1 for the choice *h*_{2} = 0 but the limit −1 for the choice *h*_{1} = 0.

# D.4 Compound Functions

Suppose we have a function

$$y = y(x_1, x_2, \ldots, x_N) \tag{D.3}$$

and each of the *x*_{1}, *x*_{2},…, *x_{N}* is a function of another set of variables *r*_{1}, *r*_{2},…, *r_{M}*, as in

$$x_k = x_k(r_1, r_2, \ldots, r_M) \qquad \text{for } k = 1, \ldots, N \tag{D.4}$$

Then *y* is a *compound function* of the variables *r*. This compound function *y* = *y*(*r*_{1}, *r*_{2},…, *r_{M}*) is defined by direct substitution of the *x_{k}* of eqn (D.4) into eqn (D.3). This substitution will be denoted by

$$y = y(r_1, r_2, \ldots, r_M) = y\bigl(x_1(r_1, \ldots, r_M),\ x_2(r_1, \ldots, r_M),\ \ldots,\ x_N(r_1, \ldots, r_M)\bigr) \tag{D.5}$$

or, in a somewhat shorter notation,

$$y = y(r) = y\bigl(x_1(r), x_2(r), \ldots, x_N(r)\bigr) \tag{D.6}$$

or, even more simply,

$$y = y(r) = y\bigl(x(r)\bigr) \tag{D.7}$$

# D.5 The Same Function in Different Coordinates

A scalar field *T* is often represented by functions of different kinds of coordinates. For example,

$$T = T_{\mathrm{C}}(x_1, x_2, x_3) = T_{\mathrm{c}}(\rho, \phi, z) = T_{\mathrm{s}}(r, \theta, \phi) \tag{D.8}$$

where the subscripts refer to Cartesian, cylindrical polar, and spherical polar coordinates, respectively. Of course, eqn (D.8) contains the implicit assumption that the various coordinates in it must be linked by the relations in Section A.8, and so represent the same point **r**. The function denoted *T*_{C}(*x*_{1}, *x*_{2}, *x*_{3}) is plainly not the same function of its arguments *x*_{1}, *x*_{2}, *x*_{3} as *T*_{s}(*r*, θ, ϕ) is of its arguments *r*, θ, ϕ. For example, suppose *T*_{C}(*x*_{1}, *x*_{2}, *x*_{3}) = *ax*_{2}/*x*_{1}, which leads to *T*_{s}(*r*, θ, ϕ) = *a* tan ϕ. But they both represent the same underlying scalar field function *T* = *T*(**r**). The almost universal custom in physics texts is to omit the subscripts "C", "c", and "s" in eqn (D.8) and write simply

$$T = T(x_1, x_2, x_3) = T(\rho, \phi, z) = T(r, \theta, \phi) \tag{D.9}$$

The argument list of a function is taken as an adequate label for it.

The various functions in eqn (D.9) are thought of as the *same function*. The value *T* is really a function of the underlying field point **r** and is only being *represented* in Cartesian, cylindrical polar, or spherical polar form. This physics custom is especially evident in Lagrangian and Hamiltonian mechanics, where one sees expressions such as *L*(*s*, *ṡ*, *t*) and $L(q,\dot{q},t)$, functions expressing the same underlying physical quantity in terms of different variable sets.

We will follow the physics custom in this book, and in fact have already done so by using the same letter *y* to denote both *y*(*x*_{1}, *x*_{2},…, *x_{N}*) and the compound function *y*(*r*_{1}, *r*_{2},…, *r_{M}*) in Section D.4 above. Note that the different functional forms in eqn (D.9) can be defined as compound functions. For example, *T*(*r*, θ, ϕ) may be derived from *T*(*x*_{1}, *x*_{2}, *x*_{3}) by

$$T(r, \theta, \phi) = T\bigl(x_1(r, \theta, \phi),\ x_2(r, \theta, \phi),\ x_3(r, \theta, \phi)\bigr) \tag{D.10}$$
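The section's example can be checked numerically. In this sketch (the constant *a* = 2.0 and the sample point are arbitrary choices, not from the text), the Cartesian and spherical functional forms differ, yet agree at the same field point **r**:

```python
import math

# Sketch: the same scalar field T in two coordinate representations.
# The text's example: T_C(x1,x2,x3) = a*x2/x1, which becomes T_s = a*tan(phi).

a = 2.0  # arbitrary constant for illustration

def T_cartesian(x1, x2, x3):
    return a * x2 / x1

def T_spherical(r, theta, phi):
    return a * math.tan(phi)

# Pick a point in spherical coordinates and convert to Cartesian.
r, theta, phi = 3.0, 0.7, 0.4
x1 = r * math.sin(theta) * math.cos(phi)
x2 = r * math.sin(theta) * math.sin(phi)
x3 = r * math.cos(theta)

# Different functions of their arguments, same value at the same point r.
assert abs(T_cartesian(x1, x2, x3) - T_spherical(r, theta, phi)) < 1e-12
```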

# D.6 Partial Derivatives

Lagrangian mechanics makes extensive use of partial derivatives. Given a function *y* = *y*(*x*_{1}, *x*_{2},…, *x_{N}*), a partial derivative with respect to variable *x_{k}* is defined as

$$\frac{\partial y(x)}{\partial x_k} = \lim_{h_k \to 0} \frac{y(x_1, \ldots, x_{k-1},\, x_k + h_k,\, x_{k+1}, \ldots, x_N) - y(x_1, \ldots, x_N)}{h_k} \tag{D.11}$$

which says to take an ordinary derivative with respect to *x_{k}* *as if* variables *x*_{1}, *x*_{2},…, *x*_{k−1}, *x*_{k+1},…, *x_{N}* were constants. It does not say that they *are* constants, only that they are to be treated as such in calculating the derivative.

Partial derivatives thus depend on the *list* of variables as well as on the variable being differentiated with respect to, since only that list tells what to hold constant as *y* is differentiated. The often-seen notation ∂*y*/∂*x_{k}* is inherently ambiguous, unless one happens, as here, to know from context what list of variables is intended. In this text we will give the list of variables in all partial derivatives whenever there is any cause for doubt as to what that list might be, often using a shorthand form like ∂*y*(*x*)/∂*x_{k}* in which *x* stands for the whole list *x*_{1}, *x*_{2},…, *x_{N}*.

One should note that the list of variables *x* = *x*_{1}, *x*_{2},…, *x_{N}* in a partial derivative just indicates those variables to be treated as constants when a derivative is taken. It does not indicate that *y* necessarily depends on each member of the list in every case. Thus, we might have *y* = 2*x*_{1} + *x*_{3} as the function, even though the list is *x*_{1}, *x*_{2},…, *x*_{5}. In this case ∂*y*(*x*)/∂*x*_{2} = 0, ∂*y*(*x*)/∂*x*_{4} = 0, and ∂*y*(*x*)/∂*x*_{5} = 0 for all *x* values.

Conversely, as discussed in Corollary D.10.2 in Section D.10 below, if ∂*y*(*x*)/∂*x_{n}* = 0 for all *x* values in region *R*, then *y* does not depend on *x_{n}*, and we may choose to expunge *x_{n}* from the list of variables and write *y* = *y*(*x*_{1}, *x*_{2},…, *x*_{n−1}, *x*_{n+1},…, *x_{N}*). For example, the list of variables for the above function might be shortened to *x*_{1}, *x*_{2}, *x*_{3}, with *x*_{4} and *x*_{5} dropped but the *x*_{2} retained.

Partial derivatives are themselves functions of the *same* variable list *x*_{1}, *x*_{2},…, *x_{N}* as was the function *y* being differentiated. Thus second and higher derivatives can be taken by repeated application of rules like eqn (D.11). If they exist, these higher derivatives are denoted by expressions like

$$\frac{\partial^2 y(x)}{\partial x_i\, \partial x_j} = \frac{\partial}{\partial x_i}\!\left(\frac{\partial y(x)}{\partial x_j}\right) \tag{D.12}$$
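The definition can be illustrated with central differences. This sketch (the sample point is an arbitrary choice) approximates every partial of the section's example *y* = 2*x*₁ + *x*₃ over the five-variable list, and the partials with respect to *x*₂, *x*₄, *x*₅ come out zero:

```python
# Sketch: central-difference estimates of the partial derivatives of the
# text's example y = 2*x1 + x3, with the variable list x1,...,x5.
# Differentiating with respect to x_k holds every other listed variable fixed.

def y(x):                      # x is the whole list x1,...,x5
    return 2.0 * x[0] + x[2]

def partial(f, x, k, h=1e-6):
    xp = list(x); xp[k] += h   # increment only variable k
    xm = list(x); xm[k] -= h
    return (f(xp) - f(xm)) / (2.0 * h)

x = [0.3, -1.2, 2.5, 0.9, 4.0]   # arbitrary point
grads = [partial(y, x, k) for k in range(5)]
print(grads)   # approximately [2, 0, 1, 0, 0]
```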

# D.7 Continuously Differentiable Functions

If all first partial derivatives of a function *y* = *y*(*x*_{1}, *x*_{2},…, *x_{N}*) exist and are continuous functions of *x*, then the function *y* itself is continuous. Such functions are called *continuously differentiable*. If all partial derivatives up to and including the *n*th order exist and are continuous functions of *x*, the function is called *continuously differentiable to n*th *order*. Unless specifically stated otherwise, we will assume that all functions used in the present text are continuously differentiable to any order.

# D.8 Order of Differentiation

If the second partial derivatives exist and are continuous functions for all *x* in *R* (i.e. if *y*(*x*) is continuously differentiable to second order), then the order of the second-order partial derivatives is unimportant since, for all *i*, *j*,

$$\frac{\partial^2 y(x)}{\partial x_i\, \partial x_j} = \frac{\partial^2 y(x)}{\partial x_j\, \partial x_i} \tag{D.13}$$

Generalizing, if *y* is continuously differentiable to *n*th order, then all partial derivatives of that order or less are also independent of the order in which they are taken.

# D.9 Chain Rule

Let *y* be a compound function of *r* as defined in eqn (D.5). Assume that all partial derivatives of the form ∂*y*(*x*)/∂*x_{k}* and ∂*x_{k}*(*r*)/∂*r_{j}* exist. Then the partial derivatives of *y* with respect to the *r* variables exist and may be written using what is called the *chain rule* of partial differentiation,

$$\frac{\partial y(r)}{\partial r_j} = \sum_{k=1}^{N} \frac{\partial y(x)}{\partial x_k}\, \frac{\partial x_k(r)}{\partial r_j} \tag{D.14}$$

or, in shorter but equivalent notation,

$$\frac{\partial y}{\partial r_j} = \sum_{k=1}^{N} \frac{\partial y}{\partial x_k}\, \frac{\partial x_k}{\partial r_j} \tag{D.15}$$

# D.10 Mean Values

## Theorem D.10.1: The Mean Value Theorem

*Suppose that a function y* = *y*(*x*_{1}, *x*_{2},…, *x_{N}*) *is continuously differentiable in a region R and has partial derivatives* ∂*y*(*x*)/∂*x_{k}* = *g_{k}*(*x*)*. Let h_{k} be increments added to the x_{k}, and assume that the point x*_{1} + η*h*_{1}, *x*_{2} + η*h*_{2},…, *x_{N}* + η*h_{N}* *lies in region R for all* η *in the range* 0 ≤ η ≤ 1*. Then*

$$y(x_1 + h_1, \ldots, x_N + h_N) - y(x_1, \ldots, x_N) = \sum_{k=1}^{N} h_k\, g_k(x_1 + \theta h_1, \ldots, x_N + \theta h_N) \tag{D.16}$$

*for some* θ *in the range* 0 < θ < 1.

# D.11 Orders of Smallness

We often want to compare two functions as a variable approaches some limit *L*. A useful notation is found in Chapter I of Titchmarsh (1939). It is

$$f = o(\phi) \quad \text{as } x \to L, \qquad \text{which means that} \qquad \lim_{x \to L} \frac{f(x)}{\phi(x)} = 0 \tag{D.17}$$

In words, "Function *f* is of *smaller* order than ϕ as *x* approaches *L*." The following notation is also used,

$$f = g + o(\phi) \quad \text{as } x \to L, \qquad \text{which means that} \qquad \lim_{x \to L} \frac{f(x) - g(x)}{\phi(x)} = 0 \tag{D.18}$$

In words, "The difference between *f* and *g* is of *smaller* order than ϕ as *x* approaches *L*."

For example, with *L* = 0,

and, with *L*=∞,

# D.12 Differentials

If *y* = *y*(*x*) is a function of one variable, the change in *y* as the independent variable is incremented from *x* to *x* + *dx* is Δ*y* = *y*(*x* + *dx*) − *y*(*x*). The increment *dx* in this expression is *not* assumed to be small. It may take any value. Assuming that the function is differentiable and has a finite derivative at *x*, the *differential dy* at point *x* is defined as the linear approximation to Δ*y* based on the tangent line to the curve at point *x*,

$$dy = \frac{dy(x)}{dx}\, dx \tag{D.21}$$

The approximation of Δ*y* by *dy* may be good or bad, of course. But we are guaranteed that the difference between the two vanishes for small enough *dx*. More precisely, it follows from the definition of the derivative that lim_{dx→0} {(Δ*y* − *dy*)/*dx*} = 0, or equivalently, that Δ*y* = *dy* + *o*(*dx*) as *dx* → 0. In other words, the difference between Δ*y* and *dy* is of smaller order than *dx*, as *dx* approaches zero.
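A numerical sketch of this statement (function and point chosen arbitrarily, not from the text): the ratio (Δ*y* − *dy*)/*dx* shrinks toward zero as *dx* does:

```python
# Sketch: for y(x) = x**3 at x = 1, the difference between the true change
# Dy = y(x+dx) - y(x) and the differential dy = y'(x)*dx is o(dx):
# the ratio (Dy - dy)/dx tends to 0 as dx -> 0.

def y(x):
    return x**3

x = 1.0
ratios = []
for n in range(1, 6):
    dx = 10.0**-n
    Dy = y(x + dx) - y(x)
    dy = 3.0 * x**2 * dx          # y'(x) = 3x^2
    ratios.append((Dy - dy) / dx)

print(ratios)   # steadily decreasing toward 0
```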

# D.13 Differential of a Function of Several Variables

The definition of differential can be extended to functions of more than one variable. Our definition of this differential follows that of pages 66–69 of Courant (1936*b*). Note that he calls it the *total differential*.

If *y* = *y*(*x*_{1}, *x*_{2},…, *x_{N}*) and each variable is independently incremented from *x_{k}* to *x_{k}* + *dx_{k}* for all *k* = 1,…, *N*, then the change in *y* is

$$\Delta y = y(x_1 + dx_1, x_2 + dx_2, \ldots, x_N + dx_N) - y(x_1, x_2, \ldots, x_N) \tag{D.22}$$

and the differential *dy* is defined as the linear approximation to Δ*y* given by

$$dy = \sum_{k=1}^{N} \frac{\partial y(x)}{\partial x_k}\, dx_k \tag{D.23}$$

As in the one-variable case, the increments *dx_{k}* may be large or small. If the set *x*_{1}, *x*_{2},…, *x_{N}* is a set of independent variables, then the *dx_{k}* are independent and may take any values.
In the many-variable case, just as for the functions of one variable discussed above, the approximation of Δ*y* by *dy* may be good or bad. Define *h* = max_{k}{|*dx_{k}*|}. If *y*(*x*) is a continuously differentiable function of *x*, then the mean value theorem, Theorem D.10.1, implies that

$$\Delta y = dy + o(h) \quad \text{as } h \to 0 \tag{D.24}$$

where Δ*y* and *dy* are defined in eqns (D.22, D.23).

The differential in eqn (D.23) is well defined for any values of *dx_{k}*. Nonetheless, since eqn (D.24) says that, as *h* = max_{k}{|*dx_{k}*|} goes to zero, the difference between Δ*y* and *dy* is of smaller order than *h*, it is also legitimate to think of the differential as the small change in the value of *y* as the independent variables are incremented by small amounts. The differential is often used heuristically in this way.

# D.14 Differentials and the Chain Rule

The differential of a compound function may be constructed by direct substitution. As in Section D.4, suppose we have a function *y* = *y*(*x*_{1}, *x*_{2},…, *x_{N}*) whose differential is

$$dy = \sum_{k=1}^{N} \frac{\partial y(x)}{\partial x_k}\, dx_k \tag{D.25}$$

and suppose that each *x_{k}* is in turn a function of *r* so that, for *k* = 1,…, *N*,

$$dx_k = \sum_{j=1}^{M} \frac{\partial x_k(r)}{\partial r_j}\, dr_j \tag{D.26}$$

Substituting eqn (D.26) into eqn (D.25) gives

$$dy = \sum_{j=1}^{M} \left( \sum_{k=1}^{N} \frac{\partial y(x)}{\partial x_k}\, \frac{\partial x_k(r)}{\partial r_j} \right) dr_j = \sum_{j=1}^{M} \frac{\partial y(r)}{\partial r_j}\, dr_j \tag{D.27}$$

where the chain rule eqn (D.15) was used to obtain the last equality. But the last expression in eqn (D.27) is precisely the differential of the compound function *y* = *y*(*r*_{1}, *r*_{2},…, *r_{M}*). This example illustrates that differentials provide a clear and correct notation for manipulating the chain rule of partial differentiation.

# D.15 Differentials of Second and Higher Orders

The first-order differential defined in eqn (D.23) is itself a function of the variables *x*_{1},…, *x_{N}*, *dx*_{1},…, *dx_{N}*. Thus, the second-order differential may be defined as the differential of the first-order differential, using the same increments. Writing eqn (D.23) using an operator formalism as

$$dy = \left(\sum_{k=1}^{N} dx_k \frac{\partial}{\partial x_k}\right) y \tag{D.28}$$

we may define

$$d^2 y = \left(\sum_{k=1}^{N} dx_k \frac{\partial}{\partial x_k}\right)^{2} y \tag{D.29}$$

assuming that the second-order partial derivatives exist. Generalizing to *n*th order gives

$$d^n y = \left(\sum_{k=1}^{N} dx_k \frac{\partial}{\partial x_k}\right)^{n} y \tag{D.30}$$

where we assume that the partial derivatives to *n*th order exist.

# D.16 Taylor Series

The Taylor theorem may be thought of as a generalization of the mean value theorem of Section D.10.

## Theorem D.16.1: Taylor Series

*If function y*(*x*_{1},…, *x_{N}*) *is continuously differentiable to* (*n* + 1)*st order, then*

$$y(x_1 + dx_1, \ldots, x_N + dx_N) = y(x) + dy + \frac{d^2 y}{2!} + \cdots + \frac{d^n y}{n!} + R_n \tag{D.31}$$

*where all of the differentials dy, d*^{2}*y, etc., are to be evaluated using the same increments dx_{k}, and where the remainder R_{n} is*

$$R_n = \frac{1}{(n+1)!} \left(\sum_{k=1}^{N} dx_k \frac{\partial}{\partial \bar{x}_k}\right)^{n+1} y(\bar{x}) \tag{D.32}$$

*where*

$$\bar{x}_k = x_k + \theta\, dx_k \quad \text{for } k = 1, \ldots, N \tag{D.33}$$

*and* θ *is some number in the range* 0 < θ < 1.
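A numerical sketch of the theorem for two variables (the function exp(*x*₁ + 2*x*₂) and the increments are illustrative choices): truncating the series after the second-order differential leaves an error of third order in the increments:

```python
import math

# Sketch: second-order Taylor approximation y(x+dx) ~ y + dy + d2y/2! for
# y(x1, x2) = exp(x1 + 2*x2). The differentials come from the operator
# (dx1*d/dx1 + dx2*d/dx2) applied once and twice; for this y, each
# application multiplies by (dx1 + 2*dx2).

def y(x1, x2):
    return math.exp(x1 + 2.0 * x2)

x1, x2 = 0.1, 0.2
dx1, dx2 = 1e-3, 2e-3

v = y(x1, x2)
dy = (dx1 + 2.0 * dx2) * v               # first differential
d2y = (dx1 + 2.0 * dx2) ** 2 * v         # second differential

taylor2 = v + dy + d2y / 2.0
exact = y(x1 + dx1, x2 + dx2)

# The remainder after second order is O(h^3), hence very small here.
assert abs(taylor2 - exact) < 1e-7
```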

# D.17 Higher-Order Differential as a Difference

Suppose that the conditions of the Taylor theorem of Section D.16 apply, and define the difference Δ*y* as

$$\Delta y = y(x_1 + dx_1, \ldots, x_N + dx_N) - y(x_1, \ldots, x_N) \tag{D.34}$$

Then inspection of eqns (D.31, D.32) shows that

$$\Delta y = dy + \frac{d^2 y}{2!} + \cdots + \frac{d^n y}{n!} + o(h^n) \quad \text{as } h \to 0 \tag{D.35}$$

where *h* = max_{k}{|*dx_{k}*|} for 1 ≤ *k* ≤ *N*.

# D.18 Differential Expressions

Given *N* functions *a_{k}*(*x*) of the set of independent variables *x*_{1}, *x*_{2},…, *x_{N}*, we may form *differential expressions* like

$$\sum_{k=1}^{N} a_k(x)\, dx_k \tag{D.36}$$

which may or may not be the actual differential of some function. These expressions may be manipulated by the usual rules of algebra (think of the *dx_{k}* simply as finite increments).

We adopt the usual convention that an equality involving differential expressions includes the implicit assumption that it holds for all possible *dx_{k}* values. Since the increments *dx_{k}* of independent variables *x*_{1}, *x*_{2},…, *x_{N}* can take any values, it follows that the *dx_{k}* may be set nonzero one at a time, which leads to the following Lemmas.

## Lemma D.18.1: Zero Differential Expressions

*The differential expression is zero*,

$$\sum_{k=1}^{N} a_k(x)\, dx_k = 0 \tag{D.37}$$

*if and only if a_{k}*(*x*) = 0 *for all k* = 1,…, *N*.

## Lemma D.18.2: Equal Differential Expressions

*Two differential expressions are equal*,

$$\sum_{k=1}^{N} a_k(x)\, dx_k = \sum_{k=1}^{N} b_k(x)\, dx_k \tag{D.38}$$

*if and only if a_{k}*(*x*) = *b_{k}*(*x*) *for all k* = 1,…, *N*.

## Lemma D.18.3: Differential Expression and Differential

*It follows from eqn* (D.38) *together with the definition of the differential in eqn* (D.23) *that, if we are given a function f* = *f*(*x*_{1}, *x*_{2},…, *x_{N}*) *of the independent variables x*_{1}, *x*_{2},…, *x_{N}* *and a differential expression* ${\sum}_{k=1}^{N}{a}_{k}(x)d{x}_{k}$, *the equality*

$$df = \sum_{k=1}^{N} a_k(x)\, dx_k \tag{D.39}$$

*holds if and only if* ∂*f*(*x*)/∂*x_{k}* = *a_{k}*(*x*) *for all k* = 1,…, *N*.

## Lemma D.18.4: Zero Differential

*It follows from Lemma* D.18.1 *that a function f* = *f*(*x*_{1}, *x*_{2},…, *x_{N}*) *of the independent variables x*_{1}, *x*_{2},…, *x_{N}* *will have df* = 0 *at a point x*_{1}, *x*_{2},…, *x_{N}* *if and only if* ∂*f*(*x*)/∂*x_{k}* = 0 *for all k* = 1,…, *N at that point*.

In some cases of interest, the variables *x* will be functions of another set of variables *r* = *r*_{1}, *r*_{2},…, *r_{M}* as in the discussion of compound functions in Section D.4. In that case, using the chain rule from Section D.14, the differential expression becomes

$$\sum_{k=1}^{N} a_k(x)\, dx_k = \sum_{j=1}^{M} A_j(r)\, dr_j \tag{D.40}$$

where

$$A_j(r) = \sum_{k=1}^{N} a_k\bigl(x(r)\bigr)\, \frac{\partial x_k(r)}{\partial r_j} \tag{D.41}$$

In some applications, it is important to know if the Lemmas above still apply to the differential expression ${\sum}_{k=1}^{N}{a}_{k}(x)d{x}_{k}$ when one assumes *only* that the *r* variables are independent.

## Theorem D.18.5: Compound Differential Expressions

*Let the variables r be assumed to be independent so that the increments dr_{j} can be set equal to zero one at a time in eqn* (D.40). *Then Lemmas D.18.1 through D.18.4 will continue to hold for the differential expression* ${\sum}_{k=1}^{N}{a}_{k}(x)d{x}_{k}$ *if and only if M* = *N and the determinant condition*

$$\left|\frac{\partial x(r)}{\partial r}\right| \neq 0 \tag{D.42}$$

*is satisfied*.

**Proof:** Theorem D.24.1 below shows that the determinant condition in eqn (D.42) is the necessary and sufficient condition for the transformation *x* → *r* to be invertible. Thus, one can use the inverse matrix from Section D.25 to write

$$dr_j = \sum_{k=1}^{N} \left(\frac{\partial x(r)}{\partial r}\right)^{-1}_{jk} dx_k \tag{D.43}$$

from which a choice of the independent increments *dr_{j}* can be found that will make the *dx_{k}* have any value desired. Thus the *dx_{k}* are also arbitrary and independent, and can be set nonzero one at a time, which is the condition needed.

# D.19 Line Integral of a Differential Expression

A curve can be defined in region *R* by making each *x_{k}* be a function of some monotonically varying parameter β, so that *x_{k}* = *x_{k}*(β) for *k* = 1,…, *N*. The integral

$$I_{01} = \int_{\beta_0}^{\beta_1} \sum_{k=1}^{N} a_k\bigl(x(\beta)\bigr)\, \frac{dx_k(\beta)}{d\beta}\, d\beta \tag{D.44}$$

is called a *line integral* of the differential expression ${\sum}_{k=1}^{N}{a}_{k}(x)d{x}_{k}$ along a portion of that curve.

The line integral in eqn (D.44) is often denoted more simply, as just the integral of a differential expression with integration along a particular curve being understood. Thus, one often sees *I*_{01} denoted as

$$I_{01} = \int_{0}^{1} \sum_{k=1}^{N} a_k(x)\, dx_k = \int_{0}^{1} \mathbf{a} \cdot d\mathbf{x} \tag{D.45}$$

where the latter form treats the differential expression as a dot product of two vectors in an *N*-dimensional Cartesian space with the *x_{k}* as its coordinates.
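Such a line integral can be evaluated numerically. In this sketch (the coefficients *a_k* and the curve are illustrative choices, not from the text), the midpoint rule approximates the parametrized integral along a parabolic path:

```python
# Sketch: numerically evaluate the line integral of a1*dx1 + a2*dx2
# with a = (x2, x1), along the curve x1 = beta, x2 = beta**2 for
# 0 <= beta <= 1, using the midpoint rule on the beta parametrization.

def a(x1, x2):
    return (x2, x1)

def curve(beta):
    return (beta, beta**2)          # x_k(beta)

def dcurve(beta):
    return (1.0, 2.0 * beta)        # dx_k/dbeta

n = 100_000
h = 1.0 / n
I = 0.0
for i in range(n):
    beta = (i + 0.5) * h
    x = curve(beta)
    dx = dcurve(beta)
    ak = a(*x)
    I += (ak[0] * dx[0] + ak[1] * dx[1]) * h

# Here a.dx happens to be the perfect differential d(x1*x2),
# so the integral from (0,0) to (1,1) equals 1*1 - 0*0 = 1.
assert abs(I - 1.0) < 1e-6
```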

# D.20 Perfect Differentials

Differential expressions ${\sum}_{k=1}^{N}{a}_{k}(x)d{x}_{k}$ for which a function *f* exists satisfying $df={\sum}_{k=1}^{N}{a}_{k}(x)d{x}_{k}$ are called *perfect differentials*. Line integrals and perfect differentials are treated in Chapter V of Courant (1936*b*) and Appendix 11 in Volume I of Desloge (1982).

The function *f* is sometimes called a *potential function* since the *a_{k}*(*x*) can be derived from it by partial differentiation in a way analogous to the derivation of the electric field components from the electric potential. A condition for ${\sum}_{k=1}^{N}{a}_{k}(x)d{x}_{k}$ to be a perfect differential is given by the following theorem.

## Theorem D.20.1: Condition for a Perfect Differential

*Assume the variables x*_{1}, *x*_{2},…, *x_{N}* *to lie in an open rectangle R*_{⊥}*. Given a set of continuously differentiable functions a_{k}*(*x*) *for k* = 1,…, *N, there exists a potential function f* = *f*(*x*_{1}, *x*_{2},…, *x_{N}*) *such that the following two equivalent conditions are satisfied*,

$$df = \sum_{k=1}^{N} a_k(x)\, dx_k \qquad \text{and} \qquad \frac{\partial f(x)}{\partial x_k} = a_k(x) \quad \text{for } k = 1, \ldots, N \tag{D.46}$$

*if and only if, for all x in R*_{⊥} *and all pairs of indices i*, *j* = 1,…, *N*,

$$\frac{\partial a_i(x)}{\partial x_j} = \frac{\partial a_j(x)}{\partial x_i} \tag{D.47}$$

**Proof:** First we prove that eqn (D.46) implies eqn (D.47). For if an *f* exists satisfying eqn (D.46), then using eqn (D.13) gives

$$\frac{\partial a_i(x)}{\partial x_j} = \frac{\partial^2 f(x)}{\partial x_j\, \partial x_i} = \frac{\partial^2 f(x)}{\partial x_i\, \partial x_j} = \frac{\partial a_j(x)}{\partial x_i} \tag{D.48}$$

To prove that eqn (D.47) implies eqn (D.46) we construct a suitable *f* explicitly. Starting from some arbitrary point ${x}_{1}^{\left(0\right)},{x}_{2}^{\left(0\right)},\dots ,{x}_{N}^{\left(0\right)}$, we perform a line integral along a series of straight line segments, first along *x*_{1}, then along *x*_{2}, etc., until the final point 1 at *x*_{1}, *x*_{2},…, *x_{N}* is reached. Such a path will lie entirely in *R*_{⊥}, and the integral *I*_{10} will be the sum of the integrals along its segments. Along the *j*th segment, all *x_{i}* for *i* ≠ *j* are held constant but *x_{j}* varies as *x_{j}* = β, giving *dx_{k}*(β)/*d*β = δ_{jk}. Inserting this result into eqn (D.44), and setting *f*(*x*_{1}, *x*_{2},…, *x_{N}*) equal to the integral *I*_{10}, gives

$$f(x_1, x_2, \ldots, x_N) = \sum_{j=1}^{N} \int_{x_j^{(0)}}^{x_j} a_j\bigl(x_1, \ldots, x_{j-1},\, \beta,\, x_{j+1}^{(0)}, \ldots, x_N^{(0)}\bigr)\, d\beta \tag{D.49}$$

Since any point *x* can be reached by this integration, the function *f* is defined for all *x* in *R* _{⊥}.

The partial derivatives of this *f* may be written as

$$\frac{\partial f(x)}{\partial x_k} = a_k\bigl(x_1, \ldots, x_{k-1},\, x_k,\, x_{k+1}^{(0)}, \ldots, x_N^{(0)}\bigr) + \sum_{j=k+1}^{N} \int_{x_j^{(0)}}^{x_j} \frac{\partial a_j\bigl(x_1, \ldots, x_{j-1},\, \beta,\, x_{j+1}^{(0)}, \ldots, x_N^{(0)}\bigr)}{\partial x_k}\, d\beta \tag{D.50}$$

With *x_{j}* temporarily replaced by β, the assumption stated in eqn (D.47) implies that

$$\frac{\partial a_j\bigl(x_1, \ldots, x_{j-1},\, \beta,\, x_{j+1}^{(0)}, \ldots, x_N^{(0)}\bigr)}{\partial x_k} = \frac{\partial a_k\bigl(x_1, \ldots, x_{j-1},\, \beta,\, x_{j+1}^{(0)}, \ldots, x_N^{(0)}\bigr)}{\partial \beta} \tag{D.51}$$

Thus

$$\frac{\partial f(x)}{\partial x_k} = a_k\bigl(x_1, \ldots, x_k,\, x_{k+1}^{(0)}, \ldots, x_N^{(0)}\bigr) + \sum_{j=k+1}^{N} \Bigl[ a_k\bigl(x_1, \ldots, x_j,\, x_{j+1}^{(0)}, \ldots, x_N^{(0)}\bigr) - a_k\bigl(x_1, \ldots, x_{j-1},\, x_j^{(0)}, \ldots, x_N^{(0)}\bigr) \Bigr] = a_k(x) \tag{D.52}$$

which shows that the *f* constructed in eqn (D.49) does have the required property eqn (D.46).

The line integral eqn (D.49) used in this proof is of great practical use. It will be used, for example, to compute generating functions for canonical transformations.

The theorem proved in this section (but not, of course, the proof of it given here) remains true even when open rectangle *R* _{⊥} is replaced by a more general region *R* which is only assumed to be open and simply connected. See Chapter V of Courant (1936*b*) for details.
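The segment-by-segment construction used in this proof can be carried out numerically. In this sketch (the coefficient functions are illustrative choices, not from the text), the condition ∂*a*₁/∂*x*₂ = ∂*a*₂/∂*x*₁ holds, and integrating first along *x*₁ and then along *x*₂ from the origin reproduces the known potential:

```python
# Sketch: construct the potential f for the perfect differential
# a1*dx1 + a2*dx2 with a1 = 2*x1*x2 and a2 = x1**2, so that
# da1/dx2 = da2/dx1 = 2*x1. The exact potential is f = x1**2 * x2.

def a1(x1, x2):
    return 2.0 * x1 * x2

def a2(x1, x2):
    return x1**2

def midpoint_integral(g, lo, hi, n=10_000):
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

def f(x1, x2):
    # Segment 1: from (0,0) to (x1,0), with x2 held at its starting value 0.
    seg1 = midpoint_integral(lambda b: a1(b, 0.0), 0.0, x1)
    # Segment 2: from (x1,0) to (x1,x2), with x1 held at its final value.
    seg2 = midpoint_integral(lambda b: a2(x1, b), 0.0, x2)
    return seg1 + seg2

assert abs(f(1.5, 2.0) - 1.5**2 * 2.0) < 1e-6
```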

# D.21 Perfect Differential and Path Independence

An alternate, and equivalent, condition for a differential expression to be a perfect differential is that its line integral between two end points be independent of the particular choice of the path between them.

## Theorem D.21.1: Path Independence

*Assume a given set of functions a_{k}*(*x*) *for k* = 1,…, *N, continuously differentiable in an open and simply connected region R. The differential expression* ${\sum}_{k=1}^{N}{a}_{k}\left(x\right)d{x}_{k}$ *is a perfect differential with a potential function satisfying the equivalent conditions eqn* (D.46),

$$df = \sum_{k=1}^{N} a_k(x)\, dx_k \tag{D.53}$$

*if and only if the line integral between any two points in R*,

$$I = \int_{\beta_0}^{\beta_1} \sum_{k=1}^{N} a_k\bigl(x(\beta)\bigr)\, \frac{dx_k(\beta)}{d\beta}\, d\beta \tag{D.54}$$

*depends only on the values of x at the endpoints of the integration, x*_{1}(β_{0}), *x*_{2}(β_{0}),…, *x_{N}*(β_{0}) *and x*_{1}(β_{1}), *x*_{2}(β_{1}),…, *x_{N}*(β_{1})*, and is therefore independent of the path x_{k}*(β) *taken between those endpoints*.

**Proof:** First assume eqn (D.53) and prove the path independence of eqn (D.54). If a potential function *f* exists, then eqn (D.54) becomes

$$I = \int_{\beta_0}^{\beta_1} \sum_{k=1}^{N} \frac{\partial f(x)}{\partial x_k}\, \frac{dx_k(\beta)}{d\beta}\, d\beta = \int_{\beta_0}^{\beta_1} \frac{df}{d\beta}\, d\beta = f\bigl(x(\beta_1)\bigr) - f\bigl(x(\beta_0)\bigr) \tag{D.55}$$

which depends only on the value of *f* at the end points and hence is independent of the path, as was to be proved. For proof of the converse, see Chapter V of Courant (1936*b*).

It follows from Theorem D.21.1 that integration along *any* path between the points ${x}_{1}^{\left(0\right)},{x}_{2}^{\left(0\right)},\dots ,{x}_{N}^{\left(0\right)}$ and *x*_{1}, *x*_{2},…, *x_{N}* would have given the same integral as the particular path used in the proof of Theorem D.20.1.

The path independence of line integrals between any two points is equivalent to the vanishing of line integrals around closed paths. The following Corollary is given without proof.

# D.22 Jacobians

Suppose that we are given *y_{k}* = *y_{k}*(*x*_{1}, *x*_{2},…, *x_{N}*, *z*_{1}, *z*_{2},…, *z_{P}*) = *y_{k}*(*x*, *z*) for *k* = 1,…, *N*, where *P* may have any non-negative value, including the value zero (which would indicate that the extra *z* variables are absent). Then we may form an *N* × *N* matrix of partial derivatives denoted (∂*y*(*x*, *z*)/∂*x*) and defined by

$$\left(\frac{\partial y(x, z)}{\partial x}\right)_{kl} = \frac{\partial y_k(x, z)}{\partial x_l} \tag{D.56}$$

This matrix is called the *Jacobian matrix*, and its determinant is called the *Jacobian determinant*, or simply the *Jacobian*. It is variously denoted. We give here the two forms we will use, and the determinant itself,

$$\frac{\partial(y_1, y_2, \ldots, y_N)}{\partial(x_1, x_2, \ldots, x_N)} = \left|\frac{\partial y(x, z)}{\partial x}\right| = \begin{vmatrix} \dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_N} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial y_N}{\partial x_1} & \cdots & \dfrac{\partial y_N}{\partial x_N} \end{vmatrix} \tag{D.57}$$

The first form, which is the traditional one, does not specify the list of variables to be held constant when the partial derivatives are taken. It is implicit in the notation that all of the variables listed in the denominator are on that list, but there may be others, as here. Usually the intended list is obvious, but in cases of doubt we will modify the traditional notation and use expressions like

in which the list is explicitly stated. Jacobians are treated in Volume I, Appendix 3 of Desloge (1982) and in Chapter III of Courant (1936*b*).

## Lemma D.22.1: Jacobian of a Compound Function

*If the variables x_{i} are themselves functions of another set of variables r, as well as of the same extra variables z as above, x_{i}* = *x_{i}*(*r*_{1}, *r*_{2},…, *r_{N}*, *z*_{1}, *z*_{2},…, *z_{P}*) *for i* = 1,…, *N, then the compound functions y_{k}*(*r*_{1}, *r*_{2},…, *r_{N}*, *z*_{1}, *z*_{2},…, *z_{P}*) *may be defined by*

$$y_k(r, z) = y_k\bigl(x(r, z),\, z\bigr) \tag{D.59}$$

*Then the Jacobians obey the relation*

$$\left|\frac{\partial y(r, z)}{\partial r}\right| = \left|\frac{\partial y(x, z)}{\partial x}\right| \left|\frac{\partial x(r, z)}{\partial r}\right| \tag{D.60}$$

*or, in the traditional notation*,

$$\frac{\partial(y_1, \ldots, y_N)}{\partial(r_1, \ldots, r_N)} = \frac{\partial(y_1, \ldots, y_N)}{\partial(x_1, \ldots, x_N)}\, \frac{\partial(x_1, \ldots, x_N)}{\partial(r_1, \ldots, r_N)} \tag{D.61}$$

**Proof:** The chain rule of partial differentiation gives

$$\frac{\partial y_k(r, z)}{\partial r_j} = \sum_{i=1}^{N} \frac{\partial y_k(x, z)}{\partial x_i}\, \frac{\partial x_i(r, z)}{\partial r_j} \tag{D.62}$$

which may be written as a matrix equation

$$\left(\frac{\partial y(r, z)}{\partial r}\right) = \left(\frac{\partial y(x, z)}{\partial x}\right) \left(\frac{\partial x(r, z)}{\partial r}\right) \tag{D.63}$$

Equating the determinants of both sides of eqn (D.63) gives eqn (D.60), as was to be proved.
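The determinant relation can be checked numerically. In this 2 × 2 sketch (the functions are illustrative choices with no extra *z* variables), the Jacobian matrices are built by central differences and the product rule for their determinants is verified:

```python
# Sketch: check det(dy/dr) = det(dy/dx) * det(dx/dr) for a 2x2 example.

def x_of_r(r1, r2):
    return (r1 * r2, r1 - r2)

def y_of_x(x1, x2):
    return (x1 + x2**2, x1 * x2)

def jac(f, u, h=1e-6):
    """2x2 Jacobian matrix of f at point u, by central differences."""
    cols = []
    for k in range(2):
        up = list(u); up[k] += h
        um = list(u); um[k] -= h
        fp, fm = f(*up), f(*um)
        cols.append([(fp[i] - fm[i]) / (2.0 * h) for i in range(2)])
    # cols[k][i] = df_i/du_k; rearrange so rows are i, columns are k.
    return [[cols[k][i] for k in range(2)] for i in range(2)]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

r = (1.2, 0.5)
x = x_of_r(*r)
y_of_r = lambda r1, r2: y_of_x(*x_of_r(r1, r2))   # compound function

lhs = det2(jac(y_of_r, r))
rhs = det2(jac(y_of_x, x)) * det2(jac(x_of_r, r))
assert abs(lhs - rhs) < 1e-5
```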

## Lemma D.22.2: Jacobian of an Augmented Variable Set

*If y_{k}* = *y_{k}*(*x*_{1}, *x*_{2},…, *x_{N}*, *z*_{1}, *z*_{2},…, *z_{P}*) = *y_{k}*(*x*, *z*) *for k* = 1,…, *N as before, the following identity holds*

$$\frac{\partial(y_1, \ldots, y_N, z_1, \ldots, z_P)}{\partial(x_1, \ldots, x_N, z_1, \ldots, z_P)} = \frac{\partial(y_1, \ldots, y_N)}{\partial(x_1, \ldots, x_N)} \tag{D.64}$$

**Proof:** The left expression in eqn (D.64) is the determinant of an (*N* + *P*) × (*N* + *P*) matrix whose last *P* rows consist of *N* zeroes followed by an element of the *P* × *P* identity matrix. Its determinant is thus the determinant of the *N* × *N* upper left-hand block, which is the right expression in eqn (D.64).

## Lemma D.22.3: Jacobian of an Inverse Function

*If a set of functions y_{k}* = *y_{k}*(*x*_{1}, *x*_{2},…, *x_{N}*, *z*_{1}, *z*_{2},…, *z_{P}*) *can be solved for x, giving x_{i}* = *x_{i}*(*y*_{1}, *y*_{2},…, *y_{N}*, *z*_{1}, *z*_{2},…, *z_{P}*)*, then*

$$\left|\frac{\partial x(y, z)}{\partial y}\right| = \left|\frac{\partial y(x, z)}{\partial x}\right|^{-1} \tag{D.65}$$

*or, in traditional form*,

$$\frac{\partial(x_1, \ldots, x_N)}{\partial(y_1, \ldots, y_N)} = \left(\frac{\partial(y_1, \ldots, y_N)}{\partial(x_1, \ldots, x_N)}\right)^{-1} \tag{D.66}$$

**Proof:** As is proved in Theorem D.24.1 below, the necessary and sufficient condition for *y_{k}* = *y_{k}*(*x*, *z*) to be solved for *x_{i}* = *x_{i}*(*y*, *z*) is the determinant condition

$$\left|\frac{\partial y(x, z)}{\partial x}\right| \neq 0 \tag{D.67}$$

Then, as discussed in Section D.25, the following matrix equation holds

$$\left(\frac{\partial x(y, z)}{\partial y}\right) \left(\frac{\partial y(x, z)}{\partial x}\right) = U \tag{D.68}$$

where *U* is the *N* × *N* unit matrix. Taking the determinant of both sides gives eqn (D.65).

## Lemma D.22.4: Change of Variable in an Integral

*Given a set of continuously differentiable functions y_{k}* = *y_{k}*(*x*_{1}, *x*_{2},…, *x_{N}*) *for k* = 1,…, *N, whose Jacobian does not vanish in the range of integration*,

$$\frac{\partial(y_1, \ldots, y_N)}{\partial(x_1, \ldots, x_N)} \neq 0 \tag{D.69}$$

*the multiple integral*

$$I = \int \cdots \int f(y_1, \ldots, y_N)\, dy_1 \cdots dy_N \tag{D.70}$$

*may be transformed into an integral over the variables x*,

$$I = \int \cdots \int f(x_1, \ldots, x_N)\, \frac{\partial(y_1, \ldots, y_N)}{\partial(x_1, \ldots, x_N)}\, dx_1 \cdots dx_N \tag{D.71}$$

*where the compound function f*(*x*_{1}, *x*_{2},…, *x_{N}*) *is*

$$f(x_1, \ldots, x_N) = f\bigl(y_1(x), \ldots, y_N(x)\bigr) \tag{D.72}$$

*and the limits of integration in eqn* (D.71) *are chosen so that x ranges over the inverse image of the range of y*.

In practice, these limits of integration in eqn (D.71) are usually chosen so that the two integrals would have the same value, including the same sign, for the case in which *f* is replaced by the number 1.

# D.23 Global Inverse Function Theorem

We often need to invert a set of functions like *y_{i}* = *y_{i}*(*x*_{1}, *x*_{2},…, *x_{N}*, *z*_{1}, *z*_{2},…, *z_{P}*) where *i* = 1,…, *N*, that is, to solve them for the variables *x_{j}* = *x_{j}*(*y*_{1}, *y*_{2},…, *y_{N}*, *z*_{1}, *z*_{2},…, *z_{P}*). The following global and local inverse function theorems are of great importance.

The inverses proved by the present theorem are called *global*, because the same, unique inverse functions apply to the whole of an open rectangle. This open rectangle may be indefinitely large. For example, in the transformation from plane polar coordinates ρ, ϕ to plane Cartesian coordinates *x*, *y*, the open rectangle might be 0 < ρ < ∞ and −π < ϕ < π.
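A numerical sketch of this polar example (using Python's `math.atan2`, whose return range matches the stated open rectangle except on its boundary): the map is inverted globally, and its Jacobian determinant is ρ, nonzero everywhere in the rectangle:

```python
import math

# Sketch: global inverse of the plane polar -> Cartesian map on the
# open rectangle 0 < rho < inf, -pi < phi < pi.

def forward(rho, phi):
    return (rho * math.cos(phi), rho * math.sin(phi))

def inverse(x, y):
    rho = math.hypot(x, y)
    phi = math.atan2(y, x)          # lands in (-pi, pi]
    return (rho, phi)

rho, phi = 2.5, -1.1                # arbitrary point in the rectangle
x, y = forward(rho, phi)
rho2, phi2 = inverse(x, y)
assert abs(rho2 - rho) < 1e-12 and abs(phi2 - phi) < 1e-12

# Jacobian determinant: det [[cos, -rho*sin], [sin, rho*cos]] = rho
jac_det = rho * (math.cos(phi)**2 + math.sin(phi)**2)
assert abs(jac_det - rho) < 1e-12
```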

## Theorem D.23.1: The Global Inverse Function Theorem

*Assume that all points x*_{1}, *x*_{2},…, *x_{N}*, *z*_{1}, *z*_{2},…, *z_{P}* *lie in an open rectangle R*_{⊥}*, and that*

$$y_i = y_i(x_1, x_2, \ldots, x_N, z_1, z_2, \ldots, z_P) \tag{D.73}$$

*for i* = 1,…, *N, are a set of continuously differentiable functions of the stated variables. If, for all x*, *z in R*_{⊥}, *the Jacobian determinant*

$$\frac{\partial(y_1, y_2, \ldots, y_N)}{\partial(x_1, x_2, \ldots, x_N)} = \left|\frac{\partial y(x, z)}{\partial x}\right| \tag{D.74}$$

*is nonzero and has a persistent, nested set of critical minors*,^{129} *then the functions y_{i}* = *y_{i}*(*x*_{1}, *x*_{2},…, *x_{N}*, *z*_{1}, *z*_{2},…, *z_{P}*) *can be solved for the inverse functions x_{j}* = *x_{j}*(*y*_{1}, *y*_{2},…, *y_{N}*, *z*_{1}, *z*_{2},…, *z_{P}*) *for j* = 1,…, *N, so that*

$$y_i = y_i\bigl(x_1(y, z), \ldots, x_N(y, z),\, z_1, \ldots, z_P\bigr) \tag{D.75}$$

*These inverse functions will be unique and continuously differentiable in the range covered by variables y, z as variables x, z range over R*_{⊥}.

**Proof:** This proof is adapted from Volume II, Appendix 18 of Desloge (1982). The proof is by induction. First prove the theorem for *N* = 1. Then prove that, if the theorem is true for *N* = *K* − 1, it must be true for *N* = *K*. It follows that the theorem must be true for any integer *N*.

For *N* = 1, (∂*y*(*x*, *z*)/∂*x*) = ∂*y*_{1}(*x*_{1}, *z*)/∂*x*_{1}. By assumption, this partial derivative is nonzero in a range of values *a*_{1} < *x*_{1} < *b*_{1}. Considering the *z* variables as fixed parameters, the inverse function theorem of ordinary one-variable calculus applies. Since ∂*y*_{1}(*x*_{1}, *z*)/∂*x*_{1} is continuous and nonzero, it must have the same sign throughout the range. Thus *y*_{1}(*x*_{1}, *z*) is monotonic and has a unique inverse *x*_{1} = *x*_{1}(*y*_{1}, *z*). Since ∂*x*_{1}(*y*_{1}, *z*)/∂*y*_{1} = (∂*y*_{1}(*x*_{1}, *z*)/∂*x*_{1})^{−1}, it follows that *x*_{1} = *x*_{1}(*y*_{1}, *z*) is also a continuously differentiable function.

For *N* = *K*, the Jacobian determinant is

$$\frac{\partial(y_1, y_2, \ldots, y_K)}{\partial(x_1, x_2, \ldots, x_K)} = \left|\frac{\partial y(x, z)}{\partial x}\right| \tag{D.76}$$

Since by assumption this determinant is nonzero, it must have a critical (*K* − 1)-rowed minor. By assumption, the same minor is nonzero for all *x*, *z* in *R*_{⊥}. For simplicity (and without loss of generality, since the functions and variables can be relabeled in any way) we assume that this is the minor with the first row and first column removed. Then

$$\frac{\partial(y_2, \ldots, y_K)}{\partial(x_2, \ldots, x_K)} \neq 0 \tag{D.77}$$

By the induction assumption, the theorem is true for *N* = *K* − 1. So eqn (D.77), and its assumed persistent nested critical minors, imply that inverse functions exist of the form

$$x_i = x_i(x_1, y_2, \ldots, y_K, z) \qquad \text{for } i = 2, \ldots, K \tag{D.78}$$

where the set *z*_{1}, *z*_{2},…, *z_{P}* is now being represented by the single unsubscripted letter *z*. Substitute eqn (D.78) into *y*_{1} to obtain the compound function *y*_{1} = *y*_{1}(*x*_{1}, *y*_{2},…, *y_{K}*, *z*) defined by

$$y_1 = y_1\bigl(x_1,\ x_2(x_1, y_2, \ldots, y_K, z),\ \ldots,\ x_K(x_1, y_2, \ldots, y_K, z),\ z\bigr) \tag{D.79}$$

We now show that this function can be solved for *x*_{1}. Using the chain rule and eqn (D.79), the partial derivative ∂*y*_{1}(*x*_{1}, *y*_{2},…, *y_{K}*, *z*)/∂*x*_{1} can be expanded as

where eqn (D.64) has been used. Now multiply each term in eqn (D.80) by the Jacobian determinant

and use eqn (D.61) to obtain

But eqn (D.64) and the usual rules for sign change when rows or columns of a determinant are exchanged show the last Jacobian determinant in eqn (D.82) to be

Thus

The right side of eqn (D.84) is just the Jacobian determinant ∂(*y*_{1}, *y*_{2},…, *y_{K}*)/∂(*x*_{1}, *x*_{2},…, *x_{K}*) evaluated using expansion by cofactors along its first row. Thus eqn (D.82) becomes

Since both of these Jacobian determinants are nonzero by assumption, this proves that the partial derivative satisfies

$$\frac{\partial y_1(x_1, y_2, \ldots, y_K, z)}{\partial x_1} \neq 0 \tag{D.87}$$

Since *R*_{⊥} is an open rectangle, eqn (D.87) will hold for any fixed values of *y*_{2},…, *y_{K}*, *z*_{1}, *z*_{2},…, *z_{P}* and for all *a*_{1} < *x*_{1} < *b*_{1}, where *a*_{1} and *b*_{1} are the least and greatest values of *x*_{1} in the open rectangle. Thus, by the same reasoning as was used for the case *N* = 1 above, *y*_{1} = *y*_{1}(*x*_{1}, *y*_{2},…, *y_{K}*, *z*) may be inverted to yield the unique, continuously differentiable inverse function *x*_{1} = *x*_{1}(*y*_{1}, *y*_{2},…, *y_{K}*, *z*). Substituting that equation into eqn (D.78) gives the desired result: *x_{i}* = *x_{i}*(*y*_{1}, *y*_{2},…, *y_{K}*, *z*) for all *i* = 1,…, *K*.

Since the truth of the theorem for *N* = *K* − 1 proves its truth for *N* = *K*, the theorem must be true for any *N*.

# D.24 Local Inverse Function Theorem

In many cases, global inverses provided by Theorem D.23.1 are not needed, and indeed may not be available. In these cases, we can still define *local* inverse functions. These local inverses will be proved to exist only in some open neighborhood *N* _{x, z} surrounding point *x, z*.

Local inverses are important because they may exist at points of a wider class of regions than the *R* _{⊥} assumed in Theorem D.23.1. And, of course, if a global inverse does exist in a region, then local inverses will exist at each point of that region also.

## Theorem D.24.1: The Local Inverse Function Theorem

*Assume that point x* _{1}, *x* _{2},…, *x _{N}*, *z* _{1}, *z* _{2},…, *z _{P}* *lies in an open region R, and that the y _{i}* = *y _{i}*(*x*, *z*) *are continuously differentiable functions of the stated variables. If the Jacobian determinant*

*is nonzero at the point x, z, then there is some open neighborhood N _{x, z} of this point in which the functions y _{i}* = *y _{i}*(*x* _{1}, *x* _{2},…, *x _{N}*, *z* _{1}, *z* _{2},…, *z _{P}*) *can be solved for the inverse functions*

*for j* = 1,…, *N. These inverse functions will be unique and continuously differentiable in the range covered by the variables y, z as the variables x, z vary over N _{x, z}*.

**Proof:** Since *R* is open, every point *x*, *z* is in some open neighborhood *N* _{x, z} which is contained entirely in *R*. Since the function *y* is assumed to be continuously differentiable, the Jacobian determinant of eqn (D.74), and all of its minors, are continuous functions of *x*, *z*. Thus, since the Jacobian (and hence a set of nested minors) are nonzero at *x*, *z* by assumption, we may shrink the open neighborhood *N* _{x, z} until these determinants
(p.560)
are nonzero over the whole of *N* _{x, z}. Since, by definition, the open neighborhood is a (possibly small) open rectangle, the conditions of Theorem D.23.1 are satisfied, and the local inverse functions exist, as was to be proved. A direct proof of this theorem, not based on the global inverse function theorem, is given on page 152 of Courant (1936*b*).
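The content of this theorem can be checked numerically. The following sketch uses the plane-polar map *y* _{1} = *x* _{1} cos *x* _{2}, *y* _{2} = *x* _{1} sin *x* _{2} (an assumption chosen purely for illustration, not a map from the text): its Jacobian determinant equals *x* _{1}, so a local inverse exists wherever *x* _{1} ≠ 0, and Newton iteration recovers it.

```python
import math

# Illustrative map (an assumption, not from the text): plane polar coordinates
# y1 = x1*cos(x2), y2 = x1*sin(x2), playing the role of y_i = y_i(x).
def y(x):
    x1, x2 = x
    return [x1 * math.cos(x2), x1 * math.sin(x2)]

def jacobian(f, x, h=1e-6):
    """Numerical Jacobian matrix (df_i/dx_j) by central differences."""
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

def det2(J):
    return J[0][0] * J[1][1] - J[0][1] * J[1][0]

x = [2.0, 0.3]                       # a sample point with x1 != 0
J = jacobian(y, x)
assert abs(det2(J) - x[0]) < 1e-4    # det = x1 for this map, hence nonzero here

# Newton's method recovers the local inverse x = x(y) near this point.
def invert(f, ytarget, x0, steps=20):
    xc = list(x0)
    for _ in range(steps):
        Jc = jacobian(f, xc)
        r = [f(xc)[i] - ytarget[i] for i in range(2)]
        d = det2(Jc)
        # Solve Jc * dx = r by Cramer's rule and step xc -> xc - dx.
        dx1 = (r[0] * Jc[1][1] - r[1] * Jc[0][1]) / d
        dx2 = (Jc[0][0] * r[1] - Jc[1][0] * r[0]) / d
        xc[0] -= dx1
        xc[1] -= dx2
    return xc

xrec = invert(y, y([2.0, 0.3]), [1.8, 0.2])   # start from a nearby guess
assert all(abs(a - b) < 1e-6 for a, b in zip(xrec, [2.0, 0.3]))
```

The nonzero determinant is exactly what guarantees that the Newton step is well defined throughout the shrunken neighborhood.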

# D.25 Derivatives of the Inverse Functions

If inverse functions, from either the global or local inverse function theorems, exist at some point *x*, *z*, then the partial derivatives of the inverse functions *x _{j}* can be expressed in terms of the partial derivatives of the original functions *y _{i}* at that point.

Substituting eqn (D.90) into eqn (D.88) gives the compound function

The chain rule then gives

which in matrix form is

where U denotes the *N* × *N* identity matrix. It follows that both product matrices are nonsingular and that

and hence that, for any *i*, *j*

which expresses the partials of *x _{j}* with respect to the *y _{i}* as functions of the partials of the original functions *y _{i}*(*x*, *z*).

Similarly, applying the chain rule to the differentiation of eqn (D.91) with respect to *z _{n}* gives

which leads to the matrix equation

and hence

which expresses the partials of *x _{j}* with respect to the *z _{n}* in terms of the partials of the (p.561) original functions *y _{i}*(*x*, *z*).
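The matrix statement above, that the matrix of partials of the inverse functions is the matrix inverse of (∂*y*(*x*, *z*)/∂*x*), can be verified directly. The map below is a hypothetical example (not from the text), chosen because its inverse is explicit: *y* _{1} = *e* ^{x1}, *y* _{2} = *x* _{1} + *x* _{2}, with inverse *x* _{1} = ln *y* _{1}, *x* _{2} = *y* _{2} − ln *y* _{1}.

```python
import math

# Hypothetical invertible map (illustration only): y1 = exp(x1), y2 = x1 + x2,
# with explicit inverse x1 = ln(y1), x2 = y2 - ln(y1).
def dy_dx(x1, x2):
    """Matrix of partials (dy_i/dx_j) of the original functions."""
    return [[math.exp(x1), 0.0],
            [1.0,          1.0]]

def dx_dy(y1, y2):
    """Matrix of partials (dx_j/dy_i) of the explicit inverse functions."""
    return [[1.0 / y1,  0.0],
            [-1.0 / y1, 1.0]]

def matmul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

x1, x2 = 0.7, -1.2
y1, y2 = math.exp(x1), x1 + x2

# The product of the two matrices of partials is the identity matrix U,
# so each matrix is the inverse of the other.
P = matmul2(dx_dy(y1, y2), dy_dx(x1, x2))
assert all(abs(P[i][j] - (1.0 if i == j else 0.0)) < 1e-12
           for i in range(2) for j in range(2))
```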

# D.26 Implicit Function Theorem

It often happens that a set of functions *y _{j}* = *y _{j}*(*x* _{1}, *x* _{2},…, *x _{P}*), for *j* = 1,…, *N*, is not given directly, but rather in *implicit* form. One defines other functions *f _{i}*(*x*, *y*), for *i* = 1,…, *N*, and requires all *x*, *y* values to be those that make these *f _{i}* identically zero,

The following theorem gives the conditions under which such identities actually specify the implicit functions *y _{j}*.

## Theorem D.26.1: Implicit Function Theorem

*Assume that f _{i}*(*x* _{1}, *x* _{2},…, *x _{P}*, *y* _{1}, *y* _{2},…, *y _{N}*) *for i* = 1,…, *N, are continuously differentiable functions. If the Jacobian determinant*

*is nonzero at point x, y, then there is an open neighborhood N _{xy} of the point x, y in which the identities*

*for i* = 1,…, *N, can be solved for the implicit functions y _{j}* = *y _{j}*(*x* _{1}, *x* _{2},…, *x _{P}*) *for j* = 1,…, *N. These functions will be unique and continuously differentiable in the open neighborhood*.

**Proof:** Apply the local inverse function theorem, Theorem D.24.1, to solve *f _{i}* = *f _{i}*(*x* _{1}, *x* _{2},…, *x _{P}*, *y* _{1}, *y* _{2},…, *y _{N}*) for *y _{j}* = *y _{j}*(*f* _{1},…, *f _{N}*, *x* _{1},…, *x _{P}*). Then apply the identity to set *f _{i}* = 0 for each *i*, and so obtain

which are the desired functions.

# D.27 Derivatives of Implicit Functions

Applying eqn (D.98) of Section D.25 with the replacements *y* → *f*, *x* → *y*, *z* → *x*, and *f* then set equal to zero, gives the partial derivatives of the implicit functions in terms of the partial derivatives of the *f _{i}*. Thus, for all *j* = 1,…, *N* and *n* = 1,…, *P*,
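The simplest case makes the formula concrete. The sketch below (a single-constraint illustration with *N* = 1 and *P* = 1, an assumption for this example) uses *f*(*x*, *y*) = *x* ^{2} + *y* ^{2} − 1 = 0, which defines *y*(*x*) implicitly on the upper half of the unit circle, and compares the implicit-function derivative −(∂*f*/∂*y*)^{−1}(∂*f*/∂*x*) with the derivative of the explicit solution.

```python
import math

# Single-constraint illustration (N = 1, P = 1), an assumption for this sketch:
# f(x, y) = x**2 + y**2 - 1 = 0 defines y(x) on the upper half circle (y > 0).
def f_x(x, y):
    return 2.0 * x          # partial of f with respect to x

def f_y(x, y):
    return 2.0 * y          # partial of f with respect to y

x = 0.6
y = math.sqrt(1.0 - x * x)  # explicit solution, used only for comparison

# Implicit-function formula: dy/dx = -(df/dy)**(-1) * (df/dx) = -x/y here.
dydx_implicit = -f_x(x, y) / f_y(x, y)
dydx_explicit = -x / math.sqrt(1.0 - x * x)
assert abs(dydx_implicit - dydx_explicit) < 1e-12
```

The nonvanishing of ∂*f*/∂*y* at the point (*y* = 0.8 ≠ 0 here) is exactly the Jacobian condition of Theorem D.26.1.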

# (p.562) D.28 Functional Independence

Consider the *M* continuously differentiable functions of *N* variables *f _{k}*(*x* _{1}, *x* _{2},…, *x _{N}*) for *k* = 1,…, *M*. These functions are *functionally dependent* and have a *dependency relation* at point *x* if there is a continuously differentiable function *F* with at least one nonzero partial derivative ∂*F*(*f*)/∂*f _{i}* ≠ 0 for which

holds identically in an open neighborhood *N _{x}* of point *x*.

If the functions have no dependency relation at point *x*, they are said to be *functionally independent* at that point. We now give a condition for a set of functions *f _{k}* to be functionally independent.

## Theorem D.28.1: Condition for Functional Independence

*Consider M continuously differentiable functions of N variables, f _{k}*(*x* _{1}, *x* _{2},…, *x _{N}*) *for k* = 1,…, *M. Let the M* × *N matrix* (∂*f*(*x*)/∂*x*) *be defined by its matrix elements*

*If, for some x, the rank r of this matrix is r* = *M (which is possible only if M* ≤ *N since r cannot be greater than N), then the functions f _{k} are functionally independent at x*.

**Proof:** We show that the existence of a dependency relation implies that *r* < *M*, and therefore that *r* = *M* implies functional independence. Differentiating an assumed dependency relation eqn (D.104) with respect to *x _{i}* using the chain rule gives

Defining an *M* × 1 column vector [∂*F*(*f*)/∂*f*] by

we may rewrite eqn (D.106) as the matrix equation

By assumption the column vector [∂ *F*(*f*)/∂*f*] will have at least one nonzero element. Thus Corollary B.19.2 requires that the matrix in eqn (D.108), and hence its transpose (∂*f*(*x*)/∂*x*), must have rank less than *M*. Since the existence of a dependency relation implies *r* < *M*, it follows that *r* = *M* at point *x* implies the non-existence of a dependency relation, as was to be proved.
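The rank criterion is easy to apply in practice. The following sketch (illustrative functions chosen as assumptions, not taken from the text) computes the rank of the matrix of partials with a small Gaussian elimination: the pair *f* _{1} = *x* _{1} + *x* _{2}, *f* _{2} = (*x* _{1} + *x* _{2})^{2} has the dependency relation *F*(*f*) = *f* _{1}^{2} − *f* _{2} = 0 and rank 1, while *f* _{1} = *x* _{1} + *x* _{2}, *f* _{2} = *x* _{1} − *x* _{2} has rank 2 and is functionally independent.

```python
# Minimal sketch: test functional (in)dependence at a point via the rank of
# the M x N matrix of partials, computed by Gaussian elimination.
def rank(mat, tol=1e-12):
    m = [row[:] for row in mat]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if abs(m[i][col]) > tol), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and abs(m[i][col]) > tol:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

x1, x2 = 0.4, 1.1

# Dependent pair (illustrative assumption): f1 = x1 + x2, f2 = (x1 + x2)**2,
# related by the dependency relation F(f) = f1**2 - f2 = 0.
J_dep = [[1.0, 1.0],
         [2.0 * (x1 + x2), 2.0 * (x1 + x2)]]
assert rank(J_dep) == 1      # r < M = 2, so a dependency relation can exist

# Independent pair: f1 = x1 + x2, f2 = x1 - x2.
J_ind = [[1.0, 1.0],
         [1.0, -1.0]]
assert rank(J_ind) == 2      # r = M, functionally independent at this point
```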

# (p.563) D.29 Dependency Relations

When functions are functionally dependent, there may be one or more dependency relations among them. It is often important to know how many dependency relations there are, as specified in the following theorem.

## Theorem D.29.1: Dependency Relations

*Consider M continuously differentiable functions*,

*of N variables x* = *x* _{1}, *x* _{2},…, *x _{N}. Consider again the M* × *N matrix* (∂*f*(*x*)/∂*x*) *defined by its matrix elements in eqn* (D.105). *If at the point x the rank r of this matrix is r* < *M, then there are M* − *r functionally independent dependency relations among the f _{k}, which hold in an open neighborhood N _{x} of point x*.

For a proof of this theorem, see Volume I, Appendix 14 of Desloge (1982).

# D.30 Legendre Transformations

We often have a function like *f*(*x* _{1},…, *x _{M}*, *y* _{1},…, *y _{N}*) = *f*(*x*, *y*) that is important mainly because its partial derivatives yield desired functions *u _{i}* = *u _{i}*(*x*, *y*) and *w _{j}* = *w _{j}*(*x*, *y*), as in

for *i* = 1,…, *M* and *j* = 1,…, *N*.

It is sometimes useful to have a different, but related, function

whose partial derivatives are

again for *i* = 1,…, *M* and *j* = 1,…, *N*. Notice that the roles of *y _{j}* and *w _{j}* are interchanged in the last partial derivatives in eqns (D.111, D.113), while the partials with respect to *x _{i}* yield the same function *u _{i}*, except for the minus sign and its expression in the new *x*, *w* variable set. This transformation from *f* to *g* is called a *Legendre transformation*.

(p.564)
The Legendre transformation is effected by defining *g* to be

But the *g* in eqn (D.114) is not yet expressed in the correct set of variables *x*, *w*. In order to complete the Legendre transformation, we must prove that the functions *w _{j}* = *w _{j}*(*x*, *y*) defined in the second of eqn (D.111) can be inverted to give *y _{j}* = *y _{j}*(*x*, *w*). By Section D.24, the condition for this inversion to be possible is that

where the matrix (∂*w*(*x*, *y*)/∂*y*) is defined by

In general, eqn (D.115) must be proved in each individual case to which the Legendre transformation is to be applied. Assume now that this has been done.

The inverse function *y _{j}* = *y _{j}*(*x*, *w*) then allows us to write *g* as a compound function of the variable set *x*, *w* as

Now consider the differential of function *g*. Equation (D.114) gives

where the second of eqn (D.111) has been used to cancel the *dy _{j}* terms, and the first of eqn (D.111) has been used to get the *u _{i}*. We know from eqn (D.117) that *g*(*x*, *w*) exists and is well defined, and hence has the differential

Equations (D.118, D.119) are two expressions for the same differential *dg* and hence are equal to each other. Since |∂*w*(*x*, *y*)/∂*y*| ≠ 0 by assumption, Theorem D.18.5
(p.565)
shows that the differentials *dw*, *dx* may be considered as arbitrary and independent. Thus the equality of eqns (D.118, D.119) implies the equality of each of the coefficients of the differentials *dw _{j}* and *dx _{i}* in the two equations. Thus eqn (D.113) holds, as desired.
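A small numerical sketch may make the role interchange concrete. It assumes the common convention *g*(*x*, *w*) = Σ *y _{j} w _{j}* − *f* for eqn (D.114), here with a single *x* and a single *y* variable, and the sample function *f*(*x*, *y*) = *xy* + *y* ^{3}/3 is an assumption chosen only because *w* = ∂*f*/∂*y* = *x* + *y* ^{2} is explicitly invertible on the branch *y* > 0.

```python
import math

# Hedged sketch of the Legendre transformation, assuming the common convention
# g(x, w) = y*w - f(x, y(x, w)), with one x and one y variable.
# Sample function (an assumption, not from the text): f(x, y) = x*y + y**3/3,
# so u = df/dx = y and w = df/dy = x + y**2.
def f(x, y):
    return x * y + y ** 3 / 3.0

def y_of(x, w):
    return math.sqrt(w - x)          # inverse of w = x + y**2 on the branch y > 0

def g(x, w):
    yy = y_of(x, w)
    return yy * w - f(x, yy)

def d(fun, t, h=1e-6):
    """Central-difference numerical derivative."""
    return (fun(t + h) - fun(t - h)) / (2 * h)

x, w = 0.5, 2.0                      # w - x > 0, so the chosen branch is valid
y = y_of(x, w)

# dg/dw returns the old variable y, and dg/dx = -u = -df/dx = -y:
# the interchange of roles described in eqns (D.111, D.113).
assert abs(d(lambda w_: g(x, w_), w) - y) < 1e-6
assert abs(d(lambda x_: g(x_, w), x) + y) < 1e-6
```

Note that the inversion step required |∂*w*/∂*y*| = 2*y* ≠ 0, which is why the sketch must stay on one branch.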

# D.31 Homogeneous Functions

Let *f*(*x* _{1}, *x* _{2}, …, *x _{N}*, *z* _{1}, *z* _{2}, …, *z _{P}*) = *f*(*x*, *z*) be a continuously differentiable function of the stated variables, defined over a region *R*. Function *f*(*x*, *z*) is *homogeneous of degree k* in the set of variables *x* _{1}, *x* _{2}, …, *x _{N}* if and only if, for some *k* and any positive number λ > 0,

An alternate, and equivalent, definition of homogeneous functions is given in the following theorem, whose proof can be found in Chapter II of Courant (1936*b*).

## Theorem D.31.1: Euler Condition

*Function f(x, z) is homogeneous of degree k in the set of variables x _{1}, x_{2}, …, x_{N} as defined in eqn* (D.120)

*if and only if*

# D.32 Derivatives of Homogeneous Functions

In Lagrangian mechanics, it is important also to consider the homogeneity of the partial derivatives of homogeneous functions.

## Theorem D.32.1: Derivatives of Homogeneous Functions

*Function f is homogeneous of degree k* + 1 *in the set of variables x* _{1}, *x* _{2}, …, *x _{N}, that is*

*if and only if all partial derivatives* ∂*f*(*x*, *z*)/∂*x _{i}* = *g _{i}*(*x*, *z*) *for i* = 1, …, *N are homogeneous of degree k, that is*

**Proof:** First, we assume eqn (D.122) and prove eqn (D.123). Taking partial derivatives, and using eqn (D.122) gives

which establishes eqn (D.123).

(p.566) Conversely, assume eqn (D.123) and prove eqn (D.122). Equation (D.123) can be written

Hence

and so

It follows from Corollary D.10.2 that *f*(λ*x*, *z*) − λ^{(k+1)} *f*(*x*, *z*) = *C*, a constant independent of λ. Setting λ = 1 proves that the constant *C* = 0, which is equivalent to eqn (D.122). □
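Both the Euler condition and the homogeneity of the partial derivatives can be checked numerically. The sample function below, *f*(*x* _{1}, *x* _{2}, *z*) = *z*(*x* _{1}^{2} + *x* _{2}^{2}), is an assumption chosen for illustration; it is homogeneous of degree 2 in *x* _{1}, *x* _{2} (so, in the language of Theorem D.32.1, *k* + 1 = 2), and its *x*-partials are homogeneous of degree 1.

```python
# Numerical check of the homogeneity definition, the Euler condition, and
# Theorem D.32.1, for the illustrative function f = z*(x1**2 + x2**2),
# homogeneous of degree 2 in x1, x2 (z is a passive parameter).
def f(x1, x2, z):
    return z * (x1 ** 2 + x2 ** 2)

def df_dx1(x1, x2, z):
    return 2.0 * z * x1              # should be homogeneous of degree 1 in x

def df_dx2(x1, x2, z):
    return 2.0 * z * x2

x1, x2, z, lam = 1.3, -0.4, 2.0, 3.7

# Definition (D.120): f(lam*x, z) = lam**k * f(x, z) with k = 2.
assert abs(f(lam * x1, lam * x2, z) - lam ** 2 * f(x1, x2, z)) < 1e-9

# Euler condition (D.121): sum_i x_i * df/dx_i = k * f.
euler = x1 * df_dx1(x1, x2, z) + x2 * df_dx2(x1, x2, z)
assert abs(euler - 2.0 * f(x1, x2, z)) < 1e-9

# Theorem D.32.1: each partial derivative is homogeneous of one degree lower.
assert abs(df_dx1(lam * x1, lam * x2, z) - lam * df_dx1(x1, x2, z)) < 1e-9
```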

# D.33 Stationary Points

For functions of one variable *f* = *f*(*x*), maxima, minima, and points of inflection (collectively called *stationary points*) are those points at which *df*(*x*)/*dx* = 0.

For functions of many variables, stationary points may be defined similarly as points at which, for all *k* = 1, …, *N*,

Since the variables *x*, and hence their differentials *dx _{k}* for *k* = 1, …, *N*, are assumed independent, Lemma D.18.4 implies that this definition may be stated more simply, and equivalently, as

Since we know that this differential approximates the difference Δ*f* when the differentials *dx _{j}* are small, it is also valid to say that the extremum is a point such that *f* is constant to first order for small excursions from it in any direction.

In one sense, the many variable case is much more complex than the single variable one. A function of many variables may have maxima in some variables, minima in others, etc. We avoid this complexity here by assuming that it will be clear from context in most practical examples whether the stationary point is a maximum, minimum, or some more complicated mixture of the two.

# D.34 Lagrange Multipliers

Often, we want to find the stationary points of a function *f*(*x*) given certain constraints on the allowed values of *x* _{1}, *x* _{2}, …, *x _{N}* and their differentials. These constraints are expressed by defining a functionally independent set of functions *G _{a}* and setting them equal to zero, 0 = *G _{a}*(*x* _{1}, *x* _{2}, …, *x _{N}*) for *a* = 1, …, *C*.

For example, with

defined to be the distance from the origin in a three-dimensional Cartesian space, eqn (D.128) gives a stationary point at (0, 0, 0), obviously a minimum. But suppose that
(p.567)
we want the stationary points of *f* subject to the constraint that *x* lie on a plane at distance Λ from the origin, which can be expressed by

where constants α, β, γ obey α^{2} + β^{2} + γ^{2} = 1. The Lagrange multiplier theorem gives an elegant method for solving such problems.

## Theorem D.34.1: Lagrange Multiplier Theorem

*Let the values of the independent variables x* _{1}, *x* _{2},…, *x _{N} in some open region R be constrained by equations of the form*

*for a* = 1,…, *C, where the G _{a} are continuously differentiable functions. Assume that the C* × *N Jacobian matrix* g *defined by*

*has rank C, so that the functions G _{a} are functionally independent and represent C independent constraints*.

*A continuously differentiable function f*(*x* _{1}, *x* _{2},…, *x _{N}*) *has a stationary point at x, subject to these constraints, if and only if there exist Lagrange multipliers* λ _{a} = λ _{a}(*x*) *such that, at the stationary point*,

*for all k* = 1,…, *N*.

**Proof:** The necessary and sufficient condition for a stationary point is that *df* = 0 subject to the constraints on the possible *dx _{k}* values given by the vanishing of the differentials of eqn (D.132)

for all *a* = 1,…, *C*. Since the matrix g defined in eqn (D.133) has rank *C*, it must have a *C*-rowed critical minor. The variables *x* can be relabeled in any way, so we lose no generality by assuming that this critical minor is the determinant containing the *C* rows and the last *C* columns. Call the corresponding matrix g^{(b)}, so that *g* ^{(b)} _{aj} = *g* _{a(N−C+j)} for *a*, *j* = 1,…, *C*. By construction, the determinant of this matrix is a critical minor, and so |g^{(b)}| ≠ 0. In the following, we will also use the shorthand notations *x* ^{(f)} for what will be called the *free* variables *x* _{1},…, *x* _{N−C} and *x* ^{(b)} for the *bound* variables *x* _{N−C+1},…, *x _{N}*. And *x* will continue to denote the full set of variables *x* _{1}, *x* _{2},…, *x _{N}*.

(p.568) Using these notations, eqn (D.135) may be written with separate sums over the free and bound variables

Since g^{(b)} is nonsingular, it has an inverse g^{(b)−1}, which may be used with eqn (D.136) to write the bound differentials *dx* ^{(b)} in terms of the free ones.

The differential *df* can also be written with separate sums over the free and bound variables

Substituting eqn (D.137) into this expression gives

where the λ _{a} in the last expression are defined, for *a* = 1,…, *C*, as

The solution, eqn (D.137), for the bound differentials *dx* ^{(b)} reduces eqn (D.135) to an identity. Hence no constraint is placed on the free differentials *dx* ^{(f)}. Setting these free differentials nonzero one at a time in eqn (D.139), the condition *df* = 0 for *x* to be a stationary point implies and is implied by

for all *i* = 1,…,(*N* −*C*), which establishes the theorem for those values of the index.

For indices in the range (*N* − *C* + 1),…, *N*, with the same choice of λ _{a} as defined in eqn (D.140), eqn (D.134) is satisfied identically at all points, including
(p.569)
the stationary one. To demonstrate this, let *j* = 1,…, *C* and use eqn (D.140) to write

Thus eqn (D.134) holds for all index values if and only if *x* is a stationary point, as was to be proved.

Note that the *N* equations of eqn (D.134), together with the *C* equations of eqn (D.132), are *N* + *C* equations in the *N* + *C* unknowns *x* _{1}, *x* _{2},…, *x _{N}*, λ _{1},…, λ _{C}, and so can be solved to find the stationary points.
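The plane example of eqns (D.130, D.131) can be worked through explicitly. The sketch below assumes, for algebraic simplicity, that *f* is taken as the *squared* distance *x* ^{2} + *y* ^{2} + *z* ^{2} (an assumption for this illustration): then ∇*f* = λ _{1}∇*G* _{1} gives 2*x _{k}* = λ _{1}*n _{k}*, and substituting into the constraint fixes λ _{1} = 2Λ and the stationary point (Λα, Λβ, Λγ), the foot of the perpendicular from the origin.

```python
# Worked version of the text's plane example, assuming f is the squared
# distance f = x**2 + y**2 + z**2 and G = alpha*x + beta*y + gamma*z - Lam = 0.
alpha, beta, gamma = 2 / 3, 1 / 3, 2 / 3   # unit normal: sum of squares = 1
Lam = 3.0

# Stationarity: grad f = lambda1 * grad G, i.e. 2*x_k = lambda1 * n_k.
# Substituting x_k = lambda1*n_k/2 into the constraint yields lambda1 = 2*Lam.
lam1 = 2.0 * Lam
x = lam1 * alpha / 2.0
y = lam1 * beta / 2.0
z = lam1 * gamma / 2.0

# The stationary point lies on the plane, at distance Lam from the origin:
assert abs(alpha * x + beta * y + gamma * z - Lam) < 1e-12
assert abs((x * x + y * y + z * z) - Lam ** 2) < 1e-12

# grad f is parallel to grad G there, as eqn (D.134) requires:
assert abs(2 * x - lam1 * alpha) < 1e-12
assert abs(2 * y - lam1 * beta) < 1e-12
assert abs(2 * z - lam1 * gamma) < 1e-12
```

Together with the constraint itself, these are the *N* + *C* = 4 equations in the 4 unknowns *x*, *y*, *z*, λ _{1} mentioned above.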

# D.35 Geometry of the Lagrange Multiplier Theorem

When applied to the simple example in eqns (D.130, D.131) of Section D.34, the three equations of the Lagrange multiplier condition eqn (D.134) can be written as a single vector equation, the equality of two gradient vectors,

Since, as described in Section D.37, vector ∇*G* _{1} = α**ê** _{1} + β**ê** _{2} + γ **ê** _{3} is perpendicular to the surface of constraint *G* _{1} = 0, eqn (D.143) says that ∇ *f* must be in the same or opposite direction, and so also perpendicular to that constraint surface.

If we denote by *d* **r** = *dx* _{1} **ê** _{1} + *dx* _{2} **ê** _{2} + *dx* _{3} **ê** _{3} the differential displacement vector whose Cartesian components are the differentials *dx _{i}*, the chain rule gives

The constraint *G* _{1} = 0, and hence *dG* _{1} = 0, constrains the vector *d* **r** to have a zero dot product with ∇*G* _{1}, and so to be perpendicular to the normal of the surface of constraint. Thus *d* **r** must lie *in* the surface of constraint.

If there were no constraint, the condition for *f* to have a stationary point would simply be that

Since without a constraint the displacement *d* **r** can take any direction, eqn (D.145) implies that ∇ *f* = 0, which is equivalent to the three equations of eqn (D.128).

However, with the constraint, eqn (D.145) does not imply that ∇ *f* = 0, but only that ∇ *f* must have no components in possible *d* **r** directions. Since *d* **r** can lie anywhere
(p.570)
in the surface of constraint, this means that ∇ *f* must be perpendicular to that surface, in other words that ∇ *f* must be parallel or anti-parallel to ∇*G* _{1}, as eqn (D.143) states.

If another constraint were added, then eqn (D.143) would become

Adding the second constraint further restricts the possible directions of *d* **r** and hence *increases* the possible directions that ∇ *f* can have while still maintaining the condition *d* **r** · ∇ *f* = 0.

# D.36 Coupled Differential Equations

A basic theorem of one-variable calculus is that a first-order differential equation *dx*/*d*β = *f*(β, *x*), with the initial condition that *x* = *b* when the independent variable is β = β_{0}, has a unique solution *x* = *x*(β, *b*). That result is generalized to a set of *N* functions of β by the following theorem.

## Theorem D.36.1: Coupled Differential Equations

*Consider a set of N unknown functions x _{i} for i* = 1, …, *N obeying the coupled, first-order differential equations*

*and the initial conditions that x _{i}* = *b _{i} when the independent variable is* β = β _{0} (*where the b _{i} for i* = 1, …, *N are arbitrarily chosen constants*). *Assume that the functions f _{i} are continuously differentiable. These equations have a unique solution depending on* β *and the set of constants b* _{1}, …, *b _{N}. The solution is the set of equations*

*for i* = 1, …, *N. The same solution can also be written in implicit form by solving eqn* (D.148) *for the b and writing*

*where the* ϕ* _{i} are functionally independent, and are called integrals of the set of differential equations*.

The proof of this standard theorem can be found, for example, in Chapter 6 of Ford (1955). Note that the initial value of the independent variable β _{0} is *not* included in the list of constants upon which the *x _{i}* depend. It is simply the value of β at which the integration constants *b _{i}* are specified.

The special case in which the functions *f _{i}* in eqn (D.147) do not depend explicitly on β is of great importance in analytical mechanics.
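The unique dependence of the solution on β and the constants *b _{i}* can be seen numerically. The sketch below (the linear system *dx* _{1}/*d*β = *x* _{2}, *dx* _{2}/*d*β = −*x* _{1} is an assumption chosen because its exact solution is known) integrates the coupled equations with a fourth-order Runge-Kutta step and compares with the closed form *x* _{1} = *b* _{1}cos(β − β _{0}) + *b* _{2}sin(β − β _{0}).

```python
import math

# Sketch of Theorem D.36.1 on a sample system (an illustrative assumption):
# dx1/dbeta = x2, dx2/dbeta = -x1, with x_i = b_i at beta = beta0.
def rk4(fs, x, beta, beta_end, n=1000):
    """Integrate dx_i/dbeta = f_i(beta, x) by classical 4th-order Runge-Kutta."""
    h = (beta_end - beta) / n
    for _ in range(n):
        k1 = [f(beta, x) for f in fs]
        k2 = [f(beta + h / 2, [xi + h / 2 * ki for xi, ki in zip(x, k1)]) for f in fs]
        k3 = [f(beta + h / 2, [xi + h / 2 * ki for xi, ki in zip(x, k2)]) for f in fs]
        k4 = [f(beta + h, [xi + h * ki for xi, ki in zip(x, k3)]) for f in fs]
        x = [xi + h / 6 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
        beta += h
    return x

fs = [lambda beta, x: x[1], lambda beta, x: -x[0]]
b1, b2, beta0, beta1 = 1.0, 0.5, 0.0, 2.0
x1, x2 = rk4(fs, [b1, b2], beta0, beta1)

# Exact solution: x1 = b1*cos(t) + b2*sin(t), x2 = b2*cos(t) - b1*sin(t),
# with t = beta - beta0, so the solution depends on beta and on b1, b2 only.
t = beta1 - beta0
assert abs(x1 - (b1 * math.cos(t) + b2 * math.sin(t))) < 1e-8
assert abs(x2 - (b2 * math.cos(t) - b1 * math.sin(t))) < 1e-8
```

Since this sample system does not contain β explicitly, it also illustrates the next theorem: the combination *x* _{1}^{2} + *x* _{2}^{2} is one of its β-independent integrals.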

## (p.571) Theorem D.36.2: Sets of Equations Without the Independent Variable

*Consider the set of differential equations which are the same as eqn* (D.147) *but with the independent variable* β *not appearing explicitly in the functions f _{i}*

*for i* = 1, …, *N. Assume that N* ≥ 2 *and that there is some index l for which f _{l}* ≠ 0 *in some region of interest. For simplicity, but without loss of generality, assume the variables relabeled so that l* = 1. *Given the initial conditions that, for j* = 2, …, *N, x _{j}* = *b _{j} when x* _{1} = *b* _{1}*, these equations have a unique solution*

*for j* = 2, …, *N, in which the role of independent variable has been assumed by x* _{1} *and the number of arbitrary integration constants has been reduced by one*.

*This solution also can be written in implicit form by solving eqn* (D.151) *for the b and writing*

*for j* = 2, …, *N. The* ϕ* _{j} are the functionally independent integrals of eqn* (D.150). *There are only N* − 1 *integrals* ϕ _{j}*, and they do not depend explicitly on* β.

_{j}, and they do not depend explicitly on**Proof:** Assuming *f* _{1} ≠ 0, divide the last *N* − 1 of eqn (D.150) by the first one, giving

for *j* = 2, …, *N*. The functions *A _{j}* defined in eqn (D.153) are continuously differentiable, and hence Theorem D.36.1 can be applied with *f* → *A*, the replacement β → *x* _{1} for the independent variable, and the range of unknown functions now running from *x* _{2} to *x _{N}*. The theorem follows immediately.

Note that the initial value *x* _{1} = *b* _{1} for the new independent variable is *not* included among the arbitrary integration constants *b* _{2}, …, *b _{N}*. As in the original theorem with β _{0}, it is simply the value of the new independent variable *x* _{1} at which the arbitrary integration constants *x _{j}* = *b _{j}*, for *j* = 2, …, *N*, are specified.

The variable β has disappeared from the solutions, eqns (D.151, D.152). Instead of having all the *x _{i}* as functions of some β, we have solutions with all of the *x _{j}* for *j* ≠ 1 given as functions of one of them, the *x* _{1}.

But sometimes, even under the conditions of Theorem D.36.2, it is convenient to have a solution with all of the unknowns written as functions of β. Such a *parametric* expression can always be found, as is shown in the following Corollary.

## Corollary D.36.3: Recovery of the Parameter

*Assume that the conditions in Theorem* D.36.2 *hold. Using solutions eqn* (D.151), *a set of functions*

*for i* = 1, …, *N, can always be found that are solutions of the original equations, eqn* (D.150). *They depend only on the difference* (β − β _{0}) *as indicated. Thus no generality is
(p.572)
lost by simply taking* β _{0} = 0*, as is frequently done. The solutions in eqn* (D.154) *can be constructed to obey the initial conditions that* β = β _{0} *implies x _{i}* = *b _{i} for all i* = 1, …, *N*.

**Proof:** The first of eqn (D.150) is

where eqn (D.151) has been used to get the last equality. Thus we have a differential equation with one unknown function *x* _{1}

Integrating, with *x* _{1} assigned the value *b* _{1} at β = β_{0},

Denoting by *F* ^{−1} the inverse to function *F* for fixed values of the constants *b* then gives

which has, by construction, the value *x* _{1} = *b* _{1} when β = β_{0}. Substituting this *x* _{1} into the solutions eqn (D.151) then gives

for *j* = 2, …, *N*. Since the solutions eqn (D.151) have the property that *x* _{1} = *b* _{1} implies *x _{j}* = *b _{j}*, the *x _{j}* defined in eqn (D.159) must take those same values *b _{j}* when β = β _{0}. Equations (D.158, D.159) are the desired eqn (D.154), and the corollary is proved.

A note of caution: Even though eqn (D.154) does write the solution to eqn (D.150) in a form that appears to have *N* arbitrary integration constants *b* _{1}, …, *b _{N}*, the *b* _{1} is not actually an integration constant. It is the initial value of the independent variable *x* _{1} in solution eqn (D.151).

# D.37 Surfaces and Envelopes

A two-dimensional surface in three dimensions may be written either as

Any surface in the first form can be written in the second form as *F* (*x*, *y*, *z*) = ϕ(*x*, *y*) − *z* = 0. The second form is preferable since it often leads to simpler expressions. For example, writing the unit sphere as *F*(*x*, *y*, *z*) = *x* ^{2} + *y* ^{2} + *z* ^{2} −1 = 0 avoids
(p.573)
the need to use different signs of a square root for the upper and lower hemispheres as the first form would require.

The differential change in *F* resulting from differential displacement *d* **r** is given by the chain rule as *dF* = *d* **r** · ∇*F*, which is maximum when *d* **r**∥∇*F* and zero when *d* **r**⊥∇*F*. Since a small displacement along the surface keeps *F* = 0 and so must have *dF* = 0, it follows that vector ∇*F* is perpendicular to the surface at point *x*, *y*, *z*. The tangent plane touching the surface at **r** thus consists of the points **r′** where

and ∇*F* is normal to the tangent plane.

Consider now what is called a *one-parameter family* of surfaces, defined by

where each different value of parameter *a* in general gives a different surface. Often (but not always) the two surfaces with parameter values *a*+*da* and *a*−*da* in the limit *da*→0 will intersect in a curved line called a *curve of intersection*. This is also called a *characteristic curve* by some authors, but we use the term “curve of intersection” from Courant and Hilbert (1962) to avoid confusion with other curves in the theory of partial differential equations that are also called characteristic curves.

The curve of intersection lies in both surfaces and hence obeys both *F*(*x*, *y*, *z*, *a* + *da*) = 0 and *F*(*x*, *y*, *z*, *a*−*da*) = 0. But these two equations are satisfied if and only if both

Taking the limit *da*→0, the curve of intersection can therefore be defined as the solution of the two equations

It is the intersection of the surfaces *F* = 0 and *G* = 0 where

As an example, consider the family of spheres *F* = *x* ^{2} + *y* ^{2} + *z* ^{2}−*a* ^{2} = 0. Since the surfaces in this family are concentric spheres, the surfaces with *a*+*da* and *a* −*da* never intersect and so there is no curve of intersection.

A better example for our purposes is the family of unit spheres with center at point *a* on the *z*-axis, *F* = *x* ^{2} + *y* ^{2} + (*z* − *a*)^{2} − 1 = 0. Then ∂*F*/∂*a* = −2(*z* − *a*) = 0 shows that the curve of intersection for a given *a* is the intersection of the plane *z* = *a* with the sphere *x* ^{2} + *y* ^{2} + (*z* − *a*)^{2} − 1 = 0. It is a unit circle lying in a plane parallel to the *x*-*y* plane and at height *z* = *a*. This circle is also the equator of the sphere for that *a* value.

(p.574)
A suitable one-parameter family of surfaces has a curve of intersection for every value of *a*. The *envelope* of the one-parameter family is a surface defined as the set of all points of all possible curves of intersection. It may be thought of as the surface swept out by the curve of intersection as *a* is varied. It is found by solving the second of eqn (D.164) for *a* as a function of *x*, *y*, *z* and substituting this *a*(*x*, *y*, *z*) into the first of eqn (D.164), thus eliminating *a* between the two equations. For a given value of *a*, the surface *F* = 0 and the envelope are in contact along the curve of intersection defined by that *a* value.

In the above example, the envelope is obtained by using ∂*F*/∂*a* = −2(*z* − *a*) = 0 to get *a*(*x*, *y*, *z*) = *z* and then substituting that *a* into *F* = *x* ^{2} + *y* ^{2} + (*z* − *a*)^{2} − 1 = 0 to obtain *x* ^{2} + *y* ^{2} − 1 = 0. The envelope is thus a right circular cylinder of unit radius with its symmetry line along the *z*-axis. This envelope is the surface swept out by all of the curves of intersection (circles of unit radius at height *a*) as *a* is varied. For any value of *a*, the sphere is in contact with the envelope along the equator of the sphere. The sphere thus contacts the envelope along the curve of intersection for that *a* value.
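The elimination of *a* in this example can be checked numerically: every point of the cylinder *x* ^{2} + *y* ^{2} − 1 = 0 satisfies both conditions of eqn (D.164) once *a* is chosen as *a* = *z*.

```python
import math

# Check of the worked example: the one-parameter family
# F = x**2 + y**2 + (z - a)**2 - 1 = 0 of unit spheres centred at (0, 0, a)
# has the right circular cylinder x**2 + y**2 - 1 = 0 as its envelope.
def F(x, y, z, a):
    return x ** 2 + y ** 2 + (z - a) ** 2 - 1.0

def dF_da(x, y, z, a):
    return -2.0 * (z - a)

# Eliminating a: dF/da = 0 gives a = z; substituting back into F = 0 leaves
# the envelope equation x**2 + y**2 - 1 = 0. Every envelope point, at any
# height z, satisfies both equations of the pair with a = z.
for theta in [0.0, 0.7, 2.1, 4.0]:
    for z in [-1.0, 0.0, 2.5]:
        x, y = math.cos(theta), math.sin(theta)   # a point of the cylinder
        a = z                                      # solution of dF/da = 0
        assert abs(F(x, y, z, a)) < 1e-12
        assert abs(dF_da(x, y, z, a)) < 1e-12
```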

## Notes:

(128)
Definitions, and proofs of theorems stated here without proof, may be found in Chapter II of Courant (1936*b*).

(129)
“Persistent” here means that the nonzero Jacobian determinant in eqn (D.74) must have the *same N* − 1 rowed critical (i.e. nonzero) minor throughout *R* _{⊥}. “Nested” means that this persistent *N* − 1 rowed critical minor must, in turn, have the *same N* − 2 rowed critical minor throughout *R* _{⊥}, etc.