Analytical Mechanics for Relativity and Quantum Mechanics

Oliver Johns

Print publication date: 2005

Print ISBN-13: 9780198567264

Published to Oxford Scholarship Online: January 2010

DOI: 10.1093/acprof:oso/9780198567264.001.0001

(p.540) Appendix D THE CALCULUS OF MANY VARIABLES

Source:
Analytical Mechanics for Relativity and Quantum Mechanics
Publisher:
Oxford University Press

We summarize here some standard results from the calculus of many variables. These theorems will be invoked frequently throughout the text. In particular, the reader should review this material before beginning the chapter on Lagrangian methods. More background on this topic can be found in many standard calculus texts; a particularly accessible source is the two volumes of Courant (1936a,b). The mathematical appendices of Desloge (1982) are also very useful.

D.1 Basic Properties of Functions

A function y of the set of variables x 1, x 2,…, xN will be written in either of the two equivalent forms,

(D.1) $$y = y(x_1, x_2, \ldots, x_N) = y(x)$$

In the second form, the single unsubscripted variable x denotes the whole set, x = x 1, x 2,…, xN. Note that the same letter y is used to denote both quantity and function. Whereas a calculus book might write y = f(x), we write y = y(x). This notation has the advantage of clarity and alphabetic economy. It will be clear from context when letters like y denote the function and when they denote the value of the function. Usually, a letter on the left of an equal sign denotes the value, and the same letter before the (x) denotes the function that produces that value.

D.2 Regions of Definition of Functions

Functions are assumed to be single valued for all x 1, x 2,…, xN in a region R. A simple example of such a region is an open rectangle R defined by ak < xk < bk for k = 1,…, N, where the ak < bk are some constants.

The region R is connected if any two points in R can be joined by a curve lying entirely in R. The region is simply connected if it is connected, and if every closed curve lying entirely in R can be continuously shrunk to a point without leaving R. For example, the interior of a ball is simply connected, the interior of a doughnut is not.

Region R is called open if, for every point in R, there is some (possibly quite small) open rectangle lying entirely in R and containing the point. Such an open rectangle will be called an open neighborhood of the point x and will be denoted Nx. See Chapter I of Spivak (1965) for more definitions.

(p.541) D.3 Continuity of Functions

The definition of continuity is similar to that for functions of one variable.128 Define an arbitrary increment hk for each variable xk for k = 1,…, N. Then function y is continuous at point x if, for every possible choice of the hk,

(D.2) $$\lim_{\lambda \to 0} y(x_1 + \lambda h_1, x_2 + \lambda h_2, \ldots, x_N + \lambda h_N) = y(x_1, x_2, \ldots, x_N)$$

For example, there is no assigned value of y(0, 0) for which $y = y(x_1, x_2) = (x_1^2 - x_2^2)/(x_1^2 + x_2^2)$ would be continuous at the point x 1 = 0, x 2 = 0, since the left side of eqn (D.2) would have the limit 1 when h 1 = 1, h 2 = 0 but the limit −1 when h 1 = 0, h 2 = 1.
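The direction dependence in this example can be checked numerically. The following is a minimal sketch, not part of the original text; the helper name `directional_values` and the chosen scale factors are illustrative assumptions.

```python
def y(x1, x2):
    """y = (x1^2 - x2^2) / (x1^2 + x2^2); undefined at the origin itself."""
    return (x1**2 - x2**2) / (x1**2 + x2**2)

def directional_values(h1, h2, scales=(1e-1, 1e-3, 1e-6)):
    """Values of y at the points (s*h1, s*h2) as the scale s shrinks toward zero."""
    return [y(s * h1, s * h2) for s in scales]

along_x1 = directional_values(1.0, 0.0)  # approach the origin along the x1 axis
along_x2 = directional_values(0.0, 1.0)  # approach the origin along the x2 axis
print(along_x1, along_x2)
```

The two approaches settle on different values, so no single choice of y(0, 0) can match both.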

D.4 Compound Functions

Suppose we have a function

(D.3) $$y = y(x_1, x_2, \ldots, x_N)$$

and each of the x 1, x 2,…, xN is a function of another set of variables r 1, r 2,…, rM as in

(D.4) $$x_k = x_k(r_1, r_2, \ldots, r_M) \quad \text{for } k = 1, \ldots, N$$

Then y is a compound function of the variables r. This compound function y = y(r 1, r 2,…, rM) is defined by direct substitution of the xk of eqn (D.4) into eqn (D.3). This substitution will be denoted by

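(D.5) $$y = y(r_1, r_2, \ldots, r_M) = y\bigl(x_1(r_1, r_2, \ldots, r_M), x_2(r_1, r_2, \ldots, r_M), \ldots, x_N(r_1, r_2, \ldots, r_M)\bigr)$$

or, in a somewhat shorter notation,

(D.6) $$y = y(r) = y\bigl(x_1(r), x_2(r), \ldots, x_N(r)\bigr)$$

or, even more simply,

(D.7) $$y = y(r) = y\bigl(x(r)\bigr)$$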

D.5 The Same Function in Different Coordinates

A scalar field T is often represented by functions of different kinds of coordinates. For example,

(D.8) $$T = T_C(x_1, x_2, x_3) = T_c(\rho, \phi, z) = T_s(r, \theta, \phi)$$

where the subscripts refer to Cartesian, cylindrical polar, and spherical polar coordinates, respectively. Of course, eqn (D.8) contains the implicit assumption that the various coordinates in it must be linked by the relations in Section A.8, and so represent the same point r. The function denoted T C(x 1, x 2, x 3) is plainly not the same (p.542) function of its arguments x 1, x 2, x 3 as T s(r, θ, ϕ) is of its arguments r, θ, ϕ. For example, suppose T C(x 1, x 2, x 3) = ax 2/x 1 which leads to T s(r, θ, ϕ) = a tanϕ. But they both represent the same underlying scalar field function T = T(r). The almost universal custom in physics texts is to omit the subscripts “C”, “c”, and “s” in eqn (D.8) and write simply

(D.9) $$T = T(x_1, x_2, x_3) = T(\rho, \phi, z) = T(r, \theta, \phi)$$

The argument list of a function is taken as an adequate label for it.

The various functions in eqn (D.9) are thought of as the same function. The value T is really a function of the underlying field point r and is only being represented in Cartesian, cylindrical polar, or spherical polar form. This physics custom is especially evident in Lagrangian and Hamiltonian mechanics where one sees expressions such as L(s, ṡ, t) or L(q, q̇, t), functions expressing the same underlying physical quantity in terms of different variable sets.

We will follow the physics custom in this book, and in fact have already done so by using the same letter y to denote both y(x 1, x 2,…, xN) and the compound function y(r 1, r 2,…, rM) in Section D.4 above. Note that the different functional forms in eqn (D.9) can be defined as compound functions. For example, T(r, θ, ϕ) may be derived from T(x 1, x 2, x 3) by

(D.10) $$T(r, \theta, \phi) = T\bigl(x_1(r, \theta, \phi), x_2(r, \theta, \phi), x_3(r, \theta, \phi)\bigr)$$

D.6 Partial Derivatives

Lagrangian mechanics makes extensive use of partial derivatives. Given a function y = y(x 1, x 2,…, xN), a partial derivative with respect to variable xk is defined as

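(D.11) $$\frac{\partial y(x_1, x_2, \ldots, x_N)}{\partial x_k} = \lim_{h \to 0} \frac{y(x_1, \ldots, x_k + h, \ldots, x_N) - y(x_1, \ldots, x_k, \ldots, x_N)}{h}$$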

which says to take an ordinary derivative with respect to xk as if variables x 1, x 2,…, x k−1, x k+1,…, xN were constants. It does not say that they are constants, only that they are to be treated as such in calculating the derivative.

Partial derivatives thus depend on the list of variables as well as on the variable being differentiated with respect to, since only that list tells what to hold constant as y is differentiated. The often-seen notation ∂y/∂xk is inherently ambiguous, unless one happens, as here, to know from context what list of variables is intended. In this text we will give the list of variables in all partial derivatives whenever there is any cause for doubt as to what that list might be, often using the shorthand form like ∂y(x)/∂xk in which x stands for the whole list x 1, x 2,…, xN.

One should note that the list of variables x = x 1, x 2,…, xN in a partial derivative is just to indicate those variables to be treated as constants when a derivative is taken. It does not indicate that y necessarily depends on each member of the list in every (p.543) case. Thus, we might have y = 2x 1 + x 3 as the function, even though the list is x 1, x 2,…, x 5. In this case ∂y(x)/∂x 2 = 0, ∂y(x)/∂x 4 = 0, and ∂y(x)/∂x 5 = 0 for all x values.
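A short numerical sketch may make the role of the variable list concrete. It is illustrative only: the helper `partial`, the central-difference step, and the sample point are assumptions, and indices in the code are 0-based whereas the text counts from 1.

```python
def partial(y, x, k, h=1e-6):
    """Central-difference estimate of the partial derivative of y with respect
    to x[k], with every other entry of the list x held fixed."""
    up = list(x)
    up[k] += h
    dn = list(x)
    dn[k] -= h
    return (y(up) - y(dn)) / (2 * h)

def y(x):
    # y = 2*x1 + x3, declared over the five-variable list x1, ..., x5
    return 2 * x[0] + x[2]

point = [0.3, -1.2, 2.5, 0.7, 4.0]
grads = [partial(y, point, k) for k in range(5)]
print(grads)
```

The derivatives with respect to x 2, x 4, and x 5 vanish even though those variables remain on the list.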

Conversely, as discussed in Corollary D.10.3 in Section D.10 below, if ∂y(x)/∂xn = 0 for all x values in region R, then y does not depend on xn, and we may choose to expunge xn from the list of variables and write y = y(x 1, x 2,…, x n−1, x n+1,…, xN). For example, the list of variables for the above function might be shortened to x 1, x 2, x 3, with x 4 and x 5 dropped but the x 2 retained.

Partial derivatives are themselves functions of the same variable list x 1, x 2,…, xN as was the function y being differentiated. Thus second and higher derivatives can be taken by repeated application of rules like eqn (D.11). If they exist, these higher derivatives are denoted by expressions like

(D.12) Appendix D THE CALCULUS OF MANY VARIABLES

D.7 Continuously Differentiable Functions

If all first partial derivatives of a function y = y(x 1, x 2,…, xN) exist and are continuous functions of x, then the function y itself is continuous. Such functions are called continuously differentiable. If all partial derivatives up to and including the nth order exist and are continuous functions of x, the function is called continuously differentiable to nth order. Unless specifically stated otherwise, we will assume that all functions used in the present text are continuously differentiable to any order.

D.8 Order of Differentiation

If the second partial derivatives exist and are continuous functions for all x in R (i.e. if y(x) is continuously differentiable to second order), then the order of the second-order partial derivatives is unimportant since, for all i, j,

(D.13) $$\frac{\partial}{\partial x_i}\left(\frac{\partial y(x)}{\partial x_j}\right) = \frac{\partial}{\partial x_j}\left(\frac{\partial y(x)}{\partial x_i}\right)$$

Generalizing, if y is continuously differentiable to nth order, then all partial derivatives of that order or less are also independent of the order in which they are taken.

D.9 Chain Rule

Let y be a compound function of r as defined in eqn (D.5). Assume that all partial derivatives of the form ∂y(x)/∂xk and ∂xk(r)/∂rj exist. Then the partial derivatives of y with respect to the r variables exist and may be written using what is called the (p.544) chain rule of partial differentiation,

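(D.14) $$\frac{\partial y(r_1, r_2, \ldots, r_M)}{\partial r_j} = \sum_{k=1}^{N} \frac{\partial y(x_1, x_2, \ldots, x_N)}{\partial x_k}\,\frac{\partial x_k(r_1, r_2, \ldots, r_M)}{\partial r_j}$$

or, in shorter but equivalent notation,

(D.15) $$\frac{\partial y(r)}{\partial r_j} = \sum_{k=1}^{N} \frac{\partial y(x)}{\partial x_k}\,\frac{\partial x_k(r)}{\partial r_j}$$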
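As a rough check of the chain rule, the sum over k can be compared with a direct finite-difference derivative of the compound function. The functions below (y = x1² x2 with x1 = r1 cos r2, x2 = r1 sin r2) are an arbitrary example chosen for this sketch and do not come from the text.

```python
import math

def y_of_x(x1, x2):
    return x1**2 * x2

def x_of_r(r1, r2):
    return r1 * math.cos(r2), r1 * math.sin(r2)

def y_of_r(r1, r2):
    # compound function obtained by direct substitution
    return y_of_x(*x_of_r(r1, r2))

def chain_rule_gradient(r1, r2):
    """(dy/dr1, dy/dr2) built as sum_k (dy/dx_k)(dx_k/dr_j)."""
    x1, x2 = x_of_r(r1, r2)
    dy_dx = (2 * x1 * x2, x1**2)
    # dx_dr[k][j] holds dx_k/dr_j
    dx_dr = ((math.cos(r2), -r1 * math.sin(r2)),
             (math.sin(r2), r1 * math.cos(r2)))
    return tuple(sum(dy_dx[k] * dx_dr[k][j] for k in range(2)) for j in range(2))

def numeric_gradient(r1, r2, h=1e-6):
    """Central differences applied directly to the compound function."""
    return ((y_of_r(r1 + h, r2) - y_of_r(r1 - h, r2)) / (2 * h),
            (y_of_r(r1, r2 + h) - y_of_r(r1, r2 - h)) / (2 * h))

analytic = chain_rule_gradient(1.5, 0.4)
numeric = numeric_gradient(1.5, 0.4)
print(analytic, numeric)
```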

D.10 Mean Values

Theorem D.10.1: The Mean Value Theorem

Suppose that a function y = y(x 1, x 2,…, xN) is continuously differentiable in a region R and has partial derivatives ∂y(x)/∂xk = gk(x). Let hk be increments added to xk and assume that point (x 1 + ηh 1, x 2 + ηh 2,…, xN + ηhN) lies in region R for all η in the range 0 ≤ η ≤ 1. Then

(D.16) $$y(x_1 + h_1, x_2 + h_2, \ldots, x_N + h_N) = y(x_1, x_2, \ldots, x_N) + \sum_{k=1}^{N} h_k\, g_k(x_1 + \theta h_1, x_2 + \theta h_2, \ldots, x_N + \theta h_N)$$

for some θ in the range 0 < θ < 1.
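The theorem asserts only that a suitable θ exists. For a specific function one can locate it numerically, as in the sketch below. The example function y = x1³ + x1 x2, the starting point, the increments, and the bisection helper are all assumptions made for illustration.

```python
def y(x1, x2):
    return x1**3 + x1 * x2

def g(x1, x2):
    """First partial derivatives (g_1, g_2) of y."""
    return (3 * x1**2 + x2, x1)

def mean_value_theta(x, h, tol=1e-12):
    """Bisect for theta in (0, 1) with y(x + h) - y(x) = sum_k h_k g_k(x + theta*h)."""
    delta = y(x[0] + h[0], x[1] + h[1]) - y(*x)

    def residual(theta):
        grads = g(x[0] + theta * h[0], x[1] + theta * h[1])
        return sum(hk * gk for hk, gk in zip(h, grads)) - delta

    lo, hi = 0.0, 1.0
    # Bisection needs a sign change between the two ends; that holds for
    # this example but is not guaranteed for every function.
    assert residual(lo) * residual(hi) < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(lo) * residual(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

theta = mean_value_theta((1.0, 2.0), (1.0, 1.0))
print(theta)
```

For this choice the condition reduces to 3θ² + 8θ − 5 = 0, whose root in the open interval is near 0.52.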

Corollary D.10.2: Constancy of Functions

If ∂y(x)/∂xk = 0 for all k = 1,…, N and for all x in R, then function y(x) is constant in that region.

Corollary D.10.3: Non-dependence on Variables

If a function y = y(x 1, x 2,…, xN) has ∂y(x)/∂xn = 0 for some n and for all x in R, then function y(x) does not depend on the variable xn and so can be written as y = y(x 1, x 2,…, x n−1, x n+1,…, xN) in that region.

D.11 Orders of Smallness

We often want to compare two functions as a variable approaches some limit L. A useful notation is found in Chapter I of Titchmarsh (1939). It is

(D.17) $$f(x) = o(\phi(x)) \ \text{as } x \to L \quad \text{means that} \quad \lim_{x \to L} \frac{f(x)}{\phi(x)} = 0$$

In words, “Function f is of smaller order than ϕ as x approaches L.” The following notation is also used.

(D.18) $$f(x) = g(x) + o(\phi(x)) \ \text{as } x \to L \quad \text{means that} \quad \lim_{x \to L} \frac{f(x) - g(x)}{\phi(x)} = 0$$

In words, “The difference between f and g is of smaller order than ϕ as x approaches L.”

(p.545) For example, with L = 0,

(D.19) Appendix D THE CALCULUS OF MANY VARIABLES

and, with L=∞,

(D.20) Appendix D THE CALCULUS OF MANY VARIABLES

D.12 Differentials

If y = y(x) is a function of one variable, the change in y as the independent variable is incremented from x to x + dx is Δy = y(x + dx) − y(x). The increment dx in this expression is not assumed to be small. It may take any value. Assuming that the function is differentiable and has a finite derivative at x, the differential dy at point x is defined as the linear approximation to Δy based on the tangent line to the curve at point x,

(D.21) $$dy = \frac{dy(x)}{dx}\,dx$$

The approximation of Δy by dy may be good or bad, of course. But we are guaranteed that the difference between the two vanishes for small enough dx. More precisely, it follows from the definition of the derivative that $\lim_{dx \to 0} \{(\Delta y - dy)/dx\} = 0$, or equivalently, that Δy = dy + o(dx) as dx → 0. In other words, the difference between Δy and dy is of smaller order than dx, as dx approaches zero.
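The statement Δy = dy + o(dx) can be watched numerically. The sketch below uses y = x³ at x = 2, an example chosen here for illustration rather than taken from the text, and tabulates (Δy − dy)/dx for shrinking increments.

```python
def y(x):
    return x**3

def dy(x, dx):
    """Differential: the derivative 3*x^2 times the increment dx."""
    return 3 * x**2 * dx

x = 2.0
ratios = []
for dx in (1.0, 0.1, 0.01, 1e-4):
    delta_y = y(x + dx) - y(x)          # exact change in y
    ratios.append((delta_y - dy(x, dx)) / dx)
print(ratios)
```

For dx = 1 the linear approximation is poor (the ratio is 7), yet the ratio shrinks steadily as dx decreases, as the limit requires.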

D.13 Differential of a Function of Several Variables

The definition of differential can be extended to functions of more than one variable. Our definition of this differential follows that of pages 66–69 of Courant (1936b). Note that he calls it the total differential.

If y = y(x 1, x 2,…, xN) and each variable is independently incremented from xk to xk + dxk for all k = 1,…, N, then the change in y is

(D.22) $$\Delta y = y(x_1 + dx_1, x_2 + dx_2, \ldots, x_N + dx_N) - y(x_1, x_2, \ldots, x_N)$$

and the differential dy is defined as the linear approximation to Δy given by

(D.23) $$dy = \sum_{k=1}^{N} \frac{\partial y(x)}{\partial x_k}\,dx_k$$

As in the one variable case, the increments dxk may be large or small. If the set x 1, x 2,…, xN is a set of independent variables, then the dxk are independent and may take any values.

(p.546) In the many-variable case, just as for the functions of one-variable discussed above, the approximation of Δy by dy may be good or bad. Define h = maxk{|dxk|}. If y(x) is a continuously differentiable function of x, then the mean value theorem, Theorem D.10.1, implies that

(D.24) $$\Delta y = dy + o(h) \quad \text{as } h \to 0$$

where Δy and dy are defined in eqns (D.22, D.23).

The differential in eqn (D.23) is well defined for any values of dxk. Nonetheless, since eqn (D.24) says that, as h = maxk{|dxk|} goes to zero, the difference between Δy and dy is of smaller order than h, it is also legitimate to think of the differential as the small change in the value of y as the independent variables are incremented by small amounts. The differential is often used heuristically in this way.

D.14 Differentials and the Chain Rule

The differential of a compound function may be constructed by direct substitution. As in Section D.4, suppose we have a function y = y(x 1, x 2,…, xN) whose differential is

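(D.25) $$dy = \sum_{k=1}^{N} \frac{\partial y(x)}{\partial x_k}\,dx_k$$

and suppose that each xk is in turn a function of r so that, for k = 1,…, N,

(D.26) $$dx_k = \sum_{j=1}^{M} \frac{\partial x_k(r)}{\partial r_j}\,dr_j$$

Substituting eqn (D.26) into eqn (D.25) gives

(D.27) $$dy = \sum_{k=1}^{N} \frac{\partial y(x)}{\partial x_k} \sum_{j=1}^{M} \frac{\partial x_k(r)}{\partial r_j}\,dr_j = \sum_{j=1}^{M} \left( \sum_{k=1}^{N} \frac{\partial y(x)}{\partial x_k}\,\frac{\partial x_k(r)}{\partial r_j} \right) dr_j = \sum_{j=1}^{M} \frac{\partial y(r)}{\partial r_j}\,dr_j$$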

where the chain rule eqn (D.15) was used to obtain the last equality. But the last expression in eqn (D.27) is precisely the differential of the compound function y = y(r 1, r 2,…, rM). This example illustrates that differentials provide a clear and correct notation for manipulating the chain rule of partial differentiation.

D.15 Differentials of Second and Higher Orders

The first order differential defined in eqn (D.23) is itself a function of the variables x 1,…, xN, dx 1,…, dxN. Thus, the second-order differential may be defined as the differential of the first-order differential, using the same increments. Writing eqn (p.547) (D.23) using an operator formalism as

(D.28) Appendix D THE CALCULUS OF MANY VARIABLES

we may define

(D.29) Appendix D THE CALCULUS OF MANY VARIABLES

assuming that the second-order partial differentials exist. Generalizing to nth order gives

(D.30) Appendix D THE CALCULUS OF MANY VARIABLES

where we assume that the partial differentials to nth order exist.

D.16 Taylor Series

The Taylor theorem may be thought of as a generalization of the mean value theorem of Section D.10.

Theorem D.16.1: Taylor Series

If function y(x 1,…, xN) is continuously differentiable to (n + 1)st order, then

(D.31) Appendix D THE CALCULUS OF MANY VARIABLES

where all of the differentials dy, d 2 y, etc., are to be evaluated using the same increments dxk and where the remainder Rn is

(D.32) Appendix D THE CALCULUS OF MANY VARIABLES

where

(D.33) Appendix D THE CALCULUS OF MANY VARIABLES

and θ is some number in the range 0 < θ < 1.

(p.548) D.17 Higher-Order Differential as a Difference

Suppose that the conditions of the Taylor theorem of Section D.16 apply, and define the difference Δy as

(D.34) Appendix D THE CALCULUS OF MANY VARIABLES

Then inspection of eqns (D.31, D.32) shows that

(D.35) Appendix D THE CALCULUS OF MANY VARIABLES

where h = maxk{|dxk|} for 1 ≤ k ≤ N.

D.18 Differential Expressions

Given N functions ak(x) of the set of independent variables x 1, x 2,…, xN, we may form differential expressions like

(D.36) $$\sum_{k=1}^{N} a_k(x)\,dx_k$$

which may or may not be the actual differential of some function. These expressions may be manipulated by the usual rules of algebra (think of the dxk simply as finite increments).

We adopt the usual convention that an equality involving differential expressions includes the implicit assumption that it holds for all possible dxk values. Since the increments dxk of independent variables x 1, x 2,…, xN can take any values, it follows that the dxk may be set nonzero one at a time, which leads to the following Lemmas.

Lemma D.18.1: Zero Differential Expressions

The differential expression is zero

(D.37) $$\sum_{k=1}^{N} a_k(x)\,dx_k = 0$$

if and only if ak(x) = 0 for all k = 1,…, N.

Lemma D.18.2: Equal Differential Expressions

Two differential expressions are equal,

(D.38) $$\sum_{k=1}^{N} a_k(x)\,dx_k = \sum_{k=1}^{N} b_k(x)\,dx_k$$

if and only if ak(x) = bk(x) for all k = 1,…, N.

(p.549) Lemma D.18.3: Differential Expression and Differential

It follows from eqn (D.38) together with the definition of the differential in eqn (D.23) that, if we are given a function f = f(x 1, x 2,…, xN) of the independent variables x 1, x 2,…, xN and a differential expression $\sum_{k=1}^{N} a_k(x)\,dx_k$, the equality

(D.39) $$df = \sum_{k=1}^{N} a_k(x)\,dx_k$$

holds if and only if ∂f(x)/∂xk = ak(x) for all k = 1,…, N.

Lemma D.18.4: Zero Differential

It follows from Lemma D.18.1 that a function f = f(x 1, x 2,…, xN) of the independent variables x 1, x 2,…, xN will have df = 0 at a point x 1, x 2,…, xN if and only if ∂f(x)/∂xk = 0 for all k = 1,…, N at that point.

In some cases of interest, the variables x will be functions of another set of variables r = r 1, r 2,…, rM as in the discussion of compound functions in Section D.4. In that case, using the chain rule from Section D.14, the differential expression becomes

(D.40) Appendix D THE CALCULUS OF MANY VARIABLES

where

(D.41) Appendix D THE CALCULUS OF MANY VARIABLES

In some applications, it is important to know if the Lemmas above still apply to the differential expression $\sum_{k=1}^{N} a_k(x)\,dx_k$ when one assumes only that the r variables are independent.

Theorem D.18.5: Compound Differential Expressions

Let the variables r be assumed to be independent so that the increments drj can be set equal to zero one at a time in eqn (D.40). Then Lemmas D.18.1 through D.18.4 will continue to hold for the differential expression $\sum_{k=1}^{N} a_k(x)\,dx_k$ if and only if M = N and the determinant condition

(D.42) Appendix D THE CALCULUS OF MANY VARIABLES

is satisfied.

Proof: Theorem D.24.1 below shows that the determinant condition in eqn (D.42) is the necessary and sufficient condition for the transformation between x and r to be invertible.

(p.550) Thus, one can use the inverse matrix from Section D.25 to write

(D.43) Appendix D THE CALCULUS OF MANY VARIABLES

from which a choice of the independent increments drj can be found that will make the dxk have any value desired. Thus the dxk are also arbitrary and independent, and can be set nonzero one at a time, which is the condition needed.

D.19 Line Integral of a Differential Expression

A curve can be defined in region R by making each xk be a function of some monotonically varying parameter β, so that xk = xk(β) for k = 1,…, N. The integral

(D.44) $$I_{01} = \int_{\beta_0}^{\beta_1} \sum_{k=1}^{N} a_k\bigl(x(\beta)\bigr)\,\frac{dx_k(\beta)}{d\beta}\,d\beta$$

is called a line integral of the differential expression $\sum_{k=1}^{N} a_k(x)\,dx_k$ along a portion of that curve.

The line integral in eqn (D.44) is often denoted more simply, as just the integral of a differential expression with integration along a particular curve being understood. Thus, one often sees I 01 denoted as

(D.45) $$I_{01} = \int_{0}^{1} \sum_{k=1}^{N} a_k(x)\,dx_k = \int_{0}^{1} \mathbf{a} \cdot d\mathbf{x}$$

where the latter form treats the differential expression as a dot product of two vectors in an N-dimensional Cartesian space with the xk as its coordinates.
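A line integral of this kind is easy to approximate once the curve is parameterized. The sketch below is one possible numerical treatment, not the author's: the midpoint rule, the central-difference estimate of dxk/dβ, and the example (a 1 = −x 2, a 2 = x 1 along a quarter of the unit circle) are all assumptions.

```python
import math

def line_integral(a, curve, beta0, beta1, n=2000, d=1e-6):
    """Midpoint-rule estimate of the integral of sum_k a_k(x) dx_k along x(beta)."""
    step = (beta1 - beta0) / n
    total = 0.0
    for i in range(n):
        beta = beta0 + (i + 0.5) * step
        x = curve(beta)
        # central-difference estimate of dx_k/dbeta at this beta
        xp, xm = curve(beta + d), curve(beta - d)
        dx = [(p - m) / (2 * d) for p, m in zip(xp, xm)]
        total += sum(ak * dxk for ak, dxk in zip(a(x), dx)) * step
    return total

def a(x):
    return (-x[1], x[0])

def quarter_circle(beta):
    return (math.cos(beta), math.sin(beta))

I01 = line_integral(a, quarter_circle, 0.0, math.pi / 2)
print(I01)
```

On this curve the integrand equals 1 for every β, so the result should be close to π/2.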

D.20 Perfect Differentials

Differential expressions $\sum_{k=1}^{N} a_k(x)\,dx_k$ for which a function f exists satisfying $df = \sum_{k=1}^{N} a_k(x)\,dx_k$ are called perfect differentials. Line integrals and perfect differentials are treated in Chapter V of Courant (1936b) and Appendix 11 in Volume I of Desloge (1982).

The function f is sometimes called a potential function since the ak(x) can be derived from it by partial differentiation in a way analogous to the derivation of the electric field components from the electric potential. A condition for $\sum_{k=1}^{N} a_k(x)\,dx_k$ to be a perfect differential is given by the following theorem.

Theorem D.20.1: Condition for a Perfect Differential

Assume the variables x 1, x 2,…, xN to lie in an open rectangle R. Given a set of continuously differentiable functions ak(x) for k = 1,…, N, there exists a potential function (p.551) f = f(x 1, x 2,…, xN) such that the following two equivalent conditions are satisfied,

(D.46) $$df = \sum_{k=1}^{N} a_k(x)\,dx_k \quad \text{or} \quad \frac{\partial f(x)}{\partial x_k} = a_k(x) \ \text{ for } k = 1, \ldots, N$$

if and only if, for all x in R and all pairs of indices i, j = 1,…, N,

(D.47) $$\frac{\partial a_i(x)}{\partial x_j} = \frac{\partial a_j(x)}{\partial x_i}$$

Proof: First we prove that eqn (D.46) implies eqn (D.47). For if an f exists satisfying eqn (D.46), then, using eqn (D.13) gives

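(D.48) $$\frac{\partial a_i(x)}{\partial x_j} = \frac{\partial}{\partial x_j}\left(\frac{\partial f(x)}{\partial x_i}\right) = \frac{\partial}{\partial x_i}\left(\frac{\partial f(x)}{\partial x_j}\right) = \frac{\partial a_j(x)}{\partial x_i}$$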

To prove that eqn (D.47) implies eqn (D.46) we construct a suitable f explicitly. Starting from some arbitrary point $x_1^{(0)}, x_2^{(0)}, \ldots, x_N^{(0)}$, we perform a line integral along a series of straight line segments, first along x 1, then along x 2, etc., until the final point 1 at x 1, x 2,…, xN is reached. Such a path will lie entirely in R, and the integral I 10 will be the sum of the integrals along its segments. Along the jth segment, all xi for i ≠ j are held constant but xj varies as xj = β, giving dxk(β)/dβ = δjk. Inserting this result into eqn (D.44), and setting f(x 1, x 2,…, xN) equal to the integral I 10 gives

(D.49) Appendix D THE CALCULUS OF MANY VARIABLES

Since any point x can be reached by this integration, the function f is defined for all x in R .

The partial derivatives of this f may be written as

(D.50) Appendix D THE CALCULUS OF MANY VARIABLES

With xj temporarily replaced by β, the assumption stated in eqn (D.47) implies that

(D.51) Appendix D THE CALCULUS OF MANY VARIABLES

(p.552) Thus

(D.52) Appendix D THE CALCULUS OF MANY VARIABLES

which shows that the f constructed in eqn (D.49) does have the required property eqn (D.46).

The line integral eqn (D.49) used in this proof is of great practical use. It will be used, for example, to compute generating functions for canonical transformations.

The theorem proved in this section (but not, of course, the proof of it given here) remains true even when open rectangle R is replaced by a more general region R which is only assumed to be open and simply connected. See Chapter V of Courant (1936b) for details.
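The segment-by-segment construction used in the proof can be imitated numerically for N = 2. The sketch below is illustrative only. It assumes the example a 1 = 2x 1 x 2, a 2 = x 1² (whose potential is x 1² x 2 up to a constant), a starting point (1, 1), and a midpoint-rule integrator. It also assumes that the condition of eqn (D.47) is the usual equality of cross derivatives, ∂a 1/∂x 2 = ∂a 2/∂x 1.

```python
def a(x1, x2):
    """Coefficients (a_1, a_2) of the differential expression a_1 dx_1 + a_2 dx_2."""
    return (2 * x1 * x2, x1**2)

def midpoint(func, lo, hi, n=1000):
    """Midpoint-rule integral of func from lo to hi."""
    step = (hi - lo) / n
    return sum(func(lo + (i + 0.5) * step) for i in range(n)) * step

def potential(x1, x2, start=(1.0, 1.0)):
    """Integrate first along x_1 (x_2 held at its start value), then along x_2."""
    s1, s2 = start
    seg1 = midpoint(lambda b: a(b, s2)[0], s1, x1)
    seg2 = midpoint(lambda b: a(x1, b)[1], s2, x2)
    return seg1 + seg2

def cross_derivative_gap(x1, x2, h=1e-5):
    """Estimate of da_1/dx_2 - da_2/dx_1, assumed here to be what must vanish."""
    da1_dx2 = (a(x1, x2 + h)[0] - a(x1, x2 - h)[0]) / (2 * h)
    da2_dx1 = (a(x1 + h, x2)[1] - a(x1 - h, x2)[1]) / (2 * h)
    return da1_dx2 - da2_dx1

f_value = potential(2.0, 3.0)
gap = cross_derivative_gap(2.0, 3.0)
print(f_value, gap)
```

The constructed value agrees with x 1² x 2 evaluated at (2, 3) minus its value at the starting point, that is 12 − 1 = 11.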

D.21 Perfect Differential and Path Independence

An alternate, and equivalent, condition for a differential expression to be a perfect differential is that its line integral between two end points be independent of the particular choice of the path between them.

Theorem D.21.1: Path Independence

Assume a given set of functions ak(x) for k = 1,…, N, continuously differentiable in an open and simply connected region R. The differential expression $\sum_{k=1}^{N} a_k(x)\,dx_k$ is a perfect differential with a potential function satisfying the equivalent conditions eqn (D.46),

(D.53) $$df = \sum_{k=1}^{N} a_k(x)\,dx_k \quad \text{or} \quad \frac{\partial f(x)}{\partial x_k} = a_k(x) \ \text{ for } k = 1, \ldots, N$$

if and only if the line integral between any two points in R

(D.54) Appendix D THE CALCULUS OF MANY VARIABLES

depends only on x at the endpoints of the integration $x_1^{(0)}, x_2^{(0)}, \ldots, x_N^{(0)}$ (p.553) and $x_1^{(1)}, x_2^{(1)}, \ldots, x_N^{(1)}$ and is therefore independent of the path xk(β) taken between those endpoints.

Proof: First assume eqn (D.53) and prove the path independence of eqn (D.54). If a potential function f exists, then eqn (D.54) becomes

(D.55) $$I_{01} = \int_{0}^{1} df = f\bigl(x^{(1)}\bigr) - f\bigl(x^{(0)}\bigr)$$

which depends only on the value of f at the end points and hence is independent of the path, as was to be proved. For proof of the converse, see Chapter V of Courant (1936b).

It follows from Theorem D.21.1 that integration along any path between the points $x_1^{(0)}, x_2^{(0)}, \ldots, x_N^{(0)}$ and x 1, x 2,…, xN would have given the same integral as the particular path used in the proof of Theorem D.20.1.

The path independence of line integrals between any two points is equivalent to the vanishing of line integrals around closed paths. The following Corollary is given without proof.

Corollary D.21.2: Closed Paths

The line integral I 01 between any two points is independent of the path if and only if the line integral around any closed path is zero.
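The closed-path criterion can be illustrated with two differential expressions integrated once around the unit circle. This is a sketch under assumptions: the two example expressions, the circular path, and the midpoint rule are choices made here, not taken from the text.

```python
import math

def closed_loop_integral(a, n=4000):
    """Midpoint-rule line integral of a_1 dx_1 + a_2 dx_2 once around the unit circle."""
    step = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        beta = (i + 0.5) * step
        x1, x2 = math.cos(beta), math.sin(beta)
        dx1, dx2 = -math.sin(beta), math.cos(beta)  # dx_k/dbeta on the circle
        a1, a2 = a(x1, x2)
        total += (a1 * dx1 + a2 * dx2) * step
    return total

def perfect(x1, x2):
    return (2 * x1 * x2, x1**2)   # equals df for f = x1^2 * x2

def not_perfect(x1, x2):
    return (-x2, x1)              # da_1/dx_2 = -1 but da_2/dx_1 = +1

loop_perfect = closed_loop_integral(perfect)
loop_other = closed_loop_integral(not_perfect)
print(loop_perfect, loop_other)
```

The first integral vanishes to rounding error, while the second comes out close to 2π, so the second expression cannot be a perfect differential.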

D.22 Jacobians

Suppose that we are given yk = yk(x 1, x 2,…, xN, z 1, z 2,…, zP) = yk(x, z) for k = 1,…, N, where P may have any non-negative value, including the value zero (which would indicate that the extra z variables are absent). Then we may form an N × N matrix of partial derivatives denoted (∂y(x, z)/∂x) and defined by

(D.56) Appendix D THE CALCULUS OF MANY VARIABLES

This matrix is called the Jacobian matrix, and its determinant is called the Jacobian determinant, or simply the Jacobian. It is variously denoted. We give here the two forms we will use, and the determinant itself,

(D.57) Appendix D THE CALCULUS OF MANY VARIABLES

The first form, which is the traditional one, does not specify the list of variables to be held constant when the partial derivatives are taken. It is implicit in the notation that (p.554) all of the variables listed in the denominator are on that list, but there may be others, as here. Usually the intended list is obvious, but in cases of doubt we will modify the traditional notation and use expressions like

(D.58) Appendix D THE CALCULUS OF MANY VARIABLES

in which the list is explicitly stated. Jacobians are treated in Volume I, Appendix 3 of Desloge (1982) and in Chapter III of Courant (1936b).

Lemma D.22.1: Jacobian of a Compound Function

If the variables xi are themselves functions of another set of variables r, as well as of the same extra variables z as above, xi = xi(r 1, r 2,…, rN, z 1, z 2,…, zP) for i = 1,…, N, then the compound functions yk(r 1, r 2,…, rN, z 1, z 2,…, zP) may be defined by

(D.59) Appendix D THE CALCULUS OF MANY VARIABLES

Then the Jacobians obey the relation

(D.60) Appendix D THE CALCULUS OF MANY VARIABLES

or, in the traditional notation,

(D.61) Appendix D THE CALCULUS OF MANY VARIABLES

Proof: The chain rule of partial differentiation gives

(D.62) Appendix D THE CALCULUS OF MANY VARIABLES

which may be written as a matrix equation

(D.63) Appendix D THE CALCULUS OF MANY VARIABLES

Equating the determinant of both sides of eqn (D.63) gives eqn (D.60), as was to be proved.
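The product rule for Jacobians can be spot-checked numerically for N = 2 with no extra z variables. In this sketch the functions `x_of_r` and `y_of_x`, the finite-difference helper `jacobian_det`, and the sample point are hypothetical choices for illustration.

```python
import math

def jacobian_det(funcs, point, h=1e-6):
    """Determinant of the 2x2 matrix d funcs_i / d point_j via central differences."""
    def d(i, j):
        up = list(point)
        up[j] += h
        dn = list(point)
        dn[j] -= h
        return (funcs(*up)[i] - funcs(*dn)[i]) / (2 * h)
    return d(0, 0) * d(1, 1) - d(0, 1) * d(1, 0)

def x_of_r(r1, r2):
    return (r1 * math.cos(r2), r1 * math.sin(r2))

def y_of_x(x1, x2):
    return (x1 * x2, x1 + x2)

def y_of_r(r1, r2):
    # compound functions obtained by direct substitution
    return y_of_x(*x_of_r(r1, r2))

r = (1.5, 0.4)
lhs = jacobian_det(y_of_r, r)
rhs = jacobian_det(y_of_x, x_of_r(*r)) * jacobian_det(x_of_r, r)
print(lhs, rhs)
```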

Lemma D.22.2: Jacobian of an Augmented Variable Set

If yk = yk(x 1, x 2,…, xN, z 1, z 2,…, zP) = yk(x, z) for k = 1,…, N as before, the following identity holds

(D.64) Appendix D THE CALCULUS OF MANY VARIABLES

Proof: The left expression in eqn (D.64) is the determinant of an (N + P) × (N + P) matrix whose last P rows consist of N zeroes followed by an element of the P × P identity matrix. Its determinant is thus the determinant of the N × N upper left-hand block, which is the right expression in eqn (D.64).

(p.555) Lemma D.22.3: Jacobian of an Inverse Function

If a set of functions yk = yk(x 1, x 2,…, xN, z 1, z 2,…, zP) can be solved for x, giving xi = xi(y 1, y 2,…, yN, z 1, z 2,…, zP) then

(D.65) Appendix D THE CALCULUS OF MANY VARIABLES

or, in traditional form,

(D.66) Appendix D THE CALCULUS OF MANY VARIABLES

Proof: As is proved in Theorem D.24.1 below, the necessary and sufficient condition for yk = yk(x, z) to be solved for xi = xi(y, z) is the determinant condition

(D.67) Appendix D THE CALCULUS OF MANY VARIABLES

Then, as discussed in Section D.25, the following matrix equation holds

(D.68) Appendix D THE CALCULUS OF MANY VARIABLES

where U is the N × N unit matrix. Taking the determinant of both sides gives eqn (D.65).
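The reciprocal relation between the Jacobians of a transformation and its inverse can be checked on the plane polar transformation. The sketch assumes the familiar result that the Jacobian of (x, y) with respect to (ρ, ϕ) equals ρ. The helper `jacobian_det` and the sample point are illustrative.

```python
import math

def jacobian_det(funcs, point, h=1e-6):
    """Determinant of the 2x2 matrix d funcs_i / d point_j via central differences."""
    def d(i, j):
        up = list(point)
        up[j] += h
        dn = list(point)
        dn[j] -= h
        return (funcs(*up)[i] - funcs(*dn)[i]) / (2 * h)
    return d(0, 0) * d(1, 1) - d(0, 1) * d(1, 0)

def cartesian(rho, phi):
    return (rho * math.cos(phi), rho * math.sin(phi))

def polar(x, y):
    # inverse functions, valid for 0 < rho and -pi < phi < pi
    return (math.hypot(x, y), math.atan2(y, x))

rho, phi = 2.0, 0.7
forward = jacobian_det(cartesian, (rho, phi))        # should be close to rho
backward = jacobian_det(polar, cartesian(rho, phi))  # should be close to 1/rho
print(forward, backward, forward * backward)
```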

Lemma D.22.4: Change of Variable in an Integral

Given a set of continuously differentiable functions yk = yk(x 1, x 2,…, xN) for k = 1,…, N, whose Jacobian does not vanish in the range of integration,

(D.69) Appendix D THE CALCULUS OF MANY VARIABLES

the multiple integral

(D.70) Appendix D THE CALCULUS OF MANY VARIABLES

may be transformed into an integral over the variables x

(D.71) Appendix D THE CALCULUS OF MANY VARIABLES

where the compound function f(x 1, x 2,…, xN) is

(D.72) Appendix D THE CALCULUS OF MANY VARIABLES

and the limits of integration in eqn (D.71) are chosen so that x ranges over the inverse image of the range of y.

In practice, these limits of integration in eqn (D.71) are usually chosen so that the two integrals would have the same value, including the same sign, for the case in which f is replaced by the number 1.
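A change of variable of this kind can be tried on integrals over the unit disk, transformed to plane polar variables on the open rectangle 0 < ρ < 1, −π < ϕ < π. The sketch below is an assumption-laden illustration: it takes the Jacobian factor to be ρ, uses a simple midpoint rule, and picks the integrands 1 and y 1² + y 2² as examples.

```python
import math

def disk_integral_polar(f, n_rho=400, n_phi=400):
    """Midpoint-rule integral of f(y1, y2) over the unit disk using rho, phi."""
    d_rho = 1.0 / n_rho
    d_phi = 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_rho):
        rho = (i + 0.5) * d_rho
        for j in range(n_phi):
            phi = -math.pi + (j + 0.5) * d_phi
            y1, y2 = rho * math.cos(phi), rho * math.sin(phi)
            # rho is the Jacobian factor of the polar transformation
            total += f(y1, y2) * rho * d_rho * d_phi
    return total

area = disk_integral_polar(lambda y1, y2: 1.0)
second_moment = disk_integral_polar(lambda y1, y2: y1**2 + y2**2)
print(area, second_moment)
```

With f replaced by 1 the result reproduces the area π of the disk with a positive sign, which is the check described above. The second integral should be close to π/2.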

(p.556) D.23 Global Inverse Function Theorem

We often need to invert a set of functions like yi = yi(x 1, x 2,…, xN, z 1, z 2,…, zP) where i = 1,…, N, that is, to solve them for the variables xj = xj(y 1, y 2,…, yN, z 1, z 2,…, zP). The following global and local inverse function theorems are of great importance.

The inverses proved by the present theorem are called global, because the same, unique inverse functions apply to the whole of an open rectangle. This open rectangle may be indefinitely large. For example, in the transformation from plane polar coordinates ρ, ϕ to plane Cartesian coordinates x, y, the open rectangle might be 0 < ρ < ∞ and −π < ϕ < π.

Theorem D.23.1: The Global Inverse Function Theorem

Assume that all points x 1, x 2,…, xN, z 1, z 2,…, zP lie in an open rectangle R , and that

(D.73) Appendix D THE CALCULUS OF MANY VARIABLES

for i = 1,…, N, are a set of continuously differentiable functions of the stated variables. If for all x, z in R , the Jacobian determinant

(D.74) Appendix D THE CALCULUS OF MANY VARIABLES

is nonzero and has a persistent, nested set of critical minors,129 then functions yi = yi(x 1, x 2,…, xN, z 1, z 2,…, zP) can be solved for the inverse functions xj = xj(y 1, y 2,…, yN, z 1, z 2,…, zP) for j = 1,…, N so that

(D.75) Appendix D THE CALCULUS OF MANY VARIABLES

These inverse functions will be unique and continuously differentiable in the range covered by variables y, z as variables x, z range over R .

Proof: This proof is adapted from Volume II, Appendix 18 of Desloge (1982). The proof is by induction. First prove the theorem for N = 1. Then prove that, if the theorem is true for N = K − 1, it must be true for N = K. It follows that the theorem must be true for any integer N.

For N = 1, (∂y(x, z)/∂x) = ∂y 1(x 1, z)/∂x 1. By assumption, this partial derivative is nonzero in a range of values a 1 < x 1 < b 1. Considering the z variables as fixed parameters, the inverse function theorem of ordinary one-variable calculus applies. Since ∂y 1(x 1, z)/∂x 1 is continuous and nonzero, it must have the same sign throughout the range. Thus y 1(x 1, z) is monotonic and has a unique inverse x 1 = x 1(y 1, z). Since (p.557) $\partial x_1(y_1, z)/\partial y_1 = (\partial y_1(x_1, z)/\partial x_1)^{-1}$ it follows that x 1 = x 1(y 1, z) is also a continuously differentiable function.

For N = K, the Jacobian determinant is

(D.76) Appendix D THE CALCULUS OF MANY VARIABLES

Since by assumption this determinant is nonzero, it must have a critical (K − 1)-rowed minor. By assumption, the same minor is nonzero for all x, z in R. For simplicity (and without loss of generality since the functions and variables can be relabeled in any way) we assume that this is the minor with the first row and first column removed. Then

(D.77) Appendix D THE CALCULUS OF MANY VARIABLES

By the induction assumption, the theorem is true for N = K − 1. So eqn (D.77), and its assumed persistent nested critical minors, imply that inverse functions exist of the form

(D.78) Appendix D THE CALCULUS OF MANY VARIABLES

where the set z 1, z 2,…, zP is now being represented by the single unsubscripted letter z. Substitute eqn (D.78) into y 1 to obtain the compound function y 1 = y 1(x 1, y 2,…, yK, z) defined by

(D.79) Appendix D THE CALCULUS OF MANY VARIABLES

We now show that this function can be solved for x 1. Using the chain rule and eqn (D.79), the partial derivative ∂y 1(x 1, y 2,…, yK, z)/∂x 1 can be expanded as

(D.80) Appendix D THE CALCULUS OF MANY VARIABLES

where eqn (D.64) has been used. Now multiply each term in eqn (D.80) by the Jacobian determinant

(D.81) Appendix D THE CALCULUS OF MANY VARIABLES

and use eqn (D.61) to obtain

(D.82) Appendix D THE CALCULUS OF MANY VARIABLES

But eqn (D.64) and the usual rules for sign change when rows or columns of a determinant are exchanged, show the last Jacobian determinant in eqn (D.82) to be

(D.83) Appendix D THE CALCULUS OF MANY VARIABLES

Thus

(D.84) Appendix D THE CALCULUS OF MANY VARIABLES

The right side of eqn (D.84) is just the Jacobian determinant ∂(y 1, y 2,…, yK)/∂(x 1, x 2,…, xK) evaluated using expansion by cofactors along its first row. Thus eqn (D.82) becomes

(D.85) Appendix D THE CALCULUS OF MANY VARIABLES

Since both

(D.86) Appendix D THE CALCULUS OF MANY VARIABLES

by assumption, this proves that the partial derivative

(D.87) Appendix D THE CALCULUS OF MANY VARIABLES

Since R is an open rectangle, eqn (D.87) will hold for any fixed values of y 2,…, yK, z 1, z 2,…, zP and for all a 1 < x 1 < b 1 where a 1 and b 1 are the least and greatest value of x 1 in the open rectangle. Thus, by the same reasoning as was used for the case N = 1 above, y 1 = y 1(x 1, y 2,…, yK, z) may be inverted to yield the unique, continuously differentiable inverse function x 1 = x 1(y 1, y 2,…, yK, z). Substituting that equation into eqn (D.78) gives the desired result: xi = xi(y 1, y 2,…, yK, z) for all i = 1,… K.

Since the truth of the theorem for N = K − 1 proves its truth for N = K, the theorem must be true for any N.

D.24 Local Inverse Function Theorem

In many cases, global inverses provided by Theorem D.23.1 are not needed, and indeed may not be available. In these cases, we can still define local inverse functions. These local inverses will be proved to exist only in some open neighborhood N x, z surrounding point x, z.

Local inverses are important because they may exist at points of a wider class of regions than the R assumed in Theorem D.23.1. And, of course, if a global inverse does exist in a region, then local inverses will exist at each point of that region also.

Theorem D.24.1: The Local Inverse Function Theorem

Assume that point x 1, x 2,…, xN, z 1, z 2,…, zP lies in an open region R, and that

(D.88) $y_i = y_i(x_1, x_2, \ldots, x_N, z_1, z_2, \ldots, z_P) \quad \text{for } i = 1, \ldots, N$

are continuously differentiable functions of the stated variables. If the Jacobian determinant

(D.89) $\left| \dfrac{\partial y(x, z)}{\partial x} \right|$

is nonzero at the point x, z, then there is some open neighborhood Nx, z of this point in which functions yi = yi(x 1, x 2,…, xN, z 1, z 2,…, zP) can be solved for the inverse functions

(D.90) $x_j = x_j(y_1, y_2, \ldots, y_N, z_1, z_2, \ldots, z_P)$

for j = 1,…, N. These inverse functions will be unique and continuously differentiable in the range covered by variables y, z as variables x, z vary over Nx, z.

Proof: Since R is open, every point x, z is in some open neighborhood Nx, z which is contained entirely in R. Since the function y is assumed to be continuously differentiable, the Jacobian eqn (D.74), and all of its minors, are continuous functions of x, z. Thus, since the Jacobian (and hence a set of nested minors) is nonzero at x, z by assumption, we may shrink the open neighborhood Nx, z until these determinants are nonzero over the whole of Nx, z. Since, by definition, the open neighborhood is a (possibly small) open rectangle, the conditions of Theorem D.23.1 are satisfied, and the local inverse functions exist, as was to be proved. A direct proof of this theorem, not based on the global inverse function theorem, is given on page 152 of Courant (1936b).

D.25 Derivatives of the Inverse Functions

If inverse functions, from either the global or local inverse function theorems, exist at some point x, z, then the partial derivatives of the inverse functions xj can be expressed in terms of the partial derivatives of the original functions yi at that point.

Substituting eqn (D.90) into eqn (D.88) gives the compound function

(D.91) $y_i = y_i\big(x_1(y, z), x_2(y, z), \ldots, x_N(y, z), z\big)$

The chain rule then gives

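(D.92) $\delta_{ij} = \dfrac{\partial y_i}{\partial y_j} = \sum_{k=1}^{N} \dfrac{\partial y_i(x, z)}{\partial x_k} \dfrac{\partial x_k(y, z)}{\partial y_j}$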

which in matrix form is

(D.93) $U = \left( \dfrac{\partial y(x, z)}{\partial x} \right) \left( \dfrac{\partial x(y, z)}{\partial y} \right)$

where U denotes the N × N identity matrix. It follows that both product matrices are nonsingular and that

(D.94) $\left( \dfrac{\partial x(y, z)}{\partial y} \right) = \left( \dfrac{\partial y(x, z)}{\partial x} \right)^{-1}$

and hence that, for any i, j

(D.95) $\dfrac{\partial x_j(y, z)}{\partial y_i} = \left( \left( \dfrac{\partial y(x, z)}{\partial x} \right)^{-1} \right)_{ji}$

which expresses the partials of xj with respect to the yi as functions of the partials of the original functions yi(x, z).

Similarly, applying the chain rule to the differentiation of eqn (D.91) with respect to zn gives

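(D.96) $0 = \dfrac{\partial y_i(x, z)}{\partial z_n} + \sum_{k=1}^{N} \dfrac{\partial y_i(x, z)}{\partial x_k} \dfrac{\partial x_k(y, z)}{\partial z_n}$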

which leads to the matrix equation

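(D.97) $0 = \left( \dfrac{\partial y(x, z)}{\partial z} \right) + \left( \dfrac{\partial y(x, z)}{\partial x} \right) \left( \dfrac{\partial x(y, z)}{\partial z} \right)$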

and hence

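(D.98) $\dfrac{\partial x_j(y, z)}{\partial z_n} = -\sum_{i=1}^{N} \left( \left( \dfrac{\partial y(x, z)}{\partial x} \right)^{-1} \right)_{ji} \dfrac{\partial y_i(x, z)}{\partial z_n}$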

which expresses the partials of xj with respect to the zn in terms of the partials of the original functions yi(x, z).
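The statement that the matrix of partials of the inverse functions is the inverse of the matrix of partials of the original functions can be checked numerically. The following is a minimal sketch, not taken from the text: it assumes a hypothetical example map (plane polar to Cartesian coordinates, with N = 2 and no z variables), and the helper names `forward`, `inverse`, `jacobian` and `inverse_2x2` are illustrative only.

```python
import math

def forward(x1, x2):
    # Hypothetical example: x1 = r, x2 = theta mapped to Cartesian (y1, y2)
    return x1 * math.cos(x2), x1 * math.sin(x2)

def inverse(y1, y2):
    # Local inverse, valid away from the origin and the branch cut of atan2
    return math.hypot(y1, y2), math.atan2(y2, y1)

def jacobian(func, a, b, h=1e-6):
    # Central-difference 2x2 matrix with elements d func_i / d arg_j
    a_up, a_dn = func(a + h, b), func(a - h, b)
    b_up, b_dn = func(a, b + h), func(a, b - h)
    return [[(a_up[i] - a_dn[i]) / (2 * h), (b_up[i] - b_dn[i]) / (2 * h)]
            for i in range(2)]

def inverse_2x2(m):
    # Explicit inverse of a nonsingular 2x2 matrix
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

x = (2.0, 0.7)                         # a point with nonzero Jacobian determinant
y = forward(*x)
dy_dx = jacobian(forward, *x)          # partials of the original functions
dx_dy = jacobian(inverse, *y)          # partials of the inverse functions
dx_dy_expected = inverse_2x2(dy_dx)    # matrix inverse of the original partials

max_err = max(abs(dx_dy[i][j] - dx_dy_expected[i][j])
              for i in range(2) for j in range(2))
print(max_err)
```

The two matrices agree to within the accuracy of the finite differences, as the matrix relation above requires.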

D.26 Implicit Function Theorem

It often happens that a set of functions yj = yj(x 1,x 2,…,xP), for j = 1,…,N, is not given directly, but rather in implicit form. One defines other functions fi(x, y), for i = 1,…,N, and requires all x,y values to be those that will make these fi identically zero,

(D.99) $f_i(x_1, x_2, \ldots, x_P, y_1, y_2, \ldots, y_N) = 0$

The following theorem gives the conditions under which such identities actually specify the implicit functions yj.

Theorem D.26.1: Implicit Function Theorem

Assume that fi(x 1, x 2,…, xP, y 1, y 2,…, yN) for i = 1,…, N, are continuously differentiable functions. If the Jacobian determinant

(D.100) $\left| \dfrac{\partial f(x, y)}{\partial y} \right|$

is nonzero at point x, y, then there is an open neighborhood Nxy of the point x, y in which the identities

(D.101) $f_i(x_1, x_2, \ldots, x_P, y_1, y_2, \ldots, y_N) = 0$

for i = 1,…, N, can be solved for the implicit functions yj = yj(x 1, x 2,…, xP) for j = 1,…, N. These functions will be unique and continuously differentiable in the open neighborhood.

Proof: Apply the local inverse function theorem, Theorem D.24.1, to solve fi = fi(x 1, x 2,…, xP, y 1, y 2,…, yN) for yj = yj(f 1,…, fN, x 1,…, xP). Then apply the identity to set fi = 0 for each i, and so obtain

(D.102) $y_j = y_j(0, \ldots, 0, x_1, \ldots, x_P) = y_j(x_1, \ldots, x_P)$

which are the desired functions.

D.27 Derivatives of Implicit Functions

Applying eqn (D.98) of Section D.25 with the replacements y → f, x → y, z → x, and with f then set equal to zero, gives the partial derivatives of the implicit functions in terms of the partial derivatives of the fi. Thus, for all j = 1,…, N and n = 1,…, P,

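(D.103) $\dfrac{\partial y_j(x)}{\partial x_n} = -\sum_{i=1}^{N} \left( \left( \dfrac{\partial f(x, y)}{\partial y} \right)^{-1} \right)_{ji} \dfrac{\partial f_i(x, y)}{\partial x_n}$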
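For a single implicit relation (N = 1, P = 1) the rule reduces to dy/dx = −(∂f/∂x)/(∂f/∂y). The sketch below checks this numerically, assuming a hypothetical example f(x, y) = x² + y² − 1 (the unit circle) and comparing with the explicitly solved upper branch; the helper names are illustrative, not from the text.

```python
import math

def f(x, y):
    # Hypothetical implicit relation f(x, y) = 0: the unit circle
    return x * x + y * y - 1.0

def partial(func, args, k, h=1e-6):
    # Central-difference partial derivative of func with respect to argument k
    up, dn = list(args), list(args)
    up[k] += h
    dn[k] -= h
    return (func(*up) - func(*dn)) / (2 * h)

def y_explicit(x):
    # Explicit solution on the upper branch, where df/dy = 2y is nonzero
    return math.sqrt(1.0 - x * x)

x0 = 0.6
y0 = y_explicit(x0)

# Derivative of the implicit function from the partials of f
dy_dx_implicit = -partial(f, (x0, y0), 0) / partial(f, (x0, y0), 1)

# Direct derivative of the explicit solution, for comparison
dy_dx_direct = (y_explicit(x0 + 1e-6) - y_explicit(x0 - 1e-6)) / 2e-6

print(dy_dx_implicit, dy_dx_direct)
```

Both values agree with −x/y at the chosen point. The check fails, as it should, at y = 0, where ∂f/∂y vanishes and the implicit function theorem does not apply.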

D.28 Functional Independence

Consider the M continuously differentiable functions of N variables fk(x 1, x 2,…, xN) for k = 1,…, M. These functions are functionally dependent and have a dependency relation at point x if there is a continuously differentiable function F with at least one nonzero partial derivative ∂ F(f)/∂ fi ≠ 0 for which

(D.104) $F(f_1, f_2, \ldots, f_M) = 0$

holds identically in an open neighborhood Nx of point x.

If the functions have no dependency relation at point x, they are said to be functionally independent at that point. We now give a condition for a set of functions fk to be functionally independent.

Theorem D.28.1: Condition for Functional Independence

Consider M continuously differentiable functions of N variables, fk(x 1, x 2,…, xN) for k = 1,…, M. Let the M × N matrix (∂f(x)/∂x) be defined by its matrix elements

(D.105) $\left( \dfrac{\partial f(x)}{\partial x} \right)_{ki} = \dfrac{\partial f_k(x)}{\partial x_i}$

If, for some x, the rank r of this matrix is r = M (which is possible only if M ≤ N since r cannot be greater than N), then the functions fk are functionally independent at x.

Proof: We show that the existence of a dependency relation implies that r < M, and therefore that r = M implies functional independence. Differentiating an assumed dependency relation eqn (D.104) with respect to xi using the chain rule gives

(D.106) $\sum_{k=1}^{M} \dfrac{\partial F(f)}{\partial f_k} \dfrac{\partial f_k(x)}{\partial x_i} = 0$

Defining an M × 1 column vector [∂F(f)/∂f] by

(D.107) $\left[ \dfrac{\partial F(f)}{\partial f} \right]_k = \dfrac{\partial F(f)}{\partial f_k}$

we may rewrite eqn (D.106) as the matrix equation

(D.108) $\left( \dfrac{\partial f(x)}{\partial x} \right)^{\mathrm{T}} \left[ \dfrac{\partial F(f)}{\partial f} \right] = [0]$

By assumption the column vector [∂ F(f)/∂f] will have at least one nonzero element. Thus Corollary B.19.2 requires that the matrix in eqn (D.108), and hence its transpose (∂f(x)/∂x), must have rank less than M. Since the existence of a dependency relation implies r < M, it follows that r = M at point x implies the non-existence of a dependency relation, as was to be proved.
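For M = N = 2 the rank condition is simply that the 2 × 2 determinant of partials be nonzero. A minimal numerical sketch, assuming two hypothetical pairs of functions chosen for illustration: the pair (x + y, x − y), which has rank 2, and the pair (x + y, (x + y)²), which obeys the dependency relation F(f) = f1² − f2 = 0 and so must have rank less than 2.

```python
def f_sum(x, y):
    return x + y

def f_diff(x, y):
    return x - y

def f_sum_squared(x, y):
    # Dependent on f_sum through F(f1, f2) = f1**2 - f2 = 0
    return (x + y) ** 2

def jacobian_det(f1, f2, x, y, h=1e-6):
    # Determinant of the 2x2 matrix of partials, by central differences
    d11 = (f1(x + h, y) - f1(x - h, y)) / (2 * h)
    d12 = (f1(x, y + h) - f1(x, y - h)) / (2 * h)
    d21 = (f2(x + h, y) - f2(x - h, y)) / (2 * h)
    d22 = (f2(x, y + h) - f2(x, y - h)) / (2 * h)
    return d11 * d22 - d12 * d21

# Nonzero determinant: rank 2, so the pair is functionally independent
det_independent = jacobian_det(f_sum, f_diff, 1.0, 2.0)

# Vanishing determinant: rank below 2, consistent with the dependency relation
det_dependent = jacobian_det(f_sum, f_sum_squared, 1.0, 2.0)

print(det_independent, det_dependent)
```

Note that the theorem runs in one direction only: full rank guarantees independence, while the existence of a dependency relation forces the rank below M.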

D.29 Dependency Relations

When functions are functionally dependent, there may be one or more dependency relations among them. It is often important to know how many dependency relations there are, as specified in the following theorem.

Theorem D.29.1: Dependency Relations

Consider M continuously differentiable functions,

(D.109) $f_k = f_k(x_1, x_2, \ldots, x_N) \quad \text{for } k = 1, \ldots, M$

of N variables x = x 1, x 2,…, xN. Consider again the M × N matrix (∂f(x)/∂x) defined by its matrix elements in eqn (D.105). If at the point x the rank r of this matrix is r < M, then there are M − r functionally independent dependency relations among the fk which hold in an open neighborhood Nx of point x,

(D.110) $F_a(f_1, f_2, \ldots, f_M) = 0 \quad \text{for } a = 1, \ldots, M - r$

For a proof of this theorem, see Volume I, Appendix 14 of Desloge (1982).

D.30 Legendre Transformations

We often have a function like f(x 1,…, xM, y 1,…, yN) = f(x, y) that is important mainly because its partial derivatives yield desired functions ui = ui(x, y) and wj = wj(x, y), as in

(D.111) $u_i(x, y) = \dfrac{\partial f(x, y)}{\partial x_i} \quad \text{and} \quad w_j(x, y) = \dfrac{\partial f(x, y)}{\partial y_j}$

for i = 1,…, M and j = 1,…, N.

It is sometimes useful to have a different, but related, function

(D.112) $g(x_1, \ldots, x_M, w_1, \ldots, w_N) = g(x, w)$

whose partial derivatives are

(D.113) $u_i(x, w) = -\dfrac{\partial g(x, w)}{\partial x_i} \quad \text{and} \quad y_j(x, w) = \dfrac{\partial g(x, w)}{\partial w_j}$

again for i = 1,…, M and j = 1,…, N. Notice that the roles of yj and wj are interchanged in the last partial derivatives in eqns (D.111, D.113), while the partials with respect to xi yield the same function ui, except for the minus sign and its expression in the new x, w variable set. This transformation from f to g is called a Legendre transformation.

The Legendre transformation is effected by defining g to be

(D.114) $g(x, y) = \sum_{j=1}^{N} y_j w_j(x, y) - f(x, y)$

But the g in eqn (D.114) is not yet expressed in the correct set of variables x, w. In order to complete the Legendre transformation, we must prove that the functions wj = wj(x, y) defined in the second of eqn (D.111) can be inverted to give yj = yj(x, w). By Section D.24, the condition for this inversion to be possible is that

(D.115) $\left| \dfrac{\partial w(x, y)}{\partial y} \right| \neq 0$

where the matrix (∂w(x, y)/∂y) is defined by

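(D.116) $\left( \dfrac{\partial w(x, y)}{\partial y} \right)_{jk} = \dfrac{\partial w_j(x, y)}{\partial y_k} = \dfrac{\partial^2 f(x, y)}{\partial y_k \, \partial y_j}$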

In general, eqn (D.115) must be proved in each individual case to which the Legendre transformation is to be applied. Assume now that this has been done.

The inverse function yj = yj(x, w) then allows us to write g as a compound function of the variable set x, w as

(D.117) $g(x, w) = \sum_{j=1}^{N} y_j(x, w) \, w_j - f\big(x, y(x, w)\big)$

Now consider the differential of function g. Equation (D.114) gives

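(D.118) $dg = \sum_{j=1}^{N} \left( y_j \, dw_j + w_j \, dy_j \right) - \sum_{i=1}^{M} \dfrac{\partial f}{\partial x_i} dx_i - \sum_{j=1}^{N} \dfrac{\partial f}{\partial y_j} dy_j = \sum_{j=1}^{N} y_j \, dw_j - \sum_{i=1}^{M} u_i \, dx_i$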

where the second of eqn (D.111) has been used to cancel the dyj terms, and the first of eqn (D.111) has been used to get the ui. We know from eqn (D.117) that g(x, w) exists and is well defined, and hence has the differential

(D.119) $dg = \sum_{i=1}^{M} \dfrac{\partial g(x, w)}{\partial x_i} dx_i + \sum_{j=1}^{N} \dfrac{\partial g(x, w)}{\partial w_j} dw_j$

Equations (D.118, D.119) are two expressions for the same differential dg and hence are equal to each other. Since |∂w(x, y)/∂y| ≠ 0 by assumption, Theorem D.18.5 shows that the differentials dw, dx may be considered as arbitrary and independent. Thus the equality of eqns (D.118, D.119) implies the equality of each of the coefficients of the differentials dwj and dxi in the two equations. Thus eqn (D.113) holds, as desired.
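The interchange of roles between y and w, and the sign change in the x derivative, can be verified on a small example. The sketch below assumes a hypothetical function f(x, y) = ½(1 + x²)y² − x³ with M = N = 1, chosen because w = ∂f/∂y = (1 + x²)y is invertible for every x. It takes g to be y w − f, which is the sign convention implied by the minus sign on ui noted above; all names are illustrative.

```python
def f(x, y):
    # Hypothetical example function, with M = N = 1
    return 0.5 * (1.0 + x * x) * y * y - x ** 3

def w_of(x, y):
    # w = df/dy; its y derivative (1 + x^2) never vanishes
    return (1.0 + x * x) * y

def y_of(x, w):
    # Inverse of w_of at fixed x
    return w / (1.0 + x * x)

def g(x, w):
    # Legendre transform, assumed here to be g = y w - f, written in x, w
    y = y_of(x, w)
    return y * w - f(x, y)

def partial(func, a, b, k, h=1e-6):
    # Central-difference partial of a two-argument function (k = 0 or 1)
    if k == 0:
        return (func(a + h, b) - func(a - h, b)) / (2 * h)
    return (func(a, b + h) - func(a, b - h)) / (2 * h)

x0, y0 = 0.5, 1.2
w0 = w_of(x0, y0)

u0 = partial(f, x0, y0, 0)       # u = df/dx at fixed y
dg_dx = partial(g, x0, w0, 0)    # expected to equal -u
dg_dw = partial(g, x0, w0, 1)    # expected to equal y

print(u0, dg_dx, dg_dw)
```

The x derivative of g is taken at fixed w, while that of f is taken at fixed y; the two differ only by sign when evaluated at corresponding points.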

D.31 Homogeneous Functions

Let f (x 1, x 2, …, xN, z 1, z 2, …, zP) = f(x, z) be a continuously differentiable function of the stated variables, defined over a region R. Function f(x, z) is homogeneous of degree k in the set of variables x 1, x 2, …, xN if and only if, for some k and any positive number λ > 0,

(D.120) $f(\lambda x_1, \lambda x_2, \ldots, \lambda x_N, z) = \lambda^k f(x_1, x_2, \ldots, x_N, z)$

An alternate, and equivalent, definition of homogeneous functions is given in the following theorem, whose proof can be found in Chapter II of Courant (1936b).

Theorem D.31.1: Euler Condition

Function f(x, z) is homogeneous of degree k in the set of variables x1, x2, …, xN as defined in eqn (D.120) if and only if

(D.121) $\sum_{i=1}^{N} x_i \dfrac{\partial f(x, z)}{\partial x_i} = k f(x, z)$
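Both forms of the definition can be checked numerically on a simple case. The sketch below assumes a hypothetical function f(x1, x2, z) = z(x1² x2 + x2³), which is homogeneous of degree k = 3 in x1, x2 but not in z; the function and variable names are illustrative.

```python
def f(x1, x2, z):
    # Hypothetical example: homogeneous of degree 3 in (x1, x2), z is a spectator
    return z * (x1 * x1 * x2 + x2 ** 3)

def euler_sum(x1, x2, z, h=1e-6):
    # Sum over i of x_i * df/dx_i, with central-difference partials
    df_dx1 = (f(x1 + h, x2, z) - f(x1 - h, x2, z)) / (2 * h)
    df_dx2 = (f(x1, x2 + h, z) - f(x1, x2 - h, z)) / (2 * h)
    return x1 * df_dx1 + x2 * df_dx2

x1, x2, z = 1.5, -0.5, 2.0
lam, k = 1.7, 3

value = f(x1, x2, z)
scaled = f(lam * x1, lam * x2, z)     # scaling definition: lam**k * value
euler = euler_sum(x1, x2, z)          # Euler condition: k * value

print(scaled, lam ** k * value, euler, k * value)
```

Only the x variables are scaled; z is held fixed in both checks.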

D.32 Derivatives of Homogeneous Functions

In Lagrangian mechanics, it is important also to consider the homogeneity of the partial derivatives of homogeneous functions.

Theorem D.32.1: Derivatives of Homogeneous Functions

Function f is homogeneous of degree k + 1 in the set of variables x1, x2, …, xN, that is

(D.122) $f(\lambda x, z) = \lambda^{k+1} f(x, z)$

if and only if all partial derivatives ∂f(x, z)/∂xi = gi(x, z) for i = 1, …, N are homogeneous of degree k, that is

(D.123) $g_i(\lambda x, z) = \lambda^{k} g_i(x, z)$

Proof: First, we assume eqn (D.122) and prove eqn (D.123). Taking partial derivatives, and using eqn (D.122) gives

(D.124) Appendix D THE CALCULUS OF MANY VARIABLES

which establishes eqn (D.123).

Conversely, assume eqn (D.123) and prove eqn (D.122). Equation (D.123) can be written

(D.125) Appendix D THE CALCULUS OF MANY VARIABLES

Hence

(D.126) Appendix D THE CALCULUS OF MANY VARIABLES

and so

(D.127) Appendix D THE CALCULUS OF MANY VARIABLES

It follows from Corollary D.10.2 that f(λx, z) − λ^(k+1) f(x, z) = C. Setting λ = 1 proves that the constant C = 0, which is equivalent to eqn (D.122).

D.33 Stationary Points

For functions of one variable f = f(x), maxima, minima, and points of inflection (collectively called stationary points) are those points at which df(x)/dx = 0.

For functions of many variables, stationary points may be defined similarly as points at which, for all k = 1, …, N,

(D.128) $\dfrac{\partial f(x_1, x_2, \ldots, x_N)}{\partial x_k} = 0$

Since the variables x and hence their differentials dxk for k = 1, …, N are assumed independent, Lemma D.18.4 implies that this definition may be stated more simply, and equivalently, as

(D.129) $df = \sum_{k=1}^{N} \dfrac{\partial f(x)}{\partial x_k} dx_k = 0$

Since we know that this differential approximates the difference Δf when the differentials dxj are small, it is also valid to say that the extremum is a point such that f is constant to first order for small excursions from it in any direction.

In one sense, the many variable case is much more complex than the single variable one. A function of many variables may have maxima in some variables, minima in others, etc. We avoid this complexity here by assuming that it will be clear from context in most practical examples whether the stationary point is a maximum, minimum, or some more complicated mixture of the two.

D.34 Lagrange Multipliers

Often, we want to find the stationary points of a function f(x) given certain constraints on the allowed values of x 1, x 2, …, xN and their differentials. These constraints are expressed by defining a functionally independent set of functions Ga and setting them equal to zero, 0 = Ga(x 1, x 2, …, xN) for a = 1, …, C.

For example, with

(D.130) Appendix D THE CALCULUS OF MANY VARIABLES

defined to be the distance from the origin in a three-dimensional Cartesian space, eqn (D.128) gives a stationary point at (0, 0, 0), obviously a minimum. But suppose that we want the stationary points of f subject to the constraint that x lie on a plane at distance Λ from the origin, which can be expressed by

(D.131) $G_1(x_1, x_2, x_3) = \alpha x_1 + \beta x_2 + \gamma x_3 - \Lambda = 0$

where constants α, β, γ obey α² + β² + γ² = 1. The Lagrange multiplier theorem gives an elegant method for solving such problems.

Theorem D.34.1: Lagrange Multiplier Theorem

Let the values of the independent variables x 1, x 2,…, xN in some open region R be constrained by equations of the form

(D.132) $G_a(x_1, x_2, \ldots, x_N) = 0$

for a = 1,…, C, where the Ga are continuously differentiable functions. Assume that the C × N Jacobian matrix g defined by

(D.133) $g_{ak} = \dfrac{\partial G_a(x)}{\partial x_k}$

has rank C so that the functions Ga are functionally independent and represent C independent constraints.

A continuously differentiable function f(x 1, x 2,…, xN) has a stationary point at x, subject to these constraints, if and only if there exist Lagrange multipliers λa = λa(x) such that, at the stationary point,

(D.134) $\dfrac{\partial f(x)}{\partial x_k} = \sum_{a=1}^{C} \lambda_a \dfrac{\partial G_a(x)}{\partial x_k}$

for all k = 1,…, N.

Proof: The necessary and sufficient condition for a stationary point is that df = 0 subject to the constraints on the possible dxk values given by the vanishing of the differentials of eqn (D.132)

(D.135) $dG_a = \sum_{k=1}^{N} g_{ak} \, dx_k = 0$

for all a = 1,…, C. Since the matrix g defined in eqn (D.133) has rank C, it must have a C-rowed critical minor. The variables x can be relabeled in any way, so we lose no generality by assuming that this critical minor is the determinant containing the C rows and the last C columns. Call the corresponding matrix g(b), so that $g^{(b)}_{aj} = g_{a(N-C+j)}$ for a, j = 1,…, C. By construction, the determinant of this matrix is a critical minor, and so |g(b)| ≠ 0. In the following, we will also use the shorthand notations x(f) for what will be called the free variables $x_1, \ldots, x_{N-C}$ and x(b) for the bound variables $x_{N-C+1}, \ldots, x_N$. And x will continue to denote the full set of variables x1, x2,…, xN.

Using these notations, eqn (D.135) may be written with separate sums over the free and bound variables

(D.136) Appendix D THE CALCULUS OF MANY VARIABLES

Since g(b) is nonsingular, it has an inverse g(b)−1, which may be used with eqn (D.136) to write the bound differentials dx (b) in terms of the free ones.

(D.137) Appendix D THE CALCULUS OF MANY VARIABLES

The differential df can also be written with separate sums over the free and bound variables

(D.138) Appendix D THE CALCULUS OF MANY VARIABLES

Substituting eqn (D.137) into this expression gives

(D.139) Appendix D THE CALCULUS OF MANY VARIABLES

where the λa in the last expression are defined, for a = 1,…,C, as

(D.140) Appendix D THE CALCULUS OF MANY VARIABLES

The solution, eqn (D.137), for the bound differentials dx (b) reduces eqn (D.135) to an identity. Hence no constraint is placed on the free differentials dx(f). Setting these free differentials nonzero one at a time in eqn (D.139), the condition df = 0 for x to be a stationary point implies and is implied by

(D.141) Appendix D THE CALCULUS OF MANY VARIABLES

for all i = 1,…, (N − C), which establishes the theorem for those values of the index.

For indices in the range (N − C + 1),…, N, with the same choice of λa as defined in eqn (D.140), eqn (D.134) is satisfied identically at all points, including the stationary one. To demonstrate this, let j = 1,…, C and use eqn (D.140) to write

(D.142) Appendix D THE CALCULUS OF MANY VARIABLES

Thus eqn (D.134) holds for all index values if and only if x is a stationary point, as was to be proved.

Note that the N equations of eqn (D.134) together with the C equations of eqn (D.132), are N + C equations in the N + C unknowns x 1, x 2,…, xN, λ1,…, λC, and so can be solved to find the stationary points.
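As an illustration of this counting, the plane example of this section can be solved as N + C = 4 linear equations in the unknowns x1, x2, x3, λ1. The sketch below makes two assumptions that are not from the text: f is taken to be the squared distance x1² + x2² + x3² (a smooth stand-in with the same constrained stationary point as the distance), and the multiplier condition is written as ∂f/∂xk = λ1 ∂G1/∂xk. The small Gaussian-elimination helper `solve_linear` is hypothetical.

```python
def solve_linear(a, b):
    # Gaussian elimination with partial pivoting for a small dense system
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

# Unit normal (alpha, beta, gamma) of the plane and its distance from the origin
alpha, beta, gamma = 1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0
big_lambda = 3.0

# Rows 1-3: 2 x_k - lambda_1 * dG_1/dx_k = 0  (assumed f = squared distance)
# Row 4:    alpha x1 + beta x2 + gamma x3 = Lambda  (the constraint)
matrix = [
    [2.0, 0.0, 0.0, -alpha],
    [0.0, 2.0, 0.0, -beta],
    [0.0, 0.0, 2.0, -gamma],
    [alpha, beta, gamma, 0.0],
]
rhs = [0.0, 0.0, 0.0, big_lambda]

x1, x2, x3, lam1 = solve_linear(matrix, rhs)
print(x1, x2, x3, lam1)
```

The stationary point found is the foot of the perpendicular from the origin to the plane, at Λ times the unit normal, as the geometry suggests.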

D.35 Geometry of the Lagrange Multiplier Theorem

When applied to the simple example in eqns (D.130, D.131) of Section D.34, the three equations of the Lagrange multiplier condition eqn (D.134) can be written as a single vector equation, the equality of two gradient vectors,

(D.143) $\nabla f = \lambda_1 \nabla G_1$

Since, as described in Section D.37, vector ∇G 1 = αê 1 + βê 2 + γ ê 3 is perpendicular to the surface of constraint G 1 = 0, eqn (D.143) says that ∇ f must be in the same or opposite direction, and so also perpendicular to that constraint surface.

If we denote by d r = dx 1 ê 1 + dx 2 ê 2 + dx 3 ê 3 the differential displacement vector whose Cartesian components are the differentials dxi, the chain rule gives

(D.144) $df = d\mathbf{r} \cdot \nabla f \quad \text{and} \quad dG_1 = d\mathbf{r} \cdot \nabla G_1$

The constraint G 1 = 0, and hence dG 1 = 0, constrains the vector d r to have a zero dot product with ∇G 1, and so to be perpendicular to the perpendicular to the surface of constraint. Thus d r must lie in the surface of constraint.

If there were no constraint, the condition for f to have a stationary point would simply be that

(D.145) $df = d\mathbf{r} \cdot \nabla f = 0$

Since without a constraint the displacement d r can take any direction, eqn (D.145) implies that ∇ f = 0, which is equivalent to the three equations of eqn (D.128).

However, with the constraint, eqn (D.145) does not imply that ∇ f = 0, but only that ∇ f must have no components in possible d r directions. Since d r can lie anywhere in the surface of constraint, this means that ∇ f must be perpendicular to that surface, in other words that ∇ f must be parallel or anti-parallel to ∇G 1, as eqn (D.143) states.

If another constraint were added, then eqn (D.143) would become

(D.146) $\nabla f = \lambda_1 \nabla G_1 + \lambda_2 \nabla G_2$

Adding the second constraint further restricts the possible directions of d r and hence increases the possible directions that ∇ f can have while still maintaining the condition d r · ∇ f = 0.

D.36 Coupled Differential Equations

A basic theorem of one-variable calculus is that a first-order differential equation dx/dβ = f(β, x), with the initial condition that x = b when the independent variable is β = β0, has a unique solution x = x(β, b). That result is generalized to a set of N functions of β by the following theorem.

Theorem D.36.1: Coupled Differential Equations

Consider a set of N unknown functions xi for i = 1, …, N obeying the coupled, first order differential equations

(D.147) $\dfrac{dx_i}{d\beta} = f_i(\beta, x_1, x_2, \ldots, x_N)$

and the initial conditions that xi = bi when the independent variable is β = β0 (where bi for i = 1, …, N are arbitrarily chosen constants). Assume that the functions fi are continuously differentiable. These equations have a unique solution depending on β and the set of constants b 1, …, bN. The solution is the set of equations

(D.148) $x_i = x_i(\beta, b_1, \ldots, b_N)$

for i = 1, …, N. The same solution can also be written in implicit form by solving eqn (D.148) for the b and writing

(D.149) $b_i = \phi_i(\beta, x_1, \ldots, x_N)$

where the ϕi are functionally independent, and are called integrals of the set of differential equations.

The proof of this standard theorem can be found, for example, in Chapter 6 of Ford (1955). Note that the initial value of the independent variable β0 is not included in the list of constants upon which the xi depend. It is simply the value of β at which the integration constants bi are specified.

The special case in which the functions fi in eqn (D.147) do not depend explicitly on β is of great importance in analytical mechanics.

Theorem D.36.2: Sets of Equations Without the Independent Variable

Consider the set of differential equations which are the same as eqn (D.147) but with the independent variable β not appearing explicitly in the functions fi

(D.150) $\dfrac{dx_i}{d\beta} = f_i(x_1, x_2, \ldots, x_N)$

for i = 1, …, N. Assume that N ≥ 2 and that there is some index l for which fl ≠ 0 in some region of interest. For simplicity, but without loss of generality, assume the variables relabeled so that l = 1. Given the initial conditions that, for j = 2, …, N, xj = bj when x 1 = b 1, these equations have a unique solution

(D.151) $x_j = x_j(x_1, b_2, \ldots, b_N)$

for j = 2, …, N, in which the role of independent variable has been assumed by x 1 and the number of arbitrary integration constants has been reduced by one.

This solution also can be written in implicit form by solving eqn (D.151) for the b and writing

(D.152) $b_j = \phi_j(x_1, x_2, \ldots, x_N)$

for j = 2, …, N. The ϕj are the functionally independent integrals of eqn (D.150). There are only N − 1 integrals ϕj, and they do not depend explicitly on β.

Proof: Assuming f 1 ≠ 0, divide the last N − 1 of eqn (D.150) by the first one, giving

(D.153) $\dfrac{dx_j}{dx_1} = \dfrac{f_j(x_1, \ldots, x_N)}{f_1(x_1, \ldots, x_N)} = A_j(x_1, \ldots, x_N)$

for j = 2, …, N. The functions Aj defined in eqn (D.153) are continuously differentiable, and hence Theorem D.36.1 can be applied with f → A, the replacement β → x 1 for the independent variable, and the range of unknown functions now running from x 2 to xN. The theorem follows immediately.

Note that the initial value x 1 = b 1 for the new independent variable is not included among the arbitrary integration constants b 2, …, bN. As in the original theorem with β0, it is simply the value of the new independent variable x 1 at which the arbitrary integration constants xj = bj, for j = 2, …, N, are specified.

The variable β has disappeared from the solutions, eqns (D.151, D.152). Instead of having all the xi as functions of some β, we have solutions with all of the xj for j ≠ 1 given as functions of one of them, the x 1.

But sometimes, even under the conditions of Theorem D.36.2, it is convenient to have a solution with all of the unknowns written as functions of β. Such a parametric expression can always be found, as is shown in the following Corollary.

Corollary D.36.3: Recovery of the Parameter

Assume that the conditions in Theorem D.36.2 hold. Using solutions eqn (D.151), a set of functions

(D.154) $x_i = x_i\big((\beta - \beta_0), b_1, \ldots, b_N\big)$

for i = 1, …, N, can always be found that are solutions of the original equations, eqn (D.150). They depend only on the difference (β − β0) as indicated. Thus no generality is lost by simply taking β0 = 0, as is frequently done. The solutions in eqn (D.154) can be constructed to obey the initial conditions that β = β0 implies xi = bi for all i = 1, …, N.

Proof: The first of eqn (D.150) is

(D.155) Appendix D THE CALCULUS OF MANY VARIABLES

where eqn (D.151) has been used to get the last equality. Thus we have a differential equation with one unknown function x 1

(D.156) Appendix D THE CALCULUS OF MANY VARIABLES

Integrating, with x 1 assigned the value b 1 at β = β0,

(D.157) Appendix D THE CALCULUS OF MANY VARIABLES

Denoting by F −1 the inverse to function F for fixed values of the constants b then gives

(D.158) Appendix D THE CALCULUS OF MANY VARIABLES

which has, by construction, the value x 1 = b 1 when β = β0. Substituting this x 1 into the solutions eqn (D.151) then gives

(D.159) Appendix D THE CALCULUS OF MANY VARIABLES

for j = 2, …, N. Since the solutions eqn (D.151) have the property that x 1 = b 1 implies xj = bj, the xj defined in eqn (D.159) must take those same values bj when β = β0. Equations (D.158, D.159) are the desired eqn (D.154) and the corollary is proved.

A note of caution: Even though eqn (D.154) does write the solution to eqn (D.150) in a form that appears to have N arbitrary integration constants b 1, …, bN, the b 1 is not actually an integration constant. It is the initial value of the independent variable x 1 in solution eqn (D.151).
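A small numerical experiment can make these statements concrete. The sketch below assumes a hypothetical autonomous system with N = 2, dx1/dβ = x2 and dx2/dβ = −x1, integrated by a classical fourth-order Runge-Kutta step (a method chosen here for convenience, not one used in the text). For this system the combination x1² + x2² stays constant along a solution and does not involve β, and the parametric solution depends only on the elapsed β − β0.

```python
import math

def rhs(x1, x2):
    # Hypothetical autonomous system: the right sides do not contain beta
    return x2, -x1

def rk4_step(x1, x2, h):
    # One classical Runge-Kutta step of size h in the parameter beta
    k1 = rhs(x1, x2)
    k2 = rhs(x1 + 0.5 * h * k1[0], x2 + 0.5 * h * k1[1])
    k3 = rhs(x1 + 0.5 * h * k2[0], x2 + 0.5 * h * k2[1])
    k4 = rhs(x1 + h * k3[0], x2 + h * k3[1])
    return (x1 + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6,
            x2 + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6)

def phi(x1, x2):
    # Quantity conserved by this particular system; beta does not appear
    return x1 * x1 + x2 * x2

b1, b2 = 1.0, 0.5          # initial values at beta = beta0
span, steps = 0.3, 300     # span = beta - beta0; x2 stays nonzero over it

x1, x2 = b1, b2
for _ in range(steps):
    x1, x2 = rk4_step(x1, x2, span / steps)

# Known closed-form solution, a function of (beta - beta0) and b1, b2 only
exact_x1 = b1 * math.cos(span) + b2 * math.sin(span)
exact_x2 = -b1 * math.sin(span) + b2 * math.cos(span)
integral_drift = abs(phi(x1, x2) - phi(b1, b2))

print(x1, exact_x1, x2, exact_x2, integral_drift)
```

The span is kept short so that f1 = x2 remains nonzero, which is the condition assumed in Theorem D.36.2 for x1 to serve as the independent variable.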

D.37 Surfaces and Envelopes

A two-dimensional surface in three dimensions may be written either as

(D.160) $z = \phi(x, y) \quad \text{or} \quad F(x, y, z) = 0$

Any surface in the first form can be written in the second form as F(x, y, z) = ϕ(x, y) − z = 0. The second form is preferable since it often leads to simpler expressions. For example, writing the unit sphere as F(x, y, z) = x² + y² + z² − 1 = 0 avoids the need to use different signs of a square root for the upper and lower hemispheres as the first form would require.

The differential change in F resulting from differential displacement d r is given by the chain rule as dF = d r · ∇F, which is maximum when d r∥∇F and zero when d r⊥∇F. Since a small displacement along the surface keeps F = 0 and so must have dF = 0, it follows that vector ∇F is perpendicular to the surface at point x, y, z. The tangent plane touching the surface at r thus consists of the points r′ where

(D.161) $(\mathbf{r}' - \mathbf{r}) \cdot \nabla F = 0$

and ∇F is normal to the tangent plane.

Consider now what is called a one-parameter family of surfaces, defined by

(D.162) $F(x, y, z, a) = 0$

where each different value of parameter a in general gives a different surface. Often (but not always) the two surfaces with parameter values a + da and a − da in the limit da → 0 will intersect in a curved line called a curve of intersection. This is also called a characteristic curve by some authors, but we use the term “curve of intersection” from Courant and Hilbert (1962) to avoid confusion with other curves in the theory of partial differential equations that are also called characteristic curves.

The curve of intersection lies in both surfaces and hence obeys both F(x, y, z, a + da) = 0 and F(x, y, z, a − da) = 0. But these two equations are satisfied if and only if both

(D.163) Appendix D THE CALCULUS OF MANY VARIABLES

Taking the limit da→0, the curve of intersection can therefore be defined as the solution of the two equations

(D.164) $F(x, y, z, a) = 0 \quad \text{and} \quad \dfrac{\partial F(x, y, z, a)}{\partial a} = 0$

It is the intersection of the surfaces F = 0 and G = 0 where

(D.165) $G(x, y, z, a) = \dfrac{\partial F(x, y, z, a)}{\partial a}$

As an example, consider the family of spheres F = x² + y² + z² − a² = 0. Since the surfaces in this family are concentric spheres, the surfaces with a + da and a − da never intersect and so there is no curve of intersection.

A better example for our purposes is the family of unit spheres with center at point a on the z-axis, F = x² + y² + (z − a)² − 1 = 0. Then ∂F/∂a = −2(z − a) = 0 shows that the curve of intersection for a given a is the intersection of the plane z = a with the sphere x² + y² + (z − a)² − 1 = 0. It is a unit circle lying in a plane parallel to the x-y plane and at height z = a. This circle is also the equator of the sphere for that a value.

A suitable one-parameter family of surfaces has a curve of intersection for every value of a. The envelope of the one-parameter family is a surface defined as the set of all points of all possible curves of intersection. It may be thought of as the surface swept out by the curve of intersection as a is varied. It is found by solving the second of eqn (D.164) for a as a function of x, y, z and substituting this a(x, y, z) into the first of eqn (D.164), thus eliminating a between the two equations. For a given value of a, the surface F = 0 and the envelope are in contact along the curve of intersection defined by that a value.

In the above example, the envelope is obtained by using ∂F/∂a = −2(z − a) = 0 to get a(x, y, z) = z and then substituting that a into F = x² + y² + (z − a)² − 1 = 0 to obtain x² + y² − 1 = 0. The envelope is thus a right circular cylinder of unit radius with its symmetry line along the z-axis. This envelope is the surface swept out by all of the curves of intersection (circles of unit radius at height a) as a is varied. For any value of a, the sphere is in contact with the envelope along the equator of the sphere. The sphere thus contacts the envelope along the curve of intersection for that a value.
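The example can be checked point by point. The following sketch evaluates F, a finite-difference estimate of ∂F/∂a, and the cylinder equation at two sample points of the sphere with a = 2: one on its equator and one off it. The sample points and helper names are illustrative choices, not taken from the text.

```python
import math

def F(x, y, z, a):
    # Family of unit spheres centered at height a on the z-axis
    return x * x + y * y + (z - a) ** 2 - 1.0

def dF_da(x, y, z, a, h=1e-6):
    # Central-difference estimate of the partial derivative with respect to a
    return (F(x, y, z, a + h) - F(x, y, z, a - h)) / (2 * h)

def envelope(x, y, z):
    # Result of eliminating a: the unit cylinder about the z-axis
    return x * x + y * y - 1.0

a = 2.0
azimuth, latitude = 0.4, 0.5

# On the equator z = a: should satisfy F = 0, dF/da = 0 and the cylinder equation
equator_point = (math.cos(azimuth), math.sin(azimuth), a)

# On the same sphere but off the equator: F = 0 only
off_equator_point = (math.cos(latitude) * math.cos(azimuth),
                     math.cos(latitude) * math.sin(azimuth),
                     a + math.sin(latitude))

for point in (equator_point, off_equator_point):
    print(F(*point, a), dF_da(*point, a), envelope(*point))
```

Only the equator point lies on the envelope; the off-equator point is on the sphere but strictly inside the cylinder.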

Notes:

(128) Definitions, and proofs of theorems stated here without proof, may be found in Chapter II of Courant (1936b).

(129) “Persistent” here means that the nonzero Jacobian determinant in eqn (D.74) must have the same N − 1 rowed critical (i.e. nonzero) minor throughout R . “Nested” means that this persistent N − 1 rowed critical minor must, in turn, have the same N − 2 rowed critical minor throughout R , etc.