(p.645) Appendix A Mathematics
A.1 Vector analysis
Our conventions for elementary vector analysis are as follows. The unit vectors corresponding to the Cartesian coordinates x, y, z are u_{x}, u_{y}, u_{z}. For a general vector v, we denote the unit vector in the direction of v by $\tilde{\mathbf{v}}=\mathbf{v}/|\mathbf{v}|$.
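The scalar product of two vectors is a · b = a_{x}b_{x} + a_{y}b_{y} + a_{z}b_{z}, or
$$\mathbf{a}\cdot\mathbf{b}=\sum_{i=1}^{3}a_{i}b_{i},\qquad\text{(A.1)}$$
where (a_{1}, a_{2}, a_{3}) = (a_{x}, a_{y}, a_{z}), etc. Since expressions like this occur frequently, we will use the Einstein summation convention: repeated vector indices are to be summed over; that is, the expression a_{i}b_{i} is understood to imply the sum in eqn (A.1). The summation convention will only be employed for three-dimensional vector indices. The cross product is
$$(\mathbf{a}\times\mathbf{b})_{i}=\epsilon_{ijk}a_{j}b_{k},$$
where the alternating tensor ϵ_{ijk} is defined by
$$\epsilon_{ijk}=\begin{cases}+1 & \text{if }(ijk)\text{ is an even permutation of }(123),\\ -1 & \text{if }(ijk)\text{ is an odd permutation of }(123),\\ 0 & \text{otherwise.}\end{cases}$$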
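The short Python sketch below tabulates the alternating tensor and builds the cross product from $(\mathbf{a}\times\mathbf{b})_{i}=\epsilon_{ijk}a_{j}b_{k}$, with the implied sums over j and k written out explicitly. The helper names `levi_civita` and `cross` are hypothetical and serve only as illustration.

```python
from itertools import product


def levi_civita(i: int, j: int, k: int) -> int:
    """Alternating tensor eps_{ijk} for indices in {1, 2, 3}.

    The product of pairwise differences is +2 for even permutations of
    (1, 2, 3), -2 for odd ones, and 0 whenever an index repeats.
    """
    return (j - i) * (k - i) * (k - j) // 2


def cross(a, b):
    """(a x b)_i = eps_{ijk} a_j b_k, with the Einstein sums over j, k made explicit."""
    return tuple(
        sum(levi_civita(i, j, k) * a[j - 1] * b[k - 1]
            for j, k in product((1, 2, 3), repeat=2))
        for i in (1, 2, 3)
    )


print(cross((1, 0, 0), (0, 1, 0)))  # (0, 0, 1): u_x x u_y = u_z
```

Because the summation is written out in full, the result can be checked against the familiar component formula for a × b.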
A.2 General vector spaces
A complex vector space is a set ℌ on which the following two operations are defined.

(1) Multiplication by scalars. For every pair (α, ψ), where α is a scalar, i.e. a complex number, and ψ ∈ ℌ, there is a unique element of ℌ that is denoted by αψ.

(2) Vector addition. For every pair ψ, φ of vectors in ℌ there is a unique element of ℌ denoted by ψ + φ.
The two operations satisfy (a) α(βψ) = (αβ) ψ, and (b) α (ψ + φ) = αψ + αφ. It is assumed that there is a special null vector, usually denoted by 0, such that α0 = 0 and ψ + 0 = ψ. If the scalars are restricted to real numbers these conditions define a real vector space.
Ordinary displacement vectors, r, belong to a real vector space denoted by ℝ^{3}. The set ℂ^{n} of n-tuples ψ = (ψ_{1},…, ψ_{n}), where each component ψ_{i} is a complex number, defines a complex vector space with component-wise operations:
$$(\alpha\psi)_{i}=\alpha\psi_{i},\qquad(\psi+\varphi)_{i}=\psi_{i}+\varphi_{i}.$$
Each vector in ℝ^{3} or ℂ^{n} is specified by a finite number of components, so these spaces are said to be finite dimensional.
The set of complex functions, C(ℝ), of a single real variable defines a vector space with point-wise operations:
$$(\alpha\psi)(x)=\alpha\psi(x),\qquad(\psi+\varphi)(x)=\psi(x)+\varphi(x),$$
where α is a scalar, and ψ (x) and φ (x) are members of C (ℝ). This space is said to be infinite dimensional, since a general function is not determined by any finite set of values.
For any subset u ⊂ ℌ, the set of all linear combinations of vectors in u is called the span of u, written as span(u). A family B ⊂ ℌ of linearly independent vectors is a basis for ℌ if ℌ = span(B), i.e. every vector in ℌ can be expressed as a linear combination of vectors in B. In this situation ℌ is said to be spanned by B.
A linear operator is a rule that assigns a new vector Mψ to each vector ψ ∈ ℌ, such that
$$M(\alpha\psi+\beta\varphi)=\alpha M\psi+\beta M\varphi$$
for any pair of vectors ψ and φ, and any scalars α and β. The action of a linear operator M on ℏ is completely determined by its action on the vectors of a basis B.
A.3 Hilbert spaces
A.3.1 Definition
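An inner product on a vector space ℌ is a rule that assigns a complex number, denoted by (φ, ψ), to every pair of elements φ and ψ ∈ ℌ, with the following properties:
$$(\varphi,\psi)^{*}=(\psi,\varphi),\qquad(\varphi,\alpha\psi+\beta\chi)=\alpha(\varphi,\psi)+\beta(\varphi,\chi),\qquad(\psi,\psi)\geq0,\ \text{with }(\psi,\psi)=0\text{ only if }\psi=0.$$
An inner product space is a vector space equipped with an inner product. The inner product satisfies the Cauchy-Schwarz inequality:
$$|(\varphi,\psi)|^{2}\leq(\varphi,\varphi)(\psi,\psi).$$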
Two vectors are orthogonal if (φ, ψ) = 0. If 𝕱 is a subspace of ℌ, then the orthogonal complement of 𝕱 is the subspace 𝕱^{⊥} of vectors orthogonal to every vector in 𝕱. The norm ‖ψ‖ of ψ is defined as $\|\psi\|=\sqrt{(\psi,\psi)}$, so that ‖ψ‖ = 0 implies ψ = 0. Vectors with ‖ψ‖ = 1 are said to be normalized. A set of vectors is complete if the only vector orthogonal to every vector in the set is the null vector. Each complete set contains a basis for the space. A vector space with a countable basis set, B = {φ^{(1)}, φ^{(2)},…}, is said to be separable. The vector spaces relevant to quantum theory are all separable. A basis for which (φ^{(n)}, φ^{(m)}) = δ_{nm} holds is called orthonormal. Every vector in ℌ can be uniquely expanded in an orthonormal basis, e.g.
$$\psi=\sum_{n}\psi_{n}\varphi^{(n)},$$
where the expansion coefficients are ψ_{n} = (φ^{(n)}, ψ).
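Here is a minimal sketch of the expansion in ℂ^{2}. It uses an inner product that is antilinear in the first slot, which is the convention implied by ψ_{n} = (φ^{(n)}, ψ). The particular basis and vector are arbitrary choices made for illustration, and the helper names are hypothetical.

```python
import math


def inner(phi, psi):
    """(phi, psi) = sum_n phi_n^* psi_n: antilinear in the first slot, linear in the second."""
    return sum(a.conjugate() * b for a, b in zip(phi, psi))


def coefficients(psi, basis):
    """Expansion coefficients psi_n = (phi^(n), psi)."""
    return [inner(phi, psi) for phi in basis]


def resum(coeffs, basis):
    """Rebuild psi = sum_n psi_n phi^(n)."""
    dim = len(basis[0])
    return [sum(c * phi[i] for c, phi in zip(coeffs, basis)) for i in range(dim)]


s = 1 / math.sqrt(2)
basis = [[s, 1j * s], [s, -1j * s]]  # an orthonormal basis of C^2, picked for illustration
psi = [1 + 2j, 3 - 1j]
c = coefficients(psi, basis)
psi_back = resum(c, basis)
```

Since the basis is orthonormal, resumming the coefficients returns the original vector, and Σ|ψ_{n}|² equals ‖ψ‖².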
A sequence ψ^{1}, ψ^{2},…, ψ^{k},… of vectors in ℌ is convergent if
$$\lim_{k,l\to\infty}\|\psi^{k}-\psi^{l}\|=0.$$
A vector ψ is a limit of the sequence if
$$\lim_{k\to\infty}\|\psi^{k}-\psi\|=0.$$
A Hilbert space is an inner product space that contains the limits of all convergent sequences.
A.3.2 Examples
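The finite-dimensional spaces ℝ^{3} and ℂ^{N} are both Hilbert spaces. The inner product for ℝ^{3} is the familiar dot product, and for ℂ^{N} it is
$$(\varphi,\psi)=\sum_{n=1}^{N}\varphi_{n}^{*}\psi_{n}.$$
If we constrain the complex functions ψ(x) by the normalizability condition
$$\int_{-\infty}^{\infty}dx\,|\psi(x)|^{2}<\infty,$$
then the Cauchy-Schwarz inequality for integrals,
$$\left|\int dx\,\varphi^{*}(x)\psi(x)\right|^{2}\leq\int dx\,|\varphi(x)|^{2}\int dx\,|\psi(x)|^{2},$$
is sufficient to guarantee that the inner product defined by
$$(\varphi,\psi)=\int dx\,\varphi^{*}(x)\psi(x)$$
is finite. With this inner product, the space of normalizable complex functions is a Hilbert space, which is called L^{2}(ℝ).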
A.3.3 Linear operators
Let A be a linear operator acting on ℌ; then the domain of A, called D(A), is the subspace of vectors ψ ∈ ℌ such that ‖Aψ‖ < ∞. An operator A is positive definite if (ψ, Aψ) ≥ 0 for all ψ ∈ D(A), and it is bounded if ‖Aψ‖ ≤ b‖ψ‖, where b is a constant independent of ψ. The norm of an operator is defined by
$$\|A\|=\sup_{\psi\neq0}\frac{\|A\psi\|}{\|\psi\|},$$
so a bounded operator is one with finite norm.
If Aψ = λψ, where λ is a complex number and ψ is a nonzero vector in the Hilbert space, then λ is an eigenvalue and ψ is an eigenvector of A. In this case λ is said to belong to the point spectrum of A. The eigenvalue λ is nondegenerate if the eigenvector ψ is unique (up to a multiplicative factor). If ψ is not unique, then λ is degenerate. The solutions of Aψ = λψ form a subspace called the eigenspace for λ. The dimension of the eigenspace, i.e. the number of linearly independent solutions, is the degree of degeneracy for λ. The continuous spectrum of A is the set of complex numbers λ such that: (1) λ is not an eigenvalue, and (2) the operator λ − A does not have an inverse.
The adjoint (hermitian conjugate) A^{†} of A is defined by
$$(\varphi,A^{\dagger}\psi)=(A\varphi,\psi),$$
and A is self-adjoint (hermitian) if D(A^{†}) = D(A) and (φ, Aψ) = (Aφ, ψ). Self-adjoint operators have real eigenvalues. In a finite-dimensional space (and, more generally, for compact self-adjoint operators) they also have a complete orthonormal set of eigenvectors. For unbounded self-adjoint operators, the point and continuous spectra are subsets of the real numbers. Note that (ψ, A^{†}Aψ) = (φ, φ), where φ = Aψ, so that
$$(\psi,A^{\dagger}A\psi)=\|A\psi\|^{2}\geq0,$$
i.e. A^{†}A is positive definite.
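In finite dimensions, both the defining property of the adjoint and the positivity of A^{†}A can be checked numerically. The sketch below uses a random complex 3 × 3 matrix and random vectors; the seed, the sizes, and the helper names are arbitrary assumptions of the example.

```python
import random


def adjoint(a):
    """(A^dagger)_{nm} = (A_{mn})^*."""
    size = len(a)
    return [[a[m][n].conjugate() for m in range(size)] for n in range(size)]


def apply(a, psi):
    """Matrix-vector product (A psi)_n = sum_m A_nm psi_m."""
    return [sum(a_nm * p for a_nm, p in zip(row, psi)) for row in a]


def inner(phi, psi):
    """(phi, psi) = sum_n phi_n^* psi_n."""
    return sum(x.conjugate() * y for x, y in zip(phi, psi))


def rand_c():
    return complex(random.gauss(0, 1), random.gauss(0, 1))


random.seed(0)
A = [[rand_c() for _ in range(3)] for _ in range(3)]
psi = [rand_c() for _ in range(3)]
phi = [rand_c() for _ in range(3)]

lhs = inner(psi, apply(adjoint(A), apply(A, psi)))   # (psi, A^dagger A psi)
norm_sq = inner(apply(A, psi), apply(A, psi)).real   # ||A psi||^2
```

The first quantity agrees with the second to rounding error, so it is real and non-negative, as the positivity argument in the text requires.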
A self-adjoint operator, P, satisfying
$$P^{2}=P$$
is called a projection operator; its spectrum is a pure point spectrum contained in {0, 1}. Consider the set of vectors Pℌ, consisting of all vectors of the form Pψ as ψ ranges over ℌ. This is a subspace of ℌ, since
$$\alpha P\psi+\beta P\varphi=P(\alpha\psi+\beta\varphi)$$
shows that every linear combination of vectors in Pℌ is also in Pℌ. Conversely, let 𝕾 be a subspace of ℌ and {φ^{(n)}} an orthonormal basis for 𝕾. The operator P, defined by
$$P\psi=\sum_{n}\varphi^{(n)}(\varphi^{(n)},\psi),$$
is a projection operator, since
$$P^{2}\psi=\sum_{n,m}\varphi^{(n)}(\varphi^{(n)},\varphi^{(m)})(\varphi^{(m)},\psi)=\sum_{n}\varphi^{(n)}(\varphi^{(n)},\psi)=P\psi,\qquad(\chi,P\psi)=\sum_{n}(\chi,\varphi^{(n)})(\varphi^{(n)},\psi)=(P\chi,\psi).$$
Thus there is a one-to-one correspondence between projection operators and subspaces of ℌ. Let P and Q be projection operators and suppose that the vectors in Pℌ are orthogonal to the vectors in Qℌ; then PQ = QP = 0 and P and Q are said to be orthogonal projections. In the extreme case 𝕾 = ℌ, the expansion (A.10) shows that P is the identity operator, Pψ = ψ.
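To make the construction Pψ = Σ_{n} φ^{(n)}(φ^{(n)}, ψ) concrete, the sketch below builds its matrix for a two-dimensional subspace of ℂ³ and verifies that P² = P and P^{†} = P. It also checks that P annihilates a vector orthogonal to the subspace. The subspace and the helper names are illustrative assumptions.

```python
import math


def projector(onb):
    """Matrix of P psi = sum_n phi^(n) (phi^(n), psi): P_ij = sum_n phi^(n)_i (phi^(n)_j)^*."""
    dim = len(onb[0])
    return [[sum(phi[i] * phi[j].conjugate() for phi in onb) for j in range(dim)]
            for i in range(dim)]


def matmul(x, y):
    size = len(x)
    return [[sum(x[n][k] * y[k][m] for k in range(size)) for m in range(size)]
            for n in range(size)]


def dagger(x):
    size = len(x)
    return [[x[m][n].conjugate() for m in range(size)] for n in range(size)]


def apply(x, v):
    return [sum(x_ij * v_j for x_ij, v_j in zip(row, v)) for row in x]


def close(x, y, tol=1e-12):
    return all(abs(a - b) < tol for rx, ry in zip(x, y) for a, b in zip(rx, ry))


s = 1 / math.sqrt(2)
# Orthonormal basis of a two-dimensional subspace S of C^3 (chosen for illustration)
onb = [[1 + 0j, 0j, 0j], [0j, s + 0j, 1j * s]]
P = projector(onb)
chi = [0j, s + 0j, -1j * s]  # orthogonal to every vector in S
```

The trace of P equals the dimension of the subspace (here 2), which matches the spectrum {0, 1}.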
A self-adjoint operator with pure point spectrum {λ_{1}, λ_{2},…} has the spectral resolution
$$A=\sum_{n}\lambda_{n}P_{n},$$
where P_{n} is the projection operator onto the subspace of eigenvectors with eigenvalue λ_{n}. The spectral resolution for a self-adjoint operator A with a continuous spectrum is
$$A=\int\lambda\,dM(\lambda),$$
where dM(λ) is an operator-valued measure defined by the following statement: for each subset δ of the real line,
$$M(\delta)=\int_{\delta}dM(\lambda)$$
is the projection operator onto the subspace of vectors ψ such that ‖(λ − A)^{−1}ψ‖ < ∞ for all λ ∉ δ (Riesz and Sz.-Nagy, 1955, Chap. VIII, Sec. 120).
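A linear operator U is unitary if it preserves inner products, i.e.
$$(U\varphi,U\psi)=(\varphi,\psi)$$
for any pair of vectors ψ, φ in the Hilbert space. A necessary and sufficient condition for unitarity is that the operator is norm preserving, i.e.
$$\|U\psi\|=\|\psi\|\quad\text{for all }\psi.$$
The spectral resolution for a unitary operator with a pure point spectrum is
$$U=\sum_{n}e^{i\theta_{n}}P_{n},\qquad\theta_{n}\text{ real},$$
and for a continuous spectrum
$$U=\int_{0}^{2\pi}e^{i\theta}\,dM(\theta).$$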
A linear operator N is said to be a normal operator if
$$[N,N^{\dagger}]=NN^{\dagger}-N^{\dagger}N=0.$$
The hermitian and unitary operators are both normal. The hermitian operators N_{1} = (N + N^{†})/2 and N_{2} = (N − N^{†})/2i satisfy N = N_{1} + iN_{2} and [N_{1}, N_{2}] = 0. Normal operators therefore have the spectral resolutions
$$N=\sum_{n}z_{n}P_{n},\qquad z_{n}\text{ complex},$$
for a point spectrum, and
$$N=\int_{\mathbb{C}}z\,dM(z)$$
for a continuous spectrum.
A.3.4 Matrices
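A linear operator X acting on an N-dimensional Hilbert space, with orthonormal basis {f^{(1)}, …, f^{(N)}}, is represented by the N × N matrix
$$X_{nm}=(f^{(n)},Xf^{(m)}).$$
The operator and its matrix are both called X. The matrix for the product XY of two operators is the matrix product
$$(XY)_{nm}=\sum_{k=1}^{N}X_{nk}Y_{km}.$$
The determinant of X is defined as
$$\det X=\sum_{n_{1},\ldots,n_{N}}\epsilon_{n_{1}n_{2}\cdots n_{N}}X_{1n_{1}}X_{2n_{2}}\cdots X_{Nn_{N}},$$
where the generalized alternating tensor is
$$\epsilon_{n_{1}\cdots n_{N}}=\begin{cases}+1 & \text{if }(n_{1}\cdots n_{N})\text{ is an even permutation of }(1\cdots N),\\ -1 & \text{if it is an odd permutation},\\ 0 & \text{otherwise.}\end{cases}$$
The trace of X is
$$\operatorname{Tr}X=\sum_{n=1}^{N}X_{nn}.$$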
The transpose matrix X^{T} is defined by X^{T}_{nm} = X_{mn}. The adjoint matrix X^{†} is the complex conjugate of the transpose: X^{†}_{nm} = X^{*}_{mn}. A matrix X is symmetric if X = X^{T}, self-adjoint or hermitian if X^{†} = X, and unitary if X^{†}X = XX^{†} = I, where I is the N × N identity matrix. Unitary transformations preserve the inner product. The hermitian and unitary matrices both belong to the larger class of normal matrices defined by X^{†}X = XX^{†}.
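The sketch below evaluates the determinant directly from the generalized alternating tensor, together with the trace and matrix product. The sum over n_{1}…n_{N} is restricted to permutations, because ϵ vanishes whenever an index repeats. Indices run from 0 rather than 1, and the function names are hypothetical.

```python
from itertools import permutations


def alternating_tensor(p):
    """+1 for an even permutation p of (0, ..., N-1), -1 for an odd one."""
    p = list(p)
    sign = 1
    for i in range(len(p)):
        # Swap entries into their home positions; each transposition flips the sign.
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]
            sign = -sign
    return sign


def det(x):
    """det X = sum over permutations of eps_{n1...nN} X_{1 n1} ... X_{N nN}."""
    total = 0
    for p in permutations(range(len(x))):
        term = alternating_tensor(p)
        for row, col in enumerate(p):
            term *= x[row][col]
        total += term
    return total


def trace(x):
    """Tr X = sum_n X_nn."""
    return sum(x[n][n] for n in range(len(x)))


def matmul(x, y):
    """(XY)_nm = sum_k X_nk Y_km."""
    size = len(x)
    return [[sum(x[n][k] * y[k][m] for k in range(size)) for m in range(size)]
            for n in range(size)]
```

The sum has N! terms, so this form is meant only for checking the definition. It is not a practical algorithm for large N.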
A hermitian matrix X is positive definite if all of its eigenvalues are non-negative. Since the determinant and the trace are the product and the sum of the eigenvalues, this immediately implies that both are non-negative. An equivalent definition is that X is positive definite if
$$(\varphi,X\varphi)\geq0$$
for all vectors φ. For a positive-definite matrix X, there is a matrix Y such that X = YY^{†}.
The normal matrices have the following important properties (Mac Lane and Birkhoff, 1967, Sec. XI–10).
Theorem A.1 (i) If f is an eigenvector of the normal matrix Z with eigenvalue z, then f is an eigenvector of Z^{†} with eigenvalue z^{*}, i.e. Zf = zf ⟹ Z^{†}f = z^{*}f.
(ii) Every normal matrix has a complete, orthonormal set of eigenvectors.
Thus hermitian matrices have real eigenvalues, since setting Z^{†} = Z in (i) gives z = z^{*}. Unitary matrices have eigenvalues of modulus 1, since f = Z^{†}Zf = z^{*}zf gives |z|^{2} = 1.
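For a 2 × 2 matrix, the eigenvalues are the roots of z² − (Tr Z)z + det Z = 0, which allows a quick numerical check of these consequences of Theorem A.1. The hermitian and unitary matrices below are arbitrary examples chosen for this sketch.

```python
import cmath
import math


def eig2(m):
    """Eigenvalues of a 2 x 2 complex matrix: roots of z**2 - (Tr m) z + det m = 0."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


hermitian = [[2, 1 - 1j], [1 + 1j, -1]]

a, theta = 0.3, 0.7  # arbitrary angles for the example
phase = cmath.exp(1j * theta)
unitary = [[phase * math.cos(a), -math.sin(a)],
           [phase * math.sin(a), math.cos(a)]]

hermitian_eigs = eig2(hermitian)   # expected to be real
unitary_eigs = eig2(unitary)       # expected to lie on the unit circle
```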
A.4 Fourier transforms
A.4.1 Continuous transforms
In the mathematical literature it is conventional to denote the Fourier (integral) transform of a function f(x) of a single, real variable by
$$\tilde{f}(k)=\int_{-\infty}^{\infty}dx\,f(x)e^{-ikx},$$
so that the inverse Fourier transform is
$$f(x)=\int_{-\infty}^{\infty}\frac{dk}{2\pi}\tilde{f}(k)e^{ikx}.$$
The virtue of this notation is that it reminds us that the two functions are, generally, drastically different, e.g. if f(x) = 1, then $\tilde{f}(k)=2\pi\delta(k)$.
On the other hand, the ∼ is a typographical nuisance in any discussion involving many uses of the Fourier transform. For this reason, we will sacrifice precision for convenience. In our convention, the Fourier transform is indicated by the same letter, and the distinction between the functions is maintained by paying attention to the arguments.
The Fourier transform pair is accordingly written as
$$f(k)=\int dx\,f(x)e^{-ikx},\qquad f(x)=\int\frac{dk}{2\pi}f(k)e^{ikx}.$$
This is analogous to the familiar idea that the meaning of a vector V is independent of the coordinate system used, despite the fact that the components (V_{x},V_{y},V_{z}) of V are changed by transforming to a new coordinate system. From this point of view, the functions f(x) and f(k) are simply different representations of the same physical quantity. Confusion is readily avoided by paying attention to the physical significance of the arguments, e.g. x denotes a point in position space, while k denotes a point in the reciprocal space or k‐space.
If the position-space function f(x) is real, then the Fourier transform satisfies
$$f^{*}(k)=f(-k).$$
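As a sanity check on the convention f(k) = ∫dx f(x)e^{−ikx}, the sketch below computes the transform numerically with the trapezoidal rule. For the Gaussian e^{−x²/2} it compares the result with the known transform √(2π)e^{−k²/2}. For a real but non-even function it tests the reality condition f^{*}(k) = f(−k). The cutoff, grid size and test functions are assumptions of the example.

```python
import cmath
import math


def fourier(f, k, x_max=12.0, n=4001):
    """Trapezoidal estimate of f(k) = integral dx f(x) exp(-i k x) over [-x_max, x_max]."""
    h = 2 * x_max / (n - 1)
    total = 0j
    for j in range(n):
        x = -x_max + j * h
        weight = 0.5 if j in (0, n - 1) else 1.0
        total += weight * f(x) * cmath.exp(-1j * k * x)
    return total * h


def gaussian(x):
    return math.exp(-x * x / 2)


def shifted(x):
    # Real but not even, so its transform is genuinely complex
    return math.exp(-(x - 1) ** 2 / 2)


k = 1.3
gauss_error = abs(fourier(gaussian, k) - math.sqrt(2 * math.pi) * math.exp(-k * k / 2))
reality_error = abs(fourier(shifted, -k) - fourier(shifted, k).conjugate())
```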
When the position variable x is replaced by the time t, it is customary in physics to use the opposite sign convention:
$$f(\omega)=\int dt\,f(t)e^{i\omega t},\qquad f(t)=\int\frac{d\omega}{2\pi}f(\omega)e^{-i\omega t}.$$
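Fourier transforms of functions of several variables, typically f(r), are defined similarly:
$$f(\mathbf{k})=\int d^{3}r\,f(\mathbf{r})e^{-i\mathbf{k}\cdot\mathbf{r}},\qquad f(\mathbf{r})=\int\frac{d^{3}k}{(2\pi)^{3}}f(\mathbf{k})e^{i\mathbf{k}\cdot\mathbf{r}},$$
where the integrals are over position space and reciprocal space (k-space) respectively. If f(r) is real then
$$f^{*}(\mathbf{k})=f(-\mathbf{k}).$$
Combining these conventions for a space-time function f(r, t) yields the transform pair
$$f(\mathbf{k},\omega)=\int d^{3}r\int dt\,f(\mathbf{r},t)e^{-i(\mathbf{k}\cdot\mathbf{r}-\omega t)},\qquad f(\mathbf{r},t)=\int\frac{d^{3}k}{(2\pi)^{3}}\int\frac{d\omega}{2\pi}f(\mathbf{k},\omega)e^{i(\mathbf{k}\cdot\mathbf{r}-\omega t)}.$$
The last result is simply the plane-wave expansion of f(r, t). If f(r, t) is real, then the Fourier transform satisfies
$$f^{*}(\mathbf{k},\omega)=f(-\mathbf{k},-\omega).$$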
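Two related and important results on Fourier transforms, which we quote for the one- and three-dimensional cases, are Parseval's theorem:
$$\int dx\,f^{*}(x)g(x)=\int\frac{dk}{2\pi}f^{*}(k)g(k),\qquad\int d^{3}r\,f^{*}(\mathbf{r})g(\mathbf{r})=\int\frac{d^{3}k}{(2\pi)^{3}}f^{*}(\mathbf{k})g(\mathbf{k}),$$
and the convolution theorem:
$$\int dx\,e^{-ikx}\int dx'\,f(x-x')g(x')=f(k)g(k),\qquad\int d^{3}r\,e^{-i\mathbf{k}\cdot\mathbf{r}}\int d^{3}r'\,f(\mathbf{r}-\mathbf{r}')g(\mathbf{r}')=f(\mathbf{k})g(\mathbf{k}).$$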
These results are readily derived by using the delta function identities (A.95) and (A.96).
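In its one-dimensional form with f = g, Parseval's theorem reads ∫dx |f(x)|² = ∫(dk/2π)|f(k)|². The sketch below checks this by computing f(k) numerically on a grid and integrating both sides with the trapezoidal rule. The test function e^{−x²/2}(1 + x), whose exact norm is (3/2)√π, and the grid parameters are choices made for this example.

```python
import cmath
import math


def trapezoid(values, h):
    """Composite trapezoidal rule for equally spaced samples."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def f(x):
    # Hypothetical test function: real, smooth, rapidly decaying, and not even
    return math.exp(-x * x / 2) * (1 + x)


hx, hk = 0.02, 0.04
xs = [-8 + hx * j for j in range(801)]
ks = [-8 + hk * j for j in range(401)]
fx = [f(x) for x in xs]
# f(k) = integral dx f(x) exp(-i k x), evaluated on the k grid
fk = [trapezoid([v * cmath.exp(-1j * k * x) for v, x in zip(fx, xs)], hx) for k in ks]

lhs = trapezoid([abs(v) ** 2 for v in fx], hx)                 # integral dx |f(x)|^2
rhs = trapezoid([abs(v) ** 2 for v in fk], hk) / (2 * math.pi)  # integral dk/2pi |f(k)|^2
```

The factor 1/2π on the k-space side comes from the transform convention used here. A different normalization of the transform would move or remove it.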
A.4.2 Fourier series
It is often useful to simplify the mathematics of the one-dimensional continuous transform by considering the functions to be defined on a finite interval (−L/2, L/2) and imposing periodic boundary conditions. The basis vectors are still of the form u_{k}(x) = C exp(ikx), but the periodicity condition, u_{k}(−L/2) = u_{k}(L/2), restricts k to the discrete values
$$k=\frac{2\pi n}{L},\qquad n=0,\pm1,\pm2,\ldots.$$
Normalization requires $C=1/\sqrt{L}$, so the transform is
$$f_{k}=\frac{1}{\sqrt{L}}\int_{-L/2}^{L/2}dx\,f(x)e^{-ikx},$$
and the inverse transform f(x) is
$$f(x)=\frac{1}{\sqrt{L}}\sum_{k}f_{k}e^{ikx}.$$
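The sketch below computes the coefficients f_{k} = L^{−1/2}∫dx f(x)e^{−ikx} for a simple trigonometric polynomial and then resynthesizes f(x) from them. The period L = 3, the test function and the number of sample points are illustrative assumptions. For f(x) = cos(2πx/L) + ½ sin(4πx/L), the nonzero coefficients are f_{±1} = √L/2 and f_{±2} = ∓i√L/4.

```python
import cmath
import math

L = 3.0  # period; a hypothetical value for the example


def f(x):
    return math.cos(2 * math.pi * x / L) + 0.5 * math.sin(4 * math.pi * x / L)


def coefficient(func, n, m=256):
    """f_k = L**-0.5 * integral over (-L/2, L/2) of func(x) exp(-i k x), with k = 2 pi n / L.

    For a smooth periodic integrand, equally spaced sampling (the periodic
    trapezoidal rule) is extremely accurate.
    """
    k = 2 * math.pi * n / L
    h = L / m
    total = sum(func(-L / 2 + j * h) * cmath.exp(-1j * k * (-L / 2 + j * h))
                for j in range(m))
    return total * h / math.sqrt(L)


def synthesize(coeffs, x):
    """Inverse transform f(x) = L**-0.5 * sum_k f_k exp(i k x) over the stored modes."""
    return sum(c * cmath.exp(2j * math.pi * n * x / L) for n, c in coeffs.items()) / math.sqrt(L)


coeffs = {n: coefficient(f, n) for n in range(-5, 6)}
```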
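The continuous transform is recovered in the limit L → ∞ by first using eqn (A.60) to conclude that
$$\sqrt{L}\,f_{k}\to f(k)\qquad(L\to\infty),$$
and writing the inverse transform as
$$f(x)=\frac{1}{L}\sum_{k}\sqrt{L}\,f_{k}e^{ikx}.$$
The difference between neighboring k-values is Δk = 2π/L, so this equation can be recast as
$$f(x)=\sum_{k}\frac{\Delta k}{2\pi}\sqrt{L}\,f_{k}e^{ikx}\to\int\frac{dk}{2\pi}f(k)e^{ikx}.$$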
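In Cartesian coordinates the three-dimensional discrete transform is defined on a rectangular parallelepiped with dimensions L_{x}, L_{y}, L_{z}. The one-dimensional results then imply
$$f_{\mathbf{k}}=\frac{1}{\sqrt{V}}\int_{V}d^{3}r\,f(\mathbf{r})e^{-i\mathbf{k}\cdot\mathbf{r}},$$
where the k-vector is restricted to
$$\mathbf{k}=2\pi\left(\frac{n_{x}}{L_{x}},\frac{n_{y}}{L_{y}},\frac{n_{z}}{L_{z}}\right),\qquad n_{x},n_{y},n_{z}=0,\pm1,\pm2,\ldots,$$
and V = L_{x}L_{y}L_{z}. The inverse transform is
$$f(\mathbf{r})=\frac{1}{\sqrt{V}}\sum_{\mathbf{k}}f_{\mathbf{k}}e^{i\mathbf{k}\cdot\mathbf{r}},$$
and the integral transform is recovered by
$$\sqrt{V}\,f_{\mathbf{k}}\to f(\mathbf{k})\qquad(V\to\infty).$$
The sum and integral over k are related by
$$\sum_{\mathbf{k}}\to V\int\frac{d^{3}k}{(2\pi)^{3}},$$
which in turn implies
$$\frac{V}{(2\pi)^{3}}\delta_{\mathbf{k}\mathbf{k}'}\to\delta(\mathbf{k}-\mathbf{k}').$$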
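The replacement Σ_{k} → V∫d³k/(2π)³ can be illustrated by counting the allowed wave vectors inside a sphere |k| < k_max for a cubic box and comparing the count with the continuum estimate V(4π/3)k_max³/(2π)³. The box size and cutoff in this sketch are arbitrary. The agreement improves as the number of modes grows.

```python
import math


def count_modes(length, k_max):
    """Count k = 2*pi*(nx, ny, nz)/length with |k| < k_max, for a cube of side length."""
    radius = k_max * length / (2 * math.pi)  # the same condition written for the integers n
    n_max = int(radius) + 1
    count = 0
    for nx in range(-n_max, n_max + 1):
        for ny in range(-n_max, n_max + 1):
            for nz in range(-n_max, n_max + 1):
                if nx * nx + ny * ny + nz * nz < radius * radius:
                    count += 1
    return count


def continuum_estimate(length, k_max):
    """V * integral d^3k / (2 pi)^3 over the ball |k| < k_max."""
    volume = length ** 3
    return volume * (4 * math.pi / 3) * k_max ** 3 / (2 * math.pi) ** 3


ratio = count_modes(40.0, 3.0) / continuum_estimate(40.0, 3.0)
```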
A.5 Laplace transforms
Another useful idea, closely related to the one-dimensional Fourier transform, is the Laplace transform defined by
$$\tilde{f}(\zeta)=\int_{0}^{\infty}dt\,e^{-\zeta t}f(t).$$
In this case, we will use the standard mathematical notation $\tilde{f}(\zeta)$, since we do not use Laplace transforms as frequently as Fourier transforms. The inverse transform is
$$f(t)=\int_{\zeta_{0}-i\infty}^{\zeta_{0}+i\infty}\frac{d\zeta}{2\pi i}e^{\zeta t}\tilde{f}(\zeta).$$
The line (ζ_{0} − i∞, ζ_{0} + i∞) in the complex ζ-plane must lie to the right of any poles in the transform function $\tilde{f}(\zeta)$.
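The identity
$$\int_{0}^{\infty}dt\,e^{-\zeta t}\frac{df}{dt}=\zeta\tilde{f}(\zeta)-f(0)$$
is useful in treating initial value problems for sets of linear differential equations. Thus to solve the equations
$$\frac{df_{n}}{dt}=\sum_{m}V_{nm}f_{m}(t),$$
with a constant matrix V, and initial data f_{n}(0), one takes the Laplace transform to get
$$\zeta\tilde{f}_{n}(\zeta)-f_{n}(0)=\sum_{m}V_{nm}\tilde{f}_{m}(\zeta).$$
This set of algebraic equations can be solved to express $\tilde{f}_{n}(\zeta)$ in terms of f_{n}(0). Inverting the Laplace transform yields the solution in the time domain.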
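For a 2 × 2 matrix V with distinct eigenvalues, the procedure can be carried out in closed form. The transformed equations give $\tilde{f}(\zeta)=(\zeta-V)^{-1}f(0)$, and the inversion integral picks up the residues at the two roots of det(ζ − V). The sketch below is one way to do this under that distinct-eigenvalue assumption. It is checked against the harmonic-oscillator system df_{1}/dt = f_{2}, df_{2}/dt = −f_{1}, whose solution for f(0) = (1, 0) is (cos t, −sin t).

```python
import cmath
import math


def solve_2x2_by_laplace(v, f0, t):
    """Solve df/dt = V f for a 2 x 2 matrix V with distinct eigenvalues.

    (zeta - V) f~(zeta) = f(0) gives f~(zeta) = adj(zeta - V) f(0) / det(zeta - V),
    and det(zeta - V) = (zeta - z1)(zeta - z2). Closing the inversion contour to
    the left collects the residues exp(z t) adj(z - V) f(0) / (z - z_other).
    """
    (a, b), (c, d) = v
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    z1, z2 = (tr + disc) / 2, (tr - disc) / 2
    result = [0j, 0j]
    for z, other in ((z1, z2), (z2, z1)):
        # adj(z - V) for z - V = [[z - a, -b], [-c, z - d]]
        adj = [[z - d, b], [c, z - a]]
        weight = cmath.exp(z * t) / (z - other)
        for n in range(2):
            result[n] += weight * (adj[n][0] * f0[0] + adj[n][1] * f0[1])
    return result


oscillator = solve_2x2_by_laplace([[0, 1], [-1, 0]], [1, 0], 0.9)
```

If V has a repeated eigenvalue, the two poles merge into a double pole and this residue formula no longer applies.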
The convolution theorem for Laplace transforms is
where the integration contour is to the right of any poles of both g∼(ζ) and f∼(ζ).
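An important point for applications to physics is that poles in the Laplace transform correspond to exponential time dependence. For example, the function f(t) = exp(zt) has the transform
$$\tilde{f}(\zeta)=\frac{1}{\zeta-z},\qquad\operatorname{Re}\zeta>\operatorname{Re}z.$$
More generally, consider a function $\tilde{f}(\zeta)$ with N simple poles in ζ:
$$\tilde{f}(\zeta)=\frac{1}{(\zeta-z_{1})(\zeta-z_{2})\cdots(\zeta-z_{N})},$$
where the complex numbers z_{1},…, z_{N} are all distinct. The inverse transform is
$$f(t)=\int_{\zeta_{0}-i\infty}^{\zeta_{0}+i\infty}\frac{d\zeta}{2\pi i}\frac{e^{\zeta t}}{(\zeta-z_{1})\cdots(\zeta-z_{N})},$$
where ζ_{0} > max[Re z_{1},…, Re z_{N}]. The contour can be closed by a large semicircle in the left half plane, and for N > 1 the contribution from the semicircle can be neglected. The integral is therefore given by the sum of the residues,
$$f(t)=\sum_{n=1}^{N}\frac{e^{z_{n}t}}{\prod_{m\neq n}(z_{n}-z_{m})},$$
which explicitly exhibits f(t) as a sum of exponentials.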
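For a transform of the form $\tilde{f}(\zeta)=1/\prod_{n}(\zeta-z_{n})$ with distinct poles, the residue sum f(t) = Σ_{n} e^{z_{n}t}/∏_{m≠n}(z_{n} − z_{m}) can be checked by transforming it back numerically. The pole positions, the evaluation point ζ and the integration cutoff below are illustrative assumptions. All poles are placed in the left half plane so that a finite cutoff suffices.

```python
import cmath


def inverse_by_residues(poles, t):
    """f(t) = sum_n exp(z_n t) / prod_{m != n} (z_n - z_m) for f~(zeta) = 1 / prod_n (zeta - z_n)."""
    total = 0j
    for n, zn in enumerate(poles):
        denom = 1
        for m, zm in enumerate(poles):
            if m != n:
                denom *= zn - zm
        total += cmath.exp(zn * t) / denom
    return total


def laplace(func, zeta, t_max=60.0, steps=60000):
    """Trapezoidal estimate of integral_0^t_max dt exp(-zeta t) func(t)."""
    h = t_max / steps
    total = 0.5 * (func(0.0) + cmath.exp(-zeta * t_max) * func(t_max))
    for j in range(1, steps):
        t = j * h
        total += cmath.exp(-zeta * t) * func(t)
    return total * h


poles = [-1.0, -2.0, -0.5 + 1.0j]  # hypothetical distinct poles, all with Re z < 0
zeta = 0.7                           # any point to the right of every pole
numeric = laplace(lambda t: inverse_by_residues(poles, t), zeta)
exact = 1 / ((zeta - poles[0]) * (zeta - poles[1]) * (zeta - poles[2]))
```

For N = 2 the residue sum reduces to (e^{z_{1}t} − e^{z_{2}t})/(z_{1} − z_{2}), and for N ≥ 2 it vanishes at t = 0.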
A.6 Functional analysis
A.6.1 Linear functionals
In normal usage, a function, e.g. f(x), is a rule assigning a unique value to each value of its argument. The argument is typically a point in some finite‐dimensional space, e.g. the real numbers ℝ, the complex numbers ℂ, three‐dimensional space ℝ^{3}, etc. The values of the function are also points in a finite‐dimensional space. For example, the classical electric field is represented by a function ε (r) that assigns a vector—a point in ℝ^{3}—to each position r in ℝ^{3}.
A rule, X, assigning a value to each point f in an infinite‐dimensional space 𝔐 (which is usually a space of functions) is called a functional and written as X [f]. The square brackets surrounding the argument are intended to distinguish functionals from functions of a finite number of variables.
If 𝔐 is a vector space, e.g. a Hilbert space, then a functional Y[f] that obeys
$$Y[\alpha f+\beta g]=\alpha Y[f]+\beta Y[g]$$
for all scalars α, β and all functions f, g ∈ 𝔐, is called a linear functional. The family, 𝔐′, of linear functionals on 𝔐 is called the dual space of 𝔐. The dual space is also a vector space, with linear combinations of its elements defined by
$$(\alpha Y+\beta Z)[f]=\alpha Y[f]+\beta Z[f]$$
for all f ∈ 𝔐.
A.6.2 Generalized functions
In Section 3.1.2 the definition (3.18) and the rule (3.21) are presented with the cavalier disregard for mathematical niceties that is customary in physics. There are, however, some situations in which more care is required. For these contingencies we briefly outline a more respectable treatment. The chief difficulty is the existence of the integrals defining the operators s(−∇^{2}). This problem can be overcome by restricting the functions ϕ(r) in eqn (3.18) to good functions (Lighthill, 1964, Chap. 2), i.e. infinitely-differentiable functions that fall off faster than any power of r. The Fourier transform of a good function is also a good function, so all of the relevant integrals exist, as long as s(k) grows no faster than a power of k at large k. The examples we need are all of the form k^{α}, where −1 ≤ α ≤ 1, so eqns (3.18) and (3.21) are justified. For physical applications the really important assumption is that all functions can be approximated by good functions.
A generalized function is a linear functional, say G[ϕ], defined on the good functions, i.e.
$$G[\alpha\phi+\beta\psi]=\alpha G[\phi]+\beta G[\psi]$$
for any scalars α, β and any good functions ϕ, ψ. A familiar example is the delta function. The rule
$$\delta_{\mathbf{R}}[\phi]=\int d^{3}r\,\delta(\mathbf{r}-\mathbf{R})\phi(\mathbf{r})=\phi(\mathbf{R})$$
maps the function ϕ(r) into the single number ϕ(R). In this language, the transverse delta function $\Delta_{ij}^{\perp}(\mathbf{r}-\mathbf{r}')$ is also a generalized function. An alternative terminology, often found in the mathematical literature, labels good functions as test functions and generalized functions as distributions.
In quantum field theory, the notion of a generalized function is extended to linear functionals sending good functions to operators, i.e. for each good function ϕ, X[ϕ] is an operator, and
$$X[\alpha\phi+\beta\psi]=\alpha X[\phi]+\beta X[\psi].$$
Such functionals are called operator-valued generalized functions. For any density operator ρ describing a physical state, X[ϕ] defines an ordinary (c-number) generalized function X_{ρ}[ϕ] by
$$X_{\rho}[\phi]=\operatorname{Tr}\left(\rho X[\phi]\right).$$
A.7 Improper functions
A.7.1 The Heaviside step function
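The step function θ(x) is defined by
$$\theta(x)=\begin{cases}1, & x>0,\\ 0, & x<0,\end{cases}$$
and it has the useful representation
$$\theta(x)=\lim_{\epsilon\to0^{+}}\int_{-\infty}^{\infty}\frac{dk}{2\pi i}\frac{e^{ikx}}{k-i\epsilon},$$
which is proved using contour integration.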
A.7.2 The Dirac delta function
A Standard properties
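(1) If the function f(x) has isolated, simple zeros at the points x^{1}, x^{2},… then
$$\delta(f(x))=\sum_{i}\frac{\delta(x-x^{i})}{|f'(x^{i})|}.$$
The multidimensional generalization of this rule is
$$\delta^{(N)}(\mathbf{f}(\mathbf{x}))=\sum_{i}\frac{\delta^{(N)}(\mathbf{x}-\mathbf{x}^{i})}{\left|\det(\partial\mathbf{f}/\partial\mathbf{x})\right|_{\mathbf{x}=\mathbf{x}^{i}}},$$
where x = (x_{1}, x_{2},…, x_{N}), f(x) = (f_{1}(x), f_{2}(x),…, f_{N}(x)), the Jacobian ∂f/∂x is the N × N matrix with components ∂f_{n}/∂x_{m}, and x^{i} satisfies f_{n}(x^{i}) = 0, for n = 1,…, N.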
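The one-dimensional rule can be checked by replacing δ with a narrow normalized Gaussian and integrating numerically. The sketch uses f(x) = x² − 1, with simple zeros at x = ±1 where |f′| = 2, and an arbitrary smooth g(x). The rule then predicts ∫dx δ(f(x))g(x) = [g(1) + g(−1)]/2 = 4. The width ε and the grid are assumptions of the example, and the agreement holds only up to corrections of order ε².

```python
import math


def nascent_delta(y, eps):
    """Normalized Gaussian of width eps; tends to delta(y) as eps -> 0."""
    return math.exp(-y * y / (2 * eps * eps)) / (eps * math.sqrt(2 * math.pi))


def integrate_delta_of_f(f, g, eps, a=-3.0, b=3.0, n=60001):
    """Trapezoidal estimate of integral dx delta_eps(f(x)) g(x) on [a, b]."""
    h = (b - a) / (n - 1)
    total = 0.0
    for j in range(n):
        x = a + j * h
        w = 0.5 if j in (0, n - 1) else 1.0
        total += w * nascent_delta(f(x), eps) * g(x)
    return total * h


def f(x):
    return x * x - 1  # simple zeros at x = +1 and x = -1, with |f'| = 2 there


def g(x):
    return x * x + x + 3  # smooth test function


numeric = integrate_delta_of_f(f, g, eps=1e-2)
predicted = (g(1.0) + g(-1.0)) / 2  # sum of g(x^i) / |f'(x^i)|
```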
(2) The derivative of the delta function is defined by
$$\int dx\,\delta'(x-a)f(x)=-f'(a).$$
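(3) By using contour integration methods one gets
$$\lim_{\epsilon\to0^{+}}\frac{1}{x\pm i\epsilon}=P\frac{1}{x}\mp i\pi\delta(x),$$
where P is the principal part defined by
$$P\int dx\,\frac{f(x)}{x}=\lim_{\epsilon\to0^{+}}\left(\int_{-\infty}^{-\epsilon}+\int_{\epsilon}^{\infty}\right)dx\,\frac{f(x)}{x}.$$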
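Rule (3) with the lower sign, 1/(x − iε) → P(1/x) + iπδ(x), can be checked numerically for a small but finite ε. Multiplying numerator and denominator by (x + iε) splits the integral into real and imaginary parts. The principal part is computed independently as the regular integral of [f(x) − f(−x)]/x over x > 0. The test function, ε and the grids are illustrative assumptions, and the agreement is only to order ε.

```python
import math


def f(x):
    # Hypothetical smooth test function, deliberately not even
    return math.exp(-(x - 0.5) ** 2)


def integral_with_minus_i_eps(func, eps, x_max=8.0, n=80001):
    """Trapezoidal estimate of integral dx func(x)/(x - i eps), via
    1/(x - i eps) = (x + i eps)/(x**2 + eps**2); returns (real part, imaginary part)."""
    h = 2 * x_max / (n - 1)
    re = im = 0.0
    for j in range(n):
        x = -x_max + j * h
        w = 0.5 if j in (0, n - 1) else 1.0
        d = x * x + eps * eps
        re += w * func(x) * x / d
        im += w * func(x) * eps / d
    return re * h, im * h


def principal_value(func, x_max=8.0, n=80000):
    """P integral dx func(x)/x, as the regular integral of (func(x) - func(-x))/x on (0, x_max)."""
    h = x_max / n
    total = 0.0
    for j in range(n):
        x = (j + 0.5) * h  # midpoint rule avoids evaluating at x = 0
        total += (func(x) - func(-x)) / x
    return total * h


re, im = integral_with_minus_i_eps(f, eps=2e-3)
pv = principal_value(f)
```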
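(4) The definition of the Fourier transform yields
$$\int dx\,e^{ikx}=2\pi\delta(k)$$
in one dimension, and
$$\int d^{3}r\,e^{i\mathbf{k}\cdot\mathbf{r}}=(2\pi)^{3}\delta(\mathbf{k})$$
in three dimensions.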
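(5) The step function satisfies
$$\frac{d\theta(x)}{dx}=\delta(x).$$
(6) The end-point rule is
$$\int_{0}^{\infty}dx\,\delta(x)f(x)=\frac{1}{2}f(0).$$
(7) The three-dimensional delta function δ(r − r′) is defined as
$$\delta(\mathbf{r}-\mathbf{r}')=\delta(x-x')\delta(y-y')\delta(z-z'),$$
and is expressed in polar coordinates by
$$\delta(\mathbf{r}-\mathbf{r}')=\frac{\delta(r-r')\,\delta(\theta-\theta')\,\delta(\phi-\phi')}{r^{2}\sin\theta}.$$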
B A special representation of the delta function
In many calculations, particularly in perturbation theory, one encounters functions of the form
which have the limit
provided that the integral
exists.
A.7.3 Integral kernels
The definition of a generalized function as a linear rule assigning a complex number to each good function can be extended to a linear rule that maps a good function, e.g. f(t), to another good function g(t). The linear nature of the rule means that it can always be expressed in the form
$$g(t)=\int dt'\,W(t,t')f(t').$$
For a fixed value of t, W(t, t′) defines a generalized function of t′ which is called an integral kernel. This definition is easily extended to functions of several variables, e.g. f(r). The delta function, the Heaviside step function, etc. are examples of integral kernels. An integral kernel is positive definite if
$$\int dt\int dt'\,f^{*}(t)W(t,t')f(t')\geq0$$
for every good function f(t).
A.8 Probability and random variables
A.8.1 Axioms of probability
The abstract definition of probability starts with a set ω of events and a probability function P that assigns a numerical value to every subset of ω. In principle, ω could be any set, but in practice it is usually a subset of ℝ^{N} or ℂ^{N}, or a subset of the integers. The essential properties of probabilities are contained in the axioms (Gardiner, 1985, Chap. 2):
(1) P(S) ≥ 0 for all S ⊂ ω;
(2) P(ω) = 1;
(3) if S_{1}, S_{2},… is a discrete (countable) collection of nonoverlapping sets, i.e.
$$S_{i}\cap S_{j}=\emptyset\quad\text{for }i\neq j,$$
then
$$P\left(\bigcup_{i}S_{i}\right)=\sum_{i}P(S_{i}).$$
The familiar features 0 ≤ P(S) ≤ 1, P(∅) = 0, and P(S′) = 1 − P(S), where S′ is the complement of S, are immediate consequences of the axioms. If ω is a discrete (countable) set, then one writes P(x) = P({x}), where {x} is the set consisting of the single element x. If ω is a continuous (uncountable) set, then it is customary to introduce a probability density p(x) so that
$$P(S)=\int_{S}dx\,p(x),$$
where dx is the natural volume element on ω.
If ω = ℝ^{n}, the probability density is a function of n variables: p(x_{1}, x_{2},…, x_{n}). The marginal distribution of x_{j} is then defined as
$$p_{j}(x_{j})=\int\prod_{i\neq j}dx_{i}\,p(x_{1},x_{2},\ldots,x_{n}).$$
The joint probability for two sets S and T is P(S ∩ T); this is the probability that an event lies in both S and T. This is more often expressed with the notation
$$P(S,T)=P(S\cap T),$$
which is used in the text. The conditional probability for S given T is
$$P(S|T)=\frac{P(S,T)}{P(T)};$$
this is the probability that x ∈ S, given that x ∈ T.
The compound probability rule is just eqn (A.111) rewritten as
$$P(S,T)=P(S|T)P(T).$$
This can be generalized to joint probabilities for more than two outcomes by applying it several times, e.g.
$$P(T,S,R)=P(T|S,R)P(S,R)=P(T|S,R)P(S|R)P(R).$$
Dividing both sides by P(R) yields the useful rule
$$P(T,S|R)=P(T|S,R)P(S|R).$$
Two sets of events S and T are said to be independent or statistically independent if the joint probability is the product of the individual probabilities:
$$P(S,T)=P(S)P(T).$$
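These rules can be checked exactly on a small discrete event space. The sketch takes ω to be the 36 ordered outcomes of two dice and assumes all outcomes are equally likely. Events are represented as predicates, and exact fractions are used throughout. The specific events R, S, T are arbitrary choices made for illustration.

```python
from fractions import Fraction
from itertools import product

# Event space: ordered outcomes of two dice; equal weights are an assumption of the example
OMEGA = list(product(range(1, 7), repeat=2))


def prob(*events):
    """Joint probability P(S, T, ...) = P(S ∩ T ∩ ...); each event is a predicate on outcomes."""
    hits = sum(1 for x in OMEGA if all(event(x) for event in events))
    return Fraction(hits, len(OMEGA))


def cond(events, given):
    """Conditional probability P(events | given) = P(events, given) / P(given)."""
    return prob(*events, *given) / prob(*given)


def first_even(x):
    return x[0] % 2 == 0


def sum_at_least_7(x):
    return x[0] + x[1] >= 7


def second_is_6(x):
    return x[1] == 6


R, S, T = first_even, sum_at_least_7, second_is_6
lhs = cond([T, S], [R])                      # P(T, S | R)
rhs = cond([T], [S, R]) * cond([S], [R])     # P(T | S, R) P(S | R)
```

In this example, "first die even" and "second die shows 6" are independent. "First die even" and "sum at least 7" are not, because P(R, S) = 1/3 while P(R)P(S) = 7/24.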
A.8.2 Random variables
A random variable X is a function X(x) defined on the event space ω. The function can take on values in ω or in some other set. For example, if ω = ℝ, then X(x) could be a complex number or an integer. The average value of a random variable is
$$\langle X\rangle=\int_{\omega}dx\,p(x)X(x),$$
with the integral replaced by Σ_{x} P(x)X(x) when ω is discrete.
If the function X does take on values in ω, and is one‐one, i.e. X (x_{1}) = X (x_{2}) implies x_{1} = x_{2}, then the distinction between X (x) and x is often ignored.