Analytical Mechanics for Relativity and Quantum Mechanics

Oliver Johns

Print publication date: 2011

Print ISBN-13: 9780191001628

Published to Oxford Scholarship Online: December 2013

DOI: 10.1093/acprof:oso/9780191001628.001.0001


Appendix B

Matrices and Determinants

We summarize here some of the standard properties of matrices and determinants that are used throughout the text. Proofs not given here can be found in most surveys of linear algebra, for example Birkhoff and MacLane (1977), Mirsky (1961), and Strang (1998).

B.1 Definition of Matrices

An M × N matrix is a rectangular array of numbers placed in M rows and N columns. These numbers are called matrix elements and are indexed by subscripts ij where i = 1,…, M is the row index and j = 1,…, N the column index. Thus a 3 × 4 matrix A would be written as

A = \begin{pmatrix} A_{11} & A_{12} & A_{13} & A_{14} \\ A_{21} & A_{22} & A_{23} & A_{24} \\ A_{31} & A_{32} & A_{33} & A_{34} \end{pmatrix}
(B.1)
If all of the matrix elements A ij are real numbers, then A is a real matrix. If one or more of the matrix elements are complex numbers, the matrix is a complex matrix.

B.2 Transposed Matrix

The transpose of an M × N matrix A is an N × M matrix denoted as A T or Ã. It is defined by listing its matrix elements for all values i = 1,…, M and j = 1,…, N,

A^T_{ij} = A_{ji}
(B.2)
where A^T_{ij} denotes the ijth matrix element of the matrix A^T. For example, if
B = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \quad\text{then}\quad B^T = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}
(B.3)
By definition, the transpose of the transpose is just the original matrix,
(A^T)^T = A
(B.4)

B.3 Column Matrices and Column Vectors

A special case of importance is the M × 1 matrix called a column matrix. Since these matrices are often used to represent the components of a vector, they are also called column vectors. They are denoted by enclosing their label in square brackets and suppressing the second subscript of their matrix elements,

[V] = \begin{pmatrix} V_1 \\ V_2 \\ \vdots \\ V_M \end{pmatrix}
(B.5)
The transpose of an M × 1 column matrix is a 1 × M row matrix or row vector,
[V]^T = \begin{pmatrix} V_1 & V_2 & \cdots & V_M \end{pmatrix}
(B.6)

The column vector all of whose matrix elements are zeroes is called the null vector and is denoted by [0], although sometimes it is also denoted by just the scalar 0,

[0] = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}
(B.7)

B.4 Square, Symmetric, and Hermitian Matrices

If M = N, the matrix is called an N ‐rowed square matrix. The transpose of a square matrix is also a square matrix, with the same number of rows. The transpose of a square matrix can be thought of as a reflection of the matrix elements about the diagonal from upper left to lower right. Thus, if

A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix} \quad\text{then}\quad A^T = \begin{pmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{pmatrix}
(B.8)

If a square matrix S is equal to its transpose, then it is said to be symmetric. Such symmetric matrices have

S^T = S \quad\text{and hence}\quad S^T_{ij} = S_{ji} = S_{ij}
(B.9)
for all ij values. For example,
S = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 5 & 6 \\ 3 & 6 & 9 \end{pmatrix}
(B.10)
is a real, symmetric matrix.

If the square matrix C is equal to the negative of its transpose, then it is called anti‐symmetric or skew‐symmetric,

C = -C^T \quad\text{and hence}\quad C_{ij} = -C_{ji}
(B.11)
The diagonal elements of anti‐symmetric matrices are zeroes. For example,
C = \begin{pmatrix} 0 & 2 & 3 \\ -2 & 0 & 6 \\ -3 & -6 & 0 \end{pmatrix}
(B.12)
is a real, anti‐symmetric matrix with N = 3.

The complex conjugate of a complex matrix H is the matrix all of whose matrix elements are complex conjugated. Thus H^* has matrix elements H^*_{ij}, where * denotes complex conjugation.

The Hermitian conjugate of a complex matrix H is denoted H^\dagger and is defined by

H^\dagger = (H^T)^* = (H^*)^T \quad\text{and hence}\quad H^\dagger_{ij} = H^*_{ji}
(B.13)
where H^\dagger_{ij} denotes the ijth matrix element of the matrix H^\dagger.

For complex matrices, the concept of symmetric matrices is generalized to that of Hermitian matrices. A square matrix is Hermitian if it is equal to its Hermitian conjugate,

H^\dagger = H \quad\text{and hence}\quad H^\dagger_{ij} = H^*_{ji} = H_{ij}
(B.14)
for all ij values. A matrix that is both Hermitian and real is therefore symmetric, since complex conjugation has no effect on real numbers.

Anti‐Hermitian matrices can also be defined, by analogy to the anti‐symmetric matrices above. A matrix G is anti‐Hermitian if

G^\dagger = -G
(B.15)

B.5 Algebra of Matrices: Addition

Two matrices are equal if and only if they have the same number of rows and columns and all of their matrix elements are equal. Thus if A and B both are M × N matrices then

A = B \quad\text{if and only if}\quad A_{ij} = B_{ij}
(B.16)
for all possible values i = 1,…, M and j = 1,…, N.

A matrix may be pre‐ or post‐multiplied by a number α. If the matrix A has matrix elements A ij then the matrix α A has matrix elements α A ij. Note that each matrix element is multiplied by α. Thus, using the matrix of eqn (B.1) as an example,

A\alpha = \alpha A = \begin{pmatrix} \alpha A_{11} & \alpha A_{12} & \alpha A_{13} & \alpha A_{14} \\ \alpha A_{21} & \alpha A_{22} & \alpha A_{23} & \alpha A_{24} \\ \alpha A_{31} & \alpha A_{32} & \alpha A_{33} & \alpha A_{34} \end{pmatrix}
(B.17)

Matrices of the same number of rows and columns may be added. The result will be a matrix also of the same type. Each matrix element of the sum is just the sum of the corresponding matrix elements of the addends. Thus, if A and B both are M × N matrices then

C = A + B \quad\text{if and only if}\quad C_{ij} = A_{ij} + B_{ij}
(B.18)
for all possible values i = 1,…, M and j = 1,…, N.

For example, with M = N = 2,

A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}
(B.19)
give
C = A + B = \begin{pmatrix} A_{11}+B_{11} & A_{12}+B_{12} \\ A_{21}+B_{21} & A_{22}+B_{22} \end{pmatrix}
(B.20)

A null matrix 0 is defined as the matrix all of whose elements are zeroes. It has the property that

0 + A=A=A+0
(B.21)
for all matrices A. The null matrix 0 is often denoted by just the number 0.

Matrix addition is associative,

A+(B+C)=(A+B)+C
(B.22)

B.6 Algebra of Matrices: Multiplication

Matrices may also be multiplied by each other. If A is an M × N matrix and B is an N × P matrix, then C = AB is an M × P matrix with matrix elements, for all i = 1,…, M and j = 1,…, P, given by

C_{ij} = \sum_{k=1}^{N} A_{ik} B_{kj}
(B.23)
Notice that the second index of A and the first index of B are both set equal to k and then summed over k = 1,…,N. This is possible only if the number of columns in A is the same as the number of rows in B.

A useful graphical device is to write the two matrices down in the order in which they are to be multiplied. Then run a left‐hand finger over the i th row of A and simultaneously run a right‐hand finger over the j th column of B, imagining the matrix elements touched to be multiplied and added. The result will be C ij.

It follows from eqns (B.2, B.23) that the transpose of a matrix product is a product of the transposes in reverse order,

(AB)^T = B^T A^T
(B.24)

Matrices A and B can often be multiplied in either order, A B or B A, but the resulting matrix products will in general be different. We say that matrix multiplication is in general not commutative. If two particular matrices happen to give the same product when they are multiplied in opposite orders, then we say that they commute. Only square matrices with the same number of rows can commute. For such matrices, the commutator of the two matrices is defined as

[A, B]_c = AB - BA
(B.25)
The two matrices commute if and only if their commutator is the null matrix. (The subscript c is to distinguish commutators from Poisson brackets.)
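
The rules above are easy to verify numerically. The following short Python/NumPy sketch (not part of the original text; the matrix entries are arbitrary examples) checks eqn (B.23), the transpose rule of eqn (B.24), and computes the commutator of eqn (B.25) for two matrices that do not commute.

```python
# Numerical sketch of eqns (B.23)-(B.25); the matrices are arbitrary examples.
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

C = A @ B                                   # C_ij = sum_k A_ik B_kj, eqn (B.23)
assert np.allclose(C.T, B.T @ A.T)          # (A B)^T = B^T A^T, eqn (B.24)

commutator = A @ B - B @ A                  # [A, B]_c, eqn (B.25)
print(commutator)                           # non-null: these matrices do not commute
```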

Matrix multiplication is associative. Thus

A(B C)=(A B)C
(B.26)

Besides noncommutativity, another important difference between the algebra of matrices and the algebra of numbers concerns null products, sometimes called divisors of zero. For numbers, α β =0 implies that either α = 0 or β = 0, or both. However, for matrices A B = 0 is possible even when both A and B are non‐null.

B.7 Diagonal and Unit Matrices

The matrix elements A_{ii} with equal row and column indices are called diagonal elements. A square matrix whose only nonzero elements are the diagonal ones is called a diagonal matrix. Thus

D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 7 \end{pmatrix}
(B.27)
is a three‐rowed diagonal matrix.

A diagonal matrix whose diagonal elements are all ones is called the unit matrix. For example, the unit matrix with N = 3 is

U = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
(B.28)
As can be seen from eqn (B.23) and the fact that U_{ij} = δ_{ij}, the Kronecker delta, pre‐ or post‐multiplication of any matrix A by U leaves it unchanged,
U A = A = A U
(B.29)

A diagonal matrix all of whose diagonal elements are equal is called a scalar matrix. Thus, for some number α

S = \begin{pmatrix} \alpha & 0 & 0 \\ 0 & \alpha & 0 \\ 0 & 0 & \alpha \end{pmatrix}
(B.30)
is a scalar matrix with N = 3. Pre‐ or post‐multiplication of any square matrix A by a scalar matrix of the same size has the same effect as multiplying A by the diagonal number. Thus, if the diagonal elements of scalar matrix S are all equal to the number α
S A = \alpha A = A S
(B.31)

B.8 Trace of a Square Matrix

The trace of an N ‐rowed square matrix A is denoted Tr A and is defined as the sum of the diagonal elements,

\operatorname{Tr} A = \sum_{i=1}^{N} A_{ii}
(B.32)
The trace of a product of matrices is unchanged by a cyclic permutation of them. Thus, for example,
\operatorname{Tr}(ABC) = \operatorname{Tr}(CAB) = \operatorname{Tr}(BCA)
(B.33)
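
As a quick numerical illustration (not from the text; random matrices are used), the cyclic invariance of the trace in eqn (B.33) can be checked directly:

```python
# Check Tr(ABC) = Tr(CAB) = Tr(BCA) for arbitrary 3x3 matrices.
import numpy as np

rng = np.random.default_rng(0)
A, B, C = rng.random((3, 3)), rng.random((3, 3)), rng.random((3, 3))

t = np.trace(A @ B @ C)
assert np.isclose(t, np.trace(C @ A @ B))
assert np.isclose(t, np.trace(B @ C @ A))
```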

B.9 Differentiation of Matrices

Suppose that each matrix element of an M × N matrix is a function of some parameter β. Thus A ij = A ij (β) and A = A (β). Then d A (β)/d β is defined as that matrix whose matrix elements are d A ij(β)/d β. If

A = \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \end{pmatrix} \quad\text{then}\quad \frac{dA}{d\beta} = \begin{pmatrix} \frac{dA_{11}}{d\beta} & \frac{dA_{12}}{d\beta} & \frac{dA_{13}}{d\beta} \\ \frac{dA_{21}}{d\beta} & \frac{dA_{22}}{d\beta} & \frac{dA_{23}}{d\beta} \end{pmatrix}
(B.34)
Note that each matrix element is individually differentiated.

The product rule holds for matrix products. If

C = AB \quad\text{then}\quad \frac{dC}{d\beta} = \frac{dA}{d\beta} B + A \frac{dB}{d\beta}
(B.35)
as can be seen by applying the ordinary rules of differentiation to eqn (B.23). Note that the order of the factors in the matrix product must be preserved as the product rule is applied.

B.10 Determinants of Square Matrices

The determinant of an N ‐rowed square matrix A is a single number, calculated from the elements of the matrix and denoted det A or ǀAǀ.

The rule for calculating the determinant uses the idea of a permutation (k 1, k 2,…, k N) of the integers (1,2,…, N). For example, for N = 3, two such permutations might be (3,1,2) and (1,3,2). Each such permutation is either even or odd. If an even(odd) number of exchanges is required to go from (1,2,…, N) to the final arrangement (k 1, k 2,…, k N), then the permutation is even(odd). In the above example, (3,1,2) is even and (1,3,2) is odd. The function ε(k 1, k 2,…, k N) is defined to be +1 for even permutations and −1 for odd ones.

The determinant is defined as a sum of products of elements A_{ij}. Each product in this sum has one matrix element chosen from each row, with the choices of column being given by the permutation (k_1, k_2,…, k_N). The sum is over all N! possible permutations,

|A| = \sum_{(k_1, k_2, \ldots, k_N)} \varepsilon(k_1, k_2, \ldots, k_N)\, A_{1 k_1} A_{2 k_2} \cdots A_{N k_N}
(B.36)
Alternately, the same determinant can be written using one element from each column and permutations of the rows, as
|A| = \sum_{(k_1, k_2, \ldots, k_N)} \varepsilon(k_1, k_2, \ldots, k_N)\, A_{k_1 1} A_{k_2 2} \cdots A_{k_N N}
(B.37)

If A has N = 2,

|A| = A_{11} A_{22} - A_{12} A_{21}
(B.38)
If N = 3,
|A| = A_{11} A_{22} A_{33} + A_{12} A_{23} A_{31} + A_{13} A_{21} A_{32} - A_{13} A_{22} A_{31} - A_{11} A_{23} A_{32} - A_{12} A_{21} A_{33}
(B.39)
which can be remembered by a simple mnemonic device: Starting at the upper left, diagonals down to the right from each element of the first row give the positive products. Then, starting again at the lower left, diagonals up and to the right from each element of the last row give the negative products. However, this mnemonic fails for N = 4 or greater, and one of the expansion theorems listed below becomes the method of choice for evaluating the determinant.
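
The permutation definition of eqn (B.36) can also be implemented directly for small N. The sketch below (illustrative only, since the sum has N! terms; the matrix entries are arbitrary) compares it with NumPy's built-in determinant.

```python
# Determinant by the permutation sum of eqn (B.36).
import numpy as np
from itertools import permutations

def parity(perm):
    """+1 for an even permutation of (0, 1, ..., N-1), -1 for an odd one."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det_by_permutations(A):
    N = A.shape[0]
    total = 0.0
    for perm in permutations(range(N)):      # all N! choices of columns
        term = parity(perm)                  # epsilon(k_1, ..., k_N)
        for i, k in enumerate(perm):         # one element from each row
            term *= A[i, k]
        total += term
    return total

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 4.0],
              [0.0, 5.0, 6.0]])
assert np.isclose(det_by_permutations(A), np.linalg.det(A))
```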

B.11 Properties of Determinants

We list here some of the important properties of the determinants of square matrices. The proofs can be found in any standard text on linear algebra.

  1. If any two rows of the matrix are exchanged, or any two columns of the matrix are exchanged, the new determinant is −1 times the old one.

  2. If any row(column) of the matrix is all zeroes, then the determinant is zero.

  3. If any row(column) of the matrix is multiplied by any constant α and then added to any other row(column), the value of the determinant is unchanged.

  4. If any two rows(columns) are identical, or can be made identical by multiplying all of the elements in one of them by the same number, then the determinant is zero.

  5. Transposing a matrix does not change its determinant. For any square matrix, |A^T| = |A|.

  6. The determinant of a diagonal matrix is equal to the product of its diagonal elements. That is, if A_{ij} = λ_i δ_{ij}, then |A| = λ_1 λ_2 ⋯ λ_N.

  7. The determinant of the unit matrix is the number one, |U| = 1.

  8. The determinant of the null matrix is the number zero, |0| = 0.

  9. If an N‐rowed square matrix A is multiplied by a number α, then its determinant is multiplied by α^N, that is |αA| = α^N |A|.

  10. The determinant of a product is the product of the determinants. If C = AB then |C| = |A| |B|, which can be generalized to the product of any number of square matrices.

  11. A matrix whose determinant is zero is called a singular matrix. If AB = 0 then |A| |B| = 0. Hence AB = 0 implies that at least one of A or B must be singular.

B.12 Cofactors

Let A be any N‐rowed square matrix. Suppose that we delete all elements in its ith row and all elements in its jth column and collect the remaining matrix elements into a new matrix, preserving their relative orders. The result will be an (N − 1)‐rowed square matrix, which we denote by \bar{A}^{(ij)}. The N‐rowed square matrix a is then defined by letting its ijth matrix element be the determinant of \bar{A}^{(ij)} multiplied by a factor of plus or minus one,

a_{ij} = (-1)^{i+j} \left| \bar{A}^{(ij)} \right|
(B.40)
The quantity a ij is called the cofactor of the matrix element A ij and the matrix a is called the matrix of cofactors. For example, if
A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}
(B.41)
then
\bar{A}^{(23)} = \begin{pmatrix} 1 & 2 \\ 7 & 8 \end{pmatrix} \quad\text{and}\quad a_{23} = (-1)^{2+3}(-6) = 6
(B.42)
with the other elements of a defined similarly.

The matrix of cofactors has the property that

\sum_{k=1}^{N} A_{ik} a_{jk} = \delta_{ij} |A| = \sum_{k=1}^{N} A_{ki} a_{kj}
(B.43)
where δ_{ij} is the Kronecker delta function.

B.13 Expansion of a Determinant by Cofactors

If we set i = j and use the first equality in eqn (B.43), then we obtain an expansion for |A| by cofactors along its ith row. Each element A_{ik} of the ith row is multiplied by the determinant |\bar{A}^{(ik)}| and the factor (−1)^{i+k}, and then summed over k,

|A| = \sum_{k=1}^{N} A_{ik} a_{ik} = \sum_{k=1}^{N} (-1)^{i+k} A_{ik} \left| \bar{A}^{(ik)} \right|
(B.44)
Notice that the (−1)^{i+k} factors can be remembered easily since they have the pattern of a checkerboard.

Similarly, the determinant can also be calculated by expansion along its ith column, using the second equality in eqn (B.43),

|A| = \sum_{k=1}^{N} A_{ki} a_{ki} = \sum_{k=1}^{N} (-1)^{i+k} A_{ki} \left| \bar{A}^{(ki)} \right|
(B.45)

These expansion theorems allow the determinant of any N‐rowed matrix to be reduced to a sum whose terms involve only determinants of size N − 1. This expansion becomes particularly useful when one precedes it by a judicious use of Property 3 of Section B.11 to make several of the elements along a particular row(column) become zeroes.

Modern computer algebra systems automate the numerical, and even the symbolic, evaluation of large determinants. But these expansion theorems remain important. For example, the proof of Theorem D.23.1 in Appendix D makes use of them.
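
A direct, if inefficient, implementation of the expansion along the first row, eqn (B.44), looks like the following sketch (compared with NumPy's determinant; the recursion costs O(N!) and is meant only to illustrate the theorem, and the test matrix is arbitrary).

```python
# Recursive cofactor expansion along the first row, eqn (B.44).
import numpy as np

def det_by_cofactors(A):
    N = A.shape[0]
    if N == 1:
        return A[0, 0]
    total = 0.0
    for k in range(N):
        # Delete row 1 and column k+1 (0-based: row 0, column k) to form A-bar.
        sub = np.delete(np.delete(A, 0, axis=0), k, axis=1)
        total += (-1) ** k * A[0, k] * det_by_cofactors(sub)   # checkerboard sign
    return total

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 4.0],
              [0.0, 5.0, 6.0]])
assert np.isclose(det_by_cofactors(A), np.linalg.det(A))
```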

B.14 Inverses of Nonsingular Matrices

The inverse of an N‐rowed square matrix A, if it exists, is denoted A −1 and is defined by the property

A^{-1} A = U = A A^{-1}
(B.46)
where U is the N‐rowed unit matrix.

Theorem B.14.1: Matrix Inverses

A square matrix A has an inverse if and only if ǀAǀ ≠ 0, that is if and only if A is nonsingular.

Proof: First we assume that the inverse exists and prove that ǀ Aǀ≠ 0. Taking the determinant of both sides of the left equality of eqn (B.46) gives

\left| A^{-1} \right| |A| = |U| = 1
(B.47)
Thus both ǀ Aǀ and ǀ A −1ǀ must be nonzero.

Conversely, assume that ǀ Aǀ≠ 0 and define A −1 by giving its matrix elements

A^{-1}_{ij} = \frac{1}{|A|}\, a_{ji}
(B.48)
where a ij are the cofactors defined in Section B.12. It then follows from the second equality in eqn (B.43) that, for all ij,
\left( A^{-1} A \right)_{ij} = \sum_{k=1}^{N} A^{-1}_{ik} A_{kj} = \frac{1}{|A|} \sum_{k=1}^{N} a_{ki} A_{kj} = \frac{1}{|A|} \delta_{ji} |A| = \delta_{ij} = U_{ij}
(B.49)
and hence that A −1 A = U as was to be proved. Similarly, the first equality in eqn (B.43) proves that A A −1 = U for the product in reverse order.    ◻

This theorem for the existence of matrix inverses is of great importance in Lagrangian mechanics and in the general calculus of many variables.
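
The constructive half of the proof, eqn (B.48), translates directly into the following sketch, which builds the inverse from the transposed matrix of cofactors and checks it against numpy.linalg.inv (the matrix entries are arbitrary; this is an illustration, not an efficient method).

```python
# Inverse from cofactors, eqn (B.48): A^{-1}_{ij} = a_{ji} / |A|.
import numpy as np

def cofactor_matrix(A):
    N = A.shape[0]
    a = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            sub = np.delete(np.delete(A, i, axis=0), j, axis=1)
            a[i, j] = (-1) ** (i + j) * np.linalg.det(sub)     # eqn (B.40)
    return a

def inverse_by_cofactors(A):
    detA = np.linalg.det(A)
    if np.isclose(detA, 0.0):
        raise ValueError("matrix is singular, so no inverse exists")
    return cofactor_matrix(A).T / detA

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 4.0],
              [0.0, 5.0, 6.0]])
assert np.allclose(inverse_by_cofactors(A), np.linalg.inv(A))
assert np.allclose(inverse_by_cofactors(A) @ A, np.eye(3))
```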

B.15 Partitioned Matrices

A matrix may be divided into partitions, each of which is itself a matrix. This is best seen by an example, which can be generalized to more complicated cases. Suppose that we have a 5 × 5 matrix E that is composed of four partitions: a 3 × 3 matrix A, a 3 × 2 matrix B, a 2 × 3 matrix C, and a 2 × 2 matrix D,

E = \begin{pmatrix} A_{11} & A_{12} & A_{13} & B_{11} & B_{12} \\ A_{21} & A_{22} & A_{23} & B_{21} & B_{22} \\ A_{31} & A_{32} & A_{33} & B_{31} & B_{32} \\ C_{11} & C_{12} & C_{13} & D_{11} & D_{12} \\ C_{21} & C_{22} & C_{23} & D_{21} & D_{22} \end{pmatrix}
(B.50)

It follows from the basic definitions in Section B.10 that the determinants of partitioned matrices containing null partitions can sometimes be written in terms of the determinants of their non‐null parts. For example, if the matrix B is a null matrix (all zeroes) then |E| = |A| |D|, which does not depend on the elements C_{ij}. Also, if matrix C is a null matrix (all zeroes) then |E| = |A| |D|, which does not depend on the elements B_{ij}.

If both B and C are null matrices, then the matrix is what is called a block diagonal matrix. For example,

F = \begin{pmatrix} A_{11} & A_{12} & A_{13} & 0 & 0 \\ A_{21} & A_{22} & A_{23} & 0 & 0 \\ A_{31} & A_{32} & A_{33} & 0 & 0 \\ 0 & 0 & 0 & D_{11} & D_{12} \\ 0 & 0 & 0 & D_{21} & D_{22} \end{pmatrix}
(B.51)
is a block diagonal matrix with two blocks of different size. Then ǀ F ǀ = ǀ A ǀ ǀ D ǀ, a result that can be generalized to block matrices with any number of diagonal blocks.

One extremely important example of the partitioning of matrices is to consider each of the N columns of an M × N matrix to be an M × 1 column vector. Thus the 3 × 4 matrix in eqn (B.1) can be written as

A = \begin{pmatrix} \begin{pmatrix} A_{11} \\ A_{21} \\ A_{31} \end{pmatrix} & \begin{pmatrix} A_{12} \\ A_{22} \\ A_{32} \end{pmatrix} & \begin{pmatrix} A_{13} \\ A_{23} \\ A_{33} \end{pmatrix} & \begin{pmatrix} A_{14} \\ A_{24} \\ A_{34} \end{pmatrix} \end{pmatrix}
(B.52)
where we have put brackets around the columns to emphasize the partitioning.

An M × N matrix can also be partitioned in a similar way into row vectors. Each of the M rows can be considered to be a 1 × N row vector.

The multiplication of two matrices, each of which is partitioned into square blocks of the same dimension, can be done by the standard rule eqn (B.23) with numbers replaced by blocks. For example, if the blocks Ak, Bk, Ck, and Dk are all N × N square matrices, then

\begin{pmatrix} A_1 & B_1 \\ C_1 & D_1 \end{pmatrix} \begin{pmatrix} A_2 & B_2 \\ C_2 & D_2 \end{pmatrix} = \begin{pmatrix} A_1 A_2 + B_1 C_2 & A_1 B_2 + B_1 D_2 \\ C_1 A_2 + D_1 C_2 & C_1 B_2 + D_1 D_2 \end{pmatrix}
(B.53)

B.16 Cramer's Rule

Suppose that we wish to solve a system of N linear equations in N unknowns, of the form

A_{11} x_1 + A_{12} x_2 + \cdots + A_{1N} x_N = b_1
A_{21} x_1 + A_{22} x_2 + \cdots + A_{2N} x_N = b_2
\vdots
A_{N1} x_1 + A_{N2} x_2 + \cdots + A_{NN} x_N = b_N
(B.54)
where A ij and b i are given numbers, and the x j are the unknowns to be solved for. Defining an N×N square matrix by
A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1N} \\ A_{21} & A_{22} & \cdots & A_{2N} \\ \vdots & \vdots & & \vdots \\ A_{N1} & A_{N2} & \cdots & A_{NN} \end{pmatrix}
(B.55)
and column vectors by
[x] = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{pmatrix} \quad\text{and}\quad [b] = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_N \end{pmatrix}
(B.56)
eqn (B.54) can be written in matrix form as A[x] = [b]. If A is nonsingular, the solution is obtained by applying A −1 to both sides of this equation to get
[x] = A^{-1} [b]
(B.57)

Cramer's rule is a method of getting that same solution without the labor of calculating the inverse of A directly. To apply it, define matrix A^{(j)} to be matrix A partitioned into its columns in the style of eqn (B.52), but with the jth column replaced by the column vector [b]. Then, for all j = 1,…, N, the matrix elements of [x], in other words the solution, are given by

x_j = \frac{\left| A^{(j)} \right|}{|A|}
(B.58)
Thus, for example,
x_1 = \frac{\begin{vmatrix} b_1 & A_{12} & \cdots & A_{1N} \\ b_2 & A_{22} & \cdots & A_{2N} \\ \vdots & \vdots & & \vdots \\ b_N & A_{N2} & \cdots & A_{NN} \end{vmatrix}}{\begin{vmatrix} A_{11} & A_{12} & \cdots & A_{1N} \\ A_{21} & A_{22} & \cdots & A_{2N} \\ \vdots & \vdots & & \vdots \\ A_{N1} & A_{N2} & \cdots & A_{NN} \end{vmatrix}}
(B.59)
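
The rule is easy to state in code. The sketch below (matrix and right-hand side arbitrary) solves a small system by eqn (B.58) and compares the result with a standard solver; for large systems one would of course solve A[x] = [b] directly rather than form N + 1 determinants.

```python
# Cramer's rule, eqn (B.58): x_j = |A^(j)| / |A|.
import numpy as np

def cramer_solve(A, b):
    detA = np.linalg.det(A)
    if np.isclose(detA, 0.0):
        raise ValueError("matrix is singular; Cramer's rule does not apply")
    x = np.empty(len(b))
    for j in range(A.shape[1]):
        Aj = A.copy()
        Aj[:, j] = b                         # A^(j): column j replaced by [b]
        x[j] = np.linalg.det(Aj) / detA
    return x

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 4.0],
              [0.0, 5.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
assert np.allclose(cramer_solve(A, b), np.linalg.solve(A, b))
```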

B.17 Minors and Rank

Starting from an M × N matrix A, suppose that we delete any M − R rows (which need not be contiguous). We then delete any N − R columns (which also need not be contiguous). Collecting all elements that remain, and re‐assembling them without changing their relative orders, gives an R × R square matrix m, where of course R ≤ min{M, N}. The determinant of this matrix |m| is called an R‐rowed minor of A.

The rank of matrix A is denoted R(A). It is the largest value of R for which there exists a nonzero R‐rowed minor of A. A nonzero minor with R = R(A) is called a critical minor of A. By the definition of R(A), all minors with R > R(A) will be zero.

As an example, starting from the 3 × 4 matrix in eqn (B.1), we may delete row 3, and columns 1 and 3. The corresponding two‐rowed minor of A is then

|m| = \begin{vmatrix} A_{12} & A_{14} \\ A_{22} & A_{24} \end{vmatrix}
(B.60)
If it happens that (A_{12} A_{24} − A_{14} A_{22}) ≠ 0, then |m| is a two‐rowed nonzero minor of A, and that matrix must have rank R(A) of at least two. To see if its rank is possibly more than two, we would have to investigate its three‐rowed minors to see if at least one of them is nonzero. If so, the rank of the matrix would be three, the maximum possible value for a 3 × 4 matrix.

If an N‐rowed square matrix is nonsingular, then its determinant itself constitutes a critical minor with R = N. Thus it has rank R(A) = N. On the other hand, if a square non‐null matrix A is singular, it must have 0 < R(A) < N.
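
The definition of rank can be tested by brute force for small matrices. The sketch below (illustrative only; the search over minors is combinatorial, and the matrix entries are arbitrary) finds the largest R with a nonzero R-rowed minor and compares it with numpy.linalg.matrix_rank.

```python
# Rank as the size of the largest nonzero minor (Section B.17).
import numpy as np
from itertools import combinations

def rank_by_minors(A, tol=1e-12):
    M, N = A.shape
    for R in range(min(M, N), 0, -1):
        for rows in combinations(range(M), R):
            for cols in combinations(range(N), R):
                if abs(np.linalg.det(A[np.ix_(rows, cols)])) > tol:
                    return R                 # a critical minor has been found
    return 0

A = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],          # twice the first row
              [0.0, 1.0, 0.0, 1.0]])
assert rank_by_minors(A) == np.linalg.matrix_rank(A) == 2
```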

B.18 Linear Independence

Consider the set of M × 1 column vectors [V (1)],[V (2)],…,[V (N)]. If there exist N numbers α 1, α 2,…α N not all zero, such that

\alpha_1 [V^{(1)}] + \alpha_2 [V^{(2)}] + \cdots + \alpha_N [V^{(N)}] = [0]
(B.61)
where [0] is the null column vector containing M zeroes, then this set of column vectors is linearly dependent (LD). Otherwise, we say that the set of column vectors is linearly independent (LI).

The following properties relate to linear independence and the rank of matrices.

  1. If eqn (B.61) implies that α_1 = α_2 = … = α_N = 0, then the set of vectors is LI. Otherwise the set is LD.

  2. The maximum number of M × 1 column vectors that can appear in any LI set is M. We say that general M × 1 column vectors form a vector space of dimension M. Any particular set of M LI column vectors is said to span this space and form a basis for it. Any M × 1 column vector [V] can be expanded in this basis as [V] = α_1 [V^{(1)}] + … + α_M [V^{(M)}].

  3. A test for the linear independence of the column vectors in eqn (B.61) is to assemble them into an M × N matrix, like, for example, eqn (B.52). Call this matrix A. Then the column vectors are linearly independent if and only if R(A) = N.

  4. Conversely, suppose we have any M × N matrix A of rank R(A). If we partition it into N column vectors, we can always find R(A) of them, and not more than R(A) of them, that are linearly independent.

  5. A square matrix is nonsingular if and only if all of its columns(rows) are linearly independent.

  6. Statements similar to the above can also be made for sets of 1 × N row vectors (V^{(1)}), (V^{(2)}),…, (V^{(M)}).

B.19 Homogeneous Linear Equations

Equation (B.54) with b 1 =b 2 = …=b N =0 is called a set of N homogeneous equations for the N unknowns x 1, x 2,…,x N. These homogeneous equations may be written in matrix form as

A [ x ] = [ 0 ]
(B.62)
It follows at once from eqn (B.57) that eqn (B.62) and |A| ≠ 0 imply [x] = A^{-1}[0] = [0]. If the square matrix A is nonsingular, then the only solution to eqn (B.62) is the trivial one [x] = [0] in which x_i = 0 for i = 1,…, N. The following theorem generalizes eqn (B.62) to the case of M equations in N unknowns.

Theorem B.19.1: Homogeneous Linear Equations

If A is an M × N matrix, then the homogeneous linear equations

A_{11} x_1 + A_{12} x_2 + \cdots + A_{1N} x_N = 0
A_{21} x_1 + A_{22} x_2 + \cdots + A_{2N} x_N = 0
\vdots
A_{M1} x_1 + A_{M2} x_2 + \cdots + A_{MN} x_N = 0
(B.63)
will have k linearly independent solutions, [x] = [c^{(i)}] for i = 1,…, k, if and only if matrix A has rank R = N − k. The general solution, when it exists, can then be written in the form
[x] = \lambda_1 [c^{(1)}] + \cdots + \lambda_k [c^{(k)}]
(B.64)
where the λ i may take any values.

The proof of this theorem can be found in Chapter 5 of Mirsky (1961). The case in which R = N − 1, so that k = 1, is noteworthy because its general solution [x] = λ_1 [c^{(1)}] has the property that the ratios x_j / x_l = c^{(1)}_j / c^{(1)}_l of the unknowns to some nonzero x_l are uniquely determined functions of the matrix elements A_{ij}.
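
Numerically, a convenient basis for the k = N − R(A) independent solutions is given by the right singular vectors of A belonging to zero singular values. The short illustration below uses that SVD route, which is one standard choice rather than the book's construction, and the matrix entries are arbitrary.

```python
# The homogeneous system A[x] = [0] has N - R(A) independent solutions.
import numpy as np

A = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],          # twice the first row
              [0.0, 1.0, 1.0, 0.0]])
M, N = A.shape
R = np.linalg.matrix_rank(A)

_, s, Vt = np.linalg.svd(A)                  # rows of Vt beyond R span the null space
null_basis = Vt[R:]
assert null_basis.shape[0] == N - R          # k = N - R independent solutions
for c in null_basis:
    assert np.allclose(A @ c, 0.0)
```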

The following corollary applies to square matrices.

Corollary B.19.2: Homogeneous Equations with Square Matrices

For square matrices with M = N, eqn (B.63) has a nontrivial solution with [x] ≠ [0] if and only if A is singular with |A| = 0. This result can also be stated: A matrix A is nonsingular with |A| ≠ 0 if and only if A[x] = [0] implies the trivial solution [x] = [0].

B.20 Inner Products of Column Vectors

An inner product of two real, M‐rowed column vectors may be defined by analogy with the dot product of vectors in Cartesian three space. If [x] = (x 1, …,x M)T and [y] = (y 1, …,y M)T are two such column vectors, then we define

[x] \cdot [y] = [x]^T [y] = \sum_{i=1}^{M} x_i y_i
(B.65)
It follows that
[x] \cdot [y] = [y] \cdot [x] \quad\text{and that}\quad [x] \neq [0] \text{ implies } [x] \cdot [x] > 0
(B.66)
If [x] ∙ [y] = 0, then we say that [x] and [y] are orthogonal.

If each member of a set of vectors [V^{(1)}], [V^{(2)}],…, [V^{(N)}] is orthogonal to all the others, then this set is LI. If there are M mutually orthogonal vectors in the set, then it spans the space of M × 1 column vectors and any column vector [V] can be expanded as

[V] = \alpha_1 [V^{(1)}] + \alpha_2 [V^{(2)}] + \cdots + \alpha_M [V^{(M)}]
(B.67)
where, for i = 1,…,M,
\alpha_i = \frac{[V^{(i)}] \cdot [V]}{[V^{(i)}] \cdot [V^{(i)}]}
(B.68)

Conversely if we are given any LI set of column vectors [V (1)],[V (2)],…,[V (N)], we can always construct a mutually orthogonal set by the Schmidt orthogonalization procedure. Starting from an arbitrary member of the set, which for simplicity we may take to be the first one [V (1)], define

[W^{(1)}] = [V^{(1)}]
[W^{(2)}] = [V^{(2)}] - \frac{[V^{(2)}] \cdot [W^{(1)}]}{[W^{(1)}] \cdot [W^{(1)}]}\, [W^{(1)}]
[W^{(3)}] = [V^{(3)}] - \frac{[V^{(3)}] \cdot [W^{(1)}]}{[W^{(1)}] \cdot [W^{(1)}]}\, [W^{(1)}] - \frac{[V^{(3)}] \cdot [W^{(2)}]}{[W^{(2)}] \cdot [W^{(2)}]}\, [W^{(2)}]
(B.69)
and so on, following the same pattern until [W^{(N)}] is reached. Then the set [W^{(1)}], [W^{(2)}],…, [W^{(N)}] will be mutually orthogonal by construction.
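
The procedure of eqn (B.69) is short to code. Below is a minimal sketch for real column vectors, using the inner product of eqn (B.65); the vector values are arbitrary examples.

```python
# Schmidt orthogonalization, eqn (B.69), for a linearly independent set.
import numpy as np

def schmidt_orthogonalize(vectors):
    """Return mutually orthogonal vectors spanning the same space."""
    W = []
    for V in vectors:
        W_new = V.astype(float)
        for W_prev in W:
            # Subtract the projection of V onto each earlier W.
            W_new = W_new - (V @ W_prev) / (W_prev @ W_prev) * W_prev
        W.append(W_new)
    return W

V1 = np.array([1.0, 1.0, 0.0])
V2 = np.array([1.0, 0.0, 1.0])
V3 = np.array([0.0, 1.0, 1.0])
W = schmidt_orthogonalize([V1, V2, V3])
for i in range(3):
    for j in range(i + 1, 3):
        assert np.isclose(W[i] @ W[j], 0.0)   # mutual orthogonality
```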

Any non‐null column vector can be normalized. If [y] is the original vector, the corresponding normalized vector is defined by

[x] = \frac{1}{\sqrt{[y] \cdot [y]}}\, [y]
(B.70)
which has the property [x] ∙ [x] = 1. If the vectors of a set [V (1)],[V (2)],…,[V (N)] of mutually orthogonal vectors are all normalized, it is called an orthonormal set and obeys the orthonormality condition
[V^{(k)}] \cdot [V^{(l)}] = \delta_{kl}
(B.71)
for all k, l = 1,…, N.

If [V^{(1)}], [V^{(2)}],…, [V^{(M)}] are an orthonormal set, then [V^{(k)}] ∙ [V^{(k)}] = 1 and so eqn (B.67) can be written as

[V] = \sum_{k=1}^{M} [V^{(k)}] \left\{ [V^{(k)}] \cdot [V] \right\}
(B.72)
where [V^{(k)}] is post‐multiplied by the scalar α_k = [V^{(k)}] ∙ [V] to enhance the clarity of the expression. The α_k is called the component of [V] in the orthonormal basis [V^{(1)}], [V^{(2)}],…, [V^{(M)}]. Such an orthonormal basis is often referred to as a complete orthonormal set.

B.21 Complex Inner Products

For complex column vectors, the inner product must be generalized. It becomes

[x]^* \cdot [y] = [x]^\dagger [y] = \sum_{i=1}^{M} x^*_i y_i
(B.73)
Then [y]*∙ [x] = {[x]*∙[y]}* and, as for real vectors, [x]*∙[x] > 0 for any non‐null column vector [x].

All results of the previous Section B.20 apply also to complex column vectors when all inner products there are replaced by the generalized ones, adding an * to the left‐hand vectors of the products. Thus, for example, in an orthonormal basis obeying

[V^{(k)}]^* \cdot [V^{(l)}] = \delta_{kl}
(B.74)
for all k; l = 1, …, N, eqn (B.72) becomes
[V] = \sum_{k=1}^{M} [V^{(k)}] \left\{ [V^{(k)}]^* \cdot [V] \right\}
(B.75)
and the component becomes α k =[V (k)]*∙[V].

B.22 Orthogonal and Unitary Matrices

A real, square matrix M is orthogonal if it is nonsingular and its transpose is its inverse. Then

M^{-1} = M^T \quad\text{and so}\quad M^T M = U = M M^T
(B.76)

As may be seen by writing out the second of eqn (B.76) as a sum, if an N‐rowed real, square matrix is partitioned into its N columns(rows), the resulting column(row) vectors will be an orthonormal set obeying eqn (B.71) if and only if the matrix is orthogonal.

Theorem B.22.1: Proof of Orthogonality

To prove a matrix orthogonal, it is sufficient to show either that M T M = U or that M M T = U.

Proof: It follows from the first stated condition that

1 = |U| = \left| M^T M \right| = \left| M^T \right| |M| = |M|^2
(B.77)
and hence that ǀ M ǀ = ±1. Thus M is nonsingular and has an inverse. Applying its inverse to M T M = U then proves that
M^T = M^T M M^{-1} = U M^{-1} = M^{-1}
(B.78)
which is the condition for M to be orthogonal. A similar argument demonstrates the sufficiency of the second stated condition.      ◻

The generalization to complex matrices involves the Hermitian conjugate in place of the transpose. A matrix D is unitary if it is nonsingular and its Hermitian conjugate is its inverse. Then

D^{-1} = D^\dagger \quad\text{and so}\quad D^\dagger D = U = D D^\dagger
(B.79)

As may be seen by writing out the second of eqn (B.79) as a sum, the columns(rows) of a square, complex matrix are an orthonormal set and obey eqn (B.74) if and only if the matrix is unitary. Notice that the generalized definition of inner product must be used for the complex vectors coming from complex, unitary matrices.

The above theorem applies also to unitary matrices, with a similar proof.

Theorem B.22.2: Proof of Unitarity

To prove a matrix unitary, it is sufficient to show either that D^\dagger D = U or that D D^\dagger = U.

Proof: For unitary matrices, eqn (B.77) becomes

1 = |U| = \left| D^\dagger D \right| = \left| D^\dagger \right| |D| = |D|^* \, |D|
(B.80)
which shows that, for some real number θ, |D| = exp(iθ) ≠ 0. The rest of the proof is the same as for orthogonal matrices.     ◻

B.23 Eigenvalues and Eigenvectors of Matrices

An N‐rowed square matrix A is said to have an eigenvalue λ_k and corresponding eigenvector [x^{(k)}] if

A [x^{(k)}] = \lambda_k [x^{(k)}]
(B.81)
where the λ k are simply numbers multiplying the vectors on the right side. Equation (B.81) may be rewritten as
\left( A - \lambda_k U \right) [x^{(k)}] = [0]
(B.82)
where U is the unit matrix, and [0] is the null column vector (all zeroes). As noted in Section B.19, this equation will have a nontrivial solution [x^{(k)}] ≠ [0] if and only if the matrix multiplying [x^{(k)}] is singular. Thus the condition for this singularity,
\left| A - \lambda U \right| = 0
(B.83)
is an Nth order polynomial equation in λ. It is called the characteristic equation. Its N roots,
\lambda_1, \lambda_2, \ldots, \lambda_N
(B.84)
are the eigenvalues, the values of λ that will give nontrivial eigenvector solutions.

Equation (B.83) in expanded form is

\begin{vmatrix} (A_{11} - \lambda) & A_{12} & \cdots & A_{1N} \\ A_{21} & (A_{22} - \lambda) & \cdots & A_{2N} \\ \vdots & \vdots & & \vdots \\ A_{N1} & A_{N2} & \cdots & (A_{NN} - \lambda) \end{vmatrix} = 0
(B.85)
Each of its roots λ_k is substituted into eqn (B.82) to obtain the homogeneous equation for the components x^{(k)}_i of the kth eigenvector. In expanded form, this equation is
(A_{11} - \lambda_k) x^{(k)}_1 + A_{12} x^{(k)}_2 + \cdots + A_{1N} x^{(k)}_N = 0
A_{21} x^{(k)}_1 + (A_{22} - \lambda_k) x^{(k)}_2 + \cdots + A_{2N} x^{(k)}_N = 0
\vdots
A_{N1} x^{(k)}_1 + A_{N2} x^{(k)}_2 + \cdots + (A_{NN} - \lambda_k) x^{(k)}_N = 0
(B.86)
If the eigenvalue λ_k is unique, then eqn (B.86) can be solved for a unique set of ratios x^{(k)}_i / x^{(k)}_1 of the ith component to some nonzero one, here taken to be the first one although in practice some other one may be used. The component x^{(k)}_1 can then be determined from the normalization condition [x^{(k)}]^* ∙ [x^{(k)}] = 1. But, even after normalization, the eigenvectors are not completely determined. Equation (B.81) is linear in the eigenvectors; if [x^{(k)}] is a normalized eigenvector corresponding to λ_k, so is e^{iθ} [x^{(k)}] where θ is any real number.

If some eigenvalue λ k is not unique, that is if that root of the characteristic equation has multiplicity K > 1, then that eigenvalue is said to be degenerate. As will be shown below, there is a large class of matrices called normal matrices for which eqn (B.86) for a K‐fold root will have K linearly independent solutions, each one a set of ratios of the sort just described. Either by a lucky guess in the original determination, or by the use of the Schmidt orthogonalization procedure of Section B.20, these K linearly independent solutions can be made to produce K mutually orthogonal ones (of course, using the complex inner product of Section B.21 when the eigenvectors are complex).
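
Numerically, the roots of the characteristic equation and the eigenvector solutions of eqn (B.86) are produced together by standard eigensolvers. The short check below (matrix entries arbitrary) confirms that the roots of |A − λU| = 0 agree with the output of numpy.linalg.eig, and that each eigenvector satisfies eqn (B.81).

```python
# Eigenvalues from the characteristic equation, eqn (B.83), versus a solver.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

coeffs = np.poly(A)                          # coefficients of |A - lambda U|
char_roots = np.sort(np.roots(coeffs))

eigvals, eigvecs = np.linalg.eig(A)          # columns of eigvecs are the [x^(k)]
assert np.allclose(np.sort(eigvals), char_roots)

for k in range(2):                           # eqn (B.81): A [x^(k)] = lambda_k [x^(k)]
    assert np.allclose(A @ eigvecs[:, k], eigvals[k] * eigvecs[:, k])
```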

B.24 Eigenvectors of Real Symmetric Matrix

The eigenvalue problem for real, symmetric matrices is a special case of great importance in mechanics. For example, it is used to find principal axes of rigid bodies. In the present section, we will assume that the matrix A is an N‐rowed, real, symmetric matrix, and that the techniques of Section B.23 are being used to find its eigenvalues and eigenvectors. We begin with two lemmas, and then state the main theorem.

Lemma B.24.1: Real Eigenvalues

All eigenvalues of a real, symmetric matrix are real numbers.

Proof: Multiply eqn (B.81) from the left by [x^{(k)}]^\dagger to obtain

[x^{(k)}]^\dagger A [x^{(k)}] = \lambda_k [x^{(k)}]^\dagger [x^{(k)}]
(B.87)
Now take the Hermitian conjugate of eqn (B.81) and multiply it from the right by [x^{(k)}], which gives
[x^{(k)}]^\dagger A^\dagger [x^{(k)}] = \lambda^*_k [x^{(k)}]^\dagger [x^{(k)}]
(B.88)
But A is both real and symmetric, and hence A^\dagger = A. Then, subtracting eqn (B.88) from eqn (B.87) gives
0 = \left( \lambda_k - \lambda^*_k \right) [x^{(k)}]^\dagger [x^{(k)}]
(B.89)
Since, for a nontrivial solution,
[x^{(k)}]^\dagger [x^{(k)}] = \sum_{i=1}^{N} x^{(k)*}_i x^{(k)}_i > 0
(B.90)
it follows that (λ_k − λ^*_k) = 0 and so λ_k is real.      ◻

Since the eigenvalues are all real, and since the matrix elements of A are real by assumption, we assume from now on that the eigenvectors have been chosen to be real column vectors. Also, we will assume that the eigenvectors have been normalized so that each of them obeys [x (k)] ∙ [x (k)] = 1.

Lemma B.24.2: Orthogonal Eigenvectors

For a real, symmetric matrix, the eigenvectors corresponding to two distinct eigenvalues λ_k ≠ λ_l will be orthogonal.

Proof: Multiply eqn (B.81) from the left by [x (l)]T to obtain

[x^{(l)}]^T A [x^{(k)}] = \lambda_k [x^{(l)}]^T [x^{(k)}]
(B.91)
Now take the transpose of eqn (B.81), but with k replaced by l. Multiply this from the right by [x (k)] to obtain
[x^{(l)}]^T A^T [x^{(k)}] = \lambda_l [x^{(l)}]^T [x^{(k)}]
(B.92)
Subtract eqn (B.92) from eqn (B.91), and use A T = A to obtain
0 = \left( \lambda_k - \lambda_l \right) [x^{(l)}]^T [x^{(k)}] = \left( \lambda_k - \lambda_l \right)\, [x^{(l)}] \cdot [x^{(k)}]
(B.93)
where the definition of inner product from Section B.20 has been used. Since, by assumption, (λ_k − λ_l) ≠ 0, it follows that [x^{(l)}] ∙ [x^{(k)}] = 0 and so the two eigenvectors are orthogonal.     ◻

The main theorem may now be stated.

Theorem B.24.3: Complete Orthonormal Set of Eigenvectors

A real, N-rowed, symmetric matrix A has real eigenvalues λ k and N real, normalized, mutually orthogonal eigenvectors [x (k)],

A [x^{(k)}] = \lambda_k [x^{(k)}] \quad\text{where}\quad [x^{(k)}] \cdot [x^{(l)}] = \delta_{kl}
(B.94)
for all k,l = 1,…, N.

Proof: The reality of the eigenvalues and eigenvectors has already been proved in B.24.1. If all N of the eigenvalues are distinct, then the existence of N mutually orthogonal eigenvectors has also been established, by B.24.2. But some eigenvalue roots of eqn (B.83) may be degenerate, so a more general proof is required.

The proof is by induction, and exploits the symmetry of A. The theorem is trivially true for matrices with N = 1. We assume the theorem true for N − 1 and prove it true for N. Thus it will be proved true for any N.

Let [x (1)] be some real, normalized eigenvector of A with eigenvalue λ 1 so that

A [x^{(1)}] = \lambda_1 [x^{(1)}]
(B.95)
Let [y^{(2)}],…, [y^{(N)}] be N − 1 normalized and mutually orthogonal vectors, all of which are also orthogonal to [x^{(1)}]. Then [x^{(1)}] is also orthogonal to each of the vectors A [y^{(2)}],…, A [y^{(N)}]. To see this, take the transpose of eqn (B.95) and multiply it from the right by [y^{(l)}] to obtain
[x^{(1)}]^T A^T [y^{(l)}] = \lambda_1 [x^{(1)}]^T [y^{(l)}] = \lambda_1 [x^{(1)}] \cdot [y^{(l)}] = 0
(B.96)
where l = 2,…,N. But by assumption A T = A and hence eqn (B.96) may be written
[x^{(1)}] \cdot \left( A [y^{(l)}] \right) = 0
(B.97)

Since both [y (l)] and A [y (l)] are orthogonal to [x (1)] we can consider the eigenvalue problem separately in the N − 1 dimensional space spanned by [y (2)],…, [y (N)]. Define the (N − 1)-rowed, square, symmetric matrix B by

B_{lm} \equiv [y^{(l)}]^T A [y^{(m)}]
(B.98)
for l, m = 2,…,N. By the induction hypothesis, the theorem is true for (N − 1)-rowed matrices. So N − 1 normalized and mutually orthogonal (N − 1) × 1 eigenvectors [z (2)]; [z (3)],…, [z (N)] can be found that obey
B [z^{(k)}] = \lambda_k [z^{(k)}] \quad\text{and}\quad [z^{(k)}] \cdot [z^{(k')}] = \delta_{kk'}
(B.99)
for k, k′ = 2,…, N.

Now the original eigenvector [x (1)] and the vectors

[x^{(k)}] = \sum_{l=2}^{N} z^{(k)}_l [y^{(l)}]
(B.100)
for k = 2,…, N, are N normalized and mutually orthogonal eigenvectors of A, as was to be proved.              ◻
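
A numerical illustration of the theorem (not part of the original proof) can be obtained with numpy.linalg.eigh, which is intended for real symmetric and Hermitian matrices; the example reuses the symmetric matrix of eqn (B.10).

```python
# Real symmetric matrix: real eigenvalues and an orthonormal eigenvector set.
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 5.0, 6.0],
              [3.0, 6.0, 9.0]])              # the symmetric matrix of eqn (B.10)

eigvals, X = np.linalg.eigh(A)               # columns of X are the [x^(k)]

assert np.all(np.isreal(eigvals))            # real eigenvalues (Lemma B.24.1)
assert np.allclose(X.T @ X, np.eye(3))       # orthonormality, eqn (B.94)
for k in range(3):
    assert np.allclose(A @ X[:, k], eigvals[k] * X[:, k])
```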

B.25 Eigenvectors of Complex Hermitian Matrix

Proofs very similar to those in Section B.24 show that any N-rowed, square, complex, Hermitian matrix H has real eigenvalues and N mutually orthogonal eigenvectors. The difference is that the eigenvectors are now complex, so the complex inner product of eqn (B.73) must be used. In the present section, we assume that the eigenvalues and eigenvectors of an N-rowed, complex, Hermitian matrix H are being found by the techniques of Section B.23.

We state, without further proof, some results for Hermitian matrices corresponding to those of Section B.24 for real, symmetric ones:

  1. A Hermitian matrix has real eigenvalues.

  2. If two eigenvalues are unequal, λ_k ≠ λ_l, then the corresponding eigenvector solutions are orthogonal, [x^{(k)}]^* ∙ [x^{(l)}] = 0.

  3. A complex, N-rowed, Hermitian matrix H has real eigenvalues λ_k and N normalized, mutually orthogonal eigenvectors [x^{(k)}] such that, for all k, l = 1,…, N,

    H [x^{(k)}] = \lambda_k [x^{(k)}] \quad\text{where}\quad [x^{(k)}]^* \cdot [x^{(l)}] = \delta_{kl}
    (B.101)

Another theorem of importance, stated without proof, is:

Theorem B.25.1: Common Eigenvectors

Two complex (or real) N-rowed Hermitian matrices H 1 and H 2 have a common set of N mutually orthogonal eigenvectors [x (k)] such that

H_1 [x^{(k)}] = \lambda_k [x^{(k)}] \quad\text{and}\quad H_2 [x^{(k)}] = \gamma_k [x^{(k)}]
(B.102)
if and only if they commute,
[H_1, H_2]_c = 0
(B.103)
Note that the real eigenvalues λ k, γ k are in general different, even though the eigenvectors are the same.

B.26 Normal Matrices

Since N-rowed real symmetric and complex Hermitian matrices both have N mutually orthogonal eigenvectors, one might wonder if other species such as orthogonal or unitary matrices do also. The answer is yes. The general rule is that an N-rowed matrix has N mutually orthogonal eigenvectors if and only if it commutes (see note 164) with its Hermitian conjugate. Such matrices are called normal matrices. By definition, a matrix is normal if and only if the matrix and its Hermitian conjugate commute,

[A, A^\dagger]_c = A A^\dagger - A^\dagger A = 0
(B.104)

Theorem B.26.1: Eigenvectors of Normal Matrices

The N-rowed, square matrix A has N mutually orthogonal eigenvectors [x (k)] with

A [x^{(k)}] = \lambda_k [x^{(k)}] \quad\text{where}\quad [x^{(k)}]^* \cdot [x^{(l)}] = \delta_{kl}
(B.105)
for all k, l = 1,…,N, if and only if it is normal.

Proof: To prove that eqn (B.104) implies the existence of N mutually orthogonal eigenvectors, note that any square matrix A can be written as a sum of two Hermitian matrices as

A = A_R + i A_I \quad\text{where}\quad A_R = \frac{1}{2}\left( A + A^\dagger \right) \quad\text{and}\quad A_I = \frac{i}{2}\left( A^\dagger - A \right)
(B.106)
By Theorem B.25.1, the two Hermitian matrices will have a common set of N mutually orthogonal eigenvectors if and only if
0 = [A_R, A_I]_c = \frac{i}{2}\left[ A, A^\dagger \right]_c
(B.107)
where the second equality follows from the definitions in eqn (B.106). A vector [x^{(k)}] that is a common eigenvector of A_R, A_I with eigenvalues λ_k, γ_k, respectively, will be an eigenvector of A with eigenvalue (λ_k + i γ_k), which completes the proof. The proof of the converse, that the existence of a set of N orthonormal eigenvectors satisfying eqn (B.105) implies eqn (B.104), follows immediately from eqn (B.119) of Section B.27 and the fact that any two diagonal matrices commute.     ◻

The eigenvalues and eigenvectors of normal matrices are found using the same techniques as described in Section B.23 and used earlier for real symmetric and complex Hermitian matrices. The eigenvalues of normal matrices are derived from the same determinant condition as in eqn (B.85). The eigenvalues of normal matrices may in general be complex rather than real, but the eigenvector solutions are still obtained from eqn (B.86). And the following lemma remains true.

Lemma B.26.2: Orthogonality of Eigenvectors of Normal Matrix

If a normal matrix A has two unequal eigenvalues λ_k ≠ λ_l, then the corresponding eigenvectors will be orthogonal, [x^{(k)}]^* ∙ [x^{(l)}] = 0. Two complex eigenvalues are considered unequal if either their real or their imaginary parts differ.

Since complex conjugation has no effect on real matrices, the condition for a real matrix to be normal reduces to

[A, A^T]_c = A A^T - A^T A = 0
(B.108)
Then examination of the conditions in eqns (B.104, B.108) shows that real symmetric, real anti-symmetric, real orthogonal, complex Hermitian, complex anti-Hermitian, and complex unitary matrices are all normal.

B.27 Properties of Normal Matrices

Normal matrices have a complete orthonormal set of eigenvectors. A number of important results follow from this fact. We present some of these results here, assuming that the eigenvectors and eigenvalues may be complex. But the same formulas may also be used for real, symmetric matrices with real eigenvectors and eigenvalues by ignoring the complex conjugation signs “*”, replacing all Hermitian conjugate signs “†” by transpose signs “T”, and replacing the word “unitary” by the word “orthogonal.” Thus the results of the present section apply in particular to real symmetric, real anti-symmetric, orthogonal, complex Hermitian, complex skew-Hermitian, and complex unitary matrices.

Given a normal, N-rowed matrix A and its N orthonormal eigenvectors [x (k)], let us define a matrix D whose columns are the eigenvectors. The matrix elements of D are therefore

D_{ik} = x^{(k)}_i
(B.109)
for i, k = 1,…, N, and so
D = \begin{pmatrix} x^{(1)}_1 & x^{(2)}_1 & \cdots & x^{(N)}_1 \\ x^{(1)}_2 & x^{(2)}_2 & \cdots & x^{(N)}_2 \\ \vdots & \vdots & & \vdots \\ x^{(1)}_N & x^{(2)}_N & \cdots & x^{(N)}_N \end{pmatrix}
(B.110)
As was discussed in Section B.22, since its columns are an orthonormal set of vectors this D will be a unitary matrix. But it is useful here to prove its unitarity directly. From the orthonormality of the [x (k)] in eqn (B.105),
\left( D^\dagger D \right)_{kl} = \sum_{i=1}^{N} D^\dagger_{ki} D_{il} = \sum_{i=1}^{N} D^*_{ik} D_{il} = \sum_{i=1}^{N} x^{(k)*}_i x^{(l)}_i = [x^{(k)}]^\dagger [x^{(l)}] = [x^{(k)}]^* \cdot [x^{(l)}] = \delta_{kl}
(B.111)
Since the unit matrix has U_{kl} = δ_{kl}, this proves that all matrix elements of D^\dagger D are identical to those of U and hence that
D^\dagger D = U
(B.112)
As proved in B.22.2, this is sufficient to prove D unitary.

Since D is unitary, it follows also that U = D D^\dagger. In terms of the components, this can be written

U_{ij} = \delta_{ij} = \left( D D^\dagger \right)_{ij} = \sum_{k=1}^{N} x^{(k)}_i x^{(k)*}_j = \left( \sum_{k=1}^{N} [x^{(k)}] [x^{(k)}]^\dagger \right)_{ij}
(B.113)
It follows that
U = \sum_{k=1}^{N} [x^{(k)}] [x^{(k)}]^\dagger
(B.114)
which expands the unit matrix in terms of eigenvectors. This expansion is called a resolution of unity. Applying this expression to an arbitrary vector [V] gives
[V] = U [V] = \sum_{k=1}^{N} [x^{(k)}] [x^{(k)}]^\dagger [V] = \sum_{k=1}^{N} [x^{(k)}] \left( [x^{(k)}]^* \cdot [V] \right)
(B.115)
which should be compared to eqn (B.75).

Now consider the matrix E defined by the matrix product

E = D^\dagger A D
(B.116)
Expanding this product and using eqn (B.105) gives
E_{kl} = \sum_{i=1}^{N} \sum_{j=1}^{N} D^\dagger_{ki} A_{ij} D_{jl} = \sum_{i=1}^{N} \sum_{j=1}^{N} x^{(k)*}_i A_{ij} x^{(l)}_j = [x^{(k)}]^\dagger A [x^{(l)}] = [x^{(k)}]^* \cdot \lambda_l [x^{(l)}] = \delta_{kl} \lambda_l
(B.117)
Thus E is a diagonal matrix with the eigenvalues of A as its diagonal elements. We say that D reduces A to the diagonal matrix E
E = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_N \end{pmatrix}
(B.118)

Equation (B.116) can be inverted, using the unitarity of D to write

A = D E D^\dagger
(B.119)
which can be written out as an expansion of A in terms of its eigenvalues and eigenvectors
A_{ij} = \sum_{k=1}^{N} \sum_{l=1}^{N} D_{ik} E_{kl} D^\dagger_{lj} = \sum_{k=1}^{N} \sum_{l=1}^{N} x^{(k)}_i \delta_{kl} \lambda_l x^{(l)*}_j = \sum_{k=1}^{N} x^{(k)}_i \lambda_k x^{(k)*}_j
(B.120)
In matrix notation, this equation becomes
A = \sum_{k=1}^{N} [x^{(k)}] \lambda_k [x^{(k)}]^\dagger
(B.121)
which is referred to as an eigen-dyadic expansion of matrix A in terms of its eigenvectors and eigenvalues.
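
These relations are easy to verify numerically for a small Hermitian (hence normal) matrix. The sketch below (matrix entries arbitrary) checks that the matrix D of eigenvector columns reduces A to diagonal form, eqn (B.116), and that the eigen-dyadic sum of eqn (B.121) rebuilds the matrix.

```python
# Diagonalization and eigen-dyadic expansion for a 2x2 Hermitian matrix.
import numpy as np

H = np.array([[2.0, 1.0 - 1.0j],
              [1.0 + 1.0j, 3.0]])            # H = H^dagger, so H is normal

eigvals, D = np.linalg.eigh(H)               # columns of D are eigenvectors

E = D.conj().T @ H @ D                       # eqn (B.116)
assert np.allclose(E, np.diag(eigvals))      # diagonal matrix of eigenvalues

H_rebuilt = sum(eigvals[k] * np.outer(D[:, k], D[:, k].conj())
                for k in range(2))           # eqn (B.121)
assert np.allclose(H_rebuilt, H)
```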

Equation (B.119) also allows the trace and determinant of a normal matrix to be found from its eigenvalues.

Theorem B.27.1: Trace and Determinant of Normal Matrices

If matrix A is an N-rowed, normal matrix (either real or complex), with eigenvalues λ_1, λ_2,…, λ_N, then its trace and determinant are

\operatorname{Tr} A = \lambda_1 + \lambda_2 + \cdots + \lambda_N \quad\text{and}\quad |A| = \lambda_1 \lambda_2 \cdots \lambda_N
(B.122)

Proof: From eqn (B.119), and the invariance of the trace under cyclic permutations discussed in Section B.8, it follows that

\operatorname{Tr} A = \operatorname{Tr}\left( D E D^\dagger \right) = \operatorname{Tr}\left( E D^\dagger D \right) = \operatorname{Tr} E = \lambda_1 + \lambda_2 + \cdots + \lambda_N
(B.123)
From eqn (B.119), and Properties 5 and 11 of Section B.11, it follows that
|A| = \left| D E D^\dagger \right| = |D| \, |E| \, \left| D^\dagger \right| = |E| = \lambda_1 \lambda_2 \cdots \lambda_N
(B.124)
     ◻

Equation (B.119) also allows the characteristic equation of normal matrix A to be expressed in terms of the eigenvalues corresponding to its N mutually orthogonal eigenvectors.

Theorem B.27.2: Form of Characteristic Equation

If matrix A is an N-rowed, normal matrix (either real or complex), with eigenvectors [x^{(1)}], [x^{(2)}],…, [x^{(N)}] and corresponding eigenvalues λ_1, λ_2,…, λ_N, then the characteristic equation eqn (B.83) is

0 = |A - \lambda U| = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_N - \lambda)
(B.125)
with one factor for each eigenvector, regardless of the uniqueness or degeneracy of that eigenvalue.

Proof: Using eqn (B.119) and the unitarity of D,

|A - \lambda U| = \left| D E D^\dagger - \lambda D D^\dagger \right| = \left| D \left( E - \lambda U \right) D^\dagger \right| = |D| \, |E - \lambda U| \, \left| D^\dagger \right| = |E - \lambda U|
(B.126)
But both E and U are diagonal matrices, so
|E - \lambda U| = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_N - \lambda)
(B.127)
which establishes eqn (B.125).           ◻

Theorem B.27.2 establishes that, for normal matrices, the multiplicity of a particular eigenvalue in the characteristic equation is equal to the number of orthogonal eigenvectors having that eigenvalue. Thus eqn (B.86) for a degenerate eigenvalue λ_k with multiplicity K will always yield K mutually orthogonal eigenvector solutions, as was asserted at the end of Section B.23.

B.28 Functions of Normal Matrices

Matrix functions of normal matrices may be defined by using the dyadic expansion in eqn. (B.121) of the previous Section B.27.

Let A be an N-rowed normal matrix with eigenvalues λ_k and mutually orthogonal eigenvectors [x^{(k)}], for k = 1,…, N. Let f = f(z) be a complex function of the complex variable z that is well defined when z is equal to any one of the λ_k. Then the matrix function F = f(A) may be defined as

F = f(A) = \sum_{k=1}^{N} [x^{(k)}] f(\lambda_k) [x^{(k)}]^\dagger
(B.128)
This definition has several useful consequences:
  1. The matrix F is also a normal matrix.

  2. The eigenvectors [x^{(k)}] of A are also eigenvectors of F. The corresponding eigenvalues of matrix F are f(λ_k). That is,

    F [x^{(k)}] = \gamma_k [x^{(k)}] \quad\text{where}\quad \gamma_k = f(\lambda_k)
    (B.129)

  3. As may be seen by repeated use of eqn (B.121) using the orthonormality of the eigenvectors, if f(z) = z^n for any positive integer n, then F = A^n, where A^n = A A ⋯ A (n factors).

  4. It follows from consequence 3 and the resolution of unity in eqn (B.114) that, when f(z) has a power-series expansion f(z) = a_0 + a_1 z + a_2 z^2 + … and all λ_k lie in the circle of convergence of the power series, then the power series

    F = a_0 U + a_1 A + a_2 A^2 + \cdots
    (B.130)
    converges to the same F as that defined in eqn (B.128).

  5. A characteristic equation like eqn (B.83) is used to find the eigenvalues λ_k of any N-rowed normal matrix A. As proved by the last theorem in Section B.27, this characteristic equation may be written as

    0 = |A - \lambda U| = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_N - \lambda)
    (B.131)
    where the λ_1, λ_2,…, λ_N are the eigenvalues corresponding to its N eigenvectors. As may be seen by repeated use of eqns (B.114, B.121), using the orthonormality of the eigenvectors, the normal matrix itself satisfies its own characteristic equation. With A substituted for the unknown λ, and λ_k U for the numbers λ_k,

    0 = (\lambda_1 U - A)(\lambda_2 U - A) \cdots (\lambda_N U - A)
    (B.132)

  6. It follows from consequence 5 that the Nth power A^N of any N-rowed square, normal matrix A may be written as a polynomial p(A) of degree (N − 1) containing only powers of A less than N, so that A^N = p(A). Thus, for any N-rowed square, normal matrix, the power series eqn (B.130) can be reduced to an expression containing only powers of A up to and including A^{N-1}.
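
As a closing numerical sketch (not from the text), the spectral definition of eqn (B.128) can be compared with a truncated power series of the form in eqn (B.130) for f(z) = exp(z) and a small real symmetric (hence normal) matrix with arbitrary entries.

```python
# f(A) from the eigen-dyadic definition versus its power series, for f = exp.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])                   # real symmetric, hence normal

eigvals, X = np.linalg.eigh(A)

# Spectral definition, eqn (B.128)
F_spectral = sum(np.exp(lam) * np.outer(X[:, k], X[:, k])
                 for k, lam in enumerate(eigvals))

# Truncated power series, eqn (B.130): exp(A) = U + A + A^2/2! + ...
F_series = np.zeros((2, 2))
term = np.eye(2)
for n in range(1, 30):
    F_series = F_series + term
    term = term @ A / n                      # next term is A^n / n!
assert np.allclose(F_spectral, F_series)
```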

Notes:

(164) The commutator of two matrices is defined as [A, B]_c = AB − BA. See the analogous definition for operators in Section 7.1. If the commutator vanishes, the two matrices are said to commute.