The Cayley-Hamilton Theorem

Philip J. Erdelsky

June 8, 2006
Reformatted June 7, 2012

Please e-mail comments, corrections and additions to the webmaster at pje@efgh.com.

1. The Theorem

Some mathematical theorems are especially beautiful. It's hard to say what makes a theorem beautiful, but two properties come to mind:

It's simple to state.
It's hard to prove.

In my opinion, one very beautiful theorem is the Cayley-Hamilton Theorem of matrix algebra. It states that if p(z) is the characteristic polynomial of an n ⨉ n complex matrix A, then p(A) is the zero matrix, where addition and multiplication in its evaluation are the usual matrix operations, and the constant term p₀ of p(z) is replaced by the matrix p₀I.

Recall that the characteristic polynomial of A is given by

p(z) = det(A-zI).

2. The Simple But Invalid Proof

At this point, students sometimes naively substitute A for z in the definition of the characteristic polynomial and falsely conclude that the Cayley-Hamilton Theorem is trivial:

Why isn't p(A) = det(A-AI) = det(A-A) = det(O) = 0?

There are two obvious fallacies. First of all, zI in the definition of the characteristic polynomial represents the multiplication of a matrix by a scalar, but AI is the multiplication of two matrices. Moreover, the argument given above seems to prove that a matrix is equal to a scalar.

Actually, this argument is valid in the special case where A is a 1 ⨉ 1 matrix, and there is essentially no difference between matrix and scalar operations; but in that case the Cayley-Hamilton Theorem really is trivial.

3. An Analytic Proof

There are many proofs of the Cayley-Hamilton Theorem. I must admit that I have trouble following most of them, because they involve rather advanced algebra or combinatorics. A more analytic argument, like the one presented here, is more suited to my own training and talents.

It helps to work out the general 2 ⨉ 2 case:

    | a b |
A = |     |
    | c d |

The characteristic polynomial of A is

        | a-z b   |
p(z) =  |         | = (a-z)(d-z) - bc = z² - (a+d)z + ad - bc.
        | c   d-z |

Substituting A for z gives:

p(A) = A² - (a+d)*A + (ad-bc)*I =

       | a b |   | a b |           | a b |             | 1 0 |
       |     | * |     | - (a+d) * |     | + (ad-bc) * |     | =
       | c d |   | c d |           | c d |             | 0 1 |

       | a²+bc  ab+bd |     | a²+ad  ab+bd |   | ad-bc 0     |
       |              |  -  |              | + |             | =
       | ac+cd  bc+d² |     | ac+cd  ad+d² |   | 0     ad-bc |

       | a²+bc-a²-ad+ad-bc  ab+bd-ab-bd       |    | 0 0 |
       |                                      | =  |     |.
       | ac+cd-ac-cd        bc+d²-ad-d²+ad-bc |    | 0 0 |

To prove the theorem in the general case, let λ be an eigenvalue of A, and let x be the corresponding eigenvector (expressed as a column vector). Then

Ax = λx.

Using elementary properties of scalar and matrix operations gives

A²x = (AA)x = A(Ax) = A(λx) = λ(Ax) = λ(λx) = λ²x.

It can be shown in general that

[1] A^kx = λ^kx, for k = 0, 1, 2, ... n,

where A⁰ = I.

Let the characteristic polynomial of A be

p(z) = p_nzⁿ + p_n-1z^n-1 + p_n-2z^n-2 + ... + p₀.

Then multiply each equation in [1] above by p_k and add them to obtain

p(A)x = p(λ)x.

Now p(λ) is zero because λ is an eigenvalue of A. Hence p(A)x = 0 for every eigenvector x of A.

If A has n linearly independent eigenvectors, this implies that p(A) must be the zero matrix.

If A does not have n linearly independent eigenvectors, we construct a sequence A₁, A₂, ... of matrices whose limit is A, each of which has n linearly independent eigenvectors. Then if p_j(z) is the characteristic polynomial of A_j, p_j(A_j) = O. Since all coefficients in p_j(A_j) are continuous functions of the matrix entries, the same is true of the limit p(A).

To create such a sequence, it is sufficient to construct matrices arbitrarily close to A, each of which has n linearly independent eigenvectors.

First, we need a simple lemma. The matrix A, like all complex matrices, is similar to an upper triangular matrix, i.e., there is a nonsingular matrix Q for which Q^-1AQ is upper triangular. This result is well-known, but a simple proof is given in Appendix A.

The eigenvalues of an upper triangular matrix appear along its principal diagonal. There is an upper triangular matrix T arbitrarily close to Q^-1AQ with n distinct eigenvalues. Then QTQ^-1 is arbitrarily close to A and has the same n distinct eigenvalues as T.

A matrix with n distinct eigenvalues has n distinct eigenvectors. If these eigenvectors were linearly dependent, they would span a space of dimension less than n. The mapping defined by the matrix, restricted to this space, would still have the same n distinct eigenvalues, which is impossible. Hence the eigenvectors are linearly independent.

4. Proof for Commutative Rings

This proves the Cayley-Hamilton Theorem for complex matrices, but it is also true for matrices over more general commutative rings.

The proof of this is actually fairly simple. Our experience in proving the 2 ⨉ 2 case shows the way. The expression for each entry in p(A) is a polynomial in n² variables, which are the entries of A. It's not just any polynomial, but one which takes on the value zero for all values of the variables. That can happen only if all the coefficients are zero when like terms are combined. (This seems to be an obvious result, but it requires proof, so one is given in Appendix B.) Hence the polynominal evaluates to zero in any other algebraic entity that has all the necessary operations.

It might appear that the ring must have a unit. However, if we refrain from combining like terms, we will have a sum of monomials, each prepended by a plus sign or a minus sign. Even in the complex field, cancellation is possible only if every positive monomial has a corresponding negative monomial. They will cancel in a ring, too, even if the ring has no unit.

Appendix A. Every Complex Matrix is Similar to an Upper Triangular Matrix.

Let A be an n ⨉ n complex matrix. We show, by induction on n, that it is similar to an upper triangular matrix.

For n = 1 the assertion is trivial because A is already upper triangular.

In other cases, let λ be an eigenvalue of A and let x₁ be a corresponding eigenvector. Extend this to a set x₁, x₂, ..., x_n of linearly independent vectors, and let Q be the matrix whose columns are x₁, x₂, ..., x_n.

Then if e₁ is the column vector with 1 in its first row and zeros elsewhere, Me₁ is the first column of M for any n ⨉ n matrix M. Hence

Q^-1AQe₁ = Q^-1Ax₁ = Q^-1λx₁ = λQ^-1x₁ = λe₁,

and the first column of Q^-1AQ is λe₁. Hence it can be partitioned as follows:

| λ  v |
|      |
| 0  B |,

where v is an (n-1)-element row vector, B is an (n-1) ⨉ (n-1) matrix, and 0 is all zeros. By inductive hypothesis, R^-1BR is upper triangular for some nonsingular matrix R, so the following matrix is the required upper triangular matrix similar to A:

| 1  0  |           | 1  0 |
|       | * Q^-1AQ *  |      | .
| 0  R^-1 |           | 0  R |

APPENDIX B. A Complex Polynomial Is Identically Zero Only If All Coefficients Are Zero

If a polynomial is identically zero, all of its coefficients are zero, when like terms are combined, of course.

The proof is by induction on the number of independent variables.

For a polynomial p(z) in one variable, we simply note that p^(k)(0) is k! times the coefficient of z^k. If the polynomial is identically zero, all of its derivatives are also identically zero, and all of its coefficients must be zero.

A polynominal in n variables can be rearranged so it is a polynomial in one variable, and each of its coefficients is a polynomial in the remaining n-1 variables.

By the result for one variable, every such coefficient is zero for all values of its independent variables. Hence by inductive hypothesis all of its coefficients are zero.

Although the coefficients of an identically zero complex polynomial are all zero, this is not true over finite fields. If N is the number of elements in a finite field, then z^N - z = 0 for every element z of the field. (The proof given above breaks down in this case. Although formal derivatives can be defined, N! is zero, as it appears in the formal derivative.)