## The Cayley-Hamilton Theorem

**Philip J. Erdelsky**

Reformatted June 7, 2012

Please e-mail comments, corrections and additions to the webmaster at pje@efgh.com.

Some mathematical theorems are especially beautiful. It's hard to say what makes a theorem beautiful, but two properties come to mind:

- It's simple to state.
- It's hard to prove.

In my opinion, one very beautiful theorem is the **Cayley-Hamilton
Theorem** of matrix algebra. It states that if *p(z)* is the
characteristic polynomial of an *n ⨉ n* complex matrix *A*,
then *p(A)* is the *n ⨉ n* zero matrix *O*.

Recall that the characteristic polynomial of **A** is given by

p(z) = det(A-zI).
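The theorem asserts that evaluating the characteristic polynomial at *A* itself yields the zero matrix. A minimal numerical sanity check with NumPy (the random matrix is an arbitrary example; note that NumPy's `np.poly` returns the monic coefficients of det(zI - A), which differs from det(A - zI) only by a factor of (-1)^n, so one vanishes at *A* exactly when the other does):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))   # an arbitrary 3x3 example

# Coefficients of the (monic) characteristic polynomial det(zI - A)
coeffs = np.poly(A)

# Evaluate the polynomial at the matrix A by Horner's rule, using
# matrix products in place of scalar powers.
P = np.zeros_like(A)
for c in coeffs:
    P = P @ A + c * np.eye(3)

print(np.allclose(P, np.zeros((3, 3))))  # True, up to rounding
```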

At this point, students sometimes naively substitute *A* for *z* in this definition and ask why the theorem isn't trivial:

p(A) = det(A-AI) = det(A-A) = det(O) = 0.

There are two obvious fallacies in this argument. First of all, *zI* in the definition of
the characteristic polynomial represents the multiplication of a matrix
by a scalar, but *AI* is a product of two matrices. Secondly, the
determinant is a scalar, whereas the theorem asserts that *p(A)* is a zero *matrix*.

Actually, this argument *is* valid in the special case where
*A* is a *1 ⨉ 1* matrix, because in that case matrices and scalars are
essentially the same thing.

There are many proofs of the Cayley-Hamilton Theorem. I must admit that I have trouble following most of them, because they involve rather advanced algebra or combinatorics. A more analytic argument, like the one presented here, is more suited to my own training and talents.

It helps to work out the general *2 ⨉ 2* case:

    A = | a  b |
        | c  d |

The characteristic polynomial of *A* is

    p(z) = | a-z    b  | = (a-z)(d-z) - bc = z^2 - (a+d)z + ad - bc.
           |  c    d-z |

Substituting *A* for *z* gives

    p(A) = A^2 - (a+d)A + (ad-bc)I

         = | a  b | | a  b |         | a  b |           | 1  0 |
           | c  d | | c  d | - (a+d) | c  d | + (ad-bc) | 0  1 |

         = | a^2+bc   ab+bd  |   | a^2+ad   ab+bd  |   | ad-bc     0   |
           | ac+cd   bc+d^2  | - | ac+cd   ad+d^2  | + |   0     ad-bc |

         = | a^2+bc-a^2-ad+ad-bc    ab+bd-ab-bd          |   | 0  0 |
           | ac+cd-ac-cd            bc+d^2-ad-d^2+ad-bc  | = | 0  0 |.
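The algebra above can be double-checked numerically; a small NumPy sketch with arbitrary random entries:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, c, d = rng.standard_normal(4)
A = np.array([[a, b], [c, d]])

# p(A) = A^2 - (a+d)A + (ad-bc)I, exactly as derived above
P = A @ A - (a + d) * A + (a * d - b * c) * np.eye(2)

print(np.allclose(P, np.zeros((2, 2))))  # True, up to rounding
```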

To prove the theorem in the general case, let *λ* be an eigenvalue of
*A*, and let *x* be a corresponding eigenvector (expressed as a column
vector). Then

Ax = λx.

Using elementary properties of scalar and matrix operations gives

A^{2}x = (AA)x = A(Ax) = A(λx) = λ(Ax) = λ(λx) = λ^{2}x.

It can be shown in general that

A^{k}x = λ^{k}x, for k = 0, 1, 2, ..., n,    [1]
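A quick numerical illustration of [1], using an arbitrary symmetric matrix so that the eigenpairs are real:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
lams, vecs = np.linalg.eig(A)
lam, x = lams[0], vecs[:, 0]      # one eigenvalue and its eigenvector

for k in range(4):
    # A^0 = I by convention, which matrix_power handles
    Akx = np.linalg.matrix_power(A, k) @ x
    assert np.allclose(Akx, lam**k * x)
```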

where *A^{0} = I*.

Let the characteristic polynomial of *A* be

p(z) = p_{n}z^{n} + p_{n-1}z^{n-1} + p_{n-2}z^{n-2} + ... + p_{0}.

Then multiply each equation in [1] above by *p_{k}* and add them
to obtain

p(A)x = p(λ)x.
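This step can also be verified numerically. The sketch below computes *p(A)x* by Horner's rule applied directly to the vector *x* (NumPy's `np.poly` uses the convention det(zI - A), which differs from the definition above only by a factor of (-1)^n and does not affect the conclusion):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
coeffs = np.poly(A)               # characteristic polynomial of A
lams, vecs = np.linalg.eig(A)
lam, x = lams[0], vecs[:, 0]      # an eigenvalue and its eigenvector

# Horner's rule applied to x: v ends up equal to
# coeffs[0] A^n x + coeffs[1] A^(n-1) x + ... + coeffs[n] x = p(A)x
v = np.zeros_like(x)
for coef in coeffs:
    v = A @ v + coef * x

assert np.allclose(v, np.polyval(coeffs, lam) * x)  # p(A)x = p(lam)x
assert np.allclose(v, 0)                            # and p(lam) = 0
```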

Now *p(λ)* is zero because *λ* is an eigenvalue
of *A*. Hence

p(A)x = 0.

If *A* has *n* linearly independent eigenvectors, they form a basis, so *p(A)* sends every vector to zero; this implies that *p(A) = O*.

If *A* does not have *n* linearly independent eigenvectors,
we construct
a sequence of matrices *A_{1}, A_{2}, A_{3}, ...* converging to *A*, each of which
does have *n* linearly independent eigenvectors. Then *p_{k}(A_{k}) = O* for every *k*,
where *p_{k}* is the characteristic polynomial of *A_{k}*. The entries of *p_{k}(A_{k})*
are continuous functions of the entries of *A_{k}*, so passing to the limit gives *p(A) = O*.

To create such a sequence, it is sufficient to construct matrices
arbitrarily close to *A*, each of which has *n* distinct eigenvalues,
since (as shown below) such a matrix has *n* linearly independent eigenvectors.

First, we need a simple lemma (proved below). The matrix *A*, like all complex matrices,
is similar to an upper triangular matrix, i.e., there is a nonsingular
matrix *P* such that *P^{-1}AP* is upper triangular.

The eigenvalues of an upper triangular matrix appear along its principal
diagonal. There is an upper triangular matrix *T*
arbitrarily close to *P^{-1}AP* whose diagonal entries are all distinct. Then
*PTP^{-1}* is arbitrarily close to *A* and has *n* distinct eigenvalues.

A matrix with *n* distinct eigenvalues has *n* distinct eigenvectors. If
these eigenvectors were linearly dependent, they would span a space of
dimension less than *n*. The mapping defined by the matrix, restricted to
this space, would still have the same *n* distinct eigenvalues; but a linear
mapping on a space of dimension less than *n* has at most *n-1* distinct
eigenvalues, which is impossible. Hence the eigenvectors are linearly independent.
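The perturbation idea can be illustrated on the simplest defective matrix, a *2 ⨉ 2* Jordan block; it is already upper triangular, so no similarity transform is needed, and nudging one diagonal entry makes the eigenvalues distinct:

```python
import numpy as np

# A 2x2 Jordan block: repeated eigenvalue 2, but only one
# independent eigenvector, so the basis argument does not apply.
J = np.array([[2.0, 1.0], [0.0, 2.0]])

eps = 1e-6
J_eps = J + np.diag([0.0, eps])   # perturb one diagonal entry

lams = np.linalg.eigvals(J_eps)
assert abs(lams[0] - lams[1]) > 0            # eigenvalues are now distinct
assert np.linalg.norm(J_eps - J) <= 2 * eps  # yet J_eps is within eps of J
```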

This proves the Cayley-Hamilton Theorem for complex matrices, but it is also true for matrices over more general commutative rings.

The proof of this is actually fairly simple. Our experience in proving
the *2 ⨉ 2* case shows the way. The expression for each entry of
*p(A)* is a
polynomial in the *n^{2}* entries of *A*. We have shown that these polynomials
vanish for all complex values of the entries, so (by a lemma proved below) all
of their coefficients are zero when like terms are combined. The same formal
computation therefore yields zero over any commutative ring.

It might appear that the ring must have a unit. However, if we refrain from combining like terms, we will have a sum of monomials, each prepended by a plus sign or a minus sign. Even in the complex field, cancellation is possible only if every positive monomial has a corresponding negative monomial. They will cancel in a ring, too, even if the ring has no unit.

Let *A* be an *n ⨉ n* complex matrix. We now prove the lemma used above:
*A* is similar to an upper triangular matrix. The proof is by induction on *n*.
For *n = 1* the assertion is trivial because * A* is already upper
triangular.

In other cases, let *λ* be an eigenvalue of
*A* and let *x_{1}* be a corresponding eigenvector. Let *Q* be any
nonsingular matrix whose first column is *x_{1}*.
Then if *e_{1}* is the column vector whose first entry is 1 and whose
other entries are 0, we have *Qe_{1} = x_{1}*, so

Q^{-1}AQe_{1} = Q^{-1}Ax_{1} = Q^{-1}λx_{1} = λQ^{-1}x_{1} = λe_{1},

and the first column of *Q^{-1}AQ* is *λe_{1}*; that is, *Q^{-1}AQ* has the form

    | λ  v |
    | 0  B |,

where *v* is a row vector with *n-1* entries, *0* is a column of *n-1*
zeros, and *B* is an *(n-1) ⨉ (n-1)* matrix.

By inductive hypothesis, there is a nonsingular matrix *R* such that
*R^{-1}BR* is upper triangular. Then

    | 1    0    |              | 1  0 |
    | 0  R^{-1} | * Q^{-1}AQ * | 0  R |

is upper triangular and similar to *A*, which completes the induction.
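The inductive construction translates directly into a short recursive program. The sketch below assumes NumPy, and uses a QR factorization only as a convenient way to complete the eigenvector *x_{1}* to a basis, i.e., to produce one valid choice of *Q*:

```python
import numpy as np

def triangularize(A):
    """Return a nonsingular Q such that Q^{-1} A Q is upper
    triangular, following the inductive construction above."""
    n = A.shape[0]
    if n == 1:
        return np.eye(1, dtype=complex)   # already upper triangular
    _, vecs = np.linalg.eig(A)
    x1 = vecs[:, 0]                       # an eigenvector of A
    # QR of [x1 | I] yields a unitary Q whose first column is a
    # multiple of x1 -- one way to complete x1 to a basis.
    Q, _ = np.linalg.qr(np.column_stack([x1, np.eye(n)]))
    T1 = np.linalg.inv(Q) @ A @ Q         # first column is lam * e1
    B = T1[1:, 1:]                        # the (n-1) x (n-1) block
    R = triangularize(B)                  # inductive step
    S = np.eye(n, dtype=complex)
    S[1:, 1:] = R                         # block diagonal: diag(1, R)
    return Q @ S

A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # eigenvalues +i and -i
Q = triangularize(A)
T = np.linalg.inv(Q) @ A @ Q
assert np.allclose(np.tril(T, -1), 0)     # strictly lower part vanishes
```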

If a polynomial is identically zero, all of its coefficients are zero (when like terms are combined, of course).

The proof is by induction on the number of independent variables.

For a polynomial *p(z)* in one variable, we simply note that
*p^{(k)}(0)* is *k!* times the coefficient of *z^{k}*. All derivatives
of the zero function are zero, so every coefficient is zero.

A polynomial in *n* variables can be rearranged so it is a polynomial
in one variable, each of whose coefficients is a polynomial in the
remaining *n-1* variables.

By the result for one variable, every such coefficient is zero for all values of its independent variables. Hence by inductive hypothesis all of its coefficients are zero.

Although the coefficients of an identically zero complex polynomial are
all zero, this is not true over finite fields. If *N* is the number of
elements in a finite field, then *z^{N} - z = 0* for every element *z*
of the field, even though the polynomial *z^{N} - z* has nonzero coefficients.