The Cayley-Hamilton Theorem

Philip J. Erdelsky

June 8, 2006

Please email comments, corrections and additions to the webmaster at pje@efgh.com.
Some mathematical theorems are especially beautiful. It's hard to say just what makes a theorem beautiful.
In my opinion, one very beautiful theorem is the Cayley-Hamilton Theorem of matrix algebra. It states that if p(z) is the characteristic polynomial of an n × n complex matrix A, then p(A) is the zero matrix, where addition and multiplication in its evaluation are the usual matrix operations, and the constant term p_{0} of p(z) is replaced by the matrix p_{0}I.
Recall that the characteristic polynomial of A is given by
p(z) = det(A - zI).
At this point, students sometimes naively substitute A for z in the definition of the characteristic polynomial and falsely conclude that the Cayley-Hamilton Theorem is trivial:
Why isn't p(A) = det(A - AI) = det(A - A) = det(O) = 0?
There are two obvious fallacies. First of all, zI in the definition of the characteristic polynomial represents the multiplication of a matrix by a scalar, but AI is the multiplication of two matrices. Moreover, the argument given above seems to prove that a matrix is equal to a scalar.
Actually, this argument is valid in the special case where A is a 1 × 1 matrix, where there is essentially no difference between matrix and scalar operations; but in that case the Cayley-Hamilton Theorem really is trivial.
There are many proofs of the CayleyHamilton Theorem. I must admit that I have trouble following most of them, because they involve rather advanced algebra or combinatorics. A more analytic argument, like the one presented here, is more suited to my own training and talents.
It helps to work out the general 2 × 2 case:

    A = | a  b |
        | c  d |
The characteristic polynomial of A is
    p(z) = | a-z    b  | = (a-z)(d-z) - bc = z^{2} - (a+d)z + ad - bc.
           |  c    d-z |
Substituting A for z gives:
    p(A) = A^{2} - (a+d)*A + (ad-bc)*I

         = | a  b | * | a  b | - (a+d) * | a  b | + (ad-bc) * | 1  0 |
           | c  d |   | c  d |           | c  d |             | 0  1 |

         = | a^{2}+bc   ab+bd    | - | a^{2}+ad   ab+bd    | + | ad-bc    0   |
           | ac+cd      bc+d^{2} |   | ac+cd      ad+d^{2} |   |   0    ad-bc |

         = | a^{2}+bc - a^{2}-ad + ad-bc    ab+bd - ab-bd               | = | 0  0 |
           | ac+cd - ac-cd                  bc+d^{2} - ad-d^{2} + ad-bc |   | 0  0 |.
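The same computation can be checked numerically. Here is a quick sketch in Python with NumPy (the particular entries a, b, c, d are arbitrary, chosen only for illustration):

```python
import numpy as np

# Arbitrary entries for a 2 x 2 matrix (illustrative values).
a, b, c, d = 2.0, 3.0, 5.0, 7.0
A = np.array([[a, b],
              [c, d]])

# p(z) = z^2 - (a+d)z + (ad - bc), evaluated at A with the
# constant term replaced by (ad - bc) * I.
p_of_A = A @ A - (a + d) * A + (a * d - b * c) * np.eye(2)

print(p_of_A)  # the 2 x 2 zero matrix
```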
To prove the theorem in the general case, let λ be an eigenvalue of A, and let x be a corresponding eigenvector (expressed as a column vector). Then
Ax = λx.
Using elementary properties of scalar and matrix operations gives
A^{2}x = (AA)x = A(Ax) = A(λx) = λ(Ax) = λ(λx) = λ^{2}x.
It can be shown in general that
[1] A^{k}x = λ^{k}x, for k = 0, 1, 2, ..., n,
where A^{0} = I.
Let the characteristic polynomial of A be
p(z) = p_{n}z^{n} + p_{n-1}z^{n-1} + p_{n-2}z^{n-2} + ... + p_{0}.
Then multiply each equation in [1] above by p_{k} and add them to obtain
p(A)x = p(λ)x.
Now p(λ) is zero because λ is an eigenvalue of A. Hence p(A)x = 0 for every eigenvector x of A.
If A has n linearly independent eigenvectors, this implies that p(A) must be the zero matrix.
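This argument can be illustrated numerically. In the sketch below (Python with NumPy; the helper poly_at_matrix and the sample matrix are my own, for illustration), numpy.poly supplies the coefficients of det(zI - A), which differs from det(A - zI) only by a factor of (-1)^{n}, so it vanishes at A just the same:

```python
import numpy as np

def poly_at_matrix(coeffs, A):
    """Evaluate a polynomial at a square matrix by Horner's rule,
    with the constant term c0 replaced by c0 * I.  coeffs lists the
    highest-degree coefficient first."""
    n = A.shape[0]
    result = np.zeros((n, n), dtype=complex)
    for c in coeffs:
        result = result @ A + c * np.eye(n)
    return result

# A sample matrix with three distinct eigenvalues, hence three
# linearly independent eigenvectors.
A = np.array([[4.0, 1.0, 0.0],
              [2.0, 3.0, 1.0],
              [0.0, 1.0, 5.0]])

coeffs = np.poly(A)          # coefficients of det(zI - A)
lam, X = np.linalg.eig(A)    # eigenvalues and eigenvector columns

# p(A) x = p(lambda) x = 0 for every eigenvector column x, and since
# the eigenvectors here are independent, p(A) itself is zero.
print(np.allclose(poly_at_matrix(coeffs, A), 0))  # True
```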
If A does not have n linearly independent eigenvectors, we construct a sequence A_{1}, A_{2}, ... of matrices whose limit is A, each of which has n linearly independent eigenvectors. Then if p_{j}(z) is the characteristic polynomial of A_{j}, p_{j}(A_{j}) = O. Each entry of p_{j}(A_{j}) is a continuous function of the entries of A_{j}, so passing to the limit gives p(A) = O.
To create such a sequence, it is sufficient to construct matrices arbitrarily close to A, each of which has n linearly independent eigenvectors.
First, we need a simple lemma. The matrix A, like all complex matrices, is similar to an upper triangular matrix, i.e., there is a nonsingular matrix Q for which Q^{-1}AQ is upper triangular. This result is well-known, but a simple proof is given in Appendix A.
The eigenvalues of an upper triangular matrix appear along its principal diagonal. There is an upper triangular matrix T arbitrarily close to Q^{-1}AQ with n distinct eigenvalues. Then QTQ^{-1} is arbitrarily close to A and has the same n distinct eigenvalues as T.
A matrix with n distinct eigenvalues has n linearly independent eigenvectors. If the corresponding eigenvectors were linearly dependent, they would span a space of dimension less than n. The mapping defined by the matrix, restricted to this space, would still have the same n distinct eigenvalues, which is impossible: a linear mapping on a space of dimension m has at most m distinct eigenvalues. Hence the eigenvectors are linearly independent.
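The limiting argument can also be watched numerically. The sketch below (Python with NumPy, my own illustration) perturbs a defective matrix, one without n linearly independent eigenvectors, and confirms that p_{j}(A_{j}) stays at zero as the perturbation shrinks:

```python
import numpy as np

# A defective matrix: this Jordan block has a repeated eigenvalue 1
# but only one linearly independent eigenvector.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

for eps in [1e-1, 1e-3, 1e-6]:
    # Perturbing the diagonal gives a triangular matrix with two
    # distinct eigenvalues, hence two independent eigenvectors.
    A_j = A + np.diag([0.0, eps])
    tr, det = np.trace(A_j), np.linalg.det(A_j)
    p_of_Aj = A_j @ A_j - tr * A_j + det * np.eye(2)
    print(eps, np.abs(p_of_Aj).max())   # (numerically) zero every time
```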
This proves the Cayley-Hamilton Theorem for complex matrices, but it is also true for matrices over more general commutative rings.
The proof of this is actually fairly simple. Our experience in proving the 2 × 2 case shows the way. The expression for each entry of p(A) is a polynomial in n^{2} variables, the entries of A. It's not just any polynomial, but one which takes on the value zero for all values of the variables. That can happen only if all the coefficients are zero when like terms are combined. (This seems to be an obvious result, but it requires proof, so one is given in Appendix B.) Hence the polynomial evaluates to zero in any other algebraic entity that has all the necessary operations.
It might appear that the ring must have a unit. However, if we refrain from combining like terms, we will have a sum of monomials, each prepended by a plus sign or a minus sign. Even in the complex field, cancellation is possible only if every positive monomial has a corresponding negative monomial. They will cancel in a ring, too, even if the ring has no unit.
Appendix A

Let A be an n × n complex matrix. We show, by induction on n, that it is similar to an upper triangular matrix.
For n = 1 the assertion is trivial because A is already upper triangular.
In other cases, let λ be an eigenvalue of A and let x_{1} be a corresponding eigenvector. Extend this to a set x_{1}, x_{2}, ..., x_{n} of linearly independent vectors, and let Q be the matrix whose columns are x_{1}, x_{2}, ..., x_{n}.
Then if e_{1} is the column vector with 1 in its first row and zeros elsewhere, Me_{1} is the first column of M for any n × n matrix M. Hence
Q^{-1}AQe_{1} = Q^{-1}Ax_{1} = Q^{-1}λx_{1} = λQ^{-1}x_{1} = λe_{1},
and the first column of Q^{-1}AQ is λe_{1}. Hence it can be partitioned as follows:
    | λ  v |
    | 0  B |,
where v is an (n-1)-element row vector, B is an (n-1) × (n-1) matrix, and 0 is all zeros. By inductive hypothesis, R^{-1}BR is upper triangular for some nonsingular matrix R, so the following matrix is the required upper triangular matrix similar to A:
    | 1    0     | * Q^{-1}AQ * | 1  0 |
    | 0  R^{-1}  |              | 0  R |.
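The inductive construction translates almost line for line into code. The following Python sketch (the function name triangularize is mine, and it is an illustration of the proof rather than a numerically robust algorithm; production code would use a Schur factorization such as scipy.linalg.schur) extends x_{1} to a basis with a QR factorization and recurses on the block B:

```python
import numpy as np

def triangularize(A):
    """Return Q such that Q^{-1} A Q is upper triangular, following
    the inductive construction above."""
    n = A.shape[0]
    if n == 1:
        return np.eye(1, dtype=complex)
    # An eigenvector x1 of A (any one will do).
    _, vecs = np.linalg.eig(A)
    x1 = vecs[:, [0]]
    # Extend x1 to a basis: the QR factorization of [x1 | I] gives an
    # orthonormal basis whose first vector points along x1.
    Q, _ = np.linalg.qr(np.hstack([x1, np.eye(n)]))
    T1 = np.linalg.inv(Q) @ A @ Q        # first column is lambda * e1
    B = T1[1:, 1:]                       # the (n-1) x (n-1) block
    R = triangularize(B)
    E = np.eye(n, dtype=complex)         # E = diag(1, R), as in the
    E[1:, 1:] = R                        # partitioned product above
    return Q @ E

A = np.array([[1.0, 2.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 2.0, 1.0]])
Q = triangularize(A)
T = np.linalg.inv(Q) @ A @ Q
print(np.allclose(np.tril(T, -1), 0))    # True: T is upper triangular
```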
Appendix B

If a polynomial is identically zero, all of its coefficients are zero, when like terms are combined, of course.
The proof is by induction on the number of independent variables.
For a polynomial p(z) in one variable, we simply note that p^{(k)}(0) is k! times the coefficient of z^{k}. If the polynomial is identically zero, all of its derivatives are also identically zero, and all of its coefficients must be zero.
A polynomial in n variables can be rearranged so it is a polynomial in one variable, and each of its coefficients is a polynomial in the remaining n-1 variables.
By the result for one variable, every such coefficient is zero for all values of its independent variables. Hence by inductive hypothesis all of its coefficients are zero.
Although the coefficients of an identically zero complex polynomial are all zero, this is not true over finite fields. If N is the number of elements in a finite field, then z^{N} - z = 0 for every element z of the field. (The proof given above breaks down in this case. Although formal derivatives can be defined, the factor N! that appears in the argument is zero in the field.)
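For a prime N, the field is just the integers modulo N, and the identity is easy to verify in Python (N = 5 here is an arbitrary illustrative choice):

```python
# In the field of N elements (N prime here, so arithmetic is mod N),
# z^N - z vanishes for every element z, by Fermat's little theorem,
# even though its coefficients are not all zero.
N = 5
print([(z**N - z) % N for z in range(N)])  # [0, 0, 0, 0, 0]
```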