Cayley-Hamilton Theorem

Theorem¹

Let $T : V \to V$ be a linear transformation on a finite-dimensional vector space $V$, and let $f(t)$ be the characteristic polynomial of $T$. Then the following holds:

$f(T) = T_{0}$

Here, $T_{0}$ is the zero transformation. In other words, a linear transformation satisfies its own characteristic equation. Restated in terms of matrices:

Corollary

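Every square matrix satisfies its own characteristic equation. That is, if $f(t)$ is the characteristic polynomial of a square matrix $A$, then

$f(A) = O$

where $O$ is the zero matrix.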
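The corollary can be checked numerically on a concrete matrix. The following is a minimal sketch, not part of the original text: it computes the characteristic polynomial with the Faddeev-LeVerrier recurrence (a technique chosen here for convenience) in exact rational arithmetic, then evaluates the polynomial at $A$ by Horner's rule. The helper names `mat_mul`, `char_poly`, and `poly_at_matrix` are hypothetical. Note that the code uses the monic convention $\det(tI - A)$, which differs from $\det(A - tI)$ only by the factor $(-1)^{n}$, so one vanishes at $A$ exactly when the other does.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly(A):
    """Coefficients [c_0, ..., c_n] of det(tI - A), via Faddeev-LeVerrier."""
    n = len(A)
    c = [Fraction(0)] * (n + 1)
    c[n] = Fraction(1)
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        # M_k = A M_{k-1} + c_{n-k+1} I
        M = [[AM[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AMk = mat_mul(A, M)
        # c_{n-k} = -trace(A M_k) / k
        c[n - k] = -sum(AMk[i][i] for i in range(n)) / k
    return c

def poly_at_matrix(coeffs, A):
    """Evaluate c_0 I + c_1 A + ... + c_n A^n by Horner's rule."""
    n = len(A)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, A)
        result = [[result[i][j] + (c if i == j else 0) for j in range(n)]
                  for i in range(n)]
    return result

# Example matrix chosen for illustration only.
A = [[1, 2, 0],
     [0, 1, 3],
     [4, 0, 1]]
coeffs = char_poly(A)                 # det(tI - A) = t^3 - 3t^2 + 3t - 25
residual = poly_at_matrix(coeffs, A)  # should be the zero matrix
print(coeffs, residual)
```

Because the arithmetic is exact, every entry of `residual` is exactly zero rather than merely small, which avoids having to pick a floating-point tolerance.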

Explanation

Readers around the author's age likely learned about matrices in high school, and the identity they saw then is precisely this Cayley-Hamilton theorem. (Apparently it was not officially part of the curriculum, much like L’Hôpital’s rule.²)

For a $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, the characteristic polynomial is $f(t) = t^{2} - (a + d)t + (ad - bc)$, so the theorem reads: $A^{2} -(a + d)A + (ad - bc)I = O$
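The $2 \times 2$ identity is easy to confirm by direct computation. The sketch below is an illustration only; the function name `cayley_hamilton_2x2` and the sample entries are assumptions, not taken from the source.

```python
def cayley_hamilton_2x2(a, b, c, d):
    """Return A^2 - (a + d) A + (ad - bc) I for A = [[a, b], [c, d]]."""
    A = [[a, b], [c, d]]
    # A squared, computed entry by entry.
    A2 = [[sum(A[i][k] * A[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    trace = a + d
    det = a * d - b * c
    return [[A2[i][j] - trace * A[i][j] + (det if i == j else 0)
             for j in range(2)]
            for i in range(2)]

residual_2x2 = cayley_hamilton_2x2(1, 2, 3, 4)
print(residual_2x2)  # [[0, 0], [0, 0]]
```

For $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ the trace is $5$ and the determinant is $-2$, so the check amounts to $A^{2} - 5A - 2I = O$.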

Proof

What we need to show is that $f(T)(\mathbf{v}) = \mathbf{0}$ holds for every $\mathbf{v} \in V$. Since $f(T)$ is a linear transformation, the case $\mathbf{v} = \mathbf{0}$ is trivial. Assume $\mathbf{v} \ne \mathbf{0}$.

Let $W = \operatorname{span}\left\{ \mathbf{v}, T\mathbf{v}, T^{2}\mathbf{v}, \dots \right\}$ be the $T$-cyclic subspace generated by $\mathbf{v}$, and let $k = \dim(W)$.

Lemma on cyclic subspaces
Let $W$ be the $T$-cyclic subspace generated by a nonzero vector $\mathbf{v}$, and let $k = \dim(W)$.
1. $\left\{ \mathbf{v}, T\mathbf{v}, \dots, T^{k-1}\mathbf{v} \right\}$ is a basis of $W$.
2. If $a_{0}\mathbf{v} + a_{1}T \mathbf{v} + \cdots + a_{k-1}T^{k-1} \mathbf{v} + T^{k}\mathbf{v} = \mathbf{0}$, then the characteristic polynomial of the restriction map $T|_{W}$ is $g(t) = (-1)^{k}\left( a_{0} + a_{1}t + \cdots +a_{k-1}t^{k-1} + t^{k} \right)$.

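By Lemma 1., $T^{k}\mathbf{v} \in W$ is a linear combination of the basis vectors $\mathbf{v}, T\mathbf{v}, \dots, T^{k-1}\mathbf{v}$, so there exist constants $a_{0}, a_{1}, \dots, a_{k-1}$ that satisfy the following: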

$\begin{equation} a_{0}\mathbf{v} + a_{1}T\mathbf{v} + \cdots + a_{k-1}T^{k-1}\mathbf{v} + T^{k}\mathbf{v} = \mathbf{0} \end{equation}$

Then, by Lemma 2., the characteristic polynomial of the restriction map $T|_{W}$ is as follows:

$\begin{equation} g(t) = (-1)^{k}\left( a_{0} + a_{1}t + \cdots +a_{k-1}t^{k-1} + t^{k} \right) \end{equation}$

Hence, by $(1)$ and $(2)$ , we obtain the following:

$g(T)(\mathbf{v}) = (-1)^{k}\left( a_{0}I + a_{1}T + \cdots +a_{k-1}T^{k-1} + T^{k} \right)(\mathbf{v}) = \mathbf{0}$
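The construction so far can be traced on a small example: build the vectors $\mathbf{v}, T\mathbf{v}, T^{2}\mathbf{v}, \dots$ until the next one depends on the earlier ones, read off the constants $a_{0}, \dots, a_{k-1}$ from $(1)$, and confirm that $g(T)(\mathbf{v}) = \mathbf{0}$. This is a rough sketch under stated assumptions: $T$ is represented by a matrix `A`, and the helpers `mat_vec`, `solve_combination`, `cyclic_relation`, and `apply_g` are hypothetical names introduced here for illustration.

```python
from fractions import Fraction

def mat_vec(A, x):
    """Matrix-vector product."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def solve_combination(basis, w):
    """Coefficients x with sum(x[j] * basis[j]) == w, or None if w is outside the span."""
    n, m = len(w), len(basis)
    # Augmented matrix: columns are the basis vectors, last column is w.
    rows = [[Fraction(basis[j][i]) for j in range(m)] + [Fraction(w[i])]
            for i in range(n)]
    pivots, r = [], 0
    for col in range(m):
        p = next((i for i in range(r, n) if rows[i][col] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        piv = rows[r][col]
        rows[r] = [x / piv for x in rows[r]]
        for i in range(n):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
    # A row of the form 0 ... 0 | nonzero means w is not in the span.
    if any(all(x == 0 for x in row[:m]) and row[m] != 0 for row in rows):
        return None
    x = [Fraction(0)] * m
    for i, col in enumerate(pivots):
        x[col] = rows[i][m]
    return x

def cyclic_relation(A, v):
    """Return (k, [a_0, ..., a_{k-1}]) satisfying equation (1); v must be nonzero."""
    basis = [list(v)]
    while True:
        w = mat_vec(A, basis[-1])
        x = solve_combination(basis, w)
        if x is not None:
            # A^k v = sum x_j A^j v, hence a_j = -x_j.
            return len(basis), [-c for c in x]
        basis.append(w)

def apply_g(A, v, k, a):
    """Evaluate g(A) v for g(t) = (-1)^k (a_0 + a_1 t + ... + t^k)."""
    total = [Fraction(0)] * len(v)
    power = list(v)  # starts at A^0 v
    for c in a + [Fraction(1)]:
        total = [t + c * p for t, p in zip(total, power)]
        power = mat_vec(A, power)
    return [(-1) ** k * t for t in total]

# Example chosen so that W is a proper subspace (k = 2 < 3).
A = [[1, 1, 0],
     [0, 1, 0],
     [0, 0, 2]]
v = [0, 1, 0]
k, a = cyclic_relation(A, v)
g_of_A_v = apply_g(A, v, k, a)
print(k, a, g_of_A_v)
```

In this example $A^{2}\mathbf{v} = 2A\mathbf{v} - \mathbf{v}$, so $a_{0} = 1$, $a_{1} = -2$ and $g(t) = (t - 1)^{2}$, a polynomial of degree $2$ even though $A$ is $3 \times 3$.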

Lemma on invariant subspaces
If $W$ is a $T$-invariant subspace, then the characteristic polynomial of $T|_{W}$ divides the characteristic polynomial of $T$.

A $T$-cyclic subspace is $T$-invariant, so by the above Lemma, $g(t)$ divides the characteristic polynomial $f(t)$ of $T$. Therefore, $f(t) = q(t)g(t)$ holds for some polynomial $q(t)$. Thus,

$f(T)(\mathbf{v}) = q(T)g(T)(\mathbf{v}) = q(T)\left( g(T)(\mathbf{v}) \right) = q(T)(\mathbf{0}) = \mathbf{0}$

■

¹ Stephen H. Friedberg, Linear Algebra (4th Edition, 2002), p. 317
² https://namu.wiki/w/%EC%BC%80%EC%9D%BC%EB%A6%AC-%ED%95%B4%EB%B0%80%ED%84%B4%20%EC%A0%95%EB%A6%AC#s-2