Cayley-Hamilton Theorem

Theorem¹

Let $T : V \to V$ be a linear transformation on a finite-dimensional vector space $V$. Let $f(t)$ be the characteristic polynomial of $T$. Then, the following holds:

$$ f(T) = T_{0} $$

Here, $T_{0}$ is the zero transformation. In other words, a linear transformation satisfies its own characteristic polynomial. Restated in terms of matrices, the theorem reads as follows.

Corollary

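Every square matrix satisfies its own characteristic equation. That is, if $f(t)$ is the characteristic polynomial of a square matrix $A$ and $O$ is the zero matrix of the same size, then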

$$ f(A) = O $$
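As a quick sanity check of the corollary, the following is a minimal sketch in plain Python with exact rational arithmetic. The helper names (`mat_mul`, `char_poly`, `poly_at_matrix`) and the sample matrix are illustrative and not taken from the text above. The characteristic polynomial is computed with the Faddeev-LeVerrier recurrence, which is only one convenient choice, and it uses the monic convention $\det(tI - A)$. If $f(t)$ is defined as $\det(A - tI)$ instead, the two differ only by the factor $(-1)^{n}$, so the conclusion $f(A) = O$ is unaffected.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(A):
    """Coefficients [c_0, ..., c_n] of det(tI - A), lowest degree first.

    Uses the Faddeev-LeVerrier recurrence (an assumption of this sketch,
    not a method named in the article).
    """
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    coeffs = [Fraction(0)] * (n + 1)
    coeffs[n] = Fraction(1)
    M = [[Fraction(0)] * n for _ in range(n)]  # M_0 = 0
    for k in range(1, n + 1):
        # M_k = A M_{k-1} + c_{n-k+1} I
        AM = mat_mul(A, M)
        M = [[AM[i][j] + (coeffs[n - k + 1] if i == j else 0)
              for j in range(n)] for i in range(n)]
        # c_{n-k} = -tr(A M_k) / k
        AMk = mat_mul(A, M)
        coeffs[n - k] = -sum(AMk[i][i] for i in range(n)) / k
    return coeffs


def poly_at_matrix(coeffs, A):
    """Evaluate c_0 I + c_1 A + ... + c_n A^n by Horner's scheme."""
    n = len(A)
    result = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, A)
        for i in range(n):
            result[i][i] += c
    return result


# Illustrative matrix; any square matrix with rational entries works.
A = [[2, 1, 0], [0, 2, 1], [1, 0, 3]]
f_of_A = poly_at_matrix(char_poly(A), A)
assert all(entry == 0 for row in f_of_A for entry in row)
```

Exact fractions are used instead of floating point so that the result is the zero matrix exactly, not merely up to rounding error.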

Explanation

Readers of about the same age as the author probably learned about matrices in high school, and the identity they saw then is precisely the Cayley-Hamilton theorem. (Apparently it was not officially part of the curriculum, much like L’Hôpital’s rule².)

For a $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, the following holds:

$$ A^{2} -(a + d)A + (ad - bc)I = O $$
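To make the formula concrete, here is a worked check with a specific matrix chosen only for illustration, $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$, for which $a + d = 5$ and $ad - bc = -2$:

$$ A^{2} = \begin{bmatrix} 7 & 10 \\ 15 & 22 \end{bmatrix}, \qquad A^{2} - 5A - 2I = \begin{bmatrix} 7 - 5 - 2 & 10 - 10 \\ 15 - 15 & 22 - 20 - 2 \end{bmatrix} = O $$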

Proof

What we need to show is that $f(T)(\mathbf{v}) = \mathbf{0}$ holds for every $\mathbf{v} \in V$. Since $f(T)$ is a linear transformation, the case $\mathbf{v} = \mathbf{0}$ is trivial. Assume $\mathbf{v} \ne \mathbf{0}$.

Let $W$ be the $T$-cyclic subspace generated by $\mathbf{v}$, and let $k = \dim(W)$.

Lemma on cyclic subspaces

  1. $\left\{ \mathbf{v}, T\mathbf{v}, \dots, T^{k-1}\mathbf{v} \right\}$ is a basis of $W$.

  2. If $a_{0}\mathbf{v} + a_{1}T \mathbf{v} + \cdots + a_{k-1}T^{k-1} \mathbf{v} + T^{k}\mathbf{v} = \mathbf{0}$, then the characteristic polynomial $g(t)$ of the restriction $T|_{W}$ is $$ g(t) = (-1)^{k}\left( a_{0} + a_{1}t + \cdots +a_{k-1}t^{k-1} + t^{k} \right) $$

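Since $T^{k}\mathbf{v} \in W$ and, by Lemma 1., $\left\{ \mathbf{v}, T\mathbf{v}, \dots, T^{k-1}\mathbf{v} \right\}$ is a basis of $W$, the vector $T^{k}\mathbf{v}$ is a linear combination of these basis vectors. Hence there exist constants $a_{0}, a_{1}, \dots, a_{k-1}$ that satisfy the following: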

$$ \begin{equation} a_{0}\mathbf{v} + a_{1}T\mathbf{v} + \cdots + a_{k-1}T^{k-1}\mathbf{v} + T^{k}\mathbf{v} = \mathbf{0} \end{equation} $$

Then, by Lemma 2., the characteristic polynomial of the restriction map $T|_{W}$ is as follows:

$$ \begin{equation} g(t) = (-1)^{k}\left( a_{0} + a_{1}t + \cdots +a_{k-1}t^{k-1} + t^{k} \right) \end{equation} $$

Hence, by $(1)$ and $(2)$, we obtain the following:

$$ g(T)(\mathbf{v}) = (-1)^{k}\left( a_{0}I + a_{1}T + \cdots +a_{k-1}T^{k-1} + T^{k} \right)(\mathbf{v}) = \mathbf{0} $$
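The construction of relation $(1)$ can be traced on a small example. The sketch below, in plain Python with exact fractions, builds $\mathbf{v}, T\mathbf{v}, T^{2}\mathbf{v}, \dots$ until the next vector falls into the span of the previous ones, reads off $a_{0}, \dots, a_{k-1}$, and confirms that the resulting monic polynomial in $T$ sends $\mathbf{v}$ to $\mathbf{0}$. The function names (`mat_vec`, `express_in_span`, `cyclic_relation`, `apply_monic`) and the sample $T$ and $\mathbf{v}$ are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction


def mat_vec(T, v):
    """Apply the matrix T (list of rows) to the vector v."""
    return [sum(Fraction(t) * x for t, x in zip(row, v)) for row in T]


def express_in_span(vectors, target):
    """Return c with sum_i c[i] * vectors[i] == target, or None if impossible.

    Assumes `vectors` is linearly independent (true for the Krylov
    vectors collected in cyclic_relation below).
    """
    n, k = len(target), len(vectors)
    # Augmented matrix: one column per vector, last column is the target.
    rows = [[Fraction(vectors[j][i]) for j in range(k)] + [Fraction(target[i])]
            for i in range(n)]
    for col in range(k):
        # Independence guarantees a nonzero pivot at or below row `col`.
        p = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[p] = rows[p], rows[col]
        pivot = rows[col][col]
        rows[col] = [x / pivot for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    # Rows below the pivots must vanish for the system to be consistent.
    if any(rows[r][k] != 0 for r in range(k, n)):
        return None
    return [rows[i][k] for i in range(k)]


def cyclic_relation(T, v):
    """Return [a_0, ..., a_{k-1}] with a_0 v + ... + a_{k-1} T^{k-1} v + T^k v = 0.

    Requires v to be nonzero; k is the dimension of the T-cyclic subspace.
    """
    krylov = [[Fraction(x) for x in v]]
    while True:
        nxt = mat_vec(T, krylov[-1])
        c = express_in_span(krylov, nxt)
        if c is not None:
            return [-x for x in c]
        krylov.append(nxt)


def apply_monic(a, T, v):
    """Compute (a_0 I + a_1 T + ... + a_{k-1} T^{k-1} + T^k) v."""
    total = [Fraction(0)] * len(v)
    power = [Fraction(x) for x in v]  # T^0 v
    for coeff in list(a) + [1]:
        total = [s + coeff * p for s, p in zip(total, power)]
        power = mat_vec(T, power)
    return total


# Illustrative data: here the cyclic subspace generated by v has dimension 2.
T = [[1, 1, 0], [0, 1, 0], [0, 0, 2]]
v = [0, 1, 0]
a = cyclic_relation(T, v)          # coefficients of 1 - 2t (+ t^2)
assert apply_monic(a, T, v) == [0, 0, 0]
```

In this example $k = 2 < 3 = \dim V$, so $g(t) = (-1)^{2}(1 - 2t + t^{2})$ has lower degree than the characteristic polynomial of $T$, which is why the proof still needs a divisibility argument to pass from $g$ to $f$.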

Lemma on invariant subspaces

If $W$ is a $T$-invariant subspace, then the characteristic polynomial of $T|_{W}$ divides the characteristic polynomial of $T$.

A $T$-cyclic subspace is $T$-invariant, so by the above Lemma, $g(t)$ divides the characteristic polynomial $f(t)$ of $T$. Therefore, for some polynomial $q(t)$, $f(t) = q(t)g(t)$ holds. Thus,

$$ f(T)(\mathbf{v}) = q(T)g(T)(\mathbf{v}) = q(T)\left( g(T)(\mathbf{v}) \right) = q(T)(\mathbf{0}) = \mathbf{0} $$
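The factorization $f(t) = q(t)g(t)$ can be checked by polynomial long division. The sketch below assumes an illustrative operator with matrix $\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}$ and $\mathbf{v} = (0, 1, 0)$, for which, taking $f(t) = \det(T - tI)$, one gets $f(t) = -(t-1)^{2}(t-2)$ and $g(t) = (t-1)^{2}$. The function name `poly_divmod` is hypothetical, and coefficient lists are written lowest degree first.

```python
from fractions import Fraction


def poly_divmod(f, g):
    """Long division of f by g; returns (quotient, remainder).

    Polynomials are coefficient lists, lowest degree first, and the
    leading coefficient of g (its last entry) must be nonzero.
    """
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
    r = f[:]
    for shift in range(len(f) - len(g), -1, -1):
        # Cancel the current leading term of the remainder.
        factor = r[shift + len(g) - 1] / g[-1]
        q[shift] = factor
        for i, c in enumerate(g):
            r[shift + i] -= factor * c
    return q, r


f = [2, -5, 4, -1]   # -(t - 1)^2 (t - 2) = 2 - 5t + 4t^2 - t^3
g = [1, -2, 1]       # (t - 1)^2 = 1 - 2t + t^2
q, r = poly_divmod(f, g)
assert q == [2, -1]                 # q(t) = 2 - t
assert all(c == 0 for c in r)       # zero remainder: g divides f
```

With $q(t) = 2 - t$ in hand, the last line of the proof reads $f(T)(\mathbf{v}) = (2I - T)\left( (T - I)^{2}\mathbf{v} \right) = (2I - T)(\mathbf{0}) = \mathbf{0}$ for this example.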