Let #A# be an #(n\times n)#-matrix. Earlier, we saw that all matrices of this size, with the usual matrix addition and scalar multiplication, form an #n^2#-dimensional vector space. Thus, there is a non-negative integer #k\le n^2# such that the system \[\basis{I_n,A,A^2,\ldots, A^{k}}\] is linearly dependent. If we take #k# minimal with this property, then in a non-trivial linear relation among these powers the coefficient of #A^k# must be different from zero (otherwise the lower powers would already be linearly dependent), so after division by this coefficient we obtain an equality of the form
\[A^k+c_{k-1} \cdot A^{k-1}+\cdots +c_1\cdot A + c_0\cdot I_n = 0\]
According to the Cayley-Hamilton theorem below, #k# is at most #n#. To understand this result, we specify what is meant by evaluating a polynomial at a square matrix.
Let #n# be a natural number and #A# an #(n\times n)#-matrix. By substituting #A# (for #x#) into a polynomial \[ p(x) = c_0+c_1x+\cdots+ c_kx^k\] in #x#, or evaluating this polynomial at #A#, we mean forming the #(n\times n)#-matrix #p(A)# defined by \[p(A)= c_0\cdot I_n+c_1\cdot A+\cdots +c_k\cdot A^k\]
The assignment of #p(A)# to #p(x)# is a linear mapping #P\to M_{n\times n}# from the vector space #P# of all polynomials in #x# to the vector space #M_{n\times n}# of all #(n\times n)#-matrices.
This map respects multiplication in the following sense: if #p(x)#, #q(x)#, and #r(x)# are polynomials such that #r(x)= p(x)\cdot q(x)#, then \[r(A) = p(A)\, q(A)\] By substituting a linear map #L: V\to V# (for #x#) into the polynomial #p(x)# as given above, we mean forming the linear map #p(L): V\to V# defined by \[p(L)= c_0\cdot I_V+c_1\cdot L+\cdots +c_k\cdot L^k\]
If, for example, #p(x) = 3x +4# and #A = \matrix{1&-3\\ 2&-5}#, then \[\begin{array}{rcl}p(A) &=& 3\cdot A+4\cdot I_2 \\ &=&3\cdot\matrix{1&-3\\ 2&-5}+4\cdot\matrix{1&0\\ 0&1}\\ &=&\matrix{3+4&-9\\ 6&-15+4}\\ &=&\matrix{7&-9\\ 6&-11}\end{array}\]
If #A = \matrix{a}#, then the substitution of the matrix #A# into a polynomial #p(x)# is no different than substituting the value #a# for #x# into #p(x)#, that is to say: #p(A) # is the #(1\times1)#-matrix with entry #p(a)#.
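As an illustration of the rule #r(A)=p(A)\,q(A)# stated above, take #p(x) = x+1# and #q(x) = x-1#, so that #r(x)=p(x)\cdot q(x)=x^2-1#. For any #(n\times n)#-matrix #A# we then have \[p(A)\,q(A) = (A+I_n)\,(A-I_n)= A^2-A+A-I_n = A^2-I_n = r(A)\] The expansion proceeds exactly as for numbers, because all matrices occurring in it are powers of the single matrix #A#, and these commute with one another.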
The meaning of #p(L)# follows from the theory of composition of linear maps and of sums and multiples of linear maps.
The matrix of #p(L)# with respect to a basis #\alpha# of #V# is obtained by the substitution of the matrix of #L# with respect to #\alpha# into #p(x)#:\[\left(p(L)\right)_\alpha=p(L_\alpha)\]
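Indeed, the matrix of a composition of linear maps is the product of their matrices, and the matrix of a sum or scalar multiple of linear maps is the corresponding sum or scalar multiple of their matrices. Applying these rules to each term of #p(L)# gives \[\left(p(L)\right)_\alpha= c_0\cdot I_n+c_1\cdot L_\alpha+\cdots +c_k\cdot \left(L_\alpha\right)^k = p(L_\alpha)\]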
A remarkable fact about linear maps from a finite-dimensional vector space to itself is that substitution in their characteristic polynomial gives the null map:
Each linear map #L:V\to V# from a finite-dimensional vector space #V# to itself satisfies \[p_L(L) = 0_V\] where #p_L(x)# is the characteristic polynomial of #L# and #0_V# is the null map on #V#.
If #\dim{V}=n# and we write the characteristic polynomial as \[\det(L-\lambda\cdot I_V) = (-1)^n\lambda^n+c_{n-1} \lambda^{n-1}+\cdots +c_1 \lambda + c_0\] then for any basis #\alpha# of #V#, the #(n\times n)#-matrix #X=L_\alpha# is a solution of the corresponding matrix equation in #X#: \[(-1)^n\cdot X^n+c_{n-1} \cdot X^{n-1}+\cdots +c_1\cdot X + c_0\cdot I_n = 0_n\]
Let #n# be a natural number and #A# an #(n\times n)#-matrix. Then #A# satisfies #p_A(A) = 0_n#. This is the special case of the theorem for the linear map #L_A# determined by #A#.
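To illustrate the theorem with the matrix #A = \matrix{1&-3\\ 2&-5}# from the example above: its characteristic polynomial is \[p_A(\lambda)=\det(A-\lambda\cdot I_2)=(1-\lambda)(-5-\lambda)-(-3)\cdot 2=\lambda^2+4\lambda+1\] and indeed \[p_A(A)= A^2+4\cdot A+I_2=\matrix{-5&12\\ -8&19}+\matrix{4&-12\\ 8&-20}+\matrix{1&0\\ 0&1}=\matrix{0&0\\ 0&0}\]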
Let #n# be a natural number. First we prove the statement: Every #(n\times n)#-matrix #A# satisfies #p_A(A) = 0_n#.
We use the matrix equation underlying Cramer's rule. We recall that the #(i,j)#-entry of the adjoint matrix #\text{adj}(A)# is #(-1)^{i+j}\cdot \det(A_{ji})#. The equation in question is \[\det(A)\cdot I_n = A\, \text{adj}(A)\] If we replace #A# by #A-x\cdot I_n#, we find \[\det(A-x\cdot I_n)\cdot I_n = (A-x\cdot I_n)\, \text{adj}(A-x\cdot I_n)\]
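For example, for a general #(2\times2)#-matrix #A = \matrix{a&b\\ c&d}# we have #\text{adj}(A)=\matrix{d&-b\\ -c&a}#, and the equation can be checked directly: \[A\,\text{adj}(A)=\matrix{a&b\\ c&d}\,\matrix{d&-b\\ -c&a}=\matrix{ad-bc&0\\ 0&ad-bc}=\det(A)\cdot I_2\]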
We carry out the proof by substituting an #(n\times n)#-matrix into a more general kind of expression than an ordinary polynomial, namely a polynomial whose coefficients are themselves #(n\times n)#-matrices. We denote by #Q# the collection of all such polynomials. An element #q(x)# of #Q# is of the form
\[q(x) =x^k\,C_k+x^{k-1}\,C_{k-1} +\cdots + x\,C_1 + C_0\] where #C_k,C_{k-1},\ldots,C_0# belong to #M_{n\times n}#. The factor #r(x) = A-x\cdot I_n# in the above equality is an example of an element of #Q#. The left-hand side of the equality is also an example; here all of the coefficients of powers of #x# are multiples of the identity matrix #I_n#. Finally, the factor \(p(x) = \text{adj}(A-x\cdot I_n)\) on the right-hand side is also a polynomial in #Q# of degree #n-1# in #x# (after all, the determinants in the adjoint matrix are of matrices with #n-1# rows and #n-1# columns).
For #n=1#, the matrices are no different than ordinary numbers and we have #Q=P#.
Again, we denote by #q(A)# the matrix obtained from #q(x)# by substitution of #A# for #x#, that is, \[q(A) =A^k\,C_k+A^{k-1}\,C_{k-1} +\cdots + A\,C_1 + C_0\] It is easy to see that #Q# is again a vector space and that the map assigning to #q(x)# the matrix #q(A)# is a linear map #Q\to M_{n\times n}#.
Elements of #Q# can be multiplied by one another with the usual rules, where #C\,x# is rewritten to #x\, C# for each matrix #C# in #M_{n\times n}#. If #n\gt 1#, then matrix multiplication is no longer commutative, so substitution of #A# for #x# need no longer respect multiplication. This means that it may happen that #(p\cdot q)(A) \ne p(A)\, q(A)#. For example, if #A# and #B# are matrices that do not commute (that is, #A\, B \ne B\, A#), then #p(x) = B# and #q(x) = x# satisfy #p(x)\cdot q(x)=B\,x#, which is rewritten to #x\,B#, so
\[(p\cdot q)(A) = A\,B \ne B\,A = p(A)\, q(A)\]
However, if #A# commutes with #p(x)# and #q(x)#, that is, if #A# commutes with each coefficient of #p(x)# and #q(x)#, then we do have
\[(p\cdot q)(A) = p(A)\, q(A)\]
because then the product on the right-hand side can be expanded in the same manner as for a polynomial in #x#: whenever #C\,x# is rewritten to #x\, C# for a coefficient #C# of #p(x)# or #q(x)#, the same commutation also holds after substitution, since #C\,A = A\,C#.
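As an illustration, take polynomials of degree one, say #p(x)=x\,C_1+C_0# and #q(x)=x\,D_1+D_0#. Expanding and rewriting #C_1\,x# to #x\,C_1# and #C_0\,x# to #x\,C_0# gives \[p(x)\cdot q(x)= x^2\,C_1D_1+x\,C_1D_0+x\,C_0D_1+C_0D_0\] so that \[(p\cdot q)(A)= A^2\,C_1D_1+A\,C_1D_0+A\,C_0D_1+C_0D_0\] On the other hand, \[p(A)\,q(A)=(A\,C_1+C_0)\,(A\,D_1+D_0)=A\,C_1A\,D_1+A\,C_1D_0+C_0A\,D_1+C_0D_0\] If #A# commutes with #C_1# and #C_0#, the two outcomes coincide.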
We apply this statement to the equality #\det(A-x I_n)\cdot I_n = (A-x I_n)\, \text{adj}(A-x I_n)# derived above. Because #A-x I_n# commutes with #\text{adj}(A-x I_n)# (both products equal #\det(A-x I_n)\cdot I_n#), and because #x\, I_n# commutes with every element of #Q#, it follows that #A=(A-xI_n)+xI_n# commutes with #\text{adj}(A-x I_n)# and, as a consequence, with each coefficient of #\text{adj}(A-x I_n)# (compare the coefficients of equal powers of #x#).
Now substitute #A# for #x#. The substitution of #A# in the factor #A-x I_n# yields the null matrix, and we have just seen that substitution of #A# respects this particular product; therefore substitution of #A# in the entire right-hand side gives the null matrix. We conclude that substitution of #A# in the left-hand side, #\det(A-x I_n)\cdot I_n=p_A(x)\cdot I_n#, produces the null matrix as well. Since this substitution yields #p_A(A)#, it follows that #p_A(A)=0_n#, which proves the statement about #A#.
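In more detail, write the characteristic polynomial as #p_A(x)=\det(A-x I_n)=(-1)^nx^n+c_{n-1}x^{n-1}+\cdots+c_1x+c_0#. Then the left-hand side is the element \[x^n\cdot\left((-1)^nI_n\right)+x^{n-1}\cdot\left(c_{n-1}I_n\right)+\cdots+x\cdot\left(c_1I_n\right)+c_0\cdot I_n\] of #Q#, and substitution of #A# turns it into \[(-1)^n\cdot A^n+c_{n-1}\cdot A^{n-1}+\cdots+c_1\cdot A+c_0\cdot I_n=p_A(A)\]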
The theorem is a direct consequence: for each basis #\alpha# of #V#, the characteristic polynomial of #L# coincides with that of the matrix #A=L_\alpha#, so \[\left(p_L(L) \right)_{\alpha}=\left(p_A(L)\right)_{\alpha}=p_A(A) = 0_n\] Hence #p_L(L)# is the linear map whose matrix with respect to #\alpha# is the null matrix, that is, #p_L(L)=0_V#.
Substitution of #A# for #x# in #\text{adj}(A- x I_n)# does not always give the null matrix. If, for example,
\[A = \matrix{a&b\\ c&d}\] then \[\text{adj}(A- x I_2) = \matrix{d-x&-b\\- c&a-x}=\matrix{d&-b\\ -c&a}-x\cdot I_2\] so substitution of #A# gives \[\matrix{d&-b\\ -c&a}-\matrix{a&b\\ c&d} = \matrix{d-a & -2b\\-2c&a-d}\]
This is distinct from the null matrix if, for example, #d\ne a#.
Consider the matrix \[ A =\matrix{1 & 0 & 0 \\ 2 & 3 & -2 \\ 4 & 4 & -3 \\ }\] and the polynomial \[p(x) = x^2-1\] Calculate #p(A)#.
\(p(A)= \) \(\matrix{0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ }\)
In order to calculate the matrix #p(A)#, we substitute #A# for #x# in the polynomial #p(x)# and simplify the result to a single #(3\times3)#-matrix:
\[\begin{array}{rcl}
p(A) &=&A^2-I_3\\
&&\phantom{xxx}\color{blue}{A\text{ substituted }}\\
&=& \matrix{1 & 0 & 0 \\ 2 & 3 & -2 \\ 4 & 4 & -3 \\ }\,\matrix{1 & 0 & 0 \\ 2 & 3 & -2 \\ 4 & 4 & -3 \\ } -1\cdot \matrix{1&0&0\\ 0&1&0\\ 0&0&1}\\
&&\phantom{xxx}\color{blue}{\text{explicit matrices substituted}}\\
&=& \matrix{1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ } -1\cdot \matrix{1&0&0\\ 0&1&0\\ 0&0&1}\\
&&\phantom{xxx}\color{blue}{\text{matrix product computed}}\\
&=& \matrix{0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ }\\
&&\phantom{xxx}\color{blue}{\text{linear combination simplified}}\\
\end{array}
\]