We will now show that finding a simpler form of the matrix of a linear map #V\to V# relative to a suitable basis for a finite-dimensional vector space #V# can be reduced to the case in which the minimal polynomial is either a power of a single linear polynomial or, in the case where #V# is real, a power of a quadratic polynomial with negative discriminant.
Keep in mind that the notation #V = U\oplus W# means that #V# is the direct sum of the subspaces #U# and #W#; that is, #V = U+W# and #U\cap W =\{\vec{0}\}#.
Let #V# be a vector space with finite dimension #n# and let #L:V\to V# be a linear map which is a zero of the polynomial #f(x)#. If #g(x)# and #h(x)# are polynomials with \[\gcd(g(x),h(x))=1\phantom{xx}\text{ and }\phantom{xx}f(x) = g(x)\cdot h(x)\] then we have
\[\begin{array}{rcl} \im{g(L)} &=& \ker{h(L)}\\ \im{h(L)} &=& \ker{g(L)}\\ V &=& \ker{g(L)}\oplus \ker{h(L)}\end{array}\]
Both of the subspaces #\ker{h(L)}# and #\ker{g(L)}# are invariant under #L#.
Hence, if we choose a basis #\alpha# for #\ker{g(L)}# and a basis #\beta# for #\ker{h(L)}#, then the concatenation #\gamma# of #\alpha# and #\beta# is a basis for #V#, and the matrix of #L# relative to #\gamma# has the form \[L_\gamma = \matrix{L_\alpha&0\\ 0 & L_\beta}\]where #L_\alpha# is the matrix of #\left.L\right|_{\ker{g(L)}}# relative to #\alpha# and #L_\beta# the matrix of #\left.L\right|_{\ker{h(L)}}# relative to #\beta#.
In particular, the characteristic polynomial of #L# on #V# is the product of the characteristic polynomials of the restrictions of #L# to #\ker{g(L)}# and to #\ker{h(L)}#.
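Before turning to the proof, here is a minimal computational sketch of the theorem in Python (sympy). The matrix #A# is a hypothetical example with #f(x)=(x-1)(x-2)^2#, #g(x)=x-1#, and #h(x)=(x-2)^2#:

```python
from sympy import Matrix, eye

# Hypothetical example: L is multiplication by A, with
# f(x) = (x - 1)(x - 2)**2 = g(x)*h(x) and gcd(g, h) = 1.
A = Matrix([[1, 1, 0],
            [0, 2, 1],
            [0, 0, 2]])
gA = A - eye(3)              # g(L)
hA = (A - 2*eye(3))**2       # h(L)

ker_g = gA.nullspace()       # 1-dimensional
ker_h = hA.nullspace()       # 2-dimensional

# im g(L) lies in ker h(L): adjoining the columns of g(L) does not raise the rank
# (and the dimensions match, so the spaces are equal)
assert Matrix.hstack(*(ker_h + gA.columnspace())).rank() == len(ker_h)
# im h(L) lies in ker g(L), with matching dimensions:
assert Matrix.hstack(*(ker_g + hA.columnspace())).rank() == len(ker_g)
# V is the direct sum of the kernels: together they span all of R^3
assert Matrix.hstack(*(ker_g + ker_h)).rank() == 3
```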
The statement on invariance is a direct consequence of the statement Invariance of kernel and image under commuting linear maps, because #L# commutes with each linear map of the form #p(L)#, where #p# is a polynomial.
Assume that #\vec{x}# belongs to #\im{h(L)}#. Then there exists a vector #\vec{y}# in #V# such that #\vec{x}= {h(L)} (\vec{y})#. Applying #{g(L)}# to this equality gives \[{g(L)}\vec{x}= {g(L)}\,{h(L)} (\vec{y}) = {f(L)} (\vec{y})=\vec{0}\] so #\vec{x}# belongs to #\ker{g(L)}#. Hence #\im{h(L)}# lies in #\ker{g(L)}#.
For the proof of the reverse inclusion we use the extended Euclidean algorithm, which gives us two polynomials #p(x)# and #q(x)# such that \[p(x)\cdot g(x)+q(x)\cdot h(x) = 1\]Substituting #L# for #x# gives \[p(L)\, g(L)+q(L)\, h(L) = I_V\]If #\vec{x}# belongs to #\ker{g(L)}#, then this equality gives \[h(L)\left( q(L)\vec{x} \right)=q(L)\, h(L)\vec{x} = \vec{x}\] from which we conclude that #\ker{g(L)}# lies in #\im{h(L)}#. Together with the previously proven inclusion this gives the equality #\ker{g(L)}=\im{h(L)}#. Since the roles of #g# and #h# are interchangeable, the equality #\ker{h(L)}=\im{g(L)}# follows in the same way.
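The pair #p(x)#, #q(x)# can be computed explicitly. A small sympy sketch, assuming the same hypothetical factors #g(x)=x-1# and #h(x)=(x-2)^2# as in the sketch above:

```python
from sympy import symbols, gcdex, expand

x = symbols('x')
p, q, d = gcdex(x - 1, (x - 2)**2, x)    # extended Euclidean algorithm
print(d)                                 # 1, confirming gcd(g, h) = 1
print(p, q)                              # here: p(x) = 3 - x, q(x) = 1
print(expand(p*(x - 1) + q*(x - 2)**2))  # 1, i.e. p*g + q*h = 1
```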
To prove the statement on the direct sum, we must establish two facts:
- The vector space #V# is spanned by #\ker{g(L)}# and #\ker{h(L)}#.
- The intersection of #\ker{g(L)}# and #\ker{h(L)}# is #\{\vec{0}\}#.
1. A glance at the previously deduced equality \[p(L)\, g(L)+q(L)\, h(L) = I_V\]shows that every #\vec{x}# in #V# can be written as \(g(L)\left(p(L)(\vec{x})\right)+h(L)\left(q(L)(\vec{x})\right)\) and hence lies in the span of #\im{g(L)}# and #\im{h(L)}#. We saw above that these subspaces coincide with #\ker{h(L)}# and #\ker{g(L)}#, respectively, from which it follows that #V# is spanned by #\ker{g(L)}# and #\ker{h(L)}#.
2. Assume #\vec{x}# belongs to both #\ker{g(L)}# and #\ker{h(L)}#. We then have
\[\vec{x} = p(L)\left(g(L)(\vec{x})\right)+q(L)\left(h(L)(\vec{x})\right) =p(L)( \vec{0})+q(L)(\vec{0})= \vec{0}+\vec{0} = \vec{0}\] so #\vec{x}# is the zero vector.
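The two terms in this decomposition are in fact the images of a pair of complementary projections. A sympy sketch, continuing the hypothetical example above (where #p(x)=3-x# and #q(x)=1#):

```python
from sympy import Matrix, eye, zeros

A  = Matrix([[1, 1, 0], [0, 2, 1], [0, 0, 2]])
gA, hA = A - eye(3), (A - 2*eye(3))**2
pA, qA = 3*eye(3) - A, eye(3)        # p(L) and q(L)

P1 = gA*pA    # projection onto ker h(L) = im g(L)
P2 = hA*qA    # projection onto ker g(L) = im h(L)
assert P1 + P2 == eye(3)             # p(L)g(L) + q(L)h(L) = I_V
assert P1*P1 == P1 and P2*P2 == P2   # both are projections
assert hA*P1 == zeros(3, 3)          # the image of P1 lies in ker h(L)
```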
The fact that the characteristic polynomial of #L# on #V# is the product of the characteristic polynomials of the restrictions of #L# to #\ker{g(L)}# and to #\ker{h(L)}# follows from the matrix decomposition above and the theorem Determinants of some special matrices, applied to the matrix of #L-x\cdot I_V# relative to a basis for #V# composed of bases for #\ker{g(L)}# and #\ker{h(L)}#.
The most important examples of polynomials #f(x)# with #f(L) = 0# are the minimal polynomial #f(x) = m_L(x)# and the characteristic polynomial #f(x) = p_L(x)#.
The condition #\gcd(g(x),h(x))=1# is essential for the conclusion that #V# is the direct sum of #\ker{g(L)}# and #\ker{h(L)}#. If #L:\mathbb{R}^2\to\mathbb{R}^2# is the linear map determined by the matrix \[ \matrix{0&1\\ 0&0}\] then the minimal polynomial of #L# is equal to #x^2#. The polynomials #g(x)=x# and #h(x)=x# satisfy all conditions of the theorem except #\gcd(g(x),h(x))=1#. Since #g(x) =x= h(x)#, we have #\ker{g(L)}=\ker{L} = \ker{h(L)}#. Hence, the vector space #\mathbb{R}^2# is not the direct sum of the two kernels.
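A quick sympy check of this counterexample:

```python
from sympy import Matrix

N = Matrix([[0, 1], [0, 0]])  # minimal polynomial x**2
print(N.nullspace())          # [Matrix([[1], [0]])]: ker g(L) = ker h(L) is a single
                              # line, so the two kernels cannot sum to all of R^2
```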
Applying this result to a complete factorization of the characteristic polynomial of #L# gives:
If #V# is a finite-dimensional vector space and #L:V\to V# a linear map with characteristic polynomial equal to \[(x-\lambda_1)^{k_1} \cdots (x-\lambda_r)^{k_r}\]for certain distinct numbers #\lambda_1,\ldots,\lambda_r# and natural numbers #k_1,\ldots,k_r#, then we have \[ V = \ker{(L-\lambda_1\,I_V)^{k_1}}\oplus \cdots \oplus \ker{(L-\lambda_r\,I_V)^{k_r}}\]
In particular, the restriction of #L# to \(\ker{(L-\lambda_i\,I_V)^{k_i}}\) has the characteristic polynomial \((x-\lambda_i)^{k_i}\) and we have \(\dim{\ker{(L-\lambda_i\,I_V)^{k_i}}}=k_i\) for #i=1,\ldots,r#.
The direct sum decomposition \(V=\ker{(L-\lambda_1\,I_V)^{k_1}}\oplus \cdots \oplus \ker{(L-\lambda_r\,I_V)^{k_r}}\) follows directly from repeated application of the theorem above with #f(x)=p_L(x)# and #g(x)# consecutively equal to \((x-\lambda_i)^{k_i}\) for #i=1,\ldots,r#.
The subspace \(\ker{(L-\lambda_i\,I_V)^{k_i}}\) is invariant under #L#. Because #\lambda_i# is the only root of the minimal polynomial of #L# on this subspace, statement 4 of the theorem Minimal polynomial shows that the restriction of #L# to this subspace has characteristic polynomial \((x-\lambda_i)^{m_i}\) for a certain natural number #m_i#. By the theorem Determinants of some special matrices, the characteristic polynomial of #L# is equal to the product \[p_L(x) = (x-\lambda_1)^{m_1}\cdots (x-\lambda_r)^{m_r}\]Comparing this with the given factorization shows that #m_i = k_i# for #i=1,\ldots,r#. Since the degree of the characteristic polynomial of a restriction equals the dimension of the subspace, this implies that #k_i# is the dimension of \(\ker{(L-\lambda_i\,I_V)^{k_i}}\).
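This dimension statement can be checked directly on the matrix #A# of the worked example below, whose characteristic polynomial is #-(x+1)^2(x+2)#:

```python
from sympy import Matrix, eye

A = Matrix([[1, -3, 4], [6, -9, 10], [3, -4, 4]])
print(len(((A + eye(3))**2).nullspace()))  # 2 = k_1, for lambda_1 = -1
print(len((A + 2*eye(3)).nullspace()))     # 1 = k_2, for lambda_2 = -2
```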
As we know, the linear map #L# has minimal polynomial \[m_L(x)=(x-\lambda_1)^{\ell_1}\cdots (x-\lambda_r)^{\ell_r}\]for certain natural numbers #\ell_1,\ldots,\ell_r# with #\ell_i\le k_i#. Later we will see that #(x-\lambda_i)^{\ell_i}# is the minimal polynomial of #L# restricted to #\ker{(L-\lambda_i\,I_V)^{k_i}}#. Not only is \(\ker{(L-\lambda_i\,I_V)^{\ell_i}}\) a linear subspace invariant under #L#, but so is \(\ker{(L-\lambda_i\,I_V)^{k}}\) for every natural number #k#. Later on, we will use the dimensions of these linear subspaces to characterize the conjugation class of #L#.
Assume that \(A\) is a #(2\times2)#-matrix and that #\lambda# and #\mu# are the roots of the characteristic polynomial #p_A(x)# of #A#. If #\lambda\ne\mu#, then #\mathbb{R}^2# is the direct sum of the two #1#-dimensional eigenspaces #\ker{A-\lambda\cdot I_2}# and #\ker{A-\mu\cdot I_2}#. Relative to any basis consisting of one vector from each of the two eigenspaces, the matrix of #L_A# is the diagonal matrix with #\lambda# and #\mu# on the diagonal. Here the theorem confirms what we already know: #A# is diagonalizable.
Now assume #\lambda = \mu#. Then #A# is diagonalizable if and only if #A= \lambda\cdot I_2#. If this is not the case, then #\ker{A-\lambda\cdot I_2}# must have dimension #1#. If we choose a basis consisting of a vector from #\ker{A-\lambda\cdot I_2}# and a vector outside of it, then the matrix of #L_A# relative to this basis has the form
\[\matrix{\lambda&*\\ 0&\lambda}\]where #*# indicates a number unequal to #0#. The theorem cannot be used here, but later we will see that the basis can be chosen in such a way that #*# is equal to #1#.
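A small sympy sketch of this situation, with a hypothetical matrix #A# for which #\lambda=\mu=2#:

```python
from sympy import Matrix, eye

A = Matrix([[3, -1], [1, 1]])       # p_A(x) = (x - 2)**2, but A != 2*I_2
v = (A - 2*eye(2)).nullspace()[0]   # eigenvector (1, 1)
w = Matrix([2, 0])                  # an arbitrary vector outside ker(A - 2*I_2)
T = Matrix.hstack(v, w)
print(T.inv()*A*T)                  # Matrix([[2, 2], [0, 2]]): here * equals 2
```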
Assume that #L# is a linear map #V\to V# on a real vector space #V# with a complex non-real eigenvalue #\lambda#. Then #\overline\lambda# is an eigenvalue too, and there exists a #2#-dimensional linear subspace #U# of #V# that is invariant under #L# and on which #L# has eigenvalues #\lambda# and #\overline\lambda#. This space can be obtained by first determining the kernel #W# of the real linear map #L^2-2\Re{\lambda}\cdot L + \lambda\cdot\overline{\lambda} \cdot I_V# and next choosing a vector #\vec{w}\in W# unequal to the zero vector. Then #U = \linspan{\vec{w},L(\vec{w})}# is a subspace as required.
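A sympy sketch of this construction, with a hypothetical real matrix having eigenvalues #\mathrm{i}#, #-\mathrm{i}#, and #2#:

```python
from sympy import Matrix, eye, I, re

A = Matrix([[0, -1, 0], [1, 0, 0], [0, 0, 2]])
lam = I                                   # non-real eigenvalue of A
Q = A**2 - 2*re(lam)*A + lam*lam.conjugate()*eye(3)
w = Q.nullspace()[0]                      # a nonzero vector in W = ker Q
U = Matrix.hstack(w, A*w)                 # basis of the invariant plane U
# A maps U into itself: the columns of A*U stay inside the span of U
assert Matrix.hstack(U, A*U).rank() == U.rank()
```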
Consider the matrix \[ A = \matrix{1 & -3 & 4 \\ 6 & -9 & 10 \\ 3 & -4 & 4 \\ } \] The characteristic polynomial of #A# is equal to #{-\left(x+1\right)^2 \left(x+2\right)}#.
Determine a #(3\times3)#-matrix #T# whose first two columns are a basis for #\ker{( A+ I_3)^2}# and whose third column is a basis for #\ker{ A+2I_3}#.
#T=# #\matrix{3 & -3 & -2 \\ 6 & -7 & -6 \\ 3 & -4 & -3 \\ }#
According to the theorem Invariant direct sum, the requested basis can be obtained as a basis of #\im{A+2I_3 }# supplemented with a basis of #\im{( A+ I_3)^2 }#. These subspaces are spanned by the columns of the matrices \[ A+2I_3 = \matrix{3 & -3 & 4 \\ 6 & -7 & 10 \\ 3 & -4 & 6 \\ }\phantom{xx}\text{ and }\phantom{xx}( A+ I_3)^2= \matrix{-2 & 2 & -2 \\ -6 & 6 & -6 \\ -3 & 3 & -3 \\ } \] After removing linearly dependent columns, we find that the following columns form a basis:
\[\begin{array}{rcl}\basis{\cv{ 3 \\ 6 \\ 3 } , \cv{ -3 \\ -7 \\ -4 } } \phantom{xx}&\text{ for } &\ker{( A+ I_3)^2 }\phantom{xxxx}\\ &\text{ and }&\\ \basis{\cv{ -2 \\ -6 \\ -3 } } \phantom{xx}&\text{ for } &\ker{A+2I_3 }\end{array}\] This leads to the answer
\[T = \matrix{3 & -3 & -2 \\ 6 & -7 & -6 \\ 3 & -4 & -3 \\ }\]
The choice of #T# is not unique.
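The column space computations above can be reproduced in sympy; the pivot columns returned by `columnspace` are exactly the columns retained after removing dependent ones:

```python
from sympy import Matrix, eye

A = Matrix([[1, -3, 4], [6, -9, 10], [3, -4, 4]])
print((A + 2*eye(3)).columnspace())      # two columns: a basis of ker((A + I_3)^2)
print(((A + eye(3))**2).columnspace())   # one column:  a basis of ker(A + 2*I_3)
```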
The matrix of #L_A# relative to the basis consisting of the columns of #T# is
\[ T^{-1}\, A\, T = \matrix{-1 & -{{1}\over{3}} & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -2 \\ }\]
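As a final check, this conjugation can be verified in sympy:

```python
from sympy import Matrix

A = Matrix([[1, -3, 4], [6, -9, 10], [3, -4, 4]])
T = Matrix([[3, -3, -2], [6, -7, -6], [3, -4, -3]])
print(T.inv()*A*T)   # Matrix([[-1, -1/3, 0], [0, -1, 0], [0, 0, -2]])
```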