
Coordinated Linear Algebra

Section 8.2 Similar and Diagonalizable Matrices

Let \(A\) and \(B\) be \(n \times n\) matrices. Then the products \(AB\) and \(BA\) are both \(n \times n\) matrices. In most cases the products \(AB\) and \(BA\) are not equal. However, for some pairs of \(n \times n\) matrices \(A\) and \(B\text{,}\) we are able to find an invertible matrix \(P\) such that \(PB = AP\text{.}\) This leads to the following definition.

Definition 8.2.1.

If \(A\) and \(B\) are \(n \times n\) matrices, we say that \(A\) and \(B\) are similar if \(B = P^{-1}AP\) for some invertible matrix \(P\text{.}\) In this case we write
\begin{equation*} A \sim B. \end{equation*}
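Although this text develops similarity by hand, a quick numerical experiment can make the definition concrete. The following is an optional sketch in Python with NumPy (not part of the formal development); the matrices \(A\) and \(P\) below are chosen purely for illustration.

import numpy as np

# Build a matrix B similar to A by conjugating with an invertible P (Definition 8.2.1).
A = np.array([[2.0, 1.0],
              [1.0, -1.0]])
P = np.array([[1.0, 2.0],
              [0.0, 1.0]])          # any invertible matrix will do; det(P) = 1
B = np.linalg.inv(P) @ A @ P        # B = P^{-1} A P, so A ~ B

# The defining relation can also be written as PB = AP.
print(np.allclose(P @ B, A @ P))    # True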
The following theorem shows that similarity (\(\sim\)) satisfies reflexive, symmetric, and transitive properties.

Theorem 8.2.2.

Let \(A\text{,}\) \(B\text{,}\) and \(C\) be \(n \times n\) matrices. Then the following hold.
  1. \(A \sim A\) (the reflexive property).
  2. If \(A \sim B\text{,}\) then \(B \sim A\) (the symmetric property).
  3. If \(A \sim B\) and \(B \sim C\text{,}\) then \(A \sim C\) (the transitive property).

Proof.

Item 1: It is clear that \(A\sim A\) (let \(P=I\)). Item 2: If \(A\sim B\text{,}\) then for some invertible matrix \(P\text{,}\)
\begin{equation*} B=P^{-1}AP \end{equation*}
and so
\begin{equation*} PBP^{-1}=A. \end{equation*}
But then
\begin{equation*} \left( P^{-1}\right) ^{-1}BP^{-1}=A, \end{equation*}
which shows that \(B\sim A\text{.}\) Item 3: Now suppose \(A\sim B\) and \(B\sim C\text{.}\) Then there exist invertible matrices \(P,Q\) such that
\begin{equation*} B=P^{-1}AP,\ C=Q^{-1}BQ. \end{equation*}
Then,
\begin{equation*} C=Q^{-1} \left( P^{-1}AP \right)Q=\left( PQ\right) ^{-1}A\left( PQ\right), \end{equation*}
showing that \(A\) is similar to \(C\text{.}\)
Any relation satisfying the reflexive, symmetric and transitive properties is called an equivalence relation. Theorem 8.2.2 proves that similarity between matrices is an equivalence relation. Exercise 8.2.2.1 gives a good example of a relation that is NOT an equivalence relation. As we will see later, similar matrices share many properties.
Before proceeding to explore these properties, we pause to introduce a simple matrix function that we will continue to use throughout the course: the trace of a matrix.

Definition 8.2.3.

The trace of an \(n \times n\) matrix \(A\text{,}\) abbreviated \(\mbox{tr} A\text{,}\) is defined to be the sum of the main diagonal elements of \(A\text{.}\) In other words, if \(A = [a_{ij}]\text{,}\) then
\begin{equation*} \mbox{tr}(A) = a_{11} + a_{22} + \dots + a_{nn}. \end{equation*}
We may also write \(\mbox{tr}(A) =\sum_{i=1}^n a_{ii}\text{.}\)

Remark 8.2.4.

It is easy to see that \(\mbox{tr}(A + B) = \mbox{tr} A + \mbox{tr} B\) and that \(\mbox{tr}(cA) = c \mbox{ tr}(A)\) holds for all \(n \times n\) matrices \(A\) and \(B\) and all scalars \(c\text{.}\) The following fact is more surprising.

Theorem 8.2.5.

Let \(A\) and \(B\) be \(n \times n\) matrices. Then
\begin{equation*} \mbox{tr}(AB) = \mbox{tr}(BA). \end{equation*}

Proof.

Write \(A = [a_{ij}]\) and \(B = [b_{ij}]\text{.}\) For each \(i\text{,}\) the \((i, i)\)-entry \(d_{i}\) of the matrix \(AB\) is given as follows:
\begin{equation*} d_{i} = a_{i1}b_{1i} + a_{i2}b_{2i} + \dots + a_{in}b_{ni} = \sum_{j}a_{ij}b_{ji}. \end{equation*}
Hence
\begin{equation*} \mbox{tr}(AB) = d_1 + d_2 + \dots + d_n = \sum_{i}d_i = \sum_{i}\left(\sum_{j}a_{ij}b_{ji}\right). \end{equation*}
Similarly we have
\begin{equation*} \mbox{tr}(BA) = \sum_{i}\left(\sum_{j}b_{ij}a_{ji}\right). \end{equation*}
Since both double sums equal \(\sum_{i,j}a_{ij}b_{ji}\) (simply interchange the names of the summation indices in the second), the two traces are equal, and the theorem is proved.
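For readers who like to check identities numerically, here is a small optional NumPy sketch of the trace identity just proved; the random matrices are used only for illustration.

import numpy as np

# tr(AB) = tr(BA) for square matrices A and B (Theorem 8.2.5).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

print(np.isclose(np.trace(A @ B), np.trace(B @ A)))   # True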
The following theorem lists a number of properties shared by similar matrices.

Theorem 8.2.6.

If \(A\) and \(B\) are similar \(n \times n\) matrices, then \(A\) and \(B\) have the same
  1. determinant,
  2. rank,
  3. trace,
  4. characteristic polynomial, and
  5. eigenvalues.

Proof.

Let \(B = P^{-1}AP\) for some invertible matrix \(P\text{.}\)
For Item 1, \(\det B = \det(P^{-1}) \det A \det P = \det A\) because \(\det(P^{-1}) = 1/ \det P\) (see Theorem 7.2.16).
Similarly, for Item 2, \(\mbox{rank} B = \mbox{rank}(P^{-1}AP) = \mbox{rank} A\text{,}\) because multiplication by an invertible matrix cannot change the rank. To see this, note that any invertible matrix is a product of elementary matrices. Multiplying by elementary matrices is equivalent to performing elementary row (column) operations on \(A\text{,}\) which does not change the row (column) space, nor the rank. It follows that similar matrices have the same rank.
For Item 3, we make use of Theorem 8.2.5:
\begin{equation*} \mbox{tr} (P^{-1}AP) = \mbox{tr}[P^{-1}(AP)] = \mbox{tr}[(AP)P^{-1}] = \mbox{tr} A. \end{equation*}
As for Item 4,
\begin{align*} \det(B-\lambda I) \amp = \det \{P^{-1}AP-\lambda(P^{-1}P)\} \\ \amp = \det \{ P^{-1}(A-\lambda I)P\} \\ \amp = \det (A-\lambda I), \end{align*}
so \(A\) and \(B\) have the same characteristic equation.
Finally, Item 4 implies Item 5 because the eigenvalues of a matrix are the roots of its characteristic polynomial.
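The shared quantities listed above are easy to confirm numerically for any particular similar pair. The following optional NumPy sketch builds \(B = P^{-1}AP\) from illustrative matrices \(A\) and \(P\) and compares the determinant, rank, trace, and eigenvalues.

import numpy as np

# A and P are chosen only for illustration; B = P^{-1} A P is similar to A.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
P = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.linalg.inv(P) @ A @ P

print(np.isclose(np.linalg.det(A), np.linalg.det(B)))        # same determinant
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B))  # same rank
print(np.isclose(np.trace(A), np.trace(B)))                  # same trace
print(np.allclose(np.sort(np.linalg.eigvals(A)),
                  np.sort(np.linalg.eigvals(B))))            # same eigenvalues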

Remark 8.2.7.

Sharing the five properties in Theorem 8.2.6 does not guarantee that two matrices are similar. The matrices
\begin{equation*} A = \begin{bmatrix} 1 \amp 1 \\ 0 \amp 1 \end{bmatrix} \quad \text{and} \quad I = \begin{bmatrix} 1 \amp 0 \\ 0 \amp 1 \end{bmatrix} \end{equation*}
have the same determinant, rank, trace, characteristic polynomial, and eigenvalues, but they are not similar, because \(P^{-1}IP = I \neq A\) for every invertible matrix \(P\text{.}\)
Even though the properties in Theorem 8.2.6 cannot be used to show two matrices are similar, these properties come in handy for showing that two matrices are NOT similar.
With all of this abstraction in hand, let us turn to a concrete example.

Example 8.2.8.

Are the matrices \(A = \begin{bmatrix} 2 \amp 1 \\ 1 \amp -1 \end{bmatrix}\) and \(B = \begin{bmatrix} 3 \amp 0 \\ 1 \amp -1 \end{bmatrix}\) similar?
Answer.
A quick check shows us \(\det A = \det B\text{,}\) and both matrices are seen to be invertible, so they have the same rank. However, \(\mbox{tr} A = 1\) and \(\mbox{tr} B = 2\text{,}\) so the matrices are not similar.
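The same comparison can be carried out numerically. This optional NumPy sketch reproduces the computation in Example 8.2.8.

import numpy as np

A = np.array([[2.0, 1.0], [1.0, -1.0]])
B = np.array([[3.0, 0.0], [1.0, -1.0]])

print(np.linalg.det(A), np.linalg.det(B))   # both approximately -3.0
print(np.trace(A), np.trace(B))             # 1.0 versus 2.0, so A and B are not similar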
The next theorem shows that similarity is preserved under inverses, transposes, and powers:

Theorem 8.2.9.

Suppose \(A\) and \(B\) are \(n \times n\) matrices with \(A \sim B\text{.}\) Then \(A^T \sim B^T\text{,}\) \(A^k \sim B^k\) for every positive integer \(k\text{,}\) and, if \(A\) is invertible, \(A^{-1} \sim B^{-1}\text{.}\)

Proof.

Write \(B = P^{-1}AP\text{.}\) Taking transposes gives
\begin{equation*} B^T = P^T A^T (P^T)^{-1} = Q^{-1}A^TQ, \quad \text{where } Q = (P^T)^{-1}, \end{equation*}
so \(A^T \sim B^T\text{.}\) Next,
\begin{equation*} B^k = (P^{-1}AP)(P^{-1}AP)\cdots(P^{-1}AP) = P^{-1}A^kP, \end{equation*}
so \(A^k \sim B^k\text{.}\) Finally, if \(A\) is invertible, then \(B\) is invertible as well (similar matrices have the same determinant), and
\begin{equation*} B^{-1} = (P^{-1}AP)^{-1} = P^{-1}A^{-1}P, \end{equation*}
so \(A^{-1} \sim B^{-1}\text{.}\)

Subsection 8.2.1 Diagonalizable Matrices and Multiplicity

Recall that a diagonal matrix \(D\) is a matrix containing a zero in every entry except those on the main diagonal. More precisely, if \(d_{ij}\) is the \(ij^{th}\) entry of a diagonal matrix \(D\text{,}\) then \(d_{ij}=0\) unless \(i=j\text{.}\) Such matrices look like the following.
\begin{equation*} D = \begin{bmatrix} * \amp \amp 0 \\ \amp \ddots \amp \\ 0 \amp \amp * \end{bmatrix} \end{equation*}
where \(*\) is a number which might not be zero. Diagonal matrices have some nice properties, as we demonstrate below.

Exploration 8.2.1.

Let us warm up with a small computation.
Problem 8.2.10.
Let
\begin{equation*} M =\begin{bmatrix}1 \amp 2 \amp 3\\ 4\amp 5\amp 6\\7\amp 8\amp 9\end{bmatrix} \quad \text{and} \quad D =\begin{bmatrix}2 \amp 0 \amp 0\\ 0\amp -5\amp 0\\0\amp 0\amp 10\end{bmatrix}. \end{equation*}
Compute \(MD\) and \(DM\text{.}\)
Answer.
\begin{equation*} MD = \begin{bmatrix} 2 \amp -10 \amp 30\\ 8\amp -25\amp 60\\14\amp -40\amp 90\end{bmatrix}, \quad DM= \begin{bmatrix} 2 \amp 4 \amp 6\\ -20\amp -25\amp -30\\70\amp 80\amp 90\end{bmatrix}. \end{equation*}
Notice the patterns present in the product matrices. Each row of \(DM\) is the same as its corresponding row of \(M\) multiplied by the scalar which is the corresponding diagonal element of \(D\text{.}\) In the product \(MD\text{,}\) it is the columns of \(M\) that have been multiplied by the diagonal elements. These patterns hold in general for any diagonal matrix, and they are fundamental to understanding diagonalization, the process we discuss below.
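The scaling pattern is easy to see in a short computation. Here is an optional NumPy sketch reproducing Problem 8.2.10.

import numpy as np

M = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
D = np.diag([2, -5, 10])

print(M @ D)   # each column of M is scaled by the corresponding diagonal entry of D
print(D @ M)   # each row of M is scaled by the corresponding diagonal entry of D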

Definition 8.2.11.

Let \(A\) be an \(n\times n\) matrix. Then \(A\) is said to be diagonalizable if there exists an invertible matrix \(P\) such that
\begin{equation*} P^{-1}AP=D, \end{equation*}
where \(D\) is a diagonal matrix. In other words, a matrix \(A\) is diagonalizable if it is similar to a diagonal matrix, \(A \sim D\text{.}\)
If we are given a matrix \(A\) that is diagonalizable, then we can write \(P^{-1}AP=D\) for some matrix \(P\text{,}\) or, equivalently,
\begin{equation} AP=PD \tag{8.2.1} \end{equation}
If we pause to examine (8.2.1), the work that we did in Exploration 8.2.1 can help us to understand how to find \(P\) that will diagonalize \(A\text{.}\) The product \(PD\) is formed by multiplying each column of \(P\) by a scalar which is the corresponding element on the diagonal of \(D\text{.}\) To restate this, if \(\mathbf{x}_i\) is column \(i\) in our matrix \(P\text{,}\) then (8.2.1) tells us that
\begin{equation} A \mathbf{x}_i = \lambda_i \mathbf{x}_i, \tag{8.2.2} \end{equation}
where \(\lambda_i\) is the \(i\)th diagonal element of \(D\text{.}\) Of course, (8.2.2) is very familiar! We see that if we are able to diagonalize a matrix \(A\text{,}\) the columns of matrix \(P\) will be the eigenvectors of \(A\text{,}\) and the corresponding diagonal entries of \(D\) will be the corresponding eigenvalues of \(A\text{.}\) This is summed up in the following theorem.

Theorem 8.2.12.

Let \(A\) be an \(n\times n\) matrix. Then \(A\) is diagonalizable if and only if there is an invertible matrix \(P\) whose columns are eigenvectors of \(A\text{.}\) In this case \(P^{-1}AP=D\text{,}\) where the diagonal entries of \(D\) are the eigenvalues of \(A\) corresponding to the columns of \(P\text{.}\)

Proof.

Suppose \(P\) is given as above as an invertible matrix whose columns are eigenvectors of \(A\text{.}\) To show that \(A\) is diagonalizable, we will show
\begin{equation*} AP=PD, \end{equation*}
which is equivalent to \(P^{-1}AP=D\text{.}\) We have
\begin{equation*} AP=\begin{bmatrix} | \amp | \amp \amp | \\ A\mathbf{x}_1 \amp A\mathbf{x}_2 \amp \cdots \amp A\mathbf{x}_n \\ | \amp | \amp \amp | \end{bmatrix}, \end{equation*}
while
\begin{align*} PD \amp =\begin{bmatrix} | \amp | \amp \amp | \\ \mathbf{x}_1 \amp \mathbf{x}_2 \amp \cdots \amp \mathbf{x}_n \\ | \amp | \amp \amp | \end{bmatrix} \begin{bmatrix} \lambda _{1} \amp \amp 0 \\ \amp \ddots \amp \\ 0 \amp \amp \lambda _{n} \end{bmatrix} \\ \amp =\begin{bmatrix} | \amp | \amp \amp | \\ \lambda _{1}\mathbf{x}_1 \amp \lambda _{2}\mathbf{x}_2 \amp \cdots \amp \lambda_{n}\mathbf{x}_n \\ | \amp | \amp \amp | \end{bmatrix}. \end{align*}
We can complete this half of the proof by comparing columns, and noting that
\begin{equation} A \mathbf{x}_i = \lambda_i \mathbf{x}_i \tag{8.2.3} \end{equation}
for \(i=1,\ldots,n\) since the \(\mathbf{x}_i\) are eigenvectors of \(A\) and the \(\lambda_i\) are corresponding eigenvalues of \(A\text{.}\)
Conversely, suppose \(A\) is diagonalizable so that \(P^{-1}AP=D.\) Let
\begin{equation*} P=\begin{bmatrix} | \amp | \amp \amp | \\ \mathbf{x}_1 \amp \mathbf{x}_2 \amp \cdots \amp \mathbf{x}_n \\ | \amp | \amp \amp | \end{bmatrix} \end{equation*}
where the columns are the vectors \(\mathbf{x}_i\) and
\begin{equation*} D=\begin{bmatrix} \lambda _{1} \amp \amp 0 \\ \amp \ddots \amp \\ 0 \amp \amp \lambda _{n} \end{bmatrix}. \end{equation*}
Then
\begin{equation*} AP=PD=\begin{bmatrix} | \amp | \amp \amp | \\ \mathbf{x}_1 \amp \mathbf{x}_2 \amp \cdots \amp \mathbf{x}_n \\ | \amp | \amp \amp | \end{bmatrix} \begin{bmatrix} \lambda _{1} \amp \amp 0 \\ \amp \ddots \amp \\ 0 \amp \amp \lambda _{n} \end{bmatrix} \end{equation*}
and so
\begin{equation*} \begin{bmatrix} | \amp | \amp \amp | \\ A\mathbf{x}_1 \amp A\mathbf{x}_2 \amp \cdots \amp A\mathbf{x}_n \\ | \amp | \amp \amp | \end{bmatrix} =\begin{bmatrix} | \amp | \amp \amp | \\ \lambda _{1}\mathbf{x}_1 \amp \lambda _{2}\mathbf{x}_2 \amp \cdots \amp \lambda_{n}\mathbf{x}_n \\ | \amp | \amp \amp | \end{bmatrix}, \end{equation*}
showing the \(\mathbf{x}_i\) are eigenvectors of \(A\) and the \(\lambda _{i}\) are eigenvalues.
Notice that because the matrix \(P\) defined above is invertible, it follows that the set of eigenvectors of \(A\text{,}\) \(\left\{ \mathbf{x}_1 , \mathbf{x}_2 , \ldots, \mathbf{x}_n \right\}\text{,}\) is a basis of \(\mathbb{R}^n\text{.}\)
We demonstrate the concept given in the above theorem in the next example. Note that not only are the columns of the matrix \(P\) formed by eigenvectors, but \(P\) must be invertible, and therefore must consist of a linearly independent set of eigenvectors.

Example 8.2.13.

Let
\begin{equation*} A=\begin{bmatrix} 2 \amp 0 \amp 0 \\ 1 \amp 4 \amp -1 \\ -2 \amp -4 \amp 4 \end{bmatrix} \end{equation*}
Find an invertible matrix \(P\) and a diagonal matrix \(D\) such that \(P^{-1}AP=D\text{.}\)
Answer.
We will use eigenvectors of \(A\) as the columns of \(P\text{,}\) and the corresponding eigenvalues of \(A\) as the diagonal entries of \(D\text{.}\) The eigenvalues of \(A\) are \(\lambda_1 =2,\lambda_2 = 2\text{,}\) and \(\lambda_3 = 6\text{.}\) We leave these computations as exercises, as well as the computations to find a basis for each eigenspace. One possible basis for \(\mathcal{S}_2\text{,}\) the eigenspace corresponding to \(2\text{,}\) is
\begin{equation*} \left \lbrace \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \right \rbrace, \end{equation*}
while a basis for \(\mathcal{S}_6\) is given by
\begin{equation*} \left \lbrace \begin{bmatrix} 0 \\ 1 \\ -2 \end{bmatrix}\right \rbrace \text{.} \end{equation*}
We construct the matrix \(P\) by using these basis elements as columns.
\begin{equation*} P=\begin{bmatrix} -2 \amp 1 \amp 0 \\ 1 \amp 0 \amp 1 \\ 0 \amp 1 \amp -2 \end{bmatrix} \end{equation*}
You can verify (and will do so in the exercises) that
\begin{equation*} P^{-1}=\begin{bmatrix} -1/4 \amp 1/2 \amp 1/4 \\ 1/2 \amp 1 \amp 1/2 \\ 1/4 \amp 1/2 \amp -1/4 \end{bmatrix} \end{equation*}
Thus,
\begin{align*} P^{-1}AP \amp = \begin{bmatrix} -1/4 \amp 1/2 \amp 1/4 \\ 1/2 \amp 1 \amp 1/2 \\ 1/4 \amp 1/2 \amp -1/4 \end{bmatrix} \begin{bmatrix} 2 \amp 0 \amp 0 \\ 1 \amp 4 \amp -1 \\ -2 \amp -4 \amp 4 \end{bmatrix} \begin{bmatrix} -2 \amp 1 \amp 0 \\ 1 \amp 0 \amp 1 \\ 0 \amp 1 \amp -2 \end{bmatrix} \\ \amp =\begin{bmatrix} 2 \amp 0 \amp 0 \\ 0 \amp 2 \amp 0 \\ 0 \amp 0 \amp 6 \end{bmatrix} \end{align*}
You can see that the result here is a diagonal matrix where the entries on the main diagonal are the eigenvalues of \(A\text{.}\) Notice that eigenvalues on the main diagonal must be in the same order as the corresponding eigenvectors in \(P\text{.}\)
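If you would like to check this example with software, the following optional NumPy sketch verifies both \(P^{-1}AP = D\) and the inverse-free form \(AP = PD\text{.}\)

import numpy as np

A = np.array([[2.0, 0.0, 0.0],
              [1.0, 4.0, -1.0],
              [-2.0, -4.0, 4.0]])
P = np.array([[-2.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, -2.0]])
D = np.diag([2.0, 2.0, 6.0])

print(np.allclose(np.linalg.inv(P) @ A @ P, D))   # True
print(np.allclose(A @ P, P @ D))                  # True, no inverse needed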
It is often easier to work with matrices that are diagonalizable, as the next Exploration demonstrates.

Exploration 8.2.2.

Let
\begin{equation*} A=\begin{bmatrix} 2 \amp 0 \amp 0 \\ 1 \amp 4 \amp -1 \\ -2 \amp -4 \amp 4 \end{bmatrix} \quad \text{and} \quad D=\begin{bmatrix} 2 \amp 0 \amp 0 \\ 0 \amp 2 \amp 0 \\ 0 \amp 0 \amp 6 \end{bmatrix}. \end{equation*}
Would it be easier to compute \(A^5\) or \(D^5\) if you had to do so by hand, without a computer? Certainly \(D^5\) is easier, due to the number of zero entries!
Problem 8.2.14.
Compute \(A^5\) and \(D^5\text{.}\) Feel free to use a program or online calculator for \(A^5\text{,}\) but compute \(D^5\) by hand.
We see that raising a diagonal matrix to a power amounts to simply raising each entry to that same power, whereas computing \(A^5\) requires many more calculations. However, we learned in Example 8.2.13 that \(A\) is similar to \(D\text{,}\) and we can use this to make our computation easier. This is because
\begin{align*} A^5\amp =\left(PDP^{-1}\right)^5 \\ \amp =(PDP^{-1})(PDP^{-1})(PDP^{-1})(PDP^{-1})(PDP^{-1}) \\ \amp =PD(P^{-1}P)D(P^{-1}P)D(P^{-1}P)D(P^{-1}P)DP^{-1} \\ \amp =PD(I)D(I)D(I)D(I)DP^{-1} \\ \amp =PD^5P^{-1}. \end{align*}
With this in mind, it is not as daunting to calculate \(A^5\) by hand. We can compute the product \(PD^5\) quite easily since \(D^5\) is diagonal, as we learned in Exploration 8.2.1. That leaves just one product of \(3 \times 3\) matrices to compute by hand to find \(A^5\text{.}\) And the savings in work would certainly be more pronounced for larger matrices or for powers larger than 5.
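To see the savings concretely, here is an optional NumPy sketch that computes \(A^5\) directly and via \(PD^5P^{-1}\text{,}\) using the matrices from Example 8.2.13.

import numpy as np

A = np.array([[2.0, 0.0, 0.0],
              [1.0, 4.0, -1.0],
              [-2.0, -4.0, 4.0]])
P = np.array([[-2.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, -2.0]])
D = np.diag([2.0, 2.0, 6.0])

direct = np.linalg.matrix_power(A, 5)
D5 = D ** 5                                   # entrywise powers suffice for a diagonal matrix
via_diagonalization = P @ D5 @ np.linalg.inv(P)
print(np.allclose(direct, via_diagonalization))   # True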
In Exploration 8.2.2, because matrix \(A\) was diagonalizable, we were able to cut down on computations. When we chose to work with \(D\) and \(P\) instead of \(A\) we worked with the eigenvalues and eigenvectors of \(A\text{.}\) Each column of \(P\) is an eigenvector of \(A\text{,}\) and so we repeatedly made use of the following theorem (with \(m=5\)).

Theorem 8.2.15.

Let \(A\) be an \(n \times n\) matrix and suppose \(A\mathbf{x} = \lambda \mathbf{x}\) for some vector \(\mathbf{x}\) and scalar \(\lambda\text{.}\) Then
\begin{equation*} A^m \mathbf{x} = \lambda^m \mathbf{x} \end{equation*}
for every positive integer \(m\text{.}\)

Proof.

We prove this theorem by induction on \(m\text{.}\) Clearly \(A^m \mathbf{x} = \lambda^m \mathbf{x}\) holds when \(m=1\text{,}\) as that was given. For the inductive step, suppose that we know \(A^{m-1} \mathbf{x} = \lambda^{m-1} \mathbf{x}\text{.}\) Then
\begin{align*} A^m \mathbf{x} \amp = (A A^{m-1}) \mathbf{x} \\ \amp = A (A^{m-1} \mathbf{x}) \\ \amp = A (\lambda^{m-1} \mathbf{x}) \\ \amp = \lambda^{m-1} A\mathbf{x} \\ \amp = \lambda^{m-1} \lambda \mathbf{x} \\ \amp = \lambda^m \mathbf{x}, \end{align*}
as desired.
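A quick numerical check of this theorem, using the eigenvector for \(\lambda = 6\) from Example 8.2.13, might look like the following optional NumPy sketch.

import numpy as np

A = np.array([[2.0, 0.0, 0.0],
              [1.0, 4.0, -1.0],
              [-2.0, -4.0, 4.0]])
x = np.array([0.0, 1.0, -2.0])   # eigenvector of A for lambda = 6
lam = 6.0

print(np.allclose(np.linalg.matrix_power(A, 5) @ x, lam**5 * x))   # True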
Matrix \(A\) from Example 8.2.13 and Exploration 8.2.2 had a repeated eigenvalue of 2. The next theorem and corollary show that matrices which have distinct eigenvalues (where none are repeated) have desirable properties.

Theorem 8.2.16.

Let \(A\) be an \(n \times n\) matrix, and suppose \(\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m\) are eigenvectors of \(A\) corresponding to distinct eigenvalues \(\lambda_1, \lambda_2, \dots, \lambda_m\text{.}\) Then \(\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m\}\) is a linearly independent set.

Proof.

We prove this by induction on \(m\text{,}\) the number of vectors in the set. If \(m = 1\text{,}\) then \(\{\mathbf{x}_{1}\}\) is a linearly independent set because \(\mathbf{x}_{1} \neq \mathbf{0}\text{.}\) In general, suppose we have established that the theorem is true for some \(m \geq 1\text{.}\) Given eigenvectors \(\{\mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{m+1}\}\text{,}\) suppose
\begin{equation} c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + \dots + c_{m+1}\mathbf{x}_{m+1} = \mathbf{0}.\tag{8.2.4} \end{equation}
We must show that each \(c_{i} = 0\text{.}\) Multiply both sides of (8.2.4) on the left by \(A\) and use the fact that \(A\mathbf{x}_{i} = \lambda_{i}\mathbf{x}_{i}\) to get
\begin{equation} c_1\lambda_1\mathbf{x}_1 + c_2\lambda_2\mathbf{x}_2 + \dots + c_{m+1}\lambda_{m+1}\mathbf{x}_{m+1} = \mathbf{0}.\tag{8.2.5} \end{equation}
If we multiply (8.2.4) by \(\lambda_{1}\) and subtract the result from (8.2.5), the first terms cancel and we obtain
\begin{equation*} c_2(\lambda_2 - \lambda_1)\mathbf{x}_2 + c_3(\lambda_3 - \lambda_1)\mathbf{x}_3 + \dots + c_{m+1}(\lambda_{m+1} - \lambda_1)\mathbf{x}_{m+1} = \mathbf{0}. \end{equation*}
Since \(\mathbf{x}_{2}, \mathbf{x}_{3}, \dots, \mathbf{x}_{m+1}\) correspond to distinct eigenvalues \(\lambda_{2}, \lambda_{3}, \dots, \lambda_{m+1}\text{,}\) the set \(\{\mathbf{x}_{2}, \mathbf{x}_{3}, \dots, \mathbf{x}_{m+1}\}\) is linearly independent by the induction hypothesis. Hence,
\begin{equation*} c_2(\lambda_2 - \lambda_1) = 0, \quad c_3(\lambda_3 - \lambda_1) = 0, \quad \dots, \quad c_{m+1}(\lambda_{m+1} - \lambda_1) = 0 \end{equation*}
and so \(c_{2} = c_{3} = \dots = c_{m+1} = 0\) because the \(\lambda_{i}\) are distinct. It follows that (8.2.4) becomes \(c_{1}\mathbf{x}_{1} = \mathbf{0}\text{,}\) which implies that \(c_{1} = 0\) because \(\mathbf{x}_{1} \neq \mathbf{0}\text{,}\) and the proof is complete.
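Numerically, linear independence of eigenvectors can be checked by computing the rank of the matrix whose columns are those eigenvectors. The following optional NumPy sketch does this for an illustrative matrix with distinct eigenvalues.

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])              # eigenvalues (5 +/- sqrt(33))/2 are distinct
eigenvalues, eigenvectors = np.linalg.eig(A)

print(eigenvalues)
print(np.linalg.matrix_rank(eigenvectors))   # 2: the eigenvectors are linearly independent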
The corollary that follows from this theorem gives a useful tool in determining if \(A\) is diagonalizable.

Corollary 8.2.17.

If \(A\) is an \(n \times n\) matrix with \(n\) distinct eigenvalues, then \(A\) is diagonalizable.

Remark 8.2.18.

Note that Corollary 8.2.17 is NOT an ``if and only if'' statement. This means that if \(A\) has repeated eigenvalues it is still sometimes possible to diagonalize \(A\text{,}\) as seen in Example 8.2.13.

Definition 8.2.19.

If we are able to diagonalize \(A\text{,}\) say \(A=PDP^{-1}\text{,}\) we say that \(PDP^{-1}\) is an eigenvalue decomposition of \(A\text{.}\)
Not every matrix has an eigenvalue decomposition. Sometimes we cannot find an invertible matrix \(P\) such that \(P^{-1}AP=D\text{.}\) Consider the following example.

Example 8.2.20.

Let
\begin{equation*} A = \begin{bmatrix} 1 \amp 1 \\ 0 \amp 1 \end{bmatrix}. \end{equation*}
If possible, find an invertible matrix \(P\) and a diagonal matrix \(D\) so that \(P^{-1}AP=D\text{.}\)
Answer.
We see immediately (how?) that the eigenvalues of \(A\) are \(\lambda_1 =1\) and \(\lambda_2=1\text{.}\) To find \(P\text{,}\) the next step would be to find a basis for the corresponding eigenspace \(\mathcal{S}_1\text{.}\) We solve the equation \(\left( A - \lambda I \right) \mathbf{x} = \mathbf{0}\text{.}\) Writing this equation as an augmented matrix and row reducing, we obtain
\begin{equation*} \left[\begin{array}{cc|c} 0 \amp 1 \amp 0 \\ 0 \amp 0 \amp 0 \end{array}\right]. \end{equation*}
We see that the eigenvectors in \(\mathcal{S}_1\) are of the form
\begin{equation*} \begin{bmatrix} t \\ 0 \end{bmatrix} =t\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \end{equation*}
so a basis for the eigenspace \(\mathcal{S}_1\) is given by \(\{[1,0]\}\text{.}\) It is easy to see that we cannot form an invertible matrix \(P\text{,}\) because any two eigenvectors will be of the form \([t,0]\text{,}\) and so the second row of \(P\) would be a row of zeros, and \(P\) could not be invertible. Hence \(A\) cannot be diagonalized.
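We can quantify the failure in this example by computing the dimension of the eigenspace directly: the rank of \(A - \lambda I\) reveals how many independent eigenvectors are available. Here is an optional NumPy sketch.

import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
lam = 1.0

# dimension of the eigenspace = n - rank(A - lambda I)
geometric_multiplicity = 2 - np.linalg.matrix_rank(A - lam * np.eye(2))
print(geometric_multiplicity)   # 1, so there is no basis of R^2 consisting of eigenvectors of A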
We saw earlier in Corollary 8.2.17 that an \(n \times n\) matrix with \(n\) distinct eigenvalues is diagonalizable. It turns out that there are other useful diagonalizability tests.
Recall that the algebraic multiplicity of an eigenvalue \(\lambda\) is the number of times that it occurs as a root of the characteristic polynomial.

Definition 8.2.21.

The geometric multiplicity of an eigenvalue \(\lambda\) is the dimension of the corresponding eigenspace \(\mathcal{S}_\lambda\text{.}\)
Consider now the following lemma.

Lemma 8.2.22.

If \(\lambda_1\) is an eigenvalue of an \(n \times n\) matrix \(A\text{,}\) then the geometric multiplicity of \(\lambda_1\) is less than or equal to the algebraic multiplicity of \(\lambda_1\text{.}\)

Proof.

Let \(k\) be the geometric multiplicity of \(\lambda_1\text{,}\) i.e., \(k=\mbox{dim}(\mathcal{S}_{\lambda_1})\text{.}\) Suppose \(\left\{\mathbf{x}_1, \mathbf{x}_2, \ldots ,\mathbf{x}_k\right\}\) is a basis for the eigenspace \(\mathcal{S}_{\lambda_1}\text{.}\) Let \(P\) be any invertible matrix having \(\mathbf{x}_1, \mathbf{x}_2, \ldots ,\mathbf{x}_k\) as its first \(k\) columns, say
\begin{equation*} P=\begin{bmatrix} | \amp | \amp \amp | \amp | \amp \amp | \\ \mathbf{x}_1 \amp \mathbf{x}_2 \amp \cdots \amp \mathbf{x}_k \amp \mathbf{x}_{k+1} \amp \cdots \amp \mathbf{x}_n \\ | \amp | \amp \amp | \amp | \amp \amp | \end{bmatrix}. \end{equation*}
In block form we may write
\begin{equation*} P=\begin{bmatrix} B\amp C \end{bmatrix} \quad \text{and} \quad P^{-1}=\begin{bmatrix} D \\ E \end{bmatrix}, \end{equation*}
where \(B\) is \(n \times k\text{,}\) \(C\) is \(n \times (n-k)\text{,}\) \(D\) is \(k \times n\text{,}\) and \(E\) is \((n-k) \times n\text{.}\) We observe
\begin{equation*} I_n = P^{-1}P = \left[\begin{array}{c|c} DB \amp DC \\ \hline EB \amp EC \end{array}\right]. \end{equation*}
This implies
\begin{equation*} DB = I_k,\quad DC=O_{k\times (n-k)},\quad EB = O_{(n-k)\times k}, \quad\text{ and }\quad EC = I_{n-k}. \end{equation*}
Therefore,
\begin{align*} P^{-1}AP \amp =\begin{bmatrix} D \\ E \end{bmatrix} A \begin{bmatrix} B\amp C \end{bmatrix} = \left[\begin{array}{c|c} DAB \amp DAC \\ \hline EAB \amp EAC \end{array}\right ] \\ \amp = \left[\begin{array}{c|c} \lambda_1 DB \amp DAC \\ \hline \lambda_1 EB \amp EAC \end{array}\right] = \left[\begin{array}{c|c} \lambda_1 I_k \amp DAC \\ \hline O \amp EAC \end{array}\right]. \end{align*}
We finish the proof by comparing the characteristic polynomials on both sides of this equation, and making use of the fact that similar matrices have the same characteristic polynomials.
\begin{equation*} \det(A-\lambda I) = \det(P^{-1}AP-\lambda I)=(\lambda_1 - \lambda)^k \det(EAC - \lambda I_{n-k}). \end{equation*}
We see that the characteristic polynomial of \(A\) has \((\lambda_1 - \lambda)^k\) as a factor. This tells us that the algebraic multiplicity of \(\lambda_1\) is at least \(k\text{,}\) proving the desired inequality.
This result tells us that if \(\lambda\) is an eigenvalue of \(A\text{,}\) then the number of linearly independent \(\lambda\)-eigenvectors is never more than the algebraic multiplicity of \(\lambda\text{.}\) We now use this fact to provide a useful diagonalizability condition.

Theorem 8.2.23.

An \(n \times n\) matrix \(A\) is diagonalizable if and only if, for every eigenvalue \(\lambda\) of \(A\text{,}\) the geometric multiplicity of \(\lambda\) is equal to the algebraic multiplicity of \(\lambda\text{.}\)

Proof.

Suppose \(A\) is diagonalizable and let \(\lambda_1, \ldots, \lambda_t\) be the distinct eigenvalues of \(A\text{,}\) with algebraic multiplicities \(m_1, \ldots, m_t\text{,}\) respectively and geometric multiplicities \(k_1, \ldots, k_t\text{,}\) respectively. Since \(A\) is diagonalizable, Theorem 8.2.12 implies that \(k_1+\cdots+k_t=n\text{.}\) By applying Lemma 8.2.22 \(t\) times, we have
\begin{equation*} n = k_1+\cdots+k_t \le m_1+\cdots+m_t = n, \end{equation*}
which is only possible if \(k_i=m_i\) for \(i=1,\ldots,t\text{.}\) Conversely, if the geometric multiplicity equals the algebraic multiplicity of each eigenvalue, then obtaining a basis for each eigenspace yields \(n\) eigenvectors. Applying Theorem 8.2.16, we know that these \(n\) eigenvectors are linearly independent, so Theorem 8.2.12 implies that \(A\) is diagonalizable.
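The multiplicity test above can be carried out numerically. The following optional NumPy sketch checks the matrix from Example 8.2.13; it assumes the eigenvalues are integers, so they are rounded before counting algebraic multiplicities.

import numpy as np
from collections import Counter

A = np.array([[2.0, 0.0, 0.0],
              [1.0, 4.0, -1.0],
              [-2.0, -4.0, 4.0]])
n = A.shape[0]

# Algebraic multiplicities: how often each (rounded) eigenvalue occurs.
algebraic = Counter(int(np.round(v.real)) for v in np.linalg.eigvals(A))

# Geometric multiplicity of lambda is n - rank(A - lambda I); compare with the algebraic multiplicity.
diagonalizable = all(
    n - np.linalg.matrix_rank(A - lam * np.eye(n)) == mult
    for lam, mult in algebraic.items()
)
print(dict(algebraic))    # e.g. {2: 2, 6: 1}
print(diagonalizable)     # True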

Exercises 8.2.2 Exercises

1.

At the beginning of this section we mentioned that similarity of \(n \times n\) matrices is an equivalence relation. An equivalence relation is a binary relation \(R\) on elements of a set \(S\) that has the following properties:
  • The reflexive property: \(x R x\) for every \(x \in S\)
  • The symmetric property: If \(x R y\text{,}\) then \(y R x\) for every \(x,y \in S\)
  • The transitive property: If \(x R y\) and \(y R z\text{,}\) then \(x R z\) for every \(x,y,z \in S\)
Let \(S\) be the set of all positive integers. We can show that the relation ``less than'' (symbolized by \(\lt \)) is NOT an equivalence relation on this set. To see this, note that ``less than'' is not reflexive, because \(s\lt s\) is not true for any positive integer \(s\text{.}\)
  1. Is the relation ``less than'' symmetric?
  2. Is the relation ``less than'' transitive?
Answer.
The relation ``less than'' is not symmetric, but it is transitive.

2.

Another relation between matrices we have studied in this course is that two matrices can be ``row equivalent''. Is the relation ``row equivalent''
  1. reflexive?
  2. symmetric?
  3. transitive?
Answer.
The relation is reflexive, symmetric and transitive.

Exercise Group.

By computing the trace, determinant, and rank, show that \(A\) and \(B\) are not similar in each case.
3.
\begin{equation*} A = \begin{bmatrix} 1 \amp 2 \\ 2 \amp 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 \amp 1\\ -1 \amp 1 \end{bmatrix}. \end{equation*}
4.
\begin{equation*} A = \begin{bmatrix} 3 \amp 1 \\ 2 \amp -1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 \amp 1 \\ 2 \amp 1 \end{bmatrix}. \end{equation*}
5.
\begin{equation*} A = \begin{bmatrix} 3 \amp 1 \\ -1 \amp 2 \end{bmatrix}, \quad B = \begin{bmatrix} 2 \amp -1 \\ 3 \amp 2 \end{bmatrix} \end{equation*}
6.
\begin{equation*} A = \begin{bmatrix} 2 \amp 1 \amp 1 \\ 1 \amp 0 \amp 1 \\ 1 \amp 1 \amp 0 \end{bmatrix}, \quad B = \begin{bmatrix} 1 \amp -2 \amp 1 \\ -2 \amp 4 \amp -2 \\ -3 \amp 6 \amp -3 \end{bmatrix}. \end{equation*}
7.
\begin{equation*} A = \begin{bmatrix} 1 \amp 2 \amp -3 \\ 1 \amp -1 \amp 2 \\ 0 \amp 3 \amp -5 \end{bmatrix}, \quad B = \begin{bmatrix} -2 \amp 1 \amp 3 \\ 6 \amp -3 \amp -9 \\ 0 \amp 0 \amp 0 \end{bmatrix}. \end{equation*}

8.

Show that
\begin{equation*} \begin{bmatrix} 1 \amp 2 \amp -1 \amp 0 \\ 2 \amp 0 \amp 1 \amp 1 \\ 1 \amp 1 \amp 0 \amp -1 \\ 4 \amp 3 \amp 0 \amp 0 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 \amp -1 \amp 3 \amp 0 \\ -1 \amp 0 \amp 1 \amp 1 \\ 0 \amp -1 \amp 4 \amp 1 \\ 5 \amp -1 \amp -1 \amp -4 \end{bmatrix} \end{equation*}
are not similar.

10.

If \(A\) is invertible, show that \(AB\) is similar to \(BA\) for all \(B\text{.}\)

11.

Show that the only matrix similar to a scalar matrix \(A = rI\text{,}\) where \(r\) is any number, is \(A\) itself.

12.

Let \(\lambda\) be an eigenvalue of \(A\) with corresponding eigenvector \(\mathbf{x}\text{.}\) If \(B = P^{-1}AP\) is similar to \(A\text{,}\) show that \(P^{-1}\mathbf{x}\) is an eigenvector of \(B\) corresponding to \(\lambda\text{.}\)

Exercise Group.

In this exercise you will ``fill in the details'' of Example 8.2.13.
13.
Find the eigenvalues of matrix
\begin{equation*} A=\begin{bmatrix} 2 \amp 0 \amp 0 \\ 1 \amp 4 \amp -1 \\ -2 \amp -4 \amp 4 \end{bmatrix} \end{equation*}
14.
Find a basis for each eigenspace of the matrix \(A\text{.}\)
15.
Compute the inverse of
\begin{equation*} P= \begin{bmatrix} -2 \amp 1 \amp 0 \\ 1 \amp 0 \amp 1 \\ 0 \amp 1 \amp -2 \end{bmatrix} \end{equation*}
16.
Compute \(D=P^{-1}AP\text{.}\)
17.
Show that computing the inverse of \(P\) is not really necessary by comparing the products \(PD\) and \(AP\text{.}\)

18.

In each case, decide whether the matrix \(A\) is diagonalizable. If so, find \(P\) such that \(P^{-1}AP\) is diagonal.
  1. \begin{equation*} \begin{bmatrix} 1 \amp 0 \amp 0 \\ 1 \amp 2 \amp 1 \\ 0 \amp 0 \amp 1 \end{bmatrix}, \end{equation*}
  2. \begin{equation*} \begin{bmatrix} 3 \amp 0 \amp 6 \\ 0 \amp -3 \amp 0 \\ 5 \amp 0 \amp 2 \end{bmatrix}, \end{equation*}
  3. \begin{equation*} \begin{bmatrix} 3 \amp 1 \amp 6 \\ 2 \amp 1 \amp 0 \\ -1 \amp 0 \amp -3 \end{bmatrix}, \end{equation*}
  4. \begin{equation*} \begin{bmatrix} 4 \amp 0 \amp 0 \\ 0 \amp 2 \amp 2 \\ 2 \amp 3 \amp 1 \end{bmatrix}. \end{equation*}
Answer.
  1. Diagonalizable.
  2. Diagonalizable.
  3. Diagonalizable.
  4. Not diagonalizable.

19.

Let \(A\) denote an \(n \times n\) upper triangular matrix.
  1. If all the main diagonal entries of \(A\) are distinct, show that \(A\) is diagonalizable.
  2. If all the main diagonal entries of \(A\) are equal, show that \(A\) is diagonalizable only if it is already diagonal.
  3. Show that
    \begin{equation*} \begin{bmatrix} 1 \amp 0 \amp 1 \\ 0 \amp 1 \amp 0 \\ 0 \amp 0 \amp 2 \end{bmatrix} \end{equation*}
    is diagonalizable while
    \begin{equation*} \begin{bmatrix} 1 \amp 1 \amp 0 \\ 0 \amp 1 \amp 0 \\ 0 \amp 0 \amp 2 \end{bmatrix} \end{equation*}
    is not diagonalizable.
Answer.
For part (b): The eigenvalues of \(A\) are all equal (they are the diagonal elements), so if \(P^{-1}AP = D\) is diagonal, then \(D = \lambda I\text{.}\) Hence \(A = P(\lambda I)P^{-1} = \lambda I\text{.}\)

20.

Let \(A\) be a diagonalizable \(n \times n\) matrix with eigenvalues \(\lambda_{1}, \lambda_{2}, \dots, \lambda_{n}\) (including multiplicities). Show that:
  1. \(\displaystyle \det A = \lambda_{1}\lambda_{2}\cdots \lambda_{n}\)
  2. \(\displaystyle \mbox{tr} A = \lambda_{1} + \lambda_{2} + \cdots + \lambda_{n}\)

21.

  1. Show that two diagonalizable matrices are similar if and only if they have the same eigenvalues with the same multiplicities.
  2. If \(A\) is diagonalizable, show that \(A \sim A^{T}\text{.}\)
  3. Show that \(A \sim A^{T}\) if
    \begin{equation*} A = \begin{bmatrix} 1 \amp 1 \\ 0 \amp 1 \end{bmatrix} \end{equation*}