Exploration 4.6.1.
Consider the equation:
\begin{equation*}
\begin{bmatrix}1\amp 0\amp 0\\3\amp 1\amp 0\\-2\amp 2\amp 1\end{bmatrix}\begin{bmatrix}2\amp -1\amp 2\\0\amp 4\amp -1\\0\amp 0\amp 3\end{bmatrix}\mathbf{x}=\begin{bmatrix}1\\8\\5\end{bmatrix}.
\end{equation*}
This equation is unusual in that it involves two matrices on the left-hand side. If we multiply the matrices together, we get
\begin{equation*}
\begin{bmatrix}2\amp -1\amp 2\\6\amp 1\amp 5\\-4\amp 10\amp -3\end{bmatrix}\mathbf{x}=\begin{bmatrix}1\\8\\5\end{bmatrix}.
\end{equation*}
Gaussian elimination yields:
\begin{equation*}
\left[\begin{array}{ccc|c}
2\amp -1\amp 2\amp 1\\6\amp 1\amp 5\amp 8\\-4\amp 10\amp -3\amp 5
\end{array}\right]
\quad\rightsquigarrow\quad
\left[\begin{array}{ccc|c}
1\amp 0\amp 0\amp 2\\0\amp 1\amp 0\amp 1\\0\amp 0\amp 1\amp -1
\end{array}\right].
\end{equation*}
We conclude that
\begin{equation*}
\mathbf{x}=\begin{bmatrix}2\\1\\-1\end{bmatrix}.
\end{equation*}
Now that we know the answer, we will turn to the original question and consider the advantages of the format of the original problem. Observe that the two matrices have a distinct pattern. The matrix on the left has zeros above the main diagonal, while the matrix on the right has zeros below the main diagonal. Matrices that follow such a pattern are called lower triangular and upper triangular matrices, respectively.
Let
\begin{equation*}
\mathbf{y}=\begin{bmatrix}2\amp -1\amp 2\\0\amp 4\amp -1\\0\amp 0\amp 3\end{bmatrix}\mathbf{x}.
\end{equation*}
Then we can write the original equation as
\begin{equation*}
\begin{bmatrix}1\amp 0\amp 0\\3\amp 1\amp 0\\-2\amp 2\amp 1\end{bmatrix}\left(\begin{bmatrix}2\amp -1\amp 2\\0\amp 4\amp -1\\0\amp 0\amp 3\end{bmatrix}\mathbf{x}\right)=\begin{bmatrix}1\amp 0\amp 0\\3\amp 1\amp 0\\-2\amp 2\amp 1\end{bmatrix}\mathbf{y}=\begin{bmatrix}1\\8\\5\end{bmatrix}.
\end{equation*}
The equation
\begin{equation*}
\begin{bmatrix}1\amp 0\amp 0\\3\amp 1\amp 0\\-2\amp 2\amp 1\end{bmatrix}\mathbf{y}=\begin{bmatrix}1\amp 0\amp 0\\3\amp 1\amp 0\\-2\amp 2\amp 1\end{bmatrix}\begin{bmatrix}y_1\\y_2\\y_3\end{bmatrix}=\begin{bmatrix}1\\8\\5\end{bmatrix}.
\end{equation*}
corresponds to the system
\begin{equation*}
\begin{array}{ccccccc}
y_1 \amp + \amp \amp \amp\amp = \amp 5 \\
3y_1 \amp + \amp y_2 \amp \amp \amp =\amp 8\\
-2y_1 \amp + \amp 2y_2 \amp +\amp y_3 \amp =\amp 11
\end{array}.
\end{equation*}
This system can be easily solved by substitution, giving us
\begin{equation*}
y_1=1,
\end{equation*}
\begin{equation*}
3(1)+y_2=8 \rightsquigarrow y_2=5,
\end{equation*}
\begin{equation*}
-2(1)+2(5)+y_3=5 \rightsquigarrow y_3=-3.
\end{equation*}
So,
\begin{equation*}
\mathbf{y}=\begin{bmatrix}y_1\\y_2\\y_3\end{bmatrix}=\begin{bmatrix}1\\5\\-3\end{bmatrix}.
\end{equation*}
Recall that we let
\begin{equation*}
\mathbf{y}=\begin{bmatrix}2\amp -1\amp 2\\0\amp 4\amp -1\\0\amp 0\amp 3\end{bmatrix}\mathbf{x}.-
\end{equation*}
The equation
\begin{equation*}
\begin{bmatrix}2\amp -1\amp 2\\0\amp 4\amp -1\\0\amp 0\amp 3\end{bmatrix}\mathbf{x}=\begin{bmatrix}1\\5\\-3\end{bmatrix}
\end{equation*}
corresponds to the system
\begin{equation*}
\begin{array}{ccccccc}
2x_1 \amp - \amp x_2 \amp + \amp 2x_3 \amp = \amp 1 \\
\amp \amp 4x_2 \amp - \amp x_3 \amp =\amp 5\\
\amp \amp \amp \amp 3x_3 \amp =\amp -3
\end{array}.
\end{equation*}
Problem 4.6.1.
Solve the above system of equations in any way you wish.
Answer.
\begin{equation*}
x_3=-1, \quad x_2 = 1, \quad x_1 = 1.
\end{equation*}
Thus, \(\mathbf{x}=[2\\1\\-1]\text{.}\) Observe that this is the same answer that we obtained in the beginning of the problem using Gaussian elimination.
Given a matrix equation such as
\begin{equation*}
\begin{bmatrix}2\amp -1\amp 2\\6\amp 1\amp 5\\-4\amp 10\amp -3\end{bmatrix}\mathbf{x}=\begin{bmatrix}1\\8\\5\end{bmatrix}
\end{equation*}
of
Exploration 4.6.1, often in practice we express the matrix on the left as a product of an upper triangular matrix
\(U\) and a lower triangular matrix
\(L\) and use substitution to solve the equation instead of using Gaussian elimination to find the solution. In
Exploration 4.6.1,
\begin{equation*}
\begin{bmatrix}2\amp -1\amp 2\\6\amp 1\amp 5\\-4\amp 10\amp -3\end{bmatrix}=LU=\begin{bmatrix}1\amp 0\amp 0\\3\amp 1\amp 0\\-2\amp 2\amp 1\end{bmatrix}\begin{bmatrix}2\amp -1\amp 2\\0\amp 4\amp -1\\0\amp 0\amp 3\end{bmatrix}
\end{equation*}
The process of taking a matrix \(A\) and expressing it as a product \(A=LU\) of a lower triangular matrix \(L\) and an upper triangular matrix \(U\) is called \(LU\) factorization.
We here present one method of finding LU factorizations using inspection.
Subsection 4.6.1 Finding an \(LU\) factorization by the Multiplier Method
In the following example we demonstrate a method for computing an \(LU\) factorization known as the multiplier method.
Example 4.6.7.
Find an \(LU\) factorization for
\begin{equation*}
\begin{bmatrix}
1 \amp 2 \amp 3 \\
2 \amp 3 \amp 1 \\
-2 \amp 3 \amp -2
\end{bmatrix}.
\end{equation*}
Answer.
Write the matrix as the following product.
\begin{equation*}
\begin{bmatrix}
1 \amp 0 \amp 0 \\
0 \amp 1 \amp 0 \\
0 \amp 0 \amp 1
\end{bmatrix}
\begin{bmatrix}
1 \amp 2 \amp 3 \\
2 \amp 3 \amp 1 \\
-2 \amp 3 \amp -2
\end{bmatrix}.
\end{equation*}
In the matrix on the right, we begin with the left row and zero out the entries below the top using the row operation which involves adding a multiple of a row to another row. You do this and also update the matrix on the left so that the product will be unchanged.
Here is the first step. We take \(-2\) times the top row and add to the second. Then take \(2\) times the top row and add to the second in the matrix on the left. This gives us
\begin{equation*}
\begin{bmatrix}
1 \amp 0 \amp 0 \\
2 \amp 1 \amp 0 \\
0 \amp 0 \amp 1
\end{bmatrix}
\begin{bmatrix}
1 \amp 2 \amp 3 \\
0 \amp -1 \amp -5 \\
-2 \amp 3 \amp -2
\end{bmatrix}.
\end{equation*}
The next step is to take \(2\) times the top row and add to the bottom in the matrix on the right. To ensure that the product is unchanged, we place a \(-2\) in the bottom left corner in the matrix on the left. Thus the second step yields
\begin{equation*}
\begin{bmatrix}
1 \amp 0 \amp 0 \\
2 \amp 1 \amp 0 \\
-2 \amp 0 \amp 1
\end{bmatrix}
\begin{bmatrix}
1 \amp 2 \amp 3 \\
0 \amp -1 \amp -5 \\
0 \amp 7 \amp 4
\end{bmatrix}.
\end{equation*}
For our final step, we take \(7\) times the middle row on right and add to bottom row. Updating the matrix on the left in the manner we did earlier, we get
\begin{equation*}
\begin{bmatrix}
1 \amp 0 \amp 0 \\
2 \amp 1 \amp 0 \\
-2 \amp -7 \amp 1
\end{bmatrix}
\begin{bmatrix}
1 \amp 2 \amp 3 \\
0 \amp -1 \amp -5 \\
0 \amp 0 \amp -31
\end{bmatrix}.
\end{equation*}
At this point, we can stop. We have an \(LU\) factorization of the original matrix. We can always multiply the matrices to check, if we wish.
Notice that when we perform the multiplier method, we are making repeated use of a certain type of elementary row operation, namely, we are adding a scalar multiple of one row to a row below it. The reason this helps to create an \(LU\) factorization depends upon the fact that the elementary matrices corresponding to such operations are lower triangular. To understand how this works, we begin with a lemma.
Lemma 4.6.8.
Let \(L\) be a lower (upper) triangular matrix \(m\times m\) which has ones down the main diagonal. Then \(L^{-1}\) also is a lower (upper) triangular matrix which has ones down the main diagonal. In the case that \(L\) is of the form
\begin{equation}
L=
\begin{bmatrix}
1 \amp \amp \amp \\
a_{1} \amp 1 \amp \amp \\
\vdots \amp \amp \ddots \amp \\
a_{n} \amp \amp \amp 1
\end{bmatrix}, \tag{4.6.2}
\end{equation}
where all entries are zero except for the left column and main diagonal, it is also the case that \(L^{-1}\) is obtained from \(L\) by simply multiplying each entry below the main diagonal in \(L\) by \(-1\text{.}\) The same is true if the single nonzero column is in another position.
Proof.
Consider the usual setup for finding the inverse
\(\begin{bmatrix}L \ |\ I\end{bmatrix}\text{.}\) Each row operation used on
\(L\) to transform this matrix to reduced row echelon form changes only the entries in
\(I\) below the main diagonal. Whether we have the special case of
\(L\) given in
(4.6.2) where the nonzero nondiagonal entries are in the left column, or if the single nonzero column is in another position, it is clear that multiplication by
\(-1\) as described in the lemma gives us
\(L^{-1}\text{.}\)
For a simple illustration of the lemma, observe:
\begin{equation*}
\left[\begin{array}{ccc|ccc}
1 \amp 0 \amp 0 \amp 1 \amp 0 \amp 0 \\
0 \amp 1 \amp 0 \amp 0 \amp 1 \amp 0 \\
0 \amp a \amp 1 \amp 0 \amp 0 \amp 1
\end{array}\right]
\rightsquigarrow
\left[\begin{array}{ccc|ccc}
1 \amp 0 \amp 0 \amp 1 \amp 0 \amp 0 \\
0 \amp 1 \amp 0 \amp 0 \amp 1 \amp 0 \\
0 \amp 0 \amp 1 \amp 0 \amp -a \amp 1
\end{array}\right].
\end{equation*}
Now let \(A\) be an \(m\times n\) matrix, say
\begin{equation*}
A=
\begin{bmatrix}
a_{11} \amp a_{12} \amp \cdots \amp a_{1n} \\
a_{21} \amp a_{22} \amp \cdots \amp a_{2n} \\
\vdots \amp \vdots \amp \amp \vdots \\
a_{m1} \amp a_{m2} \amp \cdots \amp a_{mn}
\end{bmatrix},
\end{equation*}
and assume \(A\) can be row reduced to an upper triangular form using only row operation 3. Thus, in particular, \(a_{11}\neq 0\text{.}\) Multiply on the left by
\begin{equation*}
E_{1}=
\begin{bmatrix}
1 \amp 0 \amp \cdots \amp 0 \\
-
\frac{a_{21}}{a_{11}} \amp 1 \amp \cdots \amp 0 \\
\vdots \amp \vdots \amp \ddots \amp \vdots \\
-\frac{a_{m1}}{a_{11}} \amp 0 \amp \cdots \amp 1
\end{bmatrix}.
\end{equation*}
This is the product of elementary matrices which make modifications in the first column only. It is equivalent to taking \(-a_{21}/a_{11}\) times the first row and adding to the second. Then taking \(-a_{31}/a_{11}\) times the first row and adding to the third and so forth. The quotients in the first column of the above matrix are the multipliers. Thus the result is of the form
\begin{equation*}
E_{1}A=
\begin{bmatrix}
a_{11} \amp a_{12} \amp \cdots \amp a_{1n}^{\prime } \\
0 \amp a_{22}^{\prime } \amp \cdots \amp a_{2n}^{\prime } \\
\vdots \amp \vdots \amp \amp \vdots \\
0 \amp a_{m2}^{\prime } \amp \cdots \amp a_{mn}^{\prime }
\end{bmatrix}.
\end{equation*}
By assumption, \(a_{22}^{\prime }\neq 0\) and so it is possible to use this entry to zero out all the entries below it in the matrix on the right by multiplication by a matrix of the form
\begin{equation*}
E_{2}=
\begin{bmatrix}
1 \amp \mathbf{0} \\
\mathbf{0} \amp E
\end{bmatrix},
\end{equation*}
where \(E\) is an \(( m-1)\times ( m-1)\) matrix of the form
\begin{equation*}
E=\begin{bmatrix}
1 \amp 0 \amp \cdots \amp 0 \\
-\frac{a_{32}^{\prime }}{a_{22}^{\prime }} \amp 1 \amp \cdots \amp 0 \\
\vdots \amp \vdots \amp \ddots \amp \vdots \\
-\frac{a_{m2}^{\prime }}{a_{22}^{\prime }} \amp 0 \amp \cdots \amp 1
\end{bmatrix}.
\end{equation*}
Again, the entries in the first column below the 1 are the multipliers. Continuing this way, zeroing out the entries below the diagonal entries, finally leads us to
\begin{equation*}
E_{m-1}E_{m-2}\cdots E_{1}A=U,
\end{equation*}
where \(U\) is upper triangular. Each \(E_{j}\) has all ones down the main diagonal and is lower triangular. Now we can multiply both sides by the inverses of the \(E_{j}\) in the reverse order. This yields
\begin{equation*}
A=E_{1}^{-1}E_{2}^{-1}\cdots E_{m-1}^{-1}U.
\end{equation*}
By
Lemma 4.6.8, this implies that the product of those
\(E_{j}^{-1}\) is a lower triangular matrix having all ones down the main diagonal.
The above discussion and lemma explain how the multiplier method works. The expressions
\begin{equation*}
-\frac{a_{21}}{a_{11}},-\frac{a_{31}}{a_{11}},\cdots, -\frac{a_{m1}}{a_{11}},
\end{equation*}
which we obtained in building \(E_{1}\text{,}\) and which we denote respectively by \(M_{21},\cdots ,M_{m1}\) to save notation, are the multipliers. \index{multipliers} Therefore, according to the lemma, to find \(E_{1}^{-1}\) we simply write
\begin{equation*}
\begin{bmatrix}
1 \amp 0 \amp \cdots \amp 0 \\
-M_{21} \amp 1 \amp \cdots \amp 0 \\
\vdots \amp \vdots \amp \ddots \amp \vdots \\
-M_{m1} \amp 0 \amp \cdots \amp 1
\end{bmatrix}.
\end{equation*}
Similar considerations apply to the other \(E_{j}^{-1}.\) Thus \(L\) is a product of the form
\begin{equation*}
\begin{bmatrix}
1 \amp 0 \amp \cdots \amp 0 \\
-M_{21} \amp 1 \amp \cdots \amp 0 \\
\vdots \amp \vdots \amp \ddots \amp \vdots \\
-M_{m1} \amp 0 \amp \cdots \amp 1
\end{bmatrix}
\cdots
\begin{bmatrix}
1 \amp 0 \amp \cdots \amp 0 \\
0 \amp 1 \amp \cdots \amp 0 \\
\vdots \amp 0 \amp \ddots \amp \vdots \\
0 \amp \cdots \amp -M_{m(m-1)} \amp 1
\end{bmatrix},
\end{equation*}
where each factor has at most one nonzero column, the position of which moves from left to right as we scan the above product of matrices from left to right. It follows from what we know about the effect of multiplying on the left by an elementary matrix that the above product is of the form
\begin{equation*}
\begin{bmatrix}
1 \amp 0 \amp \cdots \amp 0 \amp 0 \\
-M_{21} \amp 1 \amp \cdots \amp 0 \amp 0 \\
\vdots \amp -M_{32} \amp \ddots \amp \vdots \amp \vdots \\
-M_{(M-1)1} \amp \vdots \amp \cdots \amp 1 \amp 0 \\
-M_{M1} \amp -M_{M2} \amp \cdots \amp -M_{MM-1} \amp 1
\end{bmatrix}.
\end{equation*}
To sum up the procedure, beginning at the left column and moving toward the right, you simply insert, into the corresponding position in the identity matrix, \(-1\) times the multiplier which was used to zero out an entry in that position below the main diagonal in \(A,\) while retaining the main diagonal which consists entirely of ones. This will give us \(L\) as we create \(U\) using row operation 3.
We now return to
Example 4.6.2, to understand why we could not use the multiplier method to find an
\(LU\) factorization of the coefficient matrix. Suppose we had written
\begin{equation*}
\begin{bmatrix}1\amp 0\amp 0\\0\amp 1\amp 0\\0\amp 0\amp 1\end{bmatrix}\begin{bmatrix}0\amp 12\amp -18\\1\amp 2\amp -3\\1\amp -2\amp 1\end{bmatrix}.
\end{equation*}
Our first step of Gaussian elimination would require a row swap to get a nonzero entry into the top left corner (we swapped rows 1 and 2 in
Example 4.6.2). Unfortunately, the elementary matrix that accomplishes this,
\begin{equation*}
P=\begin{bmatrix}0\amp 1\amp 0\\1\amp 0\amp 0\\0\amp 0\amp 1\end{bmatrix} \quad \text{is NOT lower triangular.}
\end{equation*}
So we cannot apply Lemma
Lemma 4.6.8 to generate a lower triangular
\(L\text{.}\)
If the elementary matrices used to reduce our matrix to row-echelon form are all lower triangular, then we can find an \(LU\) factorization. But what about the general case?
Theorem 4.6.9.
Suppose an \(m \times n\) matrix \(A\) is transformed to a row-echelon matrix \(U\) using Gaussian elimination. Let \(P_1, P_2, \ldots, P_s\) be the elementary matrices corresponding (in order) to the row interchanges used, and write \(P=P_s \cdots P_2 P_1\text{.}\) (If no interchanges are used take \(P = I_M\text{.}\)) Then:
\(PA\) is the matrix obtained from \(A\) by doing these interchanges (in order) to A.
\(PA\) has an \(LU\) factorization.
For a proof of the above theorem, the reader is referred to [Nicholson].
The matrix \(P\) in the above theorem is called a permutation matrix. These matrices have other important applications, as we will see later.