
Coordinated Linear Algebra

Section 10.2 Gram-Schmidt Orthogonalization

We said that a set \(\{ \mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_m\}\) of nonzero vectors in \(\R^n\) is called an orthogonal set if \(\mathbf{f}_i \cdot \mathbf{f}_j =0\) for all \(i \neq j\text{.}\) In this section we will prove that every orthogonal set is linearly independent, and is therefore a basis for its span. We have already seen that the expansion of a vector as a linear combination of orthogonal basis vectors is easy to obtain because formulas exist for the coefficients. Hence orthogonal bases are the "nice" bases.
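For readers who like to experiment, here is a minimal sketch in Python with NumPy of how those coefficient formulas recover a vector from an orthogonal basis; the orthogonal set and the vector \(\mathbf{x}\) are arbitrary choices for illustration.

import numpy as np

# An orthogonal basis of R^2 (check: f1 . f2 = 1 - 1 = 0)
f1 = np.array([1.0, 1.0])
f2 = np.array([1.0, -1.0])
x = np.array([3.0, 5.0])

# Coefficient formula for an orthogonal basis: c_i = (x . f_i) / ||f_i||^2
c1 = np.dot(x, f1) / np.dot(f1, f1)   # 8 / 2 = 4
c2 = np.dot(x, f2) / np.dot(f2, f2)   # -2 / 2 = -1

print(c1 * f1 + c2 * f2)              # [3. 5.] -- recovers x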

Subsection 10.2.1 A Visual Guide to Creating an Orthogonal Basis

Given an arbitrary basis \(\{\mathbf{v}_1, \mathbf{v}_2\}\) of \(\R^2\text{,}\) let’s start building our orthogonal basis, \(\{\mathbf{f}_1, \mathbf{f}_2\}\text{,}\) by setting \(\mathbf{f}_1 = \mathbf{v}_1\text{.}\) To find the next element of our orthogonal basis, consider the orthogonal projection of \(\mathbf{v}_2\) onto \(\mathbf{f}_1\text{.}\) (See the figure below.)
[Figure: the orthogonal projection of \(\mathbf{v}_2\) onto \(\mathbf{f}_1\text{,}\) and the vector \(\mathbf{f}_2\) constructed from it.]
Next, let \(\mathbf{f}_2=\mathbf{v}_2-\mbox{proj}_{\mathbf{f}_1}\mathbf{v}_2\text{.}\) Observe that \(\mathbf{f}_2\) is orthogonal to \(\mathbf{f}_1\) (see Theorem 10.1.15). This gives us an orthogonal collection \(\mathcal{B}=\{\mathbf{f}_1,\mathbf{f}_2\}\text{.}\) It is intuitively clear that \(\mathbf{f}_1\) and \(\mathbf{f}_2\) are linearly independent. Therefore \(\mathcal{B}\) is an orthogonal basis of \(\R^2\text{.}\) The following exploration illustrates this process dynamically.
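This two-step construction can also be carried out numerically. The following is a minimal NumPy sketch, with an arbitrarily chosen basis of \(\R^2\text{:}\)

import numpy as np

v1 = np.array([2.0, 1.0])   # arbitrary first basis vector
v2 = np.array([1.0, 2.0])   # arbitrary second basis vector

f1 = v1
# proj_{f1}(v2) = ((v2 . f1) / ||f1||^2) f1
f2 = v2 - (np.dot(v2, f1) / np.dot(f1, f1)) * f1

print(np.dot(f1, f2))       # 0.0 (up to round-off): f1 and f2 are orthogonal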

Exploration 10.2.1.

Choose an arbitrary basis \(\{\mathbf{v}_1, \mathbf{v}_2\}\) of \(\R^2\) by dragging the tips of vectors \(\mathbf{v}_1\) and \(\mathbf{v}_2\) to desired positions. Use the navigation bar at the bottom of the interactive window to go through the steps of constructing an orthogonal basis of \(\R^2\text{.}\)
Figure 10.2.1.
We can apply this process to any two-dimensional subspace of \(\R^n\text{.}\) The following exploration will guide you through the process of constructing an orthogonal basis for a plane spanned by two arbitrary vectors in \(\R^3\text{.}\)

Exploration 10.2.2.

Let \(W =\mbox{span}\left(\mathbf{v}_1,\mathbf{v}_2\right)\text{.}\) \(W\) is a plane through the origin in \(\R^3\text{.}\) Use the navigation bar at the bottom of the interactive window to go through the steps of constructing an orthogonal basis for \(W\text{.}\) RIGHT-CLICK and DRAG to rotate the image for a better view.
Figure 10.2.2.
In the next exploration, we take the process of constructing an orthogonal basis to the edge of the visual realm and construct an orthogonal basis for \(\R^3\text{.}\)

Exploration 10.2.3.

In the interactive below, \(\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}\) is a basis of \(\R^3\text{.}\) Use the check boxes to go through the steps for constructing an orthogonal basis starting with the given basis. RIGHT-CLICK and DRAG to rotate the image for a better view.
Figure 10.2.3.

Subsection 10.2.2 Gram-Schmidt Orthogonalization Algorithm

In the first sections of this chapter, we have repeatedly assumed that our subspaces of \(\R^n\) have an orthogonal basis. We will now prove that this is indeed the case. Recall that to be a basis of a subspace, a set of vectors must be linearly independent and must span the subspace. We will start by demonstrating that an orthogonal set of nonzero vectors must be linearly independent.

Theorem 10.2.4.

Let \(\{ \mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_k \}\) be an orthogonal set of nonzero vectors in \(\R^n\text{.}\) Then this set is linearly independent.

Proof.

To show that this set is linearly independent, we need to demonstrate that the only solution to the following equation is the trivial solution. So suppose
\begin{equation*} a_1 \mathbf{w}_1 + a_2 \mathbf{w}_2 + \cdots + a_k \mathbf{w}_k = \mathbf{0}. \end{equation*}
To accomplish this, we need to show that \(a_i = 0\) for all \(1\leq i\leq k\text{.}\) To do so, we take the dot product of each side of the above equation with the vector \(\mathbf{w}_i\) and obtain the following.
\begin{align*} \mathbf{w}_i \cdot (a_1 \mathbf{w}_1 + a_2 \mathbf{w}_2 + \cdots + a_k \mathbf{w}_k ) \amp = \mathbf{w}_i \cdot \mathbf{0} \\ a_1 (\mathbf{w}_i \cdot \mathbf{w}_1) + a_2 (\mathbf{w}_i \cdot \mathbf{w}_2) + \cdots + a_k (\mathbf{w}_i \cdot \mathbf{w}_k) \amp = 0 \end{align*}
Now since the set is orthogonal, \(\mathbf{w}_i \cdot \mathbf{w}_m = 0\) for all \(m \neq i\text{,}\) so we have:
\begin{equation*} a_1 (0) + \cdots + a_i(\mathbf{w}_i \cdot \mathbf{w}_i) + \cdots + a_k (0) = 0, \end{equation*}
\begin{equation*} a_i \norm{\mathbf{w}_i}^2 = 0. \end{equation*}
Since \(\mathbf{w}_i \neq \mathbf{0}\text{,}\) we know that \(\norm{\mathbf{w}_i}^2 \neq 0\text{,}\) so it follows that \(a_i =0\text{.}\) Since \(i\) was chosen arbitrarily, \(a_i =0\) for all \(i\) \((1\leq i\leq k)\text{.}\) This proves that \(\{ \mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_k \}\) is linearly independent.
The following theorem shows how to start with an arbitrary basis of a subspace \(W\) of \(\R^n\) and find an orthogonal basis for \(W\text{.}\) To better understand the notation and the process presented in this theorem, you may want to match the steps of the theorem to the steps of Exploration 10.2.3.

Theorem 10.2.5. Gram-Schmidt Orthogonalization Algorithm.

Let \(W\) be a subspace of \(\R^n\) with basis \(\{\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{m}\}\text{.}\) For \(1\leq k\leq m\text{,}\) let \(W_k=\mbox{span}\{\mathbf{v}_{1},\ldots,\mathbf{v}_{k}\}\) and define
\begin{equation*} \mathbf{f}_{1} = \mathbf{v}_{1}, \quad\quad \mathbf{f}_{k} = \mathbf{v}_{k} - \mbox{proj}_{W_{k-1}}(\mathbf{v}_{k}) \quad (2\leq k\leq m). \end{equation*}
Then \(\{\mathbf{f}_{1},\mathbf{f}_{2},\ldots,\mathbf{f}_{m}\}\) is an orthogonal basis of \(W\text{.}\)

Proof.

Using the definition of projection onto a subspace, the iterative procedure above may be written:
\begin{equation} \begin{array}{ccl} \mathbf{f}_{1} \amp =\amp \mathbf{v}_{1} \\ \mathbf{f}_{2} \amp =\amp \mathbf{v}_{2} - \frac{\mathbf{v}_{2} \cdot \mathbf{f}_{1}}{\norm{\mathbf{f}_{1}}^2}\mathbf{f}_{1} \\ \mathbf{f}_{3} \amp =\amp \mathbf{v}_{3} - \frac{\mathbf{v}_{3} \cdot \mathbf{f}_{1}}{\norm{\mathbf{f}_{1}}^2}\mathbf{f}_{1} - \frac{\mathbf{v}_{3} \cdot \mathbf{f}_{2}}{\norm{\mathbf{f}_{2}}^2}\mathbf{f}_{2} \\ \vdots \amp \amp \\ \mathbf{f}_{k} \amp =\amp \mathbf{v}_{k} - \frac{\mathbf{v}_{k} \cdot \mathbf{f}_{1}}{\norm{\mathbf{f}_{1}}^2}\mathbf{f}_{1} - \frac{\mathbf{v}_{k} \cdot \mathbf{f}_{2}}{\norm{\mathbf{f}_{2}}^2}\mathbf{f}_{2} - \dots -\frac{\mathbf{v}_{k} \cdot \mathbf{f}_{k-1}}{\norm{\mathbf{f}_{k-1}}^2}\mathbf{f}_{k-1} \\ \vdots \amp \amp \end{array}\tag{10.2.1} \end{equation}
We see immediately that \(\mbox{span}\{\mathbf{f}_{1}\}=W_1\) and that \(\mbox{span}\{\mathbf{f}_{1},\mathbf{f}_{2}\}=W_2\text{,}\) because \(\mathbf{f}_{2}\) is a linear combination of \(\mathbf{v}_{1}\) and \(\mathbf{v}_{2}\text{.}\) In fact, for any value of \(k\text{,}\) we see that \(\mbox{span}\{\mathbf{f}_{1},\mathbf{f}_{2},\ldots,\mathbf{f}_{k}\}=W_k\text{,}\) because each \(\mathbf{f}_{k}\) is a linear combination of the vectors \(\{\mathbf{v}_{1},\mathbf{v}_{2},\ldots,\mathbf{v}_{k}\}\text{.}\)
Repeated application of Corollary 10.1.16 shows that the set \(\{\mathbf{f}_{1},\mathbf{f}_{2},\ldots,\mathbf{f}_{m}\}\) is orthogonal. Linear independence follows from orthogonality by Theorem 10.2.4.
We conclude that \(\{\mathbf{f}_{1},\mathbf{f}_{2},\ldots,\mathbf{f}_{m}\}\) is a linearly independent orthogonal set that spans \(W\text{.}\)
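The iteration (10.2.1) translates directly into code. Here is a minimal sketch in Python with NumPy; the helper name gram_schmidt is our own, and the function assumes its input vectors are linearly independent.

import numpy as np

def gram_schmidt(vectors):
    """Return an orthogonal basis for the span of `vectors`
    (assumed linearly independent), following (10.2.1)."""
    fs = []
    for v in vectors:
        f = np.array(v, dtype=float)
        # Subtract the projection of v onto each previously built f_i.
        for g in fs:
            f -= (np.dot(v, g) / np.dot(g, g)) * g
        fs.append(f)
    return fs

# The rows of the matrix A from Example 10.2.6 below:
for f in gram_schmidt([[1, 1, -1, -1], [3, 2, 0, 1], [1, 0, 1, 0]]):
    print(f)
# [ 1.  1. -1. -1.]
# [ 2.  1.  1.  2.]
# [ 0.4 -0.3  0.7 -0.6]   i.e. (1/10)[4, -3, 7, -6]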
Knowing the Gram-Schmidt procedure in detail is important, even if the computations are slightly tedious, so we work through an example in detail.

Example 10.2.6.

Find an orthogonal basis of the row space of
\begin{equation*} A = \begin{bmatrix} 1 \amp 1 \amp -1 \amp -1\\ 3 \amp 2 \amp 0 \amp 1\\ 1 \amp 0 \amp 1 \amp 0 \end{bmatrix}. \end{equation*}
Answer.
Let \(\mathbf{v}_{1}\text{,}\) \(\mathbf{v}_{2}\text{,}\) \(\mathbf{v}_{3}\) denote the rows of \(A\) and observe that \(\{\mathbf{v}_{1}, \mathbf{v}_{2}, \mathbf{v}_{3}\}\) is linearly independent. Take \(\mathbf{f}_{1} = \mathbf{v}_{1}\text{.}\) The algorithm gives
\begin{align*} \mathbf{f}_{2} \amp = \mathbf{v}_{2} - \frac{\mathbf{v}_{2} \cdot \mathbf{f}_{1}}{\norm{\mathbf{f}_{1}}^2}\mathbf{f}_{1} = [3, 2, 0, 1] - \frac{4}{4}[1, 1, -1, -1] = [2, 1, 1, 2] \\ \mathbf{f}_{3} \amp = \mathbf{v}_{3} - \frac{\mathbf{v}_{3} \cdot \mathbf{f}_{1}}{\norm{\mathbf{f}_{1}}^2}\mathbf{f}_{1} - \frac{\mathbf{v}_{3} \cdot \mathbf{f}_{2}}{\norm{\mathbf{f}_{2}}^2}\mathbf{f}_{2} = \mathbf{v}_{3} - \frac{0}{4}\mathbf{f}_{1} - \frac{3}{10}\mathbf{f}_{2} = \frac{1}{10}[4, -3, 7, -6]. \end{align*}
Hence
\begin{equation*} \{[1, 1, -1, -1], [2, 1, 1, 2], \frac{1}{10}[4, -3, 7, -6]\} \end{equation*}
is the orthogonal basis provided by the algorithm. In hand calculations it may be convenient to eliminate fractions (see the Remark below), so
\begin{equation*} \{[1, 1, -1, -1], [2, 1, 1, 2], [4, -3, 7, -6]\} \end{equation*}
is also an orthogonal basis for \(\mbox{row}(A)\text{.}\)
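As a quick numerical check of this answer, the pairwise dot products of the fraction-free basis vectors all vanish; a minimal NumPy sketch:

import numpy as np

f1 = np.array([1, 1, -1, -1])
f2 = np.array([2, 1, 1, 2])
f3 = np.array([4, -3, 7, -6])

# Each pair of distinct basis vectors should have dot product 0.
print(np.dot(f1, f2), np.dot(f1, f3), np.dot(f2, f3))   # 0 0 0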

Remark 10.2.7.

Observe that the vector \(\frac{\mathbf{x} \cdot \mathbf{f}_{i}}{\norm{\mathbf{f}_{i}}^2}\mathbf{f}_{i}\) is unchanged if a nonzero scalar multiple of \(\mathbf{f}_{i}\) is used in place of \(\mathbf{f}_{i}\text{.}\) Hence, if a newly constructed \(\mathbf{f}_{i}\) is multiplied by a nonzero scalar at some stage of the Gram-Schmidt algorithm, the subsequent \(\mathbf{f}\)s will be unchanged. This is useful in actual calculations.
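This invariance is easy to confirm numerically. In the sketch below (vectors borrowed from Example 10.2.6 for illustration), replacing \(\mathbf{f}_{i}\) by \(5\mathbf{f}_{i}\) leaves the projection term unchanged:

import numpy as np

x = np.array([3.0, 2.0, 0.0, 1.0])
f = np.array([1.0, 1.0, -1.0, -1.0])

proj        = (np.dot(x, f) / np.dot(f, f)) * f
proj_scaled = (np.dot(x, 5 * f) / np.dot(5 * f, 5 * f)) * (5 * f)

print(np.allclose(proj, proj_scaled))   # True: the projection term is unchanged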
The Gram-Schmidt algorithm demonstrates in a constructive way that every subspace of \(\R^n\) has an orthogonal basis. We formalize this in one final theorem.

Theorem 10.2.8.

Every subspace \(W\) of \(\R^n\) has an orthogonal basis.

Proof.

Suppose \(\{\mathbf{f}_{1}, \dots , \mathbf{f}_{m}\}\) is an orthogonal subset of \(W\text{.}\) If
\begin{equation*} \mbox{span}\left(\mathbf{f}_{1}, \dots , \mathbf{f}_{m}\right) = W, \end{equation*}
it is already a basis. Otherwise, there exists \(\mathbf{x}\) in \(W\) outside \(\mbox{span}\left(\mathbf{f}_{1}, \dots , \mathbf{f}_{m}\right)\text{.}\)
Using the Gram-Schmidt procedure we define
\begin{equation*} \mathbf{f}_{m+1} = \mathbf{x} - \mbox{proj}_{W_{m}}(\mathbf{x}), \end{equation*}
where \(W_m = \mbox{span}\{\mathbf{f}_{1},\mathbf{f}_{2},\ldots,\mathbf{f}_{m}\}\text{.}\) Note that \(\mathbf{f}_{m+1}\) is orthogonal to each of \(\mathbf{f}_{1}, \dots, \mathbf{f}_{m}\text{,}\) and \(\mathbf{f}_{m+1} \neq \mathbf{0}\) because \(\mathbf{x}\) is not in \(W_m\text{.}\) If \(\mbox{span}\left(\mathbf{f}_{1}, \dots, \mathbf{f}_{m}, \mathbf{f}_{m+1}\right) = W\text{,}\) we are done. Otherwise, the process continues to create larger and larger orthogonal subsets of \(W\text{.}\) They are linearly independent by Theorem 10.2.4, so we have a basis when we reach a subset containing \(\mbox{dim}(W)\) vectors.
The process described in the proof of this theorem is used in this final example.

Example 10.2.9.

In Example 10.2.6, given
\begin{equation*} A = \begin{bmatrix} 1 \amp 1 \amp -1 \amp -1\\ 3 \amp 2 \amp 0 \amp 1\\ 1 \amp 0 \amp 1 \amp 0 \end{bmatrix}, \end{equation*}
we showed that an orthogonal basis for \(\mbox{row}(A)\) is
\begin{equation*} \lbrace \mathbf{f}_1=[1, 1, -1, -1], \mathbf{f}_2=[2, 1, 1, 2], \mathbf{f}_3=[4, -3, 7, -6] \rbrace. \end{equation*}
Choose any vector \(\mathbf{v}_4 \in \R^4\) not in \(\mbox{span}\{\mathbf{f}_1, \mathbf{f}_2, \mathbf{f}_3\}\text{,}\) and apply the Gram-Schmidt algorithm to produce a vector \(\mathbf{f}_4\) such that \(\{\mathbf{f}_1, \mathbf{f}_2, \mathbf{f}_3, \mathbf{f}_4\}\) is an orthogonal basis for \(\R^4\text{.}\)
Answer.
Let \(\mathbf{v}_4 = [1, 0, 0, 0]\) (quick mental exercise: How would you check that \(\mathbf{v}_4\) is not in \(\mbox{span}\{\mathbf{f}_1, \mathbf{f}_2, \mathbf{f}_3\}\text{?}\)). To get a vector \(\mathbf{f}\) orthogonal to the row space, we perform an iteration of Gram-Schmidt:
\begin{align*} \mathbf{f} \amp = \mathbf{v}_{4} - \frac{\mathbf{v}_{4} \cdot \mathbf{f}_{1}}{\norm{\mathbf{f}_{1}}^2}\mathbf{f}_{1} - \frac{\mathbf{v}_{4} \cdot \mathbf{f}_{2}}{\norm{\mathbf{f}_{2}}^2}\mathbf{f}_{2} - \frac{\mathbf{v}_{4} \cdot \mathbf{f}_{3}}{\norm{\mathbf{f}_{3}}^2}\mathbf{f}_{3} \\ \amp = [1, 0, 0, 0] - \frac{1}{4}[1, 1, -1, -1] - \frac{2}{10}[2, 1, 1, 2] - \frac{4}{110}[4, -3, 7, -6] \\ \amp = \frac{1}{44}[9, -15, -9, 3]. \end{align*}
Since any nonzero multiple of \(\mathbf{f}\) will suffice, we are free to choose \(\mathbf{f}_{4} = 44\mathbf{f} = [9, -15, -9, 3]\) to get rid of the fractions.
It is easy to check that \(\{\mathbf{f}_1, \mathbf{f}_2, \mathbf{f}_3, \mathbf{f}_4\}\) is an orthogonal set, so it is linearly independent by Theorem 10.2.4; since it contains four vectors, it is a basis for \(\R^4\text{.}\)
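That check is quick to run numerically; in this sketch all six pairwise dot products come out zero:

import numpy as np

fs = [np.array([1, 1, -1, -1]),
      np.array([2, 1, 1, 2]),
      np.array([4, -3, 7, -6]),
      np.array([9, -15, -9, 3])]

# All six pairwise dot products should be 0.
for i in range(len(fs)):
    for j in range(i + 1, len(fs)):
        print(np.dot(fs[i], fs[j]))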

Remark 10.2.10.

Suppose instead of \([1,0,0,0]\) we had started with \(\mathbf{v}_4 = [7, -1, 7, -5]\text{.}\) This vector \(\mathbf{v}_4\) is in \(\mbox{span}\{\mathbf{f}_1, \mathbf{f}_2, \mathbf{f}_3\}\text{,}\) as it is the sum of those three vectors. But if we were to try to proceed as above, we would get
\begin{align*} \mathbf{f} \amp = \mathbf{v}_{4} - \frac{\mathbf{v}_{4} \cdot \mathbf{f}_{1}}{\norm{\mathbf{f}_{1}}^2}\mathbf{f}_{1} - \frac{\mathbf{v}_{4} \cdot \mathbf{f}_{2}}{\norm{\mathbf{f}_{2}}^2}\mathbf{f}_{2} - \frac{\mathbf{v}_{4} \cdot \mathbf{f}_{3}}{\norm{\mathbf{f}_{3}}^2}\mathbf{f}_{3} \\ \amp = [7, -1, 7, -5] - \frac{4}{4}[1, 1, -1, -1] - \frac{10}{10}[2, 1, 1, 2] - \frac{110}{110}[4, -3, 7, -6] \\ \amp = [0,0,0,0]. \end{align*}
We could not add a multiple of \(\mathbf{f}\) to \(\{\mathbf{f}_1, \mathbf{f}_2, \mathbf{f}_3\}\) to get an orthogonal basis.
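This also suggests an answer to the mental exercise in Example 10.2.9: running one Gram-Schmidt step is itself a membership test, since the residual vanishes exactly when \(\mathbf{v}_4\) lies in the span. A minimal NumPy sketch of the computation above:

import numpy as np

f1 = np.array([1.0, 1.0, -1.0, -1.0])
f2 = np.array([2.0, 1.0, 1.0, 2.0])
f3 = np.array([4.0, -3.0, 7.0, -6.0])
v4 = f1 + f2 + f3            # [7, -1, 7, -5], deliberately in the span

f = v4.copy()
for g in (f1, f2, f3):
    f -= (np.dot(v4, g) / np.dot(g, g)) * g

print(f)                     # [0. 0. 0. 0.]: v4 lies in span{f1, f2, f3}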

Exercises 10.2.3

1.

Try Example 10.2.9 again starting with some other vector \(\mathbf{v}_4 \in \R^4\text{.}\)

Exercise Group.

In each case, use the Gram-Schmidt algorithm to convert the given basis \(\mathcal{B}\) of \(V\) to an orthogonal basis.
2.
\(V = \R^2\text{,}\) \(\mathcal{B} = \left\{\begin{bmatrix}1\\ -1\end{bmatrix}, \begin{bmatrix}2\\ 1\end{bmatrix}\right\}\text{.}\)
3.
\(V = \R^2\text{,}\) \(\mathcal{B} = \left\{\begin{bmatrix}2\\ 1\end{bmatrix}, \begin{bmatrix}1\\ 2\end{bmatrix}\right\}\text{.}\)
4.
\(V = \R^3\text{,}\) \(\mathcal{B} = \left\{\begin{bmatrix}1\\ -1\\ 1\end{bmatrix}, \begin{bmatrix}1\\ 0\\ 1\end{bmatrix}, \begin{bmatrix}1\\ 1\\ 2\end{bmatrix}\right\}\text{.}\)
5.
\(V = \R^3\text{,}\) \(\mathcal{B} = \left\{\begin{bmatrix}0\\ 1\\ 1\end{bmatrix}, \begin{bmatrix}1\\ 1\\ 1\end{bmatrix}, \begin{bmatrix}1\\ -2\\ 2\end{bmatrix}\right\}\text{.}\)