
Linear Independence and Bases

Okay. So, we've introduced the idea of a vector space and the notion of linear functions between vector spaces. However, these are still very much abstract objects. In order to make these ideas useful, we need to convert an abstract vector $\bm{v}$ into something we can actually manipulate on a computer. We need coordinates.

Coordinates

The idea behind coordinates is that we try to write vectors in $V$ as a linear combination of vectors that we choose ahead of time. By doing this, we can perform numerical manipulations on the coefficients in the linear combination instead of the abstract objects themselves. For example, if

$$\bm{v} = \alpha_1 \bm{v}_1 + \alpha_2 \bm{v}_2 + \alpha_3 \bm{v}_3 \,, \qquad \bm{w} = \beta_1 \bm{v}_1 + \beta_2 \bm{v}_2 + \beta_3 \bm{v}_3 \,,$$

then it is very easy to add these objects together,

$$\bm{v} + \bm{w} = (\alpha_1 + \beta_1) \bm{v}_1 + (\alpha_2 + \beta_2) \bm{v}_2 + (\alpha_3 + \beta_3) \bm{v}_3 \,.$$

Therefore, on a computer, we can represent $\bm{v}$ and $\bm{w}$ by using arrays of the coefficients $\alpha_i$ and $\beta_i$ (a.k.a. the coordinates with respect to $\bm{v}_1, \bm{v}_2, \bm{v}_3$). In array notation, I like to write the set $\bm{v}_1, \bm{v}_2, \bm{v}_3$ as a row vector,

$$\mathcal{B} = \left[\begin{array}{ccc} \bm{v}_1 & \bm{v}_2 & \bm{v}_3 \end{array}\right] \,.$$

Then we can write $\bm{v}$ and $\bm{w}$ as

$$\bm{v} = \mathcal{B}\left[\begin{array}{c} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{array}\right] \,, \qquad \bm{w} = \mathcal{B}\left[\begin{array}{c} \beta_1 \\ \beta_2 \\ \beta_3 \end{array}\right] \,.$$

However, note that the $\mathcal{B}$ part of this equation exists only in our heads. It cannot really be put into a computer as long as the vectors $\bm{v}_1, \bm{v}_2, \bm{v}_3$ remain abstract objects. But in the context of the vector space $V$, the coordinates have no meaning except with respect to a set of vectors $\mathcal{B}$. It is important to remember this, because it can have dire ramifications: if we swap out $\mathcal{B}$, the coordinates change. In many cases, the set $\mathcal{B}$ is implied, so in practice people commonly drop the $\mathcal{B}$ and work exclusively with the coordinates, but to avoid confusion it is good to remind oneself occasionally what the coordinates are actually in reference to.
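
To make this concrete, here is a minimal sketch in Python with NumPy (the text does not assume any particular language, and the vectors below are just hypothetical stand-ins): the columns of an array play the role of $\mathcal{B}$, and vector arithmetic is carried out purely on the coordinate arrays.

```python
import numpy as np

# A concrete stand-in for the abstract set B (hypothetical example vectors):
# its columns play the role of the chosen vectors v_1, v_2, v_3.
B = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0]])

alpha = np.array([2.0, -1.0, 3.0])   # coordinates of v with respect to B
beta = np.array([0.5, 4.0, -2.0])    # coordinates of w with respect to B

# On the computer we only ever manipulate the coordinate arrays...
gamma = alpha + beta                 # coordinates of v + w with respect to B

# ...but their meaning is always relative to B: v = B @ alpha, w = B @ beta.
v, w = B @ alpha, B @ beta
assert np.allclose(B @ gamma, v + w)
```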

However, we've glossed over an even more glaring issue. So far, we have no guarantee that the coordinates are actually unique, or that they even exist for every vector! Indeed, we can't just choose any set $\mathcal{B}$ to represent vectors. In particular, if we think about the operation of taking an array of coordinates and mapping it to a vector by applying the set $\mathcal{B}$ on the left hand side,

$$\mathcal{B} (\cdot) : \mathbb{R}^3 \rightarrow V \,,$$

we want to know two things about this coordinate map:

  1. (Injectivity) For any two distinct sets of coordinates $\bm{a}_1, \bm{a}_2 \in \mathbb{R}^3$, are $\mathcal{B}\bm{a}_1, \mathcal{B}\bm{a}_2 \in V$ actually distinct vectors?
  2. (Surjectivity) For any $\bm{v} \in V$, is there actually a set of coordinates in $\mathbb{R}^3$ that yields this vector?

Clearly, both of these properties would be very desirable if $\mathcal{B}$ is to serve as a basis (foreshadowing) for our computations. Let's begin with the first one, injectivity. The answer to this question has to do with something called linear independence.

Linear Independence

In linear algebra, perhaps the most fundamental relationship between vectors is that of linear independence. Linear independence characterizes the lack of redundancy in a set of vectors. For example, suppose that

$$\bm{v}_1 = \left[\begin{array}{c} 1 \\ 0 \end{array}\right]\,, \qquad \bm{v}_2 = \left[\begin{array}{c} 0 \\ 1 \end{array}\right]\,, \qquad \bm{v}_3 = \left[\begin{array}{c} 1 \\ 1 \end{array}\right]\,.$$

From the perspective of the user, note that only two of these three vectors are actually useful. To be more precise, any vector that can be written as a combination of $\bm{v}_1, \bm{v}_2, \bm{v}_3$ can also be written as a linear combination of $\bm{v}_1, \bm{v}_2$, since

$$\bm{v}_3 = \bm{v}_1 + \bm{v}_2 \,.$$

However, note that such a combination could just as easily have been written as a combination of $\bm{v}_2, \bm{v}_3$ or of $\bm{v}_1, \bm{v}_3$, since

$$\bm{v}_1 = \bm{v}_3 - \bm{v}_2 \,, \qquad \bm{v}_2 = \bm{v}_3 - \bm{v}_1 \,.$$

Moreover, in the context of our previous discussion about coordinates, this relationship between the vectors means that we do not have unique coordinates. For example, taking $\mathcal{B} = \left[\begin{array}{ccc} \bm{v}_1 & \bm{v}_2 & \bm{v}_3 \end{array}\right]$ with the vectors above,

$$\mathcal{B} \left[\begin{array}{c} 1 \\ 1 \\ 0 \end{array}\right] = \bm{v}_1 + \bm{v}_2 = \bm{v}_3 = \mathcal{B} \left[\begin{array}{c} 0 \\ 0 \\ 1 \end{array}\right] \,.$$

See? The coordinates are not unique. Bummer. It would be very annoying to perform calculations with respect to $\mathcal{B}$. On the other hand, $\mathcal{C} = \left[ \begin{array}{cc} \bm{v}_1 & \bm{v}_2 \end{array} \right]$ doesn't have this annoying feature. So what is the difference between the sets $\mathcal{B}$ and $\mathcal{C}$? Well, it has to do with the fact that there is a non-trivial linear combination of the members of $\mathcal{B}$ that equals zero,

$$\bm{v}_1 + \bm{v}_2 - \bm{v}_3 = \bm{0} \,.$$

In contrast, there is no way of writing

$$\alpha_1 \bm{v}_1 + \alpha_2 \bm{v}_2 = \bm{0} \,,$$

for any $\alpha_1, \alpha_2$ except the trivial $\alpha_1 = \alpha_2 = 0$. The relationship $\bm{v}_1 + \bm{v}_2 - \bm{v}_3 = \bm{0}$ is called a linear dependence relationship. If such a relationship exists, we say $\mathcal{B}$ is linearly dependent, whereas if no such relationship exists, we say the set is linearly independent. This leads to the following definition:


Definition: A set of vectors $\mathcal{B} = \left[ \bm{v}_1, ..., \bm{v}_n \right]$ is linearly independent if

$$\sum_i \alpha_i \bm{v}_i = \bm{0}$$

implies $\alpha_i = 0$ for all $i$. In other words, there is no way of obtaining $\bm{0}$ as a non-trivial linear combination of the $\bm{v}_i$.
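
As a sanity check, here is a small NumPy sketch (again an assumption of this note, not something the text prescribes) that tests the definition numerically for the earlier example: the rank of the matrix whose columns are $\bm{v}_1, \bm{v}_2, \bm{v}_3$ reveals the dependence, and the two coordinate arrays $(1, 1, 0)$ and $(0, 0, 1)$ indeed map to the same vector.

```python
import numpy as np

# Columns are v_1, v_2, v_3 from the example above.
B = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

# A set of column vectors is linearly independent exactly when the rank of the
# matrix equals the number of columns (i.e., the null space is trivial).
print(np.linalg.matrix_rank(B))   # 2 < 3, so B is linearly dependent

# The dependence relationship v_1 + v_2 - v_3 = 0 ...
assert np.allclose(B @ np.array([1.0, 1.0, -1.0]), 0.0)

# ... is exactly why coordinates are not unique: two different arrays, same vector.
assert np.allclose(B @ np.array([1.0, 1.0, 0.0]),
                   B @ np.array([0.0, 0.0, 1.0]))

# The smaller set C = [v_1 v_2] does not have this problem.
C = B[:, :2]
print(np.linalg.matrix_rank(C))   # 2 == 2, so C is linearly independent
```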


Note that linear independence automatically implies that the coordinate map $\mathcal{B} (\cdot) : F^n \rightarrow V$ is injective (here $F$ denotes the underlying field of the vector space $V$). Indeed, if $\bm{a}, \bm{b} \in F^n$ are two sets of coordinates that yield the same vector under $\mathcal{B} (\cdot)$, then

$$\bm{0} = \mathcal{B} \bm{a} - \mathcal{B} \bm{b} = \sum_i a_i \bm{v}_i - \sum_i b_i \bm{v}_i = \sum_i (a_i - b_i) \bm{v}_i \,.$$

Linear independence therefore implies that $a_i = b_i$ for all $i$, and hence $\bm{a} = \bm{b}$.

Surjectivity

Now, the surjectivity of the coordinate map (i.e., can every vector in $V$ be written as a linear combination of vectors in $\mathcal{B}$?) has to do with a property called dimension. In order to proceed, we will need two more definitions:


Definition: The dimension of a linearly independent set $\mathcal{B} = \left[ \bm{v}_1, ..., \bm{v}_n \right]$ is the number of vectors in that set, i.e., $\dim(\mathcal{B}) = n$.

Definition: The span of a set $\mathcal{B} = \left[ \bm{v}_1, ..., \bm{v}_n \right]$ is the range of the coordinate map $\mathcal{B} (\cdot) : F^n \rightarrow V$, i.e., the set of all vectors of the form

$$\text{span}(\mathcal{B}) = \left\{ \sum_i \alpha_i \bm{v}_i \mid \alpha_i \in F \right\} \,.$$

Note that $\text{span}(\mathcal{B})$ is also a vector space.


Tautologically, the coordinate map of $\mathcal{B}$ is surjective if $\text{span}(\mathcal{B}) = V$. If this is the case, then we call $\mathcal{B}$ a basis for $V$. Moreover, we define the dimension of $V$ to be the dimension of $\mathcal{B}$. This raises an interesting question: is the dimension of $V$ unique? Can we have two bases $\mathcal{B}_1$ and $\mathcal{B}_2$ for $V$ of different sizes? It turns out the answer is no. I won't prove it here, but if you had two bases of different sizes $n$ and $m$, you could build an invertible linear function from $F^n$ to $F^m$ using the corresponding coordinate maps. Intuitively, no such function should exist when $n \neq m$, but I will leave it as an exercise to show that it cannot.

Regardless, once you have a basis $\mathcal{B}$ for a vector space $V$, you are in business. You can now start doing a bunch of stuff on a computer. From the perspective of calculations, there is no difference between operations on $V$ and the corresponding operations on the coordinates in $F^n$. Hence, every finite-dimensional vector space $V$ is isomorphic to $F^{\dim(V)}$. This is really a godsend, and it enables calculations that aren't easily possible with other algebraic objects like abstract rings or groups.
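
Concretely, once a basis is fixed, putting a vector "on the computer" just means solving for its coordinates. Here is a minimal NumPy sketch, assuming $V = \mathbb{R}^3$ and a concrete invertible basis matrix (both of which are choices made for illustration):

```python
import numpy as np

# Columns form a basis of R^3 (any invertible 3x3 matrix would do).
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

v = np.array([3.0, 2.0, 1.0])

# Coordinates of v with respect to B: solve the linear system B a = v.
a = np.linalg.solve(B, v)

# The coordinate map is a bijection, so we can go back and forth freely.
assert np.allclose(B @ a, v)
```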

Linear Functions as Matrices

Now that we can represent abstract vectors $\bm{v} \in V$ as arrays using coordinates, one might ask if we can do the same for linear functions $f: V \rightarrow W$. In order to do this, we need a basis both for the domain $V$ and the codomain $W$. Suppose we have such bases $\mathcal{B}_V = \left[ \bm{v}_1, ..., \bm{v}_n \right]$ and $\mathcal{B}_W = \left[\bm{w}_1, ..., \bm{w}_m \right]$. Note that by examining what $f$ does to the basis elements of $\mathcal{B}_V$, we can easily determine what it will do to an arbitrary vector in $V$, since

$$f(\bm{v}) = f\left( \sum_i \alpha_i \bm{v}_i \right) = \sum_i \alpha_i f(\bm{v}_i) \,.$$

Thus, the images $f(\bm{v}_i)$ fully determine the behavior of $f$. Now let us write down these images $f(\bm{v}_i)$ as linear combinations of the elements in $\mathcal{B}_W$,

$$f(\bm{v}_i) = \sum_j \bm{w}_j M_{ji} \,,$$

where $M_{ji}$ is the $j$-th coordinate of $f(\bm{v}_i)$ with respect to $\mathcal{B}_W$. Note that I have reversed the order of the vectors $\bm{w}_j$ and the scalars $M_{ji}$ from how they are usually written. You will see why shortly. Now, for an arbitrary vector $\bm{v}$, this expression simply becomes

$$f\left( \sum_i \alpha_i \bm{v}_i \right) = \sum_i \sum_j \bm{w}_j M_{ji} \alpha_i \,.$$

Written in array form, where $\bm{a}$ is the array of coordinates $\alpha_i$ and $\bm{M}$ is the matrix with entries $M_{ji}$, this is equivalent to

$$f(\mathcal{B}_V \bm{a}) = \mathcal{B}_W \bm{M} \bm{a} \,.$$

We see here that we've written down an expression that converts the abstract object $f$ into a 2-d array of numbers $\bm{M}$. This is how one performs computations with linear functions on a computer. We specify bases for the domain and the codomain, then we write down the matrix coefficients of $f$, and finally we can compute the coordinates of $f(\bm{v})$ from the coordinates of $\bm{v}$ by simply applying the matrix $\bm{M}$. Note well that $\bm{M}$ depends on both the choice of $\mathcal{B}_V$ and the choice of $\mathcal{B}_W$. Changing these choices will change the matrix representation.
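
The construction above can be carried out mechanically: apply $f$ to each domain basis vector and solve for its coordinates in the codomain basis; these become the columns of $\bm{M}$. Here is a sketch assuming $V = \mathbb{R}^3$, $W = \mathbb{R}^2$, concrete basis matrices, and an example $f$ given by an array `F` acting on standard coordinates (none of these choices come from the text):

```python
import numpy as np

# An example f : R^3 -> R^2, given here by an array acting on standard coordinates.
F = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])
f = lambda x: F @ x

BV = np.eye(3)                      # basis of the domain (columns)
BW = np.array([[1.0, 1.0],
               [0.0, 1.0]])         # basis of the codomain (columns)

# Column i of M holds the BW-coordinates of f(v_i): solve BW @ M[:, i] = f(v_i).
M = np.column_stack([np.linalg.solve(BW, f(BV[:, i]))
                     for i in range(BV.shape[1])])

# Sanity check: f(BV a) = BW M a for an arbitrary coordinate array a.
a = np.array([1.0, -2.0, 0.5])
assert np.allclose(f(BV @ a), BW @ (M @ a))
```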

We will use the following notation to denote the matrix representation of $f$,

$$[f]_{\mathcal{B}_W, \mathcal{B}_V} = \bm{M} \,.$$

A special case of the above is when $V = W$. In this case, one usually uses the same basis for both the domain and the codomain, and the above reads

$$f(\mathcal{B}_V \bm{a}) = \mathcal{B}_V \bm{M} \bm{a} \,.$$

In this case we can write

$$[f]_{\mathcal{B}_V} = \bm{M} \,.$$

Change of Basis

We noted before that the matrix representation $\bm{M}$ of a linear function $f$ depends on the choice of basis for the domain and the codomain. In this section we answer the question of how the representation actually changes when we want to swap out the basis. Why would we want to do such a thing? Well, some bases are more amenable to computation than others. A linear function might be sparse (i.e., mostly zeros) in one basis but completely dense (i.e., mostly nonzeros) in another. Sparse matrices are generally far more computationally efficient and require far less memory to store, so there are good reasons why we might prefer some bases over others. The process of updating $\bm{M}$ when we change the basis of the domain or codomain is called change of basis. We will derive the formulas below.

Suppose that $\mathcal{A} = \left[\bm{a}_1, ..., \bm{a}_n\right]$ and $\mathcal{B} = \left[\bm{b}_1, ..., \bm{b}_n \right]$ are bases for a vector space $V$. It is therefore possible to write the basis $\mathcal{A}$ in coordinate form with respect to the basis $\mathcal{B}$,

$$\bm{a}_i = \sum_j \bm{b}_j M_{ji} \,.$$

In matrix form, we write this as

$$\mathcal{A} = \mathcal{B} \bm{M}_{\mathcal{B} \rightarrow \mathcal{A}} \,.$$

Conversely, we can also write

$$\mathcal{B} = \mathcal{A} \bm{M}_{\mathcal{A} \rightarrow \mathcal{B}} \,.$$

Plugging one into the other, we get

$$\mathcal{A} = \mathcal{A} \bm{M}_{\mathcal{A} \rightarrow \mathcal{B}} \bm{M}_{\mathcal{B} \rightarrow \mathcal{A}} \,.$$

Linear independence tells us that we can remove the $\mathcal{A}$ from both sides (the coordinates must be equal if the vectors are equal). This leaves us with

$$\bm{I} = \bm{M}_{\mathcal{A} \rightarrow \mathcal{B}} \bm{M}_{\mathcal{B} \rightarrow \mathcal{A}} \,,$$

where $\bm{I}$ is the identity matrix. Likewise, we have $\bm{M}_{\mathcal{B} \rightarrow \mathcal{A}} \bm{M}_{\mathcal{A} \rightarrow \mathcal{B}} = \bm{I}$ from the other direction. This gives us

$$\bm{M}_{\mathcal{A} \rightarrow \mathcal{B}} = \bm{M}_{\mathcal{B} \rightarrow \mathcal{A}}^{-1} \,.$$

The matrices $\bm{M}_{\mathcal{A} \rightarrow \mathcal{B}}$ and $\bm{M}_{\mathcal{B} \rightarrow \mathcal{A}}$ are known as change of basis matrices, and we see that the change of basis matrix from $\mathcal{A}$ to $\mathcal{B}$ is the inverse of the change of basis matrix from $\mathcal{B}$ to $\mathcal{A}$. This makes perfect sense. It also means that if you have the matrix of coordinates of one basis with respect to another, you can get the coordinates in the other direction by computing a matrix inverse on your computer.
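
Numerically, $\bm{M}_{\mathcal{B} \rightarrow \mathcal{A}}$ can be obtained column by column by solving for the $\mathcal{B}$-coordinates of each vector in $\mathcal{A}$. A small NumPy sketch, assuming two concrete (randomly generated) bases of $\mathbb{R}^3$ purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))     # columns: basis A (invertible with probability 1)
B = rng.standard_normal((3, 3))     # columns: basis B

# A = B @ M_BtoA, so M_BtoA holds the B-coordinates of the vectors in A.
M_BtoA = np.linalg.solve(B, A)
M_AtoB = np.linalg.solve(A, B)

# The two change of basis matrices are inverses of each other.
assert np.allclose(M_AtoB @ M_BtoA, np.eye(3))
assert np.allclose(M_AtoB, np.linalg.inv(M_BtoA))
```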

Now suppose we have a linear function $f : V \rightarrow W$ and corresponding bases $\mathcal{B}_V$ and $\mathcal{B}_W$, and we want to change basis in the domain to $\mathcal{B}_V'$ and in the codomain to $\mathcal{B}_W'$. How can we write $[f]_{\mathcal{B}_W', \mathcal{B}_V'}$ in terms of $[f]_{\mathcal{B}_W, \mathcal{B}_V}$? Well, we start with

$$f(\mathcal{B}_V \bm{a}) = \mathcal{B}_W [f]_{\mathcal{B}_W, \mathcal{B}_V} \bm{a} \,.$$

Now, replacing $\bm{a} = \bm{M}_{\mathcal{B}_V \rightarrow \mathcal{B}_V'} \bm{b}$, we get

$$f(\mathcal{B}_V \bm{M}_{\mathcal{B}_V \rightarrow \mathcal{B}_V'} \bm{b}) = \mathcal{B}_W [f]_{\mathcal{B}_W, \mathcal{B}_V} \bm{M}_{\mathcal{B}_V \rightarrow \mathcal{B}_V'} \bm{b} \,.$$

The quantity on the left can be rewritten,

$$f(\mathcal{B}_V' \bm{b}) = \mathcal{B}_W [f]_{\mathcal{B}_W, \mathcal{B}_V} \bm{M}_{\mathcal{B}_V \rightarrow \mathcal{B}_V'} \bm{b} \,.$$

Now we substitute $\mathcal{B}_W = \mathcal{B}_W' \bm{M}_{\mathcal{B}_W' \rightarrow \mathcal{B}_W}$ and we get

$$f(\mathcal{B}_V' \bm{b}) = \mathcal{B}_W' \left(\bm{M}_{\mathcal{B}_W' \rightarrow \mathcal{B}_W} [f]_{\mathcal{B}_W, \mathcal{B}_V} \bm{M}_{\mathcal{B}_V \rightarrow \mathcal{B}_V'}\right) \bm{b} \,.$$

The quantity in the parentheses is exactly what we are after!

$$[f]_{\mathcal{B}_W', \mathcal{B}_V'} = \bm{M}_{\mathcal{B}_W' \rightarrow \mathcal{B}_W} [f]_{\mathcal{B}_W, \mathcal{B}_V} \bm{M}_{\mathcal{B}_V \rightarrow \mathcal{B}_V'} \,.$$
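
To double-check the bookkeeping, here is a sketch that verifies this formula numerically, assuming $V = \mathbb{R}^3$, $W = \mathbb{R}^2$, randomly generated bases, and an arbitrary example $f$ (all illustrative choices, not fixed by the text):

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal((2, 3))           # an example f in standard coordinates
BV, BVp = rng.standard_normal((2, 3, 3))  # two bases for the domain (columns)
BW, BWp = rng.standard_normal((2, 2, 2))  # two bases for the codomain (columns)

# Matrix of f with respect to a pair of bases: solve BW_ @ M = F @ BV_ for M.
rep = lambda BW_, BV_: np.linalg.solve(BW_, F @ BV_)

M_old = rep(BW, BV)                       # [f] in the old bases
M_new = rep(BWp, BVp)                     # [f] in the new bases

# Change of basis matrices, following the convention A = B @ M_{B -> A}.
M_V = np.linalg.solve(BV, BVp)            # M_{B_V -> B_V'}
M_W = np.linalg.solve(BWp, BW)            # M_{B_W' -> B_W}

assert np.allclose(M_new, M_W @ M_old @ M_V)
```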

For a linear function $f : V \rightarrow V$ with the same domain and codomain, we get

$$[f]_{\mathcal{B}_V'} = \bm{M}_{\mathcal{B}_V' \rightarrow \mathcal{B}_V} [f]_{\mathcal{B}_V} \bm{M}_{\mathcal{B}_V \rightarrow \mathcal{B}_V'} = \bm{M}_{\mathcal{B}_V' \rightarrow \mathcal{B}_V} [f]_{\mathcal{B}_V} \bm{M}_{\mathcal{B}_V' \rightarrow \mathcal{B}_V}^{-1} \,.$$

We see that when we are thinking in terms of endomorphisms (linear functions from a space to itself), the matrix representation of $f$ gets sandwiched between a matrix and its inverse. This operation of sandwiching a matrix $\bm{A}$ between $\bm{M}$ and $\bm{M}^{-1}$, turning it into $\bm{M} \bm{A} \bm{M}^{-1}$, is called conjugation. Moreover, if two matrices are related by conjugation, we say that they are similar (denoted $\bm{A} \sim \bm{B}$). In effect, if you ever see two matrices $\bm{A}, \bm{B}$ such that

$$\bm{B} = \bm{M} \bm{A} \bm{M}^{-1} \,,$$

then one way of interpreting this formula is that $\bm{B}$ and $\bm{A}$ represent the same linear function up to a change of basis. An interesting question follows naturally: is it possible to categorize all matrices into classes based on conjugation? The answer is yes, and if you're interested, see Jordan Normal Form. Unfortunately, this is out of scope for the moment, but still fun to mention.

A much weaker equivalence relation between matrices is called matrix equivalence. Note from our change of basis expression with different domain and codomain bases that the matrix for $f$ essentially gets sandwiched between two invertible matrices (not necessarily the same one). Unlike the endomorphism case, where we take the same basis for the domain and codomain, if we allow different bases, then we can similarly say that $\bm{A}$ and $\bm{B}$ represent the same linear function $f$ up to a choice of basis in the domain and codomain if we can write

$$\bm{B} = \bm{M} \bm{A} \bm{P}^{-1} \,,$$

where both $\bm{M}$ and $\bm{P}$ are invertible. The matrices $\bm{A}$ and $\bm{B}$ are then said to be equivalent. The same question can now be asked: can we categorize all matrices of a certain size by this equivalence relation? The answer is also yes, and unlike Jordan Normal Form, it is much easier to prove. It turns out that there is a special number, called the rank of a linear function, such that all matrices of the same size and rank are equivalent. We will see this in the next chapter.
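
As a preview, the rank is easy to compute numerically, and sandwiching a matrix between invertible matrices does not change it. A quick sketch with randomly generated matrices as the assumed inputs:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 4x5 matrix of rank 2, built as a product of a 4x2 and a 2x5 factor.
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 5))
M = rng.standard_normal((4, 4))           # invertible with probability 1
P = rng.standard_normal((5, 5))           # invertible with probability 1

B = M @ A @ np.linalg.inv(P)              # a matrix equivalent to A

# Equivalent matrices share the same rank.
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(B))   # 2 2
```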