Linear Independence and Bases
Okay. So, we've introduced the idea of a vector space and the notion of linear functions between vector spaces. However, these are still very much abstract objects. In order to make these ideas useful, we need to convert an abstract vector into something we can actually manipulate on a computer. We need coordinates.
Coordinates
The idea behind coordinates is that we try to write vectors in $V$ as a linear combination of vectors $B = \{b_1, b_2, \dots, b_n\}$ that we choose ahead of time. By doing this, we can perform numerical manipulations on the coefficients in the linear combination instead of on the abstract objects themselves. For example, if
$$v = \alpha_1 b_1 + \alpha_2 b_2 + \dots + \alpha_n b_n, \qquad w = \beta_1 b_1 + \beta_2 b_2 + \dots + \beta_n b_n,$$
then it is very easy to add these objects together,
$$v + w = (\alpha_1 + \beta_1) b_1 + (\alpha_2 + \beta_2) b_2 + \dots + (\alpha_n + \beta_n) b_n.$$
Therefore, on a computer, we can represent $v$ and $w$ by arrays of the coefficients $\alpha = (\alpha_1, \dots, \alpha_n)$ and $\beta = (\beta_1, \dots, \beta_n)$ (a.k.a. the coordinates with respect to $B$). In array notation, I like to write the set $B$ as a row vector,
$$B = \begin{bmatrix} b_1 & b_2 & \cdots & b_n \end{bmatrix}.$$
Then we can write $v$ and $w$ as
$$v = B\alpha, \qquad w = B\beta.$$
However, note that the $B$ part of this equation exists only in our heads. It cannot really be put into a computer as long as the vectors $b_i$ remain abstract objects. But in the context of the vector space $V$, the coordinates $\alpha$ have no meaning unless they are taken with respect to a set of vectors $B$. It is important to remember this, as forgetting it can have dire ramifications: if we swap out $B$, then the coordinates change. In many cases, the set $B$ is implied, so it is common in practice to drop the $B$ and work exclusively with the coordinates, but to avoid confusion it is good to remind oneself occasionally what the coordinates are actually in reference to.
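To make this concrete, here is a minimal sketch in NumPy. It assumes, purely for illustration, that $V$ is the space of polynomials of degree less than three and that we have chosen $B = \{1, x, x^2\}$; only the coordinate arrays ever touch the machine.

```python
import numpy as np

# Hypothetical example: V = polynomials of degree < 3, with the chosen set
# B = {1, x, x^2}.  The abstract vectors never enter the computer; only
# their coordinate arrays with respect to B do.
alpha = np.array([2.0, 0.0, 1.0])   # coordinates of v = 2 + x^2
beta  = np.array([1.0, 3.0, 0.0])   # coordinates of w = 1 + 3x

# Adding the abstract vectors v + w corresponds to adding their coordinates.
gamma = alpha + beta
print(gamma)   # [3. 3. 1.]  ->  v + w = 3 + 3x + x^2
```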
However, we've glossed over an even more glaring issue. So far, we have no guarantee that the coordinates are actually unique, or that they even exist for every vector! Indeed, we can't just choose any set $B$ to represent vectors. In particular, if we think about the operation of taking an array of coordinates $\alpha$ and mapping it to a vector by applying the set $B$ on the left hand side,
$$\alpha \mapsto B\alpha = \alpha_1 b_1 + \alpha_2 b_2 + \dots + \alpha_n b_n,$$
we want to be able to know two things about this coordinate map:
- (Injectivity) For any two distinct sets of coordinates $\alpha \neq \beta$, are $B\alpha$ and $B\beta$ actually distinct vectors?
- (Surjectivity) For any $v \in V$, is there actually a set of coordinates $\alpha$ in $\mathbb{F}^n$ that yields this vector, i.e., $v = B\alpha$?

Clearly both of these properties would be very desirable for $B$ to be a basis (foreshadowing) for our computations. Let's begin with the first one, injectivity. The answer to this question has to do with something called linear independence.
Linear Independence
In linear algebra, perhaps the most fundamental relationship between vectors is that of linear independence. Linear independence characterizes the lack of redundancy in a set of vectors. For example, suppose that
$$v_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \qquad v_3 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} = v_1 + v_2.$$
Note that only two out of the three of these vectors are actually useful. To be more precise, any vector that can be written as a combination of $S = \{v_1, v_2, v_3\}$ can also be written as a linear combination of $S' = \{v_1, v_2\}$, since
$$\alpha_1 v_1 + \alpha_2 v_2 + \alpha_3 v_3 = (\alpha_1 + \alpha_3) v_1 + (\alpha_2 + \alpha_3) v_2.$$
However, note that such a combination could just as easily have been written as a combination of $\{v_1, v_3\}$ or $\{v_2, v_3\}$, since
$$v_2 = v_3 - v_1 \qquad \text{and} \qquad v_1 = v_3 - v_2.$$
Moreover, in the context of our previous discussion about coordinates, this relationship between the vectors means that we do not have unique coordinates. For example,
$$\begin{bmatrix} 1 \\ 1 \end{bmatrix} = 1 \cdot v_1 + 1 \cdot v_2 + 0 \cdot v_3 = 0 \cdot v_1 + 0 \cdot v_2 + 1 \cdot v_3.$$
See? The coordinates are not unique. Bummer. It would be very annoying to perform calculations with respect to $S$. On the other hand, $S'$ doesn't have this annoying feature. So what is the difference between the sets $S$ and $S'$? Well, it has to do with the fact that there is a way of writing zero as a non-trivial linear combination of the members of the set $S$,
$$v_1 + v_2 - v_3 = 0,$$
whereas there is no way of writing
$$\alpha_1 v_1 + \alpha_2 v_2 = 0$$
for any $\alpha_1, \alpha_2$ except the trivial $\alpha_1 = \alpha_2 = 0$. The relationship $v_1 + v_2 - v_3 = 0$ is called a linear dependence relationship, and if such a relationship exists we say the set is linearly dependent, whereas if no such relationship exists we say the set is linearly independent. This leads to the following definition:
Definition: A set of vectors $\{v_1, v_2, \dots, v_n\}$ is linearly independent if
$$\alpha_1 v_1 + \alpha_2 v_2 + \dots + \alpha_n v_n = 0$$
implies $\alpha_1 = \alpha_2 = \dots = \alpha_n = 0$. In other words, there exists no way of achieving $0$ as a non-trivial linear combination of $v_1, v_2, \dots, v_n$.
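As a numerical illustration (with made-up vectors in $\mathbb{R}^3$, stacked as columns of a matrix), checking linear independence amounts to asking whether $B\alpha = 0$ has only the trivial solution; the helper below uses NumPy's numerical rank as a proxy for that check.

```python
import numpy as np

def is_linearly_independent(vectors):
    """Return True when B @ alpha = 0 admits only the trivial solution,
    where the columns of B are the given vectors."""
    B = np.column_stack(vectors)
    # The columns are independent exactly when no non-trivial null vector
    # exists, i.e., when the (numerical) rank equals the number of columns.
    return np.linalg.matrix_rank(B) == B.shape[1]

v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])
v3 = v1 + v2   # deliberately redundant

print(is_linearly_independent([v1, v2]))      # True
print(is_linearly_independent([v1, v2, v3]))  # False: v1 + v2 - v3 = 0
```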
Note that linear independence of $B$ automatically implies that the coordinate map is injective, since if $\alpha, \beta \in \mathbb{F}^n$ are two sets of coordinates that yield the same vector under $B$, then
$$0 = B\alpha - B\beta = B(\alpha - \beta).$$
Linear independence therefore implies that $\alpha - \beta = 0$ and hence $\alpha = \beta$. Here $\mathbb{F}$ denotes the field of the vector space $V$.
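A small sketch of what injectivity buys us, using a made-up independent set in $\mathbb{R}^2$: because the coordinates of a vector with respect to an independent set are unique, we can recover them unambiguously by solving a linear system.

```python
import numpy as np

# Made-up linearly independent set in R^2, stacked as columns of B.
B = np.array([[1.0, 1.0],
              [0.0, 2.0]])

alpha = np.array([3.0, -1.0])   # some coordinates
v = B @ alpha                   # the vector they represent

# Because the columns of B are linearly independent, the coordinates of v
# are unique, and we can recover them by solving B @ x = v.
recovered = np.linalg.solve(B, v)
print(np.allclose(recovered, alpha))   # True
```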
Surjectivity
Now, the surjectivity of the coordinate map (i.e., can I write every vector in $V$ as a linear combination of the vectors in $B$?) has to do with a property called dimension. In order to proceed, we will need two more definitions:
Definition: The dimension of a linearly independent set $B = \{b_1, \dots, b_n\}$ is the number of vectors in that set, i.e., $\dim(B) = n$.
Definition: The span of a set $B$ is the range of the coordinate map $\alpha \mapsto B\alpha$, i.e., the set of all vectors of the form
$$B\alpha = \alpha_1 b_1 + \alpha_2 b_2 + \dots + \alpha_n b_n, \qquad \alpha \in \mathbb{F}^n.$$
Note that $\operatorname{span}(B)$ is also a vector space.
Tautologically, the coordinate map of $B$ is surjective if $\operatorname{span}(B) = V$. If this is the case, then we call $B$ a basis for $V$. Moreover, we define the dimension of $V$ to be the dimension of $B$. This raises an interesting question: is the dimension of $V$ unique? Can we have two bases $B$ and $B'$ for $V$ of different sizes? It turns out the answer is no. I won't prove it here, but if you had two bases of different sizes $n$ and $m$, you could build an invertible linear function from $\mathbb{F}^n$ to $\mathbb{F}^m$ using the corresponding coordinate maps. This intuitively seems like it should be impossible, and I will leave it as an exercise to show that such a function cannot exist.
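As a quick sketch for the concrete case $V = \mathbb{R}^3$ (with made-up candidate vectors), a set of three vectors is a basis exactly when its coordinate map is injective and surjective, which we can probe numerically:

```python
import numpy as np

# Hypothetical check for V = R^3: the columns of B are the candidate basis.
# The coordinate map alpha -> B @ alpha is injective and surjective exactly
# when the three columns are independent and span R^3.
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

print(np.linalg.matrix_rank(B) == 3)   # True: B is a basis, so dim(R^3) = 3
```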
Regardless, once you have a basis for a vector space you are in business. You can now start doing a bunch of stuff on a computer. From the perspective of calculations, there is no difference between operations on $V$ and the corresponding operations on the coordinates in $\mathbb{F}^n$. Hence, every finite dimensional vector space $V$ is isomorphic to $\mathbb{F}^n$. This is really a godsend, and it enables calculations that aren't easily possible in other algebraic objects like abstract rings or groups.
Linear Functions as Matrices
Now that we can represent abstract vectors as arrays using coordinates, one might ask if we can do the same for linear functions $T : V \to W$. In order to do this, we need a basis both for the domain $V$ and the codomain $W$. Suppose we have such bases $B_V = \{b_1, b_2, \dots, b_n\}$ and $B_W = \{c_1, c_2, \dots, c_m\}$. Note that by examining what $T$ does to the basis elements of $B_V$, we can easily determine what it will do to an arbitrary vector $v = B_V\alpha$ in $V$, since
$$T(v) = T\left(\sum_{j=1}^n \alpha_j b_j\right) = \sum_{j=1}^n \alpha_j T(b_j).$$
Thus, the images $T(b_1), T(b_2), \dots, T(b_n)$ fully determine the behavior of $T$. Now let us write down these images as linear combinations of the elements in $B_W$,
$$T(b_j) = \sum_{i=1}^m c_i A_{ij},$$
where $A_{ij}$ here is the $i$-th coordinate of $T(b_j)$ with respect to $B_W$. Note I reversed the order of the vectors and the scalars from how they are usually written. You will see why shortly. Now, for an arbitrary vector $v = B_V\alpha$, this expression simply becomes
$$T(v) = \sum_{j=1}^n \alpha_j T(b_j) = \sum_{i=1}^m c_i \left( \sum_{j=1}^n A_{ij} \alpha_j \right).$$
Written in array form, where $\alpha$ is the array of coordinates and $A$ is the matrix of entries $A_{ij}$, this is equivalent to
$$T(B_V \alpha) = B_W A \alpha.$$
We see here that we've written down an expression that converts the abstract object $T$ into a 2-d array of numbers $A$. This is how one performs computations with linear functions on a computer. We specify bases for the domain and the codomain, then we write down the matrix coefficients $A_{ij}$ of $T$, and finally we compute the coordinates of $T(v)$ from the coordinates of $v$ by simply applying the matrix $A$. Note well that $A$ depends on both the choice of $B_V$ and the choice of $B_W$. Changing these choices will change the matrix representation.
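Here is a sketch of how the matrix $A$ gets assembled for a hypothetical concrete choice: $T$ is differentiation, the domain is polynomials of degree less than three with monomial basis $B_V = \{1, x, x^2\}$, and the codomain is polynomials of degree less than two with $B_W = \{1, x\}$. Column $j$ of $A$ holds the coordinates of $T(b_j)$ with respect to $B_W$.

```python
import numpy as np

# Hypothetical example: T = d/dx from polynomials of degree < 3 to degree < 2,
# with monomial bases B_V = {1, x, x^2} and B_W = {1, x}.
#   T(1)   = 0   -> coordinates (0, 0) with respect to B_W
#   T(x)   = 1   -> coordinates (1, 0)
#   T(x^2) = 2x  -> coordinates (0, 2)
# Column j of A is the coordinate array of T(b_j) with respect to B_W.
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0]])

alpha = np.array([5.0, 3.0, 4.0])   # coordinates of v = 5 + 3x + 4x^2
print(A @ alpha)                    # [3. 8.]  ->  T(v) = 3 + 8x
```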
We will use the following notation to denote the matrix representation of $T$ with respect to these bases,
$$[T]_{B_V \to B_W} = A.$$
A special case of the above is when $V = W$. In this case, one usually uses the same basis $B$ for both the domain and the co-domain, in which case the above reads
$$T(B\alpha) = B A \alpha.$$
In this case we can write
$$[T]_B = [T]_{B \to B} = A.$$
Change of Basis
We noted before that the matrix representation of a linear function depends on the choice of basis for the domain and the codomain. In this section we answer the question of how the representation actually changes when we want to swap out the basis. Why would we want to do such a thing? Well, some bases are more amenable to computation than others. A linear function might be sparse (i.e., its matrix is mostly zeros) in one basis but completely dense (i.e., mostly nonzeros) in another. Sparse matrices are generally far more computationally efficient and require far less memory to store, so there are good reasons why we might prefer some bases over others. The process of updating $A$ when we change the basis of the domain or codomain is called change of basis. We will derive the formulas below.
Suppose that $B = \{b_1, \dots, b_n\}$ and $B' = \{b_1', \dots, b_n'\}$ are bases for a vector space $V$. It is therefore possible to write the basis $B'$ in coordinate form with respect to the basis $B$,
$$b_j' = \sum_{i=1}^n b_i P_{ij}.$$
In matrix form, we write this as
$$B' = BP.$$
Conversely, we can also write
$$B = B'Q.$$
Plugging one into the other, we get
$$B = B'Q = BPQ.$$
Linear independence tells us that we can remove the $B$ from both sides (the coordinates must be equal if the vectors are equal). This leaves us with
$$PQ = I,$$
where $I$ is the identity matrix. Likewise, we have $QP = I$ from the other direction. This gives us
$$Q = P^{-1}.$$
The matrices $P$ and $Q$ are known as change of basis matrices, and we see that the change of basis matrix in one direction is the inverse of the change of basis matrix in the other direction. This makes perfect sense. It also means that if you have the matrix of coordinates of one basis with respect to another, you can get the coordinates in the other direction by performing a matrix inverse on your computer.
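A quick numerical sanity check of $B' = BP$ and $Q = P^{-1}$, using two made-up bases of $\mathbb{R}^2$ stacked as columns:

```python
import numpy as np

# Made-up bases of R^2, stacked as columns.
B       = np.array([[1.0,  1.0],
                    [1.0, -1.0]])
B_prime = np.array([[2.0,  0.0],
                    [1.0,  3.0]])

# P holds the coordinates of B' with respect to B, i.e., B' = B @ P.
P = np.linalg.solve(B, B_prime)
# Q holds the coordinates of B with respect to B', i.e., B = B' @ Q.
Q = np.linalg.solve(B_prime, B)

print(np.allclose(B_prime, B @ P))        # True
print(np.allclose(Q, np.linalg.inv(P)))   # True: Q = P^{-1}
```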
Now suppose we have a linear function $T : V \to W$ with matrix $A = [T]_{B_V \to B_W}$ for corresponding bases $B_V$ and $B_W$, and we want to change basis in the domain to $B_V'$ and in the codomain to $B_W'$. How can we write $[T]_{B_V' \to B_W'}$ in terms of $[T]_{B_V \to B_W}$? Well, we start with
$$T(B_V \alpha) = B_W A \alpha.$$
Now, replacing $B_W$ with $B_W' P_W^{-1}$ (where $B_W' = B_W P_W$), we get
$$T(B_V \alpha) = B_W' P_W^{-1} A \alpha.$$
The quantity on the left can be rewritten using $B_V = B_V' P_V^{-1}$ (where $B_V' = B_V P_V$),
$$T(B_V' P_V^{-1} \alpha) = B_W' P_W^{-1} A \alpha.$$
Now we substitute $\alpha = P_V \alpha'$ and we get
$$T(B_V' \alpha') = B_W' \left( P_W^{-1} A P_V \right) \alpha'.$$
The quantity in the parentheses is exactly what we are after!
$$[T]_{B_V' \to B_W'} = P_W^{-1} \, [T]_{B_V \to B_W} \, P_V.$$
For a linear function with the same domain and codomain, using the same basis $B$ for both and changing it to $B'$ (with $B' = BP$), we get
$$[T]_{B'} = P^{-1} \, [T]_B \, P.$$
We see that when we are thinking in terms of endomorphisms (linear functions with the same domain and codomain), the matrix representation of $T$ gets sandwiched between a matrix and its inverse. This operation of sandwiching a matrix $A$ and turning it into $P^{-1} A P$ is called conjugation. Moreover, if two matrices are related by conjugation, we say that they are similar (denoted $A \sim A'$). In effect, if you ever see two matrices such that
$$A' = P^{-1} A P,$$
then one way of interpreting this formula is that $A$ and $A'$ represent the same linear function up to a change of basis. An interesting question follows naturally: is it possible to categorize all matrices into classes based on conjugation? The answer is yes, and if you're interested, see Jordan Normal Form. Unfortunately, this is out of scope for the moment, but still fun to mention.
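As a sketch with made-up matrices, conjugation produces a similar matrix that describes the same action once coordinates are converted between the two bases:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))      # matrix of T in the basis B (made up)
P = rng.standard_normal((3, 3))      # change of basis matrix, B' = B P

A_prime = np.linalg.inv(P) @ A @ P   # matrix of the same T in the basis B'

# Interpreting the formula: take coordinates alpha' with respect to B'.
# Applying A' and then converting the result to B-coordinates (multiply by P)
# matches converting first (P @ alpha') and then applying A.
alpha_prime = rng.standard_normal(3)
print(np.allclose(P @ (A_prime @ alpha_prime), A @ (P @ alpha_prime)))  # True
```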
A much weaker equivalence relationship between matrices is called matrix equivalence. Note from our change of basis expression with different domain and codomain bases that the matrix for $T$ essentially gets sandwiched between two invertible matrices (not necessarily the same). Unlike the endomorphism case, where we take the same basis for the domain and the codomain, here the two bases may differ, and we can similarly say that $A$ and $A'$ represent the same linear function up to choice of basis in the domain and codomain if we can write
$$A' = M A N,$$
where both $M$ and $N$ are invertible. The matrices $A$ and $A'$ are then said to be equivalent. The same question can now be asked: can we categorize all matrices of a certain size by this equivalence relation? The answer is also yes, and unlike Jordan Normal Form, it is much easier to prove. It turns out that there is a special number, called the rank of a linear function, such that all matrices of a given size with the same rank are equivalent. We will see this in the next chapter.
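Looking ahead, a quick numerical check (again with made-up matrices) shows the invariant at work: multiplying by invertible matrices on either side leaves the computed rank unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)

# A made-up 4x3 matrix of rank 2: the product of a 4x2 and a 2x3 factor.
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))

M = rng.standard_normal((4, 4))   # invertible with probability 1
N = rng.standard_normal((3, 3))   # invertible with probability 1

A_prime = M @ A @ N               # an equivalent matrix

print(np.linalg.matrix_rank(A))        # 2
print(np.linalg.matrix_rank(A_prime))  # 2: rank is preserved by equivalence
```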