Linear and Multilinear Algebra
Rank-Nullity Theorem

Rank-Nullity Theorem

Recall from the previous section that two matrices A\bm{A} and B\bm{B} are equivalent if there exist invertible P\bm{P} and M\bm{M} such that

B=MAP1.\bm{B} = \bm{M} \bm{A} \bm{P}^{-1} \,.

The objective of this chapter is to classify all matrices of a given size up to equivalence.

Fundamental Subspaces: Kernel and Range

In order to begin talking about this, we need to talk about two fundamental subspaces associated with a linear function f:VWf: V \rightarrow W, the range and the kernel. But first let us properly define the image and pre-image of a vector space.


Definition: The image of a vector subspace UVU \subset V under a linear function f:VWf : V \rightarrow W, denoted f(U)f(U), is the set of all elements of WW which can be written

w=f(u),\bm{w} = f(\bm{u}) \,,

for some uU\bm{u} \in U. Note the image of a vector space is also a vector space.

Definition: The preimage of a vector subspace UWU \subset W under a linear function f:VWf : V \rightarrow W, denoted f1(U)f^{-1}(U), is the set of all elements of VV which map to UU, i.e.,

f(v)U.f(\bm{v}) \in U \,.

The preimage of a vector space is also a vector space.


With these definitions, the most important image and preimage of a function ff are the range and kernel,


Definition: The range of a linear function f:VWf : V \rightarrow W, denoted range(f)\text{range}(f), is the image of VV under ff.

Definition: The kernel of a linear function f:VWf : V \rightarrow W, denoted ker(f)\text{ker}(f), is the preimage of the trivial subspace {0}W\{ 0 \} \subset W under ff. That is, it is the set of all elements that map to zero (i.e., "annihilated" by ff).


Both of these are fundamental subspaces that determine the structure of ff. In particular, their dimensions are important properties of ff,


Definition: The rank of a linear function f:VWf : V \rightarrow W, denoted rank(f)\text{rank}(f), is the dimension of its range.

Definition: The nullity of a linear function f:VWf : V \rightarrow W, denoted null(f)\text{null}(f) is the dimension of its kernel.


The key to rank-nullity theorem is to recognize that ff essentially acts as a "filter" on the algebraic structure of VV, it annihilates some parts of the structure --- the structure contained within the kernel space ker(f)\text{ker}(f), but preserves the remaining structure and embeds it into the range space range(f)\text{range}(f). The idea of the rank-nullity theorem is that if one "glues" the structure of the kernel and the range together, we can "recover" the structure of the original domain VV. More precisely, this means that the dimensions of ker(f)\text{ker}(f) and range(f)\text{range}(f) should sum to the dimension original space VV. This is the statement of rank-nullity theorem,


Theorem (Rank-Nullity): For any linear function f:VWf: V \rightarrow W,

rank(f)+null(f)=dim(V).\text{rank}(f) + \text{null}(f) = \text{dim}(V) \,.

Moreover, any matrix representation of ff is equivalent to

[Irank(f)0null(f)].\left[\begin{array}{cc} \bm{I}_{\text{rank}(f)} & \\ & \bm{0}_{\text{null}(f)}\end{array}\right] \,.

Where above Irank(f)\bm{I}_{\text{rank}(f)} is the identity of size rank(f)\text{rank}(f) and 0null(f)\bm{0}_{\text{null}(f)} is the zero matrix of size null(f)\text{null}(f).

Proof of Theorem

The proof of this statement is not particularly difficult. It just requires finding the "right" basis for VV. To begin, let a1,...,ak\bm{a}_1, ..., \bm{a}_k be a basis for ker(f)\text{ker}(f). Note that we can expand any linearly independent set SS with dimension less than dim(V)\text{dim}(V) to a linearly independent set with one more element by simply selecting an element from Vspan(S)V \setminus \text{span}(S) to add to it (you can verify for yourself that the resulting set is still linearly independent). This allows us to expand the basis a1,...,ak\bm{a}_1, ..., \bm{a}_k for ker(f)\text{ker}(f) to a basis a1,...,ak,b1,...,bm\bm{a}_1, ..., \bm{a}_k, \bm{b}_1, ..., \bm{b}_m for all of VV. The core part of the proof is now to show that f(b1),...,f(bm)f(\bm{b}_1), ..., f(\bm{b}_m) are a basis for range(f)\text{range}(f). If this is the case, then the first part of the theorem follows automatically since then k=null(f)k = \text{null}(f), m=rank(f)m = \text{rank}(f), and m+k=dim(V)m + k = \text{dim}(V).

First, we note that f(b1),...,f(bm)f(\bm{b}_1), ..., f(\bm{b}_m) must span the range of ff. This is because anything in the range of ff can be written as

w=f(iαiai+iβibi)=iαif(ai)+iβif(bi)=iβif(bi),\bm{w} = f \left( \sum_i \alpha_i \bm{a}_i + \sum_i \beta_i \bm{b}_i\right) = \sum_i \alpha_i f (\bm{a}_i) + \sum_i \beta_i f(\bm{b}_i) = \sum_i \beta_i f(\bm{b}_i) \,,

where we have used the fact that f(ai)=0f(\bm{a}_i) = \bm{0} by definition of the kernel. Now, in order to show that f(b1),...,f(bm)f(\bm{b}_1), ..., f(\bm{b}_m) is a basis, we need linear independence. To show this, suppose for contradiction that there exist nontrivial βi\beta_i such that

iβif(bi)=0.\sum_i \beta_i f(\bm{b}_i) = \bm{0} \,.

Then, by linearity,

f(iβibi)=0.f\left( \sum_i \beta_i \bm{b}_i \right) = \bm{0} \,.

But this means that iβibi\sum_i \beta_i \bm{b}_i must be a member of the kernel of ff, thus, there must exist αi\alpha_i such that

iβibi=iαiai.\sum_i \beta_i \bm{b}_i = \sum_i \alpha_i \bm{a}_i \,.

But this contradicts the assumption that the full basis a1,...,ak,b1,...,bm\bm{a}_1, ..., \bm{a}_k, \bm{b}_1, ..., \bm{b}_m is linearly independent since we have just found a nonzero set of coordinates that gives the zero vector,

iαiaiiβibi=0.\sum_i \alpha_i \bm{a}_i - \sum_i \beta_i \bm{b}_i = \bm{0} \,.

Thus, f(b1),...,f(bm)f(\bm{b}_1), ..., f(\bm{b}_m) must be a basis for the range of ff.

Now that we have this basis, we can easily prove the second part of the theorem as well. Let ci=f(bi)\bm{c}_i = f(\bm{b}_i). Expand [c1,...,cm][\bm{c}_1, ..., \bm{c}_m] to a basis [c1,..,cm][\bm{c}_1, .., \bm{c}_m] for all of WW. Then we have that

ci1=f(bi),di0=f(ai).\bm{c}_i \cdot 1 = f(\bm{b}_i) \,,\\ \bm{d}_i \cdot 0 = f(\bm{a}_i) \,.

Let v\bm{v} be an arbitrary array in Fm+kF^{m + k}. We multiply each equation by the corresponding entry of v\bm{v},

ci1vi=f(vibi),di0vm+i=f(vm+iai).\bm{c}_i \cdot 1 \cdot v_i = f(v_i \bm{b}_i) \,,\\ \bm{d}_i \cdot 0 \cdot v_{m + i} = f(v_{m + i} \bm{a}_i) \,.

Summing all these equalities and letting BV=[b1,...,bm,a1,...,ak]\mathcal{B}_V = [\bm{b}_1, ..., \bm{b}_m, \bm{a}_1, ..., \bm{a}_k] and BW=[c1,...,cm,d1,...,dm]\mathcal{B}_W = [\bm{c}_1, ..., \bm{c}_m, \bm{d}_1, ..., \bm{d}_m], we rewrite the above in matrix form,

BW[Irank(f)0null(f)]v=f(BVv).\mathcal{B}_W \left[\begin{array}{cc} \bm{I}_{\text{rank}(f)} & \\ & \bm{0}_{\text{null}(f)}\end{array}\right] \bm{v} = f(\mathcal{B}_V \bm{v}) \,.

Hence,

[f]BW,BV=[Irank(f)0null(f)].[f]_{\mathcal{B}_W, \mathcal{B}_V} = \left[\begin{array}{cc} \bm{I}_{\text{rank}(f)} & \\ & \bm{0}_{\text{null}(f)}\end{array}\right] \,.

From the discussion of matrix equivalence in the previous section, we therefore have the second part of the theorem.