JORDAN CANONICAL FORM

Steven Miller sjmiller@math.ohio-state.edu

 

 

I.     Introduction:

We’ve seen that not every matrix is diagonalizable. For example, consider

 

                                                (0       1)

                                                (0       0)

 

Then direct calculation shows that it is not diagonalizable. Why do we care about diagonalizing matrices? The main reason is ease of computation. If we can write A = S Λ S^{-1}, then A^{1000} = S Λ^{1000} S^{-1}, and the calculation can be performed very quickly. If we had to multiply A by itself 1,000 times, this would be very time consuming. Theoretically, we may not need such a time-saving method, but if we're trying to model any physical system or economic model, we're going to want to run calculations on a computer. And if the matrix is decently sized, these calculations will very quickly cause noticeable time-lags.
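
To see the time savings concretely, here is a minimal numerical sketch (using numpy; the 2x2 matrix is hypothetical, chosen only so its powers stay bounded):

    import numpy as np

    # Hypothetical 2x2 matrix with two distinct eigenvalues (1 and 0.7).
    A = np.array([[0.9, 0.2],
                  [0.1, 0.8]])

    # np.linalg.eig returns the eigenvalues and a matrix S whose columns
    # are the corresponding eigenvectors, so A = S diag(lam) S^{-1}.
    lam, S = np.linalg.eig(A)

    # A^1000 = S Lambda^1000 S^{-1}: only the diagonal entries get powered.
    A_1000 = S @ np.diag(lam ** 1000) @ np.linalg.inv(S)

    # Sanity check against repeated multiplication.
    assert np.allclose(A_1000, np.linalg.matrix_power(A, 1000))
    print(A_1000)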

 

Jordan Canonical Form is the answer. The question: what is the 'nicest' form into which we can put an arbitrary matrix? We already know that, to every eigenvalue, there corresponds an eigenvector. If an nxn matrix has n linearly independent eigenvectors, then it is diagonalizable. Hence,

 

Theorem 1: If an nxn matrix A has n distinct eigenvalues, then A is diagonalizable, and for the diagonalizing matrix S we can take the columns to be the n eigenvectors (S^{-1} A S = Λ).

 

In the proof of the above, we see all we needed was n linearly independent eigenvectors. So we obtain

 

Theorem 2: If an nxn matrix A has n linearly independent eigenvectors, then A is diagonalizable, and for the diagonalizing matrix S we can take the columns to be the n eigenvectors (S^{-1} A S = Λ).

 

Now consider the case of an nxn matrix A that does not have n linearly independent eigenvectors. Then we have

 

Theorem 3: If an nxn matrix A does not have n linearly independent eigenvectors, then A is not diagonalizable.

 

          Proof: Assume A is diagonalizable by the matrix S.
          Then S^{-1} A S = Λ, or A = S Λ S^{-1}.
          The standard basis vectors e_1, ..., e_n are eigenvectors of Λ,
          and as S is invertible, we get that S e_1, ..., S e_n are
          eigenvectors of A, and these n vectors are linearly
          independent. (Why?)  But this contradicts the fact that
          A does not have n linearly independent eigenvectors.
          Contradiction, hence A is not diagonalizable.

 

So, in studying what can be done to an arbitrary nxn matrix, we need only study matrices that do not have n linearly independent eigenvectors.

 

Jordan Canonical Form Theorem (JCF):

Let A be an nxn matrix. Then there exists an invertible matrix M such that M^{-1} A M = J, where J is a block diagonal matrix, and each block is of the form

 

                             ( λ   1                          )
                             (     λ   1                      )
                             (         λ   1                  )
                             (             ...                )
                             (                    λ   1       )
                             (                        λ       )

 

Note J^{1000} is much easier to compute than A^{1000}. In fact, there is an explicit formula for J^{1000} if you know the eigenvalues and the sizes of each block.
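
For a single k x k block J = λI + N, where N has 1's on the superdiagonal (so N^k = 0), the binomial theorem gives J^n = Σ_{j=0}^{k-1} C(n, j) λ^{n-j} N^j. A quick symbolic check of this formula for a 3x3 block (a sketch using sympy; the block size and the exponent 1000 are just for illustration):

    import sympy as sp

    lam, n = sp.symbols('lambda n')

    # A single 3x3 Jordan block J = lam*I + N, with N the superdiagonal nilpotent.
    N = sp.Matrix([[0, 1, 0],
                   [0, 0, 1],
                   [0, 0, 0]])
    J = lam * sp.eye(3) + N

    # Binomial formula: J^n = sum_{j=0}^{2} C(n, j) * lam^(n-j) * N^j.
    Jn = sp.zeros(3, 3)
    for j in range(3):
        Jn += sp.binomial(n, j) * lam**(n - j) * N**j

    # Verify against an explicit power, say n = 1000.
    diff = (J**1000 - Jn.subs(n, 1000)).applyfunc(sp.expand)
    assert diff == sp.zeros(3, 3)
    print(Jn)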

 

II. Notation:

Recall that λ is an eigenvalue of A if Det(A - λI) = 0, and v is an eigenvector of A with eigenvalue λ if Av = λv. We say v is a generalized eigenvector of A with eigenvalue λ if there is some number N such that (A - λI)^N v = 0. Note all eigenvectors are generalized eigenvectors.
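
As a quick check of these definitions on the 2x2 example from the introduction (a sketch; its only eigenvalue is λ = 0):

    import sympy as sp

    A = sp.Matrix([[0, 1],
                   [0, 0]])          # the non-diagonalizable example from Section I
    lam = 0                          # its only eigenvalue
    B = A - lam * sp.eye(2)

    e2 = sp.Matrix([0, 1])
    print(B * e2)                    # Matrix([[1], [0]]): e2 is NOT an eigenvector
    print(B**2 * e2)                 # Matrix([[0], [0]]): but (A - lam I)^2 e2 = 0,
                                     # so e2 is a generalized eigenvector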

 

For notational convenience, we write gev for generalized eigenvector, or λ-gev for a generalized eigenvector corresponding to λ.

 

We say the λ-eigenspace of A is the subspace spanned by the eigenvectors of A that have eigenvalue λ. Note that this is a subspace, for if v and w are eigenvectors with eigenvalue λ, then av + bw is either an eigenvector with eigenvalue λ or the zero vector.

 

We define the λ-generalized eigenspace of A to be the subspace of vectors killed by some power of (A - λI). Again, note that this is a subspace.

 

 

III. Needed Theorems:

 

Fundamental Theorem of Algebra: Any polynomial with complex coefficients of degree n has n complex roots (not necessarily distinct).

 

Cayley-Hamilton Theorem: Let p(λ) = Det(A - λI) be the characteristic polynomial of A. Let λ_1, ..., λ_k be the distinct roots of this polynomial, with multiplicities n_1, ..., n_k (so n_1 + ... + n_k = n). Then we can factor p(λ) as

p(λ) = (λ - λ_1)^{n_1} (λ - λ_2)^{n_2} * ... * (λ - λ_k)^{n_k},

 

and the matrix A satisfies

 

p(A) = (A - λ_1 I)^{n_1} (A - λ_2 I)^{n_2} * ... * (A - λ_k I)^{n_k} = 0.
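
A quick numerical illustration of Cayley-Hamilton (a sketch in sympy; the 3x3 matrix is hypothetical, with eigenvalue 2 of multiplicity 2 and eigenvalue 1 of multiplicity 1):

    import sympy as sp

    A = sp.Matrix([[2, 1, 0],
                   [0, 2, 0],
                   [0, 0, 1]])

    # eigenvals() returns {eigenvalue: algebraic multiplicity}.
    p_of_A = sp.eye(3)
    for lam, mult in A.eigenvals().items():
        p_of_A = p_of_A * (A - lam * sp.eye(3))**mult

    # The matrix satisfies its own characteristic polynomial: p(A) = 0.
    assert p_of_A == sp.zeros(3, 3)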

 

Schur's Lemma (Triangularization Lemma): Let A be an nxn matrix. Then there exists a unitary matrix U such that U^{-1} A U = T, where T is an upper triangular matrix.

 

Proof: Construct U by fixing one column at a time.
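
For reference, a numerical Schur triangularization is available in scipy (a sketch; the 3x3 matrix is hypothetical):

    import numpy as np
    from scipy.linalg import schur

    A = np.array([[0., 1., 2.],
                  [0., 0., 3.],
                  [1., 0., 0.]])

    # schur returns T (upper triangular) and Z (unitary) with A = Z T Z^H,
    # i.e. Z^{-1} A Z = T.
    T, Z = schur(A, output='complex')
    assert np.allclose(Z @ T @ Z.conj().T, A)
    assert np.allclose(np.tril(T, k=-1), 0)    # T really is upper triangular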

 

IV. Reduction to Simpler Cases:

In the rest of this handout, we will always assume A has eigenvalues λ_1, ..., λ_k, with multiplicities n_1, ..., n_k (so n_1 + ... + n_k = n). We will show that we can find n_1 λ_1-gev, n_2 λ_2-gev, ..., n_k λ_k-gev, such that these n vectors are linearly independent (LI). These will then form our matrix M.

 

So, if we can show that the n generalized eigenvectors are linearly independent, and that each one 'block diagonalizes' where it should, it is enough to study each λ separately.

 

For example, we'll show it's sufficient to consider λ = 0. Let λ be an eigenvalue of A. Then if v_j is a generalized eigenvector of A with eigenvalue λ, then v_j is a generalized eigenvector with eigenvalue 0 of B = A - λI:

 

          A v_j = λ v_j + v_{j-1}   →   B v_j = 0·v_j + v_{j-1}.

 

So, if we can find n_j LinIndep gev for B corresponding to 0, we've found n_j LinIndep gev for A corresponding to λ.

 

The next simplification is that if we can find n_j LinIndep gev for U^{-1} B U, then we've found n_j LinIndep gev for B. The proof is a straightforward calculation: let v_1, ..., v_m be the m LinIndep gev for U^{-1} B U; then U v_1, ..., U v_m will be m LinIndep gev for B.

 

Lemma 4: Let p(λ) = (λ - λ_1)^{n_1} (λ - λ_2)^{n_2} * ... * (λ - λ_k)^{n_k} be the char poly of A, so p(A) = (A - λ_1 I)^{n_1} (A - λ_2 I)^{n_2} * ... * (A - λ_k I)^{n_k}. For 1 ≤ i ≤ k, consider (A - λ_i I). This matrix has exactly n_i LinIndep generalized eigenvectors with eigenvalue 0, hence A has n_i LinIndep generalized eigenvectors with eigenvalue λ_i.

 

Proof: For notational simplicity, we'll prove this for λ = λ_1, and let's write m for the multiplicity of λ (so m = n_1). Further, by the above arguments we see it is sufficient to consider the case λ = 0. By the Triangularization Lemma, we can put B = A - λI (whose first eigenvalue is 0) into upper triangular form. What we need from the proof is that if we take the first column of U_1 to be v, where v is an eigenvector of B corresponding to eigenvalue 0, then the first column of T = U_1^{-1} B U_1 will be (0, 0, ..., 0)^T.

 

The lower (n-1)x(n-1) block of T, call it C_{n-1}, is upper triangular, hence the remaining eigenvalues of B appear as the entries on its main diagonal. Hence we can again apply the triangularization argument to C_{n-1}, and get an (n-1)x(n-1) unitary matrix U_{2b} such that U_{2b}^{-1} C_{n-1} U_{2b} = T_{n-1} has first column (0, 0, ..., 0)^T, and the rest is upper triangular. Hence we can form an nxn unitary matrix U_2

 

                   ( 1     0     0    ...    0 )
                   ( 0                         )
                   ( 0        U_{2b}           )
                   ( ...                       )
                   ( 0                         )

                                     

 

Then U_2^{-1} U_1^{-1} B U_1 U_2 =

 

                   (0       *        *        ...       *)

                   (0       0        *        ...       *)

                   (0       0        *        ...       *)

                   (...     ...       ...               ... )

                   (0       0        0        ...       *)

 

The net result is that we've now rearranged our matrix so that the first two entries on the main diagonal are zero. By 'triangularizing' like this m times, we can continue so that the upper mxm block is upper triangular with zeros along the main diagonal, and the remaining entries on the main diagonal are non-zero (as we are assuming the multiplicity of the eigenvalue 0 is m). Call this matrix T_m. Note there is a unitary U such that T_m = U^{-1} B U. Remember, T_m and B are nxn matrices, not mxm matrices.

 

Sublemma 1: At most m linearly independent vectors can be killed by powers of T_m.

Proof: direct calculation. Any power of T_m is still upper triangular, and its entries on the main diagonal are zero in the first m positions and non-zero in the remaining positions (because the multiplicity of the eigenvalue λ = 0 is exactly m). Hence the vectors e_{m+1}, e_{m+2}, ..., e_n are not killed by powers of T_m; more precisely, the non-zero diagonal entries in the last n - m positions force any power of T_m to have rank at least n - m, and so powers of T_m have a nullspace of dimension at most m.

 

 

We now show that exactly m linearly independent vectors are killed by powers of T_m. This follows immediately from

 

Sublemma 2: Let C be an mxm upper triangular matrix with zeros along the main diagonal. Then C^m is the zero matrix.

Proof: straightforward calculation, left to the reader.
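
A quick numerical check of Sublemma 2 (a sketch; the 4x4 size and the random entries are just for illustration):

    import numpy as np

    rng = np.random.default_rng(0)

    # Random 4x4 matrix that is zero on and below the main diagonal
    # (np.triu with k=1 keeps only the strictly upper triangular part).
    C = np.triu(rng.integers(-5, 6, size=(4, 4)), k=1).astype(float)

    # Each multiplication pushes the non-zero band one diagonal higher,
    # so the 4th power is already the zero matrix.
    assert np.allclose(np.linalg.matrix_power(C, 4), 0)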

 

Hence the nullspace of (T_m)^m (and of all higher powers) has dimension exactly m, which proves there are m generalized eigenvectors of B with eigenvalue λ = 0. These vectors are LinIndep: as T_m is upper triangular with zeros on the main diagonal in the first m positions, T_m has the m LinIndep gev e_1, ..., e_m with eigenvalue 0. As B = U T_m U^{-1}, B has m LinIndep gev U e_1, ..., U e_m with eigenvalue 0 (show that B cannot have any more LinIndep gev with λ = 0).

 

 

Returning to the proof of Lemma 4, we see that there are exactly n_1 LinIndep vectors killed by (A - λ_1 I)^{n_1}, ..., and exactly n_k LinIndep vectors killed by (A - λ_k I)^{n_k}.

 

The only reason we go through this triangularizing is to conclude that there are exactly n_i LinIndep vectors killed by (A - λ_i I)^{n_i}. Try to prove this fact directly!

 

 

Appendix: Representation of λ-Generalized Eigenvectors.

 

We know that if λ is an eigenvalue with multiplicity m, there are m generalized eigenvectors, satisfying (A - λI)^m v = 0. We describe a very useful way to write these eigenvectors. Let us assume there are t eigenvectors, say v_1, ..., v_t. We know there are m λ-gev. Note if v is a λ-gev, so is (A - λI)v, (A - λI)^2 v, ..., (A - λI)^m v. Of course, some of these may be the zero vector.

 

We claim that each eigenvector is the termination of some chain of λ-gev. In particular, we have

 

                   (A - λI) v_{1,a}   = v_{1,a-1}
                   (A - λI) v_{1,a-1} = v_{1,a-2}
                                ...
                   (A - λI) v_1       = 0             where v_1 = v_{1,1},

 

and

 

                   (A - λI) v_{2,b}   = v_{2,b-1}
                   (A - λI) v_{2,b-1} = v_{2,b-2}
                                ...
                   (A - λI) v_2       = 0             where v_2 = v_{2,1},

 

all the way down to

 

                   (A - λI) v_{t,r}   = v_{t,r-1}
                   (A - λI) v_{t,r-1} = v_{t,r-2}
                                ...
                   (A - λI) v_t       = 0             where v_t = v_{t,1},

 

and a + b + …+ r = m.

 

We emphasize that we have not shown that such a sequence of λ-gev exists. Later we shall show how to construct these vectors, and then in Lemma 8 we will prove they are linearly independent. For now, we assume their existence (and linear independence), and complete the proof of Jordan Canonical Form.

 

Let us say a λ-gev is a pure-gev if it is not an eigenvector. Thus, in the above we have t eigenvectors and m - t pure-generalized eigenvectors. For notational convenience, we often label the λ-generalized eigenvectors by v_1, ..., v_m. Thus, for a given j, we have (A - λI)v_j = 0 if v_j is an eigenvector, and (A - λI)v_j = v_{j-1} if v_j is a pure-gev.
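
As a small illustration of such a chain (a sketch; the 3x3 matrix below is a single Jordan block with λ = 2, so t = 1 and there is one chain of length 3):

    import sympy as sp

    lam = 2
    A = sp.Matrix([[2, 1, 0],
                   [0, 2, 1],
                   [0, 0, 2]])
    B = A - lam * sp.eye(3)

    e1, e2, e3 = sp.eye(3).col(0), sp.eye(3).col(1), sp.eye(3).col(2)

    # One chain terminating in the eigenvector e1:
    assert B * e3 == e2                 # (A - lam I) v_{1,3} = v_{1,2}
    assert B * e2 == e1                 # (A - lam I) v_{1,2} = v_{1,1}
    assert B * e1 == sp.zeros(3, 1)     # v_{1,1} is an eigenvector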

 

                  

V. Linear Independence of the λ-Generalized Eigenspaces.

 

Assume the n_1 gev corresponding to λ_1 are linearly independent amongst themselves, and the same for the n_2 gev corresponding to λ_2, and so on. We now show that the n gev are linearly independent. This fact completes the proof of Jordan Canonical Form (of course, we still must prove the n_i λ_i-gev are linearly independent).

 

Assume we have some linear combination of the n gev equaling zero. By LC λ_i-gev we mean a linear combination of the n_i λ_i-gev. (This is just to simplify notation.)

 

Then (LC λ_1-gev) + (LC λ_2-gev) + ... + (LC λ_k-gev) = 0.

 

We'll show first that the coefficients in the first linear combination are all zero. Recall that evaluating the characteristic polynomial at A gives

 

p(A) = (A - λ_1 I)^{n_1} (A - λ_2 I)^{n_2} * ... * (A - λ_k I)^{n_k}.

 

 

Define

                   g_1(A) = (A - λ_2 I)^{n_2} (A - λ_3 I)^{n_3} * ... * (A - λ_k I)^{n_k}.

 

g_1(A) kills (LC λ_2-gev), g_1(A) kills (LC λ_3-gev), ..., g_1(A) kills (LC λ_k-gev). Why? For example, for the λ_2-gev: they are all killed by (A - λ_2 I)^{n_2}, and hence, as g_1(A) contains this factor, they are all killed by g_1(A).

 

What does g_1(A) do to (LC λ_1-gev)? Again, for notational simplicity we'll write m for n_1, and v_1, ..., v_m for the corresponding m λ_1-gev.

 

We can look at it factor by factor, as all the different terms (A - λ_i I) commute.

 

Lemma 5: For i > 1, let the v_j's be the gev corresponding to λ_1.
If v_j is a pure-gev, then (A - λ_i I) v_j = (λ_1 - λ_i) v_j + v_{j-1}.
If v_j is an eigenvector, then (A - λ_i I) v_j = (λ_1 - λ_i) v_j.

 

Again, the proof is a calculation: if v_j is a pure-gev,

(A - λ_i I) v_j  =  (A - λ_1 I + λ_1 I - λ_i I) v_j
                 =  (A - λ_1 I) v_j + (λ_1 I - λ_i I) v_j
                 =  v_{j-1} + (λ_1 - λ_i) v_j.

The proof when v_j is an eigenvector is similar.

 

 

Now we examine g_1(A) ((LC λ_1-gev) + (LC λ_2-gev) + ... + (LC λ_k-gev)) = 0.
Clearly g_1(A) kills the last k-1 linear combinations, and we are left with

 

         g_1(A) (LC λ_1-gev) = 0.

 

Let's say the LC λ_1-gev = a_1 v_1 + ... + a_m v_m. We need to show that all the a_j's are zero. (Remember we are assuming the v_j's are linearly independent; we will prove this fact when we construct the v_j's.) Assume a_m ≠ 0. From our labeling, v_m is either an eigenvector, or a pure-gev that starts a chain leading to an eigenvector: v_m, (A - λ_1 I) v_m, (A - λ_1 I)^2 v_m, .... Note no other chain will contain v_m.

 

We claim that g_1(A) (LC λ_1-gev) will contain a non-zero multiple of v_m. Why? When each factor (A - λ_i I) hits a v_j, one gets back (λ_1 - λ_i) v_j + v_{j-1} if v_j is not an eigenvector, and (λ_1 - λ_i) v_j if v_j is an eigenvector. Regardless, we always get back a non-zero multiple of v_j, as λ_1 ≠ λ_i.

 

Hence direct calculation shows the coefficient of v_m in g_1(A) (LC λ_1-gev) is

                   a_m (λ_1 - λ_2)^{n_2} (λ_1 - λ_3)^{n_3} * ... * (λ_1 - λ_k)^{n_k}.

 

As we are assuming the different λ's are distinct (remember we grouped equal eigenvalues together and recorded the multiplicity), this product of factors is non-zero, so this coefficient can vanish only if a_m = 0. But g_1(A) (LC λ_1-gev) = 0 and the v_j's are linearly independent, so every coefficient in this expansion must be zero; hence a_m = 0. Similar reasoning implies a_{m-1} = 0, and so on. Hence we have proved:

 

Theorem 5: Assuming that the n_i generalized eigenvectors associated to the eigenvalue λ_i are linearly independent (for 1 ≤ i ≤ k), then the n generalized eigenvectors are linearly independent. Furthermore, there is an invertible M such that M^{-1} A M = J.

 

The only item not immediately clear is what M is. As an exercise, show that one may take the columns of M to be the generalized eigenvectors of A. They must be put in a special order. For example, one may group all the λ_1-gev together, then the λ_2-gev, and so on. For each i, order the λ_i-gev as follows: say there are t eigenvectors, which give chains v_1, ..., v_{1,a}, v_2, ..., v_{2,b}, ..., v_t, ..., v_{t,r}. Then this ordering works (exercise).

VI. Finding the λ-gev:

The above arguments show we need only find the n_i generalized eigenvectors corresponding to the eigenvalue λ_i; these will satisfy (A - λ_i I) v_j = 0 or (A - λ_i I) v_j = v_{j-1}. Moreover, we've also seen we may take λ_i = 0 without loss of generality. For notational convenience, we write λ for λ_i and m for n_i.

 

So we assume the multiplicity of λ = 0 is m. Hence in the sequel we show how to find m generalized eigenvectors of an mxm matrix whose mth power vanishes. (By the triangularization we have already done, finding m such generalized eigenvectors for this matrix is equivalent to finding m generalized eigenvectors for the original nxn matrix A.)

 

We define the following spaces, where A is our mxm matrix:

 

1.     N(A) = Nullspace(A). The dimension of this is the number of linearly independent eigenvectors, as we are assuming λ = 0.

2.     V_1 = W_1 = N(A).

3.     V_i = N(A^i), all vectors killed by A^i. Note that V_m is the entire space.

4.     W_i = {w ∈ N(A^i) such that w ⊥ N(A^{i-1})}, for 2 ≤ i ≤ m. (A computational sketch of these spaces follows this list.)
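
Here is a minimal sketch of how one might compute the dimensions of the spaces V_i in sympy (the 3x3 nilpotent matrix is hypothetical, with λ = 0, m = 3 and A^2 = 0); the W_i are then obtained by orthogonal complementation inside V_i:

    import sympy as sp

    A = sp.Matrix([[0, 0, 1],
                   [0, 0, 0],
                   [0, 0, 0]])

    m = A.shape[0]
    for i in range(1, m + 1):
        V_i = (A**i).nullspace()     # basis for V_i = N(A^i)
        print(f"dim V_{i} =", len(V_i))
    # Output: dim V_1 = 2, dim V_2 = 3, dim V_3 = 3.
    # W_1 = V_1, and W_2 is the orthogonal complement of V_1 inside V_2.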

 

For example, assume we are in R^3 and A^2 is the zero matrix. Let's consider V_2. For definiteness, assume V_1 is 1-dimensional and V_2 is 3-dimensional. W_1 is just V_1. The problem is, if y_1 and y_2 are two vectors killed by A^2 and not by A, then it is possible that y_1 - y_2 (or some linear combination) is killed by A.

[Figure: a line (W_1) through the origin in R^3, together with the plane (W_2) through the origin perpendicular to it.]
In the picture above, the line represents W_1 and the plane represents W_2. Anything in the 3-space is killed by A^2, and only those vectors along the line are killed by just A. It is possible to take two vectors in R^3 that are linearly independent, neither of which lies on the line, but whose difference does lie on the line.

 

Why are we constructing such spaces as W_2? Why isn't V_2 good enough? The reason is we want a very nice basis. The first basis vector will just be a vector in V_1 = W_1. For the other two directions, we can take two vectors perpendicular to W_1. (How? This is a 3-dimensional space; simply apply Gram-Schmidt.)

 

The advantage of such a basis is that if z_1 and z_2 are linearly independent vectors in W_2, then the only way a z_1 + b z_2 can be in W_1 is for a = b = 0. Why? W_2 is a subspace, and as z_1 and z_2 are perpendicular to W_1, so is their linear combination. So their linear combination is still in the plane perpendicular to W_1, and as long as a and b are not both zero, it is not the zero vector; hence it is killed by A^2 and not by A.

 

What we are really doing is partial orthogonal complementation: instead of finding the orthogonal complement of V_1 in R^m, we are finding the orthogonal complement of V_1 inside V_2.

 

Let L be the smallest integer such that A^L is the zero matrix. Then we only need to look up to W_L and V_L. V_L will be an m-dimensional space (as every vector is killed by A^L). We'll have a nice basis for V_L, consisting of the bases of W_1, ..., W_L. The advantage of this decomposition is that the spaces are mutually perpendicular, and if you have a linear combination of vectors in W_j, then the only way it can be in a W_b with b < j is if the combination is the zero vector.

 

Lemma 6: dim(W_{i-1}) ≥ dim(W_i), for i = 2, 3, ..., L.

Proof: Assume not; let N = dim(W_i) and suppose dim(W_{i-1}) < N. Consider a basis z_1, z_2, ..., z_N of W_i, and the vectors Az_1, ..., Az_N. Clearly each Az_j is in V_{i-1}. We claim each Az_j must have some component in W_{i-1}. Why? The smallest power of A that kills z_j is A^i. If Az_j had no component in W_{i-1}, then Az_j would lie in V_{i-2}, so A^{i-2} would kill Az_j and hence A^{i-1} would kill z_j, a contradiction.

 

Let P = P_{i-1} be the orthogonal projection operator from V_{i-1} onto W_{i-1}. Note P^2 = P, and by the above argument each vector Az_1, ..., Az_N has a non-zero component in W_{i-1}. Therefore the N vectors PAz_1, ..., PAz_N are N non-zero vectors in W_{i-1}.

 

As we are assuming that dim(W_{i-1}) < dim(W_i) = N, the N vectors PAz_1 through PAz_N cannot be linearly independent, for the dimension of a subspace is the maximal number of linearly independent vectors one can have in that space. Hence there exist constants, not all zero, such that

 

                                      a_1 PAz_1 + ... + a_N PAz_N = 0.

 

Hence PA(a_1 z_1 + ... + a_N z_N) = 0. By the definition of W_i, the linear combination a_1 z_1 + ... + a_N z_N is in W_i. Therefore, the smallest power of A that kills it is A^i, unless it is the zero vector; and as we are assuming the vectors z_1 through z_N are linearly independent, it is the zero vector only if a_1 = ... = a_N = 0.

 

As i > 1, A cannot kill a_1 z_1 + ... + a_N z_N unless it is the zero vector. Could PA kill it? No: by definition, if a_1 z_1 + ... + a_N z_N is not the zero vector, then it is a non-zero vector in W_i. Therefore A(a_1 z_1 + ... + a_N z_N) has a non-zero component in W_{i-1} (if not, that would contradict a_1 z_1 + ... + a_N z_N being in W_i, by the claim above). Therefore PA(a_1 z_1 + ... + a_N z_N) cannot be zero. Hence the only way PA(a_1 z_1 + ... + a_N z_N) can be the zero vector is if a_1 z_1 + ... + a_N z_N is the zero vector, which forces a_1 = ... = a_N = 0. Contradiction. QED.

 

REMARK: By Lemma 6, we see our previous example (dim(W_1) = 1 and dim(W_2) = 2) is impossible. An example that is consistent with Lemma 6 is to consider R^5, let V_1 = W_1 be three-dimensional, and take W_2 to be a plane perpendicular to W_1.

 

 

Lemma 7: dim(W_i) ≥ 1 for i = 1, 2, ..., L.

Proof: As L is the smallest integer such that A^L is the zero matrix, there must be a vector killed by A^L but not by A^{L-1}. Whence dim(W_L) is at least 1, so by Lemma 6 we obtain that dim(W_i) is at least 1 for i = 1, 2, ..., L.

We now show how to construct the m generalized eigenvectors. We find bases for the spaces W_1, W_2, ..., W_L. We then use A to 'pull back'.

 

It's easier to explain by an example. Let's take m = 12 and L = 5, and for definiteness' sake consider the following dimensions:

 

             V_1            V_2            V_3           V_4          V_5      V_6
             W_1            W_2            W_3           W_4          W_5      W_6
dim W:        4              3              2             2            1        0
basis:    u_1,...,u_4    v_1,...,v_3     w_1, w_2      x_1, x_2        y

pullback:   A^4 y   ←     A^3 y   ←     A^2 y   ←      A y    ←       y

Now, W_4 is 2-dimensional.

 

WE DO NOT KNOW THAT Ay IS IN W_4 !!! IT IS QUITE POSSIBLE THAT Ay IS KILLED BY A^4 AND NO SMALLER POWER OF A WITHOUT BEING IN W_4 !!!

 

We know y is killed by A^5 and no smaller power of A; hence Ay is killed by A^4 and no smaller power of A. But this does not mean that Ay is in W_4. Fortunately, there is a huge degree of non-uniqueness in the Jordan Canonical Form. We do not need Ay to be in W_4; all we need is for Ay (and A^2 y, A^3 y, ...) to be killed by A^4 and nothing lower (A^3 and nothing lower, ...). We'll see below how to handle this.

 

So for now, all we know is that Ay is in V_4 with a non-zero projection in W_4, that A^2 y is in V_3 with a non-zero projection in W_3, and so on.

 

W_4 is 2-dimensional. Choose a vector x in W_4 such that x is linearly independent with the projection of Ay onto W_4. Then this x gives us another Jordan block:

             V_1            V_2            V_3           V_4          V_5      V_6
             W_1            W_2            W_3           W_4          W_5      W_6
dim W:        4              3              2             2            1        0
basis:    u_1,...,u_4    v_1,...,v_3     w_1, w_2      x_1, x_2        y

pullback:   A^4 y   ←     A^3 y   ←     A^2 y   ←      A y    ←       y
pullback:   A^3 x   ←     A^2 x   ←     A x     ←      x

 

We continue the game (noting that Ax is in V_3, but not necessarily in W_3). We already have two candidates for directions in V_3, namely A^2 y and Ax. We'll show later that though they are not necessarily in W_3, they are killed by A^3 and not by A^2, and that their projections onto W_3 are linearly independent.

 

We need to find 3 directions in W_2. The projections of A^3 y and A^2 x give us at most two (these two directions could be the same; again, we will show later that this cannot happen). As W_2 is a 3-dimensional space, we can find a vector v in W_2 that is linearly independent with the projections of A^3 y and A^2 x:

             V_1            V_2            V_3           V_4          V_5      V_6
            A^4 y   ←     A^3 y   ←     A^2 y   ←      A y    ←       y
            A^3 x   ←     A^2 x   ←     A x     ←      x
            A v     ←     v

 

We then have to find four directions in W_1, and have three candidates (A^4 y, A^3 x, and Av). We'll see later that these three candidates are linearly independent, hence we can find a fourth vector u linearly independent with the rest:

 

 

 

             V_1            V_2            V_3           V_4          V_5      V_6
            A^4 y   ←     A^3 y   ←     A^2 y   ←      A y    ←       y
            A^3 x   ←     A^2 x   ←     A x     ←      x
            A v     ←     v
              u

 

We now have enough (m = 12) candidates. We will show that they are linearly independent.

 

First we prove that A^2 y and Ax are linearly independent; then we will prove A^3 y and A^2 x are linearly independent, and so on. (Proof suggested by L. Fefferman and O. Pascu.) Assume a A^2 y + b Ax = 0. Then A(a Ay + b x) = 0. But Ay has a non-zero projection in W_4, and we've chosen x to be linearly independent in W_4 with the projection of Ay. Therefore the smallest power of A that can kill a Ay + b x is A^4, unless it is the zero combination. Hence the only way it can be killed by A is if a = b = 0.

 

Similarly, assume a A^3 y + b A^2 x = 0. Then A^2(a Ay + b x) = 0, and by the same argument as above, a = b = 0. Note that we did not need Ay to be in W_4, only that it had a non-zero projection there.

 

By construction, v is linearly independent with the projections of A^3 y and A^2 x onto W_2. What about A^4 y, A^3 x, and Av? Assume a A^4 y + b A^3 x + c Av = 0. Then again we obtain A(a A^3 y + b A^2 x + c v) = 0, and the construction of v forces a = b = c = 0.

 

Lemma 8: Assume now some linear combination of the m generalized eigenvectors constructed above is zero. Then all the coefficients are zero.

Proof:

             V_1            V_2            V_3           V_4          V_5      V_6
            A^4 y   ←     A^3 y   ←     A^2 y   ←      A y    ←       y
            A^3 x   ←     A^2 x   ←     A x     ←      x
            A v     ←     v
              u

 

Assume the coefficient of y, call it a, is non-zero. Then we have

          a y = - (rest of terms), where the rest of the terms are killed by A^4, but y isn't.

Applying A^4 to both sides, the right side vanishes while the left side is a A^4 y, which is non-zero unless a = 0. Hence a must be zero.

 

Now assume the coefficients of Ay and x are a and b, respectively. Then

          a Ay + b x = - (rest of terms), where the rest of the terms are killed by A^3.

As x is linearly independent with the projection of Ay onto W_4, a Ay + b x is killed by A^4 and not by A^3 unless this combination is the zero vector. As the rest of the terms are killed by A^3, applying A^3 to both sides implies a = b = 0.

 

Continuing to argue in this way, we obtain that all the coefficients are zero, and hence the generalized eigenvectors are linearly independent.

 

We can now build up our matrices M and J! At last!

 

For each λ, we have associated generalized eigenvectors. For definiteness' sake, let's consider the above case; I'll leave the generalization to you. We have 4 blocks corresponding to λ = 0: one block with 5 generalized eigenvectors (starting with the eigenvector A^4 y and ending with y), another block with 4 gev (starting with the eigenvector A^3 x and ending with x), another block of length 2, and one of length 1.

 

We can order the blocks any way we want; that will just change the order of the blocks in J. However, within each block we must write the vectors starting with the eigenvector on the far left and going to the highest generalized eigenvector on the far right:

 

          (   A^4 y   A^3 y   A^2 y   A y   y   A^3 x   A^2 x   A x   x   A v   v   u   )

 

Another possible arrangement would be

 

          (   A^3 x   A^2 x   A x   x   A^4 y   A^3 y   A^2 y   A y   y   A v   v   u   )

 

and so on. I'll leave it to you to verify that M^{-1} A M = J.

VII. Calculation Shortcut:

When trying to find bases for the spaces W_i, there is a nice shortcut. First, we find a basis for V_i, or start to. How do we find vectors killed by A^i? We just have to find the nullspace of A^i. We do this by Gaussian elimination, reducing A^i to an upper triangular matrix U. We can assume we've already found bases for W_1 through W_{i-1}, or equivalently, a basis for V_{i-1}. Let's say the basis for V_{i-1} is b_1, b_2, ..., b_q. Then if we add these q vectors as rows to U, forming a new matrix U', we observe the following (a computational sketch follows the two observations):

 

(1)  If U'v = 0, then v is killed by A^i (since the first m rows of U' are just the rows of U, which has the same nullspace as A^i).

(2)  If U'v = 0, then v is perpendicular to W_1 through W_{i-1}: this follows immediately from the fact that we put the basis vectors of W_1 through W_{i-1} as the last q rows of U', so U'v = 0 forces v to be perpendicular to these spaces.
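
A sketch of this shortcut in sympy (the 3x3 nilpotent matrix is hypothetical; vstack plays the role of appending the basis vectors of V_{i-1}, written as rows, underneath A^i):

    import sympy as sp

    # Hypothetical 3x3 example: a single nilpotent Jordan block (eigenvalue 0, m = 3).
    A = sp.Matrix([[0, 1, 0],
                   [0, 0, 1],
                   [0, 0, 0]])

    W_bases = [A.nullspace()]                          # W_1 = V_1 = N(A)
    for i in range(2, 4):
        prev_basis = [v for W in W_bases for v in W]   # basis of V_{i-1}
        # U'v = 0 forces A^i v = 0 and v perpendicular to V_{i-1}.
        U_prime = sp.Matrix.vstack(A**i, *[v.T for v in prev_basis])
        W_bases.append(U_prime.nullspace())

    print([len(W) for W in W_bases])                   # dims of W_1, W_2, W_3 -> [1, 1, 1]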

 

Also, if an eigenvalue has multiplicity 3 or less, counting the number of linearly independent eigenvectors determines the Jordan form. Why? If there are 3 LI eigenvectors, that piece is diagonalizable. If there is only 1, it must be a single 3x3 block. If there are 2, we must have a 2x2 and a 1x1 block. Note we have no idea what M looks like from this count alone.

 

Also note that this argument fails for multiplicity 4 and greater. If we have multiplicity 4 and 2 eigenvectors, the blocks could be 2x2 and 2x2, or they could be 3x3 and 1x1.

 

Note the difference between theory and practice: theoretically, we know that bases for the different W_i exist, so with a wave of the hand we have them to work with. But if we were actually going to Jordanize large matrices, finding bases for all these spaces takes time, and we don't always need all those basis elements. Often it's enough to just find vectors in V_i; for example, show that if instead of taking y in W_L we took y in V_L (killed by A^L but not by A^{L-1}), the pullback process would still work. Then there could be many i where we've already pulled back all the vectors we need, and hence there would be no need to find a basis there. If this doesn't make too much sense, don't worry: it's late at night here for me, and at this stage in your life, you won't be dealing with terrible Jordanizations where this would really make a difference. I just want to emphasize that often you can come up with a theoretical line of argument that, in practice, will yield the correct answer, but be so computationally inefficient that a better way is greatly desired.

 

SUMMARY: HOW TO JORDANIZE:

STEP 1: Find the eigenvalues, their multiplicities, and all the eigenvectors.

STEP 2: For each eigenvalue λ with multiplicity m, calculate (A - λI), (A - λI)^2, ..., (A - λI)^m.

STEP 3: Find bases for the spaces W_i described above. This will also yield bases for the V_i. Use the calculation shortcut to find the bases.

STEP 4: 'Pull back' vectors as described, adding in vectors linearly independent with the projections as needed.
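
Finally, to check a Jordanization done by hand, sympy has a built-in routine that produces both M and J at once (a sketch; the 3x3 matrix is hypothetical, with a 2x2 block for eigenvalue 2 and a 1x1 block for eigenvalue 3):

    import sympy as sp

    A = sp.Matrix([[2, 1, 0],
                   [0, 2, 1],
                   [0, 0, 3]])

    # jordan_form returns (M, J) with A = M J M^{-1}, i.e. M^{-1} A M = J.
    M, J = A.jordan_form()
    assert (M.inv() * A * M - J).applyfunc(sp.simplify) == sp.zeros(3, 3)
    print(J)      # block diagonal: [[2, 1, 0], [0, 2, 0], [0, 0, 3]]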