NOTES ON LINEAR ALGEBRA

 

CONTENTS:

[1] MULTIPLYING MATRICES

[2] GAUSSIAN ELIMINATION

[3] INVERTING MATRICES

 

 

[1] MULTIPLYING MATRICES:

 

For ease of presentation, I will NOT draw the parentheses around the matrices correctly. If I were to, I’d have to use the Equation Editor (which takes more time). If anyone wants to TeX up these notes, please feel free to do so!

 

Let’s say we have the matrix A =

 

            (1 2)

            (3 4)

 

And we want to multiply it by the column vector v =

 

            (5)

            (6)

 

The answer is Av =

 

            (1 2) (5)       (1*5 + 2*6)      (17)

            (3 4) (6)   =  (3*5 + 4*6)  =  (39)

 

 

Let’s do another example. Let B =

 

            (2 7)

            (3 5)

 

and let the vector w =

 

            (1)

            (3)

 

 

Then B w =

 

            (2 7) (1)       (2*1 + 7*3)      (23)

            (3 5) (3)   =  (3*1 + 5*3)  =  (18)

 

 

Let’s study a bigger matrix now. Let C =

 

            (1 2)

            (3 4)

            (5 6)

 

and consider the vector x =

 

            (2)

            (1)

 

Then C x =

 

            (1 2)     (2)          (1*2 + 2*1)        ( 4 )

            (3 4)     (1)     =   (3*2 + 4*1)   =   (10)

            (5 6)                    (5*2 + 6*1)        (16)

 

And finally, let’s look at D =

 

            (1 2 0)

            (3 4 2)

            (5 6 3)

 

and the vector y =

 

            (1)

            (0)

            (2)

 

Then the product D y =

 

            (1 2 0)  (1)      (1*1 + 2*0 + 0*2)      ( 1)

            (3 4 2)  (0)  =  (3*1 + 4*0 + 2*2)  =  ( 7)

            (5 6 3)  (2)      (5*1 + 6*0 + 3*2)      (11)

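If you'd like to check this kind of computation on a computer, here is a small sketch in Python (my own choice of language; the helper name mat_vec is mine too, not anything standard) of the row-times-column recipe we have been using:

def mat_vec(A, v):
    # A is a list of rows, v is a list of numbers.
    # Each entry of the answer is (a row of A) dotted with v.
    return [sum(a * x for a, x in zip(row, v)) for row in A]

D = [[1, 2, 0],
     [3, 4, 2],
     [5, 6, 3]]
y = [1, 0, 2]
print(mat_vec(D, y))   # [1, 7, 11], matching the computation above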
 

 

 

This is basically how to multiply a matrix by a column vector. Now we want to study how to multiply two matrices together. We have the following rule, which we proved:

 

Matrix Multiplication Rule:

          Let’s say A has Ra rows and Ca columns, and B has Rb rows and Cb columns. This means A is an Ra x Ca matrix, and B is an

Rb x Cb matrix. Then we can do the multiplication AB if and only if Ca = Rb, and the resulting matrix AB has Ra rows and Cb columns.

 

For example, if A is 3x4 and B is 4x2, then we can do the multiplication AB, and the product AB is a 3x2 matrix; however, we cannot do the multiplication BA, for 2 ≠ 3.

 

 

Let’s do some examples: Let the matrices A and B be (respectively)

 

          (1 2)            and              (5 6)

          (3 4)                               (7 8)

 

Then in this case we can multiply in EITHER order, as both are 2x2. Let’s do AB =

 

          (1 2)  (5 6)

          (3 4)  (7 8)

 

The way we multiply matrices is column by column. To find the first column of the product, we multiply the matrix A by the first column of B; that result is the first column of AB. To find the second column of the product, we multiply A by the second column of B.

 

Step 1: Finding the first column of the product:

          (1 2)  (5)      (1*5 + 2*7)      (19)

          (3 4)  (7)  =  (3*5 + 4*7)  =  (43)

 

Step 2: Finding the second column of the product:

          (1 2)  (6)      (1*6 + 2*8)      (22)

          (3 4)  (8)  =  (3*6 + 4*8)  =  (50)

 

Step 3: Combining the above:

 

          (1 2)  (5 6)      (19 22)

          (3 4)  (7 8)  =  (43 50) 

 

Let’s do a harder one: Let the matrices C and D be (respectively)

 

          (1 2 3)                  (3 0)

          (4 5 6)         and    (1 2)

          (2 1 0)                  (0 5)

 

First, let’s check and make sure we can multiply CD. C is 3x3, D is 3x2, so yes we can, and the product will be 3x2.

 

Step 1: C times the first column of D gives the first column of CD

 

          (1 2 3)   (3)      (1*3 + 2*1 + 3*0)                ( 5)

          (4 5 6)   (1)  =  (4*3 + 5*1 + 6*0)    =        (17)

          (2 1 0)   (0)      (2*3 + 1*1 + 0*0)               ( 7)

 

Step 2: C times the second column of D gives the second column of CD

 

          (1 2 3)   (0)      (1*0 + 2*2 + 3*5)                (19)

          (4 5 6)   (2)  =  (4*0 + 5*2 + 6*5)    =        (40)

          (2 1 0)   (5)      (2*0 + 1*2 + 0*5)               (  2)

 

Step 3: Combining the above yields CD =

 

          (1 2 3)  (3 0)         (5   19)

          (4 5 6)  (1 2)    =   (17 40)

          (2 1 0)  (0 5)         ( 7   2 )
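
Here is the column-by-column recipe as a Python sketch (again, the names mat_vec and mat_mul are just my own, and this is only meant to mirror the hand method above):

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def mat_mul(A, B):
    # The j-th column of AB is A times the j-th column of B.
    cols = [mat_vec(A, [row[j] for row in B]) for j in range(len(B[0]))]
    # cols holds the columns of AB; turn them back into rows.
    return [[cols[j][i] for j in range(len(cols))] for i in range(len(A))]

C = [[1, 2, 3], [4, 5, 6], [2, 1, 0]]
D = [[3, 0], [1, 2], [0, 5]]
print(mat_mul(C, D))   # [[5, 19], [17, 40], [7, 2]], the matrix CD found above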

 

 

 

 

 

 

 

 

[2] GAUSSIAN ELIMINATION:

 

Matrices can be used to represent systems of equations, which we then try to solve. For example, let’s say we have the two equations:

 

          3x + 2y = 5

          4x + 5y = 7

 

Then we can write this in matrix form by

 

          (3 2) (x)      (5)

          (4 5) (y)  =  (7)

 

Or, if we had the three equations

 

          3x + 2y + 5z = 8

          2x + 2y + 4z = 7

          7x + 9y + 0z = 1

 

Then we can write this in matrix form by

 

          (3 2 5) (x)      (8)

          (2 2 4) (y)  =  (7)

          (7 9 0) (z)      (1)

 

Now, we want to find a way to solve such systems of equations. Let’s start with an easy example:

 

          1x + 2y = 1

          3x + 7y = 2

 

We can write this in matrix form by

 

          (1 2) (x)      (1)

          (3 7) (y)  =  (2)

 

Now, let’s look at the two equations. If we multiply the first equation by -3 we get: -3x -6y = -3. If we then add this to the second equation (3x + 7y = 2) we get a new second equation:

 

                     3x + 7y =   2

+                 -3x  - 6y  = -3

                   ------------------

                     0x  + 1y = -1

 

So now we have the two equations

 

          1x + 2y =  1

          0x + 1y = -1

 

which we can write in matrix form as

 

          (1 2) (x)      ( 1)

          (0 1) (y)  =  (-1)

 

We started with the matrix

 

          (1 2) (x)      (1)

          (3 7) (y)  =  (2)

 

If we multiply the first row by -3 and add that to the second row, we get the matrix

 

          (1 2)

          (0 1)

 

And if we multiply the 1 on the right hand side by -3 and add it to the 2, we get the vector

 

          ( 1)   

          (-1)

 

So we see we can symbolically represent multiplying and adding equations by multiplying and adding rows. Slowly, here goes:

 

 

The goal is to reduce the matrix to something easy to work with, namely something with all zeros below the diagonal.

 

We start with

 

          (1 2) (x)      (1)

          (3 7) (y)  =  (2)

 

Step 1: What do we need to multiply the first row by to cancel the 3 in the second row? Or, find ‘a’ such that 1a + 3 = 0, hence a = -3. We then multiply the first row by -3, and write the result under the second row.

Question 1: why do we multiply the first row by -3? Because that’s what we need to cancel the 3 in the second row.

Question 2: why do we write the result under the second row? Because

that’s where we’re adding the result.

Remember, you must also multiply 1 by -3 and add it to 2. Why? Equality: whatever you do to one side of the equation, you must do to the other.

 

[-3]

          (1 2) (x)      (1)

          (3 7) (y)  =  (2)

           -3  -6                     -3

 

          (1 2) (x)      ( 1)

          (0 1) (y)  =  (-1)

 

Step 2: We can now read off the answers! The two equations are:

 

          1x + 2y = 1

          0x + 1y = -1

 

So we learn from the second equation that y = -1. We then substitute that value into the first equation and get 1x + 2(-1) = 1, so x = 3. We can check this by substituting these values for x and y into the original equations:

         

            1x + 2y = 1  →      1(3) + 2(-1) = 1

            3x + 7y = 2  →      3(3) + 7(-1) = 2

 

So yes, these values work.

 

Let’s do a slightly harder example.

 

Consider the following:

 

          (1 2 3) (x)    (  2)

          (2 3 0) (y)  = (  1)

          (3 0 1) (z)      (10)

 

Step 1: we want to get a matrix that has all zeros under the diagonal. So we need to get rid of the 2 in the second row and the 3 in the third row. To get rid of the 2 in the second row, we multiply the first row by -2 and add the result to the second row; to get rid of the 3 in the third row, we multiply the first row by -3 and add the result to the third row. Remember, we write the results of the multiplication under the row we’re going to add it to, and remember we MUST also do the multiplication on the right hand side. So we must multiply the 2 on the right hand side by -2 (for the second row) and by -3 (for the third row).

 

          (1 2 3) (x)    ( 2)

[-2]      (2 3 0) (y)  = ( 1)

                 -2 -4 -6                  -4

[-3]      (3 0 1) (z)     (10)

                 -3 -6 -9                  -6           

 

This gives

 

          (1  2   3)(x)       (2)

          (0 -1 -6)(y)   =  (-3)

          (0 -6 -8)(z)       ( 4)

 

We’re almost there – we now need to get rid of the -6 in the third row. Then we’ll have a matrix with all zeros under the main diagonal, and we’ll be able to read off the answers.

 

Step 2: We need to get rid of the -6 in the third row. There’s nothing we can multiply the first row by. Why? If we add copies of the first row to the third, we’ll lose the 0 which starts off the third row. What we should do is multiply the second row by something and add it to the third, as this way we won’t lose the zero. So, we need to find ‘a’ such that (-1)a + (-6) = 0, hence a = -6.

 

          (1  2   3)(x)       (2)

          (0 -1 -6)(y)   =  (-3)

[-6]      (0 -6 -8)(z)       ( 4)

                  0    6  36                    18

 

This yields

 

          (1  2   3)(x)       ( 2)

          (0 -1 -6)(y)   =  (-3)

          (0  0 28)(z)       (22)

 

Step 3: We can now read off the answers! The three equations are

 

          1x + 2y + 3z  = 2

          0x  - 1y  - 6z  = -3

          0x  + 0y +28z = 22

 

So z = 22/28  = 11/14

So -y - 6(11/14) = -3 → y = -24/14

So x + 2(-24/14) + 3(11/14) = 2 → x = 43/14

 

Let’s check these numbers in the original equations:

 

            1x + 2y + 3z = 2  →  1(43/14) + 2(-24/14) + 3(11/14) = 2

            2x + 3y + 0z = 1  →  2(43/14) + 3(-24/14) + 0(11/14) = 1

            3x + 0y + 1z = 10 → 3(43/14) + 0(-24/14) + 1(11/14) = 10

 

So we see we do obtain the correct answer! (If it makes you feel better, I got wrong answers the first two times I did the problem – I did the algebra wrong).
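
If you want to let a machine do the bookkeeping, here is a rough sketch in Python of the method above: forward elimination to get zeros under the diagonal, then read the answers off from the bottom up. The names are mine, it works with exact fractions so 11/14 stays 11/14, and it assumes we never hit a zero on the diagonal (true for this example, but not for every system).

from fractions import Fraction

def gaussian_solve(A, b):
    A = [[Fraction(x) for x in row] for row in A]
    b = [Fraction(x) for x in b]
    n = len(A)
    # Forward elimination: make everything below the diagonal zero.
    for i in range(n):
        for k in range(i + 1, n):
            m = -A[k][i] / A[i][i]              # multiply row i by m and add to row k
            A[k] = [akj + m * aij for akj, aij in zip(A[k], A[i])]
            b[k] = b[k] + m * b[i]              # remember: do the same to the right hand side!
    # Back substitution: solve the last equation first, then work upward.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x

A = [[1, 2, 3], [2, 3, 0], [3, 0, 1]]
b = [2, 1, 10]
print([str(v) for v in gaussian_solve(A, b)])   # ['43/14', '-12/7', '11/14']  (-12/7 is -24/14)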

 

 

 

 

 

 

 

 

 

 

[3] INVERTING MATRICES:

 

We’re now ready to use the method of Gaussian Elimination to invert matrices. Let’s review how Gaussian Elimination works. We start off with a matrix A and we do row operations to it. This is equivalent to multiplying A by several matrices E1, E2, ..., En (say).

 

For simplicity, let’s assume it takes 5 steps to Gaussian Eliminate A to the Identity matrix, so E5 E4 E3 E2 E1 A = I. Then E5 E4 E3 E2 E1 = A-1, the inverse matrix to A.

 

To keep track of these steps, we can just form E5 E4 E3 E2 E1 I, which by the above is A-1.

 

An example should illustrate.

 

Let’s try to find the inverse to A =

 

          (1 2)

          (3 5)

 

THE GOAL: We will use Gaussian Elimination to get A to the identity matrix (ones on the main diagonal, zeros elsewhere). We will keep track of the Gaussian Elimination by acting on the Identity matrix.

 

Step 1: Write the matrix A followed by the identity:

 

          (1 2)            (1 0)

          (3 5)            (0 1)

 

Step 2: We need to eliminate the 3 in the second row, so we must find ‘a’ such that 1a + 3 = 0.  Hence a = -3. So we multiply the first row of A by -3 and add it to the second row. And remember, by EQUALITY, we must do the same to the other side, to the Identity.

 

          (1 2)            (1 0)

[-3]      (3 5)            (0 1)

           -3 -6                      -3  0   

 

 

          (1   2)          (1  0)

          (0 -1)           (-3 1)

 

 

 

Step 3: Now, we want to have all 1s along the main diagonal, so we might as well adjust the second row right now. We have a -1, where we want a 1. So we must multiply the second row by -1. Again, must do this to both sides:

 

          (1   2)          (1  0)

          (0 -1)           (-3 1)

                  0    1                        3   -1

 

Hence we get

 

          (1 2)            (1  0)

          (0 1)            (3 -1)

 

Step 4: Now we need to get rid of the 2 in the first row, so we multiply the second row by -2 and get:

 

          (1 2)            (1  0)

           0  -2                         -6   2

          (0 1)            (3 -1)

 

and we get

 

          (1 0)            (-5 2)

          (0 1)            ( 3 -1)

 

Note: as a check, you can go through and see that

 

          (-5  2)

          ( 3  -1)

 

is the inverse to A.

 

 

Let’s do one more problem. Let’s find the inverse for B =

 

          (9 4)

          (7 3)

 

Step 1: Write the matrix B followed by the Identity:

 

          (9 4)            (1 0)

          (7 3)            (0 1)

 

Step 2: What should we multiply the first row by to get rid of the 7 in the second row? So, find ‘a’ such that 9a + 7 = 0, or a = -7/9.

 

          (9 4)            (1  0)

[-7/9]    (7 3)            (0  1)

           -7  -28/9 -7/9  0

 

And we get

 

          (9     4)                  (1     0)

          (0 -1/9)                  (-7/9 1)

 

Step 3: We want to end up with the identity matrix on the left. We have -1/9 in the lower diagonal – we need to multiply the second row by -9 to get 1.

 

          (9     4)                  (1     0)

[-9]      (0 -1/9)                  (-7/9 1)

            0       1                                      7     -9

 

And we get

 

          (9 4)                      (1   0)

          (0 1)                      (7  -9)

 

 

 

 

 

 

Step 4: We need to get rid of the 4 in the first row, so we multiply the second row by -4 and add it to the first

 

[-4]      (9 4)                      (1   0)

            0 -4                                        -28  36

          (0 1)                      (7  -9)

 

And we get

 

          (9 0)            (-27 36)

          (0 1)            (  7   -9)      

 

Step 5: We need to have the identity on the left. We have a 9 in the upper left corner, so we must multiply the first row by 1/9.

 

[1/9]     (9 0)            (-27 36)

            1  0                           -3       4

          (0 1)            (  7   -9)

 

And we get

 

          (1 0)            (-3 4)

          (0 1)            ( 7 -9)

 

You can check that this is the inverse of B by doing the multiplication.
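
Here is the same bookkeeping as a Python sketch: carry the Identity along while we reduce the matrix, and whatever the Identity turns into is the inverse. (The name invert is mine; the sketch uses exact fractions and assumes every pivot it meets is non-zero, which is fine for the examples above.)

from fractions import Fraction

def invert(A):
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    I = [[Fraction(1) if i == j else Fraction(0) for j in range(n)] for i in range(n)]
    for i in range(n):
        # Scale row i so the diagonal entry becomes 1 (do the same to the Identity).
        p = A[i][i]
        A[i] = [x / p for x in A[i]]
        I[i] = [x / p for x in I[i]]
        # Clear out column i in every other row.
        for k in range(n):
            if k != i:
                m = -A[k][i]
                A[k] = [akj + m * aij for akj, aij in zip(A[k], A[i])]
                I[k] = [ikj + m * iij for ikj, iij in zip(I[k], I[i])]
    return I

B = [[9, 4], [7, 3]]
print(invert(B))   # the entries come out as Fractions: -3, 4 in the first row and 7, -9 in the second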

 

 

 

 

 

 

 

 

 

 

 

 

 

NOTES ON LINEAR ALGEBRA

 

CONTENTS:

[4] MATRIX ADDITION

[5] MATRIX NOTATION

[6] TRANSPOSE

[7] SYMMETRIC MATRICES

[8] BASIC FACTS ABOUT MATRICES

 

 

[4] MATRIX ADDITION

Let A and B be two matrices. When can we add them, and what is the answer? We define matrix addition by adding componentwise. For example:

 

            (1 2)         +     (5 7)       =    (6 9)     

            (3 4)                  (2 0)              (5 4)

 

Or

 

            (1 2 5)     +     (5 7 1)       =    (6 9 6)           

            (3 4 0)               (2 0 8)              (5 4 8)
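
In Python the componentwise rule is a couple of lines (mat_add is my own name; it also insists that the two matrices have the same size, which is exactly the point of the discussion below):

def mat_add(A, B):
    # Only defined when A and B have the same number of rows and columns.
    if len(A) != len(B) or len(A[0]) != len(B[0]):
        raise ValueError("matrices must be the same size")
    return [[a + b for a, b in zip(rowA, rowB)] for rowA, rowB in zip(A, B)]

print(mat_add([[1, 2], [3, 4]], [[5, 7], [2, 0]]))              # [[6, 9], [5, 4]]
print(mat_add([[1, 2, 5], [3, 4, 0]], [[5, 7, 1], [2, 0, 8]]))  # [[6, 9, 6], [5, 4, 8]]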

 

 

 

Of course, we’ve yet to give any motivation as to why one would want to define matrix addition by the above. Remember how we introduced matrices as maps from one space to another. For example, consider the matrix

 

            (1 2 5)

            (3 4 0)

 

It has 2 rows and 3 columns. It acts on vectors with three components, and returns something with 2 components. For example:

 

            (1 2 5)  (3)       (1*3 + 2*2 + 5*1)      (12)

            (3 4 0)  (2)   =  (3*3 + 4*2 + 0*1)  =  (17)

                         (1)

 

So, if we have two matrices A and B acting on the same vector, we can now see why they should have the same number of rows and columns. They should have the same number of columns because they both act on the same vector. They should have the same number of rows because they should each take that vector to the same space.

 

Here’s an example of what can go wrong when we try to add two matrices of different sizes.

 

Consider

 

(1 3 2) (3)        (1*3 + 3*2 + 2*1)       (11)

(2 4 1) (2)   =  (2*3 + 4*2 + 1*1)  =  (15)

(4 5 1) (1)        (4*3 + 5*2 + 1*1)       (23)

 

 

Then

 

            (1 2 5)  (3)                   (1 3 2) (3)                    (12)         (11)

            (3 4 0)  (2)       +          (2 4 1) (2)        =         (17)    +   (15)

                         (1)                   (4 5 1) (1)                                   (23)

 

And we have trouble, as the two vectors are different sizes. One lives in the 2-dimensional plane, one lives in 3-space. There is no way we can write down one matrix to represent the action of the two matrices.

 

 

 

[5] MATRIX NOTATION

When proving a mathematical theorem, it is not enough to check it on a couple of matrices. For example:

 

            CLAIM: For any matrix A, A + A is the zero matrix.

 

            FALSE PROOF:

 

                        (0 0)           (0 0)           (0 0)

                        (0 0)    +    (0 0)    =    (0 0)

 

            But the claim fails for ANY other matrix. If you are trying to disprove a claim, it is enough to show that it fails for a specific example.

 

            Hence

 

                        (1 2)          (1 2)           (2 4)

                        (3 4)    +   (3 4)    =     (6 8)

 

 

So it is very useful in mathematics to handle a large number of matrices all at once. We don’t have the time to check each and every matrix individually, as there are infinitely many matrices!

 

So, we develop shorthand notation. We represent an arbitrary entry of a matrix A by

                                                ai,j

 

The ‘i’ stands for the ith row, the ‘j’ stands for the jth column. So, a12 means the entry in the 1st row and 2nd column, a22 means the entry in the 2nd row and 2nd column, and so on.

 

So, we write an arbitrary 2x2 matrix by

 

(a11 a12)

            (a21 a22)

 

We write an arbitrary 2x3 matrix by

 

(a11 a12 a13)

            (a21 a22 a23)

 

We write an arbitrary 3x3 matrix by

 

(a11 a12 a13)

            (a21 a22 a23)

            (a31 a32 a33)

 

And we write an arbitrary mxn matrix (m rows, n columns) by

 

(a11 a12 a13        ...         a1n)

            (a21 a22 a23        ...         a2n)

            (a31 a32 a33        ...         a3n)

            (                       .               )

            (                       .              )

            (                         .             )

            (am1 am2 am3      ...         amn)

 

 

So, to revisit Matrix addition:

 

 

(a11 a12 a13)       +          (b11 b12 b13)      =          (a11+b11   a12+b12   a13+b13)

            (a21 a22 a23)                   (b21 b22 b23)                  (a21+b21   a22+b22   a23+b23)

 

Or, in a specific example:

 

            (1 2 3)              +          (1 0 2)              =          (2 2 5)

            (4 5 6)                          (3 1 0)                          (7 6 6)

 

 

 

 

[6] TRANSPOSE

We now define the transpose of a matrix. For us, the main use will be in studying symmetric matrices, matrices that are equal to their transpose.

 

We write AT for the transpose of the matrix A, and we form AT as follows: the first row of A becomes the first column of AT; the second row of A becomes the second column of AT; the third row of A becomes the third column of AT; ... ; the last row of A becomes the last column of AT.

 

So, if A has 3 rows and 5 columns, then AT has 3 columns and 5 rows (or as we’d normally write it, 5 rows and 3 columns).

 

Let’s do an example:

 

                                    (0 1 1)                                          (0 1)

A         =          (1 2 3)              then  AT     =        (1 2)

                                                                            (1 3)

 

 

Or

                                    (1 2 3 4)                                     (1 0 5)

            A         =          (0 0 1 2)           then AT     =       (2 0 4)

                                    (5 4 3 2)                                     (3 1 3)

                                                                                      (4 2 2)

 

So, for a 2x3 matrix

 

           (a11 a12 a13)                                            (a11 a21)

A   =    (a21 a22 a23)                   then AT    =     (a12 a22)

                                                                                    (a13 a23)
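
Here is the rows-become-columns rule as a short Python sketch (transpose is my own helper name):

def transpose(A):
    # The i-th row of A becomes the i-th column of the answer.
    return [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]

A = [[1, 2, 3, 4],
     [0, 0, 1, 2],
     [5, 4, 3, 2]]
print(transpose(A))   # [[1, 0, 5], [2, 0, 4], [3, 1, 3], [4, 2, 2]], as in the example above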

           

[7] SYMMETRIC MATRICES

Symmetric matrices are very useful in mathematics, physics, and engineering. First, the definition. We say a matrix A is symmetric if it equals its transpose, so A = AT. Later we’ll briefly mention why they are useful.

 

The first thing we note is that for a matrix A to be symmetric A must be a square matrix, namely, A must have the same number of rows and columns.  Why? If A has m rows and n columns then AT has n rows and m columns. Since they’re equal, they must have the same number of rows (hence m = n) and the same number of columns (hence n = m). We call matrices with the same number of rows and columns square matrices.

 

For example,

 

            (1 2)        

            (3 4)                

 

even though the above is a square matrix, is not symmetric, as its transpose is

 

            (1 3)

            (2 4)

 

However,

 

            (1 5)

            (5 1)

 

is symmetric, as it does equal its transpose.

 

THEOREM: Let A be a 2x2 matrix. Then A is symmetric if and only if its lower left and upper right entries (a21 and a12) are the same.

 

Proof: We write A as [using a,b,c,d instead of a11, ... as it’s easier to view]

 

            (a b)

            (c d)

 

Then AT  is

 

            (a c)

            (b d)

 

And A = AT means

            (a b)                 (a c)

            (c d)     =          (b d)

Since the two matrices are equal, they must be equal componentwise. So the two upper left entries must be the same. This gives a = a, which imposes no new conditions. Let’s look at the other entries. The upper right entries must be the same, which imposes the condition

 

                        b = c.

 

The lower left entries must be the same, which imposes the condition c = b (which we already had), and the two lower right entries must be the same, which imposes d = d.

 

Hence for a 2x2 matrix A to be symmetric we must have b = c, so the matrix looks like

 

            (a b)

            (b d)

 

What about a 3x3 matrix? Assume a 3x3 matrix A equals its transpose:

 

            (a b c)                          (a d g)

            (d e f)               =          (b e h)

            (g h i)                           (c f  i)

 

This gives nine conditions:

 

            a = a

            b = d

            c = g                these come from looking at the first row of each side of the above.

 

            d = b (already had)

            e = e

            f = h                 these come from looking at the second row of each side

 

            g = c (already had)

            h = f  (already had)

            i = i

 

Hence the most general 3x3 symmetric matrix looks like

 

            (a b c)

            (b e f)

            (c f  i)

 

We can, of course, continue to do this for 4x4, 5x5, ..., nxn, ... matrices. The main thing to notice is that symmetric matrices are ‘nice’ with respect to the main diagonal. (Recall the main diagonal is a11, a22, ..., ann. We see that for a symmetric matrix, the entry in the ith row and jth column is the same as the entry in the jth row and ith column).
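
Said in code: a matrix is symmetric exactly when the entry in row i, column j equals the entry in row j, column i, for every i and j. Here is a quick Python sketch (is_symmetric is just my name for it):

def is_symmetric(A):
    n = len(A)
    if any(len(row) != n for row in A):
        return False          # a symmetric matrix must be square
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(n))

print(is_symmetric([[1, 5], [5, 1]]))   # True
print(is_symmetric([[1, 2], [3, 4]]))   # False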

 

THEOREM: (A + B)T  = AT + BT (or, the transpose of a sum is the sum of the transposes).

 

Proof: Let’s do a specific case first.

 

                        (1 2 3)                          (3 2 1)

            A  =     (4 5 6)              B   =    (2 1 0)

 

 

                        (1 4)                             (3 2)                             (4 6)

Then  AT    =   (2 5)                 BT  =   (2 1)   and  AT + BT =  (4 6)

                        (3 6)                             (1 0)                             (4 6)

 

 

And we find that

 

                             (4 4 4)                                 (4 6)

            A + B  =   (6 6 6)     and (A + B)T  =  (4 6)

                                                                         (4 6)

 

 

Hence we see that (A + B)T = AT + BT for these two matrices!!!

 

Note that the above is NOT a proof – it is merely a verification in this one particular case. Here’s a sketch of the proof.

 

Consider an arbitrary row, say the 2nd. We want to show that (A + B)T and AT + BT agree. We’ll do this by showing that each column of the left hand side equals the corresponding column of the right hand side.

 

Let’s look at the LHSide first. We add the 2nd row of A to the 2nd row of B, and then this sum becomes the 2nd row of A + B. Taking transposes, this gives the 2nd column of (A + B)T.

 

Now we examine the RHSide. The 2nd column of AT is the 2nd row of A; the 2nd column of BT  is the 2nd row of B. So the 2nd column of  AT + BT is the 2nd row of A plus the 2nd row of B.

 

So, the 2nd column of (A + B)T equals the 2nd column of AT + BT. But there is nothing special about 2 – we could do this equally well for any column, and we see the two sides are in fact equal.

 

As promised, a few words about why symmetric matrices are useful. First, they’re easier to handle than general matrices, as they only need about half as many entries. Once you specify the entries on the main diagonal and above the diagonal, you know all the entries (as the entries below the diagonal equal the ones above the diagonal). You’ve seen in your engineering course one example of where symmetric matrices arise. One common example in mathematical physics is with the matrix of second derivatives. For example, consider the matrix where

                        ai,j  =  d2f / dxi dxj.

 

Here f is a function of n variables (x1, ..., xn), and d2f / dxi dxj is the second partial derivative of f with respect to xi and xj. For “good” functions f we have d2f / dxi dxj  =  d2f / dxj dxi (or, it doesn’t matter which order you take the derivatives).

 

 

 

 

[8] BASIC FACTS ABOUT MATRICES

 

[1] A + B = B + A

[2] x(A + B) = xA + xB,           where x is any number

[3] (x+y)A = xA + yA

[4] AB does not always equal BA

[5] A(BC) = (AB)C

[6] A(BC) does not always equal (AC)B (for example, consider A = I)

[7] AA-1 = I, the Identity matrix

[8] (AT)T = A

[9] (A + B)T  = AT + BT

[10] (xA)T = x AT

[11] (AB)-1 = B-1 A-1

[12] (AB)T = BT AT

[13] (A-1)T = (AT)-1

 

Note: we define, for x a real number and A a matrix, xA to be the matrix whose entries are x times those of A.

 

Example:

               (1 2)      (2 4)

            2 (0 1)  =  (0 2)

               (3 4)      (6 8)
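
Facts like [4], [9] and [12] are easy to spot-check on a computer (which, as stressed above, checks them in one case but does not prove them). A sketch in Python, with my own helper names:

def transpose(A):
    return [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(transpose(mat_add(A, B)) == mat_add(transpose(A), transpose(B)))  # True, fact [9]
print(transpose(mat_mul(A, B)) == mat_mul(transpose(B), transpose(A)))  # True, fact [12]
print(mat_mul(A, B) == mat_mul(B, A))                                   # False, fact [4]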

 

NOTES ON LINEAR ALGEBRA

 

CONTENTS:

[9] BASIS VECTORS

 

 

[9] BASIS VECTORS

Consider the vector

(2)

(5)

 

This means two units in the x direction, five units in the y direction.

 

[Figure: the vector (2,5) drawn in the plane, split into a piece along the x direction and a piece along the y direction.]

 
Graphically, we see we can write it as a vector in the x direction, and a vector in the y direction. Let

                                                (1)                    (0)

                                    Ex  =    (0)        Ey  =    (1)

 

be the unit vectors in the x direction and the y direction. We will show that they are a basis for the plane. What this means is that we can write any vector as some copies of Ex and some copies of Ey.

 

For example,

                        (2)      (2)       (0)         (1)         (0)

(5)  =  (0)  +   (5)   =  2 (0)  +  5 (1)   =  2 Ex  + 5 Ey

 

How did we get this? We’re trying to write the vector (2,5) as some number of copies of (1,0) and some number of copies of (0,1).

 

So we’re trying to solve

 

            (2)      (*)      (0)

(5)  =  (0)  +  (**)

 

So, what does * and what does ** equal?

 

Let’s look at the x component, the ‘top’. Then 2 = * + 0, so * = 2.

Let’s look at the y component, the ‘bottom’. Then 5 = 0 + **, so ** = 5.

 

Let’s do another example.

 

                        (7)      (7)       (0)         (1)         (0)

(3)  =  (0)  +  (3)   =  7 (0)  +  3 (1)   =  7 Ex  + 3 Ey

 

 

Again, let’s go through the computation as to how we found it:

 

            (7)      (*)      (0)

(3) =   (0)  +  (**)

 

Let’s look at the x component, the ‘top’. Then 7 = * + 0, so * = 7.

Let’s look at the y component, the ‘bottom’. Then 3 = 0 + **, so ** = 3.

 

Now, this leads us to conjecture:

 

            ANY vector in the plane can be written as some number of copies of Ex and some number of copies of Ey.

 

NOTE: just because we’ve checked this for several vectors doesn’t mean we’ve proven the theorem. For example, I might conjecture every 2x2 matrix is symmetric. Why? Well, look at some matrices:

 

            (2 3)     (5 0)     (5 5)     (7 1)     (2 0)     (2 1)     (12 92)

            (3 2)     (0 5)     (5 5)     (1 7)     (0 5)     (1 2)     (92 12)

 

But this is absurd! Consider

 

            (1 0)                 (1 2)                 (5 9)

            (2 1)                 (3 4)                 (3 3)

 

So we must be careful not to be misled by checking certain special cases. It is a very good idea to test a theorem or conjecture by looking at certain specific cases. This helps lead you to what should be true, but you must prove it in the end.

 

So, in our case, we must show that, given any vector (x,y) in the plane, we can find numbers a and b (where a and b will depend on x and y) such that

 

            (x)                       (1)             (0)

(y)        =          a (0)     +   b (1)           =          a Ex  +  b Ey

 

 

Now, for the two vectors Ex and Ey, it is easy to find an a and a b. Just take a = x and b = y.

 

 

 

 

 

 

 

 

 

 

 

 


Let’s consider a slightly more exotic example.

 

                                    (5)                                            (12)

            V1       =          (0)                    V2       =          (10)

 

Are V1 and V2 a basis? Before showing they are, before showing that we can write any vector as copies of V1 plus copies of V2, let’s do a specific example first. Consider the vector (1700, -500).

 

So we want to solve

 

            (1700)                                        (5)                     (12)

            (-500)              =                      a (0)       +        b (10)

 

We are looking for ‘a’ and ‘b’. We have two equations:

 

(Eq1.1)            1700    =          5a + 12b

 

Unfortunately, this isn’t too easy to just look at and see what ‘a’ and ‘b’ are. Let’s look at the second equation:

 

(Eq1.2)            -500     =          0a + 10b

 

This we can easily solve. We get 10b = -500, or b = -50. Now that we know b, we can substitute this back into (Eq1.1):

 

                  1700 = 5a + 12(-50)

            → 1700 = 5a - 600

            → 2300 = 5a

            →       a = 460

 

So, we get

 

            (1700)                                        (5)                     (12)

            (-500)              =                      a (0)       +        b (10)

 

or

 

            (1700)                                        (5)                       (12)

            (-500)              =                460  (0)        +      -50 (10)

 

So

 

            (1700)

            (-500)              =                460 V1   -    50 V2

 

 

Why does this work? Why are V1 and V2 a basis? Notice that while V2 has a piece in the x direction and a piece in the y direction, V1 only has a piece in the x direction. So if we have a vector (x,y), we must find an a and a b such that (x,y) = aV1 + bV2.

 

Right now, we don’t have to actually find an a and a b, but just show that we could. We determine b first, and then find a. Since V2 has a y component, we multiply it by whatever is needed to equal the y component of (x,y). We now have b V2. This has the same y component as (x,y), but may not have the correct x component.

 

But this is no problem, as we can still add a number of copies of V1, which is only in the x direction. So we can add whatever we need to correct the x component.

 

Now, let’s prove that V1 and V2 are a basis. So, given a vector (x,y) we need to find a and b such that (x,y) = a V1 + b V2. Now, a and b will be different for different values of x and y. Really,

 

                        a = a(x,y)

                        b = b(x,y)

 

So let’s find them!

 

            (x)                       (5)                 (12)

            (y)        =          a (0)     +      b (10)

 

So we must solve

 

            (Eq1.3)            x = 5a + 12b

            (Eq1.4)            y = 0a + 10b

 

We can solve (Eq1.4) easily, getting b = y/10. Substituting this into (Eq1.3) yields

 

             x = 5a + 12b

             x = 5a + 12(y/10)

             x = 5a + 1.2y

            5a = x - 1.2y

             a = (x - 1.2y) / 5

 

So we have succeeded in finding a and b, given x and y!

 

            a = (x - 1.2y) / 5

            b = y / 10

 

Note that a and b are different for different values of x and y. For the example we did before, namely (x,y) = (1700,-500), what should a and b be?

 

Well, the formulas above give

 

            a = ( 1700 - 1.2*(-500) ) / 5  =  2300 / 5  =  460

            b = -500 / 10  =  -50

 

And this agrees with what we calculated before!
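
Here is the same computation as a tiny Python sketch, using the formulas we just derived for this particular V1 = (5,0) and V2 = (12,10) (the names a_of and b_of are mine):

def a_of(x, y):
    return (x - 1.2 * y) / 5

def b_of(x, y):
    return y / 10

x, y = 1700, -500
a, b = a_of(x, y), b_of(x, y)
print(a, b)                              # 460.0 -50.0
# check that a*V1 + b*V2 really gives back (x, y):
print(a * 5 + b * 12, a * 0 + b * 10)    # 1700.0 -500.0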

 

 

 

From what we’ve just seen above, we might be led to expect that any two vectors in the plane are a basis. A quick example shows that there are certain pairs of vectors that cannot be a basis. Consider, for example,

           

                                    (2)                                            (4)

            U1       =          (1)                    U2       =          (2)

 

Now, U1 and U2 are along the same line, as U2 is just twice U1. Any multiple of U1 will be in the same direction as U1; any multiple of U2 will be in the same direction as U2. Hence the sum of a multiple of U1 and a multiple of U2 will still be in the direction of U1.

 

Why does this mean U1 and U2 cannot be a basis? Just take any vector (x,y) that’s not in the direction of U1. Then as multiples of U1 and U2 are still in the direction of U1, we cannot get (x,y).

 

[Figure: the x axis and y axis, with U1 drawn as an arrow; every multiple of U1 (or of U2 = 2 U1) stays on that same line through the origin.]
Again, any number of copies of U1 will still be in the direction of U1. If U2 is in the same direction as U1, then copies of U1 plus copies of U2 will still be in the direction of U1.

 

So, this quick sketch shows why certain pairs cannot be a basis.

 

We have the following:

 

            THEOREM: Let W1 and W2 be any two vectors that are not in the same direction (i.e., that do not lie on the same line). Then W1 and W2 are a basis.

 

Let’s assume one of the vectors is in the direction of the x axis, and draw a picture.

 

 

[Figure: W1 drawn along the x axis, W2 drawn pointing up and away from the x axis.]

 
I don’t really want to go into a theoretical, rigorous proof, so I’ll just do it in the case when W1 is in the direction of the x axis.

 

Let’s just do a sketch. (Sorry for the pun.) The first vector W1 equals, say, (W1x, 0).

 

W2 has a non-zero component in the y direction. We’re trying to get the vector (x,y). Let’s say W2 = (W2x, W2y). We need to solve

 

            (x,y)  =  a W1 + b W2

            (x,y)  =  a W1 + b (W2x, W2y)

 

            If we take b = y / W2y (we can divide by W2y as it is not zero) then we get

 

            (x,y) =  a W1 + (y / W2y) (W2x, W2y)

            (x,y) =  a W1 + (y W2x / W2y, y W2y / W2y)

            (x,y) =  a (W1x, 0)  + (y W2x / W2y, y)

            (x,y) =  (a W1x, 0)  + (y W2x / W2y, y)

 

So the y component on the Left Hand Side equals the y component on the Right Hand Side. We now solve for a:

 

            (a W1x, 0)  =  (x,y) - (y W2x / W2y, y)

            (a W1x, 0)  =  (x  -  y W2x / W2y, 0)

 

Then we can solve for a:

 

            a =  (x  -  y W2x / W2y)  /  W1x.

 

A similar argument would work for any two vectors W1, W2 that are not in the same direction.

 

 

The last thing we’re going to do is show how to find a and b for a general W1, W2. I’ll give a method (using Gaussian Elimination) that will always work, although I won’t prove why.

 

Let W1 = (A, B) and W2 = (C,D).

 

We want, given the vector (x,y), to find a and b such that

 

            (x)                      (A)                    (C)

(y)        =          a (B)      +        b (D)

 

So we have

 

            (x)                    (aA)                 (bC)

(y)        =          (aB)     +         (bD)

 

 

            (x)                    (aA + bC)

            (y)        =          (aB + bD)

 

 

            (x)                    (Aa + Cb)

            (y)        =          (Ba + Db)       

 

 

            (x)                    (A   C) (a)

(y)        =          (B   D) (b)

 

 

Now we use Gaussian Elimination to solve! So, we say: what must we multiply the first row (A  C) by so that, when we add it to the second row (B  D), we get (0  something)?

 

So A*m + B = 0, so we multiply the first row by -B/A. Etc...

 

 

 

 

 

 

 

 

 

 

 

 

 

NOTES ON LINEAR ALGEBRA

 

CONTENTS:

[10] BASIS VECTORS - PART II

[11] LINEAR TRANSFORMATIONS

 

 

[10] BASIS VECTORS - PART II

We’ll now give a procedure to determine when two vectors W1 and W2 are a basis for the plane. Not only will our method say if they’re a basis, but it will also tell us how to find a and b.

 

The Equation we’re trying to solve is:

 

Let                                           (R)                                           (U)                  

                        W1      =          (S)                   W2      =          (V)

 

 

Find a, b so that           

 

                        (x)                    (R)                   (U)

                        (y)        =       a (S)       +      b  (V)

                                               

Then

 

                        (x)                    (aR)                 (bU)

(y)        =          (aS)     +          (bV)

 

 

                        (x)                    (Ra)                 (Ub)

(y)        =          (Sa)     +          (Vb)

 

 

                        (x)                    (Ra + Ub)

(y)        =          (Sa  + Vb)

 

 

                        (x)                    (R U) (a)

(y)        =          (S V) (b)

 

 

 

 

But this equation is just begging us to use Gaussian Elimination. We need to find a number m such that, if we multiply the first row by m and add it to the second, the new row will be (0 something).

 

So           Rm + S = 0

 

hence               m =  -S / R

 

 

 

So, we carry out the Gaussian Elimination. It can be shown that if the two vectors (R,S) and (U,V) do not lie on the same line, then Gaussian Elimination will never yield the last row all zero.

 

Let’s do some examples:

 

FIRST EXAMPLE

 

                                    (2)                                            (3)

            W1      =          (4)                    W2      =          (7)

 

 

Then we must solve

 

                        (x)                    (2)                    (3)

                        (y)        =       a (4)        +      b  (7)

                                               

Then

 

                        (x)                    (2a)                  (3b)

(y)        =          (4a)      +          (7b)

           

 

Hence

                        (x)                    (2 3) (a)

(y)        =          (4 7) (b)

           

 

 

NOW WE DO GAUSSIAN ELIMINATION:

 

So, what should we multiply the first row by? We need 2m + 4 = 0, so m = -4/2 = -2.

 

Hence we get

 

                        (x)                    (2 3) (a)

(y-2x)  =          (0 1) (b)

           

Or        2a + 3b = x;     0a + 1b = y - 2x

 

Therefore         b = y - 2x.

                        2a + 3b = x      →        a = (x - 3b)/2 = (x - 3(y - 2x))/2 = (7x - 3y)/2

 

So, given a vector (x,y) we can find a,b such that (x,y) = aW1 + bW2. So W1, W2 is a basis!

 

 

SECOND EXAMPLE

 

                                    (2)                                            (4)

            W1      =          (4)                    W2      =          (8)

 

 

Then we must solve

 

                        (x)                    (2)                    (4)

                        (y)        =       a (4)        +      b  (8)

                                               

Then

 

                        (x)                    (2a)                  (4b)

(y)        =          (4a)      +          (8b)

           

 

Hence

                        (x)                    (2 4) (a)

(y)        =          (4 8) (b)

           

NOW WE DO GAUSSIAN ELIMINATION:

 

So, what should we multiply the first row by? We need 2m + 4 = 0, so m = -4/2 = -2.

 

Hence we get

 

                        (x)                    (2 4) (a)

(y-2x)  =          (0 0) (b)

 

So, the two equations we must solve are:

 

            2a + 4b = x                  and                   0a + 0b = y - 2x

 

Now, regardless of what a and b are, 0a + 0b is always zero. If y - 2x is not zero, it will be impossible to solve the second equation. So, can we find x and y such that y - 2x ≠ 0? Sure. Take x = 0, y non-zero. Or take x nonzero, y = 0. Or take y = 22, x = 12. Almost any choice works.

 

So we see that W1, W2 are not a basis. We ended up with a row of zeros. Let’s look at our two vectors again:

 

                                    (2)                                            (4)

            W1      =          (4)                    W2      =          (8)

 

Notice that

                                    (2*2)                (2)

            W2      =          (2*4)      =     2 (4)          =        2W1

 

Not only are W1 and  W2 not a basis, but they lie on the same line!

 

 

THIRD EXAMPLE

 

                                    (1)                                            (2)                                            (0)

            W1      =          (2)                    W2      =          (2)                    W3      =          (1)

                                    (1)                                            (2)                                            (1)

 

 

Are W1, W2, W3 a basis?

 

 

(x)                    (1)                    (2)                    (0)

(y)        =     a   (2)        +      b  (2)        +       c (1)

(z)                    (1)                    (2)                    (1)

 

 

            (x)                    (1a)                  (2b)                  (0c)

            (y)        =          (2a)      +          (2b)      +          (1c)

            (z)                    (1a)                  (2b)                  (1c)

 

 

            (x)                    (1a + 2b + 0c)

            (y)        =          (2a + 2b + 1c)

            (z)                    (1a + 2b + 1c)

 

            (x)                    (1 2 0) (a)

            (y)        =          (2 2 1) (b)

            (z)                    (1 2 1) (c)

 

 

 

Now we do Gaussian Elimination! We multiply the first row by -2 and add it to the second row. (1m + 2 = 0, m = -2).

 

            (x)                    (1 2  0) (a)

            (y-2x)  =          (0 -2 1) (b)

            (z)                    (1 2  1) (c)

 

Now we multiply the first row by -1 and add it to the third row (1m + 1 = 0, m = -1).

 

            (x)                    (1 2  0) (a)

            (y-2x)  =          (0 -2 1) (b)

            (z-x)                 (0 0  1) (c)

 

We don’t have to do any more work, as this matrix is UPPER TRIANGULAR. This means the matrix is all zeros below the main diagonal. We can now solve the three equations, one at a time.

 

            0a + 0b + 1c = z - x     → c = z - x

           

            0a - 2b + 1c  = y - 2x  → b = (y - 2x - z + x) / -2 = (x + z - y) / 2

 

            1a + 2b + 0c = x          → a = x - 2b = y - z

 

So these three vectors are a basis.

 

 

In general, to determine if something is a basis for 3-space:

 

           

                                    (L)                                           (P)                                           (S)

            W1      =          (M)                  W2      =          (Q)                   W3      =          (T)

                                    (N)                                           (R)                                           (U)

 

 

Are W1, W2, W3 a basis?

 

(x)                    (L)                   (P)                   (S)

(y)        =     a   (M)      +      b  (Q)       +       c (T)

(z)                    (N)                   (R)                   (U)

 

 

            (x)                    (L a)                (P b)                (S c)

            (y)        =          (M a)   +          (Q b)    +          (T c)

            (z)                    (N a)                (R b)                (U c)

 

 

            (x)                    (L a + P b + S c)

            (y)        =          (M a +Q b + Tc)

            (z)                    (N a + R b + Uc)

 

            (x)                    (L  P  S) (a)

            (y)        =          (M Q T) (b)

            (z)                    (N  R U) (c)

 

 

To determine if W1, W2, W3 are a basis, we are led to solving a matrix equation, and notice how the pieces line up: the first column of our matrix is W1, the second column is W2, the third column is W3. Call this matrix W. We then have

 

            (x)

            (y)        =          a W1  + b W2 + c W3

            (z)

 

            (x)                    (                         ) (a)

            (y)        =          ( W1   W2   W3)(b)

            (z)                    (                         ) (c)
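
Here is this test as a rough Python sketch: put W1, W2, W3 in as the columns of a matrix, do the elimination, and the vectors are a basis exactly when we never end up with a row of zeros. (The name is_basis is mine; unlike the hand computations above, the sketch also swaps rows if it happens to run into a zero pivot.)

from fractions import Fraction

def is_basis(vectors):
    # Build the matrix W whose columns are the given vectors.
    n = len(vectors)
    W = [[Fraction(vectors[j][i]) for j in range(n)] for i in range(n)]
    for i in range(n):
        # Find a row at or below the diagonal with a non-zero entry in column i.
        pivot = next((r for r in range(i, n) if W[r][i] != 0), None)
        if pivot is None:
            return False      # no usable pivot: the vectors are dependent, not a basis
        W[i], W[pivot] = W[pivot], W[i]
        for k in range(i + 1, n):
            m = -W[k][i] / W[i][i]
            W[k] = [wkj + m * wij for wkj, wij in zip(W[k], W[i])]
    return True

print(is_basis([[1, 2, 1], [2, 2, 2], [0, 1, 1]]))   # True  (the third example)
print(is_basis([[2, 4], [4, 8]]))                    # False (the second example: same line)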

 

 

 

 

 

[11] LINEAR TRANSFORMATIONS

Linear Transformations are very useful in mathematics. The reason is they allow us to understand functions at complicated values by understanding them at simpler values. First, the definition for functions, then we’ll generalize to matrices:

 

We say a function is a linear function if two conditions hold:

(1)  f(x + y) = f(x) + f(y) for all x,y

(2)  f(ax) = af(x)

 

Now, it is very unusual for a function to be linear. Take f(w) = Sin[w].

 

Then f(ax) = Sin[ax], which usually is not a Sin[x]. For example, if x = 180 degrees, then a Sin[x] is always zero. But if a = ½, Sin[a x] = Sin[90] = 1.

 

Let’s try f(w) = w2. Does f(ax) = af(x)?

 

Well, f(ax) = (ax)2 = a2 x2 = a2 f(x) ≠  a f(x) unless a = 1 or 0.

Also, f(x+y) = (x+y)2 = x2 + 2xy + y2 = f(x) + 2xy + f(y) ≠ f(x) + f(y) unless x or y = 0.

 

How about f(w) = 3w + 1?

Well, f(ax) = 3(ax) + 1 =          a(3x) + 1

                                                =          a(3x + 1 - 1) + 1

                                                =          a(f(x) - 1) + 1

                                                =          a f(x) - a + 1

                                                ≠          a f(x) unless a = 1

 

 

Just in case you’re wondering if any function is linear, here’s one that is:

 

            f(w)      =          3w

 

Then     f(ax)   =   3(ax)   =   a(3x)   =  a f(x)

 

            f(x+y)  =  3(x+y)  =  3x + 3y = f(x) + f(y)

 

[NOTE: one can prove that the only linear functions are f(x) = cx, where c is any real or complex number].

 

 

We now generalize this to higher dimensions. Why do we care about higher dimensions? Well, matrices act on vectors (you’ve seen this in your force / stress diagrams) and it turns out that matrices will be linear transformations.

 

Let  V and W be any two vectors with the same number of components, and let e be a real number. Then any matrix (that is the correct size to act on V and W) is a linear transformation, namely,

 

            (1)        A (V + W)       =   A V + A W

            (2)        A(e V)             = e A V           

 

 

I’ll sketch the proof for the 2x2 case:

 

                                    (v1)                                          (w1)

Let       V         =          (v2)                  W        =          (w2)

 

                                    (a b)

Let the matrix  A   =     (c d)

 

 

Then

           

                                    (a b) (    (v1)        (w1)    )

A (V + W)       =          (c d) (    (v2)   +   (w2)    )

 

                                    (a b) ( v1 + w1)

                        =          (c d) ( v2 + w2)

 

                                    ( a(v1 + w1)  +  b(v2 + w2) )

                        =          ( c(v1 + w1)  +  d(v2 + w2) )

 

                                    ( av1 + bv2      +    aw1 + bw2 )         

                        =          ( cv1 + dv2      +    cw1 + dw2 )

 

                                    ( av1 + bv2 )       ( aw1 + bw2 )         

                        =          ( cv1 + dv2 )    +  ( cw1 + dw2 )

 

                                    (a b) (v1)             (a b) (w1)

                        =          (c d) (v2)         +  (c d) (w2)

 

                        =          A V + A W

                       

 

The other condition is even easier to check:

 

 

                                    (a b) (    (v1)  )

A (eV)             =          (c d) ( e (v2)   )

 

                                    (a b) (e v1)

                        =          (c d) (e v2)

 

                                    (ae v1 + be v2)

                        =          (ce v1 + de v2)

 

                                    (a v1 + b v2)

                        =      e  (c v1 + d v2)

 

                                    (a b) (v1)

                        =      e  (c d) (v2)

 

                        =      e A V

 

 

A similar proof works for any size matrix, concluding the proof.
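
Of course, as with the earlier examples, checking one case is not a proof; the computation above is what shows it always works. Still, here is a quick numerical check of the two conditions in Python, for one particular A, V, W and e (the helper name mat_vec is mine):

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1, 2], [3, 4]]
V = [5, -1]
W = [2, 7]
e = 3

lhs1 = mat_vec(A, [v + w for v, w in zip(V, W)])                 # A (V + W)
rhs1 = [x + y for x, y in zip(mat_vec(A, V), mat_vec(A, W))]     # A V + A W
print(lhs1, rhs1, lhs1 == rhs1)                                  # both [19, 45], so True

lhs2 = mat_vec(A, [e * v for v in V])                            # A (e V)
rhs2 = [e * x for x in mat_vec(A, V)]                            # e (A V)
print(lhs2, rhs2, lhs2 == rhs2)                                  # both [9, 33], so True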

 

COMING ATTRACTIONS:

WHY DO WE CARE ABOUT BASES? WHY DO WE CARE ABOUT LINEAR TRANSFORMATIONS? WHAT’S THE CONNECTION BETWEEN THE TWO?

 

Eventually, we’ll see that certain matrices have natural ‘bases’ attached to them. They (and powers of them) may look very ugly as given. But if we change bases, using something other than the x-axis and the y-axis, we can often make the matrices look good.

 

For symmetric matrices, this will be the case. In fact, the Principal Axis Theorem says we will be able to find a basis where, if we write our matrix relative to that basis, it will be diagonal!

 

Also, let’s say W1 and W2 are a basis. Then we can write any vector V = (x,y) in terms of the two, or

 

                                                V = a W1 + b W2

 

Then if A is a matrix, we have

 

                                                A V = A ( aW1 + bW2 )

 

                                                A V = A (aW1) + A (bW2)

 

                                                A V = a (A W1) + b (A W2)

 

Or, more generally,                   AN V = a (AN W1) + b (AN W2)

 

 

Real Symmetric Matrices have what is called a ‘basis of eigenvectors’. That means there are real numbers c1 and c2 such that

 

            A W1 = c1 W1             A W2 = c2 W2

 

Applying A multiple times yields

 

            AN W1 = c1N W1         AN W2 = c2N W2

 

Hence

 

AN V    = a (AN W1) + b (AN W2)

 

                        = a c1N W1       +   b c2N W2    (Eq 11.1)

 

 

So here’s the advantage: Let’s say N is real big, say a billion. If we were to calculate AN V we would have to multiply A by itself one billion times, and then have that act on V. That’s a lot of calculation to do.

 

But, if our matrix is symmetric, we’ll be able to find W1 and W2 (two calculations), c1 and c2 (two more calculations) and numbers a and b (two more calculations), and then we just take N = one billion in (Eq 11.1), and we’re done!

 

See how much we saved!

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

NOTES ON LINEAR ALGEBRA

 

CONTENTS:

[12] EIGENVALUES / EIGENVECTORS

 

 

[12] EIGENVALUES / EIGENVECTORS

 

In the last section, we examined when two vectors are a basis for the plane. Let’s recall what this means. The plane is 2-dimensional. So, we expect that we should be able to specify any vector with two pieces of information, say East component and North component. This corresponds to the standard basis (x-axis, y-axis).

 

In [11] we saw that, as long as W1 and W2 are not in the same direction, they are a basis for the plane. This means we can write any vector V as V = a W1 + b W2, where a and b are numbers that can be determined, and depend on V.

 

However, nowhere in [11] did we discuss why we would want to use a basis other than the standard x-axis, y-axis.

 

The reason is geometry. Often we’ll be studying certain matrices that model physical systems. Those physical systems may have certain axes of symmetry, which will often manifest themselves in the matrix. And what we will find is that the matrix looks more ‘natural’, more ‘symmetric’, if we change basis.

 

Let A be a matrix acting on a vector v. What can we say about the vector Av?

 

In general, not much, unless I give you information about A and v. A vector encodes two pieces of information: magnitude, and direction. When we apply a matrix to a vector, we get a new vector. Usually, that vector will have a different magnitude and a different direction.

 

In terms of computations, this is often unfortunate. We may be interested in some iterative system, where we might have A100 v or A6022045 v. If Av is in a different direction than v, we have no quick and easy way to calculate A2 v. Why? We know the magnitude and direction of Av. But we know nothing about A(Av).

 

If Av is in the same direction as v, however, it’s a different ballgame. Let’s say Av = 3v. Then we can calculate An v for any power n easily.

 

 

 

For example:

 

A2 v     =          A (Av)

            =          A (3 v)

            =          3 (A v)

            =          3 (3 v)

            =          32 v

 

 

A3 v     =          A (A2 v)

            =          A (32 v)            by the previous calculation

            =          32 (A v)

            =          32 (3 v)

            =          33 v

 

 

A4 v     =          A (A3 v)

            =          A (33 v)            by the previous calculation

            =          33 (A v)

            =          33 (3 v)

            =          34 v

 

A5 v     =          A (A4 v)

            =          A (34 v)            by the previous calculation

            =          34 (A v)

            =          34 (3 v)

            =          35 v

 

 

And a similar argument shows A6 v = 36 v, ..., An v = 3n v.

 

 

DEFINITION OF EIGENVALUE / EIGENVECTOR:

We say v is an eigenvector and λ is its eigenvalue if

(1) A v = λ v

(2) v is not the zero vector

 

 

 

 

 

 

 

Note that a vector is an eigenvector relative to a given matrix. For example,

 

(2 1) (1)                       (3)                       (1)

(1 2) (1)           =          (3)        =          3 (1)

 

(2 1) (1)                       (1)                       (1)

(1 2) (-1)          =          (-1)      =          1 (-1)

 

So (1,1) is an eigenvector with eigenvalue 3, while (1,-1) is an eigenvector with eigenvalue 1. But if we consider a different matrix

 

(1 2) (1)                       (3)

(3 4) (1)           =          (7)

 

we see (1,1) is not an eigenvector.
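
Here is this check as a small Python sketch: compute Av and see whether it is a multiple of v. (The function name eigenvalue_of is mine; it returns the eigenvalue if v is an eigenvector, and None if it is not.)

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def eigenvalue_of(A, v):
    Av = mat_vec(A, v)
    # Read the candidate eigenvalue off a non-zero component of v,
    # then make sure every component agrees.
    i = next(i for i, x in enumerate(v) if x != 0)
    lam = Av[i] / v[i]
    ok = all(abs(Av[j] - lam * v[j]) < 1e-12 for j in range(len(v)))
    return lam if ok else None

A = [[2, 1], [1, 2]]
print(eigenvalue_of(A, [1, 1]))                 # 3.0
print(eigenvalue_of(A, [1, -1]))                # 1.0
print(eigenvalue_of([[1, 2], [3, 4]], [1, 1]))  # None: (1,1) is not an eigenvector here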

 

Why do we exclude the zero vector? The reason is the zero vector would be an eigenvector for ANY matrix, and ANY number would be its eigenvalue. Let Z be the zero vector. Then

 

            A Z = Z = 2 Z = 3 Z = -2342.324 Z

 

and Z would not have a unique eigenvalue. Again, this is a matter of notation / convenience. We will see later it’s just easier if all eigenvectors are non-zero, for we’ll prove certain nice facts about them. For example, a beautiful theorem of Linear Algebra (The Principal Axis Theorem) states that if you have a symmetric real matrix, you can find a basis of mutually perpendicular vectors Vi such that each Vi is an eigenvector of the matrix A! Wow! This means we can compute the action of large powers of symmetric matrices with very little work.

 

Let’s see now how eigenvectors can make life liveable. We’ll work with the matrix

 

           

            A         =          (2 1)

                                    (1 2)

 

We saw above that if V = (1,1) and W = (1,-1) then

 

            A V = 3 V

            A W = 1 W

 

V and W are not in the same direction, so they’re a basis for the plane. So, if you give me any vector (x,y), I can find a and b (depending on x and y) such that

 

            (x,y) = a V + b W.

 

Why is this helpful?

 

Let’s build up in stages.

 

            A(V + W)        =          AV + AW

                                    =          3V + 1W         =          3V + W

 

            A(2V + 11 W)             =          A(2V)  + A(11 W)

                                                =          2 (A V) + 11 (A W)

                                                =          2 (3 V) + 11 W

                                                =          6 V + 11 W

 

            A^4 (2V + 11 W)          =          B (2 V + 11 W),          where B = A^4

                                                =          B (2V) + B (11 W)

                                                =          2 (B V) + 11 (B W)

                                                =          2 (A^4 V) + 11 (A^4 W)

                                                =          2 (3^4 V) + 11 W

 

NOTE: we have the rule

 

            A (X + Y)                    =          A X + AY

 

This is still true for powers of A, as A^4, A^5, etc. are also matrices:

            A^4 (X + Y)                  =          A^4 X + A^4 Y

 

 

So, in full generality:

 

Eq 12.1            A^n (aV + bW)            =          A^n (aV) + A^n (bW)

                                                            =          a (A^n V) + b (A^n W)

                                                            =          a (3^n V) + b W

 

So we see how easy it is to calculate. Let's take n = 2,000,000, and consider A^n U, where U = (2,3). Now, A U is not in the same direction as U, nor is A^2 U in the same direction as U or A U, etc., so we have 2,000,000 matrix multiplications to do! That's a lot.

 

The other way is, FIRST, we express U in terms of our nice basis V, W. We do this by Gaussian Elimination:

 

            U = a V + b W

 

            (2)                       (1)                   (1)

            (3)        =          a (1)     +          b(-1)

 

            (2)                    (1a)                  (1b)

            (3)        =          (1a)      +          (-1b)

 

 

            (2)                    (1a +  1b)

            (3)        =          (1a + -1b)

 

 

            (2)                    (1  1) (a)

            (3)        =          (1 -1) (b)

 

 

And then we do Gaussian Elimination, which gives a = 5/2 and b = -1/2. And now we are done! By Eq 12.1, all we have to do is put in these values of a and b, and n = 2,000,000:

            A^2,000,000 U  =  (5/2) 3^2,000,000 V  -  (1/2) W.

 

Doing the calculation this way is about two steps; the other way is 2,000,000. This is a phenomenal savings. The long way is beyond the strength of the computers – the numbers are just too large.
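
Here is a small Python sketch of the two approaches (I'm assuming the numpy library, and I use n = 20 instead of 2,000,000 just so the numbers stay printable). The repeated-multiplication way and the eigenvector way give the same answer, but the eigenvector way only needs the one-time work of finding a and b:

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
V = np.array([1.0, 1.0])     # eigenvector with eigenvalue 3
W = np.array([1.0, -1.0])    # eigenvector with eigenvalue 1
U = np.array([2.0, 3.0])
n = 20                       # stand-in for 2,000,000

# The long way: n matrix-vector multiplications.
long_way = U.copy()
for _ in range(n):
    long_way = A @ long_way

# The eigenvector way: solve U = a V + b W once, then apply Eq 12.1.
a, b = np.linalg.solve(np.column_stack([V, W]), U)   # a = 2.5, b = -0.5
eigen_way = a * 3**n * V + b * 1**n * W

print(long_way)
print(eigen_way)
print(np.allclose(long_way, eigen_way))   # True
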

 

Why are we able to have such a savings? This is not a trivial point – it’s one of the strengths of Linear Algebra. Linear Algebra is an efficient way of arranging computations. The long way involves lots of hidden cancellation, cancellation that never survives to the end when we group everything.

 

An example might help. Consider doing the following addition:

 

             1

            +5        - 5

                        +6        -6

                                    +7        -7

                                                +8        -8

                                                            +9        -9

 

If we add horizontally, each row is zero but the first. It doesn’t matter how many rows I’ve got, the final answer is just going to be one. There’s only one computation to do.

 

What if we add vertically?

 

Then we get 1 + 5 + 1 + 1 + 1 + 1 - 9 = 1. We get the same answer, but it's many more steps. The reason is we add 5 then subtract 5. We add 6 then subtract 6. We add 7 then subtract 7. We add 8 then subtract 8. We add 9 then subtract 9. We keep doing things that cancel, but we don't realize it, and hence have to go through all the steps.

 

 

NOTES ON LINEAR ALGEBRA

 

CONTENTS:

[13] DOT PRODUCTS

[14] DETERMINANTS - I

 

 

[13] DOT PRODUCTS

 

The Dot Product is a function from pairs of vectors to numbers. So, the input is two vectors, say v = (x1, y1) and w = (x2, y2). We use a dot, ·, to represent the Dot Product.

 

Let |v| denote the length of the vector v.

 

[Figure: the vector v = (x1, y1) drawn as the hypotenuse of a right triangle; the horizontal leg has length x1 and the vertical leg has length y1.]

So by the Pythagorean Theorem, the vector v has length Sqrt[x1^2 + y1^2].

 

Hence |v| = Sqrt[x1^2 + y1^2]. Similarly |w| = Sqrt[x2^2 + y2^2]. We now define the Dot Product:

 

          v · w = x1 x2 + y1 y2

 

We will show later that the Dot Product has a very special property, which will explain its usefulness. Namely, if we have

 

 

[Figure: two vectors v and w drawn from a common point, with angle θ between them.]

 

 

 

 

So, if θ is the angle between v and w, then it is a theorem that

 

          v · w  =  |v| |w| cos θ

 

 

As always, we will only prove this in two dimensions. Let’s look at some special cases. Consider two vectors v and w that are perpendicular, for example:

 

[Figure: w = (0, y2) drawn along the y-axis and v = (x1, 0) drawn along the x-axis, so v and w are perpendicular.]

 

 

Then v · w = x1*0 + 0*y2 = 0. But as θ = 90°, cos θ = 0, so the formula holds in this case!

 

Now let’s consider v and w in the same direction, say along the x-axis:

 

 

                                v = (x1,0)              w = (x2, 0)

 

 

 


Then v · w = x1 x2. But here θ = 0, so cos θ = 1, and again the formula works.

 

Let’s do a more exotic example. Let’s do v and w in the same direction, but not necessarily along the x-axis.

 

 

 


[Figure: v = (x, y) and w = (3x, 3y), pointing in the same direction, with w three times as long as v.]

                                    

 

 

 

 

Now, |v| = Sqrt[x^2 + y^2], |w| = Sqrt[9x^2 + 9y^2], and θ = 0 so cos θ = 1.

 

Then |v| |w| cos θ   = Sqrt[x^2 + y^2] * Sqrt[9x^2 + 9y^2]

                             = Sqrt[x^2 + y^2] * 3 Sqrt[x^2 + y^2]

                             = 3 (x^2 + y^2)

 

And   v · w          =       x*3x + y*3y = 3 (x^2 + y^2).

 

So again, the formula is true.

 

We now need a linearity property of the Dot Product. Let’s say we have three vectors u, v, and w. Then

 

          u · (v + w) = u · v  +  u · w

 

The proof is by straightforward computation. Let’s take as our three vectors

 

          u = (x1, y1)

          v = (x2, y2)  

w = (x3, y3)

 

Then v + w = (x2 + x3, y2 + y3)

and u · (v + w)      =  x1  (x2 + x3) + y1 (y2 + y3)

                             =  x1 x2  + x1 x3 + y1 y2 + y1 y3

                                                = x1 x2  + y1 y2 + x1 x3 + y1 y3

                             = u · v  +  u · w

 

 

Using all the junk we’ve just proved, we can now show

v · w  =  |v| |w| cos θ

 

Consider two vectors v = (a,b) and w = (c,d):

 

[Figure: vectors v = (a, b) and w = (c, d) drawn from the origin, with angle θ between them.]

 

 

 

 

We break w up into two different vectors:

          wperp, which is perpendicular to v, and wpara, which is parallel to v.

 

By the above, we have

 

          v · w  = v · (wperp + wpara) = v · wperp + v · wpara

 

But v · wperp = 0, and by the special case, v · wpara = |v| |wpara| cos θ(v, wpara)

 

where θ(v, wpara) is the angle between v and wpara. But this angle is 0, so we get

 

v · w = v · wpara = |v| |wpara|

 

However, we know what |wpara| is – it's just |w| cos θ. Why? wpara is the base of a right triangle with hypotenuse w and angle θ. So substituting above for |wpara| yields

 

          v · w  =  |v| |w| cos θ

 

So we have proved the result in two dimensions. If we were working in 3 space, where we’d have vectors like

 

          v = (x1, y1, z1)

          w = (x2, y2, z2)

 

then analogously we define v · w = x1 x2 + y1 y2 + z1 z2. Since any two vectors lie in a plane (doesn’t matter how many dimensions we are in) we can still talk about the angle between two vectors, and the analogous statement is true.

 

The three things to take away from Dot Products are:

 

          [1] Two vectors have dot product zero if and only if they are perpendicular

          [2] The dot product of two vectors equals the product of their lengths if and only if the two vectors point in the same direction (it equals minus that product if they point in exactly opposite directions).

          [3] The Dot Product measures the angle between two vectors. More precisely, cos θ = (v · w) / (|v| |w|). So, if I know the lengths of two vectors AND I know their dot product, I can immediately measure the angle between them!
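
As a quick illustration of [3], here is a Python sketch (numpy assumed) that recovers the angle between two vectors from their lengths and their dot product:

import math
import numpy as np

def angle_between(v, w):
    """Angle between v and w, in degrees, from cos(theta) = v.w / (|v| |w|)."""
    cos_theta = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
    cos_theta = max(-1.0, min(1.0, cos_theta))   # guard against round-off pushing past +-1
    return math.degrees(math.acos(cos_theta))

print(angle_between(np.array([1.0, 0.0]), np.array([0.0, 2.0])))   # 90.0, perpendicular
print(angle_between(np.array([1.0, 1.0]), np.array([3.0, 3.0])))   # 0.0, same direction
print(angle_between(np.array([1.0, 0.0]), np.array([1.0, 1.0])))   # 45.0
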

 

 

 

 

 

 

[14] DETERMINANTS - I

 

There are several interpretations for Determinant. For now, we will view it as a function whose input is a SQUARE matrix and whose output is a number. We will see in the 2x2 case that this number is the AREA of the parallelogram formed by the rows of A.

 

If the rows of A are parallel, then this parallelogram will have zero area; if the rows of A aren’t parallel, then this parallelogram will have non-zero area. So, the Determinant provides a quick check of whether or not two vectors are in the same direction.

 

In the plane, this isn't too important; however, in higher dimensions it becomes indispensable. Let's say we are in 3-dimensional space. A is a 3x3 matrix, so its three rows give us three vectors. They form the generalization of a parallelogram, a parallelepiped. (I may have the terminology wrong – it's been a long time since I've used these words!) So instead of talking about area, we should talk about volume. If the three vectors lie in one plane, then this parallelepiped will have zero volume. If the three vectors don't lie in one plane, then the parallelepiped will have non-zero volume. So for 3x3 matrices, the Determinant will measure whether the three rows lie in a plane, or whether they 'fill' all of three-space. Eventually we'll see this is related to the question of when a matrix is invertible.

 

Now for the definition for 2x2 matrices.

 

                             (a b)

Let     A       =       (c d)

 

Then we denote Determinant(A) several ways:

 

                                                |a b|

Determinant(A) = Det(A)   =      |c d|  = ad - bc

 

Let’s see that Det(A) does give the area in certain special cases.

 

 

 

 

CASE 1:

          (  a   b)

A  =   (3a 3b)

 

 

Then Det(A) = a 3b - b 3a = 0.

 

Note there’s nothing special about 3:

 

          (  a    b)

A   = (ma mb)

 

Then Det(A) = a mb - b ma = 0.

 

So, when one row is parallel to another, we do get Det(A) = 0!

 

 

CASE 2:

 

          (a 0)

A   = (c d)

 

[Figure: the parallelogram with sides v = (a, 0) along the x-axis and w = (c, d); its base is a and its height is d.]

 

 

The base of the parallelogram is a, the height is d. Hence the area is ad.

But Det(A) = ad - 0c = ad. So in this case, it works.

 

 

Now we consider the general 2x2 case and, using the Dot Product, we’ll prove that Det(A) = area of parallelogram formed by the rows of A.

 

 

                             (a b)

Again, take A   =   (c d)

 

[Figure: v = (a, b) and w = (c, d) with angle θ between them; w is split into wpara, the component parallel to v, and wperp, the component perpendicular to v.]

 

 

 

 

 

|v|^2 = a^2 + b^2 and |w|^2 = c^2 + d^2 (by the Pythagorean Theorem).

 

|wpara| = |w| cos θ, |wperp| = |w| sin θ.

 

So the area of the parallelogram is |v| |wperp|, or

 

          Area = |v| |wperp| = |v| |w| sin θ

 

But cos^2 θ + sin^2 θ = 1, so sin θ = Sqrt[1 - cos^2 θ].

 

Moreover, |v| |w| cos θ = v · w = ac + bd.

 

Dividing by |v| |w| yields  cos θ = (ac + bd) / (|v| |w|)

 

Hence Area = |v| |w| Sqrt[1 - cos^2 θ]

                   = |v| |w| Sqrt[1 - (ac+bd)^2 / (|v|^2 |w|^2) ]

                   = Sqrt[ |v|^2 |w|^2 - (ac+bd)^2 ]

 

Substituting |v|^2 = a^2 + b^2 and |w|^2 = c^2 + d^2 yields

 

          Area   = Sqrt[ (a^2 + b^2)(c^2 + d^2) - (ac+bd)^2 ]

                   = Sqrt[ a^2c^2 + a^2d^2 + b^2c^2 + b^2d^2 - a^2c^2 - 2acbd - b^2d^2 ]

                   = Sqrt[ a^2d^2 + b^2c^2 - 2acbd ]

                   = Sqrt[ a^2d^2 - 2adbc + b^2c^2 ]

                   = Sqrt[ (ad - bc)^2 ]

                   = |ad - bc|

 

So the area of the parallelogram is |ad - bc|, which is just Det(A) up to sign (the sign of the determinant records the orientation of the two rows)!
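
Here is a small Python sketch (numpy assumed) comparing the determinant with the area computed from the |v| |w| sin θ formula above:

import math
import numpy as np

a, b = 1.0, 2.0
c, d = 4.0, 1.0
A = np.array([[a, b], [c, d]])

v = np.array([a, b])
w = np.array([c, d])
cos_theta = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
sin_theta = math.sqrt(1.0 - cos_theta**2)
area = np.linalg.norm(v) * np.linalg.norm(w) * sin_theta

print(np.linalg.det(A))    # ad - bc = -7.0 (the sign records orientation)
print(area)                # 7.0, the actual area of the parallelogram
print(math.isclose(area, abs(np.linalg.det(A))))   # True
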

 

One of the reasons Determinant is such a useful function is this: say we start with a matrix A, and we do Gaussian Elimination by adding multiples of one row to another (no row swaps and no rescaling of rows), ending up with a matrix B. Then A and B have the same determinant!

 

The reason is Gaussian Elimination is just adding multiples of one row to another. So, let’s start with the matrix

 

                             (a b)

          A       =       (c d)

 

 

 

 

[Figure: the parallelogram spanned by v = (a, b) and w = (c, d), drawn with v along the x-axis.]

 

(To simplify things, I’m drawing it as if v is along the x-axis, though the method of proof works in general. This just makes the pictures look nicer).                                                                                                                                 

Now let’s say we add on a small multiple of (a,b) to (c,d). So we have a new vector w’ = (c+ma, d+mb). Geometrically:

 

[Figure: the original parallelogram spanned by v = (a, b) and w = (c, d), and the new parallelogram spanned by v and w' = (c+ma, d+mb); both have the same base v and the same height.]

 

 

Notice that they have the same base and the same height! Hence the two areas are the same.

 

We can also argue algebraically:

 

 

Det(A) = ad - bc.

 

          (   a           b   )

B  =   (c+ma  d+mb)

 

Then Det(B) = a(d+mb) - b(c+ma)

                    = ad + mab - bc - mab = ad - bc
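
Here is a quick Python check (numpy assumed) that adding a multiple of one row to another leaves the determinant unchanged:

import numpy as np

A = np.array([[1.0, 2.0],
              [4.0, 1.0]])

m = 2.5
B = A.copy()
B[1, :] = B[1, :] + m * B[0, :]   # add m times row 0 to row 1

print(np.linalg.det(A))   # -7.0
print(np.linalg.det(B))   # -7.0 as well
print(np.isclose(np.linalg.det(A), np.linalg.det(B)))   # True
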

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

NOTES ON LINEAR ALGEBRA

 

CONTENTS:

[15] COMPLEX NUMBERS

[16] FINDING EIGENVALUES

 

 

[15] COMPLEX NUMBERS

 

While we’ve seen in previous sections how useful eigenvalues and eigenvectors can be, we haven’t yet seen how to find them! If it’s a very complicated process, then the benefits they provide could be canceled by the work needed to find them. Fortunately, all one needs to do is solve a polynomial and perform Gaussian Elimination.

 

Somehow, to each square matrix we’ll attach a polynomial in one variable, whose degree is the number of columns (or equivalently, the number of rows). So to find the eigenvalues of a 2x2 matrix just requires us to solve a quadratic equation, which is trivial by the quadratic formula.

 

Unfortunately, a polynomial with real coefficients does not necessarily have real roots. For example, x2 + 1 = 0 has two roots, x = i and x = -i, where as always i = Sqrt[-1].

 

Reminder: below is a list of types of numbers. Each one is a subset of the next:

 

          [1] Integers: ..., -2, -1, 0, 1, 2, ...

          [2] Rationals: p/q, where p, q are integers and q ≠ 0

          [3] Reals: think any terminating or infinite decimal

          [4] Complex: of the form a + bi where a and b are real numbers

 

So, even if we want to study ONLY matrices with real coefficients, we may need to introduce complex numbers to find their eigenvalues. For example, consider the following matrix

 

 

                   (0 -1)

R       =       (1  0)

 

We’ll see later that this has eigenvalues ±i.

 

However, all is not lost. We have several theorems that will help us in our study:

 

 

FUNDAMENTAL THEOREM OF ALGEBRA:

Consider a polynomial of one variable, of degree n, with real or complex coefficients. Then it has n complex roots (not necessarily distinct).

 

 

THEOREM OF COMPLEX CONJUGATION:

Let f(x) be a polynomial with real coefficients. If z is a root of f(x) (i.e., f(z) = 0), then so is the complex conjugate of z.

 

 

EIGENVALUES OF SYMMETRIC MATRICES:

If all the entries of a symmetric matrix are real, then all of its eigenvalues are real.

 

 

Basic properties of complex numbers:

 

ADDITION:

    2 + 3i                  11 - 7i

          +  4 - 5i                 + 8 + 8i

          -----------                -----------

              6 - 2i                   19 + i

 

 

 

 

 

 

 

 

MULTIPLICATION:

 

Recall (a+b)(c+d) = ac + ad + bc + bd. This is how you multiply complex numbers, but you must remember that i^2 = -1.

 

For example:

 

          (2 - 3i)(5 + 2i)        =       2*5 + 2*2i - 3i*5 - 3i*2i

                                      =       10 + 4i - 15i - 6i^2

                                      =       10 - 11i + 6

                                      =       16 - 11i
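
Python happens to have complex numbers built in (it writes j instead of i), so arithmetic like this is easy to check:

# Python writes the imaginary unit as j, so 2 - 3i is entered as 2 - 3j.
z = 2 - 3j
w = 5 + 2j

print(z + w)            # (7-1j)
print(z * w)            # (16-11j), matching the hand computation above
print(z.conjugate())    # (2+3j), the complex conjugate of z
print(abs(z))           # 3.605551275463989, i.e. Sqrt[2^2 + 3^2]
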

 

 

GRAPHICAL REPRESENTATION:

 

 

[Figure: the complex plane, with the points 2+4i, 3+i, 2-2i, and -2-2i plotted; the horizontal axis is the real part and the vertical axis is the imaginary part.]

 

 

 

 

 

 

 

 


COMPLEX CONJUGATION:

                               

If z = x + iy, then the complex conjugate of z (written with a bar over the z) is x - iy.

So 3 - 2i goes to 3 + 2i, -5 - 7i goes to -5 + 7i, -11 goes to -11, and 76i goes to -76i. Remember any real number x can be written x + 0i. Any number of the form 0 + iy is said to be purely imaginary.

 

[16a] FINDING EIGENVECTORS (First Version)

 

A vector has two parts: (1) a direction; (2) a magnitude. Let A be a matrix, and v a vector. Then Av is a new vector. In general, Av and v will be in different directions. However, sometimes one can find a special vector (or vectors) where Av and v have the same direction. In this case we say v is an eigenvector of A. For shorthand, we often drop the ‘of A’ and say v is an eigenvector.

 

However, in general v will not equal Av – they may be in the same direction, but they’ll differ in magnitude. For example, Av may be twice as long as v, or Av = 2v. Or maybe it’s three times, giving Av = 3v. Or maybe it’s half as long, and pointing in the opposite direction: Av =  -½ v.

 

In general, we write for v an eigenvector of A:

 

          Av = λ v, where λ is called the eigenvalue.

 

One caveat: for any matrix A, the zero vector 0 satisfies A 0 = 0. But it also satisfies A 0 = 2 0, A 0 = 3 0, .... The zero vector would always be an eigenvector, and any real number would be its eigenvalue. Later you’ll see it’s useful to have just one eigenvalue for each eigenvector; moreover, you’ll also see non-zero eigenvectors encode lots of information about the matrix. Hence we make a definition and require an eigenvector to be non-zero.

 

The whole point of eigenvectors / eigenvalues is that instead of studying the action of our matrix A on every possible direction, if we can just understand it in a few special ones, we’ll understand A completely. Studying the effect of A on the zero vector provides no new information, as EVERY matrix A acting on the zero vector yields the zero vector.

 

Note: what is an eigenvector for one matrix may not be an eigenvector for another matrix. For example:

 

(1 2) (1)  =   (3)  = 3 (1)

(2 1) (1)       (3)         (1)

 

so here (1,1) is an eigenvector with eigenvalue 3.

 

(1 1) (1)       =       (2)

(2 2) (1)                 (4)

 

and as (2,4) is not a multiple of (1,1), we see (1,1) is not an eigenvector.

 

 

Let’s find a method to determine what the eigenvector is, given an eigenvalue.

 

So, we are given as input a matrix A and an eigenvalue λ, and we are trying to find a non-zero vector v such that Av = λ v.

 

Remember, if I is the identity matrix, Iv = v for any vector v. This is the matrix equivalent of multiplying by 1.

 

Av = λ v      in algebra, we put all the unknowns on one side.
                    So here we subtract λv from both sides. I'm going to
                    write 0 for the zero vector, but remember, it is not
                    just a number, but a vector where every component is zero.

Av - λ v = 0

Av - λIv = 0 remember, your prof is from an Iv-y school: put in the 'I'

(A - λI) v = 0

See, λI is a matrix, A is a matrix, so the above is legal. We NEED to put in the Identity matrix. Otherwise, if we went from Av - λv to (A - λ)v we'd be in trouble. There, A is a matrix (say 2x2), but λ is a number. And we don't know how to subtract a number from a matrix. We do, however, know how to subtract two matrices. Hence we can calculate what A - λI is, and then do Gaussian Elimination.

 

 

Let’s do an example: 

 

Say A is

 

(4 3)

(2 5)

 

and say λ = 2. We now try to find the eigenvector v.

 

Av = 2v

Av - 2v = 0

Av - 2Iv = 0

(A - 2I)v = 0

 

Let’s determine the matrix A - 2I

 

I is     (1 0)      so λI is     (2 0)

          (0 1)                      (0 2)

 

Hence

 

A - λI =       (4 3)   -        (2 0)   =      (2  3)

                    (2 5)            (0 2)            (2  3)

 

So we are doing Gaussian elimination on the above matrix. Let’s write v = (x,y). Then we must solve:

 

(2 3) (x)       =       (0)

(2 3) (y)                 (0)

 

So, we multiply the first row by -1 and add it to the second, getting

 

(2 3) (x)       =       (0)

(0 0) (y)                 (0)

 

The second equation, 0x + 0y = 0, is satisfied for all x and y. The first equation, 2x + 3y = 0, says y = - 2/3  x. So we see that

 

v   =   (x)     =       (    x    )

          (y)               (-2/3 x)

 

Now x is arbitrary, as long as v is not the zero vector. There are many different choices we can make. We can take x = 1 and get the vector (1, -2/3). We can take x = 3, and get the vector v = (3,-2). Notice, however, that the second choice is in the same direction as the first, just a different magnitude.
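
A quick Python check (numpy assumed) that v = (3, -2) really does satisfy Av = 2v for this matrix:

import numpy as np

A = np.array([[4.0, 3.0],
              [2.0, 5.0]])
v = np.array([3.0, -2.0])

print(A @ v)                       # [ 6. -4.]
print(2 * v)                       # [ 6. -4.]
print(np.allclose(A @ v, 2 * v))   # True, so v is an eigenvector with eigenvalue 2
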

 

This reflects the fact that if v is an eigenvector of A, then so is any non-zero multiple of v. Moreover, it has the same eigenvalue. Here's the proof:

 

Say Av = λv. Consider the vector w = mv, where m is any non-zero number.

 

Then Aw    =       A(mv)
                   =       m(Av)
                   =       m(λv)
                   =       λ(mv)
                   =       λw

 

Hence the different choices of x just correspond to taking different lengths of the eigenvector. Usual choices include x = 1, x = whatever is needed to clear all denominators, and x = whatever is needed to make the vector have length one.

 

 

 

[16b] FINDING EIGENVALUES (Second Version)

 

We now (finally!) shall see how to find the eigenvalues for a given matrix. Let’s look at a SQUARE matrix A, and see what numbers can be eigenvalues, and what numbers can’t. Let I be the corresponding identity matrix. So, if A is 2x2, I is 2x2; if A is 3x3, I is 3x3, etc.

 

If λ is an eigenvalue of A, that means there is a non-zero vector v such that

 

          A v    =       λ v

 

But Iv = v (as I is the Identity matrix) so

 

          A v              =       λ I v

          A v - λ I v   =       O, where O is the zero vector.

          (A - λI) v     =       O

 

Now, A - λI is a new matrix. Let's call it Bλ. Remember how we subtract matrices:

 

          (a b)                      (e f)                       (a-e   b-f)

          (c d)            -        (g h)            =       (c-g   d-h)

 

So, we are trying to find pairs λ and v (v non-zero) such that

 

          Bλ v = O

 

Assume the matrix Bλ is invertible. Then we can multiply both sides by Bλ^-1 and we get

 

          Bλ^-1 Bλ v = Bλ^-1 O

 

But any matrix acting on the zero vector is the zero vector. Hence the Right Hand Side is just O. On the left, Bλ^-1 Bλ = I, the Identity matrix. So the Left Hand Side is just v.

 

Hence, if Bλ is invertible, we get v = O. But v must not be the zero vector!

 

So we have found a necessary condition:

 

Given a square matrix A, λ is not an eigenvalue of A if A - λI is invertible. Hence the only candidates are those λ such that A - λI is not invertible.

 

 

 

 

 

 

 

 

It can actually be shown that this necessary condition is in fact sufficient, namely, if A - λI is not invertible, then λ is an eigenvalue and there is an eigenvector v. Unfortunately, even if the matrix A has all real entries, it's possible that its eigenvector could have complex entries, so we will not give a proof now.

 

Hence we need an easy way to tell when a matrix is invertible, and when it isn’t. It turns out that if A is a square matrix (remember, only square matrices are invertible), then A is invertible if and only if Determinant(A) is non-zero. We’ll talk more about this later, for now, you may trust Fine Hall.

 

Given a square matrix A, λ is an eigenvalue of A if and only if Determinant(A - λI) = 0.

 

 

 

 

 

 

 

Let’s now do an example. Consider

 

                   (3 2)

A       =       (4 1)

 

Now

                   (λ 0)

λI      =       (0 λ)

 

and

 

                             (3-λ   2    )

A - λI          =       (4       1-λ)

 

 

Determinant(A-λI) =       (3-λ)(1-λ) - (2)(4)

                                      =       3 - λ - 3λ + λ^2 - 8

                                      =       λ^2 - 4λ - 5

                                      =       (λ - 5)(λ + 1)

                                      So λ = 5 or λ = -1, agreeing with the Homework

 

 

Let’s do one more example:

 

                   (2 6)

A       =       (4 4)

 

Now

                   (λ 0)

λI      =       (0 λ)

 

and

 

                             (2-λ   6    )

A - λI          =       (4       4-λ)

 

 

Determinant(A-λI) =       (2-λ)(4-λ) - (6)(4)

                                      =       8 - 4λ - 2λ + λ^2 - 24

                                      =       λ^2 - 6λ - 16

                                      =       (λ - 8)(λ + 2)

                                      So λ = 8 or λ = -2.
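
Here is a Python sketch (numpy assumed) that checks both examples; numpy can compute eigenvalues for us, which is a handy way to confirm a hand calculation:

import numpy as np

A1 = np.array([[3.0, 2.0],
               [4.0, 1.0]])
A2 = np.array([[2.0, 6.0],
               [4.0, 4.0]])

print(np.linalg.eigvals(A1))   # 5 and -1 (possibly listed in a different order)
print(np.linalg.eigvals(A2))   # 8 and -2 (possibly listed in a different order)

# We can also plug a candidate value of lambda into Det(A - lambda I) directly:
lam = 5.0
print(np.linalg.det(A1 - lam * np.eye(2)))   # 0.0 up to round-off, so 5 is an eigenvalue
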

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

NOTES ON LINEAR ALGEBRA

 

CONTENTS:

[18] VECTORS AND MATRICES

[19] GENERAL REVIEW

 

 

[18] VECTORS AND MATRICES

 

This section will be a general review on the differences between vectors and matrices. Depending on what problem you’re studying, there are several different ways of looking at a matrix. For this section, we will look at matrices as maps from one Vector Space to another Vector Space.

 

We won’t go into a technical definition of what a vector space is. Instead, I’ll just mention the ones we’ll be considering: the set of all vectors with exactly two components; the set of all vectors with exactly three components; the set of all vectors with exactly four components; etc.

 

Now, a vector has magnitude and direction. Let’s take a vector v, and have a matrix A act on it. Now, not every matrix can act on every vector. We have the old row-column rule, which says the number of columns of our Matrix must equal the number of rows of our vector.

 

Hence

 

(1 3 5) (4)

(4 6 1) (2)

 

does not make sense: we get 1*4 + 3*2 + 5*???. However,

 

(1 3 5) (4)

(4 6 1) (2)

(3 7 9) (3)

(3 4 1)

 

does make sense.

 

 

Let’s consider Av, where our matrix A and v are chosen so that this makes sense. For example, we could have

 

A       =       (1 2 3)

                   (4 5 6)

 

and

                   (x)

v        =       (y)

                   (z)

 

Then we find that Av equals

 

(1x+2y+3z)

(4x+5y+6z)

 

Note that v is in three-space: it has exactly three components. Av, however, is in two-space: it has exactly two components.

 

Hence we cannot talk about Av + v. It’s impossible for v to be an eigenvector for A. Why? Let’s say it is an eigenvector, with eigenvalue 2. Then we’d have Av = 2v. The left hand side is a vector with two components. The right hand side is a vector with three components. Trouble!

 

Think of it as A takes as input vectors with three components, and outputs vectors with just two components. So we cannot talk about Av + v.
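
In Python (numpy assumed) the shapes make this concrete: trying to add Av and v raises an error, because the two vectors live in different spaces.

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])    # 2x3: takes vectors with 3 components to vectors with 2
v = np.array([1.0, 1.0, 1.0])

Av = A @ v
print(Av.shape)    # (2,)  -- two components
print(v.shape)     # (3,)  -- three components

try:
    print(Av + v)
except ValueError as err:
    print("cannot add them:", err)   # the shapes (2,) and (3,) do not match
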

 

This is similar to our troubles with eigenvalue problems. Let’s assume now A is a nice 2x2 matrix, say A equals

 

(5 5)

(7 3)

 

and let’s say someone is kind enough to tell us 2 is an eigenvalue, but is unkind enough to ask us to give the corresponding eigenvector. We reason like

 

Av               =       2v

Av - 2v        =       0

 

But we cannot write (A-2)v. Why? A is a 2x2 matrix, whereas 2 is just a number. Hence A - 2 is not defined. What we can do is remember your professor came from an IVy League school.

 

For any vector v, Iv = v. So 2Iv = 2v.

 

IMPORTANT NOTE: we are not saying that 2 = 2I. 2 is a number, 2I is a matrix. What we are saying is that the effect of acting on a vector v with the number 2 is the same as the effect of acting on the vector v by the matrix 2I.

 

Then we get

 

Av - 2Iv       =       0

(A-2I) v       =       0

 

 

 

Let’s now quickly review adding vectors. For ease of writing, I’m going to write the vectors horizontally instead of vertically.

 

So, instead of writing

 

(1)

(4)

 

I’ll write (1,4).

 

Let’s look at 2(1,4). What does this mean? It means we add two copies of (1,4). The answer is (1,4) + (1,4) = (2,8). One adds vectors by adding them componentwise. So, to add two vectors, they must have the same number of components.

 

3(1,4) = (1,4) + (1,4) + (1,4) = (1+1+1,4+4+4) = (3,12).

 

More generally, let r be any number. Then

 

r(1,4) = (1*r, 4*r).

 

Fractions get a little tricky, but if you remember the above, it should hopefully lessen the confusion. Let’s take, for example,

 

9/5 (1,4)

 

Now, if I write 9/5 as 1.8, then it would be

 

1.8 (1,4) = (1*1.8, 4*1.8) = (1.8, 7.2) = (9/5, 36/5).

 

When we have a fraction, you just have to remember it’s the fraction times the first component is the new first component; the fraction times the second component is the new second component; etc.

 

So 9/5 (1,4) = (1 * 9/5, 4 * 9/5) = (9/5, 36/5).

 

NOT 9/5 (1,4) = (1 * 9, 4 * 5). WRONG!

 

 

[19] GENERAL REVIEW

 

This will be a general review on the differences between matrices, vectors, and numbers. Lots of things that you can do with numbers sadly don’t hold for matrices. However, some things are the same, so it can get a little confusing. Remember, whenever you write something, you need to have a reason justifying it. Being true for numbers is NOT a valid reason.

 

Let’s look at some examples that are true for numbers and matrices:

 

1/Addition

3 + 5 = 5 + 3

 

Or, it doesn’t matter what order you add two numbers.

 

A + B = B + A

 

For example,

 

(1 2)   +       (3 4)            =       (4 6)   =       (3 4)   +       (1 2)

(5 6)            (0 1)                      (5 7)            (0 1)            (5 6)

 

So, you can add two matrices in any order.

 

You can also add two vectors in any order.

 

 

2/Multiplying in a Sequence

Recall what  2 * (3 * 4) means. It means FIRST we multiply 3 and 4, THEN we multiply that by 2 on the left. This is the same as (2 * 3) * 4, which means first multiplying 2 by 3, then multiplying that by 4.

 

For matrices, it's the same. A(BC) = (AB)C. However, please note that we do not have A(BC) = (AC)B. And we also don't have A(BC) = A(CB). We have to keep the matrices in the same order.

 

 

Let’s look at some things that are different:

 

 

3/Getting Zero:

If m and n are two numbers, and mn = 0, then either m = 0, n = 0, or both m and n equal zero. This is not true for multiplying matrices. For example:

 

Consider the following product:

 

(0 1) (1 0)

(0 0) (0 0)

 

How do we find the first column of the product? It’s just

 

(0 1) (1)       =       (0*1 + 1*0)  =       (0)

(0 0) (0)                 (0*1 + 0*0)  =       (0)

 

How do we find the second column? It’s just

 

(0 1) (0)       =       (0*0 + 1*0)  =       (0)

(0 0) (0)                 (0*0 + 0*0)  =       (0)

 

Hence

 

(0 1) (1 0)    =       (0 0)

(0 0) (0 0)              (0 0)

 

So, even though neither matrix is zero, their product is.

 

 

 

 

 

4/Switching Order

For numbers, mn = nm – it doesn’t matter which order you multiply them. This, however, is not true for matrices. In general, AB does not equal BA. Let’s do a specific example.

 

(0 1) (0 0)

(0 0) (1 0)

 

Then the first column is:

 

(0 1) (0)       =       (0*0 + 1*1)  =       (1)

(0 0) (1)                 (0*0 + 0*1)           (0)

 

And the second column is:

 

(0 1) (0)       =       (0*0 + 1*0)  =       (0)

(0 0) (0)                 (0*0 + 0*0)           (0)

 

Hence

 

(0 1) (0 0)    =       (1 0)

(0 0) (1 0)              (0 0)

 

Let’s see what happens if you multiply them the other way:

 

(0 0) (0 1)

(1 0) (0 0)

 

Then the first column is:

 

(0 0) (0)       =       (0*0 + 0*0)  =       (0)

(1 0) (0)                 (1*0 + 0*0)           (0)

 

And the second column is:

 

(0 0) (1)       =       (0*1 + 0*0)  =       (0)

(1 0) (0)                 (1*1 + 0*0)           (1)

 

So we find that

 

(0 0) (0 1)    =       (0 0)

(1 0) (0 0)              (0 1)

 

So the two products are not equal!
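
Both facts (a product of two non-zero matrices can be the zero matrix, and AB need not equal BA) are easy to check in Python (numpy assumed):

import numpy as np

A = np.array([[0, 1],
              [0, 0]])
B = np.array([[0, 0],
              [1, 0]])
C = np.array([[1, 0],
              [0, 0]])

print(A @ C)   # [[0 0], [0 0]] -- neither A nor C is zero, but their product is
print(A @ B)   # [[1 0], [0 0]]
print(B @ A)   # [[0 0], [0 1]] -- not the same as A @ B
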

 

 

 

 

5/Now, let’s look at what the actions of different

objects on other objects give.

 

For example, let v be a vector, and consider any number, say 2 for definiteness. Then 2v will also be a vector. It will have the same direction as v, but twice the length. Similarly, -3v will point in the opposite direction from v, and be thrice (is that good Queen’s English?) the length.

 

Now let's consider a matrix acting on a vector, say Av. Then this will be a new vector, and except for special v (depending on the matrix A), the direction of Av will not be the same as v. Now, trivially the magnitude of Av will be some multiple of the magnitude of v (think about it – it has to be true!), but as in general their directions are different, it doesn't help us.

 

Now let's go back to the eigenvalue problem. Let's say we know A, and someone is kind enough to tell you lambda (in a few weeks, you'll know how to find it yourself). Let's say lambda is 5. Then we're trying to solve

 

          Av = 5v

 

We remember our algebra, which says we want all the unknowns on one side, so we subtract 5v, and get

 

          Av - 5v = 0-vector

 

Remember: Iv = v. The Identity matrix doesn’t change any vector. Hence

 

If       Iv       = v

then   5Iv     =5v

 

So we can substitute for 5v and we go from

 

          Av - 5v = 0-vector

to       Av - 5Iv = 0-vector

 

Now, why did we have to introduce the Identity matrix? We would’ve loved to have been able to go from

 

          Av - 5v        to       (A-5)v

 

but alas, we cannot. Why? A is a matrix, 5 is a number, and we cannot subtract a number from a matrix.

 

Note we're never saying 5 = 5I – the left hand side is a number, the right hand side is a matrix. What we are saying is 5v = 5Iv.

 

Now we get

 

          (A - 5I)v = 0.

 

And we can solve this by Gaussian Elimination.

 

 

6/Mnemonic for multiplying matrices:

Here’s a way to remember how to multiply matrices:

 

Say A is

(1 2)

(2 3)

 

and B is

(1 0)

(2 1)

 

And we want to find AB. Well, let’s call the first column of B the vector v; let’s call the second column of B the vector u. We know how to multiply

Av, and we know how to multiply Au. Matrix multiplication is just

 

AB     =       A(v u)         =       (Av   Au)

 

This gives the rule: write down the first matrix, then write down the first column of the second matrix. That multiplication gives the first column of the product matrix AB.

 

So, in our case:

 

Av     =       (1 2)(1)        =       (1*1 + 2*2)  =       (5)

(2 3)(2)                  (2*1 + 3*2)           (8)

 

 

Now we do the first matrix times the second column of B to get the second column of the product matrix AB:

 

Au     =       (1 2)(0)        =       (1*0 + 2*1)  =       (2)

(2 3)(1)                  (2*0 + 3*1)           (3)

Putting the two columns together,

AB     =       (5 2)
                   (8 3)
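
The same mnemonic in Python (numpy assumed): build the product column by column and compare with the built-in matrix product.

import numpy as np

A = np.array([[1, 2],
              [2, 3]])
B = np.array([[1, 0],
              [2, 1]])

v = B[:, 0]   # first column of B
u = B[:, 1]   # second column of B

col1 = A @ v              # [5 8]
col2 = A @ u              # [2 3]
by_columns = np.column_stack([col1, col2])

print(by_columns)                           # [[5 2], [8 3]]
print(A @ B)                                # the same matrix
print(np.array_equal(by_columns, A @ B))    # True
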

 

 

 

 

 

 

 

JORDAN CANONICAL FORM

Steven Miller sjmiller@math.ohio-state.edu

 

 

I.     Introduction:

We’ve seen that not every matrix is diagonalizable. For example, consider

 

                                                (0       1)

                                                (0       0)

 

Then direct calculation shows that it is not diagonalizable. Why do we care about diagonalizing matrices? The main reason is ease of computation. If we can write A = S Λ S^-1, then A^1000 = S Λ^1000 S^-1, and the calculations can be performed very quickly. If we had to multiply 1,000 powers of A, this would be very time consuming. Theoretically, we may not need such a time-saving method, but if we're trying to model any physical system or economic model, we're going to want to run calculations on a computer. And if the matrix is decently sized, very quickly these calculations will cause noticeable time-lags.
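
Here is a Python sketch (numpy assumed) of this trick for a matrix that IS diagonalizable; the eigenvector matrix plays the role of S and the diagonal matrix of eigenvalues plays the role of Λ:

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])           # symmetric, hence diagonalizable

eigenvalues, S = np.linalg.eig(A)    # columns of S are eigenvectors
n = 10
A_power = S @ np.diag(eigenvalues**n) @ np.linalg.inv(S)    # S Lambda^n S^-1

print(A_power)
print(np.linalg.matrix_power(A, n))                         # same answer
print(np.allclose(A_power, np.linalg.matrix_power(A, n)))   # True
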

 

Jordan Canonical Form is the answer. The Question? What is the 'nicest' form we can get an arbitrary matrix into? We already know that, to every eigenvalue, there is a corresponding eigenvector. If an nxn matrix has n linearly independent eigenvectors, then it is diagonalizable. Hence,

 

Theorem 1: If an nxn matrix A has n distinct eigenvalues, then A is diagonalizable, and for the diagonalizing matrix S we can take the columns to be the n eigenvectors (S^-1 A S = Λ).

 

In the proof of the above, we see all we needed was n linearly independent vectors. So we obtain

 

Theorem 2: If an nxn matrix A has n linearly independent eigenvectors, then A is diagonalizable, and for the diagonalizing matrix S we can take the columns to be the n eigenvectors (S^-1 A S = Λ).

 

Now consider the case of an nxn matrix A that does not have n linearly independent eigenvectors. Then we have

 

Theorem 3: If an nxn matrix does not have n linearly independent eigenvectors, then A is not diagonalizable.

 

          Proof: Assume A is diagonalizable by the matrix S.
                   Then S^-1 A S = Λ, or A = S Λ S^-1.
                   The standard basis vectors e1, ..., en are eigenvectors of
                   Λ, and as S is invertible, we get Se1, ..., Sen are
                   eigenvectors of A, and these n vectors are linearly
                   independent. (Why?)  But this contradicts the fact that
                   A does not have n linearly independent eigenvectors.
                   Contradiction, hence A is not diagonalizable.

 

So, in studying what can be done to an arbitrary nxn matrix, we need only study matrices that do not have n linearly independent eigenvectors.

 

Jordan Canonical Form Theorem (JCF):

Let A be an nxn matrix. Then there exists an invertible matrix M such that M^-1 A M = J, where J is a block diagonal matrix, and each block is of the form

 

                             (λ   1                                )
                             (     λ   1                           )
                             (          λ   1                      )
                             (                 . . .               )
                             (                          λ   1      )
                             (                               λ     )

 

Note J^1000 is much easier to compute than A^1000. In fact, there is an explicit formula for J^1000 if you know the eigenvalues and the sizes of each block.

 

II. Notation:

Recall that λ is an eigenvalue of A if Det(A - λI) = 0, and v is an eigenvector of A with eigenvalue λ if Av = λv. We say v is a generalized eigenvector of A with eigenvalue λ if there is some number N such that (A - λI)^N v = 0. Note all eigenvectors are generalized eigenvectors.

 

For notational convenience, we write gev for generalized eigenvector, or λ-gev for generalized eigenvector corresponding to λ.

 

We say the λ-Eigenspace of A is the subspace spanned by the eigenvectors of A that have eigenvalue λ. Note that this is a subspace, for if v and w are eigenvectors with eigenvalue λ, then av + bw is an eigenvector with eigenvalue λ (or the zero vector).

 

We define the λ-Generalized Eigenspace of A to be the subspace of vectors killed by some power of (A - λI). Again, note that this is a subspace.

 

 

III. Needed Theorems:

 

Fundamental Theorem of Algebra: Any polynomial with complex coefficients of degree n has n complex roots (not necessarily distinct).

 

Cayley-Hamilton Theorem: Let p(λ) = Det(A - λI) be the characteristic polynomial of A. Let λ1, ..., λk be the distinct roots of this polynomial, with multiplicities n1, ..., nk (so n1 + ... + nk = n). Then we can factor p(λ) as

p(λ) = (λ - λ1)^n1 (λ - λ2)^n2 * ... * (λ - λk)^nk,

and the matrix A satisfies

p(A) = (A - λ1 I)^n1 (A - λ2 I)^n2 * ... * (A - λk I)^nk  = 0.
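
A quick numerical check of the Cayley-Hamilton Theorem in Python (numpy assumed), using a matrix whose characteristic polynomial we computed earlier: A = (3 2; 4 1) has p(λ) = λ^2 - 4λ - 5 = (λ - 5)(λ + 1), and plugging A into its own characteristic polynomial gives the zero matrix.

import numpy as np

A = np.array([[3.0, 2.0],
              [4.0, 1.0]])
I = np.eye(2)

# p(lambda) = lambda^2 - 4 lambda - 5 = (lambda - 5)(lambda + 1)
print(A @ A - 4 * A - 5 * I)       # [[0. 0.], [0. 0.]]
print((A - 5 * I) @ (A + 1 * I))   # also the zero matrix, using the factored form
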

 

Schur's Lemma (Triangularization Lemma): Let A be an nxn matrix. Then there exists a unitary U such that U^-1 A U = T, where T is an upper triangular matrix.

 

Proof: construct U by fixing one column at a time.

 

IV. Reduction to Simpler Cases:

In the rest of this handout, we will always assume A has eigenvalues λ1, ..., λk, with multiplicities n1, ..., nk (so n1 + ... + nk = n). We will show that we can find n1 λ1-gev, n2 λ2-gev, ..., nk λk-gev, such that these n vectors are linearly independent (LI). These will then form (the columns of) our matrix M.

 

So, if we can show that the n generalized eigenvectors are linearly independent, and that each one 'block diagonalizes' where it should, it is enough to study each λ separately.

 

For example, we'll show it's sufficient to consider λ = 0. Let λ be an eigenvalue of A. Then if vj is a generalized eigenvector of A with eigenvalue λ, then vj is a generalized eigenvector with eigenvalue 0 of B = A - λI:

 

          A vj = λ vj + vj-1   →   B vj = 0 vj + vj-1.

 

So, if we can find nj LinIndep gev for B corresponding to 0, we've found nj LinIndep gev for A corresponding to λ.

 

The next simplification is that if we can find nj LinIndep gev for U^-1 B U, then we've found nj LinIndep gev for B. The proof is a straightforward calculation: let v1, ..., vm be the m LinIndep gev for U^-1 B U; then U v1, ..., U vm will be m LinIndep gev for B.

 

Lemma 4: Let p(λ) = (λ - λ1)^n1 (λ - λ2)^n2 * ... * (λ - λk)^nk be the char poly of A, so p(A) = (A - λ1 I)^n1 (A - λ2 I)^n2 * ... * (A - λk I)^nk. For 1 ≤ i ≤ k, consider (A - λi I). This matrix has exactly ni LinIndep generalized eigenvectors with eigenvalue 0, hence A has ni LinIndep generalized eigenvectors with eigenvalue λi.

 

Proof: For notational simplicity, we'll prove this for λ = λ1, and let's write m for the multiplicity of λ (so m = n1). Further, by the above arguments we see it is sufficient to consider the case λ = 0. By the Triangularization Lemma, we can put B = A - λI (which has first eigenvalue = 0) into upper triangular form. What we need from the proof is that if we take the first column of U1 to be v, where v is an eigenvector of B corresponding to eigenvalue 0, then the first column of T = U1^-1 B U1 would be (0, 0, ..., 0)^T.

 

The lower (n-1)x(n-1) block of Tn, call it Cn-1, is upper triangular, hence the eigenvalues of B appear as the entries on the main diagonal. Hence we can again apply the triangularization argument to Cn-1, and get an (n-1)x(n-1) unitary matrix U2b, such that U2b^-1 Cn-1 U2b = Tn-1 has first column (0, 0, ..., 0), and the rest is upper triangular. Hence we can form an nxn unitary matrix U2

 

                   (1       0        0        ...       0)
                   (0                                     )
                   (0                U2b                  )
                   (...                                   )
                   (0                                     )

                                     

 

Then U2^-1 U1^-1 B U1 U2 =

 

                   (0       *        *        ...       *)

                   (0       0        *        ...       *)

                   (0       0        *        ...       *)

                   (...     ...       ...               ... )

                   (0       0        0        ...       *)

 

The net result is that we've now rearranged our matrix so that the first two entries on the main diagonal are zero. By 'triangularizing' like this m times, we can continue so that the upper mxm block is upper triangular, with zeros along the main diagonal, and the remaining entries on the main diagonal are non-zero (as we are assuming the multiplicity was m). Call this matrix Tm. Note there is a unitary U such that Tm = U^-1 B U. Remember, Tm and B are nxn matrices, not mxm matrices.

 

Sublemma 1: The vectors killed by powers of Tm form a subspace of dimension at most m.

Proof: direct calculation: when we multiply powers of Tm, we still have an upper triangular matrix. The entries on the main diagonal are zero for the first m terms, and then non-zero for the remaining terms (because the multiplicity of the eigenvalue λ = 0 is exactly m). Hence the vectors em+1, em+2, ..., en are not killed by powers of Tm, and so powers of Tm can have a nullspace of dimension at most m.

 

 

We now show that exactly m linearly independent vectors are killed by powers of Tm. This follows immediately from

 

Sublemma 2: Let C be an mxm upper triangular matrix with zeros along the main diagonal. Then C^m is the zero matrix.

Proof: straightforward calculation, left to the reader.
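
Here is a small Python check of Sublemma 2 (numpy assumed) for one 4x4 upper triangular matrix with zeros on the diagonal:

import numpy as np

C = np.array([[0, 1, 2, 3],
              [0, 0, 4, 5],
              [0, 0, 0, 6],
              [0, 0, 0, 0]])

print(np.linalg.matrix_power(C, 3))   # not yet the zero matrix
print(np.linalg.matrix_power(C, 4))   # the zero matrix, as Sublemma 2 predicts
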

 

Hence the nullspace of (Tm)^m (and all higher powers) has dimension exactly m, which proves there are m generalized eigenvectors of B with eigenvalue λ = 0. These vectors are LinIndep: as Tm is upper triangular with zeros on the main diagonal for the first m entries, Tm has m LinIndep gev e1, …, em with eigenvalue 0. As B = U Tm U^-1, B has m LinIndep gev Ue1, …, Uem with eigenvalue 0 (show that B cannot have any more LinIndep gev with λ = 0).

 

 

Returning to the proof of Lemma 4, we see that there are exactly n1 vectors killed by (A - λ1 I)^n1, ..., and nk vectors killed by (A - λk I)^nk.

 

The only reason we go through this triangularizing is to conclude that there are exactly ni linearly independent vectors killed by (A - λi I)^ni. Try to prove this fact directly!

 

 

IV. Appendix: Representation of λ-Generalized Eigenvectors.

 

We know that if λ is an eigenvalue with multiplicity m, there are m generalized eigenvectors, satisfying (A - λI)^m v = 0. We describe a very useful way to write these eigenvectors. Let us assume there are t eigenvectors, say v1, …, vt. We know there are m λ-gev. Note if v is a λ-gev, so is (A - λI)v, (A - λI)^2 v, …, (A - λI)^m v. Of course, some of these may be the zero vector.

 

We claim that each eigenvector is the termination of some chain of λ-gev. In particular, we have

 

                   (A - λI) v1,a  = v1,a-1
                   (A - λI) v1,a-1 = v1,a-2
                                                ...
                   (A - λI) v1  =  0                 where v1 = v1,1.

and

                   (A - λI) v2,b  = v2,b-1
                   (A - λI) v2,b-1 = v2,b-2
                                                ...
                   (A - λI) v2  =  0                 where v2 = v2,1.

all the way down to

                   (A - λI) vt,r  = vt,r-1
                   (A - λI) vt,r-1 = vt,r-2
                                                ...
                   (A - λI) vt  =  0                 where vt = vt,1,

 

and a + b + …+ r = m.

 

We emphasize that we have not shown that such a sequence of λ-gev exists. Later we shall show how to construct these vectors, and then in Lemma 8 we will prove they are Linearly Independent. For now, we assume their existence (and linear independence), and complete the proof of Jordan Canonical Form.

 

Let us say a λ-gev is a pure-gev if it is not an eigenvector. Thus, in the above we have t eigenvectors, and m-t pure-generalized eigenvectors. For notational convenience, we often label the λ-generalized eigenvectors by v1, …, vm. Thus, for a given j, we have (A - λI)vj = 0 if vj is an eigenvector, and (A - λI)vj = vj-1 if vj is a pure-gev.

 

                  

V. Linear Independence of the λ-Generalized Eigenspaces.

 

Assume the n1 gev corresponding to λ1 are linearly independent amongst themselves, and the same for the n2 gev corresponding to λ2, and so on. We now show that the n gev are linearly independent. This fact completes the proof of Jordan Canonical Form (of course, we still must prove the ni λi-gev are linearly independent).

 

Assume we have some linear combination of the n gev equaling zero. By LC λi-gev we mean a linear combination of the ni λi-gev. (This is just to simplify notation.)

 

Then (LC λ1-gev) + (LC λ2-gev) + ... + (LC λk-gev) = 0.

 

We’ll show first that the coefficients in the first linear combination are all zero. Recall the characteristic polynomial is

 

p(A) = (A - λ1 I)^n1 (A - λ2 I)^n2 * ... * (A - λk I)^nk.

 

 

Define

                   g1(A) = (A - λ2 I)^n2 (A - λ3 I)^n3 * ... * (A - λk I)^nk,

 

g1(A) kills (LC λ2-gev), g1(A) kills (LC λ3-gev), ..., g1(A) kills (LC λk-gev).

Why? For example, for the λ2-gev, they are all killed by (A - λ2 I)^n2, and hence as g1(A) contains this factor, they are all killed.

 

What does g1(A) do to (LC λ1-gev)? Again, for notational simplicity we'll write m for n1, and v1, ..., vm for the corresponding m λ1-gev.

 

We can look at it factor by factor, as all the different terms (A - λi I) commute.

 

Lemma 5: For i > 1, let the vj's be the gev corresponding to λ1.

If vj is a pure-gev, then (A - λi I) vj = (λ1 - λi) vj + vj-1.

If vj is an eigenvector, then (A - λi I) vj = (λ1 - λi) vj.

 

Again, the proof is a calculation: if vj is a pure-gev,

(A - λi I) vj    =  (A - λ1 I  +  λ1 I  -  λi I) vj
                    =  (A - λ1 I) vj  +  (λ1 I  -  λi I) vj
                    =  vj-1  +  (λ1 - λi) vj

The proof when vj is an eigenvector is similar.

 

 

Now we examine g1(A) ((LC λ1-gev) + (LC λ2-gev) + ... + (LC λk-gev)) = 0.

Clearly g1(A) kills the last k-1 linear combinations, and we are left with

          g1(A) (LC λ1-gev) = 0

 

Let's say the LC λ1-gev = a1 v1 + ... + am vm. We need to show that all the aj's are zero. (Remember we are assuming the vj's are linearly independent – we will prove this fact when we construct the vj's.) Assume am ≠ 0. From our labeling, vm is either an eigenvector, or a pure-gev that starts a chain leading to an eigenvector: vm, (A - λ1 I) vm, (A - λ1 I)^2 vm, …. Note no other chain will contain vm.

 

We claim that g1(A) (LC λ1-gev) will contain a non-zero multiple of vm. Why? When each factor (A - λi I) hits a vj, one gets back (λ1 - λi) vj + vj-1 if vj is not an eigenvector, and (λ1 - λi) vj if vj is an eigenvector. Regardless, we always get back a non-zero multiple of vj, as λ1 ≠ λi.

 

Hence direct calculation shows the coefficient of vm in g1(A) (LC λ1-gev) is

                   am (λ1 - λ2)^n2 (λ1 - λ3)^n3 * ... * (λ1 - λk)^nk

 

As we are assuming the different λ's are distinct (remember we grouped the eigenvalues together to have multiplicity), the factor multiplying am is non-zero. As we are assuming v1 through vm are linearly independent, the coefficient of vm must be zero, which forces am = 0. Similar reasoning implies am-1 = 0, and so on. Hence we have proved:

 

Theorem 5: Assuming that the ni generalized eigenvectors associated to the eigenvalue λi are linearly independent (for 1 ≤ i ≤ k), then the n generalized eigenvectors are linearly independent. Furthermore, there is an invertible M such that M^-1 A M = J.

 

The only item not immediately clear is what M is. As an exercise, show that one may take the columns of M to be the generalized eigenvectors of A. They must be put in a special order. For example, one may group all the λ1-gev together, the λ2-gev together, and so on. For each i, order the λi-gev as follows: say there are t eigenvectors, which give chains v1, …, v1,a, v2, …, v2,b, …, vt, …, vt,r. Then this ordering works (exercise).

 

 

 

 

 

VI. Finding the λ-gev:

The above arguments show we need only find the ni generalized eigenvectors corresponding to the eigenvalue λi; these will be of the form (A - λi I) vj = 0 or (A - λi I) vj = vj-1. Moreover, we've also seen we may take λi = 0 without loss of generality. For notational convenience, we write λ for λi and m for ni.

So we assume the multiplicity of λ = 0 to be m. Hence in the sequel we show how to find m generalized eigenvectors of an mxm matrix whose mth power vanishes. (By the triangularization above, finding m such generalized eigenvectors for this matrix is equivalent to finding m generalized eigenvectors for the original nxn matrix A.)

 

We define the following spaces, where A is our mxm matrix:

 

1.     N(A) = Nullspace(A). The dimension of this is the number of linearly independent eigenvectors, as we are assuming λ = 0.

2.     V1 = W1 = N(A)

3.     Vi = N(A^i), all vectors killed by A^i. Note that Vm is the entire space.

4.     Wi = {w ∈ N(A^i) such that w ⊥ N(A^(i-1))}, for 2 ≤ i ≤ m.
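
Here is a Python sketch (numpy assumed) of these spaces for one small example; the dimension of Vi = N(A^i) can be read off from the rank of A^i, and since Wi is the part of Vi perpendicular to Vi-1, dim(Wi) = dim(Vi) - dim(Vi-1):

import numpy as np

# A 4x4 matrix with A^3 = 0, so the only eigenvalue is 0, with multiplicity 4.
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0]])
m = A.shape[0]

prev_dim = 0
for i in range(1, m + 1):
    Ai = np.linalg.matrix_power(A, i)
    dim_Vi = m - np.linalg.matrix_rank(Ai)   # dim N(A^i), by rank-nullity
    print(f"i = {i}: dim(Vi) = {dim_Vi}, dim(Wi) = {dim_Vi - prev_dim}")
    prev_dim = dim_Vi
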

 

For example, assume we are in R^3, and A^2 is the zero matrix. Let's consider V2. For definiteness, assume V1 is 1-dimensional, and V2 is 3-dimensional. W1 is just V1. The problem is, if y1 and y2 are two vectors killed by A^2 and not by A, then it is possible that y1 - y2 (or some linear combination) is killed by A.

 

 

 

 

 

 

 


In the picture above (a line and a plane perpendicular to it), the line represents W1 and the plane represents W2. Anything in the 3-space is killed by A^2, and only those vectors along the line are killed by A itself. It is possible to take two vectors in R^3 that are linearly independent, neither of which lies on the line, but whose difference does lie on the line.

 

Why are we constructing such spaces as W2? Why isn’t V2 good enough? The reason is we want a very nice basis. The first basis vector will just be a vector in V1 = W1. For the other two directions, we can take two vectors perpendicular to W1. (How? This is a 3-dimensional space – simply apply Gram-Schmidt).

 

The advantage of such a basis is that if z1 and z2 are linearly independent vectors in W2, then the only way a z1 + b z2 can be in W1 is for a = b = 0. Why? W2 is a subspace, and as z1 and z2 are perpendicular to W1, so is their linear combination. So their linear combination is still in the plane perpendicular to W1, and as long as a and b are not both zero, it will not be the zero vector in the plane, hence it will be killed by A^2 and not A.

 

What we are really doing is Partial Orthogonal Complementation. Instead of finding the orthogonal complement of V1 in R^m, we are finding the orthogonal complement in V2.

 

Let L be the smallest integer such that A^L is the zero matrix. Then we only need to look up to WL and VL. VL will be an m-dimensional space (as every vector is killed by A^L). We'll have a nice basis for VL, consisting of the bases of W1, ..., WL. The advantage of this decomposition is that the spaces are mutually perpendicular, and if you have a linear combination of vectors in Wj, then the only way it can be in a Wb with b < j is if the combination is the zero vector.

 

Lemma 6: dim(Wi-1) ≥ dim(Wi), for i = 2, 3, ..., L.

Proof: Assume not: let N = dim(Wi), so N > dim(Wi-1). Consider a basis of Wi: z1, z2, ..., zN, and the vectors Az1, ..., AzN. Clearly each Azj is in Vi-1. We claim each Azj must have some component in Wi-1. Why? The smallest power of A that kills zj is A^i. If Azj had no component in Wi-1, then Azj would lie in Vi-2, so A^(i-2) would kill Azj, that is, A^(i-1) would kill zj, contradiction.

 

Let P = Pi-1 be the projection operator from Vi-1 to Wi-1. Note P^2 = P, and by the above arguments each vector Az1, ..., AzN has a non-zero component in Wi-1. Therefore the N vectors PAz1, ..., PAzN are N non-zero vectors in Wi-1.

 

As we are assuming that dim(Wi-1) < dim(Wi), the N vectors PAz1 through PAzN cannot be linearly independent, for the dimension of a subspace is the maximal number of linearly independent vectors you can have in that space. Hence there exist constants, not all zero, such that

 

                                      a1 PAz1 + ... + aN PAzN = 0

 

Hence PA (a1z1 + ... + aNzN) = 0. By the definition of Wi, the linear combination a1z1 + ... + aNzN is in Wi. Therefore the smallest power of A that kills it is A^i, unless it is the zero vector. As we are assuming the vectors z1 through zN are linearly independent, it is the zero vector only if a1 = ... = aN = 0.

 

As i > 1, A cannot kill a1z1 + ... + aNzN unless this is the zero vector. Could PA kill a non-zero vector of this form? No: by definition, if a1z1 + ... + aNzN is not the zero vector, then it is in Wi. Therefore A(a1z1 + ... + aNzN) has a non-zero component in Wi-1 (if not, that would contradict the fact that a1z1 + ... + aNzN is in Wi). Therefore PA(a1z1 + ... + aNzN) cannot be zero, as A(a1z1 + ... + aNzN) has a component in Wi-1. Therefore the only way PA(a1z1 + ... + aNzN) can be the zero vector is if a1z1 + ... + aNzN is the zero vector, which forces a1 = ... = aN = 0. Contradiction. QED.

 

REMARK: by Lemma 6, we see our previous example is impossible. An example that is consistent with Lemma 6 is to consider R^5, let V1 = W1 be three-dimensional, and let W2 be a plane perpendicular to W1.

 

 

Lemma 7: dim(Wi) ≥ 1 for i = 1, 2, ..., L.

Proof: As L is the smallest integer such that A^L is the zero matrix, there must be a vector killed by A^L but not by A^(L-1). Whence dim(WL) is at least 1, and by Lemma 6 we obtain dim(Wi) is at least 1 for i = 1, 2, ..., L.

 

We now show how to construct the m generalized eigenvectors. We find bases for the spaces W1, W2, ..., WL. We then use A to ‘pull back’.

 

It’s easier to explain by example. Let’s take m = 12 and L = 5, and for definiteness’ sake assume the dimensions are as follows:

 

            V1           V2           V3           V4           V5           V6
            W1           W2           W3           W4           W5           W6
dim W:      4            3            2            2            1            0
basis:      u1,...,u4    v1,...,v3    w1,w2        x1,x2        y

pullback:   A^4 y  <-    A^3 y  <-    A^2 y  <-    Ay  <-       y

Now, W4 is 2-dimensional.

 

WE DO NOT KNOW THAT Ay IS IN W4 !!! IT IS QUITE POSSIBLE THAT Ay IS KILLED BY A^4 AND NO SMALLER POWER OF A WITHOUT BEING IN W4 !!!

 

We know y is killed by A^5 and by no smaller power of A; hence Ay is killed by A^4 and by no smaller power of A. But this does not mean that Ay is in W4.

Fortunately, there is a huge degree of non-uniqueness in the Jordan Canonical Form. We do not need Ay to be in W4: all we need is for Ay (and A^2 y, A^3 y, ...) to be killed by A^4 and nothing lower (A^3 and nothing lower, and so on). We’ll see below how to handle this.

 

So for now, all we know is that Ay is in V4 with a non-zero projection onto W4, that A^2 y is in V3 with a non-zero projection onto W3, and so on.

 

W4 is 2-dimensional. Choose a vector x in W4 such that x is linearly independent of the projection of Ay onto W4. Then this x gives us another Jordan block:

            V1           V2           V3           V4           V5           V6
            W1           W2           W3           W4           W5           W6
dim W:      4            3            2            2            1            0
basis:      u1,...,u4    v1,...,v3    w1,w2        x1,x2        y

pullback:   A^4 y  <-    A^3 y  <-    A^2 y  <-    Ay  <-       y
pullback:   A^3 x  <-    A^2 x  <-    Ax  <-       x

 

We continue the game (noting that Ax is in V3, but not necessarily in W3). We already have two candidates for directions in V3, namely A^2 y and Ax. We’ll show later that, though they are not necessarily in W3, they are killed by A^3 and not by A^2, and that their projections onto W3 are linearly independent.

 

We need to find 3 directions in W2. The projections of A^3 y and A^2 x onto W2 give us at most two (these two directions could coincide; again, we will show later that this cannot happen). As W2 is a 3-dimensional space, we can find a vector v in W2 that is linearly independent of the projections of A^3 y and A^2 x:

            V1           V2           V3           V4           V5           V6

            A^4 y  <-    A^3 y  <-    A^2 y  <-    Ay  <-       y
            A^3 x  <-    A^2 x  <-    Ax  <-       x
            Av  <-       v

 

We then have to find four directions in W1 = V1, and we have three candidates: A^4 y, A^3 x, and Av. We’ll see later that these three candidates are linearly independent, hence we can find a fourth vector u linearly independent of the rest:

 

            V1           V2           V3           V4           V5           V6

            A^4 y  <-    A^3 y  <-    A^2 y  <-    Ay  <-       y
            A^3 x  <-    A^2 x  <-    Ax  <-       x
            Av  <-       v
            u

 

We now have enough candidates: 5 + 4 + 2 + 1 = 12 = m. We will show that they are linearly independent.

 

First we prove that A^2 y and Ax are linearly independent; then we will prove A^3 y and A^2 x are linearly independent, and so on. (Proof suggested by L. Fefferman and O. Pascu.) Assume a A^2 y + b Ax = 0. Then A(a Ay + b x) = 0. But Ay has a non-zero projection onto W4, and we’ve chosen x in W4 to be linearly independent of that projection. Therefore the smallest power of A that can kill a Ay + b x is A^4, unless it is the zero combination. Hence the only way this can be killed by A is if a = b = 0.

 

Similarly, assume a A^3 y + b A^2 x = 0. Then A^2(a Ay + b x) = 0, and by the same argument as above, a = b = 0. Note that we did not need Ay to be in W4, only that it has a non-zero projection there.

 

By construction, v is linearly independent of the projections of A^3 y and A^2 x onto W2. What about A^4 y, A^3 x, and Av? Assume a A^4 y + b A^3 x + c Av = 0. Then again we obtain A(a A^3 y + b A^2 x + c v) = 0, and the construction of v forces a = b = c = 0.

 

Lemma 8: Assume now some linear combination of the m generalized eigenvectors constructed above is zero. Then all the coefficients are zero.

Proof:

            V1           V2           V3           V4           V5           V6

            A^4 y  <-    A^3 y  <-    A^2 y  <-    Ay  <-       y
            A^3 x  <-    A^2 x  <-    Ax  <-       x
            Av  <-       v
            u

 

Assume the coefficient of y, call it a, is non-zero. Then we have

 

a y = - (rest of terms), where the rest is killed by A^4, but y isn’t.

 

Hence a must be zero.

 

Now assume the coefficients of Ay and x are a and b, respectively. Then

 

a Ay + b x = - (rest of terms), where the rest is killed by A^3.

 

As x is linearly independent of the projection of Ay onto W4, a Ay + b x is killed by A^4 and not by A^3 unless this combination is the zero vector. As the rest of the terms are killed by A^3, this implies a = b = 0.

 

Continuing to argue in this way, we obtain that all the coefficients are zero, and hence the generalized eigenvectors are linearly independent.

 

We can now build up our matrices M and J! At last!

 

For each eigenvalue λ, we have associated generalized eigenvectors. For definiteness’ sake, let’s consider the above case, and I’ll leave the generalization to you. We have 4 blocks corresponding to λ = 0: one block with 5 generalized eigenvectors (starting with the eigenvector A^4 y and ending with y), another block with 4 generalized eigenvectors (starting with the eigenvector A^3 x and ending with x), another block of length 2, and one of length 1.
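
If you want to see what this J looks like written out, here is a small sketch that stacks the blocks with scipy.linalg.block_diag; the helper jordan_block is mine. The 1’s sit just above the diagonal, which matches the ordering rule described next (A applied to a column of M gives the column to its left).

    import numpy as np
    from scipy.linalg import block_diag

    def jordan_block(lam, size):
        # A size x size Jordan block: lam on the diagonal, 1's just above it.
        return lam * np.eye(size) + np.diag(np.ones(size - 1), k=1)

    # The example above: eigenvalue 0 with blocks of sizes 5, 4, 2, 1 (total 12 = m).
    J = block_diag(*(jordan_block(0.0, s) for s in [5, 4, 2, 1]))
    print(J.shape)       # (12, 12)
    print(J[:5, :5])     # the 5x5 block coming from the chain A^4 y, ..., y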

 

We can order the blocks any way we want (that will just change the order of the blocks in J); however, within each block we must write the vectors starting with the eigenvector on the far left and going to the highest generalized eigenvector on the far right:

 

          (  A^4 y   A^3 y   A^2 y   Ay   y    A^3 x   A^2 x   Ax   x    Av   v    u  )

 

Another possible arrangement would be

 

          (  A^3 x   A^2 x   Ax   x    A^4 y   A^3 y   A^2 y   Ay   y    Av   v    u  )

 

and so on. I’ll leave it to you to verify that M^(-1) A M = J.
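
Here is a small numerical check of that verification on a toy 3x3 example of my own (one chain of length 2 and one of length 1), not the 12-dimensional example above:

    import numpy as np

    # A toy nilpotent matrix: rank 1 and A^2 = 0, so the blocks are 2x2 and 1x1.
    A = np.array([[1.0, -1.0, 0.0],
                  [1.0, -1.0, 0.0],
                  [1.0, -1.0, 0.0]])

    y  = np.array([1.0, 0.0, 0.0])   # killed by A^2 but not by A
    Ay = A @ y                       # the eigenvector at the bottom of y's chain
    u  = np.array([0.0, 0.0, 1.0])   # an eigenvector independent of Ay (the 1x1 block)

    # Columns: eigenvector first, then up the chain, then the next block.
    M = np.column_stack([Ay, y, u])
    print(np.round(np.linalg.inv(M) @ A @ M, 10))
    # [[0. 1. 0.]
    #  [0. 0. 0.]
    #  [0. 0. 0.]]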

 

VII. Calculation Shortcut:

When trying to find bases for the spaces Wi, there is a nice shortcut. First, we find a basis for Vi, or start to. How do we find vectors killed by A^i? We just have to find the nullspace of A^i. We do this by Gaussian Elimination, reducing A^i to an upper triangular matrix U. We can assume we’ve already found bases for W1 through W(i-1), or equivalently, a basis for V(i-1). Let’s say the basis for V(i-1) is b1, b2, ..., bq. Then if we add these q vectors as rows to U, forming a new matrix U’, we observe the following:

 

(1)  If U’ v = 0, then v is killed by A^i: the first rows of U’ are just the rows of U, so U’v = 0 forces Uv = 0, and row reduction does not change the nullspace, so A^i v = 0.

(2)  If U’ v = 0, then v is perpendicular to W1 through W(i-1): this follows immediately from the fact that we put the basis vectors b1, ..., bq of V(i-1) in as the last q rows of U’, so U’v = 0 forces v to be perpendicular to each of them, hence to these spaces.
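
Together, (1) and (2) say that the nullspace of U’ is exactly Wi. Here is a minimal sketch of the shortcut, using scipy.linalg.null_space (an SVD-based routine) in place of doing the Gaussian Elimination by hand; the function name and the toy matrix are mine.

    import numpy as np
    from scipy.linalg import null_space

    def W_basis(A, i, Vprev):
        # Orthonormal basis (as columns) for Wi = {v : A^i v = 0 and v perpendicular to V(i-1)}.
        # Vprev: matrix whose columns span V(i-1); pass an m x 0 array when i = 1.
        Ai = np.linalg.matrix_power(A, i)
        stacked = np.vstack([Ai, Vprev.T])   # first rows force A^i v = 0, last rows force v perpendicular to V(i-1)
        return null_space(stacked)

    # Toy example: rank-1 nilpotent A with A^2 = 0.
    A = np.array([[1.0, -1.0, 0.0],
                  [1.0, -1.0, 0.0],
                  [1.0, -1.0, 0.0]])
    W1 = W_basis(A, 1, np.zeros((3, 0)))     # = V1 = null(A), 2-dimensional
    W2 = W_basis(A, 2, W1)                   # part of null(A^2) perpendicular to V1
    print(W1.shape[1], W2.shape[1])          # 2 1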

 

Also, if an eigenvalue has multiplicity 3 or less, counting the number of linearly independent eigenvectors gives us the Jordan Form. Why? Say the multiplicity is 3. If there are 3 linearly independent eigenvectors, that part is diagonalizable. If there is only 1, it must be a single 3x3 block. If there are 2, we must have a 2x2 and a 1x1 block. (Multiplicities 1 and 2 are handled similarly.) Note this tells us nothing about what M looks like.

 

Also note that this argument fails for multiplicity 4 and greater. If we have multiplicity 4 and 2 eigenvectors, the blocks could be 2x2 and 2x2, or they could be 3x3 and 1x1.
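
A quick numerical illustration of this ambiguity (the two 4x4 matrices are my own toy examples, written already in Jordan form so the blocks are visible): both have exactly 2 independent eigenvectors, but the rank of A^2 tells the two shapes apart.

    import numpy as np

    # Blocks 2+2 versus blocks 3+1.  Both have exactly 2 independent eigenvectors.
    A_22 = np.array([[0, 1, 0, 0],
                     [0, 0, 0, 0],
                     [0, 0, 0, 1],
                     [0, 0, 0, 0]], dtype=float)
    A_31 = np.array([[0, 1, 0, 0],
                     [0, 0, 1, 0],
                     [0, 0, 0, 0],
                     [0, 0, 0, 0]], dtype=float)

    for name, A in [("2+2", A_22), ("3+1", A_31)]:
        n_eigvecs = 4 - np.linalg.matrix_rank(A)     # number of Jordan blocks
        rank_A2   = np.linalg.matrix_rank(A @ A)     # 0 for 2+2, 1 for 3+1
        print(name, "eigenvectors:", n_eigvecs, "rank(A^2):", rank_A2)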

 

Note the difference between theory and practice: theoretically, we know that bases for the different Wi exist, so with a wave of the hand we have them to work with. But if we were actually going to Jordanize large matrices, finding bases for all these spaces takes time, and we don’t always need all those basis elements. Often it’s enough to just find vectors in Vi; for example, show that if, instead of taking y in WL, we took y in VL (killed by A^L but not by A^(L-1)), the pullback process would still work. Then there could be many i where we’ve pulled back all the vectors we need, and hence there would be no need to find a full basis there. If this doesn’t make too much sense, don’t worry: it’s late at night here for me, and at this stage in your life, you won’t be dealing with terrible Jordanizations where this would really make a difference. I just want to emphasize that often you can come up with a theoretical line of argument that, in practice, will yield the correct answer, but be so computationally inefficient that a better way is greatly desired.

 

SUMMARY: HOW TO JORDANIZE:

STEP 1: Find the eigenvalues, their multiplicities, and all the eigenvectors.

STEP 2: For each eigenvalue λ and its multiplicity m, calculate (A-λI),

               (A-λI)^2, ..., (A-λI)^m.

STEP 3: Find bases for the spaces Wi described above. This will yield bases

               for the Vi. Use the calculation shortcut to find the bases.

STEP 4: ‘Pull back’ vectors as described, adding in vectors linearly

      independent of the projections as needed.