Additional comments related to material from the class. If anyone wants to
convert this to a blog, let me know. These additional remarks are for your
enjoyment, and will not be on homeworks or exams. These are just meant to
suggest additional topics worth considering, and I am happy to discuss any of
these further.
- Friday, May 15. We finished the calculus of variations, seeing that if J[y] = Int_{a to b} F(x,y,y') dx has F not explicitly depending on x, then F - y' F_{y'} is constant along solutions. We also discussed the hyperbolic trig functions, which naturally arise in the catenoid problem.
We then analyzed a conjecture and a model for two firms competing. As some
people missed class and this example illustrates many of the key concepts of
the course (creating a reasonable model, solving it, exploring the
consequences of the parameter dependence in our solution),
I have taken the time and
written up the analysis in great detail.
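For completeness, here is the short standard computation behind that fact (written in LaTeX): if F has no explicit x dependence, then along a solution of the Euler-Lagrange equation

```latex
\frac{d}{dx}\left(F - y' F_{y'}\right)
  = F_y y' + F_{y'} y'' - y'' F_{y'} - y' \frac{d}{dx} F_{y'}
  = y' \left(F_y - \frac{d}{dx} F_{y'}\right) = 0,
```

so F - y' F_{y'} is indeed constant.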
- Wednesday, May 13. The
Euler (also called the
Euler-Lagrange) equation for when the functional J[y] = Int_{x = a to b}
F(x,y,y')dx, namely F_y - d F_{y'} / dx = 0, is one of the most (if not the
most) important differential equations in mathematical physics. I apologize
for not doing this in 30 minutes or less (but getting to it in 35 minutes is
not bad, as I'd much rather have the lecture self-contained:
click
here and search for the phrase eight-minute). This equation is extremely important: it is the analogue of the first derivative vanishing for functionals, and as such is
the cornerstone of optimization, a very important subject. It is quite amazing
that all of classical mechanics can be derived from minimizing the action (the time integral of L = T - U, the Lagrangian of the system); quantum mechanics comes from
minimizing the
Hamiltonian, H = T + U (WHY this is the case, as I said,
is
not my department). This leads to
Newton's famous F = ma. Has
any progress been made? At first it isn't clear. We still have to solve this
equation, but it turns out that we can work in generalized coordinate systems,
and this leads to greatly simplified math for many problems (if you've done
classical mechanics, you've probably seen
the double pendulum
problem; the link there contains an analysis of the solution using the
Euler-Lagrange equations and generalized coordinates, and discusses how this
system is chaotic). Notice that we are not claiming that we get physics for
free. We must still assume / axiomatize something. Instead of assuming F = ma,
we assume that physics can be described by minimizing the action / Lagrangian
T - U. WHY this is the case is a bit of a mystery (just like why must F = ma).
Something must be assumed somewhere (although perhaps see the discussions on
the fine
structure constant and then chat with me about
dimensional analysis).
We'll see in class on Friday how this is related to solving other problems
(such as the
catenary, the shape a
string will take under gravity). One last bit: one of the key inputs in
proving the Euler (Euler-Lagrange) equation was
Taylor's theorem in several variables. One way to prove this is our
powerful technique of adding zero. To simplify the write-up, I'll just
consider a function of two variables. Say we are studying F(a+h, b+k) - F(a,b),
so (h,k) is the small change. We add zero and see this is the same as [F(a+h,
b+k) - F(a, b+k)] + [F(a, b+k) - F(a, b)]. To get the first order piece, we
apply Taylor's theorem in one variable to each bracketed quantity, and get (Fx
means derivative with respect to first variable, Fy means derivative with
respect to second): Fx h + Fy k + ... (higher order pieces). One can continue
this analysis to get the higher order terms, but what I want you to see is how
this follows from one variable.
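As a small illustration of the Euler-Lagrange machinery in action (my own sketch, not something from class), here is sympy applied to the Lagrangian L = T - U of a mass m falling under gravity g (both symbols are just illustrative); the resulting equation is Newton's m y'' = -mg.

```python
from sympy import symbols, Function, Rational
from sympy.calculus.euler import euler_equations

t = symbols('t')
m, g = symbols('m g', positive=True)   # mass and gravitational acceleration (illustrative)
y = Function('y')                       # height of the particle at time t

# Lagrangian L = T - U: kinetic energy minus potential energy.
L = Rational(1, 2) * m * y(t).diff(t)**2 - m * g * y(t)

# Euler-Lagrange equation F_y - d/dt F_{y'} = 0 for this L.
print(euler_equations(L, y(t), t))
# Expect (up to rearrangement) m*y''(t) = -m*g, i.e. F = ma for a falling mass.
```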
- Monday, May 11. To study the
Calculus of
Variations, we needed to quickly review the notions of a
vector space and a
normed vector space.
We study functionals now, which are maps from functions to real numbers. We
can generalize the notion of a continuous function to a continuous functional;
however, there are often several different metrics we can use to measure whether two functions are close, and depending on which metric we use the functional may or may not be continuous. For example, the arc length of a curve y(x) is given by J[y] = Int_{x = a to b} Sqrt(1 + y'(x)^2) dx. Two normed
vector spaces are C[a,b] = {continuous functions on [a,b]} and C^k[a,b] =
{functions continuously differentiable k times on [a,b]}. We define the
corresponding norms by ||f||_0 = max_{a <= x <= b} |f(x)| for functions in
C[a,b] and ||f||_1 = max_{a <= x <= b} |f(x)| + max_{a <= x <= b} |f '(x)| for
functions in C^1[a,b], and the distance between two functions f1 and f2 is
d(f1,f2) = ||f1 - f2||. Note two functions can be quite close to each other in
one space but far apart in another. We saw this with the staircase and the
sinusoidal function and the straight line: they are close in C[a,b] but not in
C^1[a,b]. It's a nice exercise to show that the arc length functional is
continuous when we study functions in C^1[a,b] but NOT when we look at
functions in C[a,b]. The wikipedia article,
Calculus of
Variations, does a nice job describing some of the problems and norms.
Other big problems in the subject are the
catenary (the shape a
string will take under gravity), and the
Brachistochrone Problem (find the curve such that a ball sliding down it
will travel from A to B in least time). It took a bit of searching, but
click here for the anecdote of Newton returning from the mint and staying up
until 4am to solve this and then having it published anonymously, leading to
Bernoulli saying that he recognizes the lion by his claw.
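Here is a small numerical sketch (my own, with a made-up amplitude and frequency) of the point above: f1(x) = x and f2(x) = x + 0.01 sin(100x) are close in C[0,1] but far apart in C^1[0,1].

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100001)    # fine grid on [a, b] = [0, 1]

f1 = x                                # straight line
f2 = x + 0.01 * np.sin(100 * x)       # small, rapidly oscillating perturbation

df1 = np.ones_like(x)                 # f1'(x) = 1
df2 = 1 + np.cos(100 * x)             # f2'(x) = 1 + cos(100x)

dist_C0 = np.max(np.abs(f1 - f2))                 # ||f1 - f2||_0
dist_C1 = dist_C0 + np.max(np.abs(df1 - df2))     # ||f1 - f2||_1

print(dist_C0)    # about 0.01: the functions are close in C[a,b]
print(dist_C1)    # about 1.01: they are far apart in C^1[a,b]
```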
- Monday, May 4. The
Laplace Transform
is an important example of an
integral transform.
It is extremely useful in solving certain types of differential equations. We
will not prove that the
inverse
Laplace transform exists and is unique for a large class of functions, as
there is no need. This is because we often have general existence theorems for solutions to differential equations. Thus, a solution exists, and hence the inverse Laplace transform will exist for the functions we study. The Laplace
transform is but one of many useful integral transforms; other extremely
important ones are the
Fourier (with
applications ranging from solving
partial
differential equations, such as the
heat equation, to
proving the
Central Limit Theorem) and
Mellin transforms.
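For the curious, here is a one-line sketch of the Laplace transform in sympy (my own illustration; the function t e^{-2t} is just an example):

```python
from sympy import symbols, exp, laplace_transform

t, s = symbols('t s', positive=True)

# Laplace transform of f(t) = t*exp(-2t); expect 1/(s + 2)^2.
print(laplace_transform(t * exp(-2 * t), t, s, noconds=True))
```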
- Wednesday, April 29. As we've seen,
creating a good model requires serious contemplation and an analysis of the
trade-offs between capturing all the features and being tractable. Frequently
the system being modeled is so complex that, amazingly, using random numbers
leads to accurate predictions. One example of this is
Random Matrix
Theory (for a description of the nuclear physics origins, basic results
and connections to number theory, see
chapter 15 of
my book An Invitation to
Modern Number Theory). (If you are interested in seeing the connections
with differential equations, let me know and I'll give you the relevant pages
from Chapter 17. The fact that Q^T A Q and A must have the same probabilities
for any orthogonal change of variables Q means that the function f(Q) = Prob(Q^T
A Q) - Prob(A) must be constant. This leads to the simple differential
equation f'(Q) = 0, f(I) = 0; we then pass from this differential equation to
differential equations for the probability distributions we use for each entry
of A, and find that the entries of A are chosen from Gaussian distributions.)
We then analyzed a population problem involving the number of pairs of whales
of various ages at any time (v_{n+1} = A v_n where A is a
Leslie matrix). We
first modeled this with a simple constant coefficient system of difference
equations, which we can solve completely. We then discussed the problems with
such a model, and possible generalizations that would address these issues.
For more details, see the two models described
in my notes here. Interestingly, there is a connection between the
generalized model and random matrix theory!
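Here is a minimal numerical sketch of a Leslie-matrix model v_{n+1} = A v_n (my own code; the fecundity and survival numbers are made up for illustration, not the values from class or the notes):

```python
import numpy as np

# Leslie matrix: top row holds fecundities of each age class,
# the sub-diagonal holds survival probabilities into the next age class.
A = np.array([[0.0, 1.5, 1.0],
              [0.6, 0.0, 0.0],
              [0.0, 0.8, 0.0]])

v = np.array([100.0, 0.0, 0.0])    # initial numbers in each age class
for n in range(50):
    v = A @ v                       # v_{n+1} = A v_n

print(v)                                  # age distribution after 50 steps
print(max(abs(np.linalg.eigvals(A))))     # dominant eigenvalue: long-run growth factor per step
```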
- Monday, April 27. Today we discussed
modeling, in particular, the interplay between finding a model that captures
the key features and one that is mathematically tractable. While we used a
problem from baseball as an example, the general situation is frequently quite
similar. Often one makes simplifying assumptions in a model that we know are
wrong, but lead to doable math (for us, it was using continuous probability
distributions in general, and in particular the three parameter Weibull). For
more on these and related models, see
- Wednesday, April 22.
Euler's method (great
site here with handouts and animations) is a good method to
numerically approximate solutions to first order differential equations of the
form y'(t) = f(t,y(t)). The key ingredient in this method is the tangent line approximation for a function (ie, a first order Taylor series). There are better algorithms. Similar to first order Taylor approximations, the error in each step of the Euler tangent line method is proportional to h^2 (where h is the step size); methods such as Runge-Kutta (see also here) have better errors, such as of size h^4. The reason for these
improvements is due to better approximations of areas (we may rewrite our
problem as y(t) - y(0) = Int_{s = 0 to t} f(s,y(s))ds, and thus we see the
importance of estimating areas well). One terrific method for approximating
integrals is Simpson's
rule; I strongly urge you to read the
wikipedia article on Simpson's rule (link
is here). The method of proof is more important than the result itself.
Specifically, by taking a weighted combination of the midpoint method and the
trapezoid method (two algorithms that are order h^3), we get a new method
which is at least as good as order h^4 (it's actually order h^5 because of
additional symmetries). This is amazing! Each method is only as good as h^3,
but appropriately combined they give h^5! As you can see from the statement of
Simpson's rule, it isn't that much additional algebra to compute. The key is
in choosing the correct weights. This is a very common principle; it is used
in economics to combine various investments into a portfolio and decrease the
variance in the rate of return (the key assumption is independence of the
various funds; the correlations of the various funds are often incorrectly
estimated, as the current collapse of Western Civilization shows). As you can
tell, I am a huge fan of this method, as it illustrates numerous important
techniques and tools. In particular, it shows how one can rearrange algebra to
do things more efficiently or how rearranging the algebra leads to a new way
of interpreting the results (putting the division by 6 with the sum of the
functions versus with the b-a). Also, we see how consistency checks /
investigations of simple cases can help us figure out if a formula is
reasonable (why we're not surprised Simpson's method has a b-a, or a division
by 6 for example).
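Here is a minimal sketch of Euler's method (my own code), applied to y' = y, y(0) = 1, whose exact solution is e^t; doubling the number of steps roughly halves the global error, as expected for a first order method.

```python
import math

def euler(f, t0, y0, t1, n):
    """Approximate y(t1) for y' = f(t, y), y(t0) = y0, using n Euler steps."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y += h * f(t, y)    # follow the tangent line over one step of size h
        t += h
    return y

f = lambda t, y: y           # the ODE y' = y
exact = math.exp(1.0)        # y(1) = e

for n in (50, 100, 200):
    print(n, abs(euler(f, 0.0, 1.0, 1.0, n) - exact))   # error roughly halves as n doubles
```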
- Monday, April 20. Any first order
linear system of differential equations x'(t) = Ax(t) with x(0) = x_0 has the
solution x(t) = exp(At) x_0. While this formula is of great theoretical
interest (as it does completely solve the problem), from a computational
standpoint we have merely traded one problem for another. Thus,
unless we can easily compute exp(At), this is not a good trade.
Fortunately there are good techniques for a variety of problems to either
compute exp(At) or at least to approximate it to a given accuracy. The simplest classes of such matrices are the diagonalizable and nilpotent ones. While a
general matrix cannot be diagonalized, it can be put into
Jordan Canonical
Form, which often is sufficient for computations. Even
though many systems of differential equations are non-linear, they can often
be locally approximated by linear systems; this is similar to approximating
complicated functions in calculus with their first order Taylor series, which is
quite useful (for example,
Newton's Method).
We ended with a discussion of a model for how a virus propagates in a network;
if anyone is interested in either reading more about this or joining our
research group, just let me know. What I particularly like is that reasonable
assumptions lead to a guess of di/dt = beta p (N-1) i(t) (1 - i(t)) - delta
i(t) (for how the percentage of infected nodes changes with time). Quick
sanity checks show this is a reasonable equation (what happens if i(t) is 0 or
1, ...). This is one of the most important points to take away from the class,
namely how to look at a model and see if it is reasonable. For another
example, see my notes on the log5
rule for determining the probability one team beats another.
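A short sketch (my own, with an arbitrary 2x2 matrix) of solving x'(t) = Ax(t), x(0) = x_0 with the matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])     # an arbitrary example matrix
x0 = np.array([1.0, 0.0])        # initial condition x(0)

t = 1.5
print(expm(A * t) @ x0)          # x(t) = exp(At) x_0

# Sanity check: the eigenvalues of A are -1 and -2, so solutions should decay.
print(np.linalg.eigvals(A))
```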
- Friday, April 17. The technique of
adding zero (which we used to analyze the second HW problem due on Wednesday,
namely x y''(x) + y'(x) + x y(x) = 0, where we replaced x with x - 1 + 1 --
see the notes on series solutions for details)
is extremely powerful and important in mathematics. This is the key step in
many proofs; the first you might remember is the proof of the product rule from calculus (see the alternative proof on Wikipedia).
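For reference, the 'adding zero' step in that proof of the product rule is

```latex
f(x+h)g(x+h) - f(x)g(x)
  = \bigl[f(x+h)g(x+h) - f(x)g(x+h)\bigr] + \bigl[f(x)g(x+h) - f(x)g(x)\bigr]
  = \bigl[f(x+h) - f(x)\bigr]g(x+h) + f(x)\bigl[g(x+h) - g(x)\bigr];
```

dividing by h and letting h tend to 0 gives f'(x) g(x) + f(x) g'(x).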
- Wednesday, April 15. One of the most important things to learn in this (or any class) is the
importance of asking the right question.
Here is a fascinating clip of a great speaker talking about the importance of
asking the right question. Today we asked several questions to try to
illuminate some important concepts in linear algebra (which will be of use in
solving systems of differential equations). Sadly, even if everything is real
we'll need to use
complex numbers in general. This is because we find eigenvalues by solving
Det(A - lambda I) = 0. This leads to a polynomial in lambda to solve, and
while the
Fundamental Theorem of Algebra asserts that a polynomial of degree n has exactly n roots (not necessarily distinct), the roots of a real polynomial might be complex. While exp(x) exp(y) = exp(x+y) if x and y are real, this is not true in general for matrices A and B; ie, exp(A) exp(B) is not exp(A+B) in general. There is a formula for exp(A) exp(B), the Baker-Campbell-Hausdorff formula; see the Zassenhaus formula for a nice explicit form of this product. The
formula involves the
commutator of two matrices, where [X,Y] = XY - YX measures how far X and Y
are from commuting. The commutator arises throughout the sciences, in
particular in quantum mechanics, where the
fundamental commutator relation asserts that [X,P] = i hbar I (hbar =
Planck's constant
divided by 2 pi, X is the position operator and P is the momentum operator).
Finally, we made a few points about optimal / efficient algorithms.
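Here is a quick numerical sketch (my own) that exp(A) exp(B) need not equal exp(A+B) when A and B fail to commute:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0, 0.0],
              [1.0, 0.0]])

print(A @ B - B @ A)        # the commutator [A, B] is nonzero
print(expm(A) @ expm(B))    # this product differs from ...
print(expm(A + B))          # ... exp(A + B)
```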
- Monday, April 13. Because first order
linear non-homogenous differential equations can be solved exactly through two
integrals (one gives the integrating factor, and then we have to integrate
that against the non-homogenous term), in some sense all such problems should
be viewed as 'easy' (ie, by numerically approximating these integrals we can
get excellent approximations to the true solutions). Thus it is not surprising
that we always want to convert differential equations into a system of
uncoupled first order linear non-homogenous differential equations. This
explains the motivation for our quick tour through linear algebra. The key
concept is that if we change our basis, we change what a matrix looks like but
not its effect. You should think of a matrix as a representation of a physical transformation; the representation will look different in different bases. (For
example, if we use East-North coordinates our location might be one set of
numbers, but if we use Northeast-Southwest, it would be something else, yet
it's physically the same point.) We've seen in class that if we have a
diagonal matrix A, then solving the system of equations x'(t) = A x(t) + g(t)
(where x(t) and g(t) are vectors of functions) is straightforward, namely it
just reduces to a bunch of independent first order linear non-homogenous
differential equations. It turns out that if we're given x'(t) = A x(t) + g(t)
for a generic A, then most of the time we can find a matrix T and new
functions y(t) such that x(t) = Ty(t) and y'(t) = D y(t) + h(t), where D is a
diagonal matrix and h(t) = T^{-1} g(t). Thus we can solve for y(t) and then
find x(t) with the rule x(t) = T y(t). Thus we see the importance of being
able to
diagonalize a matrix (ie, find a matrix T as described above). The way
this is done (and we'll discuss this in greater detail) is to find the
eigenvalues and eigenvectors of A. Recall v is an
eigenvector with
eigenvalue lambda if Av
= lambda v. All we will need for our class is that the eigenvalues for A are
found by solving Det(A - lambda I) = 0, where A is a given n x n matrix and I
is the n x n identity matrix (all 1s on the main diagonal and 0s elsewhere). I
have written some notes on linear algebra over the years which might be
helpful (my apologies for the formatting -- these are OLD notes that I've
converted to html -- I strongly urge you to read the MSWord versions and
not the html version!).
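Here is a small numerical sketch (my own, with an arbitrary matrix) of finding the matrix T described above, so that T^{-1} A T is diagonal:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])              # an arbitrary diagonalizable example

eigenvalues, T = np.linalg.eig(A)       # the columns of T are eigenvectors of A
D = np.linalg.inv(T) @ A @ T            # should be (numerically) diagonal

print(eigenvalues)                      # the eigenvalues 3 and -1 for this A
print(np.round(D, 10))                  # diagonal with entries 3 and -1: the system decouples in these coordinates
```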
- Friday, April 10. In solving for
series solutions y(x) = Sum_n a_n x^n to differential equations, we saw how we
can get somewhat messy expressions in the recurrence relations for the unknown
coefficients a_n. We saw this when we studied the
Airy equation (which
arises in optics); the answer involved the
triple factorial (keep
reading the wikipedia entry to get to some really large generalizations of the
factorial, such as the superfactorial, the up-arrows, ...; this leads to
unbelievably large numbers such as
Graham's number).
There is a lot of interesting behavior about the solutions to the
Airy equation, as
well as to generalizations of the factorial function. The most common
generalization is the
Gamma function: Gamma(s) = Int_{t = 0 to oo} exp(-t) t^s dt/t.
Integrating by
parts, one sees that for n a positive integer, Gamma(n) = (n-1)!; thus,
the Gamma function generalizes the factorial function to all real numbers (and
even to complex numbers!). My favorite fact is that Gamma(1/2) = (-1/2)! =
sqrt(pi). (I won't put an exclamation point for emphasis, but this is a
fascinating result! If you want to see a proof, let me know.) This is not just
a mathematical curiosity, but arises throughout the sciences. In particular,
it's related to the normalization for the
standard Gaussian
density: p(x) = exp(-x^2/2) / sqrt(2pi) (this is a
probability
distribution, and integrates to 1). For our purposes, the reason these
facts matter is that we want to get some feel for our series solution (for
what x does it converge, how rapidly is it growing / decaying, how many terms
do we need to get a good approximation, ...).
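Two quick numerical checks (my own) of the Gamma function facts above:

```python
import math

# Gamma(n) = (n-1)! for positive integers n.
print(math.gamma(5), math.factorial(4))       # both equal 24

# Gamma(1/2) = sqrt(pi).
print(math.gamma(0.5), math.sqrt(math.pi))    # both about 1.7724538509
```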
- We ended with a statement of what a system of differential equations is,
when it is linear, and the claim that systems of
first order linear differential equations (homogenous or non-homogenous)
will be easily solvable using integrating factors after inputting a little
linear algebra. What makes this one of the most important techniques /
theorems of linear algebra is that we can reduce so many things to such
systems. For example, a little algebra shows that if we have u''(t) + .125 u'(t) + u(t) = 0, then setting x1(t) = u(t) and x2(t) = u'(t) leads to a system of first order linear differential equations for x1(t) and x2(t) (see the short sketch at the end of this entry).
- We briefly mentioned in class the
Bessel equation (I
actually did remember it correctly: it's x^2 y''(x) + x y'(x) + (x^2 - n^2)
y(x) = 0). This equation is a bit harder to solve than Airy's equation, as x =
0 is not an ordinary point (the coefficient of y''(x) vanishes at x=0).
This equation is discussed in our textbook. It arises frequently in physics
and number theory (if people are interested I'm happy to discuss how it arises
in my work on the distribution of primes and the energy levels of heavy nuclei
such as Uranium).
- One final point to note: it is not yet clear that it is advantageous to
replace second order differential equations with systems of first order
differential equations. We're increasing the number of unknown functions
(which is bad), but we're decreasing the highest power of the derivatives that
appear (which is good!). Here's my favorite example of how making a change
doesn't help: the Nobel laureate
Richard Feynman
showed that you can reduce all of physics to solving the equation U = 0. What
is U? It's the unworldliness of the physical world. What you do is you take
each physical law, say F = ma, E = mc^2, .... We rewrite these as U = (F -
ma)^2 + (E - mc^2)^2 + ... = 0; thus the only way this can be equal to zero is
if each physical law is satisfied! Wow -- all of physics in one equation
involving just three recognizable symbols! Unfortunately, of course, this is
completely intractable!
See here for a
little more on these notational gimmicks.
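Returning to the first bullet above: here is a minimal sketch (my own code) of converting u'' + .125 u' + u = 0 into a first order system x' = Ax with x1 = u and x2 = u', and solving it numerically.

```python
import numpy as np
from scipy.integrate import solve_ivp

# With x1 = u and x2 = u', the equation u'' + 0.125 u' + u = 0 becomes x' = A x.
A = np.array([[0.0, 1.0],
              [-1.0, -0.125]])

def rhs(t, x):
    return A @ x

sol = solve_ivp(rhs, (0.0, 20.0), [1.0, 0.0])   # initial conditions u(0) = 1, u'(0) = 0
print(sol.y[0, -1])                             # u(20): a slowly damped oscillation
```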
- Wednesday, April 8. Systems of
equations are frequently used to model real world problems, as it is quite
rare for there to be only one function of interest. If you want to read more
about applying math to analyze the
Battle of Trafalgar,
here is a nice
handout (or, even better, I think we could go further and write a nice
paper for a general interest journal expanding on the
Mathematica program I wrote). The
model we discussed is very similar to the
Lotka-Volterra
predator-prey equations (our evolution is quite different, though; this is
due to the difference in sign in one of the equations). Understanding these
problems is facilitated by knowing some linear algebra; we'll discuss what is
needed in class. It is also possible to model this problem using a system of
difference equations, which can readily be solved with linear algebra.
Finally, it's worth noting a major drawback of this model, namely that it is
entirely deterministic: you specify the initial concentrations of red and blue
and we know exactly how many exist at any time. More generally one would want
to allow some luck or fluctuations; one way to do this is with
Markov chains. This
leads to more complicated (not surprisingly) but also more realistic models.
In particular, you can have different probabilities for one ship hitting
another, and given a hit you can have different probabilities for how much
damage is done. This can be quite important in the 'real' world. A classic
example is the British efforts to sink the German battleship Bismarck in WWII.
The Bismarck was superior to all British ships, and threatened to decisively
cripple Britain's commerce (ie, the flow of vital war and food supplies to the
embattled island). One of the key incidents in the several-day battle was a
lucky torpedo shot by a British plane which seriously crippled the Bismarck's
rudder.
See the wikipedia entry for more details on one of the seminal naval
engagements of WWII. The point to take away from all this is the need to
always be aware of the limitations of one's models. With the power and
availability of modern computers, one workaround is to run numerous
simulations and get probability windows (ie, 95% of the time we expect a
result of the following type to occur). Sometimes we are able to theoretically
prove bounds such as these; other times (using Markov chains and
Monte Carlo
techniques) we numerically approximate these probabilities.
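Since the Lotka-Volterra system is mentioned above, here is a minimal simulation sketch (my own, with made-up parameters; recall the combat model from class differs from this by a sign in one of the equations):

```python
from scipy.integrate import solve_ivp

# Lotka-Volterra predator-prey model: x = prey, y = predators.
a, b, c, d = 1.0, 0.1, 1.5, 0.075    # illustrative parameters, not values from class

def predator_prey(t, z):
    x, y = z
    return [a * x - b * x * y,        # prey reproduce and are eaten
            -c * y + d * x * y]       # predators die off and grow by eating prey

sol = solve_ivp(predator_prey, (0.0, 30.0), [10.0, 5.0], max_step=0.01)
print(sol.y[:, -1])    # populations at t = 30; the trajectories cycle rather than settle down
```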
- Monday, April 6. Series expansions sum
a_n x^n are a great tool to solve differential equations; the important
ingredient is some control over where these series converge and how rapidly
they converge. This is why we need tests such as the ratio or the root test.
It is not always easy to see the true magnitude of a series from its
expansion. For example, exp(x) = sum x^n/n! and exp(-x) = sum (-x)^n/n!; there
is a huge difference in behavior as x --> oo, but looking at the size of the
coefficients a_n one sees no difference. (There are times in mathematical
physics when we look at divergent series. These are often called
asymptotic series;
one standard example is the
exponential
integral.) The main step in solving differential equations with a series
expansion is to find a tractable
recurrence
relation for the unknown coefficients a_n and then solve it (this is
another reason why we spent so much time on difference / recurrence relations
at the start of the semester). The advantage is we have reduced a differential
equation to systems of equations, and often these are more tractable. The
example we did in class is typical, where we see two free parameters arising
(as we did a second order differential equation), and specifying a_0 and a_1
uniquely determines all other a_n (note the similarity with difference equations
such as the Fibonacci
equation).
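A small sketch of the recurrence idea (my own example, not the one from class): for y'' + y = 0, plugging y = Sum a_n x^n into the equation gives a_{n+2} = -a_n / ((n+2)(n+1)), and with a_0 = 1, a_1 = 0 the partial sums approach cos(x).

```python
import math

def partial_sum(x, N, a0=1.0, a1=0.0):
    """Partial sum of the series solution of y'' + y = 0 built from the recurrence."""
    a = [a0, a1]
    for n in range(N - 1):
        a.append(-a[n] / ((n + 2) * (n + 1)))    # a_{n+2} = -a_n / ((n+2)(n+1))
    return sum(a[n] * x**n for n in range(len(a)))

x = 2.0
print(partial_sum(x, 20), math.cos(x))    # the two values agree to many digits
```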
- Wednesday, March 18. Convergence (and
divergence) of sequences and series is an extremely important topic. We want
to replace complicated objects with simpler ones, and then approximate the
complex arguments by analyzing the simpler ones and understanding the error
terms. Key items to study:
series,
absolute
convergence,
conditional
convergence,
radius of convergence,
geometric series,
harmonic
series, ..., and see the bottom of the wikipedia article for the various
tests (comparison,
integral, ratio, root, ...). Behavior of sequences and series can be quite
surprising: here are the four questions and answers:
- (1) Consider Sum_{n=1 to oo}
a_n x^n; must this converge for some x not equal to zero? Answer: NO! Take a_n
= n^n.
- (2) Consider the Taylor series for f and g at x=0. If the two series are equal, must f(x) = g(x) for some x not equal to zero? Answer: NO! Looking at f(x)-g(x), this is equivalent to asking: if a function has Taylor series identically equal to zero at a point, is the function identically equal to zero in a neighborhood of that point? The answer is no. Let h(x) = 0 if x=0 and exp(-1/x^2) otherwise. Using the definition of the derivative, one can show all derivatives of h are zero at zero, so its Taylor series is identically zero; but h vanishes only at zero, so the series agrees with the function only at that one point. One has to resort to the definition of the derivative at zero because the function is defined by a different formula there (see the short computation at the end of this entry).
- (3) Let a_n be any sequence of real numbers. Is there always an
infinitely differentiable function such that the n-th
derivative at x=0 equals a_n? Answer: Amazingly, YES! Extra credit if you can
give an example.
- (4) Let f(x) be a continuous function. Must f(x) be differentiable for at
least one x? Answer: NO!
Weierstrass came up with a nice example (refined later by Hardy, I
believe). It turns out that one can define a sense of size on the space of
continuous functions. Just as almost no real numbers are rational (Cantor's
diagonalization argument), almost all continuous functions are nowhere
differentiable!
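Returning to question (2): the key computation, done from the definition of the derivative, is

```latex
h'(0) = \lim_{t \to 0} \frac{h(t) - h(0)}{t} = \lim_{t \to 0} \frac{e^{-1/t^2}}{t} = 0,
```

since e^{-1/t^2} tends to zero faster than any power of t; iterating this argument shows every derivative of h vanishes at 0.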
- Monday, March 16. While
variation of parameters is a great theoretical tool, in practice it is
hard to use. There are two difficulties with the approach. First, we need to
know the solutions to the homogenous equation, and second we need to be able
to evaluate integrals involving reciprocals of the
Wronskian determinant! In
the course of our analysis, at one point we chose to restrict attention to
u1(t) and u2(t) with u1'(t)y1(t) + u2'(t)y2(t) = 0. This is a severe
restriction on the possible functions, but fortunately it turns out that (for
this type of problem at least) we can find a solution. To do so involves
solving a system of two equations with two unknowns, the unknowns being the
functions u1'(t) and u2'(t). We thus end up with two first order differential
equations, and this is progress (ie, it's easier to solve the two separable
first order differential equations than the original second order differential
equation). This is a common theme throughout the course: there are lots of
methods we can try; sometimes we end up with a simpler equation and win, other
times it's more complicated and we lose (see the notes for Wednesday, March
11). We also talked about sums of k-th powers of integers: sum_{n = 1 to N}
n^k. These are typically proved by
mathematical
induction (see
also my online notes). The problem in using induction is that you need to
know what you are trying to prove. Thus, if k=1 you try and show it is
N(N+1)/2, if k=2 you show it is N(N+1)(2N+1)/6; how do you figure out the
general guess for arbitrary k? There are several approaches. One of my
favorites uses
differentiating identities and L'Hopital's rule (but it is NOT easy to
use; I don't think I can do more than k=2 with it, though I'd love to see
someone go further). Another is to gather numerical data. ASSUME that the
answer is a polynomial in N of degree k+1 (this is not an unreasonable
conjecture based on analyzing k=1 and k=2). If this is the case, the answer is
some polynomial a_k(0) + a_k(1) N + ... + a_k(k+1) N^{k+1}. A polynomial of
degree k+1 is uniquely determined by its value at k+2 points. Thus, we simply
evaluate our k-th power sums for N = 0, ..., k+1 and deduce the associated
a_k(i)'s (this involves some linear algebra or
Lagrange
interpolation). We now have a guess for the induction! As a further note,
if you remember the
integral test from calculus you can see that a_k(k+1) has to be 1/(k+1)
(compare Sum_{n = 1 to N} n^k with Integral_{1 to N} x^k dx).
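Here is a small sketch (my own code) of the 'gather data and fit a polynomial' strategy for Sum_{n = 1 to N} n^k, shown for k = 2; the recovered coefficients are those of N^3/3 + N^2/2 + N/6 = N(N+1)(2N+1)/6.

```python
import numpy as np

k = 2
# Assume the answer is a polynomial of degree k+1; it is then determined by k+2 points.
Ns = np.arange(0, k + 2)                                               # N = 0, 1, ..., k+1
sums = np.array([sum(n**k for n in range(1, N + 1)) for N in Ns], dtype=float)

coeffs = np.polyfit(Ns, sums, k + 1)       # interpolating polynomial through these points
print(np.round(coeffs, 6))                 # [0.333333 0.5 0.166667 0.]: N^3/3 + N^2/2 + N/6
```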
- Friday, March 13. The method of
undetermined coefficients works very well for very specific choices of the
non-homogenous component g(t); unfortunately it does not work in general and
we will have to resort to series solutions (chapter 5). A fun exercise is to
try to solve y'' + 3y' + 2y = 1/t; there is no nice guess, and the solutions
Mathematics generates involves the
exponential
integral.
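For the curious, here is a sketch of why the exponential integral shows up: the homogeneous solutions are e^{-t} and e^{-2t}, with Wronskian W = -e^{-3t}, and variation of parameters gives the particular solution

```latex
y_p(t) = -y_1 \int \frac{y_2\, g}{W}\,dt + y_2 \int \frac{y_1\, g}{W}\,dt
       = e^{-t}\int \frac{e^{t}}{t}\,dt - e^{-2t}\int \frac{e^{2t}}{t}\,dt
       = e^{-t}\,\mathrm{Ei}(t) - e^{-2t}\,\mathrm{Ei}(2t),
```

where Ei is the exponential integral (constants of integration get absorbed into the homogeneous part).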
- Wednesday, March 11. Today we studied
the
Method of Undetermined Coefficients (a special case of the Method of
Divine Inspiration). In general, methods such as this take as input a
differential equation and output a new differential equation. In general, the
new equation will be at least as hard (if not harder!) to solve as the
original equation, and thus nothing is gained. Occasionally, however, we're
lucky and we end up with a simpler equation to solve. We saw this today when
studying y'' - 2ay' + a^2 y = 0; we found y_1(t) = exp(at) was one solution, and guessing y(t) = u(t) y_1(t) led to the differential equation u''(t) = 0. While this is also a second order differential equation, it is an easier equation to solve (u(t) = c_1 + c_2 t). For the third order equation y''' + 3y'' + 3y' + y = 0 we get (r+1)^3 = 0 for the characteristic polynomial, or r = -1 as a triple root. In class we conjectured that algebra would show guessing
y(t) = u(t) exp(-t) would lead to u'''(t) = 0; I've done the algebra and
confirmed that (well, I had Mathematica do the algebra for me!). This is
another third order differential equation, but a much simpler one than the
initial equation. This is a common theme in mathematics, reducing a hard
problem to a hopefully easier one. Sometimes it works, sometimes it doesn't.
One of my favorite examples is solving polynomial equations, specifically: when is it possible to write down the solutions in terms of the coefficients of the polynomial? We have formulas for
quadratics, cubics and quartics, but there is no such formula (in terms of radicals) for quintics and higher. One way to tackle these problems is to associate a new polynomial,
called the resolvent, to the initial polynomial. In general the resolvent is
of a higher degree than the initial polynomial, but sometimes we're lucky and
it is easily solved. For example, the
resolvent for a
quartic is a cubic, which can be solved using the cubic formula; for the
cubic, one gets a sextic (degree six) polynomial, but as it is of the form y^6
+ ay^3 + b = 0, it is really a quadratic in y^3 (which we can solve with the
quadratic formula), and then we just take cube roots. In general, as one goes
higher the degree of the resolvent grows too fast to be useful. For more on
resolvents and solving polynomials,
see the lecture notes
by B. Cherowitzo. (One final note: for more on Galois and his interesting
life, as well as his last few hours,
click here.)
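Here is the algebra check mentioned above, done in sympy rather than Mathematica (my own sketch, using the corrected equation y''' + 3y'' + 3y' + y = 0):

```python
from sympy import symbols, Function, exp, simplify

t = symbols('t')
u = Function('u')
y = u(t) * exp(-t)           # reduction of order guess: y(t) = u(t) e^{-t}

lhs = y.diff(t, 3) + 3 * y.diff(t, 2) + 3 * y.diff(t) + y
print(simplify(lhs))         # exp(-t)*u'''(t): the equation collapses to u'''(t) = 0
```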
- Friday, March 6. Today we discussed
finding the second solution for second order linear constant coefficient
homogenous difference or differential equations. While one can always
substitute these guesses into the original equation and see that they work, to me that is unsatisfying: it is unlikely anyone would stumble upon these guesses unprompted. WHY does this work? HOW does one know to guess this? Hopefully this was somewhat clear from today's lecture;
I've written up some notes
about this which essentially recap what was said in class today (I hope
these make sense; I had to get up at 5am to write the exam today and thus am a
little short on sleep). Again, for exams all that is needed is to know how to
solve the repeated root case; however, if you can understand how we were led
to these guesses, hopefully you'll be able to generalize this to attack a
future problem. As always, I'm happy to chat more about this with anyone.
- Wednesday, March 4. Today we discussed
when linear combinations of two solutions generate all solutions to a second
order differential equation. While it is unfortunate that the Picard iteration
method doesn't generalize to give a constructive approximation to the
solution, sometimes it suffices to know that there is a unique solution. The
standard example is when we can find two solutions whose
Wronskian is non-zero, as
then we can invoke our existence / uniqueness theorem to assert that we have
ALL the solutions. The Wronskian is a very nice condition to tell if two
solutions are linearly independent; in fact, generalizations of this method
are used in many parts of mathematics. There are several examples in random
matrix theory (my main research interest); another fun one is the following:
Is it possible to construct a matrix with 2009 rows and INFINITELY many
columns such that ANY 2009 columns are linearly independent (ie, span all of
R^2009)? One solution involves a generalization of the Wronskian. As a final
note, we used a little 2x2 linear algebra to discuss finding when two
solutions are a fundamental set of solutions. For a review of the needed
linear algebra, see my notes on
multiplying matrices (html,
word) (for
many notes on linear algebra, which you do not need for this course, see
these notes).
- Monday, March 2. Today we used
Picard's method to
generate a sequence of functions converging to the solution of dy/dt = 2t(1+y)
with y(0)=0; a fascinating problem is to see what happens if instead you try a
different initial function, say phi_0(t) = sin(t) -- is the convergence
faster? Slower? Are there methods to find better initial guesses, and if so
are they worth the time and effort? We've just started Second Order Linear
Differential Equations, in particular, constant coefficient homogenous.
Standard examples in physics and engineering include
kinematics equations with friction and masses on springs, as well as certain
problems in circuit theory.
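Here is a sketch (my own code) of the Picard iteration from class for dy/dt = 2t(1+y), y(0) = 0, whose exact solution is exp(t^2) - 1; each iterate picks up the next term of that series.

```python
from sympy import symbols, integrate, expand, Integer

t, s = symbols('t s')

phi = Integer(0)    # phi_0(t) = 0, the initial guess used in class
for n in range(4):
    # Picard iteration: phi_{n+1}(t) = y(0) + Int_{s = 0 to t} 2s (1 + phi_n(s)) ds
    phi = integrate(2 * s * (1 + phi.subs(t, s)), (s, 0, t))
    print(expand(phi))
# Output: t**2, t**2 + t**4/2, t**2 + t**4/2 + t**6/6, ...
# These are the partial sums of exp(t**2) - 1, the exact solution.
```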
- Friday, February 27. The proof of the
existence / uniqueness result uses numerous results from analysis. For those
who want more details, see the
supplemental notes I've written on the proof (which include proofs of the
Intermediate and Mean Value Theorems, as well as Mathematical Induction). The
main result we needed which we didn't prove is Lebesgue's
Dominated
Convergence Theorem. We also discussed some of the HW problems, including
the interesting one on
reaction rates in chemistry.
- Wednesday, February 25. In class today
we proved the existence and uniqueness theorem for solutions to first order
differential equations of the form dy/dt = f(t,y) with f continuous. We used
Picard's iterative
method / method of successive approximations. This is an extremely
powerful technique, and the general idea is used to prove many
fixed point
theorems (that if f is a nice function, say a
contraction map,
then there is at least one solution to f(x) = x). There are NUMEROUS
applications of these fixed point theorems / contraction maps to a variety of
problems, especially in
game theory and economics. If anyone is interested, I have a very readable
textbook on this. If you want an amusing read, I highly recommend looking
up the famous
coffee cup theorem (involving the Brouwer fixed point theorem).
- Monday, February 23. We talked about
different numerical methods.
Newton's method is
significantly more powerful than
divide and conquer
(also called the bisecting algorithm); this is not surprising as it assumes
more information about the function of interest (namely, differentiability).
The numerical stability of Newton's method leads to many fascinating problems.
One terrific example is looking at roots in the complex plane of a polynomial.
We assign each root a different color (other than purple), and then given any
point in the complex plane, we apply Newton's method to that point repeatedly
until one of two things happens: the iterates converge to a root or they diverge. If the iterates of our point converge to a root, we color our point the same color
as that root, else we color it purple. This leads to
Newton fractals,
where two points extremely close to each other can be colored differently,
with remarkable behavior as you zoom in. If you're interested in more
information, let me know; a good chaos program is
xaos (I have other
links to such programs for those interested). One final aside: it is often
important to evaluate these polynomials rapidly; naive substitution is often
too slow, and
Horner's algorithm is frequently used. We also talked about the dangers of
interchanging operations (such as interchanging the order of summation or a
limit and an integral). For limit-integral problems, frequently one appeals to
Lebesgue's
Dominated
Convergence Theorem to justify these interchanges.
Measure theory is a
generalization of integration, and allows us to handle more general sets. One
example is the characteristic function of the rationals on [0,1], ie, the
function that is 1 if x is a rational in [0,1] and 0 otherwise. This function
is not Riemann
integrable, as the upper sums are always 1 and the lower sums are always
0. It can be shown in a 'natural' generalization of integration that this
function integrates to 0 (which agrees with our intuition that there are a lot
more irrationals than rationals). The example from class where lim_n Int f_n
doesn't equal Int lim_n f_n involves continuous functions; it is possible to
find an example involving infinitely differentiable functions! One of the HW
problems is to find an example where all the functions are bounded by M (ie, |f_n(x)|,
|f(x)| < M for all n and x).
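A small sketch (my own code) comparing the two methods on f(x) = x^2 - 2: Newton's method reaches essentially machine precision in a handful of steps, while bisection gains only about one binary digit per step.

```python
f = lambda x: x * x - 2.0
fprime = lambda x: 2.0 * x
target = 2.0 ** 0.5

# Bisection (divide and conquer) on [1, 2].
lo, hi = 1.0, 2.0
for _ in range(20):
    mid = (lo + hi) / 2
    if f(lo) * f(mid) <= 0:
        hi = mid
    else:
        lo = mid
print(abs(mid - target))     # error around 1e-6 after 20 steps

# Newton's method: x_{n+1} = x_n - f(x_n) / f'(x_n).
x = 1.5
for _ in range(5):
    x = x - f(x) / fprime(x)
print(abs(x - target))       # essentially machine precision after 5 steps
```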
- Wednesday, February 18. Today we
discussed exact equations, which is another useful technique for solving
certain classes of differential equations. I'll do some research and find
examples of real world processes which are so modeled. We also discussed
integration techniques (see the links for
partial fractions and
trig
substitutions for additional details). We also described the solution of
the autonomous growth problem. The solution here exhibits many interesting
features. In addition to being able to write it down explicitly, we can see
the dependence of the solution's behavior on the parameters. We thus see that
it is very important to determine r or y_0 exactly if they are near 0, less so
elsewhere. Parameter dependence in models is a very active research field; see
this article on
parameter sensitivity for some information. Finally, we discussed implicit
versus explicit functions. While exact differential equations are another
example where we can write down a solution, in general we only get an implicit
function for y (ie, psi(x,y) = C).
- Monday, February 16. We discussed the
solution to separable differential equations. For some real world examples,
see
http://www-rohan.sdsu.edu/~jmahaffy/courses/f00/math122/lectures/sep_diffequations/sepdeeg.htm
(population growth, water leaking from a cylinder). Another good example is
the Solow growth
model in economics; see
Warren
Wessecker's notes or Chris
Edmond's notes. One of the problems with
using integrating factors or the separable equation method is that initially
an equation may not look to be in that form, but with some work can be
converted. A common technique is that if dy/dx = F(y/x) (ie, y' depends only on the ratio y/x), then we can convert this to a separable equation for v by setting v = y/x. As this implies y = xv, the product rule gives dy/dx = v + x dv/dx, and the resulting equation in v and x is separable (see the short computation at the end of this entry). The other key point in today's lecture was the introduction to
bifurcations.
One of my favorite examples, as discussed, involves the path
Elvis the dog takes to fetch
a ball. The bifurcation article is
http://www.williams.edu/go/math/sjmiller/public_html/103/MintonPenningsElvisArticleFinal.pdf
while the first one (does he know calculus) is
http://www.williams.edu/go/math/sjmiller/public_html/103/Pennings_DogsCalculus.pdf.
We'll discuss bifurcations in greater detail when we study Section 2.5. (An
additional aside is that the bifurcation in the dog paper leads to a nice
proof of the arithmetic / geometric mean inequality, one of the more important
ones in math. For other proofs, see
http://www.williams.edu/go/math/sjmiller/public_html/OSUClasses/487/ArithMeanGeoMean.pdf.)
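Here is the substitution from above written out (a standard computation): with v = y/x, so y = xv, the product rule gives

```latex
\frac{dy}{dx} = v + x\frac{dv}{dx} = F(v)
\quad\Longrightarrow\quad
\frac{dv}{F(v) - v} = \frac{dx}{x},
```

which is separable, so both sides can be integrated directly.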
- Friday, February 13. Today's class was
a terrific example of a very common technique in mathematics: reduce the
problem you care about to a simpler one until you find one you can solve. (The
standard example of this is completing the square to find the quadratic
formula.) We showed how to solve any differential equation of the form y'(t) +
p(t)y(t) = g(t). Unfortunately the solution could involve two integrals which
may be impossible to evaluate in closed form. We'll discuss these issues in
greater detail on Monday, but there are very good techniques to approximate integrals of continuous functions arbitrarily well (two such techniques are
the trapezoidal rule,
http://en.wikipedia.org/wiki/Trapezoidal_rule, and Simpson's rule,
http://en.wikipedia.org/wiki/Simpson%27s_rule). We also reviewed integrals
of standard functions; for Monday or Wednesday's lectures we'll need partial
fractions:
http://www.calc101.com/partial_fractions.html.
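For reference, the calculation from Friday's class: multiplying y'(t) + p(t)y(t) = g(t) by the integrating factor mu(t) = exp(Int p(t) dt) gives

```latex
\bigl(\mu(t)\,y(t)\bigr)' = \mu(t)\,g(t)
\quad\Longrightarrow\quad
y(t) = \frac{1}{\mu(t)}\left(\int \mu(t)\,g(t)\,dt + C\right),
```

which is exactly where the two integrals (one for mu, one for the integral of mu times g) come from.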
- Wednesday, February 11. Today's class
basically sets the notation for the rest of the semester. We will look
primarily at finding an unknown function of one variable where the highest
derivative can be written as a linear combination of the lower derivatives.
Explicitly, y^(n) = a(t) + a_0(t) y + a_1(t) y' + a_2(t) y'' + ... +
a_{n-1}(t) y^(n-1). Solving such an equation will involve n integrations (as n
is the order of the equation, namely the highest derivative that arises). As
each integration introduces an arbitrary constant, we see we will need n
initial conditions. To obtain closed form expressions for y(t) will require
finding anti-derivatives (or integrals). While a general function does not
have a nice anti-derivative (the classic example is exp(-t^2)), there are many
that do. Thus it is important to remember the results concerning standard
functions from Math 103 and 104. There are many tables online; see
http://en.wikipedia.org/wiki/Table_of_integrals#Table_of_Integrals or
http://www.math.com/tables/integrals/tableof.htm; you can also use a
program like Mathematica to solve some integrals (this can be done online at
http://integrals.wolfram.com/index.jsp).
- Monday, February 9. In studying
difference equations we saw how linear algebra can be useful; in particular,
the need to evaluate large powers of a matrix quickly. This is known as fast
exponentiation, and the ability to do this (both for matrices as well as
regular numbers) is extremely important. For example, one's first instinct is
to say we need 100 (or 99) multiplications to evaluate x^100, but it is
possible to do this in just 8: x*x, x^2 * x^2, x^4 * x^4, x^8 * x^8, x^16 *
x^16, x^32 * x^32, x^64 * x^32 * x^4. The key observation is using the base 2
expansion of 100; this idea is one of the reasons
RSA encryption is feasible. For
more details, see
Chapter 1 of my book,
http://press.princeton.edu/chapters/s8220.pdf
(especially Sections 1.1 and 1.2.1). Quite often in mathematics we have
algorithms to solve problems that are not feasible in practice, and finding
efficient ways of computing quantities is a big (and important) industry.
Another great example of where we know the solution exists but have trouble
finding it is given by Euclid's proof of the infinitude of primes. Euclid
argued that there must be infinitely many primes as follows: Assume not, and
thus let p_1, ..., p_n be all the primes. Consider the product p_1 * ... * p_n
+ 1; either that number is prime, or it is divisible by a prime p. This prime
p cannot be any of p_1, ..., p_n, as each p_i leaves remainder 1. Thus there
are infinitely many primes, and we denote this new prime p by p_{n+1}. Lather,
rinse, repeat. Keep doing this and we'll get an infinite list of primes. OK,
great. This shows there are infinitely many. What can we say about the
sequence of primes constructed? Does this list contain all the primes? Do we
have any idea which primes are in the list and when? Is it easy to compute the
terms? Euclid's method leads to the following sequence of primes:
2, 3, 7, 43, 13, 53, 5, 6221671, 38709183810571, 139, 2801, 11, 17, 5471,
52662739, 23003, 30693651606209, 37, 1741, 1313797957, 887, 71, 7127, 109, 23,
97, 159227, 643679794963466223081509857, 103, 1079990819, 9539, 3143065813,
29, 3847, 89, 19, 577, 223, 139703, 457, 9649, 61, 4357....
(Remember how we generated the sequence. We started with p_1 = 2, the first prime. We apply
Euclid's argument and consider 2+1; this is the prime 3 so we set p_2 = 3. We
apply Euclid's argument and now have 2*3+1 = 7, which is prime, and set p_3 =
7. We apply Euclid's argument again and have 2*3*7+1 = 43, which is prime and
set p_4 = 43. Now things get interesting: we apply Euclid's argument and
obtain 2*3*7*43 + 1 = 1807 = 13*139, and set p_5 = 13.) This is a great sequence to think about, but it is a computational nightmare
to enumerate! I downloaded these terms from the Online Encyclopedia of Integer
Sequences (homepage
is
http://www.research.att.com/~njas/sequences/ and the page for our
sequence is
http://www.research.att.com/~njas/sequences/A000945 ). You can enter the
first few terms of an integer sequence, and it will list whatever sequences it
knows that start this way, provide history, generating functions, connections
to parts of mathematics, .... This is a GREAT website to know if you want to
continue in mathematics. There have been several times I've computed the first
few terms of a problem, looked up what the future terms could be (and thus had
a formula to start the induction).
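Here is a sketch (my own code) of fast exponentiation by repeated squaring, driven by the base 2 expansion of the exponent; for exponent 100 it uses 8 multiplications, matching the count above.

```python
def fast_pow(x, n):
    """Compute x**n by repeated squaring, counting the multiplications used."""
    result, base, mults = None, x, 0
    while n > 0:
        if n & 1:                    # this binary digit of the exponent is 1
            if result is None:
                result = base        # the first factor costs no multiplication
            else:
                result *= base
                mults += 1
        n >>= 1
        if n:
            base *= base             # square to reach the next power of 2
            mults += 1
    return result, mults

value, mults = fast_pow(3, 100)
print(mults, value == 3**100)        # 8 multiplications, and the answer checks out
```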