Additional comments related to material from the class. If anyone wants to
convert this to a blog, let me know. These additional remarks are for your
enjoyment, and will not be on homeworks or exams. These are just meant to
suggest additional topics worth considering, and I am happy to discuss any of
these further.
- Friday, May 15. We finished the calculus of variations, seeing that if J[y] = Int_{a to b} F(x,y,y') dx has F not explicitly depending on x, then F - y' F_{y'} is constant along solutions. We also discussed the hyperbolic trig functions, which naturally arise in the catenoid problem.
We then analyzed a conjecture and a model for two firms competing. As some
people missed class and this example illustrates many of the key concepts of
the course (creating a reasonable model, solving it, exploring the
consequences of the parameter dependence in our solution),
I have taken the time and
written up the analysis in great detail.
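For completeness, here is the short standard computation behind that fact (written in LaTeX): if F has no explicit x dependence, then along a solution of the Euler-Lagrange equation

```latex
\frac{d}{dx}\left(F - y' F_{y'}\right)
  = F_y y' + F_{y'} y'' - y'' F_{y'} - y' \frac{d}{dx} F_{y'}
  = y' \left(F_y - \frac{d}{dx} F_{y'}\right) = 0,
```

so F - y' F_{y'} is indeed constant.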
- Wednesday, May 13. The
Euler (also called the
Euler-Lagrange) equation for when the functional J[y] = Int_{x = a to b}
F(x,y,y')dx, namely F_y - d F_{y'} / dx = 0, is one of the most (if not the
most) important differential equations in mathematical physics. I apologize
for not doing this in 30 minutes or less (but getting to it in 35 minutes is
not bad, as I'd much rather have the lecture self-contained:
click
here and search for the phrase eight-minute). This equation is extremely important: it is the analogue of the first derivative vanishing for functionals, and as such is
the cornerstone of optimization, a very important subject. It is quite amazing
that all of classical mechanics can be derived from minimizing the action (the time integral of L = T - U, the Lagrangian of the system); quantum mechanics comes from
minimizing the
Hamiltonian, H = T + U (WHY this is the case, as I said,
is
not my department). This leads to
Newton's famous F = ma. Has
any progress been made? At first it isn't clear. We still have to solve this
equation, but it turns out that we can work in generalized coordinate systems,
and this leads to greatly simplified math for many problems (if you've done
classical mechanics, you've probably seen
the double pendulum
problem; the link there contains an analysis of the solution using the
Euler-Lagrange equations and generalized coordinates, and discusses how this
system is chaotic). Notice that we are not claiming that we get physics for
free. We must still assume / axiomatize something. Instead of assuming F = ma,
we assume that physics can be described by minimizing the action / Lagrangian
T - U. WHY this is the case is a bit of a mystery (just like why must F = ma).
Something must be assumed somewhere (although perhaps see the discussions on
the fine
structure constant and then chat with me about
dimensional analysis).
We'll see in class on Friday how this is related to solving other problems
(such as the
catenary, the shape a
string will take under gravity). One last bit: one of the key inputs in
proving the Euler (Euler-Lagrange) equation was
Taylor's theorem in several variables. One way to prove this is our
powerful technique of adding zero. To simplify the write-up, I'll just
consider a function of two variables. Say we are studying F(a+h, b+k) - F(a,b),
so (h,k) is the small change. We add zero and see this is the same as [F(a+h,
b+k) - F(a, b+k)] + [F(a, b+k) - F(a, b)]. To get the first order piece, we
apply Taylor's theorem in one variable to each bracketed quantity, and get (Fx
means derivative with respect to first variable, Fy means derivative with
respect to second): Fx h + Fy k + ... (higher order pieces). One can continue
this analysis to get the higher order terms, but what I want you to see is how
this follows from one variable.
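As a small illustration of the Euler-Lagrange machinery in action (my own sketch, not something from class), here is sympy applied to the Lagrangian L = T - U of a mass m falling under gravity g (both symbols are just illustrative); the resulting equation is Newton's m y'' = -mg.

```python
from sympy import symbols, Function, Rational
from sympy.calculus.euler import euler_equations

t = symbols('t')
m, g = symbols('m g', positive=True)   # mass and gravitational acceleration (illustrative)
y = Function('y')                       # height of the particle at time t

# Lagrangian L = T - U: kinetic energy minus potential energy.
L = Rational(1, 2) * m * y(t).diff(t)**2 - m * g * y(t)

# Euler-Lagrange equation F_y - d/dt F_{y'} = 0 for this L.
print(euler_equations(L, y(t), t))
# Expect (up to rearrangement) m*y''(t) = -m*g, i.e. F = ma for a falling mass.
```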
- Monday, May 11. To study the
Calculus of
Variations, we needed to quickly review the notions of a
vector space and a
normed vector space.
We study functionals now, which are maps from functions to real numbers. We
can generalize the notion of a continuous function to a continuous functional;
however, there are often several different metrics we can use to measure whether two functions are close, and depending on which metric we use the functional may or may not be continuous. For example, the arc length of a curve y(x) is given by J[y] = Int_{x = a to b} Sqrt(1 + y'(x)^2) dx. Two normed
vector spaces are C[a,b] = {continuous functions on [a,b]} and C^k[a,b] =
{functions continuously differentiable k times on [a,b]}. We define the
corresponding norms by ||f||_0 = max_{a <= x <= b} |f(x)| for functions in
C[a,b] and ||f||_1 = max_{a <= x <= b} |f(x)| + max_{a <= x <= b} |f '(x)| for
functions in C^1[a,b], and the distance between two functions f1 and f2 is
d(f1,f2) = ||f1 - f2||. Note two functions can be quite close to each other in
one space but far apart in another. We saw this with the staircase and the
sinusoidal function and the straight line: they are close in C[a,b] but not in
C^1[a,b]. It's a nice exercise to show that the arc length functional is
continuous when we study functions in C^1[a,b] but NOT when we look at
functions in C[a,b]. The wikipedia article,
Calculus of
Variations, does a nice job describing some of the problems and norms.
Other big problems in the subject are the
catenary (the shape a
string will take under gravity), and the
Brachistochrone Problem (find the curve such that a ball sliding down it
will travel from A to B in least time). It took a bit of searching, but
click here for the anecdote of Newton returning from the mint and staying up
until 4am to solve this and then having it published anonymously, leading to
Bernoulli saying that he recognizes the lion by his claw.
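Here is a small numerical sketch (my own, with a made-up amplitude and frequency) of the point above: f1(x) = x and f2(x) = x + 0.01 sin(100x) are close in C[0,1] but far apart in C^1[0,1].

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100001)    # fine grid on [a, b] = [0, 1]

f1 = x                                # straight line
f2 = x + 0.01 * np.sin(100 * x)       # small, rapidly oscillating perturbation

df1 = np.ones_like(x)                 # f1'(x) = 1
df2 = 1 + np.cos(100 * x)             # f2'(x) = 1 + cos(100x)

dist_C0 = np.max(np.abs(f1 - f2))                 # ||f1 - f2||_0
dist_C1 = dist_C0 + np.max(np.abs(df1 - df2))     # ||f1 - f2||_1

print(dist_C0)    # about 0.01: the functions are close in C[a,b]
print(dist_C1)    # about 1.01: they are far apart in C^1[a,b]
```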
- Monday, May 4. The
Laplace Transform
is an important example of an
integral transform.
It is extremely useful in solving certain types of differential equations. We
will not prove that the
inverse
Laplace transform exists and is unique for a large class of functions, as
there is no need. This is because we often have general existence theorems for solutions to differential equations. Thus, a solution exists, and hence the inverse Laplace transform will exist for the functions we study. The Laplace
transform is but one of many useful integral transforms; other extremely
important ones are the
Fourier (with
applications ranging from solving
partial
differential equations, such as the
heat equation, to
proving the
Central Limit Theorem) and
Mellin transforms.
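For the curious, here is a one-line sketch of the Laplace transform in sympy (my own illustration; the function t e^{-2t} is just an example):

```python
from sympy import symbols, exp, laplace_transform

t, s = symbols('t s', positive=True)

# Laplace transform of f(t) = t*exp(-2t); expect 1/(s + 2)^2.
print(laplace_transform(t * exp(-2 * t), t, s, noconds=True))
```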
- Wednesday, April 29. As we've seen,
creating a good model requires serious contemplation and an analysis of the
trade-offs between capturing all the features and being tractable. Frequently
the system being modeled is so complex that, amazingly, using random numbers
leads to accurate predictions. One example of this is
Random Matrix
Theory (for a description of the nuclear physics origins, basic results
and connections to number theory, see
chapter 15 of
my book An Invitation to
Modern Number Theory). (If you are interested in seeing the connections
with differential equations, let me know and I'll give you the relevant pages
from Chapter 17. The fact that Q^T A Q and A must have the same probabilities
for any orthogonal change of variables Q means that the function f(Q) = Prob(Q^T
A Q) - Prob(A) must be constant. This leads to the simple differential
equation f'(Q) = 0, f(I) = 0; we then pass from this differential equation to
differential equations for the probability distributions we use for each entry
of A, and find that the entries of A are chosen from Gaussian distributions.)
We then analyzed a population problem involving the number of pairs of whales
of various ages at any time (v_{n+1} = A v_n where A is a
Leslie matrix). We
first modeled this with a simple constant coefficient system of difference
equations, which we can solve completely. We then discussed the problems with
such a model, and possible generalizations that would address these issues.
For more details, see the two models described
in my notes here. Interestingly, there is a connection between the
generalized model and random matrix theory!
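Here is a minimal numerical sketch of a Leslie-matrix model v_{n+1} = A v_n (my own code; the fecundity and survival numbers are made up for illustration, not the values from class or the notes):

```python
import numpy as np

# Leslie matrix: top row holds fecundities of each age class,
# the sub-diagonal holds survival probabilities into the next age class.
A = np.array([[0.0, 1.5, 1.0],
              [0.6, 0.0, 0.0],
              [0.0, 0.8, 0.0]])

v = np.array([100.0, 0.0, 0.0])    # initial numbers in each age class
for n in range(50):
    v = A @ v                       # v_{n+1} = A v_n

print(v)                                  # age distribution after 50 steps
print(max(abs(np.linalg.eigvals(A))))     # dominant eigenvalue: long-run growth factor per step
```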
- Monday, April 27. Today we discussed
modeling, in particular, the interplay between finding a model that captures
the key features and one that is mathematically tractable. While we used a
problem from baseball as an example, the general situation is frequently quite
similar. Often one makes simplifying assumptions in a model that we know are
wrong, but lead to doable math (for us, it was using continuous probability
distributions in general, and in particular the three parameter Weibull). For
more on these and related models, see
- Wednesday, April 22.
Euler's method (great
site here with handouts and animations) is a good method to
numerically approximate solutions to first order differential equations of the
form y'(t) = f(t,y(t)). The key ingredient in this method is the tangent line approximation for a function (ie, a first order Taylor series). There are better algorithms. Similar to first order Taylor approximations, the error in each step of the Euler tangent line method is proportional to h^2 (where h is the step size); methods such as Runge-Kutta (see also here) have better errors, such as of size h^4. The reason for these
improvements is due to better approximations of areas (we may rewrite our
problem as y(t) - y(0) = Int_{s = 0 to t} f(s,y(s))ds, and thus we see the
importance of estimating areas well). One terrific method for approximating
integrals is Simpson's
rule; I strongly urge you to read the
wikipedia article on Simpson's rule (link
is here). The method of proof is more important than the result itself.
Specifically, by taking a weighted combination of the midpoint method and the
trapezoid method (two algorithms that are order h^3), we get a new method
which is at least as good as order h^4 (it's actually order h^5 because of
additional symmetries). This is amazing! Each method is only as good as h^3,
but appropriately combined they give h^5! As you can see from the statement of
Simpson's rule, it isn't that much additional algebra to compute. The key is
in choosing the correct weights. This is a very common principle; it is used
in economics to combine various investments into a portfolio and decrease the
variance in the rate of return (the key assumption is independence of the
various funds; the correlations of the various funds are often incorrectly
estimated, as the current collapse of Western Civilization shows). As you can
tell, I am a huge fan of this method, as it illustrates numerous important
techniques and tools. In particular, it shows how one can rearrange algebra to
do things more efficiently or how rearranging the algebra leads to a new way
of interpreting the results (putting the division by 6 with the sum of the
functions versus with the b-a). Also, we see how consistency checks /
investigations of simple cases can help us figure out if a formula is
reasonable (why we're not surprised Simpson's method has a b-a, or a division
by 6 for example).
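Here is a minimal sketch of Euler's method (my own code), applied to y' = y, y(0) = 1, whose exact solution is e^t; doubling the number of steps roughly halves the global error, as expected for a first order method.

```python
import math

def euler(f, t0, y0, t1, n):
    """Approximate y(t1) for y' = f(t, y), y(t0) = y0, using n Euler steps."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y += h * f(t, y)    # follow the tangent line over one step of size h
        t += h
    return y

f = lambda t, y: y           # the ODE y' = y
exact = math.exp(1.0)        # y(1) = e

for n in (50, 100, 200):
    print(n, abs(euler(f, 0.0, 1.0, 1.0, n) - exact))   # error roughly halves as n doubles
```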
- Monday, April 20. Any first order
linear system of differential equations x'(t) = Ax(t) with x(0) = x_0 has the
solution x(t) = exp(At) x_0. While this formula is of great theoretical
interest (as it does completely solve the problem), from a computational
standpoint we have merely traded one problem for another. Thus,
unless we can easily compute exp(At), this is not a good trade.
Fortunately there are good techniques for a variety of problems to either
compute exp(At) or at least to approximate it to a given accuracy. The simplest classes of such matrices are the diagonalizable and nilpotent ones. While a
general matrix cannot be diagonalized, it can be put into
Jordan Canonical
Form, which often is sufficient for computations. Even
though many systems of differential equations are non-linear, they can often
be locally approximated by linear systems; this is similar to approximating
complicated functions in calculus with their first order Taylor series, which is
quite useful (for example,
Newton's Method).
We ended with a discussion of a model for how a virus propagates in a network;
if anyone is interested in either reading more about this or joining our
research group, just let me know. What I particularly like is that reasonable
assumptions lead to a guess of di/dt = beta p (N-1) i(t) (1 - i(t)) - delta
i(t) (for how the percentage of infected nodes changes with time). Quick
sanity checks show this is a reasonable equation (what happens if i(t) is 0 or
1, ...). This is one of the most important points to take away from the class,
namely how to look at a model and see if it is reasonable. For another
example, see my notes on the log5
rule for determining the probability one team beats another.
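A short sketch (my own, with an arbitrary 2x2 matrix) of solving x'(t) = Ax(t), x(0) = x_0 with the matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])     # an arbitrary example matrix
x0 = np.array([1.0, 0.0])        # initial condition x(0)

t = 1.5
print(expm(A * t) @ x0)          # x(t) = exp(At) x_0

# Sanity check: the eigenvalues of A are -1 and -2, so solutions should decay.
print(np.linalg.eigvals(A))
```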
- Friday, April 17. The technique of
adding zero (which we used to analyze the second HW problem due on Wednesday,
namely x y''(x) + y'(x) + x y(x) = 0, where we replaced x with x - 1 + 1 --
see the notes on series solutions for details)
is extremely powerful and important in mathematics. This is the key step in
many proofs; the first you might remember is the proof of the product rule from calculus (see the alternative proof on Wikipedia).
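For reference, the 'adding zero' step in that proof of the product rule is

```latex
f(x+h)g(x+h) - f(x)g(x)
  = \bigl[f(x+h)g(x+h) - f(x)g(x+h)\bigr] + \bigl[f(x)g(x+h) - f(x)g(x)\bigr]
  = \bigl[f(x+h) - f(x)\bigr]g(x+h) + f(x)\bigl[g(x+h) - g(x)\bigr];
```

dividing by h and letting h tend to 0 gives f'(x) g(x) + f(x) g'(x).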
- Wednesday, April 15. One of the most important things to learn in this (or any class) is the
importance of asking the right question.
Here is a fascinating clip of a great speaker talking about the importance of
asking the right question. Today we asked several questions to try to
illuminate some important concepts in linear algebra (which will be of use in
solving systems of differential equations). Sadly, even if everything is real
we'll need to use
complex numbers in general. This is because we find eigenvalues by solving
Det(A - lambda I) = 0. This leads to a polynomial in lambda to solve, and
while the
Fundamental Theorem of Algebra asserts that a polynomial of degree n has exactly n roots (not necessarily distinct), the roots of a real polynomial might be complex. While exp(x) exp(y) = exp(x+y) if x and y are real, this is not true in general for matrices A and B; ie, exp(A) exp(B) is not exp(A+B) in general. There is a formula for exp(A) exp(B), the Baker-Campbell-Hausdorff formula; see the Zassenhaus formula for a nice explicit form of this product. The
formula involves the
commutator of two matrices, where [X,Y] = XY - YX measures how far X and Y
are from commuting. The commutator arises throughout the sciences, in
particular in quantum mechanics, where the
fundamental commutator relation asserts that [X,P] = i hbar I (hbar =
Planck's constant
divided by 2 pi, X is the position operator and P is the momentum operator).
Finally, we made a few points about optimal / efficient algorithms.
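Here is a quick numerical sketch (my own) that exp(A) exp(B) need not equal exp(A+B) when A and B fail to commute:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0, 0.0],
              [1.0, 0.0]])

print(A @ B - B @ A)        # the commutator [A, B] is nonzero
print(expm(A) @ expm(B))    # this product differs from ...
print(expm(A + B))          # ... exp(A + B)
```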
- Monday, April 13. Because first order
linear non-homogenous differential equations can be solved exactly through two
integrals (one gives the integrating factor, and then we have to integrate
that against the non-homogenous term), in some sense all such problems should
be viewed as 'easy' (ie, by numerically approximating these integrals we can
get excellent approximations to the true solutions). Thus it is not surprising
that we always want to convert differential equations into a system of
uncoupled first order linear non-homogenous differential equations. This
explains the motivation for our quick tour through linear algebra. The key
concept is that if we change our basis, we change what a matrix looks like but
not its effect. You should think of a matrix as a representation of a physical transformation; the representation will look different in different bases. (For
example, if we use East-North coordinates our location might be one set of
numbers, but if we use Northeast-Southwest, it would be something else, yet
it's physically the same point.) We've seen in class that if we have a
diagonal matrix A, then solving the system of equations x'(t) = A x(t) + g(t)
(where x(t) and g(t) are vectors of functions) is straightforward, namely it
just reduces to a bunch of independent first order linear non-homogenous
differential equations. It turns out that if we're given x'(t) = A x(t) + g(t)
for a generic A, then most of the time we can find a matrix T and new
functions y(t) such that x(t) = Ty(t) and y'(t) = D y(t) + h(t), where D is a
diagonal matrix and h(t) = T^{-1} g(t). Thus we can solve for y(t) and then
find x(t) with the rule x(t) = T y(t). Thus we see the importance of being
able to
diagonalize a matrix (ie, find a matrix T as described above). The way
this is done (and we'll discuss this in greater detail) is to find the
eigenvalues and eigenvectors of A. Recall v is an
eigenvector with
eigenvalue lambda if Av
= lambda v. All we will need for our class is that the eigenvalues for A are
found by solving Det(A - lambda I) = 0, where A is a given n x n matrix and I
is the n x n identity matrix (all 1s on the main diagonal and 0s elsewhere). I
have written some notes on linear algebra over the years which might be
helpful (my apologies for the formatting -- these are OLD notes that I've
converted to html -- I strongly urge you to read the MSWord versions and
not the html version!).
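Here is a small numerical sketch (my own, with an arbitrary matrix) of finding the matrix T described above, so that T^{-1} A T is diagonal:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])              # an arbitrary diagonalizable example

eigenvalues, T = np.linalg.eig(A)       # the columns of T are eigenvectors of A
D = np.linalg.inv(T) @ A @ T            # should be (numerically) diagonal

print(eigenvalues)                      # the eigenvalues 3 and -1 for this A
print(np.round(D, 10))                  # diagonal with entries 3 and -1: the system decouples in these coordinates
```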
- Friday, April 10. In solving for
series solutions y(x) = Sum_n a_n x^n to differential equations, we saw how we
can get somewhat messy expressions in the recurrence relations for the unknown
coefficients a_n. We saw this when we studied the
Airy equation (which
arises in optics); the answer involved the
triple factorial (keep
reading the wikipedia entry to get to some really large generalizations of the
factorial, such as the superfactorial, the up-arrows, ...; this leads to
unbelievably large numbers such as
Graham's number).
There is a lot of interesting behavior about the solutions to the
Airy equation, as
well as to generalizations of the factorial function. The most common
generalization is the
Gamma function: Gamma(s) = Int_{t = 0 to oo} exp(-t) t^s dt/t.
Integrating by
parts, one sees that for n a positive integer, Gamma(n) = (n-1)!; thus,
the Gamma function generalizes the factorial function to all real numbers (and
even to complex numbers!). My favorite fact is that Gamma(1/2) = (-1/2)! =
sqrt(pi). (I won't put an exclamation point for emphasis, but this is a
fascinating result! If you want to see a proof, let me know.) This is not just
a mathematical curiosity, but arises throughout the sciences. In particular,
it's related to the normalization for the
standard Gaussian
density: p(x) = exp(-x^2/2) / sqrt(2pi) (this is a
probability
distribution, and integrates to 1). For our purposes, the reason these
facts matter is that we want to get some feel for our series solution (for
what x does it converge, how rapidly is it growing / decaying, how many terms
do we need to get a good approximation, ...).
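Two quick numerical checks (my own) of the Gamma function facts above:

```python
import math

# Gamma(n) = (n-1)! for positive integers n.
print(math.gamma(5), math.factorial(4))       # both equal 24

# Gamma(1/2) = sqrt(pi).
print(math.gamma(0.5), math.sqrt(math.pi))    # both about 1.7724538509
```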
- We ended with a statement of what a system of differential equations is,
when it is linear, and the claim that systems of
first order linear differential equations (homogenous or non-homogenous)
will be easily solvable using integrating factors after inputting a little
linear algebra. What makes this one of the most important techniques /
theorems of linear algebra is that we can reduce so many things to such
systems. For example, a little algebra shows that if we have u''(t) + .125 u'(t) + u(t) = 0, then setting x1(t) = u(t) and x2(t) = u'(t) leads to a system of first order linear differential equations for x1(t) and x2(t) (see the short sketch at the end of this entry).
- We briefly mentioned in class the
Bessel equation (I
actually did remember it correctly: it's x^2 y''(x) + x y'(x) + (x^2 - n^2)
y(x) = 0). This equation is a bit harder to solve than Airy's equation, as x =
0 is not an ordinary point (the coefficient of y''(x) vanishes at x=0).
This equation is discussed in our textbook. It arises frequently in physics
and number theory (if people are interested I'm happy to discuss how it arises
in my work on the distribution of primes and the energy levels of heavy nuclei
such as Uranium).
- One final point to note: it is not yet clear that it is advantageous to
replace second order differential equations with systems of first order
differential equations. We're increasing the number of unknown functions
(which is bad), but we're decreasing the highest power of the derivatives that
appear (which is good!). Here's my favorite example of how making a change
doesn't help: the Nobel laureate
Richard Feynman
showed that you can reduce all of physics to solving the equation U = 0. What
is U? It's the unworldliness of the physical world. What you do is you take
each physical law, say F = ma, E = mc^2, .... We rewrite these as U = (F -
ma)^2 + (E - mc^2)^2 + ... = 0; thus the only way this can be equal to zero is
if each physical law is satisfied! Wow -- all of physics in one equation
involving just three recognizable symbols! Unfortunately, of course, this is
completely intractable!
See here for a
little more on these notational gimmicks.
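Returning to the first bullet above: here is a minimal sketch (my own code) of converting u'' + .125 u' + u = 0 into a first order system x' = Ax with x1 = u and x2 = u', and solving it numerically.

```python
import numpy as np
from scipy.integrate import solve_ivp

# With x1 = u and x2 = u', the equation u'' + 0.125 u' + u = 0 becomes x' = A x.
A = np.array([[0.0, 1.0],
              [-1.0, -0.125]])

def rhs(t, x):
    return A @ x

sol = solve_ivp(rhs, (0.0, 20.0), [1.0, 0.0])   # initial conditions u(0) = 1, u'(0) = 0
print(sol.y[0, -1])                             # u(20): a slowly damped oscillation
```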
- Wednesday, April 8. Systems of
equations are frequently used to model real world problems, as it is quite
rare for there to be only one function of interest. If you want to read more
about applying math to analyze the
Battle of Trafalgar,
here is a nice
handout (or, even better, I think we could go further and write a nice
paper for a general interest journal expanding on the
Mathematica program I wrote). The
model we discussed is very similar to the
Lotka-Volterra
predator-prey equations (our evolution is quite different, though; this is
due to the difference in sign in one of the equations). Understanding these
problems is facilitated by knowing some linear algebra; we'll discuss what is
needed in class. It is also possible to model this problem using a system of
difference equations, which can readily be solved with linear algebra.
Finally, it's worth noting a major drawback of this model, namely that it is
entirely deterministic: you specify the initial concentrations of red and blue
and we know exactly how many exist at any time. More generally one would want
to allow some luck or fluctuations; one way to do this is with
Markov chains. This
leads to more complicated (not surprisingly) but also more realistic models.
In particular, you can have different probabilities for one ship hitting
another, and given a hit you can have different probabilities for how much
damage is done. This can be quite important in the 'real' world. A classic
example is the British efforts to sink the German battleship Bismarck in WWII.
The Bismarck was superior to all British ships, and threatened to decisively
cripple Britain's commerce (ie, the flow of vital war and food supplies to the
embattled island). One of the key incidents in the several-day battle was a
lucky torpedo shot by a British plane which seriously crippled the Bismarck's
rudder.
See the wikipedia entry for more details on one of the seminal naval
engagements of WWII. The point to take away from all this is the need to
always be aware of the limitations of one's models. With the power and
availability of modern computers, one workaround is to run numerous
simulations and get probability windows (ie, 95% of the time we expect a
result of the following type to occur). Sometimes we are able to theoretically
prove bounds such as these; other times (using Markov chains and
Monte Carlo
techniques) we numerically approximate these probabilities.
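Since the Lotka-Volterra system is mentioned above, here is a minimal simulation sketch (my own, with made-up parameters; recall the combat model from class differs from this by a sign in one of the equations):

```python
from scipy.integrate import solve_ivp

# Lotka-Volterra predator-prey model: x = prey, y = predators.
a, b, c, d = 1.0, 0.1, 1.5, 0.075    # illustrative parameters, not values from class

def predator_prey(t, z):
    x, y = z
    return [a * x - b * x * y,        # prey reproduce and are eaten
            -c * y + d * x * y]       # predators die off and grow by eating prey

sol = solve_ivp(predator_prey, (0.0, 30.0), [10.0, 5.0], max_step=0.01)
print(sol.y[:, -1])    # populations at t = 30; the trajectories cycle rather than settle down
```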
- Monday, April 6. Series expansions sum
a_n x^n are a great tool to solve differential equations; the important
ingredient is some control over where these series converge and how rapidly
they converge. This is why we need tests such as the ratio or the root test.
It is not always easy to see the true magnitude of a series from its
expansion. For example, exp(x) = sum x^n/n! and exp(-x) = sum (-x)^n/n!; there
is a huge difference in behavior as x --> oo, but looking at the size of the
coefficients a_n one sees no difference. (There are times in mathematical
physics when we look at divergent series. These are often called
asymptotic series;
one standard example is the
exponential
integral.) The main step in solving differential equations with a series
expansion is to find a tractable
recurrence
relation for the unknown coefficients a_n and then solve it (this is
another reason why we spent so much time on difference / recurrence relations
at the start of the semester). The advantage is we have reduced a differential
equation to systems of equations, and often these are more tractable. The
example we did in class is typical, where we see two free parameters arising
(as we did a second order differential equation), and specifying a_0 and a_1
uniquely determines all other a_n (note the similarity with difference equations
such as the Fibonacci
equation).
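A small sketch of the recurrence idea (my own example, not the one from class): for y'' + y = 0, plugging y = Sum a_n x^n into the equation gives a_{n+2} = -a_n / ((n+2)(n+1)), and with a_0 = 1, a_1 = 0 the partial sums approach cos(x).

```python
import math

def partial_sum(x, N, a0=1.0, a1=0.0):
    """Partial sum of the series solution of y'' + y = 0 built from the recurrence."""
    a = [a0, a1]
    for n in range(N - 1):
        a.append(-a[n] / ((n + 2) * (n + 1)))    # a_{n+2} = -a_n / ((n+2)(n+1))
    return sum(a[n] * x**n for n in range(len(a)))

x = 2.0
print(partial_sum(x, 20), math.cos(x))    # the two values agree to many digits
```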
- Wednesday, March 18. Convergence (and
divergence) of sequences and series is an extremely important topic. We want
to replace complicated objects with simpler ones, and then approximate the
complex arguments by analyzing the simpler ones and understanding the error
terms. Key items to study:
series,
absolute
convergence,
conditional
convergence,
radius of convergence,
geometric series,
harmonic
series, ..., and see the bottom of the wikipedia article for the various
tests (comparison,
integral, ratio, root, ...). Behavior of sequences and series can be quite
surprising: here are the four questions and answers:
- (1) Consider Sum_{n=1 to oo}
a_n x^n; must this converge for some x not equal to zero? Answer: NO! Take a_n
= n^n.
- (2) Consider the Taylor series for f and g at x=0. If the two series are equal, must f(x) = g(x) for some x not equal to zero? Answer: NO! Looking at f(x)-g(x), this is equivalent to asking: if a function has Taylor series identically equal to zero at a point, is the function identically equal to zero in a neighborhood of that point? The answer is no. Let h(x) = 0 if x=0 and exp(-1/x^2) otherwise. Using the definition of the derivative, one can show all derivatives of h are zero at zero, so its Taylor series is identically zero; but h vanishes only at zero, so the series agrees with the function only at that one point. One has to resort to the definition of the derivative at zero because the function is defined by a different formula there (see the short computation at the end of this entry).
- (3) Let a_n be any sequence of real numbers. Is there always an
infinitely differentiable function such that the n-th
derivative at x=0 equals a_n? Answer: Amazingly, YES! Extra credit if you can
give an example.
- (4) Let f(x) be a continuous function. Must f(x) be differentiable for at
least one x? Answer: NO!
Weierstrass came up with a nice example (refined later by Hardy, I
believe). It turns out that one can define a sense of size on the space of
continuous functions. Just as almost no real numbers are rational (Cantor's
diagonalization argument), almost all continuous functions are nowhere
differentiable!
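Returning to question (2): the key computation, done from the definition of the derivative, is

```latex
h'(0) = \lim_{t \to 0} \frac{h(t) - h(0)}{t} = \lim_{t \to 0} \frac{e^{-1/t^2}}{t} = 0,
```

since e^{-1/t^2} tends to zero faster than any power of t; iterating this argument shows every derivative of h vanishes at 0.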
- Monday, March 16. While
variation of parameters is a great theoretical tool, in practice it is
hard to use. There are two difficulties with the approach. First, we need to
know the solutions to the homogenous equation, and second we need to be able
to evaluate integrals involving reciprocals of the
Wronskian determinant! In
the course of our analysis, at one point we chose to restrict attention to
u1(t) and u2(t) with u1'(t)y1(t) + u2'(t)y2(t) = 0. This is a severe
restriction on the possible functions, but fortunately it turns out that (for
this type of problem at least) we can find a solution. To do so involves
solving a system of two equations with two unknowns, the unknowns being the
functions u1'(t) and u2'(t). We thus end up with two first order differential
equations, and this is progress (ie, it's easier to solve the two separable
first order differential equations than the original second order differential
equation). This is a common theme throughout the course: there are lots of
methods we can try; sometimes we end up with a simpler equation and win, other
times it's more complicated and we lose (see the notes for Wednesday, March
11). We also talked about sums of k-th powers of integers: sum_{n = 1 to N}
n^k. These are typically proved by
mathematical
induction (see
also my online notes). The problem in using induction is that you need to
know what you are trying to prove. Thus, if k=1 you try and show it is
N(N+1)/2, if k=2 you show it is N(N+1)(2N+1)/6; how do you figure out the
general guess for arbitrary k? There are several approaches. One of my
favorites uses
differentiating identities and L'Hopital's rule (but it is NOT easy to
use; I don't think I can do more than k=2 with it, though I'd love to see
someone go further). Another is to gather numerical data. ASSUME that the
answer is a polynomial in N of degree k+1 (this is not an unreasonable
conjecture based on analyzing k=1 and k=2). If this is the case, the answer is
some polynomial a_k(0) + a_k(1) N + ... + a_k(k+1) N^{k+1}. A polynomial of
degree k+1 is uniquely determined by its value at k+2 points. Thus, we simply
evaluate our k-th power sums for N = 0, ..., k+1 and deduce the associated
a_k(i)'s (this involves some linear algebra or
Lagrange
interpolation). We now have a guess for the induction! As a further note,
if you remember the
integral test from calculus you can see that a_k(k+1) has to be 1/(k+1)
(compare Sum_{n = 1 to N} n^k with Integral_{1 to N} x^k dx).
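Here is a small sketch (my own code) of the 'gather data and fit a polynomial' strategy for Sum_{n = 1 to N} n^k, shown for k = 2; the recovered coefficients are those of N^3/3 + N^2/2 + N/6 = N(N+1)(2N+1)/6.

```python
import numpy as np

k = 2
# Assume the answer is a polynomial of degree k+1; it is then determined by k+2 points.
Ns = np.arange(0, k + 2)                                               # N = 0, 1, ..., k+1
sums = np.array([sum(n**k for n in range(1, N + 1)) for N in Ns], dtype=float)

coeffs = np.polyfit(Ns, sums, k + 1)       # interpolating polynomial through these points
print(np.round(coeffs, 6))                 # [0.333333 0.5 0.166667 0.]: N^3/3 + N^2/2 + N/6
```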
- Friday, March 13. The method of
undetermined coefficients works very well for very specific choices of the
non-homogenous component g(t); unfortunately it does not work in general and
we will have to resort to series solutions (chapter 5). A fun exercise is to
try to solve y'' + 3y' + 2y = 1/t; there is no nice guess, and the solutions
Mathematics generates involves the
exponential
integral.
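For the curious, here is a sketch of why the exponential integral shows up: the homogeneous solutions are e^{-t} and e^{-2t}, with Wronskian W = -e^{-3t}, and variation of parameters gives the particular solution

```latex
y_p(t) = -y_1 \int \frac{y_2\, g}{W}\,dt + y_2 \int \frac{y_1\, g}{W}\,dt
       = e^{-t}\int \frac{e^{t}}{t}\,dt - e^{-2t}\int \frac{e^{2t}}{t}\,dt
       = e^{-t}\,\mathrm{Ei}(t) - e^{-2t}\,\mathrm{Ei}(2t),
```

where Ei is the exponential integral (constants of integration get absorbed into the homogeneous part).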
- Wednesday, March 11. Today we studied
the
Method of Undetermined Coefficients (a special case of the Method of
Divine Inspiration). In general, methods such as this take as input a
differential equation and output a new differential equation. In general, the
new equation will be at least as hard (if not harder!) to solve as the
original equation, and thus nothing is gained. Occasionally, however, we're
lucky and we end up with a simpler equation to solve. We saw this today when
studying y'' - 2ay' + a^2 y = 0; we found y_1(t) = exp(at) was one solution, and guessing y(t) = u(t) y_1(t) led to the differential equation u''(t) = 0. While this is also a second order differential equation, it is an easier equation to solve (u(t) = c_1 + c_2 t). For the third order equation y''' + 3y'' + 3y' + y = 0 we get (r+1)^3 = 0 for the characteristic polynomial, or r = -1 as a triple root. In class we conjectured that algebra would show guessing
y(t) = u(t) exp(-t) would lead to u'''(t) = 0; I've done the algebra and
confirmed that (well, I had Mathematica do the algebra for me!). This is
another third order differential equation, but a much simpler one than the
initial equation. This is a common theme in mathematics, reducing a hard
problem to a hopefully easier one. Sometimes it works, sometimes it doesn't.
One of my favorite examples is solving polynomial equations, specifically: when is it possible to write down the solutions in terms of the coefficients of the polynomial? We have formulas for
quadratics, cubics and quartics, but there is no such formula (in terms of radicals) for quintics and higher. One way to tackle these problems is to associate a new polynomial,
called the resolvent, to the initial polynomial. In general the resolvent is
of a higher degree than the initial polynomial, but sometimes we're lucky and
it is easily solved. For example, the
resolvent for a
quartic is a cubic, which can be solved using the cubic formula; for the
cubic, one gets a sextic (degree six) polynomial, but as it is of the form y^6
+ ay^3 + b = 0, it is really a quadratic in y^3 (which we can solve with the
quadratic formula), and then we just take cube roots. In general, as one goes
higher the degree of the resolvent grows too fast to be useful. For more on
resolvents and solving polynomials,
see the lecture notes
by B. Cherowitzo. (One final note: for more on Galois and his interesting
life, as well as his last few hours,
click here.)
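Here is the algebra check mentioned above, done in sympy rather than Mathematica (my own sketch, using the corrected equation y''' + 3y'' + 3y' + y = 0):

```python
from sympy import symbols, Function, exp, simplify

t = symbols('t')
u = Function('u')
y = u(t) * exp(-t)           # reduction of order guess: y(t) = u(t) e^{-t}

lhs = y.diff(t, 3) + 3 * y.diff(t, 2) + 3 * y.diff(t) + y
print(simplify(lhs))         # exp(-t)*u'''(t): the equation collapses to u'''(t) = 0
```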
- Friday, March 6. Today we discussed
finding the second solution for second order linear constant coefficient
homogenous difference or differential equations. While one can always
substitute these guesses into the original equation and see that they work, to me that is unsatisfying: it is unlikely anyone would stumble upon these guesses unprompted. WHY does this work? HOW does one know to guess this? Hopefully this was somewhat clear from today's lecture;
I've written up some notes
about this which essentially recap what was said in class today (I hope
these make sense; I had to get up at 5am to write the exam today and thus am a
little short on sleep). Again, for exams all that is needed is to know how to
solve the repeated root case; however, if you can understand how we were led
to these guesses, hopefully you'll be able to generalize this to attack a
future problem. As always, I'm happy to chat more about this with anyone.
- Wednesday, March 4. Today we discussed
when linear combinations of two solutions generate all solutions to a second
order differential equation. While it is unfortunate that the Picard iteration
method doesn't generalize to give a constructive approximation to the
solution, sometimes it suffices to know that there is a unique solution. The
standard example is when we can find two solutions whose
Wronskian is non-zero, as
then we can invoke our existence / uniqueness theorem to assert that we have
ALL the solutions. The Wronskian is a very nice condition to tell if two
solutions are linearly independent; in fact, generalizations of this method
are used in many parts of mathematics. There are several examples in random
matrix theory (my main research interest); another fun one is the following:
Is it possible to construct a matrix with 2009 rows and INFINITELY many
columns such that ANY 2009 columns are linearly independent (ie, span all of
R^2009)? One solution involves a generalization of the Wronskian. As a final
note, we used a little 2x2 linear algebra to discuss finding when two
solutions are a fundamental set of solutions. For a review of the needed
linear algebra, see my notes on
multiplying matrices (html,
word) (for
many notes on linear algebra, which you do not need for this course, see
these notes).
- Monday, March 2. Today we used
Picard's method to
generate a sequence of functions converging to the solution of dy/dt = 2t(1+y)
with y(0)=0; a fascinating problem is to see what happens if instead you try a
different initial function, say phi_0(t) = sin(t) -- is the convergence
faster? Slower? Are there methods to find better initial guesses, and if so
are they worth the time and effort? We've just started Second Order Linear
Differential Equations, in particular, constant coefficient homogenous.
Standard examples in physics and engineering include
kinematics equations with friction and masses on springs, as well as certain
problems in circuit theory.
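Here is a sketch (my own code) of the Picard iteration from class for dy/dt = 2t(1+y), y(0) = 0, whose exact solution is exp(t^2) - 1; each iterate picks up the next term of that series.

```python
from sympy import symbols, integrate, expand, Integer

t, s = symbols('t s')

phi = Integer(0)    # phi_0(t) = 0, the initial guess used in class
for n in range(4):
    # Picard iteration: phi_{n+1}(t) = y(0) + Int_{s = 0 to t} 2s (1 + phi_n(s)) ds
    phi = integrate(2 * s * (1 + phi.subs(t, s)), (s, 0, t))
    print(expand(phi))
# Output: t**2, t**2 + t**4/2, t**2 + t**4/2 + t**6/6, ...
# These are the partial sums of exp(t**2) - 1, the exact solution.
```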
- Friday, February 27. The proof of the
existence / uniqueness result uses numerous results from analysis. For those
who want more details, see the
supplemental notes I've written on the proof (which include proofs of the
Intermediate and Mean Value Theorems, as well as Mathematical Induction). The
main result we needed which we didn't prove is Lebesgue's
Dominated
Convergence Theorem. We also discussed some of the HW problems, including
the interesting one on
reaction rates in chemistry.
- Wednesday, February 25. In class today
we proved the existence and uniqueness theorem for solutions to first order
differential equations of the form dy/dt = f(t,y) with f continuous. We used
Picard's iterative
method / method of successive approximations. This is an extremely
powerful technique, and the general idea is used to prove many
fixed point
theorems (that if f is a nice function, say a
contraction map,
then there is at least one solution to f(x) = x). There are NUMEROUS
applications of these fixed point theorems / contraction maps to a variety of
problems, especially in
game theory and economics. If anyone is interested, I have a very readable
textbook on this. If you want an amusing read, I highly recommend looking
up the famous
coffee cup theorem (involving the Brouwer fixed point theorem).
- Monday, February 23. We talked about
different numerical methods.
Newton's method is
significantly more powerful than
divide and conquer
(also called the bisecting algorithm); this is not surprising as it assumes
more information about the function of interest (namely, differentiability).
The numerical stability of Newton's method leads to many fascinating problems.
One terrific example is looking at roots in the complex plane of a polynomial.
We assign each root a different color (other than purple), and then given any
point in the complex plane, we apply Newton's method to that point repeatedly
until one of two things happens: the iterates converge to a root or they diverge. If the iterates of our point converge to a root, we color our point the same color
as that root, else we color it purple. This leads to
Newton fractals,
where two points extremely close to each other can be colored differently,
with remarkable behavior as you zoom in. If you're interested in more
information, let me know; a good chaos program is
xaos (I have other
links to such programs for those interested). One final aside: it is often
important to evaluate these polynomials rapidly; naive substitution is often
too slow, and
Horner's algorithm is frequently used. We also talked about the dangers of
interchanging operations (such as interchanging the order of summation or a
limit and an integral). For limit-integral problems, frequently one appeals to
Lebesgue's
Dominated
Convergence Theorem to justify these interchanges.
Measure theory is a
generalization of integration, and allows us to handle more general sets. One
example is the characteristic function of the rationals on [0,1], ie, the
function that is 1 if x is a rational in [0,1] and 0 otherwise. This function
is not Riemann
integrable, as the upper sums are always 1 and the lower sums are always
0. It can be shown in a 'natural' generalization of integration that this
function integrates to 0 (which agrees with our intuition that there are a lot
more irrationals than rationals). The example from class where lim_n Int f_n
doesn't equal Int lim_n f_n involves continuous functions; it is possible to
find an example involving infinitely differentiable functions! One of the HW
problems is to find an example where all the functions are bounded by M (ie, |f_n(x)|,
|f(x)| < M for all n and x).
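A small sketch (my own code) comparing the two methods on f(x) = x^2 - 2: Newton's method reaches essentially machine precision in a handful of steps, while bisection gains only about one binary digit per step.

```python
f = lambda x: x * x - 2.0
fprime = lambda x: 2.0 * x
target = 2.0 ** 0.5

# Bisection (divide and conquer) on [1, 2].
lo, hi = 1.0, 2.0
for _ in range(20):
    mid = (lo + hi) / 2
    if f(lo) * f(mid) <= 0:
        hi = mid
    else:
        lo = mid
print(abs(mid - target))     # error around 1e-6 after 20 steps

# Newton's method: x_{n+1} = x_n - f(x_n) / f'(x_n).
x = 1.5
for _ in range(5):
    x = x - f(x) / fprime(x)
print(abs(x - target))       # essentially machine precision after 5 steps
```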
- Wednesday, February 18. Today we
discussed exact equations, which is another useful technique for solving
certain classes of differential equations. I'll do some research and find
examples of real world processes which are so modeled. We also discussed
integration techniques (see the links for
partial fractions and
trig
substitutions for additional details). We also described the solution of
the autonomous growth problem. The solution here exhibits many interesting
features. In addition to being able to write it down explicitly, we can see
the dependence of the solution's behavior on the parameters. We thus see that
it is very important to determine r or y_0 exactly if they are near 0, less so
elsewhere. Parameter dependence in models is a very active research field; see
this article on
parameter sensitivity for some information. Finally, we discussed implicit
versus explicit functions. While exact differential equations are another
example where we can write down a solution, in general we only get an implicit
function for y (ie, psi(x,y) = C).
- Monday, February 16. We discussed the
solution to separable differential equations. For some real world examples,
see
http://www-rohan.sdsu.edu/~jmahaffy/courses/f00/math122/lectures/sep_diffequations/sepdeeg.htm
(population growth, water leaking from a cylinder). Another good example is
the Solow growth
model in economics; see
Warren
Wessecker's notes or Chris
Edmond's notes. One of the problems with
using integrating factors or the separable equation method is that initially
an equation may not look to be in that form, but with some work can be
converted. A common technique is that if dy/dx = F(y/x) (ie, y' depends only on the ratio y/x), then we can convert this to a separable equation for v by setting v = y/x. As this implies y = xv, the product rule gives dy/dx = v + x dv/dx, and the resulting equation in v and x is separable (see the short computation at the end of this entry). The other key point in today's lecture was the introduction to
bifurcations.
One of my favorite examples, as discussed, involves the path
Elvis the dog takes to fetch
a ball. The bifurcation article is
http://www.williams.edu/go/math/sjmiller/public_html/103/MintonPenningsElvisArticleFinal.pdf
while the first one (does he know calculus) is
http://www.williams.edu/go/math/sjmiller/public_html/103/Pennings_DogsCalculus.pdf.
We'll discuss bifurcations in greater detail when we study Section 2.5. (An
additional aside is that the bifurcation in the dog paper leads to a nice
proof of the arithmetic / geometric mean inequality, one of the more important
ones in math. For other proofs, see
http://www.williams.edu/go/math/sjmiller/public_html/OSUClasses/487/ArithMeanGeoMean.pdf.)
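Here is the substitution from above written out (a standard computation): with v = y/x, so y = xv, the product rule gives

```latex
\frac{dy}{dx} = v + x\frac{dv}{dx} = F(v)
\quad\Longrightarrow\quad
\frac{dv}{F(v) - v} = \frac{dx}{x},
```

which is separable, so both sides can be integrated directly.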
- Friday, February 13. Today's class was
a terrific example of a very common technique in mathematics: reduce the
problem you care about to a simpler one until you find one you can solve. (The
standard example of this is completing the square to find the quadratic
formula.) We showed how to solve any differential equation of the form y'(t) +
p(t)y(t) = g(t). Unfortunately the solution could involve two integrals which
may be impossible to evaluate in closed form. We'll discuss these issues in
greater detail on Monday, but there are very good techniques to approximate integrals of continuous functions arbitrarily well (two such techniques are
the trapezoidal rule,
http://en.wikipedia.org/wiki/Trapezoidal_rule, and Simpson's rule,
http://en.wikipedia.org/wiki/Simpson%27s_rule). We also reviewed integrals
of standard functions; for Monday or Wednesday's lectures we'll need partial
fractions:
http://www.calc101.com/partial_fractions.html.
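For reference, the calculation from Friday's class: multiplying y'(t) + p(t)y(t) = g(t) by the integrating factor mu(t) = exp(Int p(t) dt) gives

```latex
\bigl(\mu(t)\,y(t)\bigr)' = \mu(t)\,g(t)
\quad\Longrightarrow\quad
y(t) = \frac{1}{\mu(t)}\left(\int \mu(t)\,g(t)\,dt + C\right),
```

which is exactly where the two integrals (one for mu, one for the integral of mu times g) come from.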
- Wednesday, February 11. Today's class
basically sets the notation for the rest of the semester. We will look
primarily at finding an unknown function of one variable where the highest
derivative can be written as a linear combination of the lower derivatives.
Explicitly, y^(n) = a(t) + a_0(t) y + a_1(t) y' + a_2(t) y'' + ... +
a_{n-1}(t) y^(n-1). Solving such an equation will involve n integrations (as n
is the order of the equation, namely the highest derivative that arises). As
each integration introduces an arbitrary constant, we see we will need n
initial conditions. To obtain closed form expressions for y(t) will require
finding anti-derivatives (or integrals). While a general function does not
have a nice anti-derivative (the classic example is exp(-t^2)), there are many
that do. Thus it is important to remember the results concerning standard
functions from Math 103 and 104. There are many tables online; see
http://en.wikipedia.org/wiki/Table_of_integrals#Table_of_Integrals or
http://www.math.com/tables/integrals/tableof.htm; you can also use a
program like Mathematica to solve some integrals (this can be done online at
http://integrals.wolfram.com/index.jsp).
- Monday, February 9. In studying
difference equations we saw how linear algebra can be useful; in particular,
the need to evaluate large powers of a matrix quickly. This is known as fast
exponentiation, and the ability to do this (both for matrices as well as
regular numbers) is extremely important. For example, one's first instinct is
to say we need 100 (or 99) multiplications to evaluate x^100, but it is
possible to do this in just 8: x*x, x^2 * x^2, x^4 * x^4, x^8 * x^8, x^16 *
x^16, x^32 * x^32, x^64 * x^32 * x^4. The key observation is using the base 2
expansion of 100; this idea is one of the reasons
RSA encryption is feasible. For
more details, see
Chapter 1 of my book,
http://press.princeton.edu/chapters/s8220.pdf
(especially Sections 1.1 and 1.2.1). Quite often in mathematics we have
algorithms to solve problems that are not feasible in practice, and finding
efficient ways of computing quantities is a big (and important) industry.
Another great example of where we know the solution exists but have trouble
finding it is given by Euclid's proof of the infinitude of primes. Euclid
argued that there must be infinitely many primes as follows: Assume not, and
thus let p_1, ..., p_n be all the primes. Consider the product p_1 * ... * p_n
+ 1; either that number is prime, or it is divisible by a prime p. This prime
p cannot be any of p_1, ..., p_n, as each p_i leaves remainder 1. Thus there
are infinitely many primes, and we denote this new prime p by p_{n+1}. Lather,
rinse, repeat. Keep doing this and we'll get an infinite list of primes. OK,
great. This shows there are infinitely many. What can we say about the
sequence of primes constructed? Does this list contain all the primes? Do we
have any idea which primes are in the list and when? Is it easy to compute the
terms? Euclid's method leads to the following sequence of primes:
2, 3, 7, 43, 13, 53, 5, 6221671, 38709183810571, 139, 2801, 11, 17, 5471,
52662739, 23003, 30693651606209, 37, 1741, 1313797957, 887, 71, 7127, 109, 23,
97, 159227, 643679794963466223081509857, 103, 1079990819, 9539, 3143065813,
29, 3847, 89, 19, 577, 223, 139703, 457, 9649, 61, 4357....
(Remember how we generated the sequence. We started with p_1 = 2, the first prime. We apply
Euclid's argument and consider 2+1; this is the prime 3 so we set p_2 = 3. We
apply Euclid's argument and now have 2*3+1 = 7, which is prime, and set p_3 =
7. We apply Euclid's argument again and have 2*3*7+1 = 43, which is prime and
set p_4 = 43. Now things get interesting: we apply Euclid's argument and
obtain 2*3*7*43 + 1 = 1807 = 13*139, and set p_5 = 13.) This is a great sequence to think about, but it is a computational nightmare
to enumerate! I downloaded these terms from the Online Encyclopedia of Integer
Sequences (homepage
is
http://www.research.att.com/~njas/sequences/ and the page for our
sequence is
http://www.research.att.com/~njas/sequences/A000945 ). You can enter the
first few terms of an integer sequence, and it will list whatever sequences it
knows that start this way, provide history, generating functions, connections
to parts of mathematics, .... This is a GREAT website to know if you want to
continue in mathematics. There have been several times I've computed the first
few terms of a problem, looked up what the future terms could be (and thus had
a formula to start the induction).
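Here is a sketch (my own code) of fast exponentiation by repeated squaring, driven by the base 2 expansion of the exponent; for exponent 100 it uses 8 multiplications, matching the count above.

```python
def fast_pow(x, n):
    """Compute x**n by repeated squaring, counting the multiplications used."""
    result, base, mults = None, x, 0
    while n > 0:
        if n & 1:                    # this binary digit of the exponent is 1
            if result is None:
                result = base        # the first factor costs no multiplication
            else:
                result *= base
                mults += 1
        n >>= 1
        if n:
            base *= base             # square to reach the next power of 2
            mults += 1
    return result, mults

value, mults = fast_pow(3, 100)
print(mults, value == 3**100)        # 8 multiplications, and the answer checks out
```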