Additional Comments

Additional comments related to material from the class. If anyone wants to convert this to a blog, let me know. These additional remarks are for your enjoyment, and will not be on homeworks or exams. These are just meant to suggest additional topics worth considering, and I am happy to discuss any of these further.

Friday, May 9: For details of today's lectures, see my Lecture notes on Green's Theorem.
Today we discussed some of the Big Three theorems of Vector Calculus (Green's Theorem, Gauss' Divergence Theorem, and Stokes' Theorem). These theorems are massive generalizations of the Fundamental Theorem of Calculus, which can be generalized even more. The idea is to relate the integral of the derivative of something over a region to the integral of the something over the boundary of the region. To state these theorems requires many concepts from vector calculus (parametrizing curves, vectors, ...) as well as the Change of Variable theorem (converting integrals over curves and surfaces to integrals over simpler curves and surfaces).
- To truly see and appreciate the richness of the three theorems (which are really three variants of the same theorem), one must be in at least three dimensions. There the Stokes' Theorem states that the integral of a certain function over a surface equals the integral of another over the boundary curve. This means that many integrals turn out to be the same.
- To see the equivalence of these formulations requires differential forms. Frequently it is not immediately clear how to generalize a concept to higher dimensions or other settings.
- While we only briefly touched on the subject, conservative forces are extremely important in physics and engineering, primarily because of a wonderful property they have: the work done in moving an object from A to B is independent of the path taken if the exerted force is conservative. Many of the most important forces in classical mechanics are taken to be conservative, such as gravity and electricity. In modern physics, these forces are replaced with more complicated objects. One of the central quests in modern physics is to unify the various fundamental forces (gravity, strong, weak and electricity and magnetism).
- Click here for more on divergence, and click here for more on curl. Another related object (one we have seen many times) is the gradient. All of these involve the same differential operator, called del (and represented with a nabla). We used our intuition for vectors to define new combiantions involving the del operator (the curl and the divergence). While our intuition comes from vectors, we must be careful as we do not have commutivity. For example, nabla dot F is not the same as F dot nabla; the first is a scalar (number) while the second is an operator. Click here for more on differential operators. For those who want to truly go wild on operators, modern quantum mechanics replaces concepts like position and momentum with differential operators (click here for the momentum operator)! This allows us to rewrite the Heisenberg uncertainty principle in the following strange format.
- One of the most famous applications of these concepts is the Navier-Stokes equation, which is one of the Millenium Problems (solving one of these is probably the hardest path to one million dollars!). The Navier-Stokes equation describes the motion of fluids, which not surprisingly has numerous practical (as well as theoretical) applications. Click here for a nice derivation, which includes many of the new operators we saw today.
- Another place where gradients, curls and divergences appear is the Maxwell equations for electricity and magnetism; you can view the equations here.
- The General Stokes Theorem is a massive generalization of the fundamental theorem of calculus. The idea of formally moving the derivative from the function to the region of integration is meant to be suggestive, but of course is in no ways a proof. Notation should help us see connections and results. The great physicist Richard Feynman showed that all of physics is equivalent to solving the equation U = 0, where U measure the unworldliness of everything. It is made up of squaring the differences between the left and right hand sides of every physical law. Thus it has terms like (F - ma)^2 and (E - mc^2)^2. It is a concise way of encoding information, but it is not useful; everything is hidden. This is very different than the vector calculus formulations of electricity and magnetism, which do aid our understanding. For more information, skim the article here(search for unworldliness if you wish).
- We saw that we can compute the lengths of curves by evaluating integrals of ||c'(t)||, where c(t) = (x(t), y(t), z(t)) is our curve. While this formulation immediately reduces the problem of finding lengths to a Calc II problem, in general these are very difficult integrals, and frequently cannot be done in closed form even for simple shapes. For example, for extra credit find the length of the ellipse (x/a)^2 + (y/b)^2 = 1. Click here for the solution (the answer involves the elliptic integral of the second kind).
- We talked today about generalizing the Fundamental Theorem of Calculus. There are not that many fundamental theorems in mathematics -- we do not use the term lightly! Other ones you may have seen are the Fundamental Theorem of Arithmetic and the Fundamental Theorem of Algebra; click here for a list of more fundamental theorems (including the Fundamental Theorem of Poker!).
- Today was a fast introduction to path integrals, line integrals, and Green's Theorem (which is a special case of the Generalized Stokes' Theorem). While our tour of these subjects has to be rushed in a 12 week course, if you are continuing in certain parts of math, physics or engineering you will meet these again and again (for example, see Maxwell equations for electricity and magnetism). In fact, one can view all of classical mechanics as path integrals where the trajectory of the particle (its c(t)) minimizes the action; there is also a path integral approach to quantum mechanics.
  - For those continuing in mathematics or physics, you will see these ideas again if you take complex analysis. In particular, one of the gems of that subject is Cauchy's Integral Theorem, A complex differentiable function satisfies what is called the Cauchy-Riemann equations, and these are essentially the combination of partial derivatives one sees in Green's theorem. In other words, the mathematics used for Green's theorem is crucial in understanding functions of a complex variable.
  - For me, I consider it one of the most beautiful gems in mathematics that we can in some sense move the derivative of the function we're integrating to act on the region of integration! This allows us to exchange a double integral for a single integral for Green's theorem (or a triple integral for a double integral in the divergence theorem). As we've seen constantly throughout the year, often one computation is easier than another, and thus many difficult area or volume integrals are reduced to simpler, lower dimensional integrals.
  - The fact that Int_{t = a to b} grad(f)(c(t)) . c'(t) dt = f(c(b)) - f(c(a)) means that this integral does not depend on the path. If a vector field F = (F1, F2, F3) equals grad(f) for some f, we say F is a conservative force field and f is the potential. The fact that these integrals do not depend on the path has, as you would expect, profound applications.
  - This is a good point to stop and think about the number of spatial dimensions in the universe. Imagine a universe with two point masses under gravity, and assume gravity is proportional to 1/r^{n-1} with r the distance between the masses and n the number of spatial dimensions. If there are three or more dimensions, then the work done in moving a particle from infinity to a fixed, non-zero distance from the other mass is finite, while if there are two dimensions the work is infinite! One should of course ask why the correct generalization to other dimensions is 1/r^{n-1} and not 1/r^2 always. There is a nice geometric justification in terms of flux and surface area; the surface area of a sphere grows like r^2 and thus the only way to have the total flux of force out of it be constant is to assume the force drops like 1/r^2; click here for a bit on the justification of inverse-square laws.
  - Speaking of dimensions, one of my favorite problems from undergraduate days was the Random Walk. In 1-dimension, imagine a person so completely drunk that he/she has a 50% chance at any moment of stepping to the left or the right; what is the probability the drunkard eventually returns home? It turns out that this happens with probability 1. In 2-dimension, we have a 25% chance of moving north, south, east or west, and again the probability of returning is 1. In 3 dimensions, however, the drunkard only returns home with probability about 34%. As my professor Peter Jones said, a three-dimensional universe is the smallest one that could be created that will be interesting for drunks, as they really get to explore! These random walk models are very important, and have been applied to economics (the random walk hypothesis), as well as playing a role in statistical mechanics in physics.
Wednesday, May 9:
Monday, May 9:
Friday, May 2:
Wednesday, May 2
- We saw how well Taylor series approximate functions. The Mathematica program here is (hopefully) easy to use. You can specify the point and number of terms of the Taylor series of cos(x) to do. At first it might seem surprising that there is no improvement in fit when we go from a second order to a third order Taylor series approximation; however, we have cos(x) = 1 - x^2/2! + x^4/4! - x^6/6! + .... In other words, all the odd derivatives vanish at the origin, and thus there is no improvement at the origin in adding a cubic term (ie, the best cubic coefficient at the origin is 0). If we go to a fourth order, we do see improvement. By n=10 or 12 we are already getting essentially an entire period correct; by n=40 we have several cycles.
- Today we discussed Taylor's theorem. This is one of the most important applications of calculus. It allows us to replace complicated functions with simpler ones. There are numerous questions to ask.
  - Are Taylor series unique? Yes. The definition just involves taking sums of derivatives; the process is well-defined.
  - Does every infinitely differentiable function equal its Taylor series expansion? Sadly, no; the function f(x) = exp(1/x^2) if |x| > 0 and 0 if x=0 is the standard example. This function causes enormous problems in probability. There are many functions which do equal their own Taylor series expansion, such as exp(x), cos(x) and sin(x). It's not surprising that these three are listed together, as we have the wonderful Euler-Cotes formula: exp(i x) = cos(x) + i sin(x), with i = sqrt(-1). At first this formula doesn't seem that important; after all, we mostly care about real quantities, so why complexify our life by adding complex (i.e., imaginary) numbers? Amazingly, even for real applications (applications where everything is real), complex numbers play a pivotal role. For example, note that a little algebra gives cos(x) = (exp(i x) + exp(-i x)) / 2 and sin(x) = (exp(i x) - exp(-i x)) / 2i. Thus properties of the exponential function transfer to our trig functions. The hyperbolic cosine and sine functions are similarly defined; cosh(x) = cos(i x) = (exp(-i x) + exp(x)) / 2. The Foxtrot strip below (many thanks to the author, Bill Amend, for permission to post) illustrates the confusions that can happen between hyperbolic and regular trig functions (for extra credit, why does Eugene know that the calculator cannot be giving the right answer?). It's worth noting that the formula exp(i x) = cos(x) + i sin(x) allows us to derive ALL trig identities painlessly! See the comments from February 25.
  - FoxTrot (c) Bill Amend. Used by permission of Universal Uclick. All rights reserved.
  - How easy are Taylor series to use? If we keep just a few terms, it's not too bad; however, as the great Foxtrot strip below shows, it's not always clear how nicely something simplifies.
  - FoxTrot (c) Bill Amend. Used by permission of Universal Uclick. All rights reserved.
  - In the strip above, notice the large factorials in the denominator. Note 52! is about 10⁶⁸; in other words, these terms are small! For interest, 52! is the number of ways (with order mattering) of arranging a standard deck of cards. There are about 10⁸⁵ or so subatomic thingamabobs in the universe; we see quite quickly reach numbers this high (a deck with 70 cards more than sufices; in other words, we could not have each subatomic object in the universe represent a different shuffling of a deck of 70 cards). In a related note, it's important to think a bit and decide what 0! should be. It simplifies many formulas to have 0! = 1, and we can make this somewhat natural by saying there is only 1 way to do nothing (mathematically, of course). The definition of the factorial function on Wikipedia talks a little bit about this.
  - Unlike 0!, 0^0 is a bit more controversial as to what the definition should be. As I don't want to pressure anyone, I will not publically disclose where I stand in the great debate, though I'm happy to tell you privately / through email.
  - It's worth remarking on why we have n! in the denominators. This is to ensure that the nth derivative of our function equals the nth derivative of the Taylor expansion at the point we're expanding. In other words, we're matching up to the first 2 derivatives for the 2nd order Taylor expansion, up to the first 3 for the 3rd order Taylor expansion, and so on. It isn't surprising that we should be able to do a good job; the more derivatives we use, the more information we have on how the function is changing near the key point.
  - For many purposes, we just need a first order or second order Taylor series; one of my favorites is the proof of the Central Limit Theorem in probability. One of my favorite proofs involves second order Taylor expansions of the Fourier Transforms (these were mentioned in the additional comments on Friday, March 12).
  - If f(x) equals its infinite Taylor series expansion, can we differentiate term by term? This needs to be proved, and is generally done in a real analysis course. For some functions such as exp(x) we can justify the term by term differentiation, but note that this is something which must be justified.
  - A terrific application of just doing a first order Taylor expansion is Newton's Method.
  - For some reason, most books don't mention the trick on how to quickly compute higher order Taylor expansions in several variables. The idea is to 'bundle' variables together and use one-dimensional expansions. For example, consider f(x,y) = exp(-(x^2 + y^2)) cos(xy). We saw in class how painful it is to compute the Hessian, the matrix of second partial derivatives. That involved either two product rules or knowing the triple product formula. If we use our trick, it's much easier. Note exp(u) = (1 + u + u^2/2! + ...) and cos(v) = (1 - v^2/2! + ...). A second order Taylor expansion means keep only terms with no x's and y's, with just x or y, or with just x^2, xy or y^2 (a third order would allow terms such as x^3, x^2 y, x y^2, y^3, and so on). Thus we expand exp(u) cos(v) and then set u = -(x^2+y^2) and v = xy. For exp(u), we just need 1 + u, as already the u^2/2 term will be order 4 when we substitute -(x^2+y^2). For cos(v), we only keep the 1 as v^2/2 is order 4. Thus the Taylor expansion of order 2 is just (1 -(x^2+y^2)) (1) = 1 - x^2 - y^2; this is a lot faster than the standard method! That method works in general, but there are so many cases where this is faster that it's worth knowing.
Monday, May 2
Friday, April 29.
- A nice application of a series expansion is Stirling's formula for n!. We can get close to the correct value by the integral test or the Euler-MacLaurin summation formula, This builds on a very important question: we know the integral test tells whether or not a series converges; if it does converge, how close is the sum to the integral? The Euler-MacLaurin formula teaches us how to convert sums to integrals and bound the error.
  - The fact that Sum_{n = 1 to oo} 1/n^2 = pi^2/6 has a lot of applications. It can be used to prove that there are infinitely many primes via the Riemann zeta function. The Riemann zeta function is zeta(s) = Sum_{n = 1 to oo} 1/n^s. By unique factorization (also known as the Fundamental Theorem of Arithmetic), it also equals Prod_{p prime} 1 / (1 - 1/p^s); notice that a generalization of the harmonic sum and the geometric series formula are coming into play. It turns out that zeta(2) = pi^2/6, as can be seen in many different ways. As pi^2 is irrational, if there were only finitely many primes then the product would be irrational, contradiction! See wikipedia for a proof of this sum.
    - 6/pi^2 is the probability that two randomly chosen integers are relatively prime.
    - It's also the probability a randomly chosen integer is square-free!
    - We can also use the fact that Sum_{n=1 to oo} 1/n diverges to prove there are infinitely many primes: if there were only finitely many primes then as s --> 1 from above the product would be finite, and that contradicts the sum diverging!
    - A large amount of modern mathematics involves taking a series that only converges for certain values and `extending' it. For example, 1 + 2 + 4 + 8 + 16 + ... really does, in some sense, equal -1. This is the principle of meromorphic continuation, one of the central themes of complex analysis. We can similarly continue the zeta function, and we find zeta(-1) = 1 + 2 + 3 + 4 + ... = -1/12.
  - Another interesting application of summing series involving primes is to the Pentium bug (see the links there for more information, as well as Nicely's webpage). The calculation being performed was sum_{p: p prime and either p+2 or p-2 is prime} 1/p; this is known as Brun's constant. If this sum were infinite then there would be infinitely many twin primes, proving one of the most famous conjectures in mathematics; sadly the sum is finite and thus there may or may not be infinitely many twin primes (twin primes are two primes differing by 2).
- L'Hopital's rule is frequently used to analyze the behavior of sequences. Remember that you can only use it if you have 0 over 0 or infinity over infinity.
  - Extra credit: What is wrong with the following argument: Let's say we want to compute lim_{h --> 0} sin(h) / h; this is the most important trig limit. We use L'Hopital's rule and note that it is the same as lim_{h --> 0} cos(h) / 1; as cos(h) tends to 1, the limit is just 1. Why is this argument not valid? The answer is one of the most important principles in mathemtatics!
- We talked about a very important issue in modern mathematics -- often we do not need to solve a problem exactly, but only get an answer close to the truth. This is particularly true if the parameters of our equations must be estimated. Many problems can be solved by Linear Programming, which would make sense after a linear algebra class. I've done some work on this for the movie industry; see also some notes I've written on Linear Programming. Another nice application is `correctly' computing elimination numbers (which many websites do not do correctly). Here is a paper that implements linear programming to very efficiently solve this problem.
Wednesday, April 27. Today we gave our first test to see if a series converges or diverges, and discussed some of the mathematics needed to make implementing the test practical.

Monday, April 25. We encountered two of the most important sequences today: the geometric sequence and the harmonic sequence. We proved the geometric series formula today and discussed how to estimate the size of the harmonic series.
- The proof we gave today of the geometric series formula (by shooting baskets) uses many great techniques in mathematics. It is thus well worth it to study and ponder the proof.
  - Memoryless process: once both people miss, it is as if we've just started the game fresh.
  - Calculating something two different ways: a good part of combinatorics is to note that there are two ways to compute something, one of which is easy and one of which is not. We then use our knowledge of the easy calculation to deduce the hard. For example, Sum_{k = 0 to n} (n choose k)^2 = (2n choose n); the right side is easy to compute, the left side not so clear. Why are the two equal? It involves finding a story, which we leave to the reader.
  - For another example of applications of harmonic numbers, see the coin collector problem (if you want more info on this problem, let me know -- I have lots of notes on it from teaching it in probability).
  - In Section 2 last year, a basketball shot basically went in and out; see the following article on golf for some info on related problems in golf.
- Standard examples of sequences and series include
  - The harmonic series (see also In Perfect Harmony by John Webb for examples of where it arises). The standard example given is that you can use this to place dominoes on top of each other hanging out infinitely far without having them fall or be supported other than their weight! See http://www.ken.duisenberg.com/potw/archive/arch03/030728sol.html as well as http://www.cs.cmu.edu/afs/cs/academic/class/16741-s07/www/projects06/chechetka_16-741_project_report.pdf, or the movie of the week: Stacking blocks. For recent results on what can be done if you allow non-simple patterns, see this paper.
  - The geometric series
  - The Fibonacci series (an example of a difference or recurrence equation)
  - An infinite series of surprises: a nice article going from the geometric series to the harmonic series to other important examples.
  - We mentioned that sequences and series are very important; two of the most powerful applications are Taylor series (approximating complicated functions with simpler ones) and Riemann sums (allowing us to calculate areas with integrals).
  - L'Hopital's rule is frequently used to analyze the behavior of sequences. Remember that you can only use it if you have 0 over 0 or infinity over infinity.
- Another great application of sequences and series is to calculating probabilities. If two random variables are independent, the probability that both happen is the product of the probabilities that each happens. By using the logarithm function, we can convert products to sums, and we'll see later that there are good ways to estimate the value of these sums.
- We saw in class today that the harmonic series has a divergent sum, or Sum_{n = 1 to oo} 1/n is infinity. We'll see later that Sum_{n = 1 to oo} (-1)^{n+1} / n converges to ln(2). Related to this, one can consider the series Sum_{n = 1 to oo} x_n / n, where each x_n is 1 with probability 1/2 and -1 with probability 1/2 (think of this as infinitely many independent coin flips). Interestingly, a lot can be said about these random sums; see a great article here.
- Extra credit: What is wrong with the following argument: Let's say we want to compute lim_{h --> 0} sin(h) / h; this is the most important trig limit. We use L'Hopital's rule and note that it is the same as lim_{h --> 0} cos(h) / 1; as cos(h) tends to 1, the limit is just 1. Why is this argument not valid? The answer is one of the most important principles in mathemtatics.
Friday, April 22. Today's lecture again served two purposes. The first was to introduce (or to refamiliarize) you with sequences and series, and the second was to talk about proofs by induction (one of the most powerful proof techniques we have.
- Mathematical Induction is a wonderful way to prove results. One common image for induction is that of following dominoes. We have a statement P(n), and if we can show P(1) is true and whenever P(n) is true then P(n+1) follows, we can then conclude P(n) holds for all positive integers. We gave some standard examples, such as sums of odd integers and sums of integers, and a more exotic one (how a simple mistake leads to everyone has the same name). It is very easy, sadly, to subtly assume special properties when you try to prove something. No one noticed that the argument given for the same name subtly assumed n was at least 2. You need to constantly be vigilant about making additional, unwarranted assumptions. A lot of the financial crises was due to people using a math formula where they shouldn't.
  - Here is a great article: The formula that killed Wall Street (Wired Magazine).
  - Here is another: Did a math formula cause financial crisis.
- If you want to read more about mathematical induction / see more examples, click here for some of my notes.
- We talked about sequences and series. We've seen many examples in previous classes, one of the most important being the upper and lower sums leading to a proof of the Fundamental Theorem of calculus. Remember a sequence {a_n}_{n = 1 to oo} converges to L if lim_{n --> oo} |a_n - L| = 0. A nice exercise is to show a sequence can have at most one limit. Often we can `guess' the limit and check, or by brute force show something is not a limit.
  - For example (done at 10am but not at 11am): let a_n = n^2. We show no L is a limit. For definiteness, let's show L = 2011 cannot work. We have to study lim_{n --> oo} |a_n - 2011|. Note if n is large, |n^2 - 2011| > |2011n-2011| = 2011(n-1); this is because eventually n^2 > 2011. But as n --> oo clearly 2011(n-1) goes to infinity, and thus L=2011 is not a limit.
  - We talked a bit about the alternating sequence a_n = (-1)^n/n. We'll see later that if we were to sum the terms of the sequence we would get ln(1/2).
- We then discussed the Birthday Problem. In addition to being a fun example, it also has applications in cryptography, leading to the birthday attack. This is used to help people have secure electronic signing of messages, and thus is essential for modern commerce! While playing our clicker game we saw that we could eliminate some answers -- getting an intuition for problems like this is very important.
- There are lots of great questions you can ask to generalize the birthday problem. One of the best things you can do to train to be a scientist or researcher is to practice asking questions to generalize something. We did one in class: how many people would we need if we lived on the dwarf planet, Pluto. In general it's hard to find an answer in closed form depending on the parameters of the problem; it turns out that the number of people needed if there are D days in a year to have a 50% chance of two birthdays the same is about sqrt(D * ln(4)), a very nice function of D. Here are some other questions:
  - How many people do you need before you have a 50% chance that three people share a birthday?
  - How many people do you need before you have a 50% chance that there are at least two pairs of people with the same birthday?
  - We know that we need about 23 people for a 50% of a birthday collision; we know if we reach 365 people without a birthday collision then the next person must force two to share a birthday. For each person we can see what percent of the time they are the first person to cause a birthday collision. Which person is most likely to cause the collision? While the 366th person always causes a collision, it is very unlikely to reach that.
  - Try and write your own questions -- email one to me for extra credit and say why you find it interesting.

Monday, April 18.
Today's lecture serves two purposes. While it does review many of the concepts from integration, more importantly it introduces many of the key ideas and challenges of mathematical modeling. Most students of 105 won't be taking partial derivatives or integrals later in life (though you never know!); however, almost surely you'll have a need to model, to try and describe a complex phenomena in a tractable manner.
- Sabermetrics is the `science' of applying math/stats reasoning to baseball. The formula I mentioned at the start of the semester is known as the log-5 method; a better formula is the Pythagorean Won - Loss formula (someone linked my paper deriving this from a reasonable model to the wikipedia page), the topic of today's lecture. ESPN, MLB.com and all sites like this use the Pythagorean win expectation in their expanded series. My derivation is a nice exercise in multivariable calculus and probability
- In general, it is sadly the case that most functions do not have a simple closed form expression for their anti-derivative. Thus integration is magnitudes harder than differentiation. One of the most famous that cannot be integrated in closed form is exp(-x²), which is related to calculating areas under the normal (or bell or Gaussian) curve. We do at least have good series expansions to approximate it; see the entry on the erf (or error) function.
  - Earlier in the semester we mentioned that the anti-derivative of ln(x) is x ln(x) - x; it is a nice exercise to compute the anti-derivative for (ln(x))ⁿ for any integer n. For example, if n=4 we get 24 x-24 x Ln[x]+12 x Ln[x]²-4 x Ln[x]³+x Ln[x]⁴.
- Another good distribution to study for sabermetrics would be a Beta Distribution. We've seen an example already this semester when we looked at the Laffer curve from economics.I would like to try to modify the Weibull analysis from today's lecture to Beta distributions. The resulting integrals are harder -- if you're interested please let me know.
- Today we discussed modeling, in particular, the interplay between finding a model that captures the key features and one that is mathematically tractable. While we used a problem from baseball as an example, the general situation is frequently quite similar. Often one makes simplifying assumptions in a model that we know are wrong, but lead to doable math (for us, it was using continuous probability distributions in general, and in particular the three parameter Weibull). For more on these and related models, my baseball paper is available here; another interesting read might be my marketing paper for the movie industry (which is a nice mix of modeling and linear programming, which is the linear algebra generalization of Lagrange multipliers).
  - One of the most important applications of finding areas under curves is in probability, where we may interpret these areas as the probability that certain events happen. Key concepts are:
    - Probability distribution
    - Mean or Expected Value
    - Standard Deviation
    - Independence
    - Skewness and kurtosis (for the hypercompetitive students who really want to compare themselves to the class)
  - The more distributions you know, the better chance you have of finding one that models your system of interest. Weibulls are frequently used in survival analysis. The exponential distribution occurs in waiting times in lines as well as prime numbers.
  - In seeing whether or not data supports a theoretical contention, one needs a way to check and see how good of a fit we have. Chi-square tests are one of many methods.
  - Much of the theory of probability was derived from people interested in games of chance and gambling. Remember that when the house sets the odds, the goal is to try and get half the money bet on one team and half the money on the other. Not surprisingly, certain organizations are very interested in these computations. Click here for some of the details on the Bulger case (the bookie I mentioned in class is Chico Krantz, and is referenced briefly).
  - Any lecture on multivariable calculus and probabilities would be remiss if it did not mention how unlikely it is to be able to derive closed form expressions; this is why we will study Monte Carlo integration later. For example, the normal distribution is one of the most important in probability, but there is no nice anti-derivative. We must resort to series expansions; that expansion is so important it is given a name: the error function.
  - I strongly urge you to read the pages where we evaluate the integrals in closed form. The methods to get these closed form expressions occur frequently in applications. I particularly love seeing relations such as 1/c = 1/a + 1/b; you may have seen this in resistors in parallel or perhaps the reduced mass from the two body problem (masses under gravity). Extra credit to anyone who can give me another example of quantities with a relation such as this.
  - Click here for a clip of Plinko on the Price I$ Right, or here for a showcase showdown.
Friday, April 15.
The Change of Variable formula ties together many of the topics of the semester and generalizes a similar result from one-variable calculus. With complicated formulas such as this, it is quite useful to look at special cases first to get a sense of what is going on and then to try and generalize, being aware of course that sometimes there are features that are missed in the special cases. I like to look at this formula as giving the exchange rate from measuring in one coordinate system to another. For example, going from Cartesian to Polar coordinates we cannot have dx dy go to dr dtheta, as dx dy would have units of meters-squared while dr dtheta has units of meters-radians and radians are essentially unitless. (As a side note, the most important unitless number in physics is the fine structure constant.) We will see later that dx dy transforms to r dr dtheta.
- Our analysis shows that when we have a linear rescaling, say u = 2x and v = 3y, then if T(x,y) = (u,v) and T^{-1}(u,v) = (x,y), then dx dy transforms to |det(D T^{-1})|. Note how many concepts are being applied here. We have the derivative of a vector valued function and we have determinants. The reason for the absolute value is a bit tricky, but comes from the danger of having signed areas. Remember in Calc I that Int_{x = a to b} f(x) dx = - Int_{x = b to a} f(x) dx. We are looking at how the area elements transform. In order to make sure the areas are positive, we need to insert absolute values here.
- Another caveat is where to evaluate our function when we integrate it over the transformed region. Assume we have a map T from xy-space to uv-space. Let R = T(S). What should Int Int_S f(x,y) dx dy equal in uv-space? It becomes Int Int_T(S) f(T^{-1}(u,v)) |det DT^{-1}(u,v)| du dv. This is similar to the chain rule. If A(x) = f(g(x)) remember A'(x) = f'(g(x)) g'(x) and not f'(x) g'(x). This is one of the most common mistakes, namely evaluating f at the wrong point. Similarly here we need to make sure we evaluate f at the right place. In uv-space, our inputs are u and v, but f is expecting as inputs x and y. As T sends x and y to u and v, T^{-1} sends u and v to x and y, and thus the new function say g(u,v) = f(T^{-1}(u,v)) is what we should integrate over T(S).
- There are many applications of the Change of Variables formula, especially in probability theory; see here for a one-dimensional example (if you have access to JStor, here is one in economics).
- We saw how we can easily get the area of an ellipse or the volume of an ellipsoid using the change of variable; the perimeter of an ellipse is much harder!
- We discussed the game of Pac-man (click here for some facts about the ghosts movement), which after a little thought we see is really happening on a cylinder; if there was another pair of warp tunnels connecting the top to the bottom we would have a torus or a donut. It is amazing that we can represent these complicated regions by simple maps of a unit square, and give another example of the power of these coordinate transformations. This is the beginning of the field of topology.
- You can view a cylinder as what you get when you take a piece of paper and glue two opposite sides together. If you then glue the other two sides together you get a torus or a donut. if instead you start with a square and glue two sides together but twist the sides as you glue, you get the Mobius strip. This strange figure has only one side!
- I forgot to initially add this: we talked about changing units. For example, the ellipse (x/4)^2 + y^2 = 1 is four times longer than wide. It is clearly not circular; however, if we change units and measure along the x-axis in meters and the y-axis in the new length units of Ephs (where 1 Eph equals 1/4 of a meter, or 4 Ephs equals a meter), then in this biased measuring it is a circle!
  - There are lots of great units, many created by MIT students. Two of my favorites are the Smoot (interestingly, the person was the unit of measurement ended up as the president of the International Organization for Standardization) and the Bruno (this is the indentation, in cubic-centimeters I believe, made from a piano dropped 6 stories...).

Wednesday, April 13.
- We finished the Big Three coordinate changes: polar coordinates and cylidrical coordinates and spherical coordinates (be aware that physicists and mathematicians have different definitions of the angles in spherical!).
  - One can generalize spherical coordinates to hyperspheres in n-dimensional space. These lead to wonderful applications of special functions, such as the Gamma function, in writing down formulas for the `areas' and `volumes'. As a nice exercise, you can rewrite the integral in the comment above as Gamma(1/2)!
  - There are many fascinating question involving spheres (with applications to error correcting codes!):
    - The kissing number, which is related to spherical codes.
    - Sphere packing (the special case in 3-dimensional space is known as the Kepler Conjecture, and a proof was presented by Hales in 1998).
    - If you want to read more about these, I'm happy to share a chapter in a cryptography book I'm writing.
  - One of the most important applications of spherical coordinates is to planetary motion, specifically, proving that the force one sphere exerts on another is equivalent to all of the mass being located at the center of the sphere. This is the most important integral in Newton's great work, Principia (we have a first edition at the library here). I strongly urge everyone to look at this problem. Proving that one can take all of the mass to be at the center enormously simplifies the calculations of planetary motion. See the Wikipedia article on the Shell Theorem for the computation. As this is so important, here is another link to a proof. Oh, let's do another proof here as well as another proof here. For an example of a non-proof, read the following and the comments.
- We talked a bit today about how glass in windows could sink over time. I don't think this was the problem in the Hancock Tower in Boston, but it's still a fun read.
Monday, April 7.
Fubini's Theorem (changing the order of integrations) is one of the most important observations in multivariable calculus. For us, we assume our function f(x,y) is either continuous or bounded, and that it is defined on a simple region D contained in a finite rectangle. If D is an unbounded region, say D = {(x,y): x, y >= 0} then Fubini's theorem can fail for continuous, bounded functions. In class we did an example involving a double sum, where a_{0,0} = 1, a_{0,1} = -1, a_{0,n} = 0 for all n >= 2, then a_{1,0} = 0, a_{1,1} = 1, a_{1,2} = -1, and then a_{1,n} = 0 for all n >= 3, and so on. If we want to have a continuous function, we can tweak it as follows. Consider the indices {m,n}. Draw a circle of radius 1/2 with center {m,n} (note no two points will have circles that intersect or overlap). If a_{m,n} is positive, draw a cone with base a circle of radius 1/2 centered at {m,n} and height 12/π. As the area of a cone is (1/3) (area base) (height), this cone will have volume 1; if a_{m,n} was positive we draw a similar cone but instead of going up we go down, so now the volume is -1. What is going wrong? The problem is that Sum_m Sum_n |a_{m,n}| = ∞ (the sum of the absolute values diverges), and when infinities enter strange things can occur. Recall we are not allowed to talk about ∞ - ∞; the contribution from where our function or sequence is positive is +∞, the contribution where it is negative is -∞, and we are not allowed to subtract infinities.
- To motivate the Change of Variable Formula, which we'll see later, try to find the area of a circle by doing the integration directly. While there are many ways to justify learning the Change of Variable Formula (it's one of the key tools in probability), I want to take the path of looking at what should be a simple integral and seeing how hard it can be to evaluate in the given coordinate system. Much of modern physics is related to changing coordinate systems to where the problem is simpler to study (see the Lagrangian or Hamiltonian formulations of physics); these are equivalent to F = ma, but lead to much simpler algebra. The problem we considered was using one-variable calculus to find the area under a circle. This requires us to integrate sqrt(1 - x²) from x=0 to x=1. This is one of the most important shapes in mathematics -- if calculus is such a great and important subject, it should be able to handle this!
- To attack this problem, recall a powerful technique from Calc I: if f(g(x)) = x (so f and g are inverse functions, such as sqrt(x^2)), then g'(x) = 1 / f'(g(x)); in other words, knowing the derivative of f we know the derivative of its inverse function. This was used in Calc I to pass from knowing the derivative of exp(x) to the derivative of ln(x). We can various inverse trig functions; while many are close to sqrt(1-x^2), none of them are exactly that (a list of the derivatives of these are here). This highlights one of the most painful parts of integration theory -- just because we are close to finding an anti-derivative does not mean we can actually find it! While there is a nice anti-derivative of sqrt(1 - x^2), it is not a pure derivative of an inverse trig function. There are many tables of anti-derivatives (or integrals) (a fun example on that page is the Sophomore's Dream). Unfortunately it is not always apparent how to find these anti-derivatives, though of course if you are given one you can check by differentiating (though sometimes you have to do some non-trivial algebra to see that they match). In fact, there are some tables of integrals of important but hard functions where most practitioners have no idea how these results are computed (and occasionally there are errors!). We will see later how much simpler these problems become if we change variables; to me, this is one of the most important lessons you can take from the course: Many problems have a natural point of view where the algebra is simpler, and it is worth the time to try to find that point of view!

Friday, April 4.
The main result today was a method for integrating over regions other than a rectangle. We discussed a theoretical way to do it last class by replacing our initial function f on a rectangle including D with a new function f^*, with f^*(x,y) = f(x,y) if (x,y) is in our domain D and 0 otherwise. To make this rigorous we need to argue and show that we may cover any curve with a union of rectangles with arbitrarily small area. This leads to some natural, interesting questions.
- The first, and most important, involves what happens to a function when we force it to be 0 from some point onward (say outside D). The function may be discontinuous at the boundary, but then again it may not. There are many interesting and important examples from mathematical physics where we are attempting to solve some equation that governs how that system evolves. One of the most studied are the vibrations of a drum, where the drumhead is connected and stationary. We can thus view the vibrating drumhead as giving the values of our function on some region D, with value 0 along the boundary. This leads to the fascinating question of whether or not you can hear the shape of a drum. This means that if you hear all the different harmonics of the drum, does that uniquely determine a shape? Sadly, the answer is no -- different drums can have the same sounds. An excellent article on this is due to Kac, and can be read here.
- We discussed horizontally-simple and vertically-simple and simple regions (other books use the words y-simple, x-simple and simple regions). Note that a region is often called elementary if it is either horizontally or vertically simple. (Click here for some more examples on simple regions.) The point of our analysis here is to avoid having to go back to the definition of the integral (ie, the Riemann sum). While not every region is elementary, many are either elementary or the union of elementary regions. Below are two interesting tidbits about how strange things can be:
  - Space filing curves: click here for just how strange a curve can be!
  - Koch snowflake: This is an example of a fractal set; the boundary has dimension greater than 1 but less than 2! It's fractal dimension is log 4 / log 3.
  - Jordan curve theorem: It turns out to be surprisingly difficult to prove that every non-intersecting curve in the plane divides the plane into an inside and outsider region. It's not too bad for polygons, but for more general curves (such as the non-differentiable boundary of the Kock snowflake), it's harder.
- The video of the week was coin sorting (this leads to Lebesgue's Measure Theory). There are many reasons leading to this as the selection. One is that the Lebesgue theory is needed in a lot of higher mathematics, and if you continue you'll eventually meet it. The other, and more important for us, is that this demonstrates the power of a fresh perspective. This happens again and again in mathematics (and life). We have blinders on and don't even realize they're there. We get so used to doing things a certain way it becomes heretical to think of doing it another way. (This allows me to relink to Asimov's Nightfall story.) It's natural to divide the x-axis and add the areas as we go along; however, it is useful to consider dividing it along the y-axis as well.
  - If you know combinatorics, here's a nice example illustrating the above point. Evaluate the sum Sum_{k = 0 to n} (n choose k) (n choose n-k), where (x choose y) is x! / (y! (x-y)!). The answer is (2n choose n). There are a lot of ways to view this, here is my favorite. Imagine we have 2n people, n who prefer Star Trek: The Original Series and n who prefer Star Trek: The Next Generation. There are (2n choose n) ways to choose n people from the 2n people. We can view this another way: let's look at how many groups we can form of n people where exactly k prefer the original series. There are (n choose k) ways to choose k people from the n who prefer the original series, and then (n choose n-k) ways to choose n-k from the n who prefer the new series. The total number of ways with exactly k who prefer the original is the product: (n choose k) * (n choose n-k). We then sum over k; as any group of n people must have SOME number who prefer the original series, this sum is just the number of ways to choose n people from 2n, or (2n choose n). Telling a story and changing our perspective really helps!

Wednesday, April 4. Today we proved the
Fundamental Theorem of Calculus. There are not that many fundamental theorems in mathematics -- we do not use the term lightly! Other ones you may have seen are the Fundamental Theorem of Arithmetic and the Fundamental Theorem of Algebra; click here for a list of more fundamental theorems (including the Fundamental Theorem of Poker!). To simplify the proof, we made the additional assumptions that our function was continuously differentiable and the derivative was bounded. These assumptions can all be removed; it suffices for the function to be continuous on a finite interval (in such a setting, a continuous function is actually uniformly continuous; informally, this means in the epsilon-delta formulation of continuity that delta is independent of the point. Such a result is typically proved in an analysis class. What I find particularly interesting about the proof is that the actual value that bounds the function is irrelevant; all that matters is that our function is bounded. Theoretical math constantly uses such tricks; this is somewhat reminiscent of some of the Lagrange Multiplier problems, where we needed to use the existence of lambda to solve the problem, but frequently we never had to compute the value of lambda.
- The key ingredients in the proof are using the Mean Value Theorem and observing that we have a telescoping sum. One has to be a little careful with telescoping sums with infinitely many terms. The wikipedia article has some nice examples of telescoping sums and warnings of the dangers if there are infinitely many summands.
  - This lecture was titled 'The one with the Mean Value Theorem' in homeage to the sitcom Friends, where every episode title begins with `The one' (most have with as the next word, but not all).
- Whenever you are given a new theorem (such as the Fundamental Theorem of Calculus), you should always check its predictions against some cases that you an readily calculate without using the new machinery. For example, if we want to find the area under f(x) from x=0 to x=1, obviously the answer will depend on f. If f is constant it is trivial; if f is a linear relation then the answer is still readily calculated. For more general polynomial n, one can compute the Riemann sums (the upper and lower sums) by Mathematical Induction. For example, using induction one can show that the sum from n=1 to n=N of n is simply n(n+1)(2n+1)/6, and this result can then be used to find the area under the parabola y = x².
- The integration covered through Calc III is known as Riemann sums / Riemann integrals. In more advanced math classes you'll meet the successor, Lebesgue integrals. Informally, the difference between the two is as follows. Imagine you have a large number of coins of varying denominators to assist; your job is to count the amount of money. Riemann sums work by breaking up the domain of the function; Lebesgue integration works by breaking up the domain.

Monday, April 4.
In one dimension, there is not much choice in how we integrate; however, if we are trying to integrate a function of several variables over a rectangle (or other such regions), not surprisingly the situation is markedly different. Similar to the freedom we have with limits in several variables (where we have to consider all possible paths), there are many ways to integrate. Imagine we have a function of two variables and we want to integrate it over the rectangle [a, b] x [c, d], with x in [a, b] and y in [c, d]. One possibility is we can fix x and let y vary, computing the integral over y for the fixed x, and then let x vary, computing the integral over x. Of course, we could also do it the other way. As we are integrating the same function over the same region (just in a different order), we hope that the answers are the same! So long as everything is nice, this is the case. There are many formulations as to exactly what is needed to make the situation nice; if our function is continuous and bounded and we are integrating over a finite rectangle, then we can interchange the order of integration without changing the answer. This is called Fubini's theorem, and is one of the most important results in integration theory in several variables. There really isn't an analogue in one dimension, as there we have no choice in how to integrate!
- Whenever you are given a theorem, it is worthwhile to remove a condition and see if it is still true. Typically the answer is no (or if it is still true, the proof is frequently much harder). There are many functions and regions where the order of integration matters. The simplest example is looking at double sums rather than double integrals, though with a little work we can convert this example to a double integral. We give a sequence a_{mn} such that Sum_{m = 0 to oo} Sum_{n = 0 to oo} a_{m,n}) is not equal to Sum_{n = 0 to oo} Sum_{m = 0 to oo} a_{m,n}). For m, n >= 0 let a_{m,n} = 1 if m = n, -1 if n=m+1 and 0 otherwise. Show that the two different orders of summation yield different answers. The reason for this is that the sum of the
  absolute value of the terms diverges.
- Click here for another example where we cannot interchange the order of integration; a more involved example is available here.
- Click here for a video by Cameron on how he applies Fubini's theorem to change the order of operations (he does a double sum instead of a double integral, but the principle is the same).
- We spent a lot of time today reviewing integrals, ranging from formulas to tables. Two of the most popular are available online, but should only be used when you have a truly pesky integral: Abramovich and Stegun and Gradshteyn and Ryzhik. For everyday purposes, this should suffice.
- We covered the standard techniques, such as integration by parts and u-substitution. Another powerful technique is partial fractions. At first this seems like the domain of sadistic professors, but in reality it can be quite useful. I and one of my students needed to use it this summer to attack a nice problem in combinatorics / number theory. Zeckendorf proved that if you write the Fibonacci numbers with just one 1, so 1, 2, 3, 5, 8, 13, ..., then every number can be written uniquely as a sum of non-adjacent Fibonacci numbers. Lekkerkerker proved that as x ranges from the nth to the (n+1)st Fibonacci numbers then the average number of summands needed is n/(phi+2), where phi is the golden mean. My students and I proved the fluctuations about the mean are normally distributed (and generalized this to other systems). One of the key inputs was integration by partial fraction. If you're interested, let me know. This project allowed me to use my research funds to by a Cookie Monster!

Wednesday, March 16 and Friday, March 18. Lagrange multipliers are a terrific application of multivariable calculus. Frequently one needs to optimize something, be it revenue in economics or steathiness in a fighter. Lagrange multipliers give us a way to find maxima / minima subject to constraints, provided we can solve the equations!
We first generalized the methods from one variable calculus on how to find maxima and minima of functions. Recall that if f is a differentiable real-valued function on an interval [a,b], then the candidates for maxima / minima are (1) the critical points, namely those x in [a,b] where f'(x) = 0, and (2) the endpoints. How does this generalize to several variables? In one-dimension the boundary of an interval is `boring'; it's just the two endpoints, and thus it isn't that painful to have to check the value of the function there as well as at the critical point. What about several variables? The situation is quite different. For example, the interval [-1,1] might become a sphere x^2 + y^2 + z^2 <= 1; the interior is all points (x,y,z) such that x^2 + y^2 + z^2 < 1, while the boundary is now the set of points with x^2 + y^2 + z^2 = 1. Unfortunately this leads to infinitely many points to check; while we could afford to just check the endpoints by brute force in one-dimension, that won't be possible now. The solution is the Method of Lagrange Multipliers.
- Two good links: An introduction to Lagrange Multipliers and Lagrange Multipliers.
- The Method of Lagrange Multipliers is one of the most frequently used results in multivariable calculus. It arises in physics (Hamiltonians and Lagrangian, Calculus of Variations), information theory, economics, linear and non-linear programming, .... You name it, it's there. The two webpages referenced above have several examples in these and other subjects; there are of course many other sources and problems (click here for a nice post on gasoline taxes, pollution and Lagrange multipliers). For more on the economics impact, click here, as well as see the following papers:
  - Lagrange Multipliers and the Fundamental Theorem of Algebra (Theo de Jong), American Mathematical Monthly, Volume 116, Number 9, November 2009, pp. 828-830(3). The Fundamental Theorem of Algebra is one of the gems of mathematics (we don't have that many fundamental theorems). This is a beautiful proof mostly using what we've done in class (gradients, Lagrange Multipliers, Level Sets). There is a bit of real analysis terminology and results, but not too much, and I think confined enough so that you can get the general flavor.
  - Lagrange Multipliers and Optimality (R. Tyrrell Rockafellar)
  - Arbitrage Theory: A Mathematical Introduction (David P. Ellerman)
- The Method of Lagrange Multipliers ties together many of the concepts we've studied this semester, as well as some from Calc I and Calc II (vectors, directional derivatives and gradients, and level sets, to name a few). The goal is to show you how the theoretical framework we developed can be used to solve problems of interest. The military example we discussed is just one of many possible applications. We were concerned with how to deploy a fleet to minimize average deployment time to trouble spots (for more information, see my notes on the problem and the Mathematica code); of course, instead of considering each place equally important we could easily add weights. One consequence of war is that it does strongly encourage efficiency and optimization; in fact, many optimization algorithms and techniques were developed because of the problems encountered. The subject of Operations Research took off during WWII; see the excellent wikipedia article on Operations Research, especially the subsection on the problems OR attempts to solve. Not surprisingly, there are also numerous applications in business. Feel free to talk either to my wife (who is a Professor of Marketing) or myself (I've written several papers with marketing professors, applying such techniques to many companies, my favorite being movie theaters). As mentioned, we can reinterpret our problem as minimizing shipping costs from a central distributor to various markets (where some markets may be more valuable than others, leading to a weighted function).
- One of the most important takeaways of the deployment problem is that the answer you get, as well as the difficulty of the math needed to arrive at the answer, depends on how you choose to model the world. For us, it depends on how we choose to measure 'distance'. My notes on a deployment problem on the Earth's surface give four different methods yielding three different solutions, all of which differ from what you get if you use the 'correct' measure of distance. This is an extremely common outcome -- your answer depends on how you choose to model / measure! You need to be very aware of this when you compare different people's answers to the same problem. For a nice example of how the answer can depend on your point of view, consider the riddle below (passed on by G. Mejia). What's the right answer?
  - The police rounded up Jim, Bud and Sam yesterday, because one of them was suspected of having robbed the local bank. The three suspects made the following statements under
    intensive questioning (see below). If only one of therse statements turns out to be true, who robbed the bank?
    - Jim: I'm innocent.
    - Bud: I'm innocent.
    - Sam: Bud is the guilty one.
- For more on the problem of building an efficient computer in terms of retrieval of information, see the solution to the related extra credit problem from earlier in the semester. Note the problem is harder without the tools of multivariable calculus. See also the article by Hayes in the American Scientist, Third Base.
- I've scanned in a chapter by Lanchester on The Mathematics of Warfare; you can also view it through GoogleBooks here. This article is from a four volume series, The World of Mathematics. (I am fortunate enough to own two sets; one originally belonged to a great uncle of mine, another to a grandfather-in-law of my wife). I've written some Mathematica code to analyze the Battle of Trafalgar, which is described in the Lanchester article; the Mathematica code is here (though it might not make sense without comments from me). (The file name is boring because, during the 200th anniversary re-enactment, in order to avoid hurting anyone's feelings they refused to call the two sides 'English' and 'French/Spanish'). This is a terrific problem to illustrate applying mathematics to the real world. One has a very complicated situation, and you must decide what are the key features. The more features you include the better your model will be, but the less likely you'll be able to solve it! It's a bit of an art figuring out exactly how much to include to capture what truly matters and still be able to solve your model. We'll discuss this in greater detail when we do the Pythagorean Won-Loss theorem from baseball, which is a nice application of probability and multiple integrations.
- Finally, a common theme that surfaces as we do more and more mathematical modeling is that simple models very quickly lead to very hard equations to solve. The drowning swimmer problem is actually the same asSnell's law, for how light travels / bends in going from one medium to another. If you write down the equations for the drowning swimmer, you quickly find a quartic to solve. For interesting articles related to this, see the two papers below by Pennings on whether or not dogs know calculus. Click here for a picture of his dog, Elvis, who does know calculus.
  Do dogs know calculus?
  
  Do dogs know bifurcations?
- General comment: it's important to be able to take complex information and sift to the relevant bits. A great example is the song I'm my own Grandpa. Listen to it and try to graph all the relations and see that he really is his own grandfather (with no incest!). A solution is here (don't view this until you try to graph it!). Actually, this is a MUCH better illustration of the relationships.

Monday, March 14. We discussed directional derivatives. It is natural that we develop such a concept, as up until now we have only considered derivatives in directions parallel to the various coordinate axes. A central theme of multivariable calculus is the need to be able to approach a point along any path, and that in several dimensions numerous paths are available (unlike the 1-dimensional case, where essentially we just have two paths). Directional derivatives will play a key role in optimization problems.
- One of the requests in Spring 2010 was to talk about applications of multivariable calculus to molecular gastronomy. After some web browsing, I eventually beecame interested in how bees communicate amongst themselves as to where food is. There appear to bee two schools; one is the waggle dance / language school, and the other is the odor plume theory. In addition to controversies on how bees learn, there are lots of nice applications to gradients and (I believe) directional derivatives. The goal is to convey information about a specific path through a very complex space.
  - See also the paper: Odor landscapes and animal behavior: tracking odor plumes in different physical worlds (Paul Moorea, John Crimaldib). Abstract: The acquisition of information from sensory systems is critical in mediating many ecological interactions. Chemosensory signals are predominantly used as sources of information about habitats and other organisms in aquatic environments. The movement and distribution of chemical signals within an environment is heavily dependent upon the physics that dominate at different size scales. In this paper, we review the physical constraints on the dispersion of chemical signals and show how those constraints are size-dependent phenomenon. In addition, we review some of the morphological and behavioral adaptations that aquatic animals possess which allow them to effectively extract ecological information from chemical signals.
- It is very important to know proofs and definitions; there's a reason one of the exam questions required you to be able to describe clearly key concepts from the course. A very important example is the fall of Western Civilization (or, if you're not quite as pessimistic, the financial mortgage meltdown). While there are many reasons behind the collapse (I have close family that has worked in the upper levels of many of the top financial firms; if you are interested in stories of what isn't reported in the news, let me know), one large component was an incorrect use of Gaussian copulas. It's similar to looking at low-velocity data and extrapolating to relativistic speeds -- there is an enormous danger when you apply results from one region in another with no direct data in that second realm. A great article on this is from Wired Magazine (The Formula That Killed Wall Street). It's worth reading this. Some particularly noteworthy passages:
  - Bankers should have noted that very small changes in their underlying assumptions could result in very large changes in the correlation number. They also should have noticed that the results they were seeing were much less volatile than they should have been which implied that the risk was being moved elsewhere. Where had the risk gone? They didn't know, or didn't ask. One reason was that the outputs came from "black box" computer models and were hard to subject to a commonsense smell test. Another was that the quants, who should have been more aware of the copula's weaknesses, weren't the ones making the big asset-allocation decisions. Their managers, who made the actual calls, lacked the math skills to understand what the models were doing or how they worked. They could, however, understand something as simple as a single correlation number. That was the problem.
  - No one knew all of this better than David X. Li: "Very few people understand the essence of the model," he told The Wall Street Journal way back in fall 2005. "Li can't be blamed," says Gilkes of CreditSights. After all, he just invented the model. Instead, we should blame the bankers who misinterpreted it. And even then, the real danger was created not because any given trader adopted it but because every trader did. In financial markets, everybody doing the same thing is the classic recipe for a bubble and inevitable bust.
Wednesday, March 9. Today we discussed the importance of proving results in a math class. Things are not true merely b/c I seem nice and have a good resume and teach at Williams, but rather b/c I can show you why they must hold using just agreed upon rules of logical inference from accepted starting points. We discussed Russell's paradox and proofs of the Chain Rule, with some applications.
- Russell's paradox is one of the most famous in all of mathematics; it showed that we didn't even understand what it meant to be a set or an element of a set! Another famous one is the Banach - Tarski paradox, which tells us that we don't understand volumes! It basically says if you assume the Axion of Choice, you can cut solid sphere into 5 pieces, and reassemble the five pieces to get two completely solid spheres of the same size as the original! While it is rare to find these paradoxes in mathematics, understanding them is essential. It is in these counter-examples that we find out what is really going on. It is these examples that truly illuminate how the world is (or at least what our axioms, imply). Most people use the Zermelo-Fraenkel axioms, abbreviated ZF. If you additionally assume the Axiom of Choice, it's called ZFC or ZF+C. Not all problems in mathematics can be answered yea or nay within this structure. For example, we can quantify sizes of infinity; the natural numbers are much smaller than the reals; is there any set of size strictly between? This is called the Continuum Hypothesis, and my mathematical grandfather (my thesis advisor's advisor), Paul Cohen, proved it is independent (ie, you may either add it to your axiom system or not; if your axioms were consistent before, they are still consistent).
- In a real analysis course, one develops the notation and machinery to put calculus on a rigorous footing. In fact, several prominent people criticized the foundations of calculus, such as Lord Berkeley; his famous attack, The Analyst, is available here. It wasn't until decades later that a good notion of limit, integral and derivative were developed. Most people are content to stop here; however, see also Abraham Robinson's work in Non-standard Analysis. He is one of several mathematicians we'll encounter this semester who have been affiliated with my Alma Mater, Yale. Another is the great Josiah Willard Gibbs.
- One item we must deal with carefully in the proof of the chain rule is that we had g(x+h) - g(x) divided by itself; what if g(x+h) = g(x) for infinitely many arbitrarily small h? Then we are dividing by zero. Can you prove that this cannot happen if g is differentable? If that is not a strong enough condition, what if we assume the derivative g' does not vanish at x -- does that suffice to prove that g(x+h) cannot equal g(x) infinitely often for arbitrarily small h?
- If f(g(x)) = x (so f and g are inverse functions, such as sqrt(x^2)), then g'(x) = 1 / f'(g(x)); in other words, knowing the derivative of f we know the derivative of its inverse function. This was used in Calc I to pass from knowing the derivative of exp(x) to the derivative of ln(x). We can apply this to various inverse trig functions (a list of the derivatives of these are here). This highlights one of the most painful parts of integration theory -- just because we are close to finding an anti-derivative does not mean we can actually find it! While there is a nice anti-derivative of sqrt(1 - x^2), it is not a pure derivative of an inverse trig function. There are many tables of anti-derivatives (or integrals) (a fun example on that page is the Sophomore's Dream). Unfortunately it is not always apparent how to find these anti-derivatives, though of course if you are given one you can check by differentiating (though sometimes you have to do some non-trivial algebra to see that they match). In fact, there are some tables of integrals of important but hard functions where most practitioners have no idea how these results are computed (and occasionally there are errors!). We will see later how much simpler these problems become if we change variables; to me, this is one of the most important lessons you can take from the course: Many problems have a natural point of view where the algebra is simpler, and it is worth the time to try to find that point of view!
- Let f(x) = exp(x). Then f'(x) = lim [f(x+h) - f(x)]/h = lim [exp(x+h) - exp(x)] / h = lim [exp(x) exp(h) - exp(x)] / h = exp(x) lim [exp(h) - 1] / h; as exp(0) = 1, we find f'(x) = exp(x) lim [f(h) - f(0)] / h = exp(x) f'(0); thus we know the derivative of the exponential function everywhere once we know the derivative at 0!

Monday, March 7. Today we gave the proofs of some of the key theorems in multivariable calculus, and discussed Newton's method.
- We compared two methods to find roots of polynomials. In some special cases we can find closed form expressions for roots in terms of the coefficients. For example, any linear equation (ax+b=0), quadratic (ax^2+bx+c=0), cubic (ax^3+bx^2+cx+d=0) or quartic (ax^4+bx^3+cx^2+dx+e=0) has a formula for the roots in terms of the coefficients of the polynomials; this fails for polynomials of degree 5 and higher (the Abel-Ruffini Theorem; see also Galois). It is very convenient when we have a solution that is a function of the parameters; we can then use our methods to find the optimal values of the parameters. Sadly in industry it is often difficult to get closed form expressions; if you are looking for the most potent compound, for example, you might be required to do numerous different trial runs and just observe which is best. We thus need a way to find optimal values / solve equations. We describe two below.
  - Newton's method is significantly more powerful than divide and conquer (also called the bisecting algorithm); this is not surprising as it assumes more information about the function of interest (namely, differentiability). The numerical stability of Newton's method leads to many fascinating problems. One terrific example is looking at roots in the complex plane of a polynomial. We assign each root a different color (other than purple), and then given any point in the complex plane, we apply Newton's method to that point repeatedly until one of two things happen: it converges to a root or it diverges. If the iterates of our point converges to a root, we color our point the same color as that root, else we color it purple. This leads to Newton fractals, where two points extremely close to each other can be colored differently, with remarkable behavior as you zoom in. If you're interested in more information, let me know; a good chaos program is xaos (I have other links to such programs for those interested). One final aside: it is often important to evaluate these polynomials rapidly; naive substitution is often too slow, and Horner's algorithmis frequently used.
  - The fractal behavior exhibited by Newton's method applied to finding roots of polynomials is one of many examples of Chaos Theory, or extreme sensitivity to initial conditions. While one of the earliest examples was the work of Poincare on the motion of three planetary bodies, the subject really took off with Lorenz work on weather (the Butterfly Effect). Another nice example is the orbit of Pluto; while we know it will orbit the sun, its orbit is chaotic and we cannot say where exactly in the orbit it will be millions of years from now.
- Instead of approximating a function locally by a line, we now use a plane (in 2-dimensions) or hyperplane (in general). We can use the Mean Value Theorem to get some information on how close the estimation is, and then use these estimations to approximate our function. A Mathematica file with the tangent line and tangent plane approximations is here. One definition of differentiability is that a function is differentiable if the error in the tangent plane approximation tends to zero faster than the distance of where we are to where we start tends to zero. It is sadly possible for the partial derivatives to exist without the function being differentiable. We showed how it is not sufficient for the partial derivatives to exist; that is not enough to imply our function is differentiable. The example was f(x,y) = (xy)^1/3. What must we assume in order for the partial derivatives to imply our function is differentiable? It turns out it suffices to assume the partial derivatives are continuous. This is the major theorem in the subject, and provides a nice way to check for when a function is differentiable.
  - The proof of the alluded to theorem above uses two of my favorite techniques. While sadly we do not multiply by 1, we do get to add 0 and we do use the Mean Value Theorem. One of my goals in the class is to illustrate how to think about these problems, why we try certain approaches for our proofs. We want to study how well the tangent plane approximates our function, thus we need to study f(x,y) - f(0,0) - (δf/δx)(0,0) x - (δf/δy)(0,0) y. Our theorem assumes the partial derivatives are continuous, thus it stands to reason that at some point in the proof we should use the partial derivatives are continuous! The trick is to try and see how we can get another δf/δx and another δf/δy to appear. The key is to recall the MVT. If we add 0 in a clever way, we can do this. Our expression equals f(x,y) - f(0,y) + f(0,y) - f(0,0) - (δf/δx)(0,0) x - (δf/δy)(0,0) y. We now use the MVT on f(x,y) - f(0,y) and on f(0,y) - f(0,0). In each of these two expressions, only one variable changes. Thus the first is (δf/δx)(c,y) x and the second is (δf/δy)(0,ĉ). Thus the error in using the tangent plane is [(δf/δx)(c,y) - (δf/δx)(0,y)] x + [(δf/δy)(0,ĉ) - (δf/δx)(0,o)] y. We now see how the continuity of the partials enters -- it ensures that these differences are small, even when we divide by |(x,y)-(0,0)|.
Friday, March 4.
Today we discussed the Chain Rule. The Chain Rule is one of the most important results in multivariable calculus, as it allows us to build complicated functions depending on functions of many inputs. To state it properly requires some linear algebra, especially matrix multiplication. The proof uses multiple applications of adding zero. This is a essential skill to master if you wish to continue in mathematics. It is somewhat similar to adding auxiliary lines in geometry. With experience, it becomes easier to `see' where and how to add zero. The idea is we want to add zero in such a way that we convert one expression to several, where the resulting expressions are easier to analyze because we are subtracting two quantities that are quite close. For the chain rule, we will do this by adding numerous intermediary points.
- One way to view the Chain Rule is that it is all about giving you the freedom to choose. You can either plug everything in and differentiate directly by brute force, or you can use the Chain Rule to find the derivative of the composition in terms of the derivatives of the constituent pieces. Depending on the problem, one way could be easier than the other; there are examples of situations where direct substitution is best, and examples where it is better to use the Chain Rule. With experience it becomes clear which way is better. When we discuss gradients and directional derivatives, we'll see a theoretical interpretation of the Chain Rule. This will play a fundamental role when we return to optimization problems. Finally, of course, it is useful to be able to compute an answer two different ways, as this provides a nice check of your work.
- To use the Chain Rule in full glory, we needed to understand how to multiply matrices, as h(x) = f(g(x)) implies (Dh)(x) = (Df)(g(x)) (Dg)(x), where x = (x₁, ..., x_n). One can motivate matrix multiplication through the dot product, as we know how to take the dot product of two vectors of the same number of coordinates. Matrix multiplication looks quite mysterious at first. Wikipedia has a nice article (with color) on multiplying matrices, though it is a bit short on motivation. The advanced reason as to why we do this comes from also viewing matrices as linear transformations, and we want the product of two matrices to represent the composition of the transformations. This is an advanced topic, and sadly is frequently mangled in a linear algebra course. I've posted a little bit about this in the advanced notes from Thursday's lecture. The best motivation I know is to consider 2 x 2 rotation matrices. If R(a) corresponds to rotating by a radians, and R(b) to rotating by b radians, then R(b) R(a) should equal R(b+a); this does happen if we use the matrix multiplication method we discussed.
- I did a quick google search for applications of the chain rule in various subjects. Here's something in economics. Here's another econ example. Here's a chemistry example.

Wednesday, March 2.
Today's lecture covered the Method of Least Squares. The best fit value of the parameters depends on how we choose to measure errors. It is very important to think about how you are going to measure / model, as frequently people reach very different conclusions because they have different starting points / different metrics. We'll see another example of how our metric can affect the answer when we get to Lagrange multipliers.
- The Method of Least Squares is one of my favorites in statistics (click here for the Wikipedia page, and click here for my notes). The Method of Least Squares is a great way to find best fit parameters. Given a hypothetical relationship y = a x + b, we observe values of y for different choices of x, say (x1, y1), (x2, y2), (x3, y3) and so on. We then need to find a way to quantify the error. It's natural to look at the observed value of y minus the predicted value of y; thus it is natural that the error should be Sum_{i=1 to n} h(yi - (a x_i + b)) for some function h. What is a good choice? We could try h(u) = u, but this leads to sums of signed errors (positive and negative), and thus we could have many errors that are large in magnitude canceling out. The next choice is h(u) = |u|; while this is a good choice, it is not analytically tractable as the absolute value function is not differentiable. We thus use h(u) = u²; though this assigns more weight to large errors, it does lead to a differentiable function, and thus the techniques of calculus are applicable. We end up with a very nice, closed form expression for the best fit values of the parameters.
- Unfortunately, the Method of Least Squares only works for linear relations in the unknown parameters. As a great exercise, try to find the best fit values of a and c to y = c/x^a (for definiteness you can think of this as the force due to two unit masses that are x units apart). When you take the derivative with respect to a and set that equal to zero, you won't get a tractable equation that is linear in a to solve. Fortunately there is a work-around. If we change variables by taking logarithms, we find ln(y) = ln(c/x^a); using logarithm laws this is equivalent to ln(y) = a ln(x) + ln(c); setting Y = ln(y), X = ln(X) and b = ln(c) this is equivalent to Y = a X + b, which is exactly the formulation we need! This example illustrates the power of logarithms; it allows us to transform our data and apply the Method of Least Squares.
- There are many examples of power laws in the world. Many of my favorite are related to Zipf's law. The frequencies of the most common words in English is a fascinating problem (click here for the data; see also this site); this works for other languages as well, for the size of the most populous cities, ...; if you consider more general power laws, you also get Benford's law of digit bias, which is used by the IRS to detect tax fraud (the link is to an article by a colleague of mine on using Benford's law to detect fraud). The power law relation is quite nice, and initially surprising to many. My Mathematica programming analyzing this is available here. See also this paper by Gabaix for Zipf's law and the growth of cities. As a nice exercise, you should analyze the growth of city populations (you can get data on both US and the world from Wikipedia).
- We discussed Kepler's Three Laws of Planetary Motion (the Wikipedia article is very nice). Kepler was proudest (at least for a longtime) of Mysterium Cosmographicum (I strongly urge you to read this; yes, the same Kepler whom we revere today for his understanding of the cosmos also advanced this as a scientific theory -- times were different!).
- Finally, a theme of the past two days is the importance of how we choose to measure things; how we model and how we judge the model's prediction will greatly affect the answer. In a similar spirit, I thought I would post a brief note about Oulipo, a type of mathematical poetry (this is a link to the Wikipedia page, which has links to examples). There was a nice article about this recently in Math Horizons (you can view the article here). This is a nice example of the intersection of math and the arts, and discusses how the structure of a poem affects the output, and what structures might lead to interesting works.
Monday, February 28.
- The search for extrema is a central pursuit in modern science and engineering. It is important to have techniques to winnow the list of candidate points. The methods discussed in class are the natural generalizations from one-variable calculus. While one must prove that the function under consideration does have a max/min, typically this is clear from physical reasons (for example, there should be a pen of maximal area for given perimeter; there should be a path of least time).
  - In one-dimension, boundaries of sets aren't too bad; for example, the boundary of [a, b] is just two points, a and b. The situation is violently different in several variables. There the boundary can have infinitely many points, and reducing a problem to interior critical points and checking the function on the boundary is not enough; we must have a way to evaluate all these points on the boundary.
  - The generalization of the second derivative tests involves determinants and whether or not the Hessian is a positive definite matrix, a negative definite matrix, et cetera. What is really going on is that we want to use the Principal Axis Theorem and change to a coordinate system where the Hessian is easier to understand because, in this new coordinate system, it is a diagonal matrix!
  - In one of the sabermetrics lectures we'll discuss linear programming. This is a wonderful topic, and it allows us to solve (or approximate the solutions) to a wealth of problems very quickly. My lecture notes are online here. One of my favorite applications of linear programming is to determining when teams are eliminated from playoff contention; MLB and ESPN frequently do the analysis incorrectly by not taking into account secondary effects of teams playing teams playing teams. For example, ESPN or MLB back in '04 had the wild-card unclinched for one extra day. (The Sox had a big lead over the Angels and a slightly smaller lead over the As; however, the As and the Angels were playing each other and thus at least one team would get 2 losses, and one had to win the ALWest. Thus the Sox had clinched the wildcard a day earlier than thought.) If you're interested, click here for a paper I wrote with colleagues applying linear programming to helping a movie theatre determine optimal schedules.
  - It is worth remarking that for many applications in the real world, we do not need to find the true extremum, but rather just something very close. For example, say we are trying to determine the optimal schedule for an airline for a given day. We can write the linear programming problem down, but it might take days to years to run; however, frequently we can obtain bounds showing how close our answer is to the theoretical best (ie, we can show we are no more than X away from optimal). It often happens that X is small, and thus with a small run-time we can be close enough. (It isn't worth it to ground our airfleet for a few years to find the optimal schedule!)
  - In honor of my kids loaning one of their favorite books, I think it's fitting to end with some comments about children's books. Click, Clack Moo. Cows that type is the first of a series of well-illustrated and entertaining adventures of Duck and his compatriots. It's a Caldecott Honor award winner, which after winning the Caldecott Medal is, I believe, considered the top honor for a children's story. One of my favorite childhood books is Make Way For Ducklings, which has now been read to three generations of Millers. This is extremely popular in the eastern part of the state (ie, Boston). We've taken Cam to the statues; it's a fun area. For the western part of the state, the big children's attractions are Dr Seuss (from Springfield) and the Eric Carle Museum (in Amherst -- okay, there is something nice there!). Finally, no `cultural' introduction to the Commonwealth of Massachusetts would be complete without providing a link to Norman Rockwell.
- Systems of equations are frequently used to model real world problems, as it is quite rare for there to be only one quantify of interest. If you want to read more about applying math to analyze the Battle of Trafalgar, here is a nice handout (or, even better, I think we could go further and write a nice paper for a general interest journal expanding on the Mathematica program I wrote). The model is very similar to the Lotka-Volterra predator-prey equations (our evolution is quite different, though; this is due to the difference in sign in one of the equations). Understanding these problems is facilitated by knowing some linear algebra. It is also possible to model this problem using a system of difference equations, which can readily be solved with linear algebra. Finally, it's worth noting a major drawback of this model, namely that it is entirely deterministic: you specify the initial concentrations of red and blue and we know exactly how many exist at any time. More generally one would want to allow some luck or fluctuations; one way to do this is with Markov chains. This leads to more complicated (not surprisingly) but also more realistic models. In particular, you can have different probabilities for one ship hitting another, and given a hit you can have different probabilities for how much damage is done. This can be quite important in the 'real' world. A classic example is the British efforts to sink the German battleship Bismarck in WWII. The Bismarck was superior to all British ships, and threatened to decisively cripple Britain's commerce (ie, the flow of vital war and food supplies to the embattled island). One of the key incidents in the several days battle was a lucky torpedo shot by a British plane which seriously crippled the Bismarck's rudder. See the wikipedia entry for more details on one of the seminal naval engagements of WWII. The point to take away from all this is the need to always be aware of the limitations of one's models. With the power and availability of modern computers, one workaround is to run numerous simulations and get probability windows (ie, 95% of the time we expect a result of the following type to occur). Sometimes we are able to theoretically prove bounds such as these; other times (using Markov chains and Monte Carlo techniques) we numerically approximate these probabilities.

Friday, February 25. We continued our discussion of partial derivatives.
We talked a lot about different notations for the derivative. It is very convenient to be able to refer to all the different derivatives (functions with one input, several inputs and one output, several inputs and several outputs) with just one notation. The definition is that a function is differentiable if the error in the tangent plane approximation tends to zero faster than the distance of where we are to where we start tends to zero. It is sadly possible for the partial derivatives to exist without the function being differentiable. We showed how it is not sufficient for the partial derivatives to exist; that is not enough to imply our function is differentiable. The example was f(x,y) = (xy)^1/3. What must we assume in order for the partial derivatives to imply our function is differentiable? It turns out it suffices to assume the partial derivatives are continuous. This is the major theorem in the subject, and provides a nice way to check for when a function is differentiable.
- The proof of the alluded to theorem above uses two of my favorite techniques. While sadly we do not multiply by 1, we do get to add 0 and we do use the Mean Value Theorem. One of my goals in the class is to illustrate how to think about these problems, why we try certain approaches for our proofs. We want to study how well the tangent plane approximates our function, thus we need to study f(x,y) - f(0,0) - (δf/δx)(0,0) x - (δf/δy)(0,0) y. Our theorem assumes the partial derivatives are continuous, thus it stands to reason that at some point in the proof we should use the partial derivatives are continuous! The trick is to try and see how we can get another δf/δx and another δf/δy to appear. The key is to recall the MVT. If we add 0 in a clever way, we can do this. Our expression equals f(x,y) - f(0,y) + f(0,y) - f(0,0) - (δf/δx)(0,0) x - (δf/δy)(0,0) y. We now use the MVT on f(x,y) - f(0,y) and on f(0,y) - f(0,0). In each of these two expressions, only one variable changes. Thus the first is (δf/δx)(c,y) x and the second is (δf/δy)(0,ĉ). Thus the error in using the tangent plane is [(δf/δx)(c,y) - (δf/δx)(0,y)] x + [(δf/δy)(0,ĉ) - (δf/δx)(0,o)] y. We now see how the continuity of the partials enters -- it ensures that these differences are small, even when we divide by ||(x,y)-(0,0)||.
- The Mean Value Theorem is also the key ingredient in the proof of the equality of mixed partial derivatives (assuming both are continuous). Sadly there do exist functions where the mixed derivatives are unequal (for extra credit, show that the mixed derivatives are not equal in the linked example).
- Notation is very important. The subscript notation for partial derivatives is nice and elegant; it's easy to glance at u_xxy and quickly glean that it's two derivatives with respect to x followed by one with respect to y. This allows us to write down many partial differential equations in a nice, compact form. Some of the most famous and important are (1) the heat equation, (2) the wave equation, and (3) the Navier-Stokes equation. The last arises in fluid flow, and is one of the Clay Millenium Prize Problems.
- We talked a bit about differential trigonometry, and how everything comes down to the limit as h tends to zero of sin(h)/h. One can prove this limit geometrically, as is often done, and then obain the derivatives by using the angle addition formulas. We sketch another avenue to these addition formulas. The Pythagorean Theorem says cos²(x) + sin²(x) = 1. There are many ways to obtain this formula. Perhaps one of the most useful is the Euler - Cotes formula, exp(ix) = cos(x) + isin(x). One can essentially derive all of trigonometry from this relation, with just a little knowledge of the exponential function. Specifically, we have exp(z) = 1 + z + z²/2! + z³/3! + .... It is not at all clear from this definition that exp(z) exp(w) = exp(z+w); this is a statement about the product of two infinite sums equaling a third infinite sum. It is a nice exercise in combinatorics to show that this relation holds for all complex z and w.
  - Taking the above identities, we sketch how to derive all of trigonometry! Let's prove the angle addition formulas. We have exp(ix) = cos(x) + isin(x) and exp(iy) = cos(y) + isin(y). Then exp(ix) exp(iy) = [cos(x) + isin(x)] [cos(y) + isin(y)] = [cos(x) cos(y) - sin(x) sin(y)] + i [sin(x) cos(y) + cos(x) sin(y)]; however, exp(ix) exp(iy) = exp(i(x+y)) = cos(x+y) + i sin(x+y) by Euler's formula. The only way two complex numbers can be equal is if they have the same real and the same imaginary parts. Thus, equating these yields cos(x+y) = cos(x) + isin(x) and sin(x+y) = sin(x) cos(y) + cos(x) sin(y).
  - It is a nice exercise to derive all the other identities. One can even get the Pythagorean theorem! To obtain this, use exp(ix) exp(-ix) = exp(0) = 1.
  - We thus see there is a connection between the angle addition formulas in trigonometry and the exponential addition formula. Both of these are used in critical ways to compute the derivatives of these functions. For example, these formulas allow us to differentiate sine, cosine and the exponential functions anywhere once we know their derivative at just one point. Let f(x) = exp(x). Then f'(x) = lim [f(x+h) - f(x)]/h = lim [exp(x+h) - exp(x)] / h = lim [exp(x) exp(h) - exp(x)] / h = exp(x) lim [exp(h) - 1] / h; as exp(0) = 1, we find f'(x) = exp(x) lim [f(h) - f(0)] / h = exp(x) f'(0); thus we know the derivative of the exponential function everywhere once we know the derivative at 0! One finds a similar result for the derivatives of sine and cosine (again, this shouldn't be surprising as the functions are related to the exponential through Euler's formula).
Wednesday, February 23.
Monday, February 21. We started with a discussion on level sets. These occur all the time in real world plots. For example, weather maps constantly show lines of constant temperature; these are called isotherms.
- The homework involves sketching various curves, many of which are famous conic sections. The shapes that arise are often ellipses, hyperbolas, parabolas, lines and circles. The theory of conic sections says that these are all related, and arise as cross sections obtained by having planes intersect a cone at various angles. These shapes arise throughout mathematics and science. Here are just a few examples, which illustrate their importance.
  - Chemistry / Physics: The ideal gas law states that PV = nRT. If we set T equal to a constant, we then get PV is constant (this special case is called Boyle's law). Note that this is an equation of a hyperbola, and thus the isotherms (level sets of constant temperature) are hyperbolas.
  - Physics / Astrophysics: The most common example of conic section is orbits of planets. In three-dimensional space, planets orbiting the sun under a gravitational force proportional to the inverse-square of the distance travel in ellipses, hyperbolas, or parabolas (see here for more details).
- It is not too hard for us to imagine what it would be like for a sphere to enter a plane, but it does become harder and harder to imagine four dimensional objects arriving in our three dimensional world. One of my favorite stories it the classic Nightfall (by Isaac Asimov). What makes this such a great story is that he takes something that is conceivable for us and creates a world where it is inconceivable for the population. I strongly urge you to read this story.
- The video clips for Flatland are available here: Flatland trailer. (The full movie is available here for class purposes only.) There's also projections of 4-dimensional cubes in our 3-dimensional space. I find it very hard to imagine four dimensional objects passing through our space, but it's a fun exercise. We can imagine a sphere passing through Flatland; what would a 4-dimensional sphere look like passing through our space? We can imagine a 3-dimensional cube passing through Flatland (preferably at an angle as otherwise it's nothing, a full square for awhile, and then nothing again); what happens with a 4-dimensional square going through our space?
- Kepler's laws of planetary motion heavily use ellipses, hyperbolas and the like. These are the famous conic sections, and there's a beautiful unified theory of them.
- It's not hard to find functions of several variables. Baseball is filled with these nowadays; one very popular one is the runs created formula.

Wednesday, February 16. Today we continued with equations of lines and planes, and then ended with the advantages of changing coordinates. We discussed again the controversy of Eakins paitings (see the comments from Monday, February 14). As a liberal arts student, one of our goals is to get you to the point where you can talk to almost anyone for 15 minutes to an hour; this was useful for a job interview with a Human Resource person who was an art history major from Williams!
- As I've said in class, we could title the first week of the semester Applications of the Pythagorean Theorem. As a number theorist, it's hard for me not to discuss its generalizations. The Pythagorean Theorem says that for a right triangle, a² + b² = c², where a and b are the bases of our triangle and c is the hypotenuse. It is not immediately clear that there are integer solutions to this, but a little inspection turns up a few, such as (3, 4, 5), (5, 12, 13), and of course trivial modifications such as (6, 8, 10). It turns out there are infinitely many solutions in the integers, and there is even a way to generate all of these solutions, which are called Pythagorean triples. Click here for more information on the Pythagorean triples.
- One can of course ask about generalizations of the Pythagorean Theorem; the most famous is whether or not there are any non-trivial integer solutions to aⁿ + bⁿ = cⁿ, where n > 2 and abc is non-zero. This is the famous Fermat's Last Theorem, solved by Wiles / Taylor-Wiles using elliptic curves andmodular forms (on a personal note, I had the pleasure of teaching a class with Wiles on elliptic curves at Princeton in 2001). There are other generalizations, such as the Beal's Conjecture. Another wonderful generalization is Euler's sum of powers conjecture. For a nice occurrence of these in popular culture, see the Homer³ short from Treehouse of Horror VI. Full clip is here, but no sound on my computer. Sound works here, but it's in German, or if you prefer, in Spanish.
- We talked about how Newton was led to his Law of Gravity (it bothers me that if you wikipedia the phrase `Law of Gravity' you get this!) from Kepler's observational results. We'll eventually have a field trip to the rare books library to see first editions of all these key works.
- There are many different coordinate systems we can use; depending on the symmetry of the problem, frequently it is advantageous to use one system over another. We saw in class how complicated regions were reduced to simpler regions. As a rule of thumb, it's better to have a harder integral over a nicer region (rectangle, box) than a simpler integral over a more complicated region. Three of the most common coordinate systems (after Cartesian) are the following:
  - polar coordinates: used most often in the plane where our quantity depends only on the distance from the origin.
  - cylindrical coordinates: generalization of the above where now we live in three dimensional space, but the part depending on x and y only depends on x² + y².
  - spherical coordinates: another generalization to three dimensions, where we only depend on the distance from the origin.
  - Of course, the story does not end in three dimensions. For many problems we need to work with an n-dimensional sphere, and the resulting coordinate system.

Monday, February 14.
A common feature in several variables is to first recall the one variable case, and use that as intuition to describe what's happening. We started by reviewing the three different ways to write the equation of a line in the plane, point-slope, point-point and slope-intercept, and talked about the hidden vector lurking in the equationi of a line in a plane. We then generalized this to higher dimensions, and then wrote down the definition of a plane (we'll discuss alternate definitions involving normal vectors later in the course; note that planes arose in the Superbowl in 2010 as to whether or not the Saints had control when the ball broke the plane during the two point conversion; click here,click here or click here for more on breaking the plane in football).
- We discussed how there are several different but equivalent ways of writing the same expression. We can do it with vectors, as in (x,y,z) = P + tv, or we can do it as a series of equations, such as x = p₁ + tv₁, y = p₂ + tv₂, z = p₃ + tv₃, or as x_i = p_i + tv_i with i in {1,2,3}. You should use whichever way is easier for you to visualize. It is possible to get so caught up in reductions and compactifications that the resulting equation hides all meaning. A terrific example is the great physicist Richard Feynman's reduction of all of physics to one equation, U = 0, where U represents the unworldliness of the universe. Suffice it to say, reducing all of physics to this one equation does not make it easier to solve physics problems / understand physics (though, of course, sometimes good notation does assist us in looking at things the right way).
- A nice problem is to prove the following about perpendicular lines: the product of their slopes is always -1 if neither is parallel to the x- or y-axis. In some sense, this tells us that in the special case when the lines are the x- and y-axes, we should interpret the product of their slopes as -1, or in other words in this case 0 ·∞ = -1.
- There are many applications of equations of lines, planes and projections. One of my favorites comes from art. The painter Thomas Eakins projected pictures of people onto canvasses; this allowed him to have realistic pictures, and saved him hours of computations. Two pictures frequently mentioned are Arcadia and Mending the Net. He hid what he did; it wasn't until years later that people noticed he had done this. If memory serves, this was discovered when people were looking through photographs in an attic and noticed a picture of four people on a New Jersey highway who were identical to four people in a seascape. Upon closer inspection of the canvass, they noticed marks (which were partly hidden) indicating Eakins projected the image onto the canvass. Click here for more on the subject. See also here for a nice story on the controversy (the use of `technology' such as projectors in art). For a semi-current view on the merits of tracing, watch this video clip.
- There is an enormous literature on the applications of lines, planes, projections et cetera in art. The wikipedia article is a good starting point. Another fun example is the original movie Tron; here is the light cycle scene. Notice how back then almost everything is straight lines, and how the computers are dealing with the perspectives.
- The subject has advanced considerably over the years; ray tracing is huge now, and can do amazing things very fast.
- One final nice application is a paper by Byers and Henle determining where a camera was for a given picture, which allows us to do a great job comparing then and now.
- We discussed the equation for the angle between two vectors. Geometrically, it's clear that if we change the lengths of the vectors then we shouldn't change the angle; after a little inspection, we saw that our formula satisfies that property. It is a great skill to be able to look at a formula and see behavior like this. There is a rich history of applying intuition like this to problems. One example is dimensional (or unit) analysis,which is frequently seen in physics or chemistry; my favorite / standard example is the simple pendulum.
- The Cauchy-Schwarz inequality is one of the most important in mathematics; it's used all the time to bound quantities. My favorite application, which is quite advanced, is to the uncertainty principle in quantum mechanics! It turns out one can view the uncertainty principle as a mathematical statement about a function and its Fourier transform. See me if you want more details.
- There are lots of inequalities in mathematics. Another very useful one is the arithmetic mean - geometric mean; see also my handout with some proofs (written years ago in my Ohio State days)
- We will not cover determinants in great detail. For us, the most important property is that determinants are related to the volume of the span of the different directions.

Friday, February 11. We continued our list of applications of the Pythagorean Theorem. We saw how it leads to the law of cosines, which leads to our angle formula relating the angle in terms of the dot product. We then talked about determinants, which will be really useful when we get to the multidimensional change of variable formula. The cross product will be very useful in dealing with the geometry of various functions, and occurs all the time in physics and engineering, ranging from Maxwell's equations for electromagnetism to the Navier-Stokes equation for fluid flow.
- In one of the sections today, when asked for a relation between sine and cosine, someone mentioned the derivative of sin(x) is cos(x). In differential trigonometry, it is essential that we measure angles in radians. If we use radians, then the derivative of sin(x) is cos(x) and the derivative of cos(x) is -sin(x); this is not true if we use degrees. If we use degrees, we have pesky conversion factors of 360/2π to worry about. The proof of these derivatives follow from the angle addition formulas; let me know if you want more details about this (we'll mention this briefly when we do Taylor series of exp(x)).
- In the proof of the Law of Cosines, the key step was adding an auxiliary line to reduce the problem to the point where we could apply the Pythagorean Theorem. Learning how to add these auxiliary lines is one of the hardest things to do in math. As a good exercise, figure out what auxiliary lines to add to prove the angle addition formula for sine, namely sin(x+y) = sin(x) cos(y) + cos(x) sin(y); click here for the solution. For another example, click here. One thing to keep in mind is what do we know, what are we building upon. We know the Pythagorean formula; we thus want right triangles, which suggests drawing an altitude. There's a lot of nice theorems about altitudes and other such lines.
- In the proof that the area of the hyper-parallelogram is given by the absolute value of the determinant (in two dimensions) we wanted to replace the sin(theta) term with a funtion of cos(theta). Note how similar this is to the proof of the law of cosines; we again are trying to reduce our analysis to something known. We have formulas for the cosines of angles in terms of dot products, but not their signs.
- In one of the sections we talked a bit about the movie Flatland when discussing vectors and parallelograms. The original story is available here, while a trailer from the new movie is here. It's an interesting exercise to think about what life would be like confined to two dimension (think of how you eat and what happens after). Any move that has squaricles is worth seeing! Star Trek: The Next Generation dealt with two-dimensional life forms in the episode The Loss (part 1 part 2 part 3 part 4 part 5).
- In our course we only deal with integral dimensions, but that misses a lot! There are many natural phenomena that legitimately have a fractal dimension. There are famous papers trying to compute the length of the British coast; would you be surprised or not surprised to hear that the Finnish coast has a higher dimension than the British?
- At the end of the 11am section, I mentioned the following fun fact of the day: a medical researcher rediscovers integration and gets 75 citations! The article on this is here, while the paper is here.
Wednesday, February 9. Today we discussed the basic properties of vectors, specifically how to add, subtract, and rescale them.
- The proof that the length of a vector is the square-root of the sum of the squares is a nice example of a proof by induction (see also my notes here). There are many statements in mathematics that can be proved using this technique, and if you plan on continuing in math/physics this is worth learning.
- Years ago I prepared a short handout for some of my students on various proof techniques (click here); it goes through several of the standard methods.
- We ended the day with the definition of the inner or dot product. While our definition only works for vectors, it turns out this is one of the most useful ideas in mathematics, and can be generalized greatly. For example, we can talk about the dot product of functions! We've seen a bit how the dot product is related to angles and lengths, and thus we will find that we can discuss in a sensible manner what the `angle' is between sin(x) and cos(x)! A key part of the lecture was looking at special cases to test a claim (this is related to the extra credit problem due on Friday).
- Finally, we commented on how our analysis of the angle formula, while somewhat convincing, suffers a severe drawback: we're only looking at special vectors, either parallel or perpendicular. There's a real danger of drawing the wrong conclusion from special cases, as we saw from 16/64 and 19/95. For example, if we only looked at right triangles we'd think the sum of the squares of the shorter sides equals the square of the longer. We must check a truly generic case to get some real security; unfortunately, it's hard to check those cases as we don't know the angles! Related to this are some nice stories about people taking advantage of processes that were supposed to be random but weren't. A nice recent example is with scratch lottery tickets (see here for the Wired article, and here for another). For another example, there are some very small errors the Germans made with their Enigma code during WWII, which allowed the Allies to read all German military orders! See the Wikipedia article on Ultra (Ultra was the code given to allied decrypt efforts), as well as Articles from the NSA on cryptography (this is a link to many subpages). Two especially good and accessible ones deal with the German code Enigma, and Ultra, the allied deciphering of it. I strongly urge you to look at the links here. Another nice one is on the Battles of Coral Sea and Midway. An amusing story involves a Civil war message just decoded -- fortunately it wasn't needed! (Another version of the story here.) This is nice application of the Vigenere cipher (see also the notes by my colleague here on how to crack it). This is yet another example of what was supposed to be a random pattern not being truly random, and thus susceptible to attack.

Friday, February 4. The main result we proved today was the Pythagorean Theorem, which relates the length of the hypotenuse of a right triangle to the lengths of the sides (President Garfield is credited with a proof). For us, this result is important as gives us a way to compute the length of vectors. While we only proved it in the special case of a vector with two components, the result holds in general. Specifically, if v = (v₁, ..., v_n) then ||v|| = sqrt(v₁² + ... + v_n²). It is a nice exercise to prove this. One way is to use Mathematical Induction (one common image for induction is that of following dominoes); see also my handout on induction. Below are some additional remarks. These relate to material mentioned in class. The comments below are entirely for your personal enjoyment and edification. You do not need to read these for the class. These are meant to show how topics discussed arise in other parts of mathematics / science; these will not be on exams, you are not responsible for learning them, ....
- We also discussed notation for the natural numbers, the integers, the rationals, the reals and the complex numbers. We will not do too much with the complex numbers in the course, but it is important to be aware of their existence. Generalizations of the complex numbers, the quaternions, played a key role in the development of mathematics, but have thankfully been replaced with vectors (online vector identities here). The quaternions themselves can be generalized a bit further to the octonions (there are also the sedenions, which I hadn't heard of until doing research for today's comments).
- A natural question to ask is, if all we care about are real numbers, then why study complex numbers? The reason is that certain operations are not closed under the reals. For example, consider quadratic polynomials f(x) = ax² + bx + c with a, b and c real numbers. Say we want to find the roots of f(x) = 0; unfortunately, not all polynomials with real coefficients have real roots, and thus to find the solutions may require us to leave the real. Of course, you could say that if all you care about is real world problems, this won't matter as your solutions will be real. That said, it becomes very useful (algebraically) to allow imaginary numbers such as i = sqrt(-1). The reason is that it allows us a very clean way to manipulate many quantities. Our text has a great discussion of this on pages 54 to 61, especially the top of page 55. There is an explicit, closed form expression for the three roots of a cubic; while it may not be as simple as the quadratic formula, it does the job. Interestingly, if you look at x³ - 15x - 4 = 0, the aforementioned method yields (2 + 11i)^1/3 + (2-11i)^1/3. It isn't at all obvious, but algebra will show that this does in fact equal 4! As you continue further and further in mathematics, the complex numbers play a larger and larger role.
- Later in the semester we will revisit Monte Carlo Integration, called by many the most important mathematical paper of the 20th century. Sadly, most integrals cannot be evaluated in closed form, and we must resort to approximation methods.
  - Metropolis: The Beginning Of The Monte Carlo Method
  - Metropolis and Ulam: The Monte Carlo Method
  - Note on the origins of the method: http://www.fas.org/sgp/othergov/doe/lanl/pubs/00326866.pdf
- Sabermetrics is the `science' of applying math/stats reasoning to baseball. The formula I mentioned in class is what's known as the log-5 method; a better formula is the Pythagorean Won - Loss formula (someone linked my paper deriving this from a reasonable model to the wikipedia page). ESPN, MLB.com and all sites like this use the Pythagorean win expectation in their expanded series. My derivation is a nice exercise in multivariable calculus and probability; we will either derive it in class or I'll give a supplemental talk on it.
- In general, it is sadly the case that most functions do not have a simple closed form expression for their anti-derivative. Thus integration is magnitudes harder than differentiation. One of the most famous that cannot be integrated in closed form is exp(-x²), which is related to calculating areas under the normal (or bell or Gaussian) curve. We do at least have good series expansions to approximate it; see the entry on the erf (or error) function.
  - In class we mentioned that the anti-derivative of ln(x) is x ln(x) - x; it is a nice exercise to compute the anti-derivative for (ln(x))ⁿ for any integer n. For example, if n=4 we get 24 x-24 x Ln[x]+12 x Ln[x]²-4 x Ln[x]³+x Ln[x]⁴
- The Fibonacci numbers show up in a variety of places. They satisfy the following recurrence relation: F_{n+2} = F_{n+1} + F_n (with the initial conditions F_0 = 0 and F_1 = 1). After a little inspection one sees that the entire sequence is determined once we know two consecutive numbers, as we can just use the recurrence relation. There are many fun applications of Fibonacci (and other recurrence relations) in nature; perhaps my favorite is proving why Double Plus One is a bad strategy in roulette (though many website, like the one here, don't seem to realize the danger, or perhaps deliberately avoid stating it!). If you're interested in gambling applications of this (or other aspects), just let me know.
- Finally, the quest to understand the cosmos played an enormous role in the development of mathematics and physics. For those interested, we'll go to the rare books library and see first editions of Newton, Copernicus, Galileo, Kepler, .... Some interesting stories below; see also a great article by Isaac Asimov on all of this, titled The Planet That Wasn't.