Additional comments related to material from the 
class. If anyone wants to convert this to a blog, let me know. These additional 
remarks are for your enjoyment, and will not be on homeworks or exams. These are 
just meant to suggest additional topics worth considering, and I am happy to 
discuss any of these further.
  - Wednesday, 
  May 8. We 
  ended the semester by discussing the M&M game (slides 
  are here). Lots of great items to look at. The main lesson of today 
  brought us back to a theme from the start of the semester: the importance of 
  asking open ended questions, of looking for connections between different 
  fields and applying methods and tools from one area in another. Experience is 
  good -- the more things you know, the more connections you can see. You need 
  to separate yourself from the pack -- you don't want to attack the same 
  problems with the same tools as everyone else. That's a recipe for mediocrity 
  (unless you happen to get lucky).
  - We used a log-log plot 
  to get a better sense of what the data is doing -- the way we present information 
  is very important.
 
  - We talked about memoryless processes again -- a great way to reduce the 
  analysis of something complicated to something simpler.
 
  - The general solution can be written as a special value of a special 
  function, a hypergeometric function.
 
  - Video on the application of recurrence relations (a generalization of 
  Fibonacci numbers) to roulette.
 
  - One can get a closed form expression for the Fibonacci numbers by using 
  generating functions. This is a nice application of series.
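  A quick sketch of that derivation (assuming the usual normalization \(F_1 = 
  F_2 = 1\)): let \(g(x) = \sum_{n \ge 1} F_n x^n\). The recurrence \(F_n = 
  F_{n-1} + F_{n-2}\) forces \(g(x) = x/(1 - x - x^2)\). Factoring \(1 - x - 
  x^2 = (1 - \varphi x)(1 - \psi x)\) with \(\varphi = (1+\sqrt{5})/2\) and 
  \(\psi = (1-\sqrt{5})/2\), partial fractions and the geometric series give 
  \[ g(x) = \frac{1}{\sqrt{5}}\left(\frac{1}{1-\varphi x} - \frac{1}{1-\psi x}\right) 
  = \sum_{n \ge 1} \frac{\varphi^n - \psi^n}{\sqrt{5}}\, x^n, \] 
  so \(F_n = (\varphi^n - \psi^n)/\sqrt{5}\), which is Binet's formula.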
 
  - We used the Online Encyclopedia of Integer Sequences (OEIS): enter the 
  first few terms of a sequence and get a sense of what comes next. It's 
  fascinating what you'll find there. If you put in some Fibonacci terms, such 
  as 3, 5, 8, 13, 21, you get the Fibonacci numbers and so much more (including 
  something involving the Rubik's Cube).
 
     
  - Monday, 
  May 6. I 
  am writing a module on streaming video; if you would like to see a draft 
  please email me and let me know. Below are links to some of the key points.
  My slides are available here.
  
   
  - Friday, May 
  3. For 
  details of today's lectures, see my Lecture 
  notes on Green's Theorem. Today 
  we discussed some of the Big Three theorems of Vector Calculus (Green's 
  Theorem, Gauss' 
  Divergence Theorem, and Stokes' 
  Theorem). These theorems are massive generalizations of the Fundamental 
  Theorem of Calculus, which can be generalized even more. The idea is to 
  relate the integral of the derivative of something over a region to the 
  integral of the something over the boundary of the region. To state these 
  theorems requires many concepts from vector calculus (parametrizing curves, 
  vectors, ...) as well as the Change of Variable theorem (converting integrals 
  over curves and surfaces to integrals over simpler curves and surfaces).
    - To truly see and appreciate the richness of the three theorems (which 
    are really three variants of the same theorem), one must be in at least 
    three dimensions. There, 
    Stokes' Theorem states that the integral of a certain function over a
    surface equals the integral of another over the boundary curve. This means 
    that many integrals turn out to be the same.
 
    - To see the equivalence of these formulations requires differential 
    forms. Frequently it is not immediately clear how to generalize a 
    concept to higher dimensions or other settings.
 
    - While we only briefly touched on the subject, conservative 
    forces are extremely 
    important in physics and engineering, primarily because of a wonderful 
    property they have: the work done in moving an object from A to B is 
    independent of the path taken if the exerted force is conservative. Many of 
    the most important forces in classical mechanics are taken to be 
    conservative, such as gravity and electricity. 
    In modern physics, these forces are replaced with more complicated objects. 
     One of the central quests in modern physics is to unify 
     the various fundamental forces (gravity, the strong and weak nuclear 
     forces, and electromagnetism).
 
    - Click here for more on 
    divergence, and click 
    here for more on curl. Another related object (one we have seen many 
    times) is the gradient. 
     All of these involve the same differential operator, called del 
     (and represented with a nabla). We used our intuition for vectors to 
     define new combinations involving the del operator (the curl and the 
     divergence). While our intuition comes from vectors, we must be careful as 
     we do not have commutativity. For example, \(\nabla \cdot F\) is not the same as 
     \(F \cdot \nabla\); the first is a scalar (number) while the second is an operator. Click
    here for more on differential operators. For those who want to truly go 
    wild on operators, modern quantum mechanics replaces concepts like position 
    and momentum with differential operators (click 
    here for the momentum operator)! This allows us to rewrite the Heisenberg 
    uncertainty principle in the following 
    strange format.
 
    - One of the most famous applications of these concepts is the Navier-Stokes 
    equation, which is one of the Millennium 
    Problems (solving one of
    these is probably the hardest path to one 
    million dollars!). The Navier-Stokes equation describes the motion of 
    fluids, which not surprisingly has numerous practical (as well as 
    theoretical) applications. Click 
    here for a nice derivation, which includes many of the new operators we 
    saw today.
 
    - Another place where gradients, curls and divergences appear is the Maxwell 
    equations for electricity and magnetism; you can view 
    the equations here.
 
    - The General Stokes Theorem is a massive generalization of the 
    fundamental theorem of calculus. The idea of formally moving the derivative 
    from the function to the region of integration is meant to be suggestive, 
    but of course is in no way a proof. Notation should help us 
    see connections and 
    results. The great physicist Richard 
    Feynman showed that all of 
    physics is equivalent to solving the equation \(U = 0\), where \(U\) measures 
    the unworldliness of everything. It is made up of squaring the differences 
    between the left and right hand sides of every physical law. Thus it has 
    terms like \((F - ma)^2\) and \((E - mc^2)^2\). It is a concise way 
    of encoding information, but it is not useful; everything is hidden. This is 
    very different than the vector calculus formulations of electricity and 
    magnetism, which do aid 
    our understanding. For more information, skim 
    the article here (search for unworldliness if you wish).
 
    - We saw that we can compute the lengths of curves by evaluating integrals 
    of \(||c'(t)||\), where \(c(t) = (x(t), y(t), z(t))\) is our curve. While 
    this formulation immediately reduces the problem of finding lengths to a 
    Calc II problem, in general these are very difficult integrals, and 
    frequently cannot be done in closed form even for simple shapes. For 
    example, for extra credit find the length of the ellipse \((x/a)^2 + (y/b)^2 
    = 1\). Click 
    here for the solution (the 
    answer involves the elliptic 
    integral of the second kind).
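    Here is a minimal numerical sketch of that computation (the function name 
    and the parametrization \(c(t) = (a\cos t, b\sin t)\) are my own choices): 
    it approximates the perimeter by integrating \(||c'(t)||\) with Simpson's 
    rule, and for \(a = b = 1\) recovers \(2\pi\).

        import math

        def ellipse_perimeter(a, b, n=10_000):
            # Integrate ||c'(t)|| = sqrt(a^2 sin^2 t + b^2 cos^2 t) over
            # [0, 2 pi] with the composite Simpson's rule (n must be even).
            speed = lambda t: math.sqrt((a * math.sin(t))**2 + (b * math.cos(t))**2)
            h = 2 * math.pi / n
            total = speed(0) + speed(2 * math.pi)
            for k in range(1, n):
                total += (4 if k % 2 else 2) * speed(k * h)
            return total * h / 3

        print(ellipse_perimeter(1, 1))  # ~6.28319 = 2 pi
        print(ellipse_perimeter(2, 1))  # ~9.68845 (an elliptic integral in closed form)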
 
    - We talked today about generalizing the Fundamental 
    Theorem of Calculus. There are not that many fundamental theorems in 
    mathematics -- we do not use the term lightly! Other ones you may have seen 
    are the Fundamental 
    Theorem of Arithmetic and the Fundamental 
    Theorem of Algebra; click 
    here for a list of more fundamental theorems (including 
    the Fundamental 
    Theorem of Poker!).
 
    - Today was a fast introduction to path 
    integrals, line integrals, and Green's 
    Theorem (which is a special 
    case of the Generalized Stokes' 
    Theorem). While our tour of these subjects has to be rushed in a 12 week 
    course, if you are continuing in certain parts of math, physics or 
    engineering you will meet these again and again (for example, see Maxwell 
    equations for electricity and magnetism). In fact, one can view all of 
    classical mechanics as path 
    integrals where the trajectory of the particle (its c(t)) minimizes the 
    action; there is also a path 
    integral approach to quantum mechanics.
      - For those continuing in mathematics or physics, you will see these 
      ideas again if you take complex 
      analysis. In particular, one of the gems of that subject is Cauchy's 
      Integral Theorem. A complex differentiable function satisfies what is 
      called the Cauchy-Riemann 
      equations, and these are essentially the combination of partial 
      derivatives one sees in Green's theorem. In other words, the mathematics 
      used for Green's theorem is crucial in understanding functions of a 
      complex variable.
 
      - I consider it one of the most beautiful gems in mathematics 
      that we can in some sense move the derivative of the function we're 
      integrating to act on the region of integration! This allows us to 
      exchange a double integral for a single integral for Green's theorem (or a 
      triple integral for a double integral in the divergence theorem). As we've 
      seen constantly throughout the year, often one computation is easier than 
      another, and thus many difficult area or volume integrals are reduced to 
      simpler, lower dimensional integrals.
 
      - The fact that \(\int_{t = a}^{b} \nabla(f)(c(t)) \cdot c'(t) dt = 
      f(c(b)) - f(c(a))\) means that this integral does not depend on the path. 
      If a vector field \(F = (F_1, F_2, F_3)\) equals \(\nabla(f)\) for some 
      \(f\), we say \(F\) is a conservative 
      force field and \(f\) is 
      the potential. 
      The fact that these integrals do not depend on the path has, as you would 
      expect, profound applications.
 
      - This is a good point to stop and think about the number of spatial 
      dimensions in the universe. Imagine a universe with two point masses under 
      gravity, and assume gravity is proportional to \(1/r^{n-1}\) with \(r\) 
      the distance between the masses and \(n\) the number of spatial 
      dimensions. If there are three or more dimensions, then the work done in 
      moving a particle from infinity to a fixed, non-zero distance from the 
      other mass is finite, while if there are two dimensions the work is 
      infinite! One should of course ask why the correct generalization to other 
      dimensions is \(1/r^{n-1}\) and not \(1/r^2\) always. There is a nice 
      geometric justification in terms of flux and surface area; the surface 
      area of a sphere grows like \(r^2\) and thus the only way to have the 
      total flux of force out of it be constant is to assume the force drops 
      like \(1/r^2\); click 
      here for a bit on the justification of inverse-square laws.
 
      - Speaking of dimensions, one of my favorite problems from undergraduate 
      days was the Random 
       Walk. In one dimension, imagine a person so completely drunk that he/she 
       has a 50% chance at any moment of stepping to the left or the right; what 
       is the probability the drunkard eventually returns home? It turns out that 
       this happens with probability 1. In two dimensions, we have a 25% chance of 
       moving north, south, east or west, and again the probability of returning 
       is 1. In three dimensions, however, the drunkard only returns home with
      probability about 34%. As my professor Peter 
      Jones said, a 
      three-dimensional universe is the smallest one that could be created that 
      will be interesting for drunks, as they really get to explore! These 
      random walk models are very important, and have been applied to economics 
      (the random 
      walk hypothesis), as well as playing a role in statistical 
      mechanics in physics.
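       A small Monte Carlo sketch of this (my own illustration; estimating the 
       chance of returning within a fixed number of steps gives a lower bound 
       on the true return probability):

           import random

           def return_within(dim, steps=1000, trials=20_000):
               # Fraction of simple random walks on Z^dim that revisit
               # the origin within `steps` steps.
               hits = 0
               for _ in range(trials):
                   pos = [0] * dim
                   for _ in range(steps):
                       axis = random.randrange(dim)
                       pos[axis] += random.choice((-1, 1))
                       if not any(pos):  # back at the origin
                           hits += 1
                           break
               return hits / trials

           for d in (1, 2, 3):
               print(d, return_within(d))
           # d = 1 is already near 1; d = 2 creeps toward 1 very slowly;
           # d = 3 hovers near Polya's constant, about 0.34.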
 
    
     
  
     
  - Wednesday, 
  May 1. We 
  discussed convergence of Taylor series, theoretically using the Mean Value 
  Theorem (though a better argument gives a smaller error) and experimentally by 
  looking at a
  
  Mathematica notebook on \(\cos(x)\). We then talked about multivariable 
  Taylor series, and a trick to quickly evaluate them. We ended with a 
  discussion of the
  second 
  derivative test in several variables.
  For more on Taylor 
  series see the Wikipedia page.
    
    - Our proof of how well Taylor 
    series approximate heavily 
    involves the Mean 
    Value Theorem. Taylor series involve writing our function as a 
    combination of the functions \(1, x, x^2, x^3\) and so on; other 
    possibilities exist. We could use trigonometric 
    polynomials, writing our function as combinations of \(\sin(nx)\) and 
    \(\cos(nx)\) where \(n\) ranges over all integers. This leads to Fourier 
    series, which are very useful (and often have great convergence 
    properties). What is so great about all of these is that we can transmit 
    just a few coefficients and then rebuild the function. Why does this work? 
    Rather than transmitting all values of the function, by sending just a few 
    coefficients we can exploit the fact that we have a powerful computer on our 
    end to rebuild the function. If you want to send a video, for example, you 
    could have a two dimensional function \(f(x,y)\), where \(f(x,y)\) 
    represents the color of the pixel at \((x,y)\). We need to reconstruct the 
    function, but we don't want to send the value of each pixel. Enter Fourier 
    series! We now index by time, and consider \(f(x,y;t)\); actually, it's 
    probably better to send \(g(x,y;t) = f(x,y;t) - f(x,y;t-1)\).
 
    - Finally, one can generalize even further and consider orthogonal 
    polynomials.
 
    - We saw how well Taylor series approximate functions. The Mathematica 
    program here is 
    (hopefully) easy to use. You can specify the point and the number of terms 
    of the Taylor series of \(\cos(x)\) to use. At first it might seem surprising 
    that there is no improvement in fit when we go from a second order to a 
    third order Taylor series approximation; however, we have \(\cos(x) = 1 - 
    x^2/2! + x^4/4! - x^6/6! + \cdots\). In other words, all the odd derivatives 
    vanish at the origin, and thus there is no improvement at the origin in 
    adding a cubic term (i.e., the best cubic coefficient at the origin is 0). If 
    we go to a fourth order, we do see improvement. By \(n=10\) or \(12\) we are 
    already getting essentially an entire period correct; by \(n=40\) we have 
    several cycles.
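    A tiny sketch (my own illustration) that makes the order-2 versus order-3 
    point numerically: the two partial sums agree, and the error only drops at 
    order 4.

        import math

        def cos_taylor(x, order):
            # Partial sum of the Taylor series of cos about 0,
            # keeping terms up to degree `order`.
            return sum((-1)**k * x**(2*k) / math.factorial(2*k)
                       for k in range(order // 2 + 1))

        x = 0.5
        for order in (2, 3, 4, 5):
            approx = cos_taylor(x, order)
            print(order, approx, abs(approx - math.cos(x)))
        # Orders 2 and 3 print the same error (the cubic coefficient
        # vanishes); order 4 improves it.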
 
    - For many purposes, we just need a first order or second order Taylor 
    series; one of my favorites is the proof of the Central 
    Limit Theorem in probability. 
    One of my favorite proofs involves second order Taylor expansions of the Fourier 
    Transforms (these were 
    mentioned in the additional comments on Friday, March 12).
 
    - If \(f(x)\) equals its infinite Taylor series expansion, can we 
    differentiate term by term? This needs to be proved, and is generally done 
    in a real analysis course. For some functions such as \(\exp(x)\) we can 
    justify the term by term differentiation, but note that this is something 
    which must be 
    justified.
 
    - A terrific application of just doing a first order Taylor expansion is Newton's 
    Method.
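    As a minimal sketch (my own illustration): replace \(f\) by its tangent 
    line at the current guess and jump to the tangent's root.

        def newton(f, fprime, x0, tol=1e-12, max_iter=50):
            # Each step solves the first order Taylor expansion
            # f(x) + f'(x) (x_new - x) = 0 for x_new.
            x = x0
            for _ in range(max_iter):
                step = f(x) / fprime(x)
                x -= step
                if abs(step) < tol:
                    break
            return x

        # Example: sqrt(2) as the positive root of x^2 - 2.
        print(newton(lambda t: t*t - 2, lambda t: 2*t, 1.0))  # 1.41421356...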
 
    - For some reason, most books don't mention the trick on how to quickly 
    compute higher order Taylor expansions in several variables. The idea is to 
    'bundle' variables together and use one-dimensional expansions. For example, 
    consider \(f(x,y) = \exp(-(x^2 + y^2)) \cos(xy)\). We saw in class how 
    painful it is to compute the Hessian, the matrix of second partial 
    derivatives. That involved either two product rules or knowing the triple 
    product formula. If we use our trick, it's much easier. Note \(\exp(u) = (1 
    + u + u^2/2! + \cdots)\) and \(\cos(v) = (1 - v^2/2! + \cdots)\). A second 
    order Taylor expansion means keep only terms with no \(x\)'s and \(y\)'s, 
    with just \(x\) or \(y\), or with just \(x^2\), \(xy\) or \(y^2\) (a third 
    order would allow terms such as \(x^3, x^2 y, x y^2, y^3\), and so on). Thus 
    we expand \(\exp(u) \cos(v)\) and then set \(u = -(x^2+y^2)\) and \(v = xy\). 
    For \(\exp(u)\), we just need \(1 + u\), as already the \(u^2/2\) term will 
    be order \(4\) when we substitute \(-(x^2+y^2)\). For \(\cos(v)\), we only 
    keep the \(1\) as \(v^2/2\) is order \(4\). Thus the Taylor expansion of 
    order \(2\) is just \((1 -(x^2+y^2)) (1) = 1 - x^2 - y^2\); this is a lot 
    faster than the standard method! That method works in general, but there are 
    so many cases where this is faster that it's worth knowing.
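    A quick symbolic check of that expansion (a sketch using sympy; the 
    variable names are mine):

        import sympy as sp

        x, y = sp.symbols('x y')
        f = sp.exp(-(x**2 + y**2)) * sp.cos(x*y)

        # Order-2 Taylor polynomial at (0,0), straight from the definition:
        # partial derivatives times monomials over factorials.
        taylor2 = sum(
            sp.diff(f, x, i, y, j).subs({x: 0, y: 0}) * x**i * y**j
            / (sp.factorial(i) * sp.factorial(j))
            for i in range(3) for j in range(3) if i + j <= 2
        )
        print(sp.expand(taylor2))  # 1 - x**2 - y**2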
 
    
  
     
  - Monday, April 
  29. The 
  main item today was to talk about infinite Taylor series, and see the 
  applications to trigonometry.
    
    - We had a quick introduction to complex 
    numbers \(z = a + ib\), with 
    \(i = \sqrt{-1}\). If \(w = c + id\), then \(z + w = (a+c) + i(b+d)\), and 
    \(zw = (ac-bd) + i(bc+ad)\). Complex numbers play an important role in many 
    subjects, including linear algebra. If you have a general quadratic 
    equation, \(ax^2 + bx + c = 0\), even if \(a, b\) and \(c\) are real then it 
    is not the case that all roots must be real. What is fascinating is that if 
    you have a polynomial with complex coefficients of any degree, all the roots 
    are complex. In other words, once you add in \(i = \sqrt{-1}\), a root of 
    \(x^2 + 1 = 0\), you don't need to add anything else!
      - The complex numbers can be generalized a bit to the Quaternions and 
      the Octonions. 
      The story of the
      discovery of 
      the Quaternions by
      Hamilton 
      is well worth the read.
 
      - Notice that we can view complex numbers as vectors with two real 
      components, but with additional properties. Thus studying them is a nice 
      way to review some of the material we covered earlier. For example, if \(z 
      = a+ib\) and \(\overline{z} = a-ib\) is the complex conjugate of \(z\), 
       then \(z\overline{z} = a^2 + b^2\). Notice how similar this is to the 
       square of the length of the vector \((a,b)\), which is \(a^2+b^2\).
      
 
    
     
    - In differential trigonometry, everything comes down to the limit as 
    \(h\) tends to zero of \(\sin(h)/h\); this limit is only 1 in radians (and 
    thus the derivative of sine is not cosine if we measure in degrees!). One 
     can prove this limit geometrically, as is often done, and then obtain the 
    derivatives by using the angle addition formulas. We sketch another avenue 
    to these addition formulas from Taylor series. The Pythagorean 
    Theorem says 
    \(\cos^2(x) + \sin^2(x) = 1\). There are many ways to obtain this formula. 
     Perhaps one of the most useful is the Euler-Cotes 
     formula, \(\exp(ix) = \cos(x) + i\sin(x)\). One can essentially 
    derive all of trigonometry from this relation, with just a little knowledge 
    of the exponential 
    function. Specifically, we have \(\exp(z) = 1 + z + z^2/2! + z^3/3! + \cdots\). 
    It is not at all clear from this definition that \(\exp(z) \exp(w) = \exp(z+w)\); 
    this is a statement about the product of two infinite sums equaling a third 
    infinite sum. It is a nice exercise in combinatorics to show that this 
    relation holds for all complex \(z\) and \(w\). That proof uses the
    Binomial Theorem 
    and binomial 
    coefficients, and a change of variables (replace the double sum over 
    rows and columns with a diagonal one).
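     A sketch of that computation: 
     \[ \exp(z)\exp(w) = \sum_{m=0}^{\infty}\frac{z^m}{m!} \sum_{n=0}^{\infty}\frac{w^n}{n!} 
     = \sum_{N=0}^{\infty} \sum_{m+n=N} \frac{z^m w^n}{m!\, n!} 
     = \sum_{N=0}^{\infty} \frac{1}{N!} \sum_{m=0}^{N} \binom{N}{m} z^m w^{N-m} 
     = \sum_{N=0}^{\infty} \frac{(z+w)^N}{N!} = \exp(z+w), \] 
     where the middle step is exactly the change of variables to a diagonal sum, 
     and the last step uses the Binomial Theorem.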
    
      - Taking the above identities, we sketch how to derive all of 
      trigonometry! Let's prove the angle addition formulas. We have \(\exp(ix) 
      = \cos(x) + i\sin(x)\) and \(\exp(iy) = \cos(y) + i\sin(y)\). Then \(\exp(ix) 
      \exp(iy) = [\cos(x) + i\sin(x)] [\cos(y) + i\sin(y)]\) \(= [\cos(x) \cos(y) 
      - \sin(x) \sin(y)]\) \(+ i [\sin(x) \cos(y) + \cos(x) \sin(y)]\); however, 
      \(\exp(ix) \exp(iy) = \exp(i(x+y)) = \cos(x+y) + i \sin(x+y)\) by Euler's 
      formula. The only way two complex numbers can be equal is if they have the 
       same real and the same imaginary parts. Thus, equating these yields \(\cos(x+y) 
       = \cos(x) \cos(y) - \sin(x) \sin(y)\) and \(\sin(x+y) = \sin(x) \cos(y) + \cos(x) \sin(y)\).
 
      - It is a nice exercise to derive all the other identities. One can even 
      get the Pythagorean theorem! To obtain this, use \(\exp(ix) \exp(-ix) = 
       \exp(0) = 1\). Note that \(\exp(-ix) = \cos(-x) + i\sin(-x) = \cos(x) - i\sin(x)\), 
      where we used cosine is even and sine is odd.
 
      - We thus see there is a connection between the angle addition formulas 
      in trigonometry and the exponential addition formula. Both of these are 
      used in critical ways to compute the derivatives of these functions. For 
      example, these formulas allow us to differentiate sine, cosine and the 
      exponential functions anywhere once we know their derivative at just one 
      point. Let \(f(x) = \exp(x)\). Then \(f'(x) = \lim_{h\to 0} [f(x+h) - f(x)]/h 
      = \lim_{h\to 0} [\exp(x+h) - \exp(x)] / h = \lim_{h\to 0} [\exp(x) \exp(h) 
      - \exp(x)] / h = \exp(x) \lim_{h\to 0} [\exp(h) - 1] / h\); as \(\exp(0) = 
      1\), we find \(f'(x) = \exp(x) \lim_{h\to 0} [f(h) - f(0)] / h = \exp(x) 
      f'(0)\); thus we know the derivative of the exponential function 
      everywhere once we know the derivative at 0! One finds a similar result 
      for the derivatives of sine and cosine (again, this shouldn't be 
      surprising as the functions are related to the exponential through Euler's 
      formula).
 
    
     
    - Another application of the exponential function is to take the 
    derivative of \(x^r\) for general \(r\). If \(r\) is an integer we can do it 
    via the binomial 
    theorem, which gives us the expansion for \((x+y)^n\) for integer \(n\) 
    (you might know this from Pascal's 
    Triangle). To take the derivative of \(x^{3/2}\) we proceed via the 
    Chain rule: if \(f(x) = x^{3/2}\) then \(g(x) = f(x)^2 = x^3\); we then get 
    \(2 f(x)f '(x) = 3 x^2\); substituting for \(f(x)\) and isolating the 
    derivative \(f '(x)\) gives \(f '(x) = (3/2) x^{1/2}\). If now we have 
    \(x^{\sqrt{2}}\), this is harder. What do we even mean by a number to an 
    irrational power? If we write \(x^{\sqrt{2}}\) as \(e^{y(x)}\), we see \(y(x) 
    = \sqrt{2} \ln(x)\). Thus \(x^{\sqrt{2}} = \exp(\sqrt{2} \ln(x))\); we take 
    the derivative of this using the chain rule, and after some algebra find the 
    derivative of \(x^{\sqrt{2}}\) is \(\sqrt{2} x^{\sqrt{2}-1}\). It's a bit 
    amazing that to find the derivative of \(x^r\) in general requires us to 
    know the exponential function!
 
    - If we look at \(\cos(ix)\) and \(\sin(ix)\), quantities like this can be 
    transformed into expressions that make sense! If \(\exp(ix) = \cos(x) + i \sin(x)\) 
    and \(\exp(-ix) = \cos(x) - i \sin(x)\), then after some algebra we find \(\cos(x) 
    = (\exp(ix) + \exp(-ix)) / 2\) and \(\sin(x) = (\exp(ix) - \exp(-ix)) / 
    2i\). Using these, we can now make sense of \(\cos(ix)\) or even \(\cos(a+ib)\)! 
    This leads to the hyperbolic 
    functions. So yes, it does make sense to talk about quantities such as 
    \(\cos(i)\)!
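    A one-line numerical check (my own illustration, using Python's cmath 
    module): \(\cos(i)\) really is \(\cosh(1)\).

        import cmath, math

        print(cmath.cos(1j))   # (1.5430806348152437-0j)
        print(math.cosh(1))    # 1.5430806348152437
        print((cmath.exp(1j * 1j) + cmath.exp(-1j * 1j)) / 2)  # same value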
 
    
  
   
   
  - Friday, April 
  26. Today 
  we discussed Taylor's theorem. This is one of the most important applications 
  of calculus. It allows us to replace complicated functions with simpler ones. 
  There are numerous questions to ask.
    - Are Taylor series unique? Yes. The definition just involves taking sums 
    of derivatives; the process is well-defined.
 
    - Does every infinitely differentiable function equal its Taylor series 
     expansion? Sadly, no; the function \(f(x) = \exp(-1/x^2)\) if \(x \neq 0\) and 
     \(0\) if \(x=0\) is the standard example. This function causes enormous 
     problems in probability. There are many functions which do equal their own 
     Taylor series expansion, such as \(\exp(x), \cos(x)\) and \(\sin(x)\). It's 
     not surprising that these three are listed together, as we have the 
     wonderful Euler-Cotes 
     formula: \(\exp(i x) = \cos(x) + i \sin(x)\), with \(i 
     = \sqrt{-1}\). At first this formula doesn't seem that important; after 
    all, we mostly care about real quantities, so why complexify our life by 
    adding complex (i.e., imaginary) numbers? Amazingly, even for real 
    applications (applications where everything is real), complex numbers play a 
    pivotal role. For example, note that a little algebra gives \(\cos(x) = (\exp(i 
    x) + \exp(-i x)) / 2\) and \(\sin(x) = (\exp(i x) - \exp(-i x)) / 2i\). Thus 
    properties of the exponential function transfer to our trig functions. The hyperbolic 
    cosine and sine functions are 
     similarly defined; \(\cosh(x) = \cos(i x) = (\exp(x) + \exp(-x)) / 2\). 
    The Foxtrot strip 
    below (many thanks to the author, Bill 
    Amend, for permission to post) illustrates the confusions that can 
    happen between hyperbolic and regular trig functions (for 
    extra credit, why does Eugene know that the calculator cannot be giving the 
    right answer?). 
    It's worth noting that the formula \(\exp(i x) = \cos(x) + i \sin(x)\) 
    allows us to derive ALL trig identities painlessly!
 
    - FoxTrot (c) Bill Amend. Used by permission of Universal Uclick. All 
    rights reserved.
 
    - How easy are Taylor series to use? If we keep just a few terms, it's not 
    too bad; however, as the great Foxtrot strip below shows, it's not always 
    clear how nicely something simplifies.
 
    - FoxTrot (c) Bill Amend. Used by permission of Universal Uclick. All 
    rights reserved.
 
    - In the strip above, notice the large factorials in 
    the denominator. Note 52! is about \(10^{68}\); in other words, these 
    terms are small! For interest, 52! is the number of ways (with order 
    mattering) of arranging a standard deck of cards. There are about \(10^{85}\) or 
    so subatomic thingamabobs in 
    the universe; we quite quickly reach numbers this high (a deck with 70 
    cards more than suffices; in other words, we could not have each subatomic 
    object in the universe represent a different shuffling of a deck of 70 
    cards). In a related note, it's important to think a bit and decide what 0! 
    should be. It simplifies many formulas to have 0! = 1, and we can make this 
    somewhat natural by saying there is only 1 way to do nothing 
    (mathematically, of course). The 
    definition of the factorial function on Wikipedia talks a little bit about 
    this.
 
    - Unlike 
    0!, \(0^0\) is a bit more controversial as to what the definition should be. 
    As I don't want to pressure anyone, I will not publicly disclose where I 
    stand in the great debate, though I'm happy to tell you privately / through 
    email.
 
    - It's worth remarking on why we have n! in the denominators. This is to 
    ensure that the nth derivative of our function equals the nth derivative of 
    the Taylor expansion at the point we're expanding. In other words, we're 
    matching up to the first 2 derivatives for the 2nd order Taylor expansion, 
    up to the first 3 for the 3rd order Taylor expansion, and so on. It isn't 
    surprising that we should be able to do a good job; the more derivatives we 
    use, the more information we have on how the function is changing near the 
    key point.
 
    - For many purposes, we just need a first order or second order Taylor 
    series; one of my favorites is the proof of the Central 
    Limit Theorem in probability. 
    One of my favorite proofs involves second order Taylor expansions of the Fourier 
    Transforms (these were 
    mentioned in the additional comments on Friday, March 12).
 
    - If \(f(x)\) equals its infinite Taylor series expansion, can we 
    differentiate term by term? This needs to be proved, and is generally done 
     in a real analysis course. For some functions such as \(\exp(x)\) we can 
    justify the term by term differentiation, but note that this is something 
    which must be 
    justified.
 
    - A terrific application of just doing a first order Taylor expansion is Newton's 
    Method.
 
  
   
 
  - Wednesday, 
  April 24. We 
  finished our unit on basic properties of sequences and series -- now on to 
  Taylor series!
  - 
  
  We've mentioned Conway's Game 
  of Life a 
  few times, as well as the 
  field of Cellular 
  Automata (which is huge nowadays). You 
  can play it online here (I don't care for the soundtrack and mute it). 
  There are lots of good videos about the Game of Life. Here is Gosper's 
  glider gun, and a breeder 
  leaving Gosper glider guns in its wake. The wikipedia page has a lot of 
  great information on the history and applications of this. One particularly 
  nice bit is on the difficulty of programming, in particular, storing the 
  values of the cells. As most cells don't change, it seems wasteful to keep 
  updating cells that aren't changing, and a more efficient way of conveying 
  change is needed. This idea is not limited to the Game of Life, but applies 
  for instance to streaming 
  video!
 
  - A nice application of sequences and series is to the strategy of double 
  plus one for roulette, 
  and why that is such a bad idea. Using some linear algebra one can write down 
  explicit solutions for these finite sums. In particular, this leads to the 
  topic of difference 
  equations and Binet's 
  formula. I made a nice video on this with OIT:
  double plus ungood.
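  Here is a rough simulation sketch of that strategy (the parameters -- an 
  American wheel's 18/38 win chance, a 100-unit bankroll, a 500-unit table 
  limit, a 10-win goal -- are my own assumptions for illustration):

      import random

      def session(win_p=18/38, goal=10, limit=500, bankroll=100):
          # Double-plus-one: after each loss the next bet is 2*bet + 1,
          # so any win recovers the run's losses with a small profit.
          won = 0
          while won < goal:
              bet = 1
              while True:
                  if bet > min(bankroll, limit):
                      return bankroll - 100  # progression busted
                  if random.random() < win_p:
                      bankroll += bet
                      won += 1
                      break
                  bankroll -= bet
                  bet = 2 * bet + 1
          return bankroll - 100

      results = [session() for _ in range(10_000)]
      print(sum(results) / len(results))
      # Negative on average: frequent small wins, rare catastrophic runs.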
 
  -  I see a lot of similarities 
  between the convergence tests and finding roots of quadratic polynomials. The 
  'fastest' way to find the roots is to factor, but of course that only works if 
  you can 'see' the roots. If you can't see them, you can use the mechanical 
  grind of the quadratic formula. It's similar with these tests. The 'easiest / 
  fastest' to use is the comparison test, but its effectiveness is tied to how 
  many series you know that converge or diverge. If you can't see a good series 
  to compare it with, you then move to the ratio and root tests, and then to the 
  integral test.
    - The big tests are:
      - Comparison test: 
       This is one of my favorites, though to be effective you must know a 
       lot of examples of convergent and divergent series.
 
      - Ratio Test: 
      Remember that the ratio test provides no information if the value is 1; 
      thus it says nothing about the convergence or divergence of \(1/n^p\) for 
      any fixed \(p > 0\).
 
      - Root Test: 
      Remember that the root test provides no information if the value is 1; 
      thus it says nothing about the convergence or divergence of \(1/n^p\) for 
      any fixed \(p > 0\).
 
      - Integral Test: 
      The most common example is the harmonic sum, \(1/n\). The integral test 
      not only gives the divergence, but with a bit more work shows that the sum 
      of the first n reciprocals of integers is about \(\ln(n)\).
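       As a small sketch of that last point (my own illustration): the partial 
       sums minus \(\ln(n)\) settle down to the Euler-Mascheroni constant, 
       about 0.5772.

           import math

           partial = 0.0
           for n in range(1, 10**6 + 1):
               partial += 1 / n
               if n in (10, 10**2, 10**4, 10**6):
                   print(n, partial, partial - math.log(n))
           # The last column approaches 0.5772..., so the sum of the
           # first n reciprocals is about ln(n) plus a constant.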
 
    
     
  
   
  - 
  
  A nice application of a series expansion is Stirling's 
  formula for n!. 
  We can get close to the correct value by the integral 
  test or 
  the Euler-MacLaurin 
   summation formula. 
  This builds on a very important question: we know the integral test tells 
  whether or not a series converges; if it does converge, how close is the sum 
  to the integral? The Euler-MacLaurin formula teaches us how to convert sums to 
  integrals and bound the error.
    - The fact that \(\sum_{n = 1}^\infty 1/n^2 = \pi^2/6\) has a lot of 
    applications. It can be used to prove that there are infinitely many primes 
    via the Riemann 
    zeta function. The Riemann zeta function is \(\zeta(s) = \sum_{n = 1}^\infty 
    1/n^s\). By unique 
    factorization (also known as the Fundamental Theorem of Arithmetic), it 
    also equals \(\prod_{p\ {\rm prime}} 1 / (1 - 1/p^s)\); notice that a 
    generalization of the harmonic sum and the geometric series formula are 
    coming into play. It turns out that \(\zeta(2) =  \pi^2/6\), as can be seen 
     in many different ways. As \(\pi^2\) is irrational, 
     if there were only finitely many primes then the product would be 
     rational, a contradiction! See 
     wikipedia for a proof of this sum.
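     A quick numeric sanity check of both facts (my own illustration):

         import math

         # Partial sum of zeta(2) versus pi^2 / 6.
         print(sum(1 / n**2 for n in range(1, 100_000)), math.pi**2 / 6)

         # Euler product over the primes up to 1000 (small sieve).
         limit = 1000
         sieve = [True] * (limit + 1)
         prod = 1.0
         for p in range(2, limit + 1):
             if sieve[p]:
                 prod *= 1 / (1 - p**-2)
                 for m in range(p * p, limit + 1, p):
                     sieve[m] = False
         print(prod)  # also approaches pi^2 / 6 = 1.6449...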
    
 
    - Another interesting application of summing series involving primes is to 
    the Pentium 
    bug (see the links there for 
    more information, as well as Nicely's 
     webpage). The calculation being performed was \(\sum 1/p\), summed over primes \(p\) 
     such that either \(p+2\) or \(p-2\) is also prime; this is known as Brun's 
    constant. If this sum were infinite then there would be infinitely many 
    twin primes, proving 
    one of the most famous conjectures in mathematics; sadly the sum is 
    finite and thus there may or may not be infinitely many twin primes (twin 
    primes are two primes differing by 2).
 
  
   
  - No set of comments involving Conway and in a class studying sequences 
  would be complete without the following story.
  
 
    
   
  - Monday, April 
  22. Today 
  we saw an example of a geometric series (probabilities of winning), and 
  discussed some convergence tests.
  - The proof we gave today of the geometric series formula (by shooting 
  baskets) uses many great techniques in mathematics. It is thus well worth it 
  to study and ponder the proof.
    - Memoryless 
    process: once both people miss, it is as if we've just started the game 
    fresh.
 
    - Calculating something two different ways: a good part of combinatorics 
    is to note that there are two ways to compute something, one of which is 
    easy and one of which is not. We then use our knowledge of the easy 
    calculation to deduce the hard. For example, \(\sum_{k=0}^n \left({n \atop 
    k}\right)^2 = \left({2n \atop n}\right)\); the right side is easy to 
    compute, the left side not so clear. Why are the two equal? It involves 
    finding a story, which we leave to the reader.
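    A two-line check of that identity (my own illustration), leaving the story 
    to the reader:

        from math import comb

        for n in range(1, 9):
            lhs = sum(comb(n, k)**2 for k in range(n + 1))
            print(n, lhs, comb(2 * n, n), lhs == comb(2 * n, n))  # always True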
 
  
  
    - For another example of applications of harmonic numbers, see 
    the coupon collector problem (if 
    you want more info on this problem, let me know -- I have lots of notes on 
    it from teaching it in probability).
 
    - In Section 2 a few years ago, a basketball shot basically went in and 
    out; see 
    the following article on golf 
    for some info on related problems in golf.
 
  
   
  - The Comparison 
  Test is one of the most 
  important ways we have to tell if a series converges or diverges, but it is 
  one of the hardest to use. It is only as good as our list of comparable 
  series. Frequently one must do some algebra to manipulate the expressions. In 
  particular, one needs to know how rapidly certain functions grow. We showed 
  polynomials grow slower than exponentials, and logarithms grow slower than 
  polynomials; see the growth-rate sketch below. One important application of 
  results like these is in algorithm analysis in computer science, where we try 
  to determine how fast an algorithm runs. Measuring which algorithm is best is 
  not easy; do we care about how fast it is on the worst input, or about its 
  average speed? A common problem is to sort n elements in a list. Different 
  methods include QuickSort, BubbleSort and MergeSort. There are other ways -- 
  can you think of one?
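  Here is the tiny growth-rate sketch (my own illustration) behind those 
  comparisons:

      import math

      for n in (10, 20, 40, 80):
          print(n, round(math.log(n), 2), n**3, 2**n)
      # 2^n quickly dwarfs n^3, which dwarfs log n -- the facts used
      # when hunting for a series to compare against.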
 
  - 
  We used 
  L'Hopital's rule to compare growth rates of functions; we'll discuss the proof 
   later in the semester, but for now see 
  the article here on it.
 
     
  - Friday, April 
  19. We 
  encountered two of the most important sequences today: the geometric sequence 
  and the harmonic sequence. We proved the geometric series formula today and 
  our analysis of why the harmonic series diverges can be used to get an 
   estimate of its rate of growth.
  - Standard examples of sequences and series include:
    - An infinite 
    series of surprises: a nice article going from the geometric series to 
    the harmonic series to other important examples.
 
    - We mentioned that sequences and series are very important; two of the 
    most powerful applications are Taylor 
    series (approximating 
    complicated functions with simpler ones) and Riemann 
    sums (allowing us to 
    calculate areas with integrals).
 
    - L'Hopital's rule is 
    frequently used to analyze the behavior of sequences. Remember that you can 
    only use it if you have 0 over 0 or infinity over infinity.
 
  
   
  - Another great application of sequences and series is to calculating 
  probabilities. If two random 
  variables are independent, the probability that both happen is the product 
  of the probabilities that each happens. By using the logarithm function, we 
  can convert products to sums, and we'll see later that there are good ways to 
  estimate the value of these sums.
 
  - We saw in class today that the harmonic series has a divergent sum, or \(\sum_{n 
  = 1}^{\infty} 1/n\) is infinity. We'll see later that \(\sum_{n = 1}^{\infty} 
  (-1)^{n+1} / n\) converges to \(\ln(2)\). Related to this, one can consider 
  the series \(\sum_{n=1}^\infty x_n / n\), where each \(x_n\) is 1 with 
  probability 1/2 and -1 with probability 1/2 (think of this as infinitely many 
  independent coin flips). Interestingly, a lot can be said about these random 
  sums; see 
  a great article here.
 
  - Extra credit: What is wrong with the following argument: Let's 
  say we want to compute \(\lim_{h\to 0} \frac{\sin(h)}{h}\); this is the most 
  important trig limit. We use L'Hopital's rule and note that it is the same as 
  \(\lim_{h \to 0} \cos(h) / 1\); as \(\cos(h)\) tends to 1, the limit is just 
  1. Why is this argument not valid? The answer is one of the most important 
  principles in mathematics.
 
     
  - Wednesday, April 
  15.
  
  Today's lecture serves two purposes (click 
  here for the slides). While it does review many of the concepts from 
  integration, more importantly it introduces many of the key ideas and 
  challenges of mathematical modeling. Most students of 105 won't be taking 
  partial derivatives or integrals later in life (though you never know!); 
  however, almost surely you'll have a need to model, to try and describe a 
   complex phenomenon in a tractable manner.
    - Sabermetrics is 
    the `science' of applying math/stats reasoning to baseball. The formula I 
    mentioned at the start of the semester is known as the log-5 
    method; a better formula is the Pythagorean 
    Won - Loss formula (someone 
    linked my 
    paper deriving this from a reasonable model to 
     the wikipedia page), the topic of today's lecture. ESPN, MLB.com and all 
     sites like this use the Pythagorean win expectation in their expanded 
     standings. My derivation is a nice exercise in multivariable calculus and 
     probability.
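     As a sketch (using the classical exponent 2; the paper discusses better 
     empirical exponents):

         def pythag_win_pct(runs_scored, runs_allowed, gamma=2.0):
             # Pythagorean estimate: RS^g / (RS^g + RA^g).
             return runs_scored**gamma / (runs_scored**gamma + runs_allowed**gamma)

         # A team scoring 800 runs while allowing 700 over a 162-game season:
         print(pythag_win_pct(800, 700))        # ~0.566
         print(162 * pythag_win_pct(800, 700))  # ~91.7 expected wins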
 
    - In general, it is sadly the case that most functions do not have a 
    simple closed form expression for their anti-derivative. Thus integration is 
    magnitudes harder than differentiation. One of the most famous that cannot 
     be integrated in closed form is \(\exp(-x^2)\), which is related to 
    calculating areas under the normal (or bell or Gaussian) curve. We do at 
    least have good series expansions to approximate it; see the entry on the erf 
    (or error) function.
       - The anti-derivative of \(\ln(x)\) is \(x \ln(x) - x\); it is a nice 
       exercise to compute the anti-derivative of \((\ln(x))^n\) for any integer 
       \(n\). For example, if \(n=4\) we get \(24 x - 24 x \ln(x) + 12 x (\ln x)^2 
       - 4 x (\ln x)^3 + x (\ln x)^4\).
 
    
     
  
  
  
    - Another good distribution to study for sabermetrics would be a Beta 
    Distribution. We've seen an example already this semester when we looked 
    at the Laffer 
    curve from economics. I would 
    like to try to modify the Weibull analysis from today's lecture to Beta 
    distributions. The resulting integrals are harder -- if you're interested 
    please let me know.
 
    - Today we discussed modeling, in particular, the interplay between finding a 
    model that captures the key features and one that is mathematically 
    tractable. While we used a problem from baseball as an example, the general 
    situation is frequently quite similar. Often one makes simplifying 
    assumptions in a model that we know are wrong, but lead to doable math (for 
    us, it was using continuous probability distributions in general, and in 
    particular the three 
    parameter Weibull). For more on these and related models, my 
    baseball paper is available here; another interesting read might be my 
    marketing paper for the movie industry (which 
    is a nice mix of modeling and linear programming, which is the linear 
    algebra generalization of Lagrange multipliers).
      - One of the most important applications of finding areas under curves 
      is in probability, where we may interpret these areas as the probability 
       that certain events happen.
 
      - The more distributions you know, the better chance you have of finding 
      one that models your system of interest. Weibulls are frequently used in 
      survival analysis. The exponential 
      distribution occurs in 
      waiting times in lines as well as prime numbers.
 
      - In seeing whether or not data supports a theoretical contention, one 
       needs a way to check and see how good a fit we have. Chi-square 
      tests are one of many 
      methods.
 
      - Much of the theory of probability was derived from people interested 
      in games of chance and gambling. Remember that when the house sets the 
      odds, the goal is to try and get half the money bet on one team and half 
      the money on the other. Not surprisingly, certain organizations are very 
      interested in these computations. Click 
       here for some of the details on the Bulger case (the 
       bookie I mentioned in class is Chico Krantz, who is referenced briefly).
 
      - Any lecture on multivariable calculus and probabilities would be 
      remiss if it did not mention how unlikely it is to be able to derive 
      closed form expressions; this is why we will study Monte 
      Carlo integration later. 
      For example, the normal 
      distribution is one of the 
      most important in probability, but there is no nice anti-derivative. We 
      must resort to series expansions; that expansion is so important it is 
      given a name: the 
      error function.
 
      - I strongly urge you to read the pages where we evaluate the integrals 
      in closed form. The methods to get these closed form expressions occur 
      frequently in applications. I particularly love seeing relations such as 
      \(1/c = 1/a + 1/b\); you may have seen this in resistors 
      in parallel or perhaps the reduced 
      mass from the two 
      body problem (masses under 
      gravity). Extra credit to anyone who can give me another example of 
      quantities with a relation such as this.
 
       - Click here for a 
       clip of Plinko on The Price Is Right, or here for a showcase 
       showdown.
 
    
     
    - We discussed how websites like ESPN and MLB have a very limited space to 
    display information, especially if it's for a smart phone. Thus one cannot 
    show every statistic, and we have to pick and choose which ones are worth 
    showing. In one section I made a joke about including the team names, but 
    this is actually a serious comment! The MBTA (or 
    MTA for us old folk!) is having a contest on how to redesign their 
    subway map of Boston. Below are links to an interesting article on the 
    subject and the maps.
    
 
    
  
     
  - Friday, April 
  12.
  
  Today's lecture again served two purposes. The first was to introduce (or to 
  refamiliarize) you with sequences and series, and the second was to talk about 
   proofs by induction (one of the most powerful proof techniques we have).
  - Mathematical 
  Induction is a wonderful way to prove 
  results. One common image for induction is that of falling 
  dominoes. We have a statement \(P(n)\), 
  and if we can show \(P(1)\) is true and whenever \(P(n)\) is true then 
  \(P(n+1)\) follows, we can then conclude \(P(n)\) holds for all positive 
  integers. We gave some standard examples, such as sums of odd integers and 
  sums of integers, and a more exotic one (how a simple mistake leads to 
  everyone has the same name). It is very easy, 
  sadly, to subtly assume special properties when you try to prove something. No 
  one noticed that the argument given for the same name subtly assumed n was at 
  least 2. You need to constantly be vigilant about making additional, 
  unwarranted assumptions. A lot of the financial crisis was due to people using 
  a math formula where they shouldn't.
  
 
  - If you 
  want to read more about mathematical induction / see more examples, click 
  here for some of my notes.
 
  - We 
  talked about sequences and series. We've seen many examples in previous 
  classes, one of the most important being the upper and lower sums leading to a 
  proof of the Fundamental Theorem of calculus. Remember a sequence \(\{a_n\}_{n 
  = 1}^{\infty}\) converges to \(L\) if \(\lim_{n \to \infty} |a_n - L| = 0\). A 
  nice exercise is to show a sequence can have at most one limit. Often we can 
  `guess' the limit and check, or by brute force show something is not a limit.
  - For 
  example: let \(a_n = n^2\). We show no \(L\) is a limit. For definiteness, 
  let's show \(L = 2013\) cannot work. We have to study \(\lim_{n \to \infty} |a_n 
  - 2013|\). Note if \(n\) is large, \(|n^2 - 2013| > |2013n-2013| = 
  2013(n-1)\); this is because eventually \(n^2 > 2013n\). But as \(n \to \infty\) 
  clearly \(2013(n-1)\) goes to infinity, and thus \(L=2013\) is not a limit.
 
  - We 
  talked a bit about the alternating sequence \(a_n = (-1)^n/n\). We'll see 
  later that if we were to sum the 
  terms of the sequence we would get \(\ln(1/2)\).
 
   
  - We then 
  discussed the Birthday 
  Problem. In addition to being a fun example, it also has applications in 
  cryptography, leading to the birthday 
  attack. This is used to help people have secure electronic signing of 
  messages, and thus is essential for modern commerce! While playing our clicker 
  game we saw that we could eliminate some answers -- getting an intuition for 
  problems like this is very important.
 
  - There 
  are lots of great questions you can ask to generalize the birthday problem. 
  One of the best things you can do to train to be a scientist or researcher is 
  to practice asking questions to generalize something. We did one in class: how 
  many people would we need if we lived on the dwarf 
  planet Pluto? In general it's hard to find an answer in closed form 
  depending on the parameters of the problem; it turns out that the number of 
  people needed if there are \(D\) days in a year to have a 50% chance of two 
  birthdays the same is about \(\sqrt{D \cdot \ln(4)}\), a very nice function of 
  \(D\) (see the simulation sketch after this list). Here are some other questions:
  - How many people do you need before you have a 50% chance that three people 
  share a birthday?
 
  - How many people do you need before you have a 50% chance that there are at 
  least two pairs of people with the same birthday?
 
  - We know that we need about 23 people for a 50% chance of a birthday collision; we 
  know if we reach 365 people without a birthday collision then the next person 
  must force two to share a birthday. For each person we can see what percent of 
  the time they are the first person to cause a birthday collision. Which person 
  is most likely to cause the collision? While the 366th person always causes a 
  collision, it is very unlikely to reach that.
 
  - Try and write your own questions -- email one to me 
  for extra credit and say why you find it interesting.
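  Here is the promised simulation sketch (my own illustration) checking both 
  the 23-person answer and the \(\sqrt{D \cdot \ln(4)}\) rule:

      import math, random

      def birthday_prob(people, days=365, trials=20_000):
          # Estimate the chance that at least two of `people` share
          # a birthday in a year with `days` days.
          hits = 0
          for _ in range(trials):
              seen = set()
              for _ in range(people):
                  d = random.randrange(days)
                  if d in seen:
                      hits += 1
                      break
                  seen.add(d)
          return hits / trials

      print(birthday_prob(23))             # ~0.507
      print(math.sqrt(365 * math.log(4)))  # ~22.49, close to 23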
 
   
   
 
  - Wednesday, 
  April 10.
  
  The Change 
  of Variable formula ties 
  together many of the topics of the semester and generalizes a similar result 
  from one-variable calculus. With complicated formulas such as this, it is 
  quite useful to look at special cases first to get a sense of what is going on 
  and then to try and generalize, being aware of course that sometimes there are 
  features that are missed in the special cases. I like to look at this formula 
  as giving the exchange rate from measuring in one coordinate system to 
  another. For example, going from Cartesian to Polar coordinates 
   we cannot have \(dx\,dy\) go to \(dr\,d\theta\), as \(dx dy\) would have units of 
  meters-squared while \(dr d\theta\) has units of meters-radians and radians 
  are essentially unitless. (As a side note, the most important unitless number 
  in physics is the fine 
  structure constant.) 
  We will see later that \(dx dy\) transforms to \(r dr d\theta\). The notes 
  below are a bit more detailed on the Change of Variable formula than what was 
  covered in class. See my lecture notes for further details.
    - Our analysis shows that when we have a linear rescaling, say \(u = 2x\) 
    and \(v = 3y\), then if \(T(x,y) = (u,v)\) and \(T^{-1}(u,v) = (x,y)\), then 
     \(dx dy\) transforms to \(|\det(D T^{-1})|\,du\,dv\). Note how many concepts are 
    being applied here. We have the derivative of a vector valued function and 
    we have determinants. The reason for the absolute value is a bit tricky, but 
    comes from the danger of having signed areas. Remember in Calc I that \(\int_{x 
    = a}^{b} f(x) dx = - \int_{x = b}^{a} f(x) dx\). We are looking at how the 
    area elements transform. In order to make sure the areas are positive, we 
    need to insert absolute values here.
 
    - Another caveat is where to evaluate our function when we integrate it 
    over the transformed region. Assume we have a map \(T\) from \(xy\)-space to 
    \(uv\)-space. Let \(R = T(S)\). What should \(\int \int_S f(x,y) dx dy\) 
    equal in \(uv\)-space? It becomes \(\int \int_{T(S)} f(T^{-1}(u,v)) |\det 
    DT^{-1}(u,v)| du dv\). This is similar to the chain rule. If \(A(x) = f(g(x))\) 
    remember \(A'(x) = f'(g(x)) g'(x)\) and not \(f'(x) g'(x)\). This is one of 
    the most common mistakes, namely evaluating \(f\) at the wrong point. 
    Similarly here we need to make sure we evaluate \(f\) at the right place. In 
    \(uv\)-space, our inputs are \(u\) and \(v\), but \(f\) is expecting as 
    inputs \(x\) and \(y\). As \(T\) sends \(x\) and \(y\) to \(u\) and \(v\), 
    \(T^{-1}\) sends \(u\) and \(v\) to \(x\) and \(y\), and thus the new 
    function say \(g(u,v) = f(T^{-1}(u,v))\) is what we should integrate over \(T(S)\).
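     A quick symbolic check of the polar case (a sketch using sympy): the 
     Jacobian determinant of \(T^{-1}(r, \theta) = (r\cos\theta, r\sin\theta)\) 
     is \(r\), which is where \(r\, dr\, d\theta\) comes from.

         import sympy as sp

         r, theta = sp.symbols('r theta', positive=True)
         x = r * sp.cos(theta)  # T^{-1} maps (r, theta) back to (x, y)
         y = r * sp.sin(theta)
         J = sp.Matrix([[sp.diff(x, r), sp.diff(x, theta)],
                        [sp.diff(y, r), sp.diff(y, theta)]])
         print(sp.simplify(J.det()))  # r, so dx dy -> r dr dtheta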
 
    - There are many applications of the Change of Variables formula, 
    especially in probability theory; see 
    here for a one-dimensional example (if 
    you have access to JStor, here 
    is one in economics).
 
    - We saw how we can easily get the area of an ellipse or the volume of an 
    ellipsoid using the change of variable; the 
    perimeter of an ellipse is much harder!
 
    - Consider the game 
    of Pac-man (click 
     here for some facts about the ghosts' movement), which after a little 
    thought we see is really happening on a cylinder; 
    if there was another pair of warp tunnels connecting the top to the bottom 
    we would have a torus 
     or a donut. It is amazing that we can represent these complicated 
     regions by simple maps of a unit square; this gives another example of the 
     power of these coordinate transformations. This is the beginning of the 
    field of topology.
 
    - You can view a cylinder as what you get when you take a piece of paper 
    and glue two opposite sides together. If you then glue the other two sides 
     together you get a torus or a donut. If instead you start with a square and 
    glue two sides together but twist the sides as you glue, you get the Mobius 
    strip. This strange figure has only one side!
 
    - For example, the ellipse \((x/4)^2 + y^2 = 1\) is four times longer than 
    wide. It is clearly not circular; however, if we change units and measure 
    along the \(x\)-axis in meters and the \(y\)-axis in the new length units of 
    Ephs (where 1 Eph equals 1/4 of a meter, or 4 Ephs equals a meter), then in 
    this biased measuring it is a 
    circle!
      - There are lots of great units, many created by MIT students. Two of my 
      favorites are the Smoot (interestingly, 
      the person who was the 
      unit of measurement ended up as the president of the International 
      Organization for Standardization) and 
      the Bruno (this 
      is the indentation, in cubic-centimeters I believe, made from a piano 
      dropped 6 stories...).
 
    
     
    - We also started our discussion of 
    sequences and series. 
    
    Instead of the standard examples, it's fun to explore some of the more 
    exotic possibilities:
      - The 3x+1 problem (I 
      have a paper 
      on the 3x+1 problem, concerning the distribution of leading digits of 
      the iterates and applications to fighting tax fraud; this is related to Benford's 
      law, and is a very important area of research that is accessible to 
      undergraduates).
 
       - We talked about sequences and series. We've seen many examples in previous 
      classes, one of the most important being the upper and lower sums leading 
      to a proof of the Fundamental Theorem of calculus. 
 
    
     
     - Standard examples of sequences and series include:
      - An 
      infinite series of surprises: a nice article going from the geometric 
      series to the harmonic series to other important examples.
 
      - We mentioned that sequences and series are very important; two of the 
      most powerful applications are Taylor 
      series (approximating 
      complicated functions with simpler ones) and Riemann 
      sums (allowing us to 
      calculate areas with integrals).
 
      - L'Hopital's 
      rule is frequently used to 
      analyze the behavior of sequences. Remember that you can only use it if 
      you have 0 over 0 or infinity over infinity.
 
    
     
     - Finally, a common theme in mathematics is the need to 
    simplify tedious algebra. Frequently we have claims that can be proven by 
    long and involved computations, but these often leave us without a real 
    understanding of why the claim is true. I talked at the start of class today 
    about my proof of Morley's theorem. If you want, let me know and I'll show 
    you my 40-50 page proof of Morley's 
    theorem; Conway 
     has a beautiful proof which you can read here (it's after the irrationality 
     of \(\sqrt{2}\)).
 
 
  
     
  - Monday, 
  April 8.
  
   Monte Carlo Integration: the paper introducing it is 
   called by many the most 
   important mathematical paper of the 20th century. Sadly, most integrals cannot 
   be evaluated in closed form, and we must resort to approximation methods.
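   As a minimal sketch of the idea (my own illustration): estimate an integral 
   by averaging the integrand at random points.

       import math, random

       # Monte Carlo estimate of the integral of exp(-x^2) over [0, 1].
       N = 1_000_000
       estimate = sum(math.exp(-random.random()**2) for _ in range(N)) / N
       print(estimate)                               # ~0.7468
       print(math.sqrt(math.pi) / 2 * math.erf(1))   # 0.746824..., via erf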
  
   - Here are some additional readings on the subject.
 
   - In general, it is sadly the case that most functions do not have a simple 
   closed form expression for their anti-derivative. Thus integration is 
   magnitudes harder than differentiation. One of the most famous that cannot be 
   integrated in closed form is \(\exp(-x^2)\), 
  which is related to calculating areas under the normal (or bell or Gaussian) 
  curve. We do at least have good series expansions to approximate it; see the 
  entry on the erf 
  (or error) function.
    - The anti-derivative of \(\ln(x)\) is \(x \ln(x) - x\); it is a nice 
    exercise to compute the anti-derivative for \((\ln(x))^n\) for any integer 
    \(n\). For example, if \(n=4\) we get \(24 x - 24 x \ln(x) + 12 x (\ln x)^2 
    - 4 x (\ln x)^3 + x (\ln x)^4\).
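     That exercise is easy to spot-check symbolically (a sketch using sympy):

         import sympy as sp

         x = sp.symbols('x', positive=True)
         for n in range(1, 5):
             print(n, sp.expand(sp.integrate(sp.log(x)**n, x)))
         # n = 1 gives x*log(x) - x; n = 4 reproduces the expression above.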
 
  
   
  - Today we finished the classic Big Three changes of 
  variables. 
  
  There are many different coordinate systems we can use; depending on the 
  symmetry of the problem, frequently it is advantageous to use one system over 
  another. We saw in class time and again how complicated regions were reduced 
  to simpler regions. As a rule of thumb, it's better to have a harder integral 
  over a nicer region (rectangle, box) than a simpler integral over a more 
  complicated region. Three of the most common coordinate systems (after 
  Cartesian) are polar, cylindrical, and spherical.
  
 
     
  - Friday, April 
  5. In life you eventually learn to have Pavlovian 
  responses. In math if you ever see a product you should think of a logarithm. 
  The first comment is on another such situation.
    
    - Probably my favorite example (and one of the most important!) of using 
    polar coordinates to evaluate an integral is to find the value of the Gaussian 
    integral \(\int_{-\infty}^\infty 
    \exp(-x^2)dx\). Of course, it seems absurd to use polar coordinates for this 
    as we are in one-dimension! Our book has a good discussion of this problem, 
    as does the wikipedia 
    page. This is one of the most important integrals in the world, and 
    leads to the normalization constant for the normal 
    distribution (also known as 
    the bell curve or the Gaussian distribution), which may be interpreted as 
    saying the factorial of -1/2 is \(\sqrt{\pi}\)! 
    (The exclamation here is for emphasis, not for factorial.)
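    - For reference, the polar computation in one line: squaring the integral 
    and changing to polar coordinates gives 
    \[\left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)^2 = \int_0^{2\pi} 
    \int_0^{\infty} e^{-r^2}\, r\, dr\, d\theta = 2\pi \cdot \frac{1}{2} = 
    \pi,\] 
    so the integral itself is \(\sqrt{\pi}\).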
 
 
    - We will soon finish 
    the Big Three coordinate changes:  polar 
    coordinates and cylindrical 
    coordinates and spherical 
    coordinates (be aware that 
    physicists and mathematicians have different definitions of the angles in 
    spherical!).
      - One can generalize spherical coordinates to hyperspheres 
      in n-dimensional space. These lead to wonderful applications of 
      special functions, such as the Gamma 
      function, in writing down formulas for the `areas' and `volumes'. As a 
      nice exercise, you can rewrite the integral in the comment above as 
      \(\Gamma(1/2)\).
 
      - There are many fascinating questions involving spheres (with 
      applications to error 
      correcting codes!):
      
 
      - One of the most important applications of spherical coordinates is to 
      planetary motion, specifically, proving that the force one sphere exerts 
      on another is equivalent to all of the mass being located at the center of 
      the sphere. This is the most 
      important integral in Newton's 
      great work, Principia (we 
      have a first edition at the library here). I strongly urge everyone to 
      look at this problem. Proving that one can take all of the mass to be at 
      the center enormously simplifies the calculations of planetary motion. See 
      the Wikipedia article on the Shell 
      Theorem for the computation. As this is so important, here 
      is another link to a proof. Oh, let's 
      do another proof here as 
      well as another 
      proof here. For an example of a non-proof, read 
      the following and the comments.
 
    
     
    - Today we had a brief introduction to probability; one of the most 
    important applications of integration is to determining probabilities, which 
    frequently are areas under curves.
 
    - I'll post comments on Monte Carlo integration when we finish it on 
    Monday. I loved how we were able to conjecture a formula for the area of an 
    ellipse based on symmetry and a special case, and numerically verify it. 
    This is one of the most important skills to have.
 
  
    
  - Wednesday, 
  April 3.
  
  Fubini's Theorem (changing the order of integration) is one of the most 
  important observations in multivariable calculus. For us, we assume our 
  function f(x,y) is either continuous or bounded, and that it is defined on a 
  simple region D contained in a finite rectangle. If D is an unbounded region, 
  say D = {(x,y): x, y >= 0}, then Fubini's theorem can fail for continuous, 
  bounded functions. In class we did an example involving a double sum, where 
  \(a_{0,0} = 1\), \(a_{0,1} = -1\), \(a_{0,n} = 0\) for all \(n \ge 2\), then 
  \(a_{1,0} = 0\), \(a_{1,1} = 1\), \(a_{1,2} = -1\), and then \(a_{1,n} = 0\) 
  for all \(n \ge 3\), and so on. If 
  we want to have a continuous function, we can tweak it as follows. Consider 
  the indices \((m,n)\). Draw a circle of radius 1/2 with center \((m,n)\) 
  (note no two of these circles overlap). If \(a_{m,n}\) is positive, 
  draw a cone with base a circle of radius 1/2 centered at \((m,n)\) and height 
  \(12/\pi\). 
  As the volume of a cone is (1/3) (area of base) (height), this cone will have 
  volume 1; if \(a_{m,n}\) is negative we draw a similar cone but instead of 
  going up we go down, so now the 
  volume is -1. What is going wrong? The problem is that \(\sum_m \sum_n 
  |a_{m,n}| = \infty\) 
  (the sum of the absolute values diverges), and when infinities enter strange 
  things can occur. Recall we are not allowed to talk about \(\infty - \infty\); 
  the contribution from where our function or sequence is positive is 
  \(+\infty\), the contribution where it is negative is \(-\infty\), and we are 
  not allowed to subtract infinities.
    - To motivate the Change 
    of Variable Formula, which we'll see later, try to find the area of a 
    circle by doing the integration directly. While there are many ways to 
    justify learning the Change of Variable Formula (it's one of the key tools 
    in probability), I want to take the path of looking at what should be 
    a simple integral and seeing how hard it can be to evaluate in the given 
    coordinate system. Much of modern physics is related to changing coordinate 
    systems to where the problem is simpler to study (see the Lagrangian or Hamiltonian 
    formulations of physics); 
    these are equivalent to F = ma, but lead to much simpler algebra. The 
    problem we considered was using one-variable calculus to find the area under 
    a circle. This requires us to integrate \(\sqrt{1 - x^2}\) from x=0 to 
    x=1. This is one of the most important shapes in mathematics -- if calculus 
    is such a great and important subject, it should be able to handle this!
 
    - To attack this problem, recall a powerful 
    technique from Calc I: if f(g(x)) = x (so f and g are inverse functions, 
    such as \(\sqrt{x}\) and \(x^2\) for \(x \ge 0\)), then g'(x) = 1 / 
    f'(g(x)); in other words, knowing the 
    derivative of f we know the derivative of its inverse function. This was 
    used in Calc I to pass from knowing the derivative of exp(x) to the 
    derivative of ln(x). We can apply this to various inverse 
    trig functions; while many are close to \(\sqrt{1-x^2}\), none of them are 
    exactly that (a 
    list of the derivatives of these is here). This highlights one of the 
    most painful parts of integration theory -- just because we are close to 
    finding an anti-derivative does not mean we can actually find it! While 
    there is a 
    nice anti-derivative of sqrt(1 - x^2), it is not a pure derivative of an 
    inverse trig function. There are many tables 
    of anti-derivatives (or integrals) (a 
    fun example on that page is the Sophomore's 
    Dream). Unfortunately it is not always apparent how to find these 
    anti-derivatives, though of course if you are given one you can check by 
    differentiating (though sometimes you have to do some non-trivial algebra to 
    see that they match). In fact, there are some tables of integrals of 
    important but hard functions where most practitioners have no idea how these 
    results are computed (and occasionally there are errors!). We will see later 
    how much simpler these problems become if we change variables; to me, this 
    is one of the most important lessons you can take from the course:  MANY 
    PROBLEMS HAVE A NATURAL POINT OF VIEW WHERE THE ALGEBRA IS SIMPLER, AND IT 
    IS WORTH THE TIME TO TRY TO FIND THAT POINT OF VIEW!
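    - As a one-line instance of the inverse-function rule above (with \(f = 
    \sin\) and \(g = \arcsin\)): 
    \[\frac{d}{dx} \arcsin(x) = \frac{1}{\sin'(\arcsin(x))} = 
    \frac{1}{\cos(\arcsin(x))} = \frac{1}{\sqrt{1-x^2}}\] 
    -- close to \(\sqrt{1-x^2}\), but not it!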
 
    - For another example of changing your 
    viewpoint, think of trying to write down an ellipse aligned with the 
    coordinate axes, and one rotated at an angle. Linear algebra provides a nice 
    framework for doing these coordinate transformations, changing hard problems 
    to simpler ones already understood.
 
    - Frequently we are confronted with the need to find the integral of a 
    function that we have never seen. One approach is to consult a table of 
    integrals (here 
    is one at wikipedia; see also the 
    table here). Times have changed from when I was in college. Gone are the 
    days of carrying around these tables; you can access Mathematica's 
    Integrator on-line, and it 
    will evaluate many of these. One caveat: sometimes these integrals are 
    doable but do not appear in the table in the form you have, and some work is 
    required to show that they equal what is tabulated.
      - A good 
      example, of course, is just computing the area of a circle! In Cartesian 
      coordinates we quickly see we need the anti-derivative of sqrt(1 - x^2), 
      which involves inverse trigonometric functions; it is very straightforward 
      in polar! In fact, we can easily get the volume of a sphere by doubling 
      the integral of the function \(\sqrt{1 - x^2 - y^2}\) over the unit disk!
 
      - Famous tables are Abramowitz 
      and Stegun and Gradshteyn 
      and Ryzhik.
 
      - For those interested in some of the history of special functions and 
      integrals, see 
      the nice article here by Stephen Wolfram. There's a lot of nice bits 
      in this article.
        - One of my favorites is the throw-away comment in the beginning on 
        how the Babylonians reduced multiplication to squaring. Here's the full 
        story. The Babylonians worked in base 60; if you think memorizing our 
        multiplication table is bad, consider their problem: 3600 items! Of 
        course, you lose almost 1800 as xy = yx, but still, that's a lot of 
        tablets to lug. To compute xy, the Babylonians noted that xy = ((x+y)^2 
        - x^2 - y^2) / 2, which reduces the problem to just squaring, 
        subtracting and division by 2. There are more steps, but they are easier 
        steps, and now we essentially just need a table of squares. This concept 
        is still with us today: it's the idea of a look-up 
        table, computing new values (or close approximations) from a small 
        list. The idea is that it is very fast for computers to look things up 
        and interpolate, and time consuming to compute from scratch.
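        - Here is the Babylonian trick as a few lines of Python (a sketch; the 
        table size and the names are mine):
        
              # Precomputed "tablet" of squares; multiplication then needs only
              # look-ups, subtraction and halving: xy = ((x+y)^2 - x^2 - y^2)/2
              squares = {n: n * n for n in range(200)}
              
              def babylonian_multiply(x, y):
                  return (squares[x + y] - squares[x] - squares[y]) // 2
              
              print(babylonian_multiply(12, 34), 12 * 34)  # both 408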
 
      
       
    
     
    
  
     
  - Monday, 
  April 1.
  
  The main result today was a method for integrating over regions other than a 
  rectangle. We discussed a theoretical way to do it last class by replacing our 
  initial function \(f\) on a rectangle including \(D\) with a new function 
  \(f^\ast\), with \(f^\ast(x,y) = f(x,y)\) if \((x,y)\) is in our domain \(D\) 
  and 0 otherwise. To make this rigorous we need to show that we may 
  cover any curve with a union of rectangles with arbitrarily small area. This 
  leads to some natural, interesting questions.
    - The first, and most important, involves what happens to a function when 
    we force it to be 0 from some point onward (say outside \(D\)). The function may be 
    discontinuous at the boundary, but then again it may not. There are many 
    interesting and important examples from mathematical physics where we are 
    attempting to solve some equation that governs how that system evolves. One 
    of the most studied is the vibration of a drum, where the drumhead is 
    connected and stationary. We can thus view the vibrating drumhead as giving 
    the values of our function on some region \(D\), with value 0 along the 
    boundary. This leads to the fascinating 
    question of whether or not you can hear the shape of a drum. This means 
    that if you hear all the different harmonics of the drum, does that uniquely 
    determine a shape? Sadly, the answer is no -- different drums can have the 
    same sounds. An 
    excellent article on this is due to Kac, and can be read here.
 
    - We discussed horizontally-simple and vertically-simple and simple 
    regions (other books use the words y-simple, x-simple and simple regions). 
    Note that a region is often called elementary if it is either horizontally 
    or vertically simple. (Click 
    here for some more examples on simple regions.) The point of our 
    analysis here is to avoid having to go back to the definition of the 
    integral (ie, the Riemann sum). While not every region is elementary, many 
    are either elementary or the union of elementary regions. Below are two 
    interesting tidbits about how strange things can be:
      - Space 
      filling curves: click here for just how strange a curve can be!
 
      - 
      
      Koch snowflake: This is an example of a fractal set; the boundary has 
      dimension greater than 1 but less than 2! Its fractal 
      dimension is \(\log 4 / 
      \log 3\).
 
      - Jordan 
      curve theorem: It turns out to be surprisingly difficult to prove that 
      every non-intersecting closed curve in the plane divides the plane into an 
      inside and an outside region. It's not too bad for polygons, but for more 
      general curves (such as the non-differentiable boundary of the Koch 
      snowflake), it's harder.
 
    
     
    - The video of the week was coin 
    sorting (this leads to Lebesgue's 
    Measure Theory). There are many reasons leading to this as the 
    selection. One is that the Lebesgue theory is needed in a lot of higher 
    mathematics, and if you continue you'll eventually meet it. The other, and 
    more important for us, is that this demonstrates the power of a fresh 
    perspective. This happens again and again in mathematics (and life). We have 
    blinders on and don't even realize they're there. We get so used to doing 
    things a certain way it becomes heretical to think of doing it another way. 
    (This allows me to link to Asimov's Nightfall story, 
    considered by many the greatest sci-fi short story; it creates a society 
    which must confront the inconceivable for them -- obviously anything a 
    writer writes must be conceivable to the writer, so this is something 
    conceivable to us but not to them.) It's natural to divide the x-axis and 
    add the areas as we go along; however, it is useful to consider dividing it 
    along the y-axis as well.
      - If you know combinatorics, here's a nice example illustrating the 
      above point. Evaluate \(\sum_{k = 0}^{n} \left({n \atop k}\right) \left({n 
      \atop n-k}\right)\), where \(\left({x \atop y}\right) = x! / (y! 
      (x-y)!)\). The answer is \(\left({2n \atop n}\right)\). There are a lot of 
      ways to view this, here is my favorite. Imagine we have \(2n\) people, 
      \(n\) who prefer Star Trek: The Original Series and n who prefer Star 
      Trek: The Next Generation. There are \(\left({2n \atop n}\right)\) ways to 
      choose \(n\) people from the \(2n\) people. We can view this another way: 
      let's look at how many groups we can form of \(n\) people where exactly 
      \(k\) prefer the original series. There are \(\left({n \atop k}\right)\) 
      ways to choose \(k\) people from the \(n\) who prefer the original series, 
      and then \(\left({n \atop n-k}\right)\) ways to choose \(n-k\) from the 
      \(n\) who prefer the new series. The total number of ways with exactly 
      \(k\) who prefer the original is the product: \(\left({n \atop k}\right) \cdot 
      \left({n \atop n-k}\right)\). We then sum over \(k\); as any group of 
      \(n\) people must have SOME number who prefer the original series, this 
      sum is just the number of ways to choose \(n\) people from \(2n\), or 
      \(\left({2n \atop n}\right)\). Telling a story and changing our 
      perspective really helps!
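      - A quick numerical check of the identity (a sketch; the value of n is 
      arbitrary):
      
            from math import comb
            
            n = 7
            lhs = sum(comb(n, k) * comb(n, n - k) for k in range(n + 1))
            print(lhs, comb(2 * n, n))  # both 3432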
 
    
     
  
     
  - Wednesday, 
  March 13.
  
  In one dimension, there is not much choice in how we integrate; however, if we 
  are trying to integrate a function of several variables over a rectangle (or 
  other such regions), not surprisingly the situation is markedly different. 
  Similar to the freedom we have with limits in several variables (where we have 
  to consider all possible paths), there are many ways to integrate. Imagine we 
  have a function of two variables and we want to integrate it over the 
  rectangle \([a, b] \times [c, d]\), with \(x\) in \([a, b]\) and \(y\) in 
  \([c, d]\). One possibility is we can fix \(x\) and let \(y\) vary, computing 
  the integral over \(y\) for the fixed \(x\), and then let \(x\) vary, 
  computing the integral over \(x\). Of course, we could also do it the other 
  way. As we are 
  integrating the same function over the same region (just in a different 
  order), we hope that the answers are the same! So long as everything is nice, 
  this is the case. There are many formulations as to exactly what is needed to 
  make the situation nice; if our function is continuous and bounded and we are 
  integrating over a finite rectangle, then we can interchange the order of 
  integration without changing the answer. This is called Fubini's 
  theorem, 
  and is one of the most important results in integration theory in several 
  variables. There really isn't an analogue in one dimension, as there we have 
  no choice in how to integrate!
    - Whenever you are given a theorem, it is worthwhile to remove a condition 
    and see if it is still true. Typically the answer is no (or if it is still 
    true, the proof is frequently much harder). There are many functions and 
    regions where the order of integration matters. The simplest example is 
    looking at double sums rather than double integrals, though with a little 
    work we can convert this example to a double integral. We give a sequence 
    \(a_{m,n}\) such that \(\sum_{m = 0}^{\infty} \sum_{n = 0}^{\infty} 
    a_{m,n}\) is not equal to \(\sum_{n = 0}^{\infty} \sum_{m = 0}^{\infty} 
    a_{m,n}\). For \(m, n \ge 0\) let \(a_{m,n} = 1\) if \(m = n\), \(-1\) if 
    \(n=m+1\) and \(0\) 
    otherwise. Show that the two different orders of summation yield different 
    answers. The reason for this is that the sum of the absolute value of the 
    terms diverges.
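    - You can see the two answers with a few lines of Python (each inner sum 
    below is exact, as every row and column has only finitely many nonzero 
    terms):
    
          def a(m, n):
              return 1 if m == n else (-1 if n == m + 1 else 0)
          
          N = 100
          # row m is nonzero only at n = m, m+1; column n only at m = n-1, n
          row_sums = [a(m, m) + a(m, m + 1) for m in range(N)]          # each 0
          col_sums = [a(n, n) + (a(n - 1, n) if n else 0) for n in range(N)]
          print(sum(row_sums), sum(col_sums))  # 0 and 1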
 
    - Click here for 
    another example where we cannot interchange the order of integration; a 
    more involved example 
    is available here.
 
    - 
    
    Click here for a video by Cameron on how he applies Fubini's theorem to 
    change the order of operations (he 
    does a double sum instead of a double integral, but the principle is the 
    same).
 
    - It is important to know your integrals. There are many formulas, and 
    plenty of tables of integrals to help you. Two of the most popular are 
    available online, but should only be used when you have a truly pesky 
    integral: Abramowitz 
    and Stegun    and    Gradshteyn 
    and Ryzhik. For everyday purposes, this 
    should suffice.
 
    - Two of the most important 1-dimensional techniques are integration 
    by parts and u-substitution. 
    Another powerful technique is partial 
    fractions. At first this seems like the domain of sadistic professors, 
    but in reality it can be quite useful. I and one of my students needed to 
    use it this summer to attack a nice problem in combinatorics / number 
    theory. Zeckendorf proved that if you write the Fibonacci numbers with just 
    one 1, so 1, 2, 3, 5, 8, 13, ..., then every number can be written uniquely 
    as a sum of non-adjacent Fibonacci numbers. Lekkerkerker proved that as 
    \(x\) ranges from the \(n\)th to the (\(n+1\))st Fibonacci number, the 
    average number of summands needed is \(n/(\phi+2)\), where \(\phi\) is the 
    golden mean. My students and I proved the fluctuations about the mean are 
    normally distributed (and generalized this to other systems). One of the key 
    inputs was integration by partial fractions. If you're interested, let me 
    know. This project allowed me to use my research funds to buy a Cookie 
    Monster!
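    - A sketch of the greedy algorithm that produces the Zeckendorf 
    decomposition:
    
          def zeckendorf(n):
              # Fibonaccis written with a single 1: 1, 2, 3, 5, 8, 13, ...
              fibs = [1, 2]
              while fibs[-1] < n:
                  fibs.append(fibs[-1] + fibs[-2])
              summands = []
              for f in reversed(fibs):
                  if f <= n:
                      summands.append(f)
                      n -= f
              return summands  # non-adjacent, and sums to the input
          
          print(zeckendorf(100))  # [89, 8, 3]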
 
  
     
  - Monday, March 
  11.
  
  Today we 
  proved the  Fundamental 
  Theorem of Calculus. There are not that many fundamental theorems in 
  mathematics -- we do not use the term lightly! Other ones you may have seen 
  are the Fundamental 
  Theorem of Arithmetic and the Fundamental 
  Theorem of Algebra; click 
  here for a list of more fundamental theorems (including 
  the Fundamental 
  Theorem of Poker!). To simplify the proof, we made the additional 
  assumptions that our function was continuously differentiable and the 
  derivative was bounded. These assumptions can all be removed; it suffices for 
  the function to be continuous on a finite interval (in such a setting, a 
  continuous function is actually uniformly 
  continuous; informally, this means in the epsilon-delta 
  formulation of continuity that 
  delta is independent of the point). Such a result is typically proved in an 
  analysis class. What I find particularly interesting about the proof is that 
  the actual value that bounds the function is irrelevant; all that matters is 
  that our function is bounded. Theoretical math constantly uses such tricks; 
  this is somewhat reminiscent of some of the Lagrange Multiplier problems, 
  where we needed to use the existence of lambda to solve the problem, but 
  frequently we never had to compute the value of lambda.
    - The key ingredients in the proof are using the Mean 
    Value Theorem and observing 
    that we have a telescoping 
    sum. One has to be a little careful with telescoping sums with 
    infinitely many terms. The wikipedia article has some nice examples of 
    telescoping sums and warnings of the dangers if there are infinitely many 
    summands.
      - This lecture was titled 'The one with the Mean Value Theorem' in 
      homage to the sitcom Friends, where 
      every (or almost every) episode title begins with `The one' (most 
      have `with' as the next word, but not all).
 
      - The hardest, but perhaps most important, step in the proof of the 
      Fundamental Theorem of Calculus was taking a special sequence of points \(p_k\) 
      in \([x_k, x_{k+1}]\) and applying the Mean Value Theorem to replace 
      \(f(p_k) \frac{1}{n}\) with \(F(x_{k+1}) - F(x_k)\). It's natural to try something 
      like this. We need to get \(F\) into the proof somehow; if we apply the 
      MVT to \(F\) we get \(F'\) coming out; as \(F' = f\) there's a hope of 
      getting this to match some of the terms we have. It takes many years of 
      experience to `see' arguments like this, but a great goal is to try to 
      reach such a mastery. You're going to forget technical details and 
      results; what you want to remember is how to attack a problem. That's why, 
      for me, this was one of the most important moments of the lecture (and one 
      of the most important of the class).
 
      
 
      
      - For additional reading on some of the background and related material, 
      see the following links. If you're interested in a math major, I strongly 
      urge you to read these.
 
        
        - Proofs by Induction, among other items (including the notes 
        above; we'll get to the Taylor series part later in the semester).
      
      
    
    
    
    
    - Whenever you are given a new theorem (such as the Fundamental Theorem of 
    Calculus), you should always check its predictions against some cases that 
    you can readily calculate without using the new machinery. For example, if we 
    want to find the area under \(f(x)\) from \(x=0\) to \(x=1\), obviously the 
    answer will depend on \(f\). If \(f\) is constant it is trivial (area of a 
    rectangle); if \(f\) is a linear relation then the answer is still readily 
    calculated (area of a triangle). For more general polynomials, one can 
    compute the Riemann sums (the upper 
    and lower sums) by Mathematical 
    Induction. For example, using induction one can show that \(\sum_{k=1}^n 
    k^2 = n(n+1)(2n+1)/6\), and this result can then be used to find the area 
    under the parabola \(y = x^2\).
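    - A quick check (a sketch) that the lower and upper sums for \(y = x^2\) 
    on \([0,1]\) squeeze the area 1/3:
    
          def lower_upper(n):
              lower = sum((k / n) ** 2 for k in range(n)) / n
              upper = sum((k / n) ** 2 for k in range(1, n + 1)) / n
              return lower, upper
          
          for n in (10, 100, 1000):
              print(n, lower_upper(n))  # both columns approach 1/3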
    - The integration covered through Calc III is known as Riemann 
    sums / Riemann integrals. In more advanced math classes you'll meet the 
    successor, Lebesgue 
    integrals. Informally, the difference between the two is as follows. 
    Imagine you have a large number of coins of varying denominations; 
    your job is to count the amount of money. Riemann sums work by breaking up 
    the domain of the function; Lebesgue integration works by breaking up the 
    range.
    - (Extra Credit) For those looking for a challenge: Let 
    \(f\) satisfy the conditions of the Fundamental Theorem of Calculus. Let \(L(n)\) 
    denote the corresponding lower sum when we partition the interval \([0,1]\) 
    into \(n\) equal pieces, and similarly let \(U(n)\) denote the upper sum. We 
    know \(L(n) \le\) True Area \(\le U(n)\); as \(U(n) - L(n) \to 0\) as \(n 
    \to \infty\), both \(U(n)\) and \(L(n)\) tend to the true area. Must we have 
    \(L(n) \le L(n+1)\), or is it possible that \(L(n+1)\) might be less than 
    \(L(n)\)?
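    - If you want to experiment before committing to an answer, here is a 
    sketch (my test function; a single example cannot settle the general 
    question):
    
          import math
          
          f = math.sin  # increasing on [0,1], so each infimum is the left endpoint
          
          def L(n):
              return sum(f(k / n) for k in range(n)) / n
          
          vals = [L(n) for n in range(1, 40)]
          print(all(u <= v for u, v in zip(vals, vals[1:])))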
    
  
  
  
     
  - Friday, March 
  8.
  
  Lagrange multipliers are 
  a terrific application of multivariable calculus. Frequently one needs to 
  optimize something, be it revenue in economics or stealthiness in a fighter. 
  Lagrange multipliers give us a way to find maxima / minima subject to 
  constraints, provided 
  we can solve the equations!  We 
  first generalized the methods from one variable calculus on how to find maxima 
  and minima of functions. Recall that if f is a differentiable real-valued 
  function on an interval \([a,b]\), then the candidates for maxima / minima are 
  (1) the critical points, namely those \(x\) in \([a,b]\) where \(f'(x) = 0\), 
  and (2) the endpoints. How does this generalize to several variables? In 
  one-dimension the boundary of an interval is `boring'; it's just the two 
  endpoints, and thus it isn't that painful to have to check the value of the 
  function there as well as at the critical point. What about several variables? 
  The situation is quite different. For example, the interval \([-1,1]\) might 
  become the ball \(x^2 + y^2 + z^2 \le 1\); the interior is all points \((x,y,z)\) 
  such that \(x^2 + y^2 + z^2 < 1\), while the boundary is now the set of points 
  with \(x^2 + y^2 + z^2 = 1\). Unfortunately this leads to infinitely many 
  points to check; while we could afford to just check the endpoints by brute 
  force in one-dimension, that won't be possible now. The solution is the Method 
  of Lagrange Multipliers.
    - Two good links: An 
    introduction to Lagrange Multipliers    and    Lagrange 
    Multipliers.
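    - A minimal sympy sketch of the method (the function and constraint are 
    illustrative choices of mine): extremize \(f(x,y) = xy\) on the circle 
    \(x^2 + y^2 = 1\) by solving \(\nabla f = \lambda \nabla g\) together with 
    \(g = 0\):
    
          import sympy as sp
          
          x, y, lam = sp.symbols('x y lam')
          f = x * y
          g = x**2 + y**2 - 1
          eqs = [sp.diff(f, v) - lam * sp.diff(g, v) for v in (x, y)] + [g]
          print(sp.solve(eqs, [x, y, lam], dict=True))
          # four critical points with x, y = ±1/sqrt(2) and lam = ±1/2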
 
    - The Method of Lagrange Multipliers is one of the most frequently used 
    results in multivariable calculus. It arises in physics (Hamiltonians and 
    Lagrangian, Calculus of Variations), information theory, economics, linear 
    and non-linear programming, .... You name it, it's there. The two webpages 
    referenced above have several examples in these and other subjects; there 
    are of course many other sources and problems (click 
    here for a nice post on gasoline taxes, pollution and Lagrange multipliers). 
    For more on the economic impact, click 
    here, as well as see the following papers:
    
 
    - The Method of Lagrange Multipliers ties together many of the concepts 
    we've studied this semester, as well as some from Calc I and Calc II 
    (vectors, directional derivatives and gradients, and level sets, to name a 
    few). The goal is to show you how the theoretical framework we developed can 
    be used to solve problems of interest. The military example we discussed is 
    just one of many possible applications. We were concerned with how to deploy 
    a fleet to minimize average deployment time to trouble spots (for more 
    information, see my 
    notes on the problem and the Mathematica 
    code); of course, instead of considering each place equally important we 
    could easily add weights. One consequence of war is that it does strongly 
    encourage efficiency and optimization; in fact, many optimization algorithms 
    and techniques were developed because of the problems encountered. The 
    subject of Operations Research took off during WWII; see the excellent 
    wikipedia article on Operations Research, especially the subsection 
    on the problems OR attempts to solve. Not surprisingly, there are also 
    numerous applications in business. Feel free to talk either to my wife (who 
    is a Professor of Marketing) or myself (I've written several papers with 
    marketing professors, applying such techniques to many companies, my 
    favorite being movie theaters). As mentioned, we can reinterpret our 
    problem as minimizing shipping costs from a central distributor to various 
    markets (where some markets may be more valuable than others, leading to a 
    weighted function).
 
    - One of the most important takeaways of the deployment problem is that 
    the answer you get, as well as the difficulty of the math needed to arrive 
    at the answer, depends on how you choose to model the world. For us, it 
    depends on how we choose to measure 'distance'. My 
    notes on a deployment problem on the Earth's surface give 
    four different methods yielding three different solutions, all of which 
    differ from what you get if you use the 'correct' measure of distance. This 
    is an extremely common outcome -- your answer depends on how you choose to 
    model / measure! You need to be very aware 
    of this when you compare different people's answers to the same problem. For 
    a nice example of how the answer can depend on your point of view, consider 
    the riddle below (passed on by G. Mejia). What's the right answer? There are 
    at least two different right answers, depending on how you interpret things.
      - The police rounded up Jim, Bud and Sam yesterday, because one of them 
      was suspected of having robbed the local bank. The three suspects made the 
      following statements under intensive 
      questioning (see below). If only one of these statements turns out to be 
      true, who robbed the bank?
        - Jim: I'm innocent. 
 
        - Bud: I'm innocent.
 
        - Sam: Bud is the guilty one.
 
      
       
    
     
    - In 2011 I gave an extra credit problem which has applications to 
    building an efficient computer for information retrieval (as opposed to 
    processing). For more on the problem of building an efficient computer in 
    terms of retrieval of information, see the 
    solution to the related extra credit problem from earlier in the 2011 
    iteration of the course. Note the problem is harder without the tools of 
    multivariable calculus. See 
    also the article by Hayes in the American Scientist, Third Base.
 
    - I've scanned in a chapter by Lanchester 
    on The Mathematics of Warfare; you can also view 
    it through GoogleBooks here. This article is from a four volume series, 
    The World of Mathematics. (I am fortunate enough to own two sets; one 
    originally belonged to a great uncle of mine, another to a 
    grandfather-in-law of my wife). I've written some Mathematica code to 
    analyze the Battle 
    of Trafalgar, which is described in the Lanchester article; the 
    Mathematica code is here (though 
    it might not make sense without comments from me). (The file name is boring 
    because, during 
    the 200th anniversary re-enactment, in order to avoid hurting anyone's 
    feelings they refused to call the two sides 'English' and 'French/Spanish'). 
    This is a terrific problem to illustrate applying mathematics to the real 
    world. One has a very complicated situation, and you must decide what are 
    the key features. The more features you include the better your model will 
    be, but the less likely you'll be able to solve it! It's a bit of an art 
    figuring out exactly how much to include to capture what truly matters and 
    still be able to solve your model. We'll discuss this in greater detail when 
    we do the Pythagorean 
    Won-Loss theorem from baseball, which is a nice application of 
    probability and multiple integrations.
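    - If you want to play with Lanchester's square law (each side's losses 
    proportional to the opponent's remaining strength), here is a sketch with 
    made-up coefficients:
    
          # dR/dt = -b*B, dB/dt = -a*R; crude Euler time steps
          def lanchester(R, B, a=0.08, b=0.10, dt=0.1):
              while R > 0 and B > 0:
                  R, B = R - b * B * dt, B - a * R * dt
              return max(R, 0), max(B, 0)
          
          print(lanchester(1000, 900))  # who survives, and with what strength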
 
    - Finally, a common theme that surfaces as we do more and more 
    mathematical modeling is that simple models very quickly lead to very hard 
    equations to solve. The drowning swimmer problem is actually the same as
    Snell's law, for how 
    light travels / bends in going from one medium to another. If you write 
    down the equations for the drowning swimmer, you quickly find a quartic to 
    solve. For interesting articles related to this, see the two papers below by 
    Pennings on whether or not dogs know calculus. Click 
    here for a picture of his 
    dog, Elvis, who does know 
    calculus.
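    - Here is a numerical sketch of the drowning swimmer problem (all numbers 
    invented): minimizing the total time is equivalent to Snell's law, and we 
    can find the optimal entry point by bisecting on the derivative of the time 
    function:
    
          import math
          
          a, b, d = 30.0, 20.0, 50.0   # distances from shore and along the beach
          v1, v2 = 7.0, 1.5            # running and swimming speeds
          
          def tprime(x):
              # derivative of t(x) = sqrt(a^2+x^2)/v1 + sqrt(b^2+(d-x)^2)/v2
              return x / (v1 * math.hypot(a, x)) - (d - x) / (v2 * math.hypot(b, d - x))
          
          lo, hi = 0.0, d              # tprime(0) < 0 < tprime(d)
          for _ in range(60):
              mid = (lo + hi) / 2
              lo, hi = (lo, mid) if tprime(mid) > 0 else (mid, hi)
          print((lo + hi) / 2)         # where to enter the water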
    
 
    - General comment: it's important to be able to take complex information 
    and sift to the relevant bits. A great example is the song I'm 
    my own Grandpa. Listen to it and try to graph all the relations and see 
    that he really is his own grandfather (with no incest!). A solution is here (don't 
    view this until you try to graph it!). Actually, 
    this is a MUCH better illustration of the relationships.
 
  
  
     
  - Wednesday, 
  March 6.
  
  We discussed directional 
  derivatives. It is natural that we develop such a concept, as up until now 
  we have only considered derivatives in directions parallel to the various 
  coordinate axes. A central theme of multivariable calculus is the need to be 
  able to approach a point along any path, and that in several dimensions 
  numerous paths are available (unlike the 1-dimensional case, where essentially 
  we just have two paths). Directional derivatives will play a key role in 
  optimization problems.
    - 
    
    One of the requests in Spring 2010 was to talk about applications of 
    multivariable calculus to molecular gastronomy. After some web browsing, I 
    eventually beecame interested in how bees communicate amongst themselves as 
    to where food is. There appear to bee two schools; one is the waggle 
    dance / language school, and the other is the odor 
    plume theory. In addition to controversies on how bees learn, there are 
    lots of nice applications to gradients and (I believe) directional 
    derivatives. The goal is to convey information about a specific path through 
    a very complex space.
      - See also the paper: Odor 
      landscapes and animal behavior: tracking odor plumes in different physical 
      worlds (Paul Moore, John 
      Crimaldi). Abstract: The acquisition of information from sensory systems 
      is critical in mediating many ecological interactions. Chemosensory 
      signals are predominantly used as sources of information about habitats 
      and other organisms in aquatic environments. The movement and distribution 
      of chemical signals within an environment is heavily dependent upon the 
      physics that dominate at different size scales. In this paper, we review 
      the physical constraints on the dispersion of chemical signals and show 
      how those constraints are size-dependent phenomena. In addition, we 
      review some of the morphological and behavioral adaptations that aquatic 
      animals possess which allow them to effectively extract ecological 
      information from chemical signals.
 
    
     
    - Today was a dividends lecture. The concept of directional derivative 
    tied together many items we saw in Chapter 11, including the notion of a 
    line, of a tangent plane, the formula for the dot product in terms of the 
    lengths of the vectors and the cosine of the angle, level sets, .... The 
    list goes on and on. This is common in mathematics: you spend a good amount 
    of time on the preliminaries and then reap great rewards later. We saw a 
    beautiful geometric interpretation of the gradient: \((\nabla f)(\overrightarrow{x})\) 
    points in the direction of maximum change of \(f\); further, the gradient is 
    normal to the level set. 
 
    - It's a good idea to check new results with old -- do they really 
    generalize? The directional derivative \(D_{\overrightarrow{u}}f(\overrightarrow{x})\) 
    can be computed by \((\nabla f)(\overrightarrow{x}) \cdot \overrightarrow{u}\); 
    if we take \(\overrightarrow{u}\) to be a unit vector along a coordinate 
    axis (so it is \(\overrightarrow{e}_1, \overrightarrow{e}_2\), ..., or \(\overrightarrow{e}_n\)), 
    then the directional derivative reduces to the partial derivative in that 
    direction. In other words, \(D_{\overrightarrow{e}_i}f(\overrightarrow{x}) = 
    \partial f / \partial x_i\) (evaluated of course at \(\overrightarrow{x}\)).
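    - A small sympy sketch of this check (my example function and direction):
    
          import sympy as sp
          
          x, y = sp.symbols('x y')
          f = x**2 * y + sp.sin(y)
          grad = sp.Matrix([sp.diff(f, x), sp.diff(f, y)])
          e1 = sp.Matrix([1, 0])       # unit vector along the x-axis
          print((grad.T * e1)[0])      # 2*x*y, the partial df/dx, as expected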
 
    - When we were trying to find the direction of greatest change of \(f\) at 
    \(\overrightarrow{x}\), we eventually saw it had to be in the direction of 
    \((\nabla f)(\overrightarrow{x})\). You'll forget, or never use, most of the 
    material in this course; that's fine, as learning these facts is only part 
    of why you're here. You're here in large part to get a sense of how to 
    attack a problem, what to look for. We're looking for the direction \(\overrightarrow{v}\) 
    that maximizes how fast \(f\) is changing; in other words, we want to find 
    \(\overrightarrow{v}\) such that \(D_{\overrightarrow{v}}f(\overrightarrow{x}) 
    = (\nabla f)(\overrightarrow{x}) \cdot \overrightarrow{v}\) is greatest. 
    When we look at this expression, there is a special, 
    distinguished vector. We've fixed a function \(f\) and a point \(\overrightarrow{x}\) 
    and we're searching for \(\overrightarrow{v}\). We want to know what 
    direction \(\overrightarrow{v}\) should point in. While it's natural to 
    guess that \(\overrightarrow{x}\) plays a role, that can't be the answer as 
    \(\overrightarrow{x}\) is where we evaluate the function and the answer 
    needs to depend on \(f\). The only vector present that involves \(f\) is 
    \((\nabla f)(\overrightarrow{x})\). This is a vector, it involves \(f\) 
    evaluated at the point we care about, \(\overrightarrow{x}\). This 
    suggests that it plays a role in the answer. Maybe this is the 
    direction of greatest increase, or greatest decrease, or perhaps a direction 
    of no change. But it is a 
    special direction, and it should be investigated. You want to get to 
    the point where you can see this, where you can get a sense of what to do 
    and what to try. This vector \((\nabla f)(\overrightarrow{x})\) is in every 
    directional derivative of \(f\) at \(\overrightarrow{x}\); it's probably 
    important and thus it suggests we calculate the directional derivative in 
    that direction (as well as its opposite direction, as well as all 
    directions perpendicular to this -- these perpendicular directions lead to 
    the level sets).
 
    - It is very important 
    to know proofs and definitions; there's a reason one of the exam questions 
    required you to be able to describe clearly key concepts from the course. A 
    very important example is the fall of Western Civilization (or, if you're 
    not quite as pessimistic, the financial mortgage meltdown). While 
    there are many reasons behind the collapse (I 
    have close family that has worked in the upper levels of many of the top 
    financial firms; if you are interested in stories of what isn't reported in 
    the news, let me know), one large component was an incorrect use of Gaussian 
    copulas. It's similar to looking at low-velocity data and extrapolating to 
    relativistic speeds -- there is an enormous danger when you apply results 
    from one region in another with no direct data in that second realm. A great 
    article on this is from Wired Magazine (The 
    Formula That Killed Wall Street). It's worth reading this. Some 
    particularly noteworthy passages:
      - Bankers should have noted that very small 
      changes in their underlying assumptions could result in very large changes 
      in the correlation number. They also should have noticed that the results 
      they were seeing were much less volatile than they should have been which 
      implied that the risk was being moved elsewhere. Where had the risk gone? 
      They didn't know, or didn't ask. One reason was that the outputs came from 
      "black box" computer models and were hard to subject to a commonsense 
      smell test. Another was that the quants, who should have been more aware 
      of the copula's weaknesses, weren't the ones making the big 
      asset-allocation decisions. Their managers, who made the actual calls, 
      lacked the math skills to understand what the models were doing or how 
      they worked. They could, however, understand something as simple as a 
      single correlation number. That was the problem.
 
      - No one knew all of this better than David 
      X. Li: "Very few people understand the essence of the model," he told The 
      Wall Street Journal way back in fall 2005. "Li can't be blamed," says 
      Gilkes of CreditSights. After all, he just invented the model. Instead, we 
      should blame the bankers who misinterpreted it. And even then, the real 
      danger was created not because any given trader adopted it but because 
      every trader did. In financial markets, everybody doing the same thing is 
      the classic recipe for a bubble and inevitable bust.
 
    
     
  
     
  - Monday, March 
  4. 
  
  Today we discussed the Chain Rule. The 
  Chain Rule is one of the most 
  important results in multivariable calculus, as it allows us to build 
  complicated functions depending on functions of many inputs. To state it 
  properly requires some linear algebra, especially matrix 
  multiplication. The proof uses multiple applications of adding zero. This 
  is an essential skill to master if you wish to continue in mathematics. It is 
  somewhat similar to adding auxiliary lines in geometry. With experience, it 
  becomes easier to `see' where and how to add zero. The idea is we want to add 
  zero in such a way that we convert one expression to several, where the 
  resulting expressions are easier to analyze because we are subtracting two 
  quantities that are quite close. For the chain rule, we will do this by adding 
  numerous intermediary points.
    - One way to view the Chain Rule is that it is all about giving you the 
    freedom to choose. You can either plug everything in and differentiate 
    directly by brute force, or you 
    can use the Chain Rule to find the derivative of the composition in terms of 
    the derivatives of the constituent pieces. Depending on the problem, one way 
    could be easier than the other; there are examples of situations where 
    direct substitution is best, and examples where it is better to use the 
    Chain Rule. With experience it becomes clear which way is better. When we 
    discuss gradients and directional derivatives, we'll see a theoretical 
    interpretation of the Chain Rule. This will play a fundamental role when we 
    return to optimization problems. Finally, of course, it is useful to be able 
    to compute an answer two different ways, as this provides a nice check of 
    your work.
 
    - To use the Chain Rule in full glory, we needed to understand how to 
    multiply matrices, as \(h(\overrightarrow{u}) = f(g(\overrightarrow{u}))\) 
    implies \((Dh)(\overrightarrow{u}) = (Df)(g(\overrightarrow{u})) (Dg)(\overrightarrow{u})\), 
    where \(\overrightarrow{u} = (u_1, 
    \dots, u_m)\). One can motivate matrix multiplication through the dot 
    product, as we know how to take the dot product of two vectors of the same 
    number of coordinates. Matrix multiplication looks quite mysterious at 
    first. Wikipedia 
    has a nice article (with color) on multiplying matrices, though it is a 
    bit short on motivation. The advanced reason as to why we do this comes from 
    also viewing matrices as linear transformations, and we want the product of 
    two matrices to represent the composition of the transformations. This is an 
    advanced topic, and sadly is frequently mangled in a linear algebra course. 
    I've posted a little bit about this in the advanced 
    notes from Thursday's lecture. The best motivation I know is to consider 
    \(2 \times 2\) rotation 
    matrices. If \(R(a)\) corresponds to rotating by \(a\) radians, and \(R(b)\) 
    to rotating by \(b\) radians, then \(R(b) R(a)\) should equal \(R(b+a)\); 
    this does happen if we use the matrix multiplication method.
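    - A quick numerical check of this motivation (a sketch):
    
          import numpy as np
          
          def R(t):  # 2x2 rotation by t radians
              return np.array([[np.cos(t), -np.sin(t)],
                               [np.sin(t),  np.cos(t)]])
          
          a, b = 0.7, 1.1
          print(np.allclose(R(b) @ R(a), R(a + b)))  # True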
 
    - I did a quick google search for applications of the chain rule in 
    various subjects. Here's 
    something in economics. Here's 
    another econ example. Here's 
    a chemistry example.
 
    - One of our biggest applications of the Chain Rule was to inverse 
    functions and derivatives.
    
      - If \(f(g(x)) = x\) (so \(f\) and \(g\) are inverse functions, such as 
      \(\sqrt{x}\) and \(x^2\) for \(x \ge 0\)), then \(g'(x) = 1 / f'(g(x))\); in other words, knowing 
      the derivative of \(f\) we know the derivative of its inverse function 
      \(g\). This was used in Calc I to pass from knowing the derivative of \(\exp(x)\) 
      to the derivative of \(\ln(x)\). We can apply this to various inverse 
      trig functions (a 
      list of the derivatives of these are here). This highlights one of the 
      most painful parts of integration theory -- just because we are close to 
      finding an anti-derivative does not mean we can actually find it! While 
      there is a 
      nice anti-derivative of \(\sqrt{1 - x^2}\), it is not a pure derivative of 
      an inverse trig function. There are many tables 
      of anti-derivatives (or integrals) (a 
      fun example on that page is the Sophomore's 
      Dream). Unfortunately it is not always apparent how to find these 
      anti-derivatives, though of course if you are given one you can check by 
      differentiating (though sometimes you have to do some non-trivial algebra 
      to see that they match). In fact, there are some tables of integrals of 
      important but hard functions where most practitioners have no idea how 
      these results are computed (and occasionally there are errors!). We will 
      see later how much simpler these problems become if we change variables; 
      to me, this is one of the most important lessons you can take from the 
      course:  Many problems 
      have a natural point of view where the algebra is simpler, and it is worth 
      the time to try to find that point of view!
 
      - Let \(f(x) = \exp(x)\). Then \(f'(x) = \lim_{h \to 0} [f(x+h) - f(x)]/h\) = \(\lim_{h 
      \to 0} [\exp(x+h) - \exp(x)] / h\) = \(\lim_{h \to 0} [\exp(x) \exp(h) - \exp(x)] 
      / h\) = \(\exp(x) \lim_{h \to 0} [\exp(h) - 1] / h\). As \(\exp(0) = 1\), 
      we find \(f'(x) = \exp(x) \lim_{h \to 0} [f(h) - f(0)] / h\) = \(\exp(x) 
      f'(0)\); thus we know the 
      derivative of the exponential function everywhere once we know the 
      derivative at 0!
  
    
     
  
  
     
  - Friday, March 
  1.
  
  The goal of today's lecture was to see the power of approximating a 
  complicated function with simpler ones. 
  
    - 
  
    Videos: Mandelbrot zoom:
  video 1,
  video 2.
  Here's a 
  cubic fractal zoom. (Other copies: Mandelbrot set, Newton Fractal (cubic).)
 
    - 
  
  Mathematica notebook for Newton's Method (pdf 
  version here)
 
    - 
    
    We compared two methods to find roots of polynomials. In some special cases 
    we can find closed form expressions for roots in terms of the coefficients. 
    For example, any linear equation (\(ax+b=0\)), quadratic (\(ax^2+bx+c=0\)), 
    cubic (\(ax^3+bx^2+cx+d=0\)) or quartic (\(ax^4+bx^3+cx^2+dx+e=0\)) has a 
    formula for the roots in terms of the coefficients of the polynomials; this 
    fails for polynomials of degree 5 and higher (the Abel-Ruffini 
    Theorem; see also Galois). 
    It is very convenient when we have a solution that is a function of the 
    parameters; we can then use our methods to find the optimal values of the 
    parameters. Sadly in industry it is often difficult to get closed form 
    expressions; if you are looking for the most potent compound, for example, 
    you might be required to do numerous different trial runs and just observe 
    which is best. We thus need a way to find optimal values / solve equations. 
    We describe two below.
      - Newton's method is 
      significantly more powerful than divide 
      and conquer (also called 
      the bisecting algorithm); this is not surprising as it assumes more 
      information about the function of interest (namely, differentiability). 
      The numerical stability of Newton's method leads to many fascinating 
      problems. One terrific example is looking at roots in the complex plane of 
      a polynomial. We assign each root a different color (other than purple), 
      and then given any point in the complex plane, we apply Newton's method to 
      that point repeatedly until one of two things happens: it converges to a 
      root or it diverges. If the iterates of our point converge to a root, we 
      color our point the same color as that root, else we color it purple. This 
      leads to Newton 
      fractals, where two points extremely close to each other can be 
      colored differently, with remarkable behavior as you zoom in. If you're 
      interested in more information, let me know; a good chaos program is xaos (I 
      have other links to such programs for those interested). One final aside: 
      it is often important to evaluate these polynomials rapidly; naive 
      substitution is often too slow, and Horner's 
      algorithm is frequently used.
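      - A compact sketch combining the two ideas (my example polynomial): 
      Newton's method with Horner evaluation, applied to \(x^2 - 2\):
      
            def horner(coeffs, x):
                # evaluate a polynomial, coefficients from highest degree down
                result = 0.0
                for c in coeffs:
                    result = result * x + c
                return result
            
            p, dp = [1.0, 0.0, -2.0], [2.0, 0.0]  # x^2 - 2 and its derivative 2x
            x = 1.0
            for _ in range(8):
                x -= horner(p, x) / horner(dp, x)
            print(x)  # 1.41421356..., the square root of 2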
 
      - The fractal behavior exhibited by Newton's method applied to finding 
      roots of polynomials is one of many examples of Chaos 
      Theory, or extreme sensitivity to initial conditions. While one of the 
      earliest examples was the work of Poincare on the motion of three 
      planetary bodies, the subject really took off with Lorenz's work on 
      weather (the 
      Butterfly Effect). Another nice example is the orbit 
      of Pluto; while we know it will orbit the sun, its orbit is chaotic 
      and we cannot say where exactly in the orbit it will be millions of years 
      from now.
 
    
     
    - Instead of approximating a function locally by a line, we now use a 
    plane (in 2-dimensions) or hyperplane (in general). We can use the Mean 
    Value Theorem to get some 
    information on how close the estimation is, and then use these estimations 
    to approximate our function. A Mathematica 
    file with the tangent line and tangent plane approximations is here. One 
    definition of differentiability is that a function is differentiable if the 
    error in the tangent plane approximation tends to zero faster than the 
    distance of where we are to where we start tends to zero. It is sadly 
    possible for the partial derivatives to exist without the function being 
    differentiable; we showed that the mere existence of the partial 
    derivatives is not enough. The example was \(f(x,y) = (xy)^{1/3}\). What must we assume 
    in order for the partial derivatives to imply our function is 
    differentiable? It turns out it suffices to assume the partial derivatives 
    are continuous. This is the major theorem in the subject, and provides a 
    nice way to check for when a function is differentiable.
      - 
      The proof of the theorem alluded to above uses two of my 
      favorite techniques. While sadly we do not multiply by 1, we do get to add 
      0 and we do use the Mean 
      Value Theorem. One of my goals in the class is to illustrate how to 
      think about these problems, why we try certain approaches for our proofs. 
      We want to study how well the tangent plane approximates our function, 
      thus we need to study \(f(x,y) - f(0,0) - \frac{\partial f}{\partial x}(0,0)\, 
      x - \frac{\partial f}{\partial y}(0,0)\, y\). Our theorem assumes the partial 
      derivatives are continuous, thus it stands to reason that at some point in the 
      proof we should use that the partial derivatives are continuous! The trick is 
      to see how we can get another \(\partial f/\partial x\) and another 
      \(\partial f/\partial y\) to appear. The key is 
      to recall the MVT. If we add 0 in a clever way, we can do this. Our 
      expression equals \(f(x,y) - 
      f(0,y) + f(0,y) - f(0,0) - \frac{\partial f}{\partial x}(0,0)\, 
      x - \frac{\partial f}{\partial y}(0,0)\, y\). We now use the MVT on \(f(x,y) - 
      f(0,y)\) and on \(f(0,y) - f(0,0)\). In each of these two expressions, only one 
      variable changes. Thus 
      the first is \(\frac{\partial f}{\partial x}(c,y)\, x\) and the second is 
      \(\frac{\partial f}{\partial y}(0,\hat{c})\, y\). Thus the error 
      in using the tangent plane is \(\left[\frac{\partial f}{\partial x}(c,y) - 
      \frac{\partial f}{\partial x}(0,0)\right] x + \left[\frac{\partial f}{\partial y}(0,\hat{c}) 
      - \frac{\partial f}{\partial y}(0,0)\right] y\). We now see how the continuity 
      of the partials enters -- 
      it ensures that these differences are small, even when we divide by \(|(x,y)-(0,0)|\).
 
    
     
    - 
    
    In Economics, 
    the standard random 
    walk hypothesis seems to have 
    lost most of its supporters, though there are variants (and I'm not familiar 
    with all); see also the efficient 
    market hypothesis and technical 
    analysis, and all the links there. (There are also many good links on 
    the wikipedia page on Eugene 
    Fama). Two famous books (with different conclusions) are Malkiel's A 
    random walk down wall street and 
    Mandelbrot-Hudson's The 
    (mis)behavior of markets (a fractal view of risk, ruin and reward). Some 
    interesting papers if you want to read more:
    
    
 
 
  
     
  - Wednesday, February 
  27.
  
  Today's lecture covered the Method of Least Squares. The best fit value of the 
  parameters depends on how we choose to measure errors. It is very important to 
  think about how you are going to measure / model, as frequently people reach 
  very different conclusions because they have different starting points / 
  different metrics. We'll see another example of how our metric can affect the 
  answer when we get to Lagrange multipliers.
  
  Here is a nice website to see the difference.
    - The Method of Least Squares is one of my favorites in statistics (click 
    here for the Wikipedia page, and click 
    here for my notes). The Method of Least Squares is a great way to find 
    best fit parameters. Given a hypothetical relationship \(y = a x + b\), we 
    observe values of \(y\) for different choices of \(x\), say \((x_1, y_1), 
    (x_2, y_2), (x_3, y_3)\) and so on. We then need to find a way to quantify 
    the error. It's natural to look at the observed value of \(y\) minus the 
    predicted value of \(y\); thus it is natural that the error should be \(\sum_{i=1}^n 
    h\left(y_i - (a x_i + b) \right)\) for some function \(h\). What is a good 
    choice? We could try \(h(u) = u\), but this leads to sums of signed errors 
    (positive and negative), and thus we could have many errors that are large 
    in magnitude canceling out. The next choice is \(h(u) = |u|\); while this is 
    a good choice, it is not analytically tractable as the absolute value 
    function is not differentiable. We thus use \(h(u) = u^2\); though this 
    assigns more weight to large errors, it does lead to a differentiable 
    function, and thus the techniques of calculus are applicable. We end up with 
    a very nice, closed form expression for the best fit values of the 
    parameters.
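    - The closed form expression in code (a sketch with invented data):
    
          # best fit line y = a x + b from setting the partials of
          # sum (y_i - (a x_i + b))^2 equal to zero
          xs = [1.0, 2.0, 3.0, 4.0]
          ys = [2.1, 3.9, 6.2, 7.8]
          n = len(xs)
          sx, sy = sum(xs), sum(ys)
          sxx = sum(u * u for u in xs)
          sxy = sum(u * v for u, v in zip(xs, ys))
          a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
          b = (sy - a * sx) / n
          print(a, b)  # slope near 2, intercept near 0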
 
    - Unfortunately, the Method of Least Squares only works for linear 
    relations in the unknown parameters. As a great exercise, try to find the 
    best fit values of \(a\) and \(c\) to \(y = c/x^a\) (for 
    definiteness you can think of this as the force due to two unit masses that 
    are \(x\) units apart). When you take the derivative with respect to \(a\) 
    and set that equal to zero, you won't get a tractable equation that is 
    linear in \(a\) to solve. Fortunately there is a work-around. If we change 
    variables by taking logarithms, we find \(\ln(y) = \ln(c/x^a)\); using logarithm 
    laws this is equivalent to 
    \(\ln(y) = -a \ln(x) + \ln(c)\); setting \(Y = \ln(y), X = \ln(x)\) and \(b = 
    \ln(c)\) this is equivalent to \(Y = -a X + b\), which is linear in the 
    unknowns and thus exactly the 
    formulation we need! This example illustrates the power of logarithms; it 
    allows us to transform our data and apply the Method of Least Squares.
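    - Continuing the sketch above (again with fabricated data, here roughly 
    following \(y = 5/x^2\)), the log-log trick looks like this in Python:

          import math

          # Fit y = c / x^a by least squares on the logarithms.
          xs = [1.0, 2.0, 3.0, 4.0, 5.0]
          ys = [5.1, 1.2, 0.57, 0.30, 0.21]      # roughly 5 / x^2

          X = [math.log(x) for x in xs]
          Y = [math.log(y) for y in ys]

          # Y = -a X + b with b = ln(c); reuse the closed form from above.
          n = len(X)
          slope = ((n * sum(u * v for u, v in zip(X, Y)) - sum(X) * sum(Y))
                   / (n * sum(u * u for u in X) - sum(X) ** 2))
          intercept = (sum(Y) - slope * sum(X)) / n

          a = -slope               # exponent in y = c / x^a
          c = math.exp(intercept)
          print(a, c)              # recovers roughly a = 2, c = 5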
 
    - There are many examples of power laws in the world. Many of my favorites 
    are related to Zipf's 
    law. The frequencies of the most common words in English are a 
    fascinating problem (click 
    here for the data; see also this 
    site); this works for other languages as well, for the size of the most 
    populous cities, ...; if you consider more general power laws, you also get Benford's 
    law of digit bias, which is used 
    by the IRS to detect tax fraud (the 
    link is to an article by a colleague of mine on using Benford's law to 
    detect fraud). The power law relation is quite nice, and initially 
    surprising to many. My Mathematica 
    program analyzing this is available here. See also this 
    paper by Gabaix for Zipf's law and the growth of cities. As a nice 
    exercise, you should analyze the growth of city populations (you can get 
    data on both the US and the 
    world from Wikipedia).
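    - Here is a quick numerical experiment you can run yourself (powers of 2 are 
    a standard example of a Benford sequence, though not one from class): compare 
    the observed leading-digit frequencies against the Benford prediction 
    \(P(d) = \log_{10}(1 + 1/d)\).

          import math

          # Count leading digits of 2^1, ..., 2^1000.
          counts = {d: 0 for d in range(1, 10)}
          for n in range(1, 1001):
              counts[int(str(2 ** n)[0])] += 1

          for d in range(1, 10):
              print(d, counts[d] / 1000, math.log10(1 + 1 / d))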
 
    - We discussed Kepler's 
    Three Laws of Planetary Motion (the 
    Wikipedia article is very nice). Kepler was proudest (at least for a 
    long time) of Mysterium 
    Cosmographicum (I strongly 
    urge you to read this; yes, the same Kepler whom we revere today for his 
    understanding of the cosmos also advanced this as a scientific theory -- 
    times were different!).
 
    - Finally, we saw the importance of how we choose to measure things; how 
    we model and how we judge the model's prediction will greatly affect the 
    answer. In a similar spirit, I thought I would post a brief note about Oulipo, 
    a type of mathematical poetry (this 
    is a link to the Wikipedia page, which has links to examples). There was a 
    nice article about this recently in Math Horizons (you 
    can view the article here). This is a nice example of the intersection 
    of math and the arts, and discusses how the structure of 
    a poem affects the output, and what structures might lead to interesting 
    works.
 
  
     
  - Monday, February 
  25.
  
  Today we discussed one of the biggest applications of multivariable calculus: 
  optimization. We'll see a great instance of this when we do the Method of 
  Least Squares. 
  
    - 
    
    The search for extrema is a central pursuit in modern science and 
    engineering. It is important to have techniques to winnow the list of 
    candidate points. The methods discussed in class are the natural 
    generalizations from one-variable calculus. While one must prove that the 
    function under consideration does have a max/min, typically this is clear 
    from physical reasons (for example, there should be a pen of maximal area 
    for given perimeter; there should be a path of least time).
      - In one-dimension, boundaries of sets aren't too bad; for example, the 
      boundary of [a, b] is just two points, a and b. The situation is violently 
      different in several variables. There the boundary can have infinitely 
      many points, and reducing a problem to interior critical points and 
      checking the function on the boundary is not enough; we must have a way to 
      evaluate all these points on the boundary.
 
      - The generalization of the second derivative tests involves 
      determinants and whether or not the Hessian is a positive 
      definite matrix, a negative 
      definite matrix, et cetera. What is really going on is that we want to 
      use the Principal Axis Theorem and change to a coordinate system where the 
      Hessian is easier to understand because, in this new coordinate system, it 
      is a diagonal matrix! This makes a lot more sense after taking Linear 
      Algebra.
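      - As a concrete (if tiny) sketch of the second derivative test, here is 
      how one might classify a critical point numerically; the example function 
      \(f(x,y) = x^2 - y^2\) is my choice, picked because its saddle at the 
      origin is easy to check by hand.

            import numpy as np

            # Hessian of f(x,y) = x^2 - y^2 at the critical point (0,0).
            H = np.array([[2.0, 0.0],
                          [0.0, -2.0]])

            evals = np.linalg.eigvalsh(H)   # eigvalsh: symmetric matrices
            if (evals > 0).all():
                print("positive definite: local minimum")
            elif (evals < 0).all():
                print("negative definite: local maximum")
            elif (evals > 0).any() and (evals < 0).any():
                print("indefinite: saddle point")
            else:
                print("degenerate: the test is inconclusive")

      The eigenvalues are exactly the diagonal entries promised by the Principal 
      Axis Theorem in the nicer coordinate system.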
 
      - In one of the sabermetrics lectures we might discuss linear 
      programming. This is a wonderful topic, and it allows us to solve (or 
      approximate the solutions) to a wealth of problems very quickly. My 
      lecture notes are online here. One of my favorite applications of 
      linear programming is to determining 
      when teams are eliminated from playoff contention; MLB and ESPN 
      frequently do the analysis incorrectly by not taking into account 
      secondary effects of teams playing other teams (who are themselves playing 
      still other teams). For example, ESPN 
      or MLB back in '04 had the wild-card unclinched for one extra day. (The 
      Sox had a big lead over the Angels and a slightly smaller lead over the 
      As; however, the As and the Angels were playing each other and thus at 
      least one team would get 2 losses, and one had to win the AL West. Thus the 
      Sox had clinched the wildcard a day earlier than thought. We'll discuss 
      this paper when the pre-frosh visit, as it's a nice application of 
      multivariable calculus.) If you're interested, click 
      here for a paper I wrote with colleagues applying 
      linear programming to helping a movie theatre determine optimal schedules.
      
 
      - It is worth remarking that for many applications in the real world, we 
      do not need to find the true extremum, but rather just something very 
      close. For example, say we are trying to determine the optimal schedule 
      for an airline for a given day. We can write the linear programming 
      problem down, but it might take days to years to run; however, frequently 
      we can obtain bounds showing how close our answer is to the theoretical 
      best (ie, we can show we are no more than X away from optimal). It often 
      happens that X is small, and thus with a small run-time we can be close 
      enough. (It isn't worth it to ground our airfleet for a few years to find 
      the optimal schedule!)
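      - To give a feel for how little code a linear program takes nowadays, here 
      is a toy sketch (this is not the playoff or movie-theatre model, just a 
      made-up two-variable problem) using scipy:

            from scipy.optimize import linprog

            # Maximize 3x + 2y subject to x + y <= 4, x + 3y <= 6, x, y >= 0.
            # linprog minimizes, so we negate the objective.
            result = linprog(c=[-3, -2],
                             A_ub=[[1, 1], [1, 3]],
                             b_ub=[4, 6],
                             bounds=[(0, None), (0, None)])
            print(result.x, -result.fun)   # optimum at x = 4, y = 0, value 12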
 
    
     
    
    - 
    
    Systems of equations are frequently used to model real world problems, as it 
    is quite rare for there to be only one quantity of interest. If you want to 
    read more about applying math to analyze the Battle 
    of Trafalgar, here 
    is a nice handout (or, even 
    better, I think we could go further and write a nice paper for a general 
    interest journal expanding on the Mathematica 
    program I wrote). The model is very similar to the Lotka-Volterra 
    predator-prey equations (our 
    evolution is quite different, though; this is due to the difference in sign 
    in one of the equations). Understanding these problems is facilitated by 
    knowing some linear algebra. It is also possible to model this problem using 
    a system of difference equations, which can readily be solved with linear 
    algebra. Finally, it's worth noting a major drawback of this model, namely 
    that it is entirely deterministic: you specify the initial concentrations of 
    red and blue and we know exactly how many exist at any time. More generally 
    one would want to allow some luck or fluctuations; one way to do this is 
    with Markov 
    chains. This leads to more complicated (not surprisingly) but also more 
    realistic models. In particular, you can have different probabilities for 
    one ship hitting another, and given a hit you can have different 
    probabilities for how much damage is done. This can be quite important in 
    the 'real' world. A classic example is the British efforts to sink the 
    German battleship Bismarck in WWII. The Bismarck was superior to all British 
    ships, and threatened to decisively cripple Britain's commerce (ie, the flow 
    of vital war and food supplies to the embattled island). One of the key 
    incidents in the several-day battle was a lucky torpedo shot by a British 
    plane which seriously crippled the Bismarck's rudder. See 
    the wikipedia entry for more details on one of the seminal naval engagements 
    of WWII. The point to take away from all this is the need to always be 
    aware of the limitations of one's models. With the power and availability of 
    modern computers, one workaround is to run numerous simulations and get 
    probability windows (ie, 95% of the time we expect a result of the following 
    type to occur). Sometimes we are able to theoretically prove bounds such as 
    these; other times (using Markov chains and Monte 
    Carlo techniques) we numerically approximate these probabilities.
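    - To make the deterministic nature of such models concrete, here is a toy 
    difference-equation sketch in the spirit of the handout (the coefficients 
    and fleet sizes below are invented, not the handout's):

          # Each step, a fleet's losses are proportional to the size of the
          # opposing fleet; the evolution is completely determined by the
          # initial conditions.
          red, blue = 27.0, 33.0          # initial ship counts
          r_eff, b_eff = 0.05, 0.05       # effectiveness coefficients

          step = 0
          while red > 0 and blue > 0:
              red, blue = red - r_eff * blue, blue - b_eff * red
              step += 1
              print(step, round(red, 2), round(blue, 2))

      Replacing the fixed losses with random ones (say, each ship scoring a hit 
      with some probability) is exactly the Markov chain refinement described 
      above.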
 
  
     
  - Friday, February 
  22.
  
  We continued our discussion of partial derivatives. We 
  talked a lot about different notations for the derivative. It is very 
  convenient to be able to refer to all the different derivatives (functions 
  with one input, several inputs and one output, several inputs and several 
  outputs) with just one notation. The definition is that a function is 
  differentiable if the error in the tangent plane approximation tends to zero 
  faster than the distance of where we are to where we start tends to zero. It 
  is sadly possible for the partial derivatives to exist without the function 
  being differentiable; the mere existence of the partial 
  derivatives is not enough to imply our function is 
  differentiable. The example was \(f(x,y) = (xy)^{1/3}\). What must we 
  assume in order for the partial derivatives to imply our function is 
  differentiable? It turns out it suffices to assume the partial derivatives are 
  continuous. This is the major theorem in the subject, and provides a nice way 
  to check for when a function is differentiable.
    - 
    The proof of the theorem alluded to above uses two of my 
    favorite techniques. While sadly we do not multiply by 1, we do get to add 0 
    and we do use the Mean 
    Value Theorem. One of my goals in the class is to illustrate how to 
    think about these problems, why we try certain approaches for our proofs. We 
    want to study how well the tangent plane approximates our function, thus we 
    need to study f(x,y) - f(0,0) - (∂f/∂x)(0,0) 
    x - (∂f/∂y)(0,0) y. Our theorem assumes the partial derivatives are 
    continuous, thus it stands to reason that at some point in the proof we 
    should use the fact that the partial derivatives are continuous! The trick 
    is to see how we can get another ∂f/∂x and another ∂f/∂y to appear. The key 
    is to recall the MVT. If we add 0 in a clever way, we can do this. Our 
    expression equals f(x,y) - 
    f(0,y) + f(0,y) - f(0,0) - (∂f/∂x)(0,0) 
    x - (∂f/∂y)(0,0) y. We now use the MVT on f(x,y) - f(0,y) and on f(0,y) - 
    f(0,0). In each of these two expressions, only one variable changes. Thus 
    the first is (∂f/∂x)(c,y) x for some c between 0 and x, and the second is 
    (∂f/∂y)(0,ĉ) y for some ĉ between 0 and y. Thus the error 
    in using the tangent plane is [(∂f/∂x)(c,y) - (∂f/∂x)(0,0)] x + [(∂f/∂y)(0,ĉ) 
    - (∂f/∂y)(0,0)] y. We now see how the continuity of the partials enters -- 
    it ensures that these differences are small, even when we divide by ||(x,y)-(0,0)||.
 
    - The Mean Value Theorem is also the key ingredient in the proof of the equality 
    of mixed partial derivatives (assuming 
    both are continuous). Sadly 
    there do exist functions where the mixed derivatives are unequal
    (for extra credit, show that the mixed 
    derivatives are not equal in the linked example).
 
    - Notation is very important. The subscript notation for partial 
    derivatives is nice and elegant; it's easy to glance at \(u_{xxy}\) and 
    quickly glean that it's two derivatives with respect to x followed by one 
    with respect to y. This allows us to write down many partial 
    differential equations in a 
    nice, compact form. Some of the most famous and important are (1) the 
    heat equation, (2) the 
    wave equation, and (3) the Navier-Stokes 
    equation. The last arises in fluid flow, and is one of the Clay 
    Millennium Prize Problems.
 
    - In differential trigonometry, everything comes down to the limit as h 
    tends to zero of sin(h)/h. One can prove this limit geometrically, as is 
    often done, and then obtain the derivatives by using the angle addition 
    formulas. We sketch another avenue to these addition formulas. The Pythagorean 
    Theorem says 
    \(\cos^2(x) + \sin^2(x) = 1\). There are many ways to obtain 
    this formula. Perhaps one of the most useful is the Euler 
    - Cotes formula, exp(ix) = cos(x) + i sin(x). One can essentially derive 
    all of trigonometry from this relation, with just a little knowledge of the exponential 
    function. Specifically, we have \(\exp(z) = 1 + z + z^2/2! + z^3/3! 
    + \cdots\). It is not at all clear from this definition that exp(z) exp(w) = 
    exp(z+w); this is a statement about the product of two infinite sums 
    equaling a third infinite sum. It is a nice exercise in combinatorics to 
    show that this relation holds for all complex z and w.
      - Taking the above identities, we sketch how to derive all of 
      trigonometry! Let's prove the angle addition formulas. We have exp(ix) = 
      cos(x) + i sin(x) and exp(iy) = cos(y) + i sin(y). Then exp(ix) exp(iy) = 
      [cos(x) + i sin(x)] [cos(y) + i sin(y)] = [cos(x) cos(y) - sin(x) sin(y)] 
      + i [sin(x) cos(y) + cos(x) sin(y)]; however, exp(ix) exp(iy) = exp(i(x+y)) 
      = cos(x+y) + i sin(x+y) by Euler's formula. The only way two complex 
      numbers can be equal is if they have the same real and the same imaginary 
      parts. Thus, equating these yields cos(x+y) = cos(x) cos(y) - sin(x) sin(y) 
      and sin(x+y) = sin(x) cos(y) + cos(x) sin(y). (A quick numerical sanity 
      check appears after this list.)
 
      - It is a nice exercise to derive all the other identities. One can even 
      get the Pythagorean theorem! To obtain this, use exp(ix) exp(-ix) = exp(0) 
      = 1.
 
      - We thus see there is a connection between the angle addition formulas 
      in trigonometry and the exponential addition formula. Both of these are 
      used in critical ways to compute the derivatives of these functions. For 
      example, these formulas allow us to differentiate sine, cosine and the 
      exponential functions anywhere once we know their derivative at just one 
      point. Let f(x) = exp(x). Then f'(x) = lim [f(x+h) - f(x)]/h = lim [exp(x+h) 
      - exp(x)] / h = lim [exp(x) exp(h) - exp(x)] / h = exp(x) lim [exp(h) - 1] 
      / h; as exp(0) = 1, we find f'(x) = exp(x) lim [f(h) - f(0)] / h = exp(x) 
      f'(0); thus we know the derivative of the exponential function everywhere 
      once we know the derivative at 0! One finds a similar result for the 
      derivatives of sine and cosine (again, this shouldn't be surprising as the 
      functions are related to the exponential through Euler's formula).
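      - None of this replaces a proof, of course, but it is a good habit to 
      check identities like these numerically; a minimal sketch:

            import cmath
            import math

            x, y = 0.7, 1.3   # arbitrary test angles

            # exp(ix) exp(iy) should equal exp(i(x+y)).
            lhs = cmath.exp(1j * x) * cmath.exp(1j * y)
            print(abs(lhs - cmath.exp(1j * (x + y))))          # ~ 1e-16

            # Real and imaginary parts give the two addition formulas.
            print(lhs.real - (math.cos(x)*math.cos(y) - math.sin(x)*math.sin(y)))
            print(lhs.imag - (math.sin(x)*math.cos(y) + math.cos(x)*math.sin(y)))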
 
    
     
  
     
  - Wednesday, February 
  13.
  
  No class on Friday (Winter Carnival); Monday's class is an optional review, 
  exam is Wednesday February 20th in class. 
  
    
    - The terminology of open 
    sets, closed 
    sets, boundary 
    points and so on won't be 
    used too much (perhaps not ever again) in this course, but are essential in 
    more advanced analysis classes. It is important to build our subjects on 
    firm foundations. A great example of why this is needed is Russell's 
    paradox, which showed 
    that we didn't even understand what it meant to be a set or an element of a 
    set! Another famous paradox is the Banach 
    - Tarski paradox, which tells us that we don't understand volumes! It 
    basically says if you assume the Axiom 
    of Choice, you can cut a solid sphere into 5 pieces, and reassemble the 
    five pieces to get two completely solid spheres of the same size as the 
    original! While it is rare to find these paradoxes in mathematics, 
    understanding them is essential. It 
    is in these counter-examples that we find out what is really going on. It is 
    these examples that truly illuminate how the world is (or at least what our 
    axioms imply). Most people use the Zermelo-Fraenkel  
    axioms, abbreviated ZF. If you additionally assume the Axiom of Choice, 
    it's called ZFC or ZF+C. Not all problems in mathematics can be answered yea 
    or nay within this structure. For example, we can quantify sizes of 
    infinity; the natural numbers are much smaller than the reals; is there any 
    set of size strictly between? This is called the Continuum 
    Hypothesis, and my mathematical grandfather (one of my thesis 
    advisors' advisors), Paul 
    Cohen, proved it is independent (ie, you may either add it to your axiom 
    system or not; if your axioms were consistent before, they are still 
    consistent).
      - In a real analysis course, one develops the notation and machinery to 
      put calculus on a rigorous footing. In fact, several 
      prominent people criticized the foundations of calculus, such as Bishop 
      Berkeley; his famous attack, The 
      Analyst, is available here. It wasn't until decades later that good 
      notions of limit, integral and derivative were developed. Most people are 
      content to stop here; however, see also Abraham 
      Robinson's work in Non-standard 
      Analysis. He is one of several mathematicians we'll encounter this 
      semester who have been affiliated with my Alma Mater, Yale. 
      Another is the great Josiah 
      Willard Gibbs.
 
      - One of my favorite applications of open 
      and closed sets is Furstenberg's 
      proof of the infinitude of primes; one night while a postdoc at Ohio 
      State I had drinks with Hillel 
      Furstenberg and one of his 
      students, Vitaly 
      Bergelson. This is considered by many to be one of the best proofs of 
      the infinitude of primes; it is so good it is one of six proofs given in THE 
      Book. Unlike most proofs of the infinitude of primes, this gives no 
      bounds on how many primes there are at most x; even Euclid's 
      proof (if there are only 
      finitely many primes, say \(p_1, \dots, p_n\), then consider 
      \((p_1 \cdots p_n)+1\); either this is a new prime or it is 
      divisible by a prime not on our list, since dividing it by each prime on 
      our list leaves remainder 1) gives a lower bound, namely log log x 
      (the true answer is that there are about x / log x primes at most x). As 
      a nice exercise (for fun), prove this fact. This leads to an interesting 
      sequence: 2, 
      3, 7, 43, 13, 53, 5, 6221671, 38709183810571, 139, 2801, 11, 17, 5471, 
      52662739, 23003, 30693651606209, 37, 1741, 1313797957, 887, 71, 7127, 109, 
      23, 97, 159227, 643679794963466223081509857, 103, 1079990819, 9539, 
      3143065813, 29, 3847, 89, 19, 577, 223, 139703, 457, 9649, 61, 4357....   
      This sequence is generated as follows. Let a_1 = 2, the first prime. We 
      apply Euclid's argument and consider 2+1; this is the prime 3 so we set 
      a_2 = 3. We apply Euclid's argument and now have 2*3+1 = 7, which is 
      prime, and set a_3 = 7. We apply Euclid's argument again and have 2*3*7+1 
      = 43, which is prime and set a_4 = 43. Now things get interesting: we 
      apply Euclid's argument and obtain 2*3*7*43 + 1 = 1807 = 13*139, and set 
      a_5 = 13. Thus a_n is the smallest prime not on our list generated by 
      Euclid's argument at the nth stage. There are a plethora of (I believe) 
      unknown questions about this sequence, the biggest of course being whether 
      or not it contains every prime. This is a great sequence to think about, 
      but it is a computational nightmare to enumerate! I downloaded these terms 
      from the Online Encyclopedia of Integer Sequences (homepage is 
      http://oeis.org/ and the page for our sequence is http://oeis.org/A000945). 
      You can enter the first few terms of an integer sequence, and it 
      will list whatever sequences it knows that start this way, provide 
      history, generating functions, connections to parts of mathematics, .... 
      This is a GREAT website to know if you want to continue in mathematics. 
      There have been several times I've computed the first few terms of a 
      problem, looked up what the future terms could be (and thus had a formula 
      to start the induction).
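      - If you want to play with the sequence yourself, here is a short sketch 
      (factoring the products quickly becomes infeasible, which is why we stop 
      early):

            from sympy import primefactors

            # First several terms of the sequence above (OEIS A000945).
            terms = [2]
            for _ in range(8):
                product = 1
                for p in terms:
                    product *= p
                # smallest prime factor of (product of current terms) + 1
                terms.append(primefactors(product + 1)[0])
            print(terms)   # [2, 3, 7, 43, 13, 53, 5, 6221671, 38709183810571]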
 
    
    
    - We talked about limits and continuity. Informally, a continuous function 
    is one where we can draw its graph without lifting our pen/pencil from the 
    paper. If we take this as our working definition, however, we can easily be 
    misled in terms of properties of continuous functions. For example, are all 
    continuous functions differentiable? Clearly not, as we can take f(x) = |x|. 
    While this function is not differentiable at x=0, it is differentiable 
    everywhere else. Thus we might be led to believe that all continuous 
    functions are differentiable at most points. This sadly is not true.
      - Weierstrass showed (the first text where I read this used the phrase 'Weierstrass 
      distressed 19th century mathematicians') that it is possible to have a 
      function which is continuous everywhere and differentiable nowhere! The 
      wikipedia article is a good starting point. In addition to explicitly 
      stating what the function is, it has a nice plot and good comments. The 
      function exhibits fractal 
      behavior, though the term fractal wasn't used until many years later 
      by Mandelbrot.
 
      - In higher mathematics we learn to quantify orders of infinity. We see 
      that there are more real numbers than rational numbers (see Cantor's 
      diagonalization argument); the comments above on the Continuum Hypothesis 
      discuss whether or not there is a set whose size is strictly between the 
      rationals and the reals. Amazingly, if you count functions properly, it 
      turns out almost every continuous function is differentiable nowhere! See 
      here for some comments on 
      this strange state of affairs. The key ingredient is an advanced result from Functional 
      Analysis, the Baire 
      Category Theorem. There are also sets (fractals) of non-integral 
      dimension. Fractals have a rich history and numerous applications, ranging 
      from economics to Star Trek II: The Wrath of Khan (where they were used to 
      generate the simulated landscapes of the Genesis Torpedo; see 
      the wikipedia article on fractals in film). The economics applications 
      are quite important. One of the most influential papers is due to 
      Mandelbrot (The 
      Variation of Certain Speculative Prices). This is one of the most 
      important papers in all of economics, and argues that the standard 
      Brownian motion / random walk model of wall street is wrong. The crux of 
      the argument is that these standard theories do not allow enough large 
      deviation days. For more on this, see Mandelbrot-Hudson's The (Mis)Behavior 
      of Markets (I have a copy of this book and can lend you part of it if you 
      are interested).
 
    
    
    - We saw that for many limits of the form 0/0, a good way to attack the 
    problem is to switch to polar coordinates. We replace (x,y) tending to 
    (0,0) along an arbitrary path with r tending to 0 while θ does 
    whatever it wants. This works for many problems, and is a good thing to try.
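    A standard worked example (not necessarily the one from class): in polar 
    coordinates \(x = r\cos\theta\), \(y = r\sin\theta\), so 
    \[ \lim_{(x,y) \to (0,0)} \frac{x^3}{x^2 + y^2} \ = \ \lim_{r \to 0} \frac{r^3 \cos^3\theta}{r^2} \ = \ \lim_{r \to 0} r \cos^3\theta \ = \ 0, \] 
    since \(|r\cos^3\theta| \le r\) no matter what \(\theta\) does.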
    - For the Star Trek fans, there's an interesting Next Generation episode 
    which I forgot to mention: "The 
    Loss". The premise is that the Enterprise runs into two-dimensional 
    beings, and there are singularity issues. There's also a cosmic string 
    fragment (or some such technobabble, I forget exactly what). The reason I 
    was going to mention this is to talk about singularities and domains of 
    definition of functions. Basically, things on the Enterprise go haywire 
    because part of it is now being intersected by the plane of two-dimensional 
    beings. There are some animations of the Enterprise being intersected by the 
    plane (i.e., a level set!). 

    - We ended by giving the definition of a partial 
    derivative, and hinting at some of the key properties. Some questions to 
    think about: what is the correct generalization of the definition of a 
    function of one variable being differentiable to a function of several 
    variables being differentiable? Is it enough for all partial derivatives to 
    exist? Can we always interchange the order of two partial derivatives?

    - From a 
    colleague: General Advice On How To Study Physics (though a lot of it 
    applies to any subject).
    
  
     
  - Monday, February 
  11.
  
  We started with the three big change of variables (polar, cylindrical and 
  spherical), then moved on to a discussion on 
  functions and level 
  sets. These occur all the time in real world plots. For example, weather 
  maps constantly show lines of constant temperature; these are called isotherms.
    - 
    
    There are many different coordinate systems we can use; depending on the 
    symmetry of the problem, frequently it is advantageous to use one system 
    over another. We saw in class how complicated regions were reduced to 
    simpler regions. As a rule of thumb, it's better to have a harder integral 
    over a nicer region (rectangle, box) than a simpler integral over a more 
    complicated region. Three of the most common coordinate systems (after 
    Cartesian) are the following:
      - polar 
      coordinates: used most often in the plane where our quantity depends 
      only on the distance from the origin. Note that the filled in circle of 
      radius 1 in polar coordinates corresponds to the rectangle \(0 \le r \le 
      1\) and \(0 \le \theta \le 2\pi\). Thus changing variables replaces a 
      `hard' region with a simple rectangle (and it is much easier to integrate 
      over rectangles!). In Cartesian coordinates the area of the unit circle is 
      \(4 \int_0^1 \sqrt{1 - x^2} dx\); this can be done directly but the 
      integration is a bit of a chore, and doing it will hopefully give you an 
      appreciation for the power of changing variables (see the short 
      computation after this list).
 
      - 
      cylindrical coordinates: generalization of the above where now we live 
      in three dimensional space, but the part depending on x and y only depends 
      on x2 + y2.
 
      - spherical 
      coordinates: another generalization to three dimensions, where we only 
      depend on the distance from the origin.
 
      - Of course, the story does not end in three dimensions. For many 
      problems we need to work with an n-dimensional 
      sphere, and the resulting coordinate system.
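      - To make the promised computation explicit: in polar coordinates the 
      area of the unit disk is 
      \[ \int_0^{2\pi} \int_0^1 r \, dr \, d\theta \ = \ \int_0^{2\pi} \frac{1}{2} \, d\theta \ = \ \pi, \] 
      where the extra factor of \(r\) is the Jacobian of the change of 
      variables (more on this when we reach the Change of Variables formula).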
 
    
     
    - One can easily assign 
    homework involving sketching various curves, many of which are famous conic 
    sections. The shapes that arise are often ellipses, 
    hyperbolas, parabolas, lines and circles. 
    The theory of conic sections says that these are all related, and arise as 
    cross sections obtained by having planes intersect a cone at various angles. 
    These shapes arise throughout mathematics and science. Here are just a few 
    examples, which illustrate their importance.
      - Chemistry / Physics: The ideal 
      gas law states that PV = nRT. If we set T equal to a constant, we then 
      get PV is constant (this special case is called Boyle's 
      law). Note that this is an equation of a hyperbola, and thus the isotherms (level 
      sets of constant temperature) are hyperbolas.
 
      - Physics / Astrophysics: The most common example of conic sections is 
      the orbits of planets. In three-dimensional space, bodies orbiting the sun 
      under a gravitational force proportional to the inverse-square of the 
      distance travel in ellipses, hyperbolas, or parabolas (see 
      here for more details).
 
    
     
    - It is not too hard for us to imagine what it would be like for a sphere 
    to enter a plane, but it does become harder and harder to imagine four 
    dimensional objects arriving in our three dimensional world. One of my 
    favorite stories is the classic Nightfall (by 
    Isaac Asimov). What makes this such a great story is that he takes something 
    that is conceivable for us and creates a world where it is inconceivable for 
    the population. I strongly urge you to read this story.
 
    - The video clips for Flatland are available here: Flatland 
    trailer. (The 
    full movie is available here for class purposes only.) There are also projections 
    of 4-dimensional cubes in our 3-dimensional space. I find it very hard 
    to imagine four dimensional objects passing through our space, but it's a 
    fun exercise. We can imagine a sphere passing through Flatland; what would a 
    4-dimensional sphere look like passing through our space? We can imagine a 
    3-dimensional cube passing through Flatland (preferably at an angle as 
    otherwise it's nothing, a full square for a while, and then nothing again); 
    what happens with a 4-dimensional cube going through our space?
 
    - 
    
    Kepler's laws of planetary motion heavily 
    use ellipses, hyperbolas and the like. These 
    are the famous conic sections, and there's a beautiful unified theory of 
    them.
 
    - It's not hard to find functions of several variables. Baseball is filled 
    with these nowadays; one very popular one is the runs 
    created formula.
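    - A minimal sketch of the basic version of that formula (there are many 
    refinements; the statistics below are invented for illustration):

          # Basic Runs Created: RC = (H + BB) * TB / (AB + BB).
          def runs_created(hits, walks, total_bases, at_bats):
              return (hits + walks) * total_bases / (at_bats + walks)

          print(runs_created(hits=180, walks=60, total_bases=290, at_bats=560))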
 
    - At the end of class we talked about how easy it is to mislead or 
    manipulate based on how data is presented. The following is one of my 
    favorite examples. Imagine someone tells you that your team has won 3 of the 
    last 4 games. You're happy and think they're doing well, with a recent 
    winning percentage of 75%. However, things are not as rosy as they seem. In 
    the last four games, imagine they lost the game played four ago. If that 
    were the case, then since they've won 3 of their last 4 they must have won 
    their last 3. If this is the case, we'd just say they've won 3 in a row, as 
    that sounds better. Thus, 4 games ago was a win, and they've won 2 out of 
    the last 3. Moreover, we know what happened 5 games ago. It was a loss (if 
    it were a win, we would've said they've won 4 of 5). So their last 5 games 
    are either WWLWL, WLWWL or LWWWL. In all cases they've won 3 of 5 for 60% 
    (or 2 of the last 3 for 66%). Either way they're essentially at what an 
    average team would be, but saying 3 of the last 4 makes them appear 
    stronger.
 
  
     
  - Friday, February 
  8.
  
  A common feature in several variables is to first recall the one variable 
  case, and use that as intuition to describe what's happening. We started by 
  reviewing the three different ways to write the 
  equation of a line in the plane, 
  point-slope, point-point and slope-intercept, and talked about the hidden 
  vector lurking in the equation of a line in a plane. We then generalized this 
  to higher dimensions, and then wrote down the definition 
  of a plane (we'll 
  discuss alternate definitions involving normal vectors later in the course; 
  note that planes arose in the Super Bowl in 2010 as to whether or not the 
  Saints had control when the ball broke the plane during the two point 
  conversion; click 
  here,
  
  
  click here or click 
  here for 
  more on breaking the plane in football).
    - We discussed how there are several different but equivalent ways of 
    writing the same expression. We can do it with vectors, as in \((x,y,z) = P + 
    t\vec{v}\), or we can do it as a series of equations, such as \(x = p_1 + 
    t v_1\), \(y = p_2 + t v_2\), \(z = p_3 + t v_3\), or as \(x_i = 
    p_i + t v_i\) with 
    \(i \in \{1,2,3\}\).  You should use whichever way is easier for you to visualize. 
    It is possible to get so caught up in reductions and compactifications that 
    the resulting equation hides all meaning. A 
    terrific example is the great physicist Richard Feynman's reduction of all of 
    physics to one equation, U = 0, where U represents the unworldliness of the 
    universe. Suffice it to say, reducing all of physics to this one 
    equation does not make it easier to solve physics problems / understand 
    physics (though, of course, sometimes good notation does assist us in 
    looking at things the right way).
 
    - A nice problem is to prove the following about perpendicular lines: 
    the product of their slopes is always -1 if neither is parallel to the x- or 
    y-axis. In some sense, this 
    tells us that in the special case when the lines are the x- and y-axes, we 
    should interpret the product of their slopes as -1, or in other words in 
    this case \(0 \cdot -\infty = -1\).
 
    - There are many applications of equations of lines, planes and 
    projections. One of my favorites comes from art. The painter Thomas 
    Eakins projected pictures of 
    people onto canvases; this allowed him to have realistic pictures, and 
    saved him hours of computations. Two pictures frequently mentioned are Arcadia and Mending 
    the Net. He hid what he did; it wasn't until years later that people 
    noticed he had done this. If memory serves, this was discovered when people 
    were looking through photographs in an attic and noticed a picture of four 
    people on a New Jersey highway who were identical to four people in a 
    seascape. Upon closer inspection of the canvas, they noticed marks (which 
    were partly hidden) indicating Eakins projected the image onto the canvas. Click 
    here for more on the subject. See 
    also here for a nice story on the controversy (the 
    use of `technology' such as projectors in art). For 
    a semi-current view on the merits of tracing, watch this video clip.
 
    - There is an enormous literature on the applications of lines, planes, 
    projections et cetera in art. The 
    wikipedia article is a good starting point. Another fun example is the 
    original movie Tron; here 
    is the light cycle scene. Notice how back then almost everything is 
    straight lines, and how the computers are dealing with the perspectives.
 
    - The subject has advanced considerably over the years; ray 
    tracing is huge now, and can 
    do amazing 
    things very fast.
 
    - One final nice application is a paper 
    by Byers and Henle determining 
    where a camera was for a given picture, which allows us to do a great job 
    comparing then and now.
 
    - We discussed the equation for the angle between two vectors. 
    Geometrically, it's clear that if we change the lengths of the vectors then 
    we shouldn't change the angle; after a little inspection, we saw that our 
    formula satisfies that property. It is a great skill to be able to look at a 
    formula and see behavior like this. There is a rich history of applying 
    intuition like this to problems. One example is dimensional 
    (or unit) analysis,which is frequently seen in physics or chemistry; my 
    favorite / standard example is the simple 
    pendulum.
 
    - We talked a bit about inequalities a few days ago with the triangle 
    inequality. Here's some more on the subject.
    
 
    - We will not cover determinants in 
    great detail. For us, the most important property is that the determinant 
    measures the volume of the parallelepiped spanned by the rows (or columns). We 
    saw an application of that today in the interpretation of the triple product 
    formula for \(\overrightarrow{A} \cdot (\overrightarrow{B} \times \overrightarrow{C})\).
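    - A quick numerical sketch of that connection (the vectors are arbitrary 
    choices of mine):

          import numpy as np

          # A . (B x C) equals the determinant of the matrix with rows A, B, C;
          # its absolute value is the volume of the spanned parallelepiped.
          A = np.array([1.0, 2.0, 3.0])
          B = np.array([0.0, 1.0, 4.0])
          C = np.array([5.0, 6.0, 0.0])

          print(np.dot(A, np.cross(B, C)))            # 1.0
          print(np.linalg.det(np.array([A, B, C])))   # also 1.0 (up to round-off)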
 
  
   
 
  - Wednesday, February 
  6.
  
  We continued our list of applications of the Pythagorean 
  Theorem. 
  We saw how it leads to the law 
  of cosines, 
  which leads to our angle formula relating the angle in terms of the dot 
  product. 
  We then talked about determinants, which will be really useful when we get to 
  the multidimensional 
  change of variable formula. 
  The cross 
  product will 
  be very useful in dealing with the geometry of various functions, and occurs 
  all the time in physics and engineering, ranging from Maxwell's 
  equations for electromagnetism to 
  the Navier-Stokes 
  equation for fluid flow.
    - 
    
    When asked for a relation between sine and cosine, we used 
    \(\sin^2 x + \cos^2 x = 1\). This is the natural one to use as we're talking 
    about areas and the sines and cosines come in as side lengths. There are, of 
    course, other relations. One important one is the relation between the derivatives: 
    the derivative of \(\sin x\) is \(\cos x\) and the derivative of \(\cos x\) 
    is \(-\sin x\) (you have to remember which one gets the minus sign; my 
    mnemonic is minus sine). In differential 
    trigonometry, it is essential that 
    we measure angles in radians. If 
    we use radians, then the derivative of sin(x) is cos(x) and the derivative 
    of cos(x) is -sin(x); this is not true if we use degrees. If we use 
    degrees, we have pesky conversion factors of 360/2π to 
    worry about. The proof of these derivatives follows from the angle 
    addition formulas; let me know if you want more details about this 
    (we'll mention this briefly when we do Taylor series of exp(x)).
 
    - In the proof of the Law 
    of Cosines, the key step was adding an auxiliary line to reduce the 
    problem to the point where we could apply the Pythagorean Theorem. Learning 
    how to add these auxiliary lines is one of the hardest things to do in math. 
    As a good exercise, figure out what auxiliary lines to add to prove the 
    angle addition formula for sine, namely \(\sin(x+y) = \sin(x) \cos(y) + \cos(x) 
    \sin(y)\); click 
    here for the solution. For another example, click 
    here. One thing to keep in mind is what do we know, what are we building 
    upon. We know the Pythagorean formula; we thus want right triangles, which 
    suggests drawing an altitude. 
    There are a lot of nice theorems about altitudes and other such lines.
 
    - In the proof that the area of the hyper-parallelogram is given by the 
    absolute value of the determinant (in two dimensions) we wanted to replace 
    the \(\sin(\theta)\) term with a function of \(\cos(\theta)\). Note how similar this is 
    to the proof of the law of cosines; we again are trying to reduce our 
    analysis to something known. We have formulas for the cosines of angles in 
    terms of dot products, but not their sines.
 
    - I mentioned the movie Flatland. The 
    original story is available here, 
    while a trailer 
    from the new movie is here. It's an interesting exercise to think about 
    what life would be like confined to two dimension (think of how you eat and 
    what happens after). Any movie that has squaricles is worth seeing! Star 
    Trek: The Next Generation dealt with two-dimensional life forms in the 
    episode The Loss (clips no longer easily found online).
 
    - In our course we only deal with integral dimensions, but that misses a 
    lot! There are many natural phenomena that legitimately have a fractal 
    dimension. There 
    are famous papers trying to compute the length of the British coast; 
    would you be surprised or not surprised to hear that the Finnish coast has a 
    higher dimension than the British?
 
    - Here is a fun fact for 
    the day: a medical researcher rediscovers integration and gets 75 citations! The 
    article on this is here, while the paper 
    is here.
 
  
   
 
  - Monday, February 4.
  
  Today we discussed some notation and the basic properties of vectors, 
  specifically how to add, subtract, and rescale them.
    - We also discussed notation for the natural numbers, the integers, the 
    rationals, the reals and the complex numbers. We will not do too much with 
    the complex numbers in the course, but it is important to be aware of their 
    existence. Generalizations of the complex numbers, the quaternions, 
    played a key role in the development of mathematics, but have thankfully 
    been replaced with vectors (online 
    vector identities here). The quaternions themselves can be generalized a 
    bit further to the octonions (there 
    are also the sedenions, 
    which I hadn't heard of until doing research for these comments).
 
    - A natural question to ask is, if all we care about are real numbers, 
    then why study complex numbers? The reason is that certain operations are 
    not closed under the reals. For example, consider quadratic polynomials \(f(x) 
    = ax^2 + bx + c\) with \(a\), 
    \(b\) and \(c\) real numbers. Say we want to find the roots of f(x) = 0; 
    unfortunately, not all polynomials with real coefficients have real roots, 
    and thus to find the solutions may require us to leave the reals. Of course, 
    you could say that if all you care about is real world problems, this won't 
    matter as your solutions will be real. That said, it becomes very useful 
    (algebraically) to allow imaginary numbers such as \(i = \sqrt{-1}\). The reason 
    is that it allows us a very clean way to manipulate many quantities.
    
    There is an explicit, closed form expression for the three roots of a cubic; 
    while it may not be as simple as the quadratic 
    formula, it does the job. Interestingly, if you look at \(x^3 - 
    15x - 4 = 0\), the aforementioned method yields \((2 + 11i)^{1/3} + 
    (2-11i)^{1/3}\). It isn't at all obvious, but algebra will show that 
    this does in fact equal 4 (note \((2 \pm i)^3 = 2 \pm 11i\))! As you continue 
    further and further in 
    mathematics, the complex numbers play a larger and larger role.
 
    - The proof that the length of a vector is the square-root of the sum of 
    the squares is a nice example of a proof 
    by induction (see also my 
    notes here). There are many statements in mathematics that can be proved 
    using this technique, and if you plan on continuing in math/physics this is 
    worth learning.
 
    - Years ago I prepared a short handout for some of my students on various 
    proof techniques (click 
    here); it goes through several of the standard methods.
 
    - 
    
    We ended the mathematical lecturing with the definition of the inner 
    or dot product. While our definition only works for vectors, it turns 
    out this is one of the most useful ideas in mathematics, and can be 
    generalized greatly. For example, we can talk about the dot product of 
    functions! We've seen a bit how the dot product is related to angles and 
    lengths, and thus we will find that we can discuss in a sensible manner what 
    the `angle' is between sin(x) and cos(x)! You can look at special cases to 
    get a sense of the reasonableness of the formula (take two perpendicular 
    vectors, or see what happens when you rescale their lengths).
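    - To make the teaser concrete: one standard inner product on functions on 
    \([0, 2\pi]\) is \(\langle f, g \rangle = \int_0^{2\pi} f(x) g(x)\, dx\), 
    and with it 
    \[ \langle \sin, \cos \rangle \ = \ \int_0^{2\pi} \sin(x)\cos(x)\, dx \ = \ \frac{1}{2} \int_0^{2\pi} \sin(2x)\, dx \ = \ 0, \] 
    so sine and cosine are `perpendicular' in exactly the sense suggested by the 
    dot product formula for angles.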
 
    - While 
    looking at special cases of the angle formula can appear convincing, it 
    suffers a severe drawback: we're only looking at special vectors, either 
    parallel or perpendicular or rescaling a known case. There's a real danger 
    of drawing the wrong conclusion from special cases. For example, if we only 
    looked at right triangles we'd think the sum of the squares of the shorter 
    sides equals the square of the longer. We must check a truly generic case to 
    get some real security; unfortunately, it's hard to check those cases as we 
    don't know the angles! Related to this are some nice stories about people 
    taking advantage of processes that were supposed to be random but weren't. A 
    nice recent example is with scratch lottery tickets (see 
    here for the Wired article, and here 
    for another). For another example, there are some very small errors the 
    Germans made with their Enigma code during WWII, which allowed the Allies to 
    read all German military orders! See the Wikipedia 
    article on Ultra (Ultra was 
    the code given to allied decrypt efforts), as well as Articles 
    from the NSA on cryptography (this 
    is a link to many subpages). Two especially good and accessible ones deal 
    with the German 
    code Enigma, and Ultra, 
    the allied deciphering of it. I strongly urge you to look at the links 
    here. Another nice one is on the Battles 
    of Coral Sea and Midway. An amusing story involves a Civil 
    war message just decoded -- fortunately it wasn't needed! (Another 
    version of the story here.) This is a nice application of the Vigenere 
    cipher (see 
    also the notes by my colleague here on how to crack it). This is yet 
    another example of what was supposed to be a random pattern not being truly 
    random, and thus susceptible to attack.
 
    - 
    
    The Fibonacci 
    numbers (here 
    is the clip shown in class) show up in a variety of places. They 
    satisfy the following recurrence relation: F_{n+2} = F_{n+1} + F_n (with the 
    initial conditions F_0 = 0 and F_1 = 1). After a little inspection one sees 
    that the entire sequence is determined once we know two consecutive numbers, 
    as we can just use the recurrence relation. There are many fun applications 
    of Fibonacci (and other recurrence relations) in nature; perhaps my favorite 
    is proving why Double Plus One is a bad strategy in roulette (though many 
    websites, like 
    the one here, 
    don't seem to realize the danger, or perhaps deliberately avoid stating 
    it!). If you're interested in gambling applications of this (or other 
    aspects), just let me know;
    here's a short video 
    clip explaining how the Fibonacci numbers show this is a poor strategy!
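    - A small sketch for playing with the recurrence (the closed form here is 
    Binet's formula, which comes from the generating function approach):

          import math

          # F_{n+2} = F_{n+1} + F_n versus the closed form
          # F_n = (phi^n - psi^n) / sqrt(5).
          phi = (1 + math.sqrt(5)) / 2
          psi = (1 - math.sqrt(5)) / 2

          f = [0, 1]
          for n in range(2, 21):
              f.append(f[-1] + f[-2])

          for n in range(21):
              print(n, f[n], round((phi ** n - psi ** n) / math.sqrt(5)))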
 
  
     
 
  - Friday, February 
  1. The main result we 
  proved today was the
  Pythagorean Theorem, 
  which relates the length of the hypotenuse of a right triangle to the lengths 
  of the sides (President 
  Garfield is credited with a proof). For us, this result is important as it 
  gives us a way to compute the length of vectors. While we only proved it in 
  the special case of a vector with two components, the result holds in general. 
  Specifically, if \(v = (v_1, \dots, v_n)\) then \(\|v\| = \sqrt{v_1^2 
  + \cdots + v_n^2}\). It is a nice exercise to prove this. 
  One way is to use 
  Mathematical 
  Induction (one common image for induction is that of 
  following dominoes); 
  see also my handout on induction. 
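  Sketch of the inductive step, in case you get stuck: assume the result for 
  \(n-1\) components. The vector \((v_1, \dots, v_n)\) is the hypotenuse of a 
  right triangle whose legs are \((v_1, \dots, v_{n-1}, 0)\) and \((0, \dots, 0, 
  v_n)\), so one application of the two-dimensional Pythagorean Theorem gives 
  \[ \|v\| \ = \ \sqrt{\left(\sqrt{v_1^2 + \cdots + v_{n-1}^2}\right)^2 + v_n^2} \ = \ \sqrt{v_1^2 + \cdots + v_n^2}. \] 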
  Below are some additional remarks. These relate to material mentioned in 
  class. The comments below are entirely for your personal enjoyment and 
  edification. You do not need to read these for the class. These are meant to 
  show how topics discussed arise in other parts of mathematics / science; these 
  will not be on exams, you are not responsible for learning them, ....
  - Later in the semester we will revisit 
  Monte Carlo 
  Integration; the paper introducing these methods is called by many the most 
  important mathematical paper of the 20th century. Sadly, most integrals cannot 
  be evaluated in closed form, and we must resort to approximation methods.
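  A minimal sketch of the idea (the integrand is my choice, picked because it 
  has no elementary anti-derivative; see the comments on erf below):

        import math
        import random

        # Monte Carlo estimate of the integral of exp(-x^2) from 0 to 1:
        # average the integrand at N uniform random points in [0, 1].
        N = 100_000
        estimate = sum(math.exp(-random.random() ** 2) for _ in range(N)) / N
        print(estimate)   # about 0.7468; the error shrinks like 1/sqrt(N)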
  
 
  - Sabermetrics is 
  the `science' of applying math/stats reasoning to baseball. The formula I 
  mentioned in class is what's known as the
  
  log-5 method; a better formula is the
  Pythagorean Won 
  - Loss formula (someone linked 
  my paper deriving this from a reasonable model to the wikipedia page). 
  ESPN, MLB.com and all sites like this use the Pythagorean win expectation in 
  their expanded series. My derivation is a nice exercise in multivariable 
  calculus and probability; we will either derive it in class or I'll give a 
  supplemental talk on it.
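  A one-function sketch (the classical exponent is 2; refinements use an 
  exponent closer to 1.83, or one derived from a model):

        # Pythagorean won-loss: expected winning percentage from runs
        # scored and runs allowed over a season.
        def expected_win_pct(runs_scored, runs_allowed, exponent=2.0):
            rs, ra = runs_scored ** exponent, runs_allowed ** exponent
            return rs / (rs + ra)

        print(expected_win_pct(800, 700))   # about 0.566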
 
  - In general, it is sadly the case that most functions do not have a simple 
  closed form expression for their anti-derivative. Thus integration is orders 
  of magnitude harder than differentiation. One of the most famous functions that 
  cannot be 
  integrated in closed form is \(\exp(-x^2)\), which is related to 
  calculating areas under the normal (or bell or Gaussian) curve. We do at least 
  have good series expansions to approximate it; see the entry on the
  erf (or error) function.
  - In class we mentioned that the anti-derivative of ln(x) is x ln(x) - x; it 
  is a nice exercise to compute the anti-derivative for \((\ln x)^n\) for 
  any integer n. For example, if n=4 we get \(24x - 24x\ln x + 12x(\ln x)^2 
  - 4x(\ln x)^3 + x(\ln x)^4\).
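  If you work out a few cases you can guess, and then prove by induction 
  (differentiate to check!), the general pattern: 
  \[ \int (\ln x)^n \, dx \ = \ x \sum_{k=0}^{n} (-1)^{n-k} \, \frac{n!}{k!} \, (\ln x)^k \ + \ C. \] 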
 
   
  - Here are some links to general articles:
  
 
  - Finally, the quest to understand the cosmos played an enormous role in the 
  development of mathematics and physics. For those interested, we'll go to the 
  rare books library and see first editions of Newton, Copernicus, Galileo, 
  Kepler, .... Some interesting stories below; see also a great article 
  by Isaac Asimov on all of this, titled
  The Planet That Wasn't.