Additional comments related to material from the
class. If anyone wants to convert this to a blog, let me know. These additional
remarks are for your enjoyment, and will not be on homeworks or exams. These are
just meant to suggest additional topics worth considering, and I am happy to
discuss any of these further.
- Friday, May 14. We have seen numerous
times the
importance of optimizing, of finding maxima and minima. For a continuous,
differentiable function f of one variable the candidates are the
critical points (all x where f'(x) = 0) and the endpoints. To determine if
a critical point is a maximum, minimum or neither, we have the
First Derivative
Test and the
Second Derivative Test. The tests become more complicated in several
variables. We discussed a continuous, differentiable function f of two
variables. At a critical point its gradient vanishes, and thus its
second order Taylor series looks like T(x,y) = f(0,0) + (1/2) (x y) (Hf)(0,0)
(x y)^T, where (x y)^T means the column vector. If we take
the matrix Hf to have first row (2 0) and second row (0 3) we find T(x,y) =
(1/2)[2 x^2 + 3 y^2], which is always positive for (x,y) distinct from (0,0).
We thus see that we have a minimum. In other words, to generalize the second
derivative test to several variables we must figure out what the
generalization of g''(0) > 0 is. It becomes (x y) (Hf)(0,0) (x y)^T
> 0 for (x y) not equal to zero. This is called
positive
definiteness; there are similar generalizations for maxima. The easiest
way (or at least a common way) to determine if a matrix is positive
definite is to compute its
eigenvalues; if all the
eigenvalues are positive it is positive definite, if all are negative it is
negative definite, and if some are positive and some negative it is indefinite
(if some are zero the matrix is only semi-definite, and the test is
inconclusive). The mathematics behind these
classifications and eigenvalues is related to much of our course, especially
the Change of Variable formula and finding equations of curves. See the
Principal Axis
Theorem for more details (which includes how to write down the equation of
an ellipse whose axes are not aligned with the standard coordinate axes).
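- If you want to experiment with this classification, here is a minimal sketch in Python (using numpy; the matrix H below is the example from above, and everything else is my own illustration):

    import numpy as np

    H = np.array([[2.0, 0.0],
                  [0.0, 3.0]])           # the Hessian from the example above

    eigenvalues = np.linalg.eigvalsh(H)  # eigvalsh is for symmetric matrices
    if all(ev > 0 for ev in eigenvalues):
        print("positive definite: local minimum")
    elif all(ev < 0 for ev in eigenvalues):
        print("negative definite: local maximum")
    elif any(ev > 0 for ev in eigenvalues) and any(ev < 0 for ev in eigenvalues):
        print("indefinite: saddle point")
    else:
        print("a zero eigenvalue: the test is inconclusive")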
- Wednesday, May 12. Today was a fast
introduction to path
integrals, line integrals, and Green's
Theorem (which is a special case of the Generalized
Stokes' Theorem).
While our tour of these subjects has to be rushed in a 12 week course, if you
are continuing in certain parts of math, physics or engineering you will meet
these again and again (for example, see
Maxwell equations
for electricity and magnetism). In fact, one can view all of classical
mechanics as path
integrals where the trajectory of the particle (its c(t)) minimizes the action;
there is also a
path integral
approach to quantum mechanics.
- For those continuing in mathematics or physics, you will see these ideas
again if you take
complex analysis. In particular, one of the gems of that subject is
Cauchy's
Integral Theorem. A complex differentiable function satisfies what are
called the
Cauchy-Riemann equations, and these are essentially the combination of
partial derivatives one sees in Green's theorem. In other words, the
mathematics used for Green's theorem is crucial in understanding functions of
a complex variable.
- For me, I consider it one of the most beautiful gems in mathematics that
we can in some sense move the derivative of the function we're integrating to
act on the region of integration! This allows us to exchange a double integral
for a single integral for Green's theorem (or a triple integral for a double
integral in the divergence theorem). As we've seen constantly throughout the
year, often one computation is easier than another, and thus many difficult
area or volume integrals are reduced to simpler, lower dimensional integrals.
- The fact that Int_{t = a to b} grad(f)(c(t)) . c'(t) dt = f(c(b)) - f(c(a))
means that this integral does not depend on the path. If a vector field F =
(F1, F2, F3) equals grad(f) for some f, we say F is a
conservative force
field and f is the
potential. The
fact that these integrals do not depend on the path has, as you would expect,
profound applications.
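- Here is a quick numerical check of this path independence (a Python sketch of my own; the potential f(x,y) = x^2 y and the two paths are arbitrary illustrative choices):

    def grad_f(x, y):                    # f(x,y) = x^2 y, so grad f = (2xy, x^2)
        return (2*x*y, x*x)

    def line_integral(c, dc, n=100000):
        # midpoint Riemann sum of grad f(c(t)) . c'(t) dt over t in [0, 1]
        dt, total = 1.0/n, 0.0
        for i in range(n):
            t = (i + 0.5) * dt
            (x, y), (dx, dy) = c(t), dc(t)
            gx, gy = grad_f(x, y)
            total += (gx*dx + gy*dy) * dt
        return total

    print(line_integral(lambda t: (t, t),   lambda t: (1, 1)))    # straight line
    print(line_integral(lambda t: (t, t*t), lambda t: (1, 2*t)))  # parabola
    # both are approximately 1 = f(1,1) - f(0,0)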
- This is a good point to stop and think about the number of spatial
dimensions in the universe. Imagine a universe with two point masses under
gravity, and assume gravity is proportional to 1/r^{n-1} with r the distance
between the masses and n the number of spatial dimensions. If there are three
or more dimensions, then the work done in moving a particle from infinity to a
fixed, non-zero distance from the other mass is finite, while if there are two
dimensions the work is infinite! One should of course ask why the correct
generalization to other dimensions is 1/r^{n-1} and not 1/r^2 always. There is
a nice geometric justification in terms of flux and surface area; the surface
area of a sphere grows like r^2 and thus the only way to have the total flux
of force out of it be constant is to assume the force drops like 1/r^2;
click here for a bit
on the justification of inverse-square laws.
- Speaking of dimensions, one of my favorite problems from undergraduate
days was the Random Walk.
In one dimension, imagine a person so completely drunk that he/she has a 50%
chance at any moment of stepping to the left or the right; what is the
probability the drunkard eventually returns home? It turns out that this
happens with probability 1. In two dimensions, we have a 25% chance of moving
north, south, east or west, and again the probability of returning is 1. In 3
dimensions, however, the drunkard only returns home with probability about
34%. As my professor
Peter Jones
said, a three-dimensional universe is the smallest one that could be created
that will be interesting for drunks, as they really get to explore! These
random walk models are very important, and have been applied to economics (the
random walk hypothesis), as well as playing a role in
statistical
mechanics in physics.
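- A rough Monte Carlo sketch of the drunkard's walk (Python, my own; the number of walks and the step cutoff are arbitrary, so these are only estimates of the true return probabilities):

    import random

    def return_probability(dimension, walks=500, max_steps=5000):
        returns = 0
        for _ in range(walks):
            position = [0] * dimension
            for _ in range(max_steps):
                axis = random.randrange(dimension)
                position[axis] += random.choice((-1, 1))
                if all(c == 0 for c in position):
                    returns += 1
                    break
        return returns / walks

    for d in (1, 2, 3):
        print(d, return_probability(d))
    # the 1d estimate is already near 1; the 2d estimate also tends to 1, but
    # very slowly as max_steps grows; the 3d estimate hovers near 0.34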
- Monday, May 10. Today we began our
drive to the end of the semester, which will culminate in a statement of the
Big Three theorems of Vector Calculus (Green's
Theorem, Gauss'
Divergence Theorem, and
Stokes' Theorem).
These theorems are massive generalizations of the
Fundamental Theorem of Calculus, which can be generalized even more. The
idea is to relate the integral of the derivative of something over a region to
the integral of the something over the boundary of the region. To state these
theorems requires many concepts from vector calculus (parametrizing curves,
vectors, ...) as well as the Change of Variable theorem (converting integrals
over curves and surfaces to integrals over simpler curves and surfaces).
- To truly see and appreciate the richness of the three theorems (which are
really three variants of the same theorem), one must be in at least three
dimensions. There Stokes' Theorem states that the integral of a certain
function over a surface equals the integral of another over the boundary
curve. This means that many integrals turn out to be the same.
- To see the equivalence of these formulations requires
differential forms.
Frequently it is not immediately clear how to generalize a concept to higher
dimensions or other settings.
- While we only briefly touched on the subject,
conservative forces
are extremely important in physics and engineering, primarily because of a
wonderful property they have: the work done in moving an object from A to B is
independent of the path taken if the exerted force is conservative. Many of
the most important forces in classical mechanics are taken to be conservative,
such as gravity and
electricity. In
modern physics, these forces are replaced with more complicated objects. One
of the central quests in modern physics is to
unify the various
fundamental forces (gravity, strong, weak and electricity and magnetism).
- Click here for more on
divergence, and
click here for more on curl. Another related object (one we have seen many
times) is the gradient.
All of these involve the same differential operator, called
del (and represented with a nabla).
We used our intuition for vectors to define new combinations involving the del
operator (the curl and the divergence). While our intuition comes from
vectors, we must be careful as we do not have commutativity. For example, nabla
dot F is not the same as F dot nabla; the first is a scalar (number) while the
second is an operator.
Click here for
more on differential operators. For those who want to truly go wild on
operators, modern quantum mechanics replaces concepts like position and
momentum with differential operators (click
here for the momentum operator)! This allows us to rewrite the
Heisenberg
uncertainty principle in the
following strange format.
- One of the most famous applications of these concepts is the
Navier-Stokes
equation, which is one of the
Millennium
Problems (solving one of these is probably the hardest path to
one million dollars!).
The Navier-Stokes equation describes the motion of fluids, which not
surprisingly has numerous practical (as well as theoretical) applications.
Click here for a nice
derivation, which includes many of the new operators we saw today.
- Another place where gradients, curls and divergences appear is the
Maxwell equations
for electricity and magnetism; you can
view the equations here.
- The General Stokes Theorem is a massive generalization of the fundamental
theorem of calculus. The idea of formally moving the derivative from the
function to the region of integration is meant to be suggestive, but of course
is in no way a proof. Notation should help us see connections
and results. The great physicist
Richard Feynman showed that
all of physics is equivalent to solving the equation U = 0, where U measures
the unworldliness of everything. It is made up of squaring the differences
between the left and right hand sides of every physical law. Thus it has terms
like (F - ma)^2 and (E - mc^2)^2. It is a concise way of
encoding information, but it is not useful; everything is hidden. This is very
different than the vector calculus formulations of electricity and magnetism,
which do aid our understanding. For more information,
skim the article
here (search for unworldliness if you wish).
- We saw that we can compute the lengths of curves by evaluating integrals
of ||c'(t)||, where c(t) = (x(t), y(t), z(t)) is our curve. While this
formulation immediately reduces the problem of finding lengths to a Calc II
problem, in general these are very difficult integrals, and frequently cannot
be done in closed form even for simple shapes. For example, for extra credit
find the length of the ellipse (x/a)^2 + (y/b)^2 = 1.
Click here for
the solution (the answer involves the
elliptic integral of
the second kind).
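- While the closed form needs elliptic integrals, the arc length integral itself is easy to approximate numerically; here is a small Python sketch (my own, parametrizing the ellipse as c(t) = (a cos t, b sin t)):

    import math

    def ellipse_perimeter(a, b, n=100000):
        # midpoint Riemann sum of ||c'(t)|| dt, with c'(t) = (-a sin t, b cos t)
        dt, total = 2*math.pi/n, 0.0
        for i in range(n):
            t = (i + 0.5) * dt
            total += math.hypot(a*math.sin(t), b*math.cos(t)) * dt
        return total

    print(ellipse_perimeter(1, 1))  # a circle: 2 pi, about 6.28319
    print(ellipse_perimeter(2, 1))  # about 9.68845, an elliptic integral in disguise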
- Finally, we talked today about generalizing the
Fundamental Theorem of Calculus. There are not that many fundamental
theorems in mathematics -- we do not use the term lightly! Other ones you may
have seen are the
Fundamental Theorem of Arithmetic and the
Fundamental Theorem of Algebra;
click here for a
list of more fundamental theorems (including the
Fundamental Theorem of Poker!).
- Friday, May 7. Today we saw proofs of
the various convergence / divergence tests, as well as more examples.
- A nice application of a series expansion is
Stirling's formula
for n!. We can get close to the correct value by the integral test or the
Euler-MacLaurin
summation formula. This builds on a question asked in each section: we
know the integral test tells whether or not a series converges; if it does
converge, how close is the sum to the integral? The Euler-MacLaurin formula
teaches us how to convert sums to integrals and bound the error.
- The fact that Sum_{n = 1 to oo} 1/n^2 = pi^2/6 has a lot of applications.
It can be used to prove that there are infinitely many primes via the
Riemann zeta
function. The Riemann zeta function is zeta(s) = Sum_{n = 1 to oo} 1/n^s.
By
unique factorization (also known as the Fundamental Theorem of Arithmetic),
it also equals Prod_{p prime} 1 / (1 - 1/p^s); notice that a generalization of
the harmonic sum and the geometric series formula are coming into play. It
turns out that zeta(2) = pi^2/6, as can be seen in many different ways.
As pi^2 is irrational,
if there were only finitely many primes then the product would be irrational,
contradiction! See wikipedia
for a proof of this sum.
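- Here is a small numerical illustration of both facts (Python, my own sketch): the partial sums of zeta(2) approach pi^2/6, and the Euler product over the first few primes creeps toward the same value:

    import math

    print(sum(1/n**2 for n in range(1, 100001)), math.pi**2/6)
    # 1.64492..., versus pi^2/6 = 1.64493...

    product = 1.0
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        product *= 1/(1 - 1/p**2)
    print(product)   # about 1.633 with just ten primes, converging to pi^2/6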
- Another interesting application of summing series involving primes is to
the Pentium bug (see
the links there for more information, as well as
Nicely's webpage). The calculation
being performed was sum_{p: p prime and either p+2 or p-2 is prime} 1/p; this
is known as Brun's
constant. If this sum were infinite then there would be infinitely many
twin primes, proving one of
the most famous conjectures in mathematics; sadly the sum is finite and
thus there may or may not be infinitely many twin primes (twin primes are two
primes differing by 2).
- The proof we gave today of the geometric series formula (by shooting
baskets) uses many great techniques in mathematics. It is thus well worth it
to study and ponder the proof.
- Memoryless
process: once both people miss, it is as if we've just started the game
fresh.
- Calculating something two different ways: a good part of combinatorics is
to note that there are two ways to compute something, one of which is easy and
one of which is not. We then use our knowledge of the easy calculation to
deduce the hard. For example, Sum_{k = 0 to n} (n choose k)^2 = (2n choose n);
the right side is easy to compute, the left side not so clear. Why are the two
equal? It involves finding a story, which we leave to the reader.
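- If you want to convince yourself the identity is true before finding the story, here is a one-line check (Python; math.comb requires Python 3.8 or later):

    from math import comb

    for n in range(1, 11):
        assert sum(comb(n, k)**2 for k in range(n + 1)) == comb(2*n, n)
    print("Sum_{k=0}^{n} (n choose k)^2 = (2n choose n) holds for n = 1, ..., 10")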
- For another example of applications of harmonic numbers,
see the coupon
collector problem (if you want more info on this problem, let me know -- I
have lots of notes on it from teaching it in probability).
- In Section 2 today a basketball shot basically went in and out; see
the following article for some info on related problems in golf.
- Finally, we talked about how the hardest part of the integral test is
finding the appropriate function to use. Typically you try to replace all n's
with x's. Thus if a_n = 1/n try f(x) = 1/x. If we have a_n = 1/n!, it's not so
clear. What does x! mean if x is not an integer? This can be done through an
extension of the factorial function to what is called the
Gamma function. We
have Gamma(n+1) = n! if n is an integer, and use this to generalize the
factorial function.
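- A quick check (Python, my own sketch) that the Gamma function really does interpolate the factorial:

    import math

    for n in range(1, 6):
        print(n, math.factorial(n), math.gamma(n + 1))   # the two columns agree
    print(math.gamma(0.5), math.sqrt(math.pi))
    # Gamma(1/2) = sqrt(pi); in the factorial language, (-1/2)! = sqrt(pi)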
- Wednesday, May 5. We discussed the
various tests for whether or not a series converges. We briefly discuss the
tests below, and then we say a few words about the rate of convergence. I see
a lot of similarities between these tests and finding roots of quadratic
polynomials. The 'fastest' way to find the roots is to factor, but of course
that only works if you can 'see' the roots. If you can't see them, you can use
the mechanical grind of the quadratic formula. It's similar with these tests.
The 'easiest / fastest' to use is the comparison test, but its effectiveness
is tied to how many series you know that converge or diverge. If you can't see
a good series to compare it with, you then move to the ratio and root tests,
and then to the integral test.
- The big tests are:
- Comparison test:
This is one of my favorites, though to be effective you must know a lot of
examples of convergent and divergent series.
- Ratio Test: Remember
that the ratio test provides no information if the value is 1; thus it says
nothing about the convergence or divergence of 1/n^p for any fixed p > 0.
- Root Test: Remember
that the root test provides no information if the value is 1; thus it says
nothing about the convergence or divergence of 1/n^p for any fixed p > 0.
- Integral Test:
The most common example is the harmonic sum, 1/n. The integral test not only
gives the divergence, but with a bit more work shows that the sum of the first
n reciprocals of integers is about ln(n).
- We can use the tests above to show that the Taylor series expansions for
exp(x), cos(x), sin(x), ln(1-x) et cetera converge in various neighborhoods
(the first three for all x, the last for |x| < 1). Thus our results on
convergence of series let us know when we may replace a function with its
Taylor series expansion. Amazingly, in complex analysis we have the following
wondrous result: if a function of a complex variable z = x + i y is
differentiable
once with respect to z then it is infinitely differentiable and it equals
its Taylor series expansion! This is violently false for functions of a real
variable. What's going on? The difference is that to be differentiable once
requires us to allow x + i y to tend to a prescribed value along any path, and
this freedom of travel imposes sharp constraints as to what functions will be
differentiable. The end result is that complex differentiable functions are a
lot nicer than functions of one variable that are differentiable. Lurking in
the background are the
Cauchy-Riemann
equations, implying that a form of
Stokes theorem
holds.
- We wanted to show the series Sum_{n = 1 to oo} 1 / (4^n - 3^n) converges.
I claimed that for n large, we have 4^n > 2 * 3^n. We can prove this by using
logarithms. Say we have 4^n ? 2 * 3^n; we want to figure out whether ? is >
or <. Taking logarithms of both sides gives n ln(4) ? n ln(3) + ln(2), where
we used several log rules to simplify ln(2*3^n). Subtracting gives n( ln(4) -
ln(3) ) ? ln(2), or n ? ln(2) / ln(4/3) (where we again used log rules, and
noted that ln(4) - ln(3) = ln(4/3) > 0, so dividing by it does not change the
relation). As ln(2) / ln(4/3) is about 2.41, we see that for n > 2 that 4^n >
2*3^n, which is what we needed to show. This type of argument is used all the
time.
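- The computation above, checked directly (Python, my own sketch):

    import math

    print(math.log(2) / math.log(4/3))   # about 2.409
    for n in range(1, 6):
        print(n, 4**n > 2 * 3**n)        # False, False, True, True, True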
- Finally, last class we mentioned that certain notations become
standardized, no matter how much we may desire to change them. A great example
of how things have become fixed is the
qwerty keyboard --
if we were to start from scratch, we would not choose this! Another favorite
of mine is the use of cc in an email --
anyone remember what cc
stands for?
- Monday, May 3. We finished the last of
the Big Three coordinate changes,
spherical
coordinates (be aware that physicists and mathematicians have different
definitions of the angles!) and started sequences and series.
- One can generalize spherical coordinates to
hyperspheres in
n-dimensional space. These lead to wonderful applications of special
functions, such as the
Gamma function, in writing down formulas for the `areas' and `volumes'.
- There are many fascinating questions involving spheres:
- One of the most important applications of spherical coordinates is to
planetary motion, specifically, proving that the force one sphere exerts on
another is equivalent to all of the mass being located at the center of the
sphere. This is the most important integral in
Newton's great work,
Principia (we have a first edition at the library here). I strongly urge
everyone to look at this problem. Proving that one can take all of the mass to
be at the center enormously simplifies the calculations of planetary motion.
See the Wikipedia article on the
Shell Theorem for the
computation. As this is so important,
here is another link to a proof. Oh,
let's
do another proof here as well as
another proof here. For an example of a non-proof,
read the following
and the comments.
- Instead of the standard examples of sequences and series, such as the
geometric series and the harmonic series, it is fun to
explore some of the more exotic possibilities:
- An infinite
series of surprises: a nice article going from the geometric series to the
harmonic series to other important examples.
- We mentioned that sequences and series are very important; two of the most
powerful applications are
Taylor series (approximating complicated functions with simpler ones) and
Riemann sums (allowing
us to calculate areas with integrals).
- L'Hopital's rule
is frequently used to analyze the behavior of sequences. Remember that you can
only use it if you have 0 over 0 or infinity over infinity.
- Extra credit: What is wrong with the following argument: Let's say we want
to compute lim_{h --> 0} sin(h) / h; this is the most important trig limit. We
use L'Hopital's rule and note that it is the same as lim_{h --> 0} cos(h) / 1;
as cos(h) tends to 1, the limit is just 1. Why is this argument not valid? The
answer is one of the most important principles in mathematics!
- Friday, April 30. We sketched the
proof of the
Change of Variable formula in two-dimensions. The main idea is to keep
track of how the area changes under the mapping. The proof used many of the
techniques and concepts we've studied all semester, including the cross
product giving the area of a parallelogram and the definition of the
derivative via the tangent plane as an excellent approximation. We then applied the
change of variables formulas to
polar coordinates
and cylindrical
coordinates. We saw how much easier problems with angular symmetry become.
- Frequently we are confronted with the need to find the integral of a
function that we have never seen. One approach is to consult a table of
integrals (here
is one at wikipedia; see also the
table here). Times have changed from when I was in college. Gone are the
days of carrying around these tables; you can access
Mathematica's Integrator
on-line, and it will evaluate many of these. One caveat: sometimes these
integrals are doable but do not appear in the table in the form you have, and
some work is required to show that they equal what is tabulated.
- Probably my favorite example (and one of the most important!) of using
polar coordinates to evaluate an integral is to find the value of the
Gaussian integral
Int_{x = -oo to oo} exp(-x^2)dx. Of course, it seems absurd to use polar
coordinates for this as we are in one-dimension! Our book has a good
discussion of this problem, as does the
wikipedia page.
This is one of the most important integrals in the world, and leads to the
normalization constant for the
normal distribution
(also known as the bell curve or the Gaussian distribution), which may be
interpreted as saying the factorial of -1/2 is the square-root of
π!
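- Even before seeing the polar coordinate trick, you can check the answer numerically; a minimal Python sketch (my own; truncating at |x| = 8 loses a negligible tail):

    import math

    n, a = 200000, 8.0
    dx = 2*a/n
    total = 0.0
    for i in range(n):
        x = -a + (i + 0.5)*dx
        total += math.exp(-x*x) * dx
    print(total, math.sqrt(math.pi))   # both about 1.7724538509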
- Another good example, of course, is just
computing the area of a circle! In Cartesian coordinates we quickly see we
need the anti-derivative of sqrt(1 - x^2), which involves inverse
trigonometric functions; it is very straightforward in polar! In fact, we can
easily get the volume of a sphere by integrating the function 2 sqrt(1 - x^2 -
y^2) over the unit disk!
- Famous tables are
Abramowitz and
Stegun and Gradshteyn and
Ryzhik.
- For those interested in some of the history of special functions and
integrals,
see the nice article here by Stephen Wolfram. There's a lot of nice bits
in this article.
- One of my favorites is the throw-away comment in the beginning on how the
Babylonians reduced multiplication to squaring. Here's the full story. The
Babylonians worked base 60; if you think memorizing our multiplication table
is bad, consider their problem: 3600 items! Of course, you lose almost 1800 as
xy = yx, but still, that's a lot of tablets to lug. To compute xy, the
Babylonians noted that xy = ((x+y)^2 - x^2 - y^2) / 2, which reduces the
problem to just squaring, subtracting and division by 2. There are more steps,
but they are easier steps, and now we essentially just need a table of
squares. This concept is still with us today: it's the idea of a
look-up table,
computing new values (or close approximations) from a small list. The idea is
that it is very fast for computers to look things up and interpolate, and time
consuming to compute from scratch.
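- A toy reconstruction (Python, my own) of multiplying from a table of squares:

    squares = {n: n*n for n in range(200)}   # the precomputed "tablet"

    def babylonian_multiply(x, y):
        # xy = ((x+y)^2 - x^2 - y^2) / 2, using only lookups and easy arithmetic
        return (squares[x + y] - squares[x] - squares[y]) // 2

    print(babylonian_multiply(37, 59), 37 * 59)   # both 2183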
- Wednesday, April 28. The
Change of Variable formula ties together many of the topics of the
semester and generalizes a similar result from one-variable calculus. With
complicated formulas such as this, it is quite useful to look at special cases
first to get a sense of what is going on and then to try and generalize, being
aware of course that sometimes there are features that are missed in the
special cases. I like to look at this formula as giving the exchange rate from
measuring in one coordinate system to another. For example, going from
Cartesian
to Polar
coordinates we cannot have dx dy go to dr dtheta, as dx dy would have units of
meters-squared while dr dtheta has units of meters-radians and radians are
essentially unitless. (As a side note, the most important unitless number in
physics is the
fine structure constant.) We will see later that dx dy transforms to r dr
dtheta.
- Our analysis shows that when we have a linear rescaling, say u = 2x and v
= 3y, then if T(x,y) = (u,v) and T^{-1}(u,v) = (x,y), then dx dy transforms to
|det(D T^{-1})| du dv. Note how many concepts are being applied here. We have the
derivative of a vector valued function and we have determinants. The reason
for the absolute value is a bit tricky, but comes from the danger of having
signed areas. Remember in Calc I that Int_{x = a to b} f(x) dx = - Int_{x = b
to a} f(x) dx. We are looking at how the area elements transform. In order to
make sure the areas are positive, we need to insert absolute values here.
- Another caveat is where to evaluate our function when we integrate it over
the transformed region. Assume we have a map T from xy-space to uv-space. Let
R = T(S). What should Int Int_S f(x,y) dx dy equal in uv-space? It becomes Int
Int_T(S) f(T^{-1}(u,v)) |det DT^{-1}(u,v)| du dv. This is similar to the chain
rule. If A(x) = f(g(x)) remember A'(x) = f'(g(x)) g'(x) and not f'(x) g'(x).
This is one of the most common mistakes, namely evaluating f at the wrong
point. Similarly here we need to make sure we evaluate f at the right place.
In uv-space, our inputs are u and v, but f is expecting as inputs x and y. As
T sends x and y to u and v, T^{-1} sends u and v to x and y, and thus the new
function say g(u,v) = f(T^{-1}(u,v)) is what we should integrate over T(S).
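- Here is a numerical check of both caveats (Python, my own; the map T(x,y) = (2x, 3y), the function f(x,y) = x + y, and the region S = [0,1] x [0,1] are all illustrative choices; then T^{-1}(u,v) = (u/2, v/3) and |det DT^{-1}| = 1/6):

    def double_riemann(f, x0, x1, y0, y1, n=400):
        # midpoint Riemann sum over the rectangle [x0,x1] x [y0,y1]
        dx, dy = (x1 - x0)/n, (y1 - y0)/n
        return sum(f(x0 + (i + 0.5)*dx, y0 + (j + 0.5)*dy)
                   for i in range(n) for j in range(n)) * dx * dy

    f = lambda x, y: x + y
    # directly over S = [0,1] x [0,1]
    print(double_riemann(f, 0, 1, 0, 1))
    # over T(S) = [0,2] x [0,3]: evaluate f at T^{-1}(u,v), weight by |det DT^{-1}|
    print(double_riemann(lambda u, v: f(u/2, v/3) * (1/6), 0, 2, 0, 3))
    # both are approximately 1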
- There are many applications of the Change of Variables formula, especially
in probability theory;
see here for a one-dimensional example (if you have access to JStor,
here is one in
economics).
- Friday, April 23. In the first
part of class we discussed
Monte Carlo
Integration. Two good papers on the subject are available here:
Metropolis
(beginnings of method) and
Metropolis-Ulam
(the Monte Carlo Method). The basic idea is that if you have a region inside a
big rectangle (or box or hyperbox in higher dimensions) where it is easy to
tell if a point is in or not, then by choosing numbers randomly you can
approximate the area very well and very quickly. To quantify how fast and how
accurate you are, you can use various results from probability theory, such as
Chebyshev's
Theorem or even better the
Central Limit
Theorem; the arguments rely on knowledge of the concepts of
mean (or expected value)
and variance (or standard
deviation).
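- The simplest example of the method (a Python sketch of my own): estimate the area of the unit disk, where "inside or not" is a trivial test:

    import random

    def monte_carlo_disk_area(samples=1000000):
        hits = sum(1 for _ in range(samples)
                   if random.uniform(-1, 1)**2 + random.uniform(-1, 1)**2 <= 1)
        return 4 * hits / samples   # 4 is the area of the bounding square

    print(monte_carlo_disk_area())
    # about 3.1416 = pi; the error decays like 1/sqrt(samples)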
-
There are a multitude of applications of these
techniques; see for example the wikipedia article on
Monte
Carlo Methods in Finance, or the book
http://www.amazon.com/Financial-Engineering-Stochastic-Modelling-Probability/dp/0387004513
(a book on applying Monte Carlo to financial engineering).
-
We also talked about how to evaluate certain
integrals quickly by noting that we're integrating an
odd function
over a symmetric region.
-
Certain definite integrals can be evaluated
easily, even though the anti-derivatives are hard to find. We saw this for
integrating cos^2(x) from 0 to π/4; there are many others. After doing the
change of variable formula, we'll see this for integrating exp(-x^2/2) from
minus infinity to infinity (unlike the cosine example, there is NO closed form
expression for the anti-derivative; it's not just that it is messy, it's that
it doesn't exist!). Personally, I find it a bit miraculous at times that we
can exploit identities and get clean expressions for quantities such as these.
-
In the sabermetrics lunch today we digressed a
bit, and eventually started talking about
Erdos numbers,
Bacon numbers, and my
favorite:
Erdos-Bacon numbers. You can use
MathSciNet (click
on collaborative distance) to search for Erdos numbers, and the
Oracle of Bacon for Bacon numbers. It
is a very non-trivial problem to take a large network and find the shortest
route between two points. Another good example of the mathematics of efficient
searching is Google's PageRank.
- Wednesday, April 21. Fubini's
Theorem (changing the order of integrations) is one of the most important
observations in multivariable calculus. For us, we assume our function f(x,y)
is either continuous or bounded, and that it is defined on a simple region D
contained in a finite rectangle. If D is an unbounded region, say D = {(x,y):
x, y >= 0} then Fubini's theorem can fail for continuous, bounded functions.
In class we did an example involving a double sum, where a_{0,0} = 1, a_{0,1}
= -1, a_{0,n} = 0 for all n >= 2, then a_{1,0} = 0, a_{1,1} = 1, a_{1,2} = -1,
and then a_{1,n} = 0 for all n >= 3, and so on. If we want to have a
continuous function, we can tweak it as follows. Consider the indices {m,n}.
Draw a circle of radius 1/2 with center {m,n} (note no two points will have
circles that intersect or overlap). If a_{m,n} is positive, draw a cone with
base a circle of radius 1/2 centered at {m,n} and height 12/π.
As the volume of a cone is (1/3) (area of base) (height), this cone will have
volume 1; if a_{m,n} is negative we draw a similar cone but instead of going
up we go down, so now the volume is -1. What is going wrong? The
problem is that Sum_m Sum_n |a_{m,n}| = ∞ (the
sum of the absolute values diverges), and when infinities enter strange things
can occur. Recall we are not allowed to talk about ∞ - ∞; the contribution
from where our function or sequence is positive is +∞, the contribution where
it is negative is -∞, and we are not allowed to subtract infinities. (A
numerical version of this double sum appears below.)
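- A numerical version of the double sum (Python, my own sketch); the inner index runs far enough that every nonzero entry of each row or column is counted, so the two iterated sums are honestly different:

    def a(m, n):
        if n == m:     return 1
        if n == m + 1: return -1
        return 0

    N = 200
    rows_then_columns = sum(sum(a(m, n) for n in range(N + 2)) for m in range(N))
    columns_then_rows = sum(sum(a(m, n) for m in range(N + 2)) for n in range(N))
    print(rows_then_columns, columns_then_rows)   # 0 versus 1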
- To motivate the
Change of Variable
Formula, we discussed trying to find the area of a circle by doing the
integration directly. While there are many ways to justify learning the Change
of Variable Formula (it's one of the key tools in probability), I wanted to
take the path of looking at what should be a simple integral and seeing
how hard it can be to evaluate in the given coordinate system. Much of modern
physics is related to changing coordinate systems to where the problem is
simpler to study (see the
Lagrangian or
Hamiltonian
formulations of physics); these are equivalent to F = ma, but lead to much
simpler algebra. The problem we considered was using one-variable calculus to
find the area under a circle. This requires us to integrate sqrt(1 - x^2)
from x=0 to x=1. This is one of the most important shapes in mathematics -- if
calculus is such a great and important subject, it should be able to handle
this!
- To attack this problem, we recalled a
powerful technique from Calc I: if f(g(x)) = x (so f and g are inverse
functions, such as f(x) = x^2 and g(x) = sqrt(x)), then g'(x) = 1 / f'(g(x)); in other words,
knowing the derivative of f we know the derivative of its inverse function.
This was used in Calc I to pass from knowing the derivative of exp(x) to the
derivative of ln(x). We tried various
inverse
trig functions; while many were close to sqrt(1-x^2), none of them were
exactly that (a
list of the derivatives of these is here). This highlights one of the
most painful parts of integration theory -- just because we are close to
finding an anti-derivative does not mean we can actually find it! While there
is a nice anti-derivative of sqrt(1 - x^2), it is not a pure
derivative of an inverse trig function. There are many
tables of anti-derivatives (or integrals) (a fun example on that page is
the Sophomore's Dream).
Unfortunately it is not always apparent how to find these anti-derivatives,
though of course if you are given one you can check by differentiating (though
sometimes you have to do some non-trivial algebra to see that they match). In
fact, there are some tables of integrals of important but hard functions where
most practitioners have no idea how these results are computed (and
occasionally there are errors!). We will see later how much simpler these
problems become if we change variables; to me, this is one of the most
important lessons you can take from the course: Many problems have
a natural point of view where the algebra is simpler, and it is worth the time
to try to find that point of view!
- For another example of changing your
viewpoint, think of trying to write down an ellipse aligned with the
coordinate axes, and one rotated at an angle. Linear algebra provides a nice
framework for doing these coordinate transformations, changing hard problems
to simpler ones already understood.
- Monday, April 19. Today we discussed modeling, in particular, the interplay between
finding a model that captures the key features and one that is mathematically
tractable. While we used a problem from baseball as an example, the general
situation is frequently quite similar. Often one makes simplifying assumptions
in a model that we know are wrong, but lead to doable math (for us, it was
using continuous probability distributions in general, and in particular the
three parameter Weibull). For more on these and related models,
my baseball paper is available here; another interesting read might be
my marketing paper for the movie industry (which is a nice mix of modeling
and linear programming, which is the linear algebra generalization of Lagrange
multipliers).
- One of the most important applications of finding areas under curves is in
probability, where we may interpret these areas as the probability that
certain events happen. Key concepts are:
- The more distributions you know, the better chance you have of finding one
that models your system of interest. Weibulls are frequently used in survival
analysis. The
exponential distribution occurs in waiting times in lines as well as in the
spacings between prime numbers.
- In seeing whether or not data supports a theoretical contention, one needs
a way to check and see how good of a fit we have.
Chi-square tests
are one of many methods.
- Much of the theory of probability was derived from people interested in
games of chance and gambling. Remember that when the house sets the odds, the
goal is to try and get half the money bet on one team and half the money on
the other. Not surprisingly, certain organizations are very interested in
these computations.
Click here for
some of the details on the Bulger case (the bookie I mentioned in class is
Chico Krantz, and is referenced briefly).
- Any lecture on multivariable calculus and probabilities would be remiss if
it did not mention how unlikely it is to be able to derive closed form
expressions; this is why we will study
Monte Carlo
integration later. For example, the
normal distribution
is one of the most important in probability, but there is no nice
anti-derivative. We must resort to series expansions; that expansion is so
important it is given a name:
the error function.
- I strongly urge you to read the pages where we evaluate the integrals in
closed form. The methods to get these closed form expressions occur frequently
in applications. I particularly love seeing relations such as 1/c = 1/a + 1/b;
you may have seen this in
resistors in parallel or perhaps the
reduced mass from the
two body problem
(masses under gravity). Extra credit to anyone who can give me another example
of quantities with a relation such as this.
- The probability distribution 6 x (1-x) for 0 <= x <= 1 is an example of a
Beta distribution,
which is very useful in modeling a wide variety of phenomena (for example, see
the Laffer curve).
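- A quick sanity check (Python, my own sketch) that 6x(1-x) is a probability density on [0,1], along with its mean:

    n = 100000
    dx = 1.0 / n
    total = mean = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        p = 6 * x * (1 - x)          # the Beta(2,2) density
        total += p * dx
        mean  += x * p * dx
    print(total, mean)               # about 1.0 and 0.5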
- Click here for a clip
of Plinko on The Price Is Right, or here for a
showcase showdown.
- Friday, April 16. The main
result today was a method for integrating over regions other than a rectangle.
We discussed a theoretical way to do it by replacing our initial function f on
a rectangle including D with a new function f*, with f*(x,y)
= f(x,y) if (x,y) is in our domain D and 0 otherwise. To make this rigorous we
need to argue and show that we may cover any curve with a union of rectangles
with arbitrarily small area. This leads to some natural, interesting questions.
- The first, and most important, involves what happens to a function when we
force it to be 0 from some point onward (say outside D). The function may
be discontinuous at the boundary, but then again it may not. There are
many interesting and important examples from mathematical physics where we are
attempting to solve some equation that governs how that system evolves. One of
the most studied are the vibrations of a drum, where the
drumhead is connected and
stationary. We can thus view the vibrating drumhead as giving the values of
our function on some region D, with value 0 along the boundary. This leads to
the
fascinating question of whether or not you can hear the shape of a drum.
This means that if you hear all the different harmonics of the drum, does that
uniquely determine a shape? Sadly, the answer is no -- different drums can
have the same sounds.
An excellent article
on this is due to Kac, and can be read here.
- We discussed y-simple, x-simple and simple regions (note that a region is
said to be elementary if it is either x-simple or y-simple). The point of our
analysis here is to avoid having to go back to the definition of the integral
(ie, the Riemann sum). While not every region is elementary, many are either
elementary or the union of elementary regions. Below are two interesting
tidbits about how strange things can be:
- Space filling
curves: click here for just how strange a curve can be!
-
Koch snowflake: This is an example of a fractal set; the boundary has
dimension greater than 1 but less than 2! Its
fractal dimension is log 4 / log 3.
- Jordan curve
theorem: It turns out to be surprisingly difficult to prove that every
non-intersecting curve in the plane divides the plane into an inside and an
outside region. It's not too bad for polygons, but for more general curves
(such as the non-differentiable boundary of the Koch snowflake), it's harder.
- Monday, April 12. Today we
proved the
Fundamental Theorem of Calculus in one variable. To simplify the proof, we
made the additional assumptions that our function was continuously
differentiable and the derivative was bounded. These assumptions can all be
removed; it suffices for the function to be continuous on a finite interval
(in such a setting, a continuous function is actually
uniformly continuous;
informally, this means in the
epsilon-delta formulation of continuity that delta is independent of the
point). Such a result is typically proved in an analysis class. What I find
particularly interesting about the proof is that the actual value that bounds
the function is irrelevant; all that matters is that our function is bounded.
Theoretical math constantly uses such tricks; this is somewhat reminiscent of
some of the Lagrange Multiplier problems, where we needed to use the existence
of lambda to solve the problem, but frequently we never had to compute the
value of lambda.
- The key ingredients in the proof are using the
Mean Value Theorem
and observing that we have a
telescoping sum.
One has to be a little careful with telescoping sums with infinitely many
terms. The wikipedia article has some nice examples of telescoping sums and
warnings of the dangers if there are infinitely many summands.
- Whenever you are given a new theorem (such as the Fundamental Theorem of
Calculus), you should always check its predictions against some cases that you
can readily calculate without using the new machinery. For example, if we want
to find the area under f(x) from x=0 to x=1, obviously the answer will depend
on f. If f is constant it is trivial; if f is a linear relation then the
answer is still readily calculated. For more general polynomials, one can
compute the Riemann sums (the
upper and lower sums)
by Mathematical
Induction. For example, using induction one can show that the sum from n=1
to n=N of n^2 is simply N(N+1)(2N+1)/6, and this result can then be used to find
the area under the parabola y = x^2.
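- Here is that check carried out numerically (Python, my own sketch): the lower and upper sums for f(x) = x^2 on [0,1] squeeze down to the 1/3 the Fundamental Theorem predicts:

    def lower_and_upper(n):
        # f(x) = x^2 is increasing on [0,1], so on each piece the min and max
        # occur at the left and right endpoints
        lower = sum((i/n)**2 for i in range(n)) / n
        upper = sum(((i + 1)/n)**2 for i in range(n)) / n
        return lower, upper

    for n in (10, 100, 1000):
        print(n, lower_and_upper(n))   # both columns approach 1/3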
- The integration covered through Calc III is known as
Riemann sums / Riemann
integrals. In more advanced math classes you'll meet the successor,
Lebesgue integrals.
Informally, the difference between the two is as follows. Imagine you have a
large number of coins of varying denominations, and your job is to count the
amount of money. Riemann integration works by breaking up the domain of the
function (adding the coins in the order they are handed to you); Lebesgue
integration works by breaking up the range (grouping the coins by denomination
and counting each group).
- In the review session on Sunday we talked about paths on surfaces; in
particular, we talked about how
Pac-man is really played on a
torus / donut. The mathematics of tori and shapes in general is quite
fascinating; if you want,
click here and download games for tori (including tic-tac-toe for the
torus!).
- For those looking for a challenge: Let f satisfy the conditions of the
Fundamental Theorem of Calculus. Let L(n) denote the corresponding lower sum
when we partition the interval [0,1] into n equal pieces, and similarly let
U(n) denote the upper sum. We know U(n) - L(n) tends to zero and L(n) <= True
Area <= U(n); as U(n) - L(n) --> 0 as n --> oo, both U(n) and L(n) tend to the
true area. Must we have L(n) <= L(n+1), or is it possible that L(n+1) might be
less than L(n)?
- Friday, April 9. In one
dimension, there is not much choice in how we integrate; however, if we are
trying to integrate a function of several variables over a rectangle (or other
such region), not surprisingly the situation is markedly different. Similar to
the freedom we have with limits in several variables (where we have to
consider all possible paths), there are many ways to integrate. Imagine we
have a function of two variables and we want to integrate it over the
rectangle [a, b] x [c, d], with x in [a, b] and y in [c, d]. One possibility
is we can fix x and let y vary, computing the integral over y for the fixed x,
and then let x vary, computing the integral over x. Of course, we could also
do it the other way. As we are integrating the same function over the same
region (just in a different order), we hope that the answers are the same! So
long as everything is nice, this is the case. There are many formulations as
to exactly what is needed to make the situation nice; if our function is
continuous and bounded and we are integrating over a finite rectangle, then we
can interchange the order of integration without changing the answer. This is
called Fubini's
theorem, and is one of the most important results in integration theory in
several variables. There really isn't an analogue in one dimension, as there
we have no choice in how to integrate!
- Whenever you are given a theorem, it is worthwhile to remove a condition
and see if it is still true. Typically the answer is no (or if it is still
true, the proof is frequently much harder). There are many functions and
regions where the order of integration matters. The simplest example is
looking at double sums rather than double integrals, though with a little work
we can convert this example to a double integral. We give a sequence a_{m,n}
such that Sum_{m = 0 to oo} Sum_{n = 0 to oo} a_{m,n} is not equal to Sum_{n
= 0 to oo} Sum_{m = 0 to oo} a_{m,n}. For m, n >= 0 let a_{m,n} = 1 if m = n,
-1 if n=m+1 and 0 otherwise. Show that the two different orders of summation
yield different answers. The reason for this is that the sum of the
absolute value of the terms diverges.
- Click here for
another example where we cannot interchange the order of integration; a
more involved
example is available here.
- Click here for a video
by Cameron on how he applies Fubini's theorem to change the order of
operations (he does a double sum instead of a double integral, but the
principle is the same).
- Wednesday, April 7. Though
today's lecture covered two themes (Lagrange Multipliers and the Method of
Least Squares), there are similarities between applications of each. Both
highlight how our choice of measuring can affect the answer. For the Lagrange
problem, where to build the base or distribution center varies greatly
depending on how we weight differences. Similarly the best fit value of the
parameters depends on how we choose to measure errors. It is very important to
think about how you are going to measure / model, as frequently people reach
very different conclusions because they have different starting points /
different metrics.
- A complete
write-up of the five different models for measuring distance between our base
and the trouble points is available here. For three of the five models one
can use Lagrange Multipliers and obtain a system of equations that is solvable
with straightforward algebra; for two of the methods the gradients end up
involving very nasty functions (with square-roots of polynomials in the
denominator), and we need to resort to numerical techniques. The notes
referred to above describe the different approaches, especially the merits and
disadvantages of each model. These notes are a good introduction to how
sensitive an answer can be to how you measure success.
- The Method of Least Squares is one of my favorites in statistics (click
here for the Wikipedia page, and
click here for my notes). The Method of Least Squares is a great way to
find best fit parameters. Given a hypothetical relationship y = a x + b, we
observe values of y for different choices of x, say (x_1, y_1), (x_2, y_2), (x_3,
y_3) and so on. We then need to find a way to quantify the error. It's natural
to look at the observed value of y minus the predicted value of y; thus it is
natural that the error should be Sum_{i=1 to n} h(y_i - (a x_i + b))
for some function h. What is a good choice? We could try h(u) = u, but this
leads to sums of signed errors (positive and negative), and thus we could have
many errors that are large in magnitude canceling out. The next choice is h(u)
= |u|; while this is a good choice, it is not analytically tractable as the
absolute value function is not differentiable. We thus use h(u) = u^2;
though this assigns more weight to large errors, it does lead to a
differentiable function, and thus the techniques of calculus are applicable.
We end up with a very nice, closed form expression for the best fit values of
the parameters.
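- For the curious, here is that closed form written out (Python, my own sketch; the formulas come from setting the partial derivatives with respect to a and b equal to zero and solving the two resulting linear equations):

    def least_squares(xs, ys):
        n = len(xs)
        sx, sy = sum(xs), sum(ys)
        sxx = sum(x*x for x in xs)
        sxy = sum(x*y for x, y in zip(xs, ys))
        a = (n*sxy - sx*sy) / (n*sxx - sx*sx)
        b = (sy - a*sx) / n
        return a, b

    print(least_squares([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]))
    # about (1.94, 0.15): close to the line y = 2x used to fake the data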
- Unfortunately, the Method of Least Squares only works for linear relations
in the unknown parameters. As a great exercise, try to find the best fit
values of a and c in y = c/x^a (for definiteness you can think of
this as the force due to two unit masses that are x units apart). When you
take the derivative with respect to a and set that equal to zero, you won't
get a tractable equation that is linear in a to solve. Fortunately there is a
work-around. If we change variables by taking logarithms, we find ln(y) = ln(c/x^a);
using logarithm
laws this is equivalent to ln(y) = -a ln(x) + ln(c); setting Y = ln(y), X =
ln(x) and b = ln(c) this is equivalent to Y = -a X + b, which is exactly the
formulation we need! This example illustrates the power of logarithms; it
allows us to transform our data and apply the Method of Least Squares.
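- Continuing the sketch above (and reusing its least_squares routine), here is the logarithm trick in code; the data below is manufactured from c = 3 and a = 2, so the fit should recover those values:

    import math

    def fit_power_law(xs, ys):
        # fit y = c / x^a: take logs, fit the line Y = -a X + b, undo the change
        X = [math.log(x) for x in xs]
        Y = [math.log(y) for y in ys]
        slope, intercept = least_squares(X, Y)
        return -slope, math.exp(intercept)    # recover a and c

    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [3.0 / x**2 for x in xs]
    print(fit_power_law(xs, ys))              # (2.0, 3.0)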
- There are many examples of power laws in the world. Many of my favorites
are related to Zipf's law.
In class we discussed the frequencies of the most common words in English (click
here for the data; see also
this site);
this works for other languages as well, for the size of the most populous
cities, ...; if you consider more general power laws, you also get
Benford's law of digit
bias, which is
used
by the IRS to detect tax fraud (the link is to an article by a colleague
of mine on using Benford's law to detect fraud). The power law relation is
quite nice, and initially surprising to many. My
Mathematica
program analyzing this is available here. See also
this paper by Gabaix for Zipf's
law and the growth of cities. As a nice exercise, you should analyze the
growth of city populations (you can get data on both
US and
the
world from Wikipedia).
- We discussed Kepler's
Three Laws of Planetary Motion (the Wikipedia article is very nice).
Kepler was proudest (at least for a long time) of
Mysterium Cosmographicum (I strongly urge you to read this; yes, the same
Kepler whom we revere today for his understanding of the cosmos also advanced
this as a scientific theory -- times were different!).
- Finally, a theme of the past two days is the importance of how we choose
to measure things; how we model and how we judge the model's prediction will
greatly affect the answer. In a similar spirit, I thought I would post a brief
note about Oulipo, a type of
mathematical poetry (this is a link to the Wikipedia page, which has links
to examples). There was a nice article about this recently in Math Horizons (you
can view the article here). This is a nice example of the intersection of
math and the arts, and discusses how the structure of a poem
affects the output, and what structures might lead to interesting works.
- Monday, April 5. Welcome back!
Today's class and Wednesday's are about applications of differentiating
functions of several variables. Specifically, we first generalized the methods
from one variable calculus on how to find maxima and minima of functions.
Recall that if f is a differentiable real-valued function on an interval [a,b],
then the candidates for maxima / minima are (1) the critical points, namely
those x in [a,b] where f'(x) = 0, and (2) the endpoints. How does this
generalize to several variables? In one-dimension the boundary of an interval
is `boring'; it's just the two endpoints, and thus it isn't that painful to
have to check the value of the function there as well as at the critical
point. What about several variables? The situation is quite different. For
example, the interval [-1,1] might become the ball x^2 + y^2 + z^2 <= 1; the
interior is all points (x,y,z) such that x^2 + y^2 + z^2 < 1, while the
boundary is now the set of points with x^2 + y^2 + z^2 = 1. Unfortunately this
leads to infinitely many points to check; while we could afford to just check
the endpoints by brute force in one-dimension, that won't be possible now. The
solution is the Method of Lagrange Multipliers.
- Two good links:
An
introduction to Lagrange Multipliers and
Lagrange Multipliers.
- The Method of Lagrange Multipliers is one of the most frequently used
results in multivariable calculus. It arises in physics (Hamiltonians and
Lagrangian, Calculus of Variations), information theory, economics, linear and
non-linear programming, .... You name it, it's there. The two webpages
referenced above have several examples in these and other subjects; there are
of course many other sources and problems (click
here for a nice post on gasoline taxes, pollution and Lagrange multipliers).
For more on the economics impact,
click here, as well as see the following papers:
- The Method of Lagrange Multipliers ties together many of the concepts
we've studied this semester, as well as some from Calc I and Calc II (vectors,
directional derivatives and gradients, and level sets, to name a few). The
goal is to show you how the theoretical framework we developed can be used to
solve problems of interest. The military example we discussed is just one of
many possible applications. We were concerned with how to deploy a fleet to
minimize average deployment time to trouble spots (for more information, see
my notes on the
problem and the
Mathematica
code); of course, instead of considering each place equally important we
could easily add weights. One consequence of war is that it does strongly
encourage efficiency and optimization; in fact, many optimization algorithms
and techniques were developed because of the problems encountered. The subject
of Operations Research took off during WWII; see the
excellent wikipedia
article on Operations Research, especially the
subsection on the problems OR attempts to solve. Not surprisingly, there
are also numerous applications in business. Feel free to talk either to my
wife (who is a Professor of Marketing) or myself (I've written several papers
with marketing professors, applying such techniques to many companies,
my favorite being movie theaters). As mentioned, we can reinterpret our
problem as minimizing shipping costs from a central distributor to various
markets (where some markets may be more valuable than others, leading to a
weighted function).
- One of the most important takeaways of the deployment problem is that the
answer you get, as well as the difficulty of the math needed to arrive at the
answer, depends on how you choose to model the world. For us, it depends on
how we choose to measure 'distance'.
My notes on the
problem give four different methods yielding three different solutions,
all of which differ from what you get if you use the 'correct' measure of
distance. This is an extremely common outcome -- your answer depends on how
you choose to model / measure! You need to be very aware of this
when you compare different people's answers to the same problem. For a nice
example of how the answer can depend on your point of view, consider the
riddle below (passed on by G. Mejia). What's the right answer?
- The police rounded up Jim, Bud and Sam yesterday, because one of them was
suspected of having robbed the local bank. The three suspects made the
following statements under
intensive questioning (see below). If only one of these statements turns out
to be true, who robbed the bank?
- Jim: I'm innocent.
- Bud: I'm innocent.
- Sam: Bud is the guilty one.
- For more on the problem of building an efficient computer in terms of
retrieval of information, see
the solution to the related extra credit problem from earlier in the semester.
Note the problem is harder without the tools of multivariable calculus.
See also the article by Hayes in the
American Scientist, Third Base.
- I've scanned in a chapter by
Lanchester on The Mathematics of Warfare; you can also
view it through GoogleBooks here. This article is from a four volume
series, The World of Mathematics. (I am fortunate enough to own two sets; one
originally belonged to a great uncle of mine, another to a grandfather-in-law
of my wife). I've written some Mathematica code to analyze the
Battle of Trafalgar, which is described in the Lanchester article;
the Mathematica code is here
(though it might not make sense without comments from me). (The file name is
boring because, during
the 200th anniversary re-enactment, in order to avoid hurting anyone's
feelings they refused to call the two sides 'English' and 'French/Spanish').
This is a terrific problem to illustrate applying mathematics to the real
world. One has a very complicated situation, and you must decide what are the
key features. The more features you include the better your model will be, but
the less likely you'll be able to solve it! It's a bit of an art figuring out
exactly how much to include to capture what truly matters and still be able to
solve your model. We'll discuss this in greater detail when we do the
Pythagorean Won-Loss theorem from baseball, which is a nice application of
probability and multiple integrations.
- Finally, a common theme that surfaces as we do more and more mathematical
modeling is that simple models very quickly lead to very hard equations to
solve. The drowning swimmer problem is actually the same as
Snell's law, for how light
travels / bends in going from one medium to another. If you write down the
equations for the drowning swimmer, you quickly find a quartic to solve. For
interesting articles related to this, see the two papers below by Pennings on
whether or not dogs know calculus.
Click here for a picture of
his dog, Elvis, who does know calculus.
- Friday, March 19.
- We began today by seeing how well Taylor series approximate functions. The
Mathematica program
here is (hopefully) easy to use. You can specify the point and number of
terms of the Taylor series of cos(x) to do. At first it might seem surprising
that there is no improvement in fit when we go from a second order to a third
order Taylor series approximation; however, we have cos(x) = 1 - x^2/2! +
x^4/4! - x^6/6! + .... In other words, all the odd derivatives vanish at the
origin, and thus there is no improvement at the origin in adding a cubic term
(ie, the best cubic coefficient at the origin is 0). If we go to a fourth
order, we do see improvement. By n=10 or 12 we are already getting essentially
an entire period correct; by n=40 we have several cycles.
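- You can see the same phenomenon without Mathematica; here is a small Python sketch (my own) of the Taylor polynomials of cos(x) at the origin:

    import math

    def taylor_cos(x, order):
        # cos x = Sum_k (-1)^k x^(2k) / (2k)!, truncated at degree <= order
        return sum((-1)**k * x**(2*k) / math.factorial(2*k)
                   for k in range(order//2 + 1))

    x = 1.0
    for order in (2, 3, 4, 10):
        print(order, taylor_cos(x, order), math.cos(x))
    # orders 2 and 3 give the same value (the cubic coefficient is 0);
    # order 4 improves, and order 10 is essentially exact at x = 1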
- We then compared two methods to find roots of polynomials. In some special
cases we can find closed form expressions for roots in terms of the
coefficients. For example, any linear equation (ax+b=0), quadratic
(ax^2+bx+c=0), cubic (ax^3+bx^2+cx+d=0) or quartic (ax^4+bx^3+cx^2+dx+e=0) has
a formula for the roots in terms of the coefficients of the polynomial; this
fails for polynomials of degree 5 and higher (the
Abel-Ruffini
Theorem; see also Galois).
It is very convenient when we have a solution that is a function of the
parameters; we can then use our methods to find the optimal values of the
parameters. Sadly in industry it is often difficult to get closed form
expressions; if you are looking for the most potent compound, for example, you
might be required to do numerous different trial runs and just observe which
is best. We thus need a way to find optimal values / solve equations. We
describe two below.
- Newton's method
is significantly more powerful than
divide and conquer
(also called the bisecting algorithm); this is not surprising as it assumes
more information about the function of interest (namely, differentiability).
The numerical stability of Newton's method leads to many fascinating problems.
One terrific example is looking at roots in the complex plane of a polynomial.
We assign each root a different color (other than purple), and then given any
point in the complex plane, we apply Newton's method to that point repeatedly
until one of two things happens: it converges to a root or it diverges. If the
iterates of our point converge to a root, we color our point the same color
as that root, else we color it purple. This leads to
Newton fractals,
where two points extremely close to each other can be colored differently,
with remarkable behavior as you zoom in. If you're interested in more
information, let me know; a good chaos program is
xaos (I have other
links to such programs for those interested). One final aside: it is often
important to evaluate these polynomials rapidly; naive substitution is often
too slow, and
Horner's algorithm is frequently used (Newton's method, bisection, and
Horner's rule are all sketched in code below). We also talked about the dangers of
interchanging operations (such as interchanging the order of summation or a
limit and an integral). For limit-integral problems, frequently one appeals to
Lebesgue's
Dominated
Convergence Theorem to justify these interchanges.
Measure theory is a
generalization of integration, and allows us to handle more general sets. One
example is the characteristic function of the rationals on [0,1], ie, the
function that is 1 if x is a rational in [0,1] and 0 otherwise. This function
is not Riemann
integrable, as the upper sums are always 1 and the lower sums are always
0. It can be shown in a 'natural' generalization of integration that this
function integrates to 0 (which agrees with our intuition that there are a lot
more irrationals than rationals).
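- Here is a minimal Python sketch (my own illustration, not the class code)
of the two root-finding methods and of Horner's rule, on f(x) = x^2 - 2:

    import math

    def bisect(f, lo, hi, tol=1e-12):
        # Divide and conquer: repeatedly halve an interval containing a root.
        steps = 0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(lo) * f(mid) <= 0:
                hi = mid
            else:
                lo = mid
            steps += 1
        return (lo + hi) / 2, steps

    def newton(f, fprime, x0, tol=1e-12, max_steps=50):
        # First order Taylor expansion: slide down the tangent line.
        x, steps = x0, 0
        while abs(f(x)) > tol and steps < max_steps:
            x = x - f(x) / fprime(x)
            steps += 1
        return x, steps

    f, fp = lambda x: x * x - 2, lambda x: 2 * x
    print(bisect(f, 1, 2))      # about 40 halvings for 12 digits
    print(newton(f, fp, 1.5))   # a handful of steps: much faster

    def horner(coeffs, x):
        # Evaluate a polynomial (coefficients from highest degree down)
        # with one multiplication and one addition per coefficient.
        result = 0.0
        for c in coeffs:
            result = result * x + c
        return result

    print(horner([1, 0, -2], math.sqrt(2)))  # essentially 0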
- The fractal behavior exhibited by Newton's method applied to finding roots
of polynomials is one of many examples of
Chaos Theory, or
extreme sensitivity to initial conditions. While one of the earliest examples
was the work of Poincare on the motion of
three planetary bodies, the subject really took off with Lorenz's work on
weather (the Butterfly
Effect). Another nice example is the
orbit of Pluto;
while we know it will orbit the sun, its orbit is chaotic and we cannot say
where exactly in the orbit it will be millions of years from now.
- We ended today by discussing the
Birthday Paradox:
assuming each day of the year is equally likely (not true for hockey players!)
to be someone's birthday, and no one is born on Feb 29, how many people do you
need in a room before there is a 50% chance of two sharing a birthday? The
answer is surprisingly low, only about 23. To compute this, if there are n
people then the probability no one shares a birthday with anyone else is (1 -
0/365) (1 - 1/365) (1 - 2/365) * ... * (1 - (n-1)/365), or Prod_{k=0 to n-1}(1
- k/365). If we set this equal to 1/2, we just need to keep multiplying. But
this is inelegant, and it's not at all clear how the answer depends on the
number of days in the year (ie, we'd have to do an entirely new calculation if
we move to Pluto). We can solve this by using logarithms to summify the
expression. In other words, those log laws we make you learn in junior high /
high school can be used for problems you're interested in! Taking logarithms
gives log(1/2) = Sum_{k=0 to n-1} log(1 - k/365), as the log of a product is
the sum of the logs. We then use Taylor series to expand log(1-x), noting that
log(1-x) is approximately -x. This gives us log(1/2) =approx= -(1/365) Sum_{k=0
to n-1} k; we approximate the sum with an integral (Int_{x=0 to n-1} x dx,
which is (n-1)^2/2), and find that (n-1)^2/2 =approx= 365 log 2, or n =approx=
1 + Sqrt(365 * 2 log 2), which allows us to see how things would change if we
move to Pluto, where there are about 90,000 days in a year! (A short
computational check of the exact product and of this approximation is below.)
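- A minimal computational sketch (assuming, as above, that all days are
equally likely):

    import math

    def people_needed(days, target=0.5):
        # Smallest n with P(some shared birthday among n people) >= target.
        prob_distinct, n = 1.0, 0
        while 1.0 - prob_distinct < target:
            prob_distinct *= 1.0 - n / days
            n += 1
        return n

    for days in (365, 90000):   # Earth, then (roughly) Pluto
        exact = people_needed(days)
        approx = 1 + math.sqrt(2 * days * math.log(2))
        print(days, exact, round(approx, 1))
    # 365 gives 23; the Taylor/integral approximation is within a person.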
- I forgot to mention applications of this in Section 2. One application is
the following: say you have a steel girder, and acid rain is equally likely to
hit it anywhere. It is safe until rain hits the same spot twice -- how long
must you wait until you have a 50% chance of seeing it crack? You can of
course generalize this to the case where 5 hits are needed for it to crack.
- Another application (not mentioned in either section due to time) is to
birthday attacks in
cryptography.
- In the baseball lecture today we discussed one of my favorite riddles:
Find a way to place 5 queens on a 5x5 chess board such that there are three
squares where we may safely place pawns so that no queen attacks any pawn.
(If you enjoy problems like this,
check out my riddles homepage; while the site is poorly designed, it
receives a lot of hits, and is usually in the top 10 in the world when you
google math riddles.) The solution of this is related to
linear programming
(click
here for my notes on the subject). This is a phenomenally useful field,
and is a natural outgrowth of linear algebra and the optimization methods
we've been discussing. Linear programming can be used to find solutions to
problems ranging from determining cost-effective diets for poor nations to
scheduling baseball seasons, airlines, or movies (click
here for a paper of mine helping theaters optimize their scheduling). If
anyone is interested in discussing these topics further, let me know.
- Wednesday, March 17. The search
for extrema is a central pursuit in modern science and engineering. It is
important to have techniques to winnow the list of candidate points. The
methods discussed in class are the natural generalizations from one-variable
calculus. While one must prove that the function under consideration does have
a max/min, typically this is clear from physical reasons (for example, there
should be a pen of maximal area for given perimeter; there should be a path of
least time).
- In one-dimension, boundaries of sets aren't too bad; for example, the
boundary of [a, b] is just two points, a and b. The situation is violently
different in several variables. There the boundary can have infinitely many
points, and reducing a problem to interior critical points and checking the
function on the boundary is not enough; we must have a way to evaluate all
these points on the boundary.
- The generalization of the second derivative tests involves determinants
and whether or not the Hessian is a
positive
definite matrix, a
negative definite matrix, et cetera. What is really going on is that we
want to use the Principal Axis Theorem and change to a coordinate system where
the Hessian is easier to understand because, in this new coordinate system, it
is a diagonal matrix!
- In the Friday sabermetrics lecture, we'll discuss
linear programming.
This is a wonderful topic, and it allows us to solve (or approximate the
solutions) to a wealth of problems very quickly.
My lecture notes are online here. One of my favorite applications of
linear programming is to
determining
when teams are eliminated from playoff contention; MLB and ESPN frequently
do the analysis incorrectly by not taking into account secondary effects of
teams playing each other. For example, ESPN or MLB back in '04 had
the wild-card unclinched for one extra day. (The Sox had a big lead over the
Angels and a slightly smaller lead over the As; however, the As and the Angels
were playing each other and thus at least one team would get 2 losses, and one
had to win the AL West. Thus the Sox had clinched the wild-card a day earlier
than thought.) If you're interested,
click here for a paper I wrote with colleagues applying linear programming
to helping a movie theatre determine optimal schedules.
- Finally, it is worth remarking that for many applications in the real
world, we do not need to find the true extremum, but rather just something
very close. For example, say we are trying to determine the optimal schedule
for an airline for a given day. We can write the linear programming problem
down, but it might take days to years to run; however, frequently we can
obtain bounds showing how close our answer is to the theoretical best (ie, we
can show we are no more than X away from optimal). It often happens that X is
small, and thus with a small run-time we can be close enough. (It isn't worth
it to ground our air fleet for a few years to find the optimal schedule!)
- In honor of Kayla's guest appearance today, I think it's fitting to end
with some comments about children's books.
Click, Clack, Moo: Cows That Type is the first of a series of
well-illustrated and entertaining adventures of Duck and his compatriots. It's
a Caldecott Honor
award winner, which is,
I believe, considered the top honor for a children's story after the
Caldecott Medal itself. One of my favorite
childhood books is
Make Way For
Ducklings, which has now been read to three generations of Millers. This
is extremely popular in the eastern part of the state (ie, Boston). We've
taken Cam to the statues; it's a fun area. For the western part of the state,
the big children's attractions are
Dr Seuss (from
Springfield) and the
Eric Carle Museum (in Amherst -- okay, there is something nice there!).
Finally, no `cultural' introduction to the Commonwealth of Massachusetts would
be complete without providing a link to
Norman Rockwell.
- Monday, March 15. Today we
discussed Taylor's theorem (in one and several variables). This is one of the
most important applications of calculus. It allows us to replace complicated
functions with simpler ones. There are numerous questions to ask.
- Are Taylor series unique? Yes. The definition just involves taking sums of
derivatives; the process is well-defined.
- Does every infinitely differentiable function equal its Taylor series
expansion? Sadly, no; the function f(x) = exp(-1/x^2) if x ≠ 0 and 0 if x=0
is the standard example. This function causes enormous problems in
probability. There are many functions which do equal their own Taylor series
expansion, such as exp(x), cos(x) and sin(x). It's not surprising that these
three are listed together, as we have the wonderful
Euler-Cotes formula:
exp(i x) = cos(x) + i sin(x), with
i = sqrt(-1). At
first this formula doesn't seem that important; after all, we mostly care
about real quantities, so why complexify our life by adding complex (i.e.,
imaginary) numbers? Amazingly, even for real applications (applications where
everything is real), complex numbers play a pivotal role. For example, note
that a little algebra gives cos(x) = (exp(i x) + exp(-i x)) / 2 and sin(x) = (exp(i
x) - exp(-i x)) / 2i. Thus properties of the exponential function transfer to
our trig functions. The
hyperbolic cosine
and sine functions are similarly defined; cosh(x) = cos(i x) = (exp(x)
+ exp(-x)) / 2. The Foxtrot strip below
(many thanks to the author, Bill
Amend, for permission to post) illustrates the confusions that can happen
between hyperbolic and regular trig functions (as a note, why does Eugene know
that the calculator cannot be giving the right answer?). It's worth noting
that the formula exp(i x) = cos(x) + i sin(x) allows us to derive ALL trig
identities painlessly! See the comments from Friday, February 12.
- [FoxTrot strip not reproduced here.] FoxTrot (c) Bill Amend. Used by
permission of Universal Uclick. All rights reserved.
- How easy are Taylor series to use? If we keep just a few terms, it's not
too bad; however, as the great Foxtrot strip below shows, it's not always
clear how nicely something simplifies.
- [FoxTrot strip not reproduced here.] FoxTrot (c) Bill Amend. Used by
permission of Universal Uclick. All rights reserved.
- In the strip above, notice the large
factorials in the
denominator. Note 52! is about 10^68; in other words, these terms
are small! For interest, 52! is the number of ways (with order mattering) of
arranging a standard deck of cards. There are about 10^85 or so
subatomic thingamabobs
in the universe; we quite quickly reach numbers this high (a deck with 70
cards more than suffices; in other words, we could not have each subatomic
object in the universe represent a different shuffling of a deck of 70 cards).
In a related note, it's important to think a bit and decide what 0! should be.
It simplifies many formulas to have 0! = 1, and we can make this somewhat
natural by saying there is only 1 way to do nothing (mathematically, of
course). The
definition of the factorial function on Wikipedia talks a little bit about
this.
- Unlike
0!, 0^0 is a bit more controversial as to what the definition should be.
As I don't want to pressure anyone, I will not publicly disclose where I
stand in the great debate, though I'm happy to tell you privately / through
email.
- It's worth remarking on why we have n! in the denominators. This is to
ensure that the nth derivative of our function equals the nth derivative of
the Taylor expansion at the point we're expanding. In other words, we're
matching up to the first 2 derivatives for the 2nd order Taylor expansion, up
to the first 3 for the 3rd order Taylor expansion, and so on. It isn't
surprising that we should be able to do a good job; the more derivatives we
use, the more information we have on how the function is changing near the key
point.
- For many purposes, we just need a first order or second order Taylor
series; one of my favorites is the proof of the
Central Limit
Theorem in probability. One of my favorite proofs involves second order
Taylor expansions of the
Fourier Transforms
(these were mentioned in the additional comments on Friday, March 12).
- If f(x) equals its infinite Taylor series expansion, can we differentiate
term by term? This needs to be proved, and is generally done in a real
analysis course. For some functions such as exp(x) we can justify the term by
term differentiation, but note that this is something which must
be justified.
- A terrific application of just doing a first order Taylor expansion is
Newton's Method,
which we'll discuss in great detail on Friday.
- For some reason, most books don't mention the trick on how to quickly
compute higher order Taylor expansions in several variables. The idea is to
'bundle' variables together and use one-dimensional expansions. For example,
consider f(x,y) = exp(-(x^2 + y^2)) cos(xy). We saw in class how painful it is
to compute the Hessian, the matrix of second partial derivatives. That
involved either two product rules or knowing the triple product formula. If we
use our trick, it's much easier. Note exp(u) = (1 + u + u^2/2! + ...) and
cos(v) = (1 - v^2/2! + ...). A second order Taylor expansion means keep only
terms with no x's and y's, with just x or y, or with just x^2, xy or y^2 (a
third order would allow terms such as x^3, x^2 y, x y^2, y^3, and so on). Thus
we expand exp(u) cos(v) and then set u = -(x^2+y^2) and v = xy. For exp(u), we
just need 1 + u, as already the u^2/2 term will be order 4 when we substitute
-(x^2+y^2). For cos(v), we only keep the 1 as v^2/2 is order 4. Thus the
Taylor expansion of order 2 is just (1 - (x^2+y^2)) (1) = 1 - x^2 - y^2; this
is a lot faster than the standard method! The standard method works in general,
but there are so many cases where this trick is faster that it's worth knowing.
(A quick symbolic check of this expansion is sketched below.)
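- A quick symbolic check of the bundling trick with SymPy (a sanity check, not
the method itself): expanding f(tx, ty) in t through order 2 picks out exactly
the terms of total degree at most 2.

    import sympy as sp

    x, y, t = sp.symbols('x y t')
    f = sp.exp(-(x**2 + y**2)) * sp.cos(x * y)

    # Terms of total degree <= 2 in (x, y) are the coefficients of 1, t, t^2.
    expansion = sp.series(f.subs({x: t * x, y: t * y}), t, 0, 3).removeO()
    print(sp.expand(expansion.subs(t, 1)))   # prints 1 - x**2 - y**2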
- Friday, March 12. The equality
of mixed partial derivatives is one of the most important results in
multivariable calculus. It is unusual to be able to interchange the order of
two operations; the square root of a sum is not the sum of the square roots. Higher
derivatives arise in partial
differential equations. Fortunately, most of these equations involve at
most second order derivatives, and most don't have mixed partials.
- Famous examples: wave
equation, heat
equation,
Navier-Stokes equation.
- There is another, higher-level way of viewing the equality of mixed
partial derivatives. It involves the
Fourier Transform.
The Fourier transform converts differentiation to multiplication. Noting this,
the equality of mixed partials is then equivalent to multiplication being
commutative! It's tough to read about this without a solid background in math
and physics, so please contact me if you want to know more.
- One very fun and strange application of Fourier analysis is to the
Heisenberg uncertainty principle, which may be interpreted as a statement
about a function and its Fourier transform (one cannot localize both well).
- Note that, as usual, our proof involves multiplying by 1 and the
Mean Value Theorem.
The MVT is one of my favorite tools in mathematics. It is, in some sense, a
variant of Taylor series, which we will meet very soon. It allows us to work
with differences between a function at two close points.
- The way we showed the two mixed partials are equal (for C2
functions) was to show that both of them equal Lim_{Δx,Δy
--> 0} S(Δx,Δy) / ΔxΔy, and then two quantities equal to a third are equal to
each other. This is one of the most powerful ideas in mathematics, going at
least as far back as being
one of Euclid's
5 common notions (there are 10 axioms in Euclid's Elements, 5 postulates about
constructions and 5 common notions). The principle that things equal to a
common object must be equal to each other is one of the major tools in
combinatorics. The way we compute many sums is to give two interpretations,
one of which is doable and the other what we want to know. (A small numeric
illustration of the limit defining the mixed partials is sketched below.)
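- A small numeric illustration (my own, with a made-up smooth f) that
S(Δx,Δy)/ΔxΔy, with S(Δx,Δy) = f(Δx,Δy) - f(Δx,0) - f(0,Δy) + f(0,0),
really does converge to the mixed partial:

    import math

    def f(x, y):
        return math.exp(x) * math.sin(y + 1.0)   # f_xy(0,0) = cos(1)

    for h in (1e-2, 1e-3, 1e-4):
        S = f(h, h) - f(h, 0) - f(0, h) + f(0, 0)
        print(h, S / (h * h))   # tends to cos(1), about 0.5403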
- Here is one of my favorite combinatorial
problems. Let (n C r) denote the number of ways of choosing r people from n
when order does not matter; thus (n C r) = n! / r!(n-r)!. One can show that
the number of ways of dividing n identical cookies among p people is (n + p -
1 C p - 1). Find a nice formula for Sum_{n = 0 to N} (n + p - 1 C p - 1). What
makes this a particularly challenging combinatorial sum is that the numerator
of the binomial coefficient is varying! If you look at this problem the right
way (all the info you need is in the previous lines!), you can evaluate this
immediately! In other words, what is Sum_{n = 0 to 2010} (n+76)! / (76! n!)?
(A numeric check of the resulting closed form is sketched below; don't peek
until you've tried the problem.)
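- If you want to check your answer, here is a numeric verification (not a
proof!) of the closed form Sum_{n = 0 to N} (n+p-1 C p-1) = (N+p C p):

    from math import comb

    p, N = 77, 2010   # matches Sum_{n = 0 to 2010} (n+76)! / (76! n!)
    assert sum(comb(n + p - 1, p - 1) for n in range(N + 1)) == comb(N + p, p)
    print(comb(N + p, p))   # the value of the sum in the question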
- Today we talked about how to use Mathematica
to attack many problems (plotting, integrating, finding derivatives).
My template (and more) is available here. While these programs are great,
they do not replace knowing how to do the calculations yourself. There are
many reasons for this. One, of course, is that these programs can make
mistakes, and it is useful to have a sense of what the right answer should be.
This is why the extra credit problem from the first day is one of my favorite
problems, namely trying to see which of 4 possible formulas is reasonable. I
can actually give an example of how this has impacted my professional research
today. I'm working on a paper with a colleague, and I conjectured that the
following combinatorial identity holds: Sum_{r = 0 to m} (-1)^r (m C r) (m+r C
r) / (r+1)(m+r) = 0 for all integers m > 1. There are computer programs to
PROVE conjectures like this. I ran it and it generated the following: The
equation is routinely verifiable by dividing the right-hand side by the
left-hand side and simplifying the resulting rational function to 1. Not very
illuminating,
and I couldn't get it to output more. I was at least convinced that there was
a proof, but I felt bad not being able to see it. After some work, I was able
to prove it myself. Just yesterday (ok, technically today as it was 3am) I was
finishing up a paper where I needed to prove a formula for a much simpler
quantity, namely what is Sum_{k = 0 to n/2} k (n-k C k), which the computer
program could not do. Fortunately my experience on similar problems left me
trained to handle stuff such as this. (A quick numeric check of the first
identity is sketched below.)
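- A quick sanity check of the first identity in exact rational arithmetic
(numerics only, not the computer-algebra proof):

    from fractions import Fraction
    from math import comb

    def identity_sum(m):
        return sum(Fraction((-1) ** r * comb(m, r) * comb(m + r, r),
                            (r + 1) * (m + r)) for r in range(m + 1))

    for m in range(2, 12):
        assert identity_sum(m) == 0
    print('identity holds for m = 2, ..., 11')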
- I think I only discussed the following
example in one of the sections today. It relates to why I find it so important
to know proofs and definitions. It's about the fall of Western Civilization (or,
if you're not quite as pessimistic, the financial mortgage meltdown).
While there
are many reasons behind the collapse (I have close family that has worked
in the upper levels of many of the top financial firms; if you are interested
in stories of what isn't reported in the news, let me know), one large
component was an incorrect use of Gaussian copulas. It's similar to looking at
low-velocity data and extrapolating to relativistic speeds -- there is an
enormous danger when you apply results from one region in another with no
direct data in that second realm. A great article on this is from Wired
Magazine (The
Formula That Killed Wall Street). It's worth reading this. Some
particularly noteworthy passages:
- Bankers should have noted that very small
changes in their underlying assumptions could result in very large changes in
the correlation number. They also should have noticed that the results they
were seeing were much less volatile than they should have been which implied
that the risk was being moved elsewhere. Where had the risk gone? They didn't
know, or didn't ask. One reason was that the outputs came from "black box"
computer models and were hard to subject to a commonsense smell test. Another
was that the quants, who should have been more aware of the copula's
weaknesses, weren't the ones making the big asset-allocation decisions. Their
managers, who made the actual calls, lacked the math skills to understand what
the models were doing or how they worked. They could, however, understand
something as simple as a single correlation number. That was the problem.
- No one knew all of this better than David X.
Li: "Very few people understand the essence of the model," he told The Wall
Street Journal way back in fall 2005. "Li can't be blamed," says Gilkes of
CreditSights. After all, he just invented the model. Instead, we should blame
the bankers who misinterpreted it. And even then, the real danger was created
not because any given trader adopted it but because every trader did. In
financial markets, everybody doing the same thing is the classic recipe for a
bubble and inevitable bust.
- Application of the day: I was asked to incorporate bicycle math into the
course. After some web-browsing, I came across
powerpoint slides by
Jason Achilich. Slides 11 and 12 discuss wheel size, slide 13 discusses
gears, and slides 16 - 18 discuss banking (which is essentially determining an
optimal choice for your c(t) given your opponents' choices!).
- Monday, March 8. TBD, including
- One of the requests was to talk about applications of multivariable
calculus to molecular gastronomy. After some web browsing, I eventually
became interested in how bees communicate amongst themselves as to where food
is. There appear to be two schools; one is the
waggle dance / language
school, and the other is the
odor plume theory. In addition to controversies on how bees learn, there
are lots of nice applications to gradients and (I believe) directional
derivatives. The goal is to convey information about a specific path through a
very complex space.
- See also the paper:
Odor
landscapes and animal behavior: tracking odor plumes in different physical
worlds (Paul Moore, John Crimaldi). Abstract: The acquisition of
information from sensory systems is critical in mediating many ecological
interactions. Chemosensory signals are predominantly used as sources of
information about habitats and other organisms in aquatic environments. The
movement and distribution of chemical signals within an environment is heavily
dependent upon the physics that dominate at different size scales. In this
paper, we review the physical constraints on the dispersion of chemical
signals and show how those constraints are size-dependent phenomenon. In
addition, we review some of the morphological and behavioral adaptations that
aquatic animals possess which allow them to effectively extract ecological
information from chemical signals.
- We discussed
directional derivatives. It is natural that we develop such a concept, as
up until now we have only considered derivatives in directions parallel to the
various coordinate axes. A central theme of multivariable calculus is the need
to be able to approach a point along any path, and that in several dimensions
numerous paths are available (unlike the 1-dimensional case, where essentially
we just have two paths). Directional derivatives will play a key role in
optimization problems.
- Friday, March 5. Today we
discussed the Chain Rule. We essentially proved it in the special case of Calc
I (i.e., the one variable chain rule). The key step in the proof is a clever
multiplication by 1. The reason we need to do this is, as always, to be able
to isolate out the definition of the derivative.
- One item we must deal with carefully is that we had g(x+h) - g(x) divided
by itself; what if g(x+h) = g(x) for infinitely many arbitrarily small h? Then
we are dividing by zero. Can you prove that this cannot happen if g is
differentiable? If that is not a strong enough condition, what if we assume the
derivative g' does not vanish at x -- does that suffice to prove that g(x+h)
cannot equal g(x) infinitely often for arbitrarily small h?
- One way to view the Chain Rule is that it is all about giving you the
freedom to choose. You can either plug everything in and differentiate
directly by brute force, or you can use the Chain Rule to find
the derivative of the composition in terms of the derivatives of the
constituent pieces. Depending on the problem, one way could be easier than the
other; there are examples of situations where direct substitution is best, and
examples where it is better to use the Chain Rule. With experience it becomes
clear which way is better. In Section 2.6, when we discuss gradients and
directional derivatives, we'll see a theoretical interpretation of the Chain
Rule. This will play a fundamental role when we turn to optimization problems
in Chapter 3. Finally, of course, it is useful to be able to compute an answer
two different ways, as this provides a nice check of your work.
- To use the Chain Rule in full glory, we needed to understand how to
multiply matrices, as h(x) = f(g(x)) implies (Dh)(x) = (Df)(g(x))
(Dg)(x), where x = (x1, ..., xn). I chose
to motivate matrix multiplication through the dot product, as we know how to
take the dot product of two vectors with the same number of coordinates. Matrix
multiplication looks quite mysterious at first. Wikipedia
has a nice article (with color) on multiplying matrices, though it is a
bit short on motivation. The advanced reason as to why we do this comes from
also viewing matrices as linear transformations, and we want the product of
two matrices to represent the composition of the transformations. This is an
advanced topic, and sadly is frequently mangled in a linear algebra course.
I've posted a little bit about this in the
advanced notes from
Thursday's lecture. The best motivation I know is to consider 2 x 2
rotation matrices.
If R(a) corresponds to rotating by a radians, and R(b) to rotating by b
radians, then R(b) R(a) should equal R(a+b); this does happen if we use the
matrix multiplication method we discussed. (A quick numeric check is sketched
below.)
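- A quick numeric check of this claim, with hand-rolled 2 x 2 matrix
multiplication so the row-by-column rule itself is visible:

    import math

    def R(t):
        # Rotation by t radians.
        return [[math.cos(t), -math.sin(t)],
                [math.sin(t),  math.cos(t)]]

    def matmul(A, B):
        # Each entry is a dot product of a row of A with a column of B.
        return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]

    a, b = 0.7, 1.1
    prod, direct = matmul(R(b), R(a)), R(a + b)
    assert all(abs(prod[i][j] - direct[i][j]) < 1e-12
               for i in range(2) for j in range(2))
    print('R(b) R(a) = R(a+b), numerically')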
- Wednesday, March 3.
- General comment: frequently it is easier to attack a problem directly than
to use the advanced theorems. For example, we saw today that if f(x,y) = x^2 y
and g(x,y) = x^2 + y^2, it is easier to take the partials
of h(x,y) = f(x,y) g(x,y) directly. In other words, just note that h(x,y) =
x^4 y + x^2 y^3. One place you may have seen this is in the
Farmer Brown optimization problem. Recall Farmer Brown has 40 meters of fence
and wants to enclose as large of an area as possible for his cows, but he is
particular and won't consider any shape other than a rectangle. The enclosed
area equals xy, but since 2x+2y=40 we have y=20-x, which implies that the area
is A(x) = x(20-x). We could differentiate this with the product rule, or just
note that A(x) = 20x - x^2, which is simpler to
differentiate.
- Art fraud: See the
wikipedia article for information about the history of art forgeries, both
creation and detection. There are some very entertaining bits, especially
about how some forgers became so famous that people forged the forgeries!
There's another bit about a son forging his dad's work and providing
statements that it was done by his father, and another about the forger
Meegeren,
who had to create a forgery under police supervision to prove himself innocent
of treason charges! There's also the
Getty Kouros, which is
displayed with the note (according to the Wikipedia article) Greek, 530 B. C.
or modern forgery.
- Below are some articles to read about some of the mathematics of detecting
art forgeries:
- The Chain Rule is
one of the most important results in multivariable calculus, as it allows us
to build complicated functions depending on functions of many inputs. To state
it properly requires some linear algebra, especially
matrix
multiplication. The proof uses multiple applications of adding zero. This
is an essential skill to master if you wish to continue in mathematics. It is
somewhat similar to adding auxiliary lines in geometry. With experience, it
becomes easier to `see' where and how to add zero. The idea is we want to add
zero in such a way that we convert one expression to several, where the
resulting expressions are easier to analyze because we are subtracting two
quantities that are quite close. For the chain rule, we did this by adding
numerous intermediary points.
- For a baseball /
sabermetrics application of these rules, consider estimating a team's
winning percentage. The
Pythagorean
Won-Loss Theorem (I've
written a paper providing a theoretical justification for this observation)
says that if RS (resp. RA) represents the average number of runs a team scores
(resp. allows), #Wins/#Games is about RS^γ
/ (RS^γ + RA^γ); originally γ was taken to be 2, but now
the best fit value is believed to be about 1.8. If we want to figure out how
many wins a team will have, we can estimate RS and RA by a micro analysis of
players, using the Runs
Created Formulas (and similar examples). We can then investigate the
changes in winning percentage as we vary parameters used to model player
productivity. (A tiny numeric version of the formula is sketched below.)
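- A tiny sketch of the formula (the team numbers here are made up purely for
illustration):

    def pythag_win_pct(rs, ra, gamma=1.8):
        # Estimated winning percentage from runs scored/allowed per game.
        return rs ** gamma / (rs ** gamma + ra ** gamma)

    rs, ra = 5.1, 4.4    # hypothetical runs scored / allowed per game
    pct = pythag_win_pct(rs, ra)
    print(round(pct, 3), round(162 * pct, 1))   # pct, wins over 162 games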
- Monday, March 1. Note:
for a general, all purpose Mathematica template (as well as a LaTeX one) that
I've written, click here.
- Today we talked about parametrized curves. Frequently a curve or a surface
may be regarded as the level set of a given function. For example, a sphere of
radius 2 is the level set with value 4 of the function f(x,y,z) = x^2
+ y^2 + z^2. A circle of radius 5 may be regarded as the
level set of value 25 of the function f(x,y) = x^2 + y^2.
- As I had to prepare much of the lecture without hearing what I should try
to incorporate into the class, I decided to do another
Isaac Asimov
reference. Today's choice is the short story Runaround (you
can read the story here). If you do choose to read this story, remember it
was written in 1941 -- try and remember what technology was around back then
(of course, that doesn't explain the writing style...). One may interpret the
entire story as a study of level sets and parametrized curves; though this is
probably not what Asimov intended, it is a nice way to view it. (On another
note, it seems everything
these days has a wikipedia page!)
- In discussing paths of objects, the standard examples are planetary motion
(either orbits of planets or rocket ships and probes) and cannonballs /
bullets.
- Planetary motion:
Kepler's laws describe the orbits of planets, but give no reason as to why
the planets follow these paths. These were deduced from observational data,
and were crucial in leading
Newton to the inverse-square law of gravity.
- Probes: Approximately every 176 years, the outer gas giants align and one
probe launched from Earth can visit them all; this is called the
(planetary) Grand
Tour (in analogy with the
Grand Tour of Europe).
NASA didn't have the technology to prepare everything for the mission at the
needed launch date, and gambled that they could successfully reprogram the
probe billions of miles later. It's a phenomenal success story. See in
particular the details of the
Voyager 2 mission; Voyager 1 is currently the farthest man-made object,
and the fictional Voyager 6 sadly became the basis for a really bad Star Trek
movie, Star
Trek: The Motion Picture.
- Ballistics: Ballistics
deals with the trajectories of objects such as bullets and cannonballs. This
is an extremely important application of mathematics; for years people were
employed in creating
firing
tables for artillery. One interesting application is in the
Falklands War between Great Britain and Argentina. The Earth's rotation
causes the trajectory of objects fired in the northern and southern
hemispheres to be different; it is claimed the British missed the Argentinians
in their first volley, but quickly corrected. The explanation is the
Coriolis Effect.
- We ended the day with a discussion of
Greek Astronomy.
Particularly fascinating (to me) is how well they can do with circles on
circles;
click here for some more information, and
click here for
the Mathematica file from class today. In some sense, you can view the
circles on circles as a Taylor series expansion of planetary motion! The idea
that planets must move in circles seriously slowed down scientific
advancement. That said, it is truly impressive how well one can do with all
these circles, but the theory is not elegant (and that bothers a
mathematician!). The world need not be elegant, of course, but so much is that
we tend to seek out elegance. If you are given enough free parameters, you can
fit almost any data set; thus we tend to prefer theories with just a few input
parameters but sweeping predictions. The motions we saw are very similar to
what you see for orbits of electrons in atoms in the Bohr model when we
represent the electron's path by wiggles. (A small parametrization of circles
on circles is sketched below.)
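- A minimal parametrization of circles on circles (one deferent plus one
epicycle; the radii and speed here are arbitrary choices, not historical
values):

    import math

    def epicycle(t, R=1.0, r=0.3, k=8.0):
        # Big circle of radius R traversed once while a small circle of
        # radius r spins k times:
        # c(t) = (R cos t + r cos kt, R sin t + r sin kt).
        return (R * math.cos(t) + r * math.cos(k * t),
                R * math.sin(t) + r * math.sin(k * t))

    # Plot these points to see the looping paths that mimic retrograde motion.
    for i in range(8):
        t = 2 * math.pi * i / 8
        print(tuple(round(c, 3) for c in epicycle(t)))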
- Friday, February 26. Today was
hopefully the first of many days when you'll challenge me to work specific
items in a natural way into the lecture. Today's challenge was to use
Isaac Asimov and Star Trek: The Next Generation. I did have a way to do ST:TNG
(as we geeks write it), but forgot.
- One of my favorite science articles by Isaac Asimov is the
Relativity of Wrong. I
strongly urge you to read it. The article can really be read as a
homage to the tangent plane. The Earth (locally) is so close to being flat
that it really isn't that bad of an approximation to say the tangent plane is
perfect, and not just an approximation. Basically, the Earth's curvature is 8
inches per mile, which again is so slight that it's not surprising the
ancients (but not
The Ancients) thought the Earth was flat. Of course, eventually people
realized the Earth is not flat (ships on the horizon and shadows were two key
inputs). Later we realized that, since the Earth is rotating, there will be an
equatorial bulge. It's about 8.027 inches / mile near the equator, and about
7.973 near the poles. (Of course, this isn't quite right, see the article for
more details). The point of all this is that tangent planes can do a great
job, and small differences (which are hard to see) can be important and add
up!
- I believe the physical quantity that has been measured so accurately as to
cause Feynman to remark it is equivalent to measuring the distance from LA to
NY to within the thickness of a human hair is the
Gyromagnetic ratio.
- The ST:TNG reference was
going to be the episode "The Loss". The premise is that the Enterprise
runs into two-dimensional beings, and there are singularity issues. There's
also a cosmic string fragment (or some such technobabble, I forget exactly
what). The reason I was going to mention this is to talk about singularities
and domains of definition of the function. Basically, things on the
Enterprise go haywire because part of it is now being intersected by the plane of
two-dimensional beings. There are some animations of the Enterprise being
intersected by the plane (i.e., a level set!). You can watch the episode on
YouTube; the key clip is part 2 at around 8:11:
- We talked a lot about different notations for the derivative. It is very
convenient to be able to refer to all the different derivatives (functions
with one input, several inputs and one output, several inputs and several
outputs) with just one notation. Our definition is that a function is
differentiable if the error in the tangent plane approximation tends to zero
faster than the distance of where we are to where we start tends to zero. It
is sadly possible for the partial derivatives to exist without the function
being differentiable; we showed this with the example f(x,y) = (xy)^(1/3).
What must we
assume in order for the partial derivatives to imply our function is
differentiable? It turns out it suffices to assume the partial derivatives are
continuous. This is the major theorem in the subject, and provides a nice way
to check for when a function is differentiable.
- The proof of the theorem alluded to above uses two of my favorite techniques.
While sadly we do not multiply by 1, we do get to add 0 and we do use the
Mean Value Theorem.
One of my goals in the class is to illustrate how to think about these
problems, why we try certain approaches for our proofs. We want to study how
well the tangent plane approximates our function, thus we need to study f(x,y)
- f(0,0) - (δf/δx)(0,0) x - (δf/δy)(0,0) y.
Our theorem assumes the partial derivatives are continuous, thus it stands to
reason that at some point in the proof we should use the partial derivatives
are continuous! The trick is to try and see how we can get another δf/δx
and another δf/δy to appear. The key is to recall the MVT. If we add 0 in a
clever way, we can do this. Our expression equals f(x,y) - f(0,y) +
f(0,y) - f(0,0) - (δf/δx)(0,0) x - (δf/δy)(0,0)
y. We now use the MVT on f(x,y) - f(0,y) and on f(0,y) - f(0,0). In each of
these two expressions, only one variable changes. Thus the first is (δf/δx)(c,y)
x and the second is (δf/δy)(0,ĉ) y. Thus the error in using the tangent plane is
[(δf/δx)(c,y) - (δf/δx)(0,0)] x + [(δf/δy)(0,ĉ) - (δf/δy)(0,0)]
y. We now see how the continuity of the partials enters -- it ensures that
these differences are small, even when we divide by ||(x,y)-(0,0)||.
- Wednesday, February 24.
- Continuing our economics theme, one powerful application of the concept of
the derivative is that if f'(x) is positive then f is increasing to the right
and decreasing to the left, while if f'(x) < 0 then f is decreasing to the
right and increasing to the left. This has numerous applications in
optimization problems, and explains the shape of objects such as the
Laffer curve.
- Unfortunately, the composition of functions whose partial derivatives exist
need not be differentiable. The example we discussed, f(x,y) = (xy)^(1/3) and
g(x) = (x,x), illustrates a major difference between one and several variables. The
real issue is that when we compute the partials, we are only moving in a few
directions and not exploring all directions. We will remedy this later.
- Instead of approximating a function locally by a line, we now use a plane
(in two dimensions) or hyperplane (in general). We can use the
Mean Value
Theorem to get some information on how close the estimation is, and then use
these estimations to approximate our function. One of the most important
applications of the tangent line approximation is
Newton's Method,
which we'll discuss in much greater detail later. It's a phenomenal way of
finding roots of polynomials, and also leads to chaotic / fractal behavior
when applied to finding roots of cubics and other functions;
click here for the
wikipedia article (which has many fascinating pictures). The
Mathematica
file with the tangent line and tangent plane approximations is here.
- We discussed how important it is to quantify how close our approximated
value is to the function's value. We will expand on this later in the
semester; for an entertaining clip on a miscalculation, see the footage here
of the Tacoma Narrows
Bridge collapse; warning: this video is from the 1940s, and is
presented in a very different style than you may be used to!
- This is the second day in a row where the additional comments have
involved fractals; this is not a coincidence; these objects appear throughout
modern math / physics / econ / .... The most famous example is the
Mandelbrot set. A
very nice popular book on the subject is James Gleick's
Chaos:
Making a New Science (the webpage has links to excerpts from the book).
Another wonderful read of his is
Genius, which is about the great physicist Richard Feynman (whose work is
mentioned in our book).
- Monday, February 22.
- We talked about limits and continuity. Informally, a continuous function
is one where we can draw its graph without lifting our pen/pencil from the
paper. If we take this as our working definition, however, we can easily be
misled in terms of properties of continuous functions. For example, are all
continuous functions differentiable? Clearly not, as we can take f(x) = |x|.
While this function is not differentiable at x=0, it is differentiable
everywhere else. Thus we might be led to believe that all continuous functions
are differentiable at most points. This sadly is not true.
- Weierstrass showed (the first text where I read this used the phrase `Weierstrass
distressed 19th century mathematicians') that it is possible to have a function
which is continuous everywhere but differentiable nowhere!
The wikipedia
article is a good starting point. In addition to explicitly stating what
the function is, it has a nice plot and good comments. The function exhibits
fractal behavior, though
the term fractal wasn't used until many years later by Mandelbrot.
- In higher mathematics we learn to quantify orders of infinity. We see that
there are more real numbers than rational numbers (see
Cantor's diagonalization argument); the comments from Wednesday, February
19th discuss whether or not there is a set whose size is strictly between the
rationals and the reals. Amazingly, if you count functions properly, it turns
out almost every continuous function is differentiable nowhere!
See here for some
comments on this strange state of affairs. The key ingredient is an advanced
result from
Functional Analysis, the
Baire Category
Theorem.
- Fractals have a rich history and numerous applications, ranging from
economics to Star Trek II: The Wrath of Khan (where they were used to generate
the simulated landscapes of the Genesis Torpedo;
see the wikipedia article on
fractals in film). The economics applications are quite important. One of
the most influential papers is due to Mandelbrot (The
Variation of Certain Speculative Prices). This is one of the most
important papers in all of economics, and argues that the standard Brownian
motion / random walk model of Wall Street is wrong. The crux of the argument
is that these standard theories do not allow enough large deviation days. For
more on this, see Mandelbrot-Hudson: The (Mis)behavior of Markets (I
have a copy of this book and can lend you part of it if you are interested).
- We saw that for many limits of the form 0/0, a good way to attack the
problem is to switch to polar coordinates. We then replace (x,y) goes to (0,0)
through an arbitrary path with r tends to 0 and θ
does whatever it wants. This works for many problems, and is a good thing to
try. (A short symbolic example is sketched below.)
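- A short symbolic example with SymPy (my choice of test function): for
f(x,y) = x^2 y / (x^2 + y^2), the polar substitution leaves a single power of
r times a bounded function of θ, so the limit is 0 along every path.

    import sympy as sp

    r, theta = sp.symbols('r theta', positive=True)
    x, y = r * sp.cos(theta), r * sp.sin(theta)
    f = x**2 * y / (x**2 + y**2)
    g = sp.simplify(f)           # r*sin(theta)*cos(theta)**2
    print(g, sp.limit(g, r, 0))  # limit is 0, whatever theta does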
- Wednesday, February 19.
- Russell's paradox
is one of the most famous in all of mathematics; it showed that we didn't even
understand what it meant to be a set or an element of a set! Another famous
one is the
Banach - Tarski paradox, which tells us that we don't understand volumes!
It basically says if you assume the
Axiom of Choice,
you can cut a solid sphere into 5 pieces, and reassemble the five pieces to get
two completely solid spheres of the same size as the original! While it is
rare to find these paradoxes in mathematics, understanding them is
essential. It is in these counter-examples that we find out what is really
going on. It is these examples that truly illuminate how the world is (or at
least what our axioms imply). Most people use the
Zermelo-Fraenkel axioms, abbreviated ZF. If you additionally assume
the Axiom of Choice, it's called ZFC or ZF+C. Not all problems in mathematics
can be answered yea or nay within this structure. For example, we can quantify
sizes of infinity; the natural numbers are much smaller than the reals; is
there any set of size strictly between? This is called the
Continuum
Hypothesis, and my mathematical grandfather (my thesis advisor's advisor),
Paul Cohen, proved it is independent (ie, you may either add it to your axiom
system or not; if your axioms were consistent before, they are still
consistent).
- In a real analysis course, one develops the notation and machinery to put
calculus on a rigorous footing. In fact,
several prominent people
criticized the foundations of calculus, such as Bishop Berkeley; his famous
attack,
The
Analyst, is available here. It wasn't until decades later that good
notions of limit, integral and derivative were developed. Most people are
content to stop here; however, see also
Abraham Robinson's
work in
Non-standard Analysis. He is one of several mathematicians we'll encounter
this semester who have been affiliated with my Alma Mater,
Yale. Another is the great
Josiah Willard
Gibbs.
- One of my favorite applications of
open and closed sets is
Furstenberg's proof of the infinitude of primes; one night while a postdoc
at Ohio State
I had drinks with
Hillel Furstenberg and one of his students,
Vitaly Bergelson.
This is considered by many to be one of the best proofs of the infinitude of
primes; it is so good it is one of six proofs given in
THE Book.
Unlike most proofs of the infinitude of primes, this gives no bounds on how
many primes there are at most x; even
Euclid's proof (if
there are only finitely many primes, say p1, ..., pn,
then consider (p1*...*pn)+1; either this is a new prime or
it is divisible by a prime not in our list, since dividing it by any prime in
our list leaves remainder 1) gives a lower bound, namely log log x (the
true answer is that there are about x / log x primes at most x).
As a nice exercise (for fun), prove this fact. This leads to an interesting
sequence: 2,
3, 7, 43, 13, 53, 5, 6221671, 38709183810571, 139, 2801, 11, 17, 5471,
52662739, 23003, 30693651606209, 37, 1741, 1313797957, 887, 71, 7127, 109, 23,
97, 159227, 643679794963466223081509857, 103, 1079990819, 9539, 3143065813,
29, 3847, 89, 19, 577, 223, 139703, 457, 9649, 61, 4357....
This sequence is generated as follows. Let a_1 = 2, the first prime. We apply
Euclid's argument and consider 2+1; this is the prime 3 so we set a_2 = 3. We
apply Euclid's argument and now have 2*3+1 = 7, which is prime, and set a_3 =
7. We apply Euclid's argument again and have 2*3*7+1 = 43, which is prime and
set a_4 = 43. Now things get interesting: we apply Euclid's argument and
obtain 2*3*7*43 + 1 = 1807 = 13*139, and set a_5 = 13. Thus a_n is the
smallest prime not on our list generated by Euclid's argument at the nth
stage. There are a plethora of (I believe) unknown questions about this
sequence, the biggest of course being whether or not it contains every prime.
This is a great sequence to think about, but it is a computational nightmare
to enumerate (a small generator for the first few terms is sketched below)! I
downloaded these terms from the Online Encyclopedia of Integer
Sequences (homepage is http://www.research.att.com/~njas/sequences/
and the page for our sequence is http://www.research.att.com/~njas/sequences/A000945 ).
You can enter the first few terms of an integer sequence, and it will list
whatever sequences it knows that start this way, provide history, generating
functions, connections to parts of mathematics, .... This is a GREAT website
to know if you want to continue in mathematics. There have been several times
I've computed the first few terms of a problem, looked up what the later
terms should be, and thus had a conjectured formula to prove by induction.
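- A small generator for the first few terms, using naive trial-division
factoring (fine for the terms shown here; the later terms require serious
factorization methods):

    def smallest_prime_factor(n):
        d = 2
        while d * d <= n:
            if n % d == 0:
                return d
            d += 1
        return n

    terms, product = [2], 2
    for _ in range(8):   # the last step takes a few seconds
        terms.append(smallest_prime_factor(product + 1))
        product *= terms[-1]
    print(terms)   # [2, 3, 7, 43, 13, 53, 5, 6221671, 38709183810571]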
- Tuesday, February 14: Special algebra supplement.
The homework problems due Wednesday involve sketching various curves, many of
which are famous conic
sections. The shapes that arise are often
ellipses,
hyperbolas,
parabolas,
lines and
circles. The theory of conic
sections says that these are all related, and arise as cross sections obtained
by having planes intersect a cone at various angles. These shapes arise
throughout mathematics and science. Here are just a few examples, which
illustrate their importance.
- Chemistry / Physics: The
ideal gas law states that
PV = nRT. If we set T equal to a constant, we then get PV is constant
(this special case is called
Boyle's law). Note that
this is an equation of a hyperbola, and thus the isotherms (level sets of
constant temperature) are hyperbolas.
- Physics / Astrophysics: The most common examples of conic sections are the
orbits of planets. In three-dimensional space, planets orbiting the sun under a
gravitational force proportional to the inverse-square of the distance travel
in ellipses, hyperbolas, or parabolas (see
here for more details).
- Monday, February 15.
- There are many different coordinate systems we can use; depending on the
symmetry of the problem, frequently it is advantageous to use one system over
another. Three of the most common are Cartesian, cylindrical and spherical
coordinates.
- We discussed level sets
today. These occur all the time in real world plots. For example, weather maps
constantly show lines of constant temperature; these are called
isotherms.
- Mathematica is a wonderful program to use to plot functions, take
derivatives, evaluate integrals, et cetera. I've created a template on how to
use Mathematica (and one on how to use LaTeX, to write nicely formatted
technical documents). For more information on these,
go to the link here.
- We quickly reviewed finding maxima and minima by searching for critical
points and comparing the value of our function there and at the end points.
This is fine in one-dimensional calculus, but becomes a bit harder in several
variables. Why? What's the difference? The problem is that the boundary of an
interval in one dimension is just two points; in other words, the interval [a,b]
has just two endpoints, namely a and b. What about an area in the plane, such
as B((0,0), 1), the ball of radius 1 about the origin? The problem is
that the boundary here is a one-dimensional object, and there are now
infinitely many points to check! We'll see later how to attack this. The key
input will be vectors of partial derivatives and figuring out when one vector
is in the same direction as another (in other words, using all the material we
banked from week 1). This method is known as
Lagrange
Multipliers.
- In finding the roots of r^4 - 4r^2 - c = 0 we had to solve a quartic
equation (a biquadratic, so the substitution u = r^2 reduces it to a quadratic;
a short sketch is below); a monumental theorem in mathematics states that there
are explicit,
closed form solutions for the roots in terms of the coefficients of
polynomials of degree one (lines), two (quadratics),
three (cubics) and
four (quartics),
but not for a
general polynomial of degree five or higher.
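- A short sketch of the biquadratic substitution (u = r^2, so
u^2 - 4u - c = 0 and u = 2 ± sqrt(4 + c)):

    import math

    def real_roots(c):
        # Real roots of r^4 - 4 r^2 - c = 0 via the quadratic in u = r^2.
        disc = 4 + c
        if disc < 0:
            return []
        roots = []
        for u in (2 + math.sqrt(disc), 2 - math.sqrt(disc)):
            if u > 0:
                roots += [math.sqrt(u), -math.sqrt(u)]
            elif u == 0:
                roots.append(0.0)
        return sorted(roots)

    print(real_roots(5))  # r^4 - 4r^2 - 5 = (r^2 - 5)(r^2 + 1): roots ±sqrt(5)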
- Friday, February 12.
- The cross product
is related to generalizations of the derivative of a function. It occurs all
the time in physics and engineering. Perhaps the most famous set of equations
involving it are
Maxwell's equations of Electricity and Magnetism. We talked briefly about
the del operator ∇
= (δ/δx, δ/δy,
δ/δz). We'll see it in greater detail later in the course. The cross product
with this symbol
arises in Stokes'
theorem, a beautiful generalization of the fundamental theorem of calculus
to higher dimensions.
- While we defined the cross product through an abuse of notation, the
formula at the end does make sense. It takes two vectors as inputs and gives
back another vector. Note the symmetry in the subscripts: v = (v1, v2, v3) and
w = (w1, w2, w3) leads to v x w = (v2 w3 - v3 w2, v3 w1 - v1 w3, v1 w2 - v2
w1); each component has a difference of a product of a v and a w, each
component is missing the subscript corresponding to it, and the initial
subscripts are in cyclic order (1,2,3,1,2,...).
- We use the right
hand screw rule to determine how to orient our axes; while it is not wrong
to do so another way, it is counter to all other notation and thus should be
avoided.
- A nice application of the cross product is in determining
equations of planes.
- Our proof that the determinant of a matrix is the area of the
parallelogram crucially used the dot product. It is not surprising that this
happens. The dot product gives us the cosine of the angle between two vectors;
the area requires us to know the sine of the angle (to get the height of the
parallelogram), and these are related through the
Pythagorean Theorem, which
says cos^2(x) + sin^2(x) = 1. There are many ways to
obtain this formula. Perhaps one of the most useful is the
Euler - Cotes formula,
exp(ix) = cos(x) + isin(x). One can essentially derive all of trigonometry
from this relation, with just a little knowledge of the
exponential
function. Specifically, we have exp(z) = 1 + z + z^2/2! + z^3/3!
+ .... It is not at all clear from this definition that exp(z) exp(w) =
exp(z+w); this is a statement about the product of two infinite sums equaling
a third infinite sum. It is a nice exercise in combinatorics to show that this
relation holds for all complex z and w.
- Taking the above identities, we sketch how to derive all of trigonometry!
Let's prove the angle addition formulas. We have exp(ix) = cos(x) + isin(x)
and exp(iy) = cos(y) + isin(y). Then exp(ix) exp(iy) = [cos(x) + isin(x)] [cos(y)
+ isin(y)] = [cos(x) cos(y) - sin(x) sin(y)] + i [sin(x) cos(y) + cos(x) sin(y)];
however, exp(ix) exp(iy) = exp(i(x+y)) = cos(x+y) + i sin(x+y) by Euler's
formula. The only way two complex numbers can be equal is if they have the
same real and the same imaginary parts. Thus, equating these yields cos(x+y) =
cos(x) cos(y) - sin(x) sin(y) and sin(x+y) = sin(x) cos(y) + cos(x) sin(y).
- It is a nice exercise to derive all the other identities. One can even get
the Pythagorean theorem! To obtain this, use exp(ix) exp(-ix) = exp(0) = 1.
(Both identities are checked symbolically in the sketch below.)
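- Both derivations can be checked symbolically; a minimal SymPy sketch:

    import sympy as sp

    x, y = sp.symbols('x y', real=True)
    # Angle addition, from expanding exp(ix) exp(iy) = exp(i(x+y)).
    print(sp.simplify(sp.cos(x + y)
                      - (sp.cos(x) * sp.cos(y) - sp.sin(x) * sp.sin(y))))  # 0
    print(sp.simplify(sp.sin(x + y)
                      - (sp.sin(x) * sp.cos(y) + sp.cos(x) * sp.sin(y))))  # 0
    # Pythagorean theorem, from exp(ix) exp(-ix) = 1.
    print(sp.simplify(sp.cos(x)**2 + sp.sin(x)**2))                        # 1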
- We thus see there is a connection between the angle addition formulas in
trigonometry and the exponential addition formula. Both of these are used in
critical ways to compute the derivatives of these functions. For example,
these formulas allow us to differentiate sine, cosine and the exponential
functions anywhere once we know their derivative at just one point. Let f(x) =
exp(x). Then f'(x) = lim [f(x+h) - f(x)]/h = lim [exp(x+h) - exp(x)] / h = lim
[exp(x) exp(h) - exp(x)] / h = exp(x) lim [exp(h) - 1] / h; as exp(0) = 1, we
find f'(x) = exp(x) lim [f(h) - f(0)] / h = exp(x) f'(0); thus we know the
derivative of the exponential function everywhere once we know the derivative
at 0! One finds a similar result for the derivatives of sine and cosine
(again, this shouldn't be surprising as the functions are related to the
exponential through Euler's formula).
- As I've said in class, we could title the first week of the semester
Applications of the Pythagorean Theorem. As a number theorist, it's hard
for me not to discuss its generalizations. The Pythagorean Theorem says that
for a right triangle, a^2 + b^2 = c^2, where a
and b are the bases of our triangle and c is the hypotenuse. It is not
immediately clear that there are integer solutions to this, but a little
inspection turns up a few, such as (3, 4, 5), (5, 12, 13), and of course
trivial modifications such as (6, 8, 10). It turns out there are infinitely
many solutions in the integers, and there is even a way to generate all of
these solutions, which are called Pythagorean triples.
Click here for more
information on the Pythagorean triples. (A small generator based on Euclid's
parametrization is sketched below.)
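- A small generator based on Euclid's parametrization (for m > n > 0, the
triple (m^2 - n^2, 2mn, m^2 + n^2) always works; see the link above for which
choices of m and n give all triples):

    def triple(m, n):
        return m * m - n * n, 2 * m * n, m * m + n * n

    for m in range(2, 5):
        for n in range(1, m):
            a, b, c = triple(m, n)
            assert a * a + b * b == c * c
            print((a, b, c))   # includes (3, 4, 5) and (5, 12, 13)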
- One can of course ask about generalizations of the Pythagorean Theorem;
the most famous is whether or not there are any non-trivial integer
solutions to a^n + b^n = c^n, where n > 2 and
abc is non-zero. This is the famous
Fermat's Last
Theorem, solved by
Wiles / Taylor-Wiles using
elliptic curves and
modular forms (on a
personal note, I had the pleasure of
teaching a class with Wiles on elliptic curves at Princeton in 2001).
There are other generalizations, such as
Beal's Conjecture.
Another wonderful generalization is
Euler's sum of powers conjecture. For a nice occurrence of these in
popular culture, see the
Homer³
short from
Treehouse of Horror VI.
Full clip
is here, but no sound on my computer.
Sound
works here, but it's in German, or if you prefer,
in
Spanish.
- Wednesday, February 10.
- There are many applications of equations of lines, planes and projections.
One of my favorites comes from art. The painter
Thomas Eakins
projected pictures of people onto canvases; this allowed him to have
realistic pictures, and saved him hours of computations. Two pictures
frequently mentioned are
Arcadia and
Mending the Net. He hid what he did; it wasn't until years later that
people noticed he had done this. If memory serves, this was discovered when
people were looking through photographs in an attic and noticed a picture of
four people on a New Jersey highway who were identical to four people in a
seascape. Upon closer inspection of the canvas, they noticed marks (which
were partly hidden) indicating Eakins had projected the image onto the canvas.
Click here for more on the
subject.
See also here for a nice story on the controversy (the use of `technology'
such as projectors in art).
For a semi-current view
on the merits of tracing, watch this video clip.
- There is an enormous literature on the applications of lines, planes,
projections et cetera in art.
The wikipedia
article is a good starting point.
- We discussed the equation for the angle between two vectors.
Geometrically, it's clear that if we change the lengths of the vectors then we
shouldn't change the angle; after a little inspection, we see that our formula
satisfies that property. It is a great skill to be able to look at a formula
and see behavior like this. There is a rich history of applying intuition like
this to problems. One example is
dimensional (or unit)
analysis, which is frequently seen in physics or chemistry; my favorite /
standard example is the
simple pendulum.
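As a quick illustration of the scaling property, here is a small numerical check (assuming the usual formula cos(theta) = v · w / (||v|| ||w||)):

    # Rescaling either vector by a positive constant leaves the angle formula
    # cos(theta) = v.w / (||v|| ||w||) unchanged; the scalars cancel.
    import numpy as np

    def cos_angle(v, w):
        return np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))

    v, w = np.array([1.0, 2.0, 2.0]), np.array([3.0, 0.0, 4.0])
    print(cos_angle(v, w))             # some number in [-1, 1]
    print(cos_angle(5 * v, 0.1 * w))   # the same number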
- The
Cauchy-Schwarz inequality is one of the most important in mathematics;
it's used all the time to bound quantities. My favorite application, which is
quite advanced, is to the
uncertainty
principle in quantum mechanics! It turns out one can view the uncertainty
principle as a mathematical statement about a function and its
Fourier transform.
See me if you want more details.
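In its simplest form, for vectors, the inequality says |v · w| <= ||v|| ||w||. A quick random spot check (an illustration, of course, not a proof):

    # Random spot check of Cauchy-Schwarz: |v.w| <= ||v|| ||w||.
    import numpy as np

    rng = np.random.default_rng(0)
    for _ in range(1000):
        v, w = rng.standard_normal(4), rng.standard_normal(4)
        assert abs(np.dot(v, w)) <= np.linalg.norm(v) * np.linalg.norm(w) + 1e-12
    print("held in all 1000 random trials")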
- There are lots of inequalities in mathematics. Another very useful one is
the arithmetic mean - geometric
mean inequality, which says (a+b)/2 >= sqrt(ab) for non-negative a and b; see
also my handout with some proofs (written years ago in my Ohio State days).
- We will not cover
determinants in great detail. For us, the most important property is that the
absolute value of the determinant gives the volume of the parallelepiped
spanned by the rows (or columns) of the matrix.
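For instance, in two dimensions |det| is the area of the parallelogram spanned by the columns; a small numerical sketch:

    # |det| of a 2x2 matrix is the area of the parallelogram spanned by its
    # columns (in 3 dimensions, a volume).
    import numpy as np

    A = np.array([[2.0, 1.0],
                  [0.0, 3.0]])    # columns (2, 0) and (1, 3)
    print(abs(np.linalg.det(A)))  # 6.0: shearing the 2-by-3 rectangle
                                  # preserves its area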
- Monday, February 8. A common feature
in several variables is to first recall the one variable case, and use that as
intuition to describe what's happening. We started by reviewing the three
different ways to write
the equation of a
line in the plane: point-slope, point-point and slope-intercept. We then
generalized this to higher dimensions, and then wrote down the
definition of a plane
(we'll discuss alternate definitions involving normal vectors later in the
course; note that planes arose in the Super Bowl yesterday, in the question of
whether the Saints had control when the ball broke the plane during the
two-point conversion;
click here,
click here or
click here
for more on breaking the plane in football).
- We discussed how there are several different but equivalent ways of
writing the same expression. We can do it with vectors, as in
(x,y,z) = P + tv,
or we can do it as a series of equations, such as x = p_1 + t v_1,
y = p_2 + t v_2, z = p_3 + t v_3, or as
x_i = p_i + t v_i with i in {1,2,3}. You
should use whichever way is easier for you to visualize. It is possible to get
so caught up in reductions and compactifications that the resulting equation
hides all meaning. A
terrific example is the great physicist Richard Feynman's reduction of
all of physics to one equation, U = 0, where U represents the
unworldliness of the universe. Suffice it to say, reducing all of physics
to this one equation does not make it easier to solve physics problems /
understand physics (though, of course, sometimes good notation does assist us
in looking at things the right way).
- A nice problem is to prove the following about perpendicular lines: the
product of their slopes is always -1 if neither is parallel to the x- or
y-axis. (Hint: a line of slope m has direction vector (1, m), and two lines
are perpendicular exactly when the dot product of their direction vectors
vanishes.) In some sense, this tells us that in the special case when the
lines are the x- and y-axes, we should interpret the product of their slopes
as -1, or in other words in this case 0 · ∞ = -1.
- In differential trigonometry, it is essential that we
measure angles in radians.
If we use radians, then the derivative of sin(x) is cos(x) and the derivative
of cos(x) is -sin(x); this is not true if we use degrees. If we use
degrees, we have pesky conversion factors of 360/2π
to worry about. The proofs of these derivatives follow from the
angle addition formulas; let me know if you want more details about this
(we'll mention this briefly when we do Taylor series of exp(x)).
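For example, for sine the same trick as for the exponential works; a sketch: by the angle addition formula, sin(x+h) = sin(x) cos(h) + cos(x) sin(h), so [sin(x+h) - sin(x)]/h = sin(x) [cos(h) - 1]/h + cos(x) [sin(h)/h]; as h -> 0 the first bracket tends to 0 and the second tends to 1 (and it is exactly here that measuring in radians matters), giving sin'(x) = cos(x).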
- In the proof of the
Law of Cosines (c^2 = a^2 + b^2 - 2ab cos(C), where C is the angle opposite
the side c), the key step was adding an auxiliary line to reduce the
problem to the point where we could apply the Pythagorean Theorem. Learning
how to add these auxiliary lines is one of the hardest things to do in math.
As a good exercise, figure out what auxiliary lines to add to prove the angle
addition formula for sine, namely sin(x+y) = sin(x) cos(y) + cos(x) sin(y);
click here for the solution. For another example,
click here.
- We ended the day with the definition of the
inner or dot product.
While our definition only works for vectors, it turns out this is one of the
most useful ideas in mathematics, and can be generalized greatly. For example,
we can talk about the dot product of functions! We'll see on Wednesday how the
dot product is related to angles and lengths, and thus we will find that we
can discuss in a sensible manner what the `angle' is between sin(x) and cos(x)!
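As a preview (with one standard choice of inner product, not the only one): define <f, g> to be the integral from 0 to 2π of f(x) g(x) dx. With this definition sin and cos have inner product 0, so they are `perpendicular'; a quick sympy check:

    # With the inner product <f, g> = integral_0^{2pi} f(x) g(x) dx,
    # sine and cosine are `perpendicular': their inner product vanishes.
    import sympy as sp

    x = sp.symbols('x')
    print(sp.integrate(sp.sin(x) * sp.cos(x), (x, 0, 2 * sp.pi)))  # 0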
- Friday, February 5. The main result we
proved today was the
Pythagorean Theorem,
which relates the length of the hypotenuse of a right triangle to the lengths
of the sides (President
Garfield is credited with a proof). For us, this result is important as it
gives us a way to compute the length of vectors. While we only proved it in
the special case of a vector with two components, the result holds in general.
Specifically, if v = (v_1, ..., v_n) then ||v|| = sqrt(v_1^2
+ ... + v_n^2). It is a nice exercise to prove this.
One way is to use
Mathematical
Induction (one common image for induction is that of
following dominoes);
see also my handout on induction.
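One possible route for the inductive step, as a sketch: the vector v = (v_1, ..., v_{n+1}) is the hypotenuse of a right triangle whose legs are (v_1, ..., v_n, 0), of length sqrt(v_1^2 + ... + v_n^2) by the inductive hypothesis, and (0, ..., 0, v_{n+1}), of length |v_{n+1}|; one more application of the two-dimensional Pythagorean Theorem then gives ||v|| = sqrt(v_1^2 + ... + v_{n+1}^2).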
Below are some additional remarks. These relate to material mentioned in
class. The comments below are entirely for your personal enjoyment and
edification. You do not need to read these for the class. These are meant to
show how topics discussed arise in other parts of mathematics / science; these
will not be on exams, you are not responsible for learning them, ....
- We also discussed notation for the natural numbers, the integers, the
rationals, the reals and the complex numbers. We will not do too much with the
complex numbers in the course, but it is important to be aware of their
existence. Generalizations of the complex numbers, the
quaternions, played a
key role in the development of mathematics, but have thankfully been replaced
with vectors (online
vector identities here). The quaternions themselves can be generalized a
bit further to the octonions
(there are also the sedenions,
which I hadn't heard of until doing research for today's comments).
- A natural question to ask is, if all we care about are real numbers, then
why study complex numbers? The reason is that certain operations are not
closed under the reals. For example, consider quadratic polynomials f(x) = ax^2
+ bx + c with a, b and c real numbers. Say we want to find the roots of f(x) =
0; unfortunately, not all polynomials with real coefficients have real roots,
and thus finding the solutions may require us to leave the reals. Of course,
you could say that if all you care about is real world problems, this won't
matter as your solutions will be real. That said, it becomes very
useful (algebraically) to allow imaginary numbers such as i = sqrt(-1). The
reason is that it allows us a very clean way to manipulate many quantities.
Our text has a great discussion of this on pages 54 to 61, especially the top
of page 55.
There is an explicit, closed form expression for the three roots of a cubic;
while it may not be as simple as the
quadratic formula, it does the job. Interestingly, if you look at x^3
- 15x - 4 = 0, the aforementioned method yields (2 + 11i)^(1/3) +
(2 - 11i)^(1/3). It isn't at all obvious, but algebra will show that
this does in fact equal 4: note (2 + i)^3 = 2 + 11i, so we may take 2 + i and
2 - i as the two cube roots, and their sum is 4. As you continue further and
further in mathematics,
the complex numbers play a larger and larger role.
- Later in the semester we will revisit
Monte Carlo
Integration (the original Monte Carlo paper has been called by many the most
important mathematical paper of the 20th century). Sadly, most integrals
cannot be evaluated in closed form,
and we must resort to approximation methods.
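As a taste of the idea (a minimal sketch): to approximate the integral of exp(-x^2) from 0 to 1, average the integrand at many uniform random points:

    # Minimal Monte Carlo integration sketch: estimate integral_0^1 exp(-x^2) dx
    # (which has no elementary closed form) by averaging at random points.
    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(0.0, 1.0, 100_000)
    print(np.exp(-x**2).mean())   # roughly 0.7468, close to the true value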
- Sabermetrics is
the `science' of applying math/stats reasoning to baseball. The formula I
mentioned in class is what's known as the
log-5 method; a better formula is the
Pythagorean Won
- Loss formula (someone linked
my paper deriving this from a reasonable model to the wikipedia page).
ESPN, MLB.com and all sites like this use the Pythagorean win expectation in
their expanded standings. My derivation is a nice exercise in multivariable
calculus and probability; we will either derive it in class or I'll give a
supplemental talk on it.
- In general, it is sadly the case that most functions do not have a simple
closed form expression for their anti-derivative. Thus integration is orders of
magnitude harder than differentiation. One of the most famous functions that
cannot be integrated in closed form is exp(-x^2), which is related to
calculating areas under the normal (or bell or Gaussian) curve. We do at least
have good series expansions to approximate it; see the entry on the
erf (or error) function.
- In class we mentioned that the anti-derivative of ln(x) is x ln(x) - x; it
is a nice exercise to compute the anti-derivative of (ln(x))^n for
any positive integer n. For example, if n = 4 we get
24x - 24x ln(x) + 12x ln(x)^2 - 4x ln(x)^3 + x ln(x)^4.
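A computer algebra system makes such experiments painless; for example, with sympy:

    # Computing the anti-derivative of (ln x)^4 symbolically (sympy).
    import sympy as sp

    x = sp.symbols('x', positive=True)
    print(sp.integrate(sp.log(x)**4, x))
    # x*log(x)**4 - 4*x*log(x)**3 + 12*x*log(x)**2 - 24*x*log(x) + 24*x
    # (up to the order in which sympy prints the terms)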
- Finally, the quest to understand the cosmos played an enormous role in the
development of mathematics and physics. For those interested, we'll go to the
rare books library and see first editions of Newton, Copernicus, Galileo,
Kepler, .... Some interesting stories below; see also a great article
by Isaac Asimov on all of this, titled
The Planet That Wasn't.