Additional comments related to material from the
class. If anyone wants to convert this to a blog, let me know. These additional
remarks are for your enjoyment, and will not be on homeworks or exams. These are
just meant to suggest additional topics worth considering, and I am happy to
discuss any of these further. These comments are from an earlier iteration.
-
Wednesday, May 9: Lecture
From Math/Stat 341 (Probability): the M&M Game: https://youtu.be/DFobthridWI
M&M Game: Memoryless Processes, Geometric Series, Double Recurrences, Hypergeometric, ...: (slides) (paper)
-
Lots of great topics introduced:
-
Research of mine (mostly with undergrads) on Fibonacci numbers and recurrences (lots of opportunities for new research here):
-
Monday,
May 7.
- Erf function:
https://en.wikipedia.org/wiki/Error_function
Friday,
May 4.
Today
we discussed Taylor's theorem. This is one of the most important applications
of calculus in general and sequences and series in particular. It allows us to replace complicated functions with simpler ones.
There are numerous questions to ask.
- Are Taylor series unique? Yes. The definition just involves taking sums
of derivatives; the process is well-defined.
- Does every infinitely differentiable function equal its Taylor series
expansion? Sadly, no; the function \(f(x) = \exp(-1/x^2)\) if \(x \neq 0\) and \(0\) if \(x=0\) is the standard example. This function causes enormous problems in probability. There are many functions which do equal their own Taylor series expansion, such as \(\exp(x), \cos(x)\) and \(\sin(x)\). It's
not surprising that these three are listed together, as we have the
wonderful Euler-Cotes formula: \(\exp(i x) = \cos(x) + i \sin(x)\), with \(i = \sqrt{-1}\). At first this formula doesn't seem that important; after
all, we mostly care about real quantities, so why complexify our life by
adding complex (i.e., imaginary) numbers? Amazingly, even for real
applications (applications where everything is real), complex numbers play a
pivotal role. For example, note that a little algebra gives \(\cos(x) = (\exp(i x) + \exp(-i x)) / 2\) and \(\sin(x) = (\exp(i x) - \exp(-i x)) / (2i)\). Thus properties of the exponential function transfer to our trig functions. The hyperbolic cosine and sine functions are similarly defined; for example, \(\cosh(x) = \cos(i x) = (\exp(x) + \exp(-x)) / 2\).
The Foxtrot strip below (many thanks to the author, Bill Amend, for permission to post) illustrates the confusion that can arise between hyperbolic and regular trig functions (for extra credit, why does Eugene know that the calculator cannot be giving the right answer?).
It's worth noting that the formula \(\exp(i x) = \cos(x) + i \sin(x)\)
allows us to derive ALL trig identities painlessly!
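- For example, expanding \(e^{i(a+b)} = e^{ia} e^{ib}\) gives both angle-addition formulas in one line: \[ \cos(a+b) + i\sin(a+b) = (\cos a + i\sin a)(\cos b + i\sin b) = (\cos a\cos b - \sin a\sin b) + i(\sin a\cos b + \cos a\sin b), \] and matching real and imaginary parts does the rest.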
-

- FoxTrot (c) Bill Amend. Used by permission of Universal Uclick. All
rights reserved.
- How easy are Taylor series to use? If we keep just a few terms, it's not
too bad; however, as the great Foxtrot strip below shows, it's not always
clear how nicely something simplifies.
-

- FoxTrot (c) Bill Amend. Used by permission of Universal Uclick. All
rights reserved.
- In the strip above, notice the large factorials in the denominator. Note 52! is about \(10^{68}\); in other words, these terms are small! For interest, 52! is the number of ways (with order mattering) of arranging a standard deck of cards. There are about \(10^{85}\) or so subatomic thingamabobs in the universe; we quite quickly reach numbers this high (a deck with 70 cards more than suffices; in other words, we could not have each subatomic object in the universe represent a different shuffling of a deck of 70 cards). In a related note, it's important to think a bit and decide what 0!
should be. It simplifies many formulas to have 0! = 1, and we can make this
somewhat natural by saying there is only 1 way to do nothing
(mathematically, of course). The
definition of the factorial function on Wikipedia talks a little bit about
this.
- Unlike 0!, 0^0 is a bit more controversial as to what the definition should be. As I don't want to pressure anyone, I will not publicly disclose where I stand in the great debate, though I'm happy to tell you privately / through email.
- It's worth remarking on why we have n! in the denominators. This is to
ensure that the nth derivative of our function equals the nth derivative of
the Taylor expansion at the point we're expanding. In other words, we're
matching up to the first 2 derivatives for the 2nd order Taylor expansion,
up to the first 3 for the 3rd order Taylor expansion, and so on. It isn't
surprising that we should be able to do a good job; the more derivatives we
use, the more information we have on how the function is changing near the
key point.
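- Explicitly, the \(n\)th order Taylor expansion of \(f\) about \(a\) is \[ T_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x-a)^k; \] since the \(j\)th derivative of \((x-a)^k/k!\) at \(x=a\) is 1 if \(j=k\) and 0 otherwise, we get \(T_n^{(j)}(a) = f^{(j)}(a)\) for every \(j \le n\), which is exactly the matching described above.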
- For many purposes, we just need a first order or second order Taylor
series; one of my favorites is the proof of the Central
Limit Theorem in probability.
One of my favorite proofs involves second order Taylor expansions of the Fourier
Transforms (these were
mentioned in the additional comments on Friday, March 12).
- If \(f(x)\) equals its infinite Taylor series expansion, can we
differentiate term by term? This needs to be proved, and is generally done
in a real analysis course. For some functions such as \(\exp(x)\) we can
justify the term by term differentiation, but note that this is something
which must be
justified.
- A terrific application of just doing a first order Taylor expansion is Newton's
Method.
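- Here is a minimal sketch of Newton's Method in Python (the example function and starting guess are just for illustration): each step replaces \(f\) by its first order Taylor expansion at the current point and jumps to the root of that line.

    def newton(f, fprime, x0, steps=10):
        """Root-find via the tangent line: f(x) ~ f(x0) + f'(x0)(x - x0)."""
        x = x0
        for _ in range(steps):
            x = x - f(x) / fprime(x)   # root of the first order Taylor expansion
        return x

    # Example: sqrt(2) as the positive root of x^2 - 2.
    print(newton(lambda x: x*x - 2, lambda x: 2*x, 1.0))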
-
We discussed convergence of Taylor series, theoretically using the Mean
Value Theorem (though a better argument gives a smaller error) and
experimentally by looking at a
Mathematica notebook on \(\cos(x)\). We then talked about multivariable
Taylor series, and a trick to quickly evaluate them. We ended with a
discussion of the
second
derivative test in several variables.
For more on Taylor
series see the Wikipedia page.
- Our proof of how well Taylor
series approximate heavily
involves the Mean
Value Theorem. Taylor series involve writing our function as a
combination of the functions \(1, x, x^2, x^3\) and so on; other
possibilities exist. We could use trigonometric
polynomials, writing our function as combinations of \(\sin(nx)\) and
\(\cos(nx)\) where \(n\) ranges over all integers. This leads to Fourier
series, which are very useful (and often have great convergence
properties). What is so great about all of these is that we can transmit
just a few coefficients and then rebuild the function. Why does this work?
Rather than transmitting all values of the function, by sending just a few
coefficients we can exploit the fact that we have a powerful computer on our
end to rebuild the function. If you want to send a video, for example, you
could have a two dimensional function \(f(x,y)\), where \(f(x,y)\)
represents the color of the pixel at \((x,y)\). We need to reconstruct the
function, but we don't want to send the value of each pixel. Enter Fourier
series! We now index by time, and consider \(f(x,y;t)\); actually, it's
probably better to send \(g(x,y;t) = f(x,y;t) - f(x,y;t-1)\).
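- As a toy illustration of sending just a few coefficients, here is a Python sketch using the discrete Fourier transform (the numerical cousin of Fourier series); the test signal and the number of coefficients kept are arbitrary choices, not anything canonical.

    import numpy as np

    # A square wave stands in for one row of pixel values.
    N = 1024
    t = np.linspace(0, 2 * np.pi, N, endpoint=False)
    signal = np.sign(np.sin(t))

    # 'Transmit' only the first 10 Fourier coefficients, then rebuild.
    coeffs = np.fft.rfft(signal)
    coeffs[10:] = 0
    rebuilt = np.fft.irfft(coeffs, n=N)
    print(np.mean(np.abs(signal - rebuilt)))   # modest average error from a tiny fraction of the data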
- Finally, one can generalize even further and consider orthogonal
polynomials.
-
We saw how well Taylor series approximate functions. The Mathematica
program here is
(hopefully) easy to use. You can specify the point and the number of terms of the Taylor series of \(\cos(x)\) to compute. At first it might seem surprising
that there is no improvement in fit when we go from a second order to a
third order Taylor series approximation; however, we have \(\cos(x) = 1 -
x^2/2! + x^4/4! - x^6/6! + \cdots\). In other words, all the odd derivatives
vanish at the origin, and thus there is no improvement at the origin in
adding a cubic term (i.e., the best cubic coefficient at the origin is 0). If
we go to a fourth order, we do see improvement. By \(n=10\) or \(12\) we are
already getting essentially an entire period correct; by \(n=40\) we have
several cycles.
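- A quick numerical check of this (in Python; the evaluation point 2.0 is arbitrary) shows the second and third order approximations of \(\cos(x)\) agreeing exactly, with the error dropping only at fourth order:

    import math

    def cos_taylor(x, order):
        """Taylor polynomial of cos about 0, keeping terms up to x^order."""
        return sum((-1)**(k // 2) * x**k / math.factorial(k)
                   for k in range(0, order + 1, 2))   # the odd terms vanish

    for order in (2, 3, 4, 10):
        approx = cos_taylor(2.0, order)
        print(order, approx, abs(approx - math.cos(2.0)))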
- For some reason, most books don't mention the trick on how to quickly
compute higher order Taylor expansions in several variables. The idea is to
'bundle' variables together and use one-dimensional expansions. For example,
consider \(f(x,y) = \exp(-(x^2 + y^2)) \cos(xy)\). We saw in class how
painful it is to compute the Hessian, the matrix of second partial
derivatives. That involved either two product rules or knowing the triple
product formula. If we use our trick, it's much easier. Note \(\exp(u) = (1
+ u + u^2/2! + \cdots)\) and \(\cos(v) = (1 - v^2/2! + \cdots)\). A second
order Taylor expansion means keep only terms with no \(x\)'s and \(y\)'s,
with just \(x\) or \(y\), or with just \(x^2\), \(xy\) or \(y^2\) (a third
order would allow terms such as \(x^3, x^2 y, x y^2, y^3\), and so on). Thus
we expand \(\exp(u) \cos(v)\) and then set \(u = -(x^2+y^2)\) and \(v = xy\).
For \(\exp(u)\), we just need \(1 + u\), as already the \(u^2/2\) term will
be order \(4\) when we substitute \(-(x^2+y^2)\). For \(\cos(v)\), we only
keep the \(1\) as \(v^2/2\) is order \(4\). Thus the Taylor expansion of
order \(2\) is just \((1 -(x^2+y^2)) (1) = 1 - x^2 - y^2\); this is a lot
faster than the standard method! That method works in general, but there are
so many cases where this is faster that it's worth knowing.
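- The bundling trick is also easy to automate; here is a sympy sketch (scaling \((x,y) \mapsto (tx, ty)\) makes the coefficient of \(t^k\) collect exactly the degree-\(k\) terms):

    import sympy as sp

    x, y, t = sp.symbols('x y t')
    f = sp.exp(-(x**2 + y**2)) * sp.cos(x * y)

    # Scale (x, y) -> (t x, t y) and expand in t through order 2.
    expansion = sp.series(f.subs({x: t * x, y: t * y}), t, 0, 3).removeO()
    print(sp.expand(expansion.subs(t, 1)))   # 1 - x**2 - y**2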
-
Twenty-seventh
day lecture:
http://youtu.be/yr01SLw9t4c
(May 12, 2014: Taylor Series)
Wednesday, May 2: The big item was the logarithm trick; it is often easier to study the limit of the logarithm of a quantity and then exponentiate. Whenever you see a product you should have a Pavlovian response of taking logarithms.
- Wednesday,
April 25. The main items today
were the last of the big series convergence tests.
- Root Test:
Remember that the root test provides no information if the value is 1;
thus it says nothing about the convergence or divergence of \(1/n^p\) for
any fixed \(p > 0\).
- Integral Test:
The most common example is the harmonic sum, \(1/n\). The integral test
not only gives the divergence, but with a bit more work shows that the sum
of the first n reciprocals of integers is about \(\ln(n)\).
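- A quick numerical check (Python; the cutoff \(10^6\) is arbitrary) of the integral test's prediction; the difference between the partial sum and \(\ln(n)\) tends to Euler's constant, about 0.5772:

    import math

    n = 10**6
    partial = sum(1.0 / k for k in range(1, n + 1))
    print(partial, math.log(n), partial - math.log(n))   # difference ~ 0.5772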
- While we could cleverly prove the harmonic series diverges, we get this
result easily with the integral test. It's worth revisiting some of the
earlier comments on this series. It diverges, but just barely. There are
many references for information on
the harmonic series (see also In
Perfect Harmony by John Webb
for examples of where it arises). The standard example given is that you can stack blocks on top of each other with as large an overhang as you wish, without them falling or being supported by anything other than their own weight! See http://www.ken.duisenberg.com/potw/archive/arch03/030728sol.html as
well as http://www.cs.cmu.edu/afs/cs/academic/class/16741-s07/www/projects06/chechetka_16-741_project_report.pdf,
or the movie of the week: Stacking
blocks. For recent results on what can be done if you allow non-simple
patterns, see
this paper.
-
A nice application of a series expansion is Stirling's
formula for n!.
We again convert a product to a sum by taking logarithms; this is a very
important technique. We saw it earlier in the Method of Least Squares
(allowing us to handle cases when the unknown parameters were exponents). We
also saw it earlier in the day when analyzing \(\lim_{n\to\infty} n^{1/n}\):
take logarithms and use L'Hopital's rule to show \(\lim_{n\to\infty} \frac{\log
n}{n} = 0\), so exponentiating we see the original limit is 1.
- We can get close to the correct value for \(n!\) by the integral test or the Euler-MacLaurin summation formula. This builds on a very important question: we know the integral test tells whether or not a series converges; if it does converge, how close is the sum to the integral? The Euler-MacLaurin formula teaches us how to convert sums to integrals and bound the error.
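- A quick sketch of how good Stirling's formula \(n! \approx \sqrt{2\pi n}\,(n/e)^n\) is:

    import math

    for n in (5, 10, 52):
        stirling = math.sqrt(2 * math.pi * n) * (n / math.e)**n
        print(n, math.factorial(n) / stirling)   # the ratio tends to 1 as n grows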
- The fact that \(\sum_{n = 1}^\infty 1/n^2 = \pi^2/6\) has a lot of
applications. It can be used to prove that there are infinitely many primes
via the Riemann
zeta function. The Riemann zeta function is \(\zeta(s) = \sum_{n = 1}^\infty
1/n^s\). By unique
factorization (also known as the Fundamental Theorem of Arithmetic), it
also equals \(\prod_{p\ {\rm prime}} 1 / (1 - 1/p^s)\); notice that a
generalization of the harmonic sum and the geometric series formula are
coming into play. It turns out that \(\zeta(2) = \pi^2/6\), as can be seen in many different ways. As \(\pi^2\) is irrational, so is \(\zeta(2)\); but if there were only finitely many primes then the product would be a finite product of rationals and hence rational, contradiction! See wikipedia for a proof of this sum.
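- Both sides of the identity are easy to probe numerically (Python, using sympy just for a list of primes; the cutoffs are arbitrary):

    import math
    from sympy import primerange

    zeta2 = sum(1.0 / n**2 for n in range(1, 10**6))
    euler_product = 1.0
    for p in primerange(2, 10**4):
        euler_product *= 1.0 / (1.0 - p**-2.0)
    print(zeta2, euler_product, math.pi**2 / 6)   # all three agree to several decimals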
- Another interesting application of summing series involving primes is to
the Pentium
bug (see the links there for
more information, as well as Nicely's
webpage). The calculation being performed was \(\sum 1/p\), where the sum runs over primes \(p\) such that either \(p+2\) or \(p-2\) is also prime; this is known as Brun's
constant. If this sum were infinite then there would be infinitely many
twin primes, proving
one of the most famous conjectures in mathematics; sadly the sum is
finite and thus there may or may not be infinitely many twin primes (twin
primes are two primes differing by 2).
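- For the curious, here is a Brun-style partial sum in Python (summing \(1/p + 1/(p+2)\) over twin prime pairs is one common normalization; convergence toward \(\approx 1.902\) is agonizingly slow):

    from sympy import isprime, primerange

    brun = 0.0
    for p in primerange(3, 10**6):
        if isprime(p + 2):                 # (p, p+2) is a twin prime pair
            brun += 1.0 / p + 1.0 / (p + 2)
    print(brun)   # still well below the limiting value ~1.902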
-
Twenty-sixth
day lecture: http://youtu.be/ujJbUpCab6M
(May 7, 2014: Root Test, Integral Test)
For
details of the Green's theorem lecture (it
is lecture 26 on Glow)
I gave last year, see my Lecture
notes on Green's Theorem. We discussed some of the Big Three theorems of Vector Calculus (Green's
Theorem, Gauss'
Divergence Theorem, and Stokes'
Theorem). These theorems are massive generalizations of the Fundamental
Theorem of Calculus, which can be generalized even more. The idea is to
relate the integral of the derivative of something over a region to the
integral of the something over the boundary of the region. To state these
theorems requires many concepts from vector calculus (parametrizing curves,
vectors, ...) as well as the Change of Variable theorem (converting integrals
over curves and surfaces to integrals over simpler curves and surfaces).
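- To make "the derivative over the region equals the function over the boundary" concrete, here is a small numerical check of Green's theorem in Python (the vector field and region are chosen purely to keep the arithmetic transparent):

    import numpy as np

    u = (np.arange(1000) + 0.5) / 1000        # midpoint rule on [0, 1]
    X, Y = np.meshgrid(u, u)

    # F = (P, Q) = (-y^3, x^3) on the unit square; the curl Q_x - P_y is 3x^2 + 3y^2.
    double_integral = np.mean(3 * X**2 + 3 * Y**2)   # the square has area 1

    # Counterclockwise line integral of P dx + Q dy, one edge at a time.
    bottom = 0.0                              # y = 0: P = -y^3 = 0
    right = np.mean(np.ones_like(u))          # x = 1: Q = 1, integrated in dy
    top = np.mean(np.ones_like(u))            # y = 1: P = -1, but dx runs from 1 to 0, giving +1
    left = 0.0                                # x = 0: Q = 0
    print(double_integral, bottom + right + top + left)   # both are (about) 2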
- To truly see and appreciate the richness of the three theorems (which
are really three variants of the same theorem), one must be in at least
three dimensions. There the
Stokes' Theorem states that the integral of a certain function over a
surface equals the integral of another over the boundary curve. This means
that many integrals turn out to be the same.
- To see the equivalence of these formulations requires differential
forms. Frequently it is not immediately clear how to generalize a
concept to higher dimensions or other settings.
- While we only briefly touched on the subject, conservative
forces are extremely
important in physics and engineering, primarily because of a wonderful
property they have: the work done in moving an object from A to B is
independent of the path taken if the exerted force is conservative. Many of
the most important forces in classical mechanics are taken to be
conservative, such as gravity and electricity.
In modern physics, these forces are replaced with more complicated objects.
One of the central quests in modern physics is to unify
the various fundamental forces (gravity,
strong, weak and electricity and
magnetism).
- Click here for more on
divergence, and click
here for more on curl. Another related object (one we have seen many
times) is the gradient.
All of these involve the same differential operator, called del (and represented with a nabla). We used our intuition for vectors to define new combinations involving the del operator (the curl and the divergence). While our intuition comes from vectors, we must be careful as we do not have commutativity. For example, \(\nabla \cdot F\) is not the same as \(F \cdot \nabla\); the first is a scalar (number) while the second is an operator. Click
here for more on differential operators. For those who want to truly go
wild on operators, modern quantum mechanics replaces concepts like position
and momentum with differential operators (click
here for the momentum operator)! This allows us to rewrite the Heisenberg
uncertainty principle in the following
strange format.
- One of the most famous applications of these concepts is the Navier-Stokes equation, which is one of the Millennium Problems (solving one of these is probably the hardest path to one million dollars!). The Navier-Stokes equation describes the motion of
fluids, which not surprisingly has numerous practical (as well as
theoretical) applications. Click
here for a nice derivation, which includes many of the new operators we
saw today.
- Another place where gradients, curls and divergences appear is the Maxwell
equations for electricity and magnetism; you can view
the equations here.
- The General Stokes Theorem is a massive generalization of the fundamental theorem of calculus. The idea of formally moving the derivative from the function to the region of integration is meant to be suggestive, but of course is in no way a proof. Notation should help us see connections and results. The great physicist Richard Feynman showed that all of physics is equivalent to solving the equation \(U = 0\), where \(U\) measures the unworldliness of everything. It is made up of the squares of the differences between the left and right hand sides of every physical law. Thus it has terms like \((F - ma)^2\) and \((E - mc^2)^2\). It is a concise way of encoding information, but it is not useful; everything is hidden. This is very different from the vector calculus formulations of electricity and magnetism, which do aid our understanding. For more information, skim the article here (search for unworldliness if you wish).
- We saw that we can compute the lengths of curves by evaluating integrals
of \(||c'(t)||\), where \(c(t) = (x(t), y(t), z(t))\) is our curve. While
this formulation immediately reduces the problem of finding lengths to a
Calc II problem, in general these are very difficult integrals, and
frequently cannot be done in closed form even for simple shapes. For
example, for extra credit find the length of the ellipse \((x/a)^2 + (y/b)^2
= 1\). Click
here for the solution (the
answer involves the elliptic
integral of the second kind).
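- Numerically the length is painless even though the closed form is not; a Python sketch for \(a = 2, b = 1\) (the semi-axes are arbitrary):

    import numpy as np

    a, b = 2.0, 1.0
    t = np.linspace(0, 2 * np.pi, 200001)
    speed = np.sqrt((a * np.sin(t))**2 + (b * np.cos(t))**2)   # ||c'(t)||
    print(np.trapz(speed, t))   # ~9.688; the exact answer needs an elliptic integral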
- We talked today about generalizing the Fundamental
Theorem of Calculus. There are not that many fundamental theorems in
mathematics -- we do not use the term lightly! Other ones you may have seen
are the Fundamental
Theorem of Arithmetic and the Fundamental
Theorem of Algebra; click
here for a list of more fundamental theorems (including
the Fundamental
Theorem of Poker!).
- Today was a fast introduction to path
integrals, line integrals, and Green's
Theorem (which is a special
case of the Generalized Stokes'
Theorem). While our tour of these subjects has to be rushed in a 12 week
course, if you are continuing in certain parts of math, physics or
engineering you will meet these again and again (for example, see Maxwell
equations for electricity and magnetism). In fact, one can view all of
classical mechanics as path
integrals where the trajectory of the particle (its c(t)) minimizes the
action; there is also a path
integral approach to quantum mechanics.
- For those continuing in mathematics or physics, you will see these
ideas again if you take complex
analysis. In particular, one of the gems of that subject is Cauchy's Integral Theorem. A complex differentiable function satisfies what is
called the Cauchy-Riemann
equations, and these are essentially the combination of partial
derivatives one sees in Green's theorem. In other words, the mathematics
used for Green's theorem is crucial in understanding functions of a
complex variable.
- For me, I consider it one of the most beautiful gems in mathematics
that we can in some sense move the derivative of the function we're
integrating to act on the region of integration! This allows us to
exchange a double integral for a single integral for Green's theorem (or a
triple integral for a double integral in the divergence theorem). As we've
seen constantly throughout the year, often one computation is easier than
another, and thus many difficult area or volume integrals are reduced to
simpler, lower dimensional integrals.
- The fact that \(\int_{t = a}^{b} \nabla(f)(c(t)) \cdot c'(t) dt =
f(c(b)) - f(c(a))\) means that this integral does not depend on the path.
If a vector field \(F = (F_1, F_2, F_3)\) equals \(\nabla(f)\) for some
\(f\), we say \(F\) is a conservative
force field and \(f\) is
the potential.
The fact that these integrals do not depend on the path has, as you would
expect, profound applications.
- This is a good point to stop and think about the number of spatial
dimensions in the universe. Imagine a universe with two point masses under
gravity, and assume gravity is proportional to \(1/r^{n-1}\) with \(r\)
the distance between the masses and \(n\) the number of spatial
dimensions. If there are three or more dimensions, then the work done in
moving a particle from infinity to a fixed, non-zero distance from the
other mass is finite, while if there are two dimensions the work is
infinite! One should of course ask why the correct generalization to other
dimensions is \(1/r^{n-1}\) and not \(1/r^2\) always. There is a nice
geometric justification in terms of flux and surface area; the surface
area of a sphere grows like \(r^2\) and thus the only way to have the
total flux of force out of it be constant is to assume the force drops
like \(1/r^2\); click
here for a bit on the justification of inverse-square laws.
- Speaking of dimensions, one of my favorite problems from undergraduate
days was the Random
Walk. In 1 dimension, imagine a person so completely drunk that he/she has a 50% chance at any moment of stepping to the left or the right; what is the probability the drunkard eventually returns home? It turns out that this happens with probability 1. In 2 dimensions, we have a 25% chance of moving north, south, east or west, and again the probability of returning is 1. In 3 dimensions, however, the drunkard only returns home with
probability about 34%. As my professor Peter
Jones said, a
three-dimensional universe is the smallest one that could be created that
will be interesting for drunks, as they really get to explore! These
random walk models are very important, and have been applied to economics
(the random
walk hypothesis), as well as playing a role in statistical
mechanics in physics.
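- A Monte Carlo sketch in Python (a finite-step simulation only approximates the true "ever returns" probabilities, and the 1- and 2-dimensional estimates creep toward 1 quite slowly):

    import random

    def returns_home(dim, steps=5000):
        """One lattice random walk; does it revisit the origin within `steps` moves?"""
        pos = [0] * dim
        for _ in range(steps):
            pos[random.randrange(dim)] += random.choice((-1, 1))
            if all(c == 0 for c in pos):
                return True
        return False

    for dim in (1, 2, 3):
        trials = 500
        print(dim, sum(returns_home(dim) for _ in range(trials)) / trials)
        # roughly 0.99, 0.85 and 0.34; the first two tend to 1, the third to ~0.34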
-
Monday April 23:
Today's lecture serves two purposes (click
here for the slides). While it does review many of the concepts from
integration, more importantly it introduces many of the key ideas and
challenges of mathematical modeling. Most students of 150 won't be taking
partial derivatives or integrals later in life (though you never know!);
however, almost surely you'll have a need to model, to try and describe a
complex phenomenon in a tractable manner.
- Sabermetrics is
the `science' of applying math/stats reasoning to baseball. The formula I
mentioned at the start of the semester is known as the log-5
method; a better formula is the Pythagorean
Won - Loss formula (someone
linked my
paper deriving this from a reasonable model to
the wikipedia page), the topic of today's lecture. ESPN, MLB.com and all
sites like this use the Pythagorean win expectation in their expanded
series. My derivation is a nice exercise in multivariable calculus and
probability.
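- For concreteness, here is the formula in Python (an exponent near 1.8 is a commonly quoted empirical fit; the classical exponent 2 gives the formula its name):

    def pythagorean_expectation(runs_scored, runs_allowed, gamma=1.8):
        """Predicted winning percentage from runs scored and allowed."""
        return runs_scored**gamma / (runs_scored**gamma + runs_allowed**gamma)

    print(pythagorean_expectation(800, 700))   # ~0.56, about 91 wins over 162 games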
- In general, it is sadly the case that most functions do not have a
simple closed form expression for their anti-derivative. Thus integration is
orders of magnitude harder than differentiation. One of the most famous functions that cannot be integrated in closed form is \(\exp(-x^2)\), which is related to
calculating areas under the normal (or bell or Gaussian) curve. We do at
least have good series expansions to approximate it; see the entry on the erf
(or error) function.
- The anti-derivative of \(\ln(x)\) is \(x \ln(x) - x\); it is a nice exercise to compute the anti-derivative of \((\ln(x))^n\) for any integer \(n\). For example, if \(n=4\) we get \(24 x - 24 x \ln(x) + 12 x (\ln x)^2 - 4 x (\ln x)^3 + x (\ln x)^4\).
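- The pattern comes from integration by parts: \[ \int (\ln x)^n\,dx = x(\ln x)^n - n\int (\ln x)^{n-1}\,dx, \] and iterating down from \(n = 4\) to the base case \(\int \ln x\,dx = x\ln x - x\) produces exactly the alternating sum above.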
-
Another good distribution to study for sabermetrics would be a Beta
Distribution. We've seen an example already this semester when we looked
at the Laffer
curve from economics. I would
like to try to modify the Weibull analysis from today's lecture to Beta
distributions. The resulting integrals are harder -- if you're interested
please let me know.
-
Today we discussed modeling, in particular, the interplay between finding a
model that captures the key features and one that is mathematically
tractable. While we used a problem from baseball as an example, the general
situation is frequently quite similar. Often one makes simplifying
assumptions in a model that we know are wrong, but lead to doable math (for
us, it was using continuous probability distributions in general, and in
particular the three
parameter Weibull). For more on these and related models, my
baseball paper is available here; another interesting read might be my
marketing paper for the movie industry (which
is a nice mix of modeling and linear programming, which is the linear
algebra generalization of Lagrange multipliers).
- One of the most important applications of finding areas under curves
is in probability, where we may interpret these areas as the probability
that certain events happen. Key concepts are:
- The more distributions you know, the better chance you have of finding
one that models your system of interest. Weibulls are frequently used in
survival analysis. The exponential
distribution occurs in
waiting times in lines as well as in the spacings between prime numbers.
- In seeing whether or not data supports a theoretical contention, one
needs a way to check and see how good of a fit we have. Chi-square
tests are one of many
methods.
- Much of the theory of probability was derived from people interested
in games of chance and gambling. Remember that when the house sets the
odds, the goal is to try and get half the money bet on one team and half
the money on the other. Not surprisingly, certain organizations are very
interested in these computations. Click
here for some of the details on the Bulger case (the
bookie I mentioned in class is Chico Krantz, and is referenced briefly).
- Any lecture on multivariable calculus and probabilities would be
remiss if it did not mention how unlikely it is to be able to derive
closed form expressions; this is why we will study Monte
Carlo integration later.
For example, the normal
distribution is one of the
most important in probability, but there is no nice anti-derivative. We
must resort to series expansions; that expansion is so important it is
given a name: the
error function.
- I strongly urge you to read the pages where we evaluate the integrals
in closed form. The methods to get these closed form expressions occur
frequently in applications. I particularly love seeing relations such as
\(1/c = 1/a + 1/b\); you may have seen this in resistors
in parallel or perhaps the reduced
mass from the two
body problem (masses under
gravity). Extra credit to anyone who can give me another example of
quantities with a relation such as this.
- Click here for a clip of Plinko on The Price Is Right, or here for a showcase showdown.
- We discussed how websites like ESPN and MLB have a very limited space to
display information, especially if it's for a smart phone. Thus one cannot
show every statistic, and we have to pick and choose which ones are worth
showing. In one section I made a joke about including the team names, but
this is actually a serious comment! The MBTA (or
MTA for us old folk!) is having a contest on how to redesign their
subway map of Boston. Below are links to an interesting article on the
subject and the maps.
-
Twentieth day lecture:
http://youtu.be/gFDly_6qOn4 (April 18, 2014:
Sabermetrics and Multivariable Calculus - Lecture from 2013)
Wednesday,
April 18.
-
The Comparison
Test is one of the most
important ways we have to tell if a series converges or diverges, but it is
one of the hardest to use. It is only as good as our list of comparable
series. Frequently one must do some algebra to manipulate the expressions. In
particular, one needs to know how rapidly certain functions grow. We showed
polynomials grow slower than exponentials, and logarithms grow slower than
polynomials. One important application of results like these is in algorithm
analysis in computer science, where we try to determine how fast an algorithm
runs. Measuring which algorithm is best is not easy; do we care about how fast
it is on the worst input, or on the average speed? A common problem is to sort
n elements in a list. Different ways are QuickSort, BubbleSort and MergeSort.
There are other ways -- can you think of one?
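- The ordering "logarithms, then polynomials, then exponentials" is easy to see numerically, as is how long "eventually" can take (Python; the exponents are arbitrary):

    import math

    # 2^n eventually beats n^100, but not until n is quite large.
    for n in (10, 100, 1000):
        print(n, math.log(n), n**2, 2**n > n**100)   # False, False, True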
- You can use L'Hopital's rule to compare growth rates of functions; we'll discuss the proof later in the semester, but for now see the article here on it.
Larry Bird / Michael Jordan commercial:
https://www.youtube.com/watch?v=_oACRt-Qp-s
The periodic table (http://www.chemicool.com/images/periodic-table.png),
a great example of a sequence!
Read more here.
The fine
structure constant is one of the best examples of math/physics where we
have approximations. Another great example is the 'classical
limit' (when Planck's constant is sent to zero).
One of my favorite applications of open
and closed sets is Furstenberg's
proof of the infinitude of primes; one night while a postdoc at Ohio
State I had drinks with Hillel
Furstenberg and one of his
students, Vitaly
Bergelson. This is considered by many to be one of the best proofs of
the infinitude of primes; it is so good it is one of six proofs given in THE
Book. Unlike most proofs of the infinitude of primes, this gives no
bounds on how many primes there are up to x; even Euclid's proof (if there are only finitely many primes, say \(p_1, \dots, p_n\), then consider \((p_1 \cdots p_n)+1\); either this is a new prime or it is divisible by a prime not in our list, since \((p_1 \cdots p_n)+1\) leaves remainder 1 when divided by each prime in our list) gives a lower bound, namely log log x (the true answer is that there are about x / log x primes at most x). As a nice exercise (for fun), prove this fact. This leads to an interesting
sequence: 2,
3, 7, 43, 13, 53, 5, 6221671, 38709183810571, 139, 2801, 11, 17, 5471,
52662739, 23003, 30693651606209, 37, 1741, 1313797957, 887, 71, 7127, 109,
23, 97, 159227, 643679794963466223081509857, 103, 1079990819, 9539,
3143065813, 29, 3847, 89, 19, 577, 223, 139703, 457, 9649, 61, 4357....
This sequence is generated as follows. Let a_1 = 2, the first prime. We
apply Euclid's argument and consider 2+1; this is the prime 3 so we set
a_2 = 3. We apply Euclid's argument and now have 2*3+1 = 7, which is
prime, and set a_3 = 7. We apply Euclid's argument again and have 2*3*7+1
= 43, which is prime and set a_4 = 43. Now things get interesting: we
apply Euclid's argument and obtain 2*3*7*43 + 1 = 1807 = 13*139, and set
a_5 = 13. Thus a_n is the smallest prime not on our list generated by
Euclid's argument at the nth stage. There are a plethora of (I believe)
unknown questions about this sequence, the biggest of course being whether
or not it contains every prime. This is a great sequence to think about,
but it is a computational nightmare to enumerate! I downloaded these terms
from the Online Encyclopedia of Integer Sequences (homepage is http://oeis.org/
and the page for our sequence is http://oeis.org/A000945
). You can enter the first few terms of an integer sequence, and it
will list whatever sequences it knows that start this way, provide
history, generating functions, connections to parts of mathematics, ....
This is a GREAT website to know if you want to continue in mathematics.
There have been several times I've computed the first few terms of a
problem, looked up what the future terms could be (and thus had a formula
to start the induction).
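Here is a sketch generating the sequence in Python (sympy does the factoring; the terms very quickly become infeasible to factor, which is why we stop early):

    from sympy import primefactors

    # a_n is the smallest prime factor of a_1 * ... * a_{n-1} + 1, starting from a_1 = 2.
    terms, product = [], 1
    for _ in range(9):
        p = primefactors(product + 1)[0]   # primefactors returns a sorted list
        terms.append(p)
        product *= p
    print(terms)   # [2, 3, 7, 43, 13, 53, 5, 6221671, 38709183810571]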
Twenty-fourth
day lecture:
http://youtu.be/6bf9fjwMs2o (May
2, 2014: Comparison Test, Implications of Limits of Terms)
Monday,
April 16. We continued our exploration of sequences and series
(we had seen many of these concepts earlier in the semester).
-
Instead of the standard examples, it's fun to explore some of the more
exotic possibilities:
- The 3x+1 problem (I
have a paper
on the 3x+1 problem, concerning the distribution of leading digits of
the iterates and applications to fighting tax fraud; this is related to Benford's
law, and is a very important area of research that is accessible to
undergraduates).
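- For those who have not seen it, the iteration is simple to state (and to code; whether it always reaches 1 is the open question):

    def collatz_steps(n):
        """Iterate the 3x+1 map (halve if even, else 3n+1) until reaching 1."""
        steps = 0
        while n != 1:
            n = n // 2 if n % 2 == 0 else 3 * n + 1
            steps += 1
        return steps

    print(collatz_steps(27))   # 111 steps, despite the tiny starting value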
-
We talked about sequences and series. We've seen many examples in previous
classes, one of the most important being the upper and lower sums leading
to a proof of the Fundamental Theorem of Calculus.
The 196 algorithm and Lychrel numbers: http://en.wikipedia.org/wiki/Lychrel_number
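The 196 algorithm is a one-liner to experiment with (Python; the iteration cap is arbitrary, and no one has proved 196 never reaches a palindrome):

    def palindrome_steps(n, max_iters=1000):
        """Repeatedly add a number to its digit reversal; return the step count
        when a palindrome appears, or None if we give up."""
        for i in range(max_iters):
            s = str(n)
            if s == s[::-1]:
                return i
            n += int(s[::-1])
        return None

    print(palindrome_steps(87))    # 4: reaches the palindrome 4884 quickly
    print(palindrome_steps(196))   # None, even after many iterations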
Standard examples of sequences and series include
- An
infinite series of surprises: a nice article going from the geometric
series to the harmonic series to other important examples.
- We mentioned that sequences and series are very important; two of the
most powerful applications are Taylor
series (approximating
complicated functions with simpler ones) and Riemann
sums (allowing us to
calculate areas with integrals).
- L'Hopital's
rule is frequently used to
analyze the behavior of sequences. Remember that you can only use it if
you have 0 over 0 or infinity over infinity.
The proof we gave today of the geometric series formula (by shooting baskets)
uses many great techniques in mathematics. It is thus well worth it to study
and ponder the proof.
- Memoryless
process: once both people miss, it is as if we've just started the game
fresh.
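- In symbols (assuming the version of the game where the players alternate shots, the first basket wins, and the shots go in with probabilities \(p\) and \(q\)), the memoryless property says the probability \(x\) that the first shooter wins satisfies \[ x = p + (1-p)(1-q)\,x \quad\Longrightarrow\quad x = \frac{p}{1-(1-p)(1-q)} = p\sum_{n=0}^{\infty}\big((1-p)(1-q)\big)^n, \] with the first equality encoding the fresh restart after a pair of misses and the last being the shot-by-shot geometric series.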
- Calculating something two different ways: a good part of combinatorics
is to note that there are two ways to compute something, one of which is
easy and one of which is not. We then use our knowledge of the easy
calculation to deduce the hard. For example, \(\sum_{k=0}^n \left({n \atop
k}\right)^2 = \left({2n \atop n}\right)\); the right side is easy to
compute, the left side not so clear. Why are the two equal? It involves
finding a story, which we leave to the reader.
- For another example of applications of harmonic numbers, see the coupon collector problem (if you want more info on this problem, let me know -- I have lots of notes on it from teaching it in probability).
- In Section 2 a few years ago, a basketball shot basically went in and
out; see
the following article on golf
for some info on related problems in golf.
Twenty-third
day lecture: http://youtu.be/aigdKmu-5ow (April
28, 2014: Geometric and Harmonic Series, Memoryless Processes)
-
Friday, April 13: We finished the Big Three coordinate changes: polar coordinates, cylindrical coordinates and spherical coordinates (be aware that physicists and mathematicians have different definitions of the angles in spherical!).
- One can generalize spherical coordinates to hyperspheres
in n-dimensional space. These lead to wonderful applications of
special functions, such as the Gamma
function, in writing down formulas for the `areas' and `volumes'. As a
nice exercise, you can rewrite the integral in the comment above as
\(\Gamma(1/2)\).
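- For the exercise, recall \(\Gamma(s) = \int_0^\infty x^{s-1}e^{-x}\,dx\); substituting \(x = u^2\) gives \[ \Gamma(1/2) = \int_0^\infty x^{-1/2}e^{-x}\,dx = 2\int_0^\infty e^{-u^2}\,du = \sqrt{\pi}, \] tying the Gamma function directly to the Gaussian integral.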
- There are many fascinating questions involving spheres (with applications to error correcting codes!):
- One of the most important applications of spherical coordinates is to
planetary motion, specifically, proving that the force one sphere exerts
on another is equivalent to all of the mass being located at the center of
the sphere. This is the most
important integral in Newton's
great work, Principia (we
have a first edition at the library here). I strongly urge everyone to
look at this problem. Proving that one can take all of the mass to be at
the center enormously simplifies the calculations of planetary motion. See
the Wikipedia article on the Shell
Theorem for the computation. As this is so important, here
is another link to a proof. Oh, let's
do another proof here as
well as another
proof here. For an example of a non-proof, read
the following and the comments.
-
Nineteenth
day lecture: http://youtu.be/3Pt4E1BeUTw
(April 16, 2014: Cylindrical and Spherical Coordinates, Newton's Shell
Theorem)
- Wednesday, April 11: Special class at WCMA
-
Monday, April 9. (Same notes as for Friday, April 13 above: the Big Three coordinate changes and Newton's Shell Theorem.)
-
Friday, April 6. Monte Carlo Integration (the paper that introduced the method is called by many the most important mathematical paper of the 20th century). Sadly, most integrals cannot be evaluated in closed form, and we must resort to approximation methods. Remember, the Fundamental Theorem of Calculus is useless for finding areas if we don't know the anti-derivative.
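- A minimal Monte Carlo sketch in Python (the integrand, interval and sample size are just for illustration): average the integrand at random points and the law of large numbers does the rest, with error shrinking like \(1/\sqrt{N}\).

    import math, random

    N = 10**6
    estimate = sum(math.exp(-random.random()**2) for _ in range(N)) / N
    print(estimate)   # ~0.7468, the integral of exp(-x^2) from 0 to 1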
-
Here are some additional readings on the subject
-
In general, it is sadly the case that most functions do not have a simple closed form expression for their anti-derivative. Thus integration is orders of magnitude harder than differentiation. One of the most famous functions that cannot be integrated in closed form is \(\exp(-x^2)\), which is related to calculating areas under the normal (or bell or Gaussian) curve. We do at least have good series expansions to approximate it; see the entry on the erf (or error) function.
- The anti-derivative of \(\ln(x)\) is \(x \ln(x) - x\); it is a nice
exercise to compute the anti-derivative for \((\ln(x))^n\) for any integer
\(n\). For example, if \(n=4\) we get \(24 x - 24 x \ln(x) + 12 x (\ln x)^2
- 4 x (\ln x)^3 + x (\ln x)^4\).
- Today we had a brief introduction to probability; one of the most
important applications of integration is to determining probabilities, which
frequently are areas under curves.
- Probably my favorite example (and one of the most important!) of using
polar coordinates to evaluate an integral is to find the value of the Gaussian
integral \(\int_{-\infty}^\infty
\exp(-x^2)dx\). Of course, it seems absurd to use polar coordinates for this
as we are in one-dimension! Our book has a good discussion of this problem,
as does the wikipedia
page. This is one of the most important integrals in the world, and
leads to the normalization constant for the normal
distribution (also known as
the bell curve or the Gaussian distribution), which may be interpreted as
saying the factorial of -1/2 is \(\sqrt{\pi}\)!
(The exclamation here is for emphasis, not for factorial.)
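- The famous computation: letting \(I = \int_{-\infty}^\infty \exp(-x^2)\,dx\), square it and pass to polar coordinates: \[ I^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)}\,dx\,dy = \int_0^{2\pi}\int_0^{\infty} e^{-r^2}\,r\,dr\,d\theta = 2\pi\cdot\frac{1}{2} = \pi, \] so \(I = \sqrt{\pi}\); the factor of \(r\) from the change of variables is exactly what makes the integral doable.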
- Eighteenth
day lecture: http://youtu.be/Nz7ahXOMTus
(April 14, 2014: Monte Carlo Integration, Change of Variables for Ellipses)
- Wednesday, April 4.
- Let's
say we want to compute \(\lim_{h\to 0} \frac{\sin(h)}{h}\); this is the most
important trig limit. We use L'Hopital's rule and note that it is the same as
\(\lim_{h \to 0} \cos(h) / 1\); as \(\cos(h)\) tends to 1, the limit is just
1. Why is this argument not valid? The answer is one of the most important
principles in mathematics. We need to know this limit is 1 in order to prove that the derivative of \(\sin(x)\) is \(\cos(x)\); it's sadly easy to argue in circles.
We saw how this limit is related to finding the derivatives of trig
functions, as well as the area and perimeter of a circle.
-
Fubini's Theorem (changing the order of integration) is one of the most important observations in multivariable calculus. For us, we assume our function \(f(x,y)\) is either continuous or bounded, and that it is defined on a simple region \(D\) contained in a finite rectangle. If \(D\) is an unbounded region, say \(D = \{(x,y): x, y \ge 0\}\), then Fubini's theorem can fail for continuous, bounded functions. In class we did an example involving a double sum, where \(a_{0,0} = 1\), \(a_{0,1} = -1\), \(a_{0,n} = 0\) for all \(n \ge 2\), then \(a_{1,0} = 0\), \(a_{1,1} = 1\), \(a_{1,2} = -1\), and then \(a_{1,n} = 0\) for all \(n \ge 3\), and so on. If we want to have a continuous function, we can tweak it as follows. Consider the indices \((m,n)\). Draw a circle of radius 1/2 with center \((m,n)\) (note no two points will have circles that intersect or overlap). If \(a_{m,n}\) is positive, draw a cone with base a circle of radius 1/2 centered at \((m,n)\) and height \(12/\pi\). As the volume of a cone is (1/3)(area of base)(height), this cone will have volume 1; if \(a_{m,n}\) is negative we draw a similar cone but instead of going up we go down, so now the volume is -1. What is going wrong? The problem is that \(\sum_m \sum_n |a_{m,n}| = \infty\) (the sum of the absolute values diverges), and when infinities enter strange things can occur. Recall we are not allowed to talk about \(\infty - \infty\); the contribution from where our function or sequence is positive is \(+\infty\), the contribution where it is negative is \(-\infty\), and we are not allowed to subtract infinities.
- To motivate the Change
of Variable Formula, which we'll see later, try to find the area of a
circle by doing the integration directly. While there are many ways to
justify learning the Change of Variable Formula (it's one of the key tools
in probability), I want to take the path of looking at what should be
a simple integral and seeing how hard it can be to evaluate in the given
coordinate system. Much of modern physics is related to changing coordinate
systems to where the problem is simpler to study (see the Lagrangian or Hamiltonian
formulations of physics);
these are equivalent to F = ma, but lead to much simpler algebra. The
problem we considered was using one-variable calculus to find the area under
a circle. This requires us to integrate \(\sqrt{1 - x^2}\) from x=0 to
x=1. This is one of the most important shapes in mathematics -- if calculus
is such a great and important subject, it should be able to handle this!
- To attack this problem, recall a powerful technique from Calc I: if \(f(g(x)) = x\) (so \(f\) and \(g\) are inverse functions, such as \(f(x) = x^2\) and \(g(x) = \sqrt{x}\)), then \(g'(x) = 1 / f'(g(x))\); in other words, knowing the derivative of \(f\) we know the derivative of its inverse function. This was used in Calc I to pass from knowing the derivative of \(\exp(x)\) to the derivative of \(\ln(x)\). We can try the various inverse trig functions; while the derivatives of many of them are close to \(\sqrt{1-x^2}\), none of them is exactly that (a list of the derivatives of these is here). This highlights one of the
finding an anti-derivative does not mean we can actually find it! While
there is a
nice anti-derivative of \(\sqrt{1 - x^2}\), it is not a pure derivative of an
inverse trig function. There are many tables
of anti-derivatives (or integrals) (a
fun example on that page is the Sophomore's
Dream). Unfortunately it is not always apparent how to find these
anti-derivatives, though of course if you are given one you can check by
differentiating (though sometimes you have to do some non-trivial algebra to
see that they match). In fact, there are some tables of integrals of
important but hard functions where most practitioners have no idea how these
results are computed (and occasionally there are errors!). We will see later
how much simpler these problems become if we change variables; to me, this
is one of the most important lessons you can take from the course: MANY
PROBLEMS HAVE A NATURAL POINT OF VIEW WHERE THE ALGEBRA IS SIMPLER, AND IT
IS WORTH THE TIME TO TRY TO FIND THAT POINT OF VIEW!
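- For the record, the nice anti-derivative alluded to above is \[ \int \sqrt{1-x^2}\,dx = \frac{x\sqrt{1-x^2} + \arcsin(x)}{2} + C, \] part algebraic and part inverse trig; differentiating it back to \(\sqrt{1-x^2}\) is a good exercise in the non-trivial algebra just mentioned.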
- For another example of changing your
viewpoint, think of trying to write down an ellipse aligned with the
coordinate axes, and one rotated at an angle. Linear algebra provides a nice
framework for doing these coordinate transformations, changing hard problems
to simpler ones already understood.
- Frequently we are confronted with the need to find the integral of a
function that we have never seen. One approach is to consult a table of
integrals (here
is one at wikipedia; see also the
table here). Times have changed from when I was in college. Gone are the
days of carrying around these tables; you can access Mathematica's
Integrator on-line, and it
will evaluate many of these. One caveat: sometimes these integrals are
doable but do not appear in the table in the form you have, and some work is
required to show that they equal what is tabulated.
- A good
example, of course, is just computing the area of a circle! In Cartesian
coordinates we quickly see we need the anti-derivative of \(\sqrt{1 - x^2}\),
which involves inverse trigonometric functions; it is very straightforward
in polar! In fact, we can easily get the volume of a sphere by integrating
the function \(\sqrt{1 - x^2 - y^2}\) over the unit disk!
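- In polar coordinates the area computation is one line: \[ \int_0^{2\pi}\int_0^1 r\,dr\,d\theta = 2\pi\cdot\frac{1}{2} = \pi. \]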
- Famous tables are Abramowitz
and Stegun and Gradshteyn
and Ryzhik.
- For those interested in some of the history of special functions and
integrals, see
the nice article here by Stephen Wolfram. There's a lot of nice bits
in this article.
- One of my favorites is the throw-away comment in the beginning on
how the Babylonians reduced multiplication to squaring. Here's the full
story. The Babylonians worked base 60; if you think memorizing our
multiplication table is bad, consider their problem: 3600 items! Of
course, you lose almost 1800 as xy = yx, but still, that's a lot of
tablets to lug. To compute xy, the Babylonians noted that xy = ((x+y)^2
- x^2 - y^2) / 2, which reduces the problem to just squaring,
subtracting and division by 2. There are more steps, but they are easier
steps, and now we essentially just need a table of squares. This concept
is still with us today: it's the idea of a look-up
table, computing new values (or close approximations) from a small
list. The idea is that it is very fast for computers to look things up
and interpolate, and time consuming to compute from scratch.
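- The idea fits in a few lines of Python (the table size and sample inputs are arbitrary):

    # The Babylonian reduction: x*y = ((x+y)^2 - x^2 - y^2) / 2, so a table of
    # squares (the 'tablets') turns multiplication into lookups and subtraction.
    squares = {k: k * k for k in range(200)}

    def babylonian_multiply(x, y):
        return (squares[x + y] - squares[x] - squares[y]) // 2

    print(babylonian_multiply(37, 59), 37 * 59)   # both 2183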
- We started the semester by giving a dimensional analysis proof of the
Pythagorean formula. We returned to this mindset today when we talked about
the polar change of variables, going from \(dx dy\) to \(r dr d\theta\),
where the latter should be viewed as \(dr \times r d\theta\). We need to
have an area, and angles are unitless; however, if we multiply the angle by
a length (like \(r\)) then we do get an area.
-
Seventeenth
day lecture: http://youtu.be/JP78q_ri-4o (April 11, 2014: Polar Change of Variables, Circles and Spheres)
- Wednesday,
March 14.
The main result today was a method for integrating over regions other than a
rectangle. We discussed a theoretical way to do it last class by replacing our
initial function \(f\) on a rectangle including \(D\) with a new function
\(f^\ast\), with \(f^\ast(x,y) = f(x,y)\) if \((x,y)\) is in our domain \(D\)
and 0 otherwise. To make this rigorous we need to argue and show that we may
cover any curve with a union of rectangles with arbitrarily small area. This
leads to some natural, interesting questions.
- The first, and most important, involves what happens to a function when
we force it to be 0 from some point onward (say outside \(D\)). The function may be
discontinuous at the boundary, but then again it may not. There are many
interesting and important examples from mathematical physics where we are
attempting to solve some equation that governs how that system evolves. One
of the most studied is the vibrating drum, where the drumhead is
connected and stationary. We can thus view the vibrating drumhead as giving
the values of our function on some region \(D\), with value 0 along the
boundary. This leads to the fascinating
question of whether or not you can hear the shape of a drum. This means
that if you hear all the different harmonics of the drum, does that uniquely
determine a shape? Sadly, the answer is no -- different drums can have the
same sounds. An
excellent article on this is due to Kac, and can be read here.
- We discussed horizontally-simple and vertically-simple and simple
regions (other books use the words y-simple, x-simple and simple regions).
Note that a region is often called elementary if it is either horizontally
or vertically simple. (Click
here for some more examples on simple regions.) The point of our
analysis here is to avoid having to go back to the definition of the
integral (i.e., the Riemann sum). While not every region is elementary, many
are either elementary or the union of elementary regions. Below are two
interesting tidbits about how strange things can be:
- Space filling curves: click here for just how strange a curve can be!
-
Koch snowflake: This is an example of a fractal set; the boundary has dimension greater than 1 but less than 2! Its fractal dimension is \(\log 4 / \log 3\), and continues our discussion of fractals.
- Jordan curve theorem: It turns out to be surprisingly difficult to prove that every non-intersecting closed curve in the plane divides the plane into an inside and an outside region. It's not too bad for polygons, but for more general curves (such as the non-differentiable boundary of the Koch snowflake), it's harder.
- The video of the week was coin
sorting (this leads to Lebesgue's
Measure Theory). There are many reasons leading to this as the
selection. One is that the Lebesgue theory is needed in a lot of higher
mathematics, and if you continue you'll eventually meet it. The other, and
more important for us, is that this demonstrates the power of a fresh
perspective. This happens again and again in mathematics (and life). We have
blinders on and don't even realize they're there. We get so used to doing
things a certain way it becomes heretical to think of doing it another way.
(This allows me to link to Asimov's Nightfall story,
considered by many the greatest sci-fi short story; it creates a society
which must confront the inconceivable for them -- obviously anything a
writer writes must be conceivable to the writer, so this is something
conceivable to us but not to them.) It's natural to divide the x-axis and
add the areas as we go along; however, it is useful to consider dividing it
along the y-axis as well.
- If you know combinatorics, here's a nice example illustrating the
above point. Evaluate \(\sum_{k = 0}^{n} \left({n \atop k}\right) \left({n
\atop n-k}\right)\), where \(\left({x \atop y}\right) = x! / (y!
(x-y)!)\). The answer is \(\left({2n \atop n}\right)\). There are a lot of
ways to view this, here is my favorite. Imagine we have \(2n\) people,
\(n\) who prefer Star Trek: The Original Series and n who prefer Star
Trek: The Next Generation. There are \(\left({2n \atop n}\right)\) ways to
choose \(n\) people from the \(2n\) people. We can view this another way:
let's look at how many groups we can form of \(n\) people where exactly
\(k\) prefer the original series. There are \(\left({n \atop k}\right)\)
ways to choose \(k\) people from the \(n\) who prefer the original series,
and then \(\left({n \atop n-k}\right)\) ways to choose \(n-k\) from the
\(n\) who prefer the new series. The total number of ways with exactly
\(k\) who prefer the original is the product: \(\left({n \atop k}\right) \cdot
\left({n \atop n-k}\right)\). We then sum over \(k\); as any group of
\(n\) people must have SOME number who prefer the original series, this
sum is just the number of ways to choose \(n\) people from \(2n\), or
\(\left({2n \atop n}\right)\). Telling a story and changing our
perspective really helps!
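- The identity is easy to spot-check in Python (any small \(n\) will do):

    from math import comb

    n = 10
    lhs = sum(comb(n, k) * comb(n, n - k) for k in range(n + 1))
    print(lhs, comb(2 * n, n))   # both 184756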
- I'm a big fan of the song "I'm my own grandpa". What lessons does it
have for us in Calc III and beyond? It's all about how you take information
embedded in a problem and extract it. If you haven't tried diagramming it, I
urge you to do so. I'm including links to the song, as well as a link to the
family tree diagram.
-
Sixteenth
day lecture: http://youtu.be/G9d9lcYevnM (April 9, 2014: Iterated integrals, changing order)
- Monday,
March 12.
In one dimension, there is not much choice in how we integrate; however, if we
are trying to integrate a function of several variables over a rectangle (or
other such regions), not surprisingly the situation is markedly different.
Similar to the freedom we have with limits in several variables (where we have
to consider all possible paths), there are many ways to integrate. Imagine we
have a function of two variables and we want to integrate it over the
rectangle \([a, b] \times [c, d]\), with \(x\) in \([a, b]\) and \(y\) in
\([c, d]\). One possibility is we can fix \(x\) and let \(y\) vary, computing the
integral over y for the fixed \(x\), and then let \(x\) vary, computing the
integral over \(x\). Of course, we could also do it the other way. As we are
integrating the same function over the same region (just in a different
order), we hope that the answers are the same! So long as everything is nice,
this is the case. There are many formulations as to exactly what is needed to
make the situation nice; if our function is continuous and bounded and we are
integrating over a finite rectangle, then we can interchange the order of
integration without changing the answer. This is called Fubini's
theorem,
and is one of the most important results in integration theory in several
variables. There really isn't an analogue in one dimension, as there we have
no choice in how to integrate!
- Whenever you are given a theorem, it is worthwhile to remove a condition
and see if it is still true. Typically the answer is no (or if it is still
true, the proof is frequently much harder). There are many functions and
regions where the order of integration matters. The simplest example is
looking at double sums rather than double integrals, though with a little
work we can convert this example to a double integral. We give a sequence \(a_{m,n}\) such that \(\sum_{m = 0}^{\infty} \sum_{n = 0}^{\infty} a_{m,n}\) is not equal to \(\sum_{n = 0}^{\infty} \sum_{m = 0}^{\infty} a_{m,n}\). For \(m, n \ge 0\) let \(a_{m,n} = 1\) if \(m = n\), \(-1\) if \(n=m+1\) and \(0\)
otherwise. Show that the two different orders of summation yield different
answers. The reason for this is that the sum of the absolute value of the
terms diverges.
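- A small Python sketch of the exercise; note that any finite square truncation gives the same answer in both orders (finite sums always commute), so the inner, infinite sums must be carried out in full, which we can do exactly here.

    # a_{m,n} = 1 if n = m, -1 if n = m+1, and 0 otherwise (m, n >= 0).
    def row_sum(m):
        return 1 - 1              # fixing m, the +1 at n=m cancels the -1 at n=m+1

    def col_sum(n):
        return 1 if n == 0 else 1 - 1   # column 0 contains only a_{0,0} = 1

    print(sum(row_sum(m) for m in range(10**6)))   # 0: rows first
    print(sum(col_sum(n) for n in range(10**6)))   # 1: columns first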
- Click here for
another example where we cannot interchange the order of integration; a
more involved example
is available here.
-
Click here for a video by Cameron on how he applies Fubini's theorem to
change the order of operations (he
does a double sum instead of a double integral, but the principle is the
same).
- It is important to know your integrals. There are many formulas, and
plenty of tables of integrals to help you. Two of the most popular are
available online, but should only be used when you have a truly pesky
integral: Abramowitz and Stegun and Gradshteyn
and Ryzhik. For everyday purposes, this
should suffice.
- Two of the most important 1-dimensional techniques are integration
by parts and u-substitution.
Another powerful technique is partial
fractions. At first this seems like the domain of sadistic professors,
but in reality it can be quite useful. I and one of my students needed to
use it this summer to attack a nice problem in combinatorics / number
theory. Zeckendorf proved that if you write the Fibonacci numbers with just
one 1, so 1, 2, 3, 5, 8, 13, ..., then every number can be written uniquely
as a sum of non-adjacent Fibonacci numbers. Lekkerkerker proved that as
\(x\) ranges from the \(n\)th to the (\(n+1\))st Fibonacci numbers then the
average number of summands needed is \(n/(\phi+2)\), where \(\phi\) is the
golden mean. My students and I proved the fluctuations about the mean are
normally distributed (and generalized this to other systems). One of the key
inputs was integration by partial fractions. If you're interested, let me
know. This project allowed me to use my research funds to buy a Cookie
Monster!
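- If you want to play with Zeckendorf decompositions yourself, here is a minimal sketch (the greedy algorithm, which is also how the uniqueness proof usually proceeds):

```python
# Greedily write n as a sum of non-adjacent Fibonacci numbers 1, 2, 3, 5, 8, ...
# (the convention above, with just one 1).
def zeckendorf(n):
    fibs = [1, 2]
    while fibs[-1] <= n:               # build Fibonacci numbers up past n
        fibs.append(fibs[-1] + fibs[-2])
    summands = []
    for f in reversed(fibs):           # always take the largest that fits
        if f <= n:
            summands.append(f)
            n -= f
    return summands

print(zeckendorf(100))  # [89, 8, 3]: 100 = 89 + 8 + 3, no two adjacent
```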
-
Fifteenth
day lecture: http://youtu.be/N8nFFWG_6J4 (April 7, 2014: Integration in two variables)
- Friday,
March 9.
Today we
proved the Fundamental
Theorem of Calculus. There are not that many fundamental theorems in
mathematics -- we do not use the term lightly! Other ones you may have seen
are the Fundamental
Theorem of Arithmetic and the Fundamental
Theorem of Algebra; click
here for a list of more fundamental theorems (including
the Fundamental
Theorem of Poker!). To simplify the proof, we made the additional
assumptions that our function was continuously differentiable and the
derivative was bounded. These assumptions can all be removed; it suffices for
the function to be continuous on a finite interval (in such a setting, a
continuous function is actually uniformly
continuous; informally, this means in the epsilon-delta
formulation of continuity that
delta is independent of the point; such a result is typically proved in an
analysis class). What I find particularly interesting about the proof is that
the actual value that bounds the function is irrelevant; all that matters is
that our function is bounded. Theoretical math constantly uses such tricks;
this is somewhat reminiscent of some of the Lagrange Multiplier problems,
where we needed to use the existence of lambda to solve the problem, but
frequently we never had to compute the value of lambda.
- The key ingredients in the proof are using the Mean
Value Theorem and observing
that we have a telescoping
sum. One has to be a little careful with telescoping sums with
infinitely many terms. The wikipedia article has some nice examples of
telescoping sums and warnings of the dangers if there are infinitely many
summands.
- This lecture was titled 'The one with the Mean Value Theorem' in
homage to the sitcom Friends, where
every (or almost every) episode title begins with `The one' (most
have `with' as the next word, but not all).
- The hardest, but perhaps most important, step in the proof of the
Fundamental Theorem of Calculus was taking a special sequence of points \(p_k\)
in \([x_k, x_{k+1}]\) and applying the Mean Value Theorem to replace \(f(p_k)
\frac{1}{n}\) with \(F(x_{k+1}) - F(x_k)\). It's natural to try something
like this. We need to get \(F\) into the proof somehow; if we apply the
MVT to \(F\) we get \(F'\) coming out; as \(F' = f\) there's a hope of
getting this to match some of the terms we have. It takes many years of
experience to `see' arguments like this, but a great goal is to try to
reach such a mastery. You're going to forget technical details and
results; what you want to remember is how to attack a problem. That's why,
for me, this was one of the most important moments of the lecture (and one
of the most important of the class).
- For additional reading on some of the background and related material,
see the following links. If you're interested in a math major, I strongly
urge you to read these.
(we'll get to the Taylor series part later in the
semester).
-
Proofs by Induction
(as well as other items, including the notes above!).
Whenever you are given a new theorem (such as the Fundamental Theorem of
Calculus), you should always check its predictions against some cases that
you can readily calculate without using the new machinery. For example, if we
want to find the area under \(f(x)\) from \(x=0\) to \(x=1\), obviously the
answer will depend on \(f\). If \(f\) is constant it is trivial (area of a
rectangle); if \(f\) is a linear relation then the answer is still readily
calculated (area of a triangle). For more general polynomials, one can
compute the Riemann sums (the upper
and lower sums) by Mathematical
Induction. For example, using induction one can show that \(\sum_{k=1}^n
k^2 = n(n+1)(2n+1)/6\), and this result can then be used to find the area
under the parabola \(y = x^2\). What do you think \(\sum_{k=1}^n k^3\) will
be? It can be shown that \(\sum_{k=1}^n k^m\) is a polynomial of degree
\(m+1\) in \(n\), with leading coefficient \(\frac{1}{m+1}\). This is quite
reasonable, as for \(n\) large if we look at \(\sum_{k=1}^n (k/n)^m \cdot
(1/n)\) this looks like \(\int_0^1 x^m dx\), and the anti-derivative of
\(x^m\) is \(x^{m+1}/(m+1)\). If you're interested in how you prove that
this is a polynomial, let me know. Here's one intriguing hint: if you
assume
it is a polynomial in \(n\) then if you evaluate it for enough values of
\(n\) you can interpolate and figure out the polynomial, and then feed that
into the mathematical induction!
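Here is a small sketch tying the pieces together: check the closed form for \(\sum_{k=1}^n k^2\) and watch the corresponding Riemann sums approach \(\int_0^1 x^2 dx = 1/3\).

```python
# (1/n) * sum_{k=1}^n (k/n)^2 = (sum of squares) / n^3 should tend to 1/3.
def sum_of_squares(n):
    return sum(k * k for k in range(1, n + 1))

for n in [10, 100, 1000]:
    closed = n * (n + 1) * (2 * n + 1) // 6
    assert sum_of_squares(n) == closed    # the induction formula checks out
    print(n, closed / n**3)               # 0.385, 0.33835, 0.33383...: tends to 1/3
```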
The integration covered through Calc III is known as Riemann
sums / Riemann integrals. In more advanced math classes you'll meet the
successor, Lebesgue
integrals. Informally, the difference between the two is as follows.
Imagine you have a large number of coins of varying denominations;
your job is to count the amount of money. Riemann sums work by breaking up
the domain of the function (counting the coins in the order they come); Lebesgue
integration works by breaking up the range (sorting the coins by denomination first).
(Extra Credit) For those looking for a challenge: Let
\(f\) satisfy the conditions of the Fundamental Theorem of Calculus. Let \(L(n)\)
denote the corresponding lower sum when we partition the interval \([0,1]\)
into \(n\) equal pieces, and similarly let \(U(n)\) denote the upper sum. We
know \(U(n) - L(n)\) tends to zero and \(L(n) \le\) True Area \(\le U(n)\);
as \(U(n) - L(n) \to 0\) as \(n \to \infty\), both \(U(n)\) and \(L(n)\)
tend to the true area. Must we have \(L(n) \le L(n+1)\), or is it possible
that \(L(n+1)\) might be less than \(L(n)\)?
Fourteenth
day lecture:
http://www.youtube.com/watch?v=Q1TQtH6POyI (March 19, 2014:
Fundamental Theorem of Calculus in a Day)
- Wednesday,
March 7.
Lagrange multipliers are
a terrific application of multivariable calculus. Frequently one needs to
optimize something, be it revenue in economics or stealthiness in a fighter.
Lagrange multipliers give us a way to find maxima / minima subject to
constraints, provided
we can solve the equations! We
first generalized the methods from one variable calculus on how to find maxima
and minima of functions. Recall that if f is a differentiable real-valued
function on an interval \([a,b]\), then the candidates for maxima / minima are
(1) the critical points, namely those \(x\) in \([a,b]\) where \(f'(x) = 0\),
and (2) the endpoints. How does this generalize to several variables? In
one-dimension the boundary of an interval is `boring'; it's just the two
endpoints, and thus it isn't that painful to have to check the value of the
function there as well as at the critical point. What about several variables?
The situation is quite different. For example, the interval \([-1,1]\) might
become the solid ball \(x^2 + y^2 + z^2 \le 1\); the interior is all points \((x,y,z)\)
such that \(x^2 + y^2 + z^2 < 1\), while the boundary is now the set of points
with \(x^2 + y^2 + z^2 = 1\). Unfortunately this leads to infinitely many
points to check; while we could afford to just check the endpoints by brute
force in one-dimension, that won't be possible now. The solution is the Method
of Lagrange Multipliers.
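- To make the method concrete, here is a minimal symbolic sketch (using sympy; the function and constraint are illustrative choices, not from the lecture): find the extrema of \(f(x,y) = x + y\) on the circle \(x^2 + y^2 = 1\) by solving \(\nabla f = \lambda \nabla g\) together with the constraint.

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = x + y
g = x**2 + y**2

# grad f = lambda * grad g, plus the constraint g = 1
eqs = [sp.diff(f, x) - lam * sp.diff(g, x),
       sp.diff(f, y) - lam * sp.diff(g, y),
       g - 1]
for sol in sp.solve(eqs, [x, y, lam], dict=True):
    print(sol, ' f =', f.subs(sol))  # max at (sqrt(2)/2, sqrt(2)/2), min at its negative
```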
- Two good links: An
introduction to Lagrange Multipliers and Lagrange
Multipliers.
- The Method of Lagrange Multipliers is one of the most frequently used
results in multivariable calculus. It arises in physics (Hamiltonians and
Lagrangian, Calculus of Variations), information theory, economics, linear
and non-linear programming, .... You name it, it's there. The two webpages
referenced above have several examples in these and other subjects; there
are of course many other sources and problems (click
here for a nice post on gasoline taxes, pollution and Lagrange multipliers).
For more on the economics impact, click
here, as well as see the following papers:
- The Method of Lagrange Multipliers ties together many of the concepts
we've studied this semester, as well as some from Calc I and Calc II
(vectors, directional derivatives and gradients, and level sets, to name a
few). The goal is to show you how the theoretical framework we developed can
be used to solve problems of interest. The military example we discussed is
just one of many possible applications. We were concerned with how to deploy
a fleet to minimize average deployment time to trouble spots (for more
information, see my
notes on the problem and the Mathematica
code); of course, instead of considering each place equally important we
could easily add weights. One consequence of war is that it does strongly
encourage efficiency and optimization; in fact, many optimization algorithms
and techniques were developed because of the problems encountered. The
subject of Operations Research took off during WWII; see the excellent
wikipedia article on Operations Research, especially the subsection
on the problems OR attempts to solve. Not surprisingly, there are also
numerous applications in business. Feel free to talk either to my wife (who
is a Professor of Marketing) or to me (I've written several papers with
marketing professors, applying such techniques to many companies, my
favorite being movie theaters). As mentioned, we can reinterpret our
problem as minimizing shipping costs from a central distributor to various
markets (where some markets may be more valuable than others, leading to a
weighted function).
- Walmart: if you don't like the military
applications in the above problem, think of Walmart and why they are well
known (and why Rainman evaluates K-mart as he does).
One of the most important takeaways of the deployment problem is that
the answer you get, as well as the difficulty of the math needed to arrive
at the answer, depends on how you choose to model the world. For us, it
depends on how we choose to measure 'distance'. My
notes on a deployment problem on the Earth's surface give
four different methods yielding three different solutions, all of which
differ from what you get if you use the 'correct' measure of distance. This
is an extremely common outcome -- your answer depends on how you choose to
model / measure! You need to be very aware
of this when you compare different people's answers to the same problem. For
a nice example of how the answer can depend on your point of view, consider
the riddle below (passed on by G. Mejia). What's the right answer? There are
at least two different right answers, depending on how you interpret things.
- The police rounded up Jim, Bud and Sam yesterday, because one of them
was suspected of having robbed the local bank. The three suspects made the
following statements under intensive
questioning (see below). If only one of these statements turns out to be
true, who robbed the bank?
- Jim: I'm innocent.
- Bud: I'm innocent.
- Sam: Bud is the guilty one.
In 2011 I gave an extra credit problem which has applications to
building an efficient computer for information retrieval (as opposed to
processing). For more on the problem of building an efficient computer in
terms of retrieval of information, see the
solution to the related extra credit problem from earlier in the 2011
iteration of the course. Note the problem is harder without the tools of
multivariable calculus. See
also the article by Hayes in the American Scientist, Third Base.
I've scanned in a chapter by Lanchester
on The Mathematics of Warfare; you can also view
it through GoogleBooks here. This article is from a four volume series,
The World of Mathematics. (I am fortunate enough to own two sets; one
originally belonged to a great uncle of mine, another to a
grandfather-in-law of my wife). I've written some Mathematica code to
analyze the Battle
of Trafalgar, which is described in the Lanchester article; the
Mathematica code is here (though
it might not make sense without comments from me). (The file name is boring
because, during
the 200th anniversary re-enactment, in order to avoid hurting anyone's
feelings they refused to call the two sides 'English' and 'French/Spanish').
This is a terrific problem to illustrate applying mathematics to the real
world. One has a very complicated situation, and you must decide what are
the key features. The more features you include the better your model will
be, but the less likely you'll be able to solve it! It's a bit of an art
figuring out exactly how much to include to capture what truly matters and
still be able to solve your model. We'll discuss this in greater detail when
we do the Pythagorean
Won-Loss theorem from baseball, which is a nice application of
probability and multiple integrations.
Finally, a common theme that surfaces as we do more and more
mathematical modeling is that simple models very quickly lead to very hard
equations to solve. The drowning swimmer problem is actually the same as
Snell's law, for how
light travels / bends in going from one medium to another. If you write
down the equations for the drowning swimmer, you quickly find a quartic to
solve. For interesting articles related to this, see the two papers below by
Pennings on whether or not dogs know calculus. Click
here for a picture of his
dog, Elvis, who does know
calculus.
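A small numerical sketch of the drowning swimmer problem (with made-up speeds and positions): minimize the total time and check that Snell's law holds at the optimal crossing point. Setting the derivative to zero by hand is what produces the quartic.

```python
import math
from scipy.optimize import minimize_scalar

v1, v2 = 8.0, 2.0             # running and swimming speeds (assumed values)
h1, h2, d = 30.0, 10.0, 40.0  # lifeguard at (0, h1), swimmer at (d, -h2), waterline y = 0

def total_time(x):            # cross the waterline at the point (x, 0)
    return math.hypot(x, h1) / v1 + math.hypot(d - x, h2) / v2

x0 = minimize_scalar(total_time, bounds=(0.0, d), method='bounded').x

# Snell's law: sin(theta1)/v1 = sin(theta2)/v2 at the optimal crossing
sin1 = x0 / math.hypot(x0, h1)
sin2 = (d - x0) / math.hypot(d - x0, h2)
print(x0, sin1 / v1, sin2 / v2)  # the last two numbers agree
```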
General comment: it's important to be able to take complex information
and sift to the relevant bits. A great example is the song I'm
my own Grandpa. Listen to it and try to graph all the relations and see
that he really is his own grandfather (with no incest!). A solution is here (don't
view this until you try to graph it!). Actually,
this is a MUCH better illustration of the relationships.
I gave a heuristic on why the correct generalization of the Method of
Lagrange Multipliers to several constraints \(g_1(\overrightarrow{x}) = c_1,
\dots, g_L(\overrightarrow{x}) = c_L\) is to require \((\nabla f)(\overrightarrow{x})
= \lambda_1 (\nabla g_1)(\overrightarrow{x}) + \cdots + \lambda_L (\nabla
g_L)(\overrightarrow{x})\).
The idea was to consider the constraint \(g(\overrightarrow{x}) = a_1 (g_1(\overrightarrow{x})
- c_1)^2 + \cdots + a_L (g_L(\overrightarrow{x}) - c_L)^2\), and notice that
the gradient of \(g\) is just \(2a_1 (g_1(\overrightarrow{x}) - c_1) (\nabla
g_1)(\overrightarrow{x}) + \cdots + 2a_L (g_L(\overrightarrow{x}) - c_L) (\nabla
g_L)(\overrightarrow{x})\), and then say we can vary the \(a_i\)'s so that
\(\lambda_1 = 2a_1 (g_1(\overrightarrow{x}) - c_1)\)
and so on. There are some issues with this, as the differences between the
\(g\)'s and the constants are zero and we need the \(a\)'s to be positive in
order for \(g(\overrightarrow{x}) = 0\) to force each \(g_i(\overrightarrow{x})
= c_i\), but it gives some flavor. One possibility is to keep the \(a\)'s
as positive and replace \(g(\overrightarrow{x}) = 0\) with \(g(\overrightarrow{x})
= c\) for a
very
small c, for example, \(1/10^{100000!}!\). Then the various constraints are
almost satisfied.... Again, this is meant to give you a rough flavor and not
be a full proof; a full proof uses linear algebra!
(http://www.youtube.com/watch?v=mzAfTmC3It0)
It is possible to get so caught up in reductions and compactifications that
the resulting equation hides all meaning. A
terrific example is the great physicist Richard Feynman's reduction of all of
physics to one equation, U = 0, where U represents the unworldliness of the
universe.
Suffice it to say, reducing all of physics to this one equation does not
make it easier to solve physics problems / understand physics (though, of
course, sometimes good notation does assist us in looking at things the
right way).
Thirteenth
day lecture: http://youtu.be/pgwC2vOwRuE
(March 17, 2014: Lagrange Multipliers)
- Monday,
March 5.
We discussed directional
derivatives. It is natural that we develop such a concept, as up until now
we have only considered derivatives in directions parallel to the various
coordinate axes. A central theme of multivariable calculus is the need to be
able to approach a point along any path, and that in several dimensions
numerous paths are available (unlike the 1-dimensional case, where essentially
we just have two paths). Directional derivatives will play a key role in
optimization problems.
-
One of the requests in Spring 2010 was to talk about applications of
multivariable calculus to molecular gastronomy. After some web browsing, I
eventually became interested in how bees communicate amongst themselves as
to where food is. There appear to be two schools; one is the waggle
dance / language school, and the other is the odor
plume theory. In addition to controversies on how bees learn, there are
lots of nice applications to gradients and (I believe) directional
derivatives. The goal is to convey information about a specific path through
a very complex space.
- See also the paper: Odor
landscapes and animal behavior: tracking odor plumes in different physical
worlds (Paul Moore, John
Crimaldi). Abstract: The acquisition of information from sensory systems
is critical in mediating many ecological interactions. Chemosensory
signals are predominantly used as sources of information about habitats
and other organisms in aquatic environments. The movement and distribution
of chemical signals within an environment is heavily dependent upon the
physics that dominate at different size scales. In this paper, we review
the physical constraints on the dispersion of chemical signals and show
how those constraints are size-dependent phenomenon. In addition, we
review some of the morphological and behavioral adaptations that aquatic
animals possess which allow them to effectively extract ecological
information from chemical signals.
- Today was a dividends lecture. The concept of directional derivative
tied together many items we saw in Chapter 11, including the notion of a
line, of a tangent plane, the formula for the dot product in terms of the
lengths of the vectors and the cosine of the angle, level sets, .... The
list goes on and on. This is common in mathematics: you spend a good amount
of time on the preliminaries and then reap great rewards later. We saw a
beautiful geometric interpretation of the gradient: \((\nabla f)(\overrightarrow{x})\)
points in the direction of maximum change of \(f\); further, the gradient is
normal to the level set.
- It's a good idea to check new results with old -- do they really
generalize? The directional derivative \(D_{\overrightarrow{u}}f(\overrightarrow{x})\)
can be computed by \((\nabla f)(\overrightarrow{x}) \cdot \overrightarrow{u}\);
if we take \(\overrightarrow{u}\) to be a unit vector along a coordinate
axis (so it is \(\overrightarrow{e}_1, \overrightarrow{e}_2\), ..., or \(\overrightarrow{e}_n\)),
then the directional derivative reduces to the partial derivative in that
direction. In other words, \(D_{\overrightarrow{e}_i}f(\overrightarrow{x}) =
\partial f / \partial x_i\) (evaluated of course at \(\overrightarrow{x}\)).
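- A quick numerical check of the dot-product formula (a sketch with an arbitrary test function): compare \((\nabla f)(\overrightarrow{x}) \cdot \overrightarrow{u}\) against a difference quotient along \(\overrightarrow{u}\).

```python
import math

def f(x, y):
    return x * x * y + math.sin(y)

def grad_f(x, y):
    return (2 * x * y, x * x + math.cos(y))   # the partials, computed by hand

x0, y0 = 1.0, 0.5
u = (3 / 5, 4 / 5)                            # a unit vector

gx, gy = grad_f(x0, y0)
dot_formula = gx * u[0] + gy * u[1]

h = 1e-6                                      # symmetric difference quotient along u
quotient = (f(x0 + h * u[0], y0 + h * u[1]) - f(x0 - h * u[0], y0 - h * u[1])) / (2 * h)
print(dot_formula, quotient)                  # agree to many decimal places
```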
- When we were trying to find the direction of greatest change of \(f\) at
\(\overrightarrow{x}\), we eventually saw it had to be in the direction of
\((\nabla f)(\overrightarrow{x})\). You'll forget, or never use, most of the
material in this course; that's fine, as learning these facts is only part
of why you're here. You're here in large part to get a sense of how to
attack a problem, what to look for. We're looking for the direction \(\overrightarrow{v}\)
that maximizes how fast \(f\) is changing; in other words, we want to find
\(\overrightarrow{v}\) such that \(D_{\overrightarrow{v}}f(\overrightarrow{x})
= (\nabla f)(\overrightarrow{x}) \cdot \overrightarrow{v}\) is greatest.
When we look at this expression, there is a special,
distinguished vector. We've fixed a function \(f\) and a point \(\overrightarrow{x}\)
and we're searching for \(\overrightarrow{v}\). We want to know what
direction \(\overrightarrow{v}\) should point in. While it's natural to
guess that \(\overrightarrow{x}\) plays a role, that can't be the answer as
\(\overrightarrow{x}\) is where we evaluate the function and the answer
needs to depend on \(f\). The only vector present that involves \(f\) is
\((\nabla f)(\overrightarrow{x})\). This is a vector, and it involves \(f\)
evaluated at the point we care about, \(\overrightarrow{x}\). This suggests that it plays a role in the answer. Maybe this is the
direction of greatest increase, or greatest decrease, or perhaps a direction
of no change. But it is a special direction, and it should be investigated. You want to get to
the point where you can see this, where you can get a sense of what to do
and what to try. This vector \((\nabla f)(\overrightarrow{x})\) is in every
directional derivative of \(f\) at \(\overrightarrow{x}\); it's probably
important and thus it suggests we calculate the directional derivative in
that direction (as well as its opposite direction, as well as all
directions perpendicular to this -- these perpendicular directions lead to
the level sets).
- It is very important
to know proofs and definitions; there's a reason one of the exam questions
required you to be able to describe clearly key concepts from the course. A
very important example is the fall of Western Civilization (or, if you're
not quite as pessimistic, the financial mortgage meltdown). While
there are many reasons behind the collapse (I
have close family that has worked in the upper levels of many of the top
financial firms; if you are interested in stories of what isn't reported in
the news, let me know), one large component was an incorrect use of Gaussian
copulas. It's similar to looking at low-velocity data and extrapolating to
relativistic speeds -- there is an enormous danger when you apply results
from one region in another with no direct data in that second realm. A great
article on this is from Wired Magazine (The
Formula That Killed Wall Street). It's worth reading this. Some
particularly noteworthy passages:
- Bankers should have noted that very small
changes in their underlying assumptions could result in very large changes
in the correlation number. They also should have noticed that the results
they were seeing were much less volatile than they should have been which
implied that the risk was being moved elsewhere. Where had the risk gone?
They didn't know, or didn't ask. One reason was that the outputs came from
"black box" computer models and were hard to subject to a commonsense
smell test. Another was that the quants, who should have been more aware
of the copula's weaknesses, weren't the ones making the big
asset-allocation decisions. Their managers, who made the actual calls,
lacked the math skills to understand what the models were doing or how
they worked. They could, however, understand something as simple as a
single correlation number. That was the problem.
- No one knew all of this better than David
X. Li: "Very few people understand the essence of the model," he told The
Wall Street Journal way back in fall 2005. "Li can't be blamed," says
Gilkes of CreditSights. After all, he just invented the model. Instead, we
should blame the bankers who misinterpreted it. And even then, the real
danger was created not because any given trader adopted it but because
every trader did. In financial markets, everybody doing the same thing is
the classic recipe for a bubble and inevitable bust.
-
Reading for Frozen Fractal Flick lecture:
https://www.math.ucla.edu/~jteran/papers/SSCTS13.pdf
-
Twelfth
day lecture: Part I: http://youtu.be/FbwGZfkf9P8 Part II:
http://youtu.be/RbhK3a308sg
(March 14, 2014: Directional Derivatives, Exp Fn, Trig in
a Day)
-
Friday, March
2.
Today we discussed the Chain Rule. The
Chain Rule is one of the most
important results in multivariable calculus, as it allows us to build
complicated functions depending on functions of many inputs. To state it
properly requires some linear algebra, especially matrix
multiplication. The proof uses multiple applications of adding zero. This
is an essential skill to master if you wish to continue in mathematics. It is
somewhat similar to adding auxiliary lines in geometry. With experience, it
becomes easier to `see' where and how to add zero. The idea is we want to add
zero in such a way that we convert one expression to several, where the
resulting expressions are easier to analyze because we are subtracting two
quantities that are quite close. For the chain rule, we will do this by adding
numerous intermediary points.
- One way to view the Chain Rule is that it is all about giving you the
freedom to choose. You can either plug everything in and differentiate
directly by brute force, or you
can use the Chain Rule to find the derivative of the composition in terms of
the derivatives of the constituent pieces. Depending on the problem, one way
could be easier than the other; there are examples of situations where
direct substitution is best, and examples where it is better to use the
Chain Rule. With experience it becomes clear which way is better. When we
discuss gradients and directional derivatives, we'll see a theoretical
interpretation of the Chain Rule. This will play a fundamental role when we
return to optimization problems. Finally, of course, it is useful to be able
to compute an answer two different ways, as this provides a nice check of
your work.
- To use the Chain Rule in full glory, we needed to understand how to
multiply matrices, as \(h(\overrightarrow{u}) = f(g(\overrightarrow{u}))\)
implies \((Dh)(\overrightarrow{u}) = (Df)(g(\overrightarrow{u})) (Dg)(\overrightarrow{u})\),
where \(\overrightarrow{u} = (u_1,
\dots, u_m)\). One can motivate matrix multiplication through the dot
product, as we know how to take the dot product of two vectors of the same
number of coordinates. Matrix multiplication looks quite mysterious at
first. Wikipedia
has a nice article (with color) on multiplying matrices, though it is a
bit short on motivation. The advanced reason as to why we do this comes from
also viewing matrices as linear transformations, and we want the product of
two matrices to represent the composition of the transformations. This is an
advanced topic, and sadly is frequently mangled in a linear algebra course.
I've posted a little bit about this in the advanced
notes from Thursday's lecture from a few years ago. The best motivation I know is to consider
\(2 \times 2\) rotation
matrices. If \(R(a)\) corresponds to rotating by \(a\) radians, and \(R(b)\)
to rotating by \(b\) radians, then \(R(b) R(a)\) should equal \(R(b+a)\);
this does happen if we use the matrix multiplication method.
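- The rotation-matrix motivation is easy to check numerically (a sketch using numpy):

```python
import numpy as np

def R(theta):  # rotation of the plane by theta radians
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

a, b = 0.7, 1.9
print(np.allclose(R(b) @ R(a), R(a + b)))  # True: composing rotations adds the angles
```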
- I did a quick google search for applications of the chain rule in
various subjects. Here's
something in economics. Here's
another econ example. Here's
a chemistry example.
- One of our biggest applications of the Chain Rule was to inverse
functions and derivatives. We talked about this before, but it's so
important it's worth seeing again.
- If \(f(g(x)) = x\) (so \(f\) and \(g\) are inverse functions, such as
\(x^2\) and \(\sqrt{x}\)), then \(g'(x) = 1 / f'(g(x))\); in other words, knowing
the derivative of \(f\) we know the derivative of its inverse function
\(g\). This was used in Calc I to pass from knowing the derivative of \(\exp(x)\)
to the derivative of \(\ln(x)\). We can apply this to various inverse
trig functions (a
list of the derivatives of these are here). This highlights one of the
most painful parts of integration theory -- just because we are close to
finding an anti-derivative does not mean we can actually find it! While
there is a
nice anti-derivative of \(\sqrt{1 - x^2}\), it is not a pure derivative of
an inverse trig function. There are many tables
of anti-derivatives (or integrals) (a
fun example on that page is the Sophomore's
Dream). Unfortunately it is not always apparent how to find these
anti-derivatives, though of course if you are given one you can check by
differentiating (though sometimes you have to do some non-trivial algebra
to see that they match). In fact, there are some tables of integrals of
important but hard functions where most practitioners have no idea how
these results are computed (and occasionally there are errors!). We will
see later how much simpler these problems become if we change variables;
to me, this is one of the most important lessons you can take from the
course: Many problems
have a natural point of view where the algebra is simpler, and it is worth
the time to try to find that point of view!
- Let \(f(x) = \exp(x)\). Then \(f'(x) = \lim_{h \to 0} [f(x+h) - f(x)]/h\) = \(\lim_{h
\to 0} [\exp(x+h) - \exp(x)] / h\) = \(\lim_{h \to 0} [\exp(x) \exp(h) - \exp(x)]
/ h\) = \(\exp(x) \lim_{h \to 0} [\exp(h) - 1] / h\). As \(\exp(0) = 1\),
we find \(f'(x) = \exp(x) \lim_{h \to 0} [f(h) - f(0)] / h\) = \(\exp(x)
f'(0)\); thus we know the
derivative of the exponential function everywhere once we know the
derivative at 0!
- Wednesday, February 28.
The goal of today's lecture was to see the power of approximating a
complicated function with simpler ones.
-
Videos: Mandelbrot zoom:
video 1,
video 2.
Here's a
cubic fractal zoom. (other copies: Videos:
Mandelbrot set
Newton Fractal (cubic) )
-
Mathematica notebook for Newton's Method (pdf
version here)
-
We compared two methods to find roots of polynomials. In some special cases
we can find closed form expressions for roots in terms of the coefficients.
For example, any linear equation (\(ax+b=0\)), quadratic (\(ax^2+bx+c=0\)),
cubic (\(ax^3+bx^2+cx+d=0\)) or quartic (\(ax^4+bx^3+cx^2+dx+e=0\)) has a
formula for the roots in terms of the coefficients of the polynomials; this
fails for polynomials of degree 5 and higher (the Abel-Ruffini
Theorem; see also Galois).
It is very convenient when we have a solution that is a function of the
parameters; we can then use our methods to find the optimal values of the
parameters. Sadly in industry it is often difficult to get closed form
expressions; if you are looking for the most potent compound, for example,
you might be required to do numerous different trial runs and just observe
which is best. We thus need a way to find optimal values / solve equations.
We describe two below.
- Newton's method is
significantly more powerful than divide
and conquer (also called
the bisecting algorithm); this is not surprising as it assumes more
information about the function of interest (namely, differentiability).
The numerical stability of Newton's method leads to many fascinating
problems. One terrific example is looking at roots in the complex plane of
a polynomial. We assign each root a different color (other than purple),
and then given any point in the complex plane, we apply Newton's method to
that point repeatedly until one of two things happen: it converges to a
root or it diverges. If the iterates of our point converges to a root, we
color our point the same color as that root, else we color it purple. This
leads to Newton
fractals, where two points extremely close to each other can be
colored differently, with remarkable behavior as you zoom in. If you're
interested in more information, let me know; a good chaos program is xaos (I
have other links to such programs for those interested). One final aside:
it is often important to evaluate these polynomials rapidly; naive
substitution is often too slow, and Horner's
algorithm is frequently used.
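- Here is a minimal sketch of Newton's method for polynomial roots, with Horner's algorithm used to evaluate the polynomial and its derivative (the coefficients and starting points are illustrative choices):

```python
# Coefficients are listed from the highest power down; [1, 0, 0, -1] is z^3 - 1.
def horner(coeffs, z):
    p, dp = 0, 0
    for c in coeffs:          # one pass gives both p(z) and p'(z)
        dp = dp * z + p
        p = p * z + c
    return p, dp

def newton(coeffs, z, steps=100, tol=1e-12):
    for _ in range(steps):
        p, dp = horner(coeffs, z)
        if abs(p) < tol:
            break
        z = z - p / dp
    return z

# Nearby complex starting points can converge to different cube roots of 1;
# coloring the plane by which root you reach gives the Newton fractal above.
for z0 in [1 + 0.1j, -1 + 1j, -1 - 1j]:
    print(z0, '->', newton([1, 0, 0, -1], z0))
```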
- The fractal behavior exhibited by Newton's method applied to finding
roots of polynomials is one of many examples of Chaos
Theory, or extreme sensitivity to initial conditions. While one of the
earliest examples was the work of Poincare on the motion of three
planetary bodies, the subject really took off with Lorenz's work on
weather (the
Butterfly Effect). Another nice example is the orbit
of Pluto; while we know it will orbit the sun, its orbit is chaotic
and we cannot say where exactly in the orbit it will be millions of years
from now.
- Instead of approximating a function locally by a line, we now use a
plane (in 2-dimensions) or hyperplane (in general). We can use the Mean
Value Theorem to get some
information on how close the estimation is, and then use these estimations
to approximate our function. A Mathematica
file with the tangent line and tangent plane approximations is here. One
definition of differentiability is that a function is differentiable if the
error in the tangent plane approximation tends to zero faster than the
distance from the starting point. Partial derivatives can exist without the
function being differentiable (the example was \(f(x,y) = (xy)^{1/3}\)); it
suffices for the partial derivatives to be continuous. The statement and
proof sketch were given in the Monday, February 19 lecture and are written up
in that entry below.
-
In Economics,
the standard random
walk hypothesis seems to have
lost most of its supporters, though there are variants (and I'm not familiar
with all); see also the efficient
market hypothesis and technical
analysis, and all the links there. (There are also many good links on
the wikipedia page on Eugene
Fama). Two famous books (with different conclusions) are Malkiel's A
random walk down wall street and
Mandelbrot-Hudson's The
(mis)behavior of markets (a fractal view of risk, ruin and reward). Some
interesting papers if you want to read more:
-
One of our biggest applications of the Chain Rule was to inverse
functions and derivatives: if \(f(g(x)) = x\) (so \(f\) and \(g\) are inverse
functions), then \(g'(x) = 1 / f'(g(x))\). This is discussed at length in the
Friday, March 2 notes above; two detailed calculations follow.
- Below is a more detailed calculation for the inverse derivative
rule. Let \(A(x) = \exp(\ln x) = f(g(x))\); thus \(f(x) = e^x, g(x)
= \ln x, f'(x) = e^x, f'(g(x)) = e^{\ln x} = x\) and the goal is to find
\(g'(x)\). As \(A'(x) = f'(g(x)) g'(x) = 1\), we have \(g'(x) = 1/f'(g(x))
= 1/x\); thus \(\ln'(x) = 1/x\). Note that in college math courses, people
use \(\log x\) to refer to the natural logarithm; I'll try and write \(\ln
x\), but \(\log x\) should always be taken to be base \(e\).
- Below is a more detailed calculation for the inverse derivative
rule. Let \(A(x) = f(g(x)) = \tan(\arctan x) = x\). Using the
quotient rule we get the derivative of \(\tan x\) is \(1/\cos^2 x\) (some
people like to say \(\csc x\) here). We find \(f'(x) = \tan'(x) = 1/\cos^2(x)\),
\(f'(g(x)) = 1/\cos^2(\arctan x) = 1/(1+x^2)\). To see the last claim,
draw a right triangle with base \(1\) and height \(x\); this triangle has
an angle with tangent \(x\). The hypotenuse will be \(\sqrt{1+x^2}\) by
the Pythagorean theorem; draw the picture! Thus the angle here has a
cosine of \(1\) divided by \(\sqrt{1+x^2}\), and our claim follows. Using
the inverse derivative rule we find \(\arctan'(x) = 1/\tan'(\arctan x) =
1/(1+x^2)\). This is actually a very important formula; it's
used extensively in probability (for the
Cauchy
distribution, which is a heavy tailed distribution), and in the
Gregory-Leibniz Formula for \(\pi\).
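- Both detailed calculations are easy to sanity-check numerically; here is a sketch for the arctangent one:

```python
import math

h = 1e-6
for x in [0.0, 0.5, 2.0]:
    quotient = (math.atan(x + h) - math.atan(x - h)) / (2 * h)
    print(x, quotient, 1 / (1 + x * x))  # the difference quotient matches 1/(1+x^2)
```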
- Tenth day lecture: http://youtu.be/Da0cO905Aj8
(March 10, 2014: Linear Approximation)
- Monday, February 26.
Today's lecture covered the Method of Least Squares. The best fit value of the
parameters depends on how we choose to measure errors. It is very important to
think about how you are going to measure / model, as frequently people reach
very different conclusions because they have different starting points /
different metrics. We'll see another example of how our metric can affect the
answer when we get to Lagrange multipliers.
Here is a nice website to see the difference.
- The Method of Least Squares is one of my favorites in statistics (click
here for the Wikipedia page, and click
here for my notes). The Method of Least Squares is a great way to find
best fit parameters. Given a hypothetical relationship \(y = a x + b\), we
observe values of \(y\) for different choices of \(x\), say \((x_1, y_1),
(x_2, y_2), (x_3, y_3)\) and so on. We then need to find a way to quantify
the error. It's natural to look at the observed value of \(y\) minus the
predicted value of \(y\); thus it is natural that the error should be \(\sum_{i=1}^n
h\left(y_i - (a x_i + b) \right)\) for some function \(h\). What is a good
choice? We could try \(h(u) = u\), but this leads to sums of signed errors
(positive and negative), and thus we could have many errors that are large
in magnitude canceling out. The next choice is \(h(u) = |u|\); while this is
a good choice, it is not analytically tractable as the absolute value
function is not differentiable. We thus use \(h(u) = u^2\); though this
assigns more weight to large errors, it does lead to a differentiable
function, and thus the techniques of calculus are applicable. We end up with
a very nice, closed form expression for the best fit values of the
parameters.
- Unfortunately, the Method of Least Squares only works for linear
relations in the unknown parameters. As a great exercise, try to find the
best fit values of \(a\) and \(c\) to \(y = c/x^a\) (for
definiteness you can think of this as the force due to two unit masses that
are \(x\) units apart). When you take the derivative with respect to \(a\)
and set that equal to zero, you won't get a tractable equation that is
linear in a to solve. Fortunately there is a work-around. If we change
variables by taking logarithms, we find \(\ln(y) = \ln(c/x^a)\); using logarithm
laws this is equivalent to
\(\ln(y) = -a \ln(x) + \ln(c)\); setting \(Y = \ln(y), X = \ln(x)\) and \(b =
\ln(c)\) this is equivalent to \(Y = -a X + b\), which is linear in the
unknowns and exactly the formulation we need! This example illustrates the power of logarithms; it
allows us to transform our data and apply the Method of Least Squares.
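- Here is a minimal sketch of the log-transform trick on synthetic data (the true parameter values and noise level are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
a_true, c_true = 2.0, 5.0
x = np.linspace(1.0, 10.0, 50)
y = c_true / x**a_true * np.exp(rng.normal(0, 0.05, x.size))  # noisy y = c / x^a

# Fit Y = -a X + b with Y = ln y, X = ln x, b = ln c (ordinary least squares).
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
print('a ~', -slope, ' c ~', np.exp(intercept))  # close to 2.0 and 5.0
```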
- There are many examples of power laws in the world. Many of my favorite
are related to Zipf's
law. The frequencies of the most common words in English is a
fascinating problem (click
here for the data; see also this
site); this works for other languages as well, for the size of the most
populous cities, ...; if you consider more general power laws, you also get Benford's
law of digit bias, which is used
by the IRS to detect tax fraud (the
link is to an article by a colleague of mine on using Benford's law to
detect fraud). The power law relation is quite nice, and initially
surprising to many. My Mathematica
programming analyzing this is available here. See also this
paper by Gabaix for Zipf's law and the growth of cities. As a nice
exercise, you should analyze the growth of city populations (you can get
data on both US and the
world from Wikipedia).
- We discussed Kepler's
Three Laws of Planetary Motion (the
Wikipedia article is very nice). Kepler was proudest (at least for a
long time) of Mysterium
Cosmographicum (I strongly
urge you to read this; yes, the same Kepler whom we revere today for his
understanding of the cosmos also advanced this as a scientific theory --
times were different!).
- Finally, we saw the importance of how we choose to measure things; how
we model and how we judge the model's prediction will greatly affect the
answer. In a similar spirit, I thought I would post a brief note about Oulipo,
a type of mathematical poetry (this
is a link to the Wikipedia page, which has links to examples). There was a
nice article about this recently in Math Horizons (you
can view the article here). This is a nice example of the intersection
of math and the arts, and discusses how the structure of
a poem affects the output, and what structures might lead to interesting
works.
-
See
http://www.smbc-comics.com/?id=3279#comic for a nice comic on
absolute values....
-
Favorite clip on family feud:
http://www.youtube.com/watch?v=7gt1qh_bR-4
-
Ninth day lecture:
See lecture six on GLOW
(March 7, 2014: Method of Least Squares)
-
Friday, February 23.
Special coding lecture.
- Wednesday, February 21.
Today we cover one of the biggest applications of multivariable calculus:
optimization. We'll see a great instance of this when we do the Method of
Least Squares.
-
The search for extrema is a central pursuit in modern science and
engineering. It is important to have techniques to winnow the list of
candidate points. The methods discussed in class are the natural
generalizations from one-variable calculus. While one must prove that the
function under consideration does have a max/min, typically this is clear
from physical reasons (for example, there should be a pen of maximal area
for given perimeter; there should be a path of least time).
- In one-dimension, boundaries of sets aren't too bad; for example, the
boundary of [a, b] is just two points, a and b. The situation is violently
different in several variables. There the boundary can have infinitely
many points, and reducing a problem to interior critical points and
checking the function on the boundary is not enough; we must have a way to
evaluate all these points on the boundary.
- The generalization of the second derivative tests involves
determinants and whether or not the Hessian is a positive
definite matrix, a negative
definite matrix, et cetera. What is really going on is that we want to
use the Principal Axis Theorem and change to a coordinate system where the
Hessian is easier to understand because, in this new coordinate system, it
is a diagonal matrix! This makes a lot more sense after taking Linear
Algebra.
- In one of the sabermetrics lectures we might discuss linear
programming. This is a wonderful topic, and it allows us to solve (or
approximate the solutions) to a wealth of problems very quickly. My
lecture notes are online here. One of my favorite applications of
linear programming is to determining
when teams are eliminated from playoff contention; MLB and ESPN
frequently do the analysis incorrectly by not taking into account
secondary effects of teams playing other teams (who in turn play other teams). For example, ESPN
or MLB back in '04 had the wild-card unclinched for one extra day. (The
Sox had a big lead over the Angels and a slightly smaller lead over the
As; however, the As and the Angels were playing each other and thus at
least one team would get 2 losses, and one had to win the AL West. Thus the
Sox had clinched the wildcard a day earlier than thought. We'll discuss
this paper when the pre-frosh visit, as it's a nice application of
multivariable calculus.) If you're interested, click
here for a paper I wrote with colleagues applying
linear programming to helping a movie theatre determine optimal schedules.
- It is worth remarking that for many applications in the real world, we
do not need to find the true extremum, but rather just something very
close. For example, say we are trying to determine the optimal schedule
for an airline for a given day. We can write the linear programming
problem down, but it might take days to years to run; however, frequently
we can obtain bounds showing how close our answer is to the theoretical
best (ie, we can show we are no more than X away from optimal). It often
happens that X is small, and thus with a small run-time we can be close
enough. (It isn't worth it to ground our airfleet for a few years to find
the optimal schedule!)
-
Systems of equations are frequently used to model real world problems, as it
is quite rare for there to be only one quantity of interest. If you want to
read more about applying math to analyze the Battle
of Trafalgar, here
is a nice handout (or, even
better, I think we could go further and write a nice paper for a general
interest journal expanding on the Mathematica
program I wrote). The model is very similar to the Lotka-Volterra
predator-prey equations (our
evolution is quite different, though; this is due to the difference in sign
in one of the equations). Understanding these problems is facilitated by
knowing some linear algebra. It is also possible to model this problem using
a system of difference equations, which can readily be solved with linear
algebra. Finally, it's worth noting a major drawback of this model, namely
that it is entirely deterministic: you specify the initial concentrations of
red and blue and we know exactly how many exist at any time. More generally
one would want to allow some luck or fluctuations; one way to do this is
with Markov
chains. This leads to more complicated (not surprisingly) but also more
realistic models. In particular, you can have different probabilities for
one ship hitting another, and given a hit you can have different
probabilities for how much damage is done. This can be quite important in
the 'real' world. A classic example is the British efforts to sink the
German battleship Bismarck in WWII. The Bismarck was superior to all British
ships, and threatened to decisively cripple Britain's commerce (ie, the flow
of vital war and food supplies to the embattled island). One of the key
incidents in the several days' battle was a lucky torpedo shot by a British
plane which seriously crippled the Bismarck's rudder. See
the wikipedia entry for more details on one of the seminal naval engagements
of WWII. The point to take away from all this is the need to always be
aware of the limitations of one's models. With the power and availability of
modern computers, one workaround is to run numerous simulations and get
probability windows (ie, 95% of the time we expect a result of the following
type to occur). Sometimes we are able to theoretically prove bounds such as
these; other times (using Markov chains and Monte
Carlo techniques) we numerically approximate these probabilities.
-
Eighth day lecture:
http://www.youtube.com/watch?v=L1WUWmjvnE0
(March 5, 2014: Optimization)
- Monday,
February 19.
We continued our discussion of partial derivatives. We
talked a lot about different notations for the derivative. It is very
convenient to be able to refer to all the different derivatives (functions
with one input, several inputs and one output, several inputs and several
outputs) with just one notation. The definition is that a function is
differentiable if the error in the tangent plane approximation tends to zero
faster than the distance of where we are to where we start tends to zero. It
is sadly possible for the partial derivatives to exist without the function
being differentiable. We showed how it is not sufficient for the partial
derivatives to exist; that is not enough to imply our function is
differentiable. The example was \(f(x,y) = (xy)^{1/3}\). What must we
assume in order for the partial derivatives to imply our function is
differentiable? It turns out it suffices to assume the partial derivatives are
continuous. This is the major theorem in the subject, and provides a nice way
to check for when a function is differentiable.
-
The proof of the theorem alluded to above uses two of my
favorite techniques. While sadly we do not multiply by 1, we do get to add 0
and we do use the Mean
Value Theorem. One of my goals in the class is to illustrate how to
think about these problems, why we try certain approaches for our proofs. We
want to study how well the tangent plane approximates our function, thus we
need to study \(f(x,y) - f(0,0) - (\partial f/\partial x)(0,0)\, x - (\partial f/\partial y)(0,0)\, y\).
Our theorem assumes the partial derivatives are
continuous, thus it stands to reason that at some point in the proof we
should use that the partial derivatives are continuous! The trick is to try and
see how we can get another \(\partial f/\partial x\) and another \(\partial f/\partial y\)
to appear. The key is to recall the MVT. If we add 0 in a clever way, we can do this. Our expression
equals \(f(x,y) - f(0,y) + f(0,y) - f(0,0) - (\partial f/\partial x)(0,0)\, x - (\partial f/\partial y)(0,0)\, y\).
We now use the MVT on \(f(x,y) - f(0,y)\) and on \(f(0,y) - f(0,0)\).
In each of these two expressions, only one variable changes. Thus
the first is \((\partial f/\partial x)(c,y)\, x\) for some \(c\) between \(0\) and \(x\),
and the second is \((\partial f/\partial y)(0,\hat{c})\, y\) for some \(\hat{c}\) between \(0\) and \(y\).
Thus the error in using the tangent plane is
\([(\partial f/\partial x)(c,y) - (\partial f/\partial x)(0,0)]\, x + [(\partial f/\partial y)(0,\hat{c}) - (\partial f/\partial y)(0,0)]\, y\).
We now see how the continuity of the partials enters --
it ensures that these differences are small, even when we divide by \(||(x,y)-(0,0)||\).
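- A numerical illustration of why \(f(x,y) = (xy)^{1/3}\) fails (a sketch): both partials at the origin are 0 (the function vanishes on the axes), so the candidate tangent plane is \(z = 0\); but along the diagonal the error divided by the distance to the origin blows up instead of tending to zero.

```python
def f(x, y):
    s = x * y
    return abs(s) ** (1 / 3) * (1 if s >= 0 else -1)  # real cube root of xy

for t in [1e-2, 1e-4, 1e-6, 1e-8]:
    error = f(t, t) - 0.0                  # tangent plane at the origin is z = 0
    distance = (2 ** 0.5) * t              # |(t,t) - (0,0)|
    print(t, error / distance)             # grows like t^(-1/3): not differentiable
```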
- The Mean Value Theorem is also the key ingredient in the proof of the equality
of mixed partial derivatives (assuming
both are continuous). Sadly
there do exist functions where the mixed derivatives are unequal
(for extra credit, show that the mixed
derivatives are not equal in the linked example).
- Notation is very important. The subscript notation for partial
derivatives is nice and elegant; it's easy to glance at \(u_{xxy}\) and
quickly glean that it's two derivatives with respect to \(x\) followed by one
with respect to \(y\). This allows us to write down many partial
differential equations in a
nice, compact form. Some of the most famous and important are (1) the
heat equation, (2) the
wave equation, and (3) the Navier-Stokes
equation. The last arises in fluid flow, and is one of the Clay
Millennium Prize Problems.
- In differential trigonometry, everything comes down to the limit as \(h\)
tends to zero of \(\sin(h)/h\). One can prove this limit geometrically, as is
often done, and then obtain the derivatives by using the angle addition
formulas. We sketch another avenue to these addition formulas. The Pythagorean
Theorem says
\(\cos^2(x) + \sin^2(x) = 1\). There are many ways to obtain
this formula. Perhaps one of the most useful is the Euler-Cotes
formula, \(\exp(ix) = \cos(x) + i \sin(x)\). One can essentially derive
all of trigonometry from this relation, with just a little knowledge of the exponential
function. Specifically, we have \(\exp(z) = 1 + z + z^2/2! + z^3/3! + \cdots\).
It is not at all clear from this definition that \(\exp(z) \exp(w) =
\exp(z+w)\); this is a statement about the product of two infinite sums
equaling a third infinite sum. It is a nice exercise in combinatorics to
show that this relation holds for all complex \(z\) and \(w\).
- Taking the above identities, we sketch how to derive all of
trigonometry! I'll try to work this into a class later in the semester
when we have more time. Let's prove the angle addition formulas. We have \(\exp(ix) =
\cos(x) + i \sin(x)\) and \(\exp(iy) = \cos(y) + i \sin(y)\). Then \(\exp(ix) \exp(iy) =
[\cos(x) + i \sin(x)] [\cos(y) + i \sin(y)] = [\cos(x) \cos(y) - \sin(x) \sin(y)]
+ i [\sin(x) \cos(y) + \cos(x) \sin(y)]\); however, \(\exp(ix) \exp(iy) = \exp(i(x+y))
= \cos(x+y) + i \sin(x+y)\) by Euler's formula. The only way two complex
numbers can be equal is if they have the same real and the same imaginary
parts. Thus, equating these yields \(\cos(x+y) = \cos(x) \cos(y) - \sin(x) \sin(y)\) and \(\sin(x+y)
= \sin(x) \cos(y) + \cos(x) \sin(y)\).
- It is a nice exercise to derive all the other identities. One can even
get the Pythagorean theorem! To obtain this, use exp(ix) exp(-ix) = exp(0)
= 1.
- We thus see there is a connection between the angle addition formulas
in trigonometry and the exponential addition formula. Both of these are
used in critical ways to compute the derivatives of these functions. For
example, these formulas allow us to differentiate sine, cosine and the
exponential functions anywhere once we know their derivative at just one
point. Let \(f(x) = \exp(x)\). Then \(f'(x) = \lim_{h \to 0} [f(x+h) - f(x)]/h
= \lim_{h \to 0} [\exp(x+h) - \exp(x)] / h = \lim_{h \to 0} [\exp(x) \exp(h) - \exp(x)] / h
= \exp(x) \lim_{h \to 0} [\exp(h) - 1] / h\); as \(\exp(0) = 1\), we find
\(f'(x) = \exp(x) \lim_{h \to 0} [f(h) - f(0)] / h = \exp(x) f'(0)\);
thus we know the derivative of the exponential function everywhere
once we know the derivative at 0! One finds a similar result for the
derivatives of sine and cosine (again, this shouldn't be surprising as the
functions are related to the exponential through Euler's formula).
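- These manipulations are easy to spot-check numerically with complex arithmetic (a sketch using Python's cmath):

```python
import cmath, math

x, y = 0.8, 1.3
lhs = cmath.exp(1j * (x + y))
rhs = cmath.exp(1j * x) * cmath.exp(1j * y)
print(abs(lhs - rhs))              # ~0: the exponential addition formula
print(lhs.real - math.cos(x + y))  # ~0: real parts give cos(x+y)
print(lhs.imag - math.sin(x + y))  # ~0: imaginary parts give sin(x+y)
```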
-
Seventh day lecture: http://www.youtube.com/watch?v=S6wiYRCiQhs
(March 3, 2014: Derivatives)
- Wednesday, February 14. The terminology of open
sets, closed
sets, boundary
points and so on won't be
used too much (perhaps not ever again) in this course, but are essential in
more advanced analysis classes. It is important to build our subjects on
firm foundations. A great example of why this is needed is Russell's
paradox, which showed
that we didn't even understand what it meant to be a set or an element of a
set! Another famous paradox is the Banach
- Tarski paradox, which tells us that we don't understand volumes! It
basically says if you assume the Axiom
of Choice, you can cut a solid sphere into 5 pieces, and reassemble the
five pieces to get two completely solid spheres of the same size as the
original! While it is rare to find these paradoxes in mathematics,
understanding them is essential. It
is in these counter-examples that we find out what is really going on. It is
these examples that truly illuminate how the world is (or at least what our
axioms imply). Most people use the Zermelo-Fraenkel
axioms, abbreviated ZF. If you additionally assume the Axiom of Choice,
it's called ZFC or ZF+C. Not all problems in mathematics can be answered yea
or nay within this structure. For example, we can quantify sizes of
infinity; the natural numbers are much smaller than the reals; is there any
set of size strictly between? This is called the Continuum
Hypothesis, and my mathematical grandfather (my thesis
advisor's advisor), Paul
Cohen, proved it is independent (ie, you may either add it to your axiom
system or not; if your axioms were consistent before, they are still
consistent).
- In a real analysis course, one develops the notation and machinery to
put calculus on a rigorous footing. In fact, several
prominent people criticized the foundations of calculus, such as Bishop
Berkeley; his famous attack, The
Analyst, is available here. It wasn't until decades later that a good
notion of limit, integral and derivative were developed. Most people are
content to stop here; however, see also Abraham
Robinson's work in Non-standard
Analysis. He is one of several mathematicians we'll encounter this
semester who have been affiliated with my Alma Mater, Yale.
Another is the great Josiah
Willard Gibbs.
- One of my favorite applications of open
and closed sets is Furstenberg's
proof of the infinitude of primes; one night while a postdoc at Ohio
State I had drinks with Hillel
Furstenberg and one of his
students, Vitaly
Bergelson. This is considered by many to be one of the best proofs of
the infinitude of primes; it is so good it is one of six proofs given in THE
Book. Unlike most proofs of the infinitude of primes, this gives no
bounds on how many primes there are up to \(x\); even Euclid's
proof (if there are only
finitely many primes, say \(p_1, \dots, p_n\), then consider
\((p_1 \cdots p_n)+1\); either this is a new prime or it is
divisible by a prime not in our list, since \((p_1 \cdots p_n)+1\) leaves
remainder 1 upon division by each prime in our list) gives a lower bound,
namely \(\log \log x\)
(the true answer is that there are about \(x / \log x\) primes at most \(x\)). As
a nice exercise (for fun), prove this fact. This leads to an interesting
sequence: 2,
3, 7, 43, 13, 53, 5, 6221671, 38709183810571, 139, 2801, 11, 17, 5471,
52662739, 23003, 30693651606209, 37, 1741, 1313797957, 887, 71, 7127, 109,
23, 97, 159227, 643679794963466223081509857, 103, 1079990819, 9539,
3143065813, 29, 3847, 89, 19, 577, 223, 139703, 457, 9649, 61, 4357....
This sequence is generated as follows. Let a_1 = 2, the first prime. We
apply Euclid's argument and consider 2+1; this is the prime 3 so we set
a_2 = 3. We apply Euclid's argument and now have 2*3+1 = 7, which is
prime, and set a_3 = 7. We apply Euclid's argument again and have 2*3*7+1
= 43, which is prime and set a_4 = 43. Now things get interesting: we
apply Euclid's argument and obtain 2*3*7*43 + 1 = 1807 = 13*139, and set
a_5 = 13. Thus a_n is the smallest prime not on our list generated by
Euclid's argument at the nth stage. There are a plethora of (I believe)
unknown questions about this sequence, the biggest of course being whether
or not it contains every prime. This is a great sequence to think about,
but it is a computational nightmare to enumerate! I downloaded these terms
from the Online Encyclopedia of Integer Sequences (homepage is
http://oeis.org/
and the page for our sequence is http://oeis.org/A000945
). You can enter the first few terms of an integer sequence, and it
will list whatever sequences it knows that start this way, provide
history, generating functions, connections to parts of mathematics, ....
This is a GREAT website to know if you want to continue in mathematics.
There have been several times I've computed the first few terms of a
problem and looked up what the later terms should be (and thus had a
candidate formula with which to start an induction). A sketch of how one
might compute the sequence above in practice follows this item.
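- Here is a minimal sketch of computing this sequence, assuming the sympy
library is available; it is only feasible for the first several terms, since
each step requires factoring a rapidly growing number (the computational
nightmare mentioned above).

    # Compute the start of the Euclid-Mullin sequence (OEIS A000945), assuming
    # sympy is installed; each new term is the smallest prime factor of
    # (product of the terms so far) + 1.
    from sympy import primefactors

    terms = [2]
    for _ in range(8):
        product = 1
        for p in terms:
            product *= p
        # primefactors returns the prime factors sorted in increasing order.
        terms.append(primefactors(product + 1)[0])
    print(terms)  # [2, 3, 7, 43, 13, 53, 5, 6221671, 38709183810571]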
- We talked about limits and continuity. Informally, a continuous function
is one where we can draw its graph without lifting our pen/pencil from the
paper. If we take this as our working definition, however, we can easily be
misled in terms of properties of continuous functions. For example, are all
continuous functions differentiable? Clearly not, as we can take f(x) = |x|.
While this function is not differentiable at x=0, it is differentiable
everywhere else. Thus we might be led to believe that all continuous
functions are differentiable at most points. This sadly is not true.
- Weierstrass showed (the first text where I read this used the phrase "Weierstrass
distressed 19th century mathematicians") that it is possible to have a
function which is continuous everywhere but differentiable nowhere! The
wikipedia article is a good starting point. In addition to explicitly
stating what the function is, it has a nice plot and good comments. The
function exhibits fractal
behavior, though the term fractal wasn't used until many years later
by Mandelbrot. For more on fractals, see the comments from Monday,
February 17.
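- If you want to see the wiggling yourself, here is a minimal sketch of
evaluating a partial sum of such a series; the series \(\sum_n a^n \cos(b^n
\pi x)\) is Weierstrass's, but the values \(a = 0.5\) and \(b = 13\) are my
choice (they satisfy his conditions \(0 < a < 1\), \(b\) odd, \(ab > 1 +
3\pi/2\)).

    # Partial sums of the Weierstrass function W(x) = sum_n a^n cos(b^n pi x).
    # Terms past n ~ 15 fall below both plotting resolution and float precision.
    import math

    def weierstrass_partial(x, n_terms=20, a=0.5, b=13):
        return sum(a**n * math.cos(b**n * math.pi * x) for n in range(n_terms))

    # Sample on a fine grid; zooming in reveals wiggles at every scale.
    for k in range(5):
        print(k / 1000, weierstrass_partial(k / 1000))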
- In higher mathematics we learn to quantify orders of infinity. We see
that there are more real numbers than rational numbers (see Cantor's
diagonalization argument); the comments from Wednesday, February 19th
discuss whether or not there is a set whose size is strictly between the
rationals and the reals. Amazingly, if you count functions properly, it
turns out almost every continuous function is differentiable nowhere! See
here for some comments on
this strange state of affairs. The key ingredient is an advanced result from Functional
Analysis, the Baire
Category Theorem. There are also sets (fractals) of non-integral
dimension. Fractals have a rich history and numerous applications, ranging
from economics to Star Trek II: The Wrath of Khan (where they were used to
generate the simulated landscapes of the Genesis Torpedo; see
the wikipedia article on fractals in film). The economics applications
are quite important. One of the most influential papers is due to
Mandelbrot (The
Variation of Certain Speculative Prices). This is one of the most
important papers in all of economics, and argues that the standard
Brownian motion / random walk model of Wall Street is wrong. The crux of
the argument is that these standard theories do not allow enough large
deviation days. For more on this, see Mandelbrot-Hudson: The (mis)behavior
of markets (I have a copy of this book and can lend you part of it if you
are interested).
- We saw that for many limits of the form 0/0, a good way to attack the
problem is to switch to polar coordinates. We then replace (x,y) goes to
(0,0) through an arbitrary path with r tends to 0 and θ does
whatever it wants. This works for many problems, and is a good thing to try.
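- As a worked example (the function is my choice): for \(\lim_{(x,y) \to
(0,0)} x^2 y/(x^2+y^2)\), setting \(x = r\cos\theta\), \(y = r\sin\theta\)
gives \(r^3 \cos^2\theta \sin\theta / r^2 = r \cos^2\theta \sin\theta\),
which is at most \(r\) in absolute value no matter what \(\theta\) does, so
the limit is 0 along every path. Contrast this with \(xy/(x^2+y^2) =
\cos\theta \sin\theta\), where the \(r\)'s cancel entirely and the value
depends on the angle of approach, so that limit does not exist.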
- For the Star Trek fans, there's an interesting Next Generation episode
which I forgot to mention: "The
Loss". The premise is that the Enterprise runs into two-dimensional
beings, and there are singularity issues. There's also a cosmic string
fragment (or some such technobabble, I forget exactly what). The reason I
was going to mention this is to talk about singularities and domains of
definitions of the function. Basically, things on the Enterprise go haywire
b/c part of it is now being intersected by the plane of two-dimensional
beings. There are some animations of the Enterprise being intersected by the
plane (i.e., a level set!).
- We ended by giving the definition of a partial
derivative, and hinting at some of the key properties. Some questions to
think about: what is the correct generalization of the definition of a
function of one-variable being differentiable to a function of several
variables being differentiable? Is it enough for all partial derivatives to
exist? Can we always interchange the order of two partial derivatives?
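- For the record, the definition: the partial derivative of \(f\) with
respect to \(x\) at \((a,b)\) is \(\frac{\partial f}{\partial x}(a,b) =
\lim_{h \to 0} [f(a+h,b) - f(a,b)]/h\); we freeze every variable but one and
take a one-variable derivative. Keep the questions above in mind as we
proceed; the answers are more subtle than in one variable.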
-
Sixth day lecture:
http://www.youtube.com/watch?v=71_8kHSAE4w
(February 21, 2014: Limits, Defn Partial Derivatives)
- Monday, February
12.
We started with the three big change of variables (polar, cylindrical and
spherical), then moved on to a discussion on
functions and level
sets. These occur all the time in real world plots. For example, weather
maps constantly show lines of constant temperature; these are called isotherms.
-
There are many different coordinate systems we can use; depending on the
symmetry of the problem, frequently it is advantageous to use one system
over another. We saw in class how complicated regions were reduced to
simpler regions. As a rule of thumb, it's better to have a harder integral
over a nicer region (rectangle, box) than a simpler integral over a more
complicated region. Three of the most common coordinate systems (after
Cartesian) are the following:
- polar
coordinates: used most often in the plane where our quantity depends
only on the distance from the origin. Note that the filled in circle of
radius 1 in polar coordinates corresponds to the rectangle \(0 \le r \le
1\) and \(0 \le \theta \le 2\pi\). Thus changing variables replaces a
`hard' region with a simple rectangle (and it is much easier to integrate
over rectangles!). In Cartesian coordinates the area of the unit circle is
\(4 \int_0^1 \sqrt{1 - x^2}\, dx\); this can be done directly but the
integration is a bit of a chore, while in polar coordinates it is simply
\(\int_0^{2\pi} \int_0^1 r \, dr \, d\theta = \pi\). Doing both will
hopefully give you an appreciation for the power of changing variables.
- cylindrical coordinates: generalization of the above where now we live
in three dimensional space, but the part depending on x and y only depends
on \(x^2 + y^2\).
- spherical
coordinates: another generalization to three dimensions, where we only
depend on the distance from the origin.
- Of course, the story does not end in three dimensions. For many
problems we need to work with an n-dimensional
sphere, and the resulting coordinate system.
- One can easily assign
homework involving sketching various curves, many of which are famous conic
sections. The shapes that arise are often ellipses,
hyperbolas, parabolas, lines and circles.
The theory of conic sections says that these are all related, and arise as
cross sections obtained by having planes intersect a cone at various angles.
These shapes arise throughout mathematics and science. Here are just a few
examples, which illustrate their importance.
- Chemistry / Physics: The ideal
gas law states that PV = nRT. If we set T equal to a constant, we then
get PV is constant (this special case is called Boyle's
law). Note that this is an equation of a hyperbola, and thus the isotherms (level
sets of constant temperature) are hyperbolas.
- Physics / Astrophysics: The most common example of conic section is
orbits of planets. In three-dimensional space, planets orbiting the sun
under a gravitational force proportional to the inverse-square of the
distance travel in ellipses, hyperbolas, or parabolas (see
here for more details).
- It is not too hard for us to imagine what it would be like for a sphere
to enter a plane, but it does become harder and harder to imagine four
dimensional objects arriving in our three dimensional world. One of my
favorite stories is the classic Nightfall (by
Isaac Asimov). What makes this such a great story is that he takes something
that is conceivable for us and creates a world where it is inconceivable for
the population. I strongly urge you to read this story.
- The video clips for Flatland are available here: Flatland
trailer. (The
full movie is available here for class purposes only.) There's also projections
of 4-dimensional cubes in our 3-dimensional space. I find it very hard
to imagine four dimensional objects passing through our space, but it's a
fun exercise. We can imagine a sphere passing through Flatland; what would a
4-dimensional sphere look like passing through our space? We can imagine a
3-dimensional cube passing through Flatland (preferably at an angle, as
otherwise it's nothing, a full square for a while, and then nothing again);
what happens with a 4-dimensional cube going through our space?
-
Kepler's laws of planetary motion heavily
use ellipses, hyperbolas and the like. These
are the famous conic sections, and there's a beautiful unified theory of
them.
- It's not hard to find functions of several variables. Baseball is filled
with these nowadays; one very popular one is the runs
created formula.
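- In its most basic form (there are many refinements), runs created is
\((H + BB) \cdot TB / (AB + BB)\): hits plus walks, times total bases,
divided by at-bats plus walks; a genuine function of several variables.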
- It is easy to mislead or
manipulate people based on how data is presented. The following is one of my
favorite examples. Imagine someone tells you that your team has won 3 of the
last 4 games. You're happy and think they're doing well, with a recent
winning percentage of 75%. However, things are not as rosy as they seem. In
the last four games, imagine they lost the game played four ago. If that
were the case, then since they've won 3 of their last 4 they must have won
their last 3. If this is the case, we'd just say they've won 3 in a row, as
that sounds better. Thus, 4 games ago was actually a win, and they've won 2 out of
the last 3. Moreover, we know what happened 5 games ago. It was a loss (if
it were a win, we would've said they've won 4 of 5). So their last 5 games
are either WWLWL, WLWWL or LWWWL (newest game first). In all cases they've
won 3 of 5 for 60% (or 2 of the last 3 for about 67%). Either way they're
essentially at what an
average team would be, but saying 3 of the last 4 makes them appear
stronger.
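- A quick brute-force check of the reasoning above (a sketch; encoding the
three boasting rules is my reading of the argument): enumerate all win/loss
strings for the last five games, newest game first.

    # Which last-5 records (newest first) make "won 3 of the last 4" the best
    # available boast? The three conditions encode the argument above.
    from itertools import product

    consistent = []
    for record in product("WL", repeat=5):
        if (record[:4].count("W") == 3     # "won 3 of the last 4" is true
                and "L" in record[:3]      # else boast "won their last 3"
                and record[4] == "L"):     # else boast "won 4 of the last 5"
            consistent.append("".join(record))
    print(consistent)  # ['WWLWL', 'WLWWL', 'LWWWL']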
-
Fifth day lecture:
http://youtu.be/np_Icg33Lug
(February 19, 2014:
Coordinate systems, level sets)
- Friday, February
9.
A common feature in several variables is to first recall the one variable
case, and use that as intuition to describe what's happening. We started by
reviewing the three different ways to write the
equation of a line in the plane,
point-slope, point-point and slope-intercept, and talked about the hidden
vector lurking in the equation of a line in a plane. We then generalized this
to higher dimensions, and then wrote down the definition
of a plane (we'll
discuss alternate definitions involving normal vectors later in the course;
note that planes arose in the Super Bowl in 2010, as to whether or not the
Saints had control when the ball broke the plane during the two point
conversion; click
here,
click here or click
here for
more on breaking the plane in football).
- We discussed how there are several different but equivalent ways of
writing the same expression. We can do it with vectors, as in \((x,y,z) = P
+ t\vec{v}\), or we can do it as a series of equations, such as \(x = p_1 +
t v_1\), \(y = p_2 + t v_2\), \(z = p_3 + t v_3\), or as \(x_i = p_i + t
v_i\) with \(i \in \{1,2,3\}\). You should use whichever way is easier for
you to visualize.
It is possible to get so caught up in reductions and compactifications that
the resulting equation hides all meaning. A
terrific example is the great physicist Richard Feynman's reduction of all of
physics to one equation, U = 0, where U represents the unworldliness of the
universe. Suffice it to say, reducing all of physics to this one
equation does not make it easier to solve physics problems / understand
physics (though, of course, sometimes good notation does assist us in
looking at things the right way).
- A nice problem is to prove the following about perpendicular lines:
the product of their slopes is always -1 if neither is parallel to the x- or
y-axis. In some sense, this
tells us that in the special case when the lines are the x- and y-axes, we
should interpret the product of their slopes as -1, or in other words in
this case \(0 \cdot (-\infty) = -1\).
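- One painless proof (a sketch): a line of slope \(m\) has direction vector
\((1, m)\). Two lines are perpendicular exactly when their direction vectors
have dot product zero, and \((1, m_1) \cdot (1, m_2) = 1 + m_1 m_2 = 0\)
forces \(m_1 m_2 = -1\). Note this ties the exercise back to the dot product.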
- There are many applications of equations of lines, planes and
projections. One of my favorites comes from art. The painter Thomas
Eakins projected pictures of
people onto canvases; this allowed him to have realistic pictures, and
saved him hours of computations. Two pictures frequently mentioned are Arcadia and Mending
the Net. He hid what he did; it wasn't until years later that people
noticed he had done this. If memory serves, this was discovered when people
were looking through photographs in an attic and noticed a picture of four
people on a New Jersey highway who were identical to four people in a
seascape. Upon closer inspection of the canvas, they noticed marks (which
were partly hidden) indicating Eakins projected the image onto the canvas. Click
here for more on the subject. See
also here for a nice story on the controversy (the
use of `technology' such as projectors in art). For
a semi-current view on the merits of tracing, watch this video clip.
- There is an enormous literature on the applications of lines, planes,
projections et cetera in art. The
wikipedia article is a good starting point. Another fun example is the
original movie Tron; here
is the light cycle scene. Notice how back then almost everything is
straight lines, and how the computers are dealing with the perspectives.
- The subject has advanced considerably over the years; ray
tracing is huge now, and can
do amazing
things very fast.
- One final nice application is a paper
by Byers and Henle determining
where a camera was for a given picture, which allows us to do a great job
comparing then and now.
- We discussed the equation for the angle between two vectors.
Geometrically, it's clear that if we change the lengths of the vectors then
we shouldn't change the angle; after a little inspection, we saw that our
formula satisfies that property. It is a great skill to be able to look at a
formula and see behavior like this. There is a rich history of applying
intuition like this to problems. One example is dimensional
(or unit) analysis, which is frequently seen in physics or chemistry; my
favorite / standard example is the simple
pendulum.
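- Explicitly, the check from class: the formula is \(\cos(\theta) =
\overrightarrow{v} \cdot \overrightarrow{w} / (||\overrightarrow{v}|| \,
||\overrightarrow{w}||)\). Replacing \(\overrightarrow{v}\) with
\(c\overrightarrow{v}\) for \(c > 0\) multiplies the numerator by \(c\) and
the denominator by \(||c\overrightarrow{v}|| = c||\overrightarrow{v}||\), so
the \(c\)'s cancel and the angle is unchanged, exactly as the geometry
demands.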
- We talked a bit about inequalities a few days ago with the triangle
inequality. Here's some more on the subject.
- We will not cover determinants in
great detail. For us, the most important property is that determinants
are related to the volume of the span of the different directions. We
saw an application of that today in the interpretation of the triple product
formula for \(\overrightarrow{A} \cdot (\overrightarrow{B} \times \overrightarrow{C})\).
- We did a quick introduction to fractals today.
Below I'll describe the results for Pascal Triangle and the applications to
Economics.
- Pascal's Triangle and Binomial Coefficients (http://youtu.be/nrknXC8xmTU):
41 minutes: goes through Pascal's triangle, what it is and why it works,
why the relations hold, and shows how to get from that to Chaos Theory and
Fractals (we can discuss more on this later). Includes a Mathematica video
generating the triangle modulo 2, discussions on related problems, and
issues in computing these triangles quickly. Gives story proofs of some
binomial identities. This is a lecture I gave at the Teachers for Scholars
program in Boston February 2014.
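- If you want to experiment yourself, here is a minimal sketch (in Python
rather than the Mathematica of the video) building the triangle modulo 2
straight from the recurrence; printed with blocks, the Sierpinski triangle
emerges.

    # Pascal's triangle mod 2 via C(n,k) = C(n-1,k-1) + C(n-1,k); the nonzero
    # entries trace out the Sierpinski triangle fractal.
    def pascal_mod2(num_rows):
        rows = [[1]]
        for _ in range(num_rows - 1):
            padded = [0] + rows[-1] + [0]
            rows.append([(padded[i] + padded[i + 1]) % 2
                         for i in range(len(padded) - 1)])
        return rows

    for row in pascal_mod2(16):
        print("".join("#" if v else " " for v in row).center(32))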
- In Economics,
the standard random
walk hypothesis seems to
have lost most of its supporters, though there are variants (and I'm not
familiar with all of them); see also the efficient
market hypothesis and technical
analysis, and all the links there. (There are also many good links on
the wikipedia page on Eugene
Fama). Two famous books (with different conclusions) are Malkiel's A
random walk down wall street and
Mandelbrot-Hudson's The
(mis)behavior of markets (a fractal view of risk, ruin and reward).
Some interesting papers if you want to read more:
- Fourth day lecture:
http://youtu.be/ft2zViDFXcY (February 17, 2014:
Lines)
-
Wednesday, February
7.
We continued our list of applications of the Pythagorean
Theorem.
We saw how it leads to the law
of cosines,
which leads to our angle formula relating the angle in terms of the dot
product.
We then talked about determinants, which will be really useful when we get to
the multidimensional
change of variable formula.
The cross
product will
be very useful in dealing with the geometry of various functions, and occurs
all the time in physics and engineering, ranging from Maxwell's
equations for electromagnetism to
the Navier-Stokes
equation for fluid flow.
-
When asked for a relation between sine and cosine, we used
\(\sin^2 x + \cos^2 x = 1\). This is the natural one to use as we're talking
about areas and the sines and cosines come in as side lengths. There are, of
course, other relations. One important one is the relation between the derivatives:
the derivative of \(\sin x\) is \(\cos x\) and the derivative of \(\cos x\)
is \(-\sin x\) (you have to remember which one gets the minus sign; my
mnemonic is minus sine). In differential
trigonometry, it is essential that
we measure angles in radians. If
we use radians, then the derivative of sin(x) is cos(x) and the derivative
of cos(x) is -sin(x); this is not true if we use degrees. If we use
degrees, we have pesky conversion factors of \(2\pi/360 = \pi/180\) to
worry about (differentiating \(\sin(\pi x/180)\), i.e., sine measured in
degrees, produces a factor of \(\pi/180\)). The proofs of these derivatives follow from the angle
addition formulas; let me know if you want more details about this
(we'll mention this briefly when we do Taylor series of exp(x)).
- In the proof of the Law
of Cosines, the key step was adding an auxiliary line to reduce the
problem to the point where we could apply the Pythagorean Theorem. Learning
how to add these auxiliary lines is one of the hardest things to do in math.
As a good exercise, figure out what auxiliary lines to add to prove the
angle addition formula for sine, namely \(\sin(x+y) = \sin(x) \cos(y) + \cos(x)
\sin(y)\); click
here for the solution. For another example, click
here. One thing to keep in mind is what do we know, what are we building
upon. We know the Pythagorean formula; we thus want right triangles, which
suggests drawing an altitude.
There are a lot of nice theorems about altitudes and other such lines.
- In the proof that the area of the hyper-parallelogram is given by the
absolute value of the determinant (in two dimensions) we wanted to replace
the \(\sin(\theta)\) term with a function of \(\cos(\theta)\). Note how similar this is
to the proof of the law of cosines; we again are trying to reduce our
analysis to something known. We have formulas for the cosines of angles in
terms of dot products, but not for their sines.
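- Filling in that step: the area of the parallelogram spanned by
\(\overrightarrow{A}\) and \(\overrightarrow{B}\) is \(||\overrightarrow{A}||
\, ||\overrightarrow{B}|| \sin(\theta)\); as \(0 \le \theta \le \pi\) we may
write \(\sin(\theta) = \sqrt{1 - \cos^2(\theta)}\), and substituting the dot
product formula for \(\cos(\theta)\) gives Area \(= \sqrt{||\overrightarrow{A}||^2
||\overrightarrow{B}||^2 - (\overrightarrow{A} \cdot \overrightarrow{B})^2}\).
In two dimensions, expanding with \(\overrightarrow{A} = (a_1, a_2)\) and
\(\overrightarrow{B} = (b_1, b_2)\) collapses this to \(\sqrt{(a_1 b_2 - a_2
b_1)^2} = |a_1 b_2 - a_2 b_1|\), the absolute value of the determinant.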
- I mentioned the movie Flatland. The
original story is available here,
while a trailer
from the new movie is here. It's an interesting exercise to think about
what life would be like confined to two dimensions (think of how you eat and
what happens after). Any movie that has squaricles is worth seeing! Star
Trek: The Next Generation dealt with two-dimensional life forms in the
episode The Loss: http://www.youtube.com/watch?v=rSiZd8qGejY (see around 8:30).
- In our course we only deal with integral dimensions, but that misses a
lot! There are many natural phenomena that legitimately have a fractal
dimension. There
are famous papers trying to compute the length of the British coast;
would you be surprised or not surprised to hear that the Finnish coast has a
higher dimension than the British?
- Here is a fun fact for
the day: a medical researcher rediscovers integration and gets 75 citations! The
article on this is here, while the paper
is here.
-
Third day lecture:
http://youtu.be/aIZmP640Sn4
(February 12, 2014: Dot
Products, Cross Products, Determinants, Areas)
- Monday, February
5.
Today we discussed some notation and the basic properties of vectors,
specifically how to add, subtract, and rescale them.
- We also discussed notation for the natural numbers, the integers, the
rationals, the reals and the complex numbers. We will not do too much with
the complex numbers in the course, but it is important to be aware of their
existence. Generalizations of the complex numbers, the quaternions,
played a key role in the development of mathematics, but have thankfully
been replaced with vectors (online
vector identities here). The quaternions themselves can be generalized a
bit further to the octonions (there
are also the sedenions,
which I hadn't heard of until doing research for these comments).
- A natural question to ask is, if all we care about are real numbers,
then why study complex numbers? The reason is that certain operations are
not closed under the reals. For example, consider quadratic polynomials
\(f(x) = ax^2 + bx + c\) with \(a\), \(b\) and \(c\) real numbers. Say we
want to find the roots of \(f(x) = 0\); unfortunately, not all polynomials
with real coefficients have real roots,
and thus finding the solutions may require us to leave the reals. Of course,
you could say that if all you care about is real world problems, this won't
matter as your solutions will be real. That said, it becomes very useful
(algebraically) to allow imaginary numbers such as i = sqrt(-1). The reason
is that it allows us a very clean way to manipulate many quantities.
There is an explicit, closed form expression for the three roots of a cubic;
while it may not be as simple as the quadratic
formula, it does the job. Interestingly, if you look at \(x^3 -
15x - 4 = 0\), the aforementioned method yields \((2 + 11i)^{1/3} +
(2 - 11i)^{1/3}\). It isn't at all obvious, but algebra will show that
this does in fact equal 4! As you continue further and further in
mathematics, the complex numbers play a larger and larger role.
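- To see the algebra: \((2+i)^3 = 8 + 12i - 6 - i = 2 + 11i\), so \(2+i\) is
a cube root of \(2+11i\); likewise \((2-i)^3 = 2 - 11i\). Choosing these
roots, the sum is \((2+i) + (2-i) = 4\), and indeed \(4^3 - 15 \cdot 4 - 4 =
64 - 60 - 4 = 0\). A real answer, but the path runs through the complex
numbers!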
- The proof that the length of a vector is the square-root of the sum of
the squares is a nice example of a proof
by induction (see also my
notes here). There are many statements in mathematics that can be proved
using this technique, and if you plan on continuing in math/physics this is
worth learning.
- Years ago I prepared a short handout for some of my students on various
proof techniques (click
here); it goes through several of the standard methods.
-
We ended the mathematical lecturing with the definition of the inner
or dot product. While our definition only works for vectors, it turns
out this is one of the most useful ideas in mathematics, and can be
generalized greatly. For example, we can talk about the dot product of
functions! We've seen a bit how the dot product is related to angles and
lengths, and thus we will find that we can discuss in a sensible manner what
the `angle' is between sin(x) and cos(x)! You can look at special cases to
get a sense of the reasonableness of the formula (take two perpendicular
vectors, or see what happens when you rescale their lengths).
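- To make the teaser concrete (with one standard choice of inner product, my
choice here): define \(\langle f, g \rangle = \int_0^{2\pi} f(x) g(x)\, dx\).
Then \(\langle \sin, \cos \rangle = \int_0^{2\pi} \sin(x) \cos(x)\, dx =
\frac{1}{2} \int_0^{2\pi} \sin(2x)\, dx = 0\), so by the angle formula
\(\sin(x)\) and \(\cos(x)\) are `perpendicular'; this observation is the
starting point of Fourier analysis.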
- While
looking at special cases of the angle formula can appear convincing, it
suffers a severe drawback: we're only looking at special vectors, either
parallel or perpendicular or rescaling a known case. There's a real danger
of drawing the wrong conclusion from special cases. For example, if we only
looked at right triangles we'd think the sum of the squares of the shorter
sides equals the square of the longer. We must check a truly generic case to
get some real security; unfortunately, it's hard to check those cases as we
don't know the angles! Related to this are some nice stories about people
taking advantage of processes that were supposed to be random but weren't. A
nice recent example is with scratch lottery tickets (see
here for the Wired article, and here
for another). For another example, there are some very small errors the
Germans made with their Enigma code during WWII, which allowed the Allies to
read all German military orders! See the Wikipedia
article on Ultra (Ultra was
the code given to allied decrypt efforts), as well as Articles
from the NSA on cryptography (this
is a link to many subpages). Two especially good and accessible ones deal
with the German
code Enigma, and Ultra,
the allied deciphering of it. I strongly urge you to look at the links
here. Another nice one is on the Battles
of Coral Sea and Midway. An amusing story involves a Civil
War message just decoded -- fortunately it wasn't needed! (Another
version of the story here.) This is nice application of the Vigenere
cipher (see
also the notes by my colleague here on how to crack it). This is yet
another example of what was supposed to be a random pattern not being truly
random, and thus susceptible to attack.
-
The Fibonacci
numbers (here
is a great video) show up in a variety of places. They
satisfy the following recurrence relation: F_{n+2} = F_{n+1} + F_n (with the
initial conditions F_0 = 0 and F_1 = 1). After a little inspection one sees
that the entire sequence is determined once we know two consecutive numbers,
as we can just use the recurrence relation. There are many fun applications
of Fibonacci (and other recurrence relations) in nature; perhaps my favorite
is proving why Double Plus One is a bad strategy in roulette (though many
websites, like
the one here,
don't seem to realize the danger, or perhaps deliberately avoid stating
it!). If you're interested in gambling applications of this (or other
aspects), just let me know;
here's a short video
clip explaining how the Fibonacci numbers show this is a poor strategy!
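- Here is a minimal sketch of generating the sequence straight from the
recurrence; note that two consecutive values are all the state it ever needs.

    # Fibonacci numbers from F_{n+2} = F_{n+1} + F_n with F_0 = 0, F_1 = 1.
    def fibonacci(n):
        f_curr, f_next = 0, 1      # (F_0, F_1)
        values = []
        for _ in range(n):
            values.append(f_curr)
            f_curr, f_next = f_next, f_curr + f_next
        return values

    print(fibonacci(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]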
- Second day lecture: (February 10,
2014: Definition and properties of vectors, through dot product)
- Friday, February
2. The main result we
proved today was the
Pythagorean Theorem,
which relates the length of the hypotenuse of a right triangle to the lengths
of the sides (President
Garfield is credited with a proof). For us, this result is important as it
gives us a way to compute the length of vectors. While we only proved it in
the special case of a vector with two components, the result holds in general.
Specifically, if \(v = (v_1, \dots, v_n)\) then \(||v|| = \sqrt{v_1^2
+ \cdots + v_n^2}\). It is a nice exercise to prove this.
One way is to use
Mathematical
Induction (one common image for induction is that of
following dominoes);
see also my handout on induction.
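- A sketch of the inductive step (one standard approach, not the only one):
assume \(||(v_1, \dots, v_n)|| = \sqrt{v_1^2 + \cdots + v_n^2}\). The vector
\((v_1, \dots, v_n, v_{n+1})\) is the hypotenuse of a right triangle whose
legs are \((v_1, \dots, v_n, 0)\), of length \(\sqrt{v_1^2 + \cdots + v_n^2}\)
by the inductive hypothesis, and \((0, \dots, 0, v_{n+1})\), of length
\(|v_{n+1}|\). These legs are perpendicular, so the two-dimensional
Pythagorean Theorem gives \(||v||^2 = (v_1^2 + \cdots + v_n^2) + v_{n+1}^2\),
completing the induction.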
Below are some additional remarks. These relate to material mentioned in
class. The comments below are entirely for your personal enjoyment and
edification. You do not need to read these for the class. These are meant to
show how topics discussed arise in other parts of mathematics / science; these
will not be on exams, you are not responsible for learning them, ....
- Later in the semester we will revisit
Monte Carlo
Integration, whose founding paper is called by many the most important mathematical paper
of the 20th century. Sadly, most integrals cannot be evaluated in closed form,
and we must resort to approximation methods.
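- As a tiny preview, here is a sketch of the idea (the integrand
\(\exp(-x^2)\), which has no elementary anti-derivative, is my choice of
example): average the function at random sample points.

    # Monte Carlo integration: estimate the integral of f on [0,1] by the
    # average of f at uniformly random points; error shrinks like 1/sqrt(n).
    import math
    import random

    def monte_carlo(f, n_samples=1_000_000):
        return sum(f(random.random()) for _ in range(n_samples)) / n_samples

    print(monte_carlo(lambda x: math.exp(-x * x)))  # roughly 0.7468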
- Sabermetrics is
the `science' of applying math/stats reasoning to baseball. The formula I
mentioned in class is what's known as the
log-5 method; a better formula is the
Pythagorean Won
- Loss formula (someone linked
my paper deriving this from a reasonable model to the wikipedia page).
ESPN, MLB.com and all sites like this use the Pythagorean win expectation in
their expanded standings. My derivation is a nice exercise in multivariable
calculus and probability; we will either derive it in class or I'll give a
supplemental talk on it.
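- For reference, the formula predicts a team's winning percentage from its
runs scored \(RS\) and runs allowed \(RA\) as \(RS^\gamma / (RS^\gamma +
RA^\gamma)\); Bill James's original exponent was \(\gamma = 2\) (hence the
name), and exponents near 1.8 fit baseball data better.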
- In general, it is sadly the case that most functions do not have a simple
closed form expression for their anti-derivative. Thus integration is
orders of magnitude harder than differentiation. One of the most famous
functions that cannot be integrated in closed form is \(\exp(-x^2)\),
which is related to
calculating areas under the normal (or bell or Gaussian) curve. We do at least
have good series expansions to approximate it; see the entry on the
erf (or error) function.
- For example, the anti-derivative of \(\ln(x)\) is \(x \ln(x) - x\); it
is a nice exercise to compute the anti-derivative of \((\ln(x))^n\) for
any positive integer \(n\), and it shows how much harder integration is than
differentiation. For example, if \(n = 4\) we get \(24x - 24x \ln(x) +
12x (\ln(x))^2 - 4x (\ln(x))^3 + x (\ln(x))^4\).
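- A quick way to generate and check these, assuming the sympy library:

    # Anti-derivatives of (ln x)^n via sympy (integrate omits the arbitrary
    # constant of integration).
    from sympy import symbols, log, integrate, expand

    x = symbols('x', positive=True)
    for n in range(1, 5):
        print(n, expand(integrate(log(x)**n, x)))
    # The n = 4 line matches x ln(x)^4 - 4x ln(x)^3 + 12x ln(x)^2 - 24x ln(x) + 24x.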
- Here are some links to general articles:
- Finally, the quest to understand the cosmos played an enormous role in the
development of mathematics and physics. For those interested, we'll go to the
rare books library and see first editions of Newton, Copernicus, Galileo,
Kepler, .... Some interesting stories below; see also a great article
by Isaac Asimov on all of this, titled
The Planet That Wasn't.
-
First day lecture:
http://youtu.be/Sabcuhsxekg (February 7, 2014:
Introduction, Pythagoras)