Additional comments related to material from the
class. If anyone wants to convert this to a blog, let me know. These additional
remarks are for your enjoyment, and will not be on homeworks or exams. These are
just meant to suggest additional topics worth considering, and I am happy to
discuss any of these further.
-
Monday, December 5. We ended the semester
with a brief tour of how so many of our different topics tie together. I hope
you enjoyed exploring these areas this semester and seeing the connections.
- Slides for the talk online here:
http://web.williams.edu/Mathematics/sjmiller/public_html/math/talks/Michigan2012Part1.pdf
- Video for the talk here:
http://youtu.be/JV4KtPuLWT0
- We talked about a lot of material, here are some links for more reading.
- Links
- Hayes: The
Spectrum of Riemannium: a light description of the connection between
random matrix theory and number theory (there are a few minor errors in the
presentation, basically to simplify the story). This is a quick read, and
gives some of the history.
- Firk and Miller: Nuclei,
primes and the Random Matrix connection: a survey paper on the history of
the subject, including both the nuclear physics experiments and the
theoretical calculations.
- Conrey: L-functions
and Random Matrix Theory: This is a high level description of the
similarities between number theory and random matrix theory.
- Katz-Sarnak: Zeros
of Zeta Functions and Symmetry: Another high level article similar to the
others (email me for a copy).
- Diaconis: Patterns
in Eigenvalues: this is a bit more readable than the others, and is based
on a distinguished lecture he delivered.
- Rudnick-Sarnak: Zeros
of principal L-functions and random matrix theory: This paper analyzes the
n-level correlations of zeros of automorphic L-functions and shows agreement
with random matrix theory; included in the paper are bounds towards Ramanujan
and the explicit formula for GL(n). For more papers, see Zeev
Rudnick's homepage.
- Iwaniec-Luo-Sarnak: Low
lying zeros of families of L-functions: This is a must read.
This is the first major paper calculating the 1-level density for families of
L-functions.
- Hughes-Miller: Low
lying zeros of L-functions with orthogonal symmetry. This paper
generalizes the results of Iwaniec-Luo-Sarnak to the n-level density. The
difficulty is in handling the combinatorics to show agreement with RMT.
- Rubinstein: Low
lying zeros of L-functions and Random Matrix Theory: this is his
dissertation, and in it he analyzes the 1-level density of the family of
quadratic Dirichlet characters, and shows agreement with Random Matrix Theory.
This is one of the easiest families to look at, and a great testing ground. The
published paper (in Duke) is here.
- Conrey-Snaith: Applications
of the L-functions Ratios Conjecture: This is a very recent conjecture
which is enjoying remarkable success in predicting answers. I somewhat
jokingly call it the conjecture of the four lies, as there are five steps and
four of the steps are provably wrong (ie, the assumptions in those steps
fail); however, miraculously, all the errors seem to cancel to a phenomenal
degree! I've become very interested in testing this conjecture as much as
possible, and have written several papers in this area (and have ideas for a few
more which will be very accessible).
- Miller: A
symplectic test of the L-functions Ratios Conjecture: This paper builds on
those by Conrey-Snaith and Rubinstein and uses the Ratios Conjecture to
predict the lower order terms up to square-root cancellation, and then shows
(for suitable test functions) that this is the correct answer. An obvious
project is to generalize this test for other families or to enlarge the
support.
- Duenez-Miller: The
effect of convolving families of L-functions on the underlying symmetry.
In this paper we show how one may determine the corresponding classical
compact group for convolutions of certain families of L-functions. This paper
was motivated by The
low-lying zeros of a GL(4) and a GL(6) family of L-functions (Duenez-Miller),
which disproved a folklore conjecture on the corresponding classical compact
group.
- Miller (with an appendix by Duenez): Investigations
of zeros near the central point of elliptic curve L-functions. In this
paper we look at the experimental data of the first few zeros above the
central point in families of elliptic curves, with and without rank. We see
the effect of rank, and are led to certain conjectures as to the behavior of
low-lying zeros for finite conductors. We know what the behavior of these
zeros is in the limit as the conductors tend to infinity; see 1
and 2 level densities for rational families of elliptic curves (Steven
J Miller) and Low-lying
zeros of families of elliptic curves (Matthew
Young).
-
Monday, December 1. Two presentations
today emphasizing how to do calculations.
- Partial
summation is an extremely important technique; frequently a lot of work is
required to get expressions into a nice enough form so that the calculations can
be done and be useful. The idea of a
telescoping series
is very nice, or a series that is constant; these allow us to easily move /
extend intervals of consideration. One of the major techniques in my thesis
was to ensure that my oscillating function didn't traverse a bounded interval
on the order of n times (as I had n steps). To avoid that catastrophe I sieved and
looked at a subsequence that was monotone increasing, so I would go through it
at most once and get a universal bound independent of n. For this a useful
ingredient is bounded
variation.
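- As a concrete illustration (my own sketch, not from the lecture), here is the discrete Abel / partial summation identity checked numerically: with partial sums \(A(n) = a_1 + \cdots + a_n\), we have \(\sum_{n \le N} a_n f(n) = A(N) f(N) - \sum_{n \le N-1} A(n) (f(n+1) - f(n))\). The choices \(a_n = 1\), \(f(n) = 1/n\) are just for the test.
```python
# Minimal check of the partial (Abel) summation identity; both sides
# should equal the N-th harmonic number for a_n = 1, f(n) = 1/n.

def partial_summation(a, f, N):
    cumulative = []
    A = 0.0
    for n in range(1, N + 1):
        A += a(n)
        cumulative.append(A)          # cumulative[n-1] = A(n)
    total = cumulative[-1] * f(N)     # A(N) f(N)
    for n in range(1, N):
        total -= cumulative[n - 1] * (f(n + 1) - f(n))
    return total

N = 1000
direct = sum(1.0 / n for n in range(1, N + 1))
via_abel = partial_summation(lambda n: 1.0, lambda n: 1.0 / n, N)
print(direct, via_abel)   # the two values agree
```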
- The second item was on the
Mellin transform;
this is just the
Fourier transform after a logarithmic
change of variables. A great example of where this could be useful is in
Benford's law analysis: are you looking at the original data (if so, use Mellin)
or the logarithmic transform of it (if so, use Fourier)? Frequently you'll find
a book with identities in one but not the other, but you can convert from
identity to identity (a small numerical sketch of this change of variables
appears after the paper links below).
- Fourier: The Modulo 1 Central Limit Theorem and Benford's Law for
Products (with Mark Nigrini), International
Journal of Algebra. (2 (2008),
no. 3, 119--130). pdf
- Fourier: Order statistics and Benford's law (with Mark Nigrini), International
Journal of Mathematics and Mathematical Sciences (Volume
2008 (2008), Article ID 382948, 19 pages, doi:10.1155/2008/382948) pdf
- Mellin: Chains of distributions, hierarchical Bayesian models and
Benford's Law (with D. Jang, J. U. Kang, A. Kruckman and J. Kudo), Journal
of Algebra, Number Theory: Advances and Applications. (volume 1, number 1
(March 2009), 37--60) pdf
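- Here is the promised sketch (my own, assuming numpy is available; the test function and quadrature parameters are my choices): the Mellin transform \(\int_0^\infty f(x) x^{s-1} dx\) becomes an ordinary exponential integral \(\int_{-\infty}^\infty f(e^u) e^{su} du\) after the substitution \(x = e^u\). For \(f(x) = e^{-x}\) both should equal \(\Gamma(s)\).
```python
import numpy as np
from math import gamma

def mellin_direct(f, s, xmax=50.0, n=200000):
    x = np.linspace(1e-8, xmax, n)
    return np.trapz(f(x) * x**(s - 1), x)

def mellin_via_log(f, s, umin=-20.0, umax=5.0, n=200000):
    # substitute x = e^u, so x^{s-1} dx = e^{su} du
    u = np.linspace(umin, umax, n)
    return np.trapz(f(np.exp(u)) * np.exp(s * u), u)

f = lambda x: np.exp(-x)
s = 3.5
print(mellin_direct(f, s), mellin_via_log(f, s), gamma(s))  # all three ≈ 3.3234
```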
-
Monday, November 24. We ended our unit on
calculating the limiting spectral measure for the real symmetric ensemble,
seeing how the combinatorics emerge. This is just the beginning of a very vast
field. Due to lack of time we cannot investigate the myriad of
opportunities, but the next step is seeing how different structures on the
matrices affect the combinatorics. Some ensembles are described in the
additional comments for Friday, November 21.
- McKay's paper on d-regular graphs:
http://cs.anu.edu.au/~bdm/papers/RandRegEigenvalues.pdf
- Graphs have a wealth of applications, and their eigenvalues carry a lot of
meaning. For example, for d-regular graphs, if the largest eigenvalue has
multiplicity 1 then the graph is connected, while the gap between the largest
and second largest eigenvalues is related to how quickly information propagates through
the network. For more on the conjectured size of this gap, see the work of Friedman:
http://arxiv.org/pdf/cs/0405020v1
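- A tiny numerical check of the connectivity statement above (my own example, assuming numpy): a 6-cycle is a connected 2-regular graph and its largest adjacency eigenvalue 2 is simple, while two disjoint triangles are 2-regular but disconnected, and 2 appears with multiplicity 2.
```python
import numpy as np

def cycle(n):
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
    return A

six_cycle = cycle(6)
two_triangles = np.block([[cycle(3), np.zeros((3, 3))],
                          [np.zeros((3, 3)), cycle(3)]])

for name, A in [("6-cycle", six_cycle), ("two triangles", two_triangles)]:
    evals = np.sort(np.linalg.eigvalsh(A))[::-1]
    mult = int(np.sum(np.isclose(evals, evals[0])))
    print(name, "largest eigenvalue", round(evals[0], 6), "multiplicity", mult)
```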
- See also Ramanujan graphs:
http://en.wikipedia.org/wiki/Ramanujan_graph
- Here is a nice application of random matrix theory to bus routes:
http://authors.library.caltech.edu/3946/1/BAIjpa06.pdf
- Due to time constraints we focused on the limiting spectral measure and
not on the gaps between adjacent eigenvalues, though that is an important
field with much progress in the past 5 years.
- What I hope you got out of this unit was an appreciation of how different
areas of mathematics meet and interact, and how problems in analysis often
reduce to technical machinery plus combinatorics!
- Video online here:
http://youtu.be/Pz43bbE_sPE
-
Friday, November 21. We continued our
analysis. If anyone is reading this let me know and I'll put more comments
here! So far one response, and I have added more.
- We built on the method of moments and started doing the integration. A key
fact was that if we have \(\int a_{ij}^r p(a_{ij}) da_{ij}\) then this is 1 if
r is 0 (as \(p\) is a probability distribution it integrates to 1) or if r is 2
(as \(p\) has mean zero and variance 1, and thus this is the same as the
variance), while if r is 1 it is zero (as \(p\) has mean zero); for higher r
we have a finite value by assumption. While we have on the order of \(N^2/2\)
integrals to do, in the k-th moment most of them are 1. In fact, when we expand the
trace we have a polynomial in the matrix elements of degree k, and thus at
most k of the integrals are not 1.
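- A hedged Monte Carlo sketch of this calculation (my own; the matrix size, number of trials and numpy usage are my choices): sample real symmetric matrices with mean 0, variance 1 entries, rescale the eigenvalues by \(\sqrt{N}\), and compare the empirical moments with the semicircle moments (0 for odd k, Catalan numbers for even k).
```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 200, 50
moments = np.zeros(5)

for _ in range(trials):
    A = rng.standard_normal((N, N))
    A = (A + A.T) / np.sqrt(2)                    # symmetric, off-diagonal variance 1
    evals = np.linalg.eigvalsh(A) / np.sqrt(N)    # eigenvalues roughly in [-2, 2]
    for k in range(1, 6):
        moments[k - 1] += np.mean(evals**k) / trials

print(moments)   # ≈ [0, 1, 0, 2, 0]: odd moments vanish, even ones approach 1, 2
```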
- We saw the method of moments at work in going from knowing sums of powers
of the eigenvalues to determining their values.
- Typically we cannot convert from knowing the coefficients of a polynomial
to a closed form expression for the roots; we can only do that for small
degrees.
- Moments of the Gaussian are double factorials:
http://en.wikipedia.org/wiki/Double_factorial
- Moments of the semi-circle are the Catalan numbers:
http://en.wikipedia.org/wiki/Catalan_number
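- A quick numerical check of the two facts linked above (my own sketch, assuming numpy; integration grids are my choices): the 2k-th moment of the standard Gaussian is the double factorial (2k-1)!!, and the 2k-th moment of the semicircle density \((1/2\pi)\sqrt{4-x^2}\) on \([-2,2]\) is the Catalan number \(C_k\).
```python
import numpy as np
from math import comb

x_g = np.linspace(-12, 12, 400001)
gaussian = np.exp(-x_g**2 / 2) / np.sqrt(2 * np.pi)

x_s = np.linspace(-2, 2, 400001)
semicircle = np.sqrt(4 - x_s**2) / (2 * np.pi)

for k in range(1, 5):
    double_factorial = int(np.prod(np.arange(1, 2 * k, 2)))   # (2k-1)!!
    catalan = comb(2 * k, k) // (k + 1)
    gauss_moment = np.trapz(x_g**(2 * k) * gaussian, x_g)
    semi_moment = np.trapz(x_s**(2 * k) * semicircle, x_s)
    print(2 * k, round(gauss_moment, 3), double_factorial,
          round(semi_moment, 3), catalan)
```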
- The main insight in our analysis is that most of the matchings do
not contribute; one of my advisors (Iwaniec) refers to this as doing a
good job counting on your hands, getting a good sense of the number of degrees
of freedom. If we can show that even if everything contributed fully it is
negligible in the limit, then there is no need to carefully figure out its
contribution! What really matters is getting the correct growth rate for the
contributions as a function of \(N\); we often have factors of 2 coming from
either being on the same side of the main diagonal or opposite sides; while
these matter if we want to figure out the precise answer, they often
contribute at most \(2^k\) and this is independent of \(N\).
- In general, if we impose structure on the matrix that will affect the
contribution of different matchings. The more structure we have, the more
chances we have to match. In real symmetric matrices we have very little
structure, and if things are matched in pairs then \(a_{ij}\) and \(a_{mn}\)
are equal if and only if the pairs \((i,j)\) and \((m,n)\) are equal. The
situation is very different if we consider Toeplitz matrices, which are
constant along diagonals. Now there are a lot more choices for the indices to
correspond to the same independent variable. Below are links to some
supervised work I've done in the area.
- Distribution of eigenvalues for the ensemble of real symmetric Toeplitz
matrices (with Chris Hammond). Journal
of Theoretical Probability (18 (2005),
no. 3, 537-566). pdf
- Distribution of eigenvalues of real symmetric palindromic Toeplitz
matrices and circulant matrices (with Adam Massey and John Sinsheimer), Journal
of Theoretical Probability. (20 (2007),
no. 3, 637--662.) pdf
- The distribution of the second largest eigenvalue in families of random
regular graphs (with Tim Novikoff and Anthony Sabelli), Experimental
Mathematics. (17 (2008),
no. 2, 231--244.) pdf
- Distribution of eigenvalues for highly palindromic real symmetric Toeplitz
matrices (with Steven Jackson and Thuy Pham), Journal
of Theoretical Probability. (25 (2012),
464--495) pdf
- The Limiting Spectral Measure for Ensembles of Symmetric Block Circulant
Matrices (with Gene S. Kopp, Murat Koloğlu,
Frederick Strauch and Wentao Xiong). Journal
of Theoretical Probability (26 (2013), no. 4, 1020--1060) pdf
- The expected eigenvalue distribution of large, weighted d-regular graphs
(with Leo Goldmakher, Cap Khoury and Kesinee Ninsuwan). Random
Matrices: Theory and Applications. (3 (2014),
no. 2, 1450015, 22 pages) pdf
- Distribution of eigenvalues of weighted, structured matrix ensembles (with
Olivia Beckwith, Victor Luo, Karen Shen and Nicholas Triantafillou), submitted
to INTEGERS. pdf
- Video here:
http://youtu.be/Ij-t_KHt9KU
-
Wednesday, November 19. We talked about how to determine the
appropriate scale for RMT, the Eigenvalue Trace Lemma, the Triangularization
Lemma, and the Method of Moments.
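- A tiny numerical check (my own, assuming numpy) of the Eigenvalue Trace Lemma mentioned above: for a real symmetric matrix \(A\), \({\rm Tr}(A^k) = \sum_i \lambda_i(A)^k\).
```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
A = A + A.T                                # make it symmetric

eigenvalues = np.linalg.eigvalsh(A)
for k in range(1, 5):
    lhs = np.trace(np.linalg.matrix_power(A, k))
    rhs = np.sum(eigenvalues**k)
    print(k, round(lhs, 8), round(rhs, 8))   # the two columns agree
```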
-
Monday, November 17.
We've begun in earnest our study of Random Matrix Theory. See the article
by Brian Hayes for
a bit of the history of the connection between Random Matrix Theory and Number
Theory (though there are a few math mistakes in the article!). We use the
Moment Technique to prove Wigner's
Semicircle law;
see the article
by Jacob Christiansen for
an introduction to the moment problem (given a sequence of non-negative
numbers, do they represent the moments of a probability distribution and if
so, is there only one distribution with these moments?); the interested reader
is strongly encouraged to read this article to get a sense of the problem of
how moments may or may not specify a probability distribution. The semicircle
law is what one obtains for the density of eigenvalues of real symmetric
matrices whose independent entries are chosen from a mean 0,
variance 1 distribution with finite higher moments; if we look at other sets
of matrices with different structure, very different behavior is seen.
Terrific examples are the densities for d-regular graphs or for Toeplitz
matrices (see our book for more details).
- We
reviewed the probability we'll need, moments and the method
of moments (note
the Wikipedia entry specifically mentions Wigner's
Semi-Circle Law),
and that expectation is
linear. A good exercise is to find,
if possible,
two dependent random variables such that the expected value of the sum is not
the sum of the expected values (if we let \(X_1\) be the roll of a fair die,
and \(X_2 = 1/X_1\) -- does that work?).
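- A brute-force check of the die example (my own, using exact fractions): linearity of expectation holds here even though \(X_1\) and \(X_2 = 1/X_1\) are dependent; in fact it holds for any random variables, dependent or not.
```python
from fractions import Fraction

outcomes = range(1, 7)                       # faces of a fair die
E_X1 = sum(Fraction(x, 6) for x in outcomes)
E_X2 = sum(Fraction(1, x) * Fraction(1, 6) for x in outcomes)
E_sum = sum((Fraction(x) + Fraction(1, x)) * Fraction(1, 6) for x in outcomes)

print(E_sum == E_X1 + E_X2)   # True: E[X1 + X2] = E[X1] + E[X2]
```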
-
The reason we are able to prove results such as Wigner's
Semi-Circle Law (and
so much more) is the Eigenvalue Trace Lemma. More generally, one can consider
similar problems in number theory, such as the density of zeros of the Riemann
zeta function (or more general L-functions)
or the spacings between adjacent zeros. The problem is that while there are
generalizations of the Eigenvalue Trace Lemma (such as Riemann's
Explicit Formula),
these formulae are useless unless accompanied by a good averaging formula.
We'll see more about this later, but briefly: if we can't make sense of
Trace(\(A^k\)), does it really help to express the moments in terms of it?
While we have nice averaging formulas in linear algebra, we don't have nearly
as good formulas in number theory (excellent PhD thesis topics here -- the
lack of these averaging formulas is holding up a lot of progress on a variety
of problems!).
- A
fascinating aside is \(d\)-regular random graphs. These have enjoyed
remarkable success in building efficient networks. There are known limits as
to how far the second largest eigenvalue can be from the largest in a
connected \(d\)-regular graph. Graphs with large separations are called Ramanujan
graphs;
this is a terrific topic for an aside, and I have a lot of literature I can
share.
- General reading:
- Hayes: The
Spectrum of Riemannium: a light description of the connection between
random matrix theory and number theory (there are a few minor errors in the
presentation, basically to simplify the story). This is a quick read, and
gives some of the history.
- Conrey: L-functions
and Random Matrix Theory: This is a high level description of the
similarities between number theory and random matrix theory.
- Katz-Sarnak: Zeros
of Zeta Functions and Symmetry: Another high level article similar to the
others.
- Diaconis: Patterns
in Eigenvalues: this is a bit more readable than the others, and is based
on a distinguished lecture he delivered.
- Video here:
http://youtu.be/kiswyjwDirg
-
Friday, November 14. Today we wrote down the Circle Method
calculation for a variety of problems and discussed counting and bounding
heuristics, and had our first presentation.
- Rubinstein paper on the Hardy-Littlewood Constant and Twin Primes:
http://www.jstor.org/stable/2324298
- Rubinstein and Sarnak: Chebyshev's bias:
http://www.math.uwaterloo.ca/~mrubinst/publications/Chebyshev.pdf
- Hardy-Littlewood Conjectures:
- Twin prime conjecture:
http://en.wikipedia.org/wiki/Twin_prime#First_Hardy.E2.80.93Littlewood_conjecture
- k-tuple conjecture:
http://mathworld.wolfram.com/k-TupleConjecture.html (notice the
integral!).
- Prime constellation:
http://mathworld.wolfram.com/PrimeConstellation.html
- USS Constellation: NCC 1017:
http://mathworld.wolfram.com/PrimeConstellation.html
- From MemoryAlpha: The Constellation studio model was constructed from a
1966 first edition AMT Enterprise model kit, no. S921. (Star Trek
Encyclopedia, 3rd ed., p.
85) That particular edition sported a decal sheet with only the "NCC-1701"
decal, and, with very limited options, its numbers had to be rearranged to
create the unusually low "NCC-1017" Constellation registry number, the first
new registry actually seen on a ship and, as it turned out, also the only time
in the original airing of the Original Series, though
considered somewhat incongruous by many, due to the perceived discrepancy in
the numbering system.
- We did a lot of heuristics, which is great and a nice way to get a sense
of the magnitude of the answer.
- We did some more order of magnitude estimates. We had \(\sum_{p \le N,\ p,\,p+2\ {\rm prime}}
(\log p)^2\). We got an upper bound of \((\log N)^2 \pi_2(N)\) very easily.
For the lower bound we saw it was good to restrict the sum to primes in the
range \(N^a \le p \le N\), as here, up to a constant, \(\log p\) doesn't change;
if we went all the way down to \(p \sim \log N\) then \(\log p \sim \log\log
N\), much lower. As we believe the number of twin primes up to \(N\) is of
size \(C_2 N/\log^2 N\), this is a fine place to cut. We can trivially
estimate the number of twin primes up to \(N^a\) by \(N^a\), and end with a good
estimate: \(a^2 (\log N)^2 (\pi_2(N) - N^a) \le \sum_{p \le N,\ p,\,p+2\ {\rm prime}} (\log p)^2 \le
(\log N)^2 \pi_2(N)\).
- We also discussed \(\log (p+2) = \log p + \log(1 + 2/p) = \log p +
O(1/p)\); this is a common technique, go for the main term, rewrite as one
plus something small, Taylor expand. Interestingly this was the point the
presenters chose to bring up!
- Brun's sieve:
http://en.wikipedia.org/wiki/Brun_sieve
-
The main idea behind Brun's sieve is the Method
of Inclusion - Exclusion.
-
The inclusion
/ exclusion principle is
one of my favorite methods and is especially important in probability
as it is very easy to accidentally double count events. We can use this to show
that the probability that none of \(n\) people return to their seat (given that
each ordering is equally likely) converges to \(1/e\). Another fun example is
to show that the probability a number is square-free converges to \(6/\pi^2\);
more generally, the probability that it is \(k\)-power free for \(k\) at least
\(2\) is \(1/\zeta(k)\), where \(\zeta(s) = \sum_{n = 1}^\infty 1 / n^s = \prod_{p\
{\rm prime}} (1 - 1/p^s)^{-1}\) (if Re(s) > 1) is the Riemann
zeta function.
Sadly these arguments cannot be used to prove results about how many primes
there are (it comes down to dealing with the error terms in dropping the floor
function,
though this has not stopped lots of amateurs from using this to `prove' some
of the big open problems in number theory).
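- Two quick numerical illustrations of the inclusion-exclusion facts above (my own sketches; the cutoffs are my choices): the probability that none of \(n\) people return to their seat approaches \(1/e\), and the proportion of square-free integers up to \(N\) approaches \(6/\pi^2 = 1/\zeta(2)\).
```python
from math import e, pi, factorial

# derangement probability via the inclusion-exclusion formula
def derangement_probability(n):
    return sum((-1)**k / factorial(k) for k in range(n + 1))

print(derangement_probability(10), 1 / e)

# density of square-free integers up to N (n is square-free if no d^2 divides it)
def squarefree_density(N):
    count = 0
    for n in range(1, N + 1):
        if all(n % (d * d) != 0 for d in range(2, int(n**0.5) + 1)):
            count += 1
    return count / N

print(squarefree_density(10**5), 6 / pi**2)
```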
-
One of the more interesting uses of this principle is in Brun's
sieve,
where he uses inclusion-exclusion to show that there cannot be too many twin
primes. Perhaps
the strangest application of this is that this is how the famous Pentium Bug
was discovered!
(Returning to the seating problem above: what about the more general case,
namely when we reorder and have at least \(r\) people in their correct seats?)
-
Here's Nicely's
webpage. The
calculation being performed was \(\sum_{p:\ p\ {\rm prime\ and\ either}\ p+2\
{\rm or} \ {p-2}\ {\rm is\ prime}} 1/p\); this is known as Brun's
constant.
If this sum were infinite then there would be infinitely many twin primes, proving
one of the most famous conjectures in mathematics;
sadly the sum is finite and thus there may or may not be infinitely many twin
primes (twin primes are two primes differing by 2).
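- A rough sketch of the computation described above (my own, assuming sympy is available for the primality test, and slow for large N): sum \(1/p\) over primes \(p \le N\) for which \(p+2\) or \(p-2\) is also prime. Brun's constant is about 1.902, and the partial sums converge extremely slowly.
```python
from sympy import isprime

def brun_partial_sum(N):
    total = 0.0
    for p in range(2, N + 1):
        if isprime(p) and (isprime(p + 2) or isprime(p - 2)):
            total += 1.0 / p
    return total

for N in (10**3, 10**4, 10**5):
    print(N, brun_partial_sum(N))
```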
- Video online here:
http://youtu.be/_9xI2tmhjHs
-
Wednesday, November 12. We first discussed some of the exam
problems (this discussion was not recorded as some students shared their
work), and then delved into why the binary Goldbach problem is so hard but the
ternary is doable.
- It came down to comparing the major arc main term with the cancellation we
need on the minor arcs. If we are
trying to write a number as a sum of \(s\) primes, we got that the main term was \(\mathfrak{S}_s(N)
N^{s-1}\), where \(\mathfrak{S}_s(N)\) is the singular series and, importantly,
can be bounded away from zero with a bound independent of \(N\) (if the bound
did depend on \(N\), then if it were sufficiently small it could wipe out the
main term).
- We then needed to estimate the minor arc contribution: \(\int_{\mathrm{m}}
f_N(x)^s \exp(-2\pi i N x) dx\), where \(f_N(x) = \sum_{p\le N} (\log p)
\exp(2\pi i p x)\). Unfortunately the only thing we know how to do here is pull
the absolute value inside. This is disastrous as we lose all the oscillation
from the highly oscillatory term \(\exp(-2\pi i N x)\), but at least we still
have oscillation and cancellation from the \(f_N(x)^s\).
- We went through the bounding when \(s=3\). We pulled out one factor of \(f_N(x)\)
and showed that the minor arc contribution is bounded by \(\max_{x \in \mathrm{m}}
|f_N(x)| \cdot \int_0^1 |f_N(x)|^2 dx\) (where we extended the integration from
the minor arcs to the entire interval). We have a lot of cancellation in this
integral (the absolute value is easily handled: \(|f_N(x)|^2 = f_N(x) f_N(-x)\),
and the resulting double sum when we expand collapses to the two primes being
equal); this integral is just \(N \log N\) (approximately), and thus as the
major arcs are of size \(N^2\), we win if we can save a few logarithms in \(|f_N(x)|\)
on the minor arcs. This is how the subject developed, and results on the
distribution of primes in arithmetic progression (RH and GRH for the Riemann
zeta function and Dirichlet \(L\)-functions) come into play.
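- A quick numerical sanity check of the claim above (my own sketch, assuming sympy for the prime generation): by orthogonality \(\int_0^1 |f_N(x)|^2 dx = \sum_{p \le N} (\log p)^2\), so we just compare that sum with \(N \log N\).
```python
from sympy import primerange
from math import log

for N in (10**3, 10**4, 10**5, 10**6):
    S = sum(log(p)**2 for p in primerange(2, N + 1))
    print(N, S / (N * log(N)))   # the ratio slowly approaches 1
```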
- The situation is sadly different for the binary Goldbach problem. The problem is
that if we again pull out one factor of \(f_N(x)\) over the minor arcs we're left
with the \(L_1\)-norm, which we can only estimate by extending the integration
from the minor arcs to the whole interval and getting that it is of size \(\sqrt{N\log
N}\) (to see this use Cauchy-Schwarz). Now if we're going to
have the minor arcs smaller than the major arcs we need the maximum value to
be \(o(N^{1/2}/\log N)\). This is disastrous -- this is smaller than our
estimate of the average value!
- I'm hoping that the above discussion gives you a sense of the potential
and the limitations of the Circle Method. We'll formulate more problems along
these lines on Friday, and we'll hear short reports.
- Read classics:
- Video online here:
http://youtu.be/3n2BTQKdFfo
-
Wednesday, November 5.
Remember no class on Friday (b/c of the exam)
or Monday (to do group work).
- Today we explored the definition of the major arcs. The key idea is that,
to get a sense of their size, we can make crude estimates as we only care
about order of magnitude. This is a general principle: no need to spend too
much time on quantities that don't matter!
- There are a lot of tedious details in the application of the Circle
Method. There was a request to do some more examples in class, which we'll do
next Wednesday (moving the group presentations to Friday).
- Key quantities:
- Major and minor arcs: their definition depends on the cancellation we need
and can get; for Goldbach-type problems the major arcs are embarrassingly
short.
- Singular series: we represent it as a product; it quantifies the local obstructions.
- Waring's problem:
http://www.maths.bris.ac.uk/~sp2331/WaringCircle.pdf (nice set of notes).
- Video online here:
http://youtu.be/5iROetdSRVY
-
Monday,
November 3. We talked about the importance of having a variable
parameter when trying to estimate integrals. When it's not clear where to cut
things, add a free parameter. For example, we were trying to estimate \(\sum_{p
\le N} \log p\). To get a handle on it, we assumed we knew how to evaluate
unweighted sums. Of course, in reality it's the other way around -- we have
information on these weighted sum and we want to pass to unweighted ones! No
one seemed to question this today.... The reason I wanted to do this was not
because we needed this result, but because I wanted to talk about how to get
results along these lines. Assuming good knowledge on the unweighted sum, we
split into to sums, from \(2\) to \(N/\log^\alpha N\) and then from
\(N/\log^\alpha N\) to \(N\). In the second regime the \(\log p\) weights are
essentially constant, differing only by terms of size \(\log \log N\), while
if \(\alpha > 1\) in the first regime there are just an insignificant
number of primes relative to the number in the second term.
-
Read and discuss in groups of at least 2 and be prepared to present in class
on Wednesday:
- Video online here:
http://youtu.be/sF9gBL85lrc
Friday, October 31. Building
on our knowledge of L-functions and primes in arithmetic progression, we now
turn to the Circle Method. In my opinion, this is one of the most beautiful
ideas in all of mathematics. Unfortunately the application of it is often
quite involved and technical, and it is easy to get lost in the calculations.
Similar to our study of primes in arithmetic progressions, the Circle Method
works by showing that because something happens many times, it must happen at
least once!
-
We discussed two of the main ingredients in the Circle Method: (1) writing
expressions as Main Term + Error with good control on the error, and (2)
proving something happens at least once by showing it happens many times. We
discussed prime
races (see
also here),
and how misleading the data can be. Instead of looking at \(\pi_{3,4}(x) -
\pi_{1,4}(x)\) we could look at \({\rm Li}(x) - \pi(x)\); this was also
observed to be positive as far as people could see, but it turns out that they
flip infinitely often as to which is larger. This was shown by Littlewood, but
it was not known how far one must go to see \(\pi(x) > {\rm Li}(x)\). His student Skewes
showed it
suffices to go up to \(10^{10^{10^{34}}}\) if the Riemann
hypothesis is
true, or \(10^{10^{10^{963}}}\) otherwise (as large as these numbers are, they
are dwarfed by Graham's
number).
We 'believe' it's around \(10^{316}\) where \(\pi(x)\) beats \({\rm Li}(x)\) for the first time
(note this is well beyond what we can investigate on the computer!). The proof
involves the Grand Simplicity Hypothesis (that the imaginary parts of the
non-trivial zeros of Dirichlet L-functions are linearly independent over the
rationals); this is used to show that \((n \gamma_1, \dots, n \gamma_k) \bmod
1\) is equidistributed in \([0,1]^k\), where the \(\gamma_j\) are the imaginary
parts of these zeros. Note that this is Kronecker's theorem (which we
discussed in Chapter 12); it's amazing how this result surfaces throughout
mathematics.
-
Instead of looking at \(A+A+\cdots+A\) we could look at \(A-A\); for example,
if \(A = P\) (the set of primes), we know \(2\) is in \(P-P\). What is nice
about the Circle Method is that the way it proves something is in a set like \(A+A+\cdots+A\)
or \(A-A\) is to count how MANY times it is in the set. Thus, the Circle Method will
give heuristics as to how many times \(2\) occurs in \(P(N) - P(N)\), where \(P(N)\)
is the set of primes at most \(N\). This leads to the Hardy-Littlewood
heuristics for
the number of twin primes. In our book we study how many Germain
primes there
are, primes p such that (p-1)/2 is also prime. These primes have applications
in cryptography (in
proving that it is possible to do primality testing in polynomial time (see
also here)
-- if there are as many Germain primes as the Circle Method predicts, certain
primality tests run faster) and in Fermat's Last Theorem (if
x^p + y^p = z^p and p is a Germain prime then p|xyz).
-
The following problem arose in Math 331: can a product of \(k\) consecutive
integers be a perfect \(r \ge 2\) power? Here is a nice paper by Erdos and
Selfridge proving this cannot be done:
https://www.renyi.hu/~p_erdos/1975-46.pdf (there are elementary proofs of
some simpler cases).
-
Here are some notes on evaluating \(\zeta(2)\):
http://www.uam.es/personal_pdi/ciencias/cillerue/Curso/zeta2.pdf
-
We studied the cookie problem, counting the number of ways to divide 10
identical cookies among 5 distinct people (here's the classic clip where Cookie
Monster meets the Count, whose full name is Count von Count -- they
changed how he appears!). Counting is very important; we talked a bit about
counting 'good' games in tic-tac-toe, and ignoring counting obviously bad
ones. (Consider the error in the classic Princess
Bride `Chess' Match or in the Princess
Bride Battle of Wits). It's usually called the stars
and bars problem. What I love here is the power of changing your
perspective -- we go from a very painful brute force approach to being able to
solve it in one line.
-
I have posted the cookie problem on my math
riddles page (email me if you
want to contribute); someone with far more
patience than I solved it by brute force. Here's their solution. The final
number, tabbed from the others, is how many distinct rearrangements we have
of this basic configuration. The total is 1001,
or (10+5-1 choose 5-1). We saw this problem lead to a discussion of
multinomial coefficients. When there are two people getting cookies, say 8
and 2, there aren't (5 choose 2) ways to assign people, but (5 choose 2) *
2! (we choose the 2 people, then there are 2! ways to choose which gets the
8 and which gets the 2). If we have 8 1 1 it's more involved. In that case
it's (5 choose 3) to choose the three people, 3! ways to order which of the
people gets which number, but then we must divide by 2! (as the two people
getting 1 are indistinguishable). A better way to view 3!/2! is 3! / (2! 1!)
(note the numbers on the bottom sum to the top). This is an example of a multinomial
coefficient, a generalization of binomial coefficients. For example, if
we have MISSISSIPPI, there would be 11! ways to order the letters (order
matters) if the letters were
distinguishable, but they're not. So let's put subscripts on the letters: MI1S1S2I2S3S4I3P1P2I4.
We then have 4! ways of placing the four marked S's in the four S positions,
and so on, giving 11! / (4! 4! 2! 1!) (I like including the final 1 so that
the bottom sums to the top).
- Below is the person's solution.
-
0 0 0 0 10 5
0 0 0 1 9 20
0 0 0 2 8 20
0 0 0 3 7 20
0 0 0 4 6 20
0 0 0 5 5 10
0 0 1 1 8 30
0 0 1 2 7 60
0 0 1 3 6 60
0 0 1 4 5 60
0 0 2 2 6 30
0 0 2 3 5 60
0 0 2 4 4 30
0 0 3 3 4 30
0 1 1 1 7 20
0 1 1 2 6 60
0 1 1 3 5 60
0 1 1 4 4 30
0 1 2 2 5 60
0 1 2 3 4 120
0 1 3 3 3 20
0 2 2 2 4 20
0 2 2 3 3 30
1 1 1 1 6 5
1 1 1 2 5 20
1 1 1 3 4 20
1 1 2 2 4 30
1 1 2 3 3 30
1 2 2 2 3 20
2 2 2 2 2 1
-
What we're really doing is solving the equation x1 + ... + x5 = 10 in
non-negative integers. This is a very special type of Diophantine
equation. It's actually a special case of Waring's
problem, which looks at solving x1^k + ... + xs^k = n for fixed s and k.
These problems are in general not accessible through combinatorics; the case
k=1 is special. The general approach proceeds via generating
functions, which we will cover in great detail later in the semester
(it's one of the key concepts of the class).
- Our solution to
the cookie problem is quite elegant, and in some respects reminiscent of
geometry class (remember all those proofs where the teacher cleverly adds auxiliary
lines; the difference here is we just add more cookies). While it is
possible to solve many combinatorial problems by brute force in principle,
in practice this is not a good way to go -- it is time consuming, and quite
likely that one makes a mistake. Typically one finds a way to interpret a
given quantity two ways; we can compute one of them and thus we obtain a
formula for the other. For example, we showed the number of ways of dividing
C cookies among P people is (C + P - 1 choose P-1); here all the identical
cookies are divided. What if we don't assume all the cookies are divided --
what is the answer now? It is just Sum_{c = 0 to C} (c + P - 1 choose P -
1); this is because we are just going through all the cases (we give out no
cookies, 1 cookie, ...). What does this sum equal? Imagine now we have
another person, say the Cookie
Monster (this is one of
Cameron's favorite clips), who gets all the remaining cookies. Then dividing
at most C cookies among P people is the same as dividing exactly C cookies
among P+1 people, and hence our sum equals (C + P+1 - 1 choose P+1 - 1).
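- A short check of the counts above (my own sketch): brute force the number of ways to hand out exactly 10 identical cookies to 5 distinct people, compare with the stars-and-bars formula \(\binom{C+P-1}{P-1}\), and verify that summing over "at most C cookies" gives \(\binom{C+P}{P}\), the Cookie Monster trick.
```python
from itertools import product
from math import comb

C, P = 10, 5
# brute force: count 5-tuples of non-negative integers summing to 10
brute = sum(1 for xs in product(range(C + 1), repeat=P) if sum(xs) == C)
print(brute, comb(C + P - 1, P - 1))                  # both 1001

# "at most C cookies" equals "exactly C cookies with one extra person"
at_most = sum(comb(c + P - 1, P - 1) for c in range(C + 1))
print(at_most, comb(C + P, P))                        # both 3003
```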
- Related to the
cookie problem is the partition problem: for the cookie problem we consider
2+3+3+1+1 different from 1+2+3+3+1 as the five people are distinct; if we
don't consider these distinct then we have a partition
problem. It's a lot more complicated to count these, but counting them leads to some great
mathematics (such as Young
tableaux).
-
Video online here:
http://youtu.be/jzSt1Uepv-c
Wednesday,
October 29. We finished our (first) unit on L-functions. We
saw the importance of studying a family of L-functions to glean information.
We used Dirichlet characters to build Dirichlet L-functions, which we then
used to prove the infinitude of primes congruent to \(a\) modulo \(m\)
(provided these numbers are relatively prime). In the five step program, the
key step was proving \(L(1,\chi) \neq 0\) if \(\chi\) is not the principal (or
trivial) character \(\chi_0\). The proof in general leads to Dirichlet's class
number formula.
-
Dirichlet's class number formula:
http://en.wikipedia.org/wiki/Class_number_formula
-
Lecture notes expanding on Davenport's classic "Multiplicative Number Theory"
by Andreas Strombergsson:
http://www2.math.uu.se/~astrombe/analtalt08/www_notes.pdf
-
One of the main steps in the proof was taking \(a=1\) and noting that
\(\sum_{\chi \ {\rm mod}\ m} \log L(\sigma,\chi) \ge 0\) for \(\sigma > 1\).
We were able to use this to show that if \(\chi\) is not a real character (so
it doesn't equal its complex conjugate) then the associated \(L(1,\chi) \neq
0\); this is because if there were two zeros the sum of the logarithms would
go to \(-\infty\) as \(\sigma \to 1\) from above, as only one factor has a
pole which is then cancelled by one of the two zeros.
-
This reduces the analysis to \(L(1,\chi)\) for \(\chi = \overline{\chi}\). We
looked at the characters when \(m=4\); this is one of the first non-trivial
examples, as we need an \(m\) with at least two residue classes relatively prime to
it. We saw that \(L(1,\chi) = \sum_{n=1}^\infty \chi(n)/n = 1 - 1/3 + 1/5 -
1/7 + \cdots\). This sum converges nicely; it's an alternating, strictly
decreasing sum. We saw that we could write this as \(\int_0^1 dx/(1+x^2)\),
expanded the denominator by the geometric series (whenever you see
\(1/(1\pm{\rm blah})\) you should think of using the geometric series).
If we integrate directly we get \(\arctan(1) - \arctan(0) = \pi/4\); if we
interchange the integral and the sum we get the series alluded to above.
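- A quick numerical illustration of this computation (my own sketch): for the non-principal character mod 4, \(\chi(n) = 0, 1, 0, -1\) according as \(n \equiv 0, 1, 2, 3 \pmod 4\), and the partial sums of \(\sum \chi(n)/n\) approach \(L(1,\chi) = \pi/4\).
```python
from math import pi

def chi(n):
    return {0: 0, 1: 1, 2: 0, 3: -1}[n % 4]

partial = 0.0
for n in range(1, 10**6 + 1):
    partial += chi(n) / n

print(partial, pi / 4)   # the two values agree to several decimal places
```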
-
We have to be a bit careful. First, we can't just integrate to 1 as then the
geometric series formula doesn't work as the ratio has absolute value 1. All
is not lost; we can integrate to \(1-\epsilon\) and send \(\epsilon\) to 0.
Next, we have to worry about interchanging the integral and the sum.
Fortunately this is easy for the geometric series, as it has the wonderful
property that the tail is also a geometric series. Thus \(\frac1{1+x^2} = \sum_{n=0}^{N-1}
(-x^2)^n + (-x^2)^N \frac1{1+x^2}\), and we now have a finite sum.
-
Note that if we took \(x=1\) this corresponds to a geometric series where \(r
= -1\). If we think about it, we see we're saying \(\frac1{1-(-1)} = 1 - 1 + 1
- 1 + 1 - 1 + \cdots\); it's reasonable to declare this to be 1/2, as half the
time (when we truncate after an odd number of terms) we have 1 while the other
times we have 0. This is the first example of a rich theory dealing with how
to make sense of divergent sums.
-
Video online here:
http://youtu.be/Wavl_-DsdWw
Monday,
October 27.
See Chapters 3 and 18 of our book for more information about Dirichlet
Characters and Dirichlet
L-functions.
Their main applications (for us) are in proving Dirichlet's
Theorem on Primes in Arithmetic Progressions and
other similar results.
-
The Riemann zeta function is the first of many such functions we can study.
The generalizations are called L-functions,
and for us are of the form \(L(s,f) = \sum_n a_n(f) / n^s = \prod_p L_p(s,f)^{-1}\)
where \(L_p(s,f)\) is a polynomial of degree \(d\) in \(p^{-s}\). Two of the
most important are Dirichlet
L-functions (which
have applications to primes in arithmetic progression) and Elliptic Curve
L-functions (which have applications to understanding the size of the group of
rational solutions of the elliptic curve -- see the Birch
and Swinnerton-Dyer conjecture for
more information). Dirichlet characters are sometimes covered in a group
theory or abstract algebra course. If you want more details, see Chapter 3 of
our book (from Section 3.3.2 to 3.3.6). Elliptic curves are discussed in
Section 4.2. We initially used some knowledge of the zeros of the Riemann zeta
function to deduce information about the primes. Amazingly, if we look at
families of L-functions we can convert knowledge of sums over the family of
the \(a_n(f)\) to information about the zeros of the associated L-functions.
We saw how we have great formulas for summing Dirichlet characters; similar
formulas exist for other families as well. For details in the case of
Dirichlet L-functions, see Chapter 18 of our book. Note the amazing similarity
with random matrix theory. We have three ingredients to understand the
zeros. (1) Determine the correct scale to study the zeros (this actually falls
out from the functional equation). (2) Derive a formula relating sums over the
zeros to a related sum over the prime coefficients. This is the analogue of
the Eigenvalue Trace Lemma, \({\rm Tr}(A^k) = \sum_i \lambda_i(A)^k\). The reason
this formula was so useful is that while we want to understand the eigenvalues
of our random matrices, it is the matrix elements that we choose. Thus, this
formula allows us to pass from knowledge of the matrix elements to knowledge
of the zeros. These are known as Explicit
Formulas.
(3) The Eigenvalue Trace Lemma and the Explicit formula would be useless,
however, if we were unable to actually execute the sums. Our theory thus
requires some kind of averaging formula. For random matrix theory, this was
the integrals of Tr\((A^k) P(A) dA\); we could compute these as Tr\((A^k)\) is
a polynomial in the matrix elements, and then we used combinatorics and
probability theory. Sadly, we do not have great averaging formulas in number
theory, and this is why the results there are significantly worse.
-
There are many questions one can ask about primes in progression. The first is
on primes in arithmetic progressions. We can also ask about twin primes (p and
p+2 both prime), or more generally about p and p+2k both prime. We could look at prime triples such as p, p+2 and p+4 other
than 3, 5, 7; this leads to the question of just which arithmetic progressions
are possible. We quickly see here that
there are no such triples; there is an arithmetic obstruction as at least one
of the three numbers is a multiple of three. The
longest to date might be an arithmetic progression of primes of length 25; Green
and Tao proved
that there are arbitrarily long arithmetic progressions of primes (sadly it's an
existence theorem, with nothing said about actually finding one!).
-
Another great question is about the least prime in an arithmetic progression.
-
Video online here:
Friday, October 24. We finished our sketch of the Prime
Number Theorem and began our unit on Dirichlet L-functions, the first
generalization of the Riemann zeta function.
-
The complex analytic proof of the Prime
Number Theorem uses
several key facts. We need the functional equation of the Riemann zeta
function (which follows from Poisson
summation and properties
of the Gamma function), the Euler
product (namely that
\(\zeta(s)\) is a product over primes), and the important fact that the
Riemann zeta function does not have a zero on the line Re(s) = 1! If
there were such a zero, then the main term of \(x\) from integrating \(\zeta'(s)/\zeta(s)
\cdot x^s/s\) arising from the pole of \(\zeta(s)\) at \(s=1\) would be cancelled by
the contribution from this zero! Thus it is essential that there be no
zero of \(\zeta(s)\) on Re(s) = 1. There are many proofs of this result. My
favorite proof is based
on a wonderful trig identity: \(3 + 4 \cos(x) + \cos(2x) = 2 (1 + \cos(x))^2
\ge 0\) (many people have said that \(w^2 \ge 0\) for real \(w\) is the most
important inequality in mathematics). There is an elementary proof of the prime number theorem
(ie, one without complex analysis). For those interested in history and
some controversy, see
this article by Goldfeld for a terrific analysis of the history of the
discovery of the elementary proof of the prime number theorem and the
priority dispute it created in the mathematics community. We
mentioned that Riemann computed zeros of \(\zeta(s)\) but never mentioned this
achievement in his paper; the method only came to light about 70 years later when
Siegel was looking at Riemann's papers. Click
here for more on the Riemann-Siegel formula for computing zeros of
zeta(s). Finally, terrific advice given to all young mathematicians
(and this advice applies to many fields) is to read the greats. In
particular, you should read Riemann's
original paper. In case your mathematical German is poor, you can click
here for the English translation of Riemann's paper. The key passage
is on page 4 of the paper: One
now finds indeed approximately this number of real roots within these
limits, and it is very probable that all roots are real. Certainly one
would wish for a stricter proof here; I have meanwhile temporarily put
aside the search for this after some fleeting futile attempts, as it
appears unnecessary for the next objective of my investigation.
- One of course should be careful about saying that it is impossible to
prove a result without resorting to using specific facts, even though those
facts might seem quite obviously necessary to use. A terrific example is the
elementary proof of the Prime
Number Theorem (which says that as \(x \to \infty\), the number of primes at most
\(x\) is asymptotic to \(x/\log x\)). It turns out that this statement is equivalent to
the fact that the Riemann zeta function \(\zeta(s) = \sum_{n = 1}^{\infty} 1/n^s\) (or
actually its meromorphic continuation) has no zero on the line \({\rm Re}(s) = 1\). This
is quite clearly a complex analytic statement. It was thought that there could
be no `elementary' proof of this (elementary doesn't mean easy; it just means
without using complex analysis), but if there were one, boy would it open our
eyes! Both statements are false.
See this article by
Dorian Goldfeld for the history of the proof of the Prime Number Theorem
(and the priority dispute).
-
The Riemann zeta function is the first of many such functions we can study.
The generalizations are called L-functions,
and for us are of the form \(L(s,f) = \sum_n a_n(f) / n^s = \prod_p L_p(s,f)^{-1}\)
where \(L_p(s,f)\) is a polynomial of degree \(d\) in \(p^{-s}\). Two of the
most important are Dirichlet
L-functions (which
have applications to primes in arithmetic progression) and Elliptic Curve
L-functions (which have applications to understanding the size of the group of
rational solutions of the elliptic curve -- see the Birch
and Swinnerton-Dyer conjecture for
more information). Dirichlet characters are sometimes covered in a group
theory or abstract algebra course. If you want more details, see Chapter 3 of
our book (from Section 3.3.2 to 3.3.6). Elliptic curves are discussed in
Section 4.2. We initially used some knowledge of the zeros of the Riemann zeta
function to deduce information about the primes. Amazingly, if we look at
families of L-functions we can convert knowledge of sums over the family of
the \(a_n(f)\) to information about the zeros of the associated L-functions.
We saw how we have great formulas for summing Dirichlet characters; similar
formulas exist for other families as well. For details in the case of
Dirichlet L-functions, see Chapter 18 of our book. Note the amazing similarity
with random matrix theory, which we will hopefully cover later in the
semester. We have three ingredients to understand the zeros. (1) Determine the
correct scale to study the zeros (this actually falls out from the functional
equation). (2) Derive a formula relating sums over the zeros to a related sum
over the prime coefficients. This is the analogue of the Eigenvalue Trace
Lemma, \({\rm Tr}(A^k) = \sum_i \lambda_i(A)^k\). The reason this formula was so
useful is that while we want to understand the eigenvalues of our random
matrices, it is the matrix elements that we choose. Thus, this formula allows
us to pass from knowledge of the matrix elements to knowledge of the zeros.
These are known as Explicit
Formulas.
(3) The Eigenvalue Trace Lemma and the Explicit formula would be useless,
however, if we were unable to actually execute the sums. Our theory thus
requires some kind of averaging formula. For random matrix theory, this was
the integrals of Tr\((A^k) P(A) dA\); we could compute these as Tr\((A^k)\) is
a polynomial in the matrix elements, and then we used combinatorics and
probability theory. Sadly, we do not have great averaging formulas in number
theory, and this is why the results there are significantly worse.
- Euler product GOOD! See what can happen when you generalize the zeta
function by changing the denominator: you can get a similar function where the
generalized Riemann Hypothesis fails! See
http://en.wikipedia.org/wiki/Hurwitz_zeta_function
- It's always good to think about how to generalize. We start with \(\zeta(s)
= \sum_n 1/n^s\); we can generalize by changing the numerator to \(a_f(n)\),
or changing the denominator to \(f(n)^s\). The first is the more standard, and
as commented above leads to better properties.
- We began our brief tour of
Dirichlet Characters and Dirichlet
L-functions.
Their main applications are in proving Dirichlet's
Theorem on Primes in Arithmetic Progressions and
other similar results. We can modify the explicit formula for the Riemann zeta
function to obtain one for Dirichlet L-functions. It helps to assume the Generalized
Riemann Hypothesis,
which allows us to write the non-trivial zeros of \(L(s,\chi)\) as \(1/2 + i
\gamma_\rho\), where \(\gamma_\rho\) is a real number. This leads to an
explicit formula (see Chapter 18 for the details) relating \(\sum_\rho
h(\gamma_\rho \log m)\) to \(\sum_p (\log p)\, \hat{h}(\log p / \log m) / \sqrt{p}\).
-
We also talked a bit about how to write down formulas for each Dirichlet
character, using
the fact that (Z/mZ)* is a cyclic group for m prime.
The group is generated by some element \(g\), so a typical element is \(x =
g^k\) for some \(k\). It therefore suffices to define the Dirichlet character
at the generator. As these characters map (Z/mZ)* to complex numbers of
absolute value 1 and are group homomorphisms, we have \(|\chi(g)| = 1, \chi(g^{m-1})
= \chi(g)^{m-1} = 1\) and \(\chi(1) = 1\), implying that the characters are
exactly the functions determined by \(\chi(g) = \exp(2 \pi i \ell / (m-1))\) for \(\ell \in
\{0, 1, \dots, m-2\}\). If \(\ell = 0\) we basically recover the Riemann zeta
function when we look at \(\sum_n \chi(n) / n^s\); for other \(\ell\), however,
we get very different functions. These functions are significantly easier to
extend past Re(s) = 1 because of the cancellation in sums of \(\chi(n)\). In
fact, if \(\ell\) is not zero, then the associated character \(\chi\) satisfies
\(\sum_{n=1}^{m-1} \chi(n) = 0\). This and partial summation allow us to
extend \(\sum_n \chi(n) / n^s\) from Re(s) > 1 to Re(s) > 0.
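- A small sketch (my own; the modulus 7 and generator 3 are just a convenient example) constructing the Dirichlet characters mod a prime \(m\) from a generator of (Z/mZ)*, as described above, and checking that every non-principal character sums to zero over 1, ..., m-1.
```python
import cmath

m, g = 7, 3   # 3 generates (Z/7Z)*

# discrete log table: for each unit x mod m, the k with g^k = x (mod m)
dlog = {pow(g, k, m): k for k in range(m - 1)}

def chi(ell, n):
    if n % m == 0:
        return 0
    return cmath.exp(2j * cmath.pi * ell * dlog[n % m] / (m - 1))

for ell in range(m - 1):
    total = sum(chi(ell, n) for n in range(1, m))
    print(ell, round(abs(total), 10))   # 6 for ell = 0 (principal), 0 otherwise
```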
- Consider random
harmonic series:
\(\sum_n \omega(n) / n\) where \(\omega(n) = 1\) with probability \(1/2\) and
\(-1\) with probability \(1/2\). Schmuland
has a fascinating paper on the properties of these sequences (try
here if that link doesn't work).
- For a quick review of Dirichlet L-functions
(needed properties and why we care) see
http://web.williams.edu/Mathematics/sjmiller/public_html/ntrmt09/handouts/Lfns/DirichletDens_sjmiller.pdf
(this has since been folded into a paper of mine with Dan Fiorilli).
- Video online here:
http://youtu.be/DfSCUElVr6s
Wednesday, October 22. We continued our brief introduction
to complex analysis. We did some contour integration, saw how much easier
integration can become, and saw applications to number theory through a sketch
of the proof of the Prime Number Theorem, which highlights why we need to
analytically continue \(\zeta(s)\) and understand the location of its zeros.
- We continued working
on the Residue Theorem. The difficulty is often in exploiting decay; try
integrating
\(\sin^2 x / x^2\) over the real line... (if you integrate by parts first you
can make life easier!). The
Residue Theorem is
an incredibly powerful tool. Even if you only care about integrals of
functions of a real variable, it is frequently useful to extend to the complex
plane. The reason is that, in general, it is not possible to write down
anti-derivatives; integration is hard! (There is an interesting
algorithm (due to Risch) to find anti-derivatives involving elementary
functions.
The
linked article has a nice example here, where changing the constant term
by 1 leads to the method failing; this is related to a change in the
Galois group.) There
are several steps to using the Residue Theorem:
- Step one: determine the function. Frequently
it is easy: given \(f(x)\) try \(f(z)\). Sometimes, though, it's a bit harder. If you
have a \(\cos(x)\) term you could try \(\cos(z) = (\exp(iz) + \exp(-iz))/2\), or you might
try taking just \(\exp(iz)\) and taking the real part.
- Step two and three are related: choose a
contour and find the poles and residues. Often the location of the poles
affects what contour you take. You DO NOT
want a pole on the contour (I've had to do this a few times in my
research, and it is not fun). Sometimes you have to split the integrand up
into different pieces, and do one part with one closed curve and another part
with another. A big factor in determining contours is how the function decays.
Remember that decay is a bit trickier for complex numbers; for instance, if
\(|z| = R > 2000\) then all we can say is \(1/|1+z^2| \le 1/(R^2 - 1)\), while if we
restrict to real \(x\) with \(|x| = R\) then \(1/(1+x^2) = 1/(1+R^2)\).
The issue is that we have a phase.
- Step three: repeat earlier steps as needed.
- We did a very important example, finding the
normalization constant for integrating \(1/(1+x^2)\) over the real line.
This leads to the
Cauchy distribution, which is very important in probability.
- Consider finding the poles and residues of \(1/(1+z^{2010})\). Key is
Euler's formula:
\(\exp(ix) = \cos(x) + i \sin(x)\). Remember that if we want to solve \(z^{2010}
= -1 = \exp(i\pi)\), we could also write \(-1\) as \(\exp(i\pi + 2\pi in)\) for any integer
\(n\).
There will be \(2010\) distinct solutions (half in the upper half plane, half in
the lower half plane).
Contour integral examples:
Recall a powerful technique from Calc I: if \(f(g(x)) = x\) (so \(f\) and
\(g\) are inverse functions, such as \(\sqrt{x^2}\) or, one needed for the
Cauchy distribution, \(\tan(\arctan(x))\)), then \(g'(x) = 1 / f'(g(x))\); in
other words, knowing the derivative of \(f\) we know the derivative of its
inverse function. This was used in Calc I to pass from knowing the derivative
of \(\exp(x)\) to the derivative of \(\ln(x)\). We can also use this to find
various anti-derivatives in terms of inverse
trig functions; while many are close to \(\sqrt{1-x^2}\), none of them are
exactly that (a
list of the derivatives of these are here). This highlights one of the
most painful parts of integration theory -- just because we are close to
finding an anti-derivative does not mean we can actually find it! While there is a
nice anti-derivative of \(\sqrt{1 - x^2}\), it is not a pure derivative of an
inverse trig function. There are many tables
of anti-derivatives (or integrals) (a
fun example on that page is the Sophomore's
Dream). Unfortunately it is not always apparent how to find these
anti-derivatives, though of course if you are given one you can check by
differentiating (though sometimes you have to do some non-trivial algebra to
see that they match). In fact, there are some tables of integrals of important
but hard functions where most practitioners have no idea how these results are
computed (and occasionally there are errors!). We will see later how much
simpler these problems become if we change variables; to me, this is one of
the most important lessons you can take from the course: MANY
PROBLEMS HAVE A NATURAL POINT OF VIEW WHERE THE ALGEBRA IS SIMPLER, AND IT IS
WORTH THE TIME TO TRY TO FIND THAT POINT OF VIEW!
Computing residues: We've seen that using the
geometric series is a great way to compute residues. For example, if we want a
residue at say \(3\) we replace \(z\) everywhere with \(z-3 + 3\); the first factor
\(z-3\) is
then small. Another useful approach is through differentiation. Say we have \(f(z) = g(z) / (z-3)^{10}\) with
\(g(z)\) holomorphic at \(z=3\). To calculate
the residue at \(z=3\) we need the coefficient of \((z-3)^9\) in the Taylor expansion of \(g(z)\) about \(z=3\). We
could do this with our trick, or we could compute 9 derivatives.
Video online here:
http://youtu.be/EGDPdOuK3Jg
Monday, October 20. To see the connection between zeros of
the Riemann zeta function \(\zeta(s)\) and the distribution of primes requires
some results from complex analysis. Interestingly, one does not need to go
through the zeros of \(\zeta(s)\) to reach the Prime Number Theorem, though it
is an efficient, good way to go.
-
The complex analytic proof of the Prime
Number Theorem uses
several key facts. We need the functional equation of the Riemann zeta
function (which we saw follows from Poisson summation and properties of the
Gamma function), the Euler product (namely that \(\zeta(s)\) is a product over
primes), and the fact that
the Riemann zeta function has no zeros on the line \({\rm Re}(s) = 1\).
If there were such a zero, then the main term of \(x\) from integrating \(\zeta'(s)/\zeta(s)
\cdot x^s/s\) arising from the pole of \(\zeta(s)\) at \(s=1\) would be
cancelled by the contribution from this zero! Thus it is essential that there
be no zero of \(\zeta(s)\) on \({\rm Re}(s) = 1\). There are many proofs of
this result. My
favorite proof is
based on a wonderful trig identity: \(3 + 4 \cos(x) + \cos(2x) = 2 (1 + \cos(x))^2
\ge 0\) (many people have said that \(w^2 \ge 0\) for real \(w\) is the most
important inequality in mathematics). There is an elementary proof of the
prime number theorem (ie, one without complex analysis). For those interested
in history and some controversy, see
this article by Goldfeld for a terrific analysis of the history of the
discovery of the elementary proof of the prime number theorem and the priority
dispute it created in the mathematics community.
Riemann computed zeros of \(\zeta(s)\) but never mentioned this achievement in his paper; the
method only came to light about 70 years later when Siegel was looking at
Riemann's papers. Click
here for more on the Riemann-Siegel formula for computing zeros of zeta(s).
Finally, terrific advice given to all young mathematicians (and this advice
applies to many fields) is to read the greats. In particular, you should read Riemann's
original paper.
In case your mathematical German is poor, you can click
here for the English translation of Riemann's paper.
The key passage is on page 4 of the paper:
- One now finds indeed approximately this
number of real roots within these limits, and it is very probable that all
roots are real. Certainly one would wish for a stricter proof here; I have
meanwhile temporarily put aside the search for this after some fleeting futile
attempts, as it appears unnecessary for the next objective of my
investigation.
- The main input we will need is that
integrals along circles (or more generally nice curves) of the
logarithmic derivative of a nice function is just the order of the zero or
pole at the center of the circle. In other words, say we have an expansion
\(f(z) = a_k z^k + \cdots\) (where \(k\) is the index of the first non-zero term;
thus \(a_k\) is not zero, and if \(k > 0\) we say the function has a zero of
order \(k\) at the origin, while if \(k < 0\) we say the function has a pole of
order \(-k\)). The Residue Theorem then gives \((1 / 2 \pi i) \int_{|z| = r}
f'(z)/f(z) dz = k\). Note that if the function doesn't have a zero or pole at
the origin then this integral is zero (for r sufficiently small). More
generally, if \(g(z)\) is a nice function \((1 / 2 \pi i) \int_{|z| = r}g(z)
f'(z)/f(z) dz = k g(0)\). We will use a further generalization of this to
relate the zeros of the Riemann zeta function to counting the number of primes
at most x. For more details on the complex analysis we are using, see Cauchy-Riemann
equations, Cauchy-Goursat
Theorem, Residue
Theorem, Green's
Theorem. The key takeaways from today's class are: (1) we can convert
certain types of integrals to finding the \(a_{-1}\) coefficient in a Taylor
expansion (and this is good as algebra is easier
than integration); (2) integrating the logarithmic derivative is useful as the
answer is related to the zeros and poles of the function.
To really drive the point home: the reason this is such a spectacular formula
is that it reduces integration (hard) to finding ONE Taylor coefficient (ie,
algebra, ie easy).
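- A small numerical sanity check of \((1 / 2\pi i) \int_{|z|=r} g(z) f'(z)/f(z)\, dz = k\, g(0)\) in Mathematica, with the illustrative choices \(f(z) = z^3\) (so \(k = 3\)) and \(g(z) = \cos z\):
(* parametrize the unit circle z = Exp[I t]; the answer should be 3 Cos[0] = 3 *)
f[z_] := z^3; g[z_] := Cos[z];
(1/(2 Pi I)) NIntegrate[g[Exp[I t]] f'[Exp[I t]]/f[Exp[I t]] I Exp[I t], {t, 0, 2 Pi}]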
- Green's theorem in a day:
https://www.youtube.com/watch?v=Iq-Og1GAtOQ
-
Video online here:
http://youtu.be/zCbm7hZUY9Q
Friday, October 17. Our Fourier analysis paid big dividends today
with the application of Poisson Summation to get the functional equation.
-
There are many proofs of the functional
equation of
the Riemann
zeta function;
the proof we gave is `secretly' relating the Riemann zeta function to the Mellin
transform (which
is basically the Fourier
transform after
a change of variables) of the theta
function.
A crucial input was the Gamma
function,
which arises throughout mathematics, statistics, science, .... Functional
equations are extremely important, as they allow us to extend useful functions,
initially defined only in one region, to larger regions. The functional equations
of the Riemann zeta function, the Gamma function and the geometric series are just
a few instances. It is worth pondering what allows
us to find a functional equation. For the Gamma function, it was integrating
by parts in
the integral defining the Gamma function; for the theta function, it was Poisson
summation.
Finally, it is worth noting that we have seen yet again examples of how
problems can be converted to integrals. In this case, the Riemann zeta
function initially was only defined for Re(s) > 1; however, we then rewrote it
as an integral from \(x = 0\) to \(\infty\)
involving the omega function (which also made sense only for Re(s) > 1),
but then we rewrote that as two integrals from \(x = 1\) to \(\infty\)
involving the omega function, and these integrals exist for all \(s\). We are
fortunate in finding an integral expression which we can work with. It should
hopefully seem `natural' (at least in hindsight) in passing from the omega
function to the theta function (omega is a sum over \(n > 0\), theta is a sum
over all \(n\) and thus there is a chance Poisson summation could be useful).
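-
A quick numerical check of the functional equation of the completed zeta function in Mathematica (the point \(s\) below is an arbitrary illustrative choice; Mathematica's Zeta is already the analytically continued function):
xi[s_] := Pi^(-s/2) Gamma[s/2] Zeta[s];
With[{s = 2.3 + 1.7 I}, xi[s] - xi[1 - s]]  (* essentially 0, since xi(s) = xi(1 - s) *)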
-
Dirichlet eta function:
http://en.wikipedia.org/wiki/Dirichlet_eta_function
-
Riemann's original paper:
http://www.claymath.org/sites/default/files/ezeta.pdf ((READ THE
CLASSICS -- READ THIS!!))
-
Video online here:
http://youtu.be/z8WstIYV3Xc
Wednesday, October 15. The Riemann zeta function is one of the most
important functions in number theory; we finally got to it!
-
We finished our analysis of splitting integrals. This is a very important
technique to master, which is why I was willing to spend more time on it
today. You want to get used to having free parameters to choose and optimize
later.
-
Infinite product:
http://en.wikipedia.org/wiki/Infinite_product
-
Infinite product that converges to a non-zero number, all terms rational,
product rational: \(\prod_{n=2}^\infty \frac{n^2}{n^2-1} = 2\). Unlike the
diverging \(\prod_{n=2}^\infty \frac{n}{n-1}\), here each term differs from \(1\)
by roughly \(1/n^2\) instead of \(1/n\), which is why the product converges.
-
Evaluating zeta(2):
http://www.uam.es/personal_pdi/ciencias/cillerue/Curso/zeta2.pdf
-
Even better: zeta(2n):
http://www.uam.es/personal_pdi/ciencias/cillerue/Curso/zeta2.pdf
-
We looked at special values of the Riemann zeta function to get proofs of the
infinitude of the primes.
-
From Proofs from the Book: Six proofs of the infinitude of the primes:
http://www.cwu.edu/~glasbys/INFINITY.PDF
-
Many of the Riemann zeta function's properties are related to viewing it as a
function of a complex variable \(s\). As such, it is not surprising that we
need some results from Complex Analysis for our studies. The main result we
are heading towards is the Cauchy Residue Theorem.
The most important fact is that if \(f(z) = \sum_{n = -N}^{\infty} a_n z^n\),
then \((1 / 2\pi i) \int_{|z| = r} f(z)\, dz = a_{-1}\); a quick numerical check is
below. The reason this is such a spectacular formula is that it reduces
integration (hard) to finding ONE Taylor coefficient (ie, algebra, ie easy).
Finally, below are the three arxiv posts related to topics we've just studied,
are about to study, or could have studied had the class voted differently (note:
the arxiv is a wonderful site, but nothing on it is refereed; many professional
mathematicians check the arxiv every day and skim the titles and abstracts of all
posts; many more do this for their speciality).
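-
A minimal Mathematica check of the \(a_{-1}\) fact; the sample function below is an illustrative choice, not one from class:
f[z_] := 7/z^2 + 5/z + 3 + 2 z;  (* Laurent expansion has a_{-1} = 5 *)
(1/(2 Pi I)) NIntegrate[f[Exp[I t]] I Exp[I t], {t, 0, 2 Pi}]  (* numerically 5 *)
SeriesCoefficient[f[z], {z, 0, -1}]                            (* exactly 5 *)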
-
Video online here.
http://youtu.be/XhVHeawbLPc
Friday, October 10. Breaking up sums and integrals is extremely
important in analytic number theory -- you want to learn how to balance
getting the best possible results and getting accessible algebra. Often we
don't need optimal bounds, and can make do with less.
Wednesday, October 8. Today's class highlights the differences
between books and lectures. In a book you have to set down the material; in a
lecture you can change the path. Several of you had made some good comments in
class or in emails about the size of the divisor function, and I felt it might
be illuminating to think about the problem. We talked at great length on the
divisor function (click
here for the Wikipedia page). It's nice to be able to take the time and
look deeply at some arguments; I want to give you a flavor for the subject so
you can get a sense of whether or not this area is for you.
- We first obtained a non-trivial exponent savings of \(d(n) \le 2
n^{1/2}\); we did this by noting that if \(n = xy\) then at least one factor
is at most \(\sqrt{n}\). We ran into problems when we tried to extend this
further. We tried breaking into cases, and saw that if there was a large prime
factor then we had a savings. For example, if one of the prime factors \(p\)
is at least \(n^{1/4}\) then \(n/p \le n^{3/4}\), and by our earlier work \(d(n/p)
\le 2 (n/p)^{1/2} \le 2 n^{3/8}\). To get all the divisors of \(n\), we look
at each divisor of \(n/p\) and we can either multiply by \(p\) or not (if
\(p^2|n\) this will double count some divisors, but that's fine as we're just
shooting for an upper bound). Thus \(d(n) \le 2 d(n/p) \le 4 n^{3/8}\), and we
have saved a power in the exponent. Saving powers in the exponents is a
huge part of analytic number theory, and one of the reasons I wanted
to spend so much time on this.
- What can we do next? The argument we did showed that if there is a large
prime factor then the divisor function is probably smaller. It suggests that
the more small prime factors, the larger it will be. A little work shows that
if \(n = p_1^{r_1} \cdots p_k^{r_k}\) then \(d(n) = (1+r_1) \cdots (1+r_k)\).
Our arguments suggest that the worst case is when we have as many prime factors as
possible, as it's better to have more terms than one term higher (adding a new
prime with exponent 1 doubles the product, whereas increasing an existing \(r_i\)
by 1 multiplies the product by a factor of at most 3/2).
- This suggests we look at primorials, which are factorials with the terms
restricted to primes. So \(5\# = 5 \cdot 3 \cdot 2\), for example. We did a
lot of work to figure out the right way to look at things. If we assume \(n\)
is a primorial, say \(n = p_m\#\) for some \(m\), we need to find the index
\(m\). This led us to looking at approximating solutions of transcendental
equations!
- For example, how big is \(p_n\)? Well, if \(\pi(x) = \#\{p \le x: p \text{ prime}\}\),
the Prime Number Theorem says that \(\pi(x) \sim x/\log x\). So
\(n = \pi(p_n) \sim p_n/\log p_n\). We try to solve this. Our first naive
(ridiculous) guess is that \(p_n = n\); substituting this gives \(n = n/\log n\),
which doesn't work. We see our guess was too low and needs to be increased by a
factor of approximately \(\log n\), so we try \(p_n = n \log n\). Substituting this
into \(n \sim p_n/\log p_n\) gives \(n \sim n \log n / (\log n + \log\log n) =
n - n \log \log n / (\log n + \log\log n)\), which is approximately
correct as \(\log \log n / \log n \to 0\).
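- A quick numerical look at this approximation, using Mathematica's built-in Prime (the cutoffs are illustrative; the ratio approaches 1 very slowly):
Table[{n, Prime[n], N[n Log[n]], N[Prime[n]/(n Log[n])]}, {n, {10^3, 10^5, 10^7}}]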
- We used this kind of analysis to figure out what the index \(m\) should be
so that \(p_m\# = n\). Whenever we see a product we want to
take logarithms! We have \(\sum_{p \le p_m} \log p = \log(p_m\#)\); by the
Prime Number Theorem \(\sum_{p \le x} \log p \sim x\), and so we get \(\log(p_m\#)
\sim p_m\) so \(n \sim p_m\# \sim e^{p_m} \sim e^{m \log m} = m^m\). Now of
course how big is \(m\), given that \(n \sim m^m\)? This is another
transcendental equation to solve, and we get \(\log n \sim m \log m\). We try
to solve as before. If we try \(m = \log n\) we get \(\log n \sim \log n
\log\log n\), which is too high. So we correct and try \(m = \log n / \log
\log n\), and find \(\log n \sim \frac{\log n}{\log \log n} (\log\log n -
\log\log\log n) = \log n - \log n \cdot \frac{\log\log\log n}{\log\log n}\),
which to first order is correct.
- These arguments get quite involved; often mathematicians write \(\log_3
n\) for \(\log\log\log n\), as base 3 logarithms essentially never arise and so
there is no danger of confusion (I'd vote for \(\ln_3\), as that clearly isn't
base 3, but I'm outvoted).
- Now that we have these calculations telling us that if \(p_m\# = n\) then
\(m \sim \log n/\log \log n\), we know \(d(n) \sim 2^{\log n / \log\log
n} = e^{\log 2 \log n / \log\log n} = n^{\log 2/\log\log n}\). This shows us
that (assuming such primorials really do give the largest values of the
divisor function) for any positive \(\epsilon\) we have \(d(n) \le n^\epsilon\)
once \(n\) is large enough; a quick numerical check is below.
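- A quick check of this heuristic in Mathematica (the range of \(m\) is an illustrative choice):
(* for a primorial n = p_m#, the divisor count is 2^m; compare with n^(Log 2 / Log Log n) *)
primorial[m_] := Times @@ Prime[Range[m]];
Table[With[{n = primorial[m]}, {m, DivisorSigma[0, n], 2^m, N[n^(Log[2]/Log[Log[n]])]}], {m, 2, 8}]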
- Video online here:
http://youtu.be/C4gw6cYxmEo
Monday, October 6. There are many important arithmetical functions,
and lots of important properties of them.
Euclid's
argument actually
gives a lower bound, namely on the order of
\(\log \log x\) primes at most \(x\) (the true answer is that there are about
\(x / \log x\) primes at most \(x\)). As
a nice exercise (for fun), prove that Euclid's argument gives a bound of this
size.
This leads to an interesting sequence: 2,
3, 7, 43, 13, 53, 5, 6221671, 38709183810571, 139, 2801, 11, 17, 5471,
52662739, 23003, 30693651606209, 37, 1741, 1313797957, 887, 71, 7127, 109, 23,
97, 159227, 643679794963466223081509857, 103, 1079990819, 9539, 3143065813,
29, 3847, 89, 19, 577, 223, 139703, 457, 9649, 61, 4357....
This sequence is generated as follows. Let a_1 = 2, the first prime. We apply
Euclid's argument and consider 2+1; this is the prime 3 so we set \(a_2 = 3\).
We apply Euclid's argument and now have \(2\cdot 3+1 = 7\), which is prime,
and set \(a_3 = 7\). We apply Euclid's argument again and have \(2\cdot 3\cdot
7+1 = 43\), which is prime and set \(a_4 = 43\). Now things get interesting:
we apply Euclid's argument and obtain \(2\cdot 3\cdot 7 \cdot 43 + 1 = 1807 =
13\cdot 139\), and set \(a_5 = 13\). Thus \(a_n\) is the smallest prime factor of
the number produced by Euclid's argument at the \(n\)th stage (one more than the
product of the terms so far); this is the Euclid-Mullin sequence. There are a
plethora of (I believe) unknown questions about this sequence, the biggest of
course being whether or not it contains every prime. This is a great sequence
to think about, but it is a computational nightmare to enumerate! I downloaded
these terms from the Online Encyclopedia of Integer Sequences (homepage is http://oeis.org/ and
the page for our sequence is http://oeis.org/A000945 ).
You can enter the first few terms of an integer sequence, and it will list
whatever sequences it knows that start this way, provide history, generating
functions, connections to parts of mathematics, .... This is a GREAT website
to know if you want to continue in mathematics. There have been several times
I've computed the first few terms of a problem and looked up what the later
terms should be, and thus had a candidate formula with which to start an induction.
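Here is a short Mathematica sketch generating the first few terms of the Euclid-Mullin sequence (later terms quickly require factoring enormous numbers, which is why so little is known):
(* each new term is the smallest prime factor of (product of the terms so far) + 1 *)
terms = {2};
Do[AppendTo[terms, FactorInteger[Times @@ terms + 1][[1, 1]]], {7}];
terms  (* {2, 3, 7, 43, 13, 53, 5, 6221671} *)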
One of the
great joys of teaching at Williams is how intellectually curious you are; in
another class one of you has already contacted me about the Euclid-Mullin
sequence we discussed, and I've put some links to more in the additional
comments below. We're probably the only class in the world to talk about this
in a class on operations research, but I think it's a great use of time for
many reasons, ranging from seeing how fast algorithms run, to how well they
run (is anything missed), to the difficulties of finding them (hey, this class
is all about finding solutions -- how do we find the terms in this sequence?).
For more on these primes, see here:
There are
other proofs of the infinitude of primes. In
a real analysis course, one develops the notation and machinery to put
calculus on a rigorous footing. In fact, several
prominent people criticized the foundations of calculus, such as Bishop
Berkeley; his famous attack, The
Analyst, is available here. It wasn't until decades later that good
notions of limit, integral and derivative were developed. Most people are
content to stop here; however, see also Abraham
Robinson's work in Non-standard
Analysis. One of my favorite applications of open
and closed sets is Furstenberg's
proof of the infinitude of primes; one night while a postdoc at Ohio
State I had drinks with Hillel
Furstenberg and one of his
students, Vitaly
Bergelson. This is considered by many to be one of the best proofs of the
infinitude of primes; it is so good it is one of six proofs given in THE
Book. Unlike most proofs of the infinitude of primes, this gives no bounds
on how many primes there are at most x.
Video online
here:
http://youtu.be/EJ4Ijxwfi5Q
Friday, October 3. Mountain Day, no class
Wednesday, October 1. We studied Poissonian behavior of uniformly
distributed random variables; an area of active research is what happens for
special sequences.
-
We studied the distribution of nearest neighbor spacings between independent,
identically distributed random variables taken
from the uniform distribution. We see similar behavior when we look at the
spacings between adjacent primes or the ordered \(n^k \alpha\) mod \(1\) for
\(k\) at least two. In neither case
do we have a proof; in fact, for \(n^k \alpha\) the behavior depends greatly
on the irrationality exponent of \(\alpha\). For more details, see the
textbook and the references therein. Our proof used several results from
previous classes, including the Fundamental
Theorem of Calculus
to
find the probability and then the definition of the derivative of exp(x).
-
We also discussed the natural scale to study problems (ie, looking at the
average spacing between events, where the events here are the ordered values
of our random variables). This is one reason the twin prime problem is so
difficult, as a gap of 2 is minuscule relative to the average spacing between primes;
calculating Brun's
constant (the
sum of the reciprocals of twin primes) led Nicely to discover the Pentium
bug;
a nice description of the discovery of the bug is given at http://www.trnicely.net/pentbug/pentbug.html.
-
Integration in general is hard, and frequently we need to resort to numerical
methods such as Monte-Carlo
integration (see http://www.fas.org/sgp/othergov/doe/lanl/pubs/00326866.pdf for
a note about the beginnings of the method). Choosing random sequences has nice
applications in such subjects.
-
The paper introducing Monte Carlo integration
has been hailed by some as one of the most influential (if not the most
influential) papers of the 20th century. We only touch on the briefest part of
the theory here. It can be combined with the Central
Limit Theorem or Chebyshev's
Theorem to give really good
results on numerically evaluating integrals. Specifically, if \(N\) is large
and we choose \(N\) points uniformly, we can simultaneously assert that with
extremely high probability (at least \(1 - N^{-1/2}\)) the error is
extremely small (at most \(N^{-1/4}\)). If you want to know more, please see
me -- there are a variety of applications from statistics to mathematics to
economics to .... Below are links to two papers on the subject to give you a
little more info:
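- A minimal Monte Carlo sketch in Mathematica (the integrand and the sample size are illustrative choices, not a prescription):
(* estimate Integrate[Exp[-x^2], {x, 0, 1}] as the average of the integrand at uniform random points *)
mcEstimate[f_, n_] := Mean[f /@ RandomReal[{0, 1}, n]];
{mcEstimate[Exp[-#^2] &, 10^5], NIntegrate[Exp[-x^2], {x, 0, 1}]}  (* MC estimate vs. the true value, about 0.7468 *)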
- Mathematica code from class:
- poissontest[alpha_, k_, num_] := Module[{list, diff},
  (* compute n^k alpha mod 1 to high precision, so round-off does not contaminate the spacings *)
  list = Sort[Table[Mod[N[n^k alpha, 100], 1], {n, 1, num}]];
  (* spacings between adjacent ordered values *)
  diff = Differences[list];
  Print[Histogram[diff]];
  ];
poissontest[Sqrt[Pi], 1, 2014]
poissontest[Sqrt[Pi], 2, 2014]  (* compare: the spacings for n^2 alpha mod 1 look Poissonian *)
-
Video online here:
http://youtu.be/I_10ADutXD8. Video issues: unfortunately the first 17
minutes are missing audio; no idea why, or why the audio jumps back in when it
does. Briefly, what we did was look at Mathematica code (available in the
additional comments) that showed that if we took \(n \alpha\) mod 1 there were
only 2 or 3 possible differences, but \(n^2 \alpha\) mod 1 had what looked like a
continuum. Thus, while \(n \alpha\) mod 1 is equidistributed, it does have
different behavior. We talked about applications to Monte Carlo integration, and
the advantage of not necessarily taking points completely at random but rather
using some structure.
Monday, September 29. There's a lot that can be done with
irrationals. Today we saw how to use Fejer's theorem to obtain a proof of
Weyl's equidistribution theorem.
-
The proof of the equidistribution of \(n \alpha\) mod \(1\) today uses a very
common analysis technique. To prove a result for a step function (like the
characteristic function of the interval \([a,b]\)), it suffices to prove the
result for a continuous function, as we can find a continuous function that is
arbitrarily close. Then, to prove the result for continuous functions we
instead prove the result for a nice, finite Fourier series, as we can find
such a series that is arbitrarily close to our continuous function. Such
arguments are used all the time in Measure Theory. The crux of the argument is
that we have a finite sum of sines and cosines (the \(\exp(2 \pi i m x)\)),
and that these can be divided into two parts. The first is the constant term
(\(m=0\)), which gives \(b-a\) plus a small error; the remaining terms are 'small' in
terms of \(N\). How small is a VERY deep question, and involves the
irrationality exponent of \(\alpha\) (ie, how well we may approximate \(\alpha\) by
rationals). The big result along these lines is the Erdos-Turan theorem. For
applications, it is often important to have a sense of how rapidly one has
convergence in equidistribution results; one of the common techniques involves
using the Erdos-Turan theorem (the web resources aren't great; I have a copy of a
good book that shows how the irrationality exponent is connected to quantifying
the rate of convergence to equidistribution). Finally, it is worth going over the
argument and keeping track of what was given and what we chose. We are given
an \(\epsilon > 0\); this leads to a \(j\) (for how well the continuous
functions approximate the step function) and \(M\) (the number of terms in our
finite Fourier sum); we then send \(N\) to infinity. A quick numerical
illustration of the equidistribution is below.
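-
A quick numerical illustration in Mathematica (the choices of \(\alpha\), the interval and the cutoff are illustrative):
(* the proportion of n*alpha mod 1 landing in [a, b] should approach b - a *)
With[{alpha = N[Sqrt[2]], a = 0.2, b = 0.5, nmax = 10^5},
  {N[Count[Mod[alpha Range[nmax], 1], x_ /; a <= x <= b]/nmax], b - a}]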
-
We spent a lot of time talking about the difference between sharp cutoff
functions and smooth cutoff functions. For many problems, it is preferable to
use smooth cutoff functions (though in the real world we often care about sharp
cutoffs). We talked about the differences between refining a partition and taking a
new partition, and a suggestion from class gave an example where the lower sum
approximation to the area can decrease as we go from a partition with \(n\) pieces
to one with \(n+1\) pieces.
-
We discussed at length how easy it is to accidentally assume something. For
example, while it is reasonable to expect that the more terms we take
the more accurate the Fejer series approximation is to f, we never proved that
and it might be false. Consider for example the sequence 1/2, 1/2, 1, 1/3,
1/3, 1/3, 1/2, 1/4, 1/4, 1/4, 1/4, 1/3, 1/5, 1/5, 1/5, 1/5, 1/5, 1/4, 1/6,
.... The sequence converges to zero, but not monotonically.
-
Video online here:
http://youtu.be/BFVuimP8ZLE
Friday,
September 26. There are many applications of Fourier analysis; we
saw how to use Fejer's theorem to obtain a proof of Weyl's equidistribution theorem.
-
In class we talked about denseness
of certain sequences. Other fun ones are \(\sin(n)\) and \(\cos(n)\) -- are
these dense in the interval [-1, 1]? Equidistributed? What can you say about
these? (I believe one is somewhat elementary, one is more advanced. Email me
for a hint on what math results might be useful.) We also looked in the book
at how knowledge of the irrationality type of \(\alpha\) can be used to see
\(n^2 \alpha\) mod \(1\) is dense. We assumed \(\alpha\) had irrationality
exponent of \(4 + \eta\) for some \(\eta > 1\) -- can the argument work for a
smaller exponent? What if we studied \(n^k \alpha\) mod \(1\) -- what would we
need to assume about the irrationality exponent? Can you somewhat elementarily
prove the denseness of \(n^2 \alpha\) if the irrationality exponent is less
than 3? I say somewhat elementarily as we will later show the sequence is
equidistributed, and thus it must be dense. Can you come up with a more
elementary proof, where you just get denseness? Finally, for those who know
(or are interested in) measure theory, one natural question to ask is how
severe is the restriction to studying irrational \(\alpha\) with exponent \(4
+ \eta\)? If you're familiar with Cantor's diagonalization argument (Theorem
5.3.24), you know almost all numbers are transcendental (and thus irrational);
however, this does not mean they have an irrationality exponent as large as
\(4+\eta\) (for example, \(\ln(2)\) is irrational but has exponent less than
\(4\)). A good exercise is to modify the proof of Theorem A.5.1 to show that
almost no irrationals (in the sense of measure) have irrationality exponent as
large as \(4 + \eta\).
- Proving \(\sqrt{2}\) is irrational:
- Introduction to continued fractions: Wikipedia:
http://en.wikipedia.org/wiki/Continued_fraction (see also notes by van der
Poorten:
http://maths.mq.edu.au/~alf/www-centre/alfpapers/a094.pdf )
- Hurwitz' Theorem and the most irrational of numbers:
http://en.wikipedia.org/wiki/Hurwitz's_theorem_(number_theory)
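- A small Mathematica illustration of how well continued fraction convergents approximate, against the Hurwitz bound (the choice of \(\sqrt{2}\) and the number of convergents are illustrative):
(* compare |Sqrt[2] - p/q| with 1/(Sqrt[5] q^2) for successive convergents p/q *)
Table[{c, N[Abs[Sqrt[2] - c]], N[1/(Sqrt[5] Denominator[c]^2)]}, {c, Convergents[Sqrt[2], 6]}]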
- Video online here:
http://youtu.be/tg47OJkNkcQ
Wednesday,
September 24. We spent most of the day estimating integrals. The
main idea is figuring out where things are large and small, and getting a
sense of when an approximation is harmful or not.
-
A great example of this is counting primes. First, putting in log weights is
harmless and can be removed easily, but it sets us up to use complex analysis.
More importantly, though (and as described in Chapter 3 of our text), it
is much easier to study prime powers than just primes, and then to remove the
contribution of the higher prime powers afterwards. The idea is that certain
'completed' sets are more natural to study, and it is often worth sieving via
inclusion-exclusion, or carrying along the extra terms, rather than working
directly with the quantity of interest; a quick numerical illustration is below.
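-
A quick numerical look at the completed count over prime powers in Mathematica (the cutoffs are illustrative): the Chebyshev function \(\psi(x) = \sum_{p^k \le x} \log p\) is asymptotic to \(x\).
(* MangoldtLambda[n] is Log[p] if n is a prime power p^k and 0 otherwise *)
psi[x_] := Sum[N[MangoldtLambda[n]], {n, 1, Floor[x]}];
Table[{x, psi[x]/x}, {x, {10^3, 10^4, 10^5}}]  (* the ratios creep towards 1 *)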
-
Speaking of primes,
one does not need complex analysis
and there is an elementary
proof of the Prime Number Theorem,
which has sadly led to one of the biggest priority disputes and controversies
in mathematics (see
here for more on it);
another famous controversy is the Newton-Leibniz dispute over the calculus.
-
Video online here:
http://youtu.be/AA_GHlM6sU4
Monday, September 22. We earned dividends
from all our Fourier analysis work (in particular, Poisson Summation), in our
Benford analysis.
Friday, September 19. Today is one of the
biggest applications of Fourier analysis, Poisson Summation!
-
Poisson Summation is
one of the standard tools of analytic number theory, allowing us to frequently
convert long, slowly decaying sums to short, rapidly decaying sums, so that
just a few terms suffice to get a good estimate. One nice application is to
counting the number of lattice points with integer coordinates inside a circle
(also called the Gauss
circle problem).
If you consider points with integer coordinates, you would expect
approximately \(\pi R^2\) such points to be in a circle of radius \(R\); what
is the error? A little inspection shows that the error shouldn't be much worse
than the perimeter, so the answer might be \(\pi R^2\) with an error of at
most something like \(2 \pi R\) (Gauss proved an error of at most \(2 \sqrt{2}
\pi R\)). The current record is due to Huxley, who shows
that the error is at most \(C R^\theta\) with \(\theta \le 0.6298\ldots\); a quick
lattice point count is below.
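- A minimal Mathematica count of lattice points in a circle (the radii are illustrative choices):
(* number of integer points (x, y) with x^2 + y^2 <= R^2, compared with Pi R^2 *)
latticeCount[R_] := Sum[2 Floor[Sqrt[R^2 - x^2]] + 1, {x, -R, R}];
Table[{R, latticeCount[R], Round[N[Pi R^2]], latticeCount[R] - Round[N[Pi R^2]]}, {R, {10, 100, 1000}}]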
-
We mentioned the Fourier
Transform and
interesting functions that satisfy a lot of nice conditions but not every
property we'd like. See for example the function \(f\) on page 270 (or better
yet modify this to be infinitely differentiable and it and its first five
powers are integrable). There are many applications, one of the most important
being a proof of the Central Limit Theorem.
When we get to Benford's law we'll need to know what the
Central Limit Theorem modulo 1 looks like. I
prove this in detail in this paper.
- There are other
generalizations of the central limit theorem. One particularly nice version
involves Haar
measure. Consider the set of \(N \times N\) unitary
matrices \(U(N)\), or its
subgroups the
orthogonal matrices and the symplectic
matrices. It turns out there is a way to define a probability measure on
these spaces (this is the Haar measure), and there are generalizations of
the central limit theorem in these contexts: The n-fold convolution of a
regular probability measure on a compact Hausdorff group \(G\) converges to
normalized Haar measure in the weak-star topology if and only if the support of
the distribution is not contained in a coset of a proper normal closed subgroup
of \(G\).
-
The Central Limit Theorem has
a rich history and numerous applications. What makes it so powerful and
applicable is that the assumptions are fairly weak, essentially finite mean,
finite variance, and something about the higher moments. The natural question
is what exactly do we mean by convergence? There are several different
notions.
- A classic result about how rapidly we have
convergence to the standard normal is the Berry-Esseen
Theorem. As many distributions have zero third moment, the fourth moment
frequently controls the speed. This is why instead of looking at the kurtosis (the
fourth moment) we often look at the excess kurtosis, the kurtosis of our
random variable minus the kurtosis of the standard normal; it is this difference
that frequently controls the speed of convergence.
Taylor series played
a key role in our proofs; the idea is that we can locally replace a
complicated function by a simpler function, so long as we can control the
error estimates.
- We summify our expression by using the identity \(P = \exp(\log P)\); this is
very useful whenever \(P\) is a product, as logarithms convert products to sums.
This is a great way to do nothing! We saw how well this worked to understand
quantities such as \(P = \lim_{N \to \infty} (1 + x / N^2)^N\).
We took the logarithm, so \(\log P_N = N \log(1 + x / N^2)\); we
then Taylor expanded the logarithm and
found \(\log P_N = x/N\) plus lower order terms (of size \(1/N^3\) and smaller).
Exponentiating gives us \(P_N = \exp(x/N)\) times the exponential of those lower
order terms, and we thus obtain information on the speed of convergence.
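- A tiny numerical check of this expansion (the value of \(x\) is an illustrative choice):
(* (1 + x/N^2)^N vs Exp[x/N]: both tend to 1, and they agree to the stated order *)
With[{x = 2.}, Table[{n, (1 + x/n^2)^n, Exp[x/n]}, {n, {10, 100, 1000}}]]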
- One can prove the
CLT directly in the case of Bin(N, 1/2). As a binomial
random variable is the sum of Bernoulli
random variables, we see that Bin(N,1/2) should become normally
distributed as N tends to infinity. This can be proved directly, and uses Stirling's
formula to estimate the binomial
coefficients.
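- A quick Mathematica comparison of the two (the values of \(N\) and \(k\) are illustrative choices):
(* exact Bin(N, 1/2) probability near the mean vs. the normal density with mean N/2 and standard deviation Sqrt[N]/2 *)
With[{n = 1000, k = 520},
  {N[PDF[BinomialDistribution[n, 1/2], k]], PDF[NormalDistribution[n/2, Sqrt[n]/2.], k]}]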
Video online here:
http://youtu.be/v9eoWGQkoeM
Wednesday, September 17.
Finally, some serious Fourier analysis!
-
Monday, September 15.
After building up some basic results on convergence in analysis, we will be
able to tackle convergence of Fourier series on Wednesday. This is a vast
topic, and cannot be done justice in just a day; thus we have to content
ourselves with highlighting some of the important items.
-
Wednesday, September 10. We continued our
exploration of interchanging operations (derivatives and sums), discussed the
exponential function and various \(L^p\) spaces.
-
Monday, September 8. Rather than covering
the standard definitions, which you can read in the book, we instead
concentrated on some of the advanced analysis concepts underlying operations
with infinities, especially interchanging operations. We'll continue our
conversation on these later.
-
Mathematics StackExchange is a good place to look for answers:
http://math.stackexchange.com/questions/147869/interchanging-the-order-of-differentiation-and-summation
and
http://math.stackexchange.com/questions/352150/differentiating-an-infinite-sum
-
Interchanging derivative and integral:
http://planetmath.org/differentiationundertheintegralsign
-
We need to spend a lot of time worrying about technical issues; this is par
for the course as you continue in analysis. There are a lot of statements
which appear reasonable, but turn out to be false. This is why we looked at
the function \(g(x) = \exp(-1/x^2)\) for \(x \neq 0\) and \(0\) otherwise;
this showed that a Taylor series need not converge to the original function
anywhere beyond the point of expansion (where trivially it must agree). There's
even more bad news -- this shows us that a Taylor series need not determine the
function, as \(x - x^3/3! + x^5/5! - \cdots\) could be the Taylor series of
\(\sin x\) or of \(g(x) + \sin x\).
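- A quick Mathematica check that every Taylor coefficient of \(g\) at the origin vanishes, by computing the limits of the first few derivatives (a sketch; higher orders work the same way):
Table[Limit[D[Exp[-1/x^2], {x, n}], x -> 0], {n, 0, 4}]  (* {0, 0, 0, 0, 0} *)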
-
Another big theme of the day was asking questions. There's a standard list
that work in many situations: Does it exist? Where does it exist? Is it
unique? How quickly does it converge? What about higher dimensional analogues?
How do I compute it?
-
One question we didn't consider today was how Taylor series behave under
combinations. What is the Taylor series of a sum? Of a product? Of a
composition? Are there nice formulas relating the new object to the original
ones?
- We then discussed the geometric
series formula. The standard proof is nice; however, for our course the
`basketball' proof is very important, as it illustrates a key concept in
probability. Specifically, if we have a memoryless
game, then frequently after some number of moves it is as if the game
began again. This is how we were able to quickly calculate the probability
that the first shooter wins, as after both miss it is as if the game just
started.
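- A quick check of the `basketball' computation in Mathematica; the hit probabilities \(p\) and \(q\) below are illustrative choices:
(* P(first shooter wins) = sum over n of [(both miss) n times] * p = p/(1 - (1-p)(1-q)) *)
With[{p = 3/10, q = 2/5}, {Sum[((1 - p) (1 - q))^n p, {n, 0, Infinity}], p/(1 - (1 - p) (1 - q))}]  (* both give 15/29 *)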
- The geometric series formula only makes sense when \(|r| < 1\), in which case
\(1 + r + r^2 + \cdots = 1/(1-r)\); however, the right hand side makes sense for all
\(r\) other than \(1\). We say the function \(1/(1-r)\) is a (meromorphic)
continuation of
\(1+r+r^2+\cdots.\)
This means that they are equal when both are defined; however, \(1/(1-r)\) makes
sense for additional values of \(r\). Interpreting \(1+2+4+8+\cdots\) as \(-1\) or
\(1+2+3+4+5+ \cdots\) as \(-1/12\) actually DOES make sense, and arises in modern physics
and number theory (the latter is \(\zeta(-1)\), where \(\zeta(s)\) is the Riemann
zeta function)!
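- In Mathematica the zeta function is already the meromorphically continued one:
Zeta[-1]  (* -1/12, the value the continuation assigns to 1 + 2 + 3 + ... *)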
- Wikipedia page on limsup and liminf:
http://en.wikipedia.org/wiki/Limit_superior_and_limit_inferior (the
further down you read, the less useful it is for our purposes!).
- The
rearrangement theorem illustrates the dangers that can happen when we
deal with sums that are conditionally but not absolutely convergent.
-
Here is a link with more GRE information and practice exams:
http://www.wmich.edu/mathclub/gre.html
-
Here is a link to today's lecture:
http://youtu.be/MonfQXBshnI