Additional comments related to material from the class. If anyone wants to
convert this to a blog, let me know. These additional remarks are for your
enjoyment, and will not be on homeworks or exams. These are just meant to
suggest additional topics worth considering, and I am happy to discuss any of
these further.
- Friday, May 15. Today's question on
the universality of spacing results is one of the central ones in modern
mathematics and science. What I've seen over the past few years / decades is
that this universality is frequently due to the fact that the answer depends
only on the first two moments, and the higher moments affect the rate of
convergence. We saw this a bit when we looked at the zeros of Dirichlet
L-functions and when we looked at the density of states in random matrix
theory (only the first and second moments mattered there). Today we discussed
the proof of the
Central Limit Theorem. We saw that the density of the sum of two independent
random variables X1 and X2 with densities f1 and f2 is given at x by
Int_{-oo to oo} f1(t) f2(x - t) dt; this is called the
convolution of the two
functions, and is denoted (f1 * f2)(x). The proof uses the fact that the
Fourier Transform
of a convolution is the product of the Fourier transforms and that
lim_{n --> oo} (1
+ x/n)^n = exp(x). The reason we did not give a full proof is that we
needed to use the inverse Fourier transform, and we never showed that a nice
function is uniquely determined by its Fourier
transform (Fourier inversion). For details see sections
11.4.3 and 11.5 in our book.
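For those who like to experiment, here is a small Python sketch (the grid size
and the number of summands are arbitrary choices): repeatedly convolving the
uniform density on [0,1] with itself and then standardizing gives something
already very close to the Gaussian, just as the Central Limit Theorem predicts.

    import numpy as np

    # Iterate the convolution f * f * ... * f for the uniform density on [0,1]
    # and compare the standardized result with the Gaussian.
    dx = 0.001
    x = np.arange(0, 1, dx)
    f = np.ones_like(x)                # density of a single Uniform(0,1)

    n = 8                              # number of independent summands
    g = f.copy()
    for _ in range(n - 1):
        g = np.convolve(g, f) * dx     # density of the sum of one more variable

    t = np.arange(len(g)) * dx         # the sum of n uniforms lives on [0, n]
    mean, std = n * 0.5, np.sqrt(n / 12.0)
    z = (t - mean) / std
    gauss = np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)
    print("max difference from the Gaussian:", np.max(np.abs(g * std - gauss)))
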
- Wednesday, May 13. Today we analyzed
the first sum (the (log p / sqrt(p) log m) chi(p) ... term) in the explicit
formula for the family of Dirichlet L-functions (arising from chi: (Z/mZ)* to
Complex Numbers of Absolute Value 1). We saw that if the Fourier transform of
our test function h has support in (-sigma, sigma) with sigma < 2 then these
terms do not contribute as m --> oo. The two inputs for this were the
following: (1) a formula for summing chi(n) over all chi in our family (this
sum is -1 if n is not congruent to 1 mod m, and m - 2, ie (m-1) - 1 since the
principal character is excluded, if it is); (2) trivially
estimating the two resulting sums, one over all primes and one over primes
congruent to 1 modulo m. Almost surely one can do better. We attacked this
problem the same way we've attacked numerous others: put in absolute values
and take worst case scenarios every single time. Inputting more number theory
should lead to better results. There are numerous papers which have
estimates that are similar to what we need (ie, estimates for error terms in
the distribution of primes in arithmetic progression), but sadly none of these
have provable results which help (they do have conjectures which help). If
you're interested, let me know and I'll send you some papers to read. A good
first paper to get some familiarity with the subject is Montgomery's
Primes in Arithmetic Progression.
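If you want to see the character-sum formula in action, here is a quick Python
sketch (the modulus m = 11 and the generator g = 2 are arbitrary choices); it
builds the characters from a generator of (Z/mZ)*, as in Monday's class, and
checks that summing chi(n) over the non-principal characters gives m - 2 or -1.

    import cmath

    # Build all Dirichlet characters mod a prime m from a generator g of
    # (Z/mZ)*, then check the sum over the non-principal characters: it is
    # m - 2 when n = 1 (mod m) and -1 otherwise (for n coprime to m).
    def characters(m, g):
        dlog = {pow(g, k, m): k for k in range(m - 1)}   # n = g^k  ->  k
        return [{n: cmath.exp(2j * cmath.pi * ell * dlog[n] / (m - 1))
                 for n in dlog} for ell in range(m - 1)]

    m, g = 11, 2                       # 2 generates (Z/11Z)*
    chars = characters(m, g)           # chars[0] is the principal character
    for n in (1, 3, 12):               # 12 = 1 mod 11, so it behaves like n = 1
        s = sum(chi[n % m] for chi in chars[1:])
        print(n, round(s.real, 8), round(s.imag, 8))
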
- Monday, May 11. As always, see
Chapters 3 and 18 of our book for more information about
Dirichlet
Characters and
Dirichlet L-functions. Their main applications are in proving
Dirichlet's Theorem on Primes in Arithmetic Progression and other similar
results. We showed today how to modify the explicit formula for the Riemann
zeta function to obtain one for Dirichlet L-functions. We assumed the
Generalized Riemann Hypothesis, which allows us to write the non-trivial
zeros of L(s,chi) as 1/2 + i gamma_rho, where gamma_rho is a real number. This
led to an explicit formula (see Chapter 18 for the details) relating sum_rho
h(gamma_rho * log m) to sum_p log p h^(log p / log m) / sqrt(p). We
discussed in detail why we have a function h on the zeros side and its Fourier
transform h^ on the prime side. In Wednesday's class we'll analyze these sums
(if
you want to read ahead and see all the details, the preprint is available here).
This falls out naturally from the logarithmic derivative and choosing our test
function to be nice when Re(s) = 1/2. Specifically, we're integrating (1/2 pi
i) Int_{Re(s) = 1+epsilon} phi(s) ds / n^s, where phi(s) = h( (s - 1/2) / i )
for some nice function h. When we shift contours to Re(s) = 1/2, we have s =
1/2 + it so ds = i dt and the integral becomes (1 / 2 pi) Int_{t = -oo to oo}
h(t) dt / n^{1/2 + it} = (1 / 2 pi) (1 / sqrt(n)) Int_{t = -oo to oo} h(t)
exp(-2 pi i (log n / 2 pi) t) dt, and this integral is just h^(log n / 2 pi).
We also talked a bit about how to write down formulas for each Dirichlet
character,
using the fact that (Z/mZ)* is a cyclic group for m prime. We saw each
group is generated by some element g, so a typical element x = g^k for some k.
It therefore suffices to define the Dirichlet character at the generator. As
these characters map (Z/mZ)* to complex numbers of absolute value 1 and are
group homomorphisms, we have |chi(g)| = 1, chi(g^{m-1}) = chi(g)^{m-1} = 1 and
chi(1) = 1, implying that the characters are the same as the functions
f(g) = exp(2 pi i ell / (m-1)) for ell in {0,1,...,m-2}. If ell = 0 we
basically recover the Riemann zeta function when we look at Sum_n chi(n) /
n^s; for other ell, however, we get very different functions. These functions
are significantly easier to extend past Re(s) = 1 because of the cancellation
in sums of chi(n). In fact, if ell is not zero, then the associated character
chi satisfies Sum_{n=1 to m-1} chi(n) = 0. This and partial summation allow us
to extend Sum_n chi(n) / n^s from Re(s) > 1 to Re(s) > 0. We briefly mentioned
random harmonic series: Sum_n omega(n) / n where omega(n) = 1 with
probability 1/2 and -1 with probability 1/2.
Schmuland
has a fascinating paper on the properties of these random series.
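Here is a quick Monte Carlo sketch of the random harmonic series (the
truncation point and the number of trials are arbitrary choices): the partial
sums settle down to a random limit with mean 0 and variance
Sum_n 1/n^2 = pi^2/6.

    import random

    # Sample the random harmonic series Sum_n omega(n)/n, omega(n) = +/-1 each
    # with probability 1/2, truncated at N terms; the limiting random variable
    # has mean 0 and variance pi^2/6 = 1.6449...
    random.seed(0)
    def sample(N=10000):
        return sum(random.choice((1, -1)) / n for n in range(1, N + 1))

    trials = 200
    values = [sample() for _ in range(trials)]
    mean = sum(values) / trials
    var = sum((v - mean) ** 2 for v in values) / trials
    print("sample mean %.3f (expect about 0)" % mean)
    print("sample variance %.3f (expect about 1.645)" % var)
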
- Friday, May 8. The Riemann zeta
function is the first of many such functions we can study. The generalizations
are called L-functions,
and for us are of the form L(s,f) = Sum_n a_n(f) / n^s = Prod_p L_p(s,f)^{-1}
where L_p(s,f) is a polynomial of degree d in p^{-s}. Two of the most
important are
Dirichlet L-functions (which have applications to primes in arithmetic
progression) and Elliptic Curve L-functions (which have applications to
understanding the size of the group of rational solutions of the elliptic
curve -- see the
Birch and Swinnerton-Dyer conjecture for more information). Dirichlet
characters are sometimes covered in a group theory or abstract algebra course.
If you want more details, see Chapter 3 of our book (from Section 3.3.2 to
3.3.6). Elliptic curves are discussed in Section 4.2. We initially used some
knowledge of the zeros of the Riemann zeta function to deduce information
about the primes. Amazingly, if we look at families of L-functions we can
convert knowledge of sums over the family of the a_n(f) to information about
the zeros of the associated L-functions. We saw how we have great formulas for
summing Dirichlet characters; similar formulas exist for other families as
well. For details in the case of Dirichlet L-functions, see Chapter 18 of our
book. Note the amazing similarity with random matrix theory. We have three
ingredients to understand the zeros. (1) Determine the correct scale to study
the zeros (this actually falls out from the functional equation). (2) Derive a
formula relating sums over the zeros to a related sum over the prime
coefficients. This is the analogue of the Eigenvalue Trace Lemma, Tr(A^k) =
Sum lambda_i(A)^k. The reason this formula was so useful is that while we want
to understand the eigenvalues of our random matrices, it is the matrix
elements that we choose. Thus, this formula allows us to pass from knowledge
of the matrix elements to knowledge of the zeros. These are known as
Explicit Formulas.
(3) The Eigenvalue Trace Lemma and the Explicit formula would be useless,
however, if we were unable to actually execute the sums. Our theory thus
requires some kind of averaging formula. For random matrix theory, these were
the integrals of Tr(A^k) P(A) dA; we could compute these as Tr(A^k) is a
polynomial in the matrix elements, and then we used combinatorics and
probability theory. Sadly, we do not have great averaging formulas in number
theory, and this is why the results there are significantly worse.
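A quick numerical check of the Eigenvalue Trace Lemma (the matrix size is an
arbitrary choice):

    import numpy as np

    # Verify Tr(A^k) = Sum_i lambda_i(A)^k for a random real symmetric matrix.
    rng = np.random.default_rng(0)
    N = 6
    A = rng.standard_normal((N, N))
    A = (A + A.T) / 2                           # make A real symmetric
    eigenvalues = np.linalg.eigvalsh(A)
    for k in range(1, 5):
        print(k,
              round(np.trace(np.linalg.matrix_power(A, k)), 10),
              round(np.sum(eigenvalues ** k), 10))
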
- Wednesday, May 6. The complex analytic
proof of the Prime
Number Theorem uses several key facts. We need the functional equation of
the Riemann zeta function (which follows from Poisson summation and properties
of the Gamma function), the Euler product (namely that zeta(s) is a product
over primes), and one important fact that no one questioned in class: what
if the Riemann zeta function has a zero on the line Re(s) = 1! If this
happened, then the main term of x from integrating zeta'(s)/zeta(s) * x^s/s
arising from the pole of zeta(s) at s=1 would be cancelled by the contribution
from this zero! Thus it is essential that there be no zero of zeta(s) on Re(s)
= 1. There are many proofs of this result.
My favorite proof is based on a wonderful trig identity: 3 + 4 cos(x) +
cos(2x) = 2 (1 + cos(x))^2 >= 0, which follows from cos(2x) = 2 cos^2(x) - 1
(many people have said that w^2 >= 0 for real
w is the most important inequality in mathematics). If people are interested
I'm happy to give this proof in class next week (or see Exercise 3.2.19 in our
textbook; this would make a terrific aside if anyone is still looking for a
problem). There is an elementary proof of the prime number theorem (ie, one
without complex analysis). For those interested in history and some
controversy,
see
this article by Goldfeld for a terrific analysis of the history of the
discovery of the elementary proof of the prime number theorem and the priority
dispute it created in the mathematics community. We mentioned that Riemann
computed zeros of zeta(s) but never publicized how he did so; the method only
came to light about 70 years later when Siegel was looking at Riemann's
papers. Click
here for more on the Riemann-Siegel formula for computing zeros of zeta(s).
Finally, terrific advice given to all young mathematicians (and this advice
applies to many fields) is to read the greats. In particular, you should read
Riemann's original paper. In case your mathematical German is poor, you
can
click here for the English translation of Riemann's paper. The key passage
is on page 4 of the paper:
- One now finds indeed approximately this number of real roots within
these limits, and it is very probable that all roots are real. Certainly one
would wish for a stricter proof here; I have meanwhile temporarily put aside
the search for this after some fleeting futile attempts, as it appears
unnecessary for the next objective of my investigation.
- Monday, May 4. Today's lecture
highlights the connections between complex analysis and number theory. The key
relationship is that the logarithmic derivative of the Riemann zeta function is
a sum over primes. There are two expressions for the Riemann zeta function when
Re(s) > 1, one as a sum over integers and one as a product over primes. While
typically the sum is easier to work with, it is the product expression that is
more useful here (not surprisingly as we are interested in properties of
primes). Whenever we see a product we want to take a logarithm, and thus it is
natural to take the logarithm of the product expansion. This product expansion
is called the Euler
Product, and is one of the most useful properties of the Riemann zeta
function. We then showed (or rather sketched the argument -- we'll provide
more details on Wednesday) by contour integration that Sum_{p <= x} log p +
nice(x) = x - sum_{rho a zero of zeta(s)} x^rho / rho. This is the
Explicit Formula
of the Riemann Zeta Function (see Chapter 18 for more details), and shows how
we can pass from knowledge of the zeros to knowledge of the primes.
The Riemann
Hypothesis asserts that all the non-trivial zeros have real part equal to
1/2; this basically implies that the error term in the
Prime Number Theorem is of size
x^{1/2 + epsilon} for any epsilon > 0. We will explore in greater detail
connections between fine properties of these zeros and properties of the
primes specifically and other number theoretic functions in general. For
example, the
class
number is extremely important in additive number theory, and the best
unconditional bounds on its size are significantly weaker than what we expect
to be true. It is now believed that the spacing statistics of the Riemann zeta
zeros agree with those of eigenvalues of the
Gaussian
Unitary Ensemble (an important class of random matrices); if true (or even
if we could just prove some weaker results towards this conjectured
agreement), we end up with bounds for the class number of the expected order
of magnitude. For details, see the paper
Spacing of
Zeros of Hecke L-Functions and the Class Number Problem by Conrey and
Iwaniec. Other fine properties of the distribution of the
zeros, such as the Grand Simplicity Hypothesis, are important in analyzing
Chebyshev's bias;
see
Chebyshev's Bias by Rubinstein and Sarnak.
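To see the main term of the explicit formula numerically, here is a small
Python sketch (the cutoffs are arbitrary choices) computing the weighted prime
sum psi(x) = Sum_{p^k <= x} log p, the left side of the formula with the
prime-power terms included; psi(x)/x approaches 1, the contribution of the
pole of zeta(s) at s = 1, while the zeros control the fluctuations.

    import math

    def primes_up_to(n):
        sieve = [True] * (n + 1)
        sieve[0:2] = [False, False]
        for p in range(2, int(n ** 0.5) + 1):
            if sieve[p]:
                sieve[p * p::p] = [False] * len(sieve[p * p::p])
        return [p for p in range(2, n + 1) if sieve[p]]

    def psi(x):
        total = 0.0
        for p in primes_up_to(x):
            pk = p
            while pk <= x:           # include the prime powers p, p^2, p^3, ...
                total += math.log(p)
                pk *= p
        return total

    for x in (100, 1000, 10000, 100000):
        print(x, round(psi(x), 1), round(psi(x) / x, 4))
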
- Wednesday, April 29. To see the
connection between zeros of the Riemann zeta function and the distribution of
primes requires some results from complex analysis. The main input we will
need is that the integral along a circle (or more generally a nice curve) of
the logarithmic derivative of a nice function is just the order of the zero or
pole at the center of the circle. In other words, suppose we have an
expansion f(z) = a_k z^k + ... (where a_k is the first non-zero coefficient;
if k > 0 we say the function has a zero of order k at the
origin, while if k < 0 we say the function has a pole of order -k). The Residue
theorem then gives: (1 / 2 pi i) Integral_{|z| = r} f'(z)/f(z) dz = k. Note
that if the function doesn't have a zero or pole at the origin then this
integral is zero (for r sufficiently small). More generally, if g(z) is a nice
function then (1 / 2 pi i) Integral_{|z| = r} g(z) f'(z)/f(z) dz = k g(0). We will
use a further generalization of this on Monday to relate the zeros of the
Riemann zeta function to counting the number of primes at most x. For more
details on the complex analysis we are using, see Cauchy-Riemann
equations,
Cauchy-Goursat Theorem,
Residue Theorem,
Green's Theorem.
The key takeaways from today's class are: (1) we can convert certain types of
integrals to finding the a_{-1} coefficient in a Taylor expansion (and this is
good as algebra is easier than integration); (2) integrating the logarithmic
derivative is useful as the answer is related to the zeros and poles of the
function.
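Here is a numerical sketch of that key formula (the functions f and g and the
radius are illustrative choices): integrating g(z) f'(z)/f(z) around a small
circle recovers k g(0).

    import numpy as np

    # Take f(z) = z^3 (1 + z), which has a zero of order k = 3 at the origin
    # (its other zero, z = -1, lies outside the contour), g(z) = exp(z), and
    # integrate around |z| = 1/2; the answer should be k g(0) = 3.
    r = 0.5
    theta = np.linspace(0, 2 * np.pi, 20000, endpoint=False)
    z = r * np.exp(1j * theta)
    dz = 1j * z * (2 * np.pi / len(theta))       # dz along the circle
    f = z**3 * (1 + z)
    fprime = 3 * z**2 * (1 + z) + z**3
    g = np.exp(z)
    print(np.sum(g * fprime / f * dz) / (2j * np.pi))   # close to 3
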
- Monday, April 27. There are many
proofs of the
functional equation of the
Riemann zeta
function; the proof we gave is `secretly' relating the Riemann zeta
function to the Mellin
transform (which is basically the
Fourier transform
after a change of variables) of the
theta function. A
crucial input was the
Gamma function, which arises throughout mathematics, statistics, science,
.... Functional equations are extremely important, as they allow us to extend
useful functions that are initially only defined in one region to larger
regions. The functional equations of the Riemann zeta function, the Gamma
function and the geometric series are just a few instances. It is worth
pondering what allows us to find a functional equation. For the Gamma
function, it was
integrating by parts in the integral defining the Gamma function; for the
theta function, it was
Poisson
summation. Finally, it is worth noting that we have seen yet again
examples of how problems can be converted to integrals. In this case, the
Riemann zeta function initially was only defined for Re(s) > 1; however, we
then rewrote it as an integral from x = 0 to oo involving the omega function
(which also made sense only for Re(s) > 1), but then we rewrote that as
two integrals from x = 1 to oo involving the omega function, and these
integrals exist for all s. We are fortunate in finding an integral expression
which we can work with. It should hopefully seem `natural' (at least in
hindsight) in passing from the omega function to the theta function (omega is
a sum over n > 0, theta is a sum over all n and thus there is a chance Poisson
summation could be useful).
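A quick numerical check of the theta functional equation that Poisson
summation gives (truncating the sum at 50 terms is more than enough):

    import math

    # theta(x) = Sum_{n in Z} exp(-pi n^2 x) satisfies theta(1/x) = sqrt(x) theta(x).
    def theta(x, terms=50):
        return 1 + 2 * sum(math.exp(-math.pi * n * n * x) for n in range(1, terms))

    for x in (0.5, 1.0, 2.0, 3.7):
        print(x, theta(1 / x), math.sqrt(x) * theta(x))
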
- Friday, April 24. Many of the Riemann
zeta function's properties are related to viewing it as a function of a
complex variable s. As such, it is not surprising that we need some results
from Complex Analysis for our studies. The main result we are heading towards
is the Cauchy
Residue Theorem. The most important fact is that if f(z) = Sum_{n = -N to
oo} a_n z^n, then (1 / 2 pi i) Int_{|z| = r} f(z) dz = a_{-1}. The reason this
is such a spectacular formula is that it reduces integration (hard) to finding
ONE Taylor coefficient (ie, algebra, ie easy). Finally, below are the three
arxiv posts I mentioned related to either topics we've just studied or are
about to (note: the arxiv is a
wonderful site, but nothing on it is refereed; many professional
mathematicians check the arxiv
every day and skim the titles and abstracts of all posts; many more do this
for their speciality).
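As a concrete illustration (the function and the radius are arbitrary
choices), here is the formula at work for f(z) = exp(z)/z^2, whose Laurent
expansion 1/z^2 + 1/z + 1/2 + ... has a_{-1} = 1:

    import numpy as np

    # Integrate f(z) = exp(z)/z^2 around a small circle; (1/2 pi i) times the
    # integral should be the coefficient a_{-1} = 1.
    r = 0.3
    theta = np.linspace(0, 2 * np.pi, 20000, endpoint=False)
    z = r * np.exp(1j * theta)
    dz = 1j * z * (2 * np.pi / len(theta))
    print(np.sum(np.exp(z) / z**2 * dz) / (2j * np.pi))   # close to 1
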
- Wednesday, April 22. Creating prime
deserts efficiently is an interesting challenge (one can try to prove the
existence of gaps or one can try and construct explicitly such a gap);
see the wikipedia entry
for a description of known results. Note that the (k+1)! method Bryan
mentioned means we need to look at numbers of size 10^5768 to find a gap of
size 2009. The correct notation for Ralph's prime factorial is #, called
the primorial (see the
link for a nice summary of how it grows relative to the
factorial function);
using the primorial function we see it suffices to `merely' go up to about
10^845 (hey, it's large but it's better than where our results kick in for the
sum of three primes, and it does beat using Chebyshev, I think). Trying to
find good upper and lower bounds for
phi(q), Euler's
totient function, is but one of many problems concerning the standard
arithmetic functions. One is often interested in average values, standard
deviations, et cetera of such functions. A great source for such material is
Hardy and Wright's
classic An Introduction to the Theory of Numbers (you should also read
Hardy's A
Mathematician's Apology for a description of one person's reasons for
doing math). Other functions that are fascinating to study (as q varies) are
the divisor function,
the number of
distinct prime factors, .... Finally,
partial summation
is one of the most frequently used tools in number theory and analysis, for
two reasons: (1) it allows us to pass from a sum we know to one we want to
know; (2) it replaces sums with integrals, and we have more closed form
expressions for integrals. We used partial summation today to prove that the
sum of the reciprocals of the primes at most x grows like log log x, which we
then used to prove phi(q) > C q / log log q. This is but one of many
applications of partial summation (and
but one of many proofs that the sum of the reciprocals of the primes diverges).
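Here is a small Python sketch of that sum (the cutoffs are arbitrary choices);
the sum of 1/p over primes up to x tracks log log x, and in fact log log x
plus the Mertens constant 0.2615..., remarkably closely.

    import math

    def primes_up_to(n):
        sieve = [True] * (n + 1)
        sieve[0:2] = [False, False]
        for p in range(2, int(n ** 0.5) + 1):
            if sieve[p]:
                sieve[p * p::p] = [False] * len(sieve[p * p::p])
        return [p for p in range(2, n + 1) if sieve[p]]

    # Sum of 1/p over primes p <= x versus log log x + 0.2615 (Mertens constant).
    for x in (10**3, 10**4, 10**5, 10**6):
        s = sum(1.0 / p for p in primes_up_to(x))
        print(x, round(s, 4), round(math.log(math.log(x)) + 0.2615, 4))
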
- Monday, April 20.
RSA encryption is just one of
many encryption schemes based on number theory. The pirate example we
mentioned (transmitting a secret from island to island) is an example of key
exchange; one popular method is the
Diffie-Hellman key exchange (it was quite surprising to some that there
was a way for two people to agree upon a secret knowable only to them without
meeting in person to exchange the secret). The key ingredient in proving RSA
works is
Fermat's little Theorem. Thus the key ingredients in our analysis are some
of the standard functions of number theory (such as the
Euler totient
function), efficient computational algorithms (such as
fast
exponentiation and the
Euclidean algorithm,
both of which are described in detail in Chapter 1 of our book), and of course
the need to have a fast way to determine if a number is prime. Good, efficient
methods to check primality have long been known, but these were either
probabilistic or depended on the Riemann hypothesis; a few years ago Agrawal,
Kayal and Saxena gave an explicit, deterministic polynomial time algorithm, in
their paper PRIMES is in P. We also
briefly discussed other efficient algorithms for multiplying matrices. The
Strassen algorithm
(see also the
Mathworld entry here, which I think is a bit more readable)
multiplies two NxN matrices A and B in about N^(log_2 7) multiplications; the reason for this savings is that one can
multiply two 2x2 matrices with seven rather than eight multiplications (note log_2 8 = 3, while log_2 7 is about 2.807).
The best known algorithm is the
Coppersmith-Winograd algorithm, which is of the order of N^2.376
multiplications.
See also this paper for some comparison analysis, or email me if you want
to see some of these papers. Some important facts. (1)
The Strassen
algorithm has some issues with numerical stability. (2) One can ask similar questions about one-dimensional matrices (ie, numbers): how many
bit operations does it take to multiply two N digit numbers? It can be done in
less than N^2 bit operations (again, very surprising!). One way to do this is
with the Karatsuba algorithm
(see also the Mathworld entry for the
Karatsuba
algorithm). Finally, we ended with a discussion of the
group structure of
elliptic curves, which replaces the group (Z/pqZ)* with a more complicated
group and thus opens up
another
possibility for encryption. In both systems the difficulty in cracking the
code comes from having to solve the
discrete log
problem.
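Here is a short sketch of fast exponentiation, one of the ingredients above,
together with a toy RSA-style check (the primes 61 and 53 and the exponents
are a standard toy example, nothing one would use in practice); Python's
built-in three-argument pow does the same computation.

    # Compute a^e mod n by repeated squaring: about log_2(e) squarings
    # instead of e - 1 multiplications.
    def power_mod(a, e, n):
        result, base = 1, a % n
        while e > 0:
            if e & 1:                  # current binary digit of e is 1
                result = (result * base) % n
            base = (base * base) % n   # square for the next binary digit
            e >>= 1
        return result

    # Toy RSA check: n = 61 * 53, public exponent 17, private exponent 2753
    # (17 * 2753 = 1 mod phi(n), with phi(n) = 60 * 52 = 3120).
    n, e, d = 61 * 53, 17, 2753
    message = 65
    cipher = power_mod(message, e, n)
    print(cipher, power_mod(cipher, d, n))   # the second value recovers 65
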
- Friday, April 17. The main
accomplishment of today was three failed attempts at proving that every
even number can be written as
the sum of
two primes (also known as the Binary Goldbach problem). Though we didn't
succeed, it is illuminating to see what approaches are `natural' and why they
may or may not work. In particular, approximating the integral of a
non-negative function through the
Cauchy-Schwarz inequality frequently works in many problems.
- Wednesday, April 15. Today we
emphasized heuristics to see if the Circle Method has a chance of working. One
of the HW problems gives lower bounds for how many kth powers are
needed so that all integers are a sum of at most that many kth
powers; surprisingly, this lower bound turns out to be the correct answer for
many problems (ie, some integers need at least this many terms, and in fact
this many terms suffice for all integers);
see the article on
Wikipedia for more details. Before doing long calculations it is
worthwhile to try some quick rough estimates to get a sense of whether or not
the method has a chance of success.
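For concreteness, here is the classical lower bound from the HW problem: the
number 2^k * floor((3/2)^k) - 1 is smaller than 3^k, so it can only be built
out of 1's and 2^k's, and that forces 2^k + floor((3/2)^k) - 2 summands;
conjecturally this many kth powers suffice for every integer.

    import math

    # The classical lower bound for Waring's problem.
    for k in range(2, 8):
        print(k, 2**k + math.floor(1.5**k) - 2)   # 4, 9, 19, 37, 73, 143
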
- Friday, April 10. Looking for
obstructions to Diophantine equations is a great way to start to investigate
whether or not there is a solution. In particular, if an equation f(x1, ...,
xn) = 0 is to be solvable in the integers two things must hold: (1) it must
have a solution with each xi real; (2) it must have a solution modulo p for
each prime, ie, f(x1,...,xn) = 0 mod p. While we can use this to prove certain
equations don't have solutions (x1^2 + x2^2 + 2009 = 0 has no real solution,
and 2x1 + 2x2 - 2009 = 0 has no solution modulo 2), it is not the case
that if these two conditions hold then the equation has an integer solution.
The classic example is 3x^3 + 4y^3 + 5z^3 = 0 (due to Selmer). The hope is
that somehow (often using the
Chinese
Remainder Theorem) we can piece together the local solutions for each
prime p to form a global solution to the original problem. This is known as
the Hasse Principle
(works well for quadratics, but as Selmer's example shows it does not work in
general). We also discussed the Philosophy of Square Root Cancellation (ie,
the Central Limit
Theorem) (see pages 213 to 215 of our book for a proof in the special case
of tossing a fair coin). See also this blog entry by
E. Kowalski. If you want to numerically explore and see the square-root
cancellation in practice,
you
can go to the following applet on the web. Finally, there was a nice post
on the arxiv recently by Boklan and Elkies:
Every
multiple of 4 except 212, 364, 420, and 428
is the sum of seven cubes.
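Here is a quick Monte Carlo sketch of the square-root cancellation (the number
of trials is an arbitrary choice): the typical size of a sum of N random
+/-1's is on the order of sqrt(N), not N.

    import numpy as np

    # Average absolute value of a sum of N random +/-1's, compared with sqrt(N).
    rng = np.random.default_rng(0)
    trials = 100
    for N in (10**2, 10**4, 10**6):
        totals = []
        for _ in range(trials):
            coins = rng.integers(0, 2, size=N)            # N fair coin flips
            totals.append(abs(2 * int(coins.sum()) - N))  # corresponding +/-1 sum
        print(N, round(sum(totals) / trials, 1), round(N ** 0.5, 1))
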
- Wednesday, April 8. The main term of
the Hardy-Littlewood conjectures shows phenomenal agreement
with numerics (for predicting the number of twin primes at most x; similar
agreement is seen in other problems); unfortunately, in general proving the
error term is small is beyond all current techniques. See the nice
blog post by Terry Tao for more on randomness in primes and predictions.
The current record for writing large odd numbers as the sum of three primes is
that any odd number at least 10^1346 is the sum of at most three primes (this
number is far beyond the range of anything we can investigate on the
computer). The key ingredient in our investigations is to use
generating
functions; the difficulty is finding such functions whose coefficients
encode the information we want while being tractable enough to work with. One
enormous advantage of the modern formulation of the Circle Method over the
original is that we just use finite series; this avoids many convergence
issues and simplifies the analysis.
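To see the generating function idea concretely, here is a small Python sketch
(N = 100 is an arbitrary cutoff): with f(x) = Sum_{p <= N} x^p, the coefficient
of x^n in f(x)^3 counts the ordered ways of writing n as a sum of three
primes, and we only ever work with finite series.

    def primes_up_to(n):
        sieve = [True] * (n + 1)
        sieve[0:2] = [False, False]
        for p in range(2, int(n ** 0.5) + 1):
            if sieve[p]:
                sieve[p * p::p] = [False] * len(sieve[p * p::p])
        return [p for p in range(2, n + 1) if sieve[p]]

    def multiply(a, b):            # multiply two polynomials (coefficient lists)
        c = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            if ai:
                for j, bj in enumerate(b):
                    c[i + j] += ai * bj
        return c

    N = 100
    f = [0] * (N + 1)
    for p in primes_up_to(N):
        f[p] = 1                   # f is the generating function of the primes
    f3 = multiply(multiply(f, f), f)
    for n in (21, 35, 99):
        print(n, f3[n])            # ordered representations as p1 + p2 + p3
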
- Monday, April 6. We discussed two of
the main ingredients in the Circle Method: (1) writing expressions as Main
Term + Error with good control on the error, and (2) proving something happens
at least once by showing it happens many times. We discussed
prime races (see
also here), and how misleading the data can be. Instead of looking at
pi_{3,4}(x) - pi_{1,4}(x) we could look at Li(x) - pi(x); this was also
observed to be positive as far as people could see, but it turns out that the
difference changes sign infinitely often. This was shown by Littlewood, but
it was not known how far one must go to see pi(x) > Li(x). His student
Skewes showed it
suffices to go up to 10^(10^(10^34)) if the
Riemann hypothesis
is true, or 10^(10^(10^963)) otherwise (as large as these numbers are, they
are dwarfed by Graham's
number). We 'believe' it's around 10^316 where pi(x) beats Li(x) for the
first time (note this is well beyond what we can investigate on the
computer!). The proof involves the Grand Simplicity Hypothesis (that the
imaginary parts of the non-trivial zeros of Dirichlet L-functions are linearly
independent over the rationals); this is used to show that (n gamma_1, ..., n
gamma_k) mod 1 is equidistributed in [0,1]^k where the gamma_j are the
imaginary parts of these zeros. Note that this is Kronecker's theorem (which
we discussed in one of the homework problems); it's amazing how this result
surfaces throughout mathematics. We'll continue on Wednesday with the Circle
Method applied to certain
Diophantine
Problems; if anyone is interested in looking at the
Catalan Conjecture
(now a theorem) and its relation to the fact that no product of four
consecutive integers is
a square, let me know. We ended with a brief discussion of how instead of
looking at A+A+...+A we could look at A-A; for example, if A = P (the set of
primes), we know 2 is in P-P. What is nice about the Circle Method is the way
it proves something is in a set like A+A+...+A or A-A is to count how MANY
times it is in. Thus, the Circle Method will give heuristics as to how many
times 2 occurs in P(N) - P(N), where P(N) is the set of primes at most N. This
leads to the
Hardy-Littlewood heuristics for the number of twin primes. In our book we
study how many Germain
primes there are, primes p such that (p-1)/2 is also prime. These primes
have applications in cryptography (in
proving that it is possible to do primality testing in polynomial time (see
also here) -- if there are as many Germain primes as the Circle Method
predicts, certain primality tests run faster) and in Fermat's Last Theorem (if
x^p + y^p = z^p and p is a Germain prime then p|xyz).
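Here is a small sketch of the mod 4 prime race (the cutoffs are arbitrary
choices); Team 3 is typically (though, as discussed above, not always!) ahead.

    def primes_up_to(n):
        sieve = [True] * (n + 1)
        sieve[0:2] = [False, False]
        for p in range(2, int(n ** 0.5) + 1):
            if sieve[p]:
                sieve[p * p::p] = [False] * len(sieve[p * p::p])
        return [p for p in range(2, n + 1) if sieve[p]]

    # pi_{3,4}(x) versus pi_{1,4}(x): primes 3 mod 4 versus primes 1 mod 4.
    primes = primes_up_to(10**6)
    for x in (10**3, 10**4, 10**5, 10**6):
        pi3 = sum(1 for p in primes if p <= x and p % 4 == 3)
        pi1 = sum(1 for p in primes if p <= x and p % 4 == 1)
        print(x, pi3, pi1, pi3 - pi1)
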
- Friday, March 20. TBD: We'll almost
surely talk about Poisson Summation and its applications. One fun application
is to proving the iterates of the
3x+1 map satisfy Benford's law. The 3x+1 map is just one of many fascinating
sequences to study; another great one is
Conway's See and
Say (or Look and Say) sequence: 1, 11, 21, 1211, 111221, .... There are
numerous fascinating properties of these sequences; one of my favorites is the
Cosmological
Theorem and the interpretation of terms in the sequence in terms of
elements in the periodic table! There were several proofs of this wonderful
theorem; unfortunately they were all `lost'. Since then new ones have
appeared; see
the write-up here for a proof. The first author of the paper is
Shalosh B. Ekhad;
if you've never met `Professor Ekhad' I strongly urge you to click on the link
and take a look at the `Professor' (who is the first author of the
paper!).
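Generating the See and Say sequence is a fun exercise in a few lines of
Python (each term simply reads off the runs of the previous one):

    from itertools import groupby

    # 1 -> "one 1" -> 11 -> "two 1s" -> 21 -> "one 2, one 1" -> 1211 -> ...
    def next_term(s):
        return "".join(str(len(list(run))) + digit for digit, run in groupby(s))

    term = "1"
    for _ in range(8):
        print(term)
        term = next_term(term)
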
- Wednesday, March 18.
Poisson
Summation is one of the standard tools of analytic number theory, allowing
us to frequently convert long, slowly decaying sums to short, rapidly decaying
sums, so that just a few terms suffice to get a good estimate. One nice
application is to counting the number of lattice points (points with integer
coordinates) inside a circle (also called the
Gauss circle
problem). You would
expect approximately pi R^2 such points to be in a circle of radius R; what is
the error? A little inspection shows that the error shouldn't be much worse
than the perimeter, so the answer might be pi R^2 with an error of at most
something like 2 pi R (Gauss proved an error of at most 2 sqrt(2) pi R). The
current record is by Huxley, who
shows that the error is at most C R^theta, where theta <= .6298.... We
also looked at the
Fourier Transform and interesting functions that satisfy a lot of nice
conditions but not every property we'd like. See for example the function f on
page 270 (or better yet, modify it to a function that is infinitely
differentiable and such that it and its first five powers are integrable).
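Here is a small numerical sketch of the Gauss circle problem (the radii are
arbitrary choices): the count of lattice points is compared with the area
pi R^2, and the error stays well below the perimeter 2 pi R.

    import math

    # Lattice points in a circle of radius R versus the area pi R^2.
    for R in (10, 50, 100, 500):
        count = sum(1 for x in range(-R, R + 1) for y in range(-R, R + 1)
                    if x * x + y * y <= R * R)
        print(R, count, round(count - math.pi * R * R, 1), round(2 * math.pi * R, 1))
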
- Monday, March 16. Today we chatted
about spacings between primes, and consequences. We discussed Chebyshev's
theorem (we prove a version on page 40 of our textbook) that for all x
sufficiently large, there are numbers A and B with 0 < A < 1 < B < oo such
that the number of primes pi(x) satisfies Ax/ln(x) <= pi(x) <= Bx/ln(x); we
used this to prove Bertrand's postulate that there is always a prime between n
and 2n. We then discussed the
Principle of
Inclusion - Exclusion (see pages 18 and 44) and used this to prove that if
we rearrange n people, as n --> oo the probability no one is back where they
started tends to 1/e (such a rearrangement is known as a
derangement). Using
inclusion / exclusion,
Brun was able to prove that the sum of the reciprocals of the twin primes converges.
If we set pi_2(x) to be the number of twin primes at most x, he showed pi_2(x)
< C x (log log x)^2 / (log x)^2; I used the bound C x / (log x)^{3/2} in
class, a MUCH weaker result. Our proof in class is typical of many results in
number theory -- be as crude as possible as long as possible in your
estimations; if you don't get your result, then refine your estimates as
needed. We did numerous 'worst case' approximations and still won. Now, if we
asked a harder question (estimate the RATE of convergence), then we of course
couldn't be so crude. Many of our arguments used dyadic interval
decompositions (as does the proof of Chebyshev given in our book). As for the
original question about the best known result on spacings between
primes: this depends on what one assumes. We believe there is always a
prime between n^2 and (n+1)^2; when I started grad school I believe the best
unconditional result was there is a prime between x and x + c x^{7/12}, which
has been improved to a prime between x and x + x^.525 (see
the article on Wikipedia for a general description,
or the paper by
Baker, Harman and Pintz for the details.) In particular, we've known for
awhile that there is a prime between n^3 and (n+1)^3 (which is basically
between x and x + C x^{2/3}). One of the ways we proved results such as these
is to show that there are MANY primes in the region, and thus there is at
least one. This is another common number theory technique, and we'll see it
again in the Circle Method. (Another occurrence is proving there is at least
one prime congruent to a mod b if a and b are relatively prime; the only way
we can do this in general is to prove there are INFINITELY many such primes,
and in fact that they occur in the right proportion; this is
Dirichlet's Theorem on Primes in Arithmetic Progression, included in
Chapter 3 of our book.) We discussed how there can't be a prime triple of the
form p, p+2 and p+4 other than 3, 5, 7; this leads to the question of just
which arithmetic progressions are possible.
The
largest to date might be an arithmetic progression of length 25;
Green and Tao
proved that there are arbitrarily long arithmetic progressions of primes (sadly it's an
existence theorem, with nothing said about actually finding one!).
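A quick Monte Carlo sketch of the derangement result (the number of trials is
an arbitrary choice): shuffle n people and see how often no one returns to
their original spot; the fraction approaches 1/e = 0.3679... very quickly.

    import math
    import random

    # Fraction of random shuffles of n people in which no one is fixed.
    random.seed(0)
    def derangement_fraction(n, trials=20000):
        count = 0
        for _ in range(trials):
            perm = list(range(n))
            random.shuffle(perm)
            if all(perm[i] != i for i in range(n)):
                count += 1
        return count / trials

    for n in (3, 5, 10, 20):
        print(n, round(derangement_fraction(n), 4), round(1 / math.e, 4))
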
- Friday, March 13. In discussing the
various fourth, sixth and general moment calculations, we saw two key points.
The first is that there can't be a contribution to the main term unless we are
matched in pairs; however, being matched in pairs is only a necessary but not
a sufficient condition for a main term contribution. For all real symmetric
matrices, it seems the contribution to the main term of the even moments
occurs only when things are in a generalized matched pairing with neighbors. This
means that we can remove the paired edges one at a time. For example, consider
the matchings in the fourth and sixth moments below. The first two cases in
the fourth moment contribute, but the third doesn't. When you look at the
number of free indices, it's three in cases 1 and 2 but only one or two in
case 3. Similarly, in the sixth moment only cases 1 and 2 contribute. These
have everything in a generalized match with a neighbor. While this is clear
for case 1 of the sixth moment, it isn't immediately clear for case 2. What
happens is that if two neighbors are matched, we can 'remove' that matching
from our graph and look at what remains, and we have one free variable as we
remove. For example, we first remove the top and then the bottom matching in
case 2, and what remains is just a neighbor matching. Why does this work? Say
a_{no} is matched with a_{op}; thus our string looks like a_{mn}, a_{no},
a_{op}, a_{pq}. As a_{no} and a_{op} are matched, 'o' is a free variable, and
n=p. We can then lift this matching out, and this collapses to a_{mn}, a_{nq}.
Thus it reduces to a smaller graph. In general, the combinatorics depend
crucially on the structure of the real symmetric matrices under consideration.
We get very different combinatorics for
d-regular
graphs. Another fun example is
Toeplitz matrices
(these matrices have applications in computer science, and connections with
Fourier series; they are constant along diagonals and thus have far fewer
degrees of freedom than a general real symmetric matrix). The combinatorics
are very different; for real symmetric Toeplitz matrices case 3 of the fourth
moment now contributes. This isn't too surprising, as we now have many ways
for two indices to be the same; we don't just need say {i,j} = {m,n}, but only
that they be on the same diagonal (or on reflected diagonals, ie, |i-j| =
|m-n|). For
real symmetric Toeplitz matrices (the link is to my paper with students on
the subject), however, there are still some obstructions in the combinatorics.
While there are (2m-1)!! matchings in pairs, not all of the matchings
contribute fully; if they did, the density of states would be a Gaussian, as
(2m-1)!! is the 2m^th moment of the Gaussian. The fourth moment of these
matrices is 2 2/3 (ie, 8/3), not the Gaussian's 3 (we have freedom to adjust
to make the first two
moments be 0 and 1, but after that the later moments show the true shape of
the distribution). This is seen in that we have equations such as i = j + k -
l with all indices in {1,...,N}, but if j,k > 2N/3 and l < N/3 then there is
no valid i; these Diophantine obstructions cause the moment to be less than
the Gaussian. If we add the condition that the first row be a palindrome, then
these obstructions vanish, and the density of states for
Real Symmetric Palindromic Toeplitz matrices (the link is to my paper with
students on the subject) is the Gaussian.
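For those who want to see the 8/3 numerically, here is a Monte Carlo sketch
(the matrix size and number of trials are arbitrary choices, and at finite N
the numbers are only close to their limits): the fourth moment of the
normalized eigenvalues is about 2 for real symmetric matrices (the
semicircle's value) and about 8/3 for real symmetric Toeplitz matrices, below
the Gaussian's 3.

    import numpy as np

    rng = np.random.default_rng(0)
    N, trials = 200, 50

    def fourth_moment(build):
        total = 0.0
        for _ in range(trials):
            eigs = np.linalg.eigvalsh(build()) / np.sqrt(N)
            total += np.mean(eigs ** 4)
        return total / trials

    def real_symmetric():                        # iid entries, variance 1
        A = rng.standard_normal((N, N))
        return (A + A.T) / np.sqrt(2)

    def toeplitz():                              # constant along diagonals
        b = rng.standard_normal(N)
        i = np.arange(N)
        return b[np.abs(i[:, None] - i[None, :])]

    print("real symmetric:", round(fourth_moment(real_symmetric), 3))  # near 2
    print("Toeplitz:      ", round(fourth_moment(toeplitz), 3))        # near 8/3
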
- Wednesday, March 11. Today we finished
the proof of the odd moments for Wigner's Semicircle Law. We were fortunate
that we did not need to determine all the possible matchings; all we needed to
know was that, for fixed k = 2m+1, the number of matchings where everything is
matched in at least a pair is independent of N. The combinatorics for the even
moments are more delicate, as these do contribute and we will need to know
exactly how many there are. This leads to the Catalan numbers. Though it isn't
needed, it's an interesting question to ask how many solutions there are to
r_1 + r_2 + ... + r_n = 2m+1 where each r_i >= 2 and n can range from 1 to m.
This is an example of a
Diophantine
equation, at least when n is fixed. Diophantine equations arise in many
problems, and lots of people spend their life looking for integer solutions to
equations (or systems of equations) involving polynomials with integer
coefficients. There is a well-developed theory to these problems, and it is
surprisingly easy (once you look at things the right way, which is how much of
combinatorics is!) to incorporate conditions such as each r_i >= 2; allowing n
to be free is harder. A good extra credit problem is to determine the number
of such matchings.
- Friday, March 6. We analyzed the third
moment for the density of eigenvalues of real symmetric matrices. For the
third moment the assumption that the density p(x) is even helped but not
enormously so; for the higher odd moments it initially looks to be quite
useful (as it allows us to avoid some messy combinatorics). We'll see on
Wednesday that the combinatorics can be bypassed for the odd moments by a
simple counting argument (namely, that as long as the density p(x) has finite
moments, there aren't enough matchings to contribute in the limit). This is
violently false for the even moments. There we will have to do some subtle
combinatorics (in fact, the differences in the density of states between
different families of real symmetric matrices is due to the differences in
combinatorics that arise). We'll see the answer is related to the
Catalan numbers. We
also discussed the `cookie problem' (see chapter 1 of our textbook). This is
actually the k=1 version of
Waring's problem;
unfortunately the combinatorial argument to solve the k=1 case does not
generalize to the higher cases. There are many techniques to analyze
combinatorial problems; one of my favorites is matching coefficients (see
my handout from my mathematical statistics course from my Brown days).
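Here is the cookie problem checked by brute force for a few small cases (the
cases are arbitrary choices): the standard stars-and-bars count of ways to
split n identical cookies among k people is C(n+k-1, k-1).

    from math import comb
    from itertools import product

    # Number of non-negative integer solutions to x_1 + ... + x_k = n,
    # versus the closed form C(n + k - 1, k - 1).
    def brute_force(n, k):
        return sum(1 for xs in product(range(n + 1), repeat=k) if sum(xs) == n)

    for n, k in ((5, 3), (7, 4), (10, 2)):
        print(n, k, brute_force(n, k), comb(n + k - 1, k - 1))
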
- Wednesday, March 4. We saw how the
number of degrees of freedom in our random matrix ensemble greatly affects the
density of eigenvalues, but surprisingly doesn't seem to affect the spacings
between normalized eigenvalues. In some sense, these spacings between events
are universal (and there is just a rescaling going on). One reason I love
number theory / random matrix theory so much is the connections between
behavior here and in other systems. A great question from class today was why
we only looked at the spacings between the eigenvalues in the middle of the
semi-circle (this is called the 'bulk' of the spectrum) and not near 1 or -1
(called, not surprisingly, the 'edge' of the spectrum). The behavior of the
largest eigenvalue is well understood, and given by a
Tracy-Widom
distribution (for real symmetric matrices, a TW distribution with beta =
1). The TW distribution occurs in many other interesting places, including the
length of the
longest
increasing subsequence.
- Monday, March 2. Today we reviewed the
probability we'll need, moments and the
method of moments (note the Wikipedia entry specifically mentions
Wigner's
Semi-Circle Law, and no, I wasn't the one who added that!), and that
expectation is
linear. A good exercise is to find, if possible, two dependent random
variables such that the expected value of the sum is not the sum of the
expected values (for example, it was suggested in class that we let X_1 be the
roll of a fair die, and X_2 = 1/X_1 -- does that work?). The reason we are
able to prove results such as
Wigner's
Semi-Circle Law (and so much more) is the Eigenvalue Trace Lemma. More
generally, one can consider similar problems in number theory, such as the
density of zeros of the Riemann zeta function (or more general
L-functions) or the
spacings between adjacent zeros. The problem is that while there are
generalizations of the Eigenvalue Trace Lemma (such as
Riemann's Explicit
Formula), these formulae are useless unless accompanied by a good
averaging formula. We'll see more about this later, but briefly: if we can't
make sense of Trace(A^k), does it really help to express the moments in terms
of this? While we have nice averaging formulas in linear algebra, we don't
have nearly as good formulas in number theory (excellent PhD thesis topics
here -- the lack of these averaging formulas is holding up a lot of progress
on a variety of problems!). Finally, we discussed a fascinating aside,
d-regular random graphs. These have enjoyed remarkable success in building
efficient networks. There are known limits as to how far the second largest
eigenvalue can be from the largest in a connected d-regular graph. Graphs with
large separations are called
Ramanujan graphs;
this is a terrific topic for an aside, and I have a lot of literature I can
share.
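If you want to play with the d-regular story numerically, here is a sketch
using the networkx library (assuming it is installed; d, n and the seed are
arbitrary choices): the largest adjacency eigenvalue of a d-regular graph is
d, and for a random such graph the second largest is close to the Ramanujan
threshold 2 sqrt(d-1).

    import numpy as np
    import networkx as nx

    d, n = 3, 1000
    G = nx.random_regular_graph(d, n, seed=0)
    eigs = np.sort(np.linalg.eigvalsh(nx.to_numpy_array(G)))
    print("largest eigenvalue: ", round(eigs[-1], 4))           # equals d
    print("second largest:     ", round(eigs[-2], 4))
    print("2 sqrt(d-1):        ", round(2 * np.sqrt(d - 1), 4)) # about 2.828
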
- Friday, February 27. We've begun in
earnest our study of Random Matrix Theory. See the
article by Brian Hayes for a bit of the history of the connection between
Random Matrix Theory and Number Theory (though there are a few math mistakes
in the article!). We use the Moment Technique to prove
Wigner's Semicircle law; see the
article by Jacob Christiansen for an introduction to the moment problem
(given a sequence of non-negative numbers, do they represent the moments of a
probability distribution and if so, is there only one distribution with these
moments?); the interested reader is strongly encouraged to read this article
to get a sense of the problem of how moments may or may not specify a
probability distribution. The semicircle law is what one obtains for the
density of eigenvalues of real symmetric matrices with the independent
entries chosen from a mean 0, variance 1 distribution with finite
higher moments; if we look at other sets of matrices with different structure,
very different behavior is seen. Terrific examples are the densities for
d-regular graphs or for Toeplitz matrices (see our book for more details).
- Wednesday, February 25. We summarized
the first unit, and showed an application of the equidistribution of n alpha
mod 1 to
Benford's law. For applications, it is often important to have a sense as
to how rapidly one has convergence in equidistribution results. One of the
common techniques involves using the
Erdos-Turan
theorem (the web resources aren't great; I have a copy of a
good book that shows how the irrationality exponent is connected to
quantifying the rate of convergence to equidistribution). We ended by listing
/ recalling some of the probability and linear algebra we'll need for Random
Matrix Theory. For probability, we need to know about
means,
variances, the
Central Limit
Theorem or
Chebyshev's Theorem, and moments of a distribution; for linear algebra, we
need to know about the
trace of a matrix,
its eigenvalues,
orthogonal matrices,
and the
Spectral Theorem (or diagonalization theorem) for
real symmetric
matrices.
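One classic instance of this connection (a sketch; the cutoff 10^4 is an
arbitrary choice): the leading digit of 2^n is d exactly when n log10(2) mod 1
falls in [log10(d), log10(d+1)), so the equidistribution of n alpha mod 1 with
alpha = log10(2) gives Benford behavior for the leading digits of powers of 2.

    import math
    from collections import Counter

    # Leading digits of 2^n for n up to 10000, versus the Benford
    # probabilities log10(1 + 1/d).
    counts = Counter(int(str(2 ** n)[0]) for n in range(1, 10001))
    for d in range(1, 10):
        print(d, round(counts[d] / 10000, 4), round(math.log10(1 + 1 / d), 4))
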
- Monday, February 23. Today we proved
Fejer's theorem. One of the most important uses of such a result is that we
can find a finite trigonometric polynomial arbitrarily close to any continuous
function. Thus, to study continuous functions, it often suffices to study
finite trigonometric polynomials and take a limit. One nice application is a
proof of the Weierstrass approximation theorem (any continuous function on a
compact interval can be uniformly approximated by a polynomial); there are
many important generalizations of this result (see the
Stone-Weierstrass entry on wikipedia or the entry on the
Weierstrass approximation theorem on
PlanetMath); there is a very nice, explicit proof in Rudin's Principles
of Mathematical Analysis (aka, the blue book). We have only scratched the
surface on the theory and applications of Fourier analysis; we'll return to
some more of these applications later in the course. One particularly
important question is when the Fourier series converges to the original
function. In the book we give a proof assuming the function is differentiable;
what happens in general? Kolmogorov proved that it is possible to have a
function f such that Int_0^1 |f(x)|dx is finite but the Fourier series
diverges everywhere! (I can get the paper if anyone is interested). The result
is very different if Int_0^1 |f(x)|^2dx is finite; Carleson (and subsequently
C. Fefferman) proved the Fourier series converges almost everywhere (I have
notes from a class by C. Fefferman on this which I can share).
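Here is a small numerical sketch of Fejer's theorem (the test function and the
grid size are arbitrary choices): the Cesaro means of the Fourier partial sums
of the continuous periodic function f(x) = |x - 1/2| converge to it uniformly,
with the maximum error shrinking as more partial sums are averaged.

    import numpy as np

    M = 4096
    x = np.arange(M) / M
    f = np.abs(x - 0.5)                    # continuous and periodic on [0,1)

    c = np.fft.fft(f) / M                  # approximate Fourier coefficients
    def partial_sum(N):
        s = np.real(c[0]) * np.ones(M)
        for m in range(1, N + 1):
            s += 2 * np.real(c[m] * np.exp(2j * np.pi * m * x))
        return s

    for N in (2, 8, 32):
        fejer = np.mean([partial_sum(n) for n in range(N + 1)], axis=0)
        print(N, round(float(np.max(np.abs(fejer - f))), 4))   # max error shrinks
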
- Wednesday, February 18. Today we
proved the basic properties we'll need for Fejer's theorem. After proving
Fejer's theorem we'll talk in greater detail about some of the strange
properties of convergence (if people are interested) and then move on to
Random Matrix Theory. Some things to think about: will there always be that
overshoot in approximating A_1j and A_2j? What if we instead try to
approximate the characteristic function? If a function is periodic and twice
continuously differentiable, is the same true about its Fourier series
(question asked by Scott after class).
- Monday, February 16. We studied the
distribution of nearest neighbor spacings between
independent, identically distributed random variables taken from the
uniform distribution. We see similar behavior when we look at the spacings
between adjacent primes or the ordered n^k alpha mod 1 for k at least two. In
neither case do we have a proof; in fact, for n^k alpha the behavior
depends greatly on the irrationality exponent of alpha. For more details, see
the textbook and the references therein. Our proof used several results from
previous classes, including the
Fundamental Theorem of Calculus to find the probability and then the
definition of the derivative of
exp(x). We also discussed
Monte-Carlo
integration (see
http://www.fas.org/sgp/othergov/doe/lanl/pubs/00326866.pdf for a note
about the beginnings of the method). We also discussed the natural scale
to study problems (ie, looking at the average spacing between events, where
the events here are the ordered values of our random variables). This is one
reason the twin prime problem is so difficult, as this is a minuscule
difference relative to the average spacing; calculating
Brun's constant
(the sum of the reciprocals of twin primes) led Nicely to discover the
Pentium bug; a nice
description of the discovery of the bug is given at
http://www.trnicely.net/pentbug/pentbug.html.
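Here is a quick simulation of today's result (the sample sizes are arbitrary
choices): after rescaling by the average spacing, the nearest-neighbor
spacings of uniform random points follow the exponential density exp(-t),
which we check via the proportion of spacings exceeding t.

    import numpy as np

    rng = np.random.default_rng(0)
    N, trials = 1000, 200
    spacings = []
    for _ in range(trials):
        pts = np.sort(rng.random(N))
        spacings.extend(np.diff(pts) * N)   # rescale so the mean spacing is 1
    spacings = np.array(spacings)

    for t in (0.5, 1.0, 2.0):
        print(t, round(float(np.mean(spacings > t)), 4), round(float(np.exp(-t)), 4))
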
- Friday, February 13. The proof of the
equidistribution of n alpha mod 1 today uses a very common analysis technique.
To prove a result for a step function (like the characteristic function of the
interval [a,b]), it suffices to prove the result for a continuous function, as
we can find a continuous function that is arbitrarily close. Then, to prove
the result for continuous functions we instead prove the result for a nice,
finite Fourier series, as we can find such a series that is arbitrarily close
to our continuous function. Such arguments are used all the time in Measure
Theory. The crux of the argument is that we have a finite sum of sines and
cosines (the exp(2 pi i m x)), and that these can be divided into two parts.
The first is the constant term (m=0), which gives b-a plus a small error; the
remaining terms are 'small' in terms of N. How small is a VERY deep question,
and involves the irrationality exponent of alpha (ie, how well we may
approximate alpha by rationals). The big result along these lines is the
Erdos-Turan theorem (which is a great topic for an aside / project); we'll
discuss this briefly on Monday. Finally, it is worth going over the argument
and keeping track of what was given and what we chose. We are given an epsilon
> 0; this leads to a j (for how well the continuous functions approximate the
step function) and M (the number of terms in our finite Fourier sum); we then
send N to infinity.
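A direct numerical check of the equidistribution (alpha = sqrt(2) and the
interval [0.25, 0.6] are arbitrary choices): the proportion of n <= N with
n alpha mod 1 in [a, b] approaches b - a as N grows.

    import math

    alpha = math.sqrt(2)
    a, b = 0.25, 0.6
    for N in (10**2, 10**4, 10**6):
        count = sum(1 for n in range(1, N + 1) if a <= (n * alpha) % 1 <= b)
        print(N, round(count / N, 5), b - a)
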
- Monday, February 9 and Wednesday, February 11. In class we talked
about denseness of certain sequences. Other fun ones are sin(n) and cos(n) --
are these dense in the interval [-1, 1]? Equidistributed? What can you say
about these? (I believe one is somewhat elementary, one is more advanced.
Email me for a hint on what math results might be useful.) We also looked at
how knowledge of the irrationality type of alpha can be used to see n^2 alpha
mod 1 is dense. We assumed alpha had irrationality exponent of 4 + eta for
some eta > 1 -- can the argument work for a smaller exponent? What if we
studied n^k alpha mod 1 -- what would we need to assume about the
irrationality exponent? Can you somewhat elementarily prove the denseness of
n^2 alpha if the irrationality exponent is less than 3? I say somewhat
elementarily as we will later show the sequence is equidistributed, and thus
it must be dense. Can you come up with a more elementary proof, where you just
get denseness? Finally, for those who know (or are interested in) measure
theory, one natural question to ask is how severe is the restriction to
studying irrational alpha with exponent 4 + eta? If you're familiar with
Cantor's diagonalization argument (Theorem 5.3.24), almost all numbers are
transcendental (and thus irrational); however, this does not mean they have
an irrationality exponent as large as 4+eta (for example, ln(2) is
irrational but has irrationality exponent less than 4). A good exercise is to modify the
proof of Theorem A.5.1 to show that almost no irrationals (in the sense of
measure) have irrationality exponent as large as 4 + eta. There have been two nice
colloquia so far this week. On Tuesday we saw some dynamics of complex valued
maps, and saw a bit of the difference between real and complex valued
functions. On Monday we saw several proofs of the irrationality of sqrt(2),
including a nice geometric one by Conway. I've been able to generalize that to
show sqrt(3) is irrational -- by using hexagons or other such shapes, can you
do any other numbers? For a fuller description, see the headline / blog post
at
http://www.williams.edu/go/math/sjmiller/public_html/406/discussions/irrsqrtk.htm
- Friday, February 6. In class we defined pi(x) to be the number of primes
at most x. We discussed Euclid's argument which shows that pi(x) tends to
infinity with x, and mentioned that with some work one can show Euclid's
argument implies pi(x) >> log log x. As a nice exercise (for fun), prove this
fact. This leads to an interesting sequence:
2, 3, 7, 43, 13, 53, 5, 6221671, 38709183810571, 139, 2801, 11, 17, 5471,
52662739, 23003, 30693651606209, 37, 1741, 1313797957, 887, 71, 7127, 109, 23,
97, 159227, 643679794963466223081509857, 103, 1079990819, 9539, 3143065813,
29, 3847, 89, 19, 577, 223, 139703, 457, 9649, 61, 4357.... This
sequence is generated as follows. Let a_1 = 2, the first prime. We apply
Euclid's argument and consider 2+1; this is the prime 3 so we set a_2 = 3. We
apply Euclid's argument and now have 2*3+1 = 7, which is prime, and set a_3 =
7. We apply Euclid's argument again and have 2*3*7+1 = 43, which is prime and
set a_4 = 43. Now things get interesting: we apply Euclid's argument and
obtain 2*3*7*43 + 1 = 1807 = 13*139, and set a_5 = 13. Thus a_n is the
smallest prime factor of the number generated by Euclid's argument at the nth
stage. There are a plethora of (I believe) open questions about this
sequence, the biggest of course being whether or not it contains every prime.
This is a great sequence to think about, but it is a computational nightmare
to enumerate! I downloaded these terms from the Online Encyclopedia of Integer
Sequences (homepage
is
http://www.research.att.com/~njas/sequences/ and the page for our
sequence is
http://www.research.att.com/~njas/sequences/A000945 ). You can enter the
first few terms of an integer sequence, and it will list whatever sequences it
knows that start this way, provide history, generating functions, connections
to parts of mathematics, .... This is a GREAT website to know if you want to
continue in mathematics. There have been several times I've computed the first
few terms of a problem, looked up what the future terms could be (and thus had
a formula to start the induction). One last comment: we also talked about the
infinitude of primes from zeta(2) = pi^2/6. While at first this doesn't seem
to say anything about how rapidly pi(x) grows, one can isolate a growth rate
from knowing how well pi^2 can be approximated by rationals (see
http://arxiv.org/PS_cache/arxiv/pdf/0709/0709.2184v3.pdf for
details; unfortunately the growth rate is quite weak, and the only way I know
to prove the needed results on how well pi^2 is approximable by rationals
involves knowing the Prime Number Theorem!).
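Here is a short sketch that regenerates the first nine terms of the sequence
(it stops there because each new term requires factoring the ever larger
number produced by Euclid's argument, and naive trial division quickly
becomes hopeless):

    # a_1 = 2, and a_n is the smallest prime factor of a_1 a_2 ... a_{n-1} + 1.
    def smallest_prime_factor(n):
        if n % 2 == 0:
            return 2
        d = 3
        while d * d <= n:
            if n % d == 0:
                return d
            d += 2
        return n

    terms = [2]
    for _ in range(8):
        product = 1
        for a in terms:
            product *= a
        terms.append(smallest_prime_factor(product + 1))
    print(terms)   # [2, 3, 7, 43, 13, 53, 5, 6221671, 38709183810571]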