Additional comments related to material from the
class. If anyone wants to convert this to a blog, let me know. These additional
remarks are for your enjoyment, and will not be on homeworks or exams. These are
just meant to suggest additional topics worth considering, and I am happy to
discuss any of these further.
-
Monday, December 5. We ended the semester
with a brief tour of how so many of our different topics tie together. I hope
you enjoyed exploring these areas this semester and seeing the connections.
- Slides for the talk online here:
http://web.williams.edu/Mathematics/sjmiller/public_html/math/talks/Michigan2012Part1.pdf
- Video for the talk here:
http://youtu.be/JV4KtPuLWT0
- We talked about a lot of material, here are some links for more reading.
- Links
- Hayes: The
Spectrum of Riemannium: a light description of the connection between
random matrix theory and number theory (there are a few minor errors in the
presentation, basically to simplify the story). This is a quick read, and
gives some of the history.
- Firk and Miller: Nuclei,
primes and the Random Matrix connection: a survey paper on the history of
the subject, including both the nuclear physics experiments and the
theoretical calculations.
- Conrey: L-functions
and Random Matrix Theory: This is a high level description of the
similarities between number theory and random matrix theory.
- Katz-Sarnak: Zeros
of Zeta Functions and Symmetry: Another high level article similar to the
others (email me for a copy).
- Diaconis: Patterns
in Eigenvalues: this is a bit more readable than the others, and is based
on a distinguished lecture he delivered.
- Rudnick-Sarnak: Zeros
of principal L-functions and random matrix theory: This paper analyzes the
n-level correlations of zeros of automorphic L-functions and shows agreement
with random matrix theory; included in the paper are bounds towards Ramanujan
and the explicit formula for GL(n). For more papers, see Zeev
Rudnick's homepage.
- Iwaniec-Luo-Sarnak: Low
lying zeros of families of L-functions: This is a must read.
This is the first major paper calculating the 1-level density for families of
L-functions.
- Hughes-Miller: Low
lying zeros of L-functions with orthogonal symmetry. This paper
generalizes the results of Iwaniec-Luo-Sarnak to the n-level density. The
difficulty is in handling the combinatorics to show agreement with RMT.
- Rubinstein: Low
lying zeros of L-functions and Random Matrix Theory: this is his
dissertation, and in it he analyzes the 1-level density of the family of
quadratic Dirichlet characters, and shows agreement with Random Matrix Theory.
This is one of the easiest families to look at, and a great testing ground. The
published paper (in Duke) is here.
- Conrey-Snaith: Applications
of the L-functions Ratios Conjecture: This is a very recent conjecture
which is enjoying remarkable success in predicting answers. I somewhat
jokingly call it the conjecture of the four lies, as there are five steps and
four of the steps are provably wrong (ie, the assumptions in those steps
fail); however, miraculously, all the errors seem to cancel to a phenomenal
degree! I've become very interested in testing this conjecture as much as
possible, and have written several papers in this area (and have ideas for a few
more which will be very accessible).
- Miller: A
symplectic test of the L-functions Ratios Conjecture: This paper builds on
those by Conrey-Snaith and Rubinstein and uses the Ratios Conjecture to
predict the lower order terms up to square-root cancellation, and then shows
(for suitable test functions) that this is the correct answer. An obvious
project is to generalize this test for other families or to enlarge the
support.
- Duenez-Miller: The
effect of convolving families of L-functions on the underlying symmetry.
In this paper we show how one may determine the corresponding classical
compact group for convolutions of certain families of L-functions. This paper
was motivated by The
low-lying zeros of a GL(4) and a GL(6) family of L-functions (Duenez-Miller),
which disproved a folklore conjecture on the corresponding classical compact
group.
- Miller (with an appendix by Duenez): Investigations
of zeros near the central point of elliptic curve L-functions. In this
paper we look at the experimental data of the first few zeros above the
central point in families of elliptic curves, with and without rank. We see
the effect of rank, and are led to certain conjectures as to the behavior of
low-lying zeros for finite conductors. We know what the behavior of these
zeros is in the limit as the conductors tend to infinity; see 1
and 2 level densities for rational families of elliptic curves (Steven
J Miller) and Low-lying
zeros of families of elliptic curves (Matthew
Young).
-
Monday, December 1. Two presentations
today emphasizing how to do calculations.
- Partial
summation is an extremely important technique; frequently a lot of work is
required to get expressions into a nice enough form so that the calculations can
be done and be useful. The idea of a
telescoping series
is very nice, or a series that is constant; these allow us to easily move /
extend intervals of consideration. One of the major techniques in my thesis
was to ensure that my oscillating function didn't traverse a bounded interval
on the order of n times (as I had n steps). To avoid that catastrophe I sieved and
looked at a subsequence that was monotone increasing, so I would go through it
at most once and get a universal bound independent of n. For this a useful
ingredient is bounded
variation.
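- As a concrete illustration (my own sketch, not from the lecture), here is the discrete Abel / partial summation identity checked numerically: with partial sums \(A(n) = a_1 + \cdots + a_n\), we have \(\sum_{n \le N} a_n f(n) = A(N) f(N) - \sum_{n \le N-1} A(n) (f(n+1) - f(n))\). The choices \(a_n = 1\), \(f(n) = 1/n\) are just for the test.
```python
# Minimal check of the partial (Abel) summation identity; both sides
# should equal the N-th harmonic number for a_n = 1, f(n) = 1/n.

def partial_summation(a, f, N):
    cumulative = []
    A = 0.0
    for n in range(1, N + 1):
        A += a(n)
        cumulative.append(A)          # cumulative[n-1] = A(n)
    total = cumulative[-1] * f(N)     # A(N) f(N)
    for n in range(1, N):
        total -= cumulative[n - 1] * (f(n + 1) - f(n))
    return total

N = 1000
direct = sum(1.0 / n for n in range(1, N + 1))
via_abel = partial_summation(lambda n: 1.0, lambda n: 1.0 / n, N)
print(direct, via_abel)   # the two values agree
```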
- The second item was on the
Mellin transform;
this is just the
Fourier transform after a logarithmic
change of variables. A great example of where this could be useful is in
Benford's law analysis: are you looking at the original data (if so, use Mellin)
or the logarithmic transform of it (if so, use Fourier)? Frequently you'll find
a book with identities in one but not the other, but you can convert from
identity to identity (a small numerical sketch of this change of variables
appears after the paper links below).
- Fourier: The Modulo 1 Central Limit Theorem and Benford's Law for
Products (with Mark Nigrini), International
Journal of Algebra. (2 (2008),
no. 3, 119--130). pdf
- Fourier: Order statistics and Benford's law (with Mark Nigrini), International
Journal of Mathematics and Mathematical Sciences (Volume
2008 (2008), Article ID 382948, 19 pages, doi:10.1155/2008/382948) pdf
- Mellin: Chains of distributions, hierarchical Bayesian models and
Benford's Law (with D. Jang, J. U. Kang, A. Kruckman and J. Kudo), Journal
of Algebra, Number Theory: Advances and Applications. (volume 1, number 1
(March 2009), 37--60) pdf
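- Here is the promised sketch (my own, assuming numpy is available; the test function and quadrature parameters are my choices): the Mellin transform \(\int_0^\infty f(x) x^{s-1} dx\) becomes an ordinary exponential integral \(\int_{-\infty}^\infty f(e^u) e^{su} du\) after the substitution \(x = e^u\). For \(f(x) = e^{-x}\) both should equal \(\Gamma(s)\).
```python
import numpy as np
from math import gamma

def mellin_direct(f, s, xmax=50.0, n=200000):
    x = np.linspace(1e-8, xmax, n)
    return np.trapz(f(x) * x**(s - 1), x)

def mellin_via_log(f, s, umin=-20.0, umax=5.0, n=200000):
    # substitute x = e^u, so x^{s-1} dx = e^{su} du
    u = np.linspace(umin, umax, n)
    return np.trapz(f(np.exp(u)) * np.exp(s * u), u)

f = lambda x: np.exp(-x)
s = 3.5
print(mellin_direct(f, s), mellin_via_log(f, s), gamma(s))  # all three ≈ 3.3234
```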
-
Monday, November 24. We ended our unit on
calculating the limiting spectral measure for the real symmetric ensemble,
seeing how the combinatorics emerge. This is just the beginning of a very vast
field. Due to lack of time we cannot investigate the myriad of
opportunities, but the next step is seeing how different structures on the
matrices affect the combinatorics. Some ensembles are described in the
additional comments for Friday, November 21.
- McKay's paper on d-regular graphs:
http://cs.anu.edu.au/~bdm/papers/RandRegEigenvalues.pdf
- Graphs have a wealth of applications, and their eigenvalues carry a lot of
meaning. For example, for d-regular graphs, if the largest eigenvalue has
multiplicity 1 then the graph is connected, while the gap between the largest
and second largest eigenvalues is related to how quickly information propagates through
the network. For more on the conjectured size of this gap, see the work of Friedman:
http://arxiv.org/pdf/cs/0405020v1
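- A tiny numerical check of the connectivity statement above (my own example, assuming numpy): a 6-cycle is a connected 2-regular graph and its largest adjacency eigenvalue 2 is simple, while two disjoint triangles are 2-regular but disconnected, and 2 appears with multiplicity 2.
```python
import numpy as np

def cycle(n):
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
    return A

six_cycle = cycle(6)
two_triangles = np.block([[cycle(3), np.zeros((3, 3))],
                          [np.zeros((3, 3)), cycle(3)]])

for name, A in [("6-cycle", six_cycle), ("two triangles", two_triangles)]:
    evals = np.sort(np.linalg.eigvalsh(A))[::-1]
    mult = int(np.sum(np.isclose(evals, evals[0])))
    print(name, "largest eigenvalue", round(evals[0], 6), "multiplicity", mult)
```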
- See also Ramanujan graphs:
http://en.wikipedia.org/wiki/Ramanujan_graph
- Here is a nice application of random matrix theory to bus routes:
http://authors.library.caltech.edu/3946/1/BAIjpa06.pdf
- Due to time constraints we focused on the limiting spectral measure and
not on the gaps between adjacent eigenvalues, though that is an important
field with much progress in the past 5 years.
- What I hope you got out of this unit was an appreciation of how different
areas of mathematics meet and interact, and how problems in analysis often
reduce to technical machinery plus combinatorics!
- Video online here:
http://youtu.be/Pz43bbE_sPE
-
Friday, November 21. We continued our
analysis. If anyone is reading this let me know and I'll put more comments
here! So far one response, and I have added more.
- We built on the method of moments and started doing the integration. A key
fact was that if we have \(\int a_{ij}^r p(a_{ij}) da_{ij}\) then this is 1 if
r is 0 (as \(p\) is a probability distribution it integrates to 1) or if r is 2
(as \(p\) has mean zero and variance 1, and thus this is the same as the
variance), while if r is 1 it is zero (as \(p\) has mean zero); for higher r
we have a finite value by assumption. While we have on the order of \(N^2/2\)
integrals to do, in the k-th moment most of them are 1. In fact, when we expand the
trace we have a polynomial in the matrix elements of degree k, and thus at
most k of the integrals are not 1.
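- A hedged Monte Carlo sketch of this calculation (my own; the matrix size, number of trials and numpy usage are my choices): sample real symmetric matrices with mean 0, variance 1 entries, rescale the eigenvalues by \(\sqrt{N}\), and compare the empirical moments with the semicircle moments (0 for odd k, Catalan numbers for even k).
```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 200, 50
moments = np.zeros(5)

for _ in range(trials):
    A = rng.standard_normal((N, N))
    A = (A + A.T) / np.sqrt(2)                    # symmetric, off-diagonal variance 1
    evals = np.linalg.eigvalsh(A) / np.sqrt(N)    # eigenvalues roughly in [-2, 2]
    for k in range(1, 6):
        moments[k - 1] += np.mean(evals**k) / trials

print(moments)   # ≈ [0, 1, 0, 2, 0]: odd moments vanish, even ones approach 1, 2
```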
- We saw the method of moments at work in going from knowing sums of powers
of the eigenvalues to determining their values.
- Typically we cannot convert from knowing the coefficients of a polynomial
to a closed form expression for the roots; we can only do that for small
degrees.
- Moments of the Gaussian are double factorials:
http://en.wikipedia.org/wiki/Double_factorial
- Moments of the semi-circle are the Catalan numbers:
http://en.wikipedia.org/wiki/Catalan_number
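- A quick numerical check of the two facts linked above (my own sketch, assuming numpy; integration grids are my choices): the 2k-th moment of the standard Gaussian is the double factorial (2k-1)!!, and the 2k-th moment of the semicircle density \((1/2\pi)\sqrt{4-x^2}\) on \([-2,2]\) is the Catalan number \(C_k\).
```python
import numpy as np
from math import comb

x_g = np.linspace(-12, 12, 400001)
gaussian = np.exp(-x_g**2 / 2) / np.sqrt(2 * np.pi)

x_s = np.linspace(-2, 2, 400001)
semicircle = np.sqrt(4 - x_s**2) / (2 * np.pi)

for k in range(1, 5):
    double_factorial = int(np.prod(np.arange(1, 2 * k, 2)))   # (2k-1)!!
    catalan = comb(2 * k, k) // (k + 1)
    gauss_moment = np.trapz(x_g**(2 * k) * gaussian, x_g)
    semi_moment = np.trapz(x_s**(2 * k) * semicircle, x_s)
    print(2 * k, round(gauss_moment, 3), double_factorial,
          round(semi_moment, 3), catalan)
```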
- The main insight in our analysis is that most of the matchings do
not contribute; one of my advisors (Iwaniec) refers to this as doing a
good job counting on your hands, getting a good sense of the number of degrees
of freedom. If we can show that even if everything contributed fully it is
negligible in the limit, then there is no need to carefully figure out its
contribution! What really matters is getting the correct growth rate for the
contributions as a function of \(N\); we often have factors of 2 coming from
either being on the same side of the main diagonal or opposite sides; while
these matter if we want to figure out the precise answer, they often
contribute at most \(2^k\) and this is independent of \(N\).
- In general, if we impose structure on the matrix that will affect the
contribution of different matchings. The more structure we have, the more
chances we have to match. In real symmetric matrices we have very little
structure, and if things are matched in pairs then \(a_{ij}\) and \(a_{mn}\)
are equal if and only if the pairs \((i,j)\) and \((m,n)\) are equal. The
situation is very different if we consider Toeplitz matrices, which are
constant along diagonals. Now there are a lot more choices for the indices to
correspond to the same independent variable. Below are links to some
supervised work I've done in the area.
- Distribution of eigenvalues for the ensemble of real symmetric Toeplitz
matrices (with Chris Hammond). Journal
of Theoretical Probability (18 (2005),
no. 3, 537-566). pdf
- Distribution of eigenvalues of real symmetric palindromic Toeplitz
matrices and circulant matrices (with Adam Massey and John Sinsheimer), Journal
of Theoretical Probability. (20 (2007),
no. 3, 637--662.) pdf
- The distribution of the second largest eigenvalue in families of random
regular graphs (with Tim Novikoff and Anthony Sabelli), Experimental
Mathematics. (17 (2008),
no. 2, 231--244.) pdf
- Distribution of eigenvalues for highly palindromic real symmetric Toeplitz
matrices (with Steven Jackson and Thuy Pham), Journal
of Theoretical Probability. (25 (2012),
464--495) pdf
- The Limiting Spectral Measure for Ensembles of Symmetric Block Circulant
Matrices (with Gene S. Kopp, Murat Koloğlu,
Frederick Strauch and Wentao Xiong). Journal
of Theoretical Probability (26 (2013), no. 4, 1020--1060) pdf
- The expected eigenvalue distribution of large, weighted d-regular graphs
(with Leo Goldmakher, Cap Khoury and Kesinee Ninsuwan). Random
Matrices: Theory and Applications. (3 (2014),
no. 2, 1450015, 22 pages) pdf
- Distribution of eigenvalues of weighted, structured matrix ensembles (with
Olivia Beckwith, Victor Luo, Karen Shen and Nicholas Triantafillou), submitted
to INTEGERS. pdf
- Video here:
http://youtu.be/Ij-t_KHt9KU
-
Wednesday, November 19. We talked about how to determine the
appropriate scale for RMT, the Eigenvalue Trace Lemma, the Triangularization
Lemma, and the Method of Moments.
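- A tiny numerical check (my own, assuming numpy) of the Eigenvalue Trace Lemma mentioned above: for a real symmetric matrix \(A\), \({\rm Tr}(A^k) = \sum_i \lambda_i(A)^k\).
```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
A = A + A.T                                # make it symmetric

eigenvalues = np.linalg.eigvalsh(A)
for k in range(1, 5):
    lhs = np.trace(np.linalg.matrix_power(A, k))
    rhs = np.sum(eigenvalues**k)
    print(k, round(lhs, 8), round(rhs, 8))   # the two columns agree
```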
-
Monday, November 17.
We've begun in earnest our study of Random Matrix Theory. See the article
by Brian Hayes for
a bit of the history of the connection between Random Matrix Theory and Number
Theory (though there are a few math mistakes in the article!). We use the
Moment Technique to prove Wigner's
Semicircle law;
see the article
by Jacob Christiansen for
an introduction to the moment problem (given a sequence of non-negative
numbers, do they represent the moments of a probability distribution and if
so, is there only one distribution with these moments?); the interested reader
is strongly encouraged to read this article to get a sense of the problem of
how moments may or may not specify a probability distribution. The semicircle
law is what one obtains for the density of eigenvalues of real symmetric
matrices whose independent entries are chosen from a mean 0,
variance 1 distribution with finite higher moments; if we look at other sets
of matrices with different structure, very different behavior is seen.
Terrific examples are the densities for d-regular graphs or for Toeplitz
matrices (see our book for more details).
- We
reviewed the probability we'll need, moments and the method
of moments (note
the Wikipedia entry specifically mentions Wigner's
Semi-Circle Law),
and that expectation is
linear. A good exercise is to find,
if possible,
two dependent random variables such that the expected value of the sum is not
the sum of the expected values (if we let \(X_1\) be the roll of a fair die,
and \(X_2 = 1/X_1\) -- does that work?).
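- A brute-force check of the die example (my own, using exact fractions): linearity of expectation holds here even though \(X_1\) and \(X_2 = 1/X_1\) are dependent; in fact it holds for any random variables, dependent or not.
```python
from fractions import Fraction

outcomes = range(1, 7)                       # faces of a fair die
E_X1 = sum(Fraction(x, 6) for x in outcomes)
E_X2 = sum(Fraction(1, x) * Fraction(1, 6) for x in outcomes)
E_sum = sum((Fraction(x) + Fraction(1, x)) * Fraction(1, 6) for x in outcomes)

print(E_sum == E_X1 + E_X2)   # True: E[X1 + X2] = E[X1] + E[X2]
```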
-
The reason we are able to prove results such as Wigner's
Semi-Circle Law (and
so much more) is the Eigenvalue Trace Lemma. More generally, one can consider
similar problems in number theory, such as the density of zeros of the Riemann
zeta function (or more general L-functions)
or the spacings between adjacent zeros. The problem is that while there are
generalizations of the Eigenvalue Trace Lemma (such as Riemann's
Explicit Formula),
these formulae are useless unless accompanied by a good averaging formula.
We'll see more about this later, but briefly: if we can't make sense of
Trace(\(A^k\)), does it really help to express the moments in terms of it?
While we have nice averaging formulas in linear algebra, we don't have nearly
as good formulas in number theory (excellent PhD thesis topics here -- the
lack of these averaging formulas is holding up a lot of progress on a variety
of problems!).
- A
fascinating aside is \(d\)-regular random graphs. These have enjoyed
remarkable success in building efficient networks. There are known limits as
to how far the second largest eigenvalue can be from the largest in a
connected \(d\)-regular graph. Graphs with large separations are called Ramanujan
graphs;
this is a terrific topic for an aside, and I have a lot of literature I can
share.
- General reading:
- Hayes: The
Spectrum of Riemannium: a light description of the connection between
random matrix theory and number theory (there are a few minor errors in the
presentation, basically to simplify the story). This is a quick read, and
gives some of the history.
- Conrey: L-functions
and Random Matrix Theory: This is a high level description of the
similarities between number theory and random matrix theory.
- Katz-Sarnak: Zeros
of Zeta Functions and Symmetry: Another high level article similar to the
others.
- Diaconis: Patterns
in Eigenvalues: this is a bit more readable than the others, and is based
on a distinguished lecture he delivered.
- Video here:
http://youtu.be/kiswyjwDirg
-
Friday, November 14. Today we wrote down the Circle Method
calculation for a variety of problems and discussed counting and bounding
heuristics, and had our first presentation.
- Rubinstein paper on the Hardy-Littlewood Constant and Twin Primes:
http://www.jstor.org/stable/2324298
- Rubinstein and Sarnak: Chebyshev's bias:
http://www.math.uwaterloo.ca/~mrubinst/publications/Chebyshev.pdf
- Hardy-Littlewood Conjectures:
- Twin prime conjecture:
http://en.wikipedia.org/wiki/Twin_prime#First_Hardy.E2.80.93Littlewood_conjecture
- k-tuple conjecture:
http://mathworld.wolfram.com/k-TupleConjecture.html (notice the
integral!).
- Prime constellation:
http://mathworld.wolfram.com/PrimeConstellation.html
- USS Constellation: NCC 1017:
http://mathworld.wolfram.com/PrimeConstellation.html
- From MemoryAlpha: The Constellation studio model was constructed from a
1966 first edition AMT Enterprise model kit, no. S921. (Star Trek
Encyclopedia, 3rd ed., p.
85) That particular edition sported a decal sheet with only the "NCC-1701"
decal, and, with very limited options, its numbers had to be rearranged to
create the unusually low "NCC-1017" Constellation registry number, the first
new registry actually seen on a ship and, as it turned out, also the only time
in the original airing of the Original Series, though
considered somewhat incongruous by many, due to the perceived discrepancy in
the numbering system.
- We did a lot of heuristics, which is great and a nice way to get a sense
of the magnitude of the answer.
- We did some more order of magnitude estimates. We had \(\sum_{p \le N,\ p,\,p+2\ {\rm prime}}
(\log p)^2\). We got an upper bound of \((\log N)^2 \pi_2(N)\) very easily.
For the lower bound we saw it was good to restrict the sum to primes in the
range \(N^a \le p \le N\), as here, up to a constant, \(\log p\) doesn't change;
if we went all the way down to \(p \sim \log N\) then \(\log p \sim \log\log
N\), much lower. As we believe the number of twin primes up to \(N\) is of
size \(C_2 N/\log^2 N\), this is a fine place to cut. We can trivially
estimate the number of twin primes up to \(N^a\) by \(N^a\), and end with a good
estimate: \(a^2 (\log N)^2 (\pi_2(N) - N^a) \le \sum_{p \le N,\ p,\,p+2\ {\rm prime}} (\log p)^2 \le
(\log N)^2 \pi_2(N)\).
- We also discussed \(\log (p+2) = \log p + \log(1 + 2/p) = \log p +
O(1/p)\); this is a common technique, go for the main term, rewrite as one
plus something small, Taylor expand. Interestingly this was the point the
presenters chose to bring up!
- Brun's sieve:
http://en.wikipedia.org/wiki/Brun_sieve
-
The main idea behind Brun's sieve is the Method
of Inclusion - Exclusion.
-
The inclusion
/ exclusion principle is
one of my favorite methods and is especially important in probability
as it is very easy to accidentally double count events. We can use this to show
that the probability that none of \(n\) people return to their seat (given that
each ordering is equally likely) converges to \(1/e\). Another fun example is
to show that the probability a number is square-free converges to \(6/\pi^2\);
more generally, the probability that it is \(k\)-power free for \(k\) at least
\(2\) is \(1/\zeta(k)\), where \(\zeta(s) = \sum_{n = 1}^\infty 1 / n^s = \prod_{p\
{\rm prime}} (1 - 1/p^s)^{-1}\) (if Re(s) > 1) is the Riemann
zeta function.
Sadly these arguments cannot be used to prove results about how many primes
there are (it comes down to dealing with the error terms in dropping the floor
function,
though this has not stopped lots of amateurs from using this to `prove' some
of the big open problems in number theory).
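- Two quick numerical illustrations of the inclusion-exclusion facts above (my own sketches; the cutoffs are my choices): the probability that none of \(n\) people return to their seat approaches \(1/e\), and the proportion of square-free integers up to \(N\) approaches \(6/\pi^2 = 1/\zeta(2)\).
```python
from math import e, pi, factorial

# derangement probability via the inclusion-exclusion formula
def derangement_probability(n):
    return sum((-1)**k / factorial(k) for k in range(n + 1))

print(derangement_probability(10), 1 / e)

# density of square-free integers up to N (n is square-free if no d^2 divides it)
def squarefree_density(N):
    count = 0
    for n in range(1, N + 1):
        if all(n % (d * d) != 0 for d in range(2, int(n**0.5) + 1)):
            count += 1
    return count / N

print(squarefree_density(10**5), 6 / pi**2)
```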
-
One of the more interesting uses of this principle is in Brun's
sieve,
where he uses inclusion-exclusion to show that there cannot be too many twin
primes. Perhaps
the strangest application of this is that this is how the famous Pentium Bug
was discovered!
(Returning to the seating problem above: what about the more general case,
namely when we reorder and have at least \(r\) people in their correct seats?)
-
Here's Nicely's
webpage. The
calculation being performed was \(\sum_{p:\ p\ {\rm prime\ and\ either}\ p+2\
{\rm or} \ {p-2}\ {\rm is\ prime}} 1/p\); this is known as Brun's
constant.
If this sum were infinite then there would be infinitely many twin primes, proving
one of the most famous conjectures in mathematics;
sadly the sum is finite and thus there may or may not be infinitely many twin
primes (twin primes are two primes differing by 2).
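- A rough sketch of the computation described above (my own, assuming sympy is available for the primality test, and slow for large N): sum \(1/p\) over primes \(p \le N\) for which \(p+2\) or \(p-2\) is also prime. Brun's constant is about 1.902, and the partial sums converge extremely slowly.
```python
from sympy import isprime

def brun_partial_sum(N):
    total = 0.0
    for p in range(2, N + 1):
        if isprime(p) and (isprime(p + 2) or isprime(p - 2)):
            total += 1.0 / p
    return total

for N in (10**3, 10**4, 10**5):
    print(N, brun_partial_sum(N))
```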
- Video online here:
http://youtu.be/_9xI2tmhjHs
-
Wednesday, November 12. We first discussed some of the exam
problems (this discussion was not recorded as some students shared their
work), and then delved into why the binary Goldbach problem is so hard but the
ternary is doable.
- It came down to comparing the major arc main term with the cancellation we
need on the minor arcs. If we are
trying to write a number as a sum of \(s\) primes, we got that the main term was \(\mathfrak{S}_s(N)
N^{s-1}\), where \(\mathfrak{S}_s(N)\) is the singular series and, importantly,
can be bounded away from zero with a bound independent of \(N\) (if the bound
did depend on \(N\), then if it were sufficiently small it could wipe out the
main term).
- We then needed to estimate the minor arc contribution: \(\int_{\mathrm{m}}
f_N(x)^s \exp(-2\pi i N x) dx\), where \(f_N(x) = \sum_{p\le N} (\log p)
\exp(2\pi i p x)\). Unfortunately the only thing we know how to do here is pull
the absolute value inside. This is disastrous as we lose all the oscillation
from the highly oscillatory term \(\exp(-2\pi i N x)\), but at least we still
have oscillation and cancellation from the \(f_N(x)^s\).
- We went through the bounding when \(s=3\). We pulled out one factor of \(f_N(x)\)
and showed that the minor arc contribution is bounded by \(\max_{x \in \mathrm{m}}
|f_N(x)| \cdot \int_0^1 |f_N(x)|^2 dx\) (where we extended the integration from
the minor arcs to the entire interval). We have a lot of cancellation in this
integral (the absolute value is easily handled: \(|f_N(x)|^2 = f_N(x) f_N(-x)\),
and the resulting double sum when we expand collapses to the two primes being
equal); this integral is just \(N \log N\) (approximately), and thus as the
major arcs are of size \(N^2\), we win if we can save a few logarithms in \(|f_N(x)|\)
on the minor arcs. This is how the subject developed, and results on the
distribution of primes in arithmetic progression (RH and GRH for the Riemann
zeta function and Dirichlet \(L\)-functions) come into play.
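- A quick numerical sanity check of the claim above (my own sketch, assuming sympy for the prime generation): by orthogonality \(\int_0^1 |f_N(x)|^2 dx = \sum_{p \le N} (\log p)^2\), so we just compare that sum with \(N \log N\).
```python
from sympy import primerange
from math import log

for N in (10**3, 10**4, 10**5, 10**6):
    S = sum(log(p)**2 for p in primerange(2, N + 1))
    print(N, S / (N * log(N)))   # the ratio slowly approaches 1
```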
- The situation is sadly different for the binary Goldbach problem. The problem is
that if we again pull out one factor of \(f_N(x)\) over the minor arcs we're left
with the \(L_1\)-norm, which we can only estimate by extending the integration
from the minor arcs to the whole interval and getting that it is of size \(\sqrt{N\log
N}\) (to see this use Cauchy-Schwarz). Now if we're going to
have the minor arcs smaller than the major arcs we need the maximum value to
be \(o(N^{1/2}/\log N)\). This is disastrous -- this is smaller than our
estimate of the average value!
- I'm hoping that the above discussion gives you a sense of the potential
and the limitations of the Circle Method. We'll formulate more problems along
these lines on Friday, and we'll hear short reports.
- Read classics:
- Video online here:
http://youtu.be/3n2BTQKdFfo
-
Wednesday, November 5.
Remember no class on Friday (b/c of the exam)
or Monday (to do group work).
- Today we explored the definition of the major arcs. The key idea is that,
to get a sense of their size, we can make crude estimates as we only care
about order of magnitude. This is a general principle: no need to spend too
much time on quantities that don't matter!
- There are a lot of tedious details in the application of the Circle
Method. There was a request to do some more examples in class, which we'll do
next Wednesday (moving the group presentations to Friday).
- Key quantities:
- Major and minor arcs: their definition depends on the cancellation we need
and can get; for Goldbach-type problems the major arcs are embarrassingly
short.
- Singular series: we represent it as a product; it quantifies the local obstructions.
- Waring's problem:
http://www.maths.bris.ac.uk/~sp2331/WaringCircle.pdf (nice set of notes).
- Video online here:
http://youtu.be/5iROetdSRVY
-
Monday,
November 3. We talked about the importance of having a variable
parameter when trying to estimate integrals. When it's not clear where to cut
things, add a free parameter. For example, we were trying to estimate \(\sum_{p
\le N} \log p\). To get a handle on it, we assumed we knew how to evaluate
unweighted sums. Of course, in reality it's the other way around -- we have
information on these weighted sum and we want to pass to unweighted ones! No
one seemed to question this today.... The reason I wanted to do this was not
because we needed this result, but because I wanted to talk about how to get
results along these lines. Assuming good knowledge on the unweighted sum, we
split into to sums, from \(2\) to \(N/\log^\alpha N\) and then from
\(N/\log^\alpha N\) to \(N\). In the second regime the \(\log p\) weights are
essentially constant, differing only by terms of size \(\log \log N\), while
if \(\alpha > 1\) in the first regime there are just an insignificant
number of primes relative to the number in the second term.
-
Read and discuss in groups of at least 2 and be prepared to present in class
on Wednesday:
- Video online here:
http://youtu.be/sF9gBL85lrc
Friday, October 31. Building
on our knowledge of L-functions and primes in arithmetic progression, we now
turn to the Circle Method. In my opinion, this is one of the most beautiful
ideas in all of mathematics. Unfortunately the application of it is often
quite involved and technical, and it is easy to get lost in the calculations.
Similar to our study of primes in arithmetic progressions, the Circle Method
works by showing that because something happens many times, it must happen at
least once!
-
We discussed two of the main ingredients in the Circle Method: (1) writing
expressions as Main Term + Error with good control on the error, and (2)
proving something happens at least once by showing it happens many times. We
discussed prime
races (see
also here),
and how misleading the data can be. Instead of looking at \(\pi_{3,4}(x) -
\pi_{1,4}(x)\) we could look at \({\rm Li}(x) - \pi(x)\); this was also
observed to be positive as far as people could see, but it turns out that they
flip infinitely often as to which is larger. This was shown by Littlewood, but
it was not known how far one must go to see \(\pi(x) > {\rm Li}(x)\). His student Skewes
showed it
suffices to go up to \(10^{10^{10^{34}}}\) if the Riemann
hypothesis is
true, or \(10^{10^{10^{963}}}\) otherwise (as large as these numbers are, they
are dwarfed by Graham's
number).
We 'believe' it's around \(10^{316}\) where \(\pi(x)\) beats \({\rm Li}(x)\) for the first time
(note this is well beyond what we can investigate on the computer!). The proof
involves the Grand Simplicity Hypothesis (that the imaginary parts of the
non-trivial zeros of Dirichlet L-functions are linearly independent over the
rationals); this is used to show that \((n \gamma_1, \dots, n \gamma_k) \bmod
1\) is equidistributed in \([0,1]^k\), where the \(\gamma_j\) are the imaginary
parts of these zeros. Note that this is Kronecker's theorem (which we
discussed in Chapter 12); it's amazing how this result surfaces throughout
mathematics.
-
Instead of looking at \(A+A+\cdots+A\) we could look at \(A-A\); for example,
if \(A = P\) (the set of primes), we know \(2\) is in \(P-P\). What is nice
about the Circle Method is that the way it proves something is in a set like \(A+A+\cdots+A\)
or \(A-A\) is to count how MANY times it is in the set. Thus, the Circle Method will
give heuristics as to how many times \(2\) occurs in \(P(N) - P(N)\), where \(P(N)\)
is the set of primes at most \(N\). This leads to the Hardy-Littlewood
heuristics for
the number of twin primes. In our book we study how many Germain
primes there
are, primes p such that (p-1)/2 is also prime. These primes have applications
in cryptography (in
proving that it is possible to do primality testing in polynomial time (see
also here)
-- if there are as many Germain primes as the Circle Method predicts, certain
primality tests run faster) and in Fermat's Last Theorem (if
x^p + y^p = z^p and p is a Germain prime then p|xyz).
-
The following problem arose in Math 331: can a product of \(k\) consecutive
integers be a perfect \(r \ge 2\) power? Here is a nice paper by Erdos and
Selfridge proving this cannot be done:
https://www.renyi.hu/~p_erdos/1975-46.pdf (there are elementary proofs of
some simpler cases).
-
Here are some notes on evaluating \(\zeta(2)\):
http://www.uam.es/personal_pdi/ciencias/cillerue/Curso/zeta2.pdf
-
We studied the cookie problem, counting the number of ways to divide 10
identical cookies among 5 distinct people (here's the classic clip where Cookie
Monster meets the Count, whose full name is Count von Count -- they
changed how he appears!). Counting is very important; we talked a bit about
counting 'good' games in tic-tac-toe, and ignoring counting obviously bad
ones. (Consider the error in the classic Princess
Bride `Chess' Match or in the Princess
Bride Battle of Wits). It's usually called the stars
and bars problem. What I love here is the power of changing your
perspective -- we go from a very painful brute force approach to being able to
solve it in one line.
-
I have posted the cookie problem on my math
riddles page (email me if you
want to contribute); someone with far more
patience than I solved it by brute force. Here's their solution. The final
number, tabbed from the others, is how many distinct rearrangements we have
of this basic configuration. The total is 1001,
or (10+5-1 choose 5-1). We saw this problem lead to a discussion of
multinomial coefficients. When there are two people getting cookies, say 8
and 2, there aren't (5 choose 2) ways to assign people, but (5 choose 2) *
2! (we choose the 2 people, then there are 2! ways to choose which gets the
8 and which gets the 2). If we have 8 1 1 it's more involved. In that case
it's (5 choose 3) to choose the three people, 3! ways to order which of the
people gets which number, but then we must divide by 2! (as the two people
getting 1 are indistinguishable). A better way to view 3!/2! is 3! / (2! 1!)
(note the numbers on the bottom sum to the top). This is an example of a multinomial
coefficient, a generalization of binomial coefficients. For example, if
we have MISSISSIPPI, there would be 11! ways to order the letters (order
matters) if the letters were
distinguishable, but they're not. So let's put subscripts on the letters: MI1S1S2I2S3S4I3P1P2I4.
We then have 4! ways of placing the four marked S's in the four S positions,
and so on, giving 11! / (4! 4! 2! 1!) (I like including the final 1 so that
the bottom sums to the top).
- Below is the person's solution.
-
0 0 0 0 10 5
0 0 0 1 9 20
0 0 0 2 8 20
0 0 0 3 7 20
0 0 0 4 6 20
0 0 0 5 5 10
0 0 1 1 8 30
0 0 1 2 7 60
0 0 1 3 6 60
0 0 1 4 5 60
0 0 2 2 6 30
0 0 2 3 5 60
0 0 2 4 4 30
0 0 3 3 4 30
0 1 1 1 7 20
0 1 1 2 6 60
0 1 1 3 5 60
0 1 1 4 4 30
0 1 2 2 5 60
0 1 2 3 4 120
0 1 3 3 3 20
0 2 2 2 4 20
0 2 2 3 3 30
1 1 1 1 6 5
1 1 1 2 5 20
1 1 1 3 4 20
1 1 2 2 4 30
1 1 2 3 3 30
1 2 2 2 3 20
2 2 2 2 2 1
-
What we're really doing is solving the equation x1 + ... + x5 = 10 in
non-negative integers. This is a very special type of Diophantine
equation. It's actually a special case of Waring's
problem, which looks at solving x1^k + ... + xs^k = n for fixed s and k.
These problems are in general not accessible through combinatorics; the case
k=1 is special. The general approach proceeds via generating
functions, which we will cover in great detail later in the semester
(it's one of the key concepts of the class).
- Our solution to
the cookie problem is quite elegant, and in some respects reminiscent of
geometry class (remember all those proofs where the teacher cleverly adds auxiliary
lines; the difference here is we just add more cookies). While it is
possible to solve many combinatorial problems by brute force in principle,
in practice this is not a good way to go -- it is time consuming, and quite
likely that one makes a mistake. Typically one finds a way to interpret a
given quantity two ways; we can compute one of them and thus we obtain a
formula for the other. For example, we showed the number of ways of dividing
C cookies among P people is (C + P - 1 choose P-1); here all the identical
cookies are divided. What if we don't assume all the cookies are divided --
what is the answer now? It is just Sum_{c = 0 to C} (c + P - 1 choose P -
1); this is because we are just going through all the cases (we give out no
cookies, 1 cookie, ...). What does this sum equal? Imagine now we have
another person, say the Cookie
Monster (this is one of
Cameron's favorite clips), who gets all the remaining cookies. Then dividing
at most C cookies among P people is the same as dividing exactly C cookies
among P+1 people, and hence our sum equals (C + P+1 - 1 choose P+1 - 1).
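- A short check of the counts above (my own sketch): brute force the number of ways to hand out exactly 10 identical cookies to 5 distinct people, compare with the stars-and-bars formula \(\binom{C+P-1}{P-1}\), and verify that summing over "at most C cookies" gives \(\binom{C+P}{P}\), the Cookie Monster trick.
```python
from itertools import product
from math import comb

C, P = 10, 5
# brute force: count 5-tuples of non-negative integers summing to 10
brute = sum(1 for xs in product(range(C + 1), repeat=P) if sum(xs) == C)
print(brute, comb(C + P - 1, P - 1))                  # both 1001

# "at most C cookies" equals "exactly C cookies with one extra person"
at_most = sum(comb(c + P - 1, P - 1) for c in range(C + 1))
print(at_most, comb(C + P, P))                        # both 3003
```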
- Related to the
cookie problem is the partition problem: for the cookie problem we consider
2+3+3+1+1 different from 1+2+3+3+1 as the five people are distinct; if we
don't consider these distinct then we have a partition
problem. It's a lot more complicated to count these, but counting them leads to some great
mathematics (such as Young
tableaux).
-
Video online here:
http://youtu.be/jzSt1Uepv-c
Wednesday,
October 29. We finished our (first) unit on L-functions. We
saw the importance of studying a family of L-functions to glean information.
We used Dirichlet characters to build Dirichlet L-functions, which we then
used to prove the infinitude of primes congruent to \(a\) modulo \(m\)
(provided these numbers are relatively prime). In the five step program, the
key step was proving \(L(1,\chi) \neq 0\) if \(\chi\) is not the principal (or
trivial) character \(\chi_0\). The proof in general leads to Dirichlet's class
number formula.
-
Dirichlet's class number formula:
http://en.wikipedia.org/wiki/Class_number_formula
-
Lecture notes expanding on Davenport's classic "Multiplicative Number Theory"
by Andreas Strombergsson:
http://www2.math.uu.se/~astrombe/analtalt08/www_notes.pdf
-
One of the main steps in the proof was taking \(a=1\) and noting that
\(\sum_{\chi \ {\rm mod}\ m} \log L(\sigma,\chi) \ge 0\) for \(\sigma > 1\).
We were able to use this to show that if \(\chi\) is not a real character (so
it doesn't equal its complex conjugate) then the associated \(L(1,\chi) \neq
0\); this is because if there were two zeros the sum of the logarithms would
go to \(-\infty\) as \(\sigma \to 1\) from above, as only one factor has a
pole which is then cancelled by one of the two zeros.
-
This reduces the analysis to \(L(1,\chi)\) for \(\chi = \overline{\chi}\). We
looked at the characters when \(m=4\); this is one of the first non-trivial
examples, as we need an \(m\) with at least two residue classes relatively prime to
it. We saw that \(L(1,\chi) = \sum_{n=1}^\infty \chi(n)/n = 1 - 1/3 + 1/5 -
1/7 + \cdots\). This sum converges nicely; it's an alternating, strictly
decreasing sum. We saw that we could write this as \(\int_0^1 dx/(1+x^2)\),
expanded the denominator by the geometric series (whenever you see
\(1/(1\pm{\rm blah})\) you should think of using the geometric series).
If we integrate directly we get \(\arctan(1) - \arctan(0) = \pi/4\); if we
interchange the integral and the sum we get the series alluded to above.
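- A quick numerical illustration of this computation (my own sketch): for the non-principal character mod 4, \(\chi(n) = 0, 1, 0, -1\) according as \(n \equiv 0, 1, 2, 3 \pmod 4\), and the partial sums of \(\sum \chi(n)/n\) approach \(L(1,\chi) = \pi/4\).
```python
from math import pi

def chi(n):
    return {0: 0, 1: 1, 2: 0, 3: -1}[n % 4]

partial = 0.0
for n in range(1, 10**6 + 1):
    partial += chi(n) / n

print(partial, pi / 4)   # the two values agree to several decimal places
```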
-
We have to be a bit careful. First, we can't just integrate to 1 as then the
geometric series formula doesn't work as the ratio has absolute value 1. All
is not lost; we can integrate to \(1-\epsilon\) and send \(\epsilon\) to 0.
Next, we have to worry about interchanging the integral and the sum.
Fortunately this is easy for the geometric series, as it has the wonderful
property that the tail is also a geometric series. Thus \(\frac1{1+x^2} = \sum_{n=0}^{N-1}
(-x^2)^n + (-x^2)^N \frac1{1+x^2}\), and we now have a finite sum.
-
Note that if we took \(x=1\) this corresponds to a geometric series where \(r
= -1\). If we think about it, we see we're saying \(\frac1{1-(-1)} = 1 - 1 + 1
- 1 + 1 - 1 + \cdots\); it's reasonable to declare this to be 1/2, as half the
time (when we truncate after an odd number of terms) we have 1 while the other
times we have 0. This is the first example of a rich theory dealing with how
to make sense of divergent sums.
-
Video online here:
http://youtu.be/Wavl_-DsdWw
Monday,
October 27.
See Chapters 3 and 18 of our book for more information about Dirichlet
Characters and Dirichlet
L-functions.
Their main applications (for us) are in proving Dirichlet's
Theorem on Primes in Arithmetic Progressions and
other similar results.
-
The Riemann zeta function is the first of many such functions we can study.
The generalizations are called L-functions,
and for us are of the form \(L(s,f) = \sum_n a_n(f) / n^s = \prod_p L_p(s,f)^{-1}\)
where \(L_p(s,f)\) is a polynomial of degree \(d\) in \(p^{-s}\). Two of the
most important are Dirichlet
L-functions (which
have applications to primes in arithmetic progression) and Elliptic Curve
L-functions (which have applications to understanding the size of the group of
rational solutions of the elliptic curve -- see the Birch
and Swinnerton-Dyer conjecture for
more information). Dirichlet characters are sometimes covered in a group
theory or abstract algebra course. If you want more details, see Chapter 3 of
our book (from Section 3.3.2 to 3.3.6). Elliptic curves are discussed in
Section 4.2. We initially used some knowledge of the zeros of the Riemann zeta
function to deduce information about the primes. Amazingly, if we look at
families of L-functions we can convert knowledge of sums over the family of
the \(a_n(f)\) to information about the zeros of the associated L-functions.
We saw how we have great formulas for summing Dirichlet characters; similar
formulas exist for other families as well. For details in the case of
Dirichlet L-functions, see Chapter 18 of our book. Note the amazing similarity
with random matrix theory. We have three ingredients to understand the
zeros. (1) Determine the correct scale to study the zeros (this actually falls
out from the functional equation). (2) Derive a formula relating sums over the
zeros to a related sum over the prime coefficients. This is the analogue of
the Eigenvalue Trace Lemma, \({\rm Tr}(A^k) = \sum_i \lambda_i(A)^k\). The reason
this formula was so useful is that while we want to understand the eigenvalues
of our random matrices, it is the matrix elements that we choose. Thus, this
formula allows us to pass from knowledge of the matrix elements to knowledge
of the zeros. These are known as Explicit
Formulas.
(3) The Eigenvalue Trace Lemma and the Explicit formula would be useless,
however, if we were unable to actually execute the sums. Our theory thus
requires some kind of averaging formula. For random matrix theory, this was
the integrals of Tr\((A^k) P(A) dA\); we could compute these as Tr\((A^k)\) is
a polynomial in the matrix elements, and then we used combinatorics and
probability theory. Sadly, we do not have great averaging formulas in number
theory, and this is why the results there are significantly worse.
-
There are many questions one can ask about primes in progression. The first is
on primes in arithmetic progressions. We can also ask about twin primes (p and
p+2 both prime), or more generally about p and p+2k both prime. We could look at prime triples such as p, p+2 and p+4 other
than 3, 5, 7; this leads to the question of just which arithmetic progressions
are possible. We quickly see here that
there are no such triples; there is an arithmetic obstruction as at least one
of the three numbers is a multiple of three. The
longest to date might be an arithmetic progression of primes of length 25; Green
and Tao proved
that there are arbitrarily long arithmetic progressions of primes (sadly it's an
existence theorem, with nothing said about actually finding one!).
-
Another great question is about the least prime in an arithmetic progression.
-
Video online here:
Friday, October 24. We finished our sketch of the Prime
Number Theorem and began our unit on Dirichlet L-functions, the first
generalization of the Riemann zeta function.
-
The complex analytic proof of the Prime
Number Theorem uses
several key facts. We need the functional equation of the Riemann zeta
function (which follows from Poisson
summation and properties
of the Gamma function), the Euler
product (namely that
\(\zeta(s)\) is a product over primes), and the important fact that the
Riemann zeta function does not have a zero on the line Re(s) = 1! If
there were such a zero, then the main term of \(x\) from integrating \(\zeta'(s)/\zeta(s)
\cdot x^s/s\) arising from the pole of \(\zeta(s)\) at \(s=1\) would be cancelled by
the contribution from this zero! Thus it is essential that there be no
zero of \(\zeta(s)\) on Re(s) = 1. There are many proofs of this result. My
favorite proof is based
on a wonderful trig identity: \(3 + 4 \cos(x) + \cos(2x) = 2 (1 + \cos(x))^2
\ge 0\) (many people have said that \(w^2 \ge 0\) for real \(w\) is the most
important inequality in mathematics). There is an elementary proof of the prime number theorem
(ie, one without complex analysis). For those interested in history and
some controversy, see
this article by Goldfeld for a terrific analysis of the history of the
discovery of the elementary proof of the prime number theorem and the
priority dispute it created in the mathematics community. We
mentioned that Riemann computed zeros of \(\zeta(s)\) but never mentioned this
achievement in his paper; the method only came to light about 70 years later when
Siegel was looking at Riemann's papers. Click
here for more on the Riemann-Siegel formula for computing zeros of
zeta(s). Finally, terrific advice given to all young mathematicians
(and this advice applies to many fields) is to read the greats. In
particular, you should read Riemann's
original paper. In case your mathematical German is poor, you can click
here for the English translation of Riemann's paper. The key passage
is on page 4 of the paper: One
now finds indeed approximately this number of real roots within these
limits, and it is very probable that all roots are real. Certainly one
would wish for a stricter proof here; I have meanwhile temporarily put
aside the search for this after some fleeting futile attempts, as it
appears unnecessary for the next objective of my investigation.
- One of course should be careful about saying that it is impossible to
prove a result without resorting to using specific facts, even though those
facts might seem quite obviously necessary to use. A terrific example is the
elementary proof of the Prime
Number Theorem (which says that as \(x \to \infty\), the number of primes at most
\(x\) is asymptotic to \(x/\log x\)). It turns out that this statement is equivalent to
the fact that the Riemann zeta function \(\zeta(s) = \sum_{n = 1}^{\infty} 1/n^s\) (or
actually its meromorphic continuation) has no zero on the line \({\rm Re}(s) = 1\). This
is quite clearly a complex analytic statement. It was thought that there could
be no `elementary' proof of this (elementary doesn't mean easy; it just means
without using complex analysis), but if there were one, boy would it open our
eyes! Both statements are false.
See this article by
Dorian Goldfeld for the history of the proof of the Prime Number Theorem
(and the priority dispute).
-
The Riemann zeta function is the first of many such functions we can study.
The generalizations are called L-functions,
and for us are of the form \(L(s,f) = \sum_n a_n(f) / n^s = \prod_p L_p(s,f)^{-1}\)
where \(L_p(s,f)\) is a polynomial of degree \(d\) in \(p^{-s}\). Two of the
most important are Dirichlet
L-functions (which
have applications to primes in arithmetic progression) and Elliptic Curve
L-functions (which have applications to understanding the size of the group of
rational solutions of the elliptic curve -- see the Birch
and Swinnerton-Dyer conjecture for
more information). Dirichlet characters are sometimes covered in a group
theory or abstract algebra course. If you want more details, see Chapter 3 of
our book (from Section 3.3.2 to 3.3.6). Elliptic curves are discussed in
Section 4.2. We initially used some knowledge of the zeros of the Riemann zeta
function to deduce information about the primes. Amazingly, if we look at
families of L-functions we can convert knowledge of sums over the family of
the \(a_n(f)\) to information about the zeros of the associated L-functions.
We saw how we have great formulas for summing Dirichlet characters; similar
formulas exist for other families as well. For details in the case of
Dirichlet L-functions, see Chapter 18 of our book. Note the amazing similarity
with random matrix theory, which we will hopefully cover later in the
semester. We have three ingredients to understand the zeros. (1) Determine the
correct scale to study the zeros (this actually falls out from the functional
equation). (2) Derive a formula relating sums over the zeros to a related sum
over the prime coefficients. This is the analogue of the Eigenvalue Trace
Lemma, \({\rm Tr}(A^k) = \sum_i \lambda_i(A)^k\). The reason this formula was so
useful is that while we want to understand the eigenvalues of our random
matrices, it is the matrix elements that we choose. Thus, this formula allows
us to pass from knowledge of the matrix elements to knowledge of the zeros.
These are known as Explicit
Formulas.
(3) The Eigenvalue Trace Lemma and the Explicit formula would be useless,
however, if we were unable to actually execute the sums. Our theory thus
requires some kind of averaging formula. For random matrix theory, this was
the integrals of Tr\((A^k) P(A) dA\); we could compute these as Tr\((A^k)\) is
a polynomial in the matrix elements, and then we used combinatorics and
probability theory. Sadly, we do not have great averaging formulas in number
theory, and this is why the results there are significantly worse.
- Euler product GOOD! See what can happen when you generalize the zeta
function by changing the denominator: you can get a similar function where the
generalized Riemann Hypothesis fails! See
http://en.wikipedia.org/wiki/Hurwitz_zeta_function
- It's always good to think about how to generalize. We start with \(\zeta(s)
= \sum_n 1/n^s\); we can generalize by changing the numerator to \(a_f(n)\),
or changing the denominator to \(f(n)^s\). The first is the more standard, and
as commented above leads to better properties.
- We began our brief tour of
Dirichlet Characters and Dirichlet
L-functions.
Their main applications are in proving Dirichlet's
Theorem on Primes in Arithmetic Progressions and
other similar results. We can modify the explicit formula for the Riemann zeta
function to obtain one for Dirichlet L-functions. It helps to assume the Generalized
Riemann Hypothesis,
which allows us to write the non-trivial zeros of \(L(s,\chi)\) as \(1/2 + i
\gamma_\rho\), where \(\gamma_\rho\) is a real number. This leads to an
explicit formula (see Chapter 18 for the details) relating \(\sum_\rho
h(\gamma_\rho \log m)\) to \(\sum_p (\log p)\, \hat{h}(\log p / \log m) / \sqrt{p}\).
-
We also talked a bit about how to write down formulas for each Dirichlet
character, using
the fact that (Z/mZ)* is a cyclic group for m prime.
The group is generated by some element \(g\), so a typical element is \(x =
g^k\) for some \(k\). It therefore suffices to define the Dirichlet character
at the generator. As these characters map (Z/mZ)* to complex numbers of
absolute value 1 and are group homomorphisms, we have \(|\chi(g)| = 1, \chi(g^{m-1})
= \chi(g)^{m-1} = 1\) and \(\chi(1) = 1\), implying that the characters are
exactly the functions determined by \(\chi(g) = \exp(2 \pi i \ell / (m-1))\) for \(\ell \in
\{0, 1, \dots, m-2\}\). If \(\ell = 0\) we basically recover the Riemann zeta
function when we look at \(\sum_n \chi(n) / n^s\); for other \(\ell\), however,
we get very different functions. These functions are significantly easier to
extend past Re(s) = 1 because of the cancellation in sums of \(\chi(n)\). In
fact, if \(\ell\) is not zero, then the associated character \(\chi\) satisfies
\(\sum_{n=1}^{m-1} \chi(n) = 0\). This and partial summation allow us to
extend \(\sum_n \chi(n) / n^s\) from Re(s) > 1 to Re(s) > 0.
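- A small sketch (my own; the modulus 7 and generator 3 are just a convenient example) constructing the Dirichlet characters mod a prime \(m\) from a generator of (Z/mZ)*, as described above, and checking that every non-principal character sums to zero over 1, ..., m-1.
```python
import cmath

m, g = 7, 3   # 3 generates (Z/7Z)*

# discrete log table: for each unit x mod m, the k with g^k = x (mod m)
dlog = {pow(g, k, m): k for k in range(m - 1)}

def chi(ell, n):
    if n % m == 0:
        return 0
    return cmath.exp(2j * cmath.pi * ell * dlog[n % m] / (m - 1))

for ell in range(m - 1):
    total = sum(chi(ell, n) for n in range(1, m))
    print(ell, round(abs(total), 10))   # 6 for ell = 0 (principal), 0 otherwise
```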
- Consider random
harmonic series:
\(\sum_n \omega(n) / n\) where \(\omega(n) = 1\) with probability \(1/2\) and
\(-1\) with probability \(1/2\). Schmuland
has a fascinating paper on the properties of these sequences (try
here if that link doesn't work).
- For a quick review of Dirichlet L-functions
(needed properties and why we care) see
http://web.williams.edu/Mathematics/sjmiller/public_html/ntrmt09/handouts/Lfns/DirichletDens_sjmiller.pdf
(this has since been folded into a paper of mine with Dan Fiorilli).
- Video online here:
http://youtu.be/DfSCUElVr6s
Wednesday, October 22. We continued our brief introduction
to complex analysis. We did some contour integration, saw how much easier
integration can become, and saw applications to number theory through a sketch
of the proof of the Prime Number Theorem, which highlights why we need to
analytically continue \(\zeta(s)\) and understand the location of its zeros.
- We continued working
on the Residue Theorem. The difficulty is often in exploiting decay; try
integrating
\(\sin^2 x / x^2\) over the real line... (if you integrate by parts first you
can make life easier!). The
Residue Theorem is
an incredibly powerful tool. Even if you only care about integrals of
functions of a real variable, it is frequently useful to extend to the complex
plane. The reason is that, in general, it is not possible to write down
anti-derivatives; integration is hard! (There is an interesting
algorithm (due to Risch) to find anti-derivatives involving elementary
functions.
The
linked article has a nice example here, where changing the constant term
by 1 leads to the method failing; this is related to a change in the
Galois group.) There
are several steps to using the Residue Theorem:
- Step one: determine the function. Frequently
it is easy: given \(f(x)\) try \(f(z)\). Sometimes, though, it's a bit harder. If you
have a \(\cos(x)\) term you could try \(\cos(z) = (\exp(iz) + \exp(-iz))/2\), or you might
try taking just \(\exp(iz)\) and taking the real part.
- Step two and three are related: choose a
contour and find the poles and residues. Often the location of the poles
affects what contour you take. You DO NOT
want a pole on the contour (I've had to do this a few times in my
research, and it is not fun). Sometimes you have to split the integrand up
into different pieces, and do one part with one closed curve and another part
with another. A big factor in determining contours is how the function decays.
Remember that decay is a bit trickier for complex numbers; for instance, if
\(|z| = R > 2000\) then all we can say is \(1/|1+z^2| \le 1/(R^2 - 1)\), while if we
restrict to real \(x\) with \(|x| = R\) then \(1/(1+x^2) = 1/(1+R^2)\).
The issue is that we have a phase.
- Step three: repeat earlier steps as needed.
- We did a very important example, finding the
normalization constant for integrating \(1/(1+x^2)\) over the real line.
This leads to the
Cauchy distribution, which is very important in probability.
- Consider finding the poles and residues of \(1/(1+z^{2010})\). Key is
Euler's formula:
\(\exp(ix) = \cos(x) + i \sin(x)\). Remember that if we want to solve \(z^{2010}
= -1 = \exp(i\pi)\), we could also write \(-1\) as \(\exp(i\pi + 2\pi in)\) for any integer
\(n\).
There will be \(2010\) distinct solutions (half in the upper half plane, half in
the lower half plane).
Contour integral examples:
Recall a powerful technique from Calc I: if \(f(g(x)) = x\) (so \(f\) and
\(g\) are inverse functions, such as \(\sqrt{x^2}\) or, one needed for the
Cauchy distribution, \(\tan(\arctan(x))\)), then \(g'(x) = 1 / f'(g(x))\); in
other words, knowing the derivative of \(f\) we know the derivative of its
inverse function. This was used in Calc I to pass from knowing the derivative
of \(\exp(x)\) to the derivative of \(\ln(x)\). We can also use this to find
various anti-derivatives in terms of inverse
trig functions; while many are close to \(\sqrt{1-x^2}\), none of them are
exactly that (a
list of the derivatives of these are here). This highlights one of the
most painful parts of integration theory -- just because we are close to
finding an anti-derivative does not mean we can actually find it! While there is a
nice anti-derivative of \(\sqrt{1 - x^2}\), it is not a pure derivative of an
inverse trig function. There are many tables
of anti-derivatives (or integrals) (a
fun example on that page is the Sophomore's
Dream). Unfortunately it is not always apparent how to find these
anti-derivatives, though of course if you are given one you can check by
differentiating (though sometimes you have to do some non-trivial algebra to
see that they match). In fact, there are some tables of integrals of important
but hard functions where most practitioners have no idea how these results are
computed (and occasionally there are errors!). We will see later how much
simpler these problems become if we change variables; to me, this is one of
the most important lessons you can take from the course: MANY
PROBLEMS HAVE A NATURAL POINT OF VIEW WHERE THE ALGEBRA IS SIMPLER, AND IT IS
WORTH THE TIME TO TRY TO FIND THAT POINT OF VIEW!
Computing residues: We've seen that using the
geometric series is a great way to compute residues. For example, if we want a
residue at say \(3\) we replace \(z\) everywhere with \(z-3 + 3\); the first factor
\(z-3\) is
then small. Another useful approach is through differentiation. Say we have \(f(z) = g(z) / (z-3)^{10}\) with
\(g(z)\) holomorphic at \(z=3\). To calculate
the residue at \(z=3\) we need the coefficient of \((z-3)^9\) in the Taylor expansion of \(g(z)\) about \(z=3\). We
could do this with our trick, or we could compute 9 derivatives.
Video online here:
http://youtu.be/EGDPdOuK3Jg
Monday, October 20. To see the connection between zeros of
the Riemann zeta function \(\zeta(s)\) and the distribution of primes requires
some results from complex analysis. Interestingly, one does not need to go
through the zeros of \(\zeta(s)\) to reach the Prime Number Theorem, though it
is an efficient, good way to go.
-
The complex analytic proof of the Prime
Number Theorem uses
several key facts. We need the functional equation of the Riemann zeta
function (which we saw follows from Poisson summation and properties of the
Gamma function), the Euler product (namely that \(\zeta(s)\) is a product over
primes), and the fact that
the Riemann zeta function has no zeros on the line \({\rm Re}(s) = 1\).
If there were such a zero, then the main term of \(x\) from integrating \(\zeta'(s)/\zeta(s)
\cdot x^s/s\) arising from the pole of \(\zeta(s)\) at \(s=1\) would be
cancelled by the contribution from this zero! Thus it is essential that there
be no zero of \(\zeta(s)\) on \({\rm Re}(s) = 1\). There are many proofs of
this result. My
favorite proof is
based on a wonderful trig identity: \(3 + 4 \cos(x) + \cos(2x) = 2 (1 + \cos(x))^2
\ge 0\) (many people have said that \(w^2 \ge 0\) for real \(w\) is the most
important inequality in mathematics). There is an elementary proof of the
prime number theorem (ie, one without complex analysis). For those interested
in history and some controversy, see
this article by Goldfeld for a terrific analysis of the history of the
discovery of the elementary proof of the prime number theorem and the priority
dispute it created in the mathematics community.
Riemann computed zeros of \(\zeta(s)\) but never mentioned this achievement in his paper; the
method only came to light about 70 years later when Siegel was looking at
Riemann's papers. Click
here for more on the Riemann-Siegel formula for computing zeros of zeta(s).
Finally, terrific advice given to all young mathematicians (and this advice
applies to many fields) is to read the greats. In particular, you should read Riemann's
original paper.
In case your mathematical German is poor, you can click
here for the English translation of Riemann's paper.
The key passage is on page 4 of the paper:
- One now finds indeed approximately this
number of real roots within these limits, and it is very probable that all
roots are real. Certainly one would wish for a stricter proof here; I have
meanwhile temporarily put aside the search for this after some fleeting futile
attempts, as it appears unnecessary for the next objective of my
investigation.
- The main input we will need is that
integrals along circles (or more generally nice curves) of the
logarithmic derivative of a nice function is just the order of the zero or
pole at the center of the circle. In other words, say we have an expansion
\(f(z) = a_k z^k + \cdots\) (where \(k\) is the index of the first non-zero term;
thus \(a_k\) is not zero, and if \(k > 0\) we say the function has a zero of
order \(k\) at the origin, while if \(k < 0\) we say the function has a pole of
order \(-k\)). The Residue Theorem then gives \((1 / 2 \pi i) \int_{|z| = r}
f'(z)/f(z) dz = k\). Note that if the function doesn't have a zero or pole at
the origin then this integral is zero (for r sufficiently small). More
generally, if \(g(z)\) is a nice function \((1 / 2 \pi i) \int_{|z| = r}g(z)
f'(z)/f(z) dz = k g(0)\). We will use a further generalization of this to
relate the zeros of the Riemann zeta function to counting the number of primes
at most x. For more details on the complex analysis we are using, see Cauchy-Riemann
equations, Cauchy-Goursat
Theorem, Residue
Theorem, Green's
Theorem. The key takeaways from today's class are: (1) we can convert
certain types of integrals to finding the \(a_{-1}\) coefficient in a Taylor
expansion (and this is good as algebra is easier
than integration); (2) integrating the logarithmic derivative is useful as the
answer is related to the zeros and poles of the function.
To really drive the point home: the reason this is such a spectacular formula
is that it reduces integration (hard) to finding ONE Taylor coefficient (ie,
algebra, ie easy).
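- A small numerical sanity check of \((1 / 2\pi i) \int_{|z|=r} g(z) f'(z)/f(z)\, dz = k\, g(0)\) in Mathematica, with the illustrative choices \(f(z) = z^3\) (so \(k = 3\)) and \(g(z) = \cos z\):
(* parametrize the unit circle z = Exp[I t]; the answer should be 3 Cos[0] = 3 *)
f[z_] := z^3; g[z_] := Cos[z];
(1/(2 Pi I)) NIntegrate[g[Exp[I t]] f'[Exp[I t]]/f[Exp[I t]] I Exp[I t], {t, 0, 2 Pi}]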
- Green's theorem in a day:
https://www.youtube.com/watch?v=Iq-Og1GAtOQ
-
Video online here:
http://youtu.be/zCbm7hZUY9Q
Friday, October 17. Our Fourier analysis paid big dividends today
with the application of Poisson Summation to get the functional equation.
-
There are many proofs of the functional
equation of
the Riemann
zeta function;
the proof we gave is `secretly' relating the Riemann zeta function to the Mellin
transform (which
is basically the Fourier
transform after
a change of variables) of the theta
function.
A crucial input was the Gamma
function,
which arises throughout mathematics, statistics, science, .... Functional
equations are extremely important, as they allow us to extend useful functions,
initially defined only in one region, to larger regions. The functional equations
of the Riemann zeta function, the Gamma function and the geometric series are just
a few instances. It is worth pondering what allows
us to find a functional equation. For the Gamma function, it was integrating
by parts in
the integral defining the Gamma function; for the theta function, it was Poisson
summation.
Finally, it is worth noting that we have seen yet again examples of how
problems can be converted to integrals. In this case, the Riemann zeta
function initially was only defined for Re(s) > 1; however, we then rewrote it
as an integral from \(x = 0\) to \(\infty\)
involving the omega function (which also made sense only for Re(s) > 1),
but then we rewrote that as two integrals from \(x = 1\) to \(\infty\)
involving the omega function, and these integrals exist for all \(s\). We are
fortunate in finding an integral expression which we can work with. It should
hopefully seem `natural' (at least in hindsight) in passing from the omega
function to the theta function (omega is a sum over \(n > 0\), theta is a sum
over all \(n\) and thus there is a chance Poisson summation could be useful).
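-
A quick numerical check of the functional equation of the completed zeta function in Mathematica (the point \(s\) below is an arbitrary illustrative choice; Mathematica's Zeta is already the analytically continued function):
xi[s_] := Pi^(-s/2) Gamma[s/2] Zeta[s];
With[{s = 2.3 + 1.7 I}, xi[s] - xi[1 - s]]  (* essentially 0, since xi(s) = xi(1 - s) *)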
-
Dirichlet eta function:
http://en.wikipedia.org/wiki/Dirichlet_eta_function
-
Riemann's original paper:
http://www.claymath.org/sites/default/files/ezeta.pdf ((READ THE
CLASSICS -- READ THIS!!))
-
Video online here:
http://youtu.be/z8WstIYV3Xc
Wednesday, October 15. The Riemann zeta function is one of the most
important functions in number theory; we finally got to it!
-
We finished our analysis of splitting integrals. This is a very important
technique to master, which is why I was willing to spend more time on it
today. You want to get used to having free parameters to choose and optimize
later.
-
Infinite product:
http://en.wikipedia.org/wiki/Infinite_product
-
Infinite product that converges to a non-zero number, all terms rational,
product rational: \(\prod_{n=2}^\infty \frac{n^2}{n^2-1} = 2\). Unlike the
diverging \(\prod_{n=2}^\infty \frac{n}{n-1}\), here each term differs from \(1\)
by roughly \(1/n^2\) instead of \(1/n\), which is why the product converges.
-
Evaluating zeta(2):
http://www.uam.es/personal_pdi/ciencias/cillerue/Curso/zeta2.pdf
-
Even better: zeta(2n):
http://www.uam.es/personal_pdi/ciencias/cillerue/Curso/zeta2.pdf
-
We looked at special values of the Riemann zeta function to get proofs of the
infinitude of the primes.
-
From Proofs from the Book: Six proofs of the infinitude of the primes:
http://www.cwu.edu/~glasbys/INFINITY.PDF
-
Many of the Riemann zeta function's properties are related to viewing it as a
function of a complex variable \(s\). As such, it is not surprising that we
need some results from Complex Analysis for our studies. The main result we
are heading towards is the Cauchy Residue Theorem.
The most important fact is that if \(f(z) = \sum_{n = -N}^{\infty} a_n z^n\),
then \((1 / 2\pi i) \int_{|z| = r} f(z)\, dz = a_{-1}\); a quick numerical check is
below. The reason this is such a spectacular formula is that it reduces
integration (hard) to finding ONE Taylor coefficient (ie, algebra, ie easy).
Finally, below are the three arxiv posts related to topics we've just studied,
are about to study, or could have studied had the class voted differently (note:
the arxiv is a wonderful site, but nothing on it is refereed; many professional
mathematicians check the arxiv every day and skim the titles and abstracts of all
posts; many more do this for their speciality).
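-
A minimal Mathematica check of the \(a_{-1}\) fact; the sample function below is an illustrative choice, not one from class:
f[z_] := 7/z^2 + 5/z + 3 + 2 z;  (* Laurent expansion has a_{-1} = 5 *)
(1/(2 Pi I)) NIntegrate[f[Exp[I t]] I Exp[I t], {t, 0, 2 Pi}]  (* numerically 5 *)
SeriesCoefficient[f[z], {z, 0, -1}]                            (* exactly 5 *)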
-
Video online here.
http://youtu.be/XhVHeawbLPc
Friday, October 10. Breaking up sums and integrals is extremely
important in analytic number theory -- you want to learn how to balance
getting the best possible results and getting accessible algebra. Often we
don't need optimal bounds, and can make do with less.
Wednesday, October 8. Today's class highlights the differences
between books and lectures. In a book you have to set down the material; in a
lecture you can change the path. Several of you had made some good comments in
class or in emails about the size of the divisor function, and I felt it might
be illuminating to think about the problem. We talked at great length on the
divisor function (click
here for the Wikipedia page). It's nice to be able to take the time and
look deeply at some arguments; I want to give you a flavor for the subject so
you can get a sense of whether or not this area is for you.
- We first obtained a non-trivial exponent savings of \(d(n) \le 2
n^{1/2}\); we did this by noting that if \(n = xy\) then at least one factor
is at most \(\sqrt{n}\). We ran into problems when we tried to extend this
further. We tried breaking into cases, and saw that if there was a large prime
factor then we had a savings. For example, if one of the prime factors \(p\)
is at least \(n^{1/4}\) then \(n/p \le n^{3/4}\), and by our earlier work \(d(n/p)
\le 2 (n/p)^{1/2} \le 2 n^{3/8}\). To get all the divisors of \(n\), we look
at each divisor of \(n/p\) and we can either multiply by \(p\) or not (if
\(p^2|n\) this will double count some divisors, but that's fine as we're just
shooting for an upper bound). Thus \(d(n) \le 2 d(n/p) \le 4 n^{3/8}\), and we
have saved a power in the exponent. Saving powers in the exponents is a
huge part of analytic number theory, and one of the reasons I wanted
to spend so much time on this.
- What can we do next? The argument we did showed that if there is a large
prime factor then the divisor function is probably smaller. It suggests that
the more small prime factors, the larger it will be. A little work shows that
if \(n = p_1^{r_1} \cdots p_k^{r_k}\) then \(d(n) = (1+r_1) \cdots (1+r_k)\).
Our arguments suggest that the worst case is when we have as many prime factors as
possible, as it's better to have more terms than one term higher (adding a new
prime with exponent 1 doubles the product, whereas increasing an existing \(r_i\)
by 1 multiplies the product by a factor of at most 3/2).
- This suggests we look at primorials, which are factorials with the terms
restricted to primes. So \(5\# = 5 \cdot 3 \cdot 2\), for example. We did a
lot of work to figure out the right way to look at things. If we assume \(n\)
is a primorial, say \(n = p_m\#\) for some \(m\), we need to find the index
\(m\). This led us to looking at approximating solutions of transcendental
equations!
- For example, how big is \(p_n\)? Well, if \(\pi(x) = \#\{p \le x: p \text{ prime}\}\),
the Prime Number Theorem says that \(\pi(x) \sim x/\log x\). So
\(n = \pi(p_n) \sim p_n/\log p_n\). We try to solve this. Our first naive
(ridiculous) guess is that \(p_n = n\); substituting this gives \(n = n/\log n\),
which doesn't work. We see our guess was too low and needs to be increased by a
factor of approximately \(\log n\), so we try \(p_n = n \log n\). Substituting this
into \(n \sim p_n/\log p_n\) gives \(n \sim n \log n / (\log n + \log\log n) =
n - n \log \log n / (\log n + \log\log n)\), which is approximately
correct as \(\log \log n / \log n \to 0\).
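- A quick numerical look at this approximation, using Mathematica's built-in Prime (the cutoffs are illustrative; the ratio approaches 1 very slowly):
Table[{n, Prime[n], N[n Log[n]], N[Prime[n]/(n Log[n])]}, {n, {10^3, 10^5, 10^7}}]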
- We used this kind of analysis to figure out what the index \(m\) should be
so that \(p_m\# = n\). Whenever we see a product we want to
take logarithms! We have \(\sum_{p \le p_m} \log p = \log(p_m\#)\); by the
Prime Number Theorem \(\sum_{p \le x} \log p \sim x\), and so we get \(\log(p_m\#)
\sim p_m\) so \(n \sim p_m\# \sim e^{p_m} \sim e^{m \log m} = m^m\). Now of
course how big is \(m\), given that \(n \sim m^m\)? This is another
transcendental equation to solve, and we get \(\log n \sim m \log m\). We try
to solve as before. If we try \(m = \log n\) we get \(\log n \sim \log n
\log\log n\), which is too high. So we correct and try \(m = \log n / \log
\log n\), and find \(\log n \sim \frac{\log n}{\log \log n} (\log\log n -
\log\log\log n) = \log n - \log n \cdot \frac{\log\log\log n}{\log\log n}\),
which to first order is correct.
- These arguments get quite involved; often mathematicians write \(\log_3
n\) for \(\log\log\log n\), as base 3 logarithms essentially never arise and so
there is no danger of confusion (I'd vote for \(\ln_3\), as that clearly isn't
base 3, but I'm outvoted).
- Now that we have these calculations telling us that if \(p_m\# = n\) then
\(m \sim \log n/\log \log n\), we know \(d(n) \sim 2^{\log n / \log\log
n} = e^{\log 2 \log n / \log\log n} = n^{\log 2/\log\log n}\). This shows us
that (assuming such primorials really do give the largest values of the
divisor function) for any positive \(\epsilon\) we have \(d(n) \le n^\epsilon\)
once \(n\) is large enough; a quick numerical check is below.
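- A quick check of this heuristic in Mathematica (the range of \(m\) is an illustrative choice):
(* for a primorial n = p_m#, the divisor count is 2^m; compare with n^(Log 2 / Log Log n) *)
primorial[m_] := Times @@ Prime[Range[m]];
Table[With[{n = primorial[m]}, {m, DivisorSigma[0, n], 2^m, N[n^(Log[2]/Log[Log[n]])]}], {m, 2, 8}]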
- Video online here:
http://youtu.be/C4gw6cYxmEo
Monday, October 6. There are many important arithmetical functions,
and lots of important properties of them.
Euclid's
argument actually
gives a lower bound, namely on the order of
\(\log \log x\) primes at most \(x\) (the true answer is that there are about
\(x / \log x\) primes at most \(x\)). As
a nice exercise (for fun), prove that Euclid's argument gives a bound of this
size.
This leads to an interesting sequence: 2,
3, 7, 43, 13, 53, 5, 6221671, 38709183810571, 139, 2801, 11, 17, 5471,
52662739, 23003, 30693651606209, 37, 1741, 1313797957, 887, 71, 7127, 109, 23,
97, 159227, 643679794963466223081509857, 103, 1079990819, 9539, 3143065813,
29, 3847, 89, 19, 577, 223, 139703, 457, 9649, 61, 4357....
This sequence is generated as follows. Let a_1 = 2, the first prime. We apply
Euclid's argument and consider 2+1; this is the prime 3 so we set \(a_2 = 3\).
We apply Euclid's argument and now have \(2\cdot 3+1 = 7\), which is prime,
and set \(a_3 = 7\). We apply Euclid's argument again and have \(2\cdot 3\cdot
7+1 = 43\), which is prime and set \(a_4 = 43\). Now things get interesting:
we apply Euclid's argument and obtain \(2\cdot 3\cdot 7 \cdot 43 + 1 = 1807 =
13\cdot 139\), and set \(a_5 = 13\). Thus \(a_n\) is the smallest prime factor of
the number produced by Euclid's argument at the \(n\)th stage (one more than the
product of the terms so far); this is the Euclid-Mullin sequence. There are a
plethora of (I believe) unknown questions about this sequence, the biggest of
course being whether or not it contains every prime. This is a great sequence
to think about, but it is a computational nightmare to enumerate! I downloaded
these terms from the Online Encyclopedia of Integer Sequences (homepage is http://oeis.org/ and
the page for our sequence is http://oeis.org/A000945 ).
You can enter the first few terms of an integer sequence, and it will list
whatever sequences it knows that start this way, provide history, generating
functions, connections to parts of mathematics, .... This is a GREAT website
to know if you want to continue in mathematics. There have been several times
I've computed the first few terms of a problem and looked up what the later
terms should be, and thus had a candidate formula with which to start an induction.
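Here is a short Mathematica sketch generating the first few terms of the Euclid-Mullin sequence (later terms quickly require factoring enormous numbers, which is why so little is known):
(* each new term is the smallest prime factor of (product of the terms so far) + 1 *)
terms = {2};
Do[AppendTo[terms, FactorInteger[Times @@ terms + 1][[1, 1]]], {7}];
terms  (* {2, 3, 7, 43, 13, 53, 5, 6221671} *)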
One of the
great joys of teaching at Williams is how intellectually curious you are; in
another class one of you has already contacted me about the Euclid-Mullin
sequence we discussed, and I've put some links to more in the additional
comments below. We're probably the only class in the world to talk about this
in a class on operations research, but I think it's a great use of time for
many reasons, ranging from seeing how fast algorithms run, to how well they
run (is anything missed), to the difficulties of finding them (hey, this class
is all about finding solutions -- how do we find the terms in this sequence?).
For more on these primes, see here:
There are
other proofs of the infinitude of primes. In
a real analysis course, one develops the notation and machinery to put
calculus on a rigorous footing. In fact, several
prominent people criticized the foundations of calculus, such as Bishop
Berkeley; his famous attack, The
Analyst, is available here. It wasn't until decades later that good
notions of limit, integral and derivative were developed. Most people are
content to stop here; however, see also Abraham
Robinson's work in Non-standard
Analysis. One of my favorite applications of open
and closed sets is Furstenberg's
proof of the infinitude of primes; one night while a postdoc at Ohio
State I had drinks with Hillel
Furstenberg and one of his
students, Vitaly
Bergelson. This is considered by many to be one of the best proofs of the
infinitude of primes; it is so good it is one of six proofs given in THE
Book. Unlike most proofs of the infinitude of primes, this gives no bounds
on how many primes there are at most x.
Video online
here:
http://youtu.be/EJ4Ijxwfi5Q
Friday, October 3. Mountain Day, no class
Wednesday, October 1. We studied Poissonian behavior of uniformly
distributed random variables; an area of active research is what happens for
special sequences.
-
We studied the distribution of nearest neighbor spacings between independent,
identically distributed random variables taken
from the uniform distribution. We see similar behavior when we look at the
spacings between adjacent primes or the ordered \(n^k \alpha\) mod \(1\) for
\(k\) at least two. In neither case
do we have a proof; in fact, for \(n^k \alpha\) the behavior depends greatly
on the irrationality exponent of \(\alpha\). For more details, see the
textbook and the references therein. Our proof used several results from
previous classes, including the Fundamental
Theorem of Calculus
to
find the probability and then the definition of the derivative of exp(x).
-
We also discussed the natural scale to study problems (ie, looking at the
average spacing between events, where the events here are the ordered values
of our random variables). This is one reason the twin prime problem is so
difficult, as a gap of 2 is minuscule relative to the average spacing between primes;
calculating Brun's
constant (the
sum of the reciprocals of twin primes) led Nicely to discover the Pentium
bug;
a nice description of the discovery of the bug is given at http://www.trnicely.net/pentbug/pentbug.html.
-
Integration in general is hard, and frequently we need to resort to numerical
methods such as Monte-Carlo
integration (see http://www.fas.org/sgp/othergov/doe/lanl/pubs/00326866.pdf for
a note about the beginnings of the method). Choosing random sequences has nice
applications in such subjects.
-
The paper introducing Monte Carlo integration
has been hailed by some as one of the most influential (if not the most
influential) papers of the 20th century. We only touch on the briefest part of
the theory here. It can be combined with the Central
Limit Theorem or Chebyshev's
Theorem to give really good
results on numerically evaluating integrals. Specifically, if \(N\) is large
and we choose \(N\) points uniformly, we can simultaneously assert that with
extremely high probability (at least \(1 - N^{-1/2}\)) the error is
extremely small (at most \(N^{-1/4}\)). If you want to know more, please see
me -- there are a variety of applications from statistics to mathematics to
economics to .... Below are links to two papers on the subject to give you a
little more info:
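- A minimal Monte Carlo sketch in Mathematica (the integrand and the sample size are illustrative choices, not a prescription):
(* estimate Integrate[Exp[-x^2], {x, 0, 1}] as the average of the integrand at uniform random points *)
mcEstimate[f_, n_] := Mean[f /@ RandomReal[{0, 1}, n]];
{mcEstimate[Exp[-#^2] &, 10^5], NIntegrate[Exp[-x^2], {x, 0, 1}]}  (* MC estimate vs. the true value, about 0.7468 *)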
- Mathematica code from class:
- poissontest[alpha_, k_, num_] := Module[{list, diff},
  (* compute n^k alpha mod 1 to high precision, so round-off does not contaminate the spacings *)
  list = Sort[Table[Mod[N[n^k alpha, 100], 1], {n, 1, num}]];
  (* spacings between adjacent ordered values *)
  diff = Differences[list];
  Print[Histogram[diff]];
  ];
poissontest[Sqrt[Pi], 1, 2014]
poissontest[Sqrt[Pi], 2, 2014]  (* compare: the spacings for n^2 alpha mod 1 look Poissonian *)
-
Video online here:
http://youtu.be/I_10ADutXD8. Video issues: unfortunately the first 17
minutes are missing audio; no idea why, or why the audio jumps back in when it
does. Briefly, what we did was look at Mathematica code (available in the
additional comments) that showed that if we took \(n \alpha\) mod 1 there were
only 2 or 3 possible differences, but \(n^2 \alpha\) mod 1 had what looked like a
continuum. Thus, while \(n \alpha\) mod 1 is equidistributed, it does have
different behavior. We talked about applications to Monte Carlo integration, and
the advantage of not necessarily taking points completely at random but rather
using some structure.
Monday, September 29. There's a lot that can be done with
irrationals. Today we saw how to use Fejer's theorem to obtain a proof of
Weyl's equidistribution theorem.
-
The proof of the equidistribution of \(n \alpha\) mod \(1\) today uses a very
common analysis technique. To prove a result for a step function (like the
characteristic function of the interval \([a,b]\)), it suffices to prove the
result for a continuous function, as we can find a continuous function that is
arbitrarily close. Then, to prove the result for continuous functions we
instead prove the result for a nice, finite Fourier series, as we can find
such a series that is arbitrarily close to our continuous function. Such
arguments are used all the time in Measure Theory. The crux of the argument is
that we have a finite sum of sines and cosines (the \(\exp(2 \pi i m x)\)),
and that these can be divided into two parts. The first is the constant term
(\(m=0\)), which gives \(b-a\) plus a small error; the remaining terms are 'small' in
terms of \(N\). How small is a VERY deep question, and involves the
irrationality exponent of \(\alpha\) (ie, how well we may approximate \(\alpha\) by
rationals). The big result along these lines is the Erdos-Turan theorem. For
applications, it is often important to have a sense of how rapidly one has
convergence in equidistribution results; one of the common techniques involves
using the Erdos-Turan theorem (the web resources aren't great; I have a copy of a
good book that shows how the irrationality exponent is connected to quantifying
the rate of convergence to equidistribution). Finally, it is worth going over the
argument and keeping track of what was given and what we chose. We are given
an \(\epsilon > 0\); this leads to a \(j\) (for how well the continuous
functions approximate the step function) and \(M\) (the number of terms in our
finite Fourier sum); we then send \(N\) to infinity. A quick numerical
illustration of the equidistribution is below.
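-
A quick numerical illustration in Mathematica (the choices of \(\alpha\), the interval and the cutoff are illustrative):
(* the proportion of n*alpha mod 1 landing in [a, b] should approach b - a *)
With[{alpha = N[Sqrt[2]], a = 0.2, b = 0.5, nmax = 10^5},
  {N[Count[Mod[alpha Range[nmax], 1], x_ /; a <= x <= b]/nmax], b - a}]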
-
We spent a lot of time talking about the difference between sharp cutoff
functions and smooth cutoff functions. For many problems, it is preferable to
use smooth cutoff functions (though in the real world we often care about sharp
cutoffs). We talked about the differences between refining a partition and taking a
new partition, and a suggestion from class gave an example where the lower sum
approximation to the area can decrease as we go from a partition with \(n\) pieces
to one with \(n+1\) pieces.
-
We discussed at length how easy it is to accidentally assume something. For
example, while it is reasonable to expect that the more terms we take
the more accurate the Fejer series approximation is to f, we never proved that
and it might be false. Consider for example the sequence 1/2, 1/2, 1, 1/3,
1/3, 1/3, 1/2, 1/4, 1/4, 1/4, 1/4, 1/3, 1/5, 1/5, 1/5, 1/5, 1/5, 1/4, 1/6,
.... The sequence converges to zero, but not monotonically.
-
Video online here:
http://youtu.be/BFVuimP8ZLE
Friday,
September 26. There are many applications of Fourier analysis; we
saw how to use Fejer's theorem to obtain a proof of Weyl's equidistribution theorem.
-
In class we talked about denseness
of certain sequences. Other fun ones are \(\sin(n)\) and \(\cos(n)\) -- are
these dense in the interval [-1, 1]? Equidistributed? What can you say about
these? (I believe one is somewhat elementary, one is more advanced. Email me
for a hint on what math results might be useful.) We also looked in the book
at how knowledge of the irrationality type of \(\alpha\) can be used to see
\(n^2 \alpha\) mod \(1\) is dense. We assumed \(\alpha\) had irrationality
exponent of \(4 + \eta\) for some \(\eta > 1\) -- can the argument work for a
smaller exponent? What if we studied \(n^k \alpha\) mod \(1\) -- what would we
need to assume about the irrationality exponent? Can you somewhat elementarily
prove the denseness of \(n^2 \alpha\) if the irrationality exponent is less
than 3? I say somewhat elementarily as we will later show the sequence is
equidistributed, and thus it must be dense. Can you come up with a more
elementary proof, where you just get denseness? Finally, for those who know
(or are interested in) measure theory, one natural question to ask is how
severe is the restriction to studying irrational \(\alpha\) with exponent \(4
+ \eta\)? If you're familiar with Cantor's diagonalization argument (Theorem
5.3.24), you know almost all numbers are transcendental (and thus irrational);
however, this does not mean they have an irrationality exponent as large as
\(4+\eta\) (for example, \(\ln(2)\) is irrational but has exponent less than
\(4\)). A good exercise is to modify the proof of Theorem A.5.1 to show that
almost no irrationals (in the sense of measure) have irrationality exponent as
large as \(4 + \eta\).
- Proving \(\sqrt{2}\) is irrational:
- Introduction to continued fractions: Wikipedia:
http://en.wikipedia.org/wiki/Continued_fraction (see also notes by van der
Poorten:
http://maths.mq.edu.au/~alf/www-centre/alfpapers/a094.pdf )
- Hurwitz' Theorem and the most irrational of numbers:
http://en.wikipedia.org/wiki/Hurwitz's_theorem_(number_theory)
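- A small Mathematica illustration of how well continued fraction convergents approximate, against the Hurwitz bound (the choice of \(\sqrt{2}\) and the number of convergents are illustrative):
(* compare |Sqrt[2] - p/q| with 1/(Sqrt[5] q^2) for successive convergents p/q *)
Table[{c, N[Abs[Sqrt[2] - c]], N[1/(Sqrt[5] Denominator[c]^2)]}, {c, Convergents[Sqrt[2], 6]}]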
- Video online here:
http://youtu.be/tg47OJkNkcQ
Wednesday,
September 24. We spent most of the day estimating integrals. The
main idea is figuring out where things are large and small, and getting a
sense of when an approximation is harmful or not.
-
A great example of this is counting primes. First, putting in log weights is
harmless and can be removed easily, but it sets us up to use complex analysis.
More importantly, though (and as described in Chapter 3 of our text), it
is much easier to study prime powers than just primes, and then to remove the
contribution of the higher prime powers afterwards. The idea is that certain
'completed' sets are more natural to study, and it is often worth sieving via
inclusion-exclusion, or carrying along the extra terms, rather than working
directly with the quantity of interest; a quick numerical illustration is below.
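-
A quick numerical look at the completed count over prime powers in Mathematica (the cutoffs are illustrative): the Chebyshev function \(\psi(x) = \sum_{p^k \le x} \log p\) is asymptotic to \(x\).
(* MangoldtLambda[n] is Log[p] if n is a prime power p^k and 0 otherwise *)
psi[x_] := Sum[N[MangoldtLambda[n]], {n, 1, Floor[x]}];
Table[{x, psi[x]/x}, {x, {10^3, 10^4, 10^5}}]  (* the ratios creep towards 1 *)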
-
Speaking of primes,
one does not need complex analysis
and there is an elementary
proof of the Prime Number Theorem,
which has sadly led to one of the biggest priority disputes and controversies
in mathematics (see
here for more on it);
another famous controversy is the Newton-Leibniz dispute over the calculus.
-
Video online here:
http://youtu.be/AA_GHlM6sU4
Monday, September 22. We earned dividends
from all our Fourier analysis work (in particular, Poisson Summation), in our
Benford analysis.
Friday, September 19. Today is one of the
biggest applications of Fourier analysis, Poisson Summation!
-
Poisson Summation is
one of the standard tools of analytic number theory, allowing us to frequently
convert long, slowly decaying sums to short, rapidly decaying sums, so that
just a few terms suffice to get a good estimate. One nice application is to
counting the number of lattice points with integer coordinates inside a circle
(also called the Gauss
circle problem).
If you consider points with integer coordinates, you would expect
approximately \(\pi R^2\) such points to be in a circle of radius \(R\); what
is the error? A little inspection shows that the error shouldn't be much worse
than the perimeter, so the answer might be \(\pi R^2\) with an error of at
most something like \(2 \pi R\) (Gauss proved an error of at most \(2 \sqrt{2}
\pi R\)). The current record is due to Huxley, who shows
that the error is at most \(C R^\theta\) with \(\theta \le 0.6298\ldots\); a quick
lattice point count is below.
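- A minimal Mathematica count of lattice points in a circle (the radii are illustrative choices):
(* number of integer points (x, y) with x^2 + y^2 <= R^2, compared with Pi R^2 *)
latticeCount[R_] := Sum[2 Floor[Sqrt[R^2 - x^2]] + 1, {x, -R, R}];
Table[{R, latticeCount[R], Round[N[Pi R^2]], latticeCount[R] - Round[N[Pi R^2]]}, {R, {10, 100, 1000}}]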
-
We mentioned the Fourier
Transform and
interesting functions that satisfy a lot of nice conditions but not every
property we'd like. See for example the function \(f\) on page 270 (or better
yet modify this to be infinitely differentiable and it and its first five
powers are integrable). There are many applications, one of the most important
being a proof of the Central Limit Theorem.
When we get to Benford's law we'll need to know what the
Central Limit Theorem modulo 1 looks like. I
prove this in detail in this paper.
- There are other
generalizations of the central limit theorem. One particularly nice version
involves Haar
measure. Consider the set of \(N \times N\) unitary
matrices \(U(N)\), or its
subgroups the
orthogonal matrices and the symplectic
matrices. It turns out there is a way to define a probability measure on
these spaces (this is the Haar measure), and there are generalizations of
the central limit theorem in these contexts: The n-fold convolution of a
regular probability measure on a compact Hausdorff group \(G\) converges to
normalized Haar measure in the weak-star topology if and only if the support of
the distribution is not contained in a coset of a proper normal closed subgroup
of \(G\).
-
The Central Limit Theorem has
a rich history and numerous applications. What makes it so powerful and
applicable is that the assumptions are fairly weak, essentially finite mean,
finite variance, and something about the higher moments. The natural question
is what exactly do we mean by convergence? There are several different
notions.
- A classic result about how rapidly we have
convergence to the standard normal is the Berry-Esseen
Theorem. As many distributions have zero third moment, the fourth moment
frequently controls the speed. This is why instead of looking at the kurtosis (the
fourth moment) we often look at the excess kurtosis, the kurtosis of our
random variable minus the kurtosis of the standard normal; it is this difference
that frequently controls the speed of convergence.
Taylor series played
a key role in our proofs; the idea is that we can locally replace a
complicated function by a simpler function, so long as we can control the
error estimates.
- We summify our expression by using the identity \(P = \exp(\log P)\); this is
very useful whenever \(P\) is a product, as logarithms convert products to sums.
This is a great way to do nothing! We saw how well this worked to understand
quantities such as \(P = \lim_{N \to \infty} (1 + x / N^2)^N\).
We took the logarithm, so \(\log P_N = N \log(1 + x / N^2)\); we
then Taylor expanded the logarithm and
found \(\log P_N = x/N\) plus lower order terms (of size \(1/N^3\) and smaller).
Exponentiating gives us \(P_N = \exp(x/N)\) times the exponential of those lower
order terms, and we thus obtain information on the speed of convergence.
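- A tiny numerical check of this expansion (the value of \(x\) is an illustrative choice):
(* (1 + x/N^2)^N vs Exp[x/N]: both tend to 1, and they agree to the stated order *)
With[{x = 2.}, Table[{n, (1 + x/n^2)^n, Exp[x/n]}, {n, {10, 100, 1000}}]]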
- One can prove the
CLT directly in the case of Bin(N, 1/2). As a binomial
random variable is the sum of Bernoulli
random variables, we see that Bin(N,1/2) should become normally
distributed as N tends to infinity. This can be proved directly, and uses Stirling's
formula to estimate the binomial
coefficients.
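- A quick Mathematica comparison of the two (the values of \(N\) and \(k\) are illustrative choices):
(* exact Bin(N, 1/2) probability near the mean vs. the normal density with mean N/2 and standard deviation Sqrt[N]/2 *)
With[{n = 1000, k = 520},
  {N[PDF[BinomialDistribution[n, 1/2], k]], PDF[NormalDistribution[n/2, Sqrt[n]/2.], k]}]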
Video online here:
http://youtu.be/v9eoWGQkoeM
Wednesday, September 17.
Finally, some serious Fourier analysis!
-
Monday, September 15.
After building up some basic results on convergence in analysis, we will be
able to tackle convergence of Fourier series on Wednesday. This is a vast
topic, and cannot be done justice in just a day; thus we have to content
ourselves with highlighting some of the important items.
-
Wednesday, September 10. We continued our
exploration of interchanging operations (derivatives and sums), discussed the
exponential function and various \(L^p\) spaces.
-
Monday, September 8. Rather than covering
the standard definitions, which you can read in the book, we instead
concentrated on some of the advanced analysis concepts underlying operations
with infinities, especially interchanging operations. We'll continue our
conversation on these later.
-
Mathematics StackExchange is a good place to look for answers:
http://math.stackexchange.com/questions/147869/interchanging-the-order-of-differentiation-and-summation
and
http://math.stackexchange.com/questions/352150/differentiating-an-infinite-sum
-
Interchanging derivative and integral:
http://planetmath.org/differentiationundertheintegralsign
-
We need to spend a lot of time worrying about technical issues; this is par
for the course as you continue in analysis. There are a lot of statements
which appear reasonable, but turn out to be false. This is why we looked at
the function \(g(x) = \exp(-1/x^2)\) for \(x \neq 0\) and \(0\) otherwise;
this showed that a Taylor series need not converge to the original function
anywhere beyond the point of expansion (where trivially it must agree). There's
even more bad news -- this shows us that a Taylor series need not determine the
function, as \(x - x^3/3! + x^5/5! - \cdots\) could be the Taylor series of
\(\sin x\) or of \(g(x) + \sin x\).
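- A quick Mathematica check that every Taylor coefficient of \(g\) at the origin vanishes, by computing the limits of the first few derivatives (a sketch; higher orders work the same way):
Table[Limit[D[Exp[-1/x^2], {x, n}], x -> 0], {n, 0, 4}]  (* {0, 0, 0, 0, 0} *)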
-
Another big theme of the day was asking questions. There's a standard list
that work in many situations: Does it exist? Where does it exist? Is it
unique? How quickly does it converge? What about higher dimensional analogues?
How do I compute it?
-
One question we didn't consider today was how Taylor series behave under
combinations. What is the Taylor series of a sum? Of a product? Of a
composition? Are there nice formulas relating the new object to the original
ones?
- We then discussed the geometric
series formula. The standard proof is nice; however, for our course the
`basketball' proof is very important, as it illustrates a key concept in
probability. Specifically, if we have a memoryless
game, then frequently after some number of moves it is as if the game
began again. This is how we were able to quickly calculate the probability
that the first shooter wins, as after both miss it is as if the game just
started.
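- A quick check of the `basketball' computation in Mathematica; the hit probabilities \(p\) and \(q\) below are illustrative choices:
(* P(first shooter wins) = sum over n of [(both miss) n times] * p = p/(1 - (1-p)(1-q)) *)
With[{p = 3/10, q = 2/5}, {Sum[((1 - p) (1 - q))^n p, {n, 0, Infinity}], p/(1 - (1 - p) (1 - q))}]  (* both give 15/29 *)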
- The geometric series formula only makes sense when \(|r| < 1\), in which case
\(1 + r + r^2 + \cdots = 1/(1-r)\); however, the right hand side makes sense for all
\(r\) other than \(1\). We say the function \(1/(1-r)\) is a (meromorphic)
continuation of
\(1+r+r^2+\cdots.\)
This means that they are equal when both are defined; however, \(1/(1-r)\) makes
sense for additional values of \(r\). Interpreting \(1+2+4+8+\cdots\) as \(-1\) or
\(1+2+3+4+5+ \cdots\) as \(-1/12\) actually DOES make sense, and arises in modern physics
and number theory (the latter is \(\zeta(-1)\), where \(\zeta(s)\) is the Riemann
zeta function)!
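- In Mathematica the zeta function is already the meromorphically continued one:
Zeta[-1]  (* -1/12, the value the continuation assigns to 1 + 2 + 3 + ... *)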
- Wikipedia page on limsup and liminf:
http://en.wikipedia.org/wiki/Limit_superior_and_limit_inferior (the
further down you read, the less useful it is for our purposes!).
- The
rearrangement theorem illustrates the dangers that can happen when we
deal with sums that are conditionally but not absolutely convergent.
-
Here is a link with more GRE information and practice exams:
http://www.wmich.edu/mathclub/gre.html
-
Here is a link to today's lecture:
http://youtu.be/MonfQXBshnI