Math 399 (Sabermetrics)
Instructor: Professor Steven J Miller (sjm1 AT williams.edu,
Bronfman 202, x3293)
♦
COURSE DESCRIPTION: Sabermetrics
deals with the application of mathematical and statistical reasoning to baseball
problems. The purpose of this class is to conduct sabermetrics research on a
variety of problems, ranging from items of interest to the participants to
problems suggested by a Major League Ballclub. Investigating these problems will
require numerous advanced lectures on topics such as Markov Chain Monte Carlo
(and the Perron-Frobenius Theorem), Linear Programming (especially binary
linear programming), probability theory and mathematical modeling. As there is
a plethora of valuable data on the web, part of the course will be to write
scripts to gather and analyze the data to test our models and theories.
Format:
lecture/discussion and presentations. Evaluation will be based primarily on
scholarship and discussions.
Prerequisites:
Multivariable calculus, linear algebra, Stat 201. Enrollment
limit: 5 or 6.
♦
GENERAL: Please
feel free to swing by my office or mention before, in or after class any
questions or concerns you have about the course (click
here for my schedule). If you have any suggestions for
improvements, ranging from method of presentation to choice of examples, just
let me know. If you would prefer to make these suggestions anonymously, you can
send email from
mathephs@gmail.com (the password
is the first seven Fibonacci
numbers, 11235813).
♦
OBJECTIVES: There are two main goals for this course: to explore problems in
sabermetrics and to learn some advanced mathematics and modeling techniques.
♦
SYLLABUS: The
following is a tentative syllabus, including some of the topics to be studied
but not necessarily the order in which they will be investigated. As this is an
independent studies course, there will be significant student input in the
choice of topics and the order in which they are covered. Though the class does
not begin to formally meet until Fall 2009, we have already had one meeting with
all participants, and plan on having several more so that we can begin working
over the summer.
- Markov chain Monte Carlo. This is the way to go for many
simulations in industry, and thus it is worth studying in detail. These
techniques are used all the time to simulate probability ranges for various
events. There are two things we'll do: one is to apply this to baseball /
sabermetrics questions and models, and one is to learn the general
theory. This leads to some really cool linear algebra (actually, many things
lead to cool linear algebra). In particular, we'll study the Perron-Frobenius
theorem on the dominance of the largest eigenvalue of a matrix with all
non-negative entries. Additionally we'll discuss Monte Carlo integration, a
powerful probabilistic way to approximate difficult multi-dimensional
integrals. Such a theory is extremely useful, as very few integrands have
closed-form antiderivatives.
- Mathematical modeling. We'll explore the basics of what makes a good model,
in particular the interplay between keeping the model mathematically tractable
and capturing the key features. We'll do a lot of this in modeling game /
player / team performance. We'll discuss in detail my paper
on the Pythagorean Won-Loss Theorem, as well as possible generalizations
incorporating ballpark effects, extra innings, interleague games and blowouts
(to name a few). In the course of doing so we'll review standard methods from
statistics (least squares, maximum likelihood), and talk about some
not-so-common difficulties (structural zeros in r x c contingency tables) and
the theory to handle them. Another model we'll explore is what is known as the
log-5 method (which is used to
predict the probability one team beats another solely from their winning
percentages).
- Linear programming. This is another nice application of advanced linear
algebra. Linear programming is a great way of solving or approximately solving
many optimization problems. It's used in designing schedules for MLB (minimize
travel time, have Sox-Yankees games at good times, have lots of division games
at the end of the season). It's also used to correctly compute elimination
numbers, taking into account who has games against whom (MLB does not correctly
calculate elimination numbers). We'll read the paper that implements linear
programming to very efficiently solve this problem (click here for more
information). We will use my notes on Linear Programming as a guide (these are
based on the excellent book by Joel Franklin on Mathematical Methods in
Economics).
- Sabermetrics topics / statistics. The holy grail in the subject is
to find a statistic that no one else is aware of which is a terrific predictor
of future outcomes; this would allow teams to take advantage of market
inefficiencies. We will attempt to find such statistics. To set the stage, we
will first look at some common statistics, such as batting average, on-base
percentage, slugging percentage, OPS (on-base plus slugging),
RC (runs created), et
cetera. Other items include trying to throw away meaningless statistics (are
some players just a Mr. Regular Season or Mr. No Pressure, putting up
impressive numbers when the game has already been decided?), optimal batting
orders, bullpen usage, et cetera. For many of these questions, it is not
enough to present a plausible mathematical model -- ballplayers, managers and
owners have intuition based on years of play, and there is resistance to
change (the classic example is that the sabermetrics community believes that
ace relievers are not used properly; however, anyone who watches a game will
note that many of these pitchers do perform differently when the game is on the
line).
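To make the first topic above a little more concrete, here is a minimal Python sketch of two of the ideas it mentions. It is only an illustration under stated assumptions, not course material: the 2 x 2 transition matrix is a made-up toy example (not baseball data), and the function names stationary_distribution and mc_integrate are hypothetical. The first function uses plain power iteration to find the long-run distribution of a Markov chain, which is the behavior the Perron-Frobenius theorem guarantees for a transition matrix with all positive entries. The second is plain Monte Carlo integration over the unit cube. Neither is a full Markov chain Monte Carlo sampler.

```python
import random


def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Power iteration: apply pi <- pi P until pi stops changing.

    P is a row-stochastic matrix given as a list of rows (an assumption
    of this sketch); convergence is assumed, e.g. all entries positive.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("power iteration did not converge")


def mc_integrate(f, dim, n_samples, rng):
    """Estimate the integral of f over [0,1]^dim by averaging f at
    uniformly random points (the cube has volume 1)."""
    total = 0.0
    for _ in range(n_samples):
        total += f([rng.random() for _ in range(dim)])
    return total / n_samples


if __name__ == "__main__":
    # Toy chain: state 0 stays put with probability 0.9, state 1 with 0.5.
    toy_P = [[0.9, 0.1], [0.5, 0.5]]
    print(stationary_distribution(toy_P))  # close to (5/6, 1/6)

    # The exact integral of x*y over the unit square is 1/4.
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print(mc_integrate(lambda x: x[0] * x[1], 2, 100_000, rng))
```

The starting distribution does not matter for this toy chain: any initial guess is pulled toward the same limit, which is the dominance of the largest eigenvalue at work. For the integral, the error of the estimate shrinks roughly like one over the square root of the number of samples, regardless of the dimension.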
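The two formulas named in the mathematical modeling topic above can be sketched in a few lines of Python. This is a hedged illustration, not the derivation from the paper: it uses the standard textbook forms, namely the Pythagorean estimate RS^gamma / (RS^gamma + RA^gamma) for a team's winning percentage from runs scored and runs allowed, and the log-5 estimate (a - ab) / (a + b - 2ab) for the probability that a team with winning percentage a beats a team with winning percentage b. The function names and the sample numbers are assumptions made for this example, and the default exponent gamma = 2 is the classical choice (other exponents are often fitted to data).

```python
def pythagorean_win_pct(runs_scored, runs_allowed, gamma=2.0):
    """Pythagorean estimate of winning percentage from run totals."""
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    if runs_scored == 0 and runs_allowed == 0:
        raise ValueError("need at least one run scored or allowed")
    rs = runs_scored ** gamma
    ra = runs_allowed ** gamma
    return rs / (rs + ra)


def log5(p_a, p_b):
    """Log-5 estimate of the probability that team A beats team B,
    given only their winning percentages p_a and p_b."""
    denom = p_a + p_b - 2.0 * p_a * p_b
    if denom == 0:
        # Happens only when both teams are at 0 or both are at 1.
        raise ValueError("log-5 is undefined for these percentages")
    return (p_a - p_a * p_b) / denom


if __name__ == "__main__":
    # Hypothetical team: 800 runs scored, 700 allowed.
    print(round(pythagorean_win_pct(800, 700), 3))  # 0.566
    # Hypothetical matchup: a .600 team against a .400 team.
    print(round(log5(0.6, 0.4), 3))  # 0.692
```

Two sanity checks are worth noting: against a .500 opponent the log-5 formula returns the team's own winning percentage, and two teams with equal winning percentages are given an even chance.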
♦
Useful Resources
♦
Baseball webpages
♦
Pictures of Baseball
Parks (please send your photos)
♦
Interesting math articles