WSJ Blogs

The Numbers Guy
Carl Bialik examines the way numbers are used, and abused.

Cracking the Code Math for a Veto Message

My print column this week analyzes the math behind a coded message in a veto by California Gov. Arnold Schwarzenegger. The governor said it was a “total coincidence” that the first letter of each line in his veto message spelled out a common obscenity. Mathematicians who crunched the numbers say the expression was highly unlikely to emerge by chance, though their estimates, as published in the press and on blogs, differ widely.

Steven Piantadosi, a graduate student in cognitive sciences at the Massachusetts Institute of Technology, arrived at a probability of slightly less than one in one trillion, as he noted on his blog. A 1960s-era compilation of American English texts known as the Brown corpus, analyzed by Linguistic Data Consortium researcher David Graff, yielded almost exactly the same calculation. Dave Thomas, a staff scientist at New Mexico Tech, used works from the Project Gutenberg database of copyright-free electronic books, and came up with one in 500 billion. Mathematician Edward Lewand of Goucher College told NPR he came up with about one in 180 billion. “I think it is very unlikely that this acrostic was just a coincidence,” Lewand said.

Schwarzenegger is neither Jane Austen nor William Shakespeare, two of the sources for the letter-frequency numbers behind Thomas’s calculation. It’s possible that he’s more prone to use words that begin with the letters that made up the coded message. For instance, “K” is the least likely letter in the message to start a word. But the “K” in the veto message comes from a statement that the state legislature “kicks the can down the alley.” Some blogs have suggested that this phrase was included merely to make the encoded veto message work. But the phrase, which means delaying tough decisions, has improbably become a standard one in California politics in the last year.
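For readers who want to try the arithmetic themselves, here is a minimal Python sketch of the kind of calculation described above: multiply together the chance that a word starts with each letter of the message. It is not the code any of the mathematicians used. The function names, the tiny sample text and the stand-in target "abcdefg" are all made up for illustration; a real estimate would need a large corpus, such as the ones named above.

```python
from collections import Counter
from fractions import Fraction
import re


def first_letter_frequencies(text):
    """Share of words in `text` that start with each letter (a-z only)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word[0] for word in words)
    total = sum(counts.values())
    return {letter: Fraction(n, total) for letter, n in counts.items()}


def acrostic_probability(target, freqs=None):
    """Chance that successive lines start with the letters of `target`.

    With freqs=None every letter gets an equal 1/26 weight. Otherwise
    each letter is weighted by its share in `freqs`; a letter that never
    starts a word in the corpus gets probability zero.
    """
    p = Fraction(1)
    for ch in target.lower():
        p *= Fraction(1, 26) if freqs is None else freqs.get(ch, Fraction(0))
    return p


# Equal weights: any specific seven-letter sequence is equally unlikely.
uniform = acrostic_probability("abcdefg")  # "abcdefg" is a stand-in target
print(f"equal weights: 1 in {uniform.denominator:,}")
# prints: equal weights: 1 in 8,031,810,176

# Weighted version, using a toy corpus that is far too small to trust.
toy_freqs = first_letter_frequencies("the cat sat on the mat")
print(acrostic_probability("tc", toy_freqs))  # (2/6) * (1/6) = 1/18
```

The equal-weights figure is a baseline only. Because some letters rarely start words, estimates weighted by real first-letter frequencies can differ from it by orders of magnitude, and they shift with the corpus chosen.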

Also, as Thomas and others note, these probabilities are for the message to appear in any given seven lines. Schwarzenegger has vetoed about 1,700 bills, and often attaches messages to his rejections. However, these brief missives don’t add up to enough text to explain the improbable profanity. Using the text of the letters from the California government’s Web site, Graff could find no message formed by the beginning of each line that was any edgier than “espy tit,” in a veto of a bill that would have designated 211 as a social-services phone number.
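The point about "any given seven lines" can be made concrete with a short sketch. Assuming each seven-line window is an independent try with the same small probability p, the chance of at least one hit in N tries is 1 - (1 - p)^N. The number of windows per veto message below is a made-up placeholder, not a figure from Graff or Thomas, and independence is itself a simplifying assumption.

```python
import math


def chance_at_least_once(p, trials):
    """Return 1 - (1 - p)**trials, computed stably for very small p."""
    return -math.expm1(trials * math.log1p(-p))


p = 1e-12                   # roughly the one-in-a-trillion estimate
vetoes = 1_700              # approximate number of vetoed bills
windows_per_message = 10    # hypothetical: seven-line windows per message

print(chance_at_least_once(p, vetoes * windows_per_message))

# Even a trillion tries would not make the message a sure thing:
print(chance_at_least_once(p, 10**12))  # about 0.632, i.e. 1 - 1/e
```

Under these assumptions, a few thousand short messages leave the overall chance tiny, which is consistent with the point that the governor's missives don't add up to enough text.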

If one broadens the criteria for such codes, they become easier to find. Brendan McKay, a computer scientist at Australian National University in Canberra, found the word “measures” in my brief email to him about the veto message by counting backwards four letters at a time from a certain spot. He also found “events,” by counting backwards five letters at a time from elsewhere in the message. “The chance of this happening accidentally (even allowing for all possible starting points and all possible skip amounts forwards or backwards) is about one in a billion,” McKay said. “But I don’t believe you did it on purpose.” That’s partly because he didn’t look specifically for those words; “I fed a whole dictionary [into his computer program] and then spun a story around two of the many words that appeared,” McKay said.
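McKay's program isn't shown here, but the kind of search he describes, looking for a word spelled out at a fixed skip, forwards or backwards, from any starting point, might be sketched as follows. The function names, the choice to ignore spaces and punctuation, and the sample text are assumptions for illustration only.

```python
import re


def letters_only(text):
    """Lowercase the text and drop everything except the letters a-z."""
    return re.sub(r"[^a-z]", "", text.lower())


def find_skip_words(text, word, max_skip=10):
    """Find `word` spelled out at a fixed skip, forwards or backwards.

    Returns (start, step) pairs, where start indexes into the
    letters-only text and a negative step means counting backwards.
    """
    s = letters_only(text)
    w = letters_only(word)
    hits = []
    if len(w) < 2:
        return hits  # one-letter "words" would match everywhere
    for skip in range(1, max_skip + 1):
        for step in (skip, -skip):
            span = step * (len(w) - 1)
            for start in range(len(s)):
                # The last letter must still fall inside the text.
                if not 0 <= start + span < len(s):
                    continue
                if all(s[start + i * step] == w[i] for i in range(len(w))):
                    hits.append((start, step))
    return hits


sample = "a-x b.x c!x"  # made-up text; its letters are "axbxcx"
print(find_skip_words(sample, "abc", max_skip=3))  # [(0, 2)]
print(find_skip_words(sample, "cba", max_skip=3))  # [(4, -2)]
```

Running a whole dictionary through a search like this multiplies the number of chances for a hit, which is why finding some word after the fact is much less surprising than finding one chosen in advance.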

Reflecting on the difficulties of calculating the probability that the veto message arose by chance, Williams College mathematician Edward Burger said, “That’s the annoying thing about coincidences. We should definitely expect them. But the annoying thing is we just don’t know the actual probability that one of these things will happen.”

Further reading: Read more about a one in 10 billion trillion long shot that cost an English journalist a job, a one in 300,000 trillion trillion long shot that riled some Pakistanis and other Numbers Guy writings on coincidences and unlikely events.

Thanks to Jason Fry for the idea.

Comments (5 of 6)

    • Mr. Bialik:

      I thoroughly enjoyed this article about Arnold Schwarzenegger’s veto to the state Legislature. But I must confess I’m a bit perplexed. The logic of the analyses seems off to me. I’m sure the mathematicians who conducted these analyses are smarter than I so perhaps you can help me understand this better.

      The analyses calculate the odds of those particular letters appearing as the first letter of a line and assume pure randomness or weight the letters based on their usage frequency in English. The argument, if I have understood correctly, is that since those odds are infinitesimally small, the letters must have been deliberately selected to send a message (i.e., the meaning of the acrostic).

      I’m just not sure this is the right way to think about it. For one thing, people do not write by randomly selecting letters from the alphabet or, even, by randomly selecting letters and then weighting their use based on the usage in various corpora. Rather, people have an idea to convey and they then write paragraphs, sentences and words to convey that idea. The smallest unit that people plan is typically the word. They do not write letter by letter. Once you have selected a word as the correct one to convey your idea, you are not free to decide what its first letter will be. The first letter is determined by the word. It is a given. Now you could argue that there are several words that could be used to convey the same idea and that Arnold picked the word that had the first letter he was looking for. OK. Maybe. But then your analysis of each line must constrain itself to the total set of words that could have been used to make the point Arnold was trying to make.

      There seems to be another problem. If the odds of those particular letters appearing as the first letter of the first lines in sequence are X, then the odds of those same letters appearing in the same sequence but as the second, third or 27th letters of the line would be the same. The odds would be the same if the scenario was: letter 1 appears as the first letter of line 1, letter 2 appears as the 2nd letter of line 2, letter 3 appears as the 3rd letter of line 3… In fact, any pattern at all involving a sequence of those same letters would have the same odds - roughly 1/26 to the 7th (possibly weighted by frequency in the corpus). Now if the logic of the argument is that the extremely small odds of those letters appearing as the first letters of the lines indicates that Arnold chose them deliberately to make a point, then one must also argue that the 2nd letter pattern, the 3rd letter pattern, the 27th letter pattern and all patterns involving 7 letters were deliberately planned by Arnold to communicate a point. Does that seem reasonable to you?

      You could argue that, well, of course he didn’t plan all those patterns. He couldn’t possibly do that but the first letter of a line is so obvious and so controllable. OK. But that is a common sense argument, not a statistical argument.

      I’m curious to see what you have to say. Very interesting!

      Adam

      Adam Schorr
      http://adam-1001words.blogspot.com/
      “It isn’t worth it if you have to leave your soul behind”

    • Dear Mr. Bialik:

      Re your “Coincidental Obscenity” essay:

      1. The odds (“1 in 8.03 billion”; “one in ten million”; “one in one trillion”) are meaningless unless we know how many “four-line paragraph and a three-line paragraph” appear in print each day? week? month?
      If a trillion 4-3’s are written, then the odds are 1 in 1 that the F-Y obscenity will appear.
      Now, million, billion, and trillion are large numbers, but “The CTIA released its semi annual report this week: over 740 billion text messages were sent in the first half of 2009. That works out to 4.1 billion text messages sent per day in the United States! That’s billion with a ‘B’.”
      (http://www.mobilecommons.com/blog/2009/10/report-4-1-billion-text-messages-sent-every-day/)
      I googled “How many paragraphs are written each day”, but got no hit.

      2. Your three “Text Message” examples are from the Project Gutenberg edition. But unless one examines the author’s own manuscripts, any vertical messages are the result of page width, end-justified lines, use of hyphens, etc. There are so many editions of famous novels, each edition with its own typesetting, that some edition would have a vertical message. And one that would not.

      3. As for the Bible Code – which Bible? Again, so many editions, based on so many fragmented manuscripts, that one can always find a “Bible” that has vertical messages – or one that does not.

      Numerologically yours,

      Richard Handelsman
      West Palm Beach, Florida.

    • Since when does a governor write his own statements, and especially type them? Who says that “Arnold” did it, and not a staffer?

    • For all of our thinking that he was linguistically challenged,
      uncle arnold amazes us again.
      couldn’t he have just
      kicked his own can down the road, and

      helped us all by turning
      in his political aspirations, and going home to
      maria.

About The Numbers Guy

  • The Numbers Guy examines numbers in the news, business and politics. Some numbers are flat-out wrong or biased, while others are valid and help us make informed decisions. Carl Bialik tells the stories behind the stats, in occasional updates on this blog and in his column published every Wednesday in The Wall Street Journal. Carl, who holds a degree in mathematics and physics from Yale University, also writes daily about sports numbers on WSJ.com. He welcomes your comments at numbersguy@wsj.com.