## Posts tagged ‘statistics’

### Mr. Consistency, Khris Davis

If you flipped four coins, the probability of getting exactly one head would be 0.25.

But the probability of doing that four times in a row is much lower: 0.25^{4} ≈ 0.0039, or 1 in 256.
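For anyone who wants to check the coin arithmetic, here is a minimal sketch. The helper name `prob_exact_heads` is my own invention for illustration, and the sketch assumes fair coins and independent flips.

```python
from fractions import Fraction
from math import comb

def prob_exact_heads(flips: int, heads: int) -> Fraction:
    """Probability of exactly `heads` heads in `flips` fair coin flips."""
    return Fraction(comb(flips, heads), 2 ** flips)

# Exactly one head among four coins: 4 of the 16 equally likely outcomes.
one_head_in_four = prob_exact_heads(4, 1)

# Repeating that result four times in a row (independent repetitions).
four_in_a_row = one_head_in_four ** 4

print(one_head_in_four, float(one_head_in_four))
print(four_in_a_row, float(four_in_a_row))
```

Using `Fraction` keeps the results exact, so no rounding sneaks into the comparison.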

Now, imagine flipping 100 coins four times, and getting exactly 25 heads each time. The odds of that happening are only slightly better than impossible. In fact, if every person *in the entire world* were to flip 100 coins four times, it would still be highly unlikely that this would ever happen.

That’s how rare it is, and it gives you some idea of what Major League Baseball player Khris Davis just pulled off. The Oakland Athletics outfielder just finished his fourth consecutive season with a batting average of .247. That’s right — the same average four seasons in a row.

Davis had some advantage over our coins, though. For starters, he wasn’t required to have the same number of at-bats every year. Moreover, batting averages are rounded to three decimal places, so his average wasn’t *exactly* the same during those four years; it was just really, really close:

- **2015**: .24745 (97 hits in 392 at-bats)
- **2016**: .24685 (137 in 555)
- **2017**: .24735 (140 in 566)
- **2018**: .24653 (142 in 576)

How could something like this happen? According to Davis, “I guess it was meant to be.”

Perhaps it *was* predestination, but I prefer to put my faith in numbers.

Empirically, we can look at the data. From 1876 to present, there have been 19,103 players in the major leagues. The average length of an MLB career is about 5.6 years, which means that an average player would have about three chances to record the same batting average four seasons in a row (a career of *n* seasons contains *n* – 3 runs of four consecutive seasons, and 5.6 – 3 = 2.6). It’s then reasonable to say that there have been approximately 3 × 19,103 = 57,309 opportunities for this to happen, yet Khris Davis is the only one to accomplish this feat. So experimentally, the probability is about 1 in 60,000.

Theoretically, we can look at the number of ways a player could finish a season with a .247 batting average. In 2007, the Phillies’ Jimmy Rollins recorded an astounding 716 at-bats. That’s the most ever by a Major League Baseball player. So using a sample space from 1 to 716 at-bats, I determined the number of ways to achieve a .247 batting average:

- 18 hits, 73 at-bats
- 19 hits, 77 at-bats
- 20 hits, 81 at-bats
- 21 hits, 85 at-bats
- 22 hits, 89 at-bats
- 23 hits, 93 at-bats
- 24 hits, 97 at-bats
- 36 hits, 146 at-bats
- …
- 161 hits, 651 at-bats
- 161 hits, 652 at-bats
- 161 hits, 653 at-bats
- …
- 177 hits, 716 at-bats

And, of course, there are the examples above from Davis’s last four seasons.

It’s interesting that it’s not possible to obtain a batting average of .247 if the number of at-bats is anywhere from 98 to 145; yet it’s possible to hit .247 with 161 hits for 651, 652, or 653 at-bats. I guess it’s like Ernie said: “That’s how the numbers go.”

All told, **there are 254 different ways to hit .247** if the number of at-bats is 716 or fewer.

That may sound like a lot, but consider the alternative: there are 256,432 ways to **not** hit .247 with 716 or fewer at-bats.
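The tally can be checked by brute force. The sketch below is not the program used for the post; it assumes that averages are rounded half-up to three decimal places and that a “way” is any pair of hits and at-bats with at least one hit and no more hits than at-bats. The names `rounds_to_247` and `MAX_AT_BATS` are mine.

```python
# Assumptions (not from the post): averages round half-up to three decimals,
# and a "way" is a (hits, at_bats) pair with 1 <= hits <= at_bats.
MAX_AT_BATS = 716  # Jimmy Rollins's 2007 total, as cited in the post

def rounds_to_247(hits: int, at_bats: int) -> bool:
    """True when hits/at_bats rounds to .247, using only integer arithmetic.

    0.2465 <= hits/at_bats < 0.2475 is equivalent to
    493 * at_bats <= 2000 * hits < 495 * at_bats.
    """
    return 493 * at_bats <= 2000 * hits < 495 * at_bats

# Every qualifying pair, ordered by number of at-bats.
ways = [
    (hits, at_bats)
    for at_bats in range(1, MAX_AT_BATS + 1)
    for hits in range(1, at_bats + 1)
    if rounds_to_247(hits, at_bats)
]

# Size of the whole sample space under the same convention.
total_pairs = sum(range(1, MAX_AT_BATS + 1))

print(len(ways), "ways to hit .247;", total_pairs - len(ways), "ways not to")
print(ways[:8])  # the smallest season totals that produce .247
```

Sticking to integers avoids any floating-point surprises right at the rounding boundaries.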

So, yeah. No matter how you look at it, what Davis did is pretty ridiculous. Almost as ridiculous as what happened to Saul…

Saul is working in his store when he hears a voice from above. “Saul, sell your business,” the voice says. He ignores it. His business is doing well, and he’s happy. “Saul, sell your business,” the voice repeats. The voice goes on like this for days, then weeks. “Saul, sell your business.” Finally, Saul can’t take it any more. He finds a buyer and sells his business for a nice profit.

“Saul, take your money, and go to Las Vegas,” the voice says.

“But why?” asks Saul. “I have enough to retire!”

“Saul, take your money to Las Vegas,” the voice repeats. It is incessant. Finally, Saul relents and heads to Vegas.

“Saul, go to the blackjack table and bet all your money on one hand.”

He hesitates for a moment, but he knows the voice won’t stop. So, he places his bet. He’s dealt 18, while the dealer has a 6 showing. “Saul, take a card.”

“What? The dealer has…”

“Saul, take a card!” the voice booms.

Saul hits. He gets an ace, 19. He sighs in relief.

“Saul, take another card.”

“You’ve got to be kidding me!” he pleads.

“Saul, take another card.”

He asks for another card. Another ace, 20.

“Saul, take another card,” the voice demands.

“But I have 20!” Saul shouts.

“TAKE ANOTHER CARD, SAUL!”

“Hit me,” Saul says meekly. He gets another ace, 21.

And the voice says, “Un-fucking-believable!”

### Nationals Win Probability, and Other Meaningless Statistics

The first pitch of last night’s Nationals-Phillies game was 8:08 p.m. That’s pretty late for me on a school night, and when a 38-minute rain delay interrupted the 4th inning, well, that made a late night even later.

The Phillies scored 4 runs in the top of the 5th to take a 6‑2 lead. When the Nationals failed to score in the bottom of the 5th, I asked my friends, “What are the chances that the Nationals come back?” With only grunts in response and 10:43 glowing from the scoreboard, we decided to leave.

On the drive home, we listened as the Nationals scored 3 runs to bring it to 6‑5. That’s where the score stood in the middle of the 8th inning when I arrived home, and with the Nats only down by 1, I thought it might be worth tuning in.

The Nats then scored 3 runs in the bottom of the 8th to take an 8-6 lead. And that’s when an awesome stat flashed on the television screen:

Nats Win Probability

- Down 6-2 in the 6th: 6%
- Up 8-6 in the 8th: 93%

Seeing that statistic reminded me of a Dilbert cartoon from a quarter-century ago:

I often share Dogbert’s reaction to statistics that I read in the newspaper or hear on TV or — *egad!* — are sent to me via email.

I had this kind of reaction to the stat about the Nationals win probability.

For a weather forecast, a 20% chance of rain means it will rain on 20% of the days with exactly the same atmospheric conditions. Does the Nats 6% win probability mean that *any team* has a 6% chance of winning when they trail 6-2 in the 6th inning?

Or does it more specifically mean that the Nationals trailing 6-2 in the 6th inning to the Phillies would only win 1 out of 17 times?

Or is it far more specific still, meaning that this particular lineup of Nationals players playing against this particular lineup of Phillies players, late on a Sunday night at Nationals Stadium, during the last week of June, with 29,314 fans in attendance, with a 38-minute rain delay in the 4th inning during which I consumed a soft pretzel and a beer… are **those** the right “atmospheric conditions” such that the Nats have a 6% chance of winning?

As it turns out, the win probability actually includes lots of factors: whether a team is home or away, inning, number of outs, which bases are occupied, and the score difference. It does not, however, take into account the cost or caloric content of my mid-game snack.

A few other stupid statistics I’ve heard:

- Fifty percent of all people are below average.
- Everyone who has ever died has breathed oxygen.
- Of all car accidents in Canada, 0.3% involve a moose.
- Any time Detroit scores more than 100 points and holds the other team below 100 points, they almost always win.

**Have you heard a dumb stat recently?** Let us know in the comments.

### Stupid Stats

C’mon, now… really?

Uterine size in non-pregnant women varies in relation to age and gravidity [number of pregnancies]. The mean length-to-width ratio conformed to the golden ratio at the age of 21, coinciding with peak fertility.

Claiming that a uterine golden ratio coincides with peak fertility is highly suspect. The good folks at Ava Women claim that, “Most women reach their peak fertility rates between the ages of 23 and 31.” Information at Later Baby states, “Female fertility and egg quality peak around the age of 27.” And WebMD says, “A woman’s peak fertility is in her early 20s.” So, there seems to be some debate about when peak fertility actually occurs. Consequently, this strikes me as retro-fitting, and it seems that Dr. Verguts and his colleagues may have played loose with the age of peak fertility in order to make a connection to the golden ratio.

In their defense, though, it’s not the first time that folks have gone uptown trying to find a connection to the golden ratio. A claim by The Golden Number states, “[The DNA molecule] measures 34 angstroms long by 21 angstroms wide for each full cycle of its double helix spiral,” and 34/21 ≈ 1.6190476, which is approximately equal to φ, 1.6180339.

Though this guy — an honest-to-goodness biologist — seems to disagree:

I’ve also heard folks say that people are perceived as more beautiful if certain bodily proportions are in the golden ratio. The most extreme example of this that I’ve found involves the teeth:

…the most “beautiful” smiles are those in which central incisors are 1.618 wider than the lateral incisors, which are 1.618 wider than canines, and so on.

In a study of 4,572 extracted adult teeth, Dr. Julian Woelfel found the average width of the central incisor to be 8.6 mm. If the teeth in a beautiful smile follow the geometric progression described above, well, that would imply that the first molar would be just 8.6 × 0.618^{5} ≈ 0.8 mm wide, which isn’t reasonable and, moreover, is not even remotely close to the average width that Dr. Woelfel found for the first molar: approximately 10.4 mm.
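To see how quickly that progression shrinks, here is a rough sketch that walks from the central incisor back to the first molar, dividing by the golden ratio at each step. The tooth order in `TEETH` is my assumption about what “and so on” means; only the 8.6 mm starting width comes from the study quoted above.

```python
PHI = (1 + 5 ** 0.5) / 2      # golden ratio, about 1.618
CENTRAL_INCISOR_MM = 8.6      # Woelfel's average width, as quoted in the post

# Assumed tooth order, front to back, with one golden-ratio step per tooth.
TEETH = [
    "central incisor", "lateral incisor", "canine",
    "first premolar", "second premolar", "first molar",
]

# Width implied for each tooth if every step divides by the golden ratio.
widths = {
    tooth: CENTRAL_INCISOR_MM / PHI ** step
    for step, tooth in enumerate(TEETH)
}

for tooth, width in widths.items():
    print(f"{tooth:16s} {width:4.1f} mm")
```

Dividing by φ is the same as multiplying by 0.618, so the last line reproduces the 8.6 × 0.618^{5} figure.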

But all of these claims involving the golden ratio are not even close to being the stupidest statistics I’ve heard in my life. Mary Anne Tebedo made a remark on the floor of the Colorado State Senate in 1995 that may hold that distinction:

Statistics show that teen pregnancy drops off significantly after age 25.

Of course, it’s hard to call that a *statistic*, since it’s completely nonsensical. Maybe it’s only the stupidest *statement* I’ve ever heard.

Then there’s this one, from the *New York Times* on August 8, 2016, which couldn’t be more useless:

No presidential candidate has secured a major party nomination after an FBI investigation into her use of a private email server.

Well, duh. Email didn’t even exist before the 1970s. Moreover, besides Hillary Clinton, has *any* presidential candidate ever had their use of a private server investigated by the FBI? This is like saying, “No one has ever been named *People*’s Sexiest Man Alive after writing a math joke book.” (Not yet, anyway.)

Randall Munroe made fun of these types of “no politician has ever…” claims in 2012 with his cartoon *Election Precedents*:

And it’s true:

But perhaps my all-time favorite is one that Frank Deford — may he rest in peace — included in his piece “The Stupidest Statistics in the Modern Era” on NPR’s Morning Edition:

He’s [Brandon Phillips] the first National League player to account for as many as 30 steals and 25 double plays in one season.

About this stat, Deford commented, “Steals and double plays together? This is like saying, ‘He’s the first archaeologist to find 23 dinosaur bones and 12 Spanish doubloons on the same hunt.’” (I sure am going to miss him.)

The preponderance of dumb stats shouldn’t come as a surprise, though. A recent study found that people deemed real news headlines to be accurate 83% of the time and fake news headlines to be accurate 75% of the time. So, if we can’t tell truth from fiction, how can we possibly distinguish useful statistics from inane?

If you’d like to test your ability to detect fake news, check out Factitious from American University.

### Morelli, Coleman, and Statistical Outliers

You won’t see Pete Morelli and crew officiating tonight’s Monday Night Football game — and Philadelphia Eagles’ fans couldn’t be happier.

At kick-off, more than 74,000 fans had signed a petition to have Morelli banned from serving as the referee for any Eagles’ game. That’s because last Thursday night, Morelli and his crew called 10 penalties for 126 yards against the Eagles, whereas they only called 1 penalty for 1 yard against their opponents, the Carolina Panthers.

But Philadelphia sports reporter Dave Zangaro pointed out that Morelli has a history of lopsided officiating against the Eagles. In the last four Eagles’ games that Morelli has covered, his crew has called 40 penalties for 396 yards against the Eagles, but only eight penalties for 74 yards against the opponents.

No doubt, that’s quite a disparity.

But I’m curious if any of the petition signers have actually checked the numbers. Statistical anomalies happen, and I suspect that the imbalance they’ve identified is likely one of many. I didn’t run the numbers to determine if Morelli’s stats constitute an outlier; that would be too much work. But, I did take a quick peek at the other referees in the league to see what I could see.

And what I found leads me to wonder, **Why hasn’t anyone started a petition to get Walt Coleman banned from officiating Atlanta Falcons games?** Maybe it’s because Coleman officiates *in favor* of the Falcons.

Check it. In the last six Falcons’ games that Coleman has officiated, the Falcons have been penalized only 29 times for 216 yards. Their opponents, by comparison, have been penalized 53 times for 463 yards. That’s an average of four fewer penalties and half as many penalty yards per game.

And it’s even worse if you consider only home games. In those four games, the Falcons were penalized just 16 times for 111 yards, compared with 37 penalties for 320 yards against their opponents.

Don’t believe me? Take a look…

| Date | Game | Opponent’s Penalties | Opponent’s Penalty Yds | Falcons’ Penalties | Falcons’ Penalty Yds |
|---|---|---|---|---|---|
| 9/30/12 | Panthers @ Falcons | 9 | 64 | 2 | 15 |
| 1/13/13 | Seahawks @ Falcons | 6 | 35 | 3 | 11 |
| 9/29/13 | Patriots @ Falcons | 9 | 93 | 6 | 55 |
| 12/23/13 | Falcons @ 49ers | 7 | 45 | 5 | 37 |
| 12/4/16 | Chiefs @ Falcons | 13 | 128 | 5 | 30 |
| 9/24/17 | Falcons @ Lions | 9 | 98 | 8 | 68 |
| Totals | | 53 | 463 | 29 | 216 |

Admittedly, those numbers aren’t quite as stark as Morelli’s, but they don’t exactly paint a picture of Coleman as an impartial ref, either.
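If you’d rather not take my word for the totals, the sums and per-game comparisons can be re-derived from the six rows of the table. This is only a sketch; the tuple layout and the `totals` helper are my own, and the home/away flag is read off the “Game” column.

```python
# (date, home_game, opp_penalties, opp_yards, falcons_penalties, falcons_yards)
GAMES = [
    ("9/30/12",  True,   9,  64, 2, 15),
    ("1/13/13",  True,   6,  35, 3, 11),
    ("9/29/13",  True,   9,  93, 6, 55),
    ("12/23/13", False,  7,  45, 5, 37),
    ("12/4/16",  True,  13, 128, 5, 30),
    ("9/24/17",  False,  9,  98, 8, 68),
]

def totals(games):
    """Sum the four penalty columns over the given games."""
    return tuple(sum(game[col] for game in games) for col in range(2, 6))

opp_pen, opp_yds, atl_pen, atl_yds = totals(GAMES)
home_totals = totals([game for game in GAMES if game[1]])

print("All games: ", opp_pen, opp_yds, atl_pen, atl_yds)
print("Home games:", home_totals)
print("Extra opponent penalties per game:", (opp_pen - atl_pen) / len(GAMES))
print("Falcons' yards as a share of opponents':", round(atl_yds / opp_yds, 2))
```

Six games is a tiny sample, of course, which is rather the point of this post.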

In 2012, replacement official Brian Stropolo was banned from working a New Orleans Saints’ game when pictures of him donning Saints’ attire were found on his Facebook page. So there is precedent if the NFL wants to use my analysis to ban Coleman from Falcons’ games, or if they want to accept the petition and ban Morelli from Eagles’ games.

But let’s keep this in perspective and remember one thing: **It is Philadelphia**, after all. I mean, we’re talking about a sports town where fans threw snowballs at Santa Claus and threw batteries at Eagles quarterback Doug Pederson — the same Doug Pederson, in fact, who is now the Eagles coach. So if Morelli and his crew are deliberately blowing the whistle more against the Eagles than their opponents, who cares? This type of denigration couldn’t be offered to a more deserving team.

### Sound Smart with Math Words

When law professor Richard D. Friedman appeared in front of the Supreme Court, he stated that an issue was “entirely orthogonal” to the discussion. Chief Justice John G. Roberts Jr. stopped him, saying, “I’m sorry. Entirely *what*?”

“Orthogonal,” Friedman replied, and then explained that it meant *unrelated* or *irrelevant*.

Justice Antonin Scalia was so taken by the word that he let out an **ooh** and suggested that the word be used in the opinion.

In math class, *orthogonal* means “at a right angle,” but in common English, it means that two things are unrelated. Many mathematical terms have taken a similar path; moreover, there are many terms that had extracurricular meanings long before we ever used them in a math classroom. *Average* is used to mean “typical.” *Odd* is used to mean “strange” or “abnormal.” And *base* is used to mean “foundation.” To name a few.

The stats teacher said that I was average, but he was just being mean.

You know what’s odd to me? Numbers that aren’t divisible by 2.

An exponent’s favorite song is, “All About the Base.”

Even words for quantities can have multiple meanings. Plato used *number* to mean any quantity more than 2. And *forty* used to refer to any large quantity, which is why Ali Baba had forty thieves, and why the Bible says that it rained for forty days and forty nights. Nowadays, we use *thousands* or *millions* or *billions* or *gazillions* to refer to a large, unknown quantity. (That’s just grammatical inflation, I suspect. In a future millennium, we’ll talk of *sextillion* tourists waiting in line at Disneyland or of *googol* icicles hanging from the gutters.)

Zevenbergen (2001) provided a list of 36 such terms that have both math and non-math meanings, including:

- angle
- improper
- point
- rational
- table
- volume

The alternate meanings can lead to a significant amount of confusion. Ask a mathematician, “What’s your point?” and she may respond, “(2, 4).” Likewise, if you ask a student to determine the volume of a soup can, he may answer, “Uh… quiet?”

It can all be quite perplexing. But don’t be overwhelmed. Sarah Cooper has some suggestions for working mathy terms into business meetings and everyday speech. Like this…

For more suggestions, check out her blog post How to Use Math Words to Sound Smart.

If you really want to sound smart, though, be sure to heed the advice of columnist Dave Barry:

Don’t say: “I think Peruvians are underpaid.”

Say instead: “The average Peruvian’s salary in 1981 dollars adjusted for the revised tax base is $1452.81 per annum, which is $836.07 below the mean gross poverty level.”

NOTE: Always make up exact figures. If an opponent asks you where you got your information, make *that* up, too.

This reminds me of several stats jokes:

- More than 83% of all statistics are made up on the spot.
- As many as one in four eggs contains salmonella, so you should only make three-egg omelettes, just to be safe.
- Even some failing students are in the top 90% of their class.
- An unprecedented 69.846743% of all statistics reflect an unjustified level of precision.

You can see the original version of “How to Win an Argument” at Dave Barry’s website, or you can check out a more readable version from the Cognitive Science Dept at Rensselaer.

Zevenbergen, R. (2001). Mathematical literacy in the middle years. *Literacy Learning: the Middle Years*, *9*(2), 21-28.

### Hold On… *How Many* Copies?

How many copies of *Math Jokes 4 Mathy Folks* do you think sold last week?

Make Your Prediction Here (Google Form)

Why would you want to make a prediction? Well, lots of reasons…

- Like the author (and readers) of this blog, you’re a math geek.
- You swoon at the sight of data.
- You’ve never met a puzzle you didn’t like.
- You want to show the world how awesome you are.
- **You’d like to win a signed copy of** *Math Jokes 4 Mathy Folks*, some cool MJ4MF stickers, and a surprise gift, all shipped to you in exclusive MJ4MF packaging!
- All of the above.

If you’re reading this blog, then you surely love being alive in the Age of Big Data. I love it, too, and I devour any data that I can get my hands on.

Amazon feeds my desire by providing two valuable pieces of data about *Math Jokes 4 Mathy Folks*. First, they provide the **sales rank** for the book, which is updated hourly. Second, they provide **weekly sales data**. The downside to this latter stat is the delay in its release: they provide data for Monday-Sunday, but it isn’t released until the following Friday. The upside is that big dorks like me use the time from Monday through Thursday to make predictions.

The chart below shows the average sales rank and weekly sales for Nov 24 through Dec 14. (The “average sales rank” is the average of the sales ranks for the seven days each week. Although it’s updated hourly, I don’t have the time to check it that frequently, so I rely on Author Central, which reports the sales rank at the end of each day.) It also shows the average sales rank (but not sales) for last week, Dec 15‑21.

| Week | Amazon Sales Rank (Weekly Average) | Weekly Sales |
|---|---|---|
| Nov 24-30 | 4,742 | 114 |
| Dec 1-7 | 3,437 | 279 |
| Dec 8-14 | 2,390 | 435 |
| Dec 15-21 | 2,063 | ? |

**The question: How many copies of MJ4MF were sold last week?**

Oh, sure… I could just wait until Friday to find out — but what fun would that be?

Instead, I constructed several mathematical models, and then I tweaked them to predict how many books were sold. The tweaks were based on some things I’ve learned over the past couple of years:

- Holiday sales are most vigorous in the first two weeks of December. They slow down a bit in the third week. Consequently, a sales rank of 1,655 on Dec 1 does not equal a sales rank of 1,655 on Dec 21.
- The long-term trend is not linear. In fact, this graph from Foner Books shows that it’s logarithmic.
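If you’d like a starting point for your own entry, here is one possible baseline. It is a sketch, not one of the models I built: it assumes weekly sales are roughly linear in the logarithm of the average sales rank, fits that line to the three complete weeks in the table by ordinary least squares, and extrapolates to last week’s rank. It ignores the late-December slowdown described above, so treat the output as unadjusted. The function name `fit_log_model` is made up for illustration.

```python
from math import log

# Weekly average sales rank and copies sold, from the table in the post.
HISTORY = [(4742, 114), (3437, 279), (2390, 435)]
LAST_WEEK_RANK = 2063

def fit_log_model(points):
    """Least-squares fit of sales = intercept + slope * ln(rank)."""
    xs = [log(rank) for rank, _ in points]
    ys = [sales for _, sales in points]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = (
        sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        / sum((x - x_mean) ** 2 for x in xs)
    )
    return y_mean - slope * x_mean, slope

intercept, slope = fit_log_model(HISTORY)

# Extrapolate to last week's rank; no holiday adjustment is applied.
prediction = intercept + slope * log(LAST_WEEK_RANK)
print(f"Unadjusted prediction: {prediction:.0f} copies")
```

Three data points make for a flimsy regression, so a sensible entry would nudge this number using the tweaks listed above.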

Which brings us to the contest. Go to the Google form and **enter your prediction and email address**. (The email is only so I can contact you if you win.) **Closest guess to the actual number of sales will win the grand prize.** In the event of a tie, a winner will be randomly selected (or if I’m feeling generous, maybe there will be multiple winners… it’s hard to predict my disposition on any given day).

So, what are you waiting for? Open Excel or SPSS or your stat software of choice, muddle through a few regressions, and submit your entry!

**Winners will be announced on Saturday, December 27, 2014.** The exact time will depend on what time I roll out of bed, what activities my wife and kids propose for the day, and my particular disposition on Saturday. On second thought… safest if you check back on Sunday.

Good luck!

### You Say It’s Your Birthday…

Well, no, actually it’s not my birthday. And it’s not my friend Jacqui’s birthday, either, but she did just celebrate a milestone with us that she wanted to share. Via email, she announced,

I’ve been alive for two billion seconds, a milestone I passed this morning.

This reminded me of a problem from Steve Leinwand’s book, *Accessible Mathematics*, in which he tells kids his age as a unitless number, then asks them to identify what units he must be using. Along those lines, here are some questions for you.

- How old (in years) is my friend Jacqui?
- What is her date of birth?
- If I tell you that my age is 22,333,444, what units must I be using? Assuming I’m not telling a fib, of course. And what is my age in years and my date of birth?
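For readers who would rather compute than guess, the unit conversions behind these questions can be sketched as follows. The sketch assumes a year of 365.25 days, and the list of candidate units is my own. It stops short of the dates of birth, since those depend on the day the ages were announced.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60   # Julian year; an approximation

# Jacqui's milestone of two billion seconds, expressed in years.
jacqui_years = 2_000_000_000 / SECONDS_PER_YEAR

# Try each candidate unit for the unitless age 22,333,444.
UNITS_IN_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3_600, "days": 86_400}
candidate_ages = {
    unit: 22_333_444 * size / SECONDS_PER_YEAR
    for unit, size in UNITS_IN_SECONDS.items()
}

print(f"Jacqui: {jacqui_years:.1f} years")
for unit, years in candidate_ages.items():
    print(f"22,333,444 {unit}: {years:,.1f} years")
```

Only one of the candidate units yields a plausible human age, which is how kids can work backward to the unit.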

This reminds me of two math jokes about birthdays.

Statistics show that those who celebrate the most birthdays live longest.

An algebraist remembers that his wife’s birthday is on the (*n* – 1)^{st} of the month. Unfortunately, he only remembers this when he is reminded on the *n*^{th}.