## Posts tagged ‘data’

### Stupid Stats

C’mon, now… really?

Uterine size in non-pregnant women varies in relation to age and gravidity [number of pregnancies]. The mean length-to-width ratio conformed to the golden ratio at the age of 21, coinciding with peak fertility.

Claiming that a uterine golden ratio coincides with peak fertility is highly suspect. The good folks at Ava Women claim that, “Most women reach their peak fertility rates between the ages of 23 and 31.” Information at Later Baby states, “Female fertility and egg quality peak around the age of 27.” And WebMD says, “A woman’s peak fertility is in her early 20s.” So, there seems to be some debate about when peak fertility actually occurs. Consequently, this strikes me as retrofitting, and it seems that Dr. Verguts and his colleagues may have played fast and loose with the age of peak fertility in order to make a connection to the golden ratio.

In their defense, though, it’s not the first time that folks have gone overboard trying to find a connection to the golden ratio. A claim by The Golden Number states, “[The DNA molecule] measures 34 angstroms long by 21 angstroms wide for each full cycle of its double helix spiral,” and 34/21 ≈ 1.6190476, which is approximately equal to φ ≈ 1.6180340.
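Part of why 34/21 lands so close to φ is that 21 and 34 are consecutive Fibonacci numbers, and ratios of consecutive Fibonacci numbers converge to φ. A quick Python sketch (the helper name `fibonacci` is just for illustration) shows how close each ratio gets:

```python
import math

# The golden ratio, phi = (1 + sqrt(5)) / 2
PHI = (1 + math.sqrt(5)) / 2

def fibonacci(n):
    """Return the first n Fibonacci numbers, starting 1, 1."""
    fibs = [1, 1]
    while len(fibs) < n:
        fibs.append(fibs[-1] + fibs[-2])
    return fibs[:n]

# Each ratio of consecutive Fibonacci numbers is a closer approximation to phi;
# 34/21 is one of them.
fibs = fibonacci(10)
for a, b in zip(fibs, fibs[1:]):
    print(f"{b}/{a} = {b / a:.7f}  (off by {abs(b / a - PHI):.7f})")
```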

Though this guy — an honest-to-goodness biologist — seems to disagree:

I’ve also heard folks say that people are perceived as more beautiful if certain bodily proportions are in the golden ratio. The most extreme example of this that I’ve found involves the teeth:

…the most “beautiful” smiles are those in which central incisors are 1.618 wider than the lateral incisors, which are 1.618 wider than canines, and so on.

In a study of 4,572 extracted adult teeth, Dr. Julian Woelfel found the average width of the central incisor to be 8.6 mm. If the teeth in a beautiful smile follow the geometric progression described above, well, that would imply that the first molar, five positions back from the central incisor (past the lateral incisor, the canine, and two premolars), would be just 8.6 × 0.618⁵ ≈ 0.8 mm wide. That isn’t reasonable and, moreover, is not even remotely close to the average width that Dr. Woelfel found for the first molar: approximately 10.4 mm.
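The shrinking-teeth arithmetic can be spelled out as a geometric progression. This minimal sketch assumes the "golden smile" rule applies tooth by tooth all the way back to the first molar, with each tooth 0.618 times as wide as the one in front of it; the function name `golden_widths` is hypothetical.

```python
# Widths implied by the "golden smile" rule: each tooth is about 0.618
# (that is, 1/1.618) times as wide as the tooth in front of it.
CENTRAL_INCISOR_MM = 8.6   # average central incisor width from Woelfel's study
RATIO = 0.618

TEETH = [
    "central incisor",
    "lateral incisor",
    "canine",
    "first premolar",
    "second premolar",
    "first molar",
]

def golden_widths(first_width, ratio, names):
    """Return (name, width) pairs forming a geometric progression."""
    return [(name, first_width * ratio ** i) for i, name in enumerate(names)]

for name, width in golden_widths(CENTRAL_INCISOR_MM, RATIO, TEETH):
    print(f"{name:>16}: {width:5.2f} mm")
```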

But all of these claims involving the golden ratio are not even close to being the stupidest statistics I’ve heard in my life. Mary Anne Tebedo made a remark on the floor of the Colorado State Senate in 1995 that may hold that distinction:

Statistics show that teen pregnancy drops off significantly after age 25.

Of course, it’s hard to call that a *statistic*, since it’s completely nonsensical. Maybe it’s only the stupidest *statement* I’ve ever heard.

Then there’s this one, from the *New York Times* on August 8, 2016, which couldn’t be more useless:

No presidential candidate has secured a major party nomination after an FBI investigation into her use of a private email server.

Well, duh. Email didn’t even exist before the 1970s. Moreover, besides Hillary Clinton, has *any* presidential candidate ever had their use of a private server investigated by the FBI? This is like saying, “No one has ever been named *People*’s Sexiest Man Alive after writing a math joke book.” (Not yet, anyway.)

Randall Munroe made fun of these types of “no politician has ever…” claims in 2012 with his cartoon *Election Precedents*:

And it’s true:

But perhaps my all-time favorite is one that Frank Deford — may he rest in peace — included in his piece “The Stupidest Statistics in the Modern Era” on NPR’s Morning Edition:

He’s [Brandon Phillips] the first National League player to account for as many as 30 steals and 25 double plays in one season.

About this stat, Deford commented, “Steals and double plays together? This is like saying, ‘He’s the first archaeologist to find 23 dinosaur bones and 12 Spanish doubloons on the same hunt.’” (I sure am going to miss him.)

The preponderance of dumb stats shouldn’t come as a surprise, though. A recent study found that people deemed real news headlines to be accurate 83% of the time and fake news headlines to be accurate 75% of the time. So, if we can’t tell truth from fiction, how can we possibly distinguish useful statistics from inane ones?

If you’d like to test your ability to detect fake news, check out Factitious from American University.

### Morelli, Coleman, and Statistical Outliers

You won’t see Pete Morelli and crew officiating tonight’s Monday Night Football game — and Philadelphia Eagles’ fans couldn’t be happier.

At kick-off, more than 74,000 fans had signed a petition to have Morelli banned from serving as the referee for any Eagles’ game. That’s because last Thursday night, Morelli and his crew called 10 penalties for 126 yards against the Eagles, whereas they only called 1 penalty for 1 yard against their opponents, the Carolina Panthers.

But Philadelphia sports reporter Dave Zangaro pointed out that Morelli has a history of lopsided officiating against the Eagles. In the last four Eagles’ games that Morelli has covered, his crew has called 40 penalties for 396 yards against the Eagles, but only eight penalties for 74 yards against the opponents.

No doubt, that’s quite a disparity.

But I’m curious if any of the petition signers have actually checked the numbers. Statistical anomalies happen, and I suspect that the imbalance they’ve identified is likely one of many. I didn’t run the numbers to determine if Morelli’s stats constitute an outlier; that would be too much work. But I did take a quick peek at the other referees in the league to see what I could see.

And what I found leads me to wonder, **Why hasn’t anyone started a petition to get Walt Coleman banned from officiating Atlanta Falcons games?** Maybe it’s because Coleman officiates *in favor* of the Falcons.

Check it. In the last six Falcons’ games that Coleman has officiated, the Falcons have been penalized only 29 times for 216 yards. Their opponents, by comparison, have been penalized 53 times for 463 yards. That’s an average of four fewer penalties and half as many penalty yards per game.

And it’s even worse if you consider only home games. In those four games, the split is just 16 penalties for 111 yards against the Falcons versus 37 penalties for 320 yards against their opponents.

Don’t believe me? Take a look…

| Date | Game | Opponent’s Penalties | Opponent’s Penalty Yards | Falcons’ Penalties | Falcons’ Penalty Yards |
|---|---|---|---|---|---|
| 9/30/12 | Panthers @ Falcons | 9 | 64 | 2 | 15 |
| 1/13/13 | Seahawks @ Falcons | 6 | 35 | 3 | 11 |
| 9/29/13 | Patriots @ Falcons | 9 | 93 | 6 | 55 |
| 12/23/13 | Falcons @ 49ers | 7 | 45 | 5 | 37 |
| 12/4/16 | Chiefs @ Falcons | 13 | 128 | 5 | 30 |
| 9/24/17 | Falcons @ Lions | 9 | 98 | 8 | 68 |
| Totals | | 53 | 463 | 29 | 216 |
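The totals, the per-game gap, and the home-game split quoted earlier can all be rechecked from the table. In this sketch, the tuple layout and the rule that a game listed as "X @ Falcons" is a Falcons home game are my own assumptions about how to encode the table; the helper names are hypothetical.

```python
# Rows transcribed from the Coleman table:
# (date, game, opp_penalties, opp_yards, atl_penalties, atl_yards)
GAMES = [
    ("9/30/12",  "Panthers @ Falcons",  9,  64, 2, 15),
    ("1/13/13",  "Seahawks @ Falcons",  6,  35, 3, 11),
    ("9/29/13",  "Patriots @ Falcons",  9,  93, 6, 55),
    ("12/23/13", "Falcons @ 49ers",     7,  45, 5, 37),
    ("12/4/16",  "Chiefs @ Falcons",   13, 128, 5, 30),
    ("9/24/17",  "Falcons @ Lions",     9,  98, 8, 68),
]

def is_falcons_home(game):
    """Treat 'Visitor @ Falcons' as a Falcons home game."""
    return game[1].endswith("@ Falcons")

def totals(games):
    """Sum the numeric columns: (opp pen, opp yds, ATL pen, ATL yds)."""
    return tuple(sum(g[i] for g in games) for i in range(2, 6))

opp_pen, opp_yds, atl_pen, atl_yds = totals(GAMES)
home = totals([g for g in GAMES if is_falcons_home(g)])

per_game_gap = (opp_pen - atl_pen) / len(GAMES)   # fewer Falcons penalties per game
yard_share = atl_yds / opp_yds                     # Falcons yards vs. opponents' yards

print("All games:", (opp_pen, opp_yds, atl_pen, atl_yds))
print("Home games:", home)
print(f"Gap per game: {per_game_gap}, yard share: {yard_share:.2f}")
```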

Admittedly, those numbers aren’t quite as stark as Morelli’s, but they don’t exactly paint a picture of Coleman as an impartial ref, either.

In 2012, replacement official Brian Stropolo was banned from working a New Orleans Saints’ game when pictures of him donning Saints’ attire were found on his Facebook page. So there is precedent if the NFL wants to use my analysis to ban Coleman from Falcons’ games, or if they want to accept the petition and ban Morelli from Eagles’ games.

But let’s keep this in perspective and remember one thing: **It is Philadelphia**, after all. I mean, we’re talking about a sports town where fans threw snowballs at Santa Claus and threw batteries at Eagles quarterback Doug Pederson — the same Doug Pederson, in fact, who is now the Eagles coach. So if Morelli and his crew are deliberately blowing the whistle more against the Eagles than their opponents, who cares? This type of denigration couldn’t be offered to a more deserving team.

### AWOKK, Day 3: KenKen Times

Today is Day 3 in MJ4MF’s **A Week of KenKen** series. In case you missed the fun we’ve had previously…

- Day 1: Introduction
- Day 2: The KENtathlon

Yesterday, I introduced you to the **KENtathlon**.

While completing a KENtathlon, my goal is to complete a 6 × 6 puzzle in less than 2 minutes; a 5 × 5 puzzle in less than 1 minute; and a 4 × 4 puzzle in less than 20 seconds. Even though the sum of those times for all three puzzles is 3 minutes, 20 seconds, my goal is a combined time of 3 minutes. It’s good to have goals.

| Puzzle Size | Goal Time | Personal Best |
|---|---|---|
| 4 × 4 | 0:20 | 0:12 |
| 5 × 5 | 1:00 | 0:27 |
| 6 × 6 | 2:00 | 1:29 |
| KENtathlon | 3:00 | 2:32 |
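The goal-time arithmetic (three goals that add to 3:20, but a combined target of 3:00) can be checked with a few lines of Python. The `m:ss` parsing helpers here are hypothetical conveniences, not part of any KenKen app.

```python
def to_seconds(mmss):
    """Convert an 'm:ss' string such as '1:29' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

def to_mmss(total):
    """Convert a number of seconds back to an 'm:ss' string."""
    return f"{total // 60}:{total % 60:02d}"

# Goal times per puzzle size, from the table.
GOALS = {"4x4": "0:20", "5x5": "1:00", "6x6": "2:00"}
COMBINED_GOAL = "3:00"

goal_sum = sum(to_seconds(t) for t in GOALS.values())
slack = goal_sum - to_seconds(COMBINED_GOAL)

print("Sum of individual goals:", to_mmss(goal_sum))   # 3:20
print("Time that must be made up:", to_mmss(slack))    # 0:20
```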

I don’t always perform well enough to meet those goals. And when I don’t, I repeat the same size puzzle again… and again… and again… for as many attempts as it takes to complete each puzzle in the allotted time. And when I’ve met the time goal for each puzzle individually, if the combined time isn’t satisfactory, then I start the whole thing over.

To say that I’m slightly obsessive would be like saying that the Pope is a little bit Catholic.

As you may have noticed in the table, I once finished a 4 × 4 puzzle in 12 seconds. The key word there is **once**. The stars were in alignment that day: it was an easy puzzle, and the dexterity of my thumbs and fingers was at an all-time high. Though I’ve finished in 13 seconds a handful of times, I’ve never replicated that 12-second feat.

That said, I regularly complete 4 × 4 puzzles in 14 or 15 seconds. With that being the case, you have to wonder if the 20-second goal is really a challenge. And what about the goal times for 5 × 5 and 6 × 6 puzzles?

Admittedly, my time goals are arbitrary, though not random. When I chose those goals, I had completed enough KenKen puzzles that I intuitively knew what felt right. Still, it wasn’t based on hard data… and if you’ve read this blog long enough, you know that that bothered me. A lot.

But what’s a boy to do?

I suppose a well-adjusted human might do nothing, think it’s not worth the trouble, and just let the whole thing go. But an obsessive numbers guy? Well, he’d painstakingly solve 132 KenKen puzzles, collect data on the amount of time each one took to complete, meticulously record the data in an Excel spreadsheet, and perform a thorough analysis. You may think that undertaking such a project is ludicrous; but to me, it was absolutely essential.

The graph below shows the results. The circular dots represent my median time for each puzzle size, and the square dots represent the upper and lower quartiles. For instance, the median time for 6 × 6 puzzles was 217 seconds, while the interquartile range for 6 × 6 puzzles extended from 163 to 284 seconds.
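The medians, quartiles, and "percent under the goal" figures come from a straightforward summary of solve times. Here is a minimal sketch of that kind of analysis. The sample times are hypothetical, made up for illustration, and are not the 132 recorded solves. `method="inclusive"` should follow the same interpolation convention as Excel's `QUARTILE.INC`, though which function the original spreadsheet used is an assumption.

```python
import statistics

def summarize(times):
    """Return (Q1, median, Q3) for a list of solve times in seconds."""
    q1, median, q3 = statistics.quantiles(times, n=4, method="inclusive")
    return q1, median, q3

def share_under(times, goal):
    """Fraction of solves finished in less than `goal` seconds."""
    return sum(1 for t in times if t < goal) / len(times)

# Hypothetical 6 x 6 solve times (seconds), for illustration only.
sample_6x6 = [105, 140, 160, 185, 195, 210, 235, 265, 300, 330]

print("Q1, median, Q3:", summarize(sample_6x6))
print(f"{share_under(sample_6x6, 120):.0%} under 2 minutes")
print(f"{share_under(sample_6x6, 180):.0%} under 3 minutes")
```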

What this reveals is that my intuition wasn’t perfect, but it wasn’t bad, either.

- I completed **49% of 4 × 4 puzzles** in less than the goal time of 20 seconds.
- I completed **58% of 5 × 5 puzzles** in less than the goal time of 1 minute.
- But I completed **only 14% of 6 × 6 puzzles** in less than the goal time of 2 minutes.

Further analysis revealed that I completed 40% of the 6 × 6 puzzles in under 3 minutes, which seems a bit more reasonable, so **my new goal time for 6 × 6 puzzles is 3 minutes**.

Now, I know you thought this analysis was completely unnecessary, but the proof is in the pudding. The results were invaluable. By considering the data, interpreting the results, and revising my goal time for 6 × 6 puzzles, the probability that I can now complete each size puzzle in the allotted time on the first or second try has increased from 16% to 39%. Or said another way, Remy’s morning walks now last an average of just 15 minutes, whereas some of them used to take an hour-and-a-half.
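The jump from 16% to 39% appears to follow from the success rates listed above (49%, 58%, and 14%, with 40% under the new 3-minute goal) if you assume that every attempt is independent and that the three puzzle sizes are independent of one another. Here is a sketch under those assumptions; the function names are hypothetical.

```python
# Share of attempts that beat each goal time, from the post.
OLD_RATES = {"4x4": 0.49, "5x5": 0.58, "6x6": 0.14}
NEW_RATES = {**OLD_RATES, "6x6": 0.40}   # 6 x 6 goal relaxed to 3 minutes

def within_two_tries(p):
    """Chance of beating the goal on the first or second attempt,
    assuming independent attempts with success probability p."""
    return 1 - (1 - p) ** 2

def all_sizes_within_two_tries(rates):
    """Chance that every size is done within two tries,
    assuming the sizes are independent of one another."""
    result = 1.0
    for p in rates.values():
        result *= within_two_tries(p)
    return result

print(f"old goals: {all_sizes_within_two_tries(OLD_RATES):.0%}")  # 16%
print(f"new goals: {all_sizes_within_two_tries(NEW_RATES):.0%}")  # 39%
```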

### What (Math) is in a Name?

One of my favorite online tools is the Mean and Median app from Illuminations. This tool allows you to create a data set with up to 15 elements, plot them on a number line, investigate the mean and median, and consider a box-and-whisker plot based on the data. Perhaps the coolest feature is that you can copy an entire set of data, make some changes, and compare the modified set to the original set. For example, the box-and-whisker plots below look very different, even though the mean and median of the two sets are the same.

It’s a neat tool for learning about mean and median, and I plan to use this tool in an upcoming presentation.
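The "same mean and median, very different box plot" idea is easy to reproduce outside the app. The two data sets below are hypothetical examples that respect the app's limits (integers from 1 to 100, at most 15 elements). The quartile convention the Illuminations app uses is an assumption here, so its box plots could differ slightly.

```python
import statistics

def five_number_summary(data):
    """Min, Q1, median, Q3, max, using the 'inclusive' quartile method."""
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, med, q3, max(data)

# Two hypothetical data sets: the second is a copy of the first with
# values pushed away from the center, keeping the mean and median fixed.
original = [40, 45, 50, 55, 60]
modified = [20, 30, 50, 70, 80]

for name, data in [("original", original), ("modified", modified)]:
    print(name,
          "mean =", statistics.mean(data),
          "median =", statistics.median(data),
          "summary =", five_number_summary(data))
```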

**Exceptional, Free Online Resources for the Middle Grades Classroom**

*G. Patrick Vennebush*

Thursday, October 20, 12:30-2:00pm

Room 401 (Atlantic City Convention Center)

For classroom use, I like to use this app with real sets of data. However, the app requires all elements of a data set to be integers from 1 to 100. Can you think of a data set with a reasonable spread that has no (or at least few) elements greater than 100? If so, leave a comment.

Recently, and rather accidentally, I found a data set that works well. Do the following:

Assign each letter of the alphabet a value as follows: A = 1, B = 2, C = 3, and so on. Find the sum of the letters in your name; e.g., BOB → 2 + 15 + 2 = 19.
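The letter-sum rule translates directly into code. In this sketch, `name_sum` is a hypothetical helper, and the decision to ignore hyphens, apostrophes, and spaces is my own assumption about how to handle names like Mary-Ann.

```python
def name_sum(name):
    """Sum of letter values with A = 1, B = 2, ..., Z = 26.
    Characters outside A-Z (hyphens, apostrophes, spaces) are ignored."""
    return sum(ord(ch) - ord("A") + 1 for ch in name.upper() if "A" <= ch <= "Z")

print(name_sum("Bob"))          # 19
print(name_sum("Christopher"))  # 139
```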

Now imagine that every student in a class finds the sum of the letters in their first name. For a typical class, what is the range of the data? What are the mean and median?
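For a whole class, the range, mean, and median take only a few more lines. The roster below is hypothetical, chosen to include the smallest and largest sums mentioned below.

```python
import statistics

def name_sum(name):
    """Letter-value sum with A = 1, ..., Z = 26 (non-letters ignored)."""
    return sum(ord(ch) - ord("A") + 1 for ch in name.upper() if "A" <= ch <= "Z")

# A hypothetical class roster, for illustration only.
roster = ["Abe", "Bob", "Emma", "Liam", "Olivia", "Noah", "Sophia", "Christopher"]

sums = sorted(name_sum(n) for n in roster)
print("sums:", sums)
print("range:", max(sums) - min(sums))
print("mean:", statistics.mean(sums))
print("median:", statistics.median(sums))
```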

The name with the smallest sum that I could find?

ABE → 1 + 2 + 5 = 8

The name with the largest sum?

CHRISTOPHER → 3 + 8 + 18 + 9 + 19 + 20 + 15 + 16 + 8 + 5 + 18 = 139

The Social Security Administration provides a nice resource for investigation, Popular Baby Names. Using a randomly selected set of 2,000 names and an Excel spreadsheet, I found the mean name sum to be 62.49, and 96% of the names had sums less than 100. Of the 80 names with sums greater than 100, many (such as Christopher, Timothy, Gwendolyn, Jacquelyn) have shortened forms (Chris, Tim, Gwen, Jackie) for which the sum is less than 100.
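The nickname workaround is easy to confirm: each of the four full names mentioned has a sum over 100, while its shortened form fits comfortably within the app's 1 to 100 range. The dictionary pairs come from the post; the helper is hypothetical.

```python
def name_sum(name):
    """Letter-value sum with A = 1, ..., Z = 26 (non-letters ignored)."""
    return sum(ord(ch) - ord("A") + 1 for ch in name.upper() if "A" <= ch <= "Z")

# Full names with sums over 100, paired with the shortened forms from the post.
NICKNAMES = {
    "Christopher": "Chris",
    "Timothy": "Tim",
    "Gwendolyn": "Gwen",
    "Jacquelyn": "Jackie",
}

for full, short in NICKNAMES.items():
    print(f"{full}: {name_sum(full)} -> {short}: {name_sum(short)}")
```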

As it turns out, the frequency with which letters occur in first names differs from their frequency in common English words. The most common letter in English words is *e*, but the most common letter in names is *a*. The chart below shows the frequency with which letters occur in first names.

Because of this distribution, the average value of a letter within a first name is 10.54, nearly 3 less than the 13.50 you might expect if every letter were equally likely (the average of 1 through 26). This is because letters at the beginning of the alphabet, which contribute smaller values to the name sum, occur more often in names than letters at the end of the alphabet.
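The 13.50 baseline is just the mean of 1 through 26, and the observed figure is total letter value divided by total letter count across all names. A sketch of both calculations, using a tiny hypothetical name list rather than the 2,000-name sample:

```python
def average_letter_value(names):
    """Total letter value divided by total letter count across names
    (A = 1, ..., Z = 26; non-letters ignored)."""
    values = [ord(ch) - ord("A") + 1
              for name in names for ch in name.upper() if "A" <= ch <= "Z"]
    return sum(values) / len(values)

# If all 26 letters were equally likely, the expected value is (1 + 26) / 2.
uniform = sum(range(1, 27)) / 26
print(uniform)  # 13.5

# Hypothetical sample: 8 + 19 + 139 = 166 spread over 17 letters.
print(average_letter_value(["Abe", "Bob", "Christopher"]))
```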

The chart below shows the distribution for the number of letters within first names. The mean length of a first name is 5.92 letters, and the median is 6. (In the data set of 2,000 names from which this chart is derived, no name contained more than 11 letters.)
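A length distribution like this one can be tallied with `collections.Counter`. The name list below is a hypothetical stand-in for the SSA sample. As a rough consistency check on the post's own figures, a mean length of 5.92 letters times a mean letter value of 10.54 gives about 62.4, close to the reported mean name sum of 62.49.

```python
from collections import Counter
import statistics

def length_distribution(names):
    """Map each name length (letters only) to the number of names with that length."""
    return Counter(sum(1 for ch in n.upper() if "A" <= ch <= "Z") for n in names)

# Hypothetical sample, for illustration only.
names = ["Abe", "Bob", "Emma", "Liam", "Olivia", "Noah", "Sophia", "Christopher"]

dist = length_distribution(names)
lengths = list(dist.elements())   # expand counts back into individual lengths

print(sorted(dist.items()))
print("mean:", statistics.mean(lengths), "median:", statistics.median(lengths))
```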

Do you know a name that has more than 11 letters or has a name sum greater than 139 or less than 8? Let me know in the comments.