Correlation between pirate populations and global warming. Image credit: Church Of The Flying Spaghetti Monster/Wikimedia
That’s one of the best illustrations of “correlation does not imply causation” I’ve ever seen. It’s even better than Stephen J. Gould’s discussion of the .400 batting average in Full House, mostly because while I get pirates, I’ll never really understand baseball.
This gives me the chance to mention a little something Gould pointed out about statistics: if you’re not seeing the full picture, you might be missing an important point. Take the apparent rise in complexity over the course of evolution. Those who note that life began with simple unicellular critters and now includes complex multicellular critters, and automatically jump to “ZOMG! Evolution = progression in complexity!”, miss a little something in the data: life started out pressed up against a minimum possible complexity, so as it diversified, the only direction it had room to spread was toward more complex forms.
This doesn’t mean evolution isn’t progressive – Richard Dawkins makes an excellent argument about that, while paddling Gould as only the British can do – but it does mean that it’s rather silly to rest your case on the fact that there are more complex critters than there used to be. Of course there are – just as a lot of drunks ended up in a sheep pasture:
Here’s an analogy to get the right model into your head. Imagine a busy bar that closes at 2am, and sends all the drunks out the door to walk home. Since ScienceBlogs was so unfair to our Australian readership last night, let’s imagine it is an Australian bar, and a million brain-blitzed Australian drunks spill out the door and start walking determinedly down the street. There are a few properties at play here. One is that this street happens to be paralleled on the right by a wall, so the drunks can’t stagger too far in that direction. Another is that on the left is a wide-open sheep pasture which provides no obstacle to their progress that way. A third is that they are all initially aimed straight down the street, but because they are drunk, they stagger every once in a while and veer off a few degrees to the left or the right, entirely by chance.
You’re hovering overhead in a helicopter. What do you think you will see?
The mob will proceed down the street, but as it goes, it will spread out gradually to the left. The majority will stagger right and left with equal frequency, and wobble roughly down the street. There will be a subset that will, by chance, stagger left a little more than to the right, and they’ll drift off into the sheep pasture. Some may veer more to the right than the left, but they’ll just bounce into the wall and get straightened out that way.
No drunk Australian has a preference to stroll into the sheep pasture. There is no intent to end up there. But some do, just by the odds. You, in your helicopter, can even look at the shape of the sprawling mob and make useful calculations about drunk Australian kinetics and make predictions about the aggregate trajectories of strolling drunkards, although you wouldn’t be able to predict the pattern of an individual drunk.
This is the general model for how size and complexity vary over time.
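For anyone who wants to watch the mob spread out without renting a helicopter, here is a minimal simulation sketch. It assumes a simplified version of the analogy: each drunk’s position is just their distance from the wall, every stagger is one unit toward or away from the wall with equal probability, and everyone starts pressed against the wall (Gould’s “simple starting point”). The function name drunkards_walk and its parameters are hypothetical, made up for this illustration. Nobody in the code prefers the pasture, yet the mean and the maximum distance drift out into it as the steps pile up.

```python
import random
import statistics


def drunkards_walk(n_walkers: int, n_steps: int, seed: int = 0) -> list[int]:
    """Return each walker's final distance from the wall after n_steps.

    Hypothetical model: position 0 means pressed against the wall; larger
    numbers mean farther out into the sheep pasture. Every step is +1 or -1
    with equal probability, so no walker "prefers" the pasture. A walker at
    the wall who staggers toward it simply stays at 0, because the wall
    blocks that direction.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    positions = [0] * n_walkers  # everyone starts right next to the wall
    for _ in range(n_steps):
        for i in range(n_walkers):
            step = rng.choice((-1, 1))  # unbiased stagger, left or right
            positions[i] = max(0, positions[i] + step)  # the wall stops walkers at 0
    return positions


if __name__ == "__main__":
    # Snapshot the crowd at a few points along the street.
    for steps in (10, 100, 400):
        pos = drunkards_walk(n_walkers=1000, n_steps=steps)
        print(
            f"after {steps:>3} steps: mean={statistics.mean(pos):.1f} "
            f"median={statistics.median(pos)} max={max(pos)}"
        )
```

The only asymmetry in the model is the wall. Remove the max(0, ...) clamp and the average position stays near zero, which is the whole point: the drift comes from the boundary, not from any preference for the pasture.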
Statistics will bite your butt if you don’t use ’em wisely. That’s why I recommend Full House, despite the fact that Dawkins blew some of Gould’s major arguments out of the water. That book is a wonderful primer on statistics, even if it does natter on and on and on about baseball. You’d also be wise to check in often with Efrique at Ecstathy, who regularly deconstructs woolly statistical arguments and shows you precisely how you’re being had.
We’re also going to have a little discussion about mean, median and mode. You’ll probably never forgive me for this:
Go to YouTube, type in “mean, median and mode,” and realize it could’ve been much worse. Much worse.
The whole point of that obnoxious little video was to show you the difference between the three, in case you didn’t already know (or didn’t bother to remember). I’ve also just provided you with ammunition to use against people who won’t provide their data sets, or who won’t tell you if the number they’re so proud of is the mean, median or mode. Threaten to make them watch this video until they give in. Don’t worry, it’s not against the Geneva Conventions – yet.
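If you’d rather skip the video, Python’s standard statistics module computes all three. The income figures below are invented purely for illustration (they aren’t from any real data set); the single large value shows why it matters which number someone is quoting.

```python
import statistics


def summarize(values):
    """Return the mean, median and mode of a list of numbers."""
    return (
        statistics.mean(values),    # arithmetic average: sum / count
        statistics.median(values),  # middle value once sorted
        statistics.mode(values),    # most frequently occurring value
    )


# Hypothetical data, invented for illustration: eight household incomes in
# thousands of dollars, with one very large value that drags the mean upward.
incomes = [30, 30, 35, 40, 45, 50, 60, 400]

mean, median, mode = summarize(incomes)
print(f"mean={mean}, median={median}, mode={mode}")  # mean=86.25, median=42.5, mode=30
```

Same eight numbers, three very different “averages”: someone bragging about 86 grand and someone lamenting 30 grand could both be telling the truth.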
George at Decrepit Old Fool posted a thought-provoking video as I worked on this post, and I think it (and he) makes an important point:
I think putting statistics ahead of calculus is a good idea. Sure, calculus is important for engineering and advanced business courses. But statistics is key to allocating limited resources (in health care, for example), to mitigating risk, to epidemiology, even to understanding the environment – to lots of stuff. It would be useful to a huge section of the public.
As Arthur Benjamin said, “In summary, instead of our students learning about the techniques of calculus, I think it would be far more significant if all of them knew what ‘two standard deviations from the mean’ means, and I mean it.”
That’s actually pretty simple:
The standard deviation is the root mean square (RMS) deviation of the values from their arithmetic mean. For example, in the population (4, 8), the mean is 6 and the standard deviation is 2. This may be written: (4, 8) ≈ 6±2. In this case 100% of the values in the population are within two standard deviations of the mean.
See how easy that is?
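Here’s the same (4, 8) example worked in a short Python sketch, alongside the standard library’s statistics.pstdev for comparison. The helper names rms_deviation and fraction_within are hypothetical, written just for this illustration. The last line shows where the familiar “about 95%” figure comes from, assuming the data follow a normal (bell-curve) distribution.

```python
import math
import statistics


def rms_deviation(values):
    """Population standard deviation: root mean square of the deviations from the mean."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))


def fraction_within(values, k=2.0):
    """Fraction of the values that lie within k standard deviations of the mean."""
    mu = sum(values) / len(values)
    sigma = rms_deviation(values)
    return sum(abs(v - mu) <= k * sigma for v in values) / len(values)


population = [4, 8]
print(rms_deviation(population))       # 2.0
print(statistics.pstdev(population))   # 2.0, the standard-library equivalent
print(fraction_within(population, 2))  # 1.0, i.e. 100% within two standard deviations

# For a normal (bell-curve) distribution, the share of values within two
# standard deviations of the mean is erf(2 / sqrt(2)), roughly 0.9545.
print(math.erf(2 / math.sqrt(2)))
```

One gotcha: statistics.stdev (no “p”) divides by n - 1 to estimate the spread of a larger population from a sample, so for (4, 8) it gives about 2.83 rather than 2. That’s exactly the kind of detail worth asking about when someone waves a standard deviation at you.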
Statistics can be meaningless, and they can lie – but with a little savvy on our part, we can tell the difference between the legit stuff and ZOMG GLOBAL WARMING’S CAUSED BY NOT ENOUGH PIRATES!!1!11! Not that we’d ever fall for something that obviously silly, right?
Now if you’ll excuse me, I’m off to save the planet.