Interpreting stats: regression to mean vs regression towards a mean

Disclaimer: If you love stats, keep reading. If you hate stats, then you can check out any of Lord Duggan's posts about how Mark Teixeira is goofy with semi-pictorial representations. If you like representations of bunnies using only keyboard characters, then just scroll to the bottom and ignore everything that I wrote.

So let me give you a scenario: I'm flipping a coin ten times. First, I get 9 heads and 1 tail. I just outperformed expectations by 40 percentage points, 90% heads against an expected 50% (or underperformed by the same margin if you're a pessimist or have a tail fetish). What do the odds suggest about my number of heads next time? I think we can all agree that it is still five.


Frankly my dear, I just don't give a damn. Let’s say I haven’t flipped the second set. I have well overperformed the expected outcome of 50%. Obviously there is going to be a regression towards the mean (note the use of TOWARDS, not TO).

So stats are going to say that I’m going to get 5 heads next time. If I do, I’m now at an average of 7 heads per set. See how that’s closer to the mean? (My average is going towards the mean.) I get five again: my average is now 6.33. Another 5 sets me at an average of 6. You get the picture: I’m going towards five. Given a large enough sample, I’ll end up at just about 5.
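If you want to check my arithmetic, here is a minimal Python sketch of that running average. The function name `running_averages` is just something I made up for illustration, and the list assumes I really do get exactly 5 heads in every set after the first, which is the expected value and not a guarantee.

```python
from itertools import accumulate


def running_averages(heads_per_set):
    """Return the average heads per set after each set is added."""
    totals = accumulate(heads_per_set)  # running total of heads
    return [total / n for n, total in enumerate(totals, start=1)]


# The 9-heads set, followed by three sets that land exactly on expectation.
observed = [9, 5, 5, 5]
print([round(avg, 2) for avg in running_averages(observed)])
# → [9.0, 7.0, 6.33, 6.0]
```

Notice that the average never drops below 5 here. It only creeps towards it.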

Two things to note about this:

1) This is what we’re commonly introduced to as a “small sample size” bias, that is, looking at the first set and suggesting that I’m really good at getting heads. Data sets will regress to the mean in large enough batches.

2) More importantly, note how the 9 heads were not indicative of how I would perform in the future, because the odds of getting heads on a fair coin are 50%. Rather, in a small-to-medium sample (e.g. a single season out of a whole career), the numbers will only regress towards the mean, unless something freak happens like I flip 9 tails the next time around (improbable, but plausible).
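For the skeptics, both points can be poked at with a quick simulation. This is only a sketch: the helper `simulate_future_sets`, the seed, and the choice of 1,000 future sets are all my own assumptions, and it assumes a perfectly fair coin.

```python
import random


def simulate_future_sets(num_sets, flips_per_set=10, seed=0):
    """Flip a fair coin in sets and return the heads count for each set."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.random() < 0.5 for _ in range(flips_per_set))
        for _ in range(num_sets)
    ]


first_set = 9  # the hot start from the scenario above
future = simulate_future_sets(1000)

# The coin has no memory: future sets hover around 5, not around 1.
future_mean = sum(future) / len(future)

# Folding the hot start into a big sample drags the overall average near 5.
overall_mean = (first_set + sum(future)) / (1 + len(future))

print(f"future sets average:  {future_mean:.2f}")
print(f"overall average:      {overall_mean:.2f}")
```

The future sets do not know about the 9 heads, so they do not "make up" for it. The overall average just gets diluted.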

So how in the name of Moses does this relate to baseball? Every single statistic. BABIP, of course! When you're reading these really good posts on BABIP and BABIP w/ RISP and you come across something like “Bud Selig needs botox! Also, did you know Mark Teixeira has a BABIP w/ RISP .107 lower than his career?”

this is the wrong way to interpret it:

“Hooray! In the rest of the season, Tex is going to hit .107 higher than his career to balance it out!”

False. Go play golf on your day off. This is the correct way to interpret that:

“Hooray/Meh/ (O_o) ! Teixeira will probably hit his career average going forward, which will make it look like he’s regressing towards the mean, but probably won’t get all the way back without tweaking something”
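To put rough numbers on the two interpretations, here is a small sketch. Everything except the .107 gap is hypothetical: the .300 career mark, the 100 at-bats so far, and the 100 at-bats remaining are made-up round numbers, and the function names are mine.

```python
def blended_average(observed_avg, observed_n, rest_avg, rest_n):
    """Weighted average of what already happened and what happens next."""
    total = observed_avg * observed_n + rest_avg * rest_n
    return total / (observed_n + rest_n)


def rest_avg_needed(target_avg, observed_avg, observed_n, rest_n):
    """Average required over the rest of the sample to finish at target_avg."""
    return (target_avg * (observed_n + rest_n) - observed_avg * observed_n) / rest_n


CAREER = 0.300           # hypothetical career mark, NOT a real Teixeira number
SLUMP = CAREER - 0.107   # the .107 gap is the only figure taken from the post
N_SO_FAR = 100           # hypothetical at-bats already in the books
N_REST = 100             # hypothetical at-bats still to come

# Correct reading: he hits his career mark from here on out.
final = blended_average(SLUMP, N_SO_FAR, CAREER, N_REST)

# Wrong reading: he "balances it out" and finishes right at his career mark.
needed = rest_avg_needed(CAREER, SLUMP, N_SO_FAR, N_REST)

print(f"finishes at {final:.4f} if he hits {CAREER:.3f} the rest of the way")
print(f"would need {needed:.3f} the rest of the way to finish at {CAREER:.3f}")
```

With these made-up inputs he finishes at roughly .2465, so he closes half of the gap without ever outhitting his career mark. Getting all the way back would take about .407 the rest of the way, which is the "balance it out" fantasy.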

Of course, this is all contingent upon the assumption that the players have not totally forgotten how to hit baseballs and that, given a solid sample size for their careers, we know what we are getting from them. It also assumes they are not aging dramatically. Or getting traded to the Padres.

Thank you for reading. Here is your bunny. Please be quiet as he is sleeping.


( -_- )

(“) (“)

Oh no, you woke him up.


FanPosts are user-created content and do not necessarily reflect the views of the writing staff of Pinstripe Alley or SB Nation.

