

Words.

lolbanez

Mood Music - Frankenstein by The Edgar Winter Group

Frankenstein doesn't have words. I was talking about the rest of this post, which does. My wordless posts are instrumentals?

Obvious flaws with the RBI statistic:

- Outside of solo home runs, scoring a run is a team effort, but an RBI is attributed to a single player.
- Players get different numbers of opportunities to drive in runs.
- Situational hitting is a specious concept. For an individual batter, the share of his hits that happen to come with runners on base is influenced at least in part by luck.

For reasons like these, it's usually best not to give much weight to an RBI total if you're looking for a fair way to compare or evaluate players. Still, the stat is based on a worthwhile concept: doing things at the plate that cause runs to score should be the batter's goal every time up. The approach will still be limited in scope, but it should be possible to fill some of the holes in the RBI concept simply by normalizing for the opportunities each batter gets.

Using Tom Tango's run expectancy matrix, we can find an average value for each batting situation. Example: With runners on first and second and one out, the expected value of runs scored in the inning is 0.963 runs. Last year, Alex Rodriguez came to the plate in that situation twenty times.
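To make the lookup concrete, here is a minimal sketch of how a run expectancy matrix can be stored: every plate appearance starts in one of 24 base-out states (8 base configurations times 3 out counts), and the matrix maps each state to the runs expected over the remainder of the inning. Only the 0.963 cell quoted above is filled in; the names `RUN_EXPECTANCY` and `expected_runs` are hypothetical, and the other 23 values would have to come from Tango's published matrix, which is not reproduced here.

```python
from itertools import product

# A base state is a tuple of three booleans: (runner on first, second, third).
BASE_STATES = list(product((False, True), repeat=3))
OUTS = (0, 1, 2)

# Every plate appearance starts in one of 8 x 3 = 24 base-out states.
ALL_STATES = [(bases, outs) for bases in BASE_STATES for outs in OUTS]

# Hypothetical storage for the matrix: runs expected in the rest of the inning,
# keyed by (bases, outs). Only the cell quoted in the post is filled in; the
# remaining 23 values would come from Tom Tango's published matrix.
RUN_EXPECTANCY = {
    ((True, True, False), 1): 0.963,  # runners on first and second, one out
}


def expected_runs(bases, outs):
    """Look up the expected runs for the rest of the inning in one state."""
    return RUN_EXPECTANCY[(bases, outs)]


print(len(ALL_STATES))                        # 24
print(expected_runs((True, True, False), 1))  # 0.963
```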

The obvious thing to do here is to give each plate appearance a weight based on its situation, add those weights up into an expected value for the batter's opportunities, and divide his total RBIs by it. So, let's do that. Here's A-Rod again:

- There is no real difference between RBI opportunities with zero, one, or two outs unless there is a runner on third with fewer than two outs. So, if there was no runner on third, I used the numbers associated with two outs. If there was a runner on third, I split the situations by the number of outs. But it makes little difference whether there are zero outs or one out with a runner on third (no RBI is credited on a run-scoring double play), so both can be lumped into the one-out group.

- What is being modeled is the expected runs scored in the entire inning, not just the runs driven in by one batter, so everyone is going to fall short of his expectation. What's important, however, is not the number itself but how players stack up against one another.
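The out-collapsing rule and the per-batter sum described in these notes can be sketched as follows. The function names and the toy weights are hypothetical and chosen only to make the arithmetic easy to follow; they are not values from Tango's matrix.

```python
# Hypothetical sketch: bases = (first, second, third) booleans, outs in {0, 1, 2}.

def collapse_outs(bases, outs):
    """Map a base-out state to the matrix row whose weight is used."""
    runner_on_third = bases[2]
    if not runner_on_third:
        return bases, 2  # no runner on third: outs don't matter, use the two-out value
    if outs == 2:
        return bases, 2  # runner on third with two outs
    return bases, 1      # runner on third with zero or one out: lumped into one out


def expectation(plate_appearances, run_expectancy):
    """Sum the weight of every plate appearance a batter had."""
    return sum(run_expectancy[collapse_outs(bases, outs)]
               for bases, outs in plate_appearances)


# Toy weights for illustration only; NOT Tango's values.
toy_re = {
    ((False, False, False), 2): 0.25,
    ((True, True, False), 2): 0.5,
    ((False, False, True), 1): 1.0,
}
pas = [
    ((False, False, False), 0),  # bases empty, no outs -> two-out row
    ((True, True, False), 1),    # first and second, one out -> two-out row
    ((False, False, True), 0),   # runner on third, no outs -> one-out row
]
print(expectation(pas, toy_re))  # 1.75
```

Collapsing states before the lookup means only a handful of matrix rows are ever used, which matches the observation that outs only change the RBI opportunity when a runner is on third.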

Doing this for every batter:

The Expectation is what we calculated above by giving a weight to each plate appearance. Leverage is Expectation divided by PA and measures the relative weight of each player's trips to the plate. Players who often came to the plate with runners on base and in good RBI situations will have higher Leverage values.
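Leverage, and the RBI-to-Expectation ratio the post calls Run Production, are both simple divisions once a batter's Expectation is known. Below is a small sketch with a hypothetical `BatterLine` record and two made-up batters (not real Yankees stat lines), ranked by Run Production since the ordering is what matters rather than the raw values.

```python
from dataclasses import dataclass


@dataclass
class BatterLine:
    name: str
    pa: int             # plate appearances
    rbi: int            # runs batted in
    expectation: float  # sum of the per-PA weights

    @property
    def leverage(self) -> float:
        """Average weight of a trip to the plate: Expectation / PA."""
        return self.expectation / self.pa

    @property
    def run_production(self) -> float:
        """Runs driven in relative to opportunity: RBI / Expectation."""
        return self.rbi / self.expectation


# Hypothetical batters with made-up numbers, for illustration only.
batters = [
    BatterLine("Batter B", pa=500, rbi=60, expectation=300.0),
    BatterLine("Batter A", pa=600, rbi=100, expectation=400.0),
]

# Rank by Run Production, highest first.
ranked = sorted(batters, key=lambda b: b.run_production, reverse=True)
for b in ranked:
    print(f"{b.name}: leverage={b.leverage:.3f}, run_production={b.run_production:.3f}")
```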

Run Production is RBI divided by Expectation and measures how many runs a batter drove in relative to his opportunities, which was the point of this whole thing. Thoughts:

- Say what you will about Mark Teixeira last season. It's hard to hit 39 home runs on a team that gets a lot of people on base and not make a big contribution to scoring runs.

- Guys who spent a lot of time hitting in the middle third of the lineup (A-Rod, Cano, Swisher) had the best opportunities to produce runs. Switching Robinson Cano and Mark Teixeira in the batting order would make essentially no difference on the field: the better hitter (Cano) would bat slightly more often, but his average trip to the plate would be slightly less important.

- There's still a systematic bias due to batting order, because the runners on base ahead of each hitter are not equally fast. A runner on second with Alex Rodriguez at the plate was likely to be Derek Jeter or Curtis Granderson. A runner on second with Brett Gardner at the plate was likely to be Nick Swisher or Jorge Posada. Jorge Posada thinks that scoring from second on a single is how the devil gets inside you.