
What does ‘analytics’ mean anyway?


An epistemological discussion on baseball’s culture war.

I’m going to give you two players, both of whom play first base, and I want you to tell me who you’d rather have starting every day for the New York Yankees.

Player A

Player B

Well, that’s not very helpful, is it? You have no information; you’re flipping a coin between two players and hoping for the best. Let’s add some information.

Player A is a right-handed hitter. Player B is a left-handed hitter. Is that a little more helpful? We have a little more information, and we can add it to what we already know: Yankee Stadium is generally friendlier to left-handed hitters, for example. That’s going to influence our decision on who starts. But what if we had even more information?

Player A had a .948 OPS last year. Player B had a .496 OPS. We can now add that to what we already know, and all of a sudden it looks like Player A is the guy to start. Congratulations, you’ve just used analytics to make a baseball decision.

That’s it. That’s analytics. John Smoltz hasn’t fallen out of the sky. That’s all analytics is. You use information to help you make a better decision, or at least what you think will be a better decision. As we added more information to our question of Luke Voit (Player A) or Mike Ford (Player B), we were able to make a pretty good call, at least based on the best information we had at the time: Voit is the better choice to start at first.

Yet if you have followed Baseball Discourse throughout the playoffs, you would think that analytics were some sort of phantom roaming the country, an Angel of Baseball Death descending upon Globe Life and Petco. In so many ways, analytics has become a catch-all for Everything I Personally Don’t Like About Baseball, but it’s not that, and never has been.

We pine for more information, especially in baseball. We don’t anoint an Opening Day starter because he looks like a pitcher, but because most of the information we have suggests he’s the best pitcher. There is perhaps no better example of this than the best free-agent pitcher available this winter, Trevor Bauer, whom so many people in both the old-school and new-school camps want the Yankees to pursue.

Wild Card Round - Cincinnati Reds v Atlanta Braves - Game One. Photo by Todd Kirkland/Getty Images

Bauer has been adamant that he is not a good athlete (read his criticism of his own athleticism in The MVP Machine). He isn’t particularly fast or particularly strong, and he can’t jump very high; in his own words, the only above-average athletic trait he possesses is strong hand-eye coordination. Yet he was one of the most accomplished college pitchers in recent history, has carved out a very good major league career for himself, and is set to land a significant free-agent contract based on that track record.

If it weren’t for analytics, if Bauer didn’t have access to better and better information that drove the decisions he’s made, he never would have made it to UCLA, much less put together a Cy Young-caliber season in 2020. That’s hard for the hardline opponents of analytics to square, but this column isn’t for those people, because they’ve never been intellectually honest.

What I do want to accomplish here is a more honest conversation around analytics, their strengths and weaknesses. Those weaknesses exist. Information overload, for example, is a very real risk:

This is from Game Six of the World Series, after Nick Anderson was brought in; the shot came immediately after Mookie Betts scored on a fielder’s choice to put the Dodgers up 2-1. It shows pitch-specific, count-specific location and swing trends for right-handed hitters against Anderson. When he’s brought in, Dodger hitters can see for themselves what pitch and location Anderson favors in any given count, and where hitters tend to swing and miss against him.

Of course, the reverse information is available too: every Dodger hitter has a specific swing plane that gives him a better chance of hitting a pitch in a specific location, or maybe he picks up sliders better than changeups, and so on. It’s entirely possible to end up in a tornado of information, where so much data is available that it’s impossible to isolate the three or four pieces that really matter.

All of this also only really works in the aggregate. You make a decision to swing at a pitch, or to put a certain pitcher in the game, knowing that it should work most of the time. Putting in a particular pitcher might be expected to work out 70% of the time, but the 30% of the time it doesn’t is going to haunt you. These are the risks you take in embracing more information, but by rejecting information and going with your gut/eyes/whatever body part is apparently allowed to make decisions, you assume risk as well.
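To make the 70/30 idea concrete, here is a minimal Python sketch. It assumes, purely for illustration, that every use of a move carries the same independent 70% chance of success (the hypothetical figure from the paragraph above). Real pitching outcomes are not independent coin flips, and the function names are made up for this example.

```python
# Hypothetical illustration: a move assumed to work 70% of the time,
# with each use treated as independent (a simplifying assumption).
P_SUCCESS = 0.70


def chance_all_succeed(p: float, uses: int) -> float:
    """Probability that `uses` independent tries, each succeeding with probability p, all succeed."""
    return p ** uses


def expected_failures(p: float, uses: int) -> float:
    """Expected number of failed tries out of `uses` independent tries."""
    return (1 - p) * uses


for n in (1, 2, 3):
    # Prints 0.700, 0.490 and 0.343 for one, two and three uses.
    print(f"{n} use(s): chance every one works = {chance_all_succeed(P_SUCCESS, n):.3f}")
```

Under these assumptions, a move that works 70% of the time has only about a one-in-three chance of working three straight times, so a single failure in a short series says little on its own about whether the decision was sound.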

I didn’t want to make this all about the Blake Snell decision, but it’s impossible not to comment on it. Obviously, taking Snell out of Game Six did not work for the Rays. The pitcher they brought in was Nick Anderson; he pitched poorly, the Rays lost, Series over. Kevin Cash was roundly criticized because Snell was ‘dealing’, and the analytics decided to remove him from the game, or something.

Of course, that exact strategy had worked for the Rays in Game Two. Snell held the Dodgers hitless through four innings, striking out seven batters along the way, and was, as we say, ‘dealing’. He even started the fifth ‘dealing’, getting a groundout and a strikeout. And then, all of a sudden, he wasn’t ‘dealing’ anymore: a walk, home run, walk and single to the next four batters, and what was a comfortable 5-0 lead became a much closer 5-2 game.

Now maybe you say that Kevin Cash was right to let Snell try to work out of it, that you don’t pull a pitcher as good as Snell at the first sign of trouble, and that’s fine, but that approach objectively didn’t work. The Rays gave up runs, and against a team like the Dodgers, you never want to let them back in the game. Nick Anderson had to be brought in; he got the Rays out of the jam, and the Rays won the game.

This strategy also worked well in Game Seven of the ALCS against Houston. Charlie Morton was ‘dealing’ through 5.2 innings, throwing just 66 pitches, and was lifted after surrendering a walk and a hit. As we keep seeing, Nick Anderson was brought in, got the Rays out of the jam, and they won the game.

Indeed, the reliance on Nick Anderson really showed that the problem with analytics isn’t the existence of information, but the way it can be interpreted. Anderson was gassed, by his own admission, all postseason. He threw nearly as many innings in October (14.2) as he did during the entire regular season (16.1), and his strikeout rate fell by two-thirds (!!!!), his walk rate rose, and he allowed much harder contact.
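Innings-pitched figures like 14.2 and 16.1 use baseball’s box-score convention, where the digit after the point counts outs (thirds of an inning), not tenths. A minimal Python sketch converts them to total outs so the two workloads can be compared directly; the function name is hypothetical and only the two innings figures come from the text.

```python
def innings_to_outs(ip: str) -> int:
    """Convert innings-pitched notation to total outs.

    The digit after the point counts outs, not tenths:
    "14.2" means 14 full innings plus 2 outs.
    """
    whole, _, partial = ip.partition(".")
    extra_outs = int(partial) if partial else 0
    if extra_outs not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip!r}")
    return int(whole) * 3 + extra_outs


postseason = innings_to_outs("14.2")  # Anderson, 2020 postseason (from the text)
regular = innings_to_outs("16.1")     # Anderson, 2020 regular season (from the text)
# Prints "44 outs vs. 49 outs (90%)"
print(f"{postseason} outs vs. {regular} outs ({postseason / regular:.0%})")
```

Measured in outs, Anderson’s October workload came to roughly 90% of his entire regular-season total, packed into about a month.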

Whether he was hurt or simply worn out from pitching so much over the past two years, there’s a strong analytical case to be made that the decision to go to Anderson, instead of Pete Fairbanks or Diego Castillo, was the real mistake in Game Six. Yet, as we said above, going to Anderson first out of the pen had already worked twice for Cash. What we don’t know is why Cash went to him. Did he put more faith in two years of data on Anderson than in the playoff data, which indicated Anderson was no longer the same pitcher? Or was he going with his gut that this guy had come up big for him twice in the last week, and trusting him to do it again?

2020 World Series Game 1: Los Angeles Dodgers v. Tampa Bay Rays. Photo by Kelly Gavin/MLB Photos via Getty Images

Ironically, if it’s the latter, then many people are criticizing Cash for doing what they want managers to do more of! Go with your gut, trust your guys, they’ll scream. If Nick Anderson was Cash’s guy, which is at least plausible, he did what so many of the Alex Rodriguez types wanted him to do, and it didn’t work. Cash either interpreted the data wrong, or he went with his gut, or... this was simply one of the 30% of cases where the move doesn’t work.

That 30%, that risk, that uncertainty, can never be eliminated. Mariano Rivera blew saves sometimes, even in the playoffs, even in the World Series. All the data in the world would suggest that in the postseason, there is no pitcher you’d rather have on a per-inning basis than Mo. Some of the time, that just didn’t work.

The use of analytics has often been criticized for taking the human element out of the game, which can be a fair criticism, except that the risk is already baked into analytically based decisions. There are error bars on everything, and part of those error bars is the human element.

More information should lead us to make better decisions. Using OPS at the start of this post helps us agree that Luke Voit should start over Mike Ford. One of the reasons Mookie Betts was drafted, and the Red Sox took a chance on an otherwise undersized, unconventional player, is that the team had literal brain scans of their possible picks, and the information taken from those scans indicated that Betts might be able to recognize pitches and react to them more quickly, and more effectively, than other prospects. In both cases, having more information led to better decisions for the team.

There’s always risk in acquiring information. You can get lost in the data, you can make a judgement call with wider error bars. But staring at a starter from the dugout, not knowing if he’s losing velocity or spin pitch to pitch, and simply saying “he’s dealing, so he’s going to continue dealing” has error bars too. In a game where an eighth of an inch along the surface of a bat is the difference between a home run and an easy out, the real question is, what’s going to make those error bars narrower?