On Michael Fishman and the analytics dilemma

Are the Yankees too reliant on data?

By now you’ve probably read Joel Sherman’s interview with Michael Fishman, the Yankees’ assistant GM and head of analytics. It’s conducted with the typical Sherman approach — a narrative in search of evidence — but I think there are a few interesting tidbits in Fishman’s answers that lean into the broader conversation we have about advanced analytics.

Before we can get into that, though, we have to acknowledge that Sherman, like a lot of fans, doesn’t seem to understand what the term “analytics” entails: making decisions based on data, any data. Sometimes, analytics is proprietary software projecting that a player with x exit velocity and y-to-y1 launch angle will hit z OPS over a set number of games, so that player should be in the lineup more often than another player. Of course, basing a player’s value on his batting average is an analytical decision too; you’re still using data to drive your decisions, albeit data that is less useful or less predictive.

The predictive value, along with the margin for error, is really what counts in data-driven decisions. FIP or Statcast data is more predictive of a player’s future performance than batting average or ERA, so if you’re building a roster for an upcoming season, those metrics will, most of the time, give you a better picture of how a player will do. The greater the level of granularity you can get, the better the predictive picture is as well. High-speed video and motion-capture tech allow you to isolate particular muscles and joints, and to see flaws in a swing or delivery that are much harder to notice when you’re a scout sitting 100 feet away on the third-base side.

However, all of these tools come with a margin of error, and that’s where I think people get a lot wrong with “analytics.” The “analytics” don’t say that Gleyber Torres will hit a certain slash line over 162 games; the projections just take a set number of inputs and generate a median output with error bars. It’s up to the person running the projection to decide which inputs are appropriate, and how much margin for error is tolerable.
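To make the "median output with error bars" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `project_ops` function, the 5/4/3 weighting of past seasons, the noise level, and the hitter's numbers are all made up, and no team's real projection system is this simple. The point is only that a person picks the inputs, and the output is a range rather than a single number.

```python
import random
import statistics

def project_ops(past_ops, weights, noise_sd, n_sims=10_000, seed=0):
    """Return (10th percentile, median, 90th percentile) of simulated season OPS."""
    # Judgment call 1 (assumed): how much each past season counts toward the baseline.
    baseline = sum(o * w for o, w in zip(past_ops, weights)) / sum(weights)
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    # Judgment call 2 (assumed): how much season-to-season noise to allow for.
    sims = sorted(rng.gauss(baseline, noise_sd) for _ in range(n_sims))
    return sims[n_sims // 10], statistics.median(sims), sims[(9 * n_sims) // 10]

# Hypothetical hitter: OPS in his three most recent seasons, newest first.
low, median, high = project_ops([0.820, 0.780, 0.850], weights=[5, 4, 3], noise_sd=0.060)
print(f"median {median:.3f}, 80% band {low:.3f} to {high:.3f}")
```

Change the weights or the assumed noise and the band moves, even though the underlying "spreadsheet" is exactly the same.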

That point about the inputs I think is pretty telling, and Fishman gets into it in the interview:

Effectively, Fishman is saying that although all teams use analytics, the inputs that are considered appropriate can vary between teams. The Astros, for example, put more weight on pure contact ability than a lot of other teams, and so when they place a specific value on a player — an AstrosWAR, if you will — they’re going to add value for guys that produce more contact on the whole, and that plays a role in who they sign, who they retain, who gets the most at-bats, etc.

Contrast that with a team like the Rays, who at least over the past two years haven’t valued contact as much. The relative weight of that input is very different for the two teams, and you see this borne out in performance — the Astros have the lowest strikeout rate in baseball, and the lowest whiff rate, while Tampa has the third-highest in both categories. So we have two teams, both extremely data driven, weighting an input completely differently, and thus, the output produced by each team is also different.
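A toy version of that weighting argument, sketched in Python. The two hitters, their skill grades, and both weight sets are invented for illustration and are not either team's actual model; the sketch only shows that the same data, run through the same formula, ranks players differently once a person changes the weights.

```python
def player_value(skills, weights):
    """Weighted sum of skill grades; the weights are the team's judgment call."""
    return sum(skills[name] * weights[name] for name in weights)

def rank_players(players, weights):
    """Return player names ordered from most to least valued."""
    return sorted(players, key=lambda p: player_value(players[p], weights), reverse=True)

# Hypothetical skill grades on a 0-to-1 scale.
players = {
    "Contact Hitter": {"contact": 0.9, "power": 0.4, "discipline": 0.6},
    "Power Hitter": {"contact": 0.4, "power": 0.9, "discipline": 0.6},
}

# Two hypothetical front offices looking at identical data.
contact_heavy = {"contact": 0.5, "power": 0.2, "discipline": 0.3}
power_heavy = {"contact": 0.2, "power": 0.5, "discipline": 0.3}

print(rank_players(players, contact_heavy))
print(rank_players(players, power_heavy))
```

Neither ranking is "what the analytics say." Each is what the analytics say once someone has decided how much contact is worth.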

If there’s one critique of Fishman and the Yankees broadly, it comes from this quote:

The goal over 162 games is to register as positive a run differential as possible, and I have no problems with the front office setting that as the main objective of the season. However, the way that they’ve chosen to build the best possible run differential is open to criticism: there are redundancies on the roster, with a number of players bringing the exact same skill set. Acquiring Giancarlo Stanton was a great decision, and the Yankees certainly made that trade with the understanding that Stanton added more to that run differential than the alternative options. But Stanton’s skill set is similar to that of other hitters in the lineup, so adding him didn’t bring in a player with a more diverse approach.

Thus, if an external factor changes the game, like a different baseball, the roster doesn’t have a diverse enough set of skills to adapt, which is one reason, I think, why the offensive struggles have been team-wide. Again, as we keep saying, the problem isn’t the Excel spreadsheet; it’s the inputs the team is valuing.

There’s also valid criticism about a sense of urgency. The Yankees are built with a long-term focus: the belief that over 162 games and the playoffs, the players will play to their median projections and be successful. This means that they’re not going to change course because of a bad June. Sample size is critical for evaluating metrics, but the fetishization of sample size can also leave you blind to legitimate externalities like, again, a different baseball or a rival team arriving a year early.

Lastly, what the article doesn’t explore, and partially I think because it doesn’t fit Sherman’s pre-determined narrative, is how the team communicates the data. Baseball players are, well, baseball players. They have the career they have because they are very good at hitting baseballs or throwing baseballs. They are not computer scientists because they are better at baseball than computer science.

Therefore, the biggest challenge is getting players to understand and buy in to the conclusions you draw from the data. Private sector coaches are great at this; someone like Doug Latta has rebuilt swings from the ground up by identifying holes in a hitter’s mechanics and properly conveying the fix. Sometimes the Yankees seem like they can do that (Marcus Thames corrected Gary Sánchez’s swing just this year, and Matt Blake has certainly had a hand in turning Jonathan Loáisiga into one of the best relievers in baseball), but sometimes they can’t.

Analytics are the bogeyman that fans are frightened of, and they always have been. In the 1920s, hitters began to adopt a more uppercut-focused swing to engineer more fly balls and more home runs, a more efficient way to score (hey, when have we heard that before?), and people panicked that the “soul” or “strategy” of the game was disappearing. As we’ve developed more and more sophisticated ways to analyze the game, that panic has increased in volume.

But “the analytics!” isn’t the problem. A computer isn’t setting the Yankee lineup. A combination of inputs, the valuing of which is done by a person, returns an output that is then interpreted by a person. If there’s a fault in the Yankee organization, that’s where it lies — in the valuing, interpretation, and communication, not in the desire to have data-driven decisions.