The pitfalls of incomplete statistical inquiry

In order to responsibly evaluate a player, we must look at the whole data set, and not just pick and choose the metrics that support our preformed biases

Daily Life In Warsaw Photo by Jaap Arriens/NurPhoto via Getty Images

When the Yankees traded for Rougned Odor last week, it spurred a whirlwind of discussion in the PSA Slack. First, we tried to make sense of the acquisition of a seemingly washed-up player signed to a prohibitive contract. When more details came out about the effectively zero cost to the team, conversation turned to speculation over what value, if any, he could provide the Yankees, given his frightening offensive decline and average-ish glove. Finally, some of us dreamed of a scenario where the same batting department that tapped the offensive potential in Luke Voit, Gio Urshela, and Mike Tauchman could work its magic on the previously power-hitting Odor.

Everyone who participated in the roundtable of sorts supported their reasoning with various metrics. The optimists brought up his consistent barrel rate and competent infield defense over the last few years, while detractors pointed out his plummeting average exit velocity and miserable on-base percentage. During the discourse, one piece of evidence, a tweet in this instance, caught my eye.

In the weeks prior to our Slack conversation, I had been mulling over writing about the importance of thoroughness and context when providing statistical evidence to evaluate a player. The tweet in question reignited that idea in my mind, as it is the perfect example of how an incomplete statistical picture can lead to a seriously flawed assessment. The author of this tweet likely intended to paint an image of Odor as one of the premier sluggers at the keystone. However, a fuller analysis reveals that he was a well below-average hitter in that time frame, even with the third-most extra-base hits among second basemen.

And so I would like to continue that discussion today, starting with a thought experiment. Consider the following players, each of whom makes four plate appearances per game:

Player A: Hits a home run once every other game, strikes out in all of his other PAs

Player B: Walks five times every two games, strikes out in all of his other PAs

Obviously, this is quite a ludicrous scenario: over a 162-game season, Player A would break Barry Bonds’ single-season home run record by eight, Player B would almost double Bonds’ single-season walk record, and both would surpass the single-season strikeout record held by Mark Reynolds. However, it is exactly this type of extreme example that illustrates the importance of context and a comprehensive body of data.
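For anyone who wants to check the arithmetic, here is a minimal sketch of the season totals. It assumes a 162-game season and reads "every other PA" as "every remaining PA," which is the only reading that fits into four plate appearances per game. The record values (73 homers and 232 walks for Bonds, 223 strikeouts for Reynolds) are supplied here for comparison, and the variable names are purely illustrative.

```python
GAMES = 162        # assumption: a full regular season
PA_PER_GAME = 4    # from the thought experiment

# Single-season records used for comparison
RECORD_HR = 73     # Barry Bonds, 2001
RECORD_BB = 232    # Barry Bonds, 2004
RECORD_SO = 223    # Mark Reynolds, 2009

season_pa = GAMES * PA_PER_GAME

# Player A: one home run every two games, a strikeout in every other PA
a_hr = GAMES // 2
a_so = season_pa - a_hr

# Player B: five walks every two games, a strikeout in every other PA
b_bb = 5 * GAMES // 2
b_so = season_pa - b_bb

print(f"Player A: {a_hr} HR ({a_hr - RECORD_HR} over the record), {a_so} SO")
print(f"Player B: {b_bb} BB ({b_bb / RECORD_BB:.2f}x the record), {b_so} SO")
```

Under those assumptions, both strikeout totals clear Reynolds' mark comfortably.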

You may be surprised to learn that both players have the exact same OPS for the season (.625). You can see, then, why citing OPS alone as evidence that the two players are identical is flawed; they’re basically polar opposites of each other!
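The identical OPS is easy to verify by hand or with a few lines of code. The sketch below works from each player's totals over two games (eight plate appearances) and uses the standard definitions of OBP and SLG, leaving out hit-by-pitches and sacrifice flies because neither player has any. The function name `slash_line` is just a label for this example.

```python
def slash_line(at_bats, hits, total_bases, walks):
    """Return AVG, OBP, SLG and OPS for a simple batting line.

    Hit-by-pitches and sacrifice flies are omitted because neither
    hypothetical player records any.
    """
    avg = hits / at_bats
    obp = (hits + walks) / (at_bats + walks)
    slg = total_bases / at_bats
    return {"avg": avg, "obp": obp, "slg": slg, "ops": obp + slg}


# Totals per two games (8 plate appearances each)
player_a = slash_line(at_bats=8, hits=1, total_bases=4, walks=0)  # 1 HR, 7 SO
player_b = slash_line(at_bats=3, hits=0, total_bases=0, walks=5)  # 5 BB, 3 SO

for name, line in (("Player A", player_a), ("Player B", player_b)):
    print(name, {stat: f"{value:.3f}" for stat, value in line.items()})
```

Player A gets to .625 almost entirely through slugging (.125 OBP plus .500 SLG), while Player B gets there entirely through on-base percentage (.625 OBP plus .000 SLG).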

So which player would you rather have? By wOBA, Player B (.431) is the easy choice over Player A (.243). I talked about this with Josh, and we concluded that context is the key element in this hypothetical. If you are a team with a slugger-stocked lineup, say the Yankees, you might opt for Player B, as his high on-base ability provides greater opportunity for run generation by his teammates. (Huh, this sounds kinda like Aaron Hicks...)

However, if you are a team with few productive bats, you need all the raw production you can get. So you would probably go with Player A, a guy who hits bombs but not much else. (Gary Sánchez anyone?)
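For the curious, the wOBA figures quoted above can be roughly reproduced with a simplified version of the formula. Since these two hitters only ever walk or homer, every other term drops out. The linear weights below (0.690 for a walk, 1.940 for a home run) are an assumption on my part, chosen to approximate the FanGraphs values for the 2019 season; weights change slightly every year, so treat this as a sketch rather than an official calculation.

```python
# Assumed linear weights, approximating FanGraphs' 2019 values
W_BB = 0.690   # unintentional walk
W_HR = 1.940   # home run


def simple_woba(plate_appearances, walks, home_runs):
    """Simplified wOBA for a hitter whose only positive outcomes are
    walks and home runs (no singles, doubles, triples, HBP, IBB or SF)."""
    return (W_BB * walks + W_HR * home_runs) / plate_appearances


# Totals per two games (8 plate appearances each)
woba_a = simple_woba(plate_appearances=8, walks=0, home_runs=1)
woba_b = simple_woba(plate_appearances=8, walks=5, home_runs=0)

print(f"Player A wOBA: {woba_a:.4f}")
print(f"Player B wOBA: {woba_b:.4f}")
print(f"Gap: {woba_b - woba_a:.4f}")
```

With these weights, the results land within rounding of the .243 and .431 cited above, which puts the gap just shy of 190 points.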

I think this is also a good illustration of how metrics and methods of player evaluation have evolved over time. In the past, one might look at Player A’s 125-point advantage in batting average and record-setting 81 home runs and declare him the superior player. But today, as evidenced by the nearly 200-point disparity in wOBA in Player B’s favor, it is clear that the ability to get on base is a much more coveted asset.

I think Domingo Germán’s 2019 season is the perfect template to highlight the evolution of player assessment. For a century, his 18-4 record alone might’ve been enough to earn him down-ballot Cy Young votes, and certainly enough to garner praise as an effective starter. But now, with a new set of tools at our disposal, we can begin peeling back the layers to reveal a more troubling picture. Germán allowed almost half a homer per game more than the league average, a trend that has reared its ugly head in the early going this year. He also received roughly two more runs of support per start than league average, so it’s easy to see how many of those wins could have turned into losses without the backing of his high-powered offense.

And this isn’t by any means designed to be a knock against the more traditional set of player statistics. I think they still have a place in the game. Things like RBIs and batting average with RISP tell the real-life story of outcomes at the game level. But they are largely ineffective for legitimate analysis of a player’s ability.

For centuries, physicians thought that blood-letting and trepanning were the best treatments for various ailments. Turns out that kills people. As methods become more refined, we adapt our ways of thinking. As Heraclitus stated more than 2,500 years ago: “Change is the only constant in life.” Baseball is not exempt from that truth, so why not embrace the change in our game and how we understand it?