On May 1, 2018, the Los Angeles Dodgers looked like a crumbling franchise, the Arizona Diamondbacks were within a game of the best record in baseball, and the New York Mets showed signs of their 2015 National League champion selves. Oh, and on top of all that, Didi Gregorius was running away with the American League MVP award.
Fast-forward to the end of the season, and none of that happened, or even came close to happening.
Events in the month of April get magnified. At this point in the season, it is the only data we have, so we scrutinize it more than any other month of the baseball year, save perhaps October. Even so, baseball writers have long stressed the danger of small sample sizes when it comes to player performance. They remind fans not to worry about a star struggling out of the gate, as Giancarlo Stanton did last season, and not to get too excited over players who look like they are breaking out, such as Gio Urshela this year.
When it comes to teams, however, this caution gets thrown out the window. To use an example from this season, many analysts have talked about Boston’s poor start, claiming that a 15-18 record has dug the team into a hole that is impossible to climb out of. Is there a reason for this apparent double standard? Are teams somehow less prone to small-sample noise? Does the month of April somehow set the tone for the season?
To test this, I took a random sample of 57 team seasons from the last 10 years, chosen with a random number generator, and plotted each team’s April winning percentage against its final winning percentage. I then fitted a trend line.
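For readers who want to try something similar, here is a minimal Python sketch of the sampling and trend-line steps. It is not the exact procedure behind the graph: the function names, the `seed` parameter, and the (April winning percentage, final winning percentage) pair layout are assumptions made for illustration, and the trend line is assumed to be an ordinary least-squares fit.

```python
import random


def sample_team_seasons(team_seasons, k=57, seed=None):
    """Pick k team seasons at random, without repeats.

    `team_seasons` is any list of records; `seed` is optional and only
    makes the draw reproducible (both are illustrative assumptions).
    """
    rng = random.Random(seed)
    return rng.sample(team_seasons, k)


def fit_trend_line(points):
    """Ordinary least-squares line through (april_pct, final_pct) pairs.

    Returns (slope, intercept) so that final_pct ~ slope * april_pct + intercept.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all April winning percentages are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Feeding the sampled pairs to `fit_trend_line` gives the slope and intercept of the line drawn through the scatter plot; the real inputs would be each team’s April and full-season winning percentages.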
At the extremes, April winning percentage tracks final winning percentage closely, while toward the middle there is far more variation. This follows common sense: better teams, as a whole, tend to play better than worse teams. Since teams go through ups and downs over the course of a season, however, any single month, April included, may come out better or worse than their final record.
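One way to put a number on that impression is to compare how far teams land from the trend line at the extremes versus in the middle. The sketch below is only an illustration: the cutoffs of .350 and .650 for an "extreme" April are hypothetical values chosen here, not thresholds from the study, and the function name is made up.

```python
from statistics import mean


def residual_spread_by_band(points, slope, intercept, low=0.350, high=0.650):
    """Average distance from the trend line, split by April record.

    `points` holds (april_pct, final_pct) pairs. `low` and `high` are
    hypothetical cutoffs: an April winning percentage outside [low, high]
    counts as "extreme", anything inside as "middle".
    """
    bands = {"extreme": [], "middle": []}
    for april_pct, final_pct in points:
        predicted = slope * april_pct + intercept
        band = "middle" if low <= april_pct <= high else "extreme"
        bands[band].append(abs(final_pct - predicted))
    # A band with no teams in it reports None instead of an average.
    return {band: (mean(vals) if vals else None) for band, vals in bands.items()}
```

If the pattern in the graph holds, the "middle" average should come out noticeably larger than the "extreme" one.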
Of course, this study is not conclusive, but the trend is clear: while small sample sizes definitely skew the results, the truly great teams and the truly terrible teams tend to follow the model shown. Teams in the middle (which range from good teams, like the 2017 Dodgers, to bad teams, like the 2015 Reds) tend to be more susceptible to statistical noise. Even this generalization has its limits, however, as those 2015 Reds finished just as poorly as the 2013 Marlins, who began the year with a .292 winning percentage.
Many Yankees fans have enjoyed Boston’s rough start to the season, as the defending champs have begun the year 15-18. While their slow start has certainly put them in a bit of a hole (they currently sit 6.5 games out of first place and have only a 54% chance of reaching the playoffs, according to FanGraphs), we should not read too much into it. Their .455 winning percentage (15 wins in 33 games) puts them at the lower end of the middle of the pack, right in the part of the graph where anything can happen. As bad as they have been, they have not been exceptionally bad. History tells us they have a good chance of turning things around, and thus they cannot yet be written off in the same way that the Baltimore Orioles can be.
Ultimately, all one can conclude from these numbers is that a team’s April performance is a sound basis for prediction only if the team was exceptionally good or disastrously bad during the month. A team that was merely average in April can just as easily finish strong as a contender or become one of the league’s bottom-dwellers.