Hall: We’ve had a couple interesting conversations recently on sabermetric hitting and pitching statistics. It seems like a good time to talk about Wins Above Replacement, usually called WAR. What do you know about WAR?
Oates: You mean, WAR, what is it g-
Hall: Stop it. Any and all references of that type are only allowed to be made by Elaine Benes when discussing alternate titles for War and Peace. It has no place in baseball.
Oates: My apologies. So what is WAR exactly?
Hall: WAR is a framework to measure players against each other, using statistics to convert a player’s value into runs and then into wins, compared to a replacement-level player.
Oates: What makes it better than other statistics?
Hall: What makes WAR so useful is that it takes into account everything a player does on the field (hitting, running, fielding, and pitching) and combines all aspects of the game into a single number. It is very good for comparing players with different skills to determine just how valuable they are overall.
Oates: What about the will to win?
Hall: To the extent that a player shows their will to win by making positive contributions on the baseball field with hits, solid defense and things of that nature, the will to win is captured in WAR.
Oates: What is a replacement player?
Hall: A replacement player is the type of player that is generally available to all teams in free agency for a minimum salary. Think of a veteran who might spend time during a season in AAA as well as the majors.
Oates: So, like, prospects that get called up from the minors?
Hall: No, not prospects. Prospects get called up because teams feel like they are ready to produce at levels above replacement level. If they thought a prospect was a replacement-level player, they would keep him in the minors until he was better prepared.
Oates: So who are replacement level players?
Oates: But Overbay’s been good this year.
Hall: Right, we’ll see some players perform slightly above and others will be slightly below replacement level, but overall these players will generally average out to roughly replacement level.
Oates: Why use replacement level? Why not use the average player?
Hall: If average players were used, we would see a whole lot of players with negative numbers, which doesn’t give as clean a look at the numbers. For example, out of 143 hitters who qualified for the batting title last year, only ten had a negative WAR. If we used the average player, which is about 2.0 WAR, as the baseline, we would have close to 50, and that doesn’t account for all the players who didn’t qualify for the batting title and would also have a negative WAR.
Oates: That makes sense. I’ve heard that there are two versions of WAR and that players can have different values depending on where you look. How can a stat be legitimate if there are two different versions?
Hall: You are correct that there are two different versions used, one from fangraphs, sometimes called fWAR, and one from baseball-reference, sometimes called bWAR or rWAR. I said before that WAR is a framework for valuing players. In order to value the players, each site plugs in statistics for hitting, running, fielding, and pitching. The sites enter the data a little bit differently which can result in different numbers.
Oates: Which one is better? Why can’t we just decide on one?
Hall: I don’t think there is a better one. It depends on which statistics you value more. We don’t need to have just one. Using both can be helpful. Do you watch movies?
Oates: Yes. Why do you ask?
Oates: Yes. I usually check out both.
Hall: How does imdb rate movies?
Oates: People go and rate movies they have seen on a scale of 1-10.
Hall: How does rottentomatoes rate their movies?
Oates: They take a look at all the reviews throughout the country and then give the percentage of positive reviews. If it is over 60%, they call the movie fresh. If it is under 60%, then they call it rotten.
Hall: So if a movie is over 7 at imdb or over 60% at rottentomatoes the movie is probably decent, right?
Hall: For the most part, the movies that get high ratings at imdb get high ratings at rottentomatoes, and the movies that get bad ratings at imdb also get bad ratings at rottentomatoes.
Hall: But I bet if you looked hard enough you would find some ratings that the two sites differed on.
Hall: But that doesn’t mean that the two sites don't have value. In fact, you usually take a look at both sites before seeing a movie so you can get more information on it.
Oates: I think I see what you’re getting at.
Hall: Each site rates movies in a slightly different way, but the goal is to rate the movie and both sites do a good job educating viewers on the quality of a movie. WAR is the same way. Fangraphs and baseball-reference calculate WAR a little differently, but their goal is to provide an accurate value of a player, and both sites do a good job of providing it.
Oates: So how are the WARs different?
Hall: First, let’s talk a little about how they are the same. Both tally up the runs a player is worth and then convert that to wins. They calculate the number of runs using replacement level as the base.
Oates: So a replacement level player is worth zero runs?
Hall: Exactly. We start with zero runs above replacement level. The first thing both sites do is figure out how many runs above average a player is at the plate when hitting. When we discussed hitting before, we talked about wOBA. We can take wOBA and then turn that into runs above average.
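The wOBA-to-runs step Hall mentions can be sketched in a few lines. This is a minimal illustration, assuming the commonly published form of the conversion (the wOBA gap to league average, divided by a scaling factor, times plate appearances). The function name and the default `woba_scale` of 1.2 are assumptions for the example; the real scale changes from season to season.

```python
def batting_runs_above_average(woba, league_woba, plate_appearances, woba_scale=1.2):
    """Convert wOBA into batting runs above average.

    woba_scale is an illustrative assumption; the published value
    varies by season and is typically a bit above 1.
    """
    return (woba - league_woba) / woba_scale * plate_appearances


# A hypothetical hitter with a .380 wOBA in a .320 league over 600 plate appearances.
runs = batting_runs_above_average(woba=0.380, league_woba=0.320, plate_appearances=600)
print(round(runs, 1))  # roughly 30 runs above average
```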
Oates: I thought we were dealing with runs above replacement level, but you said they calculate runs above average. How does that work?
Hall: If you remember, I said before that the average player is worth about 2.0 wins above replacement. Every win is equivalent to roughly 10 runs. So to calculate WAR, they add in the difference between the replacement player and the average player. Over the course of a full season, this will be pretty close to 20 runs. If a player only plays in half the games, he’ll get credited with about ten runs.
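The arithmetic here is simple enough to sketch. The example below assumes a 162-game season and prorates by games played, as Hall describes; the actual sites prorate by playing time in their own ways, so treat the names and the proration rule as illustrative.

```python
RUNS_PER_WIN = 10         # "every win is equivalent to roughly 10 runs"
AVERAGE_PLAYER_WAR = 2.0  # an average everyday player is about 2.0 WAR
SEASON_GAMES = 162        # assumed full-season length


def replacement_adjustment(games_played, season_games=SEASON_GAMES):
    """Runs credited for the gap between replacement level and average."""
    full_season_runs = AVERAGE_PLAYER_WAR * RUNS_PER_WIN  # about 20 runs
    return full_season_runs * games_played / season_games


print(replacement_adjustment(162))  # full season: 20.0
print(replacement_adjustment(81))   # half a season: 10.0
```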
Oates: So we start at zero, then we add the runs above average at the plate, then we add the difference between a replacement player and an average player. Then what?
Hall: Offensively, we don’t just want to know the results of hits; we also want to account for baserunning. This gives value for going from first to third on a single or scoring from first on a double. So they calculate the number of runs above average players are worth on the basepaths.
Oates: Do you have to add in the difference between a replacement player and an average player on the basepaths like we did with batting runs above average?
Hall: No, we only need to do that once. The difference between replacement level and average that we already put in incorporates base running and defense as well. Otherwise we would be double and triple-counting the difference.
Oates: So what’s next?
Hall: Next, the fielding runs above average are included.
Oates: Is it above average for the position or overall?
Hall: It’s for the position.
Oates: But that’s not really fair. It’s a lot harder to be a shortstop or catcher than a first baseman.
Hall: That’s true. That’s why WAR incorporates a positional adjustment.
Oates: How does that work?
Hall: Based on historical averages, the production at different positions varies, so each player gets runs added or taken away depending on the position he plays. Over the course of a full season, catchers receive a 12.5 run credit. Shortstops get 7.5; second base, third base and center field get 2.5 runs; left field and right field get 7.5 runs taken away; first base gets 12.5 runs taken away; and designated hitters get 17.5 runs taken away.
Oates: What if a player plays more than one position?
Hall: It is accounted for. Runs will be added or taken away depending on how much a player played at each position.
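A rough sketch of how a positional adjustment might be prorated across positions follows. The run values are the full-season figures from the conversation, with center field at 2.5 as in the commonly published table. Prorating by games over an assumed 162-game season is a simplification for illustration, and the position codes and function name are hypothetical.

```python
# Full-season positional adjustments, in runs.
POSITION_RUNS = {
    "C": 12.5, "SS": 7.5,
    "2B": 2.5, "3B": 2.5, "CF": 2.5,
    "LF": -7.5, "RF": -7.5,
    "1B": -12.5, "DH": -17.5,
}


def positional_adjustment(games_by_position, season_games=162):
    """Sum each position's adjustment, weighted by share of a full season."""
    return sum(
        POSITION_RUNS[position] * games / season_games
        for position, games in games_by_position.items()
    )


# A hypothetical player who splits a full season evenly between shortstop and first base.
print(positional_adjustment({"SS": 81, "1B": 81}))  # -2.5
```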
Oates: So then what happens?
Hall: The runs for batting, baserunning, and fielding, plus the difference between an average and a replacement-level player, are added together with the positional adjustment for a total of runs above replacement level. Since ten runs are worth roughly one win, the runs are converted to wins, and that’s how you get your final WAR number.
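Putting the pieces together, the framework Hall describes amounts to a sum and a division. The sketch below uses made-up component values for a hypothetical player and the rough ten-runs-per-win conversion from the conversation; neither site uses exactly this simplified form.

```python
def position_player_war(batting, baserunning, fielding, positional, replacement,
                        runs_per_win=10.0):
    """Add up the run components and convert total runs to wins."""
    runs_above_replacement = batting + baserunning + fielding + positional + replacement
    return runs_above_replacement / runs_per_win


# Hypothetical full-season shortstop: a strong bat and an above-average glove.
war = position_player_war(
    batting=30.0,      # batting runs above average
    baserunning=2.0,   # baserunning runs above average
    fielding=5.0,      # fielding runs above average for the position
    positional=7.5,    # full-season shortstop adjustment
    replacement=20.0,  # average-to-replacement gap over a full season
)
print(round(war, 2))  # 6.45
```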
Oates: What's a good WAR to have?
Hall: Like I said before, two is going to be about average for everyday players. Anything above four is really good, and above six is going to be your MVP contenders.
Oates: What about for pitchers?
Hall: The same scale works for starting pitchers. For relievers, anything above one is going to be solid. Determining the WAR for pitchers is a little easier. They take the runs above a replacement pitcher, and then convert that to wins in the same way they did for hitters.
Oates: So how are the two WARs different?
Hall: There are a fair number of minor differences, which baseball-reference lists in a chart on their website, but there are two major differences. The first is how they calculate runs above replacement for pitchers.
Oates: What does baseball-reference do?
Hall: They take runs allowed and innings pitched to determine runs above replacement level.
Oates: What about errors? Why do they count unearned runs?
Hall: They count the unearned runs, but then make an adjustment for defense.
Oates: Well, what does fangraphs do?
Hall: Last time, we discussed Fielding Independent Pitching, FIP. Fangraphs uses FIP to determine runs above replacement level.
Oates: Do they compensate for defense?
Hall: No. Because the stats that go into FIP, strikeouts, walks, and homers, all have nothing to do with defense, it is not necessary to make that adjustment.
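To make the contrast concrete, here is a simplified sketch of the two starting points: a runs-allowed rate versus FIP. The FIP formula is the commonly published one, which also counts hit batters alongside walks. The constant of 3.10 and the replacement rate of 5.5 runs per nine innings are illustrative assumptions, and the real calculations on both sites add park, league, and (for baseball-reference) defensive adjustments that are left out here.

```python
def fip(home_runs, walks, hit_by_pitch, strikeouts, innings, fip_constant=3.10):
    """Fielding Independent Pitching; the constant varies by season."""
    return (13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts) / innings + fip_constant


def runs_above_replacement(rate_per_nine, replacement_rate_per_nine, innings):
    """Runs saved versus a replacement pitcher throwing the same innings."""
    return (replacement_rate_per_nine - rate_per_nine) / 9 * innings


# Hypothetical starter: 200 innings, 20 HR, 50 BB, 5 HBP, 200 K, 4.00 runs allowed per nine.
REPLACEMENT_RATE = 5.5  # assumed replacement-level runs per nine innings

pitcher_fip = fip(home_runs=20, walks=50, hit_by_pitch=5, strikeouts=200, innings=200)
fip_based = runs_above_replacement(pitcher_fip, REPLACEMENT_RATE, 200)
runs_allowed_based = runs_above_replacement(4.00, REPLACEMENT_RATE, 200)

print(round(fip_based, 1))           # FIP-style input, as fangraphs starts from
print(round(runs_allowed_based, 1))  # runs-allowed input, as baseball-reference starts from
```

The same pitcher comes out roughly 17 runs apart under the two inputs, which is the kind of gap that produces different WAR totals on the two sites.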
Oates: So which one is better?
Hall: It depends on what you are looking for. If you like using FIP, then you will find fangraphs’ WAR a more accurate representation of a pitcher’s value. If not, you may prefer baseball-reference. Over the course of one season, runs allowed may include a little more luck than you prefer, depending on BABIP or how a pitcher fared with runners in scoring position. Over the course of a pitcher’s career, however, that luck is more likely to even out.
Oates: So what is the second major difference?
Hall: The other major difference is in fielding. Prior to 2002, both sites used Total Zone. Thanks to technological advances, however, we now have a lot more batted ball information, so the sites use different fielding metrics for the last ten years.
Oates: What do you mean about batted ball information?
Hall: We have a lot more resources available to tell us where balls landed. That has provided more information on where and how defenders get to balls in comparison to each other, which gives everyone a better idea of how good players are defensively. Baseball Info Solutions has trained scorers, and their observations combined with the velocity of the batted ball provide a lot of information.
Oates: So these scorers’ eyes are watching where the ball goes.
Hall: Yes, they see the ball’s every move.
Oates: So what does baseball-reference use?
Hall: They use Baseball Info Solutions’ Defensive Runs Saved which takes a look at every play and determines how many runs a player saved by making defensive plays. You can find more information on the Fielding Bible’s website.
Oates: So what does fangraphs use?
Hall: Fangraphs uses Ultimate Zone Rating, UZR, developed by Mitchel Lichtman. The two metrics are similar in that players are given credit for plays they make and have credit taken away for plays they don’t make, depending on the difficulty of the play. They do have differences as well. You can find more information on UZR at fangraphs.
Oates: How reliable are these defensive statistics?
Hall: They are significantly better than our other options. One thing to keep in mind is that it takes a lot more time before those statistics can be reasonably relied upon. For example, if a hitter hits .300 for two weeks, you may not be all that confident that the hitter will hit .300 going forward, but if a hitter hits .300 for three months you can be more confident in that hitter’s ability, and feel really good about it after a year.
Oates: So when can we look at the defensive stats?
Hall: It depends what you are comfortable with. I always look at the previous year whenever I look at the current year to get a better idea of the accuracy, and until August or so, I would trust the prior year more than the current one. If the numbers are similar, it’s probably fair to trust them. If the numbers are vastly different, the fielder’s true level is probably somewhere in between.
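One way to picture "somewhere in between" is a weighted blend of the prior year and the current year's pace. This is purely an illustrative assumption and not a method from either site or from Hall: the weight on the current season simply grows with the share of the season played, so early in the year the prior season dominates.

```python
def blended_fielding_runs(prior_year_runs, current_pace_runs, season_fraction):
    """Blend last year's fielding runs with this year's full-season pace.

    season_fraction is the share of the season played so far (0.0 to 1.0).
    The linear weighting is a made-up illustration, not a published formula.
    """
    weight = max(0.0, min(1.0, season_fraction))  # clamp to a valid share
    return (1 - weight) * prior_year_runs + weight * current_pace_runs


# A fielder who was +10 runs last year but is on a -2 pace a quarter of the way in.
print(blended_fielding_runs(prior_year_runs=10.0, current_pace_runs=-2.0, season_fraction=0.25))  # 7.0
```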
Oates: If that is the case, is it really fair to trust WAR at all when valuing players?
Hall: You just have to keep in mind the conditions you’re working under. If you try to compare WARs between players after one month, you are probably better off looking at wOBA and FIP and the prior year’s defensive statistics. As the season goes on, WAR is going to get better and better to look at. By the end of the season, you are going to have a really good stat to go to when comparing players. And of course, it’s really useful for comparing careers, as all players are compared to their peers, so you can determine just how good a player was in his era.
Oates: Is that all?
Hall: No, but this conversation has gone on long enough. I think we can go ahead and stop here.
As usual, feel free to post any questions below, but I highly recommend clicking on any of the links above for more information.