The second part of our interview with Professor Benjamin Baumer deals with his recent project, openWAR, a method of calculating wins above replacement that is entirely open-source and has a one-to-one correspondence with MLB team wins. OpenWAR is an R package available on GitHub that also scrapes data directly from MLB Advanced Media for anyone to use. Often, we see WAR used as a be-all, end-all metric for describing a player's performance, but the question remains: is it the best metric? Why do we believe in it so much? Baumer gives his insight and illustrates why he believes openWAR is, at the very least, a compelling alternative to the methods of calculating WAR that are currently available.
Now that I've heard about it from you a couple of times and played around with the code a bit, it seems as though your project openWAR fulfills two purposes: the first is an open-source formula for player wins above replacement (hence the name), and the second is a sort of sandbox where we can perhaps derive new metrics. Is this accurate? What was the impetus behind this project?
That's right, these are the major contributions of openWAR.
Like a lot of people, I've been thinking about WAR for a while, but after the 2012 AL MVP race, I became increasingly frustrated with the way that people were talking about WAR. The concept of WAR has a number of advantages: the units (wins) are easy to understand, and the quantity being measured is of obvious importance. This has fueled the popularity of WAR, but we're now in a place where journalists and news organizations are quoting WAR figures as if they were indisputable facts.
The truth is that multiple organizations (e.g. Fangraphs, BP, Baseball-Reference) produce estimates of WAR, but they are rarely labelled as such. The problem is compounded by the fact that none of the aforementioned organizations produce interval estimates for WAR—they only present point estimates. I think the people that produce WAR generally have a reasonable sense of the uncertainty that is associated with these estimates, but it seems like many of the people who read their blogs and then use those numbers to buttress their arguments don't necessarily have that perspective.
As statisticians, it is our responsibility to help people understand uncertainty, and that was really the impetus for this project. Incidentally, Nate Silver spoke about exactly this issue—the challenges of communicating statistical uncertainty to journalists—at the Joint Statistical Meetings last summer. The larger issue with the WAR estimates that are out there now is that they aren't really reproducible, at least not by scientific standards. Reproducibility has become a huge issue in scientific research, and I would love to see the sabermetric community recommit itself to this notion.
What we produced, simply put, is a fully open-source implementation of WAR (which we're calling openWAR) that includes interval estimates. It's too early to make grandiose claims about the accuracy of openWAR relative to fWAR, rWAR, and WARP, but I do believe that if the community takes an interest in this, there is no reason why openWAR couldn't become the gold standard of WAR implementations in the future. The paper is currently under review, but the arXiv version should provide the details for those who are interested.
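To make the difference between a point estimate and an interval estimate concrete, here is a minimal Python sketch of a generic percentile bootstrap. It is an illustration only, not a description of how openWAR computes its intervals; the function name `bootstrap_interval` and the per-play run values are made up for the example.

```python
import random


def bootstrap_interval(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the sum of `values`.

    Hypothetical helper for illustration; not part of any openWAR API.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n = len(values)
    # Resample the plays with replacement and recompute the total each time.
    totals = sorted(sum(rng.choices(values, k=n)) for _ in range(n_resamples))
    lo_idx = max(0, int((1 - level) / 2 * n_resamples))
    hi_idx = min(n_resamples - 1, int((1 + level) / 2 * n_resamples) - 1)
    return totals[lo_idx], totals[hi_idx]


# Made-up per-play run values credited to one player (assumption, not real data).
play_values = [0.4, -0.2, 0.1, -0.3, 1.2, -0.25, 0.05, -0.1, 0.6, -0.2]

point_estimate = sum(play_values)
low, high = bootstrap_interval(play_values)
print(f"point estimate: {point_estimate:.2f}")
print(f"95% interval:   ({low:.2f}, {high:.2f})")
```

The single number is what usually gets quoted; the interval shows how much that number could move if the same player had simply experienced a different mix of plays.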
The second aspect of this is that Greg Matthews and I have created an R package, also called openWAR, that will do all the calculations. The nice thing about this package is that it scrapes the GameDay XML files from MLBAM and compiles the play-by-play information into a data table. So even if you don't care about openWAR, you can use the openWAR package to download MLBAM play-by-play data into R and do whatever you want with it. Unlike Retrosheet, the MLBAM data is updated live, so you could potentially use it for all kinds of applications. Carson Sievert has a similar, more mature package called pitchRx that scrapes the PITCHf/x data, but not play-by-play. We've spoken about some kind of unification, and I think that is likely, but hasn't happened yet. Stay tuned!
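For readers who have not worked with play-by-play files, the idea of compiling XML into a data table can be sketched in a few lines of Python. This is not the openWAR package's code (which is written in R), and the element and attribute names in the sample are simplified assumptions loosely modeled on GameDay-style files rather than an exact copy of the MLBAM schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified play-by-play sample (not real MLBAM data).
SAMPLE_XML = """
<game>
  <inning num="1">
    <top>
      <atbat num="1" batter="1001" pitcher="2001" event="Groundout"/>
      <atbat num="2" batter="1002" pitcher="2001" event="Single"/>
    </top>
    <bottom>
      <atbat num="3" batter="3001" pitcher="4001" event="Strikeout"/>
    </bottom>
  </inning>
</game>
"""


def plays_to_rows(xml_text):
    """Flatten nested inning/half/atbat elements into one row per plate appearance."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for inning in root.iter("inning"):
        for half in inning:  # <top> and <bottom>
            for atbat in half.iter("atbat"):
                rows.append({
                    "inning": int(inning.get("num")),
                    "half": half.tag,
                    "batter": atbat.get("batter"),
                    "pitcher": atbat.get("pitcher"),
                    "event": atbat.get("event"),
                })
    return rows


rows = plays_to_rows(SAMPLE_XML)
for row in rows:
    print(row)
```

Once every plate appearance is a row with the same columns, the usual filtering, grouping, and modeling tools apply directly.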
More on R: It seems to have become a sort of industry standard among MLB teams. Additionally, Max Marchi (now with the Cleveland Indians) just wrote a book, Analyzing Baseball Data with R, along with Jim Albert. Why R? What makes it special and how did it become a standard?
The popularity of R within baseball is merely symptomatic of the popularity of R among data scientists across the world. There is really nothing baseball-specific about R, other than a few recent packages (Lahman, pitchRx, openWAR). Part of what makes R so great is that it is open-source, so not only is it free and cross-platform, but it is constantly being updated, and is fully transparent. Another major strength is that because of R's extensible and open nature, there is now a large, worldwide group of developers who are extending R's basic functionality to do all kinds of very specific and sophisticated things. So when new statistical techniques or visualizations or data structures are constructed, they can be released as an R package that provides that functionality, in a fully reproducible fashion, to everyone right away.
But if I can speak to a larger issue, there really is no comparison between a statistical computing environment like R (Python would be a popular alternative) and a spreadsheet application like Excel. If you are working with a small amount of data (i.e. no more data than you can see on the screen), and you want to manipulate things by hand (i.e. with the mouse), then Excel is easier to use. But if you want to work with a larger amount of data (i.e. more than you can see on the screen) and perform manipulations, transformations, or algorithmic procedures on your data (i.e. writing code, not mouse-clicks), then you really want something with R's capabilities. I'll posit the analogy that if Excel is like a pencil, then R is like a laser printer.
Wow. That makes me feel good about buying Marchi's book, as well. And on that note, this is the last question before we move on to part three of the interview, The Sabermetric Revolution, which is now on my bookshelf: What do you want to see done with the MLBAM and PITCHf/x data that you may not have the time to do?
Hmm...I've always thought the relationship between location and velocity would be interesting to study. That is, what velocity is required to make a fastball right down the middle as effective as a 90 mph fastball on the outside corner? That might not be so hard of a question to address, but obviously the real answer will depend on the sequencing, the secondary offerings, the count, etc.
In general, I'd be interested in having a more definitive sense of why certain pitchers can be effective with an 86 mph fastball and others struggle when they're throwing 96. Is it just location? Can we examine the repertoire of a soft-tossing lefty who dominates in AA and definitively say that he simply doesn't have a chance to pitch in the major leagues? Conversely, can we analyze the offerings of a flamethrower in A ball and explain to him exactly how much better his fastball command has to get in order for him to be successful in the major leagues? These seem like accessible questions to which it would be valuable to have answers.