A few years ago, I was sitting in a mostly empty classroom before our statistics class, chatting with one of my classmates (and a Mets fan to boot!) about a metric he was thinking about using to select stocks for his personal stock option investing. After listening to him describe the idea, I realized he was talking about "dollar vols". Before grad school, I had traded stock and commodity options for about four years, and I had heard of this idea before and even messed around with it early in my career. Basically, it's a cute theoretical idea that has been around for a while, although it doesn't really work. I explained how it looked great on paper but got derailed by real-world concerns, and concluded my argument with, "It's fun to look at, but it's a bad metric. It doesn't tell you anything." Meanwhile, my statistics professor, who had entered the hall while we were chatting, sharply corrected me: "There are no bad metrics, just bad statisticians. The metric might not tell you what you thought it might, but if you calculated it properly, there is always value to that information. You just haven't found out what that value is yet."
And so we come to the RBI, the most misunderstood of all statistics currently appearing on most Jeff Francoeur-approved scoreboards. In general, the shortcomings of the RBI as a talent evaluator have been well-documented here, and we cling to it only because it describes an event that we all understand. As much as advanced statistics have enhanced my enjoyment of baseball, I think I'd swallow my Big League Chew in disgust if I heard Gary Cohen say, "Ike Davis slightly improved his wOBA in the second inning with his double into the right-center gap. The fact that David Wright was standing on 1st base with two out and happened to score on the play was a complete coincidence and does not speak to Ike's credit." I just want to know that Wright got on with two out and Ike punched him in with an RBI double, because that's a huge play! I don't really care about the play's effect on talent discovery... that's what post-game recaps on the internet are for. So just as I was ready to write the RBI off as merely a descriptive statistic instead of one that could be used to evaluate talent, I thought of a new cousin statistic to the RBI that could keep its dorky friend popular with the cool kids, much like Brendan Fraser did for Sean Astin in Encino Man. We are the cool kids in this analogy, and Brendan Fraser will take the form of Sean Astin's new friend, the xRBI.
The idea is pretty simple. Consider the run expectancy tables that Tom Tango, among many others, has derived: http://www.tangotiger.net/RE9902.html This table shows the number of runs an average team would be expected to score against average pitching in an inning, given the number of outs and the number and position of base runners. All we would need is a variation on this table that instead showed the RBI expectancy for an average hitter in each situation, from which we could create a counting statistic called the xRBI... the number of RBIs the league-average hitter would be expected to get facing league-average pitching (I'm assuming that league and park adjustments, and anything else reflecting the run environment, could also get factored in there if you like). So let's say that Ike Davis sees four ABs in a game with the following four RBI expectancies:
Runner on first, two out: .10
Bases empty, one out: .06
Runner on second, none out: .33
Runners on first and third, two out: .41
Ike would thus accumulate 0.90 xRBI. Assuming that he went 1-for-4 and the only run he knocked in was with that double in his first at-bat, Ike would accumulate one RBI. We could then measure the difference between the two (RBI - xRBI) to establish who the run producers are. Here, Ike would come out 0.10 to the positive, since the fairly unlikely RBI in his first at-bat and his flubbing of the third and fourth at-bats wash out to a meh performance for the day. The higher the difference, the better the ability to produce runs.
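If you like, the bookkeeping above can be sketched in a few lines of Python. The RBI-expectancy table below is hypothetical: it contains only the four situations from Ike's game, using the example values, where a real table would cover all 24 base/out states and be derived from actual play-by-play data.

```python
# Hypothetical RBI-expectancy table: (base state, outs) -> expected RBIs
# for a league-average hitter vs. league-average pitching. Only the four
# situations from Ike's example game are filled in here.
RBI_EXPECTANCY = {
    ("1--", 2): 0.10,  # runner on first, two out
    ("---", 1): 0.06,  # bases empty, one out
    ("-2-", 0): 0.33,  # runner on second, none out
    ("1-3", 2): 0.41,  # runners on first and third, two out
}

def xrbi(plate_appearances):
    """Sum the expected RBIs over a list of (base_state, outs) situations."""
    return sum(RBI_EXPECTANCY[pa] for pa in plate_appearances)

# Ike's four at-bats from the example:
ike_game = [("1--", 2), ("---", 1), ("-2-", 0), ("1-3", 2)]
total_xrbi = xrbi(ike_game)       # 0.90
diff = 1 - total_xrbi             # RBI - xRBI: +0.10, slightly above average
print(round(total_xrbi, 2), round(diff, 2))
```

Keeping the table keyed on (base state, outs) mirrors the layout of Tango's run expectancy matrix, so the two tables would be straightforward to derive side by side from the same data.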
I tried to work in consideration of run environment and comparison against the league average, which seem desirable. Of course, this all depends on the production of a reliable RBI-expectancy table, but given the existence of the run expectancy table and the tons of other things for which we have linear or matrix weights, it seems pretty doable. Any thoughts?