

Selective Application Of Statistics

A common refrain of those who do not like their baseball analysis inundated with statistics is something along these lines:

There are three kinds of lies: lies, damned lies, and statistics.

This is usually uttered by those unwilling to let statistics overturn their perceptions. In general, it is a silly statement. Statistics exist to measure whatever they set out to measure, and nothing more. Numbers alone cannot deceive, as long as their benefits and limitations are understood. Stats like wOBA and UZR are frequently cited here, but only because the writers and most members of the community understand the context-neutral linear weights concept behind wOBA and the sample size limitations of UZR. Unfortunately, criticizing statistics without fully understanding their strengths and weaknesses happens more than it should, and it is a major pet peeve of mine. That said, it is true that people can use statistics to unfairly support biased arguments.

A post about Mike Cameron at Metsmerized provided an example of the belief that stats are sometimes used to bolster preconceived, biased positions:

I think that more and more people would accept advanced metrics if the stats weren’t used so often to strengthen just one side of an argument. 

I’m not knocking advanced metrics, but I do find fault with them when they are used to make unfair comparisons.

The point here seems to be that advanced metrics are used as a weapon to trumpet players like Cameron who, at least in the eyes of the MMO writer, aren't as good as Fangraphs suggests. This couldn't be further from the truth. Advanced stats are cited only because the methodologies behind them have been investigated, questioned and eventually accepted by the best minds in the sabermetric community. If I invented a stat and kept citing it after Tom Tango or Colin Wyers had discredited it, my credibility would be shot. WAR, UZR, Plus/Minus and similar metrics have been scrutinized by the likes of The Hardball Times and Statistically Speaking and accepted as worthwhile. They are not nonsense created to make money or fool people. The other point, that advanced metrics cannot fairly compare the values of players at different positions, is also invalid, as posts like this one by Mark illustrate.

I like to think I'm generally unbiased in the analysis I provide, so coming up with an example of my own selective application of statistics is impossible. Endy Chavez is one of my favorite players, but I wouldn't project a .350 wOBA for him as part of an argument in favor of signing him. However, I'd like to offer an example of how even an enlightened writer might appear to be selectively using numbers to support an agenda. Howard Megdal of SNY, MLBTR and about three dozen other outlets is a well-known Oliver Perez fanatic (just look at the Baseball-Reference sponsor for the RMS Titanic Perez). Unsurprisingly, he has often weighed in on Derek Lowe, a pitcher the Mets reportedly pursued last offseason before eventually settling on Perez. In an October post at MLBTradeRumors about Lowe being available in a trade, Howard wrote:

Lowe is coming off of a season with a 4.67 ERA along with a strikeout rate of just 5.1 per nine innings. He certainly didn't finish strong, with a 5.05 second-half ERA, and a 6,23 [sic] mark from September 1 on.

ERA isn't my pitching statistic of choice, but sure, Lowe had a somewhat disappointing season. At The Perpetual Post, his debate site (which I recommend), Howard took part in a Lowe vs. Perez discussion in November:

I think it represents the likelihood that Lowe, fresh off of an 88 ERA+, is likely to be around that or below it for the remainder of his three years and $45 million.

A healthy Perez has been considerably better than the 2009 edition of Derek Lowe. Time and reality are converging to bring Derek Lowe’s career to an unceremonious close. It isn’t time to declare Atlanta the winner in the choice of Lowe over Perez just yet.

Again, ERA+ isn't a great metric for evaluating a pitcher's performance, but I can live with it. That is, until I read this piece by Howard at SNY about Mike Pelfrey:

Pelfrey had a 3.72 ERA last year, while even Sunday's stellar performance only lowered his 2009 ERA to 4.83. But let's take a closer look.

Fangraphs has his FIP at 3.96 last year, 4.18 this year -- a negligible difference in performance over the two seasons.

Wait a second: why does Pelfrey get the FIP treatment while Lowe doesn't? Lowe's 2009 FIP was 4.06, significantly better than his 4.67 ERA. Like Pelfrey, he was the victim of an inflated BABIP and subpar defense behind him. Why not use the same stats consistently when evaluating players? Having read Howard's work for some time now, I doubt he consciously set out to paint an unfairly negative picture of Lowe. However, this is an example of how a known affinity for one player over another, and the subsequent assessment of those players, might cause some readers to question a writer's intentions.

In the case of Cameron and Jason Bay, reliable statistics show that the gap between them isn't as large as many think. That this doesn't jibe with most fans' perceptions of the two players has little bearing on whether it is true. Rational people can have differing opinions, and those opinions will certainly be respected as long as they're not served up with a helping of "Get your noses out of the books, and keep your eyes on the ball."
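For readers wondering how a pitcher's FIP can sit well below his ERA, here is a minimal Python sketch of the two stats in that argument. It assumes the commonly published FIP formula, (13*HR + 3*(BB + HBP) - 2*K) / IP plus a league-and-season constant, and the standard BABIP formula. The 3.10 default constant is an assumed ballpark value (the real one changes every season), and the function names and sample stat line are hypothetical, not Lowe's or Pelfrey's actual numbers.

```python
def fip(hr: int, bb: int, hbp: int, k: int, ip: float, constant: float = 3.10) -> float:
    """Fielding Independent Pitching.

    Counts only outcomes the defense cannot affect: home runs, walks,
    hit batters and strikeouts. `constant` scales the result to league ERA;
    3.10 is an assumed approximate value, not an official one.
    `ip` must be true decimal innings (180 1/3 innings -> 180 + 1/3).
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Batting average on balls in play allowed: non-homer hits divided by
    the balls the defense actually had a chance to field."""
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


# Hypothetical season line for illustration only.
print(round(fip(hr=20, bb=50, hbp=5, k=150, ip=200.0), 3))   # 3.725
print(round(babip(h=200, hr=20, ab=800, k=150, sf=5), 3))    # 0.283
```

The key design point is that hits on balls in play show up in BABIP (and in ERA) but never in FIP, so a pitcher hurt by bad luck or poor defense on balls in play will tend to post an ERA above his FIP.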

Certain metrics are largely ignored here, with good reason. For pitchers, W-L record and ERA are generally useless. For hitters, RBI doesn't really tell us anything. For fielders, errors and fielding percentage have become archaic. Better metrics are applied consistently, and there is little bias in player evaluation. Regardless, should any of the AA writers or community members appear to be twisting stats to fit a preconceived agenda, I would expect a commenter to blow the whistle. If it happens, please point it out.

Also, happy holidays. Or, alternatively, bah humbug!