(bumped from fanposts. --eric)
For a class project I looked at Pitch f/x data and I wanted to share some Mets-related results with AA. Basically, I tried to determine a batter's personal strike zone, or the area where they would be more likely than not to swing at a pitch. To this end, I looked at the location of every pitch thrown to them, and used a statistical technique to find the boundaries of their zone. (For more detail, see the end of this post.)
Now for fun graphs! These are from the catcher's point of view.
As you would expect, Castillo has an incredibly small strike zone. It more or less overlays the rulebook strike zone, which I've marked in red (roughly the knees to the letters). Units are in feet. (*)
On the other hand, you have Francoeur. He really does swing at stuff close to his head and at his shins. Hilarious. He also looks like he swings at pitches down and in more than he should. In my study of qualified batters in 2009, he had one of the largest strike zones. Bengie Molina was one of the few with an even larger strike zone.
As a baseline, here's the graph for a batter I think we'll all agree is good: Pujols.
And another right-handed batter, David Wright.
His 2009 looks fine, and nothing in this graph points to a particular problem. Here's 2008, below.
If you compare the two seasons, 2009 is clearly different from 2008, almost as though in 2009 Wright was standing farther from home plate in the batter's box. He also appears to be standing more upright in 2009, since his rulebook strike zone moves up. The Pitch f/x data from 2007 is spotty, so I can't compare any further back.
Pujols' 2008 isn't much different from his 2009 (it's a little bit flatter), so the graphs don't seem to jump around much from year to year on their own. That makes me wonder if the change in Wright's strike zone from 2008 to 2009 is real.
All of these graphs are fairly crude and I'm definitely open to any suggestions on what else to look at.
I threw out all the called strikes because a taken strike is ambiguous: a batter might take a pitch that he knows is a strike, or he might decide a pitch is a ball and give up on it before it reaches the strike zone. Since I have no good way of determining which is the case, I decided ignoring them would be easier. In extreme cases like Castillo, that eliminated about a third of the dataset. I think this could be handled better, but I'm not sure how.
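For anyone who wants to try this, the filtering step might look roughly like the sketch below. The record layout (`x`, `z`, `outcome`) and the outcome labels are made up for illustration; they are not the real Pitch f/x field names, and this is not the code behind the graphs above.

```python
# Hypothetical outcome labels, assumed for this sketch only.
SWING_OUTCOMES = {"swinging_strike", "foul", "in_play"}
TAKEN_BALL = "ball"
CALLED_STRIKE = "called_strike"


def label_pitches(pitches):
    """Return (x, z, swung) tuples, with swung = 1 for a swing, 0 for a taken ball."""
    labeled = []
    for p in pitches:
        outcome = p["outcome"]
        if outcome == CALLED_STRIKE:
            continue  # ambiguous take, so it is thrown out
        if outcome in SWING_OUTCOMES:
            labeled.append((p["x"], p["z"], 1))
        elif outcome == TAKEN_BALL:
            labeled.append((p["x"], p["z"], 0))
        # anything else (hit by pitch, etc.) is silently skipped in this sketch
    return labeled


sample = [
    {"x": 0.1, "z": 2.4, "outcome": "foul"},
    {"x": 0.0, "z": 2.5, "outcome": "called_strike"},
    {"x": 1.5, "z": 0.9, "outcome": "ball"},
    {"x": -0.3, "z": 3.0, "outcome": "in_play"},
]
print(label_pitches(sample))
```

What's left is one row per pitch with a location and a swing/no-swing label, which is the shape a classifier needs.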
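The statistical technique I alluded to above is a support vector machine (SVM), a classifier that finds the boundary separating the pitches a batter swings at from the pitches he takes.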
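To give a feel for how an SVM draws a swing zone, here is a small self-contained sketch. Everything in it is an assumption for illustration: the pitches are a synthetic grid with a made-up elliptical swing region, the quadratic feature map stands in for whatever kernel was actually used, and the training loop is a Pegasos-style sub-gradient method written out by hand rather than the output of any particular SVM package.

```python
import random


def features(x, z, z_mid=2.5):
    """Quadratic feature map, so a linear SVM can draw an elliptical zone.

    z_mid (feet) is an assumed rough mid-zone height, used only for centering.
    """
    dz = z - z_mid
    return [x, dz, x * x, dz * dz, 1.0]  # last entry acts as the bias term


def train_svm(points, labels, lam=0.01, steps=100_000, seed=0):
    """Linear soft-margin SVM (hinge loss + L2 penalty), fit Pegasos-style."""
    rng = random.Random(seed)
    phis = [features(x, z) for x, z in points]
    w = [0.0] * len(phis[0])
    for t in range(1, steps + 1):
        i = rng.randrange(len(phis))
        phi, y = phis[i], labels[i]
        eta = 1.0 / (lam * t)            # decaying step size
        margin = y * sum(wj * pj for wj, pj in zip(w, phi))
        shrink = 1.0 - eta * lam         # effect of the L2 penalty
        if margin < 1.0:                 # inside the margin: hinge loss is active
            w = [shrink * wj + eta * y * pj for wj, pj in zip(w, phi)]
        else:
            w = [shrink * wj for wj in w]
    return w


def swings(w, x, z):
    """True if the fitted boundary puts (x, z) inside the swing zone."""
    return sum(wj * pj for wj, pj in zip(w, features(x, z))) > 0.0


# Synthetic pitches on a grid (feet); swing = +1 inside a made-up ellipse.
points, labels = [], []
for i in range(17):
    for j in range(17):
        x = -2.0 + 0.25 * i
        z = 0.5 + 0.25 * j
        inside = (x / 1.0) ** 2 + ((z - 2.5) / 1.2) ** 2 <= 1.0
        points.append((x, z))
        labels.append(1 if inside else -1)

w = train_svm(points, labels)
hits = sum(swings(w, x, z) == (y == 1) for (x, z), y in zip(points, labels))
accuracy = hits / len(points)
print("training accuracy:", round(accuracy, 3))
```

The curve where the score crosses zero is the edge of the batter's personal zone. Real swing data is much noisier than this grid, so a real fit would trade off misclassified pitches against a smoother boundary.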
(*) The top and the bottom of the rulebook strike zones are from Pitch f/x. For every pitch, the top and bottom heights are recorded based on the batter's stance. Note that John Walsh over at HBT has found that the "real" strike zone is wider and shorter than the rulebook says.
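One way to turn those per-pitch heights into a single red box is to average them over a batter's pitches, as in the sketch below. The averaging is my assumption about a reasonable approach, and the field names `sz_top` and `sz_bot` are placeholders for however the top and bottom heights are stored. The 17-inch width is the width of home plate.

```python
from statistics import mean

PLATE_WIDTH_FT = 17.0 / 12.0  # home plate is 17 inches wide


def rulebook_zone(pitches):
    """Average per-pitch top/bottom heights into one rectangle, in feet."""
    top = mean(p["sz_top"] for p in pitches)      # assumed field name
    bottom = mean(p["sz_bot"] for p in pitches)   # assumed field name
    half = PLATE_WIDTH_FT / 2.0
    return {"left": -half, "right": half, "bottom": bottom, "top": top}


zone = rulebook_zone([
    {"sz_top": 3.4, "sz_bot": 1.6},
    {"sz_top": 3.6, "sz_bot": 1.4},
])
print(zone)
```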