One of the more interesting developments in the study of baseball over the last few years — whether you're interested in sabermetrics or not — has been the public release of pitch-tracking data, or PITCHf/x. This data, available for free on the MLB website (hidden within files at gd2.mlb.com), allows anyone to look at every individual pitch thrown by their favorite pitchers (or not-so-favorite pitchers) to discover what pitches these pitchers throw, how often they throw them, how these pitches move, and of course, how effective those pitches are.
Originally, this information was really only useful to those who were willing to trawl through the individual data (think very large Excel spreadsheets) and knew how to manipulate it. Over the last few years, however, a bunch of websites have pulled out the useful data (movement, effectiveness of each pitch, etc.) and presented it in human-readable form. These sites included Texasleaguers.com and JoeLefkowitz.com, which provided a series of graphs and charts using the data. Even Fangraphs had a PITCHf/x page for every pitcher in the database.
In other words, if you ever wanted to become an expert on the pitches of your favorite pitcher, your most-hated pitcher, or just a pitcher you find interesting, now you could!
But there was a problem for most people who wanted to use these sites to learn about various pitchers. The graphs/charts/data on these sites would be presented like this:
As you can see, the chart shows you how often each pitch is used by a pitcher and some statistics for each of those pitches. The problem is this: Where do those pitch types come from? The answer is that the pitch type is determined by a computer algorithm developed by MLB Advanced Media, which classifies every pitch in real time. But while the computer algorithm has improved each year, it's still very flawed. Case in point, the above chart is actually for Mike Pelfrey in 2010, and it would seem to indicate that Pelfrey has SEVEN distinct pitches. In reality, that's just not the case. In 2010, Pelfrey threw a four-seamer, a two-seamer/sinker, a splitter, a slider, and a curveball. However, the MLBAM algorithm has sometimes classified Pelfrey's splitter as a change-up and has cleaved his sinker/two-seamer (which is the same pitch) into two different pitches. Moreover, the classifications are sometimes just plain wrong. The end result is that the data from these websites is sometimes very inaccurate.
The solution to this problem is to look at the data for each pitcher directly and classify the pitches by hand. This is a pain in the ass even for those who know how and have the ability to do it. And for most people, understandably, this is probably more trouble than it's worth.
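To make the relabeling idea concrete, here is a minimal Python sketch of how merging mislabeled pitch types changes a usage table. Everything in it is an assumption for illustration: the `usage` helper and the `RELABEL` map are hypothetical names, the two-letter codes follow the usual PITCHf/x abbreviations, and the ten sample labels are made up rather than real Pelfrey data.

```python
from collections import Counter

# Hypothetical hand-made relabel map for one pitcher (an assumption for this sketch):
# fold the two-seamer (FT) into the sinker (SI), and treat change-ups (CH) as splitters (FS).
RELABEL = {"FT": "SI", "CH": "FS"}


def usage(pitch_types, relabel=None):
    """Return the share of pitches per pitch type, after optional relabeling."""
    relabel = relabel or {}
    counts = Counter(relabel.get(p, p) for p in pitch_types)
    total = sum(counts.values())
    return {pitch: n / total for pitch, n in counts.items()}


# Made-up algorithm labels for ten pitches; NOT real data.
sample = ["FF", "FT", "SI", "SI", "CH", "FS", "SL", "CU", "FT", "FF"]

raw = usage(sample)              # seven apparent pitch types
merged = usage(sample, RELABEL)  # five pitch types after merging

for pitch, share in sorted(merged.items()):
    print(f"{pitch}: {share:.0%}")
```

A blanket map like this only handles labels that are consistently split or swapped. It can't fix the pitches that are just plain wrong one at a time, which is why real reclassification still means going through the pitches themselves.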
But now, for the most part, this problem has been solved. The Hardball Times' Harry Pavlidis, with help from Lucas Apostoleris (also of THT, Beyond the Boxscore, and Fangraphs), has actually manually classified every pitch thrown in the majors from 2007-2011. I can't overstate how big a task this was: it involved going through millions of pitches and manually classifying each one (for hundreds of pitchers).
This data is now available on http://Brooksbaseball.net (under the tab marked Player Cards). Just type in a player's name and his data pops right up. You can see where a pitcher locates each pitch by count, the flight path each pitch follows, the stats for each pitch, etc. And thanks to the tireless classifications done by Pavlidis and Apostoleris, the pitch types are as reliable as you could realistically get them.
Two quick words of caution:
1. Be careful when drawing conclusions from the data. Just because a pitch's movement has changed over the years does not mean it has directly resulted in a change in the pitch's results — there can easily be other variables at work here (or just plain luck).
2. The data is still subject to the limits of human analysis. Some pitchers have pitches that are really easy to tell apart (Jon Niese comes to mind). Other pitchers have pitches that are a nightmare to tell apart, such as Mike Pelfrey's two fastballs. While the human classifications are superior to the computer algorithm even in the cases of hard-to-classify pitchers, these pitch type classifications are still approximations, so don't take them to be 100% accurate. Still, they're good enough that you can usually tell when a pitcher is totally bluffing (or misinformed) about needing to use a certain pitch more frequently, since you can see right on the player card how bad that pitch is or how infrequently it's thrown.
*(You can identify the pitchers who are potentially problematic with respect to pitch classification by looking at the graphs in each player card; the problem guys are generally the pitchers who have considerable overlap with their pitches).
And that's it. Brooks Baseball Player Cards are the newest great resource for looking at any pitcher in the Majors. Take a look at them, and have fun!