Stats-Results Correlation Primer
amazinavenue.com
Stats-Results Correlation Primer: A venture into the theoretical physics of baseball
Many of us have wondered how much weight individual stats carry in a team's success, whether that success is measured in wins or in championships. So I have decided to take on an enormous undertaking, one that could take weeks to complete: a Stats-Results Correlation Primer.
Basically, this project will measure, on a scale of -1.00 to +1.00, the (theoretical) effect that a stat has on a team's results, season by season. (-1.00 signifies a perfect inverse relationship, +1.00 signifies a perfect positive relationship, and 0.00 signifies virtually no relationship whatsoever.) To filter out statistical bias, I will compute the numbers separately for each year and then compile them into time periods based on several factors, such as the number of teams or the playoff configuration. Keep in mind that correlation doesn't necessarily equal causation, and since no single stat fully determines a team's results, it's (virtually) impossible for a -1.00 or +1.00 result to occur.
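To make the -1.00 to +1.00 scale concrete, here is a minimal sketch of a per-season calculation. It assumes the measure is the Pearson correlation coefficient (the post does not name a formula), and the function names `pearson` and `correlation_by_season`, along with every number in `rows`, are made up purely for illustration.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / sqrt(var_x * var_y)


def correlation_by_season(rows):
    """rows: iterable of (season, stat_value, wins). Returns {season: r}."""
    by_season = defaultdict(list)
    for season, stat, wins in rows:
        by_season[season].append((stat, wins))
    return {
        season: pearson([s for s, _ in pairs], [w for _, w in pairs])
        for season, pairs in by_season.items()
    }


# ASSUMPTION: made-up (season, team HR total, team wins) rows, not real data.
rows = [
    (1915, 20, 70), (1915, 35, 80), (1915, 58, 90),
    (2015, 150, 95), (2015, 180, 80), (2015, 210, 75),
]
result = correlation_by_season(rows)
for season, r in sorted(result.items()):
    print(season, round(r, 2))
```

Each season is scored on its own, so a season's quirks stay inside that season. The per-season values could then be grouped into eras, though how best to combine them is a separate choice this sketch does not make.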
For example, we can reasonably assume that teams that hit more HRs or have a lower overall ERA* are more likely to win their divisions or the World Series. But how much bearing does each stat have on a team's chances of achieving that? That's what this project is all about. To account for yearly trends, the work will be done on a year-to-year basis, so that each season's data is self-contained and anomalies specific to that season aren't overlooked. (In other words, statistical relativity.) For instance, a team hitting 65 HRs in 1915 would have been labeled a "power-hitting team". (The highest team HR total that year was 58.) Nowadays, that total would be considered horribly pathetic. (You would need 200+ HRs to earn the same recognition today.) It's all about relativity when filtering out statistical bias on a year-to-year and era-to-era basis.
* I will rely on ERA more than on sabermetric stats, since ERA is strictly results-based, and results are what ultimately factor into the final outcome. Baseball is largely about luck, and this project will reflect that too. Good process does not guarantee good results.
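The "statistical relativity" idea from the 1915-versus-today HR comparison can be sketched by scoring each team against its own season's league. This is only one possible approach, assuming a z-score (standard deviations above or below that season's league average) is an acceptable way to express relativity. The function name `season_z_scores` and both sets of team totals are hypothetical.

```python
from statistics import mean, pstdev


def season_z_scores(totals):
    """totals: {team: stat_total} for ONE season. Returns {team: z-score}."""
    values = list(totals.values())
    mu = mean(values)
    sigma = pstdev(values)  # population std dev: the league is the whole population
    if sigma == 0:
        raise ValueError("all teams have the same total; z-score is undefined")
    return {team: (value - mu) / sigma for team, value in totals.items()}


# ASSUMPTION: hypothetical team HR totals for a low-HR season and a high-HR season.
low_hr_season = {"A": 20, "B": 30, "C": 40, "D": 58}
high_hr_season = {"A": 160, "B": 190, "C": 220, "D": 274}

z_low = season_z_scores(low_hr_season)
z_high = season_z_scores(high_hr_season)

# Team D stands out from its league by the same margin in both seasons,
# even though the raw totals (58 vs. 274) are nowhere near each other.
print(round(z_low["D"], 2), round(z_high["D"], 2))
```

Feeding season-relative scores like these into a correlation, instead of raw totals, is one way to compare a 58-HR team from a low-HR era with a 200+ HR team from today on equal footing.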
With that said, let me know what your take on this is. I believe that this could have significant ramifications going forward.



