A probabilistic way to interpret early season results

Or: How I learned to stop worrying and love the math.

Coronavirus Berlin - Home Schooling Photo by Kira Hofmann/picture alliance via Getty Images

A rough start to the Mets’ season has many fans frustrated, annoyed, angry, or sad. It’s understandable, since the on-field product has been extremely disappointing. But how rational are those feelings really? Though our emotional reaction can swing wildly, most of us know deep down not to overreact to small samples in the early season. With a bit of math, we can justify that position, and perhaps make ourselves feel a little better about the team.

Let’s make a simple analogy: a baseball team is like a coin. When you flip the coin, sometimes it comes up heads (a win) and sometimes tails (a loss). A perfectly balanced coin will land on either side with 50% probability; think of that as a .500 team. A weighted coin, on the other hand, will favor heads or tails. A team that you expect to win 97 games, which is roughly a 60% winning percentage (97/162 ≈ .599), would be akin to a coin landing on heads 60% of the time. This isn’t a perfect model, of course, because talent varies across opponents and even within your own team from day to day, but it’s a simplified approach that makes our subsequent inferences easier.
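To make the analogy concrete, here is a minimal Python sketch that plays out a 162-game season as 162 independent flips of a weighted coin. It assumes, as the analogy does, that every game has the same win probability and that games don’t affect one another; `simulate_season` is a hypothetical helper, not part of any library.

```python
import random


def simulate_season(p_win: float, games: int = 162, seed: int = 0) -> int:
    """Count wins when each game is an independent coin flip won with probability p_win."""
    rng = random.Random(seed)  # seeded so the same simulated season can be replayed
    # rng.random() is uniform on [0, 1), so it falls below p_win with probability p_win
    return sum(1 for _ in range(games) if rng.random() < p_win)


p_97 = 97 / 162  # a 97-win team's winning percentage, about .599
for seed in range(5):
    print(f"simulated season {seed}: {simulate_season(p_97, seed=seed)} wins")
```

Even with the coin’s weight held fixed, the simulated win totals differ from one season to the next; that built-in randomness is what the rest of this piece is about.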

If you know the true underlying weight of the coin, you can ask a question like “how many heads (wins) will happen in a given number of flips (games)?” Naturally, the most likely outcome is close to the probability of heads multiplied by the number of flips; for instance, five heads in ten flips with a perfectly balanced coin. However, that’s not the only possible outcome, and there’s a gradual spread around that most likely value. That spread is described by the binomial distribution, which takes two parameters, a probability p and a number of trials n, and gives the probability of each possible number of successes. Here’s how this could look in a baseball context: the distribution of wins for a team with a 55% true-talent winning percentage over 11 games (on the left) or 110 games (on the right).
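For readers who want to reproduce the two panels, here is a short Python sketch of the binomial distribution using only the standard library (`math.comb`, Python 3.8+). The 55% win probability and the 11- and 110-game samples come from the example above; `binomial_pmf` is a helper written for illustration.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes (wins) in n independent trials (games)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p = 0.55  # true-talent winning percentage from the example
for n in (11, 110):
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mode = max(range(n + 1), key=probs.__getitem__)  # single most likely win total
    print(f"n={n}: expected wins {n * p:.2f}, most likely {mode} wins")
```

In the 11-game case the expected number of wins is 6.05 while the single most likely outcome is 6 wins; over 110 games the distribution centers near 60.5 wins and is much narrower when expressed as a winning percentage.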

Now, let’s say you don’t know the true weight of the coin. Instead, you have the results of a series of flips, which amounts to a tally of heads and tails, and you’d like to work backward to the coin’s true weight. The most likely answer is simply the number of heads divided by the total number of flips. However, this answer fails to account for the random chance inherent in each flip. Over a small number of flips, a coin could land on heads far more often than its actual weight would dictate over a longer sample, which makes a single value a poor estimate.
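One way to see why the simple heads-divided-by-flips estimate is shaky in small samples is to measure how much it bounces around. Under the coin-flip model, the standard deviation of the observed winning percentage over n games is √(p(1 − p)/n). The Python sketch below assumes a true 55% team; the ±2 standard deviation band is a normal-approximation rule of thumb and is only rough for very small n.

```python
from math import sqrt


def observed_pct_sd(p: float, n: int) -> float:
    """Standard deviation of wins / games for n independent games won with probability p."""
    return sqrt(p * (1 - p) / n)


true_p = 0.55  # assumed true-talent winning percentage
for n in (11, 23, 110, 162):
    sd = observed_pct_sd(true_p, n)
    low, high = true_p - 2 * sd, true_p + 2 * sd
    print(f"{n:>3} games: observed W% usually lands between {low:.3f} and {high:.3f}")
```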

What we need is another distribution, one that works in the opposite direction from the binomial distribution described above. Thankfully, such a distribution exists: the beta distribution. Rather than taking in a probability p and returning a distribution over the number of heads, the beta distribution takes two parameters, which here track the number of heads and the number of tails, and returns a probability distribution over the coin’s true weight p. Going back to our baseball analogy, we can draw the distribution of the true-talent winning percentage of a 6-5 team (left) and a 60-50 team (right).
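Here is a Python sketch of those two beta distributions, written with only the standard library. One labeled assumption: it uses Beta(wins + 1, losses + 1), which is what you get from a flat starting point and which puts the peak exactly at wins divided by games, matching the “most likely answer” described above. Some presentations plug in the raw win and loss counts instead, and the picture looks similar. `beta_pdf` and `beta_summary` are illustrative helpers.

```python
from math import exp, lgamma, log, sqrt


def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1 (computed in log space)."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(x) + (b - 1) * log(1 - x))


def beta_summary(a: float, b: float) -> tuple[float, float, float]:
    """Return (mode, mean, standard deviation) of Beta(a, b), assuming a, b > 1."""
    mode = (a - 1) / (a + b - 2)
    mean = a / (a + b)
    sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mode, mean, sd


for wins, losses in ((6, 5), (60, 50)):
    a, b = wins + 1, losses + 1  # assumption: flat starting point, so the peak sits at wins / games
    mode, mean, sd = beta_summary(a, b)
    print(f"{wins}-{losses}: peak {mode:.3f}, mean {mean:.3f}, sd {sd:.3f}, "
          f"density at .500 = {beta_pdf(0.5, a, b):.2f}")
```

Both records share the same peak (6/11 ≈ .545), but the 60-50 team’s standard deviation is roughly a third of the 6-5 team’s, which is the narrowing visible in the right-hand panel.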

As with the binomial distribution, less data makes the spread wider and our estimate less precise. We can counteract this by leaning on what we expected to happen before we collected our data. This knowledge is called a prior distribution; think of it as an assumption based on reasonable expectations. Without any information, you’d probably expect a coin to be evenly balanced. For a baseball team, we can look to preseason projections and expert opinions to build a prior distribution for the true-talent winning percentage.
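A prior built from a projection needs two ingredients: a center (the projected winning percentage) and a strength (how many games’ worth of evidence we think the projection is worth). This Python sketch turns both into beta parameters under those assumptions; the helper names are hypothetical, and the 81-game strength is the weighting used later in this piece, not a standard value.

```python
from math import sqrt


def prior_from_projection(projected_wins: float, weight_games: float,
                          season_games: int = 162) -> tuple[float, float]:
    """Convert a projected win total into Beta(a, b) prior parameters.

    The prior behaves like `weight_games` imaginary games played at the projected W%.
    """
    p = projected_wins / season_games
    return p * weight_games, (1 - p) * weight_games


def beta_mean_sd(a: float, b: float) -> tuple[float, float]:
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


for weight in (20, 81, 162):
    a, b = prior_from_projection(92, weight)
    mean, sd = beta_mean_sd(a, b)
    print(f"92-win projection worth {weight:>3} games: "
          f"Beta({a:.1f}, {b:.1f}), mean {mean:.3f}, sd {sd:.3f}")
```

The center stays the same in every row; only the spread shrinks as the prior is trusted more.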

These priors are particularly important when you have limited data, which is exactly the situation we find ourselves in at the beginning of the season. By combining our prior distribution with what actually happened in our experiment, we can limit the huge fluctuations we’d see in small samples. Returning to our 6-5 team, let’s say our prior had them as a 55% W% true-talent team. In this case, the combined estimate of their true talent is narrower than the one based on 11 games of results alone, and it stays centered near 55% because our prior distribution (in red) and our distribution based on on-field results (in blue) have nearly identical means (55% versus 6/11 ≈ 54.5%).
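Combining a beta prior with win-loss results has a convenient shortcut: the combined (posterior) distribution is another beta distribution whose parameters are the prior’s parameters plus the wins and losses. The Python sketch below applies it to the 6-5 team, assuming a 55% prior worth 81 games (the weighting discussed later in this piece) and representing the results-only curve as Beta(wins + 1, losses + 1); those choices, and the helper names, are illustrative.

```python
from math import sqrt


def beta_mean_sd(a: float, b: float) -> tuple[float, float]:
    """Mean and standard deviation of a Beta(a, b) distribution."""
    return a / (a + b), sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


def combine(prior_a: float, prior_b: float, wins: int, losses: int) -> tuple[float, float]:
    """Beta prior + binomial results -> Beta posterior: add wins to a and losses to b."""
    return prior_a + wins, prior_b + losses


prior = (0.55 * 81, 0.45 * 81)  # assumption: 55% prior worth 81 games
results_only = (6 + 1, 5 + 1)   # 6-5 record from a flat starting point
posterior = combine(*prior, 6, 5)

for label, (a, b) in (("prior", prior), ("results only", results_only), ("combined", posterior)):
    mean, sd = beta_mean_sd(a, b)
    print(f"{label:>12}: mean {mean:.3f}, sd {sd:.3f}")
```

The combined curve is narrower than either input, and because the two inputs agree, its mean barely moves.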

If instead our prior had them as a 40% W% true-talent team, our estimate of their true talent based on a 6-5 record would be very different. Using the same coloring scheme as above (prior in red, on-field results in blue, combined estimate in black), you can see that the two distributions pull against each other, with the peak of our combined estimate landing somewhere between the two means.
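Here is the same arithmetic for the 40% prior, written out as a worked example. It assumes, as before, that the prior is worth 81 games, so its parameters are 40% and 60% of 81; the peak of a Beta(a, b) distribution with a, b > 1 sits at (a − 1)/(a + b − 2).

```latex
\begin{aligned}
\text{prior: } & \alpha_0 = 0.40 \times 81 = 32.4, \quad \beta_0 = 0.60 \times 81 = 48.6 \\
\text{after a 6-5 start: } & \alpha = 32.4 + 6 = 38.4, \quad \beta = 48.6 + 5 = 53.6 \\
\text{peak: } & \frac{\alpha - 1}{\alpha + \beta - 2} = \frac{37.4}{90} \approx 0.416
\end{aligned}
```

That lands between the prior’s 40% and the record’s 6/11 ≈ 54.5%, closer to the prior because, at 81 games’ worth of weight, it outweighs 11 games of results.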

The combined distribution gives us a more accurate estimate of the team’s true talent, one that doesn’t necessarily overreact to early-season results. What we just applied is Bayes’ theorem, one of the hallmark results in probability and statistics.
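For reference, here is Bayes’ theorem in the form used above, with θ standing for the true-talent winning percentage, a Beta(α, β) prior, and W wins and L losses observed. Because the beta prior and the binomial likelihood share the same θ^x (1 − θ)^y form, the posterior is again a beta distribution.

```latex
\begin{aligned}
p(\theta \mid W, L) &\propto p(W, L \mid \theta)\, p(\theta) \\
&\propto \theta^{W}(1-\theta)^{L} \cdot \theta^{\alpha-1}(1-\theta)^{\beta-1} \\
&= \theta^{\alpha+W-1}(1-\theta)^{\beta+L-1}
\quad\Longrightarrow\quad \theta \mid W, L \sim \mathrm{Beta}(\alpha+W,\ \beta+L), \\
\mathbb{E}[\theta \mid W, L] &= \frac{\alpha+W}{\alpha+\beta+W+L}.
\end{aligned}
```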

So, back to the Mets. They currently sit at 11-12. Most preseason projections had them in the range of 90-95 wins; for simplicity, let’s roughly split the difference and say 92 wins, which works out to about a 56.8% W% (92/162). Here’s our best estimate of the true-talent W% of the team.

Suddenly, things don’t look so bad - our model estimates the Mets as a true-talent 55% W% team, which paces out to about 89 wins. The early season has been frustrating, but we shouldn’t totally throw out our preseason assumptions, and when we incorporate them appropriately, we find the Mets are still very solid and have a good shot at the division title.
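The Mets numbers can be reproduced with the same recipe. This Python sketch assumes the 92-win projection, the 81-game prior weight mentioned in the caveats below, and the 11-12 record; under those assumptions the prior is Beta(46, 35) and the posterior is Beta(57, 47). `true_talent_estimate` is an illustrative helper.

```python
SEASON_GAMES = 162


def true_talent_estimate(projected_wins: float, prior_weight: float,
                         wins: int, losses: int) -> tuple[float, float]:
    """Return (posterior mean, posterior peak) of true-talent W% under a Beta-binomial model."""
    p0 = projected_wins / SEASON_GAMES
    a = p0 * prior_weight + wins          # prior "wins" plus actual wins
    b = (1 - p0) * prior_weight + losses  # prior "losses" plus actual losses
    mean = a / (a + b)
    peak = (a - 1) / (a + b - 2)          # mode of Beta(a, b), valid for a, b > 1
    return mean, peak


mean, peak = true_talent_estimate(92, 81, 11, 12)
print(f"mean {mean:.3f}, peak {peak:.3f}, pace {mean * SEASON_GAMES:.0f} wins")
```

Both the mean (57/104 ≈ .548) and the peak (56/102 ≈ .549) round to the 55% quoted above, and .548 × 162 ≈ 89 wins.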

To be clear, we’ve made a lot of simplifying assumptions here. Modeling team wins as coin flips is a huge abstraction that ignores quality of competition, starting pitching, and various other on-field effects. How heavily the prior should be weighted is also a discussion worth having in more detail; here, I’ve weighted it to count as much as 81 games of on-field results, but that’s a totally arbitrary mark. Nevertheless, this example should hopefully help you understand a simple probabilistic inference and why panicking about early-season results isn’t necessarily a good idea.
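Because the 81-game weighting is arbitrary, it is worth checking how much it matters. This Python sketch reruns the Mets estimate (92-win projection, 11-12 record) for a few other prior weights; the alternative weights are arbitrary illustrations, and a weight of 0 means ignoring the projection entirely.

```python
def posterior_mean(prior_pct: float, prior_weight: float, wins: int, losses: int) -> float:
    """Posterior mean of true-talent W% with a Beta prior worth `prior_weight` games."""
    return (prior_pct * prior_weight + wins) / (prior_weight + wins + losses)


prior_pct = 92 / 162  # 92-win projection
for weight in (0, 20, 81, 162, 324):
    est = posterior_mean(prior_pct, weight, 11, 12)
    print(f"prior worth {weight:>3} games: {est:.3f} W%, about {est * 162:.0f} wins")
```

With no prior the estimate is simply the 11-12 record (about .478); as the weight grows, it moves steadily toward the projection.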

If you’re curious and want to play around with this some more, check out this visualization link, where you can put in your own estimates of the Mets’ win percentage and see how the estimate changes as we get more data throughout the year.