Numeric Projections Should Be Distributions

Next week I’m releasing projections for the 2017 Major League Baseball season. These projections differ from others because they are distributions. I believe most projections should include a distribution, range, or some other precision indicator.

To argue why, I need to be a little pedantic and deconstruct the idea of a projection. A projection is an educated guess about the future value of a metric. It is not, however, a statement of certainty. When someone projects a single number, like 35 home runs, we are not surprised if the total at the end of the season is not exactly 35. We just expect it to be close.

Single-value predictions are useful because they let you rank players by expected output and make meaningful drafting and bidding decisions from those lists. We must not, however, disregard the assumption that is implicit in every projection, which we just acknowledged: these rankings are not certainties. We expect the rankings compiled at the beginning of the season to differ from the ones at the end of the year. We know there will be unexpected injuries, breakouts, and benchings. Some systems make this inexactness explicit by including ranges or confidence intervals.

If we want to consider uncertainty when valuing players, we have to start with projections that cover many possible outcomes. That is, projections that go beyond a single value and include at least a measure of range, and ideally also skew and the extreme upside and downside events.
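
To make that concrete, here is a minimal Python sketch of what such a summary could look like. It is an illustration only, not the model behind my projections: the player, the 35-home-run center, the 15% chance of an injury-shortened season, and the function names simulate_home_runs and summarize are all invented for this example.

```python
import random
import statistics


def simulate_home_runs(rng, n=10_000):
    """Draw n hypothetical season totals for one made-up player."""
    samples = []
    for _ in range(n):
        # Assumed healthy-season outcome: centered on 35 HR, sd of 5.
        total = max(0.0, rng.gauss(35, 5))
        # Assumed 15% chance that an injury removes 30-90% of the season.
        if rng.random() < 0.15:
            total *= rng.uniform(0.1, 0.7)
        samples.append(round(total))
    return samples


def summarize(samples):
    """Report center, range, skew, and tails instead of one number."""
    ordered = sorted(samples)
    n = len(ordered)
    mean = statistics.fmean(ordered)
    sd = statistics.pstdev(ordered)

    def percentile(p):
        # Simple nearest-rank percentile on the sorted samples.
        return ordered[round(p * (n - 1))]

    # Population skewness: negative means a longer downside tail.
    skew = 0.0
    if sd > 0:
        skew = sum((x - mean) ** 3 for x in ordered) / (n * sd ** 3)

    return {
        "mean": mean,
        "median": percentile(0.50),
        "p10": percentile(0.10),
        "p90": percentile(0.90),
        "downside_p01": percentile(0.01),
        "upside_p99": percentile(0.99),
        "skew": skew,
    }


rng = random.Random(2017)  # fixed seed so the run is repeatable
summary = summarize(simulate_home_runs(rng))
for name, value in summary.items():
    print(f"{name:>13}: {value:.2f}")
```

Under these made-up assumptions the mean falls below the median and the skew is negative, because the injury scenario drags down the low end. A single projected number would hide exactly that asymmetry.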

Here is a link to an example. I hope projections like these will be helpful in constructing draft rankings, and also more complex decision tools.