Best Ball: Stacking Strategy Through Simulation (Fantasy Football)

Last year, I wrote an article examining the effect of stacking fantasy players – starting players from the same NFL team – in your lineup. The takeaway was intuitive: QB/WR stacks (and other combinations) increase the boom/bust profile of your team.

This time around, I wanted to take a look at stacking in Best Ball. We know that individual player variance is crucial in Best Ball formats, about a third as crucial as how much a player actually scores. Will stacking prove to be as beneficial? All data, unless otherwise specified, is from nflfastR.

What’s the mechanism?

In typical redraft fantasy formats, managers select the players that they want to start each week. Starters accumulate points on your team, while non-starters, or bench players, do not. Their scores are only used in the event of a tiebreaker.

Best Ball eschews this concept by auto-starting your optimal lineup each week. You don’t choose which three WRs to start; after they’ve played, your WR score is the total of the three highest performing wideouts. This allows, as discussed in the player variance article, for an element of ‘bust protection’. If a player does not score that many fantasy points, there’s a chance his performance will not actually hurt you. That’s because other, higher-scoring WR performances will be inserted into your lineup.
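
To make ‘bust protection’ concrete, here is a minimal Python sketch of the auto-start rule for one position. The function name and the weekly scores are made up for illustration; real Best Ball sites apply the same idea across every lineup slot.

```python
def best_ball_wr_score(wr_scores, n_starters=3):
    """Return the sum of the n_starters highest WR scores for the week."""
    return sum(sorted(wr_scores, reverse=True)[:n_starters])


# Hypothetical weekly scores for five rostered WRs (made-up numbers).
week = [22.4, 3.1, 14.0, 9.5, 17.2]

# Only the three best scores count, so the 3.1-point 'bust' never enters the lineup.
print(best_ball_wr_score(week))
```

Swapping the 3.1 for a 0.0 leaves the total unchanged, which is the whole point: a bad week only hurts if it lands among your top scores.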

The natural extension is that we want volatility in our Best Ball lineups, since we can enjoy the booms while simultaneously guarding against the busts.

Simulation Background

We’re going to try to estimate a very specific metric: advance rate in a 12-team league after the first 13 weeks. Generally, the top two teams by cumulative points advance, which means that the average advance rate is 1/6 (2 out of 12). I’m using this set-up because it’s broadly standard, but also because it’s used by Josh Lark in this excellent article that dives deep into the concept of stacking.

Josh looks at actual, empirical data from Best Ball tournaments and works to extract the impact of having stacked players on your roster. In this article, I’m going to be taking a simulation approach.

Simulations, compared to empirical work (looking at the actual available data), have their pros and cons. We can completely control and isolate for the variables we want in a simulation, and it’s easy to run more iterations to get a huge sample size. This might be trickier in an empirical setting, since there are all sorts of confounding variables and the sample sizes are limited (at least, compared to the hundreds of thousands of iterations a simulation can run). However, it can be tricky to actually have a simulation reflect real life; no matter how hard we try to mimic actual fantasy football data, there will always be differences between what we generate on our computer and what’s happening on the field. This is a huge benefit of empirical data!

Generally, the two approaches broadly agree, with some slight differences depending on the specific case. However, it is useful to test both approaches to see if there is a difference, and what that difference might tell us!

Simulation in Action

If you know me, you know that I love to run simulations. Here’s what we are going to be doing today:

  1. Calculate the mean and standard deviation of fantasy points scored (Half-PPR) across the major positions (QB, RB, WR, TE) for all players who played 7+ games in 2021. All data is from nflfastR.
  2. Gather the data for highly relevant, top-scoring QBs and WRs (scored 200+ points) since 2018. Organize the data so that QBs and WRs on the same NFL team in the same game are paired together.
  3. Sample a QB and WR score independently from Step (2). This will be the ‘unstacked’ pair of players, or Lineup 1.
  4. Sample a QB and WR score, where both players are on the same team and in the same game, from Step (2). This is our ‘stacked’ pair, or Lineup 2, and their scores will be correlated.*
  5. Draw the rest of the scores for both Lineup 1 and Lineup 2 from a Normal distribution using the mean and standard deviations from Step (1). In this case, draw 2 QB scores, 5 RB, 7 WR scores each, and 2 TE scores. Including the QB/WR pair already generated in Steps (3 – 4), this broadly represents the make-up of a typical roster with 18 slots.
  6. Calculate the scores of Lineup 1 and Lineup 2 by taking the highest QB score, the top 2 RB and WR scores, the top TE score, and then the best remaining FLEX score.
  7. Perform Steps 3 – 6 a total of 13 times, and add the weekly scores for Lineups 1 and 2 to get a full season of data.
  8. Perform Steps 3 – 7 a total of 100,000 times to get many seasons of data.
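
The steps above can be sketched in Python with NumPy. This is a rough illustration under stated assumptions, not the code behind the results in this article: the position means and standard deviations are placeholders, the table of same-game QB/WR scores is synthetic (the real one comes from nflfastR), the filler players are drawn independently for each lineup, and the number of seasons is reduced to keep it quick.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: placeholder (mean, sd) per position; Step 1 estimates these from real data.
POS_PARAMS = {"QB": (16.0, 7.0), "RB": (9.0, 6.0), "WR": (8.0, 6.0), "TE": (6.0, 5.0)}
N_FILLER = {"QB": 2, "RB": 5, "WR": 7, "TE": 2}  # Step 5 roster counts

# ASSUMPTION: synthetic stand-in for the Step 2 table of same-game QB/WR scores.
# A shared game-level factor makes the two columns positively correlated.
n_games = 2000
game = rng.gamma(shape=4.0, scale=1.0, size=n_games)
pairs = np.column_stack([
    4.0 * game + rng.gamma(2.0, 2.0, n_games),  # QB score
    2.5 * game + rng.gamma(2.0, 2.5, n_games),  # WR score
])


def top_k_sum(x, k):
    """Return (sum of the k largest values per row, the remaining values per row)."""
    s = np.sort(x, axis=-1)[..., ::-1]  # sort each row in descending order
    return s[..., :k].sum(axis=-1), s[..., k:]


def week_scores(qb_star, wr_star):
    """Step 6: optimal weekly lineup score for each simulated team."""
    n = qb_star.shape[0]
    # Step 5: filler players from a Normal distribution.
    draw = {p: rng.normal(m, sd, size=(n, N_FILLER[p])) for p, (m, sd) in POS_PARAMS.items()}
    qb = np.column_stack([qb_star, draw["QB"]])
    wr = np.column_stack([wr_star, draw["WR"]])
    qb_pts = qb.max(axis=1)
    rb_pts, rb_rest = top_k_sum(draw["RB"], 2)
    wr_pts, wr_rest = top_k_sum(wr, 2)
    te_pts, te_rest = top_k_sum(draw["TE"], 1)
    # FLEX: best leftover RB, WR, or TE.
    flex = np.column_stack([rb_rest, wr_rest, te_rest]).max(axis=1)
    return qb_pts + rb_pts + wr_pts + te_pts + flex


def simulate(n_seasons=20_000, n_weeks=13):
    """Steps 3 - 8: season totals for the unstacked and stacked lineups."""
    unstacked = np.zeros(n_seasons)
    stacked = np.zeros(n_seasons)
    for _ in range(n_weeks):
        i = rng.integers(0, n_games, n_seasons)  # game supplying the QB score
        j = rng.integers(0, n_games, n_seasons)  # independent game supplying the WR score
        unstacked += week_scores(pairs[i, 0], pairs[j, 1])  # Step 3
        k = rng.integers(0, n_games, n_seasons)  # one game supplies both scores
        stacked += week_scores(pairs[k, 0], pairs[k, 1])    # Step 4
    return unstacked, stacked


unstacked, stacked = simulate()
print(unstacked.mean(), stacked.mean())  # the averages should be close
print(unstacked.std(), stacked.std())    # the stacked lineup should be more spread out
```

Both lineups pull their star QB and WR from the same pool, so their average scores should match. The only thing that changes is whether the two scores come from the same game.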

Essentially, we are taking two identical lineups in terms of their scoring distribution, and toggling one single variable: if the best QB and WR in the lineup are stacked or not. Since this variable is the only difference between the lineups, any change in performance can be attributed to the stack.

Once we have this data, we can look at the score that is needed to advance among non-stacked teams, or a score in the 83.3rd percentile (top 2 out of 12). We can then look at how many of the ‘stacked’ lineup seasons beat this mark and thus advanced. Remember, the base rate of advancing is 16.7%, or 1/6. Here, we find that the advance rate is 17.6%, or roughly one percentage point better. I tried adjusting the simulation to include two stacks instead of one, and the advance rate stayed about the same (17.5%).
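
Here is a small sketch of that advance-rate calculation. The `advance_rate` helper is a hypothetical name, and the season totals are placeholder Normal draws (same mean, slightly different spread) rather than output from the actual simulation, so the printed numbers will not match the 17.6% above.

```python
import numpy as np


def advance_rate(baseline_totals, candidate_totals, n_teams=12, n_advance=2):
    """Share of candidate seasons that beat the baseline's advance cutoff."""
    # Cutoff is the (1 - 2/12) quantile of the baseline, i.e. the 83.3rd percentile.
    cutoff = np.quantile(baseline_totals, 1 - n_advance / n_teams)
    return float(np.mean(candidate_totals > cutoff))


rng = np.random.default_rng(1)

# ASSUMPTION: made-up season totals with equal means; the 'stacked' set is a bit more volatile.
unstacked = rng.normal(1500, 60, 100_000)
stacked = rng.normal(1500, 63, 100_000)

print(advance_rate(unstacked, unstacked))  # close to 1/6 by construction
print(advance_rate(unstacked, stacked))    # extra spread pushes more seasons past the cutoff
```

Even with identical averages, the more volatile set clears a top-two cutoff more often, which is the mechanism stacking relies on.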

This is within the ballpark of the result that Josh found in the 2020 data, which I think is a good thing. It would be strange if the numbers were very off, and getting a similar result via a different method (and across different data) is reassuring. Still, the ultimate effect is a bit lower than I was hoping for…


In the player variance article, the result was a ringing endorsement. Individual variance matters a lot, which means that the volatility inherent in a player’s outcomes should affect their ADP. Generally, it does: Mike Williams, historically a boom/bust WR, is currently the WR19 in redraft and WR16 in Best Ball.

We confirmed here that stacking matters…but how much so? I looked at stacking the best QB and WR on your roster, which we would expect has a greater impact than stacking a WR2 or WR3, and the jump in advance rate was only about one percentage point. Now, an improvement of that size is certainly nothing to sneeze at: fantasy football is a game of thin margins, and you should take any edge you can get. Stacking is a good strategy for Best Ball; indeed, it would be surprising if the data suggested otherwise. I was just hoping for a bit more.

The upshot is that you shouldn’t go too far in trying to construct stacks on your roster. The margin is thin enough (just a 1% improvement) that excessively targeting stacks can have consequences. Josh has a great metric that indicates when a manager ‘reaches’ relative to ADP to construct a stack, and the results indicate the outlook for said manager is not rosy. I am happy to reach when it comes to player variance but not, it turns out, when it comes to stacking.


Want to hear more? Message me on Twitter.

*I really wanted to draw the correlated scores from a Multivariate Normal distribution, which allows us to specify a covariance matrix. Unfortunately, using this process actually makes stacking look very harmful. I figured out that this was because the Normal is marginally symmetric, which means the downside scores sometimes ended up being far too low. Real fantasy data is skewed right, with a left bound more or less at zero. That’s why I used actual scores for this step, although a Normal distribution for the other steps (with an appropriately small variance) should be ok.
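
A quick sketch of the issue, using NumPy’s `multivariate_normal`. The means, standard deviations, and correlation below are placeholders I picked for illustration, not values estimated from nflfastR.

```python
import numpy as np

rng = np.random.default_rng(2)

# ASSUMPTION: placeholder means and sds for a star QB and WR, plus a made-up correlation.
mean = np.array([20.0, 12.0])  # QB, WR
sd = np.array([8.0, 8.0])
rho = 0.4
cov = np.array([
    [sd[0] ** 2, rho * sd[0] * sd[1]],
    [rho * sd[0] * sd[1], sd[1] ** 2],
])

draws = rng.multivariate_normal(mean, cov, size=100_000)

# Share of simulated scores below zero for each player (QB, WR).
neg_share = (draws < 0).mean(axis=0)
# Sample correlation between the two columns; should land near rho.
corr = np.corrcoef(draws.T)[0, 1]

print(neg_share)
print(corr)
```

The correlation comes out as specified, but a meaningful share of the WR draws fall below zero, which real box scores almost never do.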
