# LeBron James, Coupon Collector

## 2019/11/20

Last night, LeBron James became the first player to have a triple double against all 30 NBA teams. The article mentions that he has 86 career triple doubles. To me, it sounded surprising that he needed "only" 86 triple doubles to get one against every team. This post will do some quick back-of-the-envelope calculations to test my intuition.

My first thought was to model this as a Coupon Collector's Problem, which asks how many triple doubles are needed to get one against every team, assuming that each triple double is equally likely to come against any team in the league. With 30 teams, it predicts that about 120 triple doubles are needed on average. The limit theorem of Laplace, Erdős and Rényi suggests that the chance of collecting all 30 teams after 86 triple doubles is approximately 18%.

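n = 30
n*sum(1/(1:n)) #Expected number of triple doubles needed: n times the n-th harmonic number
## [1] 119.8496
alpha = (86-n*log(n))/n #Centered and scaled count, so that 86 = n*log(n) + alpha*n
exp(-exp(-alpha)) #Limiting approximation to the probability of collecting every team with 86 triple doubles
## [1] 0.1814732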
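As a sanity check on the limiting approximation, the same equal-likelihood model also has an exact answer by inclusion-exclusion over the set of teams that are missed. The sketch below is not part of the original calculation; the function name `p_all_collected` is made up for illustration, and it assumes, as above, that each triple double is equally likely to come against any of the `n` teams.

```r
# Exact probability that m equally likely draws cover all n teams,
# by inclusion-exclusion over the number k of teams that are missed.
# (Hypothetical helper, not from the original post.)
p_all_collected <- function(n, m) {
  k <- 0:n
  sum((-1)^k * choose(n, k) * (1 - k / n)^m)
}

n <- 30
p_exact <- p_all_collected(n, 86)               # exact under the equal-likelihood model
p_limit <- exp(-exp(-(86 - n * log(n)) / n))    # limiting approximation used above
c(exact = p_exact, limit = p_limit)
```

With only 30 teams the limit theorem is an approximation, so the two numbers need not agree to many digits; the comparison mainly shows whether "about 18%" is in the right range.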

Of course, triple doubles are not equally likely to come against every team. Most notably, you can never get a triple double against your own team! Russell Westbrook has many more triple doubles (141) than LeBron, but never had a chance to get one against the Thunder until he moved to the Rockets this season. So let's take a different simple model: each game is equally likely to produce a triple double, and outcomes are independent across games.[1] Using data on the number of games against each team for LeBron and Westbrook, and using a frequentist estimate for the probability of a triple double (career triple doubles divided by total games played), we get the following:

lebron_games_played = c(56, 54, 51, 51, 57, 14, 31, 31, 55, 29, 30, 56, 31, 30, 32, 39, 56, 31, 34, 53, 28, 55, 49, 33, 30, 29, 32, 54, 31, 50)
lebron_pTriple = 86/sum(lebron_games_played)
prod(1-(1-lebron_pTriple)^lebron_games_played) #Probability of triple double against every team
## [1] 0.08971551
westbrook_games_played = c(20, 18, 19, 20, 22, 19, 38, 39, 20, 37, 35, 22, 40, 35, 36, 21, 20, 39, 40, 19,  1, 21, 20, 37, 41, 38, 37, 20, 38, 21)
westbrook_pTriple = 141/sum(westbrook_games_played)
prod(1-(1-westbrook_pTriple)^westbrook_games_played) #Probability of triple double against every team
## [1] 0.1153136
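The product formula above multiplies, over opponents, the chance of at least one triple double in the games played against that opponent. A quick Monte Carlo run is one way to check that the formula matches the model it is meant to describe. This is only a sketch under the same assumptions (independent games, a common per-game probability); the seed, the number of simulations, and the variable names are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# LeBron's games against each opponent, copied from the vector above
games <- c(56, 54, 51, 51, 57, 14, 31, 31, 55, 29, 30, 56, 31, 30, 32,
           39, 56, 31, 34, 53, 28, 55, 49, 33, 30, 29, 32, 54, 31, 50)
p <- 86 / sum(games)  # per-game triple double probability

# Closed form: product over teams of P(at least one triple double vs. that team)
closed_form <- prod(1 - (1 - p)^games)

# Simulation: draw triple double counts per opponent, check every count is positive
n_sims <- 20000
covered <- replicate(n_sims, all(rbinom(length(games), size = games, prob = p) > 0))
simulated <- mean(covered)

c(closed_form = closed_form, simulated = simulated)
```

The simulated frequency should land close to the closed-form value, up to Monte Carlo noise.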

The estimate of 9% for LeBron suggests that my intuition was reasonable: it's somewhat unlikely that LeBron would have a triple double against every team, though certainly not shocking. These calculations suggest that Russell Westbrook is actually more likely to have gotten a triple double against each team, despite having played against the Thunder only once! He plays them several more times this year, giving him a decent shot of matching LeBron soon.

Similar ideas apply in other sports. For example, Brett Favre and Peyton Manning are the only QBs to have defeated all 32 NFL teams, and Drew Brees has a chance to join them this week. Pulling his game splits, the model predicts a 47% chance that he would already have hit this mark. This jumps to 80% if we condition on the event that he won his only game against the Saints. So in some ways, he's actually unlucky not to have accomplished this feat already![2]

Questions for readers: what are some of your favorite (or least favorite) sports records?

brees_games_played = c(7, 28,  5,  5, 24,  7,  5,  6,  9, 10,  7,  7,  6,  5,  5, 10, 11,  6,  6,  5,  1,  8,  6,  7,  6,  3,  9,  5, 10, 28,  4,  8)
brees_pWin = 158/269
prod(1 - (1-brees_pWin)^brees_games_played) #Probability of defeating all 32 teams
## [1] 0.4670173
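The 80% figure for Brees can be reproduced from the same model. Because games are assumed independent, conditioning on a win in the single Saints game simply drops that team's factor from the product. The sketch below assumes that the lone entry equal to 1 in `brees_games_played` is the Saints game; the post's text suggests this, but the vector itself is unlabeled, and the names `per_team` and `saints` are made up for illustration.

```r
brees_games_played <- c(7, 28, 5, 5, 24, 7, 5, 6, 9, 10, 7, 7, 6, 5, 5, 10,
                        11, 6, 6, 5, 1, 8, 6, 7, 6, 3, 9, 5, 10, 28, 4, 8)
brees_pWin <- 158 / 269

# Probability of at least one win against each opponent
per_team <- 1 - (1 - brees_pWin)^brees_games_played
p_all <- prod(per_team)

# Assumption: the only opponent faced exactly once is the Saints
saints <- which(brees_games_played == 1)

# Conditional on winning that one game, only the other 31 factors remain
p_all_given_saints_win <- prod(per_team[-saints])

c(unconditional = p_all, given_saints_win = p_all_given_saints_win)
```

Equivalently, the conditional probability is the unconditional one divided by `brees_pWin`, since the Saints factor is just the chance of winning a single game.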

1. Of course, this still neglects many factors. Certain opponents may be tougher on defense, or players may try harder in marquee matchups. Most notably, LeBron has been playing primarily as a point guard with the Lakers, making triple doubles more frequent this season than in years past. All models are wrong, but some are useful. Although we could build and calibrate a more detailed model with more data, I think this model is already useful.

2. Admittedly, the iid model seems worse for quarterback wins than triple doubles. Clearly, some teams are harder to defeat than others, and neglecting this in the analysis inflates the estimated probability of collecting a win against every team.