This year, I ran my first marathon! The Twin Cities Marathon passes 2 blocks from my house (we are at mile 22). I enjoyed the training and race day, and ended the race feeling surprisingly good.
It turns out, some people enjoy running further than a marathon! The most famous ultramarathon in the United States is the Western States Endurance Run, which covers 100.2 miles of trail in California. In order to run this event, you must run a qualifying ultramarathon in the previous year, and pay the entry fee of $525. Despite these barriers, demand far outstrips supply! This year, over 11,000 people applied, and due to National Forest Service regulations, only 369 spots are available.
Some of these spots go to “automatic” qualifiers. This page goes into detail about the different ways you can qualify automatically; most of them are uncommon. For the 2026 race, there were 112 automatic qualifiers, leaving 257 spots to be awarded by lottery.1
According to this page, 11,328 people applied in the lottery for 2026.2 However, not all applicants are equally likely to be selected! The reason is that applicants who have been unsuccessful in the past are given more tickets.
Specifically, if this is your $k$th time applying since your last time running the race, you get $2^{k-1}$ tickets. This means that first-time applicants get one ticket, second-time applicants (who were not selected on their first try) get two tickets, third-time applicants get four, fourth-time applicants get eight, and so on. The tickets are mixed and drawn at random.3 Each person can win only one spot, even if more than one of their tickets is drawn. The drawing continues until the right number of people have been selected. This means that to fill, say, 257 spots, you will likely need to draw more than 257 tickets.4
For the main 2026 lottery, there are a total of 93,003 tickets. Five applicants are on their 10th year of applying, and get a whopping 512 tickets! A natural question is, what is their chance of being selected?
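These totals can be double-checked from the per-year applicant counts published on the lottery page (the same data used in the code at the end of this post):

```r
# Applicants by years in the lottery (2026 data, 10th-year applicants first),
# from the published entrant statistics
entrants <- c(5, 44, 128, 239, 354, 672, 1102, 1752, 2563, 4469)
tickets  <- 2^(9:0)   # 10th-year applicants hold 2^9 = 512 tickets each

sum(entrants)             # total applicants: 11328
sum(entrants * tickets)   # total tickets:    93003
```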
It turns out, answering this question is surprisingly tricky! One way to estimate it is to repeatedly simulate the lottery draw, and report the results. The lottery organizers do this each year. Their 2026 analysis is here, and includes a table summarizing their simulation results.
They find that the applicants on their 10th year have approximately an 80% chance of being selected. This is pretty good, but still means that it’s quite likely that at least one of them will not be chosen! That is, it is likely that there will be someone who has run an ultramarathon for 10 years in a row to earn 512 tickets and is still not selected. Tough.
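A rough back-of-the-envelope check: if each of the five 512-ticket applicants were selected independently with probability 0.8 (independence is only an approximation, since they compete for the same spots), the chance that at least one of them misses out would be about two thirds:

```r
# P(at least one of the five 512-ticket applicants is not selected),
# treating their outcomes as independent with success probability 0.8
1 - 0.8^5   # 0.67232
```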
However, I am a mathematician at heart. I am less focused on the human interest side of the story, and more focused on the question of whether there are other ways to calculate or approximate these success probabilities. If you’re interested in that as well, read on.
For the rest of this post, let’s introduce some notation. There are $n$ applicants for $m$ available spots. Applicant $i$ has $t_i$ tickets. Let $T = \sum_{i=1}^n t_i$ be the total number of tickets, and let $\pi_i$ be the probability that applicant $i$ is successful (that is, they claim one of the $m$ spots).
The question in this section is: can we calculate the $\pi_i$ exactly, rather than simulating them? The following code does this by enumerating all possible orderings of the applicants and calculating the probability of each ordering.
library(combinat)
##
## Attaching package: 'combinat'
## The following object is masked from 'package:utils':
##
## combn
exact_perm = function(Entrants, Tickets, m){
  tickets = rep(Tickets, Entrants)
  permutations = do.call(rbind, permn(1:length(tickets)))
  # Probability of a draw order: product over draws of (tickets drawn / tickets remaining)
  denom = function(perm){ prod(cumsum(rev(tickets[perm]))) }
  probs = prod(tickets) / apply(permutations, 1, denom)
  ind = match(Tickets, tickets)
  # Probability that applicant i appears among the first m draws
  successful = Vectorize(function(i){ sum(probs * rowSums(permutations[, 1:m] == i)) })
  successful(ind)
}
Tickets = c(8,4,2,1)
Entrants = c(1,2,3,4)
m = 3
scale = 1
#start_time = Sys.time()
exact_perm(Entrants*scale,Tickets,m*scale)
## [1] 0.7213612 0.4706161 0.2625541 0.1374361
#Sys.time()-start_time
Unfortunately, there are $n!$ possible orderings, which is generally enormous. With 10 applicants, this code took 60 seconds to run. With 11 applicants, it took 25 minutes. Running it with even 20 applicants is totally infeasible, let alone 11,000!
We could try to speed this up using two observations:

1. Only the first $m$ applicants drawn matter, so we only need to consider ordered sequences of length $m$.
2. Applicants with the same number of tickets are interchangeable.

The first observation leaves us with sequences of length $m$, of which there are approximately $n^m$: still far too many. The second observation implies that we can stop tracking individual applicants, and only track the number of tickets held by the first applicant to be drawn, the second, and so on. Since applicants can only have one of 10 ticket counts, this gives us something on the order of $10^m$ possibilities (slightly fewer, since certain ticket counts have fewer than $m$ applicants). But $10^{257}$ is still a completely impractical number. We need a new approach.
One challenge is that the probability of drawing a given individual evolves as tickets are pulled from the bin. Furthermore, this probability depends on exactly who has been drawn so far. To get around this, I introduce a simple yet helpful observation, which will provide a more efficient method for calculating $\pi_i$ exactly, as well as inspiring a method for approximating it.
Key Observation. We could choose to return tickets to the hat after they are selected. This might make the lottery last a bit longer (since we can now draw an individual ticket more than once), but it has no effect on the success probabilities $\pi_i$, since whenever a duplicate ticket is drawn we simply ignore it.
The advantage of this perspective is that the draws themselves are now iid. This observation inspired the following approach for calculating $\pi_i$, which I think is somewhat clever (if I do say so myself).
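The key observation itself can be checked by simulation. The sketch below uses a small hypothetical example (four applicants holding 8, 4, 2, and 1 tickets, with two spots available) and estimates each applicant's success probability under both schemes; the two columns should agree up to Monte Carlo error.

```r
# Monte Carlo check: drawing with replacement and ignoring duplicates gives
# the same success probabilities as drawing without replacement.
set.seed(1)
tix     <- c(8, 4, 2, 1)   # one applicant per ticket count
spots   <- 2               # spots available
ntrials <- 20000
wins_without <- wins_with <- numeric(4)

for (trial in 1:ntrials) {
  # Scheme 1: weighted sampling without replacement; keep the first `spots` drawn
  w1 <- sample(1:4, size = spots, prob = tix)
  wins_without[w1] <- wins_without[w1] + 1

  # Scheme 2: draw with replacement until `spots` unique applicants appear
  drawn <- integer(0)
  while (length(unique(drawn)) < spots) {
    drawn <- c(drawn, sample(1:4, size = 1, prob = tix))
  }
  w2 <- unique(drawn)
  wins_with[w2] <- wins_with[w2] + 1
}

round(cbind(without = wins_without, with = wins_with) / ntrials, 3)
```

Scheme 1 relies on the fact that `sample()` with `replace = FALSE` and a `prob` argument performs sequential weighted draws, which matches the lottery procedure.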
Fix an applicant $i$. Suppose that instead of drawing until $m$ unique names have been chosen, I draw tickets until the first time one of applicant $i$’s tickets is chosen. I then look to see: at this point, have $m$ other applicants already been chosen? If so, then $i$ is unsuccessful. If not, then $i$ has been selected.
Define $p_i = t_i / T$ to be the probability of drawing a ticket from applicant $i$ on a single draw. The number of draws required to draw one of $i$’s tickets follows a geometric distribution: the probability that exactly $d$ other tickets are drawn before the first ticket from applicant $i$ is $p_i (1 - p_i)^d$.
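As a quick sanity check of the geometric claim, the sketch below compares the formula to simulation for a hypothetical applicant holding 8 of 26 tickets. R's `rgeom` counts failures before the first success, which is exactly the number of other tickets drawn before this applicant's first ticket.

```r
# Check: P(exactly d other tickets drawn before applicant i's first ticket)
# should equal p_i * (1 - p_i)^d.
set.seed(2)
p_i   <- 8 / 26               # hypothetical: 8 of 26 tickets
d_sim <- rgeom(100000, p_i)   # failures before first success

rbind(empirical = sapply(0:3, function(d) mean(d_sim == d)),
      formula   = p_i * (1 - p_i)^(0:3))
```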
Given $d$, we want to know: what is the chance that $d$ draws from the applicants other than $i$ select at least $m$ unique applicants? Let $P_j(d)$ denote the probability that $d$ draws from the tickets of applicants other than $i$ result in exactly $j$ unique individuals. Then we can express $\pi_i$ as follows:

$$\pi_i = \sum_{d=0}^{\infty} p_i (1 - p_i)^d \sum_{j=0}^{m-1} P_j(d). \tag{1}$$
The question becomes: how do we calculate $P_j(d)$? We will tackle this through dynamic programming. Specifically, order all applicants arbitrarily, making applicant $i$ come last, so that applicants $1, \dots, n-1$ are the others.
Define $Q_k(j, d)$ to be the probability that we get exactly $j$ unique applicants when sampling $d$ times iid from the first $k$ applicants, with probabilities proportional to their ticket counts $t_1, \dots, t_k$. Then we clearly have $P_j(d) = Q_{n-1}(j, d)$, so we can write

$$\pi_i = \sum_{d=0}^{\infty} p_i (1 - p_i)^d \sum_{j=0}^{m-1} Q_{n-1}(j, d).$$
Define $q_k = t_k / (t_1 + \dots + t_k)$, the chance that a single draw from the first $k$ applicants comes from applicant $k$. It is clear that $Q_0(j, d) = 1\{j = 0\}$ for all $d$, and that $Q_k(j, 0) = 1\{j = 0\}$ for all $k$. I claim that we can use the following equations to fill in the remaining entries of $Q$:

$$Q_k(j, d) = (1 - q_k)^d \, Q_{k-1}(j, d) + \sum_{b=1}^{d} \binom{d}{b} q_k^b (1 - q_k)^{d-b} \, Q_{k-1}(j - 1, d - b).$$
In words, this says that in order to get $j$ unique individuals in $d$ samples from the first $k$ applicants, you must either never draw individual $k$ and get $j$ unique individuals from the remaining $k-1$ applicants, or draw individual $k$ some number of times $b \geq 1$ and get $j-1$ unique individuals from the first $k-1$ applicants in the remaining $d-b$ draws.
Below, I implement this recursion in R code (with some help from ChatGPT). Some error is introduced by only including finitely many terms from the sum in (1). However, this error can easily be bounded: if we only sum up to $d_{\max}$, then for $d \geq d_{\max}$ we have $\sum_{j=0}^{m-1} P_j(d) \leq \sum_{j=0}^{m-1} P_j(d_{\max})$ (that is, increasing $d$ decreases the probability of selecting fewer than $m$ applicants). Furthermore, $\sum_{d = d_{\max}+1}^{\infty} p_i (1 - p_i)^d = (1 - p_i)^{d_{\max}+1}$. From these two facts, it follows that if we truncate (1) at $d_{\max}$, we under-estimate the true success probability, with error at most $(1 - p_i)^{d_{\max}+1}$.
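For a sense of scale, here is the crude truncation bound $(1 - p_i)^{d_{\max}+1}$ evaluated for a hypothetical applicant holding 8 of 26 tickets, using the $d_{\max} = 10m$ rule of thumb that appears in the code below:

```r
# Crude upper bound on the truncation error from cutting off the sum at d_max.
p_i   <- 8 / 26   # hypothetical applicant: 8 of 26 tickets
m     <- 3
d_max <- 10 * m   # the rule of thumb used in exact_dp below
(1 - p_i)^(d_max + 1)   # about 1e-05: negligible here
```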
This method is much faster than enumerating over orderings. The array $Q$ is of size $O(n \, d_{\max}^2)$. Filling an individual cell of $Q$ might require summing up to $d_{\max}$ terms, so the time complexity of this function is $O(n \, d_{\max}^3)$. This method calculates the probability of success for a single applicant; to do this for all applicants, we must call calculate_P once for each unique value in Tickets (potentially up to $n$ times if all applicants have different numbers of tickets).

We can improve this in various ways. For example, the code below remembers only $Q_{k-1}$ and $Q_k$, reducing the space complexity from $O(n \, d_{\max}^2)$ to $O(d_{\max}^2)$. We could potentially track only $j \leq m$, further reducing the space complexity of calculate_P to $O(m \, d_{\max})$ and its time complexity to $O(n \, m \, d_{\max}^2)$. However, the main point is that we have moved from an exponential regime – something like $10^m$ – to a polynomial one.

Without any optimization, the code below handled a case with $n = 100$ applicants and $m = 30$ spots (achieved by setting scale = 10) in 14 seconds. Recall that the enumerative approach could only handle problems with $n = 11$ or so. But even my DP-based approach becomes totally prohibitive at $n \approx 11{,}000$ and $m = 257$.
# t     : number of tickets for each applicant
# d_max : maximum number of draws
#
# Returns: matrix P, where P[j, d] is the probability that d draws select
# j unique applicants.
# Ideally, this could determine an appropriate d_max automatically.
calculate_P <- function(t, d_max) {
  n <- length(t)
  p <- t / sum(t)
  s <- cumsum(p)                          # conditional draw probability is p[k] / s[k]
  Q_prev <- matrix(0, d_max + 1, d_max + 1)
  Q_prev[1, ] <- 1                        # k = 0: always 0 uniques
  for (k in seq_len(n)) {
    p_k <- p[k] / s[k]
    Q_curr <- matrix(0, d_max + 1, d_max + 1)
    Q_curr[1, 1] <- 1
    for (d in 1:d_max) {
      bp <- dbinom(0:d, d, p_k)           # binomial probabilities for b = 0..d
      j_max <- min(k, d)
      Q_curr[1, d + 1] <- bp[1] * Q_prev[1, d + 1]     # j = 0
      if (j_max >= 1) {
        block <- Q_prev[1:j_max, d:1, drop = FALSE]    # j_max x d
        conv <- as.vector(block %*% bp[-1])            # length j_max
        Q_curr[2:(j_max + 1), d + 1] <-
          bp[1] * Q_prev[2:(j_max + 1), d + 1] + conv
      }
    }
    Q_prev <- Q_curr
  }
  Q_prev[-1, -1]                          # drop j = 0 row and d = 0 column
}
# TODO: replace hacky setting of d_max; could also calculate bound on error
exact_dp = function(Entrants, Tickets, m){
  # pi[i, j] = probability that i is the j-th individual selected
  pi = matrix(0, nrow = length(Tickets), ncol = m)
  for(i in 1:length(Tickets)){
    if(Entrants[i] > 0){
      other_apps = Entrants
      other_apps[i] = other_apps[i] - 1
      all_other_tickets <- rep(Tickets, other_apps)
      d_max = m * 10                      # hacky way to set d_max -- replace
      P = calculate_P(all_other_tickets, d_max)
      p_i = Tickets[i] / (Tickets[i] + sum(all_other_tickets))
      pd = p_i * (1 - p_i)^(1:d_max)
      pi[i, ] = c(p_i, (P[1:m, ] %*% pd)[-m])
    }
  }
  apply(pi, 1, sum)                      # probability of selection for each individual
}
scale = 10
exact_dp(Entrants*scale,Tickets,m*scale)
## [1] 0.7101440 0.4615618 0.2654645 0.1425847
Given the limitations of the exact methods, it is natural to wonder whether we can come up with a good approximation for $\pi_i$ using an algorithm that runs much more quickly. Both of the methods below are very fast – much faster than using simulation. But the real appeal (to me) is that they provide explicit expressions that offer much more insight than any of the exact methods.
Method 1
The idea underlying this approximation is that on each draw, the chance that one of applicant $i$’s tickets is chosen is $p_i = t_i / T$. If we decided in advance to draw exactly $d$ times (with replacement), then the success probability for applicant $i$ would be exactly $1 - (1 - p_i)^d$. The first step of the approximation is to find the value of $d$ such that the expected number of unique successful applicants, $\sum_i \left(1 - (1 - p_i)^d\right)$, is equal to the number of available spots $m$. The second step assumes that we always draw exactly $d$ times (instead of sometimes drawing more or fewer, depending on how many duplicates we see), and so approximates $\pi_i \approx 1 - (1 - p_i)^d$.
Method 2
This approximation rests on two ideas, neither of which is exactly correct. First, all tickets should have the same chance of being drawn. Second, knowing that one particular ticket was drawn should not give much information about whether another ticket is drawn.
Suppose that we decide to choose each ticket independently with probability $p$. That is, for each ticket we flip a coin, and if it lands heads, that ticket is chosen. Step 1 in the method sets the bias of the coin so that on average, we get $m$ successful applicants: $\sum_i \left(1 - (1 - p)^{t_i}\right) = m$. Step 2 then calculates each applicant’s selection probability given this $p$, namely $\pi_i \approx 1 - (1 - p)^{t_i}$.
Connection to Sampling Literature
Unsurprisingly, ChatGPT informs me that there is a large literature on “sampling problems.” I don’t know this literature well, and do not want to turn this post into a literature review, but my Method #2 is equivalent to a Poisson-sampling trick that has already been proposed in that literature: each ticket is included in the sample independently, with a common inclusion probability tuned so that the expected sample size matches the target.
The following code implements each of these approximations (as well as a Monte Carlo simulation). As you can see, both approximations produce answers that are very close to the simulated results. I spent some time trying to formally bound the error of these methods, but that is more of a research-level problem that will take more time than I have at the moment.
approx_1 = function(applicants, tickets, places_available){
  p = tickets / sum(tickets * applicants)
  excess_demand = function(d){ sum(applicants * (1 - (1 - p)^d)) - places_available }
  d_max = sum(applicants)
  while(excess_demand(d_max) < 0){ d_max = 2 * d_max }
  d = uniroot(excess_demand, lower = places_available, upper = d_max)$root
  1 - (1 - p)^d
}
approx_2 = function(applicants, tickets, places_available){
  excess_demand = function(p){ sum(applicants * (1 - (1 - p)^tickets)) - places_available }
  p = uniroot(excess_demand, lower = 0, upper = 1)$root
  1 - (1 - p)^tickets
}
simulate_success_probs = function(applicants, tickets, spots_available, ntrials = 10000, seed = NA){
  all_tickets <- rep(tickets, applicants)
  counts = table(factor(c(), levels = tickets))
  if(!is.na(seed)) set.seed(seed)
  for(t in 1:ntrials){
    # Weighted sampling without replacement via exponential race times
    lottery_numbers = rexp(length(all_tickets)) / all_tickets
    df = as.data.frame(cbind(all_tickets, lottery_numbers))
    df = df[order(df$lottery_numbers), ]
    counts = counts + table(factor(df$all_tickets[1:spots_available], levels = tickets))
  }
  counts / applicants / ntrials
}
generate_df = function(Entrants, Tickets, m, n_trials = 10000, seed = 123){
  pi_df = as.data.frame(cbind(
    approx_1(Entrants, Tickets, m),
    approx_2(Entrants, Tickets, m),
    simulate_success_probs(Entrants, Tickets, m, n_trials, seed)
  ))
  names(pi_df) = c("Approx 1", "Approx 2", "Simulation")
  pi_df
}
# Data from 2026 WSER
Tickets = 2^(9:0)
Entrants = c(5,44,128,239,354,672,1102,1752,2563,4469)
m = 257
pi_df = generate_df(Entrants,Tickets,m)
round(100*pi_df,2)
## Approx 1 Approx 2 Simulation
## 512 79.60 79.52 79.70
## 256 54.78 54.74 54.85
## 128 32.74 32.73 32.66
## 64 17.98 17.98 17.98
## 32 9.43 9.44 9.44
## 16 4.83 4.83 4.84
## 8 2.45 2.45 2.45
## 4 1.23 1.23 1.23
## 2 0.62 0.62 0.62
## 1 0.31 0.31 0.31
Three of the lottery spots are given out (uniformly at random) to people who attend the lottery in person and were not selected in the main lottery. Thus, of the 257 lottery spots, only 254 are awarded by the main lottery. I will ignore this fact below.↩︎
You can get data from past years in a single spreadsheet, or by replacing the ‘2026’ in the previous URL with the desired year.↩︎
There used to be physical tickets selected by a Bingo machine, but in recent years the drawing has been virtual.↩︎
This same procedure is actually used to select some of the automatic qualifiers through two “raffles” that are held each year (before the main lottery). Each raffle awards spots in the race to 5 applicants. The only difference between the raffle and the main lottery is that in the lottery, tickets are determined by years spent waiting, while in the raffle, applicants can purchase as many $5 tickets as they wish (the money is then given to charity).↩︎
In practice, this is not actually true. Every year, some of the applicants who are selected withdraw from the race. Anticipating this, as part of the lottery, race organizers draw an ordered waitlist of 75 people. Many of these people eventually receive an offer to run the race, although sometimes this offer comes only shortly before the race itself.↩︎