Ashlagi, Kanoria and Leshno
Azevedo and Leshno
This is really from my paper “Lottery Design for School Choice.” (Do a separate post for Probabilistic Predictions in Random Matching Markets.)
Note that I have a chip on my shoulder.
Without my work, this would be a dissertation. With my work, it becomes a class exercise for a PhD student. (Reference “which markets lead to stark effect of competition”?)
If people aren’t using it, I need to do a better job of making it seem usable.
I chose the title to emphasize the “managerial insights” that come out of the paper, but I view the biggest contribution of the paper to be developing the analytical model.
Although a rigorous proof exists only for cases where preferences are “sparse”, we will see that the method can generate predictions for any list lengths. (I was most excited by number of matches, as no other work had really been able to do this, and this quantity is only interesting when lists are not too long)
Analyzing the number of unmatched agents becomes tractable.
Consider a random matching market with four parameters:
Schools have uniform random priorities, and student preferences are also uniformly random. This is of course a very stylized set of assumptions, but many papers have made these assumptions, and precise analysis of these markets is a challenging problem that has already resulted in a number of high-profile publications.
This post will show how to make analysis of these markets easy.
My analysis techniques will make the following assumptions: 1. The number of proposals at each school is Poisson with mean . 2. From a student’s perspective, each of their proposals is accepted independently with probability .
I leave these unjustified for now: my focus in this post is on how to make predictions, not why it works. If you are the skeptical type, know that my paper does provide some rigorous justification for the equations in this post, and also that these equations can be used to reproduce most of the insights shown for uniform markets.
Given and , we can calculate the expected number of matches from the perspective of each side of the market. If a school receives more proposals than its capacity, it fills all seats. Otherwise, it accepts all proposals. Therefore, Assumption 1 above implies that each school’s expected enrollment (number of accepted proposals) is
Meanwhile, looking from students’ perspectives, Assumption 2 implies that each student is matched with probability . Therefore the number of matches must be
The other equation we need is one that relates and . I use the following equation.
For a heuristic argument justifying this choice, note that Assumption 1 implies that there are total proposals sent, and of them are accepted, so is the ratio of accepted proposals to total proposals.
Alternatively, we could express as a function of . In that case, we could calculate that the expected number of proposals sent by each student is , and therefore, the expected number of proposals sent to any given school is
Combining and , we get…
It is easy to see that for any , , , and this equation has a unique solution : as increases from to , the left side increases continuously from to , while decreases continuously from to , causing the right side to decrease continuously from to . Therefore, there must be a unique choice of such that the two sides are equal.
Let be the solution to this equation, and $ = p().
Then the predicted fraction of students who receive their choice is . From this, we can calculate all sorts of statistics, including matched students’ average rank (which has been a focus of prior work).
For example, the setting of Ashlagi, Kanoria, and Leshno assumes that , , and . Plugging these choices into and solving can yield all of the insights present in their paper.
Note that when (students list all schools), the number of matches “should be” (the minimum of the number of students and the number of seats). The equation above will predict a slightly smaller number. This is because it implicitly assumes that students sample schools on their list with replacement.
Note that I prove this is rigorous when and are bounded and . But I think it works much more broadly than that. At the very least, we can plug in whatever values we want.
Strictly speaking, I didn’t prove their result. But I do provide all of the same insights, with much less work. And I show how their formulas would extend to other settings. For example, you can use this to analyze what happens if students submit shorter lists, or if each school has two seats instead of one.
Think about it as, these equations will almost always give you the right answer. There might still be a little work to prove it.
Show figures?
Suppose that we relax our assumption that schools have the same number of seats. Now school has capacity . Furthermore, we will allow students to list different numbers of schools, and schools to differ in their popularity. Specifically, student lists schools, and the schools listed by are sampled from a fixed distribution over the set of schools. Specifically, let be times the…
Mention Immorlica-Mahdian, Kojima-Pathak.
Define , so that the average “school popularity” is .
Then the modification becomes
Where
So far, we have focused on predicting aggregate outcomes (i.e. the total number of proposals, or the fraction of students who receive their choice). But this model can also be used to make predictions for individual students and schools.
Suppose that student proposes to school . Their probability of admission is Applying our independence assumption, we conclude that for a student who listed schools in the order the probability of being matched to school is .
We can also predict the cutoff score distribution for an individual school. For example, suppose that student priorities are sampled uniformly on . (This is without loss of generality – because only the ordering of students matters, any atomless distribution would work.)
Suppose that we want to know the chance that the cutoff score at school is at most . This is simply .
When the cutoff at a school is zero, we might be interested to know how many of its seats are filled. The chance that school is
Mention that of course we can take the same limit as them: scale up and by a large constant (keeping distribution over fixed). In this limit, the predictions coincide: my converges to one minus their cutoff score.
But there is actually an easier way! We can just replace the function with in , and solve the model identically from there.
Note that this shows how to recover their prediction in this special case, and how to improve upon it by allowing school capacities to be arbitrary (rather than assuming they are large). But we have still made some very restrictive assumptions on priorities (iid uniformly random, or in the paper they can also be identical across schools) and preferences (sampled iid from a fixed distribution over schools). Azevedo Leshno works much more generally! My next post will explain how to recover their results, and produce probabilistic predictions in markets with almost arbitrary preferences and priorities.