One of the units in my operations management course is on variability and waiting time. I really enjoy this unit, for several reasons. First, we all have experience waiting in lines, so the topic feels personally relevant. Second, despite our experience waiting in lines, we don't have experience thinking about them systematically, and our intuition is actually quite poor.

One of the most important insights from queueing theory is that as utilization rises, average waiting time "blows up." When I first taught the course, I found myself struggling to explain **why** this occurs without resorting to a lot of calculations. Over time, I have found a way of thinking about the problem that offers "quantitative intuition" (i.e., a simple rule of thumb that enables back-of-the-envelope calculations), without resorting to analysis of Markov chains. This post outlines how I introduce the topic to my MBA class.

Consider the following scenario. You run a food truck, and have worked diligently on streamlining your service process, so that it now takes an average of only one minute to take a customer's order, collect payment, and hand them their food. Your truck is fairly popular, with an average of 50 customers showing up per hour. As a result, there is often a line, and on average five people are waiting by your truck at any one time (this includes those waiting to place their order as well as anyone you are currently helping). Say that the "flow time" for a customer is the time from joining the line to receiving food. I present students with three questions:^{1}

Currently, what is the average flow time?

If demand increased by 10%, what would be your new average flow time?

If demand doubled and you hired a second employee who was just as fast as you, what would be your new average flow time?

The first question can be answered using Little's Law. Our flow rate \(R = 50\) customers/hour, while our average number in system \(N = 5\) customers. Therefore, the average flow time is \(T = N/R = 1/10\) hour, or 6 minutes.^{2} Unfortunately, we can't use Little's Law to answer the second and third questions. While we can predict our new flow rate, we don't know how these changes will affect the average number of customers waiting in line. Therefore, I ask students to take a guess at what they think the answers might be.^{3} If you haven't seen this topic before, I encourage you to do the same now!
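For readers who want to check the arithmetic, here is a minimal sketch of the Little's Law step in Python. The function name `flow_time_minutes` and its parameter names are illustrative choices, not notation from the course.

```python
def flow_time_minutes(avg_in_system: float, flow_rate_per_hour: float) -> float:
    """Little's Law, T = N / R, with the result converted from hours to minutes."""
    if flow_rate_per_hour <= 0:
        raise ValueError("flow rate must be positive")
    return avg_in_system * 60.0 / flow_rate_per_hour


# Food truck today: N = 5 customers in system, R = 50 customers/hour.
print(flow_time_minutes(5, 50))  # 6.0
```

Note that the function needs the average number in system as an input, which is exactly the quantity we cannot observe in advance for the second and third questions.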

To actually answer these questions, we need a tool that will allow us to make predictions about the future (rather than just using data retroactively). Queueing theory is one such tool. If we assume that customer inter-arrival times and service times both follow an exponential distribution^{4} and customers never pre-emptively abandon the line, then the number of customers in system follows a continuous-time Markov chain. Anyone familiar with Markov chains can write out balance equations and use them to solve for the steady-state distribution. Formulas for the average flow time are given by this Wikipedia page. While they look complicated, they simplify significantly in the case of one or two employees (servers). If we define our utilization to be the percentage of time that we must work to meet demand, then with a single employee we get \[\text{Average Flow Time} = \frac{\text{Average Service Time}}{1 - \text{utilization}}.\] Recall that our average service time is one minute. Because our demand is 50 customers/hour but our capacity is 60 customers/hour, our utilization is \(5/6\). Therefore, this formula predicts an average flow time of 6 minutes, in agreement with our previous calculation. If demand increases by 10%, then our utilization also increases by 10%, to \(11/12\), and our flow time *doubles* to 12 minutes!
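The single-employee formula is easy to experiment with. The sketch below is one way to code it, assuming (as in the post) exponential inter-arrival and service times; the function and parameter names are made up for illustration.

```python
def single_server_flow_time(service_minutes: float,
                            demand_per_hour: float) -> float:
    """Average flow time in minutes: service time / (1 - utilization)."""
    capacity_per_hour = 60.0 / service_minutes
    utilization = demand_per_hour / capacity_per_hour
    if utilization >= 1:
        raise ValueError("demand must be below capacity for a steady state to exist")
    return service_minutes / (1.0 - utilization)


# Current demand, then demand after a 10% increase.
for demand in (50, 55):
    print(demand, round(single_server_flow_time(1.0, demand), 2))
```

With a one-minute service time, this gives roughly 6 minutes at 50 customers/hour and roughly 12 minutes at 55 customers/hour, matching the figures above.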

Meanwhile, with two employees our formula for average flow time is^{5} \[\text{Average Flow Time} = \frac{\text{Average Service Time}}{1 - \text{utilization}^2}.\] When demand is 100 customers/hour and we have two employees, our utilization remains \(5/6\), but this formula predicts a dramatically reduced flow time of \(36/11 \approx 3.27\) minutes!
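Both simplified formulas can be cross-checked against the general multi-server result. The sketch below uses the standard Erlang C expression for an M/M/c queue (exponential inter-arrival and service times, identical servers working in parallel, one shared line); the function and variable names are illustrative, not taken from the course materials.

```python
from math import factorial


def multi_server_flow_time(service_minutes: float,
                           demand_per_hour: float,
                           servers: int) -> float:
    """Average flow time (minutes) in an M/M/c queue via the Erlang C formula."""
    arrival_rate = demand_per_hour / 60.0        # customers per minute
    service_rate = 1.0 / service_minutes         # customers per minute, per server
    offered_load = arrival_rate / service_rate   # average number of busy servers
    utilization = offered_load / servers
    if utilization >= 1:
        raise ValueError("demand must be below total capacity")

    # Probability that an arriving customer finds every server busy (Erlang C).
    tail = offered_load ** servers / factorial(servers) / (1.0 - utilization)
    head = sum(offered_load ** k / factorial(k) for k in range(servers))
    prob_wait = tail / (head + tail)

    # Average time spent in the queue, plus the service time itself.
    wait_in_queue = prob_wait / (servers * service_rate - arrival_rate)
    return wait_in_queue + service_minutes


print(round(multi_server_flow_time(1.0, 50, 1), 2))   # one employee, 50/hour
print(round(multi_server_flow_time(1.0, 100, 2), 2))  # two employees, 100/hour
```

For one server this reduces to service time divided by \(1 - \text{utilization}\), and for two servers to service time divided by \(1 - \text{utilization}^2\), so the two calls reproduce the 6-minute and 3.27-minute answers.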

These answers should seem surprising. I have a lot of data (from student guesses) suggesting that they are surprising. If you are not surprised, then either (i) you have already learned about this topic, or (ii) you are not thinking about it very carefully.

What do we take from these calculations? One possible conclusion is that randomness is confusing, our intuition is bad, and we need to use the formulas above to predict the effect of a change. A more helpful approach is to use the formulas to develop insights. For example, both formulas clearly show that as utilization approaches \(1\), average flow time increases dramatically. They also show that, holding service time and utilization fixed, a "large scale" system (with higher demand and higher capacity) results in lower average waits. These are two key messages from class.

However, I wanted to go further. Why did flow time double in the first case? To answer this question, let's consider a simple setting with no randomness. Suppose that a group of five customers shows up all at once. From there on, everything is perfectly predictable, and in line with historical averages: one new customer arrives every 1.2 minutes (equivalent to a rate of 50 customers/hour), and we serve a customer every minute. How long until we "catch up" and empty the line?

You might initially be tempted to say 5 minutes, as this is how long it takes to serve our initial 5 customers. However, in that time, four new customers have shown up! Thinking through things more carefully, we see that the answer is 30 minutes: in that time, we can serve 30 customers (the original 5 plus 25 new arrivals).

Now suppose that demand is instead 55 customers per hour. How long will it take us to empty the line after that group of five shows up? In this case, the answer is 60 minutes: in this time, we can serve 60 customers (the original 5 plus 55 new arrivals). Thus, we see that this 10% increase in demand doubled the time that it takes us to catch up! Furthermore, Little's Law establishes that this time is proportional to customers' average wait.^{6}

In the original scenario, our capacity is 60 customers/hour, while our demand is 50 customers/hour. This means that we have an "excess capacity" of 10 customers/hour. In other words, it takes an hour to clear a backlog of 10 customers, or 30 minutes to clear a backlog of 5 customers. A 10% increase in demand reduces our excess capacity to 5 customers/hour, **half** of what it was before. Thus, our average wait time approximately **doubles.**
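The catch-up calculation can be written as a one-line formula: backlog divided by excess capacity. Here is a minimal sketch under the same no-randomness assumption (a fixed initial backlog, then perfectly steady arrivals and service); the function name `catch_up_minutes` is an illustrative choice.

```python
def catch_up_minutes(backlog: float,
                     demand_per_hour: float,
                     capacity_per_hour: float) -> float:
    """Minutes needed to clear a backlog when arrivals and service are perfectly steady."""
    excess_capacity = capacity_per_hour - demand_per_hour  # customers/hour
    if excess_capacity <= 0:
        raise ValueError("with no excess capacity the backlog never clears")
    return 60.0 * backlog / excess_capacity


print(catch_up_minutes(5, 50, 60))  # 30.0
print(catch_up_minutes(5, 55, 60))  # 60.0
```

Because excess capacity sits in the denominator, the result is very sensitive to demand once demand gets close to capacity.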

This example illustrates that the waiting time experienced by our customers is primarily determined by the rate at which we can catch up when we fall behind. Because we can't work on a customer's order before they arrive, we will inevitably fall behind sometimes (i.e., whenever a group of customers arrives in quick succession). If we are at high utilization, then new customers show up at nearly the same rate that we can help them, and it will take a **long time** to empty the queue again.

This motivates the following rule of thumb. Instead of thinking about percentage changes in demand, ask yourself "how does this change affect my excess capacity?" If your excess capacity is cut in half, wait times should roughly double. If it is cut by a factor of three, wait times should roughly triple. Conversely, if excess capacity doubles, wait times should be approximately cut in half.^{7}

This rule of thumb also helps to explain the lower wait times when demand and capacity both doubled. Notice that this doubles our excess capacity, to 20 customers/hour. Thus, we should expect it to roughly halve wait times.^{8}
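To see how the rule of thumb compares with the exact answers computed earlier (12 minutes and about 3.27 minutes), here is a small sketch. It assumes only that flow time scales inversely with excess capacity, starting from a known baseline; the function name is hypothetical.

```python
def rule_of_thumb_flow_time(baseline_minutes: float,
                            baseline_excess: float,
                            new_excess: float) -> float:
    """Scale a known flow time inversely with excess capacity (customers/hour)."""
    if baseline_excess <= 0 or new_excess <= 0:
        raise ValueError("excess capacity must be positive")
    return baseline_minutes * baseline_excess / new_excess


baseline = 6.0  # minutes, with 10 customers/hour of excess capacity

# Demand up 10%: excess capacity falls from 10 to 5 customers/hour.
print(rule_of_thumb_flow_time(baseline, 10, 5))   # 12.0

# Demand and capacity both double: excess capacity rises from 10 to 20.
print(rule_of_thumb_flow_time(baseline, 10, 20))  # 3.0
```

The first estimate matches the single-employee formula exactly, while the second is close to, but slightly below, the two-employee figure of about 3.27 minutes.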

In an effort to keep things simple, I have ignored many factors in the analysis above. Most notably, I have taken the size of the backlog (5 customers) as fixed across scenarios. The frequency with which we fall behind (and the amount by which we fall behind) depend on both demand and the amount of variability present in the system. This is one reason that my rule of thumb is not perfect.^{9} However, neither is the model that we started with! In particular, the assumptions that arrivals are steady over a long period and do not depend on the current queue length are frequently violated in practice.

The reality is, most of the formulas we teach in class are worth far less than the insights that they reveal. I find the rule of thumb based on excess capacity to be a helpful way to translate a classroom model into simple intuition which could plausibly be deployed in practice. If you have taught this material before, I'd love to hear your thoughts! How do you explain these concepts to MBAs? And what are your biggest concerns with my rule of thumb?

In class, I don't use the phrase "flow time," as one of the key challenges to applying concepts from class is recognizing them when they arise.↩

This answer is counter-intuitive to many students. After all, if the line has an average of five customers and it takes one minute to serve each, then even the last customer in line should get their food within the next five minutes! How can the average time be 6 minutes? The key to understanding the flaw in this logic is recognizing that there are two different perspectives: the *customer's* perspective, and the *employee's* perspective. The average of five customers gives the employee's perspective, and does *not* imply that the average customer sees a line of length five. This is easiest to see by taking an extreme: if the truck is very unpopular, the average number of people in line could be below one, yet customers always observe at least one person in line (themselves). Put another way, when the line is empty, nobody is there to observe it! This is known as the inspection paradox. Wikipedia's offerings on the topic are sadly technical and unenlightening, but Allen Downey has an excellent blog post illustrating many contexts in which it arises.↩

The most common guesses are that the 10% increase in demand will cause flow time to increase by approximately 10%, and that simultaneously doubling demand and the number of employees should keep flow time roughly constant.↩

For customer arrivals, this is plausible: it holds if customers make independent arrival decisions. Of course, this may not be the case: for example, many students go get lunch right after class lets out. The assumption of exponential service times is even less plausible, but is mathematically convenient, captures the key insights highlighted in this post, and can readily be relaxed.↩

In class, we tell students to use a multi-server approximation derived by Sakasegawa in 1977. As far as I can tell, this approximation has (for no particular reason) become gospel in operations management classes across the country. Students always ask me about where the crazy square root term in the exponent comes from. For this post, it seems simpler to use the exact expression for an M/M/2 queue.↩

One version of Little's Law states that total customer waiting time is equal to the area under the inventory curve. Until the backlog is drained, the inventory curve looks roughly like a triangle with height five and base equal to the time it takes to empty the queue. Thus, doubling the time taken to empty the queue roughly doubles total customer waiting time.↩

These calculations parallel those for the profitability of a business. Suppose that our total revenue is $60, and our total cost is $50. If costs rise by 10%, does profit decrease by 10%? No! Our initial profit (or "excess revenue") is $10. A 10% rise in cost cuts profit in half! Thus, for a business with low profit margins, even small changes in cost or revenue can have a dramatic effect on profit. Analogously, in a queue with high utilization, even small changes in the arrival or service rate can dramatically affect wait times.↩

This conclusion in part depends on how work is structured. Are employees working in parallel, or are they arranged sequentially, with each assigned to one step in the process? This will be the topic of another post.↩

It is possible to use simple heuristics to explain the variability term of \((CV_s^2 + CV_a^2)/2\) that appears for more general service time and inter-arrival time distributions, but that is a topic for another post.↩