Just as rush-hour traffic snarls the morning commute into a big city, congestion occurs when users try to send data across the internet faster than the network can handle it.
Computers and other devices that send data over the internet break it into smaller packets and use a congestion control algorithm to decide how quickly to send each one. These algorithms aim to share network capacity fairly among all users of the same network while reducing the delays caused by data waiting in queues.
Over the past decade, researchers in industry and academia have developed a number of algorithms that aim to achieve high data rates while minimizing delays. Some of these, like the BBR algorithm developed by Google, are now widely used by many websites and applications.
But a team of MIT researchers has discovered that these algorithms can be deeply unfair. According to a recent study, there will always be a network scenario in which at least one sender receives almost no bandwidth compared to the other senders; this problem, known as starvation, cannot be avoided.
“What is really surprising about this paper and the results is that when you take into account the real-world complexity of network paths and all the things they can do to data packets, it is basically impossible for delay-controlling congestion control algorithms to avoid starvation using current methods,” says Mohammad Alizadeh, associate professor of electrical engineering and computer science (EECS).
While Alizadeh and his co-authors were unable to find a conventional congestion control algorithm that avoids starvation, other classes of algorithms might be able to do so. Their analysis also suggests that changing how these algorithms behave, by allowing larger variations in delay, could help prevent starvation in some network scenarios.
Alizadeh wrote the paper with first author and EECS graduate student Venkat Arun and senior author Hari Balakrishnan, the Fujitsu Professor of Computer Science and Artificial Intelligence. The research will be presented at the ACM Special Interest Group on Data Communications (SIGCOMM) conference.
Controlling congestion
Congestion control is a fundamental problem in networking that researchers have been trying to tackle since the 1980s.
A user’s computer cannot tell how quickly to send data packets over the network because it lacks key information, such as the quality of the network connection or how many other senders are using the network. Sending packets too slowly wastes the available bandwidth.
However, if they are sent too quickly, the network could become overloaded, which would result in packets being dropped. These packets must be resent, which leads to longer delays. Delays can also be caused by packets waiting in queues for a long time.
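A congestion controller navigates this tradeoff by probing upward until it sees trouble, then backing off. The following is a toy, AIMD-style sketch of that idea, not any production algorithm; the constants and thresholds are illustrative assumptions.

```python
# Toy congestion controller: additive increase while the path looks clear,
# multiplicative decrease when congestion signals appear.
# All constants here are illustrative, not taken from any real algorithm.

def update_rate(rate, loss_detected, rtt, base_rtt,
                delay_threshold=0.05, increase=1.0, decrease=0.5):
    """Return a new sending rate (packets/sec) from congestion signals."""
    queuing_delay = rtt - base_rtt  # delay beyond bare propagation time
    if loss_detected or queuing_delay > delay_threshold:
        return rate * decrease      # back off sharply on congestion
    return rate + increase          # otherwise probe gently for more bandwidth

rate = 100.0
rate = update_rate(rate, loss_detected=False, rtt=0.11, base_rtt=0.10)
# no loss, little queuing: rate grows to 101.0
rate = update_rate(rate, loss_detected=True, rtt=0.20, base_rtt=0.10)
# loss detected: rate is halved to 50.5
```

When every sender follows a rule like this, the shared bottleneck tends toward a rough balance: everyone backs off when queues build and probes again when they drain.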
Congestion control algorithms use packet losses and delays as signals to estimate congestion and set the data transmission rate. But because the internet is complicated, packets can be lost or delayed for reasons unrelated to network congestion.
For example, the receiver’s acknowledgment can be delayed, or data might be held up in a queue along the route and then released with a burst of other packets. The authors call delays that are not caused by congestion “jitter.”
Even if a congestion control algorithm measures delay accurately, it will not be able to distinguish between jitter- and congestion-induced delays. Delay caused by jitter is unpredictable and confuses the sender.
Because of this ambiguity, senders begin evaluating delay differently, which leads them to send packets at different rates. Eventually, Arun explains, this produces a situation where starvation occurs and someone gets shut out completely.
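The ambiguity described above can be illustrated with a small simulation, assuming for the sake of example that jitter is uniform random noise of up to 50 ms (an invented figure): a heavily congested path and an idle-but-jittery path can produce indistinguishable delay samples.

```python
import random

# Illustrative only: the delay a sender measures is the sum of congestive
# queuing delay and non-congestive "jitter," so a single sample cannot
# reveal which component produced it.
random.seed(1)

def measured_delay(congestive_delay, max_jitter=0.05):
    """One delay sample as seen by the sender: congestion plus random jitter."""
    return congestive_delay + random.uniform(0.0, max_jitter)

# A congested path (40 ms of queuing) with some jitter...
sample_a = measured_delay(congestive_delay=0.04)
# ...can yield a sample in the same range as an idle path with jitter alone:
sample_b = measured_delay(congestive_delay=0.00)
print(sample_a, sample_b)  # the two ranges overlap, so samples are ambiguous
```

Two senders drawing different jitter samples will reach different conclusions about congestion, which is exactly the divergence that leads toward starvation.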
“We started the project because we lacked a theoretical understanding of congestion control behavior in the presence of jitter. To place it on a firmer theoretical footing, we built a mathematical model that was simple enough to think about, yet able to capture some of the complexities of the internet. It has been very rewarding to have math tell us things we didn’t know and that have practical relevance,” he says.
Studying starvation
The researchers fed their mathematical model to a computer, gave it a series of well-known congestion control algorithms, and asked it to use the model to find an algorithm that could avoid starvation.
“We couldn’t do it. We tried every algorithm that we are aware of, and some new ones we made up. Nothing worked. The computer always found a situation where some people get all the bandwidth and at least one person gets basically nothing,” Arun says.
The researchers were taken aback by this outcome, especially since these algorithms are widely believed to be reasonably fair. They began to suspect that starvation, an extreme form of unfairness, might be unavoidable. This motivated them to define a class of algorithms they call “delay-convergent algorithms” that they proved will always suffer from starvation under their network model. All existing congestion control algorithms that control delay (that the researchers are aware of) are delay-convergent.
“The fact that such simple failure modes of these widely used algorithms remained unknown for so long illustrates how difficult it is to understand algorithms through empirical testing alone,” Arun adds. “It underscores the importance of a solid theoretical foundation.”
But all hope is not lost. Even though every algorithm they tested failed, there may be other, non-delay-convergent algorithms that can avoid starvation. This suggests that one way forward could be to design congestion control algorithms that vary the delay range more widely, making the range larger than any delay that might occur due to network jitter.
“To control delays, algorithms have tried to also bound the variations in delay about a desired equilibrium, but there is nothing wrong in potentially creating greater delay variation to get better measurements of congestive delays. It is just a new design philosophy you would have to adopt,” Balakrishnan adds.
The researchers are now working to see whether they can find or build an algorithm that eliminates starvation. They also plan to apply the same strategy of mathematical modeling and computational proofs to other challenging, unresolved problems in networked systems.
“We are increasingly reliant on computer systems for very critical things, and we need to put their reliability on a firmer conceptual footing. We’ve shown the surprising things you can discover when you put in the time to come up with these formal specifications of what the problem actually is,” says Alizadeh.