3.7 – TCP Congestion Control
- The approach taken by TCP is to have each sender limit the rate at which it sends traffic into its connection as a function of perceived network congestion.
- If the TCP sender perceives that there is little congestion, then it increases its send rate
- If the TCP sender perceives that there is congestion, then it reduces its send rate
- How a TCP sender limits the rate at which it sends traffic into the connection:
- The TCP congestion-control mechanism operating at the sender keeps track of an additional variable (on top of the buffers and other variables already discussed), the congestion window (cwnd).
- Cwnd imposes a constraint on the rate at which a TCP sender can send traffic into the network (the amount of unacknowledged data at a sender may not exceed the minimum of cwnd and rwnd):
- $$LastByteSent - LastByteAcked \leq \min(cwnd, rwnd)$$
- Let’s assume that the receive-window constraint can be ignored, so that the amount of unacknowledged data at the sender is limited solely by cwnd, and that the sender always has data to send:
- The constraint above limits the amount of unacknowledged data at the sender and therefore indirectly limits the sender’s send rate.
- Thus the sender’s send rate is roughly cwnd/RTT bytes/sec. By adjusting the value of cwnd, the sender can therefore adjust the rate at which it sends data into its connection.
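- A minimal sketch of this constraint and the resulting rate estimate (the function names and the concrete numbers are illustrative assumptions, not from the text):

```python
# Unacknowledged data may not exceed min(cwnd, rwnd), so the sender
# pushes roughly one such window of bytes per round-trip time.

def allowed_window(cwnd: int, rwnd: int) -> int:
    """Bytes the sender may have in flight at once."""
    return min(cwnd, rwnd)

def approx_send_rate(cwnd: int, rwnd: int, rtt: float) -> float:
    """Approximate send rate in bytes/sec: one window per RTT."""
    return allowed_window(cwnd, rwnd) / rtt

MSS = 1460  # a common maximum segment size, used here for illustration
# 10 segments in flight, a large receive window, 100 ms round-trip time:
print(approx_send_rate(cwnd=10 * MSS, rwnd=65_535, rtt=0.100))  # 146000.0 B/s
```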
- How a TCP sender perceives that there is congestion on the path between itself and the destination:
- A “loss event” at a TCP sender: the occurrence of either a timeout or the receipt of three duplicate ACKs from the receiver.
- When there is excessive congestion, then one (or more) router buffers along the path overflows, causing datagrams to be dropped.
- The dropped datagram, in turn, results in a loss event at the sender, which is taken by the sender to be an indication of congestion.
- The case when the network is congestion-free, that is, when a loss event doesn’t occur:
- Acknowledgments for previously unacknowledged segments will be received at the TCP sender.
- TCP takes the arrival of these acknowledgments as an indication that all is well, and will use acknowledgments to increase its congestion window size.
- Note that if acknowledgments arrive at a relatively slow rate, then the congestion window will be increased at a relatively slow rate. On the other hand, if acknowledgments arrive at a high rate, then the congestion window will be increased more quickly.
- Because TCP uses acknowledgments to trigger its increase in congestion window size, TCP is said to be self-clocking.
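- A sketch of how a sender might classify the two implicit signals described above (the threshold of three duplicate ACKs matches TCP’s fast-retransmit rule; the class and names are illustrative simplifications):

```python
class LossDetector:
    """Classify loss events: a timeout, or three duplicate ACKs."""

    def __init__(self) -> None:
        self.last_ack = -1
        self.dup_count = 0

    def on_ack(self, ack_no: int) -> str | None:
        if ack_no == self.last_ack:
            self.dup_count += 1
            if self.dup_count == 3:
                return "triple-dup-ack"  # triggers fast retransmit
        else:
            self.last_ack = ack_no       # new data ACKed: all is well
            self.dup_count = 0
        return None

    def on_timeout(self) -> str:
        return "timeout"                 # the more severe congestion signal

d = LossDetector()
events = [d.on_ack(a) for a in (100, 200, 200, 200, 200)]
print(events[-1])  # -> "triple-dup-ack"
```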
- How should a TCP sender determine the rate at which it should send? TCP uses the following guiding principles to answer the question:
- A lost segment implies congestion, and hence, the TCP sender’s rate should be decreased when a segment is lost.
- An acknowledged segment indicates that the network is delivering the sender’s segment to the receiver, and hence, the sender’s rate can be increased when an ACK arrives for a previously unacknowledged segment.
- Bandwidth probing:
- Given ACKs indicating a congestion-free path and loss events indicating a congested path, TCP’s strategy for adjusting its transmission rate is to increase its rate in response to arriving ACKs until a loss event occurs, at which point, the transmission rate is decreased.
- The TCP sender thus increases its transmission rate to probe for the rate at which congestion onset begins, backs off from that rate, and then begins probing again to see if the congestion onset rate has changed.
- Note that there is no explicit signaling of congestion state by the network; ACKs and loss events serve as implicit signals, and each TCP sender acts on local information asynchronously from other TCP senders.
- The TCP congestion-control algorithm has three major components:
- Slow start (required)
- Congestion avoidance (required)
- Fast recovery (not required, but recommended)
- Slow start:
- When a TCP connection begins, the value of cwnd is typically set to a small value of 1 MSS, resulting in an initial sending rate of roughly MSS/RTT.
- The TCP sender would like to find the amount of available bandwidth quickly. Thus, in this slow-start state, the value of cwnd is increased by 1 MSS every time a transmitted segment is first acknowledged.
- This process results in a doubling of the sending rate every RTT.
- The TCP send rate starts slow, but grows exponentially during the slow start phase.
- When should the exponential growth end? There are three cases:
- If there is a loss event indicated by a timeout, the TCP sender sets the value of cwnd to 1 MSS and begins the slow-start process anew.
- It also sets the value of a second state variable ssthresh to cwnd/2
- Since ssthresh is half the value that cwnd had when congestion was last detected, it might be a bit reckless to keep doubling cwnd once it reaches or surpasses ssthresh. Thus, when the value of cwnd equals ssthresh, slow start ends and TCP transitions into congestion-avoidance mode.
- If three duplicate ACKs are detected, TCP performs a fast retransmit and enters the fast-recovery state.
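- A compact sketch of the slow-start rules above (cwnd is tracked in bytes; the initial ssthresh and the MSS value are illustrative assumptions):

```python
MSS = 1460  # illustrative maximum segment size in bytes

class SlowStartSender:
    def __init__(self) -> None:
        self.cwnd = 1 * MSS        # initial window of 1 MSS
        self.ssthresh = 64 * MSS   # assumed large initial threshold
        self.state = "slow_start"

    def on_new_ack(self) -> None:
        """+1 MSS per first-time ACK: cwnd doubles every RTT."""
        if self.state == "slow_start":
            self.cwnd += MSS
            if self.cwnd >= self.ssthresh:
                self.state = "congestion_avoidance"

    def on_timeout(self) -> None:
        """Timeout: remember half the window, restart from 1 MSS."""
        self.ssthresh = self.cwnd // 2
        self.cwnd = 1 * MSS
        self.state = "slow_start"

# One RTT delivers one ACK per segment in flight, so each RTT doubles cwnd:
s = SlowStartSender()
for rtt in range(4):
    for _ in range(s.cwnd // MSS):
        s.on_new_ack()
print(s.cwnd // MSS)  # 16 segments after 4 round trips: exponential growth
```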
- Congestion Avoidance:
- The cwnd is now approximately half its value when congestion was last encountered. Instead of doubling the value of cwnd every RTT, TCP increases the value of cwnd by a single MSS every RTT.
- A common approach is for the TCP sender to increase cwnd by MSS · (MSS/cwnd) bytes whenever a new acknowledgment arrives.
- Example:
- If MSS is 1460 bytes and cwnd is 14,600 bytes, then 10 segments are being sent within one RTT. Each arriving ACK increases the congestion window size by 1/10 MSS, and thus the value of the congestion window will have increased by one MSS after ACKs for all 10 segments have been received.
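- A quick check of the arithmetic in this example (numbers taken from the example itself):

```python
MSS = 1460
cwnd = 14_600                        # 10 segments in flight per RTT
segments_per_rtt = cwnd // MSS       # 10

per_ack_increase = MSS * MSS / cwnd  # MSS * (MSS / cwnd) per arriving ACK
print(per_ack_increase)                     # 146.0 bytes = MSS / 10
print(per_ack_increase * segments_per_rtt)  # 1460.0 = one MSS per RTT
```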
- When should congestion avoidance’s linear increase end? The TCP congestion-avoidance algorithm behaves the same way when a timeout occurs as slow start does:
- The value of cwnd is set to 1 MSS, and the value of ssthresh is updated to half the value that cwnd had when the loss event occurred.
- A loss event can also be triggered by a triple duplicate ACK. In this case, the network is continuing to deliver segments from sender to receiver, so TCP’s reaction to this type of loss event should be less drastic than with a timeout-indicated loss:
- TCP halves the value of cwnd (adding in 3 MSS for good measure to account for the triple duplicate ACKs received) and records the value of ssthresh to be half the value that cwnd had when the triple duplicate ACKs were received. The fast-recovery state is then entered.
- Fast recovery:
- In fast recovery, the value of cwnd is increased by 1 MSS for every duplicate ACK received for the missing segment that caused TCP to enter the fast-recovery state.
- Eventually, when an ACK arrives for the missing segment, TCP enters the congestion-avoidance state after deflating cwnd. If a timeout event occurs, fast recovery transitions to the slow-start state after performing the same actions as in slow start and congestion avoidance:
- The value of cwnd is set to 1 MSS, and the value of ssthresh is set to half the value that cwnd had when the loss event occurred.
- An earlier version of TCP (TCP Tahoe) unconditionally cut its congestion window to 1 MSS and entered the slow-start phase after either a timeout-indicated or triple-duplicate-ACK-indicated loss event.
- The newer version, which incorporates fast recovery, is named TCP Reno.
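- Putting the three states together, a sketch of the Reno state machine described above (a simplification: the names are made up, and real implementations handle many more details, such as window deflation and SACK):

```python
MSS = 1460  # illustrative

class RenoSender:
    def __init__(self) -> None:
        self.cwnd = 1 * MSS
        self.ssthresh = 64 * MSS  # assumed initial threshold
        self.state = "slow_start"
        self.dup_acks = 0

    def on_new_ack(self) -> None:
        self.dup_acks = 0
        if self.state == "slow_start":
            self.cwnd += MSS                     # exponential growth
            if self.cwnd >= self.ssthresh:
                self.state = "congestion_avoidance"
        elif self.state == "congestion_avoidance":
            self.cwnd += MSS * MSS // self.cwnd  # ~ +1 MSS per RTT
        elif self.state == "fast_recovery":
            self.cwnd = self.ssthresh            # deflate cwnd
            self.state = "congestion_avoidance"

    def on_dup_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd += MSS   # each dup ACK: a segment has left the network
            return
        self.dup_acks += 1
        if self.dup_acks == 3:  # fast retransmit, then fast recovery
            self.ssthresh = self.cwnd // 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"
            self.dup_acks = 0

    def on_timeout(self) -> None:  # same reaction in every state
        self.ssthresh = self.cwnd // 2
        self.cwnd = 1 * MSS
        self.state = "slow_start"
        self.dup_acks = 0
```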
- Ignoring the initial slow-start period when a connection begins, and assuming that losses are indicated by triple duplicate ACKs rather than timeouts, TCP’s congestion control consists of a linear increase in cwnd of 1 MSS per RTT followed by a halving of cwnd on a triple-duplicate-ACK event. For this reason, TCP congestion control is often referred to as an additive-increase, multiplicative-decrease (AIMD) form of congestion control.
- AIMD gives rise to the “sawtooth” behavior.
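- The sawtooth can be reproduced in a few lines (the loss point of 16 segments is an arbitrary illustrative choice):

```python
# AIMD in units of MSS: +1 per RTT, halve when cwnd reaches the
# (assumed) capacity at which a loss occurs.
cwnd, loss_at = 8, 16
trace = []
for rtt in range(20):
    trace.append(cwnd)
    cwnd += 1           # additive increase
    if cwnd >= loss_at:
        cwnd //= 2      # multiplicative decrease on loss
print(trace)  # 8, 9, ..., 15, 8, 9, ...: the sawtooth
```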
- The TCP Vegas algorithm attempts to avoid congestion while maintaining good throughput. The basic idea of Vegas:
- Detect congestion in the routers between source and destination before packet loss occurs
- Lower the rate linearly when this imminent packet loss is detected.
- Imminent packet loss is predicted by observing the RTT. The longer the RTT of the packets, the greater the congestion in the routers.
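- A sketch of the Vegas decision rule (the queueing estimate and the alpha/beta thresholds follow the published Vegas design, but the concrete values and names here are illustrative assumptions):

```python
ALPHA, BETA = 1, 3  # thresholds, in segments-worth of queued data

def vegas_adjust(cwnd: float, base_rtt: float, current_rtt: float) -> float:
    expected = cwnd / base_rtt               # throughput with empty queues
    actual = cwnd / current_rtt              # throughput actually achieved
    queued = (expected - actual) * base_rtt  # estimated segments in queues
    if queued < ALPHA:
        return cwnd + 1   # path looks idle: probe upward
    if queued > BETA:
        return cwnd - 1   # queues building: back off linearly
    return cwnd           # in the sweet spot: hold

# RTT has grown from 100 ms to 150 ms: queues are building, so back off.
print(vegas_adjust(cwnd=10, base_rtt=0.100, current_rtt=0.150))  # 9.0
```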
- Ten years after TCP’s development, theoretical analyses showed that TCP’s congestion-control algorithm serves as a distributed asynchronous-optimization algorithm that results in several important aspects of user and network performance being simultaneously optimized.
- Ignoring the slow-start phases that occur after timeout events, the average throughput of a long-lived TCP connection can be derived as follows:
- During a particular round-trip interval, the rate at which TCP sends data is a function of the congestion window and the current RTT. When the window size is w bytes and the current round-trip time is RTT seconds, then TCP’s transmission rate is roughly w/RTT
- TCP probes for additional bandwidth by increasing w by 1 MSS each RTT until a loss event occurs.
- Denote by W the value of w when a loss event occurs. Assuming that RTT and W are approximately constant over the duration of the connection, the TCP transmission rate ranges from W/(2*RTT) to W/RTT
- The network drops a packet from the connection when the rate increases to W/RTT; the rate is then cut in half and then increases by MSS/RTT every RTT until it again reaches W/RTT. This process repeats itself over and over again. Because TCP’s throughput increases linearly between the two extreme values, we have:
- Average throughput of a connection $$= \frac{0.75\cdot W}{RTT}$$
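- The 0.75 factor is the midpoint of the linear ramp between the two extremes: $$\frac{1}{2}\left(\frac{W}{2\cdot RTT}+\frac{W}{RTT}\right)=\frac{3}{4}\cdot\frac{W}{RTT}=\frac{0.75\cdot W}{RTT}$$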
- The need for continued evolution of TCP can be illustrated by considering the high-speed TCP connections needed for grid- and cloud-computing applications:
- Consider a TCP connection with 1500-byte segments and a 100 ms RTT, and suppose we want to send data through this connection at 10 Gbps. To achieve a 10 Gbps throughput, the average congestion window size would need to be 83,333 segments.
- What fraction of the transmitted segments could be lost while still allowing the TCP congestion-control algorithm to achieve the desired 10 Gbps rate?
- Average throughput of a connection $$= \frac{1.22\cdot MSS}{RTT\sqrt{L}}$$, where L is the segment loss rate.
- Using this formula we see that the TCP congestion-control algorithm can only tolerate a segment loss probability of $$2\cdot 10^{-10}$$
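- Both numbers can be checked directly (segment size and RTT taken from the example above):

```python
MSS_BITS = 1500 * 8   # segment size in bits
RTT = 0.100           # seconds
TARGET = 10e9         # 10 Gbps

window = TARGET * RTT / MSS_BITS   # segments needed in flight
print(round(window))               # 83333

# Invert throughput = 1.22 * MSS / (RTT * sqrt(L)) to solve for L:
L = (1.22 * MSS_BITS / (RTT * TARGET)) ** 2
print(L)                           # ~2.1e-10
```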
3.7.1 – Fairness
- Consider K TCP connections, each with a different end-to-end path, but all passing through a bottleneck link with transmission rate R bps. Suppose each connection is transferring a large file and there is no UDP traffic passing through the bottleneck link. A congestion-control mechanism is said to be fair if the average transmission rate of each connection is approximately R/K; that is, each connection gets an equal share of the link bandwidth.
- TCP congestion control converges to provide an equal share of a bottleneck link’s bandwidth among competing TCP connections:
- Consider the simple case of two TCP connections sharing a single link with transmission rate R. Assume that the two connections have the same MSS and RTT, that they have a large amount of data to send, and that no other TCP connections or UDP datagrams traverse this shared link. Also ignore the slow-start phase of TCP and assume the TCP connections are operating in congestion-avoidance (AIMD) mode at all times.
- Suppose that the TCP window sizes are such that, at a given point in time, connections 1 and 2 realize the throughputs indicated by point A. Because the amount of link bandwidth jointly consumed by the two connections is less than R, no loss will occur, and both connections will increase their window by 1 MSS per RTT as a result of TCP’s congestion-avoidance algorithm. Thus, the joint throughput of the two connections proceeds along a 45-degree line starting from point A. Eventually, the link bandwidth jointly consumed by the two connections will be greater than R, and packet loss will occur.
- Suppose that connections 1 and 2 experience packet loss when they realize the throughputs indicated by point B. Connections 1 and 2 then decrease their windows by a factor of two. The resulting throughputs are thus at point C, halfway along a vector starting at B and ending at the origin. Because the joint bandwidth use is less than R at point C, the two connections again increase their throughputs along a 45-degree line starting from C.
- Eventually, loss will again occur, for example, at point D, and the two connections again decrease their window size by a factor of two, and so on.
- The bandwidth realized by the two connections eventually fluctuates along the equal-bandwidth-share line, and the two connections will converge to this behavior regardless of where they start in the two-dimensional space.
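- The convergence can be simulated directly: two AIMD flows share a link of capacity R, with the simplifying assumption that both lose a packet whenever their combined rate exceeds R (units are MSS per RTT):

```python
R = 100
w1, w2 = 5, 60               # a deliberately unfair starting split
for _ in range(500):         # each iteration is one RTT
    w1 += 1; w2 += 1         # additive increase for both flows
    if w1 + w2 > R:
        w1 //= 2; w2 //= 2   # multiplicative decrease on shared loss
print(w1, w2)                # nearly equal: converged to a fair share
```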
- When multiple connections share a common bottleneck, those sessions with a smaller RTT are able to grab the available bandwidth at that link more quickly as it becomes free, and thus will enjoy higher throughput than connections with larger RTTs.
- UDP does not have built-in congestion control.
- Applications can pump data into the network at a constant rate and occasionally lose packets, rather than reduce their rates to “fair” levels at times of congestion and not lose any packets.
- UDP is thus not fair.
- A fairness problem we still have to consider is that there is nothing stopping a TCP-based application from using multiple parallel connections.
- When an application uses multiple parallel connections, it gets a larger fraction of the bandwidth in a congested link.
- Consider a link of rate R supporting nine ongoing client-server applications, with each of the applications using one TCP connection. If a new application comes along and also uses one TCP connection, then each application gets approximately the same transmission rate of R/10. But if this new application instead uses 11 parallel TCP connections, then the new application gets an unfair allocation of more than R/2.
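- The arithmetic behind that claim: all 9 + 11 = 20 connections share the link equally, so the new application receives $$\frac{11}{20}\cdot R = 0.55\cdot R > \frac{R}{2}$$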