draft-ietf-quic-recovery-latest.txt   draft-ietf-quic-recovery-auth48.txt 
Internet Engineering Task Force (IETF) J. Iyengar, Ed. Internet Engineering Task Force (IETF) J. Iyengar, Ed.
Request for Comments: 9002 Fastly Request for Comments: 9002 Fastly
Category: Standards Track I. Swett, Ed. Category: Standards Track I. Swett, Ed.
ISSN: 2070-1721 Google ISSN: 2070-1721 Google
May 2021 April 2021
QUIC Loss Detection and Congestion Control QUIC Loss Detection and Congestion Control
Abstract Abstract
This document describes loss detection and congestion control This document describes loss detection and congestion control
mechanisms for QUIC. mechanisms for QUIC.
Status of This Memo Status of This Memo
skipping to change at page 2, line 7 skipping to change at line 46
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction
2. Conventions and Definitions . . . . . . . . . . . . . . . . . 3 2. Conventions and Definitions
3. Design of the QUIC Transmission Machinery . . . . . . . . . . 4 3. Design of the QUIC Transmission Machinery
4. Relevant Differences Between QUIC and TCP . . . . . . . . . . 5 4. Relevant Differences Between QUIC and TCP
4.1. Separate Packet Number Spaces . . . . . . . . . . . . . . 5 4.1. Separate Packet Number Spaces
4.2. Monotonically Increasing Packet Numbers . . . . . . . . . 5 4.2. Monotonically Increasing Packet Numbers
4.3. Clearer Loss Epoch . . . . . . . . . . . . . . . . . . . 5 4.3. Clearer Loss Epoch
4.4. No Reneging . . . . . . . . . . . . . . . . . . . . . . . 6 4.4. No Reneging
4.5. More ACK Ranges . . . . . . . . . . . . . . . . . . . . . 6 4.5. More ACK Ranges
4.6. Explicit Correction For Delayed Acknowledgments . . . . . 6 4.6. Explicit Correction For Delayed Acknowledgments
4.7. Probe Timeout Replaces RTO and TLP . . . . . . . . . . . 6 4.7. Probe Timeout Replaces RTO and TLP
4.8. The Minimum Congestion Window Is Two Packets . . . . . . 7 4.8. The Minimum Congestion Window Is Two Packets
4.9. Handshake Packets Are Not Special . . . . . . . . . . . . 7 5. Estimating the Round-Trip Time
5. Estimating the Round-Trip Time . . . . . . . . . . . . . . . 7 5.1. Generating RTT Samples
5.1. Generating RTT Samples . . . . . . . . . . . . . . . . . 7 5.2. Estimating min_rtt
5.2. Estimating min_rtt . . . . . . . . . . . . . . . . . . . 8 5.3. Estimating smoothed_rtt and rttvar
5.3. Estimating smoothed_rtt and rttvar . . . . . . . . . . . 9 6. Loss Detection
6. Loss Detection . . . . . . . . . . . . . . . . . . . . . . . 11 6.1. Acknowledgment-Based Detection
6.1. Acknowledgment-Based Detection . . . . . . . . . . . . . 11 6.1.1. Packet Threshold
6.1.1. Packet Threshold . . . . . . . . . . . . . . . . . . 12 6.1.2. Time Threshold
6.1.2. Time Threshold . . . . . . . . . . . . . . . . . . . 12 6.2. Probe Timeout
6.2. Probe Timeout . . . . . . . . . . . . . . . . . . . . . . 13 6.2.1. Computing PTO
6.2.1. Computing PTO . . . . . . . . . . . . . . . . . . . . 14 6.2.2. Handshakes and New Paths
6.2.2. Handshakes and New Paths . . . . . . . . . . . . . . 15 6.2.3. Speeding up Handshake Completion
6.2.3. Speeding up Handshake Completion . . . . . . . . . . 16 6.2.4. Sending Probe Packets
6.2.4. Sending Probe Packets . . . . . . . . . . . . . . . . 17 6.3. Handling Retry Packets
6.3. Handling Retry Packets . . . . . . . . . . . . . . . . . 18 6.4. Discarding Keys and Packet State
6.4. Discarding Keys and Packet State . . . . . . . . . . . . 18 7. Congestion Control
7. Congestion Control . . . . . . . . . . . . . . . . . . . . . 19 7.1. Explicit Congestion Notification
7.1. Explicit Congestion Notification . . . . . . . . . . . . 19 7.2. Initial and Minimum Congestion Window
7.2. Initial and Minimum Congestion Window . . . . . . . . . . 20 7.3. Congestion Control States
7.3. Congestion Control States . . . . . . . . . . . . . . . . 20 7.3.1. Slow Start
7.3.1. Slow Start . . . . . . . . . . . . . . . . . . . . . 21 7.3.2. Recovery
7.3.2. Recovery . . . . . . . . . . . . . . . . . . . . . . 21 7.3.3. Congestion Avoidance
7.3.3. Congestion Avoidance . . . . . . . . . . . . . . . . 22 7.4. Ignoring Loss of Undecryptable Packets
7.4. Ignoring Loss of Undecryptable Packets . . . . . . . . . 22 7.5. Probe Timeout
7.5. Probe Timeout . . . . . . . . . . . . . . . . . . . . . . 23 7.6. Persistent Congestion
7.6. Persistent Congestion . . . . . . . . . . . . . . . . . . 23 7.6.1. Duration
7.6.1. Duration . . . . . . . . . . . . . . . . . . . . . . 23 7.6.2. Establishing Persistent Congestion
7.6.2. Establishing Persistent Congestion . . . . . . . . . 24 7.6.3. Example
7.6.3. Example . . . . . . . . . . . . . . . . . . . . . . . 25 7.7. Pacing
7.7. Pacing . . . . . . . . . . . . . . . . . . . . . . . . . 26 7.8. Underutilizing the Congestion Window
7.8. Underutilizing the Congestion Window . . . . . . . . . . 27 8. Security Considerations
8. Security Considerations . . . . . . . . . . . . . . . . . . . 27 8.1. Loss and Congestion Signals
8.1. Loss and Congestion Signals . . . . . . . . . . . . . . . 27 8.2. Traffic Analysis
8.2. Traffic Analysis . . . . . . . . . . . . . . . . . . . . 27 8.3. Misreporting ECN Markings
8.3. Misreporting ECN Markings . . . . . . . . . . . . . . . . 27 9. IANA Considerations
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 28 10. References
9.1. Normative References . . . . . . . . . . . . . . . . . . 28 10.1. Normative References
9.2. Informative References . . . . . . . . . . . . . . . . . 29 10.2. Informative References
Appendix A. Loss Recovery Pseudocode . . . . . . . . . . . . . . 30 Appendix A. Loss Recovery Pseudocode
A.1. Tracking Sent Packets . . . . . . . . . . . . . . . . . . 31 A.1. Tracking Sent Packets
A.1.1. Sent Packet Fields . . . . . . . . . . . . . . . . . 31 A.1.1. Sent Packet Fields
A.2. Constants of Interest . . . . . . . . . . . . . . . . . . 31 A.2. Constants of Interest
A.3. Variables of Interest . . . . . . . . . . . . . . . . . . 32 A.3. Variables of Interest
A.4. Initialization . . . . . . . . . . . . . . . . . . . . . 33 A.4. Initialization
A.5. On Sending a Packet . . . . . . . . . . . . . . . . . . . 33 A.5. On Sending a Packet
A.6. On Receiving a Datagram . . . . . . . . . . . . . . . . . 34 A.6. On Receiving a Datagram
A.7. On Receiving an Acknowledgment . . . . . . . . . . . . . 34 A.7. On Receiving an Acknowledgment
A.8. Setting the Loss Detection Timer . . . . . . . . . . . . 36 A.8. Setting the Loss Detection Timer
A.9. On Timeout . . . . . . . . . . . . . . . . . . . . . . . 37 A.9. On Timeout
A.10. Detecting Lost Packets . . . . . . . . . . . . . . . . . 38 A.10. Detecting Lost Packets
A.11. Upon Dropping Initial or Handshake Keys . . . . . . . . . 39 A.11. Upon Dropping Initial or Handshake Keys
Appendix B. Congestion Control Pseudocode . . . . . . . . . . . 40 Appendix B. Congestion Control Pseudocode
B.1. Constants of Interest . . . . . . . . . . . . . . . . . . 40 B.1. Constants of Interest
B.2. Variables of Interest . . . . . . . . . . . . . . . . . . 40 B.2. Variables of Interest
B.3. Initialization . . . . . . . . . . . . . . . . . . . . . 41 B.3. Initialization
B.4. On Packet Sent . . . . . . . . . . . . . . . . . . . . . 41 B.4. On Packet Sent
B.5. On Packet Acknowledgment . . . . . . . . . . . . . . . . 41 B.5. On Packet Acknowledgment
B.6. On New Congestion Event . . . . . . . . . . . . . . . . . 42 B.6. On New Congestion Event
B.7. Process ECN Information . . . . . . . . . . . . . . . . . 43 B.7. Process ECN Information
B.8. On Packets Lost . . . . . . . . . . . . . . . . . . . . . 43 B.8. On Packets Lost
B.9. Removing Discarded Packets from Bytes in Flight . . . . . 43 B.9. Removing Discarded Packets from Bytes in Flight
Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . 44 Contributors
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 45 Authors' Addresses
1. Introduction 1. Introduction
QUIC is a secure, general-purpose transport protocol, described in QUIC is a secure, general-purpose transport protocol, described in
[QUIC-TRANSPORT]. This document describes loss detection and [QUIC-TRANSPORT]. This document describes loss detection and
congestion control mechanisms for QUIC. congestion control mechanisms for QUIC.
2. Conventions and Definitions 2. Conventions and Definitions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP "OPTIONAL" in this document are to be interpreted as described in
14 [RFC2119] [RFC8174] when, and only when, they appear in all BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here. capitals, as shown here.
Definitions of terms that are used in this document: Definitions of terms that are used in this document:
Ack-eliciting frames: All frames other than ACK, PADDING, and Ack-eliciting frames: All frames other than ACK, PADDING, and
CONNECTION_CLOSE are considered ack-eliciting. CONNECTION_CLOSE are considered ack-eliciting.
Ack-eliciting packets: Packets that contain ack-eliciting frames Ack-eliciting packets: Packets that contain ack-eliciting frames
elicit an ACK from the receiver within the maximum acknowledgment elicit an ACK from the receiver within the maximum acknowledgment
delay and are called ack-eliciting packets. delay and are called ack-eliciting packets.
skipping to change at page 4, line 37 skipping to change at line 173
transmissions and retransmissions; this eliminates significant transmissions and retransmissions; this eliminates significant
complexity from QUIC's interpretation of TCP loss detection complexity from QUIC's interpretation of TCP loss detection
mechanisms. mechanisms.
QUIC packets can contain multiple frames of different types. The QUIC packets can contain multiple frames of different types. The
recovery mechanisms ensure that data and frames that need reliable recovery mechanisms ensure that data and frames that need reliable
delivery are acknowledged or declared lost and sent in new packets as delivery are acknowledged or declared lost and sent in new packets as
necessary. The types of frames contained in a packet affect recovery necessary. The types of frames contained in a packet affect recovery
and congestion control logic: and congestion control logic:
o All packets are acknowledged, though packets that contain no ack- * All packets are acknowledged, though packets that contain no ack-
eliciting frames are only acknowledged along with ack-eliciting eliciting frames are only acknowledged along with ack-eliciting
packets. packets.
o Long header packets that contain CRYPTO frames are critical to the * Long header packets that contain CRYPTO frames are critical to the
performance of the QUIC handshake and use shorter timers for performance of the QUIC handshake and use shorter timers for
acknowledgment. acknowledgment.
o Packets containing frames besides ACK or CONNECTION_CLOSE frames * Packets containing frames besides ACK or CONNECTION_CLOSE frames
count toward congestion control limits and are considered to be in count toward congestion control limits and are considered in
flight. flight.
o PADDING frames cause packets to contribute toward bytes in flight * PADDING frames cause packets to contribute toward bytes in flight
without directly causing an acknowledgment to be sent. without directly causing an acknowledgment to be sent.
4. Relevant Differences Between QUIC and TCP 4. Relevant Differences Between QUIC and TCP
Readers familiar with TCP's loss detection and congestion control Readers familiar with TCP's loss detection and congestion control
will find algorithms here that parallel well-known TCP ones. will find algorithms here that parallel well-known TCP ones.
However, protocol differences between QUIC and TCP contribute to However, protocol differences between QUIC and TCP contribute to
algorithmic differences. These protocol differences are briefly algorithmic differences. These protocol differences are briefly
described below. described below.
skipping to change at page 5, line 44 skipping to change at line 227
number signifies that the packet was sent earlier. When a packet number signifies that the packet was sent earlier. When a packet
containing ack-eliciting frames is detected lost, QUIC includes containing ack-eliciting frames is detected lost, QUIC includes
necessary frames in a new packet with a new packet number, removing necessary frames in a new packet with a new packet number, removing
ambiguity about which packet is acknowledged when an ACK is received. ambiguity about which packet is acknowledged when an ACK is received.
Consequently, more accurate RTT measurements can be made, spurious Consequently, more accurate RTT measurements can be made, spurious
retransmissions are trivially detected, and mechanisms such as Fast retransmissions are trivially detected, and mechanisms such as Fast
Retransmit can be applied universally, based only on packet number. Retransmit can be applied universally, based only on packet number.
This design point significantly simplifies loss detection mechanisms This design point significantly simplifies loss detection mechanisms
for QUIC. Most TCP mechanisms implicitly attempt to infer for QUIC. Most TCP mechanisms implicitly attempt to infer
transmission ordering based on TCP sequence numbers -- a nontrivial transmission ordering based on TCP sequence numbers - a nontrivial
task, especially when TCP timestamps are not available. task, especially when TCP timestamps are not available.
4.3. Clearer Loss Epoch 4.3. Clearer Loss Epoch
QUIC starts a loss epoch when a packet is lost. The loss epoch ends QUIC starts a loss epoch when a packet is lost. The loss epoch ends
when any packet sent after the start of the epoch is acknowledged. when any packet sent after the start of the epoch is acknowledged.
TCP waits for the gap in the sequence number space to be filled, and TCP waits for the gap in the sequence number space to be filled, and
so if a segment is lost multiple times in a row, the loss epoch may so if a segment is lost multiple times in a row, the loss epoch may
not end for several round trips. Because both should reduce their not end for several round trips. Because both should reduce their
congestion windows only once per epoch, QUIC will do it once for congestion windows only once per epoch, QUIC will do it once for
skipping to change at page 7, line 21 skipping to change at line 301
that single packet means that the sender needs to wait for a PTO to that single packet means that the sender needs to wait for a PTO to
recover (Section 6.2), which can be much longer than an RTT. Sending recover (Section 6.2), which can be much longer than an RTT. Sending
a single ack-eliciting packet also increases the chances of incurring a single ack-eliciting packet also increases the chances of incurring
additional latency when a receiver delays its acknowledgment. additional latency when a receiver delays its acknowledgment.
QUIC therefore recommends that the minimum congestion window be two QUIC therefore recommends that the minimum congestion window be two
packets. While this increases network load, it is considered safe packets. While this increases network load, it is considered safe
since the sender will still reduce its sending rate exponentially since the sender will still reduce its sending rate exponentially
under persistent congestion (Section 6.2). under persistent congestion (Section 6.2).
4.9. Handshake Packets Are Not Special
TCP treats the loss of SYN or SYN-ACK packet as persistent congestion
and reduces the congestion window to one packet; see [RFC5681]. QUIC
treats loss of a packet containing handshake data the same as other
losses.
5. Estimating the Round-Trip Time 5. Estimating the Round-Trip Time
At a high level, an endpoint measures the time from when a packet was At a high level, an endpoint measures the time from when a packet was
sent to when it is acknowledged as an RTT sample. The endpoint uses sent to when it is acknowledged as an RTT sample. The endpoint uses
RTT samples and peer-reported host delays (see Section 13.2 of RTT samples and peer-reported host delays (see Section 13.2 of
[QUIC-TRANSPORT]) to generate a statistical description of the [QUIC-TRANSPORT]) to generate a statistical description of the
network path's RTT. An endpoint computes the following three values network path's RTT. An endpoint computes the following three values
for each path: the minimum value over a period of time (min_rtt), an for each path: the minimum value over a period of time (min_rtt), an
exponentially weighted moving average (smoothed_rtt), and the mean exponentially weighted moving average (smoothed_rtt), and the mean
deviation (referred to as "variation" in the rest of this document) deviation (referred to as "variation" in the rest of this document)
in the observed RTT samples (rttvar). in the observed RTT samples (rttvar).
5.1. Generating RTT Samples 5.1. Generating RTT Samples
An endpoint generates an RTT sample on receiving an ACK frame that An endpoint generates an RTT sample on receiving an ACK frame that
meets the following two conditions: meets the following two conditions:
o the largest acknowledged packet number is newly acknowledged, and * the largest acknowledged packet number is newly acknowledged, and
o at least one of the newly acknowledged packets was ack-eliciting. * at least one of the newly acknowledged packets was ack-eliciting.
The RTT sample, latest_rtt, is generated as the time elapsed since The RTT sample, latest_rtt, is generated as the time elapsed since
the largest acknowledged packet was sent: the largest acknowledged packet was sent:
latest_rtt = ack_time - send_time_of_largest_acked latest_rtt = ack_time - send_time_of_largest_acked
An RTT sample is generated using only the largest acknowledged packet An RTT sample is generated using only the largest acknowledged packet
in the received ACK frame. This is because a peer reports in the received ACK frame. This is because a peer reports
acknowledgment delays for only the largest acknowledged packet in an acknowledgment delays for only the largest acknowledged packet in an
ACK frame. While the reported acknowledgment delay is not used by ACK frame. While the reported acknowledgment delay is not used by
the RTT sample measurement, it is used to adjust the RTT sample in the RTT sample measurement, it is used to adjust the RTT sample in
subsequent computations of smoothed_rtt and rttvar (Section 5.3). subsequent computations of smoothed_rtt and rttvar (Section 5.3).
To avoid generating multiple RTT samples for a single packet, an ACK To avoid generating multiple RTT samples for a single packet, an ACK
frame SHOULD NOT be used to update RTT estimates if it does not newly frame SHOULD NOT be used to update RTT estimates if it does not newly
acknowledge the largest acknowledged packet. acknowledge the largest acknowledged packet.
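The two conditions above can be sketched in Python. This is a minimal illustration, not part of the specification: the SentPacket record, the rtt_sample function, and the use of integer milliseconds for time are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class SentPacket:
    # Hypothetical record of a sent packet; field names are illustrative.
    packet_number: int
    time_sent_ms: int
    ack_eliciting: bool


def rtt_sample(newly_acked: Iterable[SentPacket],
               largest_acked: int,
               ack_time_ms: int) -> Optional[int]:
    """Return latest_rtt in ms, or None if the ACK frame yields no sample."""
    newly_acked = list(newly_acked)
    # Condition 1: the largest acknowledged packet must be newly acknowledged.
    largest = next(
        (p for p in newly_acked if p.packet_number == largest_acked), None)
    if largest is None:
        return None
    # Condition 2: at least one newly acknowledged packet was ack-eliciting.
    if not any(p.ack_eliciting for p in newly_acked):
        return None
    # latest_rtt = ack_time - send_time_of_largest_acked
    return ack_time_ms - largest.time_sent_ms
```

A caller would pass only the packets that this ACK frame acknowledges for the first time, so a duplicate acknowledgment of the largest packet never produces a second sample.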
skipping to change at page 9, line 6 skipping to change at line 376
limits potential underestimation due to erroneously reported delays limits potential underestimation due to erroneously reported delays
by the peer. by the peer.
The RTT for a network path may change over time. If a path's actual The RTT for a network path may change over time. If a path's actual
RTT decreases, the min_rtt will adapt immediately on the first low RTT decreases, the min_rtt will adapt immediately on the first low
sample. If the path's actual RTT increases, however, the min_rtt sample. If the path's actual RTT increases, however, the min_rtt
will not adapt to it, allowing future RTT samples that are smaller will not adapt to it, allowing future RTT samples that are smaller
than the new RTT to be included in smoothed_rtt. than the new RTT to be included in smoothed_rtt.
Endpoints SHOULD set the min_rtt to the newest RTT sample after Endpoints SHOULD set the min_rtt to the newest RTT sample after
persistent congestion is established. This avoids repeatedly persistent congestion is established. This is to allow a connection
declaring persistent congestion when the RTT increases. This also to reset its estimate of min_rtt and smoothed_rtt after a disruptive
allows a connection to reset its estimate of min_rtt and smoothed_rtt network event (Section 5.3), and because it is possible that an
after a disruptive network event; see Section 5.3. increase in path delay resulted in persistent congestion being
incorrectly declared.
Endpoints MAY reestablish the min_rtt at other times in the Endpoints MAY reestablish the min_rtt at other times in the
connection, such as when traffic volume is low and an acknowledgment connection, such as when traffic volume is low and an acknowledgment
is received with a low acknowledgment delay. Implementations SHOULD is received with a low acknowledgment delay. Implementations SHOULD
NOT refresh the min_rtt value too often since the actual minimum RTT NOT refresh the min_rtt value too often since the actual minimum RTT
of the path is not frequently observable. of the path is not frequently observable.
5.3. Estimating smoothed_rtt and rttvar 5.3. Estimating smoothed_rtt and rttvar
smoothed_rtt is an exponentially weighted moving average of an smoothed_rtt is an exponentially weighted moving average of an
skipping to change at page 10, line 8 skipping to change at line 427
by the peer that are greater than the peer's max_ack_delay are by the peer that are greater than the peer's max_ack_delay are
attributed to unintentional but potentially repeating delays, such as attributed to unintentional but potentially repeating delays, such as
scheduler latency at the peer or loss of previous acknowledgments. scheduler latency at the peer or loss of previous acknowledgments.
Excess delays could also be due to a noncompliant receiver. Excess delays could also be due to a noncompliant receiver.
Therefore, these extra delays are considered effectively part of path Therefore, these extra delays are considered effectively part of path
delay and incorporated into the RTT estimate. delay and incorporated into the RTT estimate.
Therefore, when adjusting an RTT sample using peer-reported Therefore, when adjusting an RTT sample using peer-reported
acknowledgment delays, an endpoint: acknowledgment delays, an endpoint:
o MAY ignore the acknowledgment delay for Initial packets, since * MAY ignore the acknowledgment delay for Initial packets, since
these acknowledgments are not delayed by the peer (Section 13.2.1 these acknowledgments are not delayed by the peer (Section 13.2.1
of [QUIC-TRANSPORT]); of [QUIC-TRANSPORT]);
o SHOULD ignore the peer's max_ack_delay until the handshake is * SHOULD ignore the peer's max_ack_delay until the handshake is
confirmed; confirmed;
o MUST use the lesser of the acknowledgment delay and the peer's * MUST use the lesser of the acknowledgment delay and the peer's
max_ack_delay after the handshake is confirmed; and max_ack_delay after the handshake is confirmed; and
o MUST NOT subtract the acknowledgment delay from the RTT sample if * MUST NOT subtract the acknowledgment delay from the RTT sample if
the resulting value is smaller than the min_rtt. This limits the the resulting value is smaller than the min_rtt. This limits the
underestimation of the smoothed_rtt due to a misreporting peer. underestimation of the smoothed_rtt due to a misreporting peer.
Additionally, an endpoint might postpone the processing of Additionally, an endpoint might postpone the processing of
acknowledgments when the corresponding decryption keys are not acknowledgments when the corresponding decryption keys are not
immediately available. For example, a client might receive an immediately available. For example, a client might receive an
acknowledgment for a 0-RTT packet that it cannot decrypt because acknowledgment for a 0-RTT packet that it cannot decrypt because
1-RTT packet protection keys are not yet available to it. In such 1-RTT packet protection keys are not yet available to it. In such
cases, an endpoint SHOULD subtract such local delays from its RTT cases, an endpoint SHOULD subtract such local delays from its RTT
sample until the handshake is confirmed. sample until the handshake is confirmed.
skipping to change at page 10, line 49 skipping to change at line 468
smoothed_rtt and rttvar are initialized as follows, where kInitialRtt smoothed_rtt and rttvar are initialized as follows, where kInitialRtt
contains the initial RTT value: contains the initial RTT value:
smoothed_rtt = kInitialRtt smoothed_rtt = kInitialRtt
rttvar = kInitialRtt / 2 rttvar = kInitialRtt / 2
RTT samples for the network path are recorded in latest_rtt; see RTT samples for the network path are recorded in latest_rtt; see
Section 5.1. On the first RTT sample after initialization, the Section 5.1. On the first RTT sample after initialization, the
estimator is reset using that sample. This ensures that the estimator is reset using that sample. This ensures that the
estimator retains no history of past samples. Packets sent on other estimator retains no history of past samples.
paths do not contribute RTT samples to the current path, as described
in Section 9.4 of [QUIC-TRANSPORT].
On the first RTT sample after initialization, smoothed_rtt and rttvar On the first RTT sample after initialization, smoothed_rtt and rttvar
are set as follows: are set as follows:
smoothed_rtt = latest_rtt smoothed_rtt = latest_rtt
rttvar = latest_rtt / 2 rttvar = latest_rtt / 2
On subsequent RTT samples, smoothed_rtt and rttvar evolve as follows: On subsequent RTT samples, smoothed_rtt and rttvar evolve as follows:
ack_delay = decoded acknowledgment delay from ACK frame ack_delay = decoded acknowledgment delay from ACK frame
if (handshake confirmed): if (handshake confirmed):
ack_delay = min(ack_delay, max_ack_delay) ack_delay = min(ack_delay, max_ack_delay)
adjusted_rtt = latest_rtt adjusted_rtt = latest_rtt
if (latest_rtt >= min_rtt + ack_delay): if (min_rtt + ack_delay < latest_rtt):
adjusted_rtt = latest_rtt - ack_delay adjusted_rtt = latest_rtt - ack_delay
smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt
rttvar_sample = abs(smoothed_rtt - adjusted_rtt) rttvar_sample = abs(smoothed_rtt - adjusted_rtt)
rttvar = 3/4 * rttvar + 1/4 * rttvar_sample rttvar = 3/4 * rttvar + 1/4 * rttvar_sample
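The update rules above can be exercised with a small Python sketch. It is illustrative only: the RttEstimator class and its method names are hypothetical, times are in milliseconds, and the default initial RTT of 333 ms is an assumption taken from outside this excerpt. The two columns write the min_rtt comparison differently (>= on the left, strict < on the right); the sketch follows the left-hand form purely as a choice for the example. The min_rtt tracking (first sample, then running minimum) is assumed from Section 5.2.

```python
class RttEstimator:
    """Illustrative RTT estimator; all times are in milliseconds."""

    def __init__(self, initial_rtt: float = 333.0):
        # smoothed_rtt = kInitialRtt, rttvar = kInitialRtt / 2
        self.smoothed_rtt = initial_rtt
        self.rttvar = initial_rtt / 2
        self.min_rtt = None      # unset until the first sample arrives
        self.latest_rtt = None

    def on_sample(self, latest_rtt: float, ack_delay: float,
                  max_ack_delay: float, handshake_confirmed: bool) -> None:
        self.latest_rtt = latest_rtt
        if self.min_rtt is None:
            # First sample after initialization: reset the estimator.
            self.min_rtt = latest_rtt
            self.smoothed_rtt = latest_rtt
            self.rttvar = latest_rtt / 2
            return
        self.min_rtt = min(self.min_rtt, latest_rtt)
        # Cap the reported delay only once the handshake is confirmed.
        if handshake_confirmed:
            ack_delay = min(ack_delay, max_ack_delay)
        # Subtract ack_delay only if the result stays at or above min_rtt.
        adjusted_rtt = latest_rtt
        if latest_rtt >= self.min_rtt + ack_delay:
            adjusted_rtt = latest_rtt - ack_delay
        # Same order of operations as the pseudocode above.
        self.smoothed_rtt = 7 / 8 * self.smoothed_rtt + 1 / 8 * adjusted_rtt
        rttvar_sample = abs(self.smoothed_rtt - adjusted_rtt)
        self.rttvar = 3 / 4 * self.rttvar + 1 / 4 * rttvar_sample
```

For example, after a first sample of 100 ms, a second sample of 140 ms with a reported delay of 20 ms is adjusted to 120 ms before it is folded into smoothed_rtt.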
6. Loss Detection 6. Loss Detection
QUIC senders use acknowledgments to detect lost packets and a PTO to QUIC senders use acknowledgments to detect lost packets and a PTO to
ensure acknowledgments are received; see Section 6.2. This section ensure acknowledgments are received (see Section 6.2). This section
provides a description of these algorithms. provides a description of these algorithms.
If a packet is lost, the QUIC transport needs to recover from that If a packet is lost, the QUIC transport needs to recover from that
loss, such as by retransmitting the data, sending an updated frame, loss, such as by retransmitting the data, sending an updated frame,
or discarding the frame. For more information, see Section 13.3 of or discarding the frame. For more information, see Section 13.3 of
[QUIC-TRANSPORT]. [QUIC-TRANSPORT].
Loss detection is separate per packet number space, unlike RTT Loss detection is separate per packet number space, unlike RTT
measurement and congestion control, because RTT and congestion measurement and congestion control, because RTT and congestion
control are properties of the path, whereas loss detection also control are properties of the path, whereas loss detection also
skipping to change at page 11, line 50 skipping to change at line 515
Acknowledgment-based loss detection implements the spirit of TCP's Acknowledgment-based loss detection implements the spirit of TCP's
Fast Retransmit [RFC5681], Early Retransmit [RFC5827], Forward Fast Retransmit [RFC5681], Early Retransmit [RFC5827], Forward
Acknowledgment [FACK], SACK loss recovery [RFC6675], and RACK-TLP Acknowledgment [FACK], SACK loss recovery [RFC6675], and RACK-TLP
[RACK]. This section provides an overview of how these algorithms [RACK]. This section provides an overview of how these algorithms
are implemented in QUIC. are implemented in QUIC.
A packet is declared lost if it meets all of the following A packet is declared lost if it meets all of the following
conditions: conditions:
o The packet is unacknowledged, in flight, and was sent prior to an * The packet is unacknowledged, in flight, and was sent prior to an
acknowledged packet. acknowledged packet.
o The packet was sent kPacketThreshold packets before an * The packet was sent kPacketThreshold packets before an
acknowledged packet (Section 6.1.1), or it was sent long enough in acknowledged packet (Section 6.1.1), or it was sent long enough in
the past (Section 6.1.2). the past (Section 6.1.2).
The acknowledgment indicates that a packet sent later was delivered, The acknowledgment indicates that a packet sent later was delivered,
and the packet and time thresholds provide some tolerance for packet and the packet and time thresholds provide some tolerance for packet
reordering. reordering.
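As a rough sketch, the declaration rule above might look as follows in Python. The function name is_lost and its parameters are hypothetical. The loss_delay argument stands for the time threshold of Section 6.1.2 and is passed in rather than computed here. The sketch also relies on packet numbers increasing monotonically, so a lower packet number implies the packet was sent earlier.

```python
K_PACKET_THRESHOLD = 3  # RECOMMENDED initial value (Section 6.1.1)


def is_lost(packet_number: int, time_sent: float, largest_acked: int,
            now: float, loss_delay: float, *,
            acknowledged: bool = False, in_flight: bool = True,
            packet_threshold: int = K_PACKET_THRESHOLD) -> bool:
    """Return True if the packet meets all conditions to be declared lost."""
    # Must be unacknowledged, in flight, and sent before an acknowledged packet.
    if acknowledged or not in_flight or packet_number >= largest_acked:
        return False
    # Packet threshold: sent kPacketThreshold packets before an acked packet.
    if largest_acked - packet_number >= packet_threshold:
        return True
    # Time threshold: sent long enough in the past.
    return now - time_sent >= loss_delay
```

With largest_acked = 4 and the default threshold, packet 1 is declared lost immediately, while packet 3 is declared lost only once loss_delay has elapsed since it was sent.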
Spuriously declaring packets as lost leads to unnecessary Spuriously declaring packets as lost leads to unnecessary
retransmissions and may result in degraded performance due to the retransmissions and may result in degraded performance due to the
actions of the congestion controller upon detecting loss. actions of the congestion controller upon detecting loss.
Implementations can detect spurious retransmissions and increase the Implementations can detect spurious retransmissions and increase the
packet or time reordering threshold to reduce future spurious reordering threshold in packets or time to reduce future spurious
retransmissions and loss events. Implementations with adaptive time retransmissions and loss events. Implementations with adaptive time
thresholds MAY choose to start with smaller initial reordering thresholds MAY choose to start with smaller initial reordering
thresholds to minimize recovery latency. thresholds to minimize recovery latency.
6.1.1. Packet Threshold 6.1.1. Packet Threshold
The RECOMMENDED initial value for the packet reordering threshold The RECOMMENDED initial value for the packet reordering threshold
(kPacketThreshold) is 3, based on best practices for TCP loss (kPacketThreshold) is 3, based on best practices for TCP loss
detection [RFC5681] [RFC6675]. In order to remain similar to TCP, detection [RFC5681] [RFC6675]. In order to remain similar to TCP,
implementations SHOULD NOT use a packet threshold less than 3; see implementations SHOULD NOT use a packet threshold less than 3; see
skipping to change at page 13, line 8 skipping to change at line 569
constant. The time threshold is: constant. The time threshold is:
max(kTimeThreshold * max(smoothed_rtt, latest_rtt), kGranularity) max(kTimeThreshold * max(smoothed_rtt, latest_rtt), kGranularity)
If packets sent prior to the largest acknowledged packet cannot yet If packets sent prior to the largest acknowledged packet cannot yet
be declared lost, then a timer SHOULD be set for the remaining time. be declared lost, then a timer SHOULD be set for the remaining time.
Using max(smoothed_rtt, latest_rtt) protects from the two following Using max(smoothed_rtt, latest_rtt) protects from the two following
cases: cases:
o the latest RTT sample is lower than the smoothed RTT, perhaps due * the latest RTT sample is lower than the smoothed RTT, perhaps due
to reordering where the acknowledgment encountered a shorter path; to reordering where the acknowledgment encountered a shorter path;
o the latest RTT sample is higher than the smoothed RTT, perhaps due * the latest RTT sample is higher than the smoothed RTT, perhaps due
to a sustained increase in the actual RTT, but the smoothed RTT to a sustained increase in the actual RTT, but the smoothed RTT
has not yet caught up. has not yet caught up.
The RECOMMENDED time threshold (kTimeThreshold), expressed as an RTT The RECOMMENDED time threshold (kTimeThreshold), expressed as an RTT
multiplier, is 9/8. The RECOMMENDED value of the timer granularity multiplier, is 9/8. The RECOMMENDED value of the timer granularity
(kGranularity) is 1 millisecond. (kGranularity) is 1 ms.
Note: TCP's RACK [RACK] specifies a slightly larger threshold, | Note: TCP's RACK [RACK] specifies a slightly larger threshold,
equivalent to 5/4, for a similar purpose. Experience with QUIC | equivalent to 5/4, for a similar purpose. Experience with QUIC
shows that 9/8 works well. | shows that 9/8 works well.
Implementations MAY experiment with absolute thresholds, thresholds Implementations MAY experiment with absolute thresholds, thresholds
from previous connections, adaptive thresholds, or the including of from previous connections, adaptive thresholds, or the including of
RTT variation. Smaller thresholds reduce reordering resilience and RTT variation. Smaller thresholds reduce reordering resilience and
increase spurious retransmissions, and larger thresholds increase increase spurious retransmissions, and larger thresholds increase
loss detection delay. loss detection delay.
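The time threshold computation above can be sketched as follows. This is a minimal illustration, not normative text: the function names and the millisecond units are assumptions made for the example, and the caller is assumed to apply the check only to packets sent before the largest acknowledged packet.

```python
K_TIME_THRESHOLD = 9 / 8   # RECOMMENDED RTT multiplier (kTimeThreshold)
K_GRANULARITY_MS = 1       # RECOMMENDED timer granularity (kGranularity), in ms


def loss_delay_ms(smoothed_rtt_ms, latest_rtt_ms):
    """max(kTimeThreshold * max(smoothed_rtt, latest_rtt), kGranularity)."""
    return max(K_TIME_THRESHOLD * max(smoothed_rtt_ms, latest_rtt_ms),
               K_GRANULARITY_MS)


def is_lost_by_time(time_sent_ms, now_ms, smoothed_rtt_ms, latest_rtt_ms):
    # Hypothetical helper: a packet sent before the largest acknowledged
    # packet is declared lost once it has been outstanding for at least
    # the loss delay.
    return now_ms - time_sent_ms >= loss_delay_ms(smoothed_rtt_ms,
                                                  latest_rtt_ms)


def remaining_time_ms(time_sent_ms, now_ms, smoothed_rtt_ms, latest_rtt_ms):
    # Time left before the packet can be declared lost; a timer would be
    # set for this duration when the result is positive.
    deadline = time_sent_ms + loss_delay_ms(smoothed_rtt_ms, latest_rtt_ms)
    return max(deadline - now_ms, 0)
```

With smoothed_rtt = 100 ms and latest_rtt = 80 ms, the larger value is used, so the threshold is 9/8 * 100 = 112.5 ms.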
6.2. Probe Timeout 6.2. Probe Timeout
A Probe Timeout (PTO) triggers the sending of one or two probe A Probe Timeout (PTO) triggers the sending of one or two probe
skipping to change at page 14, line 30 skipping to change at line 637
the peer is expected to not delay these packets intentionally; see the peer is expected to not delay these packets intentionally; see
Section 13.2.1 of [QUIC-TRANSPORT]. Section 13.2.1 of [QUIC-TRANSPORT].
The PTO period MUST be at least kGranularity to avoid the timer The PTO period MUST be at least kGranularity to avoid the timer
expiring immediately. expiring immediately.
When ack-eliciting packets in multiple packet number spaces are in When ack-eliciting packets in multiple packet number spaces are in
flight, the timer MUST be set to the earlier value of the Initial and flight, the timer MUST be set to the earlier value of the Initial and
Handshake packet number spaces. Handshake packet number spaces.
An endpoint MUST NOT set its PTO timer for the Application Data An endpoint MUST NOT set its PTO timer for the application data
packet number space until the handshake is confirmed. Doing so packet number space until the handshake is confirmed. Doing so
prevents the endpoint from retransmitting information in packets when prevents the endpoint from retransmitting information in packets when
either the peer does not yet have the keys to process them or the either the peer does not yet have the keys to process them or the
endpoint does not yet have the keys to process their acknowledgments. endpoint does not yet have the keys to process their acknowledgments.
For example, this can happen when a client sends 0-RTT packets to the For example, this can happen when a client sends 0-RTT packets to the
server; it does so without knowing whether the server will be able to server; it does so without knowing whether the server will be able to
decrypt them. Similarly, this can happen when a server sends 1-RTT decrypt them. Similarly, this can happen when a server sends 1-RTT
packets before confirming that the client has verified the server's packets before confirming that the client has verified the server's
certificate and can therefore read these 1-RTT packets. certificate and can therefore read these 1-RTT packets.
skipping to change at page 15, line 31 skipping to change at line 687
The PTO timer MUST NOT be set if a timer is set for time threshold The PTO timer MUST NOT be set if a timer is set for time threshold
loss detection; see Section 6.1.2. A timer that is set for time loss detection; see Section 6.1.2. A timer that is set for time
threshold loss detection will expire earlier than the PTO timer in threshold loss detection will expire earlier than the PTO timer in
most cases and is less likely to spuriously retransmit data. most cases and is less likely to spuriously retransmit data.
6.2.2. Handshakes and New Paths 6.2.2. Handshakes and New Paths
Resumed connections over the same network MAY use the previous Resumed connections over the same network MAY use the previous
connection's final smoothed RTT value as the resumed connection's connection's final smoothed RTT value as the resumed connection's
initial RTT. When no previous RTT is available, the initial RTT initial RTT. When no previous RTT is available, the initial RTT
SHOULD be set to 333 milliseconds. This results in handshakes SHOULD be set to 333 ms. This results in handshakes starting with a
starting with a PTO of 1 second, as recommended for TCP's initial PTO of 1 second, as recommended for TCP's initial RTO; see Section 2
RTO; see Section 2 of [RFC6298]. of [RFC6298].
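The relationship between the 333 ms initial RTT and the 1-second PTO can be checked with a short calculation. The sketch below uses the PTO expression that appears in the example in Section 7.6.3 and makes two assumptions that are not shown in this excerpt: rttvar starts at half the initial RTT, and max_ack_delay is taken as 0 while the handshake is in progress. The function name is illustrative.

```python
K_INITIAL_RTT_MS = 333   # initial RTT when no previous RTT is available
K_GRANULARITY_MS = 1     # timer granularity (kGranularity), in ms


def initial_pto_ms(initial_rtt_ms=K_INITIAL_RTT_MS, max_ack_delay_ms=0):
    smoothed_rtt = initial_rtt_ms
    # Assumption: before any RTT sample, rttvar is half the initial RTT.
    rttvar = initial_rtt_ms / 2
    # PTO = smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay
    return smoothed_rtt + max(4 * rttvar, K_GRANULARITY_MS) + max_ack_delay_ms
```

Under these assumptions the result is 333 + 4 * 166.5 = 999 ms, which is the "PTO of 1 second" referred to above.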
A connection MAY use the delay between sending a PATH_CHALLENGE and A connection MAY use the delay between sending a PATH_CHALLENGE and
receiving a PATH_RESPONSE to set the initial RTT (see kInitialRtt in receiving a PATH_RESPONSE to set the initial RTT (see kInitialRtt in
Appendix A.2) for a new path, but the delay SHOULD NOT be considered Appendix A.2) for a new path, but the delay SHOULD NOT be considered
an RTT sample. an RTT sample.
When the Initial keys and Handshake keys are discarded (see Initial packets and Handshake packets could never be acknowledged,
Section 6.4), any Initial packets and Handshake packets can no longer but they are removed from bytes in flight when the Initial and
be acknowledged, so they are removed from bytes in flight. When Handshake keys are discarded, as described below in Section 6.4.
Initial or Handshake keys are discarded, the PTO and loss detection When Initial or Handshake keys are discarded, the PTO and loss
timers MUST be reset, because discarding keys indicates forward detection timers MUST be reset because discarding keys indicates
progress and the loss detection timer might have been set for a now- forward progress, and the loss detection timer might have been set
discarded packet number space. for a now discarded packet number space.
6.2.2.1. Before Address Validation 6.2.2.1. Before Address Validation
Until the server has validated the client's address on the path, the Until the server has validated the client's address on the path, the
amount of data it can send is limited to three times the amount of amount of data it can send is limited to three times the amount of
data received, as specified in Section 8.1 of [QUIC-TRANSPORT]. If data received, as specified in Section 8.1 of [QUIC-TRANSPORT]. If
no additional data can be sent, the server's PTO timer MUST NOT be no additional data can be sent, the server's PTO timer MUST NOT be
armed until datagrams have been received from the client because armed until datagrams have been received from the client because
packets sent on PTO count against the anti-amplification limit. packets sent on PTO count against the anti-amplification limit. Note
that the server could fail to validate the client's address even if
When the server receives a datagram from the client, the 0-RTT is accepted.
amplification limit is increased and the server resets the PTO timer.
If the PTO timer is then set to a time in the past, it is executed
immediately. Doing so avoids sending new 1-RTT packets prior to
packets critical to the completion of the handshake. In particular,
this can happen when 0-RTT is accepted but the server fails to
validate the client's address.
Since the server could be blocked until more datagrams are received Since the server could be blocked until more datagrams are received
from the client, it is the client's responsibility to send packets to from the client, it is the client's responsibility to send packets to
unblock the server until it is certain that the server has finished unblock the server until it is certain that the server has finished
its address validation (see Section 8 of [QUIC-TRANSPORT]). That is, its address validation (see Section 8 of [QUIC-TRANSPORT]). That is,
the client MUST set the PTO timer if the client has not received an the client MUST set the probe timer if the client has not received an
acknowledgment for any of its Handshake packets and the handshake is acknowledgment for any of its Handshake packets and the handshake is
not confirmed (see Section 4.1.2 of [QUIC-TLS]), even if there are no not confirmed (see Section 4.1.2 of [QUIC-TLS]), even if there are no
packets in flight. When the PTO fires, the client MUST send a packets in flight. When the PTO fires, the client MUST send a
Handshake packet if it has Handshake keys, otherwise it MUST send an Handshake packet if it has Handshake keys, otherwise it MUST send an
Initial packet in a UDP datagram with a payload of at least 1200 Initial packet in a UDP datagram with a payload of at least 1200
bytes. bytes.
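The client-side rule above can be expressed as two small decisions. This is a sketch under stated assumptions: the function names, the boolean inputs, and the tuple return value are hypothetical and chosen only for illustration.

```python
MIN_INITIAL_DATAGRAM_PAYLOAD = 1200  # bytes, for datagrams carrying Initial


def client_must_arm_pto(handshake_packet_acked, handshake_confirmed):
    # The timer is armed even with no packets in flight, so that the
    # client keeps sending and can unblock a server that is held at its
    # anti-amplification limit.
    return not handshake_packet_acked and not handshake_confirmed


def client_probe_on_pto(has_handshake_keys):
    # Returns (packet type, minimum UDP payload size in bytes).
    if has_handshake_keys:
        return ("Handshake", 0)
    return ("Initial", MIN_INITIAL_DATAGRAM_PAYLOAD)
```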
6.2.3. Speeding up Handshake Completion 6.2.3. Speeding up Handshake Completion
When a server receives an Initial packet containing duplicate CRYPTO When a server receives an Initial packet containing duplicate CRYPTO
skipping to change at page 18, line 22 skipping to change at line 809
sent, implementations must choose between sending the same payload sent, implementations must choose between sending the same payload
every time or sending different payloads. Sending the same payload every time or sending different payloads. Sending the same payload
may be simpler and ensures the highest priority frames arrive first. may be simpler and ensures the highest priority frames arrive first.
Sending different payloads each time reduces the chances of spurious Sending different payloads each time reduces the chances of spurious
retransmission. retransmission.
6.3. Handling Retry Packets 6.3. Handling Retry Packets
A Retry packet causes a client to send another Initial packet, A Retry packet causes a client to send another Initial packet,
effectively restarting the connection process. A Retry packet effectively restarting the connection process. A Retry packet
indicates that the Initial packet was received but not processed. A indicates that the Initial Packet was received but not processed. A
Retry packet cannot be treated as an acknowledgment because it does Retry packet cannot be treated as an acknowledgment because it does
not indicate that a packet was processed or specify the packet not indicate that a packet was processed or specify the packet
number. number.
Clients that receive a Retry packet reset congestion control and loss Clients that receive a Retry packet reset congestion control and loss
recovery state, including resetting any pending timers. Other recovery state, including resetting any pending timers. Other
connection state, in particular cryptographic handshake messages, is connection state, in particular cryptographic handshake messages, is
retained; see Section 17.2.5 of [QUIC-TRANSPORT]. retained; see Section 17.2.5 of [QUIC-TRANSPORT].
The client MAY compute an RTT estimate to the server as the time The client MAY compute an RTT estimate to the server as the time
skipping to change at page 19, line 9 skipping to change at line 845
[QUIC-TRANSPORT]. At this point, recovery state for all in-flight [QUIC-TRANSPORT]. At this point, recovery state for all in-flight
Initial packets is discarded. Initial packets is discarded.
When 0-RTT is rejected, recovery state for all in-flight 0-RTT When 0-RTT is rejected, recovery state for all in-flight 0-RTT
packets is discarded. packets is discarded.
If a server accepts 0-RTT, but does not buffer 0-RTT packets that If a server accepts 0-RTT, but does not buffer 0-RTT packets that
arrive before Initial packets, early 0-RTT packets will be declared arrive before Initial packets, early 0-RTT packets will be declared
lost, but that is expected to be infrequent. lost, but that is expected to be infrequent.
It is expected that keys are discarded at some time after the packets It is expected that keys are discarded after packets encrypted with
encrypted with them are either acknowledged or declared lost. them would be acknowledged or declared lost. However, Initial and
However, Initial and Handshake secrets are discarded as soon as Handshake secrets are discarded as soon as Handshake and 1-RTT keys
Handshake and 1-RTT keys are proven to be available to both client are proven to be available to both client and server; see
and server; see Section 4.9.1 of [QUIC-TLS]. Section 4.9.1 of [QUIC-TLS].
7. Congestion Control 7. Congestion Control
This document specifies a sender-side congestion controller for QUIC This document specifies a sender-side congestion controller for QUIC
similar to TCP NewReno [RFC6582]. similar to TCP NewReno [RFC6582].
The signals QUIC provides for congestion control are generic and are The signals QUIC provides for congestion control are generic and are
designed to support different sender-side algorithms. A sender can designed to support different sender-side algorithms. A sender can
unilaterally choose a different algorithm to use, such as CUBIC unilaterally choose a different algorithm to use, such as CUBIC
[RFC8312]. [RFC8312].
skipping to change at page 19, line 36 skipping to change at line 872
document, the chosen controller MUST conform to the congestion document, the chosen controller MUST conform to the congestion
control guidelines specified in Section 3.1 of [RFC8085]. control guidelines specified in Section 3.1 of [RFC8085].
Similar to TCP, packets containing only ACK frames do not count Similar to TCP, packets containing only ACK frames do not count
toward bytes in flight and are not congestion controlled. Unlike toward bytes in flight and are not congestion controlled. Unlike
TCP, QUIC can detect the loss of these packets and MAY use that TCP, QUIC can detect the loss of these packets and MAY use that
information to adjust the congestion controller or the rate of ACK- information to adjust the congestion controller or the rate of ACK-
only packets being sent, but this document does not describe a only packets being sent, but this document does not describe a
mechanism for doing so. mechanism for doing so.
The congestion controller is per path, so packets sent on other paths
do not alter the current path's congestion controller, as described
in Section 9.4 of [QUIC-TRANSPORT].
The algorithm in this document specifies and uses the controller's The algorithm in this document specifies and uses the controller's
congestion window in bytes. congestion window in bytes.
An endpoint MUST NOT send a packet if it would cause bytes_in_flight An endpoint MUST NOT send a packet if it would cause bytes_in_flight
(see Appendix B.2) to be larger than the congestion window, unless (see Appendix B.2) to be larger than the congestion window, unless
the packet is sent on a PTO timer expiration (see Section 6.2) or the packet is sent on a PTO timer expiration (see Section 6.2) or
when entering recovery (see Section 7.3.2). when entering recovery (see Section 7.3.2).
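The sending rule above amounts to a simple admission check. The following is a minimal sketch; the function name and the two boolean flags are assumptions made for the example, and all sizes are in bytes.

```python
def can_send(bytes_in_flight, packet_size, congestion_window,
             is_pto_probe=False, entering_recovery=False):
    # Packets sent on PTO expiration or when entering recovery are
    # exempt from the congestion window check.
    if is_pto_probe or entering_recovery:
        return True
    # Otherwise the packet must not push bytes_in_flight past the window.
    return bytes_in_flight + packet_size <= congestion_window
```

For example, with a 12000-byte window and 11000 bytes in flight, a 1200-byte packet is held back unless it is a PTO probe.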
7.1. Explicit Congestion Notification 7.1. Explicit Congestion Notification
skipping to change at page 24, line 16 skipping to change at line 1079
long period of time, even when no acknowledgments are being received. long period of time, even when no acknowledgments are being received.
The use of a duration enables a sender to establish persistent The use of a duration enables a sender to establish persistent
congestion without depending on PTO expiration. congestion without depending on PTO expiration.
7.6.2. Establishing Persistent Congestion 7.6.2. Establishing Persistent Congestion
A sender establishes persistent congestion after the receipt of an A sender establishes persistent congestion after the receipt of an
acknowledgment if two packets that are ack-eliciting are declared acknowledgment if two packets that are ack-eliciting are declared
lost, and: lost, and:
o across all packet number spaces, none of the packets sent between * across all packet number spaces, none of the packets sent between
the send times of these two packets are acknowledged; the send times of these two packets are acknowledged;
o the duration between the send times of these two packets exceeds * the duration between the send times of these two packets exceeds
the persistent congestion duration (Section 7.6.1); and the persistent congestion duration (Section 7.6.1); and
o a prior RTT sample existed when these two packets were sent. * a prior RTT sample existed when these two packets were sent.
These two packets MUST be ack-eliciting, since a receiver is required These two packets MUST be ack-eliciting, since a receiver is required
to acknowledge only ack-eliciting packets within its maximum to acknowledge only ack-eliciting packets within its maximum
acknowledgment delay; see Section 13.2 of [QUIC-TRANSPORT]. acknowledgment delay; see Section 13.2 of [QUIC-TRANSPORT].
The persistent congestion period SHOULD NOT start until there is at The persistent congestion period SHOULD NOT start until there is at
least one RTT sample. Before the first RTT sample, a sender arms its least one RTT sample. Before the first RTT sample, a sender arms its
PTO timer based on the initial RTT (Section 6.2.2), which could be PTO timer based on the initial RTT (Section 6.2.2), which could be
substantially larger than the actual RTT. Requiring a prior RTT substantially larger than the actual RTT. Requiring a prior RTT
sample prevents a sender from establishing persistent congestion with sample prevents a sender from establishing persistent congestion with
skipping to change at page 25, line 15 skipping to change at line 1122
7.6.3. Example 7.6.3. Example
The following example illustrates how a sender might establish The following example illustrates how a sender might establish
persistent congestion. Assume: persistent congestion. Assume:
smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay = 2 smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay = 2
kPersistentCongestionThreshold = 3 kPersistentCongestionThreshold = 3
Consider the following sequence of events: Consider the following sequence of events:
+--------+-----------------------------------+ +========+===================================+
| Time | Action | | Time | Action |
+--------+-----------------------------------+ +========+===================================+
| t=0 | Send packet #1 (application data) | | t=0 | Send packet #1 (application data) |
| | | +--------+-----------------------------------+
| t=1 | Send packet #2 (application data) | | t=1 | Send packet #2 (application data) |
| | | +--------+-----------------------------------+
| t=1.2 | Receive acknowledgment of #1 | | t=1.2 | Receive acknowledgment of #1 |
| | | +--------+-----------------------------------+
| t=2 | Send packet #3 (application data) | | t=2 | Send packet #3 (application data) |
| | | +--------+-----------------------------------+
| t=3 | Send packet #4 (application data) | | t=3 | Send packet #4 (application data) |
| | | +--------+-----------------------------------+
| t=4 | Send packet #5 (application data) | | t=4 | Send packet #5 (application data) |
| | | +--------+-----------------------------------+
| t=5 | Send packet #6 (application data) | | t=5 | Send packet #6 (application data) |
| | | +--------+-----------------------------------+
| t=6 | Send packet #7 (application data) | | t=6 | Send packet #7 (application data) |
| | | +--------+-----------------------------------+
| t=8 | Send packet #8 (PTO 1) | | t=8 | Send packet #8 (PTO 1) |
| | | +--------+-----------------------------------+
| t=12 | Send packet #9 (PTO 2) | | t=12 | Send packet #9 (PTO 2) |
| | | +--------+-----------------------------------+
| t=12.2 | Receive acknowledgment of #9 | | t=12.2 | Receive acknowledgment of #9 |
+--------+-----------------------------------+ +--------+-----------------------------------+
Table 1
Packets 2 through 8 are declared lost when the acknowledgment for Packets 2 through 8 are declared lost when the acknowledgment for
packet 9 is received at "t = 12.2". packet 9 is received at t = 12.2.
The congestion period is calculated as the time between the oldest The congestion period is calculated as the time between the oldest
and newest lost packets: "8 - 1 = 7". The persistent congestion and newest lost packets: 8 - 1 = 7. The persistent congestion
duration is "2 * 3 = 6". Because the threshold was reached and duration is: 2 * 3 = 6. Because the threshold was reached and
because none of the packets between the oldest and the newest lost because none of the packets between the oldest and the newest lost
packets were acknowledged, the network is considered to have packets were acknowledged, the network is considered to have
experienced persistent congestion. experienced persistent congestion.
While this example shows PTO expiration, they are not required for While this example shows PTO expiration, they are not required for
persistent congestion to be established. persistent congestion to be established.
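The arithmetic in this example can be reproduced with a short sketch. It is deliberately simplified: it assumes every lost packet is ack-eliciting and tests only the oldest and newest lost packets, and the function and parameter names are illustrative rather than taken from the appendix pseudocode.

```python
K_PERSISTENT_CONGESTION_THRESHOLD = 3


def persistent_congestion_duration(pto_base):
    # pto_base is smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay
    return pto_base * K_PERSISTENT_CONGESTION_THRESHOLD


def in_persistent_congestion(lost_send_times, acked_send_times, pto_base,
                             had_rtt_sample=True):
    if not had_rtt_sample or len(lost_send_times) < 2:
        return False
    oldest, newest = min(lost_send_times), max(lost_send_times)
    # No packet sent between the two lost packets may have been acknowledged.
    if any(oldest < t < newest for t in acked_send_times):
        return False
    # The period must exceed the persistent congestion duration.
    return newest - oldest > persistent_congestion_duration(pto_base)


# Values from the table: packets 2 through 8 are lost, 1 and 9 are acknowledged.
lost = [1, 2, 3, 4, 5, 6, 8]   # send times of packets 2..8
acked = [0, 12]                # send times of packets 1 and 9
result = in_persistent_congestion(lost, acked, pto_base=2)
```

Here the congestion period is 8 - 1 = 7 and the duration is 2 * 3 = 6, so result is True.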
7.7. Pacing 7.7. Pacing
A sender SHOULD pace sending of all in-flight packets based on input A sender SHOULD pace sending of all in-flight packets based on input
skipping to change at page 26, line 24 skipping to change at line 1182
Section 7.2. A sender with knowledge that the network path to the Section 7.2. A sender with knowledge that the network path to the
receiver can absorb larger bursts MAY use a higher limit. receiver can absorb larger bursts MAY use a higher limit.
An implementation should take care to architect its congestion An implementation should take care to architect its congestion
controller to work well with a pacer. For instance, a pacer might controller to work well with a pacer. For instance, a pacer might
wrap the congestion controller and control the availability of the wrap the congestion controller and control the availability of the
congestion window, or a pacer might pace out packets handed to it by congestion window, or a pacer might pace out packets handed to it by
the congestion controller. the congestion controller.
Timely delivery of ACK frames is important for efficient loss Timely delivery of ACK frames is important for efficient loss
recovery. To avoid delaying their delivery to the peer, packets recovery. Packets containing only ACK frames SHOULD therefore not be
containing only ACK frames SHOULD therefore not be paced. paced to avoid delaying their delivery to the peer.
Endpoints can implement pacing as they choose. A perfectly paced Endpoints can implement pacing as they choose. A perfectly paced
sender spreads packets exactly evenly over time. For a window-based sender spreads packets exactly evenly over time. For a window-based
congestion controller, such as the one in this document, that rate congestion controller, such as the one in this document, that rate
can be computed by averaging the congestion window over the RTT. can be computed by averaging the congestion window over the RTT.
Expressed as a rate in units of bytes per time, where Expressed as a rate in units of bytes per time, where
congestion_window is in bytes: congestion_window is in bytes:
rate = N * congestion_window / smoothed_rtt rate = N * congestion_window / smoothed_rtt
skipping to change at page 27, line 9 skipping to change at line 1215
One possible implementation strategy for pacing uses a leaky bucket One possible implementation strategy for pacing uses a leaky bucket
algorithm, where the capacity of the "bucket" is limited to the algorithm, where the capacity of the "bucket" is limited to the
maximum burst size and the rate the "bucket" fills is determined by maximum burst size and the rate the "bucket" fills is determined by
the above function. the above function.
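One way to picture the leaky bucket strategy is the sketch below. It is an illustration under stated assumptions, not a recommended design: the class and method names are hypothetical, the default N = 1.25 is an assumed small factor of at least 1, time is in seconds, and sizes are in bytes.

```python
def pacing_rate(congestion_window, smoothed_rtt, n=1.25):
    # rate = N * congestion_window / smoothed_rtt, in bytes per second
    return n * congestion_window / smoothed_rtt


class LeakyBucketPacer:
    def __init__(self, max_burst_bytes, rate_bytes_per_s, now=0.0):
        self.capacity = max_burst_bytes   # bucket limited to the burst size
        self.rate = rate_bytes_per_s      # fill rate from pacing_rate()
        self.credit = max_burst_bytes     # start with a full bucket
        self.last_update = now

    def _refill(self, now):
        elapsed = now - self.last_update
        self.credit = min(self.capacity, self.credit + elapsed * self.rate)
        self.last_update = now

    def try_send(self, packet_size, now):
        # Returns True and consumes credit if the packet may be sent now.
        self._refill(now)
        if packet_size <= self.credit:
            self.credit -= packet_size
            return True
        return False
```

A caller would typically recompute the rate whenever the congestion window or smoothed RTT changes and hold back packets for which try_send returns False.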
7.8. Underutilizing the Congestion Window 7.8. Underutilizing the Congestion Window
When bytes in flight is smaller than the congestion window and When bytes in flight is smaller than the congestion window and
sending is not pacing limited, the congestion window is sending is not pacing limited, the congestion window is
underutilized. This can happen due to insufficient application data underutilized. When this occurs, the congestion window SHOULD NOT be
or flow control limits. When this occurs, the congestion window increased in either slow start or congestion avoidance. This can
SHOULD NOT be increased in either slow start or congestion avoidance. happen due to insufficient application data or flow control limits.
A sender that paces packets (see Section 7.7) might delay sending A sender that paces packets (see Section 7.7) might delay sending
packets and not fully utilize the congestion window due to this packets and not fully utilize the congestion window due to this
delay. A sender SHOULD NOT consider itself application limited if it delay. A sender SHOULD NOT consider itself application limited if it
would have fully utilized the congestion window without pacing delay. would have fully utilized the congestion window without pacing delay.
A sender MAY implement alternative mechanisms to update its A sender MAY implement alternative mechanisms to update its
congestion window after periods of underutilization, such as those congestion window after periods of underutilization, such as those
proposed for TCP in [RFC7661]. proposed for TCP in [RFC7661].
skipping to change at page 28, line 18 skipping to change at line 1273
their sending rate, which is similar in effect to advertising reduced their sending rate, which is similar in effect to advertising reduced
connection flow control limits and so no advantage is gained by doing connection flow control limits and so no advantage is gained by doing
so. so.
Endpoints choose the congestion controller that they use. Congestion Endpoints choose the congestion controller that they use. Congestion
controllers respond to reports of ECN-CE by reducing their rate, but controllers respond to reports of ECN-CE by reducing their rate, but
the response may vary. Markings can be treated as equivalent to loss the response may vary. Markings can be treated as equivalent to loss
[RFC3168], but other responses can be specified, such as [RFC8511] or [RFC3168], but other responses can be specified, such as [RFC8511] or
[RFC8311]. [RFC8311].
9. References 9. IANA Considerations
9.1. Normative References This document has no IANA actions.
[QUIC-TLS] 10. References
Thomson, M., Ed. and S. Turner, Ed., "Using TLS to Secure
QUIC", RFC 9001, DOI 10.17487/RFC9001, May 2021, 10.1. Normative References
[QUIC-TLS] Thomson, M., Ed. and S. Turner, Ed., "Using TLS to Secure
QUIC", RFC 9001, DOI 10.17487/RFC9001, April 2021,
<https://www.rfc-editor.org/info/rfc9001>. <https://www.rfc-editor.org/info/rfc9001>.
[QUIC-TRANSPORT] [QUIC-TRANSPORT]
Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
Multiplexed and Secure Transport", RFC 9000, Multiplexed and Secure Transport", RFC 9000,
DOI 10.17487/RFC9000, May 2021, DOI 10.17487/RFC9000, April 2021,
<https://www.rfc-editor.org/info/rfc9000>. <https://www.rfc-editor.org/info/rfc9000>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP", of Explicit Congestion Notification (ECN) to IP",
RFC 3168, DOI 10.17487/RFC3168, September 2001, RFC 3168, DOI 10.17487/RFC3168, September 2001,
<https://www.rfc-editor.org/info/rfc3168>. <https://www.rfc-editor.org/info/rfc3168>.
[RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage [RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085,
March 2017, <https://www.rfc-editor.org/info/rfc8085>. March 2017, <https://www.rfc-editor.org/info/rfc8085>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>. May 2017, <https://www.rfc-editor.org/info/rfc8174>.
9.2. Informative References 10.2. Informative References
[FACK] Mathis, M. and J. Mahdavi, "Forward acknowledgement: [FACK] Mathis, M. and J. Mahdavi, "Forward Acknowledgement:
Refining TCP Congestion Control", Refining TCP Congestion Control", ACM SIGCOMM Computer
DOI 10.1145/248157.248181, ACM SIGCOMM Computer Communication Review, DOI 10.1145/248157.248181, August
Communication Review, August 1996. 1996, <https://doi.org/10.1145/248157.248181>.
[PRR] Mathis, M., Dukkipati, N., and Y. Cheng, "Proportional [PRR] Mathis, M., Dukkipati, N., and Y. Cheng, "Proportional
Rate Reduction for TCP", RFC 6937, DOI 10.17487/RFC6937, Rate Reduction for TCP", RFC 6937, DOI 10.17487/RFC6937,
May 2013, <https://www.rfc-editor.org/info/rfc6937>. May 2013, <https://www.rfc-editor.org/info/rfc6937>.
[RACK] Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "The [RACK] Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "The
RACK-TLP Loss Detection Algorithm for TCP", Work in RACK-TLP Loss Detection Algorithm for TCP", RFC 8985,
Progress, draft-ietf-tcpm-rack-15, December 2020. DOI 10.17487/RFC8985, February 2021,
<https://www.rfc-editor.org/info/rfc8985>.
[RETRANSMISSION] [RETRANSMISSION]
Karn, P. and C. Partridge, "Improving Round-Trip Time Karn, P. and C. Partridge, "Improving Round-Trip Time
Estimates in Reliable Transport Protocols", Estimates in Reliable Transport Protocols", ACM SIGCOMM
DOI 10.1145/118544.118549, ACM Transactions on Computer CCR, January 1995.
Systems, November 1991.
[RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
Selective Acknowledgment Options", RFC 2018, Selective Acknowledgment Options", RFC 2018,
DOI 10.17487/RFC2018, October 1996, DOI 10.17487/RFC2018, October 1996,
<https://www.rfc-editor.org/info/rfc2018>. <https://www.rfc-editor.org/info/rfc2018>.
[RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte
Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February
2003, <https://www.rfc-editor.org/info/rfc3465>. 2003, <https://www.rfc-editor.org/info/rfc3465>.
skipping to change at page 34, line 19 skipping to change at line 1563
successfully processed. In such a case, the PTO timer will need to successfully processed. In such a case, the PTO timer will need to
be rearmed. be rearmed.
Pseudocode for OnDatagramReceived follows: Pseudocode for OnDatagramReceived follows:
OnDatagramReceived(datagram): OnDatagramReceived(datagram):
// If this datagram unblocks the server, arm the // If this datagram unblocks the server, arm the
// PTO timer to avoid deadlock. // PTO timer to avoid deadlock.
if (server was at anti-amplification limit): if (server was at anti-amplification limit):
SetLossDetectionTimer() SetLossDetectionTimer()
if loss_detection_timer.timeout < now():
// Execute PTO if it would have expired
// while the amplification limit applied.
OnLossDetectionTimeout()
A.7. On Receiving an Acknowledgment A.7. On Receiving an Acknowledgment
When an ACK frame is received, it may newly acknowledge any number of When an ACK frame is received, it may newly acknowledge any number of
packets. packets.
Pseudocode for OnAckReceived and UpdateRtt follow: Pseudocode for OnAckReceived and UpdateRtt follow:
IncludesAckEliciting(packets): IncludesAckEliciting(packets):
for packet in packets: for packet in packets:
skipping to change at page 35, line 44 skipping to change at line 1633
// min_rtt ignores acknowledgment delay. // min_rtt ignores acknowledgment delay.
min_rtt = min(min_rtt, latest_rtt) min_rtt = min(min_rtt, latest_rtt)
// Limit ack_delay by max_ack_delay after handshake // Limit ack_delay by max_ack_delay after handshake
// confirmation. // confirmation.
if (handshake confirmed): if (handshake confirmed):
ack_delay = min(ack_delay, max_ack_delay) ack_delay = min(ack_delay, max_ack_delay)
// Adjust for acknowledgment delay if plausible. // Adjust for acknowledgment delay if plausible.
adjusted_rtt = latest_rtt adjusted_rtt = latest_rtt
if (latest_rtt >= min_rtt + ack_delay): if (latest_rtt > min_rtt + ack_delay):
adjusted_rtt = latest_rtt - ack_delay adjusted_rtt = latest_rtt - ack_delay
rttvar = 3/4 * rttvar + 1/4 * abs(smoothed_rtt - adjusted_rtt) rttvar = 3/4 * rttvar + 1/4 * abs(smoothed_rtt - adjusted_rtt)
smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt
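The UpdateRtt fragment above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the RttEstimator class and its method name are hypothetical, times are plain numbers in milliseconds, and only the path for samples after the first is covered because the first-sample handling is not part of this hunk. The sketch follows the left-hand column, which uses ">=" when testing whether the acknowledgment delay is plausible.

```python
from dataclasses import dataclass


@dataclass
class RttEstimator:
    """Hypothetical container for the RTT variables used in the pseudocode."""
    smoothed_rtt: float
    rttvar: float
    min_rtt: float

    def update(self, latest_rtt, ack_delay, max_ack_delay, handshake_confirmed):
        # min_rtt ignores acknowledgment delay.
        self.min_rtt = min(self.min_rtt, latest_rtt)
        # Limit ack_delay by max_ack_delay after handshake confirmation.
        if handshake_confirmed:
            ack_delay = min(ack_delay, max_ack_delay)
        # Adjust for acknowledgment delay only if the result stays
        # at or above min_rtt (left-hand column: ">=").
        adjusted_rtt = latest_rtt
        if latest_rtt >= self.min_rtt + ack_delay:
            adjusted_rtt = latest_rtt - ack_delay
        # rttvar is updated before smoothed_rtt, as in the pseudocode.
        self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.smoothed_rtt - adjusted_rtt)
        self.smoothed_rtt = 0.875 * self.smoothed_rtt + 0.125 * adjusted_rtt
        return adjusted_rtt
```

With min_rtt = 100, a sample of latest_rtt = 125 and ack_delay = 25 sits exactly on the boundary: the ">=" form subtracts the delay, while the ">" form in the right-hand column would leave the sample unadjusted.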
A.8. Setting the Loss Detection Timer A.8. Setting the Loss Detection Timer
QUIC loss detection uses a single timer for all timeout loss QUIC loss detection uses a single timer for all timeout loss
detection. The duration of the timer is based on the timer's mode, detection. The duration of the timer is based on the timer's mode,
which is set in the packet and timer events further below. The which is set in the packet and timer events further below. The
skipping to change at page 36, line 32 skipping to change at line 1666
space = Initial space = Initial
for pn_space in [ Handshake, ApplicationData ]: for pn_space in [ Handshake, ApplicationData ]:
if (time == 0 || loss_time[pn_space] < time): if (time == 0 || loss_time[pn_space] < time):
time = loss_time[pn_space]; time = loss_time[pn_space];
space = pn_space space = pn_space
return time, space return time, space
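The selection loop above can be sketched in Python as follows. The function and constant names are hypothetical, and loss_time is assumed to be a dictionary keyed by packet number space with 0 meaning "not set". Unlike the pseudocode, this sketch explicitly skips spaces whose loss_time is unset; that is an assumption made here so that an unset entry cannot replace an earlier non-zero time.

```python
SPACES = ("Initial", "Handshake", "ApplicationData")


def get_loss_time_and_space(loss_time):
    """Return the earliest non-zero loss time and its packet number space.

    Returns (0, "Initial") when no space has a loss time set.
    """
    best_time, best_space = 0, "Initial"
    for space in SPACES:
        t = loss_time.get(space, 0)
        if t == 0:
            # Assumption of this sketch: unset entries are ignored.
            continue
        if best_time == 0 or t < best_time:
            best_time, best_space = t, space
    return best_time, best_space
```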
GetPtoTimeAndSpace(): GetPtoTimeAndSpace():
duration = (smoothed_rtt + max(4 * rttvar, kGranularity)) duration = (smoothed_rtt + max(4 * rttvar, kGranularity))
* (2 ^ pto_count) * (2 ^ pto_count)
// Anti-deadlock PTO starts from the current time // Arm PTO from now when there are no in-flight packets.
if (no ack-eliciting packets in flight): if (no in-flight packets):
assert(!PeerCompletedAddressValidation()) assert(!PeerCompletedAddressValidation())
if (has handshake keys): if (has handshake keys):
return (now() + duration), Handshake return (now() + duration), Handshake
else: else:
return (now() + duration), Initial return (now() + duration), Initial
pto_timeout = infinite pto_timeout = infinite
pto_space = Initial pto_space = Initial
for space in [ Initial, Handshake, ApplicationData ]: for space in [ Initial, Handshake, ApplicationData ]:
if (no ack-eliciting packets in flight in space): if (no in-flight packets in space):
continue; continue;
if (space == ApplicationData): if (space == ApplicationData):
// Skip Application Data until handshake confirmed. // Skip Application Data until handshake confirmed.
if (handshake is not confirmed): if (handshake is not confirmed):
return pto_timeout, pto_space return pto_timeout, pto_space
// Include max_ack_delay and backoff for Application Data. // Include max_ack_delay and backoff for Application Data.
duration += max_ack_delay * (2 ^ pto_count) duration += max_ack_delay * (2 ^ pto_count)
t = time_of_last_ack_eliciting_packet[space] + duration t = time_of_last_ack_eliciting_packet[space] + duration
if (t < pto_timeout): if (t < pto_timeout):
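The PTO duration used in GetPtoTimeAndSpace above can be sketched in Python as follows. The function name and parameters are hypothetical, times are in milliseconds, and the granularity constant is set to 1 ms as an assumption of this sketch since its value is not shown in this hunk. The Application Data case adds max_ack_delay with the same exponential backoff, as in the pseudocode.

```python
K_GRANULARITY_MS = 1  # assumed value for this sketch


def pto_duration(smoothed_rtt, rttvar, pto_count,
                 max_ack_delay=0, application_data=False):
    """Compute the PTO duration in milliseconds with exponential backoff."""
    backoff = 2 ** pto_count
    duration = (smoothed_rtt + max(4 * rttvar, K_GRANULARITY_MS)) * backoff
    if application_data:
        # Include max_ack_delay and backoff for Application Data.
        duration += max_ack_delay * backoff
    return duration
```

Each increment of pto_count doubles the result, so consecutive timeouts without an acknowledgment back off exponentially.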
skipping to change at page 38, line 15 skipping to change at line 1740
OnLossDetectionTimeout(): OnLossDetectionTimeout():
earliest_loss_time, pn_space = GetLossTimeAndSpace() earliest_loss_time, pn_space = GetLossTimeAndSpace()
if (earliest_loss_time != 0): if (earliest_loss_time != 0):
// Time threshold loss Detection // Time threshold loss Detection
lost_packets = DetectAndRemoveLostPackets(pn_space) lost_packets = DetectAndRemoveLostPackets(pn_space)
assert(!lost_packets.empty()) assert(!lost_packets.empty())
OnPacketsLost(lost_packets) OnPacketsLost(lost_packets)
SetLossDetectionTimer() SetLossDetectionTimer()
return return
if (no ack-eliciting packets in flight): if (bytes_in_flight > 0):
// PTO. Send new data if available, else retransmit old data.
// If neither is available, send a single PING frame.
_, pn_space = GetPtoTimeAndSpace()
SendOneOrTwoAckElicitingPackets(pn_space)
else:
assert(!PeerCompletedAddressValidation()) assert(!PeerCompletedAddressValidation())
// Client sends an anti-deadlock packet: Initial is padded // Client sends an anti-deadlock packet: Initial is padded
// to earn more anti-amplification credit, // to earn more anti-amplification credit,
// a Handshake packet proves address ownership. // a Handshake packet proves address ownership.
if (has Handshake keys): if (has Handshake keys):
SendOneAckElicitingHandshakePacket() SendOneAckElicitingHandshakePacket()
else: else:
SendOneAckElicitingPaddedInitialPacket() SendOneAckElicitingPaddedInitialPacket()
else:
// PTO. Send new data if available, else retransmit old data.
// If neither is available, send a single PING frame.
_, pn_space = GetPtoTimeAndSpace()
SendOneOrTwoAckElicitingPackets(pn_space)
pto_count++ pto_count++
SetLossDetectionTimer() SetLossDetectionTimer()
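The branch structure of OnLossDetectionTimeout can be sketched in Python as follows. This follows the left-hand column, which tests for "no ack-eliciting packets in flight" first; the right-hand column reaches the same actions by testing bytes_in_flight > 0 with the branches in the opposite order. The function name, the boolean parameters, and the returned action labels are hypothetical stand-ins for the send and detection routines named in the pseudocode.

```python
def on_loss_detection_timeout(earliest_loss_time,
                              ack_eliciting_in_flight,
                              has_handshake_keys):
    """Return (action, pto_count_increment) for a loss detection timeout."""
    if earliest_loss_time != 0:
        # Time threshold loss detection; pto_count is left unchanged.
        return "detect_and_remove_lost_packets", 0
    if not ack_eliciting_in_flight:
        # Anti-deadlock packet from a client whose peer has not yet
        # completed address validation.
        if has_handshake_keys:
            return "send_one_ack_eliciting_handshake_packet", 1
        return "send_one_ack_eliciting_padded_initial_packet", 1
    # Ordinary PTO: new data if available, else retransmit, else PING.
    return "send_one_or_two_ack_eliciting_packets", 1
```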
A.10. Detecting Lost Packets A.10. Detecting Lost Packets
DetectAndRemoveLostPackets is called every time an ACK is received or DetectAndRemoveLostPackets is called every time an ACK is received or
the time threshold loss detection timer expires. This function the time threshold loss detection timer expires. This function
operates on the sent_packets for that packet number space and returns operates on the sent_packets for that packet number space and returns
a list of packets newly detected as lost. a list of packets newly detected as lost.
skipping to change at page 44, line 17 skipping to change at line 2020
foreach packet in discarded_packets: foreach packet in discarded_packets:
if packet.in_flight if packet.in_flight
bytes_in_flight -= size bytes_in_flight -= size
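The accounting in the loop above can be sketched in Python as follows. The SentPacket type and the function name are hypothetical, and the bare "size" in the pseudocode is read here as the per-packet byte count, named sent_bytes in this sketch.

```python
from dataclasses import dataclass


@dataclass
class SentPacket:
    """Hypothetical record for a sent packet."""
    sent_bytes: int
    in_flight: bool


def remove_from_bytes_in_flight(bytes_in_flight, discarded_packets):
    """Subtract the size of each in-flight discarded packet."""
    for packet in discarded_packets:
        if packet.in_flight:
            bytes_in_flight -= packet.sent_bytes
    return bytes_in_flight
```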
Contributors Contributors
The IETF QUIC Working Group received an enormous amount of support The IETF QUIC Working Group received an enormous amount of support
from many people. The following people provided substantive from many people. The following people provided substantive
contributions to this document: contributions to this document:
o Alessandro Ghedini * Alessandro Ghedini
o Benjamin Saunders * Benjamin Saunders
o Gorry Fairhurst * Gorry Fairhurst
o Kazu Yamamoto * 山本和彦 (Kazu Yamamoto)
o Kazuho Oku * 奥 一穂 (Kazuho Oku)
o Lars Eggert * Lars Eggert
o Magnus Westerlund * Magnus Westerlund
o Marten Seemann * Marten Seemann
o Martin Duke * Martin Duke
o Martin Thomson * Martin Thomson
o Mirja Kuehlewind * Mirja Kühlewind
o Nick Banks * Nick Banks
o Praveen Balasubramanian * Praveen Balasubramanian
Authors' Addresses Authors' Addresses
Jana Iyengar (editor) Jana Iyengar (editor)
Fastly Fastly
Email: jri.ietf@gmail.com Email: jri.ietf@gmail.com
Ian Swett (editor) Ian Swett (editor)
Google Google
 End of changes. 70 change blocks. 
223 lines changed or deleted 196 lines changed or added
