draft-ietf-quic-recovery-24.txt | draft-ietf-quic-recovery-latest.txt | |||
---|---|---|---|---|
QUIC Working Group J. Iyengar, Ed. | QUIC Working Group J. Iyengar, Ed. | |||
Internet-Draft Fastly | Internet-Draft Fastly | |||
Intended status: Standards Track I. Swett, Ed. | Intended status: Standards Track I. Swett, Ed. | |||
Expires: May 7, 2020 Google | Expires: June 6, 2020 Google | |||
November 4, 2019 | December 4, 2019 | |||
QUIC Loss Detection and Congestion Control | QUIC Loss Detection and Congestion Control | |||
draft-ietf-quic-recovery-24 | draft-ietf-quic-recovery-latest | |||
Abstract | Abstract | |||
This document describes loss detection and congestion control | This document describes loss detection and congestion control | |||
mechanisms for QUIC. | mechanisms for QUIC. | |||
Note to Readers | Note to Readers | |||
Discussion of this draft takes place on the QUIC working group | Discussion of this draft takes place on the QUIC working group | |||
mailing list (quic@ietf.org), which is archived at | mailing list (quic@ietf.org), which is archived at | |||
skipping to change at page 1, line 42 | skipping to change at page 1, line 42 | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on May 7, 2020. | This Internet-Draft will expire on June 6, 2020. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2019 IETF Trust and the persons identified as the | Copyright (c) 2019 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 2, line 40 | skipping to change at page 2, line 40 | |||
4.3. Estimating smoothed_rtt and rttvar . . . . . . . . . . . 8 | 4.3. Estimating smoothed_rtt and rttvar . . . . . . . . . . . 8 | |||
5. Loss Detection . . . . . . . . . . . . . . . . . . . . . . . 9 | 5. Loss Detection . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
5.1. Acknowledgement-based Detection . . . . . . . . . . . . . 10 | 5.1. Acknowledgement-based Detection . . . . . . . . . . . . . 10 | |||
5.1.1. Packet Threshold . . . . . . . . . . . . . . . . . . 10 | 5.1.1. Packet Threshold . . . . . . . . . . . . . . . . . . 10 | |||
5.1.2. Time Threshold . . . . . . . . . . . . . . . . . . . 10 | 5.1.2. Time Threshold . . . . . . . . . . . . . . . . . . . 10 | |||
5.2. Probe Timeout . . . . . . . . . . . . . . . . . . . . . . 11 | 5.2. Probe Timeout . . . . . . . . . . . . . . . . . . . . . . 11 | |||
5.2.1. Computing PTO . . . . . . . . . . . . . . . . . . . . 11 | 5.2.1. Computing PTO . . . . . . . . . . . . . . . . . . . . 11 | |||
5.3. Handshakes and New Paths . . . . . . . . . . . . . . . . 12 | 5.3. Handshakes and New Paths . . . . . . . . . . . . . . . . 12 | |||
5.3.1. Sending Probe Packets . . . . . . . . . . . . . . . . 13 | 5.3.1. Sending Probe Packets . . . . . . . . . . . . . . . . 13 | |||
5.3.2. Loss Detection . . . . . . . . . . . . . . . . . . . 14 | 5.3.2. Loss Detection . . . . . . . . . . . . . . . . . . . 14 | |||
5.4. Handling Retry Packets . . . . . . . . . . . . . . . . . 14 | 5.4. Handling Retry Packets . . . . . . . . . . . . . . . . . 15 | |||
5.5. Discarding Keys and Packet State . . . . . . . . . . . . 14 | 5.5. Discarding Keys and Packet State . . . . . . . . . . . . 15 | |||
6. Congestion Control . . . . . . . . . . . . . . . . . . . . . 15 | 6. Congestion Control . . . . . . . . . . . . . . . . . . . . . 15 | |||
6.1. Explicit Congestion Notification . . . . . . . . . . . . 15 | 6.1. Explicit Congestion Notification . . . . . . . . . . . . 16 | |||
6.2. Slow Start . . . . . . . . . . . . . . . . . . . . . . . 16 | 6.2. Slow Start . . . . . . . . . . . . . . . . . . . . . . . 16 | |||
6.3. Congestion Avoidance . . . . . . . . . . . . . . . . . . 16 | 6.3. Congestion Avoidance . . . . . . . . . . . . . . . . . . 16 | |||
6.4. Recovery Period . . . . . . . . . . . . . . . . . . . . . 16 | 6.4. Recovery Period . . . . . . . . . . . . . . . . . . . . . 16 | |||
6.5. Ignoring Loss of Undecryptable Packets . . . . . . . . . 16 | 6.5. Ignoring Loss of Undecryptable Packets . . . . . . . . . 17 | |||
6.6. Probe Timeout . . . . . . . . . . . . . . . . . . . . . . 16 | 6.6. Probe Timeout . . . . . . . . . . . . . . . . . . . . . . 17 | |||
6.7. Persistent Congestion . . . . . . . . . . . . . . . . . . 17 | 6.7. Persistent Congestion . . . . . . . . . . . . . . . . . . 17 | |||
6.8. Pacing . . . . . . . . . . . . . . . . . . . . . . . . . 18 | 6.8. Pacing . . . . . . . . . . . . . . . . . . . . . . . . . 18 | |||
6.9. Under-utilizing the Congestion Window . . . . . . . . . . 18 | 6.9. Under-utilizing the Congestion Window . . . . . . . . . . 19 | |||
7. Security Considerations . . . . . . . . . . . . . . . . . . . 19 | 7. Security Considerations . . . . . . . . . . . . . . . . . . . 19 | |||
7.1. Congestion Signals . . . . . . . . . . . . . . . . . . . 19 | 7.1. Congestion Signals . . . . . . . . . . . . . . . . . . . 19 | |||
7.2. Traffic Analysis . . . . . . . . . . . . . . . . . . . . 19 | 7.2. Traffic Analysis . . . . . . . . . . . . . . . . . . . . 19 | |||
7.3. Misreporting ECN Markings . . . . . . . . . . . . . . . . 19 | 7.3. Misreporting ECN Markings . . . . . . . . . . . . . . . . 20 | |||
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 20 | 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 20 | |||
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 20 | 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 20 | |||
9.1. Normative References . . . . . . . . . . . . . . . . . . 20 | 9.1. Normative References . . . . . . . . . . . . . . . . . . 20 | |||
9.2. Informative References . . . . . . . . . . . . . . . . . 20 | 9.2. Informative References . . . . . . . . . . . . . . . . . 21 | |||
9.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 22 | 9.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 22 | |||
Appendix A. Loss Recovery Pseudocode . . . . . . . . . . . . . . 22 | Appendix A. Loss Recovery Pseudocode . . . . . . . . . . . . . . 22 | |||
A.1. Tracking Sent Packets . . . . . . . . . . . . . . . . . . 22 | A.1. Tracking Sent Packets . . . . . . . . . . . . . . . . . . 23 | |||
A.1.1. Sent Packet Fields . . . . . . . . . . . . . . . . . 22 | A.1.1. Sent Packet Fields . . . . . . . . . . . . . . . . . 23 | |||
A.2. Constants of interest . . . . . . . . . . . . . . . . . . 23 | A.2. Constants of interest . . . . . . . . . . . . . . . . . . 23 | |||
A.3. Variables of interest . . . . . . . . . . . . . . . . . . 23 | A.3. Variables of interest . . . . . . . . . . . . . . . . . . 24 | |||
A.4. Initialization . . . . . . . . . . . . . . . . . . . . . 24 | A.4. Initialization . . . . . . . . . . . . . . . . . . . . . 25 | |||
A.5. On Sending a Packet . . . . . . . . . . . . . . . . . . . 24 | A.5. On Sending a Packet . . . . . . . . . . . . . . . . . . . 25 | |||
A.6. On Receiving an Acknowledgment . . . . . . . . . . . . . 25 | A.6. On Receiving an Acknowledgment . . . . . . . . . . . . . 26 | |||
A.7. On Packet Acknowledgment . . . . . . . . . . . . . . . . 26 | A.7. On Packet Acknowledgment . . . . . . . . . . . . . . . . 27 | |||
A.8. Setting the Loss Detection Timer . . . . . . . . . . . . 27 | A.8. Setting the Loss Detection Timer . . . . . . . . . . . . 27 | |||
A.9. On Timeout . . . . . . . . . . . . . . . . . . . . . . . 29 | A.9. On Timeout . . . . . . . . . . . . . . . . . . . . . . . 29 | |||
A.10. Detecting Lost Packets . . . . . . . . . . . . . . . . . 29 | A.10. Detecting Lost Packets . . . . . . . . . . . . . . . . . 29 | |||
Appendix B. Congestion Control Pseudocode . . . . . . . . . . . 30 | Appendix B. Congestion Control Pseudocode . . . . . . . . . . . 30 | |||
B.1. Constants of interest . . . . . . . . . . . . . . . . . . 30 | B.1. Constants of interest . . . . . . . . . . . . . . . . . . 30 | |||
B.2. Variables of interest . . . . . . . . . . . . . . . . . . 31 | B.2. Variables of interest . . . . . . . . . . . . . . . . . . 31 | |||
B.3. Initialization . . . . . . . . . . . . . . . . . . . . . 32 | B.3. Initialization . . . . . . . . . . . . . . . . . . . . . 32 | |||
B.4. On Packet Sent . . . . . . . . . . . . . . . . . . . . . 32 | B.4. On Packet Sent . . . . . . . . . . . . . . . . . . . . . 32 | |||
B.5. On Packet Acknowledgement . . . . . . . . . . . . . . . . 32 | B.5. On Packet Acknowledgement . . . . . . . . . . . . . . . . 32 | |||
B.6. On New Congestion Event . . . . . . . . . . . . . . . . . 33 | B.6. On New Congestion Event . . . . . . . . . . . . . . . . . 33 | |||
skipping to change at page 4, line 20 | skipping to change at page 4, line 20 | |||
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 40 | Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 40 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 40 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 40 | |||
1. Introduction | 1. Introduction | |||
QUIC is a new multiplexed and secure transport atop UDP. QUIC builds | QUIC is a new multiplexed and secure transport atop UDP. QUIC builds | |||
on decades of transport and security experience, and implements | on decades of transport and security experience, and implements | |||
mechanisms that make it attractive as a modern general-purpose | mechanisms that make it attractive as a modern general-purpose | |||
transport. The QUIC protocol is described in [QUIC-TRANSPORT]. | transport. The QUIC protocol is described in [QUIC-TRANSPORT]. | |||
QUIC implements the spirit of existing TCP loss recovery mechanisms, | QUIC implements the spirit of existing TCP congestion control and | |||
described in RFCs, various Internet-drafts, and also those prevalent | loss recovery mechanisms, described in RFCs, various Internet-drafts, | |||
in the Linux TCP implementation. This document describes QUIC | and also those prevalent in the Linux TCP implementation. This | |||
congestion control and loss recovery, and where applicable, | document describes QUIC congestion control and loss recovery, and | |||
attributes the TCP equivalent in RFCs, Internet-drafts, academic | where applicable, attributes the TCP equivalent in RFCs, Internet- | |||
papers, and/or TCP implementations. | drafts, academic papers, and/or TCP implementations. | |||
2. Conventions and Definitions | 2. Conventions and Definitions | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in BCP | |||
14 [RFC2119] [RFC8174] when, and only when, they appear in all | 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
capitals, as shown here. | capitals, as shown here. | |||
Definitions of terms that are used in this document: | Definitions of terms that are used in this document: | |||
skipping to change at page 4, line 50 | skipping to change at page 4, line 50 | |||
and are not ACK-only, and they are not acknowledged, declared | and are not ACK-only, and they are not acknowledged, declared | |||
lost, or abandoned along with old keys. | lost, or abandoned along with old keys. | |||
Ack-eliciting Frames: All frames other than ACK, PADDING, and | Ack-eliciting Frames: All frames other than ACK, PADDING, and | |||
CONNECTION_CLOSE are considered ack-eliciting. | CONNECTION_CLOSE are considered ack-eliciting. | |||
Ack-eliciting Packets: Packets that contain ack-eliciting frames | Ack-eliciting Packets: Packets that contain ack-eliciting frames | |||
elicit an ACK from the receiver within the maximum ack delay and | elicit an ACK from the receiver within the maximum ack delay and | |||
are called ack-eliciting packets. | are called ack-eliciting packets. | |||
Crypto Packets: Packets containing CRYPTO data sent in Initial or | ||||
Handshake packets. | ||||
Out-of-order Packets: Packets that do not increase the largest | Out-of-order Packets: Packets that do not increase the largest | |||
received packet number for its packet number space by exactly one. | received packet number for its packet number space by exactly one. | |||
Packets arrive out of order when earlier packets are lost or | Packets arrive out of order when earlier packets are lost or | |||
delayed. | delayed. | |||
3. Design of the QUIC Transmission Machinery | 3. Design of the QUIC Transmission Machinery | |||
All transmissions in QUIC are sent with a packet-level header, which | All transmissions in QUIC are sent with a packet-level header, which | |||
indicates the encryption level and includes a packet sequence number | indicates the encryption level and includes a packet sequence number | |||
(referred to below as a packet number). The encryption level | (referred to below as a packet number). The encryption level | |||
indicates the packet number space, as described in [QUIC-TRANSPORT]. | indicates the packet number space, as described in [QUIC-TRANSPORT]. | |||
Packet numbers never repeat within a packet number space for the | Packet numbers never repeat within a packet number space for the | |||
lifetime of a connection. Packet numbers monotonically increase | lifetime of a connection. Packet numbers are sent in monotonically | |||
within a space, preventing ambiguity. | increasing order within a space, preventing ambiguity. | |||
This design obviates the need for disambiguating between | This design obviates the need for disambiguating between | |||
transmissions and retransmissions and eliminates significant | transmissions and retransmissions and eliminates significant | |||
complexity from QUIC's interpretation of TCP loss detection | complexity from QUIC's interpretation of TCP loss detection | |||
mechanisms. | mechanisms. | |||
QUIC packets can contain multiple frames of different types. The | QUIC packets can contain multiple frames of different types. The | |||
recovery mechanisms ensure that data and frames that need reliable | recovery mechanisms ensure that data and frames that need reliable | |||
delivery are acknowledged or declared lost and sent in new packets as | delivery are acknowledged or declared lost and sent in new packets as | |||
necessary. The types of frames contained in a packet affect recovery | necessary. The types of frames contained in a packet affect recovery | |||
skipping to change at page 6, line 43 | skipping to change at page 6, line 43 | |||
retransmissions are trivially detected, and mechanisms such as Fast | retransmissions are trivially detected, and mechanisms such as Fast | |||
Retransmit can be applied universally, based only on packet number. | Retransmit can be applied universally, based only on packet number. | |||
This design point significantly simplifies loss detection mechanisms | This design point significantly simplifies loss detection mechanisms | |||
for QUIC. Most TCP mechanisms implicitly attempt to infer | for QUIC. Most TCP mechanisms implicitly attempt to infer | |||
transmission ordering based on TCP sequence numbers - a non-trivial | transmission ordering based on TCP sequence numbers - a non-trivial | |||
task, especially when TCP timestamps are not available. | task, especially when TCP timestamps are not available. | |||
3.1.3. Clearer Loss Epoch | 3.1.3. Clearer Loss Epoch | |||
QUIC ends a loss epoch when a packet sent after loss is declared is | QUIC starts a loss epoch when a packet is lost and ends one when any | |||
acknowledged. TCP waits for the gap in the sequence number space to | packet sent after the epoch starts is acknowledged. TCP waits for | |||
be filled, and so if a segment is lost multiple times in a row, the | the gap in the sequence number space to be filled, and so if a | |||
loss epoch may not end for several round trips. Because both should | segment is lost multiple times in a row, the loss epoch may not end | |||
reduce their congestion windows only once per epoch, QUIC will do it | for several round trips. Because both should reduce their congestion | |||
correctly once for every round trip that experiences loss, while TCP | windows only once per epoch, QUIC will do it once for every round | |||
may only do it once across multiple round trips. | trip that experiences loss, while TCP may only do it once across | |||
| | multiple round trips. | |||
3.1.4. No Reneging | 3.1.4. No Reneging | |||
QUIC ACKs contain information that is similar to TCP SACK, but QUIC | QUIC ACKs contain information that is similar to TCP SACK, but QUIC | |||
does not allow any acked packet to be reneged, greatly simplifying | does not allow any acked packet to be reneged, greatly simplifying | |||
implementations on both sides and reducing memory pressure on the | implementations on both sides and reducing memory pressure on the | |||
sender. | sender. | |||
3.1.5. More ACK Ranges | 3.1.5. More ACK Ranges | |||
skipping to change at page 7, line 30 | skipping to change at page 7, line 30 | |||
QUIC endpoints measure the delay incurred between when a packet is | QUIC endpoints measure the delay incurred between when a packet is | |||
received and when the corresponding acknowledgment is sent, allowing | received and when the corresponding acknowledgment is sent, allowing | |||
a peer to maintain a more accurate round-trip time estimate (see | a peer to maintain a more accurate round-trip time estimate (see | |||
Section 13.2 of [QUIC-TRANSPORT]). | Section 13.2 of [QUIC-TRANSPORT]). | |||
4. Estimating the Round-Trip Time | 4. Estimating the Round-Trip Time | |||
At a high level, an endpoint measures the time from when a packet was | At a high level, an endpoint measures the time from when a packet was | |||
sent to when it is acknowledged as a round-trip time (RTT) sample. | sent to when it is acknowledged as a round-trip time (RTT) sample. | |||
The endpoint uses RTT samples and peer-reported host delays (see | The endpoint uses RTT samples and peer-reported ACK delays (see | |||
Section 13.2 of [QUIC-TRANSPORT]) to generate a statistical | Section 13.2 of [QUIC-TRANSPORT]) to generate a statistical | |||
description of the connection's RTT. An endpoint computes the | description of the connection's RTT. An endpoint computes the | |||
following three values: the minimum value observed over the lifetime | following three values: the minimum value observed over the lifetime | |||
of the connection (min_rtt), an exponentially-weighted moving average | of the connection (min_rtt), an exponentially-weighted moving average | |||
(smoothed_rtt), and the variance in the observed RTT samples | (smoothed_rtt), and the variance in the observed RTT samples | |||
(rttvar). | (rttvar). | |||
4.1. Generating RTT samples | 4.1. Generating RTT samples | |||
An endpoint generates an RTT sample on receiving an ACK frame that | An endpoint generates an RTT sample on receiving an ACK frame that | |||
skipping to change at page 8, line 5 | skipping to change at page 8, line 5 | |||
o the largest acknowledged packet number is newly acknowledged, and | o the largest acknowledged packet number is newly acknowledged, and | |||
o at least one of the newly acknowledged packets was ack-eliciting. | o at least one of the newly acknowledged packets was ack-eliciting. | |||
The RTT sample, latest_rtt, is generated as the time elapsed since | The RTT sample, latest_rtt, is generated as the time elapsed since | |||
the largest acknowledged packet was sent: | the largest acknowledged packet was sent: | |||
latest_rtt = ack_time - send_time_of_largest_acked | latest_rtt = ack_time - send_time_of_largest_acked | |||
An RTT sample is generated using only the largest acknowledged packet | An RTT sample is generated using only the largest acknowledged packet | |||
in the received ACK frame. This is because a peer reports host | in the received ACK frame. This is because a peer reports ACK delays | |||
delays for only the largest acknowledged packet in an ACK frame. | for only the largest acknowledged packet in an ACK frame. While the | |||
While the reported host delay is not used by the RTT sample | reported ACK delay is not used by the RTT sample measurement, it is | |||
measurement, it is used to adjust the RTT sample in subsequent | used to adjust the RTT sample in subsequent computations of | |||
computations of smoothed_rtt and rttvar Section 4.3. | smoothed_rtt and rttvar Section 4.3. | |||
To avoid generating multiple RTT samples using the same packet, an | To avoid generating multiple RTT samples for a single packet, an ACK | |||
ACK frame SHOULD NOT be used to update RTT estimates if it does not | frame SHOULD NOT be used to update RTT estimates if it does not newly | |||
newly acknowledge the largest acknowledged packet. | acknowledge the largest acknowledged packet. | |||
An RTT sample MUST NOT be generated on receiving an ACK frame that | An RTT sample MUST NOT be generated on receiving an ACK frame that | |||
does not newly acknowledge at least one ack-eliciting packet. A peer | does not newly acknowledge at least one ack-eliciting packet. A peer | |||
does not send an ACK frame on receiving only non-ack-eliciting | does not send an ACK frame on receiving only non-ack-eliciting | |||
packets, so an ACK frame that is subsequently sent can include an | packets, so an ACK frame that is subsequently sent can include an | |||
arbitrarily large Ack Delay field. Ignoring such ACK frames avoids | arbitrarily large Ack Delay field. Ignoring such ACK frames avoids | |||
complications in subsequent smoothed_rtt and rttvar computations. | complications in subsequent smoothed_rtt and rttvar computations. | |||
A sender might generate multiple RTT samples per RTT when multiple | A sender might generate multiple RTT samples per RTT when multiple | |||
ACK frames are received within an RTT. As suggested in [RFC6298], | ACK frames are received within an RTT. As suggested in [RFC6298], | |||
skipping to change at page 8, line 36 | skipping to change at page 8, line 36 | |||
open research question. | open research question. | |||
4.2. Estimating min_rtt | 4.2. Estimating min_rtt | |||
min_rtt is the minimum RTT observed over the lifetime of the | min_rtt is the minimum RTT observed over the lifetime of the | |||
connection. min_rtt is set to the latest_rtt on the first sample in a | connection. min_rtt is set to the latest_rtt on the first sample in a | |||
connection, and to the lesser of min_rtt and latest_rtt on subsequent | connection, and to the lesser of min_rtt and latest_rtt on subsequent | |||
samples. | samples. | |||
An endpoint uses only locally observed times in computing the min_rtt | An endpoint uses only locally observed times in computing the min_rtt | |||
and does not adjust for host delays reported by the peer. Doing so | and does not adjust for ACK delays reported by the peer. Doing so | |||
allows the endpoint to set a lower bound for the smoothed_rtt based | allows the endpoint to set a lower bound for the smoothed_rtt based | |||
entirely on what it observes (see Section 4.3), and limits potential | entirely on what it observes (see Section 4.3), and limits potential | |||
underestimation due to erroneously-reported delays by the peer. | underestimation due to erroneously-reported delays by the peer. | |||
4.3. Estimating smoothed_rtt and rttvar | 4.3. Estimating smoothed_rtt and rttvar | |||
smoothed_rtt is an exponentially-weighted moving average of an | smoothed_rtt is an exponentially-weighted moving average of an | |||
endpoint's RTT samples, and rttvar is the endpoint's estimated | endpoint's RTT samples, and rttvar is the endpoint's estimated | |||
variance in the RTT samples. | variance in the RTT samples. | |||
The calculation of smoothed_rtt uses path latency after adjusting RTT | The calculation of smoothed_rtt uses path latency after adjusting RTT | |||
samples for host delays. For packets sent in the ApplicationData | samples for ACK delays. For packets sent in the ApplicationData | |||
packet number space, a peer limits any delay in sending an | packet number space, a peer limits any delay in sending an | |||
acknowledgement for an ack-eliciting packet to no greater than the | acknowledgement for an ack-eliciting packet to no greater than the | |||
value it advertised in the max_ack_delay transport parameter. | value it advertised in the max_ack_delay transport parameter. | |||
Consequently, when a peer reports an Ack Delay that is greater than | Consequently, when a peer reports an Ack Delay that is greater than | |||
its max_ack_delay, the delay is attributed to reasons out of the | its max_ack_delay, the delay is attributed to reasons out of the | |||
peer's control, such as scheduler latency at the peer or loss of | peer's control, such as scheduler latency at the peer or loss of | |||
previous ACK frames. Any delays beyond the peer's max_ack_delay are | previous ACK frames. Any delays beyond the peer's max_ack_delay are | |||
therefore considered effectively part of path delay and incorporated | therefore considered effectively part of path delay and incorporated | |||
into the smoothed_rtt estimate. | into the smoothed_rtt estimate. | |||
skipping to change at page 9, line 46 | skipping to change at page 9, line 46 | |||
ack_delay = min(Ack Delay in ACK Frame, max_ack_delay) | ack_delay = min(Ack Delay in ACK Frame, max_ack_delay) | |||
adjusted_rtt = latest_rtt | adjusted_rtt = latest_rtt | |||
if (min_rtt + ack_delay < latest_rtt): | if (min_rtt + ack_delay < latest_rtt): | |||
adjusted_rtt = latest_rtt - ack_delay | adjusted_rtt = latest_rtt - ack_delay | |||
smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt | smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt | |||
rttvar_sample = abs(smoothed_rtt - adjusted_rtt) | rttvar_sample = abs(smoothed_rtt - adjusted_rtt) | |||
rttvar = 3/4 * rttvar + 1/4 * rttvar_sample | rttvar = 3/4 * rttvar + 1/4 * rttvar_sample | |||
5. Loss Detection | 5. Loss Detection | |||
QUIC senders use both ack information and timeouts to detect lost | QUIC senders use acknowledgements to detect lost packets, and a probe | |||
packets, and this section provides a description of these algorithms. | time out Section 5.2 to ensure acknowledgements are received. This | |||
| | section provides a description of these algorithms. | |||
If a packet is lost, the QUIC transport needs to recover from that | If a packet is lost, the QUIC transport needs to recover from that | |||
loss, such as by retransmitting the data, sending an updated frame, | loss, such as by retransmitting the data, sending an updated frame, | |||
or abandoning the frame. For more information, see Section 13.3 of | or abandoning the frame. For more information, see Section 13.3 of | |||
[QUIC-TRANSPORT]. | [QUIC-TRANSPORT]. | |||
5.1. Acknowledgement-based Detection | 5.1. Acknowledgement-based Detection | |||
Acknowledgement-based loss detection implements the spirit of TCP's | Acknowledgement-based loss detection implements the spirit of TCP's | |||
Fast Retransmit [RFC5681], Early Retransmit [RFC5827], FACK [FACK], | Fast Retransmit [RFC5681], Early Retransmit [RFC5827], FACK [FACK], | |||
skipping to change at page 10, line 22 | skipping to change at page 10, line 24 | |||
A packet is declared lost if it meets all the following conditions: | A packet is declared lost if it meets all the following conditions: | |||
o The packet is unacknowledged, in-flight, and was sent prior to an | o The packet is unacknowledged, in-flight, and was sent prior to an | |||
acknowledged packet. | acknowledged packet. | |||
o Either its packet number is kPacketThreshold smaller than an | o Either its packet number is kPacketThreshold smaller than an | |||
acknowledged packet (Section 5.1.1), or it was sent long enough in | acknowledged packet (Section 5.1.1), or it was sent long enough in | |||
the past (Section 5.1.2). | the past (Section 5.1.2). | |||
The acknowledgement indicates that a packet sent later was delivered, | The acknowledgement indicates that a packet sent later was delivered, | |||
while the packet and time thresholds provide some tolerance for | and the packet and time thresholds provide some tolerance for packet | |||
packet reordering. | reordering. | |||
Spuriously declaring packets as lost leads to unnecessary | Spuriously declaring packets as lost leads to unnecessary | |||
retransmissions and may result in degraded performance due to the | retransmissions and may result in degraded performance due to the | |||
actions of the congestion controller upon detecting loss. | actions of the congestion controller upon detecting loss. | |||
Implementations that detect spurious retransmissions and increase the | Implementations that detect spurious retransmissions and increase the | |||
reordering threshold in packets or time MAY choose to start with | reordering threshold in packets or time MAY choose to start with | |||
smaller initial reordering thresholds to minimize recovery latency. | smaller initial reordering thresholds to minimize recovery latency. | |||
5.1.1. Packet Threshold | 5.1.1. Packet Threshold | |||
skipping to change at page 10, line 45 | skipping to change at page 10, line 47 | |||
(kPacketThreshold) is 3, based on best practices for TCP loss | (kPacketThreshold) is 3, based on best practices for TCP loss | |||
detection [RFC5681] [RFC6675]. | detection [RFC5681] [RFC6675]. | |||
Some networks may exhibit higher degrees of reordering, causing a | Some networks may exhibit higher degrees of reordering, causing a | |||
sender to detect spurious losses. Implementers MAY use algorithms | sender to detect spurious losses. Implementers MAY use algorithms | |||
developed for TCP, such as TCP-NCR [RFC4653], to improve QUIC's | developed for TCP, such as TCP-NCR [RFC4653], to improve QUIC's | |||
reordering resilience. | reordering resilience. | |||
5.1.2. Time Threshold | 5.1.2. Time Threshold | |||
Once a later packet packet within the same packet number space has | Once a later packet within the same packet number space has been | |||
been acknowledged, an endpoint SHOULD declare an earlier packet lost | acknowledged, an endpoint SHOULD declare an earlier packet lost if it | |||
if it was sent a threshold amount of time in the past. To avoid | was sent a threshold amount of time in the past. To avoid declaring | |||
declaring packets as lost too early, this time threshold MUST be set | packets as lost too early, this time threshold MUST be set to at | |||
to at least kGranularity. The time threshold is: | least kGranularity. The time threshold is: | |||
max(kTimeThreshold * max(smoothed_rtt, latest_rtt), kGranularity) | ||||
kTimeThreshold * max(smoothed_rtt, latest_rtt, kGranularity) | ||||
If packets sent prior to the largest acknowledged packet cannot yet | If packets sent prior to the largest acknowledged packet cannot yet | |||
be declared lost, then a timer SHOULD be set for the remaining time. | be declared lost, then a timer SHOULD be set for the remaining time. | |||
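The time threshold rule above can be sketched as follows. This is an illustrative sketch only, using the max(..., kGranularity) form of the formula. The values chosen for kTimeThreshold (9/8) and kGranularity (1 ms) and the helper name loss_delay_remaining are assumptions made for the example, not values taken from this section.

```python
# Assumed constants for illustration; times are in seconds.
K_TIME_THRESHOLD = 9 / 8   # assumed RTT multiplier
K_GRANULARITY = 0.001      # assumed timer granularity (1 ms)


def time_threshold(smoothed_rtt, latest_rtt):
    """Time threshold, never smaller than kGranularity."""
    return max(K_TIME_THRESHOLD * max(smoothed_rtt, latest_rtt),
               K_GRANULARITY)


def loss_delay_remaining(time_sent, now, smoothed_rtt, latest_rtt):
    """Hypothetical helper: seconds left before a packet sent at
    time_sent may be declared lost (0.0 if it already may be)."""
    deadline = time_sent + time_threshold(smoothed_rtt, latest_rtt)
    return max(0.0, deadline - now)
```

A non-zero result from loss_delay_remaining corresponds to the case where a timer would be set for the remaining time.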
Using max(smoothed_rtt, latest_rtt) protects from the two following | Using max(smoothed_rtt, latest_rtt) protects from the two following | |||
cases: | cases: | |||
o the latest RTT sample is lower than the smoothed RTT, perhaps due | o the latest RTT sample is lower than the smoothed RTT, perhaps due | |||
to reordering where the acknowledgement encountered a shorter | to reordering where the acknowledgement encountered a shorter | |||
path; | path; | |||
skipping to change at page 11, line 33 ¶ | skipping to change at page 11, line 36 ¶ | |||
variance. Smaller thresholds reduce reordering resilience and | variance. Smaller thresholds reduce reordering resilience and | |||
increase spurious retransmissions, and larger thresholds increase | increase spurious retransmissions, and larger thresholds increase | |||
loss detection delay. | loss detection delay. | |||
5.2. Probe Timeout | 5.2. Probe Timeout | |||
A Probe Timeout (PTO) triggers sending one or two probe datagrams | A Probe Timeout (PTO) triggers sending one or two probe datagrams | |||
when ack-eliciting packets are not acknowledged within the expected | when ack-eliciting packets are not acknowledged within the expected | |||
period of time or the handshake has not been completed. A PTO | period of time or the handshake has not been completed. A PTO | |||
enables a connection to recover from loss of tail packets or | enables a connection to recover from loss of tail packets or | |||
acknowledgements. The PTO algorithm used in QUIC implements the | acknowledgements. | |||
reliability functions of Tail Loss Probe [RACK], RTO [RFC5681] and | ||||
F-RTO algorithms for TCP [RFC5682], and the timeout computation is | As with loss detection, the probe timeout is per packet number space. | |||
based on TCP's retransmission timeout period [RFC6298]. | The PTO algorithm used in QUIC implements the reliability functions | |||
of Tail Loss Probe [RACK], RTO [RFC5681], and F-RTO algorithms for | ||||
TCP [RFC5682]. The timeout computation is based on TCP's | ||||
retransmission timeout period [RFC6298]. | ||||
5.2.1. Computing PTO | 5.2.1. Computing PTO | |||
When an ack-eliciting packet is transmitted, the sender schedules a | When an ack-eliciting packet is transmitted, the sender schedules a | |||
timer for the PTO period as follows: | timer for the PTO period as follows: | |||
PTO = smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay | PTO = smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay | |||
kGranularity, smoothed_rtt, rttvar, and max_ack_delay are defined in | kGranularity, smoothed_rtt, rttvar, and max_ack_delay are defined in | |||
Appendix A.2 and Appendix A.3. | Appendix A.2 and Appendix A.3. | |||
The PTO period is the amount of time that a sender ought to wait for | The PTO period is the amount of time that a sender ought to wait for | |||
an acknowledgement of a sent packet. This time period includes the | an acknowledgement of a sent packet. This time period includes the | |||
estimated network roundtrip-time (smoothed_rtt), the variance in the | estimated network roundtrip-time (smoothed_rtt), the variance in the | |||
estimate (4*rttvar), and max_ack_delay, to account for the maximum | estimate (4*rttvar), and max_ack_delay, to account for the maximum | |||
time by which a receiver might delay sending an acknowledgement. | time by which a receiver might delay sending an acknowledgement. | |||
When the PTO is armed for Initial or Handshake packet number spaces, | ||||
the max_ack_delay is 0, as specified in 13.2.5 of [QUIC-TRANSPORT]. | ||||
The PTO value MUST be set to at least kGranularity, to avoid the | The PTO value MUST be set to at least kGranularity, to avoid the | |||
timer expiring immediately. | timer expiring immediately. | |||
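As a rough illustration of the PTO period formula, the sketch below computes the period in seconds. The 1 ms value for kGranularity and the handshake_space flag (used to zero max_ack_delay for the Initial and Handshake packet number spaces) are assumptions of this example.

```python
K_GRANULARITY = 0.001  # assumed timer granularity (1 ms), in seconds


def pto_period(smoothed_rtt, rttvar, max_ack_delay, handshake_space=False):
    """PTO = smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay.

    handshake_space is a hypothetical flag: when True, max_ack_delay is
    treated as 0, as for the Initial and Handshake spaces.
    """
    if handshake_space:
        max_ack_delay = 0.0
    return smoothed_rtt + max(4 * rttvar, K_GRANULARITY) + max_ack_delay
```

Because max(4*rttvar, kGranularity) is at least kGranularity, the result is always at least kGranularity for non-negative inputs.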
A sender computes its PTO timer every time an ack-eliciting packet is | ||||
sent. When ack-eliciting packets are in-flight in multiple packet | ||||
number spaces, the timer MUST be set for the packet number space with | ||||
the earliest timeout, except for ApplicationData, which MUST be | ||||
ignored until the handshake completes; see Section 4.1.1 of | ||||
[QUIC-TLS]. Not arming the PTO for ApplicationData prioritizes | ||||
completing the handshake and prevents the server from sending a 1-RTT | ||||
packet on a PTO before it has the keys to process a 1-RTT | ||||
packet. | ||||
When a PTO timer expires, the PTO period MUST be set to twice its | When a PTO timer expires, the PTO period MUST be set to twice its | |||
current value. This exponential reduction in the sender's rate is | current value. This exponential reduction in the sender's rate is | |||
important because the PTOs might be caused by loss of packets or | important because consecutive PTOs might be caused by loss of packets | |||
acknowledgements due to severe congestion. The life of a connection | or acknowledgements due to severe congestion. Even when there are | |||
that is experiencing consecutive PTOs is limited by the endpoint's | ack-eliciting packets in-flight in multiple packet number spaces, the | |||
idle timeout. | exponential increase in probe timeout occurs across all spaces to | |||
prevent excess load on the network. For example, a timeout in the | ||||
Initial packet number space doubles the length of the timeout in the | ||||
Handshake packet number space. | ||||
A sender computes its PTO timer every time an ack-eliciting packet is | The life of a connection that is experiencing consecutive PTOs is | |||
sent. A sender might choose to optimize this by setting the timer | limited by the endpoint's idle timeout. | |||
fewer times if it knows that more ack-eliciting packets will be sent | ||||
within a short period of time. | ||||
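The doubling of the PTO period on each expiry can be sketched as below. The names backed_off_pto and probe_times are hypothetical helpers, and the sketch assumes a single pto_count that counts consecutive expirations and is shared by all packet number spaces.

```python
def backed_off_pto(base_pto, pto_count):
    """PTO period after pto_count consecutive expirations (doubles each time)."""
    return base_pto * (2 ** pto_count)


def probe_times(base_pto, probes, start=0):
    """Times at which successive probes would fire if nothing is acknowledged."""
    times = []
    t = start
    for pto_count in range(probes):
        t += backed_off_pto(base_pto, pto_count)
        times.append(t)
    return times
```

With a base period of 1 and a first packet sent at t=0, probe_times(1, 3) yields probes at t=1, t=3, and t=7.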
The probe timer is not set if the time threshold (Section 5.1.2) loss | The probe timer is not set if the time threshold (Section 5.1.2) loss | |||
detection timer is set. The time threshold loss detection timer is | detection timer is set. The time threshold loss detection timer is | |||
expected to both expire earlier than the PTO and be less likely to | expected to both expire earlier than the PTO and be less likely to | |||
spuriously retransmit data. | spuriously retransmit data. | |||
5.3. Handshakes and New Paths | 5.3. Handshakes and New Paths | |||
The initial probe timeout for a new connection or new path SHOULD be | The initial probe timeout for a new connection or new path SHOULD be | |||
set to twice the initial RTT. Resumed connections over the same | set to twice the initial RTT. Resumed connections over the same | |||
network SHOULD use the previous connection's final smoothed RTT value | network SHOULD use the previous connection's final smoothed RTT value | |||
as the resumed connection's initial RTT. If no previous RTT is | as the resumed connection's initial RTT. If no previous RTT is | |||
available, the initial RTT SHOULD be set to 500ms, resulting in a 1 | available, the initial RTT SHOULD be set to 500ms, resulting in a 1 | |||
second initial timeout as recommended in [RFC6298]. | second initial timeout as recommended in [RFC6298]. | |||
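A minimal sketch of the initial timeout selection described above, with times in seconds. The function names and the previous_smoothed_rtt parameter are hypothetical; only the 500ms default and the factor of two come from the text.

```python
K_INITIAL_RTT = 0.5  # 500 ms default when no previous RTT is available


def initial_rtt(previous_smoothed_rtt=None):
    """Use the previous connection's final smoothed RTT if one is known."""
    if previous_smoothed_rtt is not None:
        return previous_smoothed_rtt
    return K_INITIAL_RTT


def initial_pto(previous_smoothed_rtt=None):
    """Initial probe timeout: twice the initial RTT."""
    return 2 * initial_rtt(previous_smoothed_rtt)
```

With no stored RTT this gives the 1 second initial timeout; a resumed connection with a stored smoothed RTT of 40 ms would start at 80 ms.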
A connection MAY use the delay between sending a PATH_CHALLENGE and | A connection MAY use the delay between sending a PATH_CHALLENGE and | |||
receiving a PATH_RESPONSE to seed initial_rtt for a new path, but the | receiving a PATH_RESPONSE to set the initial RTT (see kInitialRtt in | |||
delay SHOULD NOT be considered an RTT sample. | Appendix A.2) for a new path, but the delay SHOULD NOT be considered | |||
an RTT sample. | ||||
Until the server has validated the client's address on the path, the | Until the server has validated the client's address on the path, the | |||
amount of data it can send is limited to three times the amount of | amount of data it can send is limited to three times the amount of | |||
data received, as specified in Section 8.1 of [QUIC-TRANSPORT]. If | data received, as specified in Section 8.1 of [QUIC-TRANSPORT]. If | |||
no data can be sent, then the PTO alarm MUST NOT be armed. | no data can be sent, then the PTO alarm MUST NOT be armed until | |||
datagrams have been received from the client. | ||||
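One way a server might track the limit described above is sketched here. The class and method names are hypothetical, and the rule that the PTO is armed only while the send budget is non-zero is this example's reading of the text, not a definitive implementation.

```python
class AmplificationLimiter:
    """Tracks the 3x send limit that applies before address validation."""

    FACTOR = 3

    def __init__(self):
        self.bytes_received = 0
        self.bytes_sent = 0
        self.validated = False

    def on_datagram_received(self, size):
        self.bytes_received += size

    def on_datagram_sent(self, size):
        self.bytes_sent += size

    def send_budget(self):
        """Bytes the server may still send on this path."""
        if self.validated:
            return float("inf")
        return max(0, self.FACTOR * self.bytes_received - self.bytes_sent)

    def may_arm_pto(self):
        """The PTO is armed only if some data could actually be sent."""
        return self.send_budget() > 0
```

In this sketch, receiving another datagram from the client raises the budget and allows the timer to be armed again.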
Since the server could be blocked until more packets are received | Since the server could be blocked until more packets are received | |||
from the client, it is the client's responsibility to send packets to | from the client, it is the client's responsibility to send packets to | |||
unblock the server until it is certain that the server has finished | unblock the server until it is certain that the server has finished | |||
its address validation (see Section 8 of [QUIC-TRANSPORT]). That is, | its address validation (see Section 8 of [QUIC-TRANSPORT]). That is, | |||
the client MUST set the probe timer if the client has not received an | the client MUST set the probe timer if the client has not received an | |||
acknowledgement for one of its Handshake or 1-RTT packets. | acknowledgement for one of its Handshake or 1-RTT packets. | |||
Prior to handshake completion, when few or no RTT samples have been | Prior to handshake completion, when few or no RTT samples have been | |||
generated, it is possible that the probe timer expiration is due to | generated, it is possible that the probe timer expiration is due to | |||
an incorrect RTT estimate at the client. To allow the client to | an incorrect RTT estimate at the client. To allow the client to | |||
improve its RTT estimate, the new packet that it sends MUST be ack- | improve its RTT estimate, the new packet that it sends MUST be ack- | |||
eliciting. If Handshake keys are available to the client, it MUST | eliciting. If Handshake keys are available to the client, it MUST | |||
send a Handshake packet, and otherwise it MUST send an Initial packet | send a Handshake packet, and otherwise it MUST send an Initial packet | |||
in a UDP datagram of at least 1200 bytes. | in a UDP datagram of at least 1200 bytes. | |||
Initial packets and Handshake packets may never be acknowledged, but | Initial packets and Handshake packets could be never acknowledged, | |||
they are removed from bytes in flight when the Initial and Handshake | but they are removed from bytes in flight when the Initial and | |||
keys are discarded. | Handshake keys are discarded. | |||
5.3.1. Sending Probe Packets | 5.3.1. Sending Probe Packets | |||
When a PTO timer expires, a sender MUST send at least one ack- | When a PTO timer expires, a sender MUST send at least one ack- | |||
eliciting packet as a probe, unless there is no data available to | eliciting packet in the packet number space as a probe, unless there | |||
send. An endpoint MAY send up to two full-sized datagrams containing | is no data available to send. An endpoint MAY send up to two full- | |||
ack-eliciting packets, to avoid an expensive consecutive PTO | sized datagrams containing ack-eliciting packets, to avoid an | |||
expiration due to a single lost datagram. | expensive consecutive PTO expiration due to a single lost datagram or | |||
transmit data from multiple packet number spaces. | ||||
In addition to sending data in the packet number space for which the | ||||
timer expired, the sender SHOULD send ack-eliciting packets from | ||||
other packet number spaces with in-flight data, coalescing packets if | ||||
possible. | ||||
When the PTO timer expires, and there is new or previously sent | When the PTO timer expires, and there is new or previously sent | |||
unacknowledged data, it MUST be sent. Data that was previously sent | unacknowledged data, it MUST be sent. | |||
with Initial encryption MUST be sent before Handshake data and data | ||||
previously sent at Handshake encryption MUST be sent before any | ||||
ApplicationData data. | ||||
It is possible the sender has no new or previously-sent data to send. | It is possible the sender has no new or previously-sent data to send. | |||
As an example, consider the following sequence of events: new | As an example, consider the following sequence of events: new | |||
application data is sent in a STREAM frame, deemed lost, then | application data is sent in a STREAM frame, deemed lost, then | |||
retransmitted in a new packet, and then the original transmission is | retransmitted in a new packet, and then the original transmission is | |||
acknowledged. When there is no data to send, the sender SHOULD send | acknowledged. When there is no data to send, the sender SHOULD send | |||
a PING or other ack-eliciting frame in a single packet, re-arming the | a PING or other ack-eliciting frame in a single packet, re-arming the | |||
PTO timer. | PTO timer. | |||
Alternatively, instead of sending an ack-eliciting packet, the sender | Alternatively, instead of sending an ack-eliciting packet, the sender | |||
skipping to change at page 15, line 43 ¶ | skipping to change at page 16, line 19 ¶ | |||
congestion window, unless the packet is a probe packet sent after a | congestion window, unless the packet is a probe packet sent after a | |||
PTO timer expires, as described in Section 5.2. | PTO timer expires, as described in Section 5.2. | |||
Implementations MAY use other congestion control algorithms, such as | Implementations MAY use other congestion control algorithms, such as | |||
Cubic [RFC8312], and endpoints MAY use different algorithms from one | Cubic [RFC8312], and endpoints MAY use different algorithms from one | |||
another. The signals QUIC provides for congestion control are | another. The signals QUIC provides for congestion control are | |||
generic and are designed to support different algorithms. | generic and are designed to support different algorithms. | |||
6.1. Explicit Congestion Notification | 6.1. Explicit Congestion Notification | |||
If a path has been verified to support ECN, QUIC treats a Congestion | If a path has been verified to support ECN [RFC3168] [RFC8311], QUIC | |||
Experienced codepoint in the IP header as a signal of congestion. | treats a Congestion Experienced (CE) codepoint in the IP header as a | |||
This document specifies an endpoint's response when its peer receives | signal of congestion. This document specifies an endpoint's response | |||
packets with the Congestion Experienced codepoint. As discussed in | when its peer receives packets with the Congestion Experienced | |||
[RFC8311], endpoints are permitted to experiment with other response | codepoint. | |||
functions. | ||||
6.2. Slow Start | 6.2. Slow Start | |||
QUIC begins every connection in slow start and exits slow start upon | QUIC begins every connection in slow start and exits slow start upon | |||
loss or upon increase in the ECN-CE counter. QUIC re-enters slow | loss or upon increase in the ECN-CE counter. QUIC re-enters slow | |||
start anytime the congestion window is less than ssthresh, which only | start any time the congestion window is less than ssthresh, which | |||
occurs after persistent congestion is declared. While in slow start, | only occurs after persistent congestion is declared. While in slow | |||
QUIC increases the congestion window by the number of bytes | start, QUIC increases the congestion window by the number of bytes | |||
acknowledged when each acknowledgment is processed. | acknowledged when each acknowledgment is processed. | |||
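The slow start behaviour can be sketched as two small functions. The function names are hypothetical, and representing an unset ssthresh as infinity is an assumption of this example.

```python
def in_slow_start(cwnd, ssthresh):
    """Slow start applies while the congestion window is below ssthresh."""
    return cwnd < ssthresh


def on_ack_in_slow_start(cwnd, acked_bytes):
    """Grow the congestion window by the number of bytes acknowledged."""
    return cwnd + acked_bytes
```

For example, a 12000-byte window grows to 13200 bytes after a 1200-byte packet is acknowledged.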
6.3. Congestion Avoidance | 6.3. Congestion Avoidance | |||
Slow start exits to congestion avoidance. Congestion avoidance in | Slow start exits to congestion avoidance. Congestion avoidance in | |||
NewReno uses an additive increase multiplicative decrease (AIMD) | NewReno uses an additive increase multiplicative decrease (AIMD) | |||
approach that increases the congestion window by one maximum packet | approach that increases the congestion window by one maximum packet | |||
size per congestion window acknowledged. When a loss is detected, | size per congestion window acknowledged. When a loss is detected, | |||
NewReno halves the congestion window and sets the slow start | NewReno halves the congestion window and sets the slow start | |||
threshold to the new congestion window. | threshold to the new congestion window. | |||
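The additive increase and multiplicative decrease described above might look like the following sketch. The 1200-byte maximum packet size, the minimum window of two packets, and the function names are assumptions made for illustration.

```python
MAX_DATAGRAM_SIZE = 1200                  # assumed maximum packet size, bytes
K_MINIMUM_WINDOW = 2 * MAX_DATAGRAM_SIZE  # assumed minimum congestion window


def on_ack_in_congestion_avoidance(cwnd, acked_bytes):
    """Additive increase: about one maximum packet size per window acked."""
    return cwnd + MAX_DATAGRAM_SIZE * acked_bytes // cwnd


def on_loss(cwnd):
    """Multiplicative decrease: halve cwnd and set ssthresh to the new cwnd.

    Returns (new_cwnd, new_ssthresh).
    """
    new_cwnd = max(cwnd // 2, K_MINIMUM_WINDOW)
    return new_cwnd, new_cwnd
```

Acknowledging a full 24000-byte window grows it by one 1200-byte packet, while a loss at that size halves the window to 12000 bytes.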
skipping to change at page 17, line 40 ¶ | skipping to change at page 18, line 17 ¶ | |||
+-----+------------------------+ | +-----+------------------------+ | |||
| t=1 | Send Pkt #2 (PTO 1) | | | t=1 | Send Pkt #2 (PTO 1) | | |||
| | | | | | | | |||
| t=3 | Send Pkt #3 (PTO 2) | | | t=3 | Send Pkt #3 (PTO 2) | | |||
| | | | | | | | |||
| t=7 | Send Pkt #4 (PTO 3) | | | t=7 | Send Pkt #4 (PTO 3) | | |||
| | | | | | | | |||
| t=8 | Recv ACK of Pkt #4 | | | t=8 | Recv ACK of Pkt #4 | | |||
+-----+------------------------+ | +-----+------------------------+ | |||
The first three packets are determined to be lost when the ACK of | The first three packets are determined to be lost when the | |||
packet 4 is received at t=8. The congestion period is calculated as | acknowledgement of packet 4 is received at t=8. The congestion period | |||
the time between the oldest and newest lost packets: (3 - 0) = 3. | is calculated as the time between the oldest and newest lost packets: | |||
The duration for persistent congestion is equal to: (1 * | (3 - 0) = 3. The duration for persistent congestion is equal to: (1 | |||
kPersistentCongestionThreshold) = 3. Because the threshold was | * kPersistentCongestionThreshold) = 3. Because the threshold was | |||
reached and because none of the packets between the oldest and the | reached and because none of the packets between the oldest and the | |||
newest packets are acknowledged, the network is considered to have | newest packets are acknowledged, the network is considered to have | |||
experienced persistent congestion. | experienced persistent congestion. | |||
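The arithmetic in this example can be checked with a short sketch. It assumes kPersistentCongestionThreshold is 3 and a PTO of 1, as the example implies, and that the caller passes only the send times of a run of lost packets with no acknowledged packet in between. The function name is hypothetical.

```python
K_PERSISTENT_CONGESTION_THRESHOLD = 3  # value implied by the example


def in_persistent_congestion(lost_send_times, pto):
    """True if the span between the oldest and newest lost packets
    reaches pto * kPersistentCongestionThreshold."""
    if not lost_send_times:
        return False
    congestion_period = max(lost_send_times) - min(lost_send_times)
    return congestion_period >= pto * K_PERSISTENT_CONGESTION_THRESHOLD
```

For the first three packets, sent at t=0, t=1, and t=3, the congestion period is 3, which meets the threshold of 3.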
When persistent congestion is established, the sender's congestion | When persistent congestion is established, the sender's congestion | |||
window MUST be reduced to the minimum congestion window | window MUST be reduced to the minimum congestion window | |||
(kMinimumWindow). This response of collapsing the congestion window | (kMinimumWindow). This response of collapsing the congestion window | |||
on persistent congestion is functionally similar to a sender's | on persistent congestion is functionally similar to a sender's | |||
response on a Retransmission Timeout (RTO) in TCP [RFC5681] after | response on a Retransmission Timeout (RTO) in TCP [RFC5681] after | |||
Tail Loss Probes (TLP) [RACK]. | Tail Loss Probes (TLP) [RACK]. | |||
skipping to change at page 19, line 37 ¶ | skipping to change at page 20, line 13 ¶ | |||
frames to reduce leaked information. | frames to reduce leaked information. | |||
7.3. Misreporting ECN Markings | 7.3. Misreporting ECN Markings | |||
A receiver can misreport ECN markings to alter the congestion | A receiver can misreport ECN markings to alter the congestion | |||
response of a sender. Suppressing reports of ECN-CE markings could | response of a sender. Suppressing reports of ECN-CE markings could | |||
cause a sender to increase their send rate. This increase could | cause a sender to increase their send rate. This increase could | |||
result in congestion and loss. | result in congestion and loss. | |||
A sender MAY attempt to detect suppression of reports by marking | A sender MAY attempt to detect suppression of reports by marking | |||
occasional packets that they send with ECN-CE. If a packet marked | occasional packets that they send with ECN-CE. If a packet sent with | |||
with ECN-CE is not reported as having been marked when the packet is | ECN-CE is not reported as having been CE marked when the packet is | |||
acknowledged, the sender SHOULD then disable ECN for that path. | acknowledged, then the sender SHOULD disable ECN for that path. | |||
Reporting additional ECN-CE markings will cause a sender to reduce | Reporting additional ECN-CE markings will cause a sender to reduce | |||
their sending rate, which is similar in effect to advertising reduced | their sending rate, which is similar in effect to advertising reduced | |||
connection flow control limits and so no advantage is gained by doing | connection flow control limits and so no advantage is gained by doing | |||
so. | so. | |||
Endpoints choose the congestion controller that they use. Though | Endpoints choose the congestion controller that they use. Though | |||
congestion controllers generally treat reports of ECN-CE markings as | congestion controllers generally treat reports of ECN-CE markings as | |||
equivalent to loss [RFC8311], the exact response for each controller | equivalent to loss [RFC8311], the exact response for each controller | |||
could be different. Failure to correctly respond to information | could be different. Failure to correctly respond to information | |||
skipping to change at page 20, line 15 ¶ | skipping to change at page 20, line 38 ¶ | |||
8. IANA Considerations | 8. IANA Considerations | |||
This document has no IANA actions. Yet. | This document has no IANA actions. Yet. | |||
9. References | 9. References | |||
9.1. Normative References | 9.1. Normative References | |||
[QUIC-TLS] | [QUIC-TLS] | |||
Thomson, M., Ed. and S. Turner, Ed., "Using TLS to Secure | Thomson, M., Ed. and S. Turner, Ed., "Using TLS to Secure | |||
QUIC", draft-ietf-quic-tls-24 (work in progress). | QUIC", draft-ietf-quic-tls-latest (work in progress). | |||
[QUIC-TRANSPORT] | [QUIC-TRANSPORT] | |||
Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based | Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based | |||
Multiplexed and Secure Transport", draft-ietf-quic- | Multiplexed and Secure Transport", draft-ietf-quic- | |||
transport-24 (work in progress). | transport-latest (work in progress). | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
<https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
May 2017, <https://www.rfc-editor.org/info/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
[RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion | ||||
Notification (ECN) Experimentation", RFC 8311, | ||||
DOI 10.17487/RFC8311, January 2018, | ||||
<https://www.rfc-editor.org/info/rfc8311>. | ||||
9.2. Informative References | 9.2. Informative References | |||
[FACK] Mathis, M. and J. Mahdavi, "Forward Acknowledgement: | [FACK] Mathis, M. and J. Mahdavi, "Forward Acknowledgement: | |||
Refining TCP Congestion Control", ACM SIGCOMM , August | Refining TCP Congestion Control", ACM SIGCOMM , August | |||
1996. | 1996. | |||
[RACK] Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "RACK: | [RACK] Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "RACK: | |||
a time-based fast loss detection algorithm for TCP", | a time-based fast loss detection algorithm for TCP", | |||
draft-ietf-tcpm-rack-06 (work in progress), November 2019. | draft-ietf-tcpm-rack-06 (work in progress), November 2019. | |||
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | ||||
of Explicit Congestion Notification (ECN) to IP", | ||||
RFC 3168, DOI 10.17487/RFC3168, September 2001, | ||||
<https://www.rfc-editor.org/info/rfc3168>. | ||||
[RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte | [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte | |||
Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February | Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February | |||
2003, <https://www.rfc-editor.org/info/rfc3465>. | 2003, <https://www.rfc-editor.org/info/rfc3465>. | |||
[RFC4653] Bhandarkar, S., Reddy, A., Allman, M., and E. Blanton, | [RFC4653] Bhandarkar, S., Reddy, A., Allman, M., and E. Blanton, | |||
"Improving the Robustness of TCP to Non-Congestion | "Improving the Robustness of TCP to Non-Congestion | |||
Events", RFC 4653, DOI 10.17487/RFC4653, August 2006, | Events", RFC 4653, DOI 10.17487/RFC4653, August 2006, | |||
<https://www.rfc-editor.org/info/rfc4653>. | <https://www.rfc-editor.org/info/rfc4653>. | |||
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | |||
skipping to change at page 22, line 5 ¶ | skipping to change at page 22, line 26 ¶ | |||
[RFC6928] Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis, | [RFC6928] Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis, | |||
"Increasing TCP's Initial Window", RFC 6928, | "Increasing TCP's Initial Window", RFC 6928, | |||
DOI 10.17487/RFC6928, April 2013, | DOI 10.17487/RFC6928, April 2013, | |||
<https://www.rfc-editor.org/info/rfc6928>. | <https://www.rfc-editor.org/info/rfc6928>. | |||
[RFC7661] Fairhurst, G., Sathiaseelan, A., and R. Secchi, "Updating | [RFC7661] Fairhurst, G., Sathiaseelan, A., and R. Secchi, "Updating | |||
TCP to Support Rate-Limited Traffic", RFC 7661, | TCP to Support Rate-Limited Traffic", RFC 7661, | |||
DOI 10.17487/RFC7661, October 2015, | DOI 10.17487/RFC7661, October 2015, | |||
<https://www.rfc-editor.org/info/rfc7661>. | <https://www.rfc-editor.org/info/rfc7661>. | |||
[RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion | ||||
Notification (ECN) Experimentation", RFC 8311, | ||||
DOI 10.17487/RFC8311, January 2018, | ||||
<https://www.rfc-editor.org/info/rfc8311>. | ||||
[RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and | [RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and | |||
R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", | R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", | |||
RFC 8312, DOI 10.17487/RFC8312, February 2018, | RFC 8312, DOI 10.17487/RFC8312, February 2018, | |||
<https://www.rfc-editor.org/info/rfc8312>. | <https://www.rfc-editor.org/info/rfc8312>. | |||
9.3. URIs | 9.3. URIs | |||
[1] https://mailarchive.ietf.org/arch/search/?email_list=quic | [1] https://mailarchive.ietf.org/arch/search/?email_list=quic | |||
[2] https://github.com/quicwg | [2] https://github.com/quicwg | |||
skipping to change at page 24, line 6 ¶ | skipping to change at page 24, line 39 ¶ | |||
described in [RFC6298] | described in [RFC6298] | |||
rttvar: The RTT variance, computed as described in [RFC6298] | rttvar: The RTT variance, computed as described in [RFC6298] | |||
min_rtt: The minimum RTT seen in the connection, ignoring ack delay. | min_rtt: The minimum RTT seen in the connection, ignoring ack delay. | |||
max_ack_delay: The maximum amount of time by which the receiver | max_ack_delay: The maximum amount of time by which the receiver | |||
intends to delay acknowledgments for packets in the | intends to delay acknowledgments for packets in the | |||
ApplicationData packet number space. The actual ack_delay in a | ApplicationData packet number space. The actual ack_delay in a | |||
received ACK frame may be larger due to late timers, reordering, | received ACK frame may be larger due to late timers, reordering, | |||
or lost ACKs. | or lost ACK frames. | |||
loss_detection_timer: Multi-modal timer used for loss detection. | loss_detection_timer: Multi-modal timer used for loss detection. | |||
pto_count: The number of times a PTO has been sent without receiving | pto_count: The number of times a PTO has been sent without receiving | |||
an ack. | an ack. | |||
time_of_last_sent_ack_eliciting_packet: The time the most recent | time_of_last_sent_ack_eliciting_packet[kPacketNumberSpace]: The time | |||
ack-eliciting packet was sent. | the most recent ack-eliciting packet was sent. | |||
largest_acked_packet[kPacketNumberSpace]: The largest packet number | largest_acked_packet[kPacketNumberSpace]: The largest packet number | |||
acknowledged in the packet number space so far. | acknowledged in the packet number space so far. | |||
loss_time[kPacketNumberSpace]: The time at which the next packet in | loss_time[kPacketNumberSpace]: The time at which the next packet in | |||
that packet number space will be considered lost based on | that packet number space will be considered lost based on | |||
exceeding the reordering window in time. | exceeding the reordering window in time. | |||
sent_packets[kPacketNumberSpace]: An association of packet numbers | sent_packets[kPacketNumberSpace]: An association of packet numbers | |||
in a packet number space to information about them. Described in | in a packet number space to information about them. Described in | |||
skipping to change at page 24, line 39 ¶ | skipping to change at page 25, line 25 ¶ | |||
At the beginning of the connection, initialize the loss detection | At the beginning of the connection, initialize the loss detection | |||
variables as follows: | variables as follows: | |||
loss_detection_timer.reset() | loss_detection_timer.reset() | |||
pto_count = 0 | pto_count = 0 | |||
latest_rtt = 0 | latest_rtt = 0 | |||
smoothed_rtt = 0 | smoothed_rtt = 0 | |||
rttvar = 0 | rttvar = 0 | |||
min_rtt = 0 | min_rtt = 0 | |||
max_ack_delay = 0 | max_ack_delay = 0 | |||
time_of_last_sent_ack_eliciting_packet = 0 | ||||
for pn_space in [ Initial, Handshake, ApplicationData ]: | for pn_space in [ Initial, Handshake, ApplicationData ]: | |||
largest_acked_packet[pn_space] = infinite | largest_acked_packet[pn_space] = infinite | |||
time_of_last_sent_ack_eliciting_packet[pn_space] = 0 | ||||
loss_time[pn_space] = 0 | loss_time[pn_space] = 0 | |||
A.5. On Sending a Packet | A.5. On Sending a Packet | |||
After a packet is sent, information about the packet is stored. The | After a packet is sent, information about the packet is stored. The | |||
parameters to OnPacketSent are described in detail above in | parameters to OnPacketSent are described in detail above in | |||
Appendix A.1.1. | Appendix A.1.1. | |||
Pseudocode for OnPacketSent follows: | Pseudocode for OnPacketSent follows: | |||
OnPacketSent(packet_number, pn_space, ack_eliciting, | OnPacketSent(packet_number, pn_space, ack_eliciting, | |||
in_flight, sent_bytes): | in_flight, sent_bytes): | |||
sent_packets[pn_space][packet_number].packet_number = | sent_packets[pn_space][packet_number].packet_number = | |||
packet_number | packet_number | |||
sent_packets[pn_space][packet_number].time_sent = now | sent_packets[pn_space][packet_number].time_sent = now | |||
sent_packets[pn_space][packet_number].ack_eliciting = | sent_packets[pn_space][packet_number].ack_eliciting = | |||
ack_eliciting | ack_eliciting | |||
sent_packets[pn_space][packet_number].in_flight = in_flight | sent_packets[pn_space][packet_number].in_flight = in_flight | |||
if (in_flight): | if (in_flight): | |||
if (ack_eliciting): | if (ack_eliciting): | |||
time_of_last_sent_ack_eliciting_packet = now | time_of_last_sent_ack_eliciting_packet[pn_space] = now | |||
OnPacketSentCC(sent_bytes) | OnPacketSentCC(sent_bytes) | |||
sent_packets[pn_space][packet_number].size = sent_bytes | sent_packets[pn_space][packet_number].size = sent_bytes | |||
SetLossDetectionTimer() | SetLossDetectionTimer() | |||
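The change in the right-hand column makes time_of_last_sent_ack_eliciting_packet a per-space value instead of a single scalar. A minimal Python sketch of that bookkeeping follows. It is an illustration, not the draft's implementation: the LossState and SentPacket names are hypothetical, the caller supplies `now`, the `bytes_in_flight` update is only a stand-in for OnPacketSentCC, and the final SetLossDetectionTimer() call is omitted.

```python
from dataclasses import dataclass

SPACES = ("Initial", "Handshake", "ApplicationData")


@dataclass
class SentPacket:
    """Hypothetical record mirroring the sent_packets fields in the pseudocode."""
    packet_number: int
    time_sent: float
    ack_eliciting: bool
    in_flight: bool
    size: int = 0


class LossState:
    """Hypothetical container for the per-space loss detection variables."""

    def __init__(self):
        self.sent_packets = {space: {} for space in SPACES}
        # One timestamp per packet number space; 0 means "never sent".
        self.time_of_last_sent_ack_eliciting_packet = {s: 0 for s in SPACES}
        self.bytes_in_flight = 0

    def on_packet_sent(self, packet_number, pn_space, ack_eliciting,
                       in_flight, sent_bytes, now):
        packet = SentPacket(packet_number, now, ack_eliciting, in_flight)
        self.sent_packets[pn_space][packet_number] = packet
        if in_flight:
            if ack_eliciting:
                # Only the space the packet was sent in is updated.
                self.time_of_last_sent_ack_eliciting_packet[pn_space] = now
            # Assumption: stand-in for OnPacketSentCC(sent_bytes).
            self.bytes_in_flight += sent_bytes
            packet.size = sent_bytes
            # SetLossDetectionTimer() would be called here.
```

Keeping one timestamp per space lets the timer logic later pick the space whose ack-eliciting packet has been outstanding longest.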
A.6. On Receiving an Acknowledgment | A.6. On Receiving an Acknowledgment | |||
When an ACK frame is received, it may newly acknowledge any number of | When an ACK frame is received, it may newly acknowledge any number of | |||
packets. | packets. | |||
Pseudocode for OnAckReceived and UpdateRtt follow: | Pseudocode for OnAckReceived and UpdateRtt follow: | |||
skipping to change at page 28, line 5 ¶ | skipping to change at page 28, line 5 ¶ | |||
which is set in the packet and timer events further below. The | which is set in the packet and timer events further below. The | |||
function SetLossDetectionTimer defined below shows how the single | function SetLossDetectionTimer defined below shows how the single | |||
timer is set. | timer is set. | |||
This algorithm may result in the timer being set in the past, | This algorithm may result in the timer being set in the past, | |||
particularly if timers wake up late. Timers set in the past SHOULD | particularly if timers wake up late. Timers set in the past SHOULD | |||
fire immediately. | fire immediately. | |||
Pseudocode for SetLossDetectionTimer follows: | Pseudocode for SetLossDetectionTimer follows: | |||
// Returns the earliest loss_time and the packet number | GetEarliestTimeAndSpace(times): | |||
// space it's from. Returns 0 if all times are 0. | time = times[Initial] | |||
GetEarliestLossTime(): | ||||
time = loss_time[Initial] | ||||
space = Initial | space = Initial | |||
for pn_space in [ Handshake, ApplicationData ]: | for pn_space in [ Handshake, ApplicationData ]: | |||
if (loss_time[pn_space] != 0 && | if (times[pn_space] != 0 && | |||
(time == 0 || loss_time[pn_space] < time)): | (time == 0 || times[pn_space] < time) && | |||
time = loss_time[pn_space]; | # Skip ApplicationData until handshake completion. | |||
(pn_space != ApplicationData || | ||||
IsHandshakeComplete()): | ||||
time = times[pn_space]; | ||||
space = pn_space | space = pn_space | |||
return time, space | return time, space | |||
PeerNotAwaitingAddressValidation(): | PeerNotAwaitingAddressValidation(): | |||
# Assume clients validate the server's address implicitly. | # Assume clients validate the server's address implicitly. | |||
if (endpoint is server): | if (endpoint is server): | |||
return true | return true | |||
# Servers complete address validation when a | # Servers complete address validation when a | |||
# protected packet is received. | # protected packet is received. | |||
return has received Handshake ACK || | return has received Handshake ACK || | |||
has received 1-RTT ACK | has received 1-RTT ACK | |||
SetLossDetectionTimer(): | SetLossDetectionTimer(): | |||
loss_time, _ = GetEarliestLossTime() | earliest_loss_time, _ = GetEarliestTimeAndSpace(loss_time) | |||
if (loss_time != 0): | if (earliest_loss_time != 0): | |||
// Time threshold loss detection. | // Time threshold loss detection. | |||
loss_detection_timer.update(loss_time) | loss_detection_timer.update(earliest_loss_time) | |||
return | return | |||
if (no ack-eliciting packets in flight && | if (no ack-eliciting packets in flight && | |||
PeerNotAwaitingAddressValidation()): | PeerNotAwaitingAddressValidation()): | |||
loss_detection_timer.cancel() | loss_detection_timer.cancel() | |||
return | return | |||
// Use a default timeout if there are no RTT measurements | // Use a default timeout if there are no RTT measurements | |||
if (smoothed_rtt == 0): | if (smoothed_rtt == 0): | |||
timeout = 2 * kInitialRtt | timeout = 2 * kInitialRtt | |||
else: | else: | |||
// Calculate PTO duration | // Calculate PTO duration | |||
timeout = smoothed_rtt + max(4 * rttvar, kGranularity) + | timeout = smoothed_rtt + max(4 * rttvar, kGranularity) + | |||
max_ack_delay | max_ack_delay | |||
timeout = timeout * (2 ^ pto_count) | timeout = timeout * (2 ^ pto_count) | |||
loss_detection_timer.update( | sent_time, _ = GetEarliestTimeAndSpace( | |||
time_of_last_sent_ack_eliciting_packet + timeout) | time_of_last_sent_ack_eliciting_packet) | |||
loss_detection_timer.update(sent_time + timeout) | ||||
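The right-hand column's GetEarliestTimeAndSpace and the PTO duration calculation can be sketched in Python as below. This is a hedged illustration only: the Space enum, the `handshake_complete` argument (standing in for IsHandshakeComplete()), and the function names are hypothetical, times are integer milliseconds, and the values used for kInitialRtt (500 ms) and kGranularity (1 ms) are assumptions, since this excerpt does not define them.

```python
from enum import Enum

K_INITIAL_RTT_MS = 500  # assumed value for kInitialRtt
K_GRANULARITY_MS = 1    # assumed value for kGranularity


class Space(Enum):
    INITIAL = 0
    HANDSHAKE = 1
    APPLICATION_DATA = 2


def get_earliest_time_and_space(times, handshake_complete):
    """Return (time, space) for the earliest non-zero entry of `times`.

    A value of 0 means "unset". ApplicationData is skipped until the
    handshake completes, as in the right-hand pseudocode.
    """
    time = times[Space.INITIAL]
    space = Space.INITIAL
    for pn_space in (Space.HANDSHAKE, Space.APPLICATION_DATA):
        candidate = times[pn_space]
        if candidate == 0:
            continue
        if pn_space is Space.APPLICATION_DATA and not handshake_complete:
            continue
        if time == 0 or candidate < time:
            time, space = candidate, pn_space
    return time, space


def pto_timeout_ms(smoothed_rtt, rttvar, max_ack_delay, pto_count):
    """Compute the PTO duration, doubling it for each consecutive PTO."""
    if smoothed_rtt == 0:
        # No RTT measurement yet: fall back to a default timeout.
        timeout = 2 * K_INITIAL_RTT_MS
    else:
        timeout = smoothed_rtt + max(4 * rttvar, K_GRANULARITY_MS) + max_ack_delay
    return timeout * (2 ** pto_count)
```

With these helpers, the timer deadline in the right-hand column corresponds to the earliest per-space send time plus `pto_timeout_ms(...)`.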
A.9. On Timeout | A.9. On Timeout | |||
When the loss detection timer expires, the timer's mode determines | When the loss detection timer expires, the timer's mode determines | |||
the action to be performed. | the action to be performed. | |||
Pseudocode for OnLossDetectionTimeout follows: | Pseudocode for OnLossDetectionTimeout follows: | |||
OnLossDetectionTimeout(): | OnLossDetectionTimeout(): | |||
loss_time, pn_space = GetEarliestLossTime() | earliest_loss_time, pn_space = | |||
if (loss_time != 0): | GetEarliestTimeAndSpace(loss_time) | |||
if (earliest_loss_time != 0): | ||||
// Time threshold loss Detection | // Time threshold loss Detection | |||
DetectLostPackets(pn_space) | DetectLostPackets(pn_space) | |||
SetLossDetectionTimer() | SetLossDetectionTimer() | |||
return | return | |||
if (endpoint is client without 1-RTT keys): | if (endpoint is client without 1-RTT keys): | |||
// Client sends an anti-deadlock packet: Initial is padded | // Client sends an anti-deadlock packet: Initial is padded | |||
// to earn more anti-amplification credit, | // to earn more anti-amplification credit, | |||
// a Handshake packet proves address ownership. | // a Handshake packet proves address ownership. | |||
if (has Handshake keys): | if (has Handshake keys): | |||
SendOneAckElicitingHandshakePacket() | SendOneAckElicitingHandshakePacket() | |||
else: | else: | |||
SendOneAckElicitingPaddedInitialPacket() | SendOneAckElicitingPaddedInitialPacket() | |||
else: | else: | |||
// PTO. Send new data if available, else retransmit old data. | // PTO. Send new data if available, else retransmit old data. | |||
// If neither is available, send a single PING frame. | // If neither is available, send a single PING frame. | |||
SendOneOrTwoAckElicitingPackets() | _, pn_space = GetEarliestTimeAndSpace( | |||
time_of_last_sent_ack_eliciting_packet) | ||||
SendOneOrTwoAckElicitingPackets(pn_space) | ||||
pto_count++ | pto_count++ | |||
SetLossDetectionTimer() | SetLossDetectionTimer() | |||
A.10. Detecting Lost Packets | A.10. Detecting Lost Packets | |||
DetectLostPackets is called every time an ACK is received and | DetectLostPackets is called every time an ACK is received and | |||
operates on the sent_packets for that packet number space. | operates on the sent_packets for that packet number space. | |||
Pseudocode for DetectLostPackets follows: | Pseudocode for DetectLostPackets follows: | |||
skipping to change at page 33, line 14 ¶ | skipping to change at page 33, line 14 ¶ | |||
InCongestionRecovery(sent_time): | InCongestionRecovery(sent_time): | |||
return sent_time <= congestion_recovery_start_time | return sent_time <= congestion_recovery_start_time | |||
OnPacketAckedCC(acked_packet): | OnPacketAckedCC(acked_packet): | |||
// Remove from bytes_in_flight. | // Remove from bytes_in_flight. | |||
bytes_in_flight -= acked_packet.size | bytes_in_flight -= acked_packet.size | |||
if (InCongestionRecovery(acked_packet.time_sent)): | if (InCongestionRecovery(acked_packet.time_sent)): | |||
// Do not increase congestion window in recovery period. | // Do not increase congestion window in recovery period. | |||
return | return | |||
if (IsAppLimited()): | if (IsAppOrFlowControlLimited()): | |||
// Do not increase congestion_window if application | // Do not increase congestion_window if application | |||
// limited. | // limited or flow control limited. | |||
return | return | |||
if (congestion_window < ssthresh): | if (congestion_window < ssthresh): | |||
// Slow start. | // Slow start. | |||
congestion_window += acked_packet.size | congestion_window += acked_packet.size | |||
else: | else: | |||
// Congestion avoidance. | // Congestion avoidance. | |||
congestion_window += max_datagram_size * acked_packet.size | congestion_window += max_datagram_size * acked_packet.size | |||
/ congestion_window | / congestion_window | |||
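The window growth rules in OnPacketAckedCC can be exercised with a short Python sketch. It rests on assumptions: the CongestionState class and the `limited` flag (standing in for IsAppOrFlowControlLimited()) are hypothetical, the 1200-byte default for max_datagram_size is illustrative, and integer division is used for the congestion avoidance increment, which the pseudocode does not specify.

```python
import math
from dataclasses import dataclass


@dataclass
class CongestionState:
    """Hypothetical holder for the congestion control variables."""
    congestion_window: int
    bytes_in_flight: int
    ssthresh: float = math.inf
    congestion_recovery_start_time: float = 0.0
    max_datagram_size: int = 1200  # assumed value for illustration


def on_packet_acked_cc(cc, acked_size, acked_time_sent, limited=False):
    # Acknowledged bytes always leave bytes_in_flight.
    cc.bytes_in_flight -= acked_size
    if acked_time_sent <= cc.congestion_recovery_start_time:
        # Packet was sent before recovery started: do not grow the window.
        return
    if limited:
        # Application limited or flow control limited: do not grow the window.
        return
    if cc.congestion_window < cc.ssthresh:
        # Slow start: grow by the number of bytes acknowledged.
        cc.congestion_window += acked_size
    else:
        # Congestion avoidance: roughly one datagram per window of acked data.
        cc.congestion_window += (
            cc.max_datagram_size * acked_size // cc.congestion_window
        )
```

For example, with a 12000-byte window above ssthresh, acknowledging one 1200-byte packet grows the window by 1200 * 1200 / 12000 = 120 bytes.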
B.6. On New Congestion Event | B.6. On New Congestion Event | |||
skipping to change at page 34, line 50 ¶ | skipping to change at page 35, line 5 ¶ | |||
o PTO MUST send data if possible (#3056, #3057) | o PTO MUST send data if possible (#3056, #3057) | |||
o Connection Close is not ack-eliciting (#3097, #3098) | o Connection Close is not ack-eliciting (#3097, #3098) | |||
o MUST limit bursts to the initial congestion window (#3160) | o MUST limit bursts to the initial congestion window (#3160) | |||
o Define the current max_datagram_size for congestion control | o Define the current max_datagram_size for congestion control | |||
(#3041, #3167) | (#3041, #3167) | |||
o Separate PTO by packet number space (#3067, #3074, #3066) | ||||
C.2. Since draft-ietf-quic-recovery-22 | C.2. Since draft-ietf-quic-recovery-22 | |||
o PTO should always send an ack-eliciting packet (#2895) | o PTO should always send an ack-eliciting packet (#2895) | |||
o Unify the Handshake Timer with the PTO timer (#2648, #2658, #2886) | o Unify the Handshake Timer with the PTO timer (#2648, #2658, #2886) | |||
o Move ACK generation text to transport draft (#1860, #2916) | o Move ACK generation text to transport draft (#1860, #2916) | |||
C.3. Since draft-ietf-quic-recovery-21 | C.3. Since draft-ietf-quic-recovery-21 | |||
End of changes. 59 change blocks. | ||||
132 lines changed or deleted | 161 lines changed or added | |||