Binary Representation of HTTP Messages

Mozilla, mt@lowentropy.net
Cloudflare, caw@heapingbits.net
ART
HTTP

This document defines a binary format for representing HTTP messages.

Discussion of this document takes place on the HTTP Working Group mailing list (ietf-http-wg@w3.org), which is archived at .

Introduction

This document defines a simple format for representing an HTTP message, either request or response. This allows for the encoding of HTTP messages that can be conveyed outside of an HTTP protocol. This enables the transformation of entire messages, including the application of authenticated encryption.

This format is informed by the framing structure of HTTP/2 and HTTP/3. In comparison, this format is simpler by virtue of not including either header compression or a generic framing layer.

This format provides an alternative to the message/http content type defined in . A binary format permits more efficient encoding and processing of messages. A binary format also reduces exposure to security problems related to processing of HTTP messages.

Two modes for encoding are described:

a known-length encoding includes length prefixes for all major message components; and
an indeterminate-length encoding enables efficient generation of messages where lengths are not known when encoding starts.

Conventions and Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 when, and only when, they appear in all capitals, as shown here.

This document uses terminology from HTTP and notation from QUIC.

Format

HTTP defines five distinct parts to HTTP messages. A framing indicator is added to signal how these parts are composed:

Framing indicator.
This format uses a single integer to describe framing, which describes whether the message is a request or response and how subsequent sections are formatted; see .
For a response, any number of interim responses, each consisting of an informational status code and header section.
Control data. For a request, this contains the request method and target. For a response, this contains the status code.
Header section. This contains zero or more header fields.
Content. This is a sequence of zero or more bytes.
Trailer section. This contains zero or more trailer fields.

All lengths and numeric values are encoded using the variable-length integer encoding from QUIC.

Known Length Messages

A message that has a known length at the time of construction uses the format shown in .

That is, a known-length message consists of a framing indicator, a block of control data that is formatted according to the value of the framing indicator, a header section with a length prefix, binary content with a length prefix, and a trailer section with a length prefix.

Response messages that contain informational status codes result in a different structure; see .

Fields in the header and trailer sections consist of a length-prefixed name and length-prefixed value. Both name and value are sequences of bytes; the name cannot be zero length.

The format allows for the message to be truncated before any of the length prefixes that precede the field sections or content. This reduces the overall message size.
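As an illustration of the length-prefixed layout described above, the following Python sketch encodes and decodes variable-length integers and builds a known-length field section. The function names (encode_varint, decode_varint, length_prefixed, encode_field_section) are hypothetical and exist only for this example. The sketch assumes the QUIC variable-length integer layout, in which the two most significant bits of the first byte select a 1-, 2-, 4-, or 8-byte encoding, and it does not validate field names or values.

```python
from typing import Iterable, Tuple


def encode_varint(value: int) -> bytes:
    """Encode an integer as a QUIC variable-length integer (1, 2, 4, or 8 bytes)."""
    if value < 0 or value >= 1 << 62:
        raise ValueError("value does not fit in a variable-length integer")
    if value < 1 << 6:
        return value.to_bytes(1, "big")
    if value < 1 << 14:
        return (value | 0x4000).to_bytes(2, "big")
    if value < 1 << 30:
        return (value | 0x8000_0000).to_bytes(4, "big")
    return (value | 0xC000_0000_0000_0000).to_bytes(8, "big")


def decode_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one integer starting at offset; return (value, next offset)."""
    if offset >= len(data):
        raise ValueError("truncated integer")
    first = data[offset]
    length = 1 << (first >> 6)  # top two bits select 1, 2, 4, or 8 bytes
    if offset + length > len(data):
        raise ValueError("truncated integer")
    value = first & 0x3F
    for byte in data[offset + 1:offset + length]:
        value = (value << 8) | byte
    return value, offset + length


def length_prefixed(data: bytes) -> bytes:
    """Prefix a byte sequence with its length."""
    return encode_varint(len(data)) + data


def encode_field_section(fields: Iterable[Tuple[bytes, bytes]]) -> bytes:
    """Encode a known-length field section: a length prefix, then field lines."""
    body = b""
    for name, value in fields:
        if not name:
            raise ValueError("field names cannot be zero length")
        body += length_prefixed(name) + length_prefixed(value)
    return length_prefixed(body)
```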
A message that is truncated at any other point is invalid; see .

The variable-length integer encoding means that there is a limit of 2^62-1 bytes for each field section and the message content.

Indeterminate Length Messages

A message that is constructed without encoding a known length for each section uses the format shown in .

That is, an indeterminate-length message consists of a framing indicator, a block of control data that is formatted according to the value of the framing indicator, a header section that is terminated by a zero value, any number of non-zero-length chunks of binary content, a zero value, and a trailer section that is terminated by a zero value.

Response messages that contain informational status codes result in a different structure; see .

Indeterminate-length messages can be truncated in a similar way as known-length messages. Truncation occurs after the control data, or after the Content Terminator field that ends a field section or sequence of content chunks. A message that is truncated at any other point is invalid; see .

Indeterminate-length messages use the same encoding for field lines as known-length messages; see .

Framing Indicator

The start of each message is a framing indicator that is a single integer that describes the structure of the subsequent sections. The framing indicator can take just four values:

A value of 0 describes a request of known length.
A value of 1 describes a response of known length.
A value of 2 describes a request of indeterminate length.
A value of 3 describes a response of indeterminate length.

Other values cause the message to be invalid; see .

Request Control Data

The control data for a request message includes four values that correspond to the values of the :method, :scheme, :authority, and :path pseudo-header fields described in HTTP/2. These fields are encoded, each with a length prefix, in the order listed.

The rules in for constructing pseudo-header fields apply to the construction of these values.
However, where the :authority pseudo-header field might be omitted in HTTP/2, a zero-length value is encoded instead.

The format of request control data is shown in .

Response Control Data

The control data for a response message includes a single field that corresponds to the :status pseudo-header field in HTTP/2; see . This field is encoded as a single variable-length integer, not a decimal string.

The format of final response control data is shown in .

Informational Status Codes

Responses that include informational status codes (see ) are encoded by repeating the response control data and associated header section until the final status code is encoded.

The format of the informational response control data is shown in .

A response message can include any number of informational responses. If the response control data includes an informational status code (that is, a value between 100 and 199 inclusive), the control data is followed by a header section (encoded with known or indeterminate length according to the framing indicator). After the header section, another response control data block follows.

Header and Trailer Field Lines

Header and trailer sections consist of zero or more field lines; see . The format of a field section depends on whether the message is known-length or indeterminate-length.

Each field line includes a name and a value. Both the name and value are sequences of bytes, with the name including at least one byte. The format of a field line is shown in .

For field names, byte values that are not permitted in an HTTP field name cause the message to be invalid; see for a definition of what is valid and for handling of invalid messages.

Field names and values MUST be constructed and validated according to the rules of .
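The control data and informational response rules above can be sketched in Python for the known-length case. This is a minimal illustration, not a complete implementation: the function names are hypothetical, field names and values are not validated, and the caller is assumed to supply control data values that already follow the pseudo-header construction rules.

```python
from typing import Iterable, Sequence, Tuple

Field = Tuple[bytes, bytes]


def encode_varint(value: int) -> bytes:
    """Encode an integer as a QUIC variable-length integer."""
    for bits, size, flag in ((6, 1, 0), (14, 2, 1), (30, 4, 2), (62, 8, 3)):
        if 0 <= value < 1 << bits:
            return (value | (flag << (8 * size - 2))).to_bytes(size, "big")
    raise ValueError("value does not fit in a variable-length integer")


def length_prefixed(data: bytes) -> bytes:
    return encode_varint(len(data)) + data


def encode_field_section(fields: Iterable[Field]) -> bytes:
    """Known-length field section: total length, then name/value pairs."""
    body = b"".join(length_prefixed(n) + length_prefixed(v) for n, v in fields)
    return length_prefixed(body)


def encode_known_length_request(method: bytes, scheme: bytes, authority: bytes,
                                path: bytes, headers: Iterable[Field],
                                content: bytes = b"",
                                trailers: Iterable[Field] = ()) -> bytes:
    out = encode_varint(0)  # framing indicator 0: known-length request
    # Control data in the order listed; an omitted authority is passed as b"".
    for part in (method, scheme, authority, path):
        out += length_prefixed(part)
    out += encode_field_section(headers)
    out += length_prefixed(content)
    out += encode_field_section(trailers)
    return out


def encode_known_length_response(informational: Sequence[Tuple[int, Iterable[Field]]],
                                 status: int, headers: Iterable[Field],
                                 content: bytes = b"",
                                 trailers: Iterable[Field] = ()) -> bytes:
    out = encode_varint(1)  # framing indicator 1: known-length response
    # Each informational response is a status code followed by a header section.
    for code, fields in informational:
        if not 100 <= code <= 199:
            raise ValueError("informational status codes are 100 to 199")
        out += encode_varint(code) + encode_field_section(fields)
    if 100 <= status <= 199:
        raise ValueError("the final status code cannot be informational")
    out += encode_varint(status)  # an integer, not a decimal string
    out += encode_field_section(headers)
    out += length_prefixed(content)
    out += encode_field_section(trailers)
    return out
```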
A recipient MUST treat a message that contains field values that would cause an HTTP/2 message to be malformed as invalid; see .

The same field name can be repeated in multiple field lines; see for the semantics of repeated field names and rules for combining values.

Like HTTP/2, this format has an exception for the combination of multiple instances of the Cookie field. Instances of fields whose name is the ASCII-encoded string cookie are combined using a semicolon octet (0x3b) rather than a comma; see .

This format provides fixed locations for content that would be carried in HTTP/2 pseudo-fields. Therefore, there is no need to include field lines containing a name of :method, :scheme, :authority, :path, or :status. Fields that contain one of these names cause the message to be invalid; see . Pseudo-fields that are defined by protocol extensions MAY be included. Field lines containing pseudo-fields MUST precede other field lines. A message that contains a pseudo-field after any other field is invalid; see .

Content

The content of messages is a sequence of bytes of any length. Though a known-length message has a limit, this limit is large enough that it is unlikely to be a practical limitation. There is no limit to the size of content in an indeterminate-length message.

Omitting content by truncating a message is only possible if the content is zero-length.

Invalid Messages

This document describes a number of ways that a message can be invalid. Invalid messages MUST NOT be processed except to log an error and produce an error response.

The format is designed to allow incremental processing.
Implementations need to be aware of the possibility that an error might be detected after performing incremental processing.

Examples

This section includes example requests and responses encoded in both known-length and indeterminate-length forms.

Request Example

The example HTTP/1.1 message in shows the content of a message/http.

Valid HTTP/1.1 messages require lines terminated with CRLF (the two bytes 0x0d and 0x0a). For simplicity and consistency, the content of these examples is limited to text, which also uses CRLF for line endings.

This can be expressed as a binary message (type message/bhttp) using a known-length encoding as shown in hexadecimal in . That view includes some of the text alongside to show that most of the content is not modified.

This example shows that the Host header field is not replicated in the :authority field, as is required for ensuring that the request is reproduced accurately; see .

The same message can be truncated with no effect on interpretation. In this case, the last two bytes (corresponding to content and a trailer section) can each be removed without altering the semantics of the message.

The same message, encoded using an indeterminate-length encoding, is shown in . As the content of this message is empty, the difference in formats is negligible.

This indeterminate-length encoding can be truncated by two bytes in the same way.

Response Example

Response messages can contain interim (1xx) status codes as the message in shows. includes examples of informational status codes defined in and .

As this is a longer example, only the indeterminate-length encoding is shown in . Note here that the specific text used in the reason phrase is not retained by this encoding.

A response that uses the chunked encoding (see ) as shown for can be encoded using indeterminate-length encoding, which minimizes buffering needed to translate into the binary format.
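To illustrate why the indeterminate-length form needs little buffering, the following Python sketch emits a response piece by piece as content chunks arrive. The generator name encode_indeterminate_response is hypothetical. The sketch assumes the caller has already decoded any chunked transfer coding and removed the transfer-encoding field, and it skips empty chunks because a zero value marks the end of the content.

```python
from typing import Iterable, Iterator, Tuple

Field = Tuple[bytes, bytes]


def encode_varint(value: int) -> bytes:
    """Encode an integer as a QUIC variable-length integer."""
    for bits, size, flag in ((6, 1, 0), (14, 2, 1), (30, 4, 2), (62, 8, 3)):
        if 0 <= value < 1 << bits:
            return (value | (flag << (8 * size - 2))).to_bytes(size, "big")
    raise ValueError("value does not fit in a variable-length integer")


def encode_field_line(name: bytes, value: bytes) -> bytes:
    if not name:
        raise ValueError("field names cannot be zero length")
    return encode_varint(len(name)) + name + encode_varint(len(value)) + value


def encode_indeterminate_response(status: int,
                                  headers: Iterable[Field],
                                  chunks: Iterable[bytes],
                                  trailers: Iterable[Field] = ()) -> Iterator[bytes]:
    """Yield the encoded message in pieces, without knowing lengths up front."""
    yield encode_varint(3)       # framing indicator 3: indeterminate-length response
    yield encode_varint(status)  # final response control data
    for name, value in headers:
        yield encode_field_line(name, value)
    yield b"\x00"                # zero value terminates the header section
    for chunk in chunks:
        if chunk:                # only non-zero-length chunks are encoded
            yield encode_varint(len(chunk)) + chunk
    yield b"\x00"                # zero value terminates the content
    for name, value in trailers:
        yield encode_field_line(name, value)
    yield b"\x00"                # zero value terminates the trailer section


# Usage: chunk boundaries from the source need not match those written here.
encoded = b"".join(encode_indeterminate_response(
    200, [(b"content-type", b"text/plain")], [b"hello", b"", b" world"]))
```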
Chunk boundaries do not need to be retained, however, and any chunk extensions cannot be conveyed using the binary format. The known-length encoding of this message is shown in . Note that the transfer-encoding header field is removed.

"message/bhttp" Media Type

The message/bhttp media type can be used to enclose a single HTTP request or response message, provided that it obeys the MIME restrictions for all "message" types regarding line length and encodings.
Type name:
message
Subtype name:
bhttp
Required parameters:
N/A
Optional parameters:
None
Encoding considerations:
only "8bit" or "binary" is permitted
Security considerations:
see
Interoperability considerations:
N/A
Published specification:
this specification
Applications that use this media type:
N/A
Fragment identifier considerations:
N/A
Additional information:
Magic number(s):
N/A
Deprecated alias names for this type:
N/A
File extension(s):
N/A
Macintosh file type code(s):
N/A
Person and email address to contact for further information:
see Authors' Addresses section
Intended usage:
COMMON
Restrictions on usage:
N/A
Author:
see Authors' Addresses section
Change controller:
IESG
Security Considerations

Many of the considerations that apply to HTTP message handling apply to this format; see and for common issues in handling HTTP messages.

Strict parsing of the format with no tolerance for errors can help avoid a number of attacks. However, implementations still need to be aware of the possibility of resource exhaustion attacks that might arise from receiving large messages, particularly those with large numbers of fields.

The format is designed to allow for minimal state when translating for use with HTTP proper. However, producing a combined value for fields, which might be necessary for the Cookie field when translating this format (like HTTP/1.1), can require the commitment of resources. Implementations need to ensure that they aren't subject to resource exhaustion attacks from a maliciously crafted message.

IANA Considerations

Please update the "Media Types" registry at https://www.iana.org/assignments/media-types with the registration information in for the media type "message/bhttp".

References

HTTP Semantics. Adobe, Fastly, greenbytes GmbH.
HTTP/1.1. Adobe, Fastly, greenbytes GmbH.
Hypertext Transfer Protocol Version 2 (HTTP/2). Mozilla, Apple Inc.
QUIC: A UDP-Based Multiplexed and Secure Transport.
Key words for use in RFCs to Indicate Requirement Levels.
Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words.
Hypertext Transfer Protocol Version 3 (HTTP/3). Akamai.
HPACK: Header Compression for HTTP/2.
QPACK: Header Compression for HTTP/3. Netflix, Akamai Technologies, Facebook.
HTTP Extensions for Distributed Authoring -- WEBDAV.
An HTTP Status Code for Indicating Hints.

Acknowledgments

TODO: credit where credit is due.