Binary Representation of HTTP Messages
Mozilla
mt@lowentropy.net
Cloudflare
caw@heapingbits.net
art
httpbis
This document defines a binary format for representing HTTP messages.
Status of This Memo
This is an Internet Standards Track document.
This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by
the Internet Engineering Steering Group (IESG). Further
information on Internet Standards is available in Section 2 of
RFC 7841.
Information about the current status of this document, any
errata, and how to provide feedback on it may be obtained at
.
Copyright Notice
Copyright (c) 2022 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
() in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Revised BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the Revised BSD License.
Table of Contents
Introduction
Conventions and Definitions
Format
Known-Length Messages
Indeterminate-Length Messages
Framing Indicator
Request Control Data
Response Control Data
Informational Status Codes
Header and Trailer Field Lines
Content
Padding and Truncation
Invalid Messages
Examples
Request Example
Response Example
Notable Differences with HTTP Protocol Messages
"message/bhttp" Media Type
Security Considerations
IANA Considerations
References
Normative References
Informative References
Acknowledgments
Authors' Addresses
Introduction
This document defines a simple format for representing an HTTP message,
either request or response. This allows for the encoding of HTTP
messages that can be conveyed outside an HTTP protocol. This enables the
transformation of entire messages, including the application of authenticated
encryption.
The design of this format is informed by the framing structure of HTTP/2
and HTTP/3. Rules for constructing messages rely on the rules
defined in HTTP/2, but the format itself is distinct; see .
This format defines "message/bhttp", a binary alternative to the "message/http"
content type defined in . A binary format permits more efficient
encoding and processing of messages. A binary format also reduces exposure to
security problems related to processing of HTTP messages.
Two modes for encoding are described:
a known-length encoding includes length prefixes for all major message
components, and
an indeterminate-length encoding enables efficient generation of messages
where lengths are not known when encoding starts.
This format is designed to convey the semantics of valid HTTP messages as simply
and efficiently as possible. It is not designed to capture all of the details
of the encoding of messages from specific HTTP versions. As such, this format is
unlikely to be suitable for applications that
depend on an exact recording of the encoding of messages.
Conventions and Definitions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14 when, and only
when, they appear in all capitals, as shown here.
This document uses terminology from HTTP and notation from QUIC ().
Format
HTTP defines the general structure of HTTP messages and
composes those messages into distinct parts. This format describes how those
parts are composed into a sequence of bytes. At a high level, binary messages
consist of:
Framing indicator. This format uses a single integer to describe framing, which describes
whether the message is a request or response and how subsequent sections are
formatted; see .
For a response, zero or more informational responses. Each informational
response consists of an informational status code and header section.
Control data. For a request, this contains the request method and target.
For a response, this contains the status code.
Header section. This contains zero or more header fields.
Content. This is a sequence of zero or more bytes.
Trailer section. This contains zero or more trailer fields.
Optional padding. Any amount of zero-valued bytes.
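As a rough illustration of how the parts listed above fit together, the following Python sketch serializes a known-length request. It draws on the known-length layout, request control data, and field line encodings described later in this document, and it assumes the QUIC variable-length integer layout (a two-bit size prefix followed by a big-endian value). The function names are hypothetical and are not defined by this document, and the sketch does not validate field names or values.

```python
from typing import Iterable, Tuple

Field = Tuple[bytes, bytes]


def encode_varint(value: int) -> bytes:
    """Encode an integer as a QUIC-style variable-length integer (assumed layout)."""
    if value < 0 or value >= 1 << 62:
        raise ValueError("value out of range for a variable-length integer")
    # The two most significant bits of the first byte select 1, 2, 4, or 8 bytes.
    for prefix, size in enumerate((1, 2, 4, 8)):
        if value < 1 << (8 * size - 2):
            return (value | (prefix << (8 * size - 2))).to_bytes(size, "big")
    raise AssertionError("unreachable")


def length_prefixed(data: bytes) -> bytes:
    """Prefix a byte string with its length."""
    return encode_varint(len(data)) + data


def encode_field_section(fields: Iterable[Field]) -> bytes:
    """Concatenate field lines: length-prefixed name, then length-prefixed value."""
    out = bytearray()
    for name, value in fields:
        if not name:
            raise ValueError("a field name is at least one byte long")
        out += length_prefixed(name) + length_prefixed(value)
    return bytes(out)


def encode_known_length_request(
    method: bytes,
    scheme: bytes,
    authority: bytes,
    path: bytes,
    headers: Iterable[Field] = (),
    content: bytes = b"",
    trailers: Iterable[Field] = (),
    padding: int = 0,
) -> bytes:
    """Hypothetical helper: serialize the message parts in the order listed above."""
    out = bytearray()
    out += encode_varint(0)  # framing indicator: request of known length
    for part in (method, scheme, authority, path):  # request control data
        out += length_prefixed(part)
    out += length_prefixed(encode_field_section(headers))   # header section
    out += length_prefixed(content)                          # content
    out += length_prefixed(encode_field_section(trailers))  # trailer section
    out += bytes(padding)                                    # zero-valued padding
    return bytes(out)
```

Calling `encode_known_length_request(b"GET", b"https", b"", b"/hello.txt")` produces a message whose last two bytes are the zero lengths of the empty content and empty trailer section.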
All lengths and numeric values are encoded using the variable-length integer
encoding from . Integer values do not need to be encoded
using the minimum number of bytes necessary.
Known-Length Messages
A request or response that has a known length at the time of construction uses
the format shown in .
A known-length request consists of a framing indicator (), request
control data (), a header section with a length prefix,
binary content with a length prefix, a trailer section with a length prefix, and
padding.
A known-length response contains the same fields, with the exception that
request control data is replaced by zero or more informational responses
() followed by response control data ().
For a known-length encoding, the length prefix on field sections and content is
a variable-length encoding of an integer. This integer is the number of bytes
in the field section or content, not including the length field itself.
Fields in the header and trailer sections consist of a length-prefixed name and
length-prefixed value; see .
The format allows for the message to be truncated before any of the length
prefixes that precede the field sections or content; see .
The variable-length integer encoding means that there is a limit of 2^62-1
bytes for each field section and the message content.
Indeterminate-Length Messages
A request or response that is constructed without encoding a known length for
each section uses the format shown in :
An indeterminate-length request consists of a framing indicator (),
request control data (), a header section that is terminated
by a zero value, any number of non-zero-length chunks of binary content, a zero
value, a trailer section that is terminated by a zero value, and padding.
An indeterminate-length response contains the same fields, with the exception
that request control data is replaced by zero or more informational responses
() and response control data ().
The indeterminate-length encoding only uses length prefixes for content blocks.
Multiple length-prefixed portions of content can be included, each prefixed by a
non-zero Chunk Length integer describing the number of bytes in the block. The
Chunk Length is encoded as a variable-length integer.
Each Field Line in an Indeterminate-Length Field Section starts with a Name
Length field. An Indeterminate-Length Field Section ends with a Content
Terminator field. The zero value of the Content Terminator distinguishes it
from the Name Length field, which cannot contain a value of 0.
Indeterminate-length messages can be truncated in a way similar to that for
known-length messages; see .
Indeterminate-length messages use the same encoding for Field Line as
known-length messages; see .
Framing Indicator
The start of each binary message is a framing indicator that is a single integer that
describes the structure of the subsequent sections. The framing indicator can
take just four values:
A value of 0 describes a request of known length.
A value of 1 describes a response of known length.
A value of 2 describes a request of indeterminate length.
A value of 3 describes a response of indeterminate length.
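The four values above can be decoded with a small lookup. The following Python sketch is illustrative only: it assumes the QUIC variable-length integer layout (the two most significant bits of the first byte give the encoded size), and the names `Framing`, `decode_varint`, and `read_framing_indicator` are hypothetical rather than part of this format.

```python
from typing import NamedTuple, Tuple


class Framing(NamedTuple):
    is_response: bool
    indeterminate_length: bool


# The four permitted framing indicator values, as listed above.
FRAMING = {
    0: Framing(is_response=False, indeterminate_length=False),
    1: Framing(is_response=True, indeterminate_length=False),
    2: Framing(is_response=False, indeterminate_length=True),
    3: Framing(is_response=True, indeterminate_length=True),
}


def decode_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (value, next_offset) for the variable-length integer at offset."""
    if offset >= len(data):
        raise ValueError("truncated integer")
    size = 1 << (data[offset] >> 6)  # 1, 2, 4, or 8 bytes
    end = offset + size
    if end > len(data):
        raise ValueError("truncated integer")
    # Mask off the two size bits; the remainder is the big-endian value.
    value = int.from_bytes(data[offset:end], "big") & ((1 << (8 * size - 2)) - 1)
    return value, end


def read_framing_indicator(message: bytes) -> Tuple[Framing, int]:
    """Decode the framing indicator and return it with the offset that follows."""
    value, offset = decode_varint(message)
    if value not in FRAMING:
        raise ValueError(f"invalid framing indicator: {value}")
    return FRAMING[value], offset
```

Because integers do not need to use the shortest encoding, this sketch accepts a framing indicator that occupies more than one byte.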
Other values cause the message to be invalid; see .
Request Control Data
The control data for a request message contains the method and request target.
That information is encoded as an ordered sequence of fields: Method, Scheme,
Authority, Path. Each of these fields is prefixed with a length.
The values of these fields follow the rules in HTTP/2 ()
that apply to the ":method", ":scheme", ":authority", and ":path" pseudo-header
fields, respectively. However, where the ":authority" pseudo-header field might
be omitted in HTTP/2, a zero-length value is encoded instead.
The format of request control data is shown in .
Response Control Data
The control data for a response message consists of the status code. The status
code () is encoded as a variable-length integer, not a
length-prefixed decimal string.
The format of final response control data is shown in .
Informational Status Codes
Responses that include informational status codes (see )
are encoded by repeating the response control data and associated header section
until a final status code is encoded; that is, a Status Code field with a value
from 200 to 599 (inclusive). The status code distinguishes
between informational and final responses.
The format of the informational response control data is shown in .
A response message can include any number of informational responses that
precede a final status code. These convey an informational status code and a
header block.
If the response control data includes an informational status code (that is, a
value between 100 and 199 inclusive), the control data is followed by a header
section (encoded with known length or indeterminate length according to the framing
indicator) and another block of control data. This pattern repeats until the
control data contains a final status code (200 to 599 inclusive).
Header and Trailer Field Lines
Header and trailer sections consist of zero or more field lines; see . The
format of a field section depends on whether the message is of
known length or indeterminate length.
Each Field Line encoding includes a name and a value. Both the name and value are
length-prefixed sequences of bytes. The Name field is at least one
byte long. The format of a Field Line is shown in .
For field names, byte values that are not permitted in an HTTP field name cause
the message to be invalid; see for a definition of what
is valid and regarding the handling of invalid messages. A recipient MUST
treat a message that contains field values that would cause an HTTP/2 message to
be malformed according to as invalid; see .
The same field name can be repeated over more than one field line; see for
the semantics of repeated field names and rules for combining values.
Messages are invalid () if they contain fields named ":method",
":scheme", ":authority", ":path", or ":status". Other pseudo-fields that are
defined by protocol extensions MAY be included; pseudo-fields cannot be included
in trailers (see ). A Field Line containing pseudo-fields
MUST precede other Field Line values. A message that contains a pseudo-field after
any other field is invalid; see .
Fields that relate to connections () cannot be used to
produce the effect on a connection in this context. These fields SHOULD be
removed when constructing a binary message. However, they do not cause a
message to be invalid (); permitting these fields allows a binary
message to capture messages that are exchanged in a protocol context.
Like HTTP/2 or HTTP/3, this format has an exception for the combination of
multiple instances of the Cookie field. Instances of fields whose name is the
ASCII-encoded value "cookie" are combined using a semicolon octet (0x3b)
rather than a comma; see .
Content
The content of messages is a sequence of bytes of any length. Though a
known-length message has a limit, this limit is large enough that it is
unlikely to be a practical limitation. There is no limit to the size of content
in an indeterminate-length message.
Padding and Truncation
Messages can be padded with any number of zero-valued bytes. Non-zero padding
bytes cause a message to be invalid (see ). Unlike other parts of a
message, a processor MAY decide not to validate the value of padding bytes.
Truncation can be used to reduce the size of messages that have no data in
trailing field sections or content. If the trailers of a message are empty, they
MAY be omitted by the encoder in place of adding a length field equal to
zero. An encoder MAY omit empty content in the same way if the trailers are also
empty. A message that is truncated at any other point is invalid; see .
Decoders MUST treat missing truncated fields as equivalent to having been sent
with the length field set to zero.
Padding is compatible with truncation of empty parts of the messages.
Zero-valued bytes will be interpreted as a zero-length part, which is semantically
equivalent to the part being absent.
Invalid Messages
This document describes a number of ways that a message can be invalid. Invalid
messages MUST NOT be processed further except to log an error and produce an
error response.
The format is designed to allow incremental processing. Implementations need to
be aware of the possibility that an error might be detected after performing
incremental processing.
Examples
This section includes example requests and responses encoded in both
known-length and indeterminate-length forms.
Request Example
The example HTTP/1.1 message in shows the content in the
"message/http" format.
Valid HTTP/1.1 messages require lines terminated with CRLF (the two bytes 0x0d
and 0x0a). For simplicity and consistency, the content of these examples is
limited to text, which also uses CRLF for line endings.
This can be expressed as a binary message (type "message/bhttp") using a
known-length encoding as shown in hexadecimal in .
includes text alongside to show that most of the content is
not modified.
This example shows that the Host header field is not replicated in the
":authority" field, as is required for ensuring that the request is reproduced
accurately; see .
The same message can be truncated with no effect on interpretation. In this
case, the last two bytes -- corresponding to content and a trailer section -- can
each be removed without altering the semantics of the message.
The same message, encoded using an indeterminate-length encoding, is shown in
. As the content of this message is empty, the difference in
formats is negligible.
This indeterminate-length encoding contains 10 bytes of padding. As two additional
bytes can be truncated in the same way as the known-length example, anything up
to 12 bytes can be removed from this message without affecting its meaning.
Response Example
Response messages can contain interim (1xx) status codes, as the message in
shows. includes examples of informational
status codes 102 and 103, as defined in (now obsolete but defines status code 102)
and , respectively.
As this is a longer example, only the indeterminate-length encoding is shown in
. Note here that the specific text used in the reason
phrase is not retained by this encoding.
A response that uses the chunked encoding (see ) as
shown in can be encoded using indeterminate-length encoding, which
minimizes buffering needed to translate into the binary format. However, chunk
boundaries do not need to be retained, and any chunk extensions cannot be
conveyed using the binary format; see . shows this message using the
known-length encoding. Note that
the Transfer-Encoding header field is removed.
Notable Differences with HTTP Protocol Messages
This format is designed to carry HTTP semantics just like HTTP/1.1, HTTP/2, or
HTTP/3. However, there are some notable
differences between this format and the format used in an interactive protocol
version.
In particular, as a standalone representation, this format lacks the following
features of the formats used in those protocols:
chunk extensions () and transfer encoding
()
generic framing and extensibility capabilities
field blocks other than a single header and trailer field block
carrying reason phrases in responses ()
header compression
response framing that depends on the corresponding request (such as HEAD)
or the value of the status code (such as 204 or 304); these responses use the
same framing as all other messages
Some of these features are also absent in HTTP/2 and HTTP/3.
Unlike HTTP/2 and HTTP/3, this format uses a fixed format for control data
rather than using pseudo-fields.
Note that while some messages -- CONNECT or upgrade requests in particular -- can
be represented using this format, doing so serves no purpose, as these requests
are used to affect protocol behavior, which this format cannot do without
additional mechanisms.
"message/bhttp" Media Type
The "message/bhttp" media type can be used to enclose a single HTTP request or
response message, provided that it obeys the MIME restrictions for all
"message" types regarding line length and encodings.
Type name:
message
Subtype name:
bhttp
Required parameters:
N/A
Optional parameters:
N/A
Encoding considerations:
Only "8bit" or "binary" is permitted.
Security considerations:
See .
Interoperability considerations:
N/A
Published specification:
RFC 9292
Applications that use this media type:
Applications seeking to convey HTTP semantics that are independent of a
specific protocol.
Fragment identifier considerations:
N/A
Additional information:
Deprecated alias names for this type:
N/A
Magic number(s):
N/A
File extension(s):
N/A
Macintosh file type code(s):
N/A
Person & email address to contact for further information:
See the Authors' Addresses section.
Intended usage:
COMMON
Restrictions on usage:
N/A
Author:
See the Authors' Addresses section.
Change controller:
IESG
Security Considerations
Many of the considerations that apply to HTTP message handling apply to this
format; see and for common
issues in handling HTTP messages.
Strict parsing of the format with no tolerance for errors can help avoid a
number of attacks. However, implementations still need to be aware of the
possibility of resource exhaustion attacks that might arise from receiving
large messages, particularly those with large numbers of fields.
Implementations need to ensure that they aren't subject to resource exhaustion
attacks from maliciously crafted messages. Overall, the format is designed to
allow for minimal state when processing messages. However, producing a combined
field value () for fields might require the commitment of
resources. In particular, combining might be necessary for the Cookie field
when translating this format for use in other contexts, such as use in an API or
translation to HTTP/1.1, where the recipient of the field might
not expect multiple values.
IANA Considerations
IANA has added the media type "message/bhttp" to the "Media Types" registry at
. See for registration
information.
References
Normative References
HTTP Semantics
The Hypertext Transfer Protocol (HTTP) is a stateless application-level protocol for distributed, collaborative, hypertext information systems. This document describes the overall architecture of HTTP, establishes common terminology, and defines aspects of the protocol that are shared by all versions. In this definition are core protocol elements, extensibility mechanisms, and the "http" and "https" Uniform Resource Identifier (URI) schemes. This document updates RFC 3864 and obsoletes RFCs 2818, 7231, 7232, 7233, 7235, 7538, 7615, 7694, and portions of 7230.
HTTP/2
This specification describes an optimized expression of the semantics of the Hypertext Transfer Protocol (HTTP), referred to as HTTP version 2 (HTTP/2). HTTP/2 enables a more efficient use of network resources and a reduced latency by introducing field compression and allowing multiple concurrent exchanges on the same connection. This document obsoletes RFCs 7540 and 8740.
QUIC: A UDP-Based Multiplexed and Secure Transport
This document defines the core of the QUIC transport protocol. QUIC provides applications with flow-controlled streams for structured communication, low-latency connection establishment, and network path migration. QUIC includes security measures that ensure confidentiality, integrity, and availability in a range of deployment circumstances. Accompanying documents describe the integration of TLS for key negotiation, loss detection, and an exemplary congestion control algorithm.
Key words for use in RFCs to Indicate Requirement Levels
In many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.
Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words
RFC 2119 specifies common key words that may be used in protocol specifications. This document aims to reduce the ambiguity by clarifying that only UPPERCASE usage of the key words have the defined special meanings.
Informative References
HPACK: Header Compression for HTTP/2
This specification defines HPACK, a compression format for efficiently representing HTTP header fields, to be used in HTTP/2.
HTTP/1.1
The Hypertext Transfer Protocol (HTTP) is a stateless application-level protocol for distributed, collaborative, hypertext information systems. This document specifies the HTTP/1.1 message syntax, message parsing, connection management, and related security concerns. This document obsoletes portions of RFC 7230.
HTTP/3
The QUIC transport protocol has several features that are desirable in a transport for HTTP, such as stream multiplexing, per-stream flow control, and low-latency connection establishment. This document describes a mapping of HTTP semantics over QUIC. This document also identifies HTTP/2 features that are subsumed by QUIC and describes how HTTP/2 extensions can be ported to HTTP/3.
QPACK: Field Compression for HTTP/3
This specification defines QPACK: a compression format for efficiently representing HTTP fields that is to be used in HTTP/3. This is a variation of HPACK compression that seeks to reduce head-of-line blocking.
HTTP Extensions for Distributed Authoring -- WEBDAV
This document specifies a set of methods, headers, and content-types ancillary to HTTP/1.1 for the management of resource properties, creation and management of resource collections, namespace manipulation, and resource locking (collision avoidance). [STANDARDS-TRACK]
An HTTP Status Code for Indicating Hints
This memo introduces an informational HTTP status code that can be used to convey hints that help a client make preparations for processing the final response.
Acknowledgments
, , , and provided
excellent feedback on both the design and its documentation.
Authors' Addresses
Mozilla
mt@lowentropy.net
Cloudflare
caw@heapingbits.net