rfc5987.txt | draft-reschke-rfc5987bis-latest.txt | |||
---|---|---|---|---|
Internet Engineering Task Force (IETF) J. Reschke | Network Working Group J. Reschke | |||
Request for Comments: 5987 greenbytes | Internet-Draft greenbytes | |||
Category: Standards Track August 2010 | Obsoletes: 5987 (if approved) July 6, 2024 | |||
ISSN: 2070-1721 | Intended status: Standards Track | |||
Expires: January 7, 2025 | ||||
Character Set and Language Encoding for | Indicating Character Encoding and Language for HTTP Header Field | |||
Hypertext Transfer Protocol (HTTP) Header Field Parameters | Parameters | |||
draft-reschke-rfc5987bis-latest | ||||
Abstract | Abstract | |||
By default, message header field parameters in Hypertext Transfer | By default, message header field parameters in Hypertext Transfer | |||
Protocol (HTTP) messages cannot carry characters outside the ISO- | Protocol (HTTP) messages cannot carry characters outside the ISO- | |||
8859-1 character set. RFC 2231 defines an encoding mechanism for use | 8859-1 character set. RFC 2231 defines an encoding mechanism for use | |||
in Multipurpose Internet Mail Extensions (MIME) headers. This | in Multipurpose Internet Mail Extensions (MIME) headers. This | |||
document specifies an encoding suitable for use in HTTP header fields | document specifies an encoding suitable for use in HTTP header fields | |||
that is compatible with a profile of the encoding defined in RFC | that is compatible with a profile of the encoding defined in RFC | |||
2231. | 2231. | |||
Editorial Note (To be removed by RFC Editor before publication) | ||||
Distribution of this document is unlimited. Although this is not a | ||||
work item of the HTTPbis Working Group, comments should be sent to | ||||
the Hypertext Transfer Protocol (HTTP) mailing list at ietf-http- | ||||
wg@w3.org [1], which may be joined by sending a message with subject | ||||
"subscribe" to ietf-http-wg-request@w3.org [2]. | ||||
Discussions of the HTTPbis Working Group are archived at | ||||
<http://lists.w3.org/Archives/Public/ietf-http-wg/>. | ||||
XML versions, latest edits, diffs, and the issues list for this | ||||
document are available from <http://greenbytes.de/tech/webdav/#draft- | ||||
reschke-rfc5987bis>. A collection of test cases is available at | ||||
<http://greenbytes.de/tech/tc2231/>. | ||||
Status of This Memo | Status of This Memo | |||
This is an Internet Standards Track document. | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. | ||||
This document is a product of the Internet Engineering Task Force | Internet-Drafts are working documents of the Internet Engineering | |||
(IETF). It represents the consensus of the IETF community. It has | Task Force (IETF). Note that other groups may also distribute | |||
received public review and has been approved for publication by the | working documents as Internet-Drafts. The list of current Internet- | |||
Internet Engineering Steering Group (IESG). Further information on | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
Internet Standards is available in Section 2 of RFC 5741. | ||||
Information about the current status of this document, any errata, | Internet-Drafts are draft documents valid for a maximum of six months | |||
and how to provide feedback on it may be obtained at | and may be updated, replaced, or obsoleted by other documents at any | |||
http://www.rfc-editor.org/info/rfc5987. | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | ||||
This Internet-Draft will expire on January 7, 2025. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2010 IETF Trust and the persons identified as the | Copyright (c) 2024 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
2. Notational Conventions . . . . . . . . . . . . . . . . . . . . 3 | 2. Notational Conventions . . . . . . . . . . . . . . . . . . . 3 | |||
3. Comparison to RFC 2231 and Definition of the Encoding . . . . 3 | 3. Comparison to RFC 2231 and Definition of the Encoding . . . . 4 | |||
3.1. Parameter Continuations . . . . . . . . . . . . . . . . . 4 | 3.1. Parameter Continuations . . . . . . . . . . . . . . . . . 4 | |||
3.2. Parameter Value Character Set and Language Information . . 4 | 3.2. Parameter Value Character Encoding and Language | |||
3.2.1. Definition . . . . . . . . . . . . . . . . . . . . . . 4 | Information . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
3.2.2. Examples . . . . . . . . . . . . . . . . . . . . . . . 6 | 3.2.1. Definition . . . . . . . . . . . . . . . . . . . . . 5 | |||
3.3. Language Specification in Encoded Words . . . . . . . . . 7 | 3.2.2. Historical Notes . . . . . . . . . . . . . . . . . . 7 | |||
4. Guidelines for Usage in HTTP Header Field Definitions . . . . 7 | 3.2.3. Examples . . . . . . . . . . . . . . . . . . . . . . 8 | |||
4.1. When to Use the Extension . . . . . . . . . . . . . . . . 8 | 3.3. Language Specification in Encoded Words . . . . . . . . . 8 | |||
4.2. Error Handling . . . . . . . . . . . . . . . . . . . . . . 8 | 4. Guidelines for Usage in HTTP Header Field Definitions . . . . 9 | |||
5. Security Considerations . . . . . . . . . . . . . . . . . . . 8 | 4.1. When to Use the Extension . . . . . . . . . . . . . . . . 9 | |||
6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 9 | 4.2. Error Handling . . . . . . . . . . . . . . . . . . . . . 10 | |||
7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | 5. Security Considerations . . . . . . . . . . . . . . . . . . . 10 | |||
7.1. Normative References . . . . . . . . . . . . . . . . . . . 9 | 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 | |||
7.2. Informative References . . . . . . . . . . . . . . . . . . 10 | 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 11 | |||
8. References . . . . . . . . . . . . . . . . . . . . . . . . . 11 | ||||
8.1. Normative References . . . . . . . . . . . . . . . . . . 11 | ||||
8.2. Informative References . . . . . . . . . . . . . . . . . 12 | ||||
8.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 13 | ||||
Appendix A. Changes from RFC 5987 . . . . . . . . . . . . . . . 14 | ||||
Appendix B. Implementation Report . . . . . . . . . . . . . . . 14 | ||||
Appendix C. Change Log (to be removed by RFC Editor before | ||||
publication) . . . . . . . . . . . . . . . . . . . . 14 | ||||
C.1. Since RFC5987 . . . . . . . . . . . . . . . . . . . . . . 14 | ||||
C.2. Since draft-reschke-rfc5987bis-00 . . . . . . . . . . . . 15 | ||||
C.3. Since draft-reschke-rfc5987bis-01 . . . . . . . . . . . . 15 | ||||
C.4. Since draft-reschke-rfc5987bis-02 . . . . . . . . . . . . 15 | ||||
C.5. Since draft-reschke-rfc5987bis-03 . . . . . . . . . . . . 15 | ||||
C.6. Since draft-reschke-rfc5987bis-04 . . . . . . . . . . . . 15 | ||||
C.7. Since draft-reschke-rfc5987bis-05 . . . . . . . . . . . . 15 | ||||
C.8. Since draft-reschke-rfc5987bis-06 . . . . . . . . . . . . 15 | ||||
Appendix D. Open issues (to be removed by RFC Editor prior to | ||||
publication) . . . . . . . . . . . . . . . . . . . . 15 | ||||
D.1. edit . . . . . . . . . . . . . . . . . . . . . . . . . . 15 | ||||
D.2. httpbis . . . . . . . . . . . . . . . . . . . . . . . . . 15 | ||||
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 16 | ||||
1. Introduction | 1. Introduction | |||
By default, message header field parameters in HTTP ([RFC2616]) | By default, message header field parameters in HTTP ([RFC2616]) | |||
messages cannot carry characters outside the ISO-8859-1 character set | messages cannot carry characters outside the ISO-8859-1 coded | |||
([ISO-8859-1]). RFC 2231 ([RFC2231]) defines an encoding mechanism | character set ([ISO-8859-1]). RFC 2231 ([RFC2231]) defines an | |||
for use in MIME headers. This document specifies an encoding | encoding mechanism for use in MIME headers. This document specifies | |||
suitable for use in HTTP header fields that is compatible with a | an encoding suitable for use in HTTP header fields that is compatible | |||
profile of the encoding defined in RFC 2231. | with a profile of the encoding defined in RFC 2231. | |||
This document obsoletes [RFC5987] and moves it to "historic" status; | ||||
the changes are summarized in Appendix A. | ||||
Note: in the remainder of this document, RFC 2231 is only | Note: in the remainder of this document, RFC 2231 is only | |||
referenced for the purpose of explaining the choice of features | referenced for the purpose of explaining the choice of features | |||
that were adopted; they are therefore purely informative. | that were adopted; they are therefore purely informative. | |||
Note: this encoding does not apply to message payloads transmitted | Note: this encoding does not apply to message payloads transmitted | |||
over HTTP, such as when using the media type "multipart/form-data" | over HTTP, such as when using the media type "multipart/form-data" | |||
([RFC2388]). | ([RFC2388]). | |||
2. Notational Conventions | 2. Notational Conventions | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
document are to be interpreted as described in [RFC2119]. | document are to be interpreted as described in [RFC2119]. | |||
This specification uses the ABNF (Augmented Backus-Naur Form) | This specification uses the ABNF (Augmented Backus-Naur Form) | |||
notation defined in [RFC5234]. The following core rules are included | notation defined in [RFC5234]. The following core rules are included | |||
by reference, as defined in [RFC5234], Appendix B.1: ALPHA (letters), | by reference, as defined in [RFC5234], Appendix B.1: ALPHA (letters), | |||
DIGIT (decimal 0-9), HEXDIG (hexadecimal 0-9/A-F/a-f), and LWSP | DIGIT (decimal 0-9), HEXDIG (hexadecimal 0-9/A-F/a-f), and LWSP | |||
(linear whitespace). | (linear whitespace). | |||
Note that this specification uses the term "character set" for | This specification uses terminology defined in [RFC6365], namely: | |||
consistency with other IETF specifications such as RFC 2277 (see | "character encoding scheme" (below abbreviated to "character | |||
[RFC2277], Section 3). A more accurate term would be "character | encoding"), "charset" and "coded character set". | |||
encoding" (a mapping of code points to octet sequences). | ||||
Note that this differs from RFC 2231, which uses the term "character | ||||
set" for "character encoding scheme". | ||||
3. Comparison to RFC 2231 and Definition of the Encoding | 3. Comparison to RFC 2231 and Definition of the Encoding | |||
RFC 2231 defines several extensions to MIME. The sections below | RFC 2231 defines several extensions to MIME. The sections below | |||
discuss if and how they apply to HTTP header fields. | discuss if and how they apply to HTTP header fields. | |||
In short: | In short: | |||
o Parameter Continuations aren't needed (Section 3.1), | o Parameter Continuations aren't needed (Section 3.1), | |||
o Character Set and Language Information are useful, therefore a | o Character Encoding and Language Information are useful, therefore | |||
simple subset is specified (Section 3.2), and | a simple subset is specified (Section 3.2), and | |||
o Language Specifications in Encoded Words aren't needed | o Language Specifications in Encoded Words aren't needed | |||
(Section 3.3). | (Section 3.3). | |||
3.1. Parameter Continuations | 3.1. Parameter Continuations | |||
Section 3 of [RFC2231] defines a mechanism that deals with the length | Section 3 of [RFC2231] defines a mechanism that deals with the length | |||
limitations that apply to MIME headers. These limitations do not | limitations that apply to MIME headers. These limitations do not | |||
apply to HTTP ([RFC2616], Section 19.4.7). | apply to HTTP ([RFC7231], Appendix A.6). | |||
Thus, parameter continuations are not part of the encoding defined by | Thus, parameter continuations are not part of the encoding defined by | |||
this specification. | this specification. | |||
3.2. Parameter Value Character Set and Language Information | 3.2. Parameter Value Character Encoding and Language Information | |||
Section 4 of [RFC2231] specifies how to embed language information | Section 4 of [RFC2231] specifies how to embed language information | |||
into parameter values, and also how to encode non-ASCII characters, | into parameter values, and also how to encode non-ASCII characters, | |||
dealing with restrictions both in MIME and HTTP header parameters. | dealing with restrictions both in MIME and HTTP header field | |||
parameters. | ||||
However, RFC 2231 does not specify a mandatory-to-implement character | However, RFC 2231 does not specify a mandatory-to-implement character | |||
set, making it hard for senders to decide which character set to use. | encoding, making it hard for senders to decide which encoding to use. | |||
Thus, recipients implementing this specification MUST support the | Thus, recipients implementing this specification MUST support the | |||
character sets "ISO-8859-1" [ISO-8859-1] and "UTF-8" [RFC3629]. | "UTF-8" character encoding [RFC3629]. | |||
Furthermore, RFC 2231 allows the character set information to be left | Furthermore, RFC 2231 allows the character encoding information to be | |||
out. The encoding defined by this specification does not allow that. | left out. The encoding defined by this specification does not allow | |||
that. | ||||
3.2.1. Definition | 3.2.1. Definition | |||
The syntax for parameters is defined in Section 3.6 of [RFC2616] | The syntax for parameters is defined in Section 3.6 of [RFC2616] | |||
(with RFC 2616 implied LWS translated to RFC 5234 LWSP): | (with RFC 2616 implied LWS translated to RFC 5234 LWSP): | |||
parameter = attribute LWSP "=" LWSP value | parameter = attribute LWSP "=" LWSP value | |||
attribute = token | attribute = token | |||
value = token / quoted-string | value = token / quoted-string | |||
quoted-string = <quoted-string, defined in [RFC2616], Section 2.2> | quoted-string = <quoted-string, defined in [RFC7230], Section 3.2.6> | |||
token = <token, defined in [RFC2616], Section 2.2> | token = <token, defined in [RFC7230], Section 3.2.6> | |||
In order to include character set and language information, this | In order to include character encoding and language information, this | |||
specification modifies the RFC 2616 grammar to be: | specification modifies the RFC 2616 grammar to be: | |||
parameter = reg-parameter / ext-parameter | parameter = reg-parameter / ext-parameter | |||
reg-parameter = parmname LWSP "=" LWSP value | reg-parameter = parmname LWSP "=" LWSP value | |||
ext-parameter = parmname "*" LWSP "=" LWSP ext-value | ext-parameter = parmname "*" LWSP "=" LWSP ext-value | |||
parmname = 1*attr-char | parmname = 1*attr-char | |||
ext-value = charset "'" [ language ] "'" value-chars | ext-value = charset "'" [ language ] "'" value-chars | |||
; like RFC 2231's <extended-initial-value> | ; like RFC 2231's <extended-initial-value> | |||
; (see [RFC2231], Section 7) | ; (see [RFC2231], Section 7) | |||
charset = "UTF-8" / "ISO-8859-1" / mime-charset | charset = "UTF-8" / mime-charset | |||
mime-charset = 1*mime-charsetc | mime-charset = 1*mime-charsetc | |||
mime-charsetc = ALPHA / DIGIT | mime-charsetc = ALPHA / DIGIT | |||
/ "!" / "#" / "$" / "%" / "&" | / "!" / "#" / "$" / "%" / "&" | |||
/ "+" / "-" / "^" / "_" / "`" | / "+" / "-" / "^" / "_" / "`" | |||
/ "{" / "}" / "~" | / "{" / "}" / "~" | |||
; as <mime-charset> in Section 2.3 of [RFC2978] | ; as <mime-charset> in Section 2.3 of [RFC2978] | |||
; except that the single quote is not included | ; except that the single quote is not included | |||
; SHOULD be registered in the IANA charset registry | ; SHOULD be registered in the IANA charset registry | |||
skipping to change at page 5, line 48 ¶ | skipping to change at page 6, line 48 ¶ | |||
; token except ( "*" / "'" / "%" ) | ; token except ( "*" / "'" / "%" ) | |||
Thus, a parameter is either a regular parameter (reg-parameter), as | Thus, a parameter is either a regular parameter (reg-parameter), as | |||
previously defined in Section 3.6 of [RFC2616], or an extended | previously defined in Section 3.6 of [RFC2616], or an extended | |||
parameter (ext-parameter). | parameter (ext-parameter). | |||
Extended parameters are those where the left-hand side of the | Extended parameters are those where the left-hand side of the | |||
assignment ends with an asterisk character. | assignment ends with an asterisk character. | |||
The value part of an extended parameter (ext-value) is a token that | The value part of an extended parameter (ext-value) is a token that | |||
consists of three parts: the REQUIRED character set name (charset), | consists of three parts: the REQUIRED character encoding name | |||
the OPTIONAL language information (language), and a character | (charset), the OPTIONAL language information (language), and a | |||
sequence representing the actual value (value-chars), separated by | character sequence representing the actual value (value-chars), | |||
single quote characters. Note that both character set names and | separated by single quote characters. Note that both character | |||
language tags are restricted to the US-ASCII character set, and are | encoding names and language tags are restricted to the US-ASCII coded | |||
matched case-insensitively (see [RFC2978], Section 2.3 and [RFC5646], | character set, and are matched case-insensitively (see [RFC2978], | |||
Section 2.1.1). | Section 2.3 and [RFC5646], Section 2.1.1). | |||
Inside the value part, characters not contained in attr-char are | Inside the value part, characters not contained in attr-char are | |||
encoded into an octet sequence using the specified character set. | encoded into an octet sequence using the specified character | |||
That octet sequence is then percent-encoded as specified in Section | encoding. That octet sequence is then percent-encoded as specified | |||
2.1 of [RFC3986]. | in Section 2.1 of [RFC3986]. | |||
Producers MUST use either the "UTF-8" ([RFC3629]) or the "ISO-8859-1" | Producers MUST use the "UTF-8" ([RFC3629]) character encoding. | |||
([ISO-8859-1]) character set. Extension character sets (mime- | Extension character encodings (mime-charset) are reserved for future | |||
charset) are reserved for future use. | use. | |||
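The producer-side steps described above (encode with the character encoding, then percent-encode every octet that is not an attr-char) can be sketched in Python as follows. This is a minimal illustration, not part of either specification: the function name encode_ext_value is hypothetical, and the attr-char set is taken from the attr-char production of RFC 5987, Section 3.2.1, whose full definition falls in the part of the grammar that this comparison skips.

```python
import string

# attr-char as defined in RFC 5987, Section 3.2.1:
# token characters except "*", "'" and "%".
ATTR_CHARS = frozenset(string.ascii_letters + string.digits + "!#$&+-.^_`|~")


def encode_ext_value(text: str, language: str = "") -> str:
    """Build an ext-value using the UTF-8 character encoding (hypothetical helper)."""
    pieces = []
    for octet in text.encode("utf-8"):
        ch = chr(octet)
        if ch in ATTR_CHARS:
            # attr-char octets are sent as-is
            pieces.append(ch)
        else:
            # everything else is percent-encoded (RFC 3986, Section 2.1)
            pieces.append(f"%{octet:02X}")
    return f"UTF-8'{language}'" + "".join(pieces)


print(encode_ext_value("\u00a3 rates", "en"))  # UTF-8'en'%C2%A3%20rates
```

The sketch always emits "UTF-8" as the charset, matching the requirement in the right-hand (rfc5987bis) column; it performs no validation of the language tag.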
Note: recipients should be prepared to handle encoding errors, | Note: recipients should be prepared to handle encoding errors, | |||
such as malformed or incomplete percent escape sequences, or non- | such as malformed or incomplete percent escape sequences, or non- | |||
decodable octet sequences, in a robust manner. This specification | decodable octet sequences, in a robust manner. This specification | |||
does not mandate any specific behavior, for instance, the | does not mandate any specific behavior, for instance, the | |||
following strategies are all acceptable: | following strategies are all acceptable: | |||
* ignoring the parameter, | * ignoring the parameter, | |||
* stripping a non-decodable octet sequence, | * stripping a non-decodable octet sequence, | |||
* substituting a non-decodable octet sequence by a replacement | * substituting a non-decodable octet sequence by a replacement | |||
character, such as the Unicode character U+FFFD (Replacement | character, such as the Unicode character U+FFFD (Replacement | |||
Character). | Character). | |||
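As an illustration of the note above, the following Python sketch combines two of the listed strategies: it ignores the parameter (returns None) when a percent escape is malformed or the charset is not UTF-8, and it substitutes U+FFFD for non-decodable octet sequences. The function name decode_ext_value and the choice of which strategy applies to which error are assumptions made for this example; the specification does not mandate either. The sketch also does not check that the unescaped characters are restricted to attr-char.

```python
import re
from urllib.parse import unquote_to_bytes

# A well-formed percent escape: "%" followed by two HEXDIG.
_PCT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def decode_ext_value(ext_value: str):
    """Return (charset, language, text), or None if the parameter is ignored."""
    parts = ext_value.split("'", 2)
    if len(parts) != 3:
        return None  # both single quotes are required
    charset, language, value_chars = parts
    if charset.lower() != "utf-8":
        return None  # this sketch only handles the mandatory encoding
    # Any "%" left after removing valid escapes is malformed or incomplete.
    if "%" in _PCT_ESCAPE.sub("", value_chars):
        return None
    octets = unquote_to_bytes(value_chars)
    # Non-decodable octet sequences become U+FFFD.
    return charset, language, octets.decode("utf-8", errors="replace")


print(decode_ext_value("UTF-8''%c2%a3%20and%20%e2%82%ac%20rates"))
```

A recipient that also wants to accept legacy ISO-8859-1 values would need an additional branch on the charset name; that is left out here for brevity.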
Note: the RFC 2616 token production ([RFC2616], Section 2.2) | 3.2.2. Historical Notes | |||
differs from the production used in RFC 2231 (imported from | ||||
Section 5.1 of [RFC2045]) in that curly braces ("{" and "}") are | ||||
excluded. Thus, these two characters are excluded from the attr- | ||||
char production as well. | ||||
Note: the <mime-charset> ABNF defined here differs from the one in | The RFC 7230 token production ([RFC7230], Section 3.2.6) differs from | |||
Section 2.3 of [RFC2978] in that it does not allow the single | the production used in RFC 2231 (imported from Section 5.1 of | |||
quote character (see also RFC Errata ID 1912 [Err1912]). In | [RFC2045]) in that curly braces ("{" and "}") are excluded. Thus, | |||
practice, no character set names using that character have been | these two characters are excluded from the attr-char production as | |||
registered at the time of this writing. | well. | |||
3.2.2. Examples | The <mime-charset> ABNF defined here differs from the one in | |||
Section 2.3 of [RFC2978] in that it does not allow the single quote | ||||
character (see also RFC Errata ID 1912 [Err1912]). In practice, no | ||||
character encoding names using that character have been registered at | ||||
the time of this writing. | ||||
For backwards compatibility with RFC 2231, the encoding defined by | ||||
this specification deviates from common parameter syntax in that the | ||||
quoted-string notation is not allowed. Implementations using generic | ||||
parser components might not be able to detect the use of quoted- | ||||
string notation and thus might accept that format, although invalid, | ||||
as well. | ||||
[RFC5987] did require support for ISO-8859-1, too; for compatibility | ||||
with legacy code, recipients are encouraged to support this encoding | ||||
as well. | ||||
3.2.3. Examples | ||||
Non-extended notation, using "token": | Non-extended notation, using "token": | |||
foo: bar; title=Economy | foo: bar; title=Economy | |||
Non-extended notation, using "quoted-string": | Non-extended notation, using "quoted-string": | |||
foo: bar; title="US-$ rates" | foo: bar; title="US-$ rates" | |||
Extended notation, using the Unicode character U+00A3 (POUND SIGN): | Extended notation, using the Unicode character U+00A3 (POUND SIGN): | |||
foo: bar; title*=iso-8859-1'en'%A3%20rates | foo: bar; title*=utf-8'en'%C2%A3%20rates | |||
Note: the Unicode pound sign character U+00A3 was encoded into the | Note: the Unicode pound sign character U+00A3 was encoded into the | |||
single octet A3 using the ISO-8859-1 character encoding, then | octet sequence C2 A3 using the UTF-8 character encoding, then | |||
percent-encoded. Also, note that the space character was encoded as | percent-encoded. Also, note that the space character was encoded as | |||
%20, as it is not contained in attr-char. | %20, as it is not contained in attr-char. | |||
Extended notation, using the Unicode characters U+00A3 (POUND SIGN) | Extended notation, using the Unicode characters U+00A3 (POUND SIGN) | |||
and U+20AC (EURO SIGN): | and U+20AC (EURO SIGN): | |||
foo: bar; title*=UTF-8''%c2%a3%20and%20%e2%82%ac%20rates | foo: bar; title*=UTF-8''%c2%a3%20and%20%e2%82%ac%20rates | |||
Note: the Unicode pound sign character U+00A3 was encoded into the | Note: the Unicode pound sign character U+00A3 was encoded into the | |||
octet sequence C2 A3 using the UTF-8 character encoding, then | octet sequence C2 A3 using the UTF-8 character encoding, then | |||
percent-encoded. Likewise, the Unicode euro sign character U+20AC | percent-encoded. Likewise, the Unicode euro sign character U+20AC | |||
was encoded into the octet sequence E2 82 AC, then percent-encoded. | was encoded into the octet sequence E2 82 AC, then percent-encoded. | |||
Also note that HEXDIG allows both lowercase and uppercase characters, | Also note that HEXDIG allows both lowercase and uppercase characters, | |||
so recipients must understand both, and that the language information | so recipients must understand both, and that the language information | |||
is optional, while the character set is not. | is optional, while the character encoding is not. | |||
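The octet sequences quoted in the notes above can be checked with a few lines of Python. This is only a verification aid; the variable names are chosen for this example.

```python
# U+00A3 (POUND SIGN) and U+20AC (EURO SIGN) under the encodings
# used in the examples above.
pound_latin1 = "\u00a3".encode("iso-8859-1")  # single octet A3
pound_utf8 = "\u00a3".encode("utf-8")         # octets C2 A3
euro_utf8 = "\u20ac".encode("utf-8")          # octets E2 82 AC

print(pound_latin1.hex(), pound_utf8.hex(), euro_utf8.hex())
# a3 c2a3 e282ac
```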
3.3. Language Specification in Encoded Words | 3.3. Language Specification in Encoded Words | |||
Section 5 of [RFC2231] extends the encoding defined in [RFC2047] to | Section 5 of [RFC2231] extends the encoding defined in [RFC2047] to | |||
also support language specification in encoded words. Although the | also support language specification in encoded words. RFC 2616, the | |||
HTTP/1.1 specification does refer to RFC 2047 ([RFC2616], Section | now-obsolete HTTP/1.1 specification, did refer to RFC 2047 | |||
2.2), it's not clear to which header field exactly it applies, and | ([RFC2616], Section 2.2). However, it wasn't clear to which header | |||
whether it is implemented in practice (see | field it applied. Consequently, the current revision of the HTTP/1.1 | |||
<http://tools.ietf.org/wg/httpbis/trac/ticket/111> for details). | specification has deprecated use of the encoding forms defined in RFC | |||
2047 (see Section 3.2.4 of [RFC7230]). | ||||
Thus, this specification does not include this feature. | Thus, this specification does not include this feature. | |||
4. Guidelines for Usage in HTTP Header Field Definitions | 4. Guidelines for Usage in HTTP Header Field Definitions | |||
Specifications of HTTP header fields that use the extensions defined | Specifications of HTTP header fields that use the extensions defined | |||
in Section 3.2 ought to clearly state that. A simple way to achieve | in Section 3.2 ought to clearly state that. A simple way to achieve | |||
this is to normatively reference this specification, and to include | this is to normatively reference this specification, and to include | |||
the ext-value production into the ABNF for that header field. | the ext-value production into the ABNF for that header field. | |||
For instance: | For instance: | |||
foo-header = "foo" LWSP ":" LWSP token ";" LWSP title-param | foo-header = "foo" LWSP ":" LWSP token ";" LWSP title-param | |||
title-param = "title" LWSP "=" LWSP value | title-param = "title" LWSP "=" LWSP value | |||
/ "title*" LWSP "=" LWSP ext-value | / "title*" LWSP "=" LWSP ext-value | |||
ext-value = <see RFC 5987, Section 3.2> | ext-value = <see RFC 5987, Section 3.2> | |||
Note: The Parameter Value Continuation feature defined in Section | ||||
3 of [RFC2231] makes it impossible to have multiple instances of | Note: The Parameter Value Continuation feature defined in | |||
extended parameters with identical parmname components, as the | Section 3 of [RFC2231] makes it impossible to have multiple | |||
processing of continuations would become ambiguous. Thus, | instances of extended parameters with identical parmname | |||
specifications using this extension are advised to disallow this | components, as the processing of continuations would become | |||
case for compatibility with RFC 2231. | ambiguous. Thus, specifications using this extension are advised | |||
to disallow this case for compatibility with RFC 2231. | ||||
Note: This specification does not automatically assign a new | ||||
interpretation to parameter names ending in an asterisk. As | ||||
pointed out above, it's up to the specification for the non- | ||||
extended parameter to "opt in" to the syntax defined here. That | ||||
being said, some existing implementations are known to | ||||
automatically switch to the use of this notation when a parameter | ||||
name ends with an asterisk, thus using parameter names ending in | ||||
an asterisk for something else is likely to cause interoperability | ||||
problems. | ||||
4.1. When to Use the Extension | 4.1. When to Use the Extension | |||
Section 4.2 of [RFC2277] requires that protocol elements containing | Section 4.2 of [RFC2277] requires that protocol elements containing | |||
human-readable text are able to carry language information. Thus, | human-readable text are able to carry language information. Thus, | |||
the ext-value production ought to be always used when the parameter | the ext-value production ought to be always used when the parameter | |||
value is of textual nature and its language is known. | value is of textual nature and its language is known. | |||
Furthermore, the extension ought to also be used whenever the | Furthermore, the extension ought to also be used whenever the | |||
parameter value needs to carry characters not present in the US-ASCII | parameter value needs to carry characters not present in the US-ASCII | |||
([USASCII]) character set (note that it would be unacceptable to | ([USASCII]) coded character set (note that it would be unacceptable | |||
define a new parameter that would be restricted to a subset of the | to define a new parameter that would be restricted to a subset of the | |||
Unicode character set). | Unicode character set). | |||
4.2. Error Handling | 4.2. Error Handling | |||
Header field specifications need to define whether multiple instances | Header field specifications need to define whether multiple instances | |||
of parameters with identical parmname components are allowed, and how | of parameters with identical parmname components are allowed, and how | |||
they should be processed. This specification suggests that a | they should be processed. This specification suggests that a | |||
parameter using the extended syntax takes precedence. This would | parameter using the extended syntax takes precedence. This would | |||
allow producers to use both formats without breaking recipients that | allow producers to use both formats without breaking recipients that | |||
do not understand the extended syntax yet. | do not understand the extended syntax yet. | |||
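The suggested precedence rule can be sketched as follows, assuming the header field has already been split into (parmname, raw value) pairs. The function name pick_parameter, the pair representation, and the case-insensitive comparison of parameter names are assumptions made for this example; a header field specification may define the handling of duplicates differently.

```python
def pick_parameter(params, name):
    """Select the value to use for `name`, preferring the extended form.

    `params` is a list of (parmname, raw_value) pairs in header order.
    Returns ("ext", raw_value), ("reg", raw_value), or None.
    """
    wanted = name.lower()
    extended = [v for n, v in params if n.lower() == wanted + "*"]
    regular = [v for n, v in params if n.lower() == wanted]
    if extended:
        # The extended syntax takes precedence when both forms are present.
        return ("ext", extended[0])
    if regular:
        return ("reg", regular[0])
    return None


params = [("title", "EURO rates"), ("title*", "UTF-8''%e2%82%ac%20rates")]
print(pick_parameter(params, "title"))
```

A producer can thus send an ASCII fallback in "title" alongside the full value in "title*"; recipients that do not understand the extended syntax simply use the fallback.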
skipping to change at page 9, line 18 ¶ | skipping to change at page 10, line 47 ¶ | |||
See Section 10 of [RFC3629] for more information on both topics. | See Section 10 of [RFC3629] for more information on both topics. | |||
In addition, the extension specified in this document makes it | In addition, the extension specified in this document makes it | |||
possible to transport multiple language variants for a single | possible to transport multiple language variants for a single | |||
parameter, and such use might allow spoofing attacks, where different | parameter, and such use might allow spoofing attacks, where different | |||
language versions of the same parameter are not equivalent. Whether | language versions of the same parameter are not equivalent. Whether | |||
this attack is useful as an attack depends on the parameter | this attack is useful as an attack depends on the parameter | |||
specified. | specified. | |||
6. Acknowledgements | 6. IANA Considerations | |||
There are no IANA Considerations related to this specification. | ||||
7. Acknowledgements | ||||
Thanks to Martin Duerst and Frank Ellermann for help figuring out | Thanks to Martin Duerst and Frank Ellermann for help figuring out | |||
ABNF details, to Graham Klyne and Alexey Melnikov for general review, | ABNF details, to Graham Klyne and Alexey Melnikov for general review, | |||
to Chris Newman for pointing out an RFC 2231 incompatibility, and to | to Chris Newman for pointing out an RFC 2231 incompatibility, and to | |||
Benjamin Carlyle and Roar Lauritzsen for implementer's feedback. | Benjamin Carlyle, Roar Lauritzsen, Eric Lawrence, and James Manger | |||
for implementer's feedback. | ||||
7. References | 8. References | |||
7.1. Normative References | 8.1. Normative References | |||
[ISO-8859-1] International Organization for Standardization, | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
"Information technology -- 8-bit single-byte coded | Requirement Levels", BCP 14, RFC 2119, | |||
graphic character sets -- Part 1: Latin alphabet No. | DOI 10.17487/RFC2119, March 1997, | |||
1", ISO/IEC 8859-1:1998, 1998. | <https://www.rfc-editor.org/info/rfc2119>. | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., | |||
Requirement Levels", BCP 14, RFC 2119, March 1997. | Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext | |||
Transfer Protocol -- HTTP/1.1", RFC 2616, | ||||
DOI 10.17487/RFC2616, June 1999, | ||||
<https://www.rfc-editor.org/info/rfc2616>. | ||||
[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., | [RFC2978] Freed, N. and J. Postel, "IANA Charset Registration | |||
Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext | Procedures", BCP 19, RFC 2978, DOI 10.17487/RFC2978, | |||
Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. | October 2000, <https://www.rfc-editor.org/info/rfc2978>. | |||
[RFC2978] Freed, N. and J. Postel, "IANA Charset Registration | [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO | |||
Procedures", BCP 19, RFC 2978, October 2000. | 10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November | |||
2003, <https://www.rfc-editor.org/info/rfc3629>. | ||||
[RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO | [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform | |||
10646", RFC 3629, STD 63, November 2003. | Resource Identifier (URI): Generic Syntax", STD 66, | |||
RFC 3986, DOI 10.17487/RFC3986, January 2005, | ||||
<https://www.rfc-editor.org/info/rfc3986>. | ||||
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, | [RFC5234] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax | |||
"Uniform Resource Identifier (URI): Generic Syntax", | Specifications: ABNF", STD 68, RFC 5234, | |||
RFC 3986, STD 66, January 2005. | DOI 10.17487/RFC5234, January 2008, | |||
<https://www.rfc-editor.org/info/rfc5234>. | ||||
[RFC5234] Crocker, D., Ed. and P. Overell, "Augmented BNF for | [RFC5646] Phillips, A., Ed. and M. Davis, Ed., "Tags for Identifying | |||
Syntax Specifications: ABNF", STD 68, RFC 5234, | Languages", BCP 47, RFC 5646, DOI 10.17487/RFC5646, | |||
January 2008. | September 2009, <https://www.rfc-editor.org/info/rfc5646>. | |||
[RFC5646] Phillips, A., Ed. and M. Davis, Ed., "Tags for | [RFC7230] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer | |||
Identifying Languages", BCP 47, RFC 5646, | Protocol (HTTP/1.1): Message Syntax and Routing", | |||
September 2009. | RFC 7230, DOI 10.17487/RFC7230, June 2014, | |||
<https://www.rfc-editor.org/info/rfc7230>. | ||||
[USASCII] American National Standards Institute, "Coded Character | [RFC7231] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer | |||
Set -- 7-bit American Standard Code for Information | Protocol (HTTP/1.1): Semantics and Content", RFC 7231, | |||
Interchange", ANSI X3.4, 1986. | DOI 10.17487/RFC7231, June 2014, | |||
<https://www.rfc-editor.org/info/rfc7231>. | ||||
7.2. Informative References | [USASCII] American National Standards Institute, "Coded Character | |||
Set -- 7-bit American Standard Code for Information | ||||
Interchange", ANSI X3.4, 1986. | ||||
[Err1912] RFC Errata, "Errata ID 1912, RFC 2978", | 8.2. Informative References | |||
<http://www.rfc-editor.org>. | ||||
[RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet | [Err1912] RFC Errata, "Errata ID 1912, RFC 2978", | |||
Mail Extensions (MIME) Part One: Format of Internet | <http://www.rfc-editor.org>. | |||
Message Bodies", RFC 2045, November 1996. | ||||
[RFC2047] Moore, K., "MIME (Multipurpose Internet Mail | [ISO-8859-1] | |||
Extensions) Part Three: Message Header Extensions for | International Organization for Standardization, | |||
Non-ASCII Text", RFC 2047, November 1996. | "Information technology -- 8-bit single-byte coded graphic | |||
character sets -- Part 1: Latin alphabet No. 1", ISO/ | ||||
IEC 8859-1:1998, 1998. | ||||
[RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and | [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail | |||
Encoded Word Extensions: Character Sets, Languages, and | Extensions (MIME) Part One: Format of Internet Message | |||
Continuations", RFC 2231, November 1997. | Bodies", RFC 2045, DOI 10.17487/RFC2045, November 1996, | |||
<https://www.rfc-editor.org/info/rfc2045>. | ||||
[RFC2277] Alvestrand, H., "IETF Policy on Character Sets and | [RFC2047] Moore, K., "MIME (Multipurpose Internet Mail Extensions) | |||
Languages", BCP 18, RFC 2277, January 1998. | Part Three: Message Header Extensions for Non-ASCII Text", | |||
RFC 2047, DOI 10.17487/RFC2047, November 1996, | ||||
<https://www.rfc-editor.org/info/rfc2047>. | ||||
[RFC2388] Masinter, L., "Returning Values from Forms: multipart/ | [RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and Encoded | |||
form-data", RFC 2388, August 1998. | Word Extensions: Character Sets, Languages, and | |||
Continuations", RFC 2231, DOI 10.17487/RFC2231, November | ||||
1997, <https://www.rfc-editor.org/info/rfc2231>. | ||||
[RFC2277] Alvestrand, H., "IETF Policy on Character Sets and | ||||
Languages", BCP 18, RFC 2277, DOI 10.17487/RFC2277, | ||||
January 1998, <https://www.rfc-editor.org/info/rfc2277>. | ||||
[RFC2388] Masinter, L., "Returning Values from Forms: multipart/ | ||||
form-data", RFC 2388, DOI 10.17487/RFC2388, August 1998, | ||||
<https://www.rfc-editor.org/info/rfc2388>. | ||||
[RFC5987] Reschke, J., "Character Set and Language Encoding for | ||||
Hypertext Transfer Protocol (HTTP) Header Field | ||||
Parameters", RFC 5987, DOI 10.17487/RFC5987, August 2010, | ||||
<https://www.rfc-editor.org/info/rfc5987>. | ||||
[RFC5988] Nottingham, M., "Web Linking", RFC 5988, | ||||
DOI 10.17487/RFC5988, October 2010, | ||||
<https://www.rfc-editor.org/info/rfc5988>. | ||||
[RFC6266] Reschke, J., "Use of the Content-Disposition Header Field | ||||
in the Hypertext Transfer Protocol (HTTP)", RFC 6266, | ||||
DOI 10.17487/RFC6266, June 2011, | ||||
<https://www.rfc-editor.org/info/rfc6266>. | ||||
[RFC6365] Hoffman, P. and J. Klensin, "Terminology Used in | ||||
Internationalization in the IETF", BCP 166, RFC 6365, | ||||
DOI 10.17487/RFC6365, September 2011, | ||||
<https://www.rfc-editor.org/info/rfc6365>. | ||||
8.3. URIs | ||||
[1] mailto:ietf-http-wg@w3.org | ||||
[2] mailto:ietf-http-wg-request@w3.org?subject=subscribe | ||||
Appendix A. Changes from RFC 5987 | ||||
This section summarizes the changes compared to [RFC5987]: | ||||
o The document title was changed to "Indicating Character Encoding | ||||
and Language for HTTP Header Field Parameters". | ||||
o The requirement to support the "ISO-8859-1" encoding was removed. | ||||
Appendix B. Implementation Report | ||||
The encoding defined in this document currently is used for two | ||||
different HTTP header fields: | ||||
o "Content-Disposition", defined in [RFC6266], and | ||||
o "Link", defined in [RFC5988]. | ||||
As the encoding is a profile/clarification of the one defined in | ||||
[RFC2231] in 1997, many user agents already supported it for use in | ||||
"Content-Disposition" when [RFC5987] got published. | ||||
Since the publication of [RFC5987], three more popular desktop user | ||||
agents have added support for this encoding; see | ||||
<http://purl.org/NET/http/content-disposition-tests#encoding- | ||||
2231-char> for details. At this time, the current versions of all | ||||
major desktop user agents support it. | ||||
Note that the implementation in Internet Explorer 9 does not support | ||||
the ISO-8859-1 character encoding; this document revision | ||||
acknowledges that UTF-8 is sufficient for expressing all code points, | ||||
and removes the requirement to support ISO-8859-1. | ||||
The "Link" header field, on the other hand, was only recently | ||||
specified in [RFC5988]. At the time of this writing, no shipping | ||||
User Agent except Firefox supported the "title*" parameter (starting | ||||
with release 15). | ||||
Appendix C. Change Log (to be removed by RFC Editor before publication) | ||||
C.1. Since RFC5987 | ||||
Only editorial changes for the purpose of starting the revision | ||||
process (obs5987). | ||||
C.2. Since draft-reschke-rfc5987bis-00 | ||||
Resolved issues "iso-8859-1" and "title" (title simplified). Added | ||||
and resolved issue "historic5987". | ||||
C.3. Since draft-reschke-rfc5987bis-01 | ||||
Added issues "httpbis", "parmsyntax", "terminology" and | ||||
"valuesyntax". Closed issue "impls". | ||||
C.4. Since draft-reschke-rfc5987bis-02 | ||||
Resolved issue "terminology". | ||||
C.5. Since draft-reschke-rfc5987bis-03 | ||||
In Section 3.2, pull historical notes into a separate subsection. | ||||
Resolved issues "valuesyntax" and "parmsyntax". | ||||
C.6. Since draft-reschke-rfc5987bis-04 | ||||
Update status of Firefox support in HTTP Link Header field. | ||||
C.7. Since draft-reschke-rfc5987bis-05 | ||||
Update status of Firefox support in HTTP Link Header field. | ||||
C.8. Since draft-reschke-rfc5987bis-06 | ||||
Update status with respect to Safari 6. | ||||
Started work on update with respect to RFC 723x. | ||||
Appendix D. Open issues (to be removed by RFC Editor prior to | ||||
publication) | ||||
D.1. edit | ||||
Type: edit | ||||
julian.reschke@greenbytes.de (2011-04-15): Umbrella issue for | ||||
editorial fixes/enhancements. | ||||
D.2. httpbis | ||||
Type: edit | ||||
julian.reschke@greenbytes.de (2011-09-17): The document refers | ||||
normatively to RFC 2616. Should it continue to do so, or should we | ||||
wait for HTTPbis? This may affect edge cases in the ABNF, such as the | ||||
definition of linear white space or the characters allowed in | ||||
"token". | ||||
Author's Address | Author's Address | |||
Julian F. Reschke | Julian F. Reschke | |||
greenbytes GmbH | greenbytes GmbH | |||
Hafenweg 16 | Hafenweg 16 | |||
Muenster, NW 48155 | Muenster, NW 48155 | |||
Germany | Germany | |||
EMail: julian.reschke@greenbytes.de | EMail: julian.reschke@greenbytes.de | |||
End of changes. 54 change blocks. | ||||
142 lines changed or deleted | 368 lines changed or added | |||