Update to MIME regarding Charset Parameter Handling in Textual Media Types
Isode Limited
5 Castle Business Village
36 Station Road
Hampton
Middlesex
TW12 2BX
UK
Alexey.Melnikov@isode.com
greenbytes GmbH
Hafenweg 16
Muenster, NW 48155
Germany
julian.reschke@greenbytes.de
http://greenbytes.de/tech/webdav/
Applications
Applications Area Working Group
MIME
charset
text
This document changes RFC 2046 rules regarding default charset parameter
values for text/* media types to better align with common usage by existing
clients and servers.
RFC 2046 specified that the default charset parameter
(i.e., the value used when it is not specified) is "US-ASCII".
RFC 2616 changed the default for use by HTTP to be "ISO-8859-1".
This encoding is not very common for new text/* media types,
and a special rule in HTTP adds confusion
about which specification (RFC 2046 or RFC 2616)
is authoritative in regard to the default charset for text/* media types.
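The practical effect of the two competing defaults can be illustrated with a small sketch. The byte string and variable names below are illustrative assumptions, not taken from either specification; the point is only that the same unlabeled payload is handled differently depending on which default a recipient assumes.

```python
# Hypothetical unlabeled text/* payload: "café" encoded as ISO-8859-1.
payload = b"caf\xe9"

# A recipient applying the HTTP default ("ISO-8859-1") decodes it.
as_latin1 = payload.decode("iso-8859-1")

# A recipient applying the MIME default ("US-ASCII") cannot decode
# byte 0xE9, because US-ASCII only covers 0x00-0x7F.
try:
    as_ascii = payload.decode("us-ascii")
except UnicodeDecodeError:
    as_ascii = None
```

Neither recipient is misbehaving with respect to the specification it follows, which is why a single authoritative source for the default is desirable.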
At the time of writing of this document the IETF HTTPBIS WG is working
on an update to RFC 2616 which removes the default charset of "ISO-8859-1"
for "text/*" media types. It is expected that the set of HTTPBIS documents
will reference this document in order to use the updated rules
of default charset in "text/*" media types.
Many complex text subtypes such as text/html and text/xml have internal
(to their format) means of describing the charset.
Many existing User Agents ignore the default of "US-ASCII" rule for at least
text/html and text/xml.
This document changes RFC 2046 rules regarding default charset parameter
values for text/* media types to better align with common usage by existing
clients and servers.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
this document are to be interpreted as described in
RFC 2119.
Section 4.1.2 of RFC 2046 says:
The default character set, which must be assumed in the absence of a charset parameter, is US-ASCII.
As explained in the Introduction, this rule is considered
to be outdated, so this document replaces it with the following set
of rules:
Each subtype of the "text" media type which uses the "charset"
parameter can define its own default value for the "charset" parameter,
including absence of any default.
In order to improve interoperability with deployed agents,
"text/*" media type definitions SHOULD either
a) specify that the "charset" parameter is not used for the defined subtype,
because the charset information is transported inside the payload (as in "text/xml") or
b) require explicit unconditional inclusion of the "charset" parameter
eliminating the need for a default value.
In accordance with option (a), above, "text/*" media types that can
transport charset information inside the corresponding payloads,
specifically including "text/html" and "text/xml", SHOULD NOT specify
the use of a "charset" parameter, nor any default value, in order to
avoid conflicting interpretations should the charset parameter value
and the value specified in the payload disagree.
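The kind of disagreement described above can be sketched for an XML payload as follows. This is a deliberately simplified illustration: the function names are hypothetical, the pattern only recognizes an encoding declaration in an ASCII-compatible byte stream, and it ignores byte order marks and UTF-16/UTF-32 input, so it is not a complete XML encoding detector.

```python
import re
from typing import Optional

# Simplified pattern for the encoding pseudo-attribute of an XML declaration.
# Assumes an ASCII-compatible encoding so the declaration is readable as bytes.
_XML_ENCODING = re.compile(
    rb"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def in_band_charset(payload: bytes) -> Optional[str]:
    """Return the lowercased encoding named in the XML declaration, if any."""
    match = _XML_ENCODING.match(payload)
    return match.group(1).decode("ascii").lower() if match else None


def labels_conflict(charset_param: Optional[str], payload: bytes) -> bool:
    """True when an out-of-band and an in-band label both exist and differ."""
    declared = in_band_charset(payload)
    if charset_param is None or declared is None:
        return False
    return charset_param.lower() != declared
```

When no "charset" parameter is defined for the subtype, the first argument is always absent and the conflict cannot arise, which is the outcome the recommendation above aims for.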
New subtypes of the "text" media type, thus, SHOULD NOT define a
default "charset" value. If there is a strong reason to do so
despite this advice, they SHOULD use the "UTF-8" charset
as the default.
Specifications of how to specify the "charset" parameter, and what
default value, if any, is used, are subtype-specific, NOT protocol-
specific. Protocols that use MIME, therefore, MUST NOT override
default charset values for "text/*" media types to be different for
their specific protocol. The protocol definitions MUST leave that
to the subtype definitions.
The default charset parameter value for text/plain is unchanged
from RFC 2046 and remains as "US-ASCII".
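One way a recipient might apply these rules is sketched below, under stated assumptions. The table name and function name are hypothetical, and only the text/plain entry is taken from this document; entries for other subtypes would have to come from their own definitions. Header parsing uses Python's standard email.message.Message class.

```python
from email.message import Message
from typing import Optional

# Hypothetical per-subtype policy table. Only the text/plain entry comes
# from the text above; a missing entry means "no default is defined".
SUBTYPE_DEFAULTS = {
    "text/plain": "us-ascii",
}


def effective_charset(content_type: str) -> Optional[str]:
    """Return the charset to assume for a text/* Content-Type value.

    An explicit "charset" parameter wins; otherwise the subtype's own
    default is used; otherwise None is returned and nothing is guessed.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_maintype() != "text":
        return None
    explicit = msg.get_content_charset()  # lowercased, or None if absent
    if explicit is not None:
        return explicit
    return SUBTYPE_DEFAULTS.get(msg.get_content_type())
```

Note that the function takes no protocol argument: the lookup depends only on the media type, reflecting the requirement that protocols leave default values to the subtype definitions.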
TBD. Guessing of default charset is a security problem.
Conflicting information in-band vs out-of-band is also a security problem.
This document asks IANA to update the "text" subregistry of
the Media Types registry to additionally point to this document.
Key words for use in RFCs to Indicate Requirement Levels
Harvard University
1350 Mass. Ave.
Cambridge
MA 02138
+1 617 495 3864
sob@harvard.edu
General
keyword
In many standards track documents several words are used to signify
the requirements in the specification. These words are often
capitalized. This document defines these words as they should be
interpreted in IETF documents. Authors who follow these guidelines
should incorporate this phrase near the beginning of their document:
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
RFC 2119.
Note that the force of these words is modified by the requirement
level of the document in which they are used.
Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types
Innosoft International, Inc.
1050 East Garvey Avenue South
West Covina
CA
91790
US
+1 818 919 3600
+1 818 919 3614
ned@innosoft.com
First Virtual Holdings
25 Washington Avenue
Morristown
NJ
07960
US
+1 201 540 8967
+1 201 993 3032
nsb@nsb.fv.com
STD 11, RFC 822 defines a message representation protocol specifying considerable detail about US-ASCII message headers, but which leaves the message content, or message body, as flat US-ASCII text. This set of documents, collectively called the Multipurpose Internet Mail Extensions, or MIME, redefines the format of messages to allow for
(1) textual message bodies in character sets other than US-ASCII,
(2) an extensible set of different formats for non-textual message bodies,
(3) multi-part message bodies, and
(4) textual header information in character sets other than US-ASCII.
These documents are based on earlier work documented in RFC 934, STD 11 and RFC 1049, but extends and revises them. Because RFC 822 said so little about message bodies, these documents are largely orthogonal to (rather than a revision of) RFC 822.
The initial document in this set, RFC 2045, specifies the various headers used to describe the structure of MIME messages. This second document defines the general structure of the MIME media typing system and defines an initial set of media types. The third document, RFC 2047, describes extensions to RFC 822 to allow non-US-ASCII text data in Internet mail header fields. The fourth document, RFC 2048, specifies various IANA registration procedures for MIME-related facilities. The fifth and final document, RFC 2049, describes MIME
conformance criteria as well as providing some illustrative examples of MIME message formats, acknowledgements, and the bibliography.
These documents are revisions of RFCs 1521 and 1522, which themselves were revisions of RFCs 1341 and 1342. An appendix in RFC 2049 describes differences and changes from previous versions.
UTF-8, a transformation format of ISO 10646
ISO/IEC 10646-1 defines a large character set called the Universal Character Set (UCS) which encompasses most of the world's writing systems. The originally proposed encodings of the UCS, however, were not compatible with many current applications and protocols, and this has led to the development of UTF-8, the object of this memo. UTF-8 has the characteristic of preserving the full US-ASCII range, providing compatibility with file systems, parsers and other software that rely on US-ASCII values but are transparent to other values. This memo obsoletes and replaces RFC 2279.
Hypertext Transfer Protocol -- HTTP/1.1
Department of Information and Computer Science
University of California, Irvine
Irvine
CA
92697-3425
+1(949)824-1715
fielding@ics.uci.edu
World Wide Web Consortium
MIT Laboratory for Computer Science, NE43-356
545 Technology Square
Cambridge
MA
02139
+1(617)258-8682
jg@w3.org
Compaq Computer Corporation
Western Research Laboratory
250 University Avenue
Palo Alto
CA
94305
mogul@wrl.dec.com
World Wide Web Consortium
MIT Laboratory for Computer Science, NE43-356
545 Technology Square
Cambridge
MA
02139
+1(617)258-8682
frystyk@w3.org
Xerox Corporation
MIT Laboratory for Computer Science, NE43-356
3333 Coyote Hill Road
Palo Alto
CA
94034
masinter@parc.xerox.com
Microsoft Corporation
1 Microsoft Way
Redmond
WA
98052
paulle@microsoft.com
World Wide Web Consortium
MIT Laboratory for Computer Science, NE43-356
545 Technology Square
Cambridge
MA
02139
+1(617)258-8682
timbl@w3.org
The Hypertext Transfer Protocol (HTTP) is an application-level
protocol for distributed, collaborative, hypermedia information
systems. It is a generic, stateless, protocol which can be used for
many tasks beyond its use for hypertext, such as name servers and
distributed object management systems, through extension of its
request methods, error codes and headers. A feature of HTTP is
the typing and negotiation of data representation, allowing systems
to be built independently of the data being transferred.
HTTP has been in use by the World-Wide Web global information
initiative since 1990. This specification defines the protocol
referred to as "HTTP/1.1", and is an update to RFC 2068.
The 'text/html' Media Type
This document summarizes the history of HTML development, and defines the "text/html" MIME type by pointing to the relevant W3C recommendations. This memo provides information for the Internet community.
XML Media Types
This document standardizes five new media types -- text/xml, application/xml, text/xml-external-parsed-entity, application/xml-external-parsed-entity, and application/xml-dtd -- for use in exchanging network entities that are related to the Extensible Markup Language (XML). This document also standardizes a convention (using the suffix '+xml') for naming media types outside of these five types when those media types represent XML MIME (Multipurpose Internet Mail Extensions) entities. [STANDARDS-TRACK]
Many thanks to Ned Freed and John Klensin for comments and ideas that motivated
creation of this document, and to Barry Leiba for suggested text.