<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
	<!ENTITY rfc2119 PUBLIC '' 'bibxml/reference.RFC.2119.xml'>
	<!ENTITY rfc2046 PUBLIC '' 'bibxml/reference.RFC.2046.xml'> <!--MIME Part Two: Media Types-->
	<!ENTITY rfc2616 PUBLIC '' 'bibxml/reference.RFC.2616.xml'> <!--HTTP/1.1-->
	<!ENTITY rfc3629 PUBLIC '' 'bibxml/reference.RFC.3629.xml'> <!--UTF-8-->
]>
<!-- ?xml-stylesheet type='text/xsl' href='http://xml.resource.org/authoring/rfc2629.xslt' ? -->
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>

<?rfc toc="yes" ?>
<?rfc tocindent="no" ?>
<?rfc autobreaks="no" ?>
<?rfc comments="yes" ?>
<?rfc inline="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes"?>
<?rfc iprnotified="no" ?>
<?rfc strict="no" ?>
<?rfc compact="yes"?>

<rfc category="std"
     ipr="trust200902"
     updates="2046"
     docName="draft-melnikov-mime-default-charset-00">
    <front>
        <title abbrev="MIME Charset Default Update">Update to MIME regarding Charset Parameter Handling in Textual Media Types</title>
        <author initials='A.' surname="Melnikov" fullname="Alexey Melnikov">
            <organization>Isode Limited</organization>
            <address>
              <postal>
                <street>5 Castle Business Village</street>
                <street>36 Station Road</street>
                <city>Hampton</city>
                <region>Middlesex</region>
                <code>TW12 2BX</code>
                <country>UK</country>
              </postal>
              <email>Alexey.Melnikov@isode.com</email>
            </address>
        </author>
        <author initials="J. F." surname="Reschke" fullname="Julian F. Reschke">
            <organization abbrev="greenbytes">greenbytes GmbH</organization>
            <address>
              <postal>
                <street>Hafenweg 16</street>
                <city>Muenster</city><region>NW</region><code>48155</code>
                <country>Germany</country>
              </postal>
              <email>julian.reschke@greenbytes.de</email>	
              <uri>http://greenbytes.de/tech/webdav/</uri>	
            </address>
        </author>
        <date year="2011" month="June" day="14"/>
        <area>Applications</area>
        <keyword>MIME</keyword>
        <keyword>charset</keyword>
        <keyword>text</keyword>
        <abstract>
          <t>
            This document changes RFC 2046 rules regarding default charset parameter
	    values for text/* media types to better align with common usage by existing
	    clients and servers.
          </t>
        </abstract>
    </front>

    <middle>
        <section title="Introduction and overview">
            <t>

<!--////Alexey: this might need improvements-->

	    <xref target="RFC2046"/> specified that the default value of the charset
	    parameter (i.e., the value assumed when the parameter is absent) is "US-ASCII".
	    <xref target="RFC2616"/> changed the default for use by HTTP to "ISO-8859-1".
	    That encoding is not very common for new text/* media types,
	    and a special rule in HTTP adds confusion
	    about which specification (<xref target="RFC2046"/> or <xref target="RFC2616"/>)
	    is authoritative with regard to the default charset for text/* media types.
	    
	    <cref>At the time of writing, the IETF HTTPbis WG is working
	    on an update to RFC 2616 that removes the default charset of "ISO-8859-1"
	    for "text/*" media types. It is expected that the set of HTTPbis documents
	    will reference this document in order to use the updated rules
	    for the default charset in "text/*" media types.</cref>
            </t>
    
            <t>
	    Many complex text subtypes, such as text/html and text/xml, have internal
	    (to their format) means of describing the charset.
	    Many existing user agents ignore the "US-ASCII" default for at least
	    text/html and text/xml.
            </t>

	    <t>This document changes RFC 2046 rules regarding default charset parameter
	    values for text/* media types to better align with common usage by existing
	    clients and servers.
	    </t>

        </section>

	<section title="Conventions Used in This Document">
	    
	    <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
	    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
	    this document are to be interpreted as described in
	    <xref target="RFC2119"/>.</t>
          
	</section>
	
	<section title="New rules for default charset parameter values for text/* media types">

	    <t>Section 4.1.2 of <xref target="RFC2046"/> says:</t>

	    <t>"The default character set, which must be assumed in the absence
	    of a charset parameter, is US-ASCII."</t>
	    
<!--///RFC 2046, Section 4.1.2 also says:
   Note that the character set used, if anything other than US-ASCII,
   must always be explicitly specified in the Content-Type field.
-->
	    
	    <t>As explained in the Introduction, this rule is considered
	    outdated, so this document replaces it with the following
	    rules:</t>

<!--///Ned wrote:
    In the absence of a specification of a default, I'm tempted
    to say the "default default" should be UTF-8.
-->

<!--///John Klensin wrote:
    Require text/* to be accompanied by a charset parameter
    always.  Of course, if one is omitted, which will happen
    in practice, people will assume the old rules.  But new
    and updated specs say "parameter MUST be supplied".
-->   
	    
	    <t>Each subtype of the "text" media type that uses the charset
	    parameter can define its own default value for that parameter,
	    including the absence of any default.
	    </t>
	    
	    <t>
<!-- jre I'm not sure I understand that "for interoperability..." part-->
	    In order to improve interoperability with deployed agents,
	    "text/*" media type definitions SHOULD either
	    a) recommend no default charset parameter value (i.e., the charset information
	    is transported inside the payload, for example as in "text/xml") or
	    b) require explicit unconditional inclusion of the charset parameter
	    with the default value.
<!--////Alexey: Hmmm, does this mean that the second choice above doesn't specify a default either?-->

	    "text/*" media types that can transport charset information inside the corresponding
	    payloads SHOULD NOT specify any default, in order to avoid conflicting
	    instructions if the charset parameter value and the value specified
	    in the payload do not agree.
	    </t>
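
	    <t>
	    As a non-normative illustration, the following Python sketch shows how a
	    recipient might detect a disagreement between the charset parameter and an
	    in-band declaration, using the XML encoding declaration as the in-band
	    example. The function names are hypothetical, and the pattern is a
	    simplification that assumes an ASCII-compatible payload without a byte
	    order mark; it is not a complete XML encoding detector.
	    </t>

	    <figure><artwork type="code"><![CDATA[
```python
import re
from typing import Optional

# Assumption: the payload is ASCII-compatible and starts with the XML
# declaration. BOM sniffing and UTF-16 payloads are deliberately ignored.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def inband_xml_charset(payload: bytes) -> Optional[str]:
    """Return the lowercased in-band encoding name, or None if absent."""
    match = _XML_DECL.match(payload)
    return match.group(1).decode("ascii").lower() if match else None


def charsets_conflict(parameter: Optional[str], payload: bytes) -> bool:
    """True only when both sources name a charset and the names differ."""
    inband = inband_xml_charset(payload)
    if parameter is None or inband is None:
        return False
    return parameter.lower() != inband
```
]]></artwork></figure>

	    <t>
	    In this sketch an absent charset parameter never produces a conflict,
	    which is the situation the SHOULD NOT above is meant to preserve: a
	    media type definition that supplied a default would reintroduce a
	    second, possibly contradictory, source of charset information.
	    </t>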

<!--////Alexey: Julian also suggested that only charsets that are supersets of US-ASCII
should be used as defaults. I.e. this would rule out UTF-16. I tend to agree.-->
	    <t>
	    New subtypes of the "text" media type that do define a default charset
	    SHOULD use the "UTF-8" <xref target='RFC3629'/> charset as the default.
	    </t>

	    <t>
	    Protocols that use MIME MUST NOT override the default charset values of
	    "text/*" media types with protocol-specific defaults.
	    </t>
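
	    <t>
	    The rules in this section can be read as a lookup order, sketched
	    non-normatively below in Python. The table of per-subtype defaults and the
	    function name are hypothetical: only the text/plain entry is taken from
	    this document, and "text/example" is an invented subtype standing in for
	    a new registration that chose UTF-8. The sketch relies on the standard
	    library's email.message.Message class to parse the header value.
	    </t>

	    <figure><artwork type="code"><![CDATA[
```python
from email.message import Message
from typing import Optional

# Hypothetical table of defaults taken from media type definitions.
# A subtype that is missing here defines no default at all.
SUBTYPE_DEFAULTS = {
    "text/plain": "us-ascii",  # unchanged from RFC 2046
    "text/example": "utf-8",   # invented subtype, for illustration only
}


def effective_charset(content_type: str) -> Optional[str]:
    """Resolve the charset from a Content-Type value.

    Returns the explicit parameter if present, else the subtype's own
    default, else None (the caller must then look inside the payload).
    """
    msg = Message()
    msg["Content-Type"] = content_type
    explicit = msg.get_content_charset()  # lowercased, or None if absent
    if explicit is not None:
        return explicit
    return SUBTYPE_DEFAULTS.get(msg.get_content_type())
```
]]></artwork></figure>

	    <t>
	    The function deliberately takes no argument naming the protocol in use,
	    since under these rules the result depends only on the media type and its
	    parameters.
	    </t>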
	    
        </section>

	<section title="Default charset parameter value for text/plain media type">

	    <t>The default charset parameter value for text/plain is unchanged
	    from <xref target="RFC2046"/> and remains "US-ASCII".</t>
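
	    <t>
	    As a non-normative illustration, a recipient that applies this default
	    literally could decode an unlabeled text/plain body as sketched below in
	    Python. The function name is hypothetical, and rejecting non-ASCII octets
	    rather than guessing at another charset is a choice made for this sketch,
	    not a requirement stated by this document.
	    </t>

	    <figure><artwork type="code"><![CDATA[
```python
def decode_unlabeled_text_plain(body: bytes) -> str:
    """Decode a text/plain body that carried no charset parameter."""
    # "strict" raises UnicodeDecodeError on any octet above 0x7F instead
    # of silently guessing at some other charset.
    return body.decode("ascii", errors="strict")
```
]]></artwork></figure>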
	    
        </section>
	
	
<!--////Do we also need to update the document that registers text/xml?-->


	<section anchor="security" title="Security Considerations">
	    
          <t>TBD. Guessing the default charset is a security problem.
	  Conflicting in-band and out-of-band charset information is also a security problem.
          </t>

	</section>

        <section anchor="iana" title="IANA Considerations">

            <t>
	      This document asks IANA to update the "text" subregistry of
	      the Media Types registry to additionally point to this document.
            </t>
	    
        </section>
    </middle>

    <back>
        <references title="Normative References">

	    &rfc2119;
	    &rfc2046;
	    &rfc3629;

        </references>

        <references title="Informative References">

	    &rfc2616;
      
      <!--////jre: here are citations for XML and HTML in W3C land, but maybe
      we should cite their MIME type registrations instead-->
	    
      <reference anchor='HTML'
                 target='http://www.w3.org/TR/1999/REC-html401-19991224'>
        <front>
          <title>HTML 4.01 Specification</title>
          <author fullname='Arnaud Le Hors' surname='Le Hors' initials='A.'/>
          <author fullname='David Raggett' surname='Raggett' initials='D.'/>
          <author fullname='Ian Jacobs' surname='Jacobs' initials='I.'/>
          <date year='1999' month='December' day='24'/>
        </front>
        <seriesInfo name='W3C Recommendation' value='REC-html401-19991224'/>
        <annotation>
          Latest version available at
          <eref target='http://www.w3.org/TR/html401'/>.
        </annotation>
      </reference>

      <reference anchor='XML'
                 target='http://www.w3.org/TR/2008/REC-xml-20081126/'>
        <front>
          <title>Extensible Markup Language (XML) 1.0 (Fifth Edition)</title>
          <author fullname='C. M. Sperberg-McQueen' surname='Sperberg-McQueen' initials='C. M.'/>
          <author fullname='Eve Maler' surname='Maler' initials='E.'/>
          <author fullname='Francois Yergeau' surname='Yergeau' initials='F.'/>
          <author fullname='Jean Paoli' surname='Paoli' initials='J.'/>
          <author fullname='Tim Bray' surname='Bray' initials='T.'/>
          <date year='2008' month='November' day='26'/>
        </front>
        <seriesInfo name='W3C Recommendation' value='REC-xml-20081126'/>
        <annotation>
          Latest version available at
          <eref target='http://www.w3.org/TR/xml'/>.
        </annotation>
      </reference>
	    
        </references>
    
    <section title="Acknowledgements">
	
      <t>
	Many thanks to Ned Freed and John Klensin for comments and ideas that motivated
	the creation of this document.
      </t>

    </section>
	
    </back>
</rfc>
