<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc [
  <!ENTITY MAY "<bcp14 xmlns='http://purl.org/net/xml2rfc/ext'>MAY</bcp14>">
  <!ENTITY MUST "<bcp14 xmlns='http://purl.org/net/xml2rfc/ext'>MUST</bcp14>">
  <!ENTITY MUST-NOT "<bcp14 xmlns='http://purl.org/net/xml2rfc/ext'>MUST NOT</bcp14>">
  <!ENTITY OPTIONAL "<bcp14 xmlns='http://purl.org/net/xml2rfc/ext'>OPTIONAL</bcp14>">
  <!ENTITY RECOMMENDED "<bcp14 xmlns='http://purl.org/net/xml2rfc/ext'>RECOMMENDED</bcp14>">
  <!ENTITY REQUIRED "<bcp14 xmlns='http://purl.org/net/xml2rfc/ext'>REQUIRED</bcp14>">
  <!ENTITY SHALL "<bcp14 xmlns='http://purl.org/net/xml2rfc/ext'>SHALL</bcp14>">
  <!ENTITY SHALL-NOT "<bcp14 xmlns='http://purl.org/net/xml2rfc/ext'>SHALL NOT</bcp14>">
  <!ENTITY SHOULD "<bcp14 xmlns='http://purl.org/net/xml2rfc/ext'>SHOULD</bcp14>">
  <!ENTITY SHOULD-NOT "<bcp14 xmlns='http://purl.org/net/xml2rfc/ext'>SHOULD NOT</bcp14>">
	<!ENTITY rfc2119 PUBLIC '' 'bibxml/reference.RFC.2119.xml'>
	<!ENTITY rfc2046 PUBLIC '' 'bibxml/reference.RFC.2046.xml'> <!--MIME Part Two: Media Types-->
	<!ENTITY rfc2616 PUBLIC '' 'bibxml/reference.RFC.2616.xml'> <!--HTTP/1.1-->
	<!ENTITY rfc2854 PUBLIC '' 'bibxml/reference.RFC.2854.xml'> <!--text/html-->
	<!ENTITY rfc3023 PUBLIC '' 'bibxml/reference.RFC.3023.xml'> <!--text/xml-->    
	<!ENTITY rfc3629 PUBLIC '' 'bibxml/reference.RFC.3629.xml'> <!--UTF-8-->
]>
<!-- ?xml-stylesheet type='text/xsl' href='http://xml.resource.org/authoring/rfc2629.xslt' ? -->
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>

<?rfc toc="yes" ?>
<?rfc tocindent="no" ?>
<?rfc autobreaks="no" ?>
<?rfc comments="yes" ?>
<?rfc inline="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes"?>
<?rfc iprnotified="no" ?>
<?rfc strict="no" ?>
<?rfc compact="yes"?>

<rfc category="std"
     ipr="trust200902"
     updates="2046"
     docName="draft-ietf-appsawg-mime-default-charset-02"
     xmlns:x='http://purl.org/net/xml2rfc/ext'
     x:maturity-level="proposed">
    <x:feedback template="mailto:apps-discuss@ietf.org?subject={docname},%20%22{section}%22&amp;body=&lt;{ref}&gt;:"/> 
    <front>
        <title abbrev="MIME Charset Default Update">Update to MIME regarding Charset Parameter Handling in Textual&#160;Media&#160;Types</title>
        <author initials='A.' surname="Melnikov" fullname="Alexey Melnikov">
            <organization>Isode Limited</organization>
            <address>
              <postal>
                <street>5 Castle Business Village</street>
                <street>36 Station Road</street>
                <city>Hampton</city>
                <region>Middlesex</region>
                <code>TW12 2BX</code>
                <country>UK</country>
              </postal>
              <email>Alexey.Melnikov@isode.com</email>
            </address>
        </author>
        <author initials="J. F." surname="Reschke" fullname="Julian F. Reschke">
            <organization abbrev="greenbytes">greenbytes GmbH</organization>
            <address>
              <postal>
                <street>Hafenweg 16</street>
                <city>Muenster</city><region>NW</region><code>48155</code>
                <country>Germany</country>
              </postal>
              <email>julian.reschke@greenbytes.de</email>	
              <uri>http://greenbytes.de/tech/webdav/</uri>	
            </address>
        </author>
        <date year="2012" month="April" day="21"/>
        <area>Applications</area>
        <workgroup>Applications Area Working Group</workgroup>
        
        <keyword>MIME</keyword>
        <keyword>charset</keyword>
        <keyword>text</keyword>
        <abstract>
          <t>
            This document changes the rules in RFC 2046 regarding default "charset" parameter
	    values for "text/*" media types to better align with common usage by existing
	    clients and servers.
          </t>
        </abstract>
        <note title="Editorial Note (To be removed by RFC Editor)">
          <t>
            Discussion of this draft should take place on the Apps Area Working Group
            mailing list (apps-discuss@ietf.org), which is archived at
            <eref target="http://www.ietf.org/mail-archive/web/apps-discuss"/>.
          </t>
        </note>
    </front>

    <middle>
        <section title="Introduction and Overview">
            <t>

<!--////Alexey: this might need improvements-->

	    <xref target="RFC2046"/> specified that the default value of the "charset" parameter
	    (i.e., the value used when the parameter is not specified) is "US-ASCII".
	    <xref target="RFC2616"/> changed the default for use by HTTP to "ISO-8859-1".
	    That encoding is not very common for new "text/*" media types,
	    and a special rule in HTTP adds confusion
	    about which specification (<xref target="RFC2046"/> or <xref target="RFC2616"/>)
	    is authoritative with regard to the default charset for "text/*" media types.
	    
	    <!-- jre recommends raising an HTTPbis issue if one feels strongly about this
      
      <cref>At the time of writing of this document the IETF HTTPBIS WG is working
	    on an update to RFC 2616 which removes the default charset of "ISO-8859-1"
	    for "text/*" media types. It is expected that the set of HTTPBIs documents
	    will reference this document in order to use the updated rules
	    of default charset in "text/*" media types.</cref> -->
            </t>
    
            <t>
	    Many complex text subtypes, such as "text/html" <xref target="RFC2854"/> and "text/xml" <xref target="RFC3023"/>, have
	    means internal to their format for describing the charset.
	    Many existing user agents ignore the "US-ASCII" default for at least
	    "text/html" and "text/xml".
            </t>

	    <t>This document changes the rules in RFC 2046 regarding default "charset" parameter
	    values for "text/*" media types to better align with common usage by existing
	    clients and servers. It does not change the defaults for any currently
      registered media type.<!-- FIXME if we actually do change the default for text/plain-->
	    </t>
<!-- JR: we may also want to state that we do not define handling of broken messages-->
        </section>

	<section title="Conventions Used in This Document">
	    
	    <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
	    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
	    this document are to be interpreted as described in
	    <xref target="RFC2119"/>.</t>
          
	</section>
	
	<section title="New rules for default charset parameter values for text/* media types">

	    <t><xref target="RFC2046" x:sec="4.1.2" x:fmt="of"/> says:</t>

      <x:blockquote cite="http://tools.ietf.org/html/rfc2046#section-4.1.2">
        <t>The default character set, which must be assumed in the absence of a charset parameter, is US-ASCII.</t>
      </x:blockquote>

<!--///RFC 2046, Section 4.1.2 also says:
   Note that the character set used, if anything other than US- ASCII,
   must always be explicitly specified in the Content-Type field.
-->
	    
	    <t>As explained in the Introduction, this rule is considered
	    outdated, so this document replaces it with the following set
	    of rules:</t>

<!--///Ned wrote:
    In the absence of a specification of a default, I'm tempted
    to say the "default default" should be UTF-8.
-->

<!--///John Klensin wrote:
    Require text/* to be accompanied by a charset parameter
    always.  Of course, if one is omitted, which will happen
    in practice, people will assume the old rules.  But new
    and updated specs say "parameter MUST be supplied".
-->   
	    
	    <t>Each subtype of the "text" media type that uses the "charset"
	    parameter can define its own default value for the "charset" parameter,
	    including the absence of any default.
	    </t>
	    
	    <t>
<!-- jre I'm not sure I understand that "for interoperability..." part-->
<!--
Henri S.> For backwards compatibility, pretty much every existing text/* type
Henri S.> will have to violate this "SHOULD NOT".

Ned F.> Yep. That's the main reason why it needs to be a SHOULD.

JR: those who will update text/xml and text/html know how to read the SHOULD.
-->
	    In order to improve interoperability with deployed agents,
	    "text/*" media type registrations &SHOULD; either
      </t>
      <t>
      <list style="letters">
        <t>
          specify that the "charset" parameter is not used for the defined subtype,
    	    because the charset information is transported inside the payload (such as in "text/xml"), or
        </t>
        <t>
          require explicit, unconditional inclusion of the "charset" parameter,
    	    eliminating the need for a default value.
        </t>
      </list>
      </t>
      <t>
<!--////Alexey: Hmmm, does this mean that the second choice above doesn't specify a default either?-->

	    In accordance with option (a) above, registrations for "text/*" media types that can
	    transport charset information inside the corresponding payloads (such
	    as "text/html" and "text/xml") &SHOULD-NOT; specify
	    the use of a "charset" parameter or any default value, in order to
	    avoid conflicting interpretations should the charset parameter value
	    and the value specified in the payload disagree.</t>
	    
	    <t>
	    Thus, new subtypes of the "text" media type &SHOULD-NOT; define a
	    default "charset" value.  If there is a strong reason to do so
	    despite this advice, they &SHOULD; use the "UTF-8" <xref target='RFC3629'/> charset
	    as the default.
	    </t>

	    <t>
	    Specifications covering the "charset" parameter, and what
	    default value, if any, is used, are subtype-specific, not
      protocol-specific.  Protocols that use MIME, therefore, &MUST-NOT;
      override the default charset values of "text/*" media types with
      protocol-specific defaults.  Protocol definitions &MUST; leave that
	    to the subtype definitions.
	    </t>
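	    <t>
	    As a non-normative illustration, the following Python sketch shows one way a
	    receiver might apply these rules. The policy table, the "text/example-*"
	    subtype names, and the function name charset_for are hypothetical and exist
	    only for this example; only the "text/plain" default of "US-ASCII" is taken
	    from this document. Header parsing uses the Python standard library.
	    </t>
<figure><artwork type="code"><![CDATA[
```python
from email.message import Message

# Markers for the two registration styles recommended in this section.
IN_PAYLOAD = "in-payload"  # option (a): charset travels inside the payload
REQUIRED = "required"      # option (b): "charset" parameter is mandatory

# Hypothetical policy table. Only text/plain reflects this document;
# the text/example-* subtypes are made up for illustration.
SUBTYPE_POLICY = {
    "text/plain": "US-ASCII",
    "text/example-inband": IN_PAYLOAD,
    "text/example-explicit": REQUIRED,
    "text/example-legacy": "UTF-8",  # subtype that defines a default anyway
}


def charset_for(content_type_header):
    """Return the charset to use, or None if the payload must be inspected.

    The decision depends only on the subtype, never on the protocol
    that carried the header.
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    media_type = msg.get_content_type()   # lowercased "type/subtype"
    declared = msg.get_param("charset")   # None when the parameter is absent
    policy = SUBTYPE_POLICY.get(media_type)

    if policy == IN_PAYLOAD:
        # The parameter is not used for this subtype; ignore it.
        return None
    if declared is not None:
        return declared
    if policy is None or policy == REQUIRED:
        raise ValueError("no charset parameter and no subtype default for "
                         + media_type)
    return policy  # subtype-specific default
```
]]></artwork></figure>
	    <t>
	    In this sketch, charset_for("text/plain") yields "US-ASCII", while a missing
	    parameter on the hypothetical "text/example-explicit" subtype is reported as
	    an error instead of being replaced by a guessed or protocol-specific default.
	    </t>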
	    
        </section>

	<section title="Default charset parameter value for text/plain media type">

	    <t>The default "charset" parameter value for "text/plain" is unchanged
	    from <xref target="RFC2046"/> and remains "US-ASCII".</t>
	    
        </section>
		<section anchor="security" title="Security Considerations">
	    
          <t>
            Guessing the value of the charset parameter can lead to security issues
            such as content buffer overflows, denial of service, or bypass
            of filtering mechanisms. However, this document does not
            promote guessing; it encourages the use of charset information
            that is specified by the sender.
          </t>
          <t>
            Conflicts between in-band and out-of-band charset information can also lead to
            similar security problems, and this document recommends the use
            of charset information that is more likely to be correct (for
            example, in-band over out-of-band).
          </t>
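          <t>
            As a non-normative illustration of preferring in-band information, the
            following Python sketch looks for an encoding declaration at the start of
            an XML payload and uses it in preference to an out-of-band value. The
            function name pick_charset and the simplified pattern are assumptions
            made for this example; the pattern only covers payloads in
            ASCII-compatible encodings and is not a complete XML encoding detector.
          </t>
<figure><artwork type="code"><![CDATA[
```python
import re

# Simplified pattern for an XML declaration with an encoding pseudo-attribute.
# Assumption: the payload starts with the declaration in an ASCII-compatible
# encoding; UTF-16 payloads and byte order marks are not handled here.
_XML_DECL = re.compile(
    rb"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def pick_charset(payload, out_of_band=None):
    """Prefer the charset declared in the payload over the out-of-band one."""
    match = _XML_DECL.match(payload)
    if match:
        return match.group(1).decode("ascii")
    # No in-band declaration found: fall back to the sender-supplied value,
    # which may be None. No guessing is attempted.
    return out_of_band
```
]]></artwork></figure>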

	</section>

        <section anchor="iana" title="IANA Considerations">

            <t>
	      This document asks IANA to update the "text" subregistry of
	      the Media Types registry to additionally point to this document.
            </t>
	    
        </section>
    </middle>

    <back>
        <references title="Normative References">

	    &rfc2119;
	    &rfc2046;
	    &rfc3629;

        </references>

        <references title="Informative References">

	    &rfc2616;
	    &rfc2854;
	    &rfc3023;
	    
        </references>
    
    <section title="Acknowledgements">
	
      <t>
	Many thanks to Ned Freed and John Klensin for comments and ideas that motivated
	the creation of this document, and to Carsten Bormann, Murray S. Kucherawy, Barry Leiba, and Henri Sivonen for feedback and text suggestions.
      </t>

    </section>
	
    </back>
</rfc>
