httpbis: Ticket #20: Default charsets for text media types

Link:  http://trac.tools.ietf.org/wg/httpbis/trac/ticket/20

Origin:  http://www.w3.org/mid/B6C10798-0A18-4D37-AEC7-E93E8C0F102A@yahoo-inc.com

Component: p3-payload

RFC 2616 Section 3.7.1 states:

When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP.

However, many, if not all, of the text/* media types define their own defaults; text/plain (RFC2046), for example, defaults to ASCII, as does text/xml (RFC3023).

How do these format-specific defaults interact with HTTP's default? Is HTTP really overriding them?
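
To make the two readings concrete, here is a minimal Python sketch. It is not taken from any specification text. The table of format defaults lists only the two examples named above, and the function name, the `http_overrides` flag, and the fallback for unlisted text subtypes are all hypothetical choices made for illustration.

```python
from email.message import Message
from typing import Optional

# Format-specific defaults named in this ticket (RFC 2046, RFC 3023).
FORMAT_DEFAULTS = {
    "text/plain": "us-ascii",
    "text/xml": "us-ascii",
}

# Default stated by RFC 2616 Section 3.7.1 for text/* received via HTTP.
HTTP_TEXT_DEFAULT = "iso-8859-1"


def effective_charset(content_type: str, http_overrides: bool) -> Optional[str]:
    """Return the charset a recipient would assume for a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type

    explicit = msg.get_param("charset")
    if isinstance(explicit, tuple):
        # RFC 2231 encoded parameter: (charset, language, value)
        explicit = explicit[2]
    if explicit is not None:
        # An explicit label always wins; the question is only about defaults.
        return explicit.lower()

    media_type = msg.get_content_type()
    if not media_type.startswith("text/"):
        return None  # The defaulting rule only concerns text/* types.

    if http_overrides:
        # Reading 1: HTTP overrides whatever the media type defines.
        return HTTP_TEXT_DEFAULT

    # Reading 2: the media type's own default applies. Falling back to the
    # HTTP default for unlisted subtypes is an assumption of this sketch.
    return FORMAT_DEFAULTS.get(media_type, HTTP_TEXT_DEFAULT)
```

Under the first reading an unlabelled text/xml response is ISO-8859-1; under the second it is US-ASCII. That difference is exactly what the question above asks about.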

I'm far from the first to be confused by this text, and I'm sure it's been asked before, but I haven't been able to find a definitive answer. If errata are still being considered, perhaps removing/modifying this line would be a good start...

Associated Checkins

julian.reschke@gmx.de (Fri, 04 Jan 2008 15:24:35 GMT)
[146]: Add directory for test cases, starting with encoding tests (addresses #20)
julian.reschke@gmx.de (Fri, 04 Jan 2008 15:27:49 GMT)
[147]: Set mime types for test files (addresses #20)
julian.reschke@gmx.de (Tue, 12 Feb 2008 13:46:02 GMT)
[209]: Remove character set defaulting for text media types (to be done: add ...
julian.reschke@gmx.de (Thu, 14 Feb 2008 11:18:12 GMT)
[211]: Back out change [209], see discussion around ...

History

: comment added; version, component, milestone set (Fri, 04 Jan 2008 06:18:54 GMT)

: comment added (Fri, 04 Jan 2008 15:24:35 GMT)

From [146]:

Add directory for test cases, starting with encoding tests (addresses #20)

: comment added (Fri, 04 Jan 2008 15:27:49 GMT)

From [147]:

Set mime types for test files (addresses #20)

: comment added; milestone changed (Tue, 05 Feb 2008 15:34:10 GMT)

Resolution:

  1. Remove < http://tools.ietf.org/id/draft-ietf-httpbis-p3-payload-01.txt >, section 2.3.1, the entire fourth paragraph (i.e., the last one in that section).
  2. From 2.1.1: Move """HTTP/1.1 recipients MUST respect the charset label provided by the sender; and those user agents that have a provision to "guess" a charset MUST use the charset from the content-type field if they support that charset, rather than the recipient's preference, when initially displaying a document. """ to the end of 2.3.1, removing the rest of 2.1.1.
  3. Add text to Security Considerations explaining UTF-7 vulnerability in browsers and exclude such charsets from the guessing algorithm. (see http://www.w3.org/mid/B412EABE-8E69-455F-A00B-A1ED1F386440@gbiv.com )
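
As a rough illustration of items 2 and 3, the following Python sketch honours a supported charset label and excludes UTF-7 from guessing only. The function names, the `guess` callback, and the ISO-8859-1 fallback are hypothetical. No exclusion list or guessing algorithm is specified in this ticket, so UTF-7 is listed only because it is the one charset the resolution names.

```python
import codecs
from typing import Callable, Optional

# Charsets the guessing step may never return (assumption: only UTF-7,
# the one charset named in the resolution).
EXCLUDED_FROM_GUESSING = {"utf-7"}


def codec_name(label: str) -> Optional[str]:
    """Return Python's canonical codec name for a label, or None if unknown."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        return None


def choose_charset(
    label: Optional[str],
    guess: Callable[[], Optional[str]],
    fallback: str = "iso-8859-1",  # assumption: HTTP's text/* default
) -> str:
    # Item 2: a supported charset label from the sender takes priority over
    # anything the recipient might guess.
    if label and codec_name(label) is not None:
        return label.lower()

    # Item 3: guessing is allowed only when there is no usable label, and
    # excluded charsets are dropped from its results.
    guessed = guess()
    if guessed:
        name = codec_name(guessed)
        if name is not None and name not in EXCLUDED_FROM_GUESSING:
            return guessed.lower()

    return fallback
```

In this sketch an explicit UTF-7 label is still honoured, because the exclusion covers guessing only. Whether labelled UTF-7 should also be refused is not settled by the resolution text.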

: comment added (Tue, 05 Feb 2008 17:15:19 GMT)

3. Add text to Security Considerations explaining UTF-7 vulnerability in browsers and exclude such charsets from the guessing algorithm. (see http://www.w3.org/mid/B412EABE-8E69-455F-A00B-A1ED1F386440@gbiv.com )

I'll be happy to apply the changes if somebody proposes the exact text to be added to the security considerations...

: comment added (Tue, 12 Feb 2008 13:46:02 GMT)

From [209]:

Remove character set defaulting for text media types (to be done: add security considerations WRT charset sniffing); relates to #20.

: comment added (Thu, 14 Feb 2008 11:18:12 GMT)

From [211]:

Back out change [209], see discussion around < http://lists.w3.org/Archives/Public/ietf-http-wg/2008JanMar/0233.html >; relates to #20.

: comment added; milestone changed (Thu, 28 Feb 2008 05:24:28 GMT)

: comment added; milestone changed (Wed, 02 Apr 2008 05:59:56 GMT)

: comment added (Tue, 04 Nov 2008 13:30:21 GMT)

From Dublin meeting minutes ( http://jabber.ietf.org/logs/httpbis/2008-07-29.txt ):

[09:20:47] <Thomas Roessler> julian: default character set for media types (text/*)?
[09:20:51] <Thomas Roessler> aleksey: oh gosh
[09:20:54] <Thomas Roessler> mnot: wary of this
[09:21:58] <Thomas Roessler> mnot: <grepping RFC 2616 for ISO-8859-1 occurrences>
[09:22:08] <Thomas Roessler> mnot: well, the issue is...
[09:22:25] <Thomas Roessler> julian: the issue is when you look in different RFCs for default encoding of text/*, you get different answers
[09:22:33] <Thomas Roessler> ... text registration, text/xml registration, HTTP text
[09:22:37] <Thomas Roessler> ... wouldn't know which one is normative ...
[09:22:59] <Thomas Roessler> ... were close to getting rid of ISO-8859-1, but then Roy stepped in ...
[09:23:11] <Thomas Roessler> ... if we can't make normative change, might be useful to phrase this in a way that makes clear what's going on ...
[09:23:15] <Thomas Roessler> mnot: issue-20
[09:23:47] <Thomas Roessler> ... proposed text suggests that we override defaults ...
[09:24:00] <Thomas Roessler> ... relationship between the two isn't clear -- which takes precedence ...
[09:24:04] <Thomas Roessler> ... this is confusing people ...
[09:24:08] <Thomas Roessler> ... we had a proposal that we backed out ...
[09:24:26] <Thomas Roessler> mnot: roy, did you have a proposal for this that you remember?
[09:24:32] <roy.fielding> It was a deliberate decision to override MIME.  Lots of discussion way back then.
[09:24:42] <Thomas Roessler> barry: <channeling roy>
[09:24:48] <roy.fielding> not that I can remember .. will search
[09:25:25] <Thomas Roessler> julian: If it was deliberate discussion to override MIME, should we now override text/...?
[09:25:44] <Thomas Roessler> mnot: remember there were historical reasons for iso-8859-1
[09:25:51] <roy.fielding> right, Mosaic puked on charset parameter
[09:26:06] <Thomas Roessler> julian: problem is that default is harmful for formats that carry their own charset info
[09:26:23] <Thomas Roessler> ... at least for text/xml, should document what's implemented in practice ...
[09:26:40] <Thomas Roessler> mnot: document
[09:26:55] <Thomas Roessler> ACTION: mnot to research previous discussion, and restate so we can get going again
