Link: http://trac.tools.ietf.org/wg/httpbis/trac/ticket/20
Origin: http://www.w3.org/mid/B6C10798-0A18-4D37-AEC7-E93E8C0F102A@yahoo-inc.com
Component: p3-payload
RFC 2616, Section 3.7.1 states:
When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP.
However, many, if not all, of the text/* media types define their own defaults; text/plain (RFC2046), for example, defaults to ASCII, as does text/xml (RFC3023).
How do these format-specific defaults interact with HTTP's default? Is HTTP really overriding them?
I'm far from the first to be confused by this text, and I'm sure it's been asked before, but I haven't been able to find a definitive answer. If errata are still being considered, perhaps removing/modifying this line would be a good start...
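To make the conflict concrete, here is a minimal sketch of how a recipient might enumerate the competing defaults for a given Content-Type value. The function names, the result keys, and the small defaults table are hypothetical and only cover the two media types named above. The sketch deliberately does not pick a winner, since which default takes precedence is exactly the open question in this ticket.

```python
# Illustrative only: names and table contents are assumptions, not spec text.
HTTP_TEXT_DEFAULT = "ISO-8859-1"  # RFC 2616, Section 3.7.1 (text/* via HTTP)

# Format-specific defaults mentioned in this ticket; not an exhaustive list.
MEDIA_TYPE_DEFAULTS = {
    "text/plain": "US-ASCII",  # RFC 2046
    "text/xml": "US-ASCII",    # RFC 3023
}


def parse_content_type(value):
    """Split a Content-Type value into (media_type, explicit_charset_or_None).

    A deliberately simple parser; it does not handle every quoting rule.
    """
    media_type, _, rest = value.partition(";")
    charset = None
    for param in rest.split(";"):
        name, _, val = param.partition("=")
        if name.strip().lower() == "charset" and val.strip():
            charset = val.strip().strip('"')
    return media_type.strip().lower(), charset


def candidate_charsets(content_type):
    """Return every charset that some specification claims applies."""
    media_type, explicit = parse_content_type(content_type)
    if explicit is not None:
        # An explicit parameter leaves no ambiguity about defaults.
        return {"explicit": explicit}
    result = {}
    if media_type.startswith("text/"):
        result["http_default"] = HTTP_TEXT_DEFAULT
    if media_type in MEDIA_TYPE_DEFAULTS:
        result["media_type_default"] = MEDIA_TYPE_DEFAULTS[media_type]
    return result
```

For `text/xml` without a charset parameter, this yields two different candidates (the HTTP default and the media type's own default), which is the ambiguity being reported.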
From [146]:
Add directory for test cases, starting with encoding tests (addresses #20)
From [147]:
Set mime types for test files (addresses #20)
Resolution:
3. Add text to Security Considerations explaining UTF-7 vulnerability in browsers and exclude such charsets from the guessing algorithm. (see http://www.w3.org/mid/B412EABE-8E69-455F-A00B-A1ED1F386440@gbiv.com )
I'll be happy to apply the changes if somebody proposes the exact text to be added to the security considerations...
From [209]:
Remove character set defaulting for text media types (to be done: add security considerations WRT charset sniffing); relates to #20.
From [211]:
Back out change [209], see discussion around < http://lists.w3.org/Archives/Public/ietf-http-wg/2008JanMar/0233.html >; relates to #20.
From Dublin meeting minutes ( http://jabber.ietf.org/logs/httpbis/2008-07-29.txt ):
[09:20:47] <Thomas Roessler> julian: default character set for media types (text/*)?
[09:20:51] <Thomas Roessler> aleksey: oh gosh
[09:20:54] <Thomas Roessler> mnot: wary of this
[09:21:58] <Thomas Roessler> mnot: <grepping RFC 2616 for ISO-8859-1 occurences>
[09:22:08] <Thomas Roessler> mnot: well, the issue is...
[09:22:25] <Thomas Roessler> julian: the issue is when you look in different RFCs for default encoding of text/*, you get different answers
[09:22:33] <Thomas Roessler> ... text registration, text/xml registration, HTTP text
[09:22:37] <Thomas Roessler> ... wouldn't know which one is normative ...
[09:22:59] <Thomas Roessler> ... were close to getting rid of ISO-8859-1, but then Roy stepped in ...
[09:23:11] <Thomas Roessler> ... if we can't make normative change, might be useful to phrase this in a way that makes clear what's going on ...
[09:23:15] <Thomas Roessler> mnot: issue-20
[09:23:47] <Thomas Roessler> ... proposed tetx suggests that we override defaults ...
[09:24:00] <Thomas Roessler> ... relationship between the two isn't clear -- which takes precedence ...
[09:24:04] <Thomas Roessler> ... this is confusing people ...
[09:24:08] <Thomas Roessler> ... we had a proposal that we backed out ...
[09:24:26] <Thomas Roessler> mnot: roy, did you have a proposal for this that you remember?
[09:24:32] <roy.fielding> It was a deliberate decision to override MIME. Lots of discussion way back then.
[09:24:42] <Thomas Roessler> barry: <channeling roy>
[09:24:48] <roy.fielding> not that I can remember .. will search
[09:25:25] <Thomas Roessler> julian: If it was deliberate discussion to override MIME, should we now override text/...?
[09:25:44] <Thomas Roessler> mnot: remember there were historical reasons for iso-8859-1
[09:25:51] <roy.fielding> right, Mosaic puked on charset parameter
[09:26:06] <Thomas Roessler> julian: problem is that default is harmful for formats that carry their own charset info
[09:26:23] <Thomas Roessler> ... at least for text/xml, should document what's implemented in practice ...
[09:26:40] <Thomas Roessler> mnot: document
[09:26:55] <Thomas Roessler> ACTION: mnot to research previous discussion, and restate so we can get going again