By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress”.
The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.
This Internet-Draft will expire in January 2007.
Copyright © The Internet Society (2006). All Rights Reserved.
Both WebDAV servers and clients frequently map URI-escaped characters inside a path segment to non-ASCII characters. These mappings can only be interoperable if there is a consensus about the appropriate character encoding. This document specifies a default encoding that is compatible with both the recommendations for URIs in HTML content and the "Internationalized Resource Identifiers" (IRI) specification.
Furthermore, servers that implement a mapping to locally constrained names frequently do not support specific names, or silently map "similar" names to the same resource (for instance when content is stored in a filesystem that is case-preserving, but not case-sensitive). For these cases, discovery and error signalling features are defined.
Distribution of this document is unlimited. Please send comments to the Distributed Authoring and Versioning (WebDAV) working group at firstname.lastname@example.org, which may be joined by sending a message with subject "subscribe" to email@example.com.
Discussions of the WEBDAV working group are archived at URL: <http://lists.w3.org/Archives/Public/w3c-dist-auth/>.
|I edit (type: edit, status: open)|
|firstname.lastname@example.org||2005-11-15||Umbrella issue for editorial fixes/enhancements.|
|Associated changes in this document: 3, 5, 5.1.1, 5.1.2, 5.2, 7.1.1, 12.2.|
|I UNICODE_NORMALIZATION (type: change, status: open)|
|email@example.com||2005-11-15||(pointed out by Jim Luther:) Servers may do Unicode normalization, thus not being able to roundtrip arbitrary UTF-8. Potentially one specific normalization form needs to be recommended.|
Both WebDAV servers and clients frequently map URI-escaped characters (see [RFC3986]) inside a path segment to non-ASCII characters. These mappings can only be interoperable if there is a consensus about the appropriate character encoding. This document specifies a default encoding that is compatible with both the recommendations for URIs in HTML content (see [HTML], Appendix B.2.1) and the IRI specification [RFC3987].
Furthermore, servers that implement a mapping to locally constrained names frequently do not support specific names, or silently map "similar" names to the same resource (for instance when content is stored in a filesystem that is case-preserving, but not case-sensitive). For these cases, discovery and error signalling features are defined.
The terminology used here follows that in WebDAV [RFC2518], HTTP [RFC2616] and "Versioning Extensions to WebDAV" [RFC3253]. Definitions of the terms resource, Uniform Resource Identifier (URI), and Uniform Resource Locator (URL) are provided in [RFC3986].
In proposing a common mapping, the following requirements were taken into account:
The only widely deployed character encoding fulfilling these requirements is the UTF-8 character encoding, defined in [RFC3629]. Consequently, it's also the encoding recommended for URLs in HTML content ([HTML], Appendix B.2.1) and for IRIs ([RFC3987]).
Therefore, clients and servers SHOULD use the UTF-8 character encoding to map non-ASCII characters to/from character sequences in URL segments.
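The recommendation above can be illustrated with a minimal Python sketch. The helper names `segment_from_name` and `name_from_segment` are hypothetical and not defined by this document; the sketch assumes that a whole name is carried in a single path segment, so that "/" inside a name is escaped as well.

```python
from urllib.parse import quote, unquote


def segment_from_name(name: str) -> str:
    """Percent-encode one path segment, using UTF-8 for non-ASCII characters."""
    # safe="" escapes "/" too, because the name is assumed to be a single segment.
    return quote(name, safe="", encoding="utf-8")


def name_from_segment(segment: str) -> str:
    """Decode one path segment; raises UnicodeDecodeError if it is not valid UTF-8."""
    return unquote(segment, encoding="utf-8", errors="strict")


segment = segment_from_name("Bücher")
name = name_from_segment(segment)
```

Here "ü" (U+00FC) becomes the two escaped octets %C3%BC, and decoding the segment yields the original name again.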
When mapping HTTP URL segments (see [RFC3986], section 3.3) to local storage, the server's behaviour usually depends on the API used to access that storage. In practice, two styles are widely deployed: binary and character-based. The sections below discuss the implications of each and also describe an "identity" mapping.
A typical scenario for this case is when the server does a direct mapping between URLs and objects in a filesystem, and the filesystem uses filenames based on byte sequences. This is the case for typical Unix filesystem implementations.
In this case, mapping between URL segments and local names is straightforward:
The advantage of this simple mapping is that it faithfully stores whatever the original URL contained. On the other hand, this is a binary encoding, and programs that display filenames usually have to map the byte sequence to a character sequence for display. Unless both character encodings match, the results will be either inaccurate (incorrect characters) or the display function will break completely (for instance when an attempt is made to UTF-8-decode a byte stream that was originally encoded using an incompatible encoding such as ISO-8859-1).
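A byte-based mapping of this kind, and the display problem it can cause, might look as follows in Python. This is only a sketch: the function names are hypothetical, and the choice to substitute U+FFFD for undecodable bytes when displaying a name is an assumption made for illustration.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes


def segment_to_local_bytes(segment: str) -> bytes:
    """Unescape a URL segment into the raw byte sequence stored as the local name."""
    return unquote_to_bytes(segment)


def local_bytes_to_segment(name: bytes) -> str:
    """Escape a stored byte sequence back into a URL segment."""
    return quote_from_bytes(name, safe="")


def display_name(name: bytes, encoding: str = "utf-8") -> str:
    """Map stored bytes to characters for display (assumption: bad bytes become U+FFFD)."""
    return name.decode(encoding, errors="replace")


# A client that escaped "café" using ISO-8859-1 instead of UTF-8:
stored = segment_to_local_bytes("caf%E9")
round_tripped = local_bytes_to_segment(stored)
shown = display_name(stored)
```

The segment round-trips unchanged, but a display function that assumes UTF-8 cannot render the last byte correctly; with strict error handling the decode would fail outright.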
Things get even more complicated when there is no single character encoding being used on the server. For instance, in a Unix system multiple users may use different character encodings for filenames. However, the filesystem does not preserve information about what character encoding the filename was encoded with; thus, depending on their "locale" settings, different users will see different names for the same filesystem object.
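The effect of differing locale settings can be reproduced in a few lines of Python. The helper `name_as_seen` is hypothetical; it stands in for whatever a user's tools do when they decode a stored filename with the encoding of the current locale.

```python
def name_as_seen(stored_name: bytes, locale_encoding: str) -> str:
    """Decode a stored filename the way a tool using the given locale encoding would."""
    return stored_name.decode(locale_encoding, errors="replace")


# One filesystem object whose name was written by a user with a UTF-8 locale.
stored_name = "Bücher".encode("utf-8")

utf8_view = name_as_seen(stored_name, "utf-8")
latin1_view = name_as_seen(stored_name, "iso-8859-1")
```

The same byte sequence yields two different visible names, because nothing in the stored name records which encoding produced it.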
This scenario is similar to the one discussed in the previous section (5.1.1). For instance, it occurs when objects are stored locally in a way that allows Unicode characters in names, such as filenames in the Windows filesystem.
However, in addition to the mapping to byte sequences, a further mapping to a character sequence is required. As discussed in Section 4, this mapping should use the UTF-8 character encoding ([RFC3629]). Thus, here the mapping can be described as:
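As an illustration only, the two-step mapping (unescaping to bytes, then UTF-8 decoding to characters) might be sketched in Python as shown here. The function names are hypothetical, and rejecting segments that are not valid UTF-8 is an assumption of this sketch rather than a requirement stated in this document.

```python
from urllib.parse import quote, unquote_to_bytes


def segment_to_local_name(segment: str) -> str:
    """Map a URL segment to a character-based local name."""
    raw = unquote_to_bytes(segment)  # step 1: escaped octets to bytes
    # Step 2: bytes to characters. Strict decoding raises UnicodeDecodeError
    # for segments that do not contain valid UTF-8.
    return raw.decode("utf-8")


def local_name_to_segment(name: str) -> str:
    """Map a character-based local name back to a URL segment."""
    return quote(name, safe="", encoding="utf-8")


local_name = segment_to_local_name("%E6%97%A5%E6%9C%AC")
segment = local_name_to_segment(local_name)
```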
Finally, it's also possible to simply store the URL segments character by character, in which case no special mapping considerations apply. Note that this approach may be inefficient if the names contain many URL-escaped sequences (such as when Asian characters have been encoded using UTF-8).
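The size overhead of storing escaped segments verbatim can be estimated with a short Python sketch. The example name and the helper `storage_sizes` are chosen for illustration only.

```python
from urllib.parse import quote


def storage_sizes(name: str) -> tuple[int, int, int]:
    """Return (characters, UTF-8 bytes, characters of the escaped segment)."""
    escaped = quote(name, safe="", encoding="utf-8")
    return len(name), len(name.encode("utf-8")), len(escaped)


chars, utf8_bytes, escaped_chars = storage_sizes("日本語")
```

For this three-character name, each character takes three UTF-8 octets and each octet takes three characters once escaped, so the identity mapping stores 27 characters instead of 3.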
The non-trivial mappings have the common drawback that certain sets of legal HTTP URLs cannot be mapped to local names (and therefore usually need to be rejected). For the byte sequence mapping described in Section 5.1.1, this will usually be just the null character.
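For the byte sequence mapping, a server-side check of this kind might be sketched as follows. The function name and the use of `ValueError` are assumptions made for illustration; other constraints of a particular storage backend are outside the scope of this sketch.

```python
from urllib.parse import unquote_to_bytes


def to_byte_filename(segment: str) -> bytes:
    """Unescape a URL segment, rejecting names that contain the null character."""
    name = unquote_to_bytes(segment)
    if b"\x00" in name:
        # The URL is legal, but it cannot be mapped to a local name.
        raise ValueError("segment contains an escaped null character")
    return name
```

A server would typically turn such a failure into an error response for the request instead of silently creating a resource under a different name.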
However, when using the character mapping described in Section 5.1.2, whole Unicode character ranges may be either impossible to represent (such as when the underlying filesystem supports only a Unicode subset) or explicitly disallowed (such as non-normalized character sequences, see [CNORM], section 3.2).
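A check for non-normalized character sequences might look like the following Python sketch. It assumes Normalization Form C as the required form, which this document does not mandate, and the function name is hypothetical.

```python
import unicodedata
from urllib.parse import unquote


def is_nfc_segment(segment: str) -> bool:
    """Return True if the UTF-8 decoded segment is already in NFC (assumed form)."""
    name = unquote(segment, encoding="utf-8", errors="strict")
    return unicodedata.normalize("NFC", name) == name


precomposed = is_nfc_segment("caf%C3%A9")   # "é" as the single character U+00E9
decomposed = is_nfc_segment("cafe%CC%81")   # "e" followed by combining U+0301
```

Both segments display as the same name, but only the first is in NFC; a server that enforces normalization would have to reject or rewrite the second.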
In general, the mappings discussed in Section 5.1.2 apply to clients as well. Whether a client maps segments to byte or character sequences usually depends on the platform it runs on, and what system layer it uses. For instance, a filesystem driver for a Unix system usually will have to translate to byte sequences (because that's how many Unix systems internally represent filenames).
However, if the client needs to do any mapping at all, there may be situations where parts of a URL segment can't be mapped to what the client needs internally. In cases like these, it is recommended that the client signal the problem and provide a way to repair it (such as renaming the resource).
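One way a client could separate mappable from unmappable segments, so that the latter can be reported to the user, is sketched below in Python. The function name is hypothetical, and the sketch assumes a client that maps segments to character sequences using UTF-8.

```python
from urllib.parse import unquote


def split_mappable(segments: list[str]) -> tuple[dict[str, str], list[str]]:
    """Return (segment -> local name) for mappable segments, plus the problem segments."""
    mappable: dict[str, str] = {}
    problems: list[str] = []
    for segment in segments:
        try:
            mappable[segment] = unquote(segment, encoding="utf-8", errors="strict")
        except UnicodeDecodeError:
            # Keep the original segment so the user can be offered a rename.
            problems.append(segment)
    return mappable, problems


names, problems = split_mappable(["B%C3%BCcher", "caf%E9"])
```

The client can then present the entries in `problems` separately, for instance with an option to rename the affected resources.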
Servers that use a non-identity mapping may not be able to create new resources with the URLs specified by the client (such as in an MKCOL or a PUT request).
Clients that use a non-identity mapping may not be able to handle all URLs returned by a server (such as a result of a PROPFIND request).
All of the security considerations of HTTP/1.1 and the WebDAV Distributed Authoring Protocol specification also apply to this protocol specification.
TBD: add notes about the inherent security risks when a backend storage maps multiple notations to the same physical object (file), think uppercase/lowercase, trailing blanks/dots, resolution of relative paths ("./", "../").
There are no IANA Considerations.
This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.
This document and the information contained herein are provided on an “AS IS” basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at firstname.lastname@example.org.