Network Working Group | J. Reschke |
Internet-Draft | greenbytes |
Intended status: Informational | July 2011 |
Expires: January 2012 |
The parsing of Uniform Resource Identifiers (URIs, RFC 3986) and Internationalized Resource Identifiers (IRIs, RFC 3987) is defined in terms of Augmented Backus-Naur Form (ABNF). The ABNF grammars are defined in terms of valid identifiers, and thus technically do not address how to handle invalid ones.¶
The URI specification, however, includes a note on how to use regular expressions for parsing, and this note applies to invalid identifiers as well. This document introduces terminology for potentially invalid identifiers and demonstrates how the rules in the URI specification can be applied to them.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress”.¶
This Internet-Draft will expire in January 2012.¶
Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.¶
Distribution of this document is unlimited. Although this is not a work item of the IRI Working Group, comments should be sent to the IRI mailing list at public-iri@w3.org, which may be joined by sending a message with subject "subscribe" to public-iri-request@w3.org.¶
Discussions of the IRI Working Group are archived at <http://lists.w3.org/Archives/Public/public-iri/>.¶
XML versions, latest edits, and the issues list for this document are available from <http://greenbytes.de/tech/webdav/#draft-reschke-ref-parsing>.¶
I edit (type: edit, status: open) | ||
julian.reschke@greenbytes.de | 2011-07-02 | Umbrella issue for editorial fixes/enhancements. |
I iri (type: change, status: open) | ||
julian.reschke@greenbytes.de | 2011-07-02 | Expand for IRIs. |
I proc (type: change, status: proc) | ||
julian.reschke@greenbytes.de | 2011-07-02 | Re-state the parsing algorithm as a procedural algorithm, maybe in JS? |
derhoermi@gmx.net | 2011-07-03 | We can turn the regular expression into a concise ABNF grammar if that helps, we might also adapt it if that is found to be necessary, but I do not see a reason why hundreds of lines of prose code or JavaScript code would help (and anyone who would like to have that anyway can easily derive it from the expression or from a grammar). |
I pre (type: change, status: proc) | ||
julian.reschke@greenbytes.de | 2011-07-02 | Define pre-processing steps for extraction of candidate references from content (WS stripping)? |
I post (type: change, status: proc) | ||
julian.reschke@greenbytes.de | 2011-07-02 | Define post-processing steps, such as query component rewriting based on document encoding. |
The parsing of Uniform Resource Identifiers (URIs, [RFC3986]) and Internationalized Resource Identifiers (IRIs, [RFC3987]) is defined in terms of Augmented Backus-Naur Form (ABNF). The ABNF grammars are defined in terms of valid identifiers, and thus technically do not address how to handle invalid ones.¶
The URI specification, however, includes a note on how to use regular expressions for parsing, and this note applies to invalid identifiers as well. This document introduces terminology for potentially invalid identifiers and demonstrates how the rules in the URI specification can be applied to them.¶
In addition to the terms defined in the URI specification, namely the Syntax Components (see Section 3 of [RFC3986]), this document defines:¶
Candidate URI Reference ¶
Candidate Scheme Component ¶
Candidate Authority Component ¶
Candidate Path Component ¶
Candidate Query Component ¶
Candidate Fragment Component ¶
The regular expression given in Appendix B of [RFC3986] will parse any input string into a Candidate Scheme Component, a Candidate Authority Component, a Candidate Path Component, a Candidate Query Component, and a Candidate Fragment Component. Of these five components, all except the Candidate Path Component can be undefined; the Candidate Path Component is always defined, although it may be empty.¶
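As an illustration only, the splitting step might be sketched in Python as follows. The regular expression is the one from Appendix B of [RFC3986]; the names `CandidateComponents` and `parse_candidate`, and the use of `re.DOTALL` so that line breaks in the input end up in a candidate component instead of truncating the match, are assumptions of this sketch and are not defined by this document.

```python
import re
from typing import NamedTuple, Optional

# Regular expression from Appendix B of RFC 3986.
# Assumption of this sketch: re.DOTALL lets the final ".*" also consume
# line breaks, so every character of the input lands in some component.
URI_REFERENCE_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?",
    re.DOTALL,
)


class CandidateComponents(NamedTuple):
    """Hypothetical container; None stands for an undefined component."""
    scheme: Optional[str]
    authority: Optional[str]
    path: str  # always defined, possibly empty
    query: Optional[str]
    fragment: Optional[str]


def parse_candidate(reference: str) -> CandidateComponents:
    """Split any input string into its five candidate components."""
    match = URI_REFERENCE_RE.match(reference)
    # Every group is optional or may match the empty string,
    # so the expression matches any input.
    assert match is not None
    return CandidateComponents(
        scheme=match.group(2),     # $2 in Appendix B
        authority=match.group(4),  # $4
        path=match.group(5),       # $5
        query=match.group(7),      # $7
        fragment=match.group(9),   # $9
    )
```

For the example used in Appendix B of [RFC3986], `parse_candidate("http://www.ics.uci.edu/pub/ietf/uri/#Related")` yields the scheme `http`, the authority `www.ics.uci.edu`, the path `/pub/ietf/uri/`, an undefined query, and the fragment `Related`. An invalid input such as `1http://exa mple/` is split in the same way, without any judgment about validity.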
If each of the defined components is valid according to the corresponding URI component definition, the input is a valid URI reference.¶
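A partial sketch of such a validity check, in Python, is shown below. It covers only the scheme, query, and fragment rules from Sections 3.1, 3.4, and 3.5 of [RFC3986]; the authority and path rules are more involved and are deliberately left out. The function names are hypothetical and not defined by this document.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )   (RFC 3986, Section 3.1)
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+\-.]*")

# query and fragment share the same rule: *( pchar / "/" / "?" )
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
QUERY_OR_FRAGMENT_RE = re.compile(
    r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*"
)


def is_valid_scheme(candidate: str) -> bool:
    """Check a defined Candidate Scheme Component against the scheme rule."""
    return SCHEME_RE.fullmatch(candidate) is not None


def is_valid_query_or_fragment(candidate: str) -> bool:
    """Check a defined Candidate Query or Fragment Component."""
    return QUERY_OR_FRAGMENT_RE.fullmatch(candidate) is not None
```

Under these rules `is_valid_scheme("1http")` is false because a scheme must start with a letter, and `is_valid_query_or_fragment("x y")` is false because a space is not allowed unescaped. An undefined component needs no check at all, so a caller would apply these functions only to components that are actually defined.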
I combine-valid (type: change, status: open) | ||
derhoermi@gmx.net | 2011-07-03 | In section 3.2 you have "The result will be a valid URI Reference if and only if the components used by the algorithm were valid themselves." I have some doubts about "only if", consider for instance removing dot segments, which might remove a malformed part, if I recall correctly. |
[rfc.comment.1: TBD] ¶
There are no IANA Considerations related to this specification.¶
<http://greenbytes.de/tech/tc/uris/> shows results for the parsing/resolution processing described above, based on a test implementation written in XSLT 2.0.¶
Added issue combine-valid.¶