URIs as IDs in ModSAX (was Re: ModSAX (SAX 1.1) Proposal) (fwd)

Wed Feb 17 19:47:20 GMT 1999

On Wed, 17 Feb 1999, Robb Shecter wrote:

> > From: David Megginson <david at megginson.com>
> >  > Justification:
> >  >      setFeature needs to be passed an identifying string that
> >  >      uniquely picks out some property. Rather than go with the
> >  >      pseudo-uri's in Java, it would be more in the spirit of XML
> >  >      and Web to use URIs. Using URIs is also more friendly towards
> >  >      non-Java implementations of the SAX API.
> >
> > I'm strongly inclined to agree.  Does anyone have a strong case
> > against?
> 
> Hi,
> 
> I'm joining the discussion a bit late, but it seems to me that allowing something that looks
> like a URL that's not a URL is a bad idea.  This seems to have introduced unnecessary
> confusion into namespaces:  It makes an accompanying explanation and admonishment necessary

That was not the proposal. The proposal was to allow SAX features to
be individuated using Uniform Resource Identifiers (URIs). URIs are a
superset of URLs and URNs, so a non-URL URI might be used. The XML-Data 
proposal, admirably, used uuid: URIs to make this point
(e.g. urn:uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882/).

Excerpt from RFC 2396 Uniform Resource Identifiers (URI): Generic Syntax
available from http://www.isi.edu/in-notes/rfc2396.txt

   A Uniform Resource Identifier (URI) is a compact string of characters
   for identifying an abstract or physical resource.  
[...]
   This document defines a grammar that is a superset of all valid URI,
   such that an implementation can parse the common components of a URI
   reference without knowing the scheme-specific requirements of every
   possible identifier type
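
To illustrate the point of that excerpt, here is a minimal sketch in
Python (chosen only for brevity; nothing here is part of SAX). It assumes
the standard library's urllib.parse.urlsplit is an acceptable stand-in
for a generic URI parser, and the helper name generic_components is made
up for this example. It splits both a non-URL URI and an http URL into
their common components without any scheme-specific knowledge:

```python
from urllib.parse import urlsplit

# Two identifiers taken from this message: one non-URL URI, one URL.
identifiers = [
    "urn:uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882/",
    "http://intranet.whitehouse.gov/identifiers/SecretSaxFilter",
]


def generic_components(uri):
    """Return (scheme, authority, path) using only the generic syntax."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path


for uri in identifiers:
    # No network access happens here; the strings are only parsed.
    print(generic_components(uri))
```

Neither identifier is resolved or fetched; the parser only needs the
generic syntax to find the scheme and the rest of the identifier.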

> ("Don't assume there's anything at this 'not-URL' ").  From a software engineering point of

(I'd argue that software engineers who write code that assumes all URLs
can be unproblematically de-referenced are asking for trouble, but
that's beside the point.)

> view, I think it'd be better to choose something that doesn't require the extra documentation
> - Instead of saying, "Watch out for the problem here...", we should not create the problem in
> the first place.

We are not creating a problem. It is fine to use a URL to identify a SAX
property, but by choosing to allow _all_ forms of URI we leave room for
other approaches. This echoes the approach taken by XML namespaces.

Again, from the URI spec:

      Resource
         A resource can be anything that has identity.  Familiar
         examples include an electronic document, an image, a service
         (e.g., "today's weather report for Los Angeles"), and a
         collection of other resources.  Not all resources are network
         "retrievable"; e.g., human beings, corporations, and bound
         books in a library can also be considered resources

...Java classes, Perl modules and SAX filters are equally 'resources' by
this definition.

> I think that the Java standard is very good. I don't think that it's unfriendly towards
> non-Java implementations:  it is after all, only a standard, and not hardcoded into the
> language.

I disagree. The string 'util.tools.png' is as meaningless in the Java
community as in the wider world. The Java package naming convention has
the look but not the substance of a hierarchically managed namespace.
'util.tools.png' does not uniquely name anything except within the
context of a group of consenting adults who've agreed a set of
conventions for doing so. I don't think the Java world has agreed to do
this yet. The Web approach, using URIs, seems more mature than Java's
way of doing so. (Maybe a URI scheme for Java package/class naming might
be on the cards for Java 3...?)

>  The main problem with it is that users who do not have a measure of authority at
> their organization can have problems.  So, for example, I've released some open source code
> with the package name:
> 
> org.acm.robb
> 
> ...because I have the e-mail address robb at acm.org.  That works until the ACM decides to put a
> host named "robb" on the acm network.  (OK, that may not happen :) - but the point is valid.)

Quite! That's the problem URIs address, at least within the context of
the Web.
> 
> So, I'm for a clear format, maybe Java-like, that --doesn't-- resemble a URL, that solves the
> above issue.

URIs do this. Though I really don't see a problem with using URIs that
are URLs, so long as we bear in mind that knowing the URL for a resource
is not, and never has been, a guarantee that you'll be able to
de-reference it. For example, there may well be an
http://intranet.whitehouse.gov/ resource in URL (and hence URI) space,
but I'm unlikely to ever be able to access it. Similarly,
http://intranet.whitehouse.gov/identifiers/SecretSaxFilter might serve
to name a SAX module. 
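
A rough sketch of what "naming without de-referencing" could look like,
again in Python. The FeatureRegistry class and its set_feature and
get_feature methods are hypothetical stand-ins invented for this
example; they are not the ModSAX proposal or any real SAX API. The only
assumption is that a feature identifier must be an absolute URI, which
is checked by looking for a scheme:

```python
from urllib.parse import urlsplit


class FeatureRegistry:
    """Hypothetical feature table keyed by URI (not a real SAX class)."""

    def __init__(self):
        self._features = {}

    def set_feature(self, feature_id, state):
        # Require only that the identifier is an absolute URI (has a scheme).
        if not urlsplit(feature_id).scheme:
            raise ValueError("feature id must be an absolute URI: %r" % feature_id)
        # The URI is used purely as a lookup key; it is never de-referenced.
        self._features[feature_id] = bool(state)

    def get_feature(self, feature_id):
        # Unknown features are reported as switched off in this sketch.
        return self._features.get(feature_id, False)


registry = FeatureRegistry()
registry.set_feature("http://intranet.whitehouse.gov/identifiers/SecretSaxFilter", True)
registry.set_feature("urn:uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882/", False)
```

A URL and a non-URL URI are treated identically, and a bare string such
as 'util.tools.png' is rejected because it carries no scheme.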

The fact that HTTP allows for content negotiation of language & format
specific views into resources in the http://* namespace also buys some
room to think. You could have English, French and German HTML docs, or
text/xml or application/java-serialised-class or whatever, all
accessible (or not; dereferencing isn't a right) via the same abstract
URL. URLs can be a lot more abstract than people give credit for...
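
As a small illustration of that negotiation, the sketch below builds
(but never sends) several requests for one and the same URL, differing
only in their Accept and Accept-Language headers. It uses Python's
standard urllib.request.Request; the helper name negotiated_request and
the particular header values are assumptions made for the example:

```python
from urllib.request import Request

# One abstract URL, taken from the example earlier in this message.
RESOURCE = "http://intranet.whitehouse.gov/identifiers/SecretSaxFilter"


def negotiated_request(media_type, language):
    """Build, without sending, a request for one view of RESOURCE."""
    return Request(
        RESOURCE,
        headers={"Accept": media_type, "Accept-Language": language},
    )


# Three different views requested through the same identifier.
views = [
    negotiated_request("text/html", "en"),
    negotiated_request("text/html", "fr"),
    negotiated_request("text/xml", "de"),
]
```

Whether a server would honour any of these requests is a separate
matter; the identifier stays the same in every case.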

Dan

ps. one last excerpt from the URI spec; sorry if I'm doing this point to
death!

from section 1.2. URI, URL, and URN

   Although many URL schemes are named after protocols, this does not
   imply that the only way to access the URL's resource is via the named
   protocol.  Gateways, proxies, caches, and name resolution services
   might be used to access some resources, independent of the protocol
   of their origin, and the resolution of some URL may require the use
   of more than one protocol (e.g., both DNS and HTTP are typically used
   to access an "http" URL's resource when it can't be found in a local
   cache).
