Namespaces and URNs

Thu Aug 6 20:37:29 BST 1998

"So if we are serious about making namespaces work we need to start using
the same strings to refer to the same namespace. I agree that that means
that DTD owners/maintainers need to be involved. Since most of the common
DTDs are within the remit of the W3C, they should be thinking of how to
identify them in namespaces. If they want to use FPIs, fine. But they
should make it clear which the FPIs *are* and use them consistently. If
they want to use URLs instead - fine. But they shouldn't encourage the use
of both simultaneously."

Ok, so I'm searching the 300,000 acronyms in my head and FPI is not popping up
:-) Did I miss class that day? What does FPI stand for?

On the issue of URNs, I, like you I guess, had in my mind as I read the
namespace specs that there would eventually be some sort of NSNS (Namespace
Naming Server) scheme out there. It seems like the obvious way to go over the
long haul, though I can see that it would be a big, big step and involve a lot
of overhead.

Would not another possibility be to take a cue from the object oriented
world of components? COM objects and other components have a globally unique
identifier that indicates a particular version of a particular interface. Could
you not come up with a scheme where each creator of a DTD or Schema generates a
unique id (using a well published algorithm for which public domain tools are
easily available) and publishes some canonical name of the DTD, the versions
that exist, the namespace names each one defines and any gotchas, and the
unique id of each DTD version?

Then the parser could see, even if a DTD came in from different sources via
different URIs, that they were in effect the exact same version, so that
subsequent instances of the DTD could simply be ignored and the already loaded
content used. It would involve some statement in the DTD that must come first
and which identifies the canonical name and the unique id that represents the
particular version. The same could be applied to Schemas and XML document
instances in general as well, I assume.
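
A rough sketch of that duplicate check, in Python. The header syntax
(<?dtd-id name="..." uid="..."?>), the DtdCache class and the example URIs are
all made up for illustration; nothing like them exists in the XML or namespace
specs. It only shows the idea of keying loaded DTDs on the unique id rather
than on the URI they arrived from.

```python
import re

# Hypothetical first-line declaration, invented for this sketch only.
HEADER = re.compile(r'<\?dtd-id\s+name="([^"]+)"\s+uid="([0-9a-f]{32})"\s*\?>')


class DtdCache:
    """Remembers DTD content by unique id, ignoring where it was fetched from."""

    def __init__(self):
        self._by_uid = {}

    def load(self, uri, text):
        """Return (content, reused); reused is True if the uid was seen before."""
        match = HEADER.match(text.lstrip())
        if match is None:
            raise ValueError(f"{uri}: missing dtd-id header")
        _name, uid = match.groups()
        if uid in self._by_uid:
            # Same version already loaded via another URI: keep the current content.
            return self._by_uid[uid], True
        self._by_uid[uid] = text
        return text, False
```

Loading the same text from two different URIs would hand back the first copy
the second time around, with reused set to True.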

It could also allow a parser to recognize that two or more versions of the same
DTD/Schema were being used simultaneously and warn about it. Really smart
systems could use this to automatically build a map of synonyms I guess, though
that's kind of a scary thought in some ways. It could let particular
applications ensure that they were getting only particular versions of
particular DTDs, etc...
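
The version warning could be as simple as the following Python sketch. It
assumes the parser has already collected (canonical name, unique id) pairs from
whatever declarations it has seen; the function name and the example ids are
hypothetical.

```python
from collections import defaultdict


def find_version_conflicts(declarations):
    """Map each canonical name seen with more than one unique id to those ids.

    declarations is an iterable of (canonical_name, uid) pairs.
    """
    seen = defaultdict(set)
    for name, uid in declarations:
        seen[name].add(uid)
    # Only names that appeared with two or more distinct ids are conflicts.
    return {name: sorted(uids) for name, uids in seen.items() if len(uids) > 1}


if __name__ == "__main__":
    pairs = [("example-dtd", "id-a"), ("example-dtd", "id-b"), ("other-dtd", "id-c")]
    for name, uids in find_version_conflicts(pairs).items():
        print(f"warning: {len(uids)} versions of {name} in use: {', '.join(uids)}")
```

An application that wants only particular versions could compare the same
pairs against its own list of accepted ids instead of just warning.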

Unfortunately it's probably a bit late to be suggesting something like this,
since it will require some new verbiage in the file format. But, if there is no
real likelihood of getting some registration mechanism out there (and/or such a
mechanism would be too much of a burden), then some such unique identification
mechanism could be a decent second choice, don't you think?

Something like the MD5 hash generates 128 bit hashes. It's well known, free,
etc... All you need is a simple algorithm to feed it a semi-consistent input
buffer made up of the canonical name, current time in milliseconds on your
system, version string of the particular version, author name, etc... and the
likelihood of it generating a clash with 2^128 possibilities (roughly
3.4 x 10^38) is extraordinarily low. The
likelihood of two files with a clash being used by the same document is
probably not even worth thinking about.
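
For what it's worth, the id generation might look like this in Python, using
the standard hashlib module. The function names, the choice of separator and
the field order are my own assumptions, not part of any spec, and the clash
estimate is the usual birthday approximation, which only holds for ids that
were not deliberately constructed to collide.

```python
import hashlib
import time


def make_dtd_id(canonical_name, version, author, timestamp_ms=None):
    """Return a 128-bit id as 32 hex digits, hashed from the given fields."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # current time in milliseconds
    # Join with a separator unlikely to occur in the fields, so that
    # ("ab", "c") and ("a", "bc") do not yield the same input buffer.
    buffer = "\x1f".join([canonical_name, str(timestamp_ms), version, author])
    return hashlib.md5(buffer.encode("utf-8")).hexdigest()


def clash_probability(n):
    """Approximate chance that any two of n random 128-bit ids are equal."""
    return n * (n - 1) / 2 / 2.0 ** 128


if __name__ == "__main__":
    print(make_dtd_id("example-dtd", "1.0", "A. Author"))
    print(clash_probability(10 ** 9))  # a billion ids in circulation
```

The id is generated once, when a version is published, so none of this cost
lands on the parser.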

Of course you could argue that you could just do a DOMHash on both files and
consider them the same if they hash the same, but that seems to be unreasonable
since it would leave out any non-DOM parsers from the fun. The scheme above
would let everyone play and push the overhead to 'compile' time instead of
runtime.

So is this a totally dumb idea, or does it make some sense? It was basically
an off the cuff suggestion, unencumbered by the thought process, but it makes
some sense on the face of it.

----------------------------------------
Dean Roddey
Software Weenie
IBM Center for Java Technology - Silicon Valley
roddey at us.ibm.com

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)