SAX: New Idea for Entity Resolution

Alex Milowski lex at
Thu Apr 16 06:46:28 BST 1998

> Alex Milowski writes:
>  > In effect, although the above interface is useful, it reduces
>  > interchange in that I can make a document with broken system
>  > identifiers work on my system.  Essentially, I can make an
>  > *invalid* document valid!
> You can do this in any case, though -- you can intercept URIs in the
> system libraries (Java, for example, lets you register your own
> schemes), or you can redirect them with a proxy server.
> With URLs, file:// will almost always break on exchange, as will http:
> system identifiers that refer to hostnames visible only within a
> private network.

Yes, but then if you do this, don't expect it to work elsewhere.  ;-)

Why would you use absolute URLs?  Bad author, bad!  Ok, maybe you would
use them for a standard DTD. ;-)  (This is where I beat the URN drum)

<SGMLRANT type='mild'>
In the SGML world, I could come up with a scheme that made location
orthogonal to my documents.  I *never* put a system identifier in my
documents.  In XML, this is much harder.

Now, if URN support was *standard*, I could at least put a URN in the
place of every system identifier I needed and then my document is
quite portable.  The key phrase here is *standard*.

Of course, we could also fix public identifiers and forget about the
URN stuff.   ...but, then we would have to come up with 
yet-another-resolution-mechanism... which sounds too much like URNs.

> Your other points (which I omitted above) are well taken -- public
> identifiers are a bit of a muddle right now, but since they're in XML
> 1.0, it makes sense to support them.  The interface is not only for
> public identifiers, however -- users can also remap URIs to
> local/secure equivalents, and they can even screen out certain URIs if
> necessary.  I'd better copyright "XML-Nanny" before someone else
> thinks of it.

Well, a further point I was making off-line is that this kind
of mapping could lead people down the wrong road.  I have run into
so many SGML users over the years who didn't know how to use, or *couldn't*
use, public identifiers without system identifiers.  In an SGML world, I see
this as bad practice.  Likewise, I see mapping system identifiers in XML as bad
practice.
Two general rules I can recommend:

   1. Use an internal resolution system inside your production
      systems.  Locations will change even inside your own system.

   2. Use a fairly static naming system (URN/Public identifier) when
      you exchange documents.
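
To make rule 1 a bit more concrete, here is a rough sketch in Java of an
internal resolution table. It assumes an entity-resolution hook shaped like
the org.xml.sax.EntityResolver interface; the class name CatalogResolver, the
register() method, and the identifiers in the usage note are all hypothetical,
made up for illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver: maps stable names (public identifiers or URNs)
// to whatever local location currently holds the entity.
class CatalogResolver implements EntityResolver {
    private final Map<String, String> locations = new HashMap<>();

    // Record where a stable name lives on this system.
    public void register(String name, String localSystemId) {
        locations.put(name, localSystemId);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Try the public identifier first, then the system identifier
        // (which may itself be a URN).
        String local = (publicId != null) ? locations.get(publicId) : null;
        if (local == null && systemId != null) {
            local = locations.get(systemId);
        }
        if (local == null) {
            return null; // no mapping: the parser falls back to the system identifier
        }
        InputSource source = new InputSource(local);
        source.setPublicId(publicId);
        return source;
    }
}
```

Inside a production system you would call something like
register("-//Example//DTD Memo//EN", "file:///local/dtds/memo.dtd") once, and
only that table changes when files move. The documents you exchange keep the
stable name and never see the local path.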

One thing XML has over SGML is that it is tied more closely to a location
mechanism.  If you add in URN ability, there is no issue of "configuring"
your local system to know about mappings--you just do a URN lookup.

(Obviously, URNs can be misconfigured or not available.  Ever had
 problems on the Internet with DNS names?  Same idea, same problem, same
 frustration when it is wrong!)

R. Alexander Milowski   alex at
Copernican Solutions Incorporated                  (612) 379 - 3608
