SAX: New Idea for Entity Resolution

Alex Milowski alex at www.copsol.com
Thu Apr 16 06:46:28 BST 1998


> Alex Milowski writes:
> 
>  > In effect, although the above interface is useful, it reduces
>  > interchange in that I can make a document with broken system
>  > identifiers work on my system.  Essentially, I can make an
>  > *invalid* document valid!
> 
> You can do this in any case, though -- you can intercept URIs in the
> system libraries (Java, for example, lets you register your own
> schemes), or you can redirect them with a proxy server.
> 
> With URLs, file:// will almost always break on exchange, as will http:
> system identifiers that refer to hostnames visible only within a
> private network.

Yes, but then if you do this, don't expect it to work elsewhere.  ;-)

Why would you use absolute URLs?  Bad author, bad!  Ok, maybe you would
use them for a standard DTD. ;-)  (This is where I beat the URN drum)

<SGMLRANT type='mild'>
In the SGML world, I could come up with a scheme that made location
orthogonal to my documents.  I *never* put a system identifier in my
documents.  In XML, this is much harder.
</SGMLRANT>

<URNRANT>
Now, if URN support were *standard*, I could at least put a URN in
place of every system identifier I needed, and my document would then
be quite portable.  The key phrase here is *standard*.
</URNRANT>

Of course, we could also fix public identifiers and forget about the
URN stuff.   ...but, then we would have to come up with 
yet-another-resolution-mechanism... which sounds too much like URNs.

> Your other points (which I omitted above) are well taken -- public
> identifiers are a bit of a muddle right now, but since they're in XML
> 1.0, it makes sense to support them.  The interface is not only for
> public identifiers, however -- users can also remap URIs to
> local/secure equivalents, and they can even screen out certain URIs if
> necessary.  I'd better copyright "XML-Nanny" before someone else
> thinks of it.

Well, a further point I was making off-line is that this kind
of mapping could lead people down the wrong road.  I have run into
so many SGML users over the years who didn't know how to use, or *couldn't*
use, public identifiers without system identifiers.  In an SGML world, I see
this as bad practice.  Likewise, I see mapping system identifiers in XML as
bad practice.

Two general rules I can recommend:

   1. Use an internal resolution system inside your production
      systems.  Locations will change even inside your own system.

   2. Use a fairly static naming system (URN/Public identifier) when
      you exchange documents.
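
To make rule 1 concrete, here is a minimal sketch of an internal resolution
table keyed on public identifiers.  It assumes the org.xml.sax.EntityResolver
interface as it ships with SAX; the class name CatalogResolver and its
register method are hypothetical names of my own, not part of any API.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper: maps stable public identifiers to wherever the
// entity happens to live on this system today.
class CatalogResolver implements EntityResolver {
    private final Map<String, String> locations = new HashMap<String, String>();

    // Record the current local location for a public identifier.
    void register(String publicId, String location) {
        locations.put(publicId, location);
    }

    // Called by the parser before it opens any external entity.
    public InputSource resolveEntity(String publicId, String systemId) {
        String location = (publicId == null) ? null : locations.get(publicId);
        if (location == null) {
            // No mapping: the parser falls back to the system identifier.
            return null;
        }
        InputSource source = new InputSource(location);
        source.setPublicId(publicId);
        return source;
    }
}
```

When a location changes inside the production system, only the table entry
changes; the documents themselves keep the same public identifier.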

One thing XML has over SGML is that it is tied more closely to a location
mechanism.  If you add in URN ability, there is no issue of "configuring"
your local system to know about mappings--you just do a URN lookup.

(Obviously, URNs can be misconfigured or not available.  Ever had
 problems on the Internet with DNS names?  Same idea, same problem, same
 frustration when it is wrong!)
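
As a rough sketch of what "just do a URN lookup" might look like on the
parser side, including the failure case: the UrnLookup interface below is a
hypothetical stand-in for a real URN resolution service, which did not exist
as a standard, and UrnResolver is likewise a name of my own.  Only
EntityResolver, InputSource, and SAXException come from org.xml.sax.

```java
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical stand-in for a real URN resolution service.
interface UrnLookup {
    // Returns a URL for the URN, or null if the URN is unknown.
    String locate(String urn);
}

class UrnResolver implements EntityResolver {
    private final UrnLookup lookup;

    UrnResolver(UrnLookup lookup) {
        this.lookup = lookup;
    }

    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException {
        // Only handle system identifiers in the urn: scheme (case-insensitive).
        if (systemId == null || !systemId.regionMatches(true, 0, "urn:", 0, 4)) {
            return null; // let the parser open the system identifier itself
        }
        String location = lookup.locate(systemId);
        if (location == null) {
            // Same failure mode as a bad DNS name: report it, don't guess.
            throw new SAXException("Cannot resolve URN: " + systemId);
        }
        return new InputSource(location);
    }
}
```

Nothing is configured per document here; the only local knowledge is how to
reach the lookup service, much as a host only needs to know its DNS server.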

==============================================================================
R. Alexander Milowski     http://www.copsol.com/   alex at copsol.com
Copernican Solutions Incorporated                  (612) 379 - 3608





