SAX: New Idea for Entity Resolution
Alex Milowski
alex at www.copsol.com
Thu Apr 16 06:46:28 BST 1998
> Alex Milowski writes:
>
> > In effect, although the above interface is useful, it reduces
> > interchange in that I can make a document with broken system
> > identifiers work on my system. Essentially, I can make an
> > *invalid* document valid!
>
> You can do this in any case, though -- you can intercept URIs in the
> system libraries (Java, for example, lets you register your own
> schemes), or you can redirect them with a proxy server.
>
> With URLs, file:// will almost always break on exchange, as will http:
> system identifiers that refer to hostnames visible only within a
> private network.
Yes, but then if you do this, don't expect it to work elsewhere. ;-)
Why would you use absolute URLs? Bad author, bad! Ok, maybe you would
use them for a standard DTD. ;-) (This is where I beat the URN drum)
<SGMLRANT type='mild'>
In the SGML world, I could come up with a scheme that made location
orthogonal to my documents. I *never* put a system identifier in my
documents. In XML, this is much harder.
</SGMLRANT>
<URNRANT>
Now, if URN support were *standard*, I could at least put a URN in the
place of every system identifier I needed, and then my document would be
quite portable. The key phrase here is *standard*.
</URNRANT>
Of course, we could also fix public identifiers and forget about the
URN stuff. ...but, then we would have to come up with
yet-another-resolution-mechanism... which sounds too much like URNs.
> Your other points (which I omitted above) are well taken -- public
> identifiers are a bit of a muddle right now, but since they're in XML
> 1.0, it makes sense to support them. The interface is not only for
> public identifiers, however -- users can also remote URIs to
> local/secure equivalents, and they can even screen out certain URIs if
> necessary. I'd better copyright "XML-Nanny" before someone else
> thinks of it.
Well, a further point I was making off-line is that this kind
of mapping could lead people down the wrong road. I have run into
so many SGML users over the years who didn't know how to or *couldn't* use
public identifiers without system identifiers. In an SGML world, I see this
as bad practice. Likewise, I see mapping system identifiers in XML as bad
practice.
Two general rules I can recommend:
1. Use an internal resolution system inside your production
systems. Locations will change even inside your own system.
2. Use a fairly static naming system (URN/Public identifier) when
you exchange documents.
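To make rule 1 concrete, here is a minimal sketch of an internal resolution
layer keyed on public identifiers. It assumes the org.xml.sax.EntityResolver
interface as it ships with Java today, which is not necessarily the interface
under discussion in this thread. The class name CatalogResolver, the
register() method, and any identifiers passed to it are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical in-house catalog: maps public identifiers to local locations.
// Names and entries are illustrative, not part of any standard.
class CatalogResolver implements EntityResolver {
    private final Map<String, String> catalog = new HashMap<>();

    // Bind a public identifier to wherever the entity lives on this system.
    void register(String publicId, String localUri) {
        catalog.put(publicId, localUri);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String local = (publicId == null) ? null : catalog.get(publicId);
        if (local == null) {
            // Unknown name: returning null asks the parser to fall back to
            // its default handling of the system identifier.
            return null;
        }
        InputSource source = new InputSource(local);
        source.setPublicId(publicId);
        return source;
    }
}
```

When locations change inside the production system, only the register()
calls change; the documents themselves stay untouched.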
One thing XML has over SGML is that it is tied more closely to a location
mechanism. If you add in URN ability, there is no issue of "configuring"
your local system to know about mappings--you just do a URN lookup.
(Obviously, URNs can be misconfigured or not available. Ever had
problems on the Internet with DNS names? Same idea, same problem, same
frustration when it is wrong!)
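As a rough illustration of what "just do a URN lookup" might look like from
the caller's side, here is a small sketch. It uses an in-memory table as a
stand-in for a real resolution service, which is an assumption made purely
for the example. The UrnRegistry class and the URNs bound into it are
hypothetical.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a URN resolution service. A real one would
// consult the network, much as a DNS lookup does.
class UrnRegistry {
    private final Map<String, URI> locations = new HashMap<>();

    // Record where the resource named by this URN currently lives.
    void bind(String urn, String location) {
        locations.put(urn, URI.create(location));
    }

    // An empty result models the "misconfigured or not available" case.
    Optional<URI> lookup(String urn) {
        return Optional.ofNullable(locations.get(urn));
    }
}
```

The document carries only the name; the caller decides what to do when
lookup() comes back empty, just as it would with a failed DNS query.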
==============================================================================
R. Alexander Milowski http://www.copsol.com/ alex at copsol.com
Copernican Solutions Incorporated (612) 379 - 3608