SAX and delayed entity loading

Simon St.Laurent simonstl at simonstl.com
Thu Dec 3 20:59:57 GMT 1998


At 02:18 PM 12/3/98 -0600, you wrote:
>At 02:52 PM 12/3/98 -0500, Simon St.Laurent wrote:
>
>>Will the software that reads your documents still be running in a hundred
>>years, or will people be cracking open your archives with applications that
>>can't make head or tail of your notations?
>
>IT'S NOT ABOUT SOFTWARE.
>It's about knowing what the software requirements are.  Given a notation
>definition document I, as a programmer, should be able to understand what
>needs to be done to implement the software needed to support the notation.
>That's the whole point.  It's a given that software will come and go but
>data will remain.  Thus a mechanism that indirects from data to software
>through the definition for the requirements of the data type.

I know it's not about software.  How can you guarantee that in a hundred
years someone will be able to find the identifiers used in your notations?
Will they be at the same network address?  Will they have to go through yet
another stack of backup tapes?  What if you referenced something outside of
your network and its control, and there is nothing left...  (I think
they'll start wishing it was more about software and less about indirection
at that point.)

>If someone creates a reference to a notation for which the documentation is
>unavailable, then they, not the system, have screwed up.  If I don't provide
>a good URL or URN or public ID for a notation when I declare it, then I've
>made a terrible mistake.  If I define a notation and don't document it,
>I've made a terrible mistake.  Notations simply try to reinforce the idea
>that for every thing there better by golly be some documentation and it
>better be where people know how to find it.

Again, why should document authors (and even schema authors) be responsible
for 'not screwing up'?  Seems like something much better handled by other
mechanisms.  Well-known and generally reliable mechanisms like MIME
content-type headers avoid this problem completely.
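
For example (just a sketch of an ordinary HTTP exchange; the headers and the
element name are made up, not taken from any particular server), the transport
labels the payload before a parser ever sees the document:

HTTP/1.1 200 OK
Content-Type: application/xml

<?xml version='1.0'?>
<archive>...</archive>

No declaration inside the document has to be present, or correct, for that
label to arrive.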

>I observe that XML is itself an excellent example of a data notation that
>can be reliably declared using the notation mechanism because it is both
>well documented and the authoritative name for it is well managed.  In my
>opinion this is the canonical declaration for the XML notation:
>
><!NOTATION whatevernameyoulikeitslocalsoitdoesntmatter
>  PUBLIC "http://www.w3.org/TR/REC-xml"

In my opinion, the canonical declaration is a lot simpler:

<?xml?>

or 

<?xml version='1.0'?>

The registration for the MIME type application/xml already points to
http://www.w3.org/TR/REC-xml for any application that can't work that out on
its own.  By the time a processor reaches your notation declaration, it has
three layers of identification.  I'd give decent odds that the URL in your
NOTATION is more likely to contain a typo than an automatically generated
application/xml header or even <?xml?>.  And who's to say the W3C will still
be around, at that address, or that the DNS needed to resolve it will still
be around in 20 years?

>Now there is no question that the document is *expected to be* an XML
>document. Whether it is or not is another question, but I really do need to
>know in advance what I, as author of this document, expect it to be.
>Without this, I'm just throwing pointers around without any way of saying,
>as an author, what I expect to get.

Why specify?  Why are you so concerned about getting what _you_ expect to
get?  Why not leave it open, and build applications that can handle such
'unreliability'?  Oddly enough, they tend to be more reliable, certainly
more extensible, and can be used in a wider range of scenarios than the
author originally planned for.
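
To make that concrete, here's a rough sketch in Java (the class and the names
are mine, not anyone's shipping code) of the kind of receiver I mean: it takes
whatever type the server claims, peeks at the first bytes if that's missing or
unfamiliar, and only then decides how to proceed, instead of trusting an
author-supplied notation to be right:

import java.io.BufferedInputStream;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public class TolerantReceiver {
    public static void main(String[] args) throws Exception {
        URLConnection conn = new URL(args[0]).openConnection();
        String declared = conn.getContentType();   // may be null or wrong
        InputStream in = new BufferedInputStream(conn.getInputStream());

        // Peek at the prolog without consuming it.
        in.mark(64);
        byte[] head = new byte[64];
        int n = in.read(head);
        in.reset();
        String prolog = (n > 0) ? new String(head, 0, n, "ISO-8859-1") : "";

        boolean looksLikeXml =
            (declared != null && declared.indexOf("xml") >= 0)
            || prolog.startsWith("<?xml");

        if (looksLikeXml) {
            System.out.println("Hand the stream to an XML parser.");
        } else {
            System.out.println("Unrecognized type '" + declared
                + "'; fall back to a generic handler.");
        }
        in.close();
    }
}

Nothing in that depends on the author of the document having remembered to
declare anything at all.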

>The fact that, in an HTTP environment (one of an infinite number of
>possible environments in which I might be using both documents), a MIME
>header will come back telling me what the server says it thinks the
>resource is (which is not necessarily what the document really is) lets me
>do a sanity check by making sure that my expectation and the result are the
>same. But of course, I didn't actually need the MIME type in this case as
>XML documents are self describing (but it might be nice to know if the
>server is correctly configured).

I'd argue that your notation is redundant and that you didn't build a very
flexible application.  That may be okay in some situations, but why not build a
more flexible system to start with?  Or perform your type checking
someplace else?

>So in that sense, the MIME type is redundant for any data type that is
>already self describing (e.g., XML, SGML, most graphic formats, VRML,
>etc.).  Hmmm.

As I said above, MIME + self-describing is enough of a check for me.  The
third check, the notation, is redundant and unnecessary.



Simon St.Laurent
XML: A Primer / Cookies
Sharing Bandwidth (December)
Building XML Applications (January)
http://www.simonstl.com
