SAX and delayed entity loading

W. Eliot Kimber eliot at dns.isogen.com
Thu Dec 3 20:28:47 GMT 1998


At 02:52 PM 12/3/98 -0500, Simon St.Laurent wrote:

>Will the software that reads your documents still be running in a hundred
>years, or will people be cracking open your archives with applications that
>can't make head or tail of your notations?

IT'S NOT ABOUT SOFTWARE.

It's about knowing what the software requirements are.  Given a notation
definition document, I, as a programmer, should be able to understand what
needs to be done to implement the software needed to support the notation.
That's the whole point.  It's a given that software will come and go but
data will remain.  Hence the need for a mechanism that indirects from data
to software through the definition of the data type's requirements.

If someone creates a reference to a notation for which the documentation is
unavailable, then they, not the system, have screwed up.  If I don't provide
a good URL or URN or public ID for a notation when I declare it, then I've
made a terrible mistake.  If I define a notation and don't document it,
I've made a terrible mistake.  Notations simply try to reinforce the idea
that for every thing there better by golly be some documentation and it
better be where people know how to find it.

I observe that XML is itself an excellent example of a data notation that
can be reliably declared using the notation mechanism because it is both
well documented and the authoritative name for it is well managed.  In my
opinion this is the canonical declaration for the XML notation:

<!NOTATION whatevernameyoulikeitslocalsoitdoesntmatter
  PUBLIC "http://www.w3.org/TR/REC-xml"
>

You could also argue that the MIME type itself would be appropriate as the
external ID, but then I've just added two layers of indirection to get the
spec itself (one to look up the MIME RFC, another to go from that document
to the XML spec itself, which I assume it references).

I can now use the XML notation for entities representing other documents
(irrespective of how they are used semantically or whether I actually
reference them from the instance):

<!ENTITY somedoc SYSTEM "whatever" NDATA
whatevernameyoulikeitslocalsoitdoesntmatter >

Now there is no question that the document is *expected to be* an XML
document. Whether it is or not is another question, but I really do need to
know in advance what I, as author of this document, expect it to be.
Without this, I'm just throwing pointers around without any way of saying,
as an author, what I expect to get.
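Since SAX is the subject line, it may help to see how these declarations reach an application. A SAX parser reports them through the DTDHandler callbacks notationDecl and unparsedEntityDecl, so an application can recover the author's stated expectation for an entity without ever fetching it. The sketch below uses Python's standard xml.sax package; the root element doc, the class NotationCollector, its expected_type method, and the collect helper are hypothetical names chosen for illustration.

```python
import io
import xml.sax
from xml.sax.handler import DTDHandler

# The declarations from the message, placed in an internal DTD subset with a
# hypothetical empty root element so the parser sees a complete document.
SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
<!NOTATION whatevernameyoulikeitslocalsoitdoesntmatter
  PUBLIC "http://www.w3.org/TR/REC-xml">
<!ENTITY somedoc SYSTEM "whatever" NDATA
  whatevernameyoulikeitslocalsoitdoesntmatter>
]>
<doc/>
"""


class NotationCollector(DTDHandler):
    """Record notation and unparsed-entity declarations as SAX reports them."""

    def __init__(self):
        self.notations = {}  # notation name -> (publicId, systemId)
        self.entities = {}   # entity name -> (systemId, notation name)

    def notationDecl(self, name, publicId, systemId):
        self.notations[name] = (publicId, systemId)

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        self.entities[name] = (systemId, ndata)

    def expected_type(self, entity):
        """Return the external ID that documents what `entity` is expected to be."""
        _, notation = self.entities[entity]
        public_id, system_id = self.notations[notation]
        return public_id or system_id


def collect(data):
    """Parse `data` (bytes) and return the handler holding the declarations."""
    parser = xml.sax.make_parser()
    handler = NotationCollector()
    parser.setDTDHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler


if __name__ == "__main__":
    h = collect(SAMPLE)
    print(h.expected_type("somedoc"))  # prints http://www.w3.org/TR/REC-xml
```

Nothing is dereferenced here: the parser never tries to retrieve "whatever"; it only reports what the author declared, which is exactly the expectation the local notation name points to.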

The fact that, in an HTTP environment (one of an infinite number of
possible environments in which I might be using both documents), a MIME
header will come back telling me what the server says it thinks the
resource is (which is not necessarily what the document really is) lets me
do a sanity check by making sure that my expectation and the result are the
same. But of course, I didn't actually need the MIME type in this case as
XML documents are self-describing (but it might be nice to know if the
server is correctly configured).
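The sanity check described above amounts to comparing two independent claims: the notation the author declared and the Content-Type the server sends back. A minimal sketch, assuming a hypothetical lookup table (EXPECTED_MEDIA_TYPES) from notation public identifiers to acceptable media types; nothing in XML or MIME defines such a table. text/xml and application/xml are the media types registered for XML; the function names media_type and server_agrees are illustrative.

```python
# Hypothetical table: which Content-Type values an application might treat
# as consistent with each notation's public identifier.
EXPECTED_MEDIA_TYPES = {
    "http://www.w3.org/TR/REC-xml": {"text/xml", "application/xml"},
}


def media_type(content_type):
    """Drop parameters such as '; charset=UTF-8' and normalize case."""
    return content_type.split(";", 1)[0].strip().lower()


def server_agrees(notation_public_id, content_type):
    """Compare the server's Content-Type with the author's declared expectation.

    Returns True or False, or None when no expectation is recorded for the
    notation (there is nothing to check against).
    """
    expected = EXPECTED_MEDIA_TYPES.get(notation_public_id)
    if expected is None:
        return None
    return media_type(content_type) in expected
```

For example, server_agrees("http://www.w3.org/TR/REC-xml", "text/xml; charset=UTF-8") is True, while "text/plain" gives False. A False result doesn't prove the resource is wrong, only that the server and the author disagree; for a self-describing format the bytes themselves can settle it.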

So in that sense, the MIME type is redundant for any data type that is
already self-describing (e.g., XML, SGML, most graphic formats, VRML,
etc.).  Hmmm.

Cheers,

E.
--
<Address HyTime=bibloc>
W. Eliot Kimber, Senior Consulting SGML Engineer
ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 75202.  214.953.0004
www.isogen.com
</Address>

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



