SAX and delayed entity loading
W. Eliot Kimber
eliot at dns.isogen.com
Thu Dec 3 16:12:24 GMT 1998
At 09:30 PM 12/2/98 -0500, david at megginson.com wrote:
>I think that notations and unparsed entities in XML have proven
>themselves to be non-starters. They worked well in the SGML and have
>done me good service, but MIME types and hrefs provide the same
>functionality (if somewhat weaker validation) and they work with or
>without a document type declaration.
I can't agree with David's statement that MIME types and hrefs provide the
same functionality as notations and external data entities. They are
similar, but weaker:
1. href provides no indirection mechanism, which is one of the key points
of entities. By concentrating the mapping of local names to storage objects
in the document prolog, processors (and authors) do not need to scan an
entire document to know what the doc-to-entity dependencies are. For small
docs this doesn't really matter, but for very large docs, this can be a
significant savings. This is why the HyTime bounded object set facility is
defined in terms of entity declarations and not entity references.
2. Notations provide a richer degree of data type specification that is
more flexible and more generally applicable than MIME types. For example,
how do you apply a MIME type to an element or attribute? Notations are one
of the most underappreciated aspects of SGML.
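
As a rough illustration of point 1, the sketch below reads only the declarations in a document prolog and stops at the first start tag, so the entity dependencies are known without processing the document body. It uses the handlers of Python's standard xml.parsers.expat module; the sample document, the entity and notation names, and the helper name prolog_dependencies are all hypothetical and chosen only for this example.

```python
import xml.parsers.expat

# Hypothetical document: the prolog declares two unparsed entities and the
# notation they use; there are no element type declarations at all.
DOC = """<?xml version="1.0"?>
<!DOCTYPE report [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY logo SYSTEM "images/logo.gif" NDATA gif>
  <!ENTITY chart SYSTEM "images/chart.gif" NDATA gif>
]>
<report><figure src="logo"/><figure src="chart"/></report>
"""


class _PrologDone(Exception):
    """Raised at the first start tag to abandon the parse."""


def prolog_dependencies(text):
    """Return (entities, notations) declared in the prolog of `text`."""
    entities = {}   # entity name -> (system identifier, notation name)
    notations = {}  # notation name -> system identifier

    def on_unparsed_entity(name, base, system_id, public_id, notation_name):
        entities[name] = (system_id, notation_name)

    def on_notation(name, base, system_id, public_id):
        notations[name] = system_id

    def on_start_element(name, attrs):
        # The prolog is finished once the document element opens.
        raise _PrologDone

    parser = xml.parsers.expat.ParserCreate()
    parser.UnparsedEntityDeclHandler = on_unparsed_entity
    parser.NotationDeclHandler = on_notation
    parser.StartElementHandler = on_start_element
    try:
        parser.Parse(text, True)
    except _PrologDone:
        pass
    return entities, notations


entities, notations = prolog_dependencies(DOC)
```

With href-style references there is no equivalent shortcut: every element in the body would have to be visited to collect the same list.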
The fact that you need a DOCTYPE declaration to use them is a minor
problem, but having a DOCTYPE declaration doesn't mean you have a DTD; it
only means you have the minimum declarations needed to understand a
document. I think we should be careful to distinguish documents with no
explicit prolog from documents with no explicit element type declarations.
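
To make that distinction concrete, here is a minimal sketch that reports, for a given document, whether it has a DOCTYPE declaration and how many element type and entity declarations that declaration contains. It again relies on Python's xml.parsers.expat; the three sample documents and the function name describe_prolog are hypothetical.

```python
import xml.parsers.expat

# Three hypothetical documents with the same content but different prologs.
NO_PROLOG = "<memo>hi</memo>"

STORAGE_ONLY = """<!DOCTYPE memo [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
]>
<memo>hi</memo>"""

WITH_ELEMENT_DECLS = """<!DOCTYPE memo [
  <!ELEMENT memo (#PCDATA)>
]>
<memo>hi</memo>"""


def describe_prolog(text):
    """Summarise which kinds of declaration the prolog of `text` contains."""
    summary = {"doctype": False, "element_decls": 0, "entity_decls": 0}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        summary["doctype"] = True

    def on_element_decl(name, model):
        summary["element_decls"] += 1

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation_name):
        summary["entity_decls"] += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.ElementDeclHandler = on_element_decl
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(text, True)
    return summary


for doc in (NO_PROLOG, STORAGE_ONLY, WITH_ELEMENT_DECLS):
    print(describe_prolog(doc))
```

The second document is the case in question: it has a DOCTYPE declaration and an entity declaration, but no element type declarations.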
My personal opinion is that SGML has inappropriately conflated the element
type declarations with the entity declarations. The former define the
syntactic rules for the document; the latter define the storage
organization of the document. These are two fundamentally different and
unrelated things and should be completely separated syntactically. It is
unfortunate that they are not. [I do agree with David that XML should
never have included external text entities.]
Of course, the use or non-use of entities and notations is a data
management choice that has to be made on a case-by-case basis. There are
certainly classes of document for which the indirection of entities does
not provide sufficient benefit to justify the cost. But that isn't the case
for all XML documents.
So saying that entities and notations are non-starters is, I think, a bit
strong.
Cheers,
E.
--
<Address HyTime=bibloc>
W. Eliot Kimber, Senior Consulting SGML Engineer
ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 75202. 214.953.0004
www.isogen.com
</Address>