external parsed entities (was: A unique ID question ?)

Vane Lashua vlashua at RSGsystems.com
Tue Nov 9 15:40:32 GMT 1999


... and what, master [no cynicism implied], is the mechanism you have come
up with for Texans?
Vane
-----Original Message-----
From: W. Eliot Kimber [mailto:eliot at isogen.com]
Sent: Monday, November 08, 1999 9:48 AM
To: xml-dev at ic.ac.uk
Subject: Re: external parsed entities (was: A unique ID question ?)


"Steven R. Newcomb" wrote:
> 
> [Eliot Kimber:]
> 
> > [...snip...] external parsed entities ("external text entities" in
> > SGML parlance) are evil and should never have been included in XML.

[...]

> The trick is for people to realize that the use of parsed external
> entities should be avoided unless their use is purely for storage
> convenience.  There should never be a semantic load on the fact that
> some data is stored in a particular entity.  Regrettably, most of the
> times when parsed external entities have been used, they have been
> used for impure, semantically loaded purposes (such as the semantic of
> re-usability).  So maybe as a matter of good public education policy,
> Eliot's position that "external parsed entities are evil" is the right
> approach, as long as we're not going to get rid of XML's ability to
> support them.

The reason I think that external parsed entities are evil is precisely
because they encourage people to do this stupid thing *which is almost
never the right thing to do*. People innocently and understandably use
them to solve immediate practical problems, encouraged by tools that
support only this form of data organization (at least out of the
box--most, if not all, can be taught to do the right thing), only to
wake up one day and discover they have an unworkable system that will
now require huge effort to rework to meet their requirements.  To my
mind, this is the same sort of evil as Microsoft Word: an immediately
convenient and apparently useful facility that is available everywhere
that you can use without thinking, only to realize later that you're
screwed eight ways from Sunday for having used it without thinking about
it carefully first.

Or maybe it needs to be a social pressure thing: "friends don't let
friends use external parsed entities for re-use."

The *ONLY TIME* external parsed entities are the right thing to do is
when exactly one actor is working on exactly one document and needs to
partition it into separate files for *their own* convenience. *AT ALL
OTHER TIMES* external parsed entities are not the right thing to use.

I think it would have been much better to force all non-trivial XML
applications to step up to the reality that almost all documents are
compound documents semantically (but not syntactically) composed of
separate XML documents. If SGML or XML had done this, we wouldn't be
having this discussion now because the problem wouldn't arise.

Note that there is the inverse requirement as well: the need to
*syntactically combine* structures that are *semantically* separate
documents. This is the problem that name spaces try to solve (but fail
to solve, for reasons that should be obvious and that I will not repeat
having expounded on them at length in the past). The solution has often
been characterized as "inline subdocs". I am convinced that without this
feature, XML and SGML are inherently limited in a serious way (as
evidenced by the sadly misguided excitement over name spaces, which
demonstrates a serious unmet requirement). [Fortunately, the
infrastructure you need to have in place to do re-use can also support
this form of document, so we can survive without this feature although
the solution is suboptimal and not very satisfying.]

Essentially, we need to have a markup and processing infrastructure that
decouples the storage structure of documents from their logical
structure so that we can have it both ways: one storage object with
multiple documents or one document distributed across multiple storage
objects, as well as true reusability.
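
To make the storage/logical split concrete, here is a minimal sketch in
Python. It is an illustration only, not the mechanism used in the systems
mentioned in this post. The element name "use-doc", its "ref" attribute,
the STORAGE dictionary, and the function names are all hypothetical. Each
storage object is a complete XML document that parses on its own, and the
compound document is assembled by the processor at the element level
instead of by the parser splicing entity text.

```python
import xml.etree.ElementTree as ET

# Hypothetical storage layer: storage object name -> XML text.
# Every value is a complete, independently parseable XML document.
STORAGE = {
    "manual.xml": '<manual><title>Manual</title><use-doc ref="warning.xml"/></manual>',
    "warning.xml": "<warning>Disconnect power first.</warning>",
}


def load(name):
    """Parse one storage object as a standalone document."""
    return ET.fromstring(STORAGE[name])


def resolve(elem, seen=()):
    """Return a copy of elem in which each <use-doc ref="..."/> is replaced
    by the root element of the referenced document, resolved recursively."""
    out = ET.Element(elem.tag, dict(elem.attrib))
    out.text, out.tail = elem.text, elem.tail
    for child in elem:
        if child.tag == "use-doc":  # hypothetical reference element
            ref = child.get("ref")
            if ref in seen:
                raise ValueError("circular reference to " + ref)
            sub = resolve(load(ref), seen + (ref,))
            sub.tail = child.tail  # keep the text that followed the reference
            out.append(sub)
        else:
            out.append(resolve(child, seen))
    return out


def assemble(name):
    """Build the logical (compound) document that starts at one storage object."""
    return resolve(load(name), (name,))


combined = assemble("manual.xml")
print(ET.tostring(combined, encoding="unicode"))
```

Because "warning.xml" is a document in its own right, it can be validated,
edited, and referenced from any number of other documents without any of
them declaring it as an entity.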

Lest people think that I'm just talking: I have contributed to the
development of a very large scale SGML/XML-based system now in
production that is entirely based on the non-use of external parsed
entities. I am currently helping to implement another such system for
the State of Texas. Everything you need can be done with existing tools
and technology with very little extra effort *if you do it up front*.

My take is that there is no excuse for not doing it right. It's not like
the tool developers don't understand the issue nor are they unaware of
the existing standards and implementing technology (I know because I've
talked to all the editor and database vendors personally at some length
about this).

Cheers,

E.


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN
981-02-3594-1
To unsubscribe, mailto:majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following
message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)





