Why aren't document entities named?
Ronald Bourret
rbourret at ito.tu-darmstadt.de
Wed Jun 23 09:56:11 BST 1999
Jeffrey E. Sussna wrote:
> The XML spec states "this specification does not specify how the document
> entity is to be located by an XML processor; unlike other entities, the
> document entity has no name and might well appear on a processor input
> stream without any identification at all." I believe that failure to specify
> a named identifier for document entities causes at least two problems:
> 1. There is no standard way to embed multiple WELL-FORMED documents within a
> single physical document entity. Actually it's easy to embed them, but
> difficult to extract them, since there's no standard way to detect "start of
> document". I can think of two obvious ways to do it: a) hardwire the parsing
> application to know about the root element; b) use a processing instruction,
> such as <?start-doc?>. Neither of these are satisfactory because they step
> out of the realm of a general standard.
> 2. Among other things, a document defines a scope for ID attributes. When a
> document maps 1-to-1 to a file, it is easy to construct an URL that
> identifies an element based on its ID. But if a file (or other storage unit)
> contains multiple documents, how do you address ID'd elements (or even the
> document itself). Again, the processing instruction could solve this problem
> by providing a document name, a la <?start-doc name="doc1" ?>.
What exactly do you mean by multiple documents in a single physical
document entity? If you mean something like this:
<?xml version="1.0"?>
<doc1>...</doc1>
<doc2>...</doc2>
<doc3>...</doc3>
then the result is not well-formed (it does not have a single root) and is
therefore not an XML document. Note that there is nothing to stop you from
placing multiple XML documents in the same file. However, the addressing
and extraction mechanisms are outside the scope of XML.
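
As a rough illustration of the point about well-formedness, here is a minimal
Python sketch using the standard library's xml.etree.ElementTree parser. The
sample strings and the helper name is_well_formed are made up for this example;
the only claim is that a conforming parser rejects a stream with more than one
root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: three root elements in one stream, as in the example above.
MULTI_ROOT = (
    '<?xml version="1.0"?>\n'
    "<doc1>a</doc1>\n"
    "<doc2>b</doc2>\n"
    "<doc3>c</doc3>\n"
)

# The same stream cut down to a single root element.
SINGLE_ROOT = '<?xml version="1.0"?>\n<doc1>a</doc1>\n'


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as one XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Raised for any well-formedness error, including a second root element.
        return False
    return True


if __name__ == "__main__":
    print("multi-root well-formed: ", is_well_formed(MULTI_ROOT))
    print("single-root well-formed:", is_well_formed(SINGLE_ROOT))
```

Anything that splits such a stream back into separate documents has to rely on
an application-level convention, which is exactly the part XML leaves undefined.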
You could include these inside a single root, in which case you could
address each fragment with XPointer. However, all fragments would share the
same ID space, which sounds like it's a problem for you. Unfortunately,
there is no way around this using XML. You either have one document with a
single ID space (which is XML) or one structure with multiple documents and
multiple ID spaces (which is not XML). It is not surprising that XML
constructs such as ID attributes won't work in the latter case (nor will
entities, DTDs, or a lot of other things).
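
To make the shared ID space concrete, here is a small Python sketch, again with
the standard library's xml.etree.ElementTree. The wrapper element <docs>, the
<sec> elements, and the helper names are invented for the illustration, and the
attribute named "id" is treated as an identifier purely by convention, since
this parser is non-validating and reads no DTD that would declare it as type ID.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical wrapper: three former documents placed under a single root.
WRAPPED = """<?xml version="1.0"?>
<docs>
  <doc1><sec id="intro">first</sec></doc1>
  <doc2><sec id="intro">second</sec></doc2>
  <doc3><sec id="summary">third</sec></doc3>
</docs>"""


def find_by_id(root, value):
    """Return every element whose 'id' attribute equals `value`."""
    return [el for el in root.iter() if el.get("id") == value]


def duplicate_ids(root):
    """Return the sorted 'id' values that occur on more than one element."""
    counts = Counter(
        el.get("id") for el in root.iter() if el.get("id") is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


root = ET.fromstring(WRAPPED)

if __name__ == "__main__":
    # Both fragments reused "intro", so the lookup is ambiguous.
    print([el.text for el in find_by_id(root, "intro")])
    print(duplicate_ids(root))
```

With a DTD that declared the attribute as type ID, a validating parser would
report the repeated value as a validity error, which is the single-ID-space
restriction described above.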
For better or worse, XML does not define where or how documents are stored.
Such a universal addressing scheme is clearly beyond the capabilities of
XML, which is simply a document format. Put another way, RTF, Lotus 1-2-3,
LaTeX, and the guy down the hall whose instruments spit out comma-separated
files all define document formats, but none defines document addressing
schemes.
-- Ron Bourret