Why aren't document entities named?

Wed Jun 23 09:56:11 BST 1999

Jeffrey E. Sussna wrote:

> The XML spec states "this specification does not specify how the document
> entity is to be located by an XML processor; unlike other entities, the
> document entity has no name and might well appear on a processor input
> stream without any identification at all." I believe that failure to specify
> a named identifier for document entities causes at least two problems:
> 1. There is no standard way to embed multiple WELL-FORMED documents within a
> single physical document entity. Actually it's easy to embed them, but
> difficult to extract them, since there's no standard way to detect "start of
> document". I can think of two obvious ways to do it: a) hardwire the parsing
> application to know about the root element; b) use a processing instruction,
> such as <?start-doc?>. Neither of these are satisfactory because they step
> out of the realm of a general standard.
> 2. Among other things, a document defines a scope for ID attributes. When a
> document maps 1-to-1 to a file, it is easy to construct an URL that
> identifies an element based on its ID. But if a file (or other storage unit)
> contains multiple documents, how do you address ID'd elements (or even the
> document itself). Again, the processing instruction could solve this problem
> by providing a document name, a la <?start-doc name="doc1" ?>.

What exactly do you mean by multiple documents in a single physical 
document entity?  If you mean something like this:

   <?xml version="1.0"?>
   <doc1>...</doc1>
   <doc2>...</doc2>
   <doc3>...</doc3>

then the result is not well-formed (it does not have a single root) and is 
therefore not an XML document. Note that there is nothing to stop you from 
placing multiple XML documents in the same file. However, the addressing 
and extraction mechanisms are outside the scope of XML.
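
To make the well-formedness point concrete, here is a minimal sketch in Python. It assumes the standard-library xml.etree.ElementTree parser (a non-validating parser, chosen only for illustration); the helper name is_well_formed and the sample strings are hypothetical and do not come from the XML spec.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as one XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Raised, among other cases, when content follows the root element.
        return False
    return True


# Three sibling top-level elements: no single root.
MULTI_ROOT = '<?xml version="1.0"?><doc1>a</doc1><doc2>b</doc2><doc3>c</doc3>'

# The same content wrapped in one (hypothetical) root element.
SINGLE_ROOT = (
    '<?xml version="1.0"?>'
    "<docs><doc1>a</doc1><doc2>b</doc2><doc3>c</doc3></docs>"
)

if __name__ == "__main__":
    print("multi-root well-formed: ", is_well_formed(MULTI_ROOT))
    print("single-root well-formed:", is_well_formed(SINGLE_ROOT))
```

The parser rejects the first string as soon as it meets the second top-level element, which is why any scheme for splitting such a file has to live outside the XML parser itself.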

You could include these inside a single root, in which case you could 
address each fragment with XPointer. However, all fragments would share the 
same ID space, which sounds like it's a problem for you. Unfortunately, 
there is no way around this using XML. You either have one document with a 
single ID space (which is XML) or one structure with multiple documents and 
multiple ID spaces (which is not XML). It is not surprising that XML 
constructs such as ID attributes won't work in the latter case (nor will 
entities, DTDs, or a lot of other things).
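
The shared ID space can be sketched in Python as follows. This assumes the standard-library xml.etree.ElementTree parser and, since no DTD is involved, simply treats any attribute named "id" as if it were declared of type ID; the wrapper element <docs>, the sample content, and the helper names are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Three former "documents" wrapped in a single hypothetical root.
WRAPPED = """<?xml version="1.0"?>
<docs>
  <doc1><sec id="intro">first</sec></doc1>
  <doc2><sec id="intro">second</sec></doc2>
  <doc3><sec id="summary">third</sec></doc3>
</docs>"""


def ids_by_value(xml_text):
    """Map each id value to the list of elements carrying it."""
    root = ET.fromstring(xml_text)
    seen = defaultdict(list)
    for elem in root.iter():
        value = elem.get("id")  # assumption: "id" plays the role of an ID attribute
        if value is not None:
            seen[value].append(elem)
    return seen


def duplicate_ids(xml_text):
    """Return the id values used more than once in the whole document."""
    return sorted(v for v, elems in ids_by_value(xml_text).items() if len(elems) > 1)


if __name__ == "__main__":
    print("colliding ids:", duplicate_ids(WRAPPED))
```

Here the value "intro" is used in both doc1 and doc2. Inside one document that is a clash, because an ID value must be unique across the whole document, so a validating parser with the attribute declared as ID would report a validity error.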

For better or worse, XML does not define where or how documents are stored. 
A universal addressing scheme for stored documents is clearly beyond the 
scope of XML, which is simply a document format. Put another way, RTF, 
Lotus 1-2-3, LaTeX, and the guy down the hall whose instruments spit out 
comma-separated files all define document formats, but none defines a 
document addressing scheme.

-- Ron Bourret

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)