Why aren't document entities named?
rbourret at ito.tu-darmstadt.de
Wed Jun 23 09:56:11 BST 1999
Jeffrey E. Sussna wrote:
> The XML spec states "this specification does not specify how the document
> entity is to be located by an XML processor; unlike other entities, the
> document entity has no name and might well appear on a processor input
> stream without any identification at all." I believe that failure to provide
> a named identifier for document entities causes at least two problems:
> 1. There is no standard way to embed multiple WELL-FORMED documents in a
> single physical document entity. Actually it's easy to embed them, but
> difficult to extract them, since there's no standard way to detect "start
> document". I can think of two obvious ways to do it: a) hardwire the
> application to know about the root element; b) use a processing instruction
> such as <?start-doc?>. Neither of these is satisfactory because they fall
> out of the realm of a general standard.
> 2. Among other things, a document defines a scope for ID attributes. When a
> document maps 1-to-1 to a file, it is easy to construct a URL that
> identifies an element based on its ID. But if a file (or other storage unit)
> contains multiple documents, how do you address ID'd elements (or even the
> document itself)? Again, the processing instruction could solve this problem
> by providing a document name, a la <?start-doc name="doc1" ?>.
What exactly do you mean by multiple documents in a single physical
document entity? If you mean several root elements placed one after another,
then the result is not well-formed (it does not have a single root) and is
therefore not an XML document. Note that there is nothing to stop you from
placing multiple XML documents in the same file. However, the addressing
and extraction mechanisms are outside the scope of XML.
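
As a rough illustration of the well-formedness point, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The element names doc1 and doc2 and the helper is_well_formed are hypothetical, chosen only for this example. A string with one root parses; the same content followed by a second root is rejected by the parser.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as a single XML document (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised, among other cases, when content follows the root element.
        return False

# Hypothetical sample data: one root element versus two roots back to back.
one_doc = "<doc1><p>first</p></doc1>"
two_docs = "<doc1><p>first</p></doc1><doc2><p>second</p></doc2>"

print(is_well_formed(one_doc))
print(is_well_formed(two_docs))
```

Any convention for splitting two_docs back into its parts would have to live in the application, which is the sense in which extraction is outside the scope of XML.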
You could include these inside a single root, in which case you could
address each fragment with XPointer. However, all fragments would share the
same ID space, which sounds like it's a problem for you. Unfortunately,
there is no way around this using XML. You either have one document with a
single ID space (which is XML) or one structure with multiple documents and
multiple ID spaces (which is not XML). It is not surprising that XML
constructs such as ID attributes won't work in the latter case (nor will
entities, DTDs, or a lot of other things).
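
To make the shared ID space concrete, here is a small Python sketch under stated assumptions: the wrapper element docs, the fragment names doc1 and doc2, and the helper duplicate_ids are all hypothetical, and the attribute named id is assumed to be declared as type ID in a DTD (ElementTree itself does not validate, so the check is done by hand).

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical wrapper: two former documents placed under a single root.
wrapped = """<docs>
  <doc1><p id="intro">first</p></doc1>
  <doc2><p id="intro">second</p></doc2>
</docs>"""

def duplicate_ids(text, attr="id"):
    """Return the sorted values of attr that occur more than once in the document."""
    root = ET.fromstring(text)
    counts = Counter(
        el.get(attr) for el in root.iter() if el.get(attr) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)

# The wrapped text is well-formed, and each fragment is a child of the root.
root = ET.fromstring(wrapped)
fragments = [child.tag for child in root]

print(fragments)
print(duplicate_ids(wrapped))
```

The two fragments were each fine on their own, but once they share a root the value "intro" appears twice in one ID space, which a validating parser would report as an error.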
For better or worse, XML does not define where or how documents are stored.
Such a universal addressing scheme is clearly beyond the capabilities of
XML, which is simply a document format. Put another way, RTF, Lotus 1-2-3,
LaTeX, and the guy down the hall whose instruments spit out comma-separated
files all define document formats, but none defines document addressing.
-- Ron Bourret