Documents and Document Fragments

Tue Feb 23 23:13:37 GMT 1999

Mark Birbeck wrote:
> It's also relevant to document fragments. In previous posts, I was
> trying to say that as far as a parser is concerned, whether it receives
> a complete XML document by retrieving a file from a disk, a page from a
> web server, or four nodes from an object database is neither here nor
> there. As far as it is concerned, it has an 'XML document'. I called
> this a 'logical' document because I wanted to indicate that it may not
> actually exist in any physical form, but it is a
> 'data-object-that-conforms' item, and that if we can process an 'XML
> document' we can process one node, many nodes or the whole tree. You
> don't then need to devise another system to process well-formed
> 'uberdocuments', and yet another to process well-formed 'document
> fragments' or 'microdocuments' or whatever.

Although it may reflect the state of existing parsers, I disagree with
this assessment of how XML parsers must relate to 'XML documents' and
'document fragments'.  It seems like it has things backwards.  You
imply that if a parser is able to process a collection of nodes in one
particular form, it is able to process a collection of nodes in
any arrangement whatsoever.  Perhaps, but not necessarily.

An XML document has to have a root node.  A subset of that document,
produced by an XSL engine or by some other means, doesn't necessarily
have a root node.  An XML parser may or may not require that a
document has a root node.  Any parser capable of handling documents
without a root will do fine if one exists, but the reverse is not
necessarily true.
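
To make the asymmetry concrete, here is a rough sketch in Python using
the standard xml.etree.ElementTree module.  The names parse_document
and parse_fragment and the synthetic <wrapper> element are my own
illustrations, not anything from the spec, and wrapping is only one
possible way to accept rootless input.  It assumes the fragment carries
no XML declaration or DOCTYPE.

```python
import xml.etree.ElementTree as ET


def parse_document(text):
    """Strict: the input must have exactly one root element."""
    return ET.fromstring(text)


def parse_fragment(text):
    """Lenient: wrap the input in a synthetic root, return its top-level elements.

    Assumption: the fragment carries no XML declaration or DOCTYPE,
    since neither may appear inside an element.
    """
    wrapper = ET.fromstring("<wrapper>" + text + "</wrapper>")
    return list(wrapper)


rooted = "<a><b/></a>"
rootless = "<b/><c/>"

# The fragment parser copes with both inputs.
print([e.tag for e in parse_fragment(rooted)])    # ['a']
print([e.tag for e in parse_fragment(rootless)])  # ['b', 'c']

# The document parser copes only with the rooted one.
print(parse_document(rooted).tag)                 # a
try:
    parse_document(rootless)
except ET.ParseError as err:
    print("rejected:", err)
```

The lenient parser handles the rooted input for free, while the strict
one has to reject the rootless input.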

Perhaps the question is whether there is a difference between an 'XML
parser' and an 'XML document parser'.  Which brings up the question of
whether there is such a thing as XML (by definition well-formed) that
is not an XML document.  I think there is, and that this is where the
term 'document fragment' is useful.

Here's a simplified version of how I'd like the world to be defined:  :)

document fragment: 
	 A piece of well-formed XML, that may or may not have a root
	 element.  In the parlance of the spec, it would probably
	 be called a 'well-formed textual object'.  Doesn't even have
	 to contain any elements.  Colloquially synonymous with 'XML'.

XML document:
	 A document fragment that, in the words of the spec, 'when
	 taken as a whole matches production labeled document'.  In
	 practice, some XML with a single root element.  Parsed
	 entities must also be well formed.

XML parser: 
	 Something that accepts XML as input and/or treats its input
	 as XML.  May or may not care if the input is well-formed
	 and/or valid.  It's quite possible that cat(1) would qualify
	 as an XML parser by this definition.

XML document parser:
	 An XML parser that expects an XML document as input, and
	 complains if it does not receive one.  A 'conforming XML
	 processor' in the spec's terms.  (Although the spec often
	 uses just 'XML processor' and implies 'conforming').
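
As a rough illustration of how these definitions sort textual input,
here is a small Python sketch.  The function names and the synthetic
<wrapper> element are hypothetical, and the wrapping trick is only an
approximation of 'well-formed textual object': it assumes the input
has no XML declaration or DOCTYPE unless it is a complete document.
Since every XML document is also a document fragment under the
definitions above, classify() reports the most specific label.

```python
import xml.etree.ElementTree as ET


def well_formed(text):
    """True if the text parses as a single-rooted XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def classify(text):
    """Return the most specific of the labels defined above."""
    if well_formed(text):
        return "XML document"
    # Hypothetical check: a fragment is anything that becomes a
    # document once a synthetic root is wrapped around it.
    if well_formed("<wrapper>" + text + "</wrapper>"):
        return "document fragment"
    return "not well-formed"


for sample in ["<a><b/></a>", "<b/><c/>", "no elements at all", "<a>"]:
    print(repr(sample), "->", classify(sample))
```

An 'XML document parser' in the sense above would accept only the
first sample; an 'XML parser' might accept the first three.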

Which is to say that I think the notion of 'document fragment' is
still useful, and that it is worthwhile to think about textual XML
that is not in the form of an XML document.

nathan kurz
nate at valleytel.net
