Documents and Document Fragments (Was RE: XML Information Set Requirements, W3C Note 18-February-1999)

Mon Feb 22 21:52:56 GMT 1999

James Tauber wrote:
> Mark Birbeck wrote:
> > Of course, if all of your documents (logical) are stored as text
> > files (physical), or to put it another way, if there is a one-to-one
> > mapping between your physical and logical XML documents, then none
> > of this is of any use to you;
>
> XML documents = text files. What you are calling "logical XML
> documents" aren't "XML documents" in the sense of the XML 1.0 REC.
> I'm not arguing about the value of what you are talking about doing.
> I think it's the way to go. I am just trying to be careful with the
> terminology.

I apologise for this, James. I was using the terms in a loose sense, and
I had no idea that XML 1.0 actually defines them. I've gone back to
the spec and I see exactly what you mean. Sorry ...

So, I'll try to re-state my case, this time without mixing up the terms:

XML 1.0 refers to a 'data object' that can be an 'XML document',
providing it meets certain criteria (namely consisting of a prolog, an
element and any trailing misc). There is no discussion of the means by
which this 'data object' is conveyed to a parser, other than that it
must meet the criteria for a 'physical' XML document.

This is relevant to the question of serialisation in two ways. First,
there is no reason why a stream cannot contain many of these
'data-objects-that-conform-to-the-criteria-of-an-XML-document'.
Secondly, those who want 'infinite documents' are challenging XML 1.0
pretty much at its opening paragraph! To be an XML document, your 'data
object' - in whatever form, whether text file or stream - must be
'well-formed'. So, by definition you cannot have an XML document with no
closing tag.

[I'm not accusing you of these things, James; I just thought I'd slip it
in to subliminally back up my other thread on streaming.]
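
To make the well-formedness point above concrete, here is a minimal
sketch in Python using the standard library's xml.etree.ElementTree. It
is only an illustration under stated assumptions: the NUL separator used
to frame several documents in one stream is hypothetical (XML 1.0 says
nothing about framing), and the names is_well_formed and split_stream
are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical framing convention: documents in the stream are separated
# by a NUL character, which cannot occur inside an XML 1.0 document.
SEPARATOR = "\0"


def is_well_formed(text: str) -> bool:
    """Return True if a parser accepts the text as a complete XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def split_stream(stream_text: str) -> list[str]:
    """Split one stream into the candidate documents it carries."""
    return [chunk for chunk in stream_text.split(SEPARATOR) if chunk.strip()]


complete = "<quotes><quote>1.0</quote></quotes>"
unterminated = "<quotes><quote>1.0</quote>"  # root element never closed

stream = complete + SEPARATOR + unterminated
results = [is_well_formed(doc) for doc in split_stream(stream)]
```

The first chunk parses as a document in its own right; the second, with
no closing tag for its root element, is rejected by the parser.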

It's also relevant to document fragments. In previous posts, I was
trying to say that as far as a parser is concerned, whether it receives
a complete XML document by retrieving a file from a disk, a page from a
web server, or four nodes from an object database is neither here nor
there. As far as it is concerned, it has an 'XML document'. I called
this a 'logical' document because I wanted to indicate that it may not
actually exist in any physical form, but it is a
'data-object-that-conforms' item, and that if we can process an 'XML
document' we can process one node, many nodes or the whole tree. You
don't then need to devise another system to process well-formed
'uberdocuments', and yet another to process well-formed 'document
fragments' or 'microdocuments' or whatever.
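
A small sketch of that idea, again in Python with
xml.etree.ElementTree: the same routine handles a whole tree and a
single node, because the node is handed over as a complete document of
its own. The sample markup and the function name count_elements are
made up for illustration.

```python
import xml.etree.ElementTree as ET


def count_elements(document_text: str) -> int:
    """Parse the text as one XML document and count its elements."""
    root = ET.fromstring(document_text)
    return sum(1 for _ in root.iter())


whole = (
    "<library>"
    "<book><title>A</title></book>"
    "<book><title>B</title></book>"
    "</library>"
)

# Take one node out of the tree and serialise it; the result is itself
# something a parser can treat as a complete document.
tree = ET.fromstring(whole)
first_book = ET.tostring(tree[0], encoding="unicode")

whole_count = count_elements(whole)
node_count = count_elements(first_book)
```

Nothing in count_elements knows or cares whether its input came from a
file, a web server, or a subtree pulled out of a larger structure.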

However, unfortunately for me ;-), XML 1.0 uses these terms quite
specifically, and what I have called the 'logical' representation of an
XML document, 1.0 has as a 'physical' document, that is, a sequence of
characters that a parser reads as elements, attributes, comments, etc.

So, in the XML 1.0 terminology my database full of objects actually
contains 'logical' documents, in the sense that they are XML documents
in some abstract way - they have a hierarchy, attributes, and so on.
When the database is queried I create 'physical' manifestations of those
'logical' documents (but not in the physical sense of a text file,
rather in the sense of a sequence of characters that can be parsed)
which can themselves be made up of many 'logical' documents.
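
As a rough sketch of such a 'logical-to-physical' step, the Python
below stands in for the object database with a plain dictionary. The
record layout (tag, attrs, text, children), the NODES table and the
functions build and serve are all assumptions made for the example, not
a description of any real product.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an object database: each record is one node,
# and children are referenced by id.
NODES = {
    1: {"tag": "order", "attrs": {"id": "42"}, "text": None, "children": [2, 3]},
    2: {"tag": "item", "attrs": {}, "text": "pen", "children": []},
    3: {"tag": "item", "attrs": {}, "text": "ink", "children": []},
}


def build(node_id: int) -> ET.Element:
    """Assemble the stored node and its descendants into an element tree."""
    record = NODES[node_id]
    element = ET.Element(record["tag"], record["attrs"])
    element.text = record["text"]
    for child_id in record["children"]:
        element.append(build(child_id))
    return element


def serve(node_id: int) -> str:
    """Return a parseable sequence of characters for the requested node."""
    return ET.tostring(build(node_id), encoding="unicode")


order_document = serve(1)  # the whole order
item_document = serve(2)   # a single item, served as its own document
```

Either result can be fed straight to a parser; whether it started life
as one stored object or several is invisible to the consumer.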

My point was to argue that we do not need notions of uberdocuments, or
document fragments to solve problems of streaming or storing documents
in databases - the terms already exist in XML to understand this. For
example, I saw a site the other day that used the term Microdocuments to
explain a database product. Initially I thought that was quite a useful
way of looking at it, but on re-examining 1.0 you realise that what they
- and we - have is simply a 'logical-to-physical' XML document server,
which you would be hard pushed to say was not anticipated in the spec.
(Of course I recognise the need to put this in terms others will
understand, and hence I'm quite jealous they thought of 'Microdocument'
first! But that's marketing.)

This is therefore why I said:

> >I pointed out that all this fits with the XML 1.0 notion of a logical
> >document, in order to stress that we don't need some other terms
> >inventing to cope with these concepts.

Of course, now I realise that to be precise I should have said
'physical' instead of logical, in that it's 'a sequence of characters in
some form or other, that a parser could interpret as a complete XML
document'.

Anyway, thanks for being pedantic, because I now see how I have confused
the issue - apologies to anyone who read the postings!

But despite my confusing use of the words, I still think my main point
is right! - we don't need extra terminology.

Best regards,

Mark

PS Got your other mail. Will get in touch next week - got to get the
site live this week.

Mark Birbeck
Managing Director
Intra Extra Digital Ltd.
39 Whitfield Street
London
W1P 5RE
w: http://www.iedigital.net/
t: 0171 681 4135
e: Mark.Birbeck at iedigital.net

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)