Streams, protocols, documents and fragments

Wed Feb 24 16:14:56 GMT 1999

     I agree with the arguments so far - just send lots of little 
     documents, and the protocol is just a layer on top, to be removed by 
     the input stream processor.

     But isn't the example below not well-formed XML? It doesn't seem to 
     have a prolog.

     I have no problem with that either - again, you need a client side 
     stream processor to pick apart the XML ... what do I call them? FSA 
     'chunks' ... chunks and, using some client side determination, add the 
     prolog - and then pass it to the XML parser as a WF (and hopefully 
     valid) XML document.
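
A minimal sketch of that "add the prolog, then hand it to the parser" step, in Python. The names `ASSUMED_PROLOG` and `chunk_to_document` are made up for illustration, and the prolog itself is an assumption, since the thread does not say what it should contain. Note that the standard-library parser used here checks well-formedness only, not validity.

```python
import xml.etree.ElementTree as ET

# Hypothetical prolog: the thread does not specify one, so this is a guess.
ASSUMED_PROLOG = '<?xml version="1.0"?>\n'


def chunk_to_document(chunk: str, prolog: str = ASSUMED_PROLOG) -> ET.Element:
    """Prepend a prolog to a bare chunk and parse it as a complete document.

    Raises ET.ParseError if the result is not well-formed.
    ElementTree does not validate against a DTD.
    """
    text = chunk.strip()
    # Leave chunks alone if they already carry their own XML declaration.
    if not text.startswith("<?xml"):
        text = prolog + text
    return ET.fromstring(text)
```

For example, passing one bare `<stockPrice>` element to `chunk_to_document` gives back its root element, while a chunk with mismatched tags raises `ET.ParseError`.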

     This is 'trivial', and interleaving the protocol stuff is no great 
     problem (plenty of examples, and I've done it at least 5 times for 
     different socket-based systems).

     My concern tho' is that we require a piece of Client-side stream 
     processing logic to pick up the XML 'chunks' and convert them to Valid 
     WF XML - and this is not standard (read 'generally agreed' to avoid 
     mention of inertia).

     Fun tho'

     tim

______________________________ Reply Separator _________________________________
Subject: RE: Streams, protocols, documents and fragments
Author:  Mark.Birbeck (Mark.Birbeck at iedigital.net) at unix,mime
Date:    24/02/99 15:18

> From: Borden, Jonathan [SMTP:jborden at mediaone.net]
> My sole purpose in discussing 'document fragments' was because the
> thread had gotten stuck on the notion that a continuous XML stream
> would contain a single long document (perhaps w/o a closing tag) and
> the actual PDUs consist of document fragments ... the point is that
> if we create a protocol on a stream which transmits multiple
> documents, there is no loss of functionality over a solution
> employing 'document fragments'
>
        I agree with this. And the point I was trying to get to was that
therefore we don't need to introduce loads of terms on top of XML 1.0 to 
understand the concepts.

        I still think all of this is being over-complicated - but then
maybe I'm the one who's missing something, so let's see.

        I don't follow why so many suggestions for resolving this problem
involve stepping 'outside of' XML 1.0. We have suggestions for sync 
characters like ^C and ^L, we have the proposal that XML 1.0 should be 
fundamentally altered to allow the concept of a 'not well-formed' 
document (or one that may *become* well-formed at some point in the 
future), we have proposals for documents that contain subsets of 
validity. All of these suggestions seem to go against the grain of what 
XML is about.

        XML 1.0 already copes with streams and files. A physical XML
document is a linear sequence of characters conforming to certain rules. 
You can't tell whether those rules have been met until you have received 
the entire sequence of characters. You know when you've reached the end 
by the closing tag. That's it! There's not much else you can do about 
it, because that's what XML is all about - well-formed, possibly 
validated documents conforming to certain rules.
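
To make "you know when you've reached the end by the closing tag" concrete, here is a rough sketch using Python's standard `xml.etree.ElementTree.XMLPullParser`. The function name `feed_until_root_closes` is hypothetical. It tracks element depth as text arrives and stops once the root element has been closed. It only reports the end of the root element, not any comments or processing instructions that might follow it.

```python
import xml.etree.ElementTree as ET


def feed_until_root_closes(pieces):
    """Feed text pieces to an incremental parser until the root element ends.

    Returns the root element once its closing tag has been seen, or None
    if the pieces run out first. Malformed input raises ET.ParseError.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0
    for piece in pieces:
        parser.feed(piece)
        for event, elem in parser.read_events():
            # Depth goes up on every start tag and down on every end tag.
            depth += 1 if event == "start" else -1
            if depth == 0:
                # Back at depth zero: the root's closing tag has arrived.
                return elem
    return None
```

Until the closing tag shows up the function has nothing to return, which is the point made above: the document cannot be judged complete before then.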

        Now, the fact that the beginning and end of this sequence of
characters may be presented to the parser eight hours apart is to me an 
application problem. If someone has a document that takes eight hours to 
arrive then maybe they should re-think how they're setting the system 
up. If it's a massive document that can only be processed in its 
entirety, and if any part fails to arrive the whole document fails, then 
sure, you have to go ahead and send it over eight hours. But the stock 
ticker example is not like this. If I miss the stock price for Microsoft 
at 11am, then I can still make use of the stock price for Microsoft at 
11.20am. It will affect my historical archives, but at least I have 
something to display. It is not an 'all or nothing' situation.

        So, accepting for a moment that we should transmit many
documents throughout the day, rather than one big one, it leaves the 
question of demarcation. And here I'm surprised that people want to step 
outside of XML to find a solution. Say we send the following:

        ^L
        <stockPrice timestamp="19992402141500">
            <ticker>MSFT</ticker>
            <price>1000</price>
        </stockPrice>
        ^L
        <stockPrice timestamp="19992402132540">
            <ticker>ICI</ticker>
            <price>1010</price>
        </stockPrice>
        ^L
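
For concreteness, a stream delimited like the one above could be picked apart as follows. This is only a sketch in Python: `iter_documents` and `parse_prices` are invented names, the form feed character (`\x0c`) is taken to be what `^L` stands for, and the input is assumed to be a text stream with a `read` method.

```python
import io
import xml.etree.ElementTree as ET

FORM_FEED = "\x0c"  # assumed meaning of ^L in the example above


def iter_documents(stream, sep=FORM_FEED, bufsize=4096):
    """Yield each separator-delimited chunk as soon as it is complete."""
    pending = ""
    while True:
        data = stream.read(bufsize)
        if not data:
            break
        pending += data
        # Everything before the last separator is complete; keep the rest.
        *complete, pending = pending.split(sep)
        for chunk in complete:
            if chunk.strip():
                yield chunk.strip()
    # Text after the final separator is an unfinished document; it is dropped.


def parse_prices(stream):
    """Parse each chunk on its own and yield (ticker, price) pairs."""
    for chunk in iter_documents(stream):
        try:
            doc = ET.fromstring(chunk)
        except ET.ParseError:
            continue  # one bad document does not affect the others
        yield doc.findtext("ticker"), doc.findtext("price")
```

Each chunk is parsed as a separate document, so a damaged or missing price costs only that one reading, which matches the "not all or nothing" argument above.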

< Protocol stuff snipped >

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)