Streaming XML (Was RE: XML Information Set Requirements, W3C Note 18-February-1999)

Sat Feb 20 17:36:15 GMT 1999

On streaming ...

I completely agree with - and we have implemented - Marc's notion:

> What could be accomplished is a unified solution to problems addressed
> and/or recognized in SAX, XSL, queries, DOM, and fragments. It also
> provides a model for a data server as an XML 'document' constructor.

We now treat our web servers logically as 'XML servers', holding either
one massive document or thousands of smaller ones, whichever way you want
to slice it. The delivery of each of those documents in a 'stream' is
marked by its opening and closing elements. Anything before the opening
element is 'prolog', and is not necessary; it may give the recipient
additional information as to what to expect, such as the XML version
number or a DTD. And anything after the final element is not part of the
document, so the stream can be 'closed' when the final element is
received.
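
A minimal sketch of that framing rule, in Python and purely
illustrative: it treats the wire as a sequence of text chunks and hands
back one parsed document each time a top-level element closes. The name
`split_documents` is hypothetical, and the tag scanner is deliberately
naive: it assumes no prolog, comments, CDATA sections or processing
instructions, and no '>' inside attribute values.

```python
import re
import xml.etree.ElementTree as ET

# Naive tag matcher: start tags, end tags and empty-element tags only.
TAG = re.compile(r"<(/?)[A-Za-z_][^>]*?(/?)>")


def split_documents(chunks):
    """Yield one parsed root element per top-level element in the stream."""
    buffer = ""   # text received but not yet emitted as a document
    scanned = 0   # offset in buffer up to which tags have been counted
    depth = 0     # number of currently open elements
    start = 0     # offset where the current document's root tag begins
    for chunk in chunks:
        buffer += chunk
        while True:
            match = TAG.search(buffer, scanned)
            if match is None:
                break  # need more input (a tag may be split across chunks)
            is_end, is_empty = match.group(1), match.group(2)
            if depth == 0:
                start = match.start()  # opening tag of a new document
            if is_end:
                depth -= 1
            elif not is_empty:
                depth += 1
            scanned = match.end()
            if depth == 0:
                # The root element has closed: the document is complete.
                yield ET.fromstring(buffer[start:scanned])
                buffer = buffer[scanned:]
                scanned = 0
```

Each element yielded is a complete document in its own right; nothing
on the wire wraps them.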

I don't think, therefore, that things are as complicated as Simon
implies:

> As far as streaming is concerned, it seems like the hardest thing in its
> way is the prolog and the requirement of a root element.  Establish the
> prolog information at the start of the stream, figure out a way to end a
> stream, and go.

We already have a way to end the stream - with the closing element. And
the prolog is just the prolog for the document. I think part of the
problem here is when people try to map the stream itself to a document.
You end up with an extra layer of document that is not really part of
your data and confuses things. Take Marc's example of regular
transmissions of:

    <sensor-quantum timestamp="19990220T142003">
        <speed>25</speed>
        <direction>N</direction>
    </sensor-quantum>

What other information does your server need? You have the start and end
of stream info with the element tags. You could make it more
sophisticated by sending the DTD along too, but otherwise we have
everything we need to delineate. BUT ... it would be odd programming
practice to then wrap these individual documents in a bigger document
that represents the stream, because you are no longer representing your
data, you're representing the carrier. (That doesn't preclude storing it
for later use wrapped in a containing element, but we are talking about
the input stream here.) Which is why I have to disagree with the
following comments:

John Cowan wrote:
> Nathan Kurz wrote:
> > And if the stream is continuous (for example, an XML
> > stock ticker) even the concept of a well-formed stream seems tenuous.
>
> It's not clear that XML supports infinitely long streams (where the
> end-tag of the document element is *never* reached).

Firstly, on the level of XML itself, it *is* clear that XML does *not*
support infinitely long streams. An element is not an element without
its closing tag, and a document is not well-formed until its document
element has been closed. But secondly, why would you do this anyway? If
you have a series of stock prices being passed down a wire, why would you
then want to prefix them with an opening element that says 'this is a
stream of stock prices'? It tells us about the medium, not the data - we
already know that each packet is a stock price. It's a bit like going:

    <hardDisk drive="d">
        <directory name="stock prices">
            <stockPrice>
                <ticker>MSFT</ticker>
                <midPrice>1000</midPrice>
            </stockPrice>
            <stockPrice>
                <ticker>IBM</ticker>
                <midPrice>1000</midPrice>
            </stockPrice>
        </directory>
    </hardDisk>

You have put into your data information about the data's carrier - 'this
is a carrier for stock prices' - which you'd kinda hope the receiving
application knew already! If you further think through real-world
examples, then this 'open a stream for the rest of the day' method
becomes even worse. Take the UK Stock Exchange data. They have seven or
eight data sources that pump out data all day long. One has the bid and
offer prices as they're changed by market-makers, another would be the
volumes of trades, another would be news headlines, and so on. If we say
'here is a document of news headlines' in the morning, and don't send
the closing element until after tea, then we can't put anything on that
wire other than news headlines (and strictly you shouldn't act on
anything until you receive that closing element, since well-formedness
isn't known before then, though I know that is exactly what people are
asking to be able to do). However, if you treat the wire as
'stateless', you simply send each document as a self-contained entity -
news, quotes, trades and so on, as well as types not yet invented.
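
A rough Python sketch of what a 'stateless' receiver might look like:
each arriving document is parsed on its own and routed purely by the
name of its root element. The handler functions, the `HANDLERS` table
and the `newsHeadline` element are all hypothetical names chosen for
illustration.

```python
import xml.etree.ElementTree as ET


def handle_stock_price(doc):
    return ("price", doc.findtext("ticker"), doc.findtext("midPrice"))


def handle_headline(doc):
    return ("news", doc.text)


# Hypothetical routing table: root element name -> handler.
HANDLERS = {
    "stockPrice": handle_stock_price,
    "newsHeadline": handle_headline,
}


def dispatch(xml_text):
    """Parse one self-contained document and route it by its root element."""
    doc = ET.fromstring(xml_text)
    handler = HANDLERS.get(doc.tag)
    if handler is None:
        # A type not yet invented when this receiver was written.
        return ("unknown", doc.tag)
    return handler(doc)
```

Because nothing on the wire says 'only news headlines from here on',
supporting a new document type means adding one entry to the table.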

Now, there is nothing wrong with having stream information in the stream
itself. Of course you'd want to be able to have:

    <streamOpen name="MarketLine" timeStamp="19990220T083000" />
    <stockPrice timeStamp="1003">
        <ticker>MSFT</ticker>
        <midPrice>1000</midPrice>
    </stockPrice>
    <streamStatus name="MarketLine" status="open"
                  timeStamp="19990220T110000" />
    <stockPrice timeStamp="1103">
        <ticker>MSFT</ticker>
        <midPrice>1002</midPrice>
    </stockPrice>
    <streamClose name="MarketLine" timeStamp="19990220T173000" />

In other words, the 'stream' contains stream-control data - regular
timestamps to synchronise clocks, status information if the stream is to
close for maintenance and so on - but that is not part of some
super-mega-stock price document.
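
To make the distinction concrete, here is a small Python sketch that
reads such a wire, assuming the documents have already been separated
from one another. It keeps the stream-control documents apart from the
data documents. The function name `read_wire` and the shape of the
`state` dictionary are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

CONTROL = {"streamOpen", "streamStatus", "streamClose"}


def read_wire(documents):
    """Separate stream-control documents from data documents."""
    state = {"name": None, "open": False, "last_control": None}
    prices = []
    for text in documents:
        doc = ET.fromstring(text)
        if doc.tag in CONTROL:
            # Control documents update what we know about the carrier.
            state["name"] = doc.get("name")
            state["last_control"] = doc.get("timeStamp")
            if doc.tag == "streamOpen":
                state["open"] = True
            elif doc.tag == "streamClose":
                state["open"] = False
            else:
                state["open"] = doc.get("status") == "open"
        elif doc.tag == "stockPrice":
            # Data documents are collected untouched by stream state.
            prices.append((doc.findtext("ticker"), doc.findtext("midPrice")))
    return state, prices
```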

In the past, streams of data like this were encoded with all sorts of
checksums and so on, to ensure the accuracy of transmission. But there
was little that could be done about the accuracy of the data in its
internal relationships. This had to be encoded into all receiving
applications, and became very difficult to maintain. Now, however, we
can send a DTD down with the data, which tells the recipient what the
data is meant to look like. If a document fails validation, the
recipient knows exactly which node has failed, and could even re-request
just that node. (In the past you'd need the whole packet.)
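
The idea of pinpointing the failing node can be sketched in Python. The
standard-library ElementTree parser used here does not validate against
a DTD, so the hand-written `MODEL` table below is only a stand-in for
one, and both it and `first_failure` are hypothetical names for this
illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: element name -> required child names, in order.
# A real DTD expresses far more (optional and repeated children, attributes).
MODEL = {
    "stockPrice": ["ticker", "midPrice"],
    "ticker": [],
    "midPrice": [],
}


def first_failure(element, path=""):
    """Return the path of the first node that breaks MODEL, or None."""
    here = f"{path}/{element.tag}"
    expected = MODEL.get(element.tag)
    if expected is None or [child.tag for child in element] != expected:
        return here
    for child in element:
        failed = first_failure(child, here)
        if failed is not None:
            return failed
    return None
```

The returned path is what a recipient could quote back to the sender
when asking for just the broken node again.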

Regards,

Mark

Mark Birbeck
Managing Director
Intra Extra Digital Ltd.
39 Whitfield Street
London
W1P 5RE
w: http://www.iedigital.net/
t: 0171 681 4135
e: Mark.Birbeck at iedigital.net

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1