Streams, protocols, documents and fragments

Nathan Kurz nate at valleytel.net
Wed Feb 24 00:05:04 GMT 1999


Jonathan Borden writes:
> <term>document</term> is defined as in the XML spec. documents are well
> formed. when a document fragment is isolated from its parent document, it
> becomes a standalone document.

Sounds fine so far...
 
> a document may contain a prolog. a document fragment may not. a document may
> contain a !DOCTYPE definition (DTD), a document fragment may not. Hence all
> document fragments are legal documents but not all documents are legal
> document fragments.

I think I follow what you are saying, but I'm confused why you would
choose to define a document fragment in this way.  Why can't it
contain a prolog?  Are you assuming that document fragment must be
produced as a reduction of a parent document?  It strikes me as very
odd to define 'document fragment' as a subset of 'document'.

> <term>stream</term> is ambiguous but generally refers to a series of bits or
> bytes or characters. In general, a stream behaves similarly to a socket.

Yes, or going further, a stream behaves similarly to a continuous
unidirectional broadcast.  The canonical stock ticker might well be
continuously transmitting the XML data by FM radio.

> <term>protocol</term> is layered above a network transport, or socket and
> defines a mutually agreed upon mechanism to exchange messages and other
> data.
>
> So what does this have to do with XML? The canonical example of streamed
> XML is the stock ticker. Assuming each stock quote is transmitted in a
> document, the HTTP protocol can employ a particular URL e.g.,
> http://wherever/quotes/next to return the next quote as a single document.
> Suppose we wish to transmit 100 quotes as distinct documents, this does not
> work with HTTP which returns a single MIME message response for each
> request. The solutions would be to employ 1) multipart messages 2) wrap the
> quotes in a single document 3) use another protocol.
> 
> Suppose we use raw sockets? Nothing to prevent sending one document after
> another down the socket. The end of one document and the start of another
> are unambiguous assuming the documents are well-formed.

I completely agree.  Which is to say that using a non-XML character
(ctrl-L or ctrl-C) as a separator might be a useful protocol, but it is
not necessary.  Nothing prevents one from sending multiple documents
serially as unadulterated XML.
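
To make that concrete, here is a rough Python sketch of a receiver that
splits back-to-back documents with no separator at all, purely by
tracking element nesting.  The function names and the sample quotes are
made up for illustration, and the sketch assumes the input is already
well-formed and carries no DOCTYPE internal subset; it is not a real
XML parser.

```python
import xml.etree.ElementTree as ET


def find_tag_end(text, i):
    """Return the index just past the '>' closing the tag that starts at i.

    A '>' inside a quoted attribute value does not end the tag.
    """
    quote = None
    for j in range(i, len(text)):
        c = text[j]
        if quote:
            if c == quote:
                quote = None
        elif c in "\"'":
            quote = c
        elif c == ">":
            return j + 1
    raise ValueError("unterminated tag")


def split_documents(text):
    """Yield each document found in a string of concatenated XML.

    Assumes well-formed input and no DOCTYPE internal subset, so only
    the element nesting depth has to be tracked.
    """
    depth = 0
    start = 0  # index where the current document began
    i = 0
    while i < len(text):
        if text[i] != "<":
            i += 1
        elif text.startswith("<!--", i):
            i = text.index("-->", i + 4) + 3      # skip comment
        elif text.startswith("<![CDATA[", i):
            i = text.index("]]>", i + 9) + 3      # skip CDATA section
        elif text.startswith("<?", i):
            i = text.index("?>", i + 2) + 2       # skip PI or XML declaration
        elif text.startswith("<!", i):
            i = text.index(">", i + 2) + 1        # skip simple DOCTYPE
        else:
            end = find_tag_end(text, i)
            tag = text[i:end]
            if tag.startswith("</"):
                depth -= 1
            elif not tag.endswith("/>"):
                depth += 1
            i = end
            if depth == 0:
                # The root element just closed: one document is complete.
                yield text[start:i].strip()
                start = i


# Hypothetical ticker data: two documents sent one after the other.
stream = '<quote symbol="AAA">10</quote><quote symbol="BBB">20</quote>'
for doc in split_documents(stream):
    print(ET.fromstring(doc).get("symbol"))
```

Note that a comment or processing instruction arriving after a root
element is handed to the following document here; that is a choice of
this sketch, not something the stream itself settles.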

> So, the problem here is not one with XML, rather the protocol used to
> transmit documents, HTTP and SMTP send one MIME message per PDU, streaming
> protocols can be defined which transmit multiple documents.

But the definition of XML processor does become a problem here.  If
the stream consists of multiple XML documents, one must use an
XML-aware processor to parse it.  But this had better be a
non-conforming XML processor, since according to the spec a
'conforming XML processor' must cry foul if its input doesn't have one
and only one root element.
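
As a small illustration of the problem, the sketch below hands a single
document and then two concatenated documents to an off-the-shelf
parser.  Python's standard-library ElementTree is used as a stand-in
for a 'conforming XML processor'; that choice, the helper name, and the
sample strings are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET


def parses_as_single_document(text):
    """Return True if the parser accepts text as one XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised for any well-formedness error, including content
        # that follows the end of the root element.
        return False


one = "<quote>1</quote>"
two = "<quote>1</quote><quote>2</quote>"
print(parses_as_single_document(one))
print(parses_as_single_document(two))
```

The first input is accepted and the second is rejected, so a stream of
serial documents has to be split before it reaches such a parser, or
read by something that is deliberately more lenient.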

nathan kurz
nate at valleytel.net




xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1