Appending to an XML document
Rick Jelliffe
ricko at allette.com.au
Sat Dec 11 08:26:01 GMT 1999
From: David Megginson <david at megginson.com>
>You have to distinguish the document boundaries in a single stream
>before you pass it on to the parser. For example, you could use ^L as
>the document separator, and start a new parse each time you see it.
This is at least the third time this has come up (from memory, it was
discussed during the XML development, then a year ago on XML-DEV).
So it would be good to make a QnA for the SGML FAQ on this subject.
Can anyone suggest an improvement on the following?
-----
Q. How can I have an unending stream of data in XML?
A. You must use a stream of XML documents. The simplest way
to do this is to separate each document with ^L (form feed),
which is not an allowed character in XML and which is not used
for in-band signalling by common streaming systems.
If the incoming stream terminates unexpectedly during
a document, then that document is not well-formed. You
should consider how to handle such fragments.
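
As a rough illustration of the ^L approach, here is a minimal Python
sketch. It assumes the stream is already decoded to text and that each
document is small enough to buffer whole; the names split_documents and
parse_stream are made up for this example. A fragment that fails to
parse (such as a document cut off when the stream ends) is set aside
rather than processed.

```python
import io
import xml.etree.ElementTree as ET

FORM_FEED = "\x0c"  # ^L, assumed here to be the document separator


def split_documents(stream, chunk_size=4096):
    """Yield the text of each document in a ^L-separated text stream."""
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Emit every complete document currently held in the buffer.
        while FORM_FEED in buffer:
            document, buffer = buffer.split(FORM_FEED, 1)
            if document.strip():
                yield document
    # Whatever is left had no closing ^L; it may be a truncated document.
    if buffer.strip():
        yield buffer


def parse_stream(stream):
    """Parse each document separately; return (roots, rejected_fragments)."""
    roots, fragments = [], []
    for text in split_documents(stream):
        try:
            roots.append(ET.fromstring(text))  # a fresh parse per document
        except ET.ParseError:
            fragments.append(text)  # not well-formed: keep it for inspection
    return roots, fragments


# Two complete documents followed by one cut off in mid-element.
sample = io.StringIO("<entry>one</entry>\x0c<entry>two</entry>\x0c<entry>thr")
roots, fragments = parse_stream(sample)
print([root.text for root in roots])
print(fragments)
```

What to do with the rejected fragments (discard, log, or wait for
more data) is an application decision that the sketch leaves open.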
Note that "document" is a technical term meaning a
"collection of information that is processed as a unit"
(ISO 8879:1986) and represents a distinct layer between
storage/transport (e.g., entities, streams, archives) and
publication. An open-ended stream must be partitioned
into distinct XML documents, for example, one per
entry. Consequently, you cannot use ID/IDREF for
references between documents in a stream, but rather
you should use some more general reference mechanism,
such as W3C XPointers.
Another alternative, suggested by Uche Ogbuji, is suitable
when the incoming log data is to be sent to a file rather than
processed:
The schema (your "underlying data model") for my XML logging document
would be as follows:
<!ELEMENT log (entry*)>
<!ELEMENT entry (#PCDATA)>
My low-level logging code (where efficiency counts more than schematics)
would manage a disk file in the form
<entry>Nam Sybillam quidem Cumis ego oculis meis vidi in ampulla
pendere</entry>
<entry>Pueris respondebat "Volo perire"</entry>
And appending is as efficient as you please. Let us say this disk file
was "/var/log/classic.log"
The rest of the world (which is expecting an XML document) would access
the logs through the following
<?xml version="1.0"?>
<!DOCTYPE log [<!ENTITY lf SYSTEM "file:/var/log/classic.log">]>
<log>&lf;</log>
And ta-da! We've satisfied both our efficiency and semantic concerns
using XML 1.0.
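
The append-only log file can be sketched in Python roughly as follows.
This is only an illustration under stated assumptions: the helper names
append_entry and read_log are invented, a temporary directory stands in
for /var/log, and instead of asking a parser to resolve the external
entity, read_log imitates the result of expanding it by wrapping the
file contents in <log>...</log> before parsing. Message text is escaped
on the way in so that the file stays a well-formed sequence of entry
elements.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def append_entry(path, message):
    """Append one <entry> element to the log file (hypothetical helper)."""
    with open(path, "a", encoding="utf-8") as log_file:
        # escape() replaces &, < and > so the entry stays well-formed.
        log_file.write("<entry>" + escape(message) + "</entry>\n")


def read_log(path):
    """Parse the log as if it had been included inside <log>...</log>.

    This stands in for the external entity reference in the wrapper
    document; it does not exercise a parser's own entity resolution.
    """
    with open(path, encoding="utf-8") as log_file:
        body = log_file.read()
    return ET.fromstring("<log>" + body + "</log>")


# A temporary directory is used in place of /var/log for the example.
log_path = os.path.join(tempfile.mkdtemp(), "classic.log")
append_entry(log_path, "Nam Sybillam quidem Cumis ego oculis meis vidi "
                       "in ampulla pendere")
append_entry(log_path, 'Pueris respondebat "Volo perire"')

log = read_log(log_path)
for entry in log.findall("entry"):
    print(entry.text)
```

The writer never has to rewrite or even read the existing file, which
is the efficiency point of the approach; only readers pay the cost of
assembling a complete document.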
------
Why is this not in the XML Spec?
1) Simplicity and layering
2) It is not the W3C's business to make specs for streams of
entities: IETF is the forum for that.
Rick Jelliffe