XML Information Set Requirements, W3C Note 18-February-1999

Sat Feb 20 02:02:42 GMT 1999

I agree with Sussna: thinking the concept of a document through more 
fully raises a number of interesting ideas.

Treating a document as a stream or information set allows it to be 
distributed across a network. Instead of requiring the entire 
'document' to be transferred en masse as a file, it can be delivered 
piecewise over a stream. Consider this just-in-time manufacturing of 
the 'document'.
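
As a rough illustration of piecewise delivery, here is a minimal 
Python sketch using the standard library's XMLPullParser. The function 
name stream_elements, the chunk boundaries and the <order> elements 
are all made up for the example; it assumes only that the consumer 
receives the text in arbitrary pieces.

```python
import xml.etree.ElementTree as ET

def stream_elements(chunks, tag):
    """Yield each completed <tag> element while the input is still arriving."""
    parser = ET.XMLPullParser(events=("end",))
    for chunk in chunks:
        parser.feed(chunk)
        for _event, elem in parser.read_events():
            if elem.tag == tag:
                yield elem
    # Flush anything the parser was still holding back.
    parser.close()
    for _event, elem in parser.read_events():
        if elem.tag == tag:
            yield elem

# Hypothetical document split at arbitrary, network-sized boundaries.
chunks = ['<orders><order id="1"/><ord', 'er id="2"/>', '<order id="3"/></orders>']
ids = [elem.get("id") for elem in stream_elements(chunks, "order")]
print(ids)
```

The consumer never holds the whole 'document' as a file; it sees 
elements as their end tags are parsed.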

Naturally, you can think of cases where only part of the entire 
document is needed. Subsetting of the document tree is one of the 
features of XSL.

Unifying these two ideas provides a new use for a DTD. It is not only 
a means of describing the valid structure of a document; it can now 
also advertise the information available. A site can be described as 
capable of providing information sets in a set of structures defined 
by DTDs (or their replacement). A consuming application could request 
information by a pattern or query which would return the desired 
subset of information.
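
A small sketch of what such a request might look like, using the 
limited XPath support in Python's ElementTree as a stand-in for the 
pattern language. The CATALOG data and the request function are 
hypothetical; nothing here is a proposed protocol.

```python
import xml.etree.ElementTree as ET

# Hypothetical information held by the providing site.
CATALOG = ET.fromstring(
    '<catalog>'
    '<book year="1998"><title>A</title></book>'
    '<book year="1999"><title>B</title></book>'
    '<cd year="1999"><title>C</title></cd>'
    '</catalog>'
)

def request(pattern):
    """Return only the matching elements, serialized as XML fragments."""
    return [ET.tostring(elem, encoding="unicode")
            for elem in CATALOG.findall(pattern)]

# The consumer names the subset it wants instead of fetching everything.
titles = request("./book[@year='1999']/title")
print(titles)
```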

What could be accomplished is a unified solution to problems addressed 
and/or recognized in SAX, XSL, queries, DOM, and fragments. It also 
provides a model for a data server as an XML 'document' constructor.

In terms of architecture, it removes bottlenecks. Converting to a file 
model is expensive when the information is large and could be used 
piecemeal on the other side; it is a worst-case solution. A 
demand-based stream model creates entire documents only if the 
ultimate consumer of the information requires them, and otherwise 
provides elements incrementally.
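
The demand-based idea can be sketched with a Python generator. The 
names serve_rows, produced and the <row>/<table> elements are invented 
for illustration, and the in-memory list stands in for whatever data 
source a real server would have.

```python
import itertools
import xml.etree.ElementTree as ET

produced = []  # records which elements the producer actually built

def serve_rows(rows):
    """Build one <row> element per record, only when the consumer asks."""
    for key, value in rows:
        produced.append(key)
        yield ET.Element("row", {"key": key, "value": value})

rows = [("a", "1"), ("b", "2"), ("c", "3"), ("d", "4")]

# A consumer that needs only the first two elements: nothing else is built.
first_two = list(itertools.islice(serve_rows(rows), 2))
built_for_first_two = list(produced)

# A consumer that really needs the entire document assembles it on its side.
doc = ET.Element("table")
doc.extend(list(serve_rows(rows)))
```

Only the second consumer pays the cost of a complete document.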

Marc B McDonald, Principal Software Scientist
Design Intelligence Inc, Seattle WA
http://www.design-intelligence.com

----------
From:  Jeffrey E. Sussna [SMTP:jes at kuantech.com]
Sent:  Friday, February 19, 1999 10:42 AM
To:  'Clark Evans'; 'Marcus Carr'
Cc:  xml-dev at ic.ac.uk
Subject:  RE: XML Information Set Requirements, W3C Note 18-February-1999

I completely agree with Clark. As someone working with real-time XML 
streams, I think this is very important. In particular, the whole 
notion of "document" needs to be thought through very carefully in the 
context of 1999, rather than the context of 1990 when SGML was 
developed. If I may grow philosophical for a moment, I believe that 
XML is at a crossroads. That crossroads can be defined by examining 
the term "markup". I believe that XML is actually moving away from 
being "markup" oriented. First of all, one can easily imagine an XML 
document where all leaf-level elements are EMPTY, and contain all 
their semantics within attributes. In that case, there is nothing to 
be "marked up". Furthermore, when you apply XML to things like 
database record interchange, it really isn't a text-oriented 
environment anymore.

I believe that XML points more towards type systems than markup. If 
you look at a programming language, it generally supports 2 things (I 
am being very poetic and not rigorous here): defining and 
instantiating data types, and defining and instantiating operations on 
data. XML supports the first. It provides a mechanism to create and 
exchange instances of data types between external systems that will 
provide the operations on those data. The realization that DTDs are 
inadequate, and that a more robust schema specification language is 
needed, points in the same direction.
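
To make the type-system reading concrete, here is a minimal Python 
sketch in which a record type is exchanged as an EMPTY element whose 
attributes carry all the data. The Customer type and the to_xml and 
from_xml helpers are hypothetical, and all fields are kept as strings 
because attributes alone carry no richer type information.

```python
from dataclasses import dataclass, asdict
import xml.etree.ElementTree as ET

@dataclass
class Customer:  # hypothetical record type
    id: str
    name: str

def to_xml(record):
    """Serialize a record as an empty element; the tag names the type."""
    elem = ET.Element(type(record).__name__.lower(), asdict(record))
    return ET.tostring(elem, encoding="unicode")

def from_xml(text):
    """Instantiate the type on the receiving side from the attributes."""
    elem = ET.fromstring(text)
    return Customer(**elem.attrib)

wire = to_xml(Customer(id="42", name="Ada"))
received = from_xml(wire)
```

The XML carries the instance; the operations on it stay with the 
receiving system.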

If you approach XML as a type system, the concept of document loses 
its first-class status (or at least should, in my opinion). It is 
interesting that the concept of document (even physical document as 
file) has crept into programming languages, and has caused problems 
there as well. The C language include directive is a physical rather 
than a logical mechanism. When you try to build a database-driven 
incremental build system, includes become problematic.

I would like to encourage the XML community to 1) pay attention to the 
lessons of 30 years of development in the arena of programming and 
type languages, and 2) not get bogged down by the historical baggage 
of the M in XML.

Jeff Sussna

-----Original Message-----
From: owner-xml-dev at ic.ac.uk [mailto:owner-xml-dev at ic.ac.uk] On Behalf Of Clark Evans
Sent: Thursday, February 18, 1999 9:09 PM
To: Marcus Carr
Cc: xml-dev at ic.ac.uk
Subject: Re: XML Information Set Requirements, W3C Note 18-February-1999

Marcus Carr wrote:

> I think you might be applying a meaning to that
> phrase that it doesn't deserve - it doesn't call XML
> a document standard, it uses the term "XML document",
> with document defined in the XML recommendation as:
>
> "A data object is an XML document if it is well-formed,
> as defined in this specification. A well-formed XML document
> may in addition be valid if it meets certain further constraints."
>
> This allows you to use the phrases "XML data object" and
> an "XML document" interchangeably.
>
> This isn't incongruous with stream markup - you just need to
> consider the stream as an XML document. Seriously though, you
> probably wouldn't have the same concerns about "XML data object" 
...

My concerns would be even greater.  This conjures up in my
mind a Java or C++ object where the complete stream
has to be loaded in memory (or some other random-access
medium) before it can be used.  Yes, I know you can have
a multi-threaded implementation so that you can start using
the data object before it finishes reading, etc.  However,
given the object model it is *reasonable* for the naive
programmer to ask for something at the _end_ of the stream.
This will cause the call to block until the stream ends.
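
For contrast, a stream-style consumer reacts to events as input is
fed to it and never asks for "the end".  A minimal sketch with
Python's xml.sax; the Collect handler and the <log>/<entry>
elements are invented for the example.

```python
import xml.sax

class Collect(xml.sax.ContentHandler):
    """Record element names as the parser reports them."""
    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(name)

handler = Collect()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)

# Input arrives in two pieces; the handler is driven by feed(),
# not by a request against a fully loaded object.
parser.feed('<log><entry/>')
parser.feed('<entry/></log>')
parser.close()
print(handler.seen)
```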

If "data stream" processing were treated with *equal*
importance by the W3C committees, then they would see,
in many cases, that this complementary approach is at
least as good as, or in some cases far superior to,
a "data object" approach.

Constantly viewing XML as a standard for the description
of "data objects" and not "data streams" is a subtle and
important bias.  It is taking object-orientation too far
and discarding parallel stream processing, and its related
technologies like SAX and SAXON.

:) Clark

xml-dev: A list for W3C XML Developers. To post, 
mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on 
CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following 
message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)