Object-oriented serialization (Was Re: Some questions)

Fri Dec 3 22:54:36 GMT 1999

On Fri, 3 Dec 1999, Matthew Gertner wrote:

> David Megginson wrote:
> > How does the schema tell me that foo represents a container for a
> > collection of objects, bar represents an object, and hack and flurb
> > represent the object's properties?
> 
> The point is not what the current schema draft allows, it is whether it
> would be feasible and appropriate to represent this information in XML
> schemas, as Paul rightly stated. My opinion is that it would be fairly
> trivial and extremely useful.

I believe it will be possible to annotate XML schemas with information
for mapping into (generic or domain-specific) application datamodels
such as RDF. I don't think it is right to expect the hard-pressed XML
Schema group to define all these mappings within that working group.
But that doesn't matter; all we need is a placeholder for such
information.
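
As a minimal sketch of what such a placeholder might carry, the Python
below assumes a hypothetical annotation table (the ROLES dict, invented
here and not part of any schema draft) that marks foo as a container,
bar as an object, and hack and flurb as properties, and then uses it to
read an instance document into an objects-and-properties view.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotations; in practice these would be carried by the schema.
ROLES = {
    "foo": "container",
    "bar": "object",
    "hack": "property",
    "flurb": "property",
}

def to_objects(xml_text, roles=ROLES):
    """Return one dict per element that the annotations mark as an object."""
    root = ET.fromstring(xml_text)
    objects = []
    for elem in root.iter():
        if roles.get(elem.tag) != "object":
            continue
        # Only children annotated as properties contribute to the object.
        props = {
            child.tag: (child.text or "").strip()
            for child in elem
            if roles.get(child.tag) == "property"
        }
        objects.append({"type": elem.tag, "properties": props})
    return objects

INSTANCE = (
    "<foo>"
    "<bar><hack>1</hack><flurb>x</flurb></bar>"
    "<bar><hack>2</hack></bar>"
    "</foo>"
)

if __name__ == "__main__":
    for obj in to_objects(INSTANCE):
        print(obj)
```

The instance document itself stays ordinary XML; only the table decides
which elements count as objects and which as properties.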

My understanding of the Cambridge Communique meeting was that we reached
agreement on just this. See points 1-6 under '3. Observations and
Recommendations' in http://www.w3.org/TR/1999/NOTE-schema-arch-19991007

If it _is_ really trivial to define a mapping from XML Schema
information to a classes/objects/properties RDFesque model, I for one would like to
see this documented and implemented. XML-DEV seems as good a place as
any to play around with such a thing...

Excerpt from the Cambridge Communique:
(I've no idea where the XML Schema WG's work is up to in relation to
these ideas; the basic principles outlined here seem enough to get
discussion going on XML-DEV though)

	[from http://www.w3.org/TR/1999/NOTE-schema-arch-19991007]
	3. Observations and Recommendations 
	This group reached consensus on the following observations and recommendations: 

	The XML data model is the XML Information Set being specified by the XML
	Information Set Working Group. Other data models exist, both generic and
	application-specific. RDF is an example of one such generic data
	model. The XML Schema and RDF Schema languages are separate languages
	based on different data models and do not need to be merged into a
	single comprehensive language. 

	An XML Schema schema document will be able to hold declarations for
	validating instance documents. It should also be able to hold
	declarations for mapping from instance document XML infosets to
	application-oriented data structures. 

	For evolvability and interoperability, the XML Schema specification
	should provide an extension mechanism allowing for the augmentation of
	XML Schema schemas with additional material. At a minimum, XML Schema
	should permit elements from other namespaces to be included in schema
	documents. This extension mechanism should also permit individual
	extensions to be marked 'mandatory', meaning that a document instance
	cannot be deemed 'schema valid' if the processing required by a marked
	extension cannot be performed. 

	The extension mechanism should be appropriate for use to incorporate
	declarations ("mapping declarations") to aid the construction of
	application-oriented data structures (e.g. ones implementing the RDF
	model) as part of the schema-validation and XML infoset construction
	process. This facility should not be exclusive to RDF, but should also
	be useable to guide the construction of data structures conforming to
	other data models, e.g. UML. 
	[...]
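
The extension mechanism described in the excerpt might look roughly
like the sketch below. Everything concrete in it is an assumption made
for illustration: the two namespace URIs, the schema, element, class
and property element names, and the mandatory attribute are invented
and are not taken from any XML Schema draft. The code collects
declarations from a foreign namespace and refuses to validate when a
mandatory one comes from a namespace the processor does not understand.

```python
import xml.etree.ElementTree as ET

# Invented namespace URIs, for illustration only.
SCHEMA_NS = "http://example.org/hypothetical-schema"
MAP_NS = "http://example.org/hypothetical-mapping"

SCHEMA_DOC = f"""
<schema xmlns="{SCHEMA_NS}" xmlns:map="{MAP_NS}">
  <element name="bar">
    <map:class name="Bar" mandatory="true"/>
  </element>
  <element name="hack">
    <map:property name="hack"/>
  </element>
</schema>
"""

def read_extensions(schema_text):
    """Return (declared element, extension namespace, local name, mandatory)."""
    root = ET.fromstring(schema_text)
    found = []
    for decl in root.findall(f"{{{SCHEMA_NS}}}element"):
        for ext in decl:
            if not ext.tag.startswith("{"):
                continue  # un-namespaced child, not an extension
            ns, _, local = ext.tag[1:].partition("}")
            if ns == SCHEMA_NS:
                continue  # ordinary schema content
            mandatory = ext.get("mandatory") == "true"
            found.append((decl.get("name"), ns, local, mandatory))
    return found

def can_validate(schema_text, understood_namespaces):
    """False if a mandatory extension uses a namespace we cannot process."""
    return all(
        ns in understood_namespaces
        for _, ns, _, mandatory in read_extensions(schema_text)
        if mandatory
    )

if __name__ == "__main__":
    print(read_extensions(SCHEMA_DOC))
    print(can_validate(SCHEMA_DOC, {MAP_NS}))
    print(can_validate(SCHEMA_DOC, set()))
```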

> > It can be.  The DOM represents a domain-specific object layer that is
> > useful for a wide subset of XML operations (especially document- and
> > browser-oriented work).  There need to be many layers on top of XML,
> > one for each domain -- it happens that many of those layers will share
> > the need to encode objects, so a standard object layer sandwiched
> > between XML and the domain-specific layers can save a lot of work.
> 
> Sure, the DOM has value. My point is that maybe 95% of applications want
> a domain-specific rather than a generic interface. My other point is
> that a domain-specific interface can be implemented generically; i.e.
> programmatic interfaces for accessing XML data can be generated
> automatically from XML schemas. This isn't *that* far from what MDSAX is
> doing. IBM's XML BeanMaker (http://alphaworks.ibm.com/tech/xmlbeanmaker)
> is a good example of this concept.
> 
> > > There are a variety of efforts to create
> > > domain-specific objects automatically from XML objects. I don't have a
> > > list at the tips of my fingers, but if anyone does it would be a great
> > > resource. They are out there because I keep bumping into them.
> > 
> > One example is RDF.
> 
> So we are talking about different things. RDF is a formalism but it
> doesn't provide you with any code (although I'm sure that tools for this
> could be written, and perhaps already have been). I am talking about
> something that will take my schema with Customer and Invoice element
> types and turn it into, say, Java classes called Customer and Invoice.

Sure, you could do this. My hunch is that the urge to do this won't be
as strong when we have more abstract (objects and properties) interfaces
to XML content, rather than our current APIs that obsess on detail of
particular serialisations rather than on what those serialisations have
told us about the objects. If we could get to a world where generic
rather than domain-specific interfaces are useful to even 10% instead
of 5% of applications (to borrow your figure), that'd be a huge win.
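
For concreteness, the schema-to-classes idea with Customer and Invoice
can be sketched in a few lines of Python. This is an illustration under
stated assumptions rather than a description of any existing tool: the
SCHEMA dict stands in for an already-parsed schema, and the property
names in it (name, email, number, total) are invented.

```python
from dataclasses import field, fields, make_dataclass
from typing import Optional
import xml.etree.ElementTree as ET

# Hypothetical schema summary: element type -> names of its child properties.
SCHEMA = {
    "Customer": ["name", "email"],
    "Invoice": ["number", "total"],
}

def generate_classes(schema):
    """Build one dataclass per element type; every field defaults to None."""
    return {
        type_name: make_dataclass(
            type_name,
            [(prop, Optional[str], field(default=None)) for prop in props],
        )
        for type_name, props in schema.items()
    }

def load(elem, classes):
    """Instantiate the generated class whose name matches the element tag."""
    cls = classes[elem.tag]
    # findtext returns None when the child element is absent.
    values = {f.name: elem.findtext(f.name) for f in fields(cls)}
    return cls(**values)

if __name__ == "__main__":
    classes = generate_classes(SCHEMA)
    doc = "<Customer><name>Ann</name><email>ann@example.org</email></Customer>"
    print(load(ET.fromstring(doc), classes))
```

The generic alternative is to skip the generated classes and hand the
application the name/value pairs directly.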

> > I disagree strongly with the last part of that statement.  I'd argue
> > the opposite -- higher-level layers should be as independent of XML as
> > possible.  That's the only way to build good, layered architectures.
> > XML does one thing (represent a tree structure in a character stream)
> > very well: it's an excellent layer to build other layers on top of,
> > but XML itself should stay as simple as possible so that it's
> > applicable widely to many different fields.
> 
> I agree with the layering approach. But well-formed XML should be viewed
> as the lowest level (representing tree structures); when bound to an XML
> schema it then becomes a serialized object representation.

There is also a need to know the objects'n'properties view of the data
without going to fetch (or having advance knowledge of) the
syntactic schema or serialisation policy. RDF's
initial syntax was one approach; there have been and will be
others. The Microsoft folks were for a while throwing around some
interesting ideas on mapping more 'colloquial' XML syntax into directed
labelled graphs. There's a version at http://www.biztalk.org/Resources/canonical.asp
for example. 

> > That would be another serious mistake.  Object exchange, while
> > important, represents only one of many layers that can be built on top
> > of XML, and if XML Schemas start trying to solve high-level problems
> > for every specific domain, it will become an unimplementable mess.
> > RDF already made a similar mistake by mixing together a spec for
> > object encoding in XML with a spec for representing knowledge about
> > Web pages.
> 
> Maybe this is the crux of our disagreement. I see object exchange as
> *the* application for valid XML. 

I've also heard that some folks want to use it for structured hypertext 
documents...

(One consequence of XML's document heritage is that document order is
generally treated as meaningful and in need of preservation. This can be
a pain in the butt for data-centric apps.)
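
A small sketch of the ordering point, reusing the element names bar,
hack and flurb from earlier in the thread (the two instance strings are
made up): the same two properties in a different order are different
documents but the same data.

```python
import xml.etree.ElementTree as ET

def ordered_view(xml_text):
    """Document view: children as a sequence, so order is significant."""
    return [(child.tag, child.text) for child in ET.fromstring(xml_text)]

def property_view(xml_text):
    """Data view: children as an unordered set of (name, value) pairs."""
    return frozenset(ordered_view(xml_text))

A = "<bar><hack>1</hack><flurb>x</flurb></bar>"
B = "<bar><flurb>x</flurb><hack>1</hack></bar>"

if __name__ == "__main__":
    print(ordered_view(A) == ordered_view(B))    # compared as documents
    print(property_view(A) == property_view(B))  # compared as data
```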

> I'd be interested to hear some examples
> of applications that cannot be cast effectively in this light. In this
> view, RDF and XML Schemas are coming at the same problem from different
> angles. RDF is saying essentially "how do we build an XML application
> that represents object structures", 

This is one aspect of what RDF attempts, i.e. the syntax component.
The initial RDF Syntax is saying 'how do we build an XML application to
represent a particularly Webbish flavour of object structures?' (i.e.
directed labelled graphs with web identifiers for nodes, node types,
relation/property types).

RDF in general doesn't look for one way of stuffing RDF data graphs into
XML; there are bound to be many ways of shipping these kinds of object
structures around in angle brackets.  So... the upper levels of RDF
(model and schema) *don't* care about the way in which we "build an
XML application that represents object structures".
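
To illustrate the separation of model from syntax, the sketch below
reads two made-up XML serialisations (neither is RDF 1.0 Syntax; the
data, thing, creator and title names are invented) into the same set of
(node, property, value) triples. Property names are left as short
strings here rather than full web identifiers, which a real directed
labelled graph would use.

```python
import xml.etree.ElementTree as ET

def from_element_syntax(xml_text):
    """Properties written as child elements of a resource element."""
    triples = set()
    for res in ET.fromstring(xml_text):
        for prop in res:
            triples.add((res.get("id"), prop.tag, prop.text))
    return triples

def from_attribute_syntax(xml_text):
    """Properties written as attributes on a resource element."""
    triples = set()
    for res in ET.fromstring(xml_text):
        for name, value in res.attrib.items():
            if name != "id":  # "id" names the node, it is not a property
                triples.add((res.get("id"), name, value))
    return triples

A = (
    '<data><thing id="http://example.org/doc1">'
    "<creator>Dan</creator><title>Notes</title>"
    "</thing></data>"
)
B = '<data><thing id="http://example.org/doc1" title="Notes" creator="Dan"/></data>'

if __name__ == "__main__":
    print(from_element_syntax(A) == from_attribute_syntax(B))
```

Code written against the triples does not need to know which of the two
syntaxes the data arrived in.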

> while XML Schemas are saying "how do
> we enhance DTDs by adding some object-oriented facilities". My fear is
> that these two approaches are going to meet somewhere in the middle and
> turn out to be the same thing. If so, I vastly prefer the use of XML
> schemas. Why? Because this results in a vast simplification of the whole
> XML picture. Isn't it better to take a normal XML instance, using base
> XML syntax, and "turn" it into an object by adding the appropriate
> information in a separate schema, rather than having to recast the whole
> thing in a different syntax?

I don't see a conflict here. RDF is happy with multiple ways of shipping
data around; what it cares about is having a unified model for this
heterogeneous data. Nobody I've met ever expected all interesting RDF
applications to use RDF 1.0 Syntax.

> (I wonder if I am expressing this idea clearly. I'll happily post an
> example of how this could be done if I'm not.)

I'd love to see examples of an annotated XML Schema that shows how to
derive an objects'n'properties view of instance data.

Dan

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1