<HTML>

<HEAD>

   <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">

   <META NAME="GENERATOR" CONTENT="Mozilla/4.04 (Macintosh; U; PPC) [Netscape]">

</HEAD>

<BODY>

<CENTER>

<H1>

XML Documents Are Objects!</H1></CENTER>

<CENTER>or</CENTER>

<CENTER>

<H2>

Killing OO Softly With XML</H2></CENTER>

<CENTER></CENTER>

<CENTER><A HREF="mailto:pazandak@objs.com">Paul Pazandak</A></CENTER>

<CENTER><A HREF="http://www.objs.com/">Object Services and Consulting,

Inc.</A></CENTER>

<HR WIDTH="100%">

<CENTER><I>"Wouldn't it be nice if one could simply tell an object to serialize

to XML, and then deserialize back into an object?"</I></CENTER>

<P>As programmers, do you long for the old days when data was data and code

was code? Do you buy into the idea that the behavior associated with data

should be embedded within the application so as to restrict reuse of that

data? Ah, the good old days of relational databases! In its current usage

XML is enabling you to revisit those days again... but don't be persuaded

by the dark force! Put on your OO glasses and see the light!

<P>Sure, XML provides incredible potential, and I am all for it. But in

their current form, XML documents are nothing more than mobile semi-structured

non-object databases (ohhh so close! But not quite enough). Why is it that

programmers have suddenly forgotten all about objects just so they could

write XML? Is a return to relational databases that enticing? (Bleech!)

The only practical reasoning behind such an approach is that programmers

want to keep their data private. They don't want other applications to

have the ability to reuse that data, and they accomplish this feat by embedding

all of the code associated with that data (formerly called "behaviors"

in the OO era) in their own applications. [Who's running this show anyway?

Is XML some kind of conspiracy to kill OO?]

<P>Here's a simple example. You write an application that converts unformatted

poems into composite poem objects rich with behavior. You want to store

these poems, and share them with other applications that want to do things

with poems (whatever it is you do with poems). You define an XML structure

and start generating XML documents as a means to store and share the poems.

Every application (<B>including</B> yours) that reads in your poems using

an XML parser will see the poem as something similar to:

<P>[This XML document was taken from an example accessible at the <A HREF="http://www.microstar.com/">Microstar</A>

website (distributors of the AElfred XML parser).&nbsp; The file name is

<A HREF="http://www.microstar.com/XML/donne.xml">donne.xml</A>. Below is

the parse tree for this document.]

<P><TT>root |-> Element |-> Element |-> Element</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |-> Element</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |-> Element

|-> Element</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|-> Element |-> Element</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|-> Element</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|-> Element</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|-> Element</TT>

<P>Pretty impressive, right? Then <I>every</I> single application will need

to supply its own code to understand how to navigate and interpret this

structure, and provide behavior for it. This is typical if you are a C

programmer, but be clear, this isn't OO. And, while DOM takes us a bit

farther, you still won't get the parser to produce a poem object and its

poem-specific behaviors from the XML document (but we still want DOM!).

<P>The process of generating XML strips the behavior out of the objects;

or, saying it differently, XML and related standards do not describe a

mechanism by which one can attach behavior to XML documents. The parser,

in turn, cannot therefore work miracles when it reads the data (which are

no longer objects) back into the application. Or can it? Why can't we view

XML as a serialized object representation? If we agree that this is not

too far fetched, then why can't parsers deserialize or objectify the objects

contained in the XML documents, rather than simply handing us data and

making the applications do all of the work?&nbsp; What if the parsers generated

<I>real</I> classes (with behavior!) instead of generic <TT>Element</TT>

classes? (Perhaps it would be more motivating if we talked about XML documents
as orders, or anything else, instead of poems.) The poem above would instead
look like this:

<P><TT>root |-> poem&nbsp;&nbsp;&nbsp; |-> front&nbsp;&nbsp; |-> title</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |-> author</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |-> revision-history

|-> item</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|-> body&nbsp;&nbsp;&nbsp; |-> stanza</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|-> stanza</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|-> stanza</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|-> stanza</TT>

<P>Oh, but could it be <I>that</I> simple? (The answer is "yes.") Would

having a parser output <B>objects </B>with type-specific <B>behavior </B>be

useful? (Hmm...) Would programmers really want to share their <B>objects</B>

if they could? (The answer <I>should</I> be "yes.") Even if they didn't

want to share their objects, or if nobody wanted their objects, why violate

the principles of OO and make the programmers' lives more difficult? Wouldn't

it be nice if one could simply tell an object to serialize to XML, and

then deserialize back into an object?

<P>With some VERY simple extensions to current parsers this can occur,

and already has -- we've created an extended version of the <A HREF="http://www.textuality.com/Lark">Lark

XML parser</A> which provides this capability. Our input to this extended

parser is the XML document and the type-specific classes (like poem) extended

with the basic ability to deserialize themselves.

<BR>

<HR WIDTH="100%">

<H2>

Introduction</H2>

XML documents are indeed objects, or at least they <I>could</I> be. If

we simply associate behavior with the data structures defined within the

XML documents we could have normal, living, breathing objects... like we're

used to in the programming world. Instead of having the parser breathe
life back into our objects as part of deserializing (re-objectifying)
them, we are forced to do this within our applications. Simply put,

parsers aren't doing enough for us.

XML parsers currently support non-portable object specifications.

While the XML <I>documents</I> themselves are portable by virtue of being

written in XML, the objects represented by those documents cannot be

objectified without an accompanying document-specific application which

interacts with the parser.

<P>Current XML parsers provide the ability to parse an XML document, and

perhaps generate a generic object structure (parse tree) corresponding

to the document. However, XML documents could potentially represent more

than simple structured documents; they could describe complex objects with

behavior. Common (simple) examples of XML documents include address lists.

But making use of this information requires each application which desires

to consume address lists to write parser-related code, as well as code

to implement the behaviors of the address lists and their entries. We propose

a simple extension to parsers which would all but eliminate application-parser

interaction and the need for document handlers (which do not migrate with

the XML document), and would facilitate objectifying XML documents into

type-specific objects (like we're used to having in the programming world)

having all related behaviors intact.

<BR>&nbsp;

<H2>

Background</H2>

Current XML parsers generate generic parse trees (most do anyway). These

trees represent the <B>structure</B> of the data that was parsed. But what

is missing is the <B>behavior</B> associated with this data. While there

are methods associated with the generic parse tree elements, these are

not data-specific but rather generic methods (see the <A HREF="#SampleCode">sample

code</A>). This approach places the burden on the application to deserialize

the document back into objects using the generic calls and a lot of validating

code. <B>This is true of all current XML parsers</B> (which support parse

tree generation)<B>.</B>

<P>Once the XML document is parsed the information needs to be retrieved

by the application, so it must access it from the parse tree (if one was

generated -- see the note on problems with <A HREF="#Event-based parsing">event-based

parsing</A>). In general, the consuming application may proceed in one

of two ways to accomplish this:

<UL>

<LI>

<B>Simple Extraction.</B></LI>

<BR>The application will march down the structure, extracting out and consuming

the data as it goes. This requires making calls using the generic parse

tree methods (parser-specific -- SAX doesn't support a parse tree API).

<UL>&nbsp;</UL>

<LI>

<B>Tree Transformation / Mapping</B></LI>

<BR>The application copies the data out of the generic parse tree into

type-specific structures (e.g. Java objects) which contain type-specific

definitions. The data is then accessed by the application using the type-specific

API of these new structures.</UL>

In both cases, the application embeds the intelligence of how to access

the data within itself. This is not unlike the approach used by relational

database applications, which separate the data from its behavior. If another

application wishes to access this data, it must define its own behavior

for that data.

<BR>&nbsp;

<H3>

An Example</H3>

Here's an example to illustrate this. This XML document was taken from

an example accessible at the Microstar website (distributors of the AElfred

XML parser).&nbsp; The file name is <A HREF="http://www.microstar.com/XML/donne.xml">donne.xml</A>.

When an XML parser generates a parse tree for this document, the resulting

(informative) tree will look like the following in Lark (and similar in

the other parsers as well):

<P><TT>root |-> Element |-> Element |-> Element</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |-> Element</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |-> Element

|-> Element</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|-> Element |-> Element</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|-> Element</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|-> Element</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|-> Element</TT>

<P>The <TT>Element</TT> entries are the objects created by the XML parser corresponding

to the Element Declarations in the XML document. To determine what each

element is, the application must navigate the structure and inspect each

<TT>Element </TT>object using a generic API. This requires that the knowledge

of how to navigate the structure is embedded within the application.

The interface of this object must be embedded within the application as

well, which really violates the object-oriented paradigm -- yes, the data

is stored in objects, but the associated <B>type-specific </B>behavior

is stored someplace else. While this may appear similar to how objects

are serialized today (without code), the distinction is that any other

application that wants to access this object will <B>not</B> have access

to the code since it is buried in the application which created it. All

other applications will have to provide their own code (this, again, is

how applications for relational databases are written).

<P>There are several other problems with this approach, not the least of

which is that the application should not be responsible for doing this.

Furthermore, the parser-related code required to walk a complex structure

is complex itself (not quite as complex as code used for event-based parsing

of complex structures however), and is more difficult to maintain. Finally,

the application is forced to do what the parser has already done, that

is, understand and navigate the structure of the document. The parser has

already gone through the entire document and generated a structured instantiation

of objects. The crux of the problem is that the parser generates <B>generic</B>

objects which forces all of this additional work on the application. Worse

yet, there is no reason this has to occur -- nor does the (tree-generating)

parser have to be significantly modified.

<BR>&nbsp;

<H3>

<A NAME="Event-based parsing"></A>Event-based parsing</H3>

An alternative to tree generation is simply to consume the structure on-the-fly

as it is parsed. This requires writing an XML structure-specific handler

(a <I>document</I> handler in SAX terms) which describes what should happen

for each XML declaration that is encountered; no structure is automatically

generated, so if objectification of the XML document is desired the handler

is responsible for this. Using event-based parsing, the application could
adopt either of the two approaches above: simple consumption, or construction
of some structure corresponding to the XML document. In both cases, at least
for complex XML structures, there would be a lot of conditional, segmented
code, which is more difficult to write and to modify when the XML structure
changes. With the proposed extension, the majority of the work is done by the
tree-generating parser, empowering the application to see XML documents as
objects and relieving it of the burden of event-based parsing.

<P>Granted, when an application will only encounter one kind of XML structure,

event-based parsing might be a reasonable approach from the standpoint

that only one handler would need to be written. But it still suffers from

some of the same problems as generic parse tree generation (see the <A HREF="#Summary">summary</A>

section).
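<P>As a rough illustration of the conditional code that event-based parsing
tends to require, here is a minimal sketch that pulls just the poem title out
of a document. It uses the SAX2 <TT>DefaultHandler</TT> API that ships with
current Java, which is a later API than the SAX document handler mentioned
above. The class name <TT>TitleHandler</TT>, the helper <TT>extractTitle</TT>,
and the element layout it expects are assumptions made for illustration; they
are not part of Lark.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of the title element inside front.
class TitleHandler extends DefaultHandler {
    private boolean inFront = false;   // state shared across callbacks
    private boolean inTitle = false;
    private final StringBuilder title = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Every element type the application cares about needs its own branch.
        if (qName.equals("front")) {
            inFront = true;
        } else if (qName.equals("title") && inFront) {
            inTitle = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inTitle) {
            title.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("title")) {
            inTitle = false;
        } else if (qName.equals("front")) {
            inFront = false;
        }
    }

    // Parses the given XML text and returns the title, or null if none was found.
    static String extractTitle(String xml) {
        TitleHandler handler = new TitleHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return handler.title.length() == 0 ? null : handler.title.toString();
    }
}
```

<P>Even for this one field, state has to be tracked across three callbacks,
and each additional element type of interest adds further branches to each
of them.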

<BR>&nbsp;

<H2>

XML Parsers Extended</H2>

What if the output of the parser were a type-specific structure which coincided
with the definition of the structure in the XML document? And what if
the resulting objects contained the type-specific behavior for the specific

element type parsed? What if the resulting parse tree for the example above

instead looked like:

<P><TT>root |-> poem&nbsp;&nbsp;&nbsp; |-> front&nbsp;&nbsp; |-> title</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |-> author</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |-> revision-history

|-> item</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|-> body&nbsp;&nbsp;&nbsp; |-> stanza</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|-> stanza</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|-> stanza</TT>

<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

|-> stanza</TT>

<P>where<TT> poem, front, body, title, author, revision-history, </TT>and<TT>

stanza </TT>were all classes with type-specific behavior? Instead of writing

something like the following to retrieve the title of the poem:

<P><TT>&nbsp;<A NAME="SampleCode"></A>Vector v = root.children();</TT>
<BR><TT>&nbsp;if (v != null &amp;&amp; !v.isEmpty()) {</TT>
<BR><TT>&nbsp;&nbsp;&nbsp; Element front = (Element) v.elementAt(0); // v(0) "should" be the front element, we hope</TT>
<BR><TT>&nbsp;&nbsp;&nbsp; if (front != null) {</TT>
<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; v = front.children();</TT>
<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; for (int i = 0; v != null &amp;&amp; i &lt; v.size(); i++) {</TT>
<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Element child = (Element) v.elementAt(i);</TT>
<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if (child.type().equals("title"))</TT>
<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; return child.content();</TT>
<BR><TT>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }</TT>
<BR><TT>&nbsp;&nbsp;&nbsp; }</TT>
<BR><TT>&nbsp;}</TT>
<BR><TT>&nbsp;return null;</TT>

<BR>&nbsp;

<P>one could simply write:

<P><TT>&nbsp;poem.getTitle();</TT>

<P>More importantly, all of the behaviors that should be associated with

each of these object types would be defined as part of the object interfaces

themselves rather than embedded within the application.

<P>Granted, an application can generate this same structure using the transformation

/ mapping technique above. However, this is partially a duplication of

effort since it requires the application to navigate the structure generated

by the parser, and then generate a new structure which mirrors the

parse tree. The extension to Lark eliminates the need to do this because

it instantiates the correct type-specific parse tree the first time.

Note that this is an extension to Lark, and therefore applicable to any

XML document.

<BR>&nbsp;

<H3>

Details</H3>

What occurs in the underlying implementation of an XML parser is rather

straightforward. When it sees an XML element declaration, it instantiates

a generic <TT>Element</TT> object (with only <TT>Element</TT>-related methods).

The extension to Lark simply extends the behavior of the parser so that

instead of instantiating generic <TT>Element</TT> objects, it instantiates

type-specific ones.

<P>So when the parser encounters a new element declaration, it looks for

a <I>class declaration</I> which identifies which class to instantiate

in lieu of a generic <TT>Element </TT>class object (where it looks is described

below). For example, when the parser identifies the "poem" element declaration,

it looks for a class declaration for poem. If it finds one, it instantiates

an object of that class rather than a generic <TT>Element </TT>object.

The <TT>poem </TT>class extends the interface of the Lark <TT>Element </TT>class,

but in addition, adds type-specific methods relevant to a <TT>poem </TT>object.
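<P>The lookup-and-instantiate step described above can be sketched roughly as
follows. This is not Lark's actual code: the <TT>Element</TT> class here is a
stand-in for a parser's generic element class, and <TT>ElementFactory</TT>,
<TT>declare</TT>, and <TT>create</TT> are hypothetical names. A registry of
constructors stands in for the class declarations that the parser would read
from wherever they are kept.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Stand-in for a parser's generic element class (not Lark's real API).
class Element {
    String type;
    String content = "";
    final List<Element> children = new ArrayList<>();
}

// Type-specific class: inherits the generic structure, adds poem behavior.
class Poem extends Element {
    // Returns the content of the first "title" grandchild, or null.
    String getTitle() {
        for (Element section : children) {
            for (Element child : section.children) {
                if ("title".equals(child.type)) {
                    return child.content;
                }
            }
        }
        return null;
    }
}

// Hypothetical registry the parser consults for each element it encounters.
class ElementFactory {
    private final Map<String, Supplier<? extends Element>> declarations = new HashMap<>();

    void declare(String elementType, Supplier<? extends Element> constructor) {
        declarations.put(elementType, constructor);
    }

    // Instantiates the declared class if there is one, else a generic Element.
    Element create(String elementType) {
        Supplier<? extends Element> constructor = declarations.get(elementType);
        Element element = (constructor != null) ? constructor.get() : new Element();
        element.type = elementType;
        return element;
    }
}
```

<P>In this sketch an application would call <TT>declare("poem", Poem::new)</TT>
before parsing. Any element type without a declaration still comes back as a
plain <TT>Element</TT>, so documents with no class declarations would parse
exactly as before.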

<P>Within a type-specific parse tree class, like <TT>poem,</TT> is code

which understands how to extract the parsed information. In effect, the

object understands how to investigate itself. This code is provided by

the object type creator. It will travel with the object as a means to facilitate

re-objectifying the XML back into an object. This enables reuse of the

object by any application. Of course, as stated above, the <TT>poem</TT>

class will also provide a poem-specific interface.

A method I have added to the <TT>Element </TT>class is <TT>process()</TT>.

It can be called once an element has been parsed. In each implementation,

for example within the <TT>poem</TT> class, the <TT>process()</TT> code

handles extracting the data from the inherited generic structures of the

<TT>Element</TT> class. Alternatively, <TT>poem</TT> methods could simply

be written that do this directly. But, it is important to note that the

object itself is doing this, and further, that no other parse trees or

duplicate structures are being constructed.
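<P>One way <TT>process()</TT> might look is sketched below. The generic
<TT>Element</TT> class shown is a stand-in rather than Lark's real class, and
every name other than <TT>process()</TT> is an assumption made for
illustration. The point is only that the poem fills in its own title from the
structure it inherited, without building a second tree.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the parser's generic element class (not Lark's real API).
class Element {
    final String type;
    String content = "";
    final List<Element> children = new ArrayList<>();

    Element(String type) {
        this.type = type;
    }

    // Called once the element has been fully parsed; generic elements do nothing.
    void process() {
    }

    // Generic helper: first direct child of the given type, or null.
    Element firstChild(String childType) {
        for (Element child : children) {
            if (childType.equals(child.type)) {
                return child;
            }
        }
        return null;
    }
}

class Poem extends Element {
    private String title;   // filled in by process()

    Poem() {
        super("poem");
    }

    // The object inspects its own inherited structure; no copy of the tree is made.
    @Override
    void process() {
        Element front = firstChild("front");
        Element titleElement = (front == null) ? null : front.firstChild("title");
        title = (titleElement == null) ? null : titleElement.content;
    }

    String getTitle() {
        return title;
    }
}
```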

<P>The location of the class declaration is not hard-coded. It could be

within the XML file itself, in a DTD, in a stylesheet, or in a remote repository,

for example. In addition, local class declarations may be used to override

default class declarations. In the implementation of the Lark extension,

I have simply embedded them in the DTD file along with the declaration

of the structure of the XML file. In its current form the class declaration

would look like the following for the poem example above, although there

would be many ways to accomplish this:

<P>&lt;!ENTITY Poem-Class "http://www.objs.com/xml/poem/com.objs.ia.specification.xml.poem">

<BR>&lt;!ENTITY Front-Class "http://www.objs.com/xml/poem/com.objs.ia.specification.xml.front">

<BR>&lt;!ENTITY Body-Class "http://www.objs.com/xml/poem/com.objs.ia.specification.xml.body">

<BR>...

<BR>&lt;!ENTITY ClassSuffix "-Class">

<P>The ClassSuffix is used to avoid possible naming collisions (which may

be solved otherwise using the XML namespaces proposal).&nbsp; So, when

a new element declaration is identified by Lark it inspects this list looking

for an entry matching the pattern &lt;element type>&lt;ClassSuffix>, or

in the case of the poem element declaration, "Poem-Class".
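<P>The name matching described above could be sketched along these lines.
<TT>ClassDeclarations</TT> and its methods are hypothetical names, not part of
Lark, and capitalizing the first letter of the element type is an assumption
made so that "poem" matches an entity named "Poem-Class". The sketch stops at
returning the declared entity value; how the parser then loads the class is
left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup table built from the entities declared in the DTD.
class ClassDeclarations {
    private final Map<String, String> entities = new HashMap<>();

    void addEntity(String name, String value) {
        entities.put(name, value);
    }

    // Builds the entity name <element type><ClassSuffix>. Capitalizing the
    // first letter is an assumption made to match names such as "Poem-Class".
    String entityNameFor(String elementType) {
        String suffix = entities.getOrDefault("ClassSuffix", "");
        if (elementType.isEmpty()) {
            return suffix;
        }
        return Character.toUpperCase(elementType.charAt(0))
                + elementType.substring(1) + suffix;
    }

    // Returns the value declared for the element type, or null, in which
    // case the parser would fall back to a generic Element.
    String classLocationFor(String elementType) {
        return entities.get(entityNameFor(elementType));
    }
}
```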

<BR>&nbsp;

<H3>

Caveat Language?</H3>

Is this a language-specific extension? Not really. The class declarations

could be (for example) written in ActiveX I suppose, or even wrapped in
CORBA, thereby enabling any language to take advantage of the idea of XML
documents as objects. It would be up to the parser to find the correct class

declaration and objectify accordingly.

<BR>&nbsp;

<H3>

Implementation Experience</H3>

My experience with XML parsers began last year. As part of a DARPA-funded

project I am implementing an architecture to demonstrate scalable object

service architectures. I started using event-based parsing as a means to

import object service specifications. These XML specifications represent

real (Java and CORBA) services that are invoked by the architecture.

<P>I noticed that by adopting an event-based approach to parsing I would

have to write a lot of code which would be difficult to maintain should

I have changes in the future. In addition, this code would be hard for

someone else to understand since each parser callback method would include

conditional statements for several types of elements, and the code would

be spread across several methods. I prefer a clean separation of code whenever

possible, and this didn't seem very clean.

<P>I decided that tree parsing was a more practical route. The parser would

automatically generate a structure for me. But, then I realized that I

had to write all of the code to navigate this generic object structure,

pull out the information I wanted, and then copy it into service specification

objects having the behavior I wanted.

<P>Since the parser was already generating classes, why not just tell it

to generate the real classes to begin with?&nbsp; The classes themselves

would handle deserialization.&nbsp; Sounds like OO to me! With modest changes

to Lark, when it sees an XML service specification document it will generate

service specification objects right away. This extension will work for

any XML document which defines specializations of the <TT>Element</TT>

class and makes them available to the parser. Besides asking Lark to parse

the document, my application has no other parser-related code. Furthermore,

any other application can use my XML service specification documents, and

load them in as service specification objects with only a few lines of

code.

<BR>&nbsp;

<H3>

<A NAME="Summary"></A>Summary</H3>

In summary, an extension has been presented which extends the capabilities

of Lark, but which could be applied to all tree-generating XML parsers.

It enables type-specific composite object construction to occur within

the parser which is a significant improvement over generic parse tree construction

because:

<UL>

<LI>

We can attach <B>behavior</B> to XML documents.</LI>

<LI>

We can therefore treat XML documents as <B>objects.</B></LI>

<LI>

It eliminates most of the need for the application to understand SAX,

or parser-specific structures, as well as reducing the amount of direct

interaction between the application and the parser. To a certain extent

DOM will accomplish this, but the extension proposed here augments this

by enabling the generation of type-specific interfaces.</LI>

<LI>

The application is then free to interact with the generated objects as

"real" objects having type-specific structure and behavior.</LI>

<LI>

More importantly, the XML documents can roam freely (like serialized
objects) and can be objectified again by any application. This would

not be possible with generic tree parsing or with event-based parsing (which

requires specialized structure-specific parsing handlers).</LI>

</UL>

If we view XML as a means to serialize an object, we should view the parser

as the mechanism to deserialize (or objectify) it. Once we convert an object

to an XML representation, it simply doesn't make sense to throw away its

behavior or the code which understands how it should be deserialized. Embedding

this knowledge within an external application is just revisiting the relational

DBMS experience and ignores the principal benefits of object technology.

<P>If this proposed extension were adopted it would benefit significantly

from a standardization of the <TT>Element</TT> interface (something that

will happen with DOM). In this way, the associated class files would not

be parser-specific, and therefore any XML document could be objectified

by any tree-generating parser.

<BR>&nbsp;

<H3>

Status</H3>

I anticipate that the extensions I have made to Lark will be incorporated

into a next version of Lark (I assume this from previous dialogues I have

had with Tim Bray). If not, and in the meantime, the enhanced version of

Lark is freely available on request.

<H3>

References &amp; Acknowledgements</H3>

Related work in this area is described in <A HREF="http://www.objs.com/OSA/wom.htm">Towards

a Web Object Model</A> by Frank Manola, Object Services and Consulting,

Inc. Thanks to Frank Manola (OBJS, Inc.) and Tim Bray (Textuality, Inc.)

for their useful feedback.

<P>

<HR WIDTH="100%"><FONT SIZE=-2>This research is sponsored by the Defense

Advanced Research Projects Agency and managed by the U.S. Army Research

Laboratory under contract DAAL01-95-C-0112. The views and conclusions contained

in this document are those of the authors and should not be interpreted

as necessarily representing the official policies, either expressed or

implied of the Defense Advanced Research Projects Agency, U.S. Army Research

Laboratory, or the United States Government.</FONT>

<P><FONT SIZE=-2>&copy; Copyright 1998 Object Services and Consulting,

Inc. Permission is granted to copy this document provided this copyright

statement is retained in all copies. Disclaimer: OBJS does not warrant

the accuracy or completeness of the information in this document.</FONT>

</BODY>

</HTML>