Lark 0.90 available, with an applica

Peter Murray-Rust Peter at
Fri Jun 27 20:46:03 BST 1997

In message <E0whbWp-0002rw-00 at> akirkpatrick at writes:
> Sorry if this has been over before, but these are my

No, it's a new and useful discussion :-)

> thoughts on the class-per-element-type idea (mentioned
> recently in Tim Bray's post about Lark).
> I did something very similar recently (admittedly in C++)
> and abandoned it. My application was an SGML->RTF
> convertor. It read the events using SP and created a tree
> of elements derived from SGMLElement but specialised
> towards RTF. The hierarchy looked something like:
>   SGMLElement
>       RtfFile
>       RtfContainer
>       RtfPara
>          RtfTitle
>             RtfTitleTarget
>          RtfAdmonition
>       RtfInline (parametrised)
>          RtfLink
>       etc.
> I found the following drawbacks:

I think the primary problem is that the mapping of SGML to RTF is formally
impossible.  If the SGML application was MathML, the content might be a second
order differential equation; if CML, it might be the active site of HIV 
protease.  Neither of these has the concept of 'paragraph' :-)

It's very common that people use 'SGML' as a shorthand for 'a-conventional
sense'.  They then devise SGML2XYZ translators.  These can only be generic if
they have heuristics about how commonly encountered markup maps onto XYZ.

JUMBO has a small number of such heuristics.  It tries to find the title of
an Element (for display) as follows:
	- use the TITLE attribute
	- else find a child with TITLE elementType
	- else use the ID attribute
	- else take the first 30 characters of PCDATA
	- else take the elementType
but this is only to try to help human navigators - it's not a formal 
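
A minimal sketch of that fallback chain, for illustration only.  The Node
class here is a hypothetical stand-in, not JUMBO's actual code; the field
and method names (attributes, children, pcdata, attr, add, text) are
assumptions, as is the choice to return the TITLE child's character data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a JUMBO-style tree node (not the real class).
class Node {
    final String elementType;
    final Map<String, String> attributes = new LinkedHashMap<>();
    final List<Node> children = new ArrayList<>();
    String pcdata = "";   // character data directly inside this element

    Node(String elementType) { this.elementType = elementType; }

    // Small builder-style helpers, assumed for the example.
    Node attr(String name, String value) { attributes.put(name, value); return this; }
    Node add(Node child) { children.add(child); return this; }
    Node text(String s) { pcdata = s; return this; }

    // Title heuristic: try each source in turn, fall back to the element type.
    String getTitle() {
        String title = attributes.get("TITLE");
        if (title != null) return title;                      // 1. TITLE attribute
        for (Node child : children) {                         // 2. child of type TITLE
            if ("TITLE".equals(child.elementType)) return child.pcdata.trim();
        }
        String id = attributes.get("ID");
        if (id != null) return id;                            // 3. ID attribute
        String text = pcdata.trim();
        if (!text.isEmpty()) {                                // 4. first 30 chars of PCDATA
            return text.substring(0, Math.min(30, text.length()));
        }
        return elementType;                                   // 5. the element type itself
    }
}
```

For example, new Node("ACT").attr("ID", "a1").getTitle() falls through to
the ID attribute, while an element with no attributes, children or text
is simply labelled with its elementType.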

> 1. Leads to "class spaghetti" with similar code being spread
> all over the place.

This isn't necessary if inheritance is used.  JUMBO has a superclass Node which
has default procedures (e.g. getTitle() above).  By default all Elements display
or are processed using this.  There are a lot of useful defaults a Node can 

> 2. There is usually a large degree of dependence between the
> elements and the driving application. Often the elements need
> to access the driving application directly and there is no obvious
> and efficient way to provide this interface.

No.  In JUMBO there is very little coupling between subclassed Nodes and
JUMBO.  Yes, they have to be subclassed from Node, because that's what they
are, but beyond that they have their own behaviour (or none).

> 3. You need to create a new class for each new element type
> (less of a problem in Java?). For C++, this means recompiling
> the application.

My MOLNode class is 1500 lines of Java because molecules are complex.  There
are routines like orthogonaliseFractionalCoordinates, getMolecularWeight,
countHydrogenAtoms, etc.  These would have to be written whatever structure was 
used.  There is actually very little duplicated code.   Similarly Matrix, Graph
and so forth require distinct code.

If classes share common functions then they can be subclasses of an 
intermediate class.  Thus in PLAYDTD, both ACT and SCENE could be subclassed
from PlayDivision.  This class would know that both ACT and SCENE had a child
TITLE.  [Indeed they might both be instances of PlayDivision directly.]  
Many elements can get by with just the generic Node class.
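
As a rough sketch of what I mean (not JUMBO's real code): the Node class,
the NodeFactory helper and all method names below are assumptions made up
for the example.  ACT and SCENE share one intermediate class, and anything
else gets the generic Node with its default behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Generic node with default behaviour (hypothetical, for illustration).
class Node {
    final String elementType;
    final List<Node> children = new ArrayList<>();
    String pcdata = "";

    Node(String elementType) { this.elementType = elementType; }

    Node add(Node child) { children.add(child); return this; }
    Node text(String s) { pcdata = s; return this; }

    // Default: with no better knowledge, label the node by its element type.
    String getTitle() { return elementType; }

    // Shared by every subclass; written once, here.
    String describe() { return elementType + ": " + getTitle(); }
}

// Intermediate class: knows that ACT and SCENE both carry a TITLE child.
class PlayDivision extends Node {
    PlayDivision(String elementType) { super(elementType); }

    @Override
    String getTitle() {
        for (Node child : children) {
            if ("TITLE".equals(child.elementType)) return child.pcdata;
        }
        return super.getTitle();   // no TITLE child: use the default
    }
}

// Assumed helper that picks a class per element type.
class NodeFactory {
    static Node create(String elementType) {
        switch (elementType) {
            case "ACT":
            case "SCENE":
                return new PlayDivision(elementType);   // shared behaviour
            default:
                return new Node(elementType);           // generic default
        }
    }
}
```

Only getTitle() is overridden; describe() is inherited unchanged, which is
the point about avoiding duplicated code.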

> It was actually when I looked at the prospect of creating a whole
> new raft of classes for the HTML output that I decided to start again.
> I rewrote my application to use the following process:
> 1. SgmlReader reads document and creates tree of generic elements.
> Each element has an SgmlRule member variable/class.
> 2. SgmlStylesheet reads a stylesheet (also in SGML) and associates
> properties with the elements based on gi, position, etc. These properties
> are added to the SgmlRule for each element.
> 3. RtfWriter/HtmlWriter/XxxWriter recursively processes the elements
> deciding what to do based on the properties applied by the stylesheet.
> (I realise this is similar to the way Jade operates but our RTF writer
> also handles WinHelp and has other output/app-specific features).
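
For readers following along, the three quoted steps might look roughly
like this in Java.  This is a sketch under assumptions, not the poster's
C++ code: Element, Stylesheet and HtmlWriter are invented names, the rule
map stands in for SgmlRule, the "html-tag" property is hypothetical, rules
are keyed by gi only (not position), and no HTML escaping is done.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Step 1: a generic element; no subclass per element type.
class Element {
    final String gi;                                     // generic identifier
    final String text;                                   // leading character data
    final List<Element> children = new ArrayList<>();
    final Map<String, String> rule = new HashMap<>();    // stands in for SgmlRule

    Element(String gi, String text) { this.gi = gi; this.text = text; }

    Element add(Element child) { children.add(child); return this; }
}

// Step 2: the stylesheet attaches properties to elements, keyed by gi.
class Stylesheet {
    private final Map<String, Map<String, String>> byGi = new HashMap<>();

    Stylesheet rule(String gi, String key, String value) {
        byGi.computeIfAbsent(gi, k -> new HashMap<>()).put(key, value);
        return this;
    }

    // Walk the tree and copy matching properties onto each element.
    void apply(Element e) {
        Map<String, String> props = byGi.get(e.gi);
        if (props != null) e.rule.putAll(props);
        for (Element child : e.children) apply(child);
    }
}

// Step 3: a writer that consults only the properties, never the gi.
class HtmlWriter {
    String write(Element e) {
        StringBuilder out = new StringBuilder();
        String tag = e.rule.get("html-tag");             // hypothetical property
        if (tag != null) out.append('<').append(tag).append('>');
        out.append(e.text);
        for (Element child : e.children) out.append(write(child));
        if (tag != null) out.append("</").append(tag).append('>');
        return out.toString();
    }
}
```

An RtfWriter would walk the same tree and read its own properties, so a
new output format means a new writer and stylesheet, not new element
classes.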

It sounds as if you would be better off using DSSSL, since it handles
transformations.  It's possible to do the same thing in Java - and probably
takes the same amount of code - but you may need to define some formatting
classes (Div, Para, etc.).

> I'd be really interested to hear views in favour of the class approach.

I hope I've given some above.  Wherever the object is complex, then it makes
sense for its behaviour to be attached closely to it.  I wouldn't like to
write a 3-D geometry program in DSSSL (though it would be possible) just as
I'd prefer not to do typesetting in Java.

The difficult part comes with element-in-context.  If an element has different
behaviours in different contexts, then code can become hairy.  This is often a 
problem with CML-like DTDs where there are only 10-20 elements per DTD.  

The other difficult bit is with relations between objects.  This can be managed
generically with XML-LINK, but usually semantics have to be added.  I am 
trying to make XML-LINK as generic as possible in JUMBO, but I suspect there
will be places within one Node where links to another have to be specifically



Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences

xml-dev: A list for W3C XML Developers
