Lark 0.90 available, with an applica

akirkpatrick at akirkpatrick at
Fri Jun 27 15:57:42 BST 1997

Sorry if this has been gone over before, but these are my
thoughts on the class-per-element-type idea (mentioned
recently in Tim Bray's post about Lark).

I did something very similar recently (admittedly in C++)
and abandoned it. My application was an SGML->RTF
convertor. It read the events using SP and created a tree
of elements derived from SGMLElement but specialised
towards RTF. The hierarchy looked something like:

      RtfInline (parametrised)

I found the following drawbacks:

1. Leads to "class spaghetti" with similar code being spread
all over the place.

2. There is usually a large degree of dependence between the
elements and the driving application. Often the elements need
to access the driving application directly and there is no obvious
and efficient way to provide this interface.

3. You need to create a new class for each new element type
(less of a problem in Java?). For C++, this means recompiling
the application.
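
To make the drawbacks concrete, here is a minimal sketch of the class-per-element style. It is not the original code: SGMLElement is the only name taken from the description above, and Driver, Text, RtfPara and RtfEmph, along with all members, are hypothetical names assumed for illustration.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical driving application; elements need it to emit output.
struct Driver {
    std::string out;
};

// Base class. The name comes from the post; the members are assumed.
class SGMLElement {
public:
    virtual ~SGMLElement() = default;
    // Default behaviour: just write the children in order.
    virtual void write(Driver& d) const {
        for (const auto& c : children) c->write(d);
    }
    std::vector<std::unique_ptr<SGMLElement>> children;
};

// Leaf holding character data (hypothetical).
class Text : public SGMLElement {
public:
    explicit Text(std::string s) : data(std::move(s)) {}
    void write(Driver& d) const override { d.out += data; }
    std::string data;
};

// One class per element type. Each repeats the same
// open / recurse / close shape with different RTF control words.
class RtfPara : public SGMLElement {
public:
    void write(Driver& d) const override {
        d.out += "{\\pard ";
        SGMLElement::write(d);
        d.out += "\\par}";
    }
};

class RtfEmph : public SGMLElement {
public:
    void write(Driver& d) const override {
        d.out += "{\\i ";
        SGMLElement::write(d);
        d.out += "}";
    }
};
```

Every new element type means another class shaped like RtfPara, and a second output format (HTML, say) means a parallel set of classes, which is where the duplication in point 1 and the recompilation in point 3 come from.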

It was actually when I looked at the prospect of creating a whole
new raft of classes for the HTML output that I decided to start again.
I rewrote my application to use the following process:

1. SgmlReader reads document and creates tree of generic elements.
Each element has an SgmlRule member variable/class.

2. SgmlStylesheet reads a stylesheet (also in SGML) and associates
properties with the elements based on gi, position, etc. These properties
are added to the SgmlRule for each element.

3. RtfWriter/HtmlWriter/XxxWriter recursively processes the elements
deciding what to do based on the properties applied by the stylesheet.
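
The three steps above can be sketched roughly as follows. This is an illustration under assumptions, not the actual implementation: the class names come from the description, but every member, the property names "display" and "font-style", and the map-based stand-in for a parsed stylesheet are hypothetical. Matching on position is left out.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Properties attached to an element by the stylesheet (step 2).
struct SgmlRule {
    std::map<std::string, std::string> props;
};

// Generic element: a single class serves every element type (step 1).
struct SgmlElement {
    std::string gi;    // generic identifier, i.e. the element name
    std::string text;  // character data, simplified to one string
    SgmlRule rule;
    std::vector<std::unique_ptr<SgmlElement>> children;
};

// Stand-in for a parsed stylesheet: maps a gi to a set of properties.
struct SgmlStylesheet {
    std::map<std::string, SgmlRule> byGi;

    // Walk the tree and copy the matching properties onto each element.
    void apply(SgmlElement& e) const {
        auto it = byGi.find(e.gi);
        if (it != byGi.end()) e.rule = it->second;
        for (auto& c : e.children) apply(*c);
    }
};

// All RTF-specific knowledge lives in the writer (step 3).
struct RtfWriter {
    std::string out;

    void write(const SgmlElement& e) {
        const auto& p = e.rule.props;
        auto style = p.find("font-style");
        auto display = p.find("display");
        const bool italic = style != p.end() && style->second == "italic";
        const bool block = display != p.end() && display->second == "block";

        if (block) out += "{\\pard ";
        if (italic) out += "{\\i ";
        out += e.text;
        for (const auto& c : e.children) write(*c);
        if (italic) out += "}";
        if (block) out += "\\par}";
    }
};
```

An HtmlWriter would read the same properties and emit tags instead, so supporting a new element type is a change to the stylesheet data rather than a new class.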

(I realise this is similar to the way Jade operates but our RTF writer
also handles WinHelp and has other output/app-specific features).

Ideally, this should be generalised further with a SgmlElementPlusRule
class which just contains a pointer to the SgmlElement and the SgmlRule
(otherwise the SgmlElement has a dependency on SgmlRule).
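
One way this decoupling might look, as a sketch only. The three class names are from the paragraph above; the members and the property() helper are hypothetical, assumed for illustration.

```cpp
#include <map>
#include <string>

// The element no longer mentions SgmlRule at all.
struct SgmlElement {
    std::string gi;
};

struct SgmlRule {
    std::map<std::string, std::string> props;
};

// Pairs an element with its rule; neither side depends on the other.
// Non-owning pointers: the tree and the stylesheet keep ownership.
struct SgmlElementPlusRule {
    const SgmlElement* element;
    const SgmlRule* rule;  // may be null if no rule matched
};

// Hypothetical helper: look up a property, with a fallback value
// when there is no rule or the rule does not set the property.
std::string property(const SgmlElementPlusRule& er,
                     const std::string& name,
                     const std::string& fallback) {
    if (er.rule == nullptr) return fallback;
    auto it = er.rule->props.find(name);
    return it == er.rule->props.end() ? fallback : it->second;
}
```

The writers would then walk a tree of SgmlElementPlusRule rather than SgmlElement, and the reader could be reused in applications that have no stylesheet at all.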

The stylesheet mechanism is (just about) independent of the output format.
All the code to handle RTF/HTML/whatever is centralised in the XxxWriter
class. I've found this much easier to enhance and maintain than the original
implementation. I've also found that 90% of the time we can do things in
the stylesheet without recompiling the application.

I'd be really interested to hear views in favour of the class approach.


xml-dev: A list for W3C XML Developers