Architectural Forms, separation of formatting and loose-leaf management

Wed May 6 16:52:47 BST 1998

Here are three random things which may be useful to consider.

1) The first is that DSSSL allows you to have external functions. So even though DSSSL itself has no way to query the pagination system, it does allow you to plug in your own queries or functions. You can do all sorts of tricks with these. I don't know to what extent Jade supports this, though. One trouble with stream-based SGML processors is that they often have an output buffer (or sit in a pipe), so unless you can flush the output buffers, your SGML processor may be left stranded while it waits for feedback from a downstream program that has not yet seen the data.

A DSSSL system built on top of a general purpose Scheme would be most likely to cope with feedback from layout engines.  Tony Graham of the DSSSL list would be a good contact in this regard.
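
The buffering trap in point 1 is not specific to SGML tools, so here is a minimal sketch of it in Python. Everything in it is an assumption made for illustration: the child process is only a stand-in for a layout engine (it just reports the length of each line it receives), and ask_downstream is a made-up name. The point is the flush before each read.

```python
import subprocess
import sys

# Stand-in for a downstream layout engine (hypothetical): for every line
# it receives, it answers with that line's length and flushes its reply.
CHILD = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print(len(line.strip()), flush=True)\n"
)


def ask_downstream(lines):
    """Send each line to the child and wait for its answer before going on."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    answers = []
    for line in lines:
        proc.stdin.write(line + "\n")
        # Without this flush the line may sit in our output buffer, the
        # child never sees it, and the readline() below waits forever.
        proc.stdin.flush()
        answers.append(proc.stdout.readline().strip())
    proc.stdin.close()
    proc.wait()
    return answers
```

The same applies in the other direction: the downstream program has to flush its answers too, which is why the stand-in prints with flush=True.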

2) People often put pagination information in processing instructions.  Or the information can be kept in an external database with, for example, HyTime locators. If you can decide in advance to only break pages on paragraph boundaries, then you can piggyback the pagination information on top of element markup.
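
As a rough sketch of the processing-instruction approach in point 2, the following Python splits a document's text into pages wherever a page-break PI occurs. The PI target "page-break", the sample document and the function name split_pages are all invented for the example; a real system would use whatever PI its formatter emits.

```python
from xml.dom import minidom

# Hypothetical sample: the PI target "page-break" is an assumption.
SAMPLE = (
    "<doc><para>First page text. <?page-break 2?>Second page text.</para>"
    "<para>More on page two.</para></doc>"
)


def split_pages(xml_source, target="page-break"):
    """Return the document text as a list of pages, split at each PI."""
    dom = minidom.parseString(xml_source)
    pages = [[]]

    def walk(node):
        for child in node.childNodes:
            is_pi = child.nodeType == child.PROCESSING_INSTRUCTION_NODE
            if is_pi and child.target == target:
                pages.append([])          # a page break starts a new page
            elif child.nodeType == child.TEXT_NODE:
                pages[-1].append(child.data)
            else:
                walk(child)               # descend into ordinary elements

    walk(dom.documentElement)
    return ["".join(page) for page in pages]
```

Because the PI is not an element, it can fall in the middle of a paragraph without disturbing the element structure, which is what makes PIs convenient for this job.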

3) If you find you have many of these concurrent structures, you may opt for "point markup", which is rather extreme, and would be an interesting challenge for some stream-based processors. In point markup, your main text is just marked up using empty point elements:
<!DOCTYPE document [
    <!ELEMENT text ( #PCDATA | point)*>
    <!ELEMENT point EMPTY>
    <!ATTLIST point  id ID #REQUIRED >

Then you have a separate element tree for each kind of structure: these trees probably contain no character data of their own, just IDREFs to the start and end of their range.  In this way you can represent concurrent, overlapping hierarchies in SGML. For example:

    <!ELEMENT document (tree+, text)>
    <!ELEMENT tree     (start, tree*, end)>
    <!ELEMENT ( start | end ) EMPTY >
    <!ATTLIST    tree            name NMTOKEN #IMPLIED >
    <!ATTLIST (start | end )   refid IDREF #REQUIRED >
]>
    <document>
        <tree name="pages">
            <start refid="x1"/>        
                <tree name="page1">
                        <start refid="x1"/>
                        <end refid="x4"/>
                </tree>
                <tree name="page2">
                        <start refid="x4"/>
                        <end refid="x5"/>
                </tree>
            <end refid="x5"/>
        </tree>
        <tree name="p">
            <start refid="x1"/>
                <tree name="b">
                        <start refid="x2"/>
                        <end refid="x3"/>
                </tree>
            <end refid="x5"/>
        </tree>
        <text><point id="x1"/>here is <point id="x2"/>some<point id="x3"/>
                data <point id="x4"/>of no interest.<point id="x5"/></text>
    </document>

This structure has the advantage of neatness, and provides a lot of modeling power
for just one extra level of indirection. If you use HREF rather than REFID, you can use
external point markup too.

The effect, of course, is to have concurrently
    <pages><page1>here is some
                data </page1><page2>of no interest.</page2></pages>
and
    <p>here is <b>some</b>
                data of no interest.</p>
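
To make the indirection concrete, here is a small Python sketch that resolves each tree against the points and prints the view it stands for. It is only an illustration under some assumptions: the function names are made up, the instance is an inline copy in which p runs from x1 to x5 and b from x2 to x3 (matching the two views just shown), the text is kept on one line, and the DTD is left out, so nothing checks the ID/IDREF links.

```python
import xml.etree.ElementTree as ET

# Inline copy of the example instance (no DTD, text on a single line).
DOC = """<document>
  <tree name="pages">
    <start refid="x1"/>
    <tree name="page1"><start refid="x1"/><end refid="x4"/></tree>
    <tree name="page2"><start refid="x4"/><end refid="x5"/></tree>
    <end refid="x5"/>
  </tree>
  <tree name="p">
    <start refid="x1"/>
    <tree name="b"><start refid="x2"/><end refid="x3"/></tree>
    <end refid="x5"/>
  </tree>
  <text><point id="x1"/>here is <point id="x2"/>some<point id="x3"/> data <point id="x4"/>of no interest.<point id="x5"/></text>
</document>"""


def point_offsets(text_elem):
    """Return the plain text and a map from point id to character offset."""
    chunks = [text_elem.text or ""]
    offsets = {}
    pos = len(chunks[0])
    for point in text_elem:
        offsets[point.get("id")] = pos
        tail = point.tail or ""
        chunks.append(tail)
        pos += len(tail)
    return "".join(chunks), offsets


def render(tree, text, offsets):
    """Serialize one tree as ordinary nested markup over the text."""
    name = tree.get("name")
    pos = offsets[tree.find("start").get("refid")]
    end = offsets[tree.find("end").get("refid")]
    out = ["<%s>" % name]
    for child in tree.findall("tree"):
        child_start = offsets[child.find("start").get("refid")]
        out.append(text[pos:child_start])      # text before the child
        out.append(render(child, text, offsets))
        pos = offsets[child.find("end").get("refid")]
    out.append(text[pos:end])                  # text after the last child
    out.append("</%s>" % name)
    return "".join(out)


def views(xml_source):
    """Map each top-level tree name to its resolved view."""
    doc = ET.fromstring(xml_source)
    text, offsets = point_offsets(doc.find("text"))
    return {t.get("name"): render(t, text, offsets) for t in doc.findall("tree")}


if __name__ == "__main__":
    for name, view in views(DOC).items():
        print(name, "=>", view)
```

Note that the sketch needs the whole text in memory before it can resolve a single tree, which is the reason point markup is awkward for stream-based processors.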

Rick Jelliffe

Author, "The XML & SGML Cookbook", out in May from Prentice Hall.