Reference Model

Wed Jul 15 01:24:32 BST 1998

> From:   James K. Tauber

> Has anyone developed (or is anyone developing) a reference model for
> document-based knowledge/information interchange (with a generic markup
> focus)?
...
> Furthermore, in a post earlier the same day I used the following
> OSI-inspired models:
>
> > In normal publishing:
> >
> > AUTHOR CONCEPT           [semantics]
> >  --presentation-->
> >   DOCUMENT               [presentation]
> >    --interpretation-->
> >     READER CONCEPT       [semantics]
> >
> > With generic markup:
> >
> > AUTHOR CONCEPT             [semantics]
> >  --markup-->
> >   XML DOCUMENT             [syntax]
> >    --stylesheet-->
> >     DOCUMENT               [presentation]
> >      --interpretation-->
> >       READER CONCEPT       [semantics]
> >
> > Now, where a machine (or a human, for that matter) directly reads the XML
> > documents we have:
> >
> > AUTHOR CONCEPT             [semantics]
> >  --markup-->
> >   XML DOCUMENT             [syntax]
> >    --processing-->
> >     MACHINE ACTION         [?semantics]
> >
> > [of course, machines can generate the documents too, a case I haven't
> > considered in the above diagrams]

In my book I develop a six-view model of publications (I even found a word,
"senocular", meaning "having six eyes" :-). The first three views correspond
to "presentation markup" and the latter three to "logical markup", and all
six may be present simultaneously (though often piggybacked):
	(Page) Layout
	(Page) Objects
	Glyphs
	Characters
	Editorial Structure
	Topical Structure

I comment that editorial and topical structure are usually marked up with
elements, that characters and glyphs are usually marked up (in SGML at
least) with entity (and numeric character) references, and that page layout
and page object manipulation are where PIs tend to fit in.
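To make that mapping concrete, here is a minimal Python sketch, assuming a made-up chapter document (the element names, the `page-break` PI target and its data are hypothetical, not taken from any real DTD). It uses the standard library's `xml.dom.minidom` to show that elements and PIs survive parsing as distinct node types, while a numeric character reference is resolved by the parser into the character itself:

```python
from xml.dom import minidom, Node

# Hypothetical sample: element names, the "page-break" PI and the character
# reference are illustrative only.
SAMPLE = """<chapter topic="markup">
  <title>Caf&#233; Society</title>
  <?page-break before?>
  <para>Text of the chapter.</para>
</chapter>"""


def classify(node):
    """Label a node with the kind of markup the post associates with it."""
    if node.nodeType == Node.ELEMENT_NODE:
        return "element: editorial/topical structure"
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        return "PI: page layout/page objects"
    return None  # text and whitespace nodes are not classified


def walk(node, found):
    """Depth-first walk collecting (name, label) pairs for elements and PIs."""
    label = classify(node)
    if label is not None:
        name = node.tagName if node.nodeType == Node.ELEMENT_NODE else node.target
        found.append((name, label))
    for child in node.childNodes:
        walk(child, found)
    return found


doc = minidom.parseString(SAMPLE)
for name, label in walk(doc.documentElement, []):
    print(f"{name:12} {label}")

# The character reference is gone: only the character itself remains.
title = doc.getElementsByTagName("title")[0]
title_text = "".join(n.data for n in title.childNodes)
print(ascii(title_text))  # 'Caf\xe9 Society'
```

The walk reports `chapter`, `title` and `para` as elements and `page-break` as a PI; the character view is still present in the text, but the reference that encoded it is not visible after parsing.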

I use this model to explain the "flow of dependence" idea behind generic
markup, and thus which kinds of flows are not straightforward to implement
with generic markup. I found that a more detailed model like this makes it
simpler to think about many issues in markup. The various models that James
T proposes hard-code the flow (i.e. "processing") between the various
views, which I think misses the basic problem that the relationships between
the different views (especially w.r.t. causality) are complex: a simple flow
from higher to lower works often, but not always.
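The "flow of dependence" idea can be sketched as a small check over dependencies between views. This is a minimal Python sketch, assuming each dependency is recorded as a hypothetical (cause, effect) pair of view names; a "simple flow" here means every cause sits at the same or a higher (more logical) level than its effect, and anything else is flagged:

```python
# The six views from the post, ordered from most presentational (index 0)
# to most logical (index 5).
VIEWS = ["Page Layout", "Page Objects", "Glyphs", "Characters",
         "Editorial Structure", "Topical Structure"]
LEVEL = {view: i for i, view in enumerate(VIEWS)}


def upward_dependencies(dependencies):
    """Return the (cause, effect) pairs where a lower, more presentational
    view constrains a higher, more logical one."""
    return [(cause, effect) for cause, effect in dependencies
            if LEVEL[cause] < LEVEL[effect]]


def is_simple_flow(dependencies):
    """True if no dependency runs against the top-down flow."""
    return not upward_dependencies(dependencies)


# Hypothetical dependency sets, for illustration only.
typical = [("Topical Structure", "Editorial Structure"),
           ("Editorial Structure", "Page Objects"),
           ("Characters", "Glyphs")]
# e.g. the length of an article constrained by the page design
constrained = typical + [("Page Layout", "Editorial Structure")]

print(is_simple_flow(typical))            # True
print(is_simple_flow(constrained))        # False
print(upward_dependencies(constrained))   # [('Page Layout', 'Editorial Structure')]
```

A single upward edge is enough to make a publication awkward for a purely top-down generic-markup pipeline, which is the point being made above.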

(Take a newspaper, for example: the page design and the rendered sizes of the
other articles on a page constrain how many paragraphs a new article can
have. In effect, this disguises the fact that paragraphs and sentences in
newspapers have a priority (which could be labelled with an attribute) that
determines which ones will be selected. In fact, an editor-free system could
allow various alternative paragraphs, depending on space, though perhaps
that causes other problems.)
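The priority idea can be sketched in a few lines. This is a minimal Python sketch under several assumptions: the `priority` attribute name and its scale (1 = most essential) are hypothetical, `rendered_lines` is a crude stand-in for a real layout engine, and the selection is a simple greedy pass rather than any actual newspaper system's method. Note that the selection needs the rendered size, so the editorial content depends on the presentation:

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical markup: the "priority" attribute and its scale
# (1 = most essential) are assumptions for illustration.
ARTICLE = """<article>
  <p priority="1">Council approves new bridge after a long debate.</p>
  <p priority="3">The original proposal dates back to the 1970s.</p>
  <p priority="2">Construction is expected to start next spring.</p>
</article>"""


def rendered_lines(text, chars_per_line):
    """Crude stand-in for layout: lines needed at a given column width."""
    return math.ceil(len(text) / chars_per_line)


def fit_article(xml_text, available_lines, chars_per_line):
    """Greedily keep the highest-priority paragraphs that fit the space,
    then return them in their original document order."""
    paras = ET.fromstring(xml_text).findall("p")
    order = sorted(range(len(paras)), key=lambda i: int(paras[i].get("priority")))
    kept, used = set(), 0
    for i in order:
        need = rendered_lines(paras[i].text, chars_per_line)
        if used + need <= available_lines:
            kept.add(i)
            used += need
    return [paras[i].text for i in sorted(kept)]


print(fit_article(ARTICLE, available_lines=6, chars_per_line=20))  # two paragraphs fit
print(fit_article(ARTICLE, available_lines=6, chars_per_line=10))  # only the lead fits
```

Narrowing the column from 20 to 10 characters drops a paragraph even though nothing in the markup has changed.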

To give another example of why a simple model in which dependency flows from
higher-level to lower-level structures is often an over-simplification: in
my book, each chapter starts with a summary, and I had to write these
summaries with knowledge of the layout and the amount of space available. So
text often has layout dependencies. The result of ignoring
presentation-to-topical dependencies is ugly and sterile typeset documents. I
notice that both the TeX and the Scribe manuals comment that page-breaking
is never 100% successful (i.e. w.r.t. traditional typesetting aesthetics)
from automation alone; I think the same is true of hyphenation.
Scroll-bars do get rid of many of the media restrictions of paper
publishing, it is true, but visual and design rules still exist.

However, every model has its limitations, of course: this six-view model
sticks "metadata" in as "topical structure", which is not so comfortable.
And to a certain extent it skips over the fundamental "document/publication"
dichotomy: a publication is made by combining many sub-publications
from various structured documents, each of which is rendered/published at a
different time (browsing-time, caching-time, site-building-time,
editing-time, authoring-time, etc.).

Anyway, I have found this six-view model to be very useful in many
situations: it is not difficult to understand and fits in with the ISO
character/glyph model, the DSSSL/XSL flow-object model, and Topic Navigation
Maps.

So I certainly commend the six-view model, but with the proviso that it
helps explain which kinds of publications are suitable for generic markup
processing. The generic markup movement IMHO is based on exploring which
kinds of publications are simple for computer processing: I certainly don't
think the generic markup movement should promote the idea that all
publications exhibit a simple "flow of dependence" from the sensed
page-objects to the imagined underlying logical structures. That is a danger
of models like the ones James T is brainstorming.

Rick Jelliffe

==========================================================
The XML & SGML Cookbook, by Rick Jelliffe
Charles F. Goldfarb Series on Open Information Management
656 pages + CD-ROM, Prentice Hall 1998, ISBN 0-13-614223-0
http://www.sil.org/sgml/jelliffeXMLAnn.html
http://www.phptr.com/  > Book Search > "Jelliffe"
http://www.amazon.com/exec/obidos/ASIN/0136142230/002-4102466-3352420

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)