Call for unifying and clarifying XML 1.0, DOM, XPATH, and XML Infoset

Norman Gray norman at astro.gla.ac.uk
Thu Jan 27 15:41:50 GMT 2000


Greetings,

I put in some effort to understand ISO 10744 (HyTime).  I ended up feeling
that it was indeed Simple (as distinct from `simple'), and that it
enhanced my understanding of both SGML/XML and the notion of structured
information[1].  When I read the auxiliary XML specs (DOM, InfoSet,
XLink,...), my first thought was `why are they creating a complicated
version of HyTime?'.

However:

    (a) Simple is not the same as easy.  The flip-side of the Simplicity
    is that the standard is somewhat abstract (ahem!) and, like all
    grand projects, has its own universe of concepts.  Also, I found it
    rather difficult to work out from the standard just what problem
    10744 was trying to solve.  All of this means that there's not an
    obvious reason why someone with other claims on their time should put
    aside the mental house-room for the standard, or put in the effort
    to see through the details to the elegant structures underlying them.
    In other words, the standard does not give the impression, up front,
    that the effort of understanding it will be repaid.  This is a pity.

    (b) Separately from that, I felt that the standard was rather
    confusingly written; I strongly suspect, however, that this was
    because of the constraints of writing according to the ISO format.
    My impression was that there were at least _four_ standards here,
    which were linked but independent.  Having all four in the one
    document produces an indigestibly rich pudding.

Taking the second point first: 10744 more-or-less simultaneously
defines

  1 The Grove (appx A.4.1.4), the notion of grove construction as the
    sole result of a *ML parse, and the grove as the data structure on
    which all subsequent operations are formally defined.  This is
    related to the definition of the DOM, though that's an interface, I
    suppose, rather than a data structure.
    I understand Nils to have been discussing something very like this
    in his message in this thread.

  2 The property set in general (appx A.4), and specifically one example
    of it -- the SGML Property Set (appx A.7).  The XML InfoSet appears
    to be a reinvention of the notion of a Property Set.

  3 The notion of the architecture (appx A.3), as a mechanism for supplying
    inheritance (in the OO sense) and mapping general structures, and
    their semantics, onto disparate DTDs.

  4 One particular architecture, the HyTime architecture (defined in
    the rest of the standard), defining a ready-rolled bundle of semantics
    for DTDs to inherit and subset.  XLink appears to be a version of
    this architecture, albeit without the flexibility and generality.

  ... and a few other bits and bobs in the rest of appendix A, `SGML
    extended facilities'.
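
To make items 1 and 2 a little more concrete, here is a minimal sketch
in Python.  It is not taken from 10744: the class names, property names
and helper functions are all hypothetical, and the real SGML Property
Set is far richer.  It only illustrates the shape of the idea: a
property set says which classes of node exist and which properties each
may carry, a grove is a graph of such nodes, and a selection of classes
and properties (what 10744 calls a grove plan) gives a restricted view
of the same grove.

```python
from dataclasses import dataclass, field


@dataclass
class PropertySet:
    """Toy property set: node class name -> allowed property names."""
    classes: dict


@dataclass
class Node:
    """One grove node: a class name plus its property values."""
    cls: str
    props: dict = field(default_factory=dict)


def make_node(pset, cls, **props):
    """Build a node, rejecting properties the property set does not define."""
    unknown = set(props) - set(pset.classes[cls])
    if unknown:
        raise ValueError(f"{cls} has no properties {sorted(unknown)}")
    return Node(cls, dict(props))


def prune(node, plan):
    """Return a copy of the grove keeping only what the plan selects.

    `plan` maps class names to the set of property names to keep.
    """
    if node.cls not in plan:
        return None
    kept = {}
    for name, value in node.props.items():
        if name not in plan[node.cls]:
            continue
        if isinstance(value, list):
            # Subnode list: prune each child, drop the ones not selected.
            children = [prune(v, plan) if isinstance(v, Node) else v
                        for v in value]
            kept[name] = [c for c in children if c is not None]
        elif isinstance(value, Node):
            child = prune(value, plan)
            if child is not None:
                kept[name] = child
        else:
            kept[name] = value
    return Node(node.cls, kept)


# Hypothetical three-class property set, loosely modelled on elements,
# attributes and character data.
TOY = PropertySet({
    "element": {"gi", "attributes", "content"},
    "attribute": {"name", "value"},
    "text": {"chars"},
})

doc = make_node(
    TOY, "element", gi="p",
    attributes=[make_node(TOY, "attribute", name="id", value="x1")],
    content=[make_node(TOY, "text", chars="hello")],
)

# A plan that ignores attributes entirely.
view = prune(doc, {"element": {"gi", "content"}, "text": {"chars"}})
```

The point of the sketch is that `doc` and `view` are the same kind of
object, described by the same property set; only the selection differs.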

Unless I am misunderstanding something quite severely (in which case,
that's another datapoint), these `mini-standards' are reasonably
independent, though designed to fit together well.  At least, the
bundle of SGML add-ons in appendix A could have been decoupled from
the rest of the standard.  The fact that they were not has, I imagine
(from a position of ignorance), as much to do with ISO bureaucracy and
traditions as with anything else.

I've ended up with the impression that part of the point of 10744 is to
make it easier to do things like define the XML auxiliary specs.

My impression of the thing is that it is the result of a lot of work to
produce carefully defined, though admittedly not transparent, fundamental
concepts, which clarify the problem and articulate with each other snugly,
and which make the next generation of standards and tools easier to write.
It would appear that once you have absorbed 10744, the XML auxiliary specs
could be defined _extremely_ compactly in terms of the concepts of 10744
(the InfoSet would be just a property set, the DOM just a grove plan,
XLink a set of HyTime options, and so on).  In this spirit, James Clark's
`Comparison of SGML and XML' document essentially defines XML as
SGML + an SGML declaration for XML + three pages of commentary, and makes the
30-page XML Recommendation look bloated.  The shorter the spec, and the
more it can build on previous standards, the less chance there is for
ambiguity, contradiction, and confusion.

After such a minimalist set of specifications, the crucial effort would
be to provide a mass of non-normative commentary on the definitions,
in a form which makes them accessible or more generally usable.

What appears to be happening now is that HyTime is being reinvented
piecemeal -- in the auxiliary specs -- which is bad for just the reasons
Nils mentioned: specs seem to contradict each other, act on different
information sets, and require a forest of new terminology and concepts
which may or may not be isomorphic to each other.  I can see that
there's the intention of making these specs more accessible by being
less general, but the `issues' listed in, say, the 19991220 version of
XLink simply illustrate that as XLink becomes more nearly finished, it
becomes more abstract, and more and more like HyTime without the
elegance.

I don't have any axes to grind here; I'm a user of the specs rather
than an author of them.  Speaking as a user, however, I can say that the
current auxiliary specs look like a rat's nest of slightly out-of-synch
definitions, rapidly being set in concrete.  Though individual specs are
intelligible in isolation, the network of specs looks collectively ugly,
inexpressive, and Not Fun To Play With.  It's starting to look as if
an XML project is something to avoid getting lumbered with, whereas a
HyTime project I'd fight for tooth and nail.

All the best,

Norman


[1] I found quite a few useful exegeses of 10744 stuff, covering HyTime,
    architectures and groves, but generally without making it terribly
    clear that these were fairly distinct things.  I've a small
    collection at
    http://www.astro.gla.ac.uk/users/norman/bookmarks/lists-hytime.html

-- 
---------------------------------------------------------------------------
Norman Gray                        http://www.astro.gla.ac.uk/users/norman/
Physics and Astronomy, University of Glasgow, UK     norman at astro.gla.ac.uk


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1