Call for unifying and clarifying XML 1.0, DOM, XPATH, and XML Infoset

Steven R. Newcomb srn at
Fri Jan 28 23:39:52 GMT 2000

> [Lars Marius Garshol:]
> | [Norman Gray:]

> | When I read the auxiliary XML specs (DOM, InfoSet, XLink,...), my
> | first thought was `why are they creating a complicated version of
> | HyTime?'.
> You are not the only one to react that way, although a case could be
> made for the DOM being slightly more than just a new version of
> HyTime.

The DOM is an API.  It describes functionality.  That's good.  The DOM
API implicitly operates on something that certainly seems like a grove
to me.  That's OK, and, in fact, it's just great when InfoSet
describes just what that grove-like something actually is.  HyTime
does not describe any APIs, so of course the DOM cannot be a
reinvention of HyTime.

> |     (b) Separately from that, I felt that the standard was rather
> |     confusingly written; I strongly suspect, however, that this was
> |     because of the constraints of writing according to the ISO format.

Too right.

> |     My impression was that there were at least _four_ standards here,
> |     which were linked but independent.  Having all four in the one
> |     document produces an indigestibly rich pudding.

Perhaps *much* too right.  Personally, I would estimate a higher
number than four.

However, the standards that make up ISO/IEC 10744:1997 (HyTime)
depend on one another in complex and important ways.  I don't think the overall
elegance of HyTime could have been achieved by segregating things like
groves from things like architectural forms, as separate standards
development activities.  HyTime's most significant contribution is not
any one of its separate component standards, but the sensible,
economical, simple and powerful way they all fit together, having
contributed vital requirements to each other.

The W3C process appears to be based on the naive belief that
independent design assignments for all the various aspects of
Web-based information interchange can be made to a plethora of
independent technical committees, and, in the end, everything can be
made to work together somehow.  Even when we combine all of the W3C's
favorite magic bullets (RDF, XML Namespaces, XML Schemas, etc.),
there's still no paradigm for the re-use of vocabulary-specific engine
software, and no doctrine providing for the validation of the use of
mixed vocabularies.  Thanks to the dazzlingly brilliant insights of
James Clark, who invented groves and who very significantly improved
architectural forms, HyTime 1997 (aka HyTime 2nd Edition) got the
basics of that stuff right -- very, very right.  The fact that these
insights have been systematically ignored by the leadership of the W3C
is tragic.  If I were working on W3C projects, I'd be very angry about
the way that my time is being blithely flushed down the toilet by
leaders who are trusted by everyone to know the state of the art, but
who don't really, and who portray themselves as trustworthily knowing
where W3C's technical developments are leading the world, when in fact
they do not.

As a public demonstration of what the leadership of the W3C does not
know, I hereby invite the press to question the W3C leadership about
how the W3C's standards-making efforts are addressing the blindingly
obvious global requirement for a standard doctrine as to how modularly
re-usable vocabulary-specific software engines can be used in
applications that process the documents that use those vocabularies.
For example,

* How will the XLink vocabulary be supported by an XLink engine, and
  the XHTML vocabulary be supported by an XHTML engine, with both
  engines usable in a variety of application contexts, and with both
  engines able to be built and marketed by any software vendor, and
  used in any combination by any application developer?  (I suspect
  the W3C leadership's answer will be, "That's for The Software Vendor
  to figure out.")

* Even more fundamentally, how will a document's use of both
  vocabularies be validated?  When information interchange fails, it's
  vital to be able to point the finger of blame at either (1) the
  software that created the invalid use of the vocabulary, or (2) the
  software that could not understand a perfectly valid use of the
  vocabulary.  (The Web culture's notion that "If 80% of the
  information gets through, that's good enough" simply won't fly in
  business-to-business communications.  100% is absolutely required.
  I claim that there is no such thing as a useful B2B vocabulary in
  the absence of a vendor-neutral, application-neutral way to
  determine whether that vocabulary is being used properly in any
  given instance.  XML Schema *does not* address the problem of
  validating mixed vocabularies.  As far as I can tell, this
  fundamental problem doesn't even appear on that committee's radar.)
  (The answer, "That's for The Software Vendor to figure out," is
  obviously not a good answer.  Software vendors are not going to come
  up with a scheme whereby their product offerings can compete solely
  on their intrinsic value!  That job can only be for their designated
  consortium, the W3C, to do.)

* How will a software application that reads such a document invoke
  the two re-usable engines, incorporating their intelligence into
  itself?  (If the W3C does not provide an answer to this question,
  then there can be only one source for re-usable vocabulary-specific
  software engines, in which case the term "re-usable" is kind of a
  joke.  Long Live The Monopoly!  Let the Monopoly Own All The
  Business Vocabularies of the World!)

So far, the answers to these questions are undefined in W3C land, and
yet the answers are essential for information interchange in a world
that most of us hope will not always be entirely dominated by a single
software monopoly.  The answers to the above questions are fundamental
to the development of any business vocabulary, and several fundamental
business vocabularies are being developed by the W3C.  How will these
business vocabularies actually work together, huh?  Dear Ladies and
Gentlemen of the Press: Please demand the answers to the above
questions, and please publicize the answers you receive.  (Please also
be aware of how HyTime answers them, and of how HyTime's answers are
already working.  I'll be happy to help you with this, and so will
quite a number of other knowledgeable people.)

> | What appears to be happening now is that HyTime is being reinvented
> | piecemeal -- in the auxiliary specs -- which is bad for just the
> | reasons Nils mentioned: specs seem to contradict each other, act on
> | different information sets, require a forest of new terminology and
> | concepts which may or may not be isomorphic to each other.
> I agree to some extent, and I think one way to handle this would be to
> do a scaled-down version of property sets for the XML specifications
> and then build things on that. This means that most of the difficult
> stuff would have to go (grove plans, for example) and the terminology
> might have to be simplified, but the simple essence could be kept.

We should all welcome such an outcome!


Steven R. Newcomb, President, TechnoTeacher, Inc.
srn at

voice: +1 972 517 7954
fax    +1 972 517 4571

Suite 211
7101 Chase Oaks Boulevard 
Plano, Texas 75025 USA

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at
Archived as: or CD-ROM/ISBN 981-02-3594-1