Parser compliance

Paul Prescod paul at
Fri Nov 19 08:01:57 GMT 1999

David Megginson wrote:
> That's probably a rhetorical question, but for those new to the field
> (i.e. who didn't come from SGML), SGML consultancies throughout the
> 1990's made an enormous portion of their money writing (and rewriting
> and rerewriting and rererewriting) massive and incomprehensible DTDs
> for government, military, and big industry, so naturally they (OK,
> "we") hyped the importance of DTDs as the cornerstone of any system.

There are two issues here. Tim raised one about how many mistakes DTDs
actually catch. You raise a (mostly unrelated, IMHO) one about how much
of a project's energy should be spent on the DTD design. Even though you
are talking about different things, I think you're both wrong.

I claim that when people start using XML for what it's good for, schemas
(nee DTDs) will come back to the fore. After all, schema design is the
only hard thing about XML. The system design aspects are no different
from non-XML system design. Apart from the schemas, you could erase the
"XML here" box and put "s-expressions here" (or maybe even "CORBA here")
and nothing else would change.

Once the core standards mature, people will set about building vertical
schemas (they already have). They will spend an enormous portion of
money writing, and rewriting, and rewriting, massive and
incomprehensible DTDs for electronic commerce, manufacturing, 3D
graphics and so forth. It's precisely because getting these things right
is so hard AND so important that so much energy is spent on them.

The XML world won't spend less money; it will just spend it differently.
Instead of several large, colossal failures, it will have dozens and
dozens of tiny failures (there are probably at least a dozen failed
XML-based languages already).

> In brief, then, SGML systems tend to be DTD-centric while XML systems
> tend to be component-centric.  There's nothing in SGML or XML that
> forces that distinction; it's just the way things fell out.  Tim's
> right -- DTD-based validation will tell you only a tiny portion of
> what's wrong with your document, though that portion can be helpful
> in some circumstances.

That's a massive generalization. I'll bet that if we cleaned up all
HTML, 80% of all errors would be caught by a well-formedness parser
(mandated by XML), and 80% of the rest would be caught by a validator.
Most of the remaining errors would be fixed if we ran an automated link
checker against them.
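
To make the layering concrete, here is a rough Python sketch of the three
stages: a well-formedness check, then a structural check, then a link
check. It is an illustration under stated assumptions, not a real
validator. The ALLOWED_CHILDREN table is a hypothetical stand-in for a
DTD content model, the element names (doc, title, para, link) are
invented for the example, and the "link check" only looks for a missing
href rather than fetching anything.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model standing in for a DTD:
# element name -> set of child element names it may contain.
ALLOWED_CHILDREN = {
    "doc": {"title", "para"},
    "title": set(),
    "para": {"link"},
    "link": set(),
}


def check(text):
    """Return (stage, problems) for the first stage that reports errors."""
    # Stage 1: well-formedness. Any XML parser must reject bad input here.
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        return "well-formedness", [str(err)]

    # Stage 2: structural validation against the toy content model.
    problems = []
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            problems.append(f"undeclared element <{parent.tag}>")
            continue
        for child in parent:
            if child.tag not in allowed:
                problems.append(
                    f"<{child.tag}> not allowed in <{parent.tag}>"
                )
    if problems:
        return "validation", problems

    # Stage 3: link check. Assumption: a link is "broken" if it has no
    # href; a real checker would try to resolve the target.
    missing = [
        "<link> without href"
        for link in root.iter("link")
        if not link.get("href")
    ]
    if missing:
        return "links", missing

    return "ok", []
```

Each stage only runs if the previous one passed, so
check("<doc><para>oops</doc>") stops at the well-formedness stage, while
a document with a title inside a para gets as far as the validation
stage before it is reported.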

I think that Tim should spend some time getting to know Lauren's
customers. I would guess that most people in the document publishing
world who use validating editors and parsers cut their error checking
code by 9/10ths. More important, they elevate the error checking into a
syntax that can easily be read and shared.

Your average XML editor purchaser has only two non-negotiable
requirements: realtime DTD checking and realtime stylesheet application.
Tim seems to think that they are naive, but I don't believe so. That
feature can and does save many companies millions of dollars.

 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
Bart: Dad, do I really have to brush my teeth?
Homer: No, but at least wash your mouth out with soda.
