Proposition: "SGML is Gumming Up the Works"

Paul Prescod papresco at
Sat Sep 12 16:21:04 BST 1998

The hardest part of coming to a new domain is recognizing what parts of 
what we know from other domains do NOT apply.

On Fri, 11 Sep 1998, Mark Tucker wrote:
> Like "Samuel R. Blackburn" <sblackbu at>, I got interested
> in XML not because it is a good document formatting language,
> but on the promise that with XML, I can interchange Data!

Documents are data. Documents are pretty much the most complicated type 
of data in existence. You can ignore everything that SGML taught us about 
encoding complex data for interchange, but then you'll be forced to 
reinvent it (and probably not as well).

> I'm a computer scientist. I see XML-Data, and think: "Hey, there are
> type definitions.  And hey, this data file clearly contains instances
> of those types."
> But, stepping into the XML community, I'm overwhelmed by the SGML
> history of XML.  I'm told: "No, conforming to type definitions isn't
> good enough. That is not Real Validation. You must be valid according
> to a DTD."  (Perhaps XML-DATA seems to have died because it wasn't
> DTD-ish enough; I don't know.)

XML-Data died because it was half-baked crap that did not meet real 
needs. DCD will hopefully follow it into oblivion. Nevertheless, XML-Data 
was very "DTDish". At its heart is the same grammar that you complain 
about. The type system junk is a slight extension to that grammar.

> And then, looking at DTD's, I find that they aren't even as good as
> BNF context free grammars. And BNF is much weaker than type systems,
> which we need and want.

Consider, for a moment, your brain. It has all kinds of great type 
systems. It's full of Venn diagrams and hierarchies, right? But then, 
when you want to converse, you flatten all of that out into text streams 
described not by hierarchies and Venn diagrams, but by regular 
expressions and context-free grammars.

Saying that BNF is weaker than type systems is equivalent to saying that 
hammers are weaker than screwdrivers. They are not comparable: a grammar 
describes a serialization syntax, while a type system describes a data model.

If "type systems" could replace serializations, then we wouldn't need 
XML, would we? We'd just use Java's type system.
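To make the distinction concrete, here is a small Python sketch. The Point type and both element layouts are invented for illustration. The type says what a point is; it says nothing about which characters represent one. Two equally reasonable serializations of the same value produce different character streams:

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

# Hypothetical data-model type: it describes a value, not a character stream.
@dataclass
class Point:
    x: int
    y: int

def to_attr_xml(p: Point) -> str:
    # One possible serialization: coordinates as attributes.
    el = ET.Element("point", x=str(p.x), y=str(p.y))
    return ET.tostring(el, encoding="unicode")

def to_child_xml(p: Point) -> str:
    # Another serialization of the same value: coordinates as child elements.
    el = ET.Element("point")
    ET.SubElement(el, "x").text = str(p.x)
    ET.SubElement(el, "y").text = str(p.y)
    return ET.tostring(el, encoding="unicode")

p = Point(1, 2)
print(to_attr_xml(p))   # <point x="1" y="2" />
print(to_child_xml(p))  # <point><x>1</x><y>2</y></point>
```

Both strings carry the same Point. Deciding which of them is acceptable input is a question about the character stream, and that is the question a grammar (for XML, a DTD) answers.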

> So, we end up jumping through hoops to write DTD's to express DATA
> which is very, very, very easily described in terms of modern
> programming language type systems.  All the while, hearing a low chant:
> "What kind of cretin are you? You don't want to *validate* your data! (shock)
> You only want well-formed documents." -- NO and YES.  I don't care
> if my document can be validated by a pitiful DTD.  I do care that 
> it conform to a real type schema!

"Bang. Bang. Bang. I think I bent my screwdriver." I hate to let you 
down, but when you serialize your data model into XML, all you have is 
characters. Characters have to be verified according to the techniques that 
God and Chomsky provided for verifying character streams: regular 
languages, context free grammars, regular tree grammars, etc.
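As a rough illustration of the regular-language part, here is a hedged Python sketch of DTD-style checking: each element name gets exactly one content model, written as a regular expression over the names of its children. The memo vocabulary is hypothetical, and the sketch ignores attributes, text content and mixed content, so it only approximates real DTD validation of element structure.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical DTD-style schema (names are illustrative, not from the post).
# As in a DTD, each element *name* has one content model: a regular expression
# over its children's names, each name followed by a single space.
#   <!ELEMENT memo (to, from, body)>  ->  "to from body "
#   <!ELEMENT body (para+)>           ->  "(para )+"
#   <!ELEMENT para (#PCDATA)>         ->  ""  (no child elements; text not checked)
CONTENT_MODELS = {
    "memo": r"to from body ",
    "body": r"(para )+",
    "to": r"",
    "from": r"",
    "para": r"",
}

def valid(el: ET.Element) -> bool:
    """Check el and all its descendants against CONTENT_MODELS."""
    model = CONTENT_MODELS.get(el.tag)
    if model is None:
        return False  # undeclared element name
    children = "".join(child.tag + " " for child in el)
    if re.fullmatch(model, children) is None:
        return False  # child sequence is not in the regular language
    return all(valid(child) for child in el)

good = ET.fromstring("<memo><to>A</to><from>B</from><body><para>Hi</para></body></memo>")
bad = ET.fromstring("<memo><from>B</from><to>A</to><body/></memo>")
print(valid(good))  # True
print(valid(bad))   # False
```

Each content model on its own is a regular language over child names; applying one such model at every element in the tree is roughly what DTD validation of element structure does.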

> I'm not really mad at XML but, I think "Richard L. Goerwitz III"
> <richard at> is on to something in wondering if SGML
> compatibility is going to bring down the XML effort.

That's not what Richard L. Goerwitz III said. You are projecting your own 
feelings onto his messages.

> If you have to be an SGML wizard to express easy things,
> then we're in trouble.  
> Much of the initial selling of XML was:
> 		You don't need DTD's to be a good citizen.

> P.S. I'm optimistic about RDF, and am afraid that DCD sold out a bit towards
> documents.  I want DATA schemas!
> **************************************************
> [Why DTD's aren't as good as BNF]

DTDs are much better than BNF. DTDs describe XML data. BNF describes a
MUCH larger family of languages. If we were to use BNF, we would have to
put constraints on the BNF that would make it almost identical to DTDs. 

Here's the ironic part: you are right that it should be possible to use 
the same element type name in multiple contexts as long as it isn't 
ambiguous (as in C). I have a proposal for an extension to DTDs (or 
schemas) that would allow that.

The problem is that when you try to combine this advanced facility with 
type-system-based proposals (e.g. inheritance, subtyping, etc.), 
everything goes to hell. The irony is that the people who are screaming 
for "types" instead of lexical constraints are the ones *weakening* the 
lexical constraints that would make DTDs (or schemas) closer in power to BNF.



What does it mean to "subclass" the PAREN element type when it is clearly 
used in two different contexts with two different content models? The 
answer: there is no PAREN type, really. There is a PAREN "tag" that can 
be used in completely different ways in completely different contexts.
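One way to picture the PAREN situation is to key content models by context (here, just the parent's name) instead of by element name alone. This is a hypothetical Python sketch, not the DTD extension proposal mentioned above; the element names and content models are invented for illustration, and parent-only context is the simplest choice possible.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical vocabulary, invented for illustration. Content models are keyed
# by (parent name, element name), so "paren" has no single content model: what
# it may contain depends on where it appears. None means "no parent" (the root).
CONTENT_MODELS = {
    (None, "doc"): r"(formula |prose )*",
    ("doc", "formula"): r"(num |op |paren )+",
    ("doc", "prose"): r"(word |paren )+",
    ("formula", "paren"): r"(num |op )+",  # paren in a formula: a sub-expression
    ("prose", "paren"): r"(word )+",       # paren in prose: a parenthetical aside
}
# Leaves may not contain child elements; whether a leaf is allowed at a given
# spot is decided by its parent's content model.
LEAVES = {"num", "op", "word"}

def valid(el, parent=None):
    """Check el against the content model for its tag in this parent context."""
    if el.tag in LEAVES and parent is not None:
        return len(el) == 0
    model = CONTENT_MODELS.get((parent, el.tag))
    if model is None:
        return False  # this tag is not allowed in this context
    children = "".join(child.tag + " " for child in el)
    if re.fullmatch(model, children) is None:
        return False
    return all(valid(child, el.tag) for child in el)

good = ET.fromstring(
    "<doc>"
    "<formula><num/><op/><paren><num/><op/><num/></paren></formula>"
    "<prose><word/><paren><word/><word/></paren></prose>"
    "</doc>"
)
bad = ET.fromstring("<doc><prose><paren><num/></paren></prose></doc>")
print(valid(good))  # True
print(valid(bad))   # False: a paren in prose may only contain words
```

Nothing here checks a "paren type"; the same tag is checked against different content models depending on where it sits. Keying on the parent alone is crude (a paren nested in a paren is simply rejected), but it keeps the checking at the lexical level.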

In my opinion, you must THROW OUT the notion of type to make progress on 
this front. Of course, you can then re-introduce the notion of type at 
some higher level. But I think that we should make this lexical level 
powerful enough to do everything we need it to do before we move on to 
the type level.

 Paul Prescod

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at