Proposition: "SGML is Gumming Up the Works"
papresco at technologist.com
Sat Sep 12 16:21:04 BST 1998
The hardest part of coming to a new domain is recognizing what parts of
what we know from other domains do NOT apply.
On Fri, 11 Sep 1998, Mark Tucker wrote:
> Like "Samuel R. Blackburn" <sblackbu at erols.com>, I got interested
> in XML not because it is a good document formatting language,
> but on the promise that with XML, I can interchange Data!
Documents are data. Documents are pretty much the most complicated type
of data in existence. You can ignore everything that SGML taught us about
encoding complex data for interchange, but then you'll be forced to
reinvent it (and probably not as well).
> I'm a computer scientist. I see XML-Data, and think: "Hey, there are
> type definitions. And hey, this data file clearly contains instances
> of those types."
> But, stepping into the XML community, I'm overwhelmed by the SGML
> history of XML. I'm told: "No, conforming to type definitions isn't
> good enough. That is not Real Validation. You must be valid according
> to a DTD." (Perhaps XML-DATA seems to have died because it wasn't
> DTD-ish enough; I don't know.)
XML-Data died because it was half-baked crap that did not meet real
needs. DCD will hopefully follow it into oblivion. Nevertheless, XML-Data
was very "DTDish". At its heart is the same grammar that you complain
about. The type system junk is a slight extension to that grammar.
> And then, looking at DTD's, I find that they aren't even as good as
> BNF context free grammars. And BNF is much weaker than type systems,
> which we need and want.
Consider, for a moment, your brain. It has all kinds of great type
systems. It's full of Venn diagrams and hierarchies, right? But then,
when you want to converse, you flatten all of that out into text streams
described not by hierarchies and Venn diagrams, but by regular
expressions and context-free grammars.
Saying that BNF is weaker than type systems is equivalent to saying that
hammers are weaker than screwdrivers. They are not comparable. One
describes a serialization syntax and the other describes a data model.
If "type systems" could replace serializations, then we wouldn't need
XML, would we? We'd just use Java's type system.
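
A minimal sketch of that distinction, in Python. The Point class, the
serialize function, and the POINT_SYNTAX pattern are hypothetical names
chosen for illustration. The class is the data model; the regular
expression describes only the character stream the model is flattened into.

```python
import re
from dataclasses import dataclass


@dataclass
class Point:
    """Data model: a typed, in-memory structure (hypothetical example)."""
    x: int
    y: int


def serialize(p: Point) -> str:
    """Flatten the typed object into a character stream."""
    return f"<point><x>{p.x}</x><y>{p.y}</y></point>"


# Lexical description of the serialized form: a regular expression over
# characters. It knows nothing about Python types.
POINT_SYNTAX = re.compile(r"<point><x>-?\d+</x><y>-?\d+</y></point>")


def looks_like_point(text: str) -> bool:
    """Check the character stream against the lexical description."""
    return POINT_SYNTAX.fullmatch(text) is not None
```

The receiver of the text never sees the Point class. All it can check is
whether the characters fit the pattern.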
> So, we end up jumping through hoops to write DTD's to express DATA
> which is very, very, very easily described in terms of modern
> programming language type systems. All the while, hearing a low chant:
> "What kind of cretin are you? You don't want to *validate* your data! (shock)
> You only want well-formed documents." -- NO and YES. I don't care
> if my document can be validated by a pitiful DTD. I do care that
> it conform to a real type schema!
"Bang. Bang. Bang. I think I bent my screwdriver." I hate to let you
down, but when you serialize your data model into XML, all you have is
characters. Characters have to be verified according to the techniques that
God and Chomsky provided for verifying character streams: regular
languages, context-free grammars, regular tree grammars, etc.
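
As a rough illustration of verifying a serialized stream with a regular
language, here is a minimal sketch in Python. It assumes a hypothetical
document type with section, title, and para elements, and treats each
content model as a regular expression over the sequence of child element
names, which is roughly what a DTD content model is. It ignores
attributes, entities, and mixed content.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models: regular expressions over child element
# names, each name followed by a single space.
CONTENT_MODELS = {
    "section": r"(title )(para )+",  # like <!ELEMENT section (title, para+)>
    "title": r"",                    # no child elements allowed
    "para": r"",                     # no child elements allowed
}


def is_valid(element):
    """Check an element and its descendants against CONTENT_MODELS."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return False  # undeclared element type
    children = "".join(child.tag + " " for child in element)
    if re.fullmatch(model, children) is None:
        return False  # child sequence is not in the regular language
    return all(is_valid(child) for child in element)
```

For example, is_valid(ET.fromstring("<section><title>T</title><para>a</para></section>"))
accepts the element, while a section whose para precedes its title is rejected.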
> I'm not really mad at XML but, I think "Richard L. Goerwitz III"
> <richard at goon.stg.brown.edu> is on to something in wondering if SGML
> compatibility is going to bring down the XML effort.
That's not what Richard L. Goerwitz III said. You are projecting your own
feelings into his messages.
> If you have to be an SGML wizard to express easy things,
> then we're in trouble.
> Much of the initial selling of XML was:
> You don't need DTD's to be a good citizen.
> P.S. I'm optimistic about RDF, and am afraid that DCD sold out a bit towards
> documents. I want DATA schemas!
> [Why DTD's aren't as good as BNF]
DTDs are much better than BNF. DTDs describe XML data. BNF describes a
MUCH larger family of languages. If we were to use BNF, we would have to
put constraints on the BNF that would make it almost identical to DTDs.
Here's the ironic part: you are right that it should be possible to use
the same element type name in multiple contexts as long as it isn't
ambiguous (as in C). I have a proposal for an extension to DTDs (or
schemas) that would allow that.
The problem is that when you try to combine this advanced facility with
type system-based proposals (e.g. inheritance, subtyping, etc.)
everything goes to hell. The irony is that it is people who are screaming
for "types" instead of lexical constraints who are *weakening* the
lexical constraints that would make DTDs (or schemas) closer in power to BNF.
What does it mean to "subclass" the PAREN element type when it is clearly
used in two different contexts with two different content models? The
answer: there is no PAREN type, really. There is a PAREN "tag" that can
be used in completely different ways in completely different contexts.
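
A minimal sketch of what context-dependent content models could look
like, in Python. This is not the extension proposal mentioned above. The
CALL, EXPR, NAME, ARG, NUM, and OP element names and the RULES table are
hypothetical, invented only to give PAREN two contexts. The content
model is looked up by the pair (parent tag, tag), so there is no single
PAREN entry to subclass.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rules keyed by (parent tag, tag). A parent of None means
# the element is the document root. Each value is a regular expression
# over child element names, each name followed by a single space.
RULES = {
    (None, "CALL"): r"NAME PAREN ",
    ("CALL", "NAME"): r"",
    ("CALL", "PAREN"): r"(ARG )*",      # PAREN used as an argument list
    (None, "EXPR"): r"PAREN ",
    ("EXPR", "PAREN"): r"NUM OP NUM ",  # PAREN used as a grouped expression
    ("PAREN", "ARG"): r"",
    ("PAREN", "NUM"): r"",
    ("PAREN", "OP"): r"",
}


def valid(element, parent=None):
    """Validate an element using the rule chosen by its parent's tag."""
    model = RULES.get((parent, element.tag))
    if model is None:
        return False  # this tag is not allowed in this context
    children = "".join(child.tag + " " for child in element)
    if re.fullmatch(model, children) is None:
        return False
    return all(valid(child, element.tag) for child in element)
```

The same tag is accepted under CALL with ARG children and under EXPR
with NUM OP NUM children, and each use is rejected in the other context.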
In my opinion, you must THROW OUT the notion of type to make progress on
this front. Of course, you can then re-introduce the notion of type at
some higher level. But I think that we should make this lexical level
powerful enough to do everything we need it to do before we move on to
the type level.