Proposition: "SGML is Gumming Up the Works"

Chris Maden crism at oreilly.com
Sat Sep 12 20:47:23 BST 1998


[Mark Tucker]
> I wish I had a clean conscience. That I could say "I'm an SGML
> expert, and I still believe ...." But I'm a newbie, and definitely
> NOT an SGML expert.

If I may be so bold, it's possible that that lack of experience is
causing you to perceive a dichotomy where there really isn't one.

> Like "Samuel R. Blackburn" <sblackbu at erols.com>, I got interested in
> XML not because it is a good document formatting language, but on
> the promise that with XML, I can interchange Data!

As Paul Prescod pointed out, documents *are* data: very complex,
non-regularized data.  An informal proof of this is that XML was
designed for documents, and proved (completely as a side-effect) to be
very useful for regularized data.

> But, stepping into the XML community, I'm overwhelmed by the SGML
> history of XML.  I'm told: "No, conforming to type definitions isn't
> good enough. That is not Real Validation. You must be valid
> according to a DTD."  (Perhaps XML-DATA seems to have died because
> it wasn't DTD-ish enough; I don't know.)

Who told you that?  The biggest difference between XML and SGML is
that you do *not* have to be valid according to a DTD.  The only kind
of validation *defined by XML* is DTD validation, but (a) that
validation is not required, and (b) validation outside of the scope of
REC-XML is allowed (and in fact encouraged by the specification).

What you may have heard was a caution about imprecise language, which
SGMLers tend to be picky about.  If you used the word "validation" in
an XML context, someone may have pointed out that the word is well-
and precisely-defined for XML, and that you were misusing it in that
sense.

> So, we end up jumping through hoops to write DTD's to express DATA
> which is very, very, very easily described in terms of modern
> programming language type systems.

So describe it in terms of modern programming language type systems.
I'm not sure what the problem here is:

<data>
  <type-specification>
    <int name="i"/>
    <char name="c"/>
    <float name="f"/>
  </type-specification>
  <i>5</i>
  <c>h</c>
  <f>1.541</f>
  <j>Undefined data type</j>
  <c>Type violation error</c>
</data>

Of course, you'll have to write your own program to check the
type-validity of your document, whereas you can get DTD validation for
free.  But if what you need to do goes beyond DTDs, and you haven't
the patience to wait for the various data specification efforts going
on right now, then you have to roll your own.
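
A roll-your-own checker for the sample above might look something
like the following minimal Python sketch.  It assumes the standard
xml.etree.ElementTree module; the names CHECKS, DOCUMENT and
check_types are hypothetical, and the rule that a "char" is exactly
one character is an assumption made for illustration.  None of it is
defined by XML.

```python
import xml.etree.ElementTree as ET


def _parses_as(constructor):
    """Build a check that passes when constructor(text) does not raise."""
    def check(text):
        try:
            constructor(text)
            return True
        except ValueError:
            return False
    return check


# Assumed meaning of the ad hoc type names used in the sample document.
CHECKS = {
    "int": _parses_as(int),
    "float": _parses_as(float),
    "char": lambda text: len(text) == 1,
}

# The sample document from the message, unchanged.
DOCUMENT = """
<data>
  <type-specification>
    <int name="i"/>
    <char name="c"/>
    <float name="f"/>
  </type-specification>
  <i>5</i>
  <c>h</c>
  <f>1.541</f>
  <j>Undefined data type</j>
  <c>Type violation error</c>
</data>
"""


def check_types(xml_text):
    """Return a list of type errors found in xml_text (empty if none)."""
    root = ET.fromstring(xml_text)
    spec = root.find("type-specification")
    # Map each declared element name to its declared type name.
    declared = {}
    if spec is not None:
        declared = {decl.get("name"): decl.tag for decl in spec}
    errors = []
    for elem in root:
        if elem is spec:
            continue  # the declarations themselves are not data
        text = elem.text or ""
        type_name = declared.get(elem.tag)
        check = CHECKS.get(type_name)
        if type_name is None:
            errors.append(f"<{elem.tag}>: undefined data type")
        elif check is None:
            errors.append(f"<{elem.tag}>: unknown type {type_name!r}")
        elif not check(text):
            errors.append(f"<{elem.tag}>: {text!r} is not a valid {type_name}")
    return errors


if __name__ == "__main__":
    for problem in check_types(DOCUMENT):
        print(problem)
```

Run against the sample, this should flag only the <j> element and the
second <c> element, which is the behaviour the sample's own text
describes.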

> All the while, hearing a low chant: "What kind of cretin are you?
> You don't want to *validate* your data! (shock) You only want
> well-formed documents." -- NO and YES.

I'm not sure where you're getting this shock and horror.  Validation
is good because, and only because, it verifies that your data is what
it claims to be, and therefore other applications may make certain
assumptions without breaking.  If the correctness of your serialized
data stream can be guaranteed because it came out of a database or is
the result of literate programming, then you *have* validated your
data, though not in the XML sense.

> I don't care if my document can be validated by a pitiful DTD.  I do
> care that it conform to a real type schema!

So create your own form of validation.  It's as simple as that.
Document geeks created XML, and its built-in validation is optimized
for validating documents.  Want something else?  Make it!

> I'm not really mad at XML but, I think "Richard L. Goerwitz III"
> <richard at goon.stg.brown.edu> is on to something in wondering if SGML
> compatibility is going to bring down the XML effort.

Quite the contrary.  If XML had not been built on SGML, new
applications would have had to be written to test the ideas in the
specification.  Building on SGML, XML was usable *the moment it was
created* with existing, high-powered tools.  Without that head start,
XML might still have succeeded, but not nearly so quickly.

> Much of the initial selling of XML was:
> 		You don't need DTD's to be a good citizen.
> 
> I hope we can honor that promise.

The first part ("You don't need DTDs") isn't a promise; it's a fact,
enshrined in the XML specification, where the trailing "?" makes the
document type declaration optional:

   [22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
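
The optional document type declaration in production [22] can be
observed with any non-validating parser.  Here is a minimal Python
sketch, assuming the standard xml.dom.minidom module (which checks
well-formedness only); the function name describe and the three
sample strings are hypothetical.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Well-formed, with no document type declaration at all.
WITHOUT_DTD = "<greeting>Hello, world!</greeting>"

# Well-formed, with a document type declaration and internal subset.
WITH_DTD = (
    "<!DOCTYPE greeting [<!ELEMENT greeting (#PCDATA)>]>"
    "<greeting>Hello, world!</greeting>"
)

# Not well-formed: the element is never closed.
BROKEN = "<greeting>Hello, world!"


def describe(xml_text):
    """Classify xml_text by well-formedness and presence of a doctype."""
    try:
        doc = minidom.parseString(xml_text)
    except ExpatError:
        return "not well-formed"
    if doc.doctype is None:
        return "well-formed, no DTD"
    return "well-formed, has DTD"


if __name__ == "__main__":
    for sample in (WITHOUT_DTD, WITH_DTD, BROKEN):
        print(describe(sample))
```

Only the third sample is rejected; the first is a perfectly good XML
document even though it never mentions a DTD.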

The second part ("to be a good citizen") is a completely subjective
statement and depends on so many philosophical variables that I won't
attempt to address it, except to say that no one could possibly have
promised that in any meaningful way.

If you like XML, use it.  If you don't, use something else; take the
good stuff from XML and leave the bad stuff out.  That's what XML did
with SGML; you go right ahead and do the same thing.  If we like what
you do, we'll use that instead, and you'll have your picture on the
cover of _Wired_.

-Chris
-- 
<!NOTATION SGML.Geek PUBLIC "-//Anonymous//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//O'Reilly//NONSGML Christopher R. Maden//EN"
"<URL>http://www.oreilly.com/people/staff/crism/ <TEL>+1.617.499.7487
<USMAIL>90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)