Must XML be SGML compatible?
jarle.stabell at dokpro.uio.no
Wed Oct 29 13:34:06 GMT 1997
Sorry for posting another mail on this topic to this list; this posting would be more at home in XML.advocacy if there were such a thing...
Jarle Stabell wrote:
> * The ability of using non-XML-aware SGML tools on XML documents.
> How long will this benefit be of any substantial value?
David Megginson writes:
That's hard to say. There is an enormous number of SGML document
systems now in place, and (as we know now with the Y2K pseudo-crisis)
companies are _very_ reluctant to change their software once they've
installed a new system, especially if the system is the result of an
expensive and difficult project.
[JS] Then I'd say they should stick with SGML: they won't be bothered by XML not being SGML compatible, and they have nothing to gain by switching to XML except the use of better tools. (Their SGML documents are likely not well-formed XML documents in the first place.)
> [JS] One of the benefits may be that parsing XML documents will run noticeably
> faster with an XML-specific parser than a general SGML parser.
I don't know enough about automata theory to know if this statement is
true (or even verifiable) -- it seems to me, though, that the number
of productions in the grammar shouldn't affect parsing speed, and I do
know that SGML is designed explicitly to avoid backtracking by
requiring no more than one look-ahead token (to everyone's
[JS] I don't know the theory well enough myself to tell how grammar size influences the speed of parsers built with (LA)LR tools, but I guess the typical XML parser won't be built with such tools at all, because an XML parser is probably quite simple to build "manually", and because one generally gets faster parsers this way.
(Hopefully some of the implementors read this; they could easily falsify my view.)
I haven't checked all the public XML parsers. I see that NXP uses JavaCC, which is a parser generator, but not a typical Yacc/Bison sort of thing.
I would be *very* surprised if someone were able to write a general SGML parser as fast as the fastest XML parser (in, for instance, 3 years' time).
You also state "to everyone's annoyance". This is exactly what I mean: is SGML compatibility worth so much that we will instead force upon perhaps millions of users, over the next 10-20 years, syntactic design "flaws" which are well known to us today?
1) Credibility: by tying itself to a well-established international
standard (ISO 8879), XML can win over conservative users in
important areas like financial services and EDI.
[JS] Yes. But I'm not old/wise enough to understand why making some minor syntactic "fixes" should scare those users away, as long as it will be an international standard with the great ideas of SGML intact.
2) Implementation: the XML standard will live and die partly based on
the enthusiasm of early implementors; piggy-backing on SGML gives
it a good, experienced implementor-base right from the start.
[JS] Yes. But to speak for one possible implementor (myself), I would be much more enthusiastic about it if I believed it was as well-designed as it could be.
I really believe in the "semantic" beauty of SGML, the tree structure/groves, the separation between document type and instance, DSSSL/XSL etc., and also the "general" concrete syntax, but I also think the general user would be better off if XML were *simplified* SGML, not only a well-defined subset/fragment of it.
Paul Prescod wrote:
The language that we use to encode humanity's knowledge should be a true
standard and not merely a "recommendation." That means that it should
be built upon our democratic institutions and not vendor consortiums.
[JS] Agree (with your whole mail). But why must we be so "selfish" as to let humanity struggle with SGML compatibility? (No reply necessary... :-) )
As current SGML (and HTML) documents typically won't be well-formed XML documents, I just can't see the big practical win in ensuring that well-formed XML documents should be SGML documents.
What I feel *would* have practical value would be SGML documents being wf XML documents, and/or HTML documents being wf XML documents, but neither of these will, of course, be true.
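A minimal sketch of why typical HTML is not well-formed XML, assuming Python's standard xml.etree.ElementTree parser as the well-formedness checker. The helper name is_well_formed and both sample strings are made up for illustration; they are not taken from any real document:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as a well-formed XML document.

    Hypothetical helper: it only checks well-formedness, not validity
    against any DTD.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# HTML-style markup: omitted end tags and an empty element written as <br>.
html_style = "<p>First paragraph<br><p>Second paragraph"

# The same content with every element explicitly closed and a single root.
xml_style = "<doc><p>First paragraph<br/></p><p>Second paragraph</p></doc>"

print(is_well_formed(html_style))  # False
print(is_well_formed(xml_style))   # True
```

Tag omission and unquoted attribute values are the usual reasons existing HTML (and much SGML) fails such a check.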
"Syntax is arbitrary"
[some French linguist or philosopher whose name I can't remember how to spell]