SGML and XML
Simon St.Laurent
SimonStL at classic.msn.com
Wed Nov 26 13:35:47 GMT 1997
XML is the best opportunity I've yet seen to create a standard which handles
documents (and other data) intelligently yet simply. Whatever XML's roots
(which of course are SGML), XML has the opportunity to reach an extremely
broad audience - an audience the size of the current (and future) HTML
audience, not just the established SGML community.
The terms of the XML discussion have always been framed in SGML, and are
likely to continue to be for a considerable time to come. While that has
advantages, I don't think the concept of using XML as a Trojan Horse to
introduce SGML proper to a larger audience is a good one. I gave a seminar
two weeks ago in Washington DC to the ACM - a place and an organization that I
would tend to think of as friendly to SGML. Of 50 people in the seminar
(which was on Dynamic HTML), 15 had worked with SGML. Every time I brought up
SGML (in connection with XML, CSS, and the DOM), I was greeted with questions
like "Is that really necessary?" and "Are those SGML people trying to change
_our_ world?" These questions didn't just come from the HTML beginners; many
of them came from the developers who had worked with SGML, some quite
extensively. At lunch the discussion quickly turned to XML, and I had to do a
lot of convincing to get people 'past' SGML.
For public relations reasons, it seems like XML needs to be able to have it
both ways. Companies already using SGML and developing SGML tools need to be
encouraged to accept XML - not as a replacement for SGML, but as something to
take seriously. The larger non-SGML community, however, needs to be given XML
as something new and different. XML should not just carry in SGML's
reputation as a complicated, slow-to-develop, and difficult-to-implement tool
of the Federal Government. XML evangelists need to be able to describe the
problems that XML fixes and how it fixes them, without reference to enormous
systems that SGML has created in the past.
>So XML says it is SGML. Furthermore, the recent correction to SGML (WebSGML),
>which is in its next-to-final draft before release (it has already been voted)
>means that there should be no doubt that the national standards bodies
>involved with ISO want SGML to be XML-accepting too. I have attended ISO
>meetings on this, and the ISO people certainly do not see XML as something
>independent of SGML either.
XML says it is SGML. Fine. But should the future development of XML be aimed
at gradually including SGML features, or should it be aimed at meeting the
needs of the developing XML community? I expect the XML community in six
months to a year to be rather distinct from the SGML community and hopefully
quite a bit larger. This issue will grow; we'll see what the W3C and ISO do.
>The complexity of unadorned SGML and the generality of its toolkit approach
>is the thing that made it difficult. The very thing that makes you rich makes
>you poor.
And conversely, the thing that makes you poor will make you rich. HTML took
off because it was brilliantly simple. (There were plenty of other factors,
of course, but simplicity was key.) SGML has done very well in sectors that
were able to make the investment in learning SGML, developing in SGML, and
creating systems around SGML. XML has the opportunity to take its much
simpler toolkit to a much larger audience. Simplicity is key to reaching that
larger audience; adding SGML features, even with an on/off switch, is likely
to confuse new users of XML while still disappointing the SGML community.
>But if a company wants
>to use something more powerful at their back-end, why shouldn't they use
>a more powerful language nearer SGML if that serves their inhouse needs
>better. And why shouldn't Microsoft allow this in their parser?
If a company wants to use something more powerful, why don't they consider
'real' SGML and get a parser designed for that instead of creating documents
that are called XML but are no longer XML? Using this suggestion effectively
will require a new series of standards to define what features of SGML have
been added to a set of documents so that people don't blindly run them through
XML parsers with the switch set wrong. Data interchange will be a mess, once
again.
>XML development has been an exhaustive analysis of every part of mainstream
>SGML. And I think almost everyone on the SIG would agree that there are
>good reasons for almost all the non-intuitive parts of SGML. However, the
>need to be straightforward (the #1 goal of XML) means that there is
>a different cost/benefit trade-off for deciding what should go into the
>base language (compared to SGML in the early 1980s).
There is a completely different cost-benefit analysis. XML is the grand
opportunity to extend generalized markup to a far larger audience than exists
today. There may be good reasons for almost all the non-intuitive parts of
SGML, but the fact remains that these non-intuitive features have been
barriers to use and development. After reading some of the ISO specs and too
large a chunk of the SGML literature, it became quite clear to me why SGML
never percolated down to small companies and developers. It's too complicated
to be used without considerable upfront investment.
>The English-using world already runs on SGML. Computer chips, air
>transport, legal systems, the military, many stock markets,
>much print media, diagnostics of office equipment, and (with HTML 4.0)
>WWW. Any claim that SGML is not good for what it has tried to do
>are wrong, as far as the market has spoken.
The market has spoken that SGML does a great job for managing enormous amounts
of information. It has also spoken that SGML presents enormous barriers to
entry (steep learning curve, cost of development, etc.) that have kept a lot
of people from using it. SGML does a great job in many systems. The "many"
there, however, is a tiny select few compared to the many that a simpler
syntax (i.e. XML) could reach. The scale of those projects is very different
from those XML makes possible.
>And, in any case, the distinction between SGML and XML people is entirely
>spurious. If you use XML, you are an SGML person.
This distinction will grow as XML is adopted more widely. Visit the high-end
web development mailing lists and you'll find an incredible amount of
hostility to SGML but a simmering interest in XML. If you use XML, you are
using SGML tools. This does not make you an SGML person. As you may have
detected, I do have a certain amount of hostility toward SGML and SGML
culture, while remaining very enthusiastic about XML.
>SGML is not the enemy. The enemy is poorly described data that is no use,
>and systems that are inappropriately complicated (or simple) for their
>user requirements. SGML is merely a toolkit for constructing markup
>languages, which includes a lot of features that are not relevant
>to delivering structured data over the Web.
XML appears to be addressing the problems with SGML that have kept it from
being used by a wider audience. Poorly described data is the real enemy, of
course. Attacking that enemy in a larger sense requires a reconsideration of
the weapons we have used previously and a refinement. XML's simplicity will
encourage a large number of people to describe their data properly, people who
wouldn't have bothered with SGML.
This is an improvement, and the SGML community deserves great credit for the
effort they have poured into building a simple but useful toolkit, which
avoided the byzantine complexity SGML proposals are known for. XML is more
than just SGML, however. XML is going to bring a lot of 'bozos' into the
field of markup, people who care neither about the history nor the theory and
just want to get things done. A different attitude and different needs will
very likely increase the demands for XML to find its own voice.
I could, of course, be dead wrong. We'll know in a couple of years.
Simon St.Laurent
Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February)
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)