A Plea for Schemas
Len Bullard
cbullard at hiwaay.net
Tue Nov 2 02:30:25 GMT 1999
Matthew Gertner wrote:
>
> I have written a short "XML Rant"
Enjoyable. It is good to see some reasonable passion from a
reasonable mind. Here is some rant for the rant.
o "the 1980s, Charles Goldfarb invented SGML". OK for a
rant, but ISO created SGML. If any man can be said to
have led that work, it is Dr. Charles Goldfarb at IBM Almaden.
He was a member of the IBM team (Goldfarb, Mosher, Lorie) that
designed GML.
To the idea of GenCodes, GML added, among other things,
type-defined namespaces for markup. GML and research were combined to
propose and ratify ISO 8879. Invention like that is a community
process. Dr. Goldfarb leads that community.
In the late 1960s, publishers needed a means to
exchange working files. A solution proposed at that time,
GenCodes, gained support. But the limited approach of sharing
one single namespace (the GenCodes) did not evolve.
The reasons are not complex and are the same as for HTML:
the namespace represents a local application context.
When it is shared across all types, it limits the expressiveness
needed to document multi-context, real-time events.
o "..thousands loved it." Conceded. SGML was an expensive
system, deployed mostly in what were then mainframe and
minicomputer environments.
Who had it? Aerospace tech writers, some artists, and lawyers.
Why? They had a use for it, and the costs were justified
relative to the lifecycle cost of the information
in its topical context. Manuals. Expensive ones.
SGML lends itself to interpreted processing, and interpretation
is inefficient relative to the resources available.
As soon as SGML moved to PC-based systems,
it became cost-effective. There are and were examples of
SGML-based systems working well for hypertext client
applications in those environments. Except for the
lowly IADS, they were mostly expensive ones. Systems like
IADS proved that SGML, if defanged a bit, could be
deployed cheaply. Free, even.
IADS did not depend on a DTD; it was driven by a stylesheet (circa 1990).
It had a DTD, but the tags within it were modifiable and
extensible via the stylesheet processor. Its tags (file, frame,
hyperlink) were the equivalent of the ThenMalignedAndDespised
PROCESSING INSTRUCTIONS, but they looked like tags, so DTDs written
for the system incorporated them and went on about their business.
Framing worked.
In 1989:
1. Software was expensive
2. Hardware was expensive
3. The dominant application of SGML (1000dpi print) was hard.
SGML emerged into more general use when more power
was on more desks. Complexity coupled to complexity
produces emergence. TCO (total cost of ownership). The critical
innovation that enabled the emergence of SGML came from Intel et al.
The unification of a significantly sized software base by a dominant
operating system company did the rest. Kick MS as much
as you want; without them, the Web today would
still be something university students surfed and
researchers occasionally mastered, IMNSHO.
HTML emerged when:
o The Internet was opened to commercial use
o The power of the processor could support the
lowest-common denominator application of SGML
o Governments paid to implement and give away
a means and process to share the namespace in that
application
o A person to lead the effort emerged with a plan
that would work: Tim Berners-Lee, HTTP and HTML.
These convergent events, all in the same five years, gave you the
WorldWideWeb.
o HTML is a subset of SGML: NYET. Get out the ruler
and rap the knuckles. XML is a subset of SGML. HTML
is an *application* of SGML. It is obnoxious, and I
apologize in advance, but getting others to understand
**that** critical difference in thinking about markup is
very hard sometimes. Where I put "application", some
say "vocabulary". Good enough, but as Charles said,
"conserve names", and that is all.
Systems are invented or specified. Vocabularies are spoken.
HTML was not hobbled. It was distilled, like other vocabularies,
from agreements made among organizations to share information.
CERN, the University of Illinois, and DARPA made such agreements, and
vocabularies are the result of those agreements. What the organizations
share are namespaces and the implementations of processors for
creating, adding, deleting, or modifying statements in those
namespaces. HTML was GenCode: partDeux. TimBL gets the credit,
but others helped him, and if you ask, I'm sure he
will tell you their names. Names are what is shared.
It's all about names. Read the XML 1.0 specification; IMHO, that
is the conceptual breakthrough needed to understand markup. In essence,
SGML has always been principally a lexical standard. That
structural integrity is important, and specifying it
provides the necessary freedom from implementation
to enable an inexhaustible range of expression.
It also makes the agreement needed to implement a
system that uses it very expensive. XML locks
down the SGML Declaration. Most of the biggest
changes from SGML start there. To keep the original
expressive power, the means for making beyondLex agreements
are still needed.
A DTD is not only about lexical validation. It
is about validating a hierarchical namespace to
determine conformance. Whether you use DTDs,
MS Schemas, XML Schema (someday), or just
the table design window in Access or Oracle,
validating a vocabulary requires you to declare
one or derive it. IMHO, of the two means, declaration
is usually cheaper, but it is always political.
Politics is the human means of declaring namespaces.
BizTalk and OASIS both exist because of the names,
and the interests of those named, in the shared politics
of creating their shared namespaces. That is all.
XML does not care.
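To make "declare one or derive it" concrete, here is a minimal sketch in Python (my choice; nothing here comes from any particular schema language). It assumes a toy notion of a vocabulary: a map from each element name to the child names it may contain. The element names (manual, section, note) are hypothetical. It derives a vocabulary from instances and reports where the instances outrun the declaration. Real DTDs and schemas also constrain order, cardinality, attributes, and text, none of which this checks.

```python
import xml.etree.ElementTree as ET

# Hypothetical declared vocabulary: element name -> child names it may contain.
# It stands in for a DTD or schema and ignores order, cardinality and attributes.
DECLARED = {
    "manual": {"title", "section"},
    "section": {"title", "para"},
    "title": set(),
    "para": set(),
}


def derive_vocabulary(*documents):
    """Derive a vocabulary from instances: which children each element was seen with."""
    vocab = {}
    for text in documents:
        for elem in ET.fromstring(text).iter():
            vocab.setdefault(elem.tag, set()).update(child.tag for child in elem)
    return vocab


def undeclared_usage(declared, derived):
    """List names and parent/child pairs the instances use but the declaration lacks."""
    problems = []
    for name, children in sorted(derived.items()):
        if name not in declared:
            problems.append(f"undeclared element <{name}>")
            continue
        for child in sorted(children - declared[name]):
            problems.append(f"<{child}> not declared inside <{name}>")
    return problems


docs = [
    "<manual><title>Pumps</title><section><title>Intro</title><para>Text</para></section></manual>",
    "<manual><section><para>Text</para><note>Hot!</note></section></manual>",
]
derived = derive_vocabulary(*docs)
print(undeclared_usage(DECLARED, derived))
# ['undeclared element <note>', '<note> not declared inside <section>']
```

Either side can win the argument: extend the declaration to admit <note>, or fix the instances. That choice is the political part.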
Syntax unification is not enough. Using markup systems
requires you to accept the idea that the namespace is
primary. What does that mean? Just as SQL systems
must disambiguate aggregate naming, so must markup systems.
A name means what you need it to. It must be unique and persistent
to be a name, and you need a means to discover whether it is
meeting that need. Trust but verify.
Schemas are just one of the tools for discovering whether
that is the case. You can do more with schema information,
in the same way relational systems do: names
are associated to create unique, processable names.
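One everyday instance of names being associated to make a unique name is XML namespace handling. The sketch below uses Python's xml.etree.ElementTree, which reports a namespaced element name as the pair "{uri}local". The two namespace URIs are made-up examples; the point is only that two vocabularies can both say "title" without colliding once each local name is tied to its namespace.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both use the local name "title". The namespace URIs
# are made-up examples, not registered vocabularies.
DOC = (
    '<order xmlns="urn:example:orders" xmlns:bk="urn:example:books">'
    "<title>Order 42</title>"
    "<bk:title>Moby-Dick</bk:title>"
    "</order>"
)

root = ET.fromstring(DOC)

# ElementTree associates each local name with its namespace URI and reports
# the pair as a single "{uri}local" string, so the two titles stay distinct.
names = [child.tag for child in root]
print(names)
# ['{urn:example:orders}title', '{urn:example:books}title']

# Splitting each qualified name back into its two associated parts.
for name in names:
    uri, local = name[1:].split("}", 1)
    print(uri, local)
```

The prefix "bk" disappears after parsing; only the URI is kept, which is what makes the combined name persistent across documents that pick different prefixes.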
You can do a lot with DTDs and schemas, really.
They are just metainformation by which
you agree to organize the screen and the objects on it,
or the messages among objects, or whatever you want
to talk about. The reasons to use them
are to validate and to serve as a source for initialization. In
effect, they are just another database of
names and values. That is what makes XML
Schemas (with all deference to DTDs) attractive. Applications
for DTDs outside very specialized ISO 8879-conforming processors
are also useful for managing the namespace
of that metainformation.
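Treating a DTD as "just another database of names and values" can be shown with a toy loader in Python. This is a sketch under strong assumptions: it reads only simple <!ELEMENT name model> declarations with a regular expression and ignores comments, parameter entities, ATTLISTs, and conditional sections, so it is not a DTD parser. The element names are hypothetical. Once the declarations sit in a table, ordinary queries apply, such as finding names that are referenced but never declared.

```python
import re

# A small DTD fragment; the element names are hypothetical.
DTD = """
<!ELEMENT manual (title, section+)>
<!ELEMENT section (title, para*, figure?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA)>
<!ELEMENT pagebreak EMPTY>
"""

# Only simple <!ELEMENT name model> declarations are handled. Comments,
# parameter entities, ATTLISTs and conditional sections are ignored.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.:-]+)\s+([^>]+)>")
TOKEN = re.compile(r"#PCDATA|[\w.:-]+")
KEYWORDS = {"#PCDATA", "EMPTY", "ANY"}


def load_element_table(dtd_text):
    """Read element declarations into a table: name -> (content model, names it references)."""
    table = {}
    for name, model in ELEMENT_DECL.findall(dtd_text):
        model = model.strip()
        referenced = [tok for tok in TOKEN.findall(model) if tok not in KEYWORDS]
        table[name] = (model, referenced)
    return table


table = load_element_table(DTD)
for name, (model, refs) in table.items():
    print(f"{name:10} {model:26} {refs}")

# A database-style query over the table: names referenced but never declared.
undeclared = {ref for _, refs in table.values() for ref in refs} - set(table)
print(undeclared)
# {'figure'}
```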
DTDs do not aggregate, so instances that do are not
validatable. That does not keep them from
being useful. The names in the space are unique.
Their persistence is questionable, yet if you treat
them as a relational designer treats a view, they
are very useful. Well-formedness is what you need for
any lifecycle of the information. Validity is what
you need to ensure correct processes among systems
that use the information at particular times. When
a formal means to persist these better is provided,
we will have a very good system for maintaining
namespace communities.
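The split between well-formed and valid can be sketched in Python. Well-formedness needs no schema at all; the parser decides on its own. Validity needs the agreed declarations. As far as I know, the xml.etree parser in Python's standard library does not validate against DTDs, so "valid" here is a hypothetical stand-in: every element must be declared, and its children must be among the names declared for it. The memo vocabulary is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical declared content models: element name -> allowed child names.
ALLOWED_CHILDREN = {"memo": {"to", "body"}, "to": set(), "body": set()}


def is_well_formed(text):
    """Well-formedness needs no schema: the parser alone decides."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def is_valid(text, allowed=ALLOWED_CHILDREN):
    """Validity (in this simplified sense) also needs the agreed declarations."""
    if not is_well_formed(text):
        return False
    for elem in ET.fromstring(text).iter():
        if elem.tag not in allowed:
            return False
        if any(child.tag not in allowed[elem.tag] for child in elem):
            return False
    return True


samples = {
    "broken": "<memo><to>Len</memo>",
    "well-formed only": "<memo><to>Len</to><cc>Matthew</cc></memo>",
    "well-formed and valid": "<memo><to>Len</to><body>Schemas!</body></memo>",
}
for label, text in samples.items():
    print(f"{label}: well-formed={is_well_formed(text)} valid={is_valid(text)}")
# broken: well-formed=False valid=False
# well-formed only: well-formed=True valid=False
# well-formed and valid: well-formed=True valid=True
```

The middle sample is the interesting one: any XML tool can carry it through its lifecycle, but a system that agreed on the memo vocabulary would reject the <cc> it never declared.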
Schemas organize a namespace; not using one
relaxes a design constraint on the namespace. Relaxing
that constraint is efficient, particularly now,
when database systems are so cheap and ubiquitous that
using them to serve strings is optimal. Correct-by-construction
from a trusted source is faster, more compact, and less
restrictive on system evolution.
Badly formed HTML? It was a trade-off. It cleans
up over time. Better tools, better hunts, better times.
All XML says is that you don't have to use the DTD.
It doesn't say the DTD isn't useful. Enlightened XMLers
write them and use them and even throw them away.
A DTD is a snapshot of the organization of a namespace
in time. Time moves on. Information does too.
The DTD might not. Some part of it probably
will, and will influence the next version. The
reason to use or not use a DTD or any other
schema is determined by the evolution of the namespace:
the evolution of agreements, and so of cooperation.
Cooperation among large human communities is
always furthered when agreements about what
to name the names are simple and easy to verify.
When the means to communicate among companies
became the Web, the need to verify these agreements
by simple means became an ecological imperative.
So, patience. But don't quit pleading. Namespaces
are gardens. To grow usefully, they have to be tended.
It takes tools, lots of them, for particular
purposes, to do that. Most of us have sheds full of
tools we only use occasionally next to ones we use
every day.
That golden 10% of XML is the distilled essence of
SGML and of the years of practice and the competing, sometimes
awkward specifications and standards written there
by all of the people I met in those years. Even
those HyTime guys worked on creating XML. HyTime,
DSSSL, TEI, and before them Dexter, FRESS, and Engelbart
all feed the single stream that is now XML, and, as
with SGML, so do all the competing, sometimes awkward
specifications being written by many of the same people.
If you want to plead for schemas, I plead with you. Schemas are a
tool for validating agreements among overlapping namespace
communities. Ecom-ecologies (keiretsu) emerge because
the tools they use to make agreements, their namespaces,
become efficient. S = k log W (Boltzmann). To control
the temperature, control the value of W. DTDs help
you control the rate at which entropy consumes referents.
The trick to fixing the Web is to fix the Web's indexes.
To do that, ensure that the agreements by which the indexes
are made enable validation of the namespaces indexed.
Well-formedness and validity by agreement are the keys to creating
semantic space, overlapping vocabularies, if that is what
you want.
DTDs are a tool to make agreements. Beyond the agreement are the
names that agree. XML Doesn't Care. You do. You write:
> Dilution of the basic principles of generic markup, and
> misunderstanding of their purpose, will then give rise to
> inevitable disappointment, and hence rejection: "We switched our
> whole company over to XML and we still can't interchange data
> effortlessly. So this means that XML doesn't work, right?"
How many 'MLers here want a dollar for every time you've heard that?
Tell 'em, "ahh, XML Works. We just don't agree on how."
len bullard
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1