XML complexity, namespaces (was WG)

Thu Mar 18 17:41:20 GMT 1999

"Richard L. Goerwitz" wrote:
> I just updated STG's web-available validator
> to cope with namespaces.  I'm not claiming that I got it right on the
> first pass.  But the updates should help those of you experimenting
> with the Jan 14 spec:
> 
>   http://www.stg.brown.edu/service/xmlvalid/

Thanks; it's interesting to experiment with.

> After working with them now for a few months, I can't say I'm any more
> impressed with namespaces than when I started.  Why?
> 
>   --  No no matter what anyone says, they screw up validation.  --

I observe the same results, but draw different conclusions. I'm not any
more impressed with DTDs than when I started.
> 
>     1) because DTDs aren't namespace-aware, and therefore
>       a) don't know the difference between a defaulted element and one
>          that simply has no namespace
>       b) have no scoping mechanism to at least allow you to kludge
>          namespace defaulting by restricting elements to one or another
>          part of the syntax tree

These are problems with DTDs rather than with namespaces as such. 
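
To make point 1a concrete, here is a minimal sketch in Python, assuming the
standard library's xml.etree.ElementTree as the namespace-aware parser. The
sample document and the URI http://example.org/ns are invented for
illustration. A namespace-aware parser reports the two <item> elements
differently, whereas a DTD content model only ever sees the name "item".

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the first <item> picks up the default namespace,
# the second sits under xmlns="" and so has no namespace at all.
DOC = """<root xmlns="http://example.org/ns">
  <item/>
  <plain xmlns="">
    <item/>
  </plain>
</root>"""

def expanded_names(xml_text):
    """Return the namespace-expanded tag of every element, in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]

names = expanded_names(DOC)
for name in names:
    print(name)
```

The defaulted element comes back with its URI in braces in front of the local
name; the one with no namespace comes back as a bare name.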

>     2) because namespaces require you to parse attributes and values
>        fully before finishing element name processing; this is bad be-
>        cause it
>       a) makes one-pass parsing more difficult, and requires retention
>          of much more information during the parse
>       b) makes for unexpected interactions between the DTD (which may
>          provide default attributes for a given element, including
>          xmlns="" - which puts the element into a namespace)

OK, and fair comment, but it seems a reasonable power/complexity
trade-off to me.
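
The cost in point 2a can be sketched as follows, assuming Python's
xml.parsers.expat with its own namespace processing left switched off, so
that start-tags arrive as a raw name plus attributes. The function
resolve_names and the example.org URIs are hypothetical, and undeclared
prefixes are not handled (they would raise KeyError). The point is only that
the attributes have to be scanned, and a scope stack kept, before the element
name means anything.

```python
import xml.parsers.expat

def resolve_names(xml_text):
    """Resolve each element name to (namespace URI or None, local name)."""
    scopes = [{"": None}]   # stack of prefix -> URI maps; "" is the default
    resolved = []

    def start(raw_name, attrs):
        # Scan the attributes first: the declaration that gives raw_name
        # its meaning may sit on this very start-tag.
        scope = dict(scopes[-1])
        for key, value in attrs.items():
            if key == "xmlns":
                scope[""] = value or None   # xmlns="" removes the default
            elif key.startswith("xmlns:"):
                scope[key[len("xmlns:"):]] = value
        scopes.append(scope)
        prefix, _, local = raw_name.rpartition(":")
        resolved.append((scope[prefix], local))

    def end(raw_name):
        scopes.pop()   # declarations go out of scope with their element

    parser = xml.parsers.expat.ParserCreate()   # no namespace processing
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)
    return resolved

DOC = ('<p:a xmlns:p="http://example.org/p">'
       '<b xmlns="http://example.org/d"><p:c/></b><d/></p:a>')
result = resolve_names(DOC)
```

The extra state is one small dictionary per open element, which is the
"retention of much more information" being traded for the extra power.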

>     3) because inherited attributes are inimical to the whole DTD
>        concept
>       a) it was bad enough that we had to put up with xml:lang and
>          such (which processing software must pass down the parse
>          tree), now the XML standard itself has inherited attributes
>          built in with namespaces

Inherited attributes are a powerful and obvious concept; again, it seems
to be DTDs which are insufficiently expressive rather than namespaces
which are broken.
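
As an illustration of what "passing down the parse tree" involves, here is a
small sketch, again assuming Python's xml.etree.ElementTree. The helper
effective_lang and the sample document are made up for this example. Each
element takes its own xml:lang if it has one and otherwise inherits the value
in force on its parent.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def effective_lang(xml_text):
    """Pair each element's tag with the xml:lang value in force for it."""
    out = []

    def walk(el, inherited):
        lang = el.get(XML_LANG, inherited)   # own value wins, else the parent's
        out.append((el.tag, lang))
        for child in el:
            walk(child, lang)

    walk(ET.fromstring(xml_text), None)
    return out

DOC = '<doc xml:lang="en"><p>hello</p><p xml:lang="fr"><em>bonjour</em></p></doc>'
langs = effective_lang(DOC)
```

The processing side is a few lines; what is missing is a way to say the same
thing declaratively in the DTD.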

> And my observation is that namespaces screw up validation.

screw up validation *with DTDs using current DTD syntax*, which is not
the same thing at all. 

Unfortunately I came across EBNF long before I came across DTD syntax,
so about half an hour after meeting DTDs I was, like, what do you mean
it can't express that this attribute is a URL? Why can't it express that
this attribute is an ISO standard date?

So I quickly formed the opinion that DTDs really got in the way of
validation ;-)

> This is all very bothersome because validation is one of the key points
> that separate XML from HTML, 

that separates XML from HTML practice. HTML theory always required
validation, doctypes, all that good stuff; but the bar was massively
high and thus the spec was not really relating to the users at all. I
first saw an implementation of HTML only a couple of months ago (the
DocZilla browser from Citec).

With XML, the bar has been lowered sufficiently, by throwing out the
crufty bits of SGML, that it becomes an achievable target. So, there
are lots of users stress-testing XML, which is great, and getting much
more from it than was possible with typical HTML "implementations", which
is also great. But one result of that stress testing is that DTDs (which
were just about OK in a closed, single system, single user world) are
poorly suited to an open, multi-user, Web-enabled world.

You know what they used to say about SGML; it's asymmetric. Getting the
data in just takes a text editor, but getting it out again requires a
consultant. Well, with XML, the effort to get some benefit from XML is
reduced because of economy of scale - someone somewhere will have the
DTD you want to do part of your job. Build what you want from a kit of
parts that other people wrote; add a little glue, and off you go.

That model has been spectacularly successful in programming; namespaces
give that same power to XML.

Yes, validation is important - and I mean real validation, with no
critical-path human-readable comments in the DTD and multiple utilities
to check different aspects of validity (like separate scripts to ensure
that an attribute is a valid date or customer number).
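
For the record, the sort of side script meant here might look like the
following sketch, assuming Python with xml.etree.ElementTree and
datetime.date.fromisoformat. The attribute name "date", the <order> elements
and the helper bad_dates are all hypothetical. It checks exactly the thing a
DTD cannot state, which is why it ends up living outside the DTD.

```python
import xml.etree.ElementTree as ET
from datetime import date

def bad_dates(xml_text, attr="date"):
    """Return (tag, value) for every attr value that is not a real ISO date."""
    bad = []
    for el in ET.fromstring(xml_text).iter():
        value = el.get(attr)
        if value is None:
            continue   # attribute absent; presence is the DTD's business
        try:
            date.fromisoformat(value)
        except ValueError:
            bad.append((el.tag, value))
    return bad

DOC = ('<orders>'
       '<order date="1999-03-18"/>'
       '<order date="1999-02-30"/>'
       '<order date="18 March 1999"/>'
       '</orders>')
problems = bad_dates(DOC)
```

To a DTD all three attribute values are just CDATA; only the script notices
that two of them are not dates.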

So what is critically needed is a real, namespace-aware, schema language
that can be used to do real validation.

> and potentially make it better.  With XML,
> anyone can define their own HTML, so to speak, or another markup lang they
> find useful, and then simply publish a DTD with it.

Right; in the same way, anyone can do the data modelling required to
define a database format and anyone can write a parser. In theory. But
most people choose not to, and to use ones that someone else wrote. This
works for code. It should work for data, too.

Since many people will have come across *some* aspect of a user's problem
space before, but no-one will have come across the *exact same* entire
problem, namespaces are required so that people can build what they
want from a distributed kit of parts.

> It's to
> the point where the only people who can write effective HTML processing
> software are outfits with armies of programmers hired to deal with error
> recovery and proprietary extensions (both their own and their competitors').

Yes. Typically 95% of the programming effort is in the reverse
engineering and undocumented trickery; implementing the actual specs is
the remaining 5%.

> With XML, we can potentially start out on the right foot, and avoid this
> nonsense 

Yes

> by using validation from the start. 

For stand-alone documents in a single namespace, that can still be done.
For combinations of particular namespaces, it can be done declaratively
and a resulting DTD auto-generated, but that is fragile because it makes
assumptions about the namespace prefix and limits the use of namespace
defaulting. Given a more powerful schema language, creating a schema for
a new XML application should be as easy as reading in a selection of
DTDs and doing drag-and-drop tree construction with the component parts.
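
The fragility is easy to demonstrate. In the sketch below (Python standard
library; the two documents and the URI http://example.org/math are invented),
A and B are the same document as far as namespaces are concerned, but a DTD
generated for one spelling of the element names will reject the other.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat

def raw_names(xml_text):
    """Element names as written, which is all a DTD gets to match against."""
    names = []
    parser = xml.parsers.expat.ParserCreate()   # no namespace processing
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(xml_text, True)
    return names

def expanded_names(xml_text):
    """Element names after namespace processing ({URI}local form)."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]

# Same vocabulary, once with a prefix and once with namespace defaulting.
A = '<m:math xmlns:m="http://example.org/math"><m:mi>x</m:mi></m:math>'
B = '<math xmlns="http://example.org/math"><mi>x</mi></math>'

print(raw_names(A), raw_names(B))
print(expanded_names(A) == expanded_names(B))
```

A DTD declaring m:math and m:mi validates A only; one declaring math and mi
validates B only. A namespace-aware schema would be keyed on the expanded
names, which are identical.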

> Well-formedness is nice, but it's not clearly enough defined 

It seems fairly clearly defined; it may not be sufficient.

> (and anyway, many non-validating
> processors find it necessary to at least grab attribute defaults, if not
> also look for parameter entities and conditional sections). 

You mean, ones in the external subset? If code is doing that, it might
as well do validation too.

>  Using it
> alone could easily put us back into an HTML-like mess.

Oh no, even that gives us more than HTML-in-practice. For example,
different applications can actually be assumed to be using the same
parse tree ;-) which does make the DOM and style sheets a whole lot more
predictable.

The combination of SGML omissible start-tags and HTML extensions meant
that all parsing was error recovery and that no two browsers would have
the same parse tree. Or if they did, it was because they had five or so
different possible parse trees around, depending on what you were doing
;-)

> So the problem now is how to encourage validation despite the fact that
> the W3C has apparently shot DTDs and itself in the foot with namespaces.

Or rather, the problem is that the W3C (and the general public)
exercising XML and putting it into real practice has made painfully
obvious some shortcomings inherited from SGML. The W3C XML Schema WG will,
however, solve these; I am confident of this.

That isn't shooting ourselves in the foot; it is more akin to
discovering that movement is much faster when your clogs aren't nailed
together, and pausing for a while to separate them and to develop
running shoes. 

> The answer, obviously, is to shed any pretense of DTDs being the basic
> XML schema mechanism. 

For declaring multi-namespace documents, yes. They still have at least
an interim role in validating single namespace documents and in defining
the building blocks from which a multi-namespace schema can be
constructed.

> We could waffle for years, claiming that both the
> DTD and some other mechanism are "standard".  But what's this supposed
> to do to the complexity (remember complexity?) of our processing soft-
> ware?
> 
> It's not like it's any harder to construct a schema mechanism that
> offers a superset of what a DTD offers, and then provide simple conver-
> sion tools.
> 
> Yes, SGML compatibility was an original goal.  But a lot of original
> goals seem to have gone out the window.  Another one isn't going to make
> any difference now.

;-)

> The only problem with this scenario is that it will horrify the old SGML
> community, which looks to me as if it's trying to kludge architectural
> forms onto XML, maybe in efforts to save DTDs.

There are significant portions of the old SGML community working to
improve XML and to help build the missing parts which are needed. I have
a lot of respect for that portion. There are, as you say, other parts
which are merely trying to save their own highly paid jobs as priests of
complex, low-powered technology. One can usually tell the difference by
noting that the former portion have their eyes open.

> It's all getting rather bizarre.  Again, I say this as someone who has
> gotten with the program, and implemented everything the W3C has put out

Cool. Implementation experience is like gold.

--
Chris
