RFP: Namespace URI for HTML

Fri Sep 10 02:41:37 BST 1999

Sebastian Schnitzenbaumer wrote:

> > But an HTML processor is supposed to accept a well-formed document and
> > gracefully ignore unknown elements (actually treat them as text). So, what
> > happens when your cellphone microbrowser gets a frameset document instead
> > of a strict document? Does it just put up an error box and show nothing?
> > How does a non-validating parser ensure a document is frameset or strict?
> 
> In this specific scenario, there will be transformation on a proxy 
> server, trying to make the best out of it. But the microbrowser itself 
> might just render documents of a specific type.
> 
> > Namespaces do not define the set of valid names; they only allow
> > differentiation. Without validation there is no enforcement that a
> > document is strict, frameset or transitional. Since the namespace
> > declaration has no enforced meaning, why bother with it?
> 
> Differentiation is the point. Strict, frameset and transitional are only 
> the base family members. It is likely that there will be a larger 
> XHTML family, where family members will be even more different 
> than just those three. As I said in my first mail, there is more to it 
> and I'll continue here.
> 
> HTML is a damn useful vocabulary after all. Designing a completely 
> new XML language is often the only way. But sometimes, a new 
> application is rather a mixture of the features that HTML (or a 
> subset of HTML) already provides together with entirely new 
> features. In this case, one would re-use a subset of HTML in a new 
> XML language, forming a new XHTML family member. 
> 
This is a general problem not specific to HTML. First, there is no way in
XML to use part of another DTD except through the kludge of parameter
entities. Re-use of parts of an XML language is a general problem, not
specific to XHTML. Again, namespaces DO NOT define a vocabulary. 

> If my new language wants to allow the use of images, why not take 
> the image module from XHTML instead of inventing my own tags? 
> Authors will be happy since they don't have to learn something 
> new.
> 
Because there is no such thing as 'the image module from XHTML' and putting
'html:' in front of it doesn't make it so. I could put
'externalImageReference:' in front and get the same effect. Both would have
to be special cased in the application. If you want modules, wait for a
modules spec that applies to any XML language instead of inventing something
special for XHTML that doesn't work.
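A minimal Python sketch of this point, assuming the standard library's xml.etree.ElementTree as a typical namespace-aware, non-validating parser; the example.com URI and the HANDLERS table are hypothetical. The parser reports only {uri}localname pairs for both kinds of img, and any behaviour has to come from a table the application supplies itself.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
# Hypothetical URI standing in for an 'externalImageReference' vocabulary.
EXT = "http://example.com/externalImageReference"

doc = (f'<doc xmlns:html="{XHTML}" xmlns:externalImageReference="{EXT}">'
       '<html:img src="a.png"/>'
       '<externalImageReference:img src="b.png"/>'
       '</doc>')

# The parser only expands prefixes to {uri}local names; it attaches no
# "image module" meaning to either element.
tags = [child.tag for child in ET.fromstring(doc)]

# Hypothetical application table: each namespace/name pair must be
# special-cased here, whichever prefix or URI the author chose.
HANDLERS = {
    f"{{{XHTML}}}img": lambda el: "image " + el.get("src"),
    f"{{{EXT}}}img": lambda el: "image " + el.get("src"),
}
rendered = [HANDLERS[el.tag](el) for el in ET.fromstring(doc)]
print(tags)
print(rendered)
```

Swapping 'html:' for any other prefix or URI changes only the dictionary key, not the amount of work the application does.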

> Let's go a bit further. You have written a new XML language for 
> forms. In the end you realize that the part dealing with form 
> controls and forms logic is fine, but the visual representation of 
> forms, i.e., the definition of the page, the text formatting and layout, 
> is actually better done by HTML. You take a subset of XHTML for 
> that part.
> 
> The new language, however, has no DTD. The XHTML DTDs have not been
> constructed for reuse of their parts using the only available mechanism,
> parameter entities. So, cutting and pasting of definitions from the XHTML
> DTDs would be required to define such a language. 
> 
> Namespaces don't define a language. All you have done is punt a second
> level of parsing off into the application (with no validity checking).
> 
> Your language is bound together with a subset of XHTML, but is 
> still a new, unique XML grammar. If all XHTML variants were one 
> namespace, then that XHTML subset being used in this new XML 
> grammar would also belong to the XHTML namespace. The new 
> language would need to change the default namespace from 
> XHTML to the rest of the language all the time or use colons. But 
> logically, this is a different kind of animal, and should have its own, 
> unique namespace so applications can identify it as such.
> 
In this example you have made XHTML structure the root and the new elements
the leaves (the opposite of the previous example). First, constructing a DTD
for this language would involve cutting and pasting since you are defining
new possible contents for some XHTML elements. 

In particular, you would be redefining the body element to accept this new
kind of form definition (newlang:form as opposed to html:form). So my body
element really is different from the HTML body element, and I should prefix
it with newlang. In doing that, I've changed the content of the top-level
'html' element to use 'newlang:body', so I should prefix that with newlang
too. The result is that I've used the new prefix on all of the higher-level
HTML elements.

   <!ELEMENT newlang:html (head, newlang:body)>
   <!ELEMENT newlang:body (#PCDATA | %block; | newlang:form | %inline; |
                           %misc;)*>

So none of the higher-level HTML elements would belong to the HTML
namespace. Here you have a basic question: does a namespace define the
content model of a language, or does it define the semantic meaning of its
elements? The answer is that it defines neither; don't try to make it do
something it is defined not to do.
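A minimal Python sketch, again assuming xml.etree.ElementTree as the non-validating parser; the newlang URI and the made-up html:frobnicate element are hypothetical. An html:body placed in the XHTML namespace happily contains a newlang:form and an element that exists in no XHTML DTD, because the namespace carries no content model.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
NEWLANG = "http://example.com/newlang"  # hypothetical URI for the new language

doc = (f'<html:html xmlns:html="{XHTML}" xmlns:newlang="{NEWLANG}">'
       '<html:head><html:title>t</html:title></html:head>'
       '<html:body>'
       '<newlang:form/>'       # allowed by no XHTML content model
       '<html:frobnicate/>'    # not an XHTML element at all
       '</html:body></html:html>')

# Parses without error: being in the XHTML namespace constrains nothing.
root = ET.fromstring(doc)
body = root.find(f"{{{XHTML}}}body")
body_children = [child.tag for child in body]
print(body_children)
```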

> > The only reason I've seen presented is fragments. BUT there is a
> > fragments working group; why not let them find a general solution to the
> > problem? Why are you usurping their authority?
> 
> I just wanted to point out that it is sometimes handy to know exactly 
> what kind of XHTML this is, especially when we have many 
> different XHTMLs. Fragments were just an example; I'm not 
> usurping anyone's authority.
> 
I double-checked the W3C groups, and it looks like none has been given the
charter for re-use of DTD subsets (call them modules if you wish). Maybe it
falls under the schema group.

I see XHTML trying to provide its own solution to a general XML problem; the
solution doesn't work and just adds to complexity and confusion. The issue
should not be addressed by XHTML at all. There should ONLY be a general
solution to the problem of reusing sections of a DTD or schema.

In addition, I see a confusion between namespaces, modules, languages, DTDs,
and schemas. Namespaces have no connection with any of the other items. That
is explicit in the spec and in the use of URIs. Namespaces do only one thing:
resolve ambiguity.
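A minimal Python sketch of the one thing namespaces do, assuming xml.etree.ElementTree; the book URI and element contents are hypothetical. Two elements share the local name title, and only the namespace tells them apart. The query prefixes h and b deliberately differ from the document's prefixes, since only the URI matters.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
BOOK = "http://example.com/book"  # hypothetical second vocabulary

doc = (f'<catalog xmlns:html="{XHTML}" xmlns:book="{BOOK}">'
       '<html:title>Page heading</html:title>'
       '<book:title>Moby-Dick</book:title>'
       '</catalog>')

root = ET.fromstring(doc)
# Query prefixes are local to this mapping; they need not match the document's.
ns = {"h": XHTML, "b": BOOK}
html_titles = [el.text for el in root.findall("h:title", ns)]
book_titles = [el.text for el in root.findall("b:title", ns)]
print(html_titles, book_titles)
```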

In the language examples given, I don't need to use namespaces UNLESS a name
conflicts with another name in the document. Prefixing non-ambiguous names
would be merely a convention. XHTML is changing it into a requirement without
in any way validating the elements (pass in a form instead of an image ref:
how is that detected?), and it depends on an application convention of
checking the namespace of elements, essentially providing a subset of an HTML
parser to validate the elements. You could drop the convention of checking
the namespace and just check the element names (except where ambiguous). The
namespace buys no simplification for the application.
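A minimal Python sketch of that last point, assuming xml.etree.ElementTree; the helper names local_name, is_image_by_namespace and is_image_by_name are hypothetical. The parser accepts a form where an image reference is expected, only the application's own check catches it, and the namespace test gives the same answer as a plain element-name test when nothing is ambiguous.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

def local_name(tag):
    """Strip a leading '{uri}' from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]

def is_image_by_namespace(el):
    # Convention: require the XHTML namespace and the name 'img'.
    return el.tag == f"{{{XHTML}}}img"

def is_image_by_name(el):
    # Alternative: check only the element name.
    return local_name(el.tag) == "img"

doc = (f'<doc xmlns:html="{XHTML}">'
       '<html:img src="a.png"/>'
       '<html:form/>'   # a form passed where an image ref is expected
       '</doc>')

# The parser accepted both elements; only these application checks notice
# the form, and the two checks agree on every element.
results = [(local_name(el.tag), is_image_by_namespace(el), is_image_by_name(el))
           for el in ET.fromstring(doc)]
print(results)
```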

Marc B. McDonald
Principal Software Scientist

Design Intelligence, Inc.
1111 Third Avenue, Suite 1500
Seattle, WA  98101
marc.mcdonald at design-intelligence.com
Ph: 206.343-7797
Fax: 206.343.7750

http://www.design-intelligence.com