XHTML and the Three Namespaces

Wed Sep 22 01:37:51 BST 1999

Andrew Layman wrote:

> One thing that people would like is to be able to clearly define which
> documents are valid per the Strict, Transitional and Frameset rules. This
> is currently done via three DTDs.
> 
Fine, though I would argue for a single DTD with conditionals.

> Another thing people would like is to be able to indicate in a document
> which set of rules the document is intended to conform to. This is done by
> giving each of the three grammars a namespace, and saying that the
> elements in each namespace are to be validated against the syntax in the
> DTD corresponding to that namespace.
> 
I assume you mean including XHTML in some non-XHTML document. First, this is
a general XML problem, not an XHTML problem, and should be addressed by
schemas, not second-guessed. Second, there is no way in XML 1.0 to associate
a namespace with a DTD and force validation. Third, the DTD for the outer
document would have to include definitions for the XHTML elements, in which
case the appropriate definition could be chosen there. What use is it to
validate a section of XHTML when the entire document would fail validation?
You are describing significant changes to validation mechanisms which are in
no way part of XHTML's responsibility.

story.dtd:
...
<!ELEMENT story (headline, pix, body)>
<!ELEMENT headline (#PCDATA | html:p)*>
<!ELEMENT pix (html:a)>
...
story.xml:
...
<story><headline><p>here's some <emp>XHTML</emp> text</p></headline>
...
</story>

If my DTD doesn't include XHTML elements, the document doesn't validate. If
you don't validate, then the XHTML isn't validated either - it's only
well-formed.
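
To make the "only well-formed" point concrete, here is a minimal sketch in
Python. It assumes a non-validating parser (Python's ElementTree) and a
made-up namespace URI and a made-up element name, html:bogus; neither comes
from any XHTML draft. The parser accepts the bogus element without complaint
because it never consults a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used only for illustration.
HTML_NS = "http://example.org/hypothetical/xhtml"

# <html:bogus> is not an XHTML element, but the document is still well-formed.
STORY = f"""<story xmlns:html="{HTML_NS}">
  <headline><html:p>here's some <html:bogus>XHTML</html:bogus> text</html:p></headline>
</story>"""


def is_well_formed(text):
    """Return True if the text parses; no DTD validation is performed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(STORY)
# ElementTree reports namespaced names in '{uri}local' form.
names = [el.tag for el in root.iter()]
```

Without a validating step, nothing distinguishes html:bogus from html:p
except the application's own knowledge of the element names.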

> Browsers, and other software specifically designed to deal with XHTML,
> could deal with this fairly easily by hard-coding the relationship between
> the definitions attending the three namespaces. However, generic software
> would find no machine-readable connection between the namespaces, and this
> would lead to awkwardness. For example, a search for documents with "A"
> tags that specified the Strict namespace would miss all documents
> containing Transitional "A" tags.
> 
In fact browsers would have to ignore the namespace prefixes in order to see
the XHTML elements, unless you want all browsers to use hard-coded URIs for
matching. In any case I thought XHTML was supposed to be XML-cleaned HTML:
no implicit element ends, quoted attribute values, etc. That being the case,
an XHTML document should be usable by an HTML4 browser and vice versa. But if
namespace usage is required, that won't be the case. If I pasted a section of
an HTML document into an XML document, I would then need to add namespaces to
it.
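
A rough sketch of the matching problem, in Python. The three namespace URIs
below are placeholders invented for this example (the thread does not give
the real ones), and the helper names are hypothetical. Generic code that
matches on one namespace finds only the Strict link; finding both requires a
hard-coded list of URIs.

```python
import xml.etree.ElementTree as ET

# Placeholder URIs: the real namespace names are not given in this thread.
STRICT = "http://example.org/hypothetical/xhtml-strict"
TRANSITIONAL = "http://example.org/hypothetical/xhtml-transitional"
FRAMESET = "http://example.org/hypothetical/xhtml-frameset"
XHTML_NAMESPACES = {STRICT, TRANSITIONAL, FRAMESET}

DOC = f"""<story xmlns:s="{STRICT}" xmlns:t="{TRANSITIONAL}">
  <pix><s:a href="one.html">one</s:a></pix>
  <pix><t:a href="two.html">two</t:a></pix>
</story>"""


def split_tag(tag):
    """Split ElementTree's '{uri}local' form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


def find_in_namespace(root, uri, local):
    """Generic matching: the exact (uri, local) pair must agree."""
    return [el for el in root.iter() if split_tag(el.tag) == (uri, local)]


def find_in_any_xhtml(root, local):
    """XHTML-aware matching: relies on a hard-coded set of URIs."""
    found = []
    for el in root.iter():
        uri, name = split_tag(el.tag)
        if name == local and uri in XHTML_NAMESPACES:
            found.append(el)
    return found


root = ET.fromstring(DOC)
strict_links = find_in_namespace(root, STRICT, "a")
all_links = find_in_any_xhtml(root, "a")
```

The second function works only because it knows all three URIs in advance,
which is the hard-coding described above.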

> Are three namespaces the right answer?  Here is a provisional phrasing of
> the problem we need to solve: How can we reliably distinguish elements
> requiring slightly different processing, while at the same time permitting
> them to be processed similarly to the degree that the differences do not
> matter?
> 
What code is distinguishing and what code is processing? An application is
going to require special code to process any XHTML content anyway.
Applications are going to have their own version of validation anyway, in
order to interpret the elements.

> The content model of b:X permits subelements not permitted in a:X.
> 
> So I ask myself "Is this problem unique to XML, or has it appeared in
> other contexts, and if so, how was it solved there?"
>
> What I notice is that a very similar issue appears in languages such as
> Java or C++, and is solved in the following manner:
> 
> package A;
>
> class X {
>   Object Y;
>   Object Z;
> }
>
> package B;
>
> class Y extends A.X {
>   Object W;
> }
> 
Except that you forgot that, for a number of elements, the difference is in
the allowed attributes, which may or may not be present. Also, what about the
reverse case, where class Z is class Y except that Object Z is not allowed
(i.e. where one version excludes elements allowed in another)? You cannot
guarantee that every difference runs in the same direction, so you cannot
rescue the above solution simply by reversing the order of derivation.

A:	allows a,b,c
B:	allows a,b,c,d
C:	allows a,c,d

The Extends method does not work in this case.
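
The A/B/C case can be checked mechanically. The following is a small sketch
in Python that models each content model as a set of allowed element names
and treats "extends" as "may only add". The function names are made up for
this illustration.

```python
# The three content models from the example above, as sets of allowed names.
A = frozenset("abc")
B = frozenset("abcd")
C = frozenset("acd")


def can_extend(base, derived):
    """Pure extension only adds: derived must allow everything base allows."""
    return base <= derived


def extension_chain(*models):
    """Return the models ordered so each extends the previous one, or None.

    Sorting by size is enough: two distinct models of equal size can never
    be subsets of one another, so a tie always fails the check.
    """
    ordered = sorted(models, key=len)
    nested = all(can_extend(x, y) for x, y in zip(ordered, ordered[1:]))
    return ordered if nested else None
```

B can extend A, and B can extend C, but neither of A and C can be derived
from the other, so no single chain of extension covers all three.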

Also, the Namespaces specification by no stretch of the imagination even
mentions the concept of derived namespaces. Again, this group is charging
into a domain that is not decided, does not even have a working draft, and is
not their responsibility.

> From this I conclude that if we had a way to declare the extended content
> model of B as an extension of that of A, then we would be able to express,
> in a machine-readable form, the relation between b:X and a:X.
> 
> Given that, it would be proper to have three namespaces, each designating
> a slightly different set of validation rules.
> 
> So our present difficulty appears to be a timing problem: the three
> namespaces distinguish the different validation rules of the three
> categories of elements, but there is at present no machine-readable way to
> express their relationship.  What we have now is readable by humans, but
> not by validation programs, and what we will have eventually that is
> machine readable is still under design by the Schemas working group.
> 
So use one namespace and wait for the Schemas group's decision. Assuming what
the result will be at such an early stage is not responsible. And any HTML4
documents out there would need to be edited to add namespaces before they
could be used in another XML dialect. So much for cut and paste of content.

> Three namespaces, and the consequent mapping and other conversion
> processing is certainly more expensive than if we had only one namespace,
> yet that expense must be compared against other alternatives that actually
> solve the problem we set out to address: How to reliably distinguish
> elements requiring slightly different processing, while at the same time
> permitting them to be processed similarly to the degree that the
> differences do not matter?
> 
As you have mentioned, many times the differences between the allowed
content are extra attributes or content. But such attributes or content are
not required to be present. Hence an element can be valid under more than
one DTD at once. So my 50 page document needs to have its outer element
prefix changed because I just added an element that makes it invalid under
the namespace selection I previously made? Any HTML4 documents I have can't
have sections inserted into an XHTML document without editing (manual or
programmed).
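
A small Python sketch of "valid under more than one DTD at once". The
attribute lists are deliberately cut down for illustration and are not the
full DTD definitions. The only fact relied on is that the align attribute on
P is allowed in HTML 4 Transitional but not in Strict. The function name is
hypothetical.

```python
# Simplified, illustrative attribute lists for <p>; not the full DTDs.
P_ATTRS = {
    "strict": {"id", "class", "style", "title"},
    "transitional": {"id", "class", "style", "title", "align"},
}


def conforming_variants(used_attrs):
    """Return every variant whose attribute list covers the attributes used."""
    used = set(used_attrs)
    return sorted(name for name, allowed in P_ATTRS.items() if used <= allowed)


# <p class="intro"> fits both variants at once.
both = conforming_variants({"class"})
# Adding align="center" leaves only one variant.
only_transitional = conforming_variants({"class", "align"})
```

One added attribute changes which label applies, even though the rest of the
element is untouched.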

> Certainly, interpreting a document incorrectly because we did not read the
> relevant definitions and mappings is not attractive.  Nor is it attractive
> to label different element types indistinguishably so that the relevant
> definitions cannot be determined.
> 
> Of the alternatives that I have seen, only the proposal for three distinct
> namespaces seems to have sufficient information in it.  Perhaps I have
> overlooked a proposal that also works, but at this point I conclude that
> the burden of proof should rest with those who assert that the three
> namespace approach is faulty, and any such proof should include a
> demonstration of a workable, better alternative approach that actually
> solves the same problem.
> 
No, the burden is that your group is an XHTML group, not an XML group and not
the Schemas group. It is not your job to solve this problem. Your job is to
make an XML-compatible HTML, not to define schemas. What if the means of
implementing schemas ends up being:
	<p xmlschema="strict"> instead of <strict:p>?

I could say that your solution doesn't solve the problem without significant
changes to the nature of validation and XML processing. You have focused in
on validating a fragment of XHTML in a document and forgotten about
validating the rest of the document. Why is the fragment so special?

Simple solution:
1. If the document is XHTML, use the appropriate DTD in the DOCTYPE
declaration.
2. If you want a document that can be validated and includes XHTML, modify
your DTD to include the XHTML element and attribute definitions of choice
from the modularization files. If there is a conflict between XHTML elements
and the rest of the document grammar, use a namespace prefix.
3. If the document is not validated, only well-formedness is guaranteed.
4a. An application can recognize XHTML elements because they aren't any of
the elements that were defined in its grammar (i.e. the application has code
that knows which elements it processes and which it doesn't anyway).
or
4b. Add a default attribute to the XHTML modules, htmltype, which defaults
to 'strict', 'transitional', or 'frameset' and have the application check
for it (a convention just like using parameter entities for re-use). You
could even allow "strict, transitional" as a value if it is the same in
both.
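
Option 4b can be sketched under plain XML 1.0 as follows, in Python. The
tiny DTD here is invented for the example and is not taken from the XHTML
modules. It also assumes the parser applies attribute defaults declared in
the internal subset, which Python's expat-based parser does as far as I
know; defaults declared only in an external DTD would need a parser that
actually reads that file.

```python
import xml.etree.ElementTree as ET

# Illustrative internal subset: htmltype defaults to "strict" on <p>.
DOC = """<?xml version="1.0"?>
<!DOCTYPE story [
  <!ELEMENT story (p)*>
  <!ELEMENT p (#PCDATA)>
  <!ATTLIST p htmltype CDATA "strict">
]>
<story>
  <p>uses the declared default</p>
  <p htmltype="transitional">overrides it</p>
</story>"""

root = ET.fromstring(DOC)
# The application checks the attribute by convention; no namespaces involved.
html_types = [p.get("htmltype") for p in root.iter("p")]
```

The application reads the rule set from the attribute value, so element
names and prefixes never have to change when the rule set does.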

This needs to work under XML 1.0, not XML modified for schemas. Use what 1.0
provides. The modules are being constructed under the assumption of XML 1.0,
using multiple files and parameter entities; do the same for this.

Marc B. McDonald
Principal Software Scientist

Design Intelligence, Inc.
1111 Third Avenue, Suite 1500
Seattle, WA  98101
marc.mcdonald at design-intelligence.com
Ph: 206.343-7797
Fax: 206.343.7750

http://www.design-intelligence.com

> xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
> Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on
> CD-ROM/ISBN 981-02-3594-1
> To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
> (un)subscribe xml-dev
> To subscribe to the digests, mailto:majordomo at ic.ac.uk the following
> message;
> subscribe xml-dev-digest
> List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
> 
