HTML2_X.DTD

Tue Jun 17 21:42:45 BST 1997

Richard Light <richard at light.demon.co.uk> wrote:

> I'm probably not the only person to have done this, but I had a go at
> XML-izing the HTML 2.0 DTD. [...] 
> 
> However, two issues that remain are the use of '&' in the content model
> for <HEAD>, and the liberal use of inclusion and exclusion exceptions.
> 
> Both are invalid in XML, and neither can be trivially re-mapped to an
> XML-compliant equivalent.  Is anyone else interested in this sort of
> issue?  Any thoughts on how these problems should be addressed?  

For the HEAD content model:

	(TITLE & ISINDEX? & BASE?) +(META|LINK)

you can get rid of the inclusion exceptions by changing this to:

	( (META|LINK)*,
		(   (TITLE, 	(META|LINK)*)
		  & (ISINDEX,	(META|LINK)*)?
		  & (BASE,	(META|LINK)*)?  ) )

then use the standard transformation on AND groups to get:

    <!ENTITY % head.misc "(META|LINK)*" >
    <!ENTITY % title 	"(TITLE, %head.misc;)">
    <!ENTITY % isindex	"(ISINDEX, %head.misc;)">
    <!ENTITY % base 	"(BASE, %head.misc;)">

    <!ELEMENT HEAD
	( %head.misc;,
	  (   (%title;,  (  (%isindex; , (%base;)?)
			  | (%base;    , (%isindex;)?))?)
	    | (%isindex;,(  (%title;   , (%base;)?)
			  | (%base;    , %title;)))
	    | (%base;,   (  (%title;   , (%isindex;)?)
			  | (%isindex; , %title;))) ) )   >
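
For reference, the transformation replaces an AND group with a
choice among its possible orderings, factored on the first member
so that the model stays deterministic.  A sketch of the two-member
cases (X1, X2, X3, A, and B are placeholder names, not part of HTML):

    <!ELEMENT X1  ( (A, B)  | (B, A)  )  >   <!-- SGML: (A & B)   -->
    <!ELEMENT X2  ( (A, B?) | (B, A)  )  >   <!-- SGML: (A & B?)  -->
    <!ELEMENT X3  ( (A, B?) | (B, A?) )? >   <!-- SGML: (A? & B?) -->

The HEAD declaration follows the same pattern with one required
and two optional members, each trailed by %head.misc;.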

(A question of my own: Why does SP complain about e.g., "%base;?"
but not "(%base;)?"  I can't find the reason for this in the Standard.)

Addition of NEXTID, SCRIPT, and STYLE is left as an exercise for
the reader (GAAAH!).

Or, more sensibly, you can follow Naggum's First Law of AND groups:
If the order doesn't matter, you might as well pick one and stick
with it:

    <!ELEMENT HEAD  (BASE?, TITLE, ISINDEX?, META*, LINK*) >

In this case the order does matter to some degree: there are
metadata schemes which require groups of METAs and LINKs to
appear in a certain order, and "META*, LINK*" forces every META
ahead of every LINK.  So this is probably better:

    <!ELEMENT HEAD  (BASE?, TITLE, ISINDEX?, (META|LINK)*) >

This is stricter than HTML 2, but most HTML will need to be
modified anyway to be XMLized.
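
As a concrete check, here is a small self-contained document that
should validate against the (BASE?, TITLE, ISINDEX?, (META|LINK)*)
model.  The declarations for the children of HEAD and the attribute
values are simplified stand-ins made up for illustration, not the
real HTML 2.0 definitions:

    <?xml version="1.0"?>
    <!DOCTYPE HEAD [
      <!ELEMENT HEAD     (BASE?, TITLE, ISINDEX?, (META|LINK)*) >
      <!ELEMENT TITLE    (#PCDATA) >
      <!ELEMENT BASE     EMPTY >
      <!ELEMENT ISINDEX  EMPTY >
      <!ELEMENT META     EMPTY >
      <!ELEMENT LINK     EMPTY >
      <!ATTLIST BASE  HREF    CDATA #REQUIRED >
      <!ATTLIST META  NAME    CDATA #IMPLIED
                      CONTENT CDATA #REQUIRED >
      <!ATTLIST LINK  REL     CDATA #IMPLIED
                      HREF    CDATA #REQUIRED >
    ]>
    <HEAD>
      <BASE HREF="http://www.example.org/"/>
      <TITLE>Example</TITLE>
      <META NAME="author" CONTENT="A. N. Other"/>
      <LINK REL="next" HREF="page2.html"/>
      <META NAME="keywords" CONTENT="sgml, xml"/>
    </HEAD>

Swapping BASE and TITLE, which the AND group in HTML 2.0 allows,
makes the document invalid under this model.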

Inclusion and exclusion exceptions have to be treated on a
case-by-case basis.  The exclusion exceptions in HTML 2.0 are
used primarily to limit recursion (e.g., to make sure that an
"A" element can't appear inside another "A"), and in some cases
to undo the effects of inclusion exceptions (e.g., on TITLE and
SELECT to undo the inclusions on HEAD and FORM, respectively).

For the FORM elements you should do what HTML 3.2 does: instead of
making (INPUT|SELECT|TEXTAREA) inclusions on the FORM element and
then excluding them from SELECT and TEXTAREA, just add them to the
'%text;' parameter entity so they can appear wherever text-level
markup is allowed.  (That they must appear inside a FORM element is
still enforced, but as an application convention rather than by
the DTD.)
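
A rough sketch of that change in XML syntax, modelled loosely on
the HTML 3.2 DTD.  The name %form.fields; is invented here, and
%phrase; and %font; are assumed to be defined elsewhere, as in
HTML 2.0:

    <!ENTITY % form.fields "INPUT | SELECT | TEXTAREA" >
    <!ENTITY % text
        "#PCDATA | A | IMG | BR | %phrase; | %font; | %form.fields;" >

    <!-- No inclusions on FORM, so nothing to exclude here: -->
    <!ELEMENT SELECT    (OPTION+) >
    <!ELEMENT TEXTAREA  (#PCDATA) >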

Once the inclusions are taken care of, all the exclusions can be
safely removed: dropping an exclusion only makes the DTD less
restrictive, so every document that was valid before is still valid.
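
For the "A" element, for example, the result might look like the
sketch below.  It assumes the %A.content;, %text;, and %heading;
entities of the HTML 2.0 DTD, with %text; moved to the front
because XML requires mixed content to begin with #PCDATA:

    <!ENTITY % A.content "(%text; | %heading;)*" >
    <!ELEMENT A  %A.content; >

With this declaration an A nested inside another A is valid as far
as the DTD is concerned; ruling it out becomes an application
convention, as with the form fields.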

--Joe English

  jenglish at crl.com

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/