Partial DTDs (was: ModSAX: Proposed Core Features)

John Cowan cowan at locke.ccil.org
Fri Mar 12 19:34:14 GMT 1999


Simon St.Laurent wrote:

> Still, I can imagine that it might well be useful to privilege the external
> subset's initial contents, without retrieving stacks of validation
> information stored in external parameter entities.

Actually it's quite tricky to do that correctly, although the XML
spec is silent on just how.

In particular, you must be very careful about processing parts of the
DTD that follow a reference to a PE that you do not expand.  For
one thing, later entity declarations may already have been
overridden by declarations inside the unread PE (the first
declaration of an entity is binding), so they must be ignored,
possibly leading to further troubles.
For another, conditional sections where the keyword is an
unknown PE reference must be treated as IGNORE.
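
To make the override problem concrete, here is a minimal sketch
of the "first declaration is binding" rule.  The entity names and
values (copy, title) and the function name bind_entities are made
up for illustration; only the rule itself comes from the XML spec.

```python
def bind_entities(declarations):
    """Build an entity table from (name, value) pairs in document order."""
    table = {}
    for name, value in declarations:
        # XML 1.0: if an entity is declared more than once, the first
        # declaration encountered is binding.
        table.setdefault(name, value)
    return table

# Hypothetical contents of an external PE that the parser did not read.
skipped_pe = [("copy", "\u00a9")]
# Hypothetical declarations that follow the reference to that PE.
after_reference = [("copy", "(c)"), ("title", "Partial DTDs")]

full_view = bind_entities(skipped_pe + after_reference)
partial_view = bind_entities(after_reference)
```

A parser that skipped the PE ends up with a different value for
copy than one that read it, and it has no way of knowing that.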

The simplest approach is probably the approach taken in the
internal subset:  Ignore everything after the first uninterpretable
PE reference.  Is this really useful?  XHTML, for example,
loads its external PEs (lists of HTML general entities for
characters) almost the first thing.
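
A rough sketch of that "stop at the first uninterpretable PE
reference" rule, under some loud assumptions: the caller supplies
`known`, a table of PE names whose replacement text is already
available; conditional sections are not handled; PE declarations
seen along the way are not added to the table; and the PEs are
assumed not to be recursive.  The names usable_declarations,
_TOKEN and _PE_REF are mine, not from any parser.

```python
import re

# A comment, a markup declaration (quoted literals may contain '>'),
# or a PE reference standing between declarations.
_TOKEN = re.compile(
    r'<!--.*?-->'
    r'|<!(?:[^>"\']|"[^"]*"|\'[^\']*\')*>'
    r'|%(?P<pe>[A-Za-z_:][\w.:-]*);',
    re.DOTALL,
)
_PE_REF = re.compile(r'%([A-Za-z_:][\w.:-]*);')

def usable_declarations(dtd_text, known):
    """Return (declarations, complete).

    `declarations` holds the markup declarations seen before the first
    PE reference that cannot be expanded; `complete` is False if the
    scan stopped early.
    """
    kept = []
    for match in _TOKEN.finditer(dtd_text):
        token = match.group(0)
        if token.startswith("<!--"):
            continue  # comments carry no declarations
        name = match.group("pe")
        if name is not None:
            if name not in known:
                return kept, False  # unreadable PE: ignore the rest
            inner, complete = usable_declarations(known[name], known)
            kept.extend(inner)
            if not complete:
                return kept, False
            continue
        # A declaration that itself refers to an unknown PE is unusable.
        if any(ref not in known for ref in _PE_REF.findall(token)):
            return kept, False
        kept.append(token)
    return kept, True

# Hypothetical DTD fragment: %remote; stands for an unread external PE.
dtd = '''<!ENTITY % local "<!ELEMENT a (#PCDATA)>">
%local;
<!-- a comment containing > and %zzz; -->
<!ATTLIST a href CDATA #IMPLIED>
%remote;
<!ELEMENT b EMPTY>'''

decls, complete = usable_declarations(
    dtd, {"local": "<!ELEMENT a (#PCDATA)>"})
```

Everything after %remote; is dropped, including the declaration
of b, which is exactly why a DTD that loads its external PEs near
the top gets little benefit from this.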

> An XLink application,
> for example, might not care about retrieving and analyzing lots of element
> declarations when all it really needs is the attribute declarations for
> defaulting.  A mechanism like this might be useful in such a context - put
> attribute declarations in the ext subset, element declarations in a file
> referenced by PEs, and go.  Validating parsers would get all of it, while
> non-validating parsers could pick out the parts they need.

Well, yes, but what hope is there that people will structure their
DTDs in this oddball way?  To do so messily separates element
declarations from their corresponding attribute declarations for
the sake of an implementation hack.  I sure wouldn't do it if I
had any hope of keeping the ELEMENT and ATTLIST declarations
in sync.

> You could do the
> same thing with the internal subset, but frankly I'd rather not use the
> internal subset for anything I can avoid - management of an external subset
> is _much_ easier.

The internal subset is good for things like document-specific
internal general entities.
 
> (I don't think the spec is clear on whether a non-validating parser that
> has read the external subset is then required to go get PE values; I
> suspect it doesn't have to.)

No, it doesn't.  A non-validating parser (NVP) is free to skip any
or all external entities, with the sole exception of the document
entity, which it must read.

In practice, however, I suspect that all NVPs fall into one of the
following four classes:

1) Read only the document entity.

2) Read the whole DTD but no external general entities.

3) Read all external general entities, but process only
the internal DTD subset.

4) Read all external entities (except unparsed entities).
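
One way to picture the four classes is as two independent
switches.  The sketch below maps them onto the SAX2 feature names
exposed by Python's xml.sax.handler module; those features
postdate this message, and the mapping is only my reading of the
list above.  NVP_CLASSES and sax_features are illustrative names.

```python
from xml.sax.handler import feature_external_ges, feature_external_pes

# Classes 1-4 from the list above, as a pair of booleans:
# (read external general entities?,
#  read the external DTD subset and external PEs?)
NVP_CLASSES = {
    1: (False, False),  # document entity only
    2: (False, True),   # whole DTD, no external general entities
    3: (True, False),   # external general entities, internal subset only
    4: (True, True),    # all external parsed entities
}

def sax_features(nvp_class):
    """Return the SAX2 feature settings matching one of the classes."""
    ges, pes = NVP_CLASSES[nvp_class]
    return {feature_external_ges: ges, feature_external_pes: pes}
```

Each pair in the result could be passed to a reader's
setFeature(name, value), though a given parser may refuse
combinations it does not support.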
 
-- 
John Cowan	http://www.ccil.org/~cowan		cowan at ccil.org
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)
