public identifiers [Re: Referencing multiple DTD in an XML document]

Wed Nov 3 18:31:14 GMT 1999

james anderson wrote:
> 
> greetings;
> 
> David Brownell wrote:
> >
> > ...
> >
> > It's a simple extension of what I showed above:
> >
> >         <!DOCTYPE foo [
> >         <!ENTITY % MathML SYSTEM "http://www.example.com/MathML.dtd">
> >         %MathML;
> >         <!ENTITY % SpeechML SYSTEM "http://www.example.com/SpeechML.dtd">
> >         %SpeechML;
> >         ]>
> >
> > I'd encourage you to use PUBLIC identifiers as well, so that
> > local cached copies of those declarations can easily be used.
> >
> 
> Could someone expound further on this statement?

I'll give my understanding, which AFAIK isn't radical.

> I recall only that the rec says something about permitting a processor to use
> the public identifier to generate an alternative URI to the system identifier
> to be used to retrieve an external resource.

Right.
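To make that concrete, here's a minimal Python sketch of the substitution, assuming a hypothetical in-memory catalog. The public identifier "-//Example//DTD MathML//EN" and the local file path are made up for illustration. The one rule taken from the XML 1.0 spec is that whitespace runs in a public identifier are normalized before any match is attempted.

```python
import re

# Hypothetical catalog mapping public identifiers to local copies.
# Both the public ID and the path are illustrative, not real registrations.
CATALOG = {
    "-//Example//DTD MathML//EN": "file:///usr/local/share/dtd/MathML.dtd",
}


def normalize_public_id(public_id):
    """Collapse XML whitespace runs to one space and trim the ends,
    as XML 1.0 requires before a public identifier is matched."""
    return re.sub(r"[ \t\r\n]+", " ", public_id).strip(" ")


def resolve(public_id, system_id, catalog=CATALOG):
    """Pick the URI to fetch for an external entity such as
    <!ENTITY % MathML PUBLIC "-//Example//DTD MathML//EN"
                             "http://www.example.com/MathML.dtd">
    Prefer a local copy named by the public identifier; otherwise
    fall back to the system identifier, which is always present."""
    if public_id is not None:
        local = catalog.get(normalize_public_id(public_id))
        if local is not None:
            return local
    return system_id
```

An entity declared with only a SYSTEM identifier, such as resolve(None, "http://www.example.com/SpeechML.dtd"), simply falls through to its system URI.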

> Is the "customary usage", in fact, that processors use the public identifier
> to name an entry in a local and/or process-specific cache?

Now you're getting into cache management policies.  Lots of
them are possible.  Pick the best one for your application
or your site -- key advice for any cache, not just XML.

> which they manage dynamically?

I've normally seen this done statically ... the way to do it dynamically
is to have the XML parser talk to a caching HTTP proxy server, so that
you're reusing existing infrastructure.
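A minimal sketch of the proxy approach in Python, assuming a caching proxy listening at the hypothetical address http://localhost:3128. The standard library's urllib.request.ProxyHandler routes http: fetches through it, so the proxy's own cache policy decides what gets reused.

```python
import urllib.request

# Hypothetical address of a site-local caching HTTP proxy.
CACHING_PROXY = "http://localhost:3128"


def make_dtd_opener(proxy=CACHING_PROXY):
    """Return an opener that sends http: requests through the proxy.

    Cache policy (what to keep, for how long) lives in the proxy,
    not in the XML tool. Only the "http" scheme is mapped here, so
    other schemes are not routed through the proxy by this handler.
    """
    handler = urllib.request.ProxyHandler({"http": proxy})
    return urllib.request.build_opener(handler)
```

Something like make_dtd_opener().open("http://www.example.com/MathML.dtd").read() would then fetch the DTD via the proxy, which of course needs both the network and a running proxy.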

The downside of relying on a dynamic cache is epitomized by me using my
laptop in a cafe, without a wireless modem: no network, so anything not
already cached can't be fetched.  That's why it's desirable to have some
sort of static catalog describing the local cache.  I do that when
validating XHTML, for example.

Hence the word "easy" in my quote above:  easy to set up a
manually administered cache, with a catalog.  More complex
schemes are also possible, just less easy (except for use
of a caching HTTP proxy server, which only works for one
class of system URIs).
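For the static case, here's a rough Python sketch that reads PUBLIC entries from a catalog file. It assumes the SGML Open (OASIS TR 9401) plain-text catalog syntax, handles only PUBLIC entries, and drops "--" comments; it is an illustration, not a full catalog implementation.

```python
import re

# Tokens in a catalog: double-quoted literal, single-quoted literal,
# or a bare word.
_TOKEN = re.compile(r'"([^"]*)"|\'([^\']*)\'|(\S+)')


def _normalize(public_id):
    # XML 1.0 whitespace normalization for public identifiers.
    return re.sub(r"[ \t\r\n]+", " ", public_id).strip(" ")


def parse_catalog(text):
    """Return {public_id: local_path} from a catalog's PUBLIC entries.

    Assumes SGML Open (TR 9401) style text; "--" comments are removed
    and every other entry type is skipped token by token.
    """
    text = re.sub(r"--.*?--", " ", text, flags=re.S)
    tokens = []  # (value, was_quoted)
    for m in _TOKEN.finditer(text):
        dq, sq, bare = m.groups()
        if bare is not None:
            tokens.append((bare, False))
        else:
            tokens.append((dq if dq is not None else sq, True))

    entries = {}
    i = 0
    while i < len(tokens):
        word, quoted = tokens[i]
        # Only an unquoted PUBLIC keyword starts an entry we handle.
        if not quoted and word.upper() == "PUBLIC" and i + 2 < len(tokens):
            entries[_normalize(tokens[i + 1][0])] = tokens[i + 2][0]
            i += 3
        else:
            i += 1
    return entries
```

A catalog line such as PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "xhtml1-strict.dtd" then maps that public ID to a file kept on the laptop, which still works with no network.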

> I wonder this also because I've yet to see a conclusive discussion of whether
> dtd's (or for that matter schema) in the namespace age are stable. That is,
> whether the same system uri will always identify an identical dtd/schema.

Depends on the URI and how it's managed.  As a rule of thumb, I
never assume that any identifier will always identify the same
thing ... serves me quite well.  Some folk would like to arrange
that URNs always identify an "identical" resource though.

> For example, would a processor be conforming if it were to, upon not finding a
> resource "under" a public identifier, retrieve and expand the resource located
> through the system identifier and cache the expanded (or even decoded result)
> "under" the public identifier?

I don't think you'll find a conformance requirement for cache
management policies.  Two things to watch out for though:

    -	XML doesn't define the namespace for PUBLIC identifiers.
	(I understand SGML did.)  So two different folk could easily
	assign different meanings to a public ID like "foo".  If you
	choose to dynamically manage a cache, you may want a heuristic
	to make it apply only to SGML FPIs or to URNs.

    -	Watch out for conditional sections, since you may need to
	deal with two different documents that INCLUDE or IGNORE
	different chunks of an external PE.  (If you just cache
	unparsed data bytes you should be safe.)
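Both cautions can be sketched in a few lines of Python, under stated assumptions: a loose regular expression stands in for a real FPI parser, and PublicIdCache is a hypothetical in-memory store. It refuses public IDs that look neither like SGML FPIs nor like URNs, and it keeps the entity's raw bytes, so each document still evaluates its own conditional sections.

```python
import re

# Loose shape of an SGML formal public identifier: an owner such as
# "-//Owner", "+//Owner", or "ISO 8879:1986", then "//" and more text.
# A heuristic only, not the full ISO 8879 FPI grammar.
_FPI = re.compile(r"^(?:[-+]//|ISO[^/]*//)[^/]+//")


def _normalize(public_id):
    return re.sub(r"[ \t\r\n]+", " ", public_id).strip(" ")


def is_cacheable(public_id):
    """Admit only FPI-shaped or URN public IDs to a dynamic cache;
    a bare ID like "foo" may mean different things to different folk."""
    pid = _normalize(public_id)
    return bool(_FPI.match(pid)) or pid.lower().startswith("urn:")


class PublicIdCache:
    """Hypothetical dynamic cache keyed by public identifier.

    Stores unparsed bytes, never an expanded form, so INCLUDE/IGNORE
    sections are decided by each document when the text is parsed.
    """

    def __init__(self):
        self._store = {}

    def put(self, public_id, data):
        if not is_cacheable(public_id):
            return False
        self._store[_normalize(public_id)] = bytes(data)
        return True

    def get(self, public_id):
        return self._store.get(_normalize(public_id))
```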

I was using the word "cache" in a broad sense.  I was also treating the
presence of a PUBLIC identifier as a flag that the content in question
is intended to fit into some such "cached" usage model.  (Else, why use
that syntax?)

- Dave

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)