public identifiers [Re: Referencing multiple DTD in an XML document]

David Brownell david-b at pacbell.net
Thu Nov 4 08:08:56 GMT 1999


"Liam R. E. Quin" wrote:
> 
> On Wed, 3 Nov 1999, james anderson wrote:
> > David Brownell wrote:
> [...]
> > > I'd encourage you to use PUBLIC identifiers as well, so that
> > > local cached copies of those declarations can easily be used.
> >
> > Could someone expound further on this statement?
> 
> I would *discourage* you from using PUBLIC identifiers.
> Any system that bases caches on PUBLIC rather than SYSTEM identifiers
> for XML is broken.

Unless done as I later elaborated ... tho long-term caching on SYSTEM
identifiers can be substantially worse.  With PUBLIC ones there's the
expectation (from the document provider) that there be such a caching
mechanism ... and, very handy, a backup access method (SYSTEM id) is
provided in case the client doesn't grok the appropriate caching.
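As a rough illustration of that lookup order (a sketch, not any particular product's behavior), a SAX entity resolver can consult a local catalog keyed on PUBLIC ids and fall back to the SYSTEM id when the PUBLIC id is absent or unknown. It uses Python's xml.sax.handler.EntityResolver, whose resolveEntity(publicId, systemId) may return a system id for the parser to open. The catalog entry, local path, and example.com URL are hypothetical.

```python
from xml.sax.handler import EntityResolver


class CatalogResolver(EntityResolver):
    """Resolve external entities through a local PUBLIC-id catalog.

    The catalog is a plain dict mapping PUBLIC ids to local file paths;
    the entries used below are hypothetical examples.
    """

    def __init__(self, catalog):
        self.catalog = dict(catalog)

    def resolveEntity(self, publicId, systemId):
        # Prefer a locally cached copy when the PUBLIC id is known.
        if publicId is not None and publicId in self.catalog:
            return self.catalog[publicId]
        # Otherwise use the SYSTEM id: the provider's backup access method.
        return systemId


resolver = CatalogResolver({
    "-//Example//DTD Memo 1.0//EN": "/usr/local/share/dtd/memo.dtd",  # hypothetical
})
print(resolver.resolveEntity("-//Example//DTD Memo 1.0//EN",
                             "http://www.example.com/dtd/memo.dtd"))
# prints /usr/local/share/dtd/memo.dtd
print(resolver.resolveEntity(None, "http://www.example.com/dtd/memo.dtd"))
# prints http://www.example.com/dtd/memo.dtd
```

Such a resolver is registered with parser.setEntityResolver(resolver) on a reader from xml.sax.make_parser(); whether external entities are loaded at all also depends on the parser's feature settings.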


> The PUBLIC keyword is there for backwards compatibility with SGML;
> its meaning in XML is not formally defined, and it's usually ignored.
> Therefore, if you want interoperability, simply forget it's there.

That's way too extreme.  One specific use for PUBLIC ids is called out
in the XML spec.  It's been used interoperably for almost 2 years now,
just counting the context of XML.  The suggestion was to use it.

There's the issue I noted (namespace of PUBLIC ids isn't specified),
but that can be dealt with.  Normally people use IDs that are also
legal SGML Formal Public Identifiers (FPIs), and manage them as FPIs.  Several other
schemes can work too -- it's all in how you use the identifiers.
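For readers unfamiliar with FPIs: an SGML formal public identifier is conventionally written as an owner identifier, a public text class and description, and a language code, separated by "//", as in "-//W3C//DTD HTML 4.0//EN" (a leading "-//" marks an unregistered owner, "+//" a registered one). The sketch below splits such a string into those parts. It is a simplification of the ISO 8879 rules (it ignores, for instance, the unavailable-text indicator), and the FPI type and parse_fpi name are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FPI:
    registered: Optional[bool]  # True for "+//", False for "-//", None for ISO owners
    owner: str                  # e.g. "W3C" or "ISO 8879:1986"
    text_class: str             # e.g. "DTD", "ENTITIES", "ELEMENTS"
    description: str            # e.g. "HTML 4.0"
    language: str               # e.g. "EN"
    version: Optional[str] = None  # optional display version


def parse_fpi(public_id: str) -> FPI:
    """Split a formal public identifier into its main parts (simplified)."""
    parts = public_id.split("//")
    registered: Optional[bool] = None
    if parts[0] in ("+", "-"):
        registered = parts[0] == "+"
        parts = parts[1:]
    if len(parts) not in (3, 4):
        raise ValueError(f"not a recognizable FPI: {public_id!r}")
    owner, text, language = parts[0], parts[1], parts[2]
    # The text class is the first word; the rest is the description.
    text_class, _, description = text.partition(" ")
    version = parts[3] if len(parts) == 4 else None
    return FPI(registered, owner, text_class, description, language, version)


print(parse_fpi("-//W3C//DTD HTML 4.0//EN"))
print(parse_fpi("ISO 8879:1986//ENTITIES Added Latin 1//EN"))
```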


It's not that PUBLIC ids are perfect ... but there is one substantial
problem that they can manage nicely, even with their current warts.
And that's providing a cheap (!) way to avoid network access for
external entities, notably for DTD components.  It's not just folk
with 28.8 kbps modems that have such issues; those T1 lines also have
better uses for their limited capacity, and sometimes there's no net
connection at all.


> [...]
> 
> > I've yet to see a conclusive discussion of whether
> > DTDs (or, for that matter, schemas) in the namespace age are stable. That is,
> > whether the same system URI will always identify an identical DTD/schema.
>
> They are not and it won't.

Doesn't that conflict with your advice to cache based on SYSTEM ids?

Clearly _some_ SYSTEM ids are stable enough to cache.  The trick is how
to know which those are ... hmm, that's part of why PUBLIC ids exist,
they can be "known" to be stable enough (if you know what the ID means;
I'd not advise automated guessing).
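One way to act on "known to be stable" without automated guessing is to let a person curate which PUBLIC ids may be cached, and treat everything else as uncacheable. The sketch below is a read-through cache under that assumption: on a miss for a vetted PUBLIC id it fetches once via the SYSTEM id and stores the result under the PUBLIC id, while entities with only a SYSTEM id (or an unvetted PUBLIC id) are fetched every time. TrustedEntityCache and the fetch callback are hypothetical names; fetch stands in for whatever network access the application uses.

```python
from typing import Callable, Dict, Iterable, Optional


class TrustedEntityCache:
    """Cache entity text under PUBLIC ids that a person has vetted as stable."""

    def __init__(self, trusted_public_ids: Iterable[str],
                 fetch: Callable[[str], bytes]):
        self.trusted = set(trusted_public_ids)
        self.fetch = fetch  # hypothetical: retrieves the resource at a SYSTEM id
        self.store: Dict[str, bytes] = {}

    def get(self, public_id: Optional[str], system_id: str) -> bytes:
        if public_id in self.trusted:
            if public_id not in self.store:
                # First use: fall back to the SYSTEM id, then keep the copy.
                self.store[public_id] = self.fetch(system_id)
            return self.store[public_id]
        # No vetted PUBLIC id: don't assume the SYSTEM id's content is stable.
        return self.fetch(system_id)


calls = []


def fake_fetch(system_id: str) -> bytes:
    # Stand-in for network access; records each request.
    calls.append(system_id)
    return b"<!ELEMENT memo (#PCDATA)>"


cache = TrustedEntityCache({"-//Example//DTD Memo 1.0//EN"}, fake_fetch)
cache.get("-//Example//DTD Memo 1.0//EN", "http://www.example.com/memo.dtd")
cache.get("-//Example//DTD Memo 1.0//EN", "http://www.example.com/memo.dtd")
cache.get(None, "http://www.example.com/memo.dtd")
print(len(calls))  # 2: one fetch for the vetted id, one for the SYSTEM-only lookup
```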


> <!NOTATION sh SYSTEM "/bin/sh -c">
> <!ENTITY systemdate SYSTEM "/bin/date" NDATA sh>
>
> now systemdate returns a different value every time you use it (maybe
> even within the same document, depending on the processor).
>
> This use of NOTATION is implied by the XML spec 

I never found such an implication, and I've been over the XML spec more
than once with a fine-tooth comb.  You may be thinking about PIs, though
they don't involve SYSTEM ids. ("<?sh /bin/date?>" that is.)

For that matter, that's an unparsed entity, so it will never be
included as (parsed) text in any XML document ... "systemdate"
can only be used as a value of an "ENTITY" or "ENTITIES" attribute.  (Plus it
may well be that "/bin/date" isn't a valid relative URI, as it
must be to be used in a SYSTEM identifier.)
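The point about unparsed entities can be checked with a non-validating parser. In this sketch (Python's xml.sax with its expat-based reader, essentially the quoted declarations, and a hypothetical "report" element), the parser merely reports the notation and unparsed-entity declarations through DTDHandler. The document can refer to "systemdate" only by name, through an attribute declared with type ENTITY, and nothing is fetched or executed unless the application chooses to act on it.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, DTDHandler

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE report [
  <!NOTATION sh SYSTEM "/bin/sh -c">
  <!ENTITY systemdate SYSTEM "/bin/date" NDATA sh>
  <!ELEMENT report EMPTY>
  <!ATTLIST report stamp ENTITY #IMPLIED>
]>
<report stamp="systemdate"/>
"""


class DeclCollector(ContentHandler, DTDHandler):
    """Record what the parser reports; never act on it."""

    def __init__(self):
        super().__init__()
        self.notations = {}  # notation name -> system id
        self.unparsed = {}   # entity name -> (system id, notation name)
        self.stamp = None    # value of the ENTITY-typed attribute

    def notationDecl(self, name, publicId, systemId):
        self.notations[name] = systemId

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        self.unparsed[name] = (systemId, ndata)

    def startElement(self, name, attrs):
        if name == "report":
            self.stamp = attrs.get("stamp")


handler = DeclCollector()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)
parser.setDTDHandler(handler)
parser.parse(io.BytesIO(DOC))

# The attribute value is just a name; looking it up yields the declaration.
print(handler.stamp, handler.unparsed[handler.stamp], handler.notations)
```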


>	and by the SGML handbook,
> but since a document you receive could just as easily use "format" or "rm",
> it's not a good idea, by the way.

It might be implied by the SGML handbook, but I'd surely hope
that in this day and age, nobody would contemplate designing a
new system (e.g. one using XML) with security holes that big.


> > For example, would a processor be conforming if it were to, upon not finding a
> > resource "under" a public identifier, retrieve and expand the resource located
> > through the system identifier and cache the expanded (or even decoded) result
> > "under" the public identifier?
>
> You can do whatever you like with a public identifier -- it's undefined.

... undefined except for one specific usage, which is explicitly called
out (not just "implied") in the XML specification.


> No amount of wishful thinking will make it otherwise, for those who
> want to use it :-) sorry.

Umm ... your facts are generally a _lot_ better than that!

- Dave

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1