URIs in XML canonical forms

David Brownell david-b at pacbell.net
Tue Aug 3 17:24:32 BST 1999

Richard Tobin wrote:
> Sun's definition of "second XML canonical form" in the Oasis
> conformance tests refers to the "shortest such relative URI".  Is that
> shortest in terms of characters?  Or bytes?  Is it measured before or
> after escaping non-ascii characters?

If it's done correctly and consistently, I don't see how the answers
to those three questions could change anything.  The answer about ".."
would.

>	  Is it meant to include the use
> of ".."?  If so, then determining the shortest URI is somewhat
> non-obvious.

It's good, in fact, that current test cases don't need to rely on
any of the trickier bits of semantics for URI handling.  The wording
was imperfect, in any case -- it should only apply to the case of
URIs which originally appeared as relative URIs.  That still leaves
the issue you noted (which the writeup hints at with "if possible"),
plus another one:  how to detect such cases with standard APIs
like SAX (which doesn't expose the relative URIs).

For context:  the reason this is there is that otherwise you can't
turn all NOTATION (and later, unparsed ENTITY) declarations into a
single "canonical" form which can be compared, to verify conformance
of an XML processor with its specification.  The same document and
external entities, when parsed from different locations, mustn't
produce different "canonical" outputs, else it's not "canonical"!
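
To make that concrete, here is a rough Python sketch (not the rule from
the conformance test writeup, just one reading of it) of relativizing a
system identifier against the URI of the declaring entity.  The function
name, the "allow_dotdot" switch and the example URIs are all made up for
illustration.  It ignores query strings and fragments, and it returns
the absolute URI unchanged whenever no relative form is possible.

```python
import posixpath
from urllib.parse import urlsplit


def relativize(base_uri, target_uri, allow_dotdot=False):
    """Hypothetical helper: express target_uri relative to base_uri.

    Returns target_uri unchanged when no relative form exists under
    the chosen rule.  Query and fragment parts are ignored (sketch only).
    """
    base = urlsplit(base_uri)
    target = urlsplit(target_uri)

    # A different scheme or authority means there is no relative form.
    if (base.scheme, base.netloc) != (target.scheme, target.netloc):
        return target_uri

    # Directory holding the entity that contains the declaration.
    base_dir = posixpath.dirname(base.path)

    if allow_dotdot:
        # Permitting ".." means any same-authority path can be reached.
        return posixpath.relpath(target.path, start=base_dir or "/")

    # Without "..", only targets at or below the base directory qualify.
    prefix = base_dir.rstrip("/") + "/"
    if not target.path.startswith(prefix):
        return target_uri
    return target.path[len(prefix):]


# The same document tree parsed from two different locations
# yields the same relative identifier.
print(relativize("file:///home/a/doc.xml",
                 "file:///home/a/img/logo.gif"))
print(relativize("http://example.org/tests/doc.xml",
                 "http://example.org/tests/img/logo.gif"))
```

Both calls print "img/logo.gif", which is the property the canonical
form needs.  With a target outside the base directory, the result
depends entirely on whether ".." is permitted: the sketch gives
"../img/logo.gif" with allow_dotdot=True and the untouched absolute URI
otherwise.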

I'd love to see that issue treated better, but the core issue may
be that documents using relative URIs with NOTATION and unparsed ENTITY
declarations can't always be canonicalized in a way that supports
XML processor conformance testing for those features.  Perhaps such
inputs should instead be rejected; in effect, that's been done by
the selection of the input documents for such testing.

(For a bit of history -- this work predates the W3C group now talking
about canonicalization, which should have seen this writeup as it was
getting formed.)

- Dave

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
