First draft of proposed XML TC for Unicode 3.0 (unofficial)

Fri Sep 10 22:33:58 BST 1999

Nik O scripsit:

> It is a given that changes from Unicode 2.0 to 3.0 will require changes to
> XML 1.0, and thus all existing XML-compliant parsers will cease to be
> compliant when the changes are made.  These Unicode changes aren't
> "corrections of printers errors" -- they are real changes in the XML spec,
> and will require changes to XML parsers and apps, as well.

I think only to the parsers, and only to their lowest levels at that;
apps are probably not sensitive to what characters appear in names
for the most part.

> I guess my previous message was sufficiently obtuse, since my real intention
> was to raise the issue of how these sorts of changes are to be managed.

Obscure (hard to understand) maybe, but not obtuse (stupid).

> I should have said that "..the new BaseChars changes _will_ break _all_
> existing XML parsers and/or apps..".  Once this change is made to XML,
> existing parsers won't be compliant since they've implemented BNF rule 85
> from REC-xml-19980210, and thus won't recognize these new BaseChars (e.g.
> #x01F6) as legal name characters.

Correct.  And in fact there may wind up being XML 1.0 vs. XML 1.0 with
Unicode 3.0 support (a la SGML TCs).  Not, I trust, XML 1.1.
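
To make the point about rule 85 concrete, here is a minimal sketch of how
a parser's lowest level might hard-code the table.  The ranges are only
an excerpt of the Latin part of BaseChar as I recall it from
REC-xml-19980210 (check them against the Recommendation before relying on
them), and the names BASECHAR_EXCERPT and in_basechar_excerpt are
hypothetical, not taken from any real parser.

```python
# Excerpt only: the leading Latin ranges of BaseChar (production [85]) as
# recalled from REC-xml-19980210.  The full production is much longer.
BASECHAR_EXCERPT = [
    (0x0041, 0x005A), (0x0061, 0x007A), (0x00C0, 0x00D6), (0x00D8, 0x00F6),
    (0x00F8, 0x00FF), (0x0100, 0x0131), (0x0134, 0x013E), (0x0141, 0x0148),
    (0x014A, 0x017E), (0x0180, 0x01C3), (0x01CD, 0x01F0), (0x01F4, 0x01F5),
    (0x01FA, 0x0217),
]


def in_basechar_excerpt(code_point: int) -> bool:
    """Return True if code_point falls inside one of the hard-coded ranges."""
    return any(lo <= code_point <= hi for lo, hi in BASECHAR_EXCERPT)


if __name__ == "__main__":
    # U+01F5 is inside the frozen table; U+01F6 (new in Unicode 3.0) is not.
    for cp in (0x01F5, 0x01F6):
        print(f"U+{cp:04X}", in_basechar_excerpt(cp))
```

A parser built this way keeps rejecting #x01F6 in names until the table
itself is edited, which is why only the lowest level needs to change.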

> Very true, but isn't this [backward-compatibility rules] the top of
> a slippery slope, whereby every change
> to Unicode might require yet another special rule to maintain backward
> compatibility?

In principle yes, in fact no.  The Unicode Consortium and WG2 are focused
on adding support for new characters in Unicode 4.0 and beyond, not on
making changes to old stuff, though no one can rule out the possibility
of an error that has simply gone undetected hitherto, like the classification
of U+212E ESTIMATED SYMBOL as a lower-case letter.  (The glyph looks
a lot like "e", but isn't subject to font variants, which accounts for
the persistent error.)

In particular, for reasons having to do with canonical forms of Unicode,
decompositions are most unlikely to change henceforth.  So although we
can expect some new XML name and name-start characters in Unicode 4.0,
new backwards-compatibility hacks are rather unlikely.

> It would be possible to define XML characters as being based directly upon
> the current Unicode data tables, i.e., replace the whole BNF rule 85 table
> with a rule that directly referenced Unicode: "BaseChar ::= [..what Unicode
> says..]".  I realise that this example isn't real BNF, but it is just as
> valid a method of specifying characters.  We could perhaps refer to
> Unicode's BNF rules for the purpose of the XML grammar, but use Unicode
> tables for actual XML implementations.

Tim Bray laid out the perceived risks here pretty nicely.
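
For illustration, the quoted proposal ("BaseChar ::= [..what Unicode
says..]") might look roughly like the sketch below.  It is simplified:
it applies only two of the derivation rules from Appendix B of XML 1.0
(the letter categories for name-start characters, and the exclusion of
characters with a font or compatibility decomposition) and ignores the
other exceptions listed there.  The function name is hypothetical.

```python
import unicodedata

# Categories that Appendix B of XML 1.0 allows for name-start characters.
NAME_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lo", "Nl"}


def is_name_start_by_category(ch: str) -> bool:
    """Simplified check driven by the runtime's Unicode tables, not rule 85."""
    if unicodedata.category(ch) not in NAME_START_CATEGORIES:
        return False
    # A decomposition field beginning with "<" marks a font or
    # compatibility mapping; such characters are excluded.
    return not unicodedata.decomposition(ch).startswith("<")


if __name__ == "__main__":
    # The answer depends on which Unicode version this runtime ships.
    print("Unicode data version:", unicodedata.unidata_version)
    print("U+01F6 name-start?", is_name_start_by_category("\u01f6"))
```

Note that the result is tied to whatever Unicode data the runtime
happens to carry, so two conforming installations could disagree about
the same document.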

> If i'm using XML in a real-world environment, it doesn't matter if XML 1.0
> has been changed to allow some new character if i haven't upgraded my
> Unicode support, and vice versa.

Well, not if you are trying to *render* the new characters.  But
an application that translates XML to some legacy format could cope
with new Unicode characters without change.
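
As one possible reading of "translates XML to some legacy format", the
sketch below assumes the target is plain ASCII with numeric character
references.  The function name is hypothetical; the point is only that
the translation never consults character properties, so a character
added in a later Unicode version passes through unchanged in meaning.

```python
def to_ascii_with_char_refs(text: str) -> bytes:
    """Encode text as ASCII, writing every other character as &#N;."""
    # "xmlcharrefreplace" is a standard Python codec error handler.
    return text.encode("ascii", errors="xmlcharrefreplace")


if __name__ == "__main__":
    # U+01F6 needs no special knowledge: it simply becomes &#502;.
    print(to_ascii_with_char_refs("caf\u00e9 \u01f6"))
```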

> This would tighten the bond between XML
> and Unicode, since the latter organization couldn't make their changes
> oblivious to their impact upon XML (no insult intended to Unicode, Inc.).
> Since XML is based upon Unicode, XML developers are also, by definition,
> Unicode developers -- these two communities are already interdependent.

Yes, but overly detailed coupling between independently developed
standards makes for tricky management issues.  Because the Unicode
Standard is already developed jointly by the Unicode Consortium and
ISO JTC1/SC2/WG2, coordinating with a third body is probably too tricky.

-- 
John Cowan                                   cowan at ccil.org
       I am a member of a civilization. --David Brin

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1