datatypes and i18n

Reynolds, Gregg greynolds at
Tue Nov 16 17:13:19 GMT 1999

This message is being sent (separately) to the XML-DEV list and to the W3C
XML Schema and I18N Interest Groups.

I'd like to raise a few issues for the XML Schema WG to think about.  I know
it would make your difficult task easier if you could see detailed
proposals, but I'm afraid it'll be some weeks before I can put one together.
I also plead guilty to not having followed the mailing list closely, read
the issues lists, or even having read the structures draft very closely.
Can't be helped; my circuits are overloaded.  

Nevertheless I think it fair to urge you to reconsider the "lexical space"
aspect of datatypes.  For starters, the notion that a lexical "space" is an
aspect of a datatype is highly suspect, IMO.  I think it more accurate to
construe the set of strings as just another type; you get more power and
flexibility that way.  Interpreting a string in various semantic fields -
number, date, name, etc. - then just becomes an issue of providing a mapping
from one type to another.  
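
A minimal Python sketch of this idea, for illustration only: strings are
one type, values are another, and a "datatype" is reached through an ordinary
mapping between them.  The names table_mapping, en_boolean, and fr_boolean
are hypothetical and come from no draft or library.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def table_mapping(table: Dict[str, T]) -> Callable[[str], T]:
    """Build a lexical mapping (string -> value) from an explicit table."""
    def interpret(lexeme: str) -> T:
        if lexeme not in table:
            raise ValueError(f"no value for lexeme {lexeme!r}")
        return table[lexeme]
    return interpret


# Two hypothetical lexical conventions that map into the same value type.
en_boolean = table_mapping({"true": True, "false": False})
fr_boolean = table_mapping({"vrai": True, "faux": False})

print(en_boolean("true"), fr_boolean("faux"))
```

The value type (here Python's bool) knows nothing about either set of
lexemes; adding a third convention means adding a mapping, not a new type.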

A key point is that the values of the various semantic fields have *no*
natural lexical representation whatsoever, so the mapping is from lexemes to
pure values.  Of course one must choose a canonical metalanguage to
represent the values, but only so that users of the language may employ
whatever lexical representations they choose by mapping to the canonical
representation.

This is one way to address the issue of internationalization that has been
raised recently regarding the boolean type.  But this problem of
Eurocentrism is everywhere in the current draft.  Another simple example:
integers.  Pick ASCII as the canonical representation, then users pick
whatever characters they like to denote the integers, mapping to ASCII 0-9.
So a document may use numeric chars from any Unicode block as the lexical
representation of integers.  Even if multiple such lexical types were used
in one document, the semantics would be the same: integer.
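
The integer case can be sketched in a few lines of Python.  This is an
illustration under stated assumptions, not a proposal for the schema
language: it handles only unsigned, positional decimal digits, and the
function names to_canonical_integer and parse_integer are hypothetical.  It
relies on the standard library's unicodedata.decimal, which returns the
decimal value of a character that has one.

```python
import unicodedata


def to_canonical_integer(lexeme: str) -> str:
    """Map a string of Unicode decimal digits to canonical ASCII 0-9."""
    if not lexeme:
        raise ValueError("empty lexeme")
    digits = []
    for ch in lexeme:
        # Returns None (the default) if ch is not a decimal digit.
        value = unicodedata.decimal(ch, None)
        if value is None:
            raise ValueError(f"{ch!r} is not a decimal digit")
        digits.append(str(value))
    return "".join(digits)


def parse_integer(lexeme: str) -> int:
    """Interpret the lexeme as an integer value via the canonical form."""
    return int(to_canonical_integer(lexeme))


ascii_form = "123"
arabic_indic = "\u0661\u0662\u0663"   # ARABIC-INDIC DIGITS ONE, TWO, THREE
devanagari = "\u0967\u0968\u0969"     # DEVANAGARI DIGITS ONE, TWO, THREE

# Three lexical representations, one integer value.
print(parse_integer(ascii_form), parse_integer(arabic_indic),
      parse_integer(devanagari))
```

Non-positional numbering systems would need a richer mapping than this
digit-by-digit substitution.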

A more sophisticated approach would provide for a "lexical syntax"
("lexitactics"?) restricting lexical forms.  E.g. 1.23E-4.
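
As a rough sketch of what such a lexical syntax might look like, the Python
below uses a regular expression to restrict which strings are admitted
before any value is computed.  The pattern and the names SCIENTIFIC and
parse_scientific are assumptions made for illustration; for simplicity the
check is applied to the canonical ASCII form only.

```python
import re

# Hypothetical lexical syntax: optional sign, digits, optional fraction,
# optional exponent.  re.ASCII limits \d to the characters 0-9.
SCIENTIFIC = re.compile(r"[+-]?\d+(\.\d+)?([eE][+-]?\d+)?", re.ASCII)


def parse_scientific(lexeme: str) -> float:
    """Admit the lexeme only if it fits the lexical syntax, then map it."""
    if SCIENTIFIC.fullmatch(lexeme) is None:
        raise ValueError(f"{lexeme!r} does not fit the lexical syntax")
    return float(lexeme)


print(parse_scientific("1.23E-4"))
```

The syntax check and the mapping to a value are separate steps, so either
can be swapped out without touching the other.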

I tried to suggest this some time ago in

but except for some generous help on how various numbering systems work, I
never heard a word from anybody on the WG.  It would be nice to at least
receive some feedback, even "Sorry, no time for that now" or "What a stupid
idea!"  (I've made a few minor changes to the lextype doc since September
when I first sent it; obviously it needs work but you should be able to get
the main idea.)

I've got lots of other observations but I'm afraid I'm out of time for now.
However, you can get an idea of how I think these issues should be
approached from the following material on the Z specification language:

	The Z Reference Manual (ZRM) is at

	The Final Committee Draft of the forthcoming ISO Z standard is at

I urge you to take a hard look at these documents; you'll find that much of
the work you're doing in coming up with terminology and conceptual
frameworks has already been done.  If I had more time, I'd write a nice
proposal that Z be adopted as the Official Metalanguage of the W3C.  In fact
it is possible to write the definition of XML in Z in such a way that the
full power of Z is available for defining semantically typed XML.  But with
the holidays approaching I'm sure to have even less time; maybe somebody
else would like to give it a try or collaborate?


Gregg Reynolds

