datatypes and i18n
Reynolds, Gregg
greynolds at datalogics.com
Tue Nov 16 17:13:19 GMT 1999
This message is being sent (separately) to the XML-DEV list and to the W3C
XML Schema and I18N Interest Groups.
I'd like to raise a few issues for the XML Schema WG to think about. I know
it would make your difficult task easier if you could see detailed
proposals, but I'm afraid it'll be some weeks before I can put one together.
I also plead guilty to not having followed the mailing list closely, read
the issues lists, or even having read the structures draft very closely.
Can't be helped; my circuits are overloaded.
Nevertheless I think it fair to urge you to reconsider the "lexical space"
aspect of datatypes. For starters, the notion that a lexical "space" is an
aspect of a datatype is highly suspect, IMO. I think it more accurate to
construe the set of strings as just another type; you get more power and
flexibility that way. Interpreting a string in various semantic fields -
number, date, name, etc. - then just becomes an issue of providing a mapping
from one type to another.
A key point is that the values of the various semantic fields have *no*
natural lexical representation whatsoever, so the mapping is from lexemes to
pure values. Of course one must choose a canonical metalanguage to
represent the values, but only so that users of the language may employ
whatever lexical representations they choose by mapping to the canonical
representation.
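
As a rough illustration of the mapping idea (a minimal sketch, not a
proposal for the Schema draft): the names ENGLISH_BOOLEAN, FRENCH_BOOLEAN
and interpret are hypothetical, and Python's True/False stand in for the
canonical metalanguage.

```python
# Hypothetical lexical types: each is a mapping from user-chosen lexemes
# to canonical values. Python's True/False play the canonical role here.
ENGLISH_BOOLEAN = {"true": True, "false": False}
FRENCH_BOOLEAN = {"vrai": True, "faux": False}


def interpret(lexeme, lexical_type):
    """Map a lexeme to its canonical value; reject lexemes outside the type."""
    try:
        return lexical_type[lexeme]
    except KeyError:
        raise ValueError(f"{lexeme!r} is not a lexeme of this lexical type") from None


# Different lexical types, same semantic type (boolean).
print(interpret("vrai", FRENCH_BOOLEAN))
print(interpret("false", ENGLISH_BOOLEAN))
```

The value space is untouched by the choice of lexemes; only the mapping
changes from one lexical type to the next.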
This is one way to address the issue of internationalization that has been
raised recently regarding the boolean type. But this problem of
Eurocentrism is everywhere in the current draft. Another simple example:
integers. Pick ASCII as the canonical representation, then users pick
whatever characters they like to denote the integers, mapping to ASCII 0-9.
So a document may use numeric chars from any Unicode block as the lexical
representation of integers. Even if multiple such lexical types were used
in one document, the semantics would be the same: integer.
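
The integer case might be sketched like this, assuming plain positional
decimal notation with no sign or grouping characters. The function name
to_canonical_integer is hypothetical; unicodedata.decimal is the Python
standard library call that returns the decimal digit value of a character.

```python
import unicodedata


def to_canonical_integer(lexeme: str) -> int:
    """Interpret a string of Unicode decimal digits as an integer.

    Each character is mapped to its ASCII digit (the canonical
    representation), then the ASCII string is read as an integer.
    """
    if not lexeme:
        raise ValueError("empty lexeme")
    # unicodedata.decimal raises ValueError for any character that is
    # not a decimal digit, so non-numeric lexemes are rejected.
    ascii_digits = "".join(str(unicodedata.decimal(ch)) for ch in lexeme)
    return int(ascii_digits)


# Arabic-Indic, Devanagari and ASCII digits all denote the same integer.
print(to_canonical_integer("\u0661\u0662\u0663"))
print(to_canonical_integer("\u0967\u0968\u0969"))
print(to_canonical_integer("123"))
```

Non-positional systems (Roman numerals, for instance) would need a
different mapping; this sketch covers only digit-for-digit substitution.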
A more sophisticated approach would provide for a "lexical syntax"
("lexitactics"?) restricting lexical forms. E.g. 1.23E-4.
I tried to suggest this some time ago in
http://www.datalogics.com/xml/xsl/wg/lextype.htm
but except for some generous help on how various numbering systems work, I
never heard a word from anybody on the WG. It would be nice to at least
receive some feedback, even "Sorry, no time for that now" or "What a stupid
idea!" (I've made a few minor changes to the lextype doc since September
when I first sent it; obviously it needs work but you should be able to get
the main idea.)
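
To make the "lexical syntax" notion concrete, here is a minimal sketch in
which a regular expression restricts the admissible lexical forms to ones
like 1.23E-4. The pattern and the name parse_scientific are assumptions
made for illustration; they are not taken from the lextype document.

```python
import re

# Hypothetical lexical syntax: optional sign, ASCII digits, optional
# fraction, optional exponent. Forms such as "1.23E-4" are admitted.
SCIENTIFIC = re.compile(r"[+-]?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?")


def parse_scientific(lexeme: str) -> float:
    """Check the lexeme against the lexical syntax, then map it to a value."""
    if SCIENTIFIC.fullmatch(lexeme) is None:
        raise ValueError(f"{lexeme!r} does not match the lexical syntax")
    return float(lexeme)


print(parse_scientific("1.23E-4"))
```

The syntax check and the mapping to a value are separate steps, so a
different lexical syntax (a comma as decimal separator, say) could be
swapped in without touching the value space.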
I've got lots of other observations but I'm afraid I'm out of time for now.
However, you can get an idea of how I think these issues should be
approached from the following material on the Z specification language:
The Z Reference Manual (ZRM) is at
http://spivey.oriel.ox.ac.uk/~mike/zrm/
The Final Committee Draft of the forthcoming ISO Z standard is at
http://www-users.cs.york.ac.uk/~ian/zstan/CD.html
I urge you to take a hard look at these documents; you'll find that much of
the work you're doing in coming up with terminology and conceptual
frameworks has already been done. If I had more time, I'd write a nice
proposal that Z be adopted as the Official Metalanguage of the W3C. In fact
it is possible to write the definition of XML in Z in such a way that the
full power of Z is available for defining semantically typed XML. But with
the holidays approaching I'm sure to have even less time; maybe somebody
else would like to give it a try or collaborate?
Sincerely,
Gregg Reynolds
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)