Why XML data typing is hard (was Re: Internal subset equivalent in new schema proposals?)

Fri Nov 27 14:40:30 GMT 1998

David,

There may be times when there is good justification for strong
typing. So the programmer puts the validation logic into the
application.

Now, if the requirements change, due to an integration project
or a change in locale, then the situation is a whole lot better
if the validation logic is meta-data driven.

I will not argue that a DTD is a good place to put this. But I will
argue that the programmer needs to be tempted with something
better than interlacing validation and application logic.
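A minimal sketch of what "meta-data driven" validation might look like, in Python. The names TYPE_METADATA, LEXICAL_PATTERNS, validate and process_order are all hypothetical, not taken from any schema proposal. The type declarations live in a table, and the application calls one generic check instead of embedding its own parsing rules, so a change in requirements touches the table rather than the application logic.

```python
import re

# Hypothetical type meta-data: element name -> named lexical type.
# An integration project or locale change edits this table, not the code below.
TYPE_METADATA = {
    "quantity": "integer",
    "price": "decimal",
}

# Hypothetical lexical patterns for each named type.
LEXICAL_PATTERNS = {
    "integer": re.compile(r"[+-]?[0-9]+"),
    "decimal": re.compile(r"[+-]?[0-9]+(\.[0-9]+)?"),
}


def validate(element: str, text: str) -> bool:
    """Return True if text matches the type declared for element.

    Elements with no declared type are accepted unchecked.
    """
    type_name = TYPE_METADATA.get(element)
    if type_name is None:
        return True
    return LEXICAL_PATTERNS[type_name].fullmatch(text) is not None


def process_order(fields: dict) -> float:
    # Application logic: validation is delegated to the table-driven
    # check above rather than interlaced with the arithmetic.
    for element, text in fields.items():
        if not validate(element, text):
            raise ValueError(f"invalid {element}: {text!r}")
    return int(fields["quantity"]) * float(fields["price"])


print(process_order({"quantity": "3", "price": "2.50"}))  # 7.5
```

Passing "2,50" as the price raises ValueError before any arithmetic runs, which is the "catch it early" behaviour argued for further down.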

I'd also like to argue that wherever you put the data typing,
you also need to support type transforms, i.e., the ability
to specify both a source type and a destination type.  (I've been
thinking about SAX event filters lately.)
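A rough sketch of a type transform done as a SAX event filter, using Python's xml.sax.saxutils.XMLFilterBase. The convention that a type="float" attribute marks numeric content, and the choice of "," as source and "." as destination decimal separator, are assumptions for illustration only. It also assumes such elements contain character data and no child elements.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class DecimalSeparatorFilter(XMLFilterBase):
    """Rewrite float content from a source lexical form to a destination one.

    Hypothetical convention: elements with type="float" hold a number written
    with src_sep as decimal separator; downstream handlers see dst_sep.
    """

    def __init__(self, parent, src_sep=",", dst_sep="."):
        super().__init__(parent)
        self.src_sep = src_sep
        self.dst_sep = dst_sep
        self._buffer = None  # collects text while inside a float element

    def startElement(self, name, attrs):
        if attrs.get("type") == "float":
            self._buffer = []
        super().startElement(name, attrs)

    def characters(self, content):
        if self._buffer is not None:
            # SAX may split text into several chunks; gather them first.
            self._buffer.append(content)
        else:
            super().characters(content)

    def endElement(self, name):
        if self._buffer is not None:
            text = "".join(self._buffer)
            self._buffer = None
            # Emit the transformed value as a single characters() event.
            super().characters(text.replace(self.src_sep, self.dst_sep))
        super().endElement(name)


def convert(document: bytes) -> str:
    """Run document through the filter and serialize the result."""
    out = io.StringIO()
    filt = DecimalSeparatorFilter(xml.sax.make_parser())
    filt.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    filt.parse(io.BytesIO(document))
    return out.getvalue()


print(convert(b'<doc><note>1,5</note><value type="float">1,5</value></doc>'))
```

Only the element declared as a float is rewritten; the untyped note passes through unchanged, so the application downstream sees one canonical form regardless of the source convention.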

Bill

-----Original Message-----
From: david at megginson.com <david at megginson.com>
To: xml-dev at ic.ac.uk <xml-dev at ic.ac.uk>
Date: Friday, November 27, 1998 8:55 AM
Subject: Re: Why XML data typing is hard (was Re: Internal subset equivalent in new schema proposals?)

>Ketil Z Malde writes:
>
> > Catching illegal values early on - in validation of the document -
> > instead of relying on some obscure run-time error in some program,
> > is a *feature*.
>
>Agreed -- this is a very good choice, especially if you have human
>authors.
>
>The real question, though, is how constraints could be enforced.
>Let's start with an extremely simple example:
>
>  <value xml:type="float"></value>
>
>What are the allowed contents?  Certainly, +, -, and the digits 0-9
>should be allowed, as well as the letter 'e', but which of the
>following should throw an error?
>
>  <value xml:type="float">1,5</value>
>  <value xml:type="float">1.5</value>
>
>There are three obvious answers:
>
>1. Both are accepted.
>2. Only one is accepted, and everyone learns to use that format.
>3. Only the correct one for the current locale is accepted.
>
>Option #2 is politically unworkable (either France or the U.S. would
>take up arms), and option #1 seriously weakens validation (what if an
>English author had mistakenly intended to use the comma to specify a
>range?).  Option #3 looks OK on the surface, but it is actually the
>worst of the three because it destroys interoperability: the same XML
>document may be considered correct by some parsers and erroneous by
>others, depending on which locale the user happened to choose.
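The three options can be sketched as three validation policies over one hypothetical lexical grammar for floats: optional sign, digits, an optional fraction after a decimal separator, and an optional exponent. Modeling a "locale" as nothing more than its decimal separator is a simplification for illustration.

```python
import re


def float_pattern(sep: str):
    # Hypothetical float grammar parameterized by the decimal separator;
    # not the definition used by any particular schema language.
    return re.compile(
        r"[+-]?[0-9]+(" + re.escape(sep) + r"[0-9]+)?([eE][+-]?[0-9]+)?"
    )


def accept_both(text: str) -> bool:
    # Option 1: either separator is fine (so "1,5" passes even if a range was meant).
    return any(float_pattern(s).fullmatch(text) for s in ".,")


def accept_one(text: str) -> bool:
    # Option 2: one agreed separator, here '.', for everyone.
    return float_pattern(".").fullmatch(text) is not None


def accept_locale(text: str, locale_sep: str) -> bool:
    # Option 3: the verdict depends on the validating user's locale.
    return float_pattern(locale_sep).fullmatch(text) is not None


for text in ("1,5", "1.5"):
    print(text, accept_both(text), accept_one(text),
          accept_locale(text, "."), accept_locale(text, ","))
# 1,5 True False False True
# 1.5 True True True False
```

The last two columns show the same content receiving opposite verdicts from two validators that differ only in locale.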
>
>This is a very simple example; after you've worked this out, you can
>start worrying about how to count combining characters against
>field-length restrictions, etc.
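To illustrate the combining-character problem: the same visible name can be stored precomposed or decomposed, and a naive length check gives different answers. This sketch uses Python's unicodedata module. Counting non-combining characters is only a rough approximation of user-perceived characters (full grapheme segmentation is defined in Unicode UAX #29), and the four-character limit is hypothetical.

```python
import unicodedata


def length_in_code_points(text: str) -> int:
    # Naive measure: every code point counts, including combining accents.
    return len(text)


def length_in_base_characters(text: str) -> int:
    # Count only characters whose canonical combining class is 0, so a base
    # letter followed by accents counts once. Approximate: some spacing marks
    # also have class 0.
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


def fits(text: str, max_len: int, measure) -> bool:
    return measure(text) <= max_len


precomposed = "Jos\u00e9"   # 'é' as one code point
decomposed = "Jose\u0301"   # 'e' + COMBINING ACUTE ACCENT

print(fits(precomposed, 4, length_in_code_points))      # True
print(fits(decomposed, 4, length_in_code_points))       # False
print(fits(decomposed, 4, length_in_base_characters))   # True
```

Normalizing to NFC first (unicodedata.normalize("NFC", text)) would make both spellings identical, but a schema would still have to say which measure its length limit means.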
>
>
>All the best,
>
>
>David
>
>-- 
>David Megginson                 david at megginson.com
>           http://www.megginson.com/
>
>xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
>Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
>To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
>(un)subscribe xml-dev
>To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
>subscribe xml-dev-digest
>List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
>
