Why XML data typing is hard (was Re: Internal subset equivalent in new schema proposals?)

david at megginson.com
Fri Nov 27 13:49:23 GMT 1998


Ketil Z Malde writes:

 > Catching illegal values early on - in validation of the document -
 > instead of relying on some obscure run-time error in some program,
 > is a *feature*.

Agreed -- this is a very good choice, especially if you have human
authors.

The real question, though, is how constraints could be enforced.
Let's start with an extremely simple example:

  <value xml:type="float"></value>

What are the allowed contents?  Certainly, +, -, and the digits 0-9
should be allowed, as well as the letter 'e', but which of the
following should throw an error?

  <value xml:type="float">1,5</value>
  <value xml:type="float">1.5</value>

There are three obvious answers:

1. Both are accepted.
2. Only one is accepted, and everyone learns to use that format.
3. Only the correct one for the current locale is accepted.

Option #2 is politically unworkable (either France or the U.S. would
take up arms), and option #1 seriously weakens validation (what if an
English author had mistakenly intended to use the comma to specify a
range?).  Option #3 looks OK on the surface, but it is actually the
worst of the three because it destroys interoperability: the same XML
document may be considered correct by some parsers and erroneous by
others, depending on what locale the user happened to choose.
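
To make the three options concrete, here is a rough Python sketch.
The function names and the two regular expressions are made up for
illustration; they are not taken from any XML specification or schema
proposal, and the locale is reduced to a single decimal-separator
argument so the example does not depend on installed system locales.

```python
import re

# Hypothetical lexical forms for a "float" (assumptions, not a standard).
_DOT_FLOAT = re.compile(r"[+-]?\d+(\.\d+)?([eE][+-]?\d+)?")
_COMMA_FLOAT = re.compile(r"[+-]?\d+(,\d+)?([eE][+-]?\d+)?")


def accept_both(text):
    """Option 1: either decimal separator is accepted."""
    return bool(_DOT_FLOAT.fullmatch(text) or _COMMA_FLOAT.fullmatch(text))


def accept_fixed(text):
    """Option 2: one fixed format (here, arbitrarily, the dot)."""
    return bool(_DOT_FLOAT.fullmatch(text))


def accept_for_locale(text, decimal_separator):
    """Option 3: the verdict depends on the validating user's locale."""
    pattern = _COMMA_FLOAT if decimal_separator == "," else _DOT_FLOAT
    return bool(pattern.fullmatch(text))
```

With option 3, accept_for_locale("1,5", ",") and
accept_for_locale("1,5", ".") disagree about the very same document,
which is exactly the interoperability problem.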

This is a very simple example; after you've worked this out, you can
start worrying about how to count combining characters with
field-length restrictions, etc.
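
The combining-character problem can be sketched the same way. The
helper below is hypothetical (the name and the returned keys are my
own); it only shows that the "length" of one visible character depends
on the normalization form and on whether you count code points or
bytes, so a field-length restriction has to say which it means.

```python
import unicodedata


def lengths(text):
    """Return several plausible 'lengths' of the same text."""
    nfc = unicodedata.normalize("NFC", text)  # precomposed where possible
    nfd = unicodedata.normalize("NFD", text)  # base letter + combining marks
    return {
        "code_points_nfc": len(nfc),
        "code_points_nfd": len(nfd),
        "utf8_bytes_nfc": len(nfc.encode("utf-8")),
        "utf8_bytes_nfd": len(nfd.encode("utf-8")),
    }
```

For a single e-acute (U+00E9), this gives one code point in NFC but two
in NFD, and two UTF-8 bytes versus three.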


All the best,


David

-- 
David Megginson                 david at megginson.com
           http://www.megginson.com/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)