Why XML data typing is hard (was Re: Internal subset equivalent in new schema proposals?)

Ketil Z Malde ketil at ii.uib.no
Fri Nov 27 13:06:06 GMT 1998

<david at megginson.com> writes:

> Imagine that someone sent an expensive XML-based system from North
> America to Norway, and you discovered that the system constrained
> phone numbers

> XML-based system from Norway to Beijing, and it constrained city
> names to contain only Roman letters

I see the problem, but IMHO that constraint indicates that you have
encountered an area the system is unable to deal with, and very
probably you need to change other components of your system as well.
Catching illegal values early, when the document is validated, instead
of relying on some obscure run-time error in some program, is a
*feature*.

One case I've run into was when I needed sort keys for a list of
records.  Unfortunately, the data quality was poor, and many
characters without defined sort semantics had crept into the sort
field.  In effect, I had to write a small tool that parsed the data
and reported the illegal keys.
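A tool of that kind might look something like the following Python sketch. It is not the tool described above: the `(record_id, key)` input shape and the hypothetical `SORTABLE` alphabet (ASCII letters and digits, the Norwegian letters Æ, Ø, Å, space and hyphen) are assumptions made for illustration.

```python
# Hypothetical collation alphabet: the characters assumed to have defined
# sort semantics. A real system would derive this from its collation rules.
SORTABLE = set(
    "abcdefghijklmnopqrstuvwxyzæøå"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ"
    "0123456789 -"
)


def illegal_sort_keys(records):
    """Return (record_id, key, bad_chars) for every key that contains
    characters outside SORTABLE; bad_chars is sorted and de-duplicated."""
    problems = []
    for record_id, key in records:
        bad = sorted({ch for ch in key if ch not in SORTABLE})
        if bad:
            problems.append((record_id, key, bad))
    return problems


if __name__ == "__main__":
    sample = [
        ("r1", "Ålesund"),
        ("r2", "Oslo*"),
        ("r3", "Bergen\u00a0"),  # a non-breaking space stuck in the key
    ]
    for record_id, key, bad in illegal_sort_keys(sample):
        # Print code points, since many offending characters are invisible.
        print(record_id, repr(key), [f"U+{ord(c):04X}" for c in bad])
```

Reporting code points rather than the characters themselves helps with invisible debris such as non-breaking spaces.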

If content constraints were supported in the DTD and validator, I
could have thrown the DTD at my customers and said that this is what
I accept, and they would be free to use whatever tools they chose to
verify that what they send me is valid.
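XML 1.0 DTDs can declare element text only as `#PCDATA`, with no constraint on which characters it may contain. The Python sketch below imagines the missing piece as a separate table of content constraints checked after parsing. The `CONSTRAINTS` table, its element names and its patterns are hypothetical; nothing here corresponds to an actual DTD feature.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content constraints: element name -> pattern its text must
# match in full. This stands in for the declaration the DTD cannot express.
CONSTRAINTS = {
    "sortkey": re.compile(r"[A-Za-zÆØÅæøå0-9 -]+"),
    "year": re.compile(r"[0-9]{4}"),
}


def check_content(xml_text):
    """Return human-readable violations in document order (empty if valid)."""
    root = ET.fromstring(xml_text)
    errors = []
    for elem in root.iter():
        pattern = CONSTRAINTS.get(elem.tag)
        if pattern is None:
            continue  # element has no declared content constraint
        text = elem.text or ""
        if not pattern.fullmatch(text):
            errors.append(f"<{elem.tag}> has illegal content {text!r}")
    return errors


if __name__ == "__main__":
    doc = """<records>
      <record><sortkey>Tromsø</sortkey><year>1998</year></record>
      <record><sortkey>Oslo*</sortkey><year>98</year></record>
    </records>"""
    for message in check_content(doc):
        print(message)
```

Because the constraints are data rather than program code, a table like this could be handed to customers alongside the DTD, so they can check their documents with their own tools before sending them.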

> but coming up with data-type constraints that are both useful and
> generalised enough for all XML users across all of the major Locales
> and all of the Unicode character repertoire is *very* difficult.

Only slightly more difficult than coming up with the DTD, IMHO, and
certainly a lot less difficult than ensuring that your processing
environment correctly deals with all ``major Locales'' and Unicode
byzantinery and all.
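The locale problem can be seen in miniature in the Python sketch below. It is illustrative only: the "Roman letters only" rule follows the Beijing example quoted earlier, while the broader Unicode-aware rule and its separator set are assumptions. Even the broader rule needs some Unicode care: in decomposed (NFD) form an accented letter such as É is a base letter followed by a combining mark, which `str.isalpha()` does not count as a letter, so the name is NFC-normalised first.

```python
import re
import unicodedata

# The narrow rule from the quoted example: Roman (ASCII) letters only,
# plus a few separators.
ASCII_NAME = re.compile(r"[A-Za-z][A-Za-z .'-]*")


def ascii_city_ok(name: str) -> bool:
    return ASCII_NAME.fullmatch(name) is not None


def unicode_city_ok(name: str) -> bool:
    """Assumed broader rule: any Unicode letter plus the same separators.

    NFC normalisation composes base letters with their combining marks,
    so decomposed input is not rejected for containing non-letters.
    """
    name = unicodedata.normalize("NFC", name)
    if not name or not name[0].isalpha():
        return False
    return all(ch.isalpha() or ch in " .'-" for ch in name)


if __name__ == "__main__":
    decomposed = unicodedata.normalize("NFD", "Saint-Étienne")
    for city in ["Bergen", "Tromsø", "北京", decomposed, "Oslo*"]:
        print(repr(city), ascii_city_ok(city), unicode_city_ok(city))
```

The broader rule is still a policy choice: it accepts mixed-script names and rejects digits, and whether either is right depends on the application.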

If I haven't seen further, it is by standing in the footprints of giants

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/