Why XML data typing is hard (was Re: Internal subset equivalent in new schema proposals?)

roddey at us.ibm.com roddey at us.ibm.com
Tue Dec 1 19:02:21 GMT 1998




"I'm not suggesting that a set of simple XML data-typing constraints
cannot be helpful -- if you're building a database only of Norwegian
city names, you know that you don't have to deal with Han or Kanji
(unless, of course, you do such a good job that you decide to
commercialise your system) -- but coming up with data-type constraints
that are both useful and generalised enough for all XML users across all
of the major Locales and all of the Unicode character repertoire is
*very* difficult."

The constraint proposal I posted a while back deals with some of these
issues, partly by punting and allowing constraint checking to be done by
way of installable 'constraint bundles'. Shipping an app with these kinds
of localization issues would be much like shipping any other localized app:
if you localize your constraint bundle code, then the constraint checking is
localized.

Personally, though, I separated data typing from constraint checking for a
number of reasons. The most important is that tying them together would
have required a kind of 'type equivalence' scheme to avoid lots of
redundant code. If the datatype is "FooBar" but it's really just an int with
range limitations, I didn't want to have to either create a new FooBar
constraint type when there is already a range-checked int constraint out
there, or have the typing system know that FooBar is really an int
and figure out by itself what to do.
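To make the FooBar case concrete, here is a minimal sketch in Python of keeping the type label and the constraint apart. The names ElementDecl and range_checked_int are hypothetical and are not taken from the constraint proposal; the only point is that "FooBar" stays a plain label while the existing range-checked int constraint is reused as-is.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A constraint is a predicate over the raw text of an element or attribute.
Constraint = Callable[[str], bool]


def range_checked_int(lo: int, hi: int) -> Constraint:
    """Build a reusable 'int within [lo, hi]' constraint (hypothetical helper)."""
    def check(text: str) -> bool:
        try:
            value = int(text.strip())
        except ValueError:
            return False  # not an integer at all
        return lo <= value <= hi
    return check


@dataclass
class ElementDecl:
    """Hypothetical declaration: the datatype is only a label; the
    constraints are attached separately and know nothing about it."""
    name: str
    datatype: str
    constraints: List[Constraint] = field(default_factory=list)

    def validate(self, text: str) -> bool:
        # The datatype label is never consulted here, so no
        # 'FooBar is really an int' equivalence logic is needed.
        return all(check(text) for check in self.constraints)


# "FooBar" reuses the generic range constraint; no FooBar checker is written.
foobar = ElementDecl("quantity", "FooBar", [range_checked_int(1, 100)])
```

With this shape, a second datatype that is also "an int with a range" only needs another call to range_checked_int, not new checking code.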

But, other than those issues, if the constraint mechanism can be pluggable
(though with a core set of fundamental ones that all parsers implement
intrinsically and which are not affected by localization), I think it's not
unreasonable to do. The issue is that the XML data is then tied to a
particular set of classes which implement the constraint checking. For
custom apps, that's probably fine. But if you want to distribute the XML
data widely, then there has to be a mechanism by which constraint code
comes along with it. Since parsers can be written in many languages, that
certainly raises some serious concerns.

However, if we are really serious about it, that is not to say that there
cannot be sets of registered constraint bundles whose semantics are well
defined, and which can be implemented in many languages. If these bundles
were, say, given a globally unique hash id or a unique URL, then
an XML schema could indicate which registered constraint bundles it depends
upon and load them if local implementations are available or ping the user
to get them if required. Actually the URL would be better since my
constraint proposal already used URLs, relative to namespaces, to locate
the loaded bundle that handles a particular constraint.
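As a rough illustration of the lookup described above, the following Python sketch keeps locally installed bundles in a table keyed by URL and splits a schema's declared dependencies into those that can be loaded and those the user would have to be asked for. BundleRegistry, its methods, and the example.org URLs are all assumptions made for this sketch, not part of the proposal.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A bundle maps constraint names to checker functions (hypothetical shape).
Bundle = Dict[str, Callable[[str], bool]]


class BundleRegistry:
    """Hypothetical local registry of constraint bundles, keyed by URL."""

    def __init__(self) -> None:
        self._installed: Dict[str, Bundle] = {}

    def install(self, url: str, bundle: Bundle) -> None:
        """Record a local implementation of the bundle registered at url."""
        self._installed[url] = bundle

    def resolve(self, required: Iterable[str]) -> Tuple[Dict[str, Bundle], List[str]]:
        """Return (loaded bundles, URLs with no local implementation)."""
        loaded: Dict[str, Bundle] = {}
        missing: List[str] = []
        for url in required:
            if url in self._installed:
                loaded[url] = self._installed[url]
            else:
                missing.append(url)  # caller can prompt the user for these
        return loaded, missing


registry = BundleRegistry()
registry.install(
    "http://example.org/constraints/core",  # made-up URL
    {"nonEmpty": lambda text: len(text.strip()) > 0},
)

# URLs a schema might declare as dependencies (both made up).
schema_requires = [
    "http://example.org/constraints/core",
    "http://example.org/constraints/postal",
]
loaded, missing = registry.resolve(schema_requires)
```

Here `missing` would hold the postal bundle's URL, which is the point at which the parser could ask the user to fetch an implementation.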

And the same mechanism could be used locally within a particular
organization to deal with bundles registered in their own world, but not with
any outside registration body. Or there could be more informal
registration mechanisms as well, for those folks who feel comfortable with
that. So it would be nicely extensible without requiring an act of God.

If the W3C just came up with official locale names in XML syntax, then of
course we could also embed into the constraint bundles which locales they
can reasonably deal with so that the constraint bundle lookup and load
mechanism can do a 'best effort' check to find one that most closely
matches the user's current locale and which implements the registered
bundle.
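A 'best effort' match of this kind could look something like the Python sketch below. Since no official locale names exist, it simply assumes "language-REGION" style names such as "en-US", plus "*" for a locale-neutral implementation. Both the naming scheme and the function names are assumptions for illustration only.

```python
from typing import Dict, List, Optional


def match_score(supported: str, wanted: str) -> int:
    """Score one supported locale against the user's locale.

    3 = exact match, 2 = same language, 1 = locale-neutral ("*"), 0 = no match.
    """
    if supported.lower() == wanted.lower():
        return 3
    if supported.split("-")[0].lower() == wanted.split("-")[0].lower():
        return 2
    if supported == "*":
        return 1
    return 0


def pick_implementation(candidates: Dict[str, List[str]],
                        user_locale: str) -> Optional[str]:
    """Pick the implementation of a registered bundle whose declared
    locales most closely match user_locale, or None if nothing matches."""
    best_name: Optional[str] = None
    best_score = 0
    for name, locales in candidates.items():
        score = max((match_score(loc, user_locale) for loc in locales), default=0)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical implementations of one registered bundle and their locales.
implementations = {
    "impl_nb": ["nb-NO"],
    "impl_en": ["en-US", "en-GB"],
    "impl_generic": ["*"],
}
```

For example, a user in "en-AU" would get impl_en through the same-language rule, while a user in "ja-JP" would fall back to impl_generic.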

This is all obviously a big step to take, but relative to the complexity of
the problem it's not so bad, I think. Am I missing something here? Why
does this suck and I don't know it? 8-)



xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



