Why XML data typing is hard

Ketil Z Malde ketil at ii.uib.no
Wed Dec 2 08:28:00 GMT 1998

Joel Bender <joel at spooky.emcs.cornell.edu> writes:

>>  Alternatively, you could force people to use "YYYY-MM-DD" by
>>  forcing conformance to a regular expression, and have your
>>  applications only have to deal with that.

> It may not be politically correct, and depending on the context might even
> come across as ethnocentric, but IMHO that's not a bad thing.

The important point I'm trying to make is that I don't want to
enforce this on a global scale.  I want to enforce this in a DTD, for
a specific application.  There's a fair chance it will be ethnocentric
and politically incorrect, but if I worry about those issues, I am
free to try to provide a solution.

Of course, having a *recommended* format for common data types would
be a good thing.
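
As a rough sketch of what per-application enforcement by regular
expression could look like (Python is used purely for illustration;
the names DATE_PATTERN and conforms are made up for this example and
are not part of any XML tool):

```python
import re

# Matches the "YYYY-MM-DD" shape only, using ASCII digits.
# It does not check that the date exists on the calendar,
# so a string like "1998-02-31" still conforms.
DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

def conforms(text):
    """Return True if text has exactly the YYYY-MM-DD shape."""
    return DATE_PATTERN.fullmatch(text) is not None
```

The application then only has to deal with strings for which
conforms() is true; anything else is rejected before it is processed.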

> A standards process doesn't need to cover all the cases

I think it is important that the standards process come up with a good,
and preferably simple, mechanism.  Like DTDs.  I fear that trying to
go into specifics is a political and technical rat's nest.

> Let's say you give me a bunch of XML files which are
> marked-up email messages, and I would like to find out which ones are at
> least a week old.  It sure would be nice to know that the <received>Tue, 1
> Dec 1998 02:00:09 +0000</received> contents you provided me have some
> standard form.

Of course it would be nice, and if the date format were properly
specified in the DTD, you could rely on it.  Without a DTD (and
knowledge of its semantics), you won't be able to figure this out
anyway, since the message may contain many xml:type="date"s, and you
won't know which ones to look at.
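
To make the point concrete, here is a minimal sketch of the "at least
a week old" check, assuming the application already knows which
element to look at and which date format it holds.  The <message>
wrapper element and the function name are assumptions for this
example; the <received> element and its mail-header date format are
taken from the quoted text.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def at_least_a_week_old(xml_text, now):
    """Return True if the <received> date is 7 or more days before now.

    Assumes <received> is a direct child of the root element, that it
    holds a mail-header style date with an explicit UTC offset, and
    that now is a timezone-aware datetime.
    """
    root = ET.fromstring(xml_text)
    received = root.find("received")
    if received is None or received.text is None:
        raise ValueError("no <received> element found")
    when = parsedate_to_datetime(received.text)
    return now - when >= timedelta(days=7)

# Hypothetical sample message using the date from the quoted text.
sample = (
    "<message>"
    "<received>Tue, 1 Dec 1998 02:00:09 +0000</received>"
    "</message>"
)
is_old = at_least_a_week_old(sample, datetime.now(timezone.utc))
```

Note that the knowledge of where the date lives and how it is written
is baked into the function, which is exactly the semantic information
a DTD and its documentation would have to supply.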

> No, not specific to a language mapping, that belongs in some API or SAX
> reference not in XML.  

That's what I meant (I think).  It would make SAX a whole lot more
complex, though, if it has to understand e.g. standardised dates, and
return some kind of date object (or struct) when it encounters one.
And by specifying the content type in the element attribute, you also
risk running into an unexpected data type, which will cause your
application to fail with a run-time error.
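
A small sketch of that failure mode, assuming a hypothetical "type"
attribute on each element and a hypothetical table of converters
(neither is part of SAX or of any XML specification):

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical converters keyed by the value of a "type" attribute.
CONVERTERS = {
    "date": date.fromisoformat,  # expects YYYY-MM-DD
    "int": int,
}

def typed_value(element):
    """Convert element text according to its "type" attribute.

    Untyped elements are returned as plain text.  A type the
    application does not know about raises TypeError at run time.
    """
    type_name = element.get("type")
    if type_name is None:
        return element.text
    try:
        convert = CONVERTERS[type_name]
    except KeyError:
        raise TypeError(f"unexpected data type: {type_name!r}") from None
    return convert(element.text)

sent = typed_value(ET.fromstring('<sent type="date">1998-12-02</sent>'))
```

A document that declares a type missing from the table, say
type="currency", only fails when typed_value() reaches that element,
which is the run-time surprise described above.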

> Supporting grep content pattern matching doesn't
> seem like it would be any more difficult than namespaces, kinda like...

I would have thought it would be simple, but then again, I'm
culturally biased, and hadn't read the Unicode regexp document. Oh

If I haven't seen further, it is by standing in the footprints of giants

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
