Why XML data typing is hard (was Re: Internal subset equivalent in new schema proposals?)

G. Ken Holman gkholman at CanadaMail.com
Sat Nov 28 11:39:32 GMT 1998


At 98/11/27 16:17 +0100, Ketil Z Malde wrote:
><david at megginson.com> writes:
>
>> The real question, though, is how constraints could be enforced.
>> Let's start with an extremely simple example:
>
>>   <value xml:type="float"></value>
>
>Now you're adding type information to the content; what I suggested
>was to constrain *form*.  For one thing, I would not specify this in
>the document (this is just a gut feeling, but why would you?), I would
>specify it in the DTD, e.g. like so:
>
>	<!element value #REGEXP:"-?[0-9]*.[0-9][0-9]">
>
>(or some such, you get the point).  
>
>> What are the allowed contents?
>
>Then the document could contain
>
>	<value>4.50</value>
>	<value>-0.01</value> or
>	<value>.00</value>
>
>but not
>
>	<value>1.0</value> or
>	<value>4,50</value>
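Suppose the proposed #REGEXP constraint is read as an ordinary regular expression. That is an assumption: no such DTD syntax exists, so Python's `re` module stands in for it here. One detail of the pattern then matters. An unescaped "." matches any character, so the pattern as written would actually accept "4,50". The sketch below compares it with a version that escapes the decimal point as "\."; the helper name `allowed` is hypothetical.

```python
import re

# The pattern as proposed: the bare "." matches ANY single character.
AS_WRITTEN = re.compile(r"-?[0-9]*.[0-9][0-9]")
# The same pattern with the decimal point escaped, so only "." is accepted.
ESCAPED = re.compile(r"-?[0-9]*\.[0-9][0-9]")


def allowed(pattern: re.Pattern, text: str) -> bool:
    """Hypothetical helper: the whole element content must match, not a prefix."""
    return pattern.fullmatch(text) is not None


for sample in ["4.50", "-0.01", ".00", "1.0", "4,50"]:
    # Columns: sample, accepted as written, accepted with "\." escaped
    print(sample, allowed(AS_WRITTEN, sample), allowed(ESCAPED, sample))
```

With the escape in place, the samples behave as the message intends: "4.50", "-0.01" and ".00" are accepted, while "1.0" and "4,50" are rejected.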

Then the range of values your example proposes is inappropriate, because
"4,50" is a valid float from an internationalization (I18N) point of view.

In Canada, a currency amount can validly be written as $1.47 or 1,47$,
depending on where you are: the decimal separator is "." in English Canada
and "," in French Canada.

I understood David's point to be that two valid expressions of the same
float aren't lexically the same.

>>   <value xml:type="float">1,5</value>
>>   <value xml:type="float">1.5</value>
>
>This won't be a problem if the DTD specifies what the processing
>software should expect.  You could even validate processing software
>to some extent. 

And I suppose your regular expression example could be changed to 

	<!element value #REGEXP:"-?[0-9]*(\.|,)[0-9][0-9]">

Currency values, whose symbol may precede or follow the number, would need
a larger expression.
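The widened pattern above can be tried out directly. The Python sketch below (`to_value` is a hypothetical helper) first checks the lexical form against that pattern and then maps it to a single value with `decimal.Decimal`, so "4,50" and "4.50" compare equal. It assumes "," is only ever a decimal separator, never a thousands separator; the pattern rejects a form like "1,234.50" in any case.

```python
import re
from decimal import Decimal

# The widened pattern: either "." or "," may separate the two decimal places.
AMOUNT = re.compile(r"-?[0-9]*(\.|,)[0-9][0-9]")


def to_value(text: str) -> Decimal:
    """Hypothetical helper: validate the lexical form, then map it to a value."""
    if AMOUNT.fullmatch(text) is None:
        raise ValueError(f"not a valid amount: {text!r}")
    # The pattern allows exactly one separator and otherwise only digits
    # and a leading "-", so swapping "," for "." yields a Decimal literal.
    return Decimal(text.replace(",", "."))


print(to_value("4,50") == to_value("4.50"))  # True
```

Two lexically different strings end up as one value; that is the distinction between what a pattern constrains and what a type describes.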

I gathered from a presentation Michael S-McQ gave in Chicago that the
regular expression for a valid date (taking into account the number of days
in each month and leap years) is 4801 characters long.
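For comparison, the same date constraint takes only a few lines when it is written procedurally rather than as one regular expression. This sketch assumes the Gregorian calendar and that year, month and day have already been parsed to integers; `is_leap` and `is_valid_date` are hypothetical names.

```python
# Days in each month of a common (non-leap) year, January first.
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def is_leap(year: int) -> bool:
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_valid_date(year: int, month: int, day: int) -> bool:
    if not 1 <= month <= 12:
        return False
    # February gains a 29th day only in leap years.
    limit = 29 if month == 2 and is_leap(year) else DAYS_IN_MONTH[month - 1]
    return 1 <= day <= limit


print(is_valid_date(1998, 2, 29))  # False
print(is_valid_date(2000, 2, 29))  # True
```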

Since these values themselves are hierarchical, one could model the example
as:

  <whole>4</whole><part>50</part>

but the same approach fails for a date split into separate elements:

  <year>1998</year><month>02</month><day>29</day>

because one cannot tell that this date is invalid (1998 was not a leap
year) without some notion of the set of elements together representing a
valid date.
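With separate elements, validity is a property of the group rather than of any one element. A minimal sketch with Python's `xml.etree.ElementTree` follows. It assumes a hypothetical wrapper element (here `<date>`) so that the three siblings form one well-formed document, and it uses `datetime.date` to apply the calendar rules; `date_group_is_valid` is likewise a hypothetical name.

```python
import xml.etree.ElementTree as ET
from datetime import date


def date_group_is_valid(xml_text: str) -> bool:
    """Check <year>, <month> and <day> together as a single date."""
    root = ET.fromstring(xml_text)
    try:
        # Each child on its own is just a well-formed number; only the
        # combination can be checked against the calendar.
        year = int(root.findtext("year"))
        month = int(root.findtext("month"))
        day = int(root.findtext("day"))
        date(year, month, day)  # raises ValueError for impossible dates
    except (TypeError, ValueError):
        # TypeError: a child is missing, so findtext returned None.
        return False
    return True


print(date_group_is_valid(
    "<date><year>1998</year><month>02</month><day>29</day></date>"))  # False
```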

>I think trying to define some set of types to be used in *all* XML
>documents is taking the wrong approach.  I don't really see this as
>either workable or desirable.  What would the point of using xml:type
>be?  

Perhaps to abstract the value being expressed away from its markup, so that
different lexical expressions of the same value can all be considered valid.
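One way to picture that abstraction: a processor reads the proposed (not standardized) `xml:type` attribute and maps each lexical form to a value before comparing. In the sketch below, the registry `TYPES`, the helpers `parse_float` and `typed_value`, and the rule that "float" accepts either separator are all assumptions for illustration. ElementTree reports the predeclared `xml:` prefix by its namespace URI.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predeclared xml: prefix to this namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def parse_float(text: str) -> float:
    # Hypothetical lexical rule: accept "," or "." as the decimal separator.
    # Empty content fails here, because float("") raises ValueError.
    return float(text.strip().replace(",", "."))


# Hypothetical registry from xml:type names to lexical-to-value functions.
TYPES = {"float": parse_float}


def typed_value(elem: ET.Element):
    type_name = elem.get(XML_NS + "type")
    if type_name is None:
        return elem.text  # untyped content is just a string
    return TYPES[type_name](elem.text or "")


a = ET.fromstring('<value xml:type="float">1,5</value>')
b = ET.fromstring('<value xml:type="float">1.5</value>')
print(a.text == b.text, typed_value(a) == typed_value(b))  # False True
```

Python's `float()` is more liberal than a schema would probably want (it also accepts forms such as "1e3" or "inf"), so a real type definition would still need to pin down its lexical space separately.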

............ Ken



--
G. Ken Holman         mailto:gkholman at CraneSoftwrights.com
Crane Softwrights Ltd.  http://www.CraneSoftwrights.com/x/
Box 266,                                V: +1(613)489-0999
Kars, Ontario CANADA K0A-2E0            F: +1(613)489-0995
Training:   http://www.CraneSoftwrights.com/x/schedule.htm
Resources: http://www.CraneSoftwrights.com/x/resources.htm
Shareware: http://www.CraneSoftwrights.com/x/shareware.htm


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/