Why XML data typing is hard (was Re: Internal subset equivalent in new schema proposals?)
G. Ken Holman
gkholman at CanadaMail.com
Sat Nov 28 11:39:32 GMT 1998
At 98/11/27 16:17 +0100, Ketil Z Malde wrote:
><david at megginson.com> writes:
>
>> The real question, though, is how constraints could be enforced.
>> Let's start with an extremely simple example:
>
>> <value xml:type="float"></value>
>
>Now you're adding type information to the content, what I suggested
>was to constrain *form*. For one thing, I would not specify this in
>the document (this is just a gut feeling, but why would you?), I would
>specify it in the DTD, e.g. like so:
>
> <!element value #REGEXP:"-?[0-9]*.[0-9][0-9]">
>
>(or some such, you get the point).
>
>> What are the allowed contents?
>
>Then the document could contain
>
> <value>4.50</value>
> <value>-0.01</value> or
> <value>.00</value>
>
>but not
>
> <value>1.0</value> or
> <value>4,50</value>
Then the range of values in your proposed example is inappropriate, because
"4,50" is a valid float from an I18N point of view.
In Canada, valid expressions of currency amounts are $1.47 or 1,47$,
depending on where you are. The decimal separator is "." in English Canada
and "," in French Canada.
I understood David's point to be that two valid expressions of the same
float aren't lexically the same.
>> <value xml:type="float">1,5</value>
>> <value xml:type="float">1.5</value>
>
>This won't be a problem, if the DTD specifies what can the processing
>software should expect. You could even validate processing software
>to some extent.
And I suppose your regular expression example could be changed to
<!element value #REGEXP:"-?[0-9]*(\.|,)[0-9][0-9]">
Currency values would need a larger expression.
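As a rough illustration of what such a form constraint would accept, here is a
minimal Python sketch. It assumes the expression must match the whole element
content; the #REGEXP syntax above is only the hypothetical notation from this
thread, and the function name is_valid_value is invented for the example.

```python
import re

# The expression from the message, allowing "." or "," as the separator.
# Assumption: the entire element content must match, hence fullmatch().
VALUE_RE = re.compile(r"-?[0-9]*(\.|,)[0-9][0-9]")

def is_valid_value(text: str) -> bool:
    """Return True if text satisfies the two-decimal-digit form constraint."""
    return VALUE_RE.fullmatch(text) is not None

if __name__ == "__main__":
    for sample in ["4.50", "-0.01", ".00", "4,50", "1.0", "1,5"]:
        print(sample, is_valid_value(sample))
```

Note that the constraint is purely lexical: "1.0" and "1,5" are rejected only
because they lack a second decimal digit, not because they fail to denote a
number.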
I gather from Michael S-McQ in a presentation in Chicago that the regular
expression for a valid date (taking into account days of the month and leap
years) is 4801 characters long.
Since these values themselves are hierarchical, one could model the example
as:
<whole>4</whole><part>50</part>
but with the separate elements
<year>1998</year><month>02</month><day>29</day>
one couldn't know the date was invalid without some concept of the three
elements together representing a valid date.
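The point about needing the whole set can be sketched in Python: each field
below is well-formed on its own, and only the combination can be checked. This
is an illustration under the assumption that the Gregorian calendar rules in
Python's datetime module are the intended notion of a valid date; the helper
name is_valid_date is invented for the example.

```python
from datetime import date

def is_valid_date(year: str, month: str, day: str) -> bool:
    """Check the three element contents together as one calendar date."""
    try:
        # date() raises ValueError when the day does not exist in that month
        # and year; int() raises ValueError for non-numeric content.
        date(int(year), int(month), int(day))
    except ValueError:
        return False
    return True

if __name__ == "__main__":
    print(is_valid_date("1998", "02", "29"))  # 1998 is not a leap year
    print(is_valid_date("1996", "02", "29"))  # 1996 is a leap year
```

No per-element pattern on <day> alone can make this distinction, since "29" is
acceptable in some year and month combinations and not in others.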
>I think trying to define some set of types to be used in *all* XML
>documents is taking the wrong approach. I don't really see this as
>either workable or desirable. What would the point of using xml:type
>be?
Perhaps to abstract what is being expressed in markup to allow different
lexical expressions of the same value to be considered valid.
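A minimal sketch of that abstraction, in Python: two lexical forms are mapped
to one value, and comparison happens on the value. The function
parse_float_lexical is hypothetical, and it assumes only "." or "," as the
decimal separator, with no grouping separators or currency symbols.

```python
from decimal import Decimal

def parse_float_lexical(text: str) -> Decimal:
    """Map a lexical form using "." or "," as decimal separator to a value."""
    # Assumption: a comma is always a decimal separator, never a thousands
    # separator; real locale handling would need more context than this.
    return Decimal(text.strip().replace(",", "."))

if __name__ == "__main__":
    a = parse_float_lexical("1,5")
    b = parse_float_lexical("1.5")
    print(a == b)
```

Under this view "1,5" and "1.5" are different strings but the same float,
which a purely form-based constraint cannot express.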
............ Ken
--
G. Ken Holman mailto:gkholman at CraneSoftwrights.com
Crane Softwrights Ltd. http://www.CraneSoftwrights.com/x/
Box 266, V: +1(613)489-0999
Kars, Ontario CANADA K0A-2E0 F: +1(613)489-0995
Training: http://www.CraneSoftwrights.com/x/schedule.htm
Resources: http://www.CraneSoftwrights.com/x/resources.htm
Shareware: http://www.CraneSoftwrights.com/x/shareware.htm