Why XML data typing is hard (was Re: Internal subset equivalent in new schema proposals?)
G. Ken Holman
gkholman at CanadaMail.com
Sat Nov 28 11:39:32 GMT 1998
At 98/11/27 16:17 +0100, Ketil Z Malde wrote:
><david at megginson.com> writes:
>> The real question, though, is how constraints could be enforced.
>> Let's start with an extremely simple example:
>> <value xml:type="float"></value>
>Now you're adding type information to the content, what I suggested
>was to constrain *form*. For one thing, I would not specify this in
>the document (this is just a gut feeling, but why would you?), I would
>specify it in the DTD, e.g. like so:
> <!element value #REGEXP:"-?[0-9]*.[0-9][0-9]">
>(or some such, you get the point).
>> What are the allowed contents?
>Then the document could contain
> <value>-0.01</value> or
> <value>1.0</value> or
Then the range of values your example allows is too narrow, because "4,50"
is a valid float from an I18N point of view.
In Canada, valid expressions of currency numbers are $1.47 or 1,47$ based
on where you are. The decimal separator is "." in English Canada and ","
in French Canada.
I understood David's point to be that two valid expressions of the same
float aren't lexically the same.
>> <value xml:type="float">1,5</value>
>> <value xml:type="float">1.5</value>
>This won't be a problem, if the DTD specifies what the processing
>software should expect. You could even validate processing software
>to some extent.
And I suppose your regular expression example could be changed to
<!element value #REGEXP:"-?[0-9]*(\.|,)[0-9][0-9]">
Currency values would need a larger expression.
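As a rough illustration of the expression above, here is a minimal Python sketch. Python's `re` syntax stands in for the hypothetical #REGEXP notation, and the function name `matches_value_form` is made up for this example. It checks lexical form only and says nothing about which number is meant.

```python
import re

# The pattern from the hypothetical #REGEXP declaration above:
# optional sign, any digits, "." or "," as separator, exactly two digits.
VALUE_FORM = re.compile(r"-?[0-9]*(\.|,)[0-9][0-9]")

def matches_value_form(text: str) -> bool:
    """Return True if the whole string fits the lexical form."""
    return VALUE_FORM.fullmatch(text) is not None

# Both Canadian conventions now pass the same form check.
for candidate in ["-0.01", "4,50", "1.47", "1,47", "abc"]:
    print(candidate, matches_value_form(candidate))
```

Note that the form still demands exactly two digits after the separator, so a value written with one decimal digit is rejected even though it is a perfectly good number.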
I gather from Michael S-McQ in a presentation in Chicago that the regular
expression for a valid date (taking into account days of the month and leap
years) is 4801 characters long.
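For contrast, the same check is short once the date is split into its year, month and day parts and validated as a value rather than as a string. A minimal Python sketch, assuming an ISO-style YYYY-MM-DD lexical form (my assumption; nothing in this thread fixes a date format) and a made-up function name:

```python
import re
from datetime import date

# Lexical form only: four digits, two digits, two digits.
DATE_FORM = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")

def is_valid_date(text: str) -> bool:
    """Check the form with a regex, then the value with the calendar."""
    m = DATE_FORM.fullmatch(text)
    if m is None:
        return False
    year, month, day = (int(g) for g in m.groups())
    try:
        # date() knows month lengths and leap years and raises
        # ValueError for combinations that are not real dates.
        date(year, month, day)
    except ValueError:
        return False
    return True

print(is_valid_date("1996-02-29"))
print(is_valid_date("1998-02-29"))
```

The regular expression here only describes the shape; the knowledge of which day numbers belong to which month lives in the date type, not in the pattern.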
Since these values themselves are hierarchical, one could model the example
but one couldn't do that with the separate elements
and know it was invalid without the concept of the set representing a valid
>I think trying to define some set of types to be used in *all* XML
>documents is taking the wrong approach. I don't really see this as
>either workable or desirable. What would the point of using xml:type
Perhaps to abstract what is being expressed in markup to allow different
lexical expressions of the same value to be considered valid.
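To make that concrete, here is a small Python sketch of what an application might do with a "float" type: accept either decimal separator as the lexical form and compare the values rather than the strings. The function name and the choice of `Decimal` are assumptions for illustration, not anything proposed for xml:type.

```python
import re
from decimal import Decimal

# Accepted lexical forms: optional sign, digits, and optionally
# a "." or "," separator followed by more digits.
FLOAT_FORM = re.compile(r"-?[0-9]+([.,][0-9]+)?")

def parse_float_value(text: str) -> Decimal:
    """Map either lexical form onto one value."""
    s = text.strip()
    if FLOAT_FORM.fullmatch(s) is None:
        raise ValueError(f"not a recognised float form: {text!r}")
    # Treat "," as a decimal separator, as in French Canada.
    return Decimal(s.replace(",", "."))

# Different strings, same value.
print(parse_float_value("1,5") == parse_float_value("1.5"))
```

This sketch reads every comma as a decimal separator, so "1,500" comes out as one and a half rather than fifteen hundred. Resolving that needs locale information the string alone does not carry.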
G. Ken Holman mailto:gkholman at CraneSoftwrights.com
Crane Softwrights Ltd. http://www.CraneSoftwrights.com/x/
Box 266, V: +1(613)489-0999
Kars, Ontario CANADA K0A-2E0 F: +1(613)489-0995
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)