Why XML data typing is hard

Fredrik Lindgren f.lindgren at upright.se
Mon Dec 7 09:58:12 GMT 1998



Joel Bender wrote:

[snip]

> 
> OK, so keep it out of SAX.  The work of translating "1.5" into 1.5 has to
> get done someplace, is done in a very similar way by lots of applications,
> and seems ripe for standardization.  IMHO, it doesn't seem like a huge leap
> to go from XML documents that contain just text to ones that contain atomic
> types (boolean, integer, float for starters).
> 
I always thought that the problem was about defining the textual
representations of values that should be understood by an XML processor,
and therefore used by whoever or whatever writes the document. I don't see
the point in just declaring that an element is a boolean value when it can
be represented by (true/false), (1/0) or (yes/no).
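A minimal Python sketch of the boolean case. It assumes a hypothetical agreed table of spellings; the names BOOLEAN_LEXICAL_FORMS and parse_boolean are made up for illustration. Three different strings collapse to one value, and anything outside the table is rejected. Case-insensitive matching and whitespace stripping are further assumptions.

```python
# Hypothetical agreed table: every accepted spelling maps to one of two values.
BOOLEAN_LEXICAL_FORMS = {
    "true": True, "false": False,
    "1": True, "0": False,
    "yes": True, "no": False,
}


def parse_boolean(text: str) -> bool:
    """Map a textual representation to a boolean value.

    Stripping surrounding whitespace and ignoring case are assumptions,
    not rules taken from any specification.
    """
    key = text.strip().lower()
    try:
        return BOOLEAN_LEXICAL_FORMS[key]
    except KeyError:
        raise ValueError(f"not a recognised boolean representation: {text!r}") from None


# Three different strings, one value.
print([parse_boolean(s) for s in ("true", "1", "yes")])  # [True, True, True]
```

Declaring "this element is boolean" only becomes useful once the table itself is agreed on.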

The same problem arises with integers: "255" and "FF" are both strings
representing the same value (in decimal and in hexadecimal). To my
knowledge both need to be parsed before they can be used. This could be
hidden by the programming language you use, but the XML document should
IMO be language independent.
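"255" read as decimal and "FF" read as hexadecimal denote the same value, but only if the reader knows which notation was used. A minimal Python sketch, assuming the radix is declared out of band: parse_integer and its base parameter are hypothetical, and XML 1.0 defines no such declaration. Python's built-in int(text, base) does the actual parsing.

```python
def parse_integer(text: str, base: int = 10) -> int:
    """Parse an integer whose notation (radix) is declared separately.

    `base` stands in for a hypothetical notation declaration carried
    alongside the document; the string alone does not say which radix it uses.
    """
    return int(text.strip(), base)


print(parse_integer("255") == parse_integer("FF", base=16))  # True

# The same string means different values under different declared notations.
print(parse_integer("15"), parse_integer("15", base=16))  # 15 21
```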

Regexp patterns might be one way to specify the allowed representations,
but the real need IMO is to specify what a representation means and, for
interoperability, to agree on standard representations for some set of
data types.
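One way to keep "allowed representation" and "meaning" apart is to describe a data type as a pair: a regexp over the text and a function from matching text to a value. Below is a Python sketch under that assumption. The Datatype class is hypothetical, and using a float as the value of a decimal string is a simplification.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Datatype:
    """Hypothetical datatype description: which strings are allowed (the
    pattern) and what each allowed string means (to_value)."""
    name: str
    pattern: "re.Pattern[str]"
    to_value: Callable[[str], Any]

    def parse(self, text: str) -> Any:
        # fullmatch: the whole content must match, not just a prefix of it.
        if self.pattern.fullmatch(text) is None:
            raise ValueError(f"{text!r} is not a valid {self.name}")
        return self.to_value(text)


# The regexp says which spellings are allowed; float says what they mean.
number = Datatype("number", re.compile(r"-?[0-9]+(\.[0-9]+)?"), float)

print(number.parse("1.50") == number.parse("1.5"))  # True
print(number.parse("1.5") + number.parse("2"))       # 3.5
```

Because "1.50" and "1.5" map to the same value, two applications that agree on to_value agree on meaning even when writers pick different spellings.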

I think it is important that such standard representations of data types
be based on intended meaning and not on computer representation. An
integer data type might need to be specified with a min and a max value;
it should not be specified as "a 32-bit integer". If data typing
functionality were added to the XML family of standards in the future, the
processor could read these boundaries and decide how to store the value
internally.
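A sketch of a type declared only by its bounds, with the storage decision left to the processor. The IntegerType class and the storage_bits policy (the smallest of 8/16/32/64 bits, two's complement when negatives are allowed) are assumptions for illustration. A different processor could choose differently without changing the meaning of the type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IntegerType:
    """An integer type described by intended meaning: inclusive bounds only."""
    min_value: int
    max_value: int

    def check(self, value: int) -> int:
        # Validation depends only on the declared bounds, never on a bit width.
        if not self.min_value <= value <= self.max_value:
            raise ValueError(f"{value} is outside [{self.min_value}, {self.max_value}]")
        return value


def storage_bits(t: IntegerType) -> Optional[int]:
    """One possible processor policy: the smallest of 8/16/32/64 bits that can
    hold every value in the range (two's complement if negatives are allowed).
    Returns None when no fixed width fits, i.e. arbitrary precision is needed."""
    signed = t.min_value < 0
    for bits in (8, 16, 32, 64):
        if signed:
            lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
        else:
            lo, hi = 0, (1 << bits) - 1
        if lo <= t.min_value and t.max_value <= hi:
            return bits
    return None


percentage = IntegerType(0, 100)
year = IntegerType(-9999, 9999)
huge = IntegerType(0, 10**20)
print(storage_bits(percentage), storage_bits(year), storage_bits(huge))  # 8 16 None
```

If the bounds are later widened, only the storage choice changes; the documents and what their values mean stay the same.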


> I assume that date formats have even more variations than numbers, at least
> until there is agreement on a stardate!  So stick with simple things like
> binding a regexp pattern to content.  There will be debates about a date
> being an atomic type or a structure (I tend to think of them as integers
> with a really bad number base).  There shouldn't be any need for structure
> parsing because structures will already be described by the XML document
> being parsed.
> 
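Joel's remark that dates are integers with a really bad number base can be taken literally. The following is a minimal Python sketch. It assumes ISO 8601 YYYY-MM-DD dates, a format choice not made anywhere in this thread. A regexp bound to the content checks the representation, and the standard datetime module does the irregular month and leap-year arithmetic. The function names are hypothetical.

```python
import re
from datetime import date

# Hypothetical pattern bound to content: YYYY-MM-DD calendar dates only.
DATE_PATTERN = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def date_to_day_number(text: str) -> int:
    """Treat a date as one integer: a day count in which 0001-01-01 is day 1.

    The pattern checks the representation; date() checks the meaning
    (it rejects 1998-02-30) and toordinal() does the mixed-radix
    arithmetic of months and leap years."""
    m = DATE_PATTERN.fullmatch(text)
    if m is None:
        raise ValueError(f"not a YYYY-MM-DD date: {text!r}")
    year, month, day = (int(g) for g in m.groups())
    return date(year, month, day).toordinal()


def time_to_seconds(hours: int, minutes: int, seconds: int) -> int:
    # Time of day is a regular mixed radix: 24 : 60 : 60.
    return (hours * 60 + minutes) * 60 + seconds


# Once dates are integers, comparison and subtraction are trivial.
print(date_to_day_number("1998-12-07") - date_to_day_number("1998-11-30"))  # 7
print(time_to_seconds(9, 58, 12))  # 35892
```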
> > I would have thought it would be simple, but then again,
> > I'm culturally biased, and hadn't read the Unicode regexp
> > document. Oh horror!
> 
> :-)  It looks 'hard', but doesn't seem like there's any more real
> complexity than what hasn't already been solved by some very talented
> folks.  In particular it is written...
> 
> > (Regular expression syntax varies widely: the issues discussed
> > here would need to be adapted to the syntax of the particular
> > implementation.)
> 
> This pattern definition/association document (this beast needs a name!) can
> make all that hand wringing and the "levels of support" go away.  No need
> for funky escape characters, escaped escape characters, misinterpretation
> of parens, brackets, braces, stars...gag!
> 
> Here is a start...
> 
>         <set id="letter">
>             ABCDEFGHIJKLMNOPQRSTUVWXYZ
>         </set>
>         <set id="digit">
>             0123456789
>         </set>
>         <set id="special">
>             _$                          <!-- $ is a VMS thing -->
>         </set>
>         <set id="namechar">
>             <set idref="letter"/>
>             <set idref="special"/>
>         </set>
> 
>         <token id="name">
>             <set idref="namechar"/>
>             <group optional="1" repeatable="1" disjunction="1">

What does the attribute value "1" represent?

>                 <set idref="namechar"/>
>                 <set idref="digit"/>
>             </group>
>         </token>
> 
>         <pattern id="namevalue">
>             <token idref="name"/>
>             <s ignore="1"/>             <!-- 's' is whitespace -->
>             <token idref="AttValue"/>   <!-- from XML spec -->
>         </pattern>
> 
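On the "1" question: one plausible reading is that "1" is simply a true flag for optional, repeatable and disjunction. The sketch below rests on that assumption and on a second one, that layout whitespace in set text is ignored. Under those assumptions the definitions above can be compiled mechanically into an ordinary regexp, which shows how the pieces could map onto familiar regexp constructs. The PatternCompiler class is hypothetical. The pattern/s part is left out because AttValue is defined in the XML spec rather than here.

```python
import re
import xml.etree.ElementTree as ET

# Joel's <set> and <token> definitions, with the VMS comment dropped.
DEFINITIONS = """
<defs>
  <set id="letter">ABCDEFGHIJKLMNOPQRSTUVWXYZ</set>
  <set id="digit">0123456789</set>
  <set id="special">_$</set>
  <set id="namechar"><set idref="letter"/><set idref="special"/></set>
  <token id="name">
    <set idref="namechar"/>
    <group optional="1" repeatable="1" disjunction="1">
      <set idref="namechar"/>
      <set idref="digit"/>
    </group>
  </token>
</defs>
"""


def flag(elem, name):
    # Assumption: the attribute value "1" means true; anything else is false.
    return elem.get(name) == "1"


class PatternCompiler:
    """Hypothetical compiler from <set>/<token>/<group> definitions to Python regexps."""

    def __init__(self, xml_text):
        root = ET.fromstring(xml_text)
        # Index every defining element by its id attribute.
        self.by_id = {e.get("id"): e for e in root.iter() if e.get("id")}

    def set_chars(self, elem):
        """All characters of a set: its own text (whitespace ignored) plus referenced sets."""
        if elem.get("idref"):
            return self.set_chars(self.by_id[elem.get("idref")])
        chars = "".join((elem.text or "").split())
        for child in elem:
            chars += self.set_chars(child)
            chars += "".join((child.tail or "").split())
        return chars

    def fragment(self, elem):
        """Regexp fragment for a <set>, <token> or <group> element."""
        if elem.tag == "set":
            chars = self.set_chars(elem)
            if not chars:
                raise ValueError("empty <set>")
            # One character drawn from the set; re.escape keeps "$" literal.
            return "[" + "".join(re.escape(c) for c in chars) + "]"
        if elem.tag == "token" and elem.get("idref"):
            return self.fragment(self.by_id[elem.get("idref")])
        parts = [self.fragment(child) for child in elem]
        joiner = "|" if flag(elem, "disjunction") else ""
        body = "(?:" + joiner.join(parts) + ")"
        if elem.tag == "group":
            # optional + repeatable = zero or more, repeatable = one or more,
            # optional = zero or one.
            key = (flag(elem, "optional"), flag(elem, "repeatable"))
            body += {(True, True): "*", (False, True): "+", (True, False): "?"}.get(key, "")
        return body

    def matcher(self, token_id):
        return re.compile(self.fragment(self.by_id[token_id]))


name = PatternCompiler(DEFINITIONS).matcher("name")
print([bool(name.fullmatch(s)) for s in ("SYS$LOGIN", "A1_B2", "9LIVES", "lower")])
# [True, True, False, False]
```

"lower" fails only because the letter set above lists uppercase letters; nothing in the notation itself prevents adding lowercase ones.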
> BTW, if I'm not using 'id' and 'idref' correctly, please forgive me, I'm
> still very new at this!  I'd be happy to take more discussion off-line if
> it doesn't belong in xml-dev.  In the meantime I'll draft a DTD of this
> for feedback.
> 
> Joel
> 
> xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
> Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
> To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
> (un)subscribe xml-dev
> To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
> subscribe xml-dev-digest
> List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
