joel at spooky.emcs.cornell.edu
Tue Dec 8 23:10:25 GMT 1998
Fredrik Lindgren writes:
> I always thought that the problem was about defining the
> textual representations of values that should be understood
> by an XML-processor and therefore used by who/what-ever writes
> the document.
There are really two problems: (1) what patterns of character strings are
acceptable for content (attribute value or element content), and (2) how
should those patterns be mapped to atomic types?
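To make the split concrete, here is a minimal sketch in Python. The pattern and the names INTEGER_PATTERN, is_acceptable, and to_atomic are made up for illustration; nothing here comes from a spec.

```python
import re

# Problem (1): which character strings are acceptable as content.
# Assumed pattern for illustration: an optional sign followed by decimal digits.
INTEGER_PATTERN = re.compile(r"[+-]?[0-9]+")


def is_acceptable(text):
    """Problem (1): does the whole string match the agreed pattern?"""
    return INTEGER_PATTERN.fullmatch(text) is not None


def to_atomic(text):
    """Problem (2): map an accepted string to an atomic (integer) value."""
    if not is_acceptable(text):
        raise ValueError("not an acceptable integer string: %r" % text)
    return int(text)
```

Keeping the two functions separate means the acceptable pattern can be tightened or loosened without touching the mapping to the atomic type.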
> I don't see the point in just defining that an element is a
> boolean value when this can be represented by (true/false),
> (1/0) and (yes/no).
Correct: both the 'type' of value and the 'encoding' of the value must be
specified. In the world of HVAC, booleans are also (set/reset),
(open/closed), (enabled/disabled), and (on/off). They all represent a
two-state condition, although some are represented as two-value
enumerations so the typical boolean operators cannot be applied.
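A rough sketch of "type plus encoding" for booleans, in Python. The table, the encoding names, and the choice of which word in each pair counts as true are all my assumptions for illustration.

```python
# Each encoding lists the string for the "true" state, then the "false" state.
# Which state counts as true (e.g. "open" vs. "closed") is an arbitrary
# assumption made here for illustration only.
BOOLEAN_ENCODINGS = {
    "true/false": ("true", "false"),
    "1/0": ("1", "0"),
    "yes/no": ("yes", "no"),
    "set/reset": ("set", "reset"),
    "open/closed": ("open", "closed"),
    "enabled/disabled": ("enabled", "disabled"),
    "on/off": ("on", "off"),
}


def parse_boolean(text, encoding):
    """Map a string to True/False using the named (hypothetical) encoding."""
    true_form, false_form = BOOLEAN_ENCODINGS[encoding]
    if text == true_form:
        return True
    if text == false_form:
        return False
    raise ValueError("%r is not valid in encoding %r" % (text, encoding))
```

Once every pair is mapped to a real boolean, the usual boolean operators apply again, which is not the case for a plain two-value enumeration.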
> The same problem arises when it comes to integers, "15" and "FF"
> are both strings representing the same value.
Almost :-). I've seen hex constants coded as 0xF, x'F', and %x"F"; I'm sure
there are others.
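For what it's worth, a small Python sketch that accepts just the three hex spellings mentioned above. HEX_FORMS and parse_hex are hypothetical names, and the patterns are my reading of those spellings, not taken from any standard.

```python
import re

# One pattern per spelling; the digits are captured in group 1.
HEX_FORMS = [
    re.compile(r"0x([0-9A-Fa-f]+)"),    # 0xF
    re.compile(r"x'([0-9A-Fa-f]+)'"),   # x'F'
    re.compile(r'%x"([0-9A-Fa-f]+)"'),  # %x"F"
]


def parse_hex(text):
    """Return the integer value of a hex constant in any accepted spelling."""
    for form in HEX_FORMS:
        match = form.fullmatch(text)
        if match:
            return int(match.group(1), 16)
    raise ValueError("unrecognized hex constant: %r" % text)
```

Every spelling added to the list is one more thing every receiver has to parse, which is the tradeoff in question.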
> To my knowledge both need to be parsed before they can be used.
Yes, some kind of parsing is necessary, someplace!
> This could be hidden by the programming language you use but the
> XML-document should IMO be language independent.
At some point a determination has to be made about what languages/patterns
will be acceptable. Should 'fifteen' also be supported? What are the
tradeoffs in parsing complexity vs. universal acceptance and language
independence? If the source and target of some communication using an XML
document are both humans, most of us are quite flexible (even if seeing
some form or other makes us irritated). Give me 'fifteen' in French and
I'll be just as lost as my parsers :-).
> Regexp patterns might be a way to go to specify the allowed
> representations, but the real need IMO is to specify what the
> representation means, and for interoperability reasons agree
> on standard representations for some set of data types.
Coming to a consensus on this point is a good first step! This moves the
discussion away from the "XML is only document markup and not a protocol"
camp. I'm not in the "XML as a replacement for CORBA" camp, but my
applications do need to interoperate with others I'm not writing (including
humans), so I'm a little closer to the latter than the former.
> I think it is important that such standard representations of
> data types should be based on intended meaning and not computer
To be a little simplistic, the "%4d" formatting specification in C and (I4)
in FORTRAN is such a standard. It defines both the character string
representation and how that is translated to and from an internal (machine
specific) form. The examples I've seen are culturally biased in that they
use Arabic numerals, but in theory it is localizable.
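A quick Python illustration of the "%4d" idea, since Python's % operator borrows C's printf-style conversions. The names write_i4 and read_i4 are made up, and rejecting values wider than four characters is my assumption (C would simply widen the field).

```python
def write_i4(value):
    """Internal integer -> character string, right-justified in 4 columns."""
    text = "%4d" % value
    # "%4d" is only a minimum width; refusing wider output is an assumption
    # made here so the field stays fixed-width.
    if len(text) > 4:
        raise ValueError("%d does not fit in 4 characters" % value)
    return text


def read_i4(text):
    """Character string -> internal integer; int() skips the padding blanks."""
    return int(text)
```

A value written with write_i4 reads back unchanged, e.g. read_i4(write_i4(15)) gives 15 again.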
>> <token id="name">
>> <set idref="namechar"/>
>> <group optional="1" repeatable="1" disjunction="1">
> What does the attribute value "1" represent?
>> <set idref="namechar"/>
>> <set idref="digit"/>
"1" means the group is optional, repeatable, etc. "0" means it is not
optional, not repeatable, etc.
I've tried to avoid design issues (which are mostly arbitrary). For
example, is "<group optional='1'>" better or worse than "<group
prop='optional'>"? Or should I say "<group optional='yes'>"? If we had a
way to specify that the optional attribute is a boolean, that would be a
help... :-) I would really like to say...
<group optional> ... </group>
But this is not supported in the current standard. (At some point I would
like to see support for attributes that don't have values, but that is for
another thread. I intend to read the archives for this discussion.)
Continuing the thread, James Tauber wrote:
> Agreed. And to encourage reuse of notations, I am going to set up
> a section on SCHEMA.NET as a repository for notation declarations.
> John Cowan's are a great start. Any more?
> Let's actually put together some for some ISO8601 formats.
Wheee! Count me in. :-)
> > I would like to see the following syntax:
> > <!ELEMENT OrderTime (#PCDATA) NDATA Iso8601_DateTime >
> when you can just have
> <!ELEMENT OrderTime (#PCDATA)>
> <!ATTLIST OrderTime
> notation NOTATION (Iso8601_DateTime) #FIXED "Iso8601_DateTime">
Can this be rolled along with parameters? Kinda like:
notation NOTATION (Iso8601_Float) #FIXED "Float(Precision,10,Scale,4)">
I've gotten confused about the format of the value of the notation
attribute. Isn't it supposed to be a URL or URN? Or perhaps this is only
supposed to apply to the notation description? I'm guessing here...
<!NOTATION Iso8601_Float PUBLIC "ISO 8601//NOTATION Float//EN">
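As a sketch of what an application might do with such a notation attribute, here is some Python that dispatches on its value. This is only my guess at a usage pattern: the NOTATION_PARSERS registry and typed_value are hypothetical, the attribute is written out in the instance rather than defaulted from a DTD, and datetime.fromisoformat handles only some ISO 8601 forms.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical registry: notation name -> function that interprets the content.
NOTATION_PARSERS = {
    "Iso8601_DateTime": datetime.fromisoformat,
}


def typed_value(element):
    """Interpret element content according to its 'notation' attribute."""
    parser = NOTATION_PARSERS.get(element.get("notation"))
    if parser is None:
        # Unknown or missing notation: hand back the raw character data.
        return element.text
    return parser(element.text)


order_time = ET.fromstring(
    '<OrderTime notation="Iso8601_DateTime">1998-12-08T23:10:25</OrderTime>'
)
order_value = typed_value(order_time)  # a datetime object, not a string
```

The notation name is just a key into an agreed table here; whether that key should be a URL, a URN, or a public identifier is exactly the open question.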
The layers of standards and references to standards (that may or may not be
working drafts of other standards) are driving me crazy! The XML spec says
"associated system and/or public identifiers, to be used in interpreting
the element to which the attribute is attached". From this I don't get the
impression that it is a URL or a URN, but "now for something completely
The idea that it can be "a reference into your schema or whatever you like"
as Charles Reitzel wrote doesn't thrill me. It does bring a smile to my
face when I think that I could simply say (which I know doesn't work)...
<!NOTATION integer "%d">
All these clues are getting closer! Thanks everyone for helping me through
Joel Bender Voice: 607-255-8880
Senior Programmer/Analyst FAX: 607-255-5377
Utilities Department 131 Humphreys Service Building
Cornell University Ithaca, NY 14853-3701
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/