Strong Typing in SGML and XML

Peter Murray-Rust Peter at
Tue May 6 11:29:53 BST 1997

In message < at> Tim Bray writes:
> Ever since about 15 minutes after SGML was born, database people have been
> discovering, to their surprise, that it contains no facilities for
> strong data typing.  You can have an element named <BIRTH-DATE>, and
> SGML will have no problem accepting 
> <birth-date>purple bananas rule</birth-date>.
> Whenever more than two people start talking about the future of SGML, 
> someone starts complaining about typing.  With the advent of XML, the 
> volume has increased.  As an old database guy, I've been one of the loud 
> complainers.  

I agree fully with this proposal.  It also highlights one of the essential
aspects of XML-DEV, which is going to come up repeatedly: there are things
that the ERB/WG is going to consider in the future, but people want ways
forward right now.  XML-DEV provides a forum so that:
	- people can find what previous approaches already exist
	- groups of people can point in the same direction if they wish to
	- problems can be identified before the ERB/WG process, making that
		faster and more effective.

This is an area which I've had to address in CML.  CML uses strong data typing,
but the scheme is one I made up myself.  It has STRING, INTEGER, FLOAT, DATE and
various others that XML-LINK has made obsolete.  So it's very easy to change to
the approach suggested here.
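For readers who have not met typed element content before, a check of this kind might look roughly like the following.  This is a minimal Python sketch, not CML's actual code: the type names come from the paragraph above, while `check_content` and the lexical rule for each type (Python's own `int`/`float` syntax, ISO 8601 `YYYY-MM-DD` for DATE) are assumptions made purely for illustration.

```python
from datetime import date

# Hypothetical lexical checks for CML-style content types.
# Each returns True if the element's text content is acceptable.

def _is_int(text: str) -> bool:
    try:
        int(text)
        return True
    except ValueError:
        return False

def _is_float(text: str) -> bool:
    try:
        float(text)
        return True
    except ValueError:
        return False

def _is_date(text: str) -> bool:
    # Assumption: DATE means an ISO 8601 calendar date, YYYY-MM-DD.
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False

CHECKS = {
    "STRING": lambda text: True,  # any character data is a valid string
    "INTEGER": _is_int,
    "FLOAT": _is_float,
    "DATE": _is_date,
}

def check_content(datatype: str, text: str) -> bool:
    """Return True if `text` is valid content for `datatype`."""
    return CHECKS[datatype](text.strip())
```

With this, `check_content("DATE", "purple bananas rule")` is False, which is exactly the case SGML accepts without complaint.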

Wherever possible, concepts should be reused, and I like the use of SQL.
(I don't like *SQL*, but that's a different matter.)  I'm assuming, Tim,
that some of the proposal was carried over nearly verbatim from the SQL
standard, because parts of it are slightly opaque to those who don't know it.

> While we're really not ready for this on the WG, it is something that
> we're going to have to do something about before too long.  So I've posted
> a modest proposal at:

Good start.  I don't think it needs expanding in scope, just some reworking
in places.

> Overview points:
> 1. This only types elements, not attributes.  It's easier.

Agree 100%.  I started with typed attributes and there is an enormous amount
of work in managing them as well as typed content.  You have to be able to
search them, transform them (at least in CML), qualify them with attributes
and so on.
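To make the element-only restriction concrete, here is a hypothetical sketch using Python's standard `xml.etree.ElementTree`.  The `element_types` mapping and the `type_errors` function are invented for illustration, and only two types are implemented; attribute values are never examined at all.

```python
import xml.etree.ElementTree as ET
from datetime import date

def _valid(datatype: str, text: str) -> bool:
    """Minimal checks; any other type name is treated as unconstrained text."""
    try:
        if datatype == "INTEGER":
            int(text)
        elif datatype == "DATE":
            date.fromisoformat(text)  # assumes ISO 8601 YYYY-MM-DD
        return True
    except ValueError:
        return False

def type_errors(xml_text: str, element_types: dict) -> list:
    """List (element name, content) pairs whose content fails its type.

    Only element content is checked; attributes are ignored.
    Assumes typed elements hold simple text (no child elements).
    """
    errors = []
    for elem in ET.fromstring(xml_text).iter():
        datatype = element_types.get(elem.tag)
        text = (elem.text or "").strip()
        if datatype and not _valid(datatype, text):
            errors.append((elem.tag, text))
    return errors
```

For example, in `<person born="whenever"><birth-date>purple bananas rule</birth-date><age>42</age></person>` only the `birth-date` content is reported; the untyped `born` attribute passes unchecked.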

> 2. It's based on SQL types, not HyTime lextypes.  That's what the
>    database world is used to.  This could probably be implemented
>    using lextypes.

What you have seems fine.  I assume that it is virtually an automatic ...
> 3. The syntax for dates and so on should match some ISO standard,
>    but I haven't found which one yet.

Do you mean that there are several and you haven't decided between them?
I thought that people had converged on a single one (I can't remember
the number; it's something like 8601).
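One practical argument for fixing on the ISO 8601 extended date format (YYYY-MM-DD): with four-digit years, plain string comparison orders dates chronologically, so software that knows nothing about the type still sorts them correctly.  A small Python sketch, assuming Python 3.7+ for `date.fromisoformat`; the helper names are hypothetical.

```python
from datetime import date

def from_iso(text: str) -> date:
    # date.fromisoformat (Python 3.7+) accepts the YYYY-MM-DD form.
    return date.fromisoformat(text.strip())

def to_iso(d: date) -> str:
    return d.isoformat()

def sorts_chronologically(texts: list) -> bool:
    """True if the lexical order of the strings matches their date order."""
    by_text = sorted(texts)
    by_date = [to_iso(d) for d in sorted(from_iso(t) for t in texts)]
    return by_text == by_date
```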

Detailed points:

I don't find SQLSIZE 'obvious': it's essentially the maximum character-string
length, and if we were starting from scratch it should be something more like
SQLMAXLENGTH.  But if everyone uses it and learns to love it, I suppose we
have to live with it.
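Read this way, SQLSIZE behaves like the n in SQL's CHARACTER VARYING(n): an upper bound on the number of characters.  A tiny Python sketch of that interpretation; the function name is hypothetical, and counting Unicode code points with `len()` is only one possible definition of "character".

```python
def within_sqlsize(text: str, sqlsize: int) -> bool:
    """Hypothetical check: SQLSIZE read as a maximum length in characters,
    like the n in SQL CHARACTER VARYING(n). len() counts code points."""
    if sqlsize < 0:
        raise ValueError("SQLSIZE must be non-negative")
    return len(text) <= sqlsize
```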

In box 2 you have XML-MIN; I assume this is a typo.

I found SIZE, MIN and MAX very confusing.  I *think* that the text is
correct, but it's very easy to get lost.  Are we stuck with these?

4.5  Presumably SQLMIN <= SQLMAX, etc.?
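A sketch of how a processor might enforce both the value bounds and the SQLMIN <= SQLMAX consistency condition.  The attribute names are from the proposal; the semantics (inclusive bounds, either bound optional, decimal comparison) and the function `in_range` are assumptions.

```python
from decimal import Decimal, InvalidOperation

def in_range(text, sqlmin=None, sqlmax=None) -> bool:
    """Hypothetical check of numeric content against SQLMIN/SQLMAX.

    Either bound may be omitted (None). Raises ValueError if the bounds
    themselves are inconsistent, i.e. SQLMIN > SQLMAX.
    """
    if sqlmin is not None and sqlmax is not None and Decimal(sqlmin) > Decimal(sqlmax):
        raise ValueError("SQLMIN must not exceed SQLMAX")
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        return False  # not a number at all
    if not value.is_finite():
        return False  # reject NaN and infinities
    if sqlmin is not None and value < Decimal(sqlmin):
        return False
    if sqlmax is not None and value > Decimal(sqlmax):
        return False
    return True
```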

4.6 The reference to SQL SCALE was unclear.  Is there a requirement for
SQLSCALE as well, or does this simply need rewriting?
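For anyone unfamiliar with the SQL terms: in SQL, DECIMAL(p, s) and NUMERIC(p, s) take a precision p (total significant digits) and a scale s (digits after the decimal point), so DECIMAL(5, 2) holds values from -999.99 to 999.99.  If a SQLSCALE were added alongside a precision, a check might look like this Python sketch.  The function is hypothetical, and it takes the strict view of rejecting values that would need rounding; SQL implementations may instead round or truncate excess fractional digits.

```python
from decimal import Decimal, InvalidOperation

def fits_decimal(text: str, precision: int, scale: int) -> bool:
    """True if text fits SQL DECIMAL(precision, scale) without rounding."""
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        return False
    if not value.is_finite():
        return False
    if value.is_zero():
        return True
    sign, digits, exponent = value.as_tuple()
    digits = list(digits)
    # Drop trailing fractional zeros, so "1.230" counts as two fractional digits.
    while exponent < 0 and len(digits) > 1 and digits[-1] == 0:
        digits.pop()
        exponent += 1
    frac_digits = max(0, -exponent)
    int_digits = max(0, len(digits) + exponent)
    return frac_digits <= scale and int_digits <= precision - scale
```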

4.7 I am not happy without exponential notation.  For example, do we
really have to represent Avogadro's number (6.023E+23) as
602300000000000000000000?  Surely we can use IEEE notation?

Is equality defined/definable for floating point?
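Both questions in 4.7 can be illustrated in a few lines of Python.  Python's `float()` already accepts E notation, and since both spellings of Avogadro's number denote exactly the same real number, they parse to the identical double.  Exact `==` on computed results is another matter; one common convention, shown here as an assumption rather than anything the proposal specifies, is comparison within a relative tolerance via `math.isclose`.  The helper names are hypothetical.

```python
import math

def parse_float(text: str) -> float:
    """Parse FLOAT content, accepting exponential notation such as 6.023E+23."""
    return float(text.strip())

def floats_equal(a: float, b: float, rel_tol: float = 1e-9) -> bool:
    """Compare within a relative tolerance instead of using exact ==."""
    return math.isclose(a, b, rel_tol=rel_tol)
```

Here `parse_float("6.023E+23") == parse_float("602300000000000000000000")` holds, while `0.1 + 0.2 == 0.3` is False but `floats_equal(0.1 + 0.2, 0.3)` is True.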

4.8 I go along with 8601 or whatever it is.  That also defines TIME.

4.9 SQLSIZE was bad enough before.  Overloading it to manage the timezone
is really horrible.  Is this not defined in 8601?  If so, we can use that.
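ISO 8601 does allow a time to carry a UTC offset such as +01:00 as part of the value itself, so no separate attribute is needed for it.  A Python sketch, assuming Python 3.7+ (where `time.fromisoformat` accepts `HH:MM:SS` with an optional `+HH:MM` offset); `parse_time` and `offset_minutes` are hypothetical helpers.

```python
from datetime import time, timedelta

def parse_time(text: str) -> time:
    """Parse an ISO 8601 extended time, optionally with a UTC offset."""
    return time.fromisoformat(text.strip())

def offset_minutes(t: time):
    """UTC offset in minutes, or None for a local time with no offset."""
    offset = t.utcoffset()
    return None if offset is None else offset // timedelta(minutes=1)
```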

4.10 Again I think this is covered by the ISO standard.

But this is an excellent start.  Again I raise the idea that XML should
introduce Generally Accepted Conventions.  This could be one.  
Later it might become part of the standard.  This way we help point people
in the right direction.

We have a lot of readers of XML-DEV.  This sort of area is an excellent one
to be contributing to.  Volunteers to summarise resources of this sort
(e.g. pointers to the ISO data standard, SQL datatyping, etc.) would be 
much appreciated.


> Cheers, Tim Bray
> tbray at +1-604-708-9592
> xml-dev: A list for W3C XML Developers
> Archived as:
> To unsubscribe, send to majordomo at the following message;
> unsubscribe xml-dev
> List coordinator, Henry Rzepa (rzepa at
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
