Strong Typing in SGML and XML

Peter Murray-Rust Peter at
Wed May 7 12:40:44 BST 1997

Hi Alan, 
	Thanks very much for your contribution. It raises several points.

In message  <9705071041.aa05447 at> Alan Spencer writes:
> In message <6259 at>Peter writes:
> Hi,
> I'm certainly not an authority on regular expressions, but I have been using
> the one in perl for many years now and I find it meets all of my requirements.
> It can be a bit messy (but aren't all regular expressions!). I'm
> sure most of you know how it works, it is quite like the one outlined above.
> It may be too complicated for what is necessary, as I'm sure that is a goal
> here, to make things as simple as possible and only as complicated as necessary.

Tim Bray (ERB) has been looking for an RE tool for XML.  The point is (I
think, Tim) that they're not trivial to write and that it's critical that
everyone uses the same one.  So we don't want to build into XML an RE that
isn't easily available.  If someone says, 'here's one in 
Java/C/Scheme/what/ever' that has no copyright restrictions, I think we'd
make progress.
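As an illustration of the kind of constraint under discussion, here is a
minimal sketch in Python, whose re module uses Perl-style syntax. The
telephone-number pattern and the name is_valid_phone are hypothetical; this
is not the pattern proposed on the list. The point it shows is that a
constraint must match the whole attribute value, not just part of it.

```python
import re

# Hypothetical constraint for a "telephone number" attribute value:
# an optional leading "+", then groups of digits separated by single
# hyphens or spaces. Illustration only, not a pattern from the list.
PHONE = re.compile(r"\+?[0-9]+(?:[- ][0-9]+)*")


def is_valid_phone(value: str) -> bool:
    """Return True only if the entire value matches the constraint."""
    # fullmatch anchors at both ends, so stray trailing characters fail
    return PHONE.fullmatch(value) is not None


if __name__ == "__main__":
    for v in ["+44 1234 567890", "01234-567890", "0123x4567"]:
        print(v, is_valid_phone(v))
```

Using fullmatch rather than search is the design choice that matters here:
a partial match would let malformed values through.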
> The ideas may need to be changed a bit, but the underlying structure is
> definitely there, the 'telephone number' example would be similar to that
> suggested.
> What is the plan as regards things not matching the constraints, I presume
> it is just a strict error, i.e. not a valid XML document. Are there any plans
> to allow some flexibility in the rules, so as to make corrupted data, for example,
> parseable, as is the case with HTML, most browsers are fairly smart when
> it comes to 'guessing'. This *is* a bad thing most of the time in HTML, as
> it promotes guess-work on the part of the inexperienced author. I have 
> experienced this with co-workers using WYSIWYG editors - 'It looks good
> on my computer, what's wrong with yours'. So I suggest this very lightly,
> I don't want to promote that.

This matter has been discussed at very great length on the WG and the ERB
is closing in on a position.  ERB, I think it could be very useful to 
cross-post your position here (or modify it appropriately).
The treatment of errors is an extremely important issue, but it will not
be profitable to discuss it till the ERB has pronounced.  I would also
ask XML-DEV to accept that the ERB position has required much midnight oil
and to try not to repeat the discussions on XML-WG.

> As regards the strong typing, could there be generic types which a particular
> application/style would define, or which could even go undefined throughout? There
> are applications which work with arbitrary precision calculations (like calc
> on UNIX); these would need a generic *real* type. For example,
> I have an interest in mathematical formatting, similar to that done by LaTeX,
> but with a more structured approach, i.e. these documents could be parsed as
> formatted text or as real mathematical equations/functions/...
> For example, in TeX the code: "x^{ijk}_{lmn}" will produce:
>       ijk
>      x
>       lmn
> This doesn't define what this *means*, just what it looks like, it could be
> powers/indices.... So if I were to try to define a generic variable *x* and
> add the functionality to it, it would make sense.
> If I am actually making sense myself, any input on this would be helpful.

You are!

I have been asking for some time for 'parsable math', i.e. something that
can be input to a machine, rather than being typeset for a human.  I 
accept that math is a wide spectrum and covers everything from research
maths papers to teaching 3 year-olds.  I doubt that a single DTD will
cover this.

The W3C group on math will report on May 15 (HTML-MATH).  This will be 
XML-compatible.  The group is aware of the need for interoperability with
other DTDs and the need for 'parsable' math.  I believe they will cross-post
to this list.
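Alan's generic *real* type for arbitrary-precision work can be sketched as
follows, assuming (hypothetically) that its lexical form is a plain decimal
string and that the application, not the markup, chooses the working
precision. It uses Python's decimal module; parse_real and divide are
illustrative names, not part of any proposal.

```python
from decimal import Decimal, InvalidOperation, localcontext


def parse_real(text: str) -> Decimal:
    """Parse the lexical form of a hypothetical generic 'real' value exactly."""
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        raise ValueError(f"not a real number: {text!r}") from None
    if not value.is_finite():  # reject NaN and Infinity
        raise ValueError(f"not a finite real number: {text!r}")
    return value


def divide(a: Decimal, b: Decimal, digits: int) -> Decimal:
    """Divide using a precision chosen by the application, not the markup."""
    with localcontext() as ctx:
        ctx.prec = digits  # number of significant digits in the result
        return a / b


if __name__ == "__main__":
    x = parse_real("1")
    y = parse_real("3")
    print(divide(x, y, 10))  # 0.3333333333
    print(divide(x, y, 40))
```

Parsing is exact; precision only enters when arithmetic is done, which keeps
the type itself independent of any one application's needs.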

> As far as I see it, if there were generic types, or maybe *a* generic type,
> people could extend the basic types using styles to add the necessary
> functionality to these types.
> I'm not sure if I am starting to tend towards a type of programming language, but
> what the hell.

I think it's very important to make sure that none of us re-invent the wheel.
I like Tim's idea of mining SQL for elements, not because I like SQL 
(I don't much) but because lots of people have thought hard about it.  For the
same reason I have suggested that dates be ISO 8601 compatible, because
the authors of that have thought of most of the problems.  Similarly *if*
any of the math groups working on DTDs come up with recommendations we
should treat them very seriously.  If there is (and there must be) a 
standard for string representations of the types described here, let's
use that.
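A minimal sketch of reusing ISO 8601 rather than inventing a date format,
assuming only the extended calendar form YYYY-MM-DD is allowed (ISO 8601
also defines basic, week and ordinal forms, which this sketch rejects).
parse_iso_date is a hypothetical name; date.fromisoformat is from the Python
standard library (3.7 and later).

```python
import re
from datetime import date

# Restrict input to the extended calendar-date form YYYY-MM-DD so the
# accepted syntax does not vary between Python versions.
_EXTENDED_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_iso_date(text: str) -> date:
    if _EXTENDED_DATE.fullmatch(text) is None:
        raise ValueError(f"not a YYYY-MM-DD date: {text!r}")
    # fromisoformat also checks the calendar, so 1997-02-30 is rejected
    return date.fromisoformat(text)


if __name__ == "__main__":
    print(parse_iso_date("1997-05-07"))  # 1997-05-07
```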


Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences

xml-dev: A list for W3C XML Developers
Archived as:
To unsubscribe, send to majordomo at the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at
