Strong Typing in SGML and XML

Eric Albright eric_albright at
Wed May 7 05:46:25 BST 1997

First, I'd like to concur with the need for a formal specification for data

I had hoped that HyTime's lextype feature would be sufficient. I, for one,
would like to hear from the HyTime experts about how they would implement
the parallel data typing. There is no use reinventing a standard; it may
only need simplifying and explaining.

Having said that, I ask: when is strong data typing necessary? As far as I
can tell there is only one place where it is useful -- when the document is
being created or altered. There will always be data validation that cannot
be handled by data typing and as such must be delegated to a validating
application or a human. e.g.

As for comments about the proposal:

I would like to see a simplified version of the data types. It is very
important for databases to know the exact size in bytes that a data element
will occupy. SGML/XML deals with character strings and therefore does not
care. More important to me are the constraints on the data implied by a
given type. I think we need to determine the kinds of constraints that each
data type requires and allow for maximum flexibility without sacrificing
precision.

As far as I can tell, there are three basic types--character, numeric, and
temporal. Each type requires its own unique constraints:

CHARACTER - an alphabet, length constraint, content constraint (regular

NUMERIC - a maximum value, a minimum value, some type of rounding/precision

TEMPORAL - a maximum value, minimum value, (the maximum and minimum values
may be constrained in relation to the current value), some type of

I think that the CHARACTER data type should be able to specify the alphabet
and length constraint within the content constraint. However, some
modification to standard regular expression notation would be necessary.
I, for one, do not want to have to type
\([0-9][0-9][0-9]\)[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9] for a phone number.
Perhaps \([0-9](+3)\)[0-9](+3)-[0-9](+4) would be better.
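
For comparison, the counted-repetition shorthand above maps directly onto
the interval notation ({n} and {m,n}) that POSIX extended regular
expressions and most current regex engines already provide. The following is
only a minimal sketch in Python. It assumes that (+N) means exactly N
repetitions and that (*N) means up to N repetitions (the second reading is
taken from the airline example further down). The name to_standard_regex is
hypothetical and not part of any proposal.

```python
import re


def to_standard_regex(pattern: str) -> str:
    """Translate the proposed (+N) and (*N) suffixes into {N} and {0,N}.

    Assumption: (+N) = exactly N repetitions, (*N) = zero to N repetitions.
    This sketch does not handle a literal "\\(" that is immediately followed
    by "+N)" or "*N)".
    """
    pattern = re.sub(r"\(\+(\d+)\)", r"{\1}", pattern)
    pattern = re.sub(r"\(\*(\d+)\)", r"{0,\1}", pattern)
    return pattern


# The phone-number constraint from the text, in standard notation.
PHONE = to_standard_regex(r"\([0-9](+3)\)[0-9](+3)-[0-9](+4)")


def is_valid_phone(text: str) -> bool:
    """True when the whole string satisfies the phone-number constraint."""
    return re.fullmatch(PHONE, text) is not None
```

With this reading, the shorthand saves a few keystrokes over {3} but adds no
expressive power, so adopting the existing interval notation may be enough.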

To allow maximum flexibility and precision for numeric values, we should be
able to specify the form (roman/arabic) and a base. The rounding allows us
to constrain the significant digits to some factor of the base. A rounding
type would be needed for the greatest flexibility (round/ceiling/floor).
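
To make the rounding idea concrete, here is a rough sketch in Python of how
a validating application might apply a "round to" step with each of the
three methods. It covers base-10 arabic values only, and it assumes that
"round" means round-half-up; the function name round_to is hypothetical.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_UP

# Assumption: "round" is taken to mean round-half-up.
_METHODS = {
    "round": ROUND_HALF_UP,
    "ceiling": ROUND_CEILING,
    "floor": ROUND_FLOOR,
}


def round_to(value: Decimal, step: Decimal, method: str = "round") -> Decimal:
    """Round value to a multiple of step using the named method."""
    # Count how many whole steps the value contains, rounding that count
    # with the requested method, then scale back up.
    units = (value / step).to_integral_value(rounding=_METHODS[method])
    return units * step
```

For instance, with a step of 0.01 the value 12.345 becomes 12.35 under
"round" and "ceiling" and 12.34 under "floor".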

Temporal values can specify either an instant of time or an extent of time.
They should also be able to be rounded. When an instant is rounded, the
significant digits are to the left; when an extent is rounded, the
significant digits are to the right. To signify that an instant is precise
to the nearest five years, it would be rounded to 0005/00/00 00:00:00. To
signify that an extent is precise to the nearest tenth of a second, it
would be rounded by 0000/00/00 00:00:00.1 .
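
As an illustration only, the following Python sketch rounds extents and
instants to a fixed-length step such as a tenth of a second, a minute, or 15
minutes. It cannot express calendar-based steps such as "five years", which
have no fixed length. The names round_extent and round_instant are
hypothetical, and rounding halves upward is an assumption.

```python
from datetime import datetime, timedelta


def round_extent(extent: timedelta, step: timedelta) -> timedelta:
    """Round a duration to the nearest multiple of step (halves round up)."""
    whole, remainder = divmod(extent, step)  # (int, timedelta)
    if 2 * remainder >= step:
        whole += 1
    return whole * step


def round_instant(instant: datetime, step: timedelta) -> datetime:
    """Round a point in time to the nearest multiple of step.

    The instant is measured from datetime.min (0001-01-01 00:00:00), so
    steps of whole seconds, minutes, or hours line up with clock boundaries.
    """
    return datetime.min + round_extent(instant - datetime.min, step)
```

Rounded to the nearest minute, 1997-05-07 05:46:25 becomes 05:46:00; an
extent of 2 hours 8 minutes rounded to the nearest 15 minutes becomes 2
hours 15 minutes.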

Given this, the "architectural form" for data typing would become:

<!ATTLIST AnyElement
    XML-TYPE         (character|numeric|temporal) #IMPLIED -- if omitted,
                                                           default is
                                                           with no other
                                                           applied --
    XML-TYPE-CONTENT CDATA                 #IMPLIED -- For CHARACTER types
                                                       default is no
                                                       constraint --
    XML-TYPE-MIN     CDATA                 #IMPLIED -- For
                                                       default is no
                                                       constraint --
    XML-TYPE-MAX     CDATA                 #IMPLIED -- For
                                                       default is no
                                                       constraint --
    XML-TYPE-ROUNDTO CDATA                 #IMPLIED -- For
                                                       default is no
                                                       constraint --
    XML-TYPE-RNDMETH (round|ceiling|floor) #IMPLIED -- Round method;
                                                       For NUMERIC/TEMPORAL
                                                       default is "round" --
    XML-TYPE-FORM    (roman|arabic)        #IMPLIED -- For NUMERIC;
                                                       default is "roman" --
    XML-TYPE-BASE    CDATA                 #IMPLIED -- For NUMERIC;
                                                       default is "10" --
    XML-TYPE-TYPE    (instant|extent)      #IMPLIED -- required for

This raises the number of attributes from 4 to 9 but allows data to be
constrained with greater precision.
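
To show how a validating application might use these attributes, here is a
minimal sketch in Python that checks a NUMERIC value against XML-TYPE-MIN,
XML-TYPE-MAX, and XML-TYPE-ROUNDTO. It assumes base-10 arabic notation and
that the attribute values have already been read into a plain dictionary;
the function name check_numeric is hypothetical.

```python
from decimal import Decimal


def check_numeric(text: str, attrs: dict[str, str]) -> list[str]:
    """Return the constraint violations for a NUMERIC value (empty if none).

    Assumption: attrs maps attribute names to their string values, and a
    missing attribute means "no constraint", as in the declaration above.
    """
    value = Decimal(text)
    problems = []

    minimum = attrs.get("XML-TYPE-MIN")
    if minimum is not None and value < Decimal(minimum):
        problems.append(f"{value} is below XML-TYPE-MIN {minimum}")

    maximum = attrs.get("XML-TYPE-MAX")
    if maximum is not None and value > Decimal(maximum):
        problems.append(f"{value} is above XML-TYPE-MAX {maximum}")

    step = attrs.get("XML-TYPE-ROUNDTO")
    if step is not None and value % Decimal(step) != 0:
        problems.append(f"{value} is not a multiple of XML-TYPE-ROUNDTO {step}")

    return problems
```

A value of "250.00" passes a minimum of "0.00" with a round-to of "0.01",
while "12.345" is reported as not being a multiple of "0.01".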

The examples would become:

For a bank loan; balance, interest rate, and maturity date: 

                   XML-TYPE-ROUNDTO  CDATA #FIXED "0.01" 
                   XML-TYPE-MIN      CDATA #FIXED "0.00" >
                   XML-TYPE-MAX  CDATA #FIXED "100" -- in practice we may
                                                       this to be much
                                                       lower --
                   XML-TYPE-MIN  CDATA #FIXED "0" >
                   XML-TYPE-TYPE     CDATA #FIXED "INSTANT"
                   XML-TYPE-ROUNDTO  CDATA #FIXED "0000/00/01 00:00:00">

For an airline departure: passenger name, seat number, and departure time: 

                    XML-TYPE-CONTENT CDATA #FIXED "[A-Z](*20)" 
                                           -- up to 20 repetitions of
                        XML-TYPE-CONTENT  CDATA #FIXED "[A-Z]" >
                   XML-TYPE-MIN      CDATA #FIXED "1"
                   XML-TYPE-MAX      CDATA #FIXED "36"
                   XML-TYPE-ROUNDTO  CDATA #FIXED "1" >
                      XML-TYPE-CONTENT  CDATA #FIXED "[A-F]" >
                    XML-TYPE-TYPE     CDATA #FIXED "INSTANT"
                    XML-TYPE-ROUNDTO  CDATA #FIXED "0000/00/00 00:01:00"
                                       -- to the nearest minute -->
                      XML-TYPE-TYPE     CDATA #FIXED "EXTENT"
                      XML-TYPE-ROUNDTO  CDATA #FIXED "0000/00/00 00:15:00"
                                        -- to the nearest 15 minutes -->

Well, what do you think?

