Internal subset equivalent in new schema proposals?

Paul Prescod paul at
Thu Nov 26 18:15:16 GMT 1998

Michael Kay wrote:
> A document is information organised for human communication; data is
> information organised for machine processing. 

SGML and XML are explicitly about organizing information for machine
processing. So according to your definition, SGML is about data, not
documents.

> XML can do both, but I stick
> with my original claim that it is optimised for the former.

I can't stop you from sticking to your claim, but I do ask that you
replace your repudiated arguments with new ones.

> I wasn't complaining that it contains many redundant features which would
> not be there in a data-oriented syntax. I have learned not to use those
> features, and I try to explain patiently when people ask yet again whether
> they should be using elements or attributes...

Could you please describe how it is important that a document-oriented
language have both elements and attributes, but not a data-oriented
language? It may be convenient to attribute (excuse me) everything that
seems strange or wrong about XML to this false cultural dichotomy, but it
doesn't hold up to analysis. "Document people" don't understand the
distinction between elements and attributes either. (Or if they do, nobody
has told me.) We discussed removing them in the move to XML, but it was
decided that they were a typing and programming convenience and provided
simple lexical typing.

If you want to claim that XML is biased towards human *creation* (not
consumption) then I will strongly agree. If it were not for human factors,
we could have defined a much more compact, portable, binary encoding.

> Rather my complaint was about things that I'd like to do in the data
> interchange world but can't. As Ron says, I can't do data typing in XML 1.0,
> and Paul's explanation doesn't alter the fact.

That "fact" is arguable. But even if we accept it, you haven't
demonstrated how it has anything to do with XML's roots in document
processing instead of "data processing." You just assert it.

As far as the fact goes: you are right, you cannot do data typing in XML
1.0 alone -- you must reference some other standard like ISO 8601 or
HyTime Lextype. You also can't do hypertext linking in XML 1.0 alone.
Would you argue that "document people" don't do hypertext linking and
that's why it isn't built in?

You *can* do data typing in one of four standardized ways I've described
already: content models, NOTATION, LEXTYPE and HYLEX. So this problem was
solved in the document community either 10 or 6 years before XML came
along, depending on how you count it, but document-centricity still takes
the blame that XML doesn't support it? The logic doesn't jibe.
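
To make the NOTATION route concrete, here is a minimal sketch, not a
definitive implementation. It assumes an attribute I have called "format",
a notation named ISO8601 with a made-up public identifier, and a
hypothetical application-side registry (LEXICAL_TYPES) that maps notation
names to parsers. XML 1.0 itself only carries the label; the check on the
content happens in the application.

```python
import xml.etree.ElementTree as ET
from datetime import date

# The NOTATION declaration names the lexical type; the attribute of type
# NOTATION says which notation governs this element's content.
# The public identifier is illustrative only, not a registered one.
DOC = """<!DOCTYPE shipped [
  <!NOTATION ISO8601 PUBLIC "-//EXAMPLE//NOTATION ISO 8601 date//EN">
  <!ELEMENT shipped (#PCDATA)>
  <!ATTLIST shipped format NOTATION (ISO8601) #REQUIRED>
]>
<shipped format="ISO8601">1998-11-26</shipped>"""

# Hypothetical registry: notation name -> function that parses the lexical form.
LEXICAL_TYPES = {"ISO8601": date.fromisoformat}


def typed_value(elem):
    """Interpret the element's text according to its declared notation."""
    parse = LEXICAL_TYPES[elem.get("format")]
    return parse(elem.text.strip())


shipped = ET.fromstring(DOC)
print(typed_value(shipped))
```

A value that does not fit the notation (say "26/11/1998") makes
date.fromisoformat raise ValueError, which is where the application would
reject the document.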

> The complaint in my original post was my recent discovery that the internal
> DTD subset destroys many of the assumptions I have made in my applications
> about the conformance of the incoming document to a schema.  

Document people have also been complaining about this problem, since before
there was an XML. Eliot Kimber (a "document guy") has expounded on
this issue in this very forum and in comp.text.sgml.

This is just as big a flaw for document people as it is for data people. I
know that it is natural to try to find reasons for flaws, but
sometimes things are just wrong. There is no cultural reason for them. I
guess that if it were really important to blame someone, I would have to
blame the computer scientists who could not see that documents were data
and thus left it to a lawyer to solve the generalized problem of
character-based data representation. Compounding the mistake doesn't solve

Of course, "document people" must take most of the blame for SGML's flaws,
because it clearly came out of the document processing community. But it
doesn't logically follow that document people have different needs from
everyone else. We have a superset of everyone else's needs. Any sort of
data can end up in some document and we must be able to handle it.

> Actually, the problem is not quite as bad as this: the internal DTD subset
> can override constraints in my attribute declarations but not in my element
> declarations. Let us be thankful for small mercies. This seems to be another
> reason for using elements rather than attributes, which I will add to my
> standard answer on the question: the very limited data typing available for
> attributes can be overridden at the whim of the user!

It was only in the change from SGML to XML that it became possible to
override attributes from the internal subset. You would have to blame the
"web people" for that change, I guess. Old SGML didn't have that
particular problem.
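
For anyone who wants to see the override happen, here is a rough sketch
under stated assumptions. XML 1.0 says the internal subset is read before
the external subset, and that when an attribute is declared more than once
the first declaration is binding. Python's bundled parser does not fetch
external DTDs, so the sketch mimics that reading order by placing the
"shared" declaration after the document's own declarations inside one
internal subset. The element and attribute names are invented for
illustration.

```python
import xml.etree.ElementTree as ET

# What the schema author wrote in the shared DTD (hypothetical).
SHARED_DECLS = '<!ATTLIST weight units (kg|g) "kg">'
# What a document author slips into the internal subset (hypothetical).
LOCAL_DECLS = '<!ATTLIST weight units CDATA "stone">'


def parse_with(local_decls):
    """Parse a document whose own declarations precede the shared ones."""
    doc = (
        "<!DOCTYPE weight [\n"
        f"{local_decls}\n"
        f"{SHARED_DECLS}\n"
        "]>\n"
        "<weight>12</weight>"
    )
    return ET.fromstring(doc)


# With no local declaration, the shared default applies.
print(parse_with("").get("units"))
# With a local declaration first, the shared one is ignored.
print(parse_with(LOCAL_DECLS).get("units"))
```

The application asked for "kg" or "g" and got whatever the document
author felt like declaring, which is exactly the complaint.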

 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself.

Thanksgiving in November? Toto, I have a feeling we 
aren't in Canada anymore.
