Internal subset equivalent in new schema proposals?
paul at prescod.net
Thu Nov 26 12:33:39 GMT 1998
Ronald Bourret wrote:
> Paul Prescod wrote:
> > Argh. Documents are data. The dichotomy is in your head. Doesn't XML
> > itself make this abundantly clear?
> Documents might be data, but the dichotomy is not just in our heads. XML
> has a clear bias towards linear, prose-oriented verbiage. How else to
> explain mixed content, the significance of order,
I find it interesting that you use the fact that XML supports things that
other languages do not as evidence of XML's bias. Java must be biased towards
Asians because it supports Unicode.
I use this analogy for a reason: XML's strength comes from the fact that
it solves the *whole problem* and not just the tiny quarter of it that
would make relational database people happy. It turns out that the
problems of structuring relational data are so incredibly simple to solve
that all of their problems are subsumed in the problem of modelling
documents. After all, relational data regularly finds its way into
documents: look at a telephone book or parts catalog. I would trust Asians
to invent character sets more than North Americans because our problems
are a subset of theirs!
> (as opposed to nesting),
XML supports nesting nicely!
> and the lack of basic data typing?
There are two myths here: that SGML does not support data typing (which it
has, explicitly, since 1986 and in another sense, since 1992), and that
document people do not care about data typing.
XML provides basic data typing. Data typing is provided at the element
level. Sub-element data typing is supposed to be provided by applications
and triggered by notations. This is generalized (like everything else in
SGML) because we all have different needs, and it seemed an easy enough
thing to add in another layer. You might not like these design decisions,
but they were design decisions and not things that the designers left out.
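
To make the layering concrete, here is a minimal sketch of an application doing sub-element typing keyed on a notation name. It is an illustration only, not part of XML or SGML: the attribute name `type`, the notation names `ISO8601-DATE` and `INTEGER`, and the function `check_notations` are all hypothetical, and the sketch reads the attribute value directly rather than consulting `<!NOTATION>` declarations in a DTD (Python's `xml.etree.ElementTree` does not validate).

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime


def is_iso_date(text):
    """Return True if text parses as a YYYY-MM-DD calendar date."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


# Hypothetical notation names mapped to application-supplied checkers.
# The markup only names the notation; the application decides what it means.
CHECKERS = {
    "ISO8601-DATE": is_iso_date,
    "INTEGER": lambda t: re.fullmatch(r"[+-]?\d+", t) is not None,
}


def check_notations(xml_text, attr="type"):
    """Return (tag, notation, text) for every element whose content
    fails the checker registered for its declared notation."""
    root = ET.fromstring(xml_text)
    failures = []
    for elem in root.iter():
        notation = elem.get(attr)
        if notation in CHECKERS:
            text = (elem.text or "").strip()
            if not CHECKERS[notation](text):
                failures.append((elem.tag, notation, text))
    return failures
```

Elements that carry no notation, or a notation the application does not know, are simply passed through, which is the sense in which the typing lives in a separate layer.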
So-called data typing is something that people from document backgrounds
*regularly ask for*. "Please allow us to force our users to make
phone-numbers phone-numbers and dates dates", they beg. If you go back in
the archives of various obscure mailing lists, you'll hear me asking this
question five years ago, and without a relational or OO database in sight!
Dejanews seems to only go back to 1995 for comp.text.sgml, but you'll see
the question asked there almost immediately:
Even if you don't believe me that document people care about things like
phone numbers, dates and other regular forms, consider this: how much do
you think it would be worth to a technical publication director to be able
to stipulate that empty paragraphs are illegal? To me, this is just
another lexical constraint, and is not much different from requiring a
valid credit card number.
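
As a rough sketch of what "just another lexical constraint" could look like, the phone number, the date, and the non-empty paragraph can all be written as patterns over character data. This is not HyLex and not anything defined by SGML or XML; the table `LEXICAL_TYPES`, the function `conforms`, and the patterns themselves are assumptions chosen for illustration, and real phone or date rules would be stricter.

```python
import re

# Hypothetical lexical types, each a regular expression that the whole
# character content of an element must match.
LEXICAL_TYPES = {
    # Optional leading '+', then digits with common separators.
    "phone": re.compile(r"\+?\d[\d\s().-]{6,}\d"),
    # Shape of a YYYY-MM-DD date (shape only, not calendar validity).
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    # A paragraph must contain at least one non-whitespace character.
    "para": re.compile(r".*\S.*", re.DOTALL),
}


def conforms(type_name, text):
    """Return True if text matches the named lexical type in full."""
    return LEXICAL_TYPES[type_name].fullmatch(text) is not None
```

Under these patterns `conforms("para", "")` is false in exactly the same way that `conforms("date", "Nov 26")` is false: both are rejected on lexical form alone.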
The powers that be made the decision in 1986 that the notation mechanism
allowed users to implement their own solutions, in some cases based upon
other ISO standards. Later, they came up with a standard lexical-typing
mechanism for this problem around 1992. It's a little ugly, so I don't
blame the W3Cers for wanting to do it themselves, but nevertheless it
exists. It is an add-on to SGML called HyLex. It isn't a mandatory part of
an SGML system, so people can still invent their own solutions if they
want. Later, they extended it to allow Posix regular expressions also.
It is debatable whether lexical typing and data typing are the same. But
it is similarly debatable whether it is the responsibility of a *language
definition language* to enforce constraints that cannot be expressed in
terms of linguistic formalisms. It isn't a question of "documents vs.
data" but "whose job is it?" There are many sorts of document-oriented
constraints that SGML also cannot express without add-ons.
> Fortunately, XML is proving itself to
> be a good way to transport the kinds of data many of us think of when we
> hear the word "data".
Of course...SGML has been doing so for years.
Paul Prescod - ISOGEN Consulting Engineer speaking for only himself.
Christmas shopping in a T-Shirt? Toto, I have a feeling we
aren't in Canada anymore.