[SML] Re: SML ?!?

Fri Nov 26 17:36:06 GMT 1999

On Fri, 26 Nov 1999, James Tauber wrote:

> > See http://www.xml.com/pub/1999/11/sml/index.html for an article
> > describing the SML idea

[James]
> I noted with interest (and disagreement) the technical arguments against
> attributes.

I had a similar feeling about the 'arguments against attributes' - I think
the article drew a conclusion without having engaged the matter adequately.

1. Yeah, the SGML/XML notion of an "attribute" is badly broken.
2. A markup language used in part as a data modelling language certainly
   should be able to distinguish notationally and conceptually between
   an object ('element') and an attribute.
3. Just throwing out the SGML/XML attribute isn't the right solution,
   in my judgment.

[...]
> 
> Why? Because in *markup* there is a distinction between content and markup.
> The character data content of an element is content. The value of an
> attribute is markup. Attributes, like other markup, provide information in
> addition to the textual content.
> 
> For example, a person thinking how to express the fact that Max is a dog
> that is black might use:
> 
> <dog>
>     <name>Max</name>
>     <colour>black</colour>
> </dog>
> 
> However, a person wanting to markup the text "Max" indicating that he is a
> black dog couldn't do the above. They might, instead, use:
> 
> <dog colour="black">Max</dog>
> 
> So if XML is being used for marking up existing textual content, attributes
> have a definite place.
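
To make the two encodings above concrete, here is a minimal Python sketch
(an illustration, not taken from the article or the quoted message) that
parses both forms with the standard library's xml.etree.ElementTree and
collects only the character data. The names AS_DATA, AS_MARKUP and
character_data are hypothetical, and the whitespace of the first form is
dropped so the result is easier to read.

```python
import xml.etree.ElementTree as ET

# The two encodings from the quoted example (whitespace removed from the first).
AS_DATA = "<dog><name>Max</name><colour>black</colour></dog>"
AS_MARKUP = '<dog colour="black">Max</dog>'


def character_data(xml_text):
    """Return the concatenated character data of a document, ignoring attributes."""
    return "".join(ET.fromstring(xml_text).itertext())


# In the element-only form the colour becomes part of the character data.
data_text = character_data(AS_DATA)

# In the attribute form only the marked-up text "Max" is character data;
# the colour is reachable only through the attribute.
markup_text = character_data(AS_MARKUP)
colour = ET.fromstring(AS_MARKUP).get("colour")

print(repr(data_text), repr(markup_text), repr(colour))
```

Under these assumptions the element-only form yields "Maxblack" as its
character data, while the attribute form yields just "Max".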

Even this example, while instructive and illustrative, does not begin to
address the deeper issues.  For example, whatever the SGML/XML standards
may say formally about "content" versus "non-content", and irrespective
of whether these definitions accord well with users' notions of "content"
in different application domains (a HUGE usability concern), we have in
TEI for example, a markup strategy for correcting a known or suspected
error in some text, with sic/corr tags:

I write:  As Job Bosak astutely observed
You encode:  <p>As <corr sic="Job">Jon</corr> Bosak astutely observed

(or something like this -- see TEI's mirror tags)
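
A rough Python sketch of why this matters, assuming the encoding shown
above with the <p> closed so that it parses: the reading "as written"
can only be recovered by pulling text out of an attribute value. The
names ENCODED and reading are hypothetical, and this is an illustration
rather than how any TEI tool actually works.

```python
import xml.etree.ElementTree as ET

# The encoded sentence from above, with </p> added so it is well-formed.
ENCODED = '<p>As <corr sic="Job">Jon</corr> Bosak astutely observed</p>'


def reading(root, prefer_original):
    """Flatten a paragraph to a string, choosing one reading per <corr>."""
    parts = [root.text or ""]
    for child in root:
        if child.tag == "corr" and prefer_original:
            # The text as originally written lives in an attribute value.
            parts.append(child.get("sic", ""))
        else:
            # The corrected text is ordinary character data.
            parts.append("".join(child.itertext()))
        # Text following the child element, up to the next sibling.
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(ENCODED)
as_written = reading(root, prefer_original=True)
as_corrected = reading(root, prefer_original=False)

print(as_written)
print(as_corrected)
```

Both strings have a fair claim to being "the text", yet one of them
comes entirely from character data and the other partly from an
attribute value.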

In practice, trying to declare in advance what may be reckoned as
"content" is extremely difficult (attribute value literals in some
contexts but not others; most PCDATA but not all, etc.)

The definitions I've seen break very quickly when one considers
the range of applications and users WRT SGML/XML: it's not "what
is seen vs. what is not seen"; nor "what's really in the 'text'
vs. what is metadata".  All such distinctions I've seen are
broken, and non-repairable.  Especially if one is concerned to
uphold the notion of a descriptive markup language as having
no pre-defined application level processing semantics.

-rcc

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)