SGML, XML and SML

Tue Nov 23 05:57:48 GMT 1999

From: Paul Tchistopolskii <paul at qub.com>

>1. Why entities should live in the core, if one can use
>any macroprocessor to get  *more*  flexible functionality?

1) So that I can reference entities in attribute values
2) So that I can have all my URIs in a header at the top,
for maintainability
3) Because I am not using UNIX pipes: is there a Java
version of M4?
4) Because if I want to, I can already use a preprocessor;
removing entities does not increase my options;
5) Because no macro preprocessors are internationalized;
6) Because entities allow tracking of file and line-number:
if a macro package does this and it is implemented using
pipes then there must be some kind of PI or conventional
comment embedded (I guess Don and the Elementalists would
want this to be an element which a post-processor would
then hide!)
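
Points 1 and 2 can be illustrated with a small sketch. It assumes Python's standard xml.etree.ElementTree (built on expat, which expands internal general entities); the entity name "home", the URI and the element names are invented for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the URI is declared once in the internal
# DTD subset and then referenced from inside attribute values.
DOC = """<!DOCTYPE doc [
  <!ENTITY home "http://example.org/">
]>
<doc>
  <link href="&home;index.html">Home page</link>
  <link href="&home;faq.html">FAQ</link>
</doc>"""

root = ET.fromstring(DOC)

# The parser has already expanded &home; in each attribute value.
hrefs = [link.get("href") for link in root.findall("link")]
print(hrefs)
```

Changing the one declaration in the header changes every reference, and no separate preprocessing pass is involved.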

>2. How often do we need entities outside the DTD's ?

Often.

>> I suspect CDATA sections are hard to live
>> without if you're writing XML documents about HTML or XML, though.

Or any time you are writing program code that uses <, & or >.
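
As a rough illustration (a sketch using Python's standard xml.etree.ElementTree; the "code" element and the snippet are made up), a CDATA section and the hand-escaped form give the application the same text, but only the first is pleasant to type:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of program code containing <, & and >.
SNIPPET = "if (a < b && c > d) { swap(); }"

# The same content written with a CDATA section and with escapes.
with_cdata = ET.fromstring("<code><![CDATA[" + SNIPPET + "]]></code>")
escaped = ET.fromstring(
    "<code>if (a &lt; b &amp;&amp; c &gt; d) { swap(); }</code>"
)

# Both forms deliver identical character data to the application.
same_text = with_cdata.text == escaped.text == SNIPPET
print(same_text)
```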

>Let us have <CDATA> element ? I think up to 3-5 elements with
>'hardcoded' semantics will not cause a big problem.

So the parent element of data changes depending on which way the
person has marked the data up?   I suppose that in DOM & XPath
"parent" will become useless; instead it would be "first ancestor that
is not a CDATA element".

This is the same problem that open
content models give: with open content models, the "previousSibling"
and "nextSibling" elements are not so useful (and, indeed, "firstChild"
and "lastChild") because we cannot be sure what they are without
checking.  But that is a small cost.  However, moving us from a
"parent" to a "virtualParent" relationship would be quite a major
change.
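
To make the "virtualParent" point concrete, here is a minimal sketch using Python's xml.dom.minidom. The markup is a hypothetical SML-style document in which "CDATA" is an ordinary element, and virtual_parent is an invented helper, not part of DOM or XPath.

```python
from xml.dom import minidom

# Hypothetical SML-style markup: "CDATA" is an ordinary element here.
doc = minidom.parseString(
    "<doc><code>plain <CDATA>a &lt; b</CDATA></code></doc>"
)

def virtual_parent(node):
    """Return the first ancestor that is not a CDATA element."""
    parent = node.parentNode
    while (parent is not None
           and parent.nodeType == parent.ELEMENT_NODE
           and parent.tagName == "CDATA"):
        parent = parent.parentNode
    return parent

code = doc.documentElement.firstChild      # the <code> element
plain_text = code.firstChild               # text node "plain "
marked_text = code.lastChild.firstChild    # text node inside <CDATA>

# The DOM parent now depends on how the author marked up the data ...
print(plain_text.parentNode.tagName)
print(marked_text.parentNode.tagName)
# ... so every application needs the extra helper to find the real container.
print(virtual_parent(marked_text).tagName)
```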

>It's why I think that SML vs XML is very similiar to
>XML vs SGML.
>
>At some point it would be easier to break the
>compatibility than to support legacy. As far as I
>understand, exactly that thing happened with
>XML vs SGML.

No, SGML was enhanced to allow XML.  (By the way, I think
WebSGML already allows SML: I think it is legit to
map the MDO and PIO delimiters to some "shunned
characters", for example. The only difficulty is if multiple
headers are involved.)

SGML was based on the
idea that it is impossible to get agreement on lexical
issues from everyone, so the important issue is that
different requirements can be described formally/legally.
So XML did not break compatibility with the standard;
but no-one has ever thought that all SGML products
should support all possible SGML syntaxes (despite James
Clark's heroic efforts).

XML is interesting not because it is simpler (my first
simplified-SGML text-processing system was more than
10 years ago) but because there is *agreement*.  That
has been the new thing.

I think this list will contain many old SGML hacks who
will be disappointed but not surprised if eventually
XML fragments.
---
By the way, I missed sending this before: it gives the
canonical XML productions with a guess at SML
productions: they are the same size. (And, if we take
out namespaces from one, we can take them out
from the other, so no effective difference.)

The current canonical XML grammar is this:
[1]    canonXML    ::=    (PI #xA)* element #xA (PI #xA)*
[2]    element    ::=    Stag (Datachar | element | PI)* Etag
[3]    Stag    ::=    '<' Name NSDecl? (Att NSDecl?)* '>'
[4]    Etag    ::=    '</' Name '>'
[5]    NSDecl    ::=    #x20 'xmlns:' Prefix '=' '"' Attvalchar* '"'
[6]    Att    ::=    #x20 Name '=' '"' Attvalchar* '"'
[7]    Datachar    ::=    '&amp;' | '&lt;' | '&gt;' | '&#xD;'
   | (Char - ('&' | '<' | '>' | #xD ))
[8]    Attvalchar    ::=    '&amp;' | '&lt;' | '&quot;' | '&#x9;' | '&#xA;' | '&#xD;'
   | (Char - ('&' | '<' | '"' | #x9 | #xA | #xD))
[9]    Name    ::=    (Prefix ':')? NCName
[10]    Prefix    ::=    'n' [1-9] [0-9]*
[11]    PI    ::=    '<?' PITarget (#x20 (Char+ - (Char* '?>' Char*)))? '?>'
[12]    PITarget    ::=    NCName - (('X' | 'x') ('M' | 'm') ('L' | 'l'))

which is hardly complex. Presumably, your SML would be
something like this:

[1]    canonXML    ::=    (Comment | #xA)*  element (Comment | #xA)*
[2]    element    ::=    (Stag (Datachar | element | Comment)* Etag) | Mtag
[3]    Stag    ::=    '<' Name NSDecl? (Att NSDecl?)* '>'
[4]    Etag    ::=    '</' Name '>'
[5]    NSDecl    ::=    #x20 'xmlns:' Prefix '=' '"' Attvalchar* '"'
[6]    Att    ::=    #x20 Name '=' '"' Attvalchar* '"'
[7]    Datachar    ::=    '&amp;' | '&lt;' | '&gt;'
   | (Char - ('&' | '<' | '>' | #x9 | #xA | #xD))
[8]    Attvalchar    ::=    '&amp;' | '&lt;' | '&quot;'
   | (Char - ('&' | '<' | '"' | #x9 | #xA | #xD))
[9]    Name    ::=    (Prefix ':')? NCName
[10]    Prefix    ::=    'n' [1-9] [0-9]*
[11]    Comment    ::=    '<!--' (Char+ - (Char* '-->' Char*))? '-->'
[12]    Mtag    ::=    '<' Name NSDecl? (Att NSDecl?)* '/>'

Which is the same level of complexity. You take out PIs and
add comments and empty-element tags.
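
To show how little machinery the first grammar needs, here is a rough Python sketch of a writer for productions [3], [4], [6], [7] and [8] of the canonical XML grammar. It leaves out namespace declarations and PIs, the function names are invented, and sorting attributes by name is my assumption rather than something the productions say.

```python
# Replacement tables read off productions [7] (Datachar) and [8] (Attvalchar).
DATACHAR = {"&": "&amp;", "<": "&lt;", ">": "&gt;", "\r": "&#xD;"}
ATTVALCHAR = {"&": "&amp;", "<": "&lt;", '"': "&quot;",
              "\t": "&#x9;", "\n": "&#xA;", "\r": "&#xD;"}

def escape(text, table):
    """Replace each character that has an entry in the table."""
    return "".join(table.get(ch, ch) for ch in text)

def start_tag(name, attrs):
    """Production [3] without NSDecl: '<' Name Att* '>'."""
    parts = ["<" + name]
    for key in sorted(attrs):  # ordering is an assumption
        # Production [6]: #x20 Name '=' '"' Attvalchar* '"'
        parts.append(' %s="%s"' % (key, escape(attrs[key], ATTVALCHAR)))
    return "".join(parts) + ">"

def element(name, attrs, text):
    """Production [2] restricted to character data content."""
    return start_tag(name, attrs) + escape(text, DATACHAR) + "</" + name + ">"

print(element("p", {"title": 'a "b"\n'}, "1 < 2 & 3 > 2"))
```

A writer for the guessed SML productions would differ only in the two tables (no character references) and in emitting Mtag for empty elements.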

Rick Jelliffe

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)