SGML, XML and SML

Tue Nov 23 07:55:27 GMT 1999

> From: Paul Tchistopolskii <paul at qub.com>
> 
> >1. Why entities should live in the core, if one can use
> >any macroprocessor to get  *more*  flexible functionality?
> 
> 1) So that I can reference entities in attribute values

I'm talking about SML. ;-) No attributes there ( I think ).
Anyway - you can do it with m4.

> 2) So that I can have all my URIs in a header at the top,
> for maintainability

m4
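
To make the "URIs in a header" idea concrete without entities, here is a minimal sketch of a macro pass that runs before the parser. It is written in Python rather than m4, and the `define(NAME, value)` header syntax and the `expand_macros` name are assumptions made only for this illustration; real m4 has richer quoting rules.

```python
import re

# Hypothetical header syntax, loosely modelled on m4: define(NAME, value)
_DEFINE = re.compile(r"^define\((\w+),\s*(.*?)\)\s*$")


def expand_macros(source: str) -> str:
    """Collect define() lines, then substitute the names in the remaining text."""
    macros = {}
    body = []
    for line in source.splitlines():
        match = _DEFINE.match(line)
        if match:
            macros[match.group(1)] = match.group(2)
        else:
            body.append(line)
    text = "\n".join(body)
    if not macros:
        return text
    # Replace whole-word occurrences only, so HOME does not match inside HOME_URI.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, macros)) + r")\b")
    return pattern.sub(lambda m: macros[m.group(1)], text)


doc = """define(HOME_URI, http://example.org/home)
<link href="HOME_URI">home</link>"""
print(expand_macros(doc))
```

Because the substitution is purely textual, it also reaches into attribute values, which covers point 1 as well.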

> 3) Because I am not using UNIX pipes: is there a java
> version of M4?

You could wrap m4 with JNI. 1 day task, I think.

> 4) Because if I want to, I can already use a preprocessor;
> removing entities does not increase my options;

Why was SGML stripped down to XML? That did not
increase anyone's options either ....

> 5) Because no macro preprocessors are internationalized;

Oh... I also used to think that XML was internationalized.
Until I happened to run Expat on a file that starts with
<?xml version="1.0" encoding="windows-something"?>.
It seems it may be better to talk about
Java-internationalized XML and MS-internationalized XML?
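
One workaround when a parser does not know the declared encoding is to decode the bytes yourself and hand the parser plain Unicode. The following is only a sketch under stated assumptions: it is Python, not Expat's C API, the `parse_declared` name is made up for this example, and it assumes an ASCII-compatible encoding (so not UTF-16) whose name Python's codec registry recognises.

```python
import re
import xml.etree.ElementTree as ET

# Matches an XML declaration that carries an encoding pseudo-attribute.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding=["\']([A-Za-z][A-Za-z0-9._-]*)["\'][^>]*\?>'
)


def parse_declared(raw: bytes) -> ET.Element:
    """Decode raw using its declared encoding, then parse the decoded text."""
    match = _DECL.match(raw)
    if match:
        encoding = match.group(1).decode("ascii")
        body = raw[match.end():]  # drop the declaration; the text is Unicode now
    else:
        encoding, body = "utf-8", raw
    # Raises LookupError if Python has no codec under the declared name.
    return ET.fromstring(body.decode(encoding))


raw = ('<?xml version="1.0" encoding="windows-1251"?>'
       '<p>\u041f\u0440\u0438\u0432\u0435\u0442</p>').encode("cp1251")
print(parse_declared(raw).text)
```

This only moves the problem: the set of supported encodings is now whatever the host language ships, which is exactly the "Java-internationalized" versus "MS-internationalized" situation.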

> 6) Because entities allow tracking of file and line-number:
> if a macro package does this and it is implemented using
> pipes then there must be some kind of PI or conventional
> comment embedded (I guess Don and the Elementalists would
> want this to be an element which a post-processor would
> then hide!)

I agree, it is sometimes inconvenient to use
macroprocessing. However, I think that because
most users are *not* using macroprocessing
anyway, they'll not notice that entities are
gone. ;-)
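
The file and line-number tracking from point 6 can be sketched as a preprocessor that tags every output line with a processing instruction. This is an illustration only: the `include(NAME)` syntax, the `<?origin file line?>` PI and the `expand_includes` name are all assumptions, the files live in a plain dictionary, and there is no guard against circular includes.

```python
import re

# Hypothetical include syntax: a line consisting only of include(NAME)
_INCLUDE = re.compile(r"^include\((\w[\w.]*)\)\s*$")


def expand_includes(name, files):
    """Inline include(NAME) lines and prefix each output line with a PI
    that records the file and line number it came from."""
    out = []
    for number, line in enumerate(files[name].splitlines(), start=1):
        match = _INCLUDE.match(line)
        if match:
            out.extend(expand_includes(match.group(1), files))
        else:
            out.append(f"<?origin {name} {number}?>{line}")
    return out


files = {
    "main.xml": "<doc>\ninclude(part.xml)\n</doc>",
    "part.xml": "<p>hi</p>",
}
for line in expand_includes("main.xml", files):
    print(line)
```

A later stage has to strip or interpret the `origin` PIs, which is the extra machinery the quoted point is complaining about.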

Also, one can use a beloved XML macroprocessor on the server side
(together with the beloved weak DTD-based validation), but
should all that code go to the client?

I already  know people who are writing simplified  XML parsers 
( right now ;-) to fit into some hardware.  There  is no 
validation, no entities and ... well ... they  have hacked together 
something very close to SML ;-)

The next step for them will be to write a simplified XT, DOM, etc.

As far as I remember, there has always been some device with
'small' memory. Over the last 10 years I have moved from 2-16-32K to
megabytes and then back to kilobytes again ;-)

I don't know why it happens; I'd better consider it
a rule.

> >2. How often do we need entities outside the DTD's ?
> 
> Often.

Big question. For example, as we place a bigger and bigger
load on XT, macros become templates, so we could end up
with plain XML and a heavy stylesheet. Macros give us nothing
with a database. Stylesheets give a lot.

I guess macroprocessing was so important to SGML
because of manual editing of many documents, a long
time ago. Right?

Actually, a macroprocessor does not look like a good thing
when it comes to dataflow. C++ tried to avoid
macroprocessing as a bad practice.

Having an 'internal' macroprocessor is ... maybe nice ...
but suspicious (even though I like macroprocessing itself,
I prefer not to overuse it). I don't think it should live
in the core, because it's something 'extra'. 'Optional'.

I think it's a matter of taste.

> >> I suspect CDATA sections are hard to live
> >> without if you're writing XML documents about HTML or XML, though.
> 
> Or any time you are writing programming code that uses <, & or >
> 
> >Let us have <CDATA> element ? I think up to 3-5 elements with
> >'hardcoded' semantics will not cause a big problem.
> 
> So the parent element of data changes depending on which way the
> person has marked the data up?   I suppose that in DOM & XPath
> "parent" will become useless; instead it would be "first ancestor that
> is not a CDATA element".

Perfect. I'm glad I have already agreed to remove CDATA from SML -
there is something to think about.

> This is the same problem that open
> content models give: with open content models, the "previousSibling"
> and "nextSibling" elements are not so useful (and, indeed, "firstChild"
> and "lastChild") because we cannot be sure what they are without
> checking.  But that is a small cost.  However, moving us from a
> "parent" to a "virtualParent" relationship would be quite a major
> change.

I agree - it is not easy to find a workaround. It's better to drop
CDATA ;-)
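
To show what the "first ancestor that is not a CDATA element" lookup would cost in practice, here is a small sketch using Python's `xml.dom.minidom`. The `<CDATA>` element and the `virtual_parent` helper are hypothetical; since a standard parser gives that element no special meaning, the sample still escapes `<` inside it.

```python
from xml.dom import minidom


def virtual_parent(node, wrapper="CDATA"):
    """Return the nearest ancestor that is not a wrapper element."""
    parent = node.parentNode
    while (parent is not None
           and parent.nodeType == parent.ELEMENT_NODE
           and parent.tagName == wrapper):
        parent = parent.parentNode
    return parent


doc = minidom.parseString("<code>if (a <CDATA>&lt;</CDATA> b)</code>")
text = doc.getElementsByTagName("CDATA")[0].firstChild

print(text.parentNode.tagName)        # the real DOM parent
print(virtual_parent(text).tagName)   # the parent the author probably meant
```

Every consumer that asks for a parent would have to call something like `virtual_parent` instead, which is the "major change" the quoted text describes.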

> >It's why I think that SML vs XML is very similar to
> >XML vs SGML.
> >
> >At some point it would be easier to break the
> >compatibility than to support legacy. As far as I
> >understand, exactly that thing happened with
> >XML vs SGML.
> 
> No, SGML was enhanced to allow XML.  (By the way, I think
> WebSGML already allows SML: I think it is legit to
> map the MDO and PIO delimiter to some "shunned
> characters" for example. The only difficulty is if multiple
> headers are involved.)

> SGML was based on the
> idea that it is impossible to get agreement on lexical
> issues from everyone, so the important issue is that
> different requirements can be described formally/legally.
> So XML did not break compatibility with the standard;
> but no-one has ever thought that all SGML products
> should support all possible SGML syntaxes (despite James
> Clark's heroic efforts).
> 
> XML is interesting not because it is simpler (my first
> simplified-SGML text-processing system was more than
> 10 years ago) but because there is *agreement*.  That
> has been the new thing.

Oh ... It's something I can not understand ...   
It's too complex  for me...

The only reason  why I supported the idea of SML 
was that 2 weeks ago I just had a couple of long 
discussions  with one real-life client.  

As a result I realized that they are reimplementing
almost *every*part* of the XML standard, because
the existing XML framework is too 'bloated' for them.

So when I saw the SML posting - I provided some 
thoughts I got after that discussion. The thoughts 
are: there is some place for SML.  

How big is it ?

I don't know.

Is XML good? Of course it is. Is it 'bloated' .... well ...
I don't know. It's a matter of taste.

It *could* be considered to be bloated.

> I think this list will contain many old SGML hacks who
> will be disappointed but not surprised if eventually
> XML fragments.
> ---
> By the way, I missed sending this before: it gives the
> canonical XML productions with a guess at SML
> productions: they are the same size. (And, if we take
> out namespaces from one, we can take them out
> from the other, so no effective difference.)
> 
> The current canonical XML grammar is this:
> [1]    canonXML    ::=    (PI #xA)* element #xA (PI #xA)*
> [2]    element    ::=    Stag (Datachar | element | PI)* Etag
> [3]    Stag    ::=    '<' Name NSDecl? (Att NSDecl?)* '>'
> [4]    Etag    ::=    '</' Name '>'
> [5]    NSDecl    ::=    #x20 'xmlns:' Prefix '=' '"' Attvalchar* '"'
> [6]    Att    ::=    #x20 Name '=' '"' Attvalchar* '"'
> [7]    Datachar    ::=    '&amp;' | '&lt;' | '&gt;' | '&#xD;'
>    | (Char - ('&' | '<' | '>' | #xD ))
> [8]    Attvalchar    ::=    '&amp;' | '&lt;' | '&quot;' | '&#x9;' | '&#xA;' | '&#xD;'
>    | (Char - ('&' | '<' | '"' | #x9 | #xA | #xD))
> [9]    Name    ::=    (Prefix ':')? NCName
> [10]    Prefix    ::=    'n' [1-9] [0-9]*
> [11]    PI    ::=    '<?' PITarget (#x20 (Char+ - (Char* '?>' Char*)))? '?>'
> [12]    PITarget    ::=    NCName - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
> 
> which is hardly complex. Presumably, your SML would be
> something like this
> 
> [1]    canonXML    ::=    (Comment | #xA)*  element (Comment | #xA)*
> [2]    element    ::=    (Stag (Datachar | element | Comment)* Etag) | Mtag
> [3]    Stag    ::=    '<' Name NSDecl? (Att NSDecl?)* '>'
> [4]    Etag    ::=    '</' Name '>'
> [5]    NSDecl    ::=    #x20 'xmlns:' Prefix '=' '"' Attvalchar* '"'
> [6]    Att    ::=    #x20 Name '=' '"' Attvalchar* '"'
> [7]    Datachar    ::=    '&amp;' | '&lt;' | '&gt;'
>    | (Char - ('&' | '<' | '>' | #x9 | #xA | #xD))
> [8]    Attvalchar    ::=    '&amp;' | '&lt;' | '&quot;'
>    | (Char - ('&' | '<' | '"' | #x9 | #xA | #xD))
> [9]    Name    ::=    (Prefix ':')? NCName
> [10]    Prefix    ::=    'n' [1-9] [0-9]*
> [11]    Comment    ::=    '<!--' (Char+ - (Char* '-->' Char*))? '-->'
> [12]    Mtag    ::=    '<' Name NSDecl? (Att NSDecl?)* '/>'
> 
> Which is the same level of complexity. You take out PIs and
> add comments and empty start tags.

As far as I understand, SML has no attributes ... And also
maybe no Mtag ... so the rules are simpler ... I suggest
"take no prisoners":

+  Stag and Etag 
+  Datachar

+  Comment ( optional )
+  Mtag for empty element ( optional )
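
To give a feel for how small the mandatory part is, here is a rough sketch of a parser for just Stag, Etag and Datachar. It is one reading of the list above, not a specification: the `parse_sml` name, the `(name, children)` tuple shape and the permitted name characters are my assumptions, and it does not reject a stray '&' or '>' in character data or require a single root element.

```python
import re

# One token is either a start/end tag without attributes, or a run of data.
_TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>|([^<]+)")
_ESCAPES = {"&amp;": "&", "&lt;": "<", "&gt;": ">"}


def _unescape(text):
    return re.sub(r"&(?:amp|lt|gt);", lambda m: _ESCAPES[m.group(0)], text)


def parse_sml(source):
    """Parse source into a list of (name, children) tuples and strings."""
    root = ("#root", [])
    stack = [root]
    pos = 0
    while pos < len(source):
        match = _TOKEN.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        closing, name, data = match.groups()
        if data is not None:
            stack[-1][1].append(_unescape(data))
        elif closing:
            # An end tag must close the element that is currently open.
            if len(stack) == 1 or stack[-1][0] != name:
                raise ValueError(f"mismatched end tag </{name}> at offset {pos}")
            stack.pop()
        else:
            element = (name, [])
            stack[-1][1].append(element)
            stack.append(element)
        pos = match.end()
    if len(stack) != 1:
        raise ValueError(f"unclosed element <{stack[-1][0]}>")
    return root[1]


print(parse_sml("<a><b>x &lt; y</b></a>"))
```

Comments and Mtag would each add one more alternative to `_TOKEN`, which is why they fit comfortably in the "optional" column.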

I feel very suspicious when I see a project that
could not be implemented by one 'ideal' developer.

All the programs I love could be implemented by
one developer (to the shape of a reasonable prototype).

Implementing the XML framework to get something
handy looks like a *huge* task. Implementing a reasonable
SML framework looks doable.

Rgds.Paul.
