SGML, XML and SML
Paul Tchistopolskii
paul at qub.com
Tue Nov 23 07:55:27 GMT 1999
> From: Paul Tchistopolskii <paul at qub.com>
>
> >1. Why should entities live in the core, if one can use
> >any macroprocessor to get *more* flexible functionality?
>
> 1) So that I can reference entities in attribute values
I'm talking about SML. ;-) No attributes there ( I think ).
Anyway - you can do it with m4.
> 2) So that I can have all my URIs in a header at the top,
> for maintainability
m4
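A minimal sketch of the "URIs in a header" idea done by a preprocessor
rather than by entities. This is a toy stand-in written in Python, not
m4 itself: the define(NAME, VALUE) line format, the function name
expand_macros and the example.org URIs are all assumptions made for
illustration.

```python
import re

def expand_macros(text):
    """Expand define(NAME, VALUE) header lines into the rest of the text.

    Toy stand-in for a macro preprocessor such as m4 (not m4 syntax-compatible).
    """
    macros = {}
    body_lines = []
    for line in text.splitlines():
        m = re.fullmatch(r"define\((\w+),\s*(.*)\)", line.strip())
        if m:
            # Header line: remember the macro and drop it from the output.
            macros[m.group(1)] = m.group(2)
        else:
            body_lines.append(line)
    body = "\n".join(body_lines)
    if not macros:
        return body
    # Replace whole-word occurrences of any defined macro name.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, macros)) + r")\b")
    return pattern.sub(lambda m: macros[m.group(1)], body)

# Hypothetical document: all URIs live in a header at the top.
source = """define(HOME_URI, http://example.org/home)
define(DOCS_URI, http://example.org/docs)
<link>HOME_URI</link>
<link>DOCS_URI</link>"""

print(expand_macros(source))
```

The markup that reaches the parser contains only plain text, so the
parser itself needs no entity support.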
> 3) Because I am not using UNIX pipes: is there a java
> version of M4?
You could wrap m4 with JNI. 1 day task, I think.
> 4) Because if I want to, I can already use a preprocessor;
> removing entities does not increase my options;
Why was SGML stripped down to XML? That did not
increase any options either ....
> 5) Because no macro preprocessors are internationalized;
Oh... I also thought that XML was internationalized,
until I happened to run Expat on a file declared
with <?xml encoding="windows-something">.
It seems it would be more accurate to talk about
Java-internationalized XML and MS-internationalized XML ?
> 6) Because entities allow tracking of file and line-number:
> if a macro package does this and it is implemented using
> pipes then there must be some kind of PI or conventional
> comment embedded (I guess Don and the Elementalists would
> want this to be an element which a post-processor would
> then hide!)
I agree, it is sometimes inconvenient to use
macroprocessing. However, I think that because
most users are *not* using macroprocessing
anyway, they'll not notice that entities are
gone. ;-)
Also, one can use the beloved XML macroprocessor on the server side
(together with the beloved weak DTD-based validation), but
should all that code go to the client ?
I already know people who are writing simplified XML parsers
( right now ;-) to fit into some hardware. There is no
validation, no entities and ... well ... they have hacked together
something very close to SML ;-)
The next step for them will be to write a simplified XT, DOM etc.
As far as I remember, there has always been some device with
'small' memory. Over the last 10 years I have moved from 2-16-32K to
Megabytes and then back to Kilobytes again ;-)
I don't know why it happens; I'd better consider it to be
some kind of rule.
> >2. How often do we need entities outside the DTD's ?
>
> Often.
Big question. For example, when placing a bigger and bigger
load on XT, macros become templates, so the result could be
plain XML plus a heavy stylesheet. Macros give us nothing
when working with a database. Stylesheets give a lot.
I guess macroprocessing was so important to SGML
because of manual editing of many documents, long
time ago. Right ?
Actually, a macroprocessor does not look like a good thing
when it comes to dataflow. C++ tried to move away from
macroprocessing as a bad practice.
Having an 'internal' macroprocessor is ... maybe nice ...
but suspicious (even though I like macroprocessing itself,
I prefer not to overuse it). I don't think it should live
in the core, because it's something 'extra'. 'Optional'.
I think it's a matter of taste.
> >> I suspect CDATA sections are hard to live
> >> without if you're writing XML documents about HTML or XML, though.
>
> Or any time you are writing programming code that uses <, & or >
>
> >Let us have <CDATA> element ? I think up to 3-5 elements with
> >'hardcoded' semantics will not cause a big problem.
>
> So the parent element of data changes depending on which way the
> person has marked the data up? I suppose that in DOM & XPath
> "parent" will become useless; instead it would be "first ancestor that
> is not a CDATA element".
Perfect. I'm glad I already agreed to remove CDATA from SML -
there is something to think about.
> This is the same problem that open
> content models give: with open content models, the "previousSibling"
> and "nextSibling" elements are not so useful (and, indeed, "firstChild"
> and "lastChild") because we cannot be sure what they are without
> checking. But that is a small cost. However, moving us from a
> "parent" to a "virtualParent" relationship would be quite a major
> change.
I agree - it is not easy to find the workaround. It's better to drop
CDATA ;-)
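To make the "first ancestor that is not a CDATA element" cost concrete,
here is a rough sketch using Python's xml.dom.minidom. The helper name
virtual_parent and the element name "CDATA" are assumptions taken from
the proposal quoted above, not part of any DOM or XPath specification.

```python
from xml.dom.minidom import parseString

def virtual_parent(node, transparent="CDATA"):
    """Return the nearest ancestor that is not a 'transparent' wrapper element."""
    parent = node.parentNode
    while (parent is not None
           and parent.nodeType == parent.ELEMENT_NODE
           and parent.tagName == transparent):
        # Skip over the hypothetical <CDATA> wrapper and keep climbing.
        parent = parent.parentNode
    return parent

doc = parseString("<code><CDATA>if (a &lt; b) return;</CDATA></code>")
text = doc.getElementsByTagName("CDATA")[0].firstChild

print(text.parentNode.tagName)       # the real DOM parent: the CDATA element
print(virtual_parent(text).tagName)  # the "virtual" parent: the code element
```

Every consumer that cares about the parent relationship would need a
helper like this instead of the plain parentNode lookup.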
> >That's why I think that SML vs XML is very similar to
> >XML vs SGML.
> >
> >At some point it would be easier to break the
> >compatibility than to support legacy. As far as I
> >understand, exactly that thing happened with
> >XML vs SGML.
>
> No, SGML was enhanced to allow XML. (By the way, I think
> WebSGML already allows SML: I think it is legit to
> map the MDO and PIO delimiter to some "shunned
> characters" for example. The only difficulty is if multiple
> headers are involved.)
> SGML was based on the
> idea that it is impossible to get agreement on lexical
> issues from everyone, so the important issue is that
> different requirements can be described formally/legally.
> So XML did not break compatibility with the standard;
> but no-one has ever thought that all SGML products
> should support all possible SGML syntaxes (despite James
> Clark's heroic efforts).
>
> XML is interesting not because it is simpler (my first
> simplified-SGML text-processing system was more than
> 10 years ago) but because there is *agreement*. That
> has been the new thing.
Oh ... It's something I can not understand ...
It's too complex for me...
The only reason I supported the idea of SML
is that 2 weeks ago I had a couple of long
discussions with a real-life client.
As a result I realized that they are reimplementing
almost *every*part* of the XML standard, because
the existing XML framework is too 'bloated' for them.
So when I saw the SML posting, I offered some
thoughts I had after that discussion. The conclusion
is: there is some place for SML.
How big is it ?
I don't know.
Is XML good? Of course it is. Is it 'bloated' .... well ...
I don't know. It's the issue of taste.
It *could* be considered to be bloated.
> I think this list will contain many old SGML hacks who
> will be disappointed but not surprised if eventually
> XML fragments.
> ---
> By the way, I missed sending this before: it gives the
> canonical XML productions with a guess at SML
> productions: they are the same size. (And, if we take
> out namespaces from one, we can take them out
> from the other, so no effective difference.)
>
> The current canonical XML grammar is this:
> [1] canonXML ::= (PI #xA)* element #xA (PI #xA)*
> [2] element ::= Stag (Datachar | element | PI)* Etag
> [3] Stag ::= '<' Name NSDecl? (Att NSDecl?)* '>'
> [4] Etag ::= '</' Name '>'
> [5] NSDecl ::= #x20 'xmlns:' Prefix '=' '"' Attvalchar* '"'
> [6] Att ::= #x20 Name '=' '"' Attvalchar* '"'
> [7] Datachar ::= '&amp;' | '&lt;' | '&gt;' | '&#13;'
> | (Char - ('&' | '<' | '>' | #xD ))
> [8] Attvalchar ::= '&amp;' | '&lt;' | '&quot;' | '&#9;' |
> '&#10;' | '&#13;'
> | (Char - ('&' | '<' | '"' | #x9 | #xA | #xD))
> [9] Name ::= (Prefix ':')? NCName
> [10] Prefix ::= 'n' [1-9] [0-9]*
> [11] PI ::= '<?' PITarget (#x20 (Char+ - (Char* '?>' Char*)))?
> '?>'
> [12] PITarget ::= NCName - (('X' | 'x') ('M' | 'm') ('L' |
> 'l'))
>
> which is hardly complex. Presumably, your SML would be
> something like this
>
> [1] canonXML ::= (Comment | #xA)* element (Comment | #xA)*
> [2] element ::= ( Stag (Datachar | element | Comment )* Etag )
> | Mtag
> [3] Stag ::= '<' Name NSDecl? (Att NSDecl?)* '>'
> [4] Etag ::= '</' Name '>'
> [5] NSDecl ::= #x20 'xmlns:' Prefix '=' '"' Attvalchar* '"'
> [6] Att ::= #x20 Name '=' '"' Attvalchar* '"'
> [7] Datachar ::= '&amp;' | '&lt;' | '&gt;' | (Char - ('&' | '<'
> | '>' | #x9 | #xA | #xD ))
> [8] Attvalchar ::= '&amp;' | '&lt;' | '&quot;' | (Char - ('&'
> | '<' | '"' | #x9 | #xA | #xD))
> [9] Name ::= (Prefix ':')? NCName
> [10] Prefix ::= 'n' [1-9] [0-9]*
> [11] Comment ::= '<!--' (Char+ - (Char* '-->'))? '-->'
> [12] Mtag ::= '<' Name NSDecl? (Att NSDecl?)* '/>'
>
> Which is the same level of complexity. You take out PIs and
> add comments and empty start tags.
As far as I understand, SML has no attributes ... And also
maybe no MTag ... rules are simpler ... I suggest
"take no prisoners" :
+ Stag and Etag
+ Datachar
+ Comment ( optional )
+ Mtag for empty element ( optional )
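As a rough illustration of how small a parser for that subset can be,
here is a sketch in Python. It assumes no attributes, no namespaces, no
PIs and no entities beyond the three escapes &lt; &gt; &amp; from the
Datachar rule; the function name parse_sml, the name pattern and the
(name, children) output shape are choices made for this example only,
not anything defined by an SML proposal.

```python
import re

# One token per match: comment, end tag, start/empty tag, or character data.
TOKEN = re.compile(
    r"<!--.*?-->"                    # Comment (optional feature)
    r"|</([A-Za-z_][\w.-]*)>"        # Etag
    r"|<([A-Za-z_][\w.-]*)(/?)>"     # Stag, or Mtag when the slash is present
    r"|([^<]+)",                     # Datachar run
    re.S,
)

def parse_sml(text):
    """Parse attribute-free markup into nested (name, children) tuples."""
    root = ("#root", [])
    stack = [root]
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected markup at offset {pos}")
        pos = m.end()
        end_name, start_name, empty, data = m.groups()
        if end_name:
            if len(stack) == 1 or stack[-1][0] != end_name:
                raise ValueError(f"mismatched end tag </{end_name}>")
            stack.pop()
        elif start_name:
            node = (start_name, [])
            stack[-1][1].append(node)
            if not empty:            # an Mtag has no content, so do not descend
                stack.append(node)
        elif data is not None and data.strip():
            # Decode the three escapes; &amp; goes last to avoid double decoding.
            data = data.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&")
            stack[-1][1].append(data)
        # Comments and whitespace-only data fall through and are dropped.
    if len(stack) != 1:
        raise ValueError(f"unclosed element <{stack[-1][0]}>")
    return root[1]

tree = parse_sml("<doc><!-- note --><title>SML</title><br/><p>a &lt; b</p></doc>")
print(tree)
```

There is no validation and no DTD handling here; the only checks are
that tags nest properly and that nothing unrecognized follows a "<".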
I feel very suspicious when I see a project that
could not be implemented by one 'ideal' developer.
All the programs I love could be implemented by
one developer (at least as a reasonable prototype).
Implementing the XML framework to get something
handy looks like a *huge* task. Implementing a reasonable
SML framework looks doable.
Rgds.Paul.