Scripting and XML

Mon Oct 20 14:58:05 BST 1997

> From: Simon St.Laurent <SimonStL at classic.msn.com>

> It seems like the complexities of SGML that XML stripped away are still 
> haunting XML.  

Of course. 

But I think we should be aware that SGML came at the end of 
perhaps a fifteen-year development project involving thousands of 
documents at IBM and other places.  So XML is the result of 
25 years of continuous development. I think humility should make
those of us with substantially less experience (time-wise and
scale-wise) careful not to label as irrelevant or excessive
any SGML feature that we have not personally seen the need for! 

Which is not to say we cannot fruitfully bitch and clamour for what
we need for our own tasks, of course :-)   That we all could have
done it ever so much better goes without saying.  If you are interested
in improving SGML (which can flow through into XML), then contact
ANSI or your local standards organisation and become part of the process.
If you are interested in improving XML (which can flow through into SGML),
then I guess join W3C or be vocal on this group!  However, I think
the XML 1.0 design is pretty much stabilized now.

> Excellent.  Now we know what the SGML developers were thinking - now we just 
> need to figure out why this is relevant to XML.  Why is it so difficult to 
> create CDATA elements - which have to be marked clearly in XML by start and 
> end tags? 

It may be helpful to clarify what "Language" in SGML and XML means: it is not 
"something with a grammar" but "something directly readable by humans and 
editable with plain text editors".  In other words, there can be no 
"binary" markup in SGML documents.

So any solution that embeds binary indexes pointing to the ends of
binary sections is not SGML, because it is not human-readable in a
simple text editor.  (It is just a fancy data storage format.  By the
way, I think the HyTime sBento provides a high-level interface to data
storage of this kind, so you can use such formats from within SGML/XML
and still be ISO standard.)

> There is no need in XML to stop CDATA at just any </ sequence, just 
> the </ sequence which turns into the full end tag of the element.  Of course, 
> this would probably break compatibility with all my favorite SGML parsers, at 
> least if I wrote scripts that used </ at some point

If your favourite parser conforms to SGML (e.g. SP, OmniMark), then
it will treat the CDATA as stopping when it finds any "</" followed by
any NAMESTART character, whether or not that character begins the
current element's type name.  This is because SGML allows you to omit
tags in many circumstances: this is familiar from HTML (except that
vendors implement their own minimization systems in their editors, which
currently I am so mad at, because I have to re-mark up a few hundred 
pages after a highly regarded HTML editor completely stuffed up 
all my nice markup, rrrr: I am sure if the programmers knew how much 
difficulty they cause with their brilliant proprietary ideas they would 
resign in shame as *HOPELESS HACKERS*).  
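To make the difference concrete, here is a rough sketch in Python comparing the SGML rule with the rule proposed in the quoted message, applied to a small script.  The function names are hypothetical, and name-start characters are assumed to be the ASCII letters of SGML's reference concrete syntax; a real SGML parser such as SP does considerably more than this.

```python
import re

# Assumption: name-start characters are ASCII letters only, as in the
# SGML reference concrete syntax.  Real SGML declarations can add more.
NAME_START = "A-Za-z"

def sgml_cdata_end(text: str) -> int:
    """Hypothetical helper: index where CDATA content ends under the SGML
    rule, i.e. at the first '</' followed by any name-start character."""
    m = re.search(rf"</[{NAME_START}]", text)
    return m.start() if m else len(text)

def matching_end(text: str, gi: str) -> int:
    """Hypothetical helper: index where content would end under the rule
    proposed in the quoted message, i.e. only at the full end tag of the
    current element ('</gi>').  Case-insensitive, as in HTML."""
    m = re.search(rf"</{re.escape(gi)}\s*>", text, flags=re.IGNORECASE)
    return m.start() if m else len(text)

# Content of a <script> element, followed by its real end tag.
script = 'document.write("<b>hi</b>");</script>'

# SGML rule: stops at "</b", cutting the script short.
print(script[:sgml_cdata_end(script)])            # document.write("<b>hi
# Proposed rule: stops only at "</script>".
print(script[:matching_end(script, "script")])    # document.write("<b>hi</b>");
```

Even the proposed rule still needs some convention for a script whose text contains the literal string "</script>".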

XML does not allow you to omit any tags, and it does not do any context
checking as far as well-formedness is concerned.  
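Both points can be seen with a non-validating XML parser, here Python's standard xml.etree.ElementTree, used only as an illustration (is_well_formed is a hypothetical helper).  An HTML-style omitted end tag is a fatal error, while a nesting that a DTD might forbid is still well-formed, because no content model is consulted.

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc: str) -> bool:
    """Hypothetical helper: True if the string parses as well-formed XML."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

# HTML-style minimization: the first </p> is omitted, so XML rejects it.
print(is_well_formed("<body><p>one<p>two</p></body>"))      # False
# Every end tag written out explicitly.
print(is_well_formed("<body><p>one</p><p>two</p></body>"))  # True
# No context checking: <body> inside <p> is still well-formed.
print(is_well_formed("<p><body/></p>"))                     # True
```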

Rick Jelliffe

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)