Feeler for SML (Simple Markup Language)

Ken MacLeod ken at bitsko.slc.ut.us
Thu Nov 11 17:57:09 GMT 1999


"Don Park" <donpark at docuverse.com> writes:

> I have been thinking that there are applications out there that can
> benefit from using XML yet do not need all of its features.  The
> Canonical XML spec goes quite a distance in cutting away some of the
> features, for a different purpose, but I still feel that more can be
> cut away.  Rick Jelliffe's rather funny message along with some
> WebDAV papers I read over the weekend got my mind buzzing enough for
> me to share this idea with the rest of XML-DEV.

There have been many threads on "XML could have/should have been much
simpler if they had just done XYZ" often referring to Scheme or TeX
syntax, for example.

I too have had one of those brainfarts because I like to hand edit XML
and it's way too bulky in most cases.  I've been playing with an "XML
shorthand" that follows the Scheme camp's proposals (I originally
started out with a TeX style ;-).  One of the key requirements,
though, is the need to be interoperable with XML 1.0.  Here's what I
came up with.  The basic syntax only sees what XML people would call
"elements" and "character data"; all other XML structures are actually
built from XML-SH "elements".  XML-SH "element names" may contain XML
special characters.  First, an example with minimal XML markup:

  {p I've been playing with an {quote XML shorthand} that follows the
  {language Scheme} camp's proposals {note{=type parenthetical} I
  originally started out with a {language TeX} style {wink}}.  One of
  the key requirements, though, is the need to be interoperable with
  {standard{=version 1.0} XML}.}

With more XML markup:

  {?xml{=version 1.0}}
  {!DOCTYPE{=PUBLIC -//blah/}{=SYSTEM foo.dtd}
   {!ELEMENT foo (#PCDATA|subfoo)*}
  }
  {foo this is a foo, {subfoo this is subfoo.}
   This is {subfoo{=bar with an attribute}}.}


And a very rough draft of the basic syntax:

  Document ::= (Element | S)*

  Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
          | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
        /* as defined in XML 1.0 */

  S ::= (#x20 | #x9 | #xD | #xA)+
        /* as defined in XML 1.0 */

  Name ::= (Char - (S | StartDelim | EndDelim))*

  CharData ::= (Char - (StartDelim | EndDelim))+

  Content ::= (CharData | Element)*

  Element ::= StartDelim Name ((S | Element) Content)? EndDelim
        /* An element starts with a StartDelim, followed by a name of
           zero or more characters that are neither white space nor
           delimiters.  It may contain mixed content, introduced by
           either S or a nested element, and ends with EndDelim.
           Empty elements are written {TAG}; content that must not
           be preceded by white space is written {TAG{}content}. */

StartDelim and EndDelim are charset-dependent.  For ASCII I'd use `{'
and `}' as in the examples above.  In Unicode I'd really like to see
two characters dedicated to this purpose so markup can never conflict
with character data.
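
To make the draft syntax a bit more concrete, here is a minimal sketch
of a reader in Python.  It is only one possible reading of the grammar,
not a reference implementation.  The function names, the
(name, children) tuple shape, and the choice to drop the white space
that directly follows a name are all my assumptions, and it hardwires
the ASCII delimiters.

```python
# Hypothetical sketch of an XML-SH reader; names and tree shape are assumptions.
START, END = "{", "}"
WS = " \t\r\n"  # the S production from XML 1.0


def parse_element(text, pos):
    """Parse one element whose StartDelim is at text[pos].

    Returns ((name, children), next_pos).  children is a list that mixes
    str (character data) and nested (name, children) tuples.
    """
    if text[pos] != START:
        raise ValueError("expected StartDelim at offset %d" % pos)
    pos += 1
    begin = pos
    # The name runs up to the first white space character or delimiter.
    while pos < len(text) and text[pos] not in WS and text[pos] not in (START, END):
        pos += 1
    name = text[begin:pos]
    # Assumption: white space right after the name separates, it is not content.
    while pos < len(text) and text[pos] in WS:
        pos += 1
    children = []
    begin = pos
    while pos < len(text):
        ch = text[pos]
        if ch == START:
            if pos > begin:
                children.append(text[begin:pos])
            child, pos = parse_element(text, pos)
            children.append(child)
            begin = pos
        elif ch == END:
            if pos > begin:
                children.append(text[begin:pos])
            return (name, children), pos + 1
        else:
            pos += 1
    raise ValueError("missing EndDelim for element %r" % name)


def parse_document(text):
    """Document ::= (Element | S)*"""
    elements, pos = [], 0
    while pos < len(text):
        if text[pos] in WS:
            pos += 1
        elif text[pos] == START:
            element, pos = parse_element(text, pos)
            elements.append(element)
        else:
            raise ValueError("character data outside an element at offset %d" % pos)
    return elements
```

With this reading, parse_document("{TAG{}content}") gives a TAG element
whose children are an empty-named element followed by the string
"content", so the {} trick needs no special case in the reader.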

Interoperability with XML 1.0 is implemented using a transform and
well-formedness constraints in XML-SH.  "Well formed" in XML-SH would
mean "well formed when converted to XML 1.0".

The transform is fairly guessable from the above.  The XML-SH parser
reads everything as elements and character data; the transformer
recognizes elements that are actually XML structures and converts them
appropriately.  An XML-SH compliant parser should not need to generate
XML events directly; it can depend on a transform filter to do that
instead.  A minimal filter should be provided to keep non-well-formed
element names from being passed in startElement()/endElement() events.
This may (should) also involve gathering attribute-elements into
startElement() attributes.  More examples of transformable XML-SH:

 {?TARGET DATA}                          -- processing instruction
 {?TARGET {=NAME VALUE}}                 -- a PI using attr form
 {!DOCTYPE {=PUBLIC FPI} {=SYSTEM URI}   -- start of DOCTYPE
   {!ELEMENT NAME CONTENTSPEC}           -- element declaration
 }
 {!-- COMMENT}                           -- comment
 {&ENTITYREF}                            -- entity reference
 {&#123}                                 -- character reference
 {&#xabcd}                               -- hex character reference
 {NAME ...}                              -- an element with content
 {NAME{=ATTR VALUE} ...}                 -- an element with attributes
 {NAME{}...}                             -- element w/o extra whitespace
 {NAME followed by many lines of text
   {/NAME}}                              -- a no-op used as a comment
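
The forms listed above could be turned into XML 1.0 text along these
lines.  This is a rough sketch in Python under several assumptions: the
tree is a hand-built (name, children) tuple structure (children being
strings or nested tuples), to_xml() and its helpers are hypothetical
names, it emits text rather than startElement()/endElement() events,
and DOCTYPE and other declarations are deliberately left out.

```python
from xml.sax.saxutils import escape, quoteattr


def text_of(children):
    """Concatenate the character data of a child list (nested elements ignored)."""
    return "".join(c for c in children if isinstance(c, str))


def split_attrs(children):
    """Separate {=NAME VALUE} attribute-elements from ordinary content."""
    attrs, content = [], []
    for child in children:
        if not isinstance(child, str) and child[0].startswith("="):
            attrs.append((child[0][1:], text_of(child[1])))
        else:
            content.append(child)
    return attrs, content


def to_xml(node):
    """Serialize one node of the assumed tree as XML 1.0 text."""
    if isinstance(node, str):
        return escape(node)
    name, children = node
    if name == "" or name.startswith("/"):
        return ""  # {} separator and {/NAME} no-op carry no XML content
    if name == "!--":
        return "<!--%s-->" % text_of(children)
    if name.startswith("&"):
        return "%s;" % name  # entity or character reference
    attrs, content = split_attrs(children)
    attr_text = "".join(" %s=%s" % (k, quoteattr(v)) for k, v in attrs)
    if name.startswith("?"):
        data = text_of(content).strip()
        return "<%s%s%s?>" % (name, attr_text, " " + data if data else "")
    if name.startswith("!"):
        raise NotImplementedError("declarations such as %s are not handled here" % name)
    if not content:
        return "<%s%s/>" % (name, attr_text)
    body = "".join(to_xml(c) for c in content)
    return "<%s%s>%s</%s>" % (name, attr_text, body, name)


# Usage: {standard{=version 1.0}XML} as a hand-built tree.
print(to_xml(("standard", [("=version", ["1.0"]), "XML"])))
```

Attribute-elements are gathered before anything else is serialized,
which is the same step a SAX-style filter would need before it could
call startElement() with a complete attribute list.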

Some messages in this thread refer to "contracts".  Several of these
contracts could be based on limiting what features the XML transformer
will allow.  For example, there is no support in the XML-SH parser for
external entities; that's left to the transformer.
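
One way such a contract might look in practice is a check that runs
over the parsed tree before the transformer sees it.  The sketch below
is only an illustration in Python: check_contract() and its default
list of forbidden name prefixes are hypothetical, and it assumes the
tree is made of (name, children) tuples with strings for character
data.

```python
def check_contract(node, forbidden_prefixes=("!DOCTYPE", "!ENTITY", "?")):
    """Raise ValueError if node or any descendant uses a forbidden structure.

    forbidden_prefixes is an assumed, example contract: no DOCTYPE, no
    entity declarations, no processing instructions.
    """
    if isinstance(node, str):
        return  # character data is always allowed
    name, children = node
    for prefix in forbidden_prefixes:
        if name.startswith(prefix):
            raise ValueError("contract forbids %r elements" % name)
    for child in children:
        check_contract(child, forbidden_prefixes)


# Usage: a plain element passes, a DOCTYPE is rejected.
check_contract(("foo", ["this is a foo, ", ("subfoo", ["this is subfoo."])]))
try:
    check_contract(("!DOCTYPE", [("=SYSTEM", ["foo.dtd"])]))
except ValueError as err:
    print(err)
```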

-- 
  Ken MacLeod
  ken at bitsko.slc.ut.us
