Simple approaches to XML implementation

Toby Speight tms at ansa.co.uk
Sun Mar 2 15:41:29 GMT 1997


Peter> Peter Murray-Rust <URL:mailto:Peter at ursus.demon.co.uk>
Tim> Tim Bray <URL:mailto:tbray at textuality.com>

I'm not signing this, since one of our mailhosts is badly corrupting
text.

>>>>> In article <3.0.32.19970301162825.006900ac at pop.intergate.bc.ca>,
>>>>> Tim wrote:

Tim> Whereas I agree with the rest of Francois' contribution, this
Tim> paragraph is not quite right.  If you change "ESIS event stream"
Tim> to "Instance character stream", then it would be more correct.
Tim> But in fact the SGML-> SGML declaration was not one of our goals;

>>>>> In article <4143 at ursus.demon.co.uk>, Peter wrote:

Peter> I hope I haven't muddied the waters here - SGML->SGML was not
Peter> my intention either.  The (possibly fuzzy) idea was that I (and
Peter> probably others) are familiar with ESIS because they use sgmls
Peter> and 'could this help us in our search for the ideas that go
Peter> into the API'.

I think it's good to have some of these conceptual anchors around - it
helps us know when we're talking about the same things.

Tim> ... for example, the processor is not required to tell the app
Tim> about [at least] comments and <![CDATA[ sections.  The XML spec
Tim> says *nothing* about the ESIS, merely, in a very abstract way,
Tim> what the processor has to give the application.

Tim> If either of these problems (the impossibility of SGML->SGML or
Tim> the absence of an ESIS equivalent) is a big huge flaw in XML,
Tim> there's still time to fix it.  The SGML->SGML problem is probably
Tim> a job for the XML WG.  The ESIS issue is perhaps a job for this
Tim> list.  I personally think an API is better than an ESIS [even if
Tim> the ESIS were properly defined] anyhow.

Peter> ... there is no reason why we shouldn't initially limit the
Peter> power of the API if it makes sense.  For example [as Tim says]
Peter> I can do without the comments and CDATA.


IMO, the application should be able to decide (preferably at compile
time) whether it is interested in comments etc.  We want to enable the
creation of small, efficient applications as well as highly capable
ones; I suggest that the parser should provide lots of information,
but filter it down to ESIS by default.


My mental model has two kinds of application: those that take a well-
formed document and present it to the user, and those that take a
valid document and allow the user to edit and save it.  The ability to
perform the identity transform is obviously a requirement for the
latter, whereas an ESIS stream may be sufficient for the former.  What
exactly constitutes an identity transform is not entirely clear cut,
though.  Is it okay to expand internal CDATA entities?  Do we need to
preserve record-end information?  (We might want to do this if we will
be running "diff" on the output - for version control systems,
perhaps).

I'd like to see a parser come with a base class for building an
application's event-stream handler, that simply throws away most
events - the application writer overrides the methods he is interested
in.  Some of the events, however, would have other actions.  Two
examples:

1. the default handler for #PCDATA would expand internal CDATA entity
   references and splice in marked sections, and pass the result to
   the handler for ESIS "data".
2. the default handler for #EMPTY elements would call the handler for
   start-tag, then the one for end-tag.
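
The two defaults above could be sketched roughly as follows.  This is
only an illustration under stated assumptions: the class and method
names (XmlEventHandler, emptyElement, cdataSection, entityReference and
so on) are hypothetical and are not taken from NXP or any other parser.
It also simplifies example 1 by forwarding each piece to "data"
separately instead of splicing the pieces into one string first.  The
"(", ")" and "-" prefixes imitate the output style of sgmls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class; none of these names come from a real parser API.
class XmlEventHandler {
    // ESIS-level events: thrown away unless the application overrides them.
    public void startTag(String name) {}
    public void endTag(String name) {}
    public void data(String text) {}

    // Superset event with no ESIS equivalent: thrown away by default.
    public void comment(String text) {}

    // Superset events whose defaults map them down onto ESIS events.
    public void emptyElement(String name) {
        startTag(name);   // example 2: start-tag handler first...
        endTag(name);     // ...then the end-tag handler
    }

    public void cdataSection(String text) {
        data(text);       // example 1 (simplified): marked section becomes data
    }

    public void entityReference(String name, String replacementText) {
        data(replacementText);  // example 1 (simplified): expand the entity
    }
}

// An application that only cares about ESIS events.
class EsisPrinter extends XmlEventHandler {
    final List<String> lines = new ArrayList<>();

    @Override public void startTag(String name) { lines.add("(" + name); }
    @Override public void endTag(String name)   { lines.add(")" + name); }
    @Override public void data(String text)     { lines.add("-" + text); }
}

class HandlerDemo {
    public static void main(String[] args) {
        EsisPrinter p = new EsisPrinter();
        // Events a parser might deliver; the comment is silently dropped.
        p.startTag("doc");
        p.comment("not interesting to this application");
        p.emptyElement("br");
        p.cdataSection("a < b");
        p.endTag("doc");
        for (String line : p.lines) {
            System.out.println(line);
        }
    }
}
```

An editor-style application would instead override comment() and
cdataSection() itself, so that it sees the distinctions it needs in
order to write the document back out.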

I'm looking at the "Esis" interface in NXP, and I think it could be
modified to act as such a base class.  Comments from Norbert would be
appreciated.

The use of the base-class methods as a filter from the XML event
stream to an ESIS stream means that an application could be written[*]
that acts on ESIS events, but could selectively choose events to
handle from a superset of ESIS - could we agree on a suitable
superset?

[*] or an existing application quickly ported - this makes a
    convincing argument to me :-)

[In my approach, we could even change the superset without affecting
those applications that run off the subset - and simply extending the
superset shouldn't affect any existing application, because the base
class would simply throw away the new events]
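
As a rough sketch of why extending the superset should be harmless:
in the code below, processingInstruction() stands for an event added
to the base class after TagCounter was written.  All of the names
(EventSink, TagCounter, CommentKeeper, SupersetDemo) are hypothetical
and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class whose defaults throw every event away.
class EventSink {
    public void startTag(String name) {}
    public void endTag(String name) {}
    public void data(String text) {}
    public void comment(String text) {}
    // Imagine this event was added to the superset later on.
    public void processingInstruction(String target, String text) {}
}

// Written against the ESIS subset; knows nothing about newer events.
class TagCounter extends EventSink {
    int tags = 0;
    @Override public void startTag(String name) { tags++; }
}

// Chooses one extra event from the superset.
class CommentKeeper extends EventSink {
    final List<String> comments = new ArrayList<>();
    @Override public void comment(String text) { comments.add(text); }
}

class SupersetDemo {
    // Stands in for a parser delivering the full superset of events.
    static void feed(EventSink sink) {
        sink.startTag("doc");
        sink.comment(" note ");
        sink.processingInstruction("target", "x");
        sink.data("hi");
        sink.endTag("doc");
    }

    public static void main(String[] args) {
        TagCounter counter = new TagCounter();
        CommentKeeper keeper = new CommentKeeper();
        feed(counter);
        feed(keeper);
        System.out.println(counter.tags + " start-tag(s) counted");
        System.out.println(keeper.comments.size() + " comment(s) kept");
    }
}
```

TagCounter needs no change when the new event appears, because the
inherited default discards it; only applications that want the event
override it.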


Peter> We don't lose anything by getting this off the ground quickly.
Peter> It exercises the language, helps us locate resources and
Peter> clarifies our thoughts.  A first-generation set of tools will
Peter> impress the world and maybe might be extended into more
Peter> powerful systems.  It also helps to build up a core of
Peter> documents that act as examples.

I'm not commenting on this; I just quoted it because I agree :-)


xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)



