SAX and whitespace (was Re: Problems with whitespace and msxml)

David Megginson ak117 at freenet.carleton.ca
Sat Jan 3 02:48:48 GMT 1998


Tim Bray writes:

 > Lark, BTW, does *not* catch ignorable white space unless it is
 > validating.  Since it is perfectly OK to build SAX with such a
 > processor, *if* we want to build ignorable white-space notification
 > into SAX, it has to be out-of-band; i.e. white space is passed in
 > the same way as all other content; with perhaps another boolean argument
 > to the text() method (is that what it's called now?) that, if true, means
 > this is ignorable white space.

Thank you for the reply, Tim.  I would like to make certain, however,
that I understand the behaviour that you're recommending.  If a
DTD-driven parser finds ignorable whitespace, and if we decide that
SAX should not provide ignorable whitespace notification, then which
of the following is the correct action?

1) the parser should not report the whitespace; or

2) the parser should report the whitespace as regular character data.

From my reading of the XML Proposed Recommendation (PR), and from my
understanding of your comments, you are recommending (2); in other
words, given the following document:

  <!DOCTYPE foo [
   <!ELEMENT foo (bar+)>
   <!ELEMENT bar (#PCDATA)>
  ]>
  <foo>
  <bar>one bar</bar>
  <bar>two bars</bar>
  </foo>

a DTD-driven parser would report something like the following events
through SAX:

  - start document
  - start element: "foo"
  - character data: "\n"
  - start element: "bar"
  - character data: "one bar"
  - end element: "bar"
  - character data: "\n"
  - start element: "bar"
  - character data: "two bars"
  - end element: "bar"
  - character data: "\n"
  - end element: "foo"
  - end document
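
As a rough illustration of option (2), the trace above can be
reproduced with a small event logger.  This is only a sketch: it uses
Python's standard xml.sax module rather than any parser discussed in
this thread, and the names DOCUMENT, EventLogger and log_events are
made up for the example.  The parser bundled with Python (expat) does
not validate, so the whitespace between elements arrives through
characters() like any other text.

```python
import xml.sax

# The sample document from the message, without the display indentation.
DOCUMENT = b"""<!DOCTYPE foo [
 <!ELEMENT foo (bar+)>
 <!ELEMENT bar (#PCDATA)>
]>
<foo>
<bar>one bar</bar>
<bar>two bars</bar>
</foo>
"""


class EventLogger(xml.sax.ContentHandler):
    """Record SAX events, merging character data that arrives in pieces."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append(("start document",))

    def endDocument(self):
        self.events.append(("end document",))

    def startElement(self, name, attrs):
        self.events.append(("start element", name))

    def endElement(self, name):
        self.events.append(("end element", name))

    def characters(self, content):
        # A parser may split one run of text across several calls,
        # so glue adjacent chunks together before logging them.
        if self.events and self.events[-1][0] == "character data":
            content = self.events.pop()[1] + content
        self.events.append(("character data", content))

    def ignorableWhitespace(self, whitespace):
        # Only a parser that uses the DTD content models would call this.
        self.events.append(("ignorable whitespace", whitespace))


def log_events(document):
    """Parse the document and return the list of recorded events."""
    handler = EventLogger()
    xml.sax.parseString(document, handler)
    return handler.events


if __name__ == "__main__":
    for event in log_events(DOCUMENT):
        print("-", *map(repr, event))
```

Run against the sample document, the logger should record the three
newline-only character data events alongside "one bar" and "two bars".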

In full SGML, you'd get something a little simpler, because the
whitespace in element content would be discarded:

  - start document
  - start element: "foo"
  - start element: "bar"
  - character data: "one bar"
  - end element: "bar"
  - start element: "bar"
  - character data: "two bars"
  - end element: "bar"
  - end element: "foo"
  - end document
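
An application that wants this simpler stream from a parser that
behaves as in (2) could filter the events itself.  The sketch below
is one possible approach under stated assumptions, not part of SAX:
it again uses Python's standard xml.sax module, the names
ElementContentFilter and filter_events are hypothetical, and the
application must supply the set of elements declared with element
content (here just "foo"), since nothing in the event stream carries
that information.

```python
import xml.sax

# The sample document from the message, without the display indentation.
DOCUMENT = b"""<!DOCTYPE foo [
 <!ELEMENT foo (bar+)>
 <!ELEMENT bar (#PCDATA)>
]>
<foo>
<bar>one bar</bar>
<bar>two bars</bar>
</foo>
"""

XML_WHITESPACE = " \t\r\n"


class ElementContentFilter(xml.sax.ContentHandler):
    """Drop whitespace-only text found directly inside listed elements."""

    def __init__(self, element_content_names):
        super().__init__()
        # Assumption: the caller knows which elements have element content.
        self.element_content_names = set(element_content_names)
        self.open_elements = []
        self.events = []

    def startDocument(self):
        self.events.append(("start document",))

    def endDocument(self):
        self.events.append(("end document",))

    def startElement(self, name, attrs):
        self.open_elements.append(name)
        self.events.append(("start element", name))

    def endElement(self, name):
        self.open_elements.pop()
        self.events.append(("end element", name))

    def characters(self, content):
        parent = self.open_elements[-1] if self.open_elements else None
        is_blank = not content.strip(XML_WHITESPACE)
        if parent in self.element_content_names and is_blank:
            return  # treat as ignorable whitespace and discard it
        self.events.append(("character data", content))


def filter_events(document, element_content_names):
    """Parse the document and return the filtered list of events."""
    handler = ElementContentFilter(element_content_names)
    xml.sax.parseString(document, handler)
    return handler.events


if __name__ == "__main__":
    for event in filter_events(DOCUMENT, {"foo"}):
        print("-", *map(repr, event))
```

Passing an empty set leaves the whitespace in place, which gives back
the behaviour of option (2).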

 > But I would oppose doing this in SAX; let's keep it simple for now. -T.

Sounds reasonable.


All the best,


David

-- 
David Megginson                 ak117 at freenet.carleton.ca
Microstar Software Ltd.         dmeggins at microstar.com
      http://home.sprynet.com/sprynet/dmeggins/
