SAX: Whitespace Handling (question 5 of 10)

Matthew Gertner matthewg at poet.de
Mon Jan 5 10:35:12 GMT 1998


>At 01:02 PM 03/01/98 -0500, David Megginson wrote:
>>Should SAX allow DTD-driven parsers to distinguish ignorable
>>whitespace from other character data?
>
>If you want to do this, the only reasonable way is with another
>argument on the charData() callback, so that it's always chardata,
>but some processors will in some circumstances signal that it's
>also ignorable.
>
>Since I think it would be highly unwise for any SAX-using
>application to have behavior dependent on the ignorability of
>some white space, I would argue strongly just for leaving
>this out. -Tim
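A hypothetical Java sketch of the extra-argument approach described above.
CharDataHandler and its four-argument charData() are illustrative names
borrowed from the thread, not part of any released SAX interface. A handler
that doesn't care about the distinction simply ignores the flag, so the data
is always delivered as chardata.

```java
import java.util.ArrayList;
import java.util.List;

// HYPOTHETICAL callback shape discussed in the thread; not a real SAX API.
interface CharDataHandler {
    // ignorable is true only when a DTD-aware parser knows this run is
    // whitespace in element content; other parsers would always pass false.
    void charData(char[] ch, int start, int length, boolean ignorable);
}

public class FlaggedCharDataDemo {
    /** Simulates two parser callbacks and returns the text a filtering handler keeps. */
    public static List<String> run() {
        List<String> kept = new ArrayList<>();
        // Handler that keeps ordinary data and discards runs flagged as ignorable.
        CharDataHandler handler = (ch, start, length, ignorable) -> {
            if (!ignorable) {
                kept.add(new String(ch, start, length));
            }
        };
        char[] buf = "First line.\n    ".toCharArray();
        handler.charData(buf, 0, 11, false); // "First line." is real data
        handler.charData(buf, 11, 5, true);  // "\n    " flagged as ignorable
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [First line.]
    }
}
```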


I am pretty leery of arguments along the lines of "if we allow this, people
will abuse it". There are clearly cases where this information is
essential, so why lock out whole classes of applications over what amounts
to a single boolean parameter, which could be given a default value?
For example, consider an application that takes an HTML document augmented
with XML tags, which some mechanism converts to text or HTML for viewing
in an HTML browser. If the document reads something like:

...
<body>
First line.
<myprint value="foo">
    <param name="name1" value="value1"/>
</myprint>
Second line.
</body>
...

I am sure there are plenty of similar examples where a document written
against one DTD is used to generate another, viewable one. This is a perfect
SAX application, since it doesn't require any funky comments, entity
resolution, etc. But if there is no indication of which whitespace is
ignorable, it cannot be implemented properly: spurious line breaks and
spaces end up in the generated output.
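A rough sketch of how such a converter might handle text with the later
SAX2/JAXP API, assuming a validating parser obtained from SAXParserFactory
and a small internal DTD invented here for a trimmed version of the example
above. The handler copies characters() data to the output and drops whatever
arrives through ignorableWhitespace(). Note that under XML 1.0 only
whitespace in element content (here, inside <myprint>, declared as
(param*)) is flagged as ignorable; the line breaks directly inside <body>,
which has mixed content, still arrive as ordinary character data.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WhitespaceFilterDemo {
    // Internal DTD invented for this sketch: <body> has mixed content,
    // <myprint> allows only <param> children (element content).
    public static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE body [\n"
        + "  <!ELEMENT body (#PCDATA | myprint)*>\n"
        + "  <!ELEMENT myprint (param*)>\n"
        + "  <!ATTLIST myprint value CDATA #IMPLIED>\n"
        + "  <!ELEMENT param EMPTY>\n"
        + "  <!ATTLIST param name CDATA #REQUIRED value CDATA #REQUIRED>\n"
        + "]>\n"
        + "<body>\nFirst line.\n<myprint value=\"foo\">\n"
        + "    <param name=\"name1\" value=\"value1\"/>\n"
        + "</myprint>\nSecond line.\n</body>";

    /** Copies character data to the output; counts but drops ignorable whitespace. */
    public static class TextCollector extends DefaultHandler {
        public final StringBuilder text = new StringBuilder();
        public int ignorableChars = 0;

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            ignorableChars += length; // counted, but not copied to the output
        }
    }

    /** Parses the document with DTD validation so element content is known. */
    public static TextCollector collect(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();
            TextCollector handler = new TextCollector();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TextCollector h = collect(SAMPLE);
        System.out.println("kept: [" + h.text + "]");
        System.out.println("ignorable chars dropped: " + h.ignorableChars);
    }
}
```

Running main should show the indentation inside <myprint> disappearing,
while the line breaks around "First line." and "Second line." survive.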

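BTW: IMHO, IFF there is going to be a "default implementation" anyway, I
would actually prefer a separate "ignorableWhitespace" method that calls
charData by default. This would permit cleaner implementations, since
handlers that care about the distinction override one method instead of
checking a flag on every charData call.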
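A hypothetical Java sketch of that suggestion. The names follow the
thread's charData spelling; WhitespaceAwareHandler and its subclasses are
invented for illustration. For comparison, the later SAX2 helper class
org.xml.sax.helpers.DefaultHandler took the opposite default: its
ignorableWhitespace() does nothing unless overridden.

```java
// HYPOTHETICAL base class illustrating the proposed default; not a SAX API.
abstract class WhitespaceAwareHandler {
    /** Called for ordinary character data. */
    public abstract void charData(char[] ch, int start, int length);

    /** Default: forward ignorable whitespace as ordinary character data. */
    public void ignorableWhitespace(char[] ch, int start, int length) {
        charData(ch, start, length);
    }
}

/** Keeps everything, relying on the default forwarding. */
class VerbatimCollector extends WhitespaceAwareHandler {
    final StringBuilder out = new StringBuilder();

    @Override
    public void charData(char[] ch, int start, int length) {
        out.append(ch, start, length);
    }
}

/** Opts in to dropping ignorable whitespace by overriding a single method. */
class StrippingCollector extends VerbatimCollector {
    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        // deliberately discarded
    }
}

public class DefaultIgnorableDemo {
    /** Feeds the same event sequence to a handler and returns what it kept. */
    public static String feed(VerbatimCollector h) {
        char[] ws = "\n    ".toCharArray();
        char[] text = "Second line.".toCharArray();
        h.ignorableWhitespace(ws, 0, ws.length);
        h.charData(text, 0, text.length);
        return h.out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + feed(new VerbatimCollector()) + "]");
        System.out.println("[" + feed(new StrippingCollector()) + "]");
    }
}
```

Handlers written without any knowledge of ignorable whitespace behave
exactly as they would with a non-DTD-aware parser, which is the point of
making forwarding the default.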

<ignorance>
Is text containing *only* whitespace inside an "ambiguous" area of a mixed
content model considered to be ignorable?
</ignorance>

Regards,

Matthew
------------------------------------------------
Matthew Gertner
Project Manager/Architect, Internet/Document Management
POET Software GmbH
Tel: +49 (40) 609 90254
Fax: +49 (40) 609 90115
E-mail: matthewg at poet.de
------------------------------------------------


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



