How to keep "useless" information with SAX (2?).

Paul Tchistopolskii paul at qub.com
Tue Nov 23 06:13:21 GMT 1999


Hello.

I'm playing with the projectX (SAX 1.0 based) code,
trying to make the SAX API read an XML document
and then save it *unchanged*.

I was having some problems with 
'useless whitespace' outside the elements.  

I mean the situation where I have:

<!DOCTYPE content SYSTEM "content.dtd">
<!-- some comment -->
<root>
</root>

With Sun's extensions to the SAX API I can get
the content of the <!DOCTYPE declaration and the
content of the comment.

Unfortunately, the whitespace (including the
newline) between those constructs is lost.
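
To make the problem concrete, here is a minimal sketch of the same
experiment. It is NOT the projectX / SAX 1.0 code I am patching: it
assumes a parser that supports the SAX2 LexicalHandler extension
(org.xml.sax.ext), reached through JAXP. The class name PrologEvents
and the event labels are made up for illustration. The empty
entity resolver is only there so that the missing content.dtd is
not fetched. As far as I can tell, such a parser reports the DOCTYPE
and the comment, but no whitespace event before the root element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class PrologEvents {
    /** Parses xml and returns the SAX events in the order they arrived. */
    static List<String> trace(String xml) throws Exception {
        final List<String> events = new ArrayList<>();
        DefaultHandler2 h = new DefaultHandler2() {
            @Override public void startDTD(String name, String publicId, String systemId) {
                events.add("doctype " + name);
            }
            @Override public void comment(char[] ch, int start, int length) {
                events.add("comment");
            }
            @Override public void startElement(String uri, String local, String qName, Attributes a) {
                events.add("start " + qName);
            }
            @Override public void endElement(String uri, String local, String qName) {
                events.add("end " + qName);
            }
            @Override public void characters(char[] ch, int start, int length) {
                events.add("text");
            }
            @Override public void ignorableWhitespace(char[] ch, int start, int length) {
                events.add("whitespace");
            }
            // Assumption: substitute an empty DTD so content.dtd need not exist.
            @Override public InputSource resolveEntity(String name, String publicId,
                                                       String baseURI, String systemId) {
                return new InputSource(new StringReader(""));
            }
        };
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(h);
        reader.setEntityResolver(h);
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", h);
        reader.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE content SYSTEM \"content.dtd\">\n"
                   + "<!-- some comment -->\n"
                   + "<root>\n</root>\n";
        for (String e : trace(xml)) {
            System.out.println(e);
        }
    }
}
```

The newlines after the DOCTYPE and after the comment never show up
as "text" or "whitespace" entries ahead of "start root", which is
exactly the information I would need to save the document unchanged.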

To work around this problem I patched the code
in one place:

/com/sun/xml/parser/Parser.java

 private boolean maybeWhitespace () throws IOException, SAXException
 {
  if (!(inExternalPE && doLexicalPE))
   return in.ignorableWhitespace (docHandler);  // old code: return in.maybeWhitespace ();

This lets me receive "ignorable whitespace"
everywhere, not only inside the elements,
through the ignorableWhitespace callback.
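
On the receiving side, saving the document is then a matter of
echoing every callback, whitespace included. A rough sketch of such
a handler follows. It is written against the SAX2 DefaultHandler
available through JAXP rather than the SAX 1.0 classes I am using,
and the class name EchoHandler is invented for the example. It is
not a full serializer: empty-element tags come out as start/end
pairs, and comments, the DOCTYPE and the <?xml header are not
handled here at all.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EchoHandler extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();

    @Override public void startElement(String uri, String local, String qName, Attributes atts) {
        out.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(' ').append(atts.getQName(i)).append("=\"")
               .append(escape(atts.getValue(i), true)).append('"');
        }
        out.append('>');
    }

    @Override public void endElement(String uri, String local, String qName) {
        out.append("</").append(qName).append('>');
    }

    @Override public void characters(char[] ch, int start, int length) {
        out.append(escape(new String(ch, start, length), false));
    }

    // Whitespace the parser considers ignorable is written back untouched.
    @Override public void ignorableWhitespace(char[] ch, int start, int length) {
        out.append(ch, start, length);
    }

    // Minimal escaping: '&' and '<' everywhere, '"' inside attribute values.
    private static String escape(String s, boolean inAttribute) {
        String r = s.replace("&", "&amp;").replace("<", "&lt;");
        return inAttribute ? r.replace("\"", "&quot;") : r;
    }

    /** Parses xml and returns what the handler managed to write back. */
    static String echo(String xml) throws Exception {
        EchoHandler h = new EchoHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), h);
        return h.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(echo("<root>\n  <a x=\"1\">hi</a>\n</root>"));
    }
}
```

Everything between <root> and </root> survives this way. Whatever
the parser never reports (such as the whitespace around the DOCTYPE
and the comment) cannot be echoed, which is why I had to patch the
parser itself.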

However, I feel that I'm doing something ugly,
because for some reason ignorableWhitespace is
defined only for element content.

I'm wondering what the idea was behind reporting
ignorableWhitespace only inside elements?

What happens in SAX2?

It also appears that Sun's parser (I think
Sun's parser is not the exception, right?) does not care
about some 'useless' things, like the <?xml header.

I mean that, according to the code, there is simply
no way to capture the content of the <?xml header, even
though it may carry some *very* interesting (sometimes
critical) information, like the encoding.

I have already seen unfortunate people supplying
encodings like "windows-something", and given some
Java-specific issues, the current view of the
<?xml header (as something not useful enough
to keep) looks strange.
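
Until the API exposes the header, the only thing I can think of is
to peek at the raw bytes before handing them to the parser. The
following is a sketch of that idea only, not anything projectX or
SAX provides. XmlDeclSniffer and declaredEncoding are hypothetical
names, and it assumes the document starts directly with the <?xml
header in an ASCII-compatible encoding (no byte order mark, no
UTF-16).

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XmlDeclSniffer {
    // An XML declaration at the very start of the document; the optional
    // encoding value is captured in group 1 (double quotes) or 2 (single).
    private static final Pattern DECL = Pattern.compile(
        "^<\\?xml\\s+version\\s*=\\s*(?:\"[^\"]*\"|'[^']*')"
        + "(?:\\s+encoding\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'))?"
        + "[^?]*\\?>");

    /** Returns the declared encoding, or null if none is declared. */
    static String declaredEncoding(byte[] doc) {
        // Only the first bytes matter; ISO-8859-1 maps each byte to one char.
        int n = Math.min(doc.length, 256);
        String head = new String(doc, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = DECL.matcher(head);
        if (!m.find()) {
            return null;
        }
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    public static void main(String[] args) {
        byte[] doc = "<?xml version=\"1.0\" encoding=\"windows-1251\"?>\n<root/>"
            .getBytes(StandardCharsets.US_ASCII);
        System.out.println(declaredEncoding(doc));
    }
}
```

This keeps the declared name exactly as written, so it can be put
back when the document is saved, but it duplicates work the parser
already does internally.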

What happens in SAX2?

Rgds.Paul.




xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)




