How to keep "useless" information with SAX (2?).
Paul Tchistopolskii
paul at qub.com
Tue Nov 23 06:13:21 GMT 1999
Hello.
I'm playing with the projectX (SAX 1.0 based) code,
trying to get the SAX API to read an XML document
and then save it *unchanged*.
I ran into a problem with the
'useless whitespace' outside the elements.
I mean the situation where I have:
<!DOCTYPE content SYSTEM "content.dtd">
<!-- some comment -->
<root>
</root>
With Sun's extensions to the SAX API I could get
the content of the <!DOCTYPE and of the comment.
Unfortunately, the whitespace (including the
newline) between those constructs is
lost.
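To make the lost whitespace concrete, here is a minimal sketch, not the projectX code. It assumes a JAXP parser and the SAX2 extension handlers (org.xml.sax.ext.DefaultHandler2) in place of Sun's SAX 1.0 extensions. The class name PrologEvents, the event strings, and the empty stand-in for content.dtd are all made up for the illustration. It feeds the document above to the parser and records each callback it receives.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper for illustration; not part of projectX or of SAX.
class PrologEvents {

    /** Parses the document and returns the callbacks received, in order. */
    static List<String> collect(String xml) {
        final List<String> events = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public InputSource resolveEntity(String name, String publicId,
                                             String baseURI, String systemId) {
                // Assumption: answer every external entity with an empty
                // one, so that content.dtd does not have to exist.
                return new InputSource(new StringReader(""));
            }
            @Override
            public void startDTD(String name, String publicId, String systemId) {
                events.add("doctype:" + name);
            }
            @Override
            public void comment(char[] ch, int start, int length) {
                events.add("comment:" + new String(ch, start, length));
            }
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                events.add("start:" + qName);
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                events.add("characters:" + length);
            }
            @Override
            public void ignorableWhitespace(char[] ch, int start, int length) {
                events.add("ignorable:" + length);
            }
        };
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setEntityResolver(handler);
            // Standard SAX2 property that routes comments and DOCTYPE
            // boundaries to a LexicalHandler.
            reader.setProperty(
                "http://xml.org/sax/properties/lexical-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return events;
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE content SYSTEM \"content.dtd\">\n"
                   + "<!-- some comment -->\n"
                   + "<root>\n"
                   + "</root>";
        collect(xml).forEach(System.out::println);
    }
}
```

For the sample document the list should begin with the DOCTYPE, the comment and the start of <root>, in that order. Neither newline between those constructs produces a callback, because SAX reports character data only inside the root element.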
What I did to work around this problem was
to patch the code in one place:
/com/sun/xml/parser/Parser.java
    private boolean maybeWhitespace () throws IOException, SAXException
    {
        if (!(inExternalPE && doLexicalPE))
            return in.ignorableWhitespace (docHandler);
            // return in.maybeWhitespace (); -- this was the old code
This lets me receive "ignorable whitespace"
everywhere, not only inside elements,
through the ignorableWhitespace callback.
However, I feel that I'm doing something ugly,
because for some reason ignorableWhitespace is
defined only for element content.
I'm wondering, what was the idea behind reporting
ignorableWhitespace only inside elements?
What happens in SAX2?
It also appears that Sun's parser (I think
Sun's parser is not the exception, right?) does not care
about some 'useless' things, like the <?xml header.
I mean that, according to the code, there is simply
no way to capture the content of the <?xml header, even
though it may carry some *very* interesting (sometimes
critical) information, like the encoding.
Because I have already seen poor people providing
"windows-something" as the encoding, and because of some
Java-specific issues, the current point of view
on the <?xml header (something not useful enough
to keep) looks strange.
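One possible workaround, sketched under stated assumptions and not something the parser offers, is to look at the first bytes of the stream before the parser consumes them. The class XmlDeclSniffer and its methods are hypothetical names. The sketch assumes an ASCII-compatible encoding with no byte order mark, so it would not work for UTF-16 or EBCDIC input.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of SAX or of Sun's parser.
class XmlDeclSniffer {

    // An XML declaration must be the very first thing in the document.
    private static final Pattern DECL =
        Pattern.compile("^<\\?xml\\s[^?]*\\?>");

    private static final Pattern ENCODING = Pattern.compile(
        "encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Reads up to limit bytes, then rewinds so a parser can reuse the stream. */
    static byte[] peek(BufferedInputStream in, int limit) throws IOException {
        in.mark(limit);
        byte[] buf = new byte[limit];
        int n = 0;
        int r;
        while (n < limit && (r = in.read(buf, n, limit - n)) > 0) {
            n += r;
        }
        in.reset();
        return Arrays.copyOf(buf, n);
    }

    /** Returns the raw declaration at the start of head, or null if absent. */
    static String declaration(byte[] head) {
        // ISO-8859-1 maps every byte to one char, so decoding cannot fail.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = DECL.matcher(text);
        return m.find() ? m.group() : null;
    }

    /** Returns the declared encoding name, or null if none is declared. */
    static String encoding(String declaration) {
        if (declaration == null) {
            return null;
        }
        Matcher m = ENCODING.matcher(declaration);
        return m.find() ? m.group(1) : null;
    }
}
```

A caller would wrap its input in a BufferedInputStream, call peek with a small limit such as 256, keep the result of declaration for writing the document back out, and then hand the same stream to the parser.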
What happens in SAX2?
Rgds.Paul.
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)