Character Tugging (Was:Re: SAX2/Java: Towards a final form)
Clark C. Evans
clark.evans at manhattanproject.com
Tue Jan 18 04:20:35 GMT 2000
On Mon, 10 Jan 2000, David Megginson wrote:
> public interface ContentHandler
> {
> public void characters (char ch[], int start, int length)
> throws SAXException;
>
> public void ignorableWhitespace (char ch[], int start, int length)
> throws SAXException;
>
> }
Context:
1. You have a multi-stage process, where SAX is the
interface between each stage.
2. Much of your XML document consists of Date, Currency,
and other similar object types, which can take
significant time to parse and whose text form can always
be retrieved with Object.toString()
Problems:
1. In between each stage of my process I have to serialize
immutable, application-specific objects (Date, Currency,
TimeInterval) to a character stream and then re-constitute
the characters into the same application. This seems like
a waste of memory and processing time.
2. Let's say that my character content is bigger than the
fixed buffer that the parser is using. This means that
multiple calls to characters() will be generated for
the content, correct? The general case, then, forces
the use of a StringBuffer on the receiving end if
the goal is to build the content into a single String.
Am I correct here? I'm not sure.
3. Much of my XML information is sparse, thus a great deal
of the characters() calls will be wasted. Question:
is memory allocated for the char[] that is passed? If so,
isn't that an unnecessary allocation?
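To make problem 2 concrete, here is a minimal sketch of the receiving end. SAX does allow a parser to deliver one run of character data in several characters() calls, so a handler that wants a single String has to accumulate. The class name TextCollector is hypothetical, and the sketch assumes the org.xml.sax.helpers.DefaultHandler base class from the final SAX2 distribution rather than the draft interface quoted above.

```java
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: gathers character content into one String.
class TextCollector extends DefaultHandler {
    private final StringBuffer text = new StringBuffer();

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may split one run of text across several calls,
        // so append each chunk instead of assuming a single call.
        // Only ch[start .. start+length-1] belongs to this event.
        text.append(ch, start, length);
    }

    String getText() {
        return text.toString();
    }
}

// Small demonstration: the same content arriving in two chunks.
class TextCollectorDemo {
    public static void main(String[] args) {
        TextCollector collector = new TextCollector();
        char[] buffer = "xx2000-01-18yy".toCharArray();
        collector.characters(buffer, 2, 5);   // first chunk
        collector.characters(buffer, 7, 5);   // second chunk
        System.out.println(collector.getText());
    }
}
```

Note that the copy into the StringBuffer happens whether or not the text is ever used, which is the cost that problems 1 and 3 are about.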
Suggested Solution:
public interface CharTug {
    boolean isCharacters();
    boolean isWhitespace();
    Reader getReader();
    Object getObject();
}
public interface ContentHandler {
    public void characters(CharTug content);
}
Problems Revisited:
1. Each stage can pass a CharTug, so that
the Object can be pulled without conversion.
If the Handler wants a Reader, then
new StringReader(myObject.toString())
can be returned. A helper can automate this.
2. Two items to note.
First, if the content is really huge, then
a full blown Reader is ideal.
Second, if the Handler wants a string, then
getObject().toString() will do the trick nicely.
No need to construct a StringBuffer.
3. If the content is not needed, then the
CharTug will be ignored, and the execution
will return to the emitter, which can
then *skip* the content that could have
been pulled.
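A rough sketch of how the helper mentioned in point 1 might look, together with a downstream stage that pulls the object directly. Everything except the CharTug interface itself is an assumption made for illustration: the names ObjectTug and DateStage are hypothetical, and treating a wrapped object as non-whitespace character content is just one possible choice.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Date;

// The proposed interface, repeated so this sketch compiles on its own.
interface CharTug {
    boolean isCharacters();
    boolean isWhitespace();
    Reader getReader();
    Object getObject();
}

// Hypothetical helper: wraps an application object as a CharTug.
class ObjectTug implements CharTug {
    private final Object value;

    ObjectTug(Object value) {
        this.value = value;
    }

    public boolean isCharacters() { return true; }   // assumption
    public boolean isWhitespace() { return false; }  // assumption

    // Text is produced only if a handler actually asks for a Reader.
    public Reader getReader() {
        return new StringReader(value.toString());
    }

    // The original object is handed over without any conversion.
    public Object getObject() {
        return value;
    }
}

// Hypothetical downstream stage that wants the Date itself.
class DateStage {
    Date lastDate;

    void characters(CharTug content) {
        Object o = content.getObject();
        if (o instanceof Date) {
            lastDate = (Date) o;   // no serializing or re-parsing
        }
        // A stage that does not need the content simply returns,
        // and nothing is ever converted to characters.
    }
}

class CharTugDemo {
    public static void main(String[] args) {
        Date sent = new Date(0L);
        DateStage stage = new DateStage();
        stage.characters(new ObjectTug(sent));
        System.out.println(stage.lastDate == sent);  // same instance
    }
}
```

A stage that really does need text can still call getObject().toString() or read from getReader(); the conversion cost is paid only by the stage that asks for it.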
Just trying to solve my practical problems...
Clark
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1
Please note: New list subscriptions now closed in preparation for transfer to OASIS.