Character Tugging (Was:Re: SAX2/Java: Towards a final form)

Clark C. Evans clark.evans at manhattanproject.com
Tue Jan 18 04:20:35 GMT 2000


On Mon, 10 Jan 2000, David Megginson wrote:
>   public interface ContentHandler
>   {
>     public void characters (char ch[], int start, int length)
>       throws SAXException;
> 
>     public void ignorableWhitespace (char ch[], int start, int length)
>       throws SAXException;
>   
>   }


Context:
   
  1. You have a multi-stage process, where SAX is the
     interface between each stage.
  
  2. Much of your XML document consists of Date, Currency,
     and similar object types, which can take significant
     time to parse and whose text can always be retrieved
     with Object.toString().

Problems:

 1. Between each stage of my process I have to serialize
    immutable application-specific objects (Date, Currency,
    TimeInterval) to a character stream and then reconstitute
    the characters into the same application objects.  This
    seems like a waste of memory and processing time.

 2. Let's say that my character content is bigger than the 
    fixed buffer that the parser is using.  This means that
    multiple calls to characters() will be generated for
    the content, correct?   The general case, then, forces
    the use of a StringBuffer on the receiving end if
    the goal is to build the content into a single String.
    Am I correct here?  I'm not sure.

 3. Much of my XML information is sparse, so many of the
    characters() calls will be wasted.  Question: is memory
    allocated for the char[] that is passed?  If so, isn't
    that an unnecessary allocation?
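On problem 2: with the current SAX2 signature, a receiver that wants one String per element does have to accumulate the chunks itself, because a parser may deliver a contiguous run of character data in one characters() call or split it across several. The sketch below shows that pattern with the standard JAXP SAX classes. The class TextCollector and its parseText() method are hypothetical names; it uses StringBuilder (the unsynchronized counterpart of StringBuffer) and only handles leaf elements without mixed content.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: rebuilds the full text of a leaf element,
// however many characters() calls the parser splits it into.
class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private String lastText = "";

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // start a fresh run of character data
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Each call may carry only part of the text; append every chunk.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        lastText = buffer.toString(); // only now is the content complete
    }

    String lastText() {
        return lastText;
    }

    // Parses a small document and returns the text of the last element closed.
    static String parseText(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TextCollector collector = new TextCollector();
        parser.parse(new InputSource(new StringReader(xml)), collector);
        return collector.lastText();
    }
}
```

The complete String only exists once endElement() fires, so the handler has to keep a buffer alive even when all it wants is a single value.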

Suggested Solution:

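   public interface CharTug {
       boolean isCharacters();   // ordinary character data
       boolean isWhitespace();   // ignorable whitespace
       Reader  getReader();      // pull the content as a character stream
       Object  getObject();      // pull the content as an application object
   }

   public interface ContentHandler {
       public void characters(CharTug content) throws SAXException;
   }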
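A minimal sketch of the kind of helper mentioned in item 1 of Problems Revisited, assuming the CharTug interface above: it wraps an object an earlier stage already built and produces characters only when a Reader is requested. The class ObjectCharTug, its constructor flag, and readFully() are hypothetical; the flag assumes isWhitespace() marks content that would otherwise have gone to ignorableWhitespace().

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// The proposed interface, repeated so this file compiles on its own.
interface CharTug {
    boolean isCharacters();
    boolean isWhitespace();
    Reader getReader();
    Object getObject();
}

// Hypothetical helper: wraps an object that an earlier stage already built
// (a date, an amount, ...) so the next stage can take it without re-parsing.
final class ObjectCharTug implements CharTug {
    private final Object value;
    private final boolean ignorable; // assumption: true for ignorable whitespace

    ObjectCharTug(Object value, boolean ignorable) {
        this.value = value;
        this.ignorable = ignorable;
    }

    public boolean isCharacters() { return !ignorable; }

    public boolean isWhitespace() { return ignorable; }

    // Characters are produced only on request, via toString().
    public Reader getReader() { return new StringReader(value.toString()); }

    // The original object is handed over unchanged: no serialization.
    public Object getObject() { return value; }

    // Convenience for handlers that want the whole text from a Reader.
    static String readFully(Reader in) {
        StringBuilder text = new StringBuilder();
        char[] chunk = new char[1024];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                text.append(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }
}
```

A stage that already holds, say, a java.time.LocalDate can wrap it once; a downstream stage that understands dates calls getObject(), while one that only wants text calls getReader().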


Problems Revisited:

  1.  Each stage can pass a CharTug, so that
      the Object can be pulled without conversion.

      If the Handler wants a Reader, then
      new StringReader(myObject.toString())
      can be returned.  A helper can automate this.
  
  2.  Two items to note.  

      First, if the content is really huge, then 
      a full blown Reader is ideal.

      Second, if the Handler wants a string, then
      getObject().toString() will do the trick nicely.
      No need to construct a StringBuffer.

  3.  If the content is not needed, then the 
      CharTug will be ignored, and the execution
      will return to the emitter, which can
      then *skip* the content that could have
      been pulled.
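The three points above can be sketched together under one assumption: the emitter builds each CharTug lazily and checks afterwards whether the handler pulled it. The names LazyCharTug, TugHandler and Emitter are hypothetical, and a Supplier stands in for whatever work the emitter would do to produce the content (parsing or decoding), so "skipping" here simply means that work never runs.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.function.Supplier;

// The proposed interface, repeated so this file compiles on its own.
interface CharTug {
    boolean isCharacters();
    boolean isWhitespace();
    Reader getReader();
    Object getObject();
}

// Hypothetical stand-in for the proposed ContentHandler.characters(CharTug).
interface TugHandler {
    void characters(CharTug content);
}

// Hypothetical lazy tug: the content is produced only if a handler pulls it.
final class LazyCharTug implements CharTug {
    private final Supplier<Object> source;
    private Object value;     // cached after the first pull
    private boolean pulled;

    LazyCharTug(Supplier<Object> source) { this.source = source; }

    private Object pull() {
        if (!pulled) {
            value = source.get();
            pulled = true;
        }
        return value;
    }

    boolean wasPulled() { return pulled; }

    public boolean isCharacters() { return true; }
    public boolean isWhitespace() { return false; }

    // Text is derived from the object only when a Reader is requested.
    public Reader getReader() { return new StringReader(pull().toString()); }

    // Same object every time; no conversion between stages.
    public Object getObject() { return pull(); }
}

// Hypothetical emitter: hands each value to the handler, then counts the
// values the handler never touched (content an emitter could skip).
final class Emitter {
    int skipped = 0;

    void emit(List<Supplier<Object>> values, TugHandler handler) {
        for (Supplier<Object> v : values) {
            LazyCharTug tug = new LazyCharTug(v);
            handler.characters(tug);
            if (!tug.wasPulled()) {
                skipped++; // the supplier for this value never ran
            }
        }
    }
}
```

For example, a handler that keeps the first value with getObject(), ignores the second, and takes the third as a String with getObject().toString() leaves skipped at 1; the ignored value's supplier is never called.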


Just trying to solve my practical problems...


Clark


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1
Please note: New list subscriptions now closed in preparation for transfer to OASIS.




