SAX: Byte Streams and Character Streams

David Megginson ak117 at
Sun Apr 19 23:47:16 BST 1998

Tim Bray writes:

 > Hmmm, for what it's worth, Lark, both in its current form and
 > after the big performance update coming Real Soon Now, works
 > at approximately equal speed off byte and character streams...
 > the overhead of pouring a buffer's worth of bytes into the
 > internal character buffer that Lark will be reading from
 > is hardly detectable.  The next Lark will not be as fast as XP but
 > it won't be that much slower.  I'd be interested in what the other 
 > parser builders have done in this area.   -T.

AElfred reads a big buffer (up to 32K) of bytes, then translates it
into a big buffer of characters using whatever the current encoding
scheme is.  Profiling shows that the overhead of doing this is
surprisingly low, since it all happens in a tight loop.
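
As a rough illustration of the buffer-at-a-time approach described above, here is a minimal sketch. It is not AElfred's actual code: the class name BulkDecodeSketch and the method decodeAll are hypothetical, and it relies on the java.nio.charset API, which postdates this message. It reads up to 32K of bytes at a time and decodes each chunk into a character buffer in one call, carrying any incomplete multi-byte sequence over to the next chunk.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: bulk byte-to-character conversion, one big buffer at a time.
class BulkDecodeSketch {
    static final int BUFFER_SIZE = 32 * 1024; // "up to 32K" of bytes per read

    static String decodeAll(InputStream in, Charset charset) throws IOException {
        CharsetDecoder decoder = charset.newDecoder(); // reports malformed input by default
        ByteBuffer byteBuf = ByteBuffer.allocate(BUFFER_SIZE);
        CharBuffer charBuf = CharBuffer.allocate(BUFFER_SIZE);
        StringBuilder out = new StringBuilder();
        CoderResult result;

        while (true) {
            // Fill the free part of the byte buffer with a single bulk read.
            int n = in.read(byteBuf.array(), byteBuf.position(), byteBuf.remaining());
            boolean eof = n < 0;
            if (!eof) {
                byteBuf.position(byteBuf.position() + n);
            }
            byteBuf.flip();

            // Decode everything that is available; repeat if the char buffer fills up.
            do {
                result = decoder.decode(byteBuf, charBuf, eof);
                if (result.isError()) {
                    result.throwException();
                }
                charBuf.flip();
                out.append(charBuf);
                charBuf.clear();
            } while (result.isOverflow());

            // Keep the bytes of any incomplete trailing sequence for the next round.
            byteBuf.compact();
            if (eof) {
                break;
            }
        }

        // Let the decoder emit any state it is still holding.
        do {
            result = decoder.flush(charBuf);
            charBuf.flip();
            out.append(charBuf);
            charBuf.clear();
        } while (result.isOverflow());

        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "<doc>caf\u00e9</doc>".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeAll(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8));
    }
}
```

A real parser would hand each decoded chunk to its scanner rather than accumulate a String; the StringBuilder here only keeps the sketch short.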

AElfred can now also read directly from a Reader, bypassing the
byte-to-character conversion altogether.

The Lark driver in the current pre-release of SAX feeds a character
stream to Lark as an InputStream of UTF-8 bytes, using a surprisingly
inefficient algorithm that I can fix when I have time.  Will the next
version of Lark support character streams?
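
For readers wondering what presenting a character stream as an InputStream of UTF-8 bytes involves, here is one possible buffered adapter. It is a sketch under stated assumptions, not the code in the SAX Lark driver: the class name ReaderToUtf8Stream and the buffer size are hypothetical, and it uses the java.nio.charset API, which postdates this message. It encodes a whole buffer of characters per call instead of converting one character at a time.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical adapter: exposes a Reader as an InputStream of UTF-8 bytes.
class ReaderToUtf8Stream extends InputStream {
    private static final int CHUNK = 8192; // assumed buffer size, in characters

    private final Reader reader;
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
    private final CharBuffer chars = CharBuffer.allocate(CHUNK);
    // UTF-8 needs at most 3 bytes per Java char, so this buffer cannot overflow.
    private final ByteBuffer bytes = ByteBuffer.allocate(CHUNK * 3);
    private boolean eof = false;

    ReaderToUtf8Stream(Reader reader) {
        this.reader = reader;
        bytes.flip(); // start with no encoded bytes available
    }

    // Refills the byte buffer when it is empty; returns false at end of stream.
    private boolean fill() throws IOException {
        while (!bytes.hasRemaining()) {
            if (eof) {
                return false;
            }
            int n = reader.read(chars.array(), chars.position(), chars.remaining());
            if (n < 0) {
                eof = true;
            } else {
                chars.position(chars.position() + n);
            }
            chars.flip();
            bytes.clear();
            CoderResult result = encoder.encode(chars, bytes, eof);
            if (result.isError()) {
                result.throwException();
            }
            if (eof) {
                encoder.flush(bytes);
            }
            bytes.flip();
            // A high surrogate at the end of a chunk waits here for its partner.
            chars.compact();
        }
        return true;
    }

    @Override
    public int read() throws IOException {
        return fill() ? (bytes.get() & 0xFF) : -1;
    }

    @Override
    public int read(byte[] dst, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (!fill()) {
            return -1;
        }
        int n = Math.min(len, bytes.remaining());
        bytes.get(dst, off, n);
        return n;
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}
```

Wrapping a source this way, for example new ReaderToUtf8Stream(new java.io.StringReader(text)), still costs an encode on one side and a decode on the other, which is why a parser that accepts character streams directly avoids the round trip.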

All the best,


David Megginson                 ak117 at
Microstar Software Ltd.         dmeggins at
