SAX: Byte Streams and Character Streams
David Megginson
ak117 at freenet.carleton.ca
Sun Apr 19 23:47:16 BST 1998
Tim Bray writes:
> Hmmm, for what it's worth, Lark, both in its current form and
> after the big performance update coming Real Soon Now, works
> at approximately equal speed off byte and character streams...
> the overhead of pouring a buffer's worth of bytes into the
> internal character buffer that Lark will be reading from
> is hardly detectable. The next Lark will not be as fast as XP but
> it won't be that much slower. I'd be interested in what the other
> parser builders have done in this area. -T.
AElfred reads a big buffer (up to 32K) of bytes, then translates it
into a big buffer of characters using whatever the current encoding
scheme is. Profiling shows that the overhead of doing this is
surprisingly low, since it all happens in a tight loop.
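
As a rough illustration of why a bulk conversion like this can be cheap, here is a minimal sketch of decoding a filled byte buffer into a character buffer in one pass. It is not AElfred's actual code: the class and method names are hypothetical, only ISO-8859-1 and UTF-8 sequences of up to three bytes are handled, and the input is assumed to be well formed with no sequence split across the end of the buffer.

```java
// Hypothetical sketch, not AElfred's source: convert a whole buffer of
// bytes to characters in a single tight loop per encoding.
final class BulkDecodeSketch {

    // ISO-8859-1: each byte maps directly to the character with the same value.
    static int decodeLatin1(byte[] in, int len, char[] out) {
        for (int i = 0; i < len; i++) {
            out[i] = (char) (in[i] & 0xFF);
        }
        return len;
    }

    // UTF-8, 1- to 3-byte sequences only (the Basic Multilingual Plane).
    // Assumes well-formed input that ends on a sequence boundary.
    static int decodeUtf8(byte[] in, int len, char[] out) {
        int i = 0; // next byte to read
        int o = 0; // next char to write
        while (i < len) {
            int b = in[i++] & 0xFF;
            if (b < 0x80) {
                // single byte: ASCII
                out[o++] = (char) b;
            } else if (b < 0xE0) {
                // two bytes: 110xxxxx 10xxxxxx
                out[o++] = (char) (((b & 0x1F) << 6) | (in[i++] & 0x3F));
            } else {
                // three bytes: 1110xxxx 10xxxxxx 10xxxxxx
                out[o++] = (char) (((b & 0x0F) << 12)
                        | ((in[i++] & 0x3F) << 6)
                        | (in[i++] & 0x3F));
            }
        }
        return o; // number of characters produced
    }
}
```

The output buffer only needs to be as long as the input buffer, since no byte sequence handled here produces more characters than it has bytes.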
AElfred can now also read directly from a Reader, bypassing the
conversion altogether.
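
For comparison, a sketch of the Reader path: the character buffer is filled straight from the Reader, so no byte-to-character step is needed. The class and method names are hypothetical and this is only an assumption about how such a fill loop might look, not AElfred's implementation.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical sketch: fill a parser's character buffer directly from a
// Reader, skipping any byte decoding.
final class ReaderFillSketch {

    // Reads until the buffer is full or the Reader is exhausted.
    // Returns the number of characters placed in the buffer (0 at end of input).
    static int fill(Reader source, char[] buffer) {
        int filled = 0;
        try {
            while (filled < buffer.length) {
                int n = source.read(buffer, filled, buffer.length - filled);
                if (n < 0) {
                    break; // end of stream
                }
                filled += n;
            }
        } catch (IOException e) {
            // Wrapped only to keep the sketch's signature simple.
            throw new UncheckedIOException(e);
        }
        return filled;
    }
}
```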
The Lark driver in the current pre-release of SAX feeds a character
stream to Lark by re-encoding it as an InputStream of UTF-8 bytes, using
a surprisingly inefficient algorithm that I can fix when I have time.
Will the next version of Lark support character streams directly?
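
To make the workaround concrete, here is a minimal sketch of an adapter that presents a Reader as an InputStream of UTF-8 bytes. It is not the SAX driver's code: the class name Utf8ReaderInputStream is hypothetical, the 4096-character chunk size is an arbitrary assumption, and it uses present-day Java library calls (StandardCharsets, String.getBytes) rather than anything available in 1998.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: expose a character stream as UTF-8 bytes for a
// parser that only accepts an InputStream.
final class Utf8ReaderInputStream extends InputStream {
    private final Reader source;
    // One spare slot so a surrogate pair is never split across chunks.
    private final char[] chars = new char[4096 + 1];
    private byte[] bytes = new byte[0];
    private int pos = 0;

    Utf8ReaderInputStream(Reader source) {
        this.source = source;
    }

    // Reads one chunk of characters and encodes it; false at end of input.
    private boolean refill() throws IOException {
        int n = source.read(chars, 0, chars.length - 1);
        if (n < 0) {
            return false;
        }
        if (n > 0 && Character.isHighSurrogate(chars[n - 1])) {
            int low = source.read(); // pull in the matching low surrogate
            if (low >= 0) {
                chars[n++] = (char) low;
            }
        }
        bytes = new String(chars, 0, n).getBytes(StandardCharsets.UTF_8);
        pos = 0;
        return true;
    }

    @Override
    public int read() throws IOException {
        while (pos >= bytes.length) {
            if (!refill()) {
                return -1;
            }
        }
        return bytes[pos++] & 0xFF;
    }

    @Override
    public int read(byte[] dst, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (pos >= bytes.length) {
            if (!refill()) {
                return -1;
            }
        }
        int n = Math.min(len, bytes.length - pos);
        System.arraycopy(bytes, pos, dst, off, n);
        pos += n;
        return n;
    }
}
```

Encoding a chunk at a time keeps the per-character cost low, but the parser still has to decode the bytes again on the other side, which is the round trip that direct character-stream support would avoid.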
All the best,
David
--
David Megginson ak117 at freenet.carleton.ca
Microstar Software Ltd. dmeggins at microstar.com
http://home.sprynet.com/sprynet/dmeggins/
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)