SAX: Byte Streams and Character Streams

Tim Bray tbray at textuality.com
Mon Apr 20 00:16:32 BST 1998


At 05:45 PM 4/19/98 -0400, David Megginson wrote:
>The Lark driver in the current pre-release of SAX feeds a character
>stream to Lark as an InputStream of UTF-8 bytes, using a surprisingly
>inefficient algorithm that I can fix when I have time.  Will the next
>version of Lark support character streams?

Well, the current version of Lark really doesn't support anything
*but* character streams... that and synchronization, if my measurements
are correct, amount to >50% of the difference between XP and Lark.
It is clear and (sigh) not surprising that method-dispatch-per-char
is, well, less than optimal.  Thus my plan had been to move to
a three-arg read() call.
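
To make the contrast concrete, here is a minimal sketch of the two reading styles on a java.io.Reader. It is not code from Lark, XP, or SAX; the class name ReadStyles, both method names, and the 4096-char buffer size are assumptions chosen for illustration. Both methods count '<' characters, and the only difference is how many calls on the Reader each one makes.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical illustration only; not taken from Lark, XP, or SAX.
class ReadStyles {
    // Per-char style: one read() dispatch for every character consumed.
    static int countLessThanPerChar(Reader in) throws IOException {
        int count = 0;
        int c;
        while ((c = in.read()) != -1) {
            if (c == '<') count++;
        }
        return count;
    }

    // Block style: one three-arg read(char[], int, int) dispatch per
    // buffer-full; the inner loop is plain array indexing.
    static int countLessThanBlock(Reader in) throws IOException {
        char[] buf = new char[4096]; // buffer size is an arbitrary assumption
        int count = 0;
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            for (int i = 0; i < n; i++) {
                if (buf[i] == '<') count++;
            }
        }
        return count;
    }
}
```

Both methods return the same count for the same input. How much faster the block form runs depends on the Reader implementation and the VM, so it is worth measuring rather than assuming.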

As a result of this, I'm a bit conflicted about James' suggestion that
we lose the int read() methods.  While they are a surefire way
to run slow, I spent enough years in Unix that doing things via
getc() feels natural and I appreciate its advantages; assuming
of course that getc() is a macro with buffering, which of course
a Java method dispatch, uh, isn't.  Nice thing about stdio is it made
it easy for the programmer to pretend to do character streams without
having to really do serious per-char work. -T.
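
The stdio arrangement can be approximated in Java by keeping the buffer on the caller's side, so that the common path of a getc()-like call is an array index and the underlying Reader is touched only on refill. The following is a rough sketch under that assumption; the class name CharSource and the buffer size are hypothetical, and whether the VM inlines the small method is not guaranteed.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical getc()-style wrapper; CharSource is an invented name.
final class CharSource {
    private final Reader in;
    private final char[] buf = new char[4096]; // size is an assumption
    private int pos = 0;    // index of the next char to hand out
    private int limit = 0;  // number of valid chars currently in buf

    CharSource(Reader in) {
        this.in = in;
    }

    // Returns the next char as an int, or -1 at end of input.
    // The Reader is called only when the local buffer is exhausted.
    int getc() throws IOException {
        if (pos >= limit) {
            limit = in.read(buf, 0, buf.length);
            pos = 0;
            if (limit <= 0) {
                limit = 0; // stay in the "empty" state after end of input
                return -1;
            }
        }
        return buf[pos++];
    }
}
```

A parser loop can then be written as `while ((c = src.getc()) != -1) { ... }`, which keeps the per-character shape while paying for one three-arg read per buffer-full.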






