istvanc at microsoft.com
Fri Sep 5 18:02:59 BST 1997
You are right, it is a well-known technique; Java JDK 1.1 in fact uses
very similar code for character classification. I replaced that with a
simple 256-member array lookup (for characters in that range) and it
sped up the parser by ~10%.
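
For illustration, a minimal sketch of that fast path in Java. The class
name, the flag constants and the choice of java.lang.Character as the
"expensive" fallback are assumptions made for the example; this is not
the actual parser code.

```java
final class Latin1FastPath {
    // Hypothetical classification flags, for illustration only.
    static final byte LETTER = 1;
    static final byte DIGIT = 2;
    static final byte SPACE = 4;

    // One byte of flags per character in the range 0..255.
    private static final byte[] TABLE = new byte[256];

    static {
        // Fill the table once from the general-purpose (slower) path.
        for (int c = 0; c < 256; c++) {
            TABLE[c] = slowClassify((char) c);
        }
    }

    // General path: delegates to java.lang.Character (assumed fallback).
    static byte slowClassify(char c) {
        byte flags = 0;
        if (Character.isLetter(c)) flags |= LETTER;
        if (Character.isDigit(c)) flags |= DIGIT;
        if (Character.isWhitespace(c)) flags |= SPACE;
        return flags;
    }

    // Fast path: a single array load for characters below 256.
    static byte classify(char c) {
        return c < 256 ? TABLE[c] : slowClassify(c);
    }

    static boolean isLetter(char c) {
        return (classify(c) & LETTER) != 0;
    }
}
```

A call such as Latin1FastPath.isLetter(ch) then costs one comparison and
one array load for Latin-1 input. Characters at 256 and above still take
the slow path, so text in other scripts gains nothing from this.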
> From: James Clark[SMTP:jjc at jclark.com]
> Reply To: James Clark
> Sent: Thursday, September 04, 1997 10:04 PM
> To: xml-dev at ic.ac.uk
> Subject: Re: Character classification
> Istvan Cseri wrote:
> > For better speed I would suggest an alternative solution: use an
> > array lookup for characters below 256 and go to the more expensive
> > method above... It will do wonders with your parser.
> Except of course when you're parsing non-Latin scripts.
> There's another technique which exploits the fact that characters on
> the same page often have similar properties, and this is true even more
> so for characters in the same column.
> The idea is to have a three-level table, the first level with 256
> entries, the second and third levels with 16 entries. The entries for
> the first and second levels are a (possibly null) pointer to a
> sub-table plus a value; the entries for the third level are just values.
> To look up the value for a character, you use the high 8 bits to index
> into the first-level table; if the pointer part of the entry is null,
> then use the value part of the entry; otherwise use the sub-table
> addressed by the pointer; use the next 4 bits to index into that in a
> similar way, and, if necessary, the bottom 4 bits to index into the
> bottom table.
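
A rough sketch of the three-level lookup described above, in Java. The
names (ThreeLevelCharMap, Page, Column, get, set) and the use of int
values are assumptions made for the example; this is not the SP code,
only one way the description could be turned into a data structure.

```java
final class ThreeLevelCharMap {
    // Second-level entry: one value for a whole 16-character column,
    // or a pointer to the 16 per-character values of that column.
    private static final class Column {
        int[] cells; // third level (16 values), or null
        int value;   // used when cells == null
    }

    // First-level entry: one value for a whole 256-character page,
    // or a pointer to the 16 columns of that page.
    private static final class Page {
        Column[] columns; // second level (16 entries), or null
        int value;        // used when columns == null
    }

    private final Page[] pages = new Page[256];

    ThreeLevelCharMap(int defaultValue) {
        for (int i = 0; i < 256; i++) {
            pages[i] = new Page();
            pages[i].value = defaultValue;
        }
    }

    int get(char c) {
        Page page = pages[c >> 8];                 // high 8 bits
        if (page.columns == null) return page.value;
        Column col = page.columns[(c >> 4) & 0xF]; // next 4 bits
        if (col.cells == null) return col.value;
        return col.cells[c & 0xF];                 // bottom 4 bits
    }

    void set(char c, int value) {
        Page page = pages[c >> 8];
        if (page.columns == null) {
            if (page.value == value) return; // page already has this value
            // Split the page into 16 columns that inherit the page value.
            page.columns = new Column[16];
            for (int i = 0; i < 16; i++) {
                page.columns[i] = new Column();
                page.columns[i].value = page.value;
            }
        }
        Column col = page.columns[(c >> 4) & 0xF];
        if (col.cells == null) {
            if (col.value == value) return; // column already has this value
            // Split the column into 16 cells that inherit the column value.
            col.cells = new int[16];
            for (int i = 0; i < 16; i++) col.cells[i] = col.value;
        }
        col.cells[c & 0xF] = value;
    }
}
```

Sub-tables are only allocated for pages and columns whose characters
actually differ, so a page of uniform characters costs a single
first-level entry, and a lookup is at most three array loads whatever
the script.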
> This is I believe quite a well-known technique; I got it from Glenn
> You can use this to implement case-folding by storing the difference
> between a character and its upper-case equivalent modulo 2^16.
> There's a C++ implementation of this in SP in include/CharMap.h.
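
To make the modulo 2^16 arithmetic concrete, here is a small Java
sketch. A flat 256-entry array stands in for the table to keep it
short, and the names (CaseFoldDelta, DELTA, toUpper) are made up for
the example; in the scheme described above the deltas would be the
values stored in the table.

```java
final class CaseFoldDelta {
    // DELTA[c] = (upper(c) - c) mod 2^16, held in an unsigned 16-bit char.
    // Characters with no upper-case equivalent get a delta of 0.
    static final char[] DELTA = new char[256];

    static {
        for (int c = 0; c < 256; c++) {
            // The cast to char keeps only the low 16 bits of the difference.
            DELTA[c] = (char) (Character.toUpperCase((char) c) - c);
        }
    }

    static char toUpper(char c) {
        // Outside the range covered by this sketch, fall back to the library.
        if (c >= 256) return Character.toUpperCase(c);
        // The cast to char truncates to 16 bits, i.e. adds modulo 2^16.
        return (char) (c + DELTA[c]);
    }
}
```

For 'a' (0x61) the stored delta is 0xFFE0, which is -32 modulo 2^16, and
0x61 + 0xFFE0 truncated to 16 bits gives 0x41, 'A'. Storing the
difference rather than the upper-case character itself presumably helps
the table stay small: caseless characters all share delta 0, and runs of
letters such as a..z share one delta, so whole columns and pages can
collapse to a single value.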
> xml-dev: A list for W3C XML Developers
> Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
> To unsubscribe, send to majordomo at ic.ac.uk the following message;
> unsubscribe xml-dev
> List coordinator, Henry Rzepa (rzepa at ic.ac.uk)