NameChar (was: Editing text)

Fri Nov 28 16:21:40 GMT 1997

Richard Tobin writes:

 > > The fastest solution would be to maintain a static 65,536
 > > (or at least 32,768) entry array, with bit flags for different
 > > character properties.  That would be fine for big programs, but it
 > > would kill Java applets
 > 
 > Bear in mind that the main problem of size for Java applets is the
 > time taken for downloading, rather than the memory used at runtime.
 > So it may well be practical to store the data in a compact-but-slow
 > form and use that to initialise a large-but-fast lookup table.

(I hear that memory _is_ a problem right now on Windows systems, since
both Netscape and (especially) MSIE 4 bloat to ridiculous sizes,
sometimes double or triple the typical 32MB of RAM on people's
systems; however, an extra 64k or so would make little difference).
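
A rough sketch of the compact-but-slow to large-but-fast idea Richard
describes, in Java.  The class name, the flag names and the ranges are
all hypothetical and chosen only for illustration; they are not the
real XML character classes.  The short range list is what would be
downloaded, and the 64K table is built from it once at class-load time.

```java
final class CharFlags {
    // Hypothetical property bits, one per character property.
    static final byte LETTER = 1;
    static final byte DIGIT  = 2;

    // Compact form: {first, last, flags} triples.
    // Illustrative ranges only, not the full XML character classes.
    private static final int[] RANGES = {
        'A', 'Z', LETTER,
        'a', 'z', LETTER,
        '0', '9', DIGIT,
        0x4E00, 0x9FA5, LETTER
    };

    // Large-but-fast form: one byte of flags per 16-bit character (64K).
    private static final byte[] TABLE = new byte[0x10000];

    static {
        // Expand each compact range into the flat table, once.
        for (int i = 0; i < RANGES.length; i += 3) {
            for (int c = RANGES[i]; c <= RANGES[i + 1]; c++) {
                TABLE[c] |= (byte) RANGES[i + 2];
            }
        }
    }

    // Constant-time test: a single array index and a mask.
    static boolean has(char c, byte flag) {
        return (TABLE[c] & flag) != 0;
    }
}
```

A caller would then write something like CharFlags.has(c, CharFlags.LETTER)
for each character read.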

The best optimisation will depend on your expected usage.  If, for
example, you expect that 80% of all characters would be <=0x007f, then
Tim's approach of using a bit-array for those characters and jumping
to a hairy lookup method for the rest would make sense; if, however,
you expected that some documents might be almost entirely encoded with
characters >=0x0080 (say, in Han Chinese characters), then a 64K
lookup table would be necessary for acceptable performance.  If you
were keeping only one bit for each character, then you could encode a
compact lookup table in only 8K (65,536 bits), or in 4K if you covered
only 32,768 characters.
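
For the one-bit-per-character case, a minimal sketch in Java might look
like the following.  The class name and the ranges are assumptions made
for illustration only; a real table would be filled from the NameChar
production.  One bit for each of the 65,536 16-bit values takes 8,192
bytes; a 32,768-entry table would take 4,096.

```java
final class NameCharBits {
    // One bit per 16-bit character: 65,536 bits = 8,192 bytes.
    private static final byte[] BITS = new byte[0x10000 / 8];

    // Set the bit for every character in [first, last].
    private static void setRange(int first, int last) {
        for (int c = first; c <= last; c++) {
            BITS[c >> 3] |= (byte) (1 << (c & 7));
        }
    }

    static {
        // Illustrative ranges only, not the real NameChar definition.
        setRange('A', 'Z');
        setRange('a', 'z');
        setRange('0', '9');
        setRange(0x4E00, 0x9FA5);
    }

    // Byte index is c / 8, bit position within the byte is c % 8.
    static boolean isNameChar(char c) {
        return (BITS[c >> 3] & (1 << (c & 7))) != 0;
    }
}
```

The lookup costs a shift, an index and a mask whatever the character,
so a document that is almost entirely above 0x007f pays no more than an
ASCII one.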

All the best,

David

-- 
David Megginson                 ak117 at freenet.carleton.ca
Microstar Software Ltd.         dmeggins at microstar.com
      http://home.sprynet.com/sprynet/dmeggins/
