andrewl at microsoft.com
Wed Sep 3 22:39:59 BST 1997
JDK 1.1 is still broken for Unicode. Take a look at the code in the
Microsoft XML Parser (http://www.microsoft.com/standards/xml) to see our
AndrewL at microsoft.com
> -----Original Message-----
> From: Tim Bray [SMTP:tbray at textuality.com]
> Sent: Wednesday, September 03, 1997 12:51 PM
> To: xml-dev at ic.ac.uk
> Subject: Character classification
> I've been working on making Lark really do Unicode. JDK 1.1 is said
> to have, unlike 1.0, a usable input method; thus the problem is to
> determine, when you're reading a GI or Attribute name, whether the
> characters are legal namestart/name characters.
> It turns out to be quite a lot of work, so this is an offer to share.
> I wrote a program (based on Lark) that pulls the relevant character
> classes out of the XML spec, picks apart the markup, and writes a
> Java class that has some static arrays and offers two methods:
> package textuality.lark;
>
> public class CharClasses {
>     // true if c is a legal name character
>     public static boolean isNameC(char c)
>     // true if c is a legal namestart character
>     public static boolean isNameStart(char c)
> }
> It needs about 4k of tables (which it binary-searches); it might be
> faster with 128k of byte-addressable tables or 16K of bitmaps, neither
> of which would be hard to implement.
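For readers following the thread, here is a minimal sketch of the two layouts mentioned above: a sorted range table that is binary-searched, and a bitmap with one bit per 16-bit character. It is not the attached CharClasses.java. The class name RangeTableSketch, the helpers inRanges, toBitmap and inBitmap, and the NAME_START table are assumptions made for illustration, and the table is only a tiny excerpt (':', '_', ASCII letters and the Latin-1 letters), not the full set of classes from the XML spec.

```java
final class RangeTableSketch {
    // Sorted, non-overlapping inclusive ranges stored as {first, last} pairs.
    // Illustrative excerpt only, NOT the complete XML namestart class.
    static final char[] NAME_START = {
        0x003A, 0x003A,   // ':'
        0x0041, 0x005A,   // 'A'..'Z'
        0x005F, 0x005F,   // '_'
        0x0061, 0x007A,   // 'a'..'z'
        0x00C0, 0x00D6,   // Latin-1 letters, skipping U+00D7
        0x00D8, 0x00F6,   // Latin-1 letters, skipping U+00F7
        0x00F8, 0x00FF,
    };

    // Binary search over the range pairs: O(log n) comparisons per character.
    static boolean inRanges(char[] ranges, char c) {
        int lo = 0;
        int hi = ranges.length / 2 - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (c < ranges[2 * mid]) {
                hi = mid - 1;          // c lies below this range
            } else if (c > ranges[2 * mid + 1]) {
                lo = mid + 1;          // c lies above this range
            } else {
                return true;           // first <= c <= last
            }
        }
        return false;
    }

    public static boolean isNameStart(char c) {
        return inRanges(NAME_START, c);
    }

    // Bitmap alternative: 65536 bits = 8 KB per character class.
    static long[] toBitmap(char[] ranges) {
        long[] bits = new long[65536 / 64];
        for (int i = 0; i < ranges.length; i += 2) {
            for (int c = ranges[i]; c <= ranges[i + 1]; c++) {
                bits[c >>> 6] |= 1L << (c & 63);
            }
        }
        return bits;
    }

    // Constant-time lookup: one array index, one shift, one mask.
    static boolean inBitmap(long[] bits, char c) {
        return (bits[c >>> 6] & (1L << (c & 63))) != 0;
    }
}
```

With two classes (namestart and name), a bitmap of this shape costs 2 x 8 KB, which would line up with the 16K figure quoted above; one byte per character for each class would similarly give 2 x 64 KB. The range table trades a few comparisons per lookup for a much smaller footprint.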
> (a) is this a waste of time, i.e. are there Unicode library calls that
> do it?
> (b) if not, has everyone else already done this?
> (c) if not, if I'm going to publish this, is the API above OK?
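On question (a), a small sketch of why the nearest standard call is not a drop-in answer. Character.isJavaIdentifierStart is a real java.lang.Character method, but it tests Java identifier rules, not the XML namestart class. The class name LibraryCallCheck and the helper jdkSaysStart are hypothetical, introduced only for this illustration.

```java
final class LibraryCallCheck {
    // Hypothetical helper: asks the JDK whether c may start a Java identifier.
    static boolean jdkSaysStart(char c) {
        return Character.isJavaIdentifierStart(c);
    }

    public static void main(String[] args) {
        // 'A' is a letter; ':' may start an XML name; '$' may not.
        char[] probes = { 'A', ':', '$' };
        for (int i = 0; i < probes.length; i++) {
            System.out.println("'" + probes[i] + "' -> " + jdkSaysStart(probes[i]));
        }
    }
}
```

The JDK call accepts '$', which cannot start an XML name, and rejects ':', which can, so a parser relying on it would need its own tables for at least the exceptions.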
> I've attached the current Java source file for those who find the
> explanation above insufficiently clear.
> Cheers, Tim Bray
> tbray at textuality.com http://www.textuality.com/ +1-604-708-9592
> << File: CharClasses.java.txt >>
xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)