Lark 0.97 Available
Tim Bray
tbray at textuality.com
Wed Oct 29 21:40:46 GMT 1997
Lark 0.97 is now available at
http://www.textuality.com/Lark/
Lark now
+ is smaller! More code, but the class files are back down to 45k.
+ is faster! About 200K/second on my mouldy old P100, i.e. Lark parses
Jon's Old Testament file (3.88M) in under 20 seconds - this is just
the event-stream & syntax check. If you want to build complete trees
in memory, parsing for any document slows down a lot, obviously.
+ is free of case-folding.
+ checks for duplicate attribute names attached to one element.
+ reads multiple attlist declarations; when they collide, the first one wins.
+ won't let you &refer-to; an external text entity in an attribute value -
this is what the spec says, and James says it's a good idea, and he's
usually right.
+ reads the external DTD subset if the toggle
lark.processExternalEntities(true) has been set (and, of course, if
a usable SYSTEM ID has been provided).
+ has a new version of the central readXML method, that allows you to
specify a base URL for the document entity; necessary for relative-URL
constructions such as <!DOCTYPE foo SYSTEM "foo.dtd" > to work.
+ has another Entity member
java.net.URL mBaseURL;
with a matching constructor argument and set/get method pair, which holds
the URL associated with an external entity.
+ does full PE processing, including external PE's.
+ as a result, class Entity has a new member
boolean mPE;
with a new argument on its constructor and a new method
public boolean isPE().
- doesn't do conditional sections, still.
+ upon encountering a reference to an undeclared entity, checks to see if the
declaration might have been external and bypassed; this can happen when
(a) you have turned off mProcessExternalEntities, and either
(b) there is an external DTD subset, or
(c) there is a reference to an external PE in the internal subset at
a point where a whole markup declaration might be recognized.
If so, Lark turns off draconian error handling and allows processing
to continue; however, Handler has a new method, doWarning(),
that gets called in this situation.
+ processes entity/char references correctly in <!ATTLIST default values.
+ has had the Handler.doAttlist() method changed - now takes
an Object[] instead of String[] argument, since the default value is
now a Text as opposed to a String, because of entities in defaults.
+ does entity declaration processing properly, doing Henry Thompson's
hideous example from the spec Appendix C, and another, just as nasty,
that I have cooked up for the next release of the spec. Blecch.
+ has a big bug-fix: it turns out pre-0.97 Lark almost never parsed
<!doctype declarations properly, botching SYSTEM & PUBLIC identifiers;
so the Handler.doDoctype() method has been rebuilt, since I can't
imagine anybody ever actually did anything useful with it.
+ has a change to Handler.doSyntaxError() (sorry), which now has a third
arg, char c, that gives the character that caused Lark to decide
the doc wasn't well-formed... in lots of cases, this turns out to
be real useful. Others not.
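
For anyone wondering why the base URL item above matters: a relative SYSTEM ID such as "foo.dtd" can only be fetched once it has been resolved against the location of the document entity. What follows is a minimal sketch of that resolution using the standard java.net.URI class; it is not Lark's own code, and the class name, method name and example.com addresses are made up for illustration.

```java
import java.net.URI;

// Illustration only: standard-library URI resolution, not Lark's internal code.
class BaseUrlSketch {
    // Resolve a SYSTEM identifier against the base URL of the document entity.
    // (Hypothetical helper; Lark's readXML signature is not shown here.)
    static URI resolveSystemId(String baseUrl, String systemId) {
        return URI.create(baseUrl).resolve(systemId);
    }

    public static void main(String[] args) {
        // Assumed document location; "foo.dtd" is the relative SYSTEM ID
        // from <!DOCTYPE foo SYSTEM "foo.dtd" >.
        String base = "http://www.example.com/docs/foo.xml";

        // prints http://www.example.com/docs/foo.dtd
        System.out.println(resolveSystemId(base, "foo.dtd"));

        // An absolute SYSTEM ID does not depend on the base and comes back unchanged.
        // prints http://www.example.com/dtds/bar.dtd
        System.out.println(resolveSystemId(base, "http://www.example.com/dtds/bar.dtd"));
    }
}
```

Without a base URL there is nothing to resolve "foo.dtd" against, which is presumably why a parser handed only a stream needs the extra argument.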
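
The two attribute items above (duplicate attribute names on one element, and colliding attlist declarations going to the first) can be sketched roughly as below. This is only an illustration under my own assumptions: the class and method names are hypothetical, attribute defaults are simplified to plain strings, and none of it is taken from Lark's source.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustration only: hypothetical helper names, not Lark's actual classes.
class AttributeRulesSketch {
    // Returns the first attribute name that occurs twice on one element,
    // or null if every name is unique. Comparison is case-sensitive,
    // i.e. no case-folding, so "id" and "ID" are different names.
    static String firstDuplicate(String[] attributeNames) {
        Set<String> seen = new HashSet<>();
        for (String name : attributeNames) {
            if (!seen.add(name)) { // add() returns false if the name was already present
                return name;
            }
        }
        return null;
    }

    // Merges attribute defaults from several attlist declarations, given in
    // document order. putIfAbsent keeps the earliest declaration on a collision.
    static Map<String, String> mergeAttlists(List<Map<String, String>> declarations) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> declaration : declarations) {
            for (Map.Entry<String, String> entry : declaration.entrySet()) {
                merged.putIfAbsent(entry.getKey(), entry.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // "id" appears twice, so it is reported as the duplicate.
        System.out.println(firstDuplicate(new String[] {"id", "class", "id"}));

        // Two declarations both define "type"; the first default survives.
        Map<String, String> merged = mergeAttlists(List.of(
                Map.of("type", "first"),
                Map.of("type", "second", "lang", "en")));
        System.out.println(merged.get("type"));
    }
}
```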
Cheers, Tim Bray
tbray at textuality.com http://www.textuality.com/ +1-604-708-9592