Lark 1.0 final beta and Larval 0.8
tbray at textuality.com
Mon Jan 5 19:15:37 GMT 1998
This isn't finished yet, but I am uncomfortable about the fact that
for the last couple of months, there has not been a java-language XML
syntax-checker that is really very close to the spec. So, at
I have placed the Lark 1.0 final beta, and release 0.8 of Larval,
a validating XML processor based on Lark.
In my tests, Lark does all the things it used to do, and also rejects
163 of 164 of James' non-well-formed documents; the odd-doc-out is the
notorious 088.xml, which I consider to be well-formed and which represents
a policy issue that the WG is going to have to make a call on. The
only hole I know about in Lark at the moment is that it doesn't handle
text declarations in external parsed entities; I won't have time to
work on that until next week, so I decided to ship anyhow.
James' test-suite represents a tremendous resource: a de-facto
reproducible test of conformance that will greatly increase the
interoperability of XML docs. We are all considerably in his debt,
not for the first time; thank you once again, James.
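A conformance run like the one described above comes down to feeding each not-well-formed document to the parser and counting how many it rejects. The sketch below assumes a hypothetical WellFormednessChecker interface (not Lark's actual API) and an in-memory set of documents with made-up names. The toy checker in main only compares counts of angle brackets, so it deliberately misses the mismatched end-tag in the second document.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical checker interface; not Lark's actual API. */
interface WellFormednessChecker {
    boolean isWellFormed(String document);
}

public class NotWfSuiteRunner {
    /** Runs every not-well-formed document through the checker and returns the names it wrongly accepts. */
    public static List<String> acceptedNotWf(WellFormednessChecker checker, String[] names, String[] docs) {
        List<String> accepted = new ArrayList<>();
        for (int i = 0; i < docs.length; i++) {
            if (checker.isWellFormed(docs[i])) {
                accepted.add(names[i]);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Toy checker for illustration only: "well-formed" means equal counts of '<' and '>'.
        WellFormednessChecker toy = doc ->
                doc.chars().filter(c -> c == '<').count() == doc.chars().filter(c -> c == '>').count();

        // Hypothetical not-well-formed test documents.
        String[] names = {"001.xml", "002.xml"};
        String[] docs = {"<doc", "<doc></dac>"};

        List<String> missed = acceptedNotWf(toy, names, docs);
        System.out.println("rejected " + (docs.length - missed.size()) + " of " + docs.length
                + "; wrongly accepted: " + missed);
    }
}
```

With the toy checker this reports 1 of 2 rejected, with 002.xml listed as wrongly accepted; a real parser would be dropped in behind the same interface.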
Larval validates quite a few things, and boots out quite a few other
things, but has not been tested to anywhere near the same level that
Lark has.
These class files have been compiled with Microsoft VJ++1.1 and
tested with Microsoft JView and with Sun's Java from JDK 1.1.3.
At the moment, if I compile with the Sun fastjavac, then neither
the Sun nor the Microsoft Java interpreter can use the resulting
class files. Admittedly, Lark.java and Larval.java are a pretty
severe strain on a compiler; on the other hand, I know about some pretty
egregious violations of the Java language spec that will get by
both of those compilers. I suspect that my current problem with
fastjavac is as likely to be me breaking some rule about what can
be in a static string (J++ is forgiving) as it is a compiler bug.
There's a policy change in that the Java source code for every Lark
class is now included in the distribution. If you actually look at
Lark.java and Larval.java, you'll see that this is not quite as
generous as it sounds.
Lark 1.0 has also not received a walk-through looking for dead code,
software rot, and unconcealed evidence of stupidity, and has not been
profiled. It is noticeably but not unbearably slower than 0.97, but it'll
be faster before I'm done. I have established with previous releases
that with a little work any given release of Lark can be made faster
and smaller. This release has grown in size by 10K.
Lark's UTF8 processing is still pretty shaky - I think that the
Java libraries are moving in the right direction fast enough to
make it not cost-effective for me to wrangle with this much more
at the moment. Since XmlInputStream is now available at source level,
anyone who wants to plug in some robust UTF8 code is welcome to.
Everything else is, I think, conformant without exception.
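For anyone tempted to supply stricter UTF8 handling, the core requirement is that malformed byte sequences be reported rather than silently replaced. This is a minimal sketch using java.nio.charset, an API that arrived in Java well after this message was written; the class name StrictUtf8 is hypothetical, and the sketch makes no assumption about how XmlInputStream is actually structured.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class StrictUtf8 {
    /**
     * Decodes bytes as UTF-8, returning empty if the input contains a malformed
     * sequence instead of silently substituting U+FFFD.
     */
    public static Optional<String> decode(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return Optional.of(decoder.decode(ByteBuffer.wrap(bytes)).toString());
        } catch (CharacterCodingException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        byte[] good = {0x3C, (byte) 0xC3, (byte) 0xA9, 0x3E}; // "<", U+00E9 encoded as C3 A9, ">"
        byte[] bad = {0x3C, (byte) 0xC3, 0x3E};               // C3 needs a continuation byte; 3E is not one
        System.out.println(decode(good).isPresent()); // true
        System.out.println(decode(bad).isPresent());  // false
    }
}
```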
Larval is just another version of Lark, but it has some more methods, notably
public void validate(boolean)
which as a side-effect turns on processExternalEntities; there
is a new validityError() callback in the Handler. Of course there
are a bunch of new classes with names like DTD and Validator and
Attlist and so on.
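The side effect on validate(boolean) follows from what validation requires: a validating processor has to read the external DTD subset and external parameter entities, so it cannot validate with external entity processing switched off. The sketch below models just those two flags and a validityError() callback. The class LarvalFlagsSketch, its Handler interface, and the declareElement/startTag methods are hypothetical and are not Larval's real signatures. Validity errors go through the callback rather than aborting, since XML treats them as non-fatal.

```java
import java.util.HashSet;
import java.util.Set;

public class LarvalFlagsSketch {
    /** Hypothetical callback; only the name validityError comes from the message. */
    public interface Handler {
        void validityError(String message);
    }

    private final Handler handler;
    private final Set<String> declaredElements = new HashSet<>();
    private boolean validating = false;
    private boolean processExternalEntities = false;

    public LarvalFlagsSketch(Handler handler) {
        this.handler = handler;
    }

    public void validate(boolean on) {
        validating = on;
        // Side effect: validation needs the external DTD subset and external
        // parameter entities, so switching it on also switches this flag on.
        if (on) {
            processExternalEntities = true;
        }
    }

    public boolean isProcessingExternalEntities() {
        return processExternalEntities;
    }

    public void declareElement(String name) {
        declaredElements.add(name);
    }

    /** Reports an undeclared element type through the callback and keeps going. */
    public void startTag(String name) {
        if (validating && !declaredElements.contains(name)) {
            handler.validityError("element type '" + name + "' is not declared");
        }
    }

    public static void main(String[] args) {
        LarvalFlagsSketch parser = new LarvalFlagsSketch(msg -> System.out.println("validity error: " + msg));
        parser.validate(true);
        System.out.println(parser.isProcessingExternalEntities()); // true
        parser.declareElement("doc");
        parser.startTag("doc");   // declared: nothing reported
        parser.startTag("para");  // prints: validity error: element type 'para' is not declared
    }
}
```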
Larval is done this way because if you just use Lark, you'll never have
to include any validation class files. I can get away with this because
even though Java doesn't have a preprocessor, Lark does. Presumably I
will use the same trick to do SAX.
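Lark's own preprocessor isn't shown in the message. For comparison, plain Java offers a partial substitute: an if statement guarded by a static final boolean compile-time constant. When the constant is false, javac in practice drops the guarded code from the generated bytecode, so that build never refers to classes used only inside the guard, although the guarded source must still compile. ConditionalBuild, VALIDATING, and Validator below are hypothetical names, not part of Lark.

```java
/** Hypothetical validation helper standing in for classes such as DTD or Validator. */
class Validator {
    String check(String element) {
        return "validated " + element;
    }
}

public class ConditionalBuild {
    // Compile-time constant: set to true and recompile for a "validating" build.
    static final boolean VALIDATING = false;

    public static String startTag(String element) {
        String report = "start-tag " + element;
        if (VALIDATING) {
            // Dead code while VALIDATING is false; javac leaves it out of the
            // bytecode, so this build never calls into Validator.
            report += " (" + new Validator().check(element) + ")";
        }
        return report;
    }

    public static void main(String[] args) {
        System.out.println(startTag("doc")); // start-tag doc
    }
}
```

Unlike a real preprocessor, this only removes statements, not declarations, which is one reason a source-level tool can still be the more flexible choice.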
The validation implementation is pretty naive. Rather than compiling
tables, Larval builds a data structure more or less isomorphic to
the declaration in the DTD, and then laboriously pokes around in it
every time it sees a start/end tag. I think it proves that (a) a
naive implementation of validation can be done, and (b) this isn't
the right way to do it in the long term. However, it's nowhere
near as slow as I expected, and is good enough to be useful already
in debugging XML documents.
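To make the idea of a structure "isomorphic to the declaration" concrete, here is a minimal sketch of validating a content model by walking a tree that mirrors the declaration, with no compilation to tables or automata. It is not Larval's code: it checks the full list of child element names when an element ends (rather than poking the structure at every tag), tracks the set of child positions the model could have reached, and ignores mixed content, EMPTY, and ANY. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NaiveContentModel {
    /** One node of a tree shaped like the declaration, e.g. (title, (p | list)*). */
    public interface Particle {
        /** Every child index this particle could end at, having started at pos. */
        Set<Integer> match(List<String> kids, int pos);
    }

    /** A single element name. */
    public static Particle name(String n) {
        return (kids, pos) -> pos < kids.size() && kids.get(pos).equals(n)
                ? Collections.singleton(pos + 1)
                : Collections.<Integer>emptySet();
    }

    /** (a, b, c): feed every end position of one part into the next. */
    public static Particle seq(Particle... parts) {
        return (kids, pos) -> {
            Set<Integer> current = Collections.singleton(pos);
            for (Particle p : parts) {
                Set<Integer> next = new HashSet<>();
                for (int i : current) {
                    next.addAll(p.match(kids, i));
                }
                current = next;
            }
            return current;
        };
    }

    /** (a | b | c): union of what each alternative can reach. */
    public static Particle choice(Particle... alts) {
        return (kids, pos) -> {
            Set<Integer> out = new HashSet<>();
            for (Particle a : alts) {
                out.addAll(a.match(kids, pos));
            }
            return out;
        };
    }

    /** a? : either match a or consume nothing. */
    public static Particle optional(Particle p) {
        return (kids, pos) -> {
            Set<Integer> out = new HashSet<>(p.match(kids, pos));
            out.add(pos);
            return out;
        };
    }

    /** a* : keep applying a until no new positions appear. */
    public static Particle star(Particle p) {
        return (kids, pos) -> {
            Set<Integer> seen = new HashSet<>();
            Deque<Integer> work = new ArrayDeque<>();
            seen.add(pos);
            work.push(pos);
            while (!work.isEmpty()) {
                for (int next : p.match(kids, work.pop())) {
                    if (seen.add(next)) {
                        work.push(next);
                    }
                }
            }
            return seen;
        };
    }

    /** a+ : one a followed by a*. */
    public static Particle plus(Particle p) {
        return seq(p, star(p));
    }

    /** Valid iff some way of matching the model consumes every child element. */
    public static boolean valid(Particle model, List<String> kids) {
        return model.match(kids, 0).contains(kids.size());
    }

    public static void main(String[] args) {
        // <!ELEMENT section (title, (p | list)*)>
        Particle section = seq(name("title"), star(choice(name("p"), name("list"))));
        System.out.println(valid(section, Arrays.asList("title", "p", "list", "p"))); // true
        System.out.println(valid(section, Arrays.asList("p", "title")));             // false
    }
}
```

Every check re-walks the tree, which is exactly the repeated cost that compiling each content model into a state table once, up front, would avoid.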
The doPI method now has separate args for target and remainder.
There is a doXmlDeclaration method.
There is a new method to tell Lark what name it should use for
the document entity, e.g. in error reporting.
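Splitting a processing instruction into target and remainder follows the XML grammar: the target is a Name, and whatever follows the whitespace after it, up to the closing ?>, is the remainder. A minimal sketch, assuming the caller has already stripped the <? and ?> delimiters; PiSplitter is a hypothetical helper, not Lark's code, and it does not check that the target is a legal Name or reject targets reserved by the spec.

```java
public class PiSplitter {
    /** XML's S production: space, tab, carriage return, line feed. */
    static boolean isXmlSpace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    /** Splits the text between "<?" and "?>" into {target, remainder}; remainder is "" if absent. */
    public static String[] split(String piBody) {
        int i = 0;
        while (i < piBody.length() && !isXmlSpace(piBody.charAt(i))) {
            i++;
        }
        String target = piBody.substring(0, i);
        // Skip the separating whitespace; anything after it belongs to the remainder.
        while (i < piBody.length() && isXmlSpace(piBody.charAt(i))) {
            i++;
        }
        return new String[] {target, piBody.substring(i)};
    }

    public static void main(String[] args) {
        String[] parts = split("xml-stylesheet href=\"style.css\" type=\"text/css\"");
        System.out.println(parts[0]); // xml-stylesheet
        System.out.println(parts[1]); // href="style.css" type="text/css"
    }
}
```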
There is an ESIS class that extends Handler; I don't claim this to
be anything like a real SGML ESIS, but it's sure useful.
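ESIS-style output writes one parse event per line, which makes it easy to diff what two parsers report for the same document. This sketch borrows a simplified version of the line prefixes used by nsgmls (A for an attribute, ( for a start-tag, - for character data, ) for an end-tag) and escapes only backslashes and newlines. EsisWriter and its methods are hypothetical and are not the ESIS class in the Lark distribution.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EsisWriter {
    private final StringBuilder out = new StringBuilder();

    /** Escapes backslashes first, then newlines, so each event stays on one line. */
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\n", "\\n");
    }

    public void startElement(String name, Map<String, String> attributes) {
        // Attribute lines precede the start-tag line, as in nsgmls output.
        for (Map.Entry<String, String> a : attributes.entrySet()) {
            out.append('A').append(a.getKey()).append(" CDATA ").append(escape(a.getValue())).append('\n');
        }
        out.append('(').append(name).append('\n');
    }

    public void characters(String text) {
        out.append('-').append(escape(text)).append('\n');
    }

    public void endElement(String name) {
        out.append(')').append(name).append('\n');
    }

    public String result() {
        return out.toString();
    }

    public static void main(String[] args) {
        EsisWriter w = new EsisWriter();
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("id", "s1");
        w.startElement("section", attrs);
        w.characters("line one\nline two");
        w.endElement("section");
        System.out.print(w.result());
    }
}
```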
Lark's version will remain 1.0 as long as XML does (a long time, I
hope). Once it's no longer 'final beta' Lark.toString() will add a
build date-stamp to the "1.0" version string.
Larval will progress toward 1.0 as I get around to doing some really
serious testing on it.
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)