David Brownell david-b at pacbell.net
Mon Jun 7 21:50:31 BST 1999

John Cowan wrote:
> David Brownell wrote:
> > Hmm, I have a SAX2 driver that parses XML, which I'll release this week.
> I suppose you mean "parses HTML".

Yes indeed ... typos abound, so much of the world has taken to
writing XML when they mean HTML!  I did so below, too ... ;-)

> > It uses the Swing HTML parser, which is pretty universally available
> > though (like all HTML parsers) it's got quirks with respect to how it
> > handles faulty XML.
> That was my first idea, but I learned that the Swing parser doesn't
> do the amount of cleanup I want, so I decided to roll my own.

It's imperfect, but it is pretty generally available (and getting more so).
It works for much, but not all, of the broken HTML in the world.  And
at a bare minimum, it's a good lead-in to more sophisticated packages!!

I know they've worked to improve its error recovery, and will do more,
though there are limits to how much broken HTML they'll accept.

> Don Park also has a SAX interface to Swing-HTML, freely available
> but closed source.

I'll have this one under an Open Source (tm) license.

- Dave

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
