FWD: Announcement - World Wide Web Wrapper Factory (W4F)
John E. Simpson
simpson at polaris.net
Wed Mar 24 00:49:21 GMT 1999
I received this announcement via e-mail yesterday. It may (or may not :) be
of interest to xml-dev and xml-l subscribers. Contact information is at the
foot of the announcement.
[Disclaimer: I have no affiliation with the W4F product development group.
My correspondent, previously unknown to me, just happened on my website.
Apologies for the cross-posting to subscribers of both lists.]
>----- Looking at the Web through XML glasses, using W4F -----
>
>The World Wide Web Wrapper Factory (W4F) is a Java toolkit to
>generate wrappers for HTML data sources.
>
>Version 1.03 offers a built-in declarative mapping to XML.
>Using W4F it is now possible to easily specify the translation
>of HTML pages into XML documents. Moreover, the specification
>also yields the DTD for free.
>
>W4F consists of a retrieval language to identify Web sources, a
>declarative extraction language (HEL: HTML Extraction Language)
>to express robust extraction rules, and a mapping interface to
>export the extracted information into user-defined data
>structures (text, Java objects, XML, etc.).
>The wrappers are generated as Java classes that can be used as is
>or integrated into higher-level applications.
>
>Version 1.03 provides improved visual support to make the
>creation of wrappers easier and faster. In particular, the
>extraction of HTML can be done via a WYSIWYG interface.
>
>The W4F toolkit comes as a Java package and can be downloaded from
>the W4F web site. It is free for non-commercial use.
>Various examples of running wrappers are also available for download
>from the web site.
>
>Web site:
>http://db.cis.upenn.edu/W4F
>
>Contacts:
>Arnaud Sahuguet
>Database Research Group, Univ. of Pennsylvania, PA, USA
>sahuguet at gradient.cis.upenn.edu
>http://www.cis.upenn.edu/~sahuguet
>
>Fabien Azavant
>École Nationale Supérieure des Télécommunications, Paris, France
>Fabien.Azavant at enst.fr
>http://www.stud.enst.fr/~azavant
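
For readers who want a concrete picture of the three layers the
announcement describes (retrieval, extraction, mapping), here is a
minimal sketch in plain Java. It does NOT use W4F's API or HEL syntax,
neither of which is shown in the announcement: the class name
WrapperSketch, its method names, and the sample page are all
hypothetical, and a regular expression stands in for real extraction
rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of a retrieve/extract/map wrapper pipeline.
// Not W4F code; every name here is an assumption made for the example.
public class WrapperSketch {

    // Retrieval layer: a real wrapper would fetch the page over HTTP.
    // A fixed string keeps the sketch self-contained.
    public static String retrieve() {
        return "<html><body><ul>"
             + "<li>XML glasses</li>"
             + "<li>Wrappers</li>"
             + "</ul></body></html>";
    }

    // Extraction layer: a regular expression stands in for declarative
    // extraction rules. It collects the text of each <li> element.
    public static List<String> extract(String html) {
        List<String> items = new ArrayList<>();
        Matcher m = Pattern.compile("<li>(.*?)</li>").matcher(html);
        while (m.find()) {
            items.add(m.group(1));
        }
        return items;
    }

    // Mapping layer: export the extracted values as a small XML document.
    // No entity escaping is done, so the sample data avoids '<' and '&'.
    public static String toXml(List<String> items) {
        StringBuilder sb = new StringBuilder("<items>");
        for (String item : items) {
            sb.append("<item>").append(item).append("</item>");
        }
        return sb.append("</items>").toString();
    }

    public static void main(String[] args) {
        System.out.println(toXml(extract(retrieve())));
    }
}
```

Keeping the three steps as separate methods mirrors the separation the
announcement describes: the extraction rules can change without touching
how the page is fetched or how the result is exported.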
==========================================================
John E. Simpson | The secret of eternal youth
simpson at polaris.net | is arrested development.
http://www.flixml.org | -- Alice Roosevelt Longworth
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)