General comments on parsers (was [NEW] AElfred)

Peter Murray-Rust peter at ursus.demon.co.uk
Wed Dec 10 21:53:35 GMT 1997


At 19:18 09/12/97 -0500, David Megginson wrote:
>Microstar Software Ltd. is happy to announce lfred (AElfred), a
>small, fast, DTD-aware Java-based XML parser, especially suitable for
>use in Java applets.

Great!

I have bolted support for (AE)?lfred into JUMBO and tested the last but one
lfred pre-release.  Many thanks to Microstar (and David) for having
approached JUMBO.

JUMBO now supports three parsers (in alpha order)
	- Lark
	- lfred
	- NXP

(is MSXML WORA yet??)

They are run with the command line:
	java jumbo.sgml.SGMLTree myfile.xml PARSER=AElfred (or whatever)
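Here is a minimal sketch, in modern Java, of how a `PARSER=` argument like the one above might be picked out of the command line. The class name `ParserChoice`, the case-insensitive matching and the fallback to Lark when no parser is named are all assumptions for illustration. They are not JUMBO's actual behaviour.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of JUMBO: picks a parser name
// from command-line arguments of the form "PARSER=AElfred".
final class ParserChoice {
    // The three parsers named in the post.
    static final List<String> KNOWN = Arrays.asList("AElfred", "Lark", "NXP");
    // Assumed default when no PARSER= argument is given.
    static final String DEFAULT = "Lark";

    static String select(String[] args) {
        for (String arg : args) {
            if (arg.startsWith("PARSER=")) {
                String wanted = arg.substring("PARSER=".length());
                // Case-insensitive match against the known names (an assumption).
                for (String name : KNOWN) {
                    if (name.equalsIgnoreCase(wanted)) {
                        return name;
                    }
                }
                // Fail loudly rather than silently using another parser.
                throw new IllegalArgumentException("Unknown parser: " + wanted);
            }
        }
        return DEFAULT;
    }

    public static void main(String[] args) {
        System.out.println("Using parser " + select(args));
    }
}
```

Rejecting an unknown name, instead of silently falling back, keeps a typo such as `PARSER=Alfred` from being mistaken for a difference between parsers.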

It has proved relatively easy to bolt these in, but there are significant
differences between the interfaces they offer, and I hope that we can move
towards some uniformity, at least in the terminology. I shall post more on
this to XML-DEV.
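One way to picture the uniformity hoped for above is a single callback interface that every parser is adapted to, so the application never sees parser-specific calls. The sketch below is in modern Java and uses hypothetical names (`DocumentEvents`, `OutlineBuilder`). It does not reproduce the real Lark, AElfred or NXP APIs, and it leaves out the per-parser adapters.

```java
import java.util.Map;

// Hypothetical common callback interface; each parser (Lark, AElfred, NXP)
// would need its own adapter that translates its native events into these calls.
interface DocumentEvents {
    void startElement(String name, Map<String, String> attributes);
    void characters(String text);
    void endElement(String name);
}

// Example consumer: builds an indented outline of element names,
// independent of which parser produced the events.
final class OutlineBuilder implements DocumentEvents {
    private final StringBuilder out = new StringBuilder();
    private int depth = 0;

    @Override
    public void startElement(String name, Map<String, String> attributes) {
        indent();
        out.append(name);
        if (!attributes.isEmpty()) {
            out.append(' ').append(attributes);
        }
        out.append('\n');
        depth++;
    }

    @Override
    public void characters(String text) {
        // Character data is ignored in this outline.
    }

    @Override
    public void endElement(String name) {
        depth--;
    }

    private void indent() {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    String result() {
        return out.toString();
    }
}
```

Once code like `OutlineBuilder` is written against the shared interface, it works unchanged with any parser for which an adapter exists. That is where agreed terminology would pay off.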


Specific comments:

>lfred is free for both commercial and non-commercial use, and COMES
 ^^^^^
I am not sure whether the ligature has disappeared here or whether you have
shortened it to 'lfred' (5 chars). Although I support the use of Unicode,
many mailers don't (this is Eudora).

Note also that I use the parser names for Java classes, and so do the authors,
so we have Lark.class, etc. I doubt whether JDK 1.0.2 supports the ligature.

There are 3 possibilities:
	7 chars (AElfred)
	6 chars (<ligature>lfred)
	5 chars (lfred)

I think you need to standardise on ONE! 
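Even after one spelling is chosen, code and mail could still fold the variants onto it. The sketch below, in modern Java, assumes the 7-character ASCII form `AElfred` as the canonical name. That is only one of the three options listed, and `ParserNames` is a hypothetical helper.

```java
// Hypothetical helper: folds the three spellings ("AElfred", the ligature
// form, and the stripped "lfred") onto one ASCII name that is safe for
// class names, file names and 7-bit mailers. Choosing "AElfred" is an assumption.
final class ParserNames {
    static final String CANONICAL = "AElfred";

    static String canonical(String name) {
        // U+00C6 is the Latin capital ligature AE; U+00E6 is its lowercase form.
        String folded = name.replace("\u00C6", "AE").replace("\u00E6", "ae");
        if (folded.equals("AElfred") || folded.equals("lfred")) {
            return CANONICAL;
        }
        return folded;
    }

    // True if every character is 7-bit ASCII, i.e. likely to survive old mailers.
    static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }
}
```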

[... valuable design points omitted...]
>
>6. lfred must produce correct output for well-formed and valid
>   documents, but need not reject every document that is not valid or
>   not well-formed.
>
>   STATUS: lfred is DTD-aware, and handles all current XML features,

I can see several ways a parser can treat the DTD:
	- ignore external and internal subsets completely
	- read and parse the internal subset and apply ATTLISTs and ENTITYs
	- ditto and provide handles for the application to retrieve DTD information
	- ditto, but include the external subset
	- as above, but validate attribute values
	- as above but also validate content 

Only the last of these is full validation.
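Each item in the list builds on the one before it, so the levels can be modelled as an ordered enumeration. This is a hypothetical modern-Java sketch. The constant and method names are invented for illustration and are not part of any of the parsers discussed.

```java
// Hypothetical classification of the six DTD-handling levels, ordered
// from least to most work; each level includes everything above it.
enum DtdHandling {
    IGNORE_DTD,              // ignore internal and external subsets
    APPLY_INTERNAL_SUBSET,   // parse internal subset, apply ATTLISTs and ENTITYs
    EXPOSE_INTERNAL_SUBSET,  // ...and let the application retrieve DTD information
    INCLUDE_EXTERNAL_SUBSET, // ...and read the external subset as well
    VALIDATE_ATTRIBUTES,     // ...and validate attribute values
    VALIDATE_CONTENT;        // ...and validate element content: full validation

    // Enum compareTo orders constants by declaration position.
    boolean readsInternalSubset() { return compareTo(APPLY_INTERNAL_SUBSET) >= 0; }
    boolean exposesDeclarations() { return compareTo(EXPOSE_INTERNAL_SUBSET) >= 0; }
    boolean readsExternalSubset() { return compareTo(INCLUDE_EXTERNAL_SUBSET) >= 0; }
    boolean isFullValidation()    { return this == VALIDATE_CONTENT; }
}
```

In these terms, JUMBO's authoring needs correspond to at least `EXPOSE_INTERNAL_SUBSET` (or `INCLUDE_EXTERNAL_SUBSET` when the declarations live outside the document), whether or not the parser validates.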

JUMBO wants to retrieve the DTD information for its authoring process, and
needs the ELEMENT and ATTLIST information. At my last attempt I was unable
to extract ELEMENT information from Lark (though I can get ATTLISTs), and I
don't think I could get ELEMENT information from lfred either. I haven't
looked at NXP; perhaps Norbert could update us.

>    including CDATA and INCLUDE/IGNORE marked sections, internal and
>
Again, many thanks to Microstar and David, Tim, Norbert (and the MSXML
players when we get the WORA version).


	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



