Simple XML conformance

Peter Murray-Rust peter at ursus.demon.co.uk
Sun Jan 16 22:14:51 GMT 2000


I have been preparing a set of XML documents and a collection of XML-aware
tools to introduce newcomers to XML (on our VirtualXML course). I have
encountered a surprising number of cases where an XML tool is unable to
read an XML document. [There is not meant to be anything tricky here, since
Henry and I are actually trying to demonstrate how to learn XML by doing.
We are not looking to "torture" the tools, rather the reverse.]

As a collection of XML documents I took:
	Jon Bosak's Shakespeare (elements and DTD)
	http://www.w3.org/TR/1998/REC-xml-19980210.xml (elements, attributes and DTD (with PEs))
	http://www.w3.org/TR/DOM-Level-2/ (elements, attributes, entities and DTD (with PEs and GEs)) [I would point out that this is an excellent document for showing a wide range of XML constructs in a meaningful way.]
	(and a number of examples distributed with tools, including my own).

Here are some of the problems (I will not list the tools explicitly):
	- a tool threw a fatal error because <?xml version="1.0"?> was absent
	- a tool threw a fatal error because <!DOCTYPE was missing
	- REC-xml and DOM specify a DTD, but spec.dtd is not mounted, so it cannot be retrieved
	- one content model in spec.dtd appeared to be inconsistent with REC-xml (I may have the wrong spec.dtd, but it was downloaded from w3.org)
	- one tool "skipped" general entity references (i.e. did not expand them) and threw a content model error
	- one tool regarded undeclared parameter entity references inside comments (in spec.dtd) as errors
	- several tools regard the absence of a DTD as a fatal error (i.e. they appear to be validating by default).
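For comparison, here is a minimal sketch of what a non-validating parser is expected to do with two of the cases above. It assumes Python's standard-library ElementTree (built on the non-validating expat parser) as a stand-in for a conforming tool; the document, the entity name `who` and the parameter entity `%undeclared;` are made up for illustration. A non-validating processor is still required to read the internal DTD subset, so the declared general entity is expanded rather than skipped, and the parameter entity mentioned inside a comment is ignored, because the XML 1.0 Recommendation says parameter entity references must not be recognised within comments.

```python
import xml.etree.ElementTree as ET

# Hypothetical test document. The internal DTD subset contains:
#  - a comment mentioning a parameter entity (%undeclared;) that is never
#    declared; PE references are not recognised inside comments, so this
#    must not be reported as an error;
#  - an internal general entity "who", which a non-validating parser must
#    still read and expand.
DOC = """<!DOCTYPE note [
<!-- %undeclared; appears only inside this comment -->
<!ENTITY who "World">
]>
<note>Hello &who;</note>"""

# No XML declaration and no external DTD: neither is needed here.
root = ET.fromstring(DOC)
print(root.tag, repr(root.text))  # prints: note 'Hello World'
```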

For example, I suspect that many tools will fail when pointed at:
http://www.w3.org/TR/1998/REC-xml-19980210.xml

I expect that I might be able to alter the behaviour of some of the tools
with command-line switches, but I am slightly surprised that some tools
will only read files that can be validated. For example, the file

<greeting>Hello World</greeting>

is often not readable unless it is "edited" to:

<?xml version="1.0"?>
<!DOCTYPE greeting [
<!ELEMENT greeting ANY>
]>
<greeting>Hello World</greeting>
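A minimal sketch, again assuming Python's standard-library ElementTree as a representative non-validating parser, showing that both forms of the greeting document should be accepted. The XML declaration and the DOCTYPE are both optional in a well-formed document, so the prolog is not needed. In the DTD version the content model is written `ANY` without parentheses: `ANY` is a keyword, and `(ANY)` would instead declare a single child element named `ANY`, against which this document would not be valid.

```python
import xml.etree.ElementTree as ET

# The bare one-line document: no XML declaration, no DOCTYPE.
# It is well-formed, so a non-validating parser should accept it as-is.
BARE = "<greeting>Hello World</greeting>"

# The same document with a prolog. ANY is a keyword and is not
# parenthesised; (#PCDATA) would also be a valid content model here.
WITH_PROLOG = """<?xml version="1.0"?>
<!DOCTYPE greeting [
<!ELEMENT greeting ANY>
]>
<greeting>Hello World</greeting>"""

results = {}
for label, text in [("bare", BARE), ("with prolog", WITH_PROLOG)]:
    root = ET.fromstring(text)
    results[label] = (root.tag, root.text)
    print(label, root.tag, root.text)
```

Both documents yield the same element tree; the prolog only matters to a validating parser.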
		
Is there a definitive resource anywhere which explicitly states what
behaviour can be expected from the various types of parser (validating and
non-validating)? I know it is inferable from the spec, but I suspect that
not all implementers have made identical interpretations. Ideally I would
like a matrix of parsers against standard "correct" [not always "valid"]
documents, showing how many of them conform.
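A matrix of that kind could be started with a small harness. The sketch below is only one assumed way to do it: the "parsers" are the three non-validating parsers in Python's standard library (ElementTree, minidom and SAX), the inline strings stand in for a real test corpus, and the names `DOCUMENTS`, `PARSERS` and `conformance_matrix` are hypothetical. A deliberately malformed document is included as a control, so that the harness can be seen to report failures as well as successes.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET
import xml.sax

# Hypothetical corpus. All but the last document are well-formed, so every
# non-validating parser should accept them; the last is a deliberately
# malformed control that every parser should reject.
DOCUMENTS = {
    "bare element": "<greeting>Hello World</greeting>",
    "xml declaration": '<?xml version="1.0"?><greeting>Hi</greeting>',
    "internal entity": '<!DOCTYPE d [<!ENTITY who "World">]><d>Hello &who;</d>',
    "attributes": '<spec lang="en" status="REC"><p/></spec>',
    "malformed control": "<a><b></a>",
}


def parse_etree(text):
    ET.fromstring(text)


def parse_minidom(text):
    xml.dom.minidom.parseString(text)


def parse_sax(text):
    # A do-nothing ContentHandler: only success or failure matters here.
    xml.sax.parseString(text.encode("utf-8"), xml.sax.ContentHandler())


PARSERS = {"etree": parse_etree, "minidom": parse_minidom, "sax": parse_sax}


def conformance_matrix(parsers, documents):
    """Map (parser name, document name) to "ok" or a short failure note."""
    matrix = {}
    for pname, parse in parsers.items():
        for dname, text in documents.items():
            try:
                parse(text)
                matrix[(pname, dname)] = "ok"
            except Exception as exc:  # each parser raises its own error type
                matrix[(pname, dname)] = "FAIL " + type(exc).__name__
    return matrix


matrix = conformance_matrix(PARSERS, DOCUMENTS)
for dname in DOCUMENTS:
    cells = "  ".join(f"{p}={matrix[(p, dname)]}" for p in PARSERS)
    print(f"{dname:18} {cells}")
```

Each printed row is one document and each cell one parser. A fuller version might replace the inline strings with the real corpus (Shakespeare, REC-xml, DOM Level 2) and the three functions with wrappers around the tools under test.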

Henry and I are obviously keen to show that XML is simple to use with the
correct tools and that interoperability is achievable. 

	TIA

	P.
(http://www.cmlconsulting.com)


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1
Please note: New list subscriptions now closed in preparation for transfer to OASIS.




