Partial XML Processors (was Re: JavaScript parser update and Questions)

David Megginson ak117 at freenet.carleton.ca
Sat Jan 17 12:21:40 GMT 1998


Jeremie Miller writes:

 > A well-formed XML document is not required to have a DTD, internal or
 > external, correct?  Is a well-formed parser not an XML parser that does not
 > have access to or does not process a DTD, internal or external?  I guess I
 > haven't found a clear definition of what a well-formed parser is yet.

The Proposed Recommendation (PR) is not very clear about processing
requirements (other than error reporting and a few details like
ignorable whitespace).  As I understand things, however, a well-formed
parser must be able to do the following:

1) Parse all of the grammar, including the document type declaration
  and internal DTD subset, without throwing spurious errors (even if
  it does nothing with the declarations).
2) Act correctly on the RMD (required markup declaration) parameter of
  the XML declaration.
3) Report a large range of errors, such as "]]>" in character data,
  "<" in an attribute value literal, illegal characters in element and
  attribute names, mismatched start- and end-tags, etc.
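
To illustrate the kind of reporting item 3 calls for, here is a minimal
sketch in Python.  It assumes a hypothetical tokenizer that has already
split the document into (kind, value) events; the event kinds and the
function name find_wf_errors are invented for this example and come
from no real parser.  It covers only three of the errors listed above,
not the full set the PR requires.

```python
def find_wf_errors(events):
    """Return a list of well-formedness error messages.

    `events` is a list of (kind, value) tuples from a hypothetical
    tokenizer, where kind is "start", "end", "text", or "attr".
    """
    errors = []
    stack = []  # names of the currently open elements
    for kind, value in events:
        if kind == "start":
            stack.append(value)
        elif kind == "end":
            # An end-tag must match the most recently opened element.
            if not stack or stack[-1] != value:
                errors.append(f"mismatched end-tag </{value}>")
            else:
                stack.pop()
        elif kind == "text":
            # "]]>" may not appear literally in character data.
            if "]]>" in value:
                errors.append('"]]>" in character data')
        elif kind == "attr":
            # "<" may not appear literally in an attribute value.
            if "<" in value:
                errors.append('"<" in attribute value')
    # Anything still open at the end was never closed.
    for name in reversed(stack):
        errors.append(f"unclosed element <{name}>")
    return errors
```

The sketch collects errors instead of stopping at the first one, which
is only a convenience for showing several checks at once.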

There is no provision for a conforming XML parser that does not do
full error reporting, even if the parser correctly handles all XML
constructions.  For example, AElfred parses a DTD, resolves all
general and parameter entities, stores information on entities and
notations, fills in defaulted attribute values, marks ignorable
whitespace, and supports multiple character encodings, but it is a
non-conforming XML parser because it does not report all required
well-formedness errors.

 > If this is true, then a well-formed parser doesn't even have to acknowledge
 > that entities exist except for the built in ones, and absolutely all
 > whitespace is preserved, right?

Yes, that is my understanding, except that the well-formed parser must
check that the entity reference itself is well-formed.  For example,
if you found

  &1front2;

you would be required to report a well-formedness error.  You have to
be prepared to check the whole range of Unicode characters, not just
the first 256 (see the PR for what's allowed at the start and middle
of a name).  AElfred does not do this right now, because it would make
the parser too large for use in applets (I added the support
experimentally once, then removed it again).
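
As a rough sketch of that check, the following Python tests whether a
general entity reference has a well-formed name.  It is only an
approximation: it uses Unicode general categories from the standard
unicodedata module in place of the explicit character tables in the
PR, so it will accept or reject some characters differently from the
spec (the Extender class, for one, is not handled).  The function
names are made up for this example, and character references such as
&#38; would need separate handling.

```python
import unicodedata


def is_name_start(ch):
    # Approximation: any Unicode letter, plus "_" and ":".
    return ch in "_:" or unicodedata.category(ch).startswith("L")


def is_name_char(ch):
    # Approximation: name-start characters, ".", "-", decimal digits,
    # and combining marks.
    return (
        is_name_start(ch)
        or ch in ".-"
        or unicodedata.category(ch) in ("Nd", "Mn", "Mc")
    )


def is_wf_entity_ref(ref):
    """Return True if `ref` looks like a well-formed &name; reference."""
    if len(ref) < 3 or not (ref.startswith("&") and ref.endswith(";")):
        return False
    name = ref[1:-1]
    return is_name_start(name[0]) and all(is_name_char(c) for c in name[1:])
```

With this sketch, is_wf_entity_ref("&1front2;") is False because a name
cannot start with a digit, while is_wf_entity_ref("&front2;") is True.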


All the best,


David

-- 
David Megginson                 ak117 at freenet.carleton.ca
Microstar Software Ltd.         dmeggins at microstar.com
      http://home.sprynet.com/sprynet/dmeggins/
