PI, XMLDecl, and EncodingPI

Chris Hubick hubick at medlib.com
Thu Dec 4 01:14:37 GMT 1997

I am writing a recursive descent XML parser in Java and have
a couple of questions.

The XML Working Draft dated 17-November-1997 states:

[24] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
[28] Misc ::= Comment | PI | S
[19] PI ::= '<?' Name (S (Char* - (Char* '?>' Char*)))? '?>'
[25] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[79] EncodingPI ::= '<?xml' S 'encoding' Eq QEncoding S? '?>'

	Within a PI is the Name "xml" reserved?  If it is, should
there not be a [wfc] on PI stating so?
	By the current definition any XMLDecl or EncodingPI is also
a valid PI.  In a prolog an XMLDecl is optional, and is followed
by Misc*, which includes PI.
	OK, so I can have an XML file with no XMLDecl
(it's optional) followed by '<?xml version="blah" encoding=5?>', which
matches PI, in my Misc*.  And this is legal?  My parser will
take this just fine as such, but I wonder about the others.
It makes detecting a bad XMLDecl impossible!  My parser will just
say fine, that wasn't an XMLDecl, and feed it to Misc, which will
most likely match it as a PI (or possibly spew).
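
To make the fall-through concrete, here is a minimal sketch of the kind of
check a parser could apply when Misc matches a PI.  It assumes the Name "xml"
is meant to be reserved, which the draft does not actually state; the class
and method names are hypothetical, and the Name scan is simplified (it stops
at whitespace or '?') rather than following the full Name production.

```java
public class PrologPiCheck {
    /**
     * Hypothetical helper: returns the target Name of a PI starting at pos,
     * or null if the input does not start with "<?" there.
     * Simplification: the Name ends at the first whitespace or '?'.
     */
    static String piTarget(String input, int pos) {
        if (!input.startsWith("<?", pos)) {
            return null;
        }
        int start = pos + 2;
        int end = start;
        while (end < input.length()) {
            char c = input.charAt(end);
            if (Character.isWhitespace(c) || c == '?') {
                break;
            }
            end++;
        }
        return input.substring(start, end);
    }

    /**
     * Assumed rule (not in the draft): a PI reached through Misc whose
     * target is exactly "xml" is a misplaced or malformed XMLDecl.
     */
    static boolean isMisplacedXmlDecl(String input, int pos) {
        return "xml".equals(piTarget(input, pos));
    }
}
```

With such a check the parser can report '<?xml version="blah" encoding=5?>'
as a bad XMLDecl instead of silently accepting it as a PI, while a target
such as "xml-stylesheet" still passes.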

Shouldn't [19] PI have an S? at the end before '?>' ?

Also, shouldn't PCData be:

[17] PCData ::= [^<&]+

rather than the current:

[17] PCData ::= [^<&]*
[44] content ::= (element | PCData | Reference | CDSect | PI | Comment)*


<TEST>This is a test</TEST>

In my recursive descent parser, this parses to:

    <PCData>This is a test</PCData>

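And then we get infinite matches on a zero-length PCData: [^<&]* matches
the empty string, and content repeats it with ( ... )*, so the loop never
has to consume any input.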
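
A minimal sketch of the loop in question, using the proposed [^<&]+ form.
The class and method names are hypothetical, and the content loop is cut
down to the PCData alternative only; it is not a full implementation of
production [44].

```java
import java.util.ArrayList;
import java.util.List;

public class PCDataSketch {
    /**
     * Matches PCData ::= [^<&]+ at pos.
     * Returns the end offset of the match, or -1 if no character matched.
     * With the draft's [^<&]* form this would instead succeed with an
     * empty match (end == pos), which is what lets the caller spin forever.
     */
    static int matchPCData(String input, int pos) {
        int end = pos;
        while (end < input.length()
                && input.charAt(end) != '<'
                && input.charAt(end) != '&') {
            end++;
        }
        return end > pos ? end : -1;
    }

    /**
     * Simplified content loop: collects PCData runs starting at pos and
     * stops at the first position where PCData fails to match.
     * Each iteration consumes at least one character, so it terminates.
     */
    static List<String> parseContent(String input, int pos) {
        List<String> runs = new ArrayList<>();
        while (pos < input.length()) {
            int end = matchPCData(input, pos);
            if (end < 0) {
                break; // not PCData; a full parser would try the other alternatives
            }
            runs.add(input.substring(pos, end));
            pos = end;
        }
        return runs;
    }
}
```

Calling parseContent on "This is a test</TEST>" yields the single run
"This is a test" and then stops at the '<', because the one-or-more form
refuses to match zero characters there.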
