PEReference

Chris Hubick maillist at chris.hubick.com
Mon May 4 22:07:55 BST 1998


	I am writing an XML analysis tool on which I am building a
parser.  At the bottom level it reads in XML and generates
start/end/character events based on the productions in the XML spec.  For
example, when the analyzer encounters something matching the Name
production, say the Name "foo", it generates the events:

<Name>
  <Letter>
    <BaseChar>f</BaseChar>
  </Letter>
  <NameChar>
    <Letter>
      <BaseChar>o</BaseChar>
    </Letter>
  </NameChar>
  <NameChar>
    <Letter>
      <BaseChar>o</BaseChar>
    </Letter>
  </NameChar>
</Name>
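
To make the event stream above concrete, here is a minimal Python sketch.
It is not the actual tool: the function name name_events and the
(kind, value) tuple shape are assumptions made for illustration, and it
only handles ASCII, treating every ASCII letter as a BaseChar.  Other
name characters (digits, '.', '-', '_', ':') are emitted as plain
character events without further classification.

```python
def name_events(name):
    """Yield (kind, value) events for an ASCII string matching Name.

    Hypothetical sketch: Name ::= (Letter | '_' | ':') (NameChar)*,
    restricted to ASCII for brevity.
    """
    def is_letter(ch):
        return ch.isascii() and ch.isalpha()

    def is_name_char(ch):
        return ch.isascii() and (ch.isalnum() or ch in "._:-")

    def letter(ch):
        # Every ASCII letter is reported as Letter > BaseChar.
        yield ("start", "Letter")
        yield ("start", "BaseChar")
        yield ("chars", ch)
        yield ("end", "BaseChar")
        yield ("end", "Letter")

    if not name or not (is_letter(name[0]) or name[0] in "_:"):
        raise ValueError("not a Name: %r" % (name,))
    if not all(is_name_char(ch) for ch in name[1:]):
        raise ValueError("not a Name: %r" % (name,))

    yield ("start", "Name")
    first = name[0]
    if is_letter(first):
        yield from letter(first)
    else:
        yield ("chars", first)
    for ch in name[1:]:
        yield ("start", "NameChar")
        if is_letter(ch):
            yield from letter(ch)
        else:
            yield ("chars", ch)
        yield ("end", "NameChar")
    yield ("end", "Name")
```

Iterating over name_events("foo") and printing each tuple gives the same
nesting as the listing above, one event per start tag, end tag, or
character.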


	When it is complete I hope to build an actual parser on top of it.
The analyzer can currently read almost any document; all it lacks,
and what I am working on now, is support for production [29] markupdecl and
all of its dependencies.  This is where I hit a snag.

In the XML spec at:

http://www.w3.org/TR/REC-xml#NT-markupdecl

It states:

> The markup declarations may be made up in whole or
> in part of the replacement text of parameter entities.
> The productions later in this specification for
> individual nonterminals (elementdecl, AttlistDecl,
> and so on) describe the declarations after all the
> parameter entities have been included.

I want the productions for an XML document BEFORE the parameter entities
have been included.  I really think the XML spec should have included
productions for before as well as after PEReference inclusion.

I want to do PEReference inclusion at the parser level, not at my lower
"analyzer" level, which should generate events that directly reflect
what is in the document (before inclusion).

So for my purposes, I need to figure out the grammar for an _unprocessed_
XML document.
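
As a rough illustration of what "before inclusion" means at the analyzer
level, the Python sketch below splits a markup declaration into character
runs and PEReference events without looking up or expanding any entity.
The function name scan_unexpanded and the event tuples are hypothetical,
the Name pattern is ASCII-only, and the scan is deliberately naive: it
does not know about quoted literals, so it would also report "%foo;"
inside an attribute default value, where it is not a PE reference.

```python
import re

# PEReference ::= '%' Name ';'  (ASCII-only approximation of Name)
PE_REF = re.compile(r"%([A-Za-z_:][A-Za-z0-9._:\-]*);")


def scan_unexpanded(decl):
    """Yield ("chars", text) and ("PEReference", name) events for decl.

    The references are reported, never replaced; inclusion is left to
    a higher layer.
    """
    pos = 0
    for match in PE_REF.finditer(decl):
        if match.start() > pos:
            yield ("chars", decl[pos:match.start()])
        yield ("PEReference", match.group(1))
        pos = match.end()
    if pos < len(decl):
        yield ("chars", decl[pos:])
```

For the input "<!ELEMENT %name; (#PCDATA)>" this yields a character run,
a PEReference event for "name", and a second character run.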

My first idea was simply to take the current grammar and add
PEReference wherever I thought it was necessary:

[45] elementdecl ::= '<!ELEMENT' S (Name | PEReference) S contentspec S? '>'
[48] cp ::= (Name | PEReference | choice | seq) ('?' | '*' | '+')?
[51] Mixed ::=  '(' S? '#PCDATA' (S? '|' S? (Name | PEReference))* S? ')*' | '(' S? '#PCDATA' S? ')'
[52] AttlistDecl ::= '<!ATTLIST' S (Name | PEReference) AttDef* S? '>'
[53] AttDef ::=  S (Name | PEReference) S AttType S DefaultDecl 
[54] AttType ::=  StringType | TokenizedType | EnumeratedType | PEReference
[58] NotationType ::= 'NOTATION' S '(' S? (Name | PEReference) (S? '|' S? (Name | PEReference))* S? ')'
[59] Enumeration ::= '(' S? (Nmtoken | PEReference) (S? '|' S? (Nmtoken | PEReference))* S? ')'
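
One way to try out a modified production is to turn it into a regular
expression.  The Python sketch below does this for the modified [59]
Enumeration only.  It assumes ASCII-only Name and Nmtoken patterns, and
the helper name matches_enumeration is made up for the example; it says
nothing about whether the modified grammar is what the spec intends.

```python
import re

S_OPT = r"[ \t\r\n]*"                      # S?
NAME = r"[A-Za-z_:][A-Za-z0-9._:\-]*"      # ASCII-only Name
NMTOKEN = r"[A-Za-z0-9._:\-]+"             # ASCII-only Nmtoken
PEREF = rf"%{NAME};"                       # PEReference ::= '%' Name ';'
ITEM = rf"(?:{NMTOKEN}|{PEREF})"           # (Nmtoken | PEReference)

# '(' S? ITEM (S? '|' S? ITEM)* S? ')'
ENUMERATION = re.compile(
    rf"\({S_OPT}{ITEM}(?:{S_OPT}\|{S_OPT}{ITEM})*{S_OPT}\)"
)


def matches_enumeration(text):
    """Return True if text matches the modified Enumeration production."""
    return ENUMERATION.fullmatch(text) is not None
```

With this, "(a | %b; | c)" is accepted, while "(a | %b | c)" is rejected
because the reference is missing its closing ';'.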

Where I get really confused is:

[9] EntityValue ::= '"' ([^%&"] | PEReference | Reference)* '"' | "'" ([^%&'] | PEReference | Reference)* "'"

If, as the spec states, these productions describe the declarations AFTER
PE inclusion, how can there still be PEReferences in them?

Part of the reason I am writing this is to get a better grip on (read:
learn) XML.  Any guidance would be much appreciated, thanks!

---
Chris Hubick
mailto:chris at hubick.com
http://www.hubick.com/


