PEReference
Chris Hubick
maillist at chris.hubick.com
Mon May 4 22:07:55 BST 1998
I am writing an XML analysis tool on top of which I am building a
parser. At the bottom level it reads in XML and generates
start/end/character events based on the productions in the XML spec. For
example, when the parser encounters something matching the Name
production, say the Name "foo", it would generate events:
<Name>
  <Letter>
    <BaseChar>f</BaseChar>
  </Letter>
  <NameChar>
    <Letter>
      <BaseChar>o</BaseChar>
    </Letter>
  </NameChar>
  <NameChar>
    <Letter>
      <BaseChar>o</BaseChar>
    </Letter>
  </NameChar>
</Name>
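As a rough illustration of that event stream, here is a minimal Python sketch. It is not the actual tool: the function names (char_events, name_events, render) are hypothetical, and it assumes only the ASCII portion of the spec's BaseChar and Digit ranges, leaving out Ideographic, CombiningChar and Extender entirely.

```python
import string

# Assumption: only the ASCII part of the spec's BaseChar and Digit ranges.
BASE_CHARS = set(string.ascii_letters)
DIGITS = set(string.digits)
PUNCT = set("._-:")


def char_events(ch):
    """Yield the events for a single character inside a Name."""
    if ch in BASE_CHARS:
        yield ("start", "Letter")
        yield ("start", "BaseChar")
        yield ("chars", ch)
        yield ("end", "BaseChar")
        yield ("end", "Letter")
    elif ch in DIGITS:
        yield ("start", "Digit")
        yield ("chars", ch)
        yield ("end", "Digit")
    elif ch in PUNCT:
        # Literal characters in the production carry no wrapper of their own.
        yield ("chars", ch)
    else:
        raise ValueError(f"not a NameChar in this sketch: {ch!r}")


def name_events(name):
    """Yield events for: Name ::= (Letter | '_' | ':') (NameChar)*"""
    if not name or not (name[0] in BASE_CHARS or name[0] in "_:"):
        raise ValueError(f"not a Name: {name!r}")
    yield ("start", "Name")
    yield from char_events(name[0])
    for ch in name[1:]:
        yield ("start", "NameChar")
        yield from char_events(ch)
        yield ("end", "NameChar")
    yield ("end", "Name")


def render(events):
    """Serialize (kind, value) events as tags and character data."""
    parts = []
    for kind, value in events:
        if kind == "start":
            parts.append(f"<{value}>")
        elif kind == "end":
            parts.append(f"</{value}>")
        else:
            parts.append(value)
    return "".join(parts)


print(render(name_events("foo")))
```

Keeping the events as plain (kind, value) tuples means a parser layered on top can consume them without caring how they were serialized.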
When it is complete I hope to build an actual parser on top of it.
The analyzer can currently read almost any document; "all" it lacks,
and what I am working on, is support for production [29] markupdecl and
all of its dependencies. This is where I hit a snag.
In the XML spec at:
http://www.w3.org/TR/REC-xml#NT-markupdecl
It states:
> The markup declarations may be made up in whole or
> in part of the replacement text of parameter entities.
> The productions later in this specification for
> individual nonterminals (elementdecl, AttlistDecl,
> and so on) describe the declarations after all the
> parameter entities have been included.
I want the productions for an XML document BEFORE the parameter entities
have been included. I really think the XML spec should have given
productions for the document before PEReference inclusion as well as after.
I want to do PEReference inclusion at the parser level, not at my lower
"analyzer" level, which should generate events that directly reflect
what is in the document (before inclusion).
So for my purposes, I need to figure out the grammar for an _unprocessed_
XML document.
My first idea was to take the current grammar and add PEReferences
wherever I thought they were necessary:
[45] elementdecl ::= '<!ELEMENT' S (Name | PEReference) S contentspec S? '>'
[48] cp ::= (Name | PEReference | choice | seq) ('?' | '*' | '+')?
[51] Mixed ::= '(' S? '#PCDATA' (S? '|' S? (Name | PEReference))* S? ')*' | '(' S? '#PCDATA' S? ')'
[52] AttlistDecl ::= '<!ATTLIST' S (Name | PEReference) AttDef* S? '>'
[53] AttDef ::= S (Name | PEReference) S AttType S DefaultDecl
[54] AttType ::= StringType | TokenizedType | EnumeratedType | PEReference
[58] NotationType ::= 'NOTATION' S '(' S? (Name | PEReference) (S? '|' S? (Name | PEReference))* S? ')'
[59] Enumeration ::= '(' S? (Nmtoken | PEReference) (S? '|' S? (Nmtoken | PEReference))* S? ')'
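To make the (Name | PEReference) alternative in these modified productions concrete, here is a minimal Python sketch of a scanner that reports which of the two it found and leaves the reference unexpanded. The function name scan_name_or_peref is hypothetical, and the Name pattern is an ASCII-only simplification of the spec's Name production; PEReference follows the spec's form '%' Name ';'.

```python
import re

# Assumption: ASCII-only simplification of the spec's Name production.
NAME = r"[A-Za-z_:][A-Za-z0-9._:-]*"
NAME_RE = re.compile(NAME)
# PEReference ::= '%' Name ';'
PEREF_RE = re.compile("%(" + NAME + ");")


def scan_name_or_peref(text, pos=0):
    """Match (Name | PEReference) at pos without expanding anything.

    Returns (kind, name, end), where end is the offset just past the match.
    """
    m = PEREF_RE.match(text, pos)
    if m:
        return ("PEReference", m.group(1), m.end())
    m = NAME_RE.match(text, pos)
    if m:
        return ("Name", m.group(0), m.end())
    raise ValueError(f"expected Name or PEReference at offset {pos}")


# Hypothetical declaration fragments, as they would appear after '<!ELEMENT' S.
print(scan_name_or_peref("doc (#PCDATA)>"))
print(scan_name_or_peref("%name.para; %content.para;>"))
```

Because the scanner only classifies the token, deciding what a PEReference stands for can be left to the parser level.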
Where I get really confused is:
[9] EntityValue ::= '"' ([^%&"] | PEReference | Reference)* '"' | "'" ([^%&'] | PEReference | Reference)* "'"
If, as the spec states, these are the declarations AFTER PE inclusion, how
can there be PEReferences???
Part of the reason I am writing this is to get a better grip on (read:
learn) XML. Any guidance would be much appreciated, thanks!
---
Chris Hubick
mailto:chris at hubick.com
http://www.hubick.com/