SAX2: LexicalHandler

David Megginson david at megginson.com
Wed Dec 22 16:24:48 GMT 1999


David Brownell writes:

 > For DOM Level 2 support, the literal text of the internal subset
 > needs to be provided.

You're kidding!  That's disgusting -- I'm strongly tempted just to
leave the DOM people dangling on that one.  After all, the proposed
SAX2 interfaces provide enough information to construct an equivalent
internal subset.

 > 
 > >     public void startEntity (String name) throws IOException;
 > >     public void endEntity (String name) throws IOException;
 > 
 > A bunch of restrictions to this were identified as being essential,
 > such as the fact that entities expanded within other constructs
 > mustn't be exposed.  For example:
 > 
 > 	<!ATTLIST foo %std-attrs; %i18n-attrs; %gooey-attrs;>
 > 
 > 	<element foo="&entity1;" bar="&entity2;" />

Agreed.

 > I'm hoping the full spec for those callbacks makes clear that
 > in such situations the entities MUST NOT be reported.  (And
 > would strongly prefer that parameter entities never show up
 > in any context whatsoever.)

To tell the truth, I don't think that many people really need any of
this stuff, so it's hard for me to distinguish one type of noise from
another.  If I were dictator, the only things I'd put in SAX2 would be
property/feature queries and Namespace support.

 > The reason was briefly that applications can't see inside the
 > structure of those constructs -- they'll just see some start/end
 > entity calls, FOLLOWED (oops!) by the callback of which they're
 > a part.  Just like they would if the entities preceded that
 > construct.

Agreed -- entity boundaries inside attribute values are forever lost.
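
To make that concrete, here is a minimal sketch. It assumes the interfaces as they later shipped in org.xml.sax.ext, where startEntity takes a single name and throws SAXException; the drafts quoted above throw IOException. The class name, entity names, and sample document are made up for illustration. It records every startEntity call so you can check which entity references the parser actually reports.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class AttributeEntityDemo {
    // Hypothetical sample: one entity used in an attribute value, one in content.
    static final String SAMPLE =
        "<!DOCTYPE doc [\n"
      + "  <!ENTITY inContent \"text\">\n"
      + "  <!ENTITY inAttr \"value\">\n"
      + "]>\n"
      + "<doc foo=\"&inAttr;\">&inContent;</doc>";

    // Parses the document and returns the entity names passed to startEntity.
    static List<String> reportedEntities(String xml) throws Exception {
        final List<String> names = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startEntity(String name) {
                names.add(name);
            }
        };
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // The lexical handler is registered through a property, not a setter.
        reader.setProperty(
            "http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(reportedEntities(SAMPLE));
    }
}
```

A conforming parser should report the reference in element content and stay silent about the one inside foo="...", so the application never sees where the attribute value's entity began or ended.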

 > > I wonder if a little
 > > redundancy would make sense:
 > > 
 > >     public void startEntity (String name, String publicId,
 > >                              String systemId) throws IOException;
 > >     public void endEntity (String name) throws IOException;
 > > 
 > > That way, if the parser supports the LexicalHandler but not the
 > > DeclHandler, the public and system identifiers for entities will still
 > > be available.
 > 
 > That wouldn't handle internal entities, though.

For internal entities, both publicId and systemId would be null, and
the value would be the text reported between the startEntity callback
and the corresponding endEntity callback.
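
A rough sketch of how an application might use that, under stated assumptions: ProposedLexicalHandler is the three-argument variant floated in this thread, not a shipped SAX interface, and EntityValueCollector, its characters(String) method, and the entity names are hypothetical stand-ins. The events are driven by hand so the example does not depend on any parser.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical: the variant proposed in this thread, not a real SAX interface.
interface ProposedLexicalHandler {
    void startEntity(String name, String publicId, String systemId);
    void endEntity(String name);
}

class EntityValueCollector implements ProposedLexicalHandler {
    // One frame per entity that has started but not yet ended.
    private static final class Frame {
        final boolean internal;
        final StringBuilder text = new StringBuilder();
        Frame(boolean internal) { this.internal = internal; }
    }

    private final Deque<Frame> open = new ArrayDeque<>();
    final Map<String, String> internalValues = new LinkedHashMap<>();

    @Override
    public void startEntity(String name, String publicId, String systemId) {
        // Both identifiers null is taken to mean "internal entity".
        open.push(new Frame(publicId == null && systemId == null));
    }

    // Stand-in for the character-data callback of a content handler.
    public void characters(String chunk) {
        // Text belongs to every entity currently open, including outer ones.
        for (Frame frame : open) {
            frame.text.append(chunk);
        }
    }

    @Override
    public void endEntity(String name) {
        Frame frame = open.pop();
        if (frame.internal) {
            internalValues.put(name, frame.text.toString());
        }
    }

    // Simulates the callbacks for one internal and one external entity.
    static Map<String, String> demo() {
        EntityValueCollector collector = new EntityValueCollector();
        collector.characters("before ");
        collector.startEntity("copy", null, null);
        collector.characters("(c) 1999");
        collector.endEntity("copy");
        collector.startEntity("chapter1", null, "chap1.xml");
        collector.characters("external text");
        collector.endEntity("chapter1");
        return collector.internalValues;
    }
}
```

Only the internal entity ends up in the map; the external one is recognised by its non-null system identifier and skipped.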

 > I have fundamental issues with the notion of exposing the entity
 > structure of documents beyond that needed to recreate the DOCTYPE
 > declaration (DTD).  Not just in SAX; DOM does it pretty poorly too
 > (children of entity refs must be readonly, making them impossible
 > to manipulate in typical ways).

Yes, I know -- that's why I want (at least) to make all of this mess
optional.  XML is simple at heart, but not when they start letting API 
writers loose on it.

 > So I'd really rather not see that particular thing done ... if
 > any substantial change is to be made to entity reporting, my vote
 > is to just drop it entirely.  It's too messy a notion (IMHO) to
 > show up in any API offering higher level notions than lexical
 > tokens. (angle bracket, name, space, name token, space, equals,
 > double quote, text, entity ref, text, double quote, angle bracket,
 > text ... you get the idea.)

I'd like to leave it out as well.  Personally, I think that the XML
community would be better served if purely lexical items like
Namespace prefixes, the DOCTYPE declaration, comments, element type
declarations, entity boundaries, etc. were simply inaccessible through
any standard API -- that way, the APIs would be easier to learn and
the obfuscators of the world would be less likely to abuse them.

I am tired, however, of all the e-mails from DOM implementors who
want comments (for example) in SAX so that they can bloat their DOM
trees with them.  They're wrong, of course, but I'm too tired to fight
any more.


All the best,


David

-- 
David Megginson                 david at megginson.com
           http://www.megginson.com/
