Entities and Expat (was Re: Confused about & in entity literal)

Nik O niko at cmsplatform.com
Tue May 11 21:38:49 BST 1999

Recently, in "Re: Confused about & in entity literal", John Cowan wrote (re
parsing after the "&" character):

> All you actually have to do is to ensure that the next character
> (if not #, see above) is a NAMESTRT character, and that all characters
> until ; are either NAME or NAMESTRT characters.  There is no need (and
> in fact it is forbidden) to look up the supposed entity name anywhere.
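
A rough sketch of the purely syntactic check described above might look
like the following.  It is only an illustration: the function names are
hypothetical, the character classes are ASCII-only approximations of the
XML NAMESTRT and NAME classes, and character references ("&#...;") are
deliberately left out.

```c
#include <stddef.h>

/* ASCII-only approximation of the XML name-start class (NAMESTRT). */
static int is_name_start(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           c == '_' || c == ':';
}

/* ASCII-only approximation of the XML name class (NAME). */
static int is_name_char(unsigned char c)
{
    return is_name_start(c) || (c >= '0' && c <= '9') ||
           c == '.' || c == '-';
}

/*
 * Hypothetical helper: if s (a NUL-terminated string) begins with a
 * syntactically valid general entity reference ("&name;"), return its
 * length including the '&' and the ';'.  Return 0 otherwise.
 * No entity table is consulted; only the shape of the name is checked.
 * Character references ("&#...;") are not handled and yield 0.
 */
size_t entity_ref_length(const char *s)
{
    size_t i;

    if (s[0] != '&' || !is_name_start((unsigned char)s[1]))
        return 0;
    i = 2;
    while (is_name_char((unsigned char)s[i]))
        i++;
    return s[i] == ';' ? i + 1 : 0;
}
```

With this sketch, entity_ref_length("&copy; 1999") returns 6, while
"&copy" (no terminating ";") and "&1abc;" (bad first character) both
return 0.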

I'm trying to parse and index documents that contain several HTML-style
general entities ("&copy;", "&bull;", etc.), using Expat ("Version
19990307") as the parser.  I want to be able to trap these entity strings
(they're not to be indexed, but they are to be included, untranslated, in
the document output).  But Expat exits with an XML_ERROR_UNDEFINED_ENTITY
error ("..undefined entity at line..").  Inspection of the Expat source
shows the following:
======= Begin code excerpt =======
 entity = (ENTITY *)lookup(&dtd.generalEntities, name, 0);
 if (!entity) {
   if (dtd.complete || dtd.standalone)
     return XML_ERROR_UNDEFINED_ENTITY;
   if (defaultHandler)
     reportDefault(parser, enc, s, next);
======= End code excerpt =======
Thus, it would appear that Expat is indeed trying to look up the entity
name.  Is this wrong, as John Cowan said?

I've set up both XML_SetDefaultHandler() and
XML_SetExternalEntityRefHandler(), but neither of these handlers is
being called by Expat (little wonder, given the above code excerpt).  Do
y'all know what I might be doing wrong here?

-Nik O, Content Mgmt Solutions, Jackson, Wyo.

