Expat's XML_SetCharacterDataHandler

Nik O niko at cmsplatform.com
Tue Jun 29 02:40:15 BST 1999

Kalyan wrote:

> I have set a character data handler to trap the char data between the
> start and end tags, but the callback is called for every subsequent call to
> the handler. I want to handle only the occurrence of genuine char data
> between the start and end tags.

Are you calling Expat for each "text record" individually?  If so, you will
get a "char data" callback for the end-of-record (EOR) character that follows
each start or end tag.  Expat reports it as '\n' regardless of the actual EOR
character, because the XML spec requires line ends to be normalized.  By
specification, only the characters between the "<" and ">" are part of the
start or end tag -- all other characters are char data.  Even if your parse
buffer contains multiple elements and their associated char data, you will
still see the EOR characters in the char data callback (normalized as
described above).
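
To see this behaviour in isolation, here is a minimal sketch. It assumes
Python's standard xml.parsers.expat binding rather than the C API, so
parser.CharacterDataHandler stands in for XML_SetCharacterDataHandler. The
function name collect_char_data and the sample document are made up for
illustration.

```python
import xml.parsers.expat


def collect_char_data(document):
    """Return every chunk Expat passes to the character data handler."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # Python counterpart of XML_SetCharacterDataHandler in the C API.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)
    return chunks


# Hypothetical input: each "record" ends in CR LF.
doc = b"<doc>\r\n<item>text</item>\r\n</doc>"
chunks = collect_char_data(doc)
joined = "".join(chunks)

# The line ends after <doc> and </item> arrive as char data, normalized to "\n".
print(repr(joined))
```

The handler sees the line ends that follow the tags as well as the "genuine"
text, and no '\r' survives normalization.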

I'm using Expat to parse pseudo-XML data derived from our proprietary markup
format, so I've had to use a combination of "(char_data == '\n')" and
"(buffer_length == 1)", plus a flag in the parser "userdata", to
differentiate between the EOR characters at the end of element tags (which
I'm discarding) and the EOR characters that represent blank lines in the
content (which I have to keep).
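
One way the flag approach could look is sketched below. This is an
interpretation, not the actual code: it again assumes Python's
xml.parsers.expat binding, and the class RecordAwareCollector, its after_tag
attribute (standing in for the flag kept in "userdata"), and the sample input
are all hypothetical.

```python
import xml.parsers.expat


class RecordAwareCollector:
    """Drops the single newline that follows a tag, keeps all other char data."""

    def __init__(self):
        self.after_tag = False  # plays the role of the flag in Expat's userdata
        self.kept = []

    def start_element(self, name, attrs):
        self.after_tag = True

    def end_element(self, name):
        self.after_tag = True

    def char_data(self, data):
        # Same test as (buffer_length == 1) and (char_data == '\n') in C.
        is_lone_newline = len(data) == 1 and data == "\n"
        discard = is_lone_newline and self.after_tag
        self.after_tag = False  # any char data ends the "just saw a tag" state
        if not discard:
            self.kept.append(data)


def parse_keeping_blank_lines(document):
    collector = RecordAwareCollector()
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = collector.start_element
    parser.EndElementHandler = collector.end_element
    parser.CharacterDataHandler = collector.char_data
    parser.Parse(document, True)
    return "".join(collector.kept)


# Hypothetical input: one record per line, with a blank line inside <p>.
sample = b"<doc>\n<p>\nfirst\n\nsecond\n</p>\n</doc>"
result = parse_keeping_blank_lines(sample)
print(repr(result))
```

Only a lone '\n' that arrives immediately after a start or end tag is
dropped; the newlines inside the content, including the blank line between
"first" and "second", are kept.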

-Nik O, Content Mgmt Solutions, Jackson, Wyo.

