XML processing experiments

Tim Bray tbray at textuality.com
Fri Nov 7 17:40:10 GMT 1997

At 06:11 PM 07/11/97 +0100, Jarle Stabell wrote:
>Ok. My current design will first return PCData="x", then entity ref="foo",
>and (if the client wants entities expanded) PCData="a" followed by
>EmptyElement="b" and then PCData="c".
>I.e., it may return two consecutive PCData's, with perhaps some
>EntityExpansionStart and -End signals between them.
>(Is this design flawed?)

If "foo" is an *internal* entity, the spec clearly requires your
parser to expand it for the application.  But letting the app know
that the ref was encountered is also fine.
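
To make the event sequence under discussion concrete, here is a minimal
sketch in Python.  It is not Jarle's parser and not Lark's API: the
function name, the event names and the sample input "x&foo;y" (with foo
declared as "a<b/>c") are all assumptions made for illustration, and the
tokenizer understands only character data, internal entity references and
empty-element tags.

```python
import re

# Hypothetical tokenizer: matches either an entity reference (&name;)
# or an empty-element tag (<name/>).  Everything else is character data.
TOKEN = re.compile(r"&([A-Za-z_][\w.-]*);|<([A-Za-z_][\w.-]*)\s*/>")


def events(text, entities, expand=True):
    """Yield (kind, value) events for a fragment of element content.

    `entities` maps internal entity names to their replacement text.
    An undeclared name raises KeyError; recursive entities (which XML
    forbids) are not detected by this sketch.
    """
    pos = 0
    for m in TOKEN.finditer(text):
        if m.start() > pos:
            yield ("PCData", text[pos:m.start()])
        pos = m.end()
        if m.group(2) is not None:
            yield ("EmptyElement", m.group(2))
            continue
        name = m.group(1)
        # Always report that the reference was seen ...
        yield ("EntityRef", name)
        if expand:
            # ... and, if asked, bracket the expanded content.
            yield ("EntityExpansionStart", name)
            yield from events(entities[name], entities, expand)
            yield ("EntityExpansionEnd", name)
    if pos < len(text):
        yield ("PCData", text[pos:])


if __name__ == "__main__":
    for event in events("x&foo;y", {"foo": "a<b/>c"}):
        print(event)
```

With expansion on, the PCData "c" from inside foo and the PCData "y" that
follows the reference arrive as two separate events, separated only by the
end-of-expansion signal.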

However, the spec says nothing that would require you to merge the
text from a variety of entities.  For example, Lark's event-stream
API will generate a series of Text object events in just this
situation.  Once you've seen the end of the element, though,
Lark has an API just to get all the text.  This is strictly a
design choice; as Richard points out, if you want to support a
"grep" application, you'd probably like to have entity replacements
merged for you.  On the other hand, if you're building a full-text
index, you probably need to have the separate chunks made visible
so that you know what to point at from the index.
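
The two needs can be served from the same event stream.  The sketch below
is in Python and is only an illustration: the (kind, value) event shape,
the event names and both helper functions are hypothetical and do not
reproduce Lark's actual API.

```python
# Hypothetical event stream for an element whose content is "x&foo;y",
# where the internal entity foo has the replacement text "a<b/>c".
SAMPLE = [
    ("Text", "x"),
    ("EntityStart", "foo"),
    ("Text", "a"),
    ("EmptyElement", "b"),
    ("Text", "c"),
    ("EntityEnd", "foo"),
    ("Text", "y"),
]


def merged_text(events):
    """Concatenate all character data, ignoring entity boundaries.

    This is the view a grep-style application would probably want.
    """
    return "".join(value for kind, value in events if kind == "Text")


def text_chunks(events):
    """Return (entity, text) pairs, one per Text event.

    `entity` is the name of the innermost entity being expanded, or
    None for text that sits directly in the document entity.  A
    full-text indexer could use it to decide what an index entry
    should point at.
    """
    stack = [None]
    chunks = []
    for kind, value in events:
        if kind == "EntityStart":
            stack.append(value)
        elif kind == "EntityEnd":
            stack.pop()
        elif kind == "Text":
            chunks.append((stack[-1], value))
    return chunks


if __name__ == "__main__":
    print(merged_text(SAMPLE))
    print(text_chunks(SAMPLE))
```

Merging is cheap to layer on top of separate chunks, while recovering the
chunk boundaries from merged text is not possible, which is one argument
for emitting the chunks and offering the merged view as a convenience.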

As James has pointed out more than once, there is no universal 
document API that meets everybody's application needs.  One of
the nice things about XML is that if you can't find a parser that
has the API you need, you can go build your own without excessive
pain. -Tim

