SAX-J and the DPH (DJH?)

Chris Maden crism at
Wed Dec 31 19:11:29 GMT 1997

[Sean McGrath]
> So this works if:
>  1) No more than 1 telephone number per line [Chris]

True only of my trivial solution.  Perl can handle multiple matches per
line; I'm just not very sophisticated yet.
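
A minimal sketch of matching more than once per line, written in Python
rather than Perl (in Perl the same effect comes from the /g modifier).
The <phone> element name and the helper name phones_in_line are
assumptions for illustration; the thread does not fix them here.

```python
import re

# Hypothetical element name, assumed for illustration only.
PHONE = re.compile(r"<phone>([^<]*)</phone>")

def phones_in_line(line):
    """Return every phone number found on the line, not just the first."""
    return PHONE.findall(line)

line = "<phone>555-1234</phone> or <phone>555-9876</phone>"
print(phones_in_line(line))
```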

>  2) No cdata marked sections [Chris]

This can be handled by looking for CDATA marked section starts and
ends, using code similar to the appendix handling, and adding
&& !$incdata to all element-matching conditionals.
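
The flag approach might look roughly like this in Python (in_cdata
plays the role of $incdata).  This is a sketch, not a full solution: it
assumes a hypothetical <phone> element, and it assumes that any line
carrying a CDATA start or end marker has no markup of interest outside
the marked section.

```python
import re

# Hypothetical element name, assumed for illustration only.
PHONE = re.compile(r"<phone>([^<]*)</phone>")

def phones_outside_cdata(lines):
    """Collect phone numbers, skipping lines inside CDATA marked sections."""
    in_cdata = False
    found = []
    for line in lines:
        # Entering a marked section: stop matching from this line on.
        if not in_cdata and "<![CDATA[" in line:
            in_cdata = True
        # The extra condition on every element-matching test.
        if not in_cdata:
            found.extend(PHONE.findall(line))
        # Leaving a marked section: resume matching on the next line.
        if in_cdata and "]]>" in line:
            in_cdata = False
    return found

sample = [
    "<phone>111</phone>",
    "<![CDATA[",
    "<phone>222</phone>",
    "]]>",
    "<phone>333</phone>",
]
print(phones_outside_cdata(sample))
```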

>  3) The attribute value literal for client does not have any entity
> references [Sean - suggested]
>  4) The target telephone number does not contain entity references
> [Sean - suggested ]

These are the two real problems in this list.

>  5) appendix elements do not nest [Sean - suggested]

Not a problem - keep a reference counter instead of my trivial boolean
approach.  (Appendices rarely nest, but this is applicable to other
kinds of elements.)
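
A sketch of the counter idea in Python: the depth goes up on each
appendix start-tag and down on each end-tag, so a line is "inside" as
long as the depth is above zero.  The helper name inside_appendix is
made up for illustration, and the sketch assumes no empty-element
appendix tags and no tags split across lines.

```python
import re

# Start-tag with optional attributes; does not match the end-tag.
OPEN = re.compile(r"<appendix(?:\s[^>]*)?>")
CLOSE = re.compile(r"</appendix\s*>")

def inside_appendix(lines):
    """Return one boolean per line: True while inside any appendix."""
    depth = 0
    flags = []
    for line in lines:
        depth += len(OPEN.findall(line))   # count start-tags first
        flags.append(depth > 0)
        depth -= len(CLOSE.findall(line))  # end-tags take effect afterwards
    return flags

nested = [
    "<appendix>",
    "<appendix>",
    "inner text",
    "</appendix>",
    "still inside the outer appendix",
    "</appendix>",
    "outside",
]
print(inside_appendix(nested))
```

A plain boolean would have flipped to "outside" at the first end-tag;
the counter keeps the fifth line inside.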

>  6) Telephone numbers do not nest (problem if regexp matching is
> greedy) [Sean - suggested]

The regexp is greedy, but I can use a pattern that will only match
single elements.
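
To illustrate the difference in Python (whose regexps behave like
Perl's here): a greedy .* runs from the first start-tag to the last
end-tag, while a negated character class such as [^<]* cannot cross
another tag and so matches one element at a time.  The <phone> element
name is again an assumption.

```python
import re

line = "<phone>555-1234</phone> and <phone>555-9876</phone>"

# Greedy: .* swallows everything up to the LAST </phone> on the line.
greedy = re.findall(r"<phone>(.*)</phone>", line)

# Single-element: [^<]* stops at the next "<", so each element matches alone.
single = re.findall(r"<phone>([^<]*)</phone>", line)

# With nested elements, the single-element pattern sees only the innermost.
nested = "<phone>outer <phone>inner</phone></phone>"
innermost = re.findall(r"<phone>([^<]*)</phone>", nested)

print(greedy)
print(single)
print(innermost)
```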

> Others? I think a little list of "gotchas" like this would find the
> way onto many a DPH's wall (including mine!).

There are only two real problems here, the ones with entity
references.  These are, on their face, beyond the scope of a DPH.  I
would either (a) do a quick grep to see if I need to worry about it,
or (b) run my script on the output of spam or a similar normalizer.
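
Option (a), the quick check, could be sketched in Python as below.  The
pattern is a rough approximation of general entity and character
references, not the full XML name grammar, and the helper name
find_entity_refs is made up for illustration.

```python
import re

# Rough approximation: &name; or &#123; or &#x1F;
ENTITY_REF = re.compile(r"&(?:#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z_:][A-Za-z0-9_.:-]*);")

def find_entity_refs(lines):
    """Return (line number, reference) pairs for every reference found."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for ref in ENTITY_REF.findall(line):
            hits.append((number, ref))
    return hits

doc = [
    '<client name="A&amp;B">',
    "<phone>555-1234</phone>",
    "<phone>&areacode;555</phone>",
]
print(find_entity_refs(doc))
```

An empty result suggests the regexp-based script is safe for that
file; any hits point to the lines where a normalizer is needed first.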

I don't think anyone has claimed that Perl can address everything; as
David (I think) said, there is a large fuzzy gray line between
problems in the Perl domain and problems in the full XML processor
domain.  (The assertion can be proven by the fact that a Perl script
can solve arbitrary XML processing problems, but will, in the course
of doing so, eventually implement a full XML processor.)

<!ENTITY crism PUBLIC "-//O'Reilly//NONSGML Christopher R. Maden//EN"
"<URL> <TEL>+1.617.499.7487
<USMAIL>90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>
