SAX-J and the DPH (DJH?)

Chris Maden crism at ora.com
Wed Dec 31 19:11:29 GMT 1997


[Sean McGrath]
> So this works if:
> 
>  1) No more than 1 telephone number per line [Chris]

That limitation applies only to my trivial solution.  Perl can handle
multiple matches per line; I'm just not very sophisticated yet.
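
For illustration, here is a minimal sketch of the difference. It is written in Python rather than Perl, and it assumes a hypothetical <tel> element that holds nothing but the number. A single search stops at the first match on a line, while a global match collects every one. In Perl the equivalent is a //g match in list context, or a while loop over //g.

```python
import re

# Hypothetical markup: each <tel> element holds only the number, no child markup.
TEL = re.compile(r"<tel>([^<]*)</tel>")

line = "Call <tel>555-1234</tel> or <tel>555-9999</tel>."

first_only = TEL.search(line).group(1)   # trivial approach: first match per line
every_match = TEL.findall(line)          # every match on the line

print(first_only)    # 555-1234
print(every_match)   # ['555-1234', '555-9999']
```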

>  2) No cdata marked sections [Chris]

This can be handled by looking for CDATA marked section starts
(<![CDATA[) and ends (]]>), using code similar to the appendix-tracking
code, and adding && !$incdata to all element-matching conditionals.
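
The appendix-tracking code from earlier in the thread isn't reproduced here, so what follows is only a hypothetical Python sketch of the same idea. A flag (in_cdata, standing in for $incdata) is set at <![CDATA[ and cleared at ]]>, and elements are matched only while it is false. It scans positions within each line, so a marked section may start and end on the same line or span several lines. The <tel> element name and the function name are assumptions for illustration.

```python
import re

TEL = re.compile(r"<tel>([^<]*)</tel>")   # hypothetical element name
CDATA_START, CDATA_END = "<![CDATA[", "]]>"

def tels_outside_cdata(lines):
    """Collect <tel> contents, skipping anything inside CDATA marked sections."""
    in_cdata = False          # plays the role of $incdata
    found = []
    for line in lines:
        pos = 0
        while pos < len(line):
            if in_cdata:
                end = line.find(CDATA_END, pos)
                if end == -1:
                    pos = len(line)            # rest of the line is CDATA
                else:
                    in_cdata = False
                    pos = end + len(CDATA_END)
            else:
                start = line.find(CDATA_START, pos)
                chunk = line[pos:] if start == -1 else line[pos:start]
                found.extend(TEL.findall(chunk))   # match only outside CDATA
                if start == -1:
                    pos = len(line)
                else:
                    in_cdata = True
                    pos = start + len(CDATA_START)
    return found

doc = [
    "<tel>555-1234</tel> <![CDATA[<tel>not-real</tel>",
    "still quoted <tel>also-not-real</tel> ]]> <tel>555-9999</tel>",
]
print(tels_outside_cdata(doc))   # ['555-1234', '555-9999']
```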

>  3) The attribute value literal for client does not have any entity
> references [Sean - suggested]
>  4) The target telephone number does not contain entity references
> [Sean - suggested ]

These are the two real problems on this list.

>  5) appendix elements do not nest [Sean - suggested]

Not a problem: keep a depth counter (increment it on each start-tag,
decrement it on each end-tag) instead of my trivial boolean approach.
(Appendices rarely nest, but the same technique applies to other kinds
of elements.)
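
A minimal sketch of the counter approach in Python, assuming hypothetical <appendix> and <tel> elements and that each <tel> sits on one line. With a boolean, the first </appendix> of a nested pair would switch matching off while still inside the outer appendix; a counter only drops to zero when the outermost appendix closes.

```python
import re

# Hypothetical markup: <appendix> may nest; <tel> holds a number on one line.
TOKEN = re.compile(r"<appendix\b[^>]*>|</appendix\s*>|<tel>([^<]*)</tel>")

def tels_in_appendices(text):
    depth = 0            # nesting counter replaces a True/False flag
    found = []
    for m in TOKEN.finditer(text):
        tag = m.group(0)
        if tag.startswith("</appendix"):
            depth = max(depth - 1, 0)
        elif tag.startswith("<appendix"):
            if not tag.endswith("/>"):   # <appendix/> opens and closes at once
                depth += 1
        elif depth > 0:                  # a <tel> inside some appendix
            found.append(m.group(1))
    return found

text = ("<tel>1</tel><appendix><appendix><tel>2</tel></appendix>"
        "<tel>3</tel></appendix><tel>4</tel>")
print(tels_in_appendices(text))   # ['2', '3']
```

A boolean version would report only '2' here, because the inner </appendix> would clear the flag before <tel>3</tel> is seen.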

>  6) Telephone numbers do not nest (problem if regexp matching is
> greedy) [Sean - suggested]

The regexp is greedy, but I can use a pattern that will only match
single elements.
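
To make the greediness concrete, a small Python sketch using the same hypothetical <tel> markup: .* runs to the last </tel> on the line, while either a non-greedy .*? or a character class that excludes < stops at the first one. The [^<]* form is stricter still: it matches only an element with no markup inside, so in a nested case it picks out just the innermost element. Perl 5 regexps support the same *? quantifier.

```python
import re

line = "<tel>555-1234</tel> or <tel>555-9999</tel>"

greedy = re.findall(r"<tel>(.*)</tel>", line)        # swallows both elements
lazy = re.findall(r"<tel>(.*?)</tel>", line)         # stops at first </tel>
no_markup = re.findall(r"<tel>([^<]*)</tel>", line)  # content may not contain <

nested = "<tel>a<tel>b</tel></tel>"
lazy_nested = re.findall(r"<tel>(.*?)</tel>", nested)
strict_nested = re.findall(r"<tel>([^<]*)</tel>", nested)

print(greedy)         # ['555-1234</tel> or <tel>555-9999']
print(lazy)           # ['555-1234', '555-9999']
print(no_markup)      # ['555-1234', '555-9999']
print(lazy_nested)    # ['a<tel>b']
print(strict_nested)  # ['b']
```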

> Others? I think a little list of "gotchas" like this would find the
> way onto many a DPH's wall (including mine!).

There are only two real problems here, the ones with entity
references.  These are, on their face, beyond the scope of a DPH.  I
would either (a) do a quick grep to see whether I need to worry about
them, or (b) run my script on the output of spam or a similar normalizer.
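
Option (a) might look like the following Python sketch, a rough stand-in for grep that reports every line containing something shaped like an entity or character reference. The name pattern is an ASCII-only approximation of XML's Name production, and the sample lines (including the client attribute's element) are made up. If it finds nothing, the entity-reference gotchas probably don't arise for that input; note that even predefined references such as &amp; would need decoding before a literal string comparison.

```python
import re

# &name;  &#123;  &#x1F;  (ASCII-only approximation of XML names)
ENTITY_REF = re.compile(r"&(?:#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z_:][-A-Za-z0-9._:]*);")

def lines_with_entity_refs(lines):
    """Return (line_number, reference) pairs, numbering lines from 1."""
    hits = []
    for n, line in enumerate(lines, start=1):
        for m in ENTITY_REF.finditer(line):
            hits.append((n, m.group(0)))
    return hits

sample = [                                  # hypothetical input
    '<contact client="AT&amp;T">',
    "<tel>555-1234</tel>",
    "<tel>&areacode;-555-9999</tel>",
]
print(lines_with_entity_refs(sample))  # [(1, '&amp;'), (3, '&areacode;')]
```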

I don't think anyone has claimed that Perl can address everything; as
David (I think) said, there is a large, fuzzy gray line between
problems in the Perl domain and problems in the full XML processor
domain.  (The assertion is easy to prove: a Perl script can solve
arbitrary XML processing problems, but in the course of doing so it
will eventually implement a full XML processor.)

-Chris
-- 
<!NOTATION SGML.Geek PUBLIC "-//Anonymous//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//O'Reilly//NONSGML Christopher R. Maden//EN"
"<URL>http://www.oreilly.com/people/staff/crism/ <TEL>+1.617.499.7487
<USMAIL>90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)