SAX-J and the DPH (DJH?)
Chris Maden
crism at ora.com
Wed Dec 31 19:11:29 GMT 1997
[Sean McGrath]
> So this works if:
>
> 1) No more than 1 telephone number per line [Chris]
True for my trivial solution. Perl can handle multiple matches per
line; I'm just not very sophisticated yet.
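A minimal sketch of the multi-match case, in Python rather than the
Perl of the original script. The element name <tel> is an assumption
made for illustration; the idea is only that a "find all" match
returns every hit on a line instead of stopping at the first.

```python
import re

# Hypothetical element name; substitute whatever the real DTD uses.
TEL = re.compile(r"<tel>([^<]*)</tel>")

def numbers_in_line(line):
    """Return every telephone number on one line, not just the first."""
    return TEL.findall(line)

line = "<p>Call <tel>555-0100</tel> or <tel>555-0199</tel>.</p>"
print(numbers_in_line(line))
```

In Perl the equivalent would be a match with the /g modifier in list
context.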
> 2) No cdata marked sections [Chris]
Can be handled by looking for CDATA marked section starts and ends,
using code similar to the appendix tracking, and adding && !$incdata
to all element-matching conditionals.
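One way this could look, sketched in Python under some assumptions:
the element is a hypothetical <tel>, input arrives line by line, and
telephone elements do not themselves contain marked sections. The
in_cdata flag plays the role of $incdata and survives across lines.

```python
import re

TEL = re.compile(r"<tel>([^<]*)</tel>")  # hypothetical element name
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"

def numbers_outside_cdata(lines):
    """Collect <tel> content, skipping text inside CDATA marked sections."""
    found = []
    in_cdata = False  # carried from one line to the next
    for line in lines:
        visible = ""  # the part of this line that is real markup
        while line:
            if in_cdata:
                end = line.find(CDATA_CLOSE)
                if end < 0:
                    line = ""  # the rest of the line is still CDATA
                else:
                    line = line[end + len(CDATA_CLOSE):]
                    in_cdata = False
            else:
                start = line.find(CDATA_OPEN)
                if start < 0:
                    visible += line
                    line = ""
                else:
                    visible += line[:start]
                    line = line[start + len(CDATA_OPEN):]
                    in_cdata = True
        found.extend(TEL.findall(visible))
    return found
```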
> 3) The attribute value literal for client does not have any entity
> references [Sean - suggested]
> 4) The target telephone number does not contain entity references
> [Sean - suggested ]
The two real problems in this list.
> 5) appendix elements do not nest [Sean - suggested]
Not a problem - keep a reference counter instead of my trivial boolean
approach. (Appendices rarely nest, but this is applicable to other
kinds of elements.)
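A sketch of the counter idea in Python. Two things here are
assumptions for illustration: the <tel> element name, and the rule
that numbers inside an appendix are the ones to skip. The depth
counter replaces the boolean, so leaving an inner appendix does not
switch matching back on while an outer one is still open.

```python
import re

# Appendix start tag, appendix end tag, or a whole <tel> element
# (hypothetical name), taken in document order.
TOKEN = re.compile(r"<appendix\b[^>]*>|</appendix\s*>|<tel>([^<]*)</tel>")

def numbers_outside_appendix(text):
    """Return <tel> content that is not inside any appendix element."""
    depth = 0  # number of appendix elements currently open
    found = []
    for match in TOKEN.finditer(text):
        token = match.group(0)
        if token.startswith("</appendix"):
            depth -= 1
        elif token.startswith("<appendix"):
            if not token.endswith("/>"):  # an empty element opens nothing
                depth += 1
        elif depth == 0:
            found.append(match.group(1))
    return found
```

With a plain boolean, the number between the inner and outer end tags
would be reported by mistake.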
> 6) Telephone numbers do not nest (problem if regexp matching is
> greedy) [Sean - suggested]
The regexp is greedy, but I can use a pattern that will only match
single elements.
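To illustrate the difference, here is a small Python comparison, again
assuming a hypothetical <tel> element. The greedy pattern runs from the
first start tag to the last end tag; the pattern that excludes "<"
from the content can only match one element with no markup inside it.

```python
import re

GREEDY = re.compile(r"<tel>(.*)</tel>")     # swallows everything between
SINGLE = re.compile(r"<tel>([^<]*)</tel>")  # content may not contain "<"

flat = "<tel>555-0100</tel> and <tel>555-0199</tel>"
nested = "<tel>a<tel>b</tel>c</tel>"

print(GREEDY.findall(flat))
print(SINGLE.findall(flat))
print(SINGLE.findall(nested))
```

On the nested input the single-element pattern reports only the
innermost element, which may or may not be what is wanted, but it never
spans two elements.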
> Others? I think a little list of "gotchas" like this would find the
> way onto many a DPH's wall (including mine!).
There are only two real problems here, the ones with entity
references. These are, on their face, beyond the scope of a DPH. I
would either (a) do a quick grep to see if I need to worry about it,
or (b) run my script on the output of spam (from James Clark's SP) or
a similar normalizer.
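The quick check in (a) could be as simple as the following Python
sketch, which reports the lines that contain anything shaped like an
entity or character reference. The pattern is an approximation of the
XML name syntax, not a full implementation, and it will also flag
references that sit inside CDATA sections or comments.

```python
import re

# &name;  &#123;  &#x1F;  (approximate; real XML names allow more characters)
ENTITY_REF = re.compile(r"&(#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z_:][\w.:-]*);")

def lines_with_entity_refs(lines):
    """Return the 1-based numbers of lines containing a reference."""
    hits = []
    for number, line in enumerate(lines, 1):
        if ENTITY_REF.search(line):
            hits.append(number)
    return hits
```

An empty result means the simple script is safe on that input; anything
else is a cue to normalize first.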
I don't think anyone has claimed that Perl can address everything; as
David (I think) said, there is a large fuzzy gray line between
problems in the Perl domain and problems in the full XML processor
domain. (The assertion can be proven by the fact that a Perl script
can solve arbitrary XML processing problems, but will, in the course
of doing so, eventually implement a full XML processor.)
-Chris
--
<!NOTATION SGML.Geek PUBLIC "-//Anonymous//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//O'Reilly//NONSGML Christopher R. Maden//EN"
"<URL>http://www.oreilly.com/people/staff/crism/ <TEL>+1.617.499.7487
<USMAIL>90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>