Sean Mc Grath
digitome at iol.ie
Thu Aug 28 19:52:35 BST 1997
[Sean Mc Grath]
>> >>Won't our desperate Perl hackers' beloved $_ variable be significantly less
>> >>useful if it contains the *entire* document?
>> >Just as it's not useful in processing HTML. Regexps that don't match across
>> >line boundaries are the most common problem I've seen in HTML-processing
>> >Perl scripts. Looks like that will continue until people figure out that
>> >Perl's line "Feature" is just a bug when used with XML/HTML.
[Sean Mc Grath]
>> Bang goes the notion of a lightweight XML app. then! Thou shalt always
>Nonsense! Regexps that fail across line boundaries are only due to
I think you miss my point. I am concerned about what happens to
line oriented tools when there are *no* line breaks, not when there are.
awk, head, tail, xargs etc. are all line oriented tools.
I am concerned about their utility with WS-less XML. (Approximately zero?).
Perl allows you to set the Record End pattern to whatever you like.
You certainly cannot with grep, tail etc. to my knowledge.
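To make the concern concrete, here is a minimal sketch in Python (not Perl, and not any of the tools named above) of what a line-oriented reader sees when a document has no line breaks, and what a configurable record separator buys you. The sample document and the helper name split_records are made up for illustration.

```python
import io

# Hypothetical sample: an XML document written with no line breaks at all.
doc = "<doc><p>one</p><p>two</p><p>three</p></doc>"

# A line-oriented reader sees the whole document as a single record.
lines = io.StringIO(doc).readlines()

def split_records(text, sep):
    """Split text into records that each end with sep (a custom record separator)."""
    parts = text.split(sep)
    records = [p + sep for p in parts[:-1]]
    if parts[-1]:
        records.append(parts[-1])  # trailing text that has no separator after it
    return records

# With "</p>" as the record end, the same document yields several records,
# and joining them gives back the original text unchanged.
records = split_records(doc, "</p>")
```

Nothing is added to or removed from the text here; the split only changes where one record ends and the next begins, which is what a tool with a fixed newline separator cannot do.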
>> XML as a friendly format to, say, DPH needs some explaining.
>>To use Perl to read/write XML
>> you *must* use an XML parser. Indeed any tool intending
>>to read/write XML needs to use a
>> *fully blown parser* to get at the document. Bye bye the
>> entire Unix family of line oriented text processing apps:-(
>Maybe you just need to put a filter at the beginning of your pipeline
>to normalize whitespace to whatever you need.
I think you have missed my point again. I said *"read/write"* XML.
A filter at the start to normalize the WS *blows* my ability to losslessly
write the result. If I munge WS I have munged the doc. A doc with WS leads
to more complex cross translations than, say, Monastic SGML, because of the
space that intermingled significant WS brings with it.
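A small sketch of why a normalizing filter is lossy, again in Python and under stated assumptions: the function normalize_ws and the two sample strings are hypothetical, and collapsing runs of whitespace to one space is only one possible normalization.

```python
import re

def normalize_ws(text):
    """Collapse every run of whitespace into a single space (a typical pre-filter)."""
    return re.sub(r"\s+", " ", text)

# Two hypothetical documents that differ only in their whitespace.
a = "<p>one  two</p>\n<p>three</p>"
b = "<p>one two</p> <p>three</p>"

# Both normalize to the same string, so the original cannot be recovered
# from the filtered form: the filter is not invertible.
same = normalize_ws(a) == normalize_ws(b)
```

Once two different inputs map to one output, no later stage in the pipeline can write back the document it was given.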
I would like to thank David Durand for the suggestion posted to this group
that a few paragraphs on WS be put together for developers. I think this
would be really, really useful and would have the inestimably beneficial
effect of shutting me up:-)
I am acutely aware that this thread is annoying to some (many?!) and I
would like to take this opportunity to bow out and await the WS explanatory
xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)