Whitespace

Sean Mc Grath digitome at iol.ie
Thu Aug 28 19:52:35 BST 1997


[Sean Mc Grath]
>> >>Won't our desperate Perl hackers' beloved $_ variable be significantly less
>> >>useful if it contains the *entire* document?

[David Durand]
>> >Just as it's not useful in processing HTML. Regexps that don't match across
>> >line boundaries are the most common problem I've seen in HTML-processing
>> >Perl scripts. Looks like that will continue until people figure out that
>> >Perl's line "Feature" is just a bug when used with XML/HTML.
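
To make the line-boundary problem concrete, here is a minimal sketch in Python rather than Perl. The sample markup and the pattern are invented for illustration and are not taken from any script discussed on the list. It runs the same regexp once per line and once over the whole document, where the start tag happens to wrap onto a second line.

```python
import re

# Hypothetical sample: a start tag whose attribute wraps onto a second line.
doc = '<p>See <a\n   href="http://example.org/">this page</a>.</p>\n'

# Pattern for an <a> start tag that carries an href attribute.
anchor = re.compile(r'<a\s+href="([^"]*)"')

# Line-at-a-time processing: neither line contains the whole tag.
per_line = [m.group(1)
            for line in doc.splitlines()
            for m in anchor.finditer(line)]

# Whole-document processing: \s also matches the newline, so the tag is found.
whole = [m.group(1) for m in anchor.finditer(doc)]

print(per_line)  # []
print(whole)     # ['http://example.org/']
```

The pattern itself is unchanged between the two runs; only the unit of text it is applied to differs.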

[Sean Mc Grath]
>> Bang goes the notion of a lightweight XML app. then! Thou shalt always
>> parse!
>> 

[Matthew Fuchs]
>Nonsense! Regexps that fail across line boundaries are only due to
>lazy DPHs.

I think you miss my point. I am concerned about what happens to
line-oriented tools when there are *no* line breaks, not when there are.
Brief, grep, awk, head, tail, xargs etc. are all line-oriented tools.

I am concerned about their utility with WS-less XML. (Approximately zero?).
From memory, Perl allows you to set the Record End pattern to whatever you like.
You certainly cannot with grep, tail etc. to my knowledge.
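
As a rough illustration of the record-separator idea, here is a minimal Python sketch. The `records` helper is hypothetical and written only for this note. It mimics the effect of setting Perl's input record separator `$/` (which, as far as I know, takes a plain string rather than a regexp) to ">". It assumes ">" never occurs inside attribute values, comments or CDATA sections, which real XML does not guarantee.

```python
import io

def records(stream, sep, chunk_size=4096):
    """Yield records from a text stream, split on `sep`.

    The separator is kept at the end of each record, so joining the
    records gives back the input unchanged.
    """
    buf = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        while True:
            idx = buf.find(sep)
            if idx < 0:
                break
            yield buf[:idx + len(sep)]
            buf = buf[idx + len(sep):]
    if buf:
        # Trailing text with no closing separator.
        yield buf

# Hypothetical document with no line breaks at all.
xml = "<doc><title>Whitespace</title><p>No line breaks here.</p></doc>"

# A line-oriented reader sees the whole document as one "line".
as_lines = list(io.StringIO(xml))

# A reader that treats ">" as the end of a record sees six records.
as_records = list(records(io.StringIO(xml), ">"))

print(len(as_lines))    # 1
print(len(as_records))  # 6
```

Because each record keeps its separator, the split is reversible, which matters for the read/write case discussed further down.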

>> XML as a friendly format to, say, DPH needs some explaining. To use Perl
>> to read/write XML you *must* use an XML parser. Indeed any tool intending
>> to read/write XML needs to use a *fully blown parser* to get at the
>> document. Bye bye the entire Unix family of line oriented text processing
>> apps:-(
>> 

[Matthew Fuchs]
>Maybe you just need to put a filter at the beginning of your pipeline
>to normalize whitespace to whatever you need.

I think you have missed my point again. I said *"read/write"* XML
applications. Putting a filter at the start to normalize the WS *blows* my
ability to losslessly write the result. If I munge WS I have munged the doc.
A doc with WS leads to more complex cross translations than, say, Monastic
SGML, because of the escalating state space that intermingled significant WS
brings with it.
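
The lossless-write concern can be shown with a small Python sketch. The `normalize_ws` filter and the two sample documents are hypothetical, chosen only to make the point. Two documents that differ in their whitespace come out of the filter identical, so nothing downstream of the filter can write either original back.

```python
import re

def normalize_ws(text):
    """Collapse every run of whitespace to one space (a hypothetical filter)."""
    return re.sub(r"\s+", " ", text).strip()

# Two documents that differ only in their whitespace.
doc_a = "<poem><line>Roses  are red</line>\n<line>Violets are blue</line></poem>"
doc_b = "<poem><line>Roses are red</line> <line>Violets are\tblue</line></poem>"

differ_before = doc_a != doc_b
same_after_filter = normalize_ws(doc_a) == normalize_ws(doc_b)

print(differ_before)      # True
print(same_after_filter)  # True
```

Since the filter maps distinct inputs to the same output, it has no inverse. A read/write tool would have to carry the original whitespace alongside the normalized text to reproduce the document exactly.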

FINALLY,

I would like to thank David Durand for the suggestion posted to this group
that a few paragraphs on WS be put together for developers. I think this
would be really, really useful and would have the inestimably beneficial
side-effect of shutting me up:-)

I am acutely aware that this thread is annoying to some (many?!) and I
would like to take this opportunity to bow out and await the WS explanatory
note...



xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)



