Whitespace

Matthew Fuchs matt at wdi.disney.com
Thu Aug 28 18:19:48 BST 1997


> 
> 
> >>Won't our desperate Perl hackers' beloved $_ variable be significantly less
> >>useful if it contains the *entire* document?
> >
> >Just as it's not useful in processing HTML. Regexps that don't match across
> >line boundaries are the most common problem I've seen in HTML-processing
> >Perl scripts. Looks like that will continue until people figure out that
> >Perl's line "Feature" is just a bug when used with XML/HTML.
> >
> 
> Bang goes the notion of a lightweight XML app. then! Thou shalt always
> parse!
> 
Nonsense! Regexps that fail across line boundaries are only due to
lazy DPHs.  The "s" modifier on a regex lets "." match newlines, so
the entire string (i.e., the document) can be treated as a single
target.  The real problem is that _insignificant_ whitespace (a
newline) is treated as significant.  A regex modifier that treated
newlines, tabs, etc., as spaces would really help reduce this
problem. (Larry Wall doesn't follow this mailing list, does he?)
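
To make the point concrete, here is a rough sketch in Python rather
than Perl; re.DOTALL is Python's counterpart of Perl's "s" modifier.
The sample markup and the loosen() helper are invented for
illustration, and loosen() only approximates the wished-for modifier
by rewriting the pattern; it is not an existing regex feature.

```python
import re

# Hypothetical sample: a start tag and its content split across lines.
doc = '<p>see <a\n   href="http://example.org/">the\nspec</a></p>'

# Without DOTALL, "." stops at a newline, so the multi-line link text
# cannot be matched and the search finds nothing.
plain = re.search(r'<a\s[^>]*>(.*)</a>', doc)

# With DOTALL, "." also matches newlines, so the whole string is
# treated as a single target.
dotall = re.search(r'<a\s[^>]*>(.*?)</a>', doc, re.DOTALL)


def loosen(pattern):
    # Hypothetical helper: turn each literal space in the pattern into
    # "any run of whitespace", so newlines and tabs count as spaces.
    return pattern.replace(' ', r'\s+')


# The pattern is written with a single space, yet it matches a tag
# whose attribute sits on the next line.
loose = re.search(loosen(r'<a href="([^"]*)">'), doc)
```

Here plain finds no match, dotall captures the link text including
its newline, and loose captures the href value even though the tag is
broken across two lines.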


> XML as a friendly format to, say, DPH needs some explaining. To use Perl to
> read/write XML you *must* use an XML parser. Indeed any tool intending to
> read/write XML needs to use a *fully blown parser* to get at the document.
> Bye bye the entire Unix family of line oriented text processing apps:-(
> 

Maybe you just need to put a filter at the beginning of your pipeline
to normalize whitespace to whatever you need.
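
A minimal sketch of such a filter, again in Python. It assumes that
every run of whitespace in the input is insignificant, which will not
hold for all documents; the function names and the sample input are
made up for illustration.

```python
import io
import re


def normalize_whitespace(text):
    # Collapse every run of spaces, tabs and newlines into one space
    # and drop leading and trailing whitespace.
    return re.sub(r'\s+', ' ', text).strip()


def filter_stream(src, dst):
    # Read the whole input, normalize it, and write it out as one line.
    dst.write(normalize_whitespace(src.read()) + '\n')


# Demonstration with in-memory streams standing in for a pipe.
out = io.StringIO()
filter_stream(io.StringIO('<item\n\tid="1">\n  first\n</item>\n'), out)
```

In a real pipeline you would call filter_stream(sys.stdin, sys.stdout)
and place the script ahead of the line-oriented tools.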

Matthew

-----------------------------------------------------
Matthew Fuchs
matt at wdi.disney.com
http://cs.nyu.edu/phd_students/fuchs
-----------------------------------------------------




