Re Whitespace

David G. Durand dgd at
Fri Aug 29 17:06:30 BST 1997

At 3:36 PM -0500 8/28/97, Sean Mc Grath wrote:
>>> Bye bye the entire Unix family of line oriented text processing apps:-(
>>Come on, this is a crock.
>[Discussion about a *single* tool - Perl - from the genus "Unix family of
>line oriented text processing apps" elided]

Perl is of course the tool whose usage was made a part of the design goals
of XML. It's also the most common language of web-hackers, by far.

>Since when is Perl == Unix family of line oriented text processing apps?

>The world is littered with s/w tools that have line length
>limits. These tools are *blown* by WS-less XML.

The mainframe world was littered with tools that couldn't edit anything
other than 80 character fixed length records -- but that eventually changed.

I think a little less passion is in order here: there's _no requirement_
that XML tools not use whitespace, nor is there a requirement that they
_do_ use whitespace. People will do what is convenient for them, and for
the people whose convenience they care about.

This is as it always is. I suspect that line breaks will in fact be common
in XML files for some time to come. What worries me is that most tools are
not as smart as the editor I use on my Mac, which can edit and save files in
their native line-ending convention without even worrying about it. And it
is unfortunately true that stupid processors (like emailers and non-XML
editors) _are_ going to "convert" files. This won't mess up PCDATA chunk
counts, but it will destroy character offsets (a risky linking mechanism
anyway). It is also likely to cause problems for verbatim-style formatting
in carelessly written stylesheets. I don't see any way (other than painful
experience) that solutions to this will be found, because the solutions are
either reformed behavior (don't convert line-end strings) or smarter
processing software (be prepared to accept CR, LF, or CRLF at any time).
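The two remedies named above can be illustrated with a small Python sketch. Python and the helper name normalize_line_ends are illustrative choices, not anything taken from the XML drafts. The sketch shows that converting LF to CRLF, as a mailer might, leaves the element structure intact but shifts raw character offsets. It also shows that software which accepts CR, LF, or CRLF can undo the damage by mapping all three to LF.

```python
import xml.etree.ElementTree as ET

def normalize_line_ends(text: str) -> str:
    """Map CRLF and lone CR to LF, so input from any platform looks the same."""
    # Replace CRLF first so each CRLF pair becomes one LF rather than two.
    return text.replace("\r\n", "\n").replace("\r", "\n")

# A small document with Unix (LF) line ends.
doc_lf = "<doc><p>one\ntwo</p>\n<q>target</q></doc>"
# What a "helpful" mailer or non-XML editor might do to it in transit.
doc_crlf = doc_lf.replace("\n", "\r\n")

# The element structure survives the conversion...
tags_lf = [e.tag for e in ET.fromstring(doc_lf).iter()]
tags_crlf = [e.tag for e in ET.fromstring(doc_crlf).iter()]

# ...but a raw character offset into the file does not.
offset_lf = doc_lf.index("target")
offset_crlf = doc_crlf.index("target")

print(tags_lf == tags_crlf)                     # True
print(offset_lf, offset_crlf)                   # 23 25
print(normalize_line_ends(doc_crlf) == doc_lf)  # True
```

A link stored as "character 23" would point two characters too early after conversion. Normalizing on input is the "smarter processing software" option: it costs one pass over the text, and the software never has to care which convention the file arrived in.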

This is a problem that XML has not created, but simply tries not to make
worse, by at least picking a simple rule that can be understood.

>Throw out that grep, that text editor, that fgets(), that diff, sort, uniq.
>They're all busted for XML use.

gets() is, of course, Broken As Designed, and a notorious cause of security
bugs in Unix systems.

Again, they are broken for XML use with files created a particular way.
They are also broken for HTML files created the same way, and I don't hear
the weeping and wailing.
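The failure mode can be reproduced without writing any C. The Python sketch below imitates a line-oriented tool that reads through a fixed-size buffer, the way repeated fgets() calls do. It models no particular grep. The buffer size of 21 and the helper names fixed_buffer_reads and naive_grep are arbitrary choices for illustration. On an XML file written as one long line, a match that straddles a buffer boundary is simply missed. The same content with line breaks between elements is found.

```python
def fixed_buffer_reads(text: str, bufsize: int) -> list[str]:
    """Split text the way repeated fgets(buf, bufsize, fp) calls would.

    Each call returns at most bufsize - 1 characters and stops early
    after a newline, so an over-long line arrives in several pieces.
    Assumes bufsize >= 2.
    """
    pieces = []
    i = 0
    while i < len(text):
        limit = i + bufsize - 1
        nl = text.find("\n", i, limit)
        end = nl + 1 if nl != -1 else min(limit, len(text))
        pieces.append(text[i:end])
        i = end
    return pieces

def naive_grep(pattern: str, pieces: list[str]) -> list[str]:
    """Report each buffer-load that contains the pattern, as a line tool would."""
    return [p for p in pieces if pattern in p]

# The same content, written without and with line breaks.
doc = "<a><b>alpha</b><c>needle</c></a>"
pretty = "<a>\n<b>alpha</b>\n<c>needle</c>\n</a>\n"

print(naive_grep("needle", fixed_buffer_reads(doc, 21)))     # []
print(naive_grep("needle", fixed_buffer_reads(pretty, 21)))  # ['<c>needle</c>\n']
```

The tool is equally broken on an HTML file written as one long line. Nothing in the sketch depends on XML.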

>"Crock". I'll add that to my collection of spicy ripostes I have
>accumulated over the course of this thread. :-)

I meant it as a description, in a similar (but, IMHO, slightly less frantic)

>Time to end.

Can you suggest any solution to the "grep" problem other than requiring a
fixed line-max in XML? Do you think that such a hideous hack to accommodate
defective (if very useful) tools is really worth it? Can you suggest how we
would determine that buffer size? (Test grep and awk on our five favorite
Unices? What about wc, and Minix?) There are too many arbitrary lines that
would have to be drawn in the sand to "solve" that problem. What about
card-format editors like XEDIT, where editing lines of more than 72
characters is inconvenient (and editing lines of more than about 1800
characters is unbelievably inconvenient)? There's still a lot of IBM iron
out there. Or should we only worry about _your_ favorite tools being able to
handle any XML document?

Certainly authors can work within the limits of their chosen tools with
XML. I don't see that we can realistically provide them with more.
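Authors who do need to stay within a particular tool's limit can at least check whether a file fits. The following is a minimal Python sketch. It assumes the limit is a plain character count per line, excluding the line terminator. The function name long_lines is hypothetical. The 72-column figure comes from the card-format example above, and any other tool's limit would have to be found by testing that tool.

```python
def long_lines(text: str, limit: int) -> list[tuple[int, int]]:
    """Return (line_number, length) for every line longer than limit.

    Line ends are normalized first, so CR, LF and CRLF files are
    measured the same way; the terminator itself is not counted.
    """
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    return [(n, len(line)) for n, line in enumerate(lines, start=1)
            if len(line) > limit]

memo = "<memo>\n<to>Sean</to>\n<body>" + "x" * 100 + "</body>\n</memo>\n"

print(long_lines(memo, 72))   # [(3, 113)]
print(long_lines(memo, 200))  # []
```

The check only reports a problem. Deciding where a break can safely go still depends on the document, since line breaks inside element content are not always insignificant.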

>If nothing else, David's five paragraphs have been born from this.
>I suggest they should be mandatory reading for anyone approaching
>XML development.

Edits for clarity would be appreciated, and if they pass muster by other
experts, maybe they should move to a section of the XML-FAQ for developers.
If there isn't such a section, maybe we should start one!

>It is clear that I see a problem that others don't.
>Thus the odds are I am wrong.
>I hope so.

Actually, I agree with you that there are problems (there are legal XML
documents that won't work with grep). There are plausible and common file
operations, like changing line-end marking conventions, that _may well_
cause problems with some documents and stylesheets. I just don't see any
solutions to these problems other than to let them work themselves out in
the many different environments where they must be worked out. Any general
solution would be so complex in its ramifications and details that it would
simply become another problem for some reasonable application of XML.

  -- David

David Durand              dgd at  \  david at
Boston University Computer Science        \  Sr. Analyst   \  Dynamic Diagrams
MAPA: mapping for the WWW                    \__________________________

xml-dev: A list for W3C XML Developers