Re: Whitespace

David G. Durand dgd at
Thu Sep 18 18:09:41 BST 1997

>Sorry for the lateness of this reply. It got a bit lost in my out-box for a
>[Sean Mc Grath]
>>>Throw out that grep, that text editor, that fgets(), that diff, sort, uniq
>>>utility. They're all busted for XML use.
>[David Durand]
>>gets is of course Broken As Designed, as the cause of most security bugs in
>>Unix systems.
>Sorry David, I cannot let you get away with that one. I said *fgets()* which
>is an entirely different function to gets(). It takes
>three parameters, one of which is the maximum number of characters to read.
>It is not Broken As Designed.

No, but fgets (unlike gets) can deal with long lines --- you have to
recognize that you overflowed and make accommodations, but you can do the
right thing. I was giving you the benefit of the doubt, since gets, at
least, has the problem that you are raising, while fgets does not.
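
A minimal sketch of "recognizing that you overflowed" with fgets(), assuming
a growing heap buffer is acceptable. The name read_long_line is hypothetical,
not a standard function; a chunk that comes back without a trailing newline
is treated as a partial line and reading continues.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one line of any length from fp using fgets() and a growing buffer.
 * Returns a malloc'd string without the trailing newline, or NULL at end of
 * file or on allocation failure. The caller frees the result.
 * (read_long_line is a hypothetical helper name.) */
char *read_long_line(FILE *fp)
{
    size_t cap = 16, len = 0;
    char *buf;

    assert(fp != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';      /* complete line: strip the newline */
            return buf;
        }
        /* No newline yet: the buffer filled up (overflow) or this is a
         * final line with no terminator. Grow the buffer and read on. */
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    if (len > 0)
        return buf;                   /* last line, no trailing newline */
    free(buf);
    return NULL;
}
```

Calling read_long_line(fp) in a loop until it returns NULL visits every line,
however long, which is the accommodation gets() cannot make.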

>>Again, they are broken for XML use with files created a particular way.
>>They are also broken for HTML files created the same way, and I don't hear
>>the weeping and wailing.
>No weeping and wailing required because it is typically possible to splice in
>line-ends into HTML *without affecting the content*. This is not the case
>with XML.

Just try that in tables. You have to know the meaning of the markup, even
in HTML, if you want to do this. Now you can claim that table markup is
broken, and you might be right, but HTML does not support your argument.

Similarly for pre elements: You can't do anything to line ends in there --
maybe I'm using a 20K line in <pre> to force horizontal scrolling for a
rhetorical reason.

>>Can you suggest any solution to the "grep" problem other than requiring a
>>fixed line-max in XML.
>Yes. Ignore all line ends. I know this presents its own set of difficult
>but I'd prefer to tackle these - and maintain compatibility with decades
>of tools - rather than break the tools.

But this creates worse problems: lack of <pre>-style elements, inability to
write XML filters that preserve linespace just from generic XML parsers.
No way to use string offsets in linking.

>> Do you think that that hideous hack to accommodate
>>defective (if very useful) tools is really worth it.
>Yes. Line oriented text processing has been a hugely popular paradigm for
>many years now. I don't think of these tools as "defective" at all. I dare
>say many wielders of these tools are of the same opinion. These people will
>be rightly miffed at the suggestion that they are defective by virtue of the
>use of a line oriented paradigm. They will also be rightly miffed that they
>cannot bring their tools/skills to bear in the XML world.

But they can, they just need to limit their files to correspond to the
limitations of their tools. People do this all the time, without difficulty.
Of course if the world at large decides to abandon the "line paradigm" then
those who stick to it will be inconvenienced. But then if "the world" makes
the shift, then there's still not a very big problem, is there?

Even in that case, with some (usually minimal) human intervention, such
line-end conversion/insertion is trivial in practice.

I'm sorry, I still don't see how this is _worse_ than what we have with text
files today. And compared to HTML and SGML, I think XML's rules are more
consistent, and useful for more things.

I deal with the Mac (where line == paragraph), as well as Unix, all the
time. This problem is not usually of more than 10 seconds concern on the
few times in a month that it comes to mind. On occasion, of course, I find
myself spending 1-10 minutes in an editor fixing things (usually by
invoking a "wrap" command of some sort).
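
As a rough illustration of what such a "wrap" command does, here is a greedy
sketch that turns selected spaces into line ends. The name wrap_in_place is
hypothetical, and the sketch assumes the text is plain prose in which a space
may safely become a newline; that is exactly the judgment that needs a human
for <pre>-style content.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Greedy wrap, in place: replace selected spaces with '\n' so that lines
 * stay within `width` columns wherever a space is available to break at.
 * Words longer than `width` are left intact. Returns the number of line
 * ends inserted. (wrap_in_place is a hypothetical helper name.) */
size_t wrap_in_place(char *text, size_t width)
{
    size_t col = 0, inserted = 0;
    char *last_space = NULL;

    assert(text != NULL);
    for (char *p = text; *p != '\0'; p++) {
        if (*p == '\n') {             /* existing line end: start over */
            col = 0;
            last_space = NULL;
            continue;
        }
        if (*p == ' ')
            last_space = p;
        col++;
        if (col > width && last_space != NULL) {
            *last_space = '\n';       /* break at the most recent space */
            col = (size_t)(p - last_space);
            last_space = NULL;
            inserted++;
        }
    }
    return inserted;
}
```

For example, wrapping "aaa bbb ccc" at width 7 yields "aaa bbb" and "ccc" on
separate lines, with one line end inserted.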

>>Can you suggest how we
>>would determine that buffer size?
>Question is Broken As Designed. No need for a silly fixed limit. Just a
>of the existence *of* limits and a standardised mechanism for dealing with

I can't imagine what such a mechanism is: IBM text editors for decades had
an 80-character limit. Some still work best with 72 column files. If XML is
supposed to require lines no longer than some limit, we need to specify
that limit in the standard. Otherwise all we can say is that any XML
processor is free to reject any document if the lines are "too long for
that tool". That's an even worse prescription for interoperability.

If there are limits, a standard has to tell you how to be safe and not
break any of those limits. At least, a good standard should.
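
To make the point concrete: once a limit is actually stated, checking a file
against it is mechanical. The sketch below assumes '\n' line ends and counts
bytes rather than characters; longest_line and fits_line_limit are
hypothetical names, and the limit value (72, 80, or anything else) is
precisely what a standard would have to supply.

```c
#include <assert.h>
#include <stdio.h>

/* Return the length in bytes of the longest line in fp, not counting the
 * '\n' terminator. Assumes '\n' line ends; a '\r' counts as an ordinary
 * byte. (longest_line is a hypothetical helper name.) */
size_t longest_line(FILE *fp)
{
    size_t longest = 0, current = 0;
    int ch;

    assert(fp != NULL);
    while ((ch = getc(fp)) != EOF) {
        if (ch == '\n') {
            if (current > longest)
                longest = current;
            current = 0;
        } else {
            current++;
        }
    }
    if (current > longest)            /* final line without a newline */
        longest = current;
    return longest;
}

/* Nonzero if every line in fp is at most `limit` bytes long. */
int fits_line_limit(FILE *fp, size_t limit)
{
    return longest_line(fp) <= limit;
}
```

Without an agreed value for limit, each tool picks its own, and the same
document passes one check and fails another.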

 -- David

David Durand              dgd at  \  david at
Boston University Computer Science        \  Sr. Analyst   \  Dynamic Diagrams
MAPA: mapping for the WWW                    \__________________________

xml-dev: A list for W3C XML Developers
Archived as:
To unsubscribe, send to majordomo at the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at
