Re Whitespace

Sean Mc Grath digitome at iol.ie
Thu Sep 18 20:40:28 BST 1997


[Sean Mc Grath]
>>
>>Sorry David, I cannot let you get away with that one. I said *fgets()*, which
>>is an entirely different function to gets(). It takes three parameters, one
>>of which is the maximum number of characters to read.
>>It is not Broken As Designed.
>
[David Durand]
>No, but fgets (unlike gets) can deal with long lines --- you have to
>recognize that you overflowed and make accommodations, but you can do the
>right thing. I was giving you the benefit of the doubt, since gets, at
>least, has the problem that you are raising, while fgets does not.
>
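The "recognize that you overflowed and make accommodations" step can be sketched as follows. This is only an illustration in Python, not C: readline(bufsize) stands in for fgets() because it also returns at most a fixed number of characters per call. The function name read_long_lines and the 16-character buffer are assumptions made up for the example.

```python
import io

def read_long_lines(stream, bufsize=16):
    """Yield complete logical lines using only bounded reads.

    A chunk that does not end in a newline means the buffer filled up
    before the line ended (the overflow case), so keep reading and join.
    """
    pieces = []
    while True:
        chunk = stream.readline(bufsize)   # never returns more than bufsize characters
        if chunk == "":                    # end of input
            if pieces:
                yield "".join(pieces)      # final line had no trailing newline
            return
        pieces.append(chunk)
        if chunk.endswith("\n"):           # the line ended inside this chunk
            yield "".join(pieces)
            pieces = []

sample = io.StringIO("short\n" + "x" * 50 + "\nlast line without newline")
lines = list(read_long_lines(sample))
```

No single read exceeds the buffer size, yet the 50-character line comes back whole.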
[Sean Mc Grath]
You mentioned gets(). I didn't. How your insertion of an irrelevant reference
to gets() can be construed as giving me "the benefit of the doubt" I don't know.

[Sean Mc Grath]
>>No weeping and wailing required because it is typically possible to splice
>>line-ends into HTML *without affecting the content*. This is not the case
>>with XML.
>
[David Durand]
>Just try that in tables. You have to know the meaning of the markup, even
>in HTML, if you want to do this. Now you can claim that table markup is
>broken, and you might be right, but HTML does not support your argument.

[Sean Mc Grath]
Why not? Why can't I replace, say, "<TD>" with "<TD>\n" everywhere?
The problem then reduces to long data chunks such as...
pre elements:-

[David Durand]
>
>Similarly for pre elements: You can't do anything to line-ends in there --
>maybe I'm using a 20K line in <pre> to force horizontal scrolling for a
>rhetorical reason.

[Sean Mc Grath]
Absolutely agreed. The <data><line end><data> case is fundamentally different.
These line-ends are truly part of the data and a processor that adds new ones
is blowing the integrity of the data. Thus the plausible argument in favour
of not using line-end as data content.

[David Durand]
>
>>>Can you suggest any solution to the "grep" problem other than requiring a
>>>fixed line-max in XML?
>>
[Sean Mc Grath]
>>Yes. Ignore all line ends. I know this presents its own set of difficult
>>problems but I'd prefer to tackle these - and maintain compatibility with
>>a decade's worth of tools - rather than break the tools.
>

[David Durand]
>But this creates worse problems: 

[Sean Mc Grath]

Worse?

[David Durand]
>lack of <pre>-style elements

[Sean Mc Grath]
Broken As Designed. If something has to give I think <pre> elements should
be first to go.
Alternatively the problem can always be "arcformed" away. We use
     <!ATTLIST e DIGITOME CDATA #FIXED "PREFORM">
all the time. Our pretty printing, word wrapping SGML processing tools use
this to avoid adding extraneous WS that would blow the data content.

[David Durand]
>, inability to write XML filters that preserve linespace just from generic
>XML parsers.

[Sean Mc Grath]
Line ends (at least those) tipping up to start-end tags would *not* be part
of the data. They could thus be added/dropped without affecting the data.
The CGR output of the grove would be the final arbiter on "equivalence" and
the launching pad for offsets used in addressing.

[David Durand]
>No way to use string offsets in linking.

[Sean Mc Grath]
If it ain't got a representation in the grove it ain't in the data and thus
is not counted when totting up offsets.
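A rough sketch of that counting rule, assuming a deliberately naive notion of "tag" (anything between < and >, with no handling of comments, CDATA sections or entities) and "\n" as the only line end. The function name data_content is made up for the example; it stands in for the grove output and is not a real grove or CGR implementation.

```python
import re

TAG = r"<[^>]*>"   # naive tag pattern, an assumption for this sketch

def data_content(markup):
    """Return the character data, ignoring line ends that touch a tag."""
    # Drop any run of line ends directly after a tag, then directly before one.
    text = re.sub(rf"({TAG})\n+", r"\1", markup)
    text = re.sub(rf"\n+({TAG})", r"\1", text)
    # What is left between the tags is the data that offsets are counted in.
    return re.sub(TAG, "", text)

compact = "<p>Hello <b>world</b></p>"
spread = "<p>\nHello <b>\nworld\n</b>\n</p>\n"

same_data = data_content(compact) == data_content(spread)
offset_compact = data_content(compact).index("world")
offset_spread = data_content(spread).index("world")
```

Both serialisations give the same data, so an offset to "world" is the same in each. A line end in the middle of data, as in "<p>one\ntwo</p>", is kept and does count.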

[David Durand]
>
>>> Do you think that that hideous hack to accommodate
>>>defective (if very useful) tools is really worth it?

[Sean Mc Grath]
>>Yes. Line oriented text processing has been a hugely popular paradigm for
>>many years now. I don't think of these tools as "defective" at all. I dare
>>say many wielders of these tools are of the same opinion. These people will
>>be rightly miffed at the suggestion that they are defective by virtue of the
>>use of a line oriented paradigm. They will also be rightly miffed that they
>>cannot bring their tools/skills to bear in the XML world.

[David Durand]
>But they can, they just need to limit their files to correspond to the
>limitations of their tools. People do this all the time, without difficulty.


[Sean Mc Grath]
No difficulty?

Problem: I receive an XML file from a user who works with lines of <1024
characters in his tools. I use <512. How do I munge his file to suit my
tools? I can't without blowing the data. If tag-tipping line ends were
transient I could make a stab at it. I would still have to address the
"<data><line end><data>" case. But hey! I never said this was simple! I just
said that the alternate set of problems this presents has the benefit of not
throwing out our existing line oriented tools and techniques.

[David Durand]
>Of course if the world at large decides to abandon the "line paradigm" then
>those who stick to it will be inconvenienced. But then if "the world" makes
>the shift, then there's still not a very big problem, is there?

[Sean Mc Grath]
That is one-helluva shift IMHO! I am not sure to what extent the world is
   a) aware of this aspect of XML
   b) willing to bite that bullet.
 
[David Durand]
>if XML is
>supposed to require lines no longer than some limit, we need to specify
>that limit in the standard.

[Sean Mc Grath]
No we don't! We need to have a well defined mechanism whereby a tool with
a line length limit of N can work with XML with line length > N without
blowing the integrity of the data.
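One possible shape for such a mechanism, sketched under the assumption that line ends touching a tag are not data and may be added or dropped freely. The helper names drop_tag_line_ends and rewrap_at_tags, the naive tag pattern and the limit of 40 are all made up for the example; input is assumed to have well-formed tags and "\n" line ends.

```python
import re

TAG = r"<[^>]*>"   # naive tag pattern, an assumption for this sketch

def drop_tag_line_ends(markup):
    """Remove runs of line ends that touch a tag (assumed not to be data)."""
    markup = re.sub(rf"({TAG})\n+", r"\1", markup)
    return re.sub(rf"\n+({TAG})", r"\1", markup)

def rewrap_at_tags(markup, limit):
    """Re-break markup for a tool limited to `limit` characters per line.

    Line ends are only ever inserted directly after a tag. Returns the new
    text plus any lines still over the limit: runs of data with no tag
    boundary to break at, i.e. the <data><line end><data> case.
    """
    flat = drop_tag_line_ends(markup)
    # A segment is optional data followed by one tag; trailing data is its own segment.
    segments = re.findall(rf"[^<]*{TAG}|[^<]+", flat)
    out = ""
    for seg in segments:
        current = len(out) - (out.rfind("\n") + 1)  # length of the unfinished last line
        first = seg.split("\n", 1)[0]               # part of seg that would join that line
        if current and current + len(first) > limit:
            out += "\n"                             # `out` ends with a tag at this point
        out += seg
    too_long = [line for line in out.split("\n") if len(line) > limit]
    return out, too_long

row = "<TR>" + "".join(f"<TD>cell {i}</TD>" for i in range(8)) + "</TR>"
wrapped, too_long = rewrap_at_tags(row, limit=40)
```

Here every line of wrapped fits in 40 characters, and drop_tag_line_ends(wrapped) gives back the original row. A long run of data inside a <pre> element would instead show up in too_long, since there is nowhere to break it without touching the data.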

[David Durand]
>Otherwise all we can say is that any XML
>processor is free to reject any document if the lines are "too long for
>that tool". That's an even worse prescription for interoperability.
>
See above.

[David Durand]
>If there are limits, a standard has to tell you how to be safe and not
>break any of those limits. At least, a good standard should.
>

[Sean Mc Grath]
The standard does not have to establish a limit. It could help users
of "legacy" tools to *cope* with limits though. "Buy/build better tools" is one
line that can be taken but it is not the only one.




Sean Mc Grath
sean at digitome.com
www.digitome.com



xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)



