Whitespace

Sean Mc Grath digitome at iol.ie
Tue Aug 26 18:45:21 BST 1997


[David Durand]
>
>Editing tools that change whitespace are not preserving the XML data stream
>that would be returned by a parser on the document. A tool that works like
>this is simply buggy, since it reads in data that would return one data
>stream to applications, and produces output that would produce a different
>stream.
>
>On the current definition, even tools that normalize CRLF to LF are
>potentially damaging the document. This last is the only point that worries
>me much.

It worries me too! Here is a concrete example of a CRLF bug that I hit
today.

I have just used an offline browser called Snake to download a web site
authored in MS FrontPage. Some of the links have been correctly munged to
local links and some have not. Inspecting the HTML showed that the
correctly munged links looked like this:-

<AREA ... HREF="http://www.a.com/foo.htm">

whilst un-munged links looked like this:-

<AREA ...
HREF = "http://www.a.com/foo.htm">

It is easy to see what has happened here. The s/w developers have
a pattern for matching AREA elements that does not allow for a CRLF
inside the start tag.
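
To make the failure concrete, here is a minimal Python sketch of the two
kinds of pattern. I have not seen Snake's source, so STRICT is only a guess
at the sort of pattern that would behave this way, and the SHAPE attribute
stands in for the "..." in the examples above. TOLERANT uses \s, which
matches space, tab, CR and LF alike.

```python
import re

# Hypothetical strict pattern (a guess, not Snake's actual code): it expects
# the whole tag on one line, one space before HREF, and no spaces around "=".
STRICT = re.compile(r'<AREA\b[^>\r\n]* HREF="([^"]*)"', re.IGNORECASE)

# Tolerant pattern: \s matches space, tab, CR and LF, so a line break
# inside the tag or spaces around "=" do not defeat the match.
TOLERANT = re.compile(r'<AREA\b[^>]*?\sHREF\s*=\s*"([^"]*)"', re.IGNORECASE)


def find_hrefs(pattern, html):
    """Return the HREF values of every AREA tag the pattern recognises."""
    return pattern.findall(html)


# SHAPE="rect" is an illustrative stand-in for the elided attributes.
one_line = '<AREA SHAPE="rect" HREF="http://www.a.com/foo.htm">'
split_line = '<AREA SHAPE="rect"\r\nHREF = "http://www.a.com/foo.htm">'

if __name__ == "__main__":
    for name, pattern in (("strict", STRICT), ("tolerant", TOLERANT)):
        print(name, find_hrefs(pattern, one_line), find_hrefs(pattern, split_line))
```

The strict pattern finds the link only in the one-line form; the tolerant
one finds it in both.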

How should analogous problems in XML be addressed? Doing WS processing makes
pattern matching/state space handling easier, but at the expense of making it
very difficult to reproduce the elided WS to ensure lossless transformation.
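
One possible compromise, sketched below in Python under my own assumptions
(the function names tokenize, significant and rewrite are made up for
illustration, and this is not anything the XML draft prescribes): keep the
whitespace runs as tokens of their own, do the matching on a view that
skips them, and write the full token list back out. Nothing is elided, so
the untouched parts of the stream come out byte for byte as they went in.

```python
import re


def tokenize(text):
    """Split text into alternating whitespace and non-whitespace runs.

    Every character lands in exactly one token, so nothing is discarded.
    """
    return re.findall(r"\s+|\S+", text)


def significant(tokens):
    """View used for matching: whitespace tokens are skipped, but each
    entry keeps its index into the full token list."""
    return [(i, t) for i, t in enumerate(tokens) if not t.isspace()]


def rewrite(text, old, new):
    """Replace tokens equal to `old` with `new`, leaving all WS untouched."""
    tokens = tokenize(text)
    for i, tok in significant(tokens):
        if tok == old:
            tokens[i] = new
    return "".join(tokens)


if __name__ == "__main__":
    src = '<AREA\r\nHREF = "http://www.a.com/foo.htm">'
    print(repr(rewrite(src, "HREF", "href")))
```

When reading from a file in Python, open it with newline='' so that CRLF is
not translated to LF before the tokenizer ever sees it.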


xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)



