Unix/Java design issues (Was: Re: Is CDATA "structure"?)

Hunter, David dhunter at Mobility.com
Tue Jul 20 22:41:35 BST 1999


From: Nik O [mailto:niko at cmsplatform.com]

> <back_to_xml_issue>
> 
> I originally brought this up re "XML's [specified] auto-conversion of
> CRLF-delimited text records to LF-delimited records". My concern is that,
> given Microsoft's market dominance, much of the XML text that will be
> generated in the near future (or that comes from legacy data) will use
> the CRLF delimiter. When an XML-compliant parser replaces these characters
> with a single LF, the data will no longer be viewable/editable with simple
> MS-Windows text tools (e.g. Notepad). Also, the original XML data is
> replaced by a converted form (let's ignore entity expansion for the moment).

Tim Bray comments on this in the Annotated XML spec (at
http://www.xml.com/axml/axml.html):

<quote>
Line End Trade-Offs
The idea here is that a programmer need never have to wrestle with the fact
that Windows boxes, Macintoshes, and Unix systems all use different
characters to separate lines. Since XML documents will be stored in files on
all these systems, and will often be broken up into lines, it's absolutely
certain that these documents will use all these different combinations of
carriage-return and line-feed.

But as a programmer using an XML processor, you can count on never seeing
anything but a single line-feed character separating lines. This means your
code will run anywhere.

Since the publication of the spec, we have received a certain number of
complaints from Microsoft Windows programmers, who find it surprising and
disturbing that the data they receive from the XML processor has "weird,
unconventional" line separation. Given the relative number of Windows
programmers, it might have been a good idea to adopt the Windows-standard
CR-LF as the line separator signal, as opposed to the single LF; but it's
too late for that now.
</quote>
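The rule Tim Bray describes (XML 1.0, section 2.11, "End-of-Line Handling") comes down to two substitutions: every CR-LF pair becomes a single LF, and any CR not followed by an LF also becomes an LF. Below is a minimal Python sketch of that rule, applied to already-decoded text. The function name `normalize_line_ends` is hypothetical and not part of any XML library. A real processor applies the rule to its input as part of parsing rather than as a separate pass.

```python
def normalize_line_ends(text: str) -> str:
    """Apply XML 1.0 end-of-line handling to already-decoded text."""
    # Replace CR-LF pairs first, so the CR in each pair does not
    # turn into an extra LF in the second step.
    text = text.replace("\r\n", "\n")
    # Any CR still left was not followed by LF (old Mac style); map it to LF.
    return text.replace("\r", "\n")


windows = "<para>line one\r\nline two</para>\r\n"
mac = "<para>line one\rline two</para>\r"
unix = "<para>line one\nline two</para>\n"

# All three line-end conventions reach the application identically.
assert normalize_line_ends(windows) == unix
assert normalize_line_ends(mac) == unix
assert normalize_line_ends(unix) == unix
```

Doing the CR-LF replacement first matters. Mapping lone CRs first would turn each CR-LF pair into two LFs and double every line break.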

I've noticed myself that my beautifully formatted XML documents do sometimes
seem to have their line separators stripped out [from a Windows point of view]
when I view them in Notepad, but since I only ever need to look at the bare XML
when I'm debugging, it's not really a problem in my case.
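When parser output does have to be opened in a Windows-only editor such as Notepad, one workaround is to turn each LF back into CR-LF when writing the file. This leaves the data the parser delivered unchanged. The Python sketch below assumes the text has already been through XML line-end normalization and that UTF-8 is an acceptable file encoding. The names `to_crlf` and `write_for_notepad` are hypothetical helpers, not an existing API.

```python
from pathlib import Path


def to_crlf(text: str) -> str:
    """Convert LF-delimited text (as an XML processor delivers it) to CR-LF."""
    # Collapse any CR-LF pairs that slipped through first, so they are
    # not doubled into CR-CR-LF by the second replacement.
    return text.replace("\r\n", "\n").replace("\n", "\r\n")


def write_for_notepad(path: Path, text: str) -> None:
    """Write text with CR-LF line ends (assumption: UTF-8 is acceptable)."""
    # newline="" switches off Python's own newline translation, so the
    # CR-LF pairs produced by to_crlf reach the disk exactly once.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(to_crlf(text))


xml_text = "<doc>\n  <para>hello</para>\n</doc>\n"
assert to_crlf(xml_text) == "<doc>\r\n  <para>hello</para>\r\n</doc>\r\n"
```

Python's `open(..., newline="\r\n")` would also translate each written LF to CR-LF. The explicit `to_crlf` step is used here so that text which already contains CR-LF pairs is not doubled.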

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)