Whitespace

David G. Durand dgd at cs.bu.edu
Thu Aug 28 15:52:08 BST 1997


At 4:35 PM -0500 8/27/97, Sean Mc Grath wrote:
>What about "File-Save As"?
If the document is intended to be equivalent to the one you read, it should
have whitespace in the same places.

Is this really such a hard rule? It seems easier to me than any other I can think of.
>>[Murray Altheim]
>>In applications that do modify source documents (such as editors), I don't
>>expect them to mangle/reformat whitespace, unless whitespace is simply
>>not an issue (such as XML-as-database apps).
Right. If you don't know the application, you _always preserve_ the whitespace
that you saw on input.
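To make the "preserve what you read" rule concrete, here is a minimal sketch in Python (the language and library are my choice for illustration, not anything from this thread). It shows that a parse-and-save round trip does not have to touch whitespace. The standard xml.etree.ElementTree module keeps whitespace-only text between elements, so the output matches the input as long as nothing re-indents the tree. It only covers character data: this parser drops comments and the XML declaration by default and does not keep whitespace inside tags.

```python
import xml.etree.ElementTree as ET

def round_trip(source: str) -> str:
    """Parse an XML string and serialize it again without reformatting.

    ElementTree stores whitespace between elements in each element's
    .text and .tail, so it survives as long as we never indent or strip
    the tree before writing it back out.
    """
    root = ET.fromstring(source)
    return ET.tostring(root, encoding="unicode")

doc = "<memo>\n  <to>Sean</to>\n  <body>Two  spaces  here.</body>\n</memo>"
print(round_trip(doc) == doc)
```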
>
>Why is WS not an issue for XML-as-database apps?
In _some such applications_ you will know that line breaks don't matter --
or that certain elements (e.g. <RECORD>) have element content. If you _know_
the purpose of the data, you might be able to normalize whitespace. But if
you're writing a general XML editor, you would be foolish to assume that
you have such knowledge.
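Here is a sketch of the kind of normalization that is safe only when you know the document type. ELEMENT_CONTENT is a hypothetical set of element names that this particular application knows to have element content; whitespace-only text is removed only directly inside those elements. Everything else, including the spaces inside <NAME>, stays exactly as read. A general-purpose editor would have no such set to consult.

```python
import xml.etree.ElementTree as ET

# Hypothetical: element names this application *knows* have element content.
ELEMENT_CONTENT = {"RECORDS", "RECORD"}

def _is_xml_ws(s) -> bool:
    # True for text made only of XML whitespace: space, tab, CR, LF.
    return s is not None and s.strip(" \t\r\n") == ""

def strip_element_content_ws(elem: ET.Element, known: set) -> None:
    """Drop whitespace-only text sitting directly inside elements in `known`.

    Text inside every other element is left untouched.
    """
    if elem.tag in known:
        if _is_xml_ws(elem.text):        # whitespace before the first child
            elem.text = None
        for child in elem:
            if _is_xml_ws(child.tail):   # whitespace after each child
                child.tail = None
    for child in elem:
        strip_element_content_ws(child, known)

src = ("<RECORDS>\n  <RECORD>\n    <NAME> Sean  Mc Grath </NAME>\n"
       "  </RECORD>\n</RECORDS>")
root = ET.fromstring(src)
strip_element_content_ws(root, ELEMENT_CONTENT)
print(ET.tostring(root, encoding="unicode"))
```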

>Because the data stream is a single line of XML?
Might be, or might not be. Author's decision.
>I use Borland Brief and Borland's 32-bit grep.exe all the time.
>Both have line length limits. I cannot use these with WS-free XML.

True, so you'd best not process such files with them. What's the point, really?

If you're creating documents, you can put whitespace in. Even HTML parsers
accept arbitrary-length lines nowadays, because lots of database-driven HTML
tools produce them.

>
>Won't our desperate Perl hackers' beloved $_ variable be significantly less
>useful if it contains the *entire* document?

Just as it's not useful for processing HTML. Regexps that don't match across
line boundaries are the most common problem I've seen in HTML-processing
Perl scripts. It looks like that will continue until people figure out that
Perl's line-at-a-time "feature" is just a bug when used with XML/HTML.
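The line-boundary problem is easy to reproduce. The Python sketch below (an illustration, not anyone's actual script) runs the same pattern over a document one line at a time, the way a Perl while (<>) loop would, and then over the whole text with re.DOTALL so that "." also matches newlines. In Perl itself, the corresponding fix is to read the whole file at once and use the /s modifier.

```python
import re

TITLE = re.compile(r"<title>(.*?)</title>")
# Same pattern, but DOTALL lets "." match newlines as well.
TITLE_DOTALL = re.compile(r"<title>(.*?)</title>", re.DOTALL)

def titles_per_line(text: str) -> list:
    """Search each line separately, like a line-by-line Perl loop."""
    found = []
    for line in text.splitlines():
        m = TITLE.search(line)
        if m:
            found.append(m.group(1))
    return found

def titles_whole(text: str) -> list:
    """Search the whole document at once, across line breaks."""
    return TITLE_DOTALL.findall(text)

doc = "<title>Whitespace\nin XML</title>\n<p>body</p>\n"
print(titles_per_line(doc))  # the title spans two lines, so nothing matches
print(titles_whole(doc))
```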

>>I imagine WYSIWYG XML
>>editing/word processing applications that completely reformat whitespace
>>will not be capable of proper link creation, so those would then be simply
>>considered 'broken applications' and probably not be successful.

This comment of Murray's is right on!

>The market loves WYSIWYG (however "pseudo" the reality is). Isn't XML
>in trouble if user-friendly (read "WYSIWYG") apps cannot do linking?

This has nothing to do with whitespace; it's an issue of how you choose
to display things on your screen. One could choose to present a nicely
formatted display and still track whitespace explicitly. I've often wished
that tools like MS Word would remember _not_ to typeset two spaces after a
period, for instance.
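One way to separate the screen from the file, sketched in Python with a hypothetical TextRun class (not any real editor's API): display() collapses whitespace for a tidy view, while save() returns the characters exactly as they were read, so the presentation choice never leaks into the saved document.

```python
import re
from dataclasses import dataclass

@dataclass
class TextRun:
    """Hypothetical editor model: character data stored exactly as read."""
    raw: str

    def display(self) -> str:
        # Screen only: collapse whitespace runs and trim for a tidy view.
        return re.sub(r"\s+", " ", self.raw).strip()

    def save(self) -> str:
        # The file gets the stored text back, untouched by display choices.
        return self.raw

run = TextRun("End of sentence.  Next one.\n")
print(run.display())
```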

It sounds to me like what we really need is a short paper (about five
paragraphs) explaining whitespace to developers:

The three-sentence version would be as follows:

We're serious about all whitespace being significant. If you're not dealing
with an element in a document type that warrants some form of whitespace
normalization, then you _should not_ output different whitespace without
the user being aware that a significant change has been made in the
document. Such notification might take many forms in an interface: a
user option, a view that displays the whitespace as read, or an explicit
operation to "normalize" whitespace.
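The "explicit operation" option might look like the following minimal sketch. normalize_space is a hypothetical helper: it collapses runs of XML whitespace (space, tab, CR, LF) to a single space and trims both ends, much like XPath's normalize-space(). It also returns a flag so the interface can tell the user that a significant change was made, instead of silently rewriting the text.

```python
import re

# XML's whitespace characters: space, tab, carriage return, line feed.
_XML_WS_RUN = re.compile(r"[ \t\r\n]+")

def normalize_space(text: str) -> tuple:
    """Collapse XML whitespace runs to one space and trim both ends.

    Intended to run only when the user explicitly asks for it. The second
    return value says whether the text actually changed, so the caller
    can warn the user that a significant edit was made.
    """
    normalized = _XML_WS_RUN.sub(" ", text).strip(" ")
    return normalized, normalized != text

new_text, changed = normalize_space("  two\tspaces\n here ")
if changed:
    print("Whitespace was normalized:", repr(new_text))
```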

   -- David

_________________________________________
David Durand              dgd at cs.bu.edu  \  david at dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________



xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)



