XML vs the Dreaded Whitespace

Tim Bray tbray at textuality.com
Sat Dec 13 22:55:26 GMT 1997


At 03:00 AM 11/12/97 -0500, Chris Smith wrote:
>Part of this work requires that these documents carry document
>authentication information. This, in turn, requires that some regions
>of an XML document must be transported *exactly*, and must be received
>and checked identically so that the message authentication actually
>works. The fact that we are considering the idea of including email
>as a transport mechanism doesn't help matters.

So your proposal is: 
(1) transcode into UTF-16 if necessary
(2) digitally sign what you get after (1).

I think this is a sensible way to go.  Obviously, there are
anomalies; 

<a foo='1' bar="2"/> 
will not be the same as
<a
 foo="1"
 bar='2'
></a>

which is surprising, but trying to find solutions may well not be
cost-effective.
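
To make the anomaly concrete, here is a minimal sketch in Python. The digest
(SHA-256) and the choice of big-endian UTF-16 without a byte-order mark are
assumptions for illustration; nothing in the proposal fixes either one. The
function name digest_as_is is hypothetical.

```python
import hashlib


def digest_as_is(xml_text: str) -> str:
    """Transcode to UTF-16 (step 1), then hash the result (stand-in for step 2)."""
    # Assumption: big-endian UTF-16 with no BOM, so the bytes are the same on every platform.
    return hashlib.sha256(xml_text.encode("utf-16-be")).hexdigest()


# The two serializations from the message: same element, same attributes,
# different quoting, line breaks, and empty-element syntax.
compact = "<a foo='1' bar=\"2\"/>"
spread = "<a\n foo=\"1\"\n bar='2'\n></a>"

if __name__ == "__main__":
    print(digest_as_is(compact))
    print(digest_as_is(spread))
```

The two digests differ even though an XML processor would report the same
element with the same attributes for both inputs.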

You *might* want to consider losing the prologue and start checking
just at the root element.

You *might* want to consider normalizing namespace prefixes.

You *might* want to normalize whitespace in markup.

You *might*, etc etc etc etc; unless you are willing to commit to
a full grove/property-set model a la SGML's extended facilities, you
may well be better off signing the instance as it sits.

In particular, I think there are lots of things that would be easier
and less trouble-prone to work around than line-breaking, which is well
known to be highly error-prone.  For example, in the line-break HERE->    
how many space characters that you can't see follow the ">"?
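
A small sketch of why that matters for a signature, again in Python. The
"mailer" that trims trailing spaces is hypothetical, as are the helper names;
SHA-256 over big-endian UTF-16 is an assumed stand-in for the real signing step.

```python
import hashlib


def trailing_spaces(line: str) -> int:
    """Count the space characters between the last visible character and the line end."""
    body = line.rstrip("\r\n")
    return len(body) - len(body.rstrip(" "))


def digest(text: str) -> str:
    # Assumption: SHA-256 over big-endian UTF-16, standing in for the signature.
    return hashlib.sha256(text.encode("utf-16-be")).hexdigest()


# A line with four invisible trailing spaces, as sent.
as_sent = "the line-break HERE->    \n"
# Hypothetical transport that silently trims trailing spaces before delivery.
as_received = as_sent.rstrip(" \n") + "\n"

if __name__ == "__main__":
    print(trailing_spaces(as_sent), trailing_spaces(as_received))
    print(digest(as_sent) == digest(as_received))
```

The two lines look identical on screen, but the digests do not match, so a
signature computed before transport would fail to verify afterwards.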

There might be a useful halfway point as follows: run it through an
XML processor and sign just the combination of element type, attribute
name-value pairs, and textual content that the processor emits; this 
allows you to finesse a lot of quoting/white-space/line-end issues; 
also it allows authors to use tricks like default attributes and 
internal entities that don't "really" change the content.
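
One way the halfway point could be sketched, using Python's standard
xml.etree.ElementTree as the XML processor. The event encoding, the sorting of
attributes by name, and the function names events and logical_digest are all
assumptions made for this illustration, not an agreed canonical form; SHA-256
again stands in for the signature.

```python
import hashlib
import xml.etree.ElementTree as ET


def events(elem):
    """Yield element type, attribute name-value pairs, and text in document order."""
    yield ("start", elem.tag)
    # Attribute order is not significant in XML, so sort by name (an assumption
    # of this sketch) to get the same sequence for every serialization.
    for name in sorted(elem.attrib):
        yield ("attr", name, elem.attrib[name])
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from events(child)
        if child.tail:
            # Text that follows a child element inside the current element.
            yield ("text", child.tail)
    yield ("end", elem.tag)


def logical_digest(xml_text: str) -> str:
    """Hash what the processor emits rather than the characters as they sit."""
    root = ET.fromstring(xml_text)
    h = hashlib.sha256()
    for event in events(root):
        # repr() of a tuple of strings is an unambiguous encoding of the event.
        h.update(repr(event).encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


if __name__ == "__main__":
    compact = "<a foo='1' bar=\"2\"/>"
    spread = "<a\n foo=\"1\"\n bar='2'\n></a>"
    print(logical_digest(compact) == logical_digest(spread))
```

With this approach the two serializations shown earlier produce the same
digest, because quoting style, whitespace inside tags, and empty-element syntax
never reach the hash. Whitespace in textual content is still hashed as the
processor reports it, so that part of the problem does not go away.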

On the other hand, I'd say that off the top, just digitally signing the
UTF-i-fied characters as they sit is a reasonable way to go. -Tim

