Word and XML (was: XML standards coherency and so forth)

Sean Mc Grath digitome at iol.ie
Thu Jan 21 22:01:40 GMT 1999


>Biron,Paul V wrote:
>
>> Actually, it is very easy to generate a Word '97 document which when saved
>> as HTML will be non-wellformed.  Try the following, where *xxx* means "make
>> xxx bold", and _yyy_ means "make yyy italicized".
>> 
>>         This is *a test _of the* emergency_ broadcast system
>> 
>> The relevant portion of the HTML produced by word is
>> 
>>         <P>This is <B>a test <I>of the</B> emergency</I> broadcast
>> system</P>
>
[John Cowan]
>Un*censored*believable.  This not only isn't XML, it isn't even
>HTML.  What were they thinking of?  (I know, I know: $$$$.)
>Microsoft folks, is there any hope of getting this fixed for
>Office 2K?
>
RTF doesn't map well to XML (not even to very low-level,
formatting-oriented XML) because of the way RTF is structured.

It is stack-based and allows structures to overlap:-

	\b1 bold \i1 bold italic \b0 italic \i0 plain

Matching up the on/offs:-
	<b> bold <i> bold italic </b> italic </i> plain

This is invalid XML (or indeed SGML) because the elements overlap.
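Here is a minimal Python sketch of that naive mapping. It assumes a toy subset of RTF that has only the `\b1`/`\b0` and `\i1`/`\i0` toggles; real RTF also has groups, escapes and many more control words. The names `naive_rtf_to_xml` and `is_well_formed` are hypothetical. The check wraps the output in a `<p>` root and hands it to the standard library's `xml.etree.ElementTree`, which rejects the overlapping end tags with a `ParseError`.

```python
import re
import xml.etree.ElementTree as ET

# Toy RTF subset (assumption): \b1/\b0 and \i1/\i0 toggles plus plain text.
# The optional space after a control word is its delimiter, so it is consumed.
TOKEN = re.compile(r"\\([bi])([01]) ?|([^\\]+)")

def naive_rtf_to_xml(rtf):
    """Emit a start tag at each 'on' toggle and an end tag at each 'off'."""
    out = []
    for m in TOKEN.finditer(rtf):
        word, state, text = m.groups()
        if text is not None:
            out.append(text)
        elif state == "1":
            out.append(f"<{word}>")
        else:
            out.append(f"</{word}>")
    return "<p>" + "".join(out) + "</p>"

def is_well_formed(xml_text):
    """True if ElementTree can parse the text, False on a ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

sample = r"\b1 bold \i1 bold italic \b0 italic \i0 plain"
xml_text = naive_rtf_to_xml(sample)
print(xml_text)                  # <p><b>bold <i>bold italic </b>italic </i>plain</p>
print(is_well_formed(xml_text))  # False
```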

This kind of overlapping structure is nasty to
represent in XML/SGML, which is a real pity because it
crops up in some important areas, looseleaf
publishing for example.
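One common workaround, not something proposed in this post, is to keep the output well-formed by closing any still-open inner element when an outer one ends and then reopening it. Below is a sketch under the same toy-RTF assumption (only `\b` and `\i` toggles); `nested_rtf_to_xml` is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

# Toy RTF subset (assumption): \b1/\b0 and \i1/\i0 toggles plus plain text.
TOKEN = re.compile(r"\\([bi])([01]) ?|([^\\]+)")

def nested_rtf_to_xml(rtf):
    """Map toggles to properly nested elements by splitting overlapping runs."""
    out = []
    stack = []  # currently open elements, innermost last
    for m in TOKEN.finditer(rtf):
        word, state, text = m.groups()
        if text is not None:
            out.append(text)
        elif state == "1":
            if word not in stack:  # ignore a redundant 'on' toggle
                stack.append(word)
                out.append(f"<{word}>")
        elif word in stack:  # ignore 'off' for a property that is not on
            reopen = []
            while stack[-1] != word:  # close runs opened inside `word`
                inner = stack.pop()
                out.append(f"</{inner}>")
                reopen.append(inner)
            stack.pop()
            out.append(f"</{word}>")
            for inner in reversed(reopen):  # reopen them in original order
                stack.append(inner)
                out.append(f"<{inner}>")
    while stack:  # close anything still open at end of input
        out.append(f"</{stack.pop()}>")
    return "<p>" + "".join(out) + "</p>"

sample = r"\b1 bold \i1 bold italic \b0 italic \i0 plain"
result = nested_rtf_to_xml(sample)
print(result)  # <p><b>bold <i>bold italic </i></b><i>italic </i>plain</p>
ET.fromstring(result)  # parses without error
```

The price is that the single italic run now becomes two separate `<i>` elements, which is exactly the loss of structure that makes overlap awkward in XML/SGML.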

SGML made a stab at it with an optional feature
called CONCUR (never implemented to my knowledge).

Sean

<Sean uri="http://www.digitome.com/sean.htm"/>



xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
