Word and XML (was: XML standards coherency and so forth)
Sean Mc Grath
digitome at iol.ie
Thu Jan 21 22:01:40 GMT 1999
>Biron,Paul V wrote:
>
>> Actually, it is very easy to generate a Word '97 document which when saved
>> as HTML will be non-wellformed. Try the following, where *xxx* means "make
>> xxx bold", and _yyy_ means "make yyy italicized".
>>
>> This is *a test _of the* emergency_ broadcast system
>>
>> The relevant portion of the HTML produced by word is
>>
>> <P>This is <B>a test <I>of the</B> emergency</I> broadcast
>> system</P>
>
[John Cowan]
>Un*censored*believable. This not only isn't XML, it isn't even
>HTML. What were they thinking of? (I know, I know: $$$$.)
>Microsoft folks, is there any hope of getting this fixed for
>Office 2K?
>
RTF doesn't map well to XML, even very low-level, formatting-oriented
XML, because of the way RTF is structured. Its formatting state is
stack based (groups push and pop it), but individual properties are
switched on and off independently, which allows structures to overlap:

    \b1 bold \i1 bold italic \b0 italic \i0 plain

Matching up the on/offs one for one gives:

    <b> bold <i> bold italic </b> italic </i> plain

which is invalid XML (or indeed SGML) because of the overlaps.

This kind of overlapping structure is nasty to do in XML/SGML.
(Which is a real pity, because it crops up in some important
areas -- looseleaf publishing, for example.)

SGML made a stab at it with an optional feature called CONCUR
(never implemented, to my knowledge).
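
For what it's worth, the usual workaround for formatting-level markup
is to close and reopen the inner elements whenever an outer one is
switched off. Below is a minimal sketch of that idea in Python. It is
not a real RTF parser: the names TOGGLES, TOKEN and toggles_to_nested
are made up for illustration, only \b and \i are handled, and text is
not escaped for XML.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mapping from RTF toggle control words to element names.
TOGGLES = {"b": "b", "i": "i"}

# A control word (\b1, \i0, ...) with its optional delimiter space,
# or a run of plain text up to the next backslash.
TOKEN = re.compile(r"\\([a-z]+)(\d*) ?|([^\\]+)")

def toggles_to_nested(rtf):
    """Turn overlapping on/off toggles into properly nested markup."""
    open_tags = []  # currently open elements, outermost first
    out = []
    for m in TOKEN.finditer(rtf):
        word, param, text = m.groups()
        if text is not None:
            out.append(text)
            continue
        tag = TOGGLES.get(word)
        if tag is None:
            continue  # control word this sketch does not know about
        if param != "0":
            # Switch on (ignored if the property is already on).
            if tag not in open_tags:
                open_tags.append(tag)
                out.append(f"<{tag}>")
        elif tag in open_tags:
            # Switch off: close everything from the innermost element
            # out to this one, then reopen the ones that were inside it.
            idx = open_tags.index(tag)
            inner = open_tags[idx + 1:]
            for t in reversed(open_tags[idx:]):
                out.append(f"</{t}>")
            del open_tags[idx:]
            for t in inner:
                open_tags.append(t)
                out.append(f"<{t}>")
    # Close anything still open at the end of the input.
    for t in reversed(open_tags):
        out.append(f"</{t}>")
    return "".join(out)

result = toggles_to_nested(r"\b1 bold \i1 bold italic \b0 italic \i0 plain")
print(result)
ET.fromstring("<p>" + result + "</p>")  # parses, so it is well-formed
```

The price is that the single italic run is split into two <i>
elements, so the fact that it was one run in the source is lost. That
is tolerable for presentation markup, much less so when the
overlapping structures carry meaning.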
Sean
<Sean uri="http://www.digitome.com/sean.htm"/>