Word and XML (was: XML standards coherency and so forth)

Biron,Paul V Paul.V.Biron at kp.ORG
Thu Jan 21 00:34:52 GMT 1999

> From: "Ogievetsky, Nikita" <nikita.ogievetsky at csfb.com>
> Date: Wed, 13 Jan 1999 12:37:06 -0500
> Subject: RE: XML standards coherency and so forth
> > Andreas Berg wrote:
> > I am searching for a converter from Word documents to XML. Unfortunately
> > I have no time to wait for Office 2000..... Is there something like this
> > available?
> In MS Word, go to the <File>/<Save As> menu and select "Save as HTML
> document".
> It will create a well-formed XML file: HTML with all elements having start
> and end tags.
> (Just remember to exhume the <body> - sorry for the bad joke).
> Nikita Ogievetsky.
Actually, it is very easy to create a Word 97 document which, when saved
as HTML, is not well-formed.  Try the following, where *xxx* means "make
xxx bold" and _yyy_ means "make yyy italic".

	This is *a test _of the* emergency_ broadcast system

The relevant portion of the HTML produced by Word is

	<P>This is <B>a test <I>of the</B> emergency</I> broadcast

The B and I elements overlap rather than nest (</B> closes B while I is
still open), so the result is not well-formed.  As far as I can tell, this
works (or doesn't, as the case may be) for any format/font changes.
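
The overlap can also be checked mechanically. What follows is only a
minimal sketch, assuming Python and its standard xml.etree.ElementTree
parser; neither is part of the original test. The helper is_well_formed,
the added </P> end tag, and the "repaired" string are hypothetical,
introduced here purely for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as a well-formed XML document."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# The markup Word produced, completed with "system</P>" (an assumption)
# so that the only remaining problem is the overlapping B and I elements.
overlapping = (
    "<P>This is <B>a test <I>of the</B> emergency</I> broadcast system</P>"
)

# One possible repair (hypothetical): close I before B, then reopen I.
repaired = (
    "<P>This is <B>a test <I>of the</I></B><I> emergency</I> "
    "broadcast system</P>"
)

print(is_well_formed(overlapping))  # False: </B> arrives while I is open
print(is_well_formed(repaired))     # True: every element nests properly
```

Any conforming XML parser should reject the first string, since XML
requires an element's end tag to fall inside the same parent as its start
tag.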

Word 97 also produced several well-formedness violations for anything
more complex than simple nested lists.

SGML Business Analyst
Kaiser Permanente, So Cal.

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
