XML Convertor

DuCharme, Robert DuCharmR at moodys.com
Wed Dec 22 14:41:01 GMT 1999


>I am currently looking out for converting Word Perfect, MS Word and ASCII
>files into XML. 
>So Far I was just able to find out only RTF to XML convertor, which uses
>omnimark technology.

Converting something to XML means converting it to a text file in which
start and end tags mark the beginning and end of structural elements
(perhaps with certain pieces of information stored as attributes in the
start-tags). There has to be some way for the converter to identify the
beginning and end of these structural elements. Rick Geimer's Omnimark-based
rtf2xml (see http://www.omnimark.com/develop/contributed/) does this by
looking at RTF codes.
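
To make that definition concrete, here is a minimal Python sketch. It is
only an illustration, not how rtf2xml works, and the element and attribute
names (memo, title, para, author) are made up for the example. The structure
is already known to the program, so all it has to do is write the start and
end tags and store one piece of information as an attribute:

```python
import xml.etree.ElementTree as ET

def build_memo(title, paragraphs, author):
    """Return XML text for a hypothetical <memo> document."""
    # The author is stored as an attribute in the start-tag.
    memo = ET.Element("memo", {"author": author})
    # Each structural element gets its own start and end tag.
    ET.SubElement(memo, "title").text = title
    for text in paragraphs:
        ET.SubElement(memo, "para").text = text
    return ET.tostring(memo, encoding="unicode")

print(build_memo("Status", ["First point.", "Second point."], "bob"))
```

The hard part of a real conversion is everything this sketch skips: working
out from the input where the title and the paragraphs begin and end.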

Writing a program that reads proprietary binary formats (WordPerfect or MS
Word) and does this would be difficult enough that no one I know of has
bothered. People just save as RTF and either write something customized to
convert that RTF to their own DTD or use Rick's program and then convert its
output to their own DTD. WordPerfect and Word 2000 have some XML-related
features, so you might want to look at those.

To convert an ASCII file to XML, you could put "<myDocument>" at the
beginning and "</myDocument>" at the end, but this wouldn't do you much
good. To put additional tags in places where they would be useful requires a
program that knows what to look for. People often use Perl, Python, awk,
etc. to write scripts that look for patterns in their input that give them
clues as to which tags should go where.
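
A script of that kind might look like the following Python sketch. The two
clues it relies on are assumptions about the input, not general rules: blank
lines separate blocks, and a one-line block written entirely in capitals is
a heading. The element names heading and para are made up for the example.

```python
import re
from xml.sax.saxutils import escape

def text_to_xml(text, root="myDocument"):
    """Tag plain text using two assumed clues: blank lines separate
    blocks, and a one-line block in all capitals is a heading."""
    out = ["<%s>" % root]
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue  # empty input produces no child elements
        # Escape &, < and > so the output stays well-formed.
        content = escape(" ".join(lines))
        if len(lines) == 1 and lines[0].isupper():
            out.append("  <heading>%s</heading>" % content)
        else:
            out.append("  <para>%s</para>" % content)
    out.append("</%s>" % root)
    return "\n".join(out)

sample = "INTRODUCTION\n\nFish & chips are\ngood.\n\nMore text."
print(text_to_xml(sample))
```

If your files follow different conventions (numbered headings, indented
lists, and so on), the tests inside the loop are the part you would change.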

>Is there anything generalised which would take care of all (or most) types
>of Binary & ASCII files.

To find and identify the structure of the input, the processing program has
to know its structure intimately, so a generalized program that takes care
of all types of binary and ASCII files is impossible. Having spent too much
time studying RTF, I applaud Rick for studying it even harder so that others
wouldn't have to. It would be difficult to do any better.

Bob DuCharme       www.snee.com/bob       <bob@  
snee.com>  see www.snee.com/bob/xmlann for "XML:
The Annotated Specification" from Prentice Hall.





