XML <-> non-XML filter project
G. Ken Holman
gkholman at CraneSoftwrights.com
Fri Apr 2 19:33:29 BST 1999
At 99/03/30 13:13 +0800, James Tauber wrote:
>Earlier this month, I posted the following to XSL-LIST. With apologies to
>those who received it there, I'm posting it (modified) here to see if anyone
>is interested in some co-operative effort in this area.
>
>What I would like to see is people taking existing non-XML formats and
>developing:
>
> a) a URI for the non-XML format (for notations and for the namespace of
>the XML format)
> b) a DTD representing the existing non-XML format
> c) an output filter to convert documents conforming to the DTD into the
>non-XML format
> d) (possibly) an input filter to convert the non-XML format into XML
>...
>I would personally find great value in this being done for Makefiles,
>procmail files, simple shell scripts and PalmPilot databases. Others of
>value I can think of include Windows INI files, Unix mailboxes, your
>favourite programming language...
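To make item (c) concrete, here is a minimal sketch of an output filter for one of the formats mentioned, Windows INI files. The XML vocabulary (`ini`, `section`, `entry`, `name`, `key`) and the function name `xml_to_ini` are hypothetical, invented only for this illustration; nobody in this thread has proposed a DTD for INI files.

```python
import xml.etree.ElementTree as ET


def xml_to_ini(xml_text: str) -> str:
    """Convert a document in a hypothetical INI vocabulary to INI text.

    Assumed (not from the thread) structure:
      <ini><section name="..."><entry key="...">value</entry></section></ini>
    """
    root = ET.fromstring(xml_text)
    lines = []
    for section in root.findall("section"):
        # Each section becomes a bracketed header line.
        lines.append(f"[{section.get('name')}]")
        for entry in section.findall("entry"):
            # Each entry becomes a key=value line.
            value = (entry.text or "").strip()
            lines.append(f"{entry.get('key')}={value}")
        # Blank line after every section.
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = (
        '<ini><section name="boot">'
        '<entry key="timeout">30</entry>'
        "</section></ini>"
    )
    print(xml_to_ini(sample))
```

An input filter, item (d), would be the reverse: parse the INI text and build the same element structure.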
I'm sorry I didn't notice it when reading XSL-list, but I found this last
night on XML-DEV, so I'll post my response to both lists ... apologies in
advance for the duplicates.
The subject line implies *both* directions XML<->non-XML ... but your prose
leans towards only XML->non-XML.
I've just recently added this to my XSL training materials (X-Tech
attendees didn't see it, WWW8 attendees will see it) because I have since
successfully used XML and XSL to produce text-only files (including batch
files, control files, etc.) using an environment created by James Clark
(many thanks, James!) for his XT program:
At Sun, 17 Jan 1999 10:34:34 +0700 James Clark wrote:
====8<----
Here's what the DTD for such a result namespace might look like:
<!ELEMENT nxml (escape*, (control|data)*)>
<!ATTLIST nxml encoding NMTOKEN "UTF-8">
<!ELEMENT escape (#PCDATA|char)*>
<!ATTLIST escape char CDATA #REQUIRED>
<!ELEMENT control (#PCDATA|char|data|control)*>
<!ELEMENT data (#PCDATA|data|control)*>
<!ELEMENT char EMPTY>
<!ATTLIST char number NMTOKEN #REQUIRED>
The nxml element is the root element; the encoding attribute is a MIME
charset to be used for encoding characters as bytes.
The data element contains data. Within a data element control
characters get escaped. The escape element specifies how a particular
control character gets escaped.
The control element contains control information. Within a control
element, all characters are output directly without escaping.
The char element allows the output of a character that is not allowed by
XML (such as control-L).
====8<----
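The rules James describes can be sketched in a few lines. This is only my reading of his description, written in Python rather than as XT's actual NXMLOutputHandler: the function names are made up, the elements are assumed to carry no namespace, and the encoding attribute is assumed to be a name Python's codec registry knows.

```python
import xml.etree.ElementTree as ET


def _char(elem):
    # <char number="N"/> stands for the character with code point N.
    return chr(int(elem.get("number")))


def _escape_replacement(elem):
    # An <escape> body is a mix of text and <char> elements.
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "char":
            parts.append(_char(child))
        parts.append(child.tail or "")
    return "".join(parts)


def _emit(elem, escaping, escapes, out):
    def text(s):
        if s:
            # Inside <data>, replace each character that has an escape rule.
            out.append("".join(escapes.get(c, c) for c in s) if escaping else s)

    text(elem.text)
    for child in elem:
        if child.tag == "char":
            out.append(_char(child))
        elif child.tag in ("data", "control"):
            # Nested elements switch escaping on (data) or off (control).
            _emit(child, child.tag == "data", escapes, out)
        # Text after a child belongs to the enclosing element's mode.
        text(child.tail)


def render_nxml(xml_text: str) -> bytes:
    """Render an <nxml> document to bytes (a sketch, not XT's handler)."""
    root = ET.fromstring(xml_text)
    if root.tag != "nxml":
        raise ValueError("expected an <nxml> root element")
    encoding = root.get("encoding", "UTF-8")  # DTD default
    escapes = {e.get("char"): _escape_replacement(e)
               for e in root.findall("escape")}
    out = []
    for child in root:
        if child.tag in ("data", "control"):
            _emit(child, child.tag == "data", escapes, out)
    return "".join(out).encode(encoding)
```

With an escape rule mapping a backslash to two backslashes, a backslash in data text is doubled while one in control text is passed through as-is.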
The encoding= attribute works with the character set encodings supported by
the Java engine running XT ... unfortunately, I haven't found a list of
encodings for XT.EXE (Microsoft VM).
The character sets that I think I'll need personally for all my text-only
work are ISO-8859-1 (Latin 1), IBM Code Page 850 and UTF-8.
From the list of character sets in:
ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets
... I found through trial and error that for the Symantec Java environment
these are named "Latin1", "IBM850" and "UTF8" respectively.
HELP!!!! - Can anyone help me find the reference list of these (and other)
character encodings supported by the Microsoft Java VM?
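To show why the exact encoding name matters, here is a small sketch of the same eacute character encoded under each of the three character sets. It uses Python's codec registry purely as an illustration; it says nothing about which names the Symantec or Microsoft Java VMs accept, and the helper name `encode_sample` is my own.

```python
import codecs

# Spellings that Python's codec registry accepts for the three character
# sets discussed here (an assumption about Python only, not about any Java VM).
NAMES = ["latin1", "IBM850", "utf8"]


def encode_sample(text, names=NAMES):
    """Map each requested name to (canonical codec name, encoded bytes)."""
    result = {}
    for name in names:
        info = codecs.lookup(name)  # raises LookupError for an unknown name
        result[name] = (info.name, text.encode(info.name))
    return result


if __name__ == "__main__":
    for name, (canonical, data) in encode_sample("\u00e9").items():
        print(name, canonical, data.hex())
```

The same character comes out as a different byte sequence under each encoding, and an unrecognized name fails at lookup time instead of silently falling back to a default.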
Attached is the sample I wrote to help myself understand the features of
the namespace.
Once I found the encodings, I marked up in XML the source material for a
number of simple text files, and I now use XT with this namespace to emit
those files from the XML. So far it has covered everything I personally
need for emitting non-XML text.
I haven't yet needed to emit accented characters, but I'm ready with the
encodings for my Symantec environment ... I'm hoping someone can help me
find the encodings for the Microsoft Java VM.
I hope this helps.
......... Ken
P:\jclark>type nxml.xsl
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl"
                xmlns="java:com.jclark.xsl.sax.NXMLOutputHandler"
                result-ns="">

<xsl:template match="/">           <!--indicate the kind of text being produced-->
  <nxml encoding="Latin1">         <!--others for Symantec: "IBM850", "UTF8"-->
    <escape char="\">\\</escape>   <!--escape any back slashes-->
    <data><xsl:apply-templates/></data> <!--translate what's in data-->
  </nxml>
</xsl:template>

<xsl:template match="charValue">   <!--don't translate what's in control-->
  <control>
    <xsl:text>\</xsl:text>
    <xsl:value-of select="@val"/>-<char number="{@val}"/>
    <xsl:text>\</xsl:text>
  </control>
</xsl:template>

</xsl:stylesheet>
P:\jclark>type nxml.xml
<?xml version="1.0"?>
<test>This is a test with a backslash \ and eacute é in it -
plus the latin-1 for eacute <charValue val="233"/> as well
</test>
P:\jclark>call xsljava nxml.xml nxml.xsl nxml.txt
P:\jclark>type nxml.txt
This is a test with a backslash \\ and eacute é in it -
plus the latin-1 for eacute \233-é\ as well
P:\jclark>
--
G. Ken Holman mailto:gkholman at CraneSoftwrights.com
Crane Softwrights Ltd. http://www.CraneSoftwrights.com/s/
Box 266, Kars, Ontario CANADA K0A-2E0 +1(613)489-0999 (Fax:-0995)
Website: XSL/XML/DSSSL/SGML services outline, XSL/DSSSL shareware,
stylesheet resource library, conference training schedule,
commercial stylesheet training materials, on-line XSL CBT.
Next instructor-led XSL Training: WWW8:1999-05-11