HTML != XML (was Re: [ANN] Kludgey workarounds for xt)

Eddie Sheffield eddie.sheffield at enterworks.com
Wed Sep 9 16:44:59 BST 1998


But it seems that the problem isn't the HTML itself, but rather the SCRIPTS that
might be included in the HTML. I believe that HTML defines the <SCRIPT
LANGUAGE="whatever">...</SCRIPT> tags, but NOT the actual script that lies within
the tags. This is where the problem is. That script might be in any one of many
languages (JavaScript, JScript, VBScript, ECMAScript, etc.), and knowing exactly
how to properly post-process the file would be VERY non-trivial, especially if
the script itself has to generate HTML on the fly. For example:

What I want:

document.write("She said &quot;Run away!&quot;");

but the generated code is:

document.write(&quot;She said &quot;Run away!&quot;&quot;);

Obviously a post-processor can't simply replace EVERY &quot; in the line, or the
script becomes invalid. But how do you know which to replace and which not? I
suppose you could parse the script and replace only the ones that must be
literal quotes for the script to be valid, but then you would need a separate
processor/parser for each script language that might appear in the document.
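
To make the failure concrete, here is a minimal sketch of the blanket
replacement applied to the generated line above. The helper names
(naiveUnescape, parses) are made up for illustration, and the validity check
relies on the JavaScript engine's own parser, so it says nothing about the
other script languages:

```javascript
// The two lines from the example above.
const wanted = 'document.write("She said &quot;Run away!&quot;");';
const generated = 'document.write(&quot;She said &quot;Run away!&quot;&quot;);';

// Hypothetical naive post-processor: replace EVERY &quot; with a literal quote.
function naiveUnescape(line) {
  return line.replace(/&quot;/g, '"');
}

// Hypothetical validity check: new Function() parses the source without
// running it, so a syntax error shows up as an exception here.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(naiveUnescape(generated));
// document.write("She said "Run away!"");
console.log(parses(naiveUnescape(generated))); // false
console.log(parses(wanted));                   // true
```

The blanket replacement turns the inner &quot; entities into quotes as well, so
the string literal ends right after "She said " and the rest no longer parses.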

Where possible, a workaround would be to use external scripts that are never
processed at all, but are pointed to with the optional SRC attribute on the
SCRIPT tag. This only works for scripts that don't have to be dynamically
generated, though.
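
As a rough sketch of that workaround, a generator would emit only the tag and
leave the script body in a separate file. The function name and the file path
below are hypothetical, not anything xt provides:

```javascript
// Hypothetical helper: build a SCRIPT element that only references an
// external file, so no script source passes through the escaping step.
function externalScriptTag(src, language = "JavaScript") {
  // Only the attribute values need escaping here.
  const escapeAttr = (value) =>
    value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
  return `<SCRIPT LANGUAGE="${escapeAttr(language)}" SRC="${escapeAttr(src)}"></SCRIPT>`;
}

// "scripts/greeting.js" is an assumed path used only for illustration.
console.log(externalScriptTag("scripts/greeting.js"));
// <SCRIPT LANGUAGE="JavaScript" SRC="scripts/greeting.js"></SCRIPT>
```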

It does seem odd that, just as the advent of the DOM eases scripting and makes it
much more powerful, problems appear almost simultaneously that make generating
those scripts more difficult.

Eddie


David Megginson wrote:

> Chris Maden writes:
>
>  > Support for pre-XML HTML was explicitly considered and rejected by
>  > the Working Group.
>
> Absolutely correct.
>
> Since HTML <= 4.0 is *not* XML, it is best to treat it as an output
> format, like PDF, TeX, RDF, Postscript, etc. -- in other words, first
> produce your XML, then run it through a filter (such as a SAX-based
> app) that does a down-translation to HTML syntax.  If the XML document
> contains the same element types as the HTML, the translation will be
> very simple.
>
> All the best,
>
> David
>
> --
> David Megginson                 david at megginson.com
>            http://www.megginson.com/
>
> xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
> Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
> To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
> (un)subscribe xml-dev
> To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
> subscribe xml-dev-digest
> List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)

