XML and HTML Intermixed

Peter Murray-Rust Peter at ursus.demon.co.uk
Fri Jun 27 09:31:42 BST 1997


In message <9706261658.AB03565 at hoccson.ho.att.com> "Seibel, Robert R" writes:
> XML Dev. Team:

There is no 'team' other than the public-spirited members of this list and
others :-).  Everyone is invited to join in - no entrance qualifications - just
a willingness to help the development process.

> 
> In my application, I see the need to be able to mix XML (my own tags)
> and HTML tags in a core content database. I plan on using a DTD
> at various authoring points to validate structure and tags.

This is an absolutely key question - which some of us raise at regular 
intervals.  My analysis - which I hope others will challenge or amplify - is
something like this:

HTML2.0 and HTML3.2 *at present* are SGML-compatible (if properly authored,
with balanced tags, quoted attributes, etc.)  They are not XML-compatible
for reasons which have been discussed here (inclusions/exclusions, '&' content
models, etc. in the DTD, and some EMPTY tags which require the <FOO/> syntax
in XML).  We all expect that 'someone' will convert common DTDs to XML and HTML
is a leading candidate but so far no-one has actually done it. (IMO it needs
to have the (in)formal blessing of the W3C, since HTML is a W3C protégé).
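
To make the well-formedness point concrete, here is a minimal sketch in Python
(my choice of language for illustration; nothing in the XML draft prescribes
it). It uses the standard library's xml.etree.ElementTree parser, which checks
well-formedness only and does not validate against a DTD. The helper name
is_well_formed and the two sample strings are assumptions made up for this
example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as well-formed XML, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# HTML-style markup: the attribute value is unquoted and <BR> is never closed.
html_style = "<P ALIGN=center>first line<BR>second line</P>"

# The same content XML-ised: quoted attribute and the <BR/> empty-element syntax.
xml_style = '<P ALIGN="center">first line<BR/>second line</P>'

print(is_well_formed(html_style))  # False
print(is_well_formed(xml_style))   # True
```

This only shows the syntactic differences (quoting, empty tags). The DTD-level
issues mentioned above (inclusions/exclusions, '&' content models) are not
something a well-formedness check can reveal.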

So the question might break down to:

(a) can I mix HTML(non-XML) with XML in the same document?
	This would not be a valid XML document overall, but it might be valid
input to an HTML browser which recognised XML markup.  It's up to the browser
(or other software) creator as to whether that's meaningful.

(b) can I refer to an XML document from an HTML document?
	This is simple if there is a MIME type for XML, since standard helper
technology can be used. [This is what I do for CML (Chemical Markup Language)
and I use the browser to call a viewer for text/xml or chemical/x-cml]. It
is generally believed that 'someone' is submitting an application to IETF/IANA 
for registration of the text/xml MIME type (??Progress??).

(c) can I XML-ise HTML and mix it with my own DTD?
	Yes.  It depends on how this is done.  I have edited HTML2.0 to be 
XML-compliant for my own purposes.  CML 'contains' HTML2.0 as part of the
CML DTD.  This guarantees there are no namespace problems (i.e. CML cannot
have identical ELEMENTs to those in HTML).  So this allows CML documents to 
contain chunks of XML-ised HTML.  Rendering these is non-trivial, because it
is not easy to pass HTML to the browser without using Javascript and I do
not like doing this (non-portable, flaky, etc.)  Moreover I have tweaked
my HTML to use the full XML-LINK syntax for tags such as <A>.

(d) Can I use HTML with my document if I have an ElementType which clashes
with one in HTML?
	Not easily.  The question of combining DTDs and document fragments
has exercised the ERB/WG and generated megabytes of opinion.  A solution
will appear at some time in the future.

(e) Can I use XML-ised HTML and include XML-LINKs to other XML documents?
	Yes, if the HTML has been extended to use XML-LINK.  This is what I
do to avoid namespace clashes.  It may have its detractors. Be warned that 
there is not much software which can display XML documents using two different
DTDs at the same time; I'm working out how JUMBO will do this - if I get some
answers to my LINK queries it should be fairly straightforward.
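
As a rough illustration of the clash problem in (c) and (d), the sketch below
(Python, standard library only) compares two sets of element type names before
they are combined in one DTD. Both sets are assumptions for the example:
HTML_NAMES is a small, incomplete subset of HTML 2.0, and MY_NAMES is a
hypothetical application vocabulary with a TITLE element added to force a
clash. The function name find_clashes is likewise invented here.

```python
# A small, incomplete subset of HTML 2.0 element type names (illustrative only).
HTML_NAMES = {"HTML", "HEAD", "TITLE", "BODY", "P", "A", "OL", "UL", "LI", "BR"}

# Hypothetical element type names for an application DTD.
MY_NAMES = {"PROBLEM", "SOLUTION", "TITLE"}

def find_clashes(own, borrowed):
    """Return, sorted, the names that are declared in both vocabularies."""
    return sorted(own & borrowed)

print(find_clashes(MY_NAMES, HTML_NAMES))  # ['TITLE']
```

An empty result means the two vocabularies can share one DTD without renaming.
Note that XML names are case-sensitive, so the comparison is deliberately exact.
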
> 
> Do you see mixing tags as reasonable? The XML tags could be converted
> to the appropriate HTML tags if sent to a browser. Then again

There are normally no default 'appropriate HTML tags'.  How would you convert
<FOO>
<BAR>276+354/872=6354?</BAR>
</FOO>

to HTML? One way to tackle this is through stylesheets (CSS1 or DSSSL) where 
appropriate formatting/rendering is applied to each tag, including context.
Alternatively (as in JUMBO) Java classes can be supplied for each ElementType
which might convert to HTML. (For example, MOLecule in CML has 1500 lines of 
Java which among many other things will render it as HTML).
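
The per-ElementType idea can be sketched in a few lines. This is not JUMBO's
code (JUMBO is written in Java); it is a Python analogue under stated
assumptions: the rules for FOO and BAR, the RENDERERS table and the function
names are all invented for illustration, and the fallback for unknown element
types is an arbitrary choice.

```python
import xml.etree.ElementTree as ET
from html import escape

def render_bar(el):
    # Hypothetical rule: show the content of BAR as preformatted text.
    return "<PRE>" + escape(el.text or "") + "</PRE>"

def render_foo(el):
    # Hypothetical rule: FOO becomes a block holding its rendered children.
    return "<DIV>" + "".join(render(child) for child in el) + "</DIV>"

# One rendering rule per element type, looked up by name.
RENDERERS = {"FOO": render_foo, "BAR": render_bar}

def render(el):
    """Render one element using the rule registered for its element type."""
    handler = RENDERERS.get(el.tag)
    if handler is None:
        # No rule registered: fall back to the escaped text content only.
        return escape("".join(el.itertext()))
    return handler(el)

doc = ET.fromstring("<FOO>\n<BAR>276+354/872=6354?</BAR>\n</FOO>")
print(render(doc))  # <DIV><PRE>276+354/872=6354?</PRE></DIV>
```

The point of the table is that the mapping to HTML is supplied by whoever
defines the element types; nothing in the tag names themselves implies it.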

> all of the tags or information could be formatted for the appropriate
> output device
> on the fly.
> 
> For instance, I may have a tag called PROBLEM and another called
> SOLUTION.
> As I'm explaining the solution, it would be nice to use HTML tags to
> explain the
> solution.
> 
> Example:
> 
> <PROBLEM>Problem description</PROBLEM>
> <SOLUTION>
> <OL>
>     <LI>Do this first</LI>
>     <LI>This is second</LI>
> </OL>
> <P>Call me on questions.</P>
> </SOLUTION>
> 
> Let's say I used a style sheet to display the contents. It seems to me
> that
> using HTML tags intermixed with XML tags is a good thing. I don't have
> to
> reinvent my own tags when HTML already defines them. 
> Comments?

I am strongly in favour of re-using DTDs and document fragments.  So many
chemical documents will draw from 3 DTDs:
	- HTML for the main text
	- MathML for the mathematics
	- CML for the chemistry
The ERB/WG has debated this at great length and accepts it as very desirable and
high-priority.  No actual mechanism is given at present.  An additional 
character has been reserved for NAMEs in case we need to use it for namespaces
in the future, but we're not allowed to use it yet [I think that is the correct
position??]. 

To summarise, I believe that mix-and-match from different DTDs is a valid and
useful approach to XML.  It means that there can be 'islands of validity'
[an idea from the WG] within XML documents, so that XML-WF docs will not
be semantically void tag soup.  The difficulty at present is how those
islands are identified - there is no consensus yet.
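
Purely as a sketch of what "identifying islands" could look like, and
emphatically not a mechanism agreed by the WG, the Python below picks out the
topmost elements whose type name belongs to a known vocabulary. Everything
here is an assumption: matching on bare element names, the HTML_NAMES subset,
the html_islands function, and the REPORT wrapper element (added only so the
quoted PROBLEM/SOLUTION example has a single root, with its closing
</SOLUTION> tag in place).

```python
import xml.etree.ElementTree as ET

# A small, incomplete subset of HTML element type names (illustrative only).
HTML_NAMES = {"P", "OL", "UL", "LI", "A", "BR"}

def html_islands(el, inside=False):
    """Yield the topmost elements whose type name is in HTML_NAMES."""
    is_html = el.tag in HTML_NAMES
    if is_html and not inside:
        yield el
    for child in el:
        # Once inside an island, nested elements are not reported again.
        yield from html_islands(child, inside or is_html)

doc = ET.fromstring(
    "<REPORT>"
    "<PROBLEM>Problem description</PROBLEM>"
    "<SOLUTION>"
    "<OL><LI>Do this first</LI><LI>This is second</LI></OL>"
    "<P>Call me on questions.</P>"
    "</SOLUTION>"
    "</REPORT>"
)

print([el.tag for el in html_islands(doc)])  # ['OL', 'P']
```

Each island found this way could then be checked against its own DTD, while
the surrounding document only needs to be well-formed. Matching on names alone
breaks down as soon as two vocabularies share an element type name, which is
exactly the unresolved part.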

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)



