MathML (and implications for XML)
Peter at ursus.demon.co.uk
Sat May 17 21:05:53 BST 1997
I have read quickly through the MathML (970515) draft and have some
(hopefully constructive) comments to make - any crossmember of xml-dev
and html-math-wg is welcome to crosspost them.
Before giving detailed comments, I must say that I think it's an
extremely useful document and covers all of the areas that I - as a
mathematically oriented scientist - would like to see. The initial
discussion is very useful and I shall borrow some of the flavour of
it when redrafting Chemical Markup Language.
An archetypal XML DTD
Since MathML is one of the very first XML DTDs to be published it naturally
sets a style which others may imitate. In general I think it does this
well, though it is at the mercy of a still fluid XML-lang and XML-link spec.
I appreciate that some of this was probably written some time before the
latest XML drafts.
Specific comments in this area are:
3.1.4 'By default, XML processors remove all leading and trailing whitespace
... between the begin/end tags and collapse any internal w/s to a single
space character'. My current understanding is that *validating* parsing
removes the start and end w/s but does not collapse the internal w/s, but
that WF-parsing passes the whole lot unchanged including the leading/trailing
w/s. [I'm usually wrong on this, but it's a problem area :-)].
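The whitespace question can be checked against a present-day non-validating
parser. The following is a minimal sketch, assuming Python's standard
xml.etree.ElementTree (built on expat) as the well-formedness parser; it
reflects the behaviour of that parser, not the 1997 draft. The element name
MI and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# A token element with leading, trailing and internal whitespace.
# "MI" is an illustrative element name, not taken from the draft.
elem = ET.fromstring("<MI>  x   y  </MI>")

# The well-formedness parser hands the character data over unchanged.
raw = elem.text

# Any trimming or collapsing has to be done by the application itself.
normalised = " ".join(raw.split())

print(repr(raw))
print(repr(normalised))
```

If this holds in general, the trimming and collapsing described in 3.1.4
would be a task for the MathML processor rather than the XML parser.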
7.1 ['</' is not allowed in CDATA]. My reading of XML 2.7 is that '</'
is unrecognised within CDATA (indeed only ']]>' can terminate it). This
might allow significant simplification to MathML 7.1 and allow the
elimination of two sets of tags.
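The CDATA reading can be tried out in the same way. This is a small sketch,
again assuming Python's xml.etree.ElementTree as the parser; the element
names NOTE and B are invented for the test.

```python
import xml.etree.ElementTree as ET

# '</' appears inside the CDATA section; only ']]>' ends the section.
# NOTE and B are hypothetical names used purely for this check.
doc = "<NOTE><![CDATA[a </B> c]]></NOTE>"

content = ET.fromstring(doc).text
print(repr(content))
```

The parser accepts the document and returns the '</B>' as plain character
data, which is consistent with the reading of XML 2.7 given above.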
MathML proposes two generic means of extending functionality, one through
attributes and the other through macros.
7.2.3 the OTHER attribute has the syntax:
and essentially allows a means of adding additional attributes independently
of the DTD. Personally I'm sympathetic to this (as long as the attributes
are ones *I*'ve thought of :-). This is 'not to encourage software developers
to use this as a loophole for circumventing the MathML core markup'... but
as we all know this is the sort of unchecked semantics that people love
and which soon leads to non-interoperable documents and processors. I'd be
frightened of it in the Chemical community. This is a point which
is important for XML in general.
5.3 Macros. This is the ability to create macros to avoid repetition of
verbose markup and seems particularly appropriate to math. (I think it has
a similar, but smaller, role in chemistry.) As far as I can see it is
totally compatible with XML/SGML, ***BUT it requires a pre-processor***
(I have been calling this a pre-parser).
There will be a role for a pre-parser in XML and one of its functions will be
to apply macros. Can we work towards a standard set of operations that a
pre-parser might carry out?
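As an illustration of what one such pre-parser operation might look like,
here is a minimal sketch of macro expansion over a parsed tree. Everything
in it is an assumption made for the example: the USEMACRO element, its NAME
attribute, the macro table and the MFRAC/MN/MI tag names are hypothetical
and are not the syntax of MathML 5.3.

```python
import xml.etree.ElementTree as ET

# Hypothetical macro table: the name and body are invented for illustration.
MACROS = {"half": "<MFRAC><MN>1</MN><MN>2</MN></MFRAC>"}

def expand(elem):
    """Replace each hypothetical <USEMACRO NAME="..."/> child by its body, in place."""
    for i, child in enumerate(list(elem)):
        if child.tag == "USEMACRO":
            body = ET.fromstring(MACROS[child.get("NAME")])
            body.tail = child.tail      # keep any text that followed the macro call
            elem[i] = body              # one-for-one replacement, indices stay valid
        else:
            expand(child)               # recurse into ordinary elements
    return elem

src = '<MATH><USEMACRO NAME="half"/><MI>x</MI></MATH>'
expanded = ET.tostring(expand(ET.fromstring(src)), encoding="unicode")
print(expanded)
```

The output of such a step is ordinary markup with no macro calls left, so it
can be passed on to an unmodified XML parser or MathML processor.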
XML-LINK. The document is written with little reference to XML-link
(not surprising, since it's new and AFAIK JUMBO is the only tool that
implements it even at prototype level). However I think there are at least
the following areas where XML-link mechanisms might be alternatives:
7.1 Display and in-line notations. The draft assumes that the MATH component
of a document is embedded in the HTML at the point that it occurs in natural
reading. XML-LINK gives a mechanism for separating the math and the text and
combining them under the flexibility of the linking mechanism. The problem
occurs in exactly the same way in chemistry - do we encode HCl in-line
or as a display?
This is a matter of style which may not be totally within the author's
control - the publisher or renderer or reader may have the power to alter
it. Since XML will approach this generically at the LINK level, I have
used constructs like:
<P>this is <A HREF="#HCl" XML-LINK="SIMPLE" ACTUATE="AUTO" SHOW="EMBED">
<XVAR CONVENTION="SMILES">Cl</XVAR> <!-- yes, I really meant to omit H! -->
This - in the present JUMBO - will in-line the formula for HCl. I am sure
that by use of stylesheets and BEHAVIOUR it would be possible to control
your equations to be at the para end, etc.
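The embedding behaviour can be sketched roughly as follows. This is not
JUMBO's implementation, only a guess at the general idea under stated
assumptions: the DOC wrapper element, the ID attribute on the target and the
resolution rule (a '#name' HREF points at the element whose ID is 'name') are
all assumed for the example.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical document: the DOC wrapper and the ID attribute are assumptions.
SOURCE = (
    '<DOC>'
    '<P>this is <A HREF="#HCl" XML-LINK="SIMPLE" ACTUATE="AUTO" SHOW="EMBED"/></P>'
    '<XVAR ID="HCl" CONVENTION="SMILES">Cl</XVAR>'
    '</DOC>'
)

def embed_links(root):
    """Copy the target of every SHOW="EMBED" link into the link element."""
    targets = {e.get("ID"): e for e in root.iter() if e.get("ID")}
    for link in list(root.iter("A")):
        href = link.get("HREF", "")
        if link.get("SHOW") == "EMBED" and href.startswith("#"):
            target = targets.get(href[1:])
            if target is not None:
                clone = copy.deepcopy(target)
                clone.tail = None       # do not drag along text after the target
                link.append(clone)
    return root

root = embed_links(ET.fromstring(SOURCE))
embedded = root.find("./P/A/XVAR")
print(embedded.get("CONVENTION"), embedded.text)
```

Whether the formula then appears in-line or as a display would be left to
the renderer or stylesheet, which is the point being made above.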
7.2.4 <MACTION>. I am sure that it is possible to recast this tag in
terms of XML-LINK BEHAVIOR. That saves a lot of hassle writing code because
it may already have been done...at least in part.
Communality with future XML DTDs
As XML develops, CML gets smaller. This is wonderful. There are a number
of general components of MathML that will help CML and probably other
people as well. A particular example is VECTOR and MATRIX (4.2.9).
It is clear from the XML-WG that many people want a method of representing
(multidimensional) regular arrays of strongly typed data and also the
means for addressing into these. Some (including me) will try to push
for economy of expression and avoid the <SEP/> syntax. (At present
CML uses the following matrix syntax:
<ARRAY ROWS="2" COLUMNS="3" TYPE="FLOAT">1 2 3 4 5 6</ARRAY>
and has a kludgy mechanism for repeated arrayElements or arrayElements
with whitespace. Since some of our matrices are large I'd quite like to
drop <SEP/>, though recent XML-WG discussion has emphasised that space is
not an issue.)
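A reader for the whitespace-separated array syntax is short, which is part
of its appeal. The sketch below assumes that the values are stored in
row-major order and that TYPE="FLOAT" maps onto Python floats; neither
assumption is stated in the text above.

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring('<ARRAY ROWS="2" COLUMNS="3" TYPE="FLOAT">1 2 3 4 5 6</ARRAY>')

rows = int(elem.get("ROWS"))
cols = int(elem.get("COLUMNS"))

# Whitespace separates the elements, so no <SEP/> markup is needed.
values = [float(token) for token in elem.text.split()]
if len(values) != rows * cols:
    raise ValueError("element count does not match ROWS x COLUMNS")

# Assumed row-major layout: the first COLUMNS values form the first row.
matrix = [values[r * cols:(r + 1) * cols] for r in range(rows)]
print(matrix)
```

The price of this economy is exactly the kludge mentioned above: an array
element that itself contains whitespace cannot be represented without some
extra mechanism.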
MathML, CML, and other XML enthusiasts should strive towards a common
*extensible* way of representing arrays and matrices.
Interoperability with HTML
This is a key area and I'm not clear from MathML spec exactly what the
mechanism is. AFAIK CML and MathML are the first DTDs to tackle the question
of how to interoperate with HTML. As we know there are syntactic problems
of how to combine two or more DTDs (DTD fragments).
It should ultimately be possible to create a joint HTML/*ML document
which can be validated (i.e. not just well-formed).
This raises considerable problems in general since HTML content models
do not allow for <MATH> or <CML> or other foreign tags. In CML I
'solve' this by embedding chunks of HTML within CML documents - i.e. the
CML document 'owns' the HTML. It's not clear in MathML which document
contains chunks of the other (this is a general XML/HTML problem which
has to be addressed).
MathML also provides for a subset of HTML within the <MATH> container - I
assume it's a subset because it has to be processed and rendered by the
MathML processor and I'm extremely sympathetic to this problem - I've spent
far too much time hacking HTML rendering.
At present I favour a solution where CML (and MathML) are separated from
the HTML and connected by XML-LINK as in the previous section.
XML should investigate mechanisms for HTML and *ML interoperability.
Interoperability with CML
AFAICS there are no namespace collisions between the MathML tagset and
CML so it's straightforward to write:
<!DOCTYPE CML SYSTEM "cml.dtd" [
<!ENTITY % mathml SYSTEM "http://www.w3.org/some/where/mathml.dtd">
%mathml;
]>
and then use MathML tags. This is more luck than good planning :-), but
CML has been careful to restrict its tagset.
Linking between variables
If I write:
x = y + 3 (I)
2x = y + 4 (II)
I would 'normally' deduce that
x = 1 (III)
y = -2 (IV)
However, there is nothing in MathML AFAICS that allows one to specify
that the 'x' in (I) is the same x as in (II). [Please forgive me if I've
missed this]. For many applications we need to label a variable or function
as having the same value and semantics throughout a document, e.g.
'Determination of <A HREF="#c"> the velocity of light</A>'.
In this example I would point to some central target which represented
the variable 'c', though I'm not clear how MathML would manage this in
equations. This is a very important requirement for re-usable scientific
publications, though perhaps ambitious at this stage.
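The deduction of (III) and (IV) from (I) and (II) only works because both
equations refer to the same x and the same y. The sketch below makes that
shared identity explicit by keying the coefficients on a variable name; the
data layout and the function name solve_2x2 are invented for illustration
and say nothing about how MathML might encode such a link.

```python
from fractions import Fraction

# Each equation is stored as (coefficients keyed by variable id, right-hand side).
eq1 = ({"x": 1, "y": -1}, 3)   # (I)   x = y + 3   rewritten as  x - y = 3
eq2 = ({"x": 2, "y": -1}, 4)   # (II)  2x = y + 4  rewritten as 2x - y = 4

def solve_2x2(e1, e2, names=("x", "y")):
    """Solve two linear equations in two shared variables by Cramer's rule."""
    (c1, r1), (c2, r2) = e1, e2
    a, b = c1[names[0]], c1[names[1]]
    c, d = c2[names[0]], c2[names[1]]
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("equations are not independent")
    return {names[0]: (r1 * d - b * r2) / det,
            names[1]: (a * r2 - r1 * c) / det}

solution = solve_2x2(eq1, eq2)
print(solution["x"], solution["y"])
```

If the 'x' in (II) were keyed differently from the 'x' in (I), the system
would have three unknowns and no unique solution, which is why the markup
needs some way of stating that the two are the same variable.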
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)