DOCTYPE (was Re: Announcement: SAX 1998-01-12 Draft)

Wed Jan 14 01:59:01 GMT 1998

Peter Murray-Rust wrote:
> Thanks. I wasn't aware of this. We need something like it. It does, of
> course, rely on building a significant registry for FPIs. As far as I
> remember from previous discussions very few FPIs are registered at
> present, and the mechanism is not widely known. If this mechanism is to
> become popular for XML - before the WWW gets swamped with untyped
> documents without meaningful FPIs - there needs to be a lot of effort to
> publicise and implement it.

I think we need to add an architecture system-id attribute, which would
let you provide a URL for the architecture definition document.  I don't
think that would be a controversial change (but you never know). 

Of course, you'd really want to use a URN for that, which is what a
public ID is (and it can literally be one, syntactically, if you don't
require formal public IDs).  It's not really a question of registering FPIs, it's a
question of making it clear what the abstract type is *to a human
observer*.  From a code perspective, either all you care about is the
architectural declarations (so you can validate with respect to them),
or you have the FPI hard-coded into a table of architectures that you
understand.  The most you have to do is implement the normalization
rules for minimum literals (i.e., squeeze out non-significant white
space).
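For the hard-coded-table case, here is a minimal Python sketch of what that amounts to. It assumes the usual minimum-literal rule (each run of white space becomes one space, and leading and trailing white space is dropped); the table, its single entry, and the FPI in it are hypothetical examples, not registered identifiers:

```python
import re

# Hypothetical table of architectures this processor "understands",
# keyed by the normalized public identifier of each architecture.
KNOWN_ARCHITECTURES = {
    "-//Example//DTD Kimber Paragraph Architecture//EN": "kimber",
}

def normalize_public_id(literal: str) -> str:
    """Collapse each run of white space to one space and trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", literal).strip(" ")

def lookup_architecture(public_id: str):
    """Return the local handler name for a public ID, or None if unknown."""
    return KNOWN_ARCHITECTURES.get(normalize_public_id(public_id))

print(lookup_architecture("  -//Example//DTD Kimber\n   Paragraph Architecture//EN "))
# → kimber
```

Matching on the normalized form is what lets two public IDs that differ only in line breaks or spacing resolve to the same architecture.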

> 
> As I understand it, these PIs are *permitted* in XML (any PI is
> permitted) but they are given no special importance and implementers
> are not required to support them. So XML - as it stands today - has no
> mechanism for requiring this to be implemented or interpreted.

It could, by using this mechanism as the basis for the name-space
proposal. Note that XML doesn't have much choice of syntax: it can't add
a new declaration unless SGML does as well (which seems likely as part of
the revision, but that's a ways off still).  You can't use element
attributes because that imposes on the document's private name space. So
that only leaves notations and PIs.  Notations are out because XML
doesn't provide data attributes (which you need in order to configure how
the architecture is used), so that leaves PIs.  Thus, whatever you come up
with will look very much like the PI defined in N1957 (the proposed
HyTime amendment).
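Since XML itself treats a PI's content as an opaque string, a processor that acts on an `IS10744:arch` PI has to pull the name="value" pairs out of it on its own. Here is a minimal, lenient Python sketch of that step; the function name and the regular expression are illustrative assumptions, not anything N1957 defines. It takes the target and data strings that a SAX `processingInstruction` callback receives and quietly skips anything that doesn't look like a quoted pseudo-attribute:

```python
import re

# Matches name="value" or name='value' pairs inside a PI's data string.
PSEUDO_ATTR = re.compile(r"""([A-Za-z][-A-Za-z0-9._:]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")

def parse_arch_pi(target: str, data: str):
    """Return the pseudo-attributes of an IS10744:arch PI as a dict,
    or None if the PI has some other target."""
    if target != "IS10744:arch":
        return None
    attrs = {}
    for match in PSEUDO_ATTR.finditer(data):
        name, double_quoted, single_quoted = match.groups()
        # Exactly one of the two value groups participated in the match.
        attrs[name] = double_quoted if double_quoted is not None else single_quoted
    return attrs

print(parse_arch_pi("IS10744:arch", 'name="kimber" dtd-systemid="kimber.dtd"'))
# → {'name': 'kimber', 'dtd-systemid': 'kimber.dtd'}
```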

Besides, talking about "required to support" is meaningless because it's
not a syntactic issue--it's a semantic processing issue and you can
never require semantic processing in a syntactic spec. You can require
it in a semantic spec, such as HyTime or DSSSL or XML Link, but not in
XML Lang. Or said another way, even if you make the true document type
painfully clear to me, I am still free to ignore that information during
processing.

If the facility has value, systems will support it.

Note also that in the simple case, where the document could use the
architectural DTD as its own if it cared to, the mapping can be
completely automatic (by the rules of default architectural mapping). 
In other words, if my architecture defines an element called "foo" and
my document, derived from that architecture, has an element called
"foo", then my foo is taken to be the architectural foo unless you tell
me otherwise, without the need to explicitly map it.  Thus, any document
with an explicit DTD can use that same DTD as an architecture without
changing the instance.  For example, I can go from this:

<?xml version="1.0"?>
<!DOCTYPE foo SYSTEM "foo.dtd" >
<foo/>

To this:

<?xml version="1.0"?>
<?IS10744:arch name="foo" dtd-systemid="foo.dtd"?>
<foo/>

With exactly the same processing effect, except that no validating XML
processor is *required* to process the declarations (but it can if it
wants to, after XML Lang-required validation is done).  This solves the
problem of wanting to limit declarations to external subsets: you make
the declarations architectural DTDs.  The authors of individual
documents can't modify the architectural DTD and any local declarations
don't affect it (only the local mapping to the architecture), so
architecture-based processors can be confident in worrying only
about the element types and attributes defined in the architectural
DTD--they simply ignore anything in the base document that isn't
architectural (that is, that isn't mapped to something in the
architecture).
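A rough sketch of how an architecture-based processor can ignore everything that isn't architectural, using Python's SAX API (`xml.sax`). It handles only default, name-based mapping, and it takes the set of architectural element types as a hard-coded argument instead of reading foo.dtd; both are simplifying assumptions of this sketch, and the `ArchitecturalFilter` class is hypothetical. The XML declaration is written in the final XML 1.0 form, which Python's expat-based parser requires:

```python
import xml.sax

class ArchitecturalFilter(xml.sax.ContentHandler):
    """Sketch of an architecture-aware processor that uses default
    (name-based) mapping only. The caller supplies the architectural
    element types; reading them from the architectural DTD is out of
    scope here."""

    def __init__(self, arch_elements):
        super().__init__()
        self.arch_elements = set(arch_elements)
        self.arch_pis = []     # raw data of each IS10744:arch PI seen
        self.arch_events = []  # element types that map to the architecture

    def processingInstruction(self, target, data):
        if target == "IS10744:arch":
            self.arch_pis.append(data)

    def startElement(self, name, attrs):
        # Default mapping: an element is architectural only if an
        # architectural element type has exactly the same name.
        if name in self.arch_elements:
            self.arch_events.append(name)
        # Anything else is non-architectural and simply ignored.

doc = b"""<?xml version="1.0"?>
<?IS10744:arch name="foo" dtd-systemid="foo.dtd"?>
<foo><bar/><foo/></foo>"""

handler = ArchitecturalFilter({"foo"})
xml.sax.parseString(doc, handler)
print(handler.arch_pis)     # ['name="foo" dtd-systemid="foo.dtd"']
print(handler.arch_events)  # ['foo', 'foo']
```

The `bar` element never reaches the architectural layer; the processor neither needs nor looks at any declaration for it.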

This solves the problem that RDF is seeing, where they want to be able
to disallow declarations in RDF documents but still have some formal
specifications somewhere, without imposing the burden of declaration
awareness on all RDF-aware processors.  With this approach they can do
that.

Here's one more trick.  Say in your architecture you define your element
type and attribute names using colons.  For example, consider this
simple architectural DTD:

<!-- My simple architecture -->
<!ELEMENT kimber:para (#PCDATA) >
<!-- End of architectural DTD -->

And this one:

<!-- Another simple architecture -->
<!ELEMENT woods:para (#PCDATA) >
<!-- End of architectural DTD -->

And this document derived from both:

<?xml version="1.0"?>
<?IS10744:arch name="kimber" dtd-systemid="kimber.dtd" ?>
<?IS10744:arch name="woods" dtd-systemid="woods.dtd" ?>
<foo>
<kimber:para>This is a kimber paragraph</kimber:para>
<para>This is not a kimber paragraph</para>
<woods:para>This is a woods paragraph</woods:para>
</foo>

This looks just like all the colonized name proposals, but it's simply
taking advantage of the automatic name mapping of architectures: the
name "kimber:para" matches the name "kimber:para" in kimber.dtd.  Of
course, the downside is that if you want to map an element to two forms,
you have some redundancy:

<kimber:para woods="woods:para">This is both a kimber and woods para
</kimber:para>

But you can't have everything--at least you can *do* multiple mappings. 
You could also provide two different versions of the architectural DTD:
one with colons and one without.  You'd use the colonized one for the
architecture you use the most and the non-colonized version for the
others. The architecture definition document would explain the
correlation between colonized and non-colonized versions of
architectural elements for the benefit of implementors.
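The same default-mapping rule extends to several architectures at once. The following Python sketch, built on `xml.sax`, records for each element the form it maps to in each architecture: an explicit attribute named after the architecture (as the `woods="woods:para"` attribute above suggests) wins, and otherwise an element whose name exactly matches an architectural element type maps to it. The `ARCHITECTURES` table and the `MultiArchMapper` class are hypothetical stand-ins for reading kimber.dtd and woods.dtd, and the sketch does not interpret the PIs themselves:

```python
import xml.sax

# Hypothetical element-type sets standing in for kimber.dtd and woods.dtd.
ARCHITECTURES = {
    "kimber": {"kimber:para"},
    "woods": {"woods:para"},
}

class MultiArchMapper(xml.sax.ContentHandler):
    """Records, for every element, which architectural form (if any)
    it maps to in each architecture."""

    def __init__(self, architectures):
        super().__init__()
        self.architectures = architectures
        self.mappings = []  # list of (element name, {arch name: arch form})

    def startElement(self, name, attrs):
        forms = {}
        for arch, elements in self.architectures.items():
            # An explicit mapping attribute named after the architecture
            # takes precedence...
            explicit = attrs.get(arch)
            if explicit is not None:
                forms[arch] = explicit
            # ...otherwise fall back to default mapping by identical name.
            elif name in elements:
                forms[arch] = name
        self.mappings.append((name, forms))

doc = b"""<?xml version="1.0"?>
<?IS10744:arch name="kimber" dtd-systemid="kimber.dtd"?>
<?IS10744:arch name="woods" dtd-systemid="woods.dtd"?>
<foo>
<kimber:para>This is a kimber paragraph</kimber:para>
<para>This is not a kimber paragraph</para>
<woods:para>This is a woods paragraph</woods:para>
<kimber:para woods="woods:para">This is both</kimber:para>
</foo>"""

handler = MultiArchMapper(ARCHITECTURES)
xml.sax.parseString(doc, handler)
for name, forms in handler.mappings:
    print(name, forms)
```

This prints one line per element: `foo` and `para` map to nothing, the plain `kimber:para` and `woods:para` map into their own architectures by name, and the last `kimber:para` maps into both.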

Cheers,

Eliot

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)