Deterministic Content Models ?

Richard Goerwitz richard at goon.stg.brown.edu
Sun Sep 13 22:20:02 BST 1998


Philippe Le Hégaret wrote:

> > Is (paragraph*)* a deterministic content model ?
> > If yes, so I think (a+ | b)* is a deterministic content model too.
> > >
> > >   it is an error if an element in the document can match more
> > >   than one occurrence of an element type in the content model.
>
>   I don't totally agree with you, because if you write the
> sequence like this:
>
>     (a, a*)*
>
> is it still deterministic ? For me no, because there are
> two states in this content model. (a+)* is the same case and
> (a+ | b)* too.

Looks like everybody is more or less correct.

The whole point of flagging nondeterministic content models (which
is what SGML did, and XML may optionally do) is that
nondeterministic content models often indicate logic errors by the
writer.

Put somewhat differently, if a DTD writer composes a content model
that allows a given sequence of elements to be processed in more
than one way, this often indicates an error.

So, for example, with (a, a*)*, it's hard to imagine what is
intended, because a single <a/><a/> could match two instances of
(a, a*), or one instance of (a, a*), depending on how you go
through the automaton.  Processors may, incidentally, flag (a+)*
as "ambiguous", since a+ is usually implemented as (a, a*).
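
To make the <a/><a/> case concrete, here is a minimal Python sketch
that lists every way a run of n consecutive <a/> elements can be
divided into passes through (a, a*).  The function name groupings is
made up for this illustration, and each number in a result is simply
how many <a/> elements one pass consumed.

```python
def groupings(n):
    """Yield every way to split n consecutive <a/> elements into
    non-empty runs, each run being one pass through (a, a*)."""
    if n == 0:
        yield []
        return
    # The first pass through (a, a*) may consume 1..n elements;
    # the remaining elements are grouped recursively.
    for first in range(1, n + 1):
        for rest in groupings(n - first):
            yield [first] + rest


# The two readings of <a/><a/> described above:
# two passes of one element each, or one pass of two elements.
for g in groupings(2):
    print(g)
```

The number of readings grows quickly with the length of the run,
which is part of why such a model is hard to reason about.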

Such ambiguities create unintended differences in how the same
input might be processed by different software.  Or they simply
lead to the input being processed in a way that surprises the user
(or worse yet, the programmer).

That's why I think it's a good idea for validators, in particular,
to flag "ambiguous" content models aggressively.

To test these sorts of things is easy enough.  Just make up a toy
DTD and run it through a good validator.  Take, for example, the
following (where elements x, y, and z should get flagged as
"ambiguous"):

<!DOCTYPE test [
  <!ELEMENT test ANY>
  <!ELEMENT a EMPTY>
  <!ELEMENT b EMPTY>
  <!ELEMENT w (a*)*>
  <!ELEMENT x (a+ | b)*>
  <!ELEMENT y (a, a*)*>
  <!ELEMENT z (a+, b?, a+)>
]>

<test></test>

Yes, as always, you can try this out with the validator at:

  http://www.stg.brown.edu/service/xmlvalid/
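
For anyone curious why w passes while x, y, and z get flagged, here
is a rough Python sketch of one common way to test for determinism:
number every element occurrence in the model, compute which
occurrences can come first and which can follow which, and complain
if two different occurrences of the same element type compete at any
point.  This is only an assumption about how a checker might work,
not a description of the validator above.  All the names (sym, seq,
alt, star, opt, plus, expand_plus, is_deterministic) are invented for
this example.

```python
from itertools import count

# Tiny constructors for content models (hypothetical helper names).
def sym(name): return ("sym", name)
def seq(*parts): return ("seq", parts)
def alt(*parts): return ("alt", parts)
def star(x): return ("star", x)
def opt(x): return ("opt", x)
def plus(x): return ("plus", x)


def expand_plus(node):
    """Rewrite every x+ as (x, x*), as many processors do internally."""
    kind = node[0]
    if kind == "sym":
        return node
    if kind in ("seq", "alt"):
        return (kind, tuple(expand_plus(p) for p in node[1]))
    inner = expand_plus(node[1])
    if kind == "plus":
        return seq(inner, star(inner))
    return (kind, inner)


def is_deterministic(model):
    """True if no two occurrences of one element type ever compete."""
    names = {}    # position number -> element type name
    follow = {}   # position number -> positions that may come next
    counter = count(1)

    def walk(node):
        """Return (nullable, first, last) and fill in follow sets."""
        kind = node[0]
        if kind == "sym":
            p = next(counter)
            names[p] = node[1]
            follow[p] = set()
            return False, {p}, {p}
        if kind == "seq":
            nullable, first, last = True, set(), set()
            for part in node[1]:
                n, f, l = walk(part)
                for p in last:          # what ended the prefix ...
                    follow[p] |= f      # ... may be followed by this part
                if nullable:
                    first |= f
                last = (last | l) if n else l
                nullable = nullable and n
            return nullable, first, last
        if kind == "alt":
            nullable, first, last = False, set(), set()
            for part in node[1]:
                n, f, l = walk(part)
                nullable = nullable or n
                first |= f
                last |= l
            return nullable, first, last
        # star, plus, opt
        n, f, l = walk(node[1])
        if kind in ("star", "plus"):
            for p in l:                 # loop back to the start
                follow[p] |= f
        return (n if kind == "plus" else True), f, l

    def clash(positions):
        seen = [names[p] for p in positions]
        return len(seen) != len(set(seen))

    _, first, _ = walk(model)
    return not clash(first) and not any(clash(f) for f in follow.values())


# The four content models from the toy DTD above.
models = {
    "w": star(star(sym("a"))),
    "x": star(alt(plus(sym("a")), sym("b"))),
    "y": star(seq(sym("a"), star(sym("a")))),
    "z": seq(plus(sym("a")), opt(sym("b")), plus(sym("a"))),
}

for name, model in models.items():
    print(name, is_deterministic(model), is_deterministic(expand_plus(model)))
```

The second column treats a+ as a single looping occurrence; the third
first rewrites a+ as (a, a*).  Under this sketch, x is only reported
as nondeterministic after that rewrite, which matches the remark
above about how a+ is usually implemented, while y and z are
reported either way and w is accepted either way.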

-- 

Richard Goerwitz
PGP key fingerprint:    C1 3E F4 23 7C 33 51 8D  3B 88 53 57 56 0D 38 A0
For more info (mail, phone, fax no.):  finger richard at goon.stg.brown.edu
