Another look at namespaces

James Tauber jtauber at
Mon Sep 20 15:21:07 BST 1999

----- Original Message -----
From: Simon St.Laurent <simonstl at>
> >You don't actually need the "vocabulary". The alphabet of a formal
> >language is part of the grammar.
> In XML-based languages that rely on DTDs or schemas, yes.  But in all
> formal languages?

Yes. The grammar includes the symbols it uses.

> Seems that it wouldn't be hard to create a formal
> language that had classes of vocabulary (like noun, verb, adjective) and
> fit them into patterns (subject[noun]-verb[verb]-object[noun]) that were
> separate.

This separation merely partitions the grammar into the productions that
take penultimate symbols to terminal symbols and all the other productions.

    [1] Sentence -> NP VP
    [2] VP -> V NP
    [3] NP -> Simon
    [4] NP -> XML
    [5] V -> likes

What you are talking about is splitting productions 3-5 from 1-2. This is
often done in natural language processing and many theories of (natural)
language make a distinction between the lexicon and the syntactic rules. But
we are talking about formal languages, not natural languages.

> It's that, but it's also worse.  Suppose you have a nice modular DTD that
> expresses most of the vocabulary a user will need to create documents of a
> certain type, but has ANY sections so that users can organize it any way
> they like.  Users build sets of DTDs to see what exactly it is they're
> getting or producing, but all of the possibilities are actually open.  Is
> the language described by the 'master' DTD, which doesn't get you very
> far?  Or is the language described by the particular DTDs?  Or do we measure
> interoperability?  A 'master DTD' containing all possibilities will
> grow obese.

I'm not sure I understand what you are saying here. When a user pieces
together bits of different DTDs, they end up with a *single* DTD. This is a
single grammar defining a single set of valid instances.

> Then there's the simpler case of well-formed documents, for which we can
> _derive_ grammars, but can't make definitive statements above the level of
> XML 1.0 conformance.

Pardon? A grammar for well-formed documents doesn't need to be derived
because it is in the XML 1.0 REC. It is a BNF augmented by WFCs and the odd
bit of prose.
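As a rough illustration, a non-validating parser checks exactly this grammar
and nothing more. The sketch below uses Python's standard
xml.etree.ElementTree (built on expat); the helper name is_well_formed is
made up for the example, and no DTD is involved anywhere.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Raised by the parser for any well-formedness error it detects.
        return False
    return True
```

Here is_well_formed("<a><b/></a>") is true even though no element is declared
anywhere, while is_well_formed("<a><b></a>") is false because the tags do
not nest properly.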

> I think 'formal language' in that sense is not especially useful except in
> limited situations, and should probably be reserved for the few cases where
> XML development is limited to representations of older legacy systems that
> relied on formal languages based on that sense.  XML itself, it seems, can
> do better than that.

It can. But formal languages are part of the picture because sometimes there
are syntactic constraints. They might be loose, but they are still a

> It depends on what kind of 'formalizing' you want to do.  In many cases,
> I'd suggest that we focus on 'relaxing', producing more flexible models
> that aren't so concerned about locking everything down into a single
> grammar and a single vocabulary.  It requires a change of mindset.

A formal grammar is still a formal grammar even if it permits any of the
terminal symbols in any order. A more flexible model is still a model. The
moment you model the syntax, you have a formal grammar.
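To make the loosest case concrete, here is a small Python sketch, assuming
the same dict-of-lists representation of productions. The names
free_grammar, derive and accepts are hypothetical. For an alphabet of
terminals it builds the productions S -> (empty) and S -> t S for each
terminal t, which permit any of the terminals in any order, yet still
reject anything outside the alphabet.

```python
def free_grammar(terminals):
    """Productions S -> (empty) | t S, one alternative per terminal t."""
    return {"S": [[]] + [[t, "S"] for t in sorted(terminals)]}


def derive(symbols, tokens, grammar):
    """True if the symbol sequence can derive exactly the token sequence."""
    if not symbols:
        return not tokens
    head, rest = symbols[0], symbols[1:]
    if head not in grammar:  # terminal: must match the next token
        return bool(tokens) and tokens[0] == head and derive(rest, tokens[1:], grammar)
    return any(derive(list(rhs) + rest, tokens, grammar) for rhs in grammar[head])


def accepts(sentence, grammar):
    return derive(["S"], sentence.split(), grammar)
```

With g = free_grammar({"Simon", "XML", "likes"}), accepts("likes likes Simon", g)
and accepts("", g) both hold, while accepts("Simon hates XML", g) does not,
since "hates" is not a symbol of the grammar.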

> Why is it that only one validating Java parser allows the application to
> continue after a validity constraint (not a well-formedness constraint)
> has been violated?

Because the others are wrong.

> I suspect it's because a lot of folks are taking the 'formal grammar' of
> DTDs more seriously than the XML 1.0 spec itself does...

But that has nothing to do with the value of formal grammars. If I present
you with a CFG modelling English and refuse to listen to you unless your
sentences parse to my CFG, that isn't a problem with my CFG *or* the notion
of CFGs in general.

> I don't think we're incompatibly far apart

I actually agree with you completely in pretty much everything but

> I just would like folks to look at 'formal languages' a bit more closely
> and a bit more critically.  Rick Jelliffe's made excellent arguments in
> other postings on this thread, for example, regarding the ways formal
> languages can obscure as well as illuminate. Right now, I think we need to
> contemplate whether 'formal grammars' sufficiently distinguish 'languages'
> in practice before putting extra work for programmers and authors
> (namespaces) on every formal grammar that comes our way.

I think the XML community would generally agree that:

1. certain classes of formal grammar are not sufficient for the syntactic
constraints people wish to express
2. syntax isn't all there is

Linguists worked these out well before you and I were born, Simon :-)
I think SGMLers did too, which is one of the reasons that a Document Type
Definition in SGML includes semantics as well as syntax (see another post
where I follow on from Rick's comments relating to this).

As far as I can tell, no one is arguing that formal grammars are all we
need. I am merely trying to clarify what formal grammars are so that people
understand what is meant when someone says that a language has a grammar or
that a DTD is a grammar.

