Proposed process for DTDs in XML

Peter Murray-Rust peter at ursus.demon.co.uk
Sun May 24 21:14:31 BST 1998


At 17:16 24/05/98 UT, Simon St.Laurent wrote:
>It sounds like we have a lot of genuine interest in using XML syntax for
>DTDs.

Agreed. 

>
>I remain somewhat concerned that we are potentially duplicating the work of
>others, particularly XML-Data and RDF.  The narrowness of scope that Peter
>has

I think this was initially a worry about SAX as well - i.e. it would be a
'subset' of the DOM. However SAX and the DOM co-exist very well - I have
had feedback in-real-life saying that SAX was exactly what they wanted and
the DOM was too large and confusing. Similarly I can see that 'XMLDTD' (or
whatever) could find a useful niche.


>I'm willing to devote a considerable amount of time to this project.  I do 

Great. 

>have two books to write this summer and I'm moving in June, but I think I
>can
>work around those obligations.  Additional volunteers (in addition to 
>contributors) may be needed as this project develops. Unless someone else 
>would like the position, I'm willing to attempt to do as good a job as David 
>Megginson did with SAX.

There is no doubt that this is a proven way.  With SAX the 'original
process' was - as David pointed out - finished within 4 weeks, but other
fundamental problems were encountered. Because SAX was the first API for
XML, David and the rest of us had to face *generic* problems that hadn't
been foreseen, particularly Exceptions and encodings.

>
>As for the project and the process, there are a number of things to work
>out, before we even get down to syntax.

Agreed. One fundamental aspect is whether there needs to be any software or
APIs. Although theoretically SAX could have been published purely as an API
specification, in practice it was critical that David produced an
implementation as proof of concept (and also for 'marketing').


>
>- Name of Project: Peter has referred to the project as DTDXML; at one point 
>Paul Prescod referred to this type of DTD with the identifier xdtd.  I
>avoided naming it in my original proposal.  Suggestions?

Agreed we need one. This was purely shorthand - I have no strong feelings.
I think that this should carry the idea of a DTD rather than a schema as
hopefully that will unite people across the spectrum.

>
>- Scope of Project: Are the schemas defined by this project intended to
>map a subset of current XML practice?  No one seems interested in making
>parameter

It's almost too early to say what current practice *is*. For example I have
not yet seen XML documents that use NOTATION. This doesn't mean it's not
important - merely that I would be unable to comment from experience.

>entities, for instance, a part of this project. I suggested in my original
>proposal that parameter entities might in fact be unnecessary in this type
>of project.  Tim also brought up the question of "obscure" attribute types.
>(Data typing in general is a key issue, but one I'd like to avoid for right
>now.)

I agree it should be avoided in this project. For the historical record,
TimB wrote a note about a year ago on XML-type. It seemed an obvious and
valuable thing to do. I am not sure why it didn't go further - perhaps
because of XML-Data.

However I think that this project should envision that there may well be an
orthogonal XML-type activity of some sort and should make sure that nothing
undermines this possibility.

>There are always the internal/external battles as well. We'll need to figure
>out exactly what parts of XML are worth including.  Eventually, it may prove
>possible to map everything, but I think we'll be better off starting with
>less and building a firm base.  We need to define an achievable set of goals
>early.

Agreed. As I said before, I'd recommend starting with those aspects of the
DTD that survive the normalisation process. This would mean that the
distinction between the internal and external DTD subsets would not be
included, nor would PEs.

After further elucidation from EliotK last week, it became clear that I
had failed to understand external entities. (I had not understood the
purpose of NDATA or how it differs.) I would urge that we start without the
complete spectrum of entities since I suspect they will be difficult.

Another way of tackling it is to suggest that we manage those components
which are most relevant to SAX (ELEMENTs and ATTRIBUTEs and some aspects of
ENTITYs).  IOW 'what does a SAX user want from DTD information'?
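
One way to get a feel for the ELEMENT and ATTRIBUTE information an event-based parser can hand to an application is the sketch below. It is only an illustration and it does not use SAX itself: it relies on the declaration callbacks in Python's expat binding (`ElementDeclHandler` and `AttlistDeclHandler`). The sample `memo` document and the `collect_declarations` helper are invented for this example.

```python
import xml.parsers.expat

# Hypothetical sample document with an internal DTD subset.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (to+, body?)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ATTLIST memo priority (high|low) "low">
]>
<memo><to>xml-dev</to></memo>"""


def collect_declarations(data):
    """Return (elements, attributes) gathered from DTD declaration events."""
    elements = {}     # element name -> content model tuple as reported by expat
    attributes = []   # (element, attribute, type, default, required) tuples

    def on_element(name, model):
        elements[name] = model

    def on_attlist(elname, attname, type_, default, required):
        attributes.append((elname, attname, type_, default, bool(required)))

    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = on_element
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(data, True)  # True marks this as the final chunk of input
    return elements, attributes


elements, attributes = collect_declarations(DOC)
for elname, attname, type_, default, required in attributes:
    print(elname, attname, type_, default, required)
```

The content model arrives as a nested tuple, which is already a small tree, so the same information could in principle be re-expressed in XML syntax.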

>
>- Linking to XML Documents: Tim asked "How do we associate the new schemas
>with document instances?" Current XML DTDs are defined/linked with the
>DOCTYPE declaration.  The DTDs defined with this project probably need a
>mechanism that indicates the DTD type - if they aren't converted to
>normalized XML 1.0 DTDs before validation, of course.  I'm not certain this
>mechanism should be defined in this proposal - linking schemas (and other
>supporting materials, like stylesheets) to documents seems to need another
>standard that isn't directly bound to this one.  Still, this issue needs
>consideration.

Good point. It may even be that we preprocess these XDTDs to conventional
DTDs. After all there will have to be *some* processing machinery for them,
even if it's only an XML parser and a transformation engine.
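
To make the preprocessing idea concrete, here is a minimal sketch of "an XML parser and a transformation engine" in Python. The XML vocabulary it reads (`dtd`, `element`, `seq`, `choice`, `ref`, `pcdata`, `attribute`, and the `occurs` attribute) is entirely made up for this example; no syntax has been agreed, and none of these names come from Simon's proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML-syntax DTD; every element and attribute name is assumed.
XDTD = """<dtd>
  <element name="memo">
    <seq>
      <ref name="to" occurs="+"/>
      <ref name="body" occurs="?"/>
    </seq>
  </element>
  <element name="to"><pcdata/></element>
  <element name="body"><pcdata/></element>
  <attribute element="memo" name="priority" type="CDATA" default="#IMPLIED"/>
</dtd>"""


def model_to_text(node):
    """Render one content-model node as conventional DTD syntax."""
    occurs = node.get("occurs", "")  # "", "?", "*" or "+"
    if node.tag == "ref":
        return node.get("name") + occurs
    if node.tag == "pcdata":
        return "#PCDATA"
    if node.tag in ("seq", "choice"):
        sep = ", " if node.tag == "seq" else " | "
        return "(" + sep.join(model_to_text(c) for c in node) + ")" + occurs
    raise ValueError("unknown content model node: " + node.tag)


def to_conventional_dtd(source):
    """Convert the hypothetical XML form into XML 1.0 declarations."""
    lines = []
    for decl in ET.fromstring(source):
        if decl.tag == "element":
            if len(decl) == 0:
                model = "EMPTY"
            else:
                model = model_to_text(decl[0])
                if decl[0].tag not in ("seq", "choice"):
                    model = "(" + model + ")"  # a lone particle needs parentheses
            lines.append("<!ELEMENT %s %s>" % (decl.get("name"), model))
        elif decl.tag == "attribute":
            lines.append("<!ATTLIST %s %s %s %s>" % (
                decl.get("element"), decl.get("name"),
                decl.get("type"), decl.get("default")))
    return "\n".join(lines)


print(to_conventional_dtd(XDTD))
```

A preprocessor along these lines would let existing validating parsers be used unchanged, at the cost of an extra conversion step before validation.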
>
>- Namespaces: My proposal used elements created without namespaces.  It
>seems like namespaces would be useful and/or necessary here.  It would
>probably be smart to decide on this _after_ the project has a name.

This could be a very useful area. It seems very likely that some processing
machinery will need to be present for DTDs that use namespaces (e.g. to
add/remove/edit prefixes) and this might combine with the current proposal.

>
>- Relation to other standards: As I said above, I don't want to stomp on the
>feet of the people working on RDF and XML-Data.  I'd like to see this
>proposal grow cooperatively, with little conflict with other proposals.  I
>also think the syntax presented for DTDs should be formally expressed using
>the XML 1.0 DTD syntax.  (If it can be described using RDF and XML-Data as
>well, so much the better.)

XML-Data seemed to have three components:
	- a typing proposal (integer, etc.)
	- a DTD representation (STAR, ONEORMORE, etc.)
	- relations

In my simple mind I could see these being disassembled into three separate
components. I would see the middle one as equivalent to what we are doing
and therefore the most fundamental (or at least the first we have to tackle).
>
>- Relation to XML-DEV: This forum is easily the best place to gather 
>high-level feedback, but I'm not sure everyone here wants to read all the 
>proceedings.  I feel strongly that this discussion will benefit by being 
>public, and public-domain.  Would it be acceptable to use XML-DEV, and make 
>sure that all discussion includes whatever namespace token we choose on the 
>subject line?  That would make it easy for people to filter in or out what 
>they want to read.  (I'm also aware that there are people paying by the byte 
>out there - if necessary, we could move to another space.)

I think the SAX process worked very well. David sent out questions with
precise topics and replies (either public or private) addressed these. I
suspect David had many hundreds of replies, so Simon (assuming you are
the coordinator), you will need to organise your mailbox well. Perhaps David
could give a very brief overview of any organisational problems you will hit.

>
>- Document Creation: We also need to decide how to create this document.  My
>original proposal (at http://members.aol.com/simonstl/xml/) has been
>chewed on fairly well.  It might still serve as a base (_minus_ all the
>specific syntax I proposed) for further development.  Do people want to
>proceed this way, or should we start from a clean outline?
>
>There's a lot to do, but it's definitely exciting!
>
I am assuming that part of the result will be the ability to express a DTD
(or equivalent) in XML syntax. That means it can be held as a tree. In that
case JUMBO (and many other tools) will naturally be able to hold the result
in memory and to display it in a variety of ways. So, to the extent that
that is useful, I can hopefully commit to being able to provide it.
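
As a small illustration of the "held as a tree" point (this is not JUMBO, just a generic sketch), any XML tree library can load such a document and display it without knowing the vocabulary. The sample below uses Python's ElementTree; the `dtd`, `element` and `ref` names in the sample input are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET


def outline(node, depth=0):
    """Return an indented outline of an element tree, one line per element."""
    attrs = " ".join('%s="%s"' % kv for kv in sorted(node.attrib.items()))
    label = node.tag + (" " + attrs if attrs else "")
    lines = ["  " * depth + label]
    for child in node:
        lines.extend(outline(child, depth + 1))
    return lines


# Hypothetical XML-syntax DTD fragment used only to exercise the outline.
SAMPLE = '<dtd><element name="memo"><ref name="to" occurs="+"/></element></dtd>'
print("\n".join(outline(ET.fromstring(SAMPLE))))
```

Because the walker only looks at tags and attributes, it would work the same way whatever names the project finally chooses.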

	Best of luck.

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



