PCDATA

Peter Murray-Rust Peter at ursus.demon.co.uk
Wed May 7 17:40:44 BST 1997


Thanks Gavin,

In message <199705071512.LAA12189 at nathaniel.ebt> gtn at eps.inso.com (Gavin Nicol) writes:
> >Norbert's answers agree with what I got and also with the consensus
> >of the group.  It's clear that WF files can give *different* data from
> >those with some or all of the ELEMENT declarations.  I do not find the 
> >behaviour intuitive and believe we have to address it in some manner.
> 
> Agreed. I believe that RE delenda est solves the problems.

I am not sure that I was on board for this discussion (I have been told 
that whitespace occupied a large number of bytes last year :-).  A summary
of "RE delenda est" would be useful - it clearly has a good pedigree.  Is
it a language or an implementation issue?

> 
> >I am sympathetic to trashing the whitespace PCDATA elements, but there is
> >no clear idea of how.
> 
> The SGML rules are not always intuitive either....
> 
> >There has rightly been concern about the conformance of parsers (esp. their
> >reaction to errors).  This is an area where I suspect conformance is 
> >non-trivial.
> 
> Validation of parsers should *certainly* extend to grove construction
> as well as error handling.
> 
Yes.  For those not on the WG, Jon has informed us that the likely major
implementers are keen on conformance, so this must surely be an early issue.
It suggests that we shall need some test data, and while this already exists
(torture) I am not sure that the outputs have been rigorously investigated.
Of course there is more than one type of output, and when I compare NXP's
output to Lark's I am comparing an ESIS stream to a tree of Elements
(but not a complete grove).

The discussion here and elsewhere makes it very clear that the *parser*
is a fundamental unit and that wherever possible it should be 
self-contained and independent of the 'application'.  That makes it even more
important for us to specify an API.

Please correct this, but I see three possible outputs from a parser:
	- a grove
	- an ESIS stream
	- a tree of elements, possibly with PIs attached to nodes.
We ought to be able to give reference outputs for each of these so that
implementers can check their parsers against them.
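To make the comparison concrete, here is a minimal sketch (not any parser's
actual output format) that runs one small document through Python's standard
expat binding to get a flat event stream, and through ElementTree to get a
tree of elements.  The "(", ")" and "-" markers are only loosely modelled on
ESIS; the helper names event_stream and element_tree are my own, and neither
result is a grove.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

DOC = "<doc>\n  <p>Hello</p>\n</doc>"


def event_stream(text):
    """Return a flat, ESIS-like list of (marker, value) events."""
    events = []

    def on_text(data):
        # expat may deliver character data in pieces; merge adjacent chunks
        if events and events[-1][0] == "-":
            events[-1] = ("-", events[-1][1] + data)
        else:
            events.append(("-", data))

    parser = expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: events.append(("(", name))
    parser.EndElementHandler = lambda name: events.append((")", name))
    parser.CharacterDataHandler = on_text
    parser.Parse(text, True)
    return events


def element_tree(elem):
    """Return a nested (name, text, children, tail) tuple for an element."""
    return (elem.tag, elem.text, [element_tree(c) for c in elem], elem.tail)


events = event_stream(DOC)
tree = element_tree(ET.fromstring(DOC))
```

Note that the whitespace-only data between <doc> and <p> shows up in both
forms, but in different places: as its own event in the stream, and as the
text of the parent element in the tree.  That is the kind of difference a
set of reference outputs would have to pin down.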

What concerns me at present is that the handling of some functions (e.g.
XML-SPACE) may vary between parsers and that this could be extremely
difficult to pin down in a monolithic application.  I'd recommend that
whichever of the outputs above is used, it should be possible to tap into it.

It's also clear that applications must recognise certain *attributes*.  At
present these seem to be:
	XML-SPACE
	XML-LINK 
	ROLE
	HREF
	TITLE
	SHOW
	ACTUATE
	BEHAVIOR

Most of these are non-trivial.  For example, XML-SPACE extends to an
element's children, so they have to be stamped with it, but when editing a
tree the attribute may need to disappear from relocated children.  XML-LINK
is quite complex and affects the content of elements (XML-LINK="EXTENDED").
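As a rough illustration of the XML-SPACE point, the sketch below walks a tree
and works out which value is in force for each element by inheriting from the
nearest ancestor that sets it.  The values "DEFAULT" and "PRESERVE", the
fallback to "DEFAULT" at the root, and the function name effective_space are
assumptions for the example, not agreed semantics.

```python
import xml.etree.ElementTree as ET

DOC = (
    '<doc XML-SPACE="PRESERVE">'
    "<a><b/></a>"
    '<c XML-SPACE="DEFAULT"><d/></c>'
    "</doc>"
)


def effective_space(root, attr="XML-SPACE", default="DEFAULT"):
    """Return (element name, value in force) pairs in document order."""
    result = []

    def walk(elem, inherited):
        # An explicit attribute overrides whatever the parent passed down
        value = elem.get(attr, inherited)
        result.append((elem.tag, value))
        for child in elem:
            walk(child, value)

    walk(root, default)
    return result


space = effective_space(ET.fromstring(DOC))
```

Computing the value on demand like this, rather than stamping it onto each
child, is one way to sidestep the editing problem: a relocated child simply
picks up whatever is in force at its new position.  Whether that is the
behaviour we want is of course part of the question.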

Is there a case for, and is it possible to have, a PRE-application module
that deals with attributes and other generic stuff?  This would also
help people to converge on a single interpretation.  I'd feel much happier 
about telling a pre-application with carefully argued semantics what to
do with whitespace or link structure validation than trusting to any old 
application.
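A pre-application of this kind could be as small as a filter sitting between
the parser's event stream and the application.  The sketch below is purely
hypothetical: the event tuples, the name pre_application, and the policy
(drop whitespace-only text unless XML-SPACE="PRESERVE" is in force) are all
assumptions chosen to show the shape of the idea, not a proposal for the
actual rules.

```python
def pre_application(events, attr="XML-SPACE"):
    """Yield events, dropping whitespace-only text unless PRESERVE is in force."""
    stack = ["DEFAULT"]  # value in force outside the root element (assumed)
    for event in events:
        kind = event[0]
        if kind == "start":
            _, _name, attrs = event
            # Inherit from the enclosing element unless set explicitly
            stack.append(attrs.get(attr, stack[-1]))
            yield event
        elif kind == "end":
            stack.pop()
            yield event
        elif kind == "text":
            if stack[-1] == "PRESERVE" or event[1].strip():
                yield event
        else:
            yield event  # pass anything else (e.g. PIs) through untouched


raw = [
    ("start", "doc", {}),
    ("text", "\n  "),
    ("start", "pre", {"XML-SPACE": "PRESERVE"}),
    ("text", "  "),
    ("end", "pre"),
    ("text", "\n"),
    ("end", "doc"),
]
filtered = list(pre_application(raw))
```

The application then sees only the filtered stream, so the whitespace policy
lives in one place and can be tested on its own.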

	P.



-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)



