SAX: problem areas (was Re: SAX: Whitespace Handling)

Peter Murray-Rust peter at ursus.demon.co.uk
Sun Jan 4 11:23:20 GMT 1998


Firstly, can I congratulate everyone on the very high standard and value of
the postings already.  I am relying implicitly on DavidM to corral them,
but they seem to be tending towards enough commonality that a synthesis is
possible. That synthesis will not give everyone everything they would like
but will be workable.

I am relying on DavidM to steer this if he feels:
	- he has got enough material on any one subtopic already. Please be
sensitive to any requests for discontinuation of postings on a subtopic.
	- any subtopic is getting too complex.
	- there is merit in further interim proposals, etc.
I am sure that you will all agree that we are very grateful to him and will
continue to make sure that the postings help in making his task possible :-)

There are some strategic issues that PaulP raises in his posting:

At 21:11 03/01/98 -0500, Paul Prescod wrote:
[...]
>
>You've mentioned this a few times, but I wonder if we are really making
>a spec. for people who are not familiar with XML itself. Ignorable

My impression is that although this is perhaps where some of us started
(including myself) we are producing something which relies on a thorough
understanding of XML. I am happy to go along with this. If SAX develops in
the current way, it will be much easier to build "newbie" interfaces on top
of it. [e.g. a newbie interface might omit any references to PIs.]

>whitespace is an unfortunate fact of life (and entities are a fortunate

I agree. If we do not support IWS (ignorable whitespace) then we shall
frustrate many of the currently experienced XML/SGML community who are a
major part of the XML implementation community.

>fact of life) and people who want to work with XML parsers should be
>familiar with XML concepts. All we should hide from them is the nitty
>gritty syntax.
> 
>> Tim Bray's recent comments on this list imply that a validating parser
>> using SAX could report ignorable whitespace as regular character data
>> and still be conforming; if I have inferred correctly, then I am
>> willing to omit this callback.
>
>Could someone please show me where the spec. provides leeway for this
>sort of thing? If SAX is meant to be usable with validating parsers
>(e.g. parsers which report validation errors), then I feel that it
>should support ignorable whitespace. On the other hand, if it is only
>interested in the well-formedness level, then of course this is
>irrelevant.
> 
PaulP has touched on two of the areas that I think will give us most
problems - "Validating parser" and "Whitespace". I agree with him that we
must address both of these.
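
To make the whitespace question concrete, here is a rough sketch using
Python's xml.sax module (a later binding of SAX, used here purely as an
illustration and not something proposed in this thread). It assumes the
default expat-based reader, which does not validate. The LIST/ITEM document
and the WhitespaceRecorder class are hypothetical names invented for the
example. The handler records what arrives through characters() and what
arrives through ignorableWhitespace().

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical document: LIST has element content, so the whitespace
# around ITEM is "ignorable" in the sense discussed above.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE LIST [
<!ELEMENT LIST (ITEM)*>
<!ELEMENT ITEM (#PCDATA)>
]>
<LIST>
  <ITEM>one</ITEM>
</LIST>"""


class WhitespaceRecorder(ContentHandler):
    """Keeps character data and ignorable whitespace in separate lists."""

    def __init__(self):
        super().__init__()
        self.character_data = []
        self.ignorable = []

    def characters(self, content):
        self.character_data.append(content)

    def ignorableWhitespace(self, whitespace):
        self.ignorable.append(whitespace)


def record(document):
    """Parse the document and return the handler with what it recorded."""
    handler = WhitespaceRecorder()
    xml.sax.parseString(document, handler)
    return handler


if __name__ == "__main__":
    result = record(DOC)
    print("characters():", result.character_data)
    print("ignorableWhitespace():", result.ignorable)
```

With a non-validating reader such as this one, the whitespace inside LIST
comes back as ordinary character data and the separate callback is never
used, which is the behaviour the quoted comment describes. A validating
parser would be in a position to route that whitespace to the separate
callback instead.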

I don't know if we need an additional question, "Should SAX support
validating parsers?", but if so, my answer would be YES.

I also take it as almost axiomatic that SAX should support everything in
the spec *relevant to those areas it addresses*.  IOW if it doesn't support
NOTATION it could ignore everything to do with that (e.g. NDATA,
NotationType) and might simply throw an Exception (SAX ignores NOTATION -
or whatever). [I am not making any judgment on NOTATION - but it is
possibly not a core component].

The problem with VPs (validating parsers) and IWS is that they are not
defined fully enough *in the spec* for their interpretation to be trivial.
You have to
read the spec extremely carefully and it often comes down to very small
details. [I actually believe that there is more variety of opinion among
the experts than some realise.] I am sure that in VPs and IWS we are moving
into uncharted territory. I suspect we may have to either come up with
minimalist or rather fuzzy implementations, which may get amended later in
the light of experience OR further spec revisions. [Remember that we
haven't seen the final result of the PR process - "minor changes" are still
allowed.]

On VPs, I think it would be very valuable if someone could list what they
think a "VP", or a "Beyond WF parser", should do. A BeyondWF parser (DavidM
uses the phrase DTD-driven parser, which we may wish to adopt) can produce
*different output* from a WF parser. It must normalise non-CDATA attribute
values, for example. It must also report things such as the occurrence of
IWS. It may, or must, also report additional violations. This document

<?xml version="1.0"?>
<!DOCTYPE FOO [
<!ELEMENT FOO EMPTY>
<!ATTLIST FOO xml:lang NMTOKEN "en">
]>
<FOO xml:lang="  fr "/>

will give different values for xml:lang according to whether the ATTLIST is
present (and used): "fr" when the NMTOKEN declaration is applied, and
"  fr " when the attribute is treated as CDATA. The document above has
enough information to be "validatable". Whether this "invokes a validating
parser" (if the parser is capable of it) is not clear to me.
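
The difference can be observed with a DTD-aware but non-validating parser.
The sketch below uses Python's xml.parsers.expat binding (my choice for
illustration, not a parser discussed in this thread); lang_of is a
hypothetical helper. It parses the document above once with its internal
subset and once with the DOCTYPE removed, and returns the xml:lang value
the application would see in each case.

```python
import xml.parsers.expat

# The document from the message, with its internal DTD subset.
WITH_ATTLIST = """<?xml version="1.0"?>
<!DOCTYPE FOO [
<!ELEMENT FOO EMPTY>
<!ATTLIST FOO xml:lang NMTOKEN "en">
]>
<FOO xml:lang="  fr "/>"""

# The same instance with no declarations at all.
WITHOUT_ATTLIST = """<?xml version="1.0"?>
<FOO xml:lang="  fr "/>"""


def lang_of(document):
    """Return the xml:lang value reported for the FOO element."""
    seen = {}
    parser = xml.parsers.expat.ParserCreate()

    def start_element(name, attributes):
        # attributes is a dict of attribute name -> reported value
        seen.update(attributes)

    parser.StartElementHandler = start_element
    parser.Parse(document, True)
    return seen.get("xml:lang")


if __name__ == "__main__":
    print(repr(lang_of(WITH_ATTLIST)))
    print(repr(lang_of(WITHOUT_ATTLIST)))
```

expat reads the internal subset without validating, so the declared case
should come back normalised while the undeclared case keeps its spaces.
That is the sense in which the ATTLIST can be "used" without a validating
parser being invoked.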

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
