Classification: XML Parser Features

Fri Dec 12 23:34:56 GMT 1997

At 12:17 12/12/97 -0500, David Megginson wrote:
>Tim Bray writes:
[.. extremely important discussion deleted ...]

I also (unfortunately) have sympathy with David's view that it's harder to
write a conforming parser than appears on first reading. I agree that there
are few if any fully conforming parsers at present.

> > I'll stop here.  I suggest you go back and re-work your
> > (potentially helpful) list based on a re-reading of the
> > specification. -Tim
>
>Thank you very much for your comments.  I am grateful for the work
>that you and the rest of the WG have done with the spec, and I hope
>that you find my comments constructive rather than confrontational.
>
I am sure this is not a confrontational issue. I think David has made an
excellent first pass at defining what we need to do. WG and SIG discussions
(which David has not seen) are confidential, but it's clear from the
relatively recent introduction of 'standalone' that this issue has been
thought about.

I do not believe this problem is solved yet. I have always felt that until
we get working prototypes we shall not uncover all the difficult semantic
problems. It is exactly now that they will start to appear with a 'stable'
spec and a crop of new software. If you think 'no need to write a new
parser, it's all been done', that's probably optimistic.

The problem is that the semantics are well hidden, and how you read them
depends on your background.  You may use SGML as a marker and it would be
*logical* to
design an XML parser to do exactly what an SGML one does. However, XML
deliberately introduces flexibility into the spec, and in so doing
introduces fuzziness. If anyone thinks this isn't a fuzzy area, state
precisely what you think of David's classification (amended if necessary).
Only if most of the 'XML experts' agree can we say it isn't fuzzy.

There will be worse fuzziness introduced if it isn't clear to
'non-XML-experts' what to do. IMO there are still areas of difficulty and
different authors will introduce different 'features' - often without
realising it.

I suspect that a useful way forward will be to attach command-line options
to parsers. They are already potentially required for 'may' clauses.
Perhaps we should identify the areas where there are two schools of thought
(e.g. 'assume document is WF'/'check for WF error') and add a switch. Then the
newcomers will understand that there is an area they have to think about.
These may also help to clarify the drafters' minds if necessary.
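
A minimal sketch of what such a switch might look like, in Python. Everything
named here is an assumption made for illustration and is not taken from any
existing parser: the option name `--wf`, its two values, and the helper
functions `build_arg_parser` and `parse_document`. The well-formedness check
itself is delegated to the standard library's expat binding.

```python
import argparse
import xml.parsers.expat


def build_arg_parser():
    """Build a command-line parser with one hypothetical switch, --wf."""
    ap = argparse.ArgumentParser(
        description="Hypothetical front end for an XML parser")
    ap.add_argument(
        "--wf",
        choices=["assume", "check"],
        default="check",
        help="'assume': trust that the document is well-formed; "
             "'check': report well-formedness errors")
    return ap


def parse_document(text, wf_mode):
    """Return (ok, message) for the given document text.

    In 'assume' mode no checking is done at all (an assumption of this
    sketch); in 'check' mode expat reports the first error it finds.
    """
    if wf_mode == "assume":
        return True, "well-formedness assumed, not checked"
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError as err:
        return False, "not well-formed: %s" % err
    return True, "well-formed"


# Example: the same broken document under the two schools of thought.
broken = "<a><b></a>"
for argv in (["--wf", "check"], ["--wf", "assume"]):
    args = build_arg_parser().parse_args(argv)
    print(args.wf, parse_document(broken, args.wf))
```

The point of the sketch is only that the choice becomes visible: a newcomer
reading the help text sees that there is a decision to make, whichever
default the author picks.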

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg
