XML QuotedCData question
Peter at ursus.demon.co.uk
Mon Mar 10 10:19:17 GMT 1997
It seems that there is enough ambiguity or possible misinterpretation
that this is a problem unless tackled. If WG or ERB members are reading this
then they might wish to take note.
In message <9703100330.AA04839 at sqrex.sq.com> lee at sq.com writes:
> The question about how to expand entities may arise, I think, because
> XML, like SGML, is not layered.
> Most programming languages talk explicitly about tokenisation,
> or tokenization if you prefer :-), and in doing so explain how
> the sequence of tokens that a compiler (say) sees is derived from
> an input stream. Usually, comments are stripped at this stage,
> and in languages such as C or SGML that have (in effect) macros,
> the macros are expanded at input time.
Agreed. And having come from C I think in those terms.
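A minimal sketch of the layered model described above, assuming that entity references are plain textual macros expanded before tokenisation. It is a toy, not the behaviour defined by the XML draft or by 8879; the entity table, the regular expressions, and the function names are all made up for illustration.

```python
import re

# Hypothetical entity table; names and values are invented for this sketch.
ENTITIES = {"greeting": "Hello, &who;!", "who": "world"}

ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9._-]*);")
TOKEN = re.compile(r"<[^>]*>|[^<\s]+")


def expand(text, entities, depth=0, max_depth=16):
    """Layer 1: replace each &name; with its replacement text, recursively."""
    if depth > max_depth:
        raise ValueError("entity expansion too deep (possible recursion)")

    def substitute(match):
        name = match.group(1)
        if name not in entities:
            return match.group(0)  # leave unknown references untouched
        return expand(entities[name], entities, depth + 1, max_depth)

    return ENTITY_REF.sub(substitute, text)


def tokenise(text):
    """Layer 2: split already-expanded text into tags and words."""
    return TOKEN.findall(text)


tokens = tokenise(expand("<p>&greeting;</p>", ENTITIES))
print(tokens)
```

In this model the tokeniser never sees an entity reference, which is the C-preprocessor way of thinking. As the quoted text goes on to say, entities that cannot be treated as plain text (external files, for example) do not fit this picture.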
> I'd personally like to see a version of the XML spec in which there
> was no S production, but rather a list of things that are self-delimiting
> (such as <) and don't require whitespace; the explanation about
> entities would then be clearer.
I hadn't realised this (S) was the problem :-)
> SGML entities can't all be expanded at input time, since some
> of them are of differing types (e.g. external files) and must be
> treated differently. I'm not sure whether this applies to XML
> general entities or not, but it probably does -- do we have
> NDATA entities?
Entity substitution is very briefly defined in the draft. I don't know
what it's like in 8879 (and I'm not going to find out!).
I see the following problems:
- it is *possible* (though I think unlikely) that not everyone on the
ERB agrees as to what is meant to happen during substitution
- parser implementers may:
* find the spec not well-enough defined
* interpret it in different ways
- DTD implementers (i.e. those using PEs) may:
* find the spec not well-enough defined
* interpret it in 'incorrect' ways
I have found 'programming' in SGML one of the most tedious and
counter-intuitive things I have had to do. The primary problem has been
entities, though RE hasn't helped. I had only two ways of proceeding:
- if it failed with sgmls it was my fault
- Joe English helped a great deal by answering 'simple' questions
I finally ended up with a complex, hairy, and (to non-SGML folk) totally
non-intuitive set of DTDs and 'include' files. sgmls was the only
way that I could tell whether it was 'right'.
The only way that we can expect people to develop applications for XML
using entities is for us to:
- be absolutely clear what we are doing
- be as consistent as possible with past practice in SGML and
provide guidance on conversion
- have 100% accurate parsers
- have very clear examples and torture tests
- have tutorials
My starting point would be to take HTML2.0 (or 3.2 or whatever), and make sure
that the spec is precise enough to decide, with 100% accuracy, what should happen.
If not, it needs revising.
At present the immediate problem arises for Norbert (since his is the only
validating parser we are working with) and those who are working with it.
However PEs are used for other things than validation - I used them to
'add directory names' to a 'list of files' (i.e. manipulation of the
location of general entities).
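A rough sketch of that idea, assuming the directory name is held in one parameter-entity-like value and substituted textually into each file location. It does not reproduce the actual DTDs, and it makes no claim about where the XML draft or 8879 recognise a PE reference; the `%dir;` name, the paths, and the `resolve_locations` helper are hypothetical.

```python
import re

PE_REF = re.compile(r"%([A-Za-z][A-Za-z0-9._-]*);")


def resolve_locations(parameter_entities, file_entities):
    """Substitute %name; references in each file location by plain text replacement."""

    def substitute(match):
        name = match.group(1)
        # Unknown references are left as written rather than guessed at.
        return parameter_entities.get(name, match.group(0))

    return {
        entity: PE_REF.sub(substitute, location)
        for entity, location in file_entities.items()
    }


# Hypothetical names and paths, for illustration only.
parameter_entities = {"dir": "/docs/chapters"}
file_entities = {"chap1": "%dir;/chap1.xml", "chap2": "%dir;/chap2.xml"}

locations = resolve_locations(parameter_entities, file_entities)
for name, path in locations.items():
    print(name, "->", path)
```

Changing the single `dir` value relocates every file in the list, which is the point of the technique. It only works if every parser agrees on when the substitution happens.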
Above all, of course, the XML documents must be valid SGML documents and
they must give the same 'result' as when processed by sgmls.
> Maybe when the syntax settles down finally I'll do that.
In a sense this is mainly the interpretation of the syntax and therefore
the documentation rather than the productions (have I got that right?)
> xml-dev: A list for W3C XML Developers
> Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
> To unsubscribe, send to majordomo at ic.ac.uk the following message;
> unsubscribe xml-dev
> List coordinator, Henry Rzepa (rzepa at ic.ac.uk)
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences