Rebelling parser writers (Was: ]]> within a CDATA marked section ?)

Jarle Stabell jarle.stabell at dokpro.uio.no
Thu Nov 27 18:12:38 GMT 1997


I wrote: 
<<<
> I assume the reason for *not* allowing "if x<>nil then doSomething" as legal content is that it is better for users if & and < are consistently not allowed for anything other than markup, but I'm not convinced of this.
> (At least it seems trivial for parsers to check this situation.)
>>>

Paul Prescod wrote:
<<<
Parser writers are rebelling at the number of trivial things that they
must manage.
>>> 

[JS]  I'm actually surprised that I haven't heard much rebelling here. :-)
I think there are lots of *non-trivial* things parser writers must manage in XML, so I don't think they care much about trivial things if they actually are useful to many users.

I'm afraid of making my parser look stupid/stubborn, because that very likely means higher support costs and also lowers the average user's impression of the quality of the product. Gurus may know why the parser complains, but perhaps not the average support personnel, and certainly not the average user.

My current "favourite XML annoyance" is the rules for entity expansion, which make writing the name AT&T in an entity rocket science for the average XML user, and probably give some implementors gray hairs.
(I understand that these rules give maximum power, but I can hardly see the need for it. Or is it "often" needed because one has chosen " or ' to mark the end of an entity value?)
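
To make the AT&T case concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree (backed by expat). It assumes the entity rules as they ended up in XML 1.0, which may differ in detail from the draft under discussion here. The entity name `att`, the element name `doc`, and both helper functions are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET


def parse_with_entity(entity_value):
    """Parse a tiny document whose internal DTD subset declares the
    hypothetical entity `att` with the given literal value, and return
    the text content of the root element."""
    document = (
        '<!DOCTYPE doc [<!ENTITY att "' + entity_value + '">]>'
        "<doc>&att;</doc>"
    )
    return ET.fromstring(document).text


def try_parse(entity_value):
    """Like parse_with_entity, but report a parse failure as a string."""
    try:
        return parse_with_entity(entity_value)
    except ET.ParseError as error:
        return "error: " + str(error)


# The entity reference is kept as-is in the declaration and only
# resolved when &att; is used in content.
print(try_parse("AT&amp;T"))

# The character reference is expanded already at declaration time, so
# the stored text is "AT&T", which is not well-formed when &att; is used.
print(try_parse("AT&#38;T"))

# Double escaping: the stored text is "AT&#38;T", which is fine in content.
print(try_parse("AT&#38;#38;T"))
```

Under these assumptions the first and third declarations yield the text AT&T, while the second is rejected by the parser, which is roughly the trap an average user would fall into.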

I'll try to explain why it will probably give me some gray hairs when I implement it:

After attempting to process a document containing errors, I want to present to the user a list of error messages, and when the user clicks on one of these messages, I want to highlight the exact part of the document where the error occurs.
The problem with entity expansion is that the parser isn't parsing what the user literally wrote into the entity definitions; it is parsing a processed, "virtual" version, which *may* not be a real subpart of the document. One therefore has to map "virtual" positions to physical (real document) positions, which doesn't seem trivial to me. It is also likely to give slightly confusing error messages, since a message may mention expanded text ("<xxx>") that the user never wrote; the user may have written "&lt;xxx&gt;".
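
The position-mapping problem can be sketched roughly as follows. This is only an illustration, not how any particular parser does it: it assumes a single level of expansion, handles only character references and the five predefined entities, and the names `expand_with_map` and `physical_span` are hypothetical.

```python
import re

# Predefined references handled by this sketch (assumption: no
# user-defined entities and no nested expansion).
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}
REFERENCE = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z]+);")


def expand_with_map(source):
    """Return (expanded, origins), where origins[i] is the half-open
    (start, end) span in `source` that produced expanded[i]."""
    expanded = []
    origins = []
    pos = 0
    while pos < len(source):
        match = REFERENCE.match(source, pos)
        char = None
        if match:
            body = match.group(1)
            if body.startswith("#x"):
                char = chr(int(body[2:], 16))
            elif body.startswith("#"):
                char = chr(int(body[1:]))
            else:
                char = PREDEFINED.get(body)  # None for unknown names
        if char is not None:
            # One expanded character maps back to the whole reference.
            expanded.append(char)
            origins.append((match.start(), match.end()))
            pos = match.end()
        else:
            # Ordinary character: maps back to itself.
            expanded.append(source[pos])
            origins.append((pos, pos + 1))
            pos += 1
    return "".join(expanded), origins


def physical_span(origins, start, end):
    """Map a non-empty half-open range in the expanded text back to the
    range in the source text that the user actually wrote."""
    return origins[start][0], origins[end - 1][1]


written = "a &lt;xxx&gt; b"
virtual, origins = expand_with_map(written)
# Suppose the parser reports an error for virtual[2:7], i.e. "<xxx>".
start, end = physical_span(origins, 2, 7)
print(virtual[2:7], "was written as", written[start:end])
```

Keeping one source span per expanded character is wasteful but simple; a real implementation would presumably store spans per run of characters and would also have to follow nested entity references back through each level.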

This single issue is likely to cost me many hours of thinking (and programming), while allowing stuff like "x < 5" in content only takes me a single line to handle. I sometimes get the impression that XML contains many things that are hard to implement and understand (and that won't be useful to anyone but the gurus), while disallowing things that are easy to implement and also useful to the average user.
Ok, enough rebelling for now... :-)


Cheers,
Jarle


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



