A call for reason

Tue Nov 30 21:57:35 GMT 1999

On 30 Nov 1999 rev-bob at gotc.com wrote:

> > >So as it stands, there will exist valid SML documents that are not well
> > >formed XML and will therefore trigger a fatal error if given to an XML
> > >parser.
> > 
> > I was talking about removing an error reporting requirement
> > because I believe XML WG went overboard with the error
> > reporting requirements which places a heavy burden on
> > the performance of XML parsers.
> 
> In other words, the WG went overboard in specifying that when something is broken, 
> the parser must say so?  I don't consider that overboard at all.
> 
> > IMHO, the HTML crowd got burnt badly with HTML's forgiving parsers
> > and over-reacted.  There are other ways to solve this sort of
> > problem without imposing a high tariff on everyone.
> 
> I think the UA authors finally got sick of having to code bloated "forgiving" parsers 
> because the HTML spec required a huge amount of tolerance, but I could be wrong.  

The problem with HTML was that the spec basically said "you can attempt to
repair errors in an application-defined fashion."  The result was a
shitload of "HTML" that was "designed" with "knowledge" of particular
applications' (undocumented) error-correction behavior; much of that
"HTML" suddenly became unusable when new versions of popular browsers came
out.  Plenty of Web designers were arguing that they were getting paid to
create pages that worked on what the majority of viewers were using, and
they didn't care whether the only reason they "worked" was that the
browsers' error-correction was giving them the results they wanted.  "I'm
too busy making money to make sure all my quoted attribute values have
closing quotes just so Lynx users can read my pages.  They work just fine
in Netscape."  And so they did, until Netscape 1.0 came out.  And so on
(*every* major release of Netscape came with a change in error-recovery
behavior that broke lots of "they worked" documents).

The lesson here is that allowing parsers to implement their own
idiosyncratic error-recovery strategies inevitably leads to the
proliferation of incompatible "slang" versions of the base language, each
of which is understandable to only one parser implementation.  In the case
of HTML, we were dealing with authors who were either ignorant of the
pitfalls of best-case design ("it works for me and works when I demo it to
the boss, so it must be OK") or actively hostile to the philosophy of
robustness ("when you say that's 'incorrect,' that's just *your*
opinion").  The latter was, IMHO, partly due to an affected
artistic-rebel-poseur attitude that couldn't distinguish between esthetic
judgments (which, contrary to the moaning of some ultra-conservatives,
*cannot* be treated as truth-valued statements) and criteria for
formal-language correctness.

The requirement that an XML processor not allow application-level attempts
at repairing illegal syntax (which does *not*, BTW, demand that any
application make a "terminate" system call to the OS if there's illegal
syntax; it only demands that the application refuse to process the
document in question) may sound harsh, but the alternative is either the
creation of languages whose grammars can only be inferred by
experimentation, or putting precise error-recovery strategies in the spec
itself (which might work for an application-specific language where you
know what the semantics are, but hardly for a metalanguage).
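To make the "refuse, don't terminate" distinction concrete, here is a
minimal sketch.  It assumes Python's standard xml.etree.ElementTree module
as the XML parser (the post names no particular implementation), and
parse_or_refuse is a hypothetical helper.  The malformed document is the
unclosed-attribute-quote case mentioned above: the parser reports a fatal
error, and the application simply declines to process that one document
and carries on.

```python
import xml.etree.ElementTree as ET


def parse_or_refuse(text):
    """Return the root element, or None if the document is not well-formed.

    Hypothetical helper: it makes no attempt to repair the markup and does
    not exit the program; it only refuses to hand the bad document on.
    """
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        # The parser has already reported a fatal well-formedness error.
        print(f"refusing document: {err}")
        return None


good = '<p><a href="page.html">Link</a></p>'
bad = '<p><a href="page.html>Link</a></p>'  # closing quote missing

for doc in (good, bad):
    root = parse_or_refuse(doc)
    if root is not None:
        print("processed:", root.find("a").text)
```

The program keeps running after the bad document is rejected; what it
does not do is guess at what the author "meant" by the broken attribute.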

Novice programmers (and non-novice script kiddies) like languages that
don't enforce error-checking because, let's face it, error-handling code
is tedious and boring to write.  But there are only three alternatives:
write the error-handling code, sweep errors under the rug (where they
eventually catch fire), or use a system that's guaranteed not to generate
errors.  The last option assumes that perfection is not only achievable,
but has already been achieved.

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)