What XML parsers have to report (was Re: James Anderson's table)

Thu Aug 13 18:13:35 BST 1998

John Cowan writes:

 > james anderson wrote:
 > 
 > >   IV: for the moment, i've chosen the precedence
 > >        tag-content,
 > >        bindings-from-containing-elements,
 > >        attribute-defaults
 > >   where the bindings-from-containing-elements takes its initial value from the xml-decl.
 > 
 > This cannot be correct XML, and what is not correct XML is not correct
 > XML-ns.  An XML processor, as Tim Bray pointed out the other day,
 > must implement attribute defaulting (modulo problems with not reading
 > external DTD/parameter entities), but need not tell the application
 > whether the attribute was explicitly present or defaulted.

One of the new items of work assigned to the XML Working Group is the
creation of a formal XML data model, specifying what information XML
parsers must deliver to an application (but not how the information
should be delivered -- that's up to formal or informal standards like
the DOM and SAX).

This matters quite a bit because right now, a processor is not
required to report attribute values at all (for example); here's what
the spec says (3.3.2, "Attribute Defaults"):

  If a default value is declared, when an XML processor encounters an
  omitted attribute, it is to behave as though the attribute were
  present with the declared default value.
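
As an illustration only (not something the spec prescribes), here is a
minimal sketch using Python's standard xml.parsers.expat binding.  The
document strings and the helper name reported_attributes are made up
for the example.  It assumes expat's default behaviour, in which
defaults declared in the internal DTD subset are applied before the
start-tag event is delivered.

```python
import xml.parsers.expat


def reported_attributes(document):
    """Return (element name, attributes) pairs as expat reports them."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # attrs arrives as a dict mapping attribute names to values.
    parser.StartElementHandler = lambda name, attrs: events.append((name, dict(attrs)))
    parser.Parse(document, True)
    return events


# Hypothetical documents: one relies on a declared default, one spells it out.
DEFAULTED = '<!DOCTYPE doc [<!ATTLIST doc status CDATA "draft">]><doc/>'
SPECIFIED = '<doc status="draft"/>'

if __name__ == "__main__":
    print(reported_attributes(DEFAULTED))
    print(reported_attributes(SPECIFIED))
```

With this handler the application sees the same name/value pair in both
cases and has no way to tell a defaulted attribute from a specified one
unless it asks the parser for that extra information.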

Later, in 3.3.3, the spec does include the words "Before the value of
an attribute is passed to the application or checked for validity...",
implying an intention, at least, that the value should be passed on,
but it's never stated as a requirement.  Here's what the XML 1.0 REC
explicitly requires parsers to report to applications:

1) Processing instructions (2.6).
2) All non-markup characters, including whitespace (2.10) [presumably
   only those within the document element, though the spec is unclear].
3) Normalised line-ends (2.11) [exception to #2].
4) The external identifiers of unparsed entities and notations (4.4.6, 4.7).
5) External parsed entities that were recognized but not read (4.4.3).
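
To make items 1 to 3 concrete, here is a rough sketch of how one
existing parser interface surfaces them, using Python's standard
xml.parsers.expat binding.  The sample document and the helper name
report are invented for the example; this shows one parser's
behaviour, not a reading of what the REC demands.

```python
import xml.parsers.expat


def report(document):
    """Collect the processing instructions and character data expat delivers."""
    pis, text = [], []
    parser = xml.parsers.expat.ParserCreate()
    # Item 1: processing instructions arrive as (target, data) pairs.
    parser.ProcessingInstructionHandler = lambda target, data: pis.append((target, data))
    # Items 2 and 3: character data, with line-ends already normalised.
    parser.CharacterDataHandler = text.append
    parser.Parse(document, True)
    return pis, "".join(text)


if __name__ == "__main__":
    # Hypothetical input with a CR LF pair inside the document element.
    pis, text = report('<?style sheet="a.css"?><doc>one\r\ntwo</doc>')
    print(pis)
    print(repr(text))
```

The character data may be delivered in several chunks, which is why the
sketch joins them before returning.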

Note that elements and attributes are _not_ in the list (oops!).  Only
the common sense (or blissful ignorance) of parser writers has
ensured that this information is always available.

In any case, John is correct: it shouldn't matter whether an attribute
value is defaulted or specified.  As a logical task, namespace
processing takes place *after* XML 1.0 parsing and validation, not
before -- for SGML weenies, think of namespace processing as a
transformation applied to a grove.
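
A small sketch of that layering, in Python with the standard
xml.parsers.expat binding: stage 1 is ordinary parsing with no
namespace awareness, and stage 2 is a separate pass over its output.
The xmlns / xmlns:prefix attribute convention, the function names, and
the example URI are all assumptions made for illustration; the
namespace syntax itself is still a moving target.

```python
import xml.parsers.expat


def parse_events(document):
    """Stage 1: plain XML 1.0 parsing; attribute defaults are already applied."""
    events = []
    parser = xml.parsers.expat.ParserCreate()  # namespace processing left off
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name, dict(attrs)))
    parser.EndElementHandler = lambda name: events.append(("end", name, None))
    parser.Parse(document, True)
    return events


def resolve_namespaces(events):
    """Stage 2: map each start-tag name to a (namespace URI, local name) pair."""
    scopes = [{}]  # stack of prefix -> URI bindings, innermost last
    resolved = []
    for kind, name, attrs in events:
        if kind == "start":
            bindings = dict(scopes[-1])  # inherit from the containing element
            for key, value in attrs.items():
                if key == "xmlns":
                    bindings[""] = value
                elif key.startswith("xmlns:"):
                    bindings[key[len("xmlns:"):]] = value
            scopes.append(bindings)
            prefix, _, local = name.rpartition(":")
            resolved.append((bindings.get(prefix), local))
        else:
            scopes.pop()
    return resolved


# Hypothetical documents: the binding is defaulted in one, specified in the other.
DEFAULTED = ('<!DOCTYPE a:doc [<!ATTLIST a:doc xmlns:a CDATA "http://example.org/ns">]>'
             '<a:doc><a:item/></a:doc>')
SPECIFIED = '<a:doc xmlns:a="http://example.org/ns"><a:item/></a:doc>'

if __name__ == "__main__":
    print(resolve_namespaces(parse_events(DEFAULTED)))
    print(resolve_namespaces(parse_events(SPECIFIED)))
```

Because stage 2 sees only what stage 1 reports, a binding supplied by
an attribute default is indistinguishable from one written in the
start-tag, so no precedence rule between the two is needed.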

All the best,

David

-- 
David Megginson                 david at megginson.com
           http://www.megginson.com/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/