Re: WF, V, and MSXML

Richard Light richard at light.demon.co.uk
Mon Jun 9 12:51:24 BST 1997


In message <199706082339.QAA08654 at bolt.sonic.net>, Terry Allen
<tallen at sonic.net> writes
>|
>| > But for an XML parser, the boundaries are shifted, because
>| > it has to deal with an XML document that *includes* the prologue
>| > (XMLlang production 23, where "element" corresponds to the SGML 
>| > "document instance set", I think).  I don't know whether this is a good 
>| > idea or not, just trying to understand it as an early adopter.

I don't see _any_ difference between SGML and XML on this front.  SGML
parsers also have to deal with the prolog: the formal syntax of an "SGML
document entity" is:

        s*,
        SGML declaration,
        prolog,
        document instance set,
        'entity end' signal

(So in fact they have to deal with the SGML declaration as well!)
The fact that the default ESIS output from the parser doesn't include
any DTD-related information shouldn't be taken to mean the parser hasn't
processed this information.
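
The same point can be checked against a present-day XML parser.  The
sketch below is only an illustration, assuming Python's standard
xml.parsers.expat binding (a non-validating parser that postdates this
thread); the sample document and the prolog_events helper are made up
for the example.  It records the declaration events the parser reports
while reading the prolog, none of which would appear in a plain
element-and-text dump of the document.

```python
import xml.parsers.expat

# Hypothetical sample document: an internal subset plus one element.
DOC = """<!DOCTYPE FOO [
<!ENTITY who "world">
<!ATTLIST FOO XML-LINK CDATA #FIXED "SIMPLE">
]>
<FOO HREF="bar">hello &who;</FOO>"""


def prolog_events(text):
    """Return the declaration events reported while the prolog is read."""
    events = []
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        events.append(("doctype", name, bool(has_internal_subset)))

    def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
        events.append(("entity", name, value))

    def on_attlist(element, attribute, attr_type, default, required):
        events.append(("attlist", element, attribute, default))

    parser.StartDoctypeDeclHandler = on_doctype
    parser.EntityDeclHandler = on_entity
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(text, True)  # True marks this as the final chunk of input
    return events


if __name__ == "__main__":
    for event in prolog_events(DOC):
        print(event)
```

Whether an application sees these events depends entirely on which
handlers it installs; the parser reads the declarations either way.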

>| I am actually unclear whether a WF-only parser (e.g. Lark) has to read the
>| internal subset at all, other than skipping to the ']>' at the end.  If it 
>| *does* read and parse it, what does it do with the information.  For example,
>
>The soft spot here is the first line of 2.2, where "match" is not
>defined except that later in that section it "implies" a few things,
>which are not apparently meant to be a complete set.  What the
>WF document matches is production 23, Prolog element Misc*.  As
>the processor attempting to determine WFness must look inside element to 
>determine WFness, presumably the same is true of prolog.
>
> ... unless I determine WFness by *parsing* with a *real parser* which
>the processor is not meant to be ...

I would read the existing XML spec in a stricter spirit than you have
done.  To me, "match" means just that, i.e. that _if_ a WF document has
an internal or an external DTD, these should be parsed as though for a
valid XML document.  Any _syntactic_ errors in the DTD should be
flagged, even in 'WF' mode.  (Bear in mind that no one is forcing WF
documents to have a DTD at all, unless they need entity declarations.)  If you
try to adopt a 'don't care' mode of parsing for the DTD when dealing
with WF documents, you probably create many more problems than you
solve.

The only difference is the use that is made of the DTD information: in a
WF document only the entity declarations matter to the parser.
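
As a rough illustration of that reading, here is a minimal sketch using
Python's xml.etree.ElementTree, which sits on a non-validating parser.
It is an assumption on my part that a modern tool is a fair stand-in for
the behaviour discussed here; the two sample documents and the check_wf
helper are invented for the example.  The first document uses an entity
declared in its internal subset; the second has a syntactically broken
entity declaration (no replacement text).

```python
import xml.etree.ElementTree as ET

# Hypothetical samples: GOOD declares and uses an entity, BAD has a
# syntactically invalid entity declaration in the internal subset.
GOOD = '<!DOCTYPE FOO [<!ENTITY who "world">]><FOO>hello &who;</FOO>'
BAD = '<!DOCTYPE FOO [<!ENTITY who>]><FOO>hello</FOO>'


def check_wf(text):
    """Return (True, root text) if the parse succeeds, else (False, error)."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        return False, str(err)
    return True, root.text


if __name__ == "__main__":
    print(check_wf(GOOD))
    print(check_wf(BAD))
```

With the good document the declared entity is put to use in the content;
with the bad one the parse is rejected because of the DTD syntax error,
although no validation is being attempted.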

>| what is the implied structure of the document in:
>| 
>| <!DOCTYPE FOO [
>| <!ATTLIST FOO XML-LINK CDATA #FIXED "SIMPLE">
>| ]>
>| <FOO HREF="bar"/>
>| 
>| Can we assume that FOO (which has no Element declaration) has an ATTLIST as
>| given, and that therefore it inherits the SHOW and ACTUATE attributes?
>| IOW *must* a parser decorate all matching elements with the ATTLISTS in the 
>| internal subset?
>
>No, not per XMLlang alone.  FOO's only declared attribute has as its name
>the unreserved string "XML-LINK" although it uses an undeclared attribute
>name "HREF".  So it is WF but not valid.

... and since it is only well-formed and not valid, it cannot (in my
view) partake in any operations that require knowledge of <!ELEMENT or
<!ATTLIST declarations.  IOW, XML-LINK is not relevant to WF documents
...?
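
For anyone who wants to see how a given parser treats the example, the
sketch below feeds it to Python's xml.parsers.expat binding and lists the
attributes reported for FOO.  This shows one present-day non-validating
parser only, not what the draft requires; the reported_attributes helper
is made up for the example.  The binding's specified_attributes setting
is documented as restricting the report to attributes that actually
appear in the document instance, so both settings are tried.

```python
import xml.parsers.expat

# The example from this thread, unchanged.
DOC = """<!DOCTYPE FOO [
<!ATTLIST FOO XML-LINK CDATA #FIXED "SIMPLE">
]>
<FOO HREF="bar"/>"""


def reported_attributes(text, specified_only):
    """Return the attributes the parser reports for the FOO start tag."""
    seen = {}
    parser = xml.parsers.expat.ParserCreate()
    # Non-zero: report only attributes written in the instance itself.
    parser.specified_attributes = int(specified_only)

    def on_start(name, attrs):
        seen[name] = dict(attrs)

    parser.StartElementHandler = on_start
    parser.Parse(text, True)  # True marks this as the final chunk of input
    return seen["FOO"]


if __name__ == "__main__":
    print("specified only:", reported_attributes(DOC, True))
    print("with defaults: ", reported_attributes(DOC, False))
```

So with this parser the choice between "decorating" FOO with the declared
attribute and ignoring the ATTLIST is a switch left to the application.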

Richard Light
SGML and Museum Information Consultancy
richard at light.demon.co.uk
3 Midfields Walk 
Burgess Hill
West Sussex RH15 8JA
U.K.
tel. (44) 1444 232067

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)



