MSXML, WF, and Validity

Terry Allen tallen at sonic.net
Sat Jun 7 20:46:57 BST 1997


Jean Paoli wrote:
| The Microsoft XML Parser is a validating XML parser written in Java. 
| Once parsed, the XML document is exposed as a tree through a simple set
| of Java methods. 

After playing with it for a while this morning I found myself wondering
about WF and validity; I don't know if the following counts as a bug,
but it would be useful to hear what others think.

My input is:

<?XML version="1.0" encoding="UTF-8" ?>
<!doctype book [
<!element book (title, chapter+)>
<!entity foo "bar">
]>
<book><title>Palmy Days</title>
<chapter><title>One Frond at a Time</title>
<para>It was a dark and stormy night.  The crows clattered
amongst the fronds.  
</para>
<para>&foo;</para>
</chapter>
</book>

I stuck the DTD in the internal subset because I couldn't get the
parser to find an external DTD.  The output of 

  jview msxml -d palmy

is

<?XML VERSION="1.0" ENCODING="UTF-8"?>
<!DOCTYPE BOOK [
    <!ENTITY foo 'bar'>
    <!ELEMENT BOOK (TITLE,(CHAPTER,CHAPTER*))>
]>
<BOOK>
    <TITLE>
        Palmy Days
    </TITLE>
    <CHAPTER>
        <TITLE>
            One Frond at a Time
        </TITLE>
        <PARA>
            It was a dark and stormy night. The crows clattered amongst the fronds.
        </PARA>
        <PARA>
            bar
        </PARA>
    </CHAPTER>
</BOOK>

Now the declarations in the internal subset have been read (and munged),
and the foo:bar entity expansion has been performed.  Yet the instance
does not conform to the "DTD" in the internal subset (title, chapter,
and para are never declared), although taken on its own it is well
formed.  Is the input file "palmy" a valid XML document?  The VC comment
following [36] indicates not.  Is it WF?  I can't find a WF comment
indicating that the document must conform to the DTD (which is
reasonable, although perhaps this point should be covered explicitly).
Is MSXML only parsing "palmy" as WF?  If not, is this error recovery?
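
To make the WF-versus-valid distinction concrete, here is a rough sketch
that parses the same document twice, once without validation and once
with it.  It does not use MSXML at all: it assumes the standard JAXP
classes (javax.xml.parsers) that ship with a current JDK, and it assumes
"palmy" restated in final XML 1.0 syntax (lowercase "xml" in the
declaration, uppercase DOCTYPE/ELEMENT/ENTITY keywords, encoding
declaration dropped because the text is fed in as a string).  The class
and method names are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class WfVersusValid {
    // Assumption: "palmy" rewritten in final XML 1.0 syntax.
    // Only book and the entity foo are declared, as in the original.
    static final String PALMY =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE book [\n"
        + "<!ELEMENT book (title, chapter+)>\n"
        + "<!ENTITY foo \"bar\">\n"
        + "]>\n"
        + "<book><title>Palmy Days</title>\n"
        + "<chapter><title>One Frond at a Time</title>\n"
        + "<para>It was a dark and stormy night.</para>\n"
        + "<para>&foo;</para>\n"
        + "</chapter>\n"
        + "</book>\n";

    // Parses PALMY and returns the messages of any recoverable errors,
    // which is how validity errors are reported.  A well-formedness
    // error is fatal and is rethrown to the caller instead.
    static List<String> parse(boolean validating) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating);
        DocumentBuilder builder = factory.newDocumentBuilder();
        final List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) {
                errors.add(e.getMessage());
            }
            @Override public void fatalError(SAXParseException e)
                    throws SAXParseException {
                throw e;
            }
        });
        builder.parse(new InputSource(new StringReader(PALMY)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("non-validating: " + parse(false).size() + " error(s)");
        List<String> validity = parse(true);
        System.out.println("validating: " + validity.size() + " error(s)");
        for (String message : validity) {
            System.out.println("  " + message);
        }
    }
}
```

Under these assumptions the non-validating pass should complete without
complaint, while the validating pass should report the undeclared
elements as recoverable errors and still build the tree.  That is one way
a parser can read the internal subset and expand &foo; without ever
rejecting the instance.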

These (real, not rhetorical) questions are of interest whether or
not this is the intended behavior of MSXML.


Regards,

  Terry Allen    Electronic Publishing Consultant    tallen[at]sonic.net
                   http://www.sonic.net/~tallen/
    Davenport and DocBook:  http://www.ora.com/davenport/index.html
          T.A. at Passage Systems:  terry.allen[at]passage.com 


xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)
