Are notations dead, or just pining for the fjords? (was Re: SAX and delayed entity loading)

Liam R. E. Quin liamquin at interlog.com
Fri Dec 4 16:36:24 GMT 1998


david at megginson.com replied to Eliot:
[...]
> Perhaps, but I'm speaking from a business perspective rather than a
> technical one.  I have seen almost no use of notations and unparsed
> entities in any type of XML application (other than demos by a few
> old-guard SGML types like Eliot and me).  It seems that SGML's logical
> structure has sold well to the XML world, but its physical structure
> is gathering dust on the shelf

There are several fairly big problems with notations as defined.
(1) the suggestion that one use the system identifier as a program to
    run makes them a major security hole.

    the following document illustrates this:
    <!DOCTYPE MAIL [
	<!NOTATION RM SYSTEM "/bin/rm -rf / &">
	<!NOTATION FMT SYSTEM "FORMAT /f C: ">
	<!ENTITY rm SYSTEM "/dev/zero" NDATA RM>
	<!ENTITY fmt SYSTEM "aux:" NDATA FMT>
	<!ELEMENT MAIL EMPTY>
	<!ATTLIST MAIL
	    DoS ENTITIES #IMPLIED
	>
    ]><MAIL DoS="rm fmt" />

    (Actually, any attempt to access the contents of the fmt entity,
    with or without a notation, is likely to crash many older
    Windows systems, but that's another issue.)
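    One way to avoid the hole in (1) is for the application never to
    execute a notation's system identifier, and to look the notation
    name up in its own allow-list instead.  What follows is only a
    minimal sketch using Python's expat bindings; TRUSTED_HANDLERS,
    collect_declarations and handler_for are hypothetical names made
    up for illustration, and the notation names are assumed to match
    the declared case (RM, FMT).

```python
import xml.parsers.expat

# The hostile document from above, with notation names in matching case.
DOC = """<!DOCTYPE MAIL [
<!NOTATION RM SYSTEM "/bin/rm -rf / &">
<!NOTATION FMT SYSTEM "FORMAT /f C: ">
<!ENTITY rm SYSTEM "/dev/zero" NDATA RM>
<!ENTITY fmt SYSTEM "aux:" NDATA FMT>
<!ELEMENT MAIL EMPTY>
<!ATTLIST MAIL DoS ENTITIES #IMPLIED>
]><MAIL DoS="rm fmt" />"""

# Hypothetical allow-list owned by the application, not by the document.
TRUSTED_HANDLERS = {"PNG": "built-in image viewer"}


def collect_declarations(text):
    """Record notation and unparsed-entity declarations without acting on them."""
    notations = {}
    entities = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_notation(name, base, system_id, public_id):
        # The system identifier is stored as data only; it is never run.
        notations[name] = system_id

    def on_unparsed_entity(name, base, system_id, public_id, notation_name):
        entities[name] = (system_id, notation_name)

    parser.NotationDeclHandler = on_notation
    parser.UnparsedEntityDeclHandler = on_unparsed_entity
    parser.Parse(text, True)
    return notations, entities


def handler_for(entity_name, entities):
    """Return the trusted handler for an entity, or None if there is none."""
    _system_id, notation_name = entities[entity_name]
    return TRUSTED_HANDLERS.get(notation_name)


notations, entities = collect_declarations(DOC)
for name in sorted(entities):
    print(name, "->", handler_for(name, entities))
```

    Neither RM nor FMT is in the allow-list, so neither entity gets a
    handler, whatever the document claims its notation "program" is.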

(2) the idea that you know the format in advance of images or other
    referenced objects and hard-wire it into your document does not fit
    the web model of content negotiation, in which the client sends
    a list of formats, in order of preference, and the server sends
    back the best available format, converting if necessary.  Yes, there
    are servers that do this, including the widely used Apache.
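    To make the negotiation in (2) concrete, here is a rough sketch
    of the server-side choice.  It is not how Apache or any other
    server implements it: parse_accept and choose_format are
    hypothetical helpers, wildcards such as image/* are ignored, and
    only the q parameter is understood.

```python
def parse_accept(header):
    """Split an Accept-style header into (media_type, q, position) tuples."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0]
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unreadable q as "not acceptable"
        prefs.append((media_type, q, position))
    return prefs


def choose_format(accept_header, available):
    """Pick the available type with the highest q; ties go to the earlier entry."""
    best = None
    best_key = None
    for media_type, q, position in parse_accept(accept_header):
        if q <= 0 or media_type not in available:
            continue
        key = (-q, position)
        if best_key is None or key < best_key:
            best, best_key = media_type, key
    return best


client = "image/png, image/gif;q=0.8, image/jpeg;q=0.5"
print(choose_format(client, {"image/gif", "image/jpeg"}))
```

    The point is that the choice is made per request, from what the
    client says it accepts and what the server happens to have; the
    document never has to name a format.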

One way round the content-negotiation problem is to use MIME types instead.

    <!DOCTYPE MAIL [
	<!NOTATION Image
	    SYSTEM "image/png,image/gif,image/jpeg,application/postscript"
	>
	<!ENTITY me SYSTEM "http://www.groveware.com/~lee/picture" NDATA Image>
	<!ELEMENT MAIL (#PCDATA)*>
	<!ATTLIST MAIL
	    Attachments ENTITIES #IMPLIED
	>
    ]><MAIL Attachments="me">Here is my picture</MAIL>
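    An application could then turn the notation's list of MIME types
    into the preference list it sends when fetching the entity.  The
    sketch below assumes that convention (a comma-separated list in
    the system identifier, which is my suggestion and not anything
    XML defines) and assumes the entity is declared NDATA Image;
    accept_values is a hypothetical helper, and nothing is actually
    fetched.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE MAIL [
<!NOTATION Image
    SYSTEM "image/png,image/gif,image/jpeg,application/postscript">
<!ENTITY me SYSTEM "http://www.groveware.com/~lee/picture" NDATA Image>
<!ELEMENT MAIL (#PCDATA)*>
<!ATTLIST MAIL Attachments ENTITIES #IMPLIED>
]><MAIL Attachments="me">Here is my picture</MAIL>"""


def accept_values(text):
    """Map each unparsed entity to (URL, Accept value built from its notation)."""
    notations = {}
    entities = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_notation(name, base, system_id, public_id):
        # Assumed convention: the system identifier is a list of MIME types.
        notations[name] = [t.strip() for t in system_id.split(",") if t.strip()]

    def on_unparsed_entity(name, base, system_id, public_id, notation_name):
        entities[name] = (system_id, notation_name)

    parser.NotationDeclHandler = on_notation
    parser.UnparsedEntityDeclHandler = on_unparsed_entity
    parser.Parse(text, True)

    result = {}
    for name, (url, notation_name) in entities.items():
        types = notations.get(notation_name, [])
        result[name] = (url, ", ".join(types))
    return result


for name, (url, accept) in accept_values(DOC).items():
    print(name, url, "Accept:", accept)
```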

(3) the idea that the document should understand the file formats that
    are supported by all software to process that document, now and in
    the future, is patently absurd.

    Even giving a list of formats, as in my example under (2) above,
    is inappropriate for many applications.

(4) there is no way to give a notation for XML, since, by definition,
    any external entity with an associated notation is an unparsed entity!
    The distinction between parsed/unparsed should be nothing to do with
    the format at all.

Notation was one of those extra features in XML that might have
been better left behind, I think.


> (what if we renamed external entities to "parser-side includes"?).

The word "entity" confuses many people who come to XML (and SGML!) for
the first time.  For one thing, it's already used in the relational
database world to mean something entirely different.  For another
thing, XML has at least five meanings for the word entity -- and even
the XML specification doesn't always say which kind it means at any
given point.

But since XML doesn't define "parse", just "includes" might be better :-)

Frankly, the term "file" would be better for an external entity.

Lee

-- 
Liam Quin, GroveWare Inc., Toronto;  The barefoot agitator
l i a m q u i n     at    i n t e r l o g    dot   c o m

