XML & Entities inclusion against Inline Tag facilities.

Thu May 22 14:01:35 BST 1997

At 09:52 AM 5/22/97 GMT, Peter Murray-Rust wrote:
>In message <199705220709.JAA28224 at ifhamy.insa-lyon.fr> Alexandre Mutel
>writes:

>No, you can use XML-LINK to refer to part of the current document, as well
>as to external documents.  If the external documents are XML then it is
>often straightforward to include them, but only if they have the same DOCTYPE.
>If they have different DOCTYPEs we have a namespace problem and we are still
>wrestling with that one (e.g.
>
><CML>
>The rate of this reaction is given by 
><A HREF="eqn1.xml">equation 1</A>
></CML>
>where eqn1.xml might be written in MathML.  
>)

There is *NOT* a name space problem in this case.  The document "eqn1.xml"
is *parsed* outside the scope of the document that references it (it is
semantically and functionally identical to a SUBDOC reference in normal
SGML).  Once the document is parsed, the result of that parsing is
combined, by application-specific means, with the document tree of the
referencing document.  At that point, things like content model constraints
are irrelevant and there are *NO* name space problems. 

In a typical implementation, the parsed result of the A element would
include a *pointer* to the parsed result ("grove") of the eqn1.xml
document, rather than literally including that document tree as a direct
child of the CML element or the A element (depending on how you decided to
represent the reference).  Because each document is in its own grove, the
name spaces for the documents are kept separate and there is no conflict.
Applications are free to follow the reference from one grove to another and
behave as if the second document was literally included at the point of
reference.
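
A minimal sketch of that arrangement in Python, offered as an illustration
rather than as how any particular grove implementation works.  The file
names, the in-memory SOURCES table, and the helper names load() and follow()
are assumptions made up for the example, as is the content of eqn1.xml
(which deliberately reuses the element name A):

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-ins for the two files; the content is invented.
SOURCES = {
    "doc.xml": '<CML>The rate is given by <A HREF="eqn1.xml">equation 1</A></CML>',
    "eqn1.xml": "<math><A>k[X]</A></math>",
}

groves = {}  # one separately parsed tree ("grove") per document


def load(name):
    """Parse a document in isolation, once, and cache its tree."""
    if name not in groves:
        groves[name] = ET.fromstring(SOURCES[name])
    return groves[name]


def follow(link):
    """Resolve an A element to the root of the grove it points at."""
    return load(link.get("HREF"))


cml = load("doc.xml")
link = cml.find("A")      # the A element in the referencing document
target = follow(link)     # root of the separately parsed eqn1.xml grove

# The referencing tree is untouched: its A element gains no children, so
# the A inside eqn1.xml never shares a tree with the CML document's A.
print(len(link), target.tag, target.find("A").text)
```

An application that wants the "literally included" behaviour can simply
continue its traversal at the node follow() returns.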

IT IS VITALLY IMPORTANT to remember the distinction between external text
entities referenced by inline entity reference, which are fragments of the
document string and are always parsed as part of it (when parsed at all),
and references to document entities using addressing from attributes
(either by URL or by attributes with a value prescription of ENTITY or
ENTITIES).

In the latter case, the referenced document is NOT parsed as part of the
referencing document.
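
The difference can be seen with any XML parser.  The following is a small
Python sketch under stated assumptions: an *internal* entity stands in for
an external text entity (so the example needs no files), and the names
frag, note, and frag.xml are invented for illustration:

```python
import xml.etree.ElementTree as ET

# Inline entity reference: the replacement text is parsed as part of the
# referencing document, so its markup lands in that document's tree.
# (An internal entity stands in for an external text entity here.)
BY_ENTITY = """<!DOCTYPE doc [
<!ENTITY frag "<note>included text</note>">
]>
<doc>&frag;</doc>"""

# Reference from an attribute: the parser sees only an attribute value and
# parses nothing else on the document's behalf.
BY_ATTRIBUTE = '<doc><A HREF="frag.xml">see the note</A></doc>'

inline = ET.fromstring(BY_ENTITY)
linked = ET.fromstring(BY_ATTRIBUTE)

inline_children = [child.tag for child in inline]
linked_children = [child.tag for child in linked]
print(inline_children, linked_children)
```

In the first case the note element is a child of doc; in the second, doc
contains only the A element, and what happens to frag.xml is left entirely
to the application.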

Thus, there is a clear semantic difference between the use-by-reference of
text entity references and the use-by-value of document entity references.
[Do I have these two confused? It's early in the morning and I'm still
suffering jet lag.  By use-by-value, I mean you get the thing's value, not
the thing itself.]

The HyTime standard formalizes this notion of use-by-value through the
"value reference" facility, which simply makes explicit the semantics
intended by the A element in the above (that the effective value of the A
element is really the document it refers to).  But it is made very clear
that a value reference is a *semantic* distinction--it doesn't change the
way the source data is parsed.

One confusion factor here is that, unlike in SGML today (though that will
change in the near future), an XML file with no DOCTYPE declaration can be
used either as an external text entity (parsed in the context of its
reference) or as a document entity (parsed in isolation), and you can't
tell by looking at the entity which it was intended to be.  In a very real
sense, XML is saying that all external entities are either subdocuments or
documents, even though XML doesn't include the formal notion of subdocument
as in SGML.

>If the external entity is BINARY (i.e. not XML - it may still be ASCII) then
>a NOTATION is required (e.g. for GIF).
>
>I'll stop there and suggest someone else tells us how to use NOTATION 
>because I haven't implemented it yet!!

Notations serve two primary purposes:

1. To clearly document the data type of an entity
2. To enable the association of processors with data types.

The external identifier of a notation is intended to refer to the
documentation for the notation (e.g., the CGM standard, the GIF spec,
etc.).  It may also be used to associate the notation with a notation
processor.  In a general SGML or XML processing system, you would expect to
find a facility for mapping notations (by name or external ID) to
processors or entries in function libraries, e.g., through some form of
mapping catalog.  An obvious implementation technique on Windows would be to
use OLE facilities to integrate the processors for data entities with the
base browser.  Part of the notation mapping would be the information needed
to configure the OLE communication.  I think at least one SGML editor is
implemented in this way.
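
As a rough sketch of such a mapping facility (not a description of any
existing product), the following Python collects notation and data entity
declarations through the standard expat bindings and dispatches on a
catalog.  The CATALOG table, the show_gif() processor, the process()
helper, and the entity and notation names are all hypothetical:

```python
import xml.parsers.expat

# Invented sample document: two notations, two data entities.
DOC = """<!DOCTYPE doc [
<!NOTATION gif SYSTEM "image/gif">
<!NOTATION cgm SYSTEM "cgm">
<!ENTITY logo SYSTEM "logo.gif" NDATA gif>
<!ENTITY plan SYSTEM "plan.cgm" NDATA cgm>
]>
<doc/>"""

notations = {}  # notation name -> (public id, system id)
entities = {}   # data entity name -> (system id, notation name)


def on_notation(name, base, system_id, public_id):
    notations[name] = (public_id, system_id)


def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
    if notation is not None:  # only unparsed (NDATA) entities carry a notation
        entities[name] = (system_id, notation)


parser = xml.parsers.expat.ParserCreate()
parser.NotationDeclHandler = on_notation
parser.EntityDeclHandler = on_entity
parser.Parse(DOC, True)


def show_gif(system_id):
    """Hypothetical processor for the gif notation."""
    return "render GIF from " + system_id


# Hypothetical mapping catalog: notation name -> processor.
CATALOG = {"gif": show_gif}


def process(entity_name):
    """Hand a data entity to the processor mapped to its notation."""
    system_id, notation = entities[entity_name]
    processor = CATALOG.get(notation)
    if processor is None:
        return "no processor for notation " + notation
    return processor(system_id)


print(process("logo"))
print(process("plan"))
```

Keeping CATALOG outside the document is what lets the same notation map to
different processors on different systems.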

Notations are somewhat redundant with MIME types, in that you may be able
to determine the data type of an entity by examining the entity or applying
whatever entrail reading gives you the MIME type.  However, notations have
the advantage that they're part of the document.  One way to use notations,
of course, is to map them to MIME types, e.g.:

<!NOTATION gif  SYSTEM "<mime>image/gif" > 
<!-- Here using the syntax of "formal system identifiers" defined in the
     Formal System Identifier Requirements annex of the HyTime standard
     to indicate that the system identifier is in fact a mime-type, which
     we need because it could just as easily be a relative path name to a 
     file named "gif". -->

Or whatever the MIME type for GIF is.  If this mapping is done in a catalog
(rather than in the document), the same notation can be mapped to different
things on different systems (MIME types are not universal).

Notations must be used for data ("binary") entities.  They can also be
associated with elements by using attributes with a value prescription of
"NOTATION".  The notation named by the attribute then governs the
interpretation of the element and its content (after parsing, of course).
For example, you might do something like this:

<!DOCTYPE ProgramListing [
<!NOTATION C  PUBLIC "Kernighan and Ritchie" >
<!NOTATION Cpp SYSTEM >
<!NOTATION Perl SYSTEM >
<!NOTATION Scheme SYSTEM >
<!ELEMENT ProgramListing - - (#PCDATA) >
<!ATTLIST ProgramListing
          language NOTATION (C | Cpp | Perl | Scheme) #IMPLIED
>
]>
<programlisting language=perl>
<![CDATA[
sub do_nothing {
    return(0)
}
]]>
</programlisting>

Depending on the notation, you might provide different formatting of the
source or even automatically extract the content and test it or compile it
or something.

In full SGML, notations can have attributes defined for them, which can be
specified as part of the entity declarations.  Notation attributes are
intended to act as parameters to the processor of the notation.  A typical
example is attributes that describe the nature of a graphic, e.g.:

<!NOTATION TIFF SYSTEM >
<!ATTLIST #NOTATION TIFF
       compression (NONE |CCITTG4) NONE
>
<!ENTITY mytiff SYSTEM "mytiff.tiff" NDATA TIFF
   [ compression=ccittg4 ]
>

Notations and notation attributes are also used for declaring the use of
architectures and configuring their use within a document.  This makes
sense because a document type or architecture is defining the rules for a
particular data type, namely documents that conform to the document type or
architecture; therefore, it is part of the formal definition of a notation.
For example, to derive a document from an architecture, you would do
something like this (in this example, the architecture is one I made up for
representing bibliography entries):

<!DOCTYPE MyDoc [
<!-- Declare that this document is derived, in part, from the
     Bibliography Entry architecture. -->

<?IS10744 ArcBase BibCat>
<!-- Names following "ArcBase" are names of notations that declare
     architectures. The architecture engine will expect to find those
     notations declared in the document: if it doesn't, it's an error.
  -->

<!NOTATION SGML PUBLIC "ISO 8879:1986//NOTATION Standard Generalized
                        Markup Language//EN" >
<!NOTATION BibCat PUBLIC "-//Kimber//NOTATION Bibliography Entry
                          Architecture//EN"
 -- A document architecture conforming to the 
    Architectural Form Definition Requirements of
    International Standard ISO/IEC 10744.         --
 >
<!-- The following notation attributes configure the use of the
     architecture and control how architectural recognition and
     processing is done by a general architecture engine such
     as SP. The attribute names are defined in the Architectural
     Form Definition Requirements annex of the HyTime standard. 
  -->
<!ATTLIST #NOTATION BibCat
       ArcFormA NAME  #FIXED "BibCat" 
       ArcNamrA NAME  #FIXED "BibNames"
       ArcBridF NAME  #FIXED "BibBrid"
       ArcDocF  NAME  #FIXED "BibDoc"
       ArcOptSA NAMES #FIXED "options"
       ArcDTD   CDATA #FIXED "BibCat"
       options  CDATA #FIXED "" -- Specify "marc" to turn on MARC options --
 >
 <!-- The following entity is the "meta-DTD" for the BibCat architecture.
      It is referenced by the ArcDTD notation attribute.  An architecture
      processor uses it to validate the document against the
      architecture. -->
 <!ENTITY BibCat SYSTEM "bibcat11.mdt" CDATA SGML
 >

 <!ELEMENT MyDoc - - (List-of-Books) >
 <!ATTLIST MyDoc
           BibCat  NAME #FIXED "BibCat" -- MyDoc derived from BibCat form --
 >
 ...

Personally, I think it is a serious mistake for XML to not have notation
attributes, in large part because of their use with architectures, which
are of critical importance to the use of XML.

Notations are also used in XML (and, in the future, with SGML) to create
"formal processing instructions", where the notation name is the first
keyword of the processing instruction, e.g.:

<!NOTATION MyBrowser PUBLIC "my cool XML browser" >
<?MyBrowser something unique to my browser to control its processing>

This mechanism allows general processors to associate processing
instructions with processors (using the notation-to-processor mapping they
must already provide for entities and elements).  It also enables better
error reporting, because the processor can say 'Cannot find processor or
definition for processing instruction notation "MyBrowser", public ID "my
cool XML browser"', rather than either silently ignoring the PI or issuing
an "Unknown PI ..." message.
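
A small Python sketch of that lookup, using the standard expat bindings.
It is only an illustration under stated assumptions: the PROCESSORS
registry is hypothetical (and deliberately left empty so that the error
path is exercised), the second PI target "Other" is invented, and the
message texts follow the wording above:

```python
import xml.parsers.expat

# The MyBrowser declaration is from the example above; <?Other ...?> is an
# invented PI whose target has no notation declaration.
DOC = """<!DOCTYPE doc [
<!NOTATION MyBrowser PUBLIC "my cool XML browser">
]>
<doc>
<?MyBrowser something unique to my browser?>
<?Other do something else?>
</doc>"""

notations = {}   # notation name -> public identifier
messages = []    # diagnostics produced while processing PIs
PROCESSORS = {}  # hypothetical registry: notation name -> callable(data)


def on_notation(name, base, system_id, public_id):
    notations[name] = public_id


def on_pi(target, data):
    if target not in notations:
        # No notation declared: nothing more specific can be said.
        messages.append('Unknown PI "%s"' % target)
    elif target not in PROCESSORS:
        # Declared but unmapped: report the notation and its public ID.
        messages.append(
            "Cannot find processor or definition for processing instruction "
            'notation "%s", public ID "%s"' % (target, notations[target])
        )
    else:
        PROCESSORS[target](data)


parser = xml.parsers.expat.ParserCreate()
parser.NotationDeclHandler = on_notation
parser.ProcessingInstructionHandler = on_pi
parser.Parse(DOC, True)

for message in messages:
    print(message)
```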

Cheers,

Eliot

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)