Stylesheets considered limiting (was Re: XLink - where are we?)

W. Eliot Kimber eliot at dns.isogen.com
Fri Nov 13 05:50:19 GMT 1998


At 04:23 PM 11/12/98 -0500, John Cowan wrote:
>W. Eliot Kimber scripsit:
>
>> [D]ata
>> entities is the only feature of SGML not provided by XML that also has
>> significant *semantic* utility (as opposed to being a syntactic convenience
>> like markup minimization or the LINK feature) [...]
>
>And can you enlighten those of us who don't know SGML about
>the semantics of data entities?  Thanks.

Sure.

SGML and XML both have the concept of data content notations, declared with
the NOTATION declaration:

<!NOTATION MyDataType PUBLIC "http://www.drmacro.com/mydatatype.xml" >

Notations allow you to give names to specific data types, that is, the
rules that govern the interpretation and processing of a particular type of
data. XML is one example of a data content notation; GIF is another.  For
example, to refer to another XML document as a data entity, I would do this:

<!NOTATION xml PUBLIC "http://www.w3.org/rest of URL for XML spec" >
<!ENTITY somedoc SYSTEM "somedoc.xml" NDATA xml >

This entity "somedoc" is an unparsed entity in XML terms, meaning that it
is not parsed in the context of the document that references it (even
though it will of course be parsed as an independent document should it
ever be processed).
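
To see what a parser actually reports for these declarations, here is a
minimal sketch using Python's standard expat binding. The document string,
the public identifier "http://example.org/notations/xml" and the function
name read_declarations are assumptions made up for illustration; only the
two expat handler hooks are real library API.

```python
import xml.parsers.expat

# Hypothetical XML document; the notation's public identifier is a placeholder.
DOC = """<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ELEMENT doc EMPTY>
  <!ATTLIST doc ref ENTITY #REQUIRED>
  <!NOTATION xml PUBLIC "http://example.org/notations/xml">
  <!ENTITY somedoc SYSTEM "somedoc.xml" NDATA xml>
]>
<doc ref="somedoc"/>
"""


def read_declarations(text):
    """Collect the notation and unparsed-entity declarations the parser reports."""
    notations = {}
    entities = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_notation(name, base, system_id, public_id):
        notations[name] = {"public": public_id, "system": system_id}

    def on_unparsed_entity(name, base, system_id, public_id, notation_name):
        # The parser only reports the entity; it never opens somedoc.xml.
        entities[name] = {"system": system_id, "notation": notation_name}

    parser.NotationDeclHandler = on_notation
    parser.UnparsedEntityDeclHandler = on_unparsed_entity
    parser.Parse(text, True)
    return notations, entities


notations, entities = read_declarations(DOC)
print(notations)
print(entities)
```

The parser hands the application the entity's system identifier and the
name of its notation and stops there; what to do with the data is left to
the application.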

In SGML, you can define attributes for notations just as you can for
element types ("data attributes"). These attributes act as parameters to
the processors that know how to interpret data governed by the notations.
In SGML, you can specify data attributes either as part of a data entity
("unparsed entity") declaration or on an element that is also governed by a
notation.  With the WebSGML TC to SGML, you can also use data attributes as
part of the declaration of attributes in order to define specific data
types for attributes.

One use of data attributes is to associate attributes with data entities.
The textbook use is parameters associated with graphic data entities, e.g.:

<!NOTATION gif SYSTEM >
<!ATTLIST #NOTATION gif
   width 
     CDATA
     "100"
   height
     CDATA
     "100"
>

<!ENTITY big-graphic SYSTEM "picture.gif" NDATA gif
  [ width="640" height="480"] 
>

A typical SGML processor, when it encounters a reference to the entity
big-graphic, will see if it knows of a processor that can process the
notation "gif". It finds one and passes it the information from the entity
declaration, including the data attributes, which the processor presumably
takes as parameters.  The presumption is that the notation has defined, as
part of its formal documentation or definition, what the attributes should
be.  I.e., somewhere in the definition of this fictional gif notation (or
the local processor associated with the gif notation), the documentation
says something like "processors should accept height and width parameters
from which they determine the presentation size of the graphic".
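
The dispatch described above can be sketched roughly as follows. This is
not any particular SGML system's API: the registry dictionaries, the
show_gif stand-in and the dictionary layout for the entity are all
assumptions chosen for illustration. The defaults mirror the ATTLIST
#NOTATION declaration for gif shown earlier.

```python
# Defaults taken from the data attribute declarations for the "gif" notation.
NOTATION_DEFAULTS = {"gif": {"width": "100", "height": "100"}}


def show_gif(system_id, params):
    """Stand-in for a real GIF renderer; it only reports what it was asked to do."""
    return f"render {system_id} at {params['width']}x{params['height']}"


# Hypothetical registry mapping notation names to notation processors.
PROCESSORS = {"gif": show_gif}


def process_data_entity(entity):
    """Find the processor for the entity's notation and pass it the data attributes."""
    notation = entity["notation"]
    processor = PROCESSORS.get(notation)
    if processor is None:
        raise LookupError(f"no processor registered for notation {notation!r}")
    # Start from the declared defaults, then apply the values given on the entity.
    params = dict(NOTATION_DEFAULTS.get(notation, {}))
    params.update(entity.get("data_attributes", {}))
    return processor(entity["system_id"], params)


big_graphic = {
    "system_id": "picture.gif",
    "notation": "gif",
    "data_attributes": {"width": "640", "height": "480"},
}
print(process_data_entity(big_graphic))
```

An entity declared without data attributes would simply get the declared
defaults of 100 by 100.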

Data attributes can be used with elements that are "governed" by a
notation.  For example, say I want to define a query notation that I'll use
to address things in some repository.  I first define a notation that
represents the general query mechanism:

<!NOTATION MyQuery PUBLIC "http://www.drmacro.com/notations/myquery.xml" >

The resource identified by the external identifier should be the
authoritative definition of what the notation is about, how to process it,
etc.  *IT SHOULD NOT BE A PROGRAM*.  Programs should be associated with
notations by mapping their external identifiers to programs, dlls,
tool-provided functions, etc., through some tool-specific mechanism. For
example, in PHyLIS, my HyTime engine, my intent is to provide a
configuration file that lets you map notations to dynamic libraries or
objects (e.g., COM objects, Java classes, Corba whatsits, etc.).
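
As a rough illustration of that kind of mapping (this is not the PHyLIS
configuration format, which is not shown here), a tool could keep a table
from external identifiers to "module:callable" strings and resolve them at
run time. The table contents and the resolve_processor name are
assumptions, and json.dumps is just a stand-in for a real query processor.

```python
import importlib

# Hypothetical binding table: notation external identifier -> "module:callable".
NOTATION_BINDINGS = {
    "http://www.drmacro.com/notations/myquery.xml": "json:dumps",
}


def resolve_processor(external_id, bindings=NOTATION_BINDINGS):
    """Look up the code bound to a notation's external identifier and load it."""
    target = bindings[external_id]  # KeyError if the notation has no binding
    module_name, _, attr_name = target.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


processor = resolve_processor("http://www.drmacro.com/notations/myquery.xml")
print(processor.__name__)
```

The point of the indirection is that the document only ever names the
notation; which program handles it is a local configuration decision.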

To make the query easy for authors to specify, I want to provide a few
parameters that authors fill in to specify the query details. I do this by
declaring some notation attributes to serve as the parameters to the query:

<!ATTLIST #NOTATION MyQuery
   table       -- Name of table query is applied to --
     CDATA
     #REQUIRED 
   select-on   -- Value to select on --
     CDATA
     #REQUIRED
   where       -- Field whose value to select on --
     (name | ssnum | phone)  -- Some field names --
     #REQUIRED
>

I then declare an element type that can be used to specify queries in
documents:

<!ELEMENT SelectData -- Do a query --
  - O
  EMPTY
>
<!ATTLIST SelectData
   notation
     NOTATION
     (MyQuery)
     MyQuery
   table       -- Name of table query is applied to --
     CDATA
     #REQUIRED 
   select-on   -- Value to select on --
     CDATA
     #REQUIRED
   where       -- Field whose value to select on --
     (name | ssnum | phone)  -- Some field names --
     #REQUIRED
>

Note the NOTATION attribute. This attribute defines the SelectData element
as being "governed by" the notation MyQuery.  This means that *after* the
document is parsed, the processor will process this element type by looking
up a processor associated with the notation
"http://www.drmacro.com/notations/myquery.xml" (remembering that the
external ID is the real name; the local name, MyQuery, is just a local
proxy).  It finds one, so it passes the whole SelectData element to it
(that is, the element node constructed from the SelectData markup).  The
element's attributes are associated with the notation attributes simply by
matching the names.  [The Data Attributes for Elements (DAFE) facility of the
HyTime architecture provides machinery for resolving name conflicts, but
you wouldn't normally need this extra stuff because you can just define
your elements and notations so as to avoid the problem.]

The notation processor knows to expect the three attributes declared for
the notation (because the notation serves that processor and thus reflects
what it's defined as needing). It uses normal element processing (e.g.,
element.Attributes["table"]) to get the values of the attributes, does what
it does, and returns the result. The main processor takes the result and
goes on doing whatever it's doing.
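
Here is a minimal sketch of that flow using Python's minidom, under some
loud assumptions: the SelectData instance is written in XML syntax with
the notation attribute given explicitly (no DTD is read, so no defaults
are applied), the attribute values are invented, and run_my_query merely
returns the parameters instead of querying a real repository.

```python
from xml.dom import minidom

MYQUERY_ID = "http://www.drmacro.com/notations/myquery.xml"

# Local notation name -> external identifier (the "real name").
NOTATIONS = {"MyQuery": MYQUERY_ID}

REQUIRED = ("table", "select-on", "where")
ALLOWED_FIELDS = ("name", "ssnum", "phone")


def run_my_query(element):
    """Hypothetical MyQuery processor: reads its parameters from the element."""
    missing = [name for name in REQUIRED if not element.hasAttribute(name)]
    if missing:
        raise ValueError(f"missing attributes: {missing}")
    field = element.getAttribute("where")
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"unknown field {field!r}")
    return {
        "table": element.getAttribute("table"),
        "field": field,
        "value": element.getAttribute("select-on"),
    }


# External identifier -> processor, as a tool-specific registry might hold it.
PROCESSORS = {MYQUERY_ID: run_my_query}


def process_governed_element(element):
    """Resolve the element's notation to a processor and hand it the element."""
    external_id = NOTATIONS[element.getAttribute("notation")]
    return PROCESSORS[external_id](element)


doc = minidom.parseString(
    '<SelectData notation="MyQuery" table="employees" '
    'where="phone" select-on="555-0100"/>'
)
result = process_governed_element(doc.documentElement)
print(result)
```

The notation processor sees an ordinary element node and pulls its
parameters off by name, which is all the matching of element attributes
to notation attributes amounts to in the simple case.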

One way to think about data attributes is that they provide the binding
between software components in the outside world and data objects in the
document, either data entities or elements. Essentially, the processor for
some data type defines its interface, in the API sense, and the notation
attributes "implement" that interface by providing the necessary attributes.  

Of course, it's up to particular processing systems to provide the actual
integration APIs and configuration infrastructure so that processors and
documents can take advantage of notations and data attributes, but it's
pretty basic stuff.

I think that data attributes, especially when used with elements governed
by notations, are really valuable. I pushed very hard for their inclusion
in XML but was unable to convince the rest of the WG. I hope that the XML
developers will reconsider data attributes when XML is eventually revised.

Cheers,

E.
--
<Address HyTime=bibloc>
W. Eliot Kimber, Senior Consulting SGML Engineer
ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 75202.  214.953.0004
www.isogen.com
</Address>




