Notations

W. Eliot Kimber eliot at dns.isogen.com
Wed Sep 30 04:37:42 BST 1998


At 07:56 PM 9/29/98 -0400, david at megginson.com wrote:

>HTML and XML should have similar public IDs, since they're both W3C
>specs -- the public ID will probably include the w3.org domain name.
>What do you use, Eliot?

For graphic notations, I usually use an omitted system identifier. For
other notations, I use the URL of the spec, if I know it. I didn't realize
there was an Adobe-defined FPI for EPS (is there one for PDF? Frame MIF?
Frame binary?).

The purpose of the external identifier for a notation is to uniquely
identify the notation, presumably by identifying the authoritative
documentation for that notation.  This has two purposes:

1. To allow observing humans to determine what a particular notation is all
about and have some hope of figuring out how to process it.
2. To allow local notation names (the names used on data ("unparsed")
entity declarations and in NOTATION attributes) to be mapped to the
processor for that notation.

This latter function is *identical* to the way that object references are
mapped to objects in COM.  If you dig into how COM object connections are
managed, you'll discover that the Windows registry is, in part, nothing but
a mapping table that gets you from local (to your machine) names for
notations to the UUIDs of the COM objects that implement those notations,
which are then mapped to the local program names on your machine (e.g., a
.exe, .dll, or .ocx file).

This is just like notations: the local notation name "EPS" maps to the
universally unique name for the notation
(+//ISBN 0-201-18127-4::Adobe//NOTATION PostScript Language Ref. Manual//EN),
which in turn maps to the local processor object that interprets the
notation (e.g., acroread.exe).
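The two-step mapping just described can be sketched as two dictionary lookups. This is a minimal illustration, not PHyLIS or Windows code: the table names LOCAL_NOTATIONS and LOCAL_PROCESSORS are hypothetical, and the second local name "PostScript" is invented to show that different documents may use different local names for the same notation.

```python
# Hypothetical tables illustrating the two levels of indirection.
POSTSCRIPT_FPI = "+//ISBN 0-201-18127-4::Adobe//NOTATION PostScript Language Ref. Manual//EN"

# Level 1: local notation names (which may differ from document to document)
# -> the universally unique external identifier of the notation.
LOCAL_NOTATIONS = {
    "EPS": POSTSCRIPT_FPI,
    "PostScript": POSTSCRIPT_FPI,
}

# Level 2: external identifier -> the processor installed on this machine.
LOCAL_PROCESSORS = {
    POSTSCRIPT_FPI: "acroread.exe",
}


def resolve_processor(local_name: str) -> str:
    """Map a local notation name to a local processor (KeyError if either table lacks it)."""
    external_id = LOCAL_NOTATIONS[local_name]
    return LOCAL_PROCESSORS[external_id]


print(resolve_processor("EPS"))         # acroread.exe
print(resolve_processor("PostScript"))  # acroread.exe
```

Only the middle key is shared between machines; either end can change locally without touching the other.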

I find this interesting for two reasons.  First, it suggests that the
notation mechanism is the correct solution to the problem, because someone
else came up with essentially the same solution for essentially the same
problem. Second, during the XML discussions, Microsoft often complained
that indirection was too hard in various contexts. However, here is
Microsoft using pretty sophisticated indirection in the heart of their
operating systems.  Hmmmm.  Maybe it's not so hard after all.

Or is it that in the case of COM, as for notations, there's simply
no way to avoid the indirection, so you have to suck it up and deal with
it?  Hmmmm. 

The main difference between what's happening in COMland and what notations
do is that in COMland the unique name is completely opaque. The generation
algorithm depends on a bunch of variables that pretty well guarantee
uniqueness, but they also guarantee opacity. External IDs can be just as
unique, but they require things like registration authorities and
name-management processes in order to remain human-understandable and
meaningful.
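To make the opacity point concrete, here is a small sketch using Python's standard uuid module. It is only a stand-in for whatever generator COM actually uses; the assumption is that a version-1 style UUID (timestamp, clock sequence, node ID) is representative of the "bunch of variables" involved. Parsing the UUID from the PHyLIS example yields nothing but numbers.

```python
import uuid

# The UUID from the PHyLIS catalog example: parsing it yields only numbers,
# nothing about what kind of object or notation it names.
known = uuid.UUID("{00000014-0000-0010-8000-00AA006D2EA4}")
print(known)  # 00000014-0000-0010-8000-00aa006d2ea4

# uuid1() derives its value from a timestamp, a clock sequence, and a node
# ID (usually a network address). Those inputs make collisions very
# unlikely, but the result is just as opaque as the one above.
fresh = uuid.uuid1()
print(fresh.version)  # 1
print(fresh.time, fresh.clock_seq, fresh.node)
```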

One of the things this means is that FPIs can, if constructed in clever
ways, be "researchable" (as Martin Bryan said) in the absence of a known
mapping, while UUIDs are pretty much just noise unless you already have the
mapping.
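What makes an FPI researchable is that its fields can be pulled apart and read. A simplified sketch follows: it splits on "//" into owner, public text class, description, and language, and it deliberately ignores the "-//" unavailable-text marker and optional display versions, so it is not a complete ISO 8879 parser. The FPI and NamedTuple names are illustrative.

```python
from typing import NamedTuple, Optional


class FPI(NamedTuple):
    registered: Optional[bool]  # True for "+//", False for "-//", None for ISO owners
    owner: str
    text_class: str
    description: str
    language: str


def parse_fpi(fpi: str) -> FPI:
    """Split a formal public identifier into its main fields (simplified)."""
    parts = fpi.split("//")
    if parts[0] in ("+", "-"):
        registered: Optional[bool] = parts[0] == "+"
        owner, rest = parts[1], parts[2:]
    else:  # e.g. "ISO 8879:1986//ENTITIES ..."
        registered = None
        owner, rest = parts[0], parts[1:]
    if len(rest) < 2:
        raise ValueError(f"not a formal public identifier: {fpi!r}")
    text_class, _, description = rest[0].partition(" ")
    return FPI(registered, owner, text_class, description, rest[1])


eps = parse_fpi("+//ISBN 0-201-18127-4::Adobe//NOTATION PostScript Language Ref. Manual//EN")
print(eps.owner)        # ISBN 0-201-18127-4::Adobe
print(eps.description)  # PostScript Language Ref. Manual
```

Even without a catalog entry, the owner field points at a book you can go and find.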

I can tell you one thing: the Windows registry would be a heck of a lot
easier to debug if you could tell by looking at a UUID what it named, or at
least have a clue.

This then leads to a question: do I use public IDs, URLs, or UUIDs for my
notations? I think that I would *never* use UUIDs, because they are too
opaque. But I would definitely use them as the right hand side of my local
mapping table, assuming that I'm using COM-based software (which until
someone provides a usable SGML editor on Linux {other than psgml--sorry,
I'm dependent on graphical interfaces for structured editing}, I'm forced
to do).

Once I properly implement generalized notation processing for PHyLIS
(www.phylis.com), you will actually see things like this in the "entity"
mapping catalog PHyLIS uses:

PUBLIC "x"
       "{00000014-0000-0010-8000-00AA006D2EA4}"

Where "x" is the external ID for the notation (notation name, URL, or FPI,
doesn't matter) and "{00000014-0000-0010-8000-00AA006D2EA4}" is the UUID of
the COM object that implements PHyLIS' notation processor interface on your
machine for that notation.  
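Reading such catalog entries into a lookup table could look roughly like this. It is a simplified sketch: it recognizes only PUBLIC entries whose two literals are quoted, does not handle catalog comments or any other keyword, and normalizes whitespace in the public ID. The function name load_notation_map is hypothetical, and the real EPS FPI stands in for the placeholder "x".

```python
import re
from typing import Dict

# PUBLIC, then a quoted public ID, then a quoted system ID (here a UUID).
# Either quote character is accepted; the entry may span lines.
PUBLIC_ENTRY = re.compile(r"""PUBLIC\s+(["'])(.*?)\1\s+(["'])(.*?)\3""", re.DOTALL)


def load_notation_map(catalog_text: str) -> Dict[str, str]:
    """Map each notation external ID to the UUID of its implementing object."""
    mapping: Dict[str, str] = {}
    for m in PUBLIC_ENTRY.finditer(catalog_text):
        public_id = " ".join(m.group(2).split())  # collapse runs of whitespace
        mapping.setdefault(public_id, m.group(4))  # keep the first entry if repeated
    return mapping


CATALOG_TEXT = '''
PUBLIC "+//ISBN 0-201-18127-4::Adobe//NOTATION PostScript Language Ref. Manual//EN"
       "{00000014-0000-0010-8000-00AA006D2EA4}"
'''

print(load_notation_map(CATALOG_TEXT))
```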

Within PHyLIS, the processing will be:

1. Get reference to data with a notation (for example, a request to
construct a grove from a data entity with the notation "x").
2. Look up the external ID of the notation for the data entity.
3. For the external ID, look up the UUID of the implementing object.
4. Use that UUID as the argument to create_object() (in VB; I'm not sure
what it would be in Python, but there must be something).
5. Windows handles resolving the UUID to an executable.
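The five steps can be sketched end to end in Python. Because real COM activation only works on Windows, this sketch replaces the registry with a dictionary, and create_object(), EpsProcessor, and construct_grove() are hypothetical stand-ins rather than PHyLIS or Windows APIs. On Windows, step 5 would be an actual COM object-creation call.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

EPS_FPI = "+//ISBN 0-201-18127-4::Adobe//NOTATION PostScript Language Ref. Manual//EN"
EPS_CLSID = "{00000014-0000-0010-8000-00AA006D2EA4}"


@dataclass
class DataEntity:
    name: str
    notation: str  # local notation name from the entity declaration


class EpsProcessor:
    """Hypothetical object implementing a notation-processor interface."""

    def construct_grove(self, entity: DataEntity) -> str:
        return f"grove for {entity.name}"


# Step 2 table: local notation name -> external ID (from NOTATION declarations).
NOTATION_DECLS: Dict[str, str] = {"EPS": EPS_FPI}
# Step 3 table: external ID -> UUID (from the mapping catalog).
CATALOG: Dict[str, str] = {EPS_FPI: EPS_CLSID}
# Stand-in for the Windows registry: UUID -> factory for the implementing object.
REGISTRY: Dict[str, Callable[[], Any]] = {EPS_CLSID.upper(): EpsProcessor}


def create_object(clsid: str) -> Any:
    """Hypothetical steps 4-5: instantiate whatever the registry says implements clsid."""
    return REGISTRY[clsid.upper()]()


def processor_for(entity: DataEntity) -> Any:
    external_id = NOTATION_DECLS[entity.notation]  # step 2
    clsid = CATALOG[external_id]                   # step 3
    return create_object(clsid)                    # steps 4 and 5


fig = DataEntity(name="fig1", notation="EPS")   # step 1: a request for a grove
print(processor_for(fig).construct_grove(fig))  # grove for fig1
```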

When configuring PHyLIS, you would register the COM objects you want to use
to process various notations, just as you register helper apps in your Web
browser, using some PHyLIS-provided interface (or by modifying the XML
document(s) PHyLIS will use for configuration--you can bet I'm not going
near the registry for that). Big difference: there is no dependency on file
extensions, as there is with MIME types (at least on Windows; Unix systems
may be smarter). In fact, the external ID of the data entity is irrelevant;
the notation governs.
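Here is a sketch of configuration kept in an XML document rather than the registry, with the lookup driven only by the notation. The element and attribute names (notation-processors, processor, notation, clsid) are invented for illustration; the post does not say what PHyLIS's configuration format will look like.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical configuration document.
CONFIG_XML = """\
<notation-processors>
  <processor clsid="{00000014-0000-0010-8000-00AA006D2EA4}"
    notation="+//ISBN 0-201-18127-4::Adobe//NOTATION PostScript Language Ref. Manual//EN"/>
</notation-processors>
"""

# Local notation name -> external ID, as declared in the document's DTD.
NOTATION_DECLS: Dict[str, str] = {
    "EPS": "+//ISBN 0-201-18127-4::Adobe//NOTATION PostScript Language Ref. Manual//EN",
}


def load_config(xml_text: str) -> Dict[str, str]:
    """Return a map from notation external ID to the UUID of its processor."""
    root = ET.fromstring(xml_text)
    return {p.get("notation"): p.get("clsid") for p in root.iter("processor")}


def clsid_for(entity: Dict[str, str], config: Dict[str, str]) -> str:
    # Only the entity's notation is consulted; its system ID (and any file
    # extension in it) plays no part in choosing the processor.
    return config[NOTATION_DECLS[entity["notation"]]]


config = load_config(CONFIG_XML)
entity = {"system_id": "figure1.gif", "notation": "EPS"}  # misleading extension
print(clsid_for(entity, config))  # {00000014-0000-0010-8000-00AA006D2EA4}
```

The ".gif" in the system ID is ignored: the EPS notation decides which object is used.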

Of course, you might define a very generic notation, like "graphic", where
the processor uses other means to determine how to really process the
graphic (it might use MIME types), but that's ok--if it makes sense to do
that for you, there's no reason not to. In the case of things like
graphics, there's already a well-established mechanism for making graphics
self-identifying as to type (magic numbers), so why make the entity
declaration redundant and risk having it lie? (How many times have you
changed the format of a graphic and grumbled about having to update the
entity declaration?) But not all data types have this facility, so you
still need something like notations to handle that case.
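A processor for a generic "graphic" notation might sniff magic numbers roughly as follows. The signature list is a small sample of well-known ones, not an exhaustive table, and the function name sniff is illustrative.

```python
from typing import Optional

# A few well-known leading-byte signatures; a real sniffer would know many more.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
    (b"%!PS-Adobe", "PostScript"),
    (b"\xc5\xd0\xd3\xc6", "EPS (DOS binary header)"),
]


def sniff(data: bytes) -> Optional[str]:
    """Guess a graphic's format from its first bytes; None if nothing matches."""
    for signature, kind in MAGIC:
        if data.startswith(signature):
            return kind
    return None


print(sniff(b"GIF89a\x01\x00"))                 # GIF
print(sniff(b"%!PS-Adobe-3.0 EPSF-3.0\n"))      # PostScript
print(sniff(b"plain text has no magic number"))  # None
```

The last case is the point of the paragraph above: data without a signature still needs a declared notation.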

You also need notations to indicate that special interpretations should be
applied to an element (after parsing, of course), which is what notation
attributes do.
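A notation attribute can be sketched with Python's ElementTree. The DTD, the "tex" notation, its example.com system identifier, and the INTERPRETERS table are all hypothetical. ElementTree does not report attribute types, so this sketch simply assumes the NOTATION-typed attribute is named "notation".

```python
import xml.etree.ElementTree as ET

DOC = """\
<!DOCTYPE doc [
<!NOTATION tex SYSTEM "http://example.com/notations/tex">
<!ELEMENT doc (formula*)>
<!ELEMENT formula (#PCDATA)>
<!ATTLIST formula notation NOTATION (tex) #REQUIRED>
]>
<doc><formula notation="tex">x^2 + y^2</formula></doc>
"""

# Hypothetical interpreters keyed by notation name, applied after parsing.
INTERPRETERS = {"tex": lambda text: f"[TeX math: {text}]"}


def interpret(elem: ET.Element) -> str:
    notation = elem.get("notation")
    text = elem.text or ""
    if notation is None:
        return text  # no special interpretation requested
    return INTERPRETERS[notation](text)


root = ET.fromstring(DOC)
for formula in root.iter("formula"):
    print(interpret(formula))  # [TeX math: x^2 + y^2]
```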

Cheers,

E.
--
<Address HyTime=bibloc>
W. Eliot Kimber, Senior Consulting SGML Engineer
ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 75202.  214.953.0004
www.isogen.com
</Address>

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



