Musing over Namespaces [long]

Sat Dec 18 01:23:20 GMT 1999

Reinserting the original question for reference purposes....

> > > Why is XML different? Is it just that we come from the SGML 
> > > background, where we consider structural validation to be part of 
> > > a document rather than a process applied to it, or is there some 
> > > kind of a fundamental difference between naming code and naming 
> > > document nodes that no one has articulated yet? 
> >
> > Just to take a stab in the dark here, but wouldn't this fundamental difference be that
> > Perl and Java are almost unilaterally self-contained parcels (hence, it doesn't matter
> > what the package is named, because you're writing everything that deals with it
> > anyway), but XML documents are designed for interchange - where the names don't
> > just have to make sense to you, but also to an unknown client?
> 
> This is not true.  We combine Java and Perl from a variety of sources and refer to
> these classes from other packages from other sources -- significantly more diverse than
> our current XML declarations in terms of one set of classes referring to another.

Okay, granted - I don't pretend to be a Java/Perl guru (hence, "stab in the dark"), so 
perhaps I misunderstood exactly what is meant by "naming code" in this context.  Now, 
here's the key point I'm trying to get a handle on - what level of code are we talking 
about?

As I see it, the following levels correspond to each other between Java/Perl (J/P) and XML:

1. Content == content == non-code.  Irrelevant for this discussion; no names involved.
2. Program-specific objects (private class?) == local DTD fragments == stuff that 
pertains to this file and only this file.  Names don't matter, since all the definitions/code 
required to deal with them are sent in the file itself.  (This seems from where I sit to be 
common in programs but rare in documents; all programs need to have a few routines 
that are unique to them, but very few *ML documents need to have tags which are 
defined for only that document.  This is part of the distinction I was drawing above.)
3. Public class == Public DTD == definitions for shared objects.  Names matter; other 
programs have to know how to access/handle/deal with these.
4. Inside class (pub or private) == Inside DTD (pub or private) == internal code for the 
definitions.  Names do not matter; the very purpose of a class/DTD is to let people deal 
with these objects without having to know about the guts of those manipulation methods 
- we don't need to know the protocols for manipulating magnetic particles on a videotape 
to hit Record on a VCR, as we trust the VCR (class/DTD definition/etc.) to handle that 
stuff for us.  Those protocols are only important to the author, so he can refer to them as 
Clyde and Billy if he so desires.  The key point here is "unexposed" code - you don't 
care if I use the variable I, R, or GatesIsEvil as a loop iterator in a function, because 
you're just calling the function on a black-box level.  As long as you feed the box the 
right inputs and it spits out the right outputs, you don't care what goes on in the box - 
and that's why you have the box in the first place.

My analysis is as follows, and please tell me if I'm wrong.  With XML, you have very 
little private code in terms of document elements - after all, the whole idea is one of 
interchange, which means sharing, which means some level of publicness, which in turn 
requires some way to resolve the public parts that appear in a document.  (Cascading 
style sheets come to mind as a framework model - look on the client's system for the 
highest-priority definitions, then look at the document and where it points.  Just a 
tangent.)  Your private code is going to be in your software, in how you handle 
documents that come in from wherever.  OTOH, with programs, you actually have quite 
a bit of private code - even if that private code is built with public-class bricks, the 
building itself is private.  Responsibility demands that, if you're going to use public-class 
stuff, you bundle it with the program in some way - but, if I get the classpath 
concept right, the client can say "oh, I already have that module".

To put that last bit more clearly, I am saying that programs either need to be 
sent as complete entities - with all the public modules required to handle all the public 
classes used in the program included with the distribution - or there needs to be some 
way for the client to obtain any public modules that were not sent.  From what I can tell, 
C++ takes the former approach; if you reference a standard class, the code required to 
manipulate that class is built into the compiled program (at least with static linking), and 
thus the distinction between public and private classes vanishes where the end user is 
concerned.  If Java and Perl do things differently in this regard, then I would expect there 
to be some way to take a reference to an unknown but publicly defined class, and retrieve 
the missing class definitions from somewhere.

Isn't that what namespaces are supposed to be for in XML?

As I understand the XML namespace concept, the namespace is included in a document 
to tell the client software where to get instructions on how to handle some set of 
elements.  If you want to use HTML elements, you give directions to the HTML 
definitions.  If you want to stick some MathML in there, tell the client where the 
MathML definitions are.  In short, assume the client software doesn't know what ANY 
elements mean and thus provide namespaces that cover everything you use...just as a 
responsible programmer includes all the libraries that his program needs.  However, just 
as a Java VM can apparently say "oh, I have that class already" and default to a local 
version, the key to meaningful XML namespaces seems to be giving the XML client a 
way to say "oh, I need to transform that element into *this*".
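The "look locally first, then go get it" idea in the last paragraph can be sketched in Python (an arbitrary choice of language, since nothing in this thread is code). This is a minimal, hypothetical sketch: `process`, `resolve`, the `installed` table and the `fetch_remote` hook are illustrative names, not part of any standard. One caveat: under the Namespaces in XML recommendation a namespace name is a URI used only as an identifier, and nothing requires it to point at retrievable definitions, so deciding "where to get the instructions" is left to the client application, as it is here.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional, Tuple

# A handler takes an element's local name and returns some processed result.
Handler = Callable[[str], str]

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def split_name(tag: str) -> Tuple[Optional[str], str]:
    """Split ElementTree's '{uri}local' tag form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


def resolve(uri: str, registry: Dict[str, Handler],
            fetch_remote: Optional[Callable[[str], Handler]] = None) -> Handler:
    # 1. "Oh, I have that already": prefer a locally installed definition.
    if uri in registry:
        return registry[uri]
    # 2. Otherwise ask a caller-supplied (hypothetical) hook for one, and cache it.
    if fetch_remote is not None:
        registry[uri] = fetch_remote(uri)
        return registry[uri]
    # 3. No instructions anywhere: refuse rather than guess.
    raise LookupError(f"no instructions for namespace {uri!r}")


def process(xml_text: str, registry: Dict[str, Handler],
            fetch_remote: Optional[Callable[[str], Handler]] = None) -> List[str]:
    results = []
    for el in ET.fromstring(xml_text).iter():  # document order
        uri, local = split_name(el.tag)
        if uri is None:
            continue  # un-namespaced elements are left to the application itself
        results.append(resolve(uri, registry, fetch_remote)(local))
    return results


DOC = """<page xmlns:h="http://www.w3.org/1999/xhtml"
      xmlns:m="http://www.w3.org/1998/Math/MathML">
  <h:p>Area:</h:p>
  <m:math><m:mi>r</m:mi></m:math>
</page>"""

installed = {XHTML_NS: lambda name: "html:" + name,
             MATHML_NS: lambda name: "math:" + name}
print(process(DOC, installed))  # ['html:p', 'math:math', 'math:mi']
```

Passing an empty registry and no hook makes `process` raise `LookupError` at the first namespaced element, which is the "assume the client knows nothing" case.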

I think that's right, anyway.  (Gimme a break; I've had a long day.)

Markup languages are, when you get down to brass tacks, all about facilitating the 
transformation of data.  (Okay, so technically ALL computer tasks are merely data 
transformation in some form, but that's a tangent.)  Transformation requires input, 
instructions, and output.  Well-formed XML gives us a coherent form of input, but that 
means nothing without meaningful instructions...and those instructions have to come 
from somewhere.  That's where the DTDs, schemas, and namespaces come in - making 
the pretty document mean something to a client.  (And yes, I'm being deliberately 
ambiguous with the word "client".  Think about it.)

With that in mind, consider my other comments:

> > In other words, my only concern when naming a function or a class in a program is
> > that I need to know what it is; I can name a variable "Fred" or a 50-char string class
> > "Bubba" if I want to, and it doesn't matter - because nobody else needs to
> > understand what those names mean.

I'm talking about private/local code here.  If I write my own funky sort routine, nobody 
else needs to know what I call my variables.  If I make the routine private and keep it in 
my own software, nobody else even needs to know what its name is.  If I make the 
routine public, I need to document the calling protocols, the inputs and the outputs - but 
the interior code is still "mine" to name as I please.
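A small Python illustration of that point, assuming a leading underscore marks a helper as private (in Python this is a naming convention, not an enforced access rule). The names `funky_sort` and `_bubba` are hypothetical, and insertion sort is just a stand-in for the "funky" routine. Callers depend only on `funky_sort`, its input and its output; `_bubba` and its loop variable could be renamed without any caller noticing.

```python
from typing import List


def funky_sort(items: List[int]) -> List[int]:
    """Public entry point: its name, inputs and outputs are the documented contract."""
    return _bubba(list(items))  # work on a copy so the caller's list is untouched


def _bubba(fred: List[int]) -> List[int]:
    # Private helper: an insertion sort. Its name and its variable names are
    # "internal" and can change freely without affecting any caller.
    for gates_is_evil in range(1, len(fred)):
        key = fred[gates_is_evil]
        j = gates_is_evil - 1
        while j >= 0 and fred[j] > key:
            fred[j + 1] = fred[j]  # shift larger values one slot right
            j -= 1
        fred[j + 1] = key
    return fred


print(funky_sort([3, 1, 2]))  # [1, 2, 3]
```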

> >  However, if I'm writing a document that I'm going to send somewhere else for Joe
> > to deal with, I'd better use names that Joe can understand and easily map.

With XML, this mapping can be an XSLT transformation or something else that Joe can 
handle - but I cannot simply leave names undefined.  If I use something funky, I have to 
tell Joe what to do with it.
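XSLT is the natural tool for this kind of mapping; to keep one language across these examples, here is a rough Python stand-in instead (it is not XSLT and only renames elements). `map_names` and the `MINE_TO_JOE` table are hypothetical. The important behaviour is the refusal: any name without a mapping raises an error instead of being passed along to Joe undefined.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def map_names(xml_text: str, mapping: Dict[str, str]) -> str:
    """Rename every element according to `mapping`; refuse if any name is left undefined."""
    root = ET.fromstring(xml_text)
    unmapped = sorted({el.tag for el in root.iter() if el.tag not in mapping})
    if unmapped:
        # Joe would be left guessing, so fail loudly instead.
        raise ValueError(f"no mapping supplied for: {unmapped}")
    for el in root.iter():
        el.tag = mapping[el.tag]
    return ET.tostring(root, encoding="unicode")


# Hypothetical vocabularies: my "funky" names on the left, Joe's on the right.
MINE_TO_JOE = {"memo": "note", "funky": "body"}
print(map_names("<memo><funky>hi</funky></memo>", MINE_TO_JOE))
# <note><body>hi</body></note>
```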

> Packages are heavily reused and refer to the public declarations of other packages.

Great.  I like code reuse.

> In Java, methods only to be used within a package are not given scope beyond that
> package and can be renamed as desired.  But software packages would be quite
> useless without a significant number of public declarations.

In other words, private code is yours to deal with as you please, but for certain 
functions, you shouldn't reinvent the wheel every time.  Fine; I accept that and even 
encourage it.

> For example, take the W3C DOM Java bindings, which consist almost purely of public
> declarations.  It declares such common public names for use as "Document",
> "Comment", "Attr". There are numerous conflicts between these compiled interface
> class names and classes in the applications I use them in.  Without package name
> qualification of the classes, the situation would be quite difficult not only distinguishing
> between ambiguous names, but also just trying to keep track of which standard each
> class belonged to.
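The same disambiguation can be seen on the XML side with Python's standard `xml.etree.ElementTree`, which reports every namespaced element under its expanded name, `{namespace-URI}local-name`. The two `urn:example:` URIs below are hypothetical stand-ins for two independent standards that both define a `Document`.

```python
import xml.etree.ElementTree as ET
from typing import List

W3C_DOM = "urn:example:w3c-dom"  # hypothetical URI for one standard
MY_APP = "urn:example:my-app"    # hypothetical URI for my application

DOC = f"""<root xmlns:w3c="{W3C_DOM}" xmlns:app="{MY_APP}">
  <w3c:Document/>
  <app:Document/>
</root>"""


def child_names(xml_text: str) -> List[str]:
    """Return the expanded name of each child of the root element."""
    return [child.tag for child in ET.fromstring(xml_text)]


print(child_names(DOC))
# ['{urn:example:w3c-dom}Document', '{urn:example:my-app}Document']

# Lookups are keyed on the URI; the query's prefix ("x") need not match the document's ("app").
mine = ET.fromstring(DOC).find("x:Document", {"x": MY_APP})
print(mine.tag)  # {urn:example:my-app}Document
```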

In other words, you need a way to take a given element/object and find out for certain 
exactly what set of instructions you need to follow when processing it.  Sounds awfully 
similar to XML namespaces from here.  Context is critical, yet we can only assume a 
minimal context (the declared XML spec/the defined Java syntax) - everything else has 
to be unambiguously identified in some way.

> IMO, this is exactly what happens with XML, mixing different standard elements and
> architectural forms from different specs in a single DTD or content model to produce a
> desired result.

I'm not so sure a single DTD is the answer, but I may be reading you a bit too closely.  
Rather, I'd say references to all required name sources are needed - why copy someone 
else's definition into your single DTD when you can simply say "for this element, go by 
this other DTD"?

> > The difference you're looking for is one of scope.  Internal names don't matter to the
> > outside world, because nobody outside has to do anything with them...but external
> > names MUST be defined in some way, else nobody outside CAN do anything with
> > them.  Am I expressing it clearly?
> 
> No.  Java packages typically have lots of external names that matter to the outside.

Rupture alert!  As I read you, you're talking about "external names that matter to the 
outside" in the context of (a) public classes that will be referred to by other software, 
and/or (b) calls to such public classes.  Yes, both of those DO matter to the outside, and 
I say as much above ("external names MUST be defined in some way") - but neither of 
them is an "internal name" as I discussed, precisely because by "internal names" I mean 
things which are not exposed to third parties; they are completely internal to your code.  
You're talking about one thing, I'm talking about another.  External names have to be 
referenced in a meaningful way, otherwise the program falls apart: "I'm supposed to 
make an object named Bubba of class public-hick - but what does 'hick' mean?  It's not 
defined anywhere, and I don't know where else to look!  (crash)".  This is wholly 
different from "I'm supposed to make an object named Fred of class public-foo - okay, I 
know what foo is, so I can do that."  The class names hick and foo matter because they 
are external references.  The variable names Bubba and Fred only matter if they are 
being exposed for specific reuse.  You're talking about hick and foo; I'm talking about 
Bubba and Fred.  (I'm also talking about both sides of "object Jake of class private-
schmuck".)
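XML namespaces draw much the same line, if we treat a namespace prefix as an "internal" name in the Bubba/Fred sense (this mapping of the analogy is an illustrative assumption). A prefix is local to one document, so two documents using different prefixes for the same URI yield identical expanded names; the URI plays the role of the external class name foo. A prefix bound to nothing is the 'hick' case, and a namespace-aware parser such as Python's `xml.etree.ElementTree` rejects it outright. `urn:example:foo` is a hypothetical URI.

```python
import xml.etree.ElementTree as ET

FOO_NS = "urn:example:foo"  # hypothetical vocabulary URI (the "foo" class)

# Two authors choose different prefixes (internal names, like Fred and Bubba).
doc_fred = f'<r xmlns:fred="{FOO_NS}"><fred:item/></r>'
doc_bubba = f'<r xmlns:bubba="{FOO_NS}"><bubba:item/></r>'

# A prefix never bound to any URI: like class "hick", nothing defines it.
doc_hick = '<r><hick:item/></r>'


def first_child_tag(xml_text: str) -> str:
    """Expanded name of the root's first child."""
    return ET.fromstring(xml_text)[0].tag


def parses(xml_text: str) -> bool:
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:  # the parser reports an unbound prefix
        return False


print(first_child_tag(doc_fred))   # {urn:example:foo}item
print(first_child_tag(doc_bubba))  # {urn:example:foo}item
print(parses(doc_hick))            # False
```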

 Rev. Robert L. Hood  | http://rev-bob.gotc.com/
  Get Off The Cross!  | http://www.gotc.com/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)