Namespaces and Parsers

Wed Jun 3 10:51:35 BST 1998

I've now started implementing Namespaces in JUMBO2 and have a clearer view
of what is required. I've found myself writing at least some of the
handling in SAX, and this represents my first thinking. I'd be extremely
grateful for comments.

Firstly, I think the namespace draft is exactly what I want at present. Let
me assume a document like:

<?xml version="1.0"?>   <!-- optional, but valuable -->
<?xml:namespace ns="http://foo.com" src="file:/usr/lib/foo.xml" prefix="F"?>
<?xml:namespace ns="http://bar.org" src="file:/usr/lib/bar.xml" prefix="B"?>
<!DOCTYPE F:foo [
<!ATTLIST F:foo xml:lang CDATA "en">
]>
<F:foo>
<?F:safety hazard="dwarf"?>
	<F:xyzzy B:y2="plugh">Orange smoke</F:xyzzy>
	<B:bar/>
</F:foo>

Given this document, the following happens.

SAX calls the parser, which sends calls to the DocumentHandler and
DTDHandler. The following result in SAX calls:
	xml:namespace PIs
	the F:safety PI
	the elements 
The following do not result in SAX calls:
	the DOCTYPE
	the ATTLIST
	the XML declaration (startDocument is triggered by the start of the file)

(Am I correct?)

The namespace draft suggests several places for validation (e.g.
xml:namespace must precede DOCTYPE). At least one of these checks cannot
be carried out with SAX as it stands, because information is lost: SAX
does not report the DOCTYPE, so an application cannot tell whether the
xml:namespace PIs preceded it. It was there that I was suggesting minor
revisions to SAX.

I think the rest of the namespace constructs can be verified and processed
through SAX calls.
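
As an illustration of what can be done through SAX calls alone, here is a
minimal sketch of a SAX 1 handler that collects the xml:namespace PIs. The
class name NamespacePiCollector is hypothetical. Since the DOCTYPE is not
reported, the sketch can only apply a weaker check of my own choosing: that
no xml:namespace PI arrives after the first element has started.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.SAXException;

// Hypothetical sketch, not JUMBO2 code.
@SuppressWarnings("deprecation") // the SAX 1 classes are deprecated in current JDKs
class NamespacePiCollector extends HandlerBase {
    // Raw PI data of every xml:namespace PI seen so far, in document order.
    final List<String> declarations = new ArrayList<>();
    private boolean seenElement = false;

    @Override
    public void processingInstruction(String target, String data)
            throws SAXException {
        if (!"xml:namespace".equals(target)) {
            return; // an ordinary PI such as F:safety
        }
        // Assumption: a declaration after the first element is an error.
        if (seenElement) {
            throw new SAXException("xml:namespace PI after the first element");
        }
        declarations.add(data);
    }

    @Override
    public void startElement(String name, AttributeList atts)
            throws SAXException {
        seenElement = true;
    }
}
```

An instance would be passed to the parser's setDocumentHandler; whether the
PI came before or after the DOCTYPE remains invisible to it.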

My current strategy is as follows:

Implement a Namespace class, whose constructor is called for each
xml:namespace (through processingInstruction). This can verify the
constraints and leads to an NsDef, SrcDef and Prefix for each Namespace.
Namespaces can be retrieved at any stage by their Prefix (from a
Hashtable). The Namespace stores the NsDef and the SrcDef (if supplied).
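
A minimal sketch of such a class follows. The field names follow the NsDef,
SrcDef and Prefix parts above; NamespaceTable, the regular expression and the
rule that ns and prefix are required are my assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: one instance per xml:namespace PI.
final class Namespace {
    // Matches pseudo-attributes such as ns="http://foo.com" in the PI data.
    private static final Pattern PSEUDO_ATT = Pattern.compile(
        "([A-Za-z_][\\w.-]*)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    final String nsDef;   // value of ns
    final String srcDef;  // value of src, or null if not supplied
    final String prefix;  // value of prefix

    // data is the PI data as passed to processingInstruction(target, data).
    Namespace(String data) {
        Map<String, String> atts = new HashMap<>();
        Matcher m = PSEUDO_ATT.matcher(data);
        while (m.find()) {
            String value = m.group(3) != null ? m.group(3) : m.group(4);
            atts.put(m.group(1), value);
        }
        nsDef = atts.get("ns");
        srcDef = atts.get("src");
        prefix = atts.get("prefix");
        // Assumption: ns and prefix are required, src is optional.
        if (nsDef == null || prefix == null) {
            throw new IllegalArgumentException("ns and prefix are required");
        }
    }
}

// Hypothetical registry: Namespaces retrievable by their prefix.
final class NamespaceTable {
    private final Map<String, Namespace> byPrefix = new HashMap<>();

    Namespace declare(String piData) {
        Namespace ns = new Namespace(piData);
        byPrefix.put(ns.prefix, ns);
        return ns;
    }

    // Returns null if the prefix has not been declared.
    Namespace forPrefix(String prefix) {
        return byPrefix.get(prefix);
    }
}
```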

When a PI (processingInstruction) or an element or attribute (startElement)
is encountered, the Names will be inspected for colons [SAX preserves
these, but has no special apparatus for them]. If one is found, a
UniversalName will be constructed for the object (PINode, XNode,
Attribute). The UniversalName consists of the ordered pair of the
namespaceName and the localName.
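
The colon inspection might be sketched as below. QNameResolver and its
methods are hypothetical names; the sketch splits at the first colon and
treats an undeclared prefix as an error, which is my assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: turns a prefixed Name into (namespaceName, localName).
final class QNameResolver {
    private final Map<String, String> nsByPrefix = new HashMap<>();

    void declare(String prefix, String namespaceName) {
        nsByPrefix.put(prefix, namespaceName);
    }

    // Returns {namespaceName, localName}, or null if the name has no colon.
    String[] resolve(String name) {
        int colon = name.indexOf(':');
        if (colon < 0) {
            return null; // unqualified name
        }
        String prefix = name.substring(0, colon);
        String ns = nsByPrefix.get(prefix);
        if (ns == null) {
            throw new IllegalArgumentException("undeclared prefix: " + prefix);
        }
        return new String[] { ns, name.substring(colon + 1) };
    }

    public static void main(String[] args) {
        QNameResolver r = new QNameResolver();
        r.declare("F", "http://foo.com");
        String[] un = r.resolve("F:foo");
        System.out.println("[" + un[0] + "," + un[1] + "]"); // prints [http://foo.com,foo]
    }
}
```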

Thus the UN for F:foo might be [http://foo.com,foo]. For internal storage
this might be held as the unique and parsable String "http://foo.com:foo".
This is, of course, an illegal elementTypeName in documents, but represents
a planet-wide uniquification. This is, IMO, the key aspect of namespaces -
we can create these unique strings and handle them in software. 
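
One way to hold the pair and its String form is sketched below. The method
names are hypothetical. Parsing splits at the last colon, on the assumption
that a localName never contains one (the namespaceName, being a URI,
usually does).

```java
import java.util.Objects;

// Hypothetical sketch of the ordered pair (namespaceName, localName).
final class UniversalName {
    final String namespaceName;
    final String localName;

    UniversalName(String namespaceName, String localName) {
        this.namespaceName = Objects.requireNonNull(namespaceName);
        this.localName = Objects.requireNonNull(localName);
    }

    // The internal storage form, e.g. http://foo.com:foo
    String toStorageString() {
        return namespaceName + ":" + localName;
    }

    // Inverse of toStorageString; splits at the LAST colon.
    static UniversalName parse(String stored) {
        int colon = stored.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("no colon in " + stored);
        }
        return new UniversalName(stored.substring(0, colon),
                                 stored.substring(colon + 1));
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof UniversalName)) {
            return false;
        }
        UniversalName other = (UniversalName) o;
        return namespaceName.equals(other.namespaceName)
            && localName.equals(other.localName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(namespaceName, localName);
    }

    public static void main(String[] args) {
        UniversalName un = new UniversalName("http://foo.com", "foo");
        System.out.println(un.toStorageString()); // prints http://foo.com:foo
    }
}
```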

When a stylesheet comes to render such an element, it must be able to
create the UniversalName for that element (i.e. the stylesheet must have
the same namespaceName and localPart in its components). It need not, and
may well not, use the same prefix. It is likely that for screen display I
shall use the prefix (e.g. the element might be displayed as F:foo) but the
user should have the option of displaying the UniversalName if required.

At this stage, merging of documents presents no problems. If I import
another document with a different namespace which also happens to use the
prefix F: I can create the universal names *before* merging the documents.
I would probably create a trivial prefix (e.g. F1) for display. 
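
A sketch of how such display prefixes might be handed out on merging. The
class DisplayPrefixes and the numbering scheme (F1, F2, ...) are
assumptions for illustration; http://other.example is a made-up namespace.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: keeps display prefixes unique across merged documents.
final class DisplayPrefixes {
    private final Map<String, String> nsByPrefix = new LinkedHashMap<>();

    // Returns the prefix to display for namespaceName. The preferred prefix
    // is used if free; on a clash a numbered variant is invented.
    String assign(String preferred, String namespaceName) {
        // A namespace that already has a display prefix keeps it.
        for (Map.Entry<String, String> e : nsByPrefix.entrySet()) {
            if (e.getValue().equals(namespaceName)) {
                return e.getKey();
            }
        }
        String candidate = preferred;
        for (int n = 1; nsByPrefix.containsKey(candidate); n++) {
            candidate = preferred + n;
        }
        nsByPrefix.put(candidate, namespaceName);
        return candidate;
    }

    public static void main(String[] args) {
        DisplayPrefixes p = new DisplayPrefixes();
        System.out.println(p.assign("F", "http://foo.com"));       // prints F
        System.out.println(p.assign("F", "http://other.example")); // prints F1
    }
}
```

Because the universal names are created before the merge, the invented
prefix affects display only.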

The behavior associated with each element will be determined either through
a stylesheet (*after UniversalName resolution*) or through mapping to Java
classes. For the latter, the mapping will also involve resolution. Thus if
I use an XSchema of the form:
<?xml version="1.0"?>
<?xml:namespace ns="http://foo.com" prefix="FOO"?> <!-- note different prefix -->
<XSC:Schema>
	<ElementType id="FOO:foo">
		<Java>com.foo.FOONode.class</Java>
...
The element FOO:foo will be resolved to the same UniversalName as in the
document instance. This element is then mapped onto the Java class. It can
display the button in orange and emit smoke when pressed.

For output the user will have the option to set the prefix for each
namespace. By default the prefixes will be those the document started with.

I feel very happy and excited about this. As far as I can see it's
implementable fairly simply in JUMBO2 and will be much nicer than JUMBO1's
namespaces.

There are some remaining problems:

	- unqualified names (i.e. prefix = null). Although I expect that in a
year's time almost all XML will be using namespaces, there will be chunks
which don't. Very often these will conform to a particular DTD (e.g. XHTML)
but without qualification. I therefore have a (slightly dangerous)
mechanism which I used in JUMBO1. It mapped all unqualified names to a
prefix of "#DEFAULT" (a deliberately illegal Name). This could then be
associated with a namespace either by an xml:namespace or through program
operation. The advantage of this is that it then gives the user a chance to
qualify their documents.
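
The "#DEFAULT" mapping can be sketched in a few lines. The class and method
names are hypothetical, and this is not the JUMBO1 code.

```java
// Hypothetical sketch of mapping unqualified names to a "#DEFAULT" prefix.
final class DefaultPrefix {
    // '#' cannot occur in an XML Name, so no document prefix can clash.
    static final String DEFAULT = "#DEFAULT";

    // The prefix of a Name, or "#DEFAULT" if the Name has no colon.
    static String prefixOf(String name) {
        int colon = name.indexOf(':');
        return colon < 0 ? DEFAULT : name.substring(0, colon);
    }

    // The part after the first colon, or the whole Name if unqualified.
    static String localPartOf(String name) {
        int colon = name.indexOf(':');
        return colon < 0 ? name : name.substring(colon + 1);
    }

    public static void main(String[] args) {
        System.out.println(prefixOf("F:foo")); // prints F
        System.out.println(prefixOf("foo"));   // prints #DEFAULT
    }
}
```

Once every name has a prefix, "#DEFAULT" can be bound to a namespace in the
same prefix table as any other prefix.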

	- XPointers. This is more serious. XPointers locate elements and
attributes by the occurrence of QNames in the document. Thus
	descendant(1,FOO:foo) 
will *not* find anything in our example instance. Since 'most people' agree
that the prefix has no formal standing, perhaps XPointer V2.0 (or even the
latest revision) could allow UniversalName substitution. I think this would
be extremely valuable and do not see any serious downside (given that we
are implementing namespaces anyway).
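
To make the difference concrete, here is a sketch comparing a literal QName
match with a match after UniversalName substitution. This is not XPointer
syntax or any XPointer API; UniversalMatch and its methods are hypothetical.

```java
import java.util.Map;

// Hypothetical sketch: compares element type names by universal name.
final class UniversalMatch {
    // Expands prefix:local to namespaceName:local using the given table.
    // Unqualified names are returned unchanged.
    static String universal(String qname, Map<String, String> nsByPrefix) {
        int colon = qname.indexOf(':');
        if (colon < 0) {
            return qname;
        }
        String prefix = qname.substring(0, colon);
        String ns = nsByPrefix.get(prefix);
        if (ns == null) {
            throw new IllegalArgumentException("undeclared prefix: " + prefix);
        }
        return ns + ":" + qname.substring(colon + 1);
    }

    // True if the two names denote the same element type, whatever
    // prefixes the pointer and the document happen to use.
    static boolean sameElementType(String pointerName,
                                   Map<String, String> pointerPrefixes,
                                   String elementName,
                                   Map<String, String> documentPrefixes) {
        return universal(pointerName, pointerPrefixes)
            .equals(universal(elementName, documentPrefixes));
    }
}
```

With FOO bound to http://foo.com on the pointer side and F bound to the
same namespace in the document, "FOO:foo" and "F:foo" differ as strings but
sameElementType returns true.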

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
