More on Namespaces (also long, but also optimistic)

Mon Aug 17 16:56:41 BST 1998

I attempted a post earlier, but I now realize this is a moderated list.
Perhaps whoever receives this (Mr. Murray-Rust?) would be kind enough to
forward this mail to the list or, even better, to folks in the WG.

I have been working on my own XML parser part-time for a couple of months now
as part of an effort to build an Internet-aware, XML-based EDI translator.
My background is in developing transaction-oriented database and GUI
applications and, recently, web-based application servers.

I am writing to complain (I know, I know) about the new namespace WD and the
(apparent) lack of a strategy to develop XML schemas.  I am also writing to
get an idea off my chest that I think will greatly improve XML and assist in
the development of schemas.

o 	Namespaces

Namespaces are an inherently global construct and I can see no reason why
document writers cannot live with a single prefix per global identifier
within a document.  Put another way, element level prefix declarations are
simply not necessary.  They solve no real-world problem and create many.
The original processing instruction for declaring a prefix seemed fine.  If
folks wish to add default prefix scoping, well that is a great idea, but an
independent issue.

For example (using a hypothetical ns syntax):
<?xml version="1.1"?>

<!NAMESPACE isbn PUBLIC 'urn:ISBN:0-395-36341-6'>
<!NAMESPACE foo 
        PUBLIC 'uuid:{00021A11-0012-0022-C000-000000782246}' 
        #DEFAULT
> <!-- now that's globally unique -->
<!NAMESPACE bar PUBLIC 'id:bar'> <!-- any globally unique string, right? -->

<!DOCTYPE books_ordered [
<!ENTITY % foodtd SYSTEM 'http://www.foo.com/foo.dtd'>
<!ENTITY % bardtd SYSTEM 'http://www.bar.com/bar.dtd'>
<!ENTITY % isbndtd 
        PUBLIC 'urn:ISBN:0-395-36341-6'
        'http://www.bar.com/bar.dtd'>

<!-- Associate the DTDs with their 
    locally defined namespace prefixes.
-->
<!NAMESPACE isbn #DEFAULT>
%isbndtd;
<!NAMESPACE foo #DEFAULT>
%foodtd;
<!NAMESPACE bar #DEFAULT>
%bardtd;
]>
<!NAMESPACE foo #DEFAULT>

<!-- Begin elements: Everything above can be pre-parsed 
    and cached for frequent document types. 
-->
<books_ordered>
<book>
	<title>My Life As A Dog<subtitle>Tail of a programmer</subtitle></title>
	<isbn:number>123456789-234-234</isbn:number>
	<author>Yusef Lateef</author>
	<ncopies>97</ncopies>
</book>
<bar:book 
        title="Technology Evangelism For Dummies" 
        isbn:numattr="139593933-859-833" 
        number_ordered="13"
>
        <author>Bob Dobbs</author>
        <author>Horatio Alger</author>
</bar:book>
</books_ordered>

In this example, all of the names of attributes and child elements of
<book> are resolved in the foo namespace, unless a prefix indicates
otherwise (e.g. isbn:number).  Likewise, the names of attributes and child
elements of <bar:book> are resolved in the namespace of the containing
element.  Scoping is a useful extension of namespaces that is in no way
dependent on element level declarations.  The exact syntax of the namespace
declaration is unimportant, except to note that it should be part of the
document header.
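
To make the resolution rule concrete, here is a rough sketch in Python.  It
is only an illustration of the scoping I have in mind, not a parser: the
document is hand-built as nested (name, attributes, children) tuples, and the
names PREFIXES, resolve and walk are made up for this example.

```python
# Document-level prefix table, mirroring the hypothetical <!NAMESPACE ...>
# declarations in the header.  One prefix per global identifier.
PREFIXES = {
    "isbn": "urn:ISBN:0-395-36341-6",
    "foo": "uuid:{00021A11-0012-0022-C000-000000782246}",
    "bar": "id:bar",
}
DEFAULT_PREFIX = "foo"  # the prefix marked #DEFAULT for the document body


def resolve(qname, inherited_ns):
    """Return (namespace_id, local_name) for a possibly prefixed name."""
    prefix, sep, local = qname.partition(":")
    if sep:
        # An explicit prefix always wins.
        return PREFIXES[prefix], local
    # An unprefixed name falls back to the containing element's namespace.
    return inherited_ns, qname


def walk(element, inherited_ns, out):
    """Resolve an element, its attributes and its children, depth first."""
    name, attrs, children = element
    ns, local = resolve(name, inherited_ns)
    out.append(("element", ns, local))
    for attr in attrs:
        out.append(("attribute",) + resolve(attr, ns))
    for child in children:
        walk(child, ns, out)
    return out


# A cut-down version of the books_ordered example (element content omitted).
doc = ("books_ordered", [], [
    ("book", [], [
        ("title", [], []),
        ("isbn:number", [], []),
        ("author", [], []),
    ]),
    ("bar:book", ["title", "isbn:numattr", "number_ordered"], [
        ("author", [], []),
    ]),
])

resolved = walk(doc, PREFIXES[DEFAULT_PREFIX], [])
for kind, ns, local in resolved:
    print(kind, ns, local)
```

Note that nothing here depends on where in the document a prefix was
declared.  The prefix table is fixed once the header has been read.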

o 	Expanded Names

The issue of expanded names reminds me a great deal of C++ name mangling.
The C++ committee chose not to standardize mangled names.  Consequently, it
is impossible to link libraries compiled by different compilers.  This has
not been a show stopper for C++, but is worth avoiding with XML.  For
example, a style sheet processor and XML processor should be able to match
elements and attributes based on the global namespace identifier.  If the
two programs expand the name to slightly different strings - no go.  
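
A small Python sketch of the failure mode.  Both expansion functions are
invented for illustration (they are not taken from any real processor); the
point is only that string equality breaks when the expansion rule is left to
each implementation, while comparing the (namespace ID, local name) pair does
not.

```python
NS = "urn:ISBN:0-395-36341-6"


def expand_a(ns, local):
    """Hypothetical processor A: joins the parts with a colon."""
    return f"{ns}:{local}"


def expand_b(ns, local):
    """Hypothetical processor B: wraps the namespace ID in braces."""
    return f"{{{ns}}}{local}"


def name_key(ns, local):
    """Compare on the pair itself rather than on any mangled string."""
    return (ns, local)


# Same element, two private expansions: the strings do not match.
strings_match = expand_a(NS, "number") == expand_b(NS, "number")

# Same element, compared as a (namespace ID, local name) pair: a match.
pairs_match = name_key(NS, "number") == name_key(NS, "number")

print(expand_a(NS, "number"))
print(expand_b(NS, "number"))
print(strings_match, pairs_match)
```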

As an aside, the stylesheet processor might want to match by prefix, which
would insulate documents from changes in the namespace ID, which might
change to reflect DTD version changes.

o	Schemas

www.w3c.org is starting to look like Schema-of-the-Month Club.  I am
confused about where things are going.  I have noticed a couple recurring
themes to these projects: 1) strong data types and 2) semantic validation.
There are good reasons why neither of these is included in the base XML
spec.  The user communities are too diverse.  That said, all non-trivial XML
applications must grapple with these important subjects.  Thus, it seems
worthwhile for the XML spec to include hooks to support strong data types and
to provide guidance on semantic validation for schema spec writers and
application developers.

The basic issue with data types is to notate how non-text data is to be
converted to and from the XML text format.  Well known data domains include
dates, numbers, currencies, social security numbers, phone numbers, etc.,
etc.  In each of these examples, the actual internal representation used by
a program may not be the same as the text format.  Note that this is a
wholly separate issue from presentation, which describes size, font, color,
etc.  Applications need this format information to successfully transform
the data into their respective internal formats.

I submit the following proposal for data format support in XML.  Simply
allow a new "xml:fmt" pseudo-attribute in both entity and attribute
declarations.  It could also be allowed as a predefined attribute for each
element.  The xml parser need only pass along the text of the format
specifier.  It is up to the application to use it to do useful work.  For
example,

<!ELEMENT process_date (#PCDATA 
content model
) 
xml:fmt="iso8601:YYYYMMDDTHH:MI:SSZ"
>

<!ATTLIST some_elem 
	effective_date CDATA #REQUIRED xml:fmt="iso8601:YYYYMMDD"
>

<some_date_elem xml:fmt="iso8601:YYYYMMDD">19980808</some_date_elem>

It may be that, for attribute values, this syntax is redundant with NOTATION
syntax.  But no one has seen fit to use it yet, and it is helpful to make the
mechanism a) explicit and b) consistent for both element and attribute values.
In fact, xml:fmt could supersede xml:space, which governs two different text
formats.  Omitting xml:fmt would leave the current default text handling, with
all whitespace preserved.  xml:fmt="xml:nospace" would compress each run of
whitespace to a single space character (#x20).  If you like being explicit,
use xml:fmt="xml:space" to denote that you want whitespace preserved.
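
As a sketch of how an application might act on these two values, assuming
the parser simply hands over the text and the format specifier (the function
name apply_space_fmt is made up for this example):

```python
import re

# The four XML whitespace characters: space, tab, carriage return, line feed.
_WS_RUN = re.compile(r"[ \t\r\n]+")


def apply_space_fmt(text, fmt=None):
    """Apply the proposed whitespace handling to element or attribute text."""
    if fmt is None or fmt == "xml:space":
        # No specifier, or the explicit one: preserve all whitespace.
        return text
    if fmt == "xml:nospace":
        # Compress each run of whitespace to a single space character.
        return _WS_RUN.sub(" ", text)
    raise ValueError(f"not a whitespace format: {fmt!r}")


raw = "My Life\n\tAs  A Dog"
print(repr(apply_space_fmt(raw)))
print(repr(apply_space_fmt(raw, "xml:nospace")))
```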

Again, the exact syntax is not important.  The important thing is the
ability to preserve format information along with the data in a compact,
normalized form.  This information will be highly useful to query tools and
xml applications such as EDI translators.  Part of a schema specification
would be the definition of data formats used to transform various data types.
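
For the date examples, the application side could look something like the
following Python sketch.  It assumes the specifier is spelled with an
"iso8601" prefix (after the ISO 8601 date standard) followed by a picture
string; the PICTURES table and the convert function are hypothetical, and a
real schema specification would have to define the picture syntax properly.

```python
from datetime import datetime, timezone

# Hypothetical mapping from picture strings to strptime patterns.
PICTURES = {
    "YYYYMMDD": "%Y%m%d",
    "YYYYMMDDTHH:MI:SSZ": "%Y%m%dT%H:%M:%SZ",
}


def convert(text, fmt):
    """Turn XML text into an internal date or datetime, guided by xml:fmt."""
    scheme, _, picture = fmt.partition(":")
    if scheme != "iso8601" or picture not in PICTURES:
        raise ValueError(f"unsupported xml:fmt value: {fmt!r}")
    value = datetime.strptime(text, PICTURES[picture])
    if picture.endswith("Z"):
        # A trailing Z in the picture means the time is given in UTC.
        return value.replace(tzinfo=timezone.utc)
    # Date-only pictures yield a plain date.
    return value.date()


effective = convert("19980808", "iso8601:YYYYMMDD")
processed = convert("19980817T16:56:41Z", "iso8601:YYYYMMDDTHH:MI:SSZ")
print(effective.isoformat())
print(processed.isoformat())
```

The parser's only job is to pass the specifier through untouched.  All of
the conversion logic lives in the application.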

My only other issue with the various schemas to date is that they overlap
too much with DTDs.  This is dangerous at this early stage in the XML
life-cycle.   This is not to say that schemas are unnecessary, simply that
they should use the facilities of XML to the extent possible.  If a schema
provides only incremental benefit over a DTD, then perhaps the DTD
specification itself warrants reexamination.  Whither XML 1.1?  We need this
to integrate namespaces and links and provide guidance on schema development.  

I hope you find these comments and suggestions constructive.

Best regards,
Charlie Reitzel

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)