SAX, Java, and Namespaces (was Re: Restricted Namespaces for XML)

Tyler Baker tyler at infinet.com
Fri Feb 5 00:25:04 GMT 1999


David Megginson wrote:

> Tyler Baker writes:
>
>  > If SAX were to make a simple requirement that all strings that
>  > represent symbols (like names) were to be interned then things
>  > would be a lot cheaper.  The same can be said of the DOM as well.
>
> The problem is that Java's own intern is so terribly inefficient that
> no serious parser writer will use it (most of them have their own,
> custom interns).

As of JDK 1.1.6 things are not so bad, and Java 2 is a bit better, since interned Strings are
managed under the hood using weak references.  It could still be made better in the JDK, though.
I suspect that if they made a real effort in the Java 2 JVM, they could make string interning at
least twice as fast as it currently is.  Nevertheless, string interning is a one-time cost, so
let's put that in perspective here.
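To make the "one-time cost" point concrete, here is a minimal Java sketch (my own illustration, not anything SAX specifies) of what interning buys you: once every name has been interned, name comparisons become reference comparisons with ==.  It shows both java.lang.String.intern() and a hypothetical parser-side SymbolTable of the kind the custom interns mentioned above implement; the InternDemo, SymbolTable and sameSymbol names are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

public class InternDemo {
    // Hypothetical parser-side symbol table: returns one canonical
    // String instance per distinct name, so later comparisons can use ==.
    static final class SymbolTable {
        private final Map<String, String> symbols = new HashMap<>();

        String intern(String name) {
            // Pay the hash lookup once per occurrence; afterwards every
            // occurrence of the same name shares one String object.
            String existing = symbols.putIfAbsent(name, name);
            return existing != null ? existing : name;
        }
    }

    // Compare two names by reference: only valid if both were interned
    // through the same table (or both through String.intern()).
    static boolean sameSymbol(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        // Build strings at run time so they are distinct objects.
        String a = new String("xmlns:myprefix");
        String b = new String("xmlns:myprefix");
        System.out.println(a == b);                    // false: different objects
        System.out.println(a.intern() == b.intern());  // true: JVM-wide intern pool

        SymbolTable table = new SymbolTable();
        String c = table.intern(a);
        String d = table.intern(b);
        System.out.println(sameSymbol(c, d));          // true: same table entry
    }
}
```

The trade-off is the one described above: a hash lookup per name when it is first read, in exchange for cheap == comparisons everywhere downstream.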

> Even then, you wouldn't get any help with the "xmlns:" prefix
> matching, which is the costliest part.  The most efficient way to do

Very true (ouch, ouch, ouch)...

> namespace processing is directly in the parser (which has to look at
> every attribute name anyway), but my own tests have shown that filter
> layer on top of SAX isn't too bad.

Unfortunately, as is the case with all XML or XSL benchmarks, the test data can vary
enormously.  If your documents have few elements with attributes (apart, of course, from
namespace attributes), then things probably will not be so bad.  However, if your elements carry
lots of attributes, then you need to check every single attribute to see if it starts with
"xmlns:" (ouch, ouch, ouch).
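A rough sketch of that "xmlns:" check as a filter layer on top of SAX, assuming the SAX2 API and the JAXP SAXParserFactory bundled with current JDKs rather than the SAX 1.0 AttributeList interface.  The parser runs with namespace processing off, so declarations arrive as ordinary attributes and every attribute name has to be tested.  The XmlnsScan, DeclarationCollector and attributesChecked names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class XmlnsScan {
    /** Collects every namespace declaration as "prefix=uri" ("" prefix for the default namespace). */
    static final class DeclarationCollector extends DefaultHandler {
        final List<String> declarations = new ArrayList<>();
        int attributesChecked = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // With namespace processing off, xmlns attributes arrive as ordinary
            // attributes, so every attribute name on every element is tested.
            for (int i = 0; i < atts.getLength(); i++) {
                attributesChecked++;
                String name = atts.getQName(i);
                if (name.startsWith("xmlns:")) {
                    declarations.add(name.substring("xmlns:".length()) + "=" + atts.getValue(i));
                } else if (name.equals("xmlns")) {
                    declarations.add("=" + atts.getValue(i));
                }
            }
        }
    }

    static DeclarationCollector scan(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(false); // the filter layer, not the parser, handles namespaces
        SAXParser parser = factory.newSAXParser();
        DeclarationCollector collector = new DeclarationCollector();
        parser.parse(new InputSource(new StringReader(xml)), collector);
        return collector;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<myprefix:Rectangle xmlns:myprefix=\"YabbaDabbaDoo\""
                + " x=\"0\" y=\"1\" width=\"59\" height=\"23\"/>";
        DeclarationCollector c = scan(xml);
        System.out.println(c.declarations);      // [myprefix=YabbaDabbaDoo]
        System.out.println(c.attributesChecked); // 5
    }
}
```

The attributesChecked counter is the point: the cost grows with every attribute on every element, not with the number of namespace declarations.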

So I suppose we should now encourage document designers to model data only as character content
in elements, and to use attributes only for IDs and namespace declarations.

For types like a rectangle, I think using attributes makes a lot more sense in the general
case, but in the presence of "Namespaces in XML" I would change things from:

<Rectangle x="0" y="1" width="59" height="23">

to:

<myprefix:Rectangle xmlns:myprefix="YabbaDabbaDoo">
  <myprefix:x>
    0
  </myprefix:x>
  <myprefix:y>
    1
  </myprefix:y>
  <myprefix:width>
    59
  </myprefix:width>
  <myprefix:height>
    23
  </myprefix:height>
</myprefix:Rectangle>
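For comparison, a sketch of reading the element-based Rectangle with a namespace-aware SAX2 parser (again assuming the JAXP API in current JDKs; RectangleReader and RectangleHandler are hypothetical names).  No attributes are scanned; instead the handler buffers character content, because SAX may deliver it in several chunks, and trims the whitespace that the indented layout above puts around each number.  Elements are matched by namespace URI and local name, so the prefix itself plays no part in processing.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class RectangleReader {
    // Namespace URI taken from the example above.
    static final String NS = "YabbaDabbaDoo";

    /** Reads x, y, width, height from child elements rather than attributes. */
    static final class RectangleHandler extends DefaultHandler {
        final Map<String, Integer> fields = new HashMap<>();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // discard whitespace between elements; start a fresh value
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // SAX may split character content across several calls.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (NS.equals(uri) && !localName.equals("Rectangle")) {
                // Whitespace around "0", "1", ... is part of the content, so trim it.
                fields.put(localName, Integer.parseInt(text.toString().trim()));
            }
            text.setLength(0);
        }
    }

    static Map<String, Integer> read(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // match elements by URI + local name, not prefix
        RectangleHandler handler = new RectangleHandler();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.fields;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<myprefix:Rectangle xmlns:myprefix=\"YabbaDabbaDoo\">"
                + "<myprefix:x> 0 </myprefix:x><myprefix:y> 1 </myprefix:y>"
                + "<myprefix:width> 59 </myprefix:width><myprefix:height> 23 </myprefix:height>"
                + "</myprefix:Rectangle>";
        System.out.println(read(xml));
    }
}
```

The cost moves from attribute scanning to buffering and trimming text, which is the trade-off the element-only style implies.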

The really sad thing about this is that there tends to be a feeling among a lot of people that
meaningful prefixes do not matter at all.  If XML is ever going to be editable by an average
internet user for some common tasks, meaningful prefixes do matter.

Tyler


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



