SAX C/C++ Implementations?

Steinar Bang sb at metis.no
Wed Sep 29 08:50:21 BST 1999


>>>>> <mlepage at antimeta.com>:

> On Tue, Sep 28, 1999 at 07:04:55PM +0200, Steinar Bang wrote:

>> Yes, sorry!  The compilers in question are Sunpro 4.1 and egcs/gcc
>> (currently egcs 1.1.2-pre2).

> Sunpro 4.1 is hopelessly out of date. I started using it almost 3
> years ago, that's an eternity.

It's about as long since our last upgrade.  However, even after this
time, it is still one of the best C++ compilers I have run across
wrt. warnings and template support.

> I believe Sun has new compilers available.

Yes.  We're dithering between paying for an upgrade or dropping the
Solaris platform altogether.  We'll see.  

> GCC 2.95.1 installs easily in /usr/local (configure, make, make
> install) and seems to work great. It's still not entirely standard
> compliant,

Yup.  But egcs-1.1.2 works for me, and it was the first egcs release
in a while that did, because the earlier ones had a static initializer
bug on ELF architectures.  For the interested, see:
        http://egcs.cygnus.com/ml/gcc-bugs/1999-02/msg00469.html

> but in many ways more so than VC++6.

...which is the third of our platforms...

Ironically, on paper the Standard C++ Library of MSVC++6 is the most
complete implementation of the three (Sunpro 4.1 doesn't have an
implementation of the standard library, we use Standards<ToolKit> from
Objectspace), but it is unusable as delivered with MSVC++6 because of
bugs.  You can get fixes from Dinkumware (who wrote the library), but
if you use DLLs you will still end up calling the buggy instantiations
of some of the library's templates, the ones that live in the run-time
library.

No fix will ever come in an SP.  It _may_ arrive with MSVC++7, but I'm
not holding my breath.

So we use Standards<ToolKit> here as well.  Unfortunately
Standards<ToolKit> is not compatible with std::iostream, so we have to 
use the old iostream library.

[snip!]
> I don't think yours will, that's why I didn't take that route. In my
> opinion, if you request the value of a non-existent attribute, an
> exception should be thrown. The reason is simple: the function
> cannot do what it promises to do.

In principle I agree.  However, according to Scott Meyers in "More
Effective C++", exceptions in C++ tend to be costly beasts.

Also if you plan to catch the exception close to where it occurs, the
code becomes more cluttered than code using return codes.
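
To make the clutter argument concrete, here is a minimal sketch of the
two styles side by side.  Everything in it is made up for illustration
(AttributeMap, getValueOrThrow, getValue and the "width" attribute are
hypothetical names, not taken from any of the SAX ports discussed
here), and it says nothing about run-time cost, only about how the
calling code reads.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical attribute container, for illustration only.
class AttributeMap {
public:
    void add(const std::string& name, const std::string& value) {
        values_[name] = value;
    }

    // Style 1: throw when the attribute does not exist.
    const std::string& getValueOrThrow(const std::string& name) const {
        auto it = values_.find(name);
        if (it == values_.end())
            throw std::out_of_range("no such attribute: " + name);
        return it->second;
    }

    // Style 2: return code; the caller supplies the storage.
    // 'out' is left untouched when the attribute is missing.
    bool getValue(const std::string& name, std::string& out) const {
        auto it = values_.find(name);
        if (it == values_.end())
            return false;
        out = it->second;
        return true;
    }

private:
    std::map<std::string, std::string> values_;
};

// Catching the exception right at the call site.
std::string widthWithException(const AttributeMap& atts) {
    try {
        return atts.getValueOrThrow("width");
    } catch (const std::out_of_range&) {
        return "0";  // fall back to a default
    }
}

// The same lookup using the return code.
std::string widthWithReturnCode(const AttributeMap& atts) {
    std::string width;
    if (!atts.getValue("width", width))
        width = "0";  // fall back to a default
    return width;
}
```

Both helpers return "0" when there is no "width" attribute.  If the
exception is instead allowed to propagate to a handler further up, the
try/catch at the call site disappears and the comparison changes.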

> Perhaps also a function should be added to tell if a particular
> attribute is present. Again, these are interface changes over and
> above a "straight" port of SAX from Java to C++.

OK.  Java goes "the perl way"?  (I.e. return empty values instead of
errors for first use of undefined values)

[snip!]
> I agree that an enum, in this case, is preferable to the
> strings. Again, that's a bigger interface change than a "straight"
> port. I also don't mind the method of returning a string where it is
> copied into a reference parameter.

That's what my attribute list currently does.  
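
For illustration, a minimal sketch of what such an interface could
look like.  It assumes that the strings in question are the SAX
attribute type names ("CDATA", "ID", and so on); the snipped context
does not confirm that.  The enum and both function names are
hypothetical and not part of any existing port.

```cpp
#include <cstddef>
#include <string>

// Hypothetical enum replacing the SAX attribute type strings.
enum AttributeType {
    ATT_CDATA, ATT_ID, ATT_IDREF, ATT_IDREFS, ATT_NMTOKEN,
    ATT_NMTOKENS, ATT_ENTITY, ATT_ENTITIES, ATT_NOTATION
};

namespace {
// Type names in the same order as the enumerators above.
const char* const kTypeNames[] = {
    "CDATA", "ID", "IDREF", "IDREFS", "NMTOKEN",
    "NMTOKENS", "ENTITY", "ENTITIES", "NOTATION"
};
const std::size_t kTypeCount = sizeof(kTypeNames) / sizeof(kTypeNames[0]);
}

// Map a type name to the enum.  Returns false, and leaves 'out'
// untouched, when the name is not one of the known types.
bool parseAttributeType(const std::string& name, AttributeType& out) {
    for (std::size_t i = 0; i < kTypeCount; ++i) {
        if (name == kTypeNames[i]) {
            out = static_cast<AttributeType>(i);
            return true;
        }
    }
    return false;
}

// Copy the name of a type into storage provided by the caller.
bool attributeTypeName(AttributeType type, std::string& out) {
    const std::size_t i = static_cast<std::size_t>(type);
    if (i >= kTypeCount)
        return false;
    out = kTypeNames[i];
    return true;
}
```

The second function shows the "copy into a reference parameter" style:
the return value only reports success, and the string itself travels
through the caller's own std::string.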

The wrapper to the expat attributes uses lazy evaluation.  It doesn't
convert from the vector of char* into strings until needed.  It
doesn't create the name map until needed.  I try to limit the copy of
strings to the single copy from char* to string, where I do UTF-8
decoding at the same time.

If we use return codes and let the caller provide the storage for the
string, we can be even lazier and not convert into strings before they 
are actually used.

The drawback is that several calls to get the same property will all
result in a copy.

(From my experience with other parsers, I try to build up an absolute
minimum of temporary data structures, and avoid string copies if I
can.)
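
The lazy wrapper described above could be sketched roughly as follows.
This is not the actual code, only an illustration under stated
assumptions: the class name LazyAttributes and all its members are
hypothetical, and decode() is a stand-in that merely copies where the
real wrapper would do UTF-8 decoding.  The one thing taken from expat
is the input layout: a NULL-terminated array of alternating name and
value C strings, as passed to the start element handler.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical lazy wrapper around expat-style attributes.  It does not
// own the array, so it must not outlive the handler call it was built in.
class LazyAttributes {
public:
    explicit LazyAttributes(const char** atts)
        : atts_(atts), count_(0), converted_(0), mapBuilt_(false) {
        assert(atts != nullptr);
        while (atts_[2 * count_] != nullptr)
            ++count_;                      // count the name/value pairs only
        values_.resize(count_);
        done_.resize(count_, false);
    }

    std::size_t size() const { return count_; }

    // How many values have been converted so far.
    std::size_t convertedCount() const { return converted_; }

    // Converts the i-th value at most once, on first use.
    const std::string& value(std::size_t i) {
        assert(i < count_);
        if (!done_[i]) {
            values_[i] = decode(atts_[2 * i + 1]);
            done_[i] = true;
            ++converted_;
        }
        return values_[i];
    }

    // Return-code lookup by name; the name map is built on the first call.
    // 'out' is left untouched when the attribute is missing.
    bool value(const std::string& name, std::string& out) {
        if (!mapBuilt_) {
            for (std::size_t i = 0; i < count_; ++i)
                index_[atts_[2 * i]] = i;
            mapBuilt_ = true;
        }
        auto it = index_.find(name);
        if (it == index_.end())
            return false;
        out = value(it->second);
        return true;
    }

private:
    // Stand-in for the char* to string conversion; a plain copy here.
    static std::string decode(const char* raw) { return std::string(raw); }

    const char** atts_;
    std::size_t count_;
    std::size_t converted_;
    bool mapBuilt_;
    std::vector<std::string> values_;
    std::vector<bool> done_;
    std::map<std::string, std::size_t> index_;
};
```

In this sketch, constructing the wrapper converts nothing; the first
lookup by name builds the map, and each value is converted only when
it is first asked for.  The by-name overload also shows the drawback
mentioned above: every call copies the value into the caller's string
again, even though the conversion itself happens only once.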

> Other than using the reference, that's how Lakos recommends
> returning something, if I recall correctly.

Hm... who's Lakos?

>> [snip!]
>> > That all sounds reasonable, especially for a private (protected?) 
>> 
>> Not necessarily!  I'll ask.

> Ideally, we'd at least get our interfaces, and Jez', synchronized. I
> predict more difficulty with IBM's. Finally, I've been doing as
> "straight" a port as possible; ideally, for a more "C++-ized" port,
> we'd get more blessing from Megginson for the interface changes we
> make. That's as close to a standard as we probably can get.

Agreed!

> Also, my Expat driver is extremely simple. It lets me parse XML, but
> mostly I focussed on the document handler methods.

Mine too (so far, at least).

> I'm not sure I know enough about the rest to properly support, for
> example, the element/attribute types. I would appreciate advice on
> this matter, or your code.  :-) 

As far as I could tell from xmlparse.h you won't get the attribute
types from expat, and I'm guessing that the reason is that information 
about the types would be in a DTD or a schema, and expat has no
knowledge of either.


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)




