parser asynch input (Was: SAX/C++: First interface draft)

Steinar Bang sb at metis.no
Mon Dec 6 10:25:44 GMT 1999


>>>>> nisse at lysator.liu.se (Niels Möller):

> Steinar Bang <sb at metis.no> writes:
>> I would like to add operations that can be used to "push" data to the
>> parser asynchronously:

> I also think this is important (and it's a sadly missing feature of
> IBM's xml4c parser, which provides another SAX-like C++ API).

> But is there any reason not to use the same InputSource abstraction
> for the fragment blocks? Say, something like

>   class Parser
>   {
>   public:
[snip!]
> !   virtual void parseFragment (const InputSource &input) = 0;
> !   virtual void parseEnd() = 0;
>   private:
>     void operator delete (void *);
>   };

Hm... I think for me at least, this will cause an extra copy of the
fragment before parsing.

If I have a buffer, and wrap an strstream around it, I would still need
to read the entire fragment from the istream into another buffer
before feeding it to expat.

Or would it be more efficient to do a loop on the stream and put the
buffer's contents char by char into expat...?
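To make the trade-off concrete, here is a minimal sketch of the two routes. It is written in present-day C++ rather than what compilers offered in 1999, and it uses std::istream where the text above says strstream. ChunkSink, feedFromStream and feedFromBuffer are hypothetical names made up for illustration; ChunkSink merely stands in for a push-style parser call such as expat's XML_Parse, and none of this is part of the proposed interface.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a push-style parser entry point
// (for expat this would be a thin wrapper around XML_Parse).
using ChunkSink =
    std::function<void(const char* buf, std::size_t len, bool isFinal)>;

// Route 1: the fragment is hidden behind an istream, so every byte is
// copied into a scratch buffer before the parser gets to see it.
// Returns the number of bytes copied out of the stream.
std::size_t feedFromStream(std::istream& in, const ChunkSink& sink)
{
    char scratch[4096];
    std::size_t copied = 0;
    // read() fails on a short final chunk, but gcount() still reports it.
    while (in.read(scratch, sizeof scratch) || in.gcount() > 0) {
        const std::size_t n = static_cast<std::size_t>(in.gcount());
        copied += n;
        sink(scratch, n, false);
    }
    sink(scratch, 0, true);  // empty final chunk marks end of input
    return copied;
}

// Route 2: the caller already owns the buffer and hands it over as is,
// so no intermediate copy is made on this side of the interface.
void feedFromBuffer(const char* buf, std::size_t len, const ChunkSink& sink)
{
    assert(buf != nullptr || len == 0);
    sink(buf, len, true);
}
```

Reading in blocks like this is at least cheaper than a char-by-char loop, but it still copies everything once more than the direct route does.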

Putting a buffer directly into the parser is the most efficient
approach, given the way we currently handle different file formats
in our application.  We have a map from MIME types to pointers to
instances of a class called NetStreamFactory:

class NetStreamFactory {
  public:
    virtual ~NetStreamFactory();
    virtual NetStream* newStream(const Url* url = 0) = 0;
};

Not surprisingly, these factories are used to create instances of
subclasses of NetStream (subclasses that handle XML and our old file
format, as well as ones that decode image formats like PNG and JPEG):

class NetStream {
  public:
    virtual ~NetStream();
    virtual void setReadOnly(bool readOnly = true);
    virtual void putBlock(const char* buf, unsigned long len,
                          bool entireFile = false) = 0;
    virtual void eof();
};

(The idea behind the "entireFile" argument to putBlock is that, when
I'm reading a file from the local file system, I can avoid buffering
the data in the NetStream classes that need the entire file: our old
format, which uses a recursive descent parser, and our current JPEG
decoder.  Likewise, when data arrives from the net, I deliver the
buffer read from the network to the XML parser as is, without making
an extra copy.)
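As a rough illustration of how these pieces might fit together, here is a sketch of the MIME type lookup and of an XML stream that forwards each block unchanged. The two class declarations are repeated from above with trivial bodies so the example compiles on its own. XmlNetStream, XmlNetStreamFactory, FactoryMap and openStream are hypothetical names invented for this sketch, not our actual classes, and the parser call is only indicated by a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

class Url;  // opaque here; only passed through to the factory

class NetStream {
  public:
    virtual ~NetStream() {}
    virtual void setReadOnly(bool /*readOnly*/ = true) {}
    virtual void putBlock(const char* buf, unsigned long len,
                          bool entireFile = false) = 0;
    virtual void eof() {}
};

class NetStreamFactory {
  public:
    virtual ~NetStreamFactory() {}
    virtual NetStream* newStream(const Url* url = 0) = 0;
};

// Hypothetical XML stream: each block goes to the parser as delivered.
class XmlNetStream : public NetStream {
  public:
    XmlNetStream() : bytesSeen(0), finished(false) {}
    virtual void putBlock(const char* buf, unsigned long len,
                          bool entireFile = false)
    {
        // A real subclass would hand buf/len to a push parser here
        // (with expat, a call to XML_Parse); this sketch only counts.
        (void)buf;
        bytesSeen += len;
        if (entireFile) finished = true;
    }
    virtual void eof() { finished = true; }

    unsigned long bytesSeen;  // total bytes delivered so far
    bool finished;            // set once the input is complete
};

class XmlNetStreamFactory : public NetStreamFactory {
  public:
    virtual NetStream* newStream(const Url* = 0) { return new XmlNetStream; }
};

// The map from MIME types to factory pointers (assumed representation).
typedef std::map<std::string, NetStreamFactory*> FactoryMap;

// Returns a new stream for the MIME type, or 0 if no factory is
// registered.  The caller owns the returned stream.
NetStream* openStream(const FactoryMap& factories,
                      const std::string& mimeType, const Url* url = 0)
{
    FactoryMap::const_iterator it = factories.find(mimeType);
    if (it == factories.end())
        return 0;
    assert(it->second != 0);
    return it->second->newStream(url);
}
```

The network code then only needs the MIME type of the incoming data: it asks openStream for a stream, calls putBlock for each buffer as it arrives, and calls eof when the connection closes.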
