SAX: New Idea for Entity Resolution

Lars Marius Garshol larsga at ifi.uio.no
Fri Apr 17 14:05:37 BST 1998


* David Megginson
>
>Here are what seem to me to be the costs and benefits of supporting
>character streams, byte streams, or both:
>
> [pros and cons deleted]

What about using Tyler Baker's proposal with a SAXStreamFactory?

>* Character streams only
>
>  Pro: - the application writer has specialised knowledge about the
>         information source that the parser writer lacks; as a
>         result, the application writer can better optimise the
>         conversion, if necessary
>       - information from dialogue boxes, internal buffers, and
>         (eventually, with internationalisation) databases will all be
>         characters rather than bytes
>       - most programming languages are moving towards characters and
>         away from processing raw bytes 
>       - many programming languages (such as Java) already have
>         standard methods for converting byte streams to character
>         streams, and application writers can use these if needed or
>         desired
>
>  Con: - the application may have to convert from bytes to characters
>         itself if an input source is not available
>       - the parser may have its own, internal, efficient mechanism
>         for byte-stream conversion
>
>
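The "standard methods" point above can be illustrated with a small Java
sketch. This is only an illustration under assumptions: the names
ByteToCharDemo, toCharacterStream and readAll are made up for this
example and belong to no SAX proposal; InputStreamReader and Charset are
the standard JDK classes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of SAX.
class ByteToCharDemo {
    // Application side: wrap a byte stream in a Reader that decodes it
    // with the named encoding, using the JDK's own converter.
    static Reader toCharacterStream(InputStream bytes, String encoding) {
        return new InputStreamReader(bytes, Charset.forName(encoding));
    }

    // Drain a Reader into a String (helper for the demonstration only).
    static String readAll(Reader in) {
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = "<doc>caf\u00e9</doc>".getBytes(StandardCharsets.UTF_8);
        Reader chars = toCharacterStream(new ByteArrayInputStream(raw), "UTF-8");
        System.out.println(readAll(chars));
    }
}
```

With a characters-only interface, the application would do this wrapping
itself before handing the Reader to the parser.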
>* Byte streams only
>
>  Pro: - supports the minimum common denominator: all platforms have
>         some concept of a byte stream
>       - allows parsers to use their own, efficient, internal methods
>         for byte-stream conversion
>
>  Con: - adds serious inefficiencies, since characters (say, from a
>         dialog box, an internal buffer, or a database with I18N
>         support) will have to be decomposed back into bytes to be
>         passed to the parser, then reassembled back into characters
>         by the parser
>       - requires a new SAX class encapsulating a ByteStream and its
>         recommended encoding
>
>
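The round trip described in the Con list above might look roughly like
this in Java. It is a sketch under assumptions: EncodedByteSource is a
hypothetical stand-in for the "class encapsulating a ByteStream and its
recommended encoding", and fromText and decode are invented names for
the application side and the parser side respectively.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical holder, not a SAX class: a byte stream plus its encoding.
class EncodedByteSource {
    private final InputStream byteStream;
    private final String encoding;

    EncodedByteSource(InputStream byteStream, String encoding) {
        this.byteStream = byteStream;
        this.encoding = encoding;
    }

    InputStream getByteStream() { return byteStream; }

    String getEncoding() { return encoding; }

    // Application side: text that is already characters (say, from a
    // dialog box) has to be decomposed into bytes first.
    static EncodedByteSource fromText(String text, String encoding) {
        byte[] raw = text.getBytes(Charset.forName(encoding));
        return new EncodedByteSource(new ByteArrayInputStream(raw), encoding);
    }

    // Parser side: the same bytes are reassembled into characters.
    static String decode(EncodedByteSource source) {
        Reader in = new InputStreamReader(
                source.getByteStream(), Charset.forName(source.getEncoding()));
        try {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The encode step in fromText and the decode step in decode cancel each
other out, which is the inefficiency the first Con point refers to.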
>* Both Byte and Character streams
>
>  Pro: - keeps everyone happy
>
>  Con: - requires more interfaces
>       - requires another method in the Parser interface
>       - requires a new SAX class encapsulating a ByteStream and its
>         recommended encoding (or perhaps the ByteStream interface
>         will have a getEncoding() method)
>       - will greatly complicate the EntityResolver mechanism (the
>         application will need to be able to return a byte stream _or_
>         a character stream -- how could I handle this?)
>
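One conceivable answer to the "byte stream _or_ character stream"
question above is a single return type that can carry either. The
following Java sketch is purely hypothetical: EntitySource,
SketchEntityResolver, ofCharacters, ofBytes and openReader are names
invented here and are not taken from any SAX draft.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical holder: exactly one of the two streams is non-null.
class EntitySource {
    private final Reader characterStream;
    private final InputStream byteStream;
    private final String encoding; // only meaningful for byte streams

    private EntitySource(Reader chars, InputStream bytes, String encoding) {
        this.characterStream = chars;
        this.byteStream = bytes;
        this.encoding = encoding;
    }

    static EntitySource ofCharacters(Reader chars) {
        return new EntitySource(chars, null, null);
    }

    static EntitySource ofBytes(InputStream bytes, String encoding) {
        return new EntitySource(null, bytes, encoding);
    }

    // Parser side: use the characters if present, otherwise decode bytes.
    Reader openReader() {
        if (characterStream != null) {
            return characterStream;
        }
        return new InputStreamReader(byteStream, Charset.forName(encoding));
    }

    // Drain a Reader into a String (helper for the demonstration only).
    static String readAll(Reader in) {
        try {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Hypothetical resolver: the application returns whichever form it has.
interface SketchEntityResolver {
    EntitySource resolveEntity(String systemId);
}
```

The resolver then has a single method and a single return type, at the
cost of the extra holder class already listed among the Cons.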
>
>Thanks, and all the best,
>
>
>David
>
>-- 
>David Megginson                 ak117 at freenet.carleton.ca
>Microstar Software Ltd.         dmeggins at microstar.com
>      http://home.sprynet.com/sprynet/dmeggins/
>
>xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
>Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
>To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
>(un)subscribe xml-dev
>To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
>subscribe xml-dev-digest
>List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
>
>





