SAX: New Idea for Entity Resolution
Lars Marius Garshol
larsga at ifi.uio.no
Fri Apr 17 14:05:37 BST 1998
* David Megginson
>
>Here are what seem to me to be the costs and benefits of supporting
>character streams, byte streams, or both:
>
> [pros and cons deleted]
What about using Tyler Baker's proposal with a SAXStreamFactory?
>* Character streams only
>
> Pro: - the application writer has specialised knowledge about the
> information source that the parser writer lacks; as a
> result, the application writer can better optimise the
> conversion, if necessary
> - information from dialogue boxes, internal buffers, and
> (eventually, with internationalisation) databases will all be
> characters rather than bytes
> - most programming languages are moving towards characters and
> away from processing raw bytes
> - many programming languages (such as Java) already have
> standard methods for converting byte streams to character
> streams, and application writers can use these if needed or
> desired
>
> Con: - the application may have to convert from bytes to characters
> itself if an input source is not available
> - the parser may have its own, internal, efficient mechanism
> for byte-stream conversion
>
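A minimal sketch of the byte-to-character conversion mentioned in the last "Pro" item above, assuming Java and its standard InputStreamReader. The class name ByteToCharDemo and the helper readAll are made up for illustration and are not part of any SAX proposal.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ByteToCharDemo {
    // Hypothetical helper: decode a whole byte stream into a String,
    // using the encoding the application knows about.
    static String readAll(InputStream bytes, String encoding) throws IOException {
        Reader chars = new InputStreamReader(bytes, encoding);
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = chars.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // The e-acute takes two bytes in UTF-8 but is one character.
        byte[] raw = "<doc>caf\u00e9</doc>".getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(raw), "UTF-8");
        System.out.println(raw.length + " bytes, " + text.length() + " characters");
    }
}
```

The application supplies the encoding name here, which is the "specialised knowledge" the first "Pro" item refers to. A parser handed only the bytes would have to detect the encoding itself.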
>
>* Byte streams only
>
> Pro: - supports the minimum common denominator: all platforms have
> some concept of a byte stream
> - allows parsers to use their own, efficient, internal methods
> for byte-stream conversion
>
> Con: - adds serious inefficiencies, since characters (say, from a
> dialog box, an internal buffer, or a database with I18N
> support) will have to be decomposed back into bytes to be
> passed to the parser, then reassembled back into characters
> by the parser
> - requires a new SAX class encapsulating a ByteStream and its
> recommended encoding
>
>
>* Both Byte and Character streams
>
> Pro: - keeps everyone happy
>
> Con: - requires more interfaces
> - requires another method in the Parser interface
> - requires a new SAX class encapsulating a ByteStream and its
> recommended encoding (or perhaps the ByteStream interface
> will have a getEncoding() method)
> - will greatly complicate the EntityResolver mechanism (the
> application will need to be able to return a byte stream _or_
> a character stream -- how could I handle this?)
>
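One possible shape for the "new SAX class encapsulating a ByteStream and its recommended encoding", extended so that an entity resolver could return either kind of stream through a single type. This is only a sketch: EntitySource, its methods, and the UTF-8 fallback are all assumptions made for illustration, not anything that has been proposed on the list.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;

public class EntitySourceDemo {
    // Hypothetical holder: exactly one of byteStream or charStream is set.
    static final class EntitySource {
        private final InputStream byteStream;
        private final String encoding;   // recommended encoding, may be null
        private final Reader charStream;

        EntitySource(InputStream byteStream, String encoding) {
            this.byteStream = byteStream;
            this.encoding = encoding;
            this.charStream = null;
        }

        EntitySource(Reader charStream) {
            this.byteStream = null;
            this.encoding = null;
            this.charStream = charStream;
        }

        boolean isCharacterStream() {
            return charStream != null;
        }

        // What a parser might do: use characters directly if it has them,
        // otherwise decode the bytes. Falling back to UTF-8 is a
        // simplification; a real parser would inspect the entity itself.
        Reader open() throws IOException {
            if (charStream != null) {
                return charStream;
            }
            String enc = (encoding != null) ? encoding : "UTF-8";
            return new InputStreamReader(byteStream, enc);
        }
    }

    // Read everything from a Reader into a String.
    static String drain(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Characters from, say, a dialogue box: no decoding needed.
        EntitySource fromText = new EntitySource(new StringReader("<a/>"));
        // Bytes from a file or socket, with the encoding the application knows.
        byte[] raw = "<b>\u00e5</b>".getBytes("ISO-8859-1");
        EntitySource fromBytes =
            new EntitySource(new ByteArrayInputStream(raw), "ISO-8859-1");

        System.out.println(drain(fromText.open()));
        System.out.println(drain(fromBytes.open()).length());
    }
}
```

With a wrapper like this the resolver has a single return type, and the choice between bytes and characters moves into the parser's open step. The cost is the extra class that the "Con" list already mentions.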
>
>Thanks, and all the best,
>
>
>David
>
>--
>David Megginson ak117 at freenet.carleton.ca
>Microstar Software Ltd. dmeggins at microstar.com
> http://home.sprynet.com/sprynet/dmeggins/
>
>xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
>Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
>To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
>(un)subscribe xml-dev
>To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
>subscribe xml-dev-digest
>List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
>
>