Feeler for SML (Simple Markup Language)

David Megginson david at megginson.com
Sat Nov 13 11:55:08 GMT 1999


"Don Park" <donpark at docuverse.com> writes:

> Right.  This means that SML is not a good choice for 'documents' nor
> encoding data with lots of foreign characters.

Like, say, a database with the names of subscribers to a Chinese
e-mag, or a collection of information about Arabic movies.  

Right now, it happens that a few large English-speaking former British
colonies (U.S., Canada, Australia, New Zealand) and Western Europe
make up a majority of the computer-using world, but since we make up a
small minority of the world in general, I expect that things will
change rapidly -- data from Turkey or Saudi Arabia or Japan or Korea
can have exactly the same problem as a document.

Actually, in the end I found that supporting UTF-16/UCS-2 as well as
UTF-8 wasn't that hard in AElfred (again, just a few lines of code,
and I didn't use the Java 1.1 library stuff).  The hard part about
Unicode is that the set of characters allowed differs so much from
one context to the next, and XML 1.0 *requires* parsers to report
every violation.  That's a problem with UTF-8 as well as UTF-16.
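
To make that concrete, here is a rough sketch in Python.  It is not
AElfred's code, which is Java.  It guesses between UTF-8 and UTF-16
from the first few bytes, loosely following the autodetection notes
in XML 1.0 Appendix F, and then checks every decoded character
against the XML 1.0 Char production.  The function names
(sniff_encoding, is_xml_char, decode_and_check) are made up for this
example, and a real parser must also honour the encoding declaration
and the stricter per-context rules (name characters and so on).

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Guess UTF-8 vs UTF-16 from the leading bytes (illustrative only)."""
    if data.startswith(codecs.BOM_UTF16_BE) or data.startswith(codecs.BOM_UTF16_LE):
        # Python's "utf-16" codec reads the BOM and picks the byte order.
        return "utf-16"
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"  # "<?" with no BOM, big-endian
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"  # "<?" with no BOM, little-endian
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # strips the UTF-8 BOM while decoding
    return "utf-8"          # assumed default when nothing else matches


def is_xml_char(cp: int) -> bool:
    """XML 1.0 Char production: tab, LF, CR, and three code point ranges."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def decode_and_check(data: bytes) -> str:
    """Decode the bytes, then reject any character XML 1.0 does not allow."""
    # Malformed byte sequences raise UnicodeDecodeError here.
    text = data.decode(sniff_encoding(data))
    for offset, ch in enumerate(text):
        if not is_xml_char(ord(ch)):
            raise ValueError(
                f"illegal XML character U+{ord(ch):04X} at offset {offset}"
            )
    return text
```

The decoding step is the small part.  The character check runs on
the decoded text, so it costs the same whichever encoding the bytes
arrived in.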


All the best,


David

-- 
David Megginson                 david at megginson.com
           http://www.megginson.com/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1




