Feeler for SML (Simple Markup Language)

Jelks Cabaniss jelks at jelks.nu
Fri Nov 12 05:10:25 GMT 1999

Clark Evans wrote:

> > o UTF-8 encoding only
> I'm kinda ignorant... would it still be
> possible to handle oriental character sets
> with UTF-8?

Yes.  ASCII characters in UTF-8 take up only 1 byte, but if you're using
oriental character sets, one *character* can take up several bytes (most CJK
ideographs need 3).  I forget the maximum and the exact algorithm used, but I bet
some other folks here would know.  The only drawback is that a document written
*entirely* in an oriental character set is liable to be bigger in UTF-8 than in
UTF-16, where those characters take two bytes each.
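The byte counts come from UTF-8's bit layout: code points below U+0080 take one byte, below U+0800 two, below U+10000 (which covers the common CJK ideographs) three, and the rest four. (The original 1990s definition allowed sequences of up to six bytes; RFC 3629 later capped it at four.) Here's a rough Python sketch that hand-rolls that layout for illustration; `utf8_encode` and `compare_sizes` are hypothetical helper names, not a library API, and the UTF-16 size comes from Python's built-in `utf-16-le` codec (no byte-order mark).

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode scalar value by hand, following the UTF-8 bit layout."""
    if code_point < 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {code_point:#x}")
    if code_point < 0x80:                      # 0xxxxxxx (ASCII)
        return bytes([code_point])
    if code_point < 0x800:                     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:                   # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (beyond the Basic Multilingual Plane)
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def compare_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for text, without a BOM."""
    utf8_len = sum(len(utf8_encode(ord(ch))) for ch in text)
    utf16_len = len(text.encode("utf-16-le"))
    return utf8_len, utf16_len
```

For example, `compare_sizes("日本語")` gives `(9, 6)`, but `compare_sizes("<p>日本語</p>")` gives `(16, 20)`: ASCII markup costs one byte per character in UTF-8 and two in UTF-16, so the size penalty bites mainly when a document is almost *entirely* ideographic. Characters beyond U+FFFF take four bytes in both encodings (a surrogate pair in UTF-16).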

> > o No non-character entity references
> > o No predefined character entities (I am iffy on this one)
> Sure.

How are you going to handle literal < and & characters in PCDATA, unless you
declare the lt and amp entities explicitly (which is hard to do when you've done
away with the DOCTYPE declaration)...
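For reference, XML 1.0 offers two ways to write a literal `<` or `&` in character data: the predefined entity references `&lt;` and `&amp;` (the ones SML as proposed would drop), and the decimal character references `&#60;` and `&#38;`, which are part of the core syntax and need no declaration. The Python sketch below, using hypothetical table and function names, escapes text both ways and lets the standard library's `xml.etree.ElementTree` parser decode the result; it illustrates plain XML behavior, not any SML parser.

```python
import xml.etree.ElementTree as ET

# Two hypothetical escape tables for "&" and "<" in character data.
PREDEFINED = {"&": "&amp;", "<": "&lt;"}   # relies on the predefined entities amp/lt
NUMERIC = {"&": "&#38;", "<": "&#60;"}     # character references: no declaration needed


def escape_pcdata(text: str, table: dict[str, str]) -> str:
    """Replace each "&" and "<" in text, one character at a time (no double-escaping)."""
    return "".join(table.get(ch, ch) for ch in text)


def parse_text(escaped: str) -> str:
    """Wrap escaped text in an element and let a real XML parser decode it."""
    return ET.fromstring("<p>" + escaped + "</p>").text or ""
```

Both `escape_pcdata("a < b && c", PREDEFINED)` and `escape_pcdata("a < b && c", NUMERIC)` round-trip through `parse_text` to the original string; leaving the `<` unescaped would instead make the parser reject the document as not well-formed.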


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)

More information about the Xml-dev mailing list