Whitespace

Rick Jelliffe ricko at allette.com.au
Wed Aug 27 04:43:11 BST 1997


 
> From: Peter Murray-Rust <Peter at ursus.demon.co.uk>
  
> I would strongly argue against Unicode characters at this stage. *I* wouldn't
> know where to get them from, and typing by hand could be a disaster. 

I have attached a table showing how XML, by adopting ISO 10646, allows developers
to handle spaces, hyphenation and breaking.  I hope people find it useful. (I have
previously sent around versions of the ISO public entity sets converted for XML
use: these are available on Robin Cover's website at the Summer Institute of
Linguistics. The table has a copyright note against printing because I have 
prepared it for my forthcoming book "The SGML Cookbook", out soon.)

You can get more information from:

* the Unicode 2.0 book, available in book stores
* the ISO 10646 standard, available from your national standards body
* an online listing of the characters at the Unicode Consortium's
website, an independent one on the SGML Oslo archive site, and
the SPREAD public entity set
* on NT, the keycaps viewer, which shows the (printing) characters in Unicode
fonts.


>It will take a while before Unicode is natural to HTML authors.

ISO 10646 provides a very rich set of characters to handle spaces and 
newlines.  It is very important that XML developers understand and implement
them, because doing so simplifies what people need to do in their XML scripts.
It moves spacing from being a "how to format this element" issue to being
a "how to render this character" issue, which is neater.  If developers ignore
these unambiguous characters, they then have to overload space and -, with
unpredictable results. To get definite results you need definite markup:
developers should not confuse the visual simplicity of the space and hyphen
with the complexity of what must be marked up to get them to work.
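
To make the point concrete, here is a rough sketch in Python. It is not the
attached table: the selection of code points is my own illustrative guess at
the kind of characters meant, and the helper names char_ref and describe are
made up for this example. It prints each character's standard name next to
the numeric character reference an author could type in an XML document, then
parses a small fragment to show that the references arrive as single,
unambiguous characters.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Illustrative selection only (an assumption, not the attached table):
# a few ISO 10646 / Unicode characters for spacing, hyphenation and breaking.
CODE_POINTS = [
    0x00A0,  # space that must not break
    0x00AD,  # hyphen shown only if the line breaks here
    0x2002,  # fixed-width spaces
    0x2003,
    0x2009,
    0x200B,  # invisible break opportunity
    0x2011,  # hyphen that must not break
    0x2028,  # explicit line break
    0x2029,  # explicit paragraph break
]


def char_ref(cp):
    """Return the XML numeric character reference for a code point."""
    return "&#x{:X};".format(cp)


def describe(cp):
    """Return one listing line: code point, XML reference, standard name."""
    return "U+{:04X}  {:<9} {}".format(cp, char_ref(cp), unicodedata.name(chr(cp)))


for cp in CODE_POINTS:
    print(describe(cp))

# The parser resolves each reference to one character, so the application
# sees "no-break space" or "non-breaking hyphen" rather than an overloaded
# plain space or "-".
doc = ET.fromstring("<p>10&#xA0;km, non&#x2011;breaking</p>")
for ch in doc.text:
    if ord(ch) > 0x7F:
        print("found", unicodedata.name(ch))
```

An author who cannot type these characters directly can still enter them by
reference, or through entity names such as those in the ISO public entity
sets, so nothing has to be typed "by hand" as raw Unicode.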

There is *no* natural way for HTML people to do most of the things that
ISO 10646 offers for control of spaces, hyphenation and breaking. 
However, what it offers is closer to what users of word processors will find natural.


Rick Jelliffe
-------------- next part --------------
A non-text attachment was scrubbed...
Name: space.htm
Type: application/octet-stream
Size: 2841 bytes
Desc: space.htm (Internet Document (HTML))
Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19970827/bc40c921/space.obj

