(ISOdia, ISOtech, etc explained)
ricko at allette.com.au
Sun Jul 20 07:03:08 BST 1997
Someone on this list has asked what ISOdia, ISOtech etc are.
The SGML standard (ISO 8879) included several sets of entity definitions
for many special characters:
* ISOlat1 gives the characters in extended Latin alphabet #1, which is also
the upper part of ISO 8859-1
* ISOlat2 gives many additional Latin characters
* ISOgrk1 and ISOgrk2 give simple modern Greek characters
* ISOcyr1 and ISOcyr2 give modern Russian and non-Russian Cyrillic characters
* ISOdia gives spacing versions of diacritical marks (and is therefore not
very useful, I think)
* ISOpub, ISOtech give symbols used in publishing and science
* ISOnum, ISOgrk3 and ISOgrk4 give symbols used in mathematics
* ISObox gives the box characters (yuck)
These entity sets allow you to use special characters in your document, regardless
of which document character set you are using. Well-known examples are "&lt;",
"&amp;", and "&mdash;".
They are "SDATA" entity sets, which means that it is the job of the recipient to
map them to something locally useful. XML uses ISO 10646 (~=Unicode) as its
document character set, so I made up versions of some of the public entity sets
resolved for use with ISO 10646. Those were the sets I posted.
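
As a rough illustration of what "resolved for use with ISO 10646" means, the
sketch below declares three entities as numeric character references and lets a
parser expand them. The entity names (eacute, mdash, alpha) come from the ISO
sets; the sample document and the use of Python's standard ElementTree parser
are assumptions made for this example, not part of the posted sets.

```python
import xml.etree.ElementTree as ET

# A tiny document whose internal DTD subset declares three ISO-style
# entities, each resolved to an ISO 10646 code point.
DOC = """<?xml version="1.0"?>
<!DOCTYPE p [
  <!ENTITY eacute "&#233;">   <!-- ISOlat1: e with acute accent -->
  <!ENTITY mdash  "&#8212;">  <!-- ISOpub: em dash -->
  <!ENTITY alpha  "&#945;">   <!-- ISOgrk3: Greek small alpha -->
]>
<p>caf&eacute; &mdash; &alpha;</p>"""

# The parser expands each entity reference to the declared character.
root = ET.fromstring(DOC)
text = root.text
print([hex(ord(ch)) for ch in text if ord(ch) > 127])
```

In a real document the declarations would normally live in an external entity
set referenced from the DTD rather than being repeated in each file.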
The ISO standard character entity sets are almost universally used in SGML documents,
and giving the XML versions of them makes translation from SGML to XML easier.
W3C has put out its own versions (HTMLsymbol, HTMLlat2, and HTMLmisc) which
contain a selection of the most ubiquitous special characters from the
ISO sets: basically, the characters in the so-called ANSI code page used on
Windows and the Adobe Symbol font, again resolved for ISO 10646.
Other public entity sets of interest are:
* Martin Bryan has put together ISOchem (in ISO 9573, Techniques for using
SGML) for chemical symbols;
* TEI (Harry Gaylord, etc) has put together entity sets for Arabic and
* I have put together a set (SPREAD) for representing all Unicode characters
as entities: in the case of XML, this is redundant, but it does allow
transport between XML and SGML fairly trivially;
* The American Mathematical Society has contributed, and ISO has standardised,
several sets of characters for mathematical use (ISOamsr etc.);
* Anders Berglund has revised some of the ISO 8879 public entity sets,
and the TEI sets, and added some others for other languages e.g. Thai
as part of ISO 9573 part 15, "Entity sets for non-Latin Languages". These
sets have apparently been voted YES by ISO national bodies, but have
sat in limbo for several years, apparently due to some obstruction from
somewhere, which (if true) is a very bad thing. (I hope members of ISO
bodies will get their national bodies to investigate and push for
distribution of the public entity sets.)
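
To make the idea of "all Unicode characters as entities" concrete, here is a
minimal sketch that generates one entity declaration per code point. The naming
scheme (the letter U followed by at least four hex digits) and both function
names are hypothetical, chosen only for illustration; they are not necessarily
what SPREAD uses.

```python
# "&" and "<" are markup-significant, so their replacement text must be
# double-escaped, the same way XML declares its predefined lt and amp.
MARKUP_SIGNIFICANT = {0x26, 0x3C}

def entity_decl(code_point):
    """Return an entity declaration for one code point (hypothetical naming)."""
    name = f"U{code_point:04X}"
    if code_point in MARKUP_SIGNIFICANT:
        value = f"&#38;#x{code_point:X};"
    else:
        value = f"&#x{code_point:X};"
    return f'<!ENTITY {name} "{value}">'

def entity_set(start, end):
    """Return declarations for every code point in start..end inclusive."""
    return [entity_decl(cp) for cp in range(start, end + 1)]

# Example: declarations for Greek small alpha, beta and gamma.
greek = entity_set(0x03B1, 0x03B3)
print("\n".join(greek))
```

An SGML system would declare the same names as SDATA entities and map them
locally, which is what lets a document move between the two without renaming
anything.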
I hope this answers the question from the list member who asked.
In closing, ISO 10646 does not contain all the symbols or letters in the world,
especially those used in technical fields (the ISOchem symbols, for example). So
there is still a need for XML to provide a standard way to request special glyphs
over the web. There will always be this need, since the number of glyphs and
characters is unbounded. I hope that the XML WG will look at this issue. Gavin
Nicol has started a mailing list on this subject.
xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)