(ISOdia, ISOtech, etc explained)

Rick Jelliffe ricko at allette.com.au
Sun Jul 20 07:03:08 BST 1997


Someone on this list has asked what ISOdia, ISOtech etc are.


The SGML standard (ISO 8879) included several sets of entity definitions
for many special characters:

	* ISOlat1  gives the characters in extended Latin alphabet #1, which is also
	the upper part of ISO 8859-1
	* ISOlat2 gives a whole lot of extra Latin characters
	* ISOgrk1 and ISOgrk2 give simple modern Greek characters
	* ISOcyr1 and ISOcyr2 give modern Russian and non-Russian Cyrillic characters
	* ISOdia gives spacing versions of diacritical marks (and is therefore not
	very useful, I think)
	* ISOpub, ISOtech give symbols used in publishing and science
	* ISOnum, ISOgrk3 and ISOgrk4 give symbols used in mathematics
	* ISObox gives the box characters (yuck)

These entity sets let you use special characters in your document, regardless
of which document character set you are using.  Well-known examples are "&lt;",
"&amp;", and "&mdash;".

They are "SDATA" entity sets, which means that it is the job of the recipient to 
map them to something locally useful.  XML uses ISO 10646 (~=Unicode) as its
document character set, so I made up versions of some of the public entity sets
resolved for use with ISO 10646.  Those were the sets I posted.
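
As a rough illustration of the difference (a sketch, not a copy of the posted
sets): an ISO 8879 entry has the form <!ENTITY eacute SDATA "[eacute]">, which
leaves the meaning of "[eacute]" to the receiving system, whereas a version
resolved for ISO 10646 replaces the SDATA text with a numeric character
reference.  The Python below parses a small XML document whose internal subset
declares two such resolved entities.  The sample document, the element name
"doc", the function name and the choice of entities are assumptions made for
the example.

```python
import xml.etree.ElementTree as ET

# SGML (ISO 8879) form, shown for comparison only; the recipient decides
# what the system data "[eacute]" means locally:
#   <!ENTITY eacute SDATA "[eacute]">
#
# Resolved form for XML: the replacement text is an ISO 10646 character number.
# The document below is a made-up sample, not one of the posted entity sets.
DOCUMENT = """<!DOCTYPE doc [
  <!ENTITY eacute "&#233;">  <!-- ISOlat1: small e, acute accent -->
  <!ENTITY copy   "&#169;">  <!-- ISOnum: copyright sign -->
]>
<doc>caf&eacute; &copy; 1997</doc>"""


def resolved_text(xml_source: str) -> str:
    """Parse the document and return its text with entities expanded."""
    return ET.fromstring(xml_source).text


if __name__ == "__main__":
    print(resolved_text(DOCUMENT))
```

Because the declarations carry character numbers rather than system data, any
XML parser can expand them without a local mapping table.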

The ISO standard character entity sets are almost universally used in SGML documents,
and giving the XML versions of them makes translation from SGML to XML easier.

W3C has put out its own versions (HTMLsymbol, HTMLlat2, and HTMLmisc), which
contain a selection of the most ubiquitous special characters from the
ISO sets: basically, the characters in the so-called ANSI code page used on
Windows and the Adobe Symbol font, again resolved for ISO 10646.
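
For a quick look at what "resolved for ISO 10646" means for the W3C sets:
Python's standard html.entities module carries a name2codepoint table of the
HTML 4 entity names and their code points.  The sketch below turns a few of
those names into XML-style declarations.  The helper name "declaration" and the
sample names are assumptions for the example, and the table reflects HTML 4 as
published, which may differ from the drafts circulating when this was written.

```python
from html.entities import name2codepoint


def declaration(name: str) -> str:
    """Build an XML entity declaration for an HTML 4 entity name.

    'declaration' is a hypothetical helper for this example only.
    """
    codepoint = name2codepoint[name]  # raises KeyError for unknown names
    return f'<!ENTITY {name} "&#{codepoint};">'


if __name__ == "__main__":
    # One name each from the Latin-1, symbol and special-character groups.
    for name in ("eacute", "alpha", "trade"):
        print(declaration(name))
```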

Other public entity sets of interest are:
	
	* Martin Bryan has put together ISOchem (in ISO 9573, Techniques for using
	SGML) for chemical symbols;
	* TEI (Harry Gaylord, etc) has put together entity sets for Arabic and
	Hebrew;
	* I have put together a set (SPREAD) for representing all Unicode characters
	as entities: in the case of XML, this is redundant, but it does allow
	transport between XML and SGML fairly trivially;
	* The American Mathematical Society has contributed, and ISO has standardised,
	several sets of characters for mathematical use (ISOamsr etc.);
	* Anders Berglund has revised some of the ISO 8879 public entity sets
	and the TEI sets, and added some others for other languages, e.g. Thai,
	as part of ISO 9573 part 15, "Entity sets for non-Latin Languages".  These
	sets have apparently been voted YES by ISO national bodies, but have
	sat in limbo for several years, apparently due to some obstruction from
	somewhere, which (if true) is a very bad thing.  (I hope members of ISO
	bodies will get their national bodies to investigate and push for
	distribution of the public entity sets.)
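
The idea of one entity per character, mentioned in the list above, can be
sketched as a small generator.  This is not the SPREAD set itself: the naming
scheme ("u" followed by hex digits), the function names and the element name
"doc" are all invented for the example.  It emits one declaration per code
point in a range and then checks the result by parsing a document that uses
the generated names.

```python
import xml.etree.ElementTree as ET


def entity_name(codepoint: int) -> str:
    """Hypothetical naming scheme: 'u' plus at least four hex digits."""
    return f"u{codepoint:04X}"


def declarations(first: int, last: int) -> str:
    """Emit one XML entity declaration per code point in [first, last]."""
    return "\n".join(
        f'  <!ENTITY {entity_name(cp)} "&#x{cp:X};">'
        for cp in range(first, last + 1)
    )


def expand(text_with_entities: str, first: int, last: int) -> str:
    """Parse a one-element document that uses the generated entities."""
    source = (
        f"<!DOCTYPE doc [\n{declarations(first, last)}\n]>\n"
        f"<doc>{text_with_entities}</doc>"
    )
    return ET.fromstring(source).text


if __name__ == "__main__":
    # Greek small letters alpha..omega occupy U+03B1..U+03C9.
    print(expand("&u03B1;&u03B2;&u03B3;", 0x03B1, 0x03C9))
```

In XML the named form adds nothing over writing the numeric reference
directly, which is the redundancy noted above; the names matter when the same
text has to pass through an SGML system with a different document character
set.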

I hope this answers the question from the list member who asked.

In closing, ISO 10646 does not contain all the symbols or letters in the world,
especially in technical fields (the chemical symbols in ISOchem, for example).  So
there is still the need for XML to provide a standard way to request special glyphs
over the web.  There will always be this need, since the number of glyphs and
characters is unbounded.  I hope that the XML WG will look at this issue.  Gavin
Nicol has started a mailing list on this subject.



Rick Jelliffe



xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo at ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)



