i18n (was Re: Open Standards Processes)

Rick Jelliffe ricko at allette.com.au
Tue Apr 28 10:39:26 BST 1998


From: Gregg Reynolds <greyno at mcs.com>

>If I've misunderstood something I hope somebody will correct me, but if
>I'm not mistaken pretty much everybody involved is from the
>"developed" world, mostly the West.

Because to participate means there must be the leisure or finance to
do so, and there must be the technological background to do so, and
there must be the techno-cultural self-awareness to do so. All these
are attributes of the center (or North, or West, whatever you call it).

I asked several Thais when line breaks could occur, for example. The
best answer I got was "when it is beautiful". (Actually, in the particular
case of Thai and the Indic script languages, I would imagine there will
be a great increase in knowledge because of James Clark's interest
in the region. Exploration is always done by outsiders.)

> The W3C has no doubt made excellent good-faith
>efforts to internationalize the standard; but is there any input from,
>say, an Indian librarian?  An Egyptian computer scientist?  A Ugandan
>Web-site operator?  Has the W3C made an effort to seek out qualified
>professionals from "the South"?  I don't see how it's possible for a
>truly "world"-wide-web to happen without such input.


I introduced W3C's Bert Bos to my current boss, a Jordanian with Arabic
i18n (internationalization) experience, at the WWW7 conference.
Bos said that there was currently no input from Arabic speakers:
no-one (or perhaps no-one with sufficient credentials) had come forward.
The driving force behind W3C i18n, as was clear at the Developers' Day
session, is the need to serve advertisers better. The Web is not a
library; it is a TV network posing as a library. So i18n efforts
through W3C will be prioritized by market value: Europe first, then CJK,
then anything else that is easy.

If you are concerned about this, the best approach is to ask them exactly
what they need: I have found an enormous goodwill to the idea of
thoroughgoing i18n at W3C. Their problem is that they cannot devote
resources to finding out what is needed. So write up a nice couple of
pages of solutions to real problems that you see, and send them to
Martin Duerst, Jon Bosak and Bert Bos. I am sure they would be
delighted to receive any input: they are gathering information for CSS3 and XSL.

When I started looking at "native language markup", it was interesting that
the only opposition I got, apart from Americans, was from Indians. I think that
was because all educated Indians speak English, so if someone uses a
computer they are not held back by English markup. Also, markup in
a foreign language is very visually distinct. But I cannot agree with them.
Enumerated attribute values are really a kind of data, so even if an Indian
DTD can get away with English element type names, other kinds of names
will need an extended range of characters.
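To make the point about names concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. The element type name, attribute name and enumerated value are hypothetical Arabic examples (written as \u escapes so the source stays ASCII). XML 1.0 allows Arabic letters in names, so the parser accepts them and hands them back unchanged, just like any other data.

```python
import xml.etree.ElementTree as ET

# Hypothetical Arabic names, written as \u escapes so the source stays ASCII.
BOOK = "\u0643\u062a\u0627\u0628"         # element type name: kitab ("book")
KIND = "\u0646\u0648\u0639"               # attribute name: naw' ("kind")
NOVEL = "\u0631\u0648\u0627\u064a\u0629"  # enumerated attribute value: riwaya ("novel")

doc = f'<{BOOK} {KIND}="{NOVEL}"/>'
root = ET.fromstring(doc)

# The parser returns the non-Latin names and value exactly as written.
print(root.tag == BOOK)           # True
print(root.get(KIND) == NOVEL)    # True
```

In a real DTD the enumeration would be declared in an ATTLIST; the sketch only shows that nothing in the parser forces English names.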

> I understand it's no easy matter
>to rewrite gcc to support C programs written entirely in Urdu, but XML
>(and XSL and etc) is another matter.  It's entirely reasonable (IMO) to
>write the spec in a way that supports multiple writing systems.


SGML made it an explicit goal that "there should be no national language
dependencies". XML has improved on this, adopting the ISO 10646
Universal Character Set (Unicode) and predefining xml:lang for
every element type. (I wish they had also predefined xml:script,
but users can do that if they need it.)
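A minimal sketch of how an application might use the predefined xml:lang attribute, assuming Python's xml.etree.ElementTree, which reports the reserved xml: prefix under the namespace URI http://www.w3.org/XML/1998/namespace. The helper effective_langs is hypothetical; it applies the XML rule that xml:lang covers an element's content unless a descendant overrides it.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the reserved xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def effective_langs(elem, inherited=None):
    """Yield (tag, language) for elem and its descendants.

    xml:lang applies to an element's content unless a descendant
    overrides it, so the current value is passed down the tree.
    """
    lang = elem.get(XML_LANG, inherited)
    yield elem.tag, lang
    for child in elem:
        yield from effective_langs(child, lang)

# A hypothetical mixed English/Arabic document ("marhaban" = hello).
doc = '<doc xml:lang="en"><p>Hello</p><p xml:lang="ar">\u0645\u0631\u062d\u0628\u0627</p></doc>'

result = list(effective_langs(ET.fromstring(doc)))
print(result)  # [('doc', 'en'), ('p', 'en'), ('p', 'ar')]
```

A user-defined xml:script-style attribute would have to be declared in the DTD and handled the same way by the application.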

SGML seems to have spearheaded an awareness of this at ISO.
The new guidelines for programming languages mandate
language neutrality, and ways of encoding ISO 10646
characters into 8-bit strings are being retrofitted onto most
standards. Making UTF-8 the encoding used in 8-bit strings
seems the lowest-cost method, if your software is 8-bit clean.
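Why UTF-8 is cheap for 8-bit-clean software can be sketched in a few lines of Python (an illustration only, not taken from any of the standards mentioned): ASCII characters keep their one-byte values, and every byte of a multi-byte UTF-8 sequence has the high bit set, so it can never be mistaken for an ASCII delimiter such as < or &.

```python
# Arabic "salam" (peace) inside ASCII markup, as a Unicode string.
salam = "\u0633\u0644\u0627\u0645"
markup = f'<p xml:lang="ar">{salam}</p>'

data = markup.encode("utf-8")     # the 8-bit string an 8-bit-clean program sees
encoded = salam.encode("utf-8")   # just the Arabic part

# ASCII markup passes through byte-for-byte unchanged.
ascii_prefix_intact = data.startswith(b'<p xml:lang="ar">')

# Every byte of the multi-byte sequences is >= 0x80, so none can collide
# with '<', '&', '"' or any other ASCII delimiter.
no_ascii_collisions = all(b >= 0x80 for b in encoded)

# Byte-oriented code can still find delimiters by scanning for ASCII bytes.
angle_brackets = data.count(b"<")

print(encoded.hex(" "))                                          # d8 b3 d9 84 d8 a7 d9 85
print(ascii_prefix_intact, no_ascii_collisions, angle_brackets)  # True True 2
print(data.decode("utf-8") == markup)                            # True
```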

Rick Jelliffe

<PLUG>PS Since you are particularly interested in Arabic, you
may be interested to know that my book "The XML and SGML Cookbook",
which comes out next month, has an index in which you can look
up the XML numeric character codes for all the Arabic characters
available in XML, and also a CD-ROM with some Arabic
entity sets.</PLUG>
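For readers who have not used numeric character codes in XML: a character can be written as a decimal or hexadecimal character reference, and the parser replaces it with the character itself. A minimal sketch in Python, where char_ref is a hypothetical helper and the two Arabic letters are just examples:

```python
import xml.etree.ElementTree as ET

def char_ref(ch):
    """Return a hexadecimal numeric character reference for one character."""
    return f"&#x{ord(ch):X};"

alef = "\u0627"   # ARABIC LETTER ALEF
lam = "\u0644"    # ARABIC LETTER LAM

refs = char_ref(alef) + char_ref(lam)
print(refs)   # &#x627;&#x644;

# Decimal and hexadecimal references are equivalent: 1604 == 0x644.
root = ET.fromstring("<w>&#x627;&#1604;</w>")
print(root.text == alef + lam)   # True
```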


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



