SDATA or UNICODE

Wed Jan 28 11:51:47 GMT 1998

I'd like to thank the readers for their replies. I'll try to formulate my
thoughts on this issue.

The original EURO/FLORIN question raised an interesting discussion (to me,
anyway) on the status of doing XML 'standardization' work under W3C. In general,
for any new model to be defined, we are forced to build the specs (at least in
part) in a bottom-up fashion. Notably, for XML we must assume there is a solid
way of defining characters, and this must be supported by our software tools.
That way we can focus on information structuring and interchange rather than the
technical issues that *underlie* (but are not essential to) any project for
building software. I'd say character specification is an underlying feature. If
we are not certain that we have a *complete* way of defining and referencing
characters (technically, but also socially as is the case for UNICODE assignment
of characters by their external form and use), and if we still want to work on
XML, we must find a 'way out' for those cases where standard character
description fails.

*If* there is no character known as, say, FLORIN, in UNICODE, we must have a way
out and be able to represent (reference) that character. Chris Maden: "There are
a lot of symbols, especially scientific and technical ones, that aren't
represented in Unicode." Martin Bryan's remark: "I agree XML needs to use the
10646 value once it becomes officially approved, but does anyone have the
faintest when that will be?" seems somewhat beside the point; I'd say we can be
sure UNICODE will never be able to represent *all* characters (I may create a new
character every morning at breakfast). When we *know* that there will be
characters such as FLORIN in some knowledge domain/language, and if we assume
there will *always* be such characters that simply have not been recorded before
and/or officially, we must at the very outset of XML standardisation support an
escape route, an alternative. That's SDATA in SGML. If we all agree that we can
never place all distinguishable characters in one world-wide standard (such as
UNICODE), we must introduce the escape route, and not postpone that to a 'future
version' of XML.

So, having read the replies to my earlier question, I think XML should support
SDATA (as in SGML) from the start, together with a way for a system to determine
which character the replacement text represents (which is 'covered' by public
character sets in SGML).
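
To make the idea concrete, here is a minimal sketch in Python of how a system
might resolve SDATA entities. It is an illustration only, not how any actual
SGML or XML parser does it. The entity names, the bracketed replacement texts,
the two system tables and the function name expand are all assumptions made up
for this example.

```python
import re

# Hypothetical SDATA declarations, as an SGML DTD might give them:
#   <!ENTITY florin SDATA "[florin]">
#   <!ENTITY euro   SDATA "[euro  ]">
SDATA_ENTITIES = {"florin": "[florin]", "euro": "[euro  ]"}

# Hypothetical per-system tables: each system decides for itself what the
# SDATA replacement text stands for (these mappings are assumptions).
PRINT_SYSTEM = {"[florin]": "\u0192", "[euro  ]": "EUR"}
ASCII_SYSTEM = {"[florin]": "fl.", "[euro  ]": "EUR"}

# Matches general entity references such as &florin;
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def expand(text, system_table, entities=SDATA_ENTITIES):
    """Replace references to known SDATA entities using one system's table."""

    def replace(match):
        name = match.group(1)
        if name not in entities:
            # Not an SDATA entity we know about: leave the reference alone.
            return match.group(0)
        sdata_text = entities[name]
        # Fall back to the raw SDATA text if this system has no mapping.
        return system_table.get(sdata_text, sdata_text)

    return ENTITY_REF.sub(replace, text)
```

The same document text, say "Price: &florin; 10", can then be expanded
differently by passing PRINT_SYSTEM or ASCII_SYSTEM, which is the escape route:
the document only names the character, and each system supplies its own
interpretation.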

PS. Note that SDATA is not a character referencing format: it is a way of
letting the system insert *any* data when and where the SDATA entity is
referenced. Notably, this could be a complete character sequence, or even
different data for each reference to the same SDATA entity.
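
A rough Python sketch of that last point, assuming a system that attaches a
handler function to each SDATA text. The class SdataResolver, the helper
make_counter and the SDATA texts "[florin]" and "[refno]" are hypothetical
names used only for illustration.

```python
import itertools


class SdataResolver:
    """Maps SDATA replacement text to a handler that produces the data."""

    def __init__(self):
        self._handlers = {}

    def register(self, sdata_text, handler):
        # handler is any zero-argument callable returning a string
        self._handlers[sdata_text] = handler

    def resolve(self, sdata_text):
        handler = self._handlers.get(sdata_text)
        # Unknown SDATA text is passed through unchanged.
        return handler() if handler is not None else sdata_text


def make_counter(prefix):
    """Build a handler that returns different data on every reference."""
    counter = itertools.count(1)
    return lambda: f"{prefix}{next(counter)}"


resolver = SdataResolver()
# A whole character sequence rather than a single character:
resolver.register("[florin]", lambda: "Dutch guilders")
# Different data for each reference to the same entity:
resolver.register("[refno]", make_counter("REF-"))
```

Each call to resolver.resolve("[refno]") yields a new value, while
resolver.resolve("[florin]") always yields the same sequence of characters.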

Thanks again for the replies.

Arjan

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)