Editing text

Rick Jelliffe ricko at allette.com.au
Fri Nov 28 05:30:25 GMT 1997

> From: Peter Murray-Rust <peter at ursus.demon.co.uk>

> I assume there is no short cut...

On the contrary, there *IS* a short cut: the most obvious one!

Just treat the name as a token (i.e. terminated by whitespace or >,
or any other delimiter if you want to be careful). Any valid XML will
work with just that!
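
As a rough illustration of that short cut, here is a minimal Python sketch.
The names read_name, element_name and NAME_DELIMITERS are made up for this
example, and the delimiter set (whitespace, >, / and =) is an assumption
about what "any other delimiter" might reasonably cover. It does no
validation at all: it simply trusts that the input is valid XML.

```python
# Characters that end a name in this sketch: whitespace plus a few
# markup delimiters. The exact set is an assumption for illustration.
NAME_DELIMITERS = set(" \t\r\n>/=")


def read_name(text, start):
    """Return (name, next_index): the token running from `start`
    up to, but not including, the next delimiter."""
    end = start
    while end < len(text) and text[end] not in NAME_DELIMITERS:
        end += 1
    return text[start:end], end


def element_name(tag):
    """Extract the element name from a start tag such as '<para id="x">'."""
    if not tag.startswith("<"):
        raise ValueError("not a start tag")
    name, _ = read_name(tag, 1)
    return name
```

Because the loop only looks for delimiters, a name in any script passes
straight through; no character tables are needed.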

If you want to completely validate your XML, then the more sophisticated
checks are appropriate.  The intent (as I see it) is to let people use
customary words in their language and script, if they want to.  It is bad
practice to use crazy symbols and uncommon characters in markup, because
the purpose of markup is to reveal meaning, not hide it.  The complexity
of the rules merely encodes that to give guidance in the peripheral cases.

> I applaud the work of the WG on the Internationalisation and I don't want

Yes, they have been exemplary in this, I think.  They have taken the issue
very seriously, and kept their eyes on the goal.  It is very easy for I18N
to bamboozle people, in that there is always a fuzzy and heaving morass of
quibbling that makes people want to give up.  But in the case of XML, we
can have our cake (the fans of strict, codified naming rules can exactly
specify what is allowed) *AND* eat it (bewildered parser-writers can just
use simple tokenizing).

> to detract from it. What I would suggest is that because of the extremely
> high likelihood of error if individuals do try to hack their own isNameChar(),
> and because if ever this list is revised software will be invalidated, that
> the WG, or W3C or whoever, maintain an isNameChar() routine in the common
> languages 

It is possible that isNameChar() will be adequate.  The issue of how complex
the naming rules should be is under last-minute finalization.  The important
thing is not to be distracted by how detailed the official list is.  If
you do not have a validating XML processor (which means you are in fact
assuming that your documents are valid) then a much simpler tokenizing regime
should work fine.  That was an explicit requirement in the discussions of the
naming system: it must be straightforward to implement a (non-validating)
XML parser.
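
For anyone who does want the strict check, a table-driven routine is
probably the least error-prone shape. The Python sketch below is only an
illustration: the function names are invented here (is_name_char merely
echoes the isNameChar() mentioned above), and the ranges are the NameStartChar
and NameChar ranges from the later XML 1.0 Fifth Edition, used as a stand-in.
They are NOT the draft rules under discussion in this thread, which were
still being finalized.

```python
# Code point ranges (inclusive) allowed at the start of a name.
# Assumption: taken from XML 1.0 Fifth Edition, not the 1997 draft.
NAME_START_RANGES = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Additional ranges allowed after the first character:
# '-', '.', digits, middle dot, combining marks, two connectors.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(code_point, ranges):
    return any(low <= code_point <= high for low, high in ranges)


def is_name_start_char(ch):
    """True if `ch` may begin a name under the assumed ranges."""
    return _in_ranges(ord(ch), NAME_START_RANGES)


def is_name_char(ch):
    """True if `ch` may appear after the first character of a name."""
    cp = ord(ch)
    return _in_ranges(cp, NAME_START_RANGES) or _in_ranges(cp, NAME_EXTRA_RANGES)


def is_valid_name(name):
    """Strict check: non-empty, valid first character, valid remainder."""
    return (
        len(name) > 0
        and is_name_start_char(name[0])
        and all(is_name_char(ch) for ch in name[1:])
    )
```

Keeping the ranges in data rather than in code means that a revised list
only requires swapping the tables, while a non-validating parser can skip
this routine entirely and rely on delimiter-based tokenizing.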

> (C, C++, Java) so that we know we shall all be working with the same one.

There is a draft ISO technical report on this issue, for future programming 
language standards.  This technical report has clearly been influenced by
XML and SGML's approaches to the problem.  I know that the WG representatives
who are looking after finalizing the naming rules are looking at that
as well.

Rick Jelliffe

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
