Editing text

Peter Murray-Rust peter at ursus.demon.co.uk
Fri Nov 28 01:55:35 GMT 1997

I am writing an editor for JUMBO in which I expect most characters such as
' " < > & to have been converted into entity references (e.g. &apos;). I do
not expect any raw <![CDATA[ sections in the text; they will have been
transformed by the parser. On the other hand, there may be other entity
references which have not been expanded (e.g. &foo;).

My understanding of the spec [71] is that an entity reference contains a
Name, and that Names [4], [5] and [6] are constructed from letters, digits
and a few other characters. In determining whether something is an entity
reference, I have to look for a string of the form:
    '&' (Letter | '_' | ':') (NameChar)* ';'
NameChars are Digits, MiscNames and Letters.
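
As a rough illustration of the pattern above, here is a minimal sketch of the test in Java. The class and method names are hypothetical, and Character.isLetter() and Character.isLetterOrDigit() are only stand-ins for the spec's Letter and NameChar classes; they are an assumption for brevity and will not agree with Appendix B on every character.

```java
// Sketch only. The Character.* tests approximate the spec's character
// classes and do NOT reproduce the Appendix B tables exactly.
final class EntityRefSketch {

    // Approximates (Letter | '_' | ':'), the first character of a Name.
    static boolean isNameStart(char ch) {
        return Character.isLetter(ch) || ch == '_' || ch == ':';
    }

    // Approximates NameChar: letters, digits and a few punctuation marks.
    static boolean isNameChar(char ch) {
        return Character.isLetterOrDigit(ch)
                || ch == '.' || ch == '-' || ch == '_' || ch == ':';
    }

    // True if s is exactly '&' NameStart NameChar* ';'.
    static boolean isEntityReference(String s) {
        if (s.length() < 3
                || s.charAt(0) != '&'
                || s.charAt(s.length() - 1) != ';') {
            return false;
        }
        if (!isNameStart(s.charAt(1))) {
            return false;
        }
        // Check every character between the first name character and ';'.
        for (int i = 2; i < s.length() - 1; i++) {
            if (!isNameChar(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, EntityRefSketch.isEntityReference("&foo;") accepts the string, while "&1foo;" and "&foo" are rejected because of the leading digit and the missing semicolon respectively.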

Appendix B lists six and a half pages of potential NameChars for which
JUMBO has to test - is this correct? If so I have code of the form:

public boolean isNameChar(char ch) {
    return <six pages of conditionals>;
}

I assume there is no short cut...
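
One possible way to avoid writing the conditionals by hand is to keep the character ranges in a sorted table and binary-search it. The sketch below assumes that approach; the class name is hypothetical, and the table holds only a small illustrative subset of ranges (ASCII and Latin-1 letters, digits, and the punctuation allowed in names), not the full list from Appendix B.

```java
// Sketch only: a table-driven lookup instead of chained conditionals.
final class NameCharTable {

    // Sorted, non-overlapping pairs of {low, high} code points.
    // ILLUSTRATIVE SUBSET: the complete table would have to be
    // transcribed from Appendix B of the spec.
    private static final int[] RANGES = {
        0x002D, 0x002E,   // '-' and '.'
        0x0030, 0x0039,   // 0-9
        0x003A, 0x003A,   // ':'
        0x0041, 0x005A,   // A-Z
        0x005F, 0x005F,   // '_'
        0x0061, 0x007A,   // a-z
        0x00C0, 0x00D6,   // Latin-1 letters, excluding the multiplication sign
        0x00D8, 0x00F6,   // Latin-1 letters, excluding the division sign
        0x00F8, 0x00FF,
    };

    // Binary search over the range pairs; true if ch lies inside any range.
    static boolean isNameChar(char ch) {
        int lo = 0;
        int hi = RANGES.length / 2 - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (ch < RANGES[2 * mid]) {
                hi = mid - 1;          // ch is below this range
            } else if (ch > RANGES[2 * mid + 1]) {
                lo = mid + 1;          // ch is above this range
            } else {
                return true;           // low <= ch <= high
            }
        }
        return false;
    }
}
```

The attraction of a table is that a revision of the list only changes data, not logic, so the same table could in principle be shared between C, C++ and Java versions.
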
I applaud the work of the WG on internationalisation and I don't want
to detract from it. What I would suggest is that, because of the extreme
likelihood of error if individuals try to hack their own isNameChar(),
and because software will be invalidated if this list is ever revised,
the WG, or W3C or whoever, maintain an isNameChar() routine in the common
languages (C, C++, Java) so that we know we shall all be working with the
same one.

There may be other similar aspects of the spec where it is worth having a
central curated resource...


Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)
