Mix encodings in a document?

Tony Graham tgraham at mulberrytech.com
Mon Sep 28 21:32:04 BST 1998

At 23 Sep 1998 16:21 -0400, John Cowan wrote:
 > Deke Smith wrote:
 > > And what is the implications of this (if any) for XML rendering? I'm not
 > > sure of what you mean by "surrogates are correctly processed."
 > Essentially it means that the two 16-bit values that form a
 > surrogate-pair (representing a Unicode character on the Astral
 > Plane) is always treated as a single character.
 > In XML, surrogate-pairs can appear only in attribute values, #PCDATA
 > content, PIs, and comments; they are not allowed in element GIs,
 > attribute names, or the like.

Surrogate code points are not allowed as characters in parsed
entities.  The production for Char excludes the surrogate blocks:

[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
             | [#x10000-#x10FFFF]
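
As a rough illustration, the ranges in production [2] can be checked
with a few comparisons.  This is only a sketch: the function name
is_xml_char is made up for this example and is not part of any XML
library.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches production [2] Char.

    Hypothetical helper for illustration; the ranges are copied
    directly from the production quoted above.
    """
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# The surrogate blocks (#xD800-#xDFFF) fall in the gap between
# #xD7FF and #xE000, so they are rejected.
print(is_xml_char(0xD800))   # high surrogate
print(is_xml_char(0x10000))  # first code point beyond the BMP
```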

You can include non-BMP (non-UCS-2) characters by using numeric
character references to their Unicode scalar values (or by using
UCS-4).
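
A minimal sketch of that conversion, assuming you start from the two
16-bit surrogate values: compute the scalar value with the standard
UTF-16 arithmetic, then write it as a hexadecimal character reference.
The function names here are invented for the example.

```python
def surrogate_pair_to_scalar(high: int, low: int) -> int:
    """Combine a high and low surrogate into a Unicode scalar value."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("high surrogate out of range")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("low surrogate out of range")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def char_ref(scalar: int) -> str:
    """Format a scalar value as an XML hexadecimal character reference."""
    return f"&#x{scalar:X};"


# Example pair D834 DD1E (chosen only for illustration).
scalar = surrogate_pair_to_scalar(0xD834, 0xDD1E)
print(char_ref(scalar))  # &#x1D11E;
```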


Tony Graham
Tony Graham                            mailto:tgraham at mulberrytech.com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9632
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
