Mix encodings in a document?
Tony Graham
tgraham at mulberrytech.com
Mon Sep 28 21:32:04 BST 1998
At 23 Sep 1998 16:21 -0400, John Cowan wrote:
> Deke Smith wrote:
> > And what are the implications of this (if any) for XML rendering? I'm not
> > sure of what you mean by "surrogates are correctly processed."
>
> Essentially it means that the two 16-bit values that form a
> surrogate pair (representing a Unicode character on the Astral
> Plane) are always treated as a single character.
>
> In XML, surrogate-pairs can appear only in attribute values, #PCDATA
> content, PIs, and comments; they are not allowed in element GIs,
> attribute names, or the like.
Surrogate pairs are not allowed in parsed entities. The production
for Char excludes the surrogate blocks:
    [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
               | [#x10000-#x10FFFF]
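
As a rough illustration of that production, here is a minimal Python sketch that tests whether a code point matches Char. The function name is_xml_char is my own invention for this example, not part of any XML library, and the ranges are simply transcribed from production [2].

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point `cp` matches production [2] Char.

    Hypothetical helper for illustration; the ranges are copied
    directly from the production quoted in this message.
    """
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # BMP below the surrogate blocks
        or 0xE000 <= cp <= 0xFFFD      # BMP above the surrogate blocks
        or 0x10000 <= cp <= 0x10FFFF   # characters beyond the BMP
    )


# The surrogate blocks (#xD800-#xDFFF) fall in the gap between the
# second and third ranges, so no surrogate value is ever a Char.
print(is_xml_char(0xD800))   # high surrogate
print(is_xml_char(0x10000))  # first character beyond the BMP
```

Under this reading, a lone surrogate value fails the test, while the scalar value of the character a pair stands for passes it.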
You can include non-BMP (non-UCS-2) characters by making numeric
character references to their Unicode Scalar Values, for example
&#x10000; (or by encoding the entity in UCS-4).
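
To make the scalar-value arithmetic concrete, here is a small Python sketch that turns a surrogate pair into its Unicode Scalar Value and then into a hexadecimal numeric character reference. The function names scalar_from_surrogates and numeric_reference are made up for this example; the formula is the standard one from the Unicode definition of surrogates.

```python
def scalar_from_surrogates(high: int, low: int) -> int:
    """Combine a high and a low surrogate into a Unicode Scalar Value.

    Hypothetical helper for illustration only.
    """
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def numeric_reference(cp: int) -> str:
    """Format a code point as a hexadecimal character reference."""
    return f"&#x{cp:X};"


# The lowest possible pair, D800 DC00, is scalar value 0x10000.
print(numeric_reference(scalar_from_surrogates(0xD800, 0xDC00)))  # &#x10000;
```

The reference names the character itself, so the document never needs to contain the two surrogate values as separate characters.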
Regards,
Tony Graham
======================================================================
Tony Graham mailto:tgraham at mulberrytech.com
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9632
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)