Character encoding questions
Eric Baatz - Sun Microsystems Labs BOS
ebaatz at barbaresco.East.Sun.COM
Wed Jun 25 21:26:17 BST 1997
I was struck by the following sentence in the Microsoft XML White Paper:
XML supports a range of encodings...subject only to the restriction
that an entire document must share the same encoding.
My immediate reaction was that this wasn't correct, although the
definition of "document" above isn't obvious to me (for example, are
external entities part of a document?). However, when I checked the
April XML specification, I got in over my head. I am hoping that someone
here will help me out of my hole.
If my XML document is a simple Unicode text file, do I begin it like
the following
    a Byte Order Mark
    <?XML version="1.0" encoding="ISO-10646-UCS-2"?>
    ...
with the Byte Order Mark being required even though an EncodingDecl is
used? (I would have said "yes" until I got to Appendix E, "Autodetection
of Character Sets," which worries about detecting UCS-2 when there
is no Byte Order Mark.) Is the EncodingDecl necessary if the file
starts with a Byte Order Mark?
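To make the Byte Order Mark question concrete, here is a minimal Python
sketch of the kind of leading-byte check that Appendix E seems to have in
mind. It is an illustration under assumptions, not the specification's
table: the function name sniff_encoding is hypothetical, only the 16-bit
cases and an ASCII-compatible fallback are handled, and UCS-2 is
approximated by Python's utf-16 codecs.

```python
import codecs
import re

# Byte Order Marks and the codec each one implies (Python codec names,
# used here as an approximation of UCS-2).
_BOMS = [
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # bytes FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # bytes FF FE
]

# "<?" encoded as 16-bit units, for files that carry no Byte Order Mark.
_NO_BOM = [
    (b"\x00<\x00?", "utf-16-be"),
    (b"<\x00?\x00", "utf-16-le"),
]

_DECL = re.compile(r"""encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def _declared(data, codec):
    """Return the encoding name written in a leading declaration, or None."""
    head = data[:200].decode(codec, errors="replace")
    end = head.find("?>")
    if not head.startswith("<?") or end == -1:
        return None
    match = _DECL.search(head[:end])
    return match.group(1) if match else None


def sniff_encoding(data):
    """Return (codec guess, BOM length in bytes, declared name or None)."""
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return codec, len(bom), _declared(data[len(bom):], codec)
    for prefix, codec in _NO_BOM:
        if data.startswith(prefix):
            return codec, 0, _declared(data, codec)
    # Assumption: anything else is ASCII-compatible, with UTF-8 as the guess.
    return "utf-8", 0, _declared(data, "ascii")
```

In this sketch the Byte Order Mark and the declaration are found
independently of each other, which is why it is unclear to me whether
both are required.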
Where can I have an EncodingPI? Section 4.3.3 talks about their being
"at the beginning of a system entity, before any other character data or
markup" but doesn't define "system entity" (perhaps one that has an
ExternalID that contains "SYSTEM"?). If my document references an
external entity, then I believe that the external entity must start
with an EncodingPI (see Appendix E, "Autodetection of Character Sets")
if it is not in UTF-8 and does not start with a Byte Order Mark.
If I wanted to take the external entity and, for portability reasons,
bundle it into my XML document as an internal entity, what would I do
with the external entity's EncodingPI? It doesn't seem to be allowed in
the internal entity declaration, as in something like:
    <!ENTITY Pub-Status <?XML encoding="ISO-10646-UCS-2"?>"text here">
I presume that the answer is that I cannot convert an external entity
into an internal entity unless the external entity and my XML document
have the same encoding.
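Purely mechanically, the mismatch could be sidestepped by transcoding the
external entity's text into the document's encoding before inlining it.
The Python sketch below shows that step only. Whether the specification
intends this is exactly what I am unsure of, and the function name
transcode_entity_text and the codec names are assumptions for
illustration. Escaping of quotes and markup characters in the resulting
entity value is not handled.

```python
def transcode_entity_text(external_bytes, external_codec, document_codec):
    """Re-encode an external entity's text in the document's encoding.

    The leading Byte Order Mark and a leading <?XML ...?> encoding PI are
    dropped, since neither would make sense inside an internal entity.
    """
    text = external_bytes.decode(external_codec)
    # A decoded Byte Order Mark shows up as the character U+FEFF.
    if text.startswith("\ufeff"):
        text = text[1:]
    # Assumption: the PI uses the upper-case <?XML form shown above.
    if text.startswith("<?XML"):
        end = text.find("?>")
        if end != -1:
            text = text[end + 2:]
    return text.encode(document_codec)
```

For example, the bytes of a big-endian 16-bit entity containing the PI
followed by "text here" would come back as just "text here" in the
target encoding.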
What is the motivation for not allowing a change of encoding within
an entity? The mechanism for handling that seems no different than
that needed to handle different encodings in external entities, which
I think of as being logically a part of the referencing document.