SDATA or UNICODE

Paul Prescod papresco at technologist.com
Thu Jan 29 05:31:23 GMT 1998


On Thu, 29 Jan 1998, Rick Jelliffe wrote:
> > No, for the reason Tim points out. On the other hand, you might be on the 
> > right track. A processing instruction would serve as a hack to tell the 
> > application where to insert the euro. <?EURO>
> 
> XML has, underlying its decisions, the SGML model which separates the
> encoding of data (i.e. "storage management") from their logical representation
> as streams of characters in a single character set (i.e. "entity management").

I'm not sure how your observation argues against my proposed hack to 
insert a non-Unicode character into a Unicode document. This is not an 
issue of encodings, but of character sets.
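
For illustration, a minimal sketch of how an application might honour such a processing instruction. It uses Python's standard xml.sax module; the handler name, the render_text helper, and the choice of replacement string are assumptions made for this example, not anything specified in the thread. The default replacement is U+20AC, the code point Unicode later assigned to the euro sign; an application could just as well substitute a glyph reference or the text "EUR".

```python
import xml.sax


class EuroPIHandler(xml.sax.ContentHandler):
    """Collects character data and expands <?EURO?> (hypothetical handler)."""

    def __init__(self, replacement):
        super().__init__()
        self.replacement = replacement
        self.parts = []

    def characters(self, content):
        # Ordinary character data passes through unchanged.
        self.parts.append(content)

    def processingInstruction(self, target, data):
        # The parser reports the PI; the application decides what to insert.
        if target == "EURO":
            self.parts.append(self.replacement)


def render_text(xml_bytes, replacement="\u20ac"):
    """Return the document's text with every <?EURO?> replaced."""
    handler = EuroPIHandler(replacement)
    xml.sax.parseString(xml_bytes, handler)
    return "".join(handler.parts)


doc = b"<price>100 <?EURO?></price>"
print(render_text(doc))
```

The parser itself never sees a non-Unicode character; it only reports the instruction, and the substitution happens in application code.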

> If your customers
> require multiple encodings, then they have to source each one from a separate
> external entity. These entities can be bundled up or interleaved in any
> fashion you like, but this is a *PRE* XML storage management issue, not
....
> But once you have changed encodings, do you scan for the end of the
> marked section using the old or the new encoding? These kinds of ISO 2022
> mode changing are what we are trying to get rid of from XML (and from
> SGML).

It is exactly *because* these issues do not belong in XML, and are "*PRE* 
XML", that I advised a preprocessor. I don't see anything here that argues 
against that. As for the signalling of mode switches, that depends on 
the encodings in question.
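
A rough sketch of what such a preprocessor could look like, assuming the input has already been split into segments whose encodings are known (how the mode switches are detected is exactly the part that depends on the encodings involved, and is not shown). The merge_segments function and the sample data are hypothetical; the codec names are those of Python's standard library.

```python
import xml.etree.ElementTree as ET


def merge_segments(segments):
    """Decode each (raw_bytes, encoding) pair and join the results.

    The output is a single stream of Unicode characters, so the XML
    parser never has to deal with a change of encoding mid-entity.
    """
    return "".join(raw.decode(encoding) for raw, encoding in segments)


# Sample input: a Latin-1 segment followed by a Shift_JIS segment.
segments = [
    ("<x>caf\u00e9 ".encode("latin-1"), "latin-1"),
    ("\u65e5\u672c\u8a9e</x>".encode("shift_jis"), "shift_jis"),
]

merged = merge_segments(segments)

# Only now is the text handed to the parser, as one character stream.
root = ET.fromstring(merged)
print(root.text)
```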

> So you can have multiple encodings before the parser, but not presented
> to the parser. The other choice is multiple encodings after the parser: e.g.
> embed the SJIS-encoded text in a Latin-1-safe way. This is the same as Dave's 
> comment about transliteration using notation. You can have a document like
> 
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <!DOCTYPE x SYSTEM "x.dtd"
> [
> 	<!NOTATION sjis-Qencoded SYSTEM "SjisQ.pl">
> 	<!ELEMENT SJIS-SECTION ( #PCDATA ) >
> 	<!ATTLIST SJIS-SECTION
> 		I-need-decoding NOTATION ( sjis-Qencoded ) #REQUIRED > 
> ]>
> <x>
> ...
> 
> <SJIS-SECTION><![CDATA[
> smdkfjhhjwfnnweofijslkdm
> ]]></SJIS-SECTION>
> ...
> </x>

You had better hope that CDEnd ("]]>") does not appear in the encoded data!
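
To make the hazard concrete, here is a small sketch. It assumes quoted-printable (Python's quopri module) as a stand-in for the unspecified "Q encoding" above, and it uses the common workaround of splitting the payload across two CDATA sections wherever "]]>" occurs. The helper names are hypothetical.

```python
import quopri
import xml.etree.ElementTree as ET

CDEND = "]]>"


def q_encode(raw):
    """Quoted-printable encode bytes into an ASCII-only string."""
    return quopri.encodestring(raw).decode("ascii")


def wrap_cdata(text):
    """Wrap text in CDATA, splitting the section at every CDEnd."""
    safe = text.replace(CDEND, "]]]]><![CDATA[>")
    return "<![CDATA[" + safe + "]]>"


original = "\u65e5\u672c\u8a9e ]]> \u65e5\u672c\u8a9e"
encoded = q_encode(original.encode("shift_jis"))

# Quoted-printable leaves ']' and '>' alone, so CDEnd survives encoding.
print(CDEND in encoded)

doc = "<SJIS-SECTION>" + wrap_cdata(encoded) + "</SJIS-SECTION>"
recovered = ET.fromstring(doc).text

# After parsing, the application undoes the Q encoding itself.
decoded = quopri.decodestring(recovered.encode("ascii")).decode("shift_jis")
print(decoded == original)
```

Without the split in wrap_cdata, the first "]]>" in the payload would close the marked section early and the remainder would be parsed as ordinary content.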

 Paul Prescod


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo at ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo at ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)



