Is CDATA "structure"?
cowan at locke.ccil.org
Tue Jul 20 16:28:09 BST 1999
Nik O wrote:
> <rant original_post_to="Z3950IW">
> ..much of the internet is still constrained by Unix's feeble 7-bit character
> TTY legacy. This latter issue is echoed in Java's lack of _unsigned_ bytes
> (!), and XML's auto-conversion of CRLF-delimited text records to
> LF-delimited records (yet another legacy/bias from Unix).
> Is there historical basis to the above statement? It was a deduction based
> upon the old Xenix "text mode" I/O and the probability that most of the
> developers of the XML standard were based in the Unix world.
Java doesn't have unsigned arithmetic values (and type *byte* is
meant to be arithmetic) because they have all kinds of surprising
results if misused: see the relevant sections of _Writing Solid Code_.
The purposes served by unsigned bytes are better served by characters;
you can't just cast bytes to characters, though, because the cast
sign-extends negative byte values. You need to use
c = b < 0 ? b + 256 : b instead.
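
A minimal sketch of the conversion described above, in case it helps to
see it run. The class and method names (UnsignedBytes, toUnsigned,
toUnsignedMasked, toChar) are hypothetical and not from any library; the
bit-mask variant is an equivalent alternative added here for comparison,
not something the message itself proposes.

```java
public class UnsignedBytes {
    // Widen a signed Java byte (-128..127) to its unsigned value (0..255),
    // using the conditional expression quoted in the message.
    static int toUnsigned(byte b) {
        return b < 0 ? b + 256 : b;
    }

    // Equivalent bit-mask form (an assumption-free alternative, shown only
    // for comparison): keep the low 8 bits after widening to int.
    static int toUnsignedMasked(byte b) {
        return b & 0xFF;
    }

    // Interpret the byte as a character code in the range 0..255.
    static char toChar(byte b) {
        return (char) toUnsigned(b);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xE9;                  // stored as -23 in a signed byte
        System.out.println((int) (char) b);    // plain cast sign-extends: 65513
        System.out.println((int) toChar(b));   // corrected conversion: 233
        System.out.println(toUnsigned(b) == toUnsignedMasked(b)); // true
    }
}
```

The plain cast goes wrong because the byte is first widened to int with
sign extension and only then narrowed to a 16-bit char.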
The use of LF as newline comes from Unix, but it is also sanctioned
by ISO; the character code was explicitly ambiguous between
advance-paper and go-to-new-line. In the rarely used C1 character set,
there are two separate control characters (IND, 0x84, and NEL, 0x85)
to disambiguate these functions. As for "TTY legacy", real Teletypes
(at least models 33/35) want CR/LF, not just LF.
John Cowan http://www.ccil.org/~cowan cowan at ccil.org
Schlingt dreifach einen Kreis um dies! / Schliesst euer Aug vor heiliger Schau,
Denn er genoss vom Honig-Tau / Und trank die Milch vom Paradies.
-- Coleridge / Politzer