Unix/Java design issues (Was: Re: Is CDATA "structure"?)

Wed Jul 21 04:21:42 BST 1999

On Tue, Jul 20, 1999 at 02:37:43PM -0400, John Cowan wrote:
> Nik O scripsit:
> 
> > I understand Java's intent re the *byte* type, and i agree that
> > Java's use of Unicode is a long-overdue move to non-Anglo-centric
> > computing.  However, since i've been writing rock-solid C code for
> > almost 20 years, i've long observed a common programmer laziness
> > concerning all numeric types -- namely the non-use of the
> > "unsigned" qualifier for inherently unsigned numbers (e.g. file
> > offsets, binary file contents).  The former example created an
> > unnecessary 32KB (and later 2GB) limit to file sizes handled using
> > the standard C library.  The latter example is still an issue
> > today -- at least for those whose computing environment includes
> > low-resource embedded systems and/or legacy byte-oriented data
> > formats.
> 
> The trouble is that although you can squeeze out a bit with unsigned
> numbers, you can get hosed very easily in other ways.  (BTW, the
> 15-bit limit on file sizes was gone well before the 6th Edition of
> Unix.)  Furthermore, if 31 bits isn't enough, is 32 bits really so
> much better?
> 
> The basic problem is code like this:
> 
> 	unsigned count;
> 	while (--count > 0) {
> 		/* do something */
> 		}
> 
> This loop will never fail,

Yes it will: it stops as soon as --count reaches zero.  I presume you
were thinking of >=, which is always true for an unsigned value.  There
is a simple canonical loop for down-counting:

  int count;
  for (count = size; count-- > 0;)  /* size == start + 1 */
  {
    /* Do something */
  }
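
For what it's worth, here is a minimal sketch in C of the same loop
with an unsigned counter.  The helper name visit_down and the output
buffer are made up for illustration; the only point is that the test
looks at the value before the decrement, so the loop visits size-1
down to 0 and then stops, even when size is 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: records the indices visited by the canonical
   down-counting loop in out[] and returns how many were visited.
   out must have room for at least size elements. */
size_t visit_down(size_t size, size_t *out)
{
    size_t n = 0;
    size_t count;

    assert(out != NULL);

    /* The comparison uses the value before the decrement, so the body
       sees size-1, size-2, ..., 0 and the loop never depends on count
       becoming negative. */
    for (count = size; count-- > 0;)
        out[n++] = count;

    return n;
}
```

Calling visit_down(3, buf) should leave 2, 1, 0 in buf, and
visit_down(0, buf) should visit nothing.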

In C++, the declaration can even be folded into the for construct to
keep the loop variable local to the block:

  for (int count = size; count-- > 0;) // ...

Of course this doesn't change the fact that programmers have to
remember to do this for unsigned types.  That is especially hazardous
when the type is something like Foo::SizeType, whose signedness isn't
visible at the point of use.  And using signed types to iterate over
an object that uses an unsigned index type can generate endless
signed-unsigned comparison warnings from some compilers.
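
Those warnings are not just noise.  As a rough sketch in C (the helper
names are hypothetical, and I am assuming plain int and unsigned
operands), this is what a mixed comparison does.  The cast in
naive_less is written out so the example compiles without the warning,
but it is the same conversion the compiler applies implicitly when an
int meets an unsigned in a comparison.

```c
#include <assert.h>

/* Hypothetical helper: what "index < size" effectively computes when
   index is int and size is unsigned.  A negative index becomes a very
   large unsigned value before the comparison. */
int naive_less(int index, unsigned size)
{
    return (unsigned)index < size;
}

/* Hypothetical helper: the same comparison with the negative case
   handled before any conversion takes place. */
int safe_less(int index, unsigned size)
{
    return index < 0 || (unsigned)index < size;
}

/* Usage sketch: -1 < 10 is true arithmetically, but the naive version
   reports false. */
void compare_demo(void)
{
    assert(naive_less(-1, 10u) == 0);
    assert(safe_less(-1, 10u) == 1);
}
```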

Ultimately I advocate defensiveness on both sides: define signed
interfaces and use the safe down-counting loop, regardless of integral
type.  Obviously all of this is moot in Java.  I think it comes down
to whether you need power or safety.

Usually one will find that if 2 billion isn't a large enough number
then 4 billion isn't either (or won't be for long), and a 64 kB file
size limit is hardly less annoying than a 32 kB limit, so increased
numerical range isn't as compelling an argument for unsigned types as
it might seem.  However, this is only a general argument.  It will
often break down when one looks at specific cases with particular
requirements.  For instance, it might be exceedingly annoying to be
unable to create a 40 kB data structure in a 64 kB architecture; you
know that a 64 kB limit will forever suit your needs, but the size is
signed so you are stuck with 32 kB.  You have to split your data in
unnatural ways (or use segmented access) and live with all the added
complications to your code.  It might even bloat your program enough
to sink the project.
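
To put numbers on the 40 kB case, here is a small sketch in C,
assuming the size field is exactly 16 bits wide (the helper names are
made up for illustration).  40 kB is 40960 bytes, which is above the
signed 16-bit maximum of 32767 but below the unsigned maximum of 65535.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: can an object of the given byte count be
   described by a 16-bit signed or a 16-bit unsigned size field? */
int fits_signed16(uint32_t bytes)
{
    return bytes <= (uint32_t)INT16_MAX;
}

int fits_unsigned16(uint32_t bytes)
{
    return bytes <= (uint32_t)UINT16_MAX;
}

/* Usage sketch: a 40 kB structure only fits the unsigned field. */
void size_demo(void)
{
    uint32_t forty_kb = 40u * 1024u;   /* 40960 bytes */

    assert(!fits_signed16(forty_kb));
    assert(fits_unsigned16(forty_kb));
}
```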

Consequently I think it is unwise to enforce a no-unsigned-types rule.
This will almost certainly cause someone grief some day.  I would
stick to making it a recommendation.

Cheers,
Marcelo

P.S.: This advice comes from the painful experience of cleaning up the
myriad bugs that surfaced after I converted our array class to
size_type index values.

-- 
http://www.simdb.com/~marcelo/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev at ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
List coordinator, Henry Rzepa (mailto:rzepa at ic.ac.uk)