My name is Jon Skeet

BigInteger.toByteArray() returns purposeful leading zeros?

I'm converting BigInteger values into binary, radix-16 and radix-64 encodings and seeing mysterious zero padding in the most significant byte. Is this a BigInteger problem that I can work around by stripping the zero padding, or perhaps by doing something else?

My test code:

    String s;
    System.out.printf( "%s length %d\n", s = "123456789A", (new BigInteger( s, 16 )).toByteArray().length );
    System.out.printf( "%s length %d\n", s = "F23456789A", (new BigInteger( s, 16 )).toByteArray().length );

Produces output:

    123456789A length 5
    F23456789A length 6

The longer array has a zero byte of padding at the front. Looking at the source of BigInteger.toByteArray(), I see:

    public byte[] toByteArray() {
        int byteLen = bitLength()/8 + 1;
        byte[] byteArray = new byte[byteLen];

I can find the field private int bitLength, but I can't find where bitLength() is defined, so I can't work out exactly why the class does this. Is it connected to sign extension, perhaps?

Yes, this is the documented behaviour:

The byte array will be in big-endian byte-order: the most significant byte is in the zeroth element. The array will contain the minimum number of bytes required to represent this BigInteger, including at least one sign bit, which is (ceil((this.bitLength() + 1)/8)).

bitLength() is documented as:

Returns the number of bits in the minimal two's-complement representation of this BigInteger, excluding a sign bit.
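As a quick check of the documented formula against the two values from the question, here is a minimal sketch. The class name BitLengthDemo and the helper expectedByteLength are made up for illustration; only bitLength() and toByteArray() come from BigInteger itself.

```java
import java.math.BigInteger;

public class BitLengthDemo {
    // Hypothetical helper: byte count predicted by the documented formula
    // ceil((bitLength + 1) / 8), written with integer arithmetic.
    static int expectedByteLength(BigInteger value) {
        return (value.bitLength() + 1 + 7) / 8;
    }

    public static void main(String[] args) {
        for (String hex : new String[] {"123456789A", "F23456789A"}) {
            BigInteger value = new BigInteger(hex, 16);
            // Print the bit length, the predicted byte count and the actual byte count.
            System.out.printf("%s bitLength=%d expected=%d actual=%d%n",
                    hex, value.bitLength(), expectedByteLength(value),
                    value.toByteArray().length);
        }
    }
}
```

123456789A has a bit length of 37, so 38 bits including the sign fit in 5 bytes. F23456789A has a bit length of 40, so the 41 bits including the sign need 6 bytes. This also matches the bitLength()/8 + 1 expression in the source excerpt from the question.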

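In other words, a positive value and its negation usually have the same bit length, regardless of sign (the exception is an exact power of two: 128 has a bit length of 8, but -128 has a bit length of 7). For a non-negative value, think of a BigInteger as being an unsigned integer and a sign bit: toByteArray() returns all the data from both parts, which is "the number of bits required for the unsigned integer, and one bit for the sign". F23456789A needs exactly 40 bits, so the sign bit spills into a sixth byte, whereas 123456789A needs only 37 bits and its sign bit still fits in the fifth byte.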
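If the encoding you are producing only ever deals with non-negative values, stripping the extra byte is one possible workaround. The following is only a sketch under that assumption: the class UnsignedBytes and the method toUnsignedByteArray are hypothetical names, not part of the JDK, and the helper rejects negative input because a negative value's leading byte carries real two's-complement data.

```java
import java.math.BigInteger;
import java.util.Arrays;

public class UnsignedBytes {
    // Hypothetical helper: returns the magnitude bytes of a non-negative
    // BigInteger, dropping the leading zero byte that only holds the sign bit.
    static byte[] toUnsignedByteArray(BigInteger value) {
        if (value.signum() < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        byte[] bytes = value.toByteArray();
        // Only strip when there is more than one byte, so zero stays as {0}.
        if (bytes.length > 1 && bytes[0] == 0) {
            return Arrays.copyOfRange(bytes, 1, bytes.length);
        }
        return bytes;
    }

    public static void main(String[] args) {
        BigInteger value = new BigInteger("F23456789A", 16);
        byte[] stripped = toUnsignedByteArray(value);
        System.out.printf("signed length %d, unsigned length %d%n",
                value.toByteArray().length, stripped.length);

        // Round trip: the (signum, magnitude) constructor treats the bytes
        // as unsigned, so the high bit of the first byte is not read as a sign.
        BigInteger restored = new BigInteger(1, stripped);
        System.out.println(restored.equals(value));
    }
}
```

When decoding, use the new BigInteger(int signum, byte[] magnitude) constructor as shown. Passing the stripped bytes to new BigInteger(byte[]) would interpret them as two's complement, and F23456789A would come back as a negative number.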

See more on this question at Stack Overflow