Most efficient way to count occurrences?

I have an array of primitive bytes that can hold arbitrary values. I'm trying to count how often each value occurs in the array, as efficiently (that is, as fast) as possible. Currently I'm using:

HashMap<Byte, Integer> dataCount = new HashMap<>();
for (byte b : data) dataCount.put(b, dataCount.getOrDefault(b, 0) + 1);

This one-liner takes ~500ms to process a byte[] of length 24883200. Using a regular for loop takes at least 600ms.

I've considered building a Set of the distinct values (since a set holds each element only once) and then filling a HashMap by calling Collections.frequency() for each of them. However, constructing a Set from a primitive array takes several extra calls, so I'm guessing it wouldn't be any faster.
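The Set-plus-frequency() idea described above might look roughly like the sketch below; the class and method names are hypothetical. It has to box every byte into a List<Byte>, and each Collections.frequency() call rescans that whole list, so it can make up to 256 full passes over the data instead of one. That suggests it would likely be slower than the single-pass HashMap loop, not faster.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FrequencyCount {
    // Hypothetical helper illustrating the Set + Collections.frequency() idea.
    public static Map<Byte, Integer> countWithFrequency(byte[] data) {
        // Box every primitive byte so it can be stored in a Collection.
        List<Byte> boxed = new ArrayList<>(data.length);
        for (byte b : data) {
            boxed.add(b);
        }
        // The set keeps one copy of each distinct value (at most 256 of them).
        Set<Byte> distinct = new HashSet<>(boxed);
        Map<Byte, Integer> counts = new HashMap<>();
        for (Byte value : distinct) {
            // Each call walks the entire list again: O(n) work per distinct value.
            counts.put(value, Collections.frequency(boxed, value));
        }
        return counts;
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 2, 3, 3, 3};
        System.out.println(countWithFrequency(data));
    }
}
```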

What is the fastest way to count the occurrences of each value?

I'm using Java 8 and I'd prefer to avoid using Apache Commons if possible.

Jon Skeet

I would use an array instead of a HashMap: a byte can only take 256 distinct values, so you know exactly how many counts you need to keep track of:

int[] counts = new int[256];
for (byte b : data) {
    counts[b & 0xff]++;
}

That way:

  • You never need to do any boxing of either the keys or the values
  • Nothing needs to compute a hash code, check for equality, etc.
  • It's about as memory-efficient as it gets: a single 256-element int array, roughly 1 KB of data
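A self-contained version of the array approach might look like the sketch below; the names ByteCounts, countBytes and toMap are hypothetical. The toMap step is optional and only useful if other code still expects the original Map<Byte, Integer>; it boxes at most 256 entries once, after counting, rather than boxing on every input byte.

```java
import java.util.HashMap;
import java.util.Map;

public class ByteCounts {
    // Count occurrences of each byte value in a single pass, with no boxing.
    public static int[] countBytes(byte[] data) {
        int[] counts = new int[256];
        for (byte b : data) {
            // Mask to 0..255 so negative bytes map to valid indexes.
            counts[b & 0xff]++;
        }
        return counts;
    }

    // Optional: build a Map only for values that actually occurred.
    public static Map<Byte, Integer> toMap(int[] counts) {
        Map<Byte, Integer> map = new HashMap<>();
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] != 0) {
                // Casting back to byte restores the signed value, e.g. 255 -> -1.
                map.put((byte) i, counts[i]);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        byte[] data = {5, -1, 5, 0, -1, -1};
        int[] counts = countBytes(data);
        // Look a value up with the same mask used when counting.
        byte value = -1;
        System.out.println(counts[value & 0xff]);
        System.out.println(toMap(counts));
    }
}
```

The key point for lookups is to apply the same & 0xff mask when reading a count back, otherwise a negative byte would produce an invalid index.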

Note that the & 0xff is needed because Java's byte type is signed: it maps each value to the range [0, 255] instead of [-128, 127], so the result is a valid index into the array.
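To see why the mask matters: in b & 0xff the byte is first promoted to an int with sign extension, and masking with 0xff then keeps only the low 8 bits. A small demonstration, with a hypothetical class and helper name:

```java
public class ByteMaskDemo {
    // Hypothetical helper: convert a signed byte to an array index in 0..255.
    public static int toIndex(byte b) {
        return b & 0xff;
    }

    public static void main(String[] args) {
        byte b = (byte) 200;                  // stored as -56 because Java bytes are signed
        System.out.println(b);                // prints -56
        System.out.println(toIndex(b));       // prints 200: only the low 8 bits survive the mask
        int[] counts = new int[256];
        counts[toIndex(b)]++;                 // valid index; counts[b] would throw ArrayIndexOutOfBoundsException
        System.out.println(counts[200]);      // prints 1
    }
}
```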


See more on this question at Stack Overflow