2D Array vs Array of Arrays: tight loop performance in C#

I had a look and couldn't see anything quite answering my question.

I'm not exactly the best at creating accurate 'real life' tests, so I'm not sure if that's the problem here. Basically, I want to create a few simple neural networks for something along the lines of Gridworld. Performance of these neural networks will be critical, and I don't want the hidden layer to become a bottleneck any more than it has to.

I would rather use more memory and be faster, so I opted for arrays instead of lists (lists perform an extra bounds check on top of the underlying array access). The arrays aren't always full, but because the if statement (checking whether an element is null) branches the same way until the end, it's predicted well and there's no performance drop from it at all.

My question comes from how I store the data for the network to process. I figured that because a 2D array stores all of its data contiguously, it would be more cache friendly and would run faster. But in my mock-up test, an array of arrays performs much better in this scenario.

Some Code:

    private void RunArrayOfArrayTest(float[][] testArray, Data[] data)
    {
        for (int i = 0; i < testArray.Length; i++) {
            for (int j = 0; j < testArray[i].Length; j++) {
                var inputTotal = data[i].bias;

                for (int k = 0; k < data[i].weights.Length; k++) {
                    inputTotal += testArray[i][k];
                }
            }
        }
    }

    private void Run2DArrayTest(float[,] testArray, Data[] data, int maxI, int maxJ)
    {
        for (int i = 0; i < maxI; i++) {
            for (int j = 0; j < maxJ; j++) {
                var inputTotal = data[i].bias;

                for (int k = 0; k < maxJ; k++) {
                    inputTotal += testArray[i, k];
                }
            }
        }
    }

These are the two functions that are timed. Each 'creature' has its own network (the first for loop), each network has hidden nodes (the second for loop), and I need to find the sum of the weights for each input (the third loop). In my test I stripped it down, so it's not exactly what I'm doing in my actual code, but the same number of loop iterations happen (the data variable would have its own 2D array, but I didn't want to risk skewing the results). From this I was trying to get a feel for which one is faster, and to my surprise it was the array of arrays.

Code to start the tests:

        // Array of Array test
        Stopwatch timer = Stopwatch.StartNew();

        RunArrayOfArrayTest(arrayOfArrays, dataArrays);

        timer.Stop();
        Console.WriteLine("Array of Arrays finished in: " + timer.ElapsedTicks);

        // 2D Array test
        timer = Stopwatch.StartNew();

        Run2DArrayTest(array2D, dataArrays, NumberOfNetworks, NumberOfInputNeurons);

        timer.Stop();
        Console.WriteLine("2D Array finished in: " + timer.ElapsedTicks);

Just wanted to show how I was testing it. The results from this in release mode give me values like:

Array of Arrays finished in: 8972
2D Array finished in: 16376

Can someone explain what I'm doing wrong? Why is the array of arrays so much faster in this situation? Isn't a 2D array stored contiguously, which should make it more cache friendly?

Note that I really do need this to be fast, as it needs to sum up hundreds of thousands to millions of numbers per frame, and like I said, I don't want this to be a problem. I know it can easily be multithreaded in the future, because each network is completely separate and even each node is completely separate.
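
For example, the per-network split could look something like the sketch below. It just mirrors the stripped-down test method above with Parallel.For over the outer loop (untested, shown only to illustrate the idea):

    private void RunArrayOfArrayTestParallel(float[][] testArray, Data[] data)
    {
        // Each network (row) is independent, so the outer loop can be parallelised.
        // Parallel.For lives in System.Threading.Tasks.
        Parallel.For(0, testArray.Length, i => {
            for (int j = 0; j < testArray[i].Length; j++) {
                var inputTotal = data[i].bias;

                for (int k = 0; k < data[i].weights.Length; k++) {
                    inputTotal += testArray[i][k];
                }
            }
        });
    }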

Last question, I suppose: would it be possible to run something like this on the GPU instead? I figure a GPU wouldn't struggle with much larger numbers of networks and much larger numbers of input/hidden neurons.

Jon Skeet

In the CLR, there are two different types of array:

  • Vectors, which are zero-based, single-dimensional arrays
  • Arrays, which can have non-zero bases and multiple dimensions

Your "array of arrays" is a "vector of vectors" in CLR terms.

Vectors are significantly faster than arrays, basically. It's possible that arrays could be optimized further in later CLR versions, but I doubt they'll get the same amount of love as vectors, as they're used relatively rarely. There's not a lot you can do to make CLR arrays faster. As you say, they'll be more cache friendly, but they carry this CLR penalty.

You can improve your array-of-arrays code already, however, by only performing the first indexing operation once per row:

    private void RunArrayOfArrayTest(float[][] testArray, Data[] data)
    {
        for (int i = 0; i < testArray.Length; i++) {

            // These don't change in the loop below, so extract them
            var row = testArray[i];
            var inputTotal = data[i].bias;
            var weightLength = data[i].weights.Length;
            for (int j = 0; j < row.Length; j++) {
                for (int k = 0; k < weightLength; k++) {
                    inputTotal += row[k];
                }
            }
        }
    }

If you want to get the cache friendliness and still use a vector, you could have a single float[] and perform the indexing yourself... but I'd probably start off with the array-of-arrays approach.
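
A rough sketch of that flattened approach, just to show the manual indexing (the method and its parameters mirror the 2D test from the question; treat it as illustrative rather than tuned code):

    private void RunFlattenedArrayTest(float[] testArray, Data[] data, int maxI, int maxJ)
    {
        for (int i = 0; i < maxI; i++) {
            // Row i occupies testArray[i * maxJ .. i * maxJ + maxJ - 1] in the single vector.
            int rowStart = i * maxJ;
            var inputTotal = data[i].bias;
            var weightLength = data[i].weights.Length;

            for (int j = 0; j < maxJ; j++) {
                for (int k = 0; k < weightLength; k++) {
                    inputTotal += testArray[rowStart + k];
                }
            }
        }
    }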

See more on this question at Stackoverflow