Why is it faster to process sorted array than an unsorted array?

In this blog post, we learn why is it faster to process sorted array than an unsorted array? We will see a C++ code to check the performance of the sorted and unsorted array. In C++, it is faster to process a sorted array than an unsorted array because of branch prediction.

Here is a C++ code that illustrates that sorting the data miraculously makes the code faster than the unsorted version. Let’s try out a sample C++ program to understand the problem statement better.

Unsorted Array:

Here we are creating an unsorted array and analyzing the processing time.

#include <algorithm>
#include <ctime>
#include <iostream>

int main()
{
    // Generate data
    const unsigned arraySize = 32768;
    int data[arraySize];

    for (unsigned c = 0; c < arraySize; ++c)
    {
        data[c] = std::rand() % 256;
    }


    // Test timing
    clock_t start = clock();
    long long sum = 0;

    for (unsigned i = 0; i < 100000; ++i)
    {
        // Primary loop
        for (unsigned c = 0; c < arraySize; ++c)
        {
            if (data[c] >= 128)
                sum += data[c];
        }
    }

    double elapsedTime = static_cast<double>(clock() - start) / CLOCKS_PER_SEC;

    std::cout << elapsedTime << std::endl;
    std::cout << "sum = " << sum << std::endl;

    return 0;
}

Output:

Sorted Array:

Now we are sorting the array using the sort function and analyzing the processing time of sorted array.

#include <algorithm>
#include <ctime>
#include <iostream>

int main()
{
    // Generate data
    const unsigned arraySize = 32768;
    int data[arraySize];

    for (unsigned c = 0; c < arraySize; ++c)
    {
        data[c] = std::rand() % 256;
    }

    //Sorting the array
    std::sort(data, data + arraySize);

    // Test timing
    clock_t start = clock();
    long long sum = 0;

    for (unsigned i = 0; i < 100000; ++i)
    {
        // Primary loop
        for (unsigned c = 0; c < arraySize; ++c)
        {
            if (data[c] >= 128)
                sum += data[c];
        }
    }

    double elapsedTime = static_cast<double>(clock() - start) / CLOCKS_PER_SEC;

    std::cout << elapsedTime << std::endl;
    std::cout << "sum = " << sum << std::endl;

    return 0;
}

Output:

Observe that time taken for processing a sorted array is less as compared to the unsorted array. The reason for this optimization for the sorted array is branch prediction.

What is branch prediction?

In computer architecture, branch prediction means determining whether a conditional branch(jump) in the instruction flow of a program is likely to be taken or not. All the pipelined processors do branch prediction in some form because they must guess the address of the next instruction to fetch before the current instruction has been executed.

Why is processing a sorted array faster than an unsorted array?

Let consider the above-mentioned example where sorted array processing is faster in comparison to the unsorted array.

if (data[c] >= 128)
    sum += data[c];

Case 1: Sorted Array

Notice that the data is evenly distributed between 0 and 255. When the data is sorted, roughly the first half of the iterations will not enter the if-statement. After that, they will all enter the if-statement.

This is very friendly to the branch predictor since the branch consecutively goes the same direction many times. Even a simple saturating counter will correctly predict the branch except for the few iterations after it switches direction.

Quick visualization:

T = branch taken
N = branch not taken

data[] = 0, 1, 2, 3, 4, ... 126, 127, 128, 129, 130, ... 250, 251, 252, ...
branch = N  N  N  N  N  ...   N    N    T    T    T  ...   T    T    T  ...

       = NNNNNNNNNNNN ... NNNNNNNTTTTTTTTT ... TTTTTTTTTT  (easy to predict)

Case 2: Unsorted Array

However, when the data is completely random, the branch predictor is rendered useless, because it can’t predict random data. Thus there will probably be around 50% misprediction (no better than random guessing).

A branch prediction works on the pattern the algorithm is following or basically the history, how it got executed in previous steps. If the guess is correct, then CPU continues executing and if it goes wrong, then CPU needs to flush the pipeline and roll back to the branch and restart from the beginning.

data[] = 226, 185, 125, 158, 198, 144, 217, 79, 202, 118,  14, 150, 177, 182, 133, ...
branch =   T,   T,   N,   T,   T,   T,   T,  N,   T,   N,   N,   T,   T,   T,   N  ...

       = TTNTTTTNTNNTTTN ...   (completely random - hard to predict)

How to increase the performance of the unsorted array?

If the compiler isn’t able to optimize the branch into a conditional move, you can try some hacks if you are willing to sacrifice readability for performance.

So let see an example,

If in the above code we remove the if condition with some hack statement, it definitely increases the performance.

if (data[c] >= 128)
    sum += data[c];
  
  
  
 Replace With
    ||
    \/
    
    
int t = (data[c] - 128) >> 31;
sum += ~t & data[c];

Now let see the performance of the above changes with unsorted array on the same platform.

#include <algorithm>
#include <ctime>
#include <iostream>

int main()
{
    // Generate data
    const unsigned arraySize = 32768;
    int data[arraySize];

    for (unsigned c = 0; c < arraySize; ++c)
    {
        data[c] = std::rand() % 256;
    }

    // Test timing
    clock_t start = clock();
    long long sum = 0;

    for (unsigned i = 0; i < 100000; ++i)
    {
        // Primary loop
        for (unsigned c = 0; c < arraySize; ++c)
        {
            int t = (data[c] - 128) >> 31;
            sum += ~t & data[c];
        }
    }

    double elapsedTime = static_cast<double>(clock() - start) / CLOCKS_PER_SEC;

    std::cout << elapsedTime << std::endl;
    std::cout << "sum = " << sum << std::endl;

    return 0;
}

Output:

Note: This hack is not strictly equivalent to the original if-statement And the performance of the code could be different on different platforms.

Unsorted Array:

Sorted Array:

What is branch prediction?

Why is processing a sorted array faster than an unsorted array?

How to increase the performance of the unsorted array?

Recommended Articles for you: