Calculating prime numbers on all your cores

I was playing a bit with OpenMP in GCC and decided to try something fun, like calculating the first N prime numbers.

You know: numbers greater than one that are only divisible by one and themselves. So 2 is prime. It is also the only even prime, because every other even number is divisible by two by definition.

A simple way to find out whether a number is prime is to check it against every smaller prime number. If it is divisible by any of them, the number is not prime.

Let's start coding
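To make the rule concrete before any threads get involved, here is a minimal single-threaded sketch of it. The function names isPrimeByTrialDivision and firstPrimes are made up for this illustration, and the sketch assumes the list of known primes contains every prime smaller than the candidate.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if `candidate` is not divisible by any prime in `knownPrimes`.
// Assumption: `knownPrimes` holds every prime smaller than `candidate`.
bool isPrimeByTrialDivision(uint64_t candidate, const std::vector<uint64_t>& knownPrimes)
{
    if (candidate < 2) {
        return false;
    }
    for (uint64_t p : knownPrimes) {
        if (candidate % p == 0) {
            return false;   // found a divisor, so the candidate is not prime
        }
    }
    return true;
}

// Collects the first `n` primes by testing 2, 3, 4, ... in order.
std::vector<uint64_t> firstPrimes(std::size_t n)
{
    std::vector<uint64_t> primes;
    primes.reserve(n);
    for (uint64_t i = 2; primes.size() < n; i++) {
        if (isPrimeByTrialDivision(i, primes)) {
            primes.push_back(i);
        }
    }
    return primes;
}
```

Calling firstPrimes(5) should give 2, 3, 5, 7 and 11.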

#include <cstdint>
#include <iostream>

#define NUMBER_OF_PRIMES 100000

int main()
{
    uint64_t primes[NUMBER_OF_PRIMES];
    uint32_t count = 0;

    for(uint64_t i = 2; i < UINT64_MAX && count < NUMBER_OF_PRIMES; i++)
    {
        bool isPrime = true;

        #pragma omp parallel for
        for (uint32_t x = 0; x < count; x++) {
            // An OpenMP loop may not change its own counter or use break,
            // so once a divisor is known the remaining iterations are
            // skipped with a cheap check of the shared flag.
            bool stillPrime;
            #pragma omp atomic read
            stillPrime = isPrime;
            if(!stillPrime) {
                continue;
            }

            if(i % primes[x] == 0) {
                #pragma omp atomic write
                isPrime = false;
            }
        }

        if(isPrime) {
            primes[count++] = i;
            std::cout << "Prime #" << count << ": " << i << std::endl;
        }
    }

    return 0;
}

As you can see, I have a few #pragma omp directives in my code.

#pragma omp parallel for tells the compiler to generate code that runs the following for loop on multiple cores, dividing the iterations up in chunks.

#pragma omp atomic read and #pragma omp atomic write tell the compiler that the following load or store of the shared isPrime flag must happen atomically, so one thread can safely check the flag while another thread sets it.
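As a side note, when the threads only need to combine their results at the end, OpenMP can manage the shared variable itself through a reduction clause. The sketch below is not part of my program; countPrimeDivisors is a made-up helper that counts how many numbers in a list divide n. Each thread gets a private copy of hits, and OpenMP adds the copies together when the loop ends, so no extra synchronization is needed in the loop body.

```cpp
#include <cstdint>
#include <vector>

// Counts how many entries of `primes` divide `n` (hypothetical helper).
uint32_t countPrimeDivisors(uint64_t n, const std::vector<uint64_t>& primes)
{
    uint32_t hits = 0;
    const int64_t size = static_cast<int64_t>(primes.size());

    // Every thread increments its own private `hits`;
    // the private copies are summed after the loop.
    #pragma omp parallel for reduction(+: hits)
    for (int64_t x = 0; x < size; x++) {
        if (n % primes[x] == 0) {
            hits++;
        }
    }
    return hits;
}
```

Without -fopenmp the pragma is ignored and the same function simply runs on one thread.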

Now it is compile time!!!

$ g++ -Ofast main.cpp

Note that this command does not enable OpenMP, so the pragmas are ignored and the program runs on a single thread. That gives me a baseline to benchmark against. So I run it with:

$ time ./a.out 
... 
... 
... 
real    0m31,409s 
user    0m31,324s 
sys     0m0,080s

It maxes out one core and takes about 30 seconds to complete on my system.

Let's try again with OpenMP enabled

$ g++ -Ofast -fopenmp main.cpp

$ time ./a.out
...
...
...
real    0m9,687s
user    1m50,757s
sys     0m1,008s

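That’s quite an improvement. It went from about 31 seconds to just under 10 seconds. But my computer has 12 cores and they were all 100% busy at the time: the user time went up from 31 seconds to 1 minute 50, so the parallel run burned roughly 3.5 times as much CPU time in total. So it’s not running efficiently in this example.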
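To put a number on "not efficiently", the usual measures are speedup (serial time divided by parallel time) and efficiency (speedup divided by the number of cores). A small sketch, with function names chosen just for this example:

```cpp
// Speedup: how many times faster the parallel run finished.
double speedup(double serialSeconds, double parallelSeconds)
{
    return serialSeconds / parallelSeconds;
}

// Efficiency: the fraction of the ideal speedup (one per core) that was reached.
double efficiency(double serialSeconds, double parallelSeconds, int cores)
{
    return speedup(serialSeconds, parallelSeconds) / cores;
}
```

With the wall-clock times above, speedup(31.409, 9.687) comes out at about 3.2, and efficiency(31.409, 9.687, 12) at about 0.27. In other words, the 12 cores delivered roughly a quarter of what they could in the ideal case.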

I am sure there are some optimizations possible, and that there are probably way better use cases for OpenMP.
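One direction that might work better, sketched here without having benchmarked it: give each thread whole numbers to test instead of splitting up the divisor checks for a single number, and stop dividing once the divisor passes the square root of the candidate. Note that this answers a slightly different question (how many primes exist up to a limit, rather than the first N primes), and that isPrimeNumber and countPrimesUpTo are names invented for this sketch.

```cpp
#include <cstdint>

// Trial division by 2 and then by odd numbers up to sqrt(n).
bool isPrimeNumber(uint64_t n)
{
    if (n < 2) {
        return false;
    }
    if (n % 2 == 0) {
        return n == 2;
    }
    // d <= n / d is the same as d * d <= n, but cannot overflow.
    for (uint64_t d = 3; d <= n / d; d += 2) {
        if (n % d == 0) {
            return false;
        }
    }
    return true;
}

// Counts the primes in the range [2, limit].
uint64_t countPrimesUpTo(uint64_t limit)
{
    uint64_t count = 0;

    // Candidates are independent of each other, so the threads never
    // have to talk until the private counters are summed at the end.
    // schedule(dynamic) hands out blocks on demand, because large
    // candidates take longer to test than small ones.
    #pragma omp parallel for schedule(dynamic, 1024) reduction(+: count)
    for (uint64_t n = 2; n <= limit; n++) {
        if (isPrimeNumber(n)) {
            count++;
        }
    }
    return count;
}
```

For example, countPrimesUpTo(100) should return 25. The block size of 1024 is a guess and would need tuning on real hardware.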