Multithreaded programming offers many benefits - speed, efficiency, and the ability to harness every core that your processor has to offer. But of course, there is a catch - shared memory. When two or more threads try to access the same data, you enter a domain where unpredictability becomes a real concern. In such a scenario, the timing of thread execution can dramatically influence the outcome.

This is where mutexes come into play. A mutex provides a way to manage shared resources such that only one thread can access a given resource at a time, thus preventing undesirable side-effects.

The Problem - Race Conditions

A race condition is a bug where the outcome of a program depends on the unpredictable timing of two or more threads accessing a shared resource. In simple words, when multiple threads try to modify the same chunk of memory without proper coordination, the outcome is unpredictable.

Let us take the following example:

#include <stdio.h>
#include <pthread.h>

int count = 0;

void* increment_count(void* arg)
{
    (void)arg;  /* unused */

    for (int i = 0; i < 1000000; i++)
    {
        count += 1;
    }

    return NULL;
}

int main(int argc, char* argv[])
{
    pthread_t thread_1, thread_2;

    pthread_create(&thread_1, NULL, &increment_count, NULL);
    pthread_create(&thread_2, NULL, &increment_count, NULL);

    pthread_join(thread_1, NULL);
    pthread_join(thread_2, NULL);

    printf("%d\n", count);

    return 0;
}

Here, we would logically expect the value of count to be 2,000,000. In practice, however, the program usually prints some arbitrary value between 1,000,000 and 2,000,000, and the result changes from run to run.

But why do these race conditions arise? The key issue is that operations we think of as “single step” are actually multiple steps at the CPU level.

Take the simple count += 1 operation. While it may seem like an atomic operation, at the CPU level, it is actually performed in three different steps:

  1. Read the current value of count from memory into a register
  2. Increment the value of the register
  3. Write the new value back into memory.

When two threads execute this operation simultaneously, their operations can overlap in ways that can cause non-deterministic behaviour. We can see one such instance as follows:

  • Thread 1 reads count - let us assume it to be 1000.
  • Thread 2 reads count - which is still 1000, because Thread 1 hasn’t incremented and written back yet.
  • Thread 1 increments, getting 1001.
  • Thread 2 increments, getting 1001.
  • Thread 1 writes back 1001 to memory.
  • Thread 2 also writes back 1001 to memory.

Both threads are individually behaving as expected, but the combined outcome is 1001 instead of 1002. This is called a lost update.

A race condition refers to this “race” between the threads over which one completes the read-modify-write sequence before another thread interferes.

The Solution - Mutex

So, if race conditions arise due to multiple threads racing to modify the same resource, why not just coordinate the threads such that only one of them can modify the resource at a given time? Well, mutexes are one such mechanism used to coordinate threads.

Mutexes or mutex locks (short for mutual exclusion locks) are a synchronization mechanism used to prevent multiple threads from accessing a shared resource at the same time.

Our previous program would look something like this if we implemented mutexes:

#include <stdio.h>
#include <pthread.h>

int count = 0;
pthread_mutex_t mutex;

void* increment_count(void* arg)
{
    (void)arg;  /* unused */

    for (int i = 0; i < 1000000; i++)
    {
        pthread_mutex_lock(&mutex);
        count += 1;
        pthread_mutex_unlock(&mutex);
    }

    return NULL;
}

int main(int argc, char* argv[])
{
    pthread_t thread_1, thread_2;

    pthread_mutex_init(&mutex, NULL);

    pthread_create(&thread_1, NULL, &increment_count, NULL);
    pthread_create(&thread_2, NULL, &increment_count, NULL);

    pthread_join(thread_1, NULL);
    pthread_join(thread_2, NULL);

    pthread_mutex_destroy(&mutex);

    printf("%d\n", count);

    return 0;
}

Now, we consistently get the expected output, i.e. 2,000,000. Once a thread locks the mutex, no other thread can proceed past the lock until that thread unlocks it, so only one thread operates on count at any given time. While this introduces a minor performance cost, it fulfills the goal of preserving determinism.

At the most fundamental level, a mutex can be thought of as an integer value (0 or 1) that tells a thread whether the lock has been acquired. If a thread reads the mutex as 1, it will continue to “spin” - repeatedly retry - until it becomes 0. If it reads the mutex as 0, it sets it to 1, locking the mutex, and continues with its operations. (Real-world implementations typically put waiting threads to sleep rather than letting them spin indefinitely, but the spinning model captures the core idea.)

Now one might naturally wonder, since a mutex also exists in shared memory, what if two threads happen to try to lock a mutex at the same time? Wouldn’t this lead to a race condition? Well, to answer that question, we would have to descend to a much lower level, closer to the CPU.

Atomic Operations

Mutexes are built on top of low-level atomic operations provided by the hardware, like test-and-set, compare-and-swap, etc. These instructions, unlike the previously mentioned increment instruction, are atomic i.e. they are executed as a single indivisible step. Because of this, when one thread tries to acquire a lock, no other thread can interrupt the process, leading to the elimination of race conditions.

When Mutexes Fall Short

While mutexes are a crucial thread synchronization method, they aren’t always the best choice. Mutexes can introduce performance bottlenecks, especially in scenarios where multiple threads are competing for the same lock. Moreover, mutexes struggle in scenarios where you need more nuanced control over resource access. In cases like these, more advanced synchronization primitives - such as semaphores and condition variables (condvars) - are used instead of mutexes.

In the following part of this series, we will explore how semaphores can address some of the limitations of mutexes, offering better performance in certain use cases.