On Thread Synchronization : Part 2 - An overview of semaphores

In the previous post of this series, we explored what mutexes are and how they solve the problem of race conditions at a fundamental level by ensuring mutual exclusion. While mutexes are incredibly powerful and have several uses, they possess the limitation of being inherently binary - a resource is either locked or it isn’t. But what would happen when you require multiple threads to access a single resource? In the case of mutexes, all of those threads compete for the same lock, which can introduce performance bottlenecks.

In this case, we have to resort to a more advanced synchronization primitive called a semaphore.

The Problem - Producer and Consumer

Consider the following scenario - you have one or more “producer” threads inserting some data into a buffer, and one or more “consumer” threads removing some data from that same buffer. You need to co-ordinate all the threads such that:

The producer threads do not add data to the buffer when it is full
The consumer threads do not attempt to remove data from the buffer when it is empty
The buffer itself is not corrupted by simultaneous access

This is a classic scenario called the Producer-Consumer problem. It can be solved using just mutexes in a naive manner, as follows:

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>

#define BUFFER_SIZE 5

int buffer[BUFFER_SIZE];
int count = 0;
pthread_mutex_t mutex;

void* producer(void* arg)
{
    int item;
    for (int i = 0; i < 10; i++)
    {
        item = rand() % 100;

        pthread_mutex_lock(&mutex);
        
        // Busy waiting for buffer to not be full
        while (count == BUFFER_SIZE)
        {
            pthread_mutex_unlock(&mutex);
            usleep(1000);
            pthread_mutex_lock(&mutex);
        }

        buffer[count] = item;
        printf("Produced %d at slot %d\n", item, i);
        count++;

        pthread_mutex_unlock(&mutex);
        usleep(100000);
    }
    return NULL;
} 

void* consumer(void* arg)
{
    for (int i = 0; i < 10; i++)
    {
        pthread_mutex_lock(&mutex);
        
        // Busy waiting for buffer to not be empty
        while (count == 0)
        {
            pthread_mutex_unlock(&mutex);
            usleep(1000);
            pthread_mutex_lock(&mutex);
        }

        int item = buffer[count - 1];
        printf("Consumed %d from slot %d\n", item, i);
        count--;

        pthread_mutex_unlock(&mutex);
        usleep(150000);
    }
    return NULL;
}

int main(int argc, char* argv[])
{
    pthread_t prod1, prod2, prod3; // Producer threads 
    pthread_t cons1, cons2, cons3; // Consumer threads

    pthread_mutex_init(&mutex, NULL);

    pthread_create(&prod1, NULL, &producer, NULL);
    pthread_create(&prod2, NULL, &producer, NULL);
    pthread_create(&prod3, NULL, &producer, NULL);

    pthread_create(&cons1, NULL, &consumer, NULL);
    pthread_create(&cons2, NULL, &consumer, NULL);
    pthread_create(&cons3, NULL, &consumer, NULL);

    pthread_join(prod1, NULL);
    pthread_join(prod2, NULL);
    pthread_join(prod3, NULL);

    pthread_join(cons1, NULL);
    pthread_join(cons2, NULL);
    pthread_join(cons3, NULL);

    pthread_mutex_destroy(&mutex);

    return 0;
}

In this example, we have three producer threads and three consumer threads. The producer threads insert a random value into the buffer, which is an array with 5 elements. Meanwhile, the consumer threads read the values in the buffer on a first in-first out basis and prints them to the console. A mutex lock ensures that only one thread can access the buffer at a single time.

The problem with using mutexes in this scenario is that the mutex doesn’t tell me exactly when the buffer is empty or full. When I encounter a situation when I cannot operate on the buffer, like when it is full for the producer or empty for the consumer,the most optimum path forward is to sleep for an arbitrary amount of time and recheck if my condition has been satisfied. Essentially, a mutex cannot say “block this thread until there is space in the buffer” or “wake this thread when an item becomes available”. This leads to needless busy-waiting scenarios which in turn introduce bottlenecks.

The Solution - Semaphore

A semaphore is a synchronization primitive that maintains a numerical value and provides two atomic operations: wait (also known as P) and signal (also known as V or post). They were developed by legendary mathematician Edsgar Djikstra. Unlike a mutex, a semaphore is non-binary and can actually “count” in a sense.

At its very core, a semaphore represents a count of available resources, in the sense that the integer value represents how many resources are available. When a thread wants to use a resource, it performs a wait operation. When the thread is finished with that resource, it performs a signal operation.

The two operations of a semaphore work as follows:

wait (P):

If the semaphore value is greater than 0, decrement it atomically
If the semaphore value is 0, block the thread until it becomes greater than 0

signal (V):

Increment the semaphore value atomically
If any of the threads are blocked waiting, wake one of them up

Types of semaphores

Binary semaphores: Binary semaphores have either a value of 0 or 1, hence they function essentially like a mutex lock. However, a key difference between a mutex lock and a binary semaphore is that any thread can signal a binary semaphore, unlike a mutex lock, which can only be unlocked by the thread that acquired it.
Counting semaphores: Unlike binary semaphores, counting semaphores can have any non-negative integer value. Hence, they can be used to represent multiple instances of a resource.

Revisting the producer-consumer problem, this time we solve it using two counting semaphores and a mutex lock:

#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>
#include <unistd.h>

#define BUFFER_SIZE 5

int buffer[BUFFER_SIZE];
int in = 0;
int out = 0;

sem_t empty_slots; // Semaphore to count empty slots
sem_t full_slots;  // Semaphore to count full slots
pthread_mutex_t mutex;

void* producer(void* arg)
{
    int item;
    for (int i = 0; i < 10; i++)
    {
        item = rand() % 100;

        sem_wait(&empty_slots); // Producer "waits" on empty_slots
        pthread_mutex_lock(&mutex);

        buffer[in] = item;
        printf("Produced: %d at position %d\n", item, in);
        in = (in + 1) % BUFFER_SIZE;

        pthread_mutex_unlock(&mutex);
        sem_post(&full_slots); // Producer "signals" full_slots

        usleep(100000);
    }
    return NULL;
}

void* consumer(void* arg)
{
    int item;
    for (int i = 0; i < 10; i++)
    {
        sem_wait(&full_slots); // Consumer "waits" on full_slots
        pthread_mutex_lock(&mutex);

        item = buffer[out];
        printf("Consumed: %d from position %d\n", item, out);
        out = (out + 1) % BUFFER_SIZE;

        pthread_mutex_unlock(&mutex);
        sem_post(&empty_slots); // Consumer "signals" empty_slots

        usleep(150000);
    }
    return NULL;
}

int main(int argc, char* argv[])
{
    pthread_t prod1, prod2, prod3;
    pthread_t cons1, cons2, cons3;

    sem_init(&empty_slots, 0, BUFFER_SIZE);
    sem_init(&full_slots, 0, 0);

    pthread_mutex_init(&mutex, NULL);

    pthread_create(&prod1, NULL, &producer, NULL);
    pthread_create(&prod2, NULL, &producer, NULL);
    pthread_create(&prod3, NULL, &producer, NULL);

    pthread_create(&cons1, NULL, &consumer, NULL);
    pthread_create(&cons2, NULL, &consumer, NULL);
    pthread_create(&cons3, NULL, &consumer, NULL);

    pthread_join(prod1, NULL);
    pthread_join(prod2, NULL);
    pthread_join(prod3, NULL);

    pthread_join(cons1, NULL);
    pthread_join(cons2, NULL);
    pthread_join(cons3, NULL);

    pthread_mutex_destroy(&mutex);

    sem_destroy(&empty_slots);
    sem_destroy(&full_slots);

    return 0;
}

In this example, the empty_slots semaphore tracks how many empty positions are available in the buffer, while the full_slots semaphore tracks how many items in the buffer are ready to be consumed. The producer threads wait on empty_slots before producing, which essentially entails that if empty_slots > 0, the producer thread can produce the value, and if empty_slots = 0, the producer thread will get blocked until an empty slot becomes available. Conversely, the consumer threads wait on full_slots before consuming from the buffer, which essentially entails that if full_slots > 0, the consumer thread can consume the value, however if full_slots = 0, the consumer thread will get blocked until a consumable value becomes available in the buffer. A mutex is still used to ensure that only one thread actually modifies the buffer at a time.

In summary, at the most fundamental level, a semaphore contains of an integer value and a queue of waiting threads. When sem_wait() is called, the system atomically checks if the value is greater than 0. If yes, then it decrements the value and the thread continues. If no, then the thread is added to the queue of waiting threads and blocked. When sem_post() is called, the system atomically increases the value and wakes up a waiting thread.

Just like mutexes, semaphores also rely on atomic operations at a hardware level to prevent race conditions on the semaphore value itself.

When semaphores fall short

While semaphores are much more flexible and efficient than mutexes in some cases, they are not without their shortcomings.

One major issue is that semaphores are purely count-based. They can tell you “there are n resources available” but they cannot be used to implement complex conditions such as “wake the thread up when the buffer is half-full”. For such cases, you would need to manually track the state of the buffer and use multiple semaphores, which can quickly become needlessly complex and increasingly prone to errors.

Semaphores are also blurry when it comes to the concept of ownership. While a mutex knows which thread has locked it, and can be only unlocked by that thread, a semaphore does not care which thread performs wait or signal operations. Because of this, one must be especially careful when using semaphores, as debugging them can be difficult. A common bug occurs when a thread signals a semaphore even when it hasn’t waited on it, leading to over-signalling bugs where more threads enter a critical section than intended.

In scenarios where you need complex conditions rather than simply counting resources, another synchronization primitive called conditional variables provide a more intuitive and efficient solution. In the following part of the series, we will have a look at condvars, and how they enable threads to efficiently wait for certain conditions.

The Problem - Producer and Consumer¶

The Solution - Semaphore¶

Types of semaphores¶

When semaphores fall short¶

The Problem - Producer and Consumer

The Solution - Semaphore

Types of semaphores

When semaphores fall short