Since the past few months, I seem to have caught a bug named low-level programming. It started with C++, then I moved on to C. But before long, not even C could scratch the itch of getting ever closer to bare metal. And that is how I ended up here - writing code in pure x86 assembly.

Memory management is a key element of low level programming - yet most developers are, for some reason, irrationally afraid of it. I, on the other hand, am incredibly fascinated by the idea of having the power to manually allocate and deallocate memory at my will.

Most developers take malloc and free for granted - they’re just there, doing their job in the background. But what actually happens when you allocate memory? This question is what drove me to strip away all the abstractions and build a minimal allocator from scratch.

The Theory

At its core, this allocator relies on two internal Linux system calls:

mmap2 - Memory Mapping

mmap2 is the syscall which is used to request memory pages from the kernel. Essentially, you are asking the OS to give your process a specified region of virtual memory. Here is the parameters it takes in:

  • Address: Where you want the memory (pass 0 to leave it to the kernel)
  • Length: What is the size of memory required
  • Protection: What operations is the program allowed to do on this memory (e.g. read, write, etc)
  • Mapping: How visible is the memory i.e. private to this process, anonymous, etc
  • File Descriptor: Which file to map (set to -1 for anonymous memory)
  • Page Offset: Offset in the file (must be 0 for anonymous memory)

For our implementation, we use PROT_READ | PROT_WRITE with MAP_PRIVATE | MAP_ANONYMOUS, which gives us private read-write memory that isn’t backed by any file - exactly what we need for malloc.

munmap - Memory Unmapping

munmap is the inverse operation of mmap2. It tells the OS kernel to deallocate a chunk of memory once we are done using it. Here is the parameters it takes in:

  • Address: The starting address of the memory region
  • Length: The size of the memory region The kernel then marks those pages as “freed”, which means that those regions of memory can be reallocated to other programs.

The Interface Layer

The big idea behind this project is simple - create wrapper functions that interface with these system calls. Instead of the programmer having to fiddle with mmap2 flags and protection, he will simply call malloc(size) and free(ptr). Another key addition in this implementation is the header. The header is 8 bytes of memory prefixing the allocated memory block - the first four bytes contains the metadata i.e. the size of the memory block, and the next four bytes contains a magic number that is used for validation. This is useful because the munmap syscall requires the length of the memory to be freed, and instead of keeping track of this as a separate variable, we can prefix this in the header itself.

The Implementation

To start off, we declare some static variables as follows:

section .data
    MAGIC          equ 0xDEADC0DE
    HEADER_SIZE    equ 8

    SYSCALL_mmap2  equ 192
    SYSCALL_munmap equ 91

    PROT_READ      equ 1
    PROT_WRITE     equ 2
    PROT_RW        equ (PROT_READ | PROT_WRITE)

    MAP_PRIVATE    equ 0x02
    MAP_ANONYMOUS  equ 0x20
    MAP_FLAGS      equ (MAP_PRIVATE | MAP_ANONYMOUS)

Some of these are self explanatory - like MAP_PRIVATE and PROT_READ. The others, we will get into very soon.

The allocator - malloc

Our malloc function takes in one parameter over the stack - which is the size of memory to be allocated. As an output, it returns the pointer to the beginning of the memory block in the eax register.

malloc:
    push ebp                    ; Prologue
    mov ebp, esp

    mov eax, [ebp+8]            ; Load eax with the input param
    test eax, eax               ; Check if requested memory size is 0
    jz .malloc_zero

    add eax, HEADER_SIZE        ; Allocate space for the header
    mov ecx, eax                ; ecx contains length of memory (for mmap2)

    xor ebx, ebx                ; ebx contains address of memory (for mmap2)
    mov edx, PROT_RW            ; edx contains protection flags (for mmap2)
    mov esi, MAP_FLAGS          ; esi contains mapping flags (for mmap2)
    mov edi, -1                 ; edi contains file descriptor (for mmap2)

    push ebp
    xor ebp, ebp                ; ebp contains page offset (for mmap2)

    mov eax, SYSCALL_mmap2      ; mmap2 syscall
    int 0x80

    pop ebp

    cmp eax, 0xfffff000         ; Check if returned pointer is valid
    jae .malloc_fail

    mov [eax], ecx              ; Store the size of memory in header
    mov dword [eax + 4], MAGIC  ; Store the magic number in header

    add eax, HEADER_SIZE        ; Move the pointer to the start of usable memory
    mov esp, ebp                ; Epilogue
    pop ebp
    ret

.malloc_zero:
    xor eax, eax                ; Return a NULL pointer if zero bytes requested
    mov esp, ebp                ; Epilogue
    pop ebp
    ret

.malloc_fail:
    xor eax, eax                ; Return a NULL pointer if mmap2 failed
    mov esp, ebp                ; Epilogue
    pop ebp
    ret

When the programmer calls the malloc function, here’s what happens:

  1. Size validation: If zero bytes of memory is requested, the function returns NULL immediately.
  2. Header allocation: It allocates an extra 8 bytes of memory to hold the metadata.
  3. System call: It calls mmap2 to request memory from the kernel with read/write permissions.
  4. Metadata storage: It stores the size of the allocation and the magic number 0xDEADC0DE in the header.
  5. Pointer adjustment: It returns a pointer which points to the beginning of the useable memory block.

The deallocator - free

Our free function also takes in one parameter over the stack - which is the pointer to the beginning of the memory block to be freed. It deduces the size of the memory block from the header we created earlier.

free:
    push ebp                    ; Prologue
    mov ebp, esp

    mov eax, [ebp+8]            ; Load eax with the input param
    test eax, eax               ; Check if input is NULL pointer
    jz .free_done

    sub eax, HEADER_SIZE        ; Move the pointer back to read the header
    mov ebx, eax                ; ebx now points to the size in header

    mov ecx, [ebx+4]            ; ecx now points to the magic number in header
    cmp ecx, MAGIC              ; Validate if magic number is valid
    jne .bad_magic

    mov edx, [ebx]              ; edx contains the size of memory (for munmap)
    mov eax, SYSCALL_munmap     ; munmap syscall
    mov ecx, edx

    int 0x80

    jmp .free_done

.bad_magic:
    nop                         ; Error handling logic if memory is corrupted

.free_done:
    mov esp, ebp                ; Epilogue
    pop ebp
    ret

When the programmer calls the free function, here’s what happens:

  1. NULL check: If the pointer passed is a NULL pointer, the function does nothing and exits.
  2. Header retrieval: It goes back 8 bytes in memory from the pointer to deduce the header.
  3. Validation: It checks for the magic number to detect corruption.
  4. Size retrieval: It retrieves the size of the block to be freed from the metadata.
  5. System call: It calls munmap to return the memory back to the operating system.

Example Usage

_start:
    push dword 4096     ; Pass 4096 to malloc
    call malloc

    mov ebx, eax        ; Move output pointer to ebx
    test ebx, ebx       ; Check if ebx is NULL pointer
    jz .exit

    push ebx            ; Pass ebx to free
    call free

.exit:
    mov eax, 1
    mov ebx, 0
    int 0x80

Compiling this program with NASM and linking it with ld, we find out that this program compiles without any errors, and more importantly, no SEGFAULTs. If we run strace on this program, we can see the syscalls made by this program

mmap2(NULL, 4104, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf7f56000
munmap(0xf7f56000, 4104)                = 0
exit(0)                                 = ?

We can see that the memory allocated is 4096+8 bytes, and the memory freed is corresponding to the memory allocated. Hence, we can conclude that we successfully implemented malloc and free.

Conclusion

This project was a fantastic deep-dive into low level memory management. While something like this is completely pointless to be used in production, building it gave me a deeper appreciation for the existing and more sophisticated allocators. The full source code of this program is available here.