Code, once written by the programmer and compiled into an executable, is usually treated as immutable. Yet, at the lowest level of execution, this immutability is more of a suggestion than a rule: a policy set by the operating system rather than something built into the CPU. In reality, code is just bytes stored in memory, in the text section, and like any other memory location it can be modified at runtime, provided you have the right privileges.
This simple realisation opens the door to several possibilities - the greatest being the ability to create programs that mutate themselves at runtime: functions that are programmed to do one task but end up doing something else entirely. In this post, we explore how to create such mutating programs, what uses they have, and why they can be dangerous.
The Background
In a program’s memory, code is stored as instructions in the text section. These instructions are stored in the typical opcode-operand format. Now, if a different instruction changes the value of either the opcode or the operand at the memory location of the first instruction, the first instruction has been mutated. The CPU itself won’t object to this, because it simply fetches and decodes whatever bytes it finds at that location.
The only thing stopping someone from doing so is the operating system. In a typical program, different sections of memory are assigned different privileges. Each segment has three permission bits - read, write, and execute. The text segment of the program conventionally only has read and execute privileges. This provision exists for several purposes, such as preventing attacks that inject executable code, or preventing bugs from corrupting program logic. In fact, the CPU can even cache the code more aggressively if it knows that it won’t change.
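To see this protection at work, here is a minimal sketch, assuming 32-bit Linux, NASM, and the default linker layout in which the text section is mapped read-and-execute only. The label name and the file name in the build comment are made up for illustration. The program tries to overwrite one byte of its own code without asking for write permission first, so the kernel is expected to stop it with a segmentation fault before it reaches the exit call.

```nasm
; Sketch: writing into the text section without changing its permissions.
; Assumed build: nasm -f elf32 fault.asm -o fault.o && ld -m elf_i386 fault.o -o fault
section .text
global _start

target:                         ; hypothetical label for the code we try to patch
    mov eax, 5                  ; encoded as B8 05 00 00 00
    ret

_start:
    mov byte [target + 1], 6    ; write to a read+execute page: expected to fault here
    mov eax, 1                  ; exit syscall, only reached if the write was allowed
    xor ebx, ebx
    int 0x80
```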
The Implementation
section .data
SYSCALL_write equ 4 ; equ defines assemble-time constants, so mov loads the number itself
SYSCALL_mprotect equ 125
PROT_READ equ 1
PROT_WRITE equ 2
PROT_EXEC equ 4
PROT_RWX equ (PROT_READ | PROT_WRITE | PROT_EXEC)
newline db 10

section .bss
result resb 4 ; Result buffer to temporarily store value

section .text
global _start

modifiable_func:
instr_to_modify:
    mov eax, 5 ; Instruction that we intend to modify
    ret

_start:
    mov eax, SYSCALL_mprotect ; mprotect syscall
    mov ebx, modifiable_func ; ebx contains memory address (for mprotect)
    and ebx, 0xFFFFF000 ; align the memory address to the page boundary
    mov ecx, 4096 ; ecx contains the size of the memory block (for mprotect)
    mov edx, PROT_RWX ; edx contains the new protection flags (for mprotect)
    int 0x80

    call modifiable_func ; First time we call, the value in eax will be 5
    add eax, '0' ; convert the single digit to its ASCII character
    mov [result], eax ; store it in the buffer that write will print
    mov eax, SYSCALL_write
    mov ebx, 1 ; file descriptor 1 (stdout)
    mov ecx, result
    mov edx, 1 ; print one byte
    int 0x80 ; Output : 5

    mov eax, SYSCALL_write
    mov ebx, 1
    mov ecx, newline
    mov edx, 1
    int 0x80

    mov byte [instr_to_modify + 1], 6 ; Modifying the operand (the byte after the opcode)

    call modifiable_func ; Second time we call, the value in eax will be 6
    add eax, '0'
    mov [result], eax
    mov eax, SYSCALL_write
    mov ebx, 1
    mov ecx, result
    mov edx, 1
    int 0x80 ; Output : 6

    mov eax, 1 ; exit syscall
    xor ebx, ebx ; exit status 0
    int 0x80
In the above program, we have declared the function we intend to modify in the text section. We have also added a label to the specific instruction that we want to modify.
At the very beginning of _start, we make a system call to mprotect, which changes the access protections of the calling process’s memory pages. It takes three arguments: a start address, which must be aligned to a page boundary; the length in bytes of the region to change; and the new protection flags to assign to that region. In this case, we pass the address of the modifiable function rounded down to a page boundary, a length of 4096 bytes (one page), and flags requesting read, write, and execute privileges.
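The call can fail, for example if the address is not page-aligned or the system forbids mappings that are both writable and executable, so it is worth checking the result. What follows is a hedged sketch, assuming the 32-bit Linux int 0x80 convention, in which the kernel returns 0 in eax on success and a negative error code on failure. The labels patch_me, failed, and done are illustrative names, and the 4096-byte page size is the same assumption the program above makes.

```nasm
; Sketch: request read+write+execute on the page holding patch_me and check the result.
SYSCALL_exit     equ 1
SYSCALL_mprotect equ 125
PROT_RWX         equ 7          ; PROT_READ | PROT_WRITE | PROT_EXEC
PAGE_SIZE        equ 4096       ; assumed page size

section .text
global _start

patch_me:                       ; hypothetical function we would like to modify later
    mov eax, 5
    ret

_start:
    mov eax, SYSCALL_mprotect
    mov ebx, patch_me
    and ebx, -PAGE_SIZE         ; same mask as 0xFFFFF000: round down to the page start
    mov ecx, PAGE_SIZE          ; length of the region in bytes
    mov edx, PROT_RWX
    int 0x80
    test eax, eax               ; 0 on success, negative error code on failure
    jnz failed

    xor ebx, ebx                ; exit status 0: the page is now writable
    jmp done
failed:
    mov ebx, 1                  ; exit status 1: mprotect refused the request
done:
    mov eax, SYSCALL_exit
    int 0x80
```

After running it, `echo $?` in the shell shows which branch was taken. Note that a single page is only enough when the code to be patched does not cross a page boundary; a function that does would need a larger length.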
Then, the program calls the modifiable function, which places 5 in the eax register, and prints the contents of eax. After this, the program actually starts mutating code. The instruction mov eax, 5 is encoded as a one-byte opcode (0xB8) followed by a four-byte immediate operand, so the program goes to the memory location of the instruction, moves one byte further to skip past the opcode, and changes the operand from a 5 to a 6. After calling the function again and printing the contents of eax, we can see that the mutation has succeeded.
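To make the byte layout concrete, here is a sketch that writes the function out as raw bytes instead of mnemonics, assuming the standard 32-bit encoding of mov eax, imm32 (opcode B8 followed by a four-byte little-endian immediate). The labels raw_func and raw_imm are hypothetical, as is the value 42. Because the immediate is four bytes wide, the sketch patches all four with a dword write rather than touching only the low byte.

```nasm
; Sketch: the same function spelled out byte by byte, then patched and re-run.
section .text
global _start

raw_func:
    db 0xB8                     ; opcode: mov eax, imm32
raw_imm:
    dd 5                        ; operand: the 32-bit immediate, little-endian
    db 0xC3                     ; ret

_start:
    mov eax, 125                ; mprotect syscall number
    mov ebx, raw_func
    and ebx, 0xFFFFF000         ; page-align, assuming 4096-byte pages
    mov ecx, 4096
    mov edx, 7                  ; PROT_READ | PROT_WRITE | PROT_EXEC
    int 0x80

    call raw_func               ; eax = 5, the original immediate
    mov dword [raw_imm], 42     ; rewrite the whole four-byte immediate
    call raw_func               ; eax now holds the patched value

    mov ebx, eax                ; report the second return value as the exit status
    mov eax, 1                  ; exit syscall
    int 0x80
```

If the mprotect call succeeds, running the program and then `echo $?` should show 42. Reporting the value through the exit status keeps the sketch short, at the cost of only being able to show values from 0 to 255.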
Applications
Practical applications of self-modifying code are few and far between. From an optimization standpoint, mutating code can be used to optimize loops via unrolling and dynamically adjusting loop counters based on runtime data sizes; however, any benefit gained from this is likely to be negated by the performance drop caused by losing the caching of instructions.
A more realistic use is in DRM protection and anti-reverse-engineering measures. For example, if a program detects that it is being run inside a debugger, it can intentionally rewrite and obfuscate its own code to prevent analysis and subsequent reverse engineering.
However, this paradigm also lends itself to more malicious applications. Using this technique, one can create a polymorphic virus, with code that mutates with every infection. Changing minor things, like the instruction order or the registers used, can be enough to alter the signature and evade detection. This technique can also be used to create runtime metamorphic viruses, where the virus changes its own code completely while still preserving its function.
Conclusion
Like most of my previous projects, this one is not something that should be applied in the real world. However, watching a function return different values from the exact same function call, knowing I had rewritten its instructions mid-execution, gave me a visceral understanding of how fragile the boundary between code and data really is. The full source code of the program can be found here.