Understanding Simulation Execution
We often prefer to debug on real devices, because a real device provides the most faithful environment. Sometimes, however, no real device is available, or the environment is hard to reproduce (router firmware analysis, for example). The natural fallback is an emulator, such as the Nox emulator on Windows or QEMU on Linux.
Emulators have their own limitations, though. The Nox emulator, for example, only runs x86_64 code natively. This is where a simulation execution framework comes into play.
Unicorn is an excellent cross-platform simulation execution (CPU emulation) framework that can execute raw machine code for many instruction sets, including Arm, Arm64 (Armv8), M68K, Mips, Sparc, and X86 (including X86_64).
Using Unicorn
I prefer to get straight to the point and demonstrate with code. Installation is simple: just install it with pip.
When using Unicorn, there are several steps to follow:
- step1: mu = Uc(UC_ARCH_ARM, UC_MODE_ARM)
  Create an emulator instance, selecting the architecture and mode (instruction set).
- step2: mu.mem_map(ADDRESS, 2 * 0x10000)
  Map a memory region.
- step3: mu.mem_write(ADDRESS, ARM_CODE)
  Write the code into the mapped memory.
- step4: mu.reg_write
  Set register values, i.e., the context in which the code will execute.
- step5: mu.emu_start
  Start the emulation.
- step6: Monitor the state during execution, such as each instruction executed and changes in register values, much like debugging with gdb.
With these steps in mind, the overall flow of simulation execution should be clear. Below is a complete example that emulates a short piece of code.
from unicorn import *
from unicorn.arm_const import *

# mov r0, #0x37; sub r1, r2, r3 (ARM-mode encoding)
ARM_CODE = b"\x37\x00\xa0\xe3\x03\x10\x42\xe0"

# callback for tracing instructions
def hook_code(uc, address, size, user_data):
    print(">>> Tracing instruction at 0x%x, instruction size = 0x%x" % (address, size))

# Test ARM
def test_arm():
    print("Emulate ARM code")
    try:
        # Initialize emulator in ARM mode (the code above is ARM, not Thumb)
        mu = Uc(UC_ARCH_ARM, UC_MODE_ARM)
        # map 128 KB (2 * 0x10000 bytes) of memory for this emulation
        # uc_mem_map memory address and size must be aligned to 0x1000
        ADDRESS = 0x10000
        mu.mem_map(ADDRESS, 2 * 0x10000)
        # write the machine code into the mapped memory
        mu.mem_write(ADDRESS, ARM_CODE)
        # initialize the registers
        mu.reg_write(UC_ARM_REG_R0, 0x1234)
        mu.reg_write(UC_ARM_REG_R2, 0x6789)
        mu.reg_write(UC_ARM_REG_R3, 0x3333)
        # UC_HOOK_CODE is an instruction-level hook;
        # begin=end=ADDRESS traces only the instruction at ADDRESS
        mu.hook_add(UC_HOOK_CODE, hook_code, begin=ADDRESS, end=ADDRESS)
        # emulate the machine code with no timeout or instruction limit
        mu.emu_start(ADDRESS, ADDRESS + len(ARM_CODE))
        r0 = mu.reg_read(UC_ARM_REG_R0)
        r1 = mu.reg_read(UC_ARM_REG_R1)
        print(">>> R0 = 0x%x" % r0)
        print(">>> R1 = 0x%x" % r1)
    except UcError as e:
        print("ERROR: %s" % e)

test_arm()
The code above emulates a short piece of ARM code. It is simple enough that you can predict the result at a glance:
mov r0, #0x37
sub r1, r2, r3
The code follows step1 to step6 above; here I will focus on how to monitor execution.
Unicorn provides instruction-level hooks: you only need to register a callback function to inspect the execution context.
# callback for tracing instructions
def hook_code(uc, address, size, user_data):
    print(">>> Tracing instruction at 0x%x, instruction size = 0x%x" % (address, size))

# ... code omitted

# hook UC_HOOK_CODE is an instruction-level hook
mu.hook_add(UC_HOOK_CODE, hook_code, begin=ADDRESS, end=ADDRESS)
Running the script prints the trace for the hooked instruction, followed by the final register values:
❯ python unicorn_t2.py
Emulate ARM code
>>> Tracing instruction at 0x10000, instruction size = 0x4
>>> R0 = 0x37
>>> R1 = 0x3456
What Can Unicorn Do?
With such a powerful simulation execution framework at hand, we can use it to... write assembly!
Funnily enough, in the past, whenever I wanted to run some x86 or ARM assembly, setting up the environment was the hard part. With Unicorn, I can easily write and run assembly for any supported instruction set!
For example, we can write an ARM assembly program that computes the Fibonacci sequence, and learn ARM instructions along the way.
Three registers (R1, R2, R3) are enough for the additions, with R0 serving as the loop counter. The final code is:
.global main
main:
MOV R0, #10 // Set the length of the Fibonacci sequence to 10
MOV R1, #0 // Initialize the first Fibonacci number to 0
MOV R2, #1 // Initialize the second Fibonacci number to 1
loop:
CMP R0, #0 // Check if the counter is 0
BEQ end // If the counter is 0, jump to end
ADD R3, R1, R2 // Calculate the next Fibonacci number
MOV R1, R2 // Update the first Fibonacci number to the current second Fibonacci number
MOV R2, R3 // Update the second Fibonacci number to the newly calculated number
SUB R0, R0, #1 // Decrement the counter by 1
B loop // Jump back to the beginning of the loop
end:
// End the program
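To know what the emulator should report, the same loop can be mirrored in plain Python. This is a minimal reference model (fib_registers is a hypothetical name) that updates R0 to R3 exactly as the assembly does, assuming R3 starts at 0 as a freshly created Unicorn instance's registers do.

```python
def fib_registers(n):
    """Mirror the ARM loop above and return the final (R0, R1, R2, R3)."""
    r0, r1, r2, r3 = n, 0, 1, 0   # MOV R0, #n / MOV R1, #0 / MOV R2, #1; R3 assumed 0
    while r0 != 0:                # CMP R0, #0 / BEQ end
        r3 = r1 + r2              # ADD R3, R1, R2
        r1 = r2                   # MOV R1, R2
        r2 = r3                   # MOV R2, R3
        r0 -= 1                   # SUB R0, R0, #1 / B loop
    return r0, r1, r2, r3

print(fib_registers(10))  # (0, 55, 89, 89)
```

So with a counter of 10, the program should finish with R1 = 55, R2 = 89 and R3 = 89.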
Learning ARM Assembly Instructions
Here are some common ARM assembly instructions:
- MOV: Data transfer instruction. Moves an immediate value or the value of another register into a register. For example, MOV R0, #10 moves 10 into register R0.
- CMP: Comparison instruction. Compares a register with another register or an immediate value. The result is not stored; instead, the condition flags in the status register (CPSR) are updated. For example, CMP R0, #0 compares R0 with 0.
- BEQ: Conditional branch instruction. Branches to the label if the last comparison found the values equal (the Z flag is set). For example, BEQ end jumps to end if the condition is met.
- ADD: Addition instruction. Adds two operands (a register plus a register or an immediate) and stores the result in a register. For example, ADD R3, R1, R2 adds R1 and R2 and stores the result in R3.
- SUB: Subtraction instruction. Subtracts a register or immediate value from a register and stores the result. For example, SUB R0, R0, #1 decrements R0 by 1.
- B: Unconditional branch instruction. Jumps to the specified label. For example, B loop unconditionally jumps back to the loop label.
- LDR/STR (Load/Store): Load data from memory into a register, or store data from a register into memory.
  - Example: LDR R3, [R1] loads the word at the memory address held in R1 into R3.
  - Example: STR R3, [R1] stores the value in R3 to the memory address held in R1.
- BL/BLX (Branch with Link/Branch with Link and Exchange): Used for function calls; the return address is saved in the link register (LR). BLX can additionally switch between the ARM and Thumb instruction sets.
  - Example: BL function_name calls function_name and saves the return address in LR.
- PUSH/POP (Stack Push/Stack Pop): Manipulate the stack, typically to save and restore registers around function calls.
  - Example: PUSH {R0, R1, LR} pushes R0, R1, and LR onto the stack.
  - Example: POP {R0, R1, LR} pops values from the stack into R0, R1, and LR.
- BNE, BGT, BLE, etc. (Branch if Not Equal, Branch if Greater Than, Branch if Less Than or Equal, etc.): Conditional branches that test the condition flags, typically set by a preceding CMP.
  - Example: BNE somewhere jumps to somewhere if the last comparison was not equal.
Simulation Execution
Unicorn executes machine code, not assembly text, so we first use the Keystone assembler to convert the ARM assembly into machine code. Once we have the machine code, we can emulate it.
from unicorn import *
from unicorn.arm_const import *
from keystone import *

# ARM assembly code
arm_code = """
.global main
main:
MOV R0, #10 // Set the length of the Fibonacci sequence to 10
MOV R1, #0 // Initialize the first Fibonacci number to 0
MOV R2, #1 // Initialize the second Fibonacci number to 1
loop:
CMP R0, #0 // Check if the counter is 0
BEQ end // If the counter is 0, jump to end
ADD R3, R1, R2 // Calculate the next Fibonacci number
MOV R1, R2 // Update the first Fibonacci number to the current second Fibonacci number
MOV R2, R3 // Update the second Fibonacci number to the newly calculated number
SUB R0, R0, #1 // Decrement the counter by 1
B loop // Jump back to the beginning of the loop
end:
// End the program
"""

# Initialize the Keystone engine in ARM mode
ks = Ks(KS_ARCH_ARM, KS_MODE_ARM)
# Assemble the source into machine code (a list of byte values)
arm_code_binary, _ = ks.asm(arm_code.encode())

# Set up the emulator in ARM mode
mu = Uc(UC_ARCH_ARM, UC_MODE_ARM)

# Map one page of memory and load the machine code into it
ADDRESS = 0x1000000
mu.mem_map(ADDRESS, 0x1000)
mu.mem_write(ADDRESS, bytes(arm_code_binary))

# Set an initial stack pointer (this program never uses the stack)
mu.reg_write(UC_ARM_REG_SP, 0x7fffffff)

# Register name mapping for the trace output
reg_names = {
    UC_ARM_REG_R0: 'R0',
    UC_ARM_REG_R1: 'R1',
    UC_ARM_REG_R2: 'R2',
    UC_ARM_REG_R3: 'R3',
}

# UC_HOOK_CODE callbacks run before each instruction executes,
# so the values printed are the register state just before that instruction
def hook_code(uc, address, size, user_data):
    print(f"About to execute instruction at 0x{address:x}")
    for reg, reg_name in reg_names.items():
        print(f"{reg_name}: {uc.reg_read(reg)}")
    print()

# Trace every instruction (no begin/end range given)
mu.hook_add(UC_HOOK_CODE, hook_code)

# Start the emulation; it stops when PC reaches the end of the code (the "end" label)
try:
    mu.emu_start(ADDRESS, ADDRESS + len(arm_code_binary))
except UcError as e:
    print(f"Error: {e}")

# Output final register values
print("R1:", mu.reg_read(UC_ARM_REG_R1))
print("R2:", mu.reg_read(UC_ARM_REG_R2))
print("Final result R3:", mu.reg_read(UC_ARM_REG_R3))
Problem Solving
A good practice problem is the 100mazes challenge (reference: MTCTF2021 100mazes).
The binary contains 100 mazes. You must supply a route through each one, and the flag is the MD5 hash of these routes.
Solving 100 mazes by hand is impractical, so the first step is to look at the code for each maze.
Each maze function reads its maze data and then checks whether your input route reaches the exit.
From there, you can extract the maze data with a script and use depth-first search (DFS) to find a path through each maze. (The maze data could also be extracted statically, for example with an IDAPython script, without any emulation.)
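The path-finding part is an ordinary depth-first search with backtracking. A minimal sketch under stated assumptions: the maze is a list of equal-length strings, the hypothetical characters "#" (wall) and "E" (exit) stand in for whatever the real mazes use, and moves maps each direction character to a (row, col) delta. The challenge's actual encoding differs per maze, so treat solve_maze and its parameters as illustrative placeholders.

```python
def solve_maze(maze, start, moves, wall="#", goal="E"):
    """Return a string of direction characters leading from start to goal, or None.

    maze:  list of equal-length strings
    start: (row, col) of the starting cell
    moves: dict mapping a direction character to a (d_row, d_col) delta
    """
    rows, cols = len(maze), len(maze[0])
    seen = {start}
    path = []

    def dfs(r, c):
        if maze[r][c] == goal:
            return True
        for ch, (dr, dc) in moves.items():
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue  # off the grid
            if maze[nr][nc] == wall or (nr, nc) in seen:
                continue  # blocked or already visited
            seen.add((nr, nc))
            path.append(ch)
            if dfs(nr, nc):
                return True
            path.pop()  # dead end: backtrack
        return False

    return "".join(path) if dfs(*start) else None

maze = ["S.#",
        "#..",
        "##E"]
moves = {"w": (-1, 0), "s": (1, 0), "a": (0, -1), "d": (0, 1)}  # hypothetical direction keys
print(solve_maze(maze, (0, 0), moves))  # dsds
```

For a 25x25 maze the recursion depth stays well below Python's default limit.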
Here, I will focus on extracting it by emulating the code with Unicorn. Unicorn emulates only the CPU and memory: it has no operating system or dynamic loader, so calls into library functions such as printf cannot run and must be handled in hooks.
In addition to the registers, we also need to map and set up a stack.
The key part of the solution is the code hook:
import struct
from unicorn import *
from unicorn.x86_const import *

# Little-endian unpack helpers (same behavior as pwntools' u32/u64)
def u32(b):
    return struct.unpack("<I", bytes(b))[0]

def u64(b):
    return struct.unpack("<Q", bytes(b))[0]

# dfs(), the global maze state and the emulator setup are not shown here
def hook_code(uc, address, size, user_data):
    global map_data, str_map, ans_map, ans, all_input
    # print('>>> Tracing instruction at 0x%x, instruction size = 0x%x' % (address, size))
    assert isinstance(uc, Uc)
    code = uc.mem_read(address, 4)
    if code == b"\x48\x0F\xC7\xF0":
        uc.reg_write(UC_X86_REG_RIP, address + 4)  # Skip rdrand rax directly
    if address == 0x640:  # Reached printf: skip it by emulating ret
        rsp = uc.reg_read(UC_X86_REG_RSP)
        retn_addr = u64(uc.mem_read(rsp, 8))
        uc.reg_write(UC_X86_REG_RSP, rsp + 8)  # pop the return address
        uc.reg_write(UC_X86_REG_RIP, retn_addr)
    elif address == 0x650:  # Reached getchar: read the maze
        rbp = uc.reg_read(UC_X86_REG_RBP)
        maze_data = uc.mem_read(rbp - 0xC6A, 0x625)  # Maze data
        step_data = uc.mem_read(rbp - 0x9F9, 4).decode()  # Direction characters
        xor_data = uc.mem_read(rbp - 0x9D0, 0x9C4)  # XOR key data
        lr_val = u32(uc.mem_read(rbp - 0x9F4, 4))  # Starting x
        ur_val = u32(uc.mem_read(rbp - 0x9F0, 4))  # Starting y
        maze_data = list(maze_data)  # XOR-decode the maze with the key dwords
        for i in range(0, 0x9C4, 4):
            maze_data[i // 4] ^= u32(xor_data[i: i + 4])
        for i in range(25):  # Assemble the final 25x25 maze
            line_data = ""
            for j in range(25):
                line_data += chr(maze_data[i * 25 + j])
            # print(line_data)
        map_data = maze_data
        str_map = step_data
        ans = ""
        assert dfs(0, -1, -1, lr_val, ur_val)  # Depth-first search
        # print(ans)
        all_input += ans
        # leave;ret
        rbp = uc.reg_read(UC_X86_REG_RBP)
        new_rbp = u64(uc.mem_read(rbp, 8))
        retn_addr = u64(uc.mem_read(rbp + 8, 8))
        uc.reg_write(UC_X86_REG_RBP, new_rbp)
        uc.reg_write(UC_X86_REG_RSP, rbp + 0x18)
        uc.reg_write(UC_X86_REG_RIP, retn_addr)