Project 2

Due: Friday, October 25, 2013, 23:59

Goal

Unlike the previous project where you were given access to the source code of the target programs, for this project, you will have to find the vulnerabilities using only the binary, IDA Pro, and gdb.

All work in this project must be done on the VirtualBox virtual machine provided on the course website; see below for information about this environment.

You are given a single server binary which is a very simple webserver. The server program takes an optional argument specifying the port it listens to. It defaults to port 80 which requires root privileges:

user@sandbox:~/project2$ cd webroot
user@sandbox:~/project2/webroot$ sudo ./server # Or
user@sandbox:~/project2/webroot$ ./server 8080

Your goal is to write two exploit programs: sploit1 and sploit2 to get a remote shell from the webserver.

For both exploit programs, you should ensure that your attack works both when the server is launched as sudo ./server and as ./server 8080 from inside the project2/webroot directory.

Some sample code is provided in the form of fetchfile.c which, when compiled and run as ./fetchfile localhost 8080 /index.html, for example, will connect to the webserver and download the file index.html, displaying the raw server output to stdout.

Both of your exploit programs are expected to run from inside the VM (so that we have a consistent environment to test). This is most easily accomplished by configuring VirtualBox to forward a port so that you can ssh into the VM. Directions are given below.

Collaboration

Unlike the previous project, you may not collaborate with anyone on this project. As before, you must not discuss the project with anyone in the class besides the course staff, nor with anyone outside of class. This includes discussing what particular vulnerabilities in the target are! You may use online resources for general reference, but not to search for solutions to specific questions posed in this project.

The Environment

You (and we, for grading) will test your exploit programs within a VirtualBox virtual machine. To use this VM on your personal computer (Windows, Mac OS X, and Linux are supported), you will need to download the virtual machine image provided on the course website — sandbox-1.0.tar.bz2 — as well as VirtualBox from the VirtualBox website. This is the same virtual machine image you used in the previous project. If you still have it, you do not need to download it again.

We have also arranged for VirtualBox to be installed on the Windows machines in Maryland 310.

Preliminaries

1. Get the sandbox virtual machine (the same one we used for project 1).

2. Turn on PAE/NX by shutting down the virtual machine and going to Settings > System > Processor and clicking “Enable PAE/NX”.

3. Go to Settings > Network > Adapter 1 > Port Forwarding and forward a port from the host machine to port 22 in the guest:

Port Forwarding 

4. Start the VM and install linux-image-2.6.32-5-686-bigmem by running

$ sudo apt-get install linux-image-2.6.32-5-686-bigmem

This will cause Linux to use the NX bit to disable executing code on the stack.

5. Optional Install linux-headers-2.6.32-5-686-bigmem and reinstall the VirtualBox guest additions. This is necessary if you wish to use shared folders.

6. Download and unpack project2.

user@sandbox:~$ wget http://www.cs.jhu.edu/~s/teaching/cs460/2013-fall/project2.tar.gz
user@sandbox:~$ tar zxf project2.tar.gz

Getting started

1. Open the server binary in IDA.

Loading the binary in IDA Pro 

2. With the main window selected, press spacebar to toggle between graph view and linear view.

3. In the Functions window, double clicking on the name of a function jumps the main window to that function.

4. Double click on start to move to the start function.

The start function 

5. The first argument to __libc_start_main is the main function.

6. Select sub_8049DCB by clicking on it. Let's name it main by pressing n to rename it.

7. This has updated our functions list in the Functions window. Examine the main function by double clicking on its name, either in the Functions window or in the assembly listing.

8. If you are not currently in the graph view, press the spacebar to enter it. Scroll around the control flow graph either by clicking and dragging in the graph overview window or in the space between functions in the main window. The colored arrows have the following meanings:

9. Let's take a look at the code surrounding the call to _socket.

The socket call 

We can immediately see that the assembly was produced from code similar to the following:

sock = socket(2, 1, 0);
if (sock == -1)
{
	perror("sock = socket(PF_INET, SOCK_STREAM, 0)");
	exit(1);
}

Notice that the error message contains the symbolic constants used although the code only contains the numeric values.

10. If we are so inclined, we can have IDA replace the numeric values with the constants. First, select the domain parameter, 2, by clicking on it. Right clicking opens a contextual menu. Select “Use standard symbolic constant” and then select PF_INET from the list. Similarly, select SOCK_STREAM for the type parameter. In general, we won't have error messages that tell us what the constants we want are. In that case, we can look through the manual page for a given function to see in which system header file it is defined and then look through the header file to find the value of the constant. IDA does a good job of only suggesting constants with the appropriate numerical value, most of the time.

11. Identify the loop in the main function. Loops can easily be identified in the graph view by arrows that point toward earlier blocks of code. Clicking on a jump target will also highlight the label.

In addition to calling standard library functions, each pass through the loop calls a function in the server binary. Since this is a webserver, we can surmise that this function is responsible for handling web clients (i.e., browsers). Rename this function from sub_… to handle_client the same way we did with main.

12. Let's identify the two arguments to handle_client. Start by selecting ebx in the mov [esp], ebx instruction just before call handle_client. All instances of ebx are highlighted in the assembly. Looking up the code, we see that ebx is set most recently by the return value of the accept function. By checking the manual page for accept, we see that it returns a file descriptor (an integer) corresponding to the newly opened socket. So the first argument to handle_client is the socket, which makes sense.

The second argument to handle_client, which is passed in [esp+4], comes from esi. By selecting esi, we see that it is used in several places but it is only assigned once via the lea instruction following the call to listen. esi is set to an address on the stack, namely esp+28h. We see that this address is passed as the buf argument to inet_ntop. The manual page reveals that inet_ntop writes a string containing the address to the buf argument.

Slightly more cryptically, but if inet_ntop does not return 0, then the 4 byte word 0x0000002D is written to the first word of buf. Recall that x86 is little endian so this is stored as 0x2D, 0x00, 0x00, 0x00 in memory. By selecting the 2Dh and pressing r, we see that 0x2D is the hyphen character ‘-’. Thus if inet_ntop does not return 0, then the address is set to be “-”.

13. Finally, double click on handle_client. At the beginning of the function, IDA has declared a bunch of symbols like var_1120. These are what IDA thinks are local variables on the stack. Some of them have names already corresponding to arguments of standard library functions that IDA knows about. We can change these names as we see fit.

In addition to local variables, IDA can deduce information about function arguments. In particular, it knows that handle_client takes two arguments, but it thinks both are ints. We can change this by selecting handle_client and pressing y and giving handle_client the prototype:

void handle_client(int sock, const char *address)

After doing this, notice that the arguments have been named sock and address, similar to the local variables.

Finding a vulnerability

We could, if we wanted, spend a lot of time working out how each function works but our goal is not to understand the complete inner workings of the program. Rather, our goal is to find a way to exploit it.

1. Let's begin by finding usage of functions which are prone to errors. Let's start by finding all instances of strcpy. In the Functions window, double click on _strcpy. (As an aside, what IDA calls _strcpy is what gdb calls strcpy@plt. We'll come back to this later.) If you are in “graph view,” press the spacebar to return to “text view.” From the comment CODE XREF: sub_8049214+33, we can see that strcpy is called from exactly one location.

strcpy 

2. Double click on sub_8049214 to be taken to that function. We can see that the destination argument to strcpy comes from the return value of sub_8048EF9 and by tracing the source argument back, we see that it is the first (and only) argument to sub_8049214.

3. Examine function sub_8048EF9. From a quick inspection, it should be clear that this function calls malloc, passing its only argument and then asserts that the returned value is not NULL. From the argument to the __assert_fail function, we can see that sub_8048EF9 is actually named xmalloc. Rename sub_8048EF9 to xmalloc. This way, if we see it again, we'll immediately remember what it does.

4. Go back to sub_8049214 (by pressing esc or by hitting the back arrow in the toolbar). At this point, it's clear that the function argument is a string that gets copied to some memory allocated on the heap. Furthermore, we can tell that a pointer to this memory is returned. All that remains is to figure out if it's allocating enough memory to hold the entire string. The few instructions before and after the repne scasb are actually computing the length of the string plus 1 and passing that to malloc. To see that, look up the definition of scasb, repne, and not and recall that integers are stored in twos complement.

Since enough space is allocated to hold the contents of the string and malloc is asserted to not return NULL, this strcpy does not appear to be exploitable.

5. It's worth noting that this function behaves very similar to the standard library function strdup in that it takes a string as an argument and then creates a copy of it, returning the copy. Rename the function xstrdup (in analogy with xmalloc).

6. Follow the same procedures to check for the presence of other potentially exploitable functions such as memcpy or strcat. By default IDA only displays 2 cross references. It's helpful to increase this number. Look in Options > General… > Cross-references and increase the number of displayed xrefs to 200. This should be far more than we need.

7. You should be able to find several buffer overflows on the stack. Which functions have overflows and what input do we need to send over the socket to trigger them?

Since the stack is not executable, we are not able to inject code onto the stack. We will need to try something else.

Writing the first exploit

We're going to perform a return-into-libc attack by smashing the stack in one of the error handling functions. From the last section, you should have discovered a buffer overflow on the stack when handling an incorrect HTTP version (as well as a few other places).

1. Make a copy of fetchfile.c and name it sploit1.c.

2. A valid HTTP request line looks like:

GET /path/to/file HTTP/1.1

Modify sploit1.c to hard code a request for an arbitrary file (you can use “/” for the file, if you'd like) and a short, invalid HTTP version string. When the program is run, what is the result? What about a long, invalid HTTP version string?

3. Compute the distance on the stack from the saved instruction pointer to the location where the version is written to the stack. This can be computed directly from the disassembly or you can use gdb.

4. Produce a request string that overwrites the saved eip on the stack with an arbitrary value.

5. Find the address of the exit function in libc (gdb works nicely here) and craft input that will overwrite the saved eip with that address.

6. When you run the exploit, the server will likely crash before it returns to exit. Look carefully at the what happens after you overwrite the saved eip. In particular, some other values that are subsequently used as pointers are overwritten as well. Modify your exploit to overwrite those values with valid addresses. Addresses on the stack work very well.

7. Find the address of the system function in libc. Modify the exploit such that instead of returning to exit, it returns to system. The argument to system should be a pointer to a string that contains:

nc -c sh -l -p 1234

Since you don't know exactly where on the stack this string will live, you can prepend it with a large number of spaces, similar to a nop sled. Take care that any text after the string is ignored.

If you're not familiar with the netcat tool, read the man page. What this particular command will do is it will listen on port 1234 and on a connection, it will launch sh.

After sploit1 sends the exploit string to the server, it should close the socket, sleep for a second (just to give the server process time to run and launch nc), and finally execute nc. The last 5 lines of my sploit1's main function are

	sendn(sock, request, strlen(request));
	close(sock);
	sleep(1);
	execl("/bin/nc", "/bin/nc", argv[1], "1234", NULL);
	_exit(1);

If the server is launched as root,

$ sudo ./server

then when I run sploit1, no prompt is printed, but I can run shell commands and the response will be sent back:

$ ./sploit1 localhost 80
whoami
root
exit

Your sploit1 should behave the same way.

Writing the second exploit

For the second exploit, the goal is the same (get a remote shell) but we're going to turn on address space layout randomization (ASLR).

1. Run

$ sudo sh -c 'echo 2 > /proc/sys/kernel/randomize_va_space'

This will enable ASLR until either you restart the VM or you write 0 to /proc/sys/kernel/randomize_va_space.

Now, if we run the server and sploit1, the server will crash because we relied on the address of system and the address of our string on the stack. We're going to need to perform an address disclosure attack to learn where the stack is and where libc is loaded.

2. Examine the four calls to snprintf in IDA, three of them have constant format strings, one dies not. Start the server in gdb. Set a breakpoint just prior to the call to the snprintf that does not have a constant format string and use gdb to print the format string.

3. Use fetchfile to provide input which can be used for the format string. Try giving it file paths of the form: ZZZZ%1$08x-%2$08x-%3$08x and then change the 1, 2, and 3 to 4, 5, and 6 and so on until you find the value ZZZZ. You'll be able to see the output by watching stdout of server. Naturally, this information won't be available to a real client, but we will deal with that later.

4. Double clicking on _snprintf in IDA shows the implementation in the plt section:

jmp	ds:off_804B7E4

Although it may not be obvious, this is actually an indirect jump to an absolute address. That is, address 0x0804b7e4 contains a pointer to the actual snprintf function which is set by the dynamic linker after snprintf is called for the first time.

5. Copy fetchfile.c to sploit2.c. Modify sploit2.c so that it tries to fetch a file that contains a format string exploit. This time, instead of ZZZZ%n$08x for some value of n, replace the ZZZZ with the address 0x0804b7e4 (in little endian) and replace the 08x with s. This should cause the server to treat that address as a pointer to a string and print the string. The first four bytes of which should be the address of snprintf.

6. Now modify the file string to addr%m$08x%n$s where addr and n are the values from step 5 and m is chosen such that it prints out the saved frame pointer (ebp) in hex. Hint: rather than just trying value after value like we did to find the ZZZZ, look at the lea instruction just above the call to _snprintf to find the start of the buffer.

7. In addition to writing the connection information to stdout, the server logs the same information to server.log. Since this is written in the same directory as server, we can fetch it using a normal HTTP GET.

Change sploit2.c to make two requests. The first exploits the format string vulnerability to write the address of the saved ebp in hex and the address of snprintf to server.log and the second fetches /server.log.

8. Read the fetched server.log line by line until you get to the end. Hint: Using code like

	FILE *fp = fdopen(sock, "r");
	char line[2][1024];
	int which = 0;
	while (!feof(fp) && !ferror(fp) && fgets(line[which], 1024, fp))
		which = !which;
	which = !which;
	fclose(fp);

puts the final line read from the socket in line[which].

9. Parse the 8 byte hex string that is the saved frame pointer and the 4 byte integer that is the address of snprintf out of the last line read from the socket.

10. Add your exploit from sploit1.c to sploit2.c but make sure you adjust the address of system, exit, and the pointers to the stack appropriately. Now that you know a stack address, you can compute exactly which address the argument to system should be. Remove the “space sled” that we used for sploit1 since it is unnecessary.

11. You should now be able to start the server as root and get a remote, root shell by running sploit2 even though the process is protected by ASLR! Congratulations!

Deliverables

You are to provide a tarball (i.e., a .tar.gz or .tar.bz2 file) containing sploit1.c and sploit2.c:

user@sandbox:~/project2$ tar zcf p2.tar.gz sploit1.c sploit2.c

We should be able to build and run your exploits without error or warning by running:

user@sandbox:~/project2$ tar zxf p2.tar.gz
user@sandbox:~/project2$ gcc -Wall -Werror -std=c99 -o sploit1 sploit1.c
user@sandbox:~/project2$ ./sploit1 localhost 80
user@sandbox:~/project2$ gcc -Wall -Werror -std=c99 -o sploit2 sploit2.c
user@sandbox:~/project2$ ./sploit2 localhost 80

(Of course, sploit1 will not work with ASLR enabled.)