The goal of this project is to learn about some crypto basics.
You may work on this project in collaboration with a single partner as described on the main page.
You must not discuss the project with anyone other than your partner and course staff. You may use online resources for general reference, but not to search for solutions to specific questions posed in this project.
Your Python 3 code should run on hacker.cs.oberlin.edu.
There are three parts for this assignment.
SHA-256. Implement SHA-256 in sha256.py
. You should look at the pseudocode on Wikipedia for the algorithm. Your program should take a single string as an argument and print the SHA-256 hash of the string as a sequence of hex characters followed by a newline and nothing else.
You must implement the algorithm itself. You may not use a library (e.g., Python’s standard hashlib
) or an external program to perform the computation for you. To check your answer, you can echo -n
a string and pipe it to sha256sum
.
$ ./sha256.py 'Crypto is complicated'
b13ec1838eada3f2908c144f54ead46cf76277b6e0afa847763f7f5cdd610222
$ echo -n 'Crypto is complicated' | sha256sum
b13ec1838eada3f2908c144f54ead46cf76277b6e0afa847763f7f5cdd610222 -
To make part 2 of the assignment easier, I recommend implementing SHA-256 by making use of several helper functions pad_message
which takes a bytes-like object and returns a properly padded copy of it and process_chunk
which updates the internal state after processing a 512-bit chunk. (See the pseudocode for the chunk processing.)
In my implementation, I implemented functions rotate_right
, round_sum
, extend_sum
, choose
, majority
, pad_message
, process_chunk
, hexdigest
, and sha256
. The first seven of those implement various parts of the psuedocode. hexdigest
, given in the starter code, takes a list of 8 integers and formats them as a hex string. You’ll likely want to use this in part 2 as well. sha256
takes a bytes-like object and produces the full hash.
MAC forgery. In this part, you are going to produce a MAC forgery. The provided badmac.py
file implements a simple (and bad) message authentication code, BADMAC(key, msg) = SHA-256(key || msg)
.
Your task is to write a function create_forgery(oracle)
in forger.py
. The one argument, oracle
, can be called as oracle(msg)
to create a MAC for the msg
argument. You may query the oracle as many times as you wish, passing whatever messages you want and getting their corresponding MACs. Afterward, you need to return a pair (msg, mac)
where msg
has not been used as a parameter to oracle
and mac
is a valid MAC (represented as a lowercase hex string) for msg
. That is, oracle(msg) == mac
must be True
.
Running ./badmac.py
will call your create_forgery
function and pass it an oracle containing a random 16-byte key. After the function returns, it will print out some information. Here is an example (with the forged string redacted).
$ ./badmac.py
Made 10 distinct oracle call(s)
Key: 5ac68d18f108467317aa7b2e220e2467
Forgery: '<redacted>'
Forged MAC: dc6f61981bba914df332f8225e18a415cd668812208cd2baa8be0c304cd9d070
MAC: dc6f61981bba914df332f8225e18a415cd668812208cd2baa8be0c304cd9d070
Success
You can produce a successful forgery using fewer than 10 calls to the oracle, but you’re unlikely to be successful if you make zero calls.
The create_forgery
function must not write anything to the terminal or to a file, nor read any files, nor make any attempt to extract the key from the oracle
or in any other object.
Hint: Think about the way SHA-256 works by processing one chunk at a time. BADMAC(key, msg)
is going to concatenate the 16-byte key with the message and hash that. SHA-256 is going to pad that out until the whole thing is a multiple of 64-bytes (512-bits). Starting with the md
list set to the initialization vector, SHA-256 is going to process each 64-byte chunk in order, updating md
after each one. The final hash value is what you get from concatenating the 8 elements of md
. If we view the output of the oracle call as md
(that is, convert from a hex string to a list of 8 integers), you can process additional chunks.
MD5 collision. Creating an MD5 collision is now a fast process. Your task is to create a pair of Python files, hello.py
and goodbye.py
, that have the same MD5 hash but when run produce different output. hello.py
should print Hello!
and goodbye.py
should print Goodbye!
. The two programs must not rely on their names or any aspect of their environment (including the presence or absence of files) to have different behavior.
$ md5sum hello.py goodbye.py
1721ab6cb96d9e43ce0e96486f898754 hello.py
1721ab6cb96d9e43ce0e96486f898754 goodbye.py
$ sha256sum hello.py goodbye.py
b48f689c3a79f64c0924be53117bc074a4c1987fb7705de4c0b4c895b6c495af hello.py
cc5169383251dafc9319ce73d161854a2d79379db24a900b3ede95d7d25d1460 goodbye.py
$ ./hello.py
Hello!
$ ./goodbye.py
Goodbye!
Your MD5 and SHA-256 hashes will be different from mine.
To produce these files, clone the oberlin-security/clone-fastcoll repository and build the code by typing make
. You should not check this code in to your repository. The resulting fastcoll
program takes the path of a file as an argument and produces two files md5_data1
and md5_data2
that have the same MD5 hash and both have the contents of the file as a prefix.
For example, given a file prefix
containing the single line “CS 343”, running fastcoll prefix
will produce two different files with the same MD5 hash.
$ echo 'CS 343' > prefix
$ ./fastcoll prefix
Generating first block: ..........
Generating second block: S01.......
use 'md5sum md5_data*' check MD5
$ md5sum md5_data?
adb00d1dfed84f8aba5304fb51d5e8c7 md5_data1
adb00d1dfed84f8aba5304fb51d5e8c7 md5_data2
$ xxd md5_data1
00000000: 4353 2033 3433 0a00 0000 0000 0000 0000 CS 343..........
00000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000040: ffe0 4306 16f9 0e44 f2bc 03fb 92c8 2ad4 ..C....D......*.
00000050: 5974 8677 09ee e705 1b4e 8279 1654 bdf2 Yt.w.....N.y.T..
00000060: 0dad b8c0 6e0a 0a48 eb5b 57d9 2d74 640c ....n..H.[W.-td.
00000070: 536b 1d04 a507 165d 6a20 92a7 0148 75e0 Sk.....]j ...Hu.
00000080: ad16 4013 e889 0bc3 185b c595 7b69 bf4e ..@......[..{i.N
00000090: 2c0c 0053 17e6 0e5e cb1a f63d ec46 fe65 ,..S...^...=.F.e
000000a0: d85b a656 02cb 0200 ba2b 4057 78ec 6ae6 .[.V.....+@Wx.j.
000000b0: ff7e e6bd c438 ef96 571d 4927 757f fc03 .~...8..W.I'u...
$ xxd md5_data2
00000000: 4353 2033 3433 0a00 0000 0000 0000 0000 CS 343..........
00000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000040: ffe0 4306 16f9 0e44 f2bc 03fb 92c8 2ad4 ..C....D......*.
00000050: 5974 86f7 09ee e705 1b4e 8279 1654 bdf2 Yt.......N.y.T..
00000060: 0dad b8c0 6e0a 0a48 eb5b 57d9 2df4 640c ....n..H.[W.-.d.
00000070: 536b 1d04 a507 165d 6a20 9227 0148 75e0 Sk.....]j .'.Hu.
00000080: ad16 4013 e889 0bc3 185b c595 7b69 bf4e ..@......[..{i.N
00000090: 2c0c 00d3 17e6 0e5e cb1a f63d ec46 fe65 ,......^...=.F.e
000000a0: d85b a656 02cb 0200 ba2b 4057 786c 6ae6 .[.V.....+@Wxlj.
000000b0: ff7e e6bd c438 ef96 571d 49a7 757f fc03 .~...8..W.I.u...
If you look closely at the contents of the two files, you can see that they are different but both start with 4353 2033 3433 0a
which is CS 343
followed by a new line.
One thing that stands out is that there’s a bunch of binary data which is going to be a little hard to incorporate into a Python program. Fortunately, you can tell Python that a source file has a particular character encoding by including a magic comment at the top of the file. Then you can constrain all of this binary data in a Python string.
Start by creating a file named prefix
containing the following four lines.
#!/usr/bin/env python3
# coding: iso-8859-1
MSG = bytes(r"""
Next, run fastcoll
on this file producing md5_data1
and md5_data2
.
Next, create a file named suffix
that starts with the following two lines. (The first line is blank.)
Finally, if you run the two commands
$ cat md5_data1 suffix > hello.py
$ cat md5_data2 suffix > goodbye.py
this will create two files hello.py
and goodbye.py
that start with the contents of prefix
and end with the context of suffix
. In between, the two files will differ. The last line of prefix
is creating a bytes
object from the bytes generated by fastcoll
. The two lines at the beginning of suffix
are ending the creating of that object and telling Python create the bytes
by encoding its argument using ISO 8859-1.
Now, you can change the contents of suffix
and re-run the cat
commands such that you get the required behavior of printing the two strings.
You’ll want to make the two python files executable.
$ chmod +x hello.py goodbye.py
Make sure you add and commit prefix
, suffix
, hello.py
, and goodbye.py
to your repository.
You should not include md5_data1
or md5_data2
in your repository.