The goal of this project is to learn about some crypto basics.
You may work on this project in collaboration with one or two partners as described on the main page.
You must not discuss the project with anyone other than your partners and course staff. You may use online resources for general reference, but not to search for solutions to specific questions posed in this project.
Your Python 3 code should run on occs.
There are three parts for this assignment.
SHA-256. Implement SHA-256 in sha256.py. You should look at the pseudocode on Wikipedia for the algorithm. Your program should take a single string as an argument and print the SHA-256 hash of the string as a sequence of hex characters followed by a newline and nothing else.
You must implement the algorithm itself. You may not use a library (e.g., Python’s standard hashlib) or an external program to perform the computation for you. To check your answer, you can echo -n a string and pipe it to sha256sum.
$ ./sha256.py 'Crypto is complicated'
b13ec1838eada3f2908c144f54ead46cf76277b6e0afa847763f7f5cdd610222
$ echo -n 'Crypto is complicated' | sha256sum
b13ec1838eada3f2908c144f54ead46cf76277b6e0afa847763f7f5cdd610222 -
To make the assignment easier (especially part 2), I recommend implementing SHA-256 by making use of several helper functions.
padding(num_bytes) takes a number of bytes and returns a bytes object that contains the appropriate padding for a num_bytes message;rotate_right(x, bits) that returns a right rotation of x by bits (don’t forget to mask your result to 32-bits); andprocess_chunk(md, chunk) which updates the eight SHA-256 state variables (stored in the 8-element list md) after running the compression function on chunk.
I’ve provided a hexdigest(md) function for you that takes the md list and concatenates the big-endian hexadecimal representation of the state variables.
MAC forgery. In this part, you are going to produce a MAC forgery. The provided badmac.py file implements a simple (and bad) message authentication code, BADMAC(key, msg) = SHA-256(key || msg).
Your task is to write a function create_forgery(oracle) in forger.py. The one argument, oracle, can be called as oracle(msg) to create a MAC for the msg argument. You may query the oracle as many times as you wish, passing whatever messages you want and getting their corresponding MACs. Afterward, you need to return a pair (msg, mac) where msg has not been used as a parameter to oracle and mac is a valid MAC (represented as a lowercase hex string) for msg. That is, oracle(msg) == mac must be True.
Running ./badmac.py will call your create_forgery function and pass it an oracle containing a random 16-byte key. After the function returns, it will print out some information. Here is an example (with the forged string redacted).
$ ./badmac.py
Made 10 distinct oracle call(s)
Key: 5ac68d18f108467317aa7b2e220e2467
Forgery: b'<redacted>'
Forged MAC: dc6f61981bba914df332f8225e18a415cd668812208cd2baa8be0c304cd9d070
MAC: dc6f61981bba914df332f8225e18a415cd668812208cd2baa8be0c304cd9d070
Success
You can produce a successful forgery using fewer than 10 calls to the oracle, but you’re unlikely to be successful if you make zero calls.
The create_forgery function must not write anything to the terminal or to a file, nor read any files, nor make any attempt to extract the key from the oracle or in any other object.
Hint: Think about the way SHA-256 works by processing one chunk at a time. BADMAC(key, msg) is going to concatenate the 16-byte key with the message and hash that. SHA-256 is going to pad that out until the whole thing is a multiple of 64-bytes (512-bits). Starting with the md list set to the initialization vector, SHA-256 is going to process each 64-byte chunk in order, updating md after each one. The final hash value is what you get from concatenating the 8 elements of md. If we view the output of the oracle call as md (that is, convert from a hex string to a list of 8 integers), you can process additional chunks.
MD5 collision. Creating an MD5 collision is now a fast process. Your task is to create a pair of Python files, hello.py and goodbye.py, that have the same MD5 hash but when run produce different output. hello.py should print Hello! and goodbye.py should print Goodbye!. The two programs must not rely on their names (e.g., they may not examine sys.argv) or any aspect of their environment (including the presence or absence of files) to have different behavior.
$ md5sum hello.py goodbye.py
1721ab6cb96d9e43ce0e96486f898754 hello.py
1721ab6cb96d9e43ce0e96486f898754 goodbye.py
$ sha256sum hello.py goodbye.py
b48f689c3a79f64c0924be53117bc074a4c1987fb7705de4c0b4c895b6c495af hello.py
cc5169383251dafc9319ce73d161854a2d79379db24a900b3ede95d7d25d1460 goodbye.py
$ ./hello.py
Hello!
$ ./goodbye.py
Goodbye!
Your MD5 and SHA-256 hashes will be different from mine.
To assist you in this process, you can start with these two, different files that have the same MD5 hash.
#!/usr/bin/env python3
# This is a very special string
x = b"""
6gJ|SQKGv}HhJ[vs,QJUaow>VVqmrXVet{n9:CTmXk~Mo_WqBW#p:##pr&<0l;W|v+(-jCmzJC,74GmE3PJHKoQt0mMuU_g&pQ?kG>i99dG2U0{2R2lF####&###>,_m
"""
#!/usr/bin/env python3
# This is a very special string
x = b"""
6gJ|SQKGv}HhJ[vs,QJUasw>VVqmrXVet{n9:CTmXk~Mo_WqBW#p:##pr&<0l;W|v+(-jCmzJC,74GmE3PJHKoQt0mMuU_g&pQ?kG>i99dG2U0{2R2lF####&###>,_m
"""
If you save these as 5-line files (including the trailing newline), they should have the same MD5 hash value of 1696045fb5e33f6138a6e745c5e82b00.
Like SHA-256, MD5 uses the Merkle–Damgård construction. You should use that fact to produce hello.py and goodbye.py.
If you’d like, you may start with different files. I used Marc Stevens’ hashclash to generate them. It took a few hours to run.