Project 4: Crypto Fun

Due: Monday, December 11, 2017, 23:59.

Goal

The goal of this project is to learn about some crypto basics.

Collaboration

You may work on this project in collaboration with a single partner as described on the main page.

You must not discuss the project with anyone other than your partner and course staff. You may use online resources for general reference, but not to search for solutions to specific questions posed in this project.

The Environment

Your Python 2.7 code should run on bertvm or ernievm.

The Assignment

There are three parts for this assignment.

  1. SHA-256. Implement SHA-256 in sha256.py. You should look at the pseudocode on Wikipedia for the algorithm. Your program should take a single string as an argument and print the SHA-256 hash of the string as a sequence of hex characters followed by a newline and nothing else.

    You must implement the algorithm itself. You may not use a library (e.g., Python’s standard hashlib) or an external program to perform the computation for you. To check your answer, you can echo -n a string and pipe it to sha256sum.

    $ ./sha256.py 'Crypto is complicated'
    b13ec1838eada3f2908c144f54ead46cf76277b6e0afa847763f7f5cdd610222
    $ echo -n 'Crypto is complicated' | sha256sum
    b13ec1838eada3f2908c144f54ead46cf76277b6e0afa847763f7f5cdd610222  -
    

    To make part 2 of the assignment easier, I recommend implementing SHA-256 by making use of several helper functions pad_message which takes a string and returns a properly padded copy of it and process_chunk which updates the internal state after processing a 512-bit chunk. (See the pseudocode for the chunk processing.)

  2. MAC forgery. In this part, you are going to produce a MAC forgery. The provided badmac.py file implements a simple (and bad) message authentication code, BADMAC(key, msg) = SHA-256(key || msg).

    Your task is to write a function create_forgery(oracle) in forger.py. The one argument, oracle, can be called as oracle(msg) to create a MAC for the msg argument. You may query the oracle as many times as you wish, passing whatever messages you want and getting their corresponding MACs. Afterward, you need to return a pair (msg, mac) where msg has not been used as a parameter to oracle and mac is a valid MAC (represented as a lowercase hex string) for msg. That is, oracle(msg) == mac must be True.

    Running ./badmac.py will call your create_forgery function and pass it an oracle containing a random 16-byte key. After the function returns, it will print out some information. Here is an example (with the forged string redacted).

    $ ./badmac.py
    Made 10 distinct oracle call(s)
    Key:        5ac68d18f108467317aa7b2e220e2467
    Forgery:    '<redacted>'
    Forged MAC: dc6f61981bba914df332f8225e18a415cd668812208cd2baa8be0c304cd9d070
    MAC:        dc6f61981bba914df332f8225e18a415cd668812208cd2baa8be0c304cd9d070
    Success
    

    You can produce a successful forgery using fewer than 10 calls to the oracle, but you’re unlikely to be successful if you make zero calls.

    The create_forgery function must not write anything to the terminal or to a file, nor read any files, nor make any attempt to extract the key from the oracle or in any other object.

  3. MD5 collision. Creating an MD5 collision is now a fast process. Your task is to create a pair of Python files, hello.py and goodbye.py, that have the same MD5 hash but when run produce different output. hello.py should print Hello and goodbye.py should print Goodbye. The two programs must not rely on their names or any aspect of their environment (including the presence or absence of files) to have different behavior.

    $ md5sum hello.py goodbye.py
    0b613ccb13e31344b9b9033f8d1db462  hello.py
    0b613ccb13e31344b9b9033f8d1db462  goodbye.py
    $ sha256sum hello.py goodbye.py
    d4f6340f80f31a98eeae16a774de1c8a71cb3b9970297711cf1fb09662345403  hello.py
    847a3580b86d318be772269c55b2ca22762a1e5ff680b8ba4679f4d653ffff87  goodbye.py
    $ ./hello.py
    Hello
    $ ./goodbye.py
    Goodbye
    

    Your MD5 hashes will be different from mine.

    To produce these files, clone the upbit/clone-fastcoll repository and build the code by typing make. You do not need to check this code in to your repository. The resulting fastcoll program takes the path of a file as an argument and produces two files md5_data1 and md5_data2 that have the same MD5 hash and both have the contents of the file as a prefix.

    For example, given a file prefix containing the single line “CS 487”, running fastcoll prefix will produce two different files with the same MD5 hash.

    $ echo 'CS 487' > prefix
    $ ./fastcoll prefix
    Generating first block: .........................
    Generating second block: S11.......................................................
    use 'md5sum md5_data*' check MD5
    $ md5sum md5_data?
    3261e0625ab45a7970cfb8f369e85277  md5_data1
    3261e0625ab45a7970cfb8f369e85277  md5_data2
    $ xxd md5_data1
    00000000: 4353 2034 3837 0a00 0000 0000 0000 0000  CS 487..........
    00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    00000040: 9260 4aaa 5098 fd42 341d 3da1 46de 2ad0  .`J.P..B4.=.F.*.
    00000050: 2ed2 8119 441b 3219 3412 7ffd 113e 12de  ....D.2.4....>..
    00000060: 1cd4 07d2 ecf4 c4ef ef45 4e62 086e 9f73  .........ENb.n.s
    00000070: ee6d 3382 fdab f6c9 153f e779 7099 eed0  .m3......?.yp...
    00000080: 44b4 c73f 95a7 a575 4298 eeb4 34cd 02a5  D..?...uB...4...
    00000090: 7be2 1a19 f3c9 74de 7607 cc4b 4f5d 45c1  {.....t.v..KO]E.
    000000a0: ed83 acee 04ac 8237 9bc1 67f9 05c3 791f  .......7..g...y.
    000000b0: 8470 f75b 9057 6485 34e2 e3b9 f97c 5acf  .p.[.Wd.4....|Z.
    $ xxd md5_data2
    00000000: 4353 2034 3837 0a00 0000 0000 0000 0000  CS 487..........
    00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
    00000040: 9260 4aaa 5098 fd42 341d 3da1 46de 2ad0  .`J.P..B4.=.F.*.
    00000050: 2ed2 8199 441b 3219 3412 7ffd 113e 12de  ....D.2.4....>..
    00000060: 1cd4 07d2 ecf4 c4ef ef45 4e62 08ee 9f73  .........ENb...s
    00000070: ee6d 3382 fdab f6c9 153f e7f9 7099 eed0  .m3......?..p...
    00000080: 44b4 c73f 95a7 a575 4298 eeb4 34cd 02a5  D..?...uB...4...
    00000090: 7be2 1a99 f3c9 74de 7607 cc4b 4f5d 45c1  {.....t.v..KO]E.
    000000a0: ed83 acee 04ac 8237 9bc1 67f9 0543 791f  .......7..g..Cy.
    000000b0: 8470 f75b 9057 6485 34e2 e339 f97c 5acf  .p.[.Wd.4..9.|Z.
    

    If you look closely at the contents of the two files, you can see that they are different but both start with 4353 2034 3837 0a which is CS 487 followed by a new line.

    One thing that stands out is that there’s a bunch of binary data which is going to be a little hard to incorporate into a Python program. Fortunately, you can tell Python that the source file is UTF-8 by including a comment at the beginning of the file and then trying to constrain all of this binary data in a Python string.

    Start by creating a file containing the following four lines.

    #!/usr/bin/env python2
    # -*- coding: utf-8 -*-
       
    MSG = r"""
    

    Next, run fastcoll on this file producing md5_data1 and md5_data2. Using those two files, produce hello.py and goodbye.py with the behavior specified above.

    You’ll want to append at least """ to them in order to end the string. Note that the binary data produced may not produce a valid string. If that happens, you’ll get errors from Python about the """ not being terminated. In that case, rerun fastcoll to produce different collisions. It should only take a few tries to get some that work.

    Make sure you add and commit hello.py and goodbye.py to your repo.

Deliverables