Cryptopals challenge 3: Single-byte XOR cipher in Python

This is my write up of the third challenge of the Cryptopal’s challenges, using Python3 as my language of choice. The challenge:

Single-byte XOR cipher

The hex encoded string:

1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736

… has been XOR’d against a single character. Find the key, decrypt the message.

You can do this by hand. But don’t: write code to do it for you.

How? Devise some method for “scoring” a piece of English plaintext. Character frequency is a good metric. Evaluate each output and choose the one with the best score.

Achievement Unlocked

You now have our permission to make “ETAOIN SHRDLU” jokes on Twitter.

I tackled this problem in two stages: 1) Implement single-byte XOR against the hex encoded string; 2) Implement the English language scoring system.

Single-byte XOR

This function is similar to the function I wrote about in challenge two. The function takes two inputs, a byte string and a single integer character value (0-255). For each byte in the byte string, the byte will be XOR’d  with that value, and the output will be added to a new byte string. The new byte string will be returned after all input bytes have been processed. Here is the function in the interpreter, which I send in a for loop where the variable ‘i’ will be passed as an argument to the XOR function representing the single character value to XOR against:

Most of the XOR’d return values look like just a bunch of garbled nonsense but the correct value should produce a plaintext value. With a bit of visual grep I spotted the plaintext:

The second step is to write code to replace visual grep.

English language scoring system

My solution for a scoring system was to figure out the frequencies that characters occur in English. Once again, a Wikipedia article provided mostly what I needed. I translated the letter frequency table from the article into a dictionary, and for each character of input data I built a score for the data. Here’s the function:

Let me break down what I am returning. It is equivalent to the following code:

# Initialize variable to store each value of each character. 
scores = []
# Iterate over each character and make each character lowercase
for byte in input_bytes.lower()
    # Change the byte value to a string, and look up the 
    # character in the character_frequencies variable. If the
    # character doesn’t exist in the dictionary, make the value 0.
    score = character_frequencies.get(chr(byte), 0)
    # Add the score to the list of scores
    scores.append(score)
# Sum and return the score
return sum(scores)

Putting it all together

Here is my solution for this challenge:

def get_english_score(input_bytes):
    """Compares each input byte to a character frequency 
    chart and returns the score of a message based on the
    relative frequency the characters occur in the English
    language
    """

    # From https://en.wikipedia.org/wiki/Letter_frequency
    # with the exception of ' ', which I estimated.
    character_frequencies = {
        'a': .08167, 'b': .01492, 'c': .02782, 'd': .04253,
        'e': .12702, 'f': .02228, 'g': .02015, 'h': .06094,
        'i': .06094, 'j': .00153, 'k': .00772, 'l': .04025,
        'm': .02406, 'n': .06749, 'o': .07507, 'p': .01929,
        'q': .00095, 'r': .05987, 's': .06327, 't': .09056,
        'u': .02758, 'v': .00978, 'w': .02360, 'x': .00150,
        'y': .01974, 'z': .00074, ' ': .13000
    }
    return sum([character_frequencies.get(chr(byte), 0) for byte in input_bytes.lower()])


def single_char_xor(input_bytes, char_value):
    """Returns the result of each byte being XOR'd with a single value.
    """
    output_bytes = b''
    for byte in input_bytes:
        output_bytes += bytes([byte ^ char_value])
    return output_bytes


def main():
    hexstring = '1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736'
    ciphertext = bytes.fromhex(hexstring)
    potential_messages = []
    for key_value in range(256):
        message = single_char_xor(ciphertext, key_value)
        score = get_english_score(message)
        data = {
            'message': message,
            'score': score,
            'key': key_value
            }
        potential_messages.append(data)
    best_score = sorted(potential_messages, key=lambda x: x['score'], reverse=True)[0]
    for item in best_score:
        print("{}: {}".format(item.title(), best_score[item]))

if __name__ == '__main__':
    main()

Leave a Reply

Your email address will not be published. Required fields are marked *