This is my write up of the third challenge of the Cryptopal’s challenges, using Python3 as my language of choice. The challenge:
Single-byte XOR cipher
The hex encoded string:
1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736
… has been XOR’d against a single character. Find the key, decrypt the message.
You can do this by hand. But don’t: write code to do it for you.
How? Devise some method for “scoring” a piece of English plaintext. Character frequency is a good metric. Evaluate each output and choose the one with the best score.
Achievement Unlocked
You now have our permission to make “ETAOIN SHRDLU” jokes on Twitter.
I tackled this problem in two stages: 1) Implement single-byte XOR against the hex encoded string; 2) Implement the English language scoring system.
Single-byte XOR
This function is similar to the function I wrote about in challenge two. The function takes two inputs, a byte string and a single integer character value (0-255). For each byte in the byte string, the byte will be XOR’d with that value, and the output will be added to a new byte string. The new byte string will be returned after all input bytes have been processed. Here is the function in the interpreter, which I send in a for loop where the variable ‘i’ will be passed as an argument to the XOR function representing the single character value to XOR against:
Most of the XOR’d return values look like just a bunch of garbled nonsense but the correct value should produce a plaintext value. With a bit of visual grep I spotted the plaintext:
The second step is to write code to replace visual grep.
English language scoring system
My solution for a scoring system was to figure out the frequencies that characters occur in English. Once again, a Wikipedia article provided mostly what I needed. I translated the letter frequency table from the article into a dictionary, and for each character of input data I built a score for the data. Here’s the function:
Let me break down what I am returning. It is equivalent to the following code:
# Initialize variable to store each value of each character.
scores = []
# Iterate over each character and make each character lowercase
for byte in input_bytes.lower()
# Change the byte value to a string, and look up the
# character in the character_frequencies variable. If the
# character doesn’t exist in the dictionary, make the value 0.
score = character_frequencies.get(chr(byte), 0)
# Add the score to the list of scores
scores.append(score)
# Sum and return the score
return sum(scores)
Putting it all together
Here is my solution for this challenge:
def get_english_score(input_bytes):
"""Compares each input byte to a character frequency
chart and returns the score of a message based on the
relative frequency the characters occur in the English
language
"""
# From https://en.wikipedia.org/wiki/Letter_frequency
# with the exception of ' ', which I estimated.
character_frequencies = {
'a': .08167, 'b': .01492, 'c': .02782, 'd': .04253,
'e': .12702, 'f': .02228, 'g': .02015, 'h': .06094,
'i': .06094, 'j': .00153, 'k': .00772, 'l': .04025,
'm': .02406, 'n': .06749, 'o': .07507, 'p': .01929,
'q': .00095, 'r': .05987, 's': .06327, 't': .09056,
'u': .02758, 'v': .00978, 'w': .02360, 'x': .00150,
'y': .01974, 'z': .00074, ' ': .13000
}
return sum([character_frequencies.get(chr(byte), 0) for byte in input_bytes.lower()])
def single_char_xor(input_bytes, char_value):
"""Returns the result of each byte being XOR'd with a single value.
"""
output_bytes = b''
for byte in input_bytes:
output_bytes += bytes([byte ^ char_value])
return output_bytes
def main():
hexstring = '1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736'
ciphertext = bytes.fromhex(hexstring)
potential_messages = []
for key_value in range(256):
message = single_char_xor(ciphertext, key_value)
score = get_english_score(message)
data = {
'message': message,
'score': score,
'key': key_value
}
potential_messages.append(data)
best_score = sorted(potential_messages, key=lambda x: x['score'], reverse=True)[0]
for item in best_score:
print("{}: {}".format(item.title(), best_score[item]))
if __name__ == '__main__':
main()