
How I Accidentally Broke My Own AES-CTR Encryption

Using a cryptographic algorithm the wrong way is really unsafe.

Why I started looking into it

A few days ago I wrote a very simple encryption program in Python. It used AES-128-CTR from tinyaes, and because setting up an IV seemed like a hassle, I did not bother doing it properly.

After that, I spent several days searching for information about IVs in AES. I kept wondering what the IV was actually for.

While reading about the five common block cipher modes used with AES (ECB, CBC, CFB, OFB, and CTR), I found plenty of explanations for ECB and CBC, but the other modes were often described vaguely. The points everyone made very clearly were: ECB is unsafe, reusing the same key and IV is unsafe, and in CTR mode that reuse can completely destroy confidentiality 😂.

That sounded like a direct rejection of the little program I had just written. But most explanations stopped right there. They would say “confidentiality is lost,” but not explain exactly how it is lost. So I had to think it through myself 😓.

Figuring out what goes wrong

First I compared the five modes and noticed that CTR mode is different. Unlike ECB and CBC, it never feeds the plaintext itself into AES. Instead, it uses AES to encrypt a sequence of counter blocks that start from the IV, which produces a keystream, and then XORs that keystream with the plaintext to get the ciphertext.

That is why encryption and decryption can use the same operation: XORing the ciphertext with the same keystream again gives back the plaintext. In that sense, it really is XOR encryption 😂.
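To make that structure concrete, here is a minimal sketch of CTR mode using only the Python standard library. It is not real AES: the hypothetical `block_function` stands in for AES encrypting one 16-byte counter block, and is built from SHA-256 purely for illustration. `ctr_xcrypt` is also a made-up name, not tinyaes's API. The point is the shape: counters derived from the IV become a keystream, the keystream is XORed with the data, and the same call both encrypts and decrypts.

```python
import hashlib
import os

BLOCK = 16  # AES block size in bytes


def block_function(key: bytes, counter_block: bytes) -> bytes:
    """Hypothetical stand-in for AES(key, counter_block); NOT real AES."""
    return hashlib.sha256(key + counter_block).digest()[:BLOCK]


def ctr_xcrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """XOR data with the keystream generated from counters starting at iv."""
    counter = int.from_bytes(iv, "big")
    out = bytearray()
    for offset in range(0, len(data), BLOCK):
        # "Encrypt" the current counter value to get one block of keystream
        counter_block = (counter % (1 << 128)).to_bytes(BLOCK, "big")
        keystream = block_function(key, counter_block)
        chunk = data[offset:offset + BLOCK]
        out += bytes(d ^ k for d, k in zip(chunk, keystream))
        counter += 1
    return bytes(out)


key, iv = os.urandom(16), os.urandom(16)
ciphertext = ctr_xcrypt(key, iv, b"attack at dawn, not at dusk")
# Running the same function again with the same key and IV decrypts
print(ctr_xcrypt(key, iv, ciphertext))  # b'attack at dawn, not at dusk'
```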

Then I searched for whether XOR encryption is safe. Many people said it is a very unsafe encryption method, something only beginners or people who do not understand cryptography would use. Coincidentally, I am exactly someone who does not understand cryptography 😂. But again, they often did not explain clearly why XOR encryption is unsafe, which was honestly frustrating.

Since the explanations were missing, I tried deriving it myself.

The warning about CTR is specifically about encrypting different data with the same key and IV. The encryption is based on XOR, so it can be described like this:

K = AES(key, iv)

C1 = K xor P1

C2 = K xor P2

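Here, K is the keystream that AES generates from the key and IV (in CTR mode, by encrypting successive counter blocks starting from the IV), C is ciphertext, and P is plaintext. With the same key and IV, K is exactly the same every time.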

Then I wondered: what happens if I calculate C1 xor C2?

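Using the rules of XOR (it is associative and commutative, and K xor K = 0), the keystream cancels out:

C1 xor C2 = (K xor P1) xor (K xor P2) = (K xor K) xor (P1 xor P2) = P1 xor P2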

In this situation, if you know either P1 or P2, you can XOR it with that result and recover the other plaintext:

(C1 xor C2) xor P1 = P2

So as long as you know one plaintext-ciphertext pair encrypted with the same key and IV, you can crack the other ciphertexts encrypted the same way.
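A small standard-library demonstration of that claim. It assumes the reused keystream can be modeled as random bytes, standing in for the AES-CTR output of one fixed key and IV; `xor_bytes` is a hypothetical helper. The "attacker" part only touches `c1`, `c2`, and the known `p1`.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


p1 = b"Meeting moved to 3pm"  # plaintext the attacker happens to know
p2 = b"Vault code is 482915"  # secret plaintext, same length as p1

# Model of the mistake: both messages are XORed with the SAME keystream,
# as happens when AES-CTR is run twice with the same key and IV.
keystream = os.urandom(len(p1))
c1 = xor_bytes(keystream, p1)
c2 = xor_bytes(keystream, p2)

# Attacker's view: only c1, c2 and the known p1.
# The keystream cancels: C1 xor C2 = P1 xor P2.
assert xor_bytes(c1, c2) == xor_bytes(p1, p2)
recovered = xor_bytes(xor_bytes(c1, c2), p1)
print(recovered)  # b'Vault code is 482915'
```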

Based on that idea, I wrote a Python program to recover a plaintext using a known plaintext-ciphertext pair:

import sys
from pathlib import Path

if len(sys.argv) != 4:
    sys.exit(f"Usage: {sys.argv[0]} [enc_file1] [enc_file2] [plain_file2]")

# enc_file1: the ciphertext to crack; enc_file2 + plain_file2: a known pair
enc1, enc2, plain2 = (Path(p).read_bytes() for p in sys.argv[1:4])

# P1 = C1 xor C2 xor P2. zip() stops at the shortest input, so the output
# is only as long as the shortest of the three files.
crack1 = bytes(c1 ^ c2 ^ p2 for c1, c2, p2 in zip(enc1, enc2, plain2))

Path(sys.argv[1] + ".crack").write_bytes(crack1)

After writing it, I realized it had about the same number of lines as the encryption program 😂.

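I tried it, and it really could recover the plaintext behind another ciphertext without knowing the password at all. There was one limit: when the file I wanted to crack was longer than the known plaintext-ciphertext pair, the recovered plaintext was only correct up to the length of the known pair. That follows from the derivation, not just from the program: the known pair only reveals the keystream bytes it covers (K = C xor P over that length), and every byte past that point was XORed with keystream that was never exposed. On top of that, the program uses zip(), which stops at the shortest input.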
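A short sketch of that length limit, using only the standard library. The reused keystream is modeled as random bytes (a stand-in for AES-CTR output under one fixed key and IV, not real AES), and `xor_bytes` is a hypothetical helper.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


known_plain = b"short known text"  # 16 bytes the attacker knows
secret = b"a much longer secret message that keeps going"

# Stand-in for one AES-CTR keystream reused for both files
keystream = os.urandom(max(len(known_plain), len(secret)))
known_cipher = xor_bytes(keystream, known_plain)
secret_cipher = xor_bytes(keystream, secret)

# The known pair leaks only len(known_plain) bytes of keystream
leaked = xor_bytes(known_cipher, known_plain)
recovered = xor_bytes(secret_cipher, leaked)

print(len(leaked), recovered)  # 16 b'a much longer se'
```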

But anyway, it was enough to prove that my earlier encryption program had indeed “completely lost confidentiality.”

Fixing the program

To fix the problem in the previous encryption program, I decided to behave properly this time and use the IV according to the official instructions.

Here is the rewritten version:

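import hashlib
import os
import sys

import tinyaes

if len(sys.argv) != 3:
    sys.exit(f"Usage: {sys.argv[0]} [filepath] [key]")

# Files ending in ".enc" are decrypted; anything else is encrypted
enc = len(sys.argv[1]) > 4 and sys.argv[1].endswith(".enc")

with open(sys.argv[1], "rb") as orig:
    # Encrypting: fresh random 16-byte IV. Decrypting: read the IV stored at the start
    iv = orig.read(16) if enc else os.urandom(16)
    # MD5 of the password gives a 16-byte AES-128 key (a fast hash, not a dedicated password KDF)
    cipher = tinyaes.AES(hashlib.md5(sys.argv[2].encode()).digest(), iv)
    with open(sys.argv[1][:-4] if enc else sys.argv[1] + ".enc", "wb") as targetfile:
        if not enc:
            targetfile.write(iv)  # prepend the IV so decryption can recover it
        # 4096 is a multiple of the 16-byte AES block size, so chunks stay block-aligned
        for byte_block in iter(lambda: orig.read(4096), b""):
            targetfile.write(cipher.CTR_xcrypt_buffer(byte_block))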

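This time, I stored the randomly generated IV at the beginning of the file. That is fine because the IV does not have to be secret; it only has to be different each time the same key is used. Because the IV now lives in the file, I could no longer use exactly the same process for both encryption and decryption, so I used the .enc suffix to distinguish plaintext files from encrypted files.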
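A minimal sketch of the new file layout (16 random IV bytes followed by the ciphertext), and of why it defeats the earlier attack. It uses only the standard library: the hypothetical `keystream` function is a SHA-256-based stand-in for tinyaes's AES-CTR, not real AES, and `encrypt_file_bytes` / `decrypt_file_bytes` are illustrative names. With a fresh IV per file, the two keystreams differ, so (C1 xor C2) xor P1 no longer gives P2.

```python
import hashlib
import os

BLOCK = 16


def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Hypothetical stand-in for AES-CTR keystream output (NOT real AES)."""
    start = int.from_bytes(iv, "big")
    out = b""
    i = 0
    while len(out) < length:
        counter_block = ((start + i) % (1 << 128)).to_bytes(BLOCK, "big")
        out += hashlib.sha256(key + counter_block).digest()[:BLOCK]
        i += 1
    return out[:length]


def encrypt_file_bytes(key: bytes, plaintext: bytes) -> bytes:
    """Return IV || ciphertext, mirroring the layout of the fixed program."""
    iv = os.urandom(BLOCK)
    ks = keystream(key, iv, len(plaintext))
    return iv + bytes(p ^ k for p, k in zip(plaintext, ks))


def decrypt_file_bytes(key: bytes, blob: bytes) -> bytes:
    """Split off the stored IV, then XOR the rest with the same keystream."""
    iv, ciphertext = blob[:BLOCK], blob[BLOCK:]
    ks = keystream(key, iv, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, ks))


key = hashlib.md5(b"password").digest()  # same key for both files
p1, p2 = b"Meeting moved to 3pm", b"Vault code is 482915"
f1, f2 = encrypt_file_bytes(key, p1), encrypt_file_bytes(key, p2)

assert decrypt_file_bytes(key, f2) == p2
# The old trick: (C1 xor C2) xor P1, using only the ciphertext parts
guess = bytes(a ^ b ^ c for a, b, c in zip(f1[BLOCK:], f2[BLOCK:], p1))
print(guess == p2)  # expected False: each file used its own random IV
```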

Also, the number of lines almost doubled 😂.

What I learned

For subjects I do not understand very well, it is probably better to follow the manual properly. Pretending to understand something usually means the result will not behave the way I expected.