Crystalline Cipher and cryptography snakeoil

Martijn Grooten on Twitter (with his signature sarcastic undertone) posed a weekend challenge for anyone interested in cryptography: have fun with Crystalline Cipher and show that it is fundamentally broken. Now, I've met with Martijn only once, but from that meeting I gathered that he really loves mathematics, at least as much as I do, and has a great understanding of cryptography. This whole business with Crystalline started off with the discussion on irtf.org, when the cipher author was convincing everyone that he discovered the best cipher in the world.  But to Martijn, mailing list members and me the cipher has a smell of snakeoil. Since this weekend turned out to be rather free for me, I decided to have a go at cryptoanalysis.

Mind you, I'm not in any way formally trained in cryptography. I've had two one semester long cryptography courses - one from more mathematical standpoint and the other one from more technical standpoint. The most complex cryptographic thing that I can do is probably doing an RSA encryption using pen, paper and simple calculator. So, I'm not in any way qualified to do cipher assessments and I don't know virtually anything on how to create ciphers. However, putting it the other way: if even I can see problems with a cipher, that means that there's something wrong with it.

Crystalline Cipher - the pitch

In the words of an author:
Cystalline Cipher is an experimental new type of symmetric cipher. Designed to be as secure as a one-time pad, with cipher text randomness approaching (or exceeding) random sources based upon radioactive decay, Crystalline hopes to set a new standard in security.
Well, there are several things wrong with this statements, and let's start with a little bit of history. If you heard about one-time pads just skip the next section (really, just skip it starting from here...).

One-time pads

Is there a cipher that cannot be broken? Some of you may know that the answer to that question is "yes". Nobody can break that cipher, not the NSA, not Five Eyes, not even all of the countries in the world combined. And that cipher is called one-time pad. Basically the idea is that you randomly generate key which is the same length as the plaintext. Then, you combine the key with plaintext using XOR operation (or letter shifting if you're doing it on paper).

Let's analyze this cipher and see what interesting properties it has. If you encrypt any message using a random key of the same length it's pretty easy to show that the result will be a random string. This is the most sought-after property of any cipher: truly random output, which does not (directly) depend on the plaintext. Of course, if you have the key, then you can decrypt the ciphertext. But if you have the wrong key, there isn't a way to know it. You can still try to decrypt a message and end up with a different message then the original. In case of the one-time pad you can end up with ANY message that you want, if you use the right key.

This is what (intuitively) makes this cipher mathematically unbreakable. So, why don't we all just use one-time pad and get rid of RSA, AES, RC4 and all the other ciphers? Because one-time pad is hugely impractical. You have to have key of the same length as the message you want to encode. This means that you have to find a secure way to transfer the key - which is the same length as the message. So, if you have that channel ready at all times, why not just transfer the message using that channel?

Of course you can transfer the key beforehand. Some spy agencies did it. However, then you have to store the copy of that key with you and have it ready for encryption. Which again poses some threat of security violations - someone may steal it from you and decrypt you past messages. And when you try to encrypt file using one-time pad you have to have twice as much storage for it to work. So, that's why we use not perfectly secure, but more practical solutions - AES, RSA and others.

(... to here)

So let's dissect the statements made by the author about the Crystalline cipher. Designed to be as secure as a one-time pad - well, clearly that means that there has to be a mathematical proof of it being unbreakable. Otherwise, it's just cheap marketing talk. These kind of statements always throw me off, because they prove that the author does not have the faintest idea about the profound role of one-time pad in the cryptography - the subject that he tries to represent. 

Next statement - cipher text randomness approaching (or exceeding) random sources based upon radioactive decay - does also show that the author doesn't know what's he talking about. We currently believe that radioactive decay is a truly random process, hence you cannot claim that something is "more random" without annoying many physicists.

And now we come to these two statements, which are just hilarious:
Unlike many ciphers, Crystalline operates on an entire file and does not require padding operations or compression.
IMPORTANT SECURITY NOTE: Compress all files first 
Not only are they contradictory, but also there is a serious problem with your cipher if you are depended on compressing the data first.

And lastly, I don't even know what this paragraph is about:
Crystalline employs information loss as the basis of its security. This is achieved by shifting the location of data at both a binary and byte level based upon a value provided by the key and salt. As such, the end result is a cipher text that reflects the relationships found in the key and salt, rather than the data. If the key and salt is from a true random source, such as atmospheric noise or quantum fluctuations, then no relationship should be present in the location of bits and bytes in the cipher text. By multiplying the key and salt, information about both sources is also lost.

Crystalline Cipher - the algorithm

It's easy to make fun of marketing and buzzwords (look up "APT"), but let's go into more technical stuff. The algorithm looks rather simple. First you need values called "key", "salt" and "iv". First two are byte strings and the last one is integer. Then the cipher runs a number of "rounds" on plaintext. Each round is composed of two parts. Let's assume that byte(a, i) means i-th byte of string modulo string length. 

Part A (bit switch-shuffle part):
  1. For every bit in the plaintext:
    1. Switch (negate) that bit value.
    2. Swap that bit (which is at position i) with a bit whose position is byte(key, i) * byte(salt, i).
Part B (byte shuffling part):
  1. For every byte in the plaintext:
    1. Swap that byte (which is at position i) with a byte whose position is byte(key, i) * byte(salt, i).
And that's all. Number of rounds is determined using some values of key, salt and iv, but let's just assume (for the sake of simplicity) that it's just the iv value. That's an easy exercise to show that this assumption does not weaken the cipher in any way.

Crystalline Cipher - observations and weaknesses

What's the key in this cipher?

The author probably wanted to make "key" variable the actual key. However, if we take a closer look "key" is not used in its original form. Instead, we only use byte(key, i) * byte(salt, i) value, which is the "actual key". It is worth noting that if the length of key and length of salt is the same (as author recommends) the actual key is just a (byte-oriented) multiplication of that values. So there is no need to introduce "salt" value just to confuse the reader (or make it sound more scientific) and we can now assume that we have just a random key, which tells us which file positions to swap. 

This key actually has some limitations, which will be discussed later on.

Is permutation a form of encryption?

While this question may be weird at first sight, I've never thought about it before this cipher. Of course permutation can make it harder to read the text, but there's still a lot to infer from a random permutation of plaintext. You can guess the language of the plaintext. You may guess some words, provided that the messages have a constant template, which is the case for many file formats and our e-mails (Hi ..., ... Best regards, My Name). If the cipher is based mainly on permutation (as is the case with this one), there is a lot of information that can easily leak.

But this is just all a hand-waving arguments, let's get to more serious flaws.

The recommended key length: 16 KB + 16 KB

Author recommends key length of 16 KB (for "key") and 16 KB (for "salt"). That's 32 KB in total. Yes, kilobytes. The largest key length in common use today that I was able to find is 4 Kb for RC4. That's 64 times smaller! This key overhead is really significant, especially if you're trying to encrypt small files.

Every key produces different output, right?

One of the most prominent features of one-time pad (or any good cipher) is the fact that different key produces different output. This is in fact really important, because if two (or more) keys produce the same result, we can reduce the key space to brute-force. This means that having a big key size can amount to nothing, because in fact you only have a handful of ciphertexts.

Let's see what the following key will do with the plaintext:
2, 1, 4, 3, 6, 5, 8, 7, 10, 9, ...
So, the first bit will be negated and switch with the second one. Then it will be negated again and will get back at his original position. And so on... So, using that (seemingly proper) key, you'll get the same ciphertext as plaintext - no encryption at all. And, as you can prove yourself, this outcome does not depend on the number of rounds used.

Now, Do you know what will the following actual key will produce?
 3, 4, 1, 2, 7, 8, 5, 6, ...
Hmmm... I'm wondering if there is a pattern. Maybe something like this:

  1. Choose any permutation (p1, p2, ..., pn) of values 1, ..., n where n is the (even) key length.
  2. Let the key (k1, k2, ..., kn) be:
    1. k(p2*i) = k(p2*i+1) for i = 1, ..., n/2
    2. k(p2*i+1) = k(p2*i) for i = 1, ..., n/2
Hence, every permutation can produce a key that does nothing to the plaintext. That's at least n!/2^(n/2) keys that will not encrypt the plaintext. Why "at least"? There are other keys that can do that, but I'll leave it as an exercise for you.

What happens over 64KB?

Since the actual key can only tell the algorithm to swap current index with index byte(key, i) * byte(salt, i) < 65,025 so any bit (or byte) after 65,025 can only be swapped with bytes that come before that mark.

Hence every bit over that mark will be negated. If you have odd numbers of rounds, this means that above some mark most of the bits will be negation of the plaintext bits. If you have even number of rounds, above some mark most of the bits will be actual bits of the plaintext. They may be at different places of the ciphertext though. And that leads us to the final question...

Is the ciphertext really, truly random?

The author says it himself:
Thus, randomness in the ciphertext is a very strong indicator of strength in the cipher.
And, as we saw in the previous question, it may not be true. Let's assume that we have really large (e.g. 1 GB) file filled with zeros. Using the observation that we made above, if we use 1 round of cipher, we will get a lot of ones. How much? Well, all of the zeros above 64Kb will get converted to ones, no matter what key we will use. Of course, they can be in a different places in the file, but they still will be ones. That's approximately 99.9992% of the ciphertext filled with ones. That doesn't seem like "random" and certainly not  approaching (or exceeding) random sources based upon radioactive decay. 

Summary

I've learned a lot playing with that simple and obviously broken cipher. I hope you learned a lot by reading about my journey. I'd like to thank Martijn for the information about that cipher - it was just at the right level for me to learn some cryptoanalysis techniques and see how real ciphers should look like. Bottom line is that Crystalline Cipher looks like it started as a simple shuffle and then author applied patchwork at few steps to answer problems it posed to his cipher. I know that mindset from working on several projects and it is never, ever a good idea. 

Anyhow, that's my weekend story for you!

Comments

Popular posts from this blog

Having fun with AndroidManifest.xml

Android malware based on SMS encryption and with KitKat support

Android malware goes Mono (.NET) and Lua!