Saturday, November 30, 2019

2.2 Birthday Paradox, or, When Should I Change my Encryption Keys?

You're probably familiar with the basic idea of the birthday paradox: If you have $n$ people in a room, what's the probability of two of them sharing a birthday?  More concretely, how many people have to be in the room before the probability of two sharing a birthday exceeds 50%?  It's a surprisingly small number, but it's not a true paradox in the logical sense.  (This is also called the pigeonhole principle or hash collision probability.)

Modern ciphers are designed so that the ciphertext (output of the encryption) looks random; we can say that the ciphertext has very high entropy.  If the block size is $n$ bits, each of the $2^n$ possible bit combinations should be an equally likely result of encrypting a data block.  (The encryption is deterministic, but the output looks random.)

If we monitor a long stream of encrypted data, there is some probability that we will find two blocks that have the same ciphertext.  We call that a collision.  If the two blocks were encrypted with the same key, we gain information about the plaintext.

(Side question: How can you have birthday paradox collisions of the ciphertext?  Wouldn't that mean that the encryption process is lossy, and that it would be impossible to reliably decrypt that ciphertext back to both original blocks?

Answer: there is more information involved in both the encryption and decryption than just that specific block and the key.  This is the cipher block chaining mentioned at the end of Sec. 1.)

In CBC mode, what you gain is the XOR of two plaintext blocks.  If you get to choose which two, this could be incredibly valuable information; given that it's just two random blocks, it's less so, but not unimportant.  As it happens, if block numbers $i$ and $j$ have the same ciphertext $c[i] = c[j]$, then you get the XOR of the plaintext of blocks $i-1$ and $j-1$.

I spent quite a bit of time working through the math of the birthday paradox, only to come back to one of the first sources I found: the 2016 SWEET32 paper by Bhargavan and Leurent.  Sec. 2.2 of that paper has a compact description of the commonly-quoted square root limit, as well as why and how much to be conservative relative to that value.

If the block size is $n$ bits, there are $N = 2^n$ possible ciphertexts, and if the number of blocks encrypted is the square root of that number, $2^{n/2}$, the probability of at least one collision is above 39%, which arises in the limit as the expression $1-1/e^{1/2}$.  (See the appendix of these notes for some example calculations of this and the following to play around with.)

Assuming you consider a 39% chance of disclosing some information to be an unacceptably large probability, when should you rekey?  That's the first-level answer to our ultimate question of when to rekey using QKD, which is of course back where we started this conversation.  Say, for example, we want the probability of an attacker recovering a plaintext block using the birthday attack to be less than (for example) one in a billion.

If we have some power of two $D = 2^d$ ciphertext blocks, then the expected number of collisions is approximately $2^{2d-n-1}$.  For our one-in-a-billion probability, using $log_2(10^9) \approx 30$, we need to set up our session lifetime such that $2d-n-1 < -30$, or $d < (n-29)/2$.

Since DES has only a 64-bit block, we should set $d \le 17$.  That's a startlingly small number: $D = 2^{17}$ blocks and each block is eight bytes, so we should limit the use of a single key to one megabyte! An impractically small size.

If, on the other hand, you are okay with a disclosure probability of one in a million, we can raise that by a factor of a thousand and change keys once every gigabyte instead.  But if you are trying to protect a 40Gbps link -- a common backbone bandwidth today, and coming to the home in the foreseeable future -- that still means changing keys once every 200msec or so!

Ah, but AES uses $n = 128$, a block size twice as large.  Now we only need $d \le 49$ for that one-in-a-billion probability.  $D = 2^{49}$ times our block size of 16 bytes equals 8 terabytes, about 1,600 seconds on our 40Gbps link.  So change the keys at least once every half hour, and you're good.

Keep in mind a few points:
  1. what's disclosed here is not the key but is plaintext-related, but not even pure plaintext; however, the insanely smart and devious cryptanalysts can do amazing things with small amounts of data;
  2. this depends only on the block size of a block cipher, not on the key length;
  3. part of this arises due to the CBC (cipher block chain) mode, but ECB (electronic code book) mode is worse; there are other modes that I haven't looked at closely; and
  4. given that there are still other attacks, minimizing the amount of data under any given encryption key is still good practice.
Next, let's look at a couple of the cryptanalysis techniques.

(If you missed it, you can go back to the beginning of What Every Quantum Researcher and Engineer Should Know about Classical Cryptography.)

No comments: