Saturday, November 30, 2019

2.2 Birthday Paradox, or, When Should I Change my Encryption Keys?

You're probably familiar with the basic idea of the birthday paradox: if you have $n$ people in a room, what's the probability that two of them share a birthday?  More concretely, how many people have to be in the room before the probability of two sharing a birthday exceeds 50%?  It's a surprisingly small number, but it's not a true paradox in the logical sense.  (The same mathematics governs hash collision probability, and it's closely related to the pigeonhole principle.)

Modern ciphers are designed so that the ciphertext (output of the encryption) looks random; we can say that the ciphertext has very high entropy.  If the block size is $n$ bits, each of the $2^n$ possible bit combinations should be an equally likely result of encrypting a data block.  (The encryption is deterministic, but the output looks random.)

If we monitor a long stream of encrypted data, there is some probability that we will find two blocks that have the same ciphertext.  We call that a collision.  If the two blocks were encrypted with the same key, we gain information about the plaintext.

(Side question: How can you have birthday paradox collisions of the ciphertext?  Wouldn't that mean that the encryption process is lossy, and that it would be impossible to reliably decrypt that ciphertext back to both original blocks?

Answer: there is more information involved in both the encryption and decryption than just that specific block and the key.  This is the cipher block chaining mentioned at the end of Sec. 1.)

In CBC mode, what you gain is the XOR of two plaintext blocks.  If you get to choose which two, this could be incredibly valuable information; given that it's just two random blocks, it's less so, but not unimportant.  As it happens, if blocks $i$ and $j$ have the same ciphertext, $c[i] = c[j]$, then what is revealed is the XOR of plaintext blocks $i$ and $j$, which equals the XOR of the ciphertext blocks $i-1$ and $j-1$ that the attacker already holds.
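For concreteness, here is the short derivation (a sketch using the standard CBC encryption equation, nothing specific to any particular implementation).  In CBC, $c[i] = E_K(p[i] \oplus c[i-1])$, and $E_K$ is a permutation for a fixed key, so equal outputs imply equal inputs:
$p[i] \oplus c[i-1] = p[j] \oplus c[j-1] \implies p[i] \oplus p[j] = c[i-1] \oplus c[j-1],$
and the right-hand side is ciphertext the eavesdropper has already captured.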

I spent quite a bit of time working through the math of the birthday paradox, only to come back to one of the first sources I found: the 2016 SWEET32 paper by Bhargavan and Leurent.  Sec. 2.2 of that paper has a compact description of the commonly-quoted square root limit, as well as why and how much to be conservative relative to that value.

If the block size is $n$ bits, there are $N = 2^n$ possible ciphertexts, and if the number of blocks encrypted is the square root of that number, $2^{n/2}$, the probability of at least one collision is about 39%, approaching $1-1/\sqrt{e}$ in the limit.  (See the appendix of these notes for some example calculations of this and the following to play around with.)
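As a quick sanity check, here is a small Python sketch (the function name and structure are mine, not from the SWEET32 paper) that evaluates the standard approximation $P \approx 1 - e^{-D(D-1)/2N}$ for $D$ encrypted blocks:

import math

def collision_probability(n_bits, d_log2):
    """Approximate probability of at least one collision after encrypting
    D = 2**d_log2 blocks with a block cipher whose block size is n_bits,
    using P ~= 1 - exp(-D*(D-1)/(2*N)) with N = 2**n_bits."""
    N = 2.0 ** n_bits
    D = 2.0 ** d_log2
    return 1.0 - math.exp(-D * (D - 1) / (2.0 * N))

# Encrypting 2**(n/2) blocks gives roughly a 39% chance of a collision:
print(collision_probability(64, 32))    # ~0.393 for a 64-bit block (DES)
print(collision_probability(128, 64))   # ~0.393 for a 128-bit block (AES)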

Assuming you consider a 39% chance of disclosing some information to be an unacceptably large probability, when should you rekey?  That's the first-level answer to our ultimate question of when to rekey using QKD, which is of course back where we started this conversation.  Say, for example, we want the probability of an attacker recovering a plaintext block using the birthday attack to be less than one in a billion.

If we encrypt $D = 2^d$ ciphertext blocks, then the expected number of collisions is approximately $2^{2d-n-1}$.  For our one-in-a-billion probability, using $\log_2(10^9) \approx 30$, we need to set up our session lifetime such that $2d-n-1 < -30$, or $d < (n-29)/2$.
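To see where that expression comes from (a sketch of the standard argument): with $D = 2^d$ blocks there are $\binom{D}{2} \approx D^2/2$ pairs of blocks, and each pair collides with probability $1/N = 2^{-n}$, so
$E[\text{collisions}] \approx \frac{D^2}{2N} = \frac{2^{2d}}{2 \cdot 2^n} = 2^{2d-n-1}.$
For small values this is also a good approximation to the probability of at least one collision, and requiring $2^{2d-n-1} < 2^{-30}$ gives $2d - n - 1 < -30$, i.e., $d < (n-29)/2$.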

Since DES has only a 64-bit block, we should set $d \le 17$.  That's a startlingly small number: $D = 2^{17}$ blocks and each block is eight bytes, so we should limit the use of a single key to one megabyte! An impractically small size.

If, on the other hand, you are okay with a disclosure probability of one in a million, note that the collision probability grows roughly as the square of the amount of data, so a thousand-fold looser budget buys only about a factor of 32 in data: change keys once every 32 megabytes or so instead.  But if you are trying to protect a 40Gbps link -- a common backbone bandwidth today, and coming to the home in the foreseeable future -- that still means changing keys more than a hundred times every second!

Ah, but AES uses $n = 128$, a block size twice as large.  Now we can allow $d \le 49$ for that one-in-a-billion probability.  $D = 2^{49}$ blocks times our block size of 16 bytes is $2^{53}$ bytes, about eight pebibytes, which is roughly three weeks of traffic on our 40Gbps link.  So change the keys at least once every couple of weeks, and you're good.
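The following rough calculator (my own sketch; the $2^{-30}$ budget and the 40Gbps link speed are the ones used above) reproduces these limits for both block sizes:

import math

def rekey_limit(block_bits, block_bytes, log2_max_prob, link_bps):
    """Largest d with 2d - n - 1 <= log2_max_prob, plus the corresponding
    data volume and the time it represents on a link of link_bps bits/sec."""
    d = math.floor((block_bits + 1 + log2_max_prob) / 2)
    data_bytes = (2 ** d) * block_bytes
    seconds = data_bytes * 8 / link_bps
    return d, data_bytes, seconds

# One-in-a-billion collision budget (~2**-30) on a 40 Gbps link:
for name, n, size in [("DES/3DES", 64, 8), ("AES", 128, 16)]:
    d, nbytes, secs = rekey_limit(n, size, -30, 40e9)
    print(f"{name}: d <= {d}, {nbytes:.3g} bytes, {secs:.3g} seconds")

For DES-sized blocks this prints about $10^6$ bytes and a fraction of a millisecond; for AES-sized blocks, about $9 \times 10^{15}$ bytes and $1.8 \times 10^6$ seconds.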

Keep in mind a few points:
  1. what's disclosed here is not the key, and not even pure plaintext, but plaintext-related information; however, insanely smart and devious cryptanalysts can do amazing things with small amounts of data;
  2. this depends only on the block size of the block cipher, not on the key length;
  3. part of this arises due to the CBC (cipher block chaining) mode, but ECB (electronic code book) mode is worse; there are other modes that I haven't looked at closely; and
  4. given that there are still other attacks, minimizing the amount of data under any given encryption key is still good practice.
Next, let's look at a couple of the cryptanalysis techniques.

(If you missed it, you can go back to the beginning of What Every Quantum Researcher and Engineer Should Know about Classical Cryptography.)

Quantum Information Theory, Computing, Algorithms, Networking, Engineering and Education Journals


(Edited on 20/2/29 and 20/5/19, as noted in line.)

The field of quantum computing, broadly constituted, is changing rapidly. Hundreds, nay thousands, of new people are entering the field each year right now. Eventually, things will settle out, but in the short run what is publishable research and how we publish it are going to undergo a lot of stress.  And eventually publishing will become less critical for a large number of quantum computing graduates, as they will move into the nascent quantum industry, where different metrics of success will be applied and ideas will affect society in different ways. But for now, the primary mode of work is research-oriented people creating and testing new ideas, and the standard approach to disseminating those ideas is peer-reviewed publications.

One of the problems is that architecture and engineering are neither theory nor experiment. Each is its own beast. Where do we send papers describing advances in software tools, architectural principles and evaluation, network protocols, etc.? Many of the existing physics journals are not good venues for this engineering-oriented work, but neither are most of the top ACM and IEEE journals and conferences.

Another item high on my list is how to value creation of new educational tools and techniques, e.g., our MOOC.

But before fussing too much, it's worth taking a look at what we currently have for publication venues...

Very roughly in terms of age, but grouped by organization (in short, the order in which my brain chose to produce/organize them of its own accord):
  1. Nature and Science: You know the drill. Top experimental papers only. 20 papers/year each.
  2. Physical Review Letters. Top papers, including theory and algorithms; HHL, for example. Highly compressed; I refer to a PRL as "the cryptographic hash of a longer paper still to come".
  3. Physical Review A. Has been an excellent venue for almost anything quantum (my first quantum arithmetic paper, for example, and recent engineering-oriented work on quantum networks), and has explicitly added to its charter since the founding of APS's Division of Quantum Information, but is overwhelmed; acceptance rate is declining and topics are narrowing.
  4. Physical Review X. Cody's architecture paper was explicitly described by the editors as, "Hey, this isn't really physics in the traditional sense, but we think it's cool." (Well, that's a liberal paraphrase.) (At almost 250 citations, according to Scholar, good call on their part if they value citations to their journal.)
  5. IOP New Journal of Physics. An excellent venue, longer papers than PRL but good visibility. But seems to draw from same reviewer and reader pool as PRA.
  6. IOP Quantum Science and Technology. Looks to be an excellent new venue, and quantum-focused, which is a win, and we're happy with it, but seems to draw from same reviewer and reader pool as PRA, at least so far. Hopefully engineering-oriented papers will continue to find a home here.
  7. Quantum Information and Computation (QIC). Has been the journal of record for much quantum algorithms work, less so for experimental work and engineering. Subscriptions are expensive, but papers can be put on the arXiv.
  8. Quantum Information Processing (QIP). Along with QIC, a top journal, with more experimental work. Also expensive, also arXiv-friendly.
  9. International Journal of Quantum Information (IJQI). Gets less attention than QIC and QIP. Also arXiv-friendly.
  10. ACM Journal of Emerging Technology in Computing Systems (JETC). One of my favorite journals. I spent two years as an associate editor, and wouldn't mind doing another stint here. arXiv-friendly. Not quantum only, but definitely has a good quantum component. Will be interesting to see if that changes with the introduction of...
  11. ACM Transactions on Quantum Computing (TQC). If the quality of a review board is an indicator of the quality of a journal, destined to be the top thing in our field. True rock stars, and great breadth. Will probably be limited to a few dozen high-impact papers a year?
  12. npj Quantum Information (npj QI). Aims to be a top journal. Open access, which is great, but the author fees are outrageous. I won't pay them.
  13. Nature Communications and Scientific Reports. In my limited experience, erratic in review quality and editorial procedures, and no help on formatting and copy editing, but still have the outrageous Nature open access fees. A worse deal all the way around than npj QI.
  14. Science Advances. I haven't interacted with this yet at all. Also has high open access fees.
  15. Quantum. The new star in our firmament. An overlay journal dependent on the arXiv. Outstanding, young editorial board, this is the cool coffee shop where all the hipsters hang out. Go here to be seen and to hear the newest tunes!
  16. IEEE Transactions on Quantum Engineering. I have heard only rumors of this and either my Google-fu is poor, or there's nothing about it on the web. Could be good, but unless it's an exception to the truly egregious average IEEE journal latency, not useful for students. (Edit on 20/2/29: Link added. Erik DeBenedictis is EiC, the journal will be good! First papers published in early 2020.)
  17. IET Quantum Communication is brand new, and has a lot of "wireless" in its call, since that's the community the founders come from.
  18. PRX Quantum will start publishing in mid-2020 (Edit: this one added 20/2/29.)
  19. Wiley Quantum Engineering (Edit: this one added 20/2/29.)
  20. Springer Quantum Machine Intelligence (Edit: this one added 20/5/19.)
Of course, there are conferences and workshops, too.

And I'm working on a short slide deck on how contributions to open source and non-traditional publishing (e.g. my #QuantumComputerArchitecture tweetstorm) should count.

But for the moment, the above twenty or so journals are where the primary action is.

Friday, November 29, 2019

2.1.2 Diffusion and Confusion

Claude Shannon, in his seminal article on the mathematics of cryptography, defined two concepts he called diffusion and confusion.  In diffusion, information from the original plaintext message is spread across more than one symbol in the output ciphertext; the farther the information spreads, the better.  Shannon defined confusion as making "the relation between...[the ciphertext] $E$ and the...[key] $K$ a very complex and involved one."

Horst Feistel, whose work at IBM led to DES, called this spreading the avalanche effect. Webster and Tavares defined the strict avalanche criterion (SAC), which requires that changing a single input bit flips each of the output bits with 50% probability.
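As a toy illustration of the avalanche effect (my own sketch; it uses SHA-256 from the Python standard library as a stand-in for a block cipher, since hash functions are also designed to satisfy this property), flipping a single input bit changes roughly half of the output bits:

import hashlib

def bit_diff(a: bytes, b: bytes) -> int:
    """Count the differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

msg = bytearray(b"a 16-byte block!")
baseline = hashlib.sha256(msg).digest()

msg[0] ^= 0x01                       # flip one input bit
flipped = hashlib.sha256(bytes(msg)).digest()

print(bit_diff(baseline, flipped), "of", len(baseline) * 8, "output bits changed")
# Typically prints something close to 128 of 256.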

The Handbook of Applied Cryptography says the following (p. 20 in my edition):
A substitution in a round is said to add confusion to the encryption process whereas a transposition [permutation] is said to add diffusion. Confusion is intended to make the relationship between the key and the ciphertext as complex as possible.  Diffusion refers to rearranging or spreading out the bits in the message so that any redundancy in the plaintext is spread out over the ciphertext.
I haven't seen it written quite this directly (but then, I'm not that well read in crypto), but I think it's fair to say that confusion is achieved by the nonlinearity of the S-boxes.

These two concepts don't seem to figure prominently in the cryptography books and papers I have been reading, but it seems to me that they ultimately underlie much of the process of cryptanalysis: spreading data broadly weakens the relationship between two ciphertexts even when the plaintexts are closely related, expanding the search space; and the nonlinearity means that even a large set of straightforward equations is not enough to simply and mathematically relate the input and output.

Thursday, November 07, 2019

2.1 Playing Defense / 2.1.1 Entropy

2.1.1 Entropy

The single most important concept in cryptography -- indeed, in all of information theory, and inarguably one of the most important in all of computer science -- is Claude Shannon's entropy.  The entropy is, roughly, the amount of disorder in a sequence of data, or the amount of information that must be transmitted to reconstruct the data.  When the entropy of a cleartext message is high, it is hard to predict the next symbol (bit or byte or letter or floating point number); purely random data has very high entropy.  When it is low, the cleartext has a skewed probability distribution such that, e.g., the letter 'e' or the number 0 is more common than other values.

Of course, because encryption tries to hide information, encrypted data looks as random as possible: it has very high entropy.  Thus, an easy way to automate a test to see if we have successfully decrypted a message is to calculate the entropy of our provisionally decrypted plaintext; low entropy indicates a high probability that we have (at least partially) succeeded.

The entropy is defined as
$H_{2}(X)=-\sum_{i=1}^{M} \frac{\operatorname{count}_{i}}{N}\log_{2}\left(\frac{\operatorname{count}_{i}}{N}\right)$
where $\operatorname{count}_i$ is the number of times that the $i$th distinct value appears in our data, $M$ is the number of distinct values that appear, and $N$ is the total number of symbols in the sequence, so that $\operatorname{count}_i/N$ is the observed probability of the $i$th value.

From the fantastic website RosettaCode.org, we have a simple example in Python of calculating the entropy in bits per byte:


import math
from collections import Counter

def entropy(s):
    p, lns = Counter(s), float(len(s))
    return -sum(count/lns * math.log(count/lns, 2) for count in p.values())

entropy("1223334444")
(This function can calculate the entropy of sequences of other types besides character strings as well.)

Note that this definition of entropy does not take into account any long-range structure; it is purely based on the overall probability of the appearance of a given set of symbols.  Thus, to this function, the strings 0101...01 and 000...01...111 have a bit-level entropy just as high as a string composed of exactly 50/50 randomly chosen 0s and 1s, although by applying a little intelligence we can easily reconstruct the former but not the latter (at least, not exactly).  Many lossless data compression programs operate statefully on _sequences_ of data, using what they have already seen to predict the next symbol.  For example, the common Unix utilities gzip and compress use Lempel-Ziv-family algorithms (LZ77-based DEFLATE and LZW, respectively), which build a dictionary of sequences as they go, so that repetitive data and long runs are compressed effectively; this is especially good at sequences such as 0101...01 or 000...01...111.
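A quick illustration of the difference (a sketch using the standard-library zlib module, which implements DEFLATE, an LZ77 variant): all three strings below have the same symbol-level entropy of one bit per character, but the structured ones compress far better than the random one:

import random
import zlib

n = 10000
alternating = "01" * (n // 2)                    # 0101...01
two_runs    = "0" * (n // 2) + "1" * (n // 2)    # 000...0111...1
rand_bits   = "".join(random.choice("01") for _ in range(n))

for name, s in [("alternating", alternating), ("two runs", two_runs), ("random", rand_bits)]:
    compressed = zlib.compress(s.encode(), 9)
    print(name, len(s), "chars ->", len(compressed), "bytes compressed")

The structured strings shrink to a few dozen bytes, while the random string stays in the vicinity of the ~1,250 bytes its one-bit-per-character entropy implies.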

(Taking still further advantage of brainpower, the ultimate in compression is measured using the _Kolmogorov complexity_: the length of the shortest program that will reproduce the sequence.  This is very low for anything highly regular, as well as for seemingly irregular sequences such as the output of a pseudo-random number generator or the digits of $\pi$.  I am not aware of any automated method for calculating the Kolmogorov complexity of an arbitrary sequence of data; indeed, in general it is uncomputable.)

This, obviously, is one of the motivations for the substitution phase in a substitution-permutation network; with permutation alone, the entropy of the original cleartext is preserved in the ciphertext, which in some cases reveals a great deal about the original data.