Wednesday, October 30, 2019

2. Cryptanalysis

The general practice of trying to decode encrypted messages without being given the key is known as cryptanalysis.

Some of the planet's smartest people have worked on this for all of their lifetimes, building on work in progress since well before we were born, so my summary here is inevitably going to be superficial and perhaps wrong.  Comments welcome; there is a lot of _bad_ information about cryptography on the web, and I'd prefer to be corrected and make mine _good_ information for the web.

(Somewhere, right now, maybe even as I type, employees of one or more of the world's major intelligence agencies are snickering at how naive my presentation here is.  We don't know how much they really know, but we do know that e.g. the RSA public key cryptosystem and the technique of differential cryptanalysis were known to government intelligence agencies for years, perhaps decades, before they became public.)

Wikipedia's AES article says, "For cryptographers, a cryptographic 'break' is anything faster than a brute-force attack – i.e., performing one trial decryption for each possible key in sequence (see Cryptanalysis). A break can thus include results that are infeasible with current technology." The page then goes on to list quite a number of attacks on AES, none considered practical yet.

Given how long people have been working on cryptography, naturally there are many techniques for attacking the ciphers.  Here I'll mention just a couple of the modern techniques known for what we might call "honest" attacks on block cipher cryptography, granting the assumption that the algorithm has no flaws, and ignoring side channel attacks (such as measuring the power consumed during encryption) and attacks on infrastructure, people, etc. (some of which are known as "black bag", or burglary, attacks, and some of which are "rubber hose" attacks, extracting information from humans by coercion or torture). David Kahn referred to these latter attacks as "practical cryptanalysis" in his classic book, The Codebreakers.

The older of the two is differential cryptanalysis, apparently discovered independently at least three times: earliest by NSA, then in the mid-1970s by IBM (where the goal of defending against the technique drove DES design decisions), and finally publicly by Biham and Shamir in the late 1980s. The second is linear cryptanalysis, published by Mitsuru Matsui in the early 1990s.

(Other techniques that I know nothing about include integral, algebraic, and biclique.)

Before talking about those two in any detail, there's a more straightforward technique that leaks information about the plaintext, if not the key itself: the birthday paradox.  But before we get into the cryptanalysis itself, let's look a little bit at how the designers built defenses into the block ciphers.

1.4 Notes & References

Coming soon to a blog near you!

1.3.3 Limitations of Block Ciphers

One obvious problem with the most straightforward application of a block cipher is that it's deterministic.  If you just apply the transform to each block independently, it's easy to implement; this is known as electronic code book (ECB) mode.  BUT: if you see the same ciphertext $c$ in two different places in the message stream, you know that the two input blocks were the same!  This leaks a huge amount of information to a sophisticated attacker, and is considered unacceptable.

One answer to this is to make the encryption less deterministic by XORing the ciphertext of the previous block into the current one before performing the encryption.  This is cipher block chaining (CBC) mode, the standard mode of operation.  (There are several other modes that I won't go into here.)  CBC has the undesirable side effect of requiring encryption to be done serially (decryption, by contrast, can be parallelized), while attackers remain free to parallelize their work.  Nevertheless, it's the most commonly used mode.
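To make the contrast concrete, here's a minimal sketch of my own; the "block cipher" is just a keyed XOR stand-in, emphatically not secure, but it's enough to show how ECB exposes repeated plaintext blocks while CBC's chaining hides them:

```python
# Toy demonstration of ECB determinism vs. CBC chaining.
# The "block cipher" here is just XOR with a fixed key -- NOT secure,
# purely a stand-in so the mode structure is visible.

BLOCK = 8  # toy 64-bit blocks

def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    return bytes(b ^ k for b, k in zip(block, key))

def ecb_encrypt(data: bytes, key: bytes) -> bytes:
    blocks = [data[i:i+BLOCK] for i in range(0, len(data), BLOCK)]
    return b"".join(toy_encrypt_block(b, key) for b in blocks)

def cbc_encrypt(data: bytes, key: bytes, iv: bytes) -> bytes:
    prev, out = iv, []
    for i in range(0, len(data), BLOCK):
        block = bytes(b ^ p for b, p in zip(data[i:i+BLOCK], prev))
        prev = toy_encrypt_block(block, key)
        out.append(prev)
    return b"".join(out)

key = b"\x13\x37\xc0\xde\xba\xbe\xfa\xce"
iv  = b"\x00\x01\x02\x03\x04\x05\x06\x07"
msg = b"ATTACK!!" * 2          # two identical plaintext blocks

print(ecb_encrypt(msg, key).hex())       # the repeated ciphertext block is visible
print(cbc_encrypt(msg, key, iv).hex())   # chaining hides the repetition
```

Run it and you'll see the ECB output contains the same 16 hex characters twice, while the CBC output does not.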

1.3.2 AES (the Advanced Encryption Standard)

AES, the Advanced Encryption Standard, has replaced DES.  It is a type of block cipher known as a substitution-permutation cipher.  AES uses a block size of 128 bits, and a key size of 128, 192 or 256 bits. The key size affects the number of rounds of substitution-permutation, requiring 10, 12 or 14, respectively.
AES began life in 1997 when NIST announced a competition for a successor to DES, due to the problems with DES described in the DES section. Two Belgian cryptographers, Vincent Rijmen and Joan Daemen, submitted a proposal they dubbed Rijndael, which ultimately beat out fourteen other serious competitors to become, with some modifications, the Advanced Encryption Standard, published as the standard in 2001.
A substitution-permutation network, as you might guess, alternates substituting new bits for old ones with permuting (rearranging) the bits, conducted over a series of rounds. DES has some characteristics of a substitution-permutation network, but is not purely so. A well-designed s-p network will maximize Shannon's confusion and diffusion (closely related to the avalanche effect), which we will see shortly.
AES begins by laying out the sixteen bytes of a block in a $4\times 4$ array in column-major form: the first byte in the upper left, then the next byte below it, with the fifth byte returning to the top of the second column. Subsequent operations then work across rows as well as columns, increasing the mixing of the bytes.
First, in the KeyExpansion phase, the round keys are created from the original key.
Next, the appropriate number of rounds is performed. Each round consists of the following steps:
  1. SubBytes
  2. ShiftRows
  3. MixColumns
  4. AddRoundKey
In the first round, AddRoundKey is performed before the other operations as well as after. In the last round, the MixColumns step is omitted.
The S box in SubBytes is a single, fixed, byte-level $1:1$ operation, unlike the DES S boxes, which are more complex and vary depending on the position within the block. The AES S box is nonlinear ($f(x+y) \ne f(x)+f(y)$) and was designed specifically to defeat both linear and differential cryptanalysis (which we will see later), but if you don't trust the specific values, they can be modified. The S box can be implemented as a straightforward 256-byte lookup table, with the input byte as the index, or computed on the fly as a multiplicative inverse in GF($2^8$) followed by a simple affine transformation.
The ShiftRows operation rotates all rows but the first. The MixColumns operation is more complex, executed as a matrix multiplication over the finite field GF($2^8$). Together these two operations maximize the diffusion of the original data. AddRoundKey is a simple XOR of the 128-bit round key into the block.
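As a concrete illustration of just the state layout and the ShiftRows step (SubBytes, MixColumns, and AddRoundKey are omitted), here's a minimal sketch; it follows the column-major description above but is purely illustrative, not an AES implementation:

```python
# Sketch of the AES state layout and the ShiftRows step.
# The 16 input bytes fill a 4x4 array column by column (column-major),
# then ShiftRows rotates row r left by r positions.

def to_state(block16: bytes):
    # state[row][col]; byte i goes to row i % 4, column i // 4
    return [[block16[col * 4 + row] for col in range(4)] for row in range(4)]

def shift_rows(state):
    return [row[r:] + row[:r] for r, row in enumerate(state)]

block = bytes(range(16))
state = to_state(block)
print(state)               # [[0, 4, 8, 12], [1, 5, 9, 13], ...]
print(shift_rows(state))   # row 1 rotated by 1, row 2 by 2, row 3 by 3
```

Notice how ShiftRows moves bytes that started in the same column into different columns, which is what lets the column-oriented MixColumns step spread their influence across the whole block in later rounds.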
AES is easily implemented in either software or hardware. Some modern CPUs include special instructions to accelerate AES encryption, and even for those that don't, heavily optimized assembly-language implementations are achievable.  In the absence of special hardware, encrypting each 128-bit block of data requires several hundred CPU instructions. For example, my Mac laptop, with a 2.2GHz Intel Core i7 processor, encrypts AES-CBC at the following rates:
  • 128-bit key: 1.25 Gbps for 16-byte chunks, 1.44 Gbps for 8192-byte chunks
  • 192-bit key: 1.06 Gbps for 16-byte chunks, 1.18 Gbps for 8192-byte chunks
  • 256-bit key: 912 Mbps for 16-byte chunks, 1.01 Gbps for 8192-byte chunks
Despite two decades of effort, there are no known attacks on AES that allow recovery of the key in substantially less time than brute force; the best attacks roughly halve that effort, while requiring tremendous amounts of storage.

Thursday, October 24, 2019

Google's Quantum Supremacy Paper

[Important update, 10/25 0630. Dave Bacon has corrected me.  The chip is fully programmable. I had thought that the choice of coupler patterns (Fig. S25 in the SOM) would limit what could and could not be done. I thought those were the only patterns the hardware supported, but it's just the set that they chose; in fact, any couplers can be on/off in any cycle, as long as they aren't using the same qubits. I stand corrected! Thanks, Dave!]

[Edit 10/24. 1730 JP time: softened the language a bit in the first couple of paragraphs; it came out harsher than I intended in the original version.]

We interrupt your irregularly scheduled programming on cryptography (What Every Quantum Researcher and Engineer Should Know about Classical Cryptography) for some comments on Google's new supremacy paper...

By now, everyone on the entire planet and some people in space have heard that Google has claimed the achievement of "quantum supremacy". As the resident quantum pundit on Dave Farber's IP list, I suppose I'm obliged to say a few things, but I really don't have much to add beyond what some of the most prominent people have said, so this is largely a collection of links I mostly endorse...

Fundamentally, Google claims that their 53-qubit superconducting processor can solve a problem that would take a classical supercomputer 10,000 years.  But the problem is rather constrained; arguably, they defined the problem as whatever the computer is capable of doing.  That sounds harsh, and I don't mean it to be; it's still a tour de force that will stand as an important milestone, and it is programmable, but it's not yet fully general. Actually, let me soften that a bit; the chip itself was designed in conjunction with an understanding of what they are physically capable of doing, but very much with an eye toward what would most clearly demonstrate supremacy within their physical constraints.  So it's more fair to say it's hardware/software co-design. (And as I always tell my students, you can't design any computer without a plan for its workload!)

If you squint, it's something like a satisfiability problem (SAT, in CS theory terms).  In a SAT problem, you're given a Boolean circuit and challenged to find a set of input values that makes the (single) output TRUE.  In this case, measuring out all 53 qubits, it's not a single value that's the output, but rather a probability distribution. Using quantum entanglement and interference, in fact, it's not a flat "any number out of $2^{53}$" distribution, but instead has complex statistical characteristics.  It's those characteristics that they are looking for.  They started with a small problem size (using a subset of the qubits), calculated the distribution, saw that the machine conforms (but see the next paragraph!), then increased the size up past the level where their code can simulate it, and declared victory.

There are three big catches to this: one, as already stated, it's not fully programmable; two, it's not clear that this is beyond the capability of classical systems (see below); three, I believe them when they say it works, but noise remains the limiting factor, and it takes tremendous statistical skill to pull the signal out of the noise -- indeed, the fidelity (roughly, the probability that it did the right thing) for their largest problem is only 0.2%!  That is, only one time in 500 does the circuit run properly, with no errors.  And given that the output is supposed to be a probability distribution, and that the space of answers is about $10^{16}$, that means figuring out whether it differs from pure noise is tough.

The paper itself:
https://www.nature.com/articles/s41586-019-1666-5
53 qubits and 76 authors by my count, which makes 1.43 authors per qubit.  The team is headed by John Martinis (pronounced close to "Martinez"), one of the long-time key people in superconducting qubits.  John was at UC Santa Barbara before Google lured him and his entire team away a few years ago, and put truly outstanding resources at their disposal.

There is also substantial (67 pages) Supplementary Online Material (SOM) at the bottom of the web page; don't miss it if you want details of everything from the fabrication (two chips bonded face-to-face, one with the classical parts and one with the quantum parts!) to the cross-entropy benchmarking (XEB) used to statistically demonstrate that the device works, to the (tremendously difficult) calibration procedure, to the algorithm itself.

News & views from William Oliver:
https://www.nature.com/articles/d41586-019-03173-4

The five-minute Google feel-good video:
https://www.youtube.com/watch?v=-ZNEzzDcllU

Blog post by Sundar Pichai, Google CEO:
https://blog.google/perspectives/sundar-pichai/what-our-quantum-computing-milestone-means

They built on techniques developed by Sergio Boixo & co.:
https://arxiv.org/abs/1608.00263

Naturally, IBM, one of the other biggest names in quantum computing, is pushing back:
https://www.ibm.com/blogs/research/2019/10/on-quantum-supremacy/
(full disclosure: I'm Vice Center Chair of Keio's Quantum Computing Center, and we are working with IBM; my students have recently been using IBM's own 53-qubit processor.)

They're not questioning the experiment itself, but whether it's truly post-classical.  Simulators have been advancing rapidly, and _ideas_ for simulators and what counts as quantum supremacy have been moving even more rapidly.

(n.b.: there is pushback on the term "quantum supremacy"; in particular, the Europeans don't like it, they think it echoes Nazi terminology.)

Anyway...the most straightforward way of simulating a quantum computer takes memory exponential in the number of qubits:

10 qubits: 16KB
20 qubits: 16MB
30 qubits: 16GB
40 qubits: 16TB
50 qubits: 16PB
53 qubits: 128PB (144,115,188,075,855,872 bytes)
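Those numbers come straight from counting amplitudes; a quick sanity-check calculation (16 bytes per complex amplitude, i.e. two 8-byte doubles) reproduces them:

```python
# Back-of-the-envelope statevector memory for the straightforward simulation
# mentioned above: 2**n amplitudes, each a complex number stored as two
# 8-byte doubles (16 bytes).

def statevector_bytes(n_qubits: int) -> int:
    return 16 * 2 ** n_qubits

for n in (10, 20, 30, 40, 50, 53):
    print(f"{n:2d} qubits: {statevector_bytes(n):,} bytes")
```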

Each of those amplitudes also requires its own computation.  But numerous techniques compress that dramatically; one, called tensor networks, (very, very, very roughly) allows a query for the value of a state to reuse previously calculated results, with the whole thing compressed into a tree of pointers to values, plus modifiers.  IBM's Edwin Pednault and team have simulated up to 56 qubits in special cases:
https://arxiv.org/abs/1710.05867
and this team's new work forms the core of the IBM pushback against Google:
https://arxiv.org/abs/1910.09534

Indeed, Alibaba simulated some special cases related to supremacy last year, up to *144* qubits:
https://arxiv.org/abs/1805.01450

There has been some recent discussion of what it is that truly determines the difficulty of a quantum computation; it's not just the number of qubits or the number of entangling two-qubit gates, it's a more complex function. I should dig up some references on that.

Scott Aaronson's quantum supremacy FAQ:
https://www.scottaaronson.com/blog/?p=4317
and a new post on the topic:
https://www.scottaaronson.com/blog/?p=4372

Simon Devitt's podcast, Meet the meQuanics, just did an episode with some of the inventors of some of the theoretical techniques Google used:
https://soundcloud.com/mequanics/meet-the-mequanics-e53-quantum-supremacy-a-montenaro-m-bremner-r-mann

More useful discussion:
https://spectrum.ieee.org/tech-talk/computing/hardware/how-googles-quantum-supremacy-plays-into-quantum-computings-long-game

And from John Preskill himself, inventor of the term:
https://www.quantamagazine.org/john-preskill-explains-quantum-supremacy-20191002/

Finally, let me point out that Chris Monroe is quoted in Financial Review, talking about his ion trap experiment from a couple of years ago -- featuring, you guessed it, 53 qubits (they were trying for 52 and wound up with 53):
https://www.afr.com/technology/why-google-wasn-t-really-the-first-to-achieve-quantum-supremacy-20190927-p52vg6
https://www.nature.com/articles/nature24654
https://arxiv.org/abs/1708.01044
52 was chosen to get ahead of Misha Lukin's group, which had 51:
https://www.nature.com/articles/nature24622
https://arxiv.org/abs/1707.04344

More than enough for now.  Hope this helps!

Thursday, October 17, 2019

1.3.1 DES (the Data Encryption Standard)

DES, the Data Encryption Standard, was designed by IBM in the 1970s, consulting with the NSA.  DES is a type of cipher known as an iterative block cipher, which breaks data into blocks and repeats a set of operations on them.  The operations in DES are called a Feistel network, after its inventor, Horst Feistel.

DES uses a 56-bit key, though products exported from the U.S. were limited to using 40 meaningful key bits [wikipedia/40-bit-encryption]. It was later upgraded to triple-DES, using three 56-bit key pieces and repeating DES three times, giving a nominal 168 bits of key (though meet-in-the-middle attacks reduce the effective security to roughly 112 bits).  But it's not just the key size that matters in a block cipher.  The block size itself matters a great deal, as we'll see below.  For DES, that block size is only 64 bits.

DES operates in sixteen rounds, each of which uses a 48-bit subkey generated from the original 56-bit key using a key scheduling algorithm.  In each round, half of the block is tweaked and half initially left alone, then XORed with the tweaked half.  The two halves are swapped before the next round.

The "tweaking" of the right half of the block is done by first expanding the 32 bits into 48 by replicating half of the bits, XORing with the 48-bit subkey, then dividing it into 6-bit chunks and pushing each chunk through one of eight substitution boxes, or S boxes.  Each S box turns 6 bits into 4, using a lookup table defined as part of the algorithm (that is, this operation is not key-dependent).  The S boxes are nonlinear (but not affine), which is the source of the true security of DES; if the S boxes were linear, breaking DES would be easy (or so I am told).

Decrypting DES is exactly the same operation as encrypting, except that the subkeys are used in reverse order.

Slightly more formally, the sequence of operations in a DES encryption is:


  1. Apply the initial permutation (IP) (a fixed operation)
  2. For i = 1 to 16 do
    1. divide the block into two 32-bit halves
    2. expand the right half to 48 bits (a fixed operation)
    3. calculate subkey i
      1. split the key into two 28-bit halves
      2. rotate each half 1 or 2 bits (a fixed operation according to the key schedule)
      3. select a subset of 48 bits (a fixed operation according to the schedule)
    4. XOR subkey i with the expanded right half
    5. split the result into 8 six-bit pieces
    6. push each 6-bit piece through a $6\rightarrow 4$ S-box
    7. permute and recombine the pieces (a fixed operation)
    8. XOR the result with the left half of the block
    9. swap the halves of the block
  3. Apply the final permutation (FP) (a fixed operation)

The $6 \rightarrow 4$ S boxes are obviously inherently non-reversible, but because their output is only XORed into the other half of the block (the essence of the Feistel structure), no information is lost as the block passes through the entire network.
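To see why the Feistel structure makes the whole network reversible even when the round function is not, here's a toy sketch of my own (nothing like real DES: the round function just masks away bits and mixes what's left):

```python
# Toy Feistel network on 64-bit blocks.  The round function F is
# deliberately many-to-one (it throws away half the input bits), yet
# decryption still recovers the plaintext, because F's output is only
# ever XORed into the other half of the block.

def F(half: int, subkey: int) -> int:
    # clearly non-invertible: half the input bits are masked away first
    return ((half & 0x0F0F0F0F) * 0x9E3779B1 ^ subkey) & 0xFFFFFFFF

def feistel_encrypt(block: int, subkeys) -> int:
    left, right = block >> 32, block & 0xFFFFFFFF
    for k in subkeys:
        left, right = right, left ^ F(right, k)
    return (left << 32) | right

def feistel_decrypt(block: int, subkeys) -> int:
    # same structure, with the subkeys applied in reverse order
    left, right = block >> 32, block & 0xFFFFFFFF
    for k in reversed(subkeys):
        right, left = left, right ^ F(left, k)
    return (left << 32) | right

subkeys = [0xA5A5A5A5, 0x3C3C3C3C, 0x0F0F0F0F, 0x12345678]
plaintext = 0x0123456789ABCDEF
ciphertext = feistel_encrypt(plaintext, subkeys)
assert feistel_decrypt(ciphertext, subkeys) == plaintext
print(hex(ciphertext))
```

The assert passes no matter what F does, which is exactly the property that lets DES use non-invertible S boxes.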

The success of cryptanalysis is often measured in terms of the number of rounds it can break in a multi-round cipher like DES: e.g., if DES were six rounds instead of sixteen, we could do yea much.  Various attacks have been able to penetrate DES to different depths.  Of course, the more straightforward approach is sheer brute force; as early as 1977, Diffie and Hellman published a critique in which they argued that brute force stood at the edge of feasible, and indeed in 1999, a special-purpose machine named Deep Crack, built by the Electronic Frontier Foundation, cracked a DES key.  Further advances have made it possible even for dedicated hobbyists to crack DES keys.

Triple-DES (also called 3DES) has several modes of operation, but is usually used with three independent 56-bit keys, $K_1$, $K_2$, and $K_3$, with encryption performed as $C = E_{K_3}(D_{K_2}(E_{K_1}(P)))$, where $P$ is the plaintext, $E$ and $D$ are the encryption and decryption operations, and $C$ is the ciphertext.

DES was withdrawn as a standard in 2005, after having been replaced by AES in 2001, although the U.S. government still allows 3DES until 2030 for sensitive information.

Monday, October 14, 2019

1.3 Bulk Data Encryption

D-H and RSA are pretty easy to understand, and known to be vulnerable to quantum computers, hence attract a lot of attention. The workhorse symmetric block ciphers for bulk encryption are actually much more complex mathematically, and hence harder to understand, but ultimately can be executed efficiently on modern microprocessors and dedicated chips.

A block cipher takes the data to be encrypted and breaks it into fixed-size chunks called blocks.  If the last block isn't full, it is padded out with meaningless data.  We also take a key and, using that key, perform mathematical operations on each data block.  In a symmetric system, by definition, the decryption key is the same as the encryption key.  Generally, the output block is the same size as the input block.  There is no requirement that the block and key be the same size.

Ideally, the output block (the ciphertext) will look completely random: about 50% zeroes and 50% ones, regardless of input, and with no detectable pattern.  That is, its entropy will be very high.  Of course, it cannot be completely random, as it must be possible for the data to be decrypted by someone holding the appropriate key.  A good cipher, however, will provide few clues to an attacker.  A single bit's difference in either the key or the original plaintext should result in about half of the ciphertext bits being flipped, so that being "close" offers no guidance on a next step.

Thus, a symmetric key system's security is often linked to its key size; with $k$ key bits, it should require, on average, $2^{k-1}$ trial decryptions to find the key and to be able to decrypt the message.  We will discuss this further when we get to cryptanalysis.

Many encryption algorithms have been designed over the years, but two in particular are too important to ignore, so we will examine them: DES, the Data Encryption Standard, which is still in use in modified form but is primarily of historical interest now, and AES, the Advanced Encryption Standard, which is used for most communications today.

Friday, October 11, 2019

1.2 Key Generation

See the previous section here.

The other algorithm we talk about a lot as being both important and vulnerable to Shor's factoring algorithm is Diffie-Hellman key exchange, which is used for creating the session key we will use for bulk data encryption.  It's important that every session have its own separate encryption key; we don't want multiple conversations sharing the same key, for a lot of obvious reasons.

D-H works as follows:

  1. Alice and Bob publicly agree on a modulus $p$ and a base $g$ (in cleartext is okay)
  2. Alice and Bob each pick a secret random number $a$ and $b$
  3. Alice calculates $A = g^a \bmod p$, Bob calculates $B = g^b \bmod p$
  4. Alice sends Bob $A$, Bob sends Alice $B$ (in cleartext is okay)
  5. Alice calculates $s = B^a \bmod p$, Bob calculates $s = A^b \bmod p$

Both Alice and Bob now have the same secret $s$, which they can use as an encryption key.
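Here are those five steps worked through with toy values ($p = 23$, $g = 5$, the classic textbook example); real deployments use moduli of thousands of bits, or elliptic-curve groups, but the structure is identical:

```python
# Toy Diffie-Hellman with numbers small enough to follow by hand.
# (p = 23, g = 5 is purely illustrative; real moduli are 2048+ bits.)
import secrets

p, g = 23, 5                     # public modulus and base (cleartext is okay)

a = secrets.randbelow(p - 2) + 1 # Alice's secret random number
b = secrets.randbelow(p - 2) + 1 # Bob's secret random number

A = pow(g, a, p)                 # Alice sends A to Bob (cleartext is okay)
B = pow(g, b, p)                 # Bob sends B to Alice (cleartext is okay)

s_alice = pow(B, a, p)           # Alice computes s = B^a mod p
s_bob   = pow(A, b, p)           # Bob computes   s = A^b mod p
assert s_alice == s_bob          # same shared secret on both sides
print(s_alice)
```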

Note that, as-is, D-H is vulnerable to a man-in-the-middle attack, and so must be coupled with some form of authentication so that Alice and Bob each know that the other is who they say they are.

Thursday, October 10, 2019

1.1 Authentication

Previous post in this series is here.

Okay, time to dive into this in more technical depth. Let's go phase by phase:

1.1 Authentication

Authentication, or proving you are who you say you are, is often described as based on something you have (a physical key), something you know (a secret password), or something you are (biometrics, or physically unclonable functions, or PUFs).

The authentication phase can be accomplished using a pre-shared secret key and symmetric encryption of some message and response, or it can be done via asymmetric, public-key cryptography.

The best-known and most-used form of public key crypto is RSA, developed by Rivest, Shamir and Adleman (but previously discovered by Clifford Cocks of GCHQ, a British intelligence agency):

1. Pick two large primes, $p$ and $q$, let $n = pq$.
2. Calculate $\phi(p,q) = (p-1)(q-1)$.
3. Pick $e$, relatively prime to $\phi(p,q)$.
4. Calculate $d$, s.t. $de \equiv 1 \pmod{\phi(p,q)}$, which we can call the inverse of $e$ modulo $\phi(p,q)$.

The tuple $\langle n,e\rangle$ is now your public key, and $d$ is your private key. You can publish $\langle n,e\rangle$ any way you like, including in the New York Times. To send you a message $m$, I calculate the ciphertext $c$,

$c = m^e \bmod n$

and send $c$ to you.  You can recover the message $m$ by

    $m = c^d \bmod n$

That's all there is to it!  Except that it's expensive, so we don't use it for every message.  If I send you "Hi, Bob, it's Tuesday, let's use 42 as our key," using your public key and you reply, "Tuesday is beautiful, Alice, and 42 is fine," using my public key, we confirm that we are definitely Bob and Alice, known as authentication -- assuming, of course, that you trust that the key that you are holding really and truly belongs to me, and I haven't leaked it out somehow!
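Here's the arithmetic above worked through with textbook-sized numbers (far too small to be secure, purely to make the four steps and the encrypt/decrypt round trip visible):

```python
# Toy RSA with tiny primes, following the four steps above.  Numbers this
# small are trivially factorable; real keys use 2048- or 3072-bit n.

p, q = 61, 53
n = p * q                      # 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # relatively prime to phi
d = pow(e, -1, phi)            # 2753; the inverse of e modulo phi (Python 3.8+)

m = 42                         # "let's use 42 as our key"
c = pow(m, e, n)               # encrypt with the public key <n, e>
assert pow(c, d, n) == m       # decrypt with the private key d
print(c, pow(c, d, n))
```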

As you might guess from the publication of $n$, RSA is the one that's vulnerable to advances in factoring large integers, and hence of interest to spooks once Shor's algorithm came about.

Current EU recommendations are for 3072-bit RSA keys, recently (2018) upgraded from 2048, for "near-term protection" (up to ten years). They recommend an extraordinary 15360 bits for "long-term protection" (thirty to fifty years). [Cloudflare, Keylength.com, Ecrypt, Lenstra]

(Note: there are many more references to come; they will be delivered in a batch at the end.  It's not practical to list them all here in every article as we go, but I'll try to give you some of the key(ha!) ones.)

1.0 Encrypted Communications

See the previous posting if you haven't.

An encrypted conversation involves, at the macro level, three phases:

  1. Authentication: proving that you are who you say you are
  2. Key generation: creating the keys used for bulk data encryption
  3. Encryption/sending/decryption of the user's valuable bulk data

That's a bit of an oversimplification, as we'll see below when we talk about IPsec, but good enough for now.

You probably all know that there are two major kinds of cryptography -- symmetric key and asymmetric key, also known as public key.  Due to the high costs of computation for public key, bulk data encryption is done using symmetric key.  Symmetric encryption comes in two kinds, block ciphers and stream ciphers.  These days pretty much everything seems to be block, but I'm not sure why.

Some additional terminology:

  • cleartext: data that isn't encrypted and isn't really intended to be (sometimes confused with the below, even by me)
  • plaintext: the original, unencrypted message
  • ciphertext: the encrypted message
  • integrity: the data hasn't been tampered with
  • confidentiality or privacy: the data hasn't been disclosed to anyone unauthorized
  • session keys: the keys used for one communication session
  • forward secrecy: keeping data from past sessions secret even if keys are later compromised, esp. by building a cryptosystem that doesn't reuse keys
  • rekeying: changing the keys used in the middle of a session
  • subkeys: keys used for a portion of an encryption process, derived from a subset of the bits of the session key
  • cryptoperiod: the time that a specific key is authorized for use

Attacks on encrypted communications generally fall into one of three
categories:

  • unknown plaintext (ciphertext-only): the hardest problem; how do you recognize when you've found the message?  With many but not all systems, failure will leave only unintelligible, random data, while success will produce words from the dictionary or other recognizable text.
  • known plaintext: when an attacker knows what the plaintext corresponding to a particular ciphertext is, and attempts to find the key; not uncommon given the regularity of communications such as web browsing or email.
  • chosen plaintext: when the attacker can control the text to be encrypted, but obviously not the key; rarer than known plaintext, but can happen with small devices that a person may "own" but not always completely control, or if the attacker partially controls some subset of resources, such as a related web server, or has compromised one or more hosts behind an encryption gateway.

We also need this definition:

  • brute force/exhaustive search: checking every possible key, which of course is $2^n$ for an $n$-bit key, resulting in an expected hit time of half that number of trials; if you have a method that will find a key in substantially less than $2^{n-1}$ trials, you have "broken" the cipher, even if your attack isn't necessarily practical in the short run.  (Some concrete numbers follow below.)
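To put rough numbers on that: DES's 56-bit key gives $2^{56} \approx 7.2 \times 10^{16}$ possible keys, so exhaustive search expects to find the right one after about $2^{55} \approx 3.6 \times 10^{16}$ trial decryptions, which is why dedicated hardware could crack it; a 128-bit AES key pushes the expected work to about $2^{127} \approx 1.7 \times 10^{38}$ trials, far beyond any conceivable brute-force effort.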

And three mathematical definitions I know from algebra on the real numbers, but am a little fuzzy on in the bitwise, cryptographic context:

  • linear function: in this context, $f$ is a linear function of its input bits if it can be computed using only XOR of (a subset of) the inputs, so that $f(0,0) = 0$ (the origin is preserved) and, importantly, $f(x_1 + x_2) = f(x_1) + f(x_2)$, where $+$ is bitwise XOR; multiplication (AND) of input bits is disallowed.
  • affine function: a linear function plus a constant offset, so that possibly $f(0,0) = 1$ (the origin isn't necessarily preserved); the equivalent of a translation in a real space $R^n$.
  • nonlinear function: one in which $f(x+y) \ne f(x) + f(y)$ for some values $x$ and $y$; AND of input bits and arbitrary mappings (like the S boxes) introduce nonlinearity.  Note that affine functions with a nonzero offset technically fail the linearity test, but they are not considered nonlinear in the cryptographic sense, since they are just as easy to invert as linear ones.  (A small sketch checking these properties follows below.)
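Here's a minimal sketch over 4-bit values, under the GF(2) reading above (XOR as addition); the particular functions are just illustrative choices of mine:

```python
# Quick check of linearity over GF(2) (bitwise XOR as "+"): XOR-only
# functions satisfy f(x ^ y) == f(x) ^ f(y), while anything involving
# AND of input bits generally does not.
from itertools import product

def is_linear(f, bits=4):
    return all(f(x ^ y) == f(x) ^ f(y)
               for x, y in product(range(2**bits), repeat=2))

linear    = lambda x: (x ^ (x >> 1)) & 0xF   # XOR of input bits only
affine    = lambda x: linear(x) ^ 0b1        # linear plus a constant offset
nonlinear = lambda x: (x & (x >> 1)) & 0xF   # AND mixes bits nonlinearly

print(is_linear(linear))     # True
print(is_linear(affine))     # False: fails f(0) = 0, but affine, not "nonlinear" in the crypto sense
print(is_linear(nonlinear))  # False, and genuinely nonlinear
```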
Next posting in this series is here.

Wednesday, October 09, 2019

0.0 Introduction

I just posted this introduction to the project here.  I hope you'll follow along as I fill in the postings.

There are two fundamental reasons for periodic rekeying:
  1. Having a long string of data encrypted with the same key increases the ability of an attacker to find the key.
  2. As a practical matter, reducing the amount of data that is exposed in the event that a key is broken is good stewardship of your data.
So changing the key "often" makes it both harder and less valuable for an attacker to find a key.  That's the tl;dr.  Turning it into concrete, numeric recommendations is hard, but it looks like rekeying every half hour is pretty good, if your key bandwidth isn't high enough to do full one-time pad.

The above I knew off the top of my head, but I'm not a cryptographer and had more or less taken the need for changing the keys (known as rekeying) on faith.  So, let's dig a little deeper...there won't be any new knowledge in this, but I hope it helps bring together some information and organizes it to both save you the googling and calculating I've done and to provide a clear picture.

I used to work on a box known as an IPsec gateway, but I was the OS guy, not the cryptographer; Darrell Piper, Dan Harkins and Dave Kashtan did most of the crypto work.  I am also a board member of the WIDE Project, which through its KAME Project created the Racoon software that was an important early implementation of the key exchange protocol for IPsec, with much of that work done by Shoichi Sakane.  Much of what I know about IPsec and other security was just incidental radiation from them, or gained from the security-oriented papers we had to read when I did my master's at USC back before the Enlightenment.

I'm afraid this has gotten longer than I originally intended, so I hope it's all useful.  It's so long, a table of contents is in order:
  1. Brief, mostly non-mathematical intro to cryptography and its terminology, in the context of encrypted communications.
  2. General ideas behind cryptanalysis and the importance of limiting data volume, since that's the topic that started this conversation.
  3. How these ideas play out in IPsec, one of the Internet's key security protocols.
  4. How these ideas play out in TLS/SSL.
  5. Quantum and crypto.
  6. Post-quantum crypto.
  7. ACT NOW and order the above package, and as a FREE bonus, Ronco (well, Rodco) will send you a few thoughts on Bitcoin and its vulnerability to the development of quantum computers!
  8. References
Next posting is here.

What Every Quantum Researcher and Engineer Should Know about Classical Cryptography

At a conference in Berlin in June, I was asked, paraphrasing, about why we change the cryptographic keys used during long communication sessions, and how often they ought to be changed.  I waved my hands a little bit, but I wasn't completely satisfied with my own answer. It turned out to be a good opportunity for me to dig into a little background I'd been meaning to go through.  It also proved to be a remarkably hard question to answer from googling up pages, including cryptography research papers and the extensive mailing list archives from development of key Internet protocols.  I fell down a pretty serious rabbit hole here, and haven't found my way out yet.

What I have wound up with is about 25 pages of notes on classical cryptography, and I have decided to turn them into something that will benefit the quantum community at large.  I'm tentatively titling it, "What Every Quantum Researcher and Engineer Should Know about Classical Cryptography", in homage to Goldberg's classic paper on floating point arithmetic.

The target audience for this is primarily quantum computing researchers who are familiar with Shor's algorithm, Grover's algorithm and QKD, since those are among the first things you learn, but who have only a very rough idea of what it means to actually encrypt data and to use encryption in a real-world setting.

I'll be posting the notes gradually over the next few weeks here on this blog, and I hope you will comment and help me improve them.

Table of Contents:
  1. Introduction
  2. Encrypted Communications
    1. Authentication
    2. Key Generation
    3. Bulk Data Encryption
      1. DES
      2. AES
      3. Limitations of Block Ciphers
    4. Notes & References
  3. Cryptanalysis
    1. Defense
      1. Entropy
      2. Diffusion and Confusion
    2. The Birthday Paradox, or, When Should I Change my Encryption Keys?
    3. Differential Cryptanalysis
    4. Linear Cryptanalysis
    5. Known and Chosen Plaintexts in Real Systems
    6. Notes & References
  4. IPsec and the IETF
    1. Internet Standards
    2. (Classical) IPsec
    3. Digging into the Cryptanalysis of IPsec
    4. IPsec with QKD
    5. Notes & References
  5. TLS/SSL and cryptography
    1. TLS Records and Basic Limits
    2. Keying and Rekeying
    3. Other Attacks on TLS
    4. TLS and QKD
    5. Notes & References
  6. Quantum Attacks on Classical Crypto
    1. Shor 'Nuff
    2. Grover's Amplifier
  7. Post-Quantum Cryptography
  8. Bitcoin/Blockchain and Quantum Computers
  9. Conclusion
  10. Additional References
See the first installment here.

Tuesday, October 08, 2019

testing, testing, is this thing on?

My interest in running a blog was tailing off when I lost access to this account, so I let it go for a long time. Now, I have some things to say, and I may have recovered access here.