Friday, December 20, 2019

Network Graph Visualization Software

Just a short list of network/graph visualization software we have used for various things. I keep losing track of them, so I figured it was worth writing them down.

  • GraphViz is maybe the most venerable.
  • NetworkX is a pretty cool toolkit for Python. Very easy to use, a great starting point; a minimal example follows this list.
  • D3's Network Graph can do some nice things, well integrated into web pages.
  • Graphs and Networks in Mathematica. I have used a couple of different approaches in Mathematica, including just treating the graph as a variable. I should dig out examples; I wasn't fully happy with what I managed with simple code. But if you're willing to go through the syntactic pain, you can do amazing things in Mathematica.
  • OMNeT++ and ns-3 are full-on (classical) network simulators we have used for some sims, and they have built-in or add-on visualization tools.
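
For instance, here is a minimal NetworkX sketch (assuming networkx and matplotlib are installed; the random graph and the parameters are just placeholders):

import networkx as nx
import matplotlib.pyplot as plt

G = nx.erdos_renyi_graph(20, 0.15, seed=42)   # a random 20-node graph as a stand-in
pos = nx.spring_layout(G, seed=42)            # force-directed layout
nx.draw(G, pos, with_labels=True, node_size=300)
plt.savefig("graph.png")                      # or plt.show() for interactive use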


Do you have a favorite or a dislike among this list? What have I missed?

Saturday, November 30, 2019

2.2 Birthday Paradox, or, When Should I Change my Encryption Keys?

You're probably familiar with the basic idea of the birthday paradox: If you have $n$ people in a room, what's the probability of two of them sharing a birthday?  More concretely, how many people have to be in the room before the probability of two sharing a birthday exceeds 50%?  It's a surprisingly small number (just 23), but it's not a true paradox in the logical sense.  (The same counting argument governs hash collision probabilities, and it is a close cousin of the pigeonhole principle.)

Modern ciphers are designed so that the ciphertext (output of the encryption) looks random; we can say that the ciphertext has very high entropy.  If the block size is $n$ bits, each of the $2^n$ possible bit combinations should be an equally likely result of encrypting a data block.  (The encryption is deterministic, but the output looks random.)

If we monitor a long stream of encrypted data, there is some probability that we will find two blocks that have the same ciphertext.  We call that a collision.  If the two blocks were encrypted with the same key, we gain information about the plaintext.

(Side question: How can you have birthday paradox collisions of the ciphertext?  Wouldn't that mean that the encryption process is lossy, and that it would be impossible to reliably decrypt that ciphertext back to both original blocks?

Answer: there is more information involved in both the encryption and decryption than just that specific block and the key.  This is the cipher block chaining mentioned at the end of Sec. 1.)

In CBC mode, what you gain is the XOR of two plaintext blocks.  If you get to choose which two, this could be incredibly valuable information; given that it's just two random blocks, it's less so, but not unimportant.  As it happens, if block numbers $i$ and $j$ have the same ciphertext $c[i] = c[j]$, then the XOR of plaintext blocks $i$ and $j$ equals the XOR of ciphertext blocks $i-1$ and $j-1$, which the attacker already holds.

I spent quite a bit of time working through the math of the birthday paradox, only to come back to one of the first sources I found: the 2016 SWEET32 paper by Bhargavan and Leurent.  Sec. 2.2 of that paper has a compact description of the commonly-quoted square root limit, as well as why and how much to be conservative relative to that value.

If the block size is $n$ bits, there are $N = 2^n$ possible ciphertexts, and if the number of blocks encrypted is the square root of that number, $2^{n/2}$, the probability of at least one collision is above 39%; in the limit, it approaches $1-e^{-1/2} \approx 0.393$.  (See the appendix of these notes for some example calculations of this and the following to play around with.)
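
If you'd like to check that number before the appendix arrives, here is a quick sketch using the standard approximation $p \approx 1 - e^{-D(D-1)/2N}$ for $D$ random $n$-bit blocks drawn from $N = 2^n$ possibilities (the function name is my own):

import math

def collision_probability(blocks, n_bits):
    """Approximate P(at least one collision) among `blocks` random n-bit values."""
    N = 2.0 ** n_bits
    return 1.0 - math.exp(-blocks * (blocks - 1) / (2 * N))

print(collision_probability(2**32, 64))    # ~0.39 at the square-root limit for 64-bit blocks
print(collision_probability(2**64, 128))   # ~0.39 again for 128-bit blocks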

Assuming you consider a 39% chance of disclosing some information to be an unacceptably large probability, when should you rekey?  That's the first-level answer to our ultimate question of when to rekey using QKD, which is of course back where we started this conversation.  Say, for example, we want the probability of an attacker recovering a plaintext block using the birthday attack to be less than one in a billion.

If we have $D = 2^d$ ciphertext blocks (a power of two, for convenience), then the expected number of collisions is approximately $2^{2d-n-1}$.  For our one-in-a-billion probability, using $\log_2(10^9) \approx 30$, we need to set up our session lifetime such that $2d-n-1 < -30$, or $d < (n-29)/2$.

Since DES has only a 64-bit block, we should set $d \le 17$.  That's a startlingly small number: $D = 2^{17}$ blocks and each block is eight bytes, so we should limit the use of a single key to one megabyte! An impractically small size.

If, on the other hand, you are okay with a disclosure probability of one in a million, we can raise that by a factor of a thousand and change keys once every gigabyte instead.  But if you are trying to protect a 40Gbps link -- a common backbone bandwidth today, and coming to the home in the foreseeable future -- that still means changing keys once every 200msec or so!

Ah, but AES uses $n = 128$, a block size twice as large.  Now we only need $d \le 49$ for that one-in-a-billion probability.  $D = 2^{49}$ blocks times our block size of 16 bytes equals $2^{53}$ bytes, roughly 9 petabytes, which is about 1.8 million seconds (around three weeks) on our 40Gbps link.  So rekeying even once a day leaves you a comfortable margin.
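
Here is a small sketch of that calculation, so you can plug in other block sizes and risk thresholds (the helper names are mine; the one-in-a-billion threshold matches the discussion above):

import math

def max_d(n_bits, prob):
    """Largest d such that the expected collisions 2^(2d-n-1) stay below prob."""
    return math.floor((n_bits + 1 + math.log2(prob)) / 2)

def bytes_per_key(n_bits, prob):
    return (2 ** max_d(n_bits, prob)) * (n_bits // 8)   # blocks times block size in bytes

print(max_d(64, 1e-9), bytes_per_key(64, 1e-9))     # 17 and ~1 MB for DES
print(max_d(128, 1e-9), bytes_per_key(128, 1e-9))   # 49 and ~9e15 bytes for AES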

Keep in mind a few points:
  1. what's disclosed here is not the key, and not even pure plaintext, but plaintext-related information; however, insanely smart and devious cryptanalysts can do amazing things with small amounts of data;
  2. this depends only on the block size of a block cipher, not on the key length;
  3. part of this arises due to the CBC (cipher block chaining) mode, but ECB (electronic code book) mode is worse; there are other modes that I haven't looked at closely; and
  4. given that there are still other attacks, minimizing the amount of data under any given encryption key is still good practice.
Next, let's look at a couple of the cryptanalysis techniques.

(If you missed it, you can go back to the beginning of What Every Quantum Researcher and Engineer Should Know about Classical Cryptography.)

Quantum Information Theory, Computing, Algorithms, Networking, Engineering and Education Journals


(Edited on 20/2/29 and 20/5/19, as noted in line.)

The field of quantum computing, broadly constituted, is changing rapidly. Hundreds, nay thousands, of new people are entering the field each year right now. Eventually, things will settle out, but in the short run what is publishable research and how we publish it are going to undergo a lot of stress.  And eventually publishing will become less critical for a large number of quantum computing graduates, as they will move into the nascent quantum industry, where different metrics of success will be applied and ideas will affect society in different ways. But for now, the primary mode of work is research-oriented people creating and testing new ideas, and the standard approach to disseminating those ideas is peer-reviewed publications.

One of the problems is that architecture and engineering are neither theory nor experiment. Each is its own beast. Where do we send papers describing advances in software tools, architectural principles and evaluation, network protocols, etc.? Many of the existing journals are not good venues for this engineering-oriented work, but neither are most of the ACM and IEEE top journals and conferences.

Another item high on my list is how to value creation of new educational tools and techniques, e.g., our MOOC.

But before fussing too much, it's worth taking a look at what we currently have for publication venues...

Very roughly in terms of age, but grouped by organization (in short, the order in which my brain chose to produce/organize them of its own accord):
  1. Nature and Science: You know the drill. Top experimental papers only. 20 papers/year each.
  2. Physical Review Letters. Top papers, including theory and algorithms; HHL, for example. Highly compressed; I refer to a PRL as "the cryptographic hash of a longer paper still to come".
  3. Physical Review A. Has been an excellent venue for almost anything quantum (my first quantum arithmetic paper, for example, and recent engineering-oriented work on quantum networks), and has explicitly added to its charter since the founding of APS's Division of Quantum Information, but is overwhelmed; acceptance rate is declining and topics are narrowing.
  4. Physical Review X. Cody's architecture paper was explicitly described by the editors as, "Hey, this isn't really physics in the traditional sense, but we think it's cool." (Well, that's a liberal paraphrase.) (At almost 250 citations, according to Scholar, good call on their part if they value citations to their journal.)
  5. IOP New Journal of Physics. An excellent venue, longer papers than PRL but good visibility. But seems to draw from same reviewer and reader pool as PRA.
  6. IOP Quantum Science and Technology. Looks to be an excellent new venue, and quantum-focused, which is a win, and we're happy with it, but seems to draw from same reviewer and reader pool as PRA, at least so far. Hopefully engineering-oriented papers will continue to find a home here.
  7. Quantum Information and Computation (QIC). Has been the journal of record for much quantum algorithms work, less so for experimental work and engineering. Subscriptions are expensive, but papers can be put on the arXiv.
  8. Quantum Information Processing (QIP). Along with QIC, a top journal, with more experimental work. Also expensive, also arXiv-friendly.
  9. International Journal of Quantum Information (IJQI). Gets less attention than QIC and QIP. Also arXiv-friendly.
  10. ACM Journal of Emerging Technology in Computing Systems (JETC). One of my favorite journals. I spent two years as an associate editor, and wouldn't mind doing another stint here. arXiv-friendly. Not quantum only, but definitely has a good quantum component. Will be interesting to see if that changes with the introduction of...
  11. ACM Transactions on Quantum Computing (TQC). If the quality of a review board is an indicator of the quality of a journal, destined to be the top thing in our field. True rock stars, and great breadth. Will probably be limited to a few dozen high-impact papers a year?
  12. npj Quantum Information (npj QI). Aims to be a top journal. Open access, which is great, but the author fees are outrageous. I won't pay them.
  13. Nature Communications and Scientific Reports. In my limited experience, erratic in review quality and editorial procedures, and no help on formatting and copy editing, but still have the outrageous Nature open access fees. A worse deal all the way around than npj QI.
  14. Science Advances. I haven't interacted with this yet at all. Also has high open access fees.
  15. Quantum. The new star in our firmament. An overlay journal dependent on the arXiv. Outstanding, young editorial board, this is the cool coffee shop where all the hipsters hang out. Go here to be seen and to hear the newest tunes!
  16. IEEE Transactions on Quantum Engineering. I have heard only rumors of this and either my Google-fu is poor, or there's nothing about it on the web. Could be good, but unless it's an exception to the truly egregious average IEEE journal latency, not useful for students. (Edit on 20/2/29: Link added. Erik DeBenedictis is EiC, the journal will be good! First papers published in early 2020.)
  17. IET Quantum Communication is brand new, and has a lot of "wireless" in its call, since that's the community the founders come from.
  18. PRX Quantum will start publishing in mid-2020 (Edit: this one added 20/2/29.)
  19. Wiley Quantum Engineering (Edit: this one added 20/2/29.)
  20. Springer Quantum Machine Intelligence (Edit: this one added 20/5/19.)
Of course, there are conferences and workshops, too.

And I'm working on a short slide deck on how contributions to open source and non-traditional publishing (e.g. my #QuantumComputerArchitecture tweetstorm) should count.

But for the moment, the above nineteen or so journals are where the primary action is.

Friday, November 29, 2019

2.1.2 Diffusion and Confusion

Claude Shannon, in his seminal article on the mathematics of cryptography, defined two concepts he called diffusion and confusion.  In diffusion, information from the original plaintext message is spread across more than one symbol in the output ciphertext; the farther the information spreads, the better.  Shannon defined confusion as making "the relation between...[the ciphertext] $E$ and the...[key] $K$ a very complex and involved one."

Horst Feistel, whose team at IBM designed DES, called diffusion the avalanche effect. Webster and Tavares defined the strict avalanche criterion (SAC), requiring that changing a single input bit flips each of the output bits with 50% probability.
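
Here is a rough sketch of how you might eyeball the avalanche effect.  I've used SHA-256 as the primitive because it's in the Python standard library (that substitution is mine, not anything from Feistel or Webster and Tavares), but the same test applies bit-for-bit to a block cipher:

import hashlib
import os

def flips(msg, bit):
    """Flip one input bit and count how many of the 256 output bits change."""
    tweaked = bytearray(msg)
    tweaked[bit // 8] ^= 1 << (bit % 8)
    a = hashlib.sha256(msg).digest()
    b = hashlib.sha256(bytes(tweaked)).digest()
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

msg = os.urandom(16)                    # one 128-bit input block
counts = [flips(msg, b) for b in range(128)]
print(sum(counts) / len(counts))        # SAC predicts an average near 128 of 256 bits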

The Handbook of Applied Cryptography says the following (p. 20 in my edition):
A substitution in a round is said to add confusion to the encryption process whereas a transposition [permutation] is said to add diffusion. Confusion is intended to make the relationship between the key and the ciphertext as complex as possible.  Diffusion refers to rearranging or spreading out the bits in the message so that any redundancy in the plaintext is spread out over the ciphertext.
I haven't seen it written quite this directly (but then, I'm not that well read in crypto), but I think it's fair to say that confusion is achieved by the nonlinearity of the S-boxes.

These two concepts don't seem to figure prominently in the cryptography books and papers I have been reading, but it seems to me that they ultimately underlie much of the process of cryptanalysis: spreading data broadly weakens the relationship between two ciphertexts even when the plaintexts are closely related, expanding the search space; and the nonlinearity means that even a large set of straightforward equations is not enough to simply and mathematically relate the input and output.

Thursday, November 07, 2019

2.1 Playing Defense / 2.1.1 Entropy

2.1.1 Entropy

The single most important concept in cryptography -- indeed, in all of information theory, and inarguably one of the most important in all of computer science -- is Claude Shannon's entropy.  The entropy is, roughly, the amount of disorder in a sequence of data, or the amount of information that must be transmitted to reconstruct the data.  When the entropy of a cleartext message is high, it is hard to predict the next symbol (bit or byte or letter or floating point number); purely
random data has very high entropy.  When it is low, the cleartext has a skewed probability distribution such that, e.g., the letter 'e' or the number 0 is more common than other values.

Of course, because encryption tries to hide information, encrypted data looks as random as possible: it has very high entropy.  Thus, an easy way to automate a test to see if we have successfully decrypted a message is to calculate the entropy of our provisionally decrypted plaintext; low entropy indicates a high probability that we have (at least partially) succeeded.

The entropy is defined as
$H_{2}(X)=-\sum_{i=1}^{k} \frac{\operatorname{count}_{i}}{N}\log_{2}\left(\frac{\operatorname{count}_{i}}{N}\right)$
where $\operatorname{count}_i$ is the number of times that the $i$th of the $k$ distinct values appears in our data, and $N$ is the total length of the data sequence.

From the fantastic website RosettaCode.org, we have a simple example in Python of calculating the entropy in bits per byte:


import math
from collections import Counter

def entropy(s):
    p, lns = Counter(s), float(len(s))
    return -sum(count/lns * math.log(count/lns, 2) for count in p.values())

entropy("1223334444")
(This function can actually calculate entropy of strings of other types besides characters, as well.)

Note that this definition of entropy does not take into account any long-range structure; it is purely based on the overall probability of the appearance of a given set of symbols.  Thus, to this function, the strings 0101...01 and 000...01...111 have a bit-level entropy just as high as a string composed of exactly 50/50 randomly chosen 0s and 1s, although by applying a little intelligence we can easily reconstruct the former but not the latter (at least, not exactly).  Many lossless data compression programs operate on stateful _sequences_ of data to predict the next symbol.  For example, the common Unix utilities gzip and compress use the Lempel-Ziv algorithm, which builds a dictionary of sequences as it goes, so that repetitive data and long runs are compressed effectively; this is especially good at sequences such as 0101...01 or 000...01...111.

(Taking still further advantage of brainpower, the ultimate in compression is measured using the _Kolmogorov complexity_: the length of the shortest program that will reproduce the sequence.  This is very low for anything highly regular, as well as for seemingly irregular sequences such as the output of a pseudo-random number generator or the digits of $\pi$.  There is no automated way to calculate the Kolmogorov complexity of an arbitrary sequence of data; in general, it is uncomputable.)

This, obviously, is one of the motivations for the substitution phase in a substitution-permutation network; with permutation alone, the entropy of the original cleartext is preserved in the ciphertext, which in some cases reveals a great deal about the original data.



Wednesday, October 30, 2019

2. Cryptanalysis

The general idea of trying to decode messages that are encrypted is known as cryptanalysis.

Some of the planet's smartest people have worked on this for all of their lifetimes, building on work in progress since well before we were born, so my summary here is inevitably going to be superficial and perhaps wrong.  Comments welcome; there is a lot of _bad_ information about cryptography on the web, and I'd prefer to be corrected and make mine _good_ information for the web.

(Somewhere, right now, maybe even as I type, employees of one or more of the world's major intelligence agencies are snickering at how naive my presentation here is.  We don't know how much they really know, but we do know that e.g. the RSA public key cryptosystem and the technique of differential cryptanalysis were known to U.S. and U.K. agencies for years, perhaps decades, before they became public.)

Wikipedia's AES article says, "For cryptographers, a cryptographic 'break' is anything faster than a brute-force attack – i.e., performing one trial decryption for each possible key in sequence (see Cryptanalysis). A break can thus include results that are infeasible with current technology." The page then goes on to list quite a number of attacks on AES, none considered practical yet.

Given how long people have been working on cryptography, naturally there are many techniques for attacking the ciphers.  Here I'll mention just a couple of the modern techniques known for what we might call "honest" attacks on block cipher cryptography, granting the assumption that the algorithm has no flaws and ignoring side channel attacks (such as measuring the power consumed during encryption) and attacks on infrastructure, people, etc. (some of which are known as "black bag", or burglary, attacks, and some of which are "rubber hose" attacks, extracting information from humans by coercion or torture). David Kahn referred to these latter attacks as "practical cryptanalysis" in his classic book, The Codebreakers.

The older of the two is differential cryptanalysis, apparently discovered independently at least three times: publicly by Biham and Shamir in the late 1980s, in the mid-70s by IBM (where the goal of defending against the technique drove DES design decisions), and earlier still by NSA. The second is linear cryptanalysis, introduced by Mitsuru Matsui in the early 1990s.

(Other techniques that I know nothing about include integral, algebraic, and biclique.)

Before talking about those two in any detail, there's a more straightforward technique that leaks information about the plaintext, if not the key itself: the birthday paradox.  But before we get into the cryptanalysis itself, let's look a little bit at how the designers built defenses into the block ciphers.

1.4 Notes & References

Coming soon to a blog near you!

1.3.3 Limitations of Block Ciphers

One obvious problem with the most straightforward application of a block cipher is that it's deterministic.  If you just apply the transform to each block independently, it's easy to implement; this is known as electronic code book (ECB) mode.  BUT: If you see the same ciphertext $c$ in two different places in the message stream, you know that the two input blocks were the same!  This leaks a huge amount of information to a sophisticated attacker, and is considered unacceptable.

One answer to this is to make the encryption slightly less deterministic by XORing in the ciphertext of the previous block into the current one before performing the encryption.  This is cipher block chaining (CBC) mode, the standard mode of operation.  (There are at least two more modes that I know nothing about.)  CBC has the undesirable side effect of requiring encryption to be done serially, but attacks can be parallelized.  Nevertheless, it's the most commonly used mode.
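
Here is a toy sketch of the difference; the "block cipher" is a stand-in XOR with the key, chosen purely so the chaining structure is visible (it is emphatically not a real cipher):

import os

def fake_encrypt_block(block, key):
    return bytes(b ^ k for b, k in zip(block, key))      # placeholder, NOT secure

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def ecb(blocks, key):
    return [fake_encrypt_block(b, key) for b in blocks]

def cbc(blocks, key, iv):
    out, prev = [], iv
    for b in blocks:
        prev = fake_encrypt_block(xor(b, prev), key)     # chain in the previous ciphertext
        out.append(prev)
    return out

key, iv = os.urandom(8), os.urandom(8)
blocks = [b"SAMESAME", b"SAMESAME"]                       # a repeated plaintext block
print(ecb(blocks, key)[0] == ecb(blocks, key)[1])         # True: ECB leaks the repetition
print(cbc(blocks, key, iv)[0] == cbc(blocks, key, iv)[1]) # False: CBC hides it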

1.3.2 AES (the Advanced Encryption Standard)

AES, the Advanced Encryption Standard, has replaced DES.  It is a type of block cipher known as a substitution-permutation cipher.  AES uses a block size of 128 bits, and a key size of 128, 192 or 256 bits. The key size affects the number of rounds of substitution-permutation, requiring 10, 12 or 14, respectively.
AES began life in 1997 when NIST announced a competition for a successor to DES, due to the problems with DES described earlier and later. Two Belgian cryptographers, Vincent Rijmen and Joan Daemen, submitted a proposal they dubbed Rijndael, which ultimately beat out fourteen other serious competitors to become, with some modifications, the Advanced Encryption Standard, published as the standard in 2001.
A substitution-permutation network, as you might guess, alternates substituting new bits for old ones with permuting (rearranging) the bits, conducted over a series of rounds. DES has some characteristics of a substitution-permutation network, but is not purely so. A well-designed s-p net will maximize Shannon's confusion and diffusion (the latter closely related to the avalanche effect), which we will see shortly.
AES begins by laying out the sixteen bytes in a $4\times 4$ array in column-major form: the first byte in the upper left, then the next byte below it, with the fifth byte returning to the top of the second column. Subsequent operations work on both rows and columns, increasing the mixing of the bytes.
First, in the KeyExpansion phase, the round keys are created from the original key.
Next, the appropriate number of rounds is performed. Each round consists of the following steps:
  1. SubBytes
  2. ShiftRows
  3. MixColumns
  4. AddRoundKey
Before the first round, an initial AddRoundKey is performed, so key material is mixed in both before the first substitution and after the last. In the last round, the MixColumns step is omitted.
The S box in SubBytes is a single, fixed, byte-level $1:1$ mapping, unlike the DES S boxes, which are more complex and vary depending on the position within the block. The AES S box is nonlinear ($f(x+y) \ne f(x)+f(y)$) and was designed specifically to defeat both linear and differential cryptanalysis (which we will see later), but if you do not trust the specific values, they can be modified. The S box can be implemented as a straightforward 256-byte lookup table, with the input byte as the index, or computed on the fly as a multiplicative inverse in $GF(2^8)$ followed by a fixed affine (bitwise matrix) transformation.
The ShiftRows operation rotates all rows but the first. The MixColumns operation is more complex, executed as a matrix multiplication over the finite field $GF(2^8)$. These two operations maximize the diffusion of the original data. The AddRoundKey step is a simple XOR of the 128-bit round key into the block.
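
As a concrete illustration of the column-major layout and the ShiftRows step (a pure-Python sketch for readability, not an AES implementation):

def to_state(block16):
    # column-major: state[row][col] = block16[4*col + row]
    return [[block16[4 * col + row] for col in range(4)] for row in range(4)]

def shift_rows(state):
    # rotate row r left by r positions
    return [row[r:] + row[:r] for r, row in enumerate(state)]

state = to_state(bytes(range(16)))
print(state)               # [[0, 4, 8, 12], [1, 5, 9, 13], [2, 6, 10, 14], [3, 7, 11, 15]]
print(shift_rows(state))   # [[0, 4, 8, 12], [5, 9, 13, 1], [10, 14, 2, 6], [15, 3, 7, 11]]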
AES is easily implemented in either software or hardware. Some modern CPUs include special instructions to accelerate AES encryption, and even for those that don't, heavily optimized assembly-language implementations exist.  In the absence of special hardware, encrypting each 128-bit block of data requires several hundred CPU instructions. For example, my Mac laptop, with a 2.2GHz Intel Core i7 processor, encrypts AES-CBC at the following rates:
  • 128-bit key: 1.25 Gbps for 16-byte chunks, 1.44 Gbps for 8192-byte chunks
  • 192-bit key: 1.06 Gbps for 16-byte chunks, 1.18 Gbps for 8192-byte chunks
  • 256-bit key: 912 Mbps for 16-byte chunks, 1.01 Gbps for 8192-byte chunks
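If you want to check numbers like these on your own hardware, here is a rough sketch using the third-party pyca/cryptography package, version 3.1 or later (that package choice is my assumption; openssl's built-in speed benchmark is another easy option):

import os
import time
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key, iv = os.urandom(16), os.urandom(16)        # AES-128 in CBC mode
data = os.urandom(8192)                         # one 8192-byte chunk
encryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()

n = 10000
start = time.perf_counter()
for _ in range(n):
    encryptor.update(data)                      # encrypt n chunks back to back
elapsed = time.perf_counter() - start
print(f"{8 * len(data) * n / elapsed / 1e9:.2f} Gbps")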
Despite two decades of effort, there are no known attacks on AES that allow recovery of the key in substantially less time than brute force; the best published attacks reduce that cost by only a small factor, at the price of requiring tremendous amounts of storage.

Thursday, October 24, 2019

Google's Quantum Supremacy Paper

[Important update, 10/25 0630. Dave Bacon has corrected me.  The chip is fully programmable. I had thought that the choice of coupler patterns (Fig. S25 in the SOM) would limit what could and could not be done. I thought those were the only patterns the hardware supported, but it's just the set that they chose; in fact, any couplers can be on/off in any cycle, as long as they aren't using the same qubits. I stand corrected! Thanks, Dave!]

[Edit 10/24. 1730 JP time: softened the language a bit in the first couple of paragraphs; it came out harsher than I intended in the original version.]

We interrupt your irregularly scheduled programming on cryptography (What Every Quantum Researcher and Engineer Should Know about Classical Cryptography) for some comments on Google's new supremacy paper...

By now, everyone on the entire planet and some people in space have heard that Google has claimed the achievement of "quantum supremacy". As the resident quantum pundit on Dave Farber's IP list, I suppose I'm obliged to say a few things, but I really don't have much to add beyond what some of the most prominent people have said, so this is largely a collection of links I mostly endorse...

Fundamentally, Google claims that their 53-qubit superconducting processor can solve a problem that would take a classical supercomputer 10,000 years.  But the problem is rather constrained; arguably, they defined the problem as whatever the computer is capable of doing.  That sounds harsh, and I don't mean it to be; it's still a tour de force that will stand as an important milestone, and it is programmable, but it's not yet fully general. Actually, let me soften that a bit; the chip itself was designed in conjunction with an understanding of what they are physically capable of doing, but very much with an eye toward what would most clearly demonstrate supremacy within their physical constraints.  So it's more fair to say it's hardware/software co-design. (And as I always tell my students, you can't design any computer without a plan for its workload!)

If you squint, it's something like a satisfiability problem (SAT, in CS theory terms).  In a SAT problem, you're given a Boolean circuit and challenged to find a set of input values that makes the (single) output TRUE.  In this case, measuring out all 53 qubits, it's not a single value that's the output, but rather a probability distribution. Using quantum entanglement and interference, in fact, it's not a flat "any number out of $2^{53}$" distribution, but instead has complex statistical characteristics.  It's those characteristics that they are looking for.  They started with a small problem size (using a subset of the qubits), calculated the distribution, saw that the machine conforms (but see the next paragraph!), then increased the size up past the level where their code can simulate it, and declared victory.

There are three big catches to this: one, as already stated, it's not fully programmable; two, it's not clear that this is beyond the capability of classical systems (see below); three, I believe them when they say it works, but noise remains the limiting factor, and it takes tremendous statistical skill to pull the signal out of the noise -- indeed, the fidelity (roughly, the probability that it did the right thing) for their largest problem is only 0.2%!  That is, only one time in 500 does the circuit run properly, with no errors.  And given that the output is supposed to be a probability distribution, and that the space of answers is about $10^{16}$, figuring out whether it differs from pure noise is tough.

The paper itself:
https://www.nature.com/articles/s41586-019-1666-5
53 qubits and 76 authors by my count, which makes 1.43 authors per qubit.  The team is headed by John Martinis (pronounced close to "Martinez"), one of the long-time key people in superconducting qubits.  John was at UC Santa Barbara before Google lured him and his entire team away a few years ago, and put truly outstanding resources at their disposal.

There is also substantial (67 pages) Supplementary Online Material (SOM) at the bottom of the web page, don't miss it if you want details of everything from the fabrication (two chips bonded face-to-face, one with the classical parts and one with the quantum parts!) to the cross-entropy benchmarking (XEB) used to statistically prove their proposition that the device works to the calibration procedure (tremendously difficult) to the algorithm itself.

News & views from William Oliver:
https://www.nature.com/articles/d41586-019-03173-4

The five-minute Google feel-good video:
https://www.youtube.com/watch?v=-ZNEzzDcllU

Blog post by Sundar Pichai, Google CEO:
https://blog.google/perspectives/sundar-pichai/what-our-quantum-computing-milestone-means

They built on techniques developed by Sergio Boixo & co.:
https://arxiv.org/abs/1608.00263

Naturally, IBM, one of the other biggest names in quantum computing, is pushing back:
https://www.ibm.com/blogs/research/2019/10/on-quantum-supremacy/
(full disclosure: I'm Vice Center Chair of Keio's Quantum Computing Center, and we are working with IBM; my students have recently been using IBM's own 53-qubit processor.)

They're not questioning the experiment itself, but whether it's truly post-classical.  Simulators have been advancing rapidly, and _ideas_ for simulators and what counts as quantum supremacy have been moving even more rapidly.

(n.b.: there is pushback on the term "quantum supremacy"; in particular, the Europeans don't like it, they think it echoes Nazi terminology.)

Anyway...the most straightforward way of simulating a quantum computer takes memory exponential in the number of qubits:

10 qubits: 16KB
20 qubits: 16MB
30 qubits: 16GB
40 qubits: 16TB
50 qubits: 16PB
53 qubits: 128PB (144,115,188,075,855,872 bytes)

and, of course, a separate computation for each entry.  But numerous techniques compress that dramatically; one, called tensor networks, (very, very, very roughly) allows a query for the value of a state to reuse previously-calculated results, with the whole thing compressed into a tree of pointers to values, plus modifiers.  IBM's Edwin Pednault and team have simulated up to 56 qubits in special cases:
https://arxiv.org/abs/1710.05867
and this team's new work forms the core of the IBM pushback against Google:
https://arxiv.org/abs/1910.09534

Indeed, Alibaba simulated some special cases related to supremacy last year, up to *144* qubits:
https://arxiv.org/abs/1805.01450

There has been some recent discussion of what it is that truly determines the difficulty of a quantum computation; it's not just the number of qubits or the number of entangling two-qubit gates, it's a more complex function. I should dig up some references on that.

Scott Aaronson's quantum supremacy FAQ:
https://www.scottaaronson.com/blog/?p=4317
and a new post on the topic:
https://www.scottaaronson.com/blog/?p=4372

Simon Devitt's podcast, Meet the meQuanics, just did an episode with some of the inventors of some of the theoretical techniques Google used:
https://soundcloud.com/mequanics/meet-the-mequanics-e53-quantum-supremacy-a-montenaro-m-bremner-r-mann

More useful discussion:
https://spectrum.ieee.org/tech-talk/computing/hardware/how-googles-quantum-supremacy-plays-into-quantum-computings-long-game

And from John Preskill himself, inventor of the term:
https://www.quantamagazine.org/john-preskill-explains-quantum-supremacy-20191002/

Finally, let me point out that Chris Monroe is quoted in Financial Review, talking about his ion trap experiment from a couple of years ago -- featuring, you guessed it, 53 qubits (they were trying for 52 and wound up with 53):
https://www.afr.com/technology/why-google-wasn-t-really-the-first-to-achieve-quantum-supremacy-20190927-p52vg6
https://www.nature.com/articles/nature24654
https://arxiv.org/abs/1708.01044
52 was chosen to get ahead of Misha Lukin's group, which had 51:
https://www.nature.com/articles/nature24622
https://arxiv.org/abs/1707.04344

More than enough for now.  Hope this helps!

Thursday, October 17, 2019

1.3.1 DES (the Data Encryption Standard)

DES, the Data Encryption Standard, was designed by IBM in the 1970s, consulting with the NSA.  DES is a type of cipher known as an iterative block cipher, which breaks data into blocks and repeats a set of operations on them.  The operations in DES form what is called a Feistel network, after its inventor, Horst Feistel.

DES uses a 56-bit key, though products exported from the U.S. were limited to using 40 meaningful key bits [wikipedia/40-bit-encryption]. It was later upgraded to triple-DES, using three 56-bit key pieces and repeating DES three times, giving up to 168 bits of protection.  But it's not just the key size that matters in a block cipher.  The block size itself matters a great deal, as we'll see below.  For DES, that block size is only 64 bits.

DES operates in sixteen rounds, each of which uses a 48-bit subkey generated from the original 56-bit key using a key scheduling algorithm.  In each round, half of the block is tweaked and half initially left alone, then XORed with the tweaked half.  The two halves are swapped before the next round.

The "tweaking" of the right half of the block is done by first expanding the 32 bits into 48 by replicating half of the bits, XORing with the 48-bit subkey, then dividing it into 6-bit chunks and pushing each chunk through one of eight substitution boxes, or S boxes.  Each S box turns 6 bits into 4, using a lookup table defined as part of the algorithm (that is, this operation is not key-dependent).  The S boxes are nonlinear (but not affine), which is the source of the true security of DES; if the S boxes were linear, breaking DES would be easy (or so I am told).

Decrypting DES is exactly the same operation as encrypting, except that the subkeys are used in reverse order.

Slightly more formally, the sequence of operations in a DES encryption is:


  1. Apply the initial permutation (IP) (a fixed operation)
  2. For i = 1 to 16 do
    1. divide the block into two 32-bit halves
    2. expand the right half to 48 bits (a fixed operation)
    3. calculate subkey i
      1. split the key into two 28-bit halves
      2. rotate each half 1 or 2 bits (a fixed operation according to the key schedule)
      3. select a subset of 48 bits (a fixed operation according to the schedule)
    4. XOR subkey i with the expanded right half
    5. split the result into 8 six-bit pieces
    6. push each 6-bit piece through a $6\rightarrow 4$ S-box
    7. permute and recombine the pieces (a fixed operation)
    8. XOR the result with the left half of the block
    9. swap the halves of the block
  3. Apply the final permutation (FP) (a fixed operation)

The $6 \rightarrow 4$ S boxes are obviously inherently non-reversible, but that doesn't matter: their output is only ever XORed into the other half of the block, which passes through the round otherwise untouched, so no information is lost as the block passes through the entire network.
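
A toy Feistel sketch makes both points concrete: the round function below is an arbitrary stand-in of my own (nothing like the real DES F function), it is not even invertible, and yet running the same network with the subkeys reversed undoes the encryption:

def F(half, subkey):
    # stand-in for DES's expand/XOR/S-box/permute round function; need not be invertible
    return (half * 2654435761 ^ subkey) & 0xFFFFFFFF

def feistel(block, subkeys):
    left, right = block
    for k in subkeys:
        left, right = right, left ^ F(right, k)   # only an XOR touches the carried half
    return right, left                            # undo the final swap

subkeys = [0xA5A5A5A5, 0x5A5A5A5A, 0x0F0F0F0F]
plaintext = (0x01234567, 0x89ABCDEF)
ciphertext = feistel(plaintext, subkeys)
print(feistel(ciphertext, list(reversed(subkeys))) == plaintext)   # True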

The success of cryptanalysis is often measured in terms of the number of rounds it can break in a multi-round cipher like DES.  e.g., if DES were six rounds instead of sixteen, we could do yea much.  Various attacks have been able to penetrate DES to different depths.  Of course, the more straightforward approach is sheer brute force; as early as 1977, Diffie and Hellman published a critique in which they argued that brute force stood at the edge of feasible, and indeed in 1998 a special-purpose machine named Deep Crack, built by the Electronic Frontier Foundation, cracked a DES key.  Further advances have made DES cracking possible even for dedicated hobbyists.

Triple-DES (also called 3DES) has several modes of operation, but is usually used with three independent 56-bit keys, $K1$, $K2$, and $K3$, with encryption performed as $C = E_{K3}(D_{K2}(E_{K1}(P)))$ where $P$ is the plaintext, $E$ and $D$ are the encryption and decryption operations, and $C$ is the ciphertext.

DES was withdrawn as a standard in 2005, after having been replaced by AES in 2001, although the U.S. government still allows 3DES until 2030 for sensitive information.

Monday, October 14, 2019

1.3 Bulk Data Encryption

D-H and RSA are pretty easy to understand, and known to be vulnerable to quantum computers, hence attract a lot of attention. The workhorse symmetric block ciphers for bulk encryption are actually much more complex mathematically, and hence harder to understand, but ultimately can be executed efficiently on modern microprocessors and dedicated chips.

A block cipher takes the data to be encrypted, and breaks it into fixed-size chunks called blocks.  If the last block isn't full, it is padded out to the full block size.  We also take a key, and using the key perform mathematical operations on the data block.  In a symmetric system, by definition, the decryption key is the same as the encryption key.  Generally, the output block is the same size as the input block.  There is no requirement that the block and key be the same size.

Ideally, the output block (the ciphertext) will look completely random: about 50% zeroes and 50% ones, regardless of input, and with no detectable pattern.  That is, its entropy will be very high.  Of course, it cannot be completely random, as it must be possible for the data to be decrypted by someone holding the appropriate key.  A good cipher, however, will provide few clues to an attacker.  A single bit's difference in either the key or the original plaintext should result in about half of the ciphertext bits being flipped, so that being "close" offers no guidance on a next step.

Thus, a symmetric key system's security is often linked to its key size; with $k$ key bits, it should require, on average, $2^{k-1}$ trial decryptions to find the key and to be able to decrypt the message.  We will discuss this further when we get to cryptanalysis.

Many encryption algorithms have been designed over the years, but two in particular are too important to ignore, so we will examine them: DES, the Data Encryption Standard, which is still in use in modified form but is primarily of historical interest now, and AES, the Advanced Encryption Standard, which is used for most communications today.

Friday, October 11, 2019

1.2 Key Generation

See the previous section here.

The other algorithm we talk about a lot as being both important and vulnerable to Shor's factoring algorithm is Diffie-Hellman key exchange, which is used for creating the session key we will use for bulk data encryption.  It's important that every session have its own separate encryption key; we don't want multiple conversations sharing the same key, for a lot of obvious reasons.

D-H works as follows:

  1. Alice and Bob publicly agree on a modulus $p$ and a base $g$ (in cleartext is okay)
  2. Alice and Bob each pick a secret random number $a$ and $b$
  3. Alice calculates $A = g^a \bmod p$, Bob calculates $B = g^b \bmod p$
  4. Alice sends Bob $A$, Bob sends Alice $B$ (in cleartext is okay)
  5. Alice calculates $s = B^a \bmod p$, Bob calculates $s = A^b \bmod p$

Both Alice and Bob now have the same secret $s$, which they can use as an encryption key.
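
A toy numerical sketch (tiny, insecure parameters, purely to show the algebra; real deployments use moduli of 2048 bits or more):

import secrets

p, g = 23, 5                          # public modulus and base (toy values)
a = secrets.randbelow(p - 2) + 1      # Alice's secret
b = secrets.randbelow(p - 2) + 1      # Bob's secret
A, B = pow(g, a, p), pow(g, b, p)     # exchanged in cleartext
assert pow(B, a, p) == pow(A, b, p)   # both sides arrive at the same s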

Note that, as-is, D-H is vulnerable to a man-in-the-middle attack, and so must be coupled with some form of authentication so that Alice and Bob each know that the other is who they say they are.

Thursday, October 10, 2019

1.1 Authentication

Previous post in this series is here.

Okay, time to dive into this in more technical depth. Let's go phase by phase:

1.1 Authentication

Authentication, or proving you are who you say you are, is often described as something you have (a physical key), something you know (a secret password), or something you are (biometrics, or physically unclonable functions, or PUFs).

The authentication phase can be accomplished using a pre-shared secret key and symmetric encryption of some message and response, or it can be done via asymmetric, public-key cryptography.

The best-known and most-used form of public key crypto is RSA, developed by Rivest, Shamir and Adleman (but previously discovered by Clifford Cocks of GCHQ, a British intelligence agency):

1. Pick two large primes, $p$ and $q$, let $n = pq$.
2. Calculate $\phi(p,q) = (p-1)(q-1)$.
3. Pick $e$, relatively prime to $\phi(p,q)$.
4. Calculate $d$, s.t. $de = 1 \bmod \phi(p,q)$, which we can call the inverse of $e$ modulo $\phi(p,q)$.

The tuple $\langle n,e\rangle$ is now your public key, and $d$ is your private key. You can publish $\langle n,e\rangle$ any way you like, including in the New York Times. To send you a message $m$, I calculate the ciphertext $c$,

$c = m^e \bmod n$

and send $c$ to you.  You can recover the message $m$ by

    $m = c^d \bmod n$
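
A toy numerical sketch of the whole procedure, with deliberately tiny (and hopelessly insecure) primes; the specific values are just the usual textbook example:

p, q = 61, 53
n = p * q                     # 3233
phi = (p - 1) * (q - 1)       # 3120
e = 17                        # relatively prime to phi
d = pow(e, -1, phi)           # 2753, the modular inverse (Python 3.8+)
m = 65                        # the "message"
c = pow(m, e, n)              # encrypt: 2790
assert pow(c, d, n) == m      # decrypt recovers m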

That's all there is to it!  Except that it's expensive, so we don't use it for every message.  If I send you "Hi, Bob, it's Tuesday, let's use 42 as our key," using your public key and you reply, "Tuesday is beautiful, Alice, and 42 is fine," using my public key, we confirm that we are definitely Bob and Alice, known as authentication -- assuming, of course, that you trust that the key that you are holding really and truly belongs to me, and I haven't leaked it out somehow!

As you might guess from the publication of $n$, RSA is the algorithm that's vulnerable to advances in factoring large integers, and hence of great interest to spooks once Shor's algorithm came about.

Current EU recommendations are for 3072-bit RSA keys, recently (2018) upgraded from 2048, for "near-term protection" (up to ten years). They recommend an extraordinary 15360 bits for "long-term protection" (thirty to fifty years). [Cloudflare, Keylength.com, Ecrypt, Lenstra]

(Note: there are many more references to come; they will be delivered in a batch at the end.  It's not practical to list them all here in every article as we go, but I'll try to give you some of the key(ha!) ones.)

1.0 Encrypted Communications

See the previous posting if you haven't.

An encrypted conversation involves, at the macro level, three phases:

  1. Authentication: proving that you are who you say you are
  2. Key generation: creating the keys used for bulk data encryption
  3. Encryption/sending/decryption of the user's valuable bulk data

That's a bit of an oversimplification, as we'll see below when we talk about IPsec, but good enough for now.

You probably all know that there are two major kinds of cryptography -- symmetric key and asymmetric key, also known as public key.  Due to the high costs of computation for public key, bulk data encryption is done using symmetric key.  Symmetric encryption comes in two kinds, block ciphers and stream ciphers.  These days pretty much everything seems to be block, but I'm not sure why.

Some additional terminology:

  • cleartext: data that isn't encrypted and isn't really intended to be (sometimes confused with the below, even by me)
  • plaintext: the original, unencrypted message
  • ciphertext: the encrypted message
  • integrity: the data hasn't been tampered with
  • confidentiality or privacy: the data hasn't been disclosed to anyone unauthorized
  • session keys: the keys used for one communication session
  • forward secrecy: keeping your data secret in the future, esp. by building a cryptosystem that doesn't reuse keys
  • rekeying: changing the keys used in the middle of a session
  • subkeys: keys used for a portion of an encryption process, derived from a subset of the bits of the session key
  • cryptoperiod: the time that a specific key is authorized for use

Attacks on encrypted communications generally fall into one of three
categories:

  • unknown plaintext (also called ciphertext-only): the hardest problem; how do you recognize when you've found the message?  With many but not all systems, failure will leave only unintelligible, random data, while success will produce words from the dictionary or other recognizable text.
  • known plaintext: when an attacker knows what the plaintext corresponding to a particular ciphertext is, and attempts to find the key; not uncommon given the regularity of communications such as web browsing or email.
  • chosen plaintext: when the attacker can control the text to be encrypted, but obviously not the key; rarer than known plaintext, but can happen with small devices that a person may "own" but not always completely control, or if the attacker partially controls some subset of resources, such as a related web server, or has compromised one or more hosts behind an encryption gateway.

We also need this definition:

  • brute force/exhaustive search: checking every possible key, which of course is $2^n$ for an $n$-bit key, resulting in an expected hit time of half that number of trials; if you have a method that will find a key in substantially less than $2^{n-1}$ trials, you have "broken" the cipher, even if your attack isn't necessarily practical in the short run.

And three mathematical definitions I know from algebra on the real numbers, but am a little fuzzy on in the bitwise, cryptographic context:

  • linear function: I think in this context, $f(x,y)$ is a linear function of some bits if it involves only XOR (addition mod 2) of the inputs, so that $f(0,0) = 0$ (the origin is preserved) and, importantly, $f(x_1 \oplus x_2) = f(x_1) \oplus f(x_2)$.  Multiplication (AND) is disallowed.
  • affine function: same as a linear function, but a constant offset is allowed, so that possibly $f(0,0) = 1$ (the origin isn't necessarily preserved); this is the equivalent of a translation in a real space $R^n$.
  • nonlinear function: one for which $f(x \oplus y) \ne f(x) \oplus f(y)$ for some values $x$ and $y$.  By that test, affine functions with a nonzero offset already fail to be linear, but in the cryptographic context "nonlinear" usually means not even affine, which is where multiplication (AND) and arbitrary mappings like S-boxes come in.  (A small numerical check follows this list.)
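
A quick numerical check of those definitions over two input bits, with XOR playing the role of addition (the toy functions are my own, just to make the distinction concrete):

def is_linear(f, nbits):
    vals = range(1 << nbits)
    return all(f(x ^ y) == f(x) ^ f(y) for x in vals for y in vals)

lin = lambda x: ((x >> 1) ^ x) & 1           # XOR of the two input bits: linear
aff = lambda x: (((x >> 1) ^ x) & 1) ^ 1     # the same plus a constant 1: affine only
nonlin = lambda x: (x >> 1) & x & 1          # AND of the two input bits: nonlinear

print(is_linear(lin, 2), is_linear(aff, 2), is_linear(nonlin, 2))   # True False False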
Next posting in this series is here.

Wednesday, October 09, 2019

0.0 Introduction

I just posted this introduction to the project here.  I hope you'll follow along as I fill in the postings.

There are two fundamental reasons for periodic rekeying:
  1. Having a long string of data encrypted with the same key increases the ability of an attacker to find the key.
  2. As a practical matter, reducing the amount of data that is exposed in the event that a key is broken is good stewardship of your data.
So changing the key "often" makes it both harder and less valuable for an attacker to find a key.  That's the tl;dr.  Turning it into concrete, numeric recommendations is hard, but it looks like rekeying every half hour is pretty good, if your key bandwidth isn't high enough to do full one-time pad.

The above I knew off the top of my head, but I'm not a cryptographer and had more or less taken the need for changing the keys (known as rekeying) on faith.  So, let's dig a little deeper...there won't be any new knowledge in this, but I hope it helps bring together some information and organizes it to both save you the googling and calculating I've done and to provide a clear picture.

I used to work on a box known as an IPsec gateway, but I was the OS guy, not the cryptographer; Darrell Piper, Dan Harkins and Dave Kashtan did most of the crypto work.  I am also a board member of the WIDE Project, which through its KAME Project created the Racoon software that was an important early implementation of the key exchange protocol for IPsec, with much of that work done by Shoichi Sakane.  Much of what I know about IPsec and other security was just incidental radiation from them, or gained from the security-oriented papers we had to read when I did my master's at USC back before the Enlightenment.

I'm afraid this has gotten longer than I originally intended, so I hope it's all useful.  It's so long, a table of contents is in order:
  1. Brief, mostly non-mathematical intro to cryptography and its terminology, in the context of encrypted communications.
  2. General ideas behind cryptanalysis and the importance of limiting data volume, since that's the topic that started this conversation.
  3. How these ideas play out in IPsec, one of the Internet's key security protocols.
  4. How these ideas play out in TLS/SSL.
  5. Quantum and crypto.
  6. Post-quantum crypto.
  7. ACT NOW and order the above package, and as a FREE bonus, Ronco (well, Rodco) will send you a few thoughts on Bitcoin and its vulnerability to the development of quantum computers!
  8. References
Next posting is here.

What Every Quantum Researcher and Engineer Should Know about Classical Cryptography

At a conference in Berlin in June, I was asked, paraphrasing, about why we change the cryptographic keys used during long communication sessions, and how often they ought to be changed.  I waved my hands a little bit, but I wasn't completely satisfied with my own answer. It turned out to be a good opportunity for me to dig into a little background I'd been meaning to go through.  It also proved to be a remarkably hard question to answer from googling up pages, including cryptography research papers and the extensive mailing list archives from development of key Internet protocols.  I fell down a pretty serious rabbit hole here, and haven't found my way out yet.

What I have wound up with is about 25 pages of notes on classical cryptography, and I have decided to turn them into something that will benefit the quantum community at large.  I'm tentatively titling it, "What Every Quantum Researcher and Engineer Should Know about Classical Cryptography", in homage to Goldberg's classic paper on floating point arithmetic.

The target audience for this is primarily quantum computing researchers who are familiar with Shor's algorithm, Grover's algorithm and QKD, since those are among the first things you learn, but who have only a very rough idea of what it means to actually encrypt data and to use encryption in a real-world setting.

I'll be posting the notes gradually over the next few weeks here on this blog, and I hope you will comment and help me improve them.

Table of Contents:
  1. Introduction
  2. Encrypted Communications
    1. Authentication
    2. Key Generation
    3. Bulk Data Encryption
      1. DES
      2. AES
      3. Limitations of Block Ciphers
    4. Notes & References
  3. Cryptanalysis
    1. Defense
      1. Entropy
      2. Diffusion and Confusion
    2. The Birthday Paradox, or, When Should I Change my Encryption Keys?
    3. Differential Cryptanalysis
    4. Linear Cryptanalysis
    5. Known and Chosen Plaintexts in Real Systems
    6. Notes & References
  4. IPsec and the IETF
    1. Internet Standards
    2. (Classical) IPsec
    3. Digging into the Cryptanalysis of IPsec
    4. IPsec with QKD
    5. Notes & References
  5. TLS/SSL and cryptography
    1. TLS Records and Basic Limits
    2. Keying and Rekeying
    3. Other Attacks on TLS
    4. TLS and QKD
    5. Notes & References
  6. Quantum Attacks on Classical Crypto
    1. Shor 'Nuff
    2. Grover's Amplifier
  7. Post-Quantum Cryptography
  8. Bitcoin/Blockchain and Quantum Computers
  9. Conclusion
  10. Additional References
See the first installment here.

Tuesday, October 08, 2019

testing, testing, is this thing on?

My interest in running a blog was tailing off when I lost access to this account, so I let it go for a long time. Now, I have some things to say, and I may have recovered access here.

Monday, October 07, 2013

Who's Afraid of Peer Review?

I've been thinking about restarting this blog, which has been kind of dormant for a while.  One of the topics I've been thinking about covering is research integrity, both in Japan and throughout the world.  Here is one tidbit along those lines.  Perhaps I'll comment a little more later, perhaps not.
Who's Afraid of Peer Review?

Sunday, March 17, 2013

I heard yesterday that Mr. Phillips died on March 13, at the age of 84.  When I heard the news, I cried steadily for half an hour, and am crying again now as I write this.  Mr. Phillips was, outside of my mother and father, perhaps the single person I most respected; he influenced the direction of my life disproportionately.  He was more than simply a high school science teacher, he was a role model in life, a mentor, and a very good friend.  I will miss his quiet voice, warm smile and the light in his eyes, almost conspiratorial with some shared humor but never smug or excluding, until the end of my days.

Jack Phillips was the Williamson High School biology, advanced biology, and physiology teacher during my days as a student.  But that is like saying Magic Johnson was a basketball player -- it is a simple fact, but gives you no insight into his accomplishments, let alone deeper into the person.  This essay won't be able to do him justice, either, but I hope it will give some measure of the man.

Mr. Phillips believed in the intelligence, curiosity and maturity of the students (sometimes in the face of overwhelming evidence to the contrary).  He treated us all like we wanted to be treated -- like adults.  He found ways to make difficult concepts accessible and lively, even taking us to his farm to study the plants there.

I developed a deep bond with him.  Even as a young student, I would go to his classroom during lunch breaks, before or after school, and knock on his door.  He would always open it with a welcoming smile. We would sit and talk, sometimes for a few minutes, sometimes for an hour.  He would show me experiments he was setting up for class, and even ask me to help.  Sometimes my best friend, Michael Hensley, would go with me, sometimes I went alone.

Mr. Phillips had been in the Navy for a number of years before coming back to Williamson and becoming a teacher, so he knew much about the world, had been many places and seen many things.  We shared books and magazines -- usually serious, sometimes humorous, sometimes banal -- and video tapes of Nova and Cosmos episodes.  These conversations almost always began with some relationship to science, but no topic was off limits, and we covered much of life.

These visits continued even after I graduated from high school, though they gradually tailed off over the years.  Mr. Phillips moved from educating the students to educating the educators, and I would visit his office at the Board of Education.  Michael and I would also drive up to his farm, twenty minutes up the river, and drop in unexpectedly, on a weekend afternoon or even a weekday evening.  If he was home, I can't remember ever being turned away.  Mrs. Phillips would ply us with lemonade and we would sit and talk, on the back porch on warm days, inside the house evenings or if the weather was either too hot or too cold.  Sometimes we repaid the intrusion by helping in minor ways with farmwork, but most often we just chatted.

His stories of growing up on the farm were wry, often hilarious. Perhaps our favorite was the story of tearing down the barn.  His grandmother (if I recall correctly) had an old barn that was falling down, and she wanted it removed.  Jack and his cousins offered to do it.  Of course, they didn't tell her that they had found a box of dynamite, abandoned by miners or the railroad.  They knew enough about dynamite to know how to set it off, and enough to stay well clear, but not enough to know how much to use, so they used it all.  The explosion not only succeeded in demolishing the barn (to the point where there were few pieces even worth picking up), it blew windows out at the main house, and the chickens that had been quietly going about their business in the yard made themselves very scarce.  After their own ears stopped ringing, the kids had to round the chickens back up; the hens refused to lay for some time afterward.

More than a talker, though, he was an extraordinary listener.  He weighed everything you said, correcting gently when necessary, but always making you feel that your opinions and ideas were valued.  I strive (but often fail) to emulate his approach.

I can only remember him getting something like angry once.  We were at his house, playing Trivial Pursuit (whether the Phillipses owned it, or Michael and I brought it with us, I don't recall).  Mr. Phillips kept trying for the Science questions, and kept getting questions like, "If you were born on March 18, what is your astrological sign?" His annoyance with such un-scientific claptrap in the science category was palpable.

It was Mr. Phillips's sincere interest in me and the sterling role model he provided that gave me the confidence to pursue a career in science and engineering.  My parents, especially my father, were always supportive of my interest in the sciences, taking me to Saturday morning science classes when I was young, subscribing to magazines, talking about it, taking us to see solar eclipses and space launches.  But without Mr. Phillips's steady presence at just the time I was figuring out who I was, and applying to colleges, I might have drifted off into some other field.  He helped me make the right choices in life, first with his advice and incisive questions, later as a presence in my mind.

I don't mean to slight the contributions of my other mentors.  I have had half a dozen older mentors, at least as many peers, and many wonderful teachers who changed the direction of my life.  I owe them all a debt I can never repay.  But Mr. Phillips was there at the right time to guide me.  He was perhaps the first outside of my parents to take me seriously and treat me like an intellectual peer and colleague, if a young and naive one, rather than just a student to be trained.

I think the last time I saw him, my older daughter was a toddler, so it must have been a dozen years ago.  We were visiting my parents, and on a whim I picked up the phone and called.  He was actually living down by the Tennessee border by then, but by coincidence happened to be up at the farm doing some maintenance.  My wife, daughter and I drove up, and had a wonderful chat with him.  I remember we talked about Japan quite a bit (although we were still living in California at the time), though I don't recall any of the details.  I'm not sure I ever even told him that I am now a professor at a university, which I think would have made him proud.

He was loved by all of the students of my generation.  Teachers of his caliber are rare.  So are human beings.  His passing saddens me, but if the measure of value of a life is its impact on others, he made the world a far better place.  I can only hope that I pass on to my own students some of the same intellectual and personal traits.

I'm sorry I'm not there today with you.  My love to Mrs. Phillips and
Ann.

Saturday, September 22, 2012

Thank You, Endeavour

You friends posting fabulous pictures of Space Shuttle Endeavour's last flyby over California are making me weep.  First, I'm crying because I'm not there to celebrate this valedictory moment with you.  I love you all and miss you terribly, and this reminds me of that because of the space program events we have shared, both triumph and tragedy.  Some of those have been defining moments in our lives and our friendships: watching landings at Edwards Air Force Base; sitting in the shed in the Lloyd House courtyard, mute, staring at the images of Challenger on the big-screen TV, with one friend that evening passing out black strips to be attached to basketball uniforms.

I weep for the machine itself.  A fabulous piece of hardware, one of mankind's greatest technical accomplishments.  I still use a technical paper about the shuttle's software when I teach about real-time operating systems, and it never fails to impress.

Fabulous, but flawed: I weep for Dick Scobee, Michael Smith, Judy Resnik, Greg Jarvis, Ellison Onizuka, Ronald McNair, and Christa McAuliffe.  I weep for Rick Husband, Willie McCool (perhaps the greatest name for an astronaut, ever), Michael Anderson, Kalpana Chawla, David Brown, Laurel Clark and Ilan Ramon.

I weep for the program; from my junior high school days until now the space shuttle has dominated our conception of what it means to fly into space.

I weep for the end of an era.  The step into commercial space travel is fraught with uncertainty.  Perhaps we should have done it twenty years ago, perhaps never; we won't know for another decade.  I stand in awe of the accomplishments that the commercial space companies have already made.  I am hopeful, but apprehensive.

I weep for Apollo, the boldest among many bold things Americans have done.  I weep because we left the job unfinished, and because the day may come again when no American, and perhaps no human, alive has been to the Moon.  As we saw last month, somewhat to our surprise our
heroes are mortal, and the youngest of them are nearing eighty.  To me, the greatest generational dividing line is July 20, 1969: were you born before, or after, a human being set foot on another celestial body for the first time?

I weep for our dreams.  Make no mistake about it -- I am a huge supporter of our unmanned space program, and Curiosity shows once again what we can accomplish when we try, and that it has the power to capture the imagination of people around the world.  But as long as we inhabit these bodies of flesh and blood, part of what it means to be human is to challenge ourselves physically as well as intellectually.  "To boldly go where no man has gone before" captured it perfectly.

Finally, I weep for the little boy who thrilled to every launch, with memories of Apollo 15 and STS-1 in person, to every event and landing, every discovery.  In what seems simultaneously an instant and forever, the boy grew from a dreamer into a hard worker, from a skinny kid with unruly hair to a middle-aged, gray-bearded, balding, slightly overweight man.  The first, the greatest dream was always to become an astronaut.  Indeed, I was worried I had been born too late, that by
the time I was an adult all of the interesting exploration would be done!  But still, just going would be wonderful.  Somewhere along the way, the dream got set aside in favor of easier-to-achieve goals.  I
am proud to be a professor, proud of my research and students, and deeply in love with my family.  But my wife still tells the story that I once said, "If aliens stop by, and ask if I want to go with them,
I'm saying 'Yes!'  Sorry if it means leaving you behind.  Maybe you can come, too?"

The tears are tears of deep emotion, both joy and sadness, tears for ourselves, for all of humanity, for our breathtaking accomplishments, our failures, and for our ability to go on.

Thank you, Endeavour.

Reach for the stars.

Thursday, April 14, 2011

New Paper Dance: Recursive Quantum Repeater Networks

I've been neglectful of this blog lately, but this paper has what I think are some ideas that are a good fit for quantum networking.  We're just beginning the discussion about QRNA (Quantum Recursive Network Architecture), though, and comments are very welcome!

Our paper Recursive quantum repeater networks is part of a special issue of Progress in Informatics on quantum information technology.


Thanks to Joe Touch (and his Recursive Network Architecture (RNA) project), and Clare Horsman for their work on the paper.  Wouldn't have happened without them.

#Quakebook

"2:46" is now available on Amazon.  100% of the proceeds go to the Japanese Red Cross.  Buy your copy today!
http://www.quakebook.org/

Tuesday, April 12, 2011

Abstracts for Systems Papers

Generic advice on abstract writing for systems papers:

6 sentences in an abstract:

1. Define the problem you're solving
2. Give the key idea for how you solved it
3. Describe how you demonstrate the success of your solution
4. Give key results, preferably numerically
5. Describe how this impacts the world/industry/whatever (big picture)

Next, go back and reread it, and figure out which of those topics
needs a second sentence, and fill that in. Most often, it's the data
or experiment. Voilà! Six sentence abstract!

...now go back and think about whether you really need that first
sentence. Often the first two can be combined, if what you are
working on is a well-understood "hot topic". But be *very* careful
about eliminating it, lest you appear to be doing empty-headed "cool
prototype" building. This is also one of the key places your paper
needs to be timeless; people will hopefully read your abstract for
years to come, but they won't read the paper if they don't like the
abstract!

The abstract *must* be clear about whether your results are analytic,
simulated, or measurements of a real-world system or lab prototype.

Not a perfect formula, and formulas shouldn't be over-used anyway, but
it's a pretty good way to do it. The abstracts of 90% of the papers I
read could be improved by following this approach.

Sunday, March 13, 2011

Diary From the Last Couple of Days in Japan

Dave, others:

This is a collection of notes, lightly edited, compiled over the last
two days, so the voice and time line move around a bit.  Apologies.

OK, first impressions:

1. The earthquake was by far the biggest I've ever been in (or hope to
be in), but the building we are in is brand new and very well built,
so it swayed a lot but never felt dangerous.
2. Being a refugee is both stressful and boring at the same time, even
when you're with friends in a place you know is safe.  The biggest
thing, of course, is the lack of reliable information; several
people around me have 1seg keitai which give a *very* poor TV image,
enough to be scary but not provide a lot of detail.

Observations:

1. It's a rule of mine not to leave the house without clothing warm
enough for the possibility of being stuck outside for hours; I think
I'll keep that rule.
2. I carry millions of transistors in my pocket, billions in my
backpack.  One would be enough for an AM radio, but no one around me
seemed to have one.

Rolling back to 14:45 Friday...

"Earthquake," I said quietly.  Nobody noticed, Kei-san kept on
talking.  Even I wasn't completely sure at first, and I'm pretty quick
to pick up on them.

"Earthquake, we're having an earthquake," I said, a little louder.

Kei-san said, "Earthquake?...You're right..."

Osamu-san said, "Earthquake?  Really?"

By this time, it had already been swaying for several seconds.

"It's getting bigger," someone said.

Kusumoto-san got up and walked across the room and peeked through the
blinds.  "Electric poles are swaying," he said.  I got up and walked
across the room to join him.

"It's getting even worse," someone said.  "Better get away from the
windows."

"Wow, it's big...this is far away and big..."

Comments like that continued for what seemed like two minutes, before
it calmed down.  The electricity went out.

Shortly, the announcement came to evacuate the building, so we grabbed
our jackets and went out.  Several of us helped a man in an electric
wheelchair, lifting him down the stairs.

This building includes a gym and pool; dozens of kids in speedos and
goggles were forced out into the cold.  I handed out a shirt and
fleece I was carrying (which haven't come back, but if that's my
biggest loss, I'm fine).

While we were outside, I got email on my cell phone (DoCoMo mail)
from my wife, letting me know that she was okay, had one of our
daughters, and was getting the other.  It would be fifteen hours
before we would be able to connect via voice or SMS, but DoCoMo mail
and their 3G packet service operated sporadically from the beginning.
I was able to access Gmail, Facebook and Twitter, enough to get a
message to friends who relayed it to my parents.

Eventually, they announced that they would inspect the building top to
bottom, starting on the 7th floor, before we were allowed back in the
building.  A few minutes later, it started to rain, and a stream of
people went back into the building -- with permission or without, I
don't know.

With power out, we had some emergency lights; our local blackout
continued until 11pm, eight hours after the first shock (but when it
got dark, we could tell that surrounding areas still had power).  We
grabbed our stuff, and were herded into a few rooms on the lower
floors, where we sat on classroom chairs or stretched out on the floor.
Decks of cards and various drinks, including Dad's Root Beer (which
some student literally mistook for a type of beer -- to her disgust
and my delight) and some sort of avowedly foul Korean liquor, and
snacks materialized, and the students and younger folks quickly
settled into a social mood.  I'm fighting either allergies or a bit of
a cold, so I stretched out on the floor to rest.

About half a dozen faculty were in the early part of an overnight
retreat, here on Keio's Hiyoshi Campus rather than at SFC, so I had
extra shirts and some bread and cheese on hand.

Most of the other faculty that I came with have gone home to check on
their own families, moving via car, but none were going in my
direction, so I elected to stay here.  Some of the faculty and staff
and a fair number of students from this campus remain; some live close
enough to walk, but have no power or simply prefer to be with friends
here.

Fighting a cold and stress, and with nothing but emergency lights, I
didn't feel like reading.  Little information was coming in; we were safe,
with nothing particular to do.  No one around needed immediate help,
and simply adding people to the streets and stations was clearly a bad
idea.  I now understand the empty look on the faces you see in refugee
camps.

Eventually, the campus security and general affairs folks came around
and handed out canned water, crackers, and rather musty blankets.

Some people had keitai (cell phones) with one-seg (1seg) receivers,
very low-bandwidth digital TV.  The images we could see were
appalling, with fires burning across broad areas, in the dark.

About 10pm, the Toyoko Line reopened to Shibuya, and people began
filtering out.  I stuck with my resolution to stay put until morning,
or JR began running again.  (I later heard that some faculty took more
than ten hours to get home; those of us who stayed warm, fed and
comfortable certainly didn't regret that decision the next morning.)

Overnight there were a number of aftershocks, and the second major
quake in Nagano, which they asserted was not related.

I was beside myself with worry about the possibility of tsunami coming
to Kamakura.  Our house sits 800 meters from the beach.  A couple of
years ago, they handed out a community disaster handbook that included
a tsunami map, which suggested that 7m is a high enough altitude.
Friday's events clearly show that to be false.  Our house sits right
on the 7m isoline, but a tsunami like Friday's would clearly have swept it away.

The lights came back on at 11pm.  About 1am, I lay down to sleep for
a while; when I woke up at 2:30, many of the students had gone.

More earthquakes, more worries, watching NHK on a big projector screen
until 4:30am, then slept until 6.

When they announced the reopening of some of the JR lines at 7:00, I
left.  Getting to Yokohama was easy; from there it was slow, as they kept
trains running at 35 km/h and stopping at every intersection.  The
platforms and trains were crowded, but not intolerably so.

From Kamakura Station, it's a ten-minute walk home.  I detoured
through the area I think my family should use as an escape route in
the event of a tsunami.  The official map recommends that we make our
way to one of the nearby junior high schools, on higher ground, but
their recommended route passes through a stretch of very low ground
several hundred meters long, and crosses the river.  I'm revising ours
to what I think is a less-exposed route, though we have to cross a
branch of the river somewhere and there's still one low stretch in
it.

That takes us up to the start of the nuclear reactor concerns, which
will have to be a separate post.  I'm struggling with the technical
explanations in Japanese, anyway, so those reading the
English-language press may be better informed.

Over time, as the information flow peaks, my posts will probably lack
originality and insight, but I hope this gives you some idea of what
it's like here on the ground, for a typical family in the Kanto area,
well away from the most seriously hit areas.

Will mine my FB posts and tweets for further material at some point.

Comments and reassurances are always welcome; if I seem abrupt via email,
it's just that I'm trying to handle the flood of check-in emails, both
from people here locally and from those outside asking about us.

We *definitely* appreciate the concerns!  Keep us in mind not just
today but over the coming months; recovery here is going to take time.

Wednesday, April 07, 2010

New Paper Dance

Austin G. Fowler, David S. Wang, Thaddeus D. Ladd, Rodney Van Meter, and Lloyd C. Hollenberg,
Surface code quantum communication, arXiv:0910.4074 [quant-ph], accepted to Phys. Rev. Letters!  Congrats to Austin and the rest of the team.  We are gradually illuminating some of the possible corners in the space of quantum repeater system design.

Friday, April 02, 2010

The New GIGA Program at SFC

The web site for Keio's new undergraduate program, Global Information and Communication Technology and Governance Academic Program (GIGA), is now up and running.

Starting in fall 2011, we'll be accepting freshmen into GIGA to study networking, novel computing systems, international relations, and all of the other fun things we do (including computer architecture and quantum computing, two of the things I teach), with the language of instruction to be English.

Thursday, April 01, 2010

Associate Professor

First day of work as Associate Professor Rodney Van Meter.  No foolin'!

Friday, January 08, 2010

New Factoring Record

New factoring record: 768 bits, about 1500 CPU-years on 2.2 GHz AMD CPUs, 10^20 operations, 2^67 instructions (for some definition of "operation" and "instruction").  The paper claims a 1024-bit number would only be about 1,000 times as hard.
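
A quick sanity check on those numbers (my own back-of-envelope, counting one cycle as one "operation"; nothing here comes from the paper itself):

import math

cpu_years = 1500
clock_hz = 2.2e9                           # 2.2 GHz AMD cores
seconds = cpu_years * 365.25 * 24 * 3600   # ~4.7e10 CPU-seconds
cycles = seconds * clock_hz                # ~1.0e20 cycles
print(f"{cycles:.2e} cycles ~= 2^{math.log2(cycles):.1f}")
# prints: 1.04e+20 cycles ~= 2^66.5, consistent with "10^20 operations, 2^67 instructions"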

Brought to you by Kleinjung, Aoki, Franke, Lenstra, Thomé, Bos, Gaudry, Kruppa, Montgomery, Osvik, te Riele, Timofeev, and Zimmermann.
http://bit.ly/8xXSgy

Just FYI.

Thursday, December 17, 2009

Lecture on Systems for Distributed Quantum Computing

My lecture Systems for Distributed Quantum Computing is up on Keio's YouTube Channel, where you will find other videos from the Spintronics group at Keio as well.

It's an introduction to my research, targeted at physicists, primarily at a qualitative level: no equations, but no stopping to explain vocabulary, either.

Thursday, October 15, 2009

IJQI Acceptance

Our paper Distributed Quantum Computation Architecture Using Semiconductor Nanophotonics has been accepted for the International Journal of Quantum Information special issue on Distributed Quantum Computation. Huzzah!

This paper was very collaboratively written, thanks to Skype. Thaddeus and Austin both deserve full measures of credit for this one. Thanks, guys!

Wednesday, July 29, 2009

Help Me With My Summer Reading

I'm feeling the need to recharge my store of ideas, and I have the
nagging feeling that my lack of currency in a bunch of fields is
causing me to miss some connections I could use in my own research.

So, I'm looking for a reading list of, say, the one hundred most
important papers of the decade. It doesn't have to be an even
hundred, but I'm looking for a good summer's reading. (Given that
it's mid-2009, now would be a good time to start composing such a list
anyway, depending on where you want to place the "decade" boundary.)

I want these papers to cover *ALL* fields of computer science and
engineering; I am by nature catholic in my reading.

You probably know that my current field is quantum computing, but prior to
that I did network-attached storage. The 1990s were a very creative,
ambitious decade in that area; I admit I read much less there now than
I used to, but I haven't seen anything really earthshaking in storage
in the last few years. (Friends still in the area will no doubt
correct me.)

If such a list already exists, I'm happy to use it as-is, otherwise
I'm willing to manage a conversation and create such a list.

Since this list will be very broad, I want only a few "MUST READ"
papers in each field. What are the new ideas in your field? If you
had a short meeting with, say, an NSF god descended from Olympus, what
ideas would you cite to convince him/her that your *field* (not your
pet idea) is a vibrant field with real-world impact, worthy of
large-scale support? What papers or ideas have changed the way you
think?

So, if your field is:

Theory:
* complexity theory
* algorithms
* data structures
* automata/Turing machines/FSMs, etc.
Systems:
* hardware technology (Moore's Law, etc.)
* processor architecture
* systems architecture
* operating systems
* storage systems
Programming languages
Software engineering
Networking
Artificial Intelligence
Search
Databases
Security
Human-computer interaction
Network or systems operations
Ubiquitous systems/sensor networks
Computer Science education (please help me learn to teach!)
etc.

please send me your suggestions.

I'm even willing to go as far afield as robotics and bioinformatics,
if you can convince me it's worth my time to go read. I'm also
willing to accept old ideas that are finding new urgency;
transactional memory would be a good example, virtualization another.

I will leave it to you to balance newness of idea with real-world
impact, and to decide whether to recommend the key original paper(s)
or a survey paper, if one exists already.

(I'm doing this a tad ad hoc; I really should reconcile against, say,
the ACM computing curricula or a journal keyword list, but I probably
won't bother. Some may also object to the categorizations above;
don't worry about it, just send me the papers/ideas you think are
critical, and we will work on the categorization and balance of the
list later.)

(Yes, the choice to limit this to the last decade is arbitrary; there
are plenty of old ideas I'm not familiar with, too -- I recently ran
across R-trees for the first time, for example -- but, generally
speaking, I'm after ideas too new to have made it into textbooks,
otherwise I'd just pick up a recent text.)

(And on these grounds let me say that I'm enjoying the revitalized
CACM, which seems to be helpful in focusing on new, important ideas,
with more timely and accessible review articles than Computing
Surveys.)

A few thoughts to get started:

* The hierarchy of limits in computing technology:
An outstanding synthesis of Moore's Law, Landauer's principle, etc.
Dense, but well-written and worth the effort to understand. I don't
care if the ideas are old, this one is critical.  (A one-line Landauer back-of-envelope follows the entry below.)

@Article{meindl01:_terascale-si,
author = {James D. Meindl and Qiang Chen and Jeffrey A. Davis},
title = {Limits on Silicon Nanoelectronics for Terascale
Integration},
journal = {Science},
year = 2001,
volume = 293,
pages = {2044--2049}
}
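
Not from the paper, but to give a sense of the scale Landauer's principle sets, here is the usual kT ln 2 figure for erasing one bit at room temperature (my arithmetic, standard constants):

import math

k_B = 1.380649e-23             # Boltzmann constant, J/K
T = 300.0                      # room temperature, K
e_min = k_B * T * math.log(2)  # Landauer limit per bit erased
print(f"{e_min:.2e} J per bit (~{e_min / 1.602e-19 * 1000:.0f} meV)")
# prints: 2.87e-21 J per bit (~18 meV)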

* proper network topology analysis:
A clear-eyed look at the mathematical analysis and understanding of the
Internet.

@article{JohnCDoyle10112005,
author = {Doyle, John C. and Alderson, David L. and Li, Lun and Low, Steven and Roughan, Matthew and Shalunov, Stanislav and Tanaka, Reiko and Willinger, Walter},
title = {{The "robust yet fragile" nature of the Internet}},
journal = {Proceedings of the National Academy of Sciences},
volume = {102},
number = {41},
pages = {14497-14502},
doi = {10.1073/pnas.0501426102},
year = {2005}
}

* distributed hash tables (DHTs):
Just barely makes my "decade" cutoff, but one of the most influential
ideas of recent years; fault-tolerant, truly distributed,
loosely-coherent key-value pairs, useful for managing e.g. a lookup
system for P2P networks.  (You can argue for another choice of
paper.  A toy sketch of the ring-lookup idea follows the entry below.)

@inproceedings{stoica2001chord,
title={{Chord: A scalable peer-to-peer lookup service for internet applications}},
author={Stoica, I. and Morris, R. and Karger, D. and Kaashoek, M.F. and Balakrishnan, H.},
booktitle={Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications},
pages={149--160},
year={2001},
organization={ACM New York, NY, USA}
}
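
Here's a minimal Python sketch of the successor-on-a-ring idea (my own toy, with a made-up 16-bit ID space; no finger tables, no churn handling, nothing like Chord's actual protocol):

import hashlib
from bisect import bisect_right

def ring_id(name, bits=16):
    # Hash a string onto a 2^bits identifier ring.
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (1 << bits)

nodes = sorted(ring_id(f"node-{i}") for i in range(8))

def lookup(key):
    # The node responsible for a key is its clockwise successor on the ring.
    idx = bisect_right(nodes, ring_id(key))
    return nodes[idx % len(nodes)]        # wrap around at the top of the ring

print(lookup("some-file.mp3"), lookup("another-key"))

Chord's real contribution is doing that lookup in O(log N) messages, with no node ever holding the whole ring.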

* Peer-to-peer (P2P) networks:
No recommended paper here yet. I've read a few, but none really stand
out in my mind. Suggestions?

* Delay-tolerant networks (DTNs):
No recommended paper here yet. Might not make the decade cutoff.

* Network coding:
Just barely makes the decade cutoff. This is the seminal paper,
AFAIK, but the writing in it is poor; recommendations for an easier
read gladly accepted. A highly theoretical idea that seems to be
gaining surprising traction in real-world systems.  (A toy butterfly-network example follows the entry below.)

@article{ahlswede2000nif,
title={Network information flow},
author={Ahlswede, R. and Cai, N. and Li, S.Y.R. and Yeung, R.W.},
journal={Information Theory, IEEE Transactions on},
volume={46},
number={4},
pages={1204--1216},
year={2000}
}
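
The textbook butterfly example captures the core trick, I think (this is my own toy illustration, not anything from the paper): two sources each need to reach both sinks but share one bottleneck link, and coding at the middle node lets a single transmission serve both.

a, b = 1, 0                 # bits held by sources A and B
coded = a ^ b               # the single packet crossing the shared bottleneck link
sink1 = (a, coded)          # sink 1 hears a directly, plus the coded packet
sink2 = (b, coded)          # sink 2 hears b directly, plus the coded packet
assert sink1[0] ^ sink1[1] == b   # sink 1 recovers b
assert sink2[0] ^ sink2[1] == a   # sink 2 recovers a

Without the XOR, the bottleneck link would have to carry a and b as two separate transmissions.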

* Ajax:
I haven't seen any good papers on it, and it's a philosophy rather
than a technology, but surely it's important enough to rate
here...suggestions?

* Transactional memory:
One possible route to effective parallel programming. The right one?
Good question!

@Article{larus07:_trans_mem,
author = {James Larus and Christos Kozyrakis},
title = {Transactional Memory},
journal = cacm,
year = 2008,
volume = 51,
number = 7,
pages = {80--89},
doi = {10.1145/1364782.1364800}
}

* MapReduce:
One of a set of very good ideas to come out of Google in the last few
years. The right way to get to real parallel programming on very
large datasets? Good question! The database community seems to think
not. But having done a little MPI programming, I can assure you that
MPI is not for the masses, at least not in its current form.  (A toy word-count illustration of the programming model follows the entry below.)

@article{dean:mapreduce-cacm,
author = {Jeffrey Dean and Sanjay Ghemawat},
title = {{MapReduce}: simplified data processing on large clusters},
journal = {Commun. ACM},
volume = {51},
number = {1},
year = {2008},
issn = {0001-0782},
pages = {107--113},
doi = {http://doi.acm.org/10.1145/1327452.1327492},
publisher = {ACM},
address = {New York, NY, USA},
}
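
For anyone who hasn't seen it, here's the obligatory word-count toy in plain Python (my own illustration of the programming model, obviously nothing like Google's distributed implementation): map emits (key, value) pairs, the framework groups them by key, and reduce folds each group.

from collections import defaultdict

def map_fn(document):
    for word in document.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    return word, sum(counts)

def mapreduce(documents):
    groups = defaultdict(list)               # stand-in for the shuffle/sort phase
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

print(mapreduce(["the cat sat", "the dog sat"]))
# prints: {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}

The real system's contribution is, of course, everything this sketch omits: partitioning, scheduling, locality, and fault tolerance across thousands of machines.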

* Rethink the Sync:
Recommended by a friend, who called it his favorite paper of the
decade.

@inproceedings{nightingale2006rethink,
title={Rethink the sync},
author={Nightingale, E.B. and Veeraraghavan, K. and Chen, P.M. and Flinn, J.},
booktitle={Proceedings of the 7th symposium on Operating systems design and implementation},
pages={1--14},
year={2006},
comment ={Best paper; Honey's favorite paper of the last decade.}
}

* Disk MTTF:
An analysis rather than idea paper, but important for anyone who does
large-scale systems or cares when their laptop disk is likely to
fail.  (A quick back-of-envelope on what such an MTTF means follows the entry below.)

@article{schroeder2007dfr,
title={{Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you}},
author={Schroeder, B. and Gibson, G.A.},
journal={Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST)},
year={2007}
}
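
As a reminder of what the datasheet number nominally means (my arithmetic, assuming the constant failure rate the datasheets implicitly assume; the paper's point is that field replacement rates don't match such figures):

hours_per_year = 24 * 365
mttf_hours = 1_000_000
afr = hours_per_year / mttf_hours   # annualized failure rate implied by the MTTF
print(f"AFR ~= {afr:.2%} of drives per year")
# prints: AFR ~= 0.88% of drives per year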

That's eight papers on eleven ideas that have changed the way I think
about computing systems in the last few years, mostly in the
networking area (a couple may have to get deleted to hold the final
list to 100 papers). There are many other topics, of course (chip
multiprocessors and radical new architectures such as TRIPS;
hypervisors and virtualization of systems and networks is another
old/new idea), but I'll leave them to others to promote. I've read
hundreds of papers in the last decade, but I'm interested in what
*you* consider important, not my own biases here. And while IP is a
heavily network-oriented list and will no doubt expand on that set of
ideas above, please look as far abroad as you comfortably can -- or
forward on to colleagues in other areas who might be interested in
such a list.

Please help educate me!

Friday, May 15, 2009

Mountain Music

I do like mountain music, though I wouldn't qualify myself as a "big" fan.

Today's discovery:
WMMT, broadcasting from Whitesburg, KY (about an hour from my parents' place), using 15,000 watts, a decent Internet service, and an all-volunteer staff.

The mountain version of "All My Loving" was nice, though I'm actually partial to instrumentals. I like the traditional mountain music, with dulcimer, fiddle, banjo, including the gospels. Bluegrass is okay, but I'm not a big fan of modern country. I'm not wild about steel guitar and, despite (because of?) being a drummer, I don't like the heavy drums and simple 4/4 beat of a lot of commercial country.

Just to blow my mind, after three hours of "Bluegrass Express", they wedged some Latin jazz in between "I Need You Like a Train Needs a Track" and "He Got You, I Got the Dog" (how can you go wrong with a song that starts, "He's sleeping in my double-wide, hunting on my land" and proceeds to conclude that getting the dog was the better end of the deal?).

I commented recently that the problem with a lot of Internet music services is that they are predictable, and hence have no personality. This one - and my favorite, KCSM - definitely have personality.

And every WMMT announcer I've heard so far sounds like someone I went to high school with. Ah, the sounds of home...