(I'm back, after a bit of a break. If you missed it, you can go back to the beginning of
What Every Quantum Researcher and Engineer Should Know about Classical Cryptography.)
Biham and Shamir wrote a seminal paper, then
an easy-to-read book on their rediscovery of
differential cryptanalysis. The goal of DC is to recover the key used for a session, so it is potentially far more serious than the birthday attack.
The book says, "An interesting feature of the new attack is that it can be applied with the same complexity and success probability even if the key is frequently changed and thus the collected ciphertexts are derived from many different keys." That's pretty alarming. This appears in a paragraph discussing chosen plaintext, so it may be restricted to that case. I infer from this that rather than needing the cumulative accretion of information, each trial is independent.
The attack works with either
chosen plaintext, in which the attacker says, "Please encrypt this message for me, and give me the ciphertext," or
known plaintext, in which the attacker knows that the message is a repetition of "HEILHILTER", as figured prominently in the cracking of the Enigma machine in WWII. Modern HTTPS (secure web browsing) connections and SMTP (email) connections have enough predictability in their content to provide a similar fulcrum for such a lever. There is a substantial difference in the complexity of the two attacks (see Tab. 2.1 in the book). Known plaintext takes
waaay more trials to succeed.
The key idea is the construction of
differentials (hence the name) from specific S boxes. Take two possible inputs to an S box, $X_1$ and $X_2$. We can assume the subkey has been XORed into both $X_1$ and $X_2$ already. $Y_1$ and $Y_2$ are the corresponding outputs of the S box. We know that if $X_1 = X_2$, then $Y_1 = Y_2$ and also $X_1 + X_2 = 0$ (again, here '+' is addition modulo two, or XOR).
For the total encryption algorithm, changing one bit in the input should result in about half the output bits changing. The S box should be similar; small changes in the input should mean large changes in the output. In fact, the S box is small enough that we can exhaustively analyze its inputs. We also know some rules that were chosen during the design phase, for example, changing
one bit in the six-bit input should result in changes to at
two bits in the four-bit output.
So a differential table for each S box is constructed by listing all $2^6$ possible XORs $X_1 + X_2$, and collecting the stats on $Y_1 + Y_2$. From the differences found here, we can work backwards to find with "high" probability (slightly higher than completely random, at any rate) some characteristics of the outputs of the
previous round.
The overall attack is performed by encrypting a bunch of randomly-chosen
pairs of plaintexts (chosen as a pair; first a completely random one, then a second by XORing in a specific value) and comparing their ciphertexts until we find an output pair of ciphertexts that fit comfortably with the difference tables we have pre-computed for the S boxes. Repeat until we have built up some statistics about the right set of bits in the output, and from that we can take a probabilistic guess at the subkey in the last round. Based on that guess, we can narrow down the set of subkeys for the next-to-last round, rinse and repeat. It's still a computationally intensive, tedious process, but much less than brute force. Roughly, the probability of getting the ciphertext pairs we need is proportional to $1/p_D$, where $p_D$ is the differential probability in the table we are looking for, which may be quite small. (This seems to me that keeping a history of things you've already tried would increase the probability of finding a pair you like, so I'm still puzzled by the assertion above that this works even if the key is being changed frequently.)
Ultimately, this attack is considered practical against ordinary DES. Biham and Shamir estimated (p. 8, p. 61) that DES and DES-like systems, even with the full sixteen rounds, could be broken using $2^{43}$ to $2^{47}$ chosen plaintext/ciphertext pairs. With 8-byte blocks, that's encryption of 64 terabytes up to a petabyte. If the system under attack can encrypt a 40Gbps link at line rate, that's only a few hours up to a couple of days of data.
Triple-DES would be much, much longer, but 3-DES is still considered obsolete due to the birthday paradox attack above. AES is stronger against DC, so this attack is of less help against properly-implemented, modern systems.
I know this is a very rough explanation; I have only a wobbly understanding of the process myself! The differential tables are pretty straightforward, once you understand them, but working from there to a full-on attack is a big jump.