If a cipher is good, every individual bit of the output should have a 50% probability of being 0 and a 50% probability of being 1. Likewise, there should be no obvious relationship between the input and output bits (e.g., "If the third input bit is 1, the seventh output bit is 0.") However, it is possible that some combinations of bits don't appear with equal probability. For example, the bit string composed of the first and third input bits and the second and fourth output bits should have all sixteen combinations 0000, 0001, ..., 1111 with equal probability, but perhaps there is a bias. If $P_i$ is the $i$th input plaintext bit and $C_i$ is the $i$th output ciphertext bit, we can construct an equation like
$L = P_1 + P_3 + C_2 + C_4$
(where '+' is modulo 2 addition, or XOR, here). We say that such a combination has a bias if the probability of $L$ being 0 is noticeably different from 50%.
Such a combination is a linear combination of bits. It is known (by whom?) that if the S-boxes in our cipher are fully linear, then the cipher can be easily broken. Therefore, designers always use nonlinear S-boxes, but some bias such as this may be discoverable.
The basic idea of LC, then, is to find such sets of bits that exhibit some bias and use some algebra to recover a few bits of the subkey used in the last round of the cipher, rinse and repeat. The trick is to find the right relationships showing a detectable bias, as was done with differential analysis. This is done by examining the math of the S-boxes in detail; as far as I can tell, this phase of the operation is done by very smart humans.
If you can find a combination with a bias of $\epsilon$, then you need $1/\epsilon^2$ plaintext ciphertext pairs to find some bits of the subkey.
This is done by taking multiple expressions like the above, combining them using Matsui's "Piling Up Principle" where you track the biases multiplied together to make a linear approximation of the nonlinear S-box.
With this linear approximation, it is possible to find correlations between the output of the next-to-last round of the cipher and the original input qubits, from which it is possible to recover information about the subkey used in the last round of the cipher.
Ultimately, this doesn't give you complete information about the key or the plaintext, but substantially reduces the work needed to find the full key by guiding a search, rather than just testing keys at random.
Linear cryptanalysis is considered to be a very general technique, but I don't see extensive attempts to apply it to AES. Indeed, AES (which was developed in the 1990s from the proposed cipher known as Rijndael) was developed specifically with the idea of not being vulnerable to either differential or linear cryptanalysis.
I found Heys' tutorial to be clear and helpful in understanding LC.
No comments:
Post a Comment