Tuesday, May 19, 2020

2.5: Known and chosen plaintexts in real systems

(I'm back, after a bit of a break. If you missed it, you can go back to the beginning of What Every Quantum Researcher and Engineer Should Know about Classical Cryptography.)

(Parts of this will make more sense after you have read the sections that follow; maybe this section should be moved. Also, some pictures would definitely help here.)

Modern HTTPS (web) and SMTP (email) connections carry a lot of predictable content, with commands like 'HELO' and 'HTTP/1.1' being standard parts of an exchange. In a later section we'll see in more detail how this traffic is encrypted using a protocol called Transport Layer Security, or TLS, but for the moment we'll focus only on the message contents and the fact that they are quite predictable. Thus, it's reasonable to treat attacks on TLS as known-plaintext attacks, and in some cases an attacker can go further and mount a chosen-plaintext attack.

Consider, for example, the connection between your laptop and your email server, whether Gmail or your organization's server. Assume that I, an attacker, can send you email and can observe your encrypted connection to your server (perhaps I control a router somewhere between your server and your machine). I can send you an email message that contains, for example, a run of 15 zero bytes, plus a second 15-byte string made of seven zero bytes, a single 0x01 byte, and seven more zero bytes. If your cipher block size is 8 bytes, as in DES, I know that one of the plaintext blocks fed to the cipher will be all zeroes and another will have exactly one bit set, even if I have trouble controlling exactly where my text lands within the overall stream: any run of 15 bytes contains at least one complete, aligned 8-byte block. I capture the corresponding ciphertext blocks and compare them. Two plaintexts differing in a single bit, together with their ciphertexts, are the raw material for chosen-plaintext techniques such as differential cryptanalysis. All I have to do to execute a basic chosen-plaintext attack is to send you email and watch the resulting packets flow between your machine and the server!
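The claim that 15 bytes are enough regardless of alignment is easy to check. Below is a minimal Python sketch, assuming an 8-byte block and representing the rest of the email with a hypothetical filler byte 0xff (chosen so that any block touching filler has at least 8 bits set and cannot be mistaken for a match). It tries all eight possible offsets of each chosen string and inspects the aligned plaintext blocks the cipher would actually process. It models only where the chosen bytes land, not the cipher or its mode.

```python
BLOCK = 8          # block size in bytes, as for DES
FILLER = b"\xff"   # hypothetical stand-in for the surrounding, uncontrolled message bytes

def full_blocks(stream: bytes, block: int = BLOCK) -> list[bytes]:
    """Split a byte stream into the aligned, complete blocks a block cipher would process."""
    return [stream[i:i + block] for i in range(0, len(stream) - block + 1, block)]

def popcount(data: bytes) -> int:
    """Count the 1 bits in a byte string."""
    return sum(bin(b).count("1") for b in data)

ZEROS = bytes(15)                          # 15 zero bytes
ONE_BIT = bytes(7) + b"\x01" + bytes(7)    # 7 zero bytes, 0x01, 7 zero bytes

def check_all_offsets(pattern: bytes, want_popcount: int) -> bool:
    """True if, for every starting offset of `pattern` within a block,
    some full aligned block has exactly `want_popcount` bits set."""
    for offset in range(BLOCK):
        stream = FILLER * offset + pattern + FILLER * BLOCK
        if not any(popcount(b) == want_popcount for b in full_blocks(stream)):
            return False
    return True

print(check_all_offsets(ZEROS, 0))    # True: an all-zero block at every offset
print(check_all_offsets(ONE_BIT, 1))  # True: a one-bit block at every offset
```

Fifteen is 2·8 − 1, the shortest run that works at every alignment: a 14-byte run of zeros fails when it starts one byte into a block, because neither block it touches is entirely zero.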

The success of such an attack depends on many assumptions about which parts of the overall process I can observe and which I can control. But following the principle of being conservative about security, it is not unreasonable, in today's richly interwoven distributed systems, to assume that an attacker can choose at least some of the plaintext passed between two nodes over an encrypted connection.

IPsec has an especially big vulnerability of this kind: one encrypted connection between two gateways (known as a Security Association, which we will see below) may carry traffic for many machines. So if an attacker manages to install a program on just one laptop behind the gateway (say, via email, or while you're sitting at Starbucks), that program can send arbitrarily chosen packets that will cross the tunnel, making a chosen-plaintext attack quite easy. Since IPsec in tunnel mode encrypts the entire original packet, the attacker may not be able to tell immediately which packets came from your laptop and which from a colleague's, but sorting that out is only a minor obstacle.

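Also, for every IP packet from your laptop to the email server passing through the IPsec tunnel, the IP header is going to be nearly identical: the version, source and destination addresses, and protocol are the same from packet to packet, and typically so are the TTL and flags, with only fields such as the total length, identification, and checksum changing. Its position in the encrypted stream is also very easy to identify. This led to some of the decisions around the use of CBC, I believe; I'm not aware of any deeper features intended to further obscure the location of such predictable data.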
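To see how little of an IPv4 header changes between packets of the same flow, here is a minimal Python sketch that builds two 20-byte headers (no options) and lists the byte positions where they differ. The addresses are hypothetical, taken from the RFC 5737 documentation ranges, and the lengths and identification values are made up; it assumes both packets share the same TTL, protocol (TCP), and Don't Fragment flag, which is common but not guaranteed.

```python
import socket
import struct

def ipv4_checksum(header: bytes) -> int:
    """Internet checksum (RFC 1071): ones' complement of the ones'-complement sum of 16-bit words."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def ipv4_header(src: str, dst: str, total_len: int, ident: int,
                proto: int = 6, ttl: int = 64) -> bytes:
    """Build a minimal 20-byte IPv4 header (no options) with a valid checksum."""
    ver_ihl = (4 << 4) | 5      # version 4, header length of 5 32-bit words
    flags_frag = 0x4000         # Don't Fragment set, fragment offset 0
    hdr = struct.pack("!BBHHHBBH4s4s", ver_ihl, 0, total_len, ident, flags_frag,
                      ttl, proto, 0, socket.inet_aton(src), socket.inet_aton(dst))
    return hdr[:10] + struct.pack("!H", ipv4_checksum(hdr)) + hdr[12:]

# Hypothetical laptop and mail-server addresses (documentation ranges, RFC 5737).
a = ipv4_header("192.0.2.10", "198.51.100.25", total_len=1500, ident=0x1234)
b = ipv4_header("192.0.2.10", "198.51.100.25", total_len=576, ident=0x1235)

differing = [i for i in range(20) if a[i] != b[i]]
print(f"{20 - len(differing)} of 20 header bytes identical; differ at {differing}")
```

With these values it reports 15 of 20 bytes identical: only bytes 2 and 3 (total length), byte 5 (low byte of the identification field), and bytes 10 and 11 (checksum) differ. In tunnel mode this whole inner header sits at a fixed offset at the start of each encrypted payload.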

In short, as a defender, you should work on the assumption that a noticeable fraction of your plaintext is known to an attacker even under benign circumstances, and that it's not all that hard for an attacker to mount a chosen-plaintext attack.
