I don’t have a lot of time to write; I’m having my fifth (I think) upper endoscopy done tomorrow, which means that the day’s going to be a wash; and Yom Kippur is thursday, and I need to cook, so between the personal crap and work, I’m not going to have much time for blogging. So I’m trying to make use of the time I have to write one short but (hopefully) interesting post.
One thing that I’ve mentioned in passing is the distinction between message confidentiality, and message integrity.
Confidentiality is most of what we’ve been talking about
so far. Confidentially provides a guarantee that when you send an encrypted message, no one but your intended recipient is able
to read the plaintext.
Integrity is something very different. Integrity guarantees
that if you send an encrypted message, there’s no way that the encrypted message could have been tampered with after you encrypted it, without the recipient knowing it.
Suppose you’re using CBC mode for a message in DES, and you’ve
got an attacker who wants to screw up your message. What happens if the attacker flips a couple of bits in one block? First, the plaintext for that block will be corrupted. Second, the
ciphertext for that block is used for decrypting the next block – so the plaintext for the next block will also be corrupted.
Can you tell that there was a corruption in the message? If the plaintext was something like human language, you can see that it makes no sense. But that’s relying on our meta-knowledge about
the structure/nature of the plaintext. In fact, just looking at
the encrypted message, there is no possible way for us to detect tampering!
In CBC mode – and in fact in all of the modes we’ve looked at, an attacker can change bits of the ciphertext, corrupting the message, and we can’t tell. That’s why we say that
these modes of operation can’t provide any guarantees of message
integrity.
How can we get around that? The easiest thing is to add something called a message authentication code (MAC).
A MAC is basically a hash-code: a short string appended to the
message which in some way summarizes the message, so that if any
part of the message was changed, the MAC will not match the message, and so we’ll know that the message was corrupted.
It’s easiest to understand the MAC using a bit of notation:
Mac = M(T, K), where “M” is the MAC function, T is the plaintext of the message, and K is an encryption key. So you take your plaintext and your key, feed it to M, and get back a short string which is the MAC for your message.
MAC functions need to have some basic properties to be
successful in protecting integrity:
- Changing a single bit of the input message must produce
a significant change in the MAC. - Given a MAC value V, a MAC function M, and a key K, it must be difficult to compute a plaintext T where M(T,K)=V.
- Given an oracle O(T) which computes M(T,K), and the
ability to compute O(T) for an arbitrary set of chosen
plaintexts, it must be difficult to guess the MAC for a
particular message without submitting it to O. (That is, even
in the scenario of a ideal chosen plaintext attack,
where you can choose any set of messages you want, and
get the MAC for those messages, that won’t help you to guess
the MAC of any message outside of that set.) - Given a message T, it must be difficult to find a message
U such that M(T,K)=M(U,K).
If you’re familiar with digital signatures, some of this
should be ringing bells. The idea of a digital signature is very
close to this. The main difference is that a digital signature
is asymmetric: the sender and the receiver have different keys. The sender has a private key, and signs the message;
everyone else has a public key, and can verify that the sender
signed the message. With MACs, both the sender and the receiver have the same key. MACs provide no guarantee of who originated the message; they just guarantee that the message
wasn’t changed between the time it was encrypted by the sender
and decrypted by the reciever.
Next post, I’ll go through one or two MAC algorithms, and then talk about modes of operation that incorporate MAC-like integrity
guarantees.
I’m familiar with MACs from various applications, but I’d never really considered their use protecting the authenticity of a symmetrically-encrypted message. One possible simplification occurs to me, though: Couldn’t you simply append a secure hash of the preceding plaintext to each message? Unless the attacker can predict the effect of his changes (which shouldn’t be possible), he shouldn’t be able to modify the message _and_ the hash in such a way that it remains undetectable, right?
Yes, I believe that an encrypted message’s integrity may be protected by a simple hash appended to the plaintext. However, unlike a MAC, transmitting the hash in the clear could expose known plaintexts.
One of the more interesting applications of MACs I’ve seen was in a instant messaging protocol—use of a symmetric key, instead of asymmetric signatures lets each party know they’re getting authentic messages from the other. Unlike signatures, however, neither can prove to a third party that the other said something, since both have all the information they need to spoof messages from each other.
Actually, I was meaning to say that you could append the hash to the plaintext _before_ encryption.
Mark, this is incorrect:
In fact, the ciphertext for the next block will only have a single bit flipped in it, the bit that the attacker flipped. This is because block n+1 only depends on the ciphertext of block n, not its plaintext.
Nick, your idea is insecure because of the above. If you do:
CBC-Encrypt(msg || HASH(msg))
Then the last block can easily be updated to match the hash of any changes the attacker makes to the encrypted message blocks. The attacker can merely set block n – 1 to:
Ciphertext[n – 1] xor HASH(origMsg) xor HASH(attackerMsg)
That’s the whole reason for using a MAC.
One more comment to Nick. You may be tempted to say that an attacker can’t both change the msg and the hash at the same time. If you’re using counter mode or RC4 (stream cipher constructions), every bit is independent. For schemes like CBC where there is some redundancy, there are a number of cases where the propagation behavior of CBC does not prevent an attack.
If the data to be changed is in the same block as the hash, it can easily be changed at the same time. Depending on the protocol, the corrupted block can be surrounded by type fields that cause the receiver to ignore the result of the hash (say, if the receiver crashes once it hits the corrupted block). The receiver could allow retransmissions, allowing the attacker to keep trying until the modified message is accepted.
In summary, you don’t want to depend on accidental behavior in your system to keep you secure. Without a standard public key signature or MAC, your ciphertext integrity is not protected.
Nate:
For the purposes of a quick and simple explanation, I think that “the following block will be corrupt” is correct. Yes, you can be more precise about exactly *what* the corruption is; but the fact is that you know that changing one encrypted block will result in two plaintext blocks being different than what the sender intended. Corruption is *any change* in the plaintext from what the sender intended.
Fifth upper endoscopy? That’s a little hard to swallow.
Seriously – I hope you’re well and things turn out okay.
Ian:
Yeah, it was my fifth. I had such severe reflux that I had to have surgery. I had three endos before the surgery – the GE I was seeing at the time just couldn’t believe that someone my age (I was 28 when I started seeing him), and kept looking for some other cause. Then after the surgery, I had trouble, and then did an endo to look for clues as to the cause. Now, 12 years later, the doctor I saw for my sinus infection thought it might be reflux related.
The good news is, the endo showed *nothing* wrong. No reflux, the surgical fix from 12 years ago is intact, everything is good.
Off topic, and I know this is fish in a barrel, but you might have some fun picking at Conservapedia’s assertion that “The odds of Obama being truthful in his claim that he converted to Christianity are less than 100 to 1 against it, as fewer than 1% of Muslims convert to Christianity.” Bayes is spinning in his grave…
MACs use shared keys because their focus is -messages- — that is, bits in motion. If you use something like SSL the protocol allows each side to negotiate a shared secret session key, and that session key is used with a symmetrical encryption like AES (or DES) for the rest of the session. That setup can take a significant amount of time, something that has is a real consideration for anyone doing low-level work with SSL connections. Also, either party can demand renegotiation of the session key at any time, and “good practice” requires it be done periodically. (Once every N minutes or M megabytes, etc.)
Anyway, with something like SSL the message is broken into smaller packets and each packet is MAC’d and encrypted with the session key. The receiver verifies the MAC and sends a “packet garbled, retransmit” message if it doesn’t match.
If this sounds a lot like TCP/IP protocol, it should. They’re independent implementations, though, and you can get into nasty error cases if there’s a noisy connection since the TCP/IP and SSL packet sizes might not align. It gets even worse if you have deeply nested layers, e.g., you’re talking to a HTTPS site over an encrypted VPN connection.
Mark, I would expect people to jump to conclusions based on “corrupt”, interpreting that as “unpredictable, unknown changes to the entire block.” This is especially the case since it is what happens to block N and you apply “corrupt” to block N+1 as well. I think that’s misleading.
I would say it as:
@4
I’m thinking about it a few days already but still… How do you know HASH(origMsg)? It is, you know, encrypted, as is the whole plaintext.
I do remember reading articles from clever people that even if using the Authenticate-then-Encrypt scheme, the authentication needs to be “good”, not just some hash — but I don’t understand why.
Can you or MarkCC shed some light onto this? Perhaps we assume different knowledge for the attacker? He only knows the algorithm (cipher+hash used) and can change the message.
By the way, Mark, I think you forgot to select a category for this post — it’s not listed under your “Encryption” topics.
I’ll explain in a moment, but first want to say that if you have questions like this and are in charge of building security into a system, please seek assistance. My day job (link) is reviewing and designing security, especially involving the use of crypto. A security review by a third party should be part of every design project. When I design security for a customer, I usually contact someone else to get a formal review of it afterward.
Your proposal is totally insecure under a known plaintext attack. If the attacker knows the plaintext, they can easily calculate HASH(origMsg), the same as you. Then, they can update the value as I originally described, bypassing the integrity protection.
Even if the plaintext is not known, your proposal may be susceptible to this attack. Consider if the message is “transfer $1M to account #1234” and the attacker’s goal is get the money but he doesn’t know the original account number.
He first registers his own account, say #5678. Then, he intercepts the encrypted/hashed transfer message (above). Now, he goes through guesses for the original account number, generating HASH(origMsg) and the corresponding XOR value I described originally. He submits each modified transfer message to the bank and one is eventually accepted as valid.
A construction like HMAC intentionally includes a large secret value (not just a hash of potentially known data) to avoid this kind of attack. There are many others as well, depending on the application.