So, last time, we looked at simple substitution ciphers. In a substitution
cipher, you take each letter, and pick a replacement for it. To encrypt a
message, you just substitute the replacement for each instance of each letter.
As I explained, it’s typically pretty each to break that encryption – the basic
secret of the encryption is the substitution system, and it’s pretty easy to
figure that out, because the underlying information being encrypted still has a
lot of structure.
There are a couple of easy improvements on a simple substitution cipher, some of which came up in the comments. For example, two
good easy improvements are:
- Instead of defining substitutions for single characters, define
substitutions for groups (pairs, triplets) of characters. This improves things,
because it allows you to work with groups that will reduce the visibility of
patterns. Still, because there’s so much structure in human language, given
enough data, an encrypted message is still likely to be easy to decode. So this
is great for short messages, but not for anything bigger. - Multiple substitutions: instead of always substituting, say, “x” for “a”,
substitute each letter with a two-digit number. Then for common letters, allow
multiple possible substitutions. By assigning many codes to common letters, and
few codes to uncommon letters, you can make the coded symbols appear with
roughly equal frequency. This can seriously hamper frequency based analyses.
Both of those changes help. They work particularly well when combined. To do
a two-character version of that, you create a list of all possible two-character
sequences. Then you generate a frequency table for how often each two-character
sequence occurs in a large sample of the kind of text you’re going to encode.
Then, finally, you assign a number of substitutions for each pair so that they
occur with approximately equal frequency. That gives you a pretty good
system.
Still, it’s not great. Given enough encoded text, it can be cracked with a
relatively small amount of computational power. If I know the basic idea of the
cipher, and I’ve got a decent amount of encoded text, I can write a program that
will figure it out pretty quickly. Plus, it’s really a lot of work to generate
the cipher – you need to generate frequency tables, and work out the number of
substitions, etc. It’s definitely not trivial to set up, and it’s still pretty
easy to crack.
For that reason, those kinds of solutions aren’t used much – there’s a lot
of prep work, and the secret that you need to share with your partner is large
and complicated. You can get better quality with less effort and a
simple secret using a different scheme called a rotating cipher.