Category Archives: Good Math

Rotating Ciphers

So, last time, we looked at simple substitution ciphers. In a substitution
cipher, you take each letter, and pick a replacement for it. To encrypt a
message, you just substitute the replacement for each instance of each letter.
As I explained, it’s typically pretty easy to break that encryption – the basic
secret of the encryption is the substitution system, and it’s pretty easy to
figure that out, because the underlying information being encrypted still has a
lot of structure.
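
As a quick refresher, here’s a minimal sketch of a simple substitution cipher in Python. The shuffled-alphabet key and the function names are my own illustration, not part of any real cryptosystem:

```python
import random
import string

def make_substitution_key(seed=None):
    """Build a substitution key: a mapping from each letter to a replacement."""
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(letters, shuffled))

def encrypt(message, key):
    """Replace each letter with its substitute; leave everything else alone."""
    return "".join(key.get(ch, ch) for ch in message.lower())

def decrypt(ciphertext, key):
    """Invert the key mapping to recover the original message."""
    inverse = {v: k for k, v in key.items()}
    return "".join(inverse.get(ch, ch) for ch in ciphertext)

key = make_substitution_key(seed=42)
secret = encrypt("attack at dawn", key)
print(secret)                 # gibberish, but with the same letter-pattern structure
print(decrypt(secret, key))   # "attack at dawn"
```

Notice that the ciphertext keeps the plaintext’s letter patterns and word lengths – exactly the structure that makes this kind of cipher easy to crack.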

There are a couple of easy improvements on a simple substitution cipher, some of
which came up in the comments. Two good examples are:

  1. Instead of defining substitutions for single characters, define
    substitutions for groups (pairs, triplets) of characters. This improves things,
    because it allows you to work with groups that will reduce the visibility of
    patterns. Still, because there’s so much structure in human language, given
    enough data, an encrypted message is still likely to be easy to decode. So this
    is great for short messages, but not for anything bigger.
  2. Multiple substitutions: instead of always substituting, say, “x” for “a”,
    substitute each letter with a two-digit number. Then for common letters, allow
    multiple possible substitutions. By assigning many codes to common letters, and
    few codes to uncommon letters, you can make the coded symbols appear with
    roughly equal frequency. This can seriously hamper frequency based analyses.

Both of those changes help. They work particularly well when combined. To do
a two-character version of that, you create a list of all possible two-character
sequences. Then you generate a frequency table for how often each two-character
sequence occurs in a large sample of the kind of text you’re going to encode.
Then, finally, you assign a number of substitutions for each pair so that they
occur with approximately equal frequency. That gives you a pretty good
system.
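
Here’s a rough sketch of what that combined scheme might look like. Everything in it – the sample corpus, the code-assignment rule, the function names – is an illustrative assumption, just to make the frequency-balancing idea concrete:

```python
import random
from collections import Counter
from itertools import count

def build_pair_cipher(sample_text, total_codes=1000):
    """Assign numeric codes to two-character sequences, giving more codes to
    more frequent pairs so that the codes appear with roughly equal frequency."""
    pairs = [sample_text[i:i + 2] for i in range(0, len(sample_text) - 1, 2)]
    freq = Counter(pairs)
    total = sum(freq.values())
    next_code = count()
    codebook = {}
    for pair, n in freq.items():
        # At least one code per pair; roughly proportional to how often it occurs.
        k = max(1, round(total_codes * n / total))
        codebook[pair] = [next(next_code) for _ in range(k)]
    return codebook

def encrypt_pairs(message, codebook, rng=random.Random(0)):
    """Encode each two-character chunk as a randomly chosen one of its codes.
    (Decryption just inverts the codebook: every code maps back to one pair.)"""
    chunks = [message[i:i + 2] for i in range(0, len(message) - 1, 2)]
    return [rng.choice(codebook[chunk]) for chunk in chunks]

sample = "the cat sat on the mat and the dog sat on the log " * 50
codebook = build_pair_cipher(sample)
print(encrypt_pairs("the cat ", codebook))
```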

Still, it’s not great. Given enough encoded text, it can be cracked with a
relatively small amount of computational power. If I know the basic idea of the
cipher, and I’ve got a decent amount of encoded text, I can write a program that
will figure it out pretty quickly. Plus, it’s really a lot of work to generate
the cipher – you need to generate frequency tables, and work out the number of
substitutions, etc. It’s definitely not trivial to set up, and it’s still pretty
easy to crack.

For that reason, those kinds of solutions aren’t used much – there’s a lot
of prep work, and the secret that you need to share with your partner is large
and complicated. You can get better quality with less effort and a
simple secret using a different scheme called a rotating cipher.
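
As a teaser, here’s a minimal sketch of one very simple rotating scheme: a shift cipher whose shift advances by one position with every letter, so repeated plaintext letters don’t produce repeated ciphertext letters. This is just my own toy illustration; the real details are in the post.

```python
def rotate_encrypt(message, start_shift=3):
    """Shift each letter, advancing the shift by one position per letter."""
    out, shift = [], start_shift
    for ch in message.lower():
        if ch.isalpha():
            out.append(chr((ord(ch) - ord('a') + shift) % 26 + ord('a')))
            shift = (shift + 1) % 26
        else:
            out.append(ch)
    return "".join(out)

def rotate_decrypt(ciphertext, start_shift=3):
    """Undo the encryption by applying the same advancing shift in reverse."""
    out, shift = [], start_shift
    for ch in ciphertext:
        if ch.isalpha():
            out.append(chr((ord(ch) - ord('a') - shift) % 26 + ord('a')))
            shift = (shift + 1) % 26
        else:
            out.append(ch)
    return "".join(out)

print(rotate_encrypt("aaaa"))                            # "defg" - no repeated letters
print(rotate_decrypt(rotate_encrypt("attack at dawn")))  # "attack at dawn"
```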

Continue reading

Simple Encryption: Introduction and Substitution Ciphers

The starting point for talking about encryption is to understand
what the point of it is: what it’s supposed to do, and what problems it’s supposed to avoid.

Encryption is fundamentally about communication: you’ve got two parties who want to communicate, but don’t want anyone else to be able to listen in.

The way that you do that is by sharing a secret. You use that secret to somehow modify the information that you’re going to send, so that it can’t be read by someone who doesn’t have the secret. People often think of encryption as a way of using a password to hide information, but a password is just one of many kinds of secrets that you can use. The secret that you share with your counterpart can be a password, a number, a textbook, or just about anything else you can imagine.
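
Here’s a toy illustration of that idea – using a shared secret to modify a message so that only someone holding the same secret can undo it. This is a repeating-key XOR, which is emphatically not secure; it’s just a picture of the “shared secret modifies the message” idea:

```python
def xor_with_secret(data: bytes, secret: bytes) -> bytes:
    """Combine each byte of the message with a byte of the shared secret.
    Applying the same operation again with the same secret undoes it."""
    return bytes(b ^ secret[i % len(secret)] for i, b in enumerate(data))

secret = b"swordfish"
scrambled = xor_with_secret(b"meet me at noon", secret)
print(scrambled)                           # unreadable without the secret
print(xor_with_secret(scrambled, secret))  # b"meet me at noon"
```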

Continue reading

Encryption, Privacy, and You

As you’ve probably heard, the US customs service has recently asserted the right to confiscate any and all computers and/or digital storage carried by anyone crossing the US border. They further assert
the right to demand all passwords, encryption keys, etc., from
the owners. They even further assert the right to keep or make copies of any data that they find, and to share it without limit with anyone they choose.

I don’t think I really need to stress how insane this is. Back
when I worked for IBM, I frequently travelled to Canada, because I
worked with development labs in Toronto and Ottawa. When I did that, I
carried a computer full of stuff that IBM considered to be highly
confidential and highly sensitive. (I’ve even still got a wall-plaque
from IBM thanking me for my work on a project, where I’m not allowed to
ever tell anyone what I did to earn it!) What this policy
says is that the border service would have the right to turn that
information over to anyone they wanted, without informing me
or IBM that they had done so. Further, some of the information on that
laptop was encrypted, and I did not have the key. It was
encrypted with a system that would only allow it to be opened if the
computer could contact a particular IBM server from inside the IBM
firewall. So not only could the border service have confiscated the
computer and passed on confidential or private information – but they
could have arrested me for refusing to decrypt the information on the
computer – even though I couldn’t decrypt it.

This isn’t new news. They’ve been doing this for a while, and we know they’ve been doing it – they’ve made absolutely no attempt to
hide it.

The reason that I’m writing about it now is because I just read
something on Salon about how an allegedly knowledgeable and tech-savvy
person recommends coping with this, and I can’t possibly disagree more
strongly. On the Salon Machinist blog, Denise Caruso wrote:

Swire notes that agents at the border are going further than just
taking image copies of people’s hard drives. They’re actually
demanding passwords and encryption keys so they can examine the
contents.

Of course, they promise to destroy the copies and the keys as soon
as they’re done — as long as they don’t find anything illegal, like a
downloaded song you didn’t pay for — so no security worries there,
right? There’s no such thing as a crooked customs or Border Patrol
agent.

This gives government agents access to information they would
never get by opening up your suitcase. In addition to e-mail,
spreadsheets, documents and personal financial information like credit
card receipts and photos, nowadays they can also listen to your stored
Skype calls and voice mails.

Not to mention that just having encrypted data on your hard drive
causes suspicion, or at least throws down the gauntlet. If you were
looking for illegal stuff and you ran into a file that looked like
this,

qANQR1DBwU4D/TlT68XXuiUQCADfj2o4b4aFYBcWumA7hR1Wvz9rbv2BR6WbEUsy
ZBIEFtjyqCd96qF38sp9IQiJIKlNaZfx2GLRWikPZwchUXxB+AA5+lqsG/ELBvRa
c9XefaYpbbAZ6z6LkOQ+eE0XASe7aEEPfdxvZZT37dVyiyxuBBRYNLN8Bphdr2zv
z/9Ak4/OLnLiJRk05/2UNE5Z0a+3lcvITMmfGajvRhkXqocavPOKiin3hv7+Vx88

wouldn’t you immediately need to know what it said? It could be a conspiracy! It could be a list of child pornographers! It could be a copyrighted magazine article! It could be a bootleg Led Zeppelin video!

Urgh.

So I figure the best solution is to encode your files rather than
encrypt them, so that you could hide your stuff in plain sight. If
agents don’t know something is encrypted and it looks innocuous, they
won’t compel you to give them the key. “Here’s your laptop, ma’am.
Sorry for the inconvenience.”

That’s the wrong answer. The solution isn’t to try to hide the
fact that you’re taking your own/your employer’s privacy seriously. The answer is to make encryption so absolutely routine that (A) finding encrypted files on a computer is too common to be used as a distinguishing characteristic to justify confiscating your computer, and (B) it’s so incredibly painful and laborious for them to get any data off of a computer that they give up.

The first part of the instructions for how to do this is below.

Continue reading

Solving Tic-Tac-Toe: Game Tree Basics

Moving on from simple zero-sum games, there are a bunch of directions in which
we can go. So far, the games we’ve looked at are very restrictive. Beyond the
zero-sum property, they’re built on a set of fundamental properties which ultimately
reduce to the idea that no player ever has an information advantage over any other
player: the complete payoff matrix is known by all players; no player gets
to see the other players’ strategies before selecting their own; and so on.

Moving on to more interesting games, we can relax those assumptions, and allow information to be hidden. Perhaps each player can see a different part of the
payoff matrix. Perhaps they take turns, so that one player gets to see the other’s
strategy before selecting his own. Perhaps the game isn’t zero-sum.

Non-zero sum games turn out to be disappointing from a game theory point of
view. Given a suitable set of restrictions, you can convert a non-zero-sum game to a
zero-sum game with an additional player. In the cases where you can’t do that,
you’re pretty much stuck – the mathematical tools that work well for analyzing
zero-sum games often simply don’t work once you relax the zero-sum requirement.

The more interesting ways of exploring different realms of games come when you
allow things to get more complex. This comes about when you allow a player’s strategy
selection to alter the game. This generally takes place in a turn-taking
game, where each player’s strategy selection alters the game for the other player. A
simple example of this is the game of tic-tac-toe. The set of strategies of the game
for a given player at any point in time is the set of open squares on the board.
Each time a player makes a move, the game is altered for the other player.

This makes things much more interesting. The easiest way to think of it is
that now, instead of a simple matrix for the game, we end up with a tree. Each move
that a player can make creates a new game for the other player. By making each game position a tree node, and adding child nodes for each position that can follow it, you can build a tree describing the complete set of possible game positions, and thus the complete set of ways that the game could play out.
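
Here’s a rough sketch of that tree-building idea in Python. The board representation and the names are my own, not from the post: each node holds a position, and its children are the positions reachable by one legal move.

```python
def moves(board, player):
    """All positions reachable when `player` fills one open square.
    A board is a 9-character string of 'X', 'O', and '.' for open squares."""
    return [board[:i] + player + board[i + 1:]
            for i, square in enumerate(board) if square == '.']

def build_game_tree(board, player):
    """Each node is (position, children); the children are the subtrees for
    every position the other player could face after this player moves."""
    other = 'O' if player == 'X' else 'X'
    return (board, [build_game_tree(b, other) for b in moves(board, player)])

def count_positions(tree):
    """Every node in the tree is a possible position along some line of play."""
    board, children = tree
    return 1 + sum(count_positions(child) for child in children)

# From a board two moves in, with X to play. (This naive tree keeps expanding
# even past wins; a fuller treatment stops at finished games.)
tree = build_game_tree('XO.......', 'X')
print(count_positions(tree))   # 13,700 positions reachable from here
```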

Continue reading

Utility Functions

Before we move beyond zero-sum games, it’s worth taking a deeper look
at the idea of utilities. As I mentioned before, in a game, the scores in
the matrix are given by something called a utility function.

Utility is an idea for how to mathematically describe preferences in terms
of a game or lottery. For a game to be valid (that is, for a game to have a meaningful analysis and solution), there must be a valid utility function that
describes the players’ preferences.

But what do we have to do to make a valid utility function? It’s
simple, but as usual, we’ll make it all formal and explicit.
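
As a preview, one standard way of phrasing the requirement (this is the usual von Neumann–Morgenstern formulation, sketched from memory rather than quoted from the post): a utility function u represents a player’s preferences over lotteries if the player prefers lottery L to lottery M exactly when L’s expected utility is at least M’s:

$$ L \succeq M \iff \sum_i p_i\, u(x_i) \;\geq\; \sum_j q_j\, u(y_j), $$

where L yields outcome x_i with probability p_i, and M yields outcome y_j with probability q_j.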

Continue reading

Back to Math: Solving Zero-Sum Games

When I last wrote about game theory, we were working up to how to find
the general solution to an iterated two-player zero-sum game. Since it’s been
a while, I’ll take a moment and refresh your memory a bit.

A zero-sum game is described as a matrix. One player picks a row, and one player picks a column. Those selections are called strategies. The intersection
of the strategies in the game matrix describes what one player pays to the other. The
matrix is generally written from the point of view of one player. So if we call
our two players A and B, and the matrix is written from the viewpoint of A, then
an entry of $50 means that B has to pay $50 to A; an entry of -$50 means that A has to pay $50 to B.

In an iterated game, you’re not going to just play once – you’re going to play it repeatedly. To maximize your winnings, you may want to change strategies. It turns out that the optimal strategy is probabilistic – you’ll assign probabilities to your
strategy choices, and use that probability assignment to randomly select your
strategy each iteration. That probability assignment that dictates how you’ll
choose a strategy each iteration is called a grand strategy.

John von Neumann proved that for any two-player zero-sum game, there is an optimal grand strategy based on probability. In a game where both players know exactly what the payoffs/penalties are, the best strategy is based on constrained randomness – because any deliberate system for choosing strategies can be cracked by your opponent, resulting in his countering you. The best outcome comes from assessing potential wins and losses, and developing a probabilistic scheme for optimizing the way that you play.

Once you make that fundamental leap, realizing that it’s a matter
of probability, it’s not particularly difficult to find the grand strategy: it’s just a simple optimization task, solvable via linear programming. The solution is very elegant: once you see it, once you see how to
formulate the game as a linear optimization process, it just seems completely obvious.
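
Here’s a hedged sketch of that formulation, using scipy’s linear-programming routine and a made-up payoff matrix (none of this code is from the post itself): we maximize the guaranteed value v subject to the constraint that the row player’s mixed strategy earns at least v against every column the opponent might pick.

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Find the row player's optimal grand strategy for the payoff matrix A,
    written from the row player's point of view. Variables are [p_1..p_m, v];
    we maximize v (i.e. minimize -v) subject to the strategy guaranteeing at
    least v against every column, with the p_i forming a probability distribution."""
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                      # minimize -v, i.e. maximize v
    # For each column j: v - sum_i p_i * A[i, j] <= 0
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Probabilities sum to 1
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]       # (strategy probabilities, value of the game)

# A made-up matching-pennies-style game from the row player's point of view.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
probs, value = solve_zero_sum(A)
print(probs, value)                   # roughly [0.5, 0.5], with a game value of 0
```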

Continue reading

B-Trees – Balanced Search Trees for Slow Storage

Another cool, but frequently overlooked, data structure in the tree family is called the B-tree. A B-tree is a search tree, very similar to a BST in concept, but optimized differently.

BSTs provide logarithmic time operations, where the performance
is fundamentally bounded by the number of comparisons. B-trees also
provide logarithmic performance with a logarithmic number of
comparisons – but the performance is worse by a constant factor. The
difference is that B-trees are designed around a different tradeoff. The B-tree is designed to minimize the number of tree nodes that need to be examined, even if that comes at the cost of doing significantly more
comparisons.

Why would you design it that way? It’s a different performance tradeoff.
The B-tree is a data structure designed not for use in memory, but instead for
use as a structure in hard storage, like a disk drive. B-trees are the
basic structure underlying most filesystems and databases: they provide
an efficient way of providing rapidly searchable stored structures. But
retrieving nodes from disk is very expensive. In comparison to
retrieving from disk, doing comparisons is very nearly free. So the design
goal for performance in a B-tree tries to minimize disk access; and when disk access is necessary, it tries to localize it as much as possible – to minimize the number of retrievals, and even more importantly, to minimize the number of nodes on disk that need to be updated when something is inserted.
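
Here’s a rough sketch of what searching such a structure looks like – my own simplified illustration, not production B-tree code. Each node holds many sorted keys, so a single node fetch (the expensive part, standing in for a disk read) pays for many cheap comparisons.

```python
import bisect

class BTreeNode:
    """A node holding many sorted keys. In a real B-tree this whole node
    would be one disk block; here it's just an in-memory object."""
    def __init__(self, keys, children=None):
        self.keys = keys                    # sorted list of keys
        self.children = children or []      # len(children) == len(keys) + 1 if internal

def search(node, key, reads=0):
    """Search for `key`, counting node fetches. Within a node we can do many
    comparisons (via binary search) for the price of a single fetch."""
    reads += 1                              # fetching this node is the expensive step
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return True, reads
    if not node.children:                   # leaf: the key isn't in the tree
        return False, reads
    return search(node.children[i], key, reads)

# A tiny hand-built two-level tree with a fan-out of 4:
leaves = [BTreeNode([1, 2, 4]), BTreeNode([6, 8, 9]),
          BTreeNode([12, 14, 17]), BTreeNode([21, 25, 30])]
root = BTreeNode([5, 10, 20], leaves)
print(search(root, 14))   # (True, 2): found with only two node fetches
print(search(root, 7))    # (False, 2)
```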

Continue reading

Mortgage Basics (part 1)

One thing that I’ve been getting a lot of requests about is the ongoing mortgage mess in the US. I wrote a bit about it a while ago, explaining what was going on. But since then, I’ve gotten a lot of people asking me to explain various things about how mortgages work, and what kinds
of trouble people have gotten into.

Continue reading

The Basic Balanced Search Tree: Red-Black trees.

During my Haskell tutorial, I used balanced binary search trees as an example. I’ve had people write to me asking me to write about that in a non-functional programming post, because the Haskell one got too tied up in how to do things without assignments, and with tail recursion.

Binary search trees (BST) are another one of the really fundamental simple data structures. They’re conceptually similar to heaps – they’re also a kind of size-ordered binary tree with O(lg n) performance – but they’ve got a different set of basic invariants to represent the different performance goals.

The goal of a structure like a BST is to have a variable-size structure where
you can cheaply and easily increase or decrease the size of the structure, and where insert, delete, and searching for values are all inexpensive. BSTs are a good choice for implementing things like dictionaries, where you have a key associated with a value, or where you want to be able to search for an arbitrary member quickly.

BSTs are very widely used – for example, the default ordered collections, sets, and maps in the C++ standard libraries are all implemented as a kind of BST (in fact, the very kind that I’m going to describe below.)

If you have a meaningful ordering operation, then a BST is a good choice: expanding and contracting the structure happen automatically every time you do an insert or remove; and insert, delete, and search are all bounded by lg(n), where n is the number of values in the tree.
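
To make the idea concrete before the balancing machinery shows up, here’s a minimal sketch of a plain, unbalanced BST in Python. This is my own illustration – the red-black rules that keep the tree balanced are what the post itself covers.

```python
class Node:
    """A plain BST node: everything in the left subtree is smaller than
    the key, everything in the right subtree is larger."""
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None

def insert(root, key, value):
    """Insert by walking down as in a search; returns the (possibly new) root."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        root.left = insert(root.left, key, value)
    elif key > root.key:
        root.right = insert(root.right, key, value)
    else:
        root.value = value                  # same key: overwrite the value
    return root

def search(root, key):
    """Walk down the tree, discarding one subtree at each comparison."""
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    return None

root = None
for k, v in [(5, "five"), (2, "two"), (8, "eight"), (1, "one"), (3, "three")]:
    root = insert(root, k, v)
print(search(root, 3))   # "three"
print(search(root, 7))   # None
```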

Continue reading