When I last wrote about game theory, we were working up to how to find
the general solution to an iterated two-player zero-sum game. Since it’s been
a while, I’ll take a moment and refresh your memory a bit.
A zero-sum game is described by a matrix. One player picks a row, and the other picks a column. Those selections are called strategies. The intersection
of the strategies in the game matrix describes what one player pays to the other. The
matrix is generally written from the point of view of one player. So if we call
our two players A and B, and the matrix is written from the viewpoint of A, then
an entry of $50 means that B has to pay $50 to A; an entry of $-50 means that A has to pay $50 to B.
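For concreteness, here's a tiny sketch of that convention as code (my own made-up numbers, not from the post):

```python
# Payoff matrix written from A's point of view.
# Positive entries are what B pays A; negative entries are what A pays B.
payoffs_to_A = [
    [50, -10],
    [-50, 20],
]

row, col = 1, 0                    # A picks row 1, B picks column 0
print(payoffs_to_A[row][col])      # -50: A pays $50 to B
print(-payoffs_to_A[row][col])     # 50: B's winnings are always the negation -- that's what makes it zero-sum
```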
In an iterated game, you’re not going to play just once – you’re going to play repeatedly. To maximize your winnings, you may want to vary your choice from round to round. It turns out that the optimal strategy is probabilistic – you assign probabilities to your strategy choices, and use that probability assignment to randomly select a strategy each iteration. That probability assignment is called a grand strategy.
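To make "randomly select a strategy each iteration" concrete, here's a tiny sketch (my own names and made-up probabilities):

```python
import random

strategies = ["row 1", "row 2", "row 3"]
grand_strategy = [0.5, 0.3, 0.2]   # a made-up probability assignment, one probability per strategy

def pick_strategy():
    # Draw one strategy at random, weighted by the grand strategy's probabilities.
    return random.choices(strategies, weights=grand_strategy, k=1)[0]

print([pick_strategy() for _ in range(10)])   # e.g. ['row 1', 'row 3', 'row 1', ...]
```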
John von Neumann proved that for any two-player zero-sum game, there is an optimal grand strategy based on probability. In a game where both players know exactly what the payoffs and penalties are, the best strategy is based on constrained randomness – because any deterministic system for choosing strategies can be cracked by your opponent, allowing them to counter you. The best outcome comes from assessing the potential wins and losses, and developing a probabilistic scheme for optimizing the way that you play.
Once you make that fundamental leap – realizing that it’s a matter of probability – it’s not particularly difficult to find the grand strategy: it’s just a simple optimization task, solvable via linear programming. The solution is very elegant: once you see how to formulate the game as a linear optimization problem, it seems completely obvious.
The basic idea is: take the game matrix. Work out one player’s expected payoff against each of the opponent’s strategies, in terms of an unknown probability assignment over that player’s own strategies. Then, using linear programming, maximize the minimum of those expected payoffs, subject to constraints that say the unknowns really do form a probability assignment.
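In symbols (my notation, not from the original post): if $a_{ij}$ is the payoff to the row player when they pick row $i$ and the opponent picks column $j$, and $p_i$ is the probability assigned to row $i$, the problem is:

$$
\begin{aligned}
\text{maximize } \; & v \\
\text{subject to } \; & \sum_i p_i\,a_{ij} \ge v \quad \text{for every column } j, \\
& \sum_i p_i = 1, \qquad 0 \le p_i \le 1 .
\end{aligned}
$$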
As usual, that sounds terribly confusing until you see an example.
Let’s get specific. Here’s a game. We’ll call the two players H (the player who picks a horizontal row of the matrix) and V (the player who picks a vertical column). The grid is set up from the viewpoint of player H: the entry at a position is what V pays to H if that pair of strategies is selected:
|    | V1 | V2 | V3 |
|----|----|----|----|
| H1 | 3  | 1  | 3  |
| H2 | 2  | 4  | 5  |
| H3 | 3  | 6  | 1  |
Then the linear programming problem works out as follows. Let p1, p2, and p3 be the probabilities that H assigns to strategies H1, H2, and H3. H’s expected payoff against each of V’s strategies is:
- Let E(V1) = 3p1 + 2p2 + 3p3
- Let E(V2) = 1p1 + 4p2 + 6p3
- Let E(V3) = 3p1 + 5p2 + 1p3
Maximize: min(E(V1), E(V2), E(V3)), where Σpi=1,
and ∀i: 0≤pi≤1.
When we run that through our linear programming tool, we get the following probability assignment (rounded to two figures): p1=0.57, p2=0.17, p3=0.26.
If we negate and transpose the matrix – looking at things from the perspective of V, whose payoffs are the negation of H’s – the same construction gives V’s optimal grand strategy: (0.70, 0.09, 0.22).
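Here’s a sketch of how you might reproduce those numbers with an off-the-shelf LP solver. This isn’t the tool used for the post; it assumes you have numpy and scipy installed, and the function and variable names are mine:

```python
import numpy as np
from scipy.optimize import linprog

# The payoff matrix from H's point of view: rows are H1..H3, columns are V1..V3.
A = np.array([[3, 1, 3],
              [2, 4, 5],
              [3, 6, 1]])

def solve_maximin(payoffs):
    """Find the row player's grand strategy maximizing their minimum expected payoff.

    payoffs[i][j] is the payoff to the row player when they play row i and the
    opponent plays column j. Returns (probabilities over rows, value of the game).
    """
    rows, cols = payoffs.shape
    # Variables: one probability per row, plus v (the minimum expected payoff).
    # linprog minimizes, so maximizing v means minimizing -v.
    c = np.zeros(rows + 1)
    c[-1] = -1
    # For every column j:  sum_i p_i * payoffs[i][j] >= v,
    # rewritten in linprog's "A_ub @ x <= b_ub" form as  v - sum_i p_i * payoffs[i][j] <= 0.
    A_ub = np.hstack([-payoffs.T, np.ones((cols, 1))])
    b_ub = np.zeros(cols)
    # The probabilities must sum to 1; each lies in [0, 1]; v is unbounded.
    A_eq = np.hstack([np.ones((1, rows)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * rows + [(None, None)]
    result = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return result.x[:-1], result.x[-1]

p, value_to_h = solve_maximin(A)       # H's grand strategy
q, value_to_v = solve_maximin(-A.T)    # V's grand strategy: V's payoffs are -A, transposed

print(p.round(2), round(value_to_h, 2))   # [0.57 0.17 0.26] 2.83
print(q.round(2), round(value_to_v, 2))   # [0.7  0.09 0.22] -2.83
```

The same helper handles both players because, in a zero-sum game, V’s payoff matrix is just H’s matrix negated and transposed.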
That example is an unfair game, because V always pays H. But the same basic construction of the solution works for any zero-sum game. It’s implicit in the name of the basic strategy: maximin – maximize the minimum. It’s just that now, instead of maximizing the minimum payoff of a single strategy selection, we’re maximizing the minimum of the expected payoffs produced by the grand strategy’s probability assignment.
There’s one more question about the game: what’s the expected payoff? In game theory terms, we call that the value of the game: the value of the game to a player is the amount that the player will win or lose if both they and their opponent play optimally. It’s very easy to figure out: the probability assignments work out so that the expectation is the same no matter which strategy the opponent picks. So to find the value for H, you pick any one strategy for V, and work out the expectation for H given the probabilities in his grand strategy.
So, suppose that player V chose strategy 1. Then the expectation for H would be:
0.57*3 + 0.17*2 + 0.26*3 = 2.83. So the value of the game to player H is 2.83. If you work it out from the transpose matrix to get the value of the game to player V, it works out to -2.83.
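As a quick sanity check (my own arithmetic: the rounded probabilities are, exactly, 13/23, 4/23, and 6/23), the expectation really does come out the same whichever column V picks:

```python
from fractions import Fraction as F

# H's optimal grand strategy in exact form, and the game matrix from the post.
A = [[3, 1, 3],
     [2, 4, 5],
     [3, 6, 1]]
p = [F(13, 23), F(4, 23), F(6, 23)]

for j in range(3):   # one expectation per pure strategy of V
    print(sum(p[i] * A[i][j] for i in range(3)))   # 65/23 every time, about 2.83
```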
When you see this, it’s pretty obvious that it works. In fact, most people see it,
and react, roughly, “Of course! Maximize the minimum!” It seems so simple! The real fundamental achievement by John von Neumann wasn’t figuring out how to describe the probabilistic grand strategies as an optimization problem – that’s actually easy. The real insight was realizing that the optimal strategy is based on probability – that the whole trick to optimizing a zero-sum game lies in randomness constrained by a probability distribution.
Interesting post … thanks.
Small point – I think you’ll find that the usual spelling is “John von Neumann,” rather than “Jon Von Neumann.”
A thought — this article is something that would be wonderful to illustrate with a Javascript tool showing the two players, their choices, and their payoffs accumulating over time. Either just recording as an animation a sequence in which both players were being smart by maximizing the minimum, compared to a sequence in which either or both players were being less smart, showing the difference.
I know nothing about programming, can’t do it, just wishing in public.
This concept is hard to ‘get’ from reading about it, and I don’t know how I could explain it to kids young enough to benefit over their lifetime. Wish I could. Watching an iterated game could help.
As an example, this little Java thing gets the concept of random bell curve across to kids quite well (I’ve seen these built in hardware at science museums): http://javaboutique.internet.com/BallDrop/
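Not the JavaScript animation being wished for here, but a rough sketch in Python (names mine, reusing the matrix from the post) of the point such a demo would make: the maximin grand strategy protects H’s worst case, while stubbornly playing a single row can be punished.

```python
A = [[3, 1, 3],
     [2, 4, 5],
     [3, 6, 1]]

def worst_case(h_strategy):
    """H's expected payoff if V picks the column that hurts H the most."""
    return min(sum(h_strategy[i] * A[i][j] for i in range(3)) for j in range(3))

print(worst_case([0.57, 0.17, 0.26]))   # about 2.81: close to the game's value
print(worst_case([1.0, 0.0, 0.0]))      # 1.0: always playing H1 gets punished by V2
print(worst_case([0.0, 0.0, 1.0]))      # 1.0: always playing H3 gets punished by V3
```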
Oops, anonymous above is Hank Roberts; Typekey says I’m signed in, but isn’t adding the name. Go figure.
I think you’ve got the probabilities for each player switched: E(H1), E(H2), and E(H3) are the expected payouts from player V to H when V uses grand strategy (p1, p2, p3) and H uses the indicated pure strategy. Accordingly, the linear program should minimize Max(E(H1), E(H2), E(H3)), as this minimizes the worst-case payout for player V no matter what H does.
A few minor points and refinements.
* Games do not have to have the same number of possible actions for each player – this can make linear programming approaches more difficult.
* The probabilistic approach is valid for all games, not just zero-sum. It’s just that in non zero-sum games, there is often an equilibrium which does not involve randomness.
In fact, the key insight is that, when you allow for randomness in your strategy, there is always a Nash equilibrium (a situation where no individual player is tempted to change strategies).
* There is another way to formulate the issue, which allows extension into non zero-sum games, and makes sense here as well.
If a player is going to randomly choose between moves A and B, he must find them equally tempting. Otherwise, he’d always select one or the other.
In the game set down above, we look for p1, p2, p3 such that the expected payoff for H1, H2, and H3 are the same. This means that player H is willing to randomize (between H1, H2 and H3) if and only if player V acts with probabilities p1, p2, and p3 (16/23, 2/23 and 5/23, to be precise).
The same reasoning, based on V1, V2, and V3 giving the same expected utility, gives the likelihoods for H’s actions.
The point here is that the equal expected payoffs are not a side effect, but a goal.
In this context, it is possible that some actions (H4?) have a probability of zero – in which case their expected payoff does not matter (though it had better be lower than the expected payoff for the actions that are taken).
If H4 gave (2, 5, 0) to player H (always 1 less than H3), then equal expected payoffs are not possible for all 4 actions. And the likelihood of H4 would be zero, because it always gives a lower expected payoff than H3.
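Here’s a small sketch (mine, assuming numpy) of that indifference calculation for the game in the post: set H’s three expected payoffs equal, add the constraint that V’s probabilities sum to 1, and solve the resulting linear system.

```python
import numpy as np

# Find V's probabilities p1, p2, p3 and the common payoff v such that all three
# of H's rows have the same expectation and the probabilities sum to 1.
A = np.array([[3, 1, 3],
              [2, 4, 5],
              [3, 6, 1]])

# Unknowns are [p1, p2, p3, v]. Each row i of the game gives the equation
# sum_j A[i][j] * p_j - v = 0, and the last equation is p1 + p2 + p3 = 1.
M = np.vstack([np.hstack([A, -np.ones((3, 1))]),
               [1, 1, 1, 0]])
b = np.array([0, 0, 0, 1])

p1, p2, p3, v = np.linalg.solve(M, b)
print(p1, p2, p3)   # about 0.696, 0.087, 0.217 -- that is, 16/23, 2/23, 5/23
print(v)            # about 2.826, the value of the game to H
```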