Category Archives: Good Math

Linear Programming

In my last post on game theory, I said that you could find an optimal
probabilistic grand strategy for any two-player, simultaneous move, zero-sum game.
It’s done through something called linear programming. But linear programming
is useful for a whole lot more than just game theory.

Linear programming is a general technique for solving a huge family of
optimization problems. It’s incredibly useful for scheduling, resource
allocation, economic planning, financial portfolio management,
and a ton of other, similar things.

The basic idea of it is that you have a linear function,
called an objective function, which describes what you’d like
to maximize. Then you have a collection of inequalities, describing
the constraints of the values in the objective function. The solution to
the problem is a set of assignments to the variables in the objective function
that provide a maximum.
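To make this concrete, here's a minimal sketch in Python with made-up numbers: maximize 3x + 2y subject to a handful of linear inequality constraints. Real solvers use the simplex method or interior-point methods; this toy just exploits the fact that a bounded two-variable linear program attains its maximum at a vertex of the feasible region, so it enumerates the vertices:

```python
from itertools import combinations

def solve_2var_lp(constraints, objective):
    """Maximize objective . (x, y) subject to a . (x, y) <= b for each
    (a, b) in constraints, by evaluating the objective at every feasible
    vertex (an optimum of a bounded 2-variable LP lies at a vertex)."""
    best = None
    for (a1, b1), (a2, b2) in combinations(constraints, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if det == 0:
            continue  # parallel boundary lines: no intersection vertex
        # Solve the 2x2 system a1.(x,y) = b1, a2.(x,y) = b2 (Cramer's rule).
        x = (b1 * a2[1] - a1[1] * b2) / det
        y = (a1[0] * b2 - b1 * a2[0]) / det
        # Keep the vertex only if it satisfies every constraint.
        if all(a[0] * x + a[1] * y <= b + 1e-9 for a, b in constraints):
            value = objective[0] * x + objective[1] * y
            if best is None or value > best[0]:
                best = (value, (x, y))
    return best

# Maximize 3x + 2y subject to x + y <= 4, x <= 2, x >= 0, y >= 0.
constraints = [((1, 1), 4), ((1, 0), 2), ((-1, 0), 0), ((0, -1), 0)]
print(solve_2var_lp(constraints, (3, 2)))  # (10.0, (2.0, 2.0))
```

Vertex enumeration blows up combinatorially as variables are added, which is exactly why the simplex method, which walks from vertex to better vertex, matters.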

Continue reading

Iterated Zero-Sum Games

The games that we’ve looked at so far are the simplest case of
basic games. In these games, we’ve got a payoff matrix, and both players
can see the whole matrix – the players have equal information,
and nothing is secret. The players move simultaneously – so neither player can wait to see what his opponent does before making his own decision. Finally, the game is played exactly once – there are no repetitions.

The first complication we can add is to make it an iterative game – that is, instead of each player going once, they’ll repeat the game 10 times in sequence. Everything else stays the same: perfect, equal information, simultaneous moves, etc.

This creates an interesting added dimension. Now, we’ve got two layers
of strategy: in each iteration, each player selects a strategy; and then there's an over-arching strategy that they use to select their strategy each iteration. That over-arching strategy is called a grand strategy.

In an iterated game, the optimal solution for the players can be different from the one for a single iteration. The goal is to maximize your total utility at the end of the series. It's OK if some iterations result in a loss of utility, so long as the sum of utility over the whole series is maximal.
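To make the two layers of strategy concrete, here's a small Python sketch (the game and the grand strategies are invented for illustration): each grand strategy sees the full history of previous iterations and picks a move for the current one, and the score that matters is the sum over all iterations:

```python
# Matching pennies: the row player wins 1 when the two moves match, loses 1 otherwise.
PAYOFF = [[1, -1], [-1, 1]]  # PAYOFF[row_move][col_move] = row player's payoff

def always(move):
    # Grand strategy: play the same move every iteration.
    return lambda my_hist, opp_hist: move

def copy_opponent(my_hist, opp_hist):
    # Grand strategy: open with 0, then repeat the opponent's previous move.
    return opp_hist[-1] if opp_hist else 0

def alternate(my_hist, opp_hist):
    # Grand strategy: flip between 0 and 1 each iteration.
    return len(my_hist) % 2

def play_iterated(row_strategy, col_strategy, rounds=10):
    """Play the game `rounds` times; each grand strategy sees the full
    history, and the result is the row player's total utility."""
    row_hist, col_hist, total = [], [], 0
    for _ in range(rounds):
        r = row_strategy(row_hist, col_hist)
        c = col_strategy(col_hist, row_hist)
        row_hist.append(r)
        col_hist.append(c)
        total += PAYOFF[r][c]  # zero-sum: the column player gets -PAYOFF[r][c]
    return total

# The copier matches the alternator one round too late, so it wins only
# the opening round and loses the other nine.
print(play_iterated(copy_opponent, alternate))  # -8
```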

Continue reading

Implementing Compact Binary Heaps

Last post, I described the basic idea of the binary heap data
structure. But I didn’t talk about how to implement it – and there’s
a very cool way of implementing it – optimally space efficient,
very fast, and with respectable cache locality for moderate-sized
datasets.
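As a rough preview of the idea (the details are behind the cut), here's a sketch in Python of the standard compact representation: the whole heap lives in one flat array, with the children of the node at index i at indices 2i+1 and 2i+2, so no pointers are needed:

```python
class MaxHeap:
    """A binary max-heap stored compactly in a single list: the children
    of the node at index i live at indices 2*i + 1 and 2*i + 2, and its
    parent at (i - 1) // 2, so the array is exactly the size of the heap."""

    def __init__(self):
        self.items = []

    def push(self, value):
        self.items.append(value)
        i = len(self.items) - 1
        # Sift up: swap with the parent while the heap property is violated.
        while i > 0:
            parent = (i - 1) // 2
            if self.items[parent] >= self.items[i]:
                break
            self.items[parent], self.items[i] = self.items[i], self.items[parent]
            i = parent

    def pop_max(self):
        items = self.items
        items[0], items[-1] = items[-1], items[0]
        largest = items.pop()
        # Sift down: swap with the larger child until both children are smaller.
        i = 0
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            biggest = i
            if left < len(items) and items[left] > items[biggest]:
                biggest = left
            if right < len(items) and items[right] > items[biggest]:
                biggest = right
            if biggest == i:
                return largest
            items[i], items[biggest] = items[biggest], items[i]
            i = biggest

h = MaxHeap()
for v in [3, 1, 4, 1, 5, 9, 2, 6]:
    h.push(v)
print([h.pop_max() for _ in range(4)])  # [9, 6, 5, 4]
```

The cache locality claim comes from this layout: a node and its children sit close together in one contiguous array, rather than scattered across pointer-linked nodes.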

Continue reading

Binary Heaps

One of the most neglected data structures in the CS repertoire is
the heap. Unfortunately, the jargon is overloaded, so “heap” can mean
two different things – but there is a connection, which I’ll explain
in a little while.

In many programming languages, when you can dynamically create
objects in memory, the space from which you allocate the memory to
create them is called the heap. That is not what I’m talking about.

What I am talking about is an extremely simple but powerful structure that is used for maintaining a collection of values from which you want to be able to quickly remove the largest value. This may sound trivial, but it turns out to be an incredibly useful structure. You can use this basic structure to implement a very fast sorting algorithm; to maintain a prioritized set of values/tasks; to manage chunks of memory; etc. (The prioritized set is the most common case in my experience, but that's probably because I've spent so much of my career working on distributed systems.)
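For a taste of the prioritized-set use, Python's standard library ships this structure as the heapq module, which maintains a min-heap in a plain list; negating the priority turns it into the remove-the-largest-first structure described here (the tasks themselves are invented for illustration):

```python
import heapq

# heapq is a min-heap, so store (-priority, task) to pop the highest
# priority first.
tasks = []
for priority, name in [(2, "compact logs"), (9, "page the on-call"), (5, "rotate keys")]:
    heapq.heappush(tasks, (-priority, name))

# Tasks come out in decreasing priority order: 9, then 5, then 2.
while tasks:
    priority, name = heapq.heappop(tasks)
    print(-priority, name)
```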

Continue reading

Simple Games, Utility Functions, and Saddle Points

Last time I wrote about Game Theory, I explained the basic idea of
zero-sum games. In its simplest form, a game can be described by a payoff matrix, where each dimension of the matrix is the set of strategies which can be selected by one player, and each entry in the matrix describes the payoffs that result when all players select their strategy. Informally, a game is zero-sum when the total payoff is zero – that is, when any positive payout to one player is balanced by payouts from the other players.

We can formalize some of the basic ideas more precisely. What we can say is that each move results in a state; and for each player p, there is a function, up, called a utility function, which maps from a state to a utility value. Taking the player utility functions as components, we can define the game’s utility function in terms of tuples: for players 1..n,
the utility function maps from states to tuples of utility values: u(state)=(u1, …, un).
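Here's a small Python sketch of those definitions (the game and its payoffs are invented for illustration): the payoff matrix maps each state to a tuple of per-player utilities, and the zero-sum condition is just that every tuple sums to zero:

```python
# A two-player game: each state (a pair of strategy choices) maps to a
# tuple of utilities, one component per player.
payoff = {
    ("attack", "attack"): (0, 0),
    ("attack", "defend"): (3, -3),
    ("defend", "attack"): (-2, 2),
    ("defend", "defend"): (0, 0),
}

def utility(state):
    """The game's utility function u(state) = (u1, ..., un)."""
    return payoff[state]

def is_zero_sum(payoff):
    # Zero-sum: in every state, the players' utilities cancel out.
    return all(sum(utils) == 0 for utils in payoff.values())

print(utility(("attack", "defend")))  # (3, -3)
print(is_zero_sum(payoff))            # True
```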

Continue reading

XKCD and Friendly Numbers

I’ve been getting mail all day asking me to explain something
that appeared in today’s XKCD comic. Yes, I’ve been reduced to explaining geek comics to my readers. I suppose that there are worse fates. I just can’t
think of any. 🙂

But seriously, I’m a huge XKCD fan, and I don’t mind explaining interesting things no matter what the source. If you haven’t read today’s
comic, follow the link, and go look. It’s funny, and you’ll know what
people have been asking me about.

The comic refers to friendly numbers. The question,
obviously, is what are friendly numbers?

First, we define something called a divisors function over the integers, written σ(n). For any integer, there’s a set of integers that divide
into it. For example, for 4, that’s 1, 2, and 4. For 5, it’s just 1 and 5. And for 6, it’s 1, 2, 3, and 6. The divisors function, σ(n), is the sum of all of the divisors of n. So
σ(4)=7, σ(5)=6, σ(6)=12.

For each integer, there is a characteristic ratio, defined
using the divisors function. For the integer n, the characteristic
is the ratio of the divisors function to the number itself: σ(n)/n. So the characteristic ratio of 4 is 7/4; for 6, it’s
12/6=2.

For any characteristic ratio, the set of numbers that share that characteristic are friendly with each other. A friendly number is,
therefore, any integer that shares its characteristic ratio with at least one other integer. If an integer isn’t friendly, then it’s called a solitary number. 1, 2, 3, 4, and 5 are all solitary numbers. 6 is
friendly with 28 ((1+2+4+7+14+28)/28 = 56/28 = 2).
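The definitions translate directly into code; here's a Python sketch, using a naive trial-division σ (fine for small n) and exact rational arithmetic for the characteristic ratio:

```python
from fractions import Fraction

def sigma(n):
    """The divisors function: the sum of all divisors of n, including n."""
    return sum(d for d in range(1, n + 1) if n % d == 0)

def characteristic(n):
    # The characteristic ratio sigma(n)/n, kept exact with Fraction.
    return Fraction(sigma(n), n)

print(sigma(4), sigma(5), sigma(6))              # 7 6 12
print(characteristic(6) == characteristic(28))   # True: 6 and 28 are friendly
```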


Probability Distributions

I’d like to start with a quick apology. Sorry that both the abstract algebra and the new game theory posts have been moving so slowly. I’ve been a bit overwhelmed lately with things that need doing right away, and
by the time I’m done with all that, I’m too tired to write anything that
requires a lot of care. I’ve known the probability theory stuff for so long,
and the parts I’m writing about so far are so simple that it really doesn’t take nearly as much effort.

With that out of the way, today, I’m going to write about probability distributions.

Continue reading

Random Variables

The first key concept in probability is called a random variable.
Random variables are a key concept – but since they come from the
frequentist school, they are, alas, one of the things that brings out
the Bayesian wars. But the idea of the random variable, and its key position
in understanding probability and statistics, predates the divide between
frequentist and Bayesian thought. So please, folks, be a little bit patient,
and don’t bring the Bayesian flamewars into this post, OK? If you want to
rant about how stupid frequentist explanations are, please keep it out of the comments here. I’m trying to
explain basic ideas, and you really can’t talk about probability and
statistics without talking about random variables.

Continue reading

Schools of thought in Probability Theory

To understand a lot of statistical ideas, you need to know about
probability. The two fields are inextricably entwined: sampled statistics
works because of probabilistic properties of populations.

I approach writing about probability with no small amount of trepidation.

For some reason that I’ve never quite understood, discussions of probability
theory bring out an intensity of emotion that is more extreme than anything else
I’ve seen in mathematics. It’s an almost religious topic, like programming
languages in CS. This post is really intended as a flame attractor: that is, I’d request that if you want to argue about Bayesian probability versus frequentist probability, please do it here, and don’t clutter up every comment thread that
discusses probability!

There are two main schools of thought in probability:
frequentism and Bayesianism, and the Bayesians have an intense contempt for the
frequentists. As I said, I really don’t get it: the intensity seems to be mostly
one way – I can’t count the number of times that I’ve read Bayesian screeds about
the intense stupidity of frequentists, but not the other direction. And while I
sit out the dispute – I’m undecided; sometimes I lean frequentist, and sometimes I
lean Bayesian – every time I write about probability, I get emails and comments
from tons of Bayesians tearing me to ribbons for not being sufficiently
Bayesian.

It’s hard to even define probability without getting into trouble, because the
two schools of thought end up defining it quite differently.

The frequentist approach to probability basically defines probability in terms
of experiment. If you repeated an experiment an infinite number of times, and
found that out of every 1,000 trials, a given outcome occurred 350 times, then
a frequentist would say that the probability of that outcome was 35%. Based on
that, a frequentist says that for a given event, there is a true
probability associated with it: the probability that you’d get from repeated
trials. The frequentist approach is thus based on studying the “real” probability
of things – trying to determine how close a given measurement from a set of
experiments is to the real probability. So a frequentist would define probability
as the mathematics of predicting the actual likelihood of certain events occurring
based on observed patterns.
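Here's a quick Python illustration of the frequentist picture (the 35% event is simulated, not real data): as the number of trials grows, the observed relative frequency settles toward the "true" probability:

```python
import random

random.seed(17)  # fixed seed so the run is reproducible

def relative_frequency(trials, p=0.35):
    """Run `trials` simulated experiments of an event with true
    probability p, and return the observed relative frequency."""
    hits = sum(random.random() < p for _ in range(trials))
    return hits / trials

# The more trials, the closer the measured frequency gets to 0.35.
for trials in (100, 10_000, 1_000_000):
    print(trials, relative_frequency(trials))
```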

The Bayesian approach is based on incomplete knowledge. It says that you only
associate a probability with an event because there is uncertainty about it –
because you don’t know all the facts. In reality, a given event either will happen
(probability=100%) or it won’t happen (probability=0%). Anything else is an
approximation based on your incomplete knowledge. The Bayesian approach is
therefore based on the idea of refining predictions in the face of new knowledge.
A Bayesian would define probability as a mathematical system of measuring the
completeness of knowledge used to make predictions. So to a Bayesian, strictly speaking, it’s incorrect to say “I predict that there’s a 30% chance of P”, but rather “Based on the current state of my knowledge, I am 30% certain that P will occur.”
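Here's a minimal Python sketch of that refinement step, with invented numbers: a single application of Bayes' rule turning a 30% prior certainty and one piece of evidence into a revised degree of belief:

```python
from fractions import Fraction

def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """One step of Bayes' rule: revise the degree of belief in a
    hypothesis after observing a piece of evidence."""
    numerator = prior * likelihood_if_true
    evidence = numerator + (1 - prior) * likelihood_if_false
    return numerator / evidence

# Hypothetical numbers: start 30% certain of P, then observe evidence
# that is twice as likely when P is true (0.8) as when it is false (0.4).
belief = Fraction(3, 10)
belief = bayes_update(belief, Fraction(8, 10), Fraction(4, 10))
print(belief)  # 6/13: the new knowledge raises the certainty to about 46%
```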

Like I said, I tend to sit in the middle. On the one hand, I think that the
Bayesian approach makes some things clearer. For example, a lot of people
misunderstand how to apply statistics: they’ll take a study showing
that, say, 10 out of 100 smokers will develop cancer, and assume that it means
that for a specific smoker, there’s a 10% chance that they’ll develop cancer.
That’s not true. The study showing that 10 out of 100 people who smoke will develop cancer can be taken as a good starting point for making a prediction – but a Bayesian will be very clear on the fact that it’s incomplete knowledge, and that it therefore isn’t very meaningful unless you can add more information to increase the certainty.

On the other hand, Bayesian reasoning is often used by cranks.
A Bayesian
argues that you can do a probabilistic analysis of almost anything, by lining
up the set of factors that influence it, and combining your knowledge of those factors in the correct way. That’s been used incredibly frequently by cranks for
arguing for the existence of God, for the “fact” that aliens have visited the
earth, for the “fact” that artists have been planting secret messages in
paintings, for the “fact” that there are magic codes embedded in various holy texts, etc. I’ve dealt with these sorts of arguments numerous times on this blog; the link above is a typical example.

Frequentism doesn’t fall victim to that problem; a frequentist only
believes probabilities make sense in the setting of a repeatable experiment. You
can’t properly formulate something like a probabilistic proof of God under the
frequentist approach, because the existence of a creator of the universe isn’t a
problem amenable to repeated experimental trials. But frequentism suffers
from the idea that there is an absolute probability for things – which is often ridiculous.

I’d argue that they’re both right, and both wrong, each in their own settings. There are definitely settings in which the idea of a fixed probability based on a model of repeatable, controlled experiment is, quite simply, silly. And there
are settings in which the idea of a probability only measuring a state of knowledge is equally silly.

Understanding Non-Euclidean Hyperbolic Spaces – With Yarn!


One of my fellow ScienceBloggers, Andrew Bleiman from Zooilogix, sent me an amusing link. If you’ve done things like study topology, then you’ll know about non-Euclidean spaces. Non-Euclidean spaces are often very strange, and with the exception of a few simple cases (like the surface of a sphere), getting a handle on just what a non-Euclidean space looks like can be extremely difficult.

One of the simple to define but hard to understand examples is called a hyperbolic space. The simplest definition of a hyperbolic space is a space
where if you take open spheres of increasing radius around a point, the amount of space in those open spheres increases exponentially.

Think of a sheet of paper: if you take a point, and you draw progressively larger circles around the point, the amount of space inside a circle increases with the square of the radius: for a circle with radius R, the amount of space inside the circle is proportional to R^2. If you did it in three dimensions, the amount of space in the spheres would be proportional to R^3. But it’s always a fixed exponent.

In a hyperbolic space, you’ve got a constant N, which defines the “dimensionality” of the space – and the open spheres around a point enclose a quantity of space proportional to N^R. The larger the open sphere around a point, the higher the exponent.
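A quick numeric illustration of the difference (with N = 2 chosen arbitrarily): polynomial growth has a fixed exponent, while the exponential N^R eventually overtakes any fixed power of R:

```python
# Flat (Euclidean) growth has a fixed exponent: R^2 for a plane, R^3 for
# 3-space. Hyperbolic growth is exponential: 2^R here. The exponential
# starts out smaller but eventually overtakes both fixed powers.
for R in range(1, 13):
    print(R, R**2, R**3, 2**R)
```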

What Andrew sent me is a link about how you can create models of hyperbolic
spaces using simple crochet.
And then you can get a sense of just how a hyperbolic space works by playing with the thing you crocheted!

It’s absolutely brilliant. Once you see it, it’s totally obvious
that this is a great model of a hyperbolic space, and just about anyone
can make it, and then experiment with it to get an actual tactile sense
of how it works!

It just happens that right near where I live, there’s a great yarn shop whose owners my wife and I have become friends with. So if you’re interested in trying this out, you should go to their shop, Flying Fingers, and buy yourself some yarn and crochet hooks, and crochet yourself some hyperbolic surfaces! And tell Elise and Kevin that I sent you!