Independence and Combining Probabilities

As I alluded to in my previous post, simple probability is really a matter of observation, to produce a model. If you roll a six sided die over and over again, you'll see that each face comes up pretty much equally often, and so you model it as 1/6 probability for each face. There's not really a huge amount of principle behind the basic models: it's really just whatever works. This is the root of the distinction between interpretations: a frequentist starts with an experiment, and builds a descriptive model based on it, and says that the underlying phenomenon being tested has the model as a property; a Bayesian does almost the same thing, but says that the model describes the state of their knowledge.

Where probability starts to become interesting is when you combine things. I know the probability of outcomes for rolling one die: how can I use that to see what happens when I roll five dice together? I know the probability of drawing a specific card from a deck: what are the odds of being dealt a particular poker hand?

We'll start with the easiest part: combining independent probabilities. Two events are independent when there's no way for the outcome of one to influence the outcome of the other. For example, if you're flipping a coin several times, the result of one coin flip has no effect on the result of a subsequent flip. On the other hand, dealing 10 cards from a deck is a sequence of dependent events: once you've dealt one card, the next deal must come from the remaining cards: you can't deal the same card twice.

If you know the probability space of your trials, then recognizing an independent situation is easy: if the outcome of one trial doesn’t alter the probability space of other trials, then they’re independent.

Look back at the coin flip example: we know what the probability space of a coin flip looks like: it's got two equally probable outcomes. If you've flipped a coin once, and you're going to flip another coin, the result of the first flip can't do anything that alters the probability space of a subsequent flip.

But if you think about dealing cards, that’s not true. With a standard deck of cards, the initial probability space has 52 outcomes, each of which is equally likely. So the odds of being dealt the 5 of spades is exactly 1/52.

Now, suppose that you got lucky, and you did get dealt the 5 of spades on the first card. What's the probability of being dealt the 5 of spades on the second? If they were independent events, it would still be 1/52. But once you've dealt one card, you can't deal it again. The probability of being dealt the 5 of spades as the second card is 0: it's impossible. The probability space only has 51 possible outcomes, and the 5 of spades is not one of them. The space has changed. That's the definition of a dependent event.

When you’re faced with dependent probabilities, you need to figure out how the probability space will be changed, and incorporate that into your computation. Once you’ve incorporated the change in the probability space of the second test, then you’ve got a new independent probability, and you can combine them. Figuring out how to alter the probability space can be extremely difficult, but that’s what makes it interesting.

When you’re dealing with independent events, it’s easy to combine them. There are two basic ways of combining event probabilities,
and they should be familiar from logic: event1 AND event2, and event1 OR event2.

Suppose you're looking at two tests with independent outcomes. I know that the probability of event e is P(e), and the probability of event f is P(f). Then the probability of e AND f – that is, of having e as the outcome of the first trial, and f as the outcome of the second – is P(e)×P(f). The odds of flipping the sequence HTTH with a coin are (1/2)×(1/2)×(1/2)×(1/2) = 1/16.
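
To make that concrete, here's a tiny Python sketch (my own illustration, not part of the original post) that verifies the HTTH figure by enumerating every equally likely sequence of four flips:

```python
from itertools import product

# All 2^4 = 16 equally likely sequences of four coin flips.
flips = list(product("HT", repeat=4))

# The AND rule predicts P(HTTH) = (1/2)^4 = 1/16.
matches = [seq for seq in flips if seq == ("H", "T", "T", "H")]
print(len(matches), "out of", len(flips))  # 1 out of 16
```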

If you're looking at mutually exclusive alternatives – that is, the probability of e OR f, where e and f can't both happen in the same trial – you combine the probabilities of the events with addition: P(e) + P(f). So, take the odds of drawing any heart from a deck: for each specific card, it's 1/52. There are thirteen different hearts. So the odds of drawing a heart are 1/52 + 1/52 + … = 13/52 = 1/4.
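
The same arithmetic, as a quick sketch (again my example), using exact fractions so nothing gets lost to rounding:

```python
from fractions import Fraction

# Drawing each specific heart is a mutually exclusive event with
# probability 1/52, so the OR rule says we can simply add them up.
p_heart = sum(Fraction(1, 52) for _ in range(13))
print(p_heart)  # 1/4
```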

That still doesn’t get us to the really interesting stuff. We still can’t quite work out something like the odds of being dealt a flush. To get there, we need to learn some combinatorics, which will allow us to formulate the probability spaces that we need for an interesting probability.

Probability Spaces

Sorry for the slowness of the blog lately. I finally got myself back onto a semi-regular schedule when I posted about the Adria Richards affair, and that really blew up. The amount of vicious, hateful bile that showed up, both in comments (which I moderated) and in my email was truly astonishing. I’ve written things which pissed people off before, and I’ve gotten at least my fair share of hatemail. But nothing I’ve written before came close to preparing me for the kind of unbounded hatred that came in response to that post.

I really needed some time away from the blog after that.

Anyway, I’m back, and it’s time to get on with some discrete probability theory!

I've already written a bit about interpretations of probability. But I haven't said anything about what probability means formally. When I say that the probability of rolling a 3 with a pair of fair six-sided dice is 1/18, how do I know that? Where did that 1/18 figure come from?

The answer lies in something called a probability space. (I'm going to explain the probability space in frequentist terms, because I think that that's easiest, but there is, of course, an equivalent Bayesian description.) Suppose I'm looking at a particular experiment. In classic mathematical form, a probability space consists of three components (Ω, E, P), where:

  1. Ω, called the sample space, is a set containing all possible outcomes of the experiment. For a pair of dice, Ω would be the set of all possible rolls: {(1,1), (1,2), (1,3), (1,4), (1,5), (1, 6), (2,1), …, (6, 5), (6,6)}.
  2. E is an equivalence relation over Ω, which partitions Ω into a set of events. Each event is a set of outcomes that are equivalent. For rolling a pair of dice, an event is a total – each event is the set of outcomes that have the same total. For the event “3” (meaning a roll that totalled three), the set would be {(1, 2), (2, 1)}.
  3. P is a probability assignment. For each event e in E, P(e) is a value between 0 and 1, where:

     \sum_{e \in E} P(e) = 1

    (That is, the sum of the probabilities of all of the possible events in the space is exactly 1.)

The probability of an event e being the outcome of a trial is P(e).
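
Here's a minimal sketch of that construction in Python (my own illustration, using the pair-of-dice example from above):

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Omega: all 36 equally likely outcomes of rolling a pair of dice.
omega = list(product(range(1, 7), repeat=2))

# E: partition the outcomes into events, where an event is a total.
events = defaultdict(list)
for roll in omega:
    events[sum(roll)].append(roll)

# P: assign each event the fraction of outcomes that fall into it.
P = {total: Fraction(len(rolls), len(omega)) for total, rolls in events.items()}

print(P[3])             # 1/18, from the event {(1, 2), (2, 1)}
print(sum(P.values()))  # 1, exactly as the definition requires
```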

So the probability of any particular event as the result of a trial is a number between 0 and 1. What’s it mean? If the probability of event e is p, then if we repeat the trial N times, we expect N*p of those trials to have e as their result. If the probability of e is 1/4, and we repeat the trial 100 times, we’d expect e to be the result 25 times.

But in an important sense, that’s a cop-out. We’ve defined probability in terms of this abstract model, where the third component is the probability. Isn’t that circular?

Not really. For a given trial, we create the probability assignment by observation and/or analysis. The important point is that this is really just a bare minimum starting point. What we really care about in probability isn't the chance associated with a single, simple, atomic event. What we want to do is take the probability associated with a group of single events, and use our understanding of that to allow us to explore a complex event.

If I give you a well-shuffled deck of cards, it’s easy to show that the odds of drawing the 3 of diamonds is 1/52. What we want to do with probability is things like ask: What are the odds of being dealt a flush in a poker hand?

The construction of a probability space gives us a well-defined platform to use for building probabilistic models of more interesting things. Given the probability spaces of two single dice, we can combine them to create the probability space of the two dice rolled together. Given the probability space of a pair of dice, we can construct the probability space of a game of craps. And so on.

Probability and Interpretations

I’m going to do some writing about discrete probability theory. Probability is an extremely important area of math. We encounter aspects of it every day. It’s also a very poorly understood area – it’s one that we see abused or just fouled up every day.

I’m going to focus on discrete probability theory. What that means is that we’re going to look at things where the space containing the things that we’re going to look at contains a countable number of elements. The probability of getting a certain sequence of coin flips, or of getting a certain hand of cards are described by discrete probability theory. On the other hand, the odds of a radioactive isotope decaying at a particular time requires continuous probability theory.

Before getting into the details, there's one important thing to mention. When you're talking about probability, there are two fundamental schools of interpretation. There are frequentist interpretations, and there are Bayesian interpretations.

In a frequentist interpretation, when you say the probability of an event is 0.6, what you mean is that if you were to perform a series of experiments precisely reproducing the event, then on average, if you did 100 experiments, the event would occur 60 times. In the frequentist interpretation, the probability is an intrinsic property of the event. For a frequentist, it makes sense to say that there is a “real” probability associated with an event.

In a Bayesian interpretation, when you say that the probability of an event is 0.6, what you mean is that based on your current state of knowledge about the event, you have a 60% certainty that the event will occur. In a strict Bayesian interpretation, the event doesn’t have any kind of intrinsic probability associated with it. The specific event that you’re interested in either will occur, or it won’t. There’s no real probability involved. What probability measures is how certain you are about whether or not it will occur.

For example, think about flipping a fair coin.

A frequentist would say that you can flip a coin many times, and half of the time, it will land on heads. So the probability of a coin flip landing on the head of the coin is 0.5. A Bayesian would say that the coin will land either on heads or on tails. Since you don’t know which, and you have no other information to use to be able to make a better prediction, you can have a certainty of 0.5 that it will land on the head of the coin.

In the real world, I think that most people are really somewhere in between.

I think that all but the most fervent Bayesians do rely on an intuitive notion of the "intrinsic" probability of an event. They may describe it in different terms, but when it comes down to it, they're using the basic frequentist notion. And I don't think that you can find a sane frequentist anywhere who won't use Bayes' theorem to update their priors in the face of new information – which is the most fundamental notion in the Bayesian interpretation.
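
For reference, that updating rule says that your degree of belief in a hypothesis H after seeing data D is your prior belief reweighted by how well H predicts D:

\[ P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)} \]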

One note before I finish this, and get started on the real meaty posts. In the past, when I’ve talked about probability, people have started stupid flamewars in the comments. People get downright religious about interpretations of probability. There are religious Bayesians, who think that all frequentists are stupid idiots who should be banished from the field of math; likewise, there are religious frequentists who think that Bayesians are all a crop of arrogant know-it-alls who should be sent to Siberia. I am not going to tolerate any of that nonsense. If you feel that you cannot read posts on probability without going into a diatribe about those stupid frequentists/Bayesians and their deliberately stupid ideas, please go away and don’t even read these posts. If you do go into such a diatribe, I will delete your comments without any hesitation.

Speed-Crankery

A fun game to play with cranks is: how long does it take for the crank to contradict themselves?

When you’re looking at a good example of crankery, it’s full of errors. But for this game, it’s not enough to just find an error. What we want is for them to say something so wrong that one sentence just totally tears them down and demonstrates that what they’re doing makes no sense.

"The color of a clear sky is green" is, most of the time, wrong. If a crank makes some kind of argument based on the alleged fact that the color of a clear daytime sky is green, the argument is wrong. But as a statement, it's not nonsensical. It's just wrong.

On the other hand, "The color of a clear sky is steak frites with béarnaise sauce and a nice side of roasted asparagus", well… it's not even wrong. It's just nonsense.

Today’s crank is a great example of this. If, that is, it’s legit. I’m not sure that this guy is serious. I think this might be someone playing games, pretending to be a crank. But even if it is, it’s still fun.

About a week ago, I got an email titled "I am a Cantor crank" from a guy named Chris Cuellar. The contents were:

…AND I CHALLENGE YOU TO A DUEL!! En garde!

Haha, ok, not exactly. But you really seem to be interested in this stuff. And so am I. But I think I’ve nailed Cantor for good this time. Not only have I come up with algorithms to count some of these “uncountable” things, but I have also addressed the proofs directly. The diagonalization argument ends up failing spectacularly, and I believe I have a good explanation for why the whole thing ends up being invalid in the first place.

And then I also get to the power set of natural numbers… I really hope my arguments can be followed. The thing I have to emphasize is that I am working on a different system that does NOT roll up cardinality and countability into one thing! As it will turn out, rational numbers are bigger than integers, integers are bigger than natural numbers… but they are ALL countable, nonetheless!

Anyway, I had started a little blog of my own a while ago on these subjects. The first post is here:
http://laymanmath.blogspot.com/2012/09/the-purpose-and-my-introduction.html

Have fun… BWAHAHAHA

So. We’ve got one paragraph of intro. And then everything crashes and burns in an instant.

"Rational numbers are bigger than integers, integers are bigger than natural numbers, but they are all countable". This is self-evident rubbish. The definition of "countable" says that an infinite set I is countable if, and only if, you can create a one-to-one mapping between the members of I and the natural numbers. The definition of cardinality says that if you can create a one-to-one mapping between two sets, the sets are the same size.

When Mr. Cuellar says that the set of rational numbers is bigger than the set of natural numbers, but that they are still countable… he's saying that there is not a one-to-one mapping between the two sets, but that there is a one-to-one mapping between the two sets.
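
And the one-to-one mapping for the rationals really does exist. Here's a quick Python sketch (my own, not Cantor's original presentation) that walks the positive rationals diagonal by diagonal, pairing each one with a natural number:

```python
from fractions import Fraction
from math import gcd

def rationals():
    """Yield every positive rational exactly once, diagonal by diagonal."""
    s = 2  # each diagonal holds the fractions p/q with p + q == s
    while True:
        for p in range(1, s):
            q = s - p
            if gcd(p, q) == 1:  # skip duplicates like 2/4 == 1/2
                yield Fraction(p, q)
        s += 1

gen = rationals()
for n in range(10):
    print(n, "<->", next(gen))  # natural number n paired with the n-th rational
```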

Look – you don’t get to redefine terms, and then pretend that your redefined terms mean the same thing as the original terms.

If you claim to be refuting Cantor’s proof that the cardinality of the real numbers is bigger than the cardinality of the natural numbers, then you have to use Cantor’s definition of cardinality.

You can change the definition of the size of a set – or, more precisely, you can propose an alternative metric for how to compare the sizes of sets. But any conclusions that you draw about your new metric are conclusions about your new metric – they’re not conclusions about Cantor’s cardinality. You can define a new notion of set size in which all infinite sets are the same size. It’s entirely possible to do that, and to do that in a consistent way. But it will say nothing about Cantor’s cardinality. Cantor’s proof will still work.

What my correspondent is doing is, basically, what I did above in saying that the color of the sky is steak frites: using terms in a completely inconsistent, meaningless way. Steak frites with béarnaise sauce isn't a color. And what Mr. Cuellar does is similar: he's using the word "cardinality", but whatever he means by it, it's not what Cantor meant, and it's not what Cantor's proof meant. You can draw whatever conclusions you want from your new definition, but it has no bearing on whether or not Cantor is correct. I don't even need to visit his site: he's demonstrated, in record time, that he has no idea what he's doing.

The Gravitational Force of Rubbish

Imagine, for just a moment, that you were one of a group of scientists who had made the most important, the most profound, the most utterly amazing scientific discovery of all time. Where would you publish it?

Maybe Nature? Science? Or maybe you’d prefer to go open-access, and go with PLOS ONE? Or more mainstream, and send a press release to the NYT?

Well, in the case of today’s crackpots, they bypassed all of those boring journals. They couldn’t be bothered with a pompous rag like the Times. No, they went for the really serious press: America Now with Leeza Gibbons.

What did they go to this amazing media outlet to announce? The most amazing scientific discovery of all time: gravity is an illusion! There’s no gravity. In fact, not just is there no gravity, but all of that quantum physics stuff? It’s utter rubbish. You don’t need any of that complicated stuff! No – you need only one thing: the solar wind.

A new theory on the forces that control planetary orbit refutes the 400-year old assumptions currently held by the scientific community. Scientific and engineering experts Gerhard and Kevin Neumaier have established a relationship between solar winds and a quantized order in both the position and velocity of the solar system’s planets, and movement at an atomic level, with both governed by the same set of physics.

The observations made bring into question the Big Bang Theory, the concept of black holes, gravitational waves and gravitons. The Neumaiers’ paper, More Than Gravity, is available for review at MoreThanGravity.com

Pretty damned impressive, huh? So let’s follow their instructions, and go over to their website.

Ever since humankind discovered that the Earth and the planets revolved around the Sun, there was a question about what force was responsible for this. Since the days of Newton, science has held onto the notion that an invisible force, which we have never been able to detect, controls planetary motion. There are complicated theories about black holes that have never been seen, densities of planets that have never been measured, and subatomic particles that have never been detected.

However, it is simpler than all of that and right in front of us. The Sun and the solar wind are the most powerful forces in our solar system. They are physically moving the planets. In fact, the solar wind spins outward in a spiral at over a million miles per hour that controls the velocity and distances that planets revolve around the Sun. The Sun via the solar wind quantizes the orbits of the planets – their position and speed.

The solar wind also leads to the natural log and other phenomenon from the very large scale down to the atomic level. This is clearly a different idea than the current view that has been held for over 400 years. We have been working on this for close 50 years and thanks to satellite explorations of space have data that just was not available when theories long ago were developed. We think that we have many of the pieces but there are certainly many more to be found. We set this up as a web site, rather as some authoritative book so that there would be plenty of opportunity for dialog. The name for this web site, www.MorethanGravity.com was chosen because we believe there is far more to this subject than is commonly understood. Whether you are a scientific expert in your field or just have a general interest in how our solar system works, we appreciate your comments.

See, it's all about the solar wind. There's no such thing as gravity – that's just nonsense. The sun produces the solar wind, which does absolutely everything. The wind comes out of the sun, and spirals out from the sun. That spiral motion has eddies in it at quantized intervals, and that's where the planets are. Amazing, huh?

Remember my mantra: the worst math is no math. This is a beautiful demonstration
of that.

Of course… why does the solar wind move in a spiral? Everything we know says that in the absence of a force, things move in a straight line. It can’t be spiraling because of gravity, because there is no gravity. So why does it spiral? Our brilliant authors don’t bother to say. What makes it spiral, instead of just move straight? Mathematically, spiral motion is very complicated. It requires a centripetal force which is smaller than the force that would produce an orbit. Where’s that force in this framework? There isn’t any. They just say that that’s how the solar wind works, period. There are many possible spirals, with different radial velocities – which one does the solar wind follow according to this, and why? Again, no answer from the authors.

Or… why is the sun producing the solar wind at all? According to those old, stupid theories that this work of brilliance supersedes, the sun produces a solar wind because it's fusing hydrogen atoms into helium. That's happening because gravity is causing the atoms of the sun to be compressed together until they fuse. Without gravity, why is fusion happening at all? And given that it's happening, why does the sun not just explode into a supernova? We know, from direct observation, that the energy produced by fusion creates an outward force. But gravity can't be holding the sun together – so why is the sun there at all? Still, no answers.

They do, eventually, do some math. One of the big “results” of this hypothesis is about the “quantization” of the orbits of planets around the sun. They were able to develop a simple equation which predicts the locations where planets could exist in their “solar wind” system.

Let’s start with the distance between the planets and the Sun. We guessed that if the solar system was like an atom, that planetary distance would be quantized. This is to say that we thought that the planets would have definite positions and that they would be either in the position or it would be empty. In a mathematical sense, this would be represented by a numerical integer ordering (0,1,2,3,…). If the first planet, Mercury was in the 0 orbital, how would the rest of the planets line up? Amazingly well we found.

If we predict the distance from the surface of the Sun to each planet in this quantized approach, the results are astounding. If D equals the mean distance to the surface of the Sun, and d0 as the distance to Mercury, we can describe the relationship that orders the planets mathematically as:

 D=d_0 S^n

Each planetary position can be predicted from this equation in a simple calculation as we increase the integer (or planet number) n. S is the solar factor, which equals 1.387. The solar factor is found in the differential rotation of the Sun and the profile of the solar wind which we will discuss later.

Similar to the quantized orbits that exist within an atom, the planetary bodies are either there or not. Mercury is in the zero orbital. The next orbital is missing a planet. The second, third, and fourth orbitals are occupied by Venus, Earth, and Mars respectively. The fifth orbital is missing. The sixth orbital is filled with Ceres. Ceres is described as either the largest of all asteroids or a minor planet (with a diameter a little less than half that of Pluto), depending on who describes it. Ceres was discovered in 1801 as astronomers searched for the missing planets that the Titius-Bode Law predicted would exist.

So. What they found was an exponential equation which produces very approximate versions of the sizes of the first 8 planets' orbits, as well as a couple of missing ones.

This is, in its way, interesting. Not because they found anything, but rather because they think that this is somehow profound.

We've got 8 data points (or 9, counting the asteroid belt). More precisely, we have 9 ranges, because all of the orbits are elliptical, but the authors of this junk are producing a single number for the size of the orbits, and they can declare success if their number falls anywhere within the range from perihelion to aphelion in each of the orbits.

It would be shocking if there weren't any number of simple equations that approximately fit the 9 data points of the planets' orbits.

But they couldn’t even make that work directly. They only manage to get a partial hit – getting an equation that hits the right points, but which also generates a bunch of misses. There’s nothing remotely impressive about that.
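
You can see how cheap the trick is by plugging their formula into a few lines of Python. The slot assignments below follow their description; the "actual" values are rough mean orbital radii in AU, which I'm using as a stand-in for their "distance to the surface of the Sun" (the Sun's radius is negligible at this scale):

```python
S = 1.387    # their "solar factor"
d0 = 0.387   # mean distance to Mercury, in AU

# Their slots: Mercury=0, empty=1, Venus=2, Earth=3, Mars=4, empty=5, Ceres=6.
actual = {0: 0.39, 2: 0.72, 3: 1.00, 4: 1.52, 6: 2.77}

for n in range(7):
    D = d0 * S**n
    note = f"actual ~{actual[n]} AU" if n in actual else "no planet: a miss"
    print(f"n={n}: predicted {D:.2f} AU, {note}")
```

An exponential with tunable knobs – the factor S, the slot numbering, and "missing planets" allowed wherever it's convenient – will loosely fit almost any increasing sequence. That's all this is.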

From there, they move on to the strawmen. For example, they claim that their “solar wind” hypothesis explains why the planets all orbit in the same direction on the same plane. According to them, if orbits were really gravitational, then planets would orbit in random directions on random planes around the sun. But their theory is better than gravity, because it says why the planets are in the same plane, and why they’re all orbiting in the same direction.

The thing is, this is a really stupid argument. Why are the planets in the same plane, orbiting in the same direction? Because the solar system was formed out of a rotating gas cloud. There’s a really good, solid, well-supported explanation of why the planets exist, and why they orbit the sun the way they do. Gravity doesn’t explain all of it, but gravity is a key piece of it.

What they don't seem to understand is how amazingly powerful the theory of gravity is as a predictive tool. We've sent probes to the outer edges of the solar system. To do that, we didn't just aim a rocket towards Jupiter and fire it off. We've done things like the Cassini probe, where we launched a rocket towards Venus. It used the gravitational field of Venus twice to accelerate it with a double-slingshot maneuver, and send it back towards earth, using the earth's gravity to slingshot it again, to give it the speed it needed to get to Saturn.

This wasn’t a simple thing to do. It required an extremely deep understanding of gravity, with extremely accurate predictions of exactly how gravity behaves.

How do our brilliant authors answer this? By handwaving. The extent of their response is:

Gravitational theory works for things like space travel because it empirically measures the force of a planet, rather than predicting it.

That’s a pathetic handwave, and it’s not even close to true. The gravitational slingshot is a perfect answer to it. A slingshot doesn’t just use some “empirically measured” force of a planet. It’s a very precise prediction of what the forces will be at different distances, how that force will vary, and what effects that force will have.

They do a whole lot more handwaving of very much the same order. Pure rubbish.

What the heck is a DNS amplification DoS attack?

A couple of weeks ago, there was a bunch of news about a major DoS attack on Spamhaus. Spamhaus is an online service that maintains a blacklist of mail servers that are known for propagating spam. I've been getting questions about what a DoS attack is, and more specifically what a "DNS amplification attack" (the specific attack at the heart of last week's news) is. This all became a bit more relevant to me last week, because some asshole who was offended by my post about the Adria Richards affair launched a smallish DoS attack against scientopia. (This is why we were intermittently very slow last week, between Tuesday and Thursday. Also, to be clear, the DNS amplification attack was used on Spamhaus. Scientopia was hit by a good old fashioned DDoS attack.)

So what is a DoS attack? And what specifically is a DNS amplification attack?

Suppose that you're a nasty person who wants to take down a website like scientopia. How could you do it? You could hack in to the server, and delete everything. That would kill it pretty effectively, right?

It certainly would. But from the viewpoint of an attacker, that’s not a particularly easy thing to do. You’d need to get access to a privileged account on our server. Even if we’re completely up to date on patches and security fixes, it’s probably possible to do that, but it’s still probably going to be a lot of work. Even for a dinky site like scientopia, getting that kind of access probably isn’t trivial. For a big security-focused site like spamhaus, that’s likely to be close to impossible: there are layers of security that you’d need to get through, and there are people constantly watching for attacks. Even if you got through, if the site has reliable backups, it won’t be down for long, and once they get back up, they’ll patch whatever hole you used to get in, so you’d be back to square one. It’s a lot of work, and there are much easier ways to take down a site.

What you, as an attacker, want is a way to take the site down without having any kind of access to the system. You want a way that keeps the site down for as long as you want it down. And you want a way that doesn’t leave easily traced connections to you.

That's where the DoS attack comes in. DoS stands for "denial of service". The idea of a DoS attack is to take a site down without really taking it down. You don't actually kill the server; you just make it impossible for legitimate users to access it. If the site's users can't access the site even though the server is technically still up and running, you've still effectively killed it.

How do you do that? You overwhelm the server. You target some finite resource on the server, and force it to use up that resource just dealing with requests or traffic that you sent to the server, leaving it with nothing for its legitimate users.

In terms of the internet, the two resources that people typically target are CPU and network bandwidth.

Every time that you send a request to a webserver, the server has to do some computation to process that request. The server has a finite amount of computational capability. If you can hit it with enough requests that it spends all of its time processing your requests, then the site becomes unusable, and it effectively goes down. This is the simplest kind of DoS attack. It's generally done in a form called a DDoS – distributed denial of service attack – where the attacker uses thousands or millions of virus-infected computers to send requests. The server gets hit by a vast storm of requests, and it can't distinguish the legitimate requests from the ones generated by the attacker. This is the kind of attack that hit Scientopia last week. We were getting streams of a couple of thousand malformed requests per second.

This kind of attack can be very effective. It's hard – not impossible, but hard – to fight. You need to identify the common traits of the attackers, and set up some kind of filter to discard those requests. From the attacker's point of view, it's got one problem: price. Most people don't have a personal collection of virus-infected machines that they can use to mount an attack. What they actually do is rent machines! Virus authors run services where they'll use the machines that they've infected to run an attack for you, for a fee. They typically charge per machine-hour. So keeping a good attack going for a long time is expensive! Another problem with this kind of attack is that whatever traffic you inflict on the server, the attacking client has to generate: the client needs to establish a connection to the server, and that consumes CPU, network connections, and bandwidth on the client.

The other main DoS vector is network bandwidth. Every server running a website is connected to the network by a connection with a fixed capacity, called its bandwidth. A network connection can only carry a certain quantity of information. People love to make fun of the congressman who said that the internet is like a series of tubes, but that's not really a bad analogy. Any given connection is a lot like a pipe. You can only cram so much information through that pipe in a given period of time. If you can send enough traffic to completely fill that pipe, then the computer on the other end is, effectively, off the network. It can't receive any requests.

For a big site like spamhaus, it’s very hard to get enough machines attacking to effectively kill the site. The amount of bandwidth, and the number of different network paths connecting spamhaus to the internet is huge! The number of infected machines available for an attack is limited, and the cost of using all of them is prohibitive.

What an attacker would like for killing something like Spamhaus is an attack where the amount of work/cpu/traffic used to generate the attack is much smaller than the amount of work/cpu/traffic used by the server to combat the attack. That's where amplification comes in. You want to find some way of using a small amount of work/traffic on your attacker machines to cause your target to lose a large amount of work/traffic.

In this recent attack on Spamhaus, they used an amplification attack, that was based on a basic internet infrastructure service called the Domain Name Service (DNS). DNS is the service which is used to convert between the name of a server (like scientopia.org), and its numeric internet address (184.106.221.182). DNS has some technical properties that make it ideal for this kind of attack:

  1. It’s not a connection-based service. In most internet services, you establish a connection to a server, and send a request on that connection. The server responds on the same connection. In a connection-based service, that means two things. First, you need to use just as much bandwidth as the target, because if you drop the connection, the server sees the disconnect and stops processing your request. Second, the server knows who it’s connected to, and it always sends the results of a request to the client that requested it. But DNS doesn’t work that way. In DNS, you send a request without a connection, and in the request, you provide an address that the response should be sent to. So you can fake a DNS request, by putting someone else’s address as the “respond-to” address in the request.
  2. It’s possible to set up DNS to create very large responses to very small requests. There are lots of ways to do this. The important thing is that it’s really easy to use DNS in a way that allows you to amplify the amount of data being sent to a server by a factor of 100. In one common form of DNS amplification, you send 60 byte requests, which generate responses larger than 6,000 bytes.

Put these two properties together, and you get a great attack vector: you can send tiny, cheap requests to a server, which don't cause any incoming traffic on your attacker machine, and which send large quantities of data to your target. Doing this is called a DNS amplification attack: it's an amplification attack which uses properties of DNS to direct large quantities of data at your target, using small quantities of data sent by your attackers.
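
You can see the size asymmetry for yourself without doing anything abusive. Here's a sketch using the dnspython library (an assumption on my part: you have it installed; the name and resolver below are just examples, and many resolvers refuse ANY queries these days) that compares the size of a single query with the size of its response:

```python
import dns.message
import dns.query

# A small query that advertises a large EDNS buffer size, telling the
# server that we're willing to accept a big UDP response.
query = dns.message.make_query("example.com", "ANY", use_edns=0, payload=4096)

response = dns.query.udp(query, "8.8.8.8", timeout=5)

print("query size:   ", len(query.to_wire()), "bytes")
print("response size:", len(response.to_wire()), "bytes")
# The ratio of response size to query size is the amplification factor an
# attacker would get by forging the victim's address as the query's source.
```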

That’s exactly what happened to Spamhaus last week. The attackers used a very common DNS extension, which allowed them to amplify 60 byte requests into 4,000 byte responses, and to send the responses to the spamhaus servers.

There are, of course, more details. (For example, when direct attacks didn’t work, they tried an indirect attack that didn’t target the spamhaus servers, but instead tried to attack other servers that spamhaus relied on.) But this is the gist.

A White Boy's Observations of Sexism and the Adria Richards Fiasco

I’ve been watching the whole Adria Richards fiasco with a sense of horror and disgust. I’m finally going to say something, but for the most part, it’s going to be indirect.

See, I'm a white guy, born as a member of an upper middle class white family. That means that I'm awfully lucky. I'm part of the group that is, effectively, treated as the normal, default person in most settings. I'm also a guy who's married to a Chinese woman, and who's learned a bit about how utterly clueless I am.

Here's the fundamental issue that underlies all of this, and many similar stories: our society is deeply sexist and racist. We are all raised in an environment in which men's voices are more important than women's. It's so deeply ingrained in us that we don't even notice it.

What this means is that we are all, to some degree, sexist and racist. When I point this out, people get angry. We have also learned that sexism is a bad thing. So when I say to someone that you are sexist, it's really easy to interpret that as me saying that you're a bad person: sexism is bad, so if I'm sexist, then I'm bad.

But we really can’t get away from this reality. We are sexists. For many of us, we’re not deliberately sexist, we’re not consciously sexist. But we are sexist.

Here's a really interesting experiment to try, if you have the opportunity. Visit an elementary school classroom. First, just watch the teacher interact with the students while they're teaching. Don't try to count interactions. Just watch. See if you think that any group of kids is getting more attention than any other. Most of the time, you probably will get a feeling that they're paying roughly equal attention to the boys and the girls, or to the white students and the black students. Then, come back on a different day, and count the number of times that they call on boys versus calling on girls. I've done this, after having the idea suggested by a friend. The result was amazing. I really, honestly believed that the teacher was treating her students (the teacher I did this with was a woman) equally. But when I counted? She was calling on boys twice as often as girls.

This isn’t an unusual outcome. Do some looking online for studies of classroom gender dynamics, and you’ll find lots of structured observations that come to the same conclusion.

My own awakening about these kinds of things came from my time working at IBM. I’ve told this first story before, but it’s really worth repeating.

One year, I managed the summer internship program for my department. The previous summer, IBM research had wound up with an internship class consisting of 99% men. (That's not an estimate: that's a real number. That year, IBM research hired 198 summer interns, of whom 2 were women.) For a company like IBM, numbers like that are scary. Ignoring all of the social issues of excluding potentially great candidates, numbers like that can open the company up to gender discrimination lawsuits!

So my year, they decided to encourage the hiring of more diverse candidates. The way that they did that was by allocating each department a budget for summer interns. They could only hire up to their budgeted number of interns. The one exception: women and minority candidates didn't count against the budget.

When the summer program hiring opened, my department was allocated a budget of six students. All six slots were gone within the first day. Every single one of them went to a white, American, male student.

The second day, the guy across the hall from me came with a resume for a student he wanted to hire. This was a guy who I really liked, and really respected greatly. He was not, by any reasonable measure, a bad guy – he was a really good person. Anyway, he had this resume, for yet another guy. I told him the budget was gone, but if he could find a good candidate who was either a woman or minority, that we could hire them. He exploded, ranting about how we were being sexist, discriminating against men. He just wanted to hire the best candidate for the job! We were demanding that he couldn’t hire the best candidate, he had to hire someone less qualified, in order to satisfy some politically correct bureaucrat! There was nothing I could do, so eventually he stormed out.

Three days later, he came back to my office with another resume. He was practically bouncing off the walls he was so happy. “I found another student to hire. She’s even better than the guy I originally came to you with! She’s absolutely perfect for the job!”. We hired her.

I asked him why he didn't find her before. He had no answer – he didn't know why he hadn't found her resume in his first search.

This was a pattern that I observed multiple times that year. Looking through a stack of resumes, without deliberately excluding women, somehow, all of the candidates with female names wound up back in the slushpile. I don’t think that anyone was deliberately saying “Hmm, Jane, that’s a woman’s name, I don’t want to hire a woman”. But I do think that in the process of looking through a file containing 5000 resumes, trying to select which ones to look at, on an unconscious level, they were more likely to look carefully at a candidate with a male name, because we all learn, from a young age, that men are smarter than women, men are more serious than women, men are better workers than women, men are more likely to be technically skilled than women. Those attitudes may not be part of our conscious thought, but they are part of the cultural background that gets drummed into us by school, by books, by movies, by television, by commercials.

As I said, that was a real awakening for me.

I was talking about this with my next-door office neighbor, who happened to be one of the only two women in my department (about 60 people) at the time. She was shocked that I hadn’t noticed this before. So she pointed out to me that in meetings, she could say things, and everyone would ignore it, but if a guy said the same thing, they’d get listened to. We’d been in many meetings together, and I’d never noticed this!

So I started paying attention, and she was absolutely right.

What happened next is my second big awakening.

I started watching this in meetings, and when people brushed over something she’d said, I’d raise my voice and say “X just suggested blah, which I think is a really good idea. What about it?”. I wanted to help get her voice listened to.

She was furious at me. This just blew my mind. I was really upset at her at first. Dammit, I was trying to help, and this asshole was yelling at me for it! She’d complained about how people didn’t listen to her, and now when I was trying to help get her listened to, she was complaining again!

What I realized after I calmed down and listened to her was that I was wrong. I hadn’t spoken to her about doing it. I didn’t understand what it meant. But the problem was, people didn’t take her seriously because she was a woman. People might listen to me, because I’m also a white guy. But when I spoke for her, I wasn’t helping. When a man speaks on behalf of a woman, we’re reinforcing the idea that a woman’s voice isn’t supposed to be heard. I was substituting my man’s voice for her woman’s, and by doing that, I was not just not helping her, but I was actively hurting, because the social interpretation of my action was that “X can’t speak for herself”. And more, I learned that by taking offense at her, for pointing out that I had screwed up, I was definitely in the wrong – that I had an instinct for reacting wrong.

What I learned, gradually, from watching things like this, from becoming more sensitive and aware, and by listening to what women said, was that when it came to this kind of thing, I was completely clueless.

The fact is, I constantly benefit from a very strong social preference. I don’t notice that. Unless I’m really trying hard to pay attention, I’m not aware of all of the benefits that I get from that. I don’t notice all of the times when I’m getting a benefit. Worse, I don’t notice all of the times when my behavior is asserting that social preference as my right.

It’s very easy for a member of an empowered majority to just take things for granted. We see the way that we are treated as a default, and assume that everyone is treated the same way. We don’t perceive that we are being treated preferentially. We don’t notice that the things that offend us are absolutely off limits to everyone, but that things that we do to offend others are accepted as part of normal behavior. Most importantly, we don’t notice when our behavior is harmful to people who aren’t part of our empowered group. And when we do offend someone who isn’t part of the empowered majority, we take offense at the fact that they’re offended. Because they’re saying that we did something bad, and we know that we aren’t bad people!

The way that this comes back to the whole Adria Richards fiasco is very simple. Many people have looked at what happened at PyCon, and said something like “She shouldn’t have tweeted their picture”, or “She shouldn’t have been offended, they didn’t do anything wrong”, or “She should have just politely spoken to them”.

I don’t know whether what she did was right or not. I wasn’t there. I didn’t hear the joke that the guys in question allegedly told. What I do know is that for a member of the minority out-group, there is frequently no action that will be accepted as “right” if it includes the assertion that the majority did something offensive.

I've seen this phenomenon very directly myself, not in the context of sexism, but in terms of antisemitism. There's an expression that I've heard multiple times in the northeast US, to describe haggling over the price of a car: "jewing the salesman down". I absolutely find that extremely offensive. And I've called people out on it. There is no response that's actually acceptable.

If I politely say “You know, that’s relying on a stereotype of me and my ancestors that’s really hurtful”, the response is: “Oh, come on, it’s just harmless. I’m not talking about you, it’s just a word. You’re being oversensitive”. If I get angry, the response is “You Jews are so strident”. If I go to an authority figure in the setting, “You Jews are so passive aggressive, why couldn’t you just talk to me?”. No matter what I do, I’m wrong. Women deal with this every day, only they’re in a situation where the power dynamic is even less in their favor.

That’s the situation that women – particularly women in tech – find themselves in every day. We are sexist. We do mistreat women in tech every day, without even knowing that we’re doing it. And we’re very likely to take offense if they mention that we did something wrong. Because we know that we’re good people, and since we aren’t deliberately doing something bad, they must be wrong.

For someone in Adria Richards’ situation at PyCon, there is no course of action that can’t be taken wrong. As a woman hearing the joke in question, she certainly knew whether or not it was offensive to her. But once she’d heard something offensive, there was nothing she could do that someone couldn’t turn into a controversy.

Was the joke offensive? We don't know what, specifically, he said. The only facts that we're certain of are that, in her judgement, it was offensive; and that the authorities at PyCon agreed, and asked the gentleman in question to apologize.

Did the guy who made the joke deserve to be fired? I don’t know. If this stupid joke were the first time he’d ever done something wrong, then he didn’t deserve to be fired. But we don’t know what his history is like. I know how hard it is to hire skilled engineers, so I’m very skeptical that any company would fire someone over one minor offense. It’s possible that his company has a crazy hair-trigger HR department. But it’s also possible that there’s background that we don’t know about. That he’s done stuff before, and been warned. If that’s the case, then his company could have decided that this was the last straw.

Did Adria Richards deserve to be fired? Almost certainly not. We know more about her case than we do about the guy who told the joke. We know that her company fired her over this specific incident, because in their announcement of her firing, they told us the reason. They didn’t cite any past behavior – they just specifically cited this incident and its aftermath as the reason for firing her. It’s possible that there’s a history here that we don’t know about, that she’d soured relations with customers of her company in incidents other than this, and that this was a last straw. But it doesn’t seem likely, based on the facts that we’re aware of.

Did either of them deserve to be threatened? Absolutely not.

Genius Continuum Crackpottery

This post was revised on June 25, 2014. Mr. Wince has been threatening to sue me for libel. I don’t think that that’s right, but one thing that he’s complained about is correct. I called him a high school dropout. In his article, Wince refers to “when he dropped out of high school”, but in the same sentence, he goes on to say that he dropped out to attend community college. Calling him a dropout is a cheap shot, which I shouldn’t have included, and for that, I apologize. I’ve removed the line from the post. I still think that his math is laughably wrong, but I shouldn’t have called him a dropout.

There’s a lot of mathematical crackpottery out there. Most of it is just pointless and dull. People making the same stupid mistakes over and over again, like the endless repetitions of the same-old supposed refutations of Cantor’s diagonalization.

After you eliminate that, you get reams of insanity – stuff which
is simply so incoherent that it doesn’t make any sense. This kind of thing is usually word salad – words strung together in ways that don’t make sense.

After you eliminate that, sometimes, if you're really lucky, you'll come across something truly special. Crankery as utter genius. Not genius in a good way, like they're an outsider genius who discovered something amazing, but genius in the worst possible way, where someone has created something so bizarre, so overwrought, so utterly ridiculous that it's a masterpiece of insane, delusional foolishness.

Today, we have an example of that: Existics!. This is a body of work by a guy named Gavin Wince with truly immense delusions of grandeur. Pomposity on a truly epic scale!

I’ll walk you through just a tiny sample of Mr. Wince’s genius. You can go look at his site to get more, and develop a true appreciation for this. He doesn’t limit himself to mere mathematics: math, physics, biology, cosmology – you name it, Mr. Wince has mastered it and written about it!

The best of his mathematical crackpottery is something called C3: the Canonized Cardinal Continuum. Mr. Wince has created an algebraic solution to the continuum hypothesis, and along the way, has revolutionized number theory, algebra, calculus, real analysis, and god only knows what else!

Mr. Wince believes that he has solved the continuum hypothesis. Let me remind you of what that is:

  1. If you use Cantor’s set theory to explore numbers, you get to the uncomfortable result that there are different sizes of infinity.
  2. The smallest infinite cardinal number is called ℵ0,
    and it’s the size of the set of natural numbers.
  3. There are cardinal numbers larger than ℵ0. The first
    one larger than ℵ0 is ℵ1.
  4. We know that the set of real numbers is the same size as the powerset
    of the natural numbers – 2^ℵ0 – and that it is larger than the set of the naturals.
  5. The question that the continuum hypothesis tries to answer is: is the size
    of the set of real numbers equal to ℵ1? Equivalently: is there any
    cardinal number strictly between ℵ0 and 2^ℵ0? The continuum hypothesis says no.

The continuum hypothesis was "solved" in 1963. In 1940, Gödel showed that you couldn't disprove the continuum hypothesis using ZFC. In 1963, another mathematician, Paul Cohen, showed that it couldn't be proven using ZFC. So – a hypothesis which is about set theory can be neither proven nor disproven using set theory. It's independent of the axioms of set theory. You can choose to take the continuum hypothesis as an axiom, or you can choose to take the negation of the continuum hypothesis as an axiom: either choice is consistent and valid!

It’s not a happy solution. But it’s solved in the sense that we’ve got a solid proof that you can’t prove it’s true, and another solid proof that you can’t prove it’s false. That means that given ZFC set theory as a basis, there is no proof either way that doesn’t set it as an axiom.

But… Mr. Wince knows better.

The set of errors that Wince makes is really astonishing. This is really seriously epic crackpottery.

He makes it through one page without saying anything egregious. But then he makes up for it on page 2, by making multiple errors.

First, he pulls an Escultura:

x_1 = 1/2^1 = 1/2 = 0.5
x_2 = 1/2^1 + 1/2^2 = 1/2 + 1/4 = 0.75
x_3 = 1/2^1 + 1/2^2 + 1/2^3 = 1/2 + 1/4 + 1/8 = 0.875

At the end or limit of the infinite sequence, the final term of the sequence is 1.0

In this example we can see that as the number of finite sums of the sequence approaches the limit infinity, the last term of the sequence equals one.
x_n = 1.0
If we are going to assume that the last term of the sequence equals one, it can be deduced that, prior to the last term in the sequence, some finite sum in the series occurs where:
x_{n-1} = 0.999…
x_{n-1} = 1/2^1 + 1/2^2 + 1/2^3 + 1/2^4 + … + 1/2^{n-1} = 0.999…
Therefore, at the limit, the last term of the series of the last term of the sequence would be the term, which, when added to the sum 0.999… equals 1.0.

There is no such thing as the last term of an infinite sequence. Even if there were, the number 0.999… is exactly the same as 1. It's a notational artifact, not a distinct number.
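
The partial sums are easy to check with exact arithmetic. A few lines of Python (my own check, not Wince's) show that every finite partial sum falls short of 1 – which is exactly why talking about a "last term" is meaningless:

```python
from fractions import Fraction

total = Fraction(0)
for n in range(1, 11):
    total += Fraction(1, 2**n)
    print(n, total, "short of 1 by", 1 - total)

# Every partial sum is exactly 1 - 1/2^n: less than 1 for every finite n.
# The limit of the sequence is 1, but no term of the sequence equals 1.
```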

But this is the least of his errors. For example, the first paragraph on the next page:

The set of all countable numbers, or natural numbers, is a subset of the continuum. Since the set of all natural numbers is a subset of the continuum, it is reasonable to assume that the set of all natural numbers is less in degree of infinity than the set containing the continuum.

We didn't need to go through the difficulty of Cantor's diagonalization! We could have just blindly asserted that it's obvious!

Or actually… the fact that there are multiple degrees of infinity is anything but obvious. I don't know anyone who wasn't surprised the first time they saw Cantor's proof. It's a really strange idea that there's something bigger than infinity.

Moving on… the real heart of his stuff is built around some extremely strange notions about infinite and infinitesimal values.

Before we even look at what he says, there's an important error here which is worth mentioning. What Mr. Wince is trying to do is talk about the continuum hypothesis. The continuum hypothesis is a question about the cardinality of the set of real numbers and the set of natural numbers. Neither infinities nor infinitesimals are members of either set.

Infinite values come into play in Cantor’s work: the cardinality of the natural numbers and the cardinality of the reals are clearly infinite cardinal numbers. But ℵ0, the smallest infinite cardinal, is not a member of either set.

Infinitesimals are fascinating. You can reconstruct differential and integral calculus without using limits by building them in terms of infinitesimals. There's some great stuff in the surreal numbers playing with infinitesimals. But infinitesimals are not real numbers. You can't reason about them as if they were members of the set of real numbers, because they aren't.

Many of his mistakes are based on this idea.

For example, he's got a very strange idea that infinities and infinitesimals don't have fixed values, but that their values cover a range. The way that he gets to that idea is by asserting the existence of infinity as a specific, numeric value, and then using it in algebraic manipulations, like taking the "infinityth root" of a real number.

For example, on his way to "proving" that infinitesimals have this range property that he calls "perambulation", he defines a value that he calls κ:

 \sqrt[\infty]{\infty} = 1 + \kappa

In terms of the theory of numbers, this is nonsense. There is no such thing as an infinityth root. You can define an Nth root, where N is a real number, just like you can define an Nth power – exponents and roots are mirror images of the same concept. But roots and exponents aren’t defined for infinity, because infinity isn’t a number. There is no infinityth root.

You could, if you really wanted to, come up with a definition of exponents that allowed you to define an infinityth root. But it wouldn't be very interesting. If you followed the usual pattern for these things, it would be a limit: \sqrt[\infty]{x} = \lim_{n \rightarrow \infty} \sqrt[n]{x}. That's clearly 1 for any positive real x. Not 1 plus something: just exactly 1.
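
You can watch that limit settle down numerically. A trivial sketch (mine, just to illustrate the point):

```python
# For any fixed x > 0, the n-th root of x tends to 1 as n grows --
# leaving no room for a nonzero "kappa" left over.
x = 1e100  # even an absurdly large x
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, x ** (1.0 / n))
```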

But Mr. Wince doesn’t let himself be limited by silly notions of consistency. No, he defines things his own way, and runs with it. As a result, he gets a notion that he calls perambulation. How?

Take the definition of κ:

 \sqrt[\infty]{\infty} = 1 + \kappa

Now, you can, obviously, raise both sides to the power of infinity:

\infty = (1 + \kappa)^{\infty}

Now, you can substitute ℵ0 for ∞. (Why? Don’t ask why. You just can.) Then you can factor it. His factoring makes no rational sense, so I won’t even try to explain it. But he concludes that:

  • Factored and simplified one way, you end up with (κ+1) = 1 + x, where x is some infinitesimal number larger than κ. (Why? Why the heck not?)
  • Factored and simplified another way, you end up with (κ+1) = ℵ
  • If you take the mean of all of the possible factorings and reductions, you get a third result, that (κ+1) = 2.

He goes on, and on, and on like this. From perambulation to perambulating reciprocals, to subambulation, to ambulation. Then un-ordinals, un-sets… this is really an absolute masterwork of utter insane crackpottery.

Do download it and take a look. It’s a masterpiece.

Pi-day randomness

One of my twitter friends was complaining about something that’s apparently making the rounds of Facebook for π-day. It annoyed me sufficiently to be worth ranting about a little bit.

Why isn’t π rational if π=circumference/diameter, and both measurements are plainly finite?

There are a couple of different ways of interpreting this question.

The stupidest way of interpreting it is that the author didn’t have any clue what an irrational number is. An irrational number is a number which cannot be written as a ratio of two integers. Another way of saying essentially the same thing is that an irrational number has no finite (or eventually repeating) decimal representation. I’ve seen people get this wrong before, where they confuse not having a finite representation with not being finite.
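
To see why those two phrasings amount to the same thing, here’s a little Python sketch (my own illustration): long division of any ratio of integers must either terminate or start repeating, because there are only finitely many possible remainders.

def decimal_expansion(p, q, max_digits=30):
  # Long division, tracking remainders: if a remainder shows up twice,
  # the digits between its two occurrences repeat forever.
  whole, r = p // q, p % q
  digits, seen = [], {}
  while r != 0 and r not in seen and len(digits) < max_digits:
    seen[r] = len(digits)
    r = r * 10
    digits.append(str(r // q))
    r = r % q
  if r == 0:
    return "%d.%s (terminates)" % (whole, "".join(digits))
  if r in seen:
    i = seen[r]
    return "%d.%s(%s repeating)" % (whole, "".join(digits[:i]), "".join(digits[i:]))
  return "%d.%s… (ran out of digits)" % (whole, "".join(digits))

print(decimal_expansion(1, 4))   # 0.25 (terminates)
print(decimal_expansion(1, 7))   # 0.(142857 repeating)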

π doesn’t have a finite decimal representation. But it’s very clearly finite – it’s less than 3 1/4, which is obviously not infinite. Anyone who can look at π, and be confused about whether or not it’s finite is… well… there’s no nice way to say this. If you think that π isn’t finite, you’re an idiot.

The other way of interpreting this statement is less stupid: it’s a question of measurement. If you have a circular object in real life, then you can measure the circumference and the diameter, and do the division on the measurements. The measurements have finite precision. So how can the ratio of two measurements with finite precision be irrational?

The answer is, they can’t. But perfect circles don’t exist in the real world. Many mathematical concepts don’t exist in the real world. In the real world, there’s no such thing as a mathematical point, no such thing as a perfect line, no such thing as perfectly parallel lines.

π isn’t a measured quantity. It’s a theoretical quantity, which can be computed analytically from the abstract properties of an ideal, perfect circle.

No “circle” in the real world has a perfect ratio of π between its circumference and its diameter. But the theoretical circle does.

The facebook comments on this get much worse than the original question. One in particular really depressed me.

Just because the measurements are finite doesn’t mean they’re rational.
Pi is possibly rational, we just haven’t figured out where it ends.

Gah, no!

We know an awful lot about π. And we know, with absolute, 100% certainty, that π’s decimal expansion never ends: π was proven irrational by Lambert back in the 1760s, and an irrational number’s decimal expansion can never terminate or repeat.

We can also define π precisely as a series:

\pi = \frac{4}{1} - \frac{4}{3} + \frac{4}{5} - \frac{4}{7} + \frac{4}{9} - \cdots

That series goes on forever, and it pins down π exactly, digit by digit. (The infinite series by itself isn’t what makes π irrational – 1/2 + 1/4 + 1/8 + … is an infinite series that sums to exactly 1 – but it does give us a precise definition of π that we can compute with.)

Just for fun, here’s a little snippet of Python code that you can play with. You can see how, up to the limits of your computer’s floating point representation, a series computation of π keeps on going, changing with each additional iteration.

def pi(numiter):
  # Approximate pi using the Nilakantha series:
  #   pi = 3 + 4/(2*3*4) - 4/(4*5*6) + 4/(6*7*8) - ...
  # (a cousin of the series above that converges quite a bit faster)
  val = 3.0
  sign = 1
  for i in range(numiter):
    # Denominator of the next term: (2i+2)(2i+3)(2i+4)
    term = ((i+1)*2) * ((i+1)*2 + 1) * ((i+1)*2 + 2)
    val = val + sign * 4.0 / term
    sign = -sign
  return val
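
For example (my own quick check), printing the value at a few iteration counts shows the later digits still shifting:

for n in (10, 100, 1000, 100000):
  print(n, pi(n))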

Finally: Gödel’s Proof of Incompleteness!

Finally, we’re at the end of our walkthrough of Gödel’s great incompleteness proof. As a refresher, the basic proof sketch is:

  1. Take a simple logic. We’ve been using a variant of the Principia Mathematica’s logic, because that’s what Gödel used.
  2. Show that any statement in the logic can be encoded as a number using an arithmetic process based on the syntax of the logic. The process of encoding statements numerically is called Gödel numbering. (There’s a toy illustration of the idea just after this list.)
  3. Show that you can express meta-mathematical properties of logical statements in terms of arithmetic properties of their Gödel numbers. In particular, we need to build up the logical infrastructure that we need to talk about whether or not a statement is provable.
  4. Using meta-mathematical properties, show how you can create an unprovable statement encoded as a Gödel number.
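
Here’s the promised toy illustration of the numbering trick (my own miniature version, not Gödel’s actual encoding): give each symbol a numeric code, then pack a statement into a single number as a product of ascending primes raised to those codes. Factoring the number recovers the statement, so nothing is lost.

# A toy Goedel numbering. The symbol codes here are made up for
# illustration; any fixed assignment works.
SYMBOLS = {'0': 1, 's': 3, '+': 5, '=': 7, '(': 9, ')': 11, 'x': 13}

def primes(n):
  # The first n primes, by trial division (fine at toy sizes).
  found = []
  candidate = 2
  while len(found) < n:
    if all(candidate % p != 0 for p in found):
      found.append(candidate)
    candidate += 1
  return found

def goedel_number(statement):
  # Encode the kth symbol as (kth prime) ** (symbol's code).
  num = 1
  for p, sym in zip(primes(len(statement)), statement):
    num *= p ** SYMBOLS[sym]
  return num

print(goedel_number('0=0'))  # 2**1 * 3**7 * 5**1 = 21870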

What came before:

  1. Gödel numbering: The logic of the Principia, and how to encode it as numbers. This covered steps 1 and 2 in the sketch.
  2. Arithmetic Properties: what it means to say that a property can be expressed arithmetically. This set the groundwork for step 3 in the proof sketch.
  3. Encoding meta-math arithmetically: how to take meta-mathematical properties of logical statements, and define them as arithmetic properties of the Gödel numberings of the statements. This was step 3 proper.

So now we can move on to step 4, where we actually see why mathematical logic is necessarily incomplete.

When we left off with Gödel, we’d gone through a very laborious process showing how we could express meta-mathematical properties of logical statements as primitive recursive functions and relations. We built up to being able to express one non-primitive-recursive property, which says that a given statement is provable:

pred provable(x) = {
  some y {
    proofFor(y, x)
  }
}

The reason for going through all of that was that we really needed to show how we could capture all of the necessary properties of logical statements in terms of arithmetic properties of their Gödel numbers.

Now we can get to the target of Gödel’s effort. What Gödel was trying to do was show how to defeat the careful stratification of the Principia’s logic. In the Principia, Russell and Whitehead had tried to avoid problems with self-reference by creating a very strict stratification, where each variable or predicate had a numeric level, and could only reason about objects from lower levels. So if natural numbers were the primitive objects in the domain being reasoned about, then level-1 objects would be things like specific natural numbers, and level-1 predicates could reason about specific natural numbers, but not about sets of natural numbers or predicates over the natural numbers. Level-2 objects would be sets of natural numbers, and level-2 predicates could reason about natural numbers and sets of natural numbers, but not about predicates over sets of natural numbers, or sets of sets of natural numbers. Level-3 objects would be sets of sets of natural numbers… and so on.

The point of this stratification was to make self-reference impossible. You couldn’t make a statement of the form “This predicate is true”: the predicate would be a level-N predicate, and only a level N+1 predicate could reason about a level-N predicate.
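
A made-up sketch of the scheme in Python, just to make the stratification concrete (I’m fudging the exact level assignments): every object carries a level, and a predicate refuses to apply to anything at its own level or above. That refusal is precisely what makes “this predicate is true” unsayable.

class Obj:
  # Level 1: primitive objects (specific numbers). Higher levels are
  # sets/predicates over the levels below.
  def __init__(self, level, value=None):
    self.level = level
    self.value = value

class Pred(Obj):
  def __init__(self, level, test):
    super().__init__(level)
    self.test = test

  def apply(self, arg):
    # The stratification rule: only reason about strictly lower levels.
    if arg.level >= self.level:
      raise TypeError("a level-%d predicate can't reason about a level-%d object"
                      % (self.level, arg.level))
    return self.test(arg)

three = Obj(1, 3)
is_odd = Pred(2, lambda o: o.value % 2 == 1)
print(is_odd.apply(three))  # fine: a level-2 predicate, a level-1 object
try:
  is_odd.apply(is_odd)      # self-application...
except TypeError as e:
  print(e)                  # ...is blocked by the stratification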

What Gödel did in the laborious process we went through in the last post is embed a model of logical statements in the natural numbers. That’s the real trick: the logic is designed to work with a set of objects that are a model of the natural numbers. By embedding a model of logical statements in the natural numbers, he made it possible for a level-1 predicate (a predicate about a specific natural number) to reason about any logical statement or object. A level-1 predicate can now reason about a level-7 object! A level-1 predicate can reason about the set defined by a level-1 predicate: a level-1 predicate can reason about itself!

Now, we can finally start getting to the point of all of this: incompleteness! We’re going to use our newfound ability to nest logical statements into numbers to construct an unprovable true statement.

In the last post, one of the meta-mathematical properties that we defined for the Gödel-numbered logic was immConseq, which defines when some statement x is an immediate consequence of a set of statements S. As a reminder, that means that x can be inferred from statements in S in one inference step.

We can use that property to define what it means to be a consequence of a set of statements: it’s the closure of immediate consequence. We can define it in pseudo-code as:

def conseq(κ) = {
  K = κ + axioms
  added_to_k = false
  do {
    added_to_k = false
    for all c in immConseq(K) {
      if c not in K {
        add c to K
        added_to_k = true
      }
    }
  } while added_to_k
  return K
}
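
As a sanity check, here’s the same closure as runnable Python, with a stand-in immConseq that just does modus ponens over statements encoded as strings and ('implies', a, b) tuples. The encoding and the inference rule are my toy stand-ins, not Gödel’s arithmetized versions.

def imm_conseq(stmts):
  # Toy immediate-consequence: one step of modus ponens.
  results = set()
  for s in stmts:
    if isinstance(s, tuple) and s[0] == 'implies' and s[1] in stmts:
      results.add(s[2])
  return results

def conseq(kappa, axioms=frozenset()):
  # Iterate to a fixed point: stop when a pass adds nothing new.
  K = set(kappa) | set(axioms)
  while True:
    new = imm_conseq(K) - K
    if not new:
      return K
    K |= new

facts = {'p', ('implies', 'p', 'q'), ('implies', 'q', 'r')}
print(conseq(facts))  # contains 'q' and 'r' as consequences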

In other words, Conseq(κ) is the complete set of everything that can possibly be inferred from the statements in κ and the axioms of the system. We can say that there’s a proof for a statement x in κ if and only if x ∈ Conseq(κ).

We can use the idea of Conseq to define a strong version of what it means for a logical system with a set of facts to be consistent. A system is ω-consistent if and only if there is no statement a with a free variable v where every numeric instance of a is a consequence – that is, subst(a, v, n) ∈ Conseq(κ) for every number n – while not(forall(v, a)) ∈ Conseq(κ) as well.

In other words, the system is ω-consistent as long as it can never derive every individual instance of a statement while also deriving the negation of that statement’s universal generalization. ω-consistency is actually a stronger requirement than plain consistency. But for our purposes, we can treat it as being pretty much the same thing. (Yes, that’s a bit hand-wavy, but I’m not trying to write an entire book about Gödel here!)

(Gödel’s version of the definition of ω-consistency is harder to read than this, because he’s very explicit about the fact that Conseq is a property of the numbers. I’m willing to fuzz that, because we’ve shown that the statements and the numbers are interchangeable.)

Using the definition of ω-consistency, we can finally get to the actual statement of the incompleteness theorem!

Gödel’s First Incompleteness Theorem: For every ω-consistent primitive recursive set κ of formulae, there is a primitive recursive predicate r with a free variable v such that neither forall(v, r) nor not(forall(v, r)) is provable.

To prove that, we’ll construct the predicate r.

First, we need to define a version of our earlier isProofFigure that’s specific to the set of statements κ:

pred isProofFigureWithKappa(x, kappa) = {
  all n in 1 to length(x) {
    isAxiom(item(n, x)) or
    item(n, x) in kappa or
    some p in 1 to n-1 {
      some q in 1 to n-1 {
        immConseq(item(n, x), item(p, x), item(q, x))
      }
    }
  } and length(x) > 0
}

This is the same as the earlier definition – just specialized so that it ensures that every statement in the proof figure is either an axiom, or a member of κ.

We can do the same thing to specialize the predicates proofFor and provable:

pred proofForStatementWithKappa(x, y, kappa) = {
  isProofFigureWithKappa(x, kappa) and
  item(length(x), x) = y
}

pred provableWithKappa(x, kappa) = {
  some y {
    proofForStatementWithKappa(y, x, kappa)
  }
}

If κ is the set of basic truths that we can work with, then provableWithKappa(x, κ) is equivalent to provable(x).
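
To make the specialized predicates concrete, here’s a toy runnable version (my own stand-ins: a single modus ponens rule and a dummy axiom check, not Gödel’s arithmetized encoding):

def is_axiom(s):
  # Dummy axiom check for the toy system.
  return s == ('axiom',)

def follows_from(s, p, q):
  # One modus ponens step: s follows if one premise says "other implies s".
  return p == ('implies', q, s) or q == ('implies', p, s)

def is_proof_figure_with_kappa(x, kappa):
  # Every line must be an axiom, a member of kappa, or an immediate
  # consequence of two earlier lines.
  if len(x) == 0:
    return False
  for n, stmt in enumerate(x):
    justified = (is_axiom(stmt) or stmt in kappa or
                 any(follows_from(stmt, x[p], x[q])
                     for p in range(n) for q in range(n)))
    if not justified:
      return False
  return True

def proof_for_with_kappa(x, y, kappa):
  return is_proof_figure_with_kappa(x, kappa) and x[-1] == y

kappa = {'p', ('implies', 'p', 'q')}
print(proof_for_with_kappa(['p', ('implies', 'p', 'q'), 'q'], 'q', kappa))  # True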

Now, we can define a predicate NotAProofWithKappa:

pred NotAProofWithKappa(x, y, kappa) = {
  not (proofForStatementWithKappa(x, subst(y, 19, number(y)), kappa))
}

Based on everything that we’ve done so far, NotAProofWithKappa is primitive recursive.

This is tricky, but it’s really important. We’re getting very close to the goal, and it’s subtle, so let’s take the time to understand this.

  • Remember that in a Gödel numbering, each prime number is a variable. So 19 here is just the name of a free variable in y.
  • Using the Principia’s logic, the fact that variable 19 is free means that the statement is parametric in variable 19. For the moment, it’s an incomplete statement, because it’s got an unbound parameter.
  • What we’re doing in NotAProofWithKappa is substituting the numeric coding of y for the value of y’s parameter. When that’s done, y is no longer incomplete: its unbound variable has been replaced by a binding.
  • With that substitution, NotAProofWithKappa(x, y, kappa) is true when x is not a proof of y(y).

What NotAProofWithKappa does is give us a way to check whether a specific sequence of statements x is not a proof of y.

We want to expand NotAProofWithKappa to something universal. Instead of just saying that a specific sequence of statements x isn’t a proof for y, we want to be able to say that no possible sequence of statements is a proof for y. That’s easy to do in logic: you just wrap the statement in a “∀ x ( )”. In Gödel numbering, we defined a function that does exactly that. So the universal form of unprovability is: ∀ a (NotAProofWithKappa(a, y, kappa)).

In terms of the Gödel numbering, if we assume that the Gödel number for the variable a is 17, and the variable y is numbered as 19, we’re talking about the statement p = forall(17, NotAProofWithKappa(17, 19, kappa)).

p is the statement that, for a given logical statement (the value of variable 19, or y in our definition), there is no possible value for variable 17 (a) where a proves y in κ.

All we need to do now is show that we can make p become self-referential. No problem: we can just put number(p) in as the value of y in NotAProofWithKappa. If we let q be the numeric value of the statement NotAProofWithKappa(a, y), then:

r = subst(q, 19, p)

i = subst(p, 19, r)

i says that there is no possible value of a that proves p(p). In other words, p(p) is unprovable: there exists no possible proof that there is no possible proof of p!

This is what we’ve been trying to get at all this time: self-reference! We’ve got a statement, p, which is able to express a property of itself. Worse, it’s able to express a negative property of itself!
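
If that substitution maneuver feels slippery, the programming analogue might help: it’s the same trick that makes a quine work – a program that reconstructs its own text by substituting a description of itself into itself. (The analogy is mine, not part of Gödel’s proof.)

# A classic Python quine: the template s gets s itself substituted in --
# the same shape as subst(p, 19, number(p)).
s = 's = %r\nprint(s %% s)'
print(s % s)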

Now we’re faced with two possible choices. Either i is provable – in which case, κ is inconsistent! Or else i is unprovable – in which case κ is incomplete, because we’ve identified a true statement that can’t be proven!

That’s it: we’ve shown that in the Principia’s logic, using nothing but arithmetic, we can create a true statement that cannot be proven. If, somehow, it were to be proven, the entire logic would be inconsistent. So the Principia’s logic is incomplete: there are true statements that cannot be proven true.

We can go a bit further: the process that we used to produce this result about the Principia’s logic is actually applicable to other logics. There’s no magic here: if your logic is powerful enough to do Peano arithmetic, you can use the same trick that we demonstrated here, and show that the logic must be either incomplete or inconsistent. (Gödel proved this formally, but we’ll just handwave it.)

Looking at this with modern eyes, it doesn’t seem quite as profound as it did back in Gödel’s day.

When we look at it through the lens of today, what we see is that in the Principia’s logic, proof is a mechanical process: a computation. If every true statement were provable, then you could take any statement S, and write a program to search for a proof of either S or ¬S, and eventually, that program would find one or the other, and stop.

In short, you’d be able to solve the halting problem. The proof of the halting problem is really an amazingly profound thing: on a very deep level, it’s the same thing as incompleteness, only it’s easier to understand.
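
Here’s the usual sketch of that connection, assuming a hypothetical halts(f) oracle – no such function can exist, which is exactly the point:

# Hypothetical: halts(f) returns True iff calling f() eventually stops.
# No real implementation is possible; this sketch shows why.

def paradox():
  if halts(paradox):  # if the oracle says paradox() halts...
    while True:       # ...then loop forever;
      pass
  # ...otherwise, fall through and halt immediately.

# Whichever answer halts(paradox) gives, it's wrong. And if every true
# statement were provable, brute-force proof search would implement
# halts() -- so no consistent logic this strong can prove everything true.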

But at the time that Gödel was working, Turing hadn’t yet written his paper about the halting problem. Incompleteness was published in 1931; Turing’s halting paper was published in 1936. It was a totally unprecedented idea when it was published. Gödel produced one of the most profound and surprising results in the entire history of mathematics, showing that the efforts of the best mathematicians in the world to produce the perfection of mathematics were completely futile.