Category Archives: Good Math

Multiplying Spaces

When people talk informally about topology, we always say that the basic idea of equivalence is that two spaces are equivalent if they can be bent, stretched, smushed, or twisted into each other, without tearing or gluing. A mug is the same shape as a donut, because you can make a donut out of clay, and then shape that donut into a mug without tearing, punching holes, or gluing pieces together. A sphere is the same shape as a cube, because if you’ve got a clay sphere, you can easily reshape it into a cube, and vice-versa.

Homeomorphism is the actual formal definition of that sense of equivalence. The intuition is fantastic – it’s one of the best informal description of a difficult formal concept that I know of in math! But it’s not ideal. WHen you take a formal idea and make it informal, you always lose some details.

What we’re going to do here is try to work our way gradually through the idea of transformability and topological equivalence, so that we can really understand it. Before we can do that, we need to be able to talk about what a continuous transformation is. To talk about continuous transformations, we need to be able to talk about some topological ideas called homotopy and isotopy. And to be able to define those, we need to be able to use topological products. (Whew! Nothing is ever easy, is it?) So today’s post is really about topological products!

The easiest way that I can think of to explain the product of two topological spaces is to say that it’s a way of combining the structures of the spaces by adding dimensions. For example, if you start with two spaces each of which is a line segment, the product of those two spaces is a square (or a circle, or an octagon, or …) You started with two one-dimensional spaces, and used them to create a new two-dimensional space. If you start with a circle and a line, the product is a cylinder.

In more formal terms, topological products are a direct extension of cartesian set products. As the mantra goes, topological spaces are just sets with structure, which means that the cartesian product of two topological sets is just the cartesian products of their point-sets, plus a combined structure that preserves attributes of the original structure of the spaces.

Let’s start with a reminder of what the cartesian product of two sets is. Given a set A and a set B, the cartestian product A \times B is defined as the set of all possible pairs (a, b), where a \in A and b \in B. If A=\{1, 2, 3\} and B=\{4, 5\}, then A\times B = \{ (1, 4), (1, 5), (2, 4), (2, 5), (3, 4), (3, 5)  \}.

In category theory, we take the basic idea of the cartesian product, and extend it to a general product of different mathematical objects. It does this by using the idea of projections. In this model, instead of saying that the product of sets A and B is a set of pairs (a, b), we can instead say that the product is a set S of objects, and two functions P_A : S \rightarrow A and P_B : S \rightarrow B. (To be complete, we’d need to add some conditions, but the idea should be clear from this much.) Given any object in the the product set S, P_A(S) will give us the projection of that object onto A. This becomes more interesting when we consider sets of objects. The A-projection of a collection of points from the product set S is the shadow that those points cast onto the set A.

A topological product is easiest to understand with that categorical approach. The set of points in a product category A \times B is the cartesian product of the sets of points in A and the sets of points in B. The trick, with topologies, is that you need to describe the topological structure of the product set: you need to be able to say what the neighorhoods are. There are lots of ways that you could define the neighborhoods of the product, but we define it as the topological space with the smallest collection of open-sets. To understand how we get that, the projections of the category theoretical approach make it much easier.

Informally, the neighborhoods in the product A \times B are things that cast shadows into the topological spaces A and B which are neighborhoods in A and B.

Suppose we have topological spaces A and B. If S is the product topology A \times B, then it has projection functions P_A: S \rightarrow A and P_B: S \rightarrow P_B.

The projection functions from the product need to maintain the topological structure of the original topologies. That means that the projection function must be continuous. And that, in turn, means that the inverse image of the projection function is an open set. So: for each open set O in A, P_A^{-1}(O) is an open-set in S.

Let’s look at an example. We’ll start with two simple topological spaces – a cartesian plane (2d), and a line (1d). In the plane, the neighborhoods are open circles; in the line, the neighborhoods are open intervals. I’ve illustrated those below.

open-sets

The product of those two is a three dimensional space. The neighborhoods in this space are cylinders. If you use the projection from the product to the plane, you get open circles – the neighborhood structure of the plane. If you use the projection from the product to the line, you get open intervals – the neighborhood structure of the line.

open-set-cyl

One interesting side-point here. One thing that we come across constantly in this kind of formal math is the axiom of choice. The AoC is an annoying bugger, because it varies from being obviously true to being obviously ridiculously false. Topological products is one of the places where it’s obviously true. The axiom choice is equivalent to the statement that given a collection of non-empty topological spaces, the product space is not empty. Obvious, right? But then look at the Banach-Tarski paradox.

Bitcoin, MtGox, and Deflation

This is the last bitcoin post. Here I’ll try to answer the questions that led me to start writing about it.

What is MtGox? What happened there?

MtGox was a company that maintained bitcoin wallets. The basic idea is that they acted like a bank/broker for bitcoins. If you want to get bitcoins, you can go to someone like MtGox, and give them some money. They create a public/private keypair for you, and use it to create a transaction giving you the bitcoins. When you want to make a purchase, you’d go to your MtGox account, and tell them to transfer the bitcoins, and they use your key to sign the transaction, and then broadcast it to the bitcoin network. It is through processes like this one that you can buy Bitcoin with PayPal.

By using MtGox, you don’t need to have a program that participates in the bitcoin network to do transactions. You don’t need to worry about keeping your keys safe. You don’t need to have software capable of generating and signing transactions. All you need is your web-browser, to log in to MtGox.

Here’s where the problems start: MtGox didn’t start off as a bitcoin bank. In fact, they started off about as far from banking as you can imagine. From the name, you might think that MtGox is named after a mountain. Nope! It’s an acronym, for “Magic: the Gathering Online Exchange”. MtGox started off as a trading card exchange market.

This continues to boggle my mind. I just can’t quite wrap my head around it. A hacked together trading card exchange site decides to start acting as a sort of electronic bank/currency broker. And people trusted them with hundreds of millions of dollars!.

What happened is completely predictable.

You have an online site that manages massive quantities of money. Criminals are going to try to steal from it. Hell, when I was administrating Scientopia, at least once a week, I’d get email from someone with some kind of scam to try to manipulate google ads with fake clickthroughs, offering to split the profit. Scientopia’s revenue was only in the hundred dollar a month range – but it was still enough to attract crooks and scammers. Imagine what happens when it’s not $10 to be made, but $100,000,000?!

Crooks tried to steal money from MtGox. From what we know (there’s still a lot about this that’s still being figured out), they succeeded. They found a weakness in the MtGox implementation of the bitcoin protocol, and they exploited it to steal a massive number of bitcoins.

The ridiculous thing about all of this is, as I said above, it was totally predictable. You should never just hack together cryptosystems. You should never just hack together anything that handles money. When you hack together a crpytosystem that handles money, it’s pretty much a given that money is going to get lost.

If you want to deal with money, you need to be really, really serious about security. That doesn’t just mean making sure you write code. It means having an entire staff of people who’s job it is to make sure that you don’t fuck up. It means having people working full time, trying to break your system – because if they can break it, so can someone else! It means having a strongly adverserial setup, where the people trying to break it really want to break it – they can’t be the same people who want it to not get broken. It means having a different team of people who’s full time job is auditing – constantly watching the system, checking transactions, verifying them, making sure that everything is working correctly, catching any potential problems the moment they start, instead of letting them continue until they become disasters.

MtGox had none of that. It was a hacked together site. To get a sense of the way it was built, just look at the CEO’s blog, where he talks about implementing SSH in PHP. I’m not saying that he used this SSH code in MtGox – but read it, and read the comments, and you’ll get a sense of how poorly he understands security issues.

What does it mean when people say that Bitcoin is deflationary?

When you read the hype around bitcoin, you also see a lot of criticisms from the skeptics. I am one of the skeptics, but I’m trying to be as fair as I can in these posts. One of the criticisms that you constantly see is that Bitcoin is deflationary.

As I mentioned in yesterdays post, the only source of new bitcoins is mining. Each time the ledger gets updated with a new block in the blockchain, the person who generated the solution for that block gets a bounty, in the form of newly created bitcoins. Today, the bounty for a block is 25 bitcoins. But the bitcoin protocol specifies that that bounty will gradually decline and eventually disappear. When that happens, the miners will receive a commision, in the form of a transaction fee for transactions in the new block, but they won’t get new bitcoins. When the system gets to that point, the supply of bitcoins will be fixed: no new bitcoins, ever.

Lots of people think that that’s a good thing. After all, inflation sucks, right? This will be a fixed supply of money, whose value can’t be manipulated by politicians.

The catch is that nothing is ever that simple.

First: the fact that new bitcoins will not be issued means that the total supply of bitcoins will decline. People die without giving their passwords to their heirs. Passwords get lost. People forget about bank accounts. All of those things are more mean that bitcoins fall out of circulation. So not only is the supply of bitcoins going to stop increasing, it’s going to start decreasing. In fact, the bitcoin folks are completely open about this:

Because of the law of supply and demand, when fewer bitcoins are available the ones that are left will be in higher demand, and therefore will have a higher value. So, as Bitcoins are lost, the remaining bitcoins will eventually increase in value to compensate. As the value of a bitcoin increases, the number of bitcoins required to purchase an item decreases. This is a deflationary economic model. As the average transaction size reduces, transactions will probably be denominated in sub-units of a bitcoin such as millibitcoins (“Millies”) or microbitcoins (“Mikes”).

Is it really a problem? Maybe. I don’t know enough about economics to have a strong opinion, but it’s certainly enough to be worrying. The argument runs as follows:

When the supply of money is decreasing, it means that there’s less money available for making purchases – which means that the value of the money needs to increase. A bitcoin will need to be able to purchase more today than it did yesterday. And that is a serious problem.

Economies work best when money is kept moving. In an ideal world, money isn’t an asset at all: it’s just a medium. You want people to make products, sell them to other people, and then use the money that they made. If they take their money and hide it in a mattress, there’s going to be less activity in the economy than if they used it. The whole idea of money is just to make it easier to match up producers and consumers; when money is taken out of the system, it means that there’s potential economic activity that can’t happen, because the money to make it happen has been withdrawn from the system.

This is why most governments try to run their economies so that there is a moderate amount of inflation. Inflation means that if you take your money and hide it in your mattress, its value will slowly decrease. It means that withdrawing your money from the system is a losing proposition! So a bit of inflation acts as a motivation to put your money to work producing something.

Deflation, on the other hand, does the opposite. Suppose that today, I’ve got 10 bitcoins and 100 dollars, and they’re worth the same amount of money. I’m going to go buy some bacon. I can spend $10 buying bacon, and keep $90 and 10 bitcoins; or I can spend 1 bitcoin, and key 9 bitcoins and $100. So overall, I’ve got the equivalent of $190 and some bacon.

Next week, the value of bitcoins has risen to $15/bitcoin. If I spent my dollars to buy bacon, then now I’ve got $150 worth of bitcoins, $90 worth of dollars, and some bacon – my total asserts are equal to $240 and some bacon. If I spent my bitcoin, then I’d have $135 worth of bitcoins, $100 worth of dollars, and some bacon – $235. If I used my bitcoin to buy stuff, I lost $5.

That means that I’m strongly motivated to not use my bitcoins. And that’s not a good thing. That kind of deflation is very harmful to an economy – for example, look at Japan during the 1990s and 2000s, and to some extent still today.

The Tech of Bitcoin

Now we can get to the stuff about bitcoin that’s actually interesting. How does it work?

Before I start, one major caveat. I’m deliberately talking about this in vague terms – enough so that you an understand how and why it works, but not enough so that you could do something like implement it. Like anything else involving cryptography: if you’re thinking about implementing your own crpytosystem, don’t!

Cryptography is an area where even seasoned experts can easily make mistakes. A serious crpytosystem is built by a team of professionals, including people whose entire job is do everything in their power to break it. And even then, it’s all to easy to wind up with un-noticed bugs! When it comes to something like bitcoin, an inexperienced cryptographer trying to implement a new agent for the bitcoin network is insane: you’re dealing with money, and you’re risking losing a whole lot of it if you screw up. (That’s basically what appears to have happened to mtgox – and if the reports I’m reading are correct, they managed to lose hundreds of millions of dollars worth of bitcoins.

On to the fun part!

Basically, bitcoin is a protocol. That means that it’s really just a system that defines how to communicate information between a collection of computers. Everything about bitcoin is defined by that protocol.

At the heart of the protocol is the ledger. The ledger is a list of transactions that says, essentially A gave N bitcoins to B. The only way to get a bitcoin is by a transaction: there needs to be a transaction saying that someone or something gave you a bitcoin. Once you have one, you can transfer it to someone else, by adding a new transaction to the ledger.

That’s the basic idea of it. The beauty of bitcoin is that at the core, it’s incredibly simple – it’s just a list of transactions. What makes it interesting is the crpytographic magic that makes it work. In theory, the ledger could be simple text. Each line would contain a sender, a receiver, and a quantity. If you have that, it’s enough to manage transactions. But there are a bunch of problems with a simple ledger approach, which mostly involve making sure that transactions in the ledger are valid and that they can’t be changed.

In addition, there are problems that come about because of the fact that bitcoin wants to be completely decentralized. There is no authority in charge of bitcoin. That means that you need to have a consensus based system. Anyone can join the network of bitcoin managers at any time, and anyone in the network can drop out at any time – but the network as a whole must always have a consensus about what the current ledger is. There’s also the question of where the coins come from!

We’ll start with the question of authentication.

Suppose I own a bitcoin, and I want to use it to buy a loaf of bread from friendly baker, Barbera. I need to give Barbera my bitcoin. To do that, I need to add an entry to the ledger saying “MarkCC gave one bitcoin to Barbera”. The problem is, the ledger entry needs to contain something to prove that I’m really the owner of the bitcoin that I’m trying to transfer! If it doesn’t do that, then my arch-nemesis, Tony the thief, could just add a ledger entry saying “MarkCC gave one bitcoin to Tony”. Only I should be able to add a ledger entry transferring my bitcoin.

This one is simple: it’s a textbook use-case for public key crpytography. In a public key system, you have key-pairs. One member of the key pair is called the public key, and the other is the private key. Anything encrypted with the public key can only be decrypted using the private key; anything encrypted with the private key can only be decrypted with the public key. So you get a key pair, lock the private key away somewhere safe, and give away copies of the public key to anyone who wants it. Then if someone sees a message that can be decoded with your public key, it means that you must have been the one who sent it. No one else could encrypt a message using your private key!

In the bitcoin ledger, the lines are signed. Instead of directly saying “MarkCC gave one bitcoin to Barbera”, they say “One bitcoin was transferred from the owner of MarkCCs cryptokey to the owner of Barbera’s cryptokey”. The ledger entry for that transaction is signed using a signature generated from the ledger entry using MarkCC’s private key. Anyone can verify the validity of the transaction by checking the signature using MarkCC’s public key. Once that’s in the ledger, Barbera now own a bitcoin, and only Barbera (or whoever has access to Barbera’s private key) can do transfer that bitcoin to anyone else.

Now, on to the complicated part! There is no authoratative ledger in bitcoin. There are many copies – thousands of copies! – of the ledger, and they’re all equal. So how can you be sure that a transaction is real? Someone could write a transaction to a ledger, show you the ledger, and then you could find out that in every ledger except the one you looked at, the transaction doesn’t exist! If you there is no single place that you can check to be sure that a transaction is in the ledger, how can you be sure that a transaction is real?

The answer is a combination of a consensus protocol, and a bit of computational cleverness. When you want to add a ledger entry, you broadcast a message to the bitcoin network. Every 10 minutes or so, participants in the bitcoin network take whatever transactions were added to the current ledger, put them into a structure called a block, and perform a very difficult, semi-random computational task using the block. When they find a solution, they sign the block using the solution, and broadcast it to the network.

The first agent in the bitcoin network that completes a solution and broadcasts it to the network “wins” – their block becomes the new basis for the next block.

The blocks form a chain: block one contains some transactions; block 2 contains more transactions that happened after block1 was broadcast; block 3 contains transactions that happened after block 2 was broadcast, and so on.

At any time, the consensus ledger – that is the master, canonical ledger – is the one with the longest verifiable chain. So someone can broadcast a new solution to an old block, but it will be ignored, because there is already a longer chain in the consensus. You can be certain that your transaction can’t be revoked or lost once you see a new block issued that builds on the block containing your transaction.

This all relies on people finding solutions to computationally expensive problems. Why would anyone do it? That’s why this process of computing hashes for the blocks is called mining: because if you’re the first one who finds a solution for a block, then you get to add a transaction giving yourself 25 brand new bitcoins to yourself! Mining – the process of maintaining the ledger – becomes a sort of lottery, where the people doing the mining randomly get bonuses to motivate them to do it.

The computational side of it is clever in its simplicity. The bitcoiners want the problem to be hard enough to make it effectively impossible to cheat. But they also want to be sure that it’s easy enough that they get blocks frequently. If it takes an hour before anyone has a solution for a new block, that’s also a problem: if it takes that long to commit a transaction to the ledger, then people aren’t going to trust bitcoin for fast transactions. They can’t be sure that a bitcoin was transferred to them unless they know that the transaction was committed in an accepted block. But there’s a trick there: people want to get the mining rewards. That means that they’re constantly trying to push the limits of what they can get away with computationally. People started with bunches of PCs, and then discovered that the GPUs on their graphic cards could do it faster, and then started building custom PCs with tons of graphics cards to do bitcoin mining computations in parallel. And of course, there’s always Moore’s law: computers are constantly getting faster. That means that they can’t just pick a particular complexity for the problem and stick with it.

The solution is to make the problem variable. They start with a well known algorithm that’s a very good one-way problem (meaning that it’s relatively easy to compute a result given an input; but very, very hard to figure out the inverse – to get an input that produces a desired result. In slightly more mathematical terms, if y=f(x), then computing y is easy if you know x, but it’s very hard to compute x if you know y.

There are a bunch of well known one-way computations. They just picked one, called SHA-256 computation. Now the clever part: they make SHA-256 computation into a variable complexity problem by picking a threshold T, agreed on by consensus in the bitcoin ledger protocol. The solution for a block is a hashcode for the block plus a bit of extra data which is smaller than T: for a ledger block L, they need to find a value N called a Nonce where \textbf{SHA}(L + N) < T.

Because SHA-256 is a one-way function, there’s no good way to predict what value of N will give them a hashvalue that’s smaller than the threshold – the only way to do it is to just keep guessing new N-values, and hoping that one of them will produce an acceptable result. By reducing the value of T, you can make the problem harder; by increasing the value of T, you can make it easier. The bitcoin protocol specifies a regular interval and an algorithm for selecting a new T.

When a miner solves the problem, they publish the new ledger block to the bitcoin network, with the new ledger section and its (H, N) values. Once a new block is issued, all of the future ledger entries can only get added to the next unsolved block in the ledger.

The reason that this is safe is a matter of computation. You can go back in time, and find an old transaction, remove an entry from it, and recalculate the block. But it takes time, and other people are still moving on, computing new blocks. For your change to be accepted by the bitcoin network, you would need to issue new version of the altered block, plus new versions of any other blocks issued since the one you altered, and you’d have to do it before anyone else could issue a new block. The consensus is the longest block-chain, so issuing blocks that aren’t longer than the longest chain in the network is a waste of time. Because the computation is hard, even being one block behind is enough to make it effectively impossible to be able to change the ledger: to become the new largest chain when you’re just one block behind means you’d need to compute solutions for three blocks before anyone could find a solution for just one more block!

That last bit isn’t so clear when you read it, so let’s work through an example.

  1. Block B is issued
  2. Block C is issued
  3. You want to change block B.
  4. The current longest blockchain is […, B, C].
  5. To replace B with a new block B’, you need to issue a
    longer blockchain that the current consensus of […, B, C].
  6. That means that you need to issue B’, C’, and a new block D’ before anyone else can just issue D.
  7. By falling just one block behind, you need to issue 3 three new blocks before anyone else can issue just one.

And that, my friends, is effectively impossible.

Money and Bitcoins part 1: What is money?

Bitcoin has been in the news a lot lately. I’ve been trying to ignore it, even though lots of people have sent me questions about it.

In the last couple of days, Bitcoin has been in the news even more. Mtgox, one of the major companies associated with bitcoin has gotten into serious trouble. It’s not entirely clear what exactly is going on, but it appears that mtgox lost a massive quantity of bitcoins. As a result, they’re almost certainly going bankrupt, and a whole lot of people are about to lose/have already lost a huge amount of money. This burst of news about mtgox has turned the trickle of questions into a flood.

In order to shut you all up, I’m going to try to answer the questions. I started off working on one post, but it’s gotten out of hand – over 5000 words, and not finished yet! Instead of just posting that monstrosity as one gigantic mega-post, I’m splitting it up into several posts.

To understand bitcoin, first you need to understand what money really is. That’s going to be the focus of this post: what is money? And how is bitcoin both similar to, and different from, other things that we call money?

Money is, ultimately, an elaborate shell game. Currency is a bunch of worthless tokens that we all agree to pretend are worth something, so that we have some way of exchanging valuable things in a reasonable, fair, and easy way.

To understand that, let’s start with a really simple scenario. We’ve got two farmers: Will grows wheat and turns it into flour, and Pam raises pigs and chickens. Will has lots of wheat, so he can grind it into flour and make bread – but he’d also like to be able to have some bacon with his toast for breakfast. Pam, meanwhile, has all the bacon she can eat (lucky lady!), but she’d like to have some bread, so that she can make a BLT for lunch.

There’s an easy solution. Will goes over to Pam’s place, and offers her some bread in exchange for some bacon. Between them, they figure out how much bread is worth how much bacon, and make the trade. And hurrah! both are happy. This is simple barter – no money needed.

Things become more complicated when we add more people to the story. We can add Mary the miller who gets wheat from Will and grinds it into flour, and Bob the Baker who gets flour from Mary and bakes it into bread. Now the process of making bread has gotten more elaborate: the person who grows the wheat is no longer the person who sells the final product to the pig-farmer!

Now if Will wants his bacon, it’s harder. He has wheat, but Pam doesn’t want wheat, she wants bread! As long as Mary and Bob both like bacon, this can be made to work. Bob can go to Pam, and trade bread for her bacon. Then he can take some of the bacon he got from Pam, and give it to Mary in exchange for flour. Mary can take part of the bacon he got from Bob, and give it to Will in exchange for wheat. Then Mary, Bob, and Will all have their bacon, and Pam has her bread, and everyone is happy. We’re still in the land of barter, without any money, but it’s getting difficult, because everything is stuck going through intermediaries.

This works, as long as everyone in the chain is happy with bread and bacon. But as things get more complicated, and you get more people involved, you get situations where you have people who want something that they can’t easily trade for what they have. We could add Phil the plowmaker to our scenario. Phil makes the best plows you’ve ever seen. If Phil wants to get some wheat, he’s in great shape: Will would love to get one of Phil’s plows to plow his fields. But Phil doesn’t want to deal with freshly harvested wheat – he wants bread and Bacon! That means he’s got a problem: Pam has no use for a plow, and neither does Bob. In order to get bread and bacon in exchange for his plows, he somehow needs to get something that Pam and Bob want. He’s not part of the chain from Will to Pam.

If you’re sticking with barter, then poor Phil has a very complicated problem. He needs to go to Pam, and figure out what she wants in exchange for bacon – she could say she needs a new butcher’s knife, and she’ll trade that for bacon. So now Phil needs to find someone who’ll trade a plow for a butcher’s knife. He can go to Kevin the knifemaker, and see if he’ll take a plow. If Kevin doesn’t want a plow, then he needs to find out what Kevin wants, and see if he can find someone who’ll trade a plow for that. He’s stuck running around, trying to find the sequence of exchanges that give him what he wants. If he can’t find a chain, he’s stuck, and he’ll just have to give up and not have any bacon, even though he’s got beautiful plows.

The solution to this mess is to create create a medium of exchange. You need to create something which has a symbolic value, which everyone is willing to trade for his or her stuff. Will his wheat for little green pieces of paper. Pam exchanges her bacon for little green pieces of paper. Phil exchanges his plows for little green pieces of paper. The little green pieces of paper are completely worthless themselves – they’re just green paper! But everyone recognizes that other people want them. Because the other people want them, they know that if they take some, they’ll be able to use them to get stuff from other people.

For Phil to get his bacon, there’s still got to be some chain: he exchanges his plow for some green paper; he goes to Pam, and gives her green paper in exchange for the bacon. Pam uses that green paper to get stuff she wants. The chain is still there: ultimately, what Phil did by giving Pam the green paper is give her something that she wanted. But instead of needing to find a concrete product or service that she wanted, he was able to give her symbolic tokens of value, which should could then exchange for whatever she wanted.

That’s what money is. It’s a symbolic medium, where we all agree that the symbolic tokens are worth something. The tokens only have value because we believe that other people will also take them in exchange for their stuff/their work. And because we believe that, we’ll take them in exchange for our work. When we do this, we’re simultaneously being perfectly rational and completely delusional. We’re trading the products of our work for something utterly worthless, because we’ve all agreed to pretend that the worthless stuff isn’t worthless.

Of course, when you’ve got valuable stuff moving around, governments get involved. In fact, governments exist largely to help make it safe for valuable stuff to move around! Money ends up getting tied to governments, because the governments exist in large part specifically to provide an enforceable legal system to manage the exchange of money.

In todays world, money is just a set of tokens. If you go back a hundred years, all over the world, people had what they believed what a different idea about money. Money was backed by gold. If you had a dollar, it wasn’t just a green piece of paper. It was a lightweight representation of about 1.67 grams of gold. You could go to the government with your dollars, and exchange them for that quantity of gold. According to many people, derogatorily called goldbugs, this was fundamentally different from todays money, which they call fiat currency, because it was backed by a tangible valuable asset.

The problem with that argument is: why is gold any more valuable than green paper?

Gold is valuable because it is a pretty yellow metal, incredibly malleable and easy to work with, non-corrosive, and useful for a wide variety of artistic and industrial purposes. It’s also relatively rare. In other words: it’s valuable because people want it. It’s not valuable because it’s tangible! No one would say that a currency backed by rocks is intrinsically more valuable than a so-called fiat currency, even though rocks are tangible. People don’t want a lot of rocks, so rocks aren’t worth much.

Now we can finally get around to just what bitcoin is.

Bitcoin is a currency which isn’t backed by any government. In fact, it’s not backed by anyone. It’s a fundamentally decentralized system of currency. There’s no central authority behind it. Instead, it works based on an interesting protocol of computation over communication networks. Everything about it is distributed all over the world. You could pick any individual computer or group of computers involved in bitcoin, blow them to bits, and bitcoin would be unaffected, because there would still be other people in the bitcoin network. It’s all driven by distributed computation. A bitcoin is an utterly intangible asset. There is no coin in a bitcoin.

I’ll go into more detail in my next post. But the basic idea of bitcoin is really, really simple. There are a bunch of computers on the network that are keeping track of bitcoin transactions. Between them, they maintain a ledger, which consists of a list of transactions. Each transaction says, effectively, “X gave N bitcoins to Y”. The only way that you can own a bitcoin is if, somewhere in the ledger, there is a transaction saying that someone gave a bitcoin to you. There is no coin. There is no identifying object that represents a bitcoin. There’s not even anything like the serial number printed on a banknote. There is no bitcoin: there is just a transaction recordin a ledger saying that someone gave you a bitcoin. There is absolutely no notion of traceability associated with a bitcoin: you can’t take a bitcoin that someone gave you, and ask where it came from. Bitcoins have no identity.

The point of it is specifically that intangibility. Bitcoin exists largely as a way to move valuable stuff around without the involvement of governments. Bitcoin is, really, just a way of making a computationally safe exchange of value, without transferring anything tangible, and without any single central authority in control of it. You know that someone gave you money, because there are thousands of computers around the world that have agreed on the fact that someone gave you money.

Obviously, there needs to be some technical muscle behind it to make people trust in those unknown entities managing the ledgers. We’ll talk about that in the next post.

What’s amusing about bitcoin is that in many ways, it’s no different from any other kind of money: it’s absolutely worthless, except when people agree to pretend that it isn’t. And yet, in bitcoin circles, you’ll constantly see people talking disdainfully about “fiat money”. But bitcoin is the ultimate in fiat: there’s nothing there except the ledger, and the only reason that anyone thinks it’s worth anything is because they’ve all agreed to pretend that it’s worth something.

Topological Spaces: Defining Shapes by Closeness

When people talk about what a topological space is, you’ll constantly hear one refrain: it’s just a set with structure!

I’m not a fan of that saying. It’s true, but it just doesn’t feel right to me. What makes a set into a topological space is a relationship between its members. That relationship – closeness – is defined by a structure in the set, but the structure isn’t the point; it’s just a building block that allows us to define the closeness relations.

The way that you define a topological space formally is:

A topological space is a pair (X, T, N), where X is a set of objects, called points; T is a set of subsets of X; and N is a function from elements of X to elements of T (called the neighborhoods of X where the following conditions hold:

  1. Neighborhoods basis: \forall A \in N(p): p \in A: every neighborhood of a point must include that point.
  2. Neigborhood supersets: \forall A \in N(p): \forall B \in X: B \supset A \Rightarrow B \in N(p). If B is a superset of a neighborhood of a point, then B must also be a neighborhood of that point.
  3. Neighborhood intersections: \forall A, B \in N(p): A \cap B \in N(p): the intersection of any two neighborhoods of a point is a neighborhood of that point.
  4. Neighborhood relations: \forall A \in N(x): \exists B \in N(x): \forall b \in B: A \in N(b). If A is a neighborhood of a point p, then there’s another neighborhood B of p, where A is also a neighborhood of every point in B.

The collection of sets T is called a topology on T, and the neighborhood relation is called a neighborhood topology of T.

Like many formal definitions, this is both very precise, and not particularly informative. What the heck does it mean?

In the previous topology post, I talked about metric spaces. Every metric space is a topological space (but not vice-versa), and we can use that to help explain how the set-of-sets T defines a meaningful definition of closeness for a topological space.

In the metric space, we define open balls around each point in the space. Each one forms an open set around the point. For any point p in the metric space, there are a sequence of ever-larger open-balls of points around p.

That sequence of open balls defines the closeness relation in the metric space:

  • a point q is closer to p than it r is if q is in one of the open balls around p, which r isn’t. (In a metric space, that’s equivalent to saying that the distance d(q, p) < d(q, r).)
  • two points q and r are equally close to p if there is no open ball around p where q is included but r isn’t, or where r is included but p isn’t. (In a metric space, that’s equivalent to saying that d(q, p) = d(r, p).)

In a topological space, we don’t neccessarily have a distance metric to define open balls. But the neighborhoods of each point p define the closeness relation in the same way as the open-balls in a metric space!:

  • The neighborhoods N(p) of a point are equivalent to the open balls around p in a metric space.
  • The open sets of the topology (the members of T) are equivalent to the open sets of the metric space.
  • The complements of the members of T are equivalent to the closed sets of the metric space.

One of the most important ideas in topology is the notion of continuity. Some people would say that it’s the fundamental abstraction of topology, because the whole idea of the equivalence between two shapes is that there is a continuous transformation between them. And now that we know what a topological space is, we can define continuity.

Continuity isn’t a property of a topological space, but rather a property of a function between two topological spaces: if (T, X_T, N_T) and (U, X_U, N_U) are both topological spaces, then a function f: X \rightarrow Y is continuous if and only if for every open set C \in X_U, the inverse image of f on C is an open set in X_T. (The inverse image of f is the set of points x \in X_T: f(x) \in C).

Once again, we’re stuck with a very precise definition that’s really hard to make any sense out of. I mean really, the inverse image of the function on an open set is an open set? What the heck does that mean?

What it’s really capturing is that there are no gaps in mapping from one space to the other. If there was a gap, it would create a boundary – there would be a hard edge in the mapping, and so the inverse image would show that as a closed set. Think of the metric spaces idea of open sets. Imagine an open set with a cube cut out of the middle. It’s definitely not continuous. If you took a function on that open set, and its inverse image was the set with the cube cut out, then the function is not smoothly mapping from the open set to the other topological space. It’s only mapping part of the open set, leaving a ugly, hard-edged gap.

In topology, we say that two shapes are equivalent if and only if they can be continuously transformed into each other. In intuitive terms, that continuous transformation means that you can do the transformation without tearing holes are gluing edges. That gives us a clue about how to understand this definition. What the definition means is really saying is pretty much that there’s no gluing or tearing: it says that if a set in the target is an open set, the set of everything that mapped to it is also an open set. That, in turn, means that if f(x) and f(y) are close together in U, then x and y must have been close together in T: so the structure of neighborhood relations is preserved by the function’s mapping.

One continuous map from a topological space isn’t enough for equivalence. It’s possible to create a continuous mapping from one topological space to another when they’re not the same – for example, you could map part of the topology T onto U. As long as for that part, it’s got the continuity properties, that’s fine. For two topologies to be equivalent, there must be a homeomorphism between the sets. That is, a function f such that:

  • f is one-to-one, total, and onto
  • Both f and f^{-1} are continuous.

As a quick aside: here’s one of the places where you can see the roots of category theory in algebraic topology. There’s a very natural category of topological spaces. The objects in the category are, obviously, the topological spaces. The arrows are continuous functions between the spaces. And a homeomorphism (homo-arrow) in the category is a homeomorphism between the objects.

How Computers Really Work: Math via Boolean Logic

As I try to write more posts about ARM programming, I’m finding that I keep getting sidetracked by background stuff. Hopefully it’s interesting; if not, let me know in the comments!

Today’s sidetrack: how the heck does a computer actually do math?

As I said in my post about binary the other day, at a deep level, computers don’t actually work with numbers, not even zeros and ones. Computers are electronic devices, and what they really do is worth with electrical signals. In computer hardware, there are really just a few fundamental operations that it can perform with those signals. By taking those basic operations, and combining them in the right way, we can do math. How that works is very mysterious to most people. Obviously, I can’t describe all of how a computer works in a blog post. But what I’m going to do is explain the basic parts (the gates that are used to build the hardware), and how to combine them to implement a piece of hardware that does integer addition.

In computer hardware, there are four fundamental components that we can use to build operations, and they’re the basic operations of boolean logic: And, Or, Exclusive Or, and Not.

  • AND: Boolean AND takes two inputs, and outputs a 1 if both inputs are one.andgate
  • OR: Boolean OR takes two inputs, and outputs a 1 if at least one of its inputs are 1. orgate
  • XOR: Boolean Exclusive-OR takes two inputs, and outputs a 1 if one, but not
    both, of its inputs are one.xor
  • NOT: Boolean NOT (also called negation) takes one input, and outputs a 1 if its input was 0. notgate

In talking about these kinds of operations, we frequently use truth tables to explain them. A truth table just lists all of the possible input combinations, and tells you what the output for them is. For example, here’s a truth table showing you AND, OR, and XOR:

A B A∧B A∨B A⊕B
0 0 0 0 0
0 1 0 1 1
1 0 0 1 1
1 1 1 1 0

In terms of hardware, you can actually implement all of these using one primitive gate type, the NAND (negated AND) gate. For example, in the diagram below, you can see how to build a NOT gate using a NAND gate, and then using using that NAND-based NOT gate, to build an AND gate using two NANDs.

building-gates

In the hardware, those gates are all that we have to work with. To build arithmetic, we need to figure out how to combine these primitive pieces to build up something that produces the right results. First, we need to be able to describe what the correct results are. We’ll start by figuring out what it means to add two 1-bit numbers. We’ll have two one-bit inputs. We need two outputs – because two one-bit numbers can add up to a two-bit sum. We’ll call the outputs S and C (sum and carry – S is the low-order bit, which would be the sum if we could only have a one-bit result, and C is the value of the second bit.)

We can describe addition with a truth table:

A B S C
0 0 0 0
0 1 1 0
1 0 1 0
1 1 0 1

We can read those rows as “If A is 0 and B is 0, then A+B is 0 with a carry of 0”.

If we compare that table to the table up above showing the basic operations, then we can see that S is exactly the same as XOR, and C is exactly the same as AND. So we can build a one-bit adder out of gates:

half-adder

This little one-bit adder is commonly called a half-adder, because to really implement a useful add operation, we need two of them for each bit.

If you think about what you’d need to do to add together two two-bit numbers, you couldn’t just use a half-adder for each bit. The carry from the first bit needs to get added to the second bit. You need to plumb the carry result from the first bit into a third input for the second bit. To get that second bit to work properly, you need to add together three one-bit values: the two inputs for the high-order bit, and the carry from the low-order bit. Generalizing, if you want to add together N bits,
when you’re computing the sum of the Mth bit, you need to include the carry from the M-1th bit.

To include the carry, we need to combine half-adders into a full adder, which looks like the following:

full-adder

Using that, we can create an N-bit adder, by chaining the carry output from bit M-1 to the carry input of bit M. This creates the simplest adder circuit, which is called a ripple-carry adder.

ripple-carry

Ripple-carry adders are the simplest way of building an integer addition operation in hardware. They’ve got one very big downside: to add the Nth bit, you need to have the result of adding the N-1th bit. That means that the more bits you add, the slower the ripple-carry adder gets. You have to way for the signals to propagate all the way through the circuit from the low bits to the high bits.

So ripple-carry isn’t really used in hardware anymore. There are a bunch of more complicated ways of building the adder that get rid of that propagation delay. It comes down to a very common tradeoff for engineers: performance versus complexity. But at the end of the day, the concept is still basically the same: it’s still a chain of multiple simple adders, that work the way I described here.

Closeness without distance

In my introduction, I said that topology is fundamentally built on the notion of closeness. Someone very quickly responded on Twitter, because they thought that was wrong. It wasn’t wrong, but it’s easy to see where the confusion came from. Like so much math, Topology is built on a very precise logical and set-theoretic formalism. Mathematicians build those formalisms not because they’re annoying people who want to be mysterious and incomprehensible, but because the precision of those formalisms is critically important.

When you hear a statement like “point A is close to point B in a space S”, you have an intuitive idea of what the word “close” means. But when you try to expand that to math, it could potentially mean several different things. The easiest meaning would be: the distance between A and B is small.

Mathematicians have used that definition for a lot of interesting work. It’s got one limitation though: For it to work, you need to be able to define “distance” in the space. How do you do that? In conventional Euclidean space, we have an easy definition. Describe the position of the two points using Cartesian coordinates: A=(x1, y1), B = (x2, y2). The distance between A and B is:

d(A, B) = \sqrt{(x_2-x_1)^2 + (y_2-y_1)^2}

But we’re moving towards the world of topology. We can’t count on our spaces to be Euclidean. In fact, the whole point of topology is, in some sense, to figure out what happens when you have different spatial structures – that is, structures other than the familiar Euclidean one! We need to be able to talk about distances in some more general way. To do that, we’ll create a new kind of space – a space with an associated distance metric. This new space is called a metric space.

A distance metric is conceptually simple. It’s just a special kind of function, from pairs of points in a space to a real number. To be a distance metric, it needs a couple of properties. Suppose that the set of points in the space is S. Then a function d: S \times S \rightarrow \mathbf{R} is a distance metric if it satisfies the following requirements:

  1. Identity: \forall s_i, s_j \in S: d(s_i, s_j) = 0 \Leftrightarrow s_i = s_j
  2. Symmetry:\forall s_i, s_j \in S: d(s_i, s_j) = d(s_j, s_i)
  3. Triangle Inequality: \forall s_i, s_j, s_k \in S: d(s_i, s_k) \le d(s_i, s_j) + d(s_j, s_k)
  4. Non-negativity: \forall s_i, s_j \in S: d(s_i, s_j) \ge 0

A metric space is just the pair (S,d) of a set S, and a metric function d over the set. For example:

  1. A cartesian plane is a metric space whose metric function is the euclidean distance: d((a_x,a_y), (b_x,b_y)) = \sqrt{(a_x-b_x)^2 + (a_y-b-y)^2}.
  2. A checkerboard is a metric space with the number of kings moves as the metric function.
  3. The Manhattan street grid is a metric space where the distance function between two intersections is the sum of the number of horizontal blocks and the number of vertical blocks between them.

All of this is the mathematical work necessary to take one intuitive notion of closeness – the idea of “two points are close if there’s a small distance between them” and turn it into something formal, general, and unambiguous. But we still haven’t gotten to what closeness means in topology! It’s not based on any idea of distance. There are many topological spaces which aren’t metric spaces – that is, there’s no way to define a metric function!

Fortunately, metric spaces give us a good starting point. In topological spaces, closeness is defined in terms of neighborhoods and open balls.

Take a metric space, (S, d), and a point p \in S. An open ball B(p, r) (that is, a ball of radius r around the point p) is the set of points x \in S | d(p, x) < r.

Given a large enough set of points, you can create an infinite series of concentric open spheres: B(p, \epsilon), B(p, 2\epsilon), B(p, 3\epsilon), and so on. Once you’ve got that series of ever-smaller and ever-larger open balls around a point p, you’ve got another notion of closeness. A is closer to p than B is if A is in a smaller open ball around p.

This is the heart of topology. You can define something like an open-ball on a set without a metric. As long as you can create a consistent sequence of open balls, where each larger ball is a strict superset of all of the smaller ones, you can define closeness without any notion of measurable distance!

In the next post, we’ll use this notion of a distance-free sense of closeness to define what a topology actually is.

Another pass at Topology!

A long time ago – in 2006! – I wrote a ton of blog posts about topology. In the course of trying to fix up some of the import glitches from migrating this blog to its new home, I ended up looking at a bunch of them. And… well… those were written in my early days of blogging, and looking back at those posts now… well, let’s just say that my writing has come a long way with 8 years of practice! I was thinking, “I could do a much better job of writing about that now!”

So that’s what I’m going to do. This isn’t going to be reposts,
but rather completely rewrites.

Topology is typical of one of the methods of math that I love: abstraction. What mathematicians do is pick some fundamental concept, focus tightly on it, discarding everything else. In topology, you want to understand shapes. Starting with the basic idea of shape, topology lets us understand shapes, distortions, dimensions, continuity, and more.

The starting point in topology is closeness. You can define what a shape is by describing which points are close to which other points. Two shapes are equivalent if they can be built using the same closeness relationships. That means that if you can take one shape, and pull, squash, and twist it into another shape – as long as you don’t have to either break closeness relations (by tearing or punching holes in it), or add new closeness relations (by gluing edges together) – the two shapes are really the same thing.

This leads to a very, very bad math joke.

How do you recognize topologists at breakfast?

They’re the people who can’t tell their donut from their coffee.

The easiest way to ruin a joke is to over-explain it. Happily, when the joke is this bad, it’s already ruined, so this isn’t my fault. See, in topology, a coffee mug is the same shape as a donut. They’re each three dimensional shapes with one hole. If you had a donut made of an infinitely stretchable material, you could shape it into a coffee mug without every tearing it, ripping a hole, or gluing two edges together.

Anyway – starting tomorrow, I’ll be posting a new version of that old topology series.

Representing Numbers in Binary

Before we can really start writing interesting programs with our ARM, even simple ones, we need to understand a bit about how how a computer actually works with numbers.

When people talk about computers, they frequently say something like “It’s all zeros and ones”. That’s wrong, in a couple of different ways.

First, when you look at the actual computer hardware, there are no zeros and ones. There are two kinds of signals, represented as two different voltage levels. You can call them “high” and “low”, or “A” and “B”, or “+” and “-“. It doesn’t matter: it’s just two signals. (In fact, one fascinating thing to realize is that there’s a fundamental symmetry in those signals: you can swap the choice of which signal is 0 and 1, and if you just swap the and gates and the or gates, everything you won’t be able to tell the difference! So one ARM processor could use one signal for 1, and a different ARM could use that signal as 0, and you wouldn’t actually be able to tell. In fact, everyone does it the same way, because chip design and manufacture are standardized. But mathematically, it’s possible. We’ll talk about that duality/symmetry in another post.)

Second, the computer really doesn’t work with numbers at all. Computer hardware is all about binary logic. Even if you abstract from different voltages to the basic pairs of values, the computer still doesn’t understand numbers. You’ve got bits, but to get from bits to numbers, you need to decide on a meaning for the bits two possible values, and you need to decide how to put them together.

That’s what we’re really going to focus on in this post. Once you’ve decided to call on of the two primitive values 1, and the other one 0, you need to decide how to combine multiple zeros and ones to make a number.

It might seem silly, because isn’t it obvious that it should use binary? But the choice isn’t so clear. There have been a lot of different numeric representations. For one example, many of IBM’s mainframe computers used (and continue to use) something called binary coded decimal (BCD). It’s not just different for the sake of being different: in fact, for financial applications, BCD really does have some major advantages! Even if you have decided to use simple binary, it’s not that simple. Positive integers are easy. But how do you handle negative numbers? How do you handle things that aren’t integers?

We’re going to start with the simplest case: unsigned integers. (We’ll worry about fractions, decimals, and floating point in another post.) Like pretty much all modern computers, the ARM uses the basic, mathematical binary exponential representation. This works basically the same as our usual base-10 numbers. We look at the digits from right to left. The leftmost digit (also called the least significant digit counts ones; the next digit counts 10s; the next counts 100s, and soon. So in base 10, the number 3256 means 6*100 plus 5*101 plus 2*102 plus 3*103.

binary-bits

In binary, we do exactly the same thing, only we do it with powers of 2 instead of powers of 10. So in binary the number 1001101 is 1 + 0*21 + 1*22 + 1*23 + 0*24 + 0*25 + 1*26 =1 + 4 + 8 + 64 = 77.

Arithmetic on binary is easy enough – you do the same thing you would with decimal, but in subtraction, borrows give you 2, not 10. As a quick example, let’s look at 7 + 13, which is 111 + 1101.

  1. We start at the right edge. We have 1 + 1 = 10 – so the first digit of the sum is 0, and we carry 1.
  2. Next we have 1 + 0 + 1(carry) = 10 – so the second digit is again 0, and we carry 1. Our sum so far is 00.
  3. Now we’re on to the third digit. 1 + 1 + 1(carry) = 11. So the third digit is 1, and we carry one. Our sum so far is 100.
  4. Now the fourth digit, 1 + 0 + 1(carry), so we get 10. So the sum is 10100, or 20.

We’ll ignore subtraction for a moment, because as we’ll see in a little while, in computer hardware, we don’t actually need to have subtraction as a primitive. Addition is the core of what computer arithmetic, and we’ll use addition to implement subtraction by taking the negative of the number being subtracted. (That is, to compute A-B, we’ll do A+(-B).)

Positive integers with addition aren’t enough to do most stuff we want to do on a computer. Even if we’re never going to write a program that manipulates numbers, we absolutely need to be able to subtract. To a computer, the only way to compare values is to subtract one from another! So we need to be able to do negatives and subtractions. How can we represent negative numbers in binary?

There are three basic choices, called sign-bit/sign-magnitude, one’s-complement and two’s complement. I’ve put an example of the three representing the number 75 in the figure below.

different-binary

In sign-bit representation, what you do is take the leftmost bit (also called the high-order bit), and use it to indicate sign (for obvious reasons, you call it the sign bit.). If the sign bit is 0, then the number is positive; if it’s 1, then the number is negative. For example, 01010 would be +10; 11010 would be -10.

For a human being, sign-bit looks great. But in practice, sign-bit was never used much, because while it looks simple to a human, it’s quite complicated to build in computer hardware. IBM did use it in some early machines, but even they gave up on it.

Next is one’s complement. In 1’s complement, high order bit is still a sign bit. But to convert a number from positive to negative, you don’t just change the sign bit – you invert every single bit in the number. You can still tell whether a number is positive or negative by its sign bit, but the rest of the bits are also different. +10 in one’s complement binary is 01010; -10 is 10101.

Arithmetic in 1s complement is a bit weird. You can almost just add a negative number to a positive number as if they were both positive. Almost, but not quite.

For example, let’s try 6 + -6. 6 is 0110, and -6 is 1001. Add them up: 1111. In twos complement, that’s -0. And there’s one of the weird issues about one’s complement: it’s got two distinct values for 0 – +0 and -0. Since they’re both just 0, we treat them as equal, and it’s not really a problem.

How about 6 + -8? 6 is 00110 (we need 5 bits to handle 8), and -8 is 10111. Add
them up, and you get 11101 – which is -2, the correct answer.

Now, what about 8 + -6? 8 is 01000, and -6 is 11001. Add them up, and you
get 00001, with a carry of 1. So 8 + -6 = 1? That’s wrong! In one’s complement,
there are a bunch of places where simple binary addition will be off by one.
So you need to work out the algorithm for where it’s off-by-one and where it’s not, and you need to build more complicated hardware to incorporate it. That’s not attractive.

Still, one’s complement has been used a lot. In particular, one of the first computers I got to use was an old, beaten-up PDP-1, which used 1’s complement numbers.

Finally, we get to the representation that’s used in pretty much all modern computers: 2’s complement!

Once again, 2’s complement uses a sign bit. But instead of flipping all of the bits, you do something different. In 2’s complement, you need to know how many bits you’re using. If you’re doing an N-bit 2’s complement binary number, then the number -x is represented by 2N-x.

So if we’re doing 6 bits, and we wanted to represent -5, then we’d take 26-5, or 64-5=59. In binary, that’s 111011.

The really beautiful thing about 2s complement is that it’s pretty much the same thing as a trucated 2-adic integer – which means that arithmetic just works. If you’re adding two numbers, it doesn’t matter whether they’re signed numbers or not – it works.

It’s also really easy to implement negation. You don’t have to do that whole “subtract from 2^N” thing. In 2s complement, -N is 1+(ones_complement(N)). That’s super-easy to implement in hardware, and it’s also easy to understand and do for a human: flip the bits and add one!

Two’s complement is, in my opinion, a clear winner in integer representation, and the world of computer hardware maker agrees – everyone now uses 2’s complement for integers.

Bayes Theorem

I’ve been meaning to get back to some of the probability stuff. We’re currently recovering from a major snow/ice storm, and I’m snowed/iced in, so this is a good time!

Today, we’ll talk about what is, according to many people, the most important rule in all of probability: Bayes theorem. It’s also, in my experience, the single most abused rule in all of mathematics. Nothing else has been used so poorly, by so many people, to support sloppy, dumb arguments. After we talk about what the rule is, and what it means, we’ll move on to talk about how it gets abused.

In a pure mathematical sense, Bayes theorem is simple. The interpretation of it, and what it means gets pretty hairy. Suppose that you’ve got two related events, A and B. You know the probability of A occurring is P(A). You know the probability of B occurring is P(B). And you know that if A has already occurred, what the probability of B occurring is. (We write that P(B | A), which you can ready as “the probability of B given A”.) What you’d like to know is, suppose that I know that B occurred. What’s the probability that A also occurred? (What is P(A | B)?)

Bayes theorem says:

P(A|B) = \frac{P(B|A) P(A)}{P(B)}

Let’s be concrete. I go to work, and walk into my office in the morning, and get into the elevator with one other person that I work with.What is the probability that it’s a man?

Without knowing anything about the people that I work with, a reasonable guess would be 50% – the population is pretty close to evenly divided between the genders.

But I’m an engineer, and one of the very unfortunate facts about my job is that the gender pool of engineers is very skewed. Let’s say that it’s 80% men. (In reality, that’s probably actually pretty low.)

Let’s say that about 1/3 of the office is engineering. So the odds that someone I bump into will be an engineer is about 50%.

I can do a couple of things with that information. I could ask, suppose that I walked into the elevator with a woman. What’s the probability that she’s an engineer?

To answer that, I’ll use Bayes law. We’ll say that P(A) is the probability that a random person is a woman- 1/2. P(B) is the probability that a random person is an engineer – 1/3. If I know that a given person is an engineer, the probability of that person being a woman is P(B | A), or 1/5. So what’s the probability of my random female coworker being an engineer (P(A | B))?

  • P(\text{woman}) = 1/2
  • P(\text{eng}) = 1/3
  • P(\text{woman} | \text{eng}) = 1/5
  • P(\text{eng} | \text{woman}) = \frac{P(\text{woman}|\text{eng})P(\text{eng})}{P(\text{woman})} = \frac{(1/5)(1/3)}{1/2} = \frac{2}{15}

See? That was easy, wasn’t it?

Now, what’s it actually mean? If you look at it this way, it doesn’t seem to be such a big deal. Sure, it’s a way of combining probabilities in another situation, but so what? Why’s it any more important than any other?

Because it’s the mathematical method for how to incorporate new knowledge into your expectations. What we did above was start with one understanding of the thing we were trying to predict. Knowing nothing but the typical distribution of genders in the general population, we made a guess about a 50% probability of encountering a woman. But then we added in new information. We knew the population of engineers, and the fact that the gender ration was skewed in engineering – and we incorporated that new information into our prediction.

That answer comes from interpretations. One of the classic interpretations of probability theory is the Bayesian interpretation – named Bayesian specifically because of how it interprets this rule! The Bayesian interpretation says that a statement about probability is really a statement about the state of our knowledge. If I say that the probability of flipping heads on a coin is 1/2, what I’m saying under the Bayesian interpretation is that my certainty that I’ll flip heads is just 1/2.

In that kind of knowledge-based interpretation, there is no intrinsic probability of any event. There is just our degree of certainty about whether the event will occur. Given new information, our degree of certainty can change. Bayes theorem tells us, given new information, exactly how we should change our interpretation.

To explain the bayesian interpretation, we’ll add a couple of terms.

Hypothesis
The hypothesis is the thing whose degree of certainty we’re trying to measure. In the formulation of Bayes law up above, we call it A; here, we’ll call it H.
Prior
The prior, P(H), is the degree of certainty about the hypothesis given no other information.
Evidence
The evidence is the new piece of information that we’re trying to add to our measurement of certainty. Above, we called it B, but here, we’ll call it E.
Likelihood
The likelihood P(E | H) of a piece of evidence is our degree of certainty that a specific piece of evidence would be found if the hypothesis is true.
Model Evidence
The model evidence is P(E), and it’s a bit confusing. It’s the analytic likelihood of any piece of evidence occurring. If you’re considering a set of possible hypotheses using Bayes rule, P(E) will be the same for all of them, but P(E | H) will be the specific likelihood of finding that particular piece of evidence under the hypothesis.
Posterior
The posterior, P(H|E), is the degree of certainty that we will have about A if we add new knowledge, B.
Support
Support is the change in our certainty created by the addition of our new evidence. The support is \frac{P(E|H)}{P(E)}.

So Bayes theorem is a formal statement of how, given evidence, we can modify our certainty about the truth of a particular statement. The classical textbook statement of it is the following. (I took this specific formulation from wikipedia, but any textbook will have nearly the same sentence.)

The posterior probability of a hypothesis is determined by a combination of the inherent likeliness of a hypothesis (the prior) and the compatibility of the observed evidence with the hypothesis (the likelihood).

Or, in mathematical terms, P(H | E) = \frac{P(E | H)}{P(E)} \times P(H) – or exactly what we wrote for Bayes theorem up above.

Why is this abused so badly? Because under a naive, stupid
understanding of Bayes rule, you can essentially randomly estimate the probability of anything. After all, Bayes says that probability is just the combination of our certainties about some collection of facts. So if I can line up some set of facts, along with an estimate of the individual probabilities of those facts, then I can combine those probabilities, and come up with an estimate of the probability of anything! And if I don’t know the probability of an event occurring at al, then the state of my initial knowledge is really simple: it’s always 1/2 – 1/2 is always the starting point given absolutely no other knowledge.

That leads to rubbish like this proof that there are no extra-terrestial intelligences, or this or this purported proof of the existence of God.

All of these arguments fail in the same way. They don’t really use Bayes theorem. The quality of the priors – all of the priors, including the priors used to come up with measures of the likelihoods of the evidences – are crucial. They don’t bother with that. They just make up priors, and combine them without good likelihoods.