Category Archives: Good Math

Infinite and Non-Repeating Does Not Mean Unstructured

So, I got in to work this morning, and saw a tweet with the following image:

pi

Pi is an infinite, non-repeating decimal – meaning that every possible number combination exists somewhere in pi. Converted into ASCII text, somewhere in that string of digits is the name of every person you will ever love, the date, time, and manner of your death, and the answers to all the great questions of the universe. Converted into a bitmap, somewhere in that infinite string of digits is a pixel-perfect representation of the first thing you saw on this earth, the last thing you will see before your life leaves you, and all the moments, momentous and mundane, that will occur between those points.

All information that has ever existed or will ever exist, the DNA of every being in the universe.

EVERYTHING: all contaned in the ratio of a circumference and a diameter.

Things like this, that abuse misunderstandings of math in the service of pseudo-mystical rubbish, annoy the crap out of me.

Before I go into detail, let me start with one simple fact: No one knows whether or not π contains every possible finite-length sequence of digits. There is no proof that it does, and there is no proof that it does not. We don’t know. At the moment, no one does. If someone tells you for certain that it does, they’re bullshitting. If someone tell you for certain that it doesn’t,
they’re also bullshitting.

But that’s not really important. What bothers me about this is that it abuses a common misunderstanding of infinity. π is an irrational number. So is e. So are the square roots of most integers. In fact, so are most integral roots of most integers – cube roots, fourth roots, fifth roots, etc. All of these numbers are irrational.

What it means to be irrational is simple, and it can be stated in two different ways:

  1. An irrational number is a number that cannot be written as a ratio (fraction) of two finite integers.
  2. An irrational number is a number whose precise representation in decimal notation is an infinitely long non-repeating sequence of digits.

There are many infinitely long sequences of digits. Some will eventually include every finite sequence of digits; some will not.

For a simple example of a sequence that will, eventually, contain every possible sequence of digits: 0.010203040506070809010011012013014015016…. That is, take the sequence of natural numbers, and write them down after the decimal point with a 0 between them. This will, eventually, contain every possible natural number as a substring – and every finite string of digits is the representation of a natural number.

For a simple example of a sequence that will not contain every possible sequence of digits, consider 0.01011011101111011111… That is, the sequence of natural numbers written in unary form, separated by 0s. This will never include the number combination “2”. It will never contain the number sequence “4”. It will never even contain the digit sequence for four written in binary, because it will never contain a “1” followed by two “0”s. But it never repeats itself. It goes on and on forever, but it never starts repeating – it keeps adding new combinations that never existed before, in the form of longer and longer sequences of “1”s.

Infinite and non-repeating doesn’t mean without pattern, nor does it mean without structure. All that it means is non-repeating. Both of the infinite sequences I described above are infinitely long and non-repeating, but both are also highly structured and predictable. One of those has the property that the original quote talked about; one doesn’t.

That’s the worst thing about the original quotation: it’s taking a common misunderstanding of infinity, and turning it into an implication that’s incorrect. The fact that something is infinitely long and non-repeating isn’t special: most numbers are infinitely long and non-repeating. It doesn’t imply that the number contains all information, because that’s not true. </p.

Hell, it isn’t even close to true. Here’s a simple piece of information that isn’t contained anywhere in π: the decimal representation of e. e is, like π, represented in decimal form as an infinitely long sequence of non-repeating digits. e and π are, in fact, deeply related, via Euler’s equation: e^{i\pi} + 1 = 0. But the digits of e never occur in π, because they can’t: in decimal form, they’re both different infinitely long sequences of digits, so one cannot be contained in the other.

Numbers like π and e are important, and absolutely fascinating. If you take the time to actually study them and understand them, they’re amazing. I’ve writted about both of them: π here and e here. People have spent their entire lives studying them and their properties, and they’re both interesting and important enough to deserve that degree of attention. We don’t need to make up unproven nonsense to make them interesting. We especially don’t need to make up nonsense that teaches people incorrect “fact” about how infinity works.

Parser Combinators Part 2: This time with types!

If you didn’t read my first post on parser combinators, I highly recommend going back and reading it. This post assumes both that you understand parser combinators, and that you’re already familiar with the way I implemented parser combinators in Python.

All of the code discussed in this post is available in a Github repos. It’s still not anything that I’d seriously use for anything important, but it’s good enough to explain stuff.

Continue reading

Combinator Parsing, part 1

My professional work is focused on making the lives of engineers building software better. I got into that area because I was fascinated – or even obsessed – with programming languages. That led me to the more general area of engineering tools of all types. But programming languages and how they’re implemented remain one of my great fascinations.

If you’re trying to implement a new programming language (or re-implement an old one), the first thing you need to do is called parsing. Parsing consists of reading in a file containing a bunch of text, making sure that it’s correctly structured, and then using its structure to translate it into some kind of data structure.

For example, I implemented a silly programming language called Cruise. In Cruise, a snippet from a program looks like:

actor !Adder {
  behavior :Add() {
    on Plus(Z(),$x, $target) { send $x to $target }
    on Plus(I($x), $y, $target) { send Plus($x,I($y), $target) to $self }
  }
  initial :Add
}

In the Cruise implementation, the parser reads that input, and translates it into a structure called an abstract syntax tree. For the code in the example above, a part of the syntax tree for that code looks something like the following:

Actor[
  name="actor"
  behaviors=[
    Behavior[
      pattern=Tuple[
        name="Plus",
        Tuple[name="Z"],
        Var[name="x"],
        Var[name="target"]],
      action=[Send[Var["x"], Var["Target"]],
    Behavior[
      pattern=Tuple[
        name="Plus",
        Tuple[name="I", Var["x"]],
        Var["y"],
        Var["target"]],
      action=[
        Send[
          Tuple[name="Plus",
            Var["x"],
            Tuple[name="I", Var["y"]]],
          Var["target"]]]]]]

When you want to write a parser, you need to start by describing
what you want to parse. Most of the time, we do that using
a context free grammar (CFG). A CFG is a set of rules that
describe how to produce texts with a particular structure. For
example, suppose you wanted to implement a simple calculator (one
of the canonical examples used when teaching parsing!). You’d
need to describe what the expressions you want to parse
look like. A pretty typical version of this would be:

expr := add_expr
add_expr := mult_expr ( ( '+' | '-' ) mult_expr)*
mult_expr := unary_expr ( ('*' | '/') unary_expr)*
unary_expr := '-' simple | simple
simple := number | '(' expr ')'
number: digit+

This grammar is a way of expressing a system for producing valid expressions. The easiest way to see how it works is to look at an example. We know that something like “3 + 4 * (-5 + 6)” is an expression. So how would be produce that using the grammar?

  1. Start with expr. That always expands to add_expr.
  2. Expand add_expr to mult_expr ‘+’ mult_expr.
  3. Expand the first instance of mult_expr to unary_expr, giving us “unary_expr+mult_expr“.
  4. Expand the first unary expr to simple: “simple+mult_expr“.
  5. Expand simple to number: “number+mult_expr“.
  6. Expand number to the number 3: “3+mult_expr“.
  7. Expand the mult_expr to “unary_expr*unary_expr“: “3+unary_expr*unary_expr“.
  8. Expand the first unary_expr to simple, and simple to number, and number to “4”: “3+4*unary_expr“.
  9. Expand the remaining unary_expr to simple, and simple to a parenthesized expression: “3+4*(expr)”.
  10. And so on, until we end up with “3+4*(-5+6)”.

When I started learning to write compilers, there were two ways that people wrote parsers. They either implemented them manually, using a technique called recursive descent, or they used a table-driven parser generator like yacc, where you wrote the grammar using a special syntax, and then a tool translated it into the implementation of a parser.

Writing a parser using recursive descent was a huge pain. It’s very tedious and repetitive, and like any meticulous process, it’s very error-prone. In short, it’s no damn fun at all.

Writing a parser using a generator was usually much easier – but if something didn’t work, it would be a huge pain. The problem is that the code you write doesn’t end up looking very much like the code you run. For example, to write our expression language, in a typical parser generator, you’d write it something like the follows. (The grammar gets re-arranged a bit, to fit the particular underlying parser algorithm that the generator uses).

%start expr
expr: add_expr;
add_expr:
    add_expr '+' mult_expr
  | add_expr '-' mult_expr
  | mult_expr
  ;

mult_expr:
    mult_expr '*' unary_expr
  | mult_expr '/' unary_expr
  | unary_expr
  ;

unary_expr:
    '-' simple
  | simple
  ;

simple:
    number
  | '(' expr ')'
  ;

The parser generator reads the grammar that we wrote, and translates it into a set of lookup tables, which get interpreted by semi-generic parser code provided by the generator toolkit. What that means is that when your parser actually executes, there’s nothing called unary_expr. There’s some set of states encoded into magic variables with names like yytable and yytrans. If you got something wrong in your parser, debugging it is painful at best, because the grammar is gone: it’s been translated into something completely incomprehensible to human eyes.

Not all parser generators generate code that’s this extreme when it comes to readability. But it’s the output from the most widely used one, called YACC, which has been translated and used by C/C++, Caml, Go, Fortran, Pascal, D, SML, and more.

But even if you’re not using YACC, it’s not a good scene. Even ANTLR, my favorite parser generator, which makes a heroic effort to generate efficient but human-readable parsers, the fact remains that the code that you’re running and trying to debug isn’t familiar to you: it’s not what you wrote. You’re still stuck trying to work through an unfamiliar translation.

Honestly, both approaches – the manual parser, and the parser generator – suck an awful lot of the time.

Then, a few years ago, a gang of functional programmers came up with a truly brilliant scheme. You can describe a parser as a special kind of function from inputs to outputs. If you do that, you can also combine parsers using higher-order functions. Using those functions, all you need to do is write tiny parsers for the simplest elements of your language, and then you can build the parser for your language by just combining your simple parsers in different ways. When you implement a parser this way, what you get is very close to the original grammar of the language, but it’s an actual executable program: the code that you need to read and debug is exactly the code that you wrote.

For example, suppose I wanted to parse that expression language. Using a Python version of parser combinators, I can write:

digit = Parser.match(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
number = digit.many(1)
parens = Parser.match(['(']) & Reference('expr') & Parser.match([')'])
simple = number | parens
unary_expr = Parser.match(['-']).opt() & simple
mult_expr = unary_expr &  (Parser.match(['*', '/']) & unary_expr).many()
add_expr = mult_expr & (Parser.match(['-', '+']) & mult_expr).many()
expr = add_expr
Reference.register_named_parser('expr', add_expr)

That’s not a contrived example. That’s actually a real, honest-to-god parser for our expression language! We’ll look at it in more detail later. But first, we need to understand what parsers and parser combinators really are, and how they work.

At this point, it’s time to start getting more concrete. Just what is a parser, precisely?

In keeping with the fact that this approach came from the functional programming community, it makes sense to describe a parser as a function. If you think of it this way, a parser is a special kind of function:

  • It takes a sequence of inputs;
  • It can either succeed or fail;
  • If it succeeds, it consumes part of its input, and produces an output.

That means that we can come up with a sort-of type signature pretty easily. It’s a function from an input to a result value. The result value can be two types: either a success, or a failure. If it’s a success, it contains both the output value produced by the parser and the unconsumed part of the input. If it fails, the failure contains nothing.

In Python, that’s easy to represent in several different ways. We’ll do it in the one that’s closest to a typed approach: an abstract class ParseResult, with subclasses for success or failure.

class ParseResult(object):
  def succeeded(self):
    return False


class Success(ParseResult):
  """A successful parse result, containing the value produced by a parser,
  and the unconsumed part of its input.
  """
  def __init__(self, output, rest):
    self.output = output
    self.rest = rest

  def succeeded(self):
    return True


class Failure(ParseResult):
  def __init__(self):
    pass

The inputs are just a sequence of values. The parsers can look at one element of the input at a time. In Python, we’ll represent them using a class.

class ParserInput(object):
  @abstractmethod
  def at_end(self):
    """Return True if there's no remaining input."""

  def first(self):
     """Return the first element of this input."""

  def rest(self):
     """Return a ParserInput consisting of the remainder after the first
     element has been consumed.
     """

With those definitions, a parser becomes an object with one method, which takes a ParserInput, and produces a ParseResult.

class Parser(object):
  def parse(self, inp):
    """inp should be an instance of ParserInput; returns an instance of ParseResult"""

Before we get to the combinators, let’s look at an example of a simple parser. Suppose we want to accept a digit. We can implement that as:

class DigitParser(Parser):
  def parse(self, inp):
    if inp.first().isdigit():
      return Success(inp.first(), inp.rest())
    else:
      return Failure()

This looks at the first character of the input stream. This is mostly functional, so it does this without altering the input stream. If the first character is a digit, then it succeeds, returning the digit character and the remaining input; otherwise, it fails.

Simple code like that little snippet are all you need to write yourself. It’s so incredibly simple that looking at it, it seems like it’s almost too simple. How on earth can anything this trivially simple actually be the basis of something as powerful as the infamous YACC?

The answer is combinators: functions that combine parsers togethers. This simple basic, functional mechanism makes it possible to combine simple parsers using simple functions – producing a limitless variety of parsers, which can parse pretty much anything you choose to throw at them. So what kinds of combinators do we need? There are four basics that I used in my implementations, and one that’s an artifact of the implementation.

  1. Sequence: Parse an “A” followed by a “B”. In my Python implementation, this is written A & B. It succeeds only if both “A” and “B” succeed in sequence. The output of this is a list containing the outputs from “A”, “B”, and whatever else is in the sequence with them.
  2. Choice: Parse either an “A” or if that fails, parse a “B”. That’s written A | B. The output is whichever one succeeds, or else fails.
  3. Option: Parse either an “A”, or nothing, written a.opt(). The output is either the output from parsing an “A”, or None.
  4. Repetition: Parse more than one “A”, written a.many(min). The result is a list of the outputs of the invocations of “A”.
  5. Reference: I’m going to tell you what kind of thing to parse here later. For now, it’s just named “A”. The output is, of course, whatever the referenced parser outputs.

The last one definitely needs some extra explanation. Most interesting grammars are, ultimately, recursive. In our calculator’s expression grammar, the rule expr contains an add_expr, which contains a mult_expr, which contains a unary_expr, which contains a simple, which could expand to something that contains an expr. That leads to a problem constructing our parsers: we can’t actually build an instance of an expr, because it needs to contain itself. The Reference combinator is the solution to this: it lets you say Call the parser named “A”; I’ll tell you what parser that is later. If you look at the parser implementation I showed earlier, you can see this:

parens = Parser.match(['(']) & Reference('expr') & Parser.match([')'])
...
Reference.register_named_parser('expr', add_expr)

In the definition of the parens rule, we used Reference('expr'), which says “call whatever parser is registered with the name ‘expr'”. Then later on, we registered the add_expr parser with the name ‘expr’. Voila! Recursion.

There’s one more thing that I used in the calculator example: Parser.match(). That’s just a helpful little piece of code that I wrote to create parsers that look for one character from a fixed set. It’s implemented with a class named SetParser. Parser.match(['(']) will only match a “(” in the input; Parser.match(['a', 'b', 'c']) will match either an “a”, a “b”, or a “c”. Implementing it is as easy as you’d guess at this point:

class SetParser(Parser):
  def __init__(self, chars):
    self.chars = chars

  def pr(self):
    return "SetParser%s" % self.chars

  def parse(self, inp):
    if inp.first() in self.chars:
      return Success(inp.first(), inp.rest())
    else:
      return Failure()

With these, you should now be able to completely read and understand the implementation of our expression parser. Let’s walk through it.

digit = Parser.match(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']) 
number = digit.many(1)
parens = Parser.match(['(']) & Reference('expr') & Parser.match([')'])
simple = number | parens
unary_expr = Parser.match(['-']).opt() & simple
mult_expr = unary_expr &  (Parser.match(['*', '/']) & unary_expr).many()
add_expr = mult_expr & (Parser.match(['-', '+']) & mult_expr).many()
expr = add_expr
Reference.register_named_parser('expr', add_expr)
  • digit: this is a SetParser that reads a single digit.
  • number: a number is just a sequence of at least one digit.
  • parens: this is a parenthesized expression: an open paren followed by an expression, followed by a close paren.
  • simple: a simple expression is either a number, or a parenthesized expression.
  • unary: a unary expression is an optional “-“, followed by a simple.
  • mult_expr: a multiplication expression is a unary, followed by a collection of multiplication operators and operands.
  • add_expr: the same thing as a mult_expr, except that it uses add operations. By defining them seperately, with multiplication embedded inside addition, we gave multiplication a higher precedence.
  • expr is just an alias for add_expr.
  • Reference: finally, we bind the reference from the parenthesized expression.

Now that we have a parser, we can run it!

>>> inp = StringParserInput("1+2*(3+5*4)*(6+7)")
>>> print(expr.parse(inp).output)
[None, ['1'], [], [['+', [None, ['2'], [['*', [None, ['(', [None, ['3'], [], [['+', [None, ['5'],
 [['*', [None, ['4']]]]]]]], ')']]], ['*', [None, ['(', [None, ['6'], [], [['+', [None, ['7'],
  []]]]], ')']]]]]]]]

It works, but wow, that output is horrible! It’s basically a tree, formed from lists, that describes the parse process. It’s got all of the information we’d need to evaluate the expressions parsed by our grammar, only it’s in a terrible form.

That brings us to one last thing to add: Actions. In terms of parser combinators, an action is a parser that takes another parser, and a function. The output of the action is the result of applying the function to the embedded parser:

class Action(Parser):
  def __init__(self, parser, act):
    self.parser = parser
    self.action = act

  def pr(self):
    return "Action[%s]" % self.parser.pr()

  def parse(self, inp):
    result = self.parser.parse(inp)
    if result.succeeded():
      return Success(self.action(result.output), result.rest)
    else:
      return Failure()

Using that, we can turn our expression parser into a calculator!

def digits_to_number(digits, running=0):
  """Convert a list of digits to an integer"""
  if len(digits) == 0:
    return running
  else:
    r = (running * 10) + int(digits[0])
    return digits_to_number(digits[1:], r)

digit = Parser.match(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
number = Action(digit.many(1), digits_to_number)

In this version, number now actually outputs an integer, which is the number represented by the list of digits produced by the parser.

We can attach similar actions to all of the other rules. For example, the new version of our addition expression is:

def eval_add(lst):
  """Evaluate an addition expression. For addition rules, the parser will return
  [number, [[op, number], [op, number], ...]]
  To evaluate that, we start with the first element of the list as result value,
  and then we iterate over the pairs that make up the rest of the list, adding
  or subtracting depending on the operator.
  """
  first = lst[0]
  result = first
  for n in lst[1]:
    if n[0] == '+':
      result += n[1]
    else:
      result -= n[1]
  return result

add_expr = Action(mult_expr & (Parser.match(['-', '+']) & mult_expr).many(), eval_add)

With similar actions attached to the others, now, it works. (The code for the other actions is in the github repository, linked below):

>>> inp = StringParserInput("1+2*(3+5*4)*(6+7)")
>>> print(expr.parse(inp).output)
599

That’s about it for doing this in Python. The code is all here on Github, in the “python” subdirectory. The rest of the github project is the Java version. It’s more complicated to do all of this in Java, because of the type system. We’ll take a look at that in the next post. The syntax is more verbose, and the types make things a bit more complicated, but it’s still amazingly easy to build a useful parser this way.

This implementation isn’t quite what I’d call industrial strength:

  1. It’s not super-efficient; particularly when it’s parsing a multiple-precedence expression, it can do a whole lot of backtracking, which makes things very slow. To fix this, the best thing would be to add something called operator precedence parsing for handling infix expressions. Parsec, the Haskell library that introduced my to combinator parsing, does this, and it workes beautifully. It would be nice to add that, but for my purposes (ie, writing this blog post, plus a bit of language parsing for my Apex project), it’s not necessary.
  2. It has terrible error handling. In fact, it doesn’t have error handling at all. If there’s a syntax error in your input, all that will happen is that the parser will fail. To fix this, I’ll need to add a third kind of return value to the basic parser function: a parser should be able to succeed, fail without error, and fail with an error. This is something I really need to add if I ever want to use this code for anything even semi-real.
  3. It doesn’t track positions. In other words, when I get it to the point where it can actually produce a syntax error, it can’t say where the error is. I’d really like to be able to say “There’s an error at line 3 column 7”, but right now, there’s no way to do that. This one is relatively easy to add: the parser input just needs to be able to report a line and column number. I’m definitely planning on adding that.

The Heartbleed Bug

There’s a lot of panic going around on the internet today, about something called the Heartbleed bug. I’ve gotten questions, so I’m giving answers.

I’ve heard lots of hype. Is this really a big deal?

Damn right it is!

It’s pretty hard to wrap your head around just how bad this actually is. It’s probably even more of a big deal that the hype has made it out to be!

This bug affects around 90% of all sites on the internet that use secure connections. Seriously: if you’re using the internet, you’re affected by this. It doesn’t matter how reputable or secure the sites you connect to have been in the past: the majority of them are probably vulnerable to this, and some number of them have, in all likelihood, been compromised! Pretty much any website running on linux or netbst, using Apache or NGINX as its webserver is vulnerable. That means just about every major site on the net.

The problem is a bug in a commonly used version of SSL/TLS. So, before I explain what the bug is, I’ll run through a quick background.

What is SSL/TLS?

When you’re using the internet in the simplest mode, you’re using a simple set of communication protocols called TCP/IP. In basic TCP/IP protocols, when you connect to another computer on the network, the data that gets sent back and forth is not encrypted or obscured – it’s just sent in the clear. That means that it’s easy for anyone who’s on the same network cable as you to look at your connection, and see the data.

For lots of things, that’s fine. For example, if you want to read this blog, there’s nothing confidential about it. Everyone who reads the blog sees the same content. No one is going to see anything private.

But for a lot of other things, that’s not true. You probably don’t want someone to be able to see your email. You definitely don’t want anyone else to be able to see the credit card number you use to order things from Amazon!

To protect communications, there’s another protocol called SSL, the Secure Sockets Layer. When you connect to another site that’s got a URL starting with https:, the two computers establish an encrypted connection. Once an SSL connection is established between two computers, all communication between them is encrypted.

Actually, on most modern systems, you’re not really using SSL. You’re using a successor to the original SSL protocol called TLS, which stands for transport layer security. Pretty much everyone is now using TLS, but many people still just say SSL, and in fact the most commonly used implementation of it is in package called OpenSSL.

So SSL/TLS is the basic protocol that we use on the internet for secure communications. If you use SSL/TLS correctly, then the information that you send and receive can only be accessed by you and the computer that you’re talking to.

Note the qualifier: if you use SSL correctly!

SSL is built on public key cryptography. What that means is that a website identifies itself using a pair of keys. There’s one key, called a public key, that it gives away to everyone; and there’s a second key, called a private key, that it keeps secret. Anything that you encrypt with the public key can only be decrypted using the private key; anything encrypted with the private key can only be decrypted using the public key. That means that if you get a message that can be decrypted using the sites public key, you know that no one except the site could have encrypted it! And if you use the public key to encrypt something, you know that no one except that site will be able to decrypt it.

Public key cryptography is an absolutely brilliant idea. But it relies on the fact that the private key is absolutely private! If anyone else can get a copy of the private key, then all bets are off: you can no longer rely on anything about that key. You couldn’t be sure that messages came from the right source; and you couldn’t be sure that your messages could only be read by an authorized person.

So what’s the bug?

The SSL protocol includes something called a heartbeat. It’s a periodic exchange between the two sides of a connection, to let them know that the other side is still alive and listening on the connection.

One of the options in the heartbeat is an echo request, which is illustrated below. Computer A wants to know if B is listening. So A sends a message to B saying “Here’s X bytes of data. Send them back to me.” Then A waits. If it gets a message back from B containing the same X bytes of data, it knows B was listening. That’s all there is to it: the heartbeat is just a simple way to check that the other side is actually listening to what you say.

heartbeat

The bug is really, really simple. The attacker sends a heartbeat message saying “I’m gonna send you a chunk of data containing 64000 bytes”, but then the data only contains one byte.

If the code worked correctly, it would say “Oops, invalid request: you said you were going to send me 64000 bytes of data, but you only sent me one!” But what the buggy version of SSL does is send you that 1 byte, plus 63,999 bytes of whatever happens to be in memory next to wherever it saved the byte..

heartbleed

You can’t choose what data you’re going to get in response. It’ll just be a bunch of whatever happened to be in memory. But you can do it repeatedly, and get lots of different memory chunks. If you know how the SSL implementation works, you can scan those memory chunks, and look for particular data structures – like private keys, or cookies. Given a lot of time, and the ability to connect multiple times and send multiple heartbeat requests each time you connect, you can gather a lot of data. Most of it will be crap, but some of it will be the valuable stuff.

To make matters worse, the heartbeat is treated as something very low-level which happens very frequently, and which doesn’t transfer meaningful data. So the implementation doesn’t log heartbeats at all. So there’s no way of even identifying which connections to a server have been exploiting this. So a site that’s running one of the buggy versions of OpenSSL has no way of knowing whether or not they’ve been the target of this attack!

See what I mean about it being a big deal?

Why is it so widespread?

When I’ve written about security in the past, one of the things that I’ve said repeatedly is: if you’re thinking about writing your own implementation of a security protocol, STOP. Don’t do it! There are a thousand ways that you can make a tiny, trivial mistake which completely compromises the security of your code. It’s not a matter of whether you’re smart or not; it’s just a simple statement of fact. If you try to do it, you will screw something up. There are just so many subtle ways to get things wrong: it takes a whole team of experts to even have a chance to get it right.

Most engineers who build stuff for the internet understand that, and don’t write their own cryptosystems or cryptographic protocols. What they do is use a common, well-known, public system. They know the system was implemented by experts; they know that there are a lot of responsible, smart people who are doing their best to find and fix any problems that crop up.

Imagine that you’re an engineer picking an implementation of SSL. You know that you want as many people trying to find problems as possible. So which one will you choose? The one that most people are using! Because that’s the one that has the most people working to make sure it doesn’t have any problems, or to fix any problems that get found as quickly as possible.

The most widely used version of SSL is an open-source software package called OpenSSL. And that’s exactly where the most bug is: in OpenSSL.

How can it be fixed?

Normally, something like this would be bad. But you’d be able to just update the implementation to a new version without the bug, and that would be the end of the problem. But this case is pretty much the worst possible case: fixing the implementation doesn’t entirely fix the problem. Because even after you’ve fixed the SSL implementation, if someone got hold of your private key, they still have it. And there’s no way to know if anyone stole your key!

To fix this, then, you need to do much more than just update the SSL implementation. You need to cancel all of your keys, and replace them new ones, and you need to get everyone who has a copy of your public key to throw it away and stop using it.

So basically, at the moment, every keypair for nearly every major website in the world that was valid yesterday can no longer be trusted.

Manifold are the Manifolds

In the stuff I’ve been writing about topology so far, I’ve been talking about topologies mostly algebraically. I’m not sure why, but for me, algebraic topology is both the most interesting and easy to understand bit of topology. Most people seem to think I’m crazy; in general, people seem to think that algebraic topology is hard. It must say something about me that I find other things much harder, but I’ll leave it to someone else to figure out what.

Despite the fact that I love the algebraic side, there’s a lot of interesting stuff in topology that you need to talk about, which isn’t purely algebraic. Today I’m going to talk about one of the most important ones: manifolds.

The definition we use for topological spaces is really abstract. A topological space is a set of points with a structural relation. You can define that relation either in terms of neighborhoods, or in terms of open sets – the two end up being equivalent. (You can define the open sets in terms of neighborhoods, or the neighborhoods in terms of open sets; they define the same structure and imply each other.)

That abstract definition is wonderful. It lets you talk about lots of different structures using the language and mechanics of topology. But it is very abstract. When we think about a topological space, we’re usually thinking of something much more specific than what’s implied by that definition. The word space has an intuitive meaning for us. We hear it, and we think of shapes and surfaces. Those are properties of many, but not all topological spaces. For example, there are topological spaces where you can have multiple distinct points with identical neighborhoods. That’s definitely not part of what we expect!

The things that we think of as a spaces are really are really a special class of topological spaces, called manifolds.

Informally, a manifold is a topological space where its neighborhoods for a surface that appears to be euclidean if you look at small sections. All euclidean surfaces are manifolds – a two dimensional plane, defined as a topological space, is a manifold. But there are also manifolds that aren’t really euclidean, like a torus, or the surface of a sphere – and they’re the things that make manifolds interesting.

The formal definition is very much like the informal, with a few additions. But before we get there, we need to brush up on some definitions.

  • A set {\mathbf S} is countable if and only if there is a total, onto, one-to-one function from {\mathbf S} to the natural numbers.
  • Given a topological space (T, \tau), a basis \beta for \tau is a collection of open sets from which any open set in \tau can be generated by a finite sequence of unions and intersections of sets in \beta. What this really means is that the structure of the space is regular – it’s got nothing strange like an infinitary-union in its open-sets.
  • A topological space (T, \tau) is called a Hausdorff space if and only if for any two distinct points p, q \in T, there are at least one open set o_p where p \in o_p \land q \not\in o_p, and at least one open set o_q where q \in o_q \land p \not\in o_q. (That is, there is at least one open set that includes p but not q, and one that includes q but not p.) A Hausdorff space just defines another kind of regularity: disjoint points have disjoint neighborhoods.

A topological space (T, \tau) is an n-manifold if/f:

  • \tau has a countable basis.
  • (T, \tau) is a Hausdorff space.
  • Every point in T has a neighborhood homeomorphic to an open euclidean n-ball.

Basically, what this really means is pretty much what I said in the informal definition. In a euclidean n-space, every point has a neighborhood which is shaped like an n-ball, and can be separated from any other point using an n-ball shaped neighborhood of the appropriate size. In a manifold, the neighborhoods around a point look like the euclidean neighborhoods.

If you think of a large enough torus, you can easily imagine that the smaller open 2-balls (disks) around a particular point will look very much like flat disks. In fact, as the torus gets larger, they’ll become virtually indistinguishable from flat euclidean disks. But as you move away from the individual point, and look at the properties of the entire surface, you see that the euclidean properties fail.

Another interesting way of thinking about manifolds is in terms of a construction called charts, and charts will end up being important later.

A chart for an manifold is an invertable map from some euclidean manifold to part of the manifold which preserves the topological structure. If a manifold isn’t euclidean, then there isn’t a single chart for the entire manifold. But we can find a set of overlapping charts so that every point in the manifold is part of at least one chart, and the edges of all of the charts overlap. A set of overlapping charts like that is called an atlas for the manifold, and we will sometimes say that the atlas defines the manifold. For any given manifold, there are many different atlases that can define it. The union of all possible atlases for a manifold, which is the set of all charts that can be mapped onto parts of the manifold is called the maximal atlas for the manifold. The maximal atlas for a manifold is, obviously, unique.

For some manifolds, we can define an atlas consisting of charts with coordinate systems. If we can do that, then we have something wonderful: a topology on which we can do angles, distances, and most importantly, calculus.

Topologists draw a lot of distinctions between different kinds of manifolds; a few interesting examples are:

  • A Reimann manifold is a manifold on which you can meaningfully define angles and distance. (The mechanics of that are complicated and interesting, and I’ll talk about them in a future post.)
  • A differentiable manifold is one on which you can do calculus. (It’s basically a manifold where the atlas has measures, and the measures are compatible in the overlaps.) I probably won’t say much more about them, because the interesting thing about them is analysis, and I stink at analysis.
  • A Lie group is a differentiable manifold with a valid closed product operator between points in the manifold, which is compatible with the smooth structure of the manifold. It’s basically what happens when a differentiable manifold and a group fall in love and have a baby.

We’ll see more about manifolds in future posts!

Two Factor Security

My friend Dr24Hours tweeted something disparaging about two factor security this morning:

I’m a huge fan of two factor, but I’ve definitely noticed that it’s not well understood by a lot of people, and there are a lot of misunderstandings of what it is, how it works, and why it’s good.

Two factor is a mode of authentication. It’s a way of proving that you are who you say you are, in order to be granted access to some secured resource. You don’t want random people to be able to access your email, so your email provider has a way of ensuring that only you can access it. Before you an access it, you have to prove that you’re someone who’s supposed to be able to see it.

So how can you prove who you are? There are fancy terms for different strategies, but fundamentally, it comes down to one of two things: either something you know, or something you have.

If you’re using the internet, you’re guaranteed to be familiar with the “something you know” method: passwords are something you know. For any password protected service, you have a secret that you share with he service. By telling them the secret – something that no one but you should know – you’re proving to them that you’re you.

If you work in an office that uses ID badges, you’re familiar with a “something you have” strategy. The people at the security desk on the ground floor of my office have no idea who I am. They don’t recognize me. Someone could walk into the building, and claim that they work for twitter, and the security people would have no idea if they were telling the truth. But we all have security cards for the building. That card proves that I’m someone who’s supposed to be allowed in.

Passwords – something you know – are used all over the internet. Each of us has a ridiculous number of passwords. Just quickly trying to come up with a list, I’ve got at least 35 different passwords! And I’m probably missing at least another dozen.

Something you know has two big related problems: it’s only a good way of proving who you are if it’s really something that only you know; and if it’s something that someone else won’t be able to guess. If anyone else knows it, or if anyone else can guess it, then it’s no longer useful as a way of proving that you’re you.

Both of those weaknesses have major practical effects.

Lots of people choose really stupid passwords. There’s a huge number of people who use “password” or “1234” as their password. In fact, people who’ve tried to crack passwords to test the security of systems have found that the five most common passwords are “password”, “123456”, “12345678”, “12345”, and “qwerty”. If someone wants to get access to your email, or your amazon account, or anything else, those will be the first thing they try. A frightening amount of the time, that will work.

Even among people who use good passwords, the sheer number of passwords they need to remember has gotten out of hand, and so they reuse passwords. If you reuse your password, then that means that if any of the places you used your password gets cracked, or if any of the places where you used your password are dishonest, your password is no longer a secret that only you know.

What two-step authentication does is mitigate the risk of bad passwords and password reuse by adding a layer of “something you have” to authentication. First, it checks the shared secret – the thing that you know. If that’s successful, then it also checks for the thing you have.

For example, my work email account uses two-factor authentication. What that means is that when I log in, it first asks me for my password. Once I give it the password, then it asks for a four digit code. I’ve got an app on my phone that generates that code. Only my phone contains the data that’s needed to generate that four digit code – so if I get it right, that means that I have my phone.

Two factor doesn’t tell anyone additional information about you, beyond the fact that you have whatever device or artifact that you’re using. Using two factor doesn’t tell Google what I’m buying at Amazon, or what I’m tweeting, or who I talk to. Even if I’m using Google 2factor, Google gets no more information about me than it would get by checking my password.

But two factor has a huge benefit for me. It means that it’s much, much harder for someone else to get access to my accounts. I don’t reuse my passwords, but I do use 1password for password management, so if someone were able to crack my 1password data file, they’d have access to my passwords. But with two factor, that’s not enough for them to get access to my account. I wish that I could turn on two factor in more places!

Big Numbers and the Lost Plane

(In the original version of this post, there were two places where I wrote “cubic” instead of “square”. It was a stupid mistake, but just a stupid wording mistake. The calculations are unchanged: I was calculating square units, not cubic – I just typed the wrong thing.)

I haven’t written about this in a while, but for a long time, I’ve been fascinated by just how bad humans are at actually understanding big numbers. When it comes down to it, we’re comfortable with numbers we can count on our fingers. The further you get from our fingers, the worse we get at it.

We see this over and over. I’ve been noticing a new variation on this lately, in discussions about the missing Malaysian airliner. For example, I was at the dentist tuesday morning for a tmj mouth guard because I’ve been told that I grind my teeth at night. They’ve installed TVs in the exam rooms, to give you something to do while you wait for the novocaine to kick in, and they had it turned to CNN. The technician who was setting things up was talking to me about what they were saying, and she kept repeating: it’s a plane! It’s huge! Why can’t they find it?

If you think intuitively, it’s a good question!

In fact, if you look at it with basic intuition, it seems insane. A Boeing 777 is a huge plane! The thing is nearly as long as football field, and it’s length wingtip to wingtip is just short of 200 feet. Fully loaded with fuel at takeoff, it weighs more than 300 tons! How can you lose something that large? Even if it broke into pieces, the debris field from it would need to be huge, and it must be easy to find!

But that’s relying on intuition, and as I said in the introduction, our intuitions scale very poorly. To show what I mean, I’m going to work through a process that will hopefully help make it clear just how badly our intuition fails us here.

We’ll start by coming up with an estimate of the quantity of debris.

Assume that the fuselage is a continuous tube. The official stats say that the diameter of the fuselage is 20 feet, and it’s 242 feet long. So a deliberately overlarge estimate of the total surface area of the fuselage is 24220Ï€ – or about 15,000 square feet. Assume that the surface of the wings is about he same, and you get a total surface area of about 30,000 square feet. That means that if the entire body of the plane was peeled and laid flat, and it all floated on the water, it would cover 30,000 square feet of the surface. (In reality, much of the plane would sink, but the seats would float; on the whole, it’s probably a wash; the 30,000 square feet is probably still order-of-magnitude correct for the amount of debris visible on the surface.) Sounds like a whole lot, right?

To get an idea of how easy that is to find, we need to consider how much area we need to search. From what we know about how long the plane could have been in the air, and what direction it could have been moving, we’ve got a search area that’s at least a couple of thousand square miles.

One square mile is 27878400 square feet. So assuming that the plane crashed, and its wreckage is distributed over one square mile, that would mean that in the crash area, if every bit of the plate floated, just over one tenth of one percent of that square mile is covered in wreckage. That’s piling together very unrealistic assumptions – a highly constrained debris field, all of it floating – in order to stack the odds of finding it!

We’re using some optimistic assumptions here – but even so, even with a debris field that’s highly (unrealistically) constrained, it’s still very sparse. (In fact, it’s likely that if the plane crashed in the water, the debris field is spread over more than one square mile.) That’s what we’re looking for: an area where less than one tenth of one percent of the surface is covered, in an area covering several thousand square miles.

A thousand square miles is pretty squarely in the zone of stuff that we don’t really understand very well. Getting an understanding of just how big an area a thousand square miles is… it’s not something that our intuition will help with. But statistically, we’re looking for a patch of debris where the visible artifacts cover less than one ten-millions of one percent of the area being searched. If the search area were a football field, you’d be looking for a little fleck of tinfoil, 1/32nd of an inch on each side.

Only even that is making it too easy: it’s not one piece of tinfoil, 1/32nd of an inch on a side. It’s a fleck of foil which was shredded into a thousand even tinier pieces.

That’s assuming a search area of only one thousand square miles. But the search area is, potentially, quite a lot larger than that. Searching for the tinfoil on a football field is easy in comparison.

Once you understand how small an airplane is in comparison to a small patch of ocean, and you understand just how big a small patch of ocean is, then it’s not at all surprising that we haven’t found it yet.

Big Science News! Inflation, Gravity, and Gravitational Waves

gwaves

So, big announcement yesterday. Lots of people have asked if I could try to explain it! People have been asking since yesterday morning, but folks, I’ve got a job! I’ve been writing when I have time while running slow tests in another window, so it’s taken more than a day to get to it.

The announcement is really, really fascinating. A group has been able to observe gravity wave fluctuations in the cosmic microwave background. This is a huge deal! For example, Sean Carroll (no relation) wrote:

other than finding life on other planets or directly detecting dark matter, I can’t think of any other plausible near-term astrophysical discovery more important than this one for improving our understanding of the universe.

Why is this such a big deal?

This is not an easy thing to explain, but I’ll do my best.

We believe that the universe started with the big bang – all of the matter and energy, all of the space in the universe, expanding outwards from a point. There’s all sorts of amazing evidence for the big bang – not least the cosmic microwave background.

But the big-bang theory has some problems. In particular, why is everything the same everywhere?

That sounds like a strange question. Why wouldn’t it be the same everywhere?

Here’s why: because for changes to occur at the same time in different places, we expect there to be some causal connection between those places. If there is no plausible causal connection, then there’s a problem: how could things happen at the same time, in the same way?

That causal connection is a problem. To explain why, first I need to explain the idea of the observable universe.

Right now, there is some part of the universe that we can observe – because light from it has reached us. There’s also some part of the universe that we can’t observe, because light from it hasn’t reached us yet. Every day, every moment, the observable universe gets larger – not because the universe is expanding (it is, but we’re not talking about the size of the universe, but rather of the part of the universe that we can observe). It’s literally getting larger, because there are parts of the universe that are so far away from us, that the first light they emitted after the universe started didn’t reach us until right now. That threshold, of the stuff that couldn’t possible have gotten here yet, is constantly expanding, getting farther and farther away.

There are parts of the universe that are so far away, that the light from them couldn’t reach us until now. But when we look at that light, and use it to see what’s there, it looks exactly like what we see around us.

The problem is, it shouldn’t. If you just take the big bang, and you don’t have a period of inflation, what you would expect is a highly non-uniform universe with a very high spatial curvurature. Places very far away shouldn’t be exactly the same as here, because there is no mechanism which can make them evolve in exactly the same way that they did here! As energy levels from the big bang decrease, local fluctuations should have produced very different outcomes. They shouldn’t have ended up the same as here – because there’s many different ways things could have turned out, and they can’t be causally connected, because there’s no way that information could have gotten from there to here in time for it to have any effect.

Light is the fastest thing in the universe – but light from these places just got here. That means that until now, there couldn’t possibly be any connection between here and there. How could all of the fundamental properties of space – its curvature, the density of matter and energy – be exactly the same as here, if there was never any possible causal contact between us?

The answer to that is an idea called inflation. At some time in the earliest part of the universe – during a tiny fraction of the first second – the universe itself expanded at a speed faster than light. (Note: this doesn’t mean that stuff moved faster than light – it means that space itself expanded, creating space between things so that the distance between them expanded faster than light. Subtle distinction, but important!) So the matter and energy all got “stretched” out, at the same time, in the same way, giving the universe the basic shape and structure that it has now.

Inflation is the process that created the uniform universe. This process, which happened to the entire universe, had tiny but uniform fluctuations because of the basic quantum structure of the universe. Those fluctuations were the same everywhere – because when they happened, they were causally connected! Inflation expanded space, but those fluctuations provided the basic structure on which the stuff we observe in the universe developed. Since that basic underlying structure is the same everywhere, everything built on top it is the same as well.

We’ve seen lots of evidence for inflation, but it hasn’t quite been a universally accepted idea.

The next piece of the puzzle is gravity. Gravity at least appears to be very strange. All of the other forces in our universe behave in a consistent way. In fact, we’ve been able to show that they’re ultimately different aspects of the same underlying phenomena. All of the other forces can be described quantum mechanically, and they operate through exchange particles that transmit force/energy – for example, electromagnetic forces are transmitted by photons. But not gravity: we have no working quantum theory for how gravity works! We strongly suspect that it must, but we don’t know how, and up to now, we never found any actual proof that it does behave quantumly. But if it did, and if inflation happened, that means that those quantum fluctations during expansion, the things that provided the basic lattice on which matter and energy hang, should have created an echo in gravity!

Unfortunately, we can’t see gravity. The combination of inflation and quantum mechanics means that there should be gravitational fluctuations in the universe – waves in the basic field of gravity. We’ve predicted those waves for a long time. But we haven’t been able to actually test that prediction, because we didn’t have a way to see gravitational waves.

So now, I can finally get to this new result.

They believe that they found gravity waves in the cosmic microwave background. They used a brilliant scheme to observe them: if we look at the cosmic microwave background – not at any specific celestial object, but just at the background – gravitational waves would created a very subtle tensor polarization effect. So they created a system that could observe polarization. Then they removed all of the kinds of polarization that could be explained by anything other than gravitational waves. What they were left with was a very clear wave pattern in the polarization – exactly what was predicted by quantum inflation! You can see one of their images of this wave pattern at the top of this post.

If these new observations are confirmed, that means that we have new evidence for two things:

  1. Inflation happened. These gravity waves are an expected residue of inflation. They’re exactly what we would have expected if inflation happened, and we don’t have any other explanation that’s compatible with them.
  2. Gravity is quantum! If gravity wasn’t quantum, then expansion would have completely smoothed out the gravitational effects, and we wouldn’t see gravitational waves. Since we do see waves, it’s strong evidence that gravity really does have a quantum aspect. We still don’t know how it works, but now we have some really compelling evidence that it must!

Monte Carlo!

I was browsing around my crackpottery achives, looking for something fun to write about. I noticed a link from one of them to an old subject of one of my posts, the inimitable Miles Mathis. And following that, I noticed an interesting comment from Mr. Mathis about Monte Carlo methods: “Every mathematician knows that ‘tools’ like Monte Carlo are used only when you’ve got nothing else to go on and you are flying by the seat of your pants.” I find Monte Carlo computations absolutely fascinating. So instead of wasting time making fun of more of Mathis rubbish, I decided to talk about Monte Carlo methods.

It’s a little hard to talk about Monte Carlo methods, because there’s a lot of disagreement about exactly what they are. I’m going to use the broadest definition: a Monte Carlo method is a way of generating a computational result using repeated computations and random sampling.

In other words, Monte Carlo methods are a way of using random sampling to solve problems.

I’ll start with a really simple example. Suppose you want to know the value of \pi. (Pretend that you don’t know the analytical solution.) One thing that you could do is try to measure the circumference of a rod, and then divide it by its diameter. That would work, but it would be really hard to get much accuracy. You could, instead, get a great big square sheet of paper, and cover the whole thing in a single layer of grains of sand. Then, very carefully, you could remove the grains of sand that weren’t in the circle, compare it to the number of grains of sand that weren’t in the circle. By doing that, you could get a very, very accurate measurement of the area of the circle, and using that, you could get a much more accurate estimate of \pi.

The problem with that is: it’s really hard to get a perfect single-grain layer of sand all over the paper. And it would be a lot of very, very tedious work to get all of the grains that weren’t in the circle. And it would be very tedious to count them. It’s too much trouble.

Instead, you could just take 1,000 grains of sand, and drop them randomly all over the circle and the square. Then you could count how many landed in the circle. Or ever easier, you could just go to a place where lots of drunk people play darts! Draw a square around the dartboard, and count how many holes there are in the square wall around it, versus how many in the dartboard!

You’re not going to get a super-precise value for \pi – but you might be surprised just how good you can get!

That’s the basic idea of monte carlo simulation: you’ve got a problem that’s hard to compute, or one where you don’t know a closed-form solution to make it easy to compute. Getting the answer some other way is intractable, because it requires more work than you can reasonably do. But you’ve got an easy way to do a test – like the “is it in the circle or not” test. So you generate a ton of random numbers, and use those, together with the test, to do a sequence of trials. Then using the information from the trials, you can get a reasonable estimate of the value you wanted. The more trials you do, the better your estimate will be.

The more you understand the probability distribution of the space you’re sampling, the better your estimate will be. For example, in the \pi example above, we assumed that the grains of sand/darts would be distributed equally all over the space. If you were using the dartboard in a bar, the odds are that the distribution wouldn’t be uniform – there’d be many more darts hitting the dartboard than hitting the wall (unless I was playing). If you assumed a uniform distribution, your estimate would be off!

That’s obviously a trivial example. But in reality, the Monte Carlo method is incredibly useful for a wide range of purposes. It was used during World War II by the Manhattan project to help design the first atom bomb! They needed to figure out how to create a critical mass that would sustain a nuclear chain reaction; to do that, they needed to be able to compute neutron diffusion through a mass of radioactive uranium. But that’s a very hard problem: there are so many degrees of freedom – so many places where things could proceed in several seemingly (or actually!) random directions. With the computers they had available to them at the time, there was absolutely no way that they could write a precise numerical simulation!

But, luckily for them, they had some amazing mathematicians working on the problem! One of them, Stanislav Ulam, had been sick, and while he was recovering, fooled around with some mathematical problems. One of them involved a solitaire game, called Canfield. Ulam wanted to figure out how often the game of Canfield was winnable. He couldn’t figure out how to do it analytically, but he realized that since the deals of cards are a uniform distribution, then if you were to take a computer, and make it play through 1000 games, the number of times that it won would be a pretty good estimate of how many times the game was winnable in general.

In that case, it’s obvious that a complete solution isn’t feasible: there are 52! possible deals – roughly 3\times 10^{66}! But with just a couple of hundred trials, you can get a really good estimate.

Ulam figured that out for the card game. He explained it to Jon von Neumann, and von Neumann realized that the same basic method could be used for the Neutron diffraction process!

Since then, it’s been used as the basis for a widely applicable approach to numeric integration. It’s used for numerous physics simulations, where there is no tractable exact solution – for example, weather prediction. (We’ve been able to get reasonably accurate weather predictions up to six days in advance, using very sparse input data, by using Monte Carlo methods!) It’s an incredibly useful, valuable technique – and anyone who suggests that using Monte Carlo is in any way a half-assed solution is an utter jackass.

I’ll finish up with a beautiful example – my favorite example of combining analytical methods with Monte Carlo. It’s another way of computing an estimate of \pi, but it gets a more accurate result with fewer trials than the sand/darts.

It’s based on a problem Buffon’s needle. Buffon’s needle is a problem first proposed by the Count of Buffon during the 1700s. He asked: suppose I drop a needle onto a panelled wood floor. What’s the probability that the needle will fall so that it crosses a one of the joints between different boards?

Using some very nice analytical work, you can show that if the panels have uniform width t, and the needle has length l, then the probability of a needle crossing a line is: \frac{2l}{\pi t}. That gives us the nice property that if we let l = \frac{t}{2}, then the probability of crossing a line is \frac{1}{\pi}.

Using that, you can do a Monte Carlo computation: take a sheet of paper, and a couple of hundred matchsticks. Draw lines on the paper, separated by twice the length of a matchstick. Then scatter the matchsticks all over the paper. Divide the total number of matchsticks by the number that crossed a line. The result will be roughly \pi.

For example – with 200 trials, I got 63 crossing a line. That gives me roughly 3.17 as an estimate of \pi. That’s not half-bad for a five minute experimental estimate!

Squishy Equivalence with Homotopy

In topology, we always talk about the idea of continuous deformation. For example, we say that two spaces are equivalent if you can squish one into the other – if your space was made of clay, you could reshape it into the other just by squishing and molding, without ever tearing or gluing edges.

That’s a really nice intuition. But it’s a very informal intuition. And it suffers from the usual problem with informal intuition: it’s imprecise. There’s a reason why math is formal: because it needs to be! Intuition is great, as far as it goes, but if you really want to be able to understand what a concept means, you need to go beyond just intuition. That’s what math is all about!

We did already talk about what topological equivalence really is, using homeomorphism. But homeomorphism is not the easiest idea, and it’s really hard to see just how it connects back to the idea of continuous deformation.

What we’re going to do in this post is look at a related concept, called homotopy. Homotopy captures the idea of continuous deformation in a formal way, and using it, we can define a form of homotopic equivalence. It’s not quite equivalent to homeomorphism: if two spaces are homeomorphic, they’re always homotopy equivalent; but there are homotopy equivalent spaces that aren’t homeomorphic.

How can we capture the idea of continuous transformation? We’ll start by looking at it in functions: suppose I’ve got two functions, f and g. Both f and g map from points in a topological space A to a topological space B. What does it mean to say that the function f can be continuously transformed to g?

We can do it using a really neat trick. We’ll take the unit interval space – the topological space using the difference metric over the interval from 0 to 1. Call it U = [0, 1].

f can be continuously deformed into g if, and only if, there is a continuous function t: A \times U \rightarrow B, where \forall a \in A: t(a, 0) = f(a) \land t(a, 1) = g(a).

If that’s true, then we say t is a homotopy between f and g, and that f and g are homotopic.

That’s just the first step. Homotopy, the way we just defined it, doesn’t say anything about topological spaces. We’ve got two spaces, but we’re not looking at how to transform one space into the other; we’re just looking at functions that map between the spaces. Homotopy says when two functions between two spaces are loosely equivalent, because one can be continuously deformed into the other.

To get from there to the idea of transformability of spaces, we need to think about what we’re trying to say. We want to say that a space A can be transformed into a space BB. What does that really mean?

One way to say it would be that if I’ve got A, I can mush it into a shape B, and then much it back to A, without ever tearing or gluing anything. Putting that in terms of functions instead of squishies, that means that there’s a continous function f from A to B, and then a continous function g back from B to A. It’s not enough just to have that pair of functions: if you apply f to map A to B, and then apply g to map back, you need to get back something that’s indistinguishable from what you started with.

Formally, if A and B are topological spaces, and f: A \rightarrow B and g: B \rightarrow A are continuous functions, then the spaces A and B are homotopically equivalent – equivalent over squishing and remolding, but not tearing or gluing – if f \circ g is homotopic with the id function on A, and g \circ f is homotopic with the id function on B.

That captures exactly the notion of continuous transformation that we tried to get with the intuition at the start. Only now it’s complete and precise – we’ve gotten rid of the fuzziness of intuition.