Category Archives: Good Math

Friday Pathological Programming: Unlambda, or Programming Without Variables

Todays tasty treat: a variable-free nearly-pure functional programming language: Unlambda.
Unlambda is based on the SKI combinator calculus. The [SKI calculus][ski] is a way of writing lambda calculus without variables, and without lambda. It’s based on three combinators: S, K, and I:
1. S = λ x y z . x z (y z)
2. K = λ x y . x
3. I = λ x . x
Given only S and K, you can actually write *any* lambda calculus expression – and therefore, you can write any computable function using nothing but S and K. I is often added for convenience; it’s just a shorthand for “SKK”.
You can do recursion in SKI; the Y combinator of lambda calculus is:
Y = S S K (S (K (S S (S (S S K)))) K)
Unlambda is based on SKI; but it extends it with a few additional kinds of sugar to make things easier to write. (Nothing makes Unlambda easier to *read*!) The basic idea is that *everything* in unlambda is a function; and in particular, everything is a function that takes a *single* variable: you don’t need more than one parameter, because you can always using [currying][curry] to express any multiparameter function as a series of single-parameter functions.
(As a refresher, if you have a two-parameter function in lambda calculus, like “λ x y . x + y”; *currying* is translating it into a one parameter function that returns a one parameter function: “λ x .(λ y. x + y)”.)
There are eight basic instructions in Unlambda:
1. ““”` : the application operator. “`’AB`” applies the function `A` to the parameter `B`. You can think of this as being something like the “(” in Lisp; there is no close paren because since every function takes only one parameter, there is no *need* for a close paren. And hey, adding a syntactically unnecessary close paren would only make the code easier to read, and we can’t have that!
2. “`k`” : the K combinator.
3. “`s`” : the S combinator.
4. “`i`” : the I combinator.
5. “`v`” : a “drop” combinator: takes a parameter, discards it, and returns “`v`”
6. “`d`” : the “delay” operator; a way of doing *lazy evaluation* in Unlambda. Unlambda normally evaluates the argument before it
applies a function. `’dA` is the same thing as `A`, except that `A` isn’t evaluated until “`’dA`” is applied to something else.
7. “`c`” : the “call-with-current-continuation” combinator. This is equivalent to [“call/cc”][callcc] in Scheme. What it does is call some function with the current state of the computation captured as a function as its parameter. What this means is that “`’cAB`” calls “`A`”. “`A`” receives the value “`cAB`” as a parameter; during A, it can return a value, *or* it can call the continuation, which will evaluate “`cAB`” again. So this is the function-based equivalent of a loop. “c” turns the “loop body” into a function; so at the end of the loop, you re-invoke the continuation with a new parameter to run the next iteration.
8. “`.`” : The “.” operator is actually a meta-operator. “`.x`” is a function which returns its parameter, *and* prints the character “x”.
9. “`r`” is a shorthand for “`.CR`”, where CR is the carriage return character.
And that is it. The entire language.
Note that there are no numbers! We need to build numbers using church numerals.
* The church numeral “0” is “λ f. (λ x . x)”. In Unlambda, that’s “`’ki`”.
* The church numeral “1” is “λ f . (λ x . f x)”; in unlambda, that’s “`i`”.
* The church numeral “2” is “λ f . (λ x. f f x)”; or “`”s”s’kski`”.
* The church numeral “3” is “λ f . (λ x . f f f x)”; or “`”s”s’ksk”s”s’kski`”.
* And so on.. Just keep appending “`”s”skski`” for each successive number.
So… On to a couple of programs!
Hello world in Unlambda is:
`r“““““`.H.e.l.l.o. .w.o.r.l.di
It’s pretty straightforward. It’s actually sort of easier to understand if you look from right to left. The “`.H.e.l.l.o…`” is basically a series of “`i`” combinators, which print out the characters of hello world. The trailing “`i`” is there because “`.d`” needs a parameter in order to do anything. And the list of “`’`”s at the beginning are there to make the “.” functions be applied. Finally, apply “`’r`” to the whole thing – which prints a newline at the end.
Next, a counter: starts with “N=1”; print “N” asterisks, add one, and then repeat.
“r`ci“s`k`c“s“s`ksk`kr.*
If you look at it, you can see the number pattern in there; “`”s”s’ksk`”. That fragment is an SKI function to add one. You capture a continuation for the main iteration; run “`i”s’k’cN`”; that’s a fragment that adds “1” to N, and then prints N. Since you’ve captured the continuation, you can re-invoke it on 1+n; the rest captures *another* continuation which is the inner loop for printing “*”s; it does that, and then it invokes the out continuation to go back and do it for “N+1”.
One last, which I’ll leave for you to trace through yourself: this one generates the fibonacci sequence; and for each element of it, it prints out the numbers using a line of asterisks containing “N” asterisks for the number “N”:
“`s“s“sii`ki
`k.*“s“s`ks
“s`k`s`ks“s“s`ks“s`k`s`kr“s`k`sikk
`k“s`ksk
I’ll also point out that there are a ton of variants on Unlambda, ranging from the really interesting ([LazyK][lazyk]) to the bizzarely minimal ([Iota][iota]) to the downright goofy ([Jot][jot]).
[callcc]: http://community.schemewiki.org/?call-with-current-continuation
[ski]: http://goodmath.blogspot.com/2006/05/from-lambda-calculus-to-combinator.html
[curry]: http://goodmath.blogspot.com/2006/05/my-favorite-calculus-lambda-part-1.html
[iota]: http://ling.ucsd.edu/~barker/Iota/
[lazyk]: http://homepages.cwi.nl/~tromp/cl/lazy-k.html
[jot]: http://ling.ucsd.edu/~barker/Iota/#Goedel

From Lambda Calculus to Cartesian Closed Categories

This is one of the last posts in my series on category theory; and it’s a two parter. What I’m going to do in these two posts is show the correspondence between lambda calculus and the [cartesian closed categories][ccc]. If you’re not familiar with lambda calculus, you can take a look at my series of articles on it [here][lambda]. You might also want to follow the CCC link above, to remind yourself about the CCCs. (The short refresher is: a CCC is a category that has an exponent and a product, and is closed over both. The product is an abstract version of cartesian set product; the exponent is an abstraction of the idea of a function, with an “eval” arrow that does function evaluation.)
Today I’m going to show you one half of the correspondence: how the lambda-theory of any simply typed lambda calculus can be mapped onto a CCC. Tomorrow, I’ll show the other direction: how an arbitrary CCC can be mapped onto a lambda-theory.
First, let’s define the term “lambda theory”. In the simply typed lambda calculus, we always have a set of *base types* – the types of simple atomic values that can appear in lambda expressions. A *lambda theory* is a simply typed lambda calculus, plus a set of additional rules that define equivalences over the base types.
So, for example, if one of the base types of a lambda calculus was the natural numbers, the lambda theory would need to include rules to define equality over the natural numbers:
1. x = y if x=0 and y=0; and
2. x = y if x=s(x’) and y=s(y’) and x’ = y’
So. Suppose we have a lambda-theory **L**. We can construct a corresponding category C(**L**). The objects in C(**L**) are the *types* in **L**. The arrows in C(**L**) correspond to *families* of expressions in **L**; an arrow f : A → B corresponds to the set of expressions of type B that contain a single *free* variable of type A.
The *semantics* of the lambda-theory can be defined by a *functor*; in particular, a *cartesian closed* functor F that maps from C(**L**) to the closed cartesian category of Sets. (It’s worth noting that this is completely equivalent to the normal Kripke semantics for lambda calculus; but when you get into more complex lambda calculi, like Hindley-Milner variants, this categorical formulation is *much* simpler.)
We describe how we build the category for the lambda theory in terms of a CCC using something called an *interpretation function*. It’s really just a notation that allows us to describe the translation recursively. The interpretation function is written using brackets: [A] is the categorical interpretation of the type A from lambda calculus.
So, first, we define an object for each type in **L**, including “unit”, which is a special distinguished value used for an expression that doesn’t return anything:
* ∀ A ∈ basetypes(**L**), [A] = AC ∈ C(**L**)
* [unit] = 1C
Next, we need to define the typing rules for complex types:
* [ A × B ] = [A] × [B]
* [ A → B ] = [B][A]
Now for the really interesting part. We need to look at type derivations – that is, the type inference rules of the lambda calculus – to show how to do the correspondences between more complicated expressions. Type derivations are done with a *context* containing a set of *type judgements*. Each type judgement assigns a type to a lambda term. We write type contexts as capital greek letters like Γ. There are two translation rules for contexts:
* [∅] = 1C
* [Γ, x : A] = [Γ] × [A]
We also need to describe what to do with the values of the primitive types:
* For each value v : A, there is an arrow v : 1 → AC.
And now the rest of the rules. Each of these is of the form [Γ :- x : A] = arrow, where we’re saying that Γ entails the type judgement “x : A”. What it means is the object corresponding to the type information covering a type inference for an expression corresponds to the arrow in C(**L**).
* [Γ :- unit : Unit] = ! : [Γ] → [Unit] *(A unit expression is a special arrow “!” to the Unit object)*
* [Γ :- a : AC] = a º ! : [Γ] → [AC] *(A simple value expression is an arrow composing with ! to form an arrow from Γ to the type object of Cs type.)*
* [Γ x : A :- x : A] = π2 : ([Γ ] × [A]) → [A] *(A term which is a free variable of type A is an arrow from the product of Γ and the type object A to A; That is, an unknown value of type A is some arrow whose start point will be inferred by the continued interpretation of gamma, and which ends at A. So this is going to be an arrow from either unit or a parameter type to A – which is a statement that this expression evaluates to a value of type A.)*
* [Γ, x:A :- x’ : A’],x’ ≠ x = [Γ :- x’ : A’] º π1 where π1 : ([Γ] × [A]) → [A’]. *(If the type rules of Γ plus the judgement x : A gives us x’ : A’, then the term x’ : A’ is an arrow starting from the product of the interpretation of the full type context with A), and ending at A’. This is almost the same as the previous rule: it says that this will evaluate to an arrow for an expression that results in type A.)*
* [Γ :- λ x:A . M : A → B] = curry([Γ, x:A :- M:B]) : [Γ] → [B][A] *(A function typing rule: the function maps to an arrow from the type context to an exponential [B][A], which is a function from A to B.)*
* [Γ :- (M M’) : B] = evalC,B º ([Γ :- M : C → B], [Γ :- M’ : C]) : [Γ] → [B] *(A function evaluation rule; it takes the eval arrow from the categorical exponent, and uses it to evaluate out the function.)*
There are also two projection rules for the decomposing categorical products, but they’re basically more of the same, and this is already dense enough.
The intuition behind this is:
* *arrows* between types are families of values. A particular value is a particular arrow from unit to a type object.
* the categorical exponent *is* the same as a function type; a function *value* is an arrow to the type. Evaluating the function is using the categorical exponent’s eval arrow to “decompose” the exponent, and produce an arrow to the function’s result type; *that* arrow is the value that the function evaluates to.
And the semantics – called functorial semantics – maps from the objects in this category, C(**L**) to the category of Sets; function values to function arrows; type objects to sets; values to value objects and arrows. (For example, the natural number type would be an object like the NNO we showed yesterday in C(**L**), and the set of natural numbers is the sets category would be the target of the functor.)
Aside from the fact that this is actually a very clean way of defining the semantics of a not-so-simply typed lambda calculus, it’s also very useful in practice. There is a way of executing lambda calculus expressions as programs that is based on this, called the [Categorical Abstract Machine][cam]. The best performing lambda-calculus based programming language (and my personal all-time-favorite programming language), [Objective-CAML][caml] is based on the CAM. (CAML stands for categorical abstract machine language.)
[cam]: http://portal.acm.org/citation.cfm?id=34883&dl=ACM&coll=portal
[caml]: http://caml.inria.fr/
[lambda]: http://goodmath.blogspot.com/2006/06/lamda-calculus-index.html
[ccc]: http://scienceblogs.com/goodmath/2006/06/categories_products_exponentia_1.php

Metamath and the Peano Induction Axiom

In email, someone pointed me at an automated proof system called [Metamath][metamath]. Metamath generates proofs of mathematical statements using ZF set theory. The proofs are actually relatively easy to follow, which is quite unusual for an automated theorem prover. I’ll definitely write more about Metamath some other time, but I thought it would be interesting today to point to [metamaths proof of the fifth axiom of Peano arithmetic][meta-peano], the principle of induction. Here’s a screenshot of the first 15 steps; following the link to see the whole thing.
peano5-proof.jpg
[metamath]: http://us.metamath.org/mpegif/mmset.html#overview
[meta-peano]: http://us.metamath.org/mpegif/peano5.html

Categorical Numbers

Sorry for the delay in the category theory articles. I’ve been busy with work, and haven’t had time to do the research to be able to properly write up the last major topic that I plan to cover in cat theory. While doing my research on closed cartesian categories and lambda calculus, I came across this little tidbit that I hadn’t seen before, and I thought it would be worth taking a brief diversion to look at.
Category theory is sufficiently powerful that you can build all of mathematics from it. This is something that we’ve known for a long time. But it’s not really the easiest thing to demonstrate.
But then, I came across this lovely little article on wikipedia. It turns out that you can [build the natural numbers entirely in terms of categories][wikipedia] in a beautifully simple way.
Before we dive into categorical numbers, let’s take a moment to remember what we really mean by natural numbers.
Peano Arithmetic
——————
The meanings of natural numbers are normally defined in terms of what we call *Peano arithmetic*. Peano arithmetic defines the natural numbers, **N** in terms of five fundamental axioms:
1. 0 ∈ **N** (0 is a natural number.)
2. ∀ a ∈ **N** ∃ a’=s(n), a’ ∈ **N**. (Every natural number a has a successor, s(a))
3. ¬ ∃ a ∈ **N** : s(a)=0. (No natural number has 0 as its successor.)
4. ∀ a,b ∈ **N**: s(a)=s(b) ⇒ a=b. (Each number has a unique successor.)
5. (P(0) ∧ ∀ a : P(a) ⇒ P(s(a))) ⇒ ∀ b ∈ **N** : P(b). (If a property P is true for 0; and it can be shown that for any number a, if it’s true for a, then it’s true for b, then that means that it’s true for all natural numbers. This is the *axiom of induction*.)
To get to arithmetic, we need to add two operations: addition and multiplication.
We can define addition recursively quite easily:
1. ∀ x ∈ **N**: x+0 = x
2. ∀ x,y ∈ **N**: s(x)+y=x+s(y)
And once we have addition, we can do multiplication:
1. ∀ x ∈ **N**: x * 0 = 0 ∧ 0 * x = 0
2. ∀ x ∈ **N**: 1 * x = x
3. ∀ x,y ∈ **N**: s(x)*y = (x*y) + y
To sum up: Peano arithmetic is based on a definition of the natural numbers in terms of five fundamental properties, called the Peano axioms, built on “0”, a successor function, and induction. If we have those five properties, we can define addition and multiplication; and given those, we have the ability to build up arbitrary functions, sets, and the other fundamentals of mathematics in terms of the natural numbers.
Peano Naturals and Categories
———————————
The peano axioms can be captured in a category diagram for a relatively simple object called a *natural number object (NNO)*. A natural number object can exist in many categories: basically any category with a terminal object *can* have NNOs. Any closed cartesian category *does* have an NNO.
There are two alternate constructions; a general construct that works for all NNO-containing categories; and a more specific one for closed cartesian categories. (You can show that they are ultimately the same in a CCC.)
I actually think that the general one is easier to understand, so that’s what I’ll describe. This is basically a construction that captures the ideas of successor and recursion.
Given a category C with a terminal object **1**. A *natural number object* is an object *N* with two arrows z : 1 → *N*, and s : *N* → *N*, where z and s have the following property.
(∀ a ∈ Obj(C), (q : 1 → a, f : a → a) ∈ Arrows(C)) : u º z = q ∧ u º s = f º u.
That statement means the same thing as saying that the following diagram commutes:

cat-numbers.jpg

Let’s just walk through that a bit. Think of the object **N** as a set. “s” is successor. So S is an arrow from *N* to *N* that maps each number to its successor. *N* is “rooted” by the terminal object; that gives it its basic structure. The “z” arrow from the terminal object maps to the zero value in *N*. So we’ve got an arrow that selects zero; and we’ve got an arrow that defines the successor.
Think of “q” as a predicate, and “u” as a statement that a value in *N* satisfies the predicate.
So u º z, that is the path from 1 to a by stepping through z and u is a statement that the predicate “q” is true for z (zero); because composing u (the number I’m composed with satisfies q) and z (the number zero) gives us “q” (zero satifies q).
Now the induction rule part. *N* maps to *N* by s – that’s saying that s is an arrow from a number in *N* to its successor. If you have an arrow composition that says “q is true for n”, then f can be composed onto that giving you “q is true for n + 1”.
If for all q, u is unique – that is, there is only *one* mapping where induction works; then you’ve got a categorical structure that satisfies the Peano axioms.
[wikipedia]: http://en.wikipedia.org/wiki/Natural_number_object

φ, the Golden Ratio

Lots of folks have been asking me to write about φ, the golden ratio. I’m finally giving up and doing it. I’m not a big fan of φ. It’s a number
which has been adopted by all sorts of flakes and crazies, and there are alleged sightings of it in all sorts of strange places that are simply *not* real. For example, pyramid loons claim that the great pyramids in Egypt have proportions that come from φ – but they don’t. Animal mystics claim that the ratio of drones to larvae in a beehive is approximately φ – but it isn’t.
But it is an interesting number. My own personal reason for thinking it’s cool is representational: if you write φ as a continued fraction, it’s [1;1,1,1,1…]; and if you write it as a continued square root, it’s 1 + sqrt(1+sqrt(1+sqrt(1+sqrt(1…)))).
What is φ?
—————–
φ is the value so-called “golden ratio”: it’s the number that is a solution for the equation (a+b)/a = (a/b). (1+sqrt(5))/2. It’s the ratio where if you take a rectangle where the ratio of the length of the sides is 1:φ, then if you remove the largest possible square from it, you’ll get another rectangle whose sides have the ration φ:1. If you take the largest square from that, you’ll get a rectangle whose sides have the ratio 1:φ. And so on:

phi.jpg

Allegedly, it’s the proportion of sides of a rectangle that produce the most aesthetically beautiful appearance. I’m not enough of a visual artist to judge that, so I’ve always just taken that on faith. But it *does* shows up in many places in geometry. For example, if you draw a five-pointed star, the ratio of the length of one of the point-to-point lines of the star to the length of the sides of the pentagon inside the star are φ:1:

pentaphi.jpg

φ is also related to the fibonacci series. In case you don’t remember, the fibonacci series is the set of numbers where each number in the series is the sum of the two previous: 1,1,2,3,5,8,13,… If Fib(n) is the *n*th number in the series, you can compute it as:
phi-fix-eq.gif
History

Quaternions: upping the dimensions of complex numbers

Quaternions
Last week, after I wrote about complex numbers, a bunch of folks wrote and said “Do quaternions next!” My basic reaction was “Huh?” I somehow managed to get by without ever being exposed to quaternions before. They’re quite interesting things.
The basic idea behind quaterions is: we see some amazing things happen when we expand the dimensionality of numbers from 1 (the real numbers) to 2 (the complex numbers). What if we add *more* dimensions?
It doesn’t work for three dimensions But you *can* create a system of numbers in *four* dimensions. As with complex numbers, you need a distinct unit for each dimension. In complex, those were 1 and i. For quaternions, you need for units: we’ll call them “1”, “i”, “j”, and “k”. In quaternion math, the units have the following properties:
1. i2 = j2 = k2 = ijk = -1
2. ij = k, jk = i, and ki=j
3. ji = -k, kj = -i, and ik=-j
No, that last one is *not* an error. Quaternion multiplication is *not* commutative: ab &neq; ba. It *is* associative over multiplication, meaning (a*b)*c = a*(b*c). And fortunately, it’s both commutative and associative over addition.
Anyway, we write a quaternion as “a + bi + cj + dk”; as in “2 + 4i -3j + 7k”.
Aside from the commutativity thing, quaternions work well as numbers; they’re a non-abelian group over multiplication (assuming you omit 0); they’re an abelian group over addition; they form a division algebra *over* the reals (meaning that you can define an algebra with operations up to multiplication and division over the quaternions, and operations in this algebra will generate the same results as normal real algebra if the i,j, and k components are 0.)
History
———
Quaternions are a relatively recent creation, so we know the history quite precisely. Quaternions were invented by a gentleman named William Hamilton on October 16th, 1843, while walking over the Broom bridge with his wife. He had been working on trying to work out something like the complex numbers with more dimensions for a while, and the crucial identity “ijk=-1” struck him while he was walking over the bridge. He immediately *carved it into the bridge* before he forgot it. Fortunately, this act of vandalism wasn’t taken seriously: the Broom bridge now has a plaque commemorating it!
Quaternion_Plague_on_Broom_Bridge.jpg
Quaternions were controversial from the start. The fundamental step that Hamilton had taken in making quaternions work was abandoning commutativity, and many people believed that this was simply *wrong*. For example, Lord Kelvin said “Quaternions came from Hamilton after his really good work had been done; and, though beautifully ingenious, have been an unmixed evil to those who have touched them in any way.”
Why are they non-commutative?
——————————-
Quaternions are decidedly strange things. As I mentioned above, they’re non-commutative. Doing math with things where ab &neq; ba was a *huge* step, and it can be strange. But it is necessary for quaternions. Here’s why.
Let’s take the fundamental identity of quaternions: ijk = -1
Now, let’s multiply both sides by k: ij(k2) = -1k
So, ij(-1)=-k, and ij=k.
Now, multiply by *i* on the left on both sides: i2j = ik
And so, -1j = ik, and ik=-j.
Keep playing with multiplication in the identities, and you find that the identities simple *do not work* unless it’s non-commutative.
What are they good for?
————————-
For the most part, quaternions aren’t used much anymore. Initially, they were used in many places where we now use complex vectors, but that’s mostly faded away. What they *are* used for is all sorts of math that in any way involves *rotation*. Quaternions capture the essential properties of rotation.
To describe a rotation well, you actually need to have *extra* dimensions. To describe an *n*-dimensional rotation, you need *n+2* dimensions. Since in our three dimensional world, we rotate things around a *line* which is two dimensional, the description of the rotation needs *four* dimensions. And further, rotation itself is non-commutative.
Take a book and put it flat on the table, with the binding to the left. Flip it so that the binding stays on the same side, but now you’ve got the back cover on top, upside-down. Then rotate it clockwise 90 degrees on the table. You now have the book with its back cover up, and the binding facing away from you.
Now, start over: book right side up, binding facing left. Rotate it clockwise 90 degrees. Now flip it the same way you did before (over the x axis if you think of left as -x, right as +x, away from you as +y, up as +z, etc.)
rotate.jpg
What’s the result? You get the book with its back cover up, and the binding factor *towards* you. It’s 180 degrees different depending on which order you do it in. And 180 degrees is a reverse equivalent to the sign difference that you get in quaternion multiplication! Quaternions have exactly the right properties for describing rotations – and in particular, the *composition* of rotations.
The use of quaternions for rotations is used in computer graphics (when something rotates on the screen of your PS2, it’s doing quaternion math!); in satellite navigation; and in relativity equations (for symmetry wrt rotation).

Ω: my favorite strange number

Ω is my own personal favorite transcendental number. Ω isn’t really a specific number, but rather a family of related numbers with bizzare properties. It’s the one real transcendental number that I know of that comes from the theory of computation, that is important, and that expresses meaningful fundamental mathematical properties. It’s also deeply non-computable; meaning that not only is it non-computable, but even computing meta-information about it is non-computable. And yet, it’s *almost* computable. It’s just all around awfully cool.
So. What is it Ω?
It’s sometimes called the *halting probability*. The idea of it is that it encodes the *probability* that a given infinitely long random bit string contains a prefix that represents a halting program.
It’s a strange notion, and I’m going to take a few paragraphs to try to elaborate on what that means, before I go into detail about how the number is generated, and what sorts of bizzare properties it has.
Remember that in the theory of computation, one of the most fundamental results is the non-computability of *the halting problem*. The halting problem is the question “Given a program P and input I, if I run P on I, will it ever stop?” You cannot write a program that reads P and I and gives you the answer to the halting problem. It’s impossible. And what’s more, the statement that the halting problem is not computable is actually *equivalent* to the fundamental statement of Gödel’s incompleteness theorem.
Ω is something related to the halting problem, but stranger. The fundamental question of Ω is: if you hand me a string of 0s and 1s, and I’m allowed to look at it one bit at a time, what’s the probability that *eventually* the part that I’ve seen will be a program that eventually stops?
When you look at this definition, your reaction *should* be “Huh? Who cares?”
The catch is that this number – this probability – is a number which is easy to define; it’s not computable; it’s completely *uncompressible*; it’s *normal*.
Let’s take a moment and look at those properties:
1. Non-computable: no program can compute Ω. In fact, beyond a certain value N (which is non-computable!), you cannot compute the value of *any bit* of Ω.
2. Uncompressible: there is no way to represent Ω in a non-infinite way; in fact, there is no way to represent *any substring* of Ω in less bits than there are in that substring.
3. Normal: the digits of Ω are completely random and unpatterned; the value of and digit in Ω is equally likely to be a zero or a one; any selected *pair* of digits is equally likely to be any of the 4 possibilities 00, 01, 10, 100; and so on.
So, now we know a little bit about why Ω is so strange, but we still haven’t really defined it precisely. What is Ω? How does it get these bizzare properties?
Well, as I said at the beginning, Ω isn’t a single number; it’s a family of numbers. The value of *an* Ω is based on two things: an effective (that is, turing equivalent) computing device; and a prefix-free encoding of programs for that computing device as strings of bits.
(The prefix-free bit is important, and it’s also probably not familiar to most people, so let’s take a moment to define it. A prefix-free encoding is an encoding where for any given string which is valid in the encoding, *no prefix* of that string is a valid string in the encoding. If you’re familiar with data compression, Huffman codes are a common example of a prefix-free encoding.)
So let’s assume we have a computing device, which we’ll call φ. We’ll write the result of running φ on a program encoding as the binary number p as &phi(p). And let’s assume that we’ve set up φ so that it only accepts programs in a prefix-free encoding, which we’ll call ε; and the set of programs coded in ε, we’ll call Ε; and we’ll write φ(p)↓ to mean φ(p) halts. Then we can define Ω as:

Ωφ,ε = Σp ∈ Ε|p↓ 2-len(p)

So: for each program in our prefix-free encoding, if it halts, we *add* 2-length(p) to Ω.
Ω is an incredibly odd number. As I said before, it’s random, uncompressible, and has a fully normal distribution of digits.
But where it gets fascinatingly bizzare is when you start considering its computability properties.
Ω is *definable*. We can (and have) provided a specific, precise definition of it. We’ve even described a *procedure* by which you can conceptually generate it. But despite that, it’s *deeply* uncomputable. There are procedures where we can compute a finite prefix of it. But there’s a limit: there’s a point beyond which we cannot compute *any* digits of it. *And* there is no way to compute that point. So, for example, there’s a very interesting paper where the authors [computed the value of Ω for a Minsky machine][omega]; they were able to compute 84 bits of it; but the last 20 are *unreliable*, because it’s uncertain whether or not they’re actually beyond the limit, and so they may be wrong. But we can’t tell!
What does Ω mean? It’s actually something quite meaningful. It’s a number that encodes some of the very deepest information about *what* it’s possible to compute. It gives a way to measure the probability of computability. In a very real sense, it represents the overall nature and structure of computability in terms of a discrete probability. It’s an amazingly dense container of information – as a n infinitely long number and so thoroughly non-compressible, it contains an unmeasurable quantity of information. And we can’t get near most of it!
[omega]: http://www.expmath.org/expmath/volumes/11/11.3/Calude361_370.pdf

Irrational and Transcendental Numbers

If you look at the history of math, there’ve been a lot of disappointments for mathematicians. They always start off with an idea of math as a clean, beautiful, elegant thing. And they seem to often wind up disappointed.
Which leads us into todays strange numbers: irrational and transcendental numbers. Both of them were huge disappointments to the mathematicians who discovered them.
So what are they? We’ll start with the irrationals. They’re numbers that aren’t integers, and that aren’t a ratio of any two integers. So you can’t write them as a normal fraction. If you write them as a continued fraction, then they go on forever. If you write them in decimal form, they go on forever without repeating. π, √2, *e* – they’re all irrational. (Incidentally, the reason that they’re called “irrational” isn’t because they don’t make sense, or because their decimal representation is crazy, but because they can’t be written *as ratios*. My high school algebra teacher who first talked about irrational numbers said that they were irrational because numbers that never repeated in decimal form weren’t sensible.)
The transcendentals are even worse. Transcendental numbers are irrational; but not only can transcendental numbers not be written as a ratio of integers; not only do their decimal forms go on forever without repeating; transcendental numbers are numbers that *can’t* be described by algebraic operations: there’s no finite sequence of multiplications, divisions, additions, subtractions, exponents, and roots that will give you the value of a transcendental number. √2 is not transcendental; *e* is.
History
———–
The first disappointment involving the irrational numbers happened in Greece, around 500 bce. A rather brilliant man by the name of Hippasus, who was part of the school of Pythagoras, was studying roots. He worked out a geometric proof of the fact that √2 could not be written as a ratio of integers. He showed it to his teacher, Pythagoras. Pythagoras was convinced that numbers were clean and perfect, and could not accept the idea of irrational numbers. After analyzing Hippasus’s proof, and being unable to find any error in it, he became so enraged that he *drowned* poor Hippasus.
A few hundred years later, Eudoxus worked out the basic theory of irrationals; and it was published as a part of Euclid’s mathematical texts.
From that point, the study of irrationals pretty much disappeared for nearly 2000 years. It wasn’t until the 17th century that people started really looking at them again. And once again, it led to disappointment; but at least no one got killed this time.
Mathematicians had come up with yet another idea of what the perfection of math meant – this time using algebra. They decided that it made sense that algebra could describe all numbers; so you could write an equation to define any number using a polynomial with rational coefficients; that is, an equation using addition, subtraction, multiple, division, exponents, and roots.
Leibniz was studying algebra and numbers, and he’s the one who made the unfortunate discovery: lots of irrational numbers are algebraic; but lots of them *aren’t*. He discovered it indirectly, by way of the sin function. You see, sin(x) *can’t* be computed from x using algebra. There’s no algebraic function that can compute it. Leibniz called sin a transcendental function, since it went beyond algebra.
Building on the work of Leibniz, Liouville worked out that you could easily construct numbers that couldn’t be computed using algebra. For example, the constant named after Liouville consists of a string of 0s and 1s where 10-i is 1 if/f there is some integer n such that n!=i.
Not too long after, it was discovered that *e* was transcendental. The main reason for this being interesting is because up until that point, every transcendental number known was *constructed*; *e* is a natural, unavoidable constant. Once *e* was shown irrational, others followed; in one neat side-note, π was shown to be transcendental *using e*. One of the properties that they discovered after the transcendence of *e* was that any transcendental number raised to a non-transcendental power was transcendental. Since e is *not* transcendental – it’s 1 – then π must be transcendental.
The final disappointment in this area came soon after; Cantor, studying the irrationals, came up with the infamous “Cantor’s diagonalization” argument, which shows that there are *more* transcendental numbers than there are algebraic ones. *Most* numbers are not only irrational; they’re transcendental.
What does it mean, and why does it matter?
———————————————-
Irrational and transcendental numbers are everywhere. Most numbers aren’t rational. Most numbers aren’t even algebraic. That’s a very strange notion: *we can’t write most numbers down*.
Even stranger, even though we know, per Cantor, that most numbers are transcendental, it’s *incredibly* difficult to prove that any particular number is transcendental. Most of them are; but we can’t even figure out *which ones*.
What does that mean? That our math-fu isn’t nearly as strong as we like to believe. Most numbers are beyond us.
Some interesting numbers that we know are either irrational or transcendental:
1. *e*: transcendental.
2. π: transcendental.
3. √2 : irrational, but algebraic,
4. √x, for all x that are not perfect squares are irrational.
5. log23 is irrational.
6. Ω, Chaitin’s constant, is transcendental.
What’s interesting is that we really don’t know very much about how transcendentals interact; and given the difficulty of proving that something is transcendental, even for the most well-known transcendentals, we don’t know much of what happens when you put them together. π+*e*; π×*e*; ππ; *e**e*, πe are all numbers that we *don’t know* if they;’re transcendental. In fact, for π+*e* (and π-*e*) we don’t even know if it’s *irrational*.
That’s the thing about these number. We have *such* a weak grasp of them that even things that seem like they should be easy and fundamental, *we don’t know how to do*.

Friday Pathological Programming: Programming Through Grammars in Thue

Today’s treat: [Thue][thue], pronounced two-ay, after a mathematician named Axel Thue.
Thue is a language based on [semi-Thue][semi-thue] grammars. semi-Thue grammars are equivalent to Chosmky-0 grammars, except that they’re even more confusing: a semi-Thue grammar doesn’t distinguish between terminal and non-terminal symbols. (Since Chomsky-0 is equivalent to a turing machine, we know that Thue is turing complete.) The basic idea of it is that a program is a set of rewrite rules that tell you how to convert one string into another. The language itself is incredibly simple; programs written in it are positively mind-warping.
There are two parts to a Thue program. First is a set of grammar rules, which are written in what looks like pretty standard BNF: “*<lhs> ::= <rhs>*”. After that, there is a section which is the initial input string for the program. The two parts are separated by the “::=” symbol by itself on a line.
As I said, the rules are pretty much basic BNF: whatever is on the left-hand side can be replaced with whatever is on the right-hand side of the rule, whenever a matching substring is found. There are two exceptions, for input and output. Any rule whose right hand side is “:::” reads a line of input from the user, and replaces its left-hand-side with whatever it read from the input. And any rule that has “~” after the “::=” prints whatever follows the “~” to standard out.
So, a hello world program:
x ::=~ Hello world
::=
x
Pretty simple: the input string is “x”; there’s a replacement for “x” which replaces “x” with nothing, and prints out “Hello world”.
How about something much more interesting. Adding one in binary. This assumes that the binary number starts off surrounded by underscore characters to mark the beginning and end of a line.
1_::=1++
0_::=1
01++::=10
11++::=1++0
_0::=_
_1++::=10
//::=:::
::=
_//_
Let’s tease that apart a bit.
1. Start at the right hand side:
1. “1_::=1++” – If you see a “1” on the right-hand end of the number,
replace it with “1++”, removing the “_” on the end of the line. The rest
of the computation is going to use the “++” to mark the point in
string where the increment is going to alter the string next.
2. “0_::=1” – if you see a “0” on the right-hand-end, then replace it with 1. This doesn’t insert a “++”, so none of the other rules are going to do anything: it’s going to stop. That’s what you want: if the string ended in a zero, then adding one to it just changes that 0 to a 1.
2. Now, we get to something interesting: doing the addition:
1. If the string by the “++” is “01”, then adding one will change it to “10”, and that’s the end of the addition. So we just change it to 10, and get rid of the “++. ”
2. If the string by the “++” is “11”, then we change the last digit to “0”, and move the “++” over to the left – that’s basically like adding 1 to 9 in addition: write down a zero, carry the one to the left.
3. Finally, some termination cleanup. If we get to the left edge, and there’s a zero, after the “_”, we can get rid of it. If we get to the left edge, and there’s a “1++”, then we replace it with 10 – basically adding a new digit to the far left; and get rid of the “++”, because we’re done.
3. Finally, inputting the number “//::=:::” says “If you see “//”, then read some input from the user, and replace the “//” with whatever you read.
4. Now we have the marker for the end of the rules section, and the string to start the program is “_//_”. So that triggers the input rule, which inputs a binary number, and puts it between the “_”s.
Let’s trace this on a couple of smallish numbers.
* Input: “111”
1. “_111_” → “_111++”
2. “_111++_” → “_11++0”
3. “_11++0” → “_1++00”
4. “_1++00” → “1000”
* Input: “1011”
1. “_1011_” → “_1011++”
2. “_1011++” → “_101++0”
3. “_101++0” → “_1100”.
Not so hard, eh?
Decrement is the same sort of thing; here’s decrement with an argument hardcoded instead of reading from input:
0_::=0–
1_::=0
10–::=01
00–::=0–1
_1–::=@
_0–::=1
_0::=
::=
_1000000000000_
Now, something more complicated. From [here][vogel], a very nicely documented program that computes fibonacci numbers. It’s an interesting mess – it converts things into a unary form, does the computation in unary, then converts back. This program uses a fairly neat trick to get comments. Thue doesn’t have any comment syntax. But they use “#” on the left hand side as a comment marker, and then make sure that it never appears anywhere else – so none of the comment rules can ever be invoked.
—-
#::= fibnth.t – Print the nth fibonacci number, n input in decimal
#::= (C) 2003 Laurent Vogel
#::= GPL version 2 or later (http://www.gnu.org/copyleft/gpl.html)
#::= Thue info at: http://www.catseye.mb.ca/esoteric/thue/
#::= This is modified thue where only foo::=~ outputs a newline.
#::= ‘n’ converts from decimal to unary-coded decimal (UCD).
n9::=*********,n
n8::=********,n
n7::=*******,n
n6::=******,n
n5::=*****,n
n4::=****,n
n3::=***,n
n2::=**,n
n1::=*,n
n0::=,n
n-::=-
#::= ‘.’ prints an UCD number and eats it.
.*********,::=.~9
.********,::=.~8
.*******,::=.~7
.******,::=.~6
.*****,::=.~5
.****,::=.~4
.***,::=.~3
.**,::=.~2
.*,::=.~1
.,::=.~0
~9::=~9
~8::=~8
~7::=~7
~6::=~6
~5::=~5
~4::=~4
~3::=~3
~2::=~2
~1::=~1
~0::=~0
#::= messages moving over ‘,’, ‘*’, ‘|’
*<::=<*
,<::=<,
|<::=*::=*0>
0>,::=,0>
0>|::=|0>
1>*::=*1>
1>,::=,1>
1>|::=|1>
2>*::=*2>
2>,::=,2>
2>|::=|2>
#::= Decrement n. if n is >= 0, send message ‘2>’;
#::= when n becomes negative, we delete the garbage using ‘z’ and
#::= print the left number using ‘.’.
*,-::=,2>
,,-::=,-*********,
|,-::=z
z*::=z
z,::=z
z|::=.
#::= move the left number to the right, reversing it (0 becomes 9, …)
#::= reversal is performed to help detect the need for carry. As the
#::= order of rule triggering is undetermined in Thue, a rule matching
#::= nine consecutive * would not work.
*c<::=c
,c<::=c
|c
0>d*::=d
1>d::=d*********,
#::= when the copy is done, ‘d’ is at the right place to begin the sum.
2>d::=f<|
#::= add left to right. 'f' moves left char by char when prompted by a '<'.
#::= it tells 'g' to its right what the next char is.
*f
,f
#::= when done for this sum, decrement nth.
|f’ ‘g’ drops an ‘i’ that increments the current digit.
0>g::=ig
*i::=<
,i::=i,*********
|i::=’ tells ‘g’ to move left to the next decimal position,
#::= adding digit 0 if needed (i.e. nine ‘*’ in reversed UCD)
*1>g::=1>g*
,1>g::=g::=|’ tells ‘g’ that the sum is done. We then prepare for copy (moving
#::= a copy cursor ‘c’ to the correct place at the beginning of the left
#::= number) and reverse (using ‘j’) the right number back.
*2>g::=2>g*
,2>g::=2>g,
|2>g::=c|*********j
#::= ‘j’ reverses the right number back, then leaves a copy receiver ‘d’
#::= and behind it a sum receiver ‘g’.
*j*::=j
j,*::=,********j
j,,::=,*********j,
j,|::=’ sent by ‘-‘
#::= will start when hitting ‘g’ the first swap + sum cycle
#::= for numbers 0 (left) and 1 (reversed, right).
::=
|n?-|,|********g,|
——
Ok. Now, here’s where we go *really* pathological. Here, in all it’s wonderful glory, is a [Brainfuck][brainfuck] interpreter written in Thue, written by Mr. Frederic van der Plancke. (You can get the original version, with documentation and example programs, along with a Thue interpreter written in Python from [here][vdp-thue])
——
#::=# BRAINFUNCT interpreter in THUE
#::=# by Frederic van der Plancke; released to the Public Domain.
o0::=0o
i0::=0i
o1::=1o
i1::=1i
o]::=]o
i]::=]i
o[::=[o
i[::=[i
oz::=zo
oZ::=Zo
iz::=zi
iZ::=Zi
0z::=z0
1z::=z1
]z::=z]
[z::=z[
_z::=z_
0Z::=Z0
1Z::=Z1
]Z::=Z]
[Z::=Z[
_Z::=Z_
}}]::=}]}
&}]::=]&}
&]::=]^
}[::=[{
&[::=[0::=0}
{1::=1{
?Z[::=[&
?z[::=[^
?z]::=]
?Z]::=]^
[{{::={[{
[{::=[
[::=[^
]{::={]
]::={]
0::=
1::=1
0{::={0
1{::={1
^[::=?[iii
^]::=?]iii
}|::=}]|WARN]WARN
&|::=&]|WARN]WARN
WARN]WARN::=~ERROR: missing ‘]’ in brainfunct program; inserted at end
^000::=000^ooo
^001::=001^ooi
^010::=010^oio
^011::=011^oii
^100::=100^ioo
^101::=101^ioi
ooo|::=|ooo
ooi|::=|ooi
oio|::=iio|oio
oii|::=|oii
ioo|::=|ioo
ioi|::=|ioi
iii|::=|iii
iio|!::=|
|z::=z|
|Z::=Z|
o_::=_o
i_::=_i
0!::=!0
1!::=!1
_!::=!_
/!::=!/
oooX!::=Xooo
oooY::=$Y
0$::=!1
1$::=$0
X$::=X!1
ooiX!::=Xooi
ooiY::=@1Y
1@1::=@00
0@1::=@11
1@0::=@01
0@0::=@00
X@1::=X!WARNDEC0WARN
WARNDEC0WARN::=~WARNING: attempt at decrementing zero (result is still zero)
X@00::=X@0
X@01::=X!1
X@0Y::=X!0Y
oioX!::=LLYLR
0LL::=LL0
1LL::=LL1
_LL::=!X!
|LL::=|!0X!
LR0::=0LR
LR1::=1LR
LRY::=_
oiiX!::=_oii
oiiY::=X!%
%0::=0%
%1::=1%
%_0::=Y0
%_1::=Y1
%_/::=Y0_/
iooX!::=X(in)
(in)0::=(in)
(in)1::=(in)
(in)Y::=Yi
i/0::=Zi/
i/1::=zi/
i/_::=!/
XY!::=X!0Y
0Y!::=0!Y
1Y!::=1!Y
YZ::=0Y
Yz::=1Y
ioiX!::=X(out)
(out)0::=0(out)O0
(out)1::=1(out)O1
(out)Y::=OO!Y
O0::=~0
O1::=~1
OO::=~_
iiiX!0::=ZX!0
iiiX!1::=zX!1
+::=000
-::=001
::=011
,::=100
.::=101
:::=|X!0Y0_/
::=
^*
[vdp-thue]: http://fvdp.homestead.com/files/eso_index.html#BFThue
[thue]: http://esoteric.voxelperfect.net/wiki/Thue
[semi-thue]: http://en.wikipedia.org/wiki/Semi-Thue_grammar
[brainfuck]: http://scienceblogs.com/goodmath/2006/07/gmbm_friday_pathological_progr.php
[vogel]: http://lvogel.free.fr/thue.htm

Something Nifty: A Taste of Simple Continued Fractions

One of the annoying things about how we write numbers is the fact that we generally write things one of two ways: as fractions, or as decimals.
You might want to ask, “Why is that annoying?” (And in fact, that’s what I want you to ask, or else there’s no point in my writing the rest of this!)
It’s annoying because both fractions and decimals can both only describe
*rational* numbers – that is, numbers that are a perfect ratio of two integers. And *most* numbers aren’t rational.
But it’s even more annoying than that: if you use decimals, then there are lots of rational numbers that you can’t represent exactly (i.e., 1/3); and if you use fractions, then it’s hard to express the idea that the fraction isn’t exact. (How do you write π as a fraction? 22/7 is a standard fractional approximation, but how do you say π, which is *almost* 22/7?)
So what do we do?
One of the answers is something called *continued fractions*. A continued fraction is a very neat thing. Here’s the idea: take a number where you don’t know it’s fractional form. Pick the nearest simple fraction 1/n that’s just a *little bit too large*. If you were looking at, say, 0.4, you’d take 1/2 – it’s a bit bigger. Now – you’ve got an approximation, but it’s too large. So that means that the demoninator is *too small*. So you add a correction to the denominator to make it a little bit bigger. And you just keep doing that – you approximate the correction to the denominator by adding a fraction to the denominator that’s just a little too big, and then you add a correction to *that* correction.
Let’s look at an example: 2.3456
1. It’s close to 2. So we start with 2 + (0.3456)
2. Now, we start approximating the fraction. The way we do that is we take the *reciprocal* of 0.3456 and take the integer part of it: 1/0.3456 rounded down is 2 . So we make it 2 + 1/2; and we know that the denominator is off by .3088.
3. We take the reciprocal again, and get 3, and it’s off by .736
4. We take the reciprocal again, and get 1, and it’s off by 0.264
5. Next we get 3, but it’s off by 208/1000
6. Then 4, off by 0.168
7. Then 5, off by .16
8. Then 6, off by .25
9. Then 4, off by 0; so now we have an exact result.
So as a continued fraction, 2.3456 looks like:
continued.jpg
As a shorthand, continued fractions are normally written using a list notation inside of square brackets: the integer part, following by a semicolon, followed by a comma-separated list of the denominators of each of the fractions. So our continued fraction for 2.3456 would be written [2; 2, 3, 1, 3, 4, 5, 6, 4].
There’s a very cool visual way of understanding that algorithm. I’m not going to show it for 2.3456, because it’s a bit too much… But let’s look at something simpler: let’s try to write 9/16ths as a continued fraction. Basically, we make a grid consisting of 16 squares across by 9 squares up and down. We draw the *largest* square we can on that grid. The number of squares of that size that we can draw is the first digit of the continued fraction. Now there’s a rectangle left over: we draw the largest squares we can, there. And so on:

square-continued.jpg

So the continued fraction for 9/16ths is [0; 1, 1, 3, 2].
Using continued fractions, we can represent *any* rational number in a finite-length continued fraction.
One incredibly nifty thing about this way of writing numbers is: what’s the reciprocal of 2.3456, aka [2; 2, 3, 1, 3, 4, 5, 6, 4]? It’s [0; 2, 2, 3, 1, 3, 4, 5, 6, 4]. We just add a zero to the front as the integer part, and push everything else one place to the right. If it was a zero in front, then we would have removed the zero and pulled everything else one place to the left.
Irrational numbers are represented as *infinite* continued fractions. So there’s an infinite series of correction fractions. You can understand it as a series of every-improving approximations of the value of the number. And you can define it using a recurrence relation (that is, a recursive formula) for how to get to the next digit.
For example, π = [3; 7, 15, 1, 292, 1, …]. If we work that out, the first six places of the continued fraction for pi work out in decimal form to 3.14159265392. That’s correct to the first *11* places in decimal. Not bad, eh?
A very cool property of the continued fractions is: square roots written as continued fractions *always repeat*. Even cooler? What’s the square root of two as a continued fraction? [1; 2, 2, 2, 2, …. ].