Category Archives: Good Math

Interpreting Lambda Calculus using Closed Cartesian Categories

Today I’m going to show you the basic idea behind the equivalency of closed cartesian categories and typed lambda calculus. I’ll do that by showing you how the λ-theory of any simply typed lambda calculus can be mapped onto a CCC.

First, let’s define the term “lambda theory”. In the simply typed lambda calculus, we always have a set of base types – the types of simple atomic values that can appear in lambda expressions. A lambda theory is a simply typed lambda calculus, plus a set of additional rules that define equivalences over the base types.

So, for example, if one of the base types of a lambda calculus was the natural numbers, the lambda theory would need to include rules to define equality over the natural numbers:

x = y if x=0 and y=0; and
x = y if x=s(x’) and y=s(y’) and x’ = y’
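To make those two rules concrete, here is a minimal Python sketch of them. The names `Zero`, `Succ`, `nat_eq` and `from_int` are my own inventions for illustration, not part of any lambda theory; the point is only that the equality check follows the two rules case by case.

```python
from dataclasses import dataclass


class Nat:
    """Hypothetical base class for Peano-style naturals."""


@dataclass(frozen=True)
class Zero(Nat):
    pass


@dataclass(frozen=True)
class Succ(Nat):
    pred: Nat


def nat_eq(x, y):
    # Rule 1: x = y if x = 0 and y = 0.
    if isinstance(x, Zero) and isinstance(y, Zero):
        return True
    # Rule 2: x = y if x = s(x'), y = s(y') and x' = y'.
    if isinstance(x, Succ) and isinstance(y, Succ):
        return nat_eq(x.pred, y.pred)
    # Neither rule applies, so the values are not equal.
    return False


def from_int(n):
    # Helper for building s(s(...s(0))) from a Python int.
    result = Zero()
    for _ in range(n):
        result = Succ(result)
    return result
```

For example, `nat_eq(from_int(3), from_int(3))` holds by applying the second rule three times and then the first.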

So, suppose we have a lambda-theory $L$. We can construct a corresponding category $C(L)$. The objects in $C(L)$ are the types in $L$. The arrows in $C(L)$ correspond to families of expressions in $L$; an arrow $f : A \rightarrow B$ corresponds to the set of expressions of type $B$ that contain a single free variable of type $A$.

The semantics of the lambda-theory can be defined by a functor; in particular, a cartesian closed functor $F$ that maps from $C(L)$ to the closed cartesian category of Sets. (It’s worth noting that this is completely equivalent to the normal Kripke semantics for lambda calculus; but when you get into more complex lambda calculi, like Hindley-Milner variants, this categorical formulation is much simpler.)

We describe how we build the category for the lambda theory in terms of a CCC using something called an interpretation function. It’s really just a notation that allows us to describe the translation recursively. The interpretation function is written using brackets: $[A]$ is the categorical interpretation of the type $A$ from lambda calculus.

So, first, we define an object for each type in $L$. We need to include a special type, which we call unit. The idea behind unit is that we need to be able to talk about “functions” that either don’t take any real parameters, or don’t return anything. Unit is a type which contains exactly one atomic value. Since there’s only one possible value for unit, and unit doesn’t have any extractable sub-values, conceptually, it doesn’t ever need to be passed around. So it’s a “value” that never needs to get passed – perfect for a content-free placeholder.

Anyway, here we go with the base rules:

$\forall A \in \mbox{basetypes}(L), [A] = A_C \in C(L)$
$[\mbox{unit}] = 1_C$

Next, we need to define the typing rules for complex types:

$[A \times B] = [A] \times [B]$
$[A \rightarrow B] = [B]^{[A]}$

Now for the really interesting part. We need to look at type derivations – that is, the type inference rules of the lambda calculus – to show how to do the correspondences between more complicated expressions. Just like we did in lambda calculus, the type derivations are done with a context $\Gamma$, containing a set of type judgements. Each type judgement assigns a type to a lambda term. There are two translation rules for contexts:

$[ \emptyset ] = 1_C$
$[\Gamma, x: A] = [\Gamma] \times [A]$
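The type and context rules so far are a straightforward recursion, which a short Python sketch can show. Everything here is an assumption for illustration: the class names (`Base`, `Unit`, `Prod`, `Arrow`) are hypothetical, and objects of $C(L)$ are just rendered as strings, with `x` for product and `^` for the exponential.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Base:
    name: str


@dataclass(frozen=True)
class Unit:
    pass


@dataclass(frozen=True)
class Prod:
    left: object
    right: object


@dataclass(frozen=True)
class Arrow:
    source: object
    target: object


def interp_type(t):
    """Render the object [t] of C(L) as a string."""
    if isinstance(t, Base):
        return f"{t.name}_C"                      # [A] = A_C for base types
    if isinstance(t, Unit):
        return "1_C"                              # [unit] = 1_C
    if isinstance(t, Prod):
        return f"({interp_type(t.left)} x {interp_type(t.right)})"
    if isinstance(t, Arrow):
        # [A -> B] = [B]^[A]: the target is the base of the exponential.
        return f"({interp_type(t.target)} ^ {interp_type(t.source)})"
    raise TypeError(f"not a type: {t!r}")


def interp_context(ctx):
    """ctx is a list of (variable name, type) pairs, oldest binding first."""
    obj = "1_C"                                   # the empty context is 1_C
    for _name, t in ctx:
        obj = f"({obj} x {interp_type(t)})"       # [G, x:A] = [G] x [A]
    return obj
```

So a context with one variable of base type `N` becomes the product of the terminal object with `N_C`.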

We also need to describe what to do with the values of the primitive types:

For each value $v : A$ , there is an arrow $v : 1 \rightarrow A_C$ .

And now the rest of the rules. Each of these is of the form $[\Gamma :- x : A] = \mbox{arrow}$, where we’re saying that $\Gamma$ entails the type judgement $x : A$. What it means is that the derivation of a type judgement for an expression is interpreted as an arrow in $C(L)$, running from the object for the context to the object for the expression’s type.

Unit evaluation: $[ \Gamma :- \mbox{unit}: \mbox{Unit}] = !: [\Gamma] \rightarrow [\mbox{Unit}]$. (A unit expression is a special arrow “!” to the unit object.)
Simple Typed Expressions: $[\Gamma :- a: A] = a \circ ! : [\Gamma] \rightarrow A_C$. (A simple value expression is the value’s arrow composed with ! to form an arrow from Γ to the type object of the value’s type.)
Free Variables: $[\Gamma, x: A :- x : A] = \pi_2 : ([\Gamma] \times [A]) \rightarrow [A]$. (A term which is a free variable of type A is an arrow from the product of Γ and the type object A to A; that is, an unknown value of type A is some arrow whose start point will be inferred by the continued interpretation of Γ, and which ends at A. So this is going to be an arrow from either unit or a parameter type to A – which is a statement that this expression evaluates to a value of type A.)
Inferred typed expressions: $[\Gamma, x:A :- x' : A'] = [\Gamma :- x' : A'] \circ \pi_1$, where $\pi_1: ([\Gamma] \times [A]) \rightarrow [\Gamma]$. (If the type rules of Γ give us $x' : A'$ for some other variable $x'$, then adding the judgement $x : A$ doesn’t change that: the term $x'$ is an arrow starting from the product of the interpretation of the full type context with $A$, and ending at $A'$. This is almost the same as the previous rule: it says that this will evaluate to an arrow for an expression that results in type $A'$.)
Function Abstraction: $[\Gamma :- \lambda x:A . M : A \rightarrow B] = \mbox{curry}([\Gamma, x:A :- M:B]) : [\Gamma] \rightarrow [B]^{[A]}$. (A function maps to an arrow from the type context to an exponential $[B]^{[A]}$, which is a function from $A$ to $B$.)
Function application: $[\Gamma :- (M M'): B] = \mbox{eval}_{C,B} \circ ([\Gamma :- M : C \rightarrow B], [\Gamma :- M': C]) : [\Gamma] \rightarrow [B]$. (Function evaluation takes the eval arrow from the categorical exponent, and uses it to evaluate out the function.)

There are also two projection rules for the decomposing categorical products, but they’re basically more of the same, and this is already dense enough.
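If you read the rules above in the category of Sets, they turn into a tiny evaluator. The following Python sketch is my own illustration under some loud assumptions: terms are plain tuples, types are omitted entirely, the object for a context is a nested pair `(env, value)` with `()` for the empty context, and an arrow out of a context is just a Python function on such environments.

```python
def interp(ctx, term):
    """Interpret a term as an arrow from the context object to a value.

    ctx is a list of variable names, oldest first. The returned arrow is
    a function on nested-pair environments: () or (env, value).
    """
    kind = term[0]
    if kind == "unit":
        return lambda env: ()                      # the arrow "!"
    if kind == "const":
        value = term[1]
        return lambda env: value                   # value arrow composed with "!"
    if kind == "var":
        name = term[1]
        if not ctx:
            raise NameError(name)
        if ctx[-1] == name:
            return lambda env: env[1]              # second projection
        rest = interp(ctx[:-1], term)
        return lambda env: rest(env[0])            # compose with first projection
    if kind == "lam":
        _, name, body = term
        body_arrow = interp(ctx + [name], body)
        # curry: an arrow from the context into the exponential
        return lambda env: (lambda a: body_arrow((env, a)))
    if kind == "app":
        _, fn, arg = term
        fn_arrow = interp(ctx, fn)
        arg_arrow = interp(ctx, arg)
        # eval applied to the pair of the two arrows
        return lambda env: fn_arrow(env)(arg_arrow(env))
    raise ValueError(f"unknown term: {term!r}")


identity = ("lam", "x", ("var", "x"))
const_fn = ("lam", "x", ("lam", "y", ("var", "x")))
```

Applying the resulting arrow to the empty environment `()` gives the value of a closed term, for instance `interp([], ("app", identity, ("const", 3)))(())`.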

The intuition behind this is:

Arrows between types are families of values. A particular value is a particular arrow from unit to a type object.
The categorical exponent in a CCC is exactly the same thing as a function type in λ-calculus; and an arrow to an exponent is the same thing as a function value. Evaluating the function is using the categorical exponent’s eval arrow to “decompose” the exponent, and produce an arrow to the function’s result type; that arrow is the value that the function evaluates to.
And the semantics – called functorial semantics – maps from the objects in this category, $C(L)$, to the category of Sets; function values to function arrows; type objects to sets; values to value objects and arrows. (For example, the natural number type would be an object in $C(L)$, and the set of natural numbers in the sets category would be the target of the functor.)

Aside from the fact that this is actually a very clean way of defining the semantics of a not-so-simply typed lambda calculus, it’s also very useful in practice. There is a way of executing lambda calculus expressions as programs that is based on this, called the Categorical Abstract Machine. The best performing lambda-calculus based programming language (and my personal all-time-favorite programming language), Objective Caml, descends from Caml, whose first implementation was based on the CAM. (CAML stands for categorical abstract machine language.)

From this, you can see how the CCCs and λ-calculus are related. It turns out that that relation is not just cool, but downright useful. Concepts from category theory – like monads, pullbacks, and functors – are really useful things in programming languages! In some later posts, I’ll talk a bit about that. My current favorite programming language, Scala, is one of the languages where there’s a very active stream of work in applying categorical ideas to real-world programming problems.

Programs As Proofs: Models and Types in the Lambda Calculus

Lambda calculus started off with the simple, untyped lambda calculus that we’ve been talking about so far. But one of the great open questions about lambda calculus was: was it sound? Did it have a valid model?

Church found that it was easy to produce some strange and nonsensical expressions using the simple lambda calculus. In order to try to work around those problems, and end up with a consistent system, Church introduced the concept of types, producing the simply typed lambda calculus. Once types hit the scene, things really went wild; the type systems for lambda calculi have never stopped developing: people are still finding new things to do by extending the LC type system today! Most lambda calculus based programming languages are based on the Hindley-Milner lambda calculus, which is a simplification of one of the standard sophisticated typed lambda calculi called System F. There’s even a Lambda Cube which can categorize the different type abstractions for lambda calculus (but alas, despite its name, it’s not related to the time cube.) Once people really started to understand types, they realized that the untyped lambda calculus was really just a pathologically simple instance of the simply typed lambda calculus: a typed LC with only one base type.

The semantics of lambda calculus are easiest to talk about in a typed version. For now, I’ll talk about the simplest typed LC, known as the simply typed lambda calculus. One of the really amazing things about this, which I’ll show, is that a simply typed lambda calculus is completely semantically equivalent to an intuitionistic propositional logic: each type in the program is a proposition in the logic; each β reduction corresponds to an inference step; and each complete function corresponds to a proof! Look below for how.

Types

The main thing that typed lambda calculus adds to the mix is a concept called base types. In a typed lambda calculus, you have some universe of atomic values which you can manipulate; those values are partitioned into the *base types*. Base types are usually named by single lower-case Greek letters: so, for example, we could have a type “σ”, which consists of the set of natural numbers; a type “τ” which corresponds to boolean true/false values; and a type “γ” which corresponds to strings.

Once we have basic types, then we can talk about the type of a function. A function maps from a value of one type (the type of parameter) to a value of a second type (the type of the return value). For a function that takes a parameter of type “γ”, and returns a value of type “δ”, we write its type as “γ → δ”. “→” is called the _function type constructor_; it associates to the right, so “γ → δ → ε” parses as “γ → (δ → ε)”

To apply types to the lambda calculus, we do a couple of things. First, we need a syntax update so that we can include type information in lambda terms. And second, we need to add a set of rules to show what it means for a typed program to be valid.

The syntax part is easy. We add a “:” to the notation; the colon has an expression or variable binding on its left, and a type specification on its right. It asserts that whatever is on the left side of the colon has the type specified on the right side. A few examples:

$\lambda x: \nu . x + 3$: This asserts that the parameter, $x$, has type $\nu$, which we’ll use as the type name for the natural numbers. (In case it’s hard to tell, that’s a Greek letter “nu” for natural.) There is no assertion of the type of the result of the function; but since we know that “+” is a function with type $\nu \rightarrow \nu \rightarrow \nu$, we can infer that the result type of this function will be $\nu$.
$(\lambda x . x + 3): \nu \rightarrow \nu$: This is the same as the previous, but with the type declaration moved out, so that it asserts the type for the lambda expression as a whole. This time we can infer that $x : \nu$ because the function type is $\nu \rightarrow \nu$ , which means that the function parameter has type $\nu$ .
$\lambda x: \nu, y:\delta . \text{if}\hspace{1ex} y\hspace{1ex} \text{then}\hspace{1ex} x * x \hspace{1ex}\text{else} \hspace{1ex} x$: This is a two parameter function; the first parameter has type ν, and the second has type δ. We can infer the return type, which is ν. So the type of the full function is ν → δ → ν. This may seem surprising at first; but remember that lambda calculus really works in terms of single parameter functions; multi-parameter functions are a shorthand for currying. So really, this function is: λ x : ν . (λ y : δ . if y then x * x else x); the inner lambda is type δ → ν; the outer lambda is type ν → (δ → ν).
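The currying in that last example is easy to see in ordinary code. Here is a small Python sketch of it; the function name `select` is hypothetical, and Python ints and bools stand in for ν and δ.

```python
def select(x):
    # Outer lambda: takes x of type nu and returns a function delta -> nu.
    def inner(y):
        # Inner lambda: takes y of type delta and returns a value of type nu.
        return x * x if y else x
    return inner
```

Calling `select(4)` gives back the inner function, and `select(4)(True)` supplies the second parameter, which is the same shape as ν → (δ → ν).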

To talk about whether a program is valid with respect to types (aka well-typed), we need to introduce a set of rules for type inference. Then we can verify that the program is type-consistent.

In type inference, we talk about judgements. When we can infer the type of an expression using an inference rule, we call that inference a type judgement. Type inference and judgements allow us to reason about types in a lambda expression; and if any part of an expression winds up with an inconsistent type judgement, then the expression is invalid. (When Church started doing typed LC, one of the motivations was to distinguish between values representing “atoms”, and values representing “predicates”; he was trying to avoid the Gödel-esque paradoxes, by using types to ensure that predicates couldn’t operate on predicates.)

Type judgements are usually written in a sequent-based notation, which looks like a fraction where the numerator consists of statements that we know to be true; and the denominator is what we can infer from the numerator. In the numerator, we normally have statements using a context, which is a set of type judgements that we already know; it’s usually written as an uppercase Greek letter. If a type context $\Gamma$ includes the judgement that $x : \alpha$, I’ll write that as $\Gamma :- x : \alpha$.

Rule 1: Type Identity

$\frac{\mbox{}}{x : \alpha :- x: \alpha}$

This is the simplest rule: if we have no other information except a declaration of the type of a variable, then we know that that variable has the type it was declared with.

Rule 2: Type Invariance

$\frac{ \Gamma :- x:\alpha, x \neq y}{\Gamma + y:\beta :- x:\alpha}$

This rule is a statement of non-interference. If we know that $x:\alpha$ , then inferring a type judgement about any other term cannot change our type judgement for $x$ .

Rule 3: Function Type Inference

$\frac{\Gamma + x:\alpha :- y:\beta}{\Gamma :- (\lambda x:\alpha. y):\alpha \rightarrow \beta}$

This statement allows us to infer function types given parameter types. If we know the type of the parameter to a function is $\alpha$; and if, with our knowledge of the parameter type, we know that the type of term that makes up the body of the function is $\beta$, then we know that the type of the function is $\alpha \rightarrow \beta$.

Rule 4: Function Application Inference

$\frac{\Gamma :- x: \alpha \rightarrow \beta, \Gamma :- y:\alpha}{\Gamma :- (x y): \beta}$

This one is easy: if we know that we have a function that takes a parameter of type $\alpha$ and returns a value of type $\beta$ , then if we apply that function to a value of type $\alpha$ , we’ll get a value of type $\beta$ .

These four rules are it. If we can take a lambda expression, and come up with a consistent set of type judgements for every term in the expression, then the expression is well-typed. If not, then the expression is invalid.
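The four rules translate almost line for line into a checker. This is a minimal Python sketch under my own assumptions: terms are tuples, every lambda parameter carries an explicit type annotation (so this is checking, not full inference with type variables), a base type is a string, a function type is the tuple `("->", a, b)`, and the context is a dict.

```python
def type_of(ctx, term):
    """Return the type of term under context ctx, or raise TypeError."""
    kind = term[0]
    if kind == "var":
        # Rules 1 and 2: a variable has the type the context declares for
        # it, and judgements about other variables do not interfere.
        name = term[1]
        if name not in ctx:
            raise TypeError(f"unbound variable {name}")
        return ctx[name]
    if kind == "lam":
        # Rule 3: if the body has type beta once x:alpha is added to the
        # context, the lambda has type alpha -> beta.
        _, name, param_type, body = term
        body_type = type_of({**ctx, name: param_type}, body)
        return ("->", param_type, body_type)
    if kind == "app":
        # Rule 4: applying alpha -> beta to an alpha gives a beta.
        _, fn, arg = term
        fn_type = type_of(ctx, fn)
        arg_type = type_of(ctx, arg)
        if not (isinstance(fn_type, tuple) and fn_type[0] == "->"):
            raise TypeError("applying something that is not a function")
        if fn_type[1] != arg_type:
            raise TypeError("argument type does not match parameter type")
        return fn_type[2]
    raise ValueError(f"unknown term: {term!r}")


# lambda x:nu . lambda y:(nu -> nu) . y x
flip_apply = ("lam", "x", "nu",
              ("lam", "y", ("->", "nu", "nu"),
               ("app", ("var", "y"), ("var", "x"))))
```

An expression is well-typed in this sketch exactly when `type_of` returns instead of raising.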

So let’s try taking a look at a simple lambda calculus expression, and seeing how inference works on it.

$\lambda x y . y x$

Without any type declarations or parameters, we don’t know the exact type. But we do know that “x” has some type; we’ll call that “α”; and we know that “y” is a function that will be applied with “x” as a parameter, so it’s got parameter type α, but its result type is unknown. So using type variables, we can say “x:α,y:α→β”. We can figure out what “α” and “β” are by looking at a complete expression. So, let’s work out the typing of it with x=“3”, and y=“λ a:ν.a*a”. We’ll assume that our type context already includes “*” as a function of type “ν→ν→ν”, where ν is the type of natural numbers.

“(λ x y . y x) 3 (λ a:ν . a * a)”: Since 3 is a literal integer, we know its type: 3:ν.
By rule 4, we can infer that the type of the expression “a*a” where “a:ν” is “ν”, and *:ν→ν→ν so therefore, by rule 3 the lambda expression has type “ν→ν”. So with type labelling, our expression is now: “(λ x y . y x) (3:ν) (λ a:ν.(a*a):ν) : ν→ν”.
So – now, we know that the parameter “x” of the first lambda must be “ν”; and “y” must be “ν→ν”; so by rule 4, we know that the type of the application expression “y x” must be “ν”; and then by rule 3, the lambda has type: “ν→(ν→ν)→ν”.
So, for this one, both α and β end up being “ν”, the type of natural numbers.

So, now we have a simply typed lambda calculus. The reason that it’s simply typed is because the type treatment here is minimal: the only way of building new types is through the unavoidable $\rightarrow$ constructor. Other typed lambda calculi include the ability to define parametric types, which are types expressed as functions ranging over types.

Programs are Proofs

Here’s where it gets really fun. Think about the types in the simply typed lambda calculus. Anything which can be formed from the following grammar is a lambda calculus type:

type ::= primitive | function | ( type )
primitive ::= α | β | δ | …
function ::= type→type

The catch with that grammar is that you can create type expressions which, while they are valid type definitions, you can’t write a single, complete, closed expression which will actually have that type. (A closed expression is one with no free variables.) When there is an expression that has a type, we say that the expression inhabits the type; and that the type is an inhabited type. If there is no expression that can inhabit a type, we say it’s uninhabitable. Any expression which either can’t be typed using the inference rules, or which is typed to an uninhabitable type is a type error.

So what’s the difference between an inhabitable type, and an uninhabitable type?

The answer comes from something called the Curry-Howard isomorphism. For a typed lambda calculus, there is a corresponding intuitionistic logic. A type expression is inhabitable if and only if the type is a provable theorem in the corresponding logic.

The type inference rules in lambda calculus are, in fact, the same as logical inference rules in intuitionistic logic. A type $\alpha \rightarrow \beta$ can be seen as either a statement that this is a function that maps from a value of type $\alpha$ to a value of type $\beta$, or as a logical statement that if we’re given a fact $\alpha$, we could use that to infer the truth of a fact $\beta$.

If there’s a logical inference chain from an axiom (a given type assignment) to an inferred statement, then the inferred statement is an inhabitable type. If we have a type $\alpha \rightarrow \alpha$, then given an inhabited type $\alpha$, we know that $\alpha \rightarrow \alpha$ is inhabitable, because if $\alpha$ is a fact, then $\alpha \rightarrow \alpha$ is also a fact.
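As a small worked sketch in the rule notation from earlier (starting from an empty context, which is my choice for the example), the identity function shows that $\alpha \rightarrow \alpha$ is inhabited. Rule 1 gives the premise, and Rule 3 turns it into a judgement about the lambda:

$\frac{x : \alpha :- x : \alpha \quad \mbox{(Rule 1)}}{:- (\lambda x:\alpha . x) : \alpha \rightarrow \alpha} \mbox{(Rule 3)}$

Read as logic, this is the one-step proof that α implies α; read as a program, it is the identity function.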

On the other hand, think of a different case $\alpha \rightarrow \beta$ . That’s not a theorem, unless there’s some other context that proves it. As a function type, that’s the type of a function which, without including any context of any kind, can take a parameter of type α, and return a value of a different type β. You can’t do that – there’s got to be some context which provides a value of type β – and to access the context, there’s got to be something to allow the function to access its context: a free variable. Same thing in the logic and the lambda calculus: you need some kind of context to establish “α→β” as a theorem (in the logic) or as an inhabitable type (in the lambda calculus).

It gets better. If there is a function whose type is a theorem in the corresponding intuitionistic logic, then the program that has that type is a proof of the theorem. Each beta reduction is equivalent to an inference step in the logic. This is what programming languages geeks like me mean when we say “the program is the proof”: a well-typed program is, literally, a proof of its well-typed-ness.

To connect back to the discussion about models: the lambda calculus and its corresponding intuitionistic logic are, in a deep sense, just different reflections of the same thing. We know that intuitionistic logic has a valid model. And that, in turn, means that lambda calculus is valid as well. When we show that something is true using the lambda calculus, we can trust that it’s not an artifact of an inconsistent system.

Models and Why They Matter

As I said in the last post, Church came up with λ-calculus, which looks like it’s a great formal model of computation. But – there was a problem. Church struggled to find a model. What’s a model, and why would that matter? That’s the point of this post: to get a quick sense of what a model is, and why it matters.

A model is basically a mapping from the symbols of a logical system to some set of objects, such that all statements that you can prove in the logical system will be true about the corresponding objects. Note that when I say object here, I don’t necessarily mean real-world physical objects – they’re just something that we can work with, which is well-defined and consistent.

Why does it matter? Because the whole point of a system like λ-calculus is that we want to use it for reasoning. When you have a logical system like λ-calculus, you’ve built this system with its rules for a reason – because you want to use it as a tool for understanding something. The model provides you with a way of saying that the conclusions you derive using the system are meaningful. If the model isn’t correct, if it contains any kind of inconsistency, then your system is completely meaningless: it can be used to derive anything.

So the search for a model for λ-calculus is really important. If there’s a valid model for it, then it’s wonderful. If there isn’t, then we’re just wasting our time looking for one.

So, now, let’s take a quick look at a simple model, to see how a problem can creep in. I’m going to build a logic for talking about the natural numbers – that is, integers greater than or equal to zero. Then I’ll show you how invalid results can be inferred using it; and finally show you how it fails by using the model.

One quick thing, to make the notation easier to read: I’m going to use a simple notion of types. A type is a set of atoms for which some particular one-parameter predicate is true. For example, if $P(x)$ is true, I’ll say that x is a member of type P. In a quantifier, I’ll say things like $\forall x \in P: \mbox{\em foo}$ to mean $\forall x : P(x) \Rightarrow \mbox{\em foo}$. Used this way, we can say that P is a type predicate.

How do we define natural numbers using logic?

First, we need an infinite set of atoms, each of which represents one number. We pick one of them, and call it zero. To represent the fact that they’re natural numbers, we define a predicate ${\cal N}(x)$, which is true if and only if x is one of the atoms that represents a natural number.

Now, we need to start using predicates to define the fundamental properties of numbers. The most important property of natural numbers is that they are a sequence. We define that idea using a predicate, $\mbox{\em succ}(x,y)$, where $\mbox{\em succ}(x,y)$ is true if and only if x = y + 1; that is, x is the successor of y. To use that to define the ordering of the naturals, we can say: $\forall x \in {\cal N}: \exists y \in {\cal N}: \mbox{\em succ}(y, x)$.

Or in English: every natural number has a successor – you can always add one to a natural number and get another natural number.

We can also define predecessor similarly, with two statements:

$\forall x \in {\cal N}: \exists y \in {\cal N}: \mbox{\em pred}(y, x)$.
$\forall x,y \in {\cal N}: \mbox{\em pred}(y,x) \Leftrightarrow \mbox{\em succ}(x,y)$

So every number has a predecessor, and every number has a successor, and x is the predecessor of y if y is the successor of x.

To be able to define things like addition and subtraction, we can use successor. Let’s define addition using a predicate Sum(x,y,z) which means “z = x + y”.

$\forall x,y \in {\cal N}: \exists z \in {\cal N} : Sum(x,y,z)$
$\forall x \in {\cal N}: Sum(x, 0, x)$
$\forall x,y,z \in {\cal N}: (\exists a,b \in {\cal N}: Sum(a,b,z) \land \mbox{\em succ}(a,x) \land \mbox{\em succ}(y,b)) \Rightarrow Sum(x, y, z)$

Again, in English: for any two natural numbers, there is a natural number that is their sum; x + 0 always = x; and for any natural numbers, x + y = z is true if (x + 1) + (y – 1) = z.

Once we have addition, subtraction is easy: $\forall x,y,z \in {\cal N} : diff(x,y,z) \Leftrightarrow Sum(z,y,x)$

That’s: x-y=z if and only if x=y+z.
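The Sum and diff predicates can be read as a recursive check, which this Python sketch spells out. The function names are hypothetical, and Python's `+ 1` and `- 1` on non-negative ints stand in for the successor and predecessor relations.

```python
def is_sum(x, y, z):
    """Sum(x, y, z): true when z = x + y, using only single steps."""
    if y == 0:
        return x == z                    # x + 0 = x
    # x + y = z holds if (x + 1) + (y - 1) = z.
    return is_sum(x + 1, y - 1, z)


def is_diff(x, y, z):
    """diff(x, y, z): x - y = z if and only if x = z + y."""
    return is_sum(z, y, x)
```

Each recursive call moves one unit from `y` over to `x`, so the check bottoms out at the `x + 0` rule after `y` steps.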

We can also define greater than using addition:

$\forall x,y \in {\cal N} : x > y \Leftrightarrow \mbox{\em succ}(x,y) \lor (\exists z \in {\cal N}: \mbox{\em succ}(x, z) \land z > y)$.

That’s x > y if you can get to x from y by adding one repeatedly.
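Here is a Python sketch of that greater-than definition, again with hypothetical names and with ints standing in for the atoms. One loud assumption: I stop the recursion at 0, which is how the usual natural numbers behave, even though nothing in the logic as written says so.

```python
def is_greater(x, y):
    """x > y if x is y's successor, or x's predecessor is greater than y."""
    if x == 0:
        # Assumption taken from the usual naturals: 0 is nobody's successor,
        # so there is no z to continue with.
        return False
    return x == y + 1 or is_greater(x - 1, y)
```

So `is_greater(5, 2)` walks down 5, 4, 3 and succeeds when it reaches 3, the successor of 2.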

So, we’ve got a nice definition of natural numbers here, right?

Almost. There’s one teeny little mistake.

We said that every natural number has a successor and a predecessor, and we also said that a number x is greater than a number y if you can get from y to x using a sequence of successors. That means that 0 has a predecessor, and that the predecessor of 0 is less than 0. But we’re supposed to be defining the natural numbers! And one of the standard axioms of the natural numbers is that $\forall x \in {\cal N}: 0 \le x$. But we’ve violated that – we have both $\forall x \in {\cal N}: 0 \le x$, and $\exists x \in {\cal N}: 0 > x$. And with a contradiction like that in the system, we can prove anything we want, anything at all. We’ve got a totally worthless, meaningless system.

That’s why mathematicians are so particular about proving the validity of their models: because the tiniest error can mean that anything you proved with the logic might not be true – your proofs are worthless.
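One way to see the mismatch between the logic and its intended model is to test the “every number has a predecessor” statement against actual natural numbers. This Python sketch does that on a finite slice of the naturals; the slice size of 50 and the names are arbitrary choices of mine, and a finite check like this is only an illustration, not a proof.

```python
def has_predecessor(x, universe):
    # True when some y in the universe satisfies x = y + 1.
    return any(x == y + 1 for y in universe)


universe = range(0, 50)   # an initial slice of the natural numbers
missing = [x for x in universe if not has_predecessor(x, universe)]
```

The only number that ends up in `missing` is 0, which is exactly the atom the predecessor statement got wrong.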

The Basics of Software Transactional Memory

As promised, it’s time for software transactional memory!

A bit of background, first. For most of the history of computers, the way that we’ve built software is very strongly based on the fact that a computer has a processor – a single CPU, which can do one thing at a time, in order. Until recently, that was true. But now it really isn’t anymore. There’s the internet – which effectively means that no computer is ever really operating in isolation – it’s part of a huge computational space shared with billions of other computers. And even ignoring the internet, we’re rapidly approaching the point where tiny devices, like cellphones, will have more than one CPU.

The single processor assumption makes things easy. We humans tend to think very sequentially – that is, the way that we describe how to do things is: do this, then do that. We’re not so good at thinking about how to do lots of things at the same time. Just think about human language. If I want to bake a cake, what I’ll do is: measure out the butter and sugar, put them in the mixer, mix it until they’re creamed. Then add milk. Then in a separate bowl, measure out and sift the flour and the baking powder. Then slowly pour the dry stuff into the wet, and mix it together. Etc.

I don’t need to fully specify that order. If I had multiple bakers, they could do many of the steps at the same time. But how, in English, can I clearly say what needs to be done? I can say it, but it’s awkward. It’s harder to say, and harder to understand than the sequential instructions.

What I need to do is identify families of things that can be done at the same time, and then the points at which those things will merge.

All of the measurements can be done at the same time. In any mixing step, the mixing can’t be done until all of the ingredients are ready. Ingredients being ready could mean two things. It could mean that the ingredients were measured out; or it could mean that one of the ingredients for the mix is the product of one of the previous mixing steps, and that that previous step is complete. In terms of programming, we’d say that the measurement steps are independent; and the points at which we need to wait for several things to get done (like “we can’t mix the dry ingredients in to the wet until the wet ingredients have all been mixed and the dry ingredients have been measured”), we’d call synchronization points.

It gets really complicated, really quickly. In fact, it gets even more complicated than you might think. You see, if you were to write out the parallel instructions for this, you’d probably leave a bunch of places where you didn’t quite fully specify things – because you’d be relying on stuff like common sense on the part of your bakers. For example, you’d probably say to turn on the oven to preheat, but you wouldn’t specifically say to wait until it reached the correct temperature to put stuff into it; you wouldn’t mention things like “open the oven door, then put the cake into it, then close it”.

When we’re dealing with multiple processors, we get exactly those kinds of problems. We need to figure out what can be done at the same time; and we need to figure out what the synchronization points are. And we also need to figure out how to do the synchronization. When we’re talking about human bakers “don’t mix until the other stuff is ready” is fine. But in software, we need to consider things like “How do we know when the previous steps are done?”.

And it gets worse than that. When you have a set of processors doing things at the same time, you can get something called a race condition which can totally screw things up!

For example, imagine that we’re counting that all of the ingredients are measured. We could imagine the mixer process as looking at a counter, waiting until all five ingredients have been measured. Each measuring process would do its measurement, and then increment the counter.

  var measureCount = 0   // shared counter; it gets reassigned, so var, not val
  process Mixer() {
    wait until measureCount == 5
  }

  process Measurer() {
    do_measure
    measureCount = measureCount + 1   // not atomic: fetch, add one, store
  }

What happens if two measurers finish at almost the same time? The last statement in Measurer actually consists of three steps: retrieve the value of measureCount; add one; store the incremented value. So we could wind up with:

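| Time | Measurer1 | Measurer2 | measureCount |
|------|-----------|-----------|--------------|
| 0 | | | 1 |
| 1 | Fetch measureCount(=1) | | 1 |
| 2 | Increment(=2) | Fetch measureCount(=1) | 1 |
| 3 | Store updated(=2) | Increment(=2) | 2 |
| 4 | | Store updated(=2) | 2 |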

Now, Mixer will never get run! Because of the way that the two Measurers overlapped, one of the increments effectively got lost, and so the count will never reach 5. And the way it’s written, there’s absolutely no way to tell whether or not that happened. Most of the time, it will probably work – because the two processes have to hit the code that increments the counter at exactly the same time in order for there to be a problem. But every once in a while, for no obvious reason, the program will fail – the mixing will never get done. It’s the worst kind of error to try to debug – one which is completely unpredictable. And if you try to run it in a debugger, because the debugger slows things down, you probably won’t be able to reproduce it!
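Because the real bug depends on timing, it helps to replay the bad interleaving by hand. This Python sketch is a deterministic simulation, not real concurrency: each measurer is a generator that pauses after each of its three steps, and a schedule (a list of hypothetical process names `M1` and `M2`) says who takes the next step.

```python
def measurer(shared):
    local = shared["measureCount"]       # step 1: fetch
    yield "fetch"
    local = local + 1                    # step 2: increment the local copy
    yield "increment"
    shared["measureCount"] = local       # step 3: store
    yield "store"


def run_schedule(schedule, start=1):
    """Run two measurers one step at a time in the given order."""
    shared = {"measureCount": start}
    procs = {"M1": measurer(shared), "M2": measurer(shared)}
    for name in schedule:
        next(procs[name])
    return shared["measureCount"]


# The overlap from the table: M2 fetches before M1 has stored.
interleaved = ["M1", "M1", "M2", "M1", "M2", "M2"]
# No overlap: M1 finishes all three steps before M2 starts.
sequential = ["M1", "M1", "M1", "M2", "M2", "M2"]
```

Starting from a count of 1, the sequential schedule ends at 3, while the interleaved one ends at 2: one increment is lost.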

This kind of issue always comes down to coordination or synchronization of some kind – that is, the main issue is how do the different processes interact without stomping on each other?

The simplest approach is to use something called a lock. A lock is an object which signals ownership of some resource, and which has one really important property: in concept, it points at at most one process, and updating it is atomic, meaning that when you want to look at it and update it, nothing can intervene between the read and write. So if you want to use the resource managed by the lock, you can look at the lock, and see if anyone is using it; and if not, set it to point to you. That process is called acquiring the lock.

In general, we wrap locks up in libraries to make them easier to use. If “l” was a lock, you’d take a lock by using a function something like “lock(l)”, which really expanded to something like:

def take(L: Lock) {
  while (L != me)
     atomically do if L==no one then L=me
}

So the earlier code becomes:

val measureCount = 0
val l = new Lock()

process Mixer() {
  wait until measureCount == 5
}

process Measurer() {
  do_measure
  lock(l)
  measureCount = measureCount + 1
  unlock(l)
}
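
For comparison, here is roughly the same thing as a runnable Python sketch, using the standard library's threading.Lock in place of the pseudocode lock. The names measure_count and count_lock are my own, and the measuring step is left as a comment.

```python
import threading

measure_count = 0
count_lock = threading.Lock()

def measurer():
    global measure_count
    # do_measure would go here
    with count_lock:                       # acquire the lock; it is released on exit
        measure_count = measure_count + 1  # fetch, add, store happen under the lock

threads = [threading.Thread(target=measurer) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(measure_count)   # 5
```

Because only one thread at a time can be between the acquire and the release, the fetch/add/store sequence can no longer interleave with another thread's.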

In a simple example like that, locks are really easy. Unfortunately, real examples get messy. For instance, there’s a situation called deadlock. A classic demonstration is something called the dining philosophers. You’ve got four philosophers sitting at a dinner table. Between each pair, there’s a chopstick. In order to eat, a philosopher needs two chopsticks. When they get two chopsticks, they’ll use them to take a single bite of food, and then they’ll put down the chopsticks. If each philosopher starts by grabbing the chopstick to their right, then no one gets to eat. Each has one chopstick, and there’s no way for them to get a second one.

That’s exactly what happens in a real system. You lock each resource that you want to share. For some operations, you’re going to use more than one shared resource, and so you need two locks. If you have multiple tasks that need multiple resources, it’s easy to wind up with a situation where each task holds some subset of the locks that it needs, and is stuck waiting for a lock held by another task.
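
Here is a small Python sketch of the stuck state at the philosophers' table. To keep it from actually hanging, it uses non-blocking acquire calls on threading.Lock objects, so a failed grab returns False instead of waiting forever. The indexing scheme (philosopher i has chopstick i on the right and chopstick i+1 on the left) is just my convention.

```python
import threading

NUM_PHILOSOPHERS = 4
chopsticks = [threading.Lock() for _ in range(NUM_PHILOSOPHERS)]

# Step 1: every philosopher i grabs the chopstick on their right (index i).
got_first = [chopsticks[i].acquire(blocking=False)
             for i in range(NUM_PHILOSOPHERS)]

# Step 2: every philosopher reaches for the chopstick on their left, which is
# the right-hand chopstick of a neighbour, and is therefore already taken.
got_second = [chopsticks[(i + 1) % NUM_PHILOSOPHERS].acquire(blocking=False)
              for i in range(NUM_PHILOSOPHERS)]

print(got_first)    # [True, True, True, True]
print(got_second)   # [False, False, False, False]

# Clean up so the sketch leaves every lock released.
for chopstick in chopsticks:
    chopstick.release()
```

With ordinary blocking acquires, step 2 would never return for anyone: every lock is held, and every holder is waiting on someone else.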

Things like deadlock mean that simple locks get hairy really quickly. Not that any of the more complex coordination strategies make deadlocks impossible; you can always find a way of creating a deadlock in any system – but it’s a lot easier to create accidental deadlocks using simple locks than, say, actors.

So there’s a ton of methods that try to make it easier to do coordination between multiple tasks. Under the covers, these ultimately rely on primitives like locks (or semaphores, another similar primitive coordination tool). But they provide a more structured way of using them. Just like structured control flow makes code cleaner, clearer, and more maintainable, structured coordination mechanisms make concurrency cleaner, clearer, and more maintainable.

Software transactional memory is one approach to this problem, which is currently very trendy. It’s still not entirely clear to me whether or not STM is really quite up to the real world – current implementations remain promising, but inefficient. But before getting into any of that, we need to talk about just what it is.

As I see it, STM is based on two fundamental concepts:

Optimism. In software terms, by optimism, we mean that we’re going to plow ahead and assume that there aren’t any errors; when we’re done, we’ll check if there was a problem, and clean up if necessary. A good example of this from my own background is source code control systems. In the older systems like RCS, you’d lock a source file before you edited it; then you’d make your changes, and check them in, and release the lock. That way, you know that you’re never going to have two people making changes to the same file at the same time. But the downside is that you end up with lots of people sitting around doing nothing, waiting to get the lock on the file they want to change. Odds are, they weren’t going to change the same part of the file as the guy who has the lock. But in order to make sure that they can’t, the locks also block a lot of real work. Eventually, the optimistic systems came along, and what they did was say: “go ahead and edit the file all you want. But before I let you check in (save) the edited file to the shared repository, I’m going to make sure that no one changed it in a way that will mess things up. If I find out that someone did, then you’re going to have to fix it.”
Transactions. A transaction is a concept from (among other places) databases. In a database, you often make a collection of changes that are, conceptually, all part of the same update. By making them a transaction, they become one atomic block – and either the entire collection succeeds, or the entire collection fails. Transactions guarantee that you’ll never end up in a situation where half of the changes in an update got written to the database, and the other half didn’t.

What happens in STM is that you have some collection of special memory locations or variables. You’re only allowed to edit those variables in a transaction block. Whenever a program enters a transaction block, it gets a copy of the transaction variables, and just plows ahead, and does whatever it wants with its copy of the transaction variables. When it gets to the end of the block, it tries to commit the transaction – that is, it tries to update the master variables with the values of its copies. But it checks them first, to make sure that the current values of the master copies haven’t changed since the time that it made its copy. If they did, it goes back and starts over, re-running the transaction block. So if anyone else updated any of the transaction variables, the transaction would fail, and then get re-executed.

In terms of our baking example, both of the measurers would enter the transaction block at the same time; and then whichever finished first would commit its transaction, which would update the master count variable. Then when the second transaction finished, it would check the count variable, see that it changed, and go back and start over – fetching the new value of the master count variable, incrementing it, and then committing the result. In terms of code, you’d just do something like:

transactional val measureCount = 0

process Mixer() {
  wait until measureCount == 5
}

process Measurer() {
  do_measure
  atomically {
    measureCount = measureCount + 1
  }
}

It’s really that simple. You just mark all the shared resources as transactional, and then wrap the code that modifies them in a transaction block. And it just works. It’s a very elegant solution.

Of course there’s a lot more to it, but that’s the basic idea. In the code, you identify the transactional variables, and only allow them to be updated inside of a transaction block. At runtime, when you encounter a transaction block, charge ahead, and do whatever you want. Then when you finish, make sure that there isn’t an error. If there was, try again.
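
To show the charge-ahead-then-check cycle in something runnable, here is a deliberately tiny Python sketch. It is not how any real STM library is implemented; it assumes a version counter on each variable and a single global lock that is held only during the commit step. All of the names (TVar, Transaction, atomically) are made up for this illustration.

```python
import threading

_commit_lock = threading.Lock()   # held only for the brief commit step


class TVar:
    """A transactional variable: a value plus a version counter."""
    def __init__(self, value):
        self.value = value
        self.version = 0


class Transaction:
    def __init__(self):
        self.seen = {}     # TVar -> version observed when first touched
        self.writes = {}   # TVar -> buffered new value (the private copy)

    def read(self, tvar):
        if tvar in self.writes:
            return self.writes[tvar]
        self.seen.setdefault(tvar, tvar.version)
        return tvar.value

    def write(self, tvar, value):
        self.seen.setdefault(tvar, tvar.version)
        self.writes[tvar] = value

    def commit(self):
        with _commit_lock:
            # If anything we touched has changed since we looked, give up.
            if any(tv.version != v for tv, v in self.seen.items()):
                return False
            for tv, value in self.writes.items():
                tv.value = value
                tv.version += 1
            return True


def atomically(block):
    """Run block(tx) optimistically, re-running it until the commit succeeds."""
    while True:
        tx = Transaction()
        result = block(tx)
        if tx.commit():
            return result


measure_count = TVar(0)

def measurer():
    atomically(lambda tx: tx.write(measure_count, tx.read(measure_count) + 1))

threads = [threading.Thread(target=measurer) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(measure_count.value)   # 5
```

The block itself never takes a lock; it works on buffered copies, and only the final check-and-publish step is serialized. A transaction that loses the race just runs again against the new values.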

So what’s it look like in a non-contrived programming language? These days, I’m doing most of my coding in Scala. There’s a decent STM implementation for Scala as a part of a package called Akka.

In Akka, the way that you define a transactional variable is by using a Ref type. A Ref is basically a cell that wraps a value. (It’s almost like a pointer value in C.) So, for example, in our baking example:

val count: Ref[Int] = Ref(0)

Then in code, to use it, you literally just wrap the code that modifies the Refs in “atomic”. Alas, you don’t quite get to treat the refs like normal variables – to access the value of a ref, you need to call Ref.get; to change the value, you need to use a method alter, which takes a function that computes the new value in terms of the old.

class Measurer {
  def doMeasure() {
    // do the measuring stuff
    atomic {
      // update the shared counter Ref declared above
      count.alter(_ + 1)
    }
  }
}

The “(_ + 1)” probably needs a quick explanation. In Scala, you can define a single expression function using “_” to mark the slot where the parameter should go. So “(_ + 1)” is equivalent to the lambda expression { x => x + 1}.

You can see, just from this tiny example, why STM is such an attractive approach. It’s so simple! It’s very easy to read and write. It’s a very simple natural model. It’s brilliant in its simplicity. Of course, there’s more to it than what I’ve written about here – error handling, voluntary transaction abort, retry management, transaction waits – but this is the basics, and it really is this simple.

What are the problems with this approach?

Impurity. If not all variables are transactional, and you can modify a non-transactional variable inside of a transaction block, then you’ve got a mess on your hands. Values from transactionals can “leak” out of transaction blocks.
Inefficiency. You’ve got to either copy all of the transactional variables at the entry to the transaction block, or you’ve got to use some kind of copy-on-write strategy. However you do it, you’ve got grief – aggressive copying, copy-on-write, memory protect – they’ve all got non-trivial costs. And re-doing the entire transaction block every time it fails can eat a lot of CPU.
Fairness. This is a fancy term for “what if one guy keeps getting screwed?” You’ve got lots of processes all working with the same shared resources, which are protected behind the transactional variables. It’s possible for timing to work out so that one process keeps getting stuck doing the re-tries. This is something that comes up a lot in coordination strategies for concurrency, and the implementations can get pretty hairy trying to make sure that they don’t dump all of the burden of retries on one process.

The Wrong Way To Write Concurrent Programs: Actors in Cruise

I’ve been planning to write a few posts about some programming stuff that interests me. I’ve spent a good part of my career working on systems that need to support concurrent computation. I even did my PhD working on a system to allow a particular style of parallel programming. It’s a really hard problem – concurrency creates a host of complex issues in how systems behave, and the way that you express concurrency in a programming language has a huge impact on how hard it is to read, write, debug, and reason about systems.

So, like I’ve said, I’ve spent a lot of time thinking about these issues, and looking at various different proposed solutions, as well as proposing a couple of my own. But I really don’t know of any good writeup describing the basics of the most successful approaches for beginners. So I thought I could write one.

But that’s not really today’s post. Today’s post is my version of a train-wreck. Long-time readers of the blog know that I’m fascinated with bizarre programming languages. So today, I’m going to show you a twisted, annoying, and thoroughly pointless language that I created. It’s based on one of the most successful models of concurrent programming, called Actors, which was originally proposed by Professor Gul Agha of UIUC. There’ve been some really nice languages built using ideas from Actors, but this is not one of them.

The language is called “Cruise”. Why? Because it’s a really bad Actor language. And what name comes to mind when I think of really, really bad actors with delusions of adequacy? Tom Cruise.

You can grab my implementation from github. Just so you know, the code sucks. It’s something I threw together in my spare time a couple of years ago, and haven’t really looked at since. So it’s sloppy, overcomplicated, probably buggy, and slow as a snail on tranquilizers.

Quick overview of the actor model

Actors are a theoretical model of computation, which is designed to describe completely asynchronous parallel computation. Doing things totally asynchronously is very strange, and very counter-intuitive. But the fact of the matter is, in real distributed systems, everything *is* fundamentally asynchronous, so being able to describe distributed systems in terms of a simple, analyzable model is a good thing.

According to the actor model, a computation is described by a collection of things called, what else, actors. An actor has a mailbox, and a behavior. The mailbox is a uniquely named place where messages sent to an actor can be queued; the behavior is a definition of how the actor is going to process a message from its mailbox. The behavior gets to look at the message, and based on its contents, it can do three kinds of things:

Create other actors.
Send messages to other actors whose mailbox it knows.
Specify a new behavior for the actor to use to process its next message.

You can do pretty much anything you need to do in computations with that basic mechanism. The catch is, as I said, it’s all asynchronous. So, for example, if you want to write an actor that adds two numbers, you can’t do it by what you’d normally think of as a function call. In a lot of ways, it looks like a method call in something like Smalltalk: one actor (object) sends a message to another actor, and in response, the receiver takes some action specified by the message.

But subroutines and methods are synchronous, and nothing in actors is synchronous. In an object-oriented language, when you send a message, you stop and wait until the receiver of the message is done with it. In Actors, it doesn’t work that way: you send a message, and it’s sent; that’s it, it’s over and done with. You don’t wait for anything; you’re done. If you want a reply, you need to send the other actor a reference to your mailbox, and make sure that your behavior knows what to do when the reply comes in.

It ends up looking something like the continuation passing form of a functional programming language: to do a subroutine-like operation, you need to pass an extra parameter to the subroutine invocation; that extra parameter is the *intended receiver* of the result.

You’ll see some examples of this when we get to some code.
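
Before getting to Cruise itself, here is a rough Python sketch of the mailbox-plus-behavior idea, including the "pass along who should get the answer" trick. It is a single-threaded toy with a run queue, which is an assumption I made to keep it deterministic; the class names, message shapes, and the adder and printer behaviors are all invented for this illustration.

```python
from collections import deque


class Actor:
    def __init__(self, system, behavior):
        self.system = system
        self.behavior = behavior      # function(actor, message)
        self.mailbox = deque()

    def send(self, message):
        # Sending only queues the message; nothing runs yet.
        self.mailbox.append(message)
        self.system.ready.append(self)


class System:
    def __init__(self):
        self.ready = deque()          # one entry per undelivered message

    def spawn(self, behavior):
        return Actor(self, behavior)

    def run(self):
        while self.ready:
            actor = self.ready.popleft()
            actor.behavior(actor, actor.mailbox.popleft())


printed = []

def printer(actor, message):
    printed.append(message)

def adder(actor, message):
    # The message carries its own continuation: the actor to reply to.
    _tag, a, b, reply_to = message
    reply_to.send(("sum", a + b))


system = System()
out = system.spawn(printer)
add = system.spawn(adder)

add.send(("add", 2, 3, out))   # returns immediately, no result comes back
system.run()
print(printed)                 # [('sum', 5)]
```

Note that send never returns a value. The only way the sum gets anywhere is that the request named out as the receiver of the result. An actor could also change how it handles its next message by assigning a new function to its behavior field.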

Tuples – A Really Ugly Way of Handling Data

This subtitle is a bit over-the-top. I actually think that my tuple notion is pretty cool. It’s loosely based on how you do data-types in Prolog. But the way that it’s implemented in Cruise is absolutely awful.

Cruise has a strange data model. The idea behind it is to make it easy to build actor behaviors around the idea of pattern matching. The easiest/stupidest way of doing this is to make all data consist of tagged tuples. A tagged tuple consists of a tag name (an identifier starting with an uppercase letter), and a list of values enclosed in the tuple. The values inside of a tuple can be either other tuples, or actor names (identifiers starting with lower-case letters).

So, for example, Foo(Bar(), adder) is a tuple. The tag is “Foo“. Its contents are another tuple, “Bar()“, and an actor name, “adder“.

Since tuples and actors are the only things that exist, we need to construct all other types of values from some combination of tuples and actors. To do math, we can use tuples to build up Peano numbers. The tuple “Z()” is zero; “I(n)” is the number n+1. So, for example, 3 is “I(I(I(Z())))“.

The only way to decompose tuples is through pattern matching in messages. In an actor behavior, message handlers specify a *tuple pattern*, which is a tuple where some positions may be filled by *unbound* variables. When a tuple is matched against a pattern, the variables in the pattern are bound to the values of the corresponding elements of the tuple.

A few examples:

matching I(I(I(Z()))) with I($x) will succeed with $x bound to I(I(Z())).
matching Cons(X(),Cons(Y(),Cons(Z(),Nil()))) with Cons($x,$y) will succeed with $x bound to X(), and $y bound to Cons(Y(),Cons(Z(),Nil())).
matching Cons(X(),Cons(Y(),Cons(Z(),Nil()))) with Cons($x, Cons(Y(), Cons($y, Nil()))) will succeed with $x bound to X(), and $y bound to Z().
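
If it helps to see the matching rule as code, here is a small Python sketch. It is not the Cruise implementation; it assumes tagged tuples are represented as Python tuples whose first element is the tag, actor names as plain strings, and pattern variables as strings starting with "$". How Cruise treats a variable that appears twice in one pattern isn't covered above, so the sketch assumes both occurrences must match equal values.

```python
def match(pattern, value, bindings=None):
    """Return a dict of variable bindings, or None if the match fails."""
    bindings = {} if bindings is None else bindings
    # A variable matches anything (assumption: repeats must agree).
    if isinstance(pattern, str) and pattern.startswith("$"):
        if pattern in bindings and bindings[pattern] != value:
            return None
        bindings[pattern] = value
        return bindings
    # Two tuples match if tags and arity agree and every element matches.
    if isinstance(pattern, tuple) and isinstance(value, tuple):
        if len(pattern) != len(value) or pattern[0] != value[0]:
            return None
        for p, v in zip(pattern[1:], value[1:]):
            if match(p, v, bindings) is None:
                return None
        return bindings
    # Otherwise (actor names), only identical values match.
    return bindings if pattern == value else None


# Helpers for building the tuples used in the examples above.
Z, X, Y, Nil = ("Z",), ("X",), ("Y",), ("Nil",)

def I(n):
    return ("I", n)

def Cons(head, tail):
    return ("Cons", head, tail)


items = Cons(X, Cons(Y, Cons(Z, Nil)))
print(match(I("$x"), I(I(I(Z)))))
print(match(Cons("$x", "$y"), items))
print(match(Cons("$x", Cons(Y, Cons("$y", Nil))), items))
```

The three calls at the end correspond to the three examples above, in order.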

Code Examples!

Instead of my rambling on even more, let’s take a look at some Cruise programs. We’ll start off with Hello World, sort of.

actor !Hello {
  behavior :Main() {
    on Go() { send Hello(World()) to out }
  }
  initial :Main
}

instantiate !Hello() as hello
send Go() to hello

This declares an actor type “!Hello”; it’s got one behavior with no parameters. It only knows how to handle one message, “Go()”. When it receives “Go()”, it sends a hello world tuple to the actor named “out”, which is a built-in that just prints whatever is sent to it.

Let’s be a bit more interesting, and try something using integers. Here’s some code to do a greater than comparison:

actor !GreaterThan {
  behavior :Compare() {
    on GT(Z(),Z(), $action, $iftrue, $iffalse) {
      send $action to $iffalse
    }
    on GT(Z(), I($x), $action, $iftrue, $iffalse) {
      send $action to $iffalse
    }
    on GT(I($x), Z(), $action, $iftrue, $iffalse) {
      send $action to $iftrue
    }
    on GT(I($x), I($y), $action, $iftrue, $iffalse) {
      send GT($x,$y,$action,$iftrue,$iffalse) to $self
    }
  }
  initial :Compare
}

actor !True {
  behavior :True() {
    on Result() { send True() to out}
  }
  initial :True
}

actor !False {
  behavior :False() {
    on Result() { send False() to out}
  }
  initial :False
}

instantiate !True() as true
instantiate !False() as false
instantiate !GreaterThan() as greater
send GT(I(I(Z())), I(Z()), Result(), true, false) to greater
send GT(I(I(Z())), I(I(I(Z()))), Result(), true, false) to greater
send GT(I(I(Z())), I(I(Z())), Result(), true, false) to greater

This is typical of how you do “control flow” in Cruise: you set up different actors for each branch, and pass those actors’ names to the test; one of them will receive a message to continue the execution.

What about multiple behaviors? Here’s a trivial example of a flip-flop:

actor !FlipFlop {
  behavior :Flip() {
    on Ping($x) {
      send Flip($x) to out
      adopt :Flop()
    }
    on Pong($x) {
      send Flip($x) to out
    }
  }
  behavior :Flop() {
    on Ping($x) {
      send Flop($x) to out
    }
    on Pong($x) {
      send Flop($x) to out
      adopt :Flip()
    }
  }
  initial :Flip
}

instantiate !FlipFlop() as ff
send Ping(I(I(Z()))) to ff
send Ping(I(I(Z()))) to ff
send Ping(I(I(Z()))) to ff
send Ping(I(I(Z()))) to ff
send Pong(I(I(Z()))) to ff
send Pong(I(I(Z()))) to ff
send Pong(I(I(Z()))) to ff
send Pong(I(I(Z()))) to ff

If the actor is in the “:Flip” behavior, then when it gets a “Ping”, it sends “Flip” to out, and switches behavior to “:Flop”. If it gets “Pong”, it just sends “Flip” to out, and stays in “:Flip”.

The “:Flop” behavior is pretty much the same idea, except that it switches behaviors on “Pong”.

An example of how behavior changing can actually be useful is implementing settable variables:

actor !Var {
  behavior :Undefined() {
    on Set($v) { adopt :Val($v) }
    on Get($target) { send Undefined() to $target }
    on Unset() { }
  }
  behavior :Val($val) {
    on Set($v) { adopt :Val($v) }
    on Get($target) { send $val to $target }
    on Unset() { adopt :Undefined() }
  }
  initial :Undefined
}
instantiate !Var() as v
send Get(out) to v
send Set(I(I(I(Z())))) to v
send Get(out) to v

Two more programs, and I’ll stop torturing you. First, a simple adder:

actor !Adder {
  behavior :Add() {
    on Plus(Z(),$x, $target) {
      send $x to $target
    }
    on Plus(I($x), $y, $target) {
      send Plus($x,I($y), $target) to $self
    }
  }
  initial :Add
}

actor !Done {
  behavior :Finished() {
    on Result($x) { send Result($x) to out }
  }
  initial :Finished
}

instantiate !Adder() as adder
instantiate !Done() as done
send Plus(I(I(I(Z()))),I(I(Z())), out) to adder

Pretty straightforward – the only interesting thing about it is the way that it sends the result of invoking add to a continuation actor.
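
Stripped of the actor machinery, the two Plus handlers are just a rewrite loop over Peano tuples. Here is that loop as a Python sketch, assuming the same tuple encoding the post uses (Z() for zero, I(n) for n+1) represented as Python tuples. The from_int and to_int helpers are mine, added only so the result is readable.

```python
Z = ("Z",)

def I(n):
    return ("I", n)

def from_int(k):
    """Build the Peano tuple for a non-negative Python int."""
    n = Z
    for _ in range(k):
        n = I(n)
    return n

def to_int(n):
    """Count the I(...) wrappers around Z."""
    count = 0
    while n != Z:
        n = n[1]
        count += 1
    return count

def plus(x, y):
    # Second handler: Plus(I(x), y) becomes Plus(x, I(y)); repeat while it applies.
    while x != Z:
        x, y = x[1], I(y)
    # First handler: Plus(Z(), y) yields y.
    return y

print(to_int(plus(from_int(3), from_int(2))))   # 5
```

What this version loses is exactly the interesting part of the Cruise program: there the "loop" is the actor sending a message to itself, and the "return" is a message to the target actor.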

Now, let’s use an addition actor to implement a multiplier actor. This shows off some interesting techniques, like carrying auxiliary values that will be needed by the continuation. It also shows you that I cheated, and added integers to the parser; they’re translated into the peano-tuples by the parser.

actor !Adder {
  behavior :Add() {
    on Plus(Z(),$x, $misc, $target) {
      send Sum($x, $misc) to $target
    }
    on Plus(I($x), $y, $misc, $target) {
      send Plus($x,I($y), $misc, $target) to $self
    }
  }
  initial :Add
}

actor !Multiplier {
  behavior :Mult() {
    on Mult(I($x), $y, $sum, $misc, $target) {
      send Plus($y, $sum, MultMisc($x, $y, $misc, $target), $self) to adder
    }
    on Sum($sum, MultMisc(Z(), $y, $misc, $target)) {
      send Product($sum, $misc) to $target
    }
    on Sum($sum, MultMisc($x, $y, $misc, $target)) {
      send Mult($x, $y, $sum, $misc, $target) to $self
    }
  }
  initial :Mult
}

instantiate !Adder() as adder
instantiate !Multiplier() as multiplier
send Mult(32, 191, 0, Nil(), out) to multiplier

So, is this Turing complete? You bet: it’s got peano numbers, conditionals, and recursion. If you can do those three, you can do anything.

Sidetrack from the CCCs: Lambda Calculus

So, last post, I finally defined closed cartesian categories. And I alluded to the fact that the CCCs are, essentially, equivalent to the simply typed λ calculus. But I didn’t really talk about what that meant.

Before I can get to that, you need to know what λ calculus is. Many readers are probably familiar, but others aren’t. And as it happens, I absolutely love λ calculus.

In computer science, especially in the field of programming languages, we tend to use λ calculus a whole lot. It’s also extensively used by logicians studying the nature of computation and the structure of discrete mathematics. λ calculus is great for a lot of reasons, among them:

It’s very simple.
It’s Turing complete: if a function can be computed by any possible computing device, then it can be written in λ-calculus.
It’s easy to read and write.
Its semantics are strong enough that we can do reasoning from it.
It’s got a good solid model.
It’s easy to create variants to explore the properties of various alternative ways of structuring computations or semantics.

The ease of reading and writing λ calculus is a big deal. It’s led to the development of a lot of extremely good programming languages based, to one degree or another, on the λ calculus: Lisp, ML, Haskell, and my current favorite, Scala, are very strongly λ calculus based.

The λ calculus is based on the concept of functions. In the pure λ calculus, everything is a function; there are no values except for functions. In fact, we can pretty much build up all of mathematics using λ-calculus.

With the lead-in out of the way, let’s dive in and look at λ-calculus. To define a calculus, you need to define two things: the syntax, which describes how valid expressions can be written in the calculus; and a set of rules that allow you to symbolically manipulate the expressions.

Lambda Calculus Syntax

The λ calculus has exactly three kinds of expressions:

Function definition: a function in λ calculus is an expression, written: λ param . body, which defines a function with one parameter.
Identifier reference: an identifier reference is a name which matches the name of a parameter defined in a function expression enclosing the reference.
Function application: applying a function is written by putting the function value in front of its parameter, as in x y to apply the function x to the value y.

There’s a trick that we play in λ calculus: if you look at the definition above, you’ll notice that a function (lambda expression) only takes one parameter. That seems like a very big constraint – how can you even implement addition with only one parameter?

It turns out to be no problem, because of the fact that functions are, themselves, values. Instead of writing a two parameter function, you can write a one parameter function that returns a one parameter function, which can then operate on the second parameter. In the end, it’s effectively the same thing as a two parameter function. Taking a two-parameter function, and representing it by two one-parameter functions is called currying, after the great logician Haskell Curry.

For example, suppose we wanted to write a function to add x and y. We’d like to write something like: λ x y . x + y. The way we do that with one-parameter functions is: we first write a function with one parameter, which returns another function with one parameter.

Adding x plus y becomes writing a one-parameter function with parameter x, which returns another one parameter function which adds x to its parameter: λ x . (λ y . x + y).
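
The same trick works in any language with first-class functions. As a quick Python illustration (the names add and add_three are mine), here is the curried form of λ x . (λ y . x + y):

```python
# λ x . (λ y . x + y): a one-parameter function returning a one-parameter function
add = lambda x: lambda y: x + y

add_three = add(3)      # supplying only x gives back a function of y
print(add_three(4))     # 7
print(add(3)(4))        # 7, supplying both parameters one at a time
```

Applying add to just its first parameter gives back a perfectly good function, which is exactly what makes the two-parameter version unnecessary.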

Now that we know that adding multiple parameter functions doesn’t really add anything but a bit of simplified syntax, we’ll go ahead and use them when it’s convenient.

One important syntactic issue that I haven’t mentioned yet is closure or complete binding. For a λ calculus expression to be evaluated, it cannot reference any identifiers that are not bound. An identifier is bound if it is a parameter in an enclosing λ expression; if an identifier is not bound in any enclosing context, then it is called a free variable. Let’s look quickly at a few examples:

λ x . p x y: in this expression, y and p are free, because they’re not the parameter of any enclosing λ expression; x is bound because it’s a parameter of the function definition enclosing the expression p x y where it’s referenced.
λ x y.y x: in this expression both x and y are bound, because they are parameters of the function definition, and there are no free variables.
λ y . (λ x . p x y). This one is a tad more complicated, because we’ve got the inner λ. So let’s start there. In the inner λ, λ x . p x y, y and p are free and x is bound. In the full expression, both x and y are bound: x is bound by the inner λ, and y is bound by the outer λ. “p” is still free.

We’ll often use “free(x)” to mean the set of identifiers that are free in the expression “x”.
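
Computing free(x) is a short recursion over the three kinds of expression. Here is a Python sketch, assuming a made-up encoding of λ expressions as nested tuples: ("var", name), ("lam", param, body), and ("app", function, argument). The var, lam, and app helpers are only there to make the examples readable.

```python
def free(expr):
    """Return the set of identifiers that are free in expr."""
    kind = expr[0]
    if kind == "var":
        return {expr[1]}
    if kind == "lam":
        _, param, body = expr
        return free(body) - {param}      # the parameter is bound inside the body
    if kind == "app":
        _, fn, arg = expr
        return free(fn) | free(arg)
    raise ValueError(f"unknown expression: {expr!r}")


def var(name):
    return ("var", name)

def lam(param, body):
    return ("lam", param, body)

def app(fn, *args):
    # Application associates to the left: p x y is (p x) y.
    for arg in args:
        fn = ("app", fn, arg)
    return fn


p_x_y = app(var("p"), var("x"), var("y"))

print(sorted(free(lam("x", p_x_y))))                              # ['p', 'y']
print(sorted(free(lam("x", lam("y", app(var("y"), var("x")))))))  # []
print(sorted(free(lam("y", lam("x", p_x_y)))))                    # ['p']
```

The three results line up with the three examples above.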

A λ calculus expression is valid (and thus evaluatable) only when all of its variables are bound. But when we look at smaller subexpressions of a complex expression, taken out of context, they can have free variables – and making sure that the variables that are free in subexpressions are treated right is very important.

Lambda Calculus Evaluation Rules

There are only two real rules for evaluating expressions in λ calculus; they’re called α and β. α is also called “conversion”, and β is also called “reduction”.

α is a renaming operation; basically it says that the names of variables are unimportant: given any expression in λ calculus, we can change the name of the parameter to a function as long as we change all free references to it inside the body.

So – for instance, if we had an expression like:

λ x . if (= x 0) then 1 else x^2

We can do an α to replace x with y (written “α[x/y]”) and get:

λ y . if (= y 0) then 1 else y^2

Doing α does not change the meaning of the expression in any way. But as we’ll see later, it’s important because without it, we’d often wind up with situations where a single variable symbol is bound by two different enclosing λs. This will be particularly important when we get to recursion.

β reduction is where things get interesting: this single rule is all that’s needed to make the λ calculus capable of performing any computation that can be done by a machine.

β basically says that if you have a function application, you can replace it with a copy of the body of the function with references to the parameter identifiers replaced by references to the parameter value in the application. That sounds confusing, but it’s actually pretty easy when you see it in action.

Suppose we have the application expression: (λ x . x + 1) 3. By performing a beta reduction, we can replace the application by taking the body x + 1 of the function, and substituting the value of the parameter (3) for the parameter variable symbol (x). So we replace all references to x with 3. So the result of doing a beta reduction is 3 + 1.

A slightly more complicated example is the expression:

(λ y . (λ x . x + y)) q

It’s an interesting expression, because it’s a λ expression that when applied, results in another λ expression: that is, it’s a function that creates functions. When we do beta reduction in this, we’re replacing all references to the parameter y with the identifier q; so, the result is λ x . x + q.

One more example, just for the sake of being annoying. Suppose we have: (λ x y. x y) (λ z . z * z) 3

That’s a function that takes two parameters, and applies the first one to the second one. When we evaluate that, we replace the parameter x in the body of the first function with λ z . z * z; and we replace the parameter y with 3, getting: (λ z . z * z) 3. And we can perform beta on that, getting 3 * 3.

Written formally, beta says: (λ x . B) e = B[x := e] if free(e) ⊂ free(B[x := e])

That condition on the end, “if free(e) ⊂ free(B[x := e])” is why we need α: we can only do beta reduction if doing it doesn’t create any collisions between bound identifiers and free identifiers: if the identifier “z” is free in “e”, then we need to be sure that the beta-reduction doesn’t make “z” become bound. If there is a name collision between a variable that is bound in “B” and a variable that is free in “e”, then we need to use α to change the identifier names so that they’re different.

As usual, an example will make that clearer: Suppose we have an expression defining a function, λ z . (λ x . x+z). Now, suppose we want to apply it: (λ z . (λ x . x + z)) (x + 2). In the parameter (x + 2), x is free. Now, suppose we break the rule and go ahead and do beta. We’d get “λ x . x + x + 2”. The variable that was free in x + 2 is now bound! We’ve changed the meaning of the function, which we shouldn’t be able to do. If we were to apply that function after the incorrect β, we’d get (λ x . x + x + 2) 3. By beta, we’d get 3 + 3 + 2, or 8.

What if we did α the way we were supposed to?

First, we’d do an α to prevent the name overlap. By α[x/y], we would get (λ z . (λ y . y + z)) (x+2).

Then by β, we’d get “λ y . y + x + 2“. If we apply this function the way we did above, then by β, we’d get 3+x+2.
3+x+2 and 3+3+2 are very different results!
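
Here is the same example as a Python sketch of β with the α step built in. It assumes a made-up tuple encoding: ("var", name), ("num", n), ("lam", param, body), ("app", f, e), and ("add", a, b) for the + in the example. The fresh-name scheme (v0, v1, ...) is arbitrary; the post's version picks y instead.

```python
import itertools

def free(e):
    kind = e[0]
    if kind == "var":
        return {e[1]}
    if kind == "num":
        return set()
    if kind == "lam":
        return free(e[2]) - {e[1]}
    return free(e[1]) | free(e[2])          # "app" and "add" have two parts

def fresh(avoid):
    """Pick a variable name that is not in avoid."""
    for i in itertools.count():
        name = f"v{i}"
        if name not in avoid:
            return name

def subst(body, name, value):
    """Capture-avoiding substitution: body[name := value]."""
    kind = body[0]
    if kind == "var":
        return value if body[1] == name else body
    if kind == "num":
        return body
    if kind == "lam":
        param, inner = body[1], body[2]
        if param == name:                   # name is re-bound here; stop
            return body
        if param in free(value) and name in free(inner):
            # Substituting would capture a free variable of value: do α first.
            new = fresh(free(value) | free(inner) | {name})
            inner = subst(inner, param, ("var", new))
            param = new
        return ("lam", param, subst(inner, name, value))
    return (kind, subst(body[1], name, value), subst(body[2], name, value))

def beta(redex):
    """Reduce ("app", ("lam", x, B), e) to B[x := e]."""
    _, (_, param, body), arg = redex
    return subst(body, param, arg)


x_plus_2 = ("add", ("var", "x"), ("num", 2))
fn = ("lam", "z", ("lam", "x", ("add", ("var", "x"), ("var", "z"))))

result = beta(("app", fn, x_plus_2))
print(result)         # λ v0 . v0 + (x + 2), in tuple form
print(free(result))   # {'x'}
```

The inner parameter gets renamed before the substitution happens, so the x from x + 2 is still free in the result, just as in the correct reduction above.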

And that’s pretty much it. There’s another optional rule you can add called η-conversion. η is a rule that adds extensionality, which provides a way of expressing equality between functions.

η says that in any λ expression, I can replace the value f with the value g iff for all possible parameter values x, f x = g x.

What I’ve described here is Turing complete – a full effective computation system. To make it useful, and see how this can be used to do real stuff, we need to define a bunch of basic functions that allow us to do math, condition tests, recursion, etc. I’ll talk about those in my next post.

It’s also important to point out that while I’ve gone through a basic definition of λ calculus, and described its mechanics, I haven’t yet defined a model for λ-calculus. That’s quite an important omission! λ-calculus was played with by logicians for several years before they were able to come up with a complete model for it, and it was a matter of great concern that although it looked correct, the early attempts to define a model for it were failures! And without a valid model, the results of the system are meaningless. An invalid model in a logical system like λ calculus is like a contradiction in axioms: it means that nothing that it produces is valid.

The Banach-Tarski non-Paradox

For some reason, lately I’ve been seeing a bunch of mentions of Banach Tarski. B-T is a fascinating case of both how counter-intuitive math can be, and also how profoundly people can misunderstand things.

For those who aren’t familiar, Banach-Tarski refers to a topological/measure theory paradox. There are several variations on it, all of which are equivalent.

The simplest one is this: Suppose you have a sphere. You can take that sphere, and slice it into a finite number of pieces. Then you can take those pieces, and re-assemble them so that, without any gaps, you now have two spheres of the exact same size as the original.

Alternatively, it can be formulated so that you can take a sphere, slice it into a finite number of pieces, and then re-assemble those pieces into a bigger sphere.

This sure as heck seems wrong. It’s been cited as a reason to reject the axiom of choice, because the proof that you can do this relies on choice. It’s been cited by crackpots like EE Escultura as a reason for rejecting the theory of real numbers. And there are lots of attempts to explain why it works. For example, there’s one here that tries to explain it in terms of density. There’s a very cool visualization of it here, which tries to make sense of it by showing it in the hyperbolic plane. Personally, most of the attempts to explain it intuitively drive me crazy. On one level, intuitively, it doesn’t, and can’t make sense. But on another level, it’s actually pretty simple. No matter how hard you try, you’re never going to make the idea of turning a finite-sized object into a larger finite-sized object make sense. But on another level, once you think about infinite sets – well, it’s no problem.

The thing is, when you think about it carefully, it’s not really all that odd. It’s counterintuitive, but it’s not nearly as crazy as it sounds. What you need to remember is that we’re talking about a mathematical sphere – that is, an infinite collection of points in a space with a particular set of topological and measure relations.

Here’s an equivalent thing, which is a bit simpler to think about:

Take a line segment. How many points are in it? It’s infinite. So, from that infinite set, remove an infinite set of points. How many points are left? It’s still infinite. Now you’ve got two infinite sets of the same size. So, now you can use one of the sets to create the original line segment, and you can use the second one to create a second, identical line segment.

Still counterintuitive, but slightly easier.

How about this? Take the set of all natural numbers. Divide it into two sets: the set of even naturals, and the set of odd naturals. Now you have two infinite sets, the set {0, 2, 4, 6, 8, …}, and the set {1, 3, 5, 7, 9, …}. The size of both of those sets is ω – which is also the size of the original set you started with.

Now take the set of even numbers, and map it so that for any given value i, f(i) = i/2. Now you’ve got a copy of the set of natural numbers. Take the set of odd naturals, and map them with g(i) = (i-1)/2. Now you’ve got a second copy of the set of natural numbers. So you’ve created two identical copies of the set of natural numbers out of the original set of natural numbers.
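If you'd like to watch the two mappings work, here's a minimal sketch in Python. It can only check a finite prefix of each infinite set, so it's an illustration rather than a proof. The names `evens`, `odds`, and `PREFIX` are mine, invented for the example; `f` and `g` are the two maps from the paragraph above.

```python
from itertools import count, islice

def evens():
    """Lazily yield the even naturals 0, 2, 4, ..."""
    return count(0, 2)

def odds():
    """Lazily yield the odd naturals 1, 3, 5, ..."""
    return count(1, 2)

def f(i):
    """Map an even natural i to i/2."""
    return i // 2

def g(i):
    """Map an odd natural i to (i-1)/2."""
    return (i - 1) // 2

PREFIX = 10  # how many elements of each set we inspect (an arbitrary choice)

naturals = list(range(PREFIX))
from_evens = [f(i) for i in islice(evens(), PREFIX)]
from_odds = [g(i) for i in islice(odds(), PREFIX)]

# Each half of the split reproduces the same prefix of the naturals.
print(from_evens == naturals, from_odds == naturals)
```

Both halves of the split land exactly on the naturals, one element at a time, which is all that "two copies of the same size" means here.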

The problem with Banach-Tarski is that we tend to think of it less in mathematical terms, and more in concrete terms. It’s often described as something like “You can slice up an orange, and then re-assemble it into two identical oranges”. Or “you can cut a baseball into pieces, and re-assemble it into a basketball.” Those are both obviously ridiculous. But they’re ridiculous because they violate one of our instincts, one that derives from the conservation of mass. You can’t turn one apple into two apples: there’s only a specific, finite amount of stuff in an apple, and you can’t turn it into two apples that are identical to the original.

But math doesn’t have to follow conservation of mass in that way. A sphere doesn’t have a mass. It’s just an uncountably infinite set of points with a particular collection of topological and geometric relationships.

Going further down that path: Banach-Tarski relies deeply on the axiom of choice. The “pieces” that you cut have non-measurable volume. You’re “cutting” from the collection of points in the sphere in a way that requires you to make an uncountably infinite number of distinct “cuts” to produce each piece. It’s effectively a geometric version of “take every other real number, and put them into separate sets”. On that level, because you can’t actually do anything like that, it’s impossible and ridiculous. But you need to remember: we aren’t talking about apples or baseballs. We’re talking about sets. The “slices” in B-T aren’t something you can cut with a knife – they’re infinitely subdivided, non-contiguous pieces. Nothing in the real world has that property, and no real-world process has the ability to cut like that. But we’re not talking about the real world; we’re talking about abstractions. And on the level of abstractions, it’s no stranger than creating two copies of the set of real numbers.

Categorical Computation Characterized By Closed Cartesian Categories


One of my favorite categorical structures is a thing called a closed cartesian category, or CCC for short. Since I’m a computer scientist/software engineer, it’s a natural: CCCs are, basically, the categorical structure of lambda calculus – and thus, effectively, a categorical model of computation. However, before we can talk about the CCCs, we need – what else? – more definitions.

Cartesian Categories

A cartesian category $C$ (note not cartesian closed category) is a category:

With a terminal object $t$ , and
$\forall a, b \in Obj(C)$ , the objects and arrows of the categorical product $a \times b \in C$ .

So, a cartesian category is a category closed with respect to product. Many of the common categories are cartesian: the category of sets and the category of enumerable sets, for example. And of course, the meaning of the categorical product in Set? Cartesian product of sets.

Categorical Exponentials

To get from cartesian categories to cartesian closed categories, we also need to define categorical exponentials. Like categorical product, the value of a categorical exponential is not required to be included in a category. The exponential is a complicated definition, and it’s a bit hard to really get your head around, but it’s well worth the effort. If categorical products are the categorical generalization of set products, then the categorical exponential is the categorical version of a function space. It gives us the ability to talk about structures that are the generalized version of “all functions from A to B”.

Given two objects x and y from a category C, their categorical exponential x^y, if it exists in the category, is defined by a set of values:

An object $x^y$ ,
An arrow $\mbox{eval}_{y,x}: x^y \times y \rightarrow x$ , called an evaluation map.
$\forall z \in Obj(C)$ , an operation $\Lambda_C: (z \times y \rightarrow x) \rightarrow (z \rightarrow x^y)$ . (That is, an operation mapping from arrows to arrows.)

These values must have the following properties:

$\forall f : z \times y \rightarrow x, g : z \rightarrow x^y$ :
- $\mbox{eval}_{y,x} \circ (\Lambda_C(f) \times 1_y) = f$
- $\Lambda_C(\mbox{eval}_{y,x} \circ (g \times 1_y)) = g$

To make that a bit easier to understand, let’s turn it into a diagram.

As I alluded to earlier, you can also think of it as a generalization of a function space. $x^y$ is the set of all functions from y to x. The evaluation map is a simple description in categorical terms of an operation that applies a function from y to x (an arrow) to a value from y, resulting in a value from x.

So what does the categorical exponential mean? I think it’s easiest to explain in terms of sets and functions first, and then just step it back to the more general case of objects and arrows.

If X and Y are sets, then $X^Y$ is the set of functions from Y to X.

Now, look at the diagram:

The top part says, basically, that $g$ is a function from $Z$ to $X^Y$ : so $g$ takes a member of $Z$ , and uses it to select a function from $Y$ to $X$ .
The vertical arrow says:
1. given the pair $(z,y)$ , $f$ maps $(z,y)$ to a value in $X$ .
2. given a pair $(z,y)$ , we’re going through a function. It’s almost like currying:
  1. The vertical arrow going down is basically taking $f(z,y)$ , and currying it to $g(z)(y)$ .
  2. Per the top part of the diagram, $g(z)$ selects a function from $Y$ to $X$ . (That is, a member of $X^Y$ .)
  3. So, at the end of the vertical arrow, we have a pair $(g(z), y)$ .
3. The “eval” arrow maps from the pair of a function and a value to the result of applying the function to the value.

Cartesian Closed Categories

Now – the abstraction step is actually kind of easy: all we’re doing is saying that there is a structure of mappings from object to object here. This particular structure has the essential properties of what it means to apply a function to a value. The internal values and precise meanings of the arrows connecting the values can end up being different things, but no matter what, it will come down to something very much like function application.
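To make the set-and-function reading of the exponential concrete, here's a rough sketch in Python. It treats ordinary Python functions as the arrows of Set and checks the two exponential laws pointwise on a few sample values, so it's a spot check and not a proof. The names `curry`, `eval_map`, and `times_id`, along with the sample functions, are my own choices for the illustration.

```python
def curry(f):
    """Lambda: turn f : Z x Y -> X into g : Z -> X^Y."""
    return lambda z: (lambda y: f(z, y))

def eval_map(pair):
    """eval : X^Y x Y -> X, apply the function in the pair to the value."""
    fn, y = pair
    return fn(y)

def times_id(g):
    """g x 1_Y : Z x Y -> X^Y x Y, sending (z, y) to (g(z), y)."""
    return lambda z, y: (g(z), y)

def f(z, y):
    """A sample arrow f : Z x Y -> X."""
    return 10 * z + y

def g(z):
    """A sample arrow g : Z -> X^Y."""
    return lambda y: z * y + 1

Z = range(3)
Y = range(4)

# Law 1: eval . (Lambda(f) x 1_Y) = f
law1 = all(eval_map(times_id(curry(f))(z, y)) == f(z, y) for z in Z for y in Y)

# Law 2: Lambda(eval . (g x 1_Y)) = g, compared pointwise
recovered = curry(lambda z, y: eval_map(times_id(g)(z, y)))
law2 = all(recovered(z)(y) == g(z)(y) for z in Z for y in Y)

print(law1, law2)
```

The point of the sketch is that `curry` and `eval_map` undo each other: currying and then evaluating gets you back to `f`, and evaluating and then currying gets you back to `g`.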

With exponentials and products, we can finally say what the cartesian closed categories (CCCs) are. A Cartesian closed category is a category that is closed with respect to both products and exponentials.

Why do we care? Well, the CCCs are in a pretty deep sense equivalent to the simply typed lambda calculus. That means that the CCCs are deeply tied to the fundamental nature of computation. The structure of the CCCs – with its closure WRT product and exponential – is an expression of the basic capability of an effective computing system. So next, we’ll take a look at a couple of examples of what we can do with the CCCs as a categorical model of computation.

Building Structure in Category Theory: Definitions to Build On


The thing that I think is most interesting about category theory is that what it’s really fundamentally about is structure. The abstractions of category theory let you talk about structures in an elegant way; and category diagrams let you illustrate structures in a simple visual way. Morphisms express the structure of a category; functors are higher level morphisms that express the structure of relationships between categories.

In my last category theory post, I showed how you can use category theory to describe the basic idea of symmetry and group actions. Symmetry is, basically, an immunity to transformation – that is, a kind of structural property of an object or system where applying some kind of transformation to that object doesn’t change the object in any detectable way. The beauty of category theory is that it makes that definition much simpler.

Symmetry transformations are just the tip of the iceberg of the kinds of structural things we can talk about using categories. Category theory lets you build up pretty much any mathematical construct that you’d like to study, and describe transformations on it in terms of functors. In fact, you can even look at the underlying conceptual structure of category theory using category theory itself, by creating a category in which categories are objects, and functors are the arrows between categories.

So what happens if we take the same kind of thing that we did to get group actions, and we pull out a level, so that instead of looking at the category of categories, focusing on arrows from the specific category of a group to the category of sets, we do it with arrows between members of the category of functors?

We get the general concept of a natural transformation. A natural transformation is a morphism from functor to functor, which preserves the full structure of morphism composition within the categories mapped by the functors. The original inventor of category theory said that natural transformations were the real point of category theory – they’re what he wanted to study.

Suppose we have two categories, C and D. And suppose we also have two functors, F, G : C → D. A natural transformation from F to G, which we’ll call η, maps every object x in C to an arrow η_x : F(x) → G(x). η_x has the property that for every arrow a : x → y in C, η_y º F(a) = G(a) º η_x. If this is true, we call η_x the component of η for (or at) x.

That paragraph is a bit of a whopper to interpret. Fortunately, we can draw a diagram to help illustrate what that means. The following diagram commutes if η has the property described in that paragraph.

I think this is one of the places where the diagrams really help. We’re talking about a relatively straightforward property here, but it’s very confusing to write about in equational form. But given the commutative diagram, you can see that it’s not so hard: the path η_y º F(a) and the path G(a) º η_x compose to the same thing: that is, the transformation η hasn’t changed the structure expressed by the morphisms.
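Here's one way to see the naturality square in code, as a hedged sketch. I'm assuming two familiar functors on Python values: lists for F, and a "zero or one element" tuple for G. The component of η is a "first element, if any" function. The names `fmap_list`, `fmap_maybe`, and `safe_head` are mine, and the check only covers a few sample inputs.

```python
def fmap_list(arrow):
    """F(a): apply an arrow to every element of a list."""
    return lambda xs: [arrow(x) for x in xs]

def fmap_maybe(arrow):
    """G(a): apply an arrow inside a tuple holding zero or one values."""
    return lambda m: tuple(arrow(v) for v in m)

def safe_head(xs):
    """Component of eta: the first element wrapped in a tuple, or () if empty."""
    return (xs[0],) if xs else ()

a = len  # a sample arrow a : str -> int

samples = [[], ["x"], ["lambda", "calculus", "ccc"]]

# Naturality: eta_y . F(a) = G(a) . eta_x
naturality_holds = all(
    safe_head(fmap_list(a)(xs)) == fmap_maybe(a)(safe_head(xs))
    for xs in samples
)

print(naturality_holds)
```

It doesn't matter whether you map first and then take the head, or take the head and then map: both paths around the square give the same answer.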

And that’s precisely the point of the natural transformation: it’s a way of showing the relationships between different descriptions of structures – just the next step up the ladder. The basic morphisms of a category express the structure of the category; functors express the structure of relationships between categories; and natural transformations express the structure of relationships between relationships.

Of course, this being a discussion of category theory, we can’t get any further without some definitions. To get to some of the interesting material that involves things like natural transformations, we need to know about a bunch of standard constructions: initial and final objects, products, exponentials… Then we’ll use those basic constructs to build some really fascinating constructs. That’s where things will get really fun.

So let’s start with initial and final objects.

An initial object is a pretty simple idea: it’s an object with exactly one arrow to each of the other objects in the category. To be formal, given a category $C$ , an object $o \in Obj(C)$ is an initial object if and only if $\forall b \in Obj(C): \exists_1 f: o \rightarrow b \in Mor(C)$ . We generally write $0_c$ for the initial object in a category. Similarly, there’s a dual concept of a terminal object $1_c$ , which is an object for which there’s exactly one arrow from every object in the category to $1_c$ .

Given two objects in a category, if they’re both initial, they must be isomorphic. It’s pretty easy to prove: here’s the sketch. Remember the definition of isomorphism in category theory. An isomorphism is an arrow $f : a \rightarrow b$ , where $\exists g : b \rightarrow a$ such that $f \circ g = 1_b$ and $g \circ f = 1_a$ . If an object is initial, then there’s an arrow from it to every other object — including the other initial object. And there’s an arrow back, because the other one is initial. The iso-arrows between the two initials obviously compose to identities: each composite is an arrow from an initial object to itself, and the only such arrow is the identity.
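For a concrete feel for initial and terminal objects, we can count arrows in the category of finite sets, where the empty set plays the role of the initial object and any one-element set is terminal. The sketch below enumerates every function between small sets by brute force. The helper `all_functions` and the sample sets are invented for this illustration.

```python
from itertools import product

def all_functions(domain, codomain):
    """Every function domain -> codomain, each represented as a dict."""
    domain = list(domain)
    codomain = list(codomain)
    return [dict(zip(domain, images))
            for images in product(codomain, repeat=len(domain))]

samples = [set(), {1}, {1, 2}, {"a", "b", "c"}]

# Initial object: exactly one arrow from the empty set to each set.
arrows_from_empty = [len(all_functions(set(), s)) for s in samples]

# Terminal object: exactly one arrow from each set to a one-element set.
arrows_to_singleton = [len(all_functions(s, {"*"})) for s in samples]

print(arrows_from_empty, arrows_to_singleton)
```

Between two ordinary sets there are usually many arrows (nine from a two-element set to a three-element set, for instance), which is what makes "exactly one" such a strong condition.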

Now, let’s move on to categorical products. Categorical products define the product of two objects in a category. The basic concept is simple – it’s a generalization of cartesian product of two sets. It’s important because products are one of the major ways of building complex structures using simple categories.

Given a category $C$ , and two objects $a,b \in Obj(C)$ , the categorical product $a \times b$ consists of:

An object $p$ , often written $a \times b$ ;
two arrows $p_a$ and $p_b$ , where $p \in Obj(C)$ , $p_a : p \rightarrow a$ , and $p_b : p \rightarrow b$ .
a “pairing” operation, which for every object $c \in Obj(C)$ , maps the pair of arrows $f : c \rightarrow a$ and $g : c \rightarrow b$ to an arrow $\langle f,g \rangle : c \rightarrow a \times b$ , where $\langle f,g \rangle$ has the following properties:
1. $p_a \circ \langle f,g \rangle = f$
2. $p_b \circ \langle f,g \rangle = g$
3. $\forall h : c \rightarrow a \times b: \langle p_a \circ h, p_b \circ h \rangle = h$

The first two of those properties are the separation arrows, to get from the product to its components; and the third is the merging arrow, to get from the components to the product. We can say the same thing about the relationships in the product in an easier way using a commutative diagram:

One important thing to understand is that categorical products do not have to exist. This definition does not say that given any two objects $a$ and $b$ , $a \times b$ is a member of the category. What it says is what the categorical product looks like if it exists. If, for a given pair a and b of objects, there is an object that meets this definition, then the product of a and b exists in the category. If not, it doesn’t. For many categories, the products don’t exist for some or even all of the objects in the category. But as we’ll see later, the categories for which the products do exist have some really interesting properties.
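In the category of sets the product does exist, and the three product laws listed earlier are easy to spot-check. Here's a small sketch using Python tuples as the product object. The names `p_a`, `p_b`, and `pairing` mirror the definition, while `compose` and the sample arrows `f`, `g`, and `h` are assumptions made up for the example.

```python
def p_a(pair):
    """Projection p_a : a x b -> a."""
    return pair[0]

def p_b(pair):
    """Projection p_b : a x b -> b."""
    return pair[1]

def pairing(f, g):
    """<f, g> : c -> a x b, built from f : c -> a and g : c -> b."""
    return lambda c: (f(c), g(c))

def compose(outer, inner):
    """Arrow composition: outer . inner."""
    return lambda v: outer(inner(v))

C = range(5)             # sample elements of the object c
f = lambda c: c * c      # f : c -> a
g = lambda c: str(c)     # g : c -> b
h = lambda c: (c + 1, c % 2 == 0)  # an arbitrary arrow h : c -> a x b

fg = pairing(f, g)
law1 = all(compose(p_a, fg)(c) == f(c) for c in C)  # p_a . <f,g> = f
law2 = all(compose(p_b, fg)(c) == g(c) for c in C)  # p_b . <f,g> = g

rebuilt = pairing(compose(p_a, h), compose(p_b, h))
law3 = all(rebuilt(c) == h(c) for c in C)           # <p_a . h, p_b . h> = h

print(law1, law2, law3)
```

The third law is the one that says the pairing is unique: any arrow into the product is completely determined by what it does after each projection.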

Fun with Functors


So far, we’ve looked at the minimal basics of categories: what they are, and how to categorize the kinds of arrows that exist in categories in terms of how they compose with other arrows. Just that much is already enlightening about the nature of category theory: the focus is always on composition.

But to get to really interesting stuff, we need to build up a bit more, so that we can look at more interesting constructs. So now, we’re going to look at functors. Functors are one of the most fundamental constructions in category theory: they give us the ability to create multi-level constructions.

What’s a functor? Well, it’s basically a structure-preserving mapping between categories. So what does that actually mean? Let’s be a bit formal:

A functor $F$ from a category $C$ to a category $D$ is a mapping from $C$ to $D$ that:

Maps each member $m \in Obj(C)$ to an object $F(m) \in Obj(D)$ .
Maps each arrow $a : x \rightarrow y \in Mor(C)$ to an arrow $F(a) : F(x) \rightarrow F(y)$ , where:
- $\forall o \in Obj(C): F(1_o) = 1_{F(o)}$ . (Identity is preserved by the functor mapping of morphisms.)
- $\forall n, o \in Mor(C): F(n \circ o) = F(n) \circ F(o)$ , whenever $n \circ o$ is defined. (Composition is preserved by the functor mapping of morphisms.)

Note: The original version of this post contained a major typo. In the second condition on functors, the “n” and the “o” were reversed. With them in this direction, the definition is actually the definition of something called a covariant functor. Alas, I can’t even pretend that I mixed up covariant and contravariant functors; the error wasn’t nearly so intelligent. I just accidentally reversed the symbols, and the result happened to make sense in the wrong way.

That’s the standard textbook gunk for defining a functor. But if you look back at the original definition of a category, you should notice that this looks familiar. In fact, it’s almost identical to the definition of the necessary properties of arrows!
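If the formal conditions feel dry, here's a minimal sketch of a functor you've probably used without calling it one: "apply a function to every element of a list". The code below spot-checks the two functor conditions on a few sample lists. The names `fmap`, `identity`, and `compose`, and the sample arrows `n` and `o`, are my own choices for the illustration.

```python
def fmap(arrow):
    """The arrow part of the list functor: lift arrow to work on lists."""
    return lambda xs: [arrow(x) for x in xs]

def identity(v):
    """The identity arrow."""
    return v

def compose(n, o):
    """Arrow composition: n . o (apply o first, then n)."""
    return lambda v: n(o(v))

o = lambda k: k + 1   # a sample arrow o : int -> int
n = lambda k: k * 2   # a sample arrow n : int -> int

samples = [[], [0], [1, 2, 3]]

# Identity is preserved: F(1) = 1
identity_ok = all(fmap(identity)(xs) == identity(xs) for xs in samples)

# Composition is preserved: F(n . o) = F(n) . F(o)
composition_ok = all(
    fmap(compose(n, o))(xs) == compose(fmap(n), fmap(o))(xs)
    for xs in samples
)

print(identity_ok, composition_ok)
```

Mapping the composite in one pass gives the same list as mapping `o` and then mapping `n`, which is exactly the second condition.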

We can make functors much easier to understand by talking about them in the language of categories themselves. Functors are really nothing but morphisms – they’re morphisms in a category of categories.

There’s a kind of category, called a small category. (I happen to dislike the term “small” category, but I don’t get a say!) A small category is a category whose collections of objects and arrows are sets, not proper classes.

(As a quick reminder: in set theory, a class is a collection of sets that can be defined by a non-paradoxical property that all of its members share. Some classes are sets of sets; some classes are not sets; they lack some of the required properties of sets – but still, the class is a collection with a well-defined, non-paradoxical, unambiguous property. If a class isn’t a set of sets, but just a collection that isn’t a set, then it’s called a proper class.)

Small categories are, basically, categories that are well-behaved – meaning that their collections of objects and arrows don’t have any of the obnoxious properties that would prevent them from being sets.

The small categories are, quite beautifully, the objects of a category called Cat. (For some reason, category theorists like three-letter labels.) The arrows of Cat are all functors – functors are really just morphisms between categories. Once you wrap your head around that, then the meaning of a functor, and the meaning of a structure-preserving transformation become extremely easy to understand.

Functors come up over and over again, all over mathematics. They’re an amazingly useful notion. I was looking for a list of examples of things that you can describe using functors, and found a really wonderful list on wikipedia. I highly recommend following that link and taking a look at the list. I’ll just mention one particularly interesting example: groups and group actions.

If you’ve been reading GM/BM for a very long time, you’ll remember my posts on group theory. In a very important sense, the entire point of group theory is to study symmetry. But working from a set theoretic base, it takes a lot of work to get to the point where you can actually define symmetry. It took many posts to build up the structure – not to present set theory, but just to present the set theoretic constructs that you need to define what symmetry means, and how a symmetric transformation was nothing but a group action. Category theory makes that so much easier that it’s downright dazzling. Ready?

Every group can be represented as a category with a single object. A functor from the category of a group to the category of Sets is a group action on the set that is the target of the functor. Poof! Symmetry.

Since symmetry means structure-preserving transformation; and a functor is a structure preserving transformation – well, they’re almost the same thing. The functor is an even more general abstraction of that concept: group symmetry is just one particular case of a functor transformation. Once you get functors, understanding symmetry is easy. And so are lots of other things.
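Here's a small sketch of that idea, under assumptions I'm choosing for the example: the group is the integers mod 3 under addition (treated as a category with one object, whose arrows are the group elements), and the functor sends that one object to the three vertices of a triangle and each group element to a rotation. The names `VERTICES`, `GROUP`, `group_compose`, and `action` are all invented for the illustration.

```python
VERTICES = ("A", "B", "C")   # the set the single object is mapped to
GROUP = (0, 1, 2)            # arrows of the one-object category: integers mod 3

def group_compose(n, o):
    """Composition of arrows in the group category: addition mod 3."""
    return (n + o) % 3

def action(k):
    """F(k): the function on VERTICES that rotates each vertex k positions."""
    return lambda v: VERTICES[(VERTICES.index(v) + k) % 3]

# Functor law 1: the identity arrow (0) maps to the identity function.
identity_ok = all(action(0)(v) == v for v in VERTICES)

# Functor law 2: F(n . o) = F(n) . F(o) for every pair of arrows.
composition_ok = all(
    action(group_compose(n, o))(v) == action(n)(action(o)(v))
    for n in GROUP for o in GROUP for v in VERTICES
)

print(identity_ok, composition_ok)
```

The two functor conditions turn out to be precisely the two axioms of a group action: the identity element does nothing, and acting by a product is the same as acting twice.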

And of course, you can always carry these things further. There is a category of functors themselves; and notions which can be most easily understood in terms of functors operating on the category of functors!

This last bit should make it clear why category theory is affectionately known as abstract nonsense. Category theory operates at a level of abstraction where almost anything can be wrapped up in it; and once you’ve wrapped something up in a category, almost anything you can do with it can itself be wrapped up as a category – levels upon levels, categories of categories, categories of functors on categories of functors on categories, ad infinitum. And yet, it makes sense. It captures a useful, comprehensible notion. All that abstraction, to the point where it seems like nothing could possibly come out of it. And then out pops a piece of beautiful crystal. It’s really remarkable.

Good Math/Bad Math

The beauty of math; the humor of stupidity.
