I’m on vacation this week, so I’m recycling some posts that I thought were particularly interesting to give you something to read.
In computer science, especially in the field of programming languages, we tend to use one particular calculus a lot: the Lambda calculus. Lambda calculus is also extensively used by logicians studying the nature of computation and the structure of discrete mathematics. Lambda calculus is great for a lot of reasons, among them:
- It’s very simple.
- It’s Turing complete.
- It’s easy to read and write.
- It’s semantics are strong enough that we can do reasoning from it.
- It’s got a good solid model.
- It’s easy to create variants to explore the properties of various alternative ways of structuring computations or semantics.
The ease of reading and writing lambda calculus is a big deal. It’s led to the development of a lot of extremely good programming languages based, to one degree or another, on the lambda calculus: Lisp, ML, and Haskell are very strongly lambda calculus based.
The lambda calculus is based on the concept of *functions*>. In the pure lambda calculus, *everything* is a function; there are no values *at all* except for functions. But we can build up anything we need using functions. Remember back in the early days of this blog, I talked a bit about how to build mathematics? We can build the entire structure of mathematics from nothing but lambda calculus.
So, enough lead-in. Let’s dive in a look at LC. Remember that for a calculus, you need to define two things: the syntax, which describes how valid expressions can be written in the calculus; and a set of rules that allow you to symbolically manipulate the expressions.
Lambda Calculus Syntax
The lambda calculus has exactly three kinds of expressions:
- Function definition: a function in lambda calculus is an expression, written: “
λ param . body
” which defines a function with one parameter. - Identifier reference: an identifier reference is a name which matches the name of a parameter defined in a function expression enclosing the reference.
- Function application: applying a function is written by putting the function value in front of its parameter, as in “
x y
” to apply the functionx
to the valuey
.
Currying
There’s a trick that we play in lambda calculus: if you look at the definition above, you’ll notice that a function (lambda expression) only takes one parameter. That seems like a very big constraint – how can you even implement addition with only one parameter?
It turns out to be no problem, because of the fact that *functions are values*. So instead of writing a two parameter function, you can write a one parameter function that returns a one parameter function – in the end, it’s effectively the same thing. It’s called currying, after the great logician Haskell Curry.
For example, suppose we wanted to write a function to add x and y. We’d like to write something like: λ x y . x + y
. The way we do that with one-parameter functions is: we first write a function with one parameter, which returns another function with one parameter.
Adding x plus y becomes writing a one-parameter function with parameter x, which returns another one parameter function which adds x to its parameter:
λ x . (λ y . x + y)
Now that we know that adding multiple parameter functions doesn’t *really* add anything but a bit of simplified syntax, we’ll go ahead and use them when it’s convenient.
Free vs Bound Identifiers
One important syntactic issue that I haven’t mentioned yet is *closure* or *complete binding*. For a lambda calculus expression to be evaluated, it cannot reference any identifiers that are not *bound*. An identifier is bound if it a parameter in an enclosing lambda expression; if an identifier is *not* bound in any enclosing context, then it is called a *free* variable.
- *λ x . plus x y*: in this expression, “y” and “plus” are free, because they’re not the parameter of any enclosing lambda expression; x is bound because it’s a parameter of the function definition enclosing the expression “plus x y” where it’s referenced.
- *λ x y.y x*: in this expression both x and y are bound, because they are parameters of the function definition.
- *λ y . (λ x . plus x y*: In the inner lambda, “*λ x . plus x y*”, y and plus are free and x is bound. In the full expression, both x and y are bound: x is bound by the inner lambda, and y is bound by the other lambda. “plus” is still free.
We’ll often use “free(x)” to mean the set of identifiers that are free in the expression “x”.
A lambda calculus expression is completely valid only when all of its variables are bound. But when we look at smaller subexpressions of a complex expression, taken out of context, they can have free variables – and making sure that the variables that are free in subexpressions are treated right is very important.
Lambda Calculus Evaluation Rules
There are only two real rules for evaluating expressions in lambda calculus; they’re called α and beta. α is also called “conversion”, and beta is also called “reduction”.
α Conversion
α is a renaming operation; basically it says that the names of variables are unimportant: given any expression in lambda calculus, we can change the name of the parameter to a function as long as we change all free references to it inside the body.
So – for instance, if we had an expression like:
λ x . if (= x 0) then 1 else x^2
We can do α to replace X with Y (written “α[x/y]” and get):
λ y . if (= y 0) then 1 else y^2
Doing α does *not* change the meaning of the expression in any way. But as we’ll see later, it’s important because it gives us a way of doing things like recursion.
β Reduction
β reduction is where things get interesting: this single rule is all that’s needed to make the lambda calculus capable of performing *any* computation that can be done by a machine.
Beta basically says that if you have a function application, you can replace it with a copy of the body of the function with references to the parameter identifiers replaced by references to the parameter value in the application. That sounds confusing, but it’s actually pretty easy when you see it in action.
Suppose we have the application expression: “*(λ x . x + 1) 3*“. What beta says is that we can replace the application by taking the body of the function (which is “*x + 1*”); and replacing references to the parameter “*x*” by the value “*3*”; so the result of the beta reduction is “*3 + 1*”.
A slightly more complicated example is the expression:
λ y . (λ x . x + y)) q
It’s an interesting expression, because it’s a lambda expression that when applied, results in another lambda expression: that is, it’s a function that creates functions. When we do beta reduction in this, we’re replacing all references to the parameter “y” with the identifier “q”; so, the result is “*λ x. x+q*”.
One more example, just for the sake of being annoying:
“(λ x y. x y) (λ z . z × z) 3“.
That’s a function that takes two parameters, and applies the first one to the second one. When we evaluate that, we replace the parameter “*x*” in the body of the first function with “*λ z . z × z*”; and we replace the parameter “*y*” with “3”, getting: “*(λ z . z × z) 3*”. And we can perform beta on that, getting “*3 × 3*”.
Written formally, beta says:
λ x . B e = B[x := e] if free(e) ⊂ free(B[x := e]
That condition on the end, “if free(e) subset free(B[x := e]” is why we need α: we can only do beta reduction *if* doing it doesn’t create any collisions between bound identifiers and free identifiers: if the identifier “z” is free in “e”, then we need to be sure that the beta-reduction doesn’t make “z” become bound. If there is a name collision between a variable that is bound in “B” and a variable that is free in “e”, then we need to use α to change the identifier names so that they’re different.
As usual, an example will make that clearer: Suppose we have a expression defining a function, “*λ z . (λ x . x+z)*”. Now, suppose we want to apply it:
(λ z . (λ x . x + z)) (x + 2)
In the parameter “*(x + 2)*”, x is free. Now, suppose we break the rule and go ahead and do beta. We’d get “*λ x . x + x + 2*”.
The variable that was *free* in “x + 2” is now bound. Now suppose we apply that function: “*(λ x . x + x + 2) 3*”. By beta, we’d get “3 + 3 + 2”, or 8.
What if we did α the way we were supposed to?
- By α[x/y], we would get “*λ z . (λ y . y + z) (x+2)*”.
- by α[x/y]:
lambda z . (lambda y . y+z)) (x + 2)
. Then by β, we’d get “*λ y . y + x + 2*”. If we apply this function the way we did above, then by β, we’d get “*3+x+2*”.
“*3+x+2*” and “*3+3+2*” are very different results!
And that’s pretty much it. There’s another *optional* rule you can add called η-conversion. η is a rule that adds *extensionality*, which provides a way of expressing equality between functions.
η says that in any lambda expression, I can replace the value “f” with the value “g” if/f for all possible parameter values x, *f x = g x*.
What I’ve described here is Turing complete – a full effective computation system. To make it useful, and see how this can be used to do real stuff, we need to define a bunch of basic functions that allow us to do math, condition tests, recursion, etc. I’ll talk about those in my next post.
We also haven’t defined a model for lambda-calculus yet. (I discussed models here and here.) That’s actually quite an important thing! LC was played with by logicians for several years before they were able to come up with a complete model for it, and it was a matter of great concern that although LC looked correct, the early attempts to define a model for it were failures. After all, remember that if there isn’t a valid model, that means that the results of the system are meaningless!
I’d love to see an elaboration of this point.
Mark, I found this blog post of yours very useful in understanding lambda calculus.
I have a question: Why is it called “lambda” calculus? 🙂
Ashwin:
That turned out to be a good question. It was more arbitrary than I thought.
See pp 18-19 in http://ece.ut.ac.ir/classpages/F84/ProgrammingLanguagesDesign/slides/Lambda.ppt .
I have no formal degree in mathematics and was looking for a intro to lambda calculus. Finally found it. Very well written article. cheers!
I just read the entire sequence of lambda calculus posts, and one of the first things that struck me was that while I had only vaguely heard of the lambda calculus before, everything looked very familiar. I found myself explaining the system to myself by comparing it to … birds. You see, there is a wonderful book by Raymond Smullyan called To Mock a Mockingbird that teaches, using exercises disguised as puzzles (and wrapped up in a very whimsical writing style), about Curry’s combinators. The final chapter of the book (of course– how could he resist, after building all the machinery?) proves G”odel’s incompleteness theorem. In any case, to me, S and K are a starling and a kestrel. Just so you know.