Most programmers are familiar with λ-calculus. It’s one of the most widely used tools in theoretical computer science and logic. It’s the basis of many programming languages, including Lisp, Haskell, ML, and Scala. It’s had a strong influence on many other programming languages that aren’t strictly based on it, including Java, C++, and JavaScript.
Motivation: the Problem
Modern programmers love the λ-calculus. A lot of the mathematicians who worked in what became computer science also loved it. But if you go back to its roots, there was a problem.
The λ-calculus in its untyped form, as it was originally defined, was inconsistent.
Haskell Curry eventually reduced the paradox to something quite simple:

r = λ x . ((x x) ⇒ y)

That is, r is a function which returns y if (x x) is true.
To see the paradox, we need to change how we think of λ-calculus. Most people today who know λ-calculus are programmers, and we think of it primarily as something like a programming language. But the roots of λ-calculus were laid before computers as we know them existed. At the time, λ-calculus was a tool used by logicians. To them it wasn’t a computing machine, it was a logical model of computation.
When we look at the expression (r r), what we see is executable logic that reads as “If applying r to r returns true, then return y”. And by using computational reasoning, we’d conclude that (r r) is a non-terminating computation: β-reducing (r r) gives ((r r) ⇒ y), so to get the value of (r r), we need to evaluate (r r), and then apply r to that result. But to logicians like Haskell Curry, it read as “The statement (r r) implies y”. Rendered into simple English, it’s a statement like: “If this statement is true, then my hair is purple”. It’s purely a logical implication, and so even though we can’t actually evaluate (r r), in logical terms, we can still say “This is a well-formed logical statement, so is it true?”.
Is (r r) true? Suppose it is. If it’s true, then the implication says that whatever y is must be true. Without knowing what y is, we can’t be sure if this is a true statement.
Suppose that it’s false. If (r r) is false, that means that the implication ((r r) ⇒ y) must be true, because FOPL says that if the antecedent of an implication is false, the entire implication is true. But ((r r) ⇒ y) is exactly what (r r) reduces to – so if (r r) is false, then (r r) is true.
It’s clearer when you look at the English: “If this sentence is true, then my hair is purple”. My hair isn’t purple, which means that the statement can’t be true. But if the statement isn’t true, then the implication is true, and so the statement is true. We’re caught in a self-reference loop, just like the one we saw in Russell’s paradox.
This was considered a very bad thing. It’s a bit subtler than the problem with naive set theory. In set theory, we had an unambiguous inconsistency: we had a set that was well-defined under the axioms of set theory, and without any question, the existence of that set made the theory inconsistent. Here, we’ve got an expression which might be consistent, and then again, it might not. It depends on what the consequent – the “my hair is purple” part – of the implication is. If it’s something true, then we’re fine. If it’s not, then we’re stuck.
The problem is, no matter what you put into that “my hair is purple” part, you can use this to produce a proof that it’s true, even if it isn’t. And that’s a fundamental inconsistency, which can’t be tolerated.
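To see just how mechanical that proof is, here’s a compact version of the derivation, sketched in standard logical notation (the numbered presentation is my own, not anything from Curry):

```latex
\begin{aligned}
(1)\;& r\,r = ((r\,r) \Rightarrow y)             && \text{by } \beta\text{-reduction}\\
(2)\;& (r\,r) \Rightarrow ((r\,r) \Rightarrow y) && \text{from (1): } r\,r \text{ implies what it's equal to}\\
(3)\;& (r\,r) \Rightarrow y                      && \text{from (2), by contraction}\\
(4)\;& r\,r                                      && \text{from (1) and (3): they're the same statement}\\
(5)\;& y                                         && \text{from (3) and (4), by modus ponens}
\end{aligned}
```

Nothing in the derivation depends on what y actually says; the self-application in step (1) does all the work.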
Curry’s paradox meant that the logicians working with λ-calculus had a problem that they needed to solve. And following the fashion of the time, they solved it using types. Much like ST type theory attempted to preserve as much of set theory as possible while fixing the inconsistency, typed λ-calculus tried to fix the inconsistency of λ-calculus. The basic approach was also similar: the typed λ-calculus introduced a stratification which made it impossible to build the structures that led to inconsistency.
In ST, the solution was to build a logical system in which it was impossible to express self-reference. In λ-calculus, the basic goal was the same: to eliminate self-reference. But in λ-calculus, that restriction is stated differently. What the type system in λ-calculus does is make it impossible to construct a statement that is a fixed-point combinator.
If we look carefully at the paradoxical statement up above, it’s not really pure λ-calculus. It relies on the fact that we’re defining a function named r, and then applying r to itself using that name. But that’s just syntactic sugar: in pure λ-calculus, the body of r can’t reference r by name. You need some tool to let you apply an expression to itself: that tool is called a fixed-point combinator. The most common fixed-point combinator in λ-calculus is the Y combinator, Y = λ f . (λ x . f (x x)) (λ x . f (x x)), which underlies how recursive computations usually work in λ-calculus.
The way that the simply typed λ-calculus gets around the Curry paradox is by making it impossible to build a well-typed fixed-point combinator. Without that, you can’t build the self-referential constructs that cause the inconsistency. The downside is that the simply typed λ-calculus, without a fixed-point combinator, is not Turing complete: the evaluation of every simply typed λ-calculus expression will eventually terminate.
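To see what that restriction feels like in a modern language, here’s a minimal sketch in Haskell (the names Rec, y, and factorial are my own, not standard library functions). The simply typed fragment of Haskell rejects self-application for exactly the reason described above, and the only way to recover a Y-style fixed-point combinator is to smuggle the self-reference in through a recursive type:

```haskell
-- Self-application is exactly what simple types forbid:
--   selfApply f = f f   -- rejected: f would need a type a equal to a -> b

-- A recursive type re-introduces the self-application that simple types
-- rule out: a Rec a wraps a function from Rec a to a.
newtype Rec a = Roll { unroll :: Rec a -> a }

-- A Y-style fixed-point combinator, built using Rec.
y :: (a -> a) -> a
y f = g (Roll g)
  where g x = f (unroll x x)

-- Usage: factorial defined without any explicit recursion.
factorial :: Integer -> Integer
factorial = y (\fact n -> if n == 0 then 1 else n * fact (n - 1))

main :: IO ()
main = print (factorial 5) -- prints 120
```

The newtype is doing precisely what the stratification forbids: naming a type that contains functions over itself. Take it away, and there’s no well-typed way to write y.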
(As an aside: this isn’t really a problem in practice. The self-referential expressions that cause the Curry paradox turn into non-terminating computations, so they don’t produce a paradox; they just don’t produce anything. They’re still an error, but instead of terminating with an inconsistent result, they never terminate at all. Still, to the logicians at the time, the idea of non-termination was, itself, a deep problem that needed a solution.)
The Solution: Stratification by Types
The way that the simply typed λ-calculus fixed things was by creating a stratification using types. The type system created a restriction on the set of statements that were valid, well-formed logical statements. That restriction made it impossible to express a fixed point combinator or a general recursive computation of any kind.
It’s helpful to once again refer back to ST. In ST, we started the type of atoms at level 0. To get to level 1, we defined predicates over level-0 objects, and the set of objects that matched the predicate was a level-1 type. Then we could define predicates over level-1 objects, and the set of level-1 types that satisfied the predicate was a level-2 type, and so on. In the simply typed λ-calculus, we do the same thing, but with functions: we can build functions that operate on primitive atoms (also called base values), or on other functions. When we define a function, it must be assigned a type, and the type of its parameter must be at a lower level than the type of the function itself. You can’t ever pass a function to itself, because the function is an object at a higher level than the type of its parameter.
We start with base types. Every simply typed λ-calculus starts with a collection of primitive atomic values. The set of atomic values is partitioned into a collection of types, which are called the base types. Base types are usually named by single lower-case greek letters: so, for example, we could have a type ν which consists of the set of natural numbers, a type δ which corresponds to boolean true/false values, and a type σ which corresponds to strings.
Once we have base types, then we can talk about the type of a function. A function maps from a value of one type (the type of the parameter) to a value of a second type (the type of the return value). For a function that takes a parameter of type α and returns a value of type β, we write its type as “α → β”. The “→” is called the function type constructor; it associates to the right, so “α → β → γ” is equivalent to “α → (β → γ)”.
In every function declaration, we need to specify the type of the parameters and the type of the result. So:
- “ν → ν” is the type of a function which takes a natural number as a parameter, and returns a natural number as a result.
- “(ν → ν) → ν” is the type of a function which takes a ν → ν function as a parameter, and produces a natural number as a result.
As usual in λ-calculus, we don’t have multi-parameter functions – all functions are curried. So a function like addNatural, which we’d think of as adding two natural numbers, is really a function that takes a natural number as a parameter, and returns a function that takes a natural number and returns a natural number. So the type of addNatural is “ν → ν → ν”, which parses as “ν → (ν → ν)”.
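As a concrete (and purely illustrative) sketch, here’s the same idea in Haskell, whose arrow types behave the same way; I’m using Integer to stand in for ν:

```haskell
-- addNatural : ν → ν → ν, which parses as ν → (ν → ν).
addNatural :: Integer -> (Integer -> Integer)
addNatural x y = x + y

-- Partial application is the curried reading in action: applying
-- addNatural to one argument yields a ν → ν function.
addThree :: Integer -> Integer
addThree = addNatural 3

main :: IO ()
main = print (addThree 4) -- prints 7
```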
How does this get around the self-reference problem? A function like the r in the Curry paradox needs to apply its parameter to itself, so the parameter would have to have the same type as the function as a whole – and there’s no way to write that in this type system. It’s a significant restriction which makes it impossible to write general recursive expressions – it limits us to something close to primitive recursion – but it avoids the inconsistency. Every valid expression written with this system of types in place is guaranteed to terminate with a consistent result.
Extending λ-calculus with types
Now, it’s time to get at least a little bit formal, to see how we integrate a stratified type system into the lambda calculus. There are two facets to that: the syntactic, and the analytic. The syntactic part shows how we extend λ-calculus to include type declarations, and the analytic part shows how to determine whether or not an expression with type declarations is valid.
The syntax part is easy. We add a “:” to the notation; the colon has an expression or variable binding on its left, and a type specification on its right. It asserts that whatever is on the left side of the colon has the type specified on the right side. A few examples:
- “λ x : ν . x + 3”: This asserts that the parameter x has type ν, which we’ll use as the type name for the natural numbers. (In case it’s hard to tell, that’s a greek letter “nu” for natural.) There is no assertion of the type of the result of the function; but since we know that “+” is a function with type ν → ν → ν, we can infer that the result type of this function will be ν.
- “(λ x . x + 3) : ν → ν”: This is the same as the previous example, but with the type declaration moved out, so that it asserts the type for the lambda expression as a whole. This time we can infer that x : ν, because the function type is ν → ν, which means that the function parameter has type ν.
- “λ x : ν, y : δ . if y then x * x else x”: This is a two-parameter function; the first parameter has type ν, and the second has type δ. We can infer the return type, which is ν, so the type of the full function is ν → δ → ν. This may seem surprising at first; but remember that λ-calculus really works in terms of single-parameter functions; multi-parameter functions are just a shorthand for currying. So really, this function is λ x : ν . (λ y : δ . if y then x * x else x); the inner lambda has type δ → ν, and the outer lambda has type ν → (δ → ν). (All three examples are rendered in Haskell below.)
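Here’s that rendering, as a sketch (I’m using Integer for ν, Bool for δ, and the names ex1/ex2/ex3 are mine):

```haskell
-- λ x : ν . x + 3 (parameter type declared, result type inferred).
ex1 :: Integer -> Integer
ex1 x = x + 3

-- (λ x . x + 3) : ν → ν (the type asserted for the lambda as a whole).
ex2 :: Integer -> Integer
ex2 = \x -> x + 3

-- λ x : ν, y : δ . if y then x * x else x, which is really ν → (δ → ν).
ex3 :: Integer -> (Bool -> Integer)
ex3 x y = if y then x * x else x
```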
To talk about whether a program is valid with respect to types (aka well-typed), we need to introduce a set of rules for checking the validity of the type declarations. Using those rules, we can verify that the program is type-consistent.
In type analysis, we’ll talk about judgements. When we can infer the type of an expression using an inference rule, we call that inference a type judgement. Type analysis allows us to use inference and judgements to reason about types in a lambda expression. If any part of an expression winds up with an inconsistent type judgement, then the expression is invalid. If we can show that all of the components of an expression have consistent type judgements, then we can conclude that the expression is well-typed, meaning that it’s valid with respect to the type system.
Type judgements are usually written in a sequent-based notation, which looks like a fraction, where the numerator consists of statements that we know to be true, and the denominator is what we can infer from the numerator. In the numerator, we normally have statements using a context, which is a set of type judgements that we already know; it’s usually written as an uppercase greek letter, like Γ. If a type context Γ includes the judgement that x : α, I’ll write that as Γ ⊢ x : α.
Rule 1: Type Identity
x : α ⊢ x : α

This is the simplest rule: if we have no other information except a declaration of the type of a variable, then we know that that variable has the type it was declared with.
Rule 2: Type Invariance
Γ ⊢ x : α,  x ≠ y
─────────────────
Γ + y : β ⊢ x : α

This rule is a statement of non-interference. If we know that Γ ⊢ x : α, then inferring a type judgement about any other term y cannot change our type judgement for x.
Rule 3: Function Type Inference
Γ + x : α ⊢ y : β
─────────────────────────
Γ ⊢ (λ x : α . y) : α → β

This rule allows us to infer function types given parameter types. If we know the type of the parameter to a function is α; and if, with our knowledge of the parameter type, we know that the type of the term that makes up the body of the function is β, then we know that the type of the function is α → β.
Rule 4: Function Application Inference
Γ ⊢ x : α → β,  Γ ⊢ y : α
─────────────────────────
Γ ⊢ (x y) : β

This one is easy: if we know that we have a function that takes a parameter of type α and returns a value of type β, then if we apply that function to a value of type α, we’ll get a value of type β.
These four rules are it. If we can take a lambda expression, and come up with a consistent set of type judgements for every term in the expression, then the expression is well-typed. If not, then the expression is invalid.
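The rules are small enough that we can turn them directly into code. Here’s a toy checker, sketched in Haskell (my own encoding, with my own names): the context Γ becomes a list of judgements, and each equation of typeOf implements one of the rules above:

```haskell
-- Types: base types (named by greek letters) and arrow types.
data Type = Base String | Arrow Type Type deriving (Eq, Show)

-- Terms: variables, typed lambdas (λ x : τ . body), and applications.
data Term = Var String
          | Lam String Type Term
          | App Term Term

-- The context Γ: a set of type judgements we already know.
type Context = [(String, Type)]

typeOf :: Context -> Term -> Maybe Type
-- Rules 1 and 2: a variable has whatever type the context declares for it,
-- and judgements about other variables don't interfere with that.
typeOf ctx (Var x) = lookup x ctx
-- Rule 3: if Γ + x:α gives the body type β, the lambda has type α → β.
typeOf ctx (Lam x a body) = Arrow a <$> typeOf ((x, a) : ctx) body
-- Rule 4: applying an (α → β) function to an α yields a β.
typeOf ctx (App f arg) = do
  Arrow a b <- typeOf ctx f
  a'        <- typeOf ctx arg
  if a == a' then Just b else Nothing

main :: IO ()
main = print (typeOf ctx square) -- Just (Arrow (Base "ν") (Base "ν"))
  where
    nat    = Base "ν"
    ctx    = [("*", Arrow nat (Arrow nat nat))]
    -- λ a : ν . a * a, writing a * a as ((* a) a).
    square = Lam "a" nat (App (App (Var "*") (Var "a")) (Var "a"))
```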
So let’s try taking a look at a simple λ-calculus expression, “λ x y . y x”, and seeing how inference works on it.
Without any type declarations or parameters, we don’t know the exact type. But we do know that “x” has some type; we’ll call that “α”; and we know that “y” is a function that will be applied with “x” as a parameter, so it’s got parameter type α, but its result type is unknown. So using type variables, we can say “x:α, y:α→β”. We can figure out what “α” and “β” are by looking at a complete expression. So, let’s work out the typing of it with x = “3” and y = “λ a:ν . a * a”. We’ll assume that our type context already includes “*” as a function of type “ν→ν→ν”, where ν is the type of natural numbers.
- “(λ x y . y x) 3 (λ a:ν . a * a)”: Since 3 is a literal integer, we know its type: 3:ν.
- By rule 4, we can infer that the type of the expression “a*a”, where “a:ν” and “*:ν→ν→ν”, is “ν”; and therefore, by rule 3, the lambda expression has type “ν→ν”. So with type labelling, our expression is now: “(λ x y . y x) (3:ν) ((λ a:ν . (a*a):ν) : ν→ν)”.
- So – now, we know that the parameter “x” of the first lambda must be “ν”; and “y” must be “ν→ν”; so by rule 4, we know that the type of the application expression “y x” must be “ν”; and then by rule 3, the lambda has type: “ν→(ν→ν)→ν”.
- So, for this one, both α and β end up being “ν”, the type of natural numbers. (The sketch below replays this check in code.)
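Replaying that reasoning through the toy checker sketched earlier (reusing its Type, Term, and typeOf, and writing in the annotations we just inferred for x and y), the whole expression checks to ν:

```haskell
-- (λ x : ν . λ y : ν → ν . y x) 3 (λ a : ν . a * a); since the toy Term
-- type has no numeric literals, 3 is modeled as a variable bound in Γ.
example :: Maybe Type
example = typeOf ctx (App (App outer (Var "3")) square)
  where
    nat    = Base "ν"
    ctx    = [("3", nat), ("*", Arrow nat (Arrow nat nat))]
    outer  = Lam "x" nat (Lam "y" (Arrow nat nat) (App (Var "y") (Var "x")))
    square = Lam "a" nat (App (App (Var "*") (Var "a")) (Var "a"))
-- example == Just (Base "ν")
```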
So, now we have a simply typed λ-calculus. The reason that it’s simply typed is because the type treatment here is minimal: the only way of building new types is through the unavoidable → constructor. Other typed lambda calculi include the ability to define parametric types, which are types expressed as functions ranging over types.
Programs are Proofs
Now we can get to the fun part. The mantra of type theory is the program is the proof. Here’s where we get our first glimpse of just what that really means!
Think about the types in the simply typed λ-calculus. Anything which can be formed from the following grammar is a λ-calculus type:
- type ::= primitive | function | ( type )
- primitive ::= α | β | δ | …
- function ::= type→type
The catch with that grammar is that you can create type expressions which are valid type definitions, but for which you can’t write a single, complete, closed expression that will actually have that type. (A closed expression is one with no free variables.) When there is an expression that has a type, we say that the expression inhabits the type, and that the type is an inhabited type. If there is no expression that can inhabit a type, we say it’s uninhabitable. Any expression which either can’t be typed using the inference rules, or which is typed to an uninhabitable type, is a type error.
So what’s the difference between an inhabitable type and an uninhabitable type?
The answer comes from something called the Curry-Howard isomorphism. For a typed λ-calculus, there is a corresponding intuitionistic logic. A type expression is inhabitable if and only if the type is a provable theorem in the corresponding logic.
The type inference rules in λ-calculus are, in fact, the same as logical inference rules in intuitionistic logic. A type α → β can be seen both as a statement that this is a function that maps from a value of type α to a value of type β, and as a logical statement that if we’re given a fact α, we could use that to infer the truth of a fact β.
If there’s a logical inference chain from an axiom (a given type assignment) to an inferred statement, then the inferred statement is an inhabitable type. If we have a function of type α → β, then given an inhabited type α, we know that β is inhabitable, because if α is a fact, then β is also a fact.
On the other hand, think of a different case: α → β on its own. That’s not a theorem, unless there’s some other context that proves it. As a function type, it’s the type of a function which, without any context of any kind, can take a parameter of type α and return a value of a different type β. You can’t do that – there’s got to be some context which provides a value of type β – and to access the context, there’s got to be something to allow the function to access its context: a free variable. Same thing in the logic and the λ-calculus: you need some kind of context to establish “α → β” as a theorem (in the logic) or as an inhabitable type (in the λ-calculus).
What kind of context would make the type α → β inhabitable? A definition of a valid function that takes an α, and returns a β. If such a function exists, then that function is a proof of the inhabitability of the type. Literally, the program is the proof.
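Haskell’s polymorphic types make a nice playground for this correspondence (with the usual caveat that real Haskell has escape hatches, like non-termination, that break the logic reading; treat this as a sketch):

```haskell
-- α → α is a theorem, and its proof is a program:
identity :: a -> a
identity x = x

-- (α → β) → α → β is also a theorem; function application proves it:
apply :: (a -> b) -> a -> b
apply f x = f x

-- But α → β on its own is not a theorem, and correspondingly there is
-- no closed, total term of this type:
--   impossible :: a -> b
--   impossible x = ???   -- nothing consistent can go here
```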