Today’s entry is short, but sweet. I wanted to write something longer, but I’m very busy at work, so this is what you get. I think it’s worth posting despite its brevity.
When we look at groups, one of the problems that we can notice is that there are things
that seem to be symmetric, but which don’t work as groups. What that means is that despite the
claim that group theory defines symmetry, that’s not really entirely true. My favorite example of this is the fifteen puzzle.
The fifteen puzzle is a four-by-four grid filled with 15 tiles, numbered from 1 to 15, and one empty space. You can make a move in the puzzle by sliding a tile adjacent to the empty
space into the empty. In the puzzle, you scramble up the tiles, and then try to move them back so that they’re in numerical order. The puzzle, in its initial configuration, is shown to the right.
If you look at the 15 puzzle in terms of configurations – that is, assignments of the pieces to different positions in the grid – so that each member of the group describes a single tile-move in a configuration, you can see some very clear symmetries. For example, the moves that are possible when the empty is in any corner are equivalent to the moves that are possible when the empty is in any other corner. The possible moves when the space is in any given position are the same except for the labeling of the tiles around them. There’s definitely a kind of symmetry there. There are also loops – sequences of moves which end in exactly the same state as the one in which they began. Those are clearly symmetries.
But it’s not a group. In a group, the group operation most be total – given any pair of values x and y in the group, it must be possible to combine x and y via x+y. But with the 15 puzzle, there moves that can’t be combined with other moves. If x = “move the ‘3’ tile from square 2 to square 6”, and y = “move the ‘7’ tile from square 10 to square 11”, then there’s no meaningful value for “x+y”; the two moves can’t be combined.
So far, I’ve spent some time talking about groups and what they mean. I’ve also given a
brief look at the structures that can be built by adding properties and operations to groups –
specifically rings and fields.
Now, I’m going to start over, looking at things using category theory. Today, I’ll start
with a very quick refresher on category theory, and then I’ll give you a category theoretic
presentation of group theory. I did a whole series of articles about category theory right after I moved GM/BM to ScienceBlogs; if you want to read more about category theory than this brief introduction, you can look at the category theory archives.
Like set theory, category theory is another one of those attempts to form a fundamental
abstraction with which you can build essentially any mathematical abstraction. But where sets
treat the idea of grouping things together as the fundamental abstraction, category
theory makes the idea of mappings between things as the fundamental abstraction.
As promised, I’m finally going to get to the theory behind monads. As a quick review, the basic idea of the monad in Haskell is a hidden transition function – a monad is, basically, a state transition function.
The theory of monads comes from category theory. I’m going to assume you know a little bit about category theory – if you have trouble with it, go take a look at my introductory posts here.
Suppose we’ve got a topological space. So far, in our discussion of topology, we’ve tended to focus either very narrowly on local properties of **T** (as in manifolds, where locally, the space appears euclidean), or on global properties of **T**. We haven’t done much to *connect* those two views. How do we get from local properties to global properties?
One of the tools for doing that is a sheaf (plural “sheaves”). A sheaf is a very general kind of structure that provides ways of mapping or relating local information about a topological space to global information about that space. There are many different kinds of sheaves; rather than being exhaustive, I’ll pretty much stick to a simple sheaf of functions on the topological space. Sheaves show up *all over* the place, in everything from abstract algebra to algebraic geometry to number theory to analysis to differential calculus – pretty much every major abstract area of mathematics uses sheaves.
This is going to be a short but sweet post on topology. Remember way back when I started writing about category theory? I said that the reason for doing that was because it’s such a useful tool for talking about other things. Well, today, I’m going to show you a great example of that.
Last friday, I went through a fairly traditional approach to describing the topological product. The traditional approach not *very* difficult, but it’s not particularly easy to follow either. The construction isn’t really that difficult, but it’s not easy to work out just what it all really means.
There is another approach to presenting it using category theory, and to me at least, it makes it a *whole* lot easier to grasp. To make the diagrams easier to draw, I’ll adopt one shorthand: instead of writing (T,τ) for topological spaces, I’ll use a single symbol, like **X**, with the understanding that **X** represents the *pair* of the set and the topology that form the topological space.
Suppose we have a set topological spaces, **E**1, **E**2, …, **E**n. The product **P** = Πi=1..n**E**i is the *only* topological space with projection functions pi : **P** → **E**i, such that
for any other topological spaces **S**, if **S** has continuous functions fi : **S** → **E**i to each of the elements of the product, then there is *exactly one* continuous function g : **S** → **P** such that the following diagram commutes:
That’s really just a repetition of the definition of categorical product, just made specific to the category **Top**. Everything I said in fridays post about what forms the open sets of the topological product space is directly implied by this categorical definition. The property of the open sets of the product topology being the coarsest structure of sets that maintains the structural properties of the product element topologies – that’s implied by the categorical description.
To me, this is the real beauty of category theory, and the whole reason why I spent all that time explaining it. Being able to describe structures in the language of category theory makes things much easier to understand.
This is one of the last posts in my series on category theory; and it’s a two parter. What I’m going to do in these two posts is show the correspondence between lambda calculus and the [cartesian closed categories][ccc]. If you’re not familiar with lambda calculus, you can take a look at my series of articles on it [here][lambda]. You might also want to follow the CCC link above, to remind yourself about the CCCs. (The short refresher is: a CCC is a category that has an exponent and a product, and is closed over both. The product is an abstract version of cartesian set product; the exponent is an abstraction of the idea of a function, with an “eval” arrow that does function evaluation.)
Today I’m going to show you one half of the correspondence: how the lambda-theory of any simply typed lambda calculus can be mapped onto a CCC. Tomorrow, I’ll show the other direction: how an arbitrary CCC can be mapped onto a lambda-theory.
First, let’s define the term “lambda theory”. In the simply typed lambda calculus, we always have a set of *base types* – the types of simple atomic values that can appear in lambda expressions. A *lambda theory* is a simply typed lambda calculus, plus a set of additional rules that define equivalences over the base types.
So, for example, if one of the base types of a lambda calculus was the natural numbers, the lambda theory would need to include rules to define equality over the natural numbers:
1. x = y if x=0 and y=0; and
2. x = y if x=s(x’) and y=s(y’) and x’ = y’
So. Suppose we have a lambda-theory **L**. We can construct a corresponding category C(**L**). The objects in C(**L**) are the *types* in **L**. The arrows in C(**L**) correspond to *families* of expressions in **L**; an arrow f : A → B corresponds to the set of expressions of type B that contain a single *free* variable of type A.
The *semantics* of the lambda-theory can be defined by a *functor*; in particular, a *cartesian closed* functor F that maps from C(**L**) to the closed cartesian category of Sets. (It’s worth noting that this is completely equivalent to the normal Kripke semantics for lambda calculus; but when you get into more complex lambda calculi, like Hindley-Milner variants, this categorical formulation is *much* simpler.)
We describe how we build the category for the lambda theory in terms of a CCC using something called an *interpretation function*. It’s really just a notation that allows us to describe the translation recursively. The interpretation function is written using brackets: [A] is the categorical interpretation of the type A from lambda calculus.
So, first, we define an object for each type in **L**, including “unit”, which is a special distinguished value used for an expression that doesn’t return anything:
* ∀ A ∈ basetypes(**L**), [A] = AC ∈ C(**L**)
* [unit] = 1C
Next, we need to define the typing rules for complex types:
* [ A × B ] = [A] × [B]
* [ A → B ] = [B][A]
Now for the really interesting part. We need to look at type derivations – that is, the type inference rules of the lambda calculus – to show how to do the correspondences between more complicated expressions. Type derivations are done with a *context* containing a set of *type judgements*. Each type judgement assigns a type to a lambda term. We write type contexts as capital greek letters like Γ. There are two translation rules for contexts:
* [∅] = 1C
* [Γ, x : A] = [Γ] × [A]
We also need to describe what to do with the values of the primitive types:
* For each value v : A, there is an arrow v : 1 → AC.
And now the rest of the rules. Each of these is of the form [Γ :- x : A] = arrow, where we’re saying that Γ entails the type judgement “x : A”. What it means is the object corresponding to the type information covering a type inference for an expression corresponds to the arrow in C(**L**).
* [Γ :- unit : Unit] = ! : [Γ] → [Unit] *(A unit expression is a special arrow “!” to the Unit object)*
* [Γ :- a : AC] = a º ! : [Γ] → [AC] *(A simple value expression is an arrow composing with ! to form an arrow from Γ to the type object of Cs type.)*
* [Γ x : A :- x : A] = π2 : ([Γ ] × [A]) → [A] *(A term which is a free variable of type A is an arrow from the product of Γ and the type object A to A; That is, an unknown value of type A is some arrow whose start point will be inferred by the continued interpretation of gamma, and which ends at A. So this is going to be an arrow from either unit or a parameter type to A – which is a statement that this expression evaluates to a value of type A.)*
* [Γ, x:A :- x’ : A’],x’ ≠ x = [Γ :- x’ : A’] º π1 where π1 : ([Γ] × [A]) → [A’]. *(If the type rules of Γ plus the judgement x : A gives us x’ : A’, then the term x’ : A’ is an arrow starting from the product of the interpretation of the full type context with A), and ending at A’. This is almost the same as the previous rule: it says that this will evaluate to an arrow for an expression that results in type A.)*
* [Γ :- λ x:A . M : A → B] = curry([Γ, x:A :- M:B]) : [Γ] → [B][A] *(A function typing rule: the function maps to an arrow from the type context to an exponential [B][A], which is a function from A to B.)*
* [Γ :- (M M’) : B] = evalC,B º ([Γ :- M : C → B], [Γ :- M’ : C]) : [Γ] → [B] *(A function evaluation rule; it takes the eval arrow from the categorical exponent, and uses it to evaluate out the function.)*
There are also two projection rules for the decomposing categorical products, but they’re basically more of the same, and this is already dense enough.
The intuition behind this is:
* *arrows* between types are families of values. A particular value is a particular arrow from unit to a type object.
* the categorical exponent *is* the same as a function type; a function *value* is an arrow to the type. Evaluating the function is using the categorical exponent’s eval arrow to “decompose” the exponent, and produce an arrow to the function’s result type; *that* arrow is the value that the function evaluates to.
And the semantics – called functorial semantics – maps from the objects in this category, C(**L**) to the category of Sets; function values to function arrows; type objects to sets; values to value objects and arrows. (For example, the natural number type would be an object like the NNO we showed yesterday in C(**L**), and the set of natural numbers is the sets category would be the target of the functor.)
Aside from the fact that this is actually a very clean way of defining the semantics of a not-so-simply typed lambda calculus, it’s also very useful in practice. There is a way of executing lambda calculus expressions as programs that is based on this, called the [Categorical Abstract Machine][cam]. The best performing lambda-calculus based programming language (and my personal all-time-favorite programming language), [Objective-CAML][caml] is based on the CAM. (CAML stands for categorical abstract machine language.)
[cam]: http://portal.acm.org/citation.cfm?id=34883&dl=ACM&coll=portal
[caml]: http://caml.inria.fr/
[lambda]: http://goodmath.blogspot.com/2006/06/lamda-calculus-index.html
[ccc]: http://scienceblogs.com/goodmath/2006/06/categories_products_exponentia_1.php
Sorry for the delay in the category theory articles. I’ve been busy with work, and haven’t had time to do the research to be able to properly write up the last major topic that I plan to cover in cat theory. While doing my research on closed cartesian categories and lambda calculus, I came across this little tidbit that I hadn’t seen before, and I thought it would be worth taking a brief diversion to look at.
Category theory is sufficiently powerful that you can build all of mathematics from it. This is something that we’ve known for a long time. But it’s not really the easiest thing to demonstrate.
But then, I came across this lovely little article on wikipedia. It turns out that you can [build the natural numbers entirely in terms of categories][wikipedia] in a beautifully simple way.
Before we dive into categorical numbers, let’s take a moment to remember what we really mean by natural numbers.
Peano Arithmetic
——————
The meanings of natural numbers are normally defined in terms of what we call *Peano arithmetic*. Peano arithmetic defines the natural numbers, **N** in terms of five fundamental axioms:
1. 0 ∈ **N** (0 is a natural number.)
2. ∀ a ∈ **N** ∃ a’=s(n), a’ ∈ **N**. (Every natural number a has a successor, s(a))
3. ¬ ∃ a ∈ **N** : s(a)=0. (No natural number has 0 as its successor.)
4. ∀ a,b ∈ **N**: s(a)=s(b) ⇒ a=b. (Each number has a unique successor.)
5. (P(0) ∧ ∀ a : P(a) ⇒ P(s(a))) ⇒ ∀ b ∈ **N** : P(b). (If a property P is true for 0; and it can be shown that for any number a, if it’s true for a, then it’s true for b, then that means that it’s true for all natural numbers. This is the *axiom of induction*.)
To get to arithmetic, we need to add two operations: addition and multiplication.
We can define addition recursively quite easily:
1. ∀ x ∈ **N**: x+0 = x
2. ∀ x,y ∈ **N**: s(x)+y=x+s(y)
And once we have addition, we can do multiplication:
1. ∀ x ∈ **N**: x * 0 = 0 ∧ 0 * x = 0
2. ∀ x ∈ **N**: 1 * x = x
3. ∀ x,y ∈ **N**: s(x)*y = (x*y) + y
To sum up: Peano arithmetic is based on a definition of the natural numbers in terms of five fundamental properties, called the Peano axioms, built on “0”, a successor function, and induction. If we have those five properties, we can define addition and multiplication; and given those, we have the ability to build up arbitrary functions, sets, and the other fundamentals of mathematics in terms of the natural numbers.
Peano Naturals and Categories
———————————
The peano axioms can be captured in a category diagram for a relatively simple object called a *natural number object (NNO)*. A natural number object can exist in many categories: basically any category with a terminal object *can* have NNOs. Any closed cartesian category *does* have an NNO.
There are two alternate constructions; a general construct that works for all NNO-containing categories; and a more specific one for closed cartesian categories. (You can show that they are ultimately the same in a CCC.)
I actually think that the general one is easier to understand, so that’s what I’ll describe. This is basically a construction that captures the ideas of successor and recursion.
Given a category C with a terminal object **1**. A *natural number object* is an object *N* with two arrows z : 1 → *N*, and s : *N* → *N*, where z and s have the following property.
(∀ a ∈ Obj(C), (q : 1 → a, f : a → a) ∈ Arrows(C)) : u º z = q ∧ u º s = f º u.
That statement means the same thing as saying that the following diagram commutes:
Let’s just walk through that a bit. Think of the object **N** as a set. “s” is successor. So S is an arrow from *N* to *N* that maps each number to its successor. *N* is “rooted” by the terminal object; that gives it its basic structure. The “z” arrow from the terminal object maps to the zero value in *N*. So we’ve got an arrow that selects zero; and we’ve got an arrow that defines the successor.
Think of “q” as a predicate, and “u” as a statement that a value in *N* satisfies the predicate.
So u º z, that is the path from 1 to a by stepping through z and u is a statement that the predicate “q” is true for z (zero); because composing u (the number I’m composed with satisfies q) and z (the number zero) gives us “q” (zero satifies q).
Now the induction rule part. *N* maps to *N* by s – that’s saying that s is an arrow from a number in *N* to its successor. If you have an arrow composition that says “q is true for n”, then f can be composed onto that giving you “q is true for n + 1”.
If for all q, u is unique – that is, there is only *one* mapping where induction works; then you’ve got a categorical structure that satisfies the Peano axioms.
[wikipedia]: http://en.wikipedia.org/wiki/Natural_number_object
Today we’ll finally get to building the categories that provide the model for
the multiplicative linear logic. Before we jump into that, I want to explain why it is that we separate out the multiplicative part.
Remember from the simply typed lambda calculus, that [we showed that the type system was precisely a subset of intuitionistic logic][lambda-type], and that programs in the lambda calculus corresponded to proofs in the type system. In particular, it was propositional intuitionistic logic using only the “→” operation. Similarly, if you build up a more interesting typed lambda calculus like [System-F][systemf], the type system is intuitionist logic *with* the “∀” quantifer and the “∧” operator, but without “∨”, negation, or “∃”. Why do we eliminate the existentials and friends? Because if we left them in, then the *type system* becomes Turing complete and undecidable. If we’re careful, we can analyze a program using universal types, logical and (composition in lambda), and implication and infer the types if it’s a valid program. (Actually, even System-F is undecidable without any type annotations, but HMF – the Hindley-Milner subset, *is* decidable. NP-hard, but decidable.)
The main reason that people like me really care about linear logic is because there is a variant of lambda calculus that includes *linear types*. Linear types are types where referencing the value corresponding to the type *consumes* it. For dealing with the semantics of real programming languages, there are many things that make sense as linear types. For example, the values of an input stream behave in a linear way: read something from an input stream, and it’s not there anymore: unless you copied it, to put it someplace safe, it’s gone.
For typing linear lambda calculus, we use *intuitionistic linear logic* with the *multiplicative* operators. They give us a *decidable* type system with the capability to express linear type constraints.
Enough side-tracking; back to the categories.
We can now finally define a *linear category*. A category C is a linear category if it is both a [cartesian category][cartesian] and a [monoidal][monoidal] [closed category][monclose] with a *dualizing functor*. A dualizing functor is a *contravariant* closed functor defining properties of the categorical exponential, written (-)* : C → C. (That should be read with “-” as a placeholder for an object; and * as a placeholder for an exponent.) (-)* has the property that there is a natural isomorphism d : Id ≡ (-)** (That is, d is an identity natural isomorphism, and it’s equivalent to applying (-)* to the result of applying (-)* ) such that the following commutes:
So, what does this mean? Take the linear implication operator; mix it with the categorical exponent; and what you get *still* behaves *linearly*: that is, if “X -o Y”; that is, one X can be consumed to produce one Y, then 2 Xs can be consumed to produce 2 Ys; and N Xs can be consumed to produce N Ys.
So a linear category behaves cleanly with exponents; t has a linear implication; it has an *eval* operator (from the fact that it’s a cartesian category) to perform the linear implications; it has tensor for producing groups of resources. That’s it; that’s everything we need from categories for a model of linear logic.
Now, there’s just one more question: are there any real linear categories that make sense as a model for linear logic? And obviously, the answer is yes. There are two of them, called **CPO** and **Lin**.
**CPO** is the category with continuous partial orders as objects, and monotonic functions as morphisms. Think about that, and it makes a whole lot of sense for linear logic; the equivalence classes of the partial order are resources of the same “value” (that is, that can be exchanged for one another); and you can’t climb *upwards* in the partial order without adding something.
The other one, **Lin**, is a lot more complicated. I’m not going to explain it in detail; that would lead into another two-month-long series! But to put it briefly, **Lin** is the category with *coherent domains* as objects, and linear maps as morphisms. To make that make sense, I would have to explain domain theory, which (to put it mildly) is complicated; the short version is that a domain is a kind of CPO. But the reason we care about **Lin** is that programming language semantics are generally described using [*denotational semantics*][denote]; and denotational semantics are built using a kind of domain, called a Scott domain. So this gives us a domain that we can use in programming language semantics with exactly the properties we need to explain how linear types work.
[lambda-type]: http://goodmath.blogspot.com/2006/06/finally-modeling-lambda-calculus.html
[systemf]: http://en.wikipedia.org/wiki/System_F
[cartesian]: http://scienceblogs.com/goodmath/2006/06/categories_products_exponentia_1.php
[monclose]: http://scienceblogs.com/goodmath/2006/07/towards_a_model_for_linear_log_1.php
[monoidal]: http://scienceblogs.com/goodmath/2006/07/the_category_structure_for_lin_1.php
[denote]: http://en.wikipedia.org/wiki/Denotational_semantics
Things are a bit busy at work on my real job lately, and I don’t have time to put together as detailed a post for today as I’d like. Frankly, looking at it, my cat theory post yesterday was half-baked at best; I should have held off until I could polish it a bit and make it more comprehensible.
So I’m going to avoid that error today. Since we’ve had an interesting discussion focusing on the number zero, I thought it would be fun to show what zero means in category theory.
There are two zeros it category theory: the zero object, and the zero arrow.
Zero Objects
————–
The zero object is easier. Suppose we’ve got a category, C. Suppose that C has a *terminal object* – that is, an object t for which other object x in C has *exactly* *one* arrow f : x → t. And suppose that C also has an *initial object*: that is, an object i for which every object x in C has *exactly one* arrow, f : i → x. If C has both an initial object i, and a terminal object t, *and* i = t, then it’s a *zero object* for the category. A category can actually have *many* zero objects. For example, in the category of groups, any *trivial* group (that is, a group which contains only one element) is a zero object in the category of groups. For an intuition of why this is called “zero” think of the number zero. It’s a strange little number that sits dead in the middle of everything. It’s the central point on the complex plane, it’s the center of the number line. A zero *object* sits in the middle of a category: everything has exactly one arrow *to* it, and one arrow *from* it.
Zero Arrows
————-
The idea of a zero *arrow* comes roughly from the basic concept of a zero *function*. A zero function is a function f where for all x, f(x) = 0. One of the properties of a zero function is that composing a zero function with any other function results in a zero function. (That is, if f is a zero function, f(g(x))) is also a zero function.)
A zero morphism in a category C is part of a set of arrows, called the *zero family* of morphisms of C, where composing *any* morphism with a member of the zero family results in a morphism in the zero family.
To be precise: suppose we have a category C, and for any pair b of objects a and b ∈ Obj(C), there is a morphism 0a,b : a → .
The morphism 0d,e must satisfy one important property:
For any two morphisms m : d → e, and n : f → g, the following diagram must commute:
To see why this means that any composition with a 0 arrow will give you a zero arrow, look at what happens when we start with an ID arrow. Let’s make n : f → g an id arrow, by making f=g. We get the following diagram:
So – any zero arrow composed with an id arrow is a zero arrow. Now use induction to step up from ID arrows by looking at one arrow composed with an ID arrow and a zero arrow. You’ve still got a zero. Keep going – you’ll keep getting zeros.
Zero arrows are actually very important buggers: the algebraic and topographical notions of a kernel can be defined in category theory using the zero morphisms.
One last little note: in general, it looks like the zero objects and the zero morphisms are unrelated. They’re not. In fact, if you have a category with a zero object *and* a family of zero morphisms, then you can find the zero morphisms by using the zero object. For any objects a and b, 0a,b = (a → 0) º (0 → b).
So, we’re still working towards showing the relationship between linear logic and category theory. As I’ve already hinted, linear logic has something to do with certain monoidal categories. So today, we’ll get one step closer, by talking about just what kind of monoidal category. As I keep complaining, the problem with category theory is that anytime you want to do something interesting, you have to wade through a bunch of definitions to set up your structures. I’ll do my best to keep it short and sweet; as short as it is, it’ll probably take a bit of time to sink in.
First, we need a symmetry property. A monoidal category C (with terminal object t, tensor ⊗, and natural isomorphisms λ, ρ) is *symmetric* if/f for all objects a and b in C, there is a natural isomorphism γa,b : (a ⊗ b) → (b ⊗ a); and the categories monoidal natural isomorphism ρ = λb º γb,t : (b ⊗ t) → b.
In other words, if you reverse the arrows, but leave everything else the same, you still wind up with a monoidal category.
In a similar vein: regular functors are also sometimes called *covariant* functors. If you take a functor and *reverse* the directions of all the arrows, you get what’s known an a *contravariant* functor, or *cofunctor*.
We also need a *closure* property. Suppose C is a symmetric monoidal category. C is *monoidally closed* if/f there is a functor ⇒ : C × C → C, such that for every object b in C, there is an isomorphism Λ : (a⊗b → c) → (a → (b ⇒ c)), and Λ is a natural transformation for a and c. (Remember that a and c are categories here; this is a natural transformation from category to category.)
Basically, this just says that the tensor stays cleanly in the monoid, and there’s always a natural transformation corresponding to any use of tensor.
So what these two really do is pull up some of the basic category properties (closure with respect to composition, symmetry) and apply them to categories and natural transformations themselves.
Finally, we need to be able to talk about what it means for a *functor* to be closed. So suppose we have a monoidally closed category, with the usual components, up to and including the Λ natural transformation. A functor F : C → C is *closed* if, for all a,b in C, there is an arrow fa,b : ba → F(b)F(a) such that for all g : a → b, fa,b º Λ(gº λa) = Λ(F(g) º λF(a)).
In simple english, what a closed functor is is a functor that can be *represented by* an arrow in the category itself. That’s what all of that gobbledygook really means.
So… as a quick preview, to give you an idea of why we just waded through all of this gunk: if you take a monoidally closed category of the appropriate type, then collections of resources are going to be categories; ⊗ is the tensor to join collections of resources; and “-o” implications are the closed functors “⇒”.