Category Archives: Good Math

e – the Unnatural Natural Number

Looks like I’ve accidentally created a series of articles here about fundamental numbers. I didn’t intend to; originally, I was just trying to write the zero article I’d promised back during the donorschoose drive.
Anyway. Today’s number is *e*, aka Euler’s number, aka the natural log base. (It’s sometimes called Euler’s constant, but that name properly belongs to a different number, γ.) *e* is a very odd number, but very fundamental. It shows up constantly, in all sorts of strange places where you wouldn’t expect it.
What is e?
————
*e* is a transcendental irrational number. It’s roughly 2.718281828459045. It’s also the base of the natural logarithm. That means that, by definition, if ln(x) = y, then *e*^y = x. Given my highly warped sense of humor, and my love of bad puns (especially bad *geek* puns), I like to call *e* the *unnatural natural number*. (It’s natural in the sense that it’s the base of the natural logarithm; but it’s not a natural number according to the usual definition of natural numbers. Hey, I warned you that it was a bad geek pun.)
But that’s not a sufficient answer. We call it the *natural* logarithm. Why is that bizarre irrational number just a bit smaller than 2 3/4 *natural*?
Take the curve y = 1/x. The area under the curve from 1 to n is the natural log of n. *e* is the point on the x axis for which the area under the curve, starting from 1, is exactly one:
[Figure: the curve y = 1/x; the shaded area under the curve from x = 1 to x = e is exactly 1.]
It’s also what you get if you add up the reciprocals of the factorials of every natural number: (1/0! + 1/1! + 1/2! + 1/3! + 1/4! + …)
It’s also what you get if you take the limit, as n → ∞, of (1 + 1/n)^n.
It’s also what you get if you work out this very strange looking series:

2 + 1/(1+1/(2+2/(3+3/(4+..))))

It’s also the base of a very strange equation: the derivative of *e*^x is… *e*^x.
And of course, as I mentioned yesterday, it’s the number that makes the most amazing equation in mathematics work: *e*^(iπ) = -1.
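Since there are so many equivalent definitions, it’s easy to sanity-check a few of them numerically. Here’s a quick Python sketch (mine, just for illustration):

    import math

    # Definition 1: the sum of the reciprocals of the factorials.
    series = sum(1 / math.factorial(k) for k in range(20))

    # Definition 2: the limit of (1 + 1/n)^n as n grows.
    limit = (1 + 1 / 1_000_000) ** 1_000_000

    # Definition 3: e is where the area under 1/x, starting at 1, reaches 1.
    # Find it by bisection, using ln(x) as the area from 1 to x.
    lo, hi = 2.0, 3.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if math.log(mid) < 1:
            lo = mid
        else:
            hi = mid

    # Definition 4: the derivative of e^x is e^x (checked crudely at x = 1).
    h = 1e-8
    deriv = (math.exp(1 + h) - math.exp(1)) / h

    print(series)          # 2.718281828459045
    print(limit)           # 2.7182804690957534 (the limit converges slowly)
    print((lo + hi) / 2)   # 2.718281828459045
    print(deriv)           # ~2.71828... (finite differences are noisy)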
Why does it come up so often? It’s really deeply fundamental. It’s tied to the fundamental structure of numbers. It really is a deeply *natural* number; it’s tied into the shape of a circle, to the basic 1/x curve. There are dozens of different ways of defining it, because it’s so deeply embedded in the structure of *everything*.
Wikipedia even points out that if you put $1 into a bank account paying 100% interest compounded continuously, at the end of the year, you’ll have exactly *e* dollars. (That’s not too surprising; it’s just another way of stating the limit definition of *e*, but it’s got a nice intuitiveness to it.)

The Categorical Model of Linear Logic

Today we’ll finally get to building the categories that provide the model for the multiplicative fragment of linear logic. Before we jump into that, I want to explain why it is that we separate out the multiplicative part.
Remember from the simply typed lambda calculus that [we showed that the type system was precisely a subset of intuitionistic logic][lambda-type], and that programs in the lambda calculus corresponded to proofs in the type system. In particular, it was propositional intuitionistic logic using only the “→” operation. Similarly, if you build up a more interesting typed lambda calculus like [System-F][systemf], the type system is intuitionistic logic *with* the “∀” quantifier and the “∧” operator, but without “∨”, negation, or “∃”. Why do we eliminate the existentials and friends? Because if we left them in, then the *type system* would become Turing complete and undecidable. If we’re careful, we can analyze a program using universal types, logical and (composition in lambda), and implication, and infer the types of a valid program. (Actually, even System-F is undecidable without any type annotations, but HM – the Hindley-Milner subset – *is* decidable. NP-hard, but decidable.)
The main reason that people like me really care about linear logic is because there is a variant of lambda calculus that includes *linear types*. Linear types are types where referencing the value corresponding to the type *consumes* it. For dealing with the semantics of real programming languages, there are many things that make sense as linear types. For example, the values of an input stream behave in a linear way: read something from an input stream, and it’s not there anymore; unless you copied it to put it someplace safe, it’s gone.
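To make the “using it consumes it” idea concrete, here’s a tiny Python sketch (my own illustration; a real linear type system rejects the second use at *compile* time, while Python can only catch it at runtime):

    class Linear:
        """A value that must be used exactly once; using it consumes it."""
        def __init__(self, value):
            self._value = value
            self._consumed = False

        def use(self):
            if self._consumed:
                raise RuntimeError("linear value already consumed")
            self._consumed = True
            return self._value

    packet = Linear("hello")
    print(packet.use())   # fine: the resource is consumed here
    print(packet.use())   # RuntimeError: it's gone, just like the input stream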
For typing linear lambda calculus, we use *intuitionistic linear logic* with the *multiplicative* operators. They give us a *decidable* type system with the capability to express linear type constraints.
Enough side-tracking; back to the categories.
We can now finally define a *linear category*. A category C is a linear category if it is both a [cartesian category][cartesian] and a [monoidal][monoidal] [closed category][monclose] with a *dualizing functor*. A dualizing functor is a *contravariant* closed functor defining properties of the categorical exponential, written (-)* : C → C. (That should be read with “-” as a placeholder for an object, and “*” as the dualizing operation.) (-)* has the property that there is a natural isomorphism d : Id ≡ (-)** (that is, d is a natural isomorphism from the identity functor to the *double* application of (-)*) such that the following diagram commutes:

[Figure: the commutative diagram for the dualizing functor, relating d, (-)*, and (-)**.]

So, what does this mean? Take the linear implication operator, mix it with the categorical exponent, and what you get *still* behaves *linearly*: if “X -o Y” (one X can be consumed to produce one Y), then 2 Xs can be consumed to produce 2 Ys, and N Xs can be consumed to produce N Ys.
So a linear category behaves cleanly with exponents; it has a linear implication; it has an *eval* operator (from the closed category structure) to perform the linear implications; and it has tensor for producing groups of resources. That’s it; that’s everything we need from categories for a model of linear logic.
Now, there’s just one more question: are there any real linear categories that make sense as a model for linear logic? And obviously, the answer is yes. There are two of them, called **CPO** and **Lin**.
**CPO** is the category with complete partial orders as objects, and monotonic functions as morphisms. Think about that, and it makes a whole lot of sense for linear logic: the equivalence classes of the partial order are resources of the same “value” (that is, resources that can be exchanged for one another); and you can’t climb *upwards* in the partial order without adding something.
The other one, **Lin**, is a lot more complicated. I’m not going to explain it in detail; that would lead into another two-month-long series! But to put it briefly, **Lin** is the category with *coherent domains* as objects, and linear maps as morphisms. To make that make sense, I would have to explain domain theory, which (to put it mildly) is complicated; the short version is that a domain is a kind of CPO. But the reason we care about **Lin** is that programming language semantics are generally described using [*denotational semantics*][denote]; and denotational semantics are built using a kind of domain, called a Scott domain. So this gives us a domain that we can use in programming language semantics with exactly the properties we need to explain how linear types work.
[lambda-type]: http://goodmath.blogspot.com/2006/06/finally-modeling-lambda-calculus.html
[systemf]: http://en.wikipedia.org/wiki/System_F
[cartesian]: http://scienceblogs.com/goodmath/2006/06/categories_products_exponentia_1.php
[monclose]: http://scienceblogs.com/goodmath/2006/07/towards_a_model_for_linear_log_1.php
[monoidal]: http://scienceblogs.com/goodmath/2006/07/the_category_structure_for_lin_1.php
[denote]: http://en.wikipedia.org/wiki/Denotational_semantics

Friday Pathological Programming Language: Whenever

I was hoping for a bit of a vanity post for today’s pathological programming language in honor of my 40th birthday (tomorrow), but I didn’t have time to finish implementing my own little piece of insanity. So it’ll have to wait for some other occasion.
Today’s pathological programming language is a really simple monstrosity called [“Whenever”][whenever]. Whenever is a programming language where programs get executed in *random* order, and there are *no* variables. The only state, the only way of manipulating information in a Whenever program, is by manipulating the collection of executable statements itself: the program is both the code *and* the data at the same time.
The basic program model of Whenever is this: you write a set of statements. The statements are inserted into a grab-bag by the interpreter. The interpreter then repeatedly picks a statement out of the bag *at random* and executes it. It keeps doing that until there are no more statements in the bag.
Everything in Whenever is done by manipulating the statement bag.
Each statement is labeled by a number. The number has no direct meaning; it’s just an identifier that will be used to reference the statement. The numbers assigned to lines don’t have to match the order in which the lines were written in the source file; and they have no effect on the order by which statements are pulled out of the bag.
So how do you do anything in this crazy language?
There’s a print statement for printing things out. So, for example,
23 print("Hello world\n");
is the hello world program. Since there’s only one statement, it’ll get grabbed from the statement bag, and executed.
There’s also a read statement, which reads a number from standard input, and then acts as if the statement were the number that it read.
The simplest statement is just a number. A number statement *adds* a copy of the line identified by that number to the bag. If the number is negative, then it *removes* a copy of the statement from the bag. So, for example:
23 print("Hello world\n");
11 23;
would print “Hello world” twice. If 23 were executed first, it would print “Hello world”, and then only 11 would be in the bag, so it would execute 11, which would add 23 to the bag. If 11 went first, then it would add 23 to the bag, and there would be two 23s in the bag, which would get executed.
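That grab-bag execution model is easy to sketch in code. Here’s a minimal Python version of the interpreter loop (my own sketch; it handles only the bare number statements, so the print in statement 23 is modeled as a no-op):

    import random
    from collections import Counter

    def run(program):
        """program: {line_number: list of (target_line, count) actions};
        positive counts add copies to the bag, negative counts remove them."""
        bag = Counter(program.keys())                    # one copy of each statement
        while bag:
            line = random.choice(list(bag.elements()))   # grab a statement at random
            bag[line] -= 1                               # executing consumes that copy
            if bag[line] <= 0:
                del bag[line]
            for target, count in program[line]:
                bag[target] += count
                if bag[target] <= 0:
                    del bag[target]
            print("executed statement", line)

    # The example above: 23 does nothing (standing in for the print);
    # 11 adds another copy of 23 to the bag.
    run({23: [], 11: [(23, 1)]})

Run it a few times and you’ll see both orders described above: sometimes 23 fires first and 11 re-adds it; sometimes 11 goes first and then 23 runs twice.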
You can add multiple copies of a line to the bag: 5#3 adds 3 copies of statement 5 to the bag. And you can add multiple statements to the bag at once, by separating them with a comma. So:
17 21, -2, 3#4, -5#2;
would insert one copy of statement 21 and four copies of statement 3 into the bag; and remove one copy of statement 2, and two copies of statement 5.
You can also do any normal arithmetic operation on numbers. The result of the arithmetic operation is interpreted as a line number.
There’s also two kinds of conditionals, “defer” and “again”. Defer takes a parameter which is evaluated to a line number, and if there are any copies of that line number in the bag, then it reinserts itself, and doesn’t do anything else. If there are no copies of the parameter in the bag, then the statement on the rest of the line is executed.
Again, an example:
1 print(“Hello”);
2 defer(1) print(“Worldn”);
is a more complicated version of hello world.
The “again” statement is very similar to the “defer” statement; but if its argument is true (i.e., evaluates to a statement that is present in the bag), then it adds a copy of itself to the bag; whether the parameter is true or false, it then executes the rest of the statement.
There’s one helpful built-in function: N(x) returns the number of copies of statement x in the bag.
So, a couple of interesting example programs:
1 defer (4 || N(1)<N(2) && N(2)<N(3)) print(N(1)+" bottles of beer on the wall, "+N(1)+" bottles of beer,");
2 defer (4 || N(1)==N(2)) print("Take one down and pass it around,");
3 defer (4 || N(2)==N(3)) print(N(1)+" bottles of beer on the wall.");
4 1#98,2#98,3#98;
This first ensures that statement 4 runs first: statements 1, 2, and 3 will all defer until 4 has been executed. Once 4 has run, there are 99 copies of statements 1, 2, and 3 in the bag. The rest of the defer conditions make sure that 1 executes before 2, and 2 before 3; so it cycles through 1, 2, 3, ninety-nine times. Pretty simple, right?
How about this?
1 again (1) defer (3 || N(1)<=N(2) || N(7)>99) 2#N(1),3,7;
2 again (2) defer (3 || N(2)<=N(1) || N(7)>99) 1#N(2),3,7;
3 defer (5) print(N(1)+N(2));
4 defer (5) print(“1”);
5 4,-3,7;
6 defer (4) 3;
7 7;
8 defer (N(7)<100) -1#N(1),-2#N(2),-7#100;
9 defer (3 || 6) 1,3;
If you look carefully, you’ll see that it generates the first 100 Fibonacci numbers.
It's an incredibly simple language. Simple, but quite twisted, and seriously mind-warping to try to program in: you always need to keep track of the fact that the statements representing your data can get selected for execution, which will modify the data unless you used a "defer" as a guard; but then you need to make sure that the guard gets preserved correctly… It's quite devious.
[whenever]: http://www.dangermouse.net/esoteric/whenever.html

Zeros in Category Theory

Things are a bit busy at work on my real job lately, and I don’t have time to put together as detailed a post for today as I’d like. Frankly, looking at it, my cat theory post yesterday was half-baked at best; I should have held off until I could polish it a bit and make it more comprehensible.
So I’m going to avoid that error today. Since we’ve had an interesting discussion focusing on the number zero, I thought it would be fun to show what zero means in category theory.
There are two zeros in category theory: the zero object, and the zero arrow.
Zero Objects
————–
The zero object is easier. Suppose we’ve got a category, C. Suppose that C has a *terminal object*: that is, an object t for which every other object x in C has *exactly one* arrow f : x → t. And suppose that C also has an *initial object*: that is, an object i for which every object x in C has *exactly one* arrow f : i → x. If C has both an initial object i and a terminal object t, *and* i = t, then it’s a *zero object* for the category. A category can actually have *many* zero objects. For example, in the category of groups, any *trivial* group (that is, a group which contains only one element) is a zero object. For an intuition of why this is called “zero”, think of the number zero. It’s a strange little number that sits dead in the middle of everything: it’s the central point on the complex plane, the center of the number line. A zero *object* sits in the middle of a category: everything has exactly one arrow *to* it, and one arrow *from* it.
Zero Arrows
————-
The idea of a zero *arrow* comes roughly from the basic concept of a zero *function*. A zero function is a function f where for all x, f(x) = 0. One of the properties of a zero function is that composing a zero function with any other function results in a zero function. (That is, if f is a zero function, f(g(x)) is also a zero function.)
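Here’s that closure property in a few lines of Python, just as a concrete illustration (mine, not part of the category theory):

    def zero(x):
        return 0

    def g(x):
        return x * x + 1

    # Composing the zero function after any g is still the zero function.
    def compose(f, h):
        return lambda x: f(h(x))

    zg = compose(zero, g)
    assert all(zg(x) == 0 for x in range(-10, 10))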
A zero morphism in a category C is part of a set of arrows, called the *zero family* of morphisms of C, where composing *any* morphism with a member of the zero family results in a morphism in the zero family.
To be precise: suppose we have a category C. For any pair of objects a and b ∈ Obj(C), there is a morphism 0a,b : a → b.
The morphism 0d,e must satisfy one important property:
For any two morphisms m : d → e, and n : f → g, the following diagram must commute:
[Figure: the commutative diagram for the zero family: composing m and n with zero morphisms yields zero morphisms.]
To see why this means that any composition with a zero arrow will give you a zero arrow, look at what happens when we start with an identity arrow. Let’s make n : f → g an identity arrow, by making f=g. We get the following diagram:
[Figure: the same diagram, with n specialized to an identity arrow (f = g).]
So – any zero arrow composed with an identity arrow is a zero arrow. Now use induction to step up from identity arrows, by looking at an arrow composed with an identity arrow and a zero arrow. You’ve still got a zero. Keep going, and you’ll keep getting zeros.
Zero arrows are actually very important buggers: the algebraic and topological notions of a kernel can be defined in category theory using the zero morphisms.
One last little note: in general, it looks like the zero objects and the zero morphisms are unrelated. They’re not. In fact, if you have a category with a zero object *and* a family of zero morphisms, then you can find the zero morphisms by using the zero object. For any objects a and b, 0a,b = (0 → b) º (a → 0): the unique arrow from a into the zero object, followed by the unique arrow from the zero object out to b.

The Category Structure for Linear Logic

So, we’re still working towards showing the relationship between linear logic and category theory. As I’ve already hinted, linear logic has something to do with certain monoidal categories. So today, we’ll get one step closer, by talking about just what kind of monoidal category. As I keep complaining, the problem with category theory is that anytime you want to do something interesting, you have to wade through a bunch of definitions to set up your structures. I’ll do my best to keep it short and sweet; as short as it is, it’ll probably take a bit of time to sink in.
First, we need a symmetry property. A monoidal category C (with unit object t, tensor ⊗, and natural isomorphisms α, λ, ρ) is *symmetric* if/f for all objects a and b in C, there is a natural isomorphism γa,b : (a ⊗ b) → (b ⊗ a); and the category’s monoidal natural isomorphisms interact with γ properly: ρb = λb º γb,t : (b ⊗ t) → b.
In other words, the tensor *commutes*: you can swap the two sides of a tensor product, and (up to natural isomorphism) you still wind up with the same monoidal structure.
In a similar vein: regular functors are also sometimes called *covariant* functors. If you take a functor and *reverse* the directions of all the arrows, you get what’s known an a *contravariant* functor, or *cofunctor*.
We also need a *closure* property. Suppose C is a symmetric monoidal category. C is *monoidally closed* if/f there is a functor ⇒ : C × C → C, such that for every object b in C, there is an isomorphism Λ : (a⊗b → c) → (a → (b ⇒ c)), and Λ is natural in a and c. (Remember that a, b, and c are *objects* of C here; Λ is an isomorphism between the arrows a⊗b → c and the arrows a → (b ⇒ c).)
Basically, this just says that “⇒” internalizes arrows: anything you can do with a tensor product on the left of an arrow, you can rephrase as an arrow into a “⇒” object. (If that looks like currying to you, it should; see the sketch below.)
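In the category of sets and functions, with × as the tensor, Λ is exactly *currying*, and the fact that it’s an isomorphism is the fact that currying is invertible. Here’s that special case as a short Python sketch (my illustration, not part of the formal definition):

    def curry(f):
        """Lambda : (a x b -> c) -> (a -> (b => c)), for ordinary functions."""
        return lambda a: lambda b: f(a, b)

    def uncurry(g):
        """The inverse direction, witnessing that Lambda is an isomorphism."""
        return lambda a, b: g(a)(b)

    add = lambda a, b: a + b
    assert curry(add)(2)(3) == add(2, 3) == uncurry(curry(add))(2, 3)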
So what these two really do is pull up some of the basic category properties (closure with respect to composition, symmetry) and apply them to categories and natural transformations themselves.
Finally, we need to be able to talk about what it means for a *functor* to be closed. So suppose we have a monoidally closed category, with the usual components, up to and including the Λ natural transformation. A functor F : C → C is *closed* if, for all a, b in C, there is an arrow fa,b : (a ⇒ b) → (F(a) ⇒ F(b)) such that for all g : a → b, fa,b º Λ(g º λa) = Λ(F(g) º λF(a)).
In simple English, a closed functor is a functor whose action on arrows can itself be *represented by* arrows in the category. That’s what all of that gobbledygook really means.
So… as a quick preview, to give you an idea of why we just waded through all of this gunk: if you take a monoidally closed category of the appropriate type, then collections of resources are going to be objects of the category; ⊗ is the tensor that joins collections of resources; and “-o” implications are modeled by the closed “⇒” structure.

Yet Another Crappy Bayesian Argument

A reader sent me a link to yet another purported Bayesian argument for the existence of god, this time by a physicist named Stephen Unwin. It’s actually very similar to Swinburne’s argument, which I discussed back at the old home of this blog. The difference is the degree of *dishonesty* demonstrated by the author.

As usual, you can only see the entire argument if you buy his book. But from a number of reviews of the book, and a self-interview posted on his personal website, we can get the gist. Scientific American’s review has the best concise description of his argument that I could find: (the equation in it is retyped by me.)

Unwin rejects most scientific attempts to prove the divine–such as the anthropic principle and intelligent design–concluding that this “is not the sort of evidence that points in either direction, for or against.” Instead he employs Bayesian probabilities, a statistical method devised by 18th-century Presbyterian minister and mathematician Reverend Thomas Bayes. Unwin begins with a 50 percent probability that God exists (because 50-50 represents “maximum ignorance”), then applies a modified Bayesian theorem:

Pafter = (Pbefore × D) / (Pbefore × D + 100% − Pbefore)

The probability of God’s existence after the evidence is considered is a function of the probability before times D (“Divine Indicator Scale”): 10 indicates the evidence is 10 times as likely to be produced if God exists, 2 is two times as likely if God exists, 1 is neutral, 0.5 is moderately more likely if God does not exist, and 0.1 is much more likely if God does not exist. Unwin offers the following figures for six lines of evidence: recognition of goodness (D = 10), existence of moral evil (D = 0.5), existence of natural evil (D = 0.1), intranatural miracles (prayers) (D = 2), extranatural miracles (resurrection) (D = 1), and religious experiences (D = 2).

Plugging these figures into the above formula (in sequence, where the Pafter figure for the first computation is used for the Pbefore figure in the second computation, and so on for all six Ds), Unwin concludes: “The probability that God exists is 67%.” Remarkably, he then confesses: “This number has a subjective element since it reflects my assessment of the evidence. It isn’t as if we have calculated the value of pi for the first time.”
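It’s easy to replay his arithmetic. Here’s a quick Python check of the review’s numbers (my code; I’m using probabilities in [0, 1] rather than percentages):

    def update(p, d):
        # Unwin's modified Bayes rule: P_after = P*D / (P*D + (1 - P))
        return p * d / (p * d + (1 - p))

    p = 0.5                             # the "maximum ignorance" prior
    for d in (10, 0.5, 0.1, 2, 1, 2):   # his six D values, in sequence
        p = update(p, d)

    print(round(p, 4))                  # 0.6667 -- his "67%"

Note what the replay makes obvious: the 67% is produced *entirely* by the six hand-picked D values.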

It’s pretty clear looking at this that the argument is nothing more than “I assert God exists, therefore God exists”. The “probability” result is generated by pulling numbers at random for his D-value. Even he admits that the numbers are “subjective”, but I would go much further than that: the numbers are fundamentally built on the assumption of the existence of god. How can you pretend that you haven’t already accepted the assumption that god exists, and then use stories about the occurrence of divine interventions as facts?

But this doesn’t touch on the reason that I call him dishonest. So far, it’s just sloppiness; typical of the sloppy reasoning of religious people trying to make arguments for the existence of god. But then, on his website, there’s a little self-interview:

Q: So does He exist?

SDU: God?

Q: Yes.

SDU: I don’t know. Although my book does expand on this response.

It goes on like that. He claims to not know; to not have a belief about whether or not there is a god; that his book is an honest enquiry by someone uncertain, trying to use evidence to reason about whether or not god exists.

He’s lying. Plain and simple. Everything about his argument is completely predicated on his acceptance of the existence of god. And there’s no way that he’s dumb enough to not know that. But the argument seems so much more convincing to a layman if the author isn’t sure, but is just carefully working through the probabilities. And that final figure, exactly 2/3: it’s nicely convenient. After all, he’s not saying he’s sure; but he’s saying that an objective review of the evidence gives a number that looks good, while not certain. It preserves that illusion of objectivity.

This guy is using his scientific background to give him authority as someone who understands how this kind of math works; and then he’s lying about his intentions in order to increase the credibility of his argument.

Friday Pathological Programming: Befunge, the 2-dimensional language

Today, we’re going to take a look at a brilliant language called Befunge. Befunge is the work of an evil genius named Chris Pressey.

Normal programming languages are based on a basically one-dimensional syntax; the program is a string, a sequence of characters, and it’s processed by reading that string in a straight-ahead fashion. But that’s not Befunge! Befunge is something like a two-dimensional Turing machine: it says that the program and data are written on a two-dimensional *torus*. Each instruction in Befunge is a single character, and where it’s located on the torus is crucial. (In case you’re not familiar with a torus, it’s what you get if you take a very flexible sheet of paper, and roll it so that you connect the top edge to the bottom, and then roll that tube so that you connect the left edge to the right. You get a donut shape where moving up from what used to be the top of the page puts you on the bottom of the page; moving left from the left edge of the page puts you on the right.) This torus is called the playfield.

The basics of computation in Befunge are pretty straightforward. It’s a stack based language. Operations take their parameters from the stack, and leave their results on the stack. Nothing too complicated there. There are arithmetic operators, IO operators, control flow operators, all operating on the values on the stack.

The arithmetic operators are the usual kinds of things: There are operators for addition (+), subtraction (-), division (/), multiplication (*), modulo (%), and logical negation (!). Digit characters are treated as operators that push the numeric value of the digit onto the stack. (So “99” will create a stack with two nines.) Comparisons are done using “`”, which pops the top two values from the stack and compares them. So if the top of the stack was “x”, and the value beneath it was “y”, then “`” would leave a 1 on the stack if y > x, and a 0 otherwise.

For IO, there are four operators. “&” reads an integer from standard input and pushes it onto the stack. “~” reads a single character from standard input, and leaves it on the stack. “.” pops a value off the stack and writes it to standard out as an integer. “,” pops a value off the stack and writes it to standard out as a character.

Of course, we can’t have a stack-based language without some stack operators: “:” makes a duplicate of the top value on the stack; “$” discards the top value on the stack; “\” swaps the top two values of the stack.

So far, nothing has looked particularly pathological – in fact, nothing even looks particularly strange, other than the fact that it’s pretty illegible because of the single-character operators. But now, we get to control flow, and that is where the insanity/brilliance of Befunge reveals itself.

In Befunge, there’s a read-head that moves over the program. Each step, it executes the instruction under the head. But instead of just moving left or right, it can move left, right, up, or down. “>” is an instruction that tells the head to start moving to the right; “<" tells the head to start moving left; "^" means start moving up, and "v" means to start moving down. So, for example:

>v
^<

Is a program that runs an infinite loop: the head will just cycle over those four characters. An even more interesting infinite loop (taken from the Befunge documentation) is:

>v>v
>^
^  <

Conditionals work by picking the direction that the head will move: “_” pops the stack, and if the value is zero, then it makes the head move right (“>”); if it’s non-zero, it makes the head move left (“<"). Similarly, "|" pops a value, and makes the head move up if the value was non-zero, or down if it was zero. To make things confusing, "#" means "skip the next instruction." (Actually, it's important for when a vertical and horizontal control flow cross.) And finally,
"@" is the exit command; it makes the head stop, and the program halt.

There’s also a little hack for strings. A double-quote character (“) starts a string; when one is encountered, the head keeps moving in the same direction, but instead of executing the characters as instructions, it just pushes the character values onto the stack. When the second quote is found, it goes back to executing instructions.

Finally, just in case the basic two dimensional flow control isn’t pathological enough, there are two instructions for modifying cells on the playfield! “g” pops an X and Y value off the stack, and pushes the character at (X,Y) onto the stack; “p” pops an X, Y, and a character off the stack, and writes the character onto location (X,Y) of the playfield. (The coordinates are relative to the cell where the program started.)
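Since the whole instruction set fits on a page, a workable interpreter does too. Here’s a stripped-down Python stepper for just the subset described above (my own sketch: no “&” or “~” input instructions, and division by zero simply yields 0 rather than prompting the user):

    def run_befunge(src):
        grid = [list(line) for line in src.splitlines()]
        width = max(len(row) for row in grid)
        for row in grid:
            row.extend(" " * (width - len(row)))      # pad to a rectangle
        height = len(grid)
        x = y = 0
        dx, dy = 1, 0                  # start at top-left, heading right
        stack, out, quoting = [], [], False
        pop = lambda: stack.pop() if stack else 0
        while True:
            c = grid[y][x]
            if quoting:
                if c == '"': quoting = False
                else: stack.append(ord(c))
            elif c == '"': quoting = True
            elif c.isdigit(): stack.append(int(c))
            elif c in "+-*/%":
                a, b = pop(), pop()
                stack.append({"+": b + a, "-": b - a, "*": b * a,
                              "/": b // a if a else 0,
                              "%": b % a if a else 0}[c])
            elif c == "!": stack.append(0 if pop() else 1)
            elif c == "`":
                a, b = pop(), pop()
                stack.append(1 if b > a else 0)
            elif c == ":":
                a = pop(); stack += [a, a]
            elif c == "$": pop()
            elif c == "\\":
                a, b = pop(), pop(); stack += [a, b]
            elif c == ".": out.append(str(pop()))
            elif c == ",": out.append(chr(pop()))
            elif c == ">": dx, dy = 1, 0
            elif c == "<": dx, dy = -1, 0
            elif c == "^": dx, dy = 0, -1
            elif c == "v": dx, dy = 0, 1
            elif c == "_": dx, dy = (1, 0) if pop() == 0 else (-1, 0)
            elif c == "|": dx, dy = (0, 1) if pop() == 0 else (0, -1)
            elif c == "#": x, y = (x + dx) % width, (y + dy) % height
            elif c == "g":
                gy, gx = pop(), pop()
                stack.append(ord(grid[gy % height][gx % width]))
            elif c == "p":
                gy, gx, v = pop(), pop(), pop()
                grid[gy % height][gx % width] = chr(v)
            elif c == "@":
                return "".join(out)
            x, y = (x + dx) % width, (y + dy) % height   # move the head

    # The self-reproducing program from below prints its own source:
    print(run_befunge(":0g,:93+`#@_1+"))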

So, let’s look at a couple of Befunge programs. As usual, we start with the good old “hello world”.

                 v
>v"Hello world!"0<
,:
^_25*,@

We start at the top left, head moving right. It moves right until it hits the “v” character at the far right of the first line, which makes it go down; then “<” makes it go left. “0” pushes a zero onto the stack. Then we’ve got a quote – so the head (still moving left) starts pushing characters onto the stack until the next quote. When we get to the next quote, if we wrote the stack so that the top comes first, it would look like: ('H' 'e' 'l' 'l' 'o' ' ' 'w' 'o' 'r' 'l' 'd' '!' 0 ).

Then we hit a “v” which makes the head go down. This is the beginning of a loop; the leftmost two characters of rows 2, 3, and 4 are a while loop! The head goes down to “:” which duplicates the top of the stack; then it hits “_”, which turns left if the value on top of the stack is not zero; then the head turns up, outputs a character, turns right, and we’re back at the start of the loop. So the loop will output each character until we get to the “0” we pushed before the string; then at the “_”, we turn right. 2 and 5 are pushed on the stack and multiplied, leaving a 10, which is the linefeed character (basically “\n” for you C weenies). It outputs the linefeed, and then exits.

How about a truly pathological example? Here’s a self-reproducing program in 14 bytes.

:0g,:93+`#@_1+

Stepping through that:

  1. Dup the value on top of the stack. That’ll produce a “0” if the stack is empty. (So first pass, stack=[0])
  2. “0” Push a zero on the stack. (First pass, stack=[0,0])
  3. “g”: Pop a y and an x, and fetch the character at (x,y) on the *playfield*; that’s (0,0) initially. (First pass, stack = [‘:’])
  4. “,”: Output it. So we printed the character at (0,0) (First pass, stack = [])
  5. “:” dup the stack top again. (First pass, stack = [0])
  6. “93+”. Get the number 12 onto the stack. (First pass, stack = [12,0])
  7. Compare: this pops the 12 and the value the second “:” copied, leaving a 1 on the stack if the copied value was greater than 12, or a 0 if it wasn’t. (First pass, stack = [0]).
  8. “#” skip over the next character.
  9. “_” go right if the top of stack is zero; left if it’s one. So if the value copied by the second “:” was greater than 12, then go left (in which case you hit “@” and halt); otherwise, keep going right.
  10. “1+”: add one to the top of the stack (First pass, stack = ([1])). Then keep going right until you hit the right edge, and then you jump back to the left edge, so you’re at the first “:” again.
  11. Now the whole thing repeats, except that there’s a one on the stack. So the “g” will fetch (1,0); the “12” will be compared to 1, and the “1+” on the end will leave “2” on the stack. So now it fetches and outputs (2,0). And so on, until after outputting (13,0) – the last character – the comparison finally leaves a 1, and the “_” sends the head left into the “@”, which halts.

One more, which I’ll let you figure out for yourself. Here’s a program that prompts you for a number, and computes its factorial:

v
>v"Please enter a number (1-16) : "0$*99g1-:99p#v_.25*,@
^_&:1-99p>:1-:!|10          <
^     <

Towards a Model for Linear Logic: Monoidal Categories

Time to come back to category theory from our side-trip. Category theory provides a good framework for defining linear logic – and for building a Curry-Howard style type system for describing computations with *state* that evolves over time. It gives us a way of defining a valid *model* for the part of linear logic (the multiplicative operators) that isn’t tractable using other modeling techniques.
I’m going to show you the relationship between models for linear logic and category theory. It’s going to take a couple of days to go through the whole shebang, but even the parts that we need to go through to get there are fun.
The first step is to define a *monoidal category*, also known as a *tensor* category. We’ve already seen most of the ingredients when we built [monads][monads] earlier this week; a monad is a monoid *inside* a monoidal category of functors.
A monoidal category is a category C with a distinguished *unit* object t, and a *binary functor* ⊗ : C × C → C. This binary functor is called *tensor*. Tensor has three required properties defined using *natural isomorphisms*, called α, λ, and ρ.
α says that tensor must be associative: α(A,B,C) : (A ⊗ B) ⊗ C → A ⊗ (B ⊗ C).
λ says that tensor has a *left identity*: λA : (t ⊗ A) → A.
ρ says that tensor has a *right identity*, using the same unit object: ρA : (A ⊗ t) → A.
And finally, the natural isomorphisms need to make the following diagrams commute for all values of A, B, C, and D. These are known as the *coherence conditions* for the monoidal natural isomorphisms.
[Figure: the pentagon and triangle coherence diagrams for α, λ, and ρ.]
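Since the image matters here, let me also write out what the two diagrams say, as equations in the same notation as the rest of this post. The first is the standard *pentagon* condition on α; the second is the *triangle* condition tying λ and ρ together through the unit t:

αA,B,C⊗D º αA⊗B,C,D = (1A ⊗ αB,C,D) º αA,B⊗C,D º (αA,B,C ⊗ 1D)

(1A ⊗ λB) º αA,t,B = ρA ⊗ 1B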
A monoidal category is said to be *strict* if α, λ, and ρ are all identities. It turns out that every monoidal category is *equivalent* (in the sense of natural isomorphism) to a strict monoidal category.
And now, here comes an example of just why category theory is useful. In some of the [detailed models of quantum physics][quantum-condense], they try to describe the structure of different kinds of matter using what they call *topological orders*. The standard theory for describing the topological orders of different states of matter is called *Landau* theory. It turns out that Landau theory doesn’t describe the topological order of high temperature superconductors or very-low-temperature condensate states. Category theory – in particular, the theory surrounding strict monoidal categories – does a better job of describing the topological order of the different states of matter than any other mathematical approach that’s been tried so far.
[monads]: http://scienceblogs.com/goodmath/2006/07/monads_and_programming_languag_1.php
[quantum-condense]: http://dao.mit.edu/~wen/pub/qorder.html

Linear Logic

[Monday][yesterday], I said that I needed to introduce the sequent calculus, because it would be useful for describing things like linear logic. Today we’re going to take a quick look at linear logic – in particular, at *propositional* linear logic; you can expand to predicate linear logic in a way very similar to the way we went from propositional logic to first order predicate logic.
So first: what the heck *is* linear logic?
The answer is, basically, logic where statements are treated as *resources*. So using a statement in an inference step in linear logic *consumes* the resource. This is a strange notion if you’re coming from a regular predicate logic. For example, in regular predicate logic, if we have the statements: “A”, “A ⇒ B”, and “A ⇒ C”, we know that we can conclude “B ∧ C”. In linear logic, that’s not true: using either implication statement would *consume* the “A”. So we could infer “B”, or we could infer “C”, but we could *not* infer both.
When people talk about linear logic, and why it makes sense, they almost always use a vending machine analogy. Suppose I walk up to a vending machine, and I want to buy a soda and a candy bar. I’ve got 8 quarters in my pocket; the soda costs $1.50; the candy bar costs $.75.
In linear logic, I’d say something like the following (the syntax is wrong, but we’ll get to syntax later): (Q,Q,Q,Q,Q,Q,Q,Q), (Q,Q,Q,Q,Q,Q) ⇒ Soda, (Q,Q,Q) ⇒ Candy.
Using the rules, I can buy a soda by “spending” 6 of my Qs. I wind up with “(Q,Q) ∧ Soda”, and “(Q,Q,Q) ⇒ Candy”. I’ve consumed 6 Qs, and I’ve consumed the “(Q,Q,Q,Q,Q,Q) ⇒ Soda” implication. I can’t buy the candy bar now; I don’t have enough Qs left.
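Here’s the vending machine story as a toy Python multiset rewriter (my own illustration of the resource-consumption idea, not a real linear logic prover):

    from collections import Counter

    state = Counter({"Q": 8})                    # eight quarters
    rules = {"soda":  (Counter({"Q": 6}), Counter({"Soda": 1})),
             "candy": (Counter({"Q": 3}), Counter({"Candy": 1}))}

    def fire(state, rule):
        """Apply a linear implication: consume the left side, produce the right."""
        cost, result = rules[rule]
        if any(state[k] < n for k, n in cost.items()):
            raise ValueError("not enough resources for " + rule)
        return state - cost + result

    state = fire(state, "soda")    # Counter({'Q': 2, 'Soda': 1})
    print(state)
    state = fire(state, "candy")   # ValueError: not enough resources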
The basic statements in linear logic, with intuitive meanings are:
1. A ⊗ B. This is called *multiplicative conjunction*, also known as *simultaneous occurrence*. This means that I definitely have both A and B. This has an *identity unit* called “1”, such that A ⊗ 1 ≡ 1 ⊗ A ≡ A. 1 represents the idea of the absence of any resource.
2. A & B : *additive conjunction*, aka *internal choice*. I can have either A *or* B, and I get to pick which one. The unit is ⊤, pronounced “top”, and represents a sort of “I don’t care” value.
3. A ⊕ B. This is called *additive disjunction*, also known as *external choice*. It means that I get either A or B, but I don’t get to pick which one. The unit here is 0, and represents the lack of an outcome.
4. A ⅋ B : *multiplicative disjunction*, aka *parallel occurrence*; I *need to have* both A and B at the same time. The unit for this is ⊥, pronounced “bottom”, and represents the absence of a goal. In the vending machine metaphor, think of it as the “cancel” button on the vending machine: I decided I don’t want any, so I’m not going to spend my resources.
5. A -o B : Linear implication. Consume resource A to *produce* resource B. The normal symbol for this looks like an arrow with a circle instead of an arrowhead; this operator is often called “lolly” because of what the normal symbol looks like. I’m stuck writing it as “-o”, because there’s no HTML entity for the symbol.
6. !A : Positive exponentiation, pronounced “Of course A”. This *produces* an arbitrary number of As. Equivalent to A ⊗ !A.
7. ?A : Negative exponentiation, pronounced “Why not A?”. This *consumes* As.
Ok. So, suppose I want to talk about buying lunch. I’ve got 10 dollars to buy lunch, and I’ll be full if I have a salad, a coke, and a tuna sandwich. I’ll write “I’ve got a dollar” as “D”, “I have a salad” as “S”, “I have a coke” as “C”, “I have a tuna sandwich” as “T”, and finally, “I’m full” as “F”.
* I can write “I have 10 dollars” in LL as: “(D ⊗ D ⊗ D ⊗ D ⊗ D ⊗ D ⊗ D ⊗ D ⊗ D ⊗ D)”.
* I can write “tuna sandwich and salad and coke”, as a group of things that I want to have all of, as: “T ⅋ S ⅋ C”.
* I can say that I’ll be full if I have lunch as “T ⅋ S ⅋ C -o F”
If I want to talk about buying lunch, I can describe the prices of the things I want using implication:
* A coke costs one dollar: “D -o C”; I can spend one dollar, and in return I get one coke.
* A salad costs 3 dollars: “(D ⊗ D ⊗ D) -o S”
* A tuna sandwich also costs three dollars: “(D ⊗ D ⊗ D) -o T”
Now, I can do some reasoning with these.
* By taking 1 of the dollars, I can get one C. That leaves me with “D ⊗ D ⊗ D ⊗ D ⊗ D ⊗ D ⊗ D ⊗ D ⊗ D ⊗ C”
* By taking 3 D, I can get one S. “D ⊗ D ⊗ D ⊗ D ⊗ D ⊗ D ⊗ C ⊗ S”.
* By taking 3 D, I can get one T. “D ⊗ D ⊗ D ⊗ C ⊗ S ⊗ T”.
* Now I’ve got my lunch. I can eat it and be full, with three dollars left: “D ⊗ D ⊗ D ⊗ F”.
Just from this trivial example, you should be able to see why linear logic is cool: the idea of being able to talk about *how* resources are used in an inference process or a computation is really valuable, and linear logic gives you the ability to really work with the concept of resource in a solid, formal way. If you think of it in terms of the Curry-Howard isomorphism [types-as-proofs concept from the simply typed lambda calculus][types-as-proofs], you can imagine using linear logic for types of values that are *consumed* by a computation – i.e., they’re no longer available once they’ve been used.
I’m going to adopt a slightly different format for the sequents for working in linear logic. The way that I produced the center bars in yesterday’s post was really painful to write, and didn’t even end up looking particularly good. So, the way that I’m going to write the sequents in this post is to wrap the “top” and “bottom” of the sequent in curly braces, and separate them by a “⇒”, as in:

{GivenContext :- GivenEntailment } ⇒ { InferredContext :- InferredEntailment}

Now, let’s take a look at the sequent rules for propositional linear logic. I’m using the version of these rules from [Patrick Lincoln’s SIGACT ’92 paper][sigact92]. Yeah, I know that’s a bit of a big load there. Don’t worry about it too much; the important part is the concept described up above; the sequents are useful to look at when you have a hard time figuring out what some operator means in inference. For example, you can see the difference between & and ⊕ (which I found confusing at first) by looking at their sequents, to see what they do.
1. **Identity**: { } ⇒ { A :- A }
2. **Cut**: { Γ1 :- A, Σ1   Γ2, A :- Σ2 } ⇒ { Γ1, Γ2 :- Σ1, Σ2 }
3. **Exchange Left**: { Γ1, A, B, Γ2 :- Σ } ⇒ { Γ1, B, A, Γ2 :- Σ }
4. **Exchange Right**: { Γ :- Σ1, A, B, Σ2 } ⇒ { Γ :- Σ1, B, A, Σ2}
5. **⊗ Left**: {Γ, A, B :- Σ} ⇒ { Γ, A ⊗ B :- Σ }
6. **⊗ Right**: { Γ1 :- A, Σ1   Γ2 :- B, Σ2 } ⇒ { Γ1, Γ2 :- (A ⊗ B), Σ1, Σ2 }
7. **-o Left**: { Γ1 :- A, Σ1   Γ2, B :- Σ2 } ⇒ { Γ1, Γ2, (A -o B) :- Σ1, Σ2 }
8. **-o Right**: { Γ, A :- B, Σ} ⇒ { Γ :- A -o B, Σ}
9. **⅋ Left**: { Γ1, A :- Σ1   Γ2, B :- Σ2 } ⇒ { Γ1, Γ2, (A ⅋ B) :- Σ1, Σ2 }
10. **⅋ Right**: { Γ :- A, B, Σ} ⇒ { Γ :- A ⅋ B, Σ}
11. **& Left**: { Γ, A :- Σ } ⇒ { Γ,A & B :- Σ}/{ Γ, B :- Σ } ⇒ { Γ,A & B :- Σ}
12. **& Right**: { Γ :- A,Σ   Γ :- B,Σ} ⇒ { Γ :- (A & B), Σ }
13. **⊕ Left**: { Γ,A :- Σ   Γ,B :- Σ } ⇒ { Γ,A ⊕ B :- Σ }
14. **⊕ Right**: {Γ :- A,Σ} ⇒ {Γ :- A ⊕ B, Σ}/{Γ :- B,Σ} ⇒ {Γ :- A ⊕ B, Σ}
15. **!W**: {Γ :- Σ} ⇒ {Γ,!A :- Σ}
16. **!C**: {Γ,!A,!A :- Σ} ⇒ { Γ,!A :- Σ }
17. **!D**: { Γ, A :- Σ} ⇒ { Γ,!A :- Σ}
18. **!S**: { !Γ :- A, ?Σ} ⇒ { !Γ :- !A, ?Σ}
19. **?W**: {Γ :- Σ} ⇒ {Γ:- ?A, Σ}
20. **?C**: {Γ :- ?A,?A, Σ} ⇒ { Γ :- ?A, Σ }
21. **?D**: { Γ :- A, Σ} ⇒ { Γ :- ?A, Σ}
22. **?S**: { !Γ, A :- ?Σ} ⇒ { !Γ,?A :- ?Σ}
23. **⊥ Exp Left**: { Γ :- A, Σ } ⇒ { Γ, A⊥ :- Σ }
24. **⊥ Exp Right**: { Γ, A :- Σ } ⇒ { Γ :- A⊥, Σ }
25. **0 Left**: { Γ,0 :- Σ } *(Nothing can be inferred)*
26. **⊤ Right**: { Γ :- ⊤,Σ}
27. **⊥ Left**: { ⊥ :- }
28. **⊥ Right**: {Γ :- Σ} ⇒ {Γ :- ⊥,Σ}
29. **1 Left**: { Γ :- Σ} ⇒ { Γ,1 :- Σ}
30. **1 Right**: { :- 1}
This is long enough that I’m not going to get into how this hooks into category theory today, except to point out that if you look carefully at the multiplication and exponentiations, they might seem a tad familiar.
[types-as-proofs]: http://goodmath.blogspot.com/2006/06/finally-modeling-lambda-calculus.html
[sigact92]: http://www.csl.sri.com/~lincoln/papers/sigact92.ps
[yesterday]: http://scienceblogs.com/goodmath/2006/07/a_brief_diversion_sequent_calc.php

A Brief Diversion: Sequent Calculus

*(This post has been modified to correct some errors and add some clarifications in response to comments from alert readers. Thanks for the corrections!)*
Today, we’re going to take a brief diversion from category theory to play with
some logic. There are some really neat connections between variant logics and category theory. I’m planning on showing a bit about the connections between category theory and one of those, called *linear logic*. But the easiest way to present things like linear logic is using a mechanism based on sequent calculus.
Sequent calculus is a deduction system for performing reasoning in first order predicate logic. But its notation and general principles are useful for all sorts of reasoning systems, including many different logics, all sorts of type theories, etc. The specific sequent calculus that I’m going to talk about is sometimes called system-LK; the general category of things that use this basic kind of rules is called Gentzen systems.
The sequent calculus consists of a set of inference rules, each of which is normally written like a fraction: the top of the fraction is what you know before applying the rule; the bottom is what you can conclude. The statements that the rules manipulate are called *sequents*, and are always of the form:

CONTEXTS, Predicates :- CONTEXTS, Predicates

The “CONTEXTS” are sets of predicates that you already know are true. The “:-” is read “entails”; it means that the *conjunction* of the statements and contexts to the left of it can prove the *disjunction* of the statements to the right of it. In predicate logic, the conjunction is logical and, and disjunction is logical or, so you can read the statements as if “,” is “∧” on the left of the “:-“, and “∨” on the right. *(Note: this paragraph was modified to correct a dumb error that I made that was pointed out by commenter Canuckistani.)*
Contexts are generally written using capital Greek letters; predicates are generally written using uppercase English letters. We often put a name for an inference rule to the right of the separator line for the sequent.
For example, look at the following sequent:
Γ :- Δ
————— Weakening-Left
Γ,A :- Δ
This rule is named Weakening-Left; the top says that “given Γ, everything in Δ can be proved”; and
the bottom says “using Γ plus the fact that A is true, everything in Δ can be proved”. The full rule basically says: if Δ is provable given Γ, then it will still be provable when A is added to Γ; in other words, adding a true fact won’t invalidate any proofs that were valid before the addition of A. *(Note: this paragraph was modified to correct an error pointed out by a commenter.)*
The sequent calculus is nothing but a complete set of rules that you can use to perform any inference in predicate calculus. A few quick syntactic notes, and I’ll show you the full set of rules.
1. Uppercase Greek letters are contexts.
2. Uppercase English letters are *statements*.
3. Lowercase English letters are *terms*; that is, the objects that predicates
can reason about, or variables representing objects.
4. A[b] is a statement A that contains the term b in some way.
5. A[b/c] means A with the term b substituted for the term c; that is, every occurrence of c is replaced by b.
——-
First, two very basic rules:
1.    
———— (Identity)
A :- A
2. Γ :- A, Δ     Σ, A :- Π
—————————————— (Cut)
Γ,Σ :- Δ, Π
Now, there’s a bunch of rules that have right and left counterparts. They’re duals of each other – move terms across the “:-” and switch from ∧ to ∨ or vice-versa.
3. Γ, A :- Δ
————————— (Left And 1)
Γ, A ∧ B :- Δ
4. Γ :- A, Δ
——————— ——— (Right Or 1)
Γ :- A ∨ B, Δ
5. Γ, B :- Δ
——————— ——(Left And 2)
Γ,A ∧ B :- Δ
6. Γ :- B, Δ
——————— ——— (Right Or 2)
Γ :- A ∨ B, Δ
7. Γ, A :- Δ    Σ,B :- Π
————————————— (Left Or)
Γ,Σ, A ∨ B :- Δ,Π
8. Γ :- A,Δ   Σ :- B,Π
—————————— ——(Right And)
Γ,Σ :- A ∧ B, Δ,Π
9. Γ :- A,Δ
————— —— (Left Not)
Γ, ¬A :- Δ
10. Γ,A :- Δ
——————— (Right Not)
Γ :- ¬A, Δ
11. Γ :- A,Δ    Σ,B :- Π
————————————— (Left Implies)
Γ, Σ, A → B :- Δ,Π
12. Γ,A[y] :- Δ *(y bound)*
————————————— (Left Forall)
Γ,∀x A[x/y] :- Δ
13. Γ :- A[y],Δ *(y free)*
————————————— (Right Forall)
Γ :- ∀x A[x/y],Δ
14. Γ, A[y] :- Δ *(y bound)*
———————————— (Left Exists)
Γ,∃x A[x/y] :- Δ
15. Γ :- A[y], Δ *(y free)*
————————————(Right Exists)
Γ :- ∃x A[x/y], Δ
16. Γ :- Δ
—————— (Left Weakening)
Γ, A :- Δ
17. Γ :- Δ
—————— (Right Weakening)
Γ :- A, Δ
18. Γ, A, A :- Δ
——————— (Left Contraction)
Γ,A :- Δ
19. Γ :- A, A, Δ
——————— (Right Contraction)
Γ :- A, Δ
20. Γ, A, B, Δ :- Σ
————————— (Left Permutation)
Γ,B, A, Δ :- Σ
21. Γ :- Δ, A, B, Σ
————————— (Right Permutation)
Γ :- Δ, B, A, Σ
Here’s an example of how we can use the rules to derive A ∨ ¬A:
1. Context empty. Apply Identity.
2. A :- A. Apply Right Not.
3. empty :- ¬A, A. Apply Right Or 2.
4. empty :- A ∨ ¬A, A. Apply Right Permutation.
5. empty :- A, A ∨ ¬A. Apply Right Or 1.
6. empty :- A ∨ ¬A, A ∨ ¬A. Apply Right Contraction.
7. empty :- A ∨ ¬A
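Just to show how mechanical that derivation is, here’s a tiny Python replay of it. A sequent is a pair of tuples (left of the :-, right of the :-), and each rule is a function; this is my own toy encoding, with just enough rules for this one proof:

    A = "A"
    NOT = lambda p: ("not", p)
    OR = lambda p, q: ("or", p, q)

    identity = lambda p: ((p,), (p,))       # step 1: A :- A
    def right_not(seq):                     # G, A :- D  ==>  G :- not A, D
        ant, suc = seq
        return (ant[:-1], (NOT(ant[-1]),) + suc)
    def right_or2(seq, p):                  # G :- B, D  ==>  G :- p or B, D
        ant, suc = seq
        return (ant, (OR(p, suc[0]),) + suc[1:])
    def right_or1(seq, q):                  # G :- A, D  ==>  G :- A or q, D
        ant, suc = seq
        return (ant, (OR(suc[0], q),) + suc[1:])
    def right_permute(seq):                 # swap the first two succedents
        ant, suc = seq
        return (ant, (suc[1], suc[0]) + suc[2:])
    def right_contract(seq):                # G :- A, A, D  ==>  G :- A, D
        ant, suc = seq
        assert suc[0] == suc[1]
        return (ant, suc[1:])

    s = identity(A)               # A :- A
    s = right_not(s)              # :- not A, A
    s = right_or2(s, A)           # :- A or not A, A
    s = right_permute(s)          # :- A, A or not A
    s = right_or1(s, NOT(A))      # :- A or not A, A or not A
    s = right_contract(s)         # :- A or not A
    print(s)                      # ((), (('or', 'A', ('not', 'A')),))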
If you look *carefully* at the rules, they actually make a lot of sense. The only ones that look a bit strange are the “forall” rules; and for those, you need to remember that the variable is *free* on the top of the sequent.
A lot of logics can be described using Gentzen systems: from type theories, to temporal logics, to all sorts of other systems. They’re a very powerful tool for describing inference systems.