Category Archives: Good Math

GM/BM Friday: Pathological Programming Languages

In real life, I’m not a mathematician; I’m a computer scientist. Still a math geek, mind you, but what I really do is very much in the realm of applied math, researching how to build systems to help people program.
One of my pathological obsessions is programming languages. Since I first got exposed to TRS-80 Model 1 BASIC back in middle school, I’ve been absolutely nuts programming languages. Last time I counted, I’d learned about 130 different languages; and I’ve picked up more since then. I’ve written programs most of them. Like I said, I’m nuts.
Anyway, I decided that it would be amusing to inflict my obsession on you, my readers, with a new feature: the friday pathological programming language. You see, there are plenty of *crazy* people out there; and many of them like to invent programming languages. Some very small number of them try to design good languages and succeed; a much larger number try to design good languages and fail; and *then* there are the folks who design the languages I’m going to talk about. They’re the ones who set out to design *bizzare* programming languages, and succeed brilliantly. They call them “esoteric” programming languages. I call them evil.
Today, the beautiful grand-daddy of the esoteric language family: the one, the only, the truly and deservedly infamous: [Brainfuck!][bf], designed by Urban Müller. (There are a number of different implementations available; just follow the link.)
Only 8 commands – including input and output – all written using symbols. And yet Turing complete; and not just Turing complete, but actually based on a *real* [formal theoretical design][pprimeprime]. And it’s even been implemented [*in hardware*!][bf-hard]
BrainFuck is based on something very much like a twisted cross between a [Turing machine][turing] and a [Minsky machine][minsky]. It’s got the idea of an input tape, like the turing machine. But unlike the turing machine, each cell of the tape stores a number, which can be incremented or decremented, like a Minsky machine. And like a Minsky, the only control flow is a test for zero.
The 8 instructions:
1. **>**: move the tape head one cell forward.
2. **+++++++++.++++++-.+++++++..+++.>++++<>.<——.++++++
.+++.——.——–.>+.
Let’s pull that apart just a bit so that we can hope to understand.
* “++++++++”: store the number “8” in the current tape cell. We’re going to use that as a loop index, so the loop is going to repeat 8 times.
* “[>+++++++++.”: go to the cell after the loop index, and output what’s there. That outputs the “72” as a character: “H”.
* “++++++-.”: Advance past the index, subtract one, and output. That’s 101, or “e”.
Continues in pretty much the same vein, using a couple of tape cells, and running loops to generate the values of the characters. Beautiful, eh?
If that didn’t seem impressive enough, [here][bf-fib] is a really gorgeous implementation of a fibonacci sequence generator, with documentation. The BF compiler used to write this ignores any character other than the 8 commands, so the comments don’t need to be marked in any way; they just need to be really careful not to use punctuation.

+++++++++++ number of digits to output
> #1
+ initial number
>>>> #5
++++++++++++++++++++++++++++++++++++++++++++ (comma)
> #6
++++++++++++++++++++++++++++++++ (space)
<<<<< #1
copy #1 to #7
[>>>>>>+>+<<<<<<>>>>>>[<<<<<<>>>>>>-]

++++++++++  set the divisor #8
[
subtract from the dividend and divisor
->+>+<<>>[<<>>-]
set #10
+
if #9 clear #10
[-][<>>+<<>[-]]
jump back to #8 (divisor possition)
<>> #11
copy to #13
[>>+>+<<>>[<<>>-]
set #14
+
if #13 clear #14
[-][<>[-]]
<<<<<<>>>> #12
if #12 output value plus offset to ascii 0
[++++++++++++++++++++++++++++++++++++++++++++++++.[-]]
subtract #11 from 10
++++++++++  #12 is now 10
- #12
output #12 even if it's zero
++++++++++++++++++++++++++++++++++++++++++++++++.[-]
<<<<<<<<<<< #1
check for final number
copy #0 to #3
>>+>+<<<>>>[<<<>>>-]
>.>.<<<[-]]
<>+>+<<>>[<<>>-]<<[-]>[-]<<<-
]

[bf]: http://www.muppetlabs.com/~breadbox/bf/
[bf-fib]: http://esoteric.sange.fi/brainfuck/bf-source/prog/fibonacci.txt
[turing]: http://goodmath.blogspot.com/2006/03/playing-with-mathematical-machines.html
[minsky]: http://goodmath.blogspot.com/2006/05/minsky-machine.html
[bf-hard]: http://www.robos.org/?bfcomp
[pprimeprime]: http://en.wikipedia.org/wiki/P_prime_prime

Why I Hate Religious Bayesians

Last night, a reader sent me a link to yet another wretched attempt to argue for the existence of God using Bayesian probability. I really hate that. Over the years, I’ve learned to dread Bayesian arguments, because so many of them are things like this, where someone cobbles together a pile of nonsense, dressing it up with a gloss of mathematics by using Bayesian methods. Of course, it’s always based on nonsense data; but even in the face of a lack of data, you can cobble together a Bayesian argument by pretending to analyze things in order to come up with estimates.

You know, if you want to believe in God, go ahead. Religion is ultimately a matter of personal faith and spirituality. Arguments about the existence of God always ultimately come down to that. Why is there this obsessive need to justify your beliefs? Why must science and mathematics be continually misused in order to prop up your belief?

Anyway… Enough of my whining. Let’s get to the article. It’s by a guy named Robin Collins, and it’s called “God, Design, and Fine-Tuning“.

Let’s start right with the beginning.

Suppose we went on a mission to Mars, and found a domed structure in which everything was set up just right for life to exist. The temperature, for example, was set around 70o F and the humidity was at 50%; moreover, there was an oxygen recycling system, an energy gathering system, and a whole system for the production of food. Put simply, the domed structure appeared to be a fully functioning biosphere. What conclusion would we draw from finding this structure? Would we draw the conclusion that it just happened to form by chance? Certainly not. Instead, we would unanimously conclude that it was designed by some intelligent being. Why would we draw this conclusion? Because an intelligent designer appears to be the only plausible explanation for the existence of the structure. That is, the only alternative explanation we can think of–that the structure was formed by some natural process–seems extremely unlikely. Of course, it is possible that, for example, through some volcanic eruption various metals and other compounds could have formed, and then separated out in just the right way to produce the “biosphere,” but such a scenario strikes us as extraordinarily unlikely, thus making this alternative explanation unbelievable.

The universe is analogous to such a “biosphere,” according to recent findings in physics. Almost everything about the basic structure of the universe–for example, the fundamental laws and parameters of physics and the initial distribution of matter and energy–is balanced on a razor’s edge for life to occur. As eminent Princeton physicist Freeman Dyson notes, “There are many . . .lucky accidents in physics. Without such accidents, water could not exist as liquid, chains of carbon atoms could not form complex organic molecules, and hydrogen atoms could not form breakable bridges between molecules” (1979, p.251)–in short, life as we know it would be impossible.

Yes, it’s the good old ID argument about “It looks designed, so it must be”. That’s the basic argument all the way through; they just dress it up later. And as usual, it’s wrapped up in one incredibly important assumption, which they cannot and do not address: that we understand what it would mean to change the fundamental structure of the universe.

What would it mean to change, say, the ratio of the strengths of the electromagnetic force and gravity? What would matter look like if we did? Would stars be able to exist? Would matter be able to form itself into the kinds of complex structures necessary for life?

We don’t know. In fact, we don’t even really have a clue. And not knowing that, we cannot meaningfully make any argument about how likely it is for the universe to support life.

They do pretend to address this:

Various calculations show that the strength of each of the forces of nature must fall into a very small life-permitting region for intelligent life to exist. As our first example, consider gravity. If we increased the strength of gravity on earth a billionfold, for instance, the force of gravity would be so great that any land-based organism anywhere near the size of human beings would be crushed. (The strength of materials depends on the electromagnetic force via the fine-structure constant, which would not be affected by a change in gravity.) As astrophysicist Martin Rees notes, “In an imaginary strong gravity world, even insects would need thick legs to support them, and no animals could get much larger.” (Rees, 2000, p. 30). Now, the above argument assumes that the size of the planet on which life formed would be an earth-sized planet. Could life forms of comparable intelligence to ourselves develop on a much smaller planet in such a strong-gravity world? The answer is no. A planet with a gravitational pull of a thousand times that of earth — which would make the existence of organisms of our size very improbable– would have a diameter of about 40 feet or 12 meters, once again not large enough to sustain the sort of large-scale ecosystem necessary for organisms like us to evolve. Of course, a billion-fold increase in the strength of gravity is a lot, but compared to the total range of strengths of the forces in nature (which span a range of 1040 as we saw above), this still amounts to a fine-tuning of one part in 1031. (Indeed,other calculations show that stars with life-times of more than a billion years, as compared to our sun’s life-time of ten billion years, could not exist if gravity were increased by more than a factor of 3000. This would have significant intelligent life-inhibiting consequences.) (3)

Does this really address the problem? No. How would matter be different if gravity were a billion times stronger, and EM didn’t change? We don’t know. For the sake of this argument, they pretend that mucking about with those ratios wouldn’t alter the nature of matter at all. That’s what they’re going to build their argument on: the universe must support life exactly like us: it’s got to be carbon-based life on a planetary surface that behaves exactly like matter does in our universe. In other words: if you assume that everything has to be exactly as it is in our universe, then only our universe is suitable.

They babble on about this for quite some time; let’s skip forwards a bit, to where they actually get to the Bayesian stuff. What they want to do is use the likelihood principle to argue for design. (Of course, they need to obfuscate, so they cite it under three different names, and finally use the term “the prime principle of confirmation” – after all, it sounds much more convincing than “the likelihood principle”!)

The likelihood principle is a variant of Bayes’ theorem, applied to experimental systems. The basic idea of it is to take the Bayesian principle of modifying an event probability based on a prior observation, and to apply it backwards to allow you to reason about the probability of two possible priors given a final observation. In other words, take the usual Bayesian approach of asking: “Given that Y has already occurred, what’s the probability of X occurring?”; turn it around, and say “X occurred. For it to have occurred, either Y or Z must have occurred as a prior. Given X, what are the relative probabilities for Y and Z as priors?”

There is some controversy over when the likelihood principle is applicable. But let’s ignore that for now.

To further develop the core version of the fine-tuning argument, we will summarize the argument by explicitly listing its two premises and its conclusion:

Premise 1. The existence of the fine-tuning is not improbable under theism.

Premise 2. The existence of the fine-tuning is very improbable under the atheistic single-universe hypothesis. (8)

Conclusion: From premises (1) and (2) and the prime principle of confirmation, it follows that the fine-tuning data provides strong evidence to favor of the design hypothesis over the atheistic single-universe hypothesis.

At this point, we should pause to note two features of this argument. First, the argument does not say that the fine-tuning evidence proves that the universe was designed, or even that it is likely that the universe was designed. Indeed, of itself it does not even show that we are epistemically warranted in believing in theism over the atheistic single-universe hypothesis. In order to justify these sorts of claims, we would have to look at the full range of evidence both for and against the design hypothesis, something we are not doing in this paper. Rather, the argument merely concludes that the fine-tuning strongly supports theism over the atheistic single-universe hypothesis.

That’s pretty much their entire argument. That’s as mathematical as it gets. Doesn’t stop them from arguing that they’ve mathematically demonstrated that theism is a better hypothesis than atheism, but that’s really their whole argument.

Here’s how they argue for their premises:

Support for Premise (1).

Premise (1) is easy to support and fairly uncontroversial. The argument in support of it can be simply stated as follows: since God is an all good being, and it is good for intelligent, conscious beings to exist, it not surprising or improbable that God would create a world that could support intelligent life. Thus, the fine-tuning is not improbable under theism, as premise (1) asserts.

Classic creationist gibberish: pretty much the same stunt that Swinburne pulled. They pretend that there are only two possibilities. Either (a) there’s exactly one God which has exactly the properties that Christianity attributes to it; or (b) there are no gods of any kind.

They’ve got to stick to that – because if they admitted more than two possibilities, they’d have to actually consider why their deity is more likely that any of the other possibilities. They can’t come up with an argument that Christianity is better than atheism if they acknowledge that there are thousands of possibilities as likely as theirs.

Support for Premise (2).

Upon looking at the data, many people find it very obvious that the fine-tuning is highly improbable under the atheistic single-universe hypothesis. And it is easy to see why when we think of the fine-tuning in terms of the analogies offered earlier. In the dart-board analogy, for example, the initial conditions of the universe and the fundamental constants of physics can be thought of as a dart- board that fills the whole galaxy, and the conditions necessary for life to exist as a small one-foot wide target. Accordingly, from this analogy it seems obvious that it would be highly improbable for the fine-tuning to occur under the atheistic single-universe hypothesis–that is, for the dart to hit the board by chance.

Yeah, that’s pretty much it. The whole argument for why fine-tuning is less probably in a universe without a deity than in a universe with one. Because “many people find it obvious”, and because they’ve got a clever dartboard analogy.

They make a sort of token effort to address the obvious problems with this, but they’re really all nothing but more empty hand-waving. I’ll just quote one of them as an example; you can follow the link to the article to see the others if you feel like giving yourself a headache.

Another objection people commonly raise against the fine-tuning argument is that as far as we know, other forms of life could exist even if the constants of physics were different. So, it is claimed, the fine-tuning argument ends up presupposing that all forms of intelligent life must be like us. One answer to this objection is that many cases of fine-tuning do not make this presupposition. Consider, for instance, the cosmological constant. If the cosmological constant were much larger than it is, matter would disperse so rapidly that no planets, and indeed no stars could exist. Without stars, however, there would exist no stable energy sources for complex material systems of any sort to evolve. So, all the fine-tuning argument presupposes in this case is that the evolution of life forms of comparable intelligence to ourselves requires some stable energy source. This is certainly a very reasonable assumption.

Of course, if the laws and constants of nature were changed enough, other forms of embodied intelligent life might be able to exist of which we cannot even conceive. But this is irrelevant to the fine-tuning argument since the judgement of improbability of fine-tuning under the atheistic single-universe hypothesis only requires that, given our current laws of nature, the life-permitting range for the values of the constants of physics (such as gravity) is small compared to the surrounding range of non-life-permitting values.

Like I said at the beginning: the argument comes down to a hand-wave that if the universe didn’t turn out exactly like ours, it must be no good. Why does a lack of hydrogen fusion stars like we have in our universe imply that there can be no other stable energy source? Why is it reasonable to constrain the life-permitting properties of the universe to be narrow based on the observed properties of the laws of nature as observed in our universe?

Their argument? Just because.

Monads and Programming Languages

One of the questions that a ton of people sent me when I said I was going to write about category theory was “Oh, good, can you please explain what the heck a monad is?”

The short version is: a monad is a category with a functor to itself. The way that this works in a programming language is that you can view many things in programming languages in terms of monads. In particular, you can take things that involve mutable state, and magically hide the state.

How? Well – the state (the set of bindings of variables to values) is an object in a category, State. The monad is a functor from State → State. Since the functor is a functor from a category to itself, the value of the state is implicit – they’re the object at the start and end points of the functor. From the viewpoint of code outside of the monad functor, the states are indistinguishable – they’re just something in the category. For the functor itself, the value of the state is accessible.

So, in a language like Haskell with a State monad, you can write functions inside the State monad; and they are strictly functions from State to State; or you can write functions outside the state monad, in which case the value inside the state is completely inaccessible. Let’s take a quick look at an example of this in Haskell. (This example came from an excellent online tutorial which, sadly, is no longer available.)

Here’s a quick declaration of a State monad in Haskell:

class MonadState m s | m -> s where
  get :: m s
  put :: s -> m ()

instance MonadState (State s) s where
  get   = State $ s -> (s,s)
  put s = State $ _ -> ((),s)

This is Haskell syntax saying we’re defining a state as an object which stores one value. It has two functions: get, which retrieves the value from a state; and put, which updates the value hidden inside the state.

Now, remember that Haskell has no actual assignment statement: it’s a pure functional language. So what “put” actually does is create a new state with the new value in it.

How can we use it? We can only access the state from a function that’s inside the monad. In the example, they use it for a random number generator; the state stores the value of the last random generated, which will be used as a seed for the next. Here we go:

getAny :: (Random a) => State StdGen a
getAny = do g <- get
  (x,g') <- return $ random g
  put g'
  return x

Now – remember that the only functions that exist *inside* the monad are "get" and "put". "do" is a syntactic sugar for inserting a sequence of statements into a monad. What actually happens inside of a do is that *each expression* in the sequence is a functor from a State to State; each expression takes as an input parameter the output from the previous. "getAny" takes a state monad as an input; and then it implicitly passes the state from expression to expression.

"return" is the only way *out* of the monad; it basically says "evaluate this expression outside of the monad". So, "return $ randomR bounds g" is saying, roughly, "evaluate randomR bounds g" outside of the monad; then apply the monad constructor to the result. The return is necessary there because the full expression on the line *must* take and return an instance of the monad; if we just say "(x,g') <- randomR bounds g", we'd get an error, because we're inside of a monad construct: the monad object is going be be inserted as an implicit parameter, unless we prevent it using "return". But the resulting value has to be injected back into the monad – thus the "$", which is a composition operator. (It's basically the categorical º). Finally, "return x" is saying "evaluate "x" outside of the monad – without the "return", it would treat "x" as a functor on the monad.

The really important thing here is to recognize that each line inside of the "do" is a functor from State → State; and since the start and end points of the functor are implicit in the structure of the functor itself, you don't need to write it. So the state is passed down the sequence of instructions – each of which maps State back to State.

Let's get to the formal part of what a monad is. There's a bit of funny notation we need to define for it. (You can't do anything in category theory without that never-ending stream of definitions!)

  1. Given a category C, 1C is the *identity functor* from C to C.
  2. For a category C, if T is a functor C → C, then T2 is the TºT. (And so on for tother )
  3. For a given Functor, T, the natural transformation T → T is written 1T.

Suppose we have a category, C. A *monad on C* is a triple (T,η,μ), where T is a functor from C → C, and η and μ are natural transformations; η: 1C → T, and μ: (TºT) → T. (1C is the identity functor for C in the category of categories.) These must have the following properties:

First, μ º Tμ = μ º μT. Or in diagram form:

monad-prop1.jpg

Second, μ º Tη = μ º ηT = 1T. In diagram form:

monad-prop2.jpg

Basically, what these really comes down to is an associative property ensuring that T behaves properly over composition, and that there is an identity transformation that behaves as we would expect. These two properties together add up to mean that any order of applications of T will behave properly, preserving the structure of the category underlying the monad.

Yoneda's Lemma

So, at last, we can get to Yoneda’s lemma, as I [promised earlier][yoneda-promise]. What Yoneda’s lemma does is show us how for many categories (in fact, most of the ones that are interesting) we can take the category C, and understand it using a structure formed from the functors from C to the category of sets. (From now on, we’ll call the category of sets **Set**.)
So why is that such a big deal? Because the functors from C to the **Set** define a *structure* formed from sets that represents the properties of C. Since we have a good intuitive understanding of sets, that means that Yoneda’s lemma
gives us a handle on how to understand all sorts of difficult structures by looking at the mapping from those structures onto sets. In some sense, this is what category theory is really all about: we’ve taken the intuition of sets and functions; and used it to build a general way of talking about structures. Our knowledge and intuition for sets can be applied to all sorts of structures.
As usual for category theory, there’s yet another definition we need to look at, in order to understand the categories for which Yoneda’s lemma applies.
If you recall, a while ago, I talked about something called *[small categories][smallcats]*: a small category is a categories for which the class of objects is a set, and not a proper class. Yoneda’s lemma applies to a a class of categories slightly less restrictive than the small categories, called the *locally small categories*.
The definition of locally small categories is based on something called the Hom-classes of a category. Given a category C, the hom-classes of C are a partition of the morphisms in the category. Given any two objects a and b in Obj(C), the hom-class **Hom**(a,b) is the class of all morphisms f : a → b. If **Hom**(a,b) is a set (instead of a proper class), then it’s called the hom-set of a and b.
A category C is *locally small* if/f all of the hom-classes of C are sets: that is, if for every pair of objects in Obj(C), the morphisms between them form a set, and not a proper class.
So, on to the lemma.
Suppose we have a locally small category C. Then for each object a in Obj(C), there is a *natural functor* corresponding to a mapping to **Set**. This is called the hom-functor of A, and it’s generally written: *h*a = **Hom**(a,-). *h*a is a functor which maps from a object X in C to the set of morphisms **Hom**(a,x).
If F is a functor from C to **Set**, then for all a ∈ Obj(C), the set of natural transformations from *h*a to F have a one-to-one correspondence with the elements of F(A): that is, the natural transformations – the set of all structure preserving mappings – from hom-functors of C to **Set** are isomorphic to the functors from C to **Set**.
So the functors from C to **Set** provide all of the structure preserving mappings from C to **Set**.
Yesterday, we saw a way how mapping *up* the abstraction hierarchy can make some kinds of reasoning easier. Yoneda says that for some things where we’d like to use our intuitions about sets and functions, we can also *map down* the abstraction hierarchy.
(If you saw my posts on group theory back at GM/BMs old home, this is a generalization of what I wrote about [the symmetric groups][symmetry]: the fact that every group G is isomorphic to a subgroup of the symmetric group on G.)
Coming up next: why computer science geeks like me care about this abstract nonsense? What does all of this gunk have to do with programming and programming languages? What the heck is a Monad? and more.
[symmetry]: http://goodmath.blogspot.com/2006/04/permutations-and-symmetry-groups.html
[yoneda-promise]: http://scienceblogs.com/goodmath/2006/06/category_theory_natural_transf.php
[smallcats]: http://scienceblogs.com/goodmath/2006/06/more_category_theory_getting_i.php

Using Natural Transformations: Recreating Closed Cartesian Categories

Today’s contribution on category theory is going to be short and sweet. It’s an example of why we really care about [natural transformations][nt]. Remember the trouble we went through working up to define [cartesian categories and cartesian closed categories][ccc]?
As a reminder: a [functor][functor] is a structure preserving mapping between categories. (Functors are the morphisms of the category of small categories); natural transformations are structure-preserving mappings between functors (and are morphisms in the category of functors).
Since we know that the natural transformation can be viewed as a kind of arrow, then we can take the definitions of iso-, epi-, and mono-morphisms, and apply them to natural transformations, resulting in *natural isomorphisms*, *natural monomorphisms*, and *natural epimorphisms*.
Expressed this way, a cartesian category is a category C where:
1. C contains a terminal object t; and
2. (∀ a,b ∈ Obj(C)), C contains a product object a×b; and
a *natural isomorphism* Δ, which maps each Functor over (C×C): ((x → a) → b) to (x → (a×b))
What this really says is: if we look at categorical products, then for a cartesian category, there’s a way of understanding the product as a
mapping within the category as a pairing structure over arrows.
structure-preserving transformation from arrows between the pairs of values (a,b) and the products (a×b).
The closed cartesian category is just the same exact trick using the exponential: A CCC is a category C where:
1. C is a cartesian category, and
2. (∀ a,b ∈ Obj(C)), C contains an object ba, and a natural isomorphism Λ, where (∀ y ∈ Obj(C)) Λ : (y×a → b) → (y → ab).
Look at these definitions; then go back and look at the old definitions that we used without the new constructions of the natural transformation. That will let you see what all the work to define natural transformations buys us. Category theory is all about structure; with categories, functors, and natural transformations, we have the ability to talk about extremely sophisticated structures and transformations using a really simple, clean abstraction.
[functor]: http://scienceblogs.com/goodmath/2006/06/more_category_theory_getting_i.php
[nt]: http://scienceblogs.com/goodmath/2006/06/category_theory_natural_transf.php
[ccc]: http://scienceblogs.com/goodmath/2006/06/categories_products_exponentia_1.php

Arrow Equality and Pullbacks

We’re almost at the end of this run of category definitions. We need to get to the point of talking about something called a *pullback*. A pullback is a way of describing a kind of equivalence of arrows, which gets used a lot in things like interesting natural transformations. But, before we get to pullbacks, it helps to understand the equalizer of a pair of morphisms, which is a weaker notion of arrow equivalence.
We’ll start with sets and functions again to get an intuition; and then we’ll work our way back to categories and categorical equalizers.
Suppose we have two functions mapping from members of set A to members of set B.

f, g : A → B

Suppose that they have a non-empty intersection: that is, that there is some set of values x ∈ A for which f(x) = g(x). The set of values C from A on which f and g return the same result (*agree*) is called the *equalizer* of f and g. Obviously, C is a subset of A.
Now, let’s look at the category theoretic version of that. We have *objects* A and B.
We have two arrows f, g : A → B. This is the category analogue of the setup of sets and functions from above. To get to the equalizer, we need to add an object C which is a *subobject* of A (which corresponds to the subset of A on which f and g agree in the set model).
The equalizer of A and B is the pair of the object C, and an arrow i : C → A. (That is, the object and arrow that define C as a subobject of A.) This object and arrow must satisfy the following conditions:
1. f º i = g º i
2. (∀ j : D → A) f º j = g º j ⇒ (∃ 1 k : D → C) i º k = j.
That second one is the mouthful. What it says is: if I have any arrow j from some other object D to A: if f and g agree on composition about j, then there can only be *one* *unique* arrow from C to D which composes with j to get to A. In other words, (C, i) is a *selector* for the arrows on which A and B agree; you can only compose an arrow to A in a way that will compose equivalently with f and g to B if you go through (C, i) Or in diagram form, k in the following diagram is necessarily unique:

equalizer.jpg

There are a couple of interesting properties of equalizers that are worth mentioning. The morphism in an equalizer is a *always* monic arrow (monomorphism); and if it’s epic (an epimorphism), then it must also be iso (an isomorphism).
The pullback is very nearly the same construction as the equalizer we just looked at; except it’s abstracting one step further.
Suppose we have two arrows pointing to the same target, f : B → A and g : C → A. Then the pullback of of f and g is the triple of an object and two arrows (B×AC, p : B×AC → B, q : B×AC → C). The elements of this triple must meet the following requirements:
1. f º p = g º q
2. (f º p) : B×AC → A
3. For every triple (D, h : D → B , k : D → C), there is exactly one unique arrow A : D → B×AC where pºA = h, and q º A = k.
As happens so frequently in category theory, this is clearer using a diagram.

pullback.jpg

If you look at this, you should definitely be able to see how this corresponds to the categorical equalizer. If you’re careful and clever, you can also see the resemblance to categorical product (which is why we use the ×A syntax). It’s a general construction that says that f and g are equivalent with respect to the product-like object B×AC.
Here’s the neat thing. Work backwards through this abstraction process to figure out what this construction means if objects are sets and arrows are functions, and what’s the pullback of the sets A and B?

{ (x,y) ∈ A × B : f(x) = g(y) }

Right back where we started, almost. The pullback is an equalizer; working it back shows that.

Categories and SubThings

What’s a subset? That’s easy: if we have two sets A and B, A is a subset of B if every member of A is also a member of B.
What’s a subgroup? If we have two groups A and B, and the values in group A are a subset of the values in group B, then A is a subgroup of B.
For any kind of thing **X**, what does it mean to be a sub-X? Category theory gives us a way of answering that in a generic way. It’s a bit hard to grasp at first, so let’s start by looking at the basic construction in terms of sets and subsets.
The most generic way of defining subsets is using functions. Suppose we have a set, A. How can we define all of the subsets of A, *in terms of functions*?
We can take the set of all *injective* functions to A (an injective function from X to Y is a function that maps each member of X to a unique member of Y). Let’s call that set of injective functions **Inj**(A). Now, we can define equivalence classes over **Inj**(A), where two functions f : X → A and g : Y → A are equivalent if there is an isomorphism between X and Y.
The domain of each function in one of the equivalence classes in **Inj**(A) is a function isomorphic to a subset of A. So each equivalence class of injective functions defines a subset of A.
We can generalize that function-based definition to categories, so that it can define a sub-object of any kind of object that can be represented in a category.
Before we jump in, let me review one important definition from before; the monomorphism, or monic arrow.
>A *monic arrow* is an arrow f : a → b such that (∀ g1,
>g2: x → a) f º g1 = f º g2
>⇒ g1 = g2. (That is, if any two arrows composed with
>f in f º g end up at the same object only if they are the same.)
The monic arrow is the category theoretic version of an injective function.
Suppose we have a category C, and an object a ∈ Obj(C).
If there are are two monic arrows f : x → a and g : y → a, and
there is an arrow h such that g º h = f, then we say f ≤ g (read “f factors through g”). Now, we can take that “≤” relation, and use it to define an equivalence class of morphisms using f ≡ g ⇔ f ≤ g &land; g ≤ f.
What we wind up with using that equivalence relation is a set of equivalence classes of monomorphisms pointing at A. Each of those equivalence classes of morphisms defines a subobject of A. (Within the equivalence classes are objects which have isomorphisms, so the sources of those arrows are equivalent with respect to this relation.) A subobject of A is the sources of an arrow in one of those equivalence classes.

Dishonest Dembski:the Universal Probability Bound

Dishonest Dembski:the Universal Probability Bound
One of the dishonest things that Dembski frequently does that really bugs me is take bogus arguments, and dress them up using mathematical terminology and verbosity to make them look more credible.
An example of this is Dembski’s *universal probability bound*. Dembski’s definition of the UPB from the [ICSID online encyclopedia][upb-icsid] is:
>A degree of improbability below which a specified event of that probability
>cannot reasonably be attributed to chance regardless of whatever
>probabilitistic resources from the known universe are factored in. Universal
>probability bounds have been estimated anywhere between 10-50 (Emile Borel)
>and 10-150 (William Dembski).
He’s quantified it in several different ways. I’ve found three different versions of the calculation of the UPB: two of them from wikipedia; one is from a message thread at ICSID which the author claims is a quote from one of Dembski’s books.
Let’s look at Dembski’s own words first:
>Specifically, within the known physical universe there are estimated to be no
>more than 1080 elementary particles. Moreover, the properties of matter are
>such that transitions from one state to another cannot occur at a rate faster
>that 1045 times per second. Finally, the universe itself is about a billion
>times younger than 1025 seconds (assuming the universe is around 10 to 20
>billion years old). ….these cosmological constraints imply that the total
>number of specified events throughout cosmic history cannot exceed
>1080 * 1045 x 1025 = 10150.
He goes on to assert that this is the “maximum number of trials” that could have occurred since the beginning of the universe, and that for anything less likely than that which is observed to occur, it is not reasonable to say it is caused by chance.
Wikipedia presents this definition, and a more recent one which lowers the UPB, but as they don’t provide all of the details of the equation, I’ll skip it for now. Wikipedia’s explanation of this original form of the UPB is:
>Dembski’s original value for the universal probability bound is 1 in 10150,
>derived as the inverse of the product of the following approximate
>quantities:[11]
>
> * 1080, the number of elementary particles in the observable
> universe.
> * 1045, the maximum rate per second at which transitions in
> physical states can occur (i.e., the inverse of the Planck time).
> * 1025, a billion times longer than the typical estimated age of
> the universe in seconds.
>
>Thus, 10150 = 1080 × 1045 × 1025.
>Hence, this value corresponds to an upper limit on the number of physical
>events that could possibly have occurred since the big bang.
Here’s the fundamental dishonesty: None of those numbers have *anything* to do with what he’s supposedly trying to prove. He’s trying to create a formal-sounding version of the big-number problem by throwing together a bunch of fancy-sounding numbers, multiplying them together, and claiming that they somehow suddenly have meaning.
But they don’t.
It’s actually remarkably easy to show what utter nonsense this is. I’ll do a fancy one first, and a trivial one second.
Let’s create an incredibly simplified model of a region of space. Let’s say we have a cube of space, 1 kilometer on a side. Further, let’s suppose that this space contains 1000 particles, and they are all electrons. And further, let’s suppose that each 1mm cube in this cubic kilometer can only have one electron in it.
This is a model which is so much simpler than reality that it’s downright silly. But everything about the real world would make it more complex, and it’s sufficient for our purposes.
Now: consider the probability of any *configuration* of the electrons in the region of space. A configuration is a selection of the set of 1mm cubes that contain electrons. The number of different configurations of this region of space is (109!)/((1000!)*(109-1000)!). That works out to (109*(109-1)*(109-2)*…*(109-1000))/(1000!).
1000! is roughly 4×102568 according to my scheme interpreter. We’ll be generous, and use 1×102569, to make things easier. To estimate the numerator, we can treat it as (109)*((108)999), which will be much smaller. That’s 107801. So the probability of any particular configuration within that cube is 1 in 105232.
So any state of particles within that cube is an event with probability considerably smaller than 1 in 105232. So what Dembski is saying is that *every* possible configuration of matter in space in the entire universe is impossible without intelligent intervention.
And the trivial one? Grab two decks of distinguishable cards. Shuffle them together, and lay them out for a game of spider solitaire. What’s the probability of that particular lay of cards? 104! , or, very roughly, something larger than 1×10166. Is god personally arranging ,my cards every time I play spider?
Anyone who’s ever taken any class on probability *knows* this stuff. One college level intro, and you know that routine daily events can have incredibly small probabilities – far smaller than his alleged UPB. But Dembski calls himself a mathematician, and he writes about probability quite frequently. As much as I’ve come to believe that he’s an idiot, things like this just don’t fit: he *must* know that this is wrong, but he continues to publish it anyway.
[upb-icsid]: http://www.iscid.org/encyclopedia/Universal_Probability_Bound

Skewing Statistics for Politics

As I’ve frequently said, statistics is an area which is poorly understood by most people, and as a result, it’s an area which is commonly used to mislead people. The thing is, when you’re working with statistics, it’s easy to find a way of presenting some value computed from the data that will appear to support a predetermined conclusion – regardless of whether the data as a whole supports that conclusion. Politicians and political writers are some of the worst offenders at this.
Case in point: over at [powerline][powerline], they’re reporting:
>UPI reports that Al Gore’s movie, An Inconvenient Truth, hasn’t done so well
>after a promising start:
>
>> Former U.S. vice-President Al Gore’s documentary “An Inconvenient Truth”
>>has seen its ticket sales plummet after a promising start.
>>
>>After Gore’s global warming documentary garnered the highest average per play
>>ever for a film documentary during its limited Memorial Day weekend opening,
>>recent theater takes for the film have been less than stellar, Daily Variety
>>reports.
>>
>> The film dropped from its record $70,333 per play to $12,334 during its
>>third week and its numbers have continued to fall as the film opens in smaller
>>cities and suburbs across the country.
>
>It’s no shock, I suppose, that most people aren’t interested in seeing
>propaganda films about the weather. But the topic is an interesting and
>important one which we wrote about quite a few years ago, and will try to
>return to as time permits.
So: they’re quoting a particular figure: *dollars per screen-showing*, as a measure of how the movie is doing. The thing is, that’s a pretty darn weird statistic. Why would they use dollars/screen-showing, instead of total revenue?
Because it’s the one statistic that lets them support the conclusion that they wanted to draw. What are the real facts? Official box office statistics for gross per weekend (rounded to the nearest thousand):
* May 26: $281,000 (in 4 theaters)
* June 2: $1,356,000 (in 77 theaters)
* June 9: $1,505,000 (in 122 theaters)
* June 16: $1,912,000 (in 404 theaters)
* June 23: $2,016,000 (in 514 theaters)
Each weekend, it has made more money than the previous weekend. (Those are per weekend numbers, not cumulative. The cumulative gross for the movie is $9,630,000.)
But the per showing gross has gone down. Why? Because when it was first released, it was being shown in a small number of showings in a small number of theaters. When it was premiered in 4 theaters, they sold out standing room only – so the gross per showing was very high. Now, four weeks later, it’s showing in over 500 theaters, and the individual showings aren’t selling out anymore. But *more people* are seeing it – every weekend, the number of people seeing it has increased!
The Powerline article (and the UPI article which it cites) are playing games with numbers to skew the results. They *want* to say that Al Gore’s movie is tanking in the theaters, so they pick a bizzare statistic to support that, even though it’s highly misleading. In fact, it’s one of the best performing documentaries *ever*. It’s currently the number seven grossing documentary of all time, and it’s about $600,000 off from becoming number 5.
What was the per-theater (note, not per showing, but per theater) gross for the last Star Wars movie four weeks into its showing? $4,500/theater (at 3,322 theaters), according to [Box Office Mojo][bom]. So, if we want to use reasoning a lot like powerline, we can argue that Al Gore’s movie is doing *as well as Star Wars* an a dollars/theater-weekend basis.
But that would be stupid, wouldn’t it.
[bom]: http://www.boxofficemojo.com/movies/?page=weekend&id=starwars3.htm
[powerline]: http://powerlineblog.com/archives/014530.php

Categories: Products, Exponentials, and the Cartesian Closed Categories

Before I dive into the depths of todays post, I want to clarify something. Last time, I defined categorical products. Alas, I neglected to mention one important point, which led to a bit of confusion in the comments, so I’ll restate the important omission here.
The definition of categorical product defines what the product looks like *if it’s in the category*. There is no requirement that a category include the products for all, or indeed for any, of its members. Categories are *not closed* with respect to categorical product.
That point leads up to the main topic of this post. There’s a special group of categories called the **Cartesian Closed Categories** that *are* closed with respect to product; and they’re a very important group of categories indeed.
However, before we can talk about the CCCs, we need to build up a bit more.
Cartesian Categories
——————–
A *cartesian category* C (note **not** cartesian *closed* category) is a category:
1. With a terminal object t, and
2. ∀ a, b ∈ Obj(C), the objects and arrows of the categorical product a×b are in C.
So, a cartesian category is a category closed with respect to product. Many of the common categories are cartesian: the category of sets, and the category of enumerable sets, And of course, the meaning of the categorical product in set? Cartesian product of sets.
Categorical Exponentials
————————
Like categorical product, the value of a categorical exponential is not *required* to included in a category. Given two objects x and y from a category C, their *categorical exponential* xy, if it exists in the category, is defined by a set of values:
* An object xy,
* An arrow evaly,x : xy×y → x, called an *evaluation map*.
* ∀ z ∈ Obj(C), an operation ΛC : (z&cross;y → x) → (z → xy). (That is, an operation mapping from arrows to arrows.)
These values must have the following properties:
1. ∀ f : z×y → x, g : z → xy:
* valy,x º (ΛC(f)×1y)
2. ∀ f : z×y → x, g : z → xy:
* ΛC(evaly,x *ordm; (z×1y)) = z
To make that a bit easier to understand, let’s turn it into a diagram.
exponent.jpg
You can also think of it as a generalization of a function space. xy is the set of all functions from y to x. The evaluation map is simple description in categorical terms of an operation that applies a function from a to b (an arrow) to a value from a, resulting in an a value from b.
(I added the following section after this was initially posted; a commenter asked a question, and I realized that I hadn’t explained enough here, so I’ve added the explanation.
So what does the categorical exponential mean? I think it’s easiest to explain in terms of sets and functions first, and then just step it back to the more general case of objects and arrows.
If X and Y are sets, then XY is the set of functions from Y to X.
Now, look at the diagram:
* The top part says, basically, that g is a function from Z to to XY: so g takes a member of Z, and uses it to select a function from Y to X.
* The vertical arrow says:
* given the pair (z,y), f(z,y) maps (z,y) to a value in X.
* given a pair (z,y), we’re going through a function. It’s almost like currying:
* The vertical arrow going down is basically taking g(z,y), and currying it to g(z)(y).
* Per the top part of the diagram, g(z) selections a function from y to x. (That is, a member of XY.)
* So, at the end of the vertical arrow, we have a pair (g(z), y).
* The “eval” arrow maps from the pair of a function and a value to the result of applying the function to the value.
Now – the abstraction step is actually kind of easy: all we’re doing is saying that there is a structure of mappings from object to object here. This particular structure has the essential properties of what it means to apply a function to a value. The internal values and precise meanings of the arrows connecting the values can end up being different things, but no matter what, it will come down to something very much like function application.
Cartesian Closed Categories
—————————-
With exponentials and products, we’re ready for the cartesian closed categories (CCCs):
A Cartesian closed category is a category that is closed with respect to both products and exponentials.
Why do we care? Well, the CCCs are in a pretty deep sense equivalent to the simply typed lambda calculus. That means that the CCCs are deeply tied to the fundamental nature of computation. The *structure* of the CCCs – with its closure WRT product and exponential – is an expression of the basic capability of an effective computing system.
We’re getting close to being able to get to some really interesting things. Probably one more set of definitions; and then we’ll be able to do things like show a really cool version of the classic halting problem proof.