Category Archives: Good Math

Fundie Probability: Even Worse Math than Swinburne

I recently got a real prize of a link from one of my readers. He’d enjoyed the [Swinburne][swinburne] article, and had encountered this monstrosity: an alleged [probability of christianity][prob] argument *significantly worse* than Swinburne’s.
[swinburne]: http://goodmath.blogspot.com/2006/04/mind-numbingly-stupid-math.html “My shredding of Swinburne”
[prob]: http://www.biblebelievers.org.au/radio034.htm “Mathematical Probability that Jesus is the Christ”
The difference between Swinburne and this bozo (whose name I can’t locate on the site) is that at least Swinburne made *some* attempt to use some math to justify his position. It may have been sloppy as hell – but at least he *did* go through the effort of using actual Bayesian methods. I think that he did an *appallingly* bad job of it; but at least there was something there.
Our new friend doesn’t even bother to do that.
Here’s a very typical example of his “argument”:
>The reason why prophecy is an indication of the divine authorship of the Scriptures, and hence a testimony to the trustworthiness of the Message of the Scriptures, is because of the minute probability of fulfillment.
>
>Anyone can make predictions. Having those prophecies fulfilled is vastly different. In fact, the more statements made about the future, and the more the detail, then the less likely the precise fulfillment will be.
>
>For example, what’s the likelihood of a person predicting today the exact city in which the birth of a future leader would take place, well into the 21st century? This is indeed what the prophet Micah did 700 years before the Messiah. Further, what is the likelihood of predicting the precise manner of death that a new, unknown religious leader would experience, a thousand years from now – a manner of death presently unknown, and to remain unknown for hundreds of years? Yet, this is what David did in 1000 B.C.
>
>Again, what is the likelihood of predicting the specific date of the appearance of some great future leader, hundreds of years in advance? This is what Daniel did, 530 years before Christ.
>
>If one were to conceive 50 specific prophecies about a person in the future, whom one would never meet, just what’s the likelihood that this person will fulfill all 50 of the predictions? How much less would this likelihood be if 25 of these predictions were about what other people would do to him, and were completely beyond his control?
>
>For example, how does someone “arrange” to be born in a specific family?
>
>How does one “arrange” to be born in a specified city, in which their parents don’t actually live? How does one “arrange” their own death – and specifically by crucifixion, with two others, and then “arrange” to have their executioners gamble for His clothing (John 16:19; Psalms 22:18)? How does one “arrange” to be betrayed in advance? How does one “arrange” to have the executioners carry out the regular practice of breaking the legs of the two victims on either side, but not their own? Finally, how does one “arrange” to be God? How does one escape from a grave and appear to people after having been killed?
>
>Indeed, it may be possible for someone to fake one or two of the Messianic prophecies, but it would be impossible for any one person to arrange and fulfill all of these prophecies.
So the basic method is this: assert that there were prophecies that unambiguously identified a specific event; assert that those prophecies were fulfilled; and then ask, what’s the probability of all of those prophecies coming true?
The arguments that he presents range from the vague to the silly. In the vague cases, he makes a classic mathematical mistake: switching between a priori and a posteriori probabilities. In general, probability calculations are done a priori: “I *don’t know* what specific situation I’m going to apply my calculations to, so when I set the probabilities, I do it without any specific knowledge of a desired outcome”. But he does it a posteriori: “Since this description of an event *clearly references* this specific incident, I’m going to set the probabilities using my knowledge of the outcome”.
For example, he asserts things like prophecies naming a specific city in which his alleged messiah would be born. Of course, the specific prophecy that supposedly makes this claim is not specified. And if you go to other places that babble about the various prophecies about the messiah, you find that none of them are actually *specific*. That is, they all talk in very vague and symbolic terms, and *a posteriori* interpretations of those can say that they identify a specific place associated with Jesus; but if you were to read them with an open mind, without any idea of a set of events that you were trying to link them to, you almost certainly would not have demanded that they describe a single, specific location. If they could describe 20 locations, then your probability calculations would need to include the probability for each of those possibilities; if you use the a posteriori perspective to narrow it down to one, you’re making invalid use of hindsight to skew the numbers.
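Just to put a rough number on how much that hindsight buys you, here’s a toy calculation (the inputs are mine, not his): if each vague prophecy could plausibly be read as matching any of 20 outcomes instead of exactly one, then over 48 prophecies the interpretive step alone accounts for a factor of 20^48.

```ocaml
(* A toy calculation -- the inputs are mine, not his -- showing how much
   the hindsight step is worth.  If a vague prophecy could reasonably be
   read as matching any of k outcomes rather than exactly one, then each
   factor in the product of probabilities is k times larger, and over n
   prophecies the headline number shrinks by a factor of k^n. *)
let log10_inflation ~k ~n = float_of_int n *. log10 (float_of_int k)

let () =
  (* 48 prophecies, each read narrowly (1 outcome) vs. loosely (20 outcomes) *)
  Printf.printf "hindsight accounts for a factor of about 10^%.0f\n"
    (log10_inflation ~k:20 ~n:48)
  (* roughly 10^62 -- a huge chunk of any "1 in 10^157" headline figure *)
```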
Even worse than using the alleged fulfillment of vague prophecies, he makes a more blatant kind of a posteriori error: numerous citations of prophecies from the *new* testament: a set of books written *after* the events that were supposedly prophesied – as in the citation of the book of John in the quoted section above. We’re supposed to take the fulfillment of a prophecy that wasn’t written down until *after* the event, and act surprised that there’s exactly one supposedly historical event that matches that prophecy.
Ooh, I’m surprised. A bunch of guys who were trying to convince the world that Jesus was the messiah wrote down prophecies *after the fact* in a way that only Jesus could possibly have fulfilled. Adding *that* to an alleged probability calculation to reduce the probability of anyone but Jesus fulfilling all of the prophecies isn’t even bad math. It’s just plain old lying.
In the category of just plain silly, I absolutely *love* the quote where he’s talking about how hard it would be for anyone else to fulfill these alleged prophecies: “how does one “arrange” to be God?”. Yeah, one of the statements that he supposedly calculuates a probability for is “being God”.
What’s the probability of being God? I’d really like to know. Do I have any real chance of fulfilling it?
Alas, I can’t tell, because despite insisting that there are 456 different prophecies that Jesus supposedly fulfilled, and that each of those is factored into the calculation, he doesn’t tell us how any of those individual probabilities were calculated – for example, he says the probability of an alleged messiah being born in Bethlehem was one in 2.8×10^5. Where’d that number come from? We don’t know.
And for most of the alleged events that were fulfilled prophecies, he doesn’t even give us that much information.
For example, he says that the odds of fulfilling 48 prophecies – he doesn’t say which 48 – is 1 in 10^157. He just asserts that. And then goes off into a classic “big numbers” babble about how terribly, unimaginably large that number is. He even goes so far as to pull out one of those bullshit “anything less likely than X is impossible” lines. (I’m actually playing a game of spider solitaire in the background as I write this; which means that I’m doing something absolutely impossible!)
He’s even got a ripoff of the CSICOP challenge:
>But, of course, there are many more than eight prophecies. In another calculation, Stoner used 48 prophecies (Idem, 109) (even though he could have used Edersheim’s 456), and arrived at the extremely conservative estimate that the probability of 48 prophecies being fulfilled in one person is the incredible number 10^157. In fact, if anybody can find someone, living or dead, other than Jesus, who can fulfill only half of the predictions concerning the Messiah given in the book “Messiah in Both Testaments” by Fred J. Meldau, the Christian Victory Publishing Company is ready to give a ONE thousand dollar reward! As apologist Josh McDowell says, “There are a lot of men in the universities that could use some extra cash!” (Josh McDowell, Evidence that Demands a Verdict, California: Campus Crusade for Christ, 175).
The catch of course is that you need to fulfill the prophecies *in exactly the way that these fundies interpret them* – in other words, starting from the assumption that Jesus fulfilled them, and that the prophecies specifically point at exactly the events of the life of Jesus.
I can actually save the authors a lot of trouble. The probability of anyone else fulfilling these prophecies *is zero*. Because they’re interpreted a posteriori as the specific events from the life of Jesus, the odds of it being Jesus who fulfilled them are exactly 100%, and the odds of anyone else doing it are exactly 0%.
Course, if you put it that way, suddenly it doesn’t sound so impressive, even to the bunch of idiots who’d accept this argument to begin with.

Category Theory: Some definitions to build on

Sorry, but I actually jumped the gun a bit on Yoneda’s lemma.

As I’ve mentioned, one of the things that I don’t like about category theory is how definition-heavy it is. So I’ve been trying to minimize the number of definitions at any time, and show interesting results of using the techniques of category theory as soon as I can.

Well, there are some important definitions which I haven’t talked about yet. And for Yoneda’s lemma to make sense, you really need to see some more examples of how categories express structure. And to be able to show how category theory lets you talk about structure, we need to plough our way through a few more definitions.

Initial Objects

Suppose we have a category, C. An object o ∈ Obj(C) is called an initial object in C iff (∀ b ∈ Obj(C)) (∃! f : o → b ∈ Mor(C)).

In English: an object o is initial in C if there’s exactly one arrow from o to each other object in C. We often write “0C” for the initial object of a category C, or just “0” if it’s obvious what category we’re talking about.

There’s a dual notion of a terminal object: an object is terminal if there’s exactly one arrow from every object in the category to it. Terminal objects are written “1C” or just “1”.
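For a concrete example, consider the category whose objects are OCaml types and whose arrows are functions (a sketch only; it ignores non-termination, and the empty-type syntax needs OCaml 4.07 or later): the empty type is initial, and unit is terminal.

```ocaml
(* A sketch in the category whose objects are OCaml types and whose
   arrows are functions: the empty type is initial (exactly one function
   out of it to any type), and unit is terminal (exactly one function
   into it from any type). *)
type empty = |                                  (* no values at all *)

let from_empty (x : empty) = match x with _ -> .   (* the only arrow 0 -> a *)
let to_unit (_ : 'a) : unit = ()                   (* the only arrow a -> 1 *)
```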

Given two objects in a category, if they’re both initial, they must be isomorphic. It’s pretty easy to prove; here’s the sketch. Remember the definition of isomorphism in category theory. An isomorphism is an arrow f : a → b, where (∃ g : b → a) such that f º g = 1b and g º f = 1a. If an object is initial, then there’s exactly one arrow from it to every other object – including the other initial object. So between two initial objects o and o′ we get an arrow f : o → o′, and an arrow g : o′ → o coming back, because o′ is initial too. Now g º f is an arrow from o to o; but since o is initial, there’s exactly one arrow from o to o, and the identity 1o is such an arrow – so g º f = 1o. The same argument gives f º g = 1o′, so f and g form an isomorphism.

Categorical Products

The product of two objects is a generalization of the Cartesian product of two sets. It’s important because products are one of the major ways of building complex structures out of simple ones in a category.

Given a category C, and two objects a,b ∈ Obj(C), the categorical product a × b consists of:

  1. An object p, often written a×b;
  2. two arrows pa and pb, where p ∈ Obj(C), pa : p → a, and pb : p → b.
  3. a “pairing” operation, which for every object c ∈ C, maps the pair of arrows f : c → a and g : c → b to an arrow Pairc(f,g) : c → a×b. Pairc(f,g) is often written <f,g>c.

Pairc must have three properties.

  1. pa º Pairc(f,g) = f.
  2. pb º Pairc(f,g) = g.
  3. (∀ h : c → a×b) Pairc(pa º h, pb º h) = h.

The first two of those properties are the separation arrows, to get from the product to its components; and the third is the merging arrow, to get from the components to the product. We can say the same thing about the relationships in the product in an easier way using a commutative diagram:

catprod.jpg
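If you find code easier to read than diagrams, here’s a sketch of the same structure in the category whose objects are OCaml types and whose arrows are functions; the pair type plays the role of the product object, and the names p_a, p_b, and pair are mine.

```ocaml
(* A sketch of the categorical product in the category of OCaml types
   and functions: the pair type a * b plays the role of the product
   object, p_a and p_b are the two projection arrows, and pair is the
   pairing operation. *)
let p_a (x, _) = x                      (* p_a : a * b -> a *)
let p_b (_, y) = y                      (* p_b : a * b -> b *)
let pair f g = fun c -> (f c, g c)      (* Pair(f,g) : c -> a * b *)

(* The three properties above, spelled out for these definitions:
     p_a (pair f g c) = f c
     p_b (pair f g c) = g c
     pair (fun c -> p_a (h c)) (fun c -> p_b (h c)) = h            *)
```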

Category Theory: Natural Transformations and Structure

The thing that I think is most interesting about category theory is that what it’s really fundamentally about is structure. The abstractions of category theory let you talk about structures in an elegant way; and category diagrams let you illustrate structures in a simple visual way. Morphisms express the structure of a category; functors are higher level morphisms that express the structure of relationships between categories.

In my last category theory post, one of the things I mentioned was how category theory lets you explain the idea of symmetry and group actions – which are a kind of structural immunity to transformation, and a definition of transformation – in a much simpler way than it could be talked about without categories.

It turns out that symmetry transformations are just the tip of the iceberg of the kinds of structural things we can talk about using categories. In fact, as I alluded to in my last post, if we create a category of categories, we end up with functors as arrows between categories.

What happens if we take the same kind of step we took to get group actions, and pull it up a level: instead of working in the category of categories and focusing on arrows from the specific category of a group to the category of sets, we look at arrows between members of the category of functors?

We get the general concept of a natural transformation. A natural transformation is a morphism from functor to functor, which preserves the full structure of morphism composition within the categories mapped by the functors.

Suppose we have two categories, C and D. And suppose we also have two functors, F, G : C → D. A natural transformation from F to G, which we’ll call η, maps every object x in C to an arrow ηx : F(x) → G(x). ηx has the property that for every arrow a : x → y in C, ηy º F(a) = G(a) º ηx. If this is true, we call ηx the component of η for (or at) x.

That paragraph is a bit of a whopper to interpret. Fortunately, we can draw a diagram to help illustrate what that means. The following diagram commutes if η has the property described in that paragraph.

natural-transform.jpg

I think this is one of the places where the diagrams really help. We’re talking about a relatively straightforward property here, but it’s very confusing to write about in equational form. But given the commutative diagram, you can see that it’s not so hard: the path ηy º F(a) and the path G(a) º ηx compose to the same thing: that is, the transformation η hasn’t changed the structure expressed by the morphisms.

And that’s precisely the point of the natural transformation: it’s a way of showing the relationships between different descriptions of structures – just the next step up the ladder. The basic morphisms of a category express the structure of the category; functors express the structure of relationships between categories; and natural transformations express the structure of relationships between relationships.
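Here’s a tiny concrete sketch, again in the category of OCaml types and functions (Option.map requires OCaml 4.08 or later): take F to be the list functor and G the option functor; then “take the head of the list, if there is one” gives a component at every type, and those components form a natural transformation.

```ocaml
(* A sketch of a natural transformation in the category of OCaml types
   and functions: F is the list functor, G is the option functor, and
   the component at each type is "take the head of the list, if any". *)
let head = function [] -> None | x :: _ -> Some x

let () =
  (* Naturality: for any f : a -> b and any list xs,
       Option.map f (head xs) = head (List.map f xs)
     -- the two paths around the square agree. *)
  let f x = x * 10 in
  assert (Option.map f (head [1; 2; 3]) = head (List.map f [1; 2; 3]));
  assert (Option.map f (head [])        = head (List.map f []))
```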

Coming soon: a few examples of natural transformation, and then… Yoneda’s lemma. Yoneda’s lemma takes the idea we mentioned before of a group being representable by a category with one object, and generalizes all the way up from the level of a single category to the level of natural transformations.

More Category Theory: Getting into Functors

Let’s talk a bit about functors. Functors are fun!

What’s a functor? I already gave the short definition: a structure-preserving mapping between categories. Let’s be a bit more formal. What does the structure-preserving property mean?

A functor F from category C to category D is a mapping from C to D that:

  • Maps each member m in Obj(C) to an object F(m) in Obj(D).
  • Maps each arrow a : x → y in Mor(C) to an arrow F(a) : F(x) → F(y), where:
    • (∀ o ∈ Obj(C)) F(1o) = 1F(o). (Identity is preserved by the functor mapping of morphisms.)
    • (∀ m, n ∈ Mor(C)) F(n º m) = F(n) º F(m). (Composition is preserved by the functor mapping of morphisms.)

That’s the standard textbook gunk for defining a functor. But if you look back at the original definition of a category, you should notice that this looks familiar. In fact, it’s almost identical to the definition of the necessary properties of arrows!
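Before going further, here’s a sketch of a functor you’ve probably already used, in the category of OCaml types and functions: the list type constructor, with List.map acting on the arrows. The two conditions above become the familiar map laws; the little checks below are my own toy examples.

```ocaml
(* A sketch of a functor in the category of OCaml types and functions:
   the list type constructor maps each type a to a list, and List.map
   maps each arrow f : a -> b to List.map f : a list -> b list. *)
let id x = x
let compose g f = fun x -> g (f x)

let () =
  let xs = [1; 2; 3] in
  let f x = x + 1 and g x = x * 2 in
  (* identity is preserved: List.map id behaves like id *)
  assert (List.map id xs = id xs);
  (* composition is preserved: List.map (g º f) = (List.map g) º (List.map f) *)
  assert (List.map (compose g f) xs = compose (List.map g) (List.map f) xs)
```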

We can make functors much easier to understand by talking about them in the language of categories themselves. Functors are arrows in a category whose objects are categories.

There’s a kind of category, called a small category. (I happen to dislike the term “small” category; I’d prefer something like “well-behaved”.) A small category is a category C where Obj(C) and Mor(C) are sets, not proper classes. Alas, more definitions; in set theory, a class is a collection of sets that can be defined by a non-paradoxical property that all of its members share. Some classes are sets of sets; some classes are not sets; they lack some of the required properties of sets – but still, the class is a collection with a well-defined, non-paradoxical, unambiguous property. If a class isn’t itself a set – if it’s just a collection that fails to be a set – then it’s called a proper class.

Any category whose collections of objects and arrows are sets, not proper classes, is called a small category. Small categories are, basically, categories that are well-behaved – meaning that their collections of objects and arrows don’t have any of the obnoxious properties that would prevent them from being sets.

The small categories are, quite beautifully, the objects of a category called Cat. (For some reason, category theorists like three-letter labels.) The arrows of Cat are the functors: functors are morphisms between small categories. Once you wrap your head around that, then the meaning of a functor, and the meaning of a structure-preserving transformation, become extremely easy to understand.

Functors come up over and over again, all over mathematics. They’re an amazingly useful notion. I was looking for a list of examples of things that you can describe using functors, and found a really wonderful list on Wikipedia. I highly recommend following that link and taking a look at the list. I’ll just mention one particularly interesting example: groups and group actions.

If you’ve been reading GM/BM long enough, you’ll remember my posts on group theory, and how long it took for me to work up to the point where I could really define what symmetry meant, and how every symmetric transformation was actually a group action. Category theory makes that much easier.

Every group can be represented as a category with a single object. A functor from the one-object category of a group to the category of sets is a group action on the set that the functor maps that object to. Poof! Symmetry. (This paragraph was modified to correct some ambiguous wording; the ambiguity was pointed out by commenter “Cat lover”.)

Since symmetry means structure-preserving transformation, and a functor is a structure-preserving transformation – well, they’re almost the same thing. The functor is an even more general abstraction of that concept: group symmetry is just one particular case of a functor transformation. Once you get functors, understanding symmetry is easy. And so are lots of other things.
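Here’s a toy sketch of that idea (my own example, not anything from the text above): the cyclic group Z/4, viewed as a one-object category, acting on a four-element set by rotation. Functoriality is exactly the pair of group-action laws.

```ocaml
(* A toy sketch of a group action as a functor: the cyclic group Z/4,
   viewed as a category with one object whose arrows are the group
   elements, mapped into the category of sets.  The single object goes
   to the set {0,1,2,3}, and each group element g goes to the function
   "rotate by g".  Functoriality is exactly the two laws checked below. *)
let act (g : int) (i : int) : int = (i + g) mod 4

let () =
  for g = 0 to 3 do
    for h = 0 to 3 do
      for i = 0 to 3 do
        (* the identity arrow goes to the identity function *)
        assert (act 0 i = i);
        (* composition of arrows goes to composition of functions *)
        assert (act ((g + h) mod 4) i = act g (act h i))
      done
    done
  done
```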

And of course, you can always carry these things further. There is a category of functors themselves; and notions which can be most easily understood in terms of functors operating on the category of functors!

By now, it should be sort of clear why category theory is affectionately known as abstract nonsense. Category theory operates at a level of abstraction where almost anything can be wrapped up in it; and once you’ve wrapped something up in a category, almost anything you can do with it can itself be wrapped up as a category – levels upon levels, categories of categories, categories of functors on categories of functors on categories, ad infinitum. And yet, it makes sense. It captures a useful, comprehensible notion. All that abstraction, to the point where it seems like nothing could possibly come out of it. And then out pops a piece of beautiful crystal.

Some Basic Examples of Categories

For me, the frustrating thing about learning category theory was that
it seemed to be full of definitions, but that I couldn’t see why I should care.
What were these category things, and what could I really talk about using this
strange new mathematical language of categories?

To avoid that in my presentation, I’m going to show you a couple of examples up front of things we can talk about using the language of category theory: sets, partially ordered sets, and groups.

Sets as a Category

We can talk about sets using category theory. The objects in the category of sets are, obviously, sets. Arrows in the category of sets are total functions between sets.

Let’s see how these satisfy our definition of categories:

  • Given a function f from set A to set B, it’s represented by an arrow f : A → B.
  • º is function composition. It meets the properties of a categorical º:
    • Associativity: function composition over total functions is associative; we know that from set theory.
    • Identity: for any set S, 1S is the identity function: (∀ i ∈ S) 1S(i) = i. It should be pretty obvious that for any f : S → T, f º 1S = f; and 1T º f = f.

Partially Ordered Sets

Partially ordered sets (that is, sets that have a “<=” operator) can be described as a category, usually called PoSet. The objects are the partially ordered sets; the arrows are monotonic functions (a function f is monotonic if (∀ x, y ∈ domain(f)) x <= y ⇒ f(x) <= f(y)). Like regular sets, º is function composition.

It’s pretty easy to show the associativity and identity properties; it’s basically the same as for sets, except that we need to show that º preserves the monotonicity property. And that’s not very hard:

  • Suppose we have arrows f : A → B, g : B → C. We know that f and g are monotonic
    functions.

  • Now, for any pair of x and y in the domain of f, we know that if x <= y, then f(x) <= f(y).
  • Likewise, for any pair s,t in the domain of g, we know that if s <= t, then g(s) <= g(t).
  • Put those together: if x <= y, then f(x) <= f(y). f(x) and f(y) are in the domain of g, so if (f(x) <= f(y)) then we know g(f(x)) <= g(f(y)).
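If you want to see it concretely, here’s a small spot-check with two monotone functions on the integers (a toy example of mine): their composition is monotone too.

```ocaml
(* A quick spot-check of the argument above with two concrete monotone
   functions on the integers: their composition is monotone as well. *)
let f x = 2 * x                          (* monotone *)
let g x = x + 3                          (* monotone *)
let compose g f = fun x -> g (f x)

let () =
  for x = -5 to 5 do
    for y = x to 5 do
      (* x <= y  implies  (g º f) x <= (g º f) y on this small range *)
      assert (compose g f x <= compose g f y)
    done
  done
```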

Groups as a Category

There is a category Grp where the objects are groups, and group homomorphisms are the arrows. Homomorphisms are structure-preserving functions between groups; so function composition of those structure-preserving functions is the composition operator º. The proof that function composition preserves structure is pretty much the same as the proof we just ran through for partially ordered sets.

Once you have groups as a category, then you can do something very cool. If groups are a category, then functors over groups are symmetric transformations. Walk it through, and you’ll see that it fits. What took me a week of writing to be able to explain when I was talking about group theory can be stated in one sentence using the language of category theory. That’s a perfect example of why cat theory is useful: it lets you say some very important, very complicated things in very simple ways.

Miscellaneous Comments

There’ve been a couple of questions from category theory skeptics in the comments. Please don’t think I’m ignoring you. This stuff is confusing enough for most people (me included) that I want to take it slowly, just a little bit at a time, to give readers an opportunity to digest each bit before going on to the next. I promise that I’ll answer your questions eventually!

Diagrams in Category Theory

One of the things that I find niftiest about category theory is category diagrams. A lot of things that normally turn into complex equations or long-winded logical statements can be expressed in diagrams, by capturing the things that you’re talking about in a category, and then using category diagrams to express the idea that you want to get across.

A category diagram is a directed graph, where the nodes are objects from a category, and the edges are morphisms. Category theorists say that a diagram commutes if, for any two paths through arrows in the diagram from node A to node B, the composition of all the edges along the first path is equal to the composition of all the edges along the second path.

As usual, an example will make that clearer.
cat-assoc.jpg

This diagram is a way of expressing the associativity property of morphisms: f º (g º h) = (f º g) º h. The way that the diagram illustrates this is: (g º h) is the arrow from A to C. When we compose that with f, we wind up at D. Alternatively, (f º g) is the arrow from B to D; if we compose that with h, we also wind up at D. The two paths – f composed with the arrow from A to C, and the arrow from B to D composed with h – are both paths from A to D; therefore, if the diagram commutes, they must be equal.

Let’s look at one more diagram, which we’ll use to define an interesting concept, the principal morphism between two objects. The principal morphism is a single arrow from A to B, and any composition of morphisms that goes from A to B will end up being equivalent to it.

In diagram form, a morphism f is principal if (∀ x : A → A) (∀ y : A → B), the following diagram commutes.
cat-principal.jpg

In words, this says that f is a principal morphism if for every endomorphic arrow x, and for every arrow y from A to B, f is the result of composing x and y. There’s also something interesting about this diagram that you should notice: A appears twice in the diagram! It’s the same object; we just draw it in two places to make the commutation pattern easier to see. A single object can appear in a diagram as many times as you want, to make the pattern of commutation easy to see. When you’re looking at a diagram, you need to be a bit careful to read the labels to make sure you know what it means. (This paragraph was corrected after a commenter pointed out a really silly error; I originally said “any identity arrow”, not “any endomorphic arrow”.)

One more definition by diagram: x and y are a retraction pair, and A is a retract of B (written A < B) if the following diagram commutes:
cat-retract.jpg

That is, x : A → B, and y : B → A are a retraction pair if y º x = 1A.
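Here’s a concrete sketch in the category of OCaml types and functions (my example, with A = bool and B = int): x embeds the booleans into the integers, y maps back, and the round trip y º x is the identity on bool.

```ocaml
(* A sketch of a retraction pair in the category of OCaml types and
   functions, with A = bool and B = int: y º x = 1_A. *)
let x (b : bool) : int = if b then 1 else 0     (* x : A -> B *)
let y (n : int) : bool = n <> 0                 (* y : B -> A *)

let () =
  assert (y (x true) = true);
  assert (y (x false) = false)
  (* note that x (y n) = n need not hold: A is a retract of B,
     not isomorphic to it *)
```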

Why so many languages? Programming languages, Computation, and Math.

Back at my old digs last week, I put up a post about programming languages and types. It produced an interesting discussion, which ended up shifting topics a bit and led to a very interesting question from one of the posters. Since the answer actually really does involve math, I’m going to pop it up to the front page here.

In the discussion, I argued that programmers should know and use many different programming languages; and that that’s not just a statement about today’s programming languages, but something that I think will always be true: there will always be good reasons for having and using more than one programming language.

One of the posters was quite surprised by this, and wanted to know why I didn’t think that it was possible, at least in principle, to design a single ideal programming language that would be good for any program that I needed to write.

It’s a good question, which I think connects very naturally into the underlying mathematics of computation. As I discussed back on the old blog, one of the very fundamental facts concerning computation is the Church-Turing thesis: that computation has a fundamental limit, and that there are many different ways of performing mechanical computation, but ultimately, they all end up with the same fundamental capabilities. Any computing system that reaches that limit is called an effective computing system (ECS). Anything that one ECS can do, any other ECS can do too. That doesn’t mean that they’re all identical. A given computation can be really easy to understand when described in terms of one ECS, and horribly difficult in another. For example, sorting a list of numbers is pretty easy to understand if it’s written in lambda calculus; but it’s a nightmare written for a Minsky machine; and it’s somewhere darned close to impossible for a human to understand written out as a cell-grid for Conway’s life.

How does this connect back to programming languages? Well, what is a programming language really? From a theoretical point of view, it’s a language for specifying a computing machine. Each program is a description of a specific computing machine. The language is a particular way of expressing the computing machine that you want to build. Underlying each programming language is an effective computing system: a fundamental model of how to perform computations. That’s the real fundamental difference between programming languages: what fundamental way of representing computation underlies the language.

Assuming that you’re looking at a good programming language, it’s good for a particular task when that task is easy to describe in terms of the fundamental ECS underlying the language; it’s bad for a task when that task is not easy to describe in terms of the language’s underlying ECS.

If a language has a single ECS underlying it, then there are tasks that it’s good for, and tasks that it’s not so good for. If you try to smoosh multiple ECSs together under the covers of one language, then you don’t really have one language: when you’re writing for a von Neumann machine, you’re really using one language, and when you’re writing for Actors, you’re using another. You’ve just forced two languages together into the same framework – but they’re still two languages, and you’ve probably compromised the quality of expression of those two languages by forcing them together.

A First Glance at Category Theory

To get started, what is category theory?

Back in grad school, I spent some time working with a thoroughly insane guy named John Case who was the new department chair. When he came to the university, he brought a couple of people with him, to take temporary positions. One of them was a category theorist whose name I have unfortunately forgotten. That was the first I’d ever heard of cat theory. So I asked John what the heck this category theory stuff was. His response was “abstract nonsense”. I was astonished; a guy as wacky and out of touch with reality as John called something abstract nonsense? It turned out to be a classic quotation, attributed to one of the founders of category theory, Norman Steenrod. It’s not an entirely bad description.

Category theory is one of those fields of mathematics that fascinates me: where you take some fundamental concept, and abstract it down to its bare essentials in order to understand just what it really is, what it really means. Just as group theory takes the idea of an algebraic operation, strips it down to the bare minimum, and discovers the meaning of symmetry, category theory takes the concept of a function as a mapping from one thing to another, strips it down to the bare minimum, and sees what you can discover.

The fundamental thing in category theory is an arrow, also called a morphism. A morphism is an abstraction of the concept of homomorphism, which I talked about a bit when I was writing about group theory. We take that concept of a function mapping from one set of values to another, and strip it down: the basic concept is something that provides a map from one thing to some other thing – and that’s all we want.

In classic style, we’ll define a category C as a tuple: (O, M, º), where:

  • O, or Obj(C), is a set of objects. Objects are anything – we don’t care. Anything that we can define maps over.
  • M, or Mor(C), is a set of arrows or morphisms. Every arrow has a unique source and a unique target, both of which are objects in O. Given two objects a and b in O, we also write Mor(a,b) for the set of morphisms from a to b. To talk about a specific morphism from a to b, we write it as “name : a → b”; as in “f : int → int”.
  • º is a binary operation that is the abstraction of function composition; º is a mapping from (Mor(a,b), Mor(b,c)) to Mor(a,c). It’s got the basic properties of function composition:
    1. Associativity: (∀ f : a → b, g : b → c, h : c → d) h º (g º f) = (h º g) º f
    2. Identity: (∀ a, b ∈ Obj(C)) (∃ 1a, 1b ∈ Mor(C)) (∀ f : a → b) 1b º f = f = f º 1a.
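For the programmers in the audience, here’s a sketch of that definition as an OCaml module signature. It only covers categories whose arrows can be represented as a type indexed by source and target, and the laws live in comments rather than being enforced; the names CATEGORY and Fn are mine.

```ocaml
(* A sketch of the definition as an OCaml module signature: just the
   shape of the data, with the laws stated in comments. *)
module type CATEGORY = sig
  type ('a, 'b) arrow                                   (* Mor(a, b) *)
  val id : ('a, 'a) arrow                               (* the identity 1a *)
  val compose : ('b, 'c) arrow -> ('a, 'b) arrow -> ('a, 'c) arrow   (* º *)
  (* laws: compose h (compose g f) = compose (compose h g) f
           compose id f = f = compose f id *)
end

(* The category of OCaml types and functions is one instance: *)
module Fn : CATEGORY with type ('a, 'b) arrow = 'a -> 'b = struct
  type ('a, 'b) arrow = 'a -> 'b
  let id x = x
  let compose g f x = g (f x)
end
```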

One neat little trick to simplify things is that we can actually throw away Obj(C), and replace it with the set of identity morphisms: since there is exactly one identity morphism per object, there’s no real need to distinguish between the identity morphism and the object. It’s a nice trick because it means that we have nothing but morphisms all the way down; but it’s really just a trick. We can talk about Obj(C); or Id(C); but we still need to be able to talk about the objects in some way, whether they’re just a subset of the morphisms, or something distinct.

Now, we get to something about category theory that I really don’t like. Category theory is front-loaded with rather a lot of definitions about different kinds of morphisms and different kinds of categories. The problem with that is that these definitions are very important, but we don’t have enough of the theory under our belts to be able to get much of a sense for why we should care about them, or what their intuitive meaning is. But that’s the way it goes sometimes; you have to start somewhere. It will make sense eventually, and you’ll see why these definitions matter.

There are a lot of special types of morphisms, defined by properties. Here’s the basic list:

  • A monic (or monomorphism) is an arrow f : a → b such that (∀ g1, g2 : x → a) f º g1 = f º g2 ⇒ g1 = g2. (That is, two arrows composed with f on the left, as f º g, give the same result only if they are the same arrow.)
  • An epic (or epimorphism) is an arrow f : a → b such that (∀ g1 , g2: b → x) g1 º f = g2 º f ⇒ g1 = g2. (This is almost the same as a monic, but it’s from the other side of the composition; instead of (f º g) in the definition, it’s (g º f).)
  • An isomorphism is an arrow f : a → b such that (∃ g : b → a) f º g = 1b ∧ g º f = 1a.
  • An endomorphism is an arrow f : a → b such that a = b.
  • An automorphism is an arrow that is both an endomorphism and an isomorphism.

One last definition, just because it gives me a chance to point out something useful about category theory. A functor is a morphism in the category of all categories. What that means is that it’s a structure-preserving mapping between categories. It’s neat in a theoretical way, because it demonstrates that we’re already at a point where we’re seeing how category theory can make it easier to talk about something complicated: we’re using it to describe itself! But the concept of functor also has a lot of applications; in particular, the module system of my favorite programming language makes extensive use of functors.

In OCaml, a module is something called a structure, which is a set of definitions with constrained types. One of the things you often want to be able to do is to write a piece of code in a way that allows you to make it parametric on some other structure. The way you do that is to write a functor: a “function” from a structure to a structure. For example, to implement a generic binary tree, you need a type of values that you’ll put in the tree, and an operation to compare values. The way you do that is to write a functor which takes a structure defining a type and a comparison operator, and maps it to a structure which is an implementation of a binary tree for that type and comparison.
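Here’s a minimal sketch of that kind of functor. The signature and module names (ORDERED, MakeTree) are made up for illustration, and the tree only supports insertion; it’s not a complete implementation.

```ocaml
(* A minimal sketch of the functor described above: a structure with a
   type and a comparison goes in, a binary-tree structure comes out. *)
module type ORDERED = sig
  type t
  val compare : t -> t -> int
end

module MakeTree (Ord : ORDERED) = struct
  type tree = Leaf | Node of tree * Ord.t * tree

  let rec insert x = function
    | Leaf -> Node (Leaf, x, Leaf)
    | Node (l, y, r) ->
        if Ord.compare x y < 0 then Node (insert x l, y, r)
        else if Ord.compare x y > 0 then Node (l, y, insert x r)
        else Node (l, y, r)
end

(* Applying the functor to a structure produces a new structure: *)
module IntTree = MakeTree (struct type t = int let compare = compare end)
```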

The OCaml functor is a category-theoretic functor: category theory provides an easy way to talk about the concept of the compile-time “function” from structure to structure.

Election Fraud? Or just bad math?

I’ve gotten an absolutely unprecedented number of requests to write about RFK Jr’s Rolling Stone article about the 2004 election.

RFK Jr’s article tries to argue that the 2004 election was stolen. It does a wretched, sloppy, irresponsible job of making the argument. The shame of it is that I happen to believe, based on the information that I’ve seen, that the 2004 presidential election was stolen. But RFK Jr’s argument is just plain bad: a classic case of how you can use bad math to support any argument you care to make. As a result, I think that the article does just about the worst thing that it could do: to utterly discredit anyone who questions the results of that disastrous election, and make it far more difficult for anyone who wanted to do a responsible job of looking into the question.

Let’s get right into it. He starts his argument by claiming that the exit polls indicated a different result than the official election results:

>The first indication that something was gravely amiss on November 2nd, 2004, was the inexplicable discrepancies between exit polls and actual vote counts. Polls in thirty states weren’t just off the mark — they deviated to an extent that cannot be accounted for by their margin of error. In all but four states, the discrepancy favored President Bush.

The key sentence that indicates just how poorly RFK Jr understands the math? “they deviated to an extent that cannot be accounted for by their margin of error”. That is a statement that is, quite simply, nonsensical. The margin of error in a poll is a statistical measure based on the standard deviation. Contrary to popular opinion, a poll with a margin of error of “4%” doesn’t mean that the actual quantity being measured must be within plus or minus 4% of the poll result.

A margin of error is measured to within a level of confidence. Most of the time, the MoE that we see cited is the MoE with 95% confidence. What this means is that 95% of the time, the sampled (polled) result is within that +/- n% range. But there is no case in which a result is impossible: the margin of error is an expression of how confident the poller is in the quality of their measurement: nothing more than that. Like any other measurement based on statistical sampling, the sample can deviate from the population by any quantity: a sample can be arbitrarily bad, even if you’re careful about how you select it.
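To make that concrete, here’s a rough sketch of where a cited margin of error comes from, assuming a simple random sample and the usual normal approximation (the 1.96 is the standard 95%-confidence factor; the sample size is a made-up example).

```ocaml
(* A rough sketch of a 95% margin of error for a simple random sample,
   using the normal approximation with p = 0.5 (the worst case).  The
   point is only that this is a statement about the sampling procedure,
   not a hard bound on how far any particular sample can be off. *)
let margin_of_error ?(z = 1.96) ?(p = 0.5) n =
  z *. sqrt (p *. (1.0 -. p) /. float_of_int n)

let () =
  (* a made-up example: a poll of 1,000 voters *)
  Printf.printf "n = 1000: about +/- %.1f%%\n" (100.0 *. margin_of_error 1000)
  (* roughly +/- 3.1% -- and about one poll in twenty will still land
     outside that range purely by chance *)
```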

Elections have consistently shown a bias in the exit polls: a bias in favor of the democratic vote. For some reason, which has not been adequately studied, exit polls almost always err on the side of sampling too many democratic voters. This could be the result of any number of factors: it could be a question of time (when were the polled people asked?); it could be a question of location (where were the pollsters located relative to the polling place?); it could be a social issue (the group of people that consistently votes for the democratic party may be more willing/have more time to answer pollsters questions); it could be something else entirely.

But you can’t conclude that an election was stolen on the basis of a discrepancy between official election results and exit polls. The best you can do is conclude that you need to look at both the election and the exit polling process to try to determine the reason for the discrepancy.

>According to Steven F. Freeman, a visiting scholar at the University of Pennsylvania who specializes in research methodology, the odds against all three of those shifts occurring in concert are one in 660,000. ”As much as we can say in sound science that something is impossible,” he says, ”it is impossible that the discrepancies between predicted and actual vote count in the three critical battleground states of the 2004 election could have been due to chance or random error.”

That entire quote is, to put it crudely, utter bullshit. Anyone who would make that statement should be absolutely disqualified from ever commenting on a statistical result.

>Now, thanks to careful examination of Mitofsky’s own data by Freeman and a team of eight researchers, we can say conclusively that the theory is dead wrong. In fact it was Democrats, not Republicans, who were more disinclined to answer pollsters’ questions on Election Day. In Bush strongholds, Freeman and the other researchers found that fifty-six percent of voters completed the exit survey — compared to only fifty-three percent in Kerry strongholds.(38) ”The data presented to support the claim not only fails to substantiate it,” observes Freeman, ”but actually contradicts it.”

Again, nonsense. There are two distinct questions in that paragraph, which are being deliberately conflated:

  1. In each given polling place, what percentage of people who voted were willing to participate in exit polls?
  2. In each given polling place, what percentage of the people who were willing to participate in exit polls were voters for each of the major parties?

The fact that a smaller percentage of people in places that tended to vote for the democratic candidate were willing to participate in exit polls is entirely independent of whether or not in a specific polling place a larger percentage of democratic voters than republican voters were willing to participate in the exit polls. This is a deliberate attempt to mislead readers about the meanings of the results – aka, a lie.

>What’s more, Freeman found, the greatest disparities between exit polls and the official vote count came in Republican strongholds. In precincts where Bush received at least eighty percent of the vote, the exit polls were off by an average of ten percent. By contrast, in precincts where Kerry dominated by eighty percent or more, the exit polls were accurate to within three tenths of one percent — a pattern that suggests Republican election officials stuffed the ballot box in Bush country.

It could indicate that. It could also indicate that democratic voters were consistently more willing to participate in exit polls than republican voters. And therefore, in polling places that were strongly democratic, the sampling was quite representative; but in polling places that were strongly republican, the sampling was lousy.

Just to give an idea of how this works. Suppose we have two polling places, each of which has 20,000 voters. Suppose in district one, there is a 60%/40% split in favor of democratic voters; and in district two, there’s the opposite; 60% republican, 40% democrat. And let’s just use similar numbers for simplicity; suppose that in both polling places, 60% of democrats were willing to participate in exit polls, and 40% of republicans were willing. What’s the result?

  1. District one will have 12000 votes for the democrat, and 8000 for the republican. The exit polls will get 7200 democratic voters, and 3200 republican voters, or 69% of the vote going to democrats according to exit poll, versus 60% actual.
  2. District two will have the opposite number of votes: 8000 for the democrat, and 12000 for the republican. The exit polls would get 4800 democrats, and 4800 votes for republicans – predicting a 50/50 split.

The democratic margin of victory in the democratic area was increased; the republican margin was decreased by slightly more.
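If you want to check the arithmetic, here’s a tiny sketch that just recomputes the example, using the made-up turnout and response rates from above.

```ocaml
(* A tiny sketch that recomputes the example above, using the made-up
   turnout and response rates from the text. *)
let exit_poll_dem_share ~dem_votes ~rep_votes ~dem_rate ~rep_rate =
  let dem_sampled = float_of_int dem_votes *. dem_rate in
  let rep_sampled = float_of_int rep_votes *. rep_rate in
  100.0 *. dem_sampled /. (dem_sampled +. rep_sampled)

let () =
  (* district one: 60/40 Democratic; district two: 60/40 Republican;
     60% of Democrats and 40% of Republicans answer the exit poll *)
  Printf.printf "district one: exit poll %.1f%% Dem vs. 60.0%% actual\n"
    (exit_poll_dem_share ~dem_votes:12000 ~rep_votes:8000 ~dem_rate:0.6 ~rep_rate:0.4);
  Printf.printf "district two: exit poll %.1f%% Dem vs. 40.0%% actual\n"
    (exit_poll_dem_share ~dem_votes:8000 ~rep_votes:12000 ~dem_rate:0.6 ~rep_rate:0.4)
```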

It continues very much in this same vein: giving unreliable samples undue weight; bait-and-switch of statistics; and claims that measurement errors are impossible. But none of the mathematical arguments hold up.

Was there fraud in the election? Almost certainly. Particularly in Ohio, there are some serious flaws that we know about. But this article manages to mix the facts of partisan manipulation of the election with so much gibberish that it discredits the few real facts that it presents.

RFK Jr. should be ashamed of himself. But based on his past record, I rather doubt that he is.

Next Topic Poll Results; or, the losers win

As I mentioned here, back on the old home of goodmath, I was taking a poll of what good math topic to cover next. In that poll, graph theory and topology were far and away the most popular topics, tying for the most votes (8 each), compared to no more than 2 votes for any other subject.
So, the next topic I’m going to talk about is: category theory.
There is actually a reason for that. I’m not just ignoring what people voted for. Based on the poll, I was planning on writing about topology, so I started doing some background reading on topology. What came up in the first chapter of the book I picked up? Explanations of how to interpret category-theory diagrams, because the author of the text found cat theory to be useful for explaining some of the concepts of topology.
I also looked up some articles on graph theory – in particular, graph isomorphisms, because that’s something I’ve done some work on. And what do I find when I start to read them? Again, category theory diagrams.
And what do I find when I look at wikipedia, to see if I missed anything in my recent series of posts on the semantics of lambda calculus? Category theory.
Category theory is a weird subject, which has an amazing way of generating incredibly polarizing attitudes among mathematicians. But it’s cropping up more and more in numerous fields of mathematics, and it’s widely used in computer science. There seem to be significant doubts among many people as to whether or not category theory represents anything dramatically new, whether or not it provides new insights that couldn’t have been gained by other fields of mathematics. But whether or not it’s a source of new knowledge, it seems to be undeniable that it is extremely useful as a tool for understanding and explaining other mathematical fields.
So I will definitely write about topology and graph theory soon. But first, it’s going to be category theory.