Numeric Pareidolia and Vortex Math

Update: as of 8/8/2012, the youtube video has been pulled at the request of the TED organization. They’ve also asked me to help them figure out how to keep crackpots like this out of their conferences. I turned them down: if they want help, they’ve got the money to hire a legitimate professional, someone who actually knows what they’re doing – i.e., not me.

I’m not a big fan of the TED phenomenon. In my opinion, it’s basically an elaborate scheme that lets a bunch of self-important rich guys show off their importance by getting people to come give them speeches that, ultimately, serve to reinforce their self-importance.

But there’s another reason that I dislike it. In addition to the self-importance of its audience, as it’s grown, it’s also turned into a forum where other self-important twits can pretend that they’re actual scientists presenting important scientific work.

A great example of this is something called “Vortex Math”. An idiot by the name of “Marko Rodin” came up with this ridiculous idea, wrote up a website talking about how wonderful it was and how brilliant he is, and then worked out an invitation to talk at one of the TEDx conferences, which he’s then used to further promote himself – after all, he must be a genuine genius in order to have been allowed to present to such an important group of people!

Vortex math is an example of what I call numeric pareidolia. For those who haven’t heard the term, pareidolia is seeing patterns in randomness – for example, the common experience of seeing the image of Jesus in a piece of toast, or in a mildew stain, or… We find these images, and then believe that they’re not just an illusion, but something real, deliberate, and meaningful.

Pareidolia isn’t limited to seeing images. We humans are natural pattern seekers. We can find patterns in dirt, in words, and in numbers. Numeric pareidolia is finding patterns in numbers. The most common version of that is numerology, where you assign meanings to numbers, and then find more meanings by combining numbers with arithmetic, and reading meaning into the results.

In Vortex math, Mr. Rodin has done something interesting, for some definition of that word. He’s found a numeric pattern, and he believes that not only is that a pattern, but it is the pattern, the fundamental structure of the universe. And what is this oh-so-awesome pattern, this vortex that defines the entire nature of the universe?

It’s a sequence of 1-digit, base-10 numbers. To get it, you start, obviously, with 1. So take the number 1. Double it. You get 2. Double it again, and you get 4. Again, 8. Again, 16. But 16 is two digits – so add them together: 7. Double 16 again, and you get 32, 3+2=5, so 5 is next. Double 32, and you get 64. 6+4=10, 1+0=1. Etc. You get a repeating cycle.

That’s it.
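
If you want to play with it, the whole “vortex” is a two-line computation. Here’s a quick sketch in Scala (my code, not Rodin’s): repeatedly double, and reduce each result to a single digit by summing its decimal digits.

  def digitalRoot(n: BigInt): Int =
    if (n < 10) n.toInt
    else digitalRoot(BigInt(n.toString.map(_.asDigit).sum))

  // Double forever, reduce each value to one digit, and look at the result:
  val cycle = Iterator.iterate(BigInt(1))(_ * 2).map(digitalRoot).take(12).toList
  // List(1, 2, 4, 8, 7, 5, 1, 2, 4, 8, 7, 5) – it just cycles 1, 2, 4, 8, 7, 5.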

Of course, this only works in base-10. It’s a result of the interaction of doubling with the base. So you won’t get the same pattern in any other number system! According to Rodin, because of the significance of this pattern, that means that base-10 really is the only correct number system.

And what is the significance of this base-10 pattern? Let’s let Rodin and his supporters explain, shall we?

Marko Rodin has discovered the source of the non-decaying spin of the electron. Although scientists know that all electrons in the universe spin, they have never discovered the source of this spin. Rodin has. He has discovered the underpinning geometry of the universe, the fabric of time itself. He has done this by reducing all higher mathematics – calculus, geometry, scalar math – to discrete-number mathematics.

With the introduction of Vortex-Based Mathematics you will be able to see how energy is expressing itself mathematically. This math has no anomalies and shows the dimensional shape and function of the universe as being a toroid or donut-shaped black hole. This is the template for the universe and it is all within our base ten decimal system!

The potential scope and breadth of the Rodin Solution is staggering; it is universally applicable in mathematics, science, biology, medicine, genetics, astronomy, chemistry, physics and computer science. The Rodin Solution will revolutionize computer hardware by creating a crucial gap space, or equi-potential major groove, in processors. This gap space generates underpinning nested vortices resulting in far higher efficiency with no heat build-up. The Rodin Solution replaces the binary code with a new code called the binary triplet which will revolutionize computer operating systems. It will transform physics and astrophysics by finally answering how black holes and pulsars work. Space travel will be revolutionized by reactionless drives that are unaffected by the weight they pull, making the present day combustion engine obsolete. The revolution brought on by reactionless drives will far surpass the societal changes wrought by the shift from steam engines to the present day combustion engine. The Rodin Solution can even be applied to ending pollution and drought by creating an inexhaustible, nonpolluting energy source. Because Rodin´s Vortex-Based Mathematics enables him to condense a trillion-fold calculation to only a few integer steps and because he is able to solve all the mathematical enigmas, the Rodin Solution will revolutionize computer information compression.

Pretty impressive, eh?

And what would crackpottery be without at least a bit of conspiracy? See, the government knows all about it, and they’re actually secretly using it to protect us:

Rudimentary versions of the Rodin Coil, or Rodin Torus, have been created and tested by leading scientists and are presently being used by the U.S. Government in antennas that protect the four corners of the continental U.S.. Life-saving medical devices based on crude approximations of the Rodin Coil Torus are being manufactured and used in the treatment of cancer patients. Microsoft´s former senior researcher is using the Rodin Coil to research, develop and patent new computer information-compression schemes.

Nifty!

Alas, it’s all bullshit. It’s not worth spending too much time on this, but I’ll grab a couple of the claims that are close to my interests, and briefly explain why he’s so full of shit.

One of the claims in the passage above is how he’ll revolutionize computer operating systems:

The Rodin Solution replaces the binary code with a new code called the binary triplet which will revolutionize computer operating systems

Suppose, for a moment, that we replaced binary in our computers with a different underlying representation – any underlying representation. Ternary, quaternary, decimal, or his “binary triplets”, whatever they are. How much difference would that make?

None at all. We’ve had the capacity to create ternary computers for a long time – there’s just no reason to. We have built decimal computers. For many years, IBM had computers for financial systems that used a representation called BCD – binary coded decimal. BCD can be useful in financials, because it’s easier to control rounding errors. Floating point math is a bit weird, because numbers that should be precise don’t necessarily have precise binary floating point representations, so you can get some odd rounding errors if you’re not careful. You don’t need BCD to do this – you can use a variety of notations, so long as you’re doing fixed point instead of floating point, but using a decimal fixed point representation makes it all easier.
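
You can see the rounding weirdness, and why a decimal representation helps, with a two-line experiment. This is just a sketch in Scala, using BigDecimal as a stand-in for a decimal fixed-point representation like BCD:

  // 0.1 has no exact binary floating point representation, so sums drift:
  val drift = (1 to 10).map(_ => 0.1).sum                  // 0.9999999999999999
  // A decimal representation keeps exact cents:
  val exact = (1 to 10).map(_ => BigDecimal("0.10")).sum   // 1.00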

The thing is, you can’t do anything with different representations that you can’t do with binary. It doesn’t matter. So we don’t build hardware using different representations. We don’t use binary because we don’t know how to build anything else; we use binary because it’s easiest to build binary hardware, and there’s no benefit to making the hardware more complex.

More important, one of the beautiful things about computers is that computers don’t really do binary numbers. Computers use binary to represent things. Numbers are one example of something we can represent, but we can represent anything we want. When I was in college, one of my assignments in a CS class was to implement ternary arithmetic – it’s simple enough to make an easy assignment for an undergrad introductory CS class!

We can build any representation that we want, and use it. We do this routinely; we’re constantly building new representations for particular purposes. Some of them are so useful that they’ve been enshrined in hardware. For example, computers used to come with only integer hardware – that is, the only mathematical operations implemented in the hardware were operations on integers. The computers still did floating point math – you just needed to implement the representation in software. It was so useful that we added it to hardware in order to make it faster. But it’s not fundamentally different. And if a new representation really did work better than simple binary, we could implement it on a standard binary computer.
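
Here’s roughly what that exercise looks like – a sketch in Scala, not the actual assignment – of ternary arithmetic running happily on ordinary binary hardware. Digits are stored least-significant first.

  type Ternary = List[Int]   // each element is a digit: 0, 1, or 2

  def toTernary(n: Int): Ternary =
    if (n == 0) Nil else (n % 3) :: toTernary(n / 3)

  def fromTernary(t: Ternary): Int =
    t.foldRight(0)((digit, rest) => digit + 3 * rest)

  def addTernary(a: Ternary, b: Ternary, carry: Int = 0): Ternary = (a, b) match {
    case (Nil, Nil)         => if (carry == 0) Nil else List(carry)
    case (xs, Nil)          => addTernary(xs, List(0), carry)
    case (Nil, ys)          => addTernary(List(0), ys, carry)
    case (x :: xs, y :: ys) =>
      val s = x + y + carry
      (s % 3) :: addTernary(xs, ys, s / 3)
  }

  // fromTernary(addTernary(toTernary(14), toTernary(7))) == 21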

So Rodin’s magic vortex binary-triplet computer? There’s just nothing special about it. It’s not going to revolutionize computers.

Another example is compression:

Because Rodin´s Vortex-Based Mathematics enables him to condense a trillion-fold calculation to only a few integer steps and because he is able to solve all the mathematical enigmas, the Rodin Solution will revolutionize computer information compression.

Again, it’s stupid. The problem with compression isn’t that it’s too hard to compute. The problem is more fundamental than that. We can’t compress everything – it’s impossible. (I described more about the reason why it’s generally impossible to do universal compression in this post.) The science and math of data compression are based on the fact that we don’t actually want to compress arbitrary things; we want to compress specific types of things: text, images, video. For each of those, common representations contain a lot of redundancies, and compression tries to remove those redundancies. So, for example, by finding regions in successive frames of a video that don’t change, we can reduce the size of a video file. But that technique won’t do anything for a still image or a text file. We exploit the specific properties of the medium to find an effective way of compressing that specific medium.
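
And the reason that universal compression is impossible in the first place is a simple counting argument: there are 2^n different bit strings of length n, but only \sum_{k=0}^{n-1} 2^k = 2^n - 1 bit strings of length less than n. Any compressor that claims to make every possible n-bit input shorter has to map two different inputs to the same output – and at that point, lossless decompression is impossible.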

In fact, we can do better at specific kinds of media with customized hardware. People build custom hardware for things like mp4 compression all the time. But that’s for a specific medium. It’s got nothing to do with general compression. General compression remains impossible, vortex math or no.

Willful Ignorance about Statistics in Government

Quick but important one here.

I’ve repeatedly ranted here about ignorant twits. Ignorance is a plague on society, and it’s at its worst when it’s willful ignorance – that is, when you have a person who knows nothing about a subject, and who refuses to be bothered with anything as trivial and useless as learning about it before they open their stupid mouths.

We’ve got an amazing, truly amazing, example of this in the US Congress right now. There’s a “debate” going on about something called the American Community Survey, or the ACS for short. The ACS is a regular survey performed by the Census Bureau, which measures a wide range of demographic and economic statistics.

A group of Republicans are trying to eliminate the ACS. Why? Well, let’s put that question aside. And let’s also leave aside, for the moment, whether the survey is important or not. You can, honestly, put together an argument that the ACS isn’t worth doing, that it doesn’t measure the right things, that the value of the information gathered doesn’t measure up to the cost, that it’s intrusive, that it violates the privacy of the survey targets. But let’s not even bother with any of that.

Members of congress are arguing that the survey should be eliminated, and they’re claiming that the reason why is because the survey is unscientific. According to Daniel Webster, a representative from the state of Florida:

We’re spending $70 per person to fill this out. That’s just not cost effective, especially since in the end this is not a scientific survey. It’s a random survey.

Note well the last sentence there: “It’s a random survey.” That’s the important bit.

According to Representative Webster, the survey isn’t cost effective and the data gathered isn’t genuinely useful, because it’s not a scientific survey. And why isn’t it a scientific survey? Because it’s random.

This is what I mean by willful ignorance. Mr. Webster doesn’t understand what a survey is, or how a survey works, or what it takes to make a valid survey. He’s talking out his ass, trying to kill a statistical analysis for his own political reasons without making any attempt to actually understand what it is or how it works.

Surveys are, fundamentally, about statistical sampling. Given a large population, you can create estimates about the properties of the population by looking at a representative sample of the population. For example, if you’re looking at the entire population of America, you’re talking about hundreds of millions of people. You can’t measure, say, the employment rate of the entire population every year – there are just too many people. It’s too much information – it’s pretty much impossible to gather it.

But: if you can select a group of, say, 10,000 people, whose distribution matches the distribution of the wider population, then the data you gather about them will closely resemble the data about the wider population.

That’s the point of a survey: find a representative sample, and take measurements of that sample. Then, with a certain probability of correctness, you can infer the properties of the entire population from the properties of the sample.

Of course, there’s a catch. The key to a survey is the sample. The sample must be representative – meaning that the sample must have the same properties as the wider population of which it’s a part. But the point of the survey is to discover those properties! If you choose your sample to match what you believe the distribution to be, then you’ll bias your data towards matching that distribution. Your sample will only be representative if your beliefs about the data are correct – and that defeats the whole purpose of doing the survey.

So the scientific method of doing a survey is to be random. You don’t start with any preconceived idea of what the population is like. You just randomly select people in a way that makes sure that every member of the population is equally likely to be selected. If your selection is truly random, then there’s a high probability (a measurably high probability, based on the size of the sample and the size of the sampled population) that the sample will be representative.
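
If you want to see why random sampling works, you don’t even need real data – a quick simulation makes the point. Here’s a sketch in Scala (the 63% rate, the sample size, and the population size are numbers I made up for the example):

  import scala.util.Random

  val populationRate = 0.63      // the "true" rate we're pretending not to know
  val sampleSize     = 10000

  // Picking a person uniformly at random and asking one yes/no question is a
  // Bernoulli trial, so we can simulate the survey without building a list of
  // 300 million people:
  val sample   = Seq.fill(sampleSize)(Random.nextDouble() < populationRate)
  val estimate = sample.count(identity).toDouble / sampleSize

  // estimate lands within about ±0.01 of 0.63 nearly every time: the standard
  // error is sqrt(p * (1 - p) / n), roughly 0.005 here, no matter how large the
  // full population is.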

Scientific sampling is always random.

So Mr. Webster’s statement could be rephrased more correctly as the following contradiction: “This is not a scientific survey, because this is a scientific survey”. But Mr. Webster doesn’t know that what he said is a stupid contradiction. Because he doesn’t care.

Introducing Algebraic Data Structures via Category Theory: Monoids

Since joining foursquare, I’ve been spending almost all of my time writing functional programs. At foursquare, we do all of our server programming in Scala, and we have a very strong bias towards writing our Scala code very functionally.

This has increased my interest in category theory in an interesting way. As a programming language geek, I’m obviously fascinated by data structures. Category theory provides a really interesting handle on a generic way of looking at data structures.

Historically (as much as that word can be used for anything in computer science), we’ve thought about data structures primarily in algorithmic and structural ways.

For example, binary trees. A binary tree consists of a collection of linked nodes. We can define the structure recursively really easily: a binary tree is a node, which contains pointers to at most two other binary trees.
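
In Scala, that recursive definition transcribes almost word for word:

  sealed trait BinaryTree[+A]
  case object Empty extends BinaryTree[Nothing]
  case class Node[A](value: A,
                     left: BinaryTree[A] = Empty,
                     right: BinaryTree[A] = Empty) extends BinaryTree[A]

  // A node containing pointers to at most two other binary trees:
  val tree = Node(7, Node(3), Node(9, Empty, Node(11)))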

In the functional programming world, people have started to think about things in algebraic ways. So instead of just defining data structures in terms of structure, we also think about them in very algebraic ways. That is, we think about structures in terms of how they behave, instead of how they’re built.

For example, there’s a structure called a monoid. A monoid is a very simple idea: it’s an algebraic structure with a set of values S, one binary operation *, and one value i in S which is an identity value for *. To be a monoid, these must satisfy some rules called the monoid laws:

  1. \forall s \in S: s * i = s, i * s = s
  2. \forall x, y, z \in S: (x * y) * z = x * (y * z)

There are some really obvious examples of monoids – like the set of integers with addition and 0 or integers with multiplication and 1. But there are many, many others.

Lists with concatenation and the empty list are a monoid: for any list l, l ++ [] == l and [] ++ l == l, and concatenation is associative.

Why should we care whether data structures like these are monoids? Because we can write very general code in terms of the algebraic construction, and then use it over all of the different monoids. Monoids provide the tools you need to build fold operations. Every kind of fold – that is, every operation that collapses a sequence of values into a single value – can be defined in terms of monoids. So you can write a fold operation that works on lists, strings, numbers, optional values, maps, and god-only-knows what else. Any data structure which is a monoid is a data structure with a meaningful fold operation: monoids encapsulate the requirements of foldability.
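
Here’s a small sketch of that idea in Scala (my own toy version, not production code): a Monoid typeclass, and one fold, written once, that works for anything with a Monoid instance.

  trait Monoid[A] {
    def identity: A
    def combine(x: A, y: A): A   // must be associative, with identity as its unit
  }

  implicit val intAddition: Monoid[Int] = new Monoid[Int] {
    val identity = 0
    def combine(x: Int, y: Int) = x + y
  }

  implicit def listConcat[T]: Monoid[List[T]] = new Monoid[List[T]] {
    val identity = List.empty[T]
    def combine(x: List[T], y: List[T]) = x ++ y
  }

  // One generic fold: collapse any sequence of monoid values into one value.
  def fold[A](values: Seq[A])(implicit m: Monoid[A]): A =
    values.foldLeft(m.identity)(m.combine)

  // fold(Seq(1, 2, 3, 4))                == 10
  // fold(Seq(List(1), List(2), List(3))) == List(1, 2, 3)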

And that’s where category theory comes in. Category theory provides a generic method for talking about algebraic structures like monoids. After all, what category theory does is provide a way of describing structures in terms of how their operations can be composed: that’s exactly what you want for talking about algebraic data structures.

The categorical construction of a monoid is, alas, pretty complicated. It’s a simple idea – but defining it solely in terms of the composition behavior of function-like objects does take a bit of effort. But it’s really worth it: when you see a monoidal category, it’s obvious what the elements are in terms of programming. And when we get to even more complicated structures, like monads, pullbacks, etc., the advantage will be even clearer.

A monoidal category is a category equipped with a functor that has the basic properties of an algebraic monoid. So it’s a category C, paired with a bi-functor – that is, a two-argument functor ⊗:C×C→C. This is the categorical form of the monoid’s binary operation, usually called the tensor. To make it a monoidal category, we need to take the tensor operation, and define the properties that it needs to have. They’re called its coherence conditions, and basically, they’re the properties that are needed to make the diagrams that we’re going to use commute.

So – the tensor functor is a bifunctor from C×C to C. There is also an object I∈C, which is called the unit object, which is basically the identity element of the monoid. As we would expect from the algebraic definition, the tensor functor has two basic properties: associativity, and identity.

Associativity is expressed categorically using a natural isomorphism, which we’ll name α. For any three objects X, Y, and Z, α includes a component αX,Y,Z (which I’ll label α(X,Y,Z) in diagrams, because subscripts in diagrams are a pain!), which is a mapping from (X⊗Y)⊗Z to X⊗(Y⊗Z). The natural isomorphism says, in categorical terms, that the two objects on either side of its mappings are equivalent.

The identity property is again expressed via natural isomorphisms. The category must include an object I (called the unit), and two natural isomorphisms, called λ and ρ. For any object X in C, λ and ρ contain components λX and ρX such that λX maps from I⊗X to X, and ρX maps from X⊗I to X.

Now, all of the pieces that we need are on the table. All we need to do is explain how they all fit together – what kinds of properties these pieces need to have for this to work, that is, for these definitions to give us a structure that looks like the algebraic notion of a monoid, but built in category theory. The properties are, more or less, exact correspondences with the associativity and identity requirements of the algebraic monoid. But with category theory, we can say it visually. The two diagrams below each describe one of the two properties.

monoidal-categoy.png

The upper (pentagonal) diagram must commute for all A, B, C, and D. It describes the associativity property. Each arrow in the diagram is a component of the natural isomorphism over the category, and the diagram describes what it means for the natural isomorphism to define associativity.

Similarly, the bottom diagram defines identity. The arrows are all components of natural isomorphisms, and they describe the properties that the natural isomorphisms must have in order for them, together with the unit I to define identity.

Like I said, the categorical definition is a lot more complicated than the algebraic one. But look at the diagrams: you can see the folding in them, in the chains of arrows of the commutative diagrams.

Friday Random Ten: from Prog to Neoclassical, with some Blues.

I haven’t posted one of these in a while, but I’m currently stuck waiting between very long and slow compiles, so I figured why not?

  1. Echolyn, “Make Me Sway”: Brilliant track by Echolyn, one of the very best American neo-prog bands. One of the things that’s so distinctive about Echolyn is the way that they use complex vocal harmonies better than anyone else.
  2. Umphrey’s McGee, “Miami Virtue”: Ick. What a disappointment. I’d heard great things about Umphrey’s, and got their Mantis album, which was pretty good. So I got their latest album, “Death by Stereo”, which is absolutely atrocious. Ugh.
  3. Tinyfish, “I’m Not Crashing”: Now this was a great discovery. I know nothing about them, but someone pointed them out to me on Twitter, and when I grabbed one of their albums, it knocked my socks off. This is seriously terrific neo-prog.
  4. ProjeKct X, “The Business of Pleasure”: Weird. Very, very weird. This is one of Robert Fripp’s experimental instrumental gatherings of musicians. It’s strange stuff, but if you (like me) like interesting oddness, this is one of the most amazingly bizarre and yet great pieces of work you’ll find. Is it rock, jazz, or something else? Yes.
  5. Cynic, “Nunc Sans”: progressive death metal with strong jazz influences? Yup.
  6. Owl, “Sky Rocket”: Meh.
  7. Gong, “Sold to the Highest Buddha”: How did I go for so long without learning about Gong? This is one of the very best of the Canterbury prog bands. Amazing stuff, which never takes itself too seriously.
  8. Jason Ricci and New Blood, “I Turned Into a Martian”: I don’t normally like blues much, but… Jason Ricci completely redefines what you can do with a harmonica. Despite not being a fan of the format, his playing is amazing, and he put together a great band to back him.
  9. NOW Ensemble, “Change”: This is a big change of pace from the rest of this list. Here in NYC, there’s a really great organization called New Amsterdam Records. NAR is a non-profit label dedicated to promoting the NYC post-classical scene. The NOW Ensemble is my favorite of their stable of artists. Beautiful music by up-and-coming composers, brilliantly performed. I’ve embedded a Youtube link to a couple of their pieces below.
  10. Steve Reich Ensemble, “Clapping”: This is a perfect demonstration of the beauty of simplicity. This piece consists of two people clapping the same pattern – but one periodically shifts by a beat, so that the pattern overlaps with itself in different ways. Embedded below is a ten-person variation on it.

Categorical Equalizers

Category theory is really all about building math using composition. Everything we do, we do it by defining things completely in terms of composition. We’ve seen a lot of that. For example, we defined subclasses (and other sub-things) in terms of composition. It sounds strange, but it took me an amazingly long time to really grasp that. (I learned category theory from some not-very-good textbooks, which were so concerned with the formalisms that they never bothered to explain why any of it mattered, or to build any intuition.)

One thing that’s really important that we haven’t talked about yet is equality. In category theory, we define equality on arrows using something called a pullback. We’ll use that notion of equality for a lot of other things – like natural transformations. But before we can do pullbacks, we need to look at a weaker notion of arrow equality – something called the equalizer of a pair of arrows.

As part of my attempt to be better than those books that I complained about, we’ll start with intuition, by looking at what an equalizer is in terms of sets. Suppose we have two functions f, g : A \rightarrow B.

Now, in addition, suppose that they agree on part of A: that is, that there’s some set of values x \in A for which f(x) = g(x). We can take that set of values, and call it C. C is the equalizer of the functions f and g. It’s the largest subset C of A such that if you restrict the inputs to the functions to members of C, then f and g are equal.
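
To make that concrete, here’s the set version as a couple of lines of Scala: take f(x) = x*x and g(x) = x on a small set A, and the equalizer is just the subset of A where they agree.

  val A = Set(-2, -1, 0, 1, 2)
  val f = (x: Int) => x * x
  val g = (x: Int) => x

  val C = A.filter(x => f(x) == g(x))   // Set(0, 1)
  // Restricted to C, f and g are literally the same function.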

Now, let’s look at the category theoretic version of that. We have objects A and B. We have two arrows f, g : A \rightarrow B. This is the category analogue of the setup of sets and functions from above.

To get to the equalizer, we need to add an object C which is a subobject of A (which corresponds to the subset of A on which f and g agree in the set model).

The equalizer of f and g is the pair of the object C, and an arrow i : C \rightarrow A. (That is, the object and arrow that define C as a subobject of A.) This object and arrow must satisfy the following conditions:

  1.  f \circ i = g \circ i
  2. (\forall j: D \rightarrow A) if f \circ j = g \circ j then (\exists! k : D \rightarrow C) such that i \circ k = j

That second one is the mouthful. What it says is:

  • Suppose that I have any arrow j from some other object D to A;
  • if f and g agree when composed with j – that is, if f \circ j = g \circ j – then there is exactly one arrow k from D to C which composes with i to give j.

In other words, (C, i) is a selector for the part of A on which f and g agree: an arrow into A composes identically with f and g exactly when it factors through (C, i).

Or in diagram form, k in the following diagram is necessarily unique:

equalizer.jpg

There are a couple of interesting properties of equalizers that are worth mentioning. The morphism in an equalizer is always a monic arrow (a monomorphism); and if it’s epic (an epimorphism), then it must also be iso (an isomorphism).

The pullback is very nearly the same construction as the equalizer we just looked at; except it’s abstracting one step further.

Suppose we have two arrows pointing to the same target: f : B \rightarrow A and g : C \rightarrow A. Then the pullback of f and g is the triple of an object and two arrows (B \times_A C, p : B \times_A C \rightarrow B, q : B\times_A C \rightarrow C). The elements of this triple must meet the following requirements:

  1. f \circ p = g \circ q
  2. (f \circ p) : B\times_A C \rightarrow A
  3. for every triple (D, h : D \rightarrow B, k : D \rightarrow C) where f \circ h = g \circ k, there is exactly one arrow \langle h,k \rangle_A : D \rightarrow B \times_A C such that p \circ \langle h,k \rangle_A = h and q \circ \langle h,k \rangle_A = k.

As happens so frequently in category theory, this is clearer using a diagram.

pullback.jpg

If you look at this, you should definitely be able to see how this corresponds to the categorical equalizer. If you’re careful and clever, you can also see the resemblance to the categorical product (which is why we use the \times_A notation). It’s a general construction that says that f and g are equivalent with respect to the product-like object B \times_A C.

Here’s the neat thing. Work backwards through this abstraction process to figure out what this construction means if objects are sets and arrows are functions. What’s the pullback of two functions f : B \rightarrow A and g : C \rightarrow A?

{ (x,y) \in B \times C : f(x) = g(y) }

Right back where we started, almost. The pullback is really just an equalizer in disguise: it equalizes the two maps from B \times C to A – apply f to the first component, apply g to the second – and working backwards through the abstraction shows that.
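
In code, the set-model calculation is just as small. A sketch in Scala, with f : B → A and g : C → A as ordinary functions:

  val B = Set(1, 2, 3)
  val C = Set(10, 20)
  val f = (b: Int) => b % 2            // f : B -> A, where A = {0, 1}
  val g = (c: Int) => (c / 10) % 2     // g : C -> A

  // The pullback: all pairs that f and g send to the same element of A.
  val pullback = for { b <- B; c <- C if f(b) == g(c) } yield (b, c)
  // Set((1,10), (3,10), (2,20))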

Substuff

What’s a subset? That’s easy: if we have two sets A and B, A is a subset of B if every member of A is also a member of B.

We can take the same basic idea, and apply it to something with a tad more structure, to get subgroups. What’s a subgroup? If we have two groups A and B, the values in group A are a subset of the values in group B, and A’s group operation is just B’s operation restricted to A’s values, then A is a subgroup of B.

The point of category theory is to take concepts like “subset” and generalize them so that we can apply the same idea in many different domains. In category theory, we don’t ask “what’s a subset?”. We ask, for any structured THING, what does it mean to be a sub-THING? We’re being very general here, and that’s always a bit tricky. We’ll start by building a basic construction, and look at it in terms of sets and subsets, where we already understand the specific concept.

In terms of sets, the most generic way of defining subsets is using functions. Suppose we have a set, A. How can we define all of the subsets of A, in terms of functions? We can do it using injective functions, as follows. (As a reminder, an injective function from X to Y is one where every value in X is mapped to a distinct value in Y.)

For the set, A, we can take the set of all injective functions to A. We’ll call that set of functions Inj(A).

Given Inj(A), we can define equivalence classes over Inj(A), so that f: X \rightarrow A and g: Y \rightarrow A are equivalent if there is an isomorphism between X and Y that the functions agree on – that is, an isomorphism h: X \rightarrow Y with g \circ h = f.

The domain of each function in one of the equivalence classes in Inj(A) is a set isomorphic to a subset of A – specifically, to the image of that function. So each equivalence class of injective functions picks out a subset of A.

And there we go: we’ve got a very abstract definition of subsets.
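
A tiny concrete version of that, sketched in Scala: an injective function into A is determined, up to renaming its domain, by its image – and that image is exactly the subset it defines.

  val A = Set(1, 2, 3)

  val f: Map[String, Int] = Map("p" -> 1, "q" -> 3)   // an injection {p,q} -> A
  val g: Map[Char, Int]   = Map('x' -> 1, 'y' -> 3)   // an injection {x,y} -> A

  // f and g have isomorphic domains and hit the same elements, so they're in
  // the same equivalence class – and the subset of A that the class defines is:
  val subset = f.values.toSet   // Set(1, 3), which also equals g.values.toSet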

Now we can take that, and generalize that function-based definition to categories, so that it can define a sub-object of any kind of object that can be represented in a category.

Before we jump in, let me review one important definition from before; the monomorphism, or monic arrow.

A monic arrow is an arrow f : a \rightarrow b such that \forall g_1, g_2: x \rightarrow a: f \circ g_1 = f \circ g_2 \Rightarrow g_1 = g_2. (That is, two arrows that end up at the same place after being composed with f must have been the same arrow to start with.)

So, basically, the monic arrow is the category theoretic version of an injective function. We’ve taken the idea of what an injective function means, in terms of how functions compose, and when we generalized it, the result is the monic arrow.

Suppose we have a category C, and an object a in \mbox{Obj}(C). If there are two monic arrows f : x \rightarrow a and g : y \rightarrow a, and there is an arrow h such that g \circ h = f, then we say f \le g (read “f factors through g”). Now, we can take that “≤” relation, and use it to define an equivalence relation on morphisms: f \equiv g \Leftrightarrow f \le g \land g \le f.

What we wind up with using that equivalence relation is a set of equivalence classes of monomorphisms pointing at a. Each of those equivalence classes of morphisms defines a subobject of a. (Within an equivalence class, the source objects of the arrows are all isomorphic, so they’re equivalent with respect to this relation.) A subobject of a is the source of an arrow in one of those equivalence classes.

It’s exactly the same thing as the function-based definition of subsets. We’ve created a very general concept of sub-THING, which works exactly the same way as subsets, but can be applied to any category-theoretic structure.

Interpreting Lambda Calculus using Closed Cartesian Categories

Today I’m going to show you the basic idea behind the equivalence of closed cartesian categories and typed lambda calculus. I’ll do that by showing you how the λ-theory of any simply typed lambda calculus can be mapped onto a CCC.

First, let’s define the term “lambda theory”. In the simply typed lambda calculus, we always have a set of base types – the types of simple atomic values that can appear in lambda expressions. A lambda theory is a simply typed lambda calculus, plus a set of additional rules that define equivalences over the base types.

So, for example, if one of the base types of a lambda calculus was the natural numbers, the lambda theory would need to include rules to define equality over the natural numbers:

  1. x = y if x=0 and y=0; and
  2. x = y if x=s(x’) and y=s(y’) and x’ = y’

So. Suppose we have a lambda-theory L. We can construct a corresponding category C(L). The objects in C(L) are the types in L. The arrows in C(L) correspond to families of expressions in L; an arrow
f : A \rightarrow B corresponds to the set of expressions of type B that contain a single free variable of type A.

The semantics of the lambda-theory can be defined by a functor; in particular, a cartesian closed functor F that maps from C(L) to the closed cartesian category of Sets. (It’s worth noting that this is completely equivalent to the normal Kripke semantics for lambda calculus; but when you get into more complex lambda calculi, like Hindley-Milner variants, this categorical formulation is much simpler.)

We describe how we build the category for the lambda theory in terms of a CCC using something called an interpretation function. It’s really just a notation that allows us to describe the translation recursively. The interpretation function is written using brackets: [A] is the categorical interpretation of the type A from lambda calculus.

So, first, we define an object for each type in L. We need to include a special type, which we call unit. The idea behind unit is that we need to be able to talk about “functions” that either don’t take any real parameters, or that don’t return anything. Unit is a type which contains exactly one atomic value. Since there’s only one possible value for unit, and unit doesn’t have any extractable sub-values, conceptually, it never needs to be passed around. So it’s a “value” that never needs to get passed – perfect for a content-free placeholder.

Anyway, here we go with the base rules:

  • \forall A \in \mbox{basetypes}(L), [A] = A_C \in C(L)
  • [\mbox{unit}] = 1_C

Next, we need to define the typing rules for complex types:

  • [A \times B] = [A] \times [B]
  • [A \rightarrow B] = [B]^{[A]}
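
If you squint, Scala already has both of those constructions built in, which makes a nice sanity check of the rules above (this is an analogy, not part of the formal construction): a product object is a pair type, an exponential [B]^{[A]} is a function type, and the defining property of the exponential is just currying.

  type ProductObj[A, B]     = (A, B)    // [A × B] = [A] × [B]
  type ExponentialObj[A, B] = A => B    // [A → B] = [B]^[A]

  // Arrows Γ × A → B correspond exactly to arrows Γ → B^A:
  def curry[G, A, B](f: ((G, A)) => B): G => A => B   = g => a => f((g, a))
  def uncurry[G, A, B](f: G => A => B): ((G, A)) => B = { case (g, a) => f(g)(a) }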

Now for the really interesting part. We need to look at type derivations – that is, the type inference rules of the lambda calculus – to show how to do the correspondences between more complicated expressions. Just like we did in lambda calculus, the type derivations are done with a context Γ, containing a set of type judgements. Each type judgement assigns a type to a lambda term. There are two translation rules for contexts:

  • [ \emptyset ] = 1_C
  • [\Gamma, x: A] = [\Gamma] \times [A]

We also need to describe what to do with the values of the primitive types:

  • For each value v : A, there is an arrow v : 1 \rightarrow A_C.

And now the rest of the rules. Each of these is of the form [\Gamma :- x : A] = \mbox{arrow}, where we’re saying that \Gamma entails the type judgement x : A. What it means is that a type inference giving an expression its type corresponds to an arrow in C(L) from the interpretation of the context to the interpretation of the type.

  • Unit evaluation:  [ \Gamma :- \mbox{unit}: \mbox{Unit}] = !: [\Gamma] \rightarrow [\mbox{Unit}]. (A unit expression is a special arrow “!” to the unit object.)
  • Simple Typed Expressions: [\Gamma :- a: A_C] = a \circ ! : [\Gamma] \rightarrow [A_C]. (A simple value expression is the value’s arrow composed with ! to form an arrow from Γ to the type object A_C.)
  • Free Variables: [\Gamma, x: A :- x : A] = \pi_2 : ([\Gamma] \times [A]) \rightarrow [A]. (A term which is a free variable of type A is an arrow from the product of Γ and the type object A to A; that is, an unknown value of type A is some arrow whose start point will be inferred by the continued interpretation of Γ, and which ends at A. So this is going to be an arrow from either unit or a parameter type to A – which is a statement that this expression evaluates to a value of type A.)
  • Inferred typed expressions: [\Gamma, y:B :- x : A] = [\Gamma :- x : A] \circ \pi_1 : ([\Gamma] \times [B]) \rightarrow [A]. (If the type rules of Γ alone give us x : A, then adding another judgement y : B to the context doesn’t change that: the term x becomes an arrow starting from the product of the interpretation of the full type context with [B], projecting down to [Γ], and ending at [A]. This is almost the same as the previous rule: it says that this will evaluate to an arrow for an expression that results in type A.)
  • Function Abstraction: [\Gamma :- \lambda x:A . M : A \rightarrow B] = \mbox{curry}([\Gamma, x:A :- M:B]) : [\Gamma] \rightarrow B^{[A]}. (A function maps to an arrow from the type context to an exponential [B]^{[A]}, which is a function from A to B.)
  • Function application: [\Gamma :- (M N) : B] = \mbox{eval} \circ \langle [\Gamma :- M : A \rightarrow B], [\Gamma :- N : A] \rangle : [\Gamma] \rightarrow [B]. (Function application takes the eval arrow that comes with the categorical exponent, and uses it to evaluate out the function.)

There are also two projection rules for the decomposing categorical products, but they’re basically more of the same, and this is already dense enough.

The intuition behind this is:

  • arrows between types are families of values. A particular value is a particular arrow from unit to a type object.
  • the categorical exponent in a CC is exactly the same thing as a function type in λ-calculus; and an arrow to an exponent is the same thing as a function value. Evaluating the function is using the categorical exponent’s eval arrow to “decompose” the exponent, and produce an arrow to the function’s result type; that arrow is the value that the function evaluates to.
  • And the semantics – called functorial semantics – maps from the objects in this category, C(L) to the category of Sets; function values to function arrows; type objects to sets; values to value objects and arrows. (For example, the natural number type would be an object in C(L), and the set of natural numbers in the sets category would be the target of the functor.)

Aside from the fact that this is actually a very clean way of defining the semantics of a not-so-simply typed lambda calculus, it’s also very useful in practice. There is a way of executing lambda calculus expressions as programs that is based on this, called the Categorical Abstract Machine. The best performing lambda-calculus based programming language (and my personal all-time-favorite programming language), Objective-CAML, had its first implementation based on the CAM. (CAML stands for Categorical Abstract Machine Language.)

From this, you can see how CCCs and λ-calculus are related. It turns out that that relation is not just cool, but downright useful. Concepts from category theory – like monads, pullbacks, and functors – are really useful things in programming languages! In some later posts, I’ll talk a bit about that. My current favorite programming language, Scala, is one of the languages where there’s a very active stream of work in applying categorical ideas to real-world programming problems.

Obama Campaign Lies with Bad Math

This post is a bit of a change of pace for me.

As you all know, when it comes to politics, I’m a hardcore lefty liberal type. And lots of annoying people like to claim that the reason I write more critical posts about right-wing politicians than left-wing ones is because I’m hopelessly biased. I definitely do end up writing more posts critical of RW than LW politicians, but I believe that that’s because in modern-day America, the right wing has completely lost touch with reality. They’re just far more likely to regurgitate long-disproven lies, or to use specious, unsupportable, or just plain pig-ignorant reasoning.

But a reader sent me a copy of a recent fund-raising letter from the Obama campaign, and it’s pissed me right off. I probably actually received a copy of it myself, but I’ve got my spam filters set to throw away anything from the Obama campaign, so I didn’t see it until it was pointed out to me.

When it comes to bad math, in my opinion, there are two main kinds. There’s ignorant bad math, and there’s dishonest bad math. In the former, the people pushing it don’t understand what they’re talking about. They’re saying something that they actually believe. It’s hopelessly wrong, and if they made any effort to learn something about what they’re babbling about, they’d see how wrong they are. In the latter kind, the people pushing it are deliberately trying to deceive their readers/listeners. They know that they’re doing something wrong, and they’re hoping that you are stupid enough to not catch on.

The latter kind of bad math is far worse than the former.

And this Obama campaign fundraising letter is very firmly in the latter camp.

I’m not going to post the entire thing, and I’m not going to provide a link. That would be giving them publicity for this despicable, dishonest effort, which is exactly what they want, and I will not reward them for this.

The letter starts by complaining about a Romney campaign fundraiser, saying:

It may not take the Romney camp very long to get to a million — they announced today that just 9 percent of their money comes from donors giving less than $200.

Take note of the fundamental point there. Of the money collected, 9% came from small donors.

Then they attempt to contrast themselves against Romney:

Our campaign is different. It’s about bringing people together to protect the progress we’ve made and make a lot more in a second term. And 98 percent of the donations people like you make to this campaign are $250 or less.

The main point: of the people donating, 98% were small donors.

You’re supposed to look at that, and say “90% of the donors to Romney are big-money people, but just 2% of the donors to Obama are.”

But they’re not comparing the same thing. One is a percentage of money, and the other is a percentage of people. Let’s take a quick look at an example, to show how this works.

Suppose we’ve got just ten donors. They gave $200, $200, $200, $100, $100, $100, $50, $50, $50, and $1,000,000, respectively. Obviously, 90% (9 out of 10) of the donors gave $200 or less. And if you work it out, more than 99% of the money came from the single $1,000,000 donation.
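
Run the numbers on that example: the nine small donations total $1,050, and the whole pot is $1,001,050. So the small donors are 90% of the people, but they account for only 1,050 / 1,001,050 ≈ 0.1% of the money – the other 99.9% comes from the one big check.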

What does the Obama campaign’s actual donor distribution look like? I don’t know. But I’d guess that it’s actually pretty similar to the Romney campaign’s. Politics in America is, very much, a rich person’s sport. Both campaigns are absolutely relying on huge donations from people with lots and lots of money. The Obama campaign wants to trick us into believing that they’re different. But all they’re doing is proving that they’re not. They’re lying to us, and hoping that we’re too stupid to notice.

(There’s another level of dishonesty there, but it’s far more trivial. In the Romney campaign figure, they talk about the percentage of donations smaller than $200; for the Obama campaign figure, they use $250. Why? Probably because they wanted a number for the Romney campaign where they could say that more than 90% came from big donors. And hell, once they were lying, what’s another lie?)

Programs As Proofs: Models and Types in the Lambda Calculus

Lambda calculus started off with the simple, untyped lambda calculus that we’ve been talking about so far. But one of the great open questions about lambda calculus was: was it sound? Did it have a valid model?

Church found that it was easy to produce some strange and non-sensical expressions using the simple lambda calculus. In order to try to work around those problems, and end up with a consistent system, Church introduced the concept of types, producing the simply typed lambda calculus. Once types hit the scene, things really went wild; the type systems for lambda calculi have never stopped developing: people are still finding new things to do by extending the LC type system today! Most lambda calculus based programming languages are based on the Hindley-Milner lambda calculus, which is a simplification of one of the standard sophisticated typed lambda calculi called System F. There’s even a Lambda Cube which can categorize the different type abstractions for lambda calculus (but alas, despite its name, it’s not related to the time cube.) Once people really started to understand types, they realized that the untyped lambda calculus was really just a pathologically simple instance of the simply typed lambda calculus: a typed LC with only one base type.

The semantics of lambda calculus are easiest to talk about in a typed version. For now, I’ll talk about the simplest typed LC, known as the simply typed lambda calculus. One of the really amazing things about this, which I’ll show, is that a simply typed lambda calculus is completely semantically equivalent to an intuitionistic propositional logic: each type in the program is a proposition in the logic; each β reduction corresponds to an inference step; and each complete function corresponds to a proof! Look below for how.

Types

The main thing that typed lambda calculus adds to the mix is a concept called base types. In a typed lambda calculus, you have some universe of atomic values which you can manipulate; those values are partitioned into the base types. Base types are usually named by single lower-case Greek letters: so, for example, we could have a type “σ”, which consists of the set of natural numbers; a type “τ” which corresponds to boolean true/false values; and a type “γ” which corresponds to strings.

Once we have base types, then we can talk about the type of a function. A function maps from a value of one type (the type of the parameter) to a value of a second type (the type of the return value). For a function that takes a parameter of type “γ”, and returns a value of type “δ”, we write its type as “γ → δ”. “→” is called the function type constructor; it associates to the right, so “γ → δ → ε” parses as “γ → (δ → ε)”.

To apply types to the lambda calculus, we do a couple of things. First, we need a syntax update so that we can include type information in lambda terms. And second, we need to add a set of rules to show what it means for a typed program to be valid.

The syntax part is easy. We add a “:” to the notation; the colon has an expression or variable binding on its left, and a type specification on its right. It asserts that whatever is on the left side of the colon has the type specified on the right side. A few examples:

\lambda x: \nu . x + 3
This asserts that the parameter x has type ν, which we’ll use as the type name for the natural numbers. (In case it’s hard to tell, that’s a Greek letter “nu”, for natural.) There is no assertion of the type of the result of the function; but since we know that “+” is a function with type \nu \rightarrow \nu \rightarrow \nu, we can infer that the result type of this function will be \nu.
(\lambda x . x + 3): \nu \rightarrow \nu
This is the same as the previous, but with the type declaration moved out, so that it asserts the type for the lambda expression as a whole. This time we can infer that x : \nu because the function type is \nu \rightarrow \nu, which means that the function parameter has type \nu.
\lambda x: \nu, y:\delta . \text{if}\hspace{1ex} y\hspace{1ex} \text{then}\hspace{1ex} x * x \hspace{1ex}\text{else} \hspace{1ex} x
This is a two parameter function; the first parameter has type ν, and the second has type δ. We can infer the return type, which is ν. So the type of the full function is ν → δ → ν. This may seem surprising at first; but remember that lambda calculus really works in terms of single parameter functions; multi-parameter functions are a shorthand for currying. So really, this function is: λ x : ν . (λ y : δ . if y then x * x else x); the inner lambda is type δ → ν; the outer lambda is type ν → (δ → ν).
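
Those three examples translate directly into Scala, whose type ascriptions play exactly the role of the “:” assertions (here I’m using Int to stand in for ν and Boolean for δ):

  val ex1 = (x: Int) => x + 3              // the parameter is typed; ν → ν is inferred
  val ex2: Int => Int = x => x + 3         // the whole lambda is typed; x : ν is inferred
  val ex3 = (x: Int) => (y: Boolean) =>    // the curried two-parameter version:
    if (y) x * x else x                    // ν → (δ → ν)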

To talk about whether a program is valid with respect to types (aka well-typed), we need to introduce a set of rules for type inference. Then we can verify that the program is type-consistent.

To do type inference, we reason in terms of judgements. When we can infer the type of an expression using an inference rule, we call that inference a type judgement. Type inference and judgements allow us to reason about types in a lambda expression; and if any part of an expression winds up with an inconsistent type judgement, then the expression is invalid. (When Church started doing typed LC, one of the motivations was to distinguish between values representing “atoms”, and values representing “predicates”; he was trying to avoid the Gödel-esque paradoxes, by using types to ensure that predicates couldn’t operate on predicates.)

Type judgements are usually written in a sequent-based notation, which looks like a fraction where the numerator consists of statements that we know to be true, and the denominator is what we can infer from the numerator. In the numerator, we normally have statements using a context, which is a set of type judgements that we already know; it’s usually written as an uppercase Greek letter. If a type context \Gamma includes the judgement that x : \alpha, I’ll write that as \Gamma :- x : \alpha.

Rule 1: Type Identity

\frac{\mbox{}}{x : \alpha :- x: \alpha}

This is the simplest rule: if we have no other information except a declaration of the type of a variable, then we know that that variable has the type it was declared with.

Rule 2: Type Invariance

\frac{\Gamma :- x:\alpha, x \neq y}{\Gamma + y:\beta :- x:\alpha}

This rule is a statement of non-interference. If we know that x:\alpha, then inferring a type judgement about any other term cannot change our type judgement for x.

Rule 3: Function Type Inference

\frac{\Gamma + x:\alpha :- y:\beta}{\Gamma :- (\lambda x:\alpha. y):\alpha \rightarrow \beta}

This statement allows us to infer function types given parameter types. If we know that the type of the parameter to a function is \alpha, and if, with our knowledge of the parameter type, we know that the type of the term that makes up the body of the function is \beta, then we know that the type of the function is \alpha \rightarrow \beta.

Rule 4: Function Application Inference

\frac{\Gamma :- x: \alpha \rightarrow \beta, \Gamma :- y:\alpha}{\Gamma :- (x y): \beta}

This one is easy: if we know that we have a function that takes a parameter of type \alpha and returns a value of type \beta, then if we apply that function to a value of type \alpha, we’ll get a value of type \beta.

These four rules are it. If we can take a lambda expression, and come up with a consistent set of type judgements for every term in the expression, then the expression is well-typed. If not, then the expression is invalid.

So let’s try taking a look at a simple lambda calculus expression, and seeing how inference works on it.

\lambda x y . y x

Without any type declarations or parameters, we don’t know the exact type. But we do know that “x” has some type; we’ll call that “α”; and we know that “y” is a function that will be applied with “x” as a parameter, so it’s got parameter type α, but its result type is unknown. So using type variables, we can say “x:α, y:α→β”. We can figure out what “α” and “β” are by looking at a complete expression. So, let’s work out the typing of it with x = “3”, and y = “λ a:ν . a*a”. We’ll assume that our type context already includes “*” as a function of type “ν→ν→ν”, where ν is the type of natural numbers.

  • “(λ x y . y x) 3 (λ a:ν . a * a)”: Since 3 is a literal integer, we know its type: 3:ν.
  • By rule 4, since “*:ν→ν→ν” and “a:ν”, we can infer that the type of the expression “a*a” is “ν”; and therefore, by rule 3, the lambda expression has type “ν→ν”. So with type labelling, our expression is now: “(λ x y . y x) (3:ν) (λ a:ν.(a*a):ν) : ν→ν”.
  • So – now, we know that the parameter “x” of the first lambda must be “ν”; and “y” must be “ν→ν”; so by rule 4, we know that the type of the application expression “y x” must be “ν”; and then by rule 3, the lambda has type: “ν→(ν→ν)→ν”.
  • So, for this one, both α and β end up being “ν”, the type of natural numbers.
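
The Scala compiler performs essentially the same chain of judgements if you transcribe the example (a sketch, with α and β both fixed to Int, standing in for ν):

  val applyTo: Int => (Int => Int) => Int = x => y => y(x)   // “λ x y . y x”
  val square: Int => Int = a => a * a                        // “λ a:ν . a * a”
  val result = applyTo(3)(square)                            // 9, of type Int (i.e. ν)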

So, now we have a simply typed lambda calculus. The reason that it’s simply typed is because the type treatment here is minimal: the only way of building new types is through the unavoidable \rightarrow constructor. Other typed lambda calculi include the ability to define parametric types, which are types expressed as functions ranging over types.

Programs are Proofs

Here’s where it gets really fun. Think about the types in the simply typed lambda calculus. Anything which can be formed from the following grammar is a lambda calculus type:

  • type ::= primitive | function | ( type )
  • primitive ::= α | β | δ | …
  • function ::= type→type

The catch with that grammar is that you can create type expressions which are valid type definitions, but for which you can’t write a single, complete, closed expression that actually has that type. (A closed expression is one with no free variables.) When there is an expression that has a type, we say that the expression inhabits the type, and that the type is an inhabited type. If there is no expression that can inhabit a type, we say it’s uninhabitable. Any expression which either can’t be typed using the inference rules, or which is typed to an uninhabitable type, is a type error.

So what’s the difference between an inhabitable type and an uninhabitable type?

The answer comes from something called the Curry-Howard isomorphism. For a typed lambda calculus, there is a corresponding intuitionistic logic. A type expression is inhabitable if and only if the type is a provable theorem in the corresponding logic.

The type inference rules in lambda calculus are, in fact, the same as the logical inference rules in intuitionistic logic. A type \alpha \rightarrow \beta can be seen either as a statement that this is a function that maps from a value of type \alpha to a value of type \beta, or as a logical statement that if we’re given a fact \alpha, we can use it to infer the truth of a fact \beta.

If there’s a logical inference chain from an axiom (a given type assignment) to an inferred statement, then the inferred statement is an inhabitable type. If we have a type \alpha \rightarrow \alpha, then given an inhabited type \alpha, we know that \alpha \rightarrow \alpha is inhabitable, because if \alpha is a fact, then \alpha \rightarrow \alpha is also a fact.

On the other hand, think of a different case \alpha \rightarrow \beta. That’s not a theorem, unless there’s some other context that proves it. As a function type, that’s the type of a function which, without including any context of any kind, can take a parameter of type α, and return a value of a different type β. You can’t do that – there’s got to be some context which provides a value of type β – and to access the context, there’s got to be something to allow the function to access its context: a free variable. Same thing in the logic and the lambda calculus: you need some kind of context to establish “α→β” as a theorem (in the logic) or as an inhabitable type (in the lambda calculus).
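
You can watch this happen in Scala, treating type parameters as the propositional variables (a sketch of the idea, not a formal proof): A => A is a theorem, so a closed term of that type exists; A => B with no context is not, and nothing honest has that type.

  // α → α is a theorem, and here is its proof/program – a closed, total term:
  def identityProof[A]: A => A = a => a

  // α → β is not a theorem without extra context, and there’s no closed, total
  // term of this type. Any “implementation” has to cheat – throw, loop forever,
  // or reach out to a free variable from some surrounding context:
  // def impossible[A, B]: A => B = a => ???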

It gets better. If there is a function whose type is a theorem in the corresponding intuitionistic logic, then the program that has that type is a proof of the theorem. Each beta reduction is equivalent to an inference step in the logic. This is what programming language geeks like me mean when we say “the program is the proof”: a well-typed program is, literally, a proof of its own well-typed-ness.

To connect back to the discussion about models: the typed lambda calculus and its corresponding intuitionistic logic are, in a deep sense, just different reflections of the same thing. We know that intuitionistic logic has a valid model. And that, in turn, means that the lambda calculus does as well. When we show that something is true using the lambda calculus, we can trust that it’s not an artifact of an inconsistent system.

Models and Why They Matter

As I said in the last post, Church came up with λ-calculus, which looks like it’s a great formal model of computation. But there was a problem: Church struggled to find a model for it. What’s a model, and why would that matter? That’s the point of this post: to give a quick sense of what a model is, and why it matters.

A model is basically a mapping from the symbols of a logical system to some set of objects, such that all statements that you can prove in the logical system will be true about the corresponding objects. Note that when I say object here, I don’t necessarily mean real-world physical objects – they’re just something that we can work with, which is well-defined and consistent.

Why does it matter? Because the whole point of a system like λ-calculus is that we want to use it for reasoning. When you have a logical system like λ-calculus, you’ve built this system with its rules for a reason – because you want to use it as a tool for understanding something. The model provides you with a way of saying that the conclusions you derive using the system are meaningful. If there’s no valid model – if the system contains any kind of inconsistency – then your system is completely meaningless: it can be used to derive anything.

So the search for a model for λ-calculus is really important. If there’s a valid model for it, then it’s wonderful. If there isn’t, then we’re just wasting our time looking for one.

So, now, let’s take a quick look at a simple model, to see how a problem can creep in. I’m going to build a logic for talking about the natural numbers – that is, integers greater than or equal to zero. Then I’ll show you how invalid results can be inferred using it; and finally show you how it fails by using the model.

One quick thing, to make the notation easier to read: I’m going to use a simple notion of types. A type is a set of atoms for which some particular one-parameter predicate is true. For example, if P(x) is true, I’ll say that x is a member of type P. In a quantifier, I’ll say things like \forall x \in P: \mbox{\em foo} to mean \forall x: P(x) \Rightarrow \mbox{\em foo}. Used this way, we can say that P is a type predicate.

How do we define natural numbers using logic?

First, we need an infinite set of atoms, each of which represents one number. We pick one of them, and call it zero. To represent the fact that they’re natural numbers, we define a predicate {\cal N}(x), which is true if and only if x is one of the atoms that represents a natural number.

Now, we need to start using predicates to define the fundamental properties of numbers. The most important property of natural numbers is that they form a sequence. We define that idea using a predicate \mbox{\em succ}(x,y), where \mbox{\em succ}(x,y) is true if and only if y = x + 1. To use that to define the ordering of the naturals, we can say: \forall x \in {\cal N}: \exists y: \mbox{\em succ}(x, y).

Or in english: every natural number has a successor – you can always add one to a natural number and get another natural number.

We can also define predecessor similarly, with two statements:

  1. \forall x \in {\cal N}: \exists y \in {\cal N}: \mbox{\em pred}(x, y).
  2. \forall x, y \in {\cal N}: \mbox{\em pred}(y,x) \Leftrightarrow \mbox{\em succ}(x,y)

So every number has a predecessor, and every number has a successor, and x is the predecessor of y if y is the successor of x.

To be able to define things like addition and subtraction, we can use successor. Let’s define addition using a predicate Sum(x,y,z) which means “z = x + y”.

  1. \forall x, y \in {\cal N}: \exists z \in {\cal N}: Sum(x,y,z)
  2. \forall x \in {\cal N}: Sum(x, 0, x)
  3. \forall x, y, z \in {\cal N}: \exists a, b \in {\cal N}: Sum(a,b,z) \land \mbox{\em succ}(x,a) \land \mbox{\em succ}(b,y) \Rightarrow Sum(x, y, z)

Again, in english: for any two natural numbers, there is a natural number that it their sum; x + 0 always = x; and for any natural number, x + y = z is true if (x + 1) + (y – 1) = z.
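
For example, here’s how those statements give us 2 + 2 = 4, i.e. Sum(2, 2, 4): by the third statement, Sum(2, 2, 4) follows from Sum(3, 1, 4) (taking a = 3 and b = 1, since \mbox{\em succ}(2, 3) and \mbox{\em succ}(1, 2)); applying it again, Sum(3, 1, 4) follows from Sum(4, 0, 4); and Sum(4, 0, 4) is an instance of the second statement.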

Once we have addition, subtraction is easy: \forall x, y, z \in {\cal N}: Diff(x,y,z) \Leftrightarrow Sum(z,y,x)

That’s: x-y=z if and only if x=y+z.

We can also define greater than using addition:

  • \forall x, y \in {\cal N}: x > y \Leftrightarrow \exists z \in {\cal N}: z \neq 0 \land Sum(y, z, x) (and similarly, x \ge y \Leftrightarrow \exists z \in {\cal N}: Sum(y, z, x)). From that, zero is less than or equal to every natural number. But we’ve violated that – we have both \forall x \in {\cal N}: 0 \le x, and