Bad Arithmetic and Blatant Political Lies

I’ve been trying to say away from the whole political thing lately. Any time that I open my mouth to say anything about politicians, I get a bunch of assholes trying to jump down my throat for being “biased”. But sometimes, things just get too damned ridiculous, and I can’t possibly let it go without comment.

In the interests of disclosure: I despise Mitt Romney. Despite that, I think he’s gotten a very unfairly hard time about a lot of things. Let’s face it, the guys a rich investor. But that’s been taken by the media, and turned in to the story through which everything is viewed, whether it makes sense or not.

For example, there’s the whole $10,000 bet nonsense. I don’t think that that made a damned bit of sense. It was portrayed as “here’s a guy so rich that he can afford to lose $10,000”. But… well, let’s look at it from a mathematical perspective.

You can assess the cost of a bet by looking at it from probability. Take the cost of losing, and multiply it by the probability of losing. That’s the expected cost of the bet. So, in the case of that debate moment, what was the expected cost of the bet? $0. If you know that you’re betting about a fact, and you know the fact, then you know the outcome of the bet. It’s a standard rhetorical trick. How many of us have said “Bet you a million dollars”? It doesn’t matter what dollar figure you attach to it – because you know the fact, and you know that the cost of the bet, to you, is 0.

But… Well, Mitt is a rich asshole.

As you must have heard, Mitt released his income tax return for last year, and an estimate for this year. Because his money is pretty much all investment income, he paid a bit under 15% in taxes. This is, quite naturally, really annoying to many people. Those of us who actually have jobs and get paid salaries don’t get away with a tax rate that low. (And people who are paid salary rather than investment profits have to pay the alternative minimum tax, which means that they’re not able to deduct charity the way that Mitt is.)

So, in an interview, Mitt was asked about the fairness of a guy who made over twenty million dollars a year paying such a low rate. And Mitt, asshole that he is, tried to cover up the insanity of the current system, by saying:

Well, actually, I released two years of taxes and I think the average is almost 15 percent. And then also, on top of that, I gave another more 15 percent to charity. When you add it together with all of the taxes and the charity, particularly in the last year, I think it reaches almost 40 percent that I gave back to the community.

I don’t care about whether the reasoning there is good or not. Personally, I think it’s ridiculous to say “yeah, I didn’t pay taxes, but I gave a lot of money to my church, so it’s OK.” But forget that part. Just look at the freaking arithmetic!

He pays less than 15% in taxes.

He pays 15% in charity (mostly donations to his church).

What’s less than 15 + 15?

It sure as hell isn’t “almost 40 percent”. It’s not quite 30 percent. This isn’t something debatable. It’s simple, elementary school arithmetic. It’s just fucking insane that he thinks he can just get away with saying that. But he did – they let him say that, and didn’t challenge it at all. He says “less than 15 + 15 = almost 40”, and the interviewer never even batted an eye.

And then, he moved on to something which is a bit more debatable:

One of the reasons why we have a lower tax rate on capital gains is because capital gains are also being taxed at the corporate level. So as businesses earn profits, that’s taxed at 35 percent, then as they distribute those profits as dividends, that’s taxed at 15 percent more. So, all total, the tax rate is really closer to 45 or 50 percent.

Now, like I said, you can argue about that. Personally, I don’t think it’s a particularly good argument. The way that I see it, corporations are a tradeoff. A business doesn’t need to be a corporation. You become a corporation, because transforming the business into a quasi-independent legal entity gives you some big advantages. A corporation owns its own assets. You, as an individual who owns part of a corporation, aren’t responsible for the debts of the corporation. You, as an individual who owns part of a corporation, aren’t legally liable for the actions (such as libel) of the corporation. The corporation is an independent entity, which owns its own assets, which is responsible for its debts and actions. In exchange for taking on the legal status on an independent entity, that legal entity becomes responsible for paying taxes on its income. You give it that independent legal status in order to protect yourself; and in exchange, that independent legal status entails an obligation for that independent entity to pay its own taxes.

But hey, let’s leave that argument aside for the moment. Who pays the cost of the corporate taxes? Is it the owners of the business? Is it the people who work for the business? Is it someone else?

When they talk about their own ridiculously low tax rates, people like Mitt argue that they’re paying those taxes, and they want to add those taxes to the total effective tax that they pay.

But when they want to argue about why we should lower corporate tax rates, they pull out a totally different argument, which they call the “flypaper theory“. The flypaper theory argues that the burden of corporate taxes falls on the employees of the company – because if the company didn’t have to pay those taxes, that money would be going to the employees as salary – that is, the taxes are part of the overall expenses paid by the company. A company’s effective profits are (revenue – expenses). Expenses, in turn, are taxes+labor+materials+…. The company makes a profit of $P to satisfy its shareholders. So if you took away corporate taxes, the company could continue to make $P while paying its employees more. Therefore, the cost of the corporate taxes comes out of the salaries of the corporations employees.

You can make several different arguments – that the full burden of taxes fall on to the owners, or that the full burden of taxes falls on the employees, or that the full burden of taxes falls on the customers (because prices are raised to cover them). Each of those is something that you could reasonably argue. But what the conservative movement in America likes to do is to claim all of those: that the full burden of corporate taxes falls on the employees, and the full burden of corporate taxes falls on the customers, and the full burden of corporate taxes falls on the shareholders.

That’s just dishonest. If the full burden falls on one, then none of the burden falls on anyone else. The reality is, the burden of taxes is shared between all three. If there were no corporate taxes, companies probably would be able to pay their employees more – but there’s really no way that they’d take all of the money they pay in taxes, and push that into salary. And they’d probably be able to lower prices – but they probably wouldn’t lower prices enough to make up the entire difference. And they’d probably pay more in dividends/stock buybacks to pay the shareholders.

But you don’t get to count the same tax money three times.

The Basics of Software Transactional Memory

As promised, it’s time for software transactional memory!

A bit of background, first. For most of the history of computers, the way that we’ve built software is very strongly based on the fact that a computer has a processor – a single CPU, which can do one thing at a time, in order. Until recently, that was true. But now it really isn’t anymore. There’s the internet – which effectively means that no computer is ever really operating in isolation – it’s part of a huge computational space shared with billions of other computers. And even ignoring the internet, we’re rapidly approaching the point where tiny devices, like cellphones, will have more than one CPU.

The single processor assumption makes things easy. We humans tend to think very sequentially – that is, the way that we describe how to do things is: do this, then do that. We’re not so good at thinking about how to do lots of things at the same time. Just think about human language. If I want to bake a cake, what I’ll do is: measure out the butter and sugar, put them in the mixer, mix it until they’re creamed. Then add milk. Then in a separate bowl, measure out and sift the flour and the baking powder. Then slowly pour the dry stuff into the wet, and mix it together. Etc.

I don’t need to fully specify that order. If I had multiple bakers, they could do many of steps at the same time. But how, in english, can I clearly say what needs to be done? I can say it, but it’s awkward. It’s harder to say, and harder to understand than the sequential instructions.

What I need to do are identifying families of things that can be done at the same time, and then the points at which those things will merge.

All of the measurements can be done at the same time. In any mixings step, the mixing can’t be done until all of the ingredients are ready. Ingredients being ready could mean two things. It could mean that the ingredients were measured out; or it could mean that one of the ingredients for the mix is the product of one of the previous mixing steps, and that that previous step is complete. In terms of programming, we’d say that the measurement steps are independent; and the points at which we need to wait for several things to get done (like “we can’t mix the dry ingredients in to the wet until the wet ingredients have all been mixed and the dry ingredients have been measured”), we’d call synchroniation points.

It gets really complicated, really quickly. In fact, it gets even more complicated than you might think. You see, if you were to write out the parallel instructions for this, you’d probably leave a bunch of places where you didn’t quite fully specify things – because you’d be relying on stuff like common sense on the part of your bakers. For example, you’d probably say to turn on the over to preheat, but you wouldn’t specifically say to wait until it reached the correct temperature to put stuff into it; you wouldn’t mention things like “open the over door, then put the cake into it, then close it”.

When we’re dealing with multiple processors, we get exactly those kinds of problems. We need to figure out what can be done at the same time; and we need to figure out what the synchronization points are. And we also need to figure out how to do the synchronization. When we’re talking about human bakers “don’t mix until the other stuff is ready” is fine. But in software, we need to consider things like “How do we know when the previous steps are done?”.

And it gets worse than that. When you have a set of processors doing things at the same time, you can get something called a race condition which can totally screw things up!

For example, imagine that we’re counting that all of the ingredients are measured. We could imagine the mixer process as looking at a counter, waiting until all five ingredients have been measured. Each measuring process would do its measurement, and then increment the counter.

  val measureCount = 0
  process Mixer() {
    wait until measureCount == 5
  }

  process Measurer() {
     do_measure
	 measureCount = measureCount + 1
  }

What happens if two measurer finish at almost the same time? The last statement in Measurer actually consists of three steps: retrieve the value of measureCount; add one; store the incremented value. So we could wind up with:

Time Measurer1 Measurer2 measureCount
0 1
1 Fetch measureCount(=1) 1
2 Increment(=2) Fetch measurecount(=1) 1
3 Store updated(=2) Increment(=2) 2
4 Store updated(=2) 2

Now, Mixer will never get run! Because of the way that the two Measurers overlapped, one of the increments effectively got lost, and so the count will never reach 5. And the way it’s written, there’s absolutely no way to tell whether or not that happened. Most of the time, it will probably work – because the two processes have to hit the code that increments the counter at exactly the same time in order for there to be a problem. But every once in a while, for no obvious reason, the program will fail – the mixing will never get done. It’s the worst kind of error to try to debug – one which is completely unpredictable. And if you try to run it in a debugger, because the debugger slows things down, you probably won’t be able to reproduce it!

This kind of issue always comes down to coordination or synchronization of some kind – that is, the main issue is how do the different processes interact without stomping on each other?

The simplest approach is to use something called a lock. A lock is an object which signals ownership of some resource, and which has one really important property: in concept, it points at at most one process, and updating it is atomic meaning that when you want to look at it and update it, nothing can intervene between the read and write. So if you want to use the resource managed by the lock, you can look at the lock, and see if anyone is using it; and if not, set it to point to you. That process is called acquiring the lock.

In general, we wrap locks up in libraries to make them easier to use. If “l” was a lock, you’d take a lock by using a function something like “lock(l)”, which really expanded to something like:

def take(L: Lock) {
  while (L != me)
     atomically do if L==no one then L=me
}

So the earlier code becomes:

val measureCount = 0
val l = new Lock()

process Mixer() {
  wait until measureCount == 5
}

process Measurer() {
  do_measure
  lock(l)
  measureCount = measureCount + 1
  unlock(l)
}

In a simple example like that, locks are really easy. Unfortunately, real examples get messy. For instance, there’s a situation called deadlock. A classic demonstration is something called the dining philosophers. You’ve got four philosophers sitting at a table dinner table. Between each pair, there’s a chopstick. In order to eat, a philosopher needs two chopsticks. When they get two chopsticks, they’ll use them to take a single bite of food, and then they’ll put down the chopsticks. If Each philosopher starts by grabbing the chopstick to their right, then no one gets to each. Each has one chopstick, and there’s no way for them to get a second one.

That’s exactly what happens in a real system. You lock each resource that you want to share. For some operations, you’re going to use more than one shared resource, and so you need two locks. If you have multiple tasks that need multiple resources, it’s easy to wind up with a situation where each task has some subset of the locks that they need.

Things like deadlock mean that simple locks get hairy really quickly. Not that any of the more complex coordination strategies make deadlocks impossible; you can always find a way of creating a deadlock in any system – but it’s a lot easier to create accidental deadlocks using simple locks than, say, actors.

So there’s a ton of methods that try to make it easier to do coordination between multiple tasks. Under the covers, these ultimately rely on primitives like locks (or semaphores, another similar primitive coordination tool). But they provide a more structured way of using them. Just like structured control flow makes code cleaner, clearer, and more maintanable, structured coordination mechanism makes concurrency cleaner, clearer, and more maintainable.

Software transactional memory is one approach to this problem, which is currently very trendy. It’s still not entirely clear to me whether or not STM is really quite up to the real world – current implementations remain promising, but inefficient. But before getting in to any of that, we need to talk about just what it is.

As I see it, STM is based on two fundamental concepts:

  1. Optimism. In software terms, by optimism, we mean that we’re going to plow ahead and assume that there aren’t any errors; when we’re done, we’ll check if there was a problem, and clean up if necessary. A good example of this from my own background is source code control systems. In the older systems like RCS, you’d lock a source file before you edited it; then you’d make your changes, and check them in, and release the lock. That way, you know that you’re never going to have two people making changes to the same file at the same time. But the downside is that you end up with lots of people sitting around doing nothing, waiting to get the lock on the file they want to change. Odds are, they weren’t going to change the same part of the file as the guy who has the lock. But in order to make sure that they can’t, the locks also block a lot of real work. Eventually, the optimistic systems came along, and what they did was say: “go ahead and edit the file all you want. But before I let you check in (save) the edited file to the shared repository, I’m going to make sure that no one changed it in a way that will mess things up. If I find out that someone did, then you’re going to have to fix it.”
  2. Transactions. A transaction is a concept from (among other places) databases. In a database, you often make a collection of changes that are, conceptually, all part of the same update. By making them a transaction, they become one atomic block – and either the entire collection all succeedd, or the entire collection all fail. Transactions guarantee that you’ll never end up in a situation where half of the changes in an update got written to the database, and the other half didn’t.

What happens in STM is that you have some collection of special memory locations or variables. You’re only allowed to edit those variables in a transaction block. Whenever a program enters a transaction block, it gets a copy of the transaction variables, and just plows ahead, and does whatever it wants with its copy of the transaction variables. When it gets to the end of the block, it tries to commit the transaction – that is, it tries to update the master variables with the values of its copies. But it checks them first, to make sure that the current value of the master copies haven’t changed since the time that it made its copy. If they did, it goes back and starts over, re-running the transaction block. So if anyone else updated any of the transaction variables, the transaction would fail, and then get re-executed.

In terms of our baking example, both of the measurers would enter the transaction block at the same time; and then whichever finished first would commit its transaction, which would update the master count variable. Then when the second transaction finished, it would check the count variable, see that it changed, and go back and start over – fetching the new value of the master count variable, incrementing it, and then committing the result. In terms of code, you’d just do something like:

transactional val measureCount = 0

process Mixer() {
  wait until measureCount == 5
}

process Measurer() {
  do_measure
  atomically {
    measureCount = measureCount + 1
  }
}

It’s really that simple. You just mark all the shared resources as transactional, and then wrap the code that modifies them in a transaction block. And it just works. It’s a very elegant solution.

Of course there’s a lot more to it, but that’s the basic idea. In the code, you identify the transactional variables, and only allow them to be updated inside of a transaction block. At runtime, when you encounter a transaction block, charge ahead, and do whatever you want. Then when you finish, make sure that there isn’t an error. If there was, try again.

So what’s it look like in a non-contrived programming language? These days, I’m doing most of my coding in Scala. There’s a decent STM implementation for Scala as a part of a package called Akka.

In Akka, the way that you define a transactional variable is by using a Ref type. A Ref is a basically a cell that wraps a value. (It’s almost like a pointer value in C.) So, for example, in our Baker example:

var count :Ref[Int] = Ref(0)

Then in code, to use it, you literally just wrap the code that modifies the Refs in “atomic”. Alas, you don’t quite get to treat the refs like normal variables – to access the value of a ref, you need to call Ref.get; to change the value, you need to use a method alter, which takes a function that computes the new value in terms of the old.

class Measurer {
  def doMeasure() {
    // do the measuring stuff
    atomic {
	  ref.alter(_ + 1)
    }
  }
}

The “(_ + 1)” probably needs a quick explanation. In Scala, you can define a single expression function using “_” to mark the slot where the parameter should go. So “(_ + 1)” is equivalent to the lambda expression { x => x + 1}.

You can see, just from this tiny example, why STM is such an attractive approach. It’s so simple! It’s very easy to read and write. It’s a very simple natural model. It’s brilliant in its simplicity. Of course, there’s more to it that what I’ve written about here – error handling, voluntary transaction abort, retry management, transaction waits – but this is the basics, and it really is this simple.

What are the problems with this approach?

  1. Impurity. If not all variables are transactional, and you can modify a non-transactional variable inside of a transaction block, then you’ve got a mess onp your hands. Values from transactionals can “leak” out of transaction blocks.
  2. Inefficiency. You’ve got to either copy all of the transactional variables at the entry to the transaction block, or you’ve got to use some kind of copy-on-write strategy. However you do it, you’ve got grief – aggressive copying, copy-on-write, memory protect – they’ve all got non-trivial costs. And re-doing the entire transaction block every time it fails can eat a lot of CPU.
  3. Fairness. This is a fancy term for “what if one guy keeps getting screwed?” You’ve got lots of processes all working with the same shared resources, which are protected behind the transactional variables. It’s possible for timing to work out so that one process keeps getting stuck doing the re-tries. This is something that comes up a lot in coordination strategies for concurrency, and the implementations can get pretty hairy trying to make sure that they don’t dump all of the burden of retries on one process.

LaTeX Issues Fixed, Hopefully

A quick administrative note. For people reading this blog via RSS readers, there’ve been problems with LaTeX equations for a couple of months. I’ve set up a local tex server and tweak the configurations. Hopefully, now everyone will be able to see the math stuff.

As a test, pi = 4sum_{k=0}^{infty} frac{(-1)^k}{2k+1}. If you still get an error box instead of an equation, please drop me an email.

The Wrong Way To Write Concurrent Programs: Actors in Cruise

I’ve been planning to write a few posts about some programming stuff that interests me. I’ve spent a good part of my career working on systems that need to support concurrent computation. I even did my PhD working on a system to allow a particular style of parallel programming. It’s a really hard problem – concurrency creates a host of complex issues in how systems behave, and the way that you express concurrency in a programming language has a huge impact on how hard it is to read, write, debug, and reason about systems.

So, like I’ve said, I’ve spent a lot of time thinking about these issues, and looking at various different proposed solutions, as well as proposing a couple of my own. But I really don’t know of any good writeup describing the basics of the most successful approaches for beginners. So I thought I could write one.

But that’s not really today’s post. Todays post is my version of a train-wreck. Long-time readers of the blog know that I’m fascinated with bizarre programming languages. So today, I’m going to show you a twisted, annoying, and thoroughly pointless language that I created. It’s based on one of the most successful models of concurrent programming, called Actors, which was originally proposed by Professor Gul Agha of UIUC. There’ve been some really nice languages built using ideas from Actors, but this is not one of them.

The language is called “Cruise”. Why? Because it’s a really bad Actor language. And what name comes to mind when I think of really, really bad actors with delusions of adequacy? Tom Cruise.

You can grab my implementation from github. Just so you know, the code sucks. It’s something I threw together in my spare time a couple of years ago, and haven’t really looked at since. So it’s sloppy, overcomplicated, probably buggy, and slow as a snail on tranquilizers.

Quick overview of the actor model

Actors are a theoretical model of computation, which is designed to describe completely asynchronous parallel computation. Doing things totally asynchronously is very strange, and very counter-intuitive. But the fact of the matter is, in real distributed systems, everything *is* fundamentally asynchronous, so being able to describe distributed systems in terms of a simple, analyzable model is a good thing.

According to the actor model, a computation is described by a collection of things called, what else, actors. An actor has a mailbox, and a behavior. The mailbox is a uniquely named place where messages sent to an actor can be queued; the behavior is a definition of how the actor is going to process a message from its mailbox. The behavior gets to look at the message, and based on its contents, it can do three kinds of things:

  1. Create other actors.
  2. Send messages to other actors whose mailbox it knows.
  3. Specify a new behavior for the actor to use to process its next message.

You can do pretty much anything you need to do in computations with that basic mechanism. The catch is, as I said, it’s all asynchronous. So, for example, if you want to write an actor that adds two numbers, you can’t do it by what you’d normally think of as a function call. In a lot of ways, it looks like a method call in something like Smalltalk: one actor (object) sends a message to another actor, and in response, the receiver takes some action specified by them message.

But subroutines and methods are synchronous, and nothing in actors is synchronous. In an object-oriented language, when you send a message, you stop and wait until the receiver of the message is done with it. In Actors, it doesn’t work that way: you send a message, and it’s sent; that’s it, it’s over and done with. You don’t wait for anything; you’re done. If you want a reply, you need to send the the other actor a reference to your mailbox, and make sure that your behavior knows what to do when the reply comes in.

It ends up looking something like the continuation passing form of a functional programming language: to do a subroutine-like operation, you need to pass an extra parameter to the subroutine invocation; that extra parameter is the *intended receiver* of the result.

You’ll see some examples of this when we get to some code.

Tuples – A Really Ugly Way of Handling Data

This subtitle is a bit over-the-top. I actually think that my tuple notion is pretty cool. It’s loosely based on how you do data-types in Prolog. But the way that it’s implemented in Cruise is absolutely awful.

Cruise has a strange data model. The idea behind it is to make it easy to build actor behaviors around the idea of pattern matching. The easiest/stupidest way of doing this is to make all data consist of tagged tuples. A tagged tuple consists of a tag name (an identifier starting with an uppercase letter), and a list of values enclosed in the tuple. The values inside of a tuple can be either other tuples, or actor names (identifiers starting with lower-case letters).

So, for example, Foo(Bar(), adder) is a tuple. The tag is “Foo“. It’s contents are another tuple, “Bar()“, and an actor name, “adder“.

Since tuples and actors are the only things that exist, we need to construct all other types of values from some combination of tuples and actors. To do math, we can use tuples to build up Peano numbers. The tuple “Z()” is zero; “I(n)” is the number n+1. So, for example, 3 is “I(I(I(Z())))“.

The only way to decompose tuples is through pattern matching in messages. In an actor behavior. message handlers specify a *tuple pattern*, which is a tuple where some positions may be filled by{em unbound} variables. When a tuple is matched against a pattern, the variables in the pattern are bound to the values of the corresponding elements of the tuple.

A few examples:

  • matching I(I(I(Z()))) with I($x) will succeed with $x bound to I(I(Z)).
  • matching Cons(X(),Cons(Y(),Cons(Z,Nil()))) with Cons($x,$y) will succeed with $x bound to X(), and $y bound to Cons(Y(),Cons(Z(),Nil())).
  • matching Cons(X(),Cons(Y(),Cons(Z(),Nil()))) with Cons($x, Cons(Y(), Cons($y, Nil()))) will succeed with $x bound to X(), and $y bound to Z().
  • Code Examples!

    Instead of my rambling on even more, let’s take a look at some Cruise programs. We’ll start off with Hello World, sort of.


    actor !Hello {
      behavior :Main() {
        on Go() { send Hello(World()) to out }
      }
      initial :Main
    }
    
    instantiate !Hello() as hello
    send Go() to hello
    
    

    This declares an actor type “!Hello”; it’s got one behavior with no parameters. It only knows how to handle one message, “Go()”. When it receives go, it sends a hello world tuple to the actor named “out”, which is a built-in that just prints whatever is sent to it.

    Let’s be a bit more interesting, and try something using integers. Here’s some code to do a greater than comparison:


    actor !GreaterThan {
      behavior :Compare() {
        on GT(Z(),Z(), $action, $iftrue, $iffalse) {
          send $action to $iffalse
        }
        on GT(Z(), I($x), $action, $iftrue, $iffalse) {
          send $action to $iffalse
        }
        on GT(I($x), Z(), $action, $iftrue, $iffalse) {
          send $action to $iftrue
        }
        on GT(I($x), I($y), $action, $iftrue, $iffalse) {
          send GT($x,$y,$action,$iftrue,$iffalse) to $self
        }
      }
      initial :Compare
    }
    
    actor !True {
      behavior :True() {
        on Result() { send True() to out}
      }
      initial :True
    }
    
    actor !False {
      behavior :False() {
        on Result() { send False() to out}
      }
      initial :False
    }
    
    instantiate !True() as true
    instantiate !False() as false
    instantiate !GreaterThan() as greater
    send GT(I(I(Z())), I(Z()), Result(), true, false) to greater
    send GT(I(I(Z())), I(I(I(Z()))), Result(), true, false) to greater
    send GT(I(I(Z())), I(I(Z())), Result(), true, false) to greater
    
    

    This is typical of how you do “control flow” in Cruise: you set up different actors for each branch, and pass those actors names to the test; one of them will receive a message to continue the execution.

    What about multiple behaviors? Here’s a trivial example of a flip-flop:


    actor !FlipFlop {
      behavior :Flip() {
        on Ping($x) {
          send Flip($x) to out
          adopt :Flop()
        }
        on Pong($x) {
          send Flip($x) to out
        }
      }
      behavior :Flop() {
        on Ping($x) {
          send Flop($x) to out
        }
        on Pong($x) {
          send Flop($x) to out
          adopt :Flip()
        }
      }
      initial :Flip
    }
    
    instantiate !FlipFlop() as ff
    send Ping(I(I(Z()))) to ff
    send Ping(I(I(Z()))) to ff
    send Ping(I(I(Z()))) to ff
    send Ping(I(I(Z()))) to ff
    send Pong(I(I(Z()))) to ff
    send Pong(I(I(Z()))) to ff
    send Pong(I(I(Z()))) to ff
    send Pong(I(I(Z()))) to ff
    
    

    If the actor is in the “:Flip” behavior, then when it gets a “Ping”, it sends “Flip” to out, and switches behavior to flop. If it gets point, it just sents “Flip” to out, and stays in “:Flip”.

    The “:Flop” behavior is pretty much the same idea, accept that it switches behaviors on “Pong”.

    An example of how behavior changing can actually be useful is implementing settable variables:


    actor !Var {
      behavior :Undefined() {
        on Set($v) { adopt :Val($v) }
        on Get($target) { send Undefined() to $target }
        on Unset() { }
      }
      behavior :Val($val) {
        on Set($v) { adopt :Val($v) }
        on Get($target) { send $val to $target }
        on Unset() { adopt :Undefined() }
      }
      initial :Undefined
    }
    instantiate !Var() as v
    send Get(out) to v
    send Set(I(I(I(Z())))) to v
    send Get(out) to v
    
    

    Two more programs, and I’ll stop torturing you. First, a simple adder:


    actor !Adder {
      behavior :Add() {
        on Plus(Z(),$x, $target) {
          send $x to $target
        }
        on Plus(I($x), $y, $target) {
          send Plus($x,I($y), $target) to $self
        }
      }
      initial :Add
    }
    
    actor !Done {
      behavior :Finished() {
        on Result($x) { send Result($x) to out }
      }
      initial :Finished
    }
    
    instantiate !Adder() as adder
    instantiate !Done() as done
    send Plus(I(I(I(Z()))),I(I(Z())), out) to adder
    
    

    Pretty straightforward – the only interesting thing about it is the way that it sends the result of invoking add to a continuation actor.

    Now, let’s use an addition actor to implement a multiplier actor. This shows off some interesting techniques, like carrying auxiliary values that will be needed by the continuation. It also shows you that I cheated, and added integers to the parser; they’re translated into the peano-tuples by the parser.


    actor !Adder {
      behavior :Add() {
        on Plus(Z(),$x, $misc, $target) {
          send Sum($x, $misc) to $target
        }
        on Plus(I($x), $y, $misc, $target) {
          send Plus($x,I($y), $misc, $target) to $self
        }
      }
      initial :Add
    }
    
    actor !Multiplier {
      behavior :Mult() {
        on Mult(I($x), $y, $sum, $misc, $target) {
          send Plus($y, $sum, MultMisc($x, $y, $misc, $target), $self) to adder
        }
        on Sum($sum, MultMisc(Z(), $y, $misc, $target)) {
          send Product($sum, $misc) to $target
        }
        on Sum($sum, MultMisc($x, $y, $misc, $target)) {
          send Mult($x, $y, $sum, $misc, $target) to $self
        }
      }
      initial :Mult
    }
    
    instantiate !Adder() as adder
    instantiate !Multiplier() as multiplier
    send Mult(32, 191, 0, Nil(), out) to multiplier
    
    

    So, is this Turing complete? You bet: it’s got peano numbers, conditionals, and recursion. If you can do those three, you can do anything.

Sidetrack from the CCCs: Lambda Calculus

So, last post, I finally defined closed cartesian categories. And I alluded to the fact that the CCCs are, essentially, equivalent to the simply typed λ calculus. But I didn’t really talk about what that meant.

Before I can get to that, you need to know what λ calculus is. Many readers are probably familiar, but others aren’t. And as it happens, I absolutely love λ calculus.

In computer science, especially in the field of programming languages, we tend to use λ calculus a whole lot. It’s also extensively used by logicians studying the nature of computation and the structure of discrete mathematics. λ calculus is great for a lot of reasons, among them:

  1. It’s very simple.
  2. It’s Turing complete: if a function can be computed by any possible computing device, then it can be written in λ-calculus.
  3. It’s easy to read and write.
  4. Its semantics are strong enough that we can do reasoning from it.
  5. It’s got a good solid model.
  6. It’s easy to create variants to explore the properties of various alternative ways of structuring computations or semantics.

The ease of reading and writing λ calculus is a big deal. It’s led to the development of a lot of extremely good programming languages based, to one degree or another, on the λ calculus: Lisp, ML, Haskell, and my current favorite, Scala, are very strongly λ calculus based.

The λ calculus is based on the concept of functions. In the pure λ calculus, everything is a function; there are no values except for functions. In fact, we can pretty much build up all of mathematics using λ-calculus.

With the lead-in out of the way, let’s dive in a look at λ-calculus. To define a calculus, you need to define two things: the syntax, which describes how valid expressions can be written in the calculus; and a set of rules that allow you to symbolically manipulate the expressions.

Lambda Calculus Syntax

The λ calculus has exactly three kinds of expressions:

  1. Function definition: a function in λ calculus is an expression, written: λ param . body, which defines a function with one parameter.
  2. Identifier reference: an identifier reference is a name which matches the name of a parameter defined in a function expression enclosing the reference.
  3. Function application: applying a function is written by putting the function value in front of its parameter, as in x y to apply the function x to the value y.

There’s a trick that we play in λ calculus: if you look at the definition above, you’ll notice that a function (lambda expression) only takes one parameter. That seems like a very big constraint – how can you even implement addition with only one parameter?

It turns out to be no problem, because of the fact that functions are, themselves, values. Instead of writing a two parameter function, you can write a one parameter function that returns a one parameter function, which can then operate on the second parameter. In the end, it’s effectively the same thing as a two parameter function. Taking a two-parameter function, and representing it by two one-parameter functions is called currying, after the great logician Haskell Curry.

For example, suppose we wanted to write a function to add x and y. We’d like to write something like: λ x y . x + y. The way we do that with one-parameter functions is: we first write a function with one parameter, which returns another function with one parameter.

Adding x plus y becomes writing a one-parameter function with parameter x, which returns another one parameter function which adds x to its parameter: λ x . (λ y . x + y).

Now that we know that adding multiple parameter functions doesn’t really add anything but a bit of simplified syntax, we’ll go ahead and use them when it’s convenient.

One important syntactic issue that I haven’t mentioned yet is closure or complete binding. For a λ calculus expression to be evaluated, it cannot reference any identifiers that are not bound. An identifier is bound if it a parameter in an enclosing λ expression; if an identifier is not bound in any enclosing context, then it is called a free variable. Let’s look quickly at a few examples:

  • λ x . p x y: in this expression, y and p are free, because they’re not the parameter of any enclosing λ expression; x is bound because it’s a parameter of the function definition enclosing the expression p x y where it’s referenced.
  • λ x y.y x: in this expression both x and y are bound, because they are parameters of the function definition, and there are no free variables.
  • λ y . (λ x . p x y). This one is a tad more complicated, because we’ve got the inner λ. So let’s start there. In the inner λ, λ x . p x y, y and p are free and x is bound. In the full expression, both x and y are bound: x is bound by the inner λ, and y is bound by the other λ. “p” is still free.

We’ll often use “free(x)” to mean the set of identifiers that are free in the expression “x”.

A λ calculus expression is valid (and thus evaluatable) only when all of its variables are bound. But when we look at smaller subexpressions of a complex expression, taken out of context, they can have free variables – and making sure that the variables that are free in subexpressions are treated right is very important.

Lambda Calculus Evaluation Rules

There are only two real rules for evaluating expressions in λ calculus; they’re called α and β. α is also called “conversion”, and β is also called “reduction”.

α is a renaming operation; basically it says that the names of variables are unimportant: given any expression in λ calculus, we can change the name of the parameter to a function as long as we change all free references to it inside the body.

So – for instance, if we had an expression like:


λ x . if (= x 0) then 1 else x^2

We can do an α to replace X with Y (written “α[x/y]” and get):


λ y . if (= y 0) then 1 else y^2

Doing α does not change the meaning of the expression in any way. But as we’ll see later, it’s important because without it, we’d often wind up with situations where a single variable symbol is bound by two different enclosing λs. This will be particularly important when we get to recursion.

β reduction is where things get interesting: this single rule is all that’s needed to make the λ calculus capable of performing any computation that can be done by a machine.

β basically says that if you have a function application, you can replace it with a copy of the body of the function with references to the parameter identifiers replaced by references to the parameter value in the application. That sounds confusing, but it’s actually pretty easy when you see it in action.

Suppose we have the application expression: (λ x . x + 1) 3. By performing a beta reduction, we can replace the application by taking the body x + 1 of the function, and substituting (or αing) the value of the parameter (3) for the parameter variable symbol (x). So we replace all references to x with 3. So the result of doing a beta reduction xs 3 + 1.

A slightly more complicated example is the expression:

λ y . (λ x . x + y)) q

It’s an interesting expression, because it’s a λ expression that when applied, results in another λ expression: that is, it’s a function that creates functions. When we do beta reduction in this, we’re replacing all references to the parameter y with the identifier q; so, the result is λ x . x + q.

One more example, just for the sake of being annoying. Suppose we have: (λ x y. x y) (λ z . z * z) 3

That’s a function that takes two parameters, and applies the first one to the second one. When we evaluate that, we replace the parameter x in the body of the first function with λ z . z * z; and we replace the parameter y with 3, getting: (λ z . z * z) 3. And we can perform beta on that, getting 3 * 3.

Written formally, beta says: λ x . B e = B[x := e] if free(e) ⊂ free(B[x := e])

That condition on the end, “if free(e) ⊂ free(B[x := e]” is why we need α: we can only do beta reduction if doing it doesn’t create any collisions between bound identifiers and free identifiers: if the identifier “z” is free in “e”, then we need to be sure that the beta-reduction doesn’t make “z” become bound. If there is a name collision between a variable that is bound in “B” and a variable that is free in “e”, then we need to use α to change the identifier names so that they’re different.

As usual, an example will make that clearer: Suppose we have a expression defining a function, λ z . (λ x . x+z). Now, suppose we want to apply it: (λ z . (λ x . x + z)) (x + 2). In the parameter (x + 2), x is free. Now, suppose we break the rule and go ahead and do beta. We’d get “λ x . x + x + 2“. The variable that was free in x + 2 is now bound! We’ve changed the meaning of the function, which we shouldn’t be able to do. If we were to apply that function after the incorrect β, we’d get (λ x . x + x + 2) 3. By beta, we’d get 3 + 3 + 2, or 8.

What if we did α the way we were supposed to?

First, we’d do an α to prevent the name overlap. By α[x/y], we would get λ z . (λ y . y + z) (x+2).

Then by β, we’d get “λ y . y + x + 2“. If we apply this function the way we did above, then by β, we’d get 3+x+2.
3+x+2 and 3+3+2 are very different results!

And that’s pretty much it. There’s another optional rule you can add called η-conversion. η is a rule that adds extensionality, which provides a way of expressing equality between functions.

η says that in any λ expression, I can replace the value f with the value g if/f for all possible parameter values x, f x = g x.

What I’ve described here is Turing complete – a full effective computation system. To make it useful, and see how this can be used to do real stuff, we need to define a bunch of basic functions that allow us to do math, condition tests, recursion, etc. I’ll talk about those in my next post.

It’l also important to point out that while I’ve gone through a basic definition of λ calculus, and described its mechanics, I haven’t yet defined a model for λ-calculus. That’s quite an important omission! λ-calculus was played with by logicians for several years before they were able to come up with a complete model for it, and it was a matter of great concern that although it looked correct, the early attempts to define a model for it were failures! And without a valid model, the results of the system are meaningless. An invalid model in a logical system like calculus is like a contradiction in axioms: it means that nothing that it produces is valid.

Admin: Spam Burst

Just a quick administrative note. For the last few days, there’s been a huge increase in spam comments. As a result, I’m tightening up the spam filters to prevent them from getting posted even for a brief period. As a result, legitimate comments may take longer to appear. If you have a comment that should have appeared, and it still hasn’t shown up after a few hours, feel free to send me an email, and I’ll bounce over and release it as soon as possible.

The Banach-Tarski non-Paradox

For some reason, lately I’ve been seeing a bunch of mentions of Banach Tarski. B-T is a fascinating case of both how counter-intuitive math can be, and also how profoundly people can misunderstand things.

For those who aren’t familiar, Banach-Tarski refers to a topological/measure theory paradox. There are several variations on it, all of which are equivalent.

The simplest one is this: Suppose you have a sphere. You can take that sphere, and slice it into a finite number of pieces. Then you can take those pieces, and re-assemble them so that, without any gaps, you now have two spheres of the exact same size as the original.

Alternatively, it can be formulated so that you can take a sphere, slice it into a finite number of pieces, and then re-assemble those pieces into a bigger sphere.

This sure as heck seems wrong. It’s been cited as a reason to reject the axiom of choice, because the proof that you can do this relies on choice. It’s been cited by crackpots like EE Escultura as a reason for rejecting the theory of real numbers. And there are lots of attempts to explain why it works. For example, there’s one here that tries to explain it in terms of density. There’s a very cool visualization of it here, which tries to make sense of it by showing it in the hyperbolic plane. Personally, most of the attempts to explain it intuitively drive me crazy. One one level, intuitively, it doesn’t, and can’t make sense. But on another level, it’s actually pretty simple. No matter how hard you try, you’re never going to make the idea of turning a finite-sized object into a larger finite-sized object make sense. But on another level, once you think about infinite sets – well, it’s no problem.

The thing is, when you think about it carefully, it’s not really all that odd. It’s counterintuitive, but it’s not nearly as crazy as it sounds. What you need to remember is that we’re talking about a mathematical sphere – that is, an infinite collection of points in a space with a particular set of topological and measure relations.

Here’s an equivalent thing, which is a bit simpler to think about:

Take a line segment. How many points are in it? It’s infinite. So, from that infinite set, remove an infinite set of points. How many points are left? It’s still infinite. Now you’ve got two infinite sets of the same size. So, now you can use one of the sets to create the original line segment, and you can use the second one to create a second, identical line segment.

Still counterintuitive, but slightly easier.

How about this? Take the set of all natural numbers. Divide it into two sets: the set of even naturals, and the set of odd naturals. Now you have two infinite sets,
the set {0, 2, 4, 6, 8, …}, and the set {1, 3, 5, 7, 9, …}. The size of both of those sets is the ω – which is also the size of the original set you started with.

Now take the set of even numbers, and map it so that for any given value i, f(i) = i/2. Now you’ve got a copy of the set of natural numbers. Take the set of odd naturals, and map them with g(i) = (i-1)/2. Now you’ve got a second copy of the set of natural numbers. So you’ve created two identical copies of the set of natural numbers out of the original set of natural numbers.

The problem with Banach-Tarski is that we tend to think of it less in mathematical terms, and more in concrete terms. It’s often described as something like “You can slice up an orange, and then re-assemble it into two identical oranges“. Or “you can cut a baseball into pieces, and re-assemble it into a basketball.” Those are both obviously ridiculous. But they’re ridiculous because they violate one of our instinct that derives from the conservation of mass. You can’t turn one apple into two apples: there’s only a specific, finite amount of stuff in an apple, and you can’t turn it into two apples that are identical to the original.

But math doesn’t have to follow conservation of mass in that way. A sphere doesn’t have a mass. It’s just an uncountably infinite set of points with a particular collection of topological relationship and geometric relationships.

Going further down that path: Banach-Tarski relies deeply of the axiom of choice. The “pieces” that you cut have non-measurable volume. You’re “cutting” from the collection of points in the sphere in a way that requires you to make an uncountably infinite number of distinct “cuts” to produce each piece. It’s effectively a geometric version of “take every other real number, and put them into separate sets”. On that level, because you can’t actually do anything like that, it’s impossible and ridiculous. But you need to remember: we aren’t talking about apples or baseballs. We’re talking about sets. The “slices” in B-T aren’t something you can cut with a knife – they’re infinitely subdivided, not-contiguous pieces. Nothing in the real world has that property, and no real-world process has the ability to cut like that. But we’re not talking about the real world; we’re talking about abstractions. And on the level of abstractions, it’s no stranger than creating two copies of the set of real numbers.

Categorical Computation Characterized By Closed Cartesian Categories

One of my favorite categorical structures is a thing called a closed cartesian category, or CCC for short. Since I’m a computer scientist/software engineer, it’s a natural: CCCs are, basically, the categorical structure of lambda calculus – and thus, effectively, a categorical model of computation. However, before we can talk about the CCCs, we need – what else? – more definitions.

Cartesian Categories

A cartesian category C (note not cartesian closed category) is a category:

  1. With a terminal object t, and
  2. forall a, b in Obj(C), the objects and arrows of the categorical product a times b in C.

So, a cartesian category is a category closed with respect to product. Many of the common categories are cartesian: the category of sets, and the category of enumerable sets, And of course, the meaning of the categorical product in set? Cartesian product of sets.

Categorical Exponentials

To get from cartesian categories to cartesian closed categories, we also need to define categorical exponentials. Like categorical product, the value of a categorical exponential is not required to included in a category. The exponential is a complicated definition, and it’s a bit hard to really get your head around, but it’s well worth the effort. If categorical products are the categorical generalization of set products, then the categorical exponential is the categorical version of a function space. It gives us the ability to talk about structures that are the generalized version of “all functions from A to B”.

Given two objects x and y from a category C, their categorical exponential xy, if it exists in the category, is defined by a set of values:

  • An object x^y,
  • An arrow mbox{eval}_{y,x}: x^y times y rightarrow x, called an evaluation map.
  • forall z in Obj(C), an operation Lambda_C: (z times y rightarrow x) rightarrow (z rightarrow x^y). (That is, an operation mapping from arrows to arrows.)

These values must have the following properties:

  1. forall f : z times y rightarrow x, g : z rightarrow x^y:
    • mbox{val}_{y,x} circ (Lambda_C(f)times 1_y)
    • forall f : z times y rightarrow x, g : z rightarrow x^y: Lambda_C(mbox{eval}_{y,x} circ (z times 1_y) = z

To make that a bit easier to understand, let’s turn it into a diagram.

exponent.jpg

As I alluded to earlier, you can also think of it as a generalization of a function space.x^y is the set of all functions from y to x. The evaluation map is simple description in categorical terms of an operation that applies a function from a to b (an arrow) to a value from a, resulting in an a value from b.

So what does the categorical exponential mean? I think it’s easiest to explain in terms of sets and functions first, and then just step it back to the more general case of objects and arrows.

If X and Y are sets, then X^Y is the set of functions from Y to X.

Now, look at the diagram:

  1. The top part says, basically, that g is a function from Z to to X^Y: so g takes a member of Z, and uses it to select a function from Y to X.
  2. The vertical arrow says:
    1. given the pair (z,y), f(z,y) maps (z,y) to a value in X.
    2. given a pair (z,y), we’re going through a function. It’s almost like currying:
      1. The vertical arrow going down is basically taking g(z,y), and currying it to g(z)(y).
      2. Per the top part of the diagram, g(z) selects a function from y to x. (That is, a member of X^Y.)
      3. So, at the end of the vertical arrow, we have a pair (g(z), y).
    3. The “eval” arrow maps from the pair of a function and a value to the result of applying the function to the value.

    Cartesian Closed Categories

    Now – the abstraction step is actually kind of easy: all we’re doing is saying that there is a structure of mappings from object to object here. This particular structure has the essential properties of what it means to apply a function to a value. The internal values and precise meanings of the arrows connecting the values can end up being different things, but no matter what, it will come down to something very much like function application.

    With exponentials and products, we can finally say what the cartesian closed categories (CCCs). A Cartesian closed category is a category that is closed with respect to both products and exponentials.

    Why do we care? Well, the CCCs are in a pretty deep sense equivalent to the simply typed lambda calculus. That means that the CCCs are deeply tied to the fundamental nature of computation. The structure of the CCCs – with its closure WRT product and exponential – is an expression of the basic capability of an effective computing system. So next, we’ll take a look at a couple of examples of what we can do with the CCCs as a categorical model of computation.

Audiophiles and the Need to be Special

I love laughing at audiophiles.

If you’re not familiar with the term, audiophiles are people who are really into top-end audio equipment. In itself, that’s fine. But there’s a very active and vocal subset of the audiophile community that’s built up their self-image around the idea that they’re special. They don’t just have better audio equipment than you do, but they have better appreciation of sound quality than you do. In fact, their hearing is better than yours. They can hear nuances in sound quality that you can’t, because they’re so very, very special. They’ve developed this ability, you see, because they care more about music than you do.

It’s a very human thing. We all really want to be special. And when there’s something that’s really important to us – like music is for many people – there’s a very natural desire to want to be able to appreciate it on a deep level, a special level reserved only for people who really value it. But what happens when you take that desire, and convince yourself that it’s not just a desire? You wind up turning into a sucker who’s easy to fleece for huge quantities of money on useless equipment that can’t possibly work.

I first learned about these people from my old friend John Vlissides. John died of brain cancer about 5 years ago, which was incredibly sad. But back in the day, when we both worked at IBM Research, he and I were part of a group that ate lunch together every day. John was a reformed audiophile, and used to love talking about the crazy stuff he used to do.

Audiophiles get really nutty about things like cables. For example, John used to have the cables linking his speakers to his amp suspended from the ceiling using non-conductive cord. The idea behind that is that electrical signals are carried, primarily, on the outer surface of the wire. If the cable was sitting on the ground, it would deform slighly, and that would degrade the signal. Now, of course, there’s no perceptible difference, but a dedicated audiophile can convince themselves that they can hear it. In fact, this is what convinced John that it was all craziness: he was trained as an electrical engineer, and he sat down and worked out how much the signal should change as a result of the deformation of the copper wire-core, and seeing the real numbers, realized that there was no way in hell that he was actually hearing that tiny difference. Right there, that’s an example of the math aspect of this silliness: when you actually do the math, and see what’s going on, even when there’s a plausible explanation, the real magnitude of the supposed effect is so small that there’s absolutely no way that it’s perceptible. In the case of wire deformation, the magnitude of the effect on the sound produced by the signal carried by the wire is so small that it’s essentially zero – we’re talking about something smaller than the deformation of the sound waves caused by the motion of a mosquito’s wings somewhere in the room.

John’s epiphany was something like 20 years ago. But the crazy part of the audiophile community hasn’t changed. I encountered two instances of it this week that reminded me of this silliness and inspired me to write this post. One was purely accidental: I just noticed it while going about my business. The other, I noticed on boing-boing because the first example was already in my mind.

First, I was looking for an HDMI video cable for my TV. At the moment, we’ve got both an AppleTV and a cable box hooked up to our TV set. We recently found out that under our cable contract, we could get a free upgrade of the cable box, and the new box has HDMI output – so we’d need a new cable to use it.

HDMI is a relatively new standard video cable for carrying digital signals. Instead of old-fashioned analog signals that emulate the signal recieved by a good-old TV antenna like we used to use, HDMI uses a digital stream for both audio and video. Compared to old-fashioned analog, the quality of both audio and video on a TV using HDMI is dramatically improved. Analog signals were designed way, way back in the ’50s and ’60s for the televisions that they were producing then – they’re very low fidelity signals, which are designed to produce images on old TVs, which had exceedingly low resolution by modern standards.

The other really great thing about a digital system like HDMI is that digital signals don’t degrade. A digital system takes a signal, and reduces it to a series of bits – signals that can be interpreted as 1s and 0s. That series of bits is divided into bundles called packets. Each packet is transmitted with a checksum – an additional number that allows the receiver to check that it received the packet correctly. So for a given packet of information, you’ve either received it correctly, or you didn’t. If you didn’t, you request the sender to re-send it. So you either got it, or you didn’t. There’s no in-between. In terms of video quality, what that means is that the cable really doesn’t matter very much. It’s either getting the signal there, or it isn’t. If the cable is really terrible, then it just won’t work – you’ll get gaps in the signal where the bad packets dropped out – which will produce a gap in the audio or video.

In analog systems, you can have a lot of fuzz. The amplitude of the signal at any time is the signal – so noise effects that change the amplitude are changing the signal. There’s a very real possibility that interference will create real changes in the signal, and that those changes will produce a perceptible result when the signal is turned into sound or video. For example, if you listen to AM radio during a thunderstorm, you’ll hear a burst of noise whenever there’s a bolt of lightning nearby.

But digital systems like HDMI don’t have varying degrees of degradation. Because the signal is reduced to 1s and 0s – if you change the amplitude of a 1, it’s still pretty much going to look like a one. And if the noise is severe enough to make a 1 look like a 0, the error will be detected because the checksum will be wrong. There’s no gradual degradation.

But audiophiles… ah, audiophiles.

I was looking at these cables. A basic six-foot-long HDMI cable sells for between 15 and 25 dollars. But on the best-buy website, there’s a clearance cable for just $12. Great! And right next to it, there’s another cable. Also six feet long. For $240 dollars! 20-times higher, for a friggin’ digital cable! I’ve heard, on various websites, the rants about these crazies, but I hadn’t actually paid any attention. But now, I got to see it for myself, and I just about fell out of my chair laughing.

To prolong the entertainment, I went and looked at the reviews of this oh-so-amazing cable.

People who say there is NO difference between HDMI cables are just trying to justify to themselves to go cheap. Now it does depend on what you are connecting the cable between. If you put this Carbon HDMI on a Cable or Satellite box, you probably won’t see that much of a difference compared to some middle grade cables.

I connected this cable from my PS3 to my Samsung to first test it, then to my receiver. It was a nice upgrade from my previous Cinnamon cable, which is already a great cable in it’s own right. The picture’s motion was a bit smoother with gaming and faster action, but I still want to check the link to the guide about gaming monitors my fried sent me. I also noticed that film grain looked a little cleaner, not sure why though.

The biggest upgrade was with my audio though. Everything sounded a little crisper with more detail. I also noticed that the sound fields were more distinct. Again not sure exactly why, but I will take the upgrade.

All and all if you want the best quality, go Audio Quest and specifically a Carbon HDMI. You never have to upgrade your HDMI again with one of these guys. Downfall though is that it is a little pricey.

What’s great about it: Smooth motion and a little more definition in the picture

What’s not so great: Price

It’s a digital cable. The signal that it delivers to your TV and stereo is not the slightest bit different from the signal delivered by the $12 clearance cable. It’s been reduced by the signal producing system to a string of 1s and 0s – the identical string of 1s and 0s on both cables – and that string of bits is getting interpreted by exactly the same equipment on the receiver, producing exactly the same audio and video. There’s no difference. It has nothing to do with how good your ears are, or how perceptive you are. There is no difference.

But that’s nothing. The same brand sells a $700 cable. From the reviews:

I really just bought 3 of these. So if you would like an honest review, here it is. Compared to other Audio Quest cables, like the Vodka, you do not see a difference unless you know what to look for and have the equipment that can actually show the difference. Everyone can see the difference in a standard HDMI to an HDMI with Silver in it if you compare, but the difference between higher level cables is more subtle. Audio is the night and day difference with these cables. My bluray has 2 HDMI outs and I put one directly to the TV and one to my processor. My cable box also goes directly to my TV and I use Optical out of the TV because broadcast audio is aweful. The DBS systems keeps the cable ready for anything and I can tell that my audio is clean instantly and my picture is always flawless. They are not cheap cables, they are 100% needed if you want the best quality. I am considering stepping up to Diamond cables for my theater room when I update it. Hope this helps!

And they even have a “professional quality” HDMI cable that sells for well over $1000. And the audiophiles are all going crazy, swearing that it really makes a difference.

Around the time I started writing this, I also saw a post on BoingBoing about another audiophile fraud. See, when you’re dealing with this breed of twit who’s so convinced of their own great superiority, you can sell them almost anything if you can cobble together a pseudoscientific explanation for why it will make things sound better.

This post talks about a very similar shtick to the superexpensive cable: it’s a magic box which… well, let’s let the manufacturer explain.

The Blackbody ambient field conditioner enhances audio playback quality by modifying the interaction of your gear’s circuitry with the ambient electromagnetic field. The Blackbody eliminates sonic smearing of high frequencies and lowers the noise floor, thus clarifying the stereo image.

This thing is particularly fascinating because it doesn’t even pretend to hook in to your audio system. You just position it close to your system, and it magically knows what equipment it’s close to and “harmonizes” everything. It’s just… magic! But if you’re really special, you’ll be able to tell that it works!

Hydrinos: Impressive Free Energy Crackpottery

Back when I wrote about the whole negative energy rubbish, a reader wrote to me, and asked me to write something about hydrinos.

For those who are lucky enough not to know about them, hydrinos are part of another free energy scam. In this case, a medical doctor named Randell Mills claims to have discovered that hydrogen atoms can have multiple states beyond the typical, familiar ground state of hydrogen. Under the right conditions, so claims Dr. Mills, the electron shell around a hydrogen atom will compact into a tighter orbit, releasing a burst of energy in the process. And, in fact, it’s (supposedly) really, really easy to make hydrogen turn into hydrinos – if you let a bunch of hydrogen atoms bump in to a bunch of Argon atoms, then presto! some of the hydrogen will shrink into hydrino form, and give you a bunch of energy.

Wonderful, right? Just let a bunch of gas bounce around in a balloon, and out comes energy!

Oh, but it’s better than that. There are multiple hydrino forms: you can just keep compressing and compressing the hydrogen atom, pushing out more and more energy each time. The more you compress it, the more energy you get – and you don’t really need to compress it. You just bump it up against another atom, and poof! energy.

To explain all of this, Dr. Mills further claims to have invented a new
form of quantum mechanics, called “grand unified theory of classical quantum mechanics” (CQM for short) which provides the unification between relativity and quantum mechanics that people have been looking for. And, even better, CQM is fully deterministic – all of that ugly probabilistic stuff from quantum mechanics goes away!

The problem is, it doesn’t work. None of it.

What makes hydrinos interesting as a piece of crankery is that there’s a lot more depth to it than to most crap. Dr. Mills hasn’t just handwaved that these hydrino things exist – he’s got a very elaborate detailed theory – with a lot of non-trivial math – to back it up. Alas, the math is garbage, but it’s garbage-ness isn’t obvious. To see the problems, we’ll need to get deeper into math than we usually do.

Let’s start with a couple of examples of the claims about hydrinos, and the kind of favorable clueless press they’ve received.

Here is an example of how hydrino supporters explain them:

In 1986 Randell Mills MD developed a theory that hydrogen atoms could shrink, and release lots of energy in the process. He called the resultant entity a “Hydrino” (little Hydrogen), and started a company called Blacklight Power, Inc. to commercialize his process. He published his theory in a book he wrote, which is available in PDF format on his website. Unfortunately, the book contains so much mathematics that many people won’t bother with it. On this page I will try to present the energy related aspect of his theory in language that I hope will be accessible to many.

According to Dr. Mills, when a hydrogen atom collides with certain other atoms or ions, it can sometimes transfer a quantity of energy to the other atom, and shrink at the same time, becoming a Hydrino in the process. The atom that it collided with is called the “catalyst”, because it helps the Hydrino shrink. Once a Hydrino has formed, it can shrink even further through collisions with other catalyst atoms. Each collision potentially resulting in another shrinkage.

Each successive level of shrinkage releases even more energy than the previous level. In other words, the smaller the Hydrino gets, the more energy it releases each time it shrinks another level.

To get an idea of the amounts of energy involved, I now need to introduce the concept of the “electron volt” (eV). An eV is the amount of energy that a single electron gains when it passes through a voltage drop of one volt. Since a volt isn’t much (a “dry cell” is about 1.5 volts), and the electric charge on an electron is utterly minuscule, an eV is a very tiny amount of energy. Nevertheless, it is a very representative measure of the energy involved in chemical reactions. e.g. when Hydrogen and Oxygen combine to form a water molecule, about 2.5 eV of energy is released per water molecule formed.

When Hydrogen shrinks to form a second level Hydrino (Hydrogen itself is considered to be the first level Hydrino), about 41 eV of energy is released. This is already about 16 times more than when Hydrogen and Oxygen combine to form water. And it gets better from there. If that newly formed Hydrino collides with another catalyst atom, and shrinks again, to the third level, then an additional 68 eV is released. This can go on for quite a way, and the amount gets bigger each time. Here is a table of some level numbers, and the energy released in dropping to that level from the previous level, IOW when you go from e.g. level 4 to level 5, 122 eV is released. (BTW larger level numbers represent smaller Hydrinos).

And some of the press:

Notice a pattern?

The short version of the problem with hydrinos is really, really simple.

The most fundamental fact of nature that we’ve observed is that everything tends to move towards its lowest energy state. The whole theory of hydrinos basically says that that’s not true: everything except hydrogen tends to move towards its lowest energy state, but hydrogen doesn’t. It’s got a dozen or so lower energy states, but none of the abundant quantities of hydrogen on earth are ever observed in any of those states unless they’re manipulated by Mills magical machine.

The whole basis of hydrino theory is Mills CQM. CQM is rubbish – but it’s impressive looking rubbish. I’m not going to go deep into detail; you can see a detailed explanation of the problems here; I’ll run through a short version.

To start, how is Mills claiming that hydrinos work? In CQM, he posits the existence of electron shell levels closer to the nucleus than the ground state of hydrogen. Based on his calculations, he comes up with an energy figure for the difference between the ground state and the hydrino state. Then he finds other substances that have the property that boosting one electron into a higher energy state would cost the same amount of energy. When a hydrogen atom collides with an atom that has a matching electron transition, the hydrogen can get bumped into the hydrino state, while kicking an electron into a higher orbital. That electron will supposedly, in due time, fall back to its original level, releasing the energy differential as a photon.

On this level, it sort-of looks correct. It doesn’t violate conservation of energy: the collision between the two atoms doesn’t produce anything magical. It’s just a simple transfer of energy. That much is fine.

It’s when you get into the details that it gets seriously fudgy.

Right from the start, if you know what you’re doing, CQM goes off the rails. For example, CQM claims that you can describe the dynamics of an electron in terms of a classical wave charge-density function equation. Mills actually gives that function, and asserts that it respects Lorentz invariance. That’s crucial – Lorentz invariance is critical for relativity: it’s the fundamental mathematical symmetry that’s the basis of relativity. But his equation doesn’t actually respect Lorentz invariance. Or, rather, it does – but only if the electron is moving at the speed of light. Which it can’t do.

Mills goes on to describe the supposed physics of hydrinos. If you work through his model, the only state that is consistent with both his equations, and his claim that the electrons orbit in a spherical shell above the atom – well, if you do that, you’ll find that according to his own equations, there is only one possible state for a hydrogen atom – the conventional ground state.

It goes on in that vein for quite a while. He’s got an elaborate system, with an elaborate mathematical framework… but none of the math actually says what he says it says. The Lorentz invariance example that I cited above – that’s typical. Print an equation, say that it says X, even though the equation doesn’t say anything like X.

But we can go a bit further. The fundamental state of atoms is something that we understand pretty well, because we’ve got so many observations, and so much math describing it. And the thing is, that math is pretty damned convincing. That doesn’t mean that it’s correct, but it does mean that any theory that wants to replace it must be able to describe everything that we’ve observed at least as well as the current theory.

Why do atoms have the shape that they do? Why are the size that they are? It’s not a super easy thing to understand, because electrons aren’t really particles. They’re something strange. We don’t often think about that, but it’s true. They’re deeply bizarre things. They’re not really particles. Under many conditions, they behave more like waves than like particles. And that’s true of the atom.

The reason that atoms are the size that they are is because the electron “orbitals” have sizes and shapes that are determined by resonant frequencies of the wave-like aspects of electrons. What Mills is suggesting is that there are a range of never-before observed resonant frequencies of electrons. But the math that he uses to support that claim just doesn’t work.

Now, I’ll be honest here. I’m not nearly enough of a physics whiz to be competent to judge the accuracy of his purported quantum mechanical system. But I’m still pretty darn confident that he’s full of crap. Why?

I’m from New Jersey – pretty much right up the road from where his lab is. Going to college right up the road from him, I’ve been hearing about his for a long time. He’s been running this company for quite a while – going on two decades. And all that time, the company has been constantly issuing press releases promising that it’s just a year away from being commercialized! It’s always one step away. But never, never, has he released enough information to let someone truly independent verify or reproduce his results. And he’s been very deceptive about that: he’s made various claims about independent verification on several occasions.

For example, he once cited that his work had been verified by a researcher at Harvard. In fact, he’d had one of his associates rent a piece of equipment at Harvard, and use it for a test. So yes, it was tested by a researcher – if you count his associate as a legitimate researcher. And it was tested at Harvard. But the claim that it was tested by a researcher at Harvard is clearly meant to imply that it was tested by a Harvard professor, when it wasn’t.

For something around 20 years, he’s been making promises, giving very tightly controlled demos, refusing to give any real details, refusing to actually explain how to reproduce his “results”, and promising that it’s just one year away from being commercialized!

And yet… hydrogen is the most common substance in the universe. If it really had a lower energy state that what we call it’s ground state, and that lower energy state was really as miraculous as he claims – why wouldn’t we see it? Why hasn’t it ever been observed? Substances like Argon are rare – but they’re not that rare. Argon has been exposed to hydrogen under laboratory conditions plenty of times – and yet, nothing anamalous has even been observed. All of the supposed hydrino catalysts have been observed so often under so many conditions – and yet, no anamolous energy has even been noticed before. But according to Mills, we should be seeing tons of it.

And that’s not all. Mills also claims that you can create all sorts of compounds with hydrinos – and naturally, every single one of those compounds is positively miraculous! Bonded with silicon, you get better semiconductors! Substitute hydrinos for regular hydrogen in a battery electrolyte, and you get a miracle battery! Use it in rocket fuel instead of common hydrogen, and you get a ten-fold improvement in the performance of a rocket! Make a laser from it, and you can create higher-density data storage and communication systems. Everything that hydrinos touch is amazing

But… not one of these miraculous substances has ever been observed before. We work with silicon all the time – but we’ve never seen the magic silicon hydrino compound. And he’s never been willing to actually show anyone any of these miracle substances.

He claims that he doesn’t show it because he’s protecting his intellectual property. But that’s silly. If hydrinos existed, then just telling us that these compounds exist and have interesting properties should be enough for other labs to go ahead and experiment with producing them. But no one has. Whether he shows the supposed miracle compounds or not doesn’t change anyone else’s ability to produce those. Even if he’s keeping his magic hydrino factory secret, so that no one else has access to hydrinos, by telling us that these compounds exist, he’s given away the secret. He’s not protecting anything anymore: by publically talking about these things, he’s given up his right to patent the substances. It’s true that he still hasn’t given up the rights to the process of producing them – but publicly demonstrating these alleged miracle substances wouldn’t take away any legal rights that he hasn’t already given up. So, why doesn’t he show them to you?

Because they don’t exist.