Dishonest Dembski: the Universal Probability Bound
One of the dishonest things that Dembski frequently does that really bugs me is take bogus arguments, and dress them up using mathematical terminology and verbosity to make them look more credible.
An example of this is Dembski’s *universal probability bound*. Dembski’s definition of the UPB from the [ISCID online encyclopedia][upb-icsid] is:
>A degree of improbability below which a specified event of that probability
>cannot reasonably be attributed to chance regardless of whatever
>probabilistic resources from the known universe are factored in. Universal
>probability bounds have been estimated anywhere between 10^-50 (Emile Borel)
>and 10^-150 (William Dembski).
He’s quantified it in several different ways. I’ve found three different versions of the calculation of the UPB: two of them from Wikipedia, and one from a message thread at ISCID which the author claims is a quote from one of Dembski’s books.
Let’s look at Dembski’s own words first:
>Specifically, within the known physical universe there are estimated to be no
>more than 10^80 elementary particles. Moreover, the properties of matter are
>such that transitions from one state to another cannot occur at a rate faster
>than 10^45 times per second. Finally, the universe itself is about a billion
>times younger than 10^25 seconds (assuming the universe is around 10 to 20
>billion years old). ….these cosmological constraints imply that the total
>number of specified events throughout cosmic history cannot exceed
>10^80 × 10^45 × 10^25 = 10^150.
He goes on to assert that this is the “maximum number of trials” that could have occurred since the beginning of the universe, and that it is not reasonable to attribute any observed event less probable than that to chance.
Wikipedia presents this definition, and a more recent one which lowers the UPB, but as they don’t provide all of the details of the equation, I’ll skip it for now. Wikipedia’s explanation of this original form of the UPB is:
>Dembski’s original value for the universal probability bound is 1 in 10^150,
>derived as the inverse of the product of the following approximate
>quantities:[11]
>
> * 10^80, the number of elementary particles in the observable
> universe.
> * 10^45, the maximum rate per second at which transitions in
> physical states can occur (i.e., the inverse of the Planck time).
> * 10^25, a billion times longer than the typical estimated age of
> the universe in seconds.
>
>Thus, 10^150 = 10^80 × 10^45 × 10^25.
>Hence, this value corresponds to an upper limit on the number of physical
>events that could possibly have occurred since the big bang.
Here’s the fundamental dishonesty: None of those numbers have *anything* to do with what he’s supposedly trying to prove. He’s trying to create a formal-sounding version of the big-number problem by throwing together a bunch of fancy-sounding numbers, multiplying them together, and claiming that they somehow suddenly have meaning.
But they don’t.
It’s actually remarkably easy to show what utter nonsense this is. I’ll do a fancy one first, and a trivial one second.
Let’s create an incredibly simplified model of a region of space. Let’s say we have a cube of space, 1 kilometer on a side. Further, let’s suppose that this space contains 1000 particles, and they are all electrons. And further, let’s suppose that each 1mm cube in this cubic kilometer can only have one electron in it.
This is a model which is so much simpler than reality that it’s downright silly. But everything about the real world would make it more complex, and it’s sufficient for our purposes.
Now: consider the probability of any *configuration* of the electrons in the region of space. A configuration is a selection of the set of 1mm cubes that contain electrons. The number of different configurations of this region of space is (10^9)!/((1000!)×((10^9−1000)!)). That works out to (10^9×(10^9−1)×(10^9−2)×…×(10^9−999))/(1000!).
1000! is roughly 4×10^2567 according to my scheme interpreter. We’ll be generous, and round it up to 10^2568, to make things easier. To estimate the numerator, we can treat it as (10^9)×((10^8)^999), which is much smaller than the real value. That’s 10^8001. So the probability of any particular configuration within that cube is smaller than 1 in 10^5433.
So any state of particles within that cube is an event with probability considerably smaller than 1 in 10^5433 – far below Dembski’s bound. So what Dembski is saying is that *every* possible configuration of matter in space in the entire universe is impossible without intelligent intervention.
And the trivial one? Grab two decks of distinguishable cards. Shuffle them together, and lay them out for a game of spider solitaire. What’s the probability of that particular lay of cards? 1 in 104! – that is, very roughly, 1 in 10^166. Is god personally arranging my cards every time I play spider?
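If you want to check those numbers yourself, here’s a quick back-of-the-envelope verification in Python (my own sketch, using the log-gamma function to keep the arithmetic in log space; none of this comes from Dembski):

    from math import lgamma, log

    LOG10 = log(10)

    def log10_factorial(n):
        # log10(n!) via the log-gamma function: lgamma(n+1) = ln(n!)
        return lgamma(n + 1) / LOG10

    def log10_binomial(n, k):
        # log10 of the binomial coefficient C(n, k)
        return log10_factorial(n) - log10_factorial(k) - log10_factorial(n - k)

    # Number of ways to choose which 1000 of the 10^9 cells hold an electron:
    print(log10_binomial(10**9, 1000))   # ~6432: about 10^6432 configurations
    # Number of orderings of two shuffled decks (104 distinguishable cards):
    print(log10_factorial(104))          # ~166: 104! is about 10^166

The exact configuration count, about 10^6432, is even larger than the deliberately low estimate above – and both numbers dwarf Dembski’s 10^150.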
Anyone who’s ever taken any class on probability *knows* this stuff. One college level intro, and you know that routine daily events can have incredibly small probabilities – far smaller than his alleged UPB. But Dembski calls himself a mathematician, and he writes about probability quite frequently. As much as I’ve come to believe that he’s an idiot, things like this just don’t fit: he *must* know that this is wrong, but he continues to publish it anyway.
[upb-icsid]: http://www.iscid.org/encyclopedia/Universal_Probability_Bound
Skewing Statistics for Politics
As I’ve frequently said, statistics is an area which is poorly understood by most people, and as a result, it’s an area which is commonly used to mislead people. The thing is, when you’re working with statistics, it’s easy to find a way of presenting some value computed from the data that will appear to support a predetermined conclusion – regardless of whether the data as a whole supports that conclusion. Politicians and political writers are some of the worst offenders at this.
Case in point: over at [powerline][powerline], they’re reporting:
>UPI reports that Al Gore’s movie, An Inconvenient Truth, hasn’t done so well
>after a promising start:
>
>> Former U.S. vice-President Al Gore’s documentary “An Inconvenient Truth”
>>has seen its ticket sales plummet after a promising start.
>>
>>After Gore’s global warming documentary garnered the highest average per play
>>ever for a film documentary during its limited Memorial Day weekend opening,
>>recent theater takes for the film have been less than stellar, Daily Variety
>>reports.
>>
>> The film dropped from its record $70,333 per play to $12,334 during its
>>third week and its numbers have continued to fall as the film opens in smaller
>>cities and suburbs across the country.
>
>It’s no shock, I suppose, that most people aren’t interested in seeing
>propaganda films about the weather. But the topic is an interesting and
>important one which we wrote about quite a few years ago, and will try to
>return to as time permits.
So: they’re quoting a particular figure: *dollars per screen-showing*, as a measure of how the movie is doing. The thing is, that’s a pretty darn weird statistic. Why would they use dollars/screen-showing, instead of total revenue?
Because it’s the one statistic that lets them support the conclusion that they wanted to draw. What are the real facts? Official box office statistics for gross per weekend (rounded to the nearest thousand):
* May 26: $281,000 (in 4 theaters)
* June 2: $1,356,000 (in 77 theaters)
* June 9: $1,505,000 (in 122 theaters)
* June 16: $1,912,000 (in 404 theaters)
* June 23: $2,016,000 (in 514 theaters)
Each weekend, it has made more money than the previous weekend. (Those are per weekend numbers, not cumulative. The cumulative gross for the movie is $9,630,000.)
But the per-showing gross has gone down. Why? Because when it was first released, it was being shown in a small number of showings in a small number of theaters. When it premiered in 4 theaters, the showings sold out – standing room only – so the gross per showing was very high. Now, four weeks later, it’s showing in over 500 theaters, and the individual showings aren’t selling out anymore. But *more people* are seeing it – every weekend, the number of people seeing it has increased!
The Powerline and UPI articles are playing games with numbers to skew the results. They *want* to say that Al Gore’s movie is tanking in the theaters, so they pick a bizarre statistic to support that, even though it’s highly misleading. In fact, it’s one of the best performing documentaries *ever*. It’s currently the number seven grossing documentary of all time, and it’s about $600,000 away from becoming number five.
What was the per-theater (note, not per-showing, but per-theater) gross for the last Star Wars movie four weeks into its run? $4,500/theater (at 3,322 theaters), according to [Box Office Mojo][bom]. So, if we want to use reasoning a lot like Powerline’s, we can argue that Al Gore’s movie is doing *as well as Star Wars* on a dollars/theater-weekend basis.
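Just to make the comparison concrete, here’s the arithmetic, using the weekend figures quoted above (the numbers come from the box office reports cited in this post; the code only does the division):

    # Weekend gross and theater counts, as quoted above.
    weekends = [
        ("May 26",  281_000,   4),
        ("June 2",  1_356_000, 77),
        ("June 9",  1_505_000, 122),
        ("June 16", 1_912_000, 404),
        ("June 23", 2_016_000, 514),
    ]

    for date, gross, theaters in weekends:
        print(f"{date}: ${gross:,} total, ${gross / theaters:,.0f} per theater")

The last line works out to roughly $3,900 per theater for the June 23 weekend – right in the neighborhood of that $4,500/theater Star Wars figure.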
But that would be stupid, wouldn’t it?
[bom]: http://www.boxofficemojo.com/movies/?page=weekend&id=starwars3.htm
[powerline]: http://powerlineblog.com/archives/014530.php
Good Math, Bad Math might be in trouble later this week
I just received some email that would seriously worry me if it weren’t for the fact that I’m not an idiot.
WARNING!
TERRORISTS are going to ATTACK NEW YORK!
There is NO TIME to waste! Go read http://www.truebiblecode.com/nyc.html!!!!!!
Going there, I find:
We are now 98% confident that the UN Plaza will be hit by a terrorist nuclear bomb between Thursday evening June 29th and Tuesday evening July 4th, 2006
It is certainly true that: No nukes is good nukes! But just because we got the date wrong (3 times) does not mean that the scriptural threat has evaporated. It is still there in black and white in bible symbolism. So we still have the almost impossible task of persuading a typical New Yorker with faith in God, that the Bible predicts the very day and place of the first terrorist nuke. There is obviously a massive credibility gap between: “Here endeth the lesson” and “Here endeth NYC”. But every journey, however long, begins with one small step. So here is our attempt to fill that gap.
Firstly we again strongly advise anyone in New York City with any faith in God, whatever his religion or whatever his distrust of organised religion, to take the last Thursday in June off and to get out of NYC for that weekend and not come back until the evening of July 4 if nothing happens.
You can then study this fascinating article outside NYC at your leisure, during that weekend or more to the point, after that weekend! It is going to be hard to find the necessary time during the next few days, given the busy schedule of every New Yorker, to sit down and fully analyze the fruits of 14 years of bible decoding and reach a rational decision about such a momentous prediction. So the sensible course of action might be to judge for yourself whether we are sincere in our efforts to decode the bible. And if you see that we are sincere, then rather than taking an intellectual walk from basic faith to accurate bible prophecy, just rely on all of the work that we have so far done and on the basis of your faith and our sincerity, take the weekend out of the city.
You gotta love this. Not only is it a splendid example of the worst kind of pseudo-numerological gibberish, but they admit to having been wrong three times already. But this time, *this time*, they’ve really got it right! We should trust them and get the hell out of NYC this weekend, no matter what it costs!
Categories: Products, Exponentials, and the Cartesian Closed Categories
Before I dive into the depths of today’s post, I want to clarify something. Last time, I defined categorical products. Alas, I neglected to mention one important point, which led to a bit of confusion in the comments, so I’ll restate the important omission here.
The definition of categorical product defines what the product looks like *if it’s in the category*. There is no requirement that a category include the products for all, or indeed for any, of its members. Categories are *not closed* with respect to categorical product.
That point leads up to the main topic of this post. There’s a special group of categories called the **Cartesian Closed Categories** that *are* closed with respect to product; and they’re a very important group of categories indeed.
However, before we can talk about the CCCs, we need to build up a bit more.
Cartesian Categories
——————–
A *cartesian category* C (note **not** cartesian *closed* category) is a category:
1. With a terminal object t, and
2. ∀ a, b ∈ Obj(C), the objects and arrows of the categorical product a×b are in C.
So, a cartesian category is a category that’s closed with respect to product. Many of the common categories are cartesian: the category of sets, the category of enumerable sets, and so on. And of course, what is the categorical product in the category of sets? The Cartesian product of sets.
Categorical Exponentials
————————
Like categorical product, the value of a categorical exponential is not *required* to be included in a category. Given two objects x and y from a category C, their *categorical exponential* x^y, if it exists in the category, is defined by a set of values:
* An object x^y,
* An arrow eval_y,x : x^y×y → x, called an *evaluation map*.
* ∀ z ∈ Obj(C), an operation Λ_C : (z×y → x) → (z → x^y). (That is, an operation mapping from arrows to arrows.)
These values must have the following properties:
1. ∀ f : z×y → x:
   * eval_y,x ∘ (Λ_C(f)×1_y) = f
2. ∀ g : z → x^y:
   * Λ_C(eval_y,x ∘ (g×1_y)) = g
To make that a bit easier to understand, let’s turn it into a diagram.
You can also think of it as a generalization of a function space. x^y is the set of all functions from y to x. The evaluation map is a simple description, in categorical terms, of an operation that applies a function from y to x (an arrow) to a value from y, resulting in a value from x.
(I added the following section after this was initially posted; a commenter asked a question, and I realized that I hadn’t explained enough here, so I’ve added the explanation.)
So what does the categorical exponential mean? I think it’s easiest to explain in terms of sets and functions first, and then just step it back to the more general case of objects and arrows.
If X and Y are sets, then X^Y is the set of functions from Y to X.
Now, look at the diagram:
* The top part says, basically, that g is a function from Z to X^Y: so g takes a member of Z, and uses it to select a function from Y to X.
* The vertical arrow says:
* given the pair (z,y), f(z,y) maps (z,y) to a value in X.
* given a pair (z,y), we’re going through a function. It’s almost like currying:
* The vertical arrow going down is basically taking f and currying it: instead of applying f to the pair (z,y), we map (z,y) to (g(z), y), and then apply g(z) to y.
* Per the top part of the diagram, g(z) selects a function from Y to X. (That is, a member of X^Y.)
* So, at the end of the vertical arrow, we have a pair (g(z), y).
* The “eval” arrow maps from the pair of a function and a value to the result of applying the function to the value.
Now – the abstraction step is actually kind of easy: all we’re doing is saying that there is a structure of mappings from object to object here. This particular structure has the essential properties of what it means to apply a function to a value. The internal values and precise meanings of the arrows connecting the values can end up being different things, but no matter what, it will come down to something very much like function application.
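If it helps to see that in code, here’s a minimal sketch in Python (my own illustration – there’s nothing categorical about the language itself; it’s just the currying made concrete): `curry` turns an arrow f : Z×Y → X into g : Z → X^Y, and `ev` applies a member of X^Y to a value of Y.

    from typing import Callable, Tuple, TypeVar

    X = TypeVar("X")
    Y = TypeVar("Y")
    Z = TypeVar("Z")

    def curry(f: Callable[[Z, Y], X]) -> Callable[[Z], Callable[[Y], X]]:
        # The Lambda operation: f : Z x Y -> X becomes g : Z -> (Y -> X).
        return lambda z: (lambda y: f(z, y))

    def ev(pair: Tuple[Callable[[Y], X], Y]) -> X:
        # The evaluation map: apply a function (a member of X^Y) to a value of Y.
        g, y = pair
        return g(y)

    f = lambda z, y: z * 10 + y
    # The defining property of the exponential: ev((curry(f)(z), y)) == f(z, y).
    assert ev((curry(f)(3), 4)) == f(3, 4)

The `curry`/`ev` pair is just the Λ_C and eval_y,x from the definition above, specialized to sets and functions.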
Cartesian Closed Categories
—————————-
With exponentials and products, we’re ready for the cartesian closed categories (CCCs):
A Cartesian closed category is a category that is closed with respect to both products and exponentials.
Why do we care? Well, the CCCs are in a pretty deep sense equivalent to the simply typed lambda calculus. That means that the CCCs are deeply tied to the fundamental nature of computation. The *structure* of the CCCs – with its closure WRT product and exponential – is an expression of the basic capability of an effective computing system.
We’re getting close to being able to get to some really interesting things. Probably one more set of definitions; and then we’ll be able to do things like show a really cool version of the classic halting problem proof.
A Great Quote about Methods
On my way home from picking up my kids from school, I heard a story on NPR that included a line from one of the editors of the New England Journal of Medicine, which I thought was worth repeating here.
They were discussing an article in this month’s NEJM about [vioxx/rofecoxib][nejm]. The article is a correction to an earlier NEJM article that concluded that the cardiac risks of Vioxx were not really significant until around 18 months of continued use. With more data available, it appears that the 18-month threshold was just an artifact, not a real phenomenon, and that the cardiac risks of Vioxx start very quickly.
[nejm]: http://content.nejm.org/cgi/reprint/NEJMc066260v1.pdf
As usual for something like this, the authors and the corporate sponsors of the work are all consulted in the publication of a correction or retraction. But in this case, the drug maker objected to the correction, because they wanted the correction to include an additional analysis, which appears to show an 18-month threshold for cardiac risk.
The editor’s response, paraphrased (since I heard it on the radio and don’t have the exact quote):
“That’s not how you do science. You don’t get all of the data, and then
look for an analysis that produces the results you want. You agree on the
analysis you’re going to do *before* you start the study, and then you use
the analysis that you said you were going to use.”
Hey [Geiers][geier], you listening?
[geier]: http://goodmath.blogspot.com/2006/03/math-slop-autism-and-mercury.html
Validity of Mathematical Modeling
In the comments to one of my earlier [Dembski][demsbki-info-theory] posts, I was responding to a comment by a poster, and realized that what we were discussing was probably interesting enough to promote up to a front-page post.
A poster named [Garote][garote] said:
>I think PaulC said exactly what needs to be said, that grants us all
>hearty permission to ignore Dembski’s work. I think it bears repeating:
>
>”If I am betting on coin flips and notice any kind of algorithmic pattern then
>there are legitimate reasons to conclude that the person flipping coins is not
>playing fair. However, this method does not allow one to discriminate between
>the product of evolution and the product of “intelligent design” since neither
>of these processes in any way resemble random coin flips.”
>
>Dembski’s feverishly comparing apples to oranges, to prove that one is an
>artichoke.
Now, my contempt for Dembski is pretty clear. But this argument against him struck me as making a mistake – and it’s a mistake that strikes at something near-and-dear to my interests: mathematical modeling. Modeling can be sorely abused, [as I’ve discussed before][modeling]; but I don’t want to throw out the baby with the bathwater.
My whole viewpoint on mathematics is that much of its value comes from **abstraction**: the ability to look at something complex; to identify a set of relevant properties for understanding some feature of it; and to use that to create a simpler mathematical model of the complex system based on its fundamental principles. That kind of abstraction isn’t just useful for diddling around in the world of pure math, but for much of how math is used in science. There are a lot of interesting things that we can’t grasp in their entirety; but that we can understand by breaking them down through abstraction into a set of simpler facets that we can study.
I don’t want to throw away the validity of the concept of abstraction and mathematical modeling; but one way of reading that comment is as an argument against the concept of working with a simplified abstract model to understand something complex. (Note that I’m not saying that that’s what Garote meant; but I thought that focusing on this could produce an interesting discussion.)
In Dembski’s favor, I can see that the “coin-flip” thing is an abstraction of a complex process down to something simple that contains a relevant feature. The thing is, there are things in nature that produce highly random results, and there are other things that produce interesting patterns. Recognizing the distinction between randomness and pattern is a genuinely interesting subject which probability theorists have spent time studying; it’s right at the intersection between probability theory and information theory, and it’s produced some real insights about what information is and what it means.
The problem WRT Dembski isn’t that he’s reducing a problem to a mathematical model through abstraction and then analyzing the model. The problem is that when you build a mathematical model, you *need to make sure that your abstraction captures all of the relevant features of what you’re modeling*. And when you draw conclusions based on the analysis of the mathematical model, *you need to make sure that your abstraction isn’t the source of the conclusion*. Both of those are different ways of stating the fundamental rule of mathematical modeling:
**Mathematical models must be validated against the thing that they model.**
Dembski’s reduction of the process of analyzing certain features of the universe to a metaphor involving recognizing patterns in coin flips is fine with me. The problem is that he creates a model that *omits important elements* of the real world, and then insists that the *conclusions drawn from his abstract model are applicable to the real world*. But in fact, his conclusions are the result of properties of his model, *not* of the real world. The validation of his model fails: the specific conclusions are the result of eliminating things from his model that might permit a different conclusion.
To continue with the example of the coin flip metaphor: if you’re studying patterns, you could abstract the problem of observing patterns to one of observing flipping coins in an idealized universe, where the coin is perfectly even, and nothing about the surface that it lands on can affect the outcome of a flip. That might be a useful model for trying to understand the difference between random sequences of flips, and patterns in sequences.
If you then try to use that model to determine the *cause* of a pattern, you’ll conclude that the only possible cause of a pattern *is the actions of the coin flipper*. If you’ve abstracted away everything that could influence the outcome of the coin-flip except the influence of the coin-flipper, then *in your model*, concluding that the flipper is the only possible source of a pattern could be a reasonable conclusion.
That doesn’t mean that you can then say that in the real world, the *only possible cause* of a pattern in a sequence of coin-flips is some action taken by the coin-flipper. You can’t apply that simplified abstraction of your model to the real world without showing that it’s a valid model with respect to the properties that affected your conclusion.
The ultra-simple coin flip model isn’t valid for the real world, because it deliberately omits factors of the real world that could influence the conclusion. In the real world, there are unfair coins (both deliberately unfair coins, and coins that have become unfair either through minting flaws or through wear and tear). There are magnetic fields. There are irregular surfaces.
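Just to make the point tangible, here’s a trivial simulation (entirely mine – not something from Dembski or the commenter): a coin with a modest physical bias, one of the very factors the idealized model abstracts away, produces a detectable “pattern” with no flipper intervention at all.

    import random

    def flips(n, p_heads=0.5, seed=0):
        # Simulate n flips of a coin that lands heads with probability p_heads.
        rng = random.Random(seed)
        return "".join("H" if rng.random() < p_heads else "T" for _ in range(n))

    def heads_fraction(s):
        return s.count("H") / len(s)

    # In the idealized model every coin is fair, so a persistent excess of heads
    # could only be blamed on the flipper. A worn or weighted coin - something
    # the model deliberately omits - produces the same kind of pattern by itself.
    print(heads_fraction(flips(10_000, p_heads=0.5)))   # close to 0.50
    print(heads_fraction(flips(10_000, p_heads=0.6)))   # close to 0.60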
The problem with many of the conclusions that bad mathematicians draw from mathematical models is that *the models don’t represent what they claim to represent*. They abstract away relevant features of reality, and then demand that reality doesn’t possess those features, because they aren’t in the model.
[demsbki-info-theory]: http://scienceblogs.com/goodmath/2006/06/dembskis_profound_lack_of_comp.php
[garote]: http://garote.bdmonkeys.net/
[modeling]: http://goodmath.blogspot.com/2006/03/on-how-not-to-apply-mathematical.html
Strange Connections
Here at SB, we use Google analytics for getting info about how many people are reading our blogs, and how they get here. I also have a SiteMeter monitor on GM/BM. One thing that I get a kick out of is taking a look at my hits, and seeing what kinds of interesting connections come up. Sometimes it’s funny; sometimes it’s informative, sometimes it’s just depressing.
So last night, I was unwinding after putting my kids to bed, and was taking a look. The interesting/amusing connections I found:
1. The number one search term leading people to Good Math, Bad Math? “pharyngula”. PZ, I hate you! (And yes, I’m deliberately not linking. 🙂 )
2. The number one person’s name in searches that lead people to GM/BM? “Ken Ham”. Now *that* is depressing. Ken Ham, who I’ve never written about, directs more hits to my blog than my own name does.
3. The strangest search term that led readers here: “butt propeller”. And not just once; three separate visits on three separate days got here via searches for “butt propeller”. I’m really not sure what to make of this one. Probably related to my post on Swinburne where I talked about monkeys flying out of my butt as a metaphor.
4. An interesting connection: I posted something yesterday about a [really bad probability argument][bad-prob] for Christianity. I got the link through email from a reader. It turns out that it’s on a fairly obscure site that [Orac][orac] had linked to in a Friday post about Holocaust denial. The reader who sent me the link is definitely one of Orac’s readers as well; so the nasty probability argument was discovered through an unrelated subject on scienceblogs. I like the SB networking aspect of that; on the other hand, I wish I’d looked at the site hosting the wretched argument that I mocked; I’d rather not have given a bunch of asshole Holocaust deniers the publicity.
5. Back on GM/BM’s [old home][gmbm-blogger] on blogger, I did a couple of posts about a crackpot named Gary Osborne. Gary came to the blog to “defend” himself. (His idea of defending himself is to call other people names, and then complain that their criticisms of him are ad hominems.) An interesting thing that I noticed was that every time he posted on the blog, I’d see referrals from a rather obscure British search engine with Gary’s name as the search keyword. After he stopped responding, the hits from that engine went away. Suddenly, I’m seeing a bunch of referrals from that engine, and I discovered that Gary has, once again, posted a [web-page][osborne-declares-victory] on a site without comments discussing his wonderful victory over me, and various GM/BM readers who joined the original discussion. Gotta love people who run away from debates and then declare victory, eh?
6. Most disturbing search terms that found my blog?: “juniper lee cartoon porn”. I have no idea who Juniper Lee is. Frankly, I’m too scared to do the search and find out.
7. Unexpected linkages: 65 page views referred from [animalcules][animalcules]. Since I’ve never submitted any of my posts to that, and I didn’t know of anyone linking to me in articles in animalcules, I was surprised (in a good way) by that.
8. Another very surprising one: a search for “daily show” and “kurt vonnegut” led people to GM/BM twice.
9. A couple of frequent commenters here are frequent trollers on Panda’s Thumb. Since I frequently don’t pay a lot of attention to the names of commenters on other blogs, unless I get into a discussion with them, I hadn’t noticed the connection.
10. Three different people set up blogs specifically to respond to something I posted. Each has exactly one post.
[bad-prob]: http://scienceblogs.com/goodmath/2006/06/fundie_probability_even_worse.php
[orac]: http://scienceblogs.com/insolence/2006/06/who_perpetrated_the_holohoax.php
[gmbm-blogger]: http://goodmath.blogspot.com
[osborne-declares-victory]: http://www.gardinersworld.com/content/view/72/45/
[animalcules]: http://scienceblogs.com/aetiology/2006/02/announcing_new_blog_carnival_1.php
An Introduction to Information Theory (updated from Blogspot)
Back when I first started this blog on blogspot, one of the first things I did was write an introduction to information theory. It’s not super deep, but it was a decent overview – enough, I thought, to give people a good idea of what it’s about, and to help understand why the common citations of it are almost always misusing its terms to create bizarre conclusions, like some [“law of conservation of information”][conservation].
[conservation]: http://en.wikipedia.org/wiki/Law_of_conservation_of_information “wikipedia on Dembski’s law of CoI”
This is a slight revision of that introduction. For the most part, I’m just going to clean up the formatting, but since I’m going through it anyway, I’ll probably end up doing some other polishing.
****
So what is information theory?
================================
Information theory is a relatively new field of mathematics that tries to characterize what information is in a quantifiable way. It’s an area of math that was almost totally obscure until not too long ago, and one which is frequently misunderstood, even by people who mean well.
A bit of history
—————–
Modern information theory comes from two very different sources: Shannon’s information theory, and Kolmogorov/Chaitin algorithmic information theory. Fortunately, they both converge, mostly, on the same place.
### Shannon’s Information Theory
The more famous branch of IT is Claude Shannon’s information theory, as presented in his infamous paper, [Communication in the Presence of Noise][shannon]. Shannon’s interest came from his work at Bell Labs for AT&T, which wanted a way of being able to figure out how much wire they needed to lay for the telephone network.
AT&T was interested because, for a telephone company, laying wire is an incredibly expensive operation. The wire itself (especially back in the days when it was bundles of copper) is quite expensive; the process of running and burying the cables is even more expensive. So ideally, as a phone company, you want to run the cables once; you want to lay enough that you’ll never have to come back and do it again; and you want to lay the smallest amount that will do the job, since the materials are so expensive.
The problem that they wanted Shannon to help solve was: how could they determine how much wire they actually needed to lay? It’s actually an astonishingly tricky question. The less they laid, the less it cost them in the short term. But they *knew* that the number of phone lines was going to increase dramatically – so if they were cheap, and laid too little wire, when the number of phones exceeded the capacity of what they had already laid, they’d have to go back and lay more. So they wanted to be able to make a good prediction of the minimum amount of wire they could lay that would meet their needs both at the time it was laid, and in the future.
But there’s a big trick to that. First: how much information can a single wire carry? And when you bundle a bunch of wires together, how much can you pump over a single wire without it interfering with the signals on its neighbors in the bundle?
That’s what Shannon was trying to understand. He was trying to find a mathematical model that he could use to describe information transmission, including the effects of imperfections in transmission, including the introduction of noise and interference. He wanted to be able to quantify information in a meaningful way that would allow him to ask questions like: “At what point does noise in the line start to eliminate the information I want to transmit?”.
The answer is just fascinating: a given communication channel has a capacity for carrying information; and a given message has a quantity of information in it. Adding noise to the signal *adds* information to the message *until* the capacity of the channel is reached, at which point information from the noise will start to *replace* information from the message; at that point, you can say that information from the message will start to be lost.
Shannon called the information content of a signal its *entropy*, because he saw a similarity between his information entropy and thermodynamic entropy: in a communicating system, entropy *never* decreases: it increases until the capacity of the channel is reached, and then it stays constant. You can’t reduce the amount of information in a message. (There are various rumours that this choice of terminology was actually suggested by von Neumann himself.)
That naming led directly to the most common misuse of information theory. But that’s a post for another day.
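To make “the information content of a signal” slightly more concrete, here’s a tiny sketch (mine, not Shannon’s – it uses the standard per-character frequency formula, which ignores any structure between characters):

    from collections import Counter
    from math import log2

    def entropy_bits_per_char(s: str) -> float:
        # Shannon entropy estimated from single-character frequencies.
        counts = Counter(s)
        n = len(s)
        return -sum((c / n) * log2(c / n) for c in counts.values())

    print(entropy_bits_per_char("AAAAAAAA"))      # 0.0: completely predictable
    print(entropy_bits_per_char("ABABABAB"))      # 1.0: this crude estimate only sees letter frequencies
    print(entropy_bits_per_char("hello, world"))  # about 3.0: more varied characters, more bits

Real measurements model the source more carefully (pairs, words, and so on), but the basic idea – more surprise, more bits – is the same.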
### Algorithmic Information Theory
The other main branch of information theory came from a very different direction. The two pioneers were Andrey Kolmogorov, and Greg Chaitin. Kolmogorov and Chaitin didn’t know each other at the time they independently invented the same mathematical formalization (in fact, Chaitin was a teenager at the time!), a fact which led to some friction.
In the interests of honesty, I’ll say that Greg Chaitin works in the same building that I do, and I’ve met him a number of times, and been greatly impressed by him, so my description is heavily influenced by Greg.
Anyway – algorithmic information theory grew out of some of the fundamental mathematics being done at the beginning of the 20th century. There was an effort to create a single, fundamental set of mathematical axioms and inference rules, which was both complete (every true statement was provably true) and consistent (no statement could be proven both true and false). This effort failed miserably; the nail in the coffin was Gödel’s incompleteness theorem. Gödel presented an extremely complicated proof that showed, essentially, that no sufficiently powerful formal system could be both complete and consistent. (See my recent article about [Extreme Math][extreme] for more about this.)
[extreme]: http://scienceblogs.com/goodmath/2006/06/extreme_math_1_1_2.php “1+1=2”
Most people saw Gödel’s proof, did the equivalent of saying “Oh, shit!”, and then pretended that they hadn’t seen it. It does mean that there are fundamental limits to what you can do using mathematical formalization, but for the most part, they don’t affect you in normal day to day math. (Sort of the way that car designers didn’t change the way they build cars because of relativity. Yes, it showed that the fundamental physical model that they were using was wrong – but at the speeds that cars move, that wrongness is so small that it just doesn’t matter.)
But for people in the early stages of what would become computer science, this was a big deal. One of the early hopes was that the mathematical system would produce a “decision procedure”, a mechanical process by which a computing device could check the truth or falsehood of any mathematical statement. Gödel showed that it couldn’t be done.
But the early computer scientists – in particular, Alan Turing – embraced it. It led directly to two of the fundamental rules of computer science:
1. The Church-Turing Thesis: all mechanical computing systems are basically the same: there is a fundamental limit to what is computable, and any “acceptable” system can compute anything up to that limit – so if you can find a way of doing it on any computing device, then you can, ultimately, do it on every acceptable computing device.
2. The Halting Problem: there are some things that you cannot program a device to do. In particular, you cannot write a program for any computing device that examines another program, and tells you if that other program will eventually stop.
The halting problem turns out to say *exactly* the same thing as the Gödel incompleteness theorem. Only you can write a proof of it that a freshman college student can understand in ten minutes, instead of a proof that the average math grad student will need a semester to plow through.
Chaitin and Kolmogorov saw this as a big deal: using an algorithmic approach to how to process information, you could prove something very simply, which would require a much more complicated proof using other methods.
K-C information theory grew out of this. According to Kolmogorov and Chaitin, the fundamental amount of information contained in a string (called the string’s information entropy, after Shannon) is the length of the shortest combination of program and data, for a computing device, that can generate that string.
(If you’re interested in Gödel’s incompleteness theorem, then Hofstadter’s
[Godel, Escher, Bach: an Eternal Golden Braid][geb] is a really fun introduction. For K-C theory, Greg Chaitin has written a bunch of really [great][chaitin1] [books][chaitin2] for mathematical laymen.)
[geb]: http://www.amazon.com/gp/search/ref=br_ss_hs/002-3998406-1014458?search-alias=aps&keywords=godel%20escher%20bach “GEB amazon link”
[chaitin1]: http://www.amazon.com/gp/product/9814021725/sr=8-4/qid=1141849782/ref=pd_bbs_4/002-3998406-1014458?%5Fencoding=UTF8
[chaitin2]: http://www.amazon.com/gp/product/1852336684/sr=8-3/qid=1141849782/ref=pd_bbs_3/002-3998406-1014458?%5Fencoding=UTF8
Information and Entropy
————————-
Now that I’ve got a bit of background and origins done, it’s time to get to some of the fun stuff.
As I said yesterday, both of the big branches of information theory have a concept of *entropy*. While the exact presentations differ, because they’re built on somewhat different theoretical foundations, the basic meaning of entropy in both branches is pretty much the same. I’m going to stick with the terminology of the Kolmogorov-Chaitin branch, but the basic idea is the same in either.
In K-C theory, we talk about the information content of *strings*. A string is an ordered sequence of characters from a fixed alphabet. It doesn’t really matter what alphabet you choose; the point of this kind of theory isn’t to come up with a specific number describing the complexity of a string, but to be able to talk about what information means in a formal algorithmic sense.
The K-C definition of the entropy of a string is the total quantity of information encoded in that string.
Here’s where it gets fun.
Another way of saying exactly the same thing is that the entropy of a string is a measure of the *randomness* of the string.
Structured strings are structured *precisely* because they contain *redundancy* – and redundancy does not add information.
Which string has more information: “XXXXYYYYY” or “4X5Y”? They have the same amount. One is a lot smaller than the other. But they contain the same amount of information.
Here’s a third way of saying exactly the same thing: the entropy of a string is the length of the smallest compressed string that can be decompressed into the original.
Here’s a better example. Go back and look at the first two paragraphs of yesterday’s post. It took 514 characters.
Here’s the same information, compressed (using gzip) and then made
readable using a unix utility called uuencode:
M’XL(“)E8$$0“VIU;FL`19$Q=JPP#$7[6≤7KIN’,`M+]≤HHLPH“)8BMOPY
M[#Y/GCDG#
MLY2EI8$9H5GLX=*R(_+ZP/,-5-1#HRJNT`77@LL,MZD,”H7LSUKDW3@$#V2
MH(KTO$Z^$%CN1Z>3L*J^?6ZW?^Y2+10;+SOO’OC”/;7T5QA%987SY02I3I$
MLKW”W,VZ-(J$E”[$;’2KYI^-_L./3BW.+WF3XE8)≤@D8X^U59DQ7IA*F+X/
MM?I!RJ*%FE%])Z+EXE+LSN*,P$YNX5/P,OCVG;IK=5_K&CK6J7%’+5M,R&J]
M95*W6O5EI%G^K)8B/XV#L=:5_`=5ELP#Y#UJ??>[[DI=J”*2D];K_F230″
$`@(““`
`
That’s only 465 characters; and if I didn’t have to encode it to stop
it from crashing your web-browser, it would only have been 319
characters. Those characters are a lot more random than the original –
so they have a higher information density. The closer we get
to the shortest possible compressed length, the higher the information
density gets.
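If you want to reproduce that experiment without fighting with uuencode, here’s a rough equivalent using Python’s zlib (the exact byte counts will differ a bit from the gzip numbers above, but the effect is the same):

    import zlib

    # Paste in any reasonably long chunk of prose; repeated text compresses
    # especially well, because repetition is pure redundancy.
    sample = ("Information theory is a relatively new field of mathematics that "
              "tries to characterize what information is in a quantifiable way. ")
    text = (sample * 4).encode()

    compressed = zlib.compress(text, 9)
    print(len(text), len(compressed))           # compressed is much shorter
    assert zlib.decompress(compressed) == text  # and nothing was lost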
Actually, I’m lying slightly; there’s an additional factor that needs to be considered in K-C theory.
Remember that K-C comes from the foundational mathematics of computer science, something called recursive function theory. We’re talking about strings being processed algorithmically by a program. So really, we get to talk about programs that generate strings. So it’s not just strings: you have a program and a data string. Really, you measure information content of a string as the total size of the smallest pairing of program and data that generates that string. But for most purposes, that doesn’t make that much difference: the combination of the program and the data can be encoded into a string.
In the two examples above, the program to follow is implied by the contents of the string; if we wanted to really model it precisely, we’d need to include the programs:
* For the string “XXXXYYYYY”, the program is roughly:
while there are more characters in the data:
read a character
output the character that you read
end
* For the string “4X5Y”, the program is roughly:
while there are more characters in the data:
read a number n from the data
read a character c from the data
output c n times
end
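Here’s the same pair of toy programs written out as runnable Python, just to show that each program-plus-data combination regenerates the same string (the single-digit counts are a simplification of my own):

    def echo(data: str) -> str:
        # Program for the literal string: copy the data straight through.
        return "".join(ch for ch in data)

    def run_length_decode(data: str) -> str:
        # Program for the compressed form: read a count, then a character, repeat.
        out = []
        i = 0
        while i < len(data):
            n = int(data[i])      # single-digit counts only, for this toy example
            out.append(data[i + 1] * n)
            i += 2
        return "".join(out)

    assert echo("XXXXYYYYY") == "XXXXYYYYY"
    assert run_length_decode("4X5Y") == "XXXXYYYYY"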
Now, specifics aside: the definitions I gave earlier are pretty much correct in both K-C and Shannon information theory: information content is proportional to randomness is proportional to minimum compressed length.
So – what happens if I take a string in very non-compressed format (like, say, an uncompressed digitized voice) and send it over a noisy phone line? Am I gaining information, or losing information?
The answer is: *gaining* information. Introducing randomness into the string is *adding* information.
“AAAABBBB” contains less information than “AAAABxBB”.
The way this is used to refute bozos who claim things like “You can’t create information” should be obvious.
Fundie Probability: Even Worse Math than Swinburne
I recently got a real prize of a link from one of my readers. He’d enjoyed the [Swinburne][swinburne] article, and had encountered this monstrosity: an alleged [probability of Christianity][prob] argument *significantly worse* than Swinburne’s.
[swinburne]: http://goodmath.blogspot.com/2006/04/mind-numbingly-stupid-math.html “My shredding of Swinburne”
[prob]: http://www.biblebelievers.org.au/radio034.htm “Mathematical Probability that Jesus is the Christ”
The difference between Swinburne and this bozo (whose name I can’t locate on the site) is that at least Swinburne made *some* attempt to use some math to justify his position. It may have been sloppy as hell – but at least he *did* go through the effort of using actual Bayesian methods. I think that he did an *appallingly* bad job of it; but at least there was something there.
Our new friend doesn’t even bother to do that.
Here’s a very typical example of his “argument”:
>The reason why prophecy is an indication of the divine authorship of the Scriptures, and hence a testimony to the trustworthiness of the Message of the Scriptures, is because of the minute probability of fulfillment.
>
>Anyone can make predictions. Having those prophecies fulfilled is vastly different. In fact, the more statements made about the future, and the more the detail, then the less likely the precise fulfillment will be.
>
>For example, what’s the likelihood of a person predicting today the exact city in which the birth of a future leader would take place, well into the 21st century? This is indeed what the prophet Micah did 700 years before the Messiah. Further, what is the likelihood of predicting the precise manner of death that a new, unknown religious leader would experience, a thousand years from now – a manner of death presently unknown, and to remain unknown for hundreds of years? Yet, this is what David did in 1000 B.C.
>
>Again, what is the likelihood of predicting the specific date of the appearance of some great future leader, hundreds of years in advance? This is what Daniel did, 530 years before Christ.
>
>If one were to conceive 50 specific prophecies about a person in the future, whom one would never meet, just what’s the likelihood that this person will fulfill all 50 of the predictions? How much less would this likelihood be if 25 of these predictions were about what other people would do to him, and were completely beyond his control?
>
>For example, how does someone “arrange” to be born in a specific family?
>
>How does one “arrange” to be born in a specified city, in which their parents don’t actually live? How does one “arrange” their own death – and specifically by crucifixion, with two others, and then “arrange” to have their executioners gamble for His clothing (John 16:19; Psalms 22:18)? How does one “arrange” to be betrayed in advance? How does one “arrange” to have the executioners carry out the regular practice of breaking the legs of the two victims on either side, but not their own? Finally, how does one “arrange” to be God? How does one escape from a grave and appear to people after having been killed?
>
>Indeed, it may be possible for someone to fake one or two of the Messianic prophecies, but it would be impossible for any one person to arrange and fulfill all of these prophecies.
So the basic method is: assert that there were prophecies that unambiguously identified a specific event. Further, assert that those prophecies were fulfilled. Now ask: what’s the probability of all of those prophecies coming true?
The arguments that he presents range from the vague to the silly. In the vague cases, he makes a classic mathematical mistake: switching between a priori and a posteriori probabilities. In general, probability calculations are done a priori: “I *don’t know* what specific situation I’m going to apply my calculations to, so when I set the probabilities, I do it without any specific knowledge of a desired outcome”. But he does it a posteriori: “Since this description of an event *clearly references* this specific incident, I’m going to set the probabilities using my knowledge of the outcome”.
For example, he asserts things like prophecies naming a specific city in which his alleged messiah would be born. Of course, the specific prophecy that supposedly makes this claim is not specified. And if you go to other places that babble about the various prophecies about the messiah, you find that none of them are actually *specific*. That is, they all talk in very vague and symbolic terms, and *a posteriori* interpretations of those can say that they identify a specific place associated with Jesus; but if you were to read them with an open mind, without any idea of a set of events that you were trying to link them to, you almost certainly would not have demanded that they describe a single, specific location. If they could describe 20 locations, then your probability calculations would need to include the probability for each of those possibilities; if you use the a posteriori perspective to narrow it down to one, you’re making invalid use of hindsight to skew the numbers.
Even worse than just using the alleged fulfillment of vague prophecies, he makes an even worse kind of a posteriori error: numerous citations of prophecies from the *new* testament: a set of books written *after* the events that were supposedly prophesied – as in the citation of the book of John in the quoted section above. We’re supposed to take the fulfillment of a prophecy that wasn’t written down until *after* the event, and act surprised that there’s exactly one supposedly historical event that matches that prophecy.
Ooh, I’m surprised. A bunch of guys who were trying to convince the world that Jesus was the messiah wrote down prophecies *after the fact* in a way that only Jesus could possibly have fulfilled. Adding *that* to an alleged probability calculation to reduce the probability of anyone but Jesus fulfilling all of the prophecies isn’t even bad math. It’s just plain old lying.
In the category of just plain silly, I absolutely *love* the quote where he’s talking about how hard it would be for anyone else to fulfill these alleged prophecies: “how does one ‘arrange’ to be God?”. Yeah, one of the statements that he supposedly calculates a probability for is “being God”.
What’s the probability of being God? I’d really like to know. Do I have any real chance of fulfilling it?
Alas, I can’t tell, because despite insisting that there are 456 different prophecies that Jesus supposedly fulfilled, and that each of those is factored into the calculation, he doesn’t tell us how any of those individual probabilities were calculated – for example, he says the probability of an alleged messiah being born in Bethlehem was one in 2.8×10^5. Where’d that number come from? We don’t know.
And for most of the alleged events that were fulfilled prophecies, he doesn’t even give us that much information.
For example, he says that the odds of fulfilling 48 prophecies – he doesn’t say which 48 – are 1 in 10^157. He just asserts that. And then goes off into a classic “big numbers” babble about how terribly, unimaginably large that number is. He even goes so far as to pull out one of those bullshit “anything less likely than X is impossible” claims. (I’m actually playing a game of spider solitaire in the background as I write this, which means that I’m doing something absolutely impossible!)
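To see how empty that kind of raw “big number” is, here’s a tiny demonstration (mine, not the author’s): any specific outcome of a modest random process is “impossible” by his standard, yet one such outcome happens every single time.

    import random

    flips = "".join(random.choice("HT") for _ in range(525))
    # The probability of getting exactly this sequence was 1 in 2**525,
    # which is roughly 1 in 10**158 - even "less likely" than his 1 in 10^157.
    # It happened anyway, on the first try.
    print(flips)
    print(f"1 in 10^{525 * 0.30103:.0f}")   # log10(2) is about 0.30103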
He’s even got a ripoff of the CSICOP challenge:
>But, of course, there are many more than eight prophecies. In another calculation, Stoner used 48 prophecies (Idem, 109) (even though he could have used Edersheim’s 456), and arrived at the extremely conservative estimate that the probability of 48 prophecies being fulfilled in one person is the incredible number 10^157. In fact, if anybody can find someone, living or dead, other than Jesus, who can fulfill only half of the predictions concerning the Messiah given in the book “Messiah in Both Testaments” by Fred J. Meldau, the Christian Victory Publishing Company is ready to give a ONE thousand dollar reward! As apologist Josh McDowell says, “There are a lot of men in the universities that could use some extra cash!” (Josh McDowell, Evidence that Demands a Verdict, California: Campus Crusade for Christ, 175).
The catch, of course, is that you need to fulfill the prophecies *in exactly the way that these fundies interpret them* – in other words, starting from the assumption that Jesus fulfilled them, and that the prophecies specifically point at exactly the events of the life of Jesus.
I can actually save the authors a lot of trouble. The probability of anyone else fulfilling these prophecies *is zero*. Because they’re interpreted a posteriori as the specific events from the life of Jesus, the odds of it being Jesus who fulfilled them are exactly 100%, and the odds of anyone else doing it are exactly 0%.
Course, if you put it that way, suddenly it doesn’t sound so impressive, even to the bunch of idiots who’d accept this argument to begin with.
Friday Random Ten, June 23
- Dirty Three, “Some Summers they Drop Like Flies”. I’ve mentioned the Dirty Three before. Just go get their CDs and listen. Amazing stuff.
- Broadside Electric, “The Gardener”. Broadside is a local electric folk band. Great music, really nice people.
- Tony Trischka Band, “Feed the Horse”. The first album by Tony’s current band. A very cool song actually, although the lyrics are utterly incoherent.
- Thinking Plague, “Consolamentum”. Thinking Plague is, well, just plain weird. I’d probably put them into the same category as groups like the Dirty Three and the Clogs, but TP is a lot less approachable. Think of an often atonal ensemble of people trained in Robert Fripp’s guitar craft program.
- ProjeKct Two, “Laura in Space”. From Fripp acolytes to Fripp himself. ProjeKct two is a sort of free-jazzish improv by Fripp, Adrian Belew, and Trey Gunn.
- Kaipa, “Otherworldly Brights”. Kaipa’s a Scandinavian prog-rock band; the first serious band played in by Roine Stolt of the Flower Kings.
- Moxy Fruvous, “Gulf War Song”. A depressing song, especially in light of the events of the last few years. This was written by MF during the first gulf war, about the way that people in favor of that war and people against it couldn’t speak to each other without getting into fights; looking back at the first gulf war, it seems like the disagreements concerning the war were remarkably civil in comparison to now. I don’t recall having major public or political figures call me a traitor, or talk about how I should be killed for treason for disagreeing with their support for the first gulf war; this time around, that seems downright routine.
- Solas, “On the Sea of Fleur De Lis”. Beautiful Irish song.
- Marillion, “An Accidental Man”. Bit of a poppy track by my favorite British neo-prog band. Great song, even if it is a bit on the overly peppy side.
- Transatlantic, “Mystery Train”. Transatlantic is quite an interesting band. Take Roine Stolt of the Flower Kings, Pete Trewavas of Marillion, Neal Morse of Spock’s Beard, and Mike Portnoy of Dream Theater, and throw them into the studio together. Usually when you do that, you get one of those typical “superband” monstrosities, where a bunch of guys with big egos whip something together, and it sounds like a patchwork mess. This sounds like a band that’s been writing songs together for years: polished, exciting, complex stuff, with an incredible chemistry between the musicians. The strangest thing about it is that listening to it, it doesn’t sound much like any of their normal bands; the closest comparison I can come up with is oldish Yes. But none of their normal bands sound particularly Yes-ish to me.