If you regularly follow comments on this blog, you’ll know that I’ve been
having a back-and-forth with a guy who doesn’t know much about information
theory, and who’s been using his ignorance to try to assemble arguments against the
Kolmogorov-Chaitin information-theoretic measure of information in a string.
In the course of doing that, I came up with what I think are some interesting ways of explaining bits of it, so I thought I'd promote them to a top-level post to share with more readers.
To be absolutely clear up front: I'm far from an expert on K-C theory. It's something that I find incredibly fascinating, and I've read a lot of Chaitin's work. I've been fortunate enough to meet Greg Chaitin a number of times when we both worked at IBM, and I've attended several of his lectures. But this isn't my main area of expertise, not by a long shot.
If you don’t know any K-C information theory, then you can look at my
introduction here. The rest is beneath the fold.
As lots of you have heard, William Dembski and Robert Marks just had a
paper published in an IEEE journal. In the last couple of days, I’ve received about 30
copies of the paper in my email with requests to analyze it.
My biggest criticism of the paper is how utterly dull it is. It’s obvious
how they got it published – they removed anything that’s really interesting from it. It’s
a rehash of the stuff they’ve written before, stripped of any content that directly hints
at the anti-evolution part of their claims – which leaves a not-particularly-interesting
paper on search algorithms.
I’m not going to rehash my primary criticisms of Dembski’s approach here – I’ve done it lots of times before, most recently in this post, which critiques a very closely related paper by D&M. In fact, this paper
is really just an edited version of the one I critiqued in that post: it’s that paper with all of the intelligent-design speak removed.
In my Dembski rant, I used a metaphor involving the undescribable numbers. An interesting confusion came up in the comments about just what that meant. Instead of answering it with a comment, I decided that it justified a post of its own. It’s a fascinating topic which is incredibly counter-intuitive. To me, it’s one of the great examples of how utterly wrong our
intuitions can be.
Numbers are, obviously, very important. And so, over the ages, we've invented lots of notations that allow us to write those numbers down: the familiar Arabic notation, Roman numerals, fractions, decimals, continued fractions, algebraic series, etc. I could easily spend months on this blog just writing about different notations that we use to write numbers, and the benefits and weaknesses of each notation.
But the fact is, the vast, overwhelming majority of numbers cannot be written
down in any form.
That statement seems bizarre at best, but it does actually make sense. To see why, we have to start at the very beginning: what does it mean for a number to be describable?
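The key is that a description, whatever notation it's written in, is a finite string of symbols drawn from some finite alphabet, and there are only countably many of those. Here's a minimal sketch of that counting argument in Python; the alphabet and the helper name are just my own choices for illustration:

```python
from itertools import count, product

# A "description" is just a finite string over some finite alphabet.
# Enumerating all such strings, shortest first, shows there are only
# countably many of them: every description appears at some finite position.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 .,+-*/^()="

def all_descriptions():
    """Yield every finite string over ALPHABET, in order of length."""
    for length in count(1):
        for chars in product(ALPHABET, repeat=length):
            yield "".join(chars)

# Any particular description shows up after only finitely many others.
for position, s in enumerate(all_descriptions()):
    if s in ("e", "pi"):
        print(f"description {s!r} is item number {position}")
    if position >= 10_000:
        break
```

Since the real numbers are uncountable, almost all of them can never be matched up with any entry in an enumeration like this, no matter what notation the descriptions are written in.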
Like a lot of other bloggers, I often get annoying email from people. This week, I’ve been dealing with a particularly annoying jerk, who’s been bothering me for multiple reasons. First, he wants me to “lay off” the Christians (because if I don’t, God’s gonna get me). Second, he wants to convince me to become a Christian. And third, he wants to sell me on his brilliant new compression scheme.
See, aside from the religious stuff, he’s a technical visionary. He’s invented a method where he can take a source document, and repeatedly compress it, making it smaller each time.
This is a stupid idea that I’ve seen entirely too many times. But instead of just making fun of it, I thought it would be interesting to explain in detail why it doesn’t work. It touches on a bunch of basic facts about how data compression works, and provides a nice excuse for me to write a bit about compression.
The basic idea of data compression is that you're eliminating redundancies in the original text. You can't discard information. Mathematically, a compression function is an invertible function C from an array of characters to an array of characters (or bits, if you prefer), such that if y=C(x), then for typical inputs, the length of y is smaller than the length of x.
An ideal compression system is one where for all possible values of x, C(x) is shorter than x. C is your compressor; and since C is reversible, it has a unique inverse function C⁻¹ such that C⁻¹(C(x)) = x.
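As for the repeatedly-compress-it-forever scheme: a quick way to convince yourself it can't work is to feed a real compressor its own output. Here's a small sketch using Python's zlib (my choice of compressor, nothing special about it): the first pass removes the redundancy, and after that the data looks essentially random and stops shrinking.

```python
import zlib

# Start with highly redundant data. The first compression pass shrinks it a lot;
# after that, the output has essentially no redundancy left, so further passes
# stop helping (and typically add a few bytes of overhead instead).
data = b"all work and no play makes jack a dull boy. " * 1000

for round_number in range(1, 6):
    compressed = zlib.compress(data, level=9)
    print(f"round {round_number}: {len(data):6d} bytes -> {len(compressed):6d} bytes")
    data = compressed
```

The counting argument underneath this is just the pigeonhole principle: there are 2^n bit strings of length n, but only 2^n - 1 strings shorter than that, so no invertible C can shrink every possible input.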
Since my post a couple of weeks ago about NASA and the antenna evolution experiment,
I’ve been meaning to write a followup. In both comments and private emails, I’ve gotten
a number of interesting questions about the idea of fitness landscapes, and some of the things
I mentioned in brief throwaway comments. I’m finally finding the time to write about
it now.
Someone sent me some questions about information theory; or actually,
about some questions raised by a bozo creationist arguing about information
theory. But the issues raised are interesting. So while I’m
nominally snarking at a clueless creationist, I’m really using it
as an excuse to talk about one of my favorite mathematical subjects.
The creationist is confused by the information-theoretic idea of information and complexity. To be fair, it's a genuinely confusing idea.
Intuitively, we think of information in terms of semantics and communication: information is semantic content, and our sense of semantic content comes from language. So we expect things that are rich in information to be highly structured, the way language is.
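In the Kolmogorov-Chaitin sense, though, it's the other way around: the more regular and structured a string is, the less information it takes to describe it, and a string of random noise is the one with maximal information content. One crude but concrete way to see the gap is to use a general-purpose compressor as a rough stand-in for K-C complexity. In this little sketch (zlib is just my stand-in, not a real measure of K-C complexity), the highly structured string compresses to almost nothing, while random bytes barely compress at all:

```python
import os
import zlib

def rough_information(data: bytes) -> int:
    """Crude stand-in for K-C information content: length of the compressed form."""
    return len(zlib.compress(data, level=9))

structured = b"abcabcabc" * 1000      # very regular, language-like structure
random_bytes = os.urandom(9000)       # pure noise: maximal information in the K-C sense

print("structured string:", rough_information(structured), "bytes")
print("random bytes:     ", rough_information(random_bytes), "bytes")
```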
Both in comments, and via email, I’ve received numerous requests to take a look at
the work of Dembski and Marks, published through Professor Marks’s website. The site is
called the “Evolutionary Informatics Laboratory”. Before getting to the paper, it’s worth
taking just a moment to understand its provenance – there’s something deeply fishy about
the “laboratory” that published this work. It’s not a lab – it’s a website; it was funded
under very peculiar circumstances, and hired Dembski as a “post-doc”, despite his being a full-time professor at a different university. Marks claims that his work for
the EIL is all done on his own time, and has nothing to do with his faculty position at the university. It’s all quite bizarre. For details, see here.
On to the work. Marks and Dembski have submitted three papers. They’re all
in a very similar vein (as one would expect for three papers written in a short period
of time by collaborators – there’s nothing at all peculiar about the similarity). The
basic idea behind all of them is to look at search in the context of evolutionary
algorithms, and to analyze it using an information theoretic approach. I’ve
picked out the first one listed on their site: Conservation of Information in Search: Measuring the Cost of Success
While reading Mandelbrot’s text on fractals, I found something that surprised me: a relationship between Shannon’s information theory and fractals. Thinking about it a bit, it’s not really that surprising; in fact, it’s more surprising that I’ve managed to read so much about information theory without encountering the fractal nature of noise in more than a cursory way. But noise in a communication channel is fractal – and relates to one of the earliest pathological fractal sets: Cantor’s set, which Mandelbrot elegantly terms “Cantor’s dust”. Since I find that a wonderful, almost poetic way of describing it, I’ll adopt Mandelbrot’s terminology.
Cantor’s dust is a pathological set. It’s caused no small amount of consternation among mathematicians and physicists who find it too strange, too bizarre to accept as anything more than an extreme artifact of logic in the realm of pure math. But it’s a very simple thing. To make it, you start with a line segment. Cut it into three identical parts. Erase the middle one. Then repeat the cutting into thirds and erasing the middle on each of the two remaining segments – and then on the segments remaining from that, and so on. The sketch below walks through the first few steps of the construction.
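Here's a minimal Python version of that construction (my own illustration, standing in for a picture): each step doubles the number of segments while the total remaining length drops by a third.

```python
def cantor_step(intervals):
    """Remove the open middle third of every interval."""
    next_intervals = []
    for low, high in intervals:
        third = (high - low) / 3
        next_intervals.append((low, low + third))
        next_intervals.append((high - third, high))
    return next_intervals

intervals = [(0.0, 1.0)]
for step in range(5):
    total = sum(high - low for low, high in intervals)
    print(f"step {step}: {len(intervals):2d} segments, total length {total:.4f}")
    intervals = cantor_step(intervals)
```

After n steps the surviving length is (2/3)^n, which heads to zero – even though, as we'll see, the set of surviving points is anything but empty.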
Why should it be called a dust? Because geometrically, in the limit, it’s got to be a set of completely disconnected points – a scattering of dust across the original line-segment.
It’s so simple – what’s pathological about it? Well, it’s clearly 0-dimensional in the pure sense – as I said above, it’s just a collection of points. Topologically, it’s a set of points with empty neighborhoods. And yet – look at it. It’s clearly not zero-dimensional; it’s got a 1-dimensional geometric structure, even though naive topology insists that it doesn’t. And there’s worse. It’s a similar problem to what we saw in the space-filling curves. Clearly, the dust disappears into nothingness. Every part of it has zero length – it seems like it must converge to something very close to the empty set. And yet it doesn’t: the set of points in the Cantor dust has the same cardinality as the set of points in the original line segment.
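The standard way to see that cardinality claim is the ternary-digit trick: the points that survive the construction are exactly the ones whose base-3 expansion can be written using only the digits 0 and 2, and replacing every 2 with a 1 and reading the result in binary hits every number in [0, 1]. A tiny sketch of that mapping (the function name is mine):

```python
def cantor_point_to_unit_interval(ternary_digits):
    """Map a point of the dust (base-3 digits, each 0 or 2) onto [0, 1].

    Turning every 2 into a 1 and reading the digits as binary covers the
    whole unit interval, so the dust is just as big, in cardinality terms,
    as the segment it was carved out of.
    """
    return sum((digit // 2) * 2 ** -(i + 1) for i, digit in enumerate(ternary_digits))

# 0.20202020... in base 3 is a surviving point of the dust; it maps to
# 0.10101010... in binary, which is (approximately) 2/3.
print(cantor_point_to_unit_interval([2, 0] * 8))
```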
For those of us who came of age as math geeks in the late 20th century, this doesn’t really seem that strange at all. But you’ve got to remember: the 20th century was a time of great turmoil in math. At the beginning of the century, the great project was to complete mathematics: to turn all of math into a glorious, perfect, clean, rational, elegant edifice. The common belief of the time was that math was beautiful and perfect – that while the real world might be ugly, might have all sorts of imperfections and irrationalities, those real-world flaws could never touch the realm of pure math: math was, in the words of one famous mathematician, “the perfect mind of God”. And then came the crash: the ramifications of Cantor’s set theory, Gödel’s incompleteness, Church and Turing’s uncomputability, fractals, Chaitin’s strange numbers… The edifice collapsed; math was as flawed, imperfect, and incomplete as anything else in the world. It was hugely traumatic, and there was (and in some circles still is) a great deal of resistance to the idea that so much irregularity, or even irrationality, was a part of the world of math – that the part of math that we can really grasp and use is just an infinitesimal part of the monstrous world of what really exists in our abstractions.
But getting back to the point at hand: what does Cantor’s dust have to do with information and noise?
Imagine that you’re listening to sound through a telephone wire with incredibly precise recording equipment. You’re sending a perfectly clear sine-wave over the line, in order to see how much noise there is.
You start pretty high – you only want to record noises that exceed, say, 20% of the amplitude of the basic sine wave. You wind up with a pattern of bursts of noise, scattered around in time. Now, mark off every period longer than 5 minutes where there is no noise. Those are the gaps in the noise – the largest gaps that you’re going to look at. Now look inside the bursts of noise – that is, the periods of time where there was no gap longer than 5 minutes – and mark off the periods of 1 minute where there wasn’t any noise. In between the 5-minute gaps, you’ll get a collection of smaller 1-minute gaps, separated by smaller bursts of noise. Then look into those smaller bursts for 10-second periods with no noise, and you’ll break the bursts of noise up further, into bursts longer than 10 seconds but shorter than a minute. Keep doing that, and eventually you’ll run out of noise. But turn down your noise threshold so that you can hear noise of a smaller amplitude, and you’ll find more noise, and more gaps breaking up the bursts.
If you look at the distribution of the noise, one thing you’ll notice is that the levels are independent: the length of the longest gaps has no relation to the frequency of the smaller gaps between them. The other thing you’ll notice is that the distribution of gaps is self-similar: the way long gaps are spread across long stretches of the recording looks just like the way short gaps are spread across shorter stretches. The noise distribution is fractal! In fact, it’s pretty much a slightly randomized version of Cantor’s dust.
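To make that concrete, here's a little Python sketch (entirely my own toy, not real line noise): it builds an idealized, non-random Cantor-dust burst pattern and then counts the quiet gaps at a sequence of ever-shorter duration thresholds. The count roughly doubles every time the threshold drops by a factor of three, which is the self-similarity showing through.

```python
def find_gaps(noise, min_gap):
    """Return (start, length) for every noise-free run of at least min_gap samples."""
    gaps, run_start = [], None
    for i, noisy in enumerate(noise + [True]):   # sentinel closes a trailing gap
        if not noisy and run_start is None:
            run_start = i
        elif noisy and run_start is not None:
            if i - run_start >= min_gap:
                gaps.append((run_start, i - run_start))
            run_start = None
    return gaps

def fill_dust(noise, start, length, min_burst=3):
    """Mark noise bursts in a Cantor-dust pattern: a quiet middle third at every scale."""
    if length <= min_burst:
        for i in range(start, start + length):
            noise[i] = True
        return
    third = length // 3
    fill_dust(noise, start, third)                      # left burst
    fill_dust(noise, start + length - third, third)     # right burst
    # the middle third stays quiet: that's the gap at this scale

samples = [False] * 3 ** 9
fill_dust(samples, 0, len(samples))

for min_gap in (6561, 2187, 729, 243, 81, 27, 9, 3):
    print(f"gaps of at least {min_gap:4d} samples: {len(find_gaps(samples, min_gap)):3d}")
```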
Understanding the structure of noise isn’t just interesting in the abstract: it provides a necessary piece of knowledge, which is used regularly by communication engineers to determine the necessary properties of a communication channel in order to ensure proper transmission and storage of information. Recognizing the fractal nature of noise makes it possible to better predict the properties of that channel, and determine how much information we can safely pump through it, and how much redundancy we need to add to the information to prevent data loss.
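The classical yardstick for that kind of question is Shannon's channel capacity theorem: for an idealized channel of bandwidth B with uniform Gaussian noise at signal-to-noise ratio S/N, the maximum rate at which information can be pushed through reliably is B·log2(1 + S/N) bits per second. Real noise isn't that uniform hiss – it's bursty and fractal, as above – which is a big part of why practical coding schemes add redundancy and interleaving designed around the actual gap-and-burst structure of the channel rather than around the idealized model.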
Multiple people have written to me, after seeing yesterday’s algorithms basics post, asking me to say more about sorting algorithms. I have to say that it’s not my favorite topic – sorting is one of those old bugaboos that you can’t avoid, but which gets really dull after a while. But there is a kernel of interest to it – sorting can be used to demonstrate a lot of interesting ideas about
computational complexity.
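As a tiny taste of what I mean, here's a sketch (my own toy example, in Python) that counts the comparisons made by a quadratic sort and by merge sort on the same random input. The gap between roughly n²/4 comparisons and roughly n·log₂(n) comparisons is exactly the kind of thing computational complexity is about.

```python
import random

def count_comparisons(sort_fn, items):
    """Run sort_fn on a copy of items and report how many comparisons it made."""
    counter = [0]
    def less_than(a, b):
        counter[0] += 1
        return a < b
    sort_fn(list(items), less_than)
    return counter[0]

def insertion_sort(items, less_than):
    # Quadratic: each element may get compared against everything before it.
    for i in range(1, len(items)):
        j = i
        while j > 0 and less_than(items[j], items[j - 1]):
            items[j - 1], items[j] = items[j], items[j - 1]
            j -= 1
    return items

def merge_sort(items, less_than):
    # n log n: split in half, sort each half, then merge with one comparison per step.
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid], less_than)
    right = merge_sort(items[mid:], less_than)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if less_than(right[j], left[i]):
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    return merged + left[i:] + right[j:]

random.seed(0)
data = [random.random() for _ in range(2000)]
print("insertion sort comparisons:", count_comparisons(insertion_sort, data))
print("merge sort comparisons:    ", count_comparisons(merge_sort, data))
```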
I haven’t taken a look at Uncommon Descent in a while; seeing the same nonsense
get endlessly rehashed, seeing anyone who dares to express disagreement with the
moderators get banned, well, it gets old. But then… Last week, DaveScott (which is, incidentally, a pseudonym!) decided to retaliate against my friend and fellow ScienceBlogger Orac by “outing” him, publishing his real name and employer.
Why? Because Orac had dared to criticize the way that a potential, untested
cancer treatment has been hyped recently in numerous locations on the web, including UD.
While reading the message thread that led to DaveScott’s “outing” of Orac, I came
across a claim by Sal Cordova about a new paper that shows how Greg Chaitin’s work in
information theory demonstrates the impossibility of evolution. He even promoted it to
a top-level post on UD. I’m not going to provide a link to Sal’s introduction
of this paper; I refuse to send any more links UD’s way. But you can find the paper at this
site.