Like a lot of other bloggers, I often get annoying email from people. This week, I’ve been dealing with a particularly annoying jerk, who’s been bothering me for multiple reasons. First, he wants me to “lay off” the Christians (because if I don’t, God’s gonna get me). Second, he wants to convince me to become a Christian. And third, he wants to sell me on his brilliant new compression scheme.
See, aside from the religious stuff, he’s a technical visionary. He’s invented a method where he can take a source document, and repeatedly compress it, making it smaller each time.
This is a stupid idea that I’ve seen entirely too many times. But instead of just making fun of it, I thought it would be interesting to explain in detail why it doesn’t work. It touches on a bunch of basic facts about how data compression works, and provides a nice excuse for me to write a bit about compression.
The basic idea of data compression is that you’re eliminating redundancies in the original text. You can’t discard information. Mathematically, a compression function is an invertible function C from an array of characters to an array of characters (or you could use bits if you prefer), such that if y=C(x), then on the average input, the length of y is smaller than the length of x.
An ideal compression system is one where for all possible values of x, C(x) is shorter than x. C is your compressor; and since C is reversible, it has a unique inverse function C-1 such that C-1(C(x)) = x. An illustration of this basic compression system is in the diagram to the side.