I received an email from someone with some questions about information theory; they touch on misunderstandings common enough that I thought it was worth turning the answer into a post.
There are two parts here: my correspondent started with a question; and then after I answered it, he asked a followup.
The original question:
————————
>Recently in a discussion group, a member posted a series of symbols, numbers,
>and letters:
>
>`+
>The question was what is its meaning and whether this has information or not.
>I am saying it does have information and an unknown meaning (even though I am
>sure they were just characters randomly typed by the sender), because of the
>fact that in order to recognize a symbol, that is information. Others are
>claiming there is no information because there is no meaning. Specifically that
>while the letters themselves have meaning, together the message or “statement”
>does not, and therefore does not contain information. Or that the information
>content is zero.
>
>Perhaps there are different ways to define information?
>
>I think I am correct that it does contain information, but just has no meaning.
This question illustrates one of the most common errors made when talking about information. Information is *not* language: it does *not* have *any* intrinsic meaning. You can describe information in a lot of different ways; in this case, the one that seems most intuitive is: information is something that reduces a possibility space. When you sit down to create a random string of characters, there is a huge possibility space of the strings that could be generated by that. A specific string narrows the space to one possibility. Anything that reduces possibilities is something that *generates* information.
For a very different example, suppose we have a lump of radium. Radium is a radioactive metal which breaks down and emits alpha particles (among other things). Suppose we take our lump of radium, put it into an alpha particle detector, and record the time intervals between the emission of alpha particles.
The radium is *generating information*. Before we started watching it, there were a huge range of possibilities for exactly when the decays would occur. Each emission – each decay event – narrows the possibility space of other emissions. So the radium is generating information.
That information doesn’t have any particular *meaning*, other than being the essentially random time-stamps at which we observed alpha particles. But it’s information.
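To make that concrete, here's a small illustrative sketch (my own, not part of the original exchange), assuming a made-up decay rate and a 10-millisecond timing resolution: simulate the intervals between decays, bucket them, and estimate how many bits each observed interval carries.

```python
import math
import random

random.seed(1)

RATE = 3.0          # hypothetical decay rate (events per second), chosen arbitrarily
RESOLUTION = 0.01   # pretend our detector timestamps intervals to 10 ms

# A memoryless decay process produces exponentially distributed waiting times.
intervals = [random.expovariate(RATE) for _ in range(100_000)]

# Discretize each interval at the detector's resolution and build an
# empirical distribution over the buckets we actually observed.
counts = {}
for t in intervals:
    bucket = int(t // RESOLUTION)
    counts[bucket] = counts.get(bucket, 0) + 1
probs = [c / len(intervals) for c in counts.values()]

# Empirical entropy: average bits of information per recorded interval.
# A finer timing resolution would yield more bits per interval.
bits = sum(p * math.log2(1 / p) for p in probs)
print(f"~{bits:.1f} bits per observed interval")
```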
A string of random characters may not have any *meaning*; but that doesn’t mean it doesn’t contain information. It *does* contain information; in fact, it *must* contain information: it is a *distinct* string, a *unique* string – one possibility out of many for the outcome of the random process of generation; and as such, it contains information.
The Followup
—————
>The explanation I have gotten from the person I have been debating, as to what
>he says is information is:
>
>I = -log2 P(E)
>
>where:
>I: information in bits
>P: probability
>E: event
>
>So for the example:
>`+
>He says:
>”I find that there is no meaning, and therefore I infer no information. I
>calculate that the probability of that string occurring was ONE (given no
>independent specification), and therefore the amount of information was ZERO. I
>therefore conclude it has no meaning.”
>
>For me, even though that string was randomly typed, I was able to look at the
>characters, find their placement on a QWERTY keyboard, compare the
>characters to the digits on the hand, and found that, over all, the left-hand
>index-finger keys were used almost twice as much. I could infer that a
>left-handed person tried to type the “random” sequence. So to me, even though I
>don’t know the math, and can’t measure the amount of information, the fact I
>was able to make that inference of a left-handed typist tells me that there is
>information, and not “noise”.
The person quoted by my correspondent is an idiot; and clearly one who’s been reading Dembski or one of his pals. In my experience, they’re the only ones who continually stress that log-based equation for information.
But even if we ignore the fact that he’s a Dembski-oid, he’s *still* an idiot. You’ll notice that nowhere in the equation does *meaning* enter into the definition of information content. What *does* matter is the *probability* of the “event”; in this case, the probability of the random string of characters being the result of the process that generated it.
I don’t know how he generated that string. For the sake of working things through, let’s suppose that it was generated by pounding keys on a minimal keyboard, and let’s assume that the odds of hitting any key on that keyboard are equal (probably an invalid assumption, but it will only change the quantity of information, not its presence, which is the point of this example). A basic minimal keyboard has about sixty keys. (I’m going by counting the keys on my “Happy Hacking Keyboard”.) Of those, seven are shifts of various sorts (2 shift, 2 control, one alt, one command, one mode), and one is “return”. So we’re left with 52 keys. To make things simple, let’s ignore the possibility of shifts affecting the result (this will give us a *lower* information content, but that’s fine). The string is 80 characters long; each *specific* character generated is an event with probability 1/52. So each *character* of the string has, by the Shannon-based definition quoted above, about 5.7 bits of information. The string as a whole has about 456 bits of information.
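Here’s that arithmetic as a quick sketch, under the same assumptions (52 equally likely keys, an 80-character string):

```python
import math

KEYS = 52     # keys left after discarding the shifts and return
LENGTH = 80   # number of characters in the random string

bits_per_char = -math.log2(1 / KEYS)   # I = -log2 P(E), with P(E) = 1/52
total_bits = LENGTH * bits_per_char

print(f"{bits_per_char:.2f} bits per character")     # ~5.70
print(f"{total_bits:.0f} bits for the whole string")  # ~456
```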
The fact that the process that generated it is random doesn’t make the probability of a particular string equal to one. For a trivial example, I’m going to close my eyes and pound on my keyboard with both hands twice, and then select the first 20 characters from each:
First: “`;kldsl;ksd.z.mdΩ.l.x`”
Second: “`lkficeewrflk;erwm,.r`”
Quite different results overall. It seems like I’m a bit heavy with the right hand around the k/l area, but note how different the two outcomes are. *That* is information. Information without *semantics*; without any intrinsic meaning. But that *doesn’t matter*. Information *is not language*; the desire of [certain creationist morons][gitt] to demand that information have the properties of language is nonsense, and has *nothing* to do with the mathematical meaning of information.
The particular importance of this fact is that it’s a common creationist canard that information *must* have meaning; that therefore information can’t be created by a natural process, because natural processes are random, and randomness has no meaning; that DNA contains information; and therefore DNA can’t be the result of a natural process.
The sense in which DNA contains information is *exactly* the same as that of the random strings – both the ones in the original question, and the ones that I created above. DNA’s information is *no different*.
*Meaning* is something that you get from language; information *does not* have to be *language*. Information *does not* have to have *any* meaning at all. That’s why we have distinct concepts of *messages*, *languages*, *semantics*, and *information*: because they’re all different.
[gitt]: http://scienceblogs.com/goodmath/2006/07/bad_bad_bad_math_aig_and_infor.php
There is a marvellous illustration of this: the Voynich manuscript, a book originally found in an Italian villa near Frascati. Like your example, the book contains plenty of information – the question is whether all the information in the book can be mapped to meaning. 350 years of attempts to crack it range from Zipf’s law and conditional entropy to all sorts of esoteric ideas, and even a sophisticated (mathematically very well done) demonstration that it is a hoax. Endless algorithms have been tried to prove (or disprove) that there is meaning encoded in the information of the book.
You can hardly imagine a better illustration of your point that the jump from information to language and meaning is nontrivial.
Here is a good starting point for this hilarious story:
http://en.wikipedia.org/wiki/Voynich_manuscript
On the other side of the coin of the Voynich MS is the question of science itself. If nature were truly random, we could not build models to help us understand it. What is the correct term for the state of nature that when analyzed helps us model some aspect of nature?
PS – I also use a HH keyboard (well HH2.) Great keyboard
Hi Mark,
Wouldn’t it be more correct to say that each emission narrows the possibility space of the whole sequence, but not of the other emissions, which are presumably independent?
Craig:
Unfortunately, I don’t actually use a HH keyboard. I’ve collected a positively obscene number of keyboards; I have this very severe RSI which has caused me a lot of pain. It’s not carpal tunnel, which is what most keyboards are designed for; I’ve got ulnar compression, both in the wrist and the shoulder. So I’ve tried a lot of keyboards to try to find something that would relieve my pain. I’ve wound up with a Kinesis contoured keyboard for work, and a TypeMatrix 2030 for home. (At home, I need something small for use with my laptop on my lap; at work, I have enough room for something as bulky as the Kinesis.)
The other problem here is that, colloquially, “information” does refer to semantic information or meaning, most of the time. The confusion arises when certain disingenuous ID-iots (e.g. Dembski) intentionally confuse the technical meaning with the colloquial meaning as part of their campaign to push their theology on everybody. Call it the “having your cake and eating it, too” fallacy.
keiths:
Yes, you’re right. Each decay narrows the overall possibility space of the *sequence*, but no decay affects the possibility space of the time until the next decay.
I have understood that data (say a string of bits) is NOT information (but rather noise) unless it has context. Does not a compressed zip file only contain information if you KNOW it’s a compressed zip file?
If you say a statement in a comment is false, don’t we also need to know to which statement you refer? [I hope it’s not this one]
Like the quantum observer, information may be in the eye of the beholder. For example, a student receives a block of random bits (say, from the lava lamp video source). This is a very meaningful gift, since she is trying to test a new statistics program (mean/std/…) for accuracy!
601:
You have understood wrong. “Data” is information. *Noise* is information. You’re making the common mistake of mixing the concepts of information and meaning. Information *can* have meaning, but doesn’t have to. Meaning can contain information, but it doesn’t have to. Information is a mathematical concept; meaning a philosophical and/or linguistic one.
A binary file on disk contains information regardless of whether it’s a bunch of randomly generated bits; a file that you know is compressed; a compressed file that you don’t know is compressed; an encrypted file that you know is encrypted and you know the password; an encrypted file that you know is encrypted but *don’t* know the password; or an encrypted file that you don’t know is encrypted at all. And in fact, you can “measure” the quantity of information in the file *without knowing which one it is*.
Ok, but I’m not convinced yet.
“*Noise* is information.” To confirm, you assert that a random string of bits is information?
But one can only “measure” the quantity of information in a binary file if you know the file size. Is not this a minimal context?
I am surrounded by RF, power lines, CMB, Talk Radio (Ok, some Talk Radio seems just like random noise), but unless I tune in (frequency/AM/FM/etc) I don’t get any information (signal)?
BTW: the block of random bits as a meaningful gift was an attempt at humor.
Of course we could. That’s the whole point of probability theory.
The fact that an experiment is random (i.e. unpredictable) doesn’t mean we can’t study, analyse, and yes, even make models of it.
601:
Yes, a random string of bits is information.
No, the “file size” is not a context. In information theory, we talk about measuring the information content of a *string*. We don’t care how or even if it’s stored – the fundamental idea is that *anything* which can be in any way viewed as a string of symbols of any kind is something that contains information.
Just because you *don’t receive* information from the things going on around you doesn’t mean that *they don’t generate information*.
Information has nothing to do with whether someone is listening to it, storing it, interpreting it, understanding it, or recognizing it.
It’s not information. It conveys (or contains, or carries) information.
I believe the problem is that you have not yet completely understood the definition of Shannon’s information. It’s really just a matter of definitions. (I don’t claim full understanding myself, but I’ve been at it for some years now).
A string of bits that is not random doesn’t have any information. To have information, the sequence *has* to be random (in the sense of Shannon — the only one I’ve studied).
Here’s a brief and informal explanation of how you determine the information content of a (random) sequence. I’m hoping this example is simpler and more concise than Mark’s.
1. Define an alphabet. In Mark’s example, list the keys in your keyboard.
2. Find the probability of each letter of the alphabet. Let’s say our keyboard has only the keys ‘a’, ‘b’ and ‘c’. The probabilities have to add to 1. For example:
P(‘a’) = 0.5
P(‘b’) = 0.3
P(‘c’) = 0.2
3. Use the logarithmic formula to find the information content, in bits, of each letter:
I(‘a’) = 1
I(‘b’) = 1.74
I(‘c’) = 2.32
Now, for a sequence, just add the information of each letter (that’s why we used the log formula). So the information content of the sequence ‘aabc’ is 1+1+1.74+2.32, or 6.06 bits.
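For anyone who wants to play with it, here’s that calculation as a tiny sketch in code, using the hypothetical three-letter alphabet above:

```python
import math

# The hypothetical alphabet and probabilities from the example above.
probabilities = {'a': 0.5, 'b': 0.3, 'c': 0.2}

def information_bits(sequence, probs):
    """Total Shannon self-information of a sequence: sum of -log2 P(symbol)."""
    return sum(-math.log2(probs[symbol]) for symbol in sequence)

print(round(information_bits('aabc', probabilities), 2))  # ~6.06 bits
```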
Note that in this definition, the origin of the sequence is not specified. It could come from some sort of noise process, from Mark hitting his keyboard, from a rat pushing buttons… it doesn’t matter. At least it doesn’t matter from the point of view of information theory.
Again, refer to the definition. Nowhere does it say that somebody must read or decode or receive the sequence of letters in order for it to contain information. Of course, if nobody receives it, it will not be *useful* information.
HTH,
“…as a string of symbols of any kind is something that contains information.” This sounds like a context to me. You seem to need symbols (smells like a language) and structure (a string is bounded and ordered) to “contain” information. What would, if anything, wild information be, i.e. not contained?
I’ll concede the zip file point, on the basis of a “random string of bits is information”, but I’m still stuck on the context issue.
I am concerned now that your definition of information contains little information!
This discussion on the meaning of information 🙂 reminds me, though I’m sure it’s an imperfect analogy, of the meaning of causation. Causation is also describable in a lot of different ways, starting with philosophy and religion and moving on to physics, where it starts to make sense.
But here, too, one finds that causality does not have an intrinsic meaning. The firmest formulation is Lorentz invariance, which describes causality as the relativistic time-ordering of localised systems inside a light cone. But that doesn’t describe the causal mechanisms in a system; those are described by the description of the system as it evolves in time. This, as the meaning of the specific causality, seems to be analogous to the meaning of the specific information.
It seems even philosophers of science are confusing information and meaning. The philosopher of biology John Wilkins said recently: “One of my favourite pet hates is “information”. This is an abstraction that usually just means that there is a mapping between one sort of structure (like a DNA sequence) and another (like the sequence of amino acids in a protein, which gets its functional properties from the way it is folded into complex shapes, assisted by other molecules). But it is so attractive a notion, so strong in its grip on our mind, that we generalise it to include aspects of our human intentionality. We start to talk about the “meaning” of genes, to continue our example.” ( http://evolvethought.blogspot.com/2006/04/abstract-and-concrete-in-biology.html )
It seems to me the mapping he refers to is much like the discussion of meaning – meaning of information is perhaps what appears after a mapping or a language transforms a structure, an amount of information, to another? That would be somewhat analogous to the mapping of a system over time describing the meaning of causality.
BTW, Mark has also explained that “There are two ways of understanding information: the on-line perspective (which is the perspective of observing the event, and characterizing information in terms of “surprise”), and the off-line perspective (which is looking at the string that results, and considering its compressibility).” ( http://scientopia.org/blogs/goodmath/2006/06/dembskis-profound-lack-of-comprehension-of-information-theory#comment-119488 )
“Information *can* have meaning, but doesn’t have to. Meaning can contain information, but it doesn’t have to. Information is a mathematical concept; meaning a philosophical and/or linguistic one.”
Um, I felt I forgot something. So either my analogy is restricted to a part of its colloquial meaning or is completely off. Or/and perhaps some “structures” that are mapped for meaning don’t contain information in the IT sense.
And now I note Mark and Wilkins seem to have compatible views on the meaning of “meaning”. But I’m still curious whether some of that could be viewed as discussing mappings from one domain to another, i.e. interpretation.
I am just looking for a formal definition of information. I see things as signal vs. noise, and I have an intuitive dislike of the idea that the noise is information in the same way as the signal. I don’t need the signal to “mean” anything, only that it not be noise. The CMB is boring radio listening (all oldies) because it is random noise. Although, its slightly non-uniform distribution is interesting information!
Unless we can say that something (anything) is NOT information, what use is the definition?
With all due respect, this sounds nonsensical to me.
Semantic confusion reigns supreme;)
There are about six colloquial (i.e. used in everyday discourse) “meanings” of the word “information”, and one technical. That one is the one that has to be used when discussing “information Theory” in Shannon’s sense, and bears equivalence to “entropy”. Here is a frinstance:
“For example, a string containing one million “0”s can be described using run-length encoding as [(“0″, 1000000)] whereas a string of random symbols (e.g. bits, or characters) will be much harder, if not impossible, to compress in this way”
In other words, technical “information” is related to the minimum number of symbols that are required to describe a particular sequence of symbols. The longer the minimum length, the greater the measure of its “information”. A purely random string then is not likely to be compressible, therefore is likely to express the maximum information of the set of equivalent symbol strings it is a member of.
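A rough way to see this concretely (a sketch only; an off-the-shelf compressor like zlib is standing in for “the shortest possible description”, which it merely approximates):

```python
import os
import zlib

repetitive = b"0" * 1_000_000          # the run-length-encodable string from the quote
random_bytes = os.urandom(1_000_000)   # incompressible, with overwhelming probability

print(len(zlib.compress(repetitive)))    # roughly a thousand bytes
print(len(zlib.compress(random_bytes)))  # slightly *more* than a million bytes
```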
“meaning” OTOH is a life-associated philosophical notion; the “meaning” of an utterance (a string of symbols) is IMHO to be found only in the change its reception effects in the behavior of the listener. It is also entropically reversed to “information” as entropy – life appears to locally reduce entropy, and “meaningful messages” are best conveyed in minimal symbol strings, viz. the “sound bytes” of today.
I stipulate to David’s technical information frame.
Ok, So a video of static would be a lot of information, a Kurosawa film less so, a static test pattern very little, and finally Fox News, no information.
It seems the random bits are a little misleading, since it is the inability to compress them – no patterns – that is their salient feature.
But compression requires a common understanding between parties (a shared context). In the extreme, could I agree that a message with no bits (zero length string), i.e. no information, indicates the email server has new spam available?
So does this mean that information IS a sequence of symbols?
601:
The problem is that you’re insisting on making distinctions that simply *do not exist* in information theory. As far as measuring information goes, there is no difference between “signal” and “noise”. Those are terms that only have meaning with reference to *semantics* and *meaning*.
“Static” is information; in fact, it’s often *more* information than what you want to refer to as signal. Information is non-intentional: there does not have to be any *reason* for it; it doesn’t have to be part of a message; it doesn’t have to be deliberately created; it doesn’t have to be deliberately observed. It’s an abstract concept.
WRT the issue about strings and symbols implying context and language… Strings and symbols are the mathematical abstraction that we use to describe things. It doesn’t mean that they’re really an intrinsic part of anything. To repeat an example, we can observe a chunk of radium, and describe its decay pattern as a sequence of zeros and ones; there aren’t any zeros and ones in the actual decay pattern of the radium. It’s the *decay pattern* that is generating information; the zeros and ones are a way of *representing* the information generated by the decay.
Finally, wrt my comment that “meaning can contain information, but it doesn’t have to”: I believe that my life has *meaning*; but that kind of meaning isn’t informational. (We need to be careful here to distinguish between the *statement* “my life has meaning” and the actual concept that is the meaning of my life. Any statement contains information in the mathematical sense; the abstract concept of the meaning of my life probably does *not*.)
In this case, you *are* utilizing context. A 0 all by itself is exactly one bit of information. An empty string is zero bits of information. If you take it to mean that the email server has new spam (which is at least more than 1 or 0 bits of information), then you’re actually talking about the information in relation to the entire system.
What I mean is, in this case you have to include the information that the email server contains in order to assign that meaning. This is exactly the same as when you’re defining the compressibility of something. In a purely incompressible number, the shortest program that can output it is the number itself. This is the amount of information that the number contains. A compressible number, on the other hand, is equal to the shortest compression program plus the compressed string.
In this case (the “New Spam” message), you are compressing information about the state of the mail. Because of this, you need to include the length of the compression program as well (the email program).
I was going to respond to this, but Mark said the same thing that I was going to. ^_^
The difference you’re making between ‘signal’ and ‘noise’ does not exist. It’s purely a product of an outside context assigning meaning to the string.
Think about it this way. I play you a recording of random spoken syllables. Is this signal, or noise? What if I tell you that the seemingly-random syllables are actually a language that you don’t know? Is this signal, or noise? What if I play you a recording of random high/low tones (essentially binary data)? Signal or noise? What if I tell you that it’s an encoding of random syllables (and I explain the encoding to you)? Signal or noise? What if I tell you that, once again, the random syllables are actually a language that you don’t know?
Let’s reiterate and simplify. You’re given three things. After each one, you’re asked whether it is signal or noise, without knowing what you are going to receive later.
1) Random static noise, like a dial-up modem produces.
2) A list of random syllables, and the information that the seemingly-random noise was actually encoding these syllables.
3) A translation into English of the syllables, and the information that the seemingly-random syllables were actually a foreign language.
At what point does it stop being noise, and start being signal? There isn’t one. This is because the distinction between signal and noise doesn’t exist. It’s *all* information. The three messages are equal in information content.
Finally…
No. Information is a mathematical concept. We represent it as a sequence of symbols because that makes it easier to think about.
In Mark’s post, he says that the (Shannon) information content of the original message is about 456 bits. That’s it – that’s information. It’s a mathematical quantity. We can represent the information as a string of symbols, either as the 80 characters of the original message or as 456 binary digits. Either way is just a simple way of expressing the actual quantity of information. Sort of like how we can use pebbles to count. Integers aren’t pebbles, but pebbles help us work with integers and make it easier to think about them (to an extent).
Okay, I see that we go for the widest possible meaning of meaning here. I guess that means that any precise technical usage of meaning has been found meaningless. 🙂
601
The best simple description of the information content of a string is: knowing the probabilities of the symbols occurring, how many yes/no questions do you probably have to ask to determine whether this was the string that was generated? This isn’t exactly correct, but it gets the general idea across.
If there is one symbol that occurs 100% of the time then you will know the generated string without asking any questions.
If there are 2 characters which are equally likely, then for a single-character string you can’t tell which of the 2 characters is there without asking. If it was 90/10, then you would have a much better idea of which letter was there without asking.
Another way to look at it is, for a collection of strings, how varied are they likely to be? Suppose you pick 1000 strings of ten characters each from an English text, and 1000 strings of ten characters each generated with every character equally likely (say, by flipping coins). The strings from the English text are more likely to have duplicates, and the random strings less likely, so the English text will have less information than the randomly generated, or “noise”, strings.
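To put numbers on the single-character examples above, here’s a small sketch of the underlying entropy formula, the expected number of bits (yes/no questions, roughly) per symbol:

```python
import math

def entropy_bits(probs):
    """Shannon entropy: expected bits per symbol for the given distribution."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(round(entropy_bits([1.0]), 2))       # 0.0  -- one symbol 100% of the time
print(round(entropy_bits([0.5, 0.5]), 2))  # 1.0  -- two equally likely symbols
print(round(entropy_bits([0.9, 0.1]), 2))  # 0.47 -- the 90/10 case
```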
It seems we can talk about a measure of information (more, less, equal) without defining it. Is this as close as we can get?
So if the bits accurately represent the information, are they the same?
Shannon talks about information in terms of communication, messages. From “A Mathematical Theory of Communication” SHANNON [http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf]
However, “The Mathematical Theory of Information” JAN KÅHRE [http://www.matheory.info/], contains:
Note: I don’t know much about KAHRE, is he peer-reviewed?
Regarding the “meaning of life”, I don’t believe my life has meaning, it’s a little scary, but I don’t care for the alternatives.
601:
(1) Remember that information theory != Shannon. Shannon is one of the founders, and he specialized in one particular aspect that was relevant to the job he needed to do. He was working for a phone company, trying to figure out what the information capacity of various media was. In that context, he was primarily focused on messages; although much of his work doesn’t *depend* on there being a message: just on there being *something* that you can measure.
(2) Even in Shannon, with its focus on messages, *noise* is information. One of the fundamental things that Shannon did was work out how to tell when noise would start to obscure the message on a given medium; it’s defined in terms of the information capacity of the medium. Noise always *adds* information to the message until the capacity of the medium is reached, and then adding noise starts to obscure some of the information.
(3) I’m not familiar with Kahre. Based on the reviews I can see online, it does not sound particularly good. The reviews frequently talk about how he “reimagines” things, “starts afresh”, “leaps over received wisdom”, etc., and how he has proposed a new rule that supplants Shannon theory using a “law of diminishing information”. Published 4 years ago, 0 reviews on Amazon, 0 citations on CiteSeer. In fact, the author has only one citation in the entire CiteSeer database, and that’s for a paper on the performance of TDMA cellular radio.
(4) “So if the bits accurately represent the information, are they the same?”. If the bits accurately represent the information, then as far as *information* goes, they’re the same: they contain the same information. That doesn’t mean that the string of bits *means* the same thing as the decay of the radium atoms.
Thanks for answering my questions.
And, sorry, it looks like Kahre is nuts, he’s a big fan of Steve Wolfram (of Mathematica fame and A New Kind of Science). I should have dug deeper before posting the reference.
I also share your fondness for Ω.
Cheers, 601
Glad we could educate, 601! Misinformation about information is one of the deadliest weapons in the IDist’s arsenal.
I can’t say I was educated beyond establishing the semantics of discourse, but this is certainly a necessary first step, and sincerely appreciated. I believe Information Theory / Science still has a long way to go. I have some nascent ideas (1/2 bit of information, information context time/space issues), but may lack the math chops to take them anywhere. I’m a hacker by trade and an amateur philosopher by night.
The IDists are intellectual terrorists. They can’t compete in the established science realm (peer review, testable hypotheses, etc.), so they engage in asymmetric warfare; instilling fear is their greatest joy. It’s all they know; the basis for superstition or faith is always profound fear.
The confusion here is simple, but profound. Information capacity is what Shannon defined, not information per se. Information is, most simply put, a difference that makes a difference. The capacity for information of a carrier is the number of such differences it can hold, under the assumption that all such differences matter. Information and information capacity are not the same thing. Shannon, a good engineer, had nothing to say about the former, and everything to say about the latter. Unfortunately, everyone ever since has chosen to use the word information when they mean information capacity, so in common parlance the same word is used for both. Disaster!
Why is this distinction more difficult to comprehend then the difference between the capacity of a vessel to hold liquid, and the liquid it currently holds?
Surely anyone can grasp the difference between a brand new hard drive and the same hard drive after it has been in use for some arbitrary period of time. The information capacity of the drive has not changed. The information content of the drive (to its owner) has.
It is easy to measure information capacity: there’s an equation, and besides, it says right there on the box: 250 GB hard drive. Information per se: not so much. Who is to say that my collection of songs has more information than your collection of bacterial genomes, both housed on identical hard drives? That’s when you get into the context and meaning questions: did that bit’s value make a difference, or not? Which is why engineers leave it alone, and focus on information capacity. Shannon didn’t need or want to know anything about the actual content of a phone call, just how big the pipe had to be to hold one.
RGH – Your comment contains some useful information, thanks.