Validity of Mathematical Modeling

In the comments to one of my earlier [Dembski][demsbki-info-theory] posts, I was responding to a comment, and realized that what we were discussing was probably interesting enough to promote up to a front-page post.
A poster named [Garote][garote] said:
>I think PaulC said exactly what needs to be said, that grants us all
>hearty permission to ignore Dembski’s work. I think it bears repeating:
>
>”If I am betting on coin flips and notice any kind of algorithmic pattern then
>there are legitimate reasons to conclude that the person flipping coins is not
>playing fair. However, this method does not allow one to discriminate between
>the product of evolution and the product of “intelligent design” since neither
>of these processes in any way resemble random coin flips.”
>
>Dembski’s feverishly comparing apples to oranges, to prove that one is an
>artichoke.
Now, my contempt for Dembski is pretty clear. But this argument against him seemed to me to contain a mistake – and it’s a mistake that strikes at something near-and-dear to my interests: mathematical modeling. Modeling can be sorely abused, [as I’ve discussed before][modeling]; but I don’t want to throw out the baby with the bathwater.
My whole viewpoint on mathematics is that much of its value comes from **abstraction**: the ability to look at something complex, to identify a set of relevant properties for understanding some feature of it, and to use those properties to create a simpler mathematical model of the complex system based on its fundamental principles. That kind of abstraction isn’t just useful for diddling around in the world of pure math, but for much of how math is used in science. There are a lot of interesting things that we can’t grasp in their entirety, but that we can understand by breaking them down through abstraction into a set of simpler facets that we can study.
I don’t want to throw away the validity of the concept of abstraction and mathematical modeling; but one way of reading that comment is as an argument against the concept of working with a simplified abstract model to understand something complex. (Note that I’m not saying that that’s what Garote meant; but I thought that focusing on this could produce an interesting discussion.)
In Dembski’s favor, I can see that the “coin-flip” thing is an abstraction of a complex process to something simple that contains a relevant feature. The thing is, there are processes in nature that produce highly random results, and others that produce interesting patterns. Recognizing that there’s something different about the processes that produce patterns is worthwhile; the distinction between randomness and pattern is a subject probability theorists have spent real time studying. It sits right at the intersection of probability theory and information theory, and it has produced some genuine insights about what information is and what it means.
The problem WRT Dembski isn’t that he’s reducing a problem to a mathematical model through abstraction and then analyzing the model. The problem is that when you build a mathematical model, you *need to make sure that your abstraction captures all of the relevant features of what you’re modeling*. And when you draw conclusions based on the analysis of the mathematical model, *you need to make sure that your abstraction isn’t the source of the conclusion*. Both of those are different ways of stating the fundamental rule of mathematical modeling:

**Mathematical models must be validated against the thing that they model.**

Dembski’s reduction of the process of analyzing certain features of the universe to a metaphor involving recognizing patterns in coin flips is fine with me. The problem is that he creates a model that *omits important elements* of the real world, and then insists that the *conclusions drawn from his abstract model are applicable to the real world*. But in fact, his conclusions are the result of properties of his model, *not* of the real world. The validation of his model fails: the specific conclusions are the result of eliminating things from his model that might permit a different conclusion.
To continue with the example of the coin flip metaphor: if you’re studying patterns, you could abstract the problem of observing patterns to one of observing flipping coins in an idealized universe, where the coin is perfectly even, and nothing about the surface that it lands on can affect the outcome of a flip. That might be a useful model for trying to understand the difference between random sequences of flips, and patterns in sequences.
If you then try to use that model to determine the *cause* of a pattern, you’ll conclude that the only possible cause of a pattern *is the actions of the coin flipper*. If you’ve abstracted away everything that could influence the outcome of the coin-flip except the influence of the coin-flipper, then *in your model*, concluding that the flipper is the only possible source of a pattern is perfectly reasonable.
That doesn’t mean that you can then say that in the real world, the *only possible cause* of a pattern in a sequence of coin-flips is some action taken by the coin-flipper. You can’t apply that simplified abstraction of your model to the real world without showing that it’s a valid model with respect to the properties that affected your conclusion.
The ultra-simple coin flip model isn’t valid for the real world, because it deliberately omits factors of the real world that could influence the conclusion. In the real world, there are unfair coins (both deliberately unfair coins, and coins that have become unfair either through minting flaws or through wear and tear). There are magnetic fields. There are irregular surfaces.
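To make that point concrete, here’s a minimal Python sketch (my own illustration, not anything from Dembski; the 90% bias and the sequence length are arbitrary choices): a worn or weighted coin produces a compressible, “patterned” sequence of flips with no scheming flipper anywhere in the picture.

```python
import random
import zlib

def packed_flips(p_heads, n=4096, seed=1):
    """Simulate n coin flips with the given heads probability, packed 8 per byte."""
    rng = random.Random(seed)
    flips = [1 if rng.random() < p_heads else 0 for _ in range(n)]
    # Pack 8 flips into each byte so the compressor sees the flip sequence itself.
    return bytes(int("".join(map(str, flips[i:i + 8])), 2) for i in range(0, n, 8))

print(len(zlib.compress(packed_flips(0.5))))  # fair coin: ~512 bytes in, slightly more out
print(len(zlib.compress(packed_flips(0.9))))  # biased coin: noticeably compressible, no cheater needed
```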
The problem with many of the conclusions that bad mathematicians draw from mathematical models is that *the models don’t represent what they claim to represent*. They abstract away relevant features of reality, and then demand that reality doesn’t possess those features, because they aren’t in the model.
[demsbki-info-theory]: http://scienceblogs.com/goodmath/2006/06/dembskis_profound_lack_of_comp.php
[garote]: http://garote.bdmonkeys.net/
[modeling]: http://goodmath.blogspot.com/2006/03/on-how-not-to-apply-mathematical.html

23 thoughts on “Validity of Mathematical Modeling”

  1. PaulC

    I’m pretty sure we disagree on something significant, but I have trouble putting my finger on exactly what.
    The reason I think that a fixation on uniform random processes is a strange way to go about understanding evolution is not that it’s a model that misses a few things, but that it’s a good model of only a small part of evolution and ignores all the other parts: for instance, the non-uniformity of selection (the phenotypes best suited to their environment are more likely to survive), or the obvious yet weirdly neglected non-uniformity of gene replication – even a small population with a significant advantage can overwhelm the background distribution in a short number of generations of exponential population growth.
    Actually, evolution could be driven by pseudorandom processes as easily as random ones. That’s how many genetic algorithms work. It could conceivably even be driven by deterministic sampling provided you were careful to make it uniform (often a difficult problem: see for instance the problem of constructing expander graphs deterministically). But it is none of these things, and I stand by my comment that Dembski’s big error is in equating evolution to randomness.
    I’d like to clarify something about the coin flip example. Informally, I might have said one can infer the coin flipper is “not playing fair.” Formally, all I would infer from the argument I gave is that the uniform random hypothesis is an inadequate explanation of the algorithmic pattern (it might remain a good way of treating other statistical properties: note that it’s often useful to approximate a pseudorandom generator by pretending it’s a true random source).
    You might still infer that the coin flipper is cheating, but that is a different kind of induction based on a model of human behavior: means, motives, opportunities. There is a lot of positive evidence that can be brought to bear since we understand why the flipper might cheat, how it could be possible, and cannot really come up with a better explanation based on the tilt of the table or the influence of tractor beams from an orbiting alien mothership. None of this applies when there is no evidence for the existence of any intelligent agent let alone a motive for the hypothesized behavior.

  2. PaulC

    From Mark CC’s comment on the previous thread:

    There are things like pulsars – which produce very regular pulse patterns. Sure, the “I shouldn’t see anything with a regular pattern” is an OK abstraction,

    The thing that struck me here is that in astronomy, it’s a pretty poor abstraction, since astronomy has historically been a science that finds many very regular patterns. True, when pulsars were first discovered, that was a pattern nobody had yet seen. I imagine the novelty was intriguing enough to consider the possibility of an intelligent source. I don’t know how seriously this was taken, but obviously it didn’t take very long for conventional scientific reasoning to come up with a more compelling explanation.
    There is a small subsegment of observable phenomena best modeled as random. Even then, it’s usually not true randomness (quantum events are an exception), but a complex chaotic system with so many confounding factors that a more faithful analysis is unlikely to give us better predictions than a simple stochastic model. These models are useful in analyzing situations even when we are missing a lot of details. An irregular table probably won’t make a die roll more predictable, and the outcome will probably still be uniform; if the die itself is asymmetric, that could bias the distribution, but it would not enhance predictability in any other way.

  3. Mark C. Chu-Carroll

    Paul:
    Actually, I don’t think we really disagree. The original quotation from you could be interpreted as a general condemnation of the idea of abstracted mathematical models; I’m arguing against that interpretation of it. I don’t think you really meant to say that all abstract mathematical models are bad; just that Dembski is using a bad model. I wanted to focus in on that issue: that the problem is very specifically that Dembski is using an *invalid* model, because the conclusions that he wants to draw from it are conclusions about things that were deliberately excluded from the model.

  4. Jane Shevtsov

    You wrote, “The problem is that when you build a mathematical model, you need to make sure that your abstraction captures all of the relevant features of what you’re modeling.” That would, in many cases, defeat the purpose of modeling, at least if you’re using the model to learn about the world rather than to make exact predictions.
    I’m an ecology student, so I’ll use an ecological example. Lotka-Volterra predator-prey models require only growth rates of prey and predator populations, their initial sizes, and parameters describing how the populations affect each other. You get dynamics that are almost never observed in nature, but that discrepancy teaches you something.
    Now you can add prey carrying capacity, refuges, predator satiation, and multiple prey, predators, competitors and mutualists, getting closer to reality with each step. All the factors I’ve listed are relevant to population dynamics, but a model including all of them is horrendously difficult to comprehend, not to mention parametrize. (Alternatively, you can start with a whole-ecosystem model, but that will also be heavily simplified.) I’m sure you’ll agree that models that don’t include all relevant factors can be useful — you just have to be very careful in interpreting them.
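    (For concreteness, here is a minimal Python sketch of the basic Lotka-Volterra model described above; this is an editorial illustration with arbitrary parameter values, showing how few ingredients the model needs and the perpetual cycling it produces.)

    ```python
    def lotka_volterra(prey, pred, a=1.0, b=0.1, c=1.5, d=0.075, dt=0.001, steps=20000):
        """Euler-integrate dprey/dt = a*prey - b*prey*pred, dpred/dt = -c*pred + d*prey*pred."""
        trajectory = []
        for _ in range(steps):
            dprey = (a * prey - b * prey * pred) * dt
            dpred = (-c * pred + d * prey * pred) * dt
            prey, pred = prey + dprey, pred + dpred
            trajectory.append((prey, pred))
        return trajectory

    # Only growth/interaction rates and initial sizes go in; endless cycles come out,
    # the "dynamics that are almost never observed in nature".
    print(lotka_volterra(prey=40.0, pred=9.0)[::5000])
    ```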

  5. Mark C. Chu-Carroll

    Jane:
    That’s basically what I was trying to say.
    You can create a mathematical model of something that focuses on one or two aspects of it – that abstraction from the overwhelming complexity of reality to the simplicity of the model is precisely the value of mathematical modeling.
    But when you want to draw conclusions from that model, you have to be sure that the conclusions aren’t an artifact of your limited model. If the process of abstraction in generating your model eliminated a factor that’s relevant to the conclusion you want to draw, then you can’t use that model to draw that conclusion.
    So in the case you’re talking about: you can understand certain population dynamics by considering a limited model. And within the constraints of the model, that abstraction is valuable, and can lead to interesting insights and discoveries. But if you wanted to draw a conclusion about, say, the cause of a sudden precipitous population drop, and your model didn’t include predation, then the set of possible causes for the population change that you could infer from your model will be incomplete: you’d be missing the possible causes that were excluded from your model.
    What I’m trying to argue for here is the necessity of the validation step: before you can draw a conclusion about reality from a mathematical model, you need to demonstrate that the model is valid with respect to the features that could affect the conclusion.
    So to go back to the Dembski’ian model: if you assume a fair coin and a smooth table, and you get a highly improbable pattern of coin-flips, you can reasonably conclude that the coin-flipper is doing something to produce the non-random pattern. But if you see a non-random coin-flipping pattern in the real world, you would probably guess that the most likely explanation would be an unfair coin. The conclusion (cheating flipper) is justified in the model; but because the model can’t be validated against reality for that conclusion, the conclusion can’t be extrapolated from the model to reality.

  6. revere

    This is an interesting discussion, to which I would like to add some things slightly skew to its current direction.
    i. Evelyn Fox Keller has written a fascinating book, Making Sense of Life, which is about the history of physical and mathematical modeling in developmental biology (Harvard U. Press, 2002, I think). What’s interesting about Keller is that she is one of the well known and accomplished science studies scholars/historians, but she started her career as a mathematical modeler, working with Lee Segel. One point she makes about the failure of mathematicians like Rashevsky in the 1930s – 1970s to gain traction with biologists was that the strategy of simplifying and abstracting (Consider a Spherical Cow) was alien and irrelevant to biologists, who are very impressed with and tied to the complexity of their subject. Consider what the word model means to each camp: for the modelers, it is the logical skeleton of just what is needed and nothing more, stripping away all the unnecessary complexity; for the biologists, it is an animal model, like a mouse or a fruit fly, whose value is that it retains all the complexity of the object of study (e.g., a cell, a chromosome, a human) but in standardized and manipulable form, i.e., it is a stable target for explanation. As Keller demonstrates with citations to original sources, throwing away the complexity was a non-starter for biologists, who just went ahead and ignored physical and mathematical models as irrelevant.
    This has changed dramatically in the last couple of decades, but the reasons have less to do with biologists coming around to physicists’ and mathematicians’ ideas than the other way around. The 800-pound gorilla is not high-energy particle physics, as it was in the first two thirds of the 20th century, but molecular biology.
    Where mathematics now has its foot in the door is via discrete mathematics because genomics has produced problems perfectly suited to that subject. Computer scientists now have tools of immediate use, tools that are actually necessary.
    In my own field of epidemiology, however, there are still few uses for mathematical models, excepting for the moment statistical models, which are in a slightly different category. Dynamical systems has had some use in infectious disease epidemiology but is still distrusted, especially by statisticians. Other than that, it is pretty barren, still, and the complexity versus simplicity/abstraction problem is one of the main cultural stumbling blocks. (Full disclosure: I teach mathematical modeling using ODEs.)
    ii. Concluding point: There is an old adage of modelers that I think is relevant here: all models are wrong but some models are useful.
    Sorry to run at such length, but your post stimulated me to think about this more.

  7. Eric Wallace

    So to go back to the Dembski’ian model: if you assume a fair coin and a smooth table, and you get a highly improbable pattern of coin-flips, you can reasonably conclude that the coin-flipper is doing something to produce the non-random pattern.

    What exactly is a highly improbable pattern of coin-flips? All sequences are equally (im)probable.
    Certainly we would be tempted to call the coin unfair if it were converging, after a large number of flips, to some head/tail split that was far from 50/50. But there are any number of “non-random” (seeming!) sequences that still appear fair; for example, a continuing cycle of four heads, four tails, four heads, etc.
    We might suspect something is fishy, but do we have any solid basis for that suspicion?
    I’m betting that in Dembski’s head, there are three kinds of patterns in the world: (a) randomness, (b) regularity (like my four heads, four tails), and (c) intelligence (like, say, coin flips coming up 2 heads, 3 tails, 5 heads, 7 tails, …, following the sequence of prime numbers).
    I don’t think we have any way to really distinguish between categories (b) and (c) except to show that an intelligent agent was actually behind the pattern. In other words, information theory can’t tell us anything special about the pattern of prime numbers versus any other mixed up patterns of heads and tails. Maybe I’m wrong (I don’t know much about information theory), but that’s my sense.
    On the other hand, I have no doubt that if I were flipping coins like that in Vegas, I’d get kicked out.

  8. Mark Chu-Carroll

    Eric:
    The question “What exactly is an improbable series of coin flips?” is, basically, one of the fundamental questions that underlie the probabilistic interpretation of information theory. If the series of coin flips corresponds to a bit sequence with low information entropy, then it’s a more improbable result.
    You can characterize the definition of information entropy in a bunch of different ways; they’re all just different ways of describing the same underlying quantity.
    One good way of defining entropy that came up in my first post about information theory back on Blogger: surprise. If I’m observing a sequence of coin flips, how surprised am I by the result of the next flip? If 90% of the time when I flip the coin I get a head, then I’m going to expect a head – I’ll be right 90% of the time, so there’s very little surprise. If the coin is fair, and it doesn’t produce any pattern in the flips, just a 50% head/tail distribution, then any prediction I try to make will not be right more than 50% of the time. The more often I can predict the result of the next flip, the less often I’m surprised by the result, and the lower the information entropy.
    Another way of looking at it is to take the result of a series of coin flips, and treat it as a string of ones and zeros. Then the information entropy is inversely related to the compressibility of the string. This actually means exactly the same thing as the “surprise” definition; it’s just based on recursive function theory rather than probability.
    In both cases, in a real random process, you expect to see high information entropy. The “typical” result will be some sequence of heads and tails with approximately the same number of heads and tails, with no strong patterns. If you’re observing the sequence while the coin-flips are going on, then in a good high-entropy sequence, you won’t be able to predict the next flip correctly more than 50% of the time; and after you’re done, looking at the sequence, it will not be very compressible. (For example, I just threw together some Scheme scripts for an experiment, and generated 512 bytes of random bits. Trying to compress that with my trusty gzip, I wind up with 529 bytes. Good quality compression :-). When I generate a sequence of alternating 0s and 1s – which will allow me to predict the outcome of the next flip with 100% accuracy – gzip will compress it from 512 bytes to 30 bytes.) So the 1/0/1/0 sequence is an uncommon, unexpected pattern of flips – because you expect randomness to generate high entropy.
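    (The experiment above is easy to reproduce. Here is a rough Python analogue, an editorial sketch rather than the original Scheme scripts; zlib uses the same DEFLATE compression as gzip, so the byte counts come out comparable.)

    ```python
    import os
    import zlib

    random_bytes = os.urandom(512)            # 512 bytes of high-entropy data
    alternating = bytes([0b01010101]) * 512   # the 1/0/1/0... pattern, packed into bytes

    print(len(zlib.compress(random_bytes)))   # a little larger than 512: no compression to be had
    print(len(zlib.compress(alternating)))    # collapses to a handful of bytes
    ```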

  9. revere

    Eric: It would seem to me that (b) and (c) are explanations for the (unlikely, but of course possible) patterns that we see. If one had a prior explanation that either fit, then the pattern would be a confirmation. Otherwise, it suggests an explanation one tests with other experiments (pardon the comic book version of modeling we see in texts, but it fits this somewhat artificial question).
    The problem with (c) is that it is an explanation that will fit any pattern and is therefore not very informative. Over at Effect Measure we put up our usual Freethinker Sunday Sermonette yesterday, this one asking why God was silent about the Holocaust (maybe because there is no God?) and several commenters used “God’s silence” as evidence for the existence of God. Go figure (if you have an algorithm).

  10. loren

    To my mind Dembski illustrates by example a truth uncomfortable to some, obvious to others: one can be a mathematician yet still be a lousy mathematical modeller.

  11. PaulC

    Eric Wallace:

    I don’t think we have any way to really distinguish between categories (b) and (c) except to show that an intelligent agent was actually behind the pattern. In other words, information theory can’t tell us anything special about the pattern of prime numbers versus any other mixed up patterns of heads and tails. Maybe I’m wrong (I don’t know much about information theory), but that’s my sense.

    Mark addressed your question, but I’d like add my take on it.
    You’re clearly right that every individual sequence has the same probability and could potentially be generated by flipping a fair coin.
    However, even conventional statistical inference (forget about information theory for a second) does attempt to draw conclusions from an individual sequence and whether it supports the hypothesis that the sequence is the outcome of a fair coin. A standard method is to use a p-value: http://en.wikipedia.org/wiki/P-value
    Example (1): Suppose 90 of 100 coin flips came up heads. You cannot conclude anything from the probability of getting that exact sequence (every sequence is equally likely), but you can ask what’s the chance of getting something like it.
    The relevant feature is “too many heads” so you ask what’s the probability that in any toss of 100 coins 90 or more come up heads. That probability is ((100 choose 90)+(100 choose 91)+…+(100 choose 100))/(2^100) which you can easily verify is a very small probability. That is the number you use as your p-value, and you can interpret it (in a way that’s easy to get wrong) to convince yourself that the null hypothesis of assuming a fair coin is a bad explanation for that sequence.
    From the above Wiki page, the p-value is “the probability that, given that the null hypothesis is true, T will assume a value as or more unfavorable to the null hypothesis as the observed value t_observed.” Or informally: if that was a sequence of fair coin flips, then the probability that you would see something like what you just saw is very, very small. You cannot rule it out entirely, but you are justified in thinking that the null hypothesis is not a compelling explanation.
    (2) Now suppose the coins came up in a way that looked statistically random according to most tests you can think of, but you found that a tiny computer program implementing a pseudo-random generator was able to predict a series of 10000 coin flips. You might ask if you still have a good reason to suspect something fishy is going on.
    I have noticed that this point trips up a lot of intelligent people. We’re trained not to look for “secret codes” in data since it’s easy to make up explanations that fit completely random data. Nevertheless, you can come up with a p-value that treats cases like the above. Fortunately, we have one generally applicable one that I can state right now, that can be used for all future events, and is immune to charges of cherry-picking or overfitting:
    Given a string of n bits generated uniformly (i.e. by coin flips), what is the probability that it is compressible to k bits (i.e. there is a k-bit Turing machine program that will write exactly n bits on its tape and halt)? This is an easy probability to bound. There are at most 2^k distinct k-bit TM programs. There are exactly 2^n n-bit strings. Thus, the probability is at most 2^k/2^n. As your p-value, take the probability that it is compressible to k bits or less: Sum(i=1 to k) (2^i/2^n).
    To get back to example (2): suppose you have seen 1000 coin flips and you can generate the whole sequence with a 100-bit TM program. There might yet be a shorter program, but you know that it is compressible to 100 bits, and possibly even less. Calculating the above sum directly, you can verify that the probability of getting a sequence like that is very small (see the numeric sketch after this comment). Again, there is a possibility of getting this from a fair coin, but you can say “well, if it was a series of fair coins, then it’s awfully unlikely to see it do something like that.” That is completely valid statistical inference and no one should feel afraid to make it.
    But all of these statements are about a sequence and others like it. It remains true that any individual sequence is equally probable. You need some hypothesis that groups observations and similar ones, and it has to be something you could come up with before looking at the one in question.
    Finally, one thing this does not tell you is where the sequence did come from. Ruling out a random explanation is not a reason for preferring an “intelligent agent.” You need some positive evidence to support that. This is one thing I find intolerable about Dembski: he misrepresents a legitimate and growing field of research to which he himself has contributed nothing but smoke.
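    (The two probabilities computed above are easy to check numerically. The following editorial Python sketch uses the numbers from the comment: 90 or more heads in 100 flips, and a 1000-bit string compressible to 100 bits.)

    ```python
    from math import comb

    # Example (1): probability that a fair coin gives 90 or more heads in 100 flips.
    n, heads = 100, 90
    p_value = sum(comb(n, k) for k in range(heads, n + 1)) / 2**n
    print(p_value)  # on the order of 1e-17: "a very small probability"

    # Example (2): bound on the chance that a uniformly random n-bit string
    # is compressible to k bits or fewer: Sum(i=1..k) 2^i / 2^n.
    def compressibility_p_value(n_bits, k_bits):
        return sum(2**i for i in range(1, k_bits + 1)) / 2**n_bits

    print(compressibility_p_value(1000, 100))  # astronomically small (about 2**-899)
    ```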

  12. Mark Chu-Carroll

    loren:
    I think it’s worse than that. The stuff I’ve been reading by Dembski recently has led me to believe that he’s not even a competent mathematician. He’s good at the jargon, but when you work through the density of his hideous prose, you find that his work is a pile of slop. He conflates concepts in bizarre ways that look like he doesn’t understand the distinctions; he contradicts himself; he makes incredibly foolish trivial errors. Either he’s decided that no one really pays attention to understanding his stuff and he doesn’t need to bother to even try to make sense; or he’s just completely incompetent. (Did you see his post about the “increasing interest in ID”, where he didn’t seem to realize what it meant when it said the numbers were normalized?)

  13. PaulC

    Mark C-C

    Either he’s decided that no one really pays attention to understanding his stuff

    I think it’s a safe bet that most of his supporters don’t, and that Dembski is aware of this. His detractors might, but that just gives him another opportunity to respond with another ink cloud (no offense to squids) of mathematical symbols and jargon.
    I think there’s enough evidence to conclude that Dembski is dishonest. You can go by his self-reporting. He dismisses a statement that was either a blunder or a lie as “street theater.” http://www.uncommondescent.com/index.php/archives/480#comment-13305 More recently, he has advocated putting rhetorical “pepper on the gloves” http://www.uncommondescent.com/index.php/archives/1213#comment-43255
    I know what mathematical discourse looks like, at least in the context of theoretical computer science. Very rarely, I have seen people disagree to the point of being agitated, but I have never seen anyone but Dembski make the above kinds of comments.
    He might also be incompetent. There is nothing mutually exclusive. The fact that he’s come to a laughably incorrect conclusion would suggest there’s a screw loose somewhere. Maybe he wants to believe so much that he intentionally segues between loosely related concepts as if they were the same.
    The most basic mathematical training teaches you that one mistake invalidates an entire proof (many published proofs are thus invalidated although the mistakes are fixable) so if you are consistently sloppy, you can get from any assumption A to any conclusion B in a manner that looks like it could work until it is read very carefully.
    If Dembski doesn’t realize this, then he is indeed incompetent. I’m still more inclined to think that he realizes it and is so dishonest that he does it intentionally. I just haven’t seen any reason to believe that he wouldn’t; it’s too easy to rationalize away such lies when you believe you have a more important mission.
    My working hypothesis is a little more specific. I think that Dembski may think that there’s a mathematical proof that evolution is impossible, but he’s also smart enough to realize that he hasn’t come up with it. He may have thought he had one at one time, but his detractors have convinced him of its faults. So he has two jobs. One is to keep alive the notion that he has proved something important. That requires spinning ever more elaborate justifications to keep the opposition and convince his backers that he is doing something important. The other is to find the real proof, which he may honestly believe exists. Lately I doubt he’s had much time for the second task.

  14. PaulC

    Oops. I meant “keep the opposition busy.” I don’t think he will have any trouble holding onto his opposition.

  15. Eric Wallace

    Thanks to those who responded to my comment. I’m still digesting the entropy/k-bit Turing machine stuff. But I think I understand the compressibility argument, and the idea that one can say something about the non-randomness of a process if the output can be generated by some simplified means. In Mark’s words,

    [I]n a real random process, you expect to see high information entropy.

    But this got me thinking. What about the converse of this statement? If we see high entropy, what can we say, if anything, about the underlying process? For that matter, what does it mean to see high entropy?
    Looking at PaulC’s Turing machine argument, it seems that we have to find the smallest k-bit machine in order to really get a handle on the entropy. But surely this—finding the best compression algorithm—is an NP kind of problem. For instance, in Mark’s gzip example, he overlooked a very good compression of the output sequence: the Scheme program itself! But looking at the output sequence alone, would he have ever found it? In general, only an exhaustive search of k-bit algorithms for all k < n is going to answer the entropy question.
    Is it true then that all we can say about entropy is that it is (a) low, or (b) unknown?
    This doesn’t make me feel too good about mathematical measures of entropy. Am I off base here?

  16. Mark C. Chu-Carroll

    Eric:
    Entropy in information theory is a tricky thing, particularly when you’re doing it K-C (Kolmogorov-Chaitin) style. The trick WRT KC is that you can’t in general say something like “The entropy of string S is 26”.
    But: you can say that in a given context, the entropy of string S is greater than the entropy of string T.
    And: I didn’t overlook the output you’re thinking of. In K-C theory, you ultimately are always talking about a sum: the length of a program, plus the length of input data. The information content of a string is the shortest combination of program and input data that can output that string.
    For a high entropy string S, the shortest program to output it *is* “print S”. But that’s not compression; that’s *longer than* S. It’s the length of S, plus the length of a program that copies its input data to an output. For any string S, that’s pretty much the worst-case non-contrived program – and so if there’s nothing shorter, then S is a very high entropy string. (There is a catch in all of this, which you can see by looking at my post about Behe and irreducible complexity. Knowing the shortest program to generate a string is, in general, a Gödel self-reference problem.)
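    (A tiny editorial illustration of the “print S” point, in Python; this is not a real Kolmogorov-complexity computation, which is uncomputable, just a demonstration that the literal program is always a constant number of bytes longer than the string it prints.)

    ```python
    import os

    s = os.urandom(64).hex()             # stand-in for a high-entropy string
    literal_program = f"print({s!r})"    # the "copy the data to the output" program
    print(len(s), len(literal_program))  # program length = len(s) plus a small constant
    ```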

  17. PaulC

    Eric Wallace:

    If we see high entropy, what can we say, if anything, about the underlying process?

    My answer would be “not much.” More below.

    But surely this—finding the best compression algorithm—is an NP kind of problem.

    Oh, no, it’s not nearly as easy as that. Finding the best algorithmic compression is incomputable. You can show that a string is compressible to k bits by exhibiting a k-bit program to compute it, but there is no way to show that no shorter program will do. Roughly speaking, the trouble is that even if you enumerated all the shorter programs and ran them, there is no computable function that bounds the number of steps needed for them to produce their final output and halt. So it’s as hard as the halting problem. I believe there is another kind of information measure that takes computational complexity into account, but I’m not familiar with it.
    As for what happens when you “see high entropy”: that does not mean you know there is high entropy. For instance, if you believe that there are secure encryption algorithms, then it should be possible to take any low-entropy input–e.g. a string of all zeros–and encrypt it so that it looks like it has high entropy, using an algorithm and an encryption key that together are shorter than the input string.
    Since this encrypted text is fully described by the low-entropy plain text, the key, and the algorithm, it is really highly compressible. Based on recent behavior, the human race has bet high stakes on the notion that you can make other people “see high entropy” when you know that the entropy is in fact rather low. I’m of the opinion that it’s a reasonable bet, but there is no fully satisfying justification of any of the encryption methods now used. Sometimes people find methods to discover patterns in apparent high entropy. However, it’s not easy.
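    (An editorial Python sketch of that point: a few thousand zero bytes XORed with a keystream derived from a short key look incompressible to zlib, even though the whole ciphertext is generated by a tiny program plus a tiny key, so its algorithmic information content is actually low. SHA-256 in counter mode is used here only as a stand-in keystream, not as a recommended cipher construction.)

    ```python
    import hashlib
    import zlib

    def keystream(key: bytes, nbytes: int) -> bytes:
        """Deterministic keystream: SHA-256 of key plus a counter, concatenated."""
        out, counter = b"", 0
        while len(out) < nbytes:
            out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
            counter += 1
        return out[:nbytes]

    plaintext = bytes(4096)   # 4 KB of zeros: about as low-entropy as it gets
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, keystream(b"short key", 4096)))

    print(len(zlib.compress(plaintext)))   # a few dozen bytes
    print(len(zlib.compress(ciphertext)))  # roughly 4 KB: zlib "sees" high entropy
    ```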

  18. Eric Wallace

    But: you can say that in a given context, the entropy of string S is greater than the entropy of string T.

    I guess the interesting question for me is, what constitutes the context? If it includes knowledge of the process which generates the strings, then I can believe this statement. Without that knowledge, we’re left guessing at what the smallest program that can generate S or T is. We may suspect that one has greater entropy, but we can’t prove it.

    In K-C theory, you ultimately are always talking about a sum: the length of a program, plus the length of input data.

    The program I was thinking of was actually the pseudorandom number generator plus any initial state, not just a program that can print the string. That would be excellent compression of any significantly long output string. I suppose all strings above a certain length would have entropy less than or equal to some fixed amount, as determined by this program.

  19. PaulC

    Eric Wallace:

    The program I was thinking of was actually the pseudorandom number generator plus any initial state, not just a program that can print the string. That would be excellent compression of any significantly long output string. I suppose all strings above a certain length would have entropy less than or equal to some fixed amount, as determined by this program.

    I may not be following you, but I think this is a mistake. It’s true that you could have a pseudorandom generator that eventually output any string as a substring. For that matter, it could be a deterministic “dovetailing” program that output all binary strings sorted by length. E.g. 0100011011000001010011100101110111…
    But the fact that it would print your string as a substring does not make it a compression of your string. You could make it a compression of your string by specifying the start and end points of the substring you are interested in and printing only in that range, but it would take just as many bits to encode these positions as it would take to encode the string itself. So a program that outputs all possible strings is not a compression of any given string.
    It is easy to prove that most strings have high entropy. There are 2^n n-bit strings but at most 2^k k-bit encodings (for some fairly straightforward reasons there are actually fewer). So at most a fraction 2^k/2^n of the strings have entropy k.

  20. Doug

    1 – I highly recommend the following article related to evolution-
    ‘What Birds See’ by Timothy H. Goldsmith, professor emeritus of molecular, cellular and developmental biology at Yale University
    Scientific American, July 2006, Vol. 295, Issue 1, p. 68 (8 pages)
    Goldsmith has performed experiments suggesting that birds and reptiles [tetrachromatic opsins + UV + oils in cones] have vision superior to mammals [dichromatic cones but more rod reliance]. Further, some primates, including humans [trichromatic cones], have regained some of this lost vision.
    This appears to suggest that either genes were lost or, probably more likely, two cones were put to other brain uses [neocortex?] during this period of dinosaur domination.
    2 – I think Kandel shared a Nobel prize for work on memory formation in a sea slug, Aplysia [intelligent, but not sapient].
    “Professor Eric Kandel showed that transmitters of the same type as studied by Arvid Carlsson, via the protein kinases characterized by Paul Greengard, are involved in the most advanced functions of the nervous system such as the ability to form memories.”
    http://nobelprize.org/nobel_prizes/medicine/laureates/2000/presentation-speech.html

  21. Torbjörn Larsson

    Describing modelling is a complex business. I haven’t read those referenced books on it, but I have some small experience of doing it on processes with persistent general but complex behaviour.
    We learned the following:
    1. If your model explains the basic behaviour, people will adopt it after you have done a few experimental papers using it.
    2. If your model explains the basic behaviour, it will help considerably in defining the remaining properties of the process, see 1.
    3. The model will be more complex than basic theory, since we want basic theory to be parsimonious and thus applications are messier, but it will still be useful to make it as parsimonious as possible, see 1 and 2.
    I note that ID makes a point of consistently failing on point 1.
    Another thing we learned from other groups’ modeling attempts is that people actually circularly put in ad hoc assumptions that give them the behaviour they want to verify the model with. The ad hoc assumption gives the persistent feature that they see, i.e. the model is ‘verified’. So is the next model, which uses a different ad hoc assumption. And those models fail in the next experiment, which typically needs other values on the free parameters of the ad hoc assumption.
    The latter problem reminds me somewhat of when Dembski invents a “law of conservation of information” since he wants to prove that a designer provided information. The law assumes, ad hoc, that evolution is closed in order to prove that information is conserved, ergo evolution needs information input. (Evolution isn’t a closed system, since the environment both changes and provides information on how new evolutionary properties succeed.) Or when Behe assumes, ad hoc, that irreducible complexity is irreducible.
    These aren’t clear-cut analogies, but in general they show the problem of assuming what you want to show. The only prediction you can make about such results is that they will fail.

  22. Eric Wallace

    PaulC said:

    I may not be following you, but I think this is a mistake. It’s true that you could have a pseudorandom generator that eventually output any string as a substring.

    As I reread my comment, I see that I did not word it very well. I really just meant that the program (plus initial state) would be a good compression of a particular string; in this case, the string that Mark generated with his program. When I wrote “That would be excellent compression of any significantly long output string”, I didn’t mean any string, just any string that is naturally generated by that program (i.e., no “substring”ing involved). I was just hedging against the fact that if you stopped after generating only a few bits, then the program obviously wouldn’t be a good compression of the string. You have to generate enough bits that storing the program, seed and length is worthwhile.
    But I think you answered my main question with the “not much”. 🙂

  23. PaulC

    Eric: I wanted to add one more comment that should be obvious to anyone who’s studied recursion theory, but I hadn’t thought about it too carefully when I responded. Given any algorithm that purports to decide whether strings have high or low entropy, I can fool it into identifying a low entropy string incorrectly as high entropy.
    Suppose you give me a k-bit program A that looks at any n-bit string and eventually halts with output LOW if it has a description of size less than n bits (note: it could be as high as n-1, which is not very compressible), and HIGH if it fails to find such a description.
    I can use this to make a k+m-bit program B that takes a number n as input, enumerates all bit strings of length n, and passes each string to a subroutine based on program A. As soon as that subroutine returns HIGH, it outputs the string s that gave that result.
    From the above, we can see that the string s has a description of size roughly k+m+log_2(n), which is much smaller than n for all sufficiently large values. Thus, no matter how many low-entropy strings your program manages to identify correctly, there is always at least one that it falsely identifies as high entropy.
    BTW, there is nothing magic about log_2(n). You could start by constructing some very large n with a short description (e.g. 2^(2^(2^(2^(2^2))))) and fool the same program into identifying a string as having n bits of complexity for that huge value of n when in fact its actual complexity is very small.
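    (The construction above can be sketched in code. The following editorial Python sketch treats the hypothetical program A as a function passed in as a parameter; the toy classifier at the bottom is just a stand-in so the sketch runs, since by the argument no correct halting classifier can exist.)

    ```python
    from itertools import product

    def first_string_called_high(looks_high_entropy, n):
        """Program B: enumerate n-bit strings in order, return the first one A calls HIGH."""
        for bits in product("01", repeat=n):
            s = "".join(bits)
            if looks_high_entropy(s):
                # B plus the number n is a short description of s, so s cannot
                # really have high entropy once n is large enough.
                return s
        raise ValueError("the classifier called every n-bit string LOW")

    # Toy stand-in for program A: call a string HIGH if it has no run of four equal bits.
    toy_classifier = lambda s: "0000" not in s and "1111" not in s
    print(first_string_called_high(toy_classifier, 16))
    ```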

