Remember when I talked about the problems with Bayesian probability? As you’ll probably recall, one of the things that drives me crazy about Bayesianism is that you get a constant stream of crackpots abusing it. Since the basic assumption of Bayesian probability is that you can always use it, you’ll constantly get people abusing it.
Christopher Mims, who was one of the people running ScienceBlogs when I first signed on, sent me a classic example. A professor has published a paper in a journal called “Astrobiology”, arguing that there’s an exceedingly low probability of intelligent life elsewhere in the Universe.
It’s pretty much the same kind of claptrap that you find in the Bayesian
proofs of God. That is, pick the conclusion that you really want to get,
and make up numbers for the priors needed to get the result you want.
So, for example, Watson fairly arbitrarily assumes that there are four key events in the development of intelligent life:
- Single-celled life.
- Multi-cellular life.
- Specialized cell types.
- Language.
He takes these four, and assigns each of them a prior probability of 10%. Why 10%? Who knows? It seems reasonable. Why four steps? Well, because he wanted an end result of 0.01 percent, and that takes four events.
Then he plays with it a bit more, to argue that the time between when intelligent life emerges and when it dies out (due to the death of its solar system) is likely to be quite brief, and so the odds of two intelligent life forms coexisting at the same time in locations where they could make any kind of contact are vanishingly small.
Like I said: this is rubbish. The “4 independent steps” are a crock; you could just as easily make it ten steps:
- Replicators
- Cells
- Mobility
- Organelles
- Multicellular
- Specialized cells
- Sensory apparatus
- Sexual reproduction
- Tool use
- Language
Why are four steps more reasonable than ten? Just because four is what produced the result that he wanted.
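To make the arbitrariness concrete, here’s a quick sketch using the paper’s own made-up 10% figure. The final number is purely a function of how many steps you decide to count:

```python
# The "result" is just exponent-picking: with a uniform 10% guess per
# step, the final number is determined entirely by how many steps you
# decide to count. (The 10% figures are made up -- which is the point.)
p_step = 0.1

four_steps = p_step ** 4    # Watson's four "key events": 0.01 percent
ten_steps = p_step ** 10    # the equally arbitrary ten-step list above

print(f"4 steps:  {four_steps:.0e}")
print(f"10 steps: {ten_steps:.0e}")
```

Four steps gets you Watson’s 0.01 percent; the equally defensible ten-step list gets you one in ten billion. The “result” is baked into the setup.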
Even if you accept the idea of the best model being these four key milestones, the uniform 10% prior is a crock.
You could easily argue that the probability of primitive replicators arising is higher than 10%, or lower. You could argue that once there was a replicator, the odds of it developing into cells were higher than 10% – or lower. You can make a good argument either way, because the fact is, we don’t know.
You could easily argue that given single-celled life, the probability of some kind of cooperation leading to multi-cellular life was quite high. You could also argue that it was quite low. Assigning it a prior of 1 in 10 is what is technically known as “talking out your ass” – meaning making it up as you go along to produce the result you want. Because we don’t know.
Once you have multi-cellular life, my own guess is that specialization
would be inevitable. But that’s a guess: the fact of the matter is,
we don’t know. There’s lots of work going on in biology to figure things like
that out, but the fact is, at the moment, we don’t know, and so any figure that someone spits out for a probability is just something that they made up, because it sounded right.
Once you have creatures with specialized cells and internal organs,
what are the odds that they’ll develop language? Who the hell knows? Probably pretty damned unlikely, given the number of species on earth that have distinct organs but no significant spoken language. It certainly seems reasonable to guess rather less than 1 in 10. But again – that’s a guess.
I could keep going. But the point is, the whole thing is just made up.
And it should be quite obvious to just about anyone who looks at it
that it’s just made up. It’s an abuse of math to make a rhetorical argument. The truth of the matter is, Professor Watson thinks that intelligent life is very rare, and so he threw together a bunch of bullshit to make that argument
look more serious than “I think it’s unlikely”. See, it’s not just that he thinks so – it’s that he’s done a mathematical analysis to determine that
it’s unlikely. The math doesn’t say anything more than “He thinks it’s unlikely”. But he’s been able to turn that into a journal publication and a whole lot of press – by wrapping it in the trappings of math.
It’s not really the fault of Bayesian probability. Idiots will be idiots, and people who want to misuse math or science will find a way. But Bayesian
probability does, by making a claim that it’s always applicable, lend itself to this kind of thing more than any other kind of math I know. And that has the unfortunate effect that when I hear about a Bayesian analysis of just about anything, my bullshit detector goes on high alert.
When you see something like this, there are some simple tricks for recognizing it as bullshit, which I’ve tried to follow above. The main thing is, look at the priors. There are two things that you’ll see in trashy arguments: a set of uniform priors for very different events; and a very random-seeming set of events.
In this article, we’ve got both in spades. We’ve got four priors, which are
completely random – there’s no particular reason to believe that these four are independent, or that they’re the important factors. And we’ve got uniform priors for wildly divergent phenomena: “Cellular life”, “multicellular life”, and “language” all given identical priors without justification beyond “Well, it seems right”.
That’s Bayesian garbage, and it’s very unfortunate that there’s so much of it
out there. Because there’s a lot of really good math built on Bayesian probability. But my instinct is always to mistrust anything Bayesian, because the good math doesn’t go out of its way to advertise itself as Bayesian – the authors just show you how they
did their computation, how they picked their priors, etc. Whereas almost anything which is explicitly labeled as a “Bayesian proof” is crap.
Nice post Mark. Are there any requirements for defining valid priors? I can obviously see instances where the priors are valid (such as your coin-flip scenario), and thus the math is completely sound. However, it appears that the nature of Bayesian probability requires a lot of uhmm… estimating. Does the math of Bayesian probability have some type of rigorous/formalized method to exclude invalid priors?
Daithi:
Unfortunately, there’s nothing formal in terms of requirements for priors – and that’s sort of the problem. The point of Bayesian probability is to allow reasoning in the face of uncertainty, and the priors are gauges of your uncertainty.
So it comes down to a subjective thing – you need to look at the priors, and see if there’s a good reason to trust them. The article I linked to is a great example of unsupported priors – the author makes no real attempt to provide any reason for them.
For a personal example of good priors: my family has a strong history of diabetes. My great-grandfather, my grandfather, my grandfather’s sister, my mother, and my mother’s brother all have/had type-II diabetes. When you look at the population as a whole, around 5% will develop type-II diabetes. So for most people, an initial prior estimate of their chances of getting diabetes is 0.05. Given my family history, according to my doctor, my risk is roughly 3x average. If I keep my weight down, I can cut that risk roughly in half. So we’ve got a set of priors that come from statistical analysis of large populations – 5% average risk; 15% risk with strong family history; 50% risk reduction for low weight.
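For what it’s worth, that chain of estimates is a trivial calculation (the figures below are the rough ballpark numbers quoted above, not real epidemiological data):

```python
# Chaining the rough population-level figures quoted in the post.
# These are ballpark numbers, not real epidemiological data.
base_risk = 0.05           # ~5% of the general population develops type-II diabetes
family_multiplier = 3.0    # "roughly 3x average" with a strong family history
weight_factor = 0.5        # keeping weight down cuts the risk "roughly in half"

family_risk = base_risk * family_multiplier   # 15%
managed_risk = family_risk * weight_factor    # 7.5%

print(f"with family history:  {family_risk:.1%}")
print(f"and weight kept down: {managed_risk:.1%}")
```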
The key is that there’s a reason for those priors. We have large numbers of observations to give us some way of making an educated guess. If I were just pulling numbers off the top of my head, I would have guessed that the risk is much higher than that, given the number of family that have it. But I’d just be guessing, so it would be a crappy prior.
So that’s what you need to look at: what kind of support can you provide for the priors?
I think that this really points out one area that you need to be careful of when teaching Bayesian methods. The argument I hear all the time in favor of Bayesian probability – which induces varying degrees of wincing depending on how strongly they put it – is that it’s wonderful because it lets you reason in cases where you can’t have repeatable experiments and frequentist notions (at least in a formal sense) therefore can’t apply. But I rarely hear the should-be-mandatory caveats that your conclusions will only be as good as your priors let them be, and if you really have no idea how good your prior is, you shouldn’t be calling what amounts to a mathematically consistent set of educated guesses “Bayesian inference”.
In my understanding (I’m not a mathematician), the use of Bayes’ theorem doesn’t make one a Bayesian. Using frequencies from large samples is plain “frequentism”. Bayesian inference assumes that, starting from whatever priors, the repeated use of prescribed techniques will make the probabilities, as measures of degree of belief, converge toward the true value. However, I’m wondering what “true” means – what is it to which the beliefs converge?
I would be interested to hear you talk about overfitting in this context. If I gather statistics on a sample, I’m likely to generate priors biased toward that sample and go awry when I meet the whole population. Can you do n-fold validation of Bayesian networks/priors? What would that look like?
This is also reminding me of the black swan. And the problem of induction.
The whole point of Bayesian probability is that there is a systematic way to improve your initial subjective guesses about likelihood in light of new information. The initial priors are not particularly interesting in themselves, except to the extent that they allow you to quantify your preconceptions.
In the case of the probabilities of ETs, the interesting part of Bayesian probability is conspicuously absent: we don’t have any new information to use to adjust our priors (well, beyond the observation that 40 years of SETI have yielded nothing)
You miss the whole point of Bayesian probability by talking about priors being right or wrong. Bayesian probability merely tells you how the priors are related to your posterior.
Honestly, why are you even writing this stuff?
Hi Mark
We’ve been discussing the Watson paper over at…
http://www.centauri-dreams.org/?p=1818
…”Centauri Dreams” and a lot of personal prejudice seems in evidence, but not much fact. I had thought something stank about Watson’s argument, and Brandon Carter’s original one, but I couldn’t quite point to what it was. Thanks for an educational spleen venting.
Umm, Bayesian inference is a way to turn around probabilities.
Actually, it’s the only way to do it.
Learning from data is not possible without assumptions. Bayesian inference makes these assumptions explicit.
Somebody in the comments to the previous post linked to a Cohen paper describing the fallacies of P-value testing: if we get a result that P(data|model) is small, that is simply *not enough information* to say that P(model|data) is small, since P(model|data) = P(data|model)*P(model)/P(data) – we have to know or estimate P(model) and P(data).
The perfect example of P value testing going awfully wrong is this:
Only one in a million Americans is a member of Congress. If you pick a random person from Earth, and that person turns out to be a member of Congress, what is the probability that the person is American? Let us take as our null hypothesis that the person is American. Well, P(congress|American) = one in a million, hence a frequentist says we can discard that hypothesis with a very low P value: “it is unlikely that the null hypothesis would produce the data we observe, hence the null hypothesis is incorrect and the alternative is correct – the person is not an American”.
When all the while, P(congress) was even smaller, turning the whole thing around…
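To make the inversion concrete, here’s the same example run through Bayes’s theorem. The one-in-a-million figure is from the comment above; the population numbers are round assumptions for illustration (roughly 300 million Americans out of roughly 7 billion humans), and for the sake of the example only Americans can be members of this congress:

```python
# Inverting P(congress | American) with Bayes's theorem.
# The one-in-a-million figure is from the comment; the population
# numbers are rough assumptions for illustration only.
p_congress_given_american = 1e-6
p_american = 3e8 / 7e9   # assumed: ~300M Americans out of ~7B humans

# Assume only Americans can be members of this congress, so P(congress)
# over all humans is P(congress | American) * P(American).
p_congress = p_congress_given_american * p_american

p_american_given_congress = (p_congress_given_american * p_american) / p_congress
print(p_american_given_congress)
```

Despite the one-in-a-million likelihood that the frequentist used to “reject” the null hypothesis, the posterior probability that the person is American is 1: P(congress) is so much smaller than P(congress|American) that the inversion flips the conclusion completely.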
Just pretending to learn from data without making any assumptions is mathematically incorrect.
Mark, after discussing Bayesian inference with a graduate student friend, I don’t think many people actually have a clue what it is even about, and how the results of frequentism often don’t answer the questions posed in the research at all.
So I say, dive deeper into Bayesian stuff…
Summing up Bayesianism in one sentence:
It’s a rigorous way of exposing the assumptions that have to be made anyway if one wants to get any results.
More correctly, MarkCC doesn’t know of any formal requirements for priors. Anyone interested can google “reference prior”, “noninformative prior”, “weakly informative prior”, etc., to find work on this question.
Here’s a little sample of reasoning about prior probability distributions. Suppose you have two random deviates from a Gaussian distribution with unknown mean and standard deviation. You are about to make a third observation, and because you have no information about the location and scale of the data distribution, you consider all possible orderings of the three data points equally likely — in particular, your posterior predictive probability for the event “the third data point is between the first two” is 1/3. Harold Jeffreys proved that the only prior distribution for the standard deviation which satisfies this constraint is the standard noninformative 1/σ prior.
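That 1/3 claim is easy to poke at numerically. The Monte Carlo sketch below assumes one standard result (which I’m relying on rather than rederiving): with a flat prior on the mean and the 1/σ prior on the standard deviation, the posterior predictive for a third Gaussian observation given two is a t distribution with one degree of freedom (a Cauchy), centered at the sample mean with scale s·√(3/2):

```python
import math
import random

random.seed(1)

# Monte Carlo check of the ordering claim. Assumed standard result:
# with a flat prior on the mean and the 1/sigma prior on the standard
# deviation, the posterior predictive for a third Gaussian observation
# given two is a t with 1 degree of freedom (a Cauchy), centered at the
# sample mean with scale s * sqrt(1 + 1/2), s = sample sd of the pair.
inside = 0
trials = 200_000
for _ in range(trials):
    x1 = random.gauss(10.0, 3.0)   # the "true" parameters are irrelevant
    x2 = random.gauss(10.0, 3.0)
    center = (x1 + x2) / 2
    s = abs(x1 - x2) / math.sqrt(2)   # sample sd for n = 2
    scale = s * math.sqrt(1.5)
    # draw from the Cauchy predictive via the inverse CDF
    x3 = center + scale * math.tan(math.pi * (random.random() - 0.5))
    if min(x1, x2) < x3 < max(x1, x2):
        inside += 1

frac = inside / trials
print(frac)   # should hover around 1/3
```

Run it and the fraction lands within a percent or so of 1/3, as the Jeffreys result predicts.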
Bayesian, Frequentist. Frequentist, Bayesian.
Probability, in any clothing, is often misunderstood, misused, and abused.
Too many people who should know better take a frequentist p-value or confidence interval and give it a Bayesian interpretation.
Too many Bayesians perform “model checking”.
Too many…
I don’t think that multicellularity necessarily leads to cell specialization. Multicellularity occurs in a lot of organisms – bacteria and eukaryotes. However, multicellularity with specialization only occurs in cells that have mitochondria and nuclei, and that only happened in one lineage of cells. Maybe it’s the ability to oxidize carbon internally or organize DNA better, but the eukaryotic lineage developed multicellularity with specialization three separate times (fungi, plants, and animals), making it look like there’s something special about eukaryotes that makes complex life almost inevitable.
Brian:
there are literally thousands of examples of prokaryotic cell specialization. A classic example is of course the heterocyst cell in cyanobacteria, a single cell specialized for the fixation of nitrogen. However, there are many, many other prokaryotic organisms that exhibit functional specialization in colonies. For a really good example of this, look up Myxobacteria.
Specialization is really not that unique. It is a tool that has been re-evolved countless times by a ridiculous number of lineages.
James:
Bayesian probability only talks about how priors relate to posteriors. Once you step beyond that to the realm of actual inference, as in making claims about the real world, it’s fair game to ask how good they all are and how well it matches with reality.
On the other hand, having read through the actual article (albeit pretty briefly) heaping venom on this guy’s head for abusing Bayesian statistics is unwarranted. It looks like he starts with a sequence of “critical events” that occur with exponential waiting times (not quite a Poisson process), and then tries to estimate the various rates from geological data and the number of critical events from vaguely survival-analysis-ish methods, then rounds it out by smuggling in a little well-hidden likelihood-based testing at the very end. There’s a little proportionality notation thrown in, but that’s just placeholders to find constants of integration to get a valid PDF out. Not even much conditional probability, as far as I can see.
And then finally there’s the little perverse voice in me that compels me to point out that talking about the “real probability” of alien life is quite nearly nonsense. Either there is alien life out there and hence P=1, or there isn’t and P=0.
(Article is Andrew J. Watson. Astrobiology. February 1, 2008, 8(1): 175-185., it’s available off the second result on google scholar if you search “Watson Astrobiology 2008”)
My problem with anti-frequentist arguments (see the previous thread) is that under the same heading Fisher, Neyman, von Mises, etc., are all bashed, and that sounds like propaganda.
R. von Mises:
“Opponents of Bayes’ approach object that the idea of the “collection of dice with various x” is unrealistic since one deals with one single die whose x is an unknown constant, and not with a universe of dice. This objection is not to the point. By attempting an inference on the probability of the x of the die used in our experiment we imply that various values of x are possible. The “collection of dice in a bag” is a model, an abbreviating simplification; the “collection” may consist of the dice in the various stores of a city or of the United States, and we are uncertain about the x of our chosen die.”
What Fisher, Neyman, von Mises, have in common is their belief that certain phenomena or experiments related to relative frequencies are prone to investigation by means of probability calculus.
Now, about ET: for me it is nonsense to attach odds to phenomena like development of metazoans, etc., calculate probabilities and then make claims about the odds of ET. All I have is a change, eventually, in my/our opinions about ET.
(It’s just my opinion:))
Shorter MarkCC:
Garbage in, garbage out.
Quite apart from any of Watson’s presuppositions, his 0.01% is not “an exceedingly low probability”. Given the hundreds of billions of galaxies with hundreds of billions of stars, almost any probability, no matter how small, virtually guarantees the existence of (intelligent) life somewhere in the universe. Whether it’s close enough to ever detect is a different cup of tea.
Mark, you’re embarrassing yourself. Repeatedly.
I daresay you will find more abuses of frequentist statistics out there than Bayesian. For every bad choice of Bayesian prior out there, how many cases are there of misinterpreted p-values, spurious correlations, inappropriately calibrated confidence intervals and estimators which should be shrunk but aren’t, “statistically significant” yet negligible effect sizes, etc.? You could argue that this is simply because there are more people using frequentist methods, but you have not given any reason why Bayesian probability is somehow more susceptible to abuse than frequentist probability.
Your claim that “the basic assumption of Bayesian probability is that you can always use it” is baffling. I have no idea what it means. Yes, you can apply Bayesian probability to any problem of inference, if your assumptions are correct. But the same is true of frequentist probability. The two approaches have different things to say and can reach different conclusions, but one is not more widely applicable (or claims to be) than the other.
I also disagree that anything that labels itself “Bayesian” is automatically suspect. Whether a paper labels its methods “Bayesian” has a lot more to do with the prevalence of non-Bayesian methods in the related literature than it does with any attempt to tout bad logic under a sexy name. In subfields where Bayesian methods are common and accepted, people don’t bother to say “Bayesian”; in frequentist-dominated subfields, authors feel it necessary to point out that they’re doing something different.
In point of fact, the article you’re criticizing does NOT tout itself as “Bayesian”. The particular Drake-equation style argument you’re criticizing isn’t Bayesian. It’s FREQUENTIST. It’s a likelihood function, giving the probability of some outcome (intelligent life) conditional on a bunch of hypotheses. Both frequentists and Bayesians use likelihoods. But the Drake equation isn’t a Bayesian argument, it’s frequentist, and every factor in it has a frequency interpretation (expressible in terms of “fraction of planets in the universe with property X”).
A Bayesian would condition on something observed (e.g., the existence of intelligent life on at least one planet) and infer something about the probability of a hypothesis (e.g., the probability of unicellular life arising). Here the opposite is done. Stringing together a bunch of conditional probabilities does not make an argument Bayesian — frequentists can do that too — and the argument you described did not use Bayes’s theorem anywhere.
Your terminology is also mistaken: a conditional probability with a specific (subjectively chosen) value isn’t a “uniform prior”. In fact, as I said, none of the conditional probabilities appearing in your argument are used as priors in the Bayesian sense. But that aside, a “uniform prior” would be something like “I think the likelihood of unicellular life arising can take on any value between 0 and 1 with equal probability”, i.e., that you are totally agnostic about the likelihood of that event. Saying “I think the likelihood is 0.1” is the exact opposite of a “uniform prior”: it is a very specific POINT prior. (Or rather, it would be if you were using it in a Bayesian calculation, which you’re not!)
You seem to be under the impression that because the conditional probabilities which appear in the likelihood were subjectively chosen, this makes the calculation Bayesian. This is nonsense. What makes a calculation Bayesian is conditioning on observations, rather than the frequentist approach of conditioning on hypotheses. If you believe that just because frequentist statistics doesn’t use priors, that means that it is therefore “objective” and free of any arbitrary subjective choices, you are badly mistaken. The choice of a likelihood function is an arbitrary modeling choice. In this case, it’s a very subjective one.
Rich’s post above has a better description of what the Astrobiology article did. It is also clearly a wholly FREQUENTIST argument: he calculates the probability of various events happening over a given interval of time, which is a straightforward frequentist likelihood calculation of an outcome conditioned on various hypotheses (as questionable as his assumptions may be), and then he proceeds to test hypotheses using an ordinary frequentist p-test. (“Thus P_(1/4)
umvue:
I think you mean, “Too many Bayesians do not perform model checking…”
Rich,
“The argument I hear all the time in favor of Bayesian probability […] is that it’s wonderful because it lets you reason in cases where you can’t have repeatable experiments”
That argument for Bayesian methods has never made much sense to me. (The other one I don’t like is “Bayesian methods let you incorporate prior information” … great, that makes sense and is useful when you can do it, but you often don’t have strongly informative priors, so it’s not a compelling reason to be Bayesian.) Frequentist methods let you reason about unique events. They have to do so by introducing a hypothetical infinite ensemble of other events that did not happen and were not observed, upon which inferences depend, so some consider it philosophically bizarre, but they can still generally reason about the likelihood of a unique event.
Dan,
“If I gather statistics on a sample, I’m likely to generate priors biased toward that sample and go awry when I meet the whole population.”
(Here I’m assuming you’re talking about a sample that you set aside to determine a prior, which is not used in later inference about the population. Otherwise, you’re using the data twice and will be overconfident.)
“Can you do n-fold validation of Bayesian networks/priors?”
Yes. It works pretty much like you expect. Set aside a training set, predict on a validation set, see how good your predictions are. You could in principle choose priors that minimize cross-validation error, assuming that you only use those priors in making inferences about new data that was in neither the training nor validation set.
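Here’s a toy sketch of that scheme, using a coin-flip model with a Beta prior. Everything below – the true bias, the candidate priors – is invented for illustration:

```python
import math
import random

random.seed(42)

# A toy version of the validation scheme described above: choose the
# Beta prior for a coin's bias by held-out predictive performance.
# The true bias and the candidate priors are all made up.
true_p = 0.7
data = [1 if random.random() < true_p else 0 for _ in range(200)]
train, valid = data[:100], data[100:]

def held_out_log_loss(alpha, beta, train, valid):
    """Negative log-likelihood of the validation flips under the
    posterior predictive from a Beta(alpha, beta) prior and the
    training flips (posterior mean as the predictive probability)."""
    heads = sum(train)
    p_heads = (alpha + heads) / (alpha + beta + len(train))
    return -sum(math.log(p_heads if x else 1 - p_heads) for x in valid)

candidates = [(1, 1), (0.5, 0.5), (10, 10), (2, 8)]
scores = {ab: held_out_log_loss(*ab, train, valid) for ab in candidates}
best = min(scores, key=scores.get)
print(best, round(scores[best], 3))
```

A real n-fold version would rotate which block is held out and average the losses; and as noted above, the winning prior should then only be used for inference on data that took no part in the selection.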
Rich,
“And then finally there’s the little perverse voice in me that compels me to point out that talking about the “real probability” of alien life is quite nearly nonsense. Either there is alien life out there and hence P=1, or there isn’t and P=0.”
Well, you’re clearly a frequentist then. A Bayesian would say that P represents the strength of evidential support for that hypothesis, and can take on any value between 0 and 1.
You know, there’s a reason why I get so pissed off about this whole bayesian/frequentist thing.
Bayesian probability isn’t just Bayes’ theorem. It’s an interpretation of the meaning of probability.
If I were to post a *good* probabilistic argument that was structured in roughly the same way as the thing I criticized here, and I said that it was a non-Bayesian argument, I’d be pounded for daring to suggest that it was non-Bayesian. But if it’s a crock of shit, then I get pounded for calling it Bayesian.
Frequentism is predicated on the repeatable experiment. A frequentist wouldn’t try to do a probability calculation for something like this, because it’s not a repeatable experiment with controllable parameters. Give a problem like this to a frequentist, and they’ll throw up their hands and say “Can’t do it”.
Bayesian probability, because it’s based on degrees of knowledge, says that you can always compute some probability, and the result is your best guess based on uncertain knowledge.
The Bayesian view of this kind of computation isn’t that there’s a 0.01 percent chance of intelligent life developing elsewhere; it’s that, given the state of our current knowledge, our degree of certainty of intelligent life developing elsewhere is 1 in 10,000.
I’d argue that that latter statement – the Bayesian statement – is exactly what the author of this garbage argument is saying. If you read how he qualifies the estimate – he says something along the lines of “It’s obviously a simplified model of things, but it’s a good
estimate based on what we understand in a form which is usable in a mathematical analysis” – I think that the Bayesian interpretation is pretty clear.
With respect to the response to my question about formally evaluating priors:
In general, evaluating a prior isn’t something that can really be formally addressed. In particular circumstances, there are methods that can provide a way of evaluating the quality of a prior.
If you’ve got a set of statistics, and you have a high degree of certainty about the distribution, you can use that to provide an evaluation of the quality of a prior based on
those statistics. But Bayesian probability doesn’t require that you have any such basis. A prior is, ultimately, a “best guess” at the initial state of knowledge. That best guess can come from something really good – like a well understood set of statistics with known distribution; or it can be a wild-ass guess.
Part of the power of Bayesian methods is that even starting from a wild-ass guess, you can apply additional knowledge to improve your understanding. If you start with a wild-ass guess, it’ll take longer to converge on a high degree of certainty than it would with a high-quality prior – but either one will converge on high certainty eventually.
Take my personal example. Whether we start with the 1/20 odds, generated from totally general population statistics, or we start with 1/6 (a rough estimate generated from statistics about people with family history), or we start from my wild guess of 1/2, eventually, I will either develop diabetes or not. As time passes, and I have various tests at my regular checkups, we’ll gradually accumulate evidence – and if I use Bayesian methods to add that knowledge to my estimate, my certainty about whether or not I will eventually have diabetes will increase. Eventually, we’ll be able to say with virtual certainty that either I *have* diabetes, or I’m not going to have diabetes. Using Bayesian
methods, I’ll always be able to update the state of my knowledge – whether I started with the 1/6 (which is the prior based on the most information), or the wild guess – and eventually I’ll converge towards either 1 (I have diabetes) or 0 (I won’t develop diabetes).
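The convergence story is easy to sketch. Below, three different priors for the same yes/no question – the population figure, the family-history figure, and the wild guess – are updated on the same stream of hypothetical test results. The test’s sensitivity and specificity are made-up numbers for illustration:

```python
# Sketching the convergence story: three different priors for the same
# yes/no question, updated on the same stream of hypothetical test
# results. The test's sensitivity and specificity are invented numbers.
def update(prior, positive, sens=0.9, spec=0.9):
    """One Bayes update of P(condition) on a single test result."""
    if positive:
        num = sens * prior
        den = sens * prior + (1 - spec) * (1 - prior)
    else:
        num = (1 - sens) * prior
        den = (1 - sens) * prior + spec * (1 - prior)
    return num / den

priors = {"population": 1 / 20, "family history": 1 / 6, "wild guess": 1 / 2}
results = [True] * 6   # suppose six successive positive test results

finals = {}
for label, p in priors.items():
    for r in results:
        p = update(p, r)
    finals[label] = p
    print(f"{label:15s} -> {p:.5f}")
```

After a half-dozen consistent results, all three starting points have converged on essentially the same answer; the worse the prior, the more evidence it takes to get there.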
So the wild guess is a usable prior, and it will eventually converge on high certainty as I add knowledge. But how can you formally gauge the quality of my initial wild guess? If you don’t know where it came from, if you don’t have good data about the incidence and causes of diabetes, about the genetic factors that could influence it, etc., how can you formally gauge the quality of it?
The best Bayesian arguments are obviously the ones where you’re not just using a guess for a prior – where you’ve got enough information to formally assess the quality of that prior. But that’s not all the time – and most of the informal (and crappy) probability arguments that we see throw out random guess priors. (For example, look up Swinburne’s proof of God, where he claims that because we have no evidence either way, he can use priors of 1/2 for everything.) What I’m trying to do is explain how to deal with those all-too-common garbage arguments that just fling out randomly selected priors, and try to pretend that they’ve got some mathematical weight because they’re supposedly using probability theory.
MarkCC:
He’s using a Bayesian interpretation of what probability “means”, but I think that speaks more to the hugely unintuitive way that frequentist statistics demands you interpret the result than it does to his methodology. People never want to put the randomness in the sample, they always have this same intuition of probability-as-belief.
At any rate, I don’t see anything in there about the distribution of the probability that there’s intelligent life in the universe being distributed as function X with parameters Y and Z based on the data. It’s point estimation all the way.
If anything, he’s “sinned” twice; first by drawing a somewhat debatable conclusion and presenting it with a good deal more certainty than it probably warrants, and then again by not presenting the results — at least as far as what the number means and how it should be interpreted — in a manner consistent with the model he used.
If you had just called it “Bad Statistics” I think everyone would have been happier. Putting “Bayesian” or “Frequentist” in a post title is just asking for trouble ’round these parts, I guess 🙂
Rich:
Sure, I could have skirted the issue and not said anything about it being a Bayesian article in the title. But I don’t think that would be accurate. This particular kind of argument comes specifically from people abusing Bayesian probability. As you admit, the argument that I’m mocking is based on the Bayesian interpretation of probability.
Frequentist probability has its own set of problems. And if someone were to send me an amusing example of a really dreadful frequentist argument – something as egregious as this monstrosity – then I would criticize it and prominently label it as frequentist bullshit in the title. (And anyone out there who has good examples of that, please send them!)
Ambitwistor writes: That argument [because it lets you reason in cases where you can’t have repeatable experiments] for Bayesian methods has never made much sense to me.
Why not?
I’m not sure if this counts as a stupid frequentist argument, but a classic example of an unintuitive probabilistic prediction is this:
Suppose that there is some disease that is extremely rare, only 1 person in a million has this disease. There is a blood test for this disease, and the test is 99% accurate. On a whim (you don’t show any symptoms) you take the test and it turns out positive. How likely is it that you have the disease?
One plausible way of looking at it is this: You tested positive, there is only a 1% chance that the test is mistaken, so there is 99% chance that you have the disease. However plausible this sounds, it is completely wrong. You haven’t taken into account the fact that before the test, your subjective probability that you had the disease was .0001% (one in a million). Testing positive raises the probability significantly, but not to anywhere close to 99%.
Bayes formula tells us that the posterior probability of having the disease is only about 0.01%
You can come to the same conclusion using frequencies: Out of 1 million people tested, you will find about 10,000 false positives, and only 1 accurate positive.
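Both routes to the answer — Bayes’ formula and the raw frequencies — can be checked with a short Python sketch. The numbers come straight from the example above; the helper function and its name are just for illustration:

```python
# Rare-disease example: base rate 1 in a million, test "99% accurate"
# (assumed here to mean both 99% sensitivity and 99% specificity).
def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / p_positive

p = posterior(prior=1e-6, sensitivity=0.99, specificity=0.99)
print(f"P(disease | positive) = {p:.6f}")  # roughly 0.0001, i.e. about 0.01%

# Frequency version: out of 1,000,000 people tested, expect roughly
# 10,000 false positives and about 1 true positive.
true_pos = 0.99 * 1
false_pos = 0.01 * 999_999
print(f"expected true positives ~ {true_pos:.2f}, false positives ~ {false_pos:.0f}")
```

The punch line is the same either way: a positive result moves you from one in a million to roughly one in ten thousand, nowhere near 99%.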
Daryl:
The “why not?” is easy. To a lot of people, arbitrary priors seem really weird. How do you pull out an initial estimate? What does it mean? In terms of intuition, the frequentist idea of what probability means is easier to grasp, and it’s what a lot of people think of when they’re looking at probability. If you’ve got that view of probability, then pulling out a prior that isn’t generated from some kind of statistical analysis seems strange.
The idea of “certainty” in Bayesian stuff can be hard to grasp. The idea that you can start with an essentially arbitrary initial guess, and use it to get something meaningful and measurable is a strange one. Once you’ve seen it enough times, you get a sense of it, and it makes so much sense that it’s easy to forget how counterintuitive it was at first.
Darryl:
When I asked for examples, I meant specific articles that I could use as a basis for a post; not informal anecdotes. Show me a website that uses that fallacy to make its argument, and I’ll happily tear into it. But a post mocking the idea that someone could use that kind of stupid argument is pretty much a boring post that no one will want to read.
Mark,
But we very often must make decisions that depend on evaluating the likelihood of events for which there is no meaningful frequentist probability. In the end, you have to decide whether to go ahead and launch the space shuttle or to take this or that treatment, even though the precise combination of relevant conditions has never come up before. That’s basically the case whenever you make a real-world decision.
I suppose you can imagine a set of repeated trials that would allow you to make sense of a frequentist probability: If we launched the space shuttle with exactly these conditions 1 million times, how many times would it blow up? But since you haven’t performed the experiment a million times, you have no idea what that probability is.
Mark writes: But a post mocking the idea that someone could use that kind of stupid argument is pretty much a boring post that no one will want to read.
I wasn’t mocking anyone. This was an example of an unintuitive prediction of probability theory. Obviously, I thought it was non-boring, since I remembered it for years.
Daryl:
I’m not arguing that the Bayesian approach is wrong; I’m just explaining why it’s not intuitive.
Like I said – once you understand it and you’ve used it for a while, it becomes so natural that it’s easy to forget that it was ever difficult. But coming at it initially, it *seems* strange.
I know that you weren’t mocking anyone. But I was talking in terms of my writing a post. Things like that, when I write them, I do mock the person making the bad argument. I’d be delighted to write a post about the stupid frequentist errors, given something to use as a starting point that I could turn into a reasonably entertaining post.
Mark,
“You know, there’s a reason why I get so pissed off about this whole bayesian/frequentist thing.”
It can’t possibly have anything to do with you saying wrong things about statistics. It must be the rabid Bayesian cabal keeping you down.
“Bayesian probability isn’t just Bayes’ theorem. It’s an interpretation of the meaning of probability.”
No. You don’t have to believe in subjective probability to be a Bayesian. That’s not what Bayesian inference is about. See Berger’s famous essay on objective Bayes, particularly the section starting with “A common misconception is that Bayesian analysis is a subjective theory; this is neither true historically nor in practice”. And many Bayesians (not just the “objective Bayesians”) will agree with a frequency interpretation for the likelihood function (but not necessarily for a posterior or a prior).
Bayesian inference didn’t arise because somebody sat down and said, “Hey, what we really need is a more subjective way to do statistics”. The real point of Bayesian methods is that Bayesians believe that you can speak of the probability of a hypothesis, and frequentists believe that you can only speak of the probability of the data. Subjective Bayes is just a side effect, one philosophical interpretation of how to deal with the prior which is automatically introduced when you start talking about the probabilities of hypotheses.
Conversely, just because frequentists have a frequency interpretation of the probability of data doesn’t mean that this probability is not subjectively decided. It is always subjectively decided! In frequentist statistics, as in Bayesian statistics, you always have subjective choice regarding your model and likelihood function. You can apply goodness-of-fit measures to evaluate the validity of those choices, but Bayesians can do that too, and it doesn’t make the choice objective.
“Frequentist” and “subjective” are not opposites. Frequentists always have subjective choices to make, and Bayesians can argue for an objective interpretation of probability.
“If I were to post a *good* probabilistic argument that was structured in roughly the same way as the thing I criticized here, and I said that it was a non-Bayesian argument, I’d be pounded for daring to suggest that it was non-Bayesian. But if it’s a crock of shit, then I get pounded for calling it Bayesian.”
No, you get pounded for not understanding the difference between frequentist and Bayesian statistics.
“Frequentism is predicated on the repeatable experiment. A frequentist wouldn’t try to do a probability calculation for something like this, because it’s not a repeatable experiment with controllable parameters.”
That’s nonsense. Frequentists do calculations like that all the time. Frequentist methods are not limited to repeatable experiments! Frequentist methods apply to geology, paleontology, climatology, evolutionary taxonomy, early-universe cosmology, and plenty of other scenarios where you have one-off observations, not controllable or repeatable experiments.
Frequentism just means that you interpret probability to be within the context of a hypothetical infinite ensemble of scenarios. It doesn’t mean that you have to have actually replicated these scenarios!
“Give a problem like this to a frequentist, and they’ll throw up their hands and say `Can’t do it’.”
You are deeply, deeply confused. A frequentist has absolutely no problem with calculating the probability of an outcome conditional on a hypothesis. That’s all they do!
“The Bayesian view of this kind of computation isn’t that this argument says that there’s a 0.01 percent chance of intelligent life developing elsewhere; it says that given the state of our current knowledge, we would have a certainty of intelligent life developing elsewhere of 1 in 10,000.”
What you’re talking about is
p(life on another planet | single celled life) = p(life on another planet | language) · p(language | specialized cells) · p(specialized cells | multi celled life) · p(multi celled life | single celled life)
This is a likelihood function, not a Bayesian calculation. It’s exactly the same thing as what frequentists use.
A Bayesian calculation would be something like,
p(life on another planet|life on Earth) ~ p(life on Earth|life on another planet) p(life on Earth)
You can tell it’s Bayesian because (a) the result is conditional on some KNOWN information (the existence of life on Earth), and (b) the prior contains the same quantity that you’re inferring (life on another planet).
Your calculation isn’t Bayesian: it’s (a) conditioned on a hypothesis (a given planet having single celled life), not on an observation (e.g., life on Earth), and (b) the thing being inferred (existence of life on another planet) doesn’t have a prior.
“I’d argue that that latter statement – the Bayesian statement – is exactly what the author of this garbage argument is saying.”
It manifestly is not. Come on, you can’t get more frequentist than a p-value significance test.
The author says a lot of things of the form “given that X% of planets have some property, Y% of planets will be observed to have some other property”. Those are FREQUENTIST statements: they are likelihood functions, and they have a direct frequency interpretation. They were subjectively chosen, but so are all likelihood functions. Maybe some choices are better motivated theoretically than others (e.g., perhaps I can use some knowledge of chemistry to constrain some of those factors, or perhaps I can use some knowledge of physics to constrain the probability that a U.S. quarter comes up heads), but that doesn’t mean that there aren’t subjective choices involved.
Every time a frequentist chooses a normal distribution over a t-distribution, he is making a subjective choice about the probability of observing certain data. That’s the whole POINT of frequentist statistics: to see what the implications of a modeling choice are in terms of what they predict about the data. Neither frequentist nor Bayesian methods give you any means for choosing a likelihood function. You can choose ANY function you want, assigning ANY probabilities you choose to the data. The subjective choice of a likelihood function is where frequentist and Bayesian methods agree, not where they differ!
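The point that the likelihood is a subjective modeling choice can be illustrated with a small sketch. The data are invented, and a Cauchy distribution stands in for a heavy-tailed t-distribution; nothing in either framework dictates which model to pick, but the pick matters:

```python
import math

# Hypothetical data with one outlier; the choice of likelihood is ours.
data = [0.1, -0.3, 0.2, 0.0, -0.1, 8.0]

def normal_loglik(xs, mu=0.0, sigma=1.0):
    """Log-likelihood of the data under a Normal(mu, sigma) model."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2) for x in xs)

def cauchy_loglik(xs, x0=0.0, gamma=1.0):
    """Log-likelihood under a Cauchy model (Student's t with 1 dof)."""
    return sum(-math.log(math.pi * gamma * (1 + ((x - x0) / gamma) ** 2))
               for x in xs)

print(normal_loglik(data))  # the outlier is punished severely by the thin tails
print(cauchy_loglik(data))  # the heavy-tailed model tolerates it
```

Same data, wildly different likelihoods — and the choice between them was made before any inference, frequentist or Bayesian, even started.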
“If you read how he qualifies the estimate – he says something along the lines of “It’s obviously a simplified model of things, but it’s a good estimate based on what we understand in a form which is usable in a mathematical analysis” – I think that the Bayesian interpretation is pretty clear.”
How absurd. Do you think that frequentists don’t use simplified or estimated models of things??
Mark, you’re getting more and more confused about what frequentist and Bayesian statistics are by the minute. I don’t think you have any idea what either one is about.
Daryl,
See my response to Mark about ensembles. Frequentists don’t require you to have literally repeated an experiment infinitely many times in order to talk about probability. They just philosophically consider a hypothetical ensemble of infinitely many outcomes, and look at how often you’d see the observed outcome in this hypothetical ensemble. Philosophically, it’s kind of questionable, because this is just a hypothetical ensemble, not something that you can actually go out and construct and perform infinitely many measurements on. But nevertheless, once you postulate it, you can make inferences based on it. It doesn’t matter if you have one data point or many, or if you ever intend to collect any more data points — none of this affects ensemble arguments. But you are limited to looking at the probability of observed data assuming some hypothesis; frequentist methods don’t give you any way of addressing the probability of hypotheses assuming some observed data.
Rich,
“He’s using a Bayesian interpretation of what probability “means”, but I think that speaks more to the hugely unintuitive way that frequentist statistics demands you interpret the result than it does to his methodology.”
No. He’s using a frequentist interpretation in a perfectly legitimate frequentist way, which is also the same way a Bayesian would interpret a likelihood function.
We can argue that he has a poor choice of likelihood function, but that’s got nothing to do with how he’s interpreting probabilities.
If I say, “conditioned on single celled life arising, there is a 0.1 probability that multi celled life will arise within X years”, that is a completely valid frequentist (and Bayesian) statement.
Bayesians and frequentists agree on likelihood functions. They disagree on how to use them to do inference.
I think we’re getting into the realm of semantics and gory details here, but here goes:
The reason this strikes me as a frequentist article is that he’s (greatly paraphrasing) said that the probability that intelligent life arises is the the probability (e.g. measure of the set of events) that the sum of four waiting times is less than some T. The random variables are the waiting times, which are well-defined maps from some probability space with a good measure to the space R^4. He’s estimated the parameters which define those maps, and derived his probability from that. Leaving aside how he interpreted his result, the proportion of his probability space which will give rise to an event such that the sum of those four random variables is less than T is about 0.0001, which strikes me as firmly in the frequentist mode of analysis.
If it were (what I consider) Bayesian, I’d expect to see probability statements about the probability of intelligent life arising. That probability P would be the random variable: a map from some probability space equipped with a measure to R^1, and he’d have to either a) talk about some set in the Borel sets of R^1, or b) talk about some property of the measure on the original probability space to make a meaningful statement. So in a Bayesian context, I’d expect to see a statement like “The most likely probability for intelligent life arising is 0.0001”, or “the 95% credible set for intelligent life arising is 0.0001 +/- 0.00004”.
While he’s taken the probability he got out of his math as a subjective estimate of the support for the hypothesis that we’re not alone in the universe, I don’t get “Bayesian analysis” from it. I get that he’s committed the common error of not taking a frequentist probability in the intended sense: as a frequency out of repeated trials.
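The waiting-time model described above can be sketched as a Monte Carlo over the hypothetical ensemble: draw four independent exponential waiting times, and count how often their sum falls inside the habitable window. The rates and the window T below are made-up illustrative numbers, not the paper’s:

```python
import random

random.seed(42)  # make the sketch reproducible

def p_intelligence(rates, T, trials=100_000):
    """Monte Carlo: fraction of the ensemble in which the sum of
    independent exponential waiting times falls below T."""
    hits = 0
    for _ in range(trials):
        total = sum(random.expovariate(r) for r in rates)
        if total < T:
            hits += 1
    return hits / trials

# Invented numbers: four steps, each with mean waiting time ~10 Gyr,
# and a habitable window of T = 4.5 Gyr.
prob = p_intelligence(rates=[0.1, 0.1, 0.1, 0.1], T=4.5)
print(prob)  # a small probability, on the order of 1e-3
```

Note that the frequency interpretation here lives entirely in the simulated ensemble — no actual planet gets re-run a hundred thousand times.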
If you want to just take subjective probability as the hallmark of Bayesian probability, that’s fine, just be aware that there’s (obviously) people who are using a different benchmark.
Anyway, the discussion is interesting, but a touch heated for me at this point, so I think I’m bowing out and heading back to lurk mode. Thanks for a provocative post 🙂
Oops, my example Bayesian calculation should be
p(life on another planet|life on Earth) ~ p(life on Earth|life on another planet) p(life on another planet)
It’s the p(life on another planet) which is the prior, which is what is missing from Mark’s calculation.
Mark’s likelihood function is however proportional (not equal) to a Bayesian posterior (Bayes’s theorem gives the proportionality): the quantity he calculates is proportional to p(single celled life|life on another planet).
Ambitwistor,
I understand that for any kind of uncertainty you have, you could consider the ensemble of all possible worlds that are consistent with your current state of knowledge, and then do frequentist probability on that ensemble. But how could you ever estimate the probabilities for that ensemble (since in fact, we only have one data point, our world)?
Daryl,
The frequentist ensemble is always constructed conditional on some assumed hypothesis about the world, e.g., “the coin is perfectly fair”, “the particle energies are governed by the Boltzmann distribution”, etc.
It obviously helps to do repeatable experiments to test whether the outcomes obey the likelihood function you assume. But there’s nothing in frequentist statistics which requires you to do so. You can make some hypothesis about the fairness of a coin without knowing anything about it, without ever having seen a coin before, and without ever intending to flip it more than once (or at all, I suppose). Nothing stops you from framing hypotheses. A frequentist will then take that assumption (the coin flips will be drawn from a binomial distribution with theta=0.5) and compare how likely or unlikely the observed outcome is under that specified hypothesis. There’s nothing in there that says that you have to observe 2 coin flips or 2 million. A hypothesis is, well, hypothetical. You don’t have to carry out infinitely many experiments in order to make a hypothesis.
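The coin example can be made concrete with the corresponding frequentist tail calculation: assume the fair-coin hypothesis and ask how probable an outcome at least as extreme is under it, with no repetition of the experiment required. The function and the observed counts below are just an illustration:

```python
from math import comb

def binomial_p_at_least(n, k, theta=0.5):
    """P(at least k heads in n flips | coin bias theta):
    the tail probability under the hypothesized ensemble."""
    return sum(comb(n, i) * theta ** i * (1 - theta) ** (n - i)
               for i in range(k, n + 1))

# Hypothesis: the coin is fair. Observation: 9 heads in 10 flips.
p = binomial_p_at_least(10, 9)
print(f"P(>= 9 heads | fair coin) = {p:.4f}")  # 11/1024, about 0.011
```

Nothing in that calculation required flipping the coin more than the ten times we observed; the infinite ensemble is purely hypothetical.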
Ambitwistor or really anyone else,
Can you explain how you arrive at probability values in a Bayesian interpretation without using some reference to frequency? How do you judge how much “evidentiary support” a particular element provides? I don’t see how you could do this without reference to some kind of frequency (if it is through knowledge of chemistry or physics, all of those probabilities are determined through repeated tests or simulations). Just so you know, I certainly recognize the usefulness of Bayesian methods; I just don’t understand how you arrive at a value for “evidentiary support” without some kind of repetition.
Well, you can probably pretty much always dress up the probabilities in a frequentist manner if you are prepared to invent a many-worlds thought experiment. But it seems to me that the substantial difficulty is really for the frequentists, who have to be very careful about what they call a “repetition”. Given any deterministic setting, a precise repetition will give exactly the same answer…even tossing a coin, if you are careful enough. So then one has to start wondering about the source of randomness, and how similar an experiment has to be in order for it to count.
In practical terms, if you know a medical test has historically achieved 95% accuracy over a specific experimental sample, that doesn’t necessarily mean one should assume 95% accuracy for a person of a particular gender, age, weight, ethnicity, habits, or when implemented by a specific doctor (etc etc). So although the interpretation of that bit of evidence does have some basis in (finitely) repeated experiments, it would be hard to argue that it had a particular frequentist interpretation in any specific given application. In fact IMO it’s hard to see where truly frequentist probability ever applies to the real world (at least outside of quantum mechanics).
As for a stupid frequentist argument, I think that this might qualify:
“Pentagon May Issue Pocket Lie Detectors to Afghan Soldiers”, as discussed on Schneier on Security.
“For every 100 deceptive people, the researchers reported, the device would detect 86 (red), with two false negatives (green) and 12 uncertain (yellow).
For every 100 truthful people, they said, it would detect 50 (green), with eight false positives (red) and 42 uncertain (yellow).”
The compound accuracy then being reported as “82 to 90 percent accurate”…
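To see why the compound “82 to 90 percent accurate” figure is misleading, here is a sketch that plugs the article’s per-class rates into Bayes’ theorem at a few assumed base rates of deception (the base rates are hypothetical; the per-class rates are the ones quoted above):

```python
def p_deceptive_given_red(base_rate, p_red_deceptive=0.86, p_red_truthful=0.08):
    """P(deceptive | red light), using the reported per-class rates."""
    num = p_red_deceptive * base_rate
    return num / (num + p_red_truthful * (1 - base_rate))

# The same red light means very different things at different base rates.
for base in (0.5, 0.1, 0.01):
    print(base, round(p_deceptive_given_red(base), 3))
```

At a 50% base rate of deception a red light is fairly damning, but if only 1 in 100 people screened is actually deceptive, most red lights are false alarms — exactly the structure of the rare-disease example earlier in the thread.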
I’m not completely up to speed on bayesian vs. frequentist, but …
What would you (MarkCC) say to a bayesian restatement of your conclusion that we have no idea?
That would be “50%,” correct?
mposey,
I think you’re asking, where do priors come from, if not from frequencies.
There are different kinds of Bayesians, who could give different answers.
There is nothing in Bayesian theory which says that you can’t use a frequency as a probability. If you choose to define probability as “degree of belief”, then it’s up to you how to quantify that degree of belief. So, if I flip a coin N times and it comes up heads H times, then I can use H/N as my prior for the fairness of a coin regarding future coin flips.
I’m not sure where empirical Bayes fits in here; I can’t remember how that works.
There is Gelman’s “weakly informative prior” idea, which, as I understand it, says that even if you do have strong prior information (as above), you should intentionally weaken it so that the inference can learn from data which may differ from what the prior was trained on; but the prior should still be at least somewhat informative, in the sense of giving low probability to really extreme events which could dominate your inference if given too much prior weight.
This has some relationships to the “robust Bayes” school (kind of ill defined, but some of them try to come up with priors to which conclusions are relatively insensitive, others advocate sensitivity tests for different choices of priors, and still others are focused on likelihood functions which are robust to outliers).
There are other “objective Bayesians” who think that the probabilities in priors should be uniquely dictated by symmetry or information-theoretic arguments determined by the structure of the problem at hand.
There are the pure subjectivists who will say that if you’re coming up with a “degree of belief”, you should use a number representing your personal, subjective beliefs as to how likely something is — even if you haven’t done any experiments. The subjective prior can dominate the inference, but if the inference is just supposed to represent an updating of one’s own prior beliefs, then that doesn’t necessarily matter: it just means that the evidence isn’t strong enough to sway a strong prior belief.
I’m sure I’m missing some other schools of thought here …
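The “use H/N as a prior” and “weakly informative prior” ideas above can both be sketched with a conjugate Beta-binomial update, where the prior is just pseudo-counts added to the observed counts (all numbers below are invented for illustration):

```python
def beta_binomial_posterior_mean(prior_heads, prior_tails, heads, tails):
    """Posterior mean of a coin's bias under a Beta prior:
    prior pseudo-counts plus observed counts."""
    a = prior_heads + heads
    b = prior_tails + tails
    return a / (a + b)

# Strong prior built from an earlier H/N = 50/100 experiment,
# then 9 heads in 10 new flips:
print(beta_binomial_posterior_mean(50, 50, heads=9, tails=1))  # pulled toward 0.5
# Weakly informative version: same prior belief, deliberately downweighted:
print(beta_binomial_posterior_mean(5, 5, heads=9, tails=1))    # closer to the new data
```

The downweighted prior still rules out extreme values with tiny samples, but lets surprising new data move the inference — which, as I understand it, is roughly the trade-off the weakly-informative school is after.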
This is only true if you’ve spent a few classes being told that statistics is the study of populations. If you were instead taught from the start that the purpose of statistics is to account for uncertainty in scientific inference, then it wouldn’t seem so strange. Historically, probability theory was from the start thought of as quantifying uncertainty, albeit without the grounding provided by Dutch book arguments, Cox’s theorem, or decision-theoretic approaches. The frequentist view of statistics began to emerge only at the end of the nineteenth century.
Did we read different papers or something?
The interpretation that most people seem to have taken from this isn’t what I read at all. Even MarkCC’s capsule description of the argument in the Astrobiology paper doesn’t match what I read…
“So, for example, Watson fairly arbitrarily assumes that there are four key events in the development of intelligent life … and assigns each of them a prior probability of 10%.”
Not at all. Watson started from a model – for which he credits Carter – in which there are one or more events, that are *by definition* not very likely to occur on a given planet during its habitable period, and which must occur sequentially. The number of such events is unspecified.
He then works out a distribution for the number of events, and takes the observation that here on Earth, intelligent life developed very late in the habitable period to work out the most probable value for the number of events, coming up with 4 or 5. He offers a bit of speculation as to what those events might have been – and credits Szathmary and Smith for *that* model.
Moreover, the paper *never* makes a claim for the likelihood of life on other planets. He even specifically disclaims being able to: “… if biogenesis is sufficiently unlikely to qualify as a (single) critical step, then the model predicts that the period between Earth’s first becoming continuously habitable and biogenesis should be of the same order as the habitable period that remains after observers evolve. In my view, I cannot at present rule out that the time to biogenesis is 0.5 Ga; therefore, no firm conclusion can be drawn as to whether biogenesis might be a common event on other planets.”
He even puts in some caveats about the possibility that the critical step model isn’t even the “right” model in the first place.
Have I completely misread the paper? It seems what he’s really saying is that, based on some newer theories about early evolution, the number of steps that fall into the “critical” category is fewer than previously thought – down to 4 from 7.
I rather doubt that it’s possible to have formal requirements for priors, as priors are usually about the real world. You cannot get from necessary truth to contingent truth.
I agree with Scott B. The paper does not make the claims attributed by Astrobio.net, in particular regarding those arbitrary priors. It’s probably the author of the news article who read a different paper.
Thanks so much for correcting the record Scott. You describe the argument in the paper concisely and accurately, and also the antecedents to it, which are honorable (Carter, Maynard Smith and Szathmary are no fools).
I’m the author, and I tried as hard as I could to make the argument transparent, so was pretty surprised when I came across MarkCC’s encapsulation, which is factually incorrect from start to finish. But I note the links he gives are to media reports of the paper, not the paper itself. Did he actually read it, I wonder? “Squashing the bad math and the fools that promote it”: what about the fools who don’t bother to properly understand what they’re criticising before they make a post?
The model’s prior (if you want to cast it in Bayesian terminology) is that there is at least one unlikely step in the sequence that has led to the evolution of humans on Earth. “Unlikely” is defined as probability much less than one, hence a maximum of about 0.1 or 10%. The number of such steps is not a prior — the best guess of four is a result from the model, not a prior assumption. Neither are the actual probabilities of the steps, which could be much less than 10% but which can’t be solved for of course.
This kind of mathematical work, trying to infer things from over-elaborate models, always reminds me of the famous quote from von Neumann, communicated by Fermi:
“I remember my friend Johnny von Neumann used to say, ‘with four parameters I can fit an elephant and with five I can make him wiggle his trunk.'” A meeting with Enrico Fermi, Nature 427, 297; 2004. doi:10.1038/427297a