I’m going to do some writing about discrete probability theory. Probability is an extremely important area of math, and we encounter aspects of it every day. It’s also a very poorly understood area – one that we see abused or just fouled up constantly.
I’m going to focus on discrete probability theory. What that means is that we’re going to look at situations where the space of possible outcomes contains a countable number of elements. The probability of getting a particular sequence of coin flips, or of being dealt a particular hand of cards, is described by discrete probability theory. On the other hand, the odds of a radioactive isotope decaying at a particular time require continuous probability theory.
Before getting into the details, there’s one important thing to mention. When you’re talking about probability, there are two fundamental schools of interpretation. There are frequentist interpretations, and there are Bayesian interpretations.
In a frequentist interpretation, when you say the probability of an event is 0.6, what you mean is that if you were to perform a series of experiments precisely reproducing the conditions of the event, then on average, for every 100 experiments you ran, the event would occur 60 times. In the frequentist interpretation, the probability is an intrinsic property of the event. For a frequentist, it makes sense to say that there is a “real” probability associated with an event.
In a Bayesian interpretation, when you say that the probability of an event is 0.6, what you mean is that based on your current state of knowledge about the event, you have a 60% certainty that the event will occur. In a strict Bayesian interpretation, the event doesn’t have any kind of intrinsic probability associated with it. The specific event that you’re interested in either will occur, or it won’t. There’s no real probability involved. What probability measures is how certain you are about whether or not it will occur.
For example, think about flipping a fair coin.
A frequentist would say that you can flip a coin many times, and half of the time, it will land on heads. So the probability of a coin flip landing on the head of the coin is 0.5. A Bayesian would say that the coin will land either on heads or on tails. Since you don’t know which, and you have no other information to use to be able to make a better prediction, you can have a certainty of 0.5 that it will land on the head of the coin.
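To make the frequentist reading concrete, here’s a minimal simulation sketch (Python, purely illustrative; the helper function is made up for this post):

```python
import random

def empirical_frequency(num_flips, p_heads=0.5):
    """Flip a simulated coin num_flips times and return the observed fraction of heads."""
    heads = sum(1 for _ in range(num_flips) if random.random() < p_heads)
    return heads / num_flips

# The frequentist claim: as the number of flips grows, this fraction settles near 0.5.
for n in (10, 1_000, 100_000):
    print(n, empirical_frequency(n))
```

The Bayesian reading of the same number doesn’t need repeated trials at all: 0.5 just measures how certain you are about the single flip in front of you.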
In the real world, I think that most people are really somewhere in between.
I think that all but the most fervent Bayesians do rely on an intuitive notion of the “intrinsic” probability of an event. They may describe it in different terms, but when it comes down to it, they’re using the basic frequentist notion. And I don’t think that you can find a sane frequentist anywhere who won’t use Bayes’ theorem to update their priors in the face of new information – which is the most fundamental notion in the Bayesian interpretation.
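For readers who haven’t seen that kind of updating before, here’s a tiny sketch of Bayes’ theorem in action. The hypotheses and numbers are invented purely for illustration:

```python
def bayes_update(prior, likelihoods, observation):
    """Return posterior probabilities for each hypothesis after one observation."""
    unnormalized = {h: prior[h] * likelihoods[h](observation) for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Two hypotheses about a coin: it's fair, or it's biased to land heads 80% of the time.
prior = {"fair": 0.5, "biased": 0.5}
likelihoods = {
    "fair": lambda flip: 0.5,
    "biased": lambda flip: 0.8 if flip == "H" else 0.2,
}
posterior = bayes_update(prior, likelihoods, "H")
print(posterior)  # the "biased" hypothesis gains weight after seeing a head
```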
One note before I finish this, and get started on the real meaty posts. In the past, when I’ve talked about probability, people have started stupid flamewars in the comments. People get downright religious about interpretations of probability. There are religious Bayesians, who think that all frequentists are stupid idiots who should be banished from the field of math; likewise, there are religious frequentists who think that Bayesians are all a crop of arrogant know-it-alls who should be sent to Siberia. I am not going to tolerate any of that nonsense. If you feel that you cannot read posts on probability without going into a diatribe about those stupid frequentists/Bayesians and their deliberately stupid ideas, please go away and don’t even read these posts. If you do go into such a diatribe, I will delete your comments without any hesitation.
Looking forward to the upcoming posts, more so than usual, if that is possible.
It has been a recent shock to realize how fraught the simple concept of “expected value” is. As I gain responsibilities, I see how infrequently I can use the expected value to make decisions. More important are bounds and measures of stability.
Looking forward to this. I like Statistics a lot more now that I’m teaching it than I did when I first learned it. I’m always looking for new ways to explain things to my students.
hmmm… this should be interesting… still hope to see some clashing of viewpoints in the comments.
I fight this battle with myself and with my employees and associates regularly. We propose to perform testing and inspection work on construction projects, no two of which are ever identical. When we do, our CRM software requires us to enter a “probability of win” which is used to forecast future revenue and, summed over the large number of proposals we prepare, it predicts fairly well. Yet I’m constantly troubled by the terminology – we’ll either win the work or we won’t. We can’t perform 100 experiments and look back. Even “60% certain” seems vague. I’ve come to grips with what I mean when I say it by saying “I’d take either side of a betting line of 3 to 2 that we’ll win the project.” When I explain why I go through such a torturous and seemingly meaningless mental exercise, my associates look at me with a combination of sympathy for the nut case and head-shaking dismissal.
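For what it’s worth, the mental exercise does translate directly into numbers. Here’s a sketch (Python, with made-up proposal values) of the two calculations I’m describing: converting the betting line into a probability, and summing win probabilities into a revenue forecast:

```python
def odds_to_probability(in_favor, against):
    """Convert a betting line of in_favor:against into an implied win probability."""
    return in_favor / (in_favor + against)

print(odds_to_probability(3, 2))  # 0.6: "3 to 2 that we'll win"

# Hypothetical proposals as (probability of win, contract value) pairs.
proposals = [(0.6, 120_000), (0.3, 400_000), (0.8, 75_000)]
expected_revenue = sum(p * value for p, value in proposals)
print(expected_revenue)  # the forecast the CRM is effectively computing
```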
I’d suggest that that’s one of the greatest truths about interpreting probability; as long as you have done the mathematics correctly, you can interpret the result using any mental model you find helpful.
In some situations, a frequentist model (if I did this 100 times, how many times would I expect to win) is useful; in others, a Bayesian model like yours (how confident am I that I will win, expressed in gambling terms) is useful.
Back in my first year physics class, the lecturer introduced the wave/particle nature of light with words to the following effect:
People have spent a lot of time asking if light is a wave or light is a particle. The right answer is that it is neither. Light is a quantum mechanical phenomenon, and it is what it is. The words “wave” and “particle” refer to models that we use to try to understand light, but when you get down to it, light is what it is.
My undergrad physics lecturer was a wise person.
Now what you, dear reader, probably think I’m going to say is that “Bayesian” and “frequentist” are just models that we use to interpret an underlying formal mathematical model. Actually, I’m not going to say that at all, even though I have quite a lot of sympathy for that point of view. (I will also point out for completeness that David Mermin’s “shut up and calculate” interpretation is also a valid interpretation.)
What I’m going to say, instead, is that it probably doesn’t actually matter, since classical probability is just an approximation anyway. At a fundamental level, probabilities are actually algebraically closed (quantum physicists seem to like complex numbers, though it’s probably better thought of as a Grassmann algebra), and satisfy a 2-norm instead of a 1-norm. It’s only in the classical limit that classical probabilities appear.
The value of probability theory is not that it has an underlying reality that one side or the other knows the truth about, because it doesn’t. Its value is that it can be used to model useful systems to solve useful problems. That there is more than one kind of problem that it can be applied to is something that we should celebrate, not argue about.
What do you mean when you say probabilities are algebraically closed? Under what operation? And what do you mean when you say they satisfy a 2-norm instead of a 1-norm? Are you suggesting that a pdf’s L2 norm is 1? If so, this is wrong. Indeed, part of the definition of a pdf is that it is 1 in L1 norm. (Of course, when you are dealing with discrete probabilities, the integral is a sum, but if you know measure theory, you will know this is also an integral.)
Don’t worry, Robert. Troll comment is trolly.
I’ve just noticed this comment after several weeks. What I meant by “algebraically closed” is “algebraically closed field”. I wasn’t clear about that.
And yes, I am indeed suggesting that a quantum pdf’s 2-norm is (supposed to be) 1. Quantum probability theory is very different from classical probability theory.
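A minimal numerical illustration of what I mean by the 1-norm versus 2-norm distinction (this assumes NumPy, and it only shows the two normalization conditions, not any actual physics):

```python
import numpy as np

# Classical: probabilities are nonnegative reals that sum to 1 (a 1-norm condition).
classical = np.array([0.5, 0.3, 0.2])
print(classical.sum())                  # 1.0

# Quantum: complex amplitudes whose squared magnitudes sum to 1 (a 2-norm condition).
amplitudes = np.array([1 / np.sqrt(2), 1j / np.sqrt(2)])
print((np.abs(amplitudes) ** 2).sum())  # 1.0
```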
No. You are incorrect. A frequentist EXPECTS it to occur with the intrinsic probability. They recognize variance.
This is cool. I did not know that people were so passionate about their interpretations of probability.
Infinite probability distributions are really confusing. I never understand whether the probability of an outcome depends on the number of points corresponding to that outcome or on some measure of that set. shrug.
E. T. Jaynes’ book Probability Theory: The Logic of Science is a must-read for anyone interested in really digging into probability theory.
It derives probability theory from essentially three intuitive properties we’d like probabilities to have.
As a bonus, it shows precisely under what conditions typical frequentist/Bayesian assumptions hold, sort of laying to rest any need for philosophical debates about which is better.
http://xkcd.com/1132/ because it needed to be done.
What about the propensity interpretation of probabilities? This is a theory Karl Popper advocated (among others), and it tries to explain the origin of observable frequencies. I guess it’s more metaphysics than statistics and mathematics.
The bitterest fights are between Orthodox and Conservative Bayesians.
The coin-flip example is interesting because consideration of unfair coins shows how the two interpretations can intermix. A frequentist needs some bayesian concepts to deal with them (i.e., unfair coins make clear that we are starting with ideas about a coin’s properties, which we update – we are not starting by accessing a coin’s properties directly). Meanwhile, the bayesian is going to have a difficult time making sense of the difference if they don’t acknowledge that there are such things as fair and unfair coins.
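One standard way to make “updating ideas about a coin’s properties” precise is a Beta-Binomial model. A minimal sketch, with invented data:

```python
def update_beta(alpha, beta, heads, tails):
    """Conjugate Beta update for a coin's unknown heads-probability."""
    return alpha + heads, beta + tails

# Start agnostic about the coin's bias, then observe 8 heads in 10 flips.
alpha, beta = 1, 1              # uniform prior over the bias
alpha, beta = update_beta(alpha, beta, heads=8, tails=2)
print(alpha / (alpha + beta))   # posterior mean estimate of the bias: 0.75
```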
IMO this is a case where we have two models, both useful in a wide variety of overlapping cases, but each of which deals with some questions more quickly and more intuitively than the other. Sure, we could use one of them to solve all (relevant) math problems, but if flipping between them solves problems quicker and easier, why not do that? I CAN use a claw hammer to screw in a screw, or a screwdriver to drive in a nail…but why would I want to?
Perhaps I’m a bit more practical about probabilities (is that an engineer thing as opposed to a mathematician thing?). I’m less concerned about the theoretical arguments and more concerned about the practical applications. Specifically: if an event occurred, what was the probability that it was a real effect, or could it just have happened by chance? It’s all well and good to write a program that simulates ten million flips of a coin and can tell you whether the coin was in fact fair (50% +/- the acceptable range of variation) or whether it has a slight or an obvious bias (52% heads vs. 90% heads).
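By “a program that simulates ten million flips”, I have in mind something like this sketch (illustrative only):

```python
import random

def heads_fraction(num_flips, true_bias):
    """Simulate num_flips flips of a coin with the given bias; return the heads fraction."""
    return sum(random.random() < true_bias for _ in range(num_flips)) / num_flips

# With ten million flips, even a slight bias stands out from sampling noise: a fair
# coin's observed fraction stays within roughly +/- 0.0005 of 0.5, so 0.52 is clearly biased.
for bias in (0.50, 0.52, 0.90):
    print(bias, heads_fraction(10_000_000, bias))
```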
The problem with ANY probabilistic endeavor is that we can’t always write simulations that flip coins 10 million times. If the weather forecaster says there is an 80% chance of rain, how likely am I to pack an umbrella? If it doesn’t rain, does that make me doubt the forecaster’s models, or should I accept the possibility that in ten million different parallel universes it is raining in approximately 8 million of them?
I love the XKCD cartoon on this. If you want to really see a frequentist vs. a Bayesian duke it out, take them to Vegas and give them each $1,000 to play at Roulette. Chances are, after 10 million spins at the table, they’d both be broke.
In that setup, a good practitioner of probability would just take the money and run.
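The arithmetic behind that outcome is simple enough; a quick check for a single even-money bet on an American wheel (illustrative numbers only):

```python
# American roulette: 18 red, 18 black, and 2 green pockets.
# A $1 bet on red wins $1 with probability 18/38 and loses $1 otherwise.
p_win = 18 / 38
expected_value = p_win * 1 + (1 - p_win) * (-1)
print(expected_value)  # about -0.053: an expected loss of ~5.3 cents per dollar per spin
```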
There is this fundamental weirdness about probability.
If frequentism rejected all variance, it would be straightforward nonsense (it would imply that it is literally impossible to flip heads three times in a row, since any such run of flips would fail to reflect the “expected” half-heads, half-tails split). Instead, it is un-straightforward sense. Yet weirdly un-confirmable…?
And likewise for Bayesianism’s “degree of belief” thing. Interpreted simplistically, it suggests that, eg, the probability in Bertrand’s box paradox really is one-half, because that’s the typical person’s “degree of belief” about it! So we try formalizing the concept, for example by discussing the fraction of one currency-unit that a hypothetical rational gambler should be willing to pay, if a positive result meant winning one unit. But how do we confirm that it’s a good amount? Well, one way is to repeat the game many, many times and see if our gambler did break even. But that’s sounding oddly familiar…
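(For the record, a quick simulation bears out the 2/3 answer in Bertrand’s box paradox; this is just a sketch, not a claim about anyone’s degree of belief:)

```python
import random

def simulate_bertrand(trials=100_000):
    """Estimate P(other coin is gold | drawn coin is gold) for Bertrand's box paradox."""
    boxes = [("gold", "gold"), ("gold", "silver"), ("silver", "silver")]
    drew_gold = other_gold = 0
    for _ in range(trials):
        box = random.choice(boxes)
        i = random.randrange(2)
        if box[i] == "gold":
            drew_gold += 1
            other_gold += box[1 - i] == "gold"
    return other_gold / drew_gold

print(simulate_bertrand())  # roughly 0.667, not the naive 0.5
```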
The term “degree of belief” in Bayesian discourse is not intended to indicate human belief.
The math makes it exquisitely clear what this Bayesian “belief” is, and there’s a vast body of cognitive science research which demonstrates that human belief systematically departs from this Bayesian belief.
Why isn’t the difference just a matter of definitions? Apparently, the ‘frequentists’ and ‘Bayesians’ use the term ‘event’ in a different sense. Is there anything more profound to it?
It’s not so much a debate over which definition is better as a debate over which model gives the best results.
In my experience, the sciences seem to prefer Bayesian formalisms for their ease of use and avoidance of problems that were often encountered when the older “frequentist” models were all we had. In all fairness, I should admit that my experience is considerably biased toward the hard sciences – physics etc.
That said, I think the flame wars tend to rely more on arguing philosophically from interpretations of the mathematical frameworks rather than from the frameworks themselves.