Probabilistic Complexity

As I’ve mentioned in the past, complexity theory isn’t really one of my favorite topics. As a professional computer scientist, knowing and loving complexity up to the level of NP-completeness
is practically a requirement. But once you start to get beyond P and NP, there are hundreds of complexity classes that have been proposed for one reason or another; really understanding them all gets remarkably difficult, and what’s worse, it can feel like an utterly pointless exercise,
particularly if the writer/teacher you’re learning from isn’t very good.

I’m not going to write a long series of posts on the more esoteric complexity classes. But there
are a few that are interesting, and might even have some relevance to actual practical problems in
selecting algorithms for a software system.

Today I’m going to talk about one of those practical classes, called BPP. BPP is the class of
problems that can be solved on a probabilistic machine in P-time, with a bounded probability of
less than 1/2 of generating an incorrect result. (In general, we use 1/3 as the probability of failure,
but that’s just for convenience; you can pick any bound b you want from the range 0 < b < 1/2,
and you’ll wind up with the same set of problems in BPP. The reason is that you can re-run the
algorithm and take a majority vote of its answers, driving the error as low as you like; there’s a
sketch of that trick after the list of related classes below.)

This is going to sound like a strange idea, but it’s actually quite useful. A probabilistic machine
is one which can, whenever it needs to during the execution of its program, flip a coin to select one
of two paths. If you can write a program which uses these random branches, and which will on average
generate correct results in better than 2 out of 3 executions, then the program is considered a
correct probabilistic solution to the problem, and the problem that it solves is in BPP.

Why is this useful?

Randomized algorithms are something we use in the real world. It’s rare that we can accurately
characterize the precise probability of generating a successful result, but BPP provides a model that
allows us to describe the capabilities of algorithms that use randomness. And even in cases where we
don’t really use randomness, we often approach exponential-time problems using heuristics – essentially
guesses about how to find a solution that work most of the time. To get some sense of what
benefit we can get from heuristics, we can model them as probabilistic branches – so again, BPP gives
us a way of exploring what the heuristics can do to the worst-case performance for a problem.

The most common kind of probabilistic algorithm is called a Monte Carlo algorithm. The
idea of a Monte Carlo algorithm is pretty simple. Imagine that you’ve got a barn with a target painted
on it, and your goal is to have a blind man put a bullet through the target. A Monte Carlo approach is
basically handing him a rapid-fire machine gun, pointing him in the right general direction, and having
him pull the trigger while moving the gun barrel back and forth until he’s fired a couple of hundred
shots. There’s a pretty good chance that at least one of those shots will wind up in the target. In
a Monte Carlo algorithm, you make a series of random guesses, each of which eliminates a group of possible solutions. After enough tries, whatever’s left is probably right.

The easiest example of this is one of the common algorithms for rapidly determining whether a
number is prime. Given an integer N which you think might be prime, randomly pick 10,000 integers
between 2 and √N. For each one, check whether it divides N evenly. If none of them do,
then there’s a high probability that N is prime.
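
Here’s a minimal Python sketch of that naive test; the function name and defaults are just for illustration. (As the comments below point out, this is a weak test in practice – it’s here to show the Monte Carlo pattern, not for real use.)

    import random

    def probably_prime(n, trials=10000):
        # Naive Monte Carlo primality test: try random candidate divisors
        # up to sqrt(n). Finding a divisor proves n composite; finding
        # none suggests (weakly!) that n is prime.
        if n < 4:
            return n in (2, 3)
        limit = max(2, int(n ** 0.5))
        for _ in range(trials):
            d = random.randint(2, limit)
            if n % d == 0:
                return False  # found a divisor: definitely composite
        return True           # no divisor found: probably prime

For instance, probably_prime(104729) returns True (104729 is prime), while probably_prime(104730) will almost certainly stumble on one of its small divisors and return False.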

A slightly harder example of a probabilistic algorithm tackles one of the classic NP-hard problems,
the traveling salesman problem (TSP). In the TSP, you are a salesman who is given a set of N cities
(nodes). You know the distance between any two cities in the set. The question is: what is the shortest path that allows you to visit all of the cities?

There’s a P-time probabilistic algorithm for the TSP based on a randomized heuristic which
generates an optimal solution with high probability. Basically, generate a random path. Then randomly pick a group of four nodes that are close together, and randomly rearrange them. If the
result is a shorter path, then that becomes the new path. If, after some number of tries, the path doesn’t improve, then stop. (There are more details than that, but that’s the basic idea.) This algorithm runs in P-time, and generates optimal results with extremely high probability for problem sizes in the millions of nodes – problem sizes which would be absolutely intractable using
any conventional deterministic algorithm. It’s not guaranteed to generate a correct result; in fact, I’ve
never seen an analysis that manages to specifically pin down its rate of success. But in practice,
its results are astonishingly good; I can’t think of any test on hundreds of thousands of nodes, where we knew the correct answer, in which this algorithm failed to find it.
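
Here’s a rough Python sketch of that kind of randomized local search. The specifics are my own assumptions – in particular, it rearranges a random window of four consecutive cities, where a serious implementation would pick four cities that are spatially close, and would update path lengths incrementally rather than recomputing them – but the keep-it-if-it’s-shorter loop is the heart of the idea.

    import random

    def tour_length(tour, dist):
        # Total length of the closed tour, given a distance matrix.
        n = len(tour)
        return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

    def randomized_tsp(dist, max_stale=10000):
        # Assumes at least four cities; dist is a square distance matrix.
        n = len(dist)
        tour = list(range(n))
        random.shuffle(tour)             # start from a random path
        best = tour_length(tour, dist)
        stale = 0                        # tries since the last improvement
        while stale < max_stale:
            i = random.randrange(n - 3)  # pick a window of four nodes
            candidate = tour[:]
            window = candidate[i:i + 4]
            random.shuffle(window)       # randomly rearrange them
            candidate[i:i + 4] = window
            length = tour_length(candidate, dist)
            if length < best:            # shorter? it becomes the new path
                tour, best = candidate, length
                stale = 0
            else:
                stale += 1
        return tour, best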

(Note: The above explanation around the TSP was originally wrong. I managed to make
a major error in saying that the TSP was NP-complete. The TSP is not NP-complete;
it is NP-hard, which is quite a lot worse. The TSP decision problem is NP-complete. The decision problem is: given a path length P, is there a path whose length is less than or equal to P? Just that decision part is NP-complete. Generating a path as a solution is quite a lot harder; well into EXP territory, even if it turns out that P=NP.)
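
To make the decision/search distinction concrete, here’s a tiny Python sketch of the verification step that puts the decision problem in NP: given a proposed tour and a bound P, checking the witness is linear in the number of cities.

    def verify_tsp_witness(tour, dist, bound):
        # A witness for the TSP decision problem is a tour that visits
        # every city exactly once and has total length <= bound.
        n = len(dist)
        if sorted(tour) != list(range(n)):   # each city exactly once
            return False
        length = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
        return length <= bound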

There are also a couple of other interesting complexity classes that are related to BPP:

  • RP: randomized polynomial time. This is a computational class for decision problems
    – that is, problems which take an input, and return “Yes” or “No”. A problem is in RP if
    there’s a probabilistic machine that always answers “No” if the correct answer
    is “No”, and returns “Yes” more than 50% of the time if the correct answer is “Yes”. (What
    this means is sort of the opposite of what it sounds like: if it answers “Yes”,
    then it’s always right. If it answers “No”, then there’s some chance that it was
    wrong.) We know that P ⊆ RP, and RP ⊆ BPP. (There was originally a typo in this paragraph, where I wrote “A problem is in NP” where it should read “A problem is in RP”. Thanks to commenter Coin for the catch!)
  • PP: probabilistic polynomial time. This is also a computational class for
    decision problems. For a problem in PP, there is
    a polynomial-time probabilistic algorithm where if the correct answer is “Yes”, the
    algorithm will answer “Yes” more than half the time; if the correct answer
    is “No”, it will answer “Yes” less than half the time. BPP ⊆ PP; PP
    is a much weaker (in the sense of the strength of results required) class – an
    algorithm in PP can end up arbitrarily close to being correct as often as
    a coin-flip, no matter how many times you repeat the computation; whereas in BPP,
    there is a probability bound, and by running the computation multiple times, you
    can reduce the bound to get arbitrarily close to always being correct. In fact,
    it turns out that NP ⊆ PP.
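
To make that last contrast concrete, here’s a small Python sketch of the amplification trick promised earlier (the bpp_algorithm argument is a hypothetical callable, standing in for any BPP decision procedure): run it an odd number of times and take the majority vote. If each run is correct with probability at least 2/3, a Chernoff bound shows the majority is wrong with probability that shrinks exponentially in the number of runs. For a PP algorithm, whose advantage over a fair coin can shrink with the input size, the same trick gains you essentially nothing.

    from collections import Counter

    def amplify(bpp_algorithm, x, runs=101):
        # Run a randomized decision procedure an odd number of times on
        # input x, and return the majority answer.
        votes = Counter(bpp_algorithm(x) for _ in range(runs))
        return votes.most_common(1)[0][0]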

27 thoughts on “Probabilistic Complexity”

  1. Coin

    Thanks for the great writeup.
    Possible typo:
    A problem is in NP if there’s a probabilistic machine that always answers “No” if the correct answer is “No”, and returns “Yes” more than 50% of the time if the correct answer is “Yes”.
    You mean RP, not NP?

  2. Ithika

    A while back, inspired by your earlier posts and some pining for academia, I started writing a series of blog posts on stuff I thought was interesting in computer science. One of the better posts, I think, was my coverage of the travelling salesman problem. I’ve tried to keep my posts as light as possible on the mathematics front so as not to drive all my friends away. 😉
    Unfortunately I’m not so good a writer that explaining these topics is easy without maths. I really want to write about how awesome I think Turing machines are but can’t find the right angle. I’ll get there eventually!

  3. Markk

One probabilistic algorithm that many people use without knowing it is the primality testing built into some of the programs that use RSA. I think these complexity classes are actually very deep in the sense that they blur the edges of computability in ways that somehow seem real to me. There are just amazing probabilistic results in areas like modular square roots. Or at least they were amazing 13 years ago when I looked at them in depth. I always thought skip lists were another probabilistic data structure that could be used more – the bigger the list gets, the better the bounds get, and there should be ways to make this algorithm really cache-aware.
    From a fan of probabilistic algorithms.

  4. John Armstrong

    I’m not going to write a long series of posts on the more esoteric complexity classes.

    Oh c’mon. “Pathological Complexity Classes”. You can start with the class of functions that have a bounded probability of the sight of a BF implementation driving an observer mad.

  5. Craig Reed

    Wikipedia has a number of articles on Complexity classes of differing types. The main article is
    http://en.wikipedia.org/wiki/Complexity_classes
which also has a nice little diagram of the relationship between various classes of complexity. There’s a more complete diagram at
    http://www.cs.umass.edu/~immerman/complexity_theory.html
    I love reading your blog, even though for the past several weeks, the talk about manifolds & such has been over my head.

  6. Zombie

    If I recall correctly, probabilistic complexity classes are a jumping off point for dealing with quantum computers and complexity classes of quantum algorithms. The big question: is QP > PP? (that is, are the polynomial-time algorithms you can implement on a quantum computer more powerful than ones on a conventional computer).

  7. Xanthir, FCD

    Hmm. Do 100% accurate but not 100%-guaranteed-to-be-polynomial solutions count in BPP? I’m talking about Las Vegas methods (as distinguished from Monte Carlo; CS majors are gambler extraordinaires!) like quicksort. Quicksort doesn’t straddle the poly/exp boundary, but it’s still a good example – randomizing the list before quicksorting it makes it more likely to sort in optimal time (because a sorted list is the degenerate case in quicksort!).

  8. Mark C. Chu-Carroll

    Xanthir:
No, Las Vegas algorithms are not in BPP; BPP is polynomial time. Las Vegas algorithms are related to BPP in sort of the same way as some NP algorithms are to P: for many NP-complete problems, they’ll complete in polynomial time in the overwhelming majority of cases, but there are cases where they go exponential. (For example, type inference in Haskell.) Las Vegas is the same deal: if you can’t guarantee that it always finishes in polynomial time, then it’s not in BPP.

  9. Coin

    Xanthir, I think that the question there would be one of how probable the algorithm is to have found the correct answer by some polynomial time limit. I imagine what you would do here is modify the algorithm to use a time counter, and if the time counter ever hits some polynomial boundary, the algorithm terminates with whatever partial answer it’s so far found. You now have an algorithm which is guaranteed to run in polynomial time, but is no longer guaranteed to find the correct answer.
    To show it’s in BPP though you’d have to somehow calculate the probability that this modified quicksort will find the correct answer before it gives up.

  10. Josiah Carlson

    Markk: re: Skip Lists
    Skip lists are quite interesting. If you have a bit of time, you may consider looking into Treaps. Much of the randomized fun of skip lists, but far fewer pointers to keep track of.

  11. Torbjörn Larsson

Algorithms that use probabilities in any form have fascinated me since I learned, early on, about finding pi with them. (The easy-to-grasp example of throwing pins on a tiled floor being the prototypical case.)

    Then randomly pick a group of four nodes that are close together, and randomly rearrange them.

    Interesting. Reminds me of annealing – you shake the solution (here a rough local shake) until it settles down.

    PP is a much weaker (in the sense of the strength of results required) class – an algorithm in PP can end up arbitrarily close to being correct as often as a coin-flip, no matter how many times you repeat the computation; whereas in BPP, there is a probability bound, and by running the computation multiple times, you can reduce the bound to get arbitrarily close to always being correct.

    Ah! So we could call PP’s agnostic algorithms, while BPP’s are atheistic. [/stupid joke]

  12. Josef Svenningsson

    Mark C:
I think there is potential confusion in talking about Haskell type checking here. The algorithm is *not* NP-complete but in fact EXPTIME-complete. That means that we do know that there exist programs for which the algorithm *must* take exponential time, whereas with NP we just haven’t been able to prove this or find a polynomial-time algorithm.

  13. Mark C. Chu-Carroll

    Josef:
The last I heard, Hindley-Milner type inference was shown to be NP-hard, but thought to be NP-complete. (Meaning that they had shown that there is a reduction from NP-complete problems *to* HM type inference, and they hadn’t found a reduction *from* HM to an NP-C problem, but thought that there probably was one.)
Do you have a reference for that? (Not that I doubt you, I’m just interested, so I’d like to read the paper!)

  14. MiguelB

    Mark,
    It seems to me that the TSP, as you have expressed it, is not NP. I don’t believe it’s possible to check that the path returned by an algorithm is, in fact, the shortest, in polynomial time. Unless you’re assuming that the answer is known in advance by the checking algorithm?
    For example, the TSP as stated here http://qwiki.caltech.edu/wiki/Complexity_Zoo#np only requires that the returned path is shorter than a specified length; that can indeed be checked in polynomial time (including the requirement that each city is visited exactly once).

  15. Mark C. Chu-Carroll

    Miguel:
    The problem on the zoo is just a restatement of the TSP as a decision problem. It’s equivalent.
Suppose I’ve got a program DTSP which takes a maximum path length m, and returns “YES” if there is a path shorter than m, and “NO” otherwise.
    Now, if someone gives me a proposed solution P to the problem as I stated it, what I can do is:
1. Check that the path visits all of the cities, and compute the length L(P) of the proposed solution. (That’s O(N) in the number of cities.)
2. Ask DTSP if there is a path shorter than L(P).

  16. Xanthir, FCD

    Mark: Thanks for the clarification. I see the difference now – BPP in general (and Monte Carlo methods in specific) are strict on time but not on answers, while Las Vegas methods are the reverse.
    Coin: Ooh, I see how that would work. As long as you could prove that the bounded Las Vegas method would indeed return the correct answer at least half the time, it looks like that would push the problem into BPP.

  17. MiguelB

    Mark,
    Is DTSP in P? My belief is that it isn’t, unless it somehow has precalculated all paths or something similar.
    If DTSP is not in P, there is a problem: the definition of NP is that the proposed solution can be verified in P time on a deterministic machine. If this is the case, you haven’t shown that the shortest-path problem is NP.

  18. Mark C. Chu-Carroll

    Miguel:
    I’m sorry, but I don’t understand what you’re trying to ask.
    If I’m understanding you correctly, then you agree that the TSP stated as a decision problem is NP-complete, right?
    I think that I showed a P-time way of reducing a correctness test for the TSP as I presented it to a correctness test for the TSP decision problem. Is this the step that’s the problem? Do you see an error, or an unclear point, in how I suggested reducing the TSP verification problem to a verification of the TSP decision problem?
    Unless there’s some error that I’m missing, those two combine to a P-time verification algorithm for the TSP as I presented it. If it’s not the reduction of the problem, can you try to give me a little more info on what you’re not understanding?

  19. billb

    Mark: I think Miguel’s question is how do you know DTSP is in P? You haven’t given us any algorithm that can tell us that given a path there is or is not a shorter path in polynomial time. Are we missing something? Is the program for DTSP obvious? It isn’t to me.

  20. MiguelB

    Mark,
    billb is correct; I’m wondering how can we know that DTSP is in P.
A problem is in NP if a deterministic machine is able to verify a solution to it in polynomial time. (This definition comes from your previous post).
    You have indeed shown a satisfactory way to verify a solution to TSP (as you presented it). The key part in the verification machine you proposed is DTSP. However, as billb pointed out, it doesn’t seem clear to me that DTSP itself runs in polynomial time on a deterministic machine.
    If DTSP itself isn’t in P, then your verification method doesn’t prove that TSP is in NP.
    I hope I’m being clearer this time.

  21. Mark C. Chu-Carroll

    Miguel et al:
    The TSP decision problem is in NP.
It turns out that I’m wrong. The TSP as I presented it is NP-hard – so there’s a reduction *from* NP-complete problems *to* TSP; but there’s no known reduction from TSP to an NP-complete problem.
    Sorry for the confusion; I really blew it this time. 🙁
    But thanks for pushing me until I saw I was wrong!

  22. Walt

    TSP is (conjecturally) strictly harder than NP. I’m rusty on the notation, but it’s something like NP + co-NP. (The decision problem version is in NP.)

  23. Carl Witty

    Mark, a couple of comments.
    1) I really hope that your algorithm is not a common one for testing primality; it is very likely to give the wrong answer for numbers around a billion, which is tiny by primality-testing standards. (For example, consider 1027243729, which is 1009^3, where 1009 is prime. The only divisor your test can find is 1009; testing 10000 random numbers between 1 and 32050 in hopes of finding 1009 has less than a 1/3 chance of success.)
    2) If P=NP, then there is a polynomial-time algorithm for finding optimal TSP paths. Here is my first attempt, although I’m sure that a more efficient algorithm is possible.
    This assumes that distances are represented as binary integers with at most B bits, that there are V cities, and that there are E city-to-city distances (so in the standard problem, E = V*(V-1)/2).
The total distance for the optimal path is bounded by V*2^B, which has about B + log(V) bits. Call the polynomial-time TSP decision procedure about B + log(V) times to do a binary search to find the exact optimal total distance.
    Now, for each of the E edges in turn, set the distance for that edge to V*2^B and call the polynomial-time TSP decision procedure to see if the original optimal distance is still achievable. If so, leave the edge at the new length and continue; if not, return the edge to the original length and continue.
    When you are done with this procedure, then the edges which remain at their original length form an optimal path for this TSP instance.
Under the assumption that P=NP (so that the TSP decision procedure is polynomial-time), this algorithm is polynomial-time: the bulk of the work is B + log(V) + E calls to the TSP decision procedure, which is polynomial in the size of the original problem.

  24. JBL

    Carl,
Presumably, the “10000” in his algorithm just meant “some large number,” and in an actual implementation would likely be something like linear or quadratic in log n (I suppose that’s something I could calculate or look up, but I’m just going to go with that guess here). More importantly, your example of 1009^3 raises a key point: the algorithm doesn’t have to have a high probability of success on every input, it has to have a high average probability of success across all inputs of a given size. So occasional instances (say, the cubes of large primes) where the algorithm fails don’t matter if almost all of the time the algorithm succeeds with high probability. (For your particular example, large primes are rare and their cubes equally so, so we’re extremely unlikely to run up against a case like p^3 for large p.)

  25. Xanthir, FCD

    Carl:
    JBL is correct. Probabilistic tests *do* fail on some inputs; that’s why they’re probabilistic and not deterministic. If you run them many times, though, the chance that *every* run answered wrong can be dropped to any value you wish. An effective algorithm will have a >50% chance of returning correctly, so that the error drops to any arbitrary value in polynomial time.
The particular algorithm Mark described for primality testing is very simple and not actually used. It’s a bad algorithm because the chance of a large number being divisible by a single random small number is well under 50%. It takes a long time to get a decent value (and you need a *very* small error possibility when you’re working with large numbers, because there’s just so many of them).
    The Fermat Primality Test linked by Andrew Wade is a very good one. Read up on it, it’s very interesting how it works.

  26. Mark C. Chu-Carroll

    Xanthir, Andrew:
I agree that the primality test example I used is a crappy, weak algorithm – it will get to an
acceptable result given enough time, but it requires many times too many trials to get to a
reasonable probability. But I didn’t want to get bogged down in the details of a good algorithm, because I was trying to focus on the idea of probabilistic computation.
    The Fermat algorithm is one of the good probabilistic algorithms for prime testing; Miller-Rabin is on the same general track, but even cooler.

  27. chrisk

    Just a couple of quick nitpicks:
    1) (More complaining about the primality-finding example.) While primality-finding is indeed in BPP (and, as discovered recently, in P), the algorithm you give for it in this post is not a BPP algorithm. Non-prime numbers may have very few prime factors, and if the number we’re looking at is large enough your algorithm will have only a negligible chance (in a very quantifiable and critical sense) of finding a factor.
    To put it another way: if your algorithm worked, it would yield a BPP algorithm for actually _factoring_ the number, which would be a monumental discovery. The other algorithms you mentioned (Miller-Rabin, for one) are bona fide BPP algorithms.
2) A comment on a comment: “Las Vegas” algorithms (more commonly called ZPP algorithms) are indeed BPP algorithms. (Indeed, ZPP == RP intersect co-RP; RP and co-RP are in turn subsets of BPP.) To see the containment of ZPP in BPP directly, note that you can characterize ZPP as the class of algorithms that, in guaranteed-polynomial time, either return YES or NO (in which case the answer is definitely correct) or return UNKNOWN. (It’s the same as the characterization you used, except that we put a clock on the algorithm, and if it runs beyond a certain time bound, we return UNKNOWN instead of letting it run forever.)
To turn this into a BPP algorithm, just flip another coin in the event of an UNKNOWN to decide whether to say YES or NO. Since the probability of it returning UNKNOWN can be shown to be small, the probability that it gets the wrong answer is also small.
