A failed attempt to prove P == NP

I wasn’t originally going to write about this, but people keep sending it to me asking for comments.

In computer science, we have one really gigantic open question about complexity. In the lingo, we ask “Does P == NP?”. (I’ll explain what that means below.) On March 9th, a guy named Michael LaPlante posted a paper to ArXiv that purports to prove, once and for all, that P == NP. If this were the case, if Mr. LaPlante (I’m assuming Mr.; if someone knows differently, i.e. that it should be Doctor, or Miss, please let me know!) had in fact proved that P == NP, it would be one of the most amazing events in computer science history. And it wouldn’t only be a theoretical triumph – it would have real, significant practical results! I can’t think of any mathematical proof that would be more exciting to me: I really, really wish that this would happen. But Mr. LaPlante’s proof is, sadly, wrong. Trivially wrong, in fact.

In order to understand what all of this means, why it matters, and where he went wrong, we need to take a step back, and briefly look at computational complexity, what P and NP mean, and what the implications of P == NP would be. (Some parts of the discussion that follows are re-edited versions of sections of a very old post from 2007.)

Before we can get to the meat of this, which is talking about P versus NP, we need to talk about computational complexity. P and NP are complexity classes of problems – that is, groups of problems that have similar bounds on their performance.

When we look at a computation, one of the things we want to know is: “How long will this take?”. A specific, concrete answer to that depends on all sorts of factors – the speed of your computer, the particular programming language you use to run the program, etc. But independent of those, there’s a basic factor that describes something important about how long a computation will take: the algorithm itself fundamentally requires some minimum number of operations. Computational complexity is an abstract method of describing how many operations a computation will take, expressed in terms of the size or magnitude of the input.

For example: let’s take a look at insertion sort. Here’s a simple implementation of insertion sort in Python.

def insertion_sort(lst):
  result = []
  for i in lst:
    # Find the first element of result that's bigger than i,
    # and insert i right before it.
    for index, j in enumerate(result):
      if i < j:
        result.insert(index, i)
        break
    else:
      # i wasn't inserted anywhere, so it belongs at the end.
      result.append(i)
  return result

This is, perhaps, the simplest sorting algorithm to understand - most of us figured it out on our own in school, when we had an assignment to alphabetize a list of words. You take the elements of the list to be sorted one at a time; then you figure out where in the list they belong, and insert them.

In the worst possible case, how long does this take?

  1. Inserting the first element requires 0 comparisons: just stick it into the list.
  2. Inserting the second element takes exactly one comparison: it needs to be compared to the one element in the result list, to determine whether it goes before or after it.
  3. Inserting the third element could take either one or two comparisons. (If it's smaller than the first element of the result list, then it can be inserted in front without any more comparisons; otherwise, it needs to be compared against the second element of the result list.) So in the worst case, it takes 2 comparisons.
  4. In general, inserting the nth element of the list will take at most n-1 comparisons.

So, in the worst case, it's going to take 0 + 1 + 2 + ... + n-1 comparisons to produce a sorted list of n elements. There's a nice shorthand for computing that series: \frac{n(n-1)}{2}, which simplifies to \frac{n^2 - n}{2}, which is O(n^2).

So while we can't say "sorting a list of 100 elements will take 2.3 seconds" (because that depends on a ton of factors - the specific implementation of the code, the programming language, the machine it's running on, etc.), we can say that the time it takes to run increases roughly proportionally to the square of the size of the input - which is what it means when we say that insertion sort is O(n^2).
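
If you want to see that quadratic growth directly, here's a small instrumented variant of the implementation above (my own illustration, not something from the original post): it counts comparisons rather than measuring wall-clock time, since the comparison count is the machine-independent quantity we've been talking about.

def insertion_sort_counting(lst):
  # Same algorithm as above, but also reports how many comparisons it made.
  result, comparisons = [], 0
  for i in lst:
    inserted = False
    for index, j in enumerate(result):
      comparisons += 1
      if i < j:
        result.insert(index, i)
        inserted = True
        break
    if not inserted:
      result.append(i)
  return result, comparisons

# The worst case for this implementation is already-sorted input: every new
# element gets compared against the entire result list before being appended.
for n in (10, 100, 1000):
  _, count = insertion_sort_counting(list(range(n)))
  print(n, count)   # prints 45, 4950, and 499500 - that is, n(n-1)/2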

That's the complexity of the insertion sort algorithm. When we talk about complexity, we can talk about two different kinds of complexity: the complexity of an algorithm, and the complexity of a problem. The complexity of an algorithm is a measure of how many steps the algorithm takes to execute on an input of a particular size. It's specific to the algorithm - that is, to the specific method used to solve the problem. The complexity of a problem is a lower bound on the complexity of any possible algorithm that can solve that problem.

For example, when you look at sorting, you can say that there's a minimum number of steps needed to compute the correct sorted order of a list. In fact, you can prove that to sort a list of n elements, you need roughly n log n bits of information: there's no possible way to be sure you have the list in sorted order with less information than that. If you're using an algorithm that puts things into sorted order by comparing values, that means that you absolutely must do O(n log n) comparisons, because each comparison gives you at most one bit of information. That means that sorting is an O(n log n) problem. We don't need to know which algorithm you're thinking about - it doesn't matter. There is no possible comparison-based sorting algorithm that takes less than O(n log n) steps. (It's worth noting that there are some weasel-words in there: there are some theoretical algorithms that can sort in less than O(n log n), but they do it by using algorithms that aren't based on binary comparisons that yield one bit of information.)
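
As a quick aside (my own sketch of the standard argument, not part of the original post): a list of n distinct elements has n! possible orderings, and each yes/no comparison can rule out at most half of them, so any comparison-based sort needs at least

\log_2(n!) = \sum_{k=1}^{n} \log_2 k \geq \frac{n}{2}\log_2\frac{n}{2} = \Omega(n \log n)

comparisons in the worst case. That's where the n log n figure comes from.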

We like to describe problems by their complexity in that way when we can. But it's very difficult. We're very good at finding upper bounds: that is, we can in general come up with ways of saying "the execution time will be less than O(something)", but we are very bad at finding ways to prove that "the minimum amount of time needed to solve this problem is O(something)". That distinction, between the upper bound (maximum time needed to solve a problem), and lower bound (minimum time needed to solve a problem) is the basic root of the P == NP question.

When we're talking about the complexity of problems, we can categorize them into complexity classes. There are problems that are O(1), which means that they're constant time, independent of the size of the input. There are linear time problems, which can be solved in time proportional to the size of the input. More broadly, there are two basic categories that we care about: P and NP.

P is the collection of problems that can be solved in polynomial time. That means that in the big-O notation for the complexity, the expression inside the parens is a polynomial: the exponents are all fixed values, like O(n), O(n^2), or O(n^{50}). Speaking very roughly, the problems in P are the problems that we can at least hope to solve with a program running on a real computer.

NP is the collection of problems that can be solved in non-deterministic polynomial time. We'll just gloss over the "non-deterministic" part, and say that for a problem in NP, we don't know of a polynomial time algorithm for producing a solution, but given a solution, we can check if it's correct in polynomial time. For problems in NP, the best solutions we know of have worst-case bounds that are exponential - that is, the expression inside of the parens of the O(...) has an exponent containing the size of the problem.

NP problems are things that we can't solve perfectly with a real computer. The best exact solutions we know of take an amount of time that's exponential in the size of their inputs. With exponential growth, multiplying the size of the input multiplies the exponent: tripling the size of the problem roughly cubes its execution time; quadrupling the input size raises the execution time to roughly the fourth power; increasing the input size by a factor of 10 raises the execution time to roughly the tenth power. For NP problems, we're currently stuck using heuristics - shortcuts that will quickly produce a good guess at the real solution, but which will sometimes be wrong.
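
To spell that arithmetic out (a quick derivation of my own, assuming an idealized running time of exactly T(n) = c^n for some constant c > 1):

T(kn) = c^{kn} = (c^n)^k = T(n)^k

so going from an input of size n to an input of size 3n doesn't just triple the work - it cubes it.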

NP problems are, sadly, very common in the real world. For one example, there's a classic problem called the travelling salesman. Suppose you've got a door-to-door vacuum cleaner salesman. His territory has 15 cities. You want to find the best route from his house to those 15 cities, and back to his house. Finding that solution isn't just important from a theoretical point of view: the time that the salesman spends driving has a real-world cost! We don't know how to quickly produce the ideal path.
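
To make that concrete, here's a minimal sketch of my own (not from the post or from LaPlante's paper; dist is assumed to be a square matrix of pairwise driving times): checking the cost of one proposed route is cheap and polynomial, but the straightforward exact method - trying every route - grows factorially. With the 15 cities in the example, that's already 15!, over a trillion orderings.

from itertools import permutations

def route_cost(route, dist):
  # Total cost of a tour that starts and ends at city 0 (the salesman's house).
  # Checking one proposed route is easy: O(n) lookups, one per leg.
  tour = (0,) + tuple(route) + (0,)
  return sum(dist[a][b] for a, b in zip(tour, tour[1:]))

def brute_force_tsp(dist):
  # The straightforward exact approach: try every ordering of the
  # cities, which means examining O(n!) routes.
  cities = range(1, len(dist))
  return min(permutations(cities), key=lambda r: route_cost(r, dist))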

The big problem with NP is that we don't know lower bounds for anything in it. That means that while we know of slow algorithms for finding the solution to problems in NP, we don't know if those algorithms are actually the best. It's possible that there's a fast solution - a solution in polynomial time which will give the correct answer. Many people who study computational complexity believe that if you can check a solution in polynomial time, then computing a solution should also be polynomial time with a higher-order polynomial. (That is, they believe that there should be some sort of bound like "the time to find a solution is no more than the cube of the time to check a solution".) But so far, no one has been able to actually prove a relationship like that.

When you look at NP problems, some of them have a special, amazing property called NP-completeness. An NP-complete problem is one that every other problem in NP can be translated into (reduced to) in polynomial time. So if you could come up with a polynomial time solution for any single NP-complete problem, you'd also have a polynomial time solution for every other problem in NP.

In Mr. LaPlante's paper, he claims to have implemented a polynomial time solution to a problem called the maximum clique problem. Maximum clique is NP-complete - so if you could find a P-time solution to it, you'd have proven that P == NP, and that there are polynomial time solutions to all NP problems.

The problem that Mr. LaPlante looked at is the maximum clique problem:

  • Given:
    1. a set V of atomic objects called vertices;
    2. a set E of objects called edges, where each edge is an unordered pair (x, y) of vertices.
  • Find:
    • The largest set of vertices C=\{v_1, ..., v_n\} such that for every v_i in C, there is an edge between v_i and every other vertex in C.

Less formally: given a bunch of dots, where some of the dots are connected by lines, find the largest set of dots where every dot in the set is connected to every other dot in the set.
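
This is also a good place to see the "easy to check, hard to find" asymmetry that defines NP. Here's a minimal verifier sketch (my own illustration; the adjacency-dictionary representation is an assumption, not anything from LaPlante's paper): confirming that a candidate set of vertices really is a clique takes only polynomial time, even though finding the largest clique is the hard part.

def is_clique(adjacency, candidate):
  # adjacency: dict mapping each vertex to the set of its neighbors.
  # Checking a candidate of size k takes O(k^2) set lookups - polynomial.
  candidate = list(candidate)
  for pos, u in enumerate(candidate):
    for v in candidate[pos + 1:]:
      if v not in adjacency.get(u, set()):
        return False
  return True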

The author claims to have come up with a simple P-time solution to that.

The catch? He's wrong. His solution isn't P-time. It's sloppy work.

His algorithm is pretty easy to understand. Each vertex has a finite set of edges connecting it to its neighbors. You have each node in the graph send the list of its neighbors to its neighbors. With that information, each node knows which 3-cliques it's a part of. Every clique of size larger than 3 is made up of overlapping 3-cliques - so you can have the cliques merge themselves into ever larger cliques.
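
To give a feel for what that kind of procedure looks like, here's a very rough sketch based purely on the summary above - my own reconstruction for illustration, not LaPlante's actual algorithm or code, and it only looks for cliques of size 3 or more:

from itertools import combinations

def max_clique_by_merging(adjacency):
  # adjacency: dict mapping each vertex to the set of its neighbors.
  def forms_clique(vertices):
    return all(v in adjacency[u] for u, v in combinations(vertices, 2))

  # Step 1: every vertex "learns" the 3-cliques (triangles) it belongs to.
  cliques = {frozenset((u, v, w))
             for u in adjacency
             for v, w in combinations(adjacency[u], 2)
             if w in adjacency[v]}

  # Step 2: keep merging pairs of cliques whose union is still a clique,
  # until nothing new appears. This is where the hidden cost lives: a
  # graph can contain an enormous number of cliques, so the set `cliques`
  # (and the number of pairs to try) can blow up.
  changed = True
  while changed:
    changed = False
    for a, b in combinations(list(cliques), 2):
      union = a | b
      if union not in cliques and forms_clique(union):
        cliques.add(union)
        changed = True
  return max(cliques, key=len, default=frozenset())

Even without a careful analysis, you can see that the total work depends on how many cliques pile up in that set - which is exactly the kind of thing a real complexity analysis has to account for.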

If you look at this, it's still basically considering every possible clique. But his "analysis" of the complexity of his algorithm is so shallow and vague that it's easy to get things wrong. It's a pretty typical example of a sloppy analysis. Complexity analysis is hard, and it's very easy to get wrong. I don't want to be too hard on Mr. LaPlante, because it's an extremely easy mistake to make. Analyzing algorithmic complexity needs to be done in a careful, exacting, meticulous way - and while Mr. LaPlante didn't do that, most people who are professional programmers could easily make a similar mistake! But the ultimate sloppiness of it is that he never bothers to finish computing the complexity. He makes vague, hand-wavy motions at showing the complexity of certain phases of his algorithm, but he never even bothers to combine them and come up with an estimate of the full upper bound of his algorithm!

I'm not going to go into great detail about this. Instead, I'll refer you to a really excellent paper by Patrick Prosser, which looks at a series of algorithms that compute exact solutions to the maximum clique problem, and at how they're analyzed. Compare that analysis to Mr. LaPlante's, and you'll see quite clearly how sloppy LaPlante was. I'll give you a hint about one thing LaPlante got wrong: he takes some steps that require significant work, and treats them as if they were constant time.

But we don't even really need to look at the analysis. Mr. LaPlante provides an implementation of his supposedly P-time algorithm. He should be able to show us execution times for various randomly generated graphs, and show how that time grows as the size of the graph grows, right? I mean, if you're making claims about something like this, and you've got real code, you'll show your experimental verification as well as your theoretical analysis, right?

Nope. He doesn't. And I consider that to be a really, really serious problem. He's claiming to have reduced an NP-complete problem to a small-polynomial complexity: where are the numbers?

I'll give you a good guess about the answer: the algorithm doesn't complete in a reasonable amount of time for moderately large graphs. You could argue that even if it's polynomial time, you're looking at exponents that are no smaller than 3 (exactly what he claims the bound to be is hard to determine, since he never bothers to finish the analysis!) - and a cubic algorithm on a large graph takes a very long time. But... not bothering to show any runtime data? Nothing at all? That's ridiculous. If you look at the Prosser paper above, Prosser manages to give actual concrete measurements of the exponential-time algorithms. LaPlante didn't bother to do that. And I can only conclude that he couldn't gather actual numbers to support his idea.
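
For what it's worth, the experiment I'm asking for takes only a few lines to set up. Here's a minimal sketch (entirely my own illustration - random_graph and time_algorithm are hypothetical names, and the solver you'd plug in would be LaPlante's implementation, which I'm not reproducing here):

import random
import time

def random_graph(n, p=0.5):
  # A random undirected graph on n vertices, as an adjacency dict:
  # each possible edge is present independently with probability p.
  adjacency = {v: set() for v in range(n)}
  for u in range(n):
    for v in range(u + 1, n):
      if random.random() < p:
        adjacency[u].add(v)
        adjacency[v].add(u)
  return adjacency

def time_algorithm(solver, sizes):
  # Run the solver on graphs of increasing size and print the times.
  # How those times grow as n grows is exactly the evidence the paper never provides.
  for n in sizes:
    graph = random_graph(n)
    start = time.perf_counter()
    solver(graph)
    print(n, time.perf_counter() - start)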

15 thoughts on “A failed attempt to prove P == NP”

  1. John Fringe

    > P and NP are complexity classes of algorithms – that is, groups of algorithms that have similar performance properties

    Not to nitpick, but you may be oversimplifying there by equating algorithm and problem, and I believe it can confuse people more than help.

  2. David Starner

    I’m not sure why you say “theoretical algorithms” for sorting in less than O(n log n) time; if you’re sorting something from a small set (like 16-bit integers) and many of them (way more than 2^16), a counting sort is a very practical O(n) sort on the problem.

    I find the belief that P=NP to be surprising; so many problems would need polynomial-time algorithms that nobody has found yet, for one thing.

    You’re more generous to LaPlante than I would have been. The instant you start thinking you have a solution to a deep problem that’s evaded many, check and double-check your answer.

    As a final nitpick, the first link, “posted a paper to ArXiv” seems to link back here.

    1. markcc Post author

      I think you’re talking about radix sort, which is kind-of a cheat. It’s still O(n lg n); it just bakes the lg n factor into a constant, because on a real computer, the integer representation has a fixed length. It’s still essentially bounded by O(lg(n)) radix partitions, each of which takes O(n).

      There are other algorithms that get below O(n lg n) without relying on anything remotely fishy – but they’re completely impractical. The constants are huge!

      1. David Starner

        Not radix sort per se, except in the degenerate case.

        short[] shortSort(short[] initialArray) {
            // One counting bucket for each possible 16-bit value.
            int[] buckets = new int[65536];
            for (int i = 0; i < buckets.length; i++) buckets[i] = 0;
            // Count how many times each value appears.
            for (short s : initialArray) buckets[s + 32768]++;
            short[] finalArray = new short[initialArray.length];
            int index = 0;
            // Walk the buckets in order, writing each value out as many
            // times as it was counted.
            for (int currVal = 0; currVal < buckets.length; currVal++)
                for (int i = 0; i < buckets[currVal]; i++) {
                    finalArray[index] = (short) (currVal - 32768);
                    index++;
                }
            assert index == initialArray.length;
            return finalArray;
        }

        It's something of a quirk of the Java platform that int will always be long enough, since initialArray must be int-indexable, but a 64-bit integer would be large enough to count the values in any computer currently available. Naturally, unless initialArray is larger (much larger?) than buckets, this is going to be less efficient than a normal n log n sort, but it is O(n) for all observable behavior. The log n cost of handling integers that can hold a count of n elements is usually ignored, just like the log n cost of handling a pointer to one of n elements.

      2. Ingvar Mattsson

        Actually, counting sort is O(n) in time and O(2**b) [b being the bits of the bounded set] in memory for a bounded integer set.

        Pseudo-code (memory is taken as an integer array, large enough to have all possible input numbers as an index and wide enough to not overflow):

        # O(n): count how many times each number appears
        for n in numbers:
            memory[n] += 1

        # O(2^b) [fixed] + O(n): walk the possible values in order and emit each one
        for i in possible_numbers:
            emitted = 0
            while emitted < memory[i]:
                emitted += 1
                emit i

        This is subtly different from radix sort, in that we never keep a partially sorted input around. But we do have an awful lot of memory allocated and it is only ever practical for relatively small bounded integer ranges.

  3. Pavel

    The sum of 0 + 1 + 2 + … + (n – 1) is equal to n(n – 1)/2, not (n – 1)(n – 2)/2. Of course both functions are still O(n^2), so it’s just a nitpick.

  4. Audun

    Another nitpick: You say that for NP problems we only have an exponential algorithm. However, P is contained in NP, so many NP problems have algorithms in P. I think you are actually talking about the NP-hard complexity class.

    1. markcc Post author

      No, I’m not talking about NP-hard. I’m just simplifying. Colloquially, when we talk about NP problems, we mean NP-complete and harder – that is, the set of problems in NP for which we don’t know whether or not they’re in P.

      The problem with writing about stuff like this is that each additional level of detail you add loses another group of readers. Sometimes making a simplification – using the colloquial meaning of a term without explaining it – makes the difference between people understanding it, and not understanding it.

      When I was writing this, I thought that presenting NP as the class of NP-complete and harder, the way we usually do colloquially, made sense. I still think it’s good; if it’s a bit fuzzy to experts, that’s OK – as you did, they’ll read this, and see what I meant.

      1. Audun

        Fair enough. My worry is that it will confuse a certain kind of reader who takes this to mean that all NP problems are hard, so the statement that P = NP is trivially false. I am not that reader, as I have been exposed enough to this subject to figure out what’s going on.

        Of course, this choice is entirely up to you!

  5. Dave W.

    Wouldn’t your colloquial usage really mean “NP-complete and easier, but not known to be in P”? Any problem in NP can’t be harder, in some sense, than an NP-complete problem, because it can always be reduced to SAT (Cook’s Theorem). But it could be of intermediate complexity – if NP-complete problems required exponential time, an intermediate problem in NP might require something like O(n^(log n)) time.

  6. Dennis

    In a weird way, I commend LaPlante. He had the courage to publish his algorithm which meant 1. he believed his assertions and 2. he gave a proof which was refutable. I wish that everyone opened themselves to the potential of ridicule.

    1. markcc Post author

      Me too. I went much easier on him than on, say, Chris Langan(? the CTMU guy) – because LaPlante really made an effort, and put his work up where he knew people would be able to critique it severely if he was wrong. Langan is careful to never give enough details on his “theory” to make it specific enough to be refuted. There’s always enough wiggle-room for him to take *any* refutation, and say “That’s not what it says, you’re just too stupid to understand it.”

      LaPlante is completely wrong. But he’s wrong in a respectable way.

  7. vegafrank

    THE P VERSUS NP PROBLEM

    We define an interesting problem called $MAS$. We show $MAS$ is actually a succinct version of the well known $\textit{NP–complete}$ problem $\textit{SUBSET–PRODUCT}$. When we accept or reject the succinct instances of $MAS$, then we are accepting or rejecting the equivalent and large instances of $\textit{SUBSET–PRODUCT}$. Moreover, we show $MAS \in \textit{NP–complete}$.

    In our proof we start assuming that $P = NP$. But, if $P = NP$, then $MAS$ and $\textit{SUBSET–PRODUCT}$ would be in $\textit{P–complete}$, because all currently known $\textit{NP–complete}$ are $\textit{NP–complete}$ under logarithmic-space reduction including our new problem $MAS$. A succinct version of a problem that is complete for $P$ can be shown not to lie in $P$, because it will be complete for $EXP$. Indeed, in Papadimitriou’s book is proved the following statement: “$NEXP$ and $EXP$ are nothing else but $P$ and $NP$ on exponentially more succinct input”. Since $MAS$ is a succinct version of $\textit{SUBSET–PRODUCT}$ and $\textit{SUBSET–PRODUCT}$ would be in $\textit{P–complete}$, then we obtain that $MAS$ should be also in $\textit{EXP–complete}$.

    Since the classes $P$ and $EXP$ are closed under reductions, and $MAS$ is complete for both $P$ and $EXP$, then we could state that $P = EXP$. However, as result of Hierarchy Theorem the class $P$ cannot be equal to $EXP$. To sum up, we obtain a contradiction under the assumption that $P = NP$, and thus, we can claim that $P \neq NP$ as a direct consequence of the Reductio ad absurdum rule.

    You could see more on (version 3)…

    https://drive.google.com/file/d/0B3yzKiZO_NmWZ1M1MkNuYVBreXc/view?usp=sharing

    Best Regards,
    Frank.

