In the comments thread of the post on Turing Equivalent vs Turing Complete, there's been a flurry of questions and comments about computational complexity. So I thought it would be interesting to write a couple of posts about it. I'm not going to get into a lot of detail, because I would need to dig out some books that I really don't want to read again :-). But I can do the basics without them. It'll take a few posts to get through this.
The logical place to begin is: What is computational complexity? When we look at a computation, one of the things we want to know is: "How long will this take?". A specific, concrete answer to that depends on all sorts of factors – the speed of your computer, the particular programming language you use to run the program, etc. But independent of those, there's a basic factor that describes something important about how long a computation will take – the algorithm itself fundamentally requires some minimum number of operations. Computational complexity is an abstract way of describing how many operations a computation will take, expressed in terms of the size or magnitude of the input.
When we talk about complexity, we can talk about two different kinds: the complexity of an algorithm, and the complexity of a problem. The complexity of an algorithm is a measure of how many steps the algorithm takes to execute on an input of a particular size. It's specific to the algorithm – that is, to the particular method used to solve the problem. The complexity of a problem is a bound on the best possible complexity of any algorithm that can solve that problem.
For example, if you want to sort a list of names, you need to compare names to figure out where things go in the sorted list. If you think about any sorting algorithm, it's pretty obvious that the time it takes to run is overwhelmingly dominated by the time it takes to do the comparisons.
One of the most popular algorithms for sorting lists is called quicksort. In the worst case, quicksort will require a number of steps proportional to the length of the list squared; on average, quicksort will require a number of steps proportional to the length of the list times the logarithm of the length of the list (n × log(n) if n is the length of the list).
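To make that concrete, here's a minimal sketch (mine, not from the original post – in Python, with names I made up) of quicksort with a comparison counter. On shuffled input the count grows roughly like n × log(n); with this naive first-element pivot, feeding it an already-sorted list would push it toward the quadratic worst case.

```python
import math
import random

def quicksort(items, stats):
    """Naive quicksort (first element as pivot) that counts comparisons in stats["cmp"]."""
    if len(items) <= 1:
        return items
    pivot = items[0]
    smaller, larger = [], []
    for x in items[1:]:
        stats["cmp"] += 1  # one comparison against the pivot per remaining element
        (smaller if x < pivot else larger).append(x)
    return quicksort(smaller, stats) + [pivot] + quicksort(larger, stats)

for n in (100, 1000, 10000):
    data = list(range(n))
    random.shuffle(data)
    stats = {"cmp": 0}
    assert quicksort(data, stats) == sorted(data)
    # On shuffled input the comparison count tracks n*log2(n); an already-sorted
    # list with this pivot choice would need roughly n*(n-1)/2 comparisons instead.
    print(n, stats["cmp"], round(n * math.log2(n)))
```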
But no matter how clever you are, there is some minimum number of comparisons that you absolutely must do in order to produce a properly sorted list no matter what you get as input. In the case of sorting a list of n names, that happens to be proportional to n × log(n).
Talking about complexity the way I did in the previous couple of paragraphs is really pretty awkward. To make it easier to talk about, we have a special notation we use, called big O notation. Roughly speaking, when we have an algorithm like sorting where we know that no matter what it's given as input, for a list of n elements, the largest possible number of comparisons required to sort the list is proportional to n log(n), we say that sorting has a complexity of O(n log(n)). Big O is an upper bound: it gives an approximation of the maximum number of steps required for the worst possible input; asymptotically, nothing will ever be slower than its big O bound.
To be precise and formal for a moment, what big O actually means is a little more complicated. Many computations have some basic minimum cost for setting up; and many have some factor dependent on the size of the input that might be dominant for small inputs, but which gets dwarfed as the size of the input grows. We don’t want those factors muddying things up. The formal definition of big O specifically says how we’re going to eliminate those kinds of factors that are really just noise.
Take a computation C with input size n. Let's say that the maximum number of steps required to execute C on an input of size n is given by some function k(n). We say that C has complexity O(f(n)) (or C is O(f(n))) if and only if:
∃ n0, M : ∀ n > n0, k(n) ≤ M × f(n)
Just to make sure that that’s clear, here’s a quick example to give you the sense of what a big-O bound means. To the right is a log-scale graph of “max(x ln(x)+x,2+x)”, a function which could be the number of instructions needed for a sort algorithm.
Like most good sort algorithms, this is O(n lg(n)). But if we just add "x ln x" to the graph in red, you can see that it's always smaller than the original curve. Looking at this naively, you might think that this means we shouldn't use O(n lg(n)) as a bound.
But if we multiply it by 2, then starting around 2.7, the log curve (now in blue) overtakes the algorithm curve. Since there is a constant factor (M=2) such that beyond some minimum point (n0 ≈ 2.7), M×(n lg(n)) is always larger than the number of steps needed by the algorithm, O(n lg(n)) is a valid bound.
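Here's a quick numerical check of that claim – my own sketch, not part of the original post, using the same functions as the graph: with M = 2, the bound 2·n·ln(n) stays above max(n·ln(n)+n, 2+n) for every n past roughly e ≈ 2.72.

```python
import math

def steps(n):
    """The example cost function from the graph: max(n*ln(n) + n, 2 + n)."""
    return max(n * math.log(n) + n, 2 + n)

def bound(n, M=2):
    """Candidate big-O bound: M * n*ln(n)."""
    return M * n * math.log(n)

# Past n0 = e (about 2.72), 2*n*ln(n) >= n*ln(n) + n >= 2 + n, so the bound holds.
samples = [math.e + k * 0.1 for k in range(1, 10000)]
assert all(bound(n) >= steps(n) for n in samples)
print("2*n*ln(n) dominates the cost function for every sampled n >", round(math.e, 2))
```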
Big O is a worst-case measure: it tells you that you will never need more than a certain number of instructions to complete a computation. It is, in many ways, a blunt instrument. It can’t tell you how many instructions you actually need; it can only tell you that it will be less than some number. We sometimes use it to compute upper bounds on the average number of instructions used by an algorithm.
We tend to categorize complexity into a particular group of classes. Roughly speaking, we tend to use something like the following taxonomy:
- O(1): Constant time – something that takes the same number of steps for any input.
- O(log(n)): Logarithmic time – something where the time to compute it is dependent on the logarithm of the size of the input. An example of a log-time algorithm is finding a particular value in a sorted list of values, where the time to compare two values is constant. (There's a sketch of exactly that right after this list.)
- O(n): Linear time – something that takes time proportional to the length of the input. For example, finding a particular value in an unsorted list, or comparing two variable-length strings.
- O(n log(n)): Something that takes time proportional to the length of the input times the log of the length of the input. This one sounds strange and unlikely to a lot of people, but it turns out to be a very common class – as we saw above, it's the complexity of sorting.
- O(n^2): Quadratic time – something whose execution time is proportional to the square of the size of the input. For many systems, O(n^2) is treated as an upper bound on the complexity of any practical computation. For example, many compilers are implemented with the requirement that intra-procedural analysis/optimization be no worse than quadratic.
- O(n^3): Cubic time – another common maximum bound on the complexity of "practical" computations. Cubic is somewhat common as a complexity bound in dynamic programming algorithms.
- O(n^k) for any fixed exponent k: This is what we call polynomial time, or P-time – something bounded by any constant exponent. It's pretty much universally accepted that only P-time computations are really practical.
- O(n^f(n)) where f is monotonically increasing in n: Exponential time – the time to perform a computation is bounded by the size of its input raised to a monotonically increasing exponent. This is nightmare territory – things where we've got an algorithm that can do the computation in theory, but which is essentially impossible in practice.
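As promised in the logarithmic-time entry, here's a tiny sketch (mine, not from the post) of binary search over a sorted list: each probe halves the remaining range, so the number of comparisons grows like log2(n).

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or None. O(log n) comparisons."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2  # halve the search range on every iteration
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None

# A million-element sorted list takes at most ~20 probes (log2(1,000,000) ≈ 20).
print(binary_search(list(range(1_000_000)), 765_432))
```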
In the exponential case, we usually teach it as O(k^g(n)) for some constant k, which is a different function of n than the equivalent of what you gave: O(e^(f(n) log(n))).
“It’s pretty much universally accepted that only P-time computations are really practical.”
But the analysis done by the Alloy analyzer, for instance, is exponential. There's not much we can do when we run into hard problems, but what we can do may be very valuable indeed.
On another thread, Gregory J. Chaitin's name came up. By coincidence, something of his appeared on the arXiv just a few hours ago. Since it relates to the topic of this thread, I link to it below.
Algorithmic information theory: Some recollections
G. J. Chaitin (IBM Research)
Presents a history of the evolution of the author's ideas on program-size complexity and its applications to metamathematics over the course of more than four decades. Includes suggestions for further work.
There are several interesting algorithms in (somewhat) wide use in the exponential territory. A good example is the Hindley-Milner type inference algorithm inside all our ML and Haskell implementations. So if your N is usually very small, an exponential algorithm can be acceptable.
Daniel, Harald:
You are right; I should have taken the time to be a bit more precise. There are exponential-time algorithms in widespread use. The catch is that they're exponential-time algorithms that are fast (i.e., not just non-exponential, but often close to linear time) in most cases, but that do have the potential to blow up into exponential time.
I remember attending one of Daniel Jackson's talks about Alloy (which is, incidentally, one of my favorite pieces of software; what an amazing tool!). He was explaining how it translates from the Alloy specification to SAT, and then feeds the SAT problem to a SAT-solver. The way he phrased it was great: "The bad news is, it reduces to SAT, which is NP-complete. But the good news is that it reduces to SAT, which means that we can call a SAT-solver that produces a solution really quickly."
H-M type inference is similar; in the most common cases (i.e., almost all real programs), H-M is really fast. In rare (almost pathological) cases, you can create monstrous programs where the type inference does become exponential. If I recall my dates correctly, H-M type inference has been used in SML since the '70s, and it wasn't proven NP-hard until around 1990; in those 20 years, I don't think anyone found a single program that actually did require exponential time.
I am such a nerd. . . when MarkCC says “Daniel Jackson”, my immediate thought is, “That guy from Stargate?”
Blake:
Yep, that does pretty much peg you as a definite nerd 🙂
What’s scarier is that Dan’s father is Michael Jackson… And when I hear anything about Michael Jackson at work, I usually think of Daniel’s father, the software engineer Michael Jackson, which can make for some pretty confusing images.
Back reading blogs at the tail end of a 5-day flu I wouldn’t wish on anyone. (Yes, I had my flu shot.)
Some other good stuff on complexity for the ‘beginner’ is available at http://www.scottaaronson.com/democritus/lec5.html and http://www.scottaaronson.com/democritus/lec6.html .
The correct link to Chaitin’s article is:
http://arxiv.org/pdf/math.HO/0701164
Maybe I didn’t read the article closely enough, or maybe I’m just confused, but: since you disregard the proportional (constant) factor when figuring out the big O notation, it is not actually an upper bound. It’s an upper bound up to the constant; that is, for instance:
problem size = n = 10
problem needs 10*n^2 computations
problem is O(n^2)
problem actually takes 1000 computations, which is more than n^2 = 100
I’ve always thought that the big O notation is more useful as a description of how an algorithm or problem behaves, and not as an upper bound useful in practice. Besides, for small (practical) values of n, O can be deceiving.
But then, I’m an engineer, not a CS 😀
Miguel:
You are right; the correct way of describing it would have been "an upper bound on the growth rate of execution time". Big-O is a way of describing how fast the execution time will increase as a function of the increase in input size. So for a computation that is O(f(n)), the upper bound on the rate of slowdown as n increases will be proportional to f(n).
It might be worth pointing out that sorting requires n log n time ONLY if you’re worried about comparisons. Radix sort is one of the fastest sorting algorithms and runs in linear time, but it isn’t based on comparisons (it uses properties of integers), and so doesn’t violate the lower bound.
A quick and dirty way of seeing where n log n comes from is to note that a sorting algorithm has to be able to handle any permutation of the input and sort it, which is like saying that it needs to have a path to any possible permutation. There are n! permutations, and thinking of the computation as a series of branches, the depth of such a tree must be at least log(n!) ~ n log n.
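To make the radix-sort point concrete, here's a rough LSD (least-significant-digit) radix sort sketch of my own for non-negative integers: it sorts by bucketing on digits rather than by comparing elements to each other, so the work is linear in the number of items (times the number of digit passes).

```python
def radix_sort(nums, base=10):
    """LSD radix sort for non-negative integers: buckets by digit instead of comparing elements."""
    if not nums:
        return nums
    passes = len(str(max(nums)))  # one bucketing pass per digit of the largest value
    for d in range(passes):
        buckets = [[] for _ in range(base)]
        for x in nums:
            buckets[(x // base**d) % base].append(x)  # distribute by the d-th digit
        nums = [x for bucket in buckets for x in bucket]  # collect, preserving bucket order
    return nums

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# -> [2, 24, 45, 66, 75, 90, 170, 802]
```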
O(n^f(n)) isn’t exponential time even for strictly monotonically increasing f.
Yeah, I’m somewhat confused. I was under the impression that exponential time algorithms were O(k^n) where k>1. That would actually produce an exponential curve. There are many things in between poly and exponential, thrown together as ‘super-polynomial time’. I believe that’s probably more of what you’re describing there.
Oh, and you may want to make it more explicit that in Big-O notation you drop everything except the most significant term, and all constants. So 10*n*log(n) + 1 billion*n is still written as just O(n*log(n)). You basically say this, but it's a throwaway sentence that can be easily skipped.
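A quick illustration of why that works (my own sketch, not the commenter's): by the formal definition earlier in the post, all the billion-times-n term does is force a bigger constant M; the bound is still n*log(n)-shaped.

```python
import math

def cost(n):
    """Example cost with a big lower-order term: 10*n*log(n) + 1e9*n."""
    return 10 * n * math.log(n) + 1e9 * n

def bound(n, M=10 + 1e9):
    """Candidate bound M*n*log(n); M is enormous, but it's still a constant."""
    return M * n * math.log(n)

# For every n >= e, log(n) >= 1, so 1e9*n <= 1e9*n*log(n) and therefore
# cost(n) <= (10 + 1e9)*n*log(n): the huge term changes M, not the growth rate.
assert all(cost(n) <= bound(n) for n in (3, 10, 10**6, 10**12, 1e100))
print("cost(n) is O(n*log(n)) with n0 = e and M =", 10 + 1e9)
```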
O(n^f(n)) isn’t exponential time even for strictly monotonically increasing f.
Well, it is if f(n) is n/log(n). 🙂
Thank you for the link to those Scott Aaronson lectures! Wonderful! No offense to Mark Chu-Carroll (I check your blog every day), but that sounds like a class I wish I could join.
Chaitin in his paper [referenced above], in the section 'Challenges for the Future', states that there is a need for a dynamic rather than a static mathematical model of biology.
I think work in this area – at least in pursuit-evasion games – began shortly after R. Isaacs wrote 'Differential Games I, II, III, IV' in 1954-1956.
This work was continued by many others, including Tamer Basar and Geert Jan Olsder in 'Dynamic Noncooperative Game Theory' (1982, revised 1999).
Steven M. LaValle's 'Planning Algorithms' (2006) addresses this subject from a game theory, computer programming, and robotics perspective.
There are probably more specific examples for biology than the ones I provided, which are economics- and motion-oriented. Yet if biology is treated as a type of energy economics, then the more general discussion may apply.