Computability

I just recently realized that I only wrote about computability back in the earliest days of this blog. Those posts have never been re-run, and they only exist back on the original blogger site. When I wrote them, I was very new to blogging – looking back, I think I can do a much better job now. So I’m going to re-do that topic. This isn’t just going to be a re-post of those early articles, but a complete rewrite.

The way that I’m going to cover this is loosely based on the way that it was first taught to me by a wonderful professor, Dr. Eric Allender at Rutgers, where I went to college. Dr. Allender was a really tremendous professor: he managed to take an area of computer science that could seem hopelessly abstract and abstruse, and turned it into something fun and even exciting to learn about.

Computability is the most basic and fundamental sub-field of theoretical computer science. It’s the study of what a mechanical computing device can do. Not just what a specific mechanical computing device can do, but what can any mechanical computing device do? What are the limits of what you can do mechanically? And once we know the limits, what can we discover about the nature of computation?

The study of computability has its roots in mathematical logic. The earliest study of computability was done by logicians who were looking at proofs and provability. The connection there is one of the things that fascinates me: from the outside, why on earth would things like the Principia Mathematica, which attempted to completely formalize mathematics from first principles, be connected to the theory of how the computer that I’m typing this article on works? Once you understand what’s going on, it becomes almost obvious!

In logic, the process of proof is completely mechanical. You have a set of statements, and a set of inference rules. The inference rules allow you to use the initial statements to produce new statements. The sequence of inference steps that leads to the production of a particular new statement is a proof of that statement. The important thing about logic is that the inference rules are mechanical: you don’t need to know what a statement means to be able to use it in a proof. The inference rules just describe a syntactic structure in the statements, and tell you what you can produce if you can find an instance of that structure.

For example, one of the classic rules of inference is: given two statements, A(x) and ∀y: A(y) → B(y), you can infer B(x). (Note: a typo in the previous sentence was corrected after being pointed out by commenters.) A could be any predicate at all. It could be “IsBig”, or “IsBlue”, or “Simple”, or “Less than 8”, or “Is cute”. x could be any object: it could be a person, or it could be a crayon, or it could be an abstract set. It doesn’t matter.
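
To make “mechanical” concrete, here’s a tiny sketch in Python (the encoding of statements and the predicate names like “Human” and “Mortal” are my own invention, purely for illustration) of applying a rule like that one syntactically. Notice that the code never looks at what the predicates mean; it only matches structure.

```python
# A minimal sketch of purely syntactic inference.
# A fact is a pair (predicate, object); a universal rule
# "forall y: A(y) -> B(y)" is represented as the pair (A, B).

facts = {("Human", "socrates"), ("Blue", "sky")}
rules = [("Human", "Mortal")]          # forall y: Human(y) -> Mortal(y)

def apply_rules(facts, rules):
    """One round of inference: match structure, never interpret meaning."""
    derived = set(facts)
    for a, b in rules:
        for pred, obj in facts:
            if pred == a:              # found an instance A(x)...
                derived.add((b, obj))  # ...so we may add B(x)
    return derived

print(apply_rules(facts, rules))
# e.g. {('Human', 'socrates'), ('Blue', 'sky'), ('Mortal', 'socrates')}
# (set ordering may vary)
```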

So you can view the process of proof as computation. And we do. The rallying cry of functional programmers is: “The program is the proof”, because a functional program can be read as a (very complicated) proof, and the type system that’s used to analyze the program is a very simple proof system.

Anyway, the way that we talk about computability takes a lot from logic.

You can describe the basics of computability in a few different ways. You can take the very strict logical route; you can take the recursive function theory route; or you can take the automata route. I like to follow the basic way that Prof. Allender taught me, which is the automata version. I like this approach, because as we’ll see, it gives you an almost tactile sense of the differences between the different classes of computability.

One of the many legacies of the logical roots of computability is that we talk about it in terms of languages. We could just as easily talk in terms of functions (which is the way that the recursive function theory approach does it), but the logicians who built the foundations worked in terms of languages.

Given a computational problem, you can express it as a language; and you can express solving that problem as determining whether or not a given string is a member of that language. That sounds very abstract – so let’s take an example.

Suppose that the problem we want to solve is adding two numbers. Then we can describe it as a language consisting of sentences “x+y=z”, where a sentence is a member of the language if and only if the sum of x and y is z. So “1+2=3” and “27+42=69” are valid strings in the language, but “19+18=54” and “2+4” are not. (“2+4” because it doesn’t even have the right structure.)
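
Here’s a quick sketch of what “solving the problem means testing membership” looks like in code. It’s my own illustration in Python, and the regular expression describing the shape of the sentences (decimal digits around “+” and “=”) is an assumption about the format, not part of the formal definition.

```python
import re

def in_addition_language(s):
    """Return True iff s has the form "x+y=z" and x + y really equals z."""
    m = re.fullmatch(r"(\d+)\+(\d+)=(\d+)", s)
    if m is None:
        return False            # doesn't even have the right structure
    x, y, z = (int(g) for g in m.groups())
    return x + y == z

print(in_addition_language("1+2=3"))      # True
print(in_addition_language("27+42=69"))   # True
print(in_addition_language("19+18=54"))   # False: the sum is wrong
print(in_addition_language("2+4"))        # False: wrong structure
```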

Any formal language can be described using something called a phrase structure grammar or Chomsky grammar (after Noam Chomsky – yes, that Noam Chomsky, who is a brilliant logician/linguist as well as a political gadfly). That means that any computation can be described as a grammar – which is, in itself, pretty amazing – but we’ll get to that in a future post.

A Chomsky grammar consists of four elements:

  1. A set of terminal symbols (also called characters). This set is called the grammar’s alphabet. All strings in the language will consist of sequences of terminal symbols.
  2. A set of non-terminal symbols (NTSs).
  3. One distinguished member of the set of NTSs, called the start symbol.
  4. A collection of replacement rules (also called productions). Each replacement rule consists of a left-hand side and a right-hand side. The left-hand side is a sequence of terminal and non-terminal symbols that contains at least one non-terminal; the right-hand side is a sequence of mixed terminal and non-terminal symbols, or the empty string (which is written ε).

It’s easiest to explain how this works with an example. We can define a language that consists of sequences of any number of “1”s followed by at least one “2”:

  • Terminals: {1, 2}.
  • Non-terminals: {S, A, B}
  • Start: S
  • Productions:
    1. S → AB
    2. A → 1A
    3. A → ε
    4. B → 2B
    5. B → 2

So what this says is: to produce a string in this language, you start with S. Then you apply whatever productions you want, until you have a string consisting only of terminals, and then you stop.

So you could produce the string “112222”:

  1. S
  2. AB (Rule 1, replacing S by AB.)
  3. 1AB (rule 2, replacing A by 1A.)
  4. 11AB (rule 2, again)
  5. 11B (rule 3, replacing A with the empty string)
  6. 112B (rule 4, replacing B by 2B.)
  7. 1122B (rule 4 again)
  8. 11222B (rule 4 again)
  9. 112222 (rule 5, replacing B with 2.)
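
Since a derivation is just repeated string replacement, it’s easy to mimic mechanically. Here’s a little sketch (my own illustration in Python) that encodes the grammar above and replays exactly that derivation of “112222”, replacing the leftmost occurrence of the chosen non-terminal at each step.

```python
# The grammar for "any number of 1s followed by at least one 2",
# written as (left-hand-side, right-hand-side) pairs. "" stands for epsilon.
productions = {
    1: ("S", "AB"),
    2: ("A", "1A"),
    3: ("A", ""),
    4: ("B", "2B"),
    5: ("B", "2"),
}

def apply(string, rule):
    """Replace the leftmost occurrence of the rule's LHS with its RHS."""
    lhs, rhs = productions[rule]
    return string.replace(lhs, rhs, 1)

s = "S"
for rule in [1, 2, 2, 3, 4, 4, 4, 5]:   # the derivation from the post
    s = apply(s, rule)
    print(rule, s)
# The last line printed is: 5 112222
```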

Another example, which is considerably more complex computationally, is paren matching. It produces strings of matched open and close parens – not just N opens followed by N closes, but any sequence of opens and closes, as long as every close paren comes after its matching open paren, so that the parens nest properly.

  • Terminals: {(, )}.
  • Non-terminals: { S }
  • Start: S
  • Productions:
    1. S → SS
    2. S → (S)
    3. S → ε

For an example of this, we can generate the string “(()(()()))”:

  1. S
  2. (S) (rule 2)
  3. (SS) (rule 1)
  4. ((S)S) (rule 2)
  5. (()S) (rule 3)
  6. (()(S)) (rule 2)
  7. (()(SS)) (rule 1)
  8. (()((S)S)) (rule 2)
  9. (()(()S)) (rule 3)
  10. (()(()(S))) (rule 2)
  11. (()(()())) (rule 3)
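
Jumping slightly ahead to the recognition side of things: checking whether a string belongs to this language already needs more power than the first example, because you have to keep track of how deeply nested you are. Here’s a sketch (my own illustration in Python) that uses a counter the way a push-down automaton uses its stack.

```python
def balanced(s):
    """Return True iff s is a properly nested string of parens.

    depth plays the role of a pushdown automaton's stack: '(' pushes,
    ')' pops, and we reject if we ever try to pop an empty stack.
    """
    depth = 0
    for c in s:
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth < 0:        # a close paren with no matching open
                return False
        else:
            return False         # not in the alphabet {(, )}
    return depth == 0            # everything opened was eventually closed

print(balanced("(()(()()))"))    # True
print(balanced(")("))            # False: same counts, but doesn't nest
```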

In computability theory, we talk about four general classes of computability – and those four classes can each be expressed by using different restrictions on the structure of production rules in Chomsky grammars. They’re typically talked about in terms of a hierarchy defined by Chomsky:

Level 3: the regular languages
These are very simple languages, which can’t do anything particularly interesting: no counting, nothing beyond sequencing and repetition. Regular languages can be recognized by a simple kind of machine called a finite state automaton.
Level 2: the context free languages (CFLs)
These are widely used in practical computer science. They’re reasonably expressive; you can do a limited kind of counting with them, but nothing too deep. CFLs can be recognized by a machine called a push-down automaton.
Level 1: the context sensitive languages (CSLs)
Context sensitive languages are the things that can be written with a more advanced kind of grammar. You can express a lot of quite deep computation using CSLs. CSLs can be recognized by a very interesting machine: a non-deterministic Turing machine whose tape is linearly bounded by the length of its input. Level 1 includes most things that we can really compute: the linear-bounded non-deterministic machine is a pretty good match for the capability of a real computer, while level 0 conceptually requires infinite memory.
Level 0: the recursively enumerable languages.
Anything computable. Level 0 languages are the languages that can be accepted by an ordinary, unbounded Turing machine.
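
To tie the hierarchy back to the two example grammars: the “some 1s followed by at least one 2” language is regular (level 3), so an ordinary regular expression can recognize it; the balanced-paren language is context free (level 2) but not regular, because no finite state automaton can track unbounded nesting depth – which is exactly why the counter (standing in for a stack) was needed in the sketch above. A quick illustration, again my own:

```python
import re

# Level 3: "any number of 1s followed by at least one 2" is a regular
# language, so a plain regular expression (a finite state automaton in
# disguise) recognizes it.
ones_then_twos = re.compile(r"1*2+")

print(bool(ones_then_twos.fullmatch("112222")))  # True
print(bool(ones_then_twos.fullmatch("2")))       # True: zero 1s is allowed
print(bool(ones_then_twos.fullmatch("121")))     # False: a 1 after a 2
print(bool(ones_then_twos.fullmatch("111")))     # False: needs at least one 2
```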

The Chomsky hierarchy is traditional, but it’s not perfect. There are a couple of things missing from it; for example, there are language classes that sit between level 1 and level 0 of the hierarchy.

There’s also one little cheat in the definition above of level 0. I said that level 0 consists of the languages that a Turing machine can accept. When you get to a machine as sophisticated as a Turing machine, there are two different things it can do with a given input: it can accept the input, or it can decide about the input.

Accepting the input means that if the input is a member of the language, the machine will eventually stop and say “Yes, this is in my language”. Deciding about the input means that the machine will always eventually stop and say either “Yes, this is in my language” or “No, this is not in my language”. The difference is that an accepting machine isn’t guaranteed to stop if the string is not in the language: it might manage to stop and say no sometimes, but there is no guarantee. So it will say “yes” if the string is in the language; it will never say “yes” if the string is not in the language, but it might run forever without ever giving an answer. The languages that are acceptable by a Turing machine are called the Chomsky level-0 languages (also known as the recursively enumerable languages); the languages that are decidable by a Turing machine are called the recursive languages.
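
To make the accept/decide distinction concrete, here’s a sketch (my own illustration, using the grammar machinery from earlier rather than an actual Turing machine) of an acceptor: it searches breadth-first through the derivations of a grammar. If the target string is in the language, some finite derivation exists and the search eventually finds it and answers yes. If the string is not in the language, the search can simply go on forever; nothing in the procedure ever entitles it to answer no.

```python
from collections import deque

def accepts(grammar, start, target):
    """Accept (but not decide) membership in the language of a grammar.

    grammar is a list of (lhs, rhs) replacement rules over plain strings.
    We search breadth-first over all sentential forms reachable from the
    start symbol. If target is derivable, BFS finds its (finite) derivation
    and we return True; if it isn't, this loop may run forever.
    """
    queue = deque([start])
    seen = {start}
    while queue:                   # for most grammars, never becomes empty
        form = queue.popleft()
        if form == target:
            return True
        for lhs, rhs in grammar:
            i = form.find(lhs)
            while i != -1:         # try the rule at every position
                new = form[:i] + rhs + form[i + len(lhs):]
                if new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return False                   # only reached if the reachable set is finite

# The balanced-paren grammar from above, with "" standing in for epsilon.
parens = [("S", "SS"), ("S", "(S)"), ("S", "")]
print(accepts(parens, "S", "()()"))   # True: found after a short search
# accepts(parens, "S", "())(")        # would search forever: it never says no
```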

In the next couple of posts, we’ll look at the grammars and corresponding computing machines for the Chomsky levels. Then we’ll use that as a launching pad for exploring more about computability.

25 thoughts on “Computability”

  1. Andrei

    Yeah, that certainly is unusual. I first heard about him in my Formal Languages course, then I looked up more information and found out about his activism.

  2. Jesse

    given two statements, A(x) and Vy: A(y) -> B(y), you can infer B(y)

    I think you mean “you can infer B(x)”.

  3. Bruno

    I often wish you were my teacher.

    (The standard Chomsky disclaimer: Chomsky’s cred as a linguist is often disputed within the humanities—as should any theorist’s in the humanities be, anyway. Having graduated as a Hellenist, I did a lot of Linguistics at some point. If someone asked me for sober criticism of Chomsky, I’d say he is a brilliant fellow with a very sharp eye for linguistic phenomena, who just happens to be a tad all-in about his theories at times—at which point he will often mix logic with rhetoric in a fashion that’s neither scientific nor philosophical. Solid grounds in logic and methodology are thus a must for anyone who wishes to really learn from his literature.)

    Also, s/you can infer B(y)/you can infer B(x)/

  4. stigant

    >> So it will say “Yes” if the string is in the language; it will not say no if the string is not in the language.

    I think the second part of this sentence should be “it will not say ‘yes’ if the string is not in the language”. It will either terminate and say no, or not terminate.

  5. DSimon

    In the 6th paragraph, don’t you mean to say that the statements “A(x)” and “for all y: A(y) -> B(y)” allow you to infer B(x), not B(y)? I mean, y isn’t even defined outside the scope of the second statement.

  6. khms

    I should point out that your characterization of level 3 languages is slightly off: regular expressions are important all over in IT.

    Most languages these days seem to be defined with level 2 grammars. I’ve seen exactly one language defined by a level 1 grammar: Algol 68. It took me quite a while, but I finally managed to understand that grammar.

    In any case, levels 3 and 2 will be very familiar to every reasonable compiler writer. The canonical book on this, the so-called Dragon Book, spends pretty much all chapters on parsing on these grammars. (Different Wikipedia articles for different editions of the book!)

    1. Shannon Severance

      While “regular expressions are important all over IT” they may or may not match “Level 3: the regular languages” as described here. Perl added a number of extensions to pattern matching that push regular expressions beyond regular languages: http://en.wikipedia.org/wiki/Regular_expression#Patterns_for_non-regular_languages

      The Perl extensions have been picked up and replicated in a number of other regular expression engines. One of these days I’m going to find time to study the extensions, since my model of regular expressions and what they can and can not do is still stuck back in the CS theory taught in the early nineties.

  7. william e emba

    I think the ultimate judgment regarding Chomsky will be that he rates in linguistics somewhere about the same as Freud does in psychology: a raving crackpot with an indefatigable gift for argument who stirred the pot incredibly like no one before or since.

    And obviously, Chomsky barely rates as a logician.

  8. James Wetterau

    For someone not familiar with what it means for parens to “match”, your explanation might be a little confusing:

    “Another example, is considerably more complex computationally, is paren matching. It produces strings that have the same number of open and close parens – but not just N opens followed by N closes – any sequence of opens and closes, as long as it ends up matching.”

    You might want to explain that for parens to match, the open, or left, paren “(“, must come before a matching close, or right, paren “)”. Otherwise, a naive reading might simply be that there should be one left, or open, paren for every right, or close, paren found anywhere in the string. And clearly that is not what your rules do.

    To have such a simple paren pairing system, it suffices to add to your three productions the following, fourth one, I think:

    S -> ) S (

    Incidentally, can we use tex in our comments?

  9. Robert R

    I don’t have much experience in this sort of thing (just one undergraduate level course in the topic) but I am confused about your comment that CFL’s can’t really do anything that “deep.” Aren’t most programming languages described with a Context Free Grammar, and are, therefore CFLs? To me, this seems rather powerful. For example, the professor who taught this course said that, in his opinion, the development of compilers has been the greatest advancement in software, so far. So, have I misread or misunderstood something? I understand that CFLs aren’t as expressive as the Level 1 or 0 languages, but they still seem rather expressive.

    1. David W

      The *syntax* for many programming languages can be described using a context-free grammar.

      But CFLs fall way way way short of expressing the semantics of general computer programs. For instance, you can’t even write a CFL that only generates the strings of the form aaa…abbb…bccc…c where there are equal numbers of a’s, b’s and c’s, whereas you could very easily write a computer program that only generated such strings.

  10. AKposthuman

    “Computability is the most basic and fundamental sub-field of theoretical computer science. It’s the study of what a mechanical computing device can do. Not just what a specific mechanical computing device can do, but what can any mechanical computing device do? What are the limits of what you can do mechanically? And once we know the limits, what can we discover about the nature of computation?”

    I feel that there is something important missing from this discussion. Any computing device, mechanical or quantum-mechanical, is based on specific physics. I find it impossible to separate an abstract mechanical computing device from the physics it has to run on, because anything in this universe is a physical device, even our minds. So what can the physics of such a device tell us about the limitations of computation and its nature?

    1. David W

      In practice, the physics hasn’t mattered. That is, in practice, we observe that all sufficiently powerful machines/programming languages/automata are equally powerful. They are all Turing-complete.

      (http://en.wikipedia.org/wiki/Turing_completeness)

      Basically, what Turing-completeness means is that your machine (if given enough memory) could run any desired algorithm that you could code up in any other form, such as in your favorite programming language. Such as algorithms for multiplying numbers, computing primes, sorting lists of things, encrypting messages, simulating protein folds, solving equations, simulating 3d worlds of polygons and shapes (computer games), etc.

      Surprisingly, it doesn’t take much to be Turing-complete. All you need is access to some memory where you can freely record and erase data, and a way to make different decisions based on what you’ve read/written. That’s all. This means that it’s often hard to make a “computing machine” that *isn’t* Turing-complete, that is, if it’s at all more sophisticated than just a plain calculator.

      (There’s a caveat – technically, your memory needs to be unbounded. But basically, the only limit is memory – all our computers can easily do anything that a theoretical Turing machine with unbounded memory can do. It’s just that we need to give them enough memory, depending on the specific computation.)

      On the flip side, Turing-completeness seems to also be the limit. Right now, there’s not even the slightest hint in modern physics that more might be possible. All the stronger models of computation require things like being able to do infinitely many computations in a fixed amount of time, or computing numerical values to perfectly infinite precision (computing infinitely many decimal places all at once), or things like that.

      We do think quantum computers could be used to compute certain things much *faster* than with ordinary computers. So physics, particularly quantum mechanics, does matter a lot for the *speed* of computation, which is the subject of complexity theory. But computability theory doesn’t care about how fast it is, it just cares about what’s possible at all.

      So for computability, the physics of any particular machine is completely irrelevant, and will remain irrelevant unless anyone can find any reason at all to think that Turing completeness isn’t the limit.

      1. william e emba

        On the flip side, Turing-completeness seems to also be the limit. Right now, there’s not even the slightest hint in modern physics that more might be possible.

        You’re out of date by a few decades. Geroch and Hartle pointed out long ago that since quantum gravity almost certainly involves questions about 4-manifolds, and that there does not seem to be a computable way to enumerate 4-manifolds, quantum gravity is quite possibly not computable.

        This can, in principle, be turned on its head: accurate measurements that test quantum gravity in essence would be solving Turing uncomputable problems.

        The book by Pour-El and Richards went further and identified unsolvable phenomena within classical physics. In brief, unbounded operators do not preserve computability. Again, this suggests “supertasks” can occur, and generate Turing uncomputable results.

      2. AKposthuman

        David, thanks for your reply.

        “There’s a caveat – technically, your memory needs to be unbounded. But basically, the only limit is memory”…

        There is a two-fold way to think about it, macro and micro. First, on the macro side, you said it yourself, our real universe is basically limited – there has been no more than 14bn years after it started, and there is a number of theories which state that it will come to an end in the remote but finite timeline. So if you tell me that a solution (computation) of certain problems requires time and memory (resources) which are greater than the universe is likely to have, it is like not saying much at all, don’t you think?

        Then, on the micro side, any realistic type of memory and reading devices likely to be required for such problems – and basically any real problem even these days already (not so many computations one can realistically accomplish with a tape and ruler) – will be quantum mechanical, that is, operating with purely quantum mechanical structures. Quantum mechanics and everything which follows from it (properties of matter, chemistry, our best understanding of high-energy and low-energy physics, the many-worlds interpretation, quantum computers, etc etc etc) is our most fundamental and at the moment the only way to understand reality. The most simplistic starting example of a computing structure is a transistor and its follow-ups, further and further down the scale: nano / pico / femto computing. There you will have to face the fact that computation and (quantum) physics would be quite hard to tell apart.

        1. David W

          Yes, but that’s not the theory of computability. That’s complexity.

          Theory of Computability: “What could we compute, if we had unlimited resources, and what would still be impossible even with unlimited resources?”

          Theory of Complexity: “What can we compute with different types of (limited) resources? Such as different limits on time and space, or access to things like random bits or different types of circuits or gates.”

      3. william e emba

        On the flip side, Turing-completeness seems to also be the limit. Right now, there’s not even the slightest hint in modern physics that more might be possible.

        This statement is decades out of date.

        In the 1980s, Geroch and Hartle published an article which pointed out that Feynman diagrams for quantum gravity typically involve sums over all 4-manifolds, and that the class of 4-manifolds is known to be Turing non-computable. Implicit is the converse: the results of an appropriate experiment could provide information about a non-computable value, and hence violate Church’s thesis.

        Earlier, Pour-El and Richards had similar claims about noncomputability involving classical wave equations. The relevance of their work is easier to dispute (we have explicit models of what we mean, after all).

        (I thought I replied to this last night, but it seems not to have showed up. I don’t know if it was a bad machine on my end or if the two embedded links, here omitted, forced a hold-up. In both cases, Google on the two authors’ names–optionally with “computability”–will turn up pdfs of some of the relevant papers. These papers were talked about on Usenet way back when, so searching on Google groups will also turn up commentary.)

        1. David W

          I admit that I was overstating things. The references you provided are quite interesting – this sort of thing does make the line somewhat fuzzy. To be sure, if we had noncomputable constants in physics, it’s not obvious how one might exploit them for more general computation, in particular, how to measure them to arbitrary precision. It sort of re-raises the question of what you consider “unlimited resources” to mean when translating these asymptotic statements to the real world – measuring physical constants falls into one of those grey areas.

          Anyways, I generally don’t put too much stock in the more speculative realms of theoretical physics. At least in my impression, there’s such a plethora of different theories and variants of theories flying around, that until one or two emerge as the clear best candidates or obtain some more distinguishing evidence, I’m going to basically ignore them. You can speculate endlessly about various possible challenges to Church-Turing, but nothing has seriously established itself yet.

          1. william e emba

            It’s much more than just a question of measuring constants. The possibility implicit in Geroch-Hartle is that quantum gravity is ultimately Turing non-computable. For example, in the context of string theory, this might show up as the observed particle spectrum. This could be exploited as follows. The theory asks for a list of all 4-manifolds of a certain type, say, then compute a convergent sum going down the list, with known estimates, so you know when to stop. The problem is you must enumerate these 4-manifolds, and this is non-computable. So you enumerate computationally explicit constructions of 4-manifolds, and you simply don’t know how to tell when they are distinct (because of the unsolvability of the word problem). The existence or not of certain predicted particles will imply (because your computable estimates are sharp enough) that certain 4-manifolds must be or not be distinct.

            Symbolically, you know X=a+b+c+…, except you’re not sure what X is, and you’re not sure which terms are improper duplicates. Knowing X experimentally narrows things down–in ideal circumstances, you deduce two 4-manifolds are distinct.

            String theory remains, after 40 years or so, the only known way to unite GR with HEP. Much of the plethora is simply because we do not know how to break its symmetry. Or worse, we really don’t know how to do QFT. We fake it, and various phenomenological guesses are studied for clues.

            The only alternatives are probably non-commutative geometric, and they can only make the computability issue even hairier.

            On the other hand, there is so much mathematics running around in string theory that it may turn out Geroch-Hartle non-computability is a red herring. There could be theorems that establish far superior ways to compute things, and the role of the manifolds is to simply exist and put the whole structure together, while the surface aspects–better known as “reality”–are ultimately quite tame.

  11. ArtK

    I’ll be looking forward to the articles. “Theory of Computability” was one of my favorite classes as an undergrad.

  12. Doug Spoonwood

    Interesting post. Line 3 of the first derivation looks like it has a typo with the cash sign. I don’t mean the following as criticism, just more as something to think about. Mark wrote:

    “The study of computability has its roots in mathematical logic. The earliest study of computability was done by logicians that were looking at proofs and provability. The connection there is one of the things that fascinates me: from the outside, why on earth would things like the Principia Mathematica, which attempted to completely formalize mathematics from first principles, be connected to the theory of how the computer that I’m typing this article on? Once you understand what’s going on, it becomes almost obvious!”

    I do see a connection here, but how deep does it go? At least, in any propositional calculus, one need not have any inference rules whatsoever. One can just check all possible combinations of values for the variables of a formula to see if it holds true for the logic at hand (truth tables constitute one such method). If all combinations of values for the variables always yield some value, the formula always has that value. So if all combinations of values for the variables come out true, the formula is true. Thus it can always get proven from a complete set of axioms. You don’t need to know what such a proof looks like, and the proof could turn out to be much, much more complicated. Does there exist a similar way to check and see if a formula exists in a computational system without knowing the “replacement rules”?

    Also, if one has a conditional which always holds true, then one can take all the premises of that conditional and infer an inference rule in whatever system of propositional logic one wants to use. And every inference rule in a propositional logic has two corresponding conditionals (or analogues of the material conditional if, say, we only have the Sheffer stroke): one with the conjunction of all premises as the antecedent and the conclusion as the consequent, and the other with nested conditionals, each premise as the antecedent of a respective conditional. In Polish notation, let K…K(n-1)a…n stand for a conjunction of n-1 conjunctions with n variables. For n premises the above means that any inference rule
    “Premise a… Premise n… Conclusion z” corresponds to a formula CK…K_(n-1) a…nz in propositional calculus and another formula CaCb…Cnz, where the “…” here means we have a sequence of “Cx’s” where x stands for any premise. Does anything like this hold for computational systems in general?

