Backus's Idea of Functional Programming

In my earlier post about John Backus, I promised to write something about his later work on
functional programming languages. While I was in a doctor’s office getting treated for an awful
cough, I re-read his 1977 Turing Award Lecture. Even 30 years later, it remains a fascinating read,
and far from being dated, it’s positively astonishing to see both how far-sighted Backus was and how little progress we’ve actually made.

Backus started from a pretty solid observation. Almost all of the common programming
languages that we see – ranging from Backus’s own 1950s version of Fortran, to the most widely used modern languages (Java, C, C++), to the various languages that have been proposed as alternatives to those (Modula-3, Oberon, Ada), even to most of the supposedly functional programming languages (Lisp, ML) – are ultimately based quite strongly on the von Neumann model of computing.

The von Neumann model of the computer consists of two parts: a processing unit and a state, connected by a tube. The processing unit can do all sorts of mathematical operations on little pieces of data, but the data lives in the state at the other end of the tube. Computation happens by having the processor pull little pieces of data from the state through the tube, do something to them, and then push the results back through the tube, updating the state. If you know how modern computer hardware works, that should sound pretty familiar: our computers are basically von Neumann machines. There’s a CPU and a bunch of memory, connected by a data channel called a bus. Computation works by fetching values from memory into the CPU, doing some arithmetic on them, and storing the results back into memory. In terms of programming, von Neumann computing is basically programming in a system built around assignment statements.

According to Backus (and many other people), the problem with assignment statements is
that they divide the programming language into two distinct worlds: the world of expressions – functions and algebra, the stuff that lives on the right-hand side of an assignment statement; and the world of statements – assignments, and everything else.

That division stinks for a lot of reasons. Just for a start, it can rob a system of a lot of its clarity; make code far more complex and hard to read; make code far harder to reuse; and make it much harder to build generic code for gluing together computations.

Backus correctly pointed out that the basic division of code into two universes – statements and expressions – was a source of incredible complexity in programming languages. What’s sad to realize is that Backus talked – with awe – about the idea of programming languages whose complete specification reached 500 pages. And today, we have programming language standards documents that exceed 2,000 pages.

The alternative Backus proposed was to adopt a kind of function-oriented programming. In a Backus-style functional language, there are no variables. None. In fact, in
Backus-style functional code, there is no way to even name a piece of data. Everything is built from a set of primitive functions, which are put together using a variety of extremely flexible combinators.

Let’s take a look at a bit of Backus-style FP code, to get a sense of what that means.

Suppose we want to compute the inner product (otherwise known as the scalar product) of two vectors. Here’s how we’d write that in Python in classical imperative (von-Neumann) style (and without error checks to ensure that the two vectors have matching lengths):

def innerProduct(a, b):
    sum = 0
    for i in range(len(a)):
        sum += a[i] * b[i]
    return sum

The thing that bugged Backus about that code is that the real key to what’s going on in an inner product is obscured. The looping, the figuring out of the lengths of the arrays, the reduction of
the whole computation to a sequentially iterated assignment statement – all of that scaffolding hides the fundamentals of what’s going on.

In Backus’s FP, inner product would be written:

Def InnerProduct ≡ (/+)º(α×)ºTranspose

The Backus FP implementation is built from the following building blocks:

  1. A function that transposes the two lists, transforming a pair of lists into a list of pairs; so [[1,2,3],[6,5,4]] would become [[1,6],[2,5],[3,4]].
  2. A function “+” that adds two integers.
  3. A function “×” that multiplies two integers.
  4. A combinator “α”, which applies a function to every member of a list. So α×([[1,6],[2,5],[3,4]]) is equivalent to [1×6, 2×5, 3×4].
  5. A combinator “/”, which inserts a binary function between all of the elements of a list. So “/+([1,2,3,4,5])” is equivalent to “(1+(2+(3+(4+5))))”.
  6. A binary combinator º that composes the functions on its left and right. So fºg(x) = f(g(x)).

What it actually means is: the inner product is the result of adding up the products of the corresponding pairs of elements of two vectors. If you look carefully at the FP code, that’s exactly what it says: “Transpose” is a function that returns the list of corresponding pairs; that’s composed with “α×”, which applies × to each pair; so “(α×)ºTranspose” gives a list of the pairwise products of the elements of the two lists; and “/+” computes the sum of a list of values, so composing that in gives you the sum of the products of the corresponding pairs of the two vectors.
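To make those building blocks a bit more concrete, here’s a rough Python sketch of them. This is my own translation, not Backus’s notation: the names compose, alpha, insert_op, and transpose are mine, and α is rendered as a map over a list, / as a right-grouping fold, º as function composition, and Transpose as zip.

from operator import add, mul

def compose(f, g):
    # The º combinator: (f º g)(x) = f(g(x)).
    return lambda x: f(g(x))

def alpha(f):
    # The α combinator: apply f to every element of a list.
    return lambda xs: [f(x) for x in xs]

def insert_op(f):
    # The / combinator: insert a binary function between the elements of a
    # list, grouping to the right: insert_op(add)([1, 2, 3]) = 1 + (2 + 3).
    def folded(xs):
        result = xs[-1]
        for x in reversed(xs[:-1]):
            result = f(x, result)
        return result
    return folded

def transpose(lists):
    # Turn a pair of lists into a list of pairs:
    # [[1, 2, 3], [6, 5, 4]] -> [(1, 6), (2, 5), (3, 4)].
    return list(zip(*lists))

# (/+) º (α×) º Transpose
inner_product = compose(insert_op(add),
                        compose(alpha(lambda p: mul(*p)), transpose))

print(inner_product([[1, 2, 3], [6, 5, 4]]))  # 1×6 + 2×5 + 3×4 = 28

The point of the sketch is just that each piece of the FP definition maps onto one small, reusable combinator, and the whole program is a single composition chain with no variables anywhere.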

The problem with FP is that it’s just totally illegible in textual form. People stink at thinking about these kinds of function composition. For example, here’s matrix multiply – which is really a pretty simple piece of code – written in FP:

Def MatrixMultiply ≡ (ααInnerProduct)º(αdistl)ºdistrº[1,Transposeº2]

That’s mostly built from elements that I’ve already explained. The pieces that are new (there’s a rough Python sketch of them after this list) are:

  1. “distl” is an operation that distributes its left argument across the list on its right, turning a pair like [x, [y1,…,yn]] into [[x,y1],…,[x,yn]]; “distr” does the same with its right argument and the list on its left.
  2. “1” and “2” are primitive functions that select the 1st and 2nd elements from a list.
  3. “[]” is a combinator which turns two separate functions into a single function that applies both of them to the same argument and returns a list of the two results.
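Here’s the promised Python sketch of those new pieces, and of the matrix multiply built out of them. Again, this is my own rendering rather than Backus’s: distl, distr, construct, sel1, and sel2 are names I’ve made up, and a couple of helpers from the earlier sketch are repeated so this block stands on its own.

def distl(pair):
    # distl: (x, [y1,...,yn]) -> [(x, y1), ..., (x, yn)]
    x, ys = pair
    return [(x, y) for y in ys]

def distr(pair):
    # distr: ([x1,...,xn], y) -> [(x1, y), ..., (xn, y)]
    xs, y = pair
    return [(x, y) for x in xs]

def construct(f, g):
    # The [f, g] combinator: apply both functions to the same argument.
    return lambda x: (f(x), g(x))

def sel1(pair):
    # The primitive selector "1".
    return pair[0]

def sel2(pair):
    # The primitive selector "2".
    return pair[1]

def compose(*fs):
    # The º combinator, generalized to a whole chain, applied right to left.
    def composed(x):
        for f in reversed(fs):
            x = f(x)
        return x
    return composed

def alpha(f):
    # The α combinator: apply f to every element of a list.
    return lambda xs: [f(x) for x in xs]

def transpose(m):
    return list(zip(*m))

def inner_product(pair):
    # The (/+)º(α×)ºTranspose example from above, collapsed into one function.
    return sum(x * y for x, y in zip(*pair))

# (ααInnerProduct) º (αdistl) º distr º [1, Transpose º 2]
matrix_multiply = compose(
    alpha(alpha(inner_product)),                # inner product of every (row, column) pair
    alpha(distl),                               # pair each row of A with each column of B
    distr,                                      # pair every row of A with all of B-transposed
    construct(sel1, compose(transpose, sel2)),  # [1, Transpose º 2]: build (A, B-transposed)
)

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matrix_multiply((a, b)))  # [[19, 22], [43, 50]]

Even with descriptive names, you still have to mentally trace what shape of data flows through each stage of the chain – which is exactly the readability problem I’m complaining about.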

So now you’ve seen all the building blocks that make it up. And it’s a short program. Can you
figure out how it works just by looking at it? It took me a while to work my way through it and figure
out how it works. It’s not easy. It took me far longer to understand it than
it would take me to figure out a matrix multiply in, say, Fortran.

As you can probably see, I’m very skeptical of whether this kind of ultra-pure functional programming is really a good idea. The only thing that makes me think there might be something to this style is looking at the people who program in APL and its successor, J. A huge amount of programming in APL and J is done in a style very much like this – and while there’s a hard learning curve, people who program in APL and/or J are incredibly productive.

Anyway – from the primitive FP, Backus goes on to propose something called an applicative state transition system, or AST. An AST is a system which defines a state consisting of
a set of bindings of names to values. A computation is written in an AST as the application of
a function to the state, and the result of the application replaces the old state. So a computation becomes a sequence of state-to-state function applications.
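To make that concrete, here’s a minimal Python sketch of the idea – my own illustration of the shape of an applicative state transition system, not Backus’s actual formalism. The state is just a dictionary of name-to-value bindings, and every step of the program is a pure function from state to state.

def increment(state):
    # A pure state-to-state function: build a new state rather than
    # mutating the old one.
    new_state = dict(state)
    new_state["counter"] = state["counter"] + 1
    return new_state

def record(state):
    new_state = dict(state)
    new_state["log"] = state["log"] + [state["counter"]]
    return new_state

def run(state, *steps):
    # A computation is just a sequence of state-to-state applications.
    for step in steps:
        state = step(state)
    return state

initial_state = {"counter": 0, "log": []}
final_state = run(initial_state, increment, record, increment, record)
print(final_state)  # {'counter': 2, 'log': [1, 2]}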

If you’ve been reading my Haskell tutorial, that ought to sound extremely familiar. An AST is essentially the same thing as a State monad; an AST with IO is a combined monad – what you get by stacking a State monad transformer on top of the IO monad. 30 years ago, Backus basically told us what the solution to the problems of IO and state in a functional language would be! And 20 years after he told us the solution, someone went to a lot of trouble to independently reinvent what Backus had told us.

If you’re interested in trying it out, there’s an implementation of Backus’s FP here.

Comments on “Backus's Idea of Functional Programming”

  1. Tulle

    mmmmmmmm APL, I love APL. My APL professor would give out an automatic A to anyone who could do any one programming assignment in fewer characters than his solution. I did the “find matching parens” in 22 characters, his solution was 24. I got an A!!!!!

  2. David Swift

    One could write innerProduct() in a more functional style in Python thusly:
    def innerProduct(a, b):
        return reduce(
            lambda x, y: x+y,
            map(
                lambda (x, y): x*y,
                zip(a, b)))
    Where map is equivalent to the combinator α and reduce is the combinator /.
    It seems that people generally don’t write code in this style, which is a shame. It’s only unwieldy in this case because we have to define a function form of addition and multiplication (and I could have used sum instead of the reduce).

  3. Mark C. Chu-Carroll

    David:
    I know that you can write a much cleaner, more functional version of innerProduct in Python. But I wasn’t trying to illustrate Python programming style; I was trying to illustrate the kind of program that Backus was criticizing. What Backus was complaining about was the assignment-centric nature of most imperative code, and the amount of control-flow scaffolding that’s necessary to translate a mathematical procedure into word-by-word assignments. Functional techniques like the use of lambda, map, and zip are things that were added to Python to try to solve the problems that Backus was complaining about – using solutions to Backus’s complaints to illustrate Backus’s complaints wouldn’t make sense.

  4. David Swift

    I got so excited about my little snippet of code I posted it without thinking about what I was trying to say… 🙂
    My point was that even in languages which DO provide solutions to Backus’ complaints, most programmers tend to fall back upon imperative coding style like the example in your essay. I wonder why that is?

  5. Steve Massey

    The programming language J which Mark links to above has freely available implementations for the holy trinity of Linux, Windows and Mac, which is perhaps an easier way of getting the flavour of Backus-style functional programming than the half complete FP implementation under ocaml which Mark links to.

  6. D. Eppstein

    David Swift’s solution can be made a little clearer if one removes the lambdas:
    def innerProduct(a,b):
        return reduce(operator.add, map(operator.mul, zip(a,b)))
    Here zip is essentially the same as Backus’ transpose.
    I think, though, that writing in not quite as dogmatic a functional style makes it a lot more readable:
    def innerProduct(a,b):
        return sum((x*y for x,y in zip(a,b)))

  7. Mark C. Chu-Carroll

    I agree that the last version – “sum((x*y for x,y in zip(a,b)))” – is much easier to read than the Backus-style functional version. It’s a good example of where I think Backus was a little bit off – getting rid of all variables for values is a mistake. Sometimes being able to name something inside of a complex expression can make a huge difference in how readable it is.
    To switch streams a bit, I think that’s one of the things Haskell got right. In Haskell, you can write your code FP-style – no named parameters, everything implicit through primitive functions and combinators. And sometimes, that’s great. But sometimes, it sucks. And that’s why things like named parameters, “let” expressions and “where” clauses come into play – and I don’t really think anyone would seriously argue that Haskell would be a better language if you removed “let” and “where”, much less named parameters.

  8. Andy

    D. Eppstein:
    You don’t need the extra pair of parens when using a generator expression in a context like that, so it can be:

    def innerProduct(a,b):
        return sum(x*y for x,y in zip(a,b))

  9. Blake Stacey, OM

    Does it reflect well on me as a human being if my first attempt at an inner product function in Python would have been Andy’s two-liner? :-/
    MarkCC said:

    It’s a good example of where I think Backus was a little bit off – getting rid of all variables for values is a mistake. Sometimes being able to name something inside of a complex expression can make a huge difference in how readable it is.

    Dogmatism considered harmful!

  10. nikita

    It always struck me that Backus, and other people critiquing imperative programming, are attacking the wrong target. All the imagined difficulty with having statements vs. expressions is due to sticking with operational reasoning about programs, where one tries to describe a program by following all possible paths of execution and all possible changes of global state. But this whole approach is long known not to work. The behavior of an imperative program is much better captured by an axiomatic description based on, e.g., predicate transformers. For this description, assignment poses no difficulty whatsoever. It’s interesting that Backus conveniently described only a very limited result about recursion (in 12.5): proving things about recursion in FP is much harder than proving things about iteration in an imperative language.
    Also, see Dijkstra’s “A review of the 1977 Turing Award Lecture by John Backus”.

  11. Alex

    I’m just learning Python, and having read Backus’s paper yesterday, I’m struck by how much he must have influenced Van Rossum.

  12. Kevin Greer

    Those interested in trying out FP, but who would like a more accessible version (one which doesn’t use all of the funny characters and look like line noise) might consider having a look at Illinois FP, a variant of Backus’ FP.

  13. WaterBreath

    > Sometimes being able to name something inside of a complex expression can make a huge difference in how readable it is.
    SQL suffers from this flaw. I can’t count the number of times I’ve wished for a bit more robust way to reference an operation I already wrote out earlier in the query, rather than having to completely restate it. Sub-queries and WITH clauses can mitigate this a bit, but this feels like a hack when you’re not actually using it to query a table.

  14. Fj

    Ain’t it a bit erroneous to assume that any programming task boils down to pure math, given a robust enough language? I mean, doing dot products, cross products, permutations, hashtable lookups – all that stuff does not exhaust common programming tasks, and actually there are fields like system administration, where the vast majority of the tasks are something like “execute that external function using these external functions’ returns as arguments, filtered by some criteria”, which has nothing to do with math at all.
    “General-purpose programming” is a mix of mathematically describable tasks and those imperative “do this, then do that” tasks, hence general-purpose programming languages like Python or C#, with no solid mathematical foundation, but allowing you to use lambdas and such where appropriate, while in “pure” languages like Haskell, for example, IO looks like a dirty hack (it is consistent etc., sure, but kinda too sophisticated for such a trivial mission).

  15. Michael Pohoreski

    Nice article.
    Backus was trying to solve the wrong problem by getting rid of variables, though, IMHO.
    Variables are aliases / shorthand. Getting rid of them doesn’t make programming simpler, it makes it WORSE, because code becomes WAY more verbose.
    It’s the same reason we have functions, AND operators. They are more than just fancy “Syntax Sugar” – it is so we don’t have to use numbers, because that is ALL a machine is doing. Moving numbers around, or calculating with them. When you get right down to it, the ONLY thing a computer does is one of three things: load, store, calculate (boolean logic, masking, etc.)
    A good programming language is orthogonal with respect to operators and complexity. Sure, more operators let you abstract the patterns better, but then there is more to learn. We use names/functions because they are easier to remember, and less likely to run out of obscure symbols.
    We just happen to assign names to the common patterns.
    Naming things is not the problem – providing a logically consistent and expressive language is, while minimizing complexity and maximizing “readability”, all with some sort of balance. Sure, you get some interesting ideas when you take something to a “logical extreme”, but then you also lose out on alternative paradigms / abstractions.

  16. Fritz Ruehr

    I think it’s worth pointing out that Backus’ notation and ideas borrowed a lot from APL, a fact I think he acknowledged in his Turing award lecture. (I grew up using APL in my mid-teens and still have a soft spot for it, although I haven’t used it in close to 30 years.) I believe the inner product mentioned here would be just +.* in APL (modulo my use of asterisk for multiplication), although I might be off by a matrix transpose.
    Not that shorter is necessarily better 🙂 , but Iverson recognized that this particular combination might be used a lot and blessed it with a particularly convenient form (albeit still generic wrt the two operators). When I interned at IBM in their Watson Research Center, the APL group was proud of the fact that their APL system out-performed the “FORTRAN optimizing H compiler” on boolean vector/matrix manipulation for just these sorts of reasons: it could see the structure of the algorithm the programmer intended directly in the code, without needing to infer it through layers of loops, etc.
    Backus, Iverson, Dijkstra, … . I wonder if anyone out there has (or had) taken advantage of the access we still had to the founders to do some sort of oral/video history?
    *sigh* Now I will have to go dig out Trenchard More’s axioms for Array Theory out of sheer nostalgia … .

  17. sean

    FWIW, there’s another plain-text version of FP in Perl, http://search.cpan.org/dist/Language-FP-0.03/ .
    My experience with FP was that programming quickly degenerated into using dist*, const, id, and numbers to route anonymous values around the computation to where you need them. It’s not too horrible until you have to use a value in more than one place. Quick, someone! Tell me what “totient” does:

    def totient = /+ . @((== . [1, `1] -> `1 ; `0) .
    (while (> . [2, `0]) ( reverse ; id) . [2, -]))
    . distl . [id, iota]

  18. John Perry

    I’ve never worked with functional programming before. Looking at that first line of FP you gave for the cross product, I rolled my eyes because it was more or less illegible. The explanation helped clarify things, but the functional Python posted in another comment was much more legible, if only because the author didn’t use cryptic symbols. It’s like the difference between Pascal and C, and that joke that goes around about C being a hoax: We stopped [developing the C language] when we got a clean compile on the following syntax:
    for(;P("\n"),R--;P("|"))for(e=C;e--;P("_"+(*u++/8)%2))P("| "+(*u/4)%2);

    And APL is infamous for this.
    Mathematics is no stranger to cryptic mnemonics, but we use composition of functions a great deal, and can simplify extremely difficult problems by taking advantage of that structure. In such a case, the problem becomes simpler, not harder. Consider for example the Chain Rule for taking derivatives in Calculus.
    So I wonder, when you write:
    The problem with FP is that it’s just totally illegible in textual form. People stink at thinking about these kinds of function composition.
    Is the problem really that people stink at thinking about these kinds of function composition, or that advocates of functional programming (Backus, APL, etc.) have a predilection for cryptic symbols rather than keywords?

  19. Mark C. Chu-Carroll

    John:
    I think that people stink at thinking about complex computations in terms of pure function composition chains.
    People suffer from the fundamental limit that we can only keep track of a certain number of things simultaneously. The Backus style of FP requires that you be able to keep track of all of the values in the composition chain as you write and (worse) read it. Without being able to put a name to something, you need to always carry everything in your head – and that’s extremely difficult for most people even for small pieces of code; I question whether a really complicated piece of code in that ultra-pure variable-free syntax would ever be readable to anyone but its author.
    APL and J are known for being similar in many ways to the style of programming advocated by Backus. I don’t know APL (never had access to an APL system to learn); but in J, you do have a way of naming intermediate values, and people seem to use them frequently.
    You can also do the Backus style of program in Haskell – they call it “points-free” style. There are translators that will help you generate points-free code from regular Haskell code. Even so, I don’t recall seeing much points-free code more complex than the Python example in the comments here.
    Naming things is a useful technique for working around a fundamental human limit – and I think that’s a big strike against Backus’s FP.

  20. Samuel A. Falvo II

    What amazes me is that reverse-polish notation is _precisely_ function composition. Let’s assume, for the sake of argument, that Forth supported the data types required of FP. Then, we can rewrite:
    Def InnerProduct ≡ (/+)º(α×)ºTranspose
    into:
    : InnerProduct transpose α× /+ ;
    Of course, we would need to define α× and /+ as separate functions, since Forth identifies a function exclusively by its name delimited by whitespace (like Lisp, no non-space characters are considered special). But that’s a small price to pay for significantly simplifying the implementation of the language.
    : α× ' * map ;
    : /+ 0 ' + foldr ;
    The programming language Joy was created to explore pure functional programming in the context of reverse polish notation.
    Note also how RPN syntax tends to be significantly easier to read — like people raised with a European language, it reads left-to-right. So (f.g) x is merely “g f”.
    I have to wonder if the reason most folks are so bad at math is simply due to the order in which expressions are written. In my experience, those who have the largest amount of trouble reading and understanding RPN are those who have the greatest amount of practice with infix notation math (this includes software developers familiar with Fortran- or Algol-derived languages). Those of the “working man” variety tend to get RPN almost out of the box. Ever wonder why business calculators are inherently RPN (enter number to be added, then hit + or – to get the result)?

  21. Xanthir, FCD

    Hmm… Just for fun, here are two easy approaches to this in Lisp, one functional, one iterative:
    (defun inner-product (a b)
      (apply '+ (mapcar '* a b)))

    (defun inner-product (a b)
      (loop for x in a
            for y in b
            sum (* x y)))
    Both of them are very clear and easy to use. The latter is especially easy just because of the Extended Loop macro which defines a sub-language inside of lisp. It’s a bit more difficult if you use more primitive iteration constructs like do or dotimes – it then starts looking pretty much identical to the iterative implementation given in the beginning of Mark’s post.
    The first, though, is *extremely* easy to write and understand. Mapcar takes a function and feeds it successive elements of the supplied lists, building a new list with the results. Apply then simply applies a function to the contents of a list. So you build a list of the products of pairs, then add them all together.
    Of course, a big part of the whole appeal here is the polish notation and n-ary functions. ^_^

