Why so many languages? Programming languages, Computation, and Math.

Back at my old digs last week, I put up a post about programming languages and types. It produced an interesting discussion that drifted off-topic a bit, leading to a very good question from one of the posters. Since the answer really does involve math, I’m going to pop it up to the front page here.

In the discussion, I argued that programmers should know and use many different programming languages, and that this isn’t just a statement about today’s programming languages: I think there will always be good reasons for having and using more than one programming language.

One of the posters was quite surprised by this, and wanted to know why I didn’t think it was possible, at least in principle, to design a single ideal programming language that would be good for any program one needed to write.

It’s a good question, and I think it connects very naturally to the underlying mathematics of computation. As I discussed back on the old blog, one of the most fundamental facts concerning computation is the Church-Turing thesis: there is a fundamental limit to what mechanical computation can do, and while there are many different ways of performing mechanical computation, they all end up with the same fundamental capabilities. Any computing system that reaches that limit is called an effective computing system (ECS). Anything that one ECS can do, any other ECS can do too. That doesn’t mean that they’re all identical: a given computation can be really easy to understand when described in terms of one ECS, and horribly difficult in another. For example, sorting a list of numbers is pretty easy to understand if it’s written in lambda calculus; it’s a nightmare written for a Minsky machine; and it’s darned close to impossible for a human to understand written out as a cell-grid for Conway’s Life.
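
To make that last contrast concrete, here’s a quick sketch of my own (not part of the original argument): a sort written in Haskell, a language whose core is essentially the lambda calculus. The whole algorithm is a couple of short equations over lists:

-- Insertion sort in a lambda-calculus-based language: a few short
-- equations, nothing but function application and lists.
insert :: Ord a => a -> [a] -> [a]
insert x [] = [x]
insert x (y:ys)
  | x <= y    = x : y : ys
  | otherwise = y : insert x ys

isort :: Ord a => [a] -> [a]
isort = foldr insert []

-- isort [3,1,2] evaluates to [1,2,3]

Try expressing the same thing as register increments and decrements on a Minsky machine, and the algorithm disappears into bookkeeping.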

How does this connect back to programming languages? Well, what is a programming language really? From a theoretical point of view, it’s a language for specifying a computing machine. Each program is a description of a specific computing machine. The language is a particular way of expressing the computing machine that you want to build. Underlying each programming language is an effective computing system: a fundamental model of how to perform computations. That’s the real fundamental difference between programming languages: what fundamental way of representing computation underlies the language.

Assuming that you’re looking at a good programming language, it’s good for a particular task when that task is easy to describe in terms of the fundamental ECS underlying the language; it’s bad for a task when that task is not easy to describe in terms of the language’s underlying ECS.

If a language has a single ECS underlying it, then there are tasks that it’s good for, and tasks that it’s not so good for. If you try to smoosh multiple ECSs together under the covers of one language, then you don’t really have one language: when you’re writing for a von Neumann machine, you’re really using one language, and when you’re writing for Actors, you’re using another. You’ve just forced two languages together into the same framework – but they’re still two languages, and you’ve probably compromised the quality of expression of those two languages by forcing them together.

26 thoughts on “Why so many languages? Programming languages, Computation, and Math.”

  1. Bob Munck

    I’m sorry, but I don’t see that you answered the question; you just passed it on to another level of abstraction. The original question:

    Why isn’t it possible, at least in principle, to design a single ideal programming language that would be good for any program that one needed to write?

    becomes:

    Why isn’t it possible, at least in principle, to design a single ideal ECS that would be good for any program that one needed to write?

    I don’t see that that changes the question at all. It just restates it in less understandable terms.
    The whole thing about “a programming language is a language for specifying a computing machine” seems to be rather beside the point. I would certainly change it to “… specifying part of a computing machine” but eliminating the whole discussion doesn’t seem to alter your argument. Maybe there’s something I’m not seeing about the relationship between an ECS and a computing machine running on it. I hope it doesn’t have to do with “efficiency” or “execution speed.”
    I suspect that you need to say something about congruence or goodness of fit between the statement of the requirements for a computing machine and the statement of the specification or implementation of it. That also just changes the question from “… single ideal programming language …” to “… single ideal requirements specification language …”.

  2. Mark C. Chu-Carroll

    Bob:
    Actually, the first line of your comment is the real answer to your question.
    You can write any computation in any ECS – that’s the Church-Turing thesis. But for some problems, the expression of the computation in one ECS is more understandable than in another. Rewriting it in a different ECS is just restating it in less understandable terms.
    The point is that different computing systems are fundamentally different. They express computations in very different ways. That’s the central deep issue that justifies the different kinds of programming languages: the fundamental way that they describe computation is completely different.
    As for why you can’t just design an ECS that does everything: well, to take just two basic models, what does it *mean* to have an ECS that includes both sequent calculus and actors as fundamentals?
    It means that you’ve mushed together two different things in a strange way that loses the fundamental properties of both of them. The point of a sequent-calculus approach to computation is that you describe computations not as a procedure to do something, but as a specification of the solution as a logical constraint over a search space.
    The point of actors is that they’re intrinsically concurrent and completely asynchronous. The only thing that exists in the entire universe of computation is message-processors; the only real operation is passing messages. As a result, you get a beautiful model for doing certain things.
    What happens when you mush them together? Well, you no longer have that great property that a program is a constraint on search space – because you need to be able to send messages to take advantage of the actors. And you don’t have the property of a message passing system – because the sequent constraints don’t express naturally in terms of message passing, so you don’t have any of the great properties of actors.
    When you try to force these very different concepts together, what you do is blur away the properties that make them useful. Put transparent stateful operations into Haskell, and you lose the referential transparency property that makes Haskell especially useful. Put modifiable atoms into Prolog, and the declarative search concept falls apart. Put synchronous execution into actors, and the fundamental concurrency properties that you were getting from using actors are gone.
    Unless you partition the language, so that you’re using “pure” sequent in one section of code, and “pure” functional in another, and “pure” actors in a third; and then what you’ve really done is build a collection of languages with a good linker.
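
    To make that concrete, here’s a tiny Haskell sketch (mine, purely illustrative, using Control.Monad.State from the standard mtl library). State isn’t mixed invisibly into the language; it’s quarantined in the types, which is exactly the sort of partitioning I mean.

    import Control.Monad.State

    -- A stateful counter: the statefulness is visible in the type, so
    -- nothing outside this computation can be silently mutated.
    tick :: State Int Int
    tick = do
      n <- get
      put (n + 1)
      return n

    -- This signature *guarantees* purity: no hidden state, no I/O.
    double :: Int -> Int
    double x = 2 * x

    -- runState tick 0 evaluates to (0, 1)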

  3. Ithika

    I agree with you, Mark. Programming languages are a notation for communicating with people, and for reasoning about the behaviour of the programs we write.
    It has a very close parallel with mathematics, in this way. There are many different techniques and notations (differential calculus, linear algebra, etc.) which are equivalent in some ways, and can be used to address the same questions in different ways. Category theory, for example, seems to be all about drawing these parallels explicitly, and creating ways in which a problem in one domain is equivalent to one in the other. (Correct me if I’m wrong here!) But I don’t think anyone would seriously consider there should be a single mathematical technique to deprecate all others.
    People create programming languages as tools to express a problem neatly, in order that they can wrap their heads around it.

  4. SteveG

    How fascinating. During the period between the World Wars, a group of physicists and philosophers in Vienna were asking the same question about philosophy and whether it would be possible to derive a perfect artificial language in which to couch all philosophical questions such that empirical and logical elements would be unambiguously delimited and decision procedures for deductive and inductive inferences could be cleanly defined. Indeed, they were colleagues and teachers of Church, Turing, von Neumann and the lot of forefathers of computation theory.
    Your discussion of the relation between equi-computability and identity of language harkens directly back to some of the questions raised by that effort (your line seems to be a version of what the logician W. V. O. Quine argued in his famous paper “Two Dogmas of Empiricism” against Carnap’s hopes for a perfect language). Since these questions have at least a common ancestor, I’m wondering if lessons from the fall of logical positivism would have implications for your question. I’d have to think about it to say anything more, but I wonder if someone else has worked along this line.

  5. loren

    SteveG: “During the period between the World Wars, a group of physicists and philosophers in Vienna were asking the same question about philosophy and whether it would be possible to derive a perfect artificial language in which to couch all philosophical questions such that empirical and logical elements would be unambiguously delimited and decision procedures for deductive and inductive inferences could be cleanly defined.”
    I seem to remember the ambition (conceit?) going back considerably further than the Vienna Circle: Leibniz yearned for a truly universal calculus to resolve pretty much every possible argument, moral and empirical (“let us calculate”). Neal Stephenson makes a fairly big deal about this in the Baroque Cycle (and let me just say: that man needs a way more aggressive editor …)

  6. loren

    Mark:
    “A given computation can be really easy to understand when described in terms of one ECS, and horribly difficult in another.”
    “That’s the real fundamental difference between programming languages: what fundamental way of representing computation underlies the language.”
    “but they’re still two languages, and you’ve probably compromised the quality of expression of those two languages by forcing them together.”
    SteveG: “Since these questions have at least a common ancestor, I’m wondering if lessons from the fall of logical positivism would have implications for your question.”
    The paper I kept thinking of was Donald Davidson’s “Very Idea of a Conceptual Scheme” and whether or not translatability really captures what’s most interesting about languages, formal or natural (I have no idea: translatability is obviously a deep and critical issue, but the question seems worth asking — so too the question of whether Davidson pulled off in that paper a sort of philosophical natural-language equivalent to the Church-Turing thesis … that’s muddled and a bit silly, but I’ll leave it regardless).
    Based on several of Mark’s word choices (which I’ve quoted above) I kept thinking that there’s a (probably pretty subtle) aesthetic standard being invoked here, lurking behind the intuitive pragmatic standard of whether or not a description is clear (and useful) for a particular task.
    Sorry for all the parenthetical remarks, but we all know, deep down, that Lisp is the most beautiful and elegant of all formal languages that ever will have existed …

  7. Don Geddis

    Mark has written a nice summary of our discussion in the comments on the original post. I agree with Mark that when you program in a given style, you must be in that style completely in order to get the benefits. Programming styles might include: functional, object-oriented, imperative, static typing, logical inference, etc.
    Even if your programming language permits assignments, for example, if you’re solving a problem in a functional style then you shouldn’t use that part of the language. Part of the power of using some style is to be able to draw rapid conclusions about different parts of code. If you assume that you’re using a purely functional style, then part of the benefit is supposed to be that you can assume any given piece of code doesn’t change global state, can be re-entered in multiprocessing, etc. — without looking at the source code itself! You lose this power if you program in a 90% functional style, but with random exceptions.
    The same is true for other programming styles. It doesn’t really make much sense to mix styles in a single coherent piece of code.
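
    As a rough sketch of what that buys you (my example, in Haskell, where the style is enforced by types), you can draw those conclusions from a signature alone, without reading the body:

    -- Pure by its type: it cannot touch global state or do I/O, so
    -- it is trivially safe to call from concurrent code.
    sumSquares :: [Int] -> Int
    sumSquares xs = sum [x * x | x <- xs]

    -- Any effectful escape hatch is flagged in the type, telling the
    -- reader exactly which code needs closer inspection.
    logTotal :: [Int] -> IO ()
    logTotal xs = print (sumSquares xs)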
    The one thing I’ll say is that I think Mark himself reveals the potential answer, although perhaps without realizing it. Mark wrote: “Unless you partition the language, so that you’re using ‘pure’ sequent in one section of code, and ‘pure’ functional in another, and ‘pure’ actors in a third; and then what you’ve really done is build a collection of languages with a good linker.”
    A multi-paradigm language (like Common Lisp that I mentioned originally) in some sense tries to be a bit like this. Just let me conclude by saying that I think Mark underestimates the benefit of “a good linker” in his suggestion above.
    But perhaps that’s just a matter of emphasis, and not a substantial disagreement.

  8. Mark C. Chu-Carroll

    loren:
    It must be a weakness in my writing; I meant for the aesthetic notion to be very explicit, not to just be suggested by wording.
    As I said in the post, I see a programming language as a tool for describing a computing machine that performs a specific computation. There are two aspects to that: describing the machine to the *real* computer, so that it can simulate it; and describing the machine to other people who read your code.
    If you limit yourself to one ECS, you can still make it perform any computation you want; it may be slower on one ECS than on another, but that’s just a matter of optimization. The real difference is that the program can become virtually incomprehensible to another human if it’s poorly suited to the underlying ECS.
    Comprehensibility to another person is an aesthetic issue more than a technical one. But I’d claim that it is, absolutely, one of the most important properties of any good program. If you’re using a language in which you can’t express your computation in a way that will be easy for other people to understand, then you need to either change the computation, or change the language.

  9. Mark C. Chu-Carroll

    Don:
    I don’t mean to underestimate the value of a good linker. Part of what makes it possible to really effectively use multiple languages is the ability to connect programs written in one language with programs in another.
    It’s my personal opinion that the “right” way to do that isn’t to create a huge monolithic programming language that includes multiple sub-languages which you can choose, but to have multiple distinct languages with some way to link them together.
    For example, I think SWIG is a better approach to multi-language cooperation than Common Lisp. CL is a great language in many ways, but it’s a huge monolith; and while you can use embedded languages like Ops5 for constraint programming, the debugger and other tools are still geared towards the primary imperative/functional execution model. SWIG lets you use the different languages, each with their own tools; but you can stick programs written in different languages together when you need to.

  10. Bob Munck

    Mark,
    OK, let’s put it in fairly concrete terms: I’m a pretty good programmer, and I’ve spent several decades (including five years on the DARPA STARS program) studying how you write code that other people can understand and maintain easily and quickly. Can you claim with confidence that you can write a program in Haskell, Prolog, Lisp, Ops5, or whatever to solve a problem that is particularly suited to whichever language, and I cannot write one in Ada that is equally good?
    The problem with “stick[ing] programs written in different languages together when you need to” is that the people can’t handle it. You may know the several languages you’d be inclined to use on a big project, but you’re probably too expensive to be used as a coder anyway, and you certainly can’t find, hire, and pay a whole team of people who are that good. Most of the programmers implementing a system of any significance are going to be journeyman-level, good at one or maybe two fairly simple, stupid languages like C++ and incapable of going beyond that. We couldn’t even get the big defense contractors to upgrade their hordes of programmers from C to Ada, and you certainly wouldn’t get them to train in a bunch of exotic languages. Remember too, that any really good programmers you manage to get will lose interest and leave after the production release, and you’ll be left with much less skilled people doing the maintenance and enhancement. Much more money is spent on programming after the first release.
    My old buddy Harlan Mills tried to get around this skills problem long ago with the Chief Programmer Team idea. The trouble there was that it didn’t scale up and tended to produce unmaintainable code. The long-term trend has been to de-skill the profession of programmer; you’re bucking that trend.
    Btw, I would point out that the lowest-level “underlying ECS” in all of your examples is a good old fetch-parse-execute von Neumann machine, no matter what exotic execution model you have at the next level up.

  11. MarkCC

    Bob:
    Yes, I claim exactly that: that there are numerous examples of code which is dramatically simpler to understand written in Haskell or Prolog than written by the very best programmer in Ada.
    For example, a backtracking Sudoku solver in prolog, in 50 lines, including whitespace.

    % A backtracking Sudoku solver in CLP(FD). Expects a 9x9 grid as a
    % list of nine nine-element lists. Written against a SICStus-style
    % library(clpfd) (domain/3, all_different/1, labeling/2); exact
    % library names for these and for transpose/2 vary between Prologs.
    :- use_module(library(clpfd)).

    sudoku(L) :-
        flatten(L, AllVars),
        domain(AllVars, 1, 9),
        [R1,R2,R3,R4,R5,R6,R7,R8,R9] = L,
        % Each row is all-different.
        all_different(R1), all_different(R2), all_different(R3),
        all_different(R4), all_different(R5), all_different(R6),
        all_different(R7), all_different(R8), all_different(R9),
        transpose(L, TL),
        % Each column is all-different.
        [C1,C2,C3,C4,C5,C6,C7,C8,C9] = TL,
        all_different(C1), all_different(C2), all_different(C3),
        all_different(C4), all_different(C5), all_different(C6),
        all_different(C7), all_different(C8), all_different(C9),
        % Each 3x3 square is all-different. There is a much more
        % elegant way of coding this, but for illustrative purposes
        % it is fine.
        [X11,X12,X13,X14,X15,X16,X17,X18,X19] = R1,
        [X21,X22,X23,X24,X25,X26,X27,X28,X29] = R2,
        [X31,X32,X33,X34,X35,X36,X37,X38,X39] = R3,
        [X41,X42,X43,X44,X45,X46,X47,X48,X49] = R4,
        [X51,X52,X53,X54,X55,X56,X57,X58,X59] = R5,
        [X61,X62,X63,X64,X65,X66,X67,X68,X69] = R6,
        [X71,X72,X73,X74,X75,X76,X77,X78,X79] = R7,
        [X81,X82,X83,X84,X85,X86,X87,X88,X89] = R8,
        [X91,X92,X93,X94,X95,X96,X97,X98,X99] = R9,
        all_different([X11,X12,X13,X21,X22,X23,X31,X32,X33]),
        all_different([X41,X42,X43,X51,X52,X53,X61,X62,X63]),
        all_different([X71,X72,X73,X81,X82,X83,X91,X92,X93]),
        all_different([X14,X15,X16,X24,X25,X26,X34,X35,X36]),
        all_different([X44,X45,X46,X54,X55,X56,X64,X65,X66]),
        all_different([X74,X75,X76,X84,X85,X86,X94,X95,X96]),
        all_different([X17,X18,X19,X27,X28,X29,X37,X38,X39]),
        all_different([X47,X48,X49,X57,X58,X59,X67,X68,X69]),
        all_different([X77,X78,X79,X87,X88,X89,X97,X98,X99]),
        labeling([ffc], AllVars).

    flatten([], []).
    flatten([H|T], Vars) :-
        flatten(T, TVars),
        append(H, TVars, Vars).

    Looks a tad frightening at first. But it doesn’t take long to figure out. They’ve given a label to each of the constrained areas of the grid: rows, columns, and 3×3 squares; and they’ve said that all the values in each of those constrained areas must be all different. That’s it.
    Ada can’t say that in anything close to that degree of brevity or clarity. Nor can my beloved OCaml.
    I’m not arguing about pragmatism in the face of today’s programming market. There are a lot of people out there today working as programmers who don’t know what the hell they’re doing, and who don’t even really understand the one language that they claim to know. That doesn’t mean that that’s how things *should* be. It doesn’t matter whether you’ve got one language or one hundred – if your programmers are idiots, the code they write is going to be crap. If your programmers are actual competent professionals, then they’re capable of figuring out what’s necessary for a given project – picking one language, or a combination of languages as appropriate.

  12. ekzept

    another aspect is the insight obtained from using the proper representation. like the category diagram described in a blog entry upstream of here, using the appropriate representation is often the key to solving a problem or seeing a proof. also, and this is something many traditional programming language designers miss, there is something to be said for putting a lot of programming power in a compact expression. yes, such expressions might be complex. but there is an advantage to having a program occupy half a page versus 10 pages.

  13. MarkCC

    ekzept:
    I agree with you completely. I’ve never had the opportunity to learn APL, which is a great example of what you’re talking about; but a five minute explanation is enough to make some really fascinating programs comprehensible.
    For example, quicksort in J, an ASCII-based APL variant:

    sel=: 1 : '] #~ ] x. {.'
    qs=: ] ` ($:@(sel)) @. (1:
    Looks bizarre as heck until it gets explained to you. But once you pick up J, it’s actually one of the most amazingly clear and concise quicksort implementations I’ve ever seen. The syntax is a barrier for the first day or two of using it, but it rapidly becomes natural. There’s a lot of silliness in many languages where they’re made very wordy in the name of “readability”, when the kind of readability that they stress is only useful for the first week of using the language.
    One of the reasons that I’m so fond of languages like Prolog, Haskell, and OCaml is that they all tend to push you towards decomposition into small, easily understood units, and provide great tools for assembling those small units in interesting ways. The Haskell list comprehensions and mapping functions are a great example of this: you write the minimal operation as a function, and then build up using the higher-order operators.
    For example, here’s a bit of Haskell – again, an implementation of a sudoku solver (it seems to be everyone’s favorite demo program lately 🙂 )

    valAt’ (i,j) = do
    a
    They implement a simple “valAt” operator for accessing and updating the grid values; then everything else is done with maps and array comprehensions. There’s definitely a bit of learning curve associated with those comprehensions – but the payoff is huge.
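
    To show that compositional style in a complete, if toy, form (my sketch, not part of the solver): the minimal operation is a one-liner, and everything else is built with maps, filters, and a comprehension.

    -- Pair each word with its length; the comprehension does the job.
    wordLengths :: String -> [(String, Int)]
    wordLengths s = [(w, length w) | w <- words s]

    -- Build up from the small unit with higher-order operators.
    longWords :: Int -> String -> [String]
    longWords n = map fst . filter ((> n) . snd) . wordLengths

    -- longWords 3 "the quick brown fox" evaluates to ["quick","brown"]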

  14. Bob Munck

    “Ada can’t say that in anything close to that degree of brevity or clarity.”
    “Brevity”? Maybe not, but who cares? Brevity is not a criterion in producing good commercial code. If anything, it’s undesirable. If you want to minimize keystrokes for some reason, go with a language like APL.
    “Clarity”? Well, it’s not the least bit clear to me, but I don’t know the rules of Sudoku or prolog. (It’s an AI language, and I reject everything having to do with AI). “Looks a tad frightening at first. But it doesn’t take long to figure out.” Boy, that’s damning with faint praise. You shouldn’t have to “figure out” what a piece of code does, you should be able to understand what it does by reading it.
    Wouldn’t it be possible to write Ada functions that do the things that each of the operators or functions (flatten(), labeling()) in the prolog code do, and then invoke them in the appropriate order? I realize that prolog is evaluating rules rather than executing statements, but in the extreme I can write a prolog engine in Ada and Ada statements that invoke it with the set of predicates needed to solve a particular problem, and do it such that it’s all understandable to an Ada programmer. Now it’s more an algorithm that I’m using to solve that problem than a programming language (there isn’t a sharp line separating those two things). Plus, I don’t have to find someone who knows both prolog and Ada for my Ada-based programming project.
    “It doesn’t matter whether you’ve got one language or one hundred – if your programmers are idiots, the code they write is going to be crap.” Putting aside the “idiots,” that’s not true. If you have less-than-brilliant programmers (and you will), you can improve how they write code with training, coding conventions, good code reading and other kinds of reviews, and all the other mechanisms we’ve developed. If, that is, you’re doing all this for a single programming language. If you have a hundred — or even five — different languages, you can’t afford to do the different rules and different courses that each will necessitate.
    Mark, I have to tell you that I’m at a loss to understand how you could do what you do for a living and have the opinions you have. You don’t seem to be at all practical. I have to wonder how many large- or medium-sized programming projects you’ve worked on. I’ve worked with a lot of people from your company, though I suppose most of them are LockMart now, and they seemed to be much more in agreement with my point of view than with yours. I don’t think it’s even an “academic” vs. “real world” split; I was on the faculty at Brown for six years and Va Tech for five, and that split is along different fracture planes. You’re just not going to find “groups of people working together to build large” … Sudoku solvers.

  15. Bob Munck

    OK, here’s a example that somewhat supports your point, but at a higher level of abstraction:
    Back in the late 70’s, NRL hired my company (SofTech) to figure out why software for its AN/UYS-1 signal processing computer was so expensive and so bad. I was technical director of the DC office and project lead. We discovered that the acoustical engineers specified the software by drawing data flow diagrams where the nodes were mostly a standard set of operations like “FFT,” “Bandpass Filter,” etc. Those diagrams would be given to a roomful of programmers who would laboriously convert them into sequential SPL/1 or CMS-2 code by essentially ripping the functionality apart and re-assembling it in a “crystal clock architecture.”
    My big idea seems pretty obvious now, but was radical then: create an engine that will run the dataflow diagrams. We wrote a parser to convert the diagrams into an internal form and an interpreter to run it on the UYS-1. Worked like a charm; we took a diagram that (ahem) IBM had spent several tens of millions of 1970’s dollars converting into code and got it running in about a month with five programmers. Believe it or don’t, our implementation was slightly faster. Long story short, the Navy specified that the next generation of signal processors, the AN/UYS-2, be designed (by Bell Labs) as a dataflow engine at the hardware level. That machine and that architecture are still in use today.
    So that’s an example where a highly-specialized programming language was the right solution, if you’re willing to call dataflow diagrams a programming language. Note, though, that we eliminated the programmers.
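
    In modern terms, the idea looks something like this toy sketch (nothing remotely like the real UYS-1 engine; Haskell, since that’s what’s been flying around this thread): the dataflow diagram becomes a data structure, and the “engine” is a tiny interpreter over it.

    -- A dataflow graph as data: nodes emit, scale, or combine signals.
    data Node = Source [Double]      -- emits a fixed signal
              | Scale Double Node    -- multiplies its input by a constant
              | Sum Node Node        -- adds two inputs pointwise

    -- The interpreter "runs" a diagram by recursing over its structure.
    run :: Node -> [Double]
    run (Source xs) = xs
    run (Scale k n) = map (k *) (run n)
    run (Sum a b)   = zipWith (+) (run a) (run b)

    -- run (Sum (Scale 2 (Source [1,2])) (Source [10,10])) gives [12.0,14.0]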

  16. MarkCC

    Bob:
    I don’t consider the “looks frightening at first” to be damning with faint praise. My point was that if you don’t know prolog at all, and you look at that, it’s likely that your first reaction is “What the heck?”. If you actually know prolog, then you look at that, and it’s crystal clear, beautiful code. As someone who’s used prolog before, it took me less than 30 seconds to understand that entire program.
    And brevity *is* a huge value in code. If I can write a piece of code in a clear way that fits on my screen all at once, so that I can see the entire thing, that’s better than code that does the same thing in five pages.
    As for “can’t you just write a bunch of functions in Ada?”… Yes and no. Yes in the sense that you can essentially write an entire prolog-style goal-directed evaluation and unification engine in Ada, and then use it. But then what you’ve really done is write an interpreter for prolog in Ada. No in the sense that you can’t write a function “labeling” in Ada that will do the equivalent of what’s going on in the Prolog program – not without a huge amount of infrastructure that’s going to be incredibly confusing. Each of the predicates in the Prolog program is a backtracking goal-directed search; to do that in Ada, you’d need to explicitly record all of the backtracking information, detect dead-ends in the search, and then explicitly use the backtrack marks to figure out where to resume the search. That’s not going to be clear in Ada – unless you’ve implemented the entire prolog interpreter.
    As for the line about how you don’t see how I can do what I do… You know, I don’t particularly *care* whether you understand how I do my job. What matters is whether my employer is satisfied with my work – and they are.

  17. Ithika

    Bob:

    Wouldn’t it be possible to write Ada functions that do the things that each of the operators or functions (flatten(), labeling()) in the prolog code do, and then invoke them in the appropriate order?

    Or, “wouldn’t it be possible to write Universal Turing machine instructions that do the things that each of the operators or functions in the Prolog code do, and then invoke them in the appropriate order?”
    Well, yes it would, but that would be rather beside the point. We could do it all in assembly if we wanted. We could do that whole thing in microprocessor instructions but it wouldn’t really help us. They are all equivalent in lifting power. What Prolog gains you (in this instance) is brevity and clarity.
    The whole point is that they should all be able to do the same things. We choose the notation that’s easiest to express our goal in.
    I find it strange that you should decry “brevity” and “clarity” as being of little value, and yet talk endlessly about how your way (using this mythic do-everything-easily language) is easy for non-code-ninjas. The cornerstones of any system like that should be brevity and clarity. Unless you class long-windedness and obscurity as merits? 😉

  18. Bob Munck

    No one is addressing my main point: you can’t do real programming this way.
    Suppose you did somehow manage to staff up your project with a few dozen smart, knowledgeable programmers with degrees from good schools and years of experience, despite having gone through a competitive bid where cost was a factor. They spend a few years “figuring out what’s necessary for [the] project – picking one language, or a combination of languages as appropriate” and writing the code. A couple of them program the Sudoku solver part of the system (an advanced fighter avionics suite) in prolog, another does the radar tracker portion in Haskell, and a little group that sits by themselves in the cafeteria writes the pilot interface in OCaml. The rest of the system is a mix of Java (recent grads) and Ada (old hands).
    The system is delivered, and it’s a big success. The prototype fighter takes off, flies around, and lands without crashing. Support transitions into the lifetime maintenance phase. 95% of the development team moves on to their next job.
    The trouble is, at this point less than 50% of the code-writing has been done, maybe much less than that. The hardware seen by the avionics suite is going to change drastically as the fighter moves into production, and continue to change continuously for the next 30-50 years of the platform’s lifetime. You might need to upgrade the code to 3-D Sudoku, but the guys who wrote it are living in a retirement home in New Zealand and nobody knows prolog any more. You might need to upgrade the pilot interface to support a remote pilot, but everyone’s writing OCaml IV now, and no one can read the old code. And, of course, the Java fad ended when Sun went Chapter 11 thirteen years ago. It’s just a fact of life that your budget for this kind of maintenance work is much lower than the original development.
    Can’t do it that way. You have to choose a single, widely-used, non-exotic programming language, define and enforce a strict set of programming standards and conventions, establish a CMMI level 3 set of code reading and review processes, and do serious configuration management. Sure, you’ll still have some of the problems I listed above, just probably not as many and not as severely.
    Given that rant, and the fact that we’ve been doing it for a half century, I don’t see any point in continuing to fool around with toy languages. Yeah, maybe you can write a little text formatter for your own use elegantly, briefly, and clearly in WingDing-30, but wouldn’t it be better for you and for the world in general to bite the bullet and do it in Ada or (lord help me) C++? It might take a little longer, and it might necessitate using the scroll wheel on your mouse to see the whole program, but it may also give you some insight into the problems and the programming style that result from writing that kind of code in that language. The WD-30 version doesn’t teach you anything new.

  19. Mark C. Chu-Carroll

    Bob:
    Not every project should use multiple languages. But not every project should be written all in the same language. There are some extremely compelling examples of multi-lingual systems that are quite real.
    Is google real software to you? They use C++, Python, Java, and Javascript – each in the places where they’re appropriate. Sure, they could do it all in Java – applets on the client, beans for everything on the server. Why do you think they don’t?
    Is Lotus Notes real software? I happen to despise notes, but it’s a real, widely used system. It’s got parts written in C, C++, Java, and LotusScript.
    Is ClearCase real software? C, C++, Perl, and Java. I’m not sure, but they may also be using some C#.
    And your closing comment, “The WD-30 version doesn’t teach you anything new”, is, frankly, the most ridiculous statement I’ve seen in this discussion. Sticking to programming everything in one language, regardless of whether it’s suited to the problem or not, will give you insight into the problem; but writing in a language which allows you to express the solution to the problem in a natural way *won’t* teach you anything? Do you really believe that?

  20. Bob Munck

    “Is google real software to you? They use C++, Python, Java, and Javascript” Yeah, but they wish they hadn’t. Ed Lazowska, an old student of mine, is a technical advisor to them and has a lot of former students there. I don’t know what single language they would have preferred to use and I don’t think they do either, but they definitely wish they’d made the choice when they could have. Heck, they managed to choose a “universal fastener” for all their hardware: velcro.
    Lotus — what can I say? The company specializes in making mistakes.
    Likewise, I don’t know why ClearCase is a melange; I’m sure Grady would have chosen Ada. He’s out of ICU now but still in MN for tests. I’ll ask him when he’s back at work. Did they maybe inherit it from someone else? (I should know this; ClearCase was part of the development environment we specified in STARS, and I was technical lead on that.)
    In any event, citing cases where people have used a mishmash of languages doesn’t refute my point unless the entities have declared that they’re happy they did it that way. We haven’t yet made every possible mistake in software development, but we’ve got a good start at it.
    “‘The WD-30 version doesn’t teach you anything new’ … ridiculous.” OK, I think you said you’d written a text processor in some rare language. What did you learn? My first text processor, Hypertext, was written in an IBM internal language called BSL-360, later PL/S; I learned a great deal about language use from that.

  21. MarkCC

    I know a lot of people at google as well, and I’ve heard rather a different story from them. They love their python for the places where it’s usable; they can’t avoid using C/C++ for some of the stuff where python can’t perform well enough; and you get a lot of benefits from writing UIs to run in browsers using Javascript. The google guys I know love the multi-language environment for its productivity.
    ClearCase, like most Rational products, was an acquisition. It was developed by a guy named Dave LeBlang, who sold it to Atria, which became PureAtria, which was acquired by Rational, which was acquired by IBM. Why does it use multiple languages? Simple: there’s the core system, which you want in C/C++. But then there’s something called “trigger” scripts. As Rational folks are fond of saying, basic CC is more of a toolkit for building a custom SCM system than it is a real complete SCM system in itself. A lot of the customization does not need to be at the primitive low level; for things like process support, which is both important, and subject to change through the development of a system, you want a lot of flexibility to adapt (you don’t use the same process rules when you’re doing initial prototyping as you do when you’re close to delivery; and you don’t use the same process rules at the time of initial delivery that you use during maintenance). So CC is set up to allow you to implement those triggers in either low-level C/C++ code, or in a scripting language of your choice. The scripts that I saw when I was looking at CC were all written in Perl.
    As for what did I learn from writing my text processor in Haskell? Well, I notice that you want me to answer that; but your answer to the same question is “a great deal”.
    What I learned from writing in Haskell? Before that, when I had a text processing problem, I used one of two approaches: writing a scanner/tokenizer and then a parser; or regular expressions and looping. In Haskell, strings are lists of characters; files are lists of strings. String processing is sequence processing; and by reducing the problem to lazy list processing, I saw an entirely different way of approaching the problem. It’s a far simpler, cleaner process, and a much better way of implementing it. I wound up with something algorithmically clearer, easier to modify, easier to debug.
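
    A minimal sketch of what I mean (a toy of this shape, not the text processor itself): the whole “processor” is a pipeline of lazy list functions.

    -- A file is a list of lines; a line is a list of characters.
    -- Prefix each non-blank line with its line number.
    numberLines :: String -> String
    numberLines = unlines . zipWith tag [(1 :: Int) ..] . lines
      where
        tag n l | null l    = l
                | otherwise = show n ++ ": " ++ l

    -- interact numberLines  -- processes stdin lazily, line by line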
    On a different subject: I hadn’t heard Grady was ill. What happened?

  22. Don Geddis

    This isn’t my blog, so I can’t really start a new (related) topic. But all this started from lambda calculus to static typing in programming languages, and I had a question for you static typing fans.
    Why are you satisfied with the typing in current programming languages? Especially if you come from a math/lambda calculus background (like Mark), I would expect that the languages had missed the point.
    Specifically: basically all static typing systems deal with representations of machine bits in machine language. Mark’s example in the original post was “array” vs. “array or nil”. Other common types are “signed 32-bit integer” or “integer [0,12]”.
    But surely the “types” that matter to a programmer are semantic ones, with meaning in the outside world. Let’s say I write a function to calculate the area of a wedge of a unit circle. WedgeArea(angle) takes a number and produces a number. But is the angle expressed in degrees, or radians? Perhaps you could notate the argument type with [0,360) vs. [0,2*pi), and maybe you could guess. But if the thing really wanted radians, and I happened to call it with a small degree (like 3 degrees) … shouldn’t that be a type error?
    It seems the logical extension of this static typing idea is that every data element should be a tagged structure of some kind, with basically a “unit”, and a “magnitude”.
    Of course then the coding gets complex. The implementation of WedgeArea would now have to destructure the argument, and re-compose a result:
    WedgeArea(angle [degrees]) {
        portion [integer 0,360] := angle.magnitude
        area := (portion / 360) * pi
        result := new(Area)
        result.magnitude := area
        return(result)
    }
    I just made up some pseudocode, but perhaps you get the idea. With extensive user-defined types, you then get into the problem that no built-in operation works with those types, so all your code has to constantly “unbox” and “re-box” values into the tagged structures.
    But at least you would then get semantic type errors, like calling WedgeArea with a radian or a percentage, when it expects degrees.
    In the comments on the original post, Mark mentioned that a benefit to him of static typing was that it enforced clarity in his programming thoughts. Mark, I would think you would be frustrated with needing to stick to machine bit patterns for your types, and missing out on the clarity that you would get if you could use types as “sets of things I care about as a single group”, the more intuitive (mathematical?) version of what typing ought to be about.
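
    (For what it’s worth, here is roughly what I’m asking for, sketched in Haskell with newtype wrappers; my toy, not a real units library. The degrees-for-radians mistake becomes a compile-time type error, and in GHC the wrapping is erased at compile time, so there’s no unbox/re-box cost.)

    newtype Degrees = Degrees Double
    newtype Radians = Radians Double

    toRadians :: Degrees -> Radians
    toRadians (Degrees d) = Radians (d * pi / 180)

    -- Area of a wedge of the unit circle: theta / 2 for r = 1.
    wedgeArea :: Radians -> Double
    wedgeArea (Radians t) = t / 2

    -- wedgeArea (Degrees 3)             -- rejected by the type checker
    -- wedgeArea (toRadians (Degrees 3)) -- accepted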

  23. Mark C. Chu-Carroll

    Don:
    Of course I’m not completely satisfied with current type systems. Since when is a programmer ever satisfied with any tool, no matter how good? 🙂
    In the code that I spend most of my time writing, unit-measures wouldn’t be particularly useful, but they’re definitely incredibly useful for a lot of code out there, like some of the stuff I did back in grad school. There are a bunch of programming languages that have been experimenting with unit-of-measure types. There are a bunch of libraries for Java that actually exploit the new parametric types in Java5 to create a sort of unit-language; and there are entirely new languages, like Frink (http://futureboy.homeip.net/frinkdocs/), which provide unit-typed measures.
    Personally, my own “obsession” about types involves structure-shape; Laurie Hendren did some work on that a few years ago for automatic parallelization, but it struck me how useful it could be not just for parallelization, but for general type-descriptive purposes. To give an example of what I mean, think of the following in C:

    struct binary_thing {
        void *value;
        struct binary_thing *one;
        struct binary_thing *another;
    };

    Now: is that a binary tree, or a doubly-linked list? Hendren annotated with “direction”, so that you’d say either:

    struct binary_tree {
        void *value;
        struct binary_tree *[left] one;
        struct binary_tree *[right] another;
    };

    or

    struct dl_list {
        void *value;
        struct dl_list *[list] one;
        struct dl_list *[backward(list)] another;
    };

    Around the same time Laurie was doing that, Mark Wegman and Ken Zadeck at IBM also did some stuff involving automatic inference of structural shape, which could be used for doing type checking on direction declarations.
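
    (As a quick aside of my own, not part of Hendren’s system: in a language with algebraic data types, some of that shape information is explicit by construction.)

    -- The declaration itself says "binary tree"; there is no way to
    -- read it as a doubly-linked list.
    data Tree a = Leaf | Node (Tree a) a (Tree a)

    -- A doubly-linked traversal gets a different declaration, e.g. a
    -- zipper: the element in focus plus what's behind and ahead of it.
    data ListZipper a = ListZipper [a] a [a]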

  24. Bob Munck

    “your answer to the same question is ‘a great deal’.” Well, yeah, but that was forty freaking years ago; I was twenty years old. Hypertext was the first or second such word processor ever and the first or second WYSIWYG editor (Doug Engelbart has as good or a better claim to being first, plus he did that upside-down trackball thing.) We’ve learned a huge amount about writing text processors since then; heck, I’d suggest that MS Word has made every single possible mistake that could be made in that area. We’ve written text processors in Snobol and AED and APL and Lisp and every other type of language you could think of. I just can’t imagine that there’s anything left to be learned there.
    Grady Booch had open-heart surgery to repair an aneurysm on 5/30. Having just had a heart procedure that was 1/1000th as serious, I was intensely concerned. He did a blog entry from the Mayo ICU the next day and has now been released, but is unexpectedly being kept in MN for tests. Bill Higgins is posting updates on Grady’s blog.

  25. Bob Munck

    Ithika:
    You misread me; I didn’t decry clarity. Clarity of expression is the most important and overriding criterion of programming languages and programs. I was claiming that brevity is often the enemy of clarity, and is a much less important criterion.
    I also don’t think I’ve claimed there is a “do-everything-easily language.” Some things — reliability, security, flexibility, extensibility, maintainability, understandability — are difficult in any language.

  26. Dangerous Dan

    I’m confident that part of the reason that there is no “do-everything-easily” computer language is that people are involved in writing the languages and programs in those languages. And, unlike mass-produced computers, human brains are each individually wired.
    COBOL and FORTRAN remain to this day useful for the purposes they were originally developed, while C has been used to code operating systems, word processors, compilers, number crunching software, and almost everything else. Haskell and Prolog may do some things far better than C, and without doubt, they are far more convenient for some programmers for some problems than C, but I doubt that either is appropriate for writing an interrupt handler or an operating system kernel.

    For example, sorting a list of numbers is pretty easy to understand if it’s written in lambda calculus;…

    but only if you’ve managed to train your brain to follow lambda calculus. Personally, I’ve found long programs in partially disassembled machine code mixed with raw hexadecimal to be easier to follow than short ones in Lisp.
    The variety of computing languages is somewhat similar to the variety of systems of symbols used in mathematics. You have the mathematical systems of Algebra, Geometry, Calculus and differential equations, Topology, Set theory, matrix algebra and tensors, and on and on and on….
    As for brevity versus clarity, I once wrote in APL, one-line functions for a discrete Fourier transform and its inverse. The functions were extremely brief, but far from clear.

