I came across an article yesterday about programming languages, which hit on one of my major peeves, so I can’t resist responding. The article is at greythumb.org, and it’s called Programmer’s rant: what should and should not be added to C/C++. It’s a variation on the extremely common belief that C and C++ are the best languages to use when you need code to run fast. They’re not. They’re good at things that need to get very close to the hardware – not in the efficiency sense, but in the sense of needing to be able to fairly directly munge the stack, address specific hardware registers, etc. But they are *dreadful* languages for writing real scientific and/or numerical code.
To quote the part of the article that set me off:

> First of all, these fears are nonsense. C and C++ are never going to disappear. Why? Because there are classes of programming problems that are still and will always be CPU bound and there is still no language as fast as C or C++ for these problems. I highly doubt that there ever will be.
>
> I’m talking about things like: scientific number crunching, game/simulation physics, raytracing, real-time 3d graphics, audio processing, codec implementation, high-speed network packet routing, evolutionary computation (my personal favorite :), and of course implementing all these high-level languages’ runtimes. There are also problems like OS and hardware driver implementation where you need something “close to the metal” that can interact closely with and even embed assembly language. C is basically shorthand for assembler, which is why it’s the preferred language for that kind of thing.
>
> For these tasks, premature optimization at the level of language and framework choice is not evil. In some cases it’s a requirement. I predict that at least some of these tasks will still be done in C, C++, or some language with similar characteristics 50 years from now. To give you an idea of just how much faster C can be for tasks like this, I have found that evolvable instruction set based evolutionary computation is almost twice as fast when competently implemented in C than a similar competent implementation in Java.
Here’s the problem. C and C++ suck rocks as languages for numerical computing. They are not the fastest, not by a long shot. In fact, their fundamental design makes it pretty much impossible to generate really good, efficient code from C/C++. There’s a good reason that Fortran is still the language of choice for real, intense scientific applications that require the absolute best performance that can be wrung out of our machines – applications like computational fluid dynamics.
Making real applications run really fast is something that’s done with the help of a compiler. Modern architectures have reached the point where people can’t code effectively in assembler anymore – switching the order of two independent instructions can have a dramatic impact on performance in a modern machine, and the constraints that you need to optimize for are just more complicated than people can generally deal with.
So for modern systems, writing an efficient program is sort of a partnership. The human needs to carefully choose algorithms – the machine can’t possibly do that. And the machine needs to carefully compute instruction ordering, pipeline constraints, memory fetch delays, etc. The two together can build really fast systems. But the two parts aren’t independent: the human needs to express the algorithm in a way that allows the compiler to understand it well enough to be able to really optimize it.
And that’s where C and C++ fall down. C and C++ are strongly pointer-based languages. The real semantics of almost anything interesting end up involving pretty much unrestricted pointers. In C and C++, there’s no such thing as an array – there are just pointers, and subscripting is shorthand for pointer arithmetic and indirection (`x[n]` in C/C++ is the same thing as `*(x+n)`).
That pointer-based nature means that in a C or C++ program, it’s very hard for a compiler to figure out which things are independent. It comes down to a problem called alias detection: identifying when two variables *might* be referencing the same location. Alias detection becomes a horrific mess in the presence of unrestricted pointers. Let me show you an example:
```c
for (int i = 0; i < 20000; i++) {
    for (int j = 0; j < 20000; j++) {
        x[i][j] = y[i-2][j+1] * y[i+1][j-2];
    }
}
```
If you look at that loop, it can be parallelized or vectorized without any problem if and only if the array pointed to by `x` and the array pointed to by `y` are completely distinct, with no overlap. But there’s no way to write code in C or C++ that guarantees that. (C99 added the `restrict` qualifier as a patch for exactly this, but it’s an unchecked promise from the programmer, not something the compiler can verify.) If it were Fortran-77, the compiler could simply assume they were distinct: the language rules forbid the kind of argument aliasing that would break the loop. If it were Fortran-90, the compiler could check whether `x` or `y` was declared as a possible pointer target, and the programmer could make it obvious that they didn’t overlap if they wanted to. But you can’t do that in C or C++. (And Fortran isn’t even the best – an experimental language called Sisal from Lawrence Livermore labs used to be able to beat Fortran by around 20% on typical code!)
That loop example involves parallelization, but alias-related problems aren’t just an issue for parallelism; parallelism is just the easiest place to show one. The aliasing issues in C and C++ have a very direct impact on real code. Let me tell you about a concrete example of this, and then I’ll stop ranting. About six years ago, I was working on a project where I needed to implement a rather messy algorithm to compute something called the "longest common subsequence" (LCS) of two arrays. The standard algorithm for computing LCS uses dynamic programming; it’s **O**(n²) time and **O**(n²) space. There’s an algorithm designed by people doing computational biology that runs in the same time, but uses on average **O**(n) space.
I didn’t know what language to use for this project, so I decided to do an experiment. I wrote the LCS algorithm in a bunch of different languages, to compare how complex the code was, and how fast it ran. I wrote the comp bio algorithm in C, C++, OCaml, Java, and Python, and recorded the results. What I got timing-wise for running the programs on arrays of 2000 elements each was:
- C: 0.8 seconds.
- C++: 2.3 seconds.
- OCaml: 0.6 seconds interpreted, 0.3 seconds fully compiled.
- Java: 1 minute 20 seconds.
- Python: over 5 minutes.
About a year later, testing a new JIT for Java, the Java time was down to 0.7 seconds to run the code, plus about 1 second for the JVM to start up. (The startup times for C, C++, and OCaml weren’t really measurable – they were smaller than the margin of error for the measurements.)

The Objective-Caml bytecode interpreter was faster than the carefully hand-optimized C program! Why? Because the OCaml compiler could recognize that the arrays were completely independent – it didn’t need to worry about one iteration of the loop stepping on the values used by another. The C compiler couldn’t apply a lot of useful optimizations, because it couldn’t be sure they were valid.
And it’s not just non-assignment-based functional languages where you can see supposedly less-efficient high-level languages crushing the performance of C/C++. CMU Common Lisp can beat C/C++ on numeric code. There was a paper a few years back documenting it: on a Sun SPARC workstation, if you use the optional type declarations and write scientific/numeric code in Lisp – using vectors (Lisp arrays) and assignments to implement exactly the same algorithm as the C version – the CMU Common Lisp code will outperform the C code generated by either the Solaris C compiler or GCC with maximum optimization.