A bunch of people have sent me links to an article about MapReduce. I’ve hesitated to write about it, because the currently hyped MapReduce stuff was developed, and extensively used by Google, my employer. But the article is really annoying, and deserves a response. So I’m going to be absolutely clear. I am not commenting on this in my capacity as a Google employee. (In fact, I’ve never actually used MapReduce at work!) This is strictly on my own time, and it’s purely my own opinion. If it’s the dumbest thing you’ve ever read, that’s my fault, not Google’s. If it’s the most brilliant thing you’ve ever read, that’s my credit, not Google’s. I wasn’t asked to write this by Google, and I didn’t ask their permission, either. This is just me, the annoying geek behind this blog, writing solely
on my own behalf, speaking for no one but me. Got it?
The reason that I’m interested in this is because it’s related to my PhD work. Back in
grad school, what I worked on was applying parallelism to structured data in
non-scientific applications. That’s pretty much what MapReduce does. And the solution
that I proposed was a kind hierarchical scatter/gather operation – which is, very nearly, the
way that MapReduce works. The big difference? MapReduce beats what I did. The guys who designed MapReduce noticed something that I didn’t, and took advantage of it, which made M/R programs a lot cleaner and easier to write. The article that I’m going to discuss criticizes M/R for exactly that key idea.