So, we’ve built up some pretty nifty binary trees – we can use the binary tree as the basis of an implementation of either a set or a dictionary. But our implementation has one major problem: it’s got absolutely no way to maintain balance.
What that means is that depending on the order in which things are inserted into the tree, we might get excellent performance, or we might do no better than a linear list. For example, look at these trees. As you can see, the same values can wind up in quite different trees. With a good insert order, you get a nicely balanced tree: the minimum distance from root to leaf is 3, and the maximum is 4. Take the same values, insert them in a different order, and you get a rotten tree: the minimum distance from root to leaf is 1, and the maximum is 7. So depending on luck, you can get a tree that gives you good performance, or one that performs no better than a plain old list. A bit of randomization can often give you reasonably good performance on average – but if you’re using a tree, it’s probably because O(n) complexity is just too high. You want the O(lg n) complexity that a balanced binary tree guarantees – and not just sometimes.
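To see just how bad it can get, here’s a minimal sketch of a naive, unbalanced BST in Haskell. The names (`Tree`, `insert`, `fromList`, `height`) are mine, just for illustration – but the point stands: insert values in sorted order, and the tree degenerates into a linked list.

```haskell
-- A naive, unbalanced binary search tree (illustrative names).
data Tree a = Leaf | Node (Tree a) a (Tree a)

-- Plain BST insert: no rebalancing of any kind.
insert :: Ord a => a -> Tree a -> Tree a
insert x Leaf = Node Leaf x Leaf
insert x t@(Node l v r)
  | x < v     = Node (insert x l) v r
  | x > v     = Node l v (insert x r)
  | otherwise = t            -- duplicate: leave the tree alone

-- Build a tree by inserting values left to right.
fromList :: Ord a => [a] -> Tree a
fromList = foldl (flip insert) Leaf

-- Height shows how list-like the tree has become.
height :: Tree a -> Int
height Leaf         = 0
height (Node l _ r) = 1 + max (height l) (height r)

main :: IO ()
main = do
  print (height (fromList [4, 2, 6, 1, 3, 5, 7]))  -- 3: nicely balanced
  print (height (fromList [1 .. 7]))               -- 7: a glorified list
```

Same seven values both times; only the insert order differs, and the height – which bounds the cost of every lookup and insert – more than doubles.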
To fix that, you need to change the structure a bit, so that as you insert things, the tree stays balanced. There are several different approaches to doing this. The one we’re going to look at is based on labeling nodes in a way that makes it easy to detect when a serious imbalance is developing, and then rearranging the tree to rebalance it. There are two major versions of this, called the AVL tree and the red-black tree; we’re going to look at the red-black tree. Building a red-black tree is as much a lesson in data structures as it is in Haskell, but along with learning about the structure, we’ll see a lot about how to write code in Haskell – and particularly about how to use pattern matching on complex structures.
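As a taste of what’s coming, here’s a sketch of that node labeling in the style of Chris Okasaki’s well-known functional red-black trees (the names `Color`, `RBTree`, and `balance` are illustrative, not necessarily the ones we’ll settle on). Each node carries a color, and the entire rebalancing step is just four patterns over the tree’s shape – exactly the kind of pattern matching this series will dig into.

```haskell
-- Sketch of a red-black tree, following Okasaki's functional version.
data Color = Red | Black

data RBTree a = Empty | RBNode Color (RBTree a) a (RBTree a)

-- Each of the four "red child of a red parent" violations is one
-- pattern, and all four rewrite to the same balanced result.
balance :: Color -> RBTree a -> a -> RBTree a -> RBTree a
balance Black (RBNode Red (RBNode Red a x b) y c) z d =
  RBNode Red (RBNode Black a x b) y (RBNode Black c z d)
balance Black (RBNode Red a x (RBNode Red b y c)) z d =
  RBNode Red (RBNode Black a x b) y (RBNode Black c z d)
balance Black a x (RBNode Red (RBNode Red b y c) z d) =
  RBNode Red (RBNode Black a x b) y (RBNode Black c z d)
balance Black a x (RBNode Red b y (RBNode Red c z d)) =
  RBNode Red (RBNode Black a x b) y (RBNode Black c z d)
balance color l v r = RBNode color l v r

-- Insert is the ordinary BST insert, with balance applied on the way
-- back up, and the root forced to black at the end.
insertRB :: Ord a => a -> RBTree a -> RBTree a
insertRB x t = blacken (ins t)
  where
    ins Empty = RBNode Red Empty x Empty
    ins s@(RBNode c l v r)
      | x < v     = balance c (ins l) v r
      | x > v     = balance c l v (ins r)
      | otherwise = s
    blacken (RBNode _ l v r) = RBNode Black l v r
    blacken Empty            = Empty
```

Notice that there’s no explicit height bookkeeping anywhere: the color labels alone carry enough information for `balance` to catch an imbalance the moment it forms.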