Sorry things have been so slow around here. I know I keep promising that I’m going to post more frequently, but it’s hard. Life as an engineer at a startup is exhausting. There’s so much work to do! And the work is so fun – it’s easy to let it eat up all of your time.
Anyway… last good-math post ’round these parts was about monoids and programming. Today, I’m going to talk about monads and programming!
If you recall, monoids are an algebraic/categorical construct that, when implemented in a programming language, captures the abstract notion of foldability. For example, if you’ve got a list of numbers, you can fold that list down to a single number using addition. Folding collections of values is something that comes up in a lot of programming problems – capturing that concept with a programming construct allows you to write code that exploits the general concept of foldability in many different contexts.
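To make that concrete, here's the addition example in Scala (just a sketch of the idea):

// Fold a list of numbers down to a single number using addition;
// 0 is the identity element of the addition monoid.
val nums = List(1, 2, 3, 4)
val sum = nums.foldLeft(0)(_ + _)   // 10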
Monads are a construct that have become incredibly important in functional programming, and they’re very, very poorly understood by most people. That’s a shame, because the real concept is actually simple: a monad is all about the concept of sequencing. A monad is, basically, a container that you can wrap something in. Once it’s wrapped, you can form a sequence of transformations on it. The result of each step is the input to the next. That’s really what it’s all about. And when you express it that way, you can begin to see why it’s such an important concept.
I think that people are confused by monads for two reasons:
- Monads are almost always described in very, very abstract terms. I’ll also get into the abstract details, but I’ll start by elaborating on the simple description I gave above.
- Monads in Haskell, which is where most people are introduced to them, are very confusing. The basic monad operations are swamped with tons of funny symbols, and they're named in incredibly misleading ways. (“return” does almost the exact opposite of what you expect return to do!)
In programming terms, what’s a monad?
Basically, a monadic computation consists of three pieces:
- A monadic type, M, which is a parametric type wrapper that can wrap a value of any type.
- An operation which can wrap a value in M.
- An operation which takes a value wrapped in M, and a function that transforms the unwrapped value into another value (possibly with a different type) wrapped in M, and applies that function to produce a new wrapped value.
Whenever you describe something very abstractly like this, it can seem rather esoteric. But this is just a slightly more formal way of saying what I said up above: it’s a wrapper for a series of transformations on the wrapped value.
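In Scala-ish terms, the three pieces look something like this (a minimal sketch; the trait and the name wrap are made up for illustration, not taken from any library):

trait Monad[M[_]] {
  // Piece 2: an operation which wraps a plain value in M.
  def wrap[T](value: T): M[T]
  // Piece 3: an operation which takes a wrapped value and a function
  // from the unwrapped value to a new wrapped value, and applies it.
  def flatMap[T, U](m: M[T])(op: T => M[U]): M[U]
}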
Let me give you an example. At foursquare, we do all of our server programming in Scala. In a year at foursquare, I’ve seen exactly one null pointer exception. That’s amazing – NPEs are ridiculously common in Java programming. But in Scala at foursquare, we don’t allow nulls to be used at all. If you have a value which could either be an instance of A, or no value, we use an option type. An Option[T] can be either Some(t: T) or None.
So far, this is nice, but not really that different from a null. The main difference is that it allows you to say, in the type system, whether or not a given value might be null. But: Option is a monad. In Scala, that means that I can use map on it. (map is one of the transformation functions!)
val t: Option[Int] = ...
val u: Option[String] = t.map(_ + 2).map(_.toString)
What this does is: if t is Some(x), then the first map adds two to it, returning Some(x+2); the second map takes that result and converts it to a string, returning an Option[String]. If t is None, then running map on it always returns None. So I can write code which takes care of the null case, without having to write out any conditional tests of nullness – because optionality is a monad.
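For comparison, here's the same computation with the null-handling written out by hand, using the t from the example above (a sketch):

val u2: Option[String] = t match {
  case Some(x) => Some((x + 2).toString)   // the "non-null" case
  case None    => None                     // the "null" case, handled manually
}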
In a good implementation of a monad, I can do a bit more than that. If I’ve got a Monad[T], I can use a map-like operation with a function that takes a T and returns a Monad[U].
For an example, we can look at lists – because List is a monad:
val l: List[Int] = List(1, 2, 3)
l.flatMap({ e => List((e, e), (e + 1, e + 2)) })
res0: List[(Int, Int)] = List((1,1), (2,3), (2,2), (3,4), (3,3), (4,5))
The monadic map operation does a flatten on the results of the map steps. That means a lot of things; you can see one of them in the rather silly example above.
You can take values, and wrap them as a list. Then you can perform a series of operations on the elements of that list – sequencing over them. Each operation, in turn, returns a list; the result of the monadic computation is a single list, concatenating, in order, the lists returned for each element. In Scala, the flatMap operation captures the monadic concept: basically, if you can flatMap something, it’s a monad.
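As an aside, this is also exactly what powers Scala's for-comprehensions: they're just syntactic sugar for chains of flatMap and map. A quick sketch using the silly example above:

val pairs = for {
  e <- List(1, 2, 3)
  p <- List((e, e), (e + 1, e + 2))
} yield p
// desugars to: List(1, 2, 3).flatMap(e => List((e, e), (e + 1, e + 2)).map(p => p))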
Let’s look at it a bit more specifically.
- The monadic type: List[T].
- A function to wrap a value into the monad: the constructor function from List, def apply[T](value: T): List[T].
- The map operation: def flatMap[T, U](op: T => List[U]): List[U].
(In the original version of this post, I put the wrong type in flatMap in the list above. In the explanation demonstrating flatMap, the type is correct. Thanks to John Armstrong for catching that!)
You can build monads around just about any kind of type wrapper where it makes sense to map over the values that it wraps: collections, like lists, maps, and options; various kinds of state, such as variable environments (where the wrapped values are, essentially, functions from identifiers to values) or IO state; and plenty of other things. Anything where you perform a sequence of operations over a wrapped value can be a monad.
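To make the state case concrete, here's a minimal sketch of a state monad in Scala. All of the names here (State, wrap, run) are made up for illustration, not taken from any particular library:

// A stateful computation: a function from an input state S
// to a result A paired with an output state S.
case class State[S, A](run: S => (A, S)) {
  // Sequence this step with a next step that depends on this one's result.
  def flatMap[B](f: A => State[S, B]): State[S, B] =
    State { s =>
      val (a, s2) = run(s)   // run this step on the incoming state
      f(a).run(s2)           // feed the updated state to the next step
    }
  def map[B](f: A => B): State[S, B] =
    flatMap(a => State(s => (f(a), s)))
}

object State {
  // Wrap a plain value without touching the state: this monad's "return".
  def wrap[S, A](a: A): State[S, A] = State(s => (a, s))
}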
Now that we have some understanding of this thing we’re talking about, what is it in mathematical terms? For that, we turn to category theory.
Fundamentally, in category theory a monad is a category with a particular kind of structure. It’s a category with one object. That category has a collection of arrows which (obviously) go from the single object to itself. That one-object category has a functor from the category to itself. (As a reminder, a functor is an arrow between categories in the category of (small) categories.)
The first trick to the monad, in terms of theory, is that it’s fundamentally about the functor: since the functor maps from a category to the same category, you can almost ignore the category; it’s implicit in the definition of the functor. So we can almost treat the monad as if it were just the functor – that is, a kind of transition function.
The other big trick is closely related to that. For the programming language application of monads, we can think of the single object in the category as the set of all possible states. So we have a category object, which is essentially the collection of all possible states; and there are arrows between the states representing possible state transitions. So the monad’s functor is really just a mapping from arrows to different arrows – which basically represents the way that changing the state causes a change in the possible transitions to other states.
So what a monad gives us, in terms of category theory, is a conceptual framework that captures the concept of a state transition system, in terms of transition functions that invisibly carry a state. When that’s translated into programming languages, it becomes a computation that implicitly takes an input state, possibly updates it, and returns an output state. Sound familiar?
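It should: it's exactly the State sketch from above. Here's that invisible state-threading made visible:

// Each step is a function from state to state, but the chaining
// code never mentions the state explicitly.
val tick: State[Int, Int] = State(n => (n, n + 1))
val threeTicks = tick.flatMap(a => tick.flatMap(b => tick.map(c => (a, b, c))))
threeTicks.run(0)   // ((0, 1, 2), 3): the counter was threaded through invisibly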
Let’s take a moment and get formal. As usual for category theory, first there are some preliminary definitions.
- Given a category C, 1_C is the identity functor from C to C.
- Given a category C with a functor T : C → C, T² = T ∘ T.
- Given a functor T, 1_T : T → T is the identity natural transformation from T to T.
Now, with that out of the way, we can give the complete formal definition of a monad. Given a category C, a monad on C is a triple (T : C → C, η : 1_C → T, μ : T² → T), where T is a functor, and η and μ are natural transformations. The members of the triple must make the following two diagrams commute.
[Diagram: commutativity of composition with μ – the associativity square, μ ∘ Tμ = μ ∘ μT : T³ → T.]
[Diagram: commutativity of composition with η – the unit triangles, μ ∘ Tη = μ ∘ ηT = 1_T.]
What these two diagrams mean is that successive applications of the state-transition functor over C behave associatively, and that the monadic structure is always preserved: any sequence of composed monadic functors is itself a functor with full monadic structure. Together, these mean that building a sequence of monadic state transformers is guaranteed to behave as a proper monadic state transition. Whatever happens inside of the monadic functor is invisible to the rest of the universe: the state will be consistently passed from application to application with the correct chaining behavior, so to the outside world, the entire monadic chain is indistinguishable from a single atomic monadic operation.
Now, what does this mean in terms of programming? Each element of a monadic sequence in Haskell is an instantiation of the monadic functor – that is, it’s an arrow between states – a function, not a simple value – which is the basic trick to monads. They look like a sequence of statements; in fact, each statement in a monad is actually a function from state to state. It looks like we’re writing sequential code, when what we’re actually doing is writing function compositions – so that when we’re done writing a monadic sequence, what we’ve actually done is written a function definition in terms of a sequence of function compositions.
Understanding that, we can now clearly understand why we need the return function to use a non-monad expression inside of a monadic sequence – because each step in the sequence needs to be an instance of the monadic functor; an expression that isn’t an instance of the monadic functor couldn’t be composed with the functions in the sequence. The return function is really nothing but a function that combines a non-monadic expression with the id functor.
In light of this, let’s go back and look at the definition of Monad in the Haskell standard prelude.
class Functor f where
  fmap :: (a -> b) -> f a -> f b

class Monad m where
  (>>=)  :: m a -> (a -> m b) -> m b
  (>>)   :: m a -> m b -> m b
  return :: a -> m a
  fail   :: String -> m a

  -- Minimal complete definition:
  --   (>>=), return
  m >> k = m >>= \_ -> k
  fail s = error s
The declaration of Monad is connected with the definition of Functor – if you look, you can see the connection. The fundamental operation of Monad is “>>=”, the chaining operation, which is basically the Haskell version of the map operation. Its type, m a -> (a -> m b) -> m b, is deeply connected with Functor‘s fmap operation – the a in m a is generally going to be a type which can be a Functor. (Remember what I said about Haskell and monads? I really prefer map and flatMap to >> and >>=.)
So the value type wrapped in the monad is a functor – in fact, the functor from the category definition! And the “>>=” operation is just the functor composition operation from the monad definition.
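To tie the Haskell notation back to Scala (a rough correspondence sketch, not Haskell's actual definitions):

// Option's flatMap plays the role of >>=, and Option's constructor
// plays the role of return:
val inc: Int => Option[Int] = n => Some(n + 1)
Some(3).flatMap(inc)   // Some(4) -- like: Just 3 >>= \n -> Just (n + 1)
Option(3)              // Some(3) -- like: return 3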
A proper implementation of a monad needs to follow some fundamental rules – the rules are, basically, just Haskell translations of the structure-preserving rules about functors and natural transformations in the category-theoretic monad. There are two groups of laws: laws about the Functor class, which should hold for the transition function wrapped in the monad class; and laws about the monadic operations in the Monad class. One important thing to realize about the functor and monad laws is that they are not enforced – in fact, cannot be enforced! – but monad-based code using monad implementations that do not follow them may not work correctly. (A compile-time method for verifying these rules can be shown to be equivalent to solving the halting problem.)
There are two simple laws for Functor, and it’s pretty obvious why they’re fundamentally just structure-preservation requirements. The Functor class only has one operation, called fmap, and the two functor laws are about how it must behave.

- fmap id = id (Mapping the identity function over any structured value results in an unmodified value.)
- fmap (f . g) = (fmap f) . (fmap g) (“.” is the function composition operation; this just says that fmap preserves structure: mapping a composed function is the same as composing the two mapped functions.)
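In Scala terms, you can spot-check both laws for List (a sanity check, not a proof):

val xs = List(1, 2, 3)
val f = (n: Int) => n * 2
val g = (n: Int) => n + 1
xs.map(identity) == xs                     // fmap id = id
xs.map(f compose g) == xs.map(g).map(f)    // fmap (f . g) = (fmap f) . (fmap g)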
The monad laws are a bit harder, but not much. They mainly govern how monadic operations interact with non-monadic operations, in terms of the “return” and “>>=” operations of the Monad class.
- return x >>= f = f x (Injecting a value into the monad is basically the same as passing it as a parameter down the chain – return is really just the identity functor passing its result on to the next step. I hate the use of “return”. In a state functor, in exactly the right context, it does sort-of look like a return statement in an imperative language. But in pretty much all real code, return is the function that wraps a value into the monad.)
- f >>= return = f (If you don’t specify a value for a return, it’s the same as just returning the result of the previous step in the sequence – again, return is just identity, so passing something into return shouldn’t affect it.)
- seq >>= return . f = fmap f seq (Composing return with a function is equivalent to invoking that function on the result of the monad sequence to that point, and wrapping the result in the monad – in other words, it’s just composition with identity.)
- seq >>= (\x -> f x >>= g) = (seq >>= f) >>= g (Your implementation of “>>=” needs to be semantically equivalent to the usual translation; that is, it must behave like a functor composition.)
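Again in Scala terms, here's a quick spot-check of these laws with Option, reading >>= as flatMap and return as Option's constructor (a sanity check, not a proof):

val f: Int => Option[Int] = n => Some(n + 1)
val g: Int => Option[Int] = n => Some(n * 2)
val h: Int => Int = _ * 10
Option(3).flatMap(f) == f(3)                            // return x >>= f = f x
Some(5).flatMap(Option(_)) == Some(5)                   // f >>= return = f
Some(5).flatMap(n => Option(h(n))) == Some(5).map(h)    // seq >>= return . f = fmap f seq
Some(5).flatMap(n => f(n).flatMap(g)) == Some(5).flatMap(f).flatMap(g)  // associativity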
Hi Mark,
Nice article. I was going to write a comment on your “null pointer exception” observation but it ended up being pretty long, so I turned it into a full blog post:
http://beust.com/weblog/2012/08/19/a-note-on-null-pointers/
Would appreciate your thoughts 🙂
Still working through, but some of this doesn’t quite sit right. The first I’ve come to is your definition of flatMap. You state that its argument is a function taking a List[T] to a List[U], but the real Scala flatMap (and pretty much every other one) takes T to List[U]. The idea being that first it maps the List[T] to a List[List[U]], then it flattens out that list of lists into a single list.
And this is what connects to the *real* category-theoretic significance of a monad: an endofunctor with unit and composition natural transformations — a monoid object in the category of endofunctors on a category. The importance of the flatMap is that it’s a different way of expressing the fact that we have a natural map from List[List[U]] to List[U], just as we have a natural map from Option[Option[U]] to Option[U], or in general from Monad[Monad[U]] to Monad[U].
Tagging along with that, there is definitely more than one object in the category of Scala types. The important thing is not that the functor goes from one object back to itself, but that it goes from the category back to itself.
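(In Scala terms, the map-then-flatten picture described above looks like this:)

val xs = List(1, 2, 3)
// map first gives a List[List[Int]]; flatten collapses it;
// flatMap does both at once.
xs.map(e => List(e, e)).flatten == xs.flatMap(e => List(e, e))   // true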
While I wholeheartedly agree that “return” is an awful name (something like “encapsulate” makes more sense to me), I don’t think the main issue with most Haskell monad tutorials has anything to do with syntax. The bigger issue is that they’re motivated by the pragmatic need to do imperative tasks, which leads to a backwards perspective on what monads are. That is, they encourage the “Monads Are How We Do Imperative Programming… and other weird stuff” as opposed to “Monads Are How We Encapsulate Control Flow… including imperative programming style”.
The good Haskell monad tutorials ignore the IO monad and build up by explaining Maybe, List, and likely a parser or some other back-tracker. They end up looking remarkably similar to this blog post.
As an aside, my biggest annoyance with monads in Haskell is the increased need to use monad transformers due to the reliance on the State monad for simple things like efficient arrays.
Is the category theory actually necessary, or could programmers in a suitable language just use map() and flatMap() and enjoy the benefits?
The category theory is not necessary to use it.
What the category theory does is suggest the abstraction. The point of things like monads in programming is that there’s a very common mathematical abstraction that occurs in many different contexts – and so it’s likely a pattern that will be widely applicable in programming.
And more than that: when you’re building an algebraic structure in a programming language, designing it to be as generally useful as possible is hard. Things like category theory give you a tool for figuring out how to build structures that work together.
I certainly never thought of using something like map on an Option object. I’d programmed in ML for ages, and operating that way never occurred to me. But if you realize that the concept of optionality matches the concepts required of a monad, then it can open you up to the idea of treating it in a new way.
It’s been a long time since I have seen diagrams of the type you included here and I thought I would add a little history which you may or may not know.
Monads were called “Triples” by their originator, Jonathan Mock Beck. His 1964 thesis under Sammy Eilenberg at Columbia was called “Triples, Algebras and Cohomology.” It can be found here http://www.tac.mta.ca/tac/reprints/articles/2/tr2.pdf. He is, I believe, now, sadly, deceased. However, one of his first collaborators was Michael Barr, Peter Redpath Professor Emeritus of Pure Mathematics at McGill University. He is still with us and, happily, active. He and Charles Wells have written a couple of things which might be of interest.
Michael Barr, Charles Wells, Toposes, Triples and Theories. (http://www.tac.mta.ca/tac/reprints/articles/12/tr12.pdf)
Michael Barr, Charles Wells, Category Theory Lecture Notes for ESSLLI (http://www.ling.ohio-state.edu/~plummer/courses/winter09/ling681/barrwells.pdf)
I was a student of Barr’s far too many years ago when “Triples” were _the_ hot topic in category theory. Unfortunately, for personal reasons, I was forced from the true path of categorical paradise and labored in the desert areas of statistics and computer programming (and even tech support) for many years. This must make me seem like a heretic in the eyes of the faithful, but such is life.
As I remember my history, which might be very out of date by now, there was a controversy over the terminology in the early days. Saunders MacLane disliked the name “Triples” for reasons I do not recall. He insisted on renaming them “Monads” in his book called “Categories for the Working Mathematician”. I guess his terminology has won out.
It is nice to see these interesting creatures coming back into play. I wish I were young enough to understand the connection between them and Computer Science.
Just thought I’d mention it…..
By the way, are you by chance related to Ann Fleury? Ann used to be a friend of mine in college, but we lost touch after graduation.
My wife is named Ann Fleury. She has degrees from Rutgers, Illinois and Wisconsin. Perhaps you met her at one of those places.
…Or, perhaps there are two people with the same name.