Now that things are settling down a little bit, I wanted to get back to the stuff I was writing about π-calculus. But looking back at what I’ve already written, I think I did a terrible job of introducing it. So I’m going to start over, and try to be more clear.
I’ll start with a basic refresher of what π-calculus is for, and a bit of history for where it came from.
When computer networks first appeared and came into relatively widespread use, one of the problems that smart people recognized quite quickly is that communicating systems are fundamentally different from the kinds of computing systems that most of us are familiar with. The tools that we use to model conventional computing systems fail to capture many of the relevant properties of computation in a distributed world; and in fact, worse than just not capturing things, they often get things actively wrong.
So – some people started looking at more useful and/or realistic ways of describing
communicating systems – things called process calculi. The first major process calculus that I know of that’s still used today is CSP (Communicating Sequential Processes), first described in 1978 by C. A. R. Hoare. CSP became the basis of Occam, the programming language of a parallel processor called the transputer.
Just two years later, in 1980, Robin Milner proposed something similar
to CSP called CCS – the Calculus of Communicating Systems. CCS had a stronger formal
basis than CSP, and was more flexible in some ways. It was a very widely respected and widely used tool – in particular, it became the basis of a major tool for the formal description of network communication protocols, called LOTOS. But it had one
major problem: it was limited to static communication structures. That is, it described communication over a specific, fixed network of communication
channels. Using CCS, you could not add channels, remove channels, or rearrange the
interconnections of communication channels in a system.
The π-calculus is an extension of CCS that addresses that limitation, by
replacing static communication channels with channel names. Instead of using a
predefined network of communication channels, the π-calculus introduced the notion of a
dynamic system of named channels, and communication instructions in the π-calculus work
in terms of channel names, not specific channels. Channel names can be created
dynamically, assigned to variables, and passed in messages. With the addition of channel
names, π-calculus could be used to describe many more kinds of communicating systems
much more easily than could be done in the systems that preceded it.
So, finally, let’s get down to some syntax. There are only a small number of
basic forms in π-calculus. I’m not going to use Milner’s syntax – it’s too hard to write in HTML. But I’m going to try to be more consistent than last time around. A π-calculus expression is called a process. Processes are written using the following forms:
- Send Message:
!name(values).Process
- If there is a matching receiver waiting, send the values as a message on the channel named by the value of the name.
- Receive Message:
?name(values).Process
- If there is a matching sender, receive the values that it’s sending as a message on the channel named by the value of name.
- Create Name:
new(vars).Process
- Create a new set of channel names, and assign them to the variables. The variable names are scoped to the rest of the process definition.
- Parallel Composition:
Process1 | Process2
- Run both Process1 and Process2 at the same time.
- Choice Composition:
Process1 + Process2
- Non-deterministically choose to run either Process1 or Process2.
- Replication:
*(Process)
- Spawn as many copies of a process as necessary. “*(P)” is equivalent to “P | *(P)”.
- Null:
∅
- An empty process. The null process is mainly used to express process termination.
In addition to those basic structures, I’ll also use “{}”s for grouping.
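The calculus itself is just notation, but its central idea – channel names as first-class values that can be stored in variables and sent over other channels – maps directly onto ordinary concurrent code. As a rough illustration (not part of the calculus, and all of the names here are my own), here is a minimal Python sketch in which a fresh “channel” is created and passed over another channel, the way `new(reply)` followed by `!request(reply)` would be in π-calculus:

```python
import threading
from queue import Queue

# A "channel" here is just a Queue: a first-class object that, like a
# pi-calculus name, can be stored in a variable and sent in a message.
request = Queue()  # plays the role of a channel name both sides already know

def server():
    # ?request(reply).!reply(...).0 -- receive a channel name, answer on it
    reply = request.get()          # the received value is itself a channel
    reply.put("hello from the server")

threading.Thread(target=server, daemon=True).start()

# new(reply) -- create a fresh, private channel name...
reply = Queue()
# ...and send it in a message, pi-calculus style: !request(reply)
request.put(reply)
answer = reply.get()
print(answer)  # prints "hello from the server"
```

The point of the sketch is only the shape of the interaction: the server never knew the `reply` channel existed until it received its name in a message, which is exactly the kind of dynamic communication structure that CCS could not express.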
As usual for a calculus, we also need to look at reduction rules, the rules by which steps in a π-calculus process get evaluated. In π-calculus, reduction is written using →: “A” is reducible to “B” in one step is written “A → B”. The reduction rules are:
- The Fundamental Rule:
( !x(z).P | ?x(y).Q ) → P | (Q[z/y])
- This rule basically says that given a matching pair of message send and message receive in parallel processes, both the send and the receive can be executed in one step, and the values in the message replace the parameters named in the receiver. (Q[z/y] means “the process Q with every reference to name ‘y’ replaced with a reference to name ‘z’”.)
- The Choice Rule: If P → Q, then (P + R) → Q. This one is a little bit tricky. What it says is: if you have a choice between either process P or process R, and there is a one-step reduction of P to Q, then you can choose P and do that reduction.
- The Parallelism Rule: If P → Q, then (P | R) → (Q | R). All this says is that if there are two processes running in parallel, you can do a one-step reduction of one of them without affecting the other.
- The Replication Rule: If P → Q, then *(P) → Q | *(P). This is just a restatement of what I said earlier about the meaning of replication.
- The Name Binding Rule: If P → Q, then new(x).P → new(x).Q. This just says that introducing a new name around a process doesn’t change the reducibility of the processes inside it.
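To make the fundamental rule concrete, here is a toy Python sketch of a single reduction step of ( !x(z).P | ?x(y).Q ) into P | Q[z/y]. The tuple encoding of processes is entirely my own invention for illustration, not any standard representation, and the substitution is deliberately naive:

```python
# Processes are tuples: ("nil",), ("send", chan, val, cont),
# ("recv", chan, param, cont), or ("par", left, right).

def substitute(proc, old, new):
    """Replace every reference to name `old` in `proc` with `new`.
    (This naive version ignores variable capture entirely.)"""
    if proc == ("nil",):
        return proc
    tag = proc[0]
    if tag in ("send", "recv"):
        _, chan, val, cont = proc
        chan = new if chan == old else chan
        val = new if val == old else val
        return (tag, chan, val, substitute(cont, old, new))
    if tag == "par":
        return ("par", substitute(proc[1], old, new),
                       substitute(proc[2], old, new))
    raise ValueError(f"unknown process form: {tag}")

def fundamental_rule(p):
    """( !x(z).P | ?x(y).Q ) -> P | Q[z/y]"""
    left, right = p[1], p[2]
    if left[0] == "send" and right[0] == "recv" and left[1] == right[1]:
        z, P = left[2], left[3]   # the value sent, and the sender's continuation
        y, Q = right[2], right[3] # the receiver's parameter and continuation
        return ("par", P, substitute(Q, y, z))
    return p  # no matching send/receive pair: nothing to reduce

# ( !x(hello).0 | ?x(m).!out(m).0 )  ->  ( 0 | !out(hello).0 )
proc = ("par",
        ("send", "x", "hello", ("nil",)),
        ("recv", "x", "m", ("send", "out", "m", ("nil",))))
result = fundamental_rule(proc)
print(result)  # -> ('par', ('nil',), ('send', 'out', 'hello', ('nil',)))
```

Note how the message value “hello” has replaced the receiver’s parameter “m” everywhere in the receiver’s continuation, which is exactly what Q[z/y] describes.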
For today, we’ll just do one simple example. Suppose that we’ve got a communication channel named “out”, and any message received by “out” is printed to the standard output. Then we can write a hello world process:
new(in).( *(?in(x).!out(x).∅) | !in(hello).!in(world).∅ )
There are two processes composed in parallel. One is a replicated process – so there are conceptually an infinite number of copies of this instantiated. Each instantiated copy receives one message on the channel named “in”, and then sends it to out. The other process in here sends two messages to “in”: “hello”, and “world”.
Here’s an exercise for interested readers. What’s wrong with this process?
As for what’s wrong, I think that you mean that there’s no guarantee of the order in which ‘hello’ and ‘world’ are sent on out.
as in:
[original formula] -> new(in).( (?in(x).!out(x).0)
|*(?in(x).!out(x).0)
| !in(hello).!in(world).0)
-> new(in).( (?in(x).!out(x).0)
| (?in(x).!out(x).0)
|*(?in(x).!out(x).0)
| !in(hello).!in(world).0)
-> new(in).( !out(hello).0
| (?in(x).!out(x).0)
|*(?in(x).!out(x).0)
| !in(world).0)
-> new(in).( !out(hello).0
| !out(world).0
|*(?in(x).!out(x).0))
and then either of [!out(hello).0, !out(world).0] could reduce
… except, of course, that there’s no out, so that reduction is a bit tricky
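That non-determinism is easy to reproduce in real concurrent code. Here is a rough Python sketch of the hello-world process (my own rendering, with queues standing in for channels): the two instantiated copies of the replicated receiver are played by two threads, so the set of messages that reach the out channel is fixed, but their order is not:

```python
import threading
from queue import Queue

in_chan, out_chan = Queue(), Queue()

def copy_one():
    # one instantiated copy of ?in(x).!out(x).0
    out_chan.put(in_chan.get())

# two copies of the replicated receiver, one per message
workers = [threading.Thread(target=copy_one) for _ in range(2)]
for w in workers:
    w.start()

# !in(hello).!in(world).0 -- the sends themselves happen in order...
in_chan.put("hello")
in_chan.put("world")
for w in workers:
    w.join()

# ...but each message raced to a different copy of the receiver, so the
# order in which they arrive on the out channel is not guaranteed.
printed = [out_chan.get(), out_chan.get()]
print(printed)  # either ['hello', 'world'] or ['world', 'hello']
```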
That was great, but there are a couple things I’m not sure of:
What do the dots represent? They seem like sequencing or scoping or both, but I’m not sure.
Is there anything that helps deal with concurrency itself? The reductions are good because they can turn a complicated expression into a simple one, but even a simple one can be hard (like a race condition in Hello World).
There isn’t a rule that says “if (P|Q) -> R, then (*P|Q) -> (*P|R)” — and without that, the process as written can’t reduce. Replication as written applies only when the replicated process reduces when standing alone, needing no other process. Worse, if P does reduce when taken alone, *P is non-terminating; it can always copy P, even if no other process exists to interact with it.
Great post! This introduction is really much better than the previous.
I agree with Clayton’s reduction proof, and with the conclusion that the problem in this specification is that !out(hello).0 and !out(world).0 are parallel processes, and so there’s no way to tell which will be printed first. Theoretically speaking, they could be printed at the same time.
I only disagree with the last step of the reduction, in which you assume that there is a simplification rule that allows us to say that (P|0) -> P. It makes sense… but we have to play by the rules.
Maybe Mark could show us some light on this one…
Michael, I have to disagree with your comment that there must be a rule such as “if (P|Q) -> R, then (*P|Q) -> (*P|R)”. We have all the rules we need to reduce any pi-calculus expression.
I think that we even have a redundant rule, since “replication rule” is a shortcut that combines “replication definition” and “parallelism rule”.
I think technically that *P -> P | *P cannot be derived from the replication and parallelism rules unless you have a rule for introducing and removing 0.
I also think there are some additional rules of this kind. The reduction rules given are only the, well, reducing ones, but there are others that simply rearrange expressions reversibly, such as commutativity and associativity of | and +. Most of them are just common sense I guess, but there is one kind that I recall is of fundamental importance: The rules for renaming and moving new(v) outwards.
Basically and off the top of my head, the following may be interchanged freely, provided v and w do not occur (freely) in P:
P | new(v).Q = new(v).(P | Q)
P + new(v).Q = new(v).(P + Q)
new(v).P = new(w).P[w/v] (α-rule)
The importance of these rules is that they allow you to send newly created channel names to other processes that were not originally within the scope of the new. Also, both here and presumably in the fundamental rule, the P[t/v] construct is so-called capture-avoiding substitution, as in λ-calculus: basically, you first use the α-rule recursively within P to rename any new whose variables may be confused with those in t. Or perhaps to put this differently, these rules allow you not to distinguish between variables and the channels/names they refer to. Now that I think about it, perhaps Mark introduced that distinction so that he didn’t have to confuse us with these rules, which seem more like symbol shuffling than actual semantics.
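For what it’s worth, here is a toy Python sketch of how the α-rule makes substitution capture-avoiding. The encoding is entirely made up for illustration: a process is either a new-binder or just a tuple of the names it mentions:

```python
import itertools

# Source of fresh names for alpha-renaming: v0, v1, v2, ...
fresh = (f"v{i}" for i in itertools.count())

def substitute(proc, old, new):
    """Capture-avoiding substitution proc[new/old]."""
    if proc[0] == "new":
        _, v, body = proc
        if v == old:
            return proc  # `old` is shadowed by the binder; nothing to do
        if v == new:
            # The binder would capture `new`: alpha-rename it first.
            w = next(fresh)
            body = substitute(body, v, w)
            v = w
        return ("new", v, substitute(body, old, new))
    # A plain body: just a tuple of the names it mentions.
    return tuple(new if n == old else n for n in proc)

# Substituting a for x inside new(a).(x, a): naively replacing x with a
# would confuse the free a with the bound one, so the binder gets renamed.
p = ("new", "a", ("x", "a"))
q = substitute(p, "x", "a")
print(q)  # ('new', 'v0', ('a', 'v0'))
```

The free “a” that was substituted in and the renamed bound name stay distinct, which is the whole point of the α-rule.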
Daniel: yes, the “replication rule” is redundant, provided that *P and P|*P are defined as equivalent processes. With the rule I gave, “*P == P|*P” can be left out of the equivalence relation; replication’s definition is expressed wholly in the rule.
Mark, can a replicated process *(P) spawn 0 copies of P?
If not, many instances of *(P) would probably actually have to be “∅ + *(P)”.
As well, if *(P) can’t eliminate itself, how does any process using it ever terminate?
Mark, could you recommend a good introduction to parallel programming, something that would cover the issues that the Pi Calculus is trying to solve?
I am interested in learning about the π-calculus. I need a very, very basic introduction to it, its grammar, and how I can use it, with examples. If you know of one, please kindly send me the link, PDF file, or web site. I found some material on the net, but it was very complicated.
thanks.
omar