For todays dose of pathological programming, we’re going to hit the worlds simplest language. A Turing-complete programming language with exactly *two* characters, no variables, and no numbers. It’s called [Iota][iota]. And rather than bothering with the rather annoying Iota compiler, we’ll just use an even more twisted language called [Lazy-K][lazyk], which can run Iota programs, Unlambda programs, as well as its own syntax.
[unlambda]: http://scienceblogs.com/goodmath/2006/08/friday_pathological_programmin_3.php
[lazyk]: http://esoteric.sange.fi/essie2/download/lazy-k/
[Iota]: http://ling.ucsd.edu/~barker/Iota/
Category Archives: Programming
Ultimate Spaghetti Coding with Linguine
Todays dose of programming pathology is a monstrosity called *Linguine*. Linguine is a language that (obviously) carries spaghetti code to an extreme. You can see the [language specification][linguine-spec], or download an [implementation with examples][linguine-impl].
It’s really a remarkably simple language. There are only 10 commands, including input and output:
1. “`x=y`”: assign the value of y to the variable x.
2. “`x+y`”: assign the value of x+y to the variable x.
3. “`x-y`”: assign the value of x-y to the variable x.
4. “`x|y`”: assign the value of x NAND y to the variable x.
5. “`x>y`”: assign the value of shifting x by y bits to x. If y is positive, then it shifts y bits right; if y is negative, then it shifts -y bits left.
6. “`x?`”: Input one character and store it in the variable x.
7. “`x$`”: Output the content of register x as a character.
8. “`x#`”: Output the content of register x as an integer.
9. “`x<y:loc`”: if x < y then jump to the program line labeled “loc”.
10. “`x~y:loc`”: if x = y then jump to the program line labeled “loc”.
Command operands on either integer literals, or on the contents of an unlimited set of numbered registers each of which contains an integer. Valid operands are either an integer (which identifies a register number if it appears on the left, or a literal value on the right); or an operand preceeded by “*”, which performs an indirection on the value it precedes. So for example, “*3” is the contents of register three. “**3” is the contents of the register whose number is stored in register 3, etc. Any of the operands may be any of these forms, so for example, “1~*3:**4” is a valid command.
A linguine statement consists of three parts: a line label, a list of commands, and a branch target, written “label[commands]target”. When executed, the line evaluates each command in the list in sequence. If it reaches the end of the list without having branched, then it will branch to the statement labeled by the branch target.
The lines of the program may appear in any order. The first statement executed will be the statement whose label is smallest. Branching to 0 ends program execution.
Comments start with a “‘”, and extend to the end of the line.
All quite simple, right?
As usual, we’ll start with hello world.
1[0=72,0$,0+29,0$,0+7,0$,0$,0+3,0$,1=32,1$,0-24,0$,0+24,0$,0+3,0$,0-6,0$,0-8,0$,1+1,1$,1-23,1$]0
That’s perfectly clear, isn’t it? Actually, it’s basically the same as the hello world program we saw in brainfuck. We’ll piece it apart.
* 0=72,0$
: 72 is the ascii code for “H”. So store that in register 0. Then output register 0.
* 0+29,0$
: 101 is the ascii code for “e”. So add 29 to the contents of register 0, which will make the new value of register 0 be 101. Then output that.
* etc.
How about something a tad more interesting? Let’s look at generating the
Fibonacci sequence:
1[0=32,2=1,1#,0$,2#]2
2[1+*2,3=*1,1=*2,2=*3,0$,2#]2
* This starts with line 1:
* “0=32
: Store the ascii code for a whitespace in register 0.
* “`2=1`”: store the value 0 in register 2.
* “`1#`”: output the value of register 1. (Since registers contain 0 when the program starts, that prints a zero.)
* “`0$`”: output the contents of register 0 as a character, which prints a whitespace.
* “`2#`”: print the contents of register 2 (1).
* Finally, branch to statement 2.
* Statement 2: registers 1 and 2 contain the previous two fibonacci numbers. So this adds them together to get the next one, and then rotates the values of the registers so that the newest fibonacci number is in register 2, and the one before that is in register one.
* “`1+*2`”: add the value of register 2 to register 1.
* “`3=*1`”: store the value of register 1 in register 3.
* “`1=*2`”: store the value of register 2 in register 1.
* “`2=*3`”: store the value of register 3 in register 2.
* “`0$,2#`”: output a space, and then output the value of register 2.
* Finally, the branch instruction repeats statement 2, to keep generating fibonacci numbers.
Here’s a nice little multiply program, which demonstrates how to write subroutines. Note that recursion doesn’t work here, because we need to use fixed register numbers. (You can set up recursion using additional levels of indirection, it’s just messy as heck.)
‘Multiply: -2 *= *-3
‘Programmed by Jeffry Johnston, 2005
‘-1=return jump, -4=temp, -5=temp
100[-4=*-3,-5=*-2,-2=0,-4<0:101]102
101[-4~0:*-1,-2-*-5,-4+1]101 '-3 is negative
102[-4~0:*-1,-2+*-5,-4-1]102 '-3 is positive
This takes the address of the instruction to return to in register -1; it takes the left operand of multiple (which is also the target of the result) in register -2, and the right operand in register -3. Registers -4 and -5 are used for temporary storage.
* Line 100: set up, and determine whether the right operand is positive or negative.
* "`-4=*-3,-5=*-2`": Copy the contents of register -3 to register -4, and the contents of register -2 to register -5.
* "`-2=0`": Store 0 in register -2.
* "`-4<0:101`": If the contents of register -4 is less than 0, then branch to statement 101.
* If you reach the end of this line, then the contents of register -4 is non-negative, and you branch to statement 102.
* Line 101: do multiplication by repeated subtraction
* "`-4~0:*-1`": If the contents of register -4 is zero, then we're done, so return by branching to the line labeled by the contents of register -1.
* "`-2-*-5`": Subtract the contents of register -5 from the contents of register -2 (the result register).
* "`-4-1`": decrement (or increment, depending on your point of view) the contents of register four, to show that you've done one subtraction.
* Repeat the line.
* Line 102: basically exactly the same as 101, except that the does a repeated add rather than a subtract.
An example of calling this would be:
1[-1=2,-2=36,-3=13]200
2[-2#]0
This stores 36 and 13 as the parameters to the multiple subroutine, and then invokes it by branching to it at line 200. The "return address" is set to statement 2 by storing 2 in register -1. When control returns to statement 2, the result of the multiplication is printed out.
One last example, which you can puzzle out on your own. A brainfuck interpreter in 21 lines of Linguine:
'BF interpreter in Linguine
'Programmed by Jeffry Johnston, 2005
1[*-2?,*-2~33:2,-2+1]1
2[*-2=-1,-2+1]3
3[-3=**-1,-3~-1:0,-3~43:4,-3~44:6,-3~45:7,-3~46:9,-3~60:10,-3~62:11,-3~91:12,-3~93:14]15
4[*-2+1,*-2~256:5]15 '+
5[*-2=0]15
6[*-2?,*-2~-1:5]15 ',
7[*-2-1,*-2~-1:8]15 '-
8[*-2=255]15
9[*-2$]15 '.
10[-2-1]15 '
12[*-2~0:13]15 ‘[
13[-3=1]16
14[-3=-1]16 ‘]
15[-1+1]3
16[-4=1]17
17[-1+*-3,*-1~91:18,*-1~93:19]20
18[-4+*-3]20
19[-4-*-3]20
20[-4~0:21]17
21[-3~1:15]3
[linguine-spec]: http://kidsquid.com/files/linguine/README
[linguine-impl]: http://kidsquid.com/files/linguine/linguine.tar.gz
Beautiful Insanity: Pathological Programming in SNUSP
Todays programming language insanity is a real winner. It’s a language called SNUSP. You can find the language specification [here][snuspspec], a [compiler][snuspcomp], and [an interpreter embedded in a web page][snuspinterp]. It’s sort of like a cross between [Befunge][befunge] and [Brainfuck][brainfuck], except that it also allows subroutines. (And in a variant, *threads*!) The real beauty of SNUSP is its beauty: that is, programs in SNUSP are actually really quite pretty, and watching them run can be positively entrancing.
SNUSP keeps its data on a tape, like Brainfuck. The basic instructions are very Brainfuck like:
1. “” move the data tape pointer one cell to the right.
3. “+” add one to the value in the current data tape cell.
4. “-” subtract one from the value of the current data tape cell.
5. “,” read a byte from stdin to the current tape cell.
6. “.” write a byte from the current tape cell to stdout.
7. “!” skip the next instruction.
8. “?” skip the next instruction if the current tape cell contains zero.
Then there’s the two-dimensional control flow. There aren’t many instructions here.
1. “/” bounce the current control flow direction as if the “/” were a mirror: if the program is flowing up, switch to right; if it’s flowing down, switch to left; if it’s flowing left, switch to down; and if its flowing right, switch to up.
2. “” bounce the other way; also just like a mirror.
3. “=” noop for drawing a path flowing left/right.
4. “|” noop for drawing a path flowing up/down.
So far, we’ve pretty much got a straightforward mix between Brainfuck and Befunge. Here’s where it becomes particularly twisted. It also has two more
instructions for handling subroutines. There’s a call stack which records pairs of location and direction, and the two instructions work with that:
1. “@” means push the current program location and the current direction onto the stack.
2. “#” means pop the top of the stack, set the location and direction, and *skip* one cell. If there is nothing on the stack, then exit (end program).
Finally, the program execution starts out wherever there is a “$”, moving to the right.
So, for example, here’s a program that reads a number and then prints it out twice:
/==!/===.===#
| |
$====,===@/==@/====#
So, it starts at the “$” flowing right. Then gets to the “,”, and reads a value
into the current tape cell. It hits the first “@”, records the location and direction on the stack. Then it hits the “/” mirror, and goes up until it hits the “/” mirror, and turns right. It gets to the “!” and skips over the “/” mirror, then the “.” prints, and the “#” pops the stack. So it returns to the
first “@”, skips over the “/” mirror, and gets to the second “@”, which pushes the stack, etc.
Here’s a simple subroutine for adding 48 to a cell:
=++++++++++++++++++++++++++++++++++++++++++++++++#
Or a slight variant:
/=+++++++++++++++++++++++++
| #+++++++++++++++++++++++/
|
$
Or (copying from the language manual), how about this one? This one starts to give you an idea of what I like about this bugger; the programs just *look* cool. Writing a program in SNUSP can be as much art as it is programming.
#//
$===!++++
/++++/
/=++++
!///
One last +48 subroutine,
123 4
/=@@@+@+++++#
|
$
This last one is very clever, so I’ll walk through it. The “1234” on the top are comments; any character that isn’t an instruction is ignored. They’re there to label things for me to explain.
The program goes to @1. It pushes the loc/dir on the stack. Then it gets to @2, pushed again. (So now the stack is “@1right,@2right”). Then @3, push (stack=”@1right,@2right,@3right”). Then add one to the cell. Push again (stack=@1right,@2right,@3right,@4right”). Then 5 “+”s, so add 5 to the cell. So we’ve added 6. Then we hit “#”, so pop, return to @4, skip one cell. So 4+s get executed. So we’ve added 10. Then pop again (so stack is “@1right,@2right”), return to @3, skip one instruction. So we’re back to @4; push (stack=@1right,@2right,@4right). Add five (we’re up to “+15”), and pop the stack, which brings us back to @4 again, and skip one cell, so now add another 4 (+19). Pop (stack=@1right), and we’re at @2. Skip one instruction, so we jump over @3. Then add one (+20), and repeat what happened before when we first got to “@4”, adding another 9 (+29). Pop again (so stack is empty), skip one instruction, so we’re at @3. Skip, push, repeat from @4 (+38). Pop back to @2, skip @3, add one (+39), push @4, repeat the same thing from @4 again (+48).
Here’s a real beauty: Multiplication, with documentation. If you look at it carefully, it’s actually reasonably clear how it works! Given this instruction set, that’s *truly* an amazing feat.
read two characters ,>,== * /=================== ATOI ———-
convert to integers /=/@</@=/ * // /===== ITOA ++++++++++ /———-/
multiply @ =!=========/ // /++++++++++/ ———-
convert back !/@!============/ ++++++++++ /———-/
and print the result / .# * /++++++++++/ ——–#
/====================/ * ++++++++#
|
| /- #/?=<<<>>> />>+<+>>>+/ // /======== BSPL2 !======?/#
| /->+< /===|=========== FMOV4 =/ // /<+>-
| #?===/! FMOV1 =|===|============== /====/ /====== FSPL2 !======?/#
| /==|===|==============|==|=======/
| * * *|* | * | * * * * * * *|* | * * * /+@/>@/>>=== /====>>@<@=?/@==<#<<<== !<<<>>>-?/ * // /-
* \ /@========|======!/?>=!/?>!/=======?<<<#
| -/>>+<<+-?+>?/
\!=</</
=== 0 and y = 0
or A(x-1,A(x,y-1)) if x > 0 and y > 0
The value of Ackerman’s function grows like nothing else. A(4, 2) is about 2×1019728.
Here’s Ackerman’s in SNUSP:
/==!/==atoi=@@@-@—–#
| |
| | /=========!==!==== ** recursion **
$,@/>,@/==ack=!? j+1
j i -@/# | | A(i,0) -> A(i-1,1)
@>@->@/@ A(i-1,A(i,j-1))
# # | | |
/-<>!=/ =====|==@>>>@< 0) ? ? | | |
>>+<>+>+<<>>+<<<-/
[snuspspec]: http://www.deepwood.net/~drlion/snusp/snusp-1.0-spec-wd1.pdf
[snuspcomp]: http://www.baumanfamily.com/john/esoteric.html
[snupsinterp]: http://www.quirkster.com/snusp/snusp-js.html
[brainfuck]: http://scienceblogs.com/goodmath/2006/07/gmbm_friday_pathological_progr.php
[befunge]: http://scienceblogs.com/goodmath/2006/07/friday_pathological_programmin.php
Friday Pathological Programming: Unlambda, or Programming Without Variables
Todays tasty treat: a variable-free nearly-pure functional programming language: Unlambda.
Unlambda is based on the SKI combinator calculus. The [SKI calculus][ski] is a way of writing lambda calculus without variables, and without lambda. It’s based on three combinators: S, K, and I:
1. S = λ x y z . x z (y z)
2. K = λ x y . x
3. I = λ x . x
Given only S and K, you can actually write *any* lambda calculus expression – and therefore, you can write any computable function using nothing but S and K. I is often added for convenience; it’s just a shorthand for “SKK”.
You can do recursion in SKI; the Y combinator of lambda calculus is:
Y = S S K (S (K (S S (S (S S K)))) K)
Unlambda is based on SKI; but it extends it with a few additional kinds of sugar to make things easier to write. (Nothing makes Unlambda easier to *read*!) The basic idea is that *everything* in unlambda is a function; and in particular, everything is a function that takes a *single* variable: you don’t need more than one parameter, because you can always using [currying][curry] to express any multiparameter function as a series of single-parameter functions.
(As a refresher, if you have a two-parameter function in lambda calculus, like “λ x y . x + y”; *currying* is translating it into a one parameter function that returns a one parameter function: “λ x .(λ y. x + y)”.)
There are eight basic instructions in Unlambda:
1. ““”` : the application operator. “`’AB`” applies the function `A` to the parameter `B`. You can think of this as being something like the “(” in Lisp; there is no close paren because since every function takes only one parameter, there is no *need* for a close paren. And hey, adding a syntactically unnecessary close paren would only make the code easier to read, and we can’t have that!
2. “`k`” : the K combinator.
3. “`s`” : the S combinator.
4. “`i`” : the I combinator.
5. “`v`” : a “drop” combinator: takes a parameter, discards it, and returns “`v`”
6. “`d`” : the “delay” operator; a way of doing *lazy evaluation* in Unlambda. Unlambda normally evaluates the argument before it
applies a function. `’dA` is the same thing as `A`, except that `A` isn’t evaluated until “`’dA`” is applied to something else.
7. “`c`” : the “call-with-current-continuation” combinator. This is equivalent to [“call/cc”][callcc] in Scheme. What it does is call some function with the current state of the computation captured as a function as its parameter. What this means is that “`’cAB`” calls “`A`”. “`A`” receives the value “`cAB`” as a parameter; during A, it can return a value, *or* it can call the continuation, which will evaluate “`cAB`” again. So this is the function-based equivalent of a loop. “c” turns the “loop body” into a function; so at the end of the loop, you re-invoke the continuation with a new parameter to run the next iteration.
8. “`.`” : The “.” operator is actually a meta-operator. “`.x`” is a function which returns its parameter, *and* prints the character “x”.
9. “`r`” is a shorthand for “`.CR`”, where CR is the carriage return character.
And that is it. The entire language.
Note that there are no numbers! We need to build numbers using church numerals.
* The church numeral “0” is “λ f. (λ x . x)”. In Unlambda, that’s “`’ki`”.
* The church numeral “1” is “λ f . (λ x . f x)”; in unlambda, that’s “`i`”.
* The church numeral “2” is “λ f . (λ x. f f x)”; or “`”s”s’kski`”.
* The church numeral “3” is “λ f . (λ x . f f f x)”; or “`”s”s’ksk”s”s’kski`”.
* And so on.. Just keep appending “`”s”skski`” for each successive number.
So… On to a couple of programs!
Hello world in Unlambda is:
`r“““““`.H.e.l.l.o. .w.o.r.l.di
It’s pretty straightforward. It’s actually sort of easier to understand if you look from right to left. The “`.H.e.l.l.o…`” is basically a series of “`i`” combinators, which print out the characters of hello world. The trailing “`i`” is there because “`.d`” needs a parameter in order to do anything. And the list of “`’`”s at the beginning are there to make the “.” functions be applied. Finally, apply “`’r`” to the whole thing – which prints a newline at the end.
Next, a counter: starts with “N=1”; print “N” asterisks, add one, and then repeat.
“r`ci“s`k`c“s“s`ksk`kr.*
If you look at it, you can see the number pattern in there; “`”s”s’ksk`”. That fragment is an SKI function to add one. You capture a continuation for the main iteration; run “`i”s’k’cN`”; that’s a fragment that adds “1” to N, and then prints N. Since you’ve captured the continuation, you can re-invoke it on 1+n; the rest captures *another* continuation which is the inner loop for printing “*”s; it does that, and then it invokes the out continuation to go back and do it for “N+1”.
One last, which I’ll leave for you to trace through yourself: this one generates the fibonacci sequence; and for each element of it, it prints out the numbers using a line of asterisks containing “N” asterisks for the number “N”:
“`s“s“sii`ki
`k.*“s“s`ks
“s`k`s`ks“s“s`ks“s`k`s`kr“s`k`sikk
`k“s`ksk
I’ll also point out that there are a ton of variants on Unlambda, ranging from the really interesting ([LazyK][lazyk]) to the bizzarely minimal ([Iota][iota]) to the downright goofy ([Jot][jot]).
[callcc]: http://community.schemewiki.org/?call-with-current-continuation
[ski]: http://goodmath.blogspot.com/2006/05/from-lambda-calculus-to-combinator.html
[curry]: http://goodmath.blogspot.com/2006/05/my-favorite-calculus-lambda-part-1.html
[iota]: http://ling.ucsd.edu/~barker/Iota/
[lazyk]: http://homepages.cwi.nl/~tromp/cl/lazy-k.html
[jot]: http://ling.ucsd.edu/~barker/Iota/#Goedel
Friday Pathological Programming: Programming Through Grammars in Thue
Today’s treat: [Thue][thue], pronounced two-ay, after a mathematician named Axel Thue.
Thue is a language based on [semi-Thue][semi-thue] grammars. semi-Thue grammars are equivalent to Chosmky-0 grammars, except that they’re even more confusing: a semi-Thue grammar doesn’t distinguish between terminal and non-terminal symbols. (Since Chomsky-0 is equivalent to a turing machine, we know that Thue is turing complete.) The basic idea of it is that a program is a set of rewrite rules that tell you how to convert one string into another. The language itself is incredibly simple; programs written in it are positively mind-warping.
There are two parts to a Thue program. First is a set of grammar rules, which are written in what looks like pretty standard BNF: “*<lhs> ::= <rhs>*”. After that, there is a section which is the initial input string for the program. The two parts are separated by the “::=” symbol by itself on a line.
As I said, the rules are pretty much basic BNF: whatever is on the left-hand side can be replaced with whatever is on the right-hand side of the rule, whenever a matching substring is found. There are two exceptions, for input and output. Any rule whose right hand side is “:::” reads a line of input from the user, and replaces its left-hand-side with whatever it read from the input. And any rule that has “~” after the “::=” prints whatever follows the “~” to standard out.
So, a hello world program:
x ::=~ Hello world
::=
x
Pretty simple: the input string is “x”; there’s a replacement for “x” which replaces “x” with nothing, and prints out “Hello world”.
How about something much more interesting. Adding one in binary. This assumes that the binary number starts off surrounded by underscore characters to mark the beginning and end of a line.
1_::=1++
0_::=1
01++::=10
11++::=1++0
_0::=_
_1++::=10
//::=:::
::=
_//_
Let’s tease that apart a bit.
1. Start at the right hand side:
1. “1_::=1++” – If you see a “1” on the right-hand end of the number,
replace it with “1++”, removing the “_” on the end of the line. The rest
of the computation is going to use the “++” to mark the point in
string where the increment is going to alter the string next.
2. “0_::=1” – if you see a “0” on the right-hand-end, then replace it with 1. This doesn’t insert a “++”, so none of the other rules are going to do anything: it’s going to stop. That’s what you want: if the string ended in a zero, then adding one to it just changes that 0 to a 1.
2. Now, we get to something interesting: doing the addition:
1. If the string by the “++” is “01”, then adding one will change it to “10”, and that’s the end of the addition. So we just change it to 10, and get rid of the “++. ”
2. If the string by the “++” is “11”, then we change the last digit to “0”, and move the “++” over to the left – that’s basically like adding 1 to 9 in addition: write down a zero, carry the one to the left.
3. Finally, some termination cleanup. If we get to the left edge, and there’s a zero, after the “_”, we can get rid of it. If we get to the left edge, and there’s a “1++”, then we replace it with 10 – basically adding a new digit to the far left; and get rid of the “++”, because we’re done.
3. Finally, inputting the number “//::=:::” says “If you see “//”, then read some input from the user, and replace the “//” with whatever you read.
4. Now we have the marker for the end of the rules section, and the string to start the program is “_//_”. So that triggers the input rule, which inputs a binary number, and puts it between the “_”s.
Let’s trace this on a couple of smallish numbers.
* Input: “111”
1. “_111_” → “_111++”
2. “_111++_” → “_11++0”
3. “_11++0” → “_1++00”
4. “_1++00” → “1000”
* Input: “1011”
1. “_1011_” → “_1011++”
2. “_1011++” → “_101++0”
3. “_101++0” → “_1100”.
Not so hard, eh?
Decrement is the same sort of thing; here’s decrement with an argument hardcoded instead of reading from input:
0_::=0–
1_::=0
10–::=01
00–::=0–1
_1–::=@
_0–::=1
_0::=
::=
_1000000000000_
Now, something more complicated. From [here][vogel], a very nicely documented program that computes fibonacci numbers. It’s an interesting mess – it converts things into a unary form, does the computation in unary, then converts back. This program uses a fairly neat trick to get comments. Thue doesn’t have any comment syntax. But they use “#” on the left hand side as a comment marker, and then make sure that it never appears anywhere else – so none of the comment rules can ever be invoked.
—-
#::= fibnth.t – Print the nth fibonacci number, n input in decimal
#::= (C) 2003 Laurent Vogel
#::= GPL version 2 or later (http://www.gnu.org/copyleft/gpl.html)
#::= Thue info at: http://www.catseye.mb.ca/esoteric/thue/
#::= This is modified thue where only foo::=~ outputs a newline.
#::= ‘n’ converts from decimal to unary-coded decimal (UCD).
n9::=*********,n
n8::=********,n
n7::=*******,n
n6::=******,n
n5::=*****,n
n4::=****,n
n3::=***,n
n2::=**,n
n1::=*,n
n0::=,n
n-::=-
#::= ‘.’ prints an UCD number and eats it.
.*********,::=.~9
.********,::=.~8
.*******,::=.~7
.******,::=.~6
.*****,::=.~5
.****,::=.~4
.***,::=.~3
.**,::=.~2
.*,::=.~1
.,::=.~0
~9::=~9
~8::=~8
~7::=~7
~6::=~6
~5::=~5
~4::=~4
~3::=~3
~2::=~2
~1::=~1
~0::=~0
#::= messages moving over ‘,’, ‘*’, ‘|’
*<::=<*
,<::=<,
|<::=*::=*0>
0>,::=,0>
0>|::=|0>
1>*::=*1>
1>,::=,1>
1>|::=|1>
2>*::=*2>
2>,::=,2>
2>|::=|2>
#::= Decrement n. if n is >= 0, send message ‘2>’;
#::= when n becomes negative, we delete the garbage using ‘z’ and
#::= print the left number using ‘.’.
*,-::=,2>
,,-::=,-*********,
|,-::=z
z*::=z
z,::=z
z|::=.
#::= move the left number to the right, reversing it (0 becomes 9, …)
#::= reversal is performed to help detect the need for carry. As the
#::= order of rule triggering is undetermined in Thue, a rule matching
#::= nine consecutive * would not work.
*c<::=c
,c<::=c
|c
0>d*::=d
1>d::=d*********,
#::= when the copy is done, ‘d’ is at the right place to begin the sum.
2>d::=f<|
#::= add left to right. 'f' moves left char by char when prompted by a '<'.
#::= it tells 'g' to its right what the next char is.
*f
,f
#::= when done for this sum, decrement nth.
|f’ ‘g’ drops an ‘i’ that increments the current digit.
0>g::=ig
*i::=<
,i::=i,*********
|i::=’ tells ‘g’ to move left to the next decimal position,
#::= adding digit 0 if needed (i.e. nine ‘*’ in reversed UCD)
*1>g::=1>g*
,1>g::=g::=|’ tells ‘g’ that the sum is done. We then prepare for copy (moving
#::= a copy cursor ‘c’ to the correct place at the beginning of the left
#::= number) and reverse (using ‘j’) the right number back.
*2>g::=2>g*
,2>g::=2>g,
|2>g::=c|*********j
#::= ‘j’ reverses the right number back, then leaves a copy receiver ‘d’
#::= and behind it a sum receiver ‘g’.
*j*::=j
j,*::=,********j
j,,::=,*********j,
j,|::=’ sent by ‘-‘
#::= will start when hitting ‘g’ the first swap + sum cycle
#::= for numbers 0 (left) and 1 (reversed, right).
::=
|n?-|,|********g,|
——
Ok. Now, here’s where we go *really* pathological. Here, in all it’s wonderful glory, is a [Brainfuck][brainfuck] interpreter written in Thue, written by Mr. Frederic van der Plancke. (You can get the original version, with documentation and example programs, along with a Thue interpreter written in Python from [here][vdp-thue])
——
#::=# BRAINFUNCT interpreter in THUE
#::=# by Frederic van der Plancke; released to the Public Domain.
o0::=0o
i0::=0i
o1::=1o
i1::=1i
o]::=]o
i]::=]i
o[::=[o
i[::=[i
oz::=zo
oZ::=Zo
iz::=zi
iZ::=Zi
0z::=z0
1z::=z1
]z::=z]
[z::=z[
_z::=z_
0Z::=Z0
1Z::=Z1
]Z::=Z]
[Z::=Z[
_Z::=Z_
}}]::=}]}
&}]::=]&}
&]::=]^
}[::=[{
&[::=[0::=0}
{1::=1{
?Z[::=[&
?z[::=[^
?z]::=]
?Z]::=]^
[{{::={[{
[{::=[
[::=[^
]{::={]
]::={]
0::=
1::=1
0{::={0
1{::={1
^[::=?[iii
^]::=?]iii
}|::=}]|WARN]WARN
&|::=&]|WARN]WARN
WARN]WARN::=~ERROR: missing ‘]’ in brainfunct program; inserted at end
^000::=000^ooo
^001::=001^ooi
^010::=010^oio
^011::=011^oii
^100::=100^ioo
^101::=101^ioi
ooo|::=|ooo
ooi|::=|ooi
oio|::=iio|oio
oii|::=|oii
ioo|::=|ioo
ioi|::=|ioi
iii|::=|iii
iio|!::=|
|z::=z|
|Z::=Z|
o_::=_o
i_::=_i
0!::=!0
1!::=!1
_!::=!_
/!::=!/
oooX!::=Xooo
oooY::=$Y
0$::=!1
1$::=$0
X$::=X!1
ooiX!::=Xooi
ooiY::=@1Y
1@1::=@00
0@1::=@11
1@0::=@01
0@0::=@00
X@1::=X!WARNDEC0WARN
WARNDEC0WARN::=~WARNING: attempt at decrementing zero (result is still zero)
X@00::=X@0
X@01::=X!1
X@0Y::=X!0Y
oioX!::=LLYLR
0LL::=LL0
1LL::=LL1
_LL::=!X!
|LL::=|!0X!
LR0::=0LR
LR1::=1LR
LRY::=_
oiiX!::=_oii
oiiY::=X!%
%0::=0%
%1::=1%
%_0::=Y0
%_1::=Y1
%_/::=Y0_/
iooX!::=X(in)
(in)0::=(in)
(in)1::=(in)
(in)Y::=Yi
i/0::=Zi/
i/1::=zi/
i/_::=!/
XY!::=X!0Y
0Y!::=0!Y
1Y!::=1!Y
YZ::=0Y
Yz::=1Y
ioiX!::=X(out)
(out)0::=0(out)O0
(out)1::=1(out)O1
(out)Y::=OO!Y
O0::=~0
O1::=~1
OO::=~_
iiiX!0::=ZX!0
iiiX!1::=zX!1
+::=000
-::=001
::=011
,::=100
.::=101
:::=|X!0Y0_/
::=
^*
[vdp-thue]: http://fvdp.homestead.com/files/eso_index.html#BFThue
[thue]: http://esoteric.voxelperfect.net/wiki/Thue
[semi-thue]: http://en.wikipedia.org/wiki/Semi-Thue_grammar
[brainfuck]: http://scienceblogs.com/goodmath/2006/07/gmbm_friday_pathological_progr.php
[vogel]: http://lvogel.free.fr/thue.htm
Friday Pathological Programming Language: Whenever
I was hoping for a bit of a vanity post for todays pathological programming language in honor of my 40th birthday (tomorrow), but I didn’t have time to finish implementing my own little piece of insanity. So it’ll have to wait for some other occasion.
Todays pathological programming language is a really simple monstrosity called [“Whenever”][whenever]. Whenever is a programming language where programs get executed in *random* order, and there are *no* variables. The only state, the only way of manipulating information in a Whenever program is by manipulating the collection of executable statements itself: the program is both the code *and* the data at the same time.
The basic program model of Wheneveris: you write a set of statements. The statements are inserted into a grab-bag by the interpreter. The interpreter then repeatedly picks a statement out of the bag *at random* and executes it. It keeps doing that until there are no more statements in the bag.
Everything in Whenever is done by manipulating the statement bag.
Each statement is labeled by a number. The number has no direct meaning; it’s just an identifier that will be used to reference the statement. The numbers assigned to lines don’t have to match the order in which the lines were written in the source file; and they have no effect on the order by which statements are pulled out of the bag.
So how do you do anything in this crazy language?
There’s a print statement for printing things out. So, for example,
23 print(“Hello worldn”);
is the hello world program. Since there’s only one statement, it’ll get grabbed from the statement bag, and executed.
There’s also a read statement, which reads a number from standard input, and then acts as if the statement were the number that it read.
The simplest statement is just a number. A number statement *adds* a copy of the line identifier by that number to the bag. If the number is negative, then it *removes* a copy of the statement from the bag. So, for example:
23 print(“Hello worldn”);
11 23;
would print “Hello world” twice. If 23 were executed first, it would print “Hello world”, and then only 11 would be in the bag, so it would execute 11, which would add 23 to the bag. If 11 went first, then it would add 23 to the bag, and there would be two 23s in the bag, which would get executed.
You can add multiple copies of a line to the bag: 5#3 adds 3 copies of statement 5 to the bag. And you can add multiple statements to the bag at once, by separating them with a comma. So:
17 21, -2, 3#4, -5#2;
Would insert one copy of statement 21 and four copies of statement 3 to the bag; and remove one copy of statement 2, and two copies of statement 5.
You can also do any normal arithmetic operation on numbers. The result of the arithmetic operation is interpreter as a line number.
There’s also two kinds of conditionals, “defer” and “again”. Defer takes a parameter which is evaluated to a line number, and if there are any copies of that line number in the bag, then it reinserts itself, and doesn’t do anything else. If there are no copies of the parameter in the bag, then the statement on the rest of the line is executed.
Again, an example:
1 print(“Hello”);
2 defer(1) print(“Worldn”);
is a more complicated version of hello world.
The “again” statement is very similar to the “defer” statement; but if its argument is true (i.e., evaluates to a statement that is present in the bag), then it adds a copy of itself to the bag; whether the parameter is true or false, it then executes the rest of the statement.
There’s one helpful built-in function: N(x) returns the number of copies of statement x in the bag.
So, a couple of interesting example programs:
1 defer (4 || N(1)<N(2) && N(2)<N(3)) print(N(1)+" bottles of beer on the wall, "+N(1)+" bottles of beer,");
2 defer (4 || N(1)==N(2)) print("Take one down and pass it around,");
3 defer (4 || N(2)==N(3)) print(N(1)+" bottles of beer on the wall.");
4 1#98,2#98,3#98;
This first ensures that statement four runs first: statements 1, 2, and 3 will all defer until 4 has been executed. Once four is run, there are 99 copies of statements 1, 2, and 3 in the bag. The rest of the defer statement makes sure that 1 executes before 2, and 2 before 3; so it cycles through 1, 2, 3 99 times. Pretty simple, right?
How about this?
1 again (1) defer (3 || N(1)99) 2#N(1),3,7;
2 again (2) defer (3 || N(2)99) 1#N(2),3,7;
3 defer (5) print(N(1)+N(2));
4 defer (5) print(“1”);
5 4,-3,7;
6 defer (4) 3;
7 7;
8 defer (N(7)<100) -1#N(1),-2#N(2),-7#100;
9 defer (3 || 6) 1,3;
If you look carefully.. It generates the first 100 fibonacci numbers.
It's an incredibly simple language. Simple, but quite twisted, and seriously mind-warping to try to program: you need to always keep track of the fact that the statements that represent your data could get selected for execution, which will modify the data unless you used a "defer" as a guard, but then you need to make sure that the guard gets preserved correctly… It's quite devious.
[whenever]: http://www.dangermouse.net/esoteric/whenever.html
Friday Pathological Programming: Befunge, the 2-dimensional language
Today, we’re going to take a look at a brilliant language called Befunge. Befunge is the work of an evil genius named Chris Pressey.
Normal programming languages are based on a basically one-dimensional syntax; the program is a string, a sequence of characters, and it’s processed by reading that string in a straight-ahead fashion. But that’s not Befunge! Befunge is something like a two-dimensional turing machine: it says that the program and data are written on a two dimensionaltorus. Each instruction in Befunge is a single character, and where it’s located on the torus is crucial. (In case you’re not familiar with a torus, it’s what you get if you take a very flexible sheet of paper, and roll it so that you connect the top edge to the bottom, and then roll that tube so that you connect the left edge to the right. You get a donut shape where moving up from what used to be the top of the page puts you on the bottom of the page; moving left from the left edge of the page puts you on the right.) This torus is called the playfield.
The basics of computation in Befunge are pretty straightforward. It’s a stack based language. Operations take their parameters from the stack, and leave their results on the stack. Nothing too complicated there. There are arithmetic operators, IO operators, control flow operators, all operating on the values on the stack.
The arithmetic operators are the usual kinds of things: There are operators for addition (+), subtraction (-), division (/), multiplication (*), modulo (%), and logical negation (!). Digit characters are treated as operators that push the numeric value of the digit onto the stack. (So “99” will create a stack with two nines.) Comparisons are done using “`”, which pops the top two values from the stack and compares them. So if the top of the stack was “x”, and the value beneath it was “y”, then “`” would leave a “0” on the stack if x≤y, and 1 if x > y.
For IO, there are four operators. “&” reads an integer from standard input and pushes it onto the stack. “~” reads a single character from standard input, and leaves it on the stack. “.” pops a value off the stack and writes it to standard out as an integer. “,” pops a value off the stack and writes it to standard out as a character.
Of course, we can’t have a stack-based language without some stack operators: “:” makes a duplicate of the top value on the stack; “$” discards the top value on the stack; “” swaps the top two values of the stack.
So far, nothing has looked particularly pathological – in fact, nothing even looks particularly strange, other that the fact that it’s pretty illegible because of the single-character operators. But now, we get to control flow, and that is where the insanity/brilliance of Befunge reveals itself.
In Befunge, there’s a read-head that moves over the program. Each step, it executes the instruction under the head. But instead of just moving left or right, it can move left, right, up, or down. “>” is an instruction that tells the head to start moving to the right; “<" tells the head to start moving left; "^" means start moving up, and "v" means to start moving down. So, for example:
>v ^<
Is a program that runs an infinite loop: the head will just cycle over those four characters. An even more interesting infinite loop (taken from the befunge documentation) is:
>v>v >^ ^ <
Conditionals work by picking the direction that the head will move: “_” pops the stack, and if the value is zero, then it makes the head move right (“>”); if it’s non-zero, it makes the head move left (“<"). Similarly, "|" pops a value, and makes the head move up if the value was non-zero, or down if it was zero. To make things confusing, "#" means "skip the next instruction." (Actually, it's important for when a vertical and horizontal control flow cross.) And finally,
"@" is the exit command; it makes the head stop, and the program halt.
There’s also a little hack for strings. A double-quote character (“) starts a string; when one is encountered, the head keeps moving in the same direction, but instead of executing the characters as instuctions, it just pushes the character values onto the stack. When the second quote is found, it goes back to executing instructions.
Finally, just in case the basic two dimensional flow control isn’t pathological enough, there are two instructions for modifying cells on the playfield! “g” pops an X and Y value off the stack, and pushes the character at (X,Y) onto the stack; “p” pops an X, Y, and a character off the stack, and writes the character onto location (X,Y) of the playfield. (The coordinates are relative to the cell where the program started.)
So, let’s look at a couple of befunge programs. As usual, we start with the good old “hello world”.
v >v"Hello world!"0< ,: ^_25*,@
We start at the top left, head moving right. It moves until it hits the “v” character, which makes it go down; then “<" makes it go left. 0 pushes a zero onto the stack. Then we've got a quote – so it starts pushing characters onto the stack until the next quote. When we get to the next quote, if we wrote the stack so that the top comes first, it would look like : ('H' 'e' 'l' 'l' 'o' ' ' 'w' 'o' 'r' 'l' 'd' '!' 0 ).
Then we hit a “v” which makes the head go down. This is the beginning of a loop; the leftmost two characters of rows 2, 3, and 4 are a while loop! The head goes down to “:” which duplicates the top of the stack; then it hits “_”, which turns left if the value on top of the stack is not zero; then the head turns up, outputs a character, turns right, and we’re back at the start of the loop. So the loop will output each character until we get the the “0” we pushed before the string; then at the “_”, we turn right. 2 and 5 are pushed on the stack and multiplied, leaving a 10, which is the linefeed character (basically “n” for you C weenies). It outputs the linefeed, and then exits.
How about a truly pathological example? Here’s a self-reproducing program in 12 bytes.
:0g,:93+`#@_1+
Stepping through that:
- Dup the value on top of the stack. That’ll produce a “0” if the stack is empty. (So first pass, stack=[0])
- “0” Push a zero on the stack. (First pass, stack=[0,0])
- “g”: Fetch the value at (x,y) on the stack; that’s (0,0) initially. (First pass, stack = [‘:’])
- “,”: Output it. So we printed the character at (0,0) (First pass, stack = [])
- “:” dup the stack top again. (First pass, stack = [0])
- “93+”. Get the number 12 onto the stack. (First pass, stack = [12,0])
- Compare what was on top of the stack to twelve. Leave a 0 there if it was, or a 1 if it wasn’t. (First pass, stack = [0]).
- “#” skip over the next character.
- “_” go right if the top of stack is zero; left if it’s one. So if the value copied by the second “:” was greater than 12, then go left (in which case you hit “@” and halt); otherwise, keep going right.
- “1+”: add one to the top of the stack (First pass, stack = ([1])). Then keep going right until you hit the right edge, and the you jump back to the left edge, so you’re at the first “:” again.
- Now the whole thing repeats, except that there’s a one on the stack. So the “g” will fetch (1,0); the “12” will be compared to 1, and the “1+” on the end will leave “2” on the stack. So now it fetches and outputs (2,0). And so on, until it reaches the “(_)” after outputting (12,0), when it halts.
One more, which I’ll let you figure out for yourself. Here’s a program that prompts you for a number, and computes its factorial:
v >v"Please enter a number (1-16) : "0$*99g1-:99p#v_.25*,@ ^_&:1-99p>:1-:!|10 < ^ <
GM/BM Friday: Pathological Programming Languages
In real life, I’m not a mathematician; I’m a computer scientist. Still a math geek, mind you, but what I really do is very much in the realm of applied math, researching how to build systems to help people program.
One of my pathological obsessions is programming languages. Since I first got exposed to TRS-80 Model 1 BASIC back in middle school, I’ve been absolutely nuts programming languages. Last time I counted, I’d learned about 130 different languages; and I’ve picked up more since then. I’ve written programs most of them. Like I said, I’m nuts.
Anyway, I decided that it would be amusing to inflict my obsession on you, my readers, with a new feature: the friday pathological programming language. You see, there are plenty of *crazy* people out there; and many of them like to invent programming languages. Some very small number of them try to design good languages and succeed; a much larger number try to design good languages and fail; and *then* there are the folks who design the languages I’m going to talk about. They’re the ones who set out to design *bizzare* programming languages, and succeed brilliantly. They call them “esoteric” programming languages. I call them evil.
Today, the beautiful grand-daddy of the esoteric language family: the one, the only, the truly and deservedly infamous: [Brainfuck!][bf], designed by Urban Müller. (There are a number of different implementations available; just follow the link.)
Only 8 commands – including input and output – all written using symbols. And yet Turing complete; and not just Turing complete, but actually based on a *real* [formal theoretical design][pprimeprime]. And it’s even been implemented [*in hardware*!][bf-hard]
BrainFuck is based on something very much like a twisted cross between a [Turing machine][turing] and a [Minsky machine][minsky]. It’s got the idea of an input tape, like the turing machine. But unlike the turing machine, each cell of the tape stores a number, which can be incremented or decremented, like a Minsky machine. And like a Minsky, the only control flow is a test for zero.
The 8 instructions:
1. **>**: move the tape head one cell forward.
2. **+++++++++.++++++-.+++++++..+++.>++++<>.<——.++++++
.+++.——.——–.>+.
Let’s pull that apart just a bit so that we can hope to understand.
* “++++++++”: store the number “8” in the current tape cell. We’re going to use that as a loop index, so the loop is going to repeat 8 times.
* “[>+++++++++.”: go to the cell after the loop index, and output what’s there. That outputs the “72” as a character: “H”.
* “++++++-.”: Advance past the index, subtract one, and output. That’s 101, or “e”.
Continues in pretty much the same vein, using a couple of tape cells, and running loops to generate the values of the characters. Beautiful, eh?
If that didn’t seem impressive enough, [here][bf-fib] is a really gorgeous implementation of a fibonacci sequence generator, with documentation. The BF compiler used to write this ignores any character other than the 8 commands, so the comments don’t need to be marked in any way; they just need to be really careful not to use punctuation.
+++++++++++ number of digits to output > #1 + initial number >>>> #5 ++++++++++++++++++++++++++++++++++++++++++++ (comma) > #6 ++++++++++++++++++++++++++++++++ (space) <<<<< #1 copy #1 to #7 [>>>>>>+>+<<<<<<>>>>>>[<<<<<<>>>>>>-] ++++++++++ set the divisor #8 [ subtract from the dividend and divisor ->+>+<<>>[<<>>-] set #10 + if #9 clear #10 [-][<>>+<<>[-]] jump back to #8 (divisor possition) <>> #11 copy to #13 [>>+>+<<>>[<<>>-] set #14 + if #13 clear #14 [-][<>[-]] <<<<<<>>>> #12 if #12 output value plus offset to ascii 0 [++++++++++++++++++++++++++++++++++++++++++++++++.[-]] subtract #11 from 10 ++++++++++ #12 is now 10 - #12 output #12 even if it's zero ++++++++++++++++++++++++++++++++++++++++++++++++.[-] <<<<<<<<<<< #1 check for final number copy #0 to #3 >>+>+<<<>>>[<<<>>>-] >.>.<<<[-]] <>+>+<<>>[<<>>-]<<[-]>[-]<<<- ]
[bf]: http://www.muppetlabs.com/~breadbox/bf/
[bf-fib]: http://esoteric.sange.fi/brainfuck/bf-source/prog/fibonacci.txt
[turing]: http://goodmath.blogspot.com/2006/03/playing-with-mathematical-machines.html
[minsky]: http://goodmath.blogspot.com/2006/05/minsky-machine.html
[bf-hard]: http://www.robos.org/?bfcomp
[pprimeprime]: http://en.wikipedia.org/wiki/P_prime_prime
Why so many languages? Programming languages, Computation, and Math.
Back at my old digs last week, I put up a post about programming languages and types. It produced an interesting discussion, which ended up shifting topics a bit, and leading to a very interesting question from one of the posters, and since the answer actually really does involve math, I’m going to pop it up to the front page here.
In the discussion, I argued that programmers should know and use many different programming languages; and that that’s not just a statement about todays programming languages, but something that I think will always be true: that there will always be good reasons for having and using more than one programming language.
One of the posters was quite surprised by this, and wanted to know why I didn’t think that it was possible, at least in principle, to design a single ideal programming language that would be good for any program that I needed to write.
It’s a good question, which I think connects very naturally into the underlying mathematics of computation. As I discussed back on the old blog, one of the very fundamental facts concerning computation is the Church-Turing thesis: that computation has a fundamental limit, and that there are many different ways of performing mechanical computation, but ultimately, they all end up with the same fundamental capabilities. Any computing system that reaches that limit is called an effective computing system (ECS). Anything that one ECS can do, any other ECS can do too. That doesn’t mean that they’re all identical. A given computation can be really easy to understand when described in terms of one ECS, and horribly difficult in another. For example, sorting a list of numbers is pretty easy to understand if it’s written in lambda calculus; but it’s a nightmare written for a Minsky machine; and it’s somewhere darned close to impossible for a human to understand written out as a cell-grid for Conway’s life.
How does this connect back to programming languages? Well, what is a programming language really? From a theoretical point of view, it’s a language for specifying a computing machine. Each program is a description of a specific computing machine. The language is a particular way of expressing the computing machine that you want to build. Underlying each programming language is an effective computing system: a fundamental model of how to perform computations. That’s the real fundamental difference between programming languages: what fundamental way of representing computation underlies the language.
Assuming that you’re looking at a good programming language, it’s good for a particular task when that task is easy to describe in terms of the fundamental ECS underlying the language; it’s bad for a task when that task is not easy to describe in terms of the languages underlying ECS.
If a language has a single ECS underlying it, then there are tasks that it’s good for, and tasks that it’s not so good for. If you try to smoosh multiple ECSs together under the covers of one language, then you don’t really have one language: when you’re writing for a vonNeumann machine, you’re really using one language, and when you’re writing for Actors, you’re using another. You’ve just forced two languages together into the same framework – but they’re still two languages, and you’ve probably compromised the quality of expression of those two languages by forcing them together.