Planet Haskell

July 23, 2014

Neil Mitchell

Applicative vs Monadic build systems

Summary: Shake is a monadic build system, and monadic build systems are more powerful than applicative ones.

Several people have wondered if the dependencies in the Shake build system are monadic, and if Make dependencies are applicative. In this post I'll try and figure out what that means, and show that the claim is somewhat true.

Gergo recently wrote a good primer on the concepts of Applicative, Monads and Arrows (it is worth reading the first half if you are unfamiliar with monad or applicative). Using a similar idea, we can model a simple build system as a set of rules:

rules :: [(FilePath, Action String)]
rules = [("a+b", do a <- need "a"; b <- need "b"; return (a ++ b))
,("a" , return "Hello ")
,("b" , return "World")

Each rule is on a separate line, containing a pair of the file the rule produces (e.g. a for the second rule) and the action that produces the files contents (e.g. return "Hello"). I've used need to allow a rule to use the contents of another file, so the rule for a+b depends on the files a and b, then concatenates their contents. We can run these rules to produce all the files. We've written these rules assuming Action is a Monad, using the do notation for monads. However, for the above build system, we can restrict ourselves to Applicative functions:

rules = [("a+b", (++) <$> need "a" <*> need "b")
,("a" , pure "Hello ")
,("b" , pure "World")

If Action is applicative but not monadic then we can statically (without running any code operating on file contents) produce a dependency graph. If Action is monadic we can't generate a graph upfront, but there are some build systems that cannot be expressed applicatively. In particular, using a monad we can write a "dereferencing" build system:

rules = [("!a", do a <- need "a"; need a)
,("a" , pure "b")
,("b" , pure "Goodbye")

To build the file !a we first require the file a (which produces the contents b), then we require the file b (which produces the contents Goodbye). Note that the first rule has changed b the content into b the file name. In general, to move information from the file content to a file name, requires a monad. Alternatively stated, a monad lets you chose future dependencies based on the results of previous dependencies.

One realistic example (from the original Shake paper), is building a .tar file from the list of files contained in a file. Using Shake we can write the Action:

contents <- readFileLines "list.txt"
need contents
cmd "tar -cf" [out] contents

The only build systems that I'm aware of that are monadic are redo, SCons and Shake-inspired build systems (including Shake itself, Jenga in OCaml, and several Haskell alternatives).

While it is the case that Shake is monadic, and that monadic build systems are more powerful than applicative ones, it is not the case that Make is applicative. In fact, almost no build systems are purely applicative. Looking at the build shootout, every build system tested can implement the !a example (provided the file a is not a build product), despite several systems being based on applicative dependencies.

Looking at Make specifically, it's clear that the output: input1 input2 formulation of dependencies is applicative in nature. However, there are at least two aspects I'm aware of that increase the power of Make:

  • Using $(shell cat list.txt) I can splice the contents of list.txt into the Makefile, reading the contents of list.txt before the dependencies are parsed.
  • Using -include file.d I can include additional rules that are themselves produced by the build system.

It seems every "applicative" build system contains some mechanism for extending its power. I believe some are strictly less powerful than monadic systems, while others may turn out to be an encoding of monadic rules. However, I think that an explicitly monadic definition provides a clearer foundation.

by Neil Mitchell ( at July 23, 2014 07:11 PM

Mark Jason Dominus

When do n and 2n have the same digits?

[This article was published last month on the math.stackexchange blog, which seems to have died young, despite many earnest-sounding promises beforehand from people who claimed they would contribute material. I am repatriating it here.]

A recent question on math.stackexchange asks for the smallest positive integer for which the number has the same decimal digits in some other order.

Math geeks may immediately realize that has this property, because it is the first 6 digits of the decimal expansion of , and the cyclic behavior of the decimal expansion of is well-known. But is this the minimal solution? It is not. Brute-force enumeration of the solutions quickly reveals that there are 12 solutions of 6 digits each, all permutations of , and that larger solutions, such as 1025874 and 1257489 seem to follow a similar pattern. What is happening here?

Stuck in Dallas-Fort Worth airport one weekend, I did some work on the problem, and although I wasn't able to solve it completely, I made significant progress. I found a method that allows one to hand-calculate that there is no solution with fewer than six digits, and to enumerate all the solutions with 6 digits, including the minimal one. I found an explanation for the surprising behavior that solutions tend to be permutations of one another. The short form of the explanation is that there are fairly strict conditions on which sets of digits can appear in a solution of the problem. But once the set of digits is chosen, the conditions on that order of the digits in the solution are fairly lax.

So one typically sees, not only in base 10 but in other bases, that the solutions to this problem fall into a few classes that are all permutations of one another; this is exactly what happens in base 10 where all the 6-digit solutions are permutations of . As the number of digits is allowed to increase, the strict first set of conditions relaxes a little, and other digit groups appear as solutions.


The property of interest, , is that the numbers and have exactly the same base- digits. We would like to find numbers having property for various , and we are most interested in . Suppose is an -digit numeral having property ; let the (base-) digits of be and similarly the digits of are . The reader is encouraged to keep in mind the simple example of which we will bring up from time to time.

Since the digits of and are the same, in a different order, we may say that for some permutation . In general might have more than one cycle, but we will suppose that is a single cycle. All the following discussion of will apply to the individual cycles of in the case that is a product of two or more cycles. For our example of , we have in cycle notation. We won't need to worry about the details of , except to note that completely exhaust the indices , and that because is an -cycle.

Conditions on the set of digits in a solution

For each we have $$a_{P(i)} = b_{i} \equiv 2a_{i} + c_i\pmod R $$ where the ‘carry bit’ is either 0 or 1 and depends on whether there was a carry when doubling . (When we are in the rightmost position and there is never a carry, so .) We can then write:

$$\begin{align} a_{P(P(i))} &= 2a_{P(i)} + c_{P(i)} \\ &= 2(2a_{i} + c_i) + c_{P(i)} &&= 4a_i + 2c_i + c_{P(i)}\\ a_{P(P(P(i)))} &= 2(4a_i + 2c_i + c_{P(P(i)})) + c_{P(i)} &&= 8a_i + 4c_i + 2c_{P(i)} + c_{P(P(i))}\\ &&&\vdots\\ a_{P^n(i)} &&&= 2^na_i + v \end{align} $$

all equations taken . But since is an -cycle, , so we have $$a_i \equiv 2^na_i + v\pmod R$$ or equivalently $$\big(2^n-1\big)a_i + v \equiv 0\pmod R\tag{$\star$}$$ where depends only on the values of the carry bits —the are precisely the binary digits of .

Specifying a particular value of and that satisfy this equation completely determines all the . For example, is a solution when because , and this solution allows us to compute

$$\def\db#1{\color{darkblue}{#1}}\begin{align} a_0&&&=2\\ a_{P(0)} &= 2a_0 &+ \db0 &= 4\\ a_{P^2(0)} &= 2a_{P(0)} &+ \db0 &= 0 \\ a_{P^3(0)} &= 2a_{P^2(0)} &+ \db1 &= 1\\ \hline a_{P^4(0)} &= 2a_{P^3(0)} &+ \db0 &= 2\\ \end{align}$$

where the carry bits are visible in the third column, and all the sums are taken . Note that as promised. This derivation of the entire set of from a single one plus a choice of is crucial, so let's see one more example. Let's consider . Then we want to choose and so that where . One possible solution is . Then we can derive the other as follows:

$$\begin{align} a_0&&&=5\\ a_{P(0)} &= 2a_0 &+ \db1 &= 1\\ a_{P^2(0)} &= 2a_{P(0)} &+ \db0 &= 2 \\\hline a_{P^3(0)} &= 2a_{P^2(0)} &+ \db1 &= 5\\ \end{align}$$

And again we have as required.

Since the bits of are used cyclically, not every pair of will yield a different solution. Rotating the bits of and pairing them with different choices of will yield the same cycle of digits starting from a different place. In the first example above, we had . If we were to take (which also solves ) we would get the same cycle of values of the but starting from instead of from , and similarly if we take or . So we can narrow down the solution set of by considering only the so-called bracelets of rather than all possible values. Two values of are considered equivalent as bracelets if one is a rotation of the other. When a set of -values are equivalent as bracelets, we need only consider one of them; the others will give the same cyclic sequence of digits, but starting in a different place. For , for example, the bracelets are and ; the sequences and being equivalent to , and so on.


Let us take , so we want to find 3-digit numerals with property . According to we need where . There are 9 possible values for ; for each one there is at most one possible value of that makes the sum zero:

$$\pi \approx 3 $$

$$\begin{array}{rrr} a_i & 7a_i & v \\ \hline 0 & 0 & 0 \\ 1 & 7 & 2 \\ 2 & 14 & 4 \\ 3 & 21 & 6 \\ 4 & 28 & \\ 5 & 35 & 1 \\ 6 & 42 & 3 \\ 7 & 49 & 5 \\ 8 & 56 & 7 \\ \end{array} $$

(For there is no solution.) We may disregard the non-bracelet values of , as these will give us solutions that are the same as those given by bracelet values of . The bracelets are:

$$\begin{array}{rl} 000 & 0 \\ 001 & 1 \\ 011 & 3 \\ 111 & 7 \end{array}$$

so we may disregard the solutions exacpt when . Calculating the digit sequences from these four values of and the corresponding we find:

$$\begin{array}{ccl} a_0 & v & \text{digits} \\ \hline 0 & 0 & 000 \\ 5 & 1 & 512 \\ 6 & 3 & 637 \\ 8 & 7 & 888 \ \end{array} $$

(In the second line, for example, we have , so and .)

Any number of three digits, for which contains exactly the same three digits, in base 9, must therefore consist of exactly the digits or .

A warning

All the foregoing assumes that the permutation is a single cycle. In general, it may not be. Suppose we did an analysis like that above for and found that there was no possible digit set, other than the trivial set 00000, that satisfied the governing equation . This would not completely rule out a base-10 solution with 5 digits, because the analysis only rules out a cyclic set of digits. There could still be a solution where was a product of a and a -cycle, or a product of still smaller cycles.

Something like this occurs, for example, in the case. Solving the governing equation yields only four possible digit cycles, namely , and . But there are several additional solutions: and . These correspond to permutations with more than one cycle. In the case of , for example, exchanges the and the , and leaves the and the fixed.

For this reason we cannot rule out the possibility of an -digit solution without first considering all smaller .

The Large Equals Odd rule

When is even there is a simple condition we can use to rule out certain sets of digits from being single-cycle solutions. Recall that and . Let us agree that a digit is large if and small otherwise. That is, is large if, upon doubling, it causes a carry into the next column to the left.

Since , where the are carry bits, we see that, except for , the digit is odd precisely when there is a carry from the next column to the right, which occurs precisely when is large. Thus the number of odd digits among is equal to the number of large digits among .
This leaves the digits and uncounted. But is never odd, since there is never a carry in the rightmost position, and is always small (since otherwise would have digits, which is not allowed). So the number of large digits in is exactly equal to the number of odd digits in . And since and have exactly the same digits, the number of large digits in is equal to the number of odd digits in . Observe that this is the case for our running example : there is one odd digit and one large digit (the 4).

When is odd the analogous condition is somewhat more complicated, but since the main case of interest is , we have the useful rule that:

For even, the number of odd digits in any solution is equal to the number of large digits in .

Conditions on the order of digits in a solution

We have determined, using the above method, that the digits might form a base-9 numeral with property . Now we would like to arrange them into a base-9 numeral that actually does have that property. Again let us write and , with . Note that if , then (if there was a carry from the next column to the right) or (if there was no carry), but since is impossible, we must have and therefore must be small, since there is no carry into position . But since is also one of , and it cannot also be , it must be . This shows that the 1, unless it appears in the rightmost position, must be to the left of the ; it cannot be to the left of the . Similarly, if then , because is impossible, so the must be to the left of a large digit, which must be the . Similar reasoning produces no constraint on the position of the ; it could be to the left of a small digit (in which case it doubles to ) or a large digit (in which case it doubles to ). We can summarize these findings as follows:

$$\begin{array}{cl} \text{digit} & \text{to the left of} \\ \hline 1 & 1, 2, \text{end} \\ 2 & 5 \\ 5 & 1,2,5,\text{end} \end{array}$$

Here “end” means that the indicated digit could be the rightmost.

Furthermore, the left digit of must be small (or else there would be a carry in the leftmost place and would have 4 digits instead of 3) so it must be either 1 or 2. It is not hard to see from this table that the digits must be in the order or , and indeed, both of those numbers have the required property: , and .

This was a simple example, but in more complicated cases it is helpful to draw the order constraints as a graph. Suppose we draw a graph with one vertex for each digit, and one additional vertex to represent the end of the numeral. The graph has an edge from vertex to whenever can appear to the left of . Then the graph drawn for the table above looks like this:

Graph for 125 base 9

A 3-digit numeral with property corresponds to a path in this graph that starts at one of the nonzero small digits (marked in blue), ends at the red node marked ‘end’, and visits each node exactly once. Such a path is called hamiltonian. Obviously, self-loops never occur in a hamiltonian path, so we will omit them from future diagrams.

Now we will consider the digit set , again base 9. An analysis similar to the foregoing allows us to construct the following graph:

Graph for 367 base 9

Here it is immediately clear that the only hamiltonian path is , and indeed, .

In general there might be multiple instances of a digit, and so multiple nodes labeled with that digit. Analysis of the case produces a graph with no legal start nodes and so no solutions, unless leading zeroes are allowed, in which case is a perfectly valid solution. Analysis of the case produces a graph with no path to the end node and so no solutions. These two trivial patterns appear for all and all , and we will ignore them from now on.

Returning to our ongoing example, in base 8, we see that and must double to and , so must be to the left of small digits, but and can double to either or and so could be to the left of anything. Here the constraints are so lax that the graph doesn't help us narrow them down much:

Graph for 1024 base 8

Observing that the only arrow into the 4 is from 0, so that the 4 must follow the 0, and that the entire number must begin with 1 or 2, we can enumerate the solutions:


If leading zeroes are allowed we have also:


All of these are solutions in base 8.

The case of

Now we turn to our main problem, solutions in base 10.

To find all the solutions of length 6 requires an enumeration of smaller solutions, which, if they existed, might be concatenated into a solution of length 6. This is because our analysis of the digit sets that can appear in a solution assumes that the digits are permuted cyclically; that is, the permutations that we considered had only one cycle each. If we perform the analy

There are no smaller solutions, but to prove that the length 6 solutions are minimal, we must analyze the cases for smaller and rule them out. We now produce a complete analysis of the base 10 case with and . For there is only the trivial solution of , which we disregard. (The question asked for a positive number anyway.)

For , we want to find solutions of where is a two-bit bracelet number, one of or . Tabulating the values of and that solve this equation we get:

$$\begin{array}{ccc} v& a_i \\ \hline 0 & 0 \\ 1& 3 \\ 3& 9 \\ \end{array}$$

We can disregard the and solutions because the former yields the trivial solution and the latter yields the nonsolution . So the only possibility we need to investigate further is , which corresponds to the digit sequence : Doubling gives us and doubling , plus a carry, gives us again.

But when we tabulate of which digits must be left of which informs us that there is no solution with just and , because the graph we get, once self-loops are eliminated, looks like this:

graph for 36 base 10

which obviously has no hamiltonian path. Thus there is no solution for .

For we need to solve the equation where is a bracelet number in , specifically one of or . Since and are relatively prime, for each there is a single that solves the equation. Tabulating the possible values of as before, and this time omitting rows with no solution, we have:

$$\begin{array}{rrl} v & a_i & \text{digits}\\ \hline 0& 0 & 000\\ 1& 7 & 748 \\ 3& 1 & 125\\ 7&9 & 999\\ \end{array}$$

The digit sequences and yield trivial solutions or nonsolutions as usual, and we will omit them in the future. The other two lines suggest the digit sets and , both of which fails the “odd equals large” rule.

This analysis rules out the possibility of a digit set with , but it does not completely rule out a 3-digit solution, since one could be obtained by concatenating a one-digit and a two-digit solution, or three one-digit solutions. However, we know by now that no one- or two-digit solutions exist. Therefore there are no 3-digit solutions in base 10.

For the governing equation is where is a 4-bit bracelet number, one of . This is a little more complicated because . Tabulating the possible digit sets, we get:

$$\begin{array}{crrl} a_i & 15a_i& v & \text{digits}\\ \hline 0 & 0 & 0 & 0000\\ 1 & 5 & 5 & 1250\\ 1 & 5 & 15 & 1375\\ 2 & 0 & 0 & 2486\\ 3 & 5 & 5 & 3749\\ 3 & 5 & 15 & 3751\\ 4 & 0 & 0 & 4862\\ 5 & 5 & 5 & 5012\\ 5 & 5 & 5 & 5137\\ 6 & 0 & 0 & 6248\\ 7 & 5 & 5 & 7493\\ 7 & 5 & 5 & 7513\\ 8 & 0 & 0 & 8624 \\ 9 & 5 & 5 & 9874\\ 9 & 5 & 15 & 9999 \\ \end{array}$$

where the second column has been reduced mod . Note that even restricting to bracelet numbers the table still contains duplicate digit sequences; the 15 entries on the right contain only the six basic sequences , and . Of these, only and obey the odd equals large criterion, and we will disregard and as usual, leaving only . We construct the corresponding graph for this digit set as follows: must double to , not , so must be left of a large number or . Similarly must be left of or . must also double to , so must be left of . Finally, must double to , so must be left of or the end of the numeral. The corresponding graph is:

graph for 3749 base 10

which evidently has no hamiltonian path: whichever of 3 or 4 we start at, we cannot visit the other without passing through 7, and then we cannot reach the end node without passing through 7 a second time. So there is no solution with and .

We leave this case as an exercise. There are 8 solutions to the governing equation, all of which are ruled out by the odd equals large rule.

For the possible solutions are given by the governing equation where is a 6-bit bracelet number, one of . Tabulating the possible digit sets, we get:

$$\begin{array}{crrl} v & a_i & \text{digits}\\ \hline 0 & 0 & 000000\\ 1 & 3 & 362486 \\ 3 & 9 & 986249 \\ 5 & 5 & 500012 \\ 7 & 1 & 124875 \\ 9 & 7 & 748748 \\ 11 & 3 & 362501 \\ 13 & 9 & 986374 \\ 15 & 5 & 500137 \\ 21 & 3 & 363636 \\ 23 & 9 & 989899 \\ 27 & 1 & 125125 \\ 31 & 3 & 363751 \\ 63 & 9 & 999999 \\ \end{array}$$

After ignoring and as usual, the large equals odd rule allows us to ignore all the other sequences except and . The latter fails for the same reason that did when . But , the lone survivor, gives us a complicated derived graph containing many hamiltonian paths, every one of which is a solution to the problem:

graph for 124578 base 10

It is not hard to pick out from this graph the minimal solution , for which , and also our old friend for which .

We see here the reason why all the small numbers with property contain the digits . The constraints on which digits can appear in a solution are quite strict, and rule out all other sequences of six digits and all shorter sequences. But once a set of digits passes these stringent conditions, the constraints on it are much looser, because is only required to have the digits of in some order, and there are many possible orders, many of which will satisfy the rather loose conditions involving the distribution of the carry bits. This graph is typical: it has a set of small nodes and a set of large nodes, and each node is connected to either all the small nodes or all the large nodes, so that the graph has many edges, and, as in this case, a largish clique of small nodes and a largish clique of large nodes, and as a result many hamiltonian paths.


This analysis is tedious but is simple enough to perform by hand in under an hour. As increases further, enumerating the solutions of the governing equation becomes very time-consuming. I wrote a simple computer program to perform the analysis for given and , and to emit the possible digit sets that satisfied the large equals odd criterion. I had wondered if every base-10 solution contained equal numbers of the digits and . This is the case for (where the only admissible digit set is ), for (where the only admissible sets are and ), and for (where the only admissible sets are and ). But when we reach the increasing number of bracelets has loosened up the requirements a little and there are 5 admissible digit sets. I picked two of the promising-seeming ones and quickly found by hand the solutions and , both of which wreck any theory that the digits must all appear the same number of times.


Thanks to Karl Kronenfeld for corrections and many helpful suggestions.

by Mark Dominus ( at July 23, 2014 01:39 PM

Robert Harper

A few new papers

I’ve just updated my web page with links to some new papers that are now available:

  1. Homotopical Patch Theory” by Carlo Angiuli, Ed Morehouse, Dan Licata, and Robert Harper. To appear, ICFP, Gothenburg, October 2014. We’ve also prepared a slightly expanded version with a new appendix containing material that didn’t make the cut for ICFP. (Why do we still have such ridiculously rigid and limited space limitations?  And why do we have such restricted pre-publication deadlines as we go through the charade of there being a “printing” of the proceedings?  One soon day CS will step into its own bright new future.). The point of the paper is to show how to apply basic methods of homotopy theory to various equational theories of patches for various sorts of data. One may see it as an application of functorial semantics in HoTT, in which theories are “implemented” by interpretation into a universe of sets. The patch laws are necessarily respected by any such interpretation, since they are just cells of higher dimension and functors must behave functorially at all dimensions.
  2. Cache Efficient Functional Algorithms” by Guy E. Blelloch and Robert Harper. To appear, Comm. ACM Research Highlight this fall.  Rewritten version of POPL 2013 paper for a broad CS audience.  Part of a larger effort to promote integration of combinatorial theory with logical and semantic theory, two theory communities that, in the U.S. at least, ignore each other completely.  (Well, to be plain about it, it seems to me that the ignoring goes more in one direction than the other.)  Cost semantics is one bridge between the two schools of thought, abandoning the age-old “reason about the compiled code” model used in algorithm analysis.  Here we show that one can reason about spatial locality at the abstract level, without having to drop down to the low-level details of how data structures are represented and allocated in memory.
  3. Refining Objects” by Robert Harper and Rowan Davies. To appear, Luca Cardelli 60th Birthday Celebration, Cambridge, October, 2014.  A paper I’ve meant to write sometime over the last 15 years, and finally saw the right opportunity, with Luca’s symposium coming up and Rowan Davies visiting Carnegie Mellon this past spring.  Plus it was a nice project to get me started working again after I was so rudely interrupted this past fall and winter.  Provides a different take on typing for dynamic dispatch that avoids the ad hoc methods introduced for oop, and instead deploying standard structural and behavioral typing techniques to do more with less.  This paper is a first cut to prove the concept, but it is clear that much more can be said here, all within the framework of standard proof-theoric and realizability-theoretic interpretations of types.  It would help to have read the relevant parts of PFPL, particularly the under-development second edition, which provides a lot of the background that we necessarily elide in this paper.
  4. Correctness of Compiling Polymorphism to Dynamic Typing” by Nick Benton, Kuen-Bang Hou (Favonia), and Robert Harper, draft (summer 2014).  Classically polymorphic type assignment starts with untyped \lambda-terms and assigns types to them as descriptions of their behavior.  Viewed as a compilation strategy for a polymorphic language, type assignment is rather crude in that every expression is compiled in uni-typed form, complete with the overhead of run-time classification and class checking.  A more subtle strategy is to maintain as much structural typing as possible, resorting to the use of dynamic typing (recursive types, naturally) only for variable types.  The catch is that polymorphic instantiation requires computation to resolve the incompatibility between, say, a bare natural number, which you want to compute with, and its encoding as a value of the one true dynamic type, which you never want but are stuck with in dynamic languages.  In this paper we work out an efficient compilation scheme that maximizes statically available information, and makes use of dynamic typing only insofar as the program demands we do so.  There are better ways to compile polymorphism, but the dynamic style is forced by badly designed virtual machines, such as the JVM, so it is worth studying the correctness properties of the translation.  We do so by making use of a combination of structural and behavioral typing, that is using types and refinements.

I hope to comment here more fully on these papers in the near future, but I also have a number of other essays queued up to go out as soon as I can find the time to write them.  Meanwhile, other deadlines loom large.

[Update: added fourth item neglected in first draft.  Revise formatting.  Add links to people. Brief summary of patch theory paper.]

Filed under: Programming, Research Tagged: behavioral typing, cache efficient algorithms, compilation, cost semantics, dynamic dispatch, homotopy type theory, ICFP, polymorphism, structural typing, type refinements

by Robert Harper at July 23, 2014 02:51 AM

July 22, 2014

JP Moresmau

EclipseFP 2.6.1 released!

I've just released EclipseFP 2.6.1. EclipseFP is a set of Eclipse plugins for Haskell development. This is a bug fixing release, mainly for GHC 7.8 support.

Release notes can be found here.

As usual, download from

Happy Haskell Hacking!

by JP Moresmau ( at July 22, 2014 04:40 PM

Functional Jobs

CTO / Tech Co-Founder at Capital Match (Full-time)

TL;DR: start and build the technology for a financial services marketplace in Asia. Compensation is salary plus double digit percent equity. There will be a short trial period to make sure both sides want to work with each other. Relocation to Singapore mandatory (trial could be remote and part-time).


Capital Match is bringing peer-to-peer lending (basically, a marketplace for retail/institutional lenders and corporate borrowers that bypasses the banking system) to Southeast Asia, where for various reasons the US incumbents have not entered. The founders are well connected and are bringing the right contacts and background to make it happen. The company started as a traditional financier for SMEs to better understand the market as well as legal and credit aspects of the business before it would embark on the P2P model.

If you would like to learn more about the business model, here is a link explaining it from the point of view of current very successful US incumbents:

Job description and compensation

The CTO will first build the marketplace, then grow the team as it gains traction. We provide the legal, financial and admin functions as well as the market research backing a high level functional spec; you just need to worry about building the product. The division of labour will be very clear: you are the final call on anything technical, and nobody will come micromanage your work.

Compensation will be a lowish middle class salary by Singapore standards and double digit percent equity, subject to a trial period. Note this is not a strictly technical business, and the marketplace problem is a relatively straightforward and well known one, with the value in the contacts and understanding of the market that goes into the functional spec. Though technology could bring a distinct value and advantage over time.

Additionally, we have eschewed raising much funding for now and most of the capital comes from the founders' personal savings (which we think is a positive signal - our interests are aligned) so don't expect Silicon Valley perks for a while. We don't have hog roasts and whisky tasting Fridays, but you get a real, founder-level stake in the company. Relocation to Singapore is primordial for the CTO, although the rest of the team you'll build can be remote. During a trial period you can work remotely and part-time.

Tech stack

Thanks to one founder's very positive experiences with the Haskell experiment at Zalora, we are very keen to use functional programming languages, especially Haskell. We are however technology agnostic ("best stack for the problem"). We have a bias towards those who prefer the relational model over NoSQL and towards open source.

Desired experience

The CV matters less than your ability to build things, so please send us any major open source project you have authored, both a link to the repo and a "basic" description targeted at the non-technical founders. We would prefer to see some financial services experience, especially on the security side, and some experience building similar products would be even better.

We want to utilize ample local government funding for high-tech start-ups so scientific / high-tech background and a post-grad degree would be preferred.

You can attempt to apply without an open source repo to your name, in that case build us a demonstration of your skills that you think reflects your ability.

Please send your application to pawel [at] capital-match [dot] com

Get information on how to apply for this position.

July 22, 2014 06:27 AM

Robert Harper

Summer of Programming Languages

Having just returned from the annual Oregon Programming Languages Summer School, at which I teach every year, I am once again very impressed with the impressive growth in the technical sophistication of the field and with its ability to attract brilliant young students whose enthusiasm and idealism are inspiring.  Eugene was, as ever, an ideal setting for the summer school, providing a gorgeous setting for work and relaxation.  I was particularly glad for the numerous chances to talk with students outside of the classroom, usually over beer, and I enjoyed, as usual, the superb cycling conditions in Eugene and the surrounding countryside.  Many students commented to me that the atmosphere at the summer school is wonderful, filled with people who are passionate about programming languages research, and suffused with a spirit of cooperation and sharing of ideas.

Started by Zena Ariola a dozen years ago, this year’s instance was organized by Greg Morrisett and Amal Ahmed in consultation with Zena.  As usual, the success of the school depended critically on the dedication of Jim Allen, who has been the de facto chief operating officer since it’s inception.  Without Jim, OPLSS could not exist.  His attention to detail, and his engagement with the students are legendary.   Support from the National Science Foundation CISE Division, ACM SIGPLANMicrosoft Research, Jane Street Capital, and BAE Systems was essential for providing an excellent venue,  for supporting a roster of first-rate lecturers, and for supporting the participation of students who might otherwise not have been able to attend.  And, of course, an outstanding roster of lecturers donated their time to come to Eugene for a week to share their ideas with the students and their fellow lecturers.

The schedule of lectures is posted on the web site, all of which were taped, and are made available on the web.  In addition many speakers provided course notes, software, and other backing materials that are also available online.  So even if you were not able to attend, you can still benefit from the summer school, and perhaps feel more motivated to come next summer.  Greg and I will be organizing, in consultation with Zena.  Applying the principle “don’t fix what isn’t broken”, we do not anticipate major changes, but there is always room for improvement and the need to freshen up the content every year.  For me the central idea of the summer school is the applicability of deep theory to everyday practice.  Long a dream held by researchers such as me, these connections become more “real” every year as the theoretical abstractions of yesterday become the concrete practices of today.  It’s breathtaking to see how far we’ve come from the days when I was a student just beginning to grasp the opportunities afforded by ideas from proof theory, type theory, and category theory (the Holy Trinity) to building beautiful software systems.  No longer the abstruse fantasies of mad (computer) scientists, these ideas are the very air we breathe in PL research.  Gone are the days of ad hoc language designs done in innocence of the foundations on which they rest.  Nowadays serious industrial-strength languages are emerging that are grounded in theory and informed by practice.

Two examples have arisen just this summer, Rust (from Mozila) and Swift (from Apple), that exemplify the trend.  Although I have not had time to study them carefully, much less write serious code using them, it is evident from even a brief review of their web sites that these are serious languages that take account of the academic developments of the last couple of decades in formulating new language designs to address new classes of problems that have arisen in programming practice.  These languages are type safe, a basic criterion of sensibility, and feature sophisticated type systems that include ideas such as sum types, which have long been missing from commercial languages, or provided only in comically obtuse ways (such as objects).  The infamous null pointer mistakes have been eradicated, and the importance of pattern matching (in the sense of the ML family of languages) is finally being appreciated as the cure for Boolean blindness.  For once I can look at new industrial languages without an overwhelming sense of disappointment, but instead with optimism and enthusiasm that important ideas are finally, at long last, being recognized and adopted.  As has often been observed, it takes 25 years for an academic language idea to make it into industrial practice.  With Java it was simply the 1970′s idea of automatic storage management; with languages such as Rust and Swift we are seeing ideas from the 80′s and 90′s make their way into industrial practice.  It’s cause for celebration, and encouragement for those entering the field: the right ideas do win out in the end, one just has to have the courage to be irrelevant.

I hope to find the time to comment more meaningfully on the recent developments in practical programming languages, including Rust and Swift, but also languages such as Go and OCaml that are also making inroads into programming practice.  (I’ve had quite enough to say about Haskell for the time being, so I’ll give that one a rest, but with a tip of the hat to its enormous popularity and influence, despite my criticisms.)  But for now, let me say that the golden age of programming language research is here and now, and promises to continue indefinitely as we develop a grand unified theory of programming and mathematics.

Filed under: Programming, Research, Teaching Tagged: OPLSS14, programming languages, Rust, Swift

by Robert Harper at July 22, 2014 05:22 AM

July 21, 2014

Robert Harper

Parallelism and Concurrency, Revisited

To my delight, I still get compliments on and criticisms of my post from three years ago (can it possibly be that long?) on parallelism and concurrency.  In that post I offered a “top down” argument to the effect that these are different abstractions with different goals: parallelism is about exploiting computational resources to maximize efficiency, concurrency is about non-deterministic composition of components in a system.  Parallelism never introduces bugs (the semantics is identical to the sequential execution), but concurrency could be said to be the mother lode of all bugs (the semantics of a component changes drastically, without careful provision, when composed concurrently with other components).  The two concepts just aren’t comparable, yet somehow the confusion between them persists.  (Not everyone agrees with me on this distinction, but neither have I seen a rigorous analysis that shows them to be the same concept.  Most complaints seem to be about my use of the words “parallelism” and “concurrency” , which is an unavoidable problem, or about my temerity in trying to define two somewhat ill-defined concepts, a criticism that I’ll just have to accept.)

I’ve recently gotten an inkling of why it might be that many people equate the two concepts (or see no point in distinguishing them).  This post is an attempt to clear up what I perceive to be a common misunderstanding that seems to explain it.  It’s hard for me to say whether it really is all that common of a misunderstanding, but it’s the impression I’ve gotten, so forgive me if I’m over-stressing an obvious point.  In any case I’m going to try for a “bottom up” explanation that might make more sense to some people.

The issue is scheduling.

The naive view of parallelism is that it’s just talk for concurrency, because all you do when you’re programming in parallel is fork off some threads, and then do something with their results when they’re done.  I’ve previously argued that this is the wrong way to think about parallelism (it’s really about cost), but let’s just let that pass.  It’s unarguably true that a parallel computation does consist of a bunch of, well, parallel computations.  So, the argument goes, it’s nothing but concurrency.  I’ve previously argued that that’s not a good way to think about concurrency either, but we’ll let that pass too.  So, the story goes, concurrency and parallelism are synonymous, and bullshitters like me are just trying to confuse people and make trouble.

Being the troublemaker that I am, my response is, predictably, nojust no.  Sure, it’s kinda sorta right, as I’ve already acknowledged, but not really, and here’s why: scheduling as you learned about it in OS class (for example) is an altogether different thing than scheduling for parallelism.  And this is the heart of the matter, from a “bottom-up” perspective.

There are two aspects of OS-like scheduling that I think are relevant here.  First, it is non-deterministic, and second, it is competitive.  Non-deterministic, because you have little or no control over what runs when or for how long.  A beast like the Linux scheduler is controlled by a zillion “voodoo parameters” (a turn of phrase borrowed from my queueing theory colleague, Mor Harchol-Balter), and who the hell knows what is going to happen to your poor threads once they’re in its clutches.  Second, and more importantly, an OS-like scheduler is allocating resources competitively.  You’ve got your threads, I’ve got my threads, and we both want ours to get run as soon as possible.  We’ll even pay for the privilege (priorities) if necessary.  The scheduler, and the queueing theory behind it (he says optimistically) is designed to optimize resource usage on a competitive basis, taking account of quality of service guarantees purchased by the participants.  It does not matter whether there is one processor or one thousand processors, the schedule is unpredictable.  That’s what makes concurrent programming hard: you have to program against all possible schedules.  And that’s why you can’t prove much about the time or space complexity of your program when it’s implemented concurrently.

Parallel scheduling is a whole ‘nother ball of wax.  It is (usually, but not necessarily) deterministic, so that you can prove bounds on its efficiency (Brent-type theorems, as I discussed in my previous post and in PFPL).  And, more importantly, it is cooperative in the sense that all threads are working together for the same computation towards the same ends.  The threads are scheduled so as to get the job (there’s only one) done as quickly and as efficiently as possible.  Deterministic schedulers for parallelism are the most common, because they are the easiest to analyze with respect to their time and space bounds.  Greedy schedulers, which guarantee to maximize use of available processors, never leaving any idle when there is work to be done, form an important class for which the simple form of Brent’s Theorem is obvious.

Many deterministic greedy scheduling algorithms are known, of which I will mention p-DFS and p-BFS, which do p-at-a-time depth- and breadth-first search of the dependency graph, and various forms of work-stealing schedulers, pioneered by Charles Leiserson at MIT.  (Incidentally, if you don’t already know what p-DFS or p-BFS are, I’ll warn you that they are a little trickier than they sound.  In particular p-DFS uses a data structure that is sort of like a stack but is not a stack.)  These differ significantly in their time bounds (for example, work stealing usually involves expectation over a random variable, whereas the depth- and breadth-first traversals do not), and differ dramatically in their space complexity.  For example, p-BFS is absolutely dreadful in its space complexity.  For a full discussion of these issues in parallel scheduling, I recommend Dan Spoonhower’s PhD Dissertation.  (His semantic profiling diagrams are amazingly beautiful and informative!)

So here’s the thing: when you’re programming in parallel, you don’t just throw some threads at some non-deterministic competitive scheduler.  Rather, you generate an implicit dependency graph that a cooperative scheduler uses to maximize efficiency, end-to-end.  At the high level you do an asymptotic cost analysis without considering platform parameters such as the number of processors or the nature of the interconnect.  At the low level the implementation has to validate that cost analysis by using clever techniques to ensure that, once the platform parameters are known, maximum use is made of the computational resources to get your job done for you as fast as possible.  Not only are there no bugs introduced by the mere fact of being scheduled in parallel, but even better, you can prove a theorem that tells you how fast your program is going to run on a real platform.  Now how cool is that?

[Update: word-smithing.]

Filed under: Programming, Research Tagged: concurrency, parallelism

by Robert Harper at July 21, 2014 04:36 PM

Philip Wadler

Meditations on Using Haskell

Bitemyapp - Meditations on Using Haskell explains why and how those in the trenches use Haskell, by quoting from conversations on an IRC channel.


So when i found haskell i slingshotted off through dependent and substructural types. Assuming that if a little was good a lot was better. Made it half way through TaPL and found pure type systems, coq, etc.
I think the power to weight ratio isn’t there. I find that Haskell gives amazingly expressive types that have amazingpower for the amount of code you tie up in them and that are very resistant to refactoring.
If i write agda and refactor I scrap and rewrite everything. If i write haskell, and get my tricky logic bits right?
I can refactor it, split things up into classes, play all the squishy software engineering games to get a nice API I want. And in the end if it still compiles I can trust I didn’t screw up the refactoring with a very high degree of assurance.


Admittedly I’m not playing at the level E is, but this was my experience. I can make sweeping changes to my API, get all the bugs caught by the type system, and still have minimal code impact.


That is what I was getting at with the tweet about not using dynamically typed langs because I need to be able to prototype quickly and get rapid feedback.
I think a lot of my friends thought i was just being trollish. Even just being able to see what would have to change if you changed your design slightly and being able to back it out quickly…

by Philip Wadler ( at July 21, 2014 09:27 AM

Douglas M. Auclair (geophf)

That's totes my Bag!

So, does that mean I like tote-bags?

So, today's question on @1HaskellADay was this:

write a function countOccurences :: [Stirng] -> Map Char Int

(typos faithfully reproduced)

such that

lookup 'l' $ countOccurences "Hello" ~> Just 2
lookup 'q' $ countOccurences "Hello" ~> Nothing

Okay, that can be done easily enough, I suppose, by torquing Map into something that it isn't, so one gets wrapped around the axel of creating a mapping from characters to occurrences.

But why?

First of all, countOccurences maps a String (not a List of Strings) to a Map, and that map is a very specialized kind of map that has existed in the literature for quite a while, and that map is known as the Bag data type, and is also, nowadays, called the MultiSet by people too embarrassed to say the word 'bag' in a sentence, because of their prior drug convictions.

("I got two months for selling a dime bag.")

So they now twist the word 'Set' (a 'collection of unique objects') to mean something that's not a set at all, the 'Multi'Set, which is a 'collection of unique objects, but you can have multiples of these unique objects, so they're not unique at all, so it isn't a set at all, but we need to say the word 'set' because we can't say the word 'bag' because saying the word 'bag' would make us sound plebeian for some reason.'

Yeah, that. 'MultiSet.'

What. Ev. Er.

But I digress.

As always.

So I COULD write the countOccurences as a String -> Map Char Int function, but then: why bother? You can either write tons of algorithmic code that obscures the intent or just simply use the appropriate data type.

I went for the latter.

Now, I wuz gonna do a dependently-typed pair to represent an occurrence...

... notice how countOccurences is so badly misspelled, by the way?

SOMEbody didn't QA-check their problem for the day today, I'm thinking.

... but then I said: 'eh!'

I mean: WHY is lookup 'q' $ countOccurences "Hello" ~> Nothing?

WHY can't it be that count 'q' for a Bag Char representation of "Hello" be 0? 0 is a valid answer and it keeps everything nice and monoidal without having to lift everything unnecessarily into the monadic domain.

So, yeah. Let's do that, instead.

So, here we go, and in Idris, because that's how I'm rolling these days. The advantages of dependent types have been enumerated elsewhere, so we'll just go with that they're better as an assumption and move on, using them, instead of extolling them, in this post.


So, my first attempt at Bag crashed and burned, because I did this:

data Bag : (x : Type) -> Type where
    add : Bag x -> x -> Bag x
    emptyBag : Bag x

and the compiler was fine with that. Hey, I can declare any type I'd like, so long as the types just stay as types, but as soon as I tried to define these things:

emptyList : List x
emptyList = []

emptyBag = Bag emptyList
add (Bag []) x = Bag [(x, 1)]
add (Bag ((x, y) :: rest)) x = Bag ((x, y + 1) :: rest)
add (Bag ((z, y) :: rest)) x = Bag ((z, y) :: (add rest x))

The compiler looked at me and asked: 'geophf, what in tarnation are you-ah tryin' to do?'

And about the only intelligent answer I could muster was: 'Ummmm... idk.'

I had gotten too clever for myself by half, trying to reshape a data type you learn in Comp.Sci. 101 as a purely functional type.

Back to Basics ... (but not BASIC)

So, let's just declare Bag to be what it is and KISS: 'keep it simple, stupid!'

Yes, let's.

data Bag x = Air | Stuffed (x, Nat) (Bag x)

Now, I so totally could've gone with the balanced binary-tree representation instead of the simple and standard linked list, but, you know: 'live and learn!'

With this declaration the emptyBag becomes so trivial as to be unnecessary, and then add is simplicity, itself, too, but add is, either way, so that's not saying much.

add : Eq x => Bag x -> x -> Bag x
add Air x = Stuffed (x, 1) Air
add (Stuffed (z, y) rest) x =
    case x == z of
        True  => Stuffed (x, y + 1) rest
        False => Stuffed (z, y) (add rest x)

Now, you see me relying on the case-statement, here. Unhappily.

I'd like my dependent types to say, 'unify x with x (reflexive) for the isomorphic case, and don't unify x with z for the other case.' But we're not there yet, or my coding isn't on par with being there yet, so I forced total coverage bifurcating the result-set into isomorphic and not with a hard-case statement.

Ick. I hate explicit case-statements! Where is really, really, really smart pattern-matching when I need it?

But with add, constructing a Bag becomes easy, and then counting elements of that bag is easy, too (again, with another case-statement, sigh!):

count : Eq x => x -> Bag x -> Nat
count _ Air = 0
count x (Stuffed (z, y) rest) =
    case x == z of
        True  => y
        False => count x rest

countOccurences (with one-too-few 'r's in the function name) becomes easy, given the Bag data type:

countOccurences : String -> Bag Char
countOccurences str = co' (unpack str) where
    co' [] = Air
    co' (char :: rest) = add (co' rest) char


But look at this:

depth : Bag x -> Nat
depth Air = 0
depth (Stuffed _ rest) = 1 + depth rest

sample : ?bag
sample = countOccurences "The quick, brown fox jumped over the lazy dog."

bag = proof search

When we do a depth sample, we get the not-surprising answer of 29 : Nat

Perhaps this could be made a tad bit more efficient?

Just perhaps.

Well, then, let's do that!

data Bag x = Air | Stuffed (x, Nat) (Bag x) (Bag x)

We make Bag balanced, with the add-function, doing the work of (very simply) branching off new nodes:

add : Ord x => Bag x -> x -> Bag x
add Air x = Stuffed (x, 1) Air Air
add (Stuffed (z, y) less more) x =
    case (compare x z) of
        LT => Stuffed (z, y) (add less x) more
        GT => Stuffed (z, y) less (add more x)
        EQ => Stuffed (z, y + 1) less more

Then all the other functions change ('morph') to work with a tree, not a list and work with Ord elements, not with (simply) Eq ones.

And so, the redefined depth-function gives a very different result:

depth sample ~> 9 : Nat

Not bad! Not bad! The improved data-structure improves efficiency across the board from O(N) to O(log N).

Hm, perhaps I'll have count return a dependently-typed pair, just as the library function filter does on List types, but not tonight.

Good night, Moon!

by geophf ( at July 21, 2014 01:14 AM

July 20, 2014

JP Moresmau

BuildWrapper/EclipseFP and GHC 7.8

I've been working on some issues related to GHC 7.8 in BuildWrapper and EclipseFP. On the EclipseFP side, mainly the quickfixes are affected, because EclipseFP parses the GHC error messages to offer them, and the quotes characters have changed in the GHC 7.8 messages.

On the BuildWrapper side, things are more complex. Adapting to API changes wasn't a big deal, but it seems that GHC bugs involving the GHC API, static linking and other unknowns cause some things to break. The solution I've found was to build BuildWrapper with the -dynamic flag. But I couldn't upload this to hackage because Cabal thinks that -dynamic is a debug flag (it starts with d). I've sent a bug fix to Cabal, so in the next release that'll be fixed. So if you're using GHC 7.8 and BuildWrapper, you may want to rebuild the executable with -dynamic (uncomment the relevant line in the cabal file).

Note: BuildWrapper comes with a comprehensive test suite (90 tests covering all aspects). So you can always build the tests and run them to ensure everyting is OK on your system.

Happy Haskell Hacking!

by JP Moresmau ( at July 20, 2014 05:24 PM

Gabriel Gonzalez

Equational reasoning at scale

Haskell programmers care about the correctness of their software and they specify correctness conditions in the form of equations that their code must satisfy. They can then verify the correctness of these equations using equational reasoning to prove that the abstractions they build are sound. To an outsider this might seem like a futile, academic exercise: proving the correctness of small abstractions is difficult, so what hope do we have to prove larger abstractions correct? This post explains how to do precisely that: scale proofs to large and complex abstractions.

Purely functional programming uses composition to scale programs, meaning that:

  • We build small components that we can verify correct in isolation
  • We compose smaller components into larger components

If you saw "components" and thought "functions", think again! We can compose things that do not even remotely resemble functions, such as proofs! In fact, Haskell programmers prove large-scale properties exactly the same way we build large-scale programs:

  • We build small proofs that we can verify correct in isolation
  • We compose smaller proofs into larger proofs

The following sections illustrate in detail how this works in practice, using Monoids as the running example. We will prove the Monoid laws for simple types and work our way up to proving the Monoid laws for much more complex types. Along the way we'll learn how to keep the proof complexity flat as the types grow in size.


Haskell's Prelude provides the following Monoid type class:

class Monoid m where
mempty :: m
mappend :: m -> m -> m

-- An infix operator equivalent to `mappend`
(<>) :: Monoid m => m -> m -> m
x <> y = mappend x y

... and all Monoid instances must obey the following two laws:

mempty <> x = x                -- Left identity

x <> mempty = x -- Right identity

(x <> y) <> z = x <> (y <> z) -- Associativity

For example, Ints form a Monoid:

-- See "Appendix A" for some caveats
instance Monoid Int where
mempty = 0
mappend = (+)

... and the Monoid laws for Ints are just the laws of addition:

0 + x = x

x + 0 = x

(x + y) + z = x + (y + z)

Now we can use (<>) and mempty instead of (+) and 0:

>>> 4 <> 2
>>> 5 <> mempty <> 5

This appears useless at first glance. We already have (+) and 0, so why are we using the Monoid operations?

Extending Monoids

Well, what if I want to combine things other than Ints, like pairs of Ints. I want to be able to write code like this:

>>> (1, 2) <> (3, 4)
(4, 6)

Well, that seems mildly interesting. Let's try to define a Monoid instance for pairs of Ints:

instance Monoid (Int, Int) where
mempty = (0, 0)
mappend (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)

Now my wish is true and I can "add" binary tuples together using (<>) and mempty:

>>> (1, 2) <> (3, 4)
(4, 6)
>>> (1, 2) <> (3, mempty) <> (mempty, 4)
(4, 6)
>>> (1, 2) <> mempty <> (3, 4)
(4, 6)

However, I still haven't proven that this new Monoid instance obeys the Monoid laws. Fortunately, this is a very simple proof.

I'll begin with the first Monoid law, which requires that:

mempty <> x = x

We will begin from the left-hand side of the equation and try to arrive at the right-hand side by substituting equals-for-equals (a.k.a. "equational reasoning"):

-- Left-hand side of the equation
mempty <> x

-- x <> y = mappend x y
= mappend mempty x

-- `mempty = (0, 0)`
= mappend (0, 0) x

-- Define: x = (xL, xR), since `x` is a tuple
= mappend (0, 0) (xL, xR)

-- mappend (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)
= (0 + xL, 0 + xR)

-- 0 + x = x
= (xL, xR)

-- x = (xL, xR)
= x

The proof for the second Monoid law is symmetric

-- Left-hand side of the equation
= x <> mempty

-- x <> y = mappend x y
= mappend x mempty

-- mempty = (0, 0)
= mappend x (0, 0)

-- Define: x = (xL, xR), since `x` is a tuple
= mappend (xL, xR) (0, 0)

-- mappend (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)
= (xL + 0, xR + 0)

-- x + 0 = x
= (xL, xR)

-- x = (xL, xR)
= x

The third Monoid law requires that (<>) is associative:

(x <> y) <> z = x <> (y <> z)

Again I'll begin from the left side of the equation:

-- Left-hand side
(x <> y) <> z

-- x <> y = mappend x y
= mappend (mappend x y) z

-- x = (xL, xR)
-- y = (yL, yR)
-- z = (zL, zR)
= mappend (mappend (xL, xR) (yL, yR)) (zL, zR)

-- mappend (x1, y1) (x2 , y2) = (x1 + x2, y1 + y2)
= mappend (xL + yL, xR + yR) (zL, zR)

-- mappend (x1, y1) (x2 , y2) = (x1 + x2, y1 + y2)
= mappend ((xL + yL) + zL, (xR + yR) + zR)

-- (x + y) + z = x + (y + z)
= mappend (xL + (yL + zL), xR + (yR + zR))

-- mappend (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)
= mappend (xL, xR) (yL + zL, yR + zR)

-- mappend (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)
= mappend (xL, xR) (mappend (yL, yR) (zL, zR))

-- x = (xL, xR)
-- y = (yL, yR)
-- z = (zL, zR)
= mappend x (mappend y z)

-- x <> y = mappend x y
= x <> (y <> z)

That completes the proof of the three Monoid laws, but I'm not satisfied with these proofs.

Generalizing proofs

I don't like the above proofs because they are disposable, meaning that I cannot reuse them to prove other properties of interest. I'm a programmer, so I loathe busy work and unnecessary repetition, both for code and proofs. I would like to find a way to generalize the above proofs so that I can use them in more places.

We improve proof reuse in the same way that we improve code reuse. To see why, consider the following sort function:

sort :: [Int] -> [Int]

This sort function is disposable because it only works on Ints. For example, I cannot use the above function to sort a list of Doubles.

Fortunately, programming languages with generics let us generalize sort by parametrizing sort on the element type of the list:

sort :: Ord a => [a] -> [a]

That type says that we can call sort on any list of as, so long as the type a implements the Ord type class (a comparison interface). This works because sort doesn't really care whether or not the elements are Ints; sort only cares if they are comparable.

Similarly, we can make the proof more "generic". If we inspect the proof closely, we will notice that we don't really care whether or not the tuple contains Ints. The only Int-specific properties we use in our proof are:

0 + x = x

x + 0 = x

(x + y) + z = x + (y + z)

However, these properties hold true for all Monoids, not just Ints. Therefore, we can generalize our Monoid instance for tuples by parametrizing it on the type of each field of the tuple:

instance (Monoid a, Monoid b) => Monoid (a, b) where
mempty = (mempty, mempty)

mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)

The above Monoid instance says that we can combine tuples so long as we can combine their individual fields. Our original Monoid instance was just a special case of this instance where both the a and b types are Ints.

Note: The mempty and mappend on the left-hand side of each equation are for tuples. The memptys and mappends on the right-hand side of each equation are for the types a and b. Haskell overloads type class methods like mempty and mappend to work on any type that implements the Monoid type class, and the compiler distinguishes them by their inferred types.

We can similarly generalize our original proofs, too, by just replacing the Int-specific parts with their more general Monoid counterparts.

Here is the generalized proof of the left identity law:

-- Left-hand side of the equation
mempty <> x

-- x <> y = mappend x y
= mappend mempty x

-- `mempty = (mempty, mempty)`
= mappend (mempty, mempty) x

-- Define: x = (xL, xR), since `x` is a tuple
= mappend (mempty, mempty) (xL, xR)

-- mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)
= (mappend mempty xL, mappend mempty xR)

-- Monoid law: mappend mempty x = x
= (xL, xR)

-- x = (xL, xR)
= x

... the right identity law:

-- Left-hand side of the equation
= x <> mempty

-- x <> y = mappend x y
= mappend x mempty

-- mempty = (mempty, mempty)
= mappend x (mempty, mempty)

-- Define: x = (xL, xR), since `x` is a tuple
= mappend (xL, xR) (mempty, mempty)

-- mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)
= (mappend xL mempty, mappend xR mempty)

-- Monoid law: mappend x mempty = x
= (xL, xR)

-- x = (xL, xR)
= x

... and the associativity law:

-- Left-hand side
(x <> y) <> z

-- x <> y = mappend x y
= mappend (mappend x y) z

-- x = (xL, xR)
-- y = (yL, yR)
-- z = (zL, zR)
= mappend (mappend (xL, xR) (yL, yR)) (zL, zR)

-- mappend (x1, y1) (x2 , y2) = (mappend x1 x2, mappend y1 y2)
= mappend (mappend xL yL, mappend xR yR) (zL, zR)

-- mappend (x1, y1) (x2 , y2) = (mappend x1 x2, mappend y1 y2)
= (mappend (mappend xL yL) zL, mappend (mappend xR yR) zR)

-- Monoid law: mappend (mappend x y) z = mappend x (mappend y z)
= (mappend xL (mappend yL zL), mappend xR (mappend yR zR))

-- mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)
= mappend (xL, xR) (mappend yL zL, mappend yR zR)

-- mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)
= mappend (xL, xR) (mappend (yL, yR) (zL, zR))

-- x = (xL, xR)
-- y = (yL, yR)
-- z = (zL, zR)
= mappend x (mappend y z)

-- x <> y = mappend x y
= x <> (y <> z)

This more general Monoid instance lets us stick any Monoids inside the tuple fields and we can still combine the tuples. For example, lists form a Monoid:

-- Exercise: Prove the monoid laws for lists
instance Monoid [a] where
mempty = []

mappend = (++)

... so we can stick lists inside the right field of each tuple and still combine them:

>>> (1, [2, 3]) <> (4, [5, 6])
(5, [2, 3, 5, 6])
>>> (1, [2, 3]) <> (4, mempty) <> (mempty, [5, 6])
(5, [2, 3, 5, 6])
>>> (1, [2, 3]) <> mempty <> (4, [5, 6])
(5, [2, 3, 5, 6])

Why, we can even stick yet another tuple inside the right field and still combine them:

>>> (1, (2, 3)) <> (4, (5, 6))
(5, (7, 9))

We can try even more exotic permutations and everything still "just works":

>>> ((1,[2, 3]), ([4, 5], 6)) <> ((7, [8, 9]), ([10, 11), 12)
((8, [2, 3, 8, 9]), ([4, 5, 10, 11], 18))

This is our first example of a "scalable proof". We began from three primitive building blocks:

  • Int is a Monoid
  • [a] is a Monoid
  • (a, b) is a Monoid if a is a Monoid and b is a Monoid

... and we connected those three building blocks to assemble a variety of new Monoid instances. No matter how many tuples we nest the result is still a Monoid and obeys the Monoid laws. We don't need to re-prove the Monoid laws every time we assemble a new permutation of these building blocks.

However, these building blocks are still pretty limited. What other useful things can we combine to build new Monoids?


We're so used to thinking of Monoids as data, so let's define a new Monoid instance for something entirely un-data-like:

-- See "Appendix A" for some caveats
instance Monoid b => Monoid (IO b) where
mempty = return mempty

mappend io1 io2 = do
a1 <- io1
a2 <- io2
return (mappend a1 a2)

The above instance says: "If a is a Monoid, then an IO action that returns an a is also a Monoid". Let's test this using the getLine function from the Prelude:

-- Read one line of input from stdin
getLine :: IO String

String is a Monoid, since a String is just a list of characters, so we should be able to mappend multiple getLine statements together. Let's see what happens:

>>> getLine  -- Reads one line of input
>>> getLine <> getLine
>>> getLine <> getLine <> getLine

Neat! When we combine multiple commands we combine their effects and their results.

Of course, we don't have to limit ourselves to reading strings. We can use readLn from the Prelude to read in anything that implements the Read type class:

-- Parse a `Read`able value from one line of stdin
readLn :: Read a => IO a

All we have to do is tell the compiler which type a we intend to Read by providing a type signature:

>>> readLn :: IO (Int, Int)
(1, 2)<Enter>
(1 ,2)
>>> readLn <> readLn :: IO (Int, Int)
>>> readLn <> readLn <> readLn :: IO (Int, Int)

This works because:

  • Int is a Monoid
  • Therefore, (Int, Int) is a Monoid
  • Therefore, IO (Int, Int) is a Monoid

Or let's flip things around and nest IO actions inside of a tuple:

>>> let ios = (getLine, readLn) :: (IO String, IO (Int, Int))
>>> let (getLines, readLns) = ios <> ios <> ios
>>> getLines
>>> readLns

We can very easily reason that the type (IO String, IO (Int, Int)) obeys the Monoid laws because:

  • String is a Monoid
  • If String is a Monoid then IO String is also a Monoid
  • Int is a Monoid
  • If Int is a Monoid, then (Int, Int) is also a `Monoid
  • If (Int, Int) is a Monoid, then IO (Int, Int) is also a Monoid
  • If IO String is a Monoid and IO (Int, Int) is a Monoid, then (IO String, IO (Int, Int)) is also a Monoid

However, we don't really have to reason about this at all. The compiler will automatically assemble the correct Monoid instance for us. The only thing we need to verify is that the primitive Monoid instances obey the Monoid laws, and then we can trust that any larger Monoid instance the compiler derives will also obey the Monoid laws.

The Unit Monoid

Haskell Prelude also provides the putStrLn function, which echoes a String to standard output with a newline:

putStrLn :: String -> IO ()

Is putStrLn combinable? There's only one way to find out!

>>> putStrLn "Hello" <> putStrLn "World"

Interesting, but why does that work? Well, let's look at the types of the commands we are combining:

putStrLn "Hello" :: IO ()
putStrLn "World" :: IO ()

Well, we said that IO b is a Monoid if b is a Monoid, and b in this case is () (pronounced "unit"), which you can think of as an "empty tuple". Therefore, () must form a Monoid of some sort, and if we dig into Data.Monoid, we will discover the following Monoid instance:

-- Exercise: Prove the monoid laws for `()`
instance Monoid () where
mempty = ()

mappend () () = ()

This says that empty tuples form a trivial Monoid, since there's only one possible value (ignoring bottom) for an empty tuple: (). Therefore, we can derive that IO () is a Monoid because () is a Monoid.


Alright, so we can combine putStrLn "Hello" with putStrLn "World", but can we combine naked putStrLn functions?

>>> (putStrLn <> putStrLn) "Hello"

Woah, how does that work?

We never wrote a Monoid instance for the type String -> IO (), yet somehow the compiler magically accepted the above code and produced a sensible result.

This works because of the following Monoid instance for functions:

instance Monoid b => Monoid (a -> b) where
mempty = \_ -> mempty

mappend f g = \a -> mappend (f a) (g a)

This says: "If b is a Monoid, then any function that returns a b is also a Monoid".

The compiler then deduced that:

  • () is a Monoid
  • If () is a Monoid, then IO () is also a Monoid
  • If IO () is a Monoid then String -> IO () is also a Monoid

The compiler is a trusted friend, deducing Monoid instances we never knew existed.

Monoid plugins

Now we have enough building blocks to assemble a non-trivial example. Let's build a key logger with a Monoid-based plugin system.

The central scaffold of our program is a simple main loop that echoes characters from standard input to standard output:

main = do
hSetEcho stdin False
forever $ do
c <- getChar
putChar c

However, we would like to intercept key strokes for nefarious purposes, so we will slightly modify this program to install a handler at the beginning of the program that we will invoke on every incoming character:

install :: IO (Char -> IO ())
install = ???

main = do
hSetEcho stdin False
handleChar <- install
forever $ do
c <- getChar
handleChar c
putChar c

Notice that the type of install is exactly the correct type to be a Monoid:

  • () is a Monoid
  • Therefore, IO () is also a Monoid
  • Therefore Char -> IO () is also a Monoid
  • Therefore IO (Char -> IO ()) is also a Monoid

Therefore, we can combine key logging plugins together using Monoid operations. Here is one such example:

type Plugin = IO (Char -> IO ())

logTo :: FilePath -> Plugin
logTo filePath = do
handle <- openFile filePath WriteMode
return (hPutChar handle)

main = do
hSetEcho stdin False
handleChar <- logTo "file1.txt" <> logTo "file2.txt"
forever $ do
c <- getChar
handleChar c
putChar c

Now, every key stroke will be recorded to both file1.txt and file2.txt. Let's confirm that this works as expected:

$ ./logger
$ cat file1.txt
$ cat file2.txt

Try writing your own Plugins and mixing them in with (<>) to see what happens. "Appendix C" contains the complete code for this section so you can experiment with your own Plugins.


Notice that I never actually proved the Monoid laws for the following two Monoid instances:

instance Monoid b => Monoid (a -> b) where
mempty = \_ -> mempty
mappend f g = \a -> mappend (f a) (g a)

instance Monoid a => Monoid (IO a) where
mempty = return mempty

mappend io1 io2 = do
a1 <- io1
a2 <- io2
return (mappend a1 a2)

The reason why is that they are both special cases of a more general pattern. We can detect the pattern if we rewrite both of them to use the pure and liftA2 functions from Control.Applicative:

import Control.Applicative (pure, liftA2)

instance Monoid b => Monoid (a -> b) where
mempty = pure mempty

mappend = liftA2 mappend

instance Monoid b => Monoid (IO b) where
mempty = pure mempty

mappend = liftA2 mappend

This works because both IO and functions implement the following Applicative interface:

class Functor f => Applicative f where
pure :: a -> f a
(<*>) :: f (a -> b) -> f a -> f b

-- Lift a binary function over the functor `f`
liftA2 :: Applicative f => (a -> b -> c) -> f a -> f b -> f c
liftA2 f x y = (pure f <*> x) <*> y

... and all Applicative instances must obey several Applicative laws:

pure id <*> v = v

((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)

pure f <*> pure x = pure (f x)

u <*> pure x = pure (\f -> f y) <*> u

These laws may seem a bit adhoc, but this paper explains that you can reorganize the Applicative class to this equivalent type class:

class Functor f => Monoidal f where
unit :: f ()
(#) :: f a -> f b -> f (a, b)

Then the corresponding laws become much more symmetric:

fmap snd (unit # x) = x                 -- Left identity

fmap fst (x # unit) = x -- Right identity

fmap assoc ((x # y) # z) = x # (y # z) -- Associativity
assoc ((a, b), c) = (a, (b, c))

fmap (f *** g) (x # y) = fmap f x # fmap g y -- Naturality
(f *** g) (a, b) = (f a, g b)

I personally prefer the Monoidal formulation, but you go to war with the army you have, so we will use the Applicative type class for this post.

All Applicatives possess a very powerful property: they can all automatically lift Monoid operations using the following instance:

instance (Applicative f, Monoid b) => Monoid (f b) where
mempty = pure mempty

mappend = liftA2 mappend

This says: "If f is an Applicative and b is a Monoid, then f b is also a Monoid." In other words, we can automatically extend any existing Monoid with some new feature f and get back a new Monoid.

Note: The above instance is bad Haskell because it overlaps with other type class instances. In practice we have to duplicate the above code once for each Applicative. Also, for some Applicatives we may want a different Monoid instance.

We can prove that the above instance obeys the Monoid laws without knowing anything about f and b, other than the fact that f obeys the Applicative laws and b obeys the Applicative laws. These proofs are a little long, so I've included them in Appendix B.

Both IO and functions implement the Applicative type class:

instance Applicative IO where
pure = return

iof <*> iox = do
f <- iof
x <- iox
return (f x)

instance Applicative ((->) a) where
pure x = \_ -> x

kf <*> kx = \a ->
let f = kf a
x = kx a
in f x

This means that we can kill two birds with one stone. Every time we prove the Applicative laws for some functor F:

instance Applicative F where ...

... we automatically prove that the following Monoid instance is correct for free:

instance Monoid b => Monoid (F b) where
mempty = pure mempty

mappend = liftA2 mappend

In the interest of brevity, I will skip the proofs of the Applicative laws, but I may cover them in a subsequent post.

The beauty of Applicative Functors is that every new Applicative instance we discover adds a new building block to our Monoid toolbox, and Haskell programmers have already discovered lots of Applicative Functors.

Revisiting tuples

One of the very first Monoid instances we wrote was:

instance (Monoid a, Monoid b) => Monoid (a, b) where
mempty = (mempty, mempty)

mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)

Check this out:

instance (Monoid a, Monoid b) => Monoid (a, b) where
mempty = pure mempty

mappend = liftA2 mappend

This Monoid instance is yet another special case of the Applicative pattern we just covered!

This works because of the following Applicative instance in Control.Applicative:

instance Monoid a => Applicative ((,) a) where
pure b = (mempty, b)

(a1, f) <*> (a2, x) = (mappend a1 a2, f x)

This instance obeys the Applicative laws (proof omitted), so our Monoid instance for tuples is automatically correct, too.

Composing applicatives

In the very first section I wrote:

Haskell programmers prove large-scale properties exactly the same way we build large-scale programs:

  • We build small proofs that we can verify correct in isolation
  • We compose smaller proofs into larger proofs

I don't like to use the word compose lightly. In the context of category theory, compose has a very rigorous meaning, indicating composition of morphisms in some category. This final section will show that we can actually compose Monoid proofs in a very rigorous sense of the word.

We can define a category of Monoid proofs:

So in our Plugin example, we began on the proof that () was a Monoid and then composed three Applicative morphisms to prove that Plugin was a Monoid. I will use the following diagram to illustrate this:

| |
| Legend: * = Object |
| |
| v |
| | = Morphism |
| v |
| |

* `()` is a `Monoid`

| IO

* `IO ()` is a `Monoid`

| ((->) String)

* `String -> IO ()` is a `Monoid`

| IO

* `IO (String -> IO ())` (i.e. `Plugin`) is a `Monoid`

Therefore, we were literally composing proofs together.


You can equationally reason at scale by decomposing larger proofs into smaller reusable proofs, the same way we decompose programs into smaller and more reusable components. There is no limit to how many proofs you can compose together, and therefore there is no limit to how complex of a program you can tame using equational reasoning.

This post only gave one example of composing proofs within Haskell. The more you learn the language, the more examples of composable proofs you will encounter. Another common example is automatically deriving Monad proofs by composing monad transformers.

As you learn Haskell, you will discover that the hard part is not proving things. Rather, the challenge is learning how to decompose proofs into smaller proofs and you can cultivate this skill by studying category theory and abstract algebra. These mathematical disciplines teach you how to extract common and reusable proofs and patterns from what appears to be disposable and idiosyncratic code.

Appendix A - Missing Monoid instances

These Monoid instance from this post do not actually appear in the Haskell standard library:

instance Monoid b => Monoid (IO b)

instance Monoid Int

The first instance was recently proposed here on the Glasgow Haskell Users mailing list. However, in the short term you can work around it by writing your own Monoid instances by hand just by inserting a sufficient number of pures and liftA2s.

For example, suppose we wanted to provide a Monoid instance for Plugin. We would just newtype Plugin and write:

newtype Plugin = Plugin { install :: IO (String -> IO ()) }

instance Monoid Plugin where
mempty = Plugin (pure (pure (pure mempty)))

mappend (Plugin p1) (Plugin p2) =
Plugin (liftA2 (liftA2 (liftA2 mappend)) p1 p2)

This is what the compiler would have derived by hand.

Alternatively, you could define an orphan Monoid instance for IO, but this is generally frowned upon.

There is no default Monoid instance for Int because there are actually two possible instances to choose from:

-- Alternative #1
instance Monoid Int where
mempty = 0

mappend = (+)

-- Alternative #2
instance Monoid Int where
mempty = 1

mappend = (*)

So instead, Data.Monoid sidesteps the issue by providing two newtypes to distinguish which instance we prefer:

newtype Sum a = Sum { getSum :: a }

instance Num a => Monoid (Sum a)

newtype Product a = Product { getProduct :: a}

instance Num a => Monoid (Product a)

An even better solution is to use a semiring, which allows two Monoid instances to coexist for the same type. You can think of Haskell's Num class as an approximation of the semiring class:

class Num a where
fromInteger :: Integer -> a

(+) :: a -> a -> a

(*) :: a -> a -> a

-- ... and other operations unrelated to semirings

Note that we can also lift the Num class over the Applicative class, exactly the same way we lifted the Monoid class. Here's the code:

instance (Applicative f, Num a) => Num (f a) where
fromInteger n = pure (fromInteger n)

(+) = liftA2 (+)

(*) = liftA2 (*)

(-) = liftA2 (-)

negate = fmap negate

abs = fmap abs

signum = fmap signum

This lifting guarantees that if a obeys the semiring laws then so will f a. Of course, you will have to specialize the above instance to every concrete Applicative because otherwise you will get overlapping instances.

Appendix B

These are the proofs to establish that the following Monoid instance obeys the Monoid laws:

instance (Applicative f, Monoid b) => Monoid (f b) where
mempty = pure mempty

mappend = liftA2 mappend

... meaning that if f obeys the Applicative laws and b obeys the Monoid laws, then f b also obeys the Monoid laws.

Proof of the left identity law:

mempty <> x

-- x <> y = mappend x y
= mappend mempty x

-- mappend = liftA2 mappend
= liftA2 mappend mempty x

-- mempty = pure mempty
= liftA2 mappend (pure mempty) x

-- liftA2 f x y = (pure f <*> x) <*> y
= (pure mappend <*> pure mempty) <*> x

-- Applicative law: pure f <*> pure x = pure (f x)
= pure (mappend mempty) <*> x

-- Eta conversion
= pure (\a -> mappend mempty a) <*> x

-- mappend mempty x = x
= pure (\a -> a) <*> x

-- id = \x -> x
= pure id <*> x

-- Applicative law: pure id <*> v = v
= x

Proof of the right identity law:

x <> mempty = x

-- x <> y = mappend x y
= mappend x mempty

-- mappend = liftA2 mappend
= liftA2 mappend x mempty

-- mempty = pure mempty
= liftA2 mappend x (pure mempty)

-- liftA2 f x y = (pure f <*> x) <*> y
= (pure mappend <*> x) <*> pure mempty

-- Applicative law: u <*> pure y = pure (\f -> f y) <*> u
= pure (\f -> f mempty) <*> (pure mappend <*> x)

-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
= ((pure (.) <*> pure (\f -> f mempty)) <*> pure mappend) <*> x

-- Applicative law: pure f <*> pure x = pure (f x)
= (pure ((.) (\f -> f mempty)) <*> pure mappend) <*> x

-- Applicative law : pure f <*> pure x = pure (f x)
= pure ((.) (\f -> f mempty) mappend) <*> x

-- `(.) f g` is just prefix notation for `f . g`
= pure ((\f -> f mempty) . mappend) <*> x

-- f . g = \x -> f (g x)
= pure (\x -> (\f -> f mempty) (mappend x)) <*> x

-- Apply the lambda
= pure (\x -> mappend x mempty) <*> x

-- Monoid law: mappend x mempty = x
= pure (\x -> x) <*> x

-- id = \x -> x
= pure id <*> x

-- Applicative law: pure id <*> v = v
= x

Proof of the associativity law:

(x <> y) <> z

-- x <> y = mappend x y
= mappend (mappend x y) z

-- mappend = liftA2 mappend
= liftA2 mappend (liftA2 mappend x y) z

-- liftA2 f x y = (pure f <*> x) <*> y
= (pure mappend <*> ((pure mappend <*> x) <*> y)) <*> z

-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
= (((pure (.) <*> pure mappend) <*> (pure mappend <*> x)) <*> y) <*> z

-- Applicative law: pure f <*> pure x = pure (f x)
= ((pure f <*> (pure mappend <*> x)) <*> y) <*> z
f = (.) mappend

-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
= ((((pure (.) <*> pure f) <*> pure mappend) <*> x) <*> y) <*> z
f = (.) mappend

-- Applicative law: pure f <*> pure x = pure (f x)
= (((pure f <*> pure mappend) <*> x) <*> y) <*> z
f = (.) ((.) mappend)

-- Applicative law: pure f <*> pure x = pure (f x)
= ((pure f <*> x) <*> y) <*> z
f = (.) ((.) mappend) mappend

-- (.) f g = f . g
= ((pure f <*> x) <*> y) <*> z
f = ((.) mappend) . mappend

-- Eta conversion
= ((pure f <*> x) <*> y) <*> z
f x = (((.) mappend) . mappend) x

-- (f . g) x = f (g x)
= ((pure f <*> x) <*> y) <*> z
f x = (.) mappend (mappend x)

-- (.) f g = f . g
= ((pure f <*> x) <*> y) <*> z
f x = mappend . (mappend x)

-- Eta conversion
= ((pure f <*> x) <*> y) <*> z
f x y = (mappend . (mappend x)) y

-- (f . g) x = f (g x)
= ((pure f <*> x) <*> y) <*> z
f x y = mappend (mappend x y)

-- Eta conversion
= ((pure f <*> x) <*> y) <*> z
f x y z = mappend (mappend x y) z

-- Monoid law: mappend (mappend x y) z = mappend x (mappend y z)
= ((pure f <*> x) <*> y) <*> z
f x y z = mappend x (mappend y z)

-- (f . g) x = f (g x)
= ((pure f <*> x) <*> y) <*> z
f x y z = (mappend x . mappend y) z

-- Eta conversion
= ((pure f <*> x) <*> y) <*> z
f x y = mappend x . mappend y

-- (.) f g = f . g
= ((pure f <*> x) <*> y) <*> z
f x y = (.) (mappend x) (mappend y)

-- (f . g) x = f
= ((pure f <*> x) <*> y) <*> z
f x y = (((.) . mappend) x) (mappend y)

-- (f . g) x = f (g x)
= ((pure f <*> x) <*> y) <*> z
f x y = ((((.) . mappend) x) . mappend) y

-- Eta conversion
= ((pure f <*> x) <*> y) <*> z
f x = (((.) . mappend) x) . mappend

-- (.) f g = f . g
= ((pure f <*> x) <*> y) <*> z
f x = (.) (((.) . mappend) x) mappend

-- Lambda abstraction
= ((pure f <*> x) <*> y) <*> z
f x = (\k -> k mappend) ((.) (((.) . mappend) x))

-- (f . g) x = f (g x)
= ((pure f <*> x) <*> y) <*> z
f x = (\k -> k mappend) (((.) . ((.) . mappend)) x)

-- Eta conversion
= ((pure f <*> x) <*> y) <*> z
f = (\k -> k mappend) . ((.) . ((.) . mappend))

-- (.) f g = f . g
= ((pure f <*> x) <*> y) <*> z
f = (.) (\k -> k mappend) ((.) . ((.) . mappend))

-- Applicative law: pure f <*> pure x = pure (f x)
= (((pure g <*> pure f) <*> x) <*> y) <*> z
g = (.) (\k -> k mappend)
f = (.) . ((.) . mappend)

-- Applicative law: pure f <*> pure x = pure (f x)
= ((((pure (.) <*> pure (\k -> k mappend)) <*> pure f) <*> x) <*> y) <*> z
f = (.) . ((.) . mappend)

-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
= ((pure (\k -> k mappend) <*> (pure f <*> x)) <*> y) <*> z
f = (.) . ((.) . mappend)

-- u <*> pure y = pure (\k -> k y) <*> u
= (((pure f <*> x) <*> pure mappend) <*> y) <*> z
f = (.) . ((.) . mappend)

-- (.) f g = f . g
= (((pure f <*> x) <*> pure mappend) <*> y) <*> z
f = (.) (.) ((.) . mappend)

-- Applicative law: pure f <*> pure x = pure (f x)
= ((((pure g <*> pure f) <*> x) <*> pure mappend) <*> y) <*> z
g = (.) (.)
f = (.) . mappend

-- Applicative law: pure f <*> pure x = pure (f x)
= (((((pure (.) <*> pure (.)) <*> pure f) <*> x) <*> pure mappend) <*> y) <*> z
f = (.) . mappend

-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
= (((pure (.) <*> (pure f <*> x)) <*> pure mappend) <*> y) <*> z
f = (.) . mappend

-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
= ((pure f <*> x) <*> (pure mappend <*> y)) <*> z
f = (.) . mappend

-- (.) f g = f . g
= ((pure f <*> x) <*> (pure mappend <*> y)) <*> z
f = (.) (.) mappend

-- Applicative law: pure f <*> pure x = pure (f x)
= (((pure f <*> pure mappend) <*> x) <*> (pure mappend <*> y)) <*> z
f = (.) (.)

-- Applicative law: pure f <*> pure x = pure (f x)
= ((((pure (.) <*> pure (.)) <*> pure mappend) <*> x) <*> (pure mappend <*> y)) <*> z

-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
= ((pure (.) <*> (pure mappend <*> x)) <*> (pure mappend <*> y)) <*> z

-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
= (pure mappend <*> x) <*> ((pure mappend <*> y) <*> z)

-- liftA2 f x y = (pure f <*> x) <*> y
= liftA2 mappend x (liftA2 mappend y z)

-- mappend = liftA2 mappend
= mappend x (mappend y z)

-- x <> y = mappend x y
= x <> (y <> z)

Appendix C: Monoid key logging

Here is the complete program for a key logger with a Monoid-based plugin system:

import Control.Applicative (pure, liftA2)
import Control.Monad (forever)
import Data.Monoid
import System.IO

instance Monoid b => Monoid (IO b) where
mempty = pure mempty

mappend = liftA2 mappend

type Plugin = IO (Char -> IO ())

logTo :: FilePath -> Plugin
logTo filePath = do
handle <- openFile filePath WriteMode
return (hPutChar handle)

main = do
hSetEcho stdin False
handleChar <- logTo "file1.txt" <> logTo "file2.txt"
forever $ do
c <- getChar
handleChar c
putChar c

by Gabriel Gonzalez ( at July 20, 2014 03:27 PM

Mark Jason Dominus

Similarity analysis of quilt blocks

As I've discussed elsewhere, I once wrote a program to enumerate all the possible quilt blocks of a certain type. The quilt blocks in question are, in quilt jargon, sixteen-patch half-square triangles. A half-square triangle, also called a “patch”, is two triangles of fabric sewn together, like this: half-square triangle

Then you sew four of these patches into a four-patch, say like this:


Then to make a sixteen-patch block of the type I was considering, you take four identical four-patch blocks, and sew them together with rotational symmetry, like this:


It turns out that there are exactly 72 different ways to do this. (Blocks equivalent under a reflection are considered the same, as are blocks obtained by exchanging the roles of black and white, which are merely stand-ins for arbitrary colors to be chosen later.) Here is the complete set of 72:

block A1 block A2 block A3 block A4 block B1 block B2 block B3 block B4 block C1 block C2 block C3 block C4 block D1 block D2 block D3 block D4 block E1 block E2 block E3 block E4 block F1 block F2 block F3 block F4 block G1 block G2 block G3 block G4 block H1 block H2 block H3 block H4 block I1 block I2 block I3 block I4 block J1 block J2 block J3 block J4 block K1 block K2 block K3 block K4 block L1 block L2 block L3 block L4 block M1 block M2 block M3 block M4 block N1 block N2 block N3 block N4 block O1 block O2 block O3 block O4 block P1 block P2 block P3 block P4 block Q1 block Q2 block Q3 block Q4 block R1 block R2 block R3 block R4

It's immediately clear that some of these resemble one another, sometimes so strongly that it can be hard to tell how they differ, while others are very distinctive and unique-seeming. I wanted to make the computer classify the blocks on the basis of similarity.

My idea was to try to find a way to get the computer to notice which blocks have distinctive components of one color. For example, many blocks have a distinctive diamond shape small diamond shape in the center.

Some have a pinwheel like this:

pinwheel 1

which also has the diamond in the middle, while others have a different kind of pinwheel with no diamond:

pinwheel 2

I wanted to enumerate such components and ask the computer to list which blocks contained which shapes; then group them by similarity, the idea being that that blocks with the same distinctive components are similar.

The program suite uses a compact notation of blocks and of shapes that makes it easy to figure out which blocks contain which distinctive components.

Since each block is made of four identical four-patches, it's enough just to examine the four-patches. Each of the half-square triangle patches can be oriented in two ways:

patch 1   patch 2

Here are two of the 12 ways to orient the patches in a four-patch:

acddgghj four-patch  bbeeffii four-patch

Each 16-patch is made of four four-patches, and you must imagine that the four-patches shown above are in the upper-left position in the 16-patch. Then symmetry of the 16-patch block means that triangles with the same label are in positions that are symmetric with respect to the entire block. For example, the two triangles labeled b are on opposite sides of the block's northwest-southeast diagonal. But there is no symmetry of the full 16-patch block that carries triangle d to triangle g, because d is on the edge of the block, while g is in the interior.

Triangles must be colored opposite colors if they are part of the same patch, but other than that there are no constraints on the coloring.

A block might, of course, have patches in both orientations:

labeled block 3

All the blocks with diagonals oriented this way are assigned descriptors made from the letters bbdefgii.

Once you have chosen one of the 12 ways to orient the diagonals in the four-patch, you still have to color the patches. A descriptor like bbeeffii describes the orientation of the diagonal lines in the squares, but it does not describe the way the four patches are colored; there are between 4 and 8 ways to color each sort of four-patch. For example, the bbeeffii four-patch shown earlier can be colored in six different ways:

bbeeffii four-patch bbeeffii patch 4   bbeeffii patch 1   bbeeffii patch 2   bbeeffii patch 3   bbeeffii patch 5   bbeeffii patch 6

In each case, all four diagonals run from northwest to southeast. (All other ways of coloring this four-patch are equivalent to one of these under one or more of rotation, reflection, and exchange of black and white.)

We can describe a patch by listing the descriptors of the eight triangles, grouped by which triangles form connected regions. For example, the first block above is:

bbeeffii four-patch bbeeffii patch 4   b/bf/ee/fi/i

because there's an isolated white b triangle, then a black parallelogram made of a b and an f patch, then a white triangle made from the two white e triangles, then another parallelogram made from the black f and i, and finally in the middle, the white i. (The two white e triangles appear to be separated, but when four of these four-patches are joined into a 16-patch block, the two white e patches will be adjacent and will form a single large triangle: b/bf/ee/fi/i 16-patch)

The other five bbeeffii four-patches are, in the same order they are shown above:


All six have bbeeffii, but grouped differently depending on the colorings. The second one (b/b/e/e/f/f/i/i four-patch b/b/e/e/f/f/i/i) has no regions with more than one triangle; the fifth (bfi/bfi/e/e four-patch bfi/bfi/e/e) has two large regions of three triangles each, and two isolated triangles. In the latter four-patch, the bfi in the descriptor has three letters because the patch has a corresponding distinctive component made of three triangles.

I made up a list of the descriptors for all 72 blocks; I think I did this by hand. (The work directory contains a blocks file that maps blocks to their descriptors, but the Makefile does not say how to build it, suggesting that it was not automatically built.) From this list one can automatically extract a list of descriptors of interesting shapes: an interesting shape is two or more letters that appear together in some descriptor. (Or it can be the single letter j, which is exceptional; see below.) For example, bffh represents a distinctive component. It can only occur in a patch that has a b, two fs, and an h, like this one:

labeled block 4

and it will only be significant if the b, the two fs, and the h are the same color:

bffh patch

in which case you get this distinctive and interesting-looking hook component.

There is only one block that includes this distinctive hook component; it has descriptor b/bffh/ee/j, and looks like this: block b/bffh/ee/j. But some of the distinctive components are more common. The ee component represents the large white half-diamonds on the four sides. A block with "ee" in its descriptor always looks like this:

ee patch

and the blocks formed from such patches always have a distinctive half-diamond component on each edge, like this:

ee block

(The stippled areas vary from block to block, but the blocks with ee in their descriptors always have the half-diamonds as shown.)

The blocks listed at all have the ee component. There are many differences between them, but they all have the half-diamonds in common.

Other distinctive components have similar short descriptors. The two pinwheels I mentioned above are pinwheel 1 gh and pinwheel 2 fi, respectively; if you look at the list of gh blocks and the list of fi blocks you'll see all the blocks with each kind of pinwheel.

Descriptor j is an exception. It makes an interesting shape all by itself, because any block whose patches have j in their descriptor will have a distinctive-looking diamond component in the center. The four-patch looks like this:

j patch

so the full sixteen-patch looks like this:

j block

where the stippled parts can vary. A look at the list of blocks with component j will confirm that they all have this basic similarity.

I had made a list of the descriptors for each of the the 72 blocks, and from this I extracted a list of the descriptors for interesting component shapes. Then it was only a matter of finding the component descriptors in the block descriptors to know which blocks contained which components; if the two blocks share two different distinctive components, they probably look somewhat similar.

Then I sorted the blocks into groups, where two blocks were in the same group if they shared two distinctive components. The resulting grouping lists, for each block, which other blocks have at least two shapes in common with it. Such blocks do indeed tend to look quite similar.

This strategy was actually the second thing I tried; the first thing didn't work out well. (I forget just what it was, but I think it involved finding polygons in each block that had white inside and black outside, or vice versa.) I was satisfied enough with this second attempt that I considered the project a success and stopped work on it.

The complete final results were:

  1. This tabulation of blocks that are somewhat similar
  2. This tabulation of blocks that are distinctly similar (This is the final product; I consider this a sufficiently definitive listing of “similar blocks”.)
  3. This tabulation of blocks that are extremely similar

And these tabulations of all the blocks with various distinctive components: bd bf bfh bfi cd cdd cdf cf cfi ee eg egh egi fgh fh fi gg ggh ggi gh gi j

It may also be interesting to browse the work directory.

by Mark Dominus ( at July 20, 2014 12:00 AM

July 19, 2014

Dominic Steinitz

Fun with (Kalman) Filters Part I

Suppose we wish to estimate the mean of a sample drawn from a normal distribution. In the Bayesian approach, we know the prior distribution for the mean (it could be a non-informative prior) and then we update this with our observations to create the posterior, the latter giving us improved information about the distribution of the mean. In symbols

\displaystyle   p(\theta \,\vert\, x) \propto p(x \,\vert\, \theta)p(\theta)

Typically, the samples are chosen to be independent, and all of the data is used to perform the update but, given independence, there is no particular reason to do that, updates can performed one at a time and the result is the same; nor is the order of update important. Being a bit imprecise, we have

\displaystyle   p(z \,\vert\, x, y) = p(z, x, y)p(x, y) = p(z, x, y)p(x)p(y) =  p((z \,\vert\, x) \,\vert\, y) =  p((z \,\vert\, y) \,\vert\, x)

The standard notation in Bayesian statistics is to denote the parameters of interest as \theta \in \mathbb{R}^p and the observations as x \in \mathbb{R}^n. For reasons that will become apparent in later blog posts, let us change notation and label the parameters as x and the observations as y.

Let us take a very simple example of a prior X \sim {\cal{N}}(0, \sigma^2) where \sigma^2 is known and then sample from a normal distribution with mean x and variance for the i-th sample c_i^2 where c_i is known (normally we would not know the variance but adding this generality would only clutter the exposition unnecessarily).

\displaystyle   p(y_i \,\vert\, x) = \frac{1}{\sqrt{2\pi c_i^2}}\exp\bigg(\frac{(y_i - x)^2}{2c_i^2}\bigg)

The likelihood is then

\displaystyle   p(\boldsymbol{y} \,\vert\, x) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi c_i^2}}\exp\bigg(\frac{(y_i - x)^2}{2c_i^2}\bigg)

As we have already noted, instead of using this with the prior to calculate the posterior, we can update the prior with each observation separately. Suppose that we have obtained the posterior given i - 1 samples (we do not know this is normally distributed yet but we soon will):

\displaystyle   p(x \,\vert\, y_1,\ldots,y_{i-1}) = {\cal{N}}(\hat{x}_{i-1}, \hat{\sigma}^2_{i-1})

Then we have

\displaystyle   \begin{aligned}  p(x \,\vert\, y_1,\ldots,y_{i}) &\propto p(y_i \,\vert\, x)p(x \,\vert\, y_1,\ldots,y_{i-1}) \\  &\propto \exp-\bigg(\frac{(y_i - x)^2}{2c_i^2}\bigg) \exp-\bigg(\frac{(x - \hat{x}_{i-1})^2}{2\hat{\sigma}_{i-1}^2}\bigg) \\  &\propto \exp-\Bigg(\frac{x^2}{c_i^2} - \frac{2xy_i}{c_i^2} + \frac{x^2}{\hat{\sigma}_{i-1}^2} - \frac{2x\hat{x}_{i-1}}{\hat{\sigma}_{i-1}^2}\Bigg) \\  &\propto \exp-\Bigg( x^2\Bigg(\frac{1}{c_i^2} + \frac{1}{\hat{\sigma}_{i-1}^2}\Bigg) - 2x\Bigg(\frac{y_i}{c_i^2} + \frac{\hat{x}_{i-1}}{\hat{\sigma}_{i-1}^2}\Bigg)\Bigg)  \end{aligned}


\displaystyle   \frac{1}{\hat{\sigma}_{i}^2} \triangleq \frac{1}{c_i^2} + \frac{1}{\hat{\sigma}_{i-1}^2}

and then completing the square we also obtain

\displaystyle   \frac{\hat{x}_{i}}{\hat{\sigma}_{i}^2} \triangleq \frac{y_i}{c_i^2} + \frac{\hat{x}_{i-1}}{\hat{\sigma}_{i-1}^2}

More Formally

Now let’s be a bit more formal about conditional probability and use the notation of \sigma-algebras to define {\cal{F}}_i = \sigma\{Y_1,\ldots, Y_i\} and M_i \triangleq \mathbb{E}(X \,\vert\, {\cal{F}}_i) where Y_i = X + \epsilon_i, X is as before and \epsilon_i \sim {\cal{N}}(0, c_k^2). We have previously calculated that M_i = \hat{x}_i and that {\cal{E}}((X - M_i)^2 \,\vert\, Y_1, \ldots Y_i) = \hat{\sigma}_{i}^2 and the tower law for conditional probabilities then allows us to conclude {\cal{E}}((X - M_i)^2) = \hat{\sigma}_{i}^2. By Jensen’s inequality, we have

\displaystyle   {\cal{E}}(M_i^2) = {\cal{E}}({\cal{E}}(X \,\vert\, {\cal{F}}_i)^2)) \leq  {\cal{E}}({\cal{E}}(X^2 \,\vert\, {\cal{F}}_i))) =  {\cal{E}}(X^2) = \sigma^2

Hence M is bounded in L^2 and therefore converges in L^2 and almost surely to M_\infty \triangleq {\cal{E}}(X \,\vert\, {\cal{F}}_\infty). The noteworthy point is that if M_\infty = X if and only if \hat{\sigma}_i converges to 0. Explicitly we have

\displaystyle   \frac{1}{\hat{\sigma}_i^2} = \frac{1}{\sigma^2} + \sum_{k=1}^i\frac{1}{c_k^2}

which explains why we took the observations to have varying and known variances. You can read more in Williams’ book (Williams 1991).

A Quick Check

We have reformulated our estimation problem as a very simple version of the celebrated Kalman filter. Of course, there are much more interesting applications of this but for now let us try “tracking” the sample from the random variable.

> {-# OPTIONS_GHC -Wall                     #-}
> {-# OPTIONS_GHC -fno-warn-name-shadowing  #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults   #-}
> {-# OPTIONS_GHC -fno-warn-unused-do-bind  #-}
> {-# OPTIONS_GHC -fno-warn-missing-methods #-}
> {-# OPTIONS_GHC -fno-warn-orphans         #-}
> module FunWithKalmanPart1 (
>     obs
>   , nObs
>   , estimates
>   , uppers
>   , lowers
>   ) where
> import Data.Random.Source.PureMT
> import Data.Random
> import Control.Monad.State
> var, cSquared :: Double
> var       = 1.0
> cSquared  = 1.0
> nObs :: Int
> nObs = 100
> createObs :: RVar (Double, [Double])
> createObs = do
>   x <- rvar (Normal 0.0 var)
>   ys <- replicateM nObs $ rvar (Normal x cSquared)
>   return (x, ys)
> obs :: (Double, [Double])
> obs = evalState (sample createObs) (pureMT 2)
> updateEstimate :: (Double, Double) -> (Double, Double) -> (Double, Double)
> updateEstimate (xHatPrev, varPrev) (y, cSquared) = (xHatNew, varNew)
>   where
>     varNew  = recip (recip varPrev + recip cSquared)
>     xHatNew = varNew * (y / cSquared + xHatPrev / varPrev)
> estimates :: [(Double, Double)]
> estimates = scanl updateEstimate (y, cSquared) (zip ys (repeat cSquared))
>   where
>     y  = head $ snd obs
>     ys = tail $ snd obs
> uppers :: [Double]
> uppers = map (\(x, y) -> x + 3 * (sqrt y)) estimates
> lowers :: [Double]
> lowers = map (\(x, y) -> x - 3 * (sqrt y)) estimates


Williams, David. 1991. Probability with Martingales. Cambridge University Press.

by Dominic Steinitz at July 19, 2014 04:37 PM

Danny Gratzer

A Tutorial on Church Representations

Posted on July 19, 2014

I’ve written a few times about church representations, but never aimed at someone who’d never heard of what a church representation is. In fact, it doesn’t really seem like too many people have!

In this post I’d like to fix that :)

What is a Church Representation

Simply put, a church representation (CR) is a way of representing a piece of concrete data with a function. The CR can be used through an identical way to the concrete data, but it’s comprised entirely of functions.

They where originally described by Alanzo Church as a way of modeling all data in lambda calculus, where all we have is functions.


The simplest CR I’ve found is that of a tuples.

Let’s first look at our basic tuple API

    type Tuple a b = ...
    mkTuple :: a -> b -> Tuple a b
    fst     :: Tuple a b -> a
    snd     :: Tuple a b -> b

Now this is trivially implemented with (,)

    type Tuple a b = (a, b)
    mkTuple = (,)
    fst     = Prelude.fst
    snd     = Prelude.snd

The church representation preserves the interface, but changes all the underlying implementations.

    type Tuple a b = forall c. (a -> b -> c) -> c

There’s our church pair, notice that it’s only comprised of ->. It also makes use of higher rank types. This means that a Tuple a b can be applied to function producing any c and it must return something of that type.

Let’s look at how the rest of our API is implemented

    mkTuple a b = \f -> f a b
    fst tup     = tup (\a _ -> a)
    snd tup     = tup (\_ b -> b)

And that’s it!

It’s helpful to step through some reductions here

    fst (mkTuple 1 2)
    fst (\f -> f 1 2)
    (\f -> f 1 2) (\a _ -> a)
    (\a _ -> a) 1 2

And for snd

    snd (mkTuple True False)
    fst (\f -> f True False)
    (\f -> f True False) (\_ b -> b)
    (\_ b -> b) True false

So we can see that these are clearly morally equivalent. The only real question here is whether, for each CR tuple there exists a normal tuple. This isn’t immediately apparent since the function type for the CR looks a lot more general. In fact, the key to this proof lies in the forall c part, this extra polymorphism let’s us use a powerful technique called “parametricity” to prove that they’re equivalent.

I won’t actually go into such a proof now since it’s not entirely relevant, but it’s worth noting that both (,) and Tuple are completely isomorphic.

To convert between them is pretty straightforward

    isoL :: Tuple a b -> (a, b)
    isoL tup = tup (,)

    isoR :: (a, b) -> Tuple a b
    isoR (a, b) = \f -> f a b

Now that we have an idea of how to church representations “work” let’s go through a few more examples to start to see a pattern.


Booleans have the simplest API of all

    type Boolean = ...
    true  :: Boolean
    false :: Boolean
    test  :: Boolean -> a -> a -> a

We can build all other boolean operations on test

    a && b = test a b false
    a || b = test a true b
    when t e = test t e (return ())

This API is quite simple to implement with Bool,

    type Boolean = Bool

    true  = True
    false = False
    test b t e = if b then t else e

But how could we represent this with functions? The answer stems from test,

    type Boolean = forall a. a -> a -> a

Clever readers will notice this is almost identical to test, a boolean get’s two arguments and returns one or the other.

    true  = \a _ -> a
    false = \_ b -> b
    test b t e = b t e

We can write an isomorphism between Bool and Boolean as well

    isoL :: Bool -> Boolean
    isoL b = if b then true else false

    isoR :: Boolean -> Bool
    isoR b = test b True False


Now let’s talk about lists. One of the interesting things is lists are the first recursive data type we’ve dealt with so far.

Defining the API for lists isn’t entirely clear either. We want a small set of functions that can easily cover any conceivable operations for a list.

The simplest way to do this is to realize that we can do exactly 3 things with lists.

  1. Make an empty list
  2. Add a new element to the front of an existing list
  3. Pattern match on them

We can represent this with 3 functions

    type List a = ...

    nil   :: List a
    cons  :: a -> List a -> List a
    match :: List a -> b -> (a -> List a -> b) -> b

If match looks confusing just remember that

    f list = match list g h

Is really the same as

    f []       = g
    f (x : xs) = h x xs

In this way match is just the pure functional version of pattern matching. We can actually simplify the API by realizing that rather than this awkward match construct, we can use something cleaner.

foldr forms a much more pleasant API to work with since it’s really the most primitive form of “recursing” on a list.

    match :: List a -> (a -> List a -> b) -> b -> b
    match list f b = fst $ foldr list worker (b, nil)
      where worker x (b, xs) = (f x xs, cons x xs)

The especially nice thing about foldr is that it doesn’t mention List a in its two “destruction” functions, all the recursion is handled in the implementation.

We can implement CR lists trivially using foldr

    type List a = forall b. (a -> b -> b) -> b -> b

    nil = \ _ nil -> nil
    cons x xs = \ cons nil -> x `cons` xs cons nil
    foldr list cons nil = list cons nil

Notice that we handle the recursion in the list type by having a b as an argument? This is similar to how the accumulator to foldr gets the processed tail of the list. This is a common technique for handling recursion in our church representations.

Last but not least, the isomorphism arises from foldr (:) [],

    isoL :: List a -> [a]
    isoL l = l (:) []

    isoR :: [a] -> List a
    isoR l f z = foldr f z l


The last case that we’ll look at is Either. Like Pair, Either has 3 different operations.

    type Or a b = ...
    inl :: a -> Or a b
    inr :: b -> Or a b
    or :: Or a b -> (a -> c) -> (b -> c)  -> c

This is pretty easy to implement with Either

    type Or a b = Either a b
    inl = Left
    inr = Right
    or (Left a)  f g = f a
    or (Right b) f g = g b

Once again, the trick to encoding this as a function falls right out of the API. In this case we use the type of or

     type Or a b = forall c. (a -> c) -> (b -> c) -> c
    inl a = \f g -> f a
    inr b = \f g -> g a

    or x = x

Last but not least, let’s quickly rattle off our isomorphism.

    isoL :: Or a b -> Either a b
    isoL o = o Left Right

    isoR o :: Either a b -> Or a b
    isoR o = or o

The Pattern

So now we can talk about the underlying pattern in CRs. First remember that for any type T, we have a list of n distinct constructors T1 T2 T3Tn. Each of the constructors has a m fields T11, T12, T13

Now the church representation of such a type T is

    forall c.  (T11 -> T12 -> T13 -> .. -> c)
            -> (T21 -> T22 -> T23 -> .. -> c)
            -> (Tn1 -> Tn2 -> Tn3 -> .. -> c)
            -> c

This pattern doesn’t map quite as nicely to recursive types. Here we have to take the extra step of substituting all occurrences of T for c in our resulting church representation.

This is actually such a pleasant pattern to work with that I’ve written a library for automatically reifying a type between its church representation and concrete form.

Wrap Up

Hopefully you now understand what a church representation is. It’s worth noting that a lot of stuff Haskellers stumble upon daily are really church representations in disguise.

My favorite example is maybe, this function takes a success and failure continuation with a Maybe and produces a value. With a little bit of imagination, one can realize that this is really just a function mapping a Maybe to a church representation!

If you’re thinking that CRs are pretty cool! Now might be a time to take a look at one of my previous posts on deriving them automagically.

<script type="text/javascript"> /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */ var disqus_shortname = 'codeco'; // required: replace example with your forum shortname /* * * DON'T EDIT BELOW THIS LINE * * */ (function() { var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; dsq.src = '//' + disqus_shortname + ''; (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq); })(); </script> <noscript>Please enable JavaScript to view the comments powered by Disqus.</noscript> comments powered by Disqus

July 19, 2014 12:00 AM

July 18, 2014

Ken T Takusagawa

[gggpgqye] Narrow type signatures which can be widened

Create a tool to find type signatures that are less polymorphic than would be inferred by type inference.

This is a solution in search of a problem.

by Ken ( at July 18, 2014 04:55 PM

Mark Jason Dominus

On uninhabited types and inconsistent logics

Earlier this week I gave a talk about the Curry-Howard isomorphism. Talks never go quite the way you expect. The biggest sticking point was my assertion that there is no function with the type a → b. I mentioned this as a throwaway remark on slide 7, assuming that everyone would agree instantly, and then we got totally hung up on it for about twenty minutes.

Part of this was my surprise at discovering that most of the audience (members of the Philly Lambda functional programming group) was not familiar with the Haskell type system. I had assumed that most of the members of a functional programming interest group would be familiar with one of Haskell, ML, or Scala, all of which have the same basic type system. But this was not the case. (Many people are primarily interested in Scheme, for example.)

I think the main problem was that I did not make clear to the audience what Haskell means when it says that a function has type a → b. At the talk, and then later on Reddit people asked

what about a function that takes an integer and returns a string: doesn't it have type a → b?

If you know one of the HM languages, you know that of course it doesn't; it has type Int → String, which is not the same at all. But I wasn't prepared for this confusion and it took me a while to formulate the answer. I think I underestimated the degree to which I have internalized the behavior of Hindley-Milner type systems after twenty years. Next time, I will be better prepared, and will say something like the following:

A function which takes an integer and returns a string does not have the type a → b; it has the type Int → String. You must pass it an integer, and you may only use its return value in a place that makes sense for a string. If f has this type, then 3 + f 4 is a compile-time type error because Haskell knows that f returns a string, and strings do not work with +.

But if f had the type a → b, then 3 + f 4 would be legal, because context requires that f return a number, and the type a → b says that it can return a number, because a number is an instance of the completely general type b. The type a → b, in contrast to Int → String, means that b and a are completely unconstrained.

Say function f had type a → b. Then you would be able to use the expression f x in any context that was expecting any sort of return value; you could write any or all of:

   3 + f x
   head(f x)
   "foo" ++ f x
   True && f x

and they would all type check correctly, regardless of the type of x. In the first line, f x would return a number; in the second line f would return a list; in the third line it would return a string, and in the fourth line it would return a boolean. And in each case f could be able to do what was required regardless of the type of x, so without even looking at x. But how could you possibly write such a function f? You can't; it's impossible.

Contrast this with the identity function id, which has type a → a. This says that id always returns a value whose type is the same as that if its argument. So you can write

 3 + id x

as long as x has the right type for +, and you can write

 head(id x)

as long as x has the right type for head, and so on. But for f to have the type a → b, all those would have to work regardless of the type of the argument to f. And there is no way to write such an f.

Actually I wonder now if part of the problem is that we like to write a → b when what we really mean is the type ∀a.∀b.a → b. Perhaps making the quantifiers explicit would clear things up? I suppose it probably wouldn't have, at least in this case.

The issue is a bit complicated by the fact that the function

      loop :: a -> b
      loop x = loop x

does have the type a → b, and, in a language with exceptions, throw has that type also; or consider Haskell

      foo :: a -> b
      foo x = undefined

Unfortunately, just as I thought I was getting across the explanation of why there can be no function with type a → b, someone brought up exceptions and I had to mutter and look at my shoes. (You can also take the view that these functions have type a → ⊥, but the logical principle ⊥ → b is unexceptionable.)

In fact, experienced practitioners will realize, the instant the type a → b appears, that they have written a function that never returns. Such an example was directly responsible for my own initial interest in functional programming and type systems; I read a 1992 paper (“An anecdote about ML type inference”) by Andrew R. Koenig in which he described writing a merge sort function, whose type was reported (by the SML type inferencer) as [a] -> [b], and the reason was that it had a bug that would cause it to loop forever on any nonempty list. I came back from that conference convinced that I must learn ML, and Higher-Order Perl was a direct (although distant) outcome of that conviction.

Any discussion of the Curry-Howard isomorphism, using Haskell as an example, is somewhat fraught with trouble, because Haskell's type logic is utterly inconsistent. In addition to the examples above, in Haskell one can write

    fix :: (a -> a) -> a
    fix f = let x = fix f 
             in f x

and as a statement of logic, is patently false. This might be an argument in favor of the Total Functional Programming suggested by D.A. Turner and others.

by Mark Dominus ( at July 18, 2014 04:01 PM

wren gayle romano

Toward a Projectivistic Theory of Gender, Identity, and Social Categorization

As discussed last time there's a deep-seated problem with performativity as a theory of social categorization. Specifically, it puts the focus on the wrong thing. That our actions are performative in nature gives us important insight into the role agency plays both in forming our own identities and in defending those identities against silencing, marginalization, oppression, and colonialism. But, by centering discussions of identity on our own personal agency we miss out on other important facets of the issue. When we say that someone belongs to a category, we do so because we've decided they belong to the category, or because we think they belong to the category. The statement that they belong to the category is not merely true (or false), we are projecting it to be true (or false). That is, we do not passively observe people's gender, race, class, etc; instead we actively project our own notions of gender, race, class, etc upon them. This projecting of beliefs onto others is called projectivism[1].

Interestingly, by localizing "truth" as the beliefs we hold to be true[2], the projective act is itself performative: by projecting something to be true, one comes to believe that it is true. And yet there is no reason to suppose these beliefs are correct (local truths need not be global truths), nor that they will agree with others' beliefs (local truths need not be true in other locales). Crucially, in the case of categorizing or identifying ourselves, we have access to our own personal thoughts, feelings, memories, subconscious inclinations, etc. Whereas, when others are categorizing us, they do not; they can only observe our bodies, our actions, and the results of our actions. Thus arises the discrepancy in cases like transgenderism. When self-identifying, we may well prize our internal observations over our externally observable state. Nevertheless, others will continue to project their categorizations upon us, regardless of our self-identification.

Not only do people project categories onto others, we do it compulsively. Our persistent and ubiquitous gendering of others is an especially powerful example, but it is in no way unique. Projecting race is another example. And in professional cultures where there are sharply contested borders between "tribes" (e.g., academia and hacker culture), projecting these "tribes" is yet another. This compulsive projectivism —or, more particularly, our unconsciousness of it— is where issues arise.

When we are not typically confronted with evidence that our projections are mistaken, our projectivism becomes almost unconscious. Once there, we fail to notice the fact that we are actively projecting and we come to believe we're passively observing truths about the world. So when our projections turn out to be mistaken, we get a feeling of betrayal, we feel like the person whose identity we were mistaken about was "lying" to us. This subsequent projection that they were "lying" stems from the fact that we mistook our earlier projections for mere observations. Thus, because of an original error on our part, we end up imputing that others are being dishonest or deceptive.

When the identity one desires to be seen as (which may differ from the identity they claim for themselves) is often or easily at odds with the identities projected upon them, they understandably become concerned about trying to avoid these projections of "lying". If one can successfully avoid projections of "lying" they are said to "pass", terminology which turns around and suggests that they were in fact lying the whole time and only managed not to get caught. This terminology is, of course, deeply problematic.

Simply acknowledging compulsive projectivism is not enough. To undo the damage caused by misgendering, racial profiling, stereotyping, and other misprojections, we must lift this knowledge up and remain consciously aware that the beliefs we project onto others are not an observation of their identities. We must denaturalize the projectivist assumption that our beliefs are others' truths, by discontinuing the use words like "passing" which rely on that assumption. And when we feel betrayed we must locate that feeling within ourselves and stop projecting it in bad faith. The performative theory highlights the positive role of agency in our lives, but agency alone is not enough. The projectivistic theory extends this to highlight the negative role of agency when used to deny or overwhelm the agency of others.

[1] I do not mean this terminology to be the same as Hume's notion of projectivism, though of course both terms have the same etymology. Hume's projectivism is popular in the ethics literature, with which I am largely unfamiliar; thus, my use of the term here is not meant to entail whatever baggage it may have accrued in that literature.

[2] While it is not usually presented as such, Austin's original definition of performative speech acts should also only hold up to localized truth. In the classical example "I now pronounce you married", by saying the words one does the deed of pronouncing the couple to be married. However, the pronouncement of marriage does not cause the couple to be married in a universal sense; it only causes them to be married in the current jurisdiction, and a different jurisdiction may or may not recognize that marriage as valid. Because the marriage must be localized, therefore the pronouncement of marriage must be localized: one can't pronounce a couple to be married (everywhere), they can only pronounce them to be married (here, or there, or wherever). Thus, the deed performed by the utterance of the words is a localized deed: the pronouncement of a localized wedding.

comment count unavailable comments

July 18, 2014 11:39 AM

I've forgotten how to write

I've forgotten how to write. Somewhere along the way I've forgotten how to say, what I mean. Little sticks and thistles, they burrow under your skin like dry wind and the leaves you brush from your faces. And you find yourself there, looking over, looking out, and turn to tell another how you came to this place, this pretty place, and all you find are tangled weeds and hills and where was the path where you left that friend you thought had come with you

I have half a dozen half written posts, if half written means written and my mind keeps telling me to edit to edit to go over once more, unable to let go, unable to let slip a word lest it falls all out and i somehow say what i somehow mean and someone takes offense. Offence. That word of our times, that police baton with which we beat the helpless, refuse to listen to the stories, those stories once heard we proclaim have "set us free" but we leave the authors beaten, unwilling to look at their lives lest we feel too closely the grip of that truncheon in our fist.

Half a dozen half written posts, weeks of thoughts writ out, on programs and mathematics and words and history. Thoughts I cannot set free. They haunt me, they call me beckoning to spill once again that mental blood to pore and pore over them and wring them dry of every drip of humanity so I can hang out the scraps and let others see how terribly clever i am. I never wanted to be clever, never wanted to be seen like that. I only wanted, once, to be free. From the heartache of a harrowing life, from the illusions and false idols, from my own ignorance. And now these thoughts tie me up in clever little knots, and have me writing bad poetry

comment count unavailable comments

July 18, 2014 11:37 AM

A Word on Words

I'd like to take this moment to point out that all forms of binarism are bad. (Including the binarist notion that all things are either "good" or "bad".) I feel like this has to be pointed out because we, every one of us, has a nasty habit: in our overzealousness to tear down one binary, we do so by reinforcing other binaries. So let me say again. All forms of binarism are bad.

It's well-known that I've had a long, fraught history with certain "feminist" communities, due to which I have heretofore disavowed that label. Because of these persistent conflicts, around ten years ago I retreated from feminist circles and communities. However, over the past year I have rejoined a number of feminist circles— or rather, I have joined womanist, black feminist, transfeminist, and queer feminist circles. And thanks to this reinvolvement with feminist activism I have come, once again, to feel a certain attachment to that word: "feminist". The attachment feels strange to me now, having disavowed it for so long in favor of "womanism", "black feminism", "transfeminism", and "queer feminism". But because of this attachment I feel, once more, the need to reclaim feminism away from those "feminist" communities whose philosophy and political methods I continue to disavow.

So, to piss everyone off once more: a manifesto. )

Edit 2014.07.13: Added footnotes [2] and [3].

comment count unavailable comments

July 18, 2014 11:35 AM

Performativity is not performance

Although the words have superficially similar pronunciations, performativity and performance are two extremely different notions. In her book Gender Trouble (1990) and its sequel Bodies That Matter (1993), Judith Butler put forth the thesis that gender identity is performative. Over the last decade performance-based theories of gender and identity have become popular, even mainstream, despite a number of deep-seated and readily-apparent flaws. Unfortunately, these latter performance-based theories are often portrayed as successors of Butlerean performativity. They're not.[1]

To understand performativity one should go back to Austin's original definition of performative speech acts. Whenever we speak, we speak for a reason. Austin was interested in explaining these reasons— in particular, explaining the contrast between what we say and why we say it. When we ask "could you pass the salt?" we are not literally interested in whether the addressee is capable of moving the salt shaker, we're making a request. When we ask "how do you do?" or "what's up?" we do not actually want an answer, we are merely greeting someone. It is within this context of discussing the why behind what we say that Austin became interested in performative speech acts: speech acts which through their very utterance do what it is they say, or speech acts which are what it is they mean. When the right person in the right context utters "I now pronounce you married", that vocalization is in fact the pronouncement itself. To state that you pronounce something, is itself to make the proclamation. In just the same way, when under the right circumstances someone says they promise such-and-so, they just did.

There are a number of interesting details about what it means to be a performative speech act. For instance, just uttering the words is not enough: if a random stranger comes up to you and pronounces you married, that does not actually mean you're married. For the performative speech act to have any force it must be uttered in a felicitous context (e.g., the words must be spoken with the proper intent, the pronouncer of marriage must be ordained with the ability to marry people, the partners must be willing, the pairing must be of an appropriate sort according to the bigotry of the times, etc). Another detail is that performative speech acts do more than just enact what they say, they also create something: pronouncing a marriage constructs the marriage itself, declaring war brings the war into existence, giving a promise makes the promise, sentencing someone creates the sentence, etc. Because of details like these, claiming that a particular speech act is performative says a heck of a lot more than just saying the act was performed (i.e., spoken).

On the other hand, a performance is the enactment of a particular variety of artistic expression ranging from theatrical plays, to musical opuses, to religious ceremonies, to performance art, and so on. Whether a particular act is performative is independent of whether it is (a part of) a performance. Many performative speech acts are of a ceremonial nature (e.g., marriages, divorces, christenings, declarations of war, etc) and consequently we like to make a big affair of it. Thus, these particular acts tend to be both: they're performative performances. However, many other performative speech acts are executed with little fanfare: ordering food in a restaurant, apologizing, accepting apologies, resigning from a game, etc. These are all performative acts, and yet there's absolutely no need for any sort of performance behind them. Indeed we often find it humorous, or rude, or severe, when someone chooses to turn these performative acts into performances.

The distinction between performativity and performance is crucial to understanding the thesis Butler put forth. We can expand the idea of performativity to include not just speech acts, but other acts as well. Doing so, Butler's thesis is that one's identity as a particular gender is not something which exists a priori, but rather that it is constructed by the enactment —and especially the continuous ritualistic re-enactment— of performative gender actions. The specific claim being made is that one's gender identity is an artifact whose ontological existence arises from particular deeds, in the exact same way that a marriage is an artifact arising from nuptial ceremony, that a promise is an artifact arising from the swearing of a vow, that a state of war is an artifact arising from the declaration of its existence, and so on. The performative theory of gender is often paraphrased as "gender is something we do"— but this paraphrase is grossly misleading. The paraphrase elides the entire specific content of the thesis! Sure, gender is something we do, but it's something we do in very specific ways and it is in virtue of doing those things in those ways that we come to identify with our gender. That's the thesis.

As discussed before, there are some crucial issues with performativity as a theory of gender. (Though these issues can be corrected by changing the focus without giving up the crucial insight.) But the issue with performativity has nothing whatsoever to do with the fact that performances are artificial, that performances are interruptible, that performances can be altered on whimsy, that performances can be disingenuous, that performances are "only" art, etc. Those latter complaints are why performance-based theories of gender are flat out wrong. And they're evidence of why claiming that performance-based theories were built upon performative theories grossly misconceptualizes performativity.

[1] Don't take my word for it, Butler herself has continually argued that performance-based theories are a gross misinterpretation of her work (Gender Trouble, xxii–xxiv; Bodies That Matter, 125–126; "Gender as Performance: An interview with Judith Butler", 32–39; Judith Butler (by Sara Salih), 62–71).

comment count unavailable comments

July 18, 2014 11:34 AM

I Don't Hear You Talking: a silence on Silence Culture

A lot of ink has been spilt over trigger warnings lately. And I don't want to write about it because I feel like I don't have much to add to the conversation. But when I stop, that feeling nags at me. You can't think with your mouth open; and as someone who always had issues keeping her damn mouth shut, it took me a long time to learn that to listen you must be silent. ... And yet. ... And yet, when someone experiences strong emotions about her own marginalization, but feels compelled to self-silence: that's when you need to listen harder.

Because there are a lot of voices I know full well, and I don't hear them talking.

I know them because they're the voices of my friends, and among friends we talk about things we don't talk about. In the workaday world we put on our faces and never hint at the behemoths raging through our china cabinets. And when we let down our hair, those faces stay on, because you always know who might be listening. And behind closed doors, still, we keep them on because elsewise love would be too tragic. But in secret spaces, we talk. We are, every one of us, damaged. I may not know who hurt you yet, I may not know your story of pain, but I never assume there isn't one; because every single person I've known, when we get close enough, they tell me things we don't talk about. Sometimes it takes years before they feel safe enough, sometimes they never feel safe enough, but if they've ever lowered their guard to me, they've told me. Every. single. person.

We are born and raised and live in a world drenched in abuse. And that abuse doesn't leave scars, it leaves open wounds waiting to have dirt rubbed in them. The first rule of what doesn't happen is that it cannot be spoken of. So healing only happens in those secret spaces, one-on-one, in the dark of night, far far from friends and strangers alike. This privatization of healing only compounds the problem of abuse. When we cannot see past others' faces, when we cannot see the living wounds they bear, when we do not hear their daily resistance against reiterations of violence, we come to think that somehow maybe they haven't been hurt as badly as we. When we see our own people succeed, or see leaders of resistance and "survivors" and "healed" voices speaking up against the injustice of the world, we think that somehow maybe they must be stronger than us, more resilient than us, more determined than us. When we cannot witness their struggle, we think that somehow maybe when they go to bed at nights they need not take the time to scrub out that daily dirt from their wounds. And when we cannot bear that witness, we see ourselves as lesser, broken, impostors.

These are the voices I do not hear speaking out, or hear speaking in only roundabout whispers. These are the voices for whom trigger warnings are writ. As so precisely put by Aoife,

Here's something I need you to understand: the vast majority of students when 'triggered' don't write howlers to department heads or flip laptops over in crowded classrooms for YouTube counts.

On the contrary, they most often shut down and collapse into numbness.

That numbness, that collapse, is the last tool our minds have to keep our faces in place when some sudden shock reopens sore wounds. The second rule of what we do not talk about is that wounds never heal, not entirely. We —some of us— can manage not flinching when someone raises their hand. We —some of us— learn to laugh along when someone touches our back. We —some of us— learn to feel safe in a room alone with a man. We —some of us— learn to turn blind to the "tranny" jokes, to the blackface, to the jibes about trailer parks and country living, to the "sex" scene where she lay sleeping, the scene where he takes the other man 'round back, the man who slaps his wife, the mother who cuffs her child, being told to go pick a switch, to the child starving on the street, to the college kids playing "tricks" on the homeless. We —some of us— learn to live as stone. But stone don't heal, and we all have our rituals of self-care we won't talk about. But when everywhere all you ever see is stone, you know your flesh will never make it if the light still shines in your eyes.

And I too am guilty of this silence culture. Because the fact of the matter is, in this day and age, to speak is to jeopardize my career. I can talk about being trans or being a dyke, and I can at least pretend that the laws on the books will mean a damn. But if I talk about my childhood, I won't be seen as an adult. If I talk about my abuse, I won't be seen as stable. If I bring up my mental life, I won't be seen as professional. If I talk about spoons, I won't be seen as reliable. And so I stuff it down and self-silence and hide what it's like, that daily living with depression and PTSD, til some trigger sets it off and out comes that rage which grows on silence. Some full-force punch to the gut, some words like "I'm not sure suicide is ever the answer" and my eyes go black, and words come out, and they sound nice enough, but every one means "I hate you".

Not to be rude, but sometimes suicide is the answer. It may not be the best answer, but it is an answer. And, unfortunately, sometimes that is all that's required. Sometimes a terrible fucking answer is the only answer to be found.

I say this as someone who's spent more of her life being suicidal than not, as someone who's survived multiple attempts, as someone whose friends have almost invariably spent years being suicidal. Yes, it sucks. And no, it doesn't "solve" anything. But think of the suffering of the victim. It is incredibly difficult to overcome the self-preservation instinct. Profoundly difficult. Imagine the volume of suffering it takes, the depths and duration of misery required to actively overcome the single most powerful compulsion any living creature can experience. There comes a point, long after endurance has already given out, when the full weight of that volume cannot be borne.

Whenever this happens, my thoughts are always with the victim. I cannot help but empathize with that terrible terrible suffering

Because the fact of the matter is, I'm too scared to talk. We live in a culture where suicide is "the easy way" and you're supposed to "take it like a man", but the fact of the matter is noone can take it. We are, every one of us, damaged. We privatize our healing because the first rule of abuse is that it must never be mentioned, must never never be discussed. The learning of silence is the first abuse: it is how we are taught to abuse ourselves, to never never hear that we're not alone.

This isn't about suicide and depression. Isn't about rape and racism. Isn't about violence and neglect. This is about silence. About the words we don't use to not say what you can't talk about. This is about learning to speak using words. About how we must open our mouths in order to listen.

comment count unavailable comments

July 18, 2014 11:31 AM

July 17, 2014

Ketil Malde

Information content and allele frequency difference

ESI scores and allele frequency difference

Just a quick note on the relationship between ESI scores and allele frequencies. Allele frequency differences is of course related to – perhaps even the definition of – diversification, but the information we gain from observing an allele also depends on the specific allele frequencies involved. The graph below shows how this is related.

Each line represents a fixed allele difference, from 0.05 at the bottom, to 0.95 at the top, and the x-axis is the average allele frequency between populations. We see that for small differences, the actual frequencies matter little, but for moderate to large allele differences, allele frequencies near the extremes have a large effect.

Note that this is information per allele, and thus not ESI (which is the expected information from observing the site, in other words a weighted average over all alleles).

July 17, 2014 12:00 PM

Mark Jason Dominus

Guess what this does (solution)

A few weeks ago I asked people to predict, without trying it first, what this would print:

 perl -le 'print(two + two == five ? "true" : "false")'

(If you haven't seen this yet, I recommend that you guess, and then test your guess, before reading the rest of this article.)

People familiar with Perl guess that it will print true; that is what I guessed. The reasoning is as follows: Perl is willing to treat the unquoted strings two and five as strings, as if they had been quoted, and is also happy to use the + and == operators on them, converting the strings to numbers in its usual way. If the strings had looked like "2" and "5" Perl would have treated them as 2 and 5, but as they don't look like decimal numerals, Perl interprets them as zeroes. (Perl wants to issue a warning about this, but the warning is not enabled by default. Since the two and five are treated as zeroes, the result of the == comparison are true, and the string "true" should be selected and printed.

So far this is a little bit odd, but not excessively odd; it's the sort of thing you expect from programming languages, all of which more or less suck. For example, Python's behavior, although different, is about equally peculiar. Although Python does require that the strings two and five be quoted, it is happy to do its own peculiar thing with "two" + "two" == "five", which happens to be false: in Python the + operator is overloaded and has completely different behaviors on strings and numbers, so that while in Perl "2" + "2" is the number 4, in Python is it is the string 22, and "two" + "two" yields the string "twotwo". Had the program above actually printed true, as I expected it would, or even false, I would not have found it remarkable.

However, this is not what the program does do. The explanation of two paragraphs earlier is totally wrong. Instead, the program prints nothing, and the reason is incredibly convoluted and bizarre.

First, you must know that print has an optional first argument. (I have plans for an article about how optional first argmuents are almost always a bad move, but contrary to my usual practice I will not insert it here.) In Perl, the print function can be invoked in two ways:

   print HANDLE $a, $b, $c, …;
   print $a, $b, $c, …;

The former prints out the list $a, $b, $c, … to the filehandle HANDLE; the latter uses the default handle, which typically points at the terminal. How does Perl decide which of these forms is being used? Specifically, in the second form, how does it know that $a is one of the items to be printed, rather than a variable containing the filehandle to print to?

The answer to this question is further complicated by the fact that the HANDLE in the first form could be either an unquoted string, which is the name of the handle to print to, or it could be a variable containing a filehandle value. Both of these prints should do the same thing:

  my $handle = \*STDERR;
  print STDERR $a, $b, $c;
  print $handle $a, $b, $c;

Perl's method to decide whether a particular print uses an explicit or the default handle is a somewhat complicated heuristic. The basic rule is that the filehandle, if present, can be distinguished because its trailing comma is omitted. But if the filehandle were allowed to be the result of an arbitrary expression, it might be difficult for the parser to decide where there was a a comma; consider the hypothetical expression:

   print $a += EXPRESSION, $b $c, $d, $e;

Here the intention is that the $a += EXPRESSION, $b expression calculates the filehandle value (which is actually retrieved from $b, the $a += … part being executed only for its side effect) and the remaining $c, $d, $e are the values to be printed. To allow this sort of thing would be way too confusing to both Perl and to the programmer. So there is the further rule that the filehandle expression, if present, must be short, either a simple scalar variable such as $fh, or a bare unqoted string that is in the right format for a filehandle name, such as `HANDLE``. Then the parser need only peek ahead a token or two to see if there is an upcoming comma.

So for example, in

  print STDERR $a, $b, $c;

the print is immediately followed by STDERR, which could be a filehandle name, and STDERR is not followed by a comma, so STDERR is taken to be the name of the output handle. And in

  print $x, $a, $b, $c;

the print is immediately followed by the simple scalar value $x, but this $x is followed by a comma, so is considered one of the things to be printed, and the target of the print is the default output handle.


  print STDERR, $a, $b, $c;

Perl has a puzzle: STDERR looks like a filehandle, but it is followed by a comma. This is a compile-time error; Perl complains “No comma allowed after filehandle” and aborts. If you want to print the literal string STDERR, you must quote it, and if you want to print A, B, and C to the standard error handle, you must omit the first comma.

Now we return the the original example.

 perl -le 'print(two + two == five ? "true" : "false")'

Here Perl sees the unquoted string two which could be a filehandle name, and which is not followed by a comma. So it takes the first two to be the output handle name. Then it evaluates the expression

     + two == five ? "true" : "false"

and obtains the value true. (The leading + is a unary plus operator, which is a no-op. The bare two and five are taken to be string constants, which, compared with the numeric == operator, are considered to be numerically zero, eliciting the same warning that I mentioned earlier that I had not enabled. Thus the comparison Perl actually does is is 0 == 0, which is true, and the resulting string is true.)

This value, the string true, is then printed to the filehandle named two. Had we previously opened such a filehandle, say with

open two, ">", "output-file";

then the output would have been sent to the filehandle as usual. Printing to a non-open filehandle elicits an optional warning from Perl, but as I mentioned, I have not enabled warnings, so the print silently fails, yielding a false value.

Had I enabled those optional warnings, we would have seen a plethora of them:

Unquoted string "two" may clash with future reserved word at -e line 1.
Unquoted string "two" may clash with future reserved word at -e line 1.
Unquoted string "five" may clash with future reserved word at -e line 1.
Name "main::two" used only once: possible typo at -e line 1.
Argument "five" isn't numeric in numeric eq (==) at -e line 1.
Argument "two" isn't numeric in numeric eq (==) at -e line 1.
print() on unopened filehandle two at -e line 1.

(The first four are compile-time warnings; the last three are issued at execution time.) The crucial warning is the one at the end, advising us that the output of print was directed to the filehandle two which was never opened for output.

[ Addendum 20140718: I keep thinking of the following remark of Edsger W. Dijkstra:

[This phenomenon] takes one of two different forms: one programmer places a one-line program on the desk of another and … says, "Guess what it does!" From this observation we must conclude that this language as a tool is an open invitation for clever tricks; and while exactly this may be the explanation for some of its appeal, viz., to those who like to show how clever they are, I am sorry, but I must regard this as one of the most damning things that can be said about a programming language.

But my intent is different than what Dijkstra describes. His programmer is proud, but I am discgusted. Incidentally, I believe that Dijkstra was discussing APL here. ]

by Mark Dominus ( at July 17, 2014 12:00 AM

July 16, 2014

Ian Ross

Haskell data analysis: Reading NetCDF files

Haskell data analysis: Reading NetCDF files

I never really intended the FFT stuff to go on for as long as it did, since that sort of thing wasn’t really what I was planning as the focus for this Data Analysis in Haskell series. The FFT was intended primarily as a “warm-up” exercise. After fourteen blog articles and about 10,000 words, everyone ought to be sufficiently warmed up now…

Instead of trying to lay out any kind of fundamental principles for data analysis before we get going, I’m just going to dive into a real example. I’ll talk about generalities as we go along when we have some context in which to place them.

All of the analysis described in this next series of articles closely follows that in the paper: D. T. Crommelin (2004). Observed nondiffusive dynamics in large-scale atmospheric flow. J. Atmos. Sci. 61(19), 2384–2396. We’re going to replicate most of the data analysis and visualisation from this paper, maybe adding a few interesting extras towards the end.

It’s going to take a couple of articles to lay out some of the background to this problem, but I want to start here with something very practical and not specific to this particular problem. We’re going to look at how to gain access to meteorological and climate data stored in the NetCDF file format from Haskell. This will be useful not only for the low-frequency atmospheric variability problem we’re going to look at, but for other things in the future too.

The NetCDF file format

The NetCDF file format is a “self-describing” binary format that’s used a lot for storing atmospheric and oceanographic data. It’s “self-describing” in the sense that the file format contains metadata describing the spatial and temporal dimensions of variables, as well as optional information about units and a bunch of other stuff. It’s a slightly intimidating format to deal with at first, but we’ll only need to know how a subset of it works. (And it’s much easier to deal with than HDF5, which we’ll probably get around to when we look at some remote sensing data at some point.)

So, here’s the 30-second introduction to NetCDF. A NetCDF file contains dimensions, variables and attributes. A NetCDF dimension just has a name and a size. One dimension can be specified as an “unlimited” or record dimension, which is usually used for time series, and just means that you can tack more records on the end of the file. A NetCDF variable has a name, a type, a list of dimensions, some attributes and some data. As well as attributes attached to variables, a NetCDF file can also have some file-level global attributes. A NetCDF attribute has a name, a type and a value. And that’s basically it (for NetCDF-3, at least; NetCDF-4 is a different beast, but I’ve never seen a NetCDF-4 file in the wild, so I don’t worry about it too much).

An example NetCDF file

That’s very abstract, so let’s look at a real example. The listing below shows the output from the ncdump tool for one of the data files we’re going to be using, which stores a variable called geopotential height (I’ll explain exactly what this is in a later article – for the moment, it’s enough to know that it’s related to atmospheric pressure). The ncdump tool is useful for getting a quick look at what’s in a NetCDF file – it shows all the dimension and variable definitions, all attributes and also dumps the entire data contents of the file as ASCII (which you usually want to chop off…).

netcdf z500-1 {
	longitude = 144 ;
	latitude = 73 ;
	time = 7670 ;
	float longitude(longitude) ;
		longitude:units = "degrees_east" ;
		longitude:long_name = "longitude" ;
	float latitude(latitude) ;
		latitude:units = "degrees_north" ;
		latitude:long_name = "latitude" ;
	int time(time) ;
		time:units = "hours since 1900-01-01 00:00:0.0" ;
		time:long_name = "time" ;
	short z500(time, latitude, longitude) ;
		z500:scale_factor = 0.251043963537454 ;
		z500:add_offset = 50893.8041655182 ;
		z500:_FillValue = -32767s ;
		z500:missing_value = -32767s ;
		z500:units = "m**2 s**-2" ;
		z500:long_name = "Geopotential" ;
		z500:standard_name = "geopotential" ;

// global attributes:
		:Conventions = "CF-1.0" ;
		:history = "Sun Feb  9 18:46:25 2014: ncrename -v z,z500\n",
			"2014-01-29 21:04:31 GMT by grib_to_netcdf-1.12.0: grib_to_netcdf /data/soa/scra
tch/ -o /data/soa/scratch/netcdf-web237-20140129210411-3022" ;

 longitude = 0, 2.5, 5, 7.5, 10, 12.5, 15, 17.5, 20, 22.5, 25, 27.5, 30,
    32.5, 35, 37.5, 40, 42.5, 45, 47.5, 50, 52.5, 55, 57.5, 60, 62.5, 65,
    67.5, 70, 72.5, 75, 77.5, 80, 82.5, 85, 87.5, 90, 92.5, 95, 97.5, 100,
    102.5, 105, 107.5, 110, 112.5, 115, 117.5, 120, 122.5, 125, 127.5, 130,
    132.5, 135, 137.5, 140, 142.5, 145, 147.5, 150, 152.5, 155, 157.5, 160,
    162.5, 165, 167.5, 170, 172.5, 175, 177.5, 180, 182.5, 185, 187.5, 190,
    192.5, 195, 197.5, 200, 202.5, 205, 207.5, 210, 212.5, 215, 217.5, 220,
    222.5, 225, 227.5, 230, 232.5, 235, 237.5, 240, 242.5, 245, 247.5, 250,
    252.5, 255, 257.5, 260, 262.5, 265, 267.5, 270, 272.5, 275, 277.5, 280,
    282.5, 285, 287.5, 290, 292.5, 295, 297.5, 300, 302.5, 305, 307.5, 310,
    312.5, 315, 317.5, 320, 322.5, 325, 327.5, 330, 332.5, 335, 337.5, 340,
    342.5, 345, 347.5, 350, 352.5, 355, 357.5 ;

 latitude = 90, 87.5, 85, 82.5, 80, 77.5, 75, 72.5, 70, 67.5, 65, 62.5, 60,
    57.5, 55, 52.5, 50, 47.5, 45, 42.5, 40, 37.5, 35, 32.5, 30, 27.5, 25,
    22.5, 20, 17.5, 15, 12.5, 10, 7.5, 5, 2.5, 0, -2.5, -5, -7.5, -10, -12.5,
    -15, -17.5, -20, -22.5, -25, -27.5, -30, -32.5, -35, -37.5, -40, -42.5,
    -45, -47.5, -50, -52.5, -55, -57.5, -60, -62.5, -65, -67.5, -70, -72.5,
    -75, -77.5, -80, -82.5, -85, -87.5, -90 ;

As shown in the first line of the listing, this file is called (it’s contains daily 500 millibar geopotential height data). It has dimensions called longitude, latitude and time. There are variables called longitude, latitude, time and z500. The variables with names that are the same as dimensions are called coordinate variables and are part of a metadata convention that provides information about the file dimensions. The NetCDF file format itself doesn’t require that dimensions have any more information provided for them than their name and size, but for most applications, it makes sense to give units and values for points along the dimensions.

If we look at the longitude variable, we see that it’s of type float and has one dimension, which is the longitude dimension – this is how you tell a coordinate variable from a data variable: it will have the same name as the dimension it goes with and will be indexed just by that dimension. Immediately after the line defining the longitude variable are the attributes for the variable. Here they give units and a display name (they can also give information about the range of values and the orientation of the coordinate axis). All of these attributes are again defined by a metadata convention, but they’re mostly pretty easy to figure out. Here, the longitude is given in degrees east of the prime meridian, and if we look further down the listing, we can see the data values for the longitude variable, running from zero degrees to 357.5°E. From all this, we can infer that the 144 longitude values in the file start at the prime meridian and increase eastwards.

Similarly, the latitude variable is a coodinate variable for the latitude dimension, and specifies the latitude of points on the globe. The latitude is measured in degrees north of the equator and ranges from 90° (the North pole) to -90° (the South pole). Taking a look at the data values for the latitude variable, we can see that 90 degrees north is in index 0, and the 73 latitude values decrease with increasing index until we reach the South pole.

The time coordinate variable is a little more interesting, mostly because of its units – this “hours since YYYY-MM-DD HH:MM:SS” approach to time units is very common in NetCDF files and it’s usually pretty easy to work with.

Finally, we get on to the data variable, z500. This is defined on a time/latitude/longitude grid (so, in the data, the longitude is the fastest changing coordinate). The variable has one slightly odd feature: its type. The types for the coordinate variables were all float or double, as you’d expect, but z500 is declared to be a short integer value. Why? Well, NetCDF files are quite often big so it can make sense to use some sort of encoding to reduce file sizes. (I worked on a paleoclimate modelling project where each model simulation resulted in about 200 Gb of data, for a dozen models for half a dozen different scenarios. In “Big Data” terms, it’s not so large, but it’s still quite a bit of data for people to download from a public server.) Here, the real-valued geopotential height is packed into a short integer. The true value of the field can be recovered from the short integer values in the file using the add_offset and scale_factor attributes – here the scale factor is unity, so we just need to add the add_offset to each value in the file to get the geopotential height.

Last of all we have the global attributes in the file. The most interesting of these is the Conventions attribute, which specifies that the file uses the CF metadata convention. This is the convention that defines how coordinate variables are represented, how data values can be compressed by scaling and offsetting, how units and axes are represented, and so on. Given a NetCDF file using the CF convention (or another related convention called the COARDS metadata convention), it’s pretty straightforward to figure out what’s going on.

Reading NetCDF files in Haskell

So, how do we read NetCDF files into a Haskell program to work on them? I’ve seen a few Haskell FFI bindings to parts of the main NetCDF C library, but none of those really seemed satisfactory for day-to-day use, so I’ve written a simple library called hnetcdf that includes both a low-level wrapping of the C library and a more idiomatic Haskell interface (which is what we’ll be using).

In particular, because NetCDF data is usually grid-based, hnetcdf supports reading data values into a number of different kinds of Haskell arrays (storable Vectors, Repa arrays and hmatrix arrays). For this analysis, we’re going to use hmatrix vectors and matrices, since they provide a nice “Matlab in Haskell” interface for doing the sort of linear algebra we’ll need.

In this section, we’ll look at some simple code for accessing the NetCDF file whose contents we looked at above which will serve as a basis for the more complicated things we’ll do later. (The geopotential height data we’re using here is from the ERA-Interim reanalysis project – again, I’ll explain what “reanalysis” means in a later article. For the moment, think of it as a “best guess” view of the state of the atmosphere at different moments in time.) We’ll open the NetCDF file, show how to access the file metadata and how to read data values from coordinate and data variables.

We need a few imports first, along with a couple of useful type synonyms for return values from hnetcdf functions:

import Prelude hiding (length, sum)
import Control.Applicative ((<$>))
import qualified Data.Map as M
import Foreign.C
import Foreign.Storable
import Numeric.Container
import Data.NetCDF
import Data.NetCDF.HMatrix

type VRet a = IO (Either NcError (HVector a))
type MRet a = IO (Either NcError (HRowMajorMatrix a))

As well as a few utility imports and the Numeric.Container module from the hmatrix library, we import Data.NetCDF and Data.NetCDF.HMatrix – the first of these is the general hnetcdf API and the second is the module that allows us to use hnetcdf with hmatrix. Most of the functions in hnetcdf handle errors by returning an Either of NcError and a “useful” return type. The VRet and MRet type synonyms represent return values for vectors and matrices respectively. When using hnetcdf, it’s often necessary to supply type annotations to control the conversion from NetCDF values to Haskell values, and these type synonyms come in handy for doing this.

Reading NetCDF metadata

Examining NetCDF metadata is simple:

Right nc <- openFile "/big/data/reanalysis/ERA-Interim/"
putStrLn $ "Name: " ++ ncName nc
putStrLn $ "Dims: " ++ show (M.keys $ ncDims nc)
putStr $ unlines $ map (\(n, s) -> "  " ++ n ++ ": " ++ s) $
  M.toList $ flip (ncDims nc) $
  \d -> show (ncDimLength d) ++ if ncDimUnlimited d then " (UNLIM)" else ""
putStrLn $ "Vars: " ++ show (M.keys $ ncVars nc)
putStrLn $ "Global attributes: " ++ show (M.keys $ ncAttrs nc)

let Just ntime = ncDimLength <$> ncDim nc "time"
    Just nlat = ncDimLength <$> ncDim nc "latitude"
    Just nlon = ncDimLength <$> ncDim nc "longitude"

We open a file using hnetcdf’s openFile function (here assuming that there are no errors), getting a value of type NcInfo (defined in Data.NetCDF.Metadata in hnetcdf). This is a value representing all of the metadata in the NetCDF file: dimension, variable and attribute definitions all bundled up together into a single value from which we can access different metadata elements. We can access maps from names to dimension, variable and global attribute definitions and can then extract individual dimensions and variables to find information about them. The code in the listing above produces this output for the ERA-Interim Z500 NetCDF file used here:

Name: /big/data/reanalysis/ERA-Interim/
Dims: ["latitude","longitude","time"]
  latitude: 73
  longitude: 144
  time: 7670
Vars: ["latitude","longitude","time","z500"]
Global attributes: ["Conventions","history"]

Accessing coordinate values

Reading values from a NetCDF file requires a little bit of care to ensure that NetCDF types are mapped correctly to Haskell types:

let (Just lonvar) = ncVar nc "longitude"
Right (HVector lon) <- get nc lonvar :: VRet CFloat
let mlon = mean lon
putStrLn $ "longitude: " ++ show lon ++ " -> " ++ show mlon
Right (HVector lon2) <- getS nc lonvar [0] [72] [2] :: VRet CFloat
let mlon2 = mean lon2
putStrLn $ "longitude (every 2): " ++ show lon2 ++ " -> " ++ show mlon2

This shows how to read values from one-dimensional coordinate variables, both reading the whole variable, using hnetcdf’s get function, and reading a strided slice of the data using the getS function. In both cases, it’s necessary to specify the return type of get or getS explicitly – here this is done using the convenience type synonym VRet defined earlier. This code fragment produces this output:

longitude: fromList [0.0,2.5,5.0,7.5,10.0,12.5,15.0,17.5,20.0,22.5,25.0,
  337.5,340.0,342.5,345.0,347.5,350.0,352.5,355.0,357.5] -> 178.75

longitude (every 2): fromList [0.0,5.0,10.0,15.0,20.0,25.0,30.0,35.0,40.0,
  355.0] -> 177.5

The mean function used in above is defined as:

mean :: (Storable a, Fractional a) => Vector a -> a
mean xs = (foldVector (+) 0 xs) / fromIntegral (dim xs)

It requires a Storable type class constraint, and makes use of hmatrix’s foldVector function.

Accessing data values

Finally, we get round to reading the data that we’re interested in (of course, reading the metadata is a necessary prerequisite for this: this kind of geospatial data doesn’t mean much unless you can locate it in space and time, for which you need coordinate variables and their associated metadata).

The next listing shows how we read the Z500 data into a row-major hmatrix matrix:

let (Just zvar) = ncVar nc "z500"
putStrLn $ "z500 dims: " ++ show (map ncDimName $ ncVarDims zvar)
Right slice1tmp <- getA nc zvar [0, 0, 0] [1, nlat, nlon] :: MRet CShort
let (HRowMajorMatrix slice1tmp2) =
      coardsScale zvar slice1tmp :: HRowMajorMatrix CDouble
    slice1 = cmap ((/ 9.8) . realToFrac) slice1tmp2 :: Matrix Double
putStrLn $ "size slice1 = " ++
  show (rows slice1) ++ " x " ++ show (cols slice1)
putStrLn $ "lon(i=25) = " ++ show (lon @> (25 - 1))
putStrLn $ "lat(j=40) = " ++ show (lat @> (nlat - 40))
let v @!! (i, j) = v @@> (nlat - i, j - 1)
putStrLn $ "slice1(i=25,j=40) = " ++ show (slice1 @!! (25, 40))

There are a number of things to note here. First, we use the getA function, which allows us to specify starting indexes and counts for each dimension in the variable we’re reading. Here we read all latitude and longitude points for a single vertical level in the atmosphere (which is the only one there is in this file). Second, the values stored in this file are geopotential values, not geopotential height (so their units are m s-2 instead of metres, which we can convert to geopotential height by dividing by the acceleration due to gravity (about 9.8 m s-2). Third, the geopotential values are stored in a compressed form as short integers according to the COARDS metadata convention. This means that if we want to work with floating point values (which we almost always do), we need to convert using the hnetcdf coardsScale function, which reads the relevant scaling and offset attributes from the NetCDF variable and uses them to convert from the stored data values to some fractional numeric type (in this case CDouble – the destination type also needs to be an instance of hnetcdf’s NcStorable class).

Once we have the input data converted to a normal hmatrix Matrix value, we can manipulate it like any other data value. In particular, here we extract the geopotential height at given latitude and longitude coordinates (the @!! operator defined here is just a custom indexing operator to deal with the fact that the latitude values are stored in north-to-south order).

The most laborious part of all this is managing the correspondence between coordinate values and indexes, and managing the conversions between the C types used to represent values stored in NetCDF files (CDouble, CShort, etc.) and the native Haskell types that we’d like to use for our data manipulation activities. To be fair, the first of these problems is a problem for any user of NetCDF files, and Haskell’s data abstraction capabilities at least make dealing with metadata values less onerous than in C or C++. The second issue is a little more annoying, but it does ensure that we maintain a good cordon sanitaire between external representations of data values and the internal representations that we use.

What’s next

We’re going to have to spend a couple of articles covering some background to the atmospheric variability problem we’re going to look at, just to place some of this stuff in context: we need to look a little at just what this study is trying to address, we need to understand some basic facts about atmospheric dynamics and the data we’re going to be using, and we need to take a look at the gross dynamics of the atmosphere as they appear in these data, just so that we have some sort of idea what we’re looking at later on. That will probably take two or three articles, but then we can start with some real data analysis.

<script src="" type="text/javascript"></script> <script type="text/javascript"> (function () { var articleId = fyre.conv.load.makeArticleId(null); fyre.conv.load({}, [{ el: 'livefyre-comments', network: "", siteId: "290329", articleId: articleId, signed: false, collectionMeta: { articleId: articleId, url: fyre.conv.load.makeCollectionUrl(), } }], function() {}); }()); </script>

July 16, 2014 09:09 PM

Mark Jason Dominus

Guess what this does

Here's a Perl quiz that I confidently predict nobody will get right. Without trying it first, what does the following program print?

 perl -le 'print(two + two == five ? "true" : "false")'

(I will discuss the surprising answer tomorrow.)

by Mark Dominus ( at July 16, 2014 02:37 PM

July 15, 2014

Alessandro Vermeulen

Notes on the Advanced Akka course

The Advanced Akka course is provided by Typesafe and is aimed at teaching advanced usages of Akka. The course covers the basics of Akka, Remoting, Clustering, Routers, CRDTs, Cluster Sharding and Akka Persistance. The following post starts with a general introduction to Akka and presents the takeaways from the course as we experienced them.

A general overview of Akka

The reader which is already familiar with Akka can skip this section.

According to the Akka site this is Akka:

Akka is a toolkit and runtime for building highly concurrent, distributed, and fault tolerant event-driven applications on the JVM.

Akka achieves this by using Actors.

Actors are very lightweight concurrent entities.

Each Actor has a corresponding mailbox stored separately from the Actor. The Actors together with their mailboxes reside in an ActorSystem. Additionally, the ActorSystem contains the Dispatcher which executes the handling of a message by an actor. Each Actor only handles a single message at a time.

In Akka everything is remote by design and philosophy. In practice this means that each Actor is identified by its ActorRef. This is a reference to the actor which provides Location Transparency.

Actors communicate with each other by sending messages to an another Actor through an ActorRef. This sending of the message takes virtually no time.

In addition to ActorRef there exists also an ActorSelection which contains a path to one or more actors. Upon each sending of the message the path is traversed until the actor is found or when not. No message is send back when the actor is not found however.

States: Started - Stopped - Terminated If an actor enters the Stopped state it first stops its child actors before entering the Terminated state.


Import the context.dispatcher instead of the global Scala ExecutionContext. It is the ExecutionContext managed by Akka. Using the global context causes the Actors to be run in the global Thread pool.

You should not use PoisonPill as it will be removed from future versions of Akka since it is not specific enough. Roll your own message to make sure the appropriate actions for graceful shutdown are done. Use context.stop to stop your actor.

Place your business logic in a separate trait and mix it in to the actor. This increases testability as you can easily unit test the trait containing the business logic. Also, you should put the creation of any child actors inside a separate method so the creation can be overridden from tests.


With the Remoting extension it is possible to communicate with other Actor Systems. This communication is often done through ActorSelections instead of ActorRef.

Remoting uses Java serialisation by default which is slow and fragile in light of changing definitions. It is possible and recommended to use another mechanism such as Google Protobuf.


Akka has a simple perspective on cluster management with regards to split-brain scenarios. Nodes become dead when they are observed as dead and they cannot resurrect. The only way a node can come up again is if it registers itself again.

When a net split happens the other nodes are marked as unreachable. When using a Singleton, this means that only the nodes that can reach the singleton will access it. The others will not decide on a new Singleton in order to prevent a split-brain scenario.

Another measure against split-brain is contacting the seed nodes in order. The first seed node is required to be up.

The seed nodes are tried in order.


There is an library for writing finite state machines called FSM. For larger actors it can be useful to use the FSM. Otherwise stick to pure become and unbecome.

FSM also has an interval timer for scheduling messages. However, the use of stay() resets the interval timer therefore you could have issues with never executing what is at the end of the timer.


There are two different kinds of routers: Pools and Groups. Pools are in charge of their own children and they are created and killed by the pool. Groups are configured with an ActorSelection that defines the actors to which the group should sent its messages. There are several implementations: Consistent Hash, Random, Round Robin, BroadCast, Scatter - Gather First, and Smallest Mailbox. The names are self-explanatory.

Synchronisation of data with CRDTs

Synchronising data between multiple nodes can be done by choosing your datatype so that If the timestamps and events are generated in one place no duplicate entries occur. Therefore merging a map from a different node in your map is easily done by copying entries you don’t already have to your own data.

This can be implemented by letting each member node broadcast which data-points they have. Each node can then detect which information is lacking and request the specific data from the node that claimed to have the data. At some future point in time all nodes will be in sync. This is called eventual consistency.


If you have a singleton cluster manager proxy it only starts when the cluster is formed. A cluster is formed if a member connects. The proxy will then pass on the buffered messages.

Cluster Sharding

Sharding is a way to split up a group of actors in a cluster. This can be useful if the group is too large to fit in the memory of a single machine. The Cluster Sharding feature takes care of the partitioning of the actors using a hash you have to define with a function shardResolver. The sharded actors can be messaged with an unique identifier using ClusterSharding(system).shardRegion("Counter") which proxies the message to the correct actor. ClusterSharding.start is what the Manager is to Singletons.

It is recommended to put the sharding functions into a singleton object for easy re-use of your shards, containing the functions to start the sharding extension and proxy to the shard etc. It is also convenient to adds tell and initialise helper functions to respectively send a message and initialise the actor by its unique id.

Akka Persistence

Akka persistence uses a Journal to store which messages were processed. One of the supported storage mechanisms is Cassandra. It is also possible to use a file-based journal which, of course, is not recommended.

In the current version of Akka there are two approaches to persistence: command sourcing and event sourcing. Simply but, in command storing each message is first persisted and then offered to the actor to do as it pleases whereas in event sourcing only the results of actions are persisted. The latter is preferred and will be the only remaining method in following versions.

Both methods support storing a snapshot of the current state and recovering from it.

Command Sourcing

The main problem with command sourcing lies in that all messages are replayed. This includes requests for information from dead actors which wastes resources for nothing. Moreover, in case of errors, the last message that killed the actor is also replayed and probably killing the actor again in the proces.

Event Sourcing

With event sourcing one only stores state changing events. Events are received by the receiveRecover method. External side-effects should be performed in the receive method. The code for the internal side-effect of the event should be the same in both the receive and receiveRecover methods. The actor or trait for this will be named PersistentActor.

Actor offloading

One can use Akka Persistence to “pause” long living actors, e.g. actors that have seen no activity lately. This frees up memory. When the actor is needed again it can be safely restored from the persistence layer.


Akka 3 is to be released “not super soon”. It will contain typed actors. The consequence of this is that the sender field will be removed from the actor. Therefore, for request-response, the ActorRef should be added to the request itself.


The Advanced Akka course gives a lot of insights and concrete examples of how to use the advanced Akka features of clustering, sharding and persisting data across multiple nodes in order to create a system that really is highly available, resilient and scalable. It also touches on the bleeding edge functionalities, the ideas and concepts around it and what to expect next in this growing ecosystem.

July 15, 2014 10:00 AM

Danny Gratzer

Examining Hackage: extensible-effects

Posted on July 15, 2014

I had a few people tell me after my last post that they would enjoy a write up on reading extensible-effects so here goes.

I’m going to document my process of reading through and understanding how extensible-effects is implemented. Since this is a fairly large library (about 1k) of code, we’re not going over all of it. Rather we’re just reviewing the core modules and enough of the extra ones to get a sense for how everything is implemented.

If you’re curious or still have questions, the modules that we don’t cover should serve as a nice place for further exploration.

Which Modules

extensible-effects comes with quite a few modules, my find query reveals

$ find src -name "*.hs"

Whew! Well I’m going to take a leap and assume that extensible-effects is similar to the mtl in the sense that there are a few core modules, an then a bunch of “utility” modules. So there’s Control.Monad.Trans and then Control.Monad.State and a bunch of other implementations of MonadTrans.

If we assume extensible-effects is formatted like this, then we need to look at

  1. Data.OpenUnion1
  2. Control.Monad.Eff

And maybe a few other modules to get a feel for how to use these two. I’ve added Data.OpenUnion1 because it’s imported by Control.Monad.Eff so is presumably important.

Since Data.OpenUnion1 is at the top of our dependency DAG, we’ll start with it.


So we’re starting with Data.OpenUnion1. If the authors of this code have stuck to normal Haskell naming conventions, that’s an open union of type constructors, stuff with the kind * -> *.

Happily, this module has an export list so we can at least see what’s public.

    module Data.OpenUnion1( Union (..)
                          , SetMember
                          , Member
                          , (:>)
                          , inj
                          , prj
                          , prjForce
                          , decomp
                          , unsafeReUnion
                          ) where

So we’re looking at a data type Union, which we export everything for. Two type classes SetMember and Member, a type operator :>, and a handful of functions, most likely to work with Union.

So let’s figure out exactly what this union thing is

data Union r v = forall t. (Functor t, Typeable1 t) => Union (t v)

So Union r v is just a wrapper around some of functor applied to v. This seems a little odd, what’s this r thing? The docs hint that Member t r should always hold.

Member is a type class of two parameters with no members. In fact, greping the entire source reveals that the entire definition and instances for Member in this code base is

    infixr 1 :>
    data ((a :: * -> *) :> b)
    class Member t r
    instance Member t (t :> r)
    instance Member t r => Member t (t' :> r)

So this makes it a bit clearer, :> acts like a type level cons and Member just checks for membership!

Now Union makes a bit more sense, especially in light of the inj function

    inj :: (Functor t, Typeable1 t, Member t r) => t v -> Union r v
    inj = Union

So Union takes some t in r and hides it away in an existential applied to v. Now this is kinda like having a great nested bunch of Eithers with every t applied to v.

Dual to inj, we can define a projection from a Union to some t in r. This will need to return something wrapped in Maybe since we don’t know which member of r our Union is wrapping.

    prj :: (Typeable1 t, Member t r) => Union r v -> Maybe (t v)
    prj (Union v) = runId <$> gcast1 (Id v)

prj does some evil Typeable casts, but this is necessary since we’re throwing away all our type information with that existential. That Id runId pair is needed since gcast1 has the type

    -- In our case, `c ~ Id`
    gcast1 :: (Typeable t', Typeable t) => c (t a) -> Maybe (c (t' a))

They’re just defined as

    newtype Id a = Id { runId :: a }
      deriving Typeable

so just like Control.Monad.Identity.

Now let’s try to figure out what this SetMember thing is.

    class Member t r => SetMember set (t :: * -> *) r | r set -> t
    instance SetMember set t r => SetMember set t (t' :> r)

This is unhelpful, all we have is the recursive step with no base case! Resorting to grep reveals that our base case is defined in Control.Eff.Lift so we’ll temporarily put this class off until then.

Now the rest of the file is defining a few functions to operate over Unions.

First up is an unsafe “forced” version of prj.

    infixl 4 <?>

    (<?>) :: Maybe a -> a -> a
    Just a <?> _ = a
    _ <?> a = a
    prjForce :: (Typeable1 t, Member t r) => Union r v -> (t v -> a) -> a
    prjForce u f = f <$> prj u <?> error "prjForce with an invalid type"

prjForce is really exactly what it says on the label, it’s a version of prj that throws an exception if we’re in the wrong state of Union.

Next is a way of unsafely rejiggering the type level list that Union is indexed over.

    unsafeReUnion :: Union r w -> Union t w
    unsafeReUnion (Union v) = Union v

We need this for our last function, decom. This function partially unfolds our Union into an Either

    decomp :: Typeable1 t => Union (t :> r) v -> Either (Union r v) (t v)
    decomp u = Right <$> prj u <?> Left (unsafeReUnion u)

This provides a way to actually do some sort of induction on r by breaking out each type piece by piece with some absurd case for when we don’t have a :> b.

That about wraps up this little Union library, let’s move on to see how this is actually used.


Now let’s talk about the core of extensible-effects, Control.Eff. As always we’ll start by taking a look at the export list

    module Control.Eff(
                        Eff (..)
                      , VE (..)
                      , Member
                      , SetMember
                      , Union
                      , (:>)
                      , inj
                      , prj
                      , prjForce
                      , decomp
                      , send
                      , admin
                      , run
                      , interpose
                      , handleRelay
                      , unsafeReUnion
                      ) where

So right away we can see that we’re exporting stuff Data.Union1 as well as several new things, including the infamous Eff.

The first definition we come across in this module is VE. VE is either a simple value or a Union applied to a VE!

    data VE r w = Val w | E !(Union r (VE r w))

Right away we notice that “pure value or X” pattern we see with free monads and other abstractions over effects.

We also include a quick function to try to extract a pure value form Vals

    fromVal :: VE r w -> w
    fromVal (Val w) = w
    fromVal _ = error "extensible-effects: fromVal was called on a non-terminal effect."

Now we’ve finally reached the definition of Eff!

    newtype Eff r a = Eff { runEff :: forall w. (a -> VE r w) -> VE r w }

So Eff bears a striking resemblance to Cont. There are two critical differences though, first is that we specialize our return type to something constructed with VE r. The second crucial difference is that by universally quantifying over w we sacrifice a lot of the power of Cont, including callCC!

Next in Control.Eff is the instances for Eff

    instance Functor (Eff r) where
        fmap f m = Eff $ \k -> runEff m (k . f)
        {-# INLINE fmap #-}
    instance Applicative (Eff r) where
        pure = return
        (<*>) = ap
    instance Monad (Eff r) where
        return x = Eff $ \k -> k x
        {-# INLINE return #-}
        m >>= f = Eff $ \k -> runEff m (\v -> runEff (f v) k)
        {-# INLINE (>>=) #-}

Notice that these are all really identical to Conts instances. Functor adds a function to the head of the continuation. Monad dereferences m and feeds the result into f. Exactly as with Cont.

Next we can look at our primitive function for handling effects

    send :: (forall w. (a -> VE r w) -> Union r (VE r w)) -> Eff r a
    send f = Eff (E . f)

I must admit, this tripped me up for a while. Here’s how I read it, “provide a function, which when given a continuation for the rest of the program expecting an a, produces a side effecting VE r w and we’ll map that into Eff”.

Remember how Union holds functors? Well each of our effects must act like as a functor and wrap itself in that union. By being open, we get the “extensible” in extensible-effects.

Next we look at how to remove effects once they’ve been added to our set of effects. In mtl-land, this is similar to the collection of runFooT functions that are used to gradually strip a layer of transformers away.

The first step towards this is to transform the CPS-ed effectful computation Eff, into a more manageable form, VE

    admin :: Eff r w -> VE r w
    admin (Eff m) = m Val

This is a setup step so that we can traverse the “tree” of effects that our Eff monad built up for us.

Next, we know that we can take an Eff with no effects and unwrap it into a pure value. This is the “base case” for running an effectful computation.

    run :: Eff () w -> w
    run = fromVal . admin

Concerned readers may notice that we’re using a partial function, this is OK since the E case is “morally impossible” since there is no t so that Member t () holds.

Next is the function to remove just one effect from an Eff

    handleRelay :: Typeable1 t
                => Union (t :> r) v -- ^ Request
                -> (v -> Eff r a)   -- ^ Relay the request
                -> (t v -> Eff r a) -- ^ Handle the request of type t
                -> Eff r a
    handleRelay u loop h = either passOn h $ decomp u
      where passOn u' = send (<$> u') >>= loop

Next to send, this function gave me the most trouble. The trick was to realize that that decomp will leave us in two cases.

  1. Some effect producing a v, Union r v
  2. A t producing a v, t v

If we have a t v, then we’re all set since we know exactly how to map that to a Eff r a with h.

Otherwise we need to take this effect, add it back into our computation. send (<$> u') takes the rest of the computation, that continuation and feeds it the v that we know our effects produce. This gives us the type Eff r v, where that outer Eff r contains our most recent effect as well as everything else. Now to convert this to a Eff r a we need to transform that v to an a. The only way to do that is to use the supplied loop function so we just bind to that.

Last but not least is a function to modify an effect somewhere in our effectful computation. A grep reveals will see this later with things like local from Control.Eff.Reader for example.

To do this we want something like handleRelay but without removing t from r. We also need to generalize the type so that t can be anywhere in our. Otherwise we’ll have to prematurally solidify our stack of effects to use something like modify.

    interpose :: (Typeable1 t, Functor t, Member t r)
              => Union r v
              -> (v -> Eff r a)
              -> (t v -> Eff r a)
              -> Eff r a
    interpose u loop h = maybe (send (<$> u) >>= loop) h $ prj u

Now this is almost identical to handleRelay except instead of using decomp which will split off t and only works when r ~ t :> r', we use prj! This gives us a Maybe and since the type of u doesn’t need to change we just recycle that for the send (<$> u) >>= loop sequence.

That wraps up the core of extensible-effects, and I must admit that when writing this I was still quite confused as to actually use Eff to implement new effects. Reading a few examples really helped clear things up for me.


The State monad has always been the sort of classic monad example so I suppose we’ll start here.

    module Control.Eff.State.Lazy( State (..)
                                 , get
                                 , put
                                 , modify
                                 , runState
                                 , evalState
                                 , execState
                                 ) where

So we’re not reusing the State from Control.Monad.State but providing our own. It looks like

    data State s w = State (s -> s) (s -> w)

So what is this supposed to do? Well that s -> w looks a continuation of sorts, it takes the state s, and produces the resulting value. The s -> s looks like something that modify should use.

Indeed this is the case

    modify :: (Typeable s, Member (State s) r) => (s -> s) -> Eff r ()
    modify f = send $ \k -> inj $ State f $ \_ -> k ()

    put :: (Typeable e, Member (State e) r) => e -> Eff r ()
    put = modify . const

we grab the continuation from send and add a State effect on top which uses our modification function s. The continuation that State takes ignores the value it’s passed, the current state, and instead feeds the program computation the () it’s expecting.

get is defined in a similar manner, but instead of modifying the state, we use State’s continuation to feed the program the current state.

    get :: (Typeable e, Member (State e) r) => Eff r e
    get = send (inj . State id)

So we grab the continuation, feed it to a State id which won’t modify the state, and then inject that into our open union of effects.

Now that we have the API for working with states, let’s look at how to remove that effect.

    runState :: Typeable s
             => s                     -- ^ Initial state
             -> Eff (State s :> r) w  -- ^ Effect incorporating State
             -> Eff r (s, w)          -- ^ Effect containing final state and a return value
    runState s0 = loop s0 . admin where
     loop s (Val x) = return (s, x)
     loop s (E u)   = handleRelay u (loop s) $
                           \(State t k) -> let s' = t s
                                           in loop s' (k s')

runState first preps our effect to be pattern matched on with admin. We then start loop with the initial state.

loop has two components, if we have run into a value, then we don’t interpret any effects, just stick the state and value together and return them.

If we do have an effect, we use handleRelay to split out the State s from our effects. To handle the case where we get a VE w, we just loop with the current state. However, if we get a State t k, we update the state with t and pass the continuation k.

From runState evalState and execState.

    evalState :: Typeable s => s -> Eff (State s :> r) w -> Eff r w
    evalState s = fmap snd . runState s
    execState :: Typeable s => s -> Eff (State s :> r) w -> Eff r s
    execState s = fmap fst . runState s

That wraps up the interface for Control.Eff.State. The nice bit is this makes it a lot clearer how to use send, handleRelay and a few other functions from the core.


Now we’re on to Reader. The interesting thing here is that local highlights how to use interpose properly.

As always, we start by looking at what exactly this module provides

    module Control.Eff.Reader.Lazy( Reader (..)
                                  , ask
                                  , local
                                  , reader
                                  , runReader
                                  ) where

The definition of Reader is refreshingly simple

    newtype Reader e v = Reader (e -> v)

Keen readers will note that this is just half of the State definition which makes sense; Reader is half of State.

ask is defined almost identically to get

    ask :: (Typeable e, Member (Reader e) r) => Eff r e
    ask = send (inj . Reader)

We just feed the continuation for the program into Reader. A simple wrapper over this gives our equivalent of reads

    reader :: (Typeable e, Member (Reader e) r) => (e -> a) -> Eff r a
    reader f = f <$> ask

Next up is local, which is the most interesting bit of this module.

    local :: (Typeable e, Member (Reader e) r)
          => (e -> e)
          -> Eff r a
          -> Eff r a
    local f m = do
      e <- f <$> ask
      let loop (Val x) = return x
          loop (E u) = interpose u loop (\(Reader k) -> loop (k e))
      loop (admin m)

So local starts by grabbing the view of the environment we’re interested in, e. From there we define our worker function which looks a lot like runState. The key difference is that instead of using handleRelay we use interpose to replace each Reader effect with the appropriate environment. Remember that interpose is not going to remove Reader from the set of effects, just update each Reader effect in the current computation.

Finally, we simply rejigger the computation with admin and feed it to loop.

In fact, this is very similar to how runReader works!

    runReader :: Typeable e => Eff (Reader e :> r) w -> e -> Eff r w
    runReader m e = loop (admin m)
        loop (Val x) = return x
        loop (E u) = handleRelay u loop (\(Reader k) -> loop (k e))


Now between Control.Eff.Reader and Control.Eff.State I felt I had a pretty good handle on most of what I’d read in extensible-effects. There was just one remaining loose end: SetMember. Don’t remember what that was? It was a class in Data.OpenUnion1 that was conspicuously absent of detail or use.

I finally found where it seemed to be used! In Control.Eff.Lift.

First let’s poke at the exports of his module

    module Control.Eff.Lift( Lift (..)
                           , lift
                           , runLift
                           ) where

This module is designed to lift an arbitrary monad into the world of effects. There’s a caveat though, since monads aren’t necessarily commutative, the order in which we run them in is very important. Imagine for example the difference between IO (m a) and m (IO a).

So to ensure that Eff can support lifted monads we have to do some evil things. First we must require that we never have to lifted monads and we always run the monad last. This is a little icky but it’s usefulness outweighs such ugliness.

To ensure condition 1, we need SetMember.

    instance SetMember Lift (Lift m) (Lift m :> ())

So we define a new instance of SetMember. Basically this says that any Lift is a SetMember ... r iff Lift m is the last item in r.

To ensure condition number two we define runLift with the more restrictive type

    runLift :: (Monad m, Typeable1 m) => Eff (Lift m :> ()) w -> m w

We can now look into exactly how Lift is defined.

    data Lift m v = forall a. Lift (m a) (a -> v)

So this Lift acts sort of like a “suspended bind”. We postpone actually binding the monad and simulate doing so with a continuation a -> v.

We can define our one operation with Lift, lift.

    lift :: (Typeable1 m, SetMember Lift (Lift m) r) => m a -> Eff r a
    lift m = send (inj . Lift m)

This works by suspending the rest of the program in a our faux binding to be unwrapped later in runLift.

    runLift :: (Monad m, Typeable1 m) => Eff (Lift m :> ()) w -> m w
    runLift m = loop (admin m) where
     loop (Val x) = return x
     loop (E u) = prjForce u $ \(Lift m' k) -> m' >>= loop . k

The one interesting difference between this function and the rest of the run functions we’ve seen is that here we use prjForce. The reason for this is that we know that r is just Lift m :> (). This drastically simplifies the process and means all we’re essentially doing is transforming each Lift into >>=.

That wraps up our tour of the module and with it, extensible-effects.

Wrap Up

This post turned out a lot longer than I’d expected, but I think it was worth it. We’ve gone through the coroutine/continuation based core of extensible-effects and walked through a few different examples of how to actually use them.

If you’re still having some trouble putting the pieces together, the rest of extensible effects is a great collection of useful examples of building effects.

I hope you had as much fun as I did with this one!

Thanks to Erik Rantapaa a much longer post than I led him to believe

<script type="text/javascript"> /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */ var disqus_shortname = 'codeco'; // required: replace example with your forum shortname /* * * DON'T EDIT BELOW THIS LINE * * */ (function() { var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; dsq.src = '//' + disqus_shortname + ''; (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq); })(); </script> <noscript>Please enable JavaScript to view the comments powered by Disqus.</noscript> comments powered by Disqus

July 15, 2014 12:00 AM

July 14, 2014

Roman Cheplyaka

Type-based lift

In mtl, the ask method of the MonadReader class will automatically «lift» itself to the topmost ReaderT layer in the stack, which is very convenient, but only works as long as the topmost layer is the one you need. If you have multiple ReaderTs in the stack, you often have to insert manual lifts.

Previously I described why a smarter automatic lifting mechanism is needed to build truly reusable pieces of monadic code without too much boilerplate.

In this article I show two ways to achieve a type-based lift (that is, a lift which takes into account the r of ReaderT r), one relying on IncoherentInstances, the other — on closed type families.

Class-based approach and IncoherentInstances

In Two failed attempts at extensible effects, I wrote that simply removing the fundep from mtl wouldn’t work. This claim was recently disproved by Ben Foppa and his extensible-transformers library.

Why did I think that such an approach wouldn’t work?

{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, OverlappingInstances #-}
import Control.Monad.Trans.Reader hiding (ask)
import qualified Control.Monad.Trans.Reader as Trans
import Control.Monad.Trans.Class

class MonadReader r m where
  ask :: m r

instance Monad m => MonadReader r (ReaderT r m) where
  ask = Trans.ask

instance (Monad m, MonadReader r m, MonadTrans t) => MonadReader r (t m) where
  ask = lift ask

GHC, when asked to compile something that uses the above instances, will ask you in return to enable the IncoherentInstances extension. My experience with GHC told me that such a request is just a polite way for GHC to say «You’re doing something wrong!», so I immediately dismissed that approach. I had never seen a case where IncoherentInstances would be an acceptable solution to the problem. Well, this one seems to be exactly such a case!

Switching IncoherentInstances on here not only makes the type checker happy, but also makes the code work as expected, at least in the few tests that I tried.

Closed type classes

Intuitively, the reason why GHC needs so many ugly extensions to make the above code work is that we’re trying to simulate a closed type class with an open one.

Our type class is essentially a type-level if operator comparing two types, and its two instances correspond to the two branches of the if operator.

If only we had closed type classes, we could write

import Control.Monad.Trans.Reader hiding (ask)
import qualified Control.Monad.Trans.Reader as Trans
import Control.Monad.Trans.Class

class MonadReader r m where
  ask :: m r

  instance Monad m => MonadReader r (ReaderT r m) where
    ask = Trans.ask

  instance (Monad m, MonadReader r m, MonadTrans t) => MonadReader r (t m) where
    ask = lift ask

(where I put instance declarations inside the class declaration to show that the class is closed).

Alas, GHC 7.8 does not have closed type classes, and I have not even heard of them being developed. All we have is closed type families. Closed type families would let us compute, say, a type-level number showing how far we have to lift a monadic action to reach the right level. They, however, do not allow us to compute a value-level witness — the very lifting function!

Closed type families

Still, it is possible to achieve automatic lifting using closed type families alone. We developed this approach together with Andres Löh at ZuriHac’14.

The main idea is to split the problem into two.

First, we compute the amount of lifting required using a closed type family

-- Peano naturals, promoted to types by DataKinds
data Nat = Zero | Suc Nat

type family Find (t :: (* -> *) -> (* -> *)) (m :: * -> *) :: Nat where
  Find t (t m) = Zero
  Find t (p m) = Suc (Find t m)

Second, assuming we know how far to lift, we can compute the lifting function using an ordinary (open) MPTC:

class Monad m => MonadReaderN (n :: Nat) r m where
  askN :: Proxy n -> m r

instance Monad m => MonadReaderN Zero r (ReaderT r m) where
  askN _ = Trans.ask

instance (MonadTrans t, Monad (t m), MonadReaderN n r m, Monad m)
  => MonadReaderN (Suc n) r (t m)
    askN _ = lift $ askN (Proxy :: Proxy n)

It is important to note that our instances of MonadReaderN are non-overlapping. The instance is uniquely determined by the n :: Nat type parameter.

Finally, we glue the two components together to get a nice ask function:

-- Nice constraint alias
type MonadReader r m = MonadReaderN (Find (ReaderT r) m) r m

ask :: forall m r . MonadReader r m => m r
ask = askN (Proxy :: Proxy (Find (ReaderT r) m))

Problem solved?

Not quite. Both solutions described here do abstract from the position of a monad transformer in the stack, but they do not abstract from the transformer itself. The MonadReader r constraint can only be satisfied with ReaderT r but not, say StateT r. Moreover, a MonadState constraint, defined as

type MonadState s m = MonadStateN (Find (Control.Monad.State.Lazy.StateT s) m) s m

can only be satisfied by the lazy, but not strict, StateT.

I’ll address this issue in a subsequent article.

July 14, 2014 09:00 PM


Haskell Best Practices for Avoiding "Cabal Hell"

I posted this as a reddit comment and it was really well received, so I thought I'd post it here so it would be more linkable.  A lot of people complain about "cabal hell" and ask what they can do to solve it.  There are definitely things about the cabal/hackage ecosystem that can be improved, but on the whole it serves me quite well.  I think a significant amount of the difficulty is a result of how fast things move in the Haskell community and how much more reusable Haskell is than other languages.

With that preface, here are my best practices that seem to make Cabal work pretty well for me in my development.

1. I make sure that I have no more than the absolute minimum number of packages installed as --global.  This means that I don't use the Haskell Platform or any OS haskell packages.  I install GHC directly.  Some might think this casts too much of a negative light on the Haskell Platform.  But everyone will agree that having multiple versions of a package installed at the same time is a significant cause of build problems.  And that is exactly what the Haskell Platform does for you--it installs specific versions of packages.  If you use Haskell heavily enough, you will invariably encounter a situation where you want to use a different version of a package than the one the Haskell Platform gives you.

2. Make sure ~/.cabal/bin is at the front of your path.  Hopefully you already knew this, but I see this problem a lot, so it's worth mentioning for completeness.

3. Install happy and alex manually.  These two packages generate binary executables that you need to have in ~/.cabal/bin.  They don't get picked up automatically because they are executables and not package dependencies.

4. Make sure you have the most recent version of cabal-install.  There is a lot of work going on to improve these tools.  The latest version is significantly better than it used to be, so you should definitely be using it.

5. Become friends with "rm -fr ~/.ghc".  This command cleans out your --user repository, which is where you should install packages if you're not using a sandbox.  It sounds bad, but right now this is simply a fact of life.  The Haskell ecosystem is moving so fast that packages you install today will be out of date in a few months if not weeks or days.  We don't have purely functional nix-style package management yet, so removing the old ones is the pragmatic approach.  Note that sandboxes accomplish effectively the same thing for you.  Creating a new sandbox is the same as "rm -fr ~/.ghc" and then installing to --user, but has the benefit of not deleting everything else you had in --user.

6. If you're not working on a single project with one harmonious dependency tree, then use sandboxes for separate projects or one-off package compiles.

7. Learn to use --allow-newer.  Again, things move fast in Haskell land.  If a package gives you dependency errors, then try --allow-newer and see if the package will just work with newer versions of dependencies.

8. Don't be afraid to dive into other people's packages.  "cabal unpack" makes it trivial to download the code for any package.  From there it's often trivial to make manual changes to version bounds or even small code changes.  If you make local changes to a package, then you can either install it to --user so other packages use it, or you can do "cabal sandbox add-source /path/to/project" to ensure that your other projects use the locally modified version.  If you've made code changes, then help out the community by sending a pull request to the package maintainer.  Edit: bergmark mentions that unpack is now "cabal get" and "cabal get -s" lets you clone the project's source repository.

9. If you can't make any progress from the build messages cabal gives you, then try building with -v3.  I have encountered situations where cabal's normal dependency errors are not helpful.  Using -v3 usually gives me a much better picture of what's going on and I can usually figure out the root of the problem pretty quickly.

by mightybyte ( at July 14, 2014 02:00 PM

July 13, 2014

Oliver Charles

Announcing engine-io and socket-io for Haskell

I’ve just released three new libraries to Hackage:

  1. engine-io
  2. engine-io-snap
  3. socket-io


Engine.IO is a new framework from Automattic, which provides an abstraction for real-time client/server communication over the web. You can establish communication channels with clients over XHR long-polling, which works even through proxies and aggressive traffic rewriting, and connections are upgraded to use HTML 5 web sockets if available to reduce latency. Engine.IO also allows the transmission of binary data without overhead, while also gracefully falling back to using base 64 encoding if the client doesn’t support raw binary packets.

This is all very desirable stuff, but you’re going to have a hard time convincing me that I should switch to Node.js! I’m happy to announce that we now have a Haskell implementation for Engine.IO servers, which can be successfully used with the Engine.IO JavaScript client. A simple application may look like the following:

{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Monad (forever)

import qualified Control.Concurrent.STM as STM
import qualified Network.EngineIO as EIO
import qualified Network.EngineIO.Snap as EIOSnap
import qualified Snap.CORS as CORS
import qualified Snap.Http.Server as Snap

handler :: EIO.Socket -> IO ()
handler s = forever $
  STM.atomically $ EIO.receive s >>= EIO.send s

main :: IO ()
main = do
  eio <- EIO.initialize
  Snap.quickHttpServe $ CORS.applyCORS CORS.defaultOptions $
    EIO.handler eio (pure handler) EIOSnap.snapAPI

This example uses engine-io-snap to run an Engine.IO application using Snap’s server, which allows me to concentrate on the important stuff. The body of the application is the handler, which is called every time a socket connects. In this case, we have a basic echo server, which constantly reads (blocking) from the client, and echos the message straight back.

As mentioned, you can also do binary transmission - the following handler transmits the lovable doge.png to clients:

handler s = do
  bytes <- BS.readFile "doge.png"
  STM.atomically $
    EIO.send socket (EIO.BinaryPacket bytes)

On the client side, this can be displayed as an image by using data URIs, or manipulated using the HTML 5 File API.


Socket.IO builds on top of Engine.IO to provide an abstraction to build applications in terms of events. In Socket.IO, clients connect to a server, and then receive and emit events, which can often provide a simpler architecture for web applications.

My Socket.IO implementation in Haskell also strives for simplicity, by taking advantage of the aeson library a lot of the encoding and decoding of packets is hidden, allowing you to focus on your application logic. I’ve implemented the example chat application, originally written in Node.js, using my Haskell server:

data AddUser = AddUser Text.Text

instance Aeson.FromJSON AddUser where
  parseJSON = Aeson.withText "AddUser" $ pure . AddUser

data NumConnected = NumConnected !Int

instance Aeson.ToJSON NumConnected where
  toJSON (NumConnected n) = Aeson.object [ "numUsers" .= n]

data NewMessage = NewMessage Text.Text

instance Aeson.FromJSON NewMessage where
  parseJSON = Aeson.withText "NewMessage" $ pure . NewMessage

data Said = Said Text.Text Text.Text

instance Aeson.ToJSON Said where
  toJSON (Said username message) = Aeson.object
    [ "username" .= username
    , "message" .= message

data UserName = UserName Text.Text

instance Aeson.ToJSON UserName where
  toJSON (UserName un) = Aeson.object [ "username" .= un ]

data UserJoined = UserJoined Text.Text Int

instance Aeson.ToJSON UserJoined where
  toJSON (UserJoined un n) = Aeson.object
    [ "username" .= un
    , "numUsers" .= n

data ServerState = ServerState { ssNConnected :: STM.TVar Int }

server :: ServerState -> SocketIO.Router ()
server state = do
  userNameMVar <- liftIO STM.newEmptyTMVarIO
  let forUserName m = liftIO (STM.atomically (STM.tryReadTMVar userNameMVar)) >>= mapM_ m

  SocketIO.on "new message" $ \(NewMessage message) ->
    forUserName $ \userName ->
      SocketIO.broadcast "new message" (Said userName message)

  SocketIO.on "add user" $ \(AddUser userName) -> do
    n <- liftIO $ STM.atomically $ do
      n <- (+ 1) <$> STM.readTVar (ssNConnected state)
      STM.putTMVar userNameMVar userName
      STM.writeTVar (ssNConnected state) n
      return n

    SocketIO.emit "login" (NumConnected n)
    SocketIO.broadcast "user joined" (UserJoined userName n)

  SocketIO.on_ "typing" $
    forUserName $ \userName ->
      SocketIO.broadcast "typing" (UserName userName)

  SocketIO.on_ "stop typing" $
    forUserName $ \userName ->
      SocketIO.broadcast "stop typing" (UserName userName)

We define a few data types and their JSON representations, and then define our server application below. Users of the library don’t have to worry about parsing and validating data for routing, as this is handled transparently by defining event handlers. In the above example, we listen for the add user event, and expect it to have a JSON payload that can be decoded to the AddUser data type. This follows the best-practice of pushing validation to the boundaries of your application, so you can spend more time working with stronger types.

By stronger types, I really do mean stronger types - at Fynder we’re using this very library with the singletons library in order to provide strongly typed publish/subscribe channels. If you’re interested in this, be sure to come along to the Haskell eXchange, where I’ll be talking about exactly that!

July 13, 2014 12:00 AM

July 12, 2014

Dominic Orchard

Rearranging equations using a zipper

Whilst experimenting with some ideas for a project, I realised I needed a quick piece of code to rearrange equations (defined in terms of +, *, -, and /) in AST form, e.g., given an AST for the equation x = y + 3, rearrange to get y = x - 3.

I realised that equations can be formulated as zippers over an AST, where operations for navigating the zipper essentially rearrange the equation. I thought this was quite neat, so I thought I would show the technique here. The code is in simple Haskell.

I’ll show the construction for a simple arithmetic calculus with the following AST data type of terms:

data Term = Add Term Term 
          | Mul Term Term 
          | Div Term Term
          | Sub Term Term  
          | Neg Term
          | Var String
          | Const Integer

with some standard pretty printing code:

instance Show Term where 
    show (Add t1 t2) = (show' t1) ++ " + " ++ (show' t2)
    show (Mul t1 t2) = (show' t1) ++ " * " ++ (show' t2)
    show (Sub t1 t2) = (show' t1) ++ " - " ++ (show' t2)
    show (Div t1 t2) = (show' t1) ++ " / " ++ (show' t2)
    show (Neg t) = "-" ++ (show' t) 
    show (Var v) = v
    show (Const n) = show n

where show' is a helper to minimise brackets e.g. pretty printing “-(v)” as “-v”.

show' :: Term -> String
show' (Var v) = v
show' (Const n) = show n
show' t@(Neg (Var v)) = show t
show' t@(Neg (Const n)) = show t
show' t = "(" ++ show t ++ ")"

Equations can be defined as pairs of terms, i.e., ‘T1 = T2′ where T1 and T2 are both represented by values of Term. However, instead, I’m going to represent equations via a zipper.

Zippers (described beautifully in the paper by Huet) represent values that have some subvalue “in focus”. The position of the focus can then be shifted through the value, refocussing on different parts. This is encoded by pairing a focal subvalue with a path to this focus, which records the rest of the value that is not in focus. For equations, the zipper type pairs a focus Term (which we’ll think of as the left-hand side of the equation) with a path (which we’ll think of as the right-hand side of the equation).

data Equation = Eq Term Path

Paths give a sequence of direction markers, essentially providing an address to the term in focus, starting from the root, where each marker is accompanied with the label of the parent node and the subtree of the branch not taken, i.e., a path going left is paired with the right subtree (which is not on the path to the focus).

data Path = Top (Either Integer String)  -- At top: constant or variable
           | Bin Op                -- OR in a binary operation Op,
                 Dir               --    in either left (L) or right (R) branch
                 Term              --    with the untaken branch 
                 Path              --    and the rest of the equation
           | N Path                -- OR in the unary negation operation

data Dir = L | R 
data Op = A | M | D | S | So | Do

The Op type gives tags for every operation, as well as additional tags So and Do which represent sub and divide but with arguments flipped. This is used to get an isomorphism between the operations that zip “up” and “down” the equation zipper, refocussing on subterms.

A useful helper maps tags to their operations:

opToTerm :: Op -> (Term -> Term -> Term)
opToTerm A = Add
opToTerm M = Mul
opToTerm D = Div
opToTerm S = Sub
opToTerm So = (\x -> \y -> Sub y x)
opToTerm Do = (\x -> \y -> Div y x)

Equations are pretty printed as follows:

instance Show Path where
    show p = show . pathToTerm $ p
instance Show Equation where
    show (Eq t p) = (show t) ++ " = " ++ (show p)

where pathToTerm converts paths to terms:

pathToTerm :: Path -> Term
pathToTerm (Top (Left c)) = Const c
pathToTerm (Top (Right v))= Var v
pathToTerm (Bin op L t p) = (opToTerm op) (pathToTerm p) t
pathToTerm (Bin op R t p) = (opToTerm op) t (pathToTerm p)
pathToTerm (N p)          = Neg (pathToTerm p)

Now onto the zipper operations which providing rebalancing of the equation. Equations are zipped-down by left and right, which for a binary operation focus on either the left or right argument respectively, for unary negation focus on the single argument, and for constants or variables does nothing. When going left or right, the equations are rebalanced with their inverse arithmetic operations (show in the comments here):

left (Eq (Var s) p)     = Eq (Var s) p
left (Eq (Const n) p)   = Eq (Const n) p
left (Eq (Add t1 t2) p) = Eq t1 (Bin S L t2 p)   -- t1 + t2 = p  -> t1 = p - t2
left (Eq (Mul t1 t2) p) = Eq t1 (Bin D L t2 p)   -- t1 * t2 = p  -> t1 = p / t2
left (Eq (Div t1 t2) p) = Eq t1 (Bin M L t2 p)   -- t1 / t2 = p  -> t1 = p * t2
left (Eq (Sub t1 t2) p) = Eq t1 (Bin A L t2 p)   -- t1 - t2 = p  -> t1 = p + t2
left (Eq (Neg t) p)     = Eq t (N p)             -- -t = p       -> t = -p

right (Eq (Var s) p)     = Eq (Var s) p          
right (Eq (Const n) p)   = Eq (Const n) p
right (Eq (Add t1 t2) p) = Eq t2 (Bin So R t1 p)  -- t1 + t2 = p -> t2 = p - t1
right (Eq (Mul t1 t2) p) = Eq t2 (Bin Do R t1 p)  -- t1 * t2 = p -> t2 = p / t1
right (Eq (Div t1 t2) p) = Eq t2 (Bin D R t1 p)   -- t1 / t2 = p -> t2 = t1 / p
right (Eq (Sub t1 t2) p) = Eq t2 (Bin S R t1 p)   -- t1 - t2 = p -> t2 = t1 - p
right (Eq (Neg t) p)     = Eq t (N p)

In both left and right, Add and Mul become subtraction and dividing, but in right in order for the the zipping-up operation to be the inverse, subtraction and division are represented using the flipped So and Do markers.

Equations are zipped-up by up, which unrolls one step of the path and reforms the term on the left-hand side from that on the right. This is the inverse of left and right:

up (Eq t1 (Top a))        = Eq t1 (Top a)
up (Eq t1 (Bin A L t2 p)) = Eq (Sub t1 t2) p -- t1 = t2 + p -> t1 - t2 = p
up (Eq t1 (Bin M L t2 p)) = Eq (Div t1 t2) p -- t1 = t2 * p -> t1 / t2 = p
up (Eq t1 (Bin D L t2 p)) = Eq (Mul t1 t2) p -- t1 = p / t2 -> t1 * t2 = p
up (Eq t1 (Bin S L t2 p)) = Eq (Add t1 t2) p -- t1 = p - t2 -> t1 + t2 = p

up (Eq t1 (Bin So R t2 p)) = Eq (Add t2 t1) p -- t1 = p - t2 -> t2 + t1 = p
up (Eq t1 (Bin Do R t2 p)) = Eq (Mul t2 t1) p -- t1 = p / t2 -> t2 * t1 = p
up (Eq t1 (Bin D R t2 p))  = Eq (Div t2 t1) p -- t1 = t2 / p -> t2 / t1 = p
up (Eq t1 (Bin S R t2 p))  = Eq (Sub t2 t1) p -- t1 = t2 - p -> t2 - t1 = p

up (Eq t1 (N p))           = Eq (Neg t1) p    -- t1 = -p     -> -t1 = p

And that’s it! Here is an example of its use from GHCi.

foo = Eq (Sub (Mul (Add (Var "x") (Var "y")) (Add (Var "x") 
           (Const 1))) (Const 1)) (Top (Left 0))

*Main> foo
((x + y) * (x + 1)) - 1 = 0

*Main> left $ foo
(x + y) * (x + 1) = 0 + 1

*Main> right . left $ foo
x + 1 = (0 + 1) / (x + y)

*Main> left . right . left $ foo
x = ((0 + 1) / (x + y)) - 1

*Main> up . left . right . left $ foo
x + 1 = (0 + 1) / (x + y)

*Main> up . up . left . right . left $ foo
(x + y) * (x + 1) = 0 + 1

*Main> up . up . up . left . right . left $ foo
((x + y) * (x + 1)) - 1 = 0

It is straightforward to prove that: up . left $ x = x (when left x is not equal to x) and up . right $ x = x(when right x is not equal to x).

Note, I am simply rebalancing the syntax of equations: this technique does not help if you have multiple uses of a variable and you want to solve the question for a particular variable, e.g. y = x + 1/(3x), or quadratics.

Here’s a concluding thought. The navigation operations left, right, and up essentially apply the inverse of the operation in focus to each side of the equation. We could therefore reformulate the navigation operations in terms of any group: given a term L ⊕ R under focus where is the binary operation of a group with inverse operation -1, then navigating left applies ⊕ R-1 to both sides and navigating right applies ⊕ L-1. However, in this blog post there is a slight difference: navigating applies the inverse to both sides and then reduces the term of the left-hand side using the group axioms X ⊕ X-1 = I (where I is the identity element of the group) and X ⊕ I = X such that the term does not grow larger and larger with inverses.

I wonder if there are other applications, which have a group structure (or number of interacting groups), for which the above zipper approach would be useful?

by dorchard at July 12, 2014 03:15 PM

Douglas M. Auclair (geophf)

1HaskellADay: Up, up, and away!

I've taken it upon myself to submit problems, then show the solutions, for @1HaskellADay. I started this work on July 1st, 2014. So the next set of entries serve to collect what I've done, both problems and solutions, and, if a particular day posed a particularly interesting problem, that I solved, or, that I didn't solve, I'll capture that here, too.

by geophf ( at July 12, 2014 02:06 PM

Dominic Orchard

The Four Rs of Programming Language Design (revisited)

Following my blog post last year about the “four Rs of programming language design” I wrote a short essay expanding upon the idea which has now been included in the online post-proceedings of the ACM Onward ’11 essay track (part of the SPLASH conference).

The essay was a lot of fun to write (I somehow end up referencing books from the 19th century, a sci-fi novel, 1984, and The Matrix!) and is a kind of mission statement (at least for myself) for language design; it is available on my research page here. I hope that it provides some food-for-thought for others interested in, or working in, language design.

by dorchard at July 12, 2014 10:13 AM

ERDI Gergo

Arrow's place in the Applicative/Monad hierarchy

I've been thinking lately about arrows in relation to applicative functors and monads. The difference between the latter two is easy to intuit (and I'll describe it via an example below), but I never managed to get the same level of understanding about arrows. There's a somewhat famous paper about this question, which has a very clear-cut diagram showing that applicatives embed into arrows and arrows embed into monads (and both containments are non-trivial), which I understood as meaning every arrow is an applicative functor, and every monad is an arrow.

At first glance, this makes sense, given the well-known result that monads are exactly equivalent to arrows that are also instances of ArrowApply, as witnessed by the Haskell types Kleisli and ArrowMonad. However, there's no immediately obvious reason why you couldn't also turn an applicative functor into an arrow, so how does that leave any room for arrows to be different from applicatives? (As an aside, the fact that applicative functors have kind ⋆ → ⋆ and arrows have kind ⋆ → ⋆ → ⋆ has been a huge complication for me in trying to compare them).

Now, finally, based on the helpful replies to that StackOverflow question and the associated Reddit thread, I am confident enough to answer my own question.

Tom Ellis suggested thinking about a concrete example involving file I/O, so let's compare three approaches to it using the three typeclasses. To make things simple, we will only care about two operations: reading a string from a file and writing a string to a file. Files are going to be identified by their file path:

type FilePath = String

Monadic I/O

Our first I/O interface is defined as follows:

data IOM ∷ ⋆ → ⋆
instance Monad IOM
readFile ∷ FilePath → IOM String
writeFile ∷ FilePath → String → IOM ()

Using this interface, we can for example copy a file from one path to another:

copy ∷ FilePath → FilePath → IOM ()
copy from to = readFile from >>= writeFile to

However, we can do much more than that: the choice of files we manipulate can depend on effects upstream. For example, the below function takes an index file which contains a filename, and copies it to the given target directory:

copyIndirect ∷ FilePath → FilePath → IOM ()
copyIndirect index target = do
    from ← readFile index
    copy from (target ⟨/⟩ to)

On the flip side, this means there is no way to know upfront the set of filenames that are going to be manipulated by a given value action ∷ IOM α. By "upfront", what I mean is the ability to write a pure function fileNames :: IOM α → [FilePath].

Of course, for non-IO-based monads (such as ones for which we have some kind of extractor function μ α → α), this distinction becomes a bit more fuzzy, but it still makes sense to think about trying to extract information without evaluating the effects of the monad (so for example, we could ask "what can we know about a Reader Γ α without having a value of type Γ at hand?").

The reason we can't really do static analysis in this sense on monads is because the function on the right-hand side of a bind is in the space of Haskell functions, and as such, is completely opaque.

So let's try restricting our interface to just an applicative functor.

Applicative I/O

data IOF ∷ ⋆ → ⋆
instance Applicative IOF
readFile ∷ FilePath → IOF String
writeFile ∷ FilePath → String → IOF ()

Since IOF is not a monad, there's no way to compose readFile and writeFile, so all we can do with this interface is to either read from a file and then postprocess its contents purely, or write to a file; but there's no way to write the contents of a file into another one.

How about changing the type of writeFile?

writeFile′ ∷ FilePath → IOF (String → ())

The main problem with this interface is that while it would allow writing something like

copy ∷ FilePath → FilePath → IOF ()
copy from to = writeFile′ to ⟨*⟩ readFile from

it leads to all kind of nasty problems because String → () is such a horrible model of writing a string to a file, since it breaks referential transparency. For example, what do you expect the contents of out.txt to be after running this program?

(λ write → [write "foo", write "bar", write "foo"]) ⟨$⟩ writeFile′ "out.txt"

Two approaches to arrowized I/O

First of all, let's get two arrow-based I/O interfaces out of the way that don't (in fact, can't) bring anything new to the table: Kleisli IOM and Applicarrow IOF.

The Kleisli-arrow of IOM, modulo currying, is:

readFile ∷ Kleisli IOM FilePath String
writeFile ∷ Kleisli IOM (FilePath, String) ()

Since writeFile's input still contains both the filename and the contents, we can still write copyIndirect (using arrow notation for simplicity). Note how the ArrowApply instance of Kleisli IOM is not even used.

copyIndirect ∷ Kleisli IOM (FilePath, FilePath) ()
copyIndirect = proc (index, target) → do
    from ← readFile ↢ index
    s ← readFile ↢ from
    writeFile ↢ (to, s)

The Applicarrow of IOF would be:

readFile ∷ FilePath → Applicarrow IOF () String
writeFile ∷ FilePath → String → Applicarrow IOF () ()

which of course still exhibits the same problem of being unable to compose readFile and writeFile.

A proper arrowized I/O interface

Instead of transforming IOM or IOF into an arrow, what if we start from scratch, and try to create something in between, in terms of where we use Haskell functions and where we make an arrow? Take the following interface:

data IOA ∷ ⋆ → ⋆ → ⋆
instance Arrow IOA
readFile ∷ FilePath → IOA () String
writeFile ∷ FilePath → IOA String ()

Because writeFile takes the content from the input side of the arrow, we can still implement copy:

copy ∷ FilePath → FilePath → IOA () ()
copy from to = readFile from >>> writeFile to

However, the other argument of writeFile is a purely functional one, and so it can't depend on the output of e.g. readFile; so copyIndirect can't be implemented with this interface.

If we turn this argument around, this also means that while we can't know in advance what will end up being written to a file (before running the full IOA pipeline itself), but we can statically determine the set of filenames that will be modified.


Monads are opaque to static analysis, and applicative functors are poor at expressing dynamic-time data dependencies. It turns out arrows can provide a sweet spot between the two: by choosing the purely functional and the arrowized inputs carefully, it is possible to create an interface that allows for just the right interplay of dynamic behaviour and amenability to static analysis.

July 12, 2014 09:30 AM

Arrow's place in the Applicative/Monad hierarchy

I've been thinking lately about arrows in relation to applicative functors and monads. The difference between the latter two is easy to intuit (and I'll describe it via an example below), but I never managed to get the same level of understanding about arrows. There's a somewhat famous paper about this question, which has a very clear-cut diagram showing that applicatives embed into arrows and arrows embed into monads (and both containments are non-trivial), which I understood as meaning every arrow is an applicative functor, and every monad is an arrow.

At first glance, this makes sense, given the well-known result that monads are exactly equivalent to arrows that are also instances of ArrowApply, as witnessed by the Haskell types Kleisli and ArrowMonad. However, there's no immediately obvious reason why you couldn't also turn an applicative functor into an arrow, so how does that leave any room for arrows to be different from applicatives? (As an aside, the fact that applicative functors have kind ⋆ → ⋆ and arrows have kind ⋆ → ⋆ → ⋆ has been a huge complication for me in trying to compare them).

Now, finally, based on the helpful replies to that StackOverflow question and the associated Reddit thread, I am confident enough to answer my own question.

Tom Ellis suggested thinking about a concrete example involving file I/O, so let's compare three approaches to it using the three typeclasses. To make things simple, we will only care about two operations: reading a string from a file and writing a string to a file. Files are going to be identified by their file path:

type FilePath = String

Monadic I/O

Our first I/O interface is defined as follows:

data IOM ∷ ⋆ → ⋆
instance Monad IOM
readFile ∷ FilePath → IOM String
writeFile ∷ FilePath → String → IOM ()

Using this interface, we can for example copy a file from one path to another:

copy ∷ FilePath → FilePath → IOM ()
copy from to = readFile from >>= writeFile to

However, we can do much more than that: the choice of files we manipulate can depend on effects upstream. For example, the below function takes an index file which contains a filename, and copies it to the given target directory:

copyIndirect ∷ FilePath → FilePath → IOM ()
copyIndirect index target = do
    from ← readFile index
    copy from (target ⟨/⟩ to)

On the flip side, this means there is no way to know upfront the set of filenames that are going to be manipulated by a given value action ∷ IOM α. By "upfront", what I mean is the ability to write a pure function fileNames :: IOM α → [FilePath].

Of course, for non-IO-based monads (such as ones for which we have some kind of extractor function μ α → α), this distinction becomes a bit more fuzzy, but it still makes sense to think about trying to extract information without evaluating the effects of the monad (so for example, we could ask "what can we know about a Reader Γ α without having a value of type Γ at hand?").

The reason we can't really do static analysis in this sense on monads is because the function on the right-hand side of a bind is in the space of Haskell functions, and as such, is completely opaque.

So let's try restricting our interface to just an applicative functor.

Applicative I/O

data IOF ∷ ⋆ → ⋆
instance Applicative IOF
readFile ∷ FilePath → IOF String
writeFile ∷ FilePath → String → IOF ()

Since IOF is not a monad, there's no way to compose readFile and writeFile, so all we can do with this interface is to either read from a file and then postprocess its contents purely, or write to a file; but there's no way to write the contents of a file into another one.

How about changing the type of writeFile?

writeFile′ ∷ FilePath → IOF (String → ())

The main problem with this interface is that while it would allow writing something like

copy ∷ FilePath → FilePath → IOF ()
copy from to = writeFile′ to ⟨*⟩ readFile from

it leads to all kind of nasty problems because String → () is such a horrible model of writing a string to a file, since it breaks referential transparency. For example, what do you expect the contents of out.txt to be after running this program?

(λ write → [write "foo", write "bar", write "foo"]) ⟨$⟩ writeFile′ "out.txt"

Two approaches to arrowized I/O

First of all, let's get two arrow-based I/O interfaces out of the way that don't (in fact, can't) bring anything new to the table: Kleisli IOM and Applicarrow IOF.

The Kleisli-arrow of IOM, modulo currying, is:

readFile ∷ Kleisli IOM FilePath String
writeFile ∷ Kleisli IOM (FilePath, String) ()

Since writeFile's input still contains both the filename and the contents, we can still write copyIndirect (using arrow notation for simplicity). Note how the ArrowApply instance of Kleisli IOM is not even used.

copyIndirect ∷ Kleisli IOM (FilePath, FilePath) ()
copyIndirect = proc (index, target) → do
    from ← readFile ↢ index
    s ← readFile ↢ from
    writeFile ↢ (to, s)

The Applicarrow of IOF would be:

readFile ∷ FilePath → Applicarrow IOF () String
writeFile ∷ FilePath → String → Applicarrow IOF () ()

which of course still exhibits the same problem of being unable to compose readFile and writeFile.

A proper arrowized I/O interface

Instead of transforming IOM or IOF into an arrow, what if we start from scratch, and try to create something in between, in terms of where we use Haskell functions and where we make an arrow? Take the following interface:

data IOA ∷ ⋆ → ⋆ → ⋆
instance Arrow IOA
readFile ∷ FilePath → IOA () String
writeFile ∷ FilePath → IOA String ()

Because writeFile takes the content from the input side of the arrow, we can still implement copy:

copy ∷ FilePath → FilePath → IOA () ()
copy from to = readFile from >>= writeFile to

However, the other argument of writeFile is a purely functional one, and so it can't depend on the output of e.g. readFile; so copyIndirect can't be implemented with this interface.

If we turn this argument around, this also means that while we can't know in advance what will end up being written to a file (before running the full IOA pipeline itself), but we can statically determine the set of filenames that will be modified.


Monads are opaque to static analysis, and applicative functors are poor at expressing dynamic-time data dependencies. It turns out arrows can provide a sweet spot between the two: by choosing the purely functional and the arrowized inputs carefully, it is possible to create an interface that allows for just the right interplay of dynamic behaviour and amenability to static analysis.

July 12, 2014 09:30 AM

July 11, 2014

Edward Z. Yang

Type classes: confluence, coherence, global uniqueness

Today, I'd like to talk about some of the core design principles behind type classes, a wildly successful feature in Haskell. The discussion here is closely motivated by the work we are doing at MSRC to support type classes in Backpack. While I was doing background reading, I was flummoxed to discover widespread misuse of the terms "confluence" and "coherence" with respect to type classes. So in this blog post, I want to settle the distinction, and propose a new term, "global uniqueness of instances" for the property which people have been colloquially referred to as confluence and coherence.

Let's start with the definitions of the two terms. Confluence is a property that comes from term-rewriting: a set of instances is confluent if, no matter what order constraint solving is performed, GHC will terminate with a canonical set of constraints that must be satisfied for any given use of a type class. In other words, confluence says that we won't conclude that a program doesn't type check just because we swapped in a different constraint solving algorithm.

Confluence's closely related twin is coherence (defined in the paper "Type classes: exploring the design space"). This property states that every different valid typing derivation of a program leads to a resulting program that has the same dynamic semantics. Why could differing typing derivations result in different dynamic semantics? The answer is that context reduction, which picks out type class instances, elaborates into concrete choices of dictionaries in the generated code. Confluence is a prerequisite for coherence, since one can hardly talk about the dynamic semantics of a program that doesn't type check.

So, what is it that people often refer to when they compare Scala type classes to Haskell type classes? I am going to refer to this as global uniqueness of instances, defining to say: in a fully compiled program, for any type, there is at most one instance resolution for a given type class. Languages with local type class instances such as Scala generally do not have this property, and this assumption is a very convenient one when building abstractions like sets.

So, what properties does GHC enforce, in practice? In the absence of any type system extensions, GHC's employs a set of rules to ensure that type class resolution is confluent and coherent. Intuitively, it achieves this by having a very simple constraint solving algorithm (generate wanted constraints and solve wanted constraints) and then requiring the set of instances to be nonoverlapping, ensuring there is only ever one way to solve a wanted constraint. Overlap is a more stringent restriction than either confluence or coherence, and via the OverlappingInstances and IncoherentInstances, GHC allows a user to relax this restriction "if they know what they're doing."

Surprisingly, however, GHC does not enforce global uniqueness of instances. Imported instances are not checked for overlap until we attempt to use them for instance resolution. Consider the following program:

-- T.hs
data T = T
-- A.hs
import T
instance Eq T where
-- B.hs
import T
instance Eq T where
-- C.hs
import A
import B

When compiled with one-shot compilation, C will not report overlapping instances unless we actually attempt to use the Eq instance in C. This is by design: ensuring that there are no overlapping instances eagerly requires eagerly reading all the interface files a module may depend on.

We might summarize these three properties in the following manner. Culturally, the Haskell community expects global uniqueness of instances to hold: the implicit global database of instances should be confluent and coherent. GHC, however, does not enforce uniqueness of instances: instead, it merely guarantees that the subset of the instance database it uses when it compiles any given module is confluent and coherent. GHC does do some tests when an instance is declared to see if it would result in overlap with visible instances, but the check is by no means perfect; truly, type-class constraint resolution has the final word. One mitigating factor is that in the absence of orphan instances, GHC is guaranteed to eagerly notice when the instance database has overlap (assuming that the instance declaration checks actually worked...)

Clearly, the fact that GHC's lazy behavior is surprising to most Haskellers means that the lazy check is mostly good enough: a user is likely to discover overlapping instances one way or another. However, it is relatively simple to construct example programs which violate global uniqueness of instances in an observable way:

-- A.hs
module A where
data U = X | Y deriving (Eq, Show)

-- B.hs
module B where
import Data.Set
import A

instance Ord U where
compare X X = EQ
compare X Y = LT
compare Y X = GT
compare Y Y = EQ

ins :: U -> Set U -> Set U
ins = insert

-- C.hs
module C where
import Data.Set
import A

instance Ord U where
compare X X = EQ
compare X Y = GT
compare Y X = LT
compare Y Y = EQ

ins' :: U -> Set U -> Set U
ins' = insert

-- D.hs
module Main where
import Data.Set
import A
import B
import C

test :: Set U
test = ins' X $ ins X $ ins Y $ empty

main :: IO ()
main = print test
$ ghc -Wall -XSafe -fforce-recomp --make D.hs
[1 of 4] Compiling A ( A.hs, A.o )
[2 of 4] Compiling B ( B.hs, B.o )

B.hs:5:10: Warning: Orphan instance: instance [safe] Ord U
[3 of 4] Compiling C ( C.hs, C.o )

C.hs:5:10: Warning: Orphan instance: instance [safe] Ord U
[4 of 4] Compiling Main ( D.hs, D.o )
Linking D ...
$ ./D
fromList [X,Y,X]

Locally, all type class resolution was coherent: in the subset of instances each module had visible, type class resolution could be done unambiguously. Furthermore, the types of ins and ins' discharge type class resolution, so that in D when the database is now overlapping, no resolution occurs, so the error is never found.

It is easy to dismiss this example as an implementation wart in GHC, and continue pretending that global uniqueness of instances holds. However, the problem with global uniqueness of instances is that they are inherently nonmodular: you might find yourself unable to compose two components because they accidentally defined the same type class instance, even though these instances are plumbed deep in the implementation details of the components. This is a big problem for Backpack, or really any module system, whose mantra of separate modular development seeks to guarantee that linking will succeed if the library writer and the application writer develop to a common signature.

by Edward Z. Yang at July 11, 2014 04:07 PM

Dominic Orchard

Automatic SIMD Vectorization for Haskell and ICFP 2013

I had a great time at ICFP 2013 this year where I presented my paper “Automatic SIMD Vectorization for Haskell”, which was joint work with Leaf Petersen and Neal Glew of Intel Labs. The full paper and slides are available online. Our paper details the vectorization process in the Intel Labs Haskell Research Compiler (HRC) which gets decent speedups on numerical code (between 2-7x on 256-bit vector registers). It was nice to be able to talk about HRC and share the results. Paul (Hai) Liu also gave a talk at the Haskell Symposium which has more details about HRC than the vectorization paper (see the paper here with Neal Glew, Leaf Petersen, and Todd Anderson). Hopefully there will be a public release of HRC in future.  

Still more to do

It’s been exciting to see the performance gains in compiled functional code over the last few years, and its encouraging to see that there is still much more we can do and explore. HRC outperforms GHC on roughly 50% of the benchmarks, showing some interesting trade-offs going on in the two compilers. HRC is particularly good at compiling high-throughput numerical code, thanks to various strictness/unboxing optimisations (and the vectorizer), but there is still more to be done.

Don’t throw away information about your programs

One thing I emphasized in my talk was the importance of keeping, not throwing away, the information encoded in our programs as we progress through the compiler stack. In the HRC vectorizer project, Haskell’s Data.Vector library was modified to distinguish between mutable array operations and “initializing writes”, a property which then gets encoded directly in HRC’s intermediate representation. This makes vectorization discovery much easier. We aim to preserve as much effect information around as possible in the IR from the original Haskell source.

This connected nicely with something Ben Lippmeier emphasised in his Haskell Symposium paper this year (“Data Flow Fusion with Series Expressions in Haskell“, joint with Manuel Chakravarty, Gabriele Keller and Amos Robinson). They provide a combinator library for first-order non-recursive dataflow computations which is guaranteed to be optimised using flow fusion (outperforming current stream fusion techniques). The important point Ben made is that, if your program fits the pattern, this optimisation is guaranteed. As well as being good for the compiler, this provides an obvious cost model for the user (no more games trying to coax the compiler into optimising in a particular way).

This is something that I have explored in the Ypnos array language, where the syntax is restricted to give (fairly strong) language invariants that guarantee parallelism and various optimisations, without undecidable analyses. The idea is to make static as much effect and coeffect (context dependence) information as possible. In Ypnos, this was so successful that I was able to encode the Ypnos’ language invariant of no out-of-bounds array access directly in Haskell’s type system (shown in the DSL’11 paper; this concept was also discussed briefly in my short language design essay).

This is a big selling point for DSLs in general: restrict a language such that various program properties are statically decidable, facilitating verification and optimisation.

Ypnos has actually had some more development in the past year, so if things progress further, there may be some new results to report on. I hope to be posting again soon about more research, including the ongoing work with Tomas Petricek on coeffect systems, and various other things I have been playing with. – D

by dorchard at July 11, 2014 03:00 PM

July 10, 2014

Edwin Brady

Resource-dependent Algebraic Effects

A new draft paper, Resource-dependent Algebraic Effects, is available. Abstract:

There has been significant interest in recent months in finding new ways to implement composable and modular effectful programs using handlers of algebraic effects. In my own previous work, I have shown how an algebraic effect system (called “effects“) can be embedded directly in a dependently typed host language. Using dependent types ought to allow precise reasoning about programs; however, the reasoning capabilities of effects have been limited to simple state transitions which are known at compile-time. In this paper, I show how effects can be extended to support reasoning in the presence of run-time state transitions, where the result may depend on run-time information about resource usage (e.g. whether opening a file succeeded). I show how this can be used to build expressive APIs, and to specify and verify the behaviour of interactive, stateful programs. I illustrate the technique using a file handling API, and an interactive game.

I’ve just submitted this, although constructive comments and suggestions are still of course very welcome!

by edwinb at July 10, 2014 05:21 PM

Yesod Web Framework

RFC: New Data.Conduit.Process

I've been working on a new iteration of the Data.Conduit.Process API over the past few days. The current API (provided by the process-conduit package), has some issues. So I'm starting over with a new API, and will be including this in conduit-extra's next release.

Before releasing, I'd like to get some feedback on the new API. I've put together a School of Haskell tutorial, which will ultimately become the real documentation for this module. It describes usage of the library, as well as why the library looks the way it does. You can view the source on the process branch of conduit.

In case anyone's wondering, the "well known bug" in waitForProcess may actually not be very well known yet. There's a race condition documented in the source code, but it's not nearly as narrow a window as implied there. For example, the following code will reliably throw an exception:

import System.Process
import Control.Concurrent.Async

main :: IO ()
main = do
    (_, _, _, ph) <- createProcess $ shell "sleep 1"
    let helper i = do
            ec <- waitForProcess ph
            print (i :: Int, ec)
    ((), ()) <- concurrently (helper 1) (helper 2)
    return ()

Thus the motivation for fixing the problem in Data.Conduit.Process. Thanks go to Michael Sloan for discovering the severity of this race condition. In fact, the bug he ran into, combined with a separate process-conduit bug I ran up against, were the impetus for me getting this library written now.

For the lazy, here's a copy of the content from School of Haskell:

NOTE: This tutorial documents a not-yet-released version of conduit-extra's Data.Conduit.Process module. Currently, that module name is provided by process-conduit, which provides a completely different API. This tutorial is present now for early feedback. If you'd like to experiment, this code is available on the process branch of the conduit repo.


Whenever you run an external process, there are four ways to interact with it post-creation:

  • Write to its standard input
  • Read from its standard output
  • Read from its standard error
  • Check its exit code

The standard System.Process module provides means for all of these interactions. However, there are some downsides with using them:

  • Many of the function in System.Process rely on lazy I/O.
  • There is a subtle race condition when checking for exit codes.
  • Dealing with Handles directly is relatively low-level.

Data.Conduit.Process provides a higher-level interface for these four interactions, based on conduit. It additionally leverages type classes to provide more static type safety than dealing directly with System.Process, as will be described below. The library is also designed to work with the wonderful async library, providing for easy, high-quality concurrency.

Note that providing general parameters for creating a process, such as its working directory or modified environment variables, are not addressed by this module; you should instead use the standard facilities from System.Process.


{-# LANGUAGE OverloadedStrings #-}
import           Control.Applicative      ((*>))
import           Control.Concurrent.Async (Concurrently (..))
import           Control.Concurrent.Async (Concurrently (..))
import           Data.Conduit             (await, yield, ($$), (=$))
import qualified Data.Conduit.Binary      as CB
import qualified Data.Conduit.List        as CL
import           Data.Conduit.Process     (ClosedStream (..), conduitProcess,
                                           proc, waitForConduitProcess)
import           System.IO                (stdin)

main :: IO ()
main = do
    putStrLn "Enter lines of data. I'll base64-encode it."
    putStrLn "Enter \"quit\" to exit."

    ((toProcess, close), fromProcess, ClosedStream, cph) <-
        conduitProcess (proc "base64" [])

    let input = CB.sourceHandle stdin
             $$ CB.lines
             =$ inputLoop
             =$ toProcess

        inputLoop = do
            mbs <- await
            case mbs of
                Nothing -> close
                Just "quit" -> close
                Just bs -> do
                    yield bs

        output = fromProcess $$ CL.mapM_
            (\bs -> putStrLn $ "from process: " ++ show bs)

    ec <- runConcurrently $
        Concurrently input *>
        Concurrently output *>
        Concurrently (waitForConduitProcess cph)

    putStrLn $ "Process exit code: " ++ show ec

Exit codes

There's a well documented corner case in waitForProcess whereby multiple calls can end up in a race condition, and therefore a deadlock. Data.Conduit.Process works around this issue by not providing direct access to a ProcessHandle. Instead, it wraps this with a ConduitProcessHandle, which uses an STM TMVar under the surface. This allows you to either poll to check if a process has exited, or block and wait for the process to exit. As a minimal example (ignore the streaming bits for now, they'll be explained shortly).

import Data.Conduit.Process

main :: IO ()
main = do
    (Inherited, Inherited, Inherited, cph) <-
        conduitProcess (shell "sleep 2")

    -- non-blocking
    getConduitProcessExitCode cph >>= print

    -- blocking
    waitForConduitProcess cph >>= print

If you need direct access to the ProcessHandle (e.g., to terminate a process), you can use conduitProcessHandleRaw.


Now to the main event: streaming data. There are multiple ways you can interact with streams with an external process:

  • Let the child process inherit the stream from the parent process
  • Provide a pre-existing Handle.
  • Create a new Handle to allow more control of the interaction.

One downside of the System.Process API is that there is no static type safety to ensure that the std_out parameter in fact matches up with the value produced by createProcess for the standard output handle. To overcome this, Data.Conduit.Process makes use of type classes to represent the different ways to create a stream. This isn't entirely intuitive from the Haddocks, but once you see the concept used, it's easy to use yourself.

Inherited and ClosedStream

Let's start with an example of using the simplest instances of our typeclasses. Inherited says to inherit the Handle from the parent process, while ClosedStream says to close the stream to the child process. For example, the next snippet will inherit stdin and stdout from the parent process and close standard error.

import Data.Conduit.Process

main :: IO ()
main = do
    putStrLn "Just wrapping cat. Use Ctrl-D to exit."

    (Inherited, Inherited, ClosedStream, cph) <-
        conduitProcess (shell "cat")

    waitForConduitProcess cph >>= print

Note that there's no way to send an EOF in School of Haskell, so the above active code will never terminate.


It would be pretty strange to have a library in conduit-extra that didn't provide some conduit capabilities. You can additionally get a Sink to be used to feed data into the process via standard input, and Sources for consuming standard output and error.

This next example reads standard input from the console, process standard output with a conduit, and closes standard error.

import           Data.Conduit         (($$))
import qualified Data.Conduit.List    as CL
import           Data.Conduit.Process

main :: IO ()
main = do
    putStrLn "Just wrapping cat. Use Ctrl-D to exit."

    (Inherited, src, ClosedStream, cph) <-
        conduitProcess (shell "cat")

    src $$ CL.mapM_ print

    waitForConduitProcess cph >>= print

Note that these Sources and Sinks will never close their Handles. This is done on purpose, to allow them to be used multiple times without accidentally closing their streams. In many cases, you'll need to close the streams manually, which brings us to our next section.

Conduit + close

Let's say we'd like to close our input stream whenever the user types in "quit". To do that, we need to get an action to close the standard input Handle. This is simple: instead of just returning a Source or Sink, we ask for a tuple of a Source/Sink together with an IO () action to close the handle.

{-# LANGUAGE OverloadedStrings #-}
import           Data.ByteString      (ByteString)
import           Data.Conduit         (Source, await, yield, ($$), ($=))
import qualified Data.Conduit.Binary  as CB
import           Data.Conduit.Process
import           System.IO            (stdin)

userInput :: Source IO ByteString
userInput =
       CB.sourceHandle stdin
    $= CB.lines
    $= loop
    loop = do
        mbs <- await
        case mbs of
            Nothing -> return ()
            Just "quit" -> return ()
            Just bs -> do
                yield bs
                yield "\n"

main :: IO ()
main = do
    putStrLn "Just wrapping cat. Type \"quit\" to exit."

    ((sink, close), Inherited, ClosedStream, cph) <-
        conduitProcess (shell "cat")

    userInput $$ sink

    waitForConduitProcess cph >>= print


Let's take a quick detour from our running example to talk about the last special type: UseProvidedHandle. This says to conduitProcess: use the example value of UseHandle provided in std_in/std_out/std_err. We can use this to redirect output directly to a file:

import Data.Conduit.Process
import System.IO (withFile, IOMode (..))

main :: IO ()
main = do
    let fp = "date.txt"
    withFile fp WriteMode $ \h -> do
        (ClosedStream, UseProvidedHandle, ClosedStream, cph) <-
            conduitProcess (shell "date")
                { std_out = UseHandle h
        waitForConduitProcess cph >>= print
    readFile fp >>= print

Use with async

In our examples above, we only ever used a single Source or Sink at a time. There's a good reason for this: we can easily run into deadlocks if we don't properly handle concurrency. There are multiple ways to do this, but I'm going to strongly recommend using the async package, which handles many corner cases automatically. In particular, the Concurrently data type and its Applicative instance make it easy and safe to handle multiple streams at once.

Instead of duplicating it here, I'll ask the reader to please refer back to the synopsis example, which ties this all together with two threads for handling streams, and another thread which blocks waiting for the process to exit. That style of concurrency is very idiomatic usage of this library.

July 10, 2014 07:15 AM

Danny Gratzer

Examining Hackage: logict

Posted on July 10, 2014

One of my oldest habits with programming is reading other people’s code. I’ve been doing it almost since I started programming. For the last two years that habit has been focused on Hackage. Today I was reading the source code to the “logic programming monad” provided by logict and wanted to blog about how I go about reading new Haskell code.

This time the code was pretty tiny, find . -name *.hs | xargs wc -l reveals two files with just under 400 lines of code! logict also only has two dependencies, base and the mtl, so there’s not a big worry of unfamiliar libraries.

Setting Up

It’s a lot easier to read this post if you have the source for logict on hand. To grab it, use cabal get. My setup is something like

~ $ cabal get logict
~ $ cd logict-
~/logict- $ cabal sandbox init
~/logict- $ cabal install --only-dependencies

Poking Around

I’m somewhat ashamed to admit that I use pretty primitive tooling for exploring a new codebase, it’s grep and find all the way! If you use a fancy IDE, perhaps you can just skip this section and take a moment to sit back and feel high-tech.

First things first is to figure out what Haskell files are here. It can be different than what’s listed on Hackage since often libraries don’t export external files.

~/logict- $ find . -name *.hs

Alright, there’s two source file and one sitting in dist. The dist one is almost certainly just cabal auto-gened stuff that we don’t care about.

It also appears that there’s no src directory and every module is publicly exported! This means that we only have two modules to worry about.

The next thing to figure out is which to read first. In this case the choice is simple: greping for imports with

grep "import" -r Control

reveals that Control.Monad.Logic imports Control.Monad.Logic.Class so we start with *.Class.

Reading Control.Monad.Logic.Class

Alright! Now it’s actually time to start reading code.

The first thing that jumps out is the export list

    module Control.Monad.Logic.Class (MonadLogic(..), reflect, lnot) where

Alright, so we’re exporting everything from a class MonadLogic, as well as two functions reflect and lnot. Let’s go figure out what MonadLogic is.

    class (MonadPlus m) => MonadLogic m where
      msplit     :: m a -> m (Maybe (a, m a))
      interleave :: m a -> m a -> m a
      (>>-)      :: m a -> (a -> m b) -> m b
      ifte       :: m a -> (a -> m b) -> m b -> m b
      once       :: m a -> m a

The fact that this depends on MonadPlus is pretty significant. Since most classes don’t require this I’m going to assume that it’s fairly key to either the implementation of some of these methods or to using them. Similar to how Monoid is critical to Writer.

The docs make it pretty clear what each member of this class does

  • msplit

    Take a local computation and split it into it’s first result and another computation that computes the rest.

  • interleave

    This is the key difference between MonadLogic and []. interleave gives fair choice between two computation. This means that every result that appears in finitely many applications of msplit for some a and b, will appear in finitely many applications of msplit to interleave a b.

  • >>-

    >>- is similar to interleave. Consider some code like

      (a >>= k) `mplus` (b >>= k)

    This is equivalent to mplus a b >>= k, but has different characteristics since >>= might never terminate. >>- is described as “considering both sides of the disjunction”.

    I have absolutely no idea what that means.. hopefully it’ll be clearer once we look at some implementations.

  • ifte

    This is the equivalent of Prolog’s soft cut. We poke a logical computation and if it can succeed at all, then we feed it into the success computation, otherwise we’ll feed return the failure case.

  • once

    once is clever combinator to prevent backtracking. It will grab the first result from a computation, wrap it up and return it. This prevents backtracking further on the original computation.

Now the docs also state that everything is derivable from msplit. These implementations look like

    interleave m1 m2 = msplit m1 >>=
                        maybe m2 (\(a, m1') -> return a `mplus` interleave m2 m1')

    m >>- f = do (a, m') <- maybe mzero return =<< msplit m
                 interleave (f a) (m' >>- f)

    ifte t th el = msplit t >>= maybe el (\(a,m) -> th a `mplus` (m >>= th))

    once m = do (a, _) <- maybe mzero return =<< msplit m
                return a

The first thing I notice looking at interleave is that it kinda looks like

    interleave' :: [a] -> [a] -> [a]
    interleave' (x:xs) ys = x : interleave' ys xs
    interleave _ ys       = ys

This makes sense, since this will fairly split between xs and ys just like interleave is supposed to. Here msplit is like pattern matching, mplus is :, and we have to sprinkle some return in there for kicks and giggles.

Now about this mysterious >>-, the biggest difference is that each f a is interleaved, rather than mplus-ed. This should mean that it can be fairly split between our first result, f a and the rest of them m' >>- f. Now if we can do something like

    (m >>- f) `interleave` (m' >>- f)

Should have nice and fair behavior.

The next two are fairly clear, ifte splits it’s computation, and if it can it feeds the whole stinking thing return amplusm' to the success computation, otherwise it just returns the failure computation. Nothing stunning.

once is my favorite function. To prevent backtracking all we do is grab the first result and return it.

So that takes care of MonadTrans. The next thing to worry about are these two functions reflect and lnot.

reflect confirms my suspicion that the dual of msplit is mplus (return a) m'.

    reflect :: MonadLogic m => Maybe (a, m a) -> m a
    reflect Nothing = mzero
    reflect (Just (a, m)) = return a `mplus` m

The next function lnot negates a logical computation. Now, this is a little misleading because the negated computation either produces one value, (), or is mzero and produces nothing. This is easily accomplished with ifte and once

    lnot :: MonadLogic m => m a -> m ()
    lnot m = ifte (once m) (const mzero) (return ())

That takes care of most of this file. What’s left is a bunch of instances for monad transformers for MonadTrans. There’s nothing to interesting in them so I won’t talk about them here. It might be worth glancing at the code if you’re interested.

One slightly odd thing I’m noticing is that each class implements all the methods, rather than just msplit. This seems a bit odd.. I guess the default implementations are significantly slower? Perhaps some benchmarking is in order.


Now that we’ve finished with Control.Monad.Logic.Class, let’s move on to the main file.

Now we finally see the definition of LogicT

    newtype LogicT m a =
        LogicT { unLogicT :: forall r. (a -> m r -> m r) -> m r -> m r }

I have no idea how this works, but I’m guessing that this is a church version of [a] specialized to some m. Remember that the church version of [a] is

    type CList a = forall r. (a -> r -> r) -> r -> r

Now what’s interesting here is that the church version is strongly connected to how CPSed code works. We could than imagine that mplus works like cons for church lists and yields more and more results. But again, this is just speculation.

This suspicion is confirmed by the functions to extract values out of a LogicT computation

    observeT :: Monad m => LogicT m a -> m a
    observeT lt = unLogicT lt (const . return) (fail "No answer.")
    observeAllT :: Monad m => LogicT m a -> m [a]
    observeAllT m = unLogicT m (liftM . (:)) (return [])

    observeManyT :: Monad m => Int -> LogicT m a -> m [a]
    observeManyT n m
        | n <= 0 = return []
        | n == 1 = unLogicT m (\a _ -> return [a]) (return [])
        | otherwise = unLogicT (msplit m) sk (return [])
     sk Nothing _ = return []
     sk (Just (a, m')) _ = (a:) `liftM` observeManyT (n-1) m'

observeT grabs the a from the success continuation and if no result is returned than it will evaluate fail "No Answer which looks like the failure continuation! Looks like out suspicion is confirmed, we’re dealing with monadic church lists or some other permutation of those buzzwords.

Somehow in a package partially designed by Oleg I’m not surprised to find continuations :)

observeAllT is quite similar, notice that we take advantage of the fact that r is universally quantified to instantiate it to a. This quantification is also used in observeManyT. This quantification also prevents any LogicT from taking advantage of the return type to do evil things with returning random values that happen to match the return type. This is what’s possible with ContT for example.

Now we have the standard specialization and smart constructor for the non-transformer version.

    type Logic = LogicT Identity
    logic :: (forall r. (a -> r -> r) -> r -> r) -> Logic a
    logic f = LogicT $ \k -> Identity .
                             f (\a -> runIdentity . k a . Identity) .

Look familiar? Now we can inject real church lists into a Logic computation. I suppose this shouldn’t be surprising since [a] functions like a slightly broken Logic a, without any sharing or soft cut.

Now we repeat all the observe* functions for Logic, I’ll omit these since they’re implementations are exactly as you’d expect and not interesting.

Next we have a few type class instances

    instance Functor (LogicT f) where
        fmap f lt = LogicT $ \sk fk -> unLogicT lt (sk . f) fk
    instance Applicative (LogicT f) where
        pure a = LogicT $ \sk fk -> sk a fk
        f <*> a = LogicT $ \sk fk -> unLogicT f (\g fk' -> unLogicT a (sk . g) fk') fk
    instance Alternative (LogicT f) where
        empty = LogicT $ \_ fk -> fk
        f1 <|> f2 = LogicT $ \sk fk -> unLogicT f1 sk (unLogicT f2 sk fk)
    instance Monad (LogicT m) where
        return a = LogicT $ \sk fk -> sk a fk
        m >>= f = LogicT $ \sk fk -> unLogicT m (\a fk' -> unLogicT (f a) sk fk') fk
        fail _ = LogicT $ \_ fk -> fk

It helps for reading this if you expand sk to “success continuation” and fk to “fail computation”. Since we’re dealing with church lists I suppose you could also use cons and nil.

What’s particularly interesting to me here is that there are no constraints on m for these type class declarations! Let’s go through them one at a time.

Functor is usually pretty mechanical, and this is no exception. Here we just have to change a -> m r -> m r to b -> m r -> m r. This is trivial just by composing the success computation with f.

Applicative is similar. pure just lifts a value into the church equivalent of a singleton list, [a]. <*> is a little bit more meaty, we first unwrap f to it’s underlying function g, and composes it with out successes computation for a. Notice that this is very similar to how Cont works, continuation passing style is necessary with church representations.

Now return and fail are pretty straightforward. Though this is interesting because since pattern matching calls fail, we can just do something like

      Just a <- m
      Just b <  n
      return $ a + b

And we’ll run n and m until we get a Just value.

As for >>=, it’s implementation is very similar to <*>. We unwrap m and then feed the unwrapped a into f and run that with our success computations.

We’re only going to talk about one more instance for LogicT, MonadLogic, there are a few others but they’re mostly for MTL use and not too interesting.

    instance (Monad m) => MonadLogic (LogicT m) where
        msplit m = lift $ unLogicT m ssk (return Nothing)
         where ssk a fk = return $ Just (a, (lift fk >>= reflect))

We’re only implementing msplit here, which strikes me as a bit odd since we implemented everything before. We also actually need Monad m here so that we can use LogicT’s MonadTrans instance.

To split a LogicT, we run a special success computation and return Nothing if failure is ever called. Now there’s one more clever trick here, since we can choose what the r is in m r, we choose it to be Maybe (a, LogicT m a)! That way we can take the failure case, which essentially is just the tail of the list, and push it into reflect.

This confused me a bit so I wrote the equivalent version for church lists, where msplit is just uncons.

    {-# LANGUAGE RankNTypes #-}
    newtype CList a = CList {runCList :: forall r. (a -> r -> r) -> r -> r}
    cons :: a -> CList a -> CList a
    cons a (CList list) = CList $ \cs nil -> cs a (list cs nil)
    nil :: CList a
    nil = CList $ \cons nil -> nil
    head :: CList a -> Maybe a
    head list = runCList list (const . Just) Nothing
    uncons :: CList a -> Maybe (a, CList a)
    uncons (CList list) = list skk Nothing
      where skk a rest = Just (a, maybe nil (uncurry cons) rest)

Now it’s a bit clearer what’s going on, skk just pairs up the head of the list with the rest. However, since the tail of the list has the type m (Maybe (a, LogicT m a)), we lift it back into the LogicT monad and use reflect to smush it back into a good church list.

That about covers Control.Monad.Logic

Wrap Up

I’ve never tried sharing these readings before so I hope you enjoyed it. If this receives some positive feedback I’ll do something similar with another package, I’m leaning towards extensible-effects.

If you’re interested in doing this yourself, I highly recommend it! I’ve learned a lot about practical engineering with Haskell, as well as really clever and elegant Haskell code.

One thing I’ve always enjoyed about the Haskell ecosystem is that some of the most interesting code is often quite easy to read given some time.

<script type="text/javascript"> /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */ var disqus_shortname = 'codeco'; // required: replace example with your forum shortname /* * * DON'T EDIT BELOW THIS LINE * * */ (function() { var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; dsq.src = '//' + disqus_shortname + ''; (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq); })(); </script> <noscript>Please enable JavaScript to view the comments powered by Disqus.</noscript> comments powered by Disqus

July 10, 2014 12:00 AM

July 09, 2014

Danny Gratzer

Dissecting crush

Posted on July 9, 2014

For almost a year and half now I’ve been referencing one particular book on Coq, Certified Programming with Dependent Types. CPDT is a literate program on building practical things with Coq.

One of the main ideas of CPDT is that proofs ought to be fully automated. This means that a proof should be primarily a logic program (Ltac) which constructs some boring and large proof term. To this end, CPDT has a bunch of Ltac “tactics” for constructing such logic programs.

Since CPDT is a program, there’s actual working source for each of these tactics. It occurred to me today that in my 18 months of blinking uncomprehendingly at CPDT, I’ve never read its source for these tactics.

In this post, we’ll dissect how CPDT’s main tactic for automation, crush, actually works. In the process, we’ll get the chance to explore some nice, compositional, ltac engineering as well as a whole host of useful tricks.

The Code

The first step to figuring out of crush works is actually finding where it’s defined.

After downloading the source to CPDT I ran

grep "Ltac crush :=" -r .

And found in src/CpdtTactics, line 205

Ltac crush := crush' false fail.

Glancing at crush', I’ve noticed that it pulls in almost every tactic in CpdtTactics. Therefore, we’ll start at the top of this file and work our way done, dissecting each tactic as we go.

Incidentally, since CpdtTactics is an independent file, if you’re confused about something firing up your coq dev environment of choice and trying things out with Goal inline works nicely.

Starting from the top, our first tactic is inject.

Ltac inject H := injection H; clear H; intros; try subst.

This is just a quick wrapper around injection, which also does the normal operations one wants after calling injection. It clears the original hypothesis and brings our new equalities into our environment so future tactics can use them. It also tries to swap out any variables with our new equalities using subst. Notice the try wrapper since subst is one of those few tactics that will fail if it can’t do anything useful.

Next up is

Ltac appHyps f :=
  match goal with
    | [ H : _ |- _ ] => f H

appHyps makes use of the backtracking nature of match goal with. It’ll apply f to every hypothesis in the current environment and stop once it find a hypothesis f works with.

Now we get to some combinators for working with hypothesis.

Ltac inList x ls :=
  match ls with
    | x => idtac
    | (_, x) => idtac
    | (?LS, _) => inList x LS

inList takes a faux-list of hypothesis and looks for an occurrence of a particular lemma x. When it finds it we just run idtac which does nothing. In the case were we can’t match x anywhere, inList will just fail with the standard “No matching clause” message.

Next we have the equivalent of appHyps for tupled lists

Ltac app f ls :=
  match ls with
    | (?LS, ?X) => f X || app f LS || fail 1
    | _ => f ls

This works exactly like appHyps but instead of looking through the proofs environment, we’re looking through ls. It has the same “keep the first result that works” semantics too. One thing that confused me was the _ => f ls clause of this tactic. Remember that with our tupled lists we don’t have a “nil” member. But rather the equivalent of

A :: B :: C :: Nil


((A, B), C)

So when we don’t have a pair, ls itself is the last hypothesis in our list. As a corollary of this, there is no obvious “empty” tupled list, only one with a useless last hypothesis.

Next we have all, which runs f on every member in f ls.

Ltac all f ls :=
  match ls with
    | (?LS, ?X) => f X; all f LS
    | (_, _) => fail 1
    | _ => f ls

Careful readers will notice that instead of f X || ... we use ;. Additionally, if the first clause fails and the second clause matches, that means that either f X or all f LS failed. In this case we backtrack all the way back out of this clause. This should mean that this is a “all or nothing” tactic. It will either not fail on all members of ls or nothing at all will happen.

Now we get to the first big tactic

Ltac simplHyp invOne :=
  let invert H F :=
    inList F invOne;
      (inversion H; fail)
      || (inversion H; [idtac]; clear H; try subst) in

  match goal with
    | [ H : ex _ |- _ ] => destruct H
    | [ H : ?F ?X = ?F ?Y |- ?G ] =>
      (assert (X = Y); [ assumption | fail 1 ])
      || (injection H;
        match goal with
          | [ |- X = Y -> G ] =>
            try clear H; intros; try subst
    | [ H : ?F ?X ?U = ?F ?Y ?V |- ?G ] =>
      (assert (X = Y); [ assumption
        | assert (U = V); [ assumption | fail 1 ] ])
      || (injection H;
        match goal with
          | [ |- U = V -> X = Y -> G ] =>
            try clear H; intros; try subst

    | [ H : ?F _ |- _ ] => invert H F
    | [ H : ?F _ _ |- _ ] => invert H F
    | [ H : ?F _ _ _ |- _ ] => invert H F
    | [ H : ?F _ _ _ _ |- _ ] => invert H F
    | [ H : ?F _ _ _ _ _ |- _ ] => invert H F

    | [ H : existT _ ?T _ = existT _ ?T _ |- _ ] => generalize (inj_pair2 _ _ _ _ _ H); clear H
    | [ H : existT _ _ _ = existT _ _ _ |- _ ] => inversion H; clear H
    | [ H : Some _ = Some _ |- _ ] => injection H; clear H

Wow, just a little bit bigger than what we’ve been working with so far.

The first small chunk of simpleHyp is a tactic for doing clever inversion using the tuple list invOne.

 invert H F :=
   inList F invOne;
   (inversion H; fail)
     || (inversion H; [idtac]; clear H; try subst)

Here H is a hypothesis that we’re thinking about inverting on and F is the head symbol of H. First we run the inList predicate, meaning that we don’t invert upon anything that we don’t want to. If the head symbol of H is something worth inverting upon we try two different types of inversion.

In the first case inversion H; fail we’re just looking for an “easy proof” where inverting H immediately dispatches the current goal. In the second case inversion H; [idtac]; clear H; try subst, we invert upon H iff it only generates 1 subgoal. Remember that [t | t' | t''] is a tactic that runs t on the first subgoal, t’ on the second, and so on. If the number of goals don’t match, [] will fail. So [idtac] is just a clever way of saying “there’s only one new subgoal”. Next we get rid of the hypothesis we just inverted on (it’s not useful now, and we don’t want to try inverting it again) and see if any substitutions are applicable.

Alright! Now let’s talk about the massive match goal with going on in simplHyp.

The first branch is

    | [ H : ex _ |- _ ] => destruct H

This just looks for a hypothesis with an existential (remember that ex is what exists desugars to). If we find one, we introduce a new variable to our environment and instantiate H with it. The fact that this doesn’t recursively call simplHyp probably means that we want to do something like repeat simplHyp to ensure this is applied everywhere.

Next we look at simplifying hypothesis where injection applies. There are two almost identical branches, one for constructors of two parameters, one for one. Let’s look at the latter since it’s slightly simpler.

    | [ H : ?F ?X = ?F ?Y |- ?G ] =>
      (assert (X = Y); [ assumption | fail 1 ])
      || (injection H;
        match goal with
          | [ |- X = Y -> G ] =>
            try clear H; intros; try subst

This looks for an equality over a constructor F. This branch is looking to prove that X = Y, a fact deducible from the injectiveness of F.

The way that we go about doing this is actually quite a clever ltac trick though. First we assert X = Y, this will generate to subgoals, the first that X = Y (shocker) and the second is the current goal G, with the new hypothesis that X = Y. We attempt to prove that X = Y by assumption. If this works, than we already trivially can deduce X = Y so there’s no point in doing all that injection stuff so we fail 1 and bomb out of the whole branch.

If assumption fails we’ll jump to the other side of the ||s and actually use injection. We only run injection if it generates a proof that X = Y in which case we do the normal cleanup with trying to clear our original fact and do some substitution.

The next part is fairly straightforward, we make use of that invert tactic and run it over facts we have floating around in our environment

    | [ H : ?F _ |- _ ] => invert H F
    | [ H : ?F _ _ |- _ ] => invert H F
    | [ H : ?F _ _ _ |- _ ] => invert H F
    | [ H : ?F _ _ _ _ |- _ ] => invert H F
    | [ H : ?F _ _ _ _ _ |- _ ] => invert H F

Notice that we can now use the match to grab the leading symbol for H so we only invert upon hypothesis that we think will be useful.

Next comes a bit of axiom-fu

    | [ H : existT _ ?T _ = existT _ ?T _ |- _ ] =>
        generalize (inj_pair2 _ _ _ _ _ H); clear H

inj_pair2 is function that lives in the Coq standard library and has the type

forall (U : Type) (P : U -> Type) (p : U) (x y : P p),
       existT P p x = existT P p y -> x = y

This relies on eq_rect_eq so it’s just a little bit dodgy for something like HoTT where we give more rope to = than just refl.

This particular branch of the match is quite straightforward though. Once we see an equality between two witnesses for the same existential type, we just generalize the equality between their proofs into our goal.

If this fails however, we’ll fall back to standard inversion with

    | [ H : existT _ _ _ = existT _ _ _ |- _ ] => inversion H; clear H

Finally, we have one last special case branch for Some. This is because the branches above will fail when phased with a polymorphic constructor

    | [ H : Some _ = Some _ |- _ ] => injection H; clear H

Nothing exciting going on there.

So that wraps up simplHyp. It’s just a conglomeration of useful stuff to do to constructors in our hypothesis.

Onwards we go! Next is a simple tactic for automatically rewriting with a hypothesis

Ltac rewriteHyp :=
  match goal with
    | [ H : _ |- _ ] => rewrite H by solve [ auto ]

like most of the other tactics we saw earlier, this will hunt for an H where this works and then stop. The by solve [auto] will run solve [auto] against all the hypothesis that the rewrite generates and ensure that auto solves all the new goals. This prevents a rewrite from going and introducing obviously false facts as goals for a rewrite that made no sense.

We can combine this with autorewrite with two simple tactics

Ltac rewriterP := repeat (rewriteHyp; autorewrite with core in *).
Ltac rewriter := autorewrite with core in *; rewriterP.

This just repeatedly rewrite with autorewrite and rewriteHyp as long as they can. Worth noticing here how we can use repeat to make these smaller tactics modify all applicable hypothesis rather than just one.

Next up is an innocent looking definition that frightens me a little bit

Definition done (T : Type) (x : T) := True.

What frightens me about this is that Adam calls this “devious”.. and when he calls something clever or devious I’m fairly certain I’d never be able to come up with it :)

What this actually appears to do is provide a simple way to “stick” something into an environment. We can trivially prove done T x for any T and x but having this in an environment also gives us a proposition T and a ready made proof of it x! This is useful for tactics since we can do something like

assert (done SomethingUseful usefulPrf) by constructor

and viola! Global state without hurting anything.

We use these in the next tactic, instr.

Ltac inster e trace :=
  match type of e with
    | forall x : _, _ =>
      match goal with
        | [ H : _ |- _ ] =>
          inster (e H) (trace, H)
        | _ => fail 2
    | _ =>
      match trace with
        | (_, _) =>
          match goal with
            | [ H : done (trace, _) |- _ ] =>
              fail 1
            | _ =>
              let T := type of e in
                match type of T with
                  | Prop =>
                    generalize e; intro;
                      assert (done (trace, tt)) by constructor
                  | _ =>
                    all ltac:(fun X =>
                      match goal with
                        | [ H : done (_, X) |- _ ] => fail 1
                        | _ => idtac
                      end) trace;
                    let i := fresh "i" in (pose (i := e);
                      assert (done (trace, i)) by constructor)

Another big one!

This match is a little different than the previous ones. It’s not a match goal but a match type of ... with. This is used to examine one particular hypothesis’ type and match over that.

This particular match has two branches. The first deals with the case where we have uninstantiated universally quantified variables.

 | forall x : _, _ =>
    match goal with
      | [ H : _ |- _ ] =>
        inster (e H) (trace, H)
      | _ => fail 2

If our hypothesis does, we randomly grab a hypothesis, instantiate e with it, add H to the trace list, and then recurse.

If there isn’t a hypothesis, then we fail out of the toplevel match and exit the tactic.

Now the next branch is where the real work happens

  | _ =>
    match trace with
      | (_, _) =>
        match goal with
          | [ H : done (trace, _) |- _ ] =>
            fail 1
          | _ =>
            let T := type of e in
              match type of T with
                | Prop =>
                  generalize e; intro;
                    assert (done (trace, tt)) by constructor
                | _ =>
                  all ltac:(fun X =>
                    match goal with
                      | [ H : done (_, X) |- _ ] => fail 1
                      | _ => idtac
                    end) trace;
                  let i := fresh "i" in (pose (i := e);
                    assert (done (trace, i)) by constructor)

We first chekc to make sure that trace isn’t empty. If this is the case, then we know that we instantiated e with at least something. If we have, we snoop around to see if there’s a done in our environment with the same trace. If this is the case, we know that we’ve done an identical instantiation of e before hand so we backtrack to try another one.

Otherwise, we look to see what e was instantiated too. If it was a simple Prop, we just stick a done record of this instantiation into our environment and add our new instantiated e back in with generalize. If e isn’t a proof, we do the same thing. In this case, however, we must also double check that the things we used to instantiate e with aren’t results of inster as well otherwise our combination of backtracking/instantiating can lead to an infinite loop.

Since this tactic generates a bunch of done’s that are otherwise useless, a tactic to clear them is helpful.

Ltac un_done :=
  repeat match goal with
           | [ H : done _ |- _ ] => clear H

Hopefully by this point this isn’t too confusing. All this tactic does is loop through the environment and clear all dones.

Now, finally, we’ve reached crush'.

Ltac crush' lemmas invOne :=
  let sintuition := simpl in *; intuition; try subst;
    repeat (simplHyp invOne; intuition; try subst); try congruence in

  let rewriter := autorewrite with core in *;
    repeat (match goal with
              | [ H : ?P |- _ ] =>
                match P with
                  | context[JMeq] => fail 1
                  | _ => rewrite H by crush' lemmas invOne
            end; autorewrite with core in *) in

    (sintuition; rewriter;
      match lemmas with
        | false => idtac            | _ =>
          (** Try a loop of instantiating lemmas... *)
          repeat ((app ltac:(fun L => inster L L) lemmas
          (** ...or instantiating hypotheses... *)
            || appHyps ltac:(fun L => inster L L));
          (** ...and then simplifying hypotheses. *)
          repeat (simplHyp invOne; intuition)); un_done
      sintuition; rewriter; sintuition;
      try omega; try (elimtype False; omega)).

crush' is really broken into 3 main components.

First is a simple tactic sintuition

sintuition := simpl in *; intuition; try subst;
    repeat (simplHyp invOne; intuition; try subst); try congruence

So this first runs the normal set of “generally useful tactics” and then breaks out some of first custom tactics. This essentially will act like a souped-up version of intuition and solve goals that are trivially solvable with straightforward inversions and reductions.

Next there’s a more powerful version of rewriter

rewriter := autorewrite with core in *;
    repeat (match goal with
              | [ H : ?P |- _ ] =>
                match P with
                  | context[JMeq] => fail 1
                  | _ => rewrite H by crush' lemmas invOne
            end; autorewrite with core in *)

This is almost identical to what we have above but instead of solving side conditions with solve [auto], we use crush' to hopefully deal with a larger number of possible rewrites.

Finally, we have the main loop of crush'.

(sintuition; rewriter;
  match lemmas with
    | false => idtac
    | _ =>
      repeat ((app ltac:(fun L => inster L L) lemmas
        || appHyps ltac:(fun L => inster L L));
      repeat (simplHyp invOne; intuition)); un_done
  sintuition; rewriter; sintuition;
try omega; try (elimtype False; omega)).

Here we run the sintuition and rewriter and then get to work with the lemmas we supplied in lemmas.

The first branch is just a match on false, which we use like a nil. Since we have no hypothesis we don’t do anything new.

If we do have lemmas, we try instantiating both them and our hypothesis as many times as necessary and then repeatedly simplify the results. This loop will ensure that we make full use of bot our supplied lemmas and the surrounding environment.

Finally, we make another few passes with rewriter and sintuition attempting to dispatch our goal using our new, instantiated and simplified environment.

As a final bonus, if we still haven’t dispatched our goal, we’ll run omega to attempt to solve a Presburger arithmetic. On the off chance that we have something omega can be contradictory, we also try elimType false; omega to try to exploit such a contradiction.

So all crush does is call this tactic with no lemmas (false) and no suggestions to invert upon (fail). There you have it, and it only took 500 lines to get here.

Wrap Up

So that’s it, hopefully you got a few useful Ltac trick out of reading this. I certainly did writing it :)

If you enjoyed these tactics, there’s a more open-source version of these tactics, on the CPDT website. It might also interest you to read the rest of CpdtTactics.v since it has some useful gems like dep_destruct.

Last but not least, if you haven’t read CPDT itself and you’ve made it this far, go read it! It’s available as either dead-tree or online. I still reference it regularly so I at least find it useful. It’s certainly better written than this post :)

Note, all the code I’ve shown in this post is from CPDT and is licensed under ANCND license. I’ve removed some comments from the code where they wouldn’t render nicely with them.

<script type="text/javascript"> /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */ var disqus_shortname = 'codeco'; // required: replace example with your forum shortname /* * * DON'T EDIT BELOW THIS LINE * * */ (function() { var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; dsq.src = '//' + disqus_shortname + ''; (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq); })(); </script> <noscript>Please enable JavaScript to view the comments powered by Disqus.</noscript> comments powered by Disqus

July 09, 2014 12:00 AM

July 07, 2014

André Pang (ozone)

Markup Plugin for RapidWeaver 5

For the RapidWeaver users out there, I’ve updated my antique Markup plugin to work with RapidWeaver 5 (slow clap). It also now lives on GitHub, like all the other cool open-source projects published after about 1970. (BitBucket is so 1969.)

As an aside, ohmigosh, there still isn’t anything out there that’s as good as RapidWeaver for building websites. I wanted to redo my site, and looked into a bunch of RapidWeaver alternatives, especially Web apps. Tumblr, Wordpress, Blogger and all that are great for just blogs, but useless for building anything more than a blog. Online site-builders like Squarespace, Weebly, and Virb are either way too dumbed down, too complex, have the most boring themes, or more likely, are all of the above. Despite RapidWeaver still being compiled for ppc and i386 only (it’s not a 64-bit app yet), and using the Objective-C 1.0 runtime (my Markup plugin uses +[NSObject poseAsClass:]!), it is still the best thing going for building sites. Amazing.

Anyway, Markup plugin, go get it.

by André Pang (ozone) ( at July 07, 2014 06:26 PM

July 06, 2014

Roman Cheplyaka

How to run SQL actions in persistent

When I started writing an application that used persistent to interact with a MySQL database, I decided to put the whole application inside one big SqlPersistM action, and run it once inside main. (To make it clear, this is not a Yesod application; I simply use persistent as a standalone library.)

However, as I learned more about persistent and how it worked, it became clear that this was the wrong way to use persistent. Here’s why.

Problems of one big SqlPersistM action

Finalizing transactions

persistent’s SQL layer treats an SqlPersistT action as a single transaction. Thus, until you run the action, the transaction is not committed. Obviously, this is an issue for any long-running server application.

You could work around this by calling transactionSave manually. Now you have a different but related problem…

Overlapping transactions

Normally a single SQL connection can participate in just one SQL transaction. (There are probably exceptions to this rule which I am not aware of, but this is how it happens unless you do something special.)

Thus, assuming your application is multithreaded, you’ll end up committing other threads’ transactions that are active at the same time.

(Besides, I am not sure whether executing multiple SQL statements over the same connection simultaneously is supported at all.)

Resource deallocation

persistent uses resourcet to ensure that resources (such as buffers that hold result sets) are released as soon as they are not needed.

resourcet works by handling these two scenarios:

  1. No exception is thrown; resources are deallocated by an explicit release call.
  2. An exception is thrown, preventing the release action from happening. However, once the exception escapes the enclosing ResourceT block, it triggers the exception handler inside runResourceT. The exception handler then performs deallocation.

When your application consists of one long-running SqlPersistM action, chances are you’re catching some exceptions inside the ResourceT block, by the means of monad-control. Doing that invalidates resourcet’s assumptions: an exception prevents the release action from happening, and yet it never makes it up to runResourceT, and so your long-running app leaks resources.

Do it right

It implies from the above considerations that the right way to use persistent with a SQL backend is:

  1. Make SqlPersistT correspond to logical transactions in your application.
  2. Make ResourceT computations as short-lived as possible. Ideally, don’t catch exceptions inside ResourceT; use finally instead.
  3. Use a connection pool.


I am not an expert in either persistent or SQL databases; I am in the process of figuring this out myself. Corrections (and confirmations) are welcome.

July 06, 2014 09:00 PM

July 04, 2014

Ken T Takusagawa

[mvoghuaz] One-dimensional puzzle generator

Here is a one-dimensional jigsaw puzzle generator implemented in Haskell, creating one-dimensional instances of the exact cover problem.

For generation purposes, the one-dimensional field is divided into n blocks each of size b.  Each of the n pieces is roughly centered on a unique block and spans at most p blocks.  The arguments to the program are b p n.

Each generated piece is specified by a list of coordinates offset from its leftmost coordinate.  Each individual piece is typically not contiguous; that would make the puzzle trivial to solve. Solve the puzzle by finding the offset of each piece in the field so that the field is exactly covered by all the pieces.

There is a "edge effect" flaw such that pieces near the edge tend to span less than p blocks.

Example run with parameters 10 5 10:
Pieces are:

Solution is the concatenation of:

Motivation is to generate random inputs to Knuth's Dancing Links DLX algorithm.  What puzzle parameters generate difficult puzzles?

by Ken ( at July 04, 2014 11:12 PM

Robin KAY

HsQML released

I've been meaning start blogging again for ages, and this seemed to be as good a time as any. I've just released HsQML and uploaded it to Hackage. This minor point release fixes two issues:

Firstly, API changes in Cabal 1.18 broke HsQML's rather involved Setup.hs. I didn't want mandate that users have the latest Cabal library, so I investigated a couple of different options for supporting both the old and new. There are no really pretty solutions here and I ended up using Template Haskell to select between two different sets of utility functions.

Secondly, the text package has recently gone through two major version changes under the PVP. I've widened the Cabal version constraint to permit both text-1.0 and text-1.1.

Hope you enjoy!

release- - 2014.01.18

* Added support for Cabal 1.18 API.
* Relaxed Cabal dependency constraint on 'text'.

by Robin KAY ( at July 04, 2014 09:36 PM

HsQML released

Yesterday, I made new a minor release of HsQML in order to address two issues with using it interactively via GHCi. As usual, it's available for download from Hackage. One little mistake did slip in however, in that I forget to change the darcs repository listed in the package cabal file to the Qt 4 branch. The main trunk is now being used for porting to Qt 5.

An explanation of the GHCi problems follows:

GHCi has traditionally had a number of limitations owing to the built-in linker it uses to load static object files dynamically. The linker is capable enough to load the output of GHC and any simple FFI C code that might be included in a library, but it can't cope with some of the relocations emitted by a C++ compiler. Originally, it wasn't even capable of reading the same archive libraries used by the GHC compiler for linking, and required that Cabal produce special compounded object files for it to use.

The C++ limitation was an issue for HsQML because Qt is a C++ library and hence HsQML needs to include some C++ code as part of its binding layer. I made use of the fact that GHCi depended on special object files in order to incorporate a workaround especially for GHCi. HsQML's build script modifies the build process by removing the objects containing C++ code from being compounded into the special object file, and places them into a separate shared library which is then referenced by the package's extra-ghci-libraries field. GHCi will hence load the shared library and the compiled C++ code within using the system linker, thereby avoiding the problems with its own.

However, it came to my attention recently* than this strategy had run into trouble as GHCi can now load regular archive libraries directly, supplanting the need for special object files. I discovered that the Fedora Linux had modified their distribution of GHC to disable generating the GHCi objects by default. Furthermore, that this behaviour would become the new standard default with Cabal 1.18. This broke HsQML with GHCi because because the aforementioned workaround didn't apply to the regular archive libraries and so GHCi's linker couldn't handle the C++ object files contained within.

I didn't want to simply apply the same workaround to the archive libraries as to the GHCi ones because that would introduce dealing with an additional magic shared library to users who simply wanted to compile their applications. The modification I've applied for this release was therefore to add code to Setup.hs to force (re-)enable generating the special GHCi object files under certain circumstances.

The impact of this issue is likely to decrease over time as GHC now also supports producing shared libraries from Haskell code in addition to static ones. This means that, going forward, the entirety of HsQML can be built as a shared library and GHCi can load it using the system linked without difficulty. My understanding is that this behaviour will become the default with GHC 7.8 for platforms other than Windows.

Hence, the rule is that generating GHCi object files is only force enabled if shared libraries are not enabled. The forcing behaviour can be disabled by passing -f-ForceGHCiLib to cabal-install.

The other issue I found that's fixed with this release is that GHCi had problems finding the workaround shared library on Windows. Unlike other platforms, the extra-ghci-libraries field needed to include the "lib" prefix to the referenced library name in order for Windows GHCi to find it without the library being on the PATH. With that fixed, HsQML should now work with GHCi out of the box on all platforms.

Now, back to working on the Qt 5 port!

release- - 2014.02.01

* Added mechanism to force enable GHCi workaround library.
* Fixed reference name of extra GHCi library on Windows.

* Thanks to rnons.

by Robin KAY ( at July 04, 2014 09:35 PM

Using the Connections element with HsQML

I was asked recently if the Connections element could be used to declaratively connect QML actions to signals defined in Haskell code. I wasn't completely sure if it would work off-hand so I wrote the following example program with HsQML 0.2.x to find out (Hint: the answer is yes).

To begin with, we need a Haskell program which will load a QML document and fire off some signals. The following program forks off a thread which blocks for the user to enter a new line in the terminal window and fires a signal every time they do. The context object has two members, the signal we're experimenting with and a property called 'self' whose function will become apparent shortly.

{-# LANGUAGE DeriveDataTypeable, TypeFamilies #-}
import Graphics.QML
import Data.Typeable
import Data.Tagged
import Control.Concurrent
import Control.Monad

main :: IO ()
main = do
    ctx <- newObject MainObject
    tid <- forkIO $ forever $ do
        putStrLn "Press ENTER to run animation"
        void $ getLine
        fireSignal (Tagged ctx ::
            Tagged TheSignal (ObjRef MainObject))
    runEngineLoop defaultEngineConfig {
        contextObject = Just $ anyObjRef ctx}
    killThread tid

data TheSignal deriving Typeable
instance SignalKey TheSignal where
    type SignalParams TheSignal = IO ()

data MainObject = MainObject deriving Typeable
instance Object MainObject where
    classDef = defClass [
        defPropertyRO "self" ((\x -> return x) ::
            ObjRef MainObject -> IO (ObjRef MainObject)),
        defSignal (Tagged "theSignal" ::
            Tagged TheSignal String)]

The QML document to accompany the above program follows below. It should be placed in a file called 'main.qml' in order to be loaded by the defaultEngineConfig. You could set the initialURL field to something else if you wanted, but I'm trying to keep the code short.

import Qt 4.7
Rectangle {
    id: root
    width: 500; height: 500
    color: "red"
    Rectangle {
        id: square
        x: 150; y: 150; width: 200; height: 200
        color: "yellow"
        Rectangle {
            width: 50; height: 50; color: "black"
        transform: Rotation {
            id: rotateSquare
            origin.x: 100; origin.y: 100; angle: 0
        NumberAnimation {
            id: rotateAnim
            target: rotateSquare; property: "angle"
            from: 0; to: 360; duration: 1500
        Connections {
            target: self
            onTheSignal: rotateAnim.start()

The code for the Connections element is highlighted in bold. Of its two attributes, the first, called 'target', specifies the object with signals that we want to bind handlers to. In this example the signal is a member of the global object and this complicates matters because it's not straightforward to write an expression which yields the global object. Hence, I placed the 'self' property on the global object to provide a convenient way of the getting a reference to it.

There are ways to get the global object, but they're not particularly pretty and I don't fully trust that kind of thing inside Qt's script environment anyway.

The second attribute specifies the signal binding. Specifically, the attribute name identifies the signal and is derived by pre-pending the string 'on' to the actual signal name. Hence, in this case, binding to 'theSignal' is specified using the attribute 'onTheSignal'. The value of the attribute is the JavaScript code to be executed when the signal fires. In our example it causes a simple little animation to occur.

Up to now, the only example I provided of using signals was the hsqml-morris demo application. It's not a great example of idiomatic QML because it uses a big chunk of JavaScript to work around some of the present limitations of HsQML's marshalling facilities (e.g. no lists/arrays). It makes no great attempt to be a "pure" QML application, so it just calls the signal's connect() method to attach it via JavaScript.

You could use the same approach with this test program by replacing the Connections element with the following code snippet:

        Component.onCompleted: {

The 'self' property is superfluous here because we can access the signal member on the global object directly. However, it's a slightly unfair comparison because the JavaScript code only covers connecting to the signal, whereas the Connections element also handles disconnections. When you're dynamically creating and destroying Components using things like the Repeater element, this is important to prevent overloading your signals with handlers that are never cleaned up.

The Connections element also allows the target attribute to be specified with a property or dynamic expression. If the value of the target expression changes at runtime then all the signal handlers will be disconnected and reconnected to the new object.

Addendum: Writing this example has made me think that HsQML relies too heavily on top-level data and instance declarations. I'd like to rectify that in the future by making QML classes first-class values on the Haskell side.

by Robin KAY ( at July 04, 2014 09:35 PM

HsQML released: Now with Qt 5

I've just made a new major release of HsQML, my Haskell binding for the Qt Quick GUI framework. You can download it from Hackage in the usual manner.

This is a particularly exciting release because it's the first to have been ported over to use Qt 5. Previously, HsQML was developed against an older version of the Qt Quick technology which shipped as part of Qt 4.7 and 4.8. Support for Qt 5 has been a constant theme in the e-mails I get concerning HsQML for some time and so I'm pleased to finally deliver on that point.

There are also number of other improvements to the library which should allow more idiomatic QML code and hence reduce the need for helper JavaScript. Properties now support an associated notify signal which allows QML to automatically update in response to property changes rather than needing manual signal handlers. Also, lists and Maybe values can be marshalled between Haskell and QML natively, again reducing friction between the two environments.

The API has been redesigned slightly so that object classes and signal keys can be defined directly inside Haskell functions in addition to the older type-class based method. It's unclear yet if this style is wholly superior but, for smaller programs at least, it permits greater clarity and much less verbosity.

Finally, although still far from comprehensive, I've spent some time trying to improve the documentation on my web-site. It now provides some more substantial examples and goes into greater depth. The complete buildable source code for these examples is contained in the hsqml-demo-samples package. Also, the original Nine Men's Morris demo application is still available, but the package has been renamed to hsqml-demo-morris.

release- - 2014.05.04

* Ported to Qt 5 and Qt Quick 2
* Added type-free mechanism for defining classes.
* Added type-free mechanism for defining signal keys.
* Added property signals.
* Added marshallers for Bool, Maybe, and lists.
* Added less polymorphic aliases for def functions.
* Replaced Tagged with Proxy in public API.
* Removed marshallers for URI and String.
* New design for marshalling type-classes (again).
* Generalised facility for user-defined Marshal instances.
* Relaxed Cabal dependency constraint on 'QuickCheck'.
* Fixed GHCi on Windows with pre-7.8 GHC.

by Robin KAY ( at July 04, 2014 09:35 PM

HsQML released

I've just released HsQML which, as usual, is available for download from Hackage. This release fixes several stability issues and also introduces a facility for defining properties with the CONST attribute.

If you use a property defined on one of your Haskell objects in a QML expression and that property doesn't have an associated signal, then Qt will print the following warning:

Expression depends on non-NOTIFYable properties

For mutable objects, the accurately informs us that QML has no way of knowing if the value of that property changes. However, when using Haskell, we often prefer to work with immutable values and there was previously no way of informing Qt that the value would never change. Previously, the only course of action was to specify a dummy signal or to use nullary methods instead of properties. You can now use the new defPropertyConst function instead of defPropertyRO to specify that an associated signal is unnecessary and suppress this warning.

As an aside, Christopher Reichert has just written a blog post about using HsQML which is well worth the read.

release- - 2014.06.11

* Added properties with constant annotation.
* Added runtime warning for users of the non-threaded RTS.
* Added non-TH version of Setup.hs.
* Relaxed Cabal dependency constraint on 'transformers'.
* Fixed premature garbage collection of QML objects.
* Fixed intermittent crash on exit when firing signals.
* Fixed crash when using Cmd-Q to exit on MacOS.

by Robin KAY ( at July 04, 2014 09:35 PM

July 02, 2014

Ketil Malde

Expected site information from SNPs

Lately, I’ve been working on selecting SNPs, the main goal is often to classify individuals as belonging to some specific population. For instance, we might like to genotype a salmon to see if it is from the local population or an escapee from a sea farm, or perhaps a migrant from a neighboring river? And if it’s an escapee, we might want to know which farm it escaped from. In short, we want to find SNPs that are diagnostic.

Typically, this is done by sequening pools of individuals, mapping the reads to the reference genome, identifying variant positions, and ranking them - typically using FST, sometimes also using p-values for the confidence in an actual allele difference, and maybe filtering on sequencing coverage and base- or mapping quality. However, FST really isn’t a suitable tool for this purpose. I’m therefore proposing the following. Let me know if it makes sense or not.

Expected Site Information

For diagnostic SNP, what we really would like to know is the amount of information observing each site contributes. Using Bayes theorem, observing an allele a in some individual N, gives us the following posterior probability for N belonging to some population A, where the allele frequency, P(aA), is known:

P(A|a) = P(a|A)P(A)/P(a)

Here, P(A) is our prior probability of N belonging to A, which after observing a is modified by a factor of


In order to assign N to one of several populations (either (A) or B, say), we are interested in the relative probabilities for the two hypotheses. In other words, we would like the odds for N belonging to one population or the other. Given the probabilities of P(aA) and (P(a|B)), and initial odds (P(A)/P(B)), we get

P(A|a)/P(B|a) = [P(a|A)P(A)/P(a)]/[P(a|B)P(B)/P(a)]

Canceling out P(a), we find that the prior odds are modified by:


That is, the ratio of this allele’s frequencies in each of the populations. For practical reasons, it is common to take the logarithm of the odds. This gives us scores that are additive and symmetric (so that switching the two populations gives us the same score with the opposite sign). Specifically, base two logarithms will give us the score in bits.

When observing a site, we may of course also encounter the alternative allele. By the same reasoning as above, we find that this allele modifies the odds by


Lacking any prior information, we can consider each population equally likely, and the likelihood of observing a particular allele is the average of the likelihood in each population. The information gain from each possible allele is then averaged, weighted by this average likelihood. For a biallelic site with major allele frequencies p and (q) (and consequentially, minor allele frequencies of 1 − p and (1-q)) in the two populations, the expected added information from the site then becomes:

I(p,q) = |(p+q)/2 log_2(p/q)| + |(1-(p+q)/2)log_2((1-p)/(1-q)) |

Note that we are here only interested in the amount of information gained, regardless of which hypothesis it favors, and thus we take the absolute values. For a site with multiple alleles enumerated by i and with frequency vectors p and q in the two populations, this generalizes to the weighted sum of log2(pi / qi).

Unlike measures like FST, measures of I is additive (assuming independence between sites), so the information gained from observing mulitple sites is readily calculated. From observing the information gained from observing each site, we will also be able to compare different sets of sites, and e.g., compare the value of a single site with minor allele frequencies (MAF) of, say, 0.1 and 0.3 to two sites with MAF of 0.2 and 0.3.

It may also be instructive to compare this procedure to sequence alignment and position specific score matrices (PSSMs). In sequence alignment, a sequence of nucleotides or amino acids are scored by comparing its match to a target sequence to its match to some base model using log odds scores. The base model to compare against is often implicit (typically using sequences of random composition), but more elaborate models are also possible Similarly, position specific frequency matrices are often converted to position specific score matrices using log odds. Calculating the information value from a set of observed alleles is then analogous to scoring an “alignment” of the set of observed alleles to two different sets of allele frequencies.

Allele frequency confidence intervals

In order to apply the above method in practice, we need to measure the allele frequencies in the population. This is problematic for two reasons. First, we do not have precise knowledge of the allele frequencies, we can only estimate them from our sequenced sample. This introduces sampling bias. Second, the sequencing process introduces additional artifacts. For instance, sequencing errors often result in substitutions, which are observed as apparent alleles. In addition, sequences can be incorrectly mapped, contain contamination, the reference genome can contain collapsed repeats, and the chemistry of the sequencing process is usually also biased – for instance, coverage is often biased by GC content. These artifacts often give the false appearance of variant positions.

One challenge with calculating site information from sequencing data (as opposed to using allele frequencies directly), is that such errors in the data can vastly overestimate the information content. For instance, an allele that appears to be fixed in one population means that any other observed allele will assign the individual to the alternative population - regardless of any other alleles. It is easy to see that an allele frequency of zero results in the odds going either to zero or infinity, and thus the log odds will go to either positive or negative infinity.

For diagnostic SNP discovery, it is more important to ensure that identified SNPs are informative, than to precisely estimate the information content. Thus, we take a conservative approach and use upper and lower limits for the allele frequencies by calculating confidence intervals using the method by Agresti-Coull. In addition, the limits are also adjusted by a factor ε, corresponding to sequencing error rate.

Software implementation

I’ve implemented this (that is, the conservative measure) as described above in a software tool called varan. It parses sequence alignments in the standard “mpileup” format as output by the samtools mpileup command. It can currently output several different statistics and estimators, including expected site information. This is a work in progress, so please get in touch if you wish to try it out.

July 02, 2014 08:00 AM

July 01, 2014

Twan van Laarhoven

Dependent equality with the interval

Here is a way to represent heterogeneous or dependent equalities, based on an interval type. In Homotopy Type Theory the interval is usually presented as a Higher Inductive Type with two constructors and a path between them. Here I will just give the two constructors, the path is implicit

data I : Set where
  i₁ : I
  i₂ : I
  -- there is usually a path, i-edge : i₁ ≡ i₂

The eliminator is

i-elim :  {a} {A : I  Set a}
        (x₁ : A i₁)  (x₂ : A i₂)  (Eq A x₁ x₂)  (i : I)  A i
i-elim x₁ x₂ eq i₁ = x₁
i-elim x₁ x₂ eq i₂ = x₂

Here the type Eq is the dependent equality, which has type

Eq :  {a} (A : I  Set a)  (x₁ : A i₁)  (x₂ : A i₂)  Set a

so we take a type parametrized by an interval, and two values of that type at the two endpoints of this interval. We can also define "heterogeneous reflexivity", a generalization of the usual refl function:

refl :  {a} {A : I  Set a}  (x : (i : I)  A i)  Eq A (x i₁) (x i₂)

This function can be used to extract the third part of i-elim, with the reduction

refl (i-elim x₁ x₂ eq) = eq

I believe this can be used as the basis for an observational type theory, where Eq A and refl x reduce. The above is the first case for refl, the rest is "just" tedious structural recursion such as

Eq (\i  A i × B i) x y = Eq A (proj₁ x) (proj₁ y) × Eq B (proj₂ x) (proj₂ y)
refl (\i  x i , y i) = refl x , refl y


Eq (\i  A i  B i) f g = {x : A i₁}  {y : A i₂}  Eq A x y  Eq B (f x) (g y)
refl (\i  \(x : A i)  f i x) = \{x} {y} xy  refl (\i  f i (i-elim x y xy i))

or we can actually use the dependent equality and be more general

Eq (\i  Σ (x₁ : A i) (B i x₁)) x y =
  Σ (x₁y₁ : Eq A (proj₁ x) (proj₁ y))
    (Eq (\i  B i (i-elim (proj₁ x) (proj₁ y) x₁y₁ i)) (proj₂ x) (proj₂ y))
Eq (\i  (x : A i)  B i) f g =
  {x : A i₁}  {y : A i₂}  (xy : Eq A x y)
   Eq (\i  B i (i-elim x y xy i)) (f x) (g y)

Of course there is a lot more to it, but that is not the subject of this post.

As a final remark: if you are not too touchy about typing, then refl could even be implemented with the path i-edge between i₁ and i₂

i-edge : Eq (\_  I) i₁ i₂
i-elim x₁ x₂ eq i-edge = eq
refl foo = foo i-edge

But I'd rather not do that.

July 01, 2014 10:56 PM

Chris Smith

CodeWorld Rises Again!

About three years ago, I started work on an idea about technology-based math education.  The idea was to get middle school students to work passionately on using mathematics to create things, by:

  1. Doing their own original, creative work, instead of following instructions or reaching set answers.
  2. Getting instant feedback 24 hours a day, so they can tinker and learn in a self-directed way.
  3. Building confidence by working on their own ideas, inspiring pride and excitement.
  4. Experiencing how concepts from geometry, algebra, and physics can be springboards for creativity.
  5. Becoming creators, rather than just consumers, of technology.

That’s a lofty set of goals, but it was very successful.  In the 2011-2012 school year, I taught a small class of six students, two to three hours per week.  We had an awesome time.  They built their own computer games throughout the year.  We struggled together, worked our way through, and finished the school year with an awesome expo where the students showed off their work to local technology professionals and participated in a question-and-answer panel about their experiences.  It was fascinating listening to this, because a few patterns arose:

  • Students didn’t really think of what they were doing as math.  This remained true, even when the skills they learned involved describing the behavior of systems using equations, functions, and variables; describing complex shapes in terms of geometry, the coordinate plane, and rotations, translations, and scaling; coming to grips with the meaning of probability and randomness; etc.
  • The students who entered the year being “good at technology” weren’t necessarily the most likely to succeed.  Talking to these students broke all of the stereotypical molds about computers and technology!  Students took to the activity and wildly succeeded were very often girls, and had previously thought they were more the art-and-music type.

At the end of that year, I had plans to teach this program in multiple schools the following school year.  Unfortunately, things then got a little sidetracked.  I started a new job at Google over the summer, moved to California, and dropped the program.  The web site that students had used to build their projects fell into disrepair, and stopped working.  I stopped doing anything about it.

Over the last week and a half, though, that’s changed!  CodeWorld is back!

Getting Started

The CodeWorld web site is (as always) at

Any web browser will do, but you really need to use the latest version of whatever browser you choose.  If you’ve been putting off upgrading Internet Explorer, it’s long past time!

You’ll also want a Google account.  You can log in using your Google account, and save your programs to Google Drive.  Because your programs are saved to the cloud, you can use the web site from any computer you like, even computer labs in a school, and your programs will follow where ever you go.

Using the web site is simple.  Type your program on the left.  Click Run to see it work on the right.  You can sign in to open your existing projects and save your projects.  You can also get links to share your projects with others.  There are sample projects along the bottom of the screen, including Yo Grandma!, a game written by Sophia, one of my students from the original class.

Unfortunately, instructions on how to write the programs are still mostly missing.  If you already know the language, a link to the generated documentation might help.  Otherwise, hold on!  Once the programming environment is stable, I plan to put together a comprehensive progression of exercises, tutorials, and examples.

Behind the Scenes

Under the hood, I mostly recreated this from scratch, throwing away most of the original project from a few years ago.  This new version of the environment has a lot of advantages: it runs your programs on your own computer, so your program runs a lot faster.  It’s less restrictive.  And I completely customized the language to make a lot of things simpler and easier to understand.


  • The programming language for CodeWorld is called Haskell.  Haskell is an awesomely mathematical language, but parts of it are also notoriously complex.  The new incarnation of CodeWorld still uses Haskell, but goes a lot further to hide the rough edges.  In particular, you’ll rarely see any classes, and there’s an obvious type for most things (e.g., all text has the type Text, and all numbers have the type Number.)
  • Previously, CodeWorld was based on a library called Gloss for the Haskell programming language.  Gloss is great, and I saved as many ideas from it as I could.  But CodeWorld is now its own library.  This let me clean up some terminology, align the meaning of programs more closely with the goals of algebraic thinking and math concepts, and work with the simplified version of the language.
  • The biggest change to how the web site works is that your programs now run on your own computer, instead of on the web server.  This is using an awesome project called GHCJS, which converts the Haskell program into JavaScript, which is understood by web browsers.

I’ll try to keep posting here as I have learning material ready to use with this tool.  Stay tuned!

by cdsmith at July 01, 2014 08:01 PM

Big changes coming to CodeWorld

I’m continuing work on CodeWorld, my educational programming environment based on geometry and algebra.  There are big changes coming!  If you’re interested in following the project, please join the new codeworld-discuss mailing list, where I’ll send more regular announcements about significant changes, as well as try to answer questions, and discuss future directions.

Here are some things I intend to change in the near future.  A more complete list is on the project issue tracker, but this is a summary with more details and reasoning about some of the changes.

Aligning With Math Education

An important goal of this project is to align with a standards-based U.S. middle school math education, as much as possible.  To be clear, I still refuse to add complexity or turn the project into a patchwork of specific lessons that promote a specific narrow path of learning.  First and foremost, this should be an environment for tinkering and encountering ideas in self-motivated way.  But given alternative designs that could each be valid on their own, I’ll choose the one that pushes students toward the math standards.

It’s sometimes a tough line to draw.  But I’ve become convinced that there are a few places where I can do better.  Two of those are going to be major breaking changes, coming soon.

1. Death to Currying

Haskell’s convention of currying functions is the wrong default for CodeWorld.  Practically all of mathematics, especially at introductory level, is carried out with the notation f(x,y) = … .  The interpretation is that a function of two parameters is a function whose domain is a product – a set of ordered pairs.  The Haskell language makes a different choice.  Applying a function to two parameters is more like f(x)(y) (the parentheses are optional in Haskell itself), and the interpretation is that f(x) denotes a partially applied function that’s still waiting for its second parameter.

If the goal were to teach about higher-order functions, there would be lots of great arguments for the latter.  If the goal were convenience, you could argue for the latter pretty persuasively, as well.  I think Haskell’s use of currying is great.  But when the goal is to let students encounter and tinker with things they will see in school math, the right choice is to adopt the convention of mathematics.

Luckily, the assumption of curried multi-parameter functions isn’t baked into Haskell too deeply.  By changing the standard library, it’s quite possible to write f(x,y) just as well.  The parentheses on f(x) become optional, but this is actually true of mathematics in general (for example, operators in linear algebra are often written without parentheses, as are trig functions).  I will adopt the convention of using parentheses around even single function parameters.

The only big source of awkwardness comes with binary operators.  So long as we choose not to teach the notations `foo` (for turning a function into an operator) or (+) (for turning an operator into a function), this doesn’t come up much.  Notably, sections still work fine, since they take only one argument.

A couple convenient side effects of this choice are nice, too:

  • Students who routinely write parentheses around function arguments less often find themselves forced to surround negative numbers in parentheses for weird parsing reasons.  As trivial as it might seem, this was a very real and significant learning obstacle the last time I taught the class, and I’ll be happy to see it go.
  • Getting expression structure wrong sometimes gives much better error messages this way.  It’s harder to accidentally mix up precedence between an operator and function application; and passing too few arguments to a function gives a clear error rather than inferring a function type and breaking in an obscure way elsewhere.

2. Resizing the Canvas

The second big change is to resize the canvas from 500×500 to 20×20.

The justification for a 500×500 canvas was generally about confusing pixels – little dots on the screen – with the general idea of a coordinate system.  It’s convenient to blur the distinction at first, but it has in the past become a barrier to understanding the full nature of the coordinate plane with real (or even rational) coordinates.  Many students were confused when later faced with fractional coordinates.  At the same time, developing a full understanding of the rational number system is a big topic in 6th, 7th, and 8th grade mathematics, so it would be great to ask students to do more tinkering with these numbers.

By replacing this with a 20×20 grid (x and y coordinates ranging from -10 to 10), several goals are accomplished:

  • Students early in the class are working with numbers in a range they can comprehend better.
  • Students routinely work in fractions or decimals to fine tune their projects.
  • The abstract coordinate plane, including fractional coordinates, becomes more familiar.

This is a big win overall.

Changes to Usability

On the less controversial side, I’m planning a number of changes to make the site more usable:

  • Pervasive auto-complete, based on a pre-populated list of the standard library symbols as well as parsing the student code for declared names.
  • More complete documentation, tutorials, and better examples.  I admit that the current site is grossly lacking in documentation.  I don’t envy anyone who tries to figure it out on their own!
  • Better tools for playing around with results.  At the very least, students will be given the chance to scroll, pan, and see coordinates of points in pictures, animations, and simulations.

Long-Term Wish List

I also have my wish list for things I’d love to see possible, but am not quite ready to build yet.  This includes:

  • Social features: sharing projects with friends, commenting on or expressing support for other projects.
  • Collaborative projects with shared editing or exporting libraries for others to use.
  • Better debugging tools, such as easy controls to move forward and back in time, fast-forward, pause, etc. for animations, simulations, and even games.
  • Possibly grading features for teachers to grade projects and provide a scoring rubric and comments.

What else would you like to see?  Let me know in the comments here, on codeworld-discuss, or by filing a feature request in the issue tracker.

by cdsmith at July 01, 2014 08:01 PM

June 30, 2014


Announcing rest - A Haskell REST framework

We are excited to officially announce the open source release of our REST framework rest!

rest is a set of packages used to write, document, and use RESTful applications. You write your API in Haskell using rest's DSL. This API can then be run in different web frameworks like happstack, snap, or wai. Additionally, you can automatically generate documentation from it, as well as client libraries for Haskell and Javascript. We have been using it in production for most of our services for a long time and we like it so much that we decided to share it with the public.

If you want to start using rest, check out the tutorial or the example application. You can also come to the Haskell Exchange 2014 where Erik will give a talk about rest. We’d also be happy to answer any questions you have, shoot us an e-mail!

The most important packages are:

  • rest-core: A DSL for defining versioned and web server agnostic REST resources. This is the workhorse of the framework
  • rest-gen: Automatically generates documentation, Haskell, JavaScript, and Ruby clients from a rest API
  • rest-snap, rest-happstack, rest-wai: Drivers for running resources using the web server of your choice

We have also released other packages that are either used by or can be used with rest:

  • rest-client: Used by haskell clients generated by rest-gen
  • rest-types: Types used by the other rest packages
  • json-schema: Define and derive schemas for JSON serializations
  • generic-aeson: Generically derives JSON serializations for data types minimalistically
  • regular-xmlpickler: Generically derives XML serializations for data types
  • aeson-utils: Utilities for working with Aeson.
  • hxt-pickle-utils: Utility functions for using HXT picklers
  • multipart: HTTP Multipart implementation forked from the cgi package
  • rest-stringmap: Maps with string-like keys with built-in serialization to XML and JSON (since JSON doesn’t allow arbitrary keys)
  • code-builder: String manipulation library for code generation

We had a great time working on rest at ZuriHac (thanks for organizing, Better!) and we are happy to see that a lot of people were interested in our work. We got a lot done, here are some highlights:

  • Erik wrote an introductory tutorial to rest
  • Håkan Thörngren rewrote the rest-gen Haskell code generator to use haskell-src-exts, it was released in rest-gen-0.14
  • Christian Berentsen did several things:
    • Cleaned up the interface of json-schema, it was released in json-schema-0.6
    • Added support for outputing Fay compatible json as a separate output type (we want to make it easier to extend rest with more output types so things like this can go in external packages)
    • Implemented a generic API discovery resource that you can hook into your API with no configuration
  • Tom Lokhorst helped out with some always appreciated bug fixing
  • Sebas wrote the rest-wai driver to make sure everyone can use the web server they prefer together with rest
  • Adam worked on the rest-example application and rewrote parts of rest-gen to make the code generator and the library interface simpler
  • wiz did performance benchmarks and created a script to generate haddocks for rest itself

I hope I didn’t forget anyone. A big thanks to everyone who participated!

All the mentioned projects are available on hackage and in public repositories on github. We also created a public mailing list for all our open source projects.

We use these packages to write our Haskell API code for Silk.
Join our team if you enjoy this stuff too, we’re hiring!

June 30, 2014 02:22 PM

June 29, 2014

Neil Mitchell

Optimisation with Continuations

Summary: Continuations are confusing. Here we solve a simple problem (that is at the heart of the Shake build system) using continuations.

Imagine we are given two IO a computations, and want to run them both to completion, returning the first a value as soon as it is produced (let's ignore exceptions). Writing that in Haskell isn't too hard:

parallel :: IO a -> IO a -> IO a
parallel t1 t2 = do
once <- newOnce
var <- newEmptyMVar
forkIO $ t1 >>= once . putMVar var
forkIO $ t2 >>= once . putMVar var
readMVar var

We create an empty variable var with newEmptyMVar, fire off two threads with forkIO to run the computations which write their results to var, and finish by reading as soon as a value is available with readMVar. We use a utility newOnce to ensure that only one of the threads calls putMVar, defined as:

newOnce :: IO (IO () -> IO ())
newOnce = do
run <- newMVar True
return $ \act -> do
b <- modifyMVar run $ \b -> return (False, b)
when b act

Calling newOnce produces a function that given an action will either run it (the first time) or ignore it (every time after). Using newOnce we only call putMVar for the first thread to complete.

This solution works, and Shake does something roughly equivalent (but much more complex) in it's main scheduler. However, this solution has a drawback - it uses two additional threads. Can we use only one additional thread?

For the problem above, running the computations to completion without retrying, you can't avoid two additional threads. To use only one additional thread and run in parallel you must run one of the operations on the calling thread - but if whatever you run on the additional thread finishes first, there's no way to move the other computation off the the calling thread and return immediately. However, we can define:

type C a = (a -> IO ()) -> IO ()

Comparing IO a to C a, instead of returning an a, we get given a function to pass the a to (known as a continuation). We still "give back" the a, but not as a return value, instead we pass it onwards to a function. We assume that the continuation is called exactly once. We can define parallel on C:

parallel :: C a -> C a -> C a
parallel t1 t2 k = do
once <- newOnce
forkIO $ t1 (once . k)
t2 (once . k)

This definition takes the two computations to run (t1 and t2), plus the continuation k. We fork a separate thread to run t1, but run t2 on the calling thread, using only one additional thread. While the parallel function won't return until after t2 completes, subsequent processing using the a value will continue as soon as either finishes.

Looking at the transformers package, we see Control.Monad.Trans.Cont contains ContT, which is defined as:

newtype ContT r m a = ContT {runContT :: (a -> m r) -> m r}

If we use r for () and IO for m then we get the same type as C. We can redefine C as:

type C a = ContT () IO a

The changes to parallel just involve wrapping with ContT and unwrapping with runContT:

parallel :: C a -> C a -> C a
parallel t1 t2 = ContT $ \k -> do
once <- newOnce
forkIO $ runContT t1 (once . k)
runContT t2 (once . k)

Now we've defined our parallel function in terms of C, it is useful to convert between C and IO:

toC :: IO a -> C a
toC = liftIO

fromC :: C a -> IO a
fromC c = do
var <- newEmptyMVar
forkIO $ runContT c $ putMVar var
readMVar var

The toC function is already defined by ContT as liftIO. The fromC function needs to change from calling a callback on any thread, to returning a value on this thread, which we can do with a forkIO and MVar. Given parallel on IO takes two additional threads, and parallel on C takes only one, it's not too surprising that converting IO to C requires an additional thread.

Aren't threads cheap?

Threads in Haskell are very cheap, and many people won't care about one additional thread. However, each thread comes with a stack, which takes memory. The stack starts off small (1Kb) and grows/shrinks in 32Kb chunks, but if it ever exceeds 1Kb, it never goes below 32Kb. For certain tasks (e.g. Shake build rules) often some operation will take a little over 1Kb in stack. Since each active rule (started but not finished) needs to maintain a stack, and for huge build systems there can be 30K active rules, you can get over 1Gb of stack memory. While stacks and threads are cheap, they aren't free.

The plan for Shake

Shake currently has one thread per active rule, and blocks that thread until all dependencies have rebuilt. The plan is to switch to continuations and only have one thread per rule executing in parallel. This change will not require any code changes to Shake-based build systems, hopefully just reduce memory usage. Until then, huge build systems may wish to pass +RTS -kc8K, which can save several 100Mb of memory.

by Neil Mitchell ( at June 29, 2014 09:56 PM

June 28, 2014

JP Moresmau

EclipseFP reaches 100 stars!

This week, the EclipseFP github project reached a hundred stars! Thanks to all users and contributors!! I know still a lot of work is needed to make EclipseFP even better (and faster (-:), so please do not hesitate to participate, on the Eclipse side, on the Haskell side, or on the documentation!

Happy Haskell Hacking!

by JP Moresmau ( at June 28, 2014 12:51 PM

Mateusz Kowalczyk

My experience with NixOS

Posted on June 28, 2014 by Fūzetsu

This post is for Haskellers interested in nix (the package manager) and maybe even NixOS (a distribution built on nix). If you’re not interested then skip it, but I know many people are. It describes how I made the switch and some of my experiences since. I have put off this blog post for a long time, hoping to write it up once I have everything working just like I want it but I was finally motivated to write it up by people expressing interest on IRC. I know many people want to switch but aren’t quite there yet, hopefully this can help them make the decision. If you’re interested in nix but not NixOS, you probably want to just skim the beginning.

Please note that things contained here are just my opinions and I’m not some NixOS guru so things stated here may well be inaccurate.

A couple of weeks ago I have switched to NixOS. Like many, I have seen the blogpost by ocharles and have since thought ‘It’d be great to switch but I’d hate to put in the effort’ but the thought crept in. I ask that you read that blog post first. I have even started to set up NixOS on a separate hard-drive. Recently I have finally decided to retire my trusty ThinkPad X61s on which I did my hacking for the past three years: it was overheating, had holes through it (don’t ask), falling apart and I have took it apart so many times that it’s a miracle it even stayed together. This was a perfect chance. I have taken out the SSD (which cost me more than the netbook itself) and repurposed one of my fileservers which was running Gentoo into a desktop machine.

Probably the most vital resource when making the switch is the NixOS is the NixOS manual. I’ll not go over the installation process but you can find my configuration file here.

My current set-up is XMonad without a DE, using SLiM as a log-in DM.

At the beginning I struggled. I had problems understanding how things worked and some software I wanted to use was simply not packaged. I spent the first couple of weeks with KDE and without some software I wanted. This is a bit of a downside: the number of packages is not the greatest of all distributions. Please don’t get me wrong, there is a lot of software already but the chances are that if you’re using something not that common, you might have to package it yourself. The upside is, it’s easy to do.

I will briefly describe some things which will related to Haskell development later. There is a thing called Hydra, it’s a build-bot that NixOS uses. There is a thing called nixpkgs, it is a repository of packages used by NixOS and also nix itself if you aren’t going for the full OS. nixpkgs is essentially a big repository of nix expressions. Hydra looks at this and builds the expressions, resulting in packages. Main re-distribution works in channels: a user subscribes to a channel and when we ask for some package to be installed, this is where the information is taken from. Official channels are effectively nixpkgs at some commit: nixos channel might be a few weeks behind nixpkgs HEAD, nixos-unstable is usually a few days. Channels are updated when Hydra finishes to build a particular jobset: this means you get binaries for the default settings of all Hydra-built packages. This includes Haskell packages!

I will now describe how I have been doing Haskell development. Again, note that this is constantly evolving while I discover new things.

Haskell development with nix/NixOS

Firstly, NixOS is not necessary to benefit. Pretty much everything I say here is due to nix itself.

Perhaps the main motivation for using nix is wanting to avoid cabal hell. The presence of cabal sandboxes and freezing of dependencies has allowed many people to avoid the problem. I myself used sandboxes very soon after they came out and use cabal-dev before that. My main problem with sandboxes is managing them: are you sandboxing a new project? Come back in an hour when text, hxt, lens, attoparsec, haskell-src-exts and whatever else you happen to be using have compiled for the 50th time on your machine. Sure, one can use shared sandboxes but it is a massive pain. I have wasted hours of my life recompiling same dependencies. nix allows you to avoid this.

I will consider a few scenarios and any potential problems that might come up and how I have dealt (or not dealt!) with them so far.

You have your project. Perhaps the first thing you do is write the cabal file or maybe you already have one but you want to use nix. When we develop, we often want to actually be able to be in the environment of the package, be able to run ghci and all that jazz. There’s a tool called nix-shell which can help you. This effectivelly allows you drop into a sandbox of your project. This is the magical thing ocharles refered to in his blog post. What he did not mention is that you can generate on of these expressions necessary to use nix-shell. Here’s a real example:

[shana@lenalee:/tmp]$ cat Yukari.cabal
name:                Yukari
synopsis:            Command line program that allows for automation of various tasks on the AnimeBytes private tracker website.
license:             GPL-3

license-file:        LICENSE

author:              Mateusz Kowalczyk
category:            Utils
build-type:          Simple
cabal-version:       >=1.8

executable yukari
  main-is:             src/Main.hs
  build-depends:       base ==4.*, Yukari

  default-language:     Haskell2010

  build-depends:       base ==4.*, curl ==1.3.*, HTTP ==4000.*, filepath ==1.3.*
                       , directory ==1.2.*, bytestring ==0.10.*, network ==2.5.*
                       , text ==1.1.1.*, attoparsec ==0.12.*, HandsomeSoup ==0.3.*
                       , hxt ==9.*, download-curl ==0.1.*, dyre

  hs-source-dirs:       src

test-suite spec
  type:             exitcode-stdio-1.0
  default-language: Haskell2010
  main-is:          Spec.hs

  build-depends:       base ==4.*, Yukari, hspec, QuickCheck == 2.*,
                       filepath==1.3.*, directory ==1.2.*

Then with little help of cabal2nix (the dummy sha256 parameter is a hack here as we’re generating an expression for a source repository).:

[shana@lenalee:/tmp]$ cabal2nix Yukari.cabal --sha256 foo
{ cabal, attoparsec, curl, downloadCurl, dyre, filepath
, HandsomeSoup, hspec, HTTP, hxt, network, QuickCheck, text

cabal.mkDerivation (self: {
  pname = "Yukari";
  version = "";
  sha256 = "foo";
  isLibrary = true;
  isExecutable = true;
  buildDepends = [
    attoparsec curl downloadCurl dyre filepath HandsomeSoup HTTP hxt
    network text
  testDepends = [ filepath hspec QuickCheck ];
  meta = {
    homepage = "";
    description = "Command line program that allows for automation of various tasks on the AnimeBytes private tracker website";
    license = self.stdenv.lib.licenses.gpl3;
    platforms = self.ghc.meta.platforms;

Note that cabal2nix generates expressions suitable for nixpkgs. To use it for a shell environment, I ammend the resulting expression into following:

{ pkgs ? (import <nixpkgs> {})
, haskellPackages ? pkgs.haskellPackages_ghc763

haskellPackages.cabal.mkDerivation (self: {
  pname = "Yukari";
  version = "";
  src = /home/shana/programming/yukari;
  isLibrary = true;
  isExecutable = true;
  buildDepends = with haskellPackages; [
    attoparsec curl downloadCurl dyre filepath HandsomeSoup HTTP hxt
    network text
  testDepends = with haskellPackages; [ filepath hspec QuickCheck ];
  meta = {
    homepage = "";
    description = "Command line program that allows for automation of various tasks on the AnimeBytes private tracker website";
    license = self.stdenv.lib.licenses.gpl3;
    platforms = self.ghc.meta.platforms;

If at any point I want to use a different compiler version, I only have to change it at the top (or use a flag to nix-shell) and it will automagically all just work. Now I can use this sandbox:

[shana@lenalee:~/programming/yukari]$ nix-shell --pure

[nix-shell:~/programming/yukari]$ cat .ghci
:set -isrc -fbreak-on-error
[nix-shell:~/programming/yukari]$ ghci
GHCi, version 7.6.3:  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
package flags have changed, resetting and loading new packages...
Loading package array- ... linking ... done.
Loading package deepseq- ... linking ... done.
Loading package containers- ... linking ... done.
Loading package filepath- ... linking ... done.
Loading package old-locale- ... linking ... done.
Loading package time- ... linking ... done.
Loading package bytestring- ... linking ... done.
Loading package unix- ... linking ... done.
Loading package directory- ... linking ... done.
Loading package old-time- ... linking ... done.
Loading package pretty- ... linking ... done.
Loading package process- ... linking ... done.
Loading package Cabal-1.16.0 ... linking ... done.
Loading package binary- ... linking ... done.
Loading package bin-package-db- ... linking ... done.
Loading package hoopl- ... linking ... done.
Loading package hpc- ... linking ... done.
Loading package template-haskell ... linking ... done.
Loading package ghc-7.6.3 ... linking ... done.
Prelude> :l  Utils.Yukari
[1 of 7] Compiling Utils.Yukari.Types ( src/Utils/Yukari/Types.hs, interpreted )
[2 of 7] Compiling Utils.Yukari.Settings ( src/Utils/Yukari/Settings.hs, interpreted )
[3 of 7] Compiling Utils.Yukari.Parser ( src/Utils/Yukari/Parser.hs, interpreted )
[4 of 7] Compiling Utils.Yukari.Formatter ( src/Utils/Yukari/Formatter.hs, interpreted )
[5 of 7] Compiling Utils.Yukari.Crawler ( src/Utils/Yukari/Crawler.hs, interpreted )
[6 of 7] Compiling Utils.Yukari.Spender ( src/Utils/Yukari/Spender.hs, interpreted )
[7 of 7] Compiling Utils.Yukari     ( src/Utils/Yukari.hs, interpreted )
Ok, modules loaded: Utils.Yukari, Utils.Yukari.Crawler, Utils.Yukari.Formatter, Utils.Yukari.Settings, Utils.Yukari.Spender, Utils.Yukari.Parser, Utils.Yukari.Types.

The --pure stops any ‘globally’ installed tools or packages from polluting the environment which ensures that we only use what we say we do: no surprises because other developer didn’t have ‘somespecialprogram’ installed! Personally I currently use emacs with haskell-mode and I want a REPL in emacs. nix-shell lets you do this. The way I do it is to eval (setq haskell-program-name "nix-repl --pure --command "ghci").

So we managed to sandbox a single project. Cool, but what about if we want to depend on another project? It’s often the case that our project depends on another of our projects which might not be on Hackage or we want to work against dev version or ….

I do this with Haddock: we recently split out haddock parser into a sub-library, ‘haddock-library’. I simply wrote an expression for haddock-library and then import it from haddock expression. Simple:

[shana@lenalee:~/programming/haddock]$ cat default.nix
{ haskellPackages ? (import <nixpkgs> {}).myHaskellPackages_ghcHEAD
, haddockLibrary ? (import /home/shana/programming/haddock/haddock-library
    { haskellPackages = haskellPackages; })

haskellPackages.cabal.mkDerivation (self: {
  pname = "haddock";
  version = "2.15.0";
  src = /home/shana/programming/haddock;
  buildDepends = with haskellPackages;
                   [ Cabal deepseq filepath ghcPaths xhtml haddockLibrary ];
  testDepends = with haskellPackages; [ Cabal deepseq filepath hspec QuickCheck ];
  isLibrary = true;
  isExecutable = true;
  enableSplitObjs = false;
  noHaddock = true;
  doCheck = true;
[shana@lenalee:~/programming/haddock]$ cat haddock-library/default.nix
{ haskellPackages ? (import <nixpkgs> {}).myHaskellPackages_ghc763
  inherit (haskellPackages) cabal deepseq QuickCheck hspec baseCompat;
cabal.mkDerivation (self: {
  pname = "haddock-library";
  version = "1.1.0";
  src = /home/shana/programming/haddock/haddock-library;
  testDepends = [ QuickCheck hspec baseCompat ];
  buildDepends = [ deepseq ];
  isLibrary = true;
  isExecutable = false;
  enableSplitObjs = false;
  doCheck = true;

There are a couple of things going on here. Firstly, you can see that haddock-library by default uses GHC 7.6.3: haskellPackages ? (import <nixpkgs> {}).myHaskellPackages_ghc763. This is fine but when I’m working with Haddock itself, I want to make sure this gets built with same version as haddock, so I have

, haddockLibrary ? (import /home/shana/programming/haddock/haddock-library
    { haskellPackages = haskellPackages; })

This makes sure we use the same set of packages in both so when haddock uses GHC HEAD then so does haddock-library. To nix enthusiasts out there, I’m aware I can use ‘inhert’, just didn’t get around to it.

Now whenever I change things under haddock-library and drop into haddock shell, it will automagically get rebuilt.

Better yet, I do this with GHC itself! If you’ll notice, I’m importing (import <nixpkgs> {}).myHaskellPackages_ghcHEAD. If you look in my nixpkgs config you’ll find some incantations of following nature:

{ pkgs }:

{ packageOverrides = self: with pkgs; rec {

  haskellPackages_ghcHEAD = self.haskell.packages {
    ghcPath = /home/shana/programming/ghc;
    ghcBinary = self.haskellPackages.ghcPlain;
    prefFun = self.haskell.ghcHEADPrefs;


What’s going on here? Well, a few things. First I’m overwriting a thing called ghcPath to /home/shana/programming/ghc. This points to my local GHC HEAD checkout. In there I have another nix expression which describes how to build GHC HEAD. This means that yes, I am able to have Haddock depend on a checkout of GHC itself. Here is that GHC expression in full:

{ pkgs ? (import <nixpkgs> {})
, stdenv ? pkgs.stdenv
, ghc ? pkgs.ghc.ghc782
, perl ? pkgs.perl
, gmp ? pkgs.gmp
, ncurses ? pkgs.ncurses
, happy ? pkgs.haskellPackages.happy
, alex ? pkgs.haskellPackages.alex
, automake ? pkgs.automake
, autoconf ? pkgs.autoconf
, git ? pkgs.git
, libxslt ? pkgs.libxslt
, libxml2 ? pkgs.libxml2
, python ? pkgs.python

stdenv.mkDerivation rec {
  name = "ghc-${version}";
  version = "7.9.20140624";

  src = "/home/shana/programming/ghc";

  buildInputs = [ ghc perl gmp ncurses automake autoconf
                  git happy alex libxslt libxml2 python ];

  enableParallelBuilding = true;

  buildMK = ''
    libraries/integer-gmp_CONFIGURE_OPTS += --configure-option=--with-gmp-libraries="${gmp}/lib"
    libraries/integer-gmp_CONFIGURE_OPTS += --configure-option=--with-gmp-includes="${gmp}/include"
    BuildFlavour = quick

  preConfigure = ''
    echo "${buildMK}" > mk/
    perl boot
    sed -i -e 's|-isysroot /Developer/SDKs/MacOSX10.5.sdk||' configure
  '' + stdenv.lib.optionalString (!stdenv.isDarwin) ''
    export NIX_LDFLAGS="$NIX_LDFLAGS -rpath $out/lib/ghc-${version}"

  configureFlags = "--with-gcc=${stdenv.gcc}/bin/gcc";

  # required, because otherwise all symbols from HSffi.o are stripped, and
  # that in turn causes GHCi to abort
  stripDebugFlags = [ "-S" "--keep-file-symbols" ];

  meta = {
    homepage = "";
    description = "The Glasgow Haskell Compiler";
    maintainers = [
    inherit (ghc.meta) license platforms;

You don’t have to be able to understand this but know that whenever I want to update my GHC HEAD, all I have to do is to update the repository (through usual sync-all GHC script) and then bump up the version in above expression. Now if I go to drop into a nix-shell for Haddock, it will notice the change and build GHC HEAD.

Now to explain another bit of my config:

  myHaskellPackages_ghcHEAD = pkgs.recurseIntoAttrs (haskellPackages_ghcHEAD.override {
    extension = se : su : {
      syb = se.callPackage /home/shana/programming/nixpkgs/pkgs/development/libraries/haskell/syb/0.4.2.nix {};
      vty_5_1_0 = se.callPackage /home/shana/programming/nix-project-defaults/vty/5.1.0.nix {};
      mtl = se.callPackage /home/shana/programming/nix-project-defaults/mtl/2.2.1.nix {};
      testFrameworkSmallcheck =
        se.callPackage /home/shana/programming/nix-project-defaults/test-framework-smallcheck {};

}; }

What I’m doing here is defining or overwriting packages in the Haskell package set: as you can see, I’m defining vty_5_1_0 and setting mtl default to 2.2.1. Why? They were either not at that moment in my version of nixpkgs (my channel hasn’t caught up) or I wanted to use different defaults. It’s as easy as the above. This brings me to the next point.

What happens when nixpkgs doesn’t have something you need?

  1. Create an expression for it. This is as easy as using cabal2nix. If it’s on hackage, it’s even easier:

    [shana@lenalee:~/programming/haddock]$ cabal2nix cabal://text
    { cabal, deepseq, HUnit, QuickCheck, random, testFramework
    , testFrameworkHunit, testFrameworkQuickcheck2
    cabal.mkDerivation (self: {
      pname = "text";
      version = "";
      sha256 = "1yrzg449nbbzh2fb9mdmf2jjfhk2g87kr9m2ibssbsqx53p98z0c";
      buildDepends = [ deepseq ];
      testDepends = [
        deepseq HUnit QuickCheck random testFramework testFrameworkHunit
      meta = {
        homepage = "";
        description = "An efficient packed Unicode text type";
        license = self.stdenv.lib.licenses.bsd3;
        platforms = self.ghc.meta.platforms;
  2. Point to it somehow from your project. Two main ways are to either add it to your package base (as seen in my config snippet) or do it directly from a project (as seen from my haddock expression snippet).

  3. Make a pull request to nixpkgs so everyone can benefit. Please read contribution NixOS wiki page on how to contribute.

So is this better than cabal sandbox? In my opinion, yes, here’s why I think so:

  • Automatically share binary results: are you working with dev version of a library? After you build it once, all your other projects benefit: nix will not rebuild a dependency ‘just because’, it will re-use the binary across all your projects that say they want it! This is already much better than sandboxes where you have to explicitly share.

  • You can specify more than Haskell packages: cabal only allows you to specify Haskell dependencies but what if you require gcc too? Maybe you have development tools like ghc-mod that you want to use. When I wanted to use ghc-mod across projects with multiple GHC versions it was absolute nightmare. nix will let you do this effortlessly be it with Haskell packages or external tools or even Haskell tools which depend on specific versions of GHC. Remember, we can sandbox GHC versions and the tools depending on them.

  • It’s not limited to Haskell software. You can sandbox just about anything you can imagine. You absolutely have to run some PHP script? Sure, if it’s a bit complicated then write a nix expression for it and run. If it’s simple, nix-shell -p php will drop you in a shell with PHP available, automatically pulling in all dependencies. Once you’re done with that environment, no longer dependencies will be removed during garbage collection.

  • Uses binaries whenever available while cabal sandbox will usually leave you waiting for everything to compile.

Even hakyll which is a Haskell program that will generate a page from this Markdown post is going to be used by nix-shell -p haskellPackages_ghc763.ghc -p haskellPackages_ghc763.hakyll --pure: I don’t need it day to day so I’ll just let it get garbage collected at next opportunity.

The downsides of using nix-shell for Haskell projects:

  • It’s a less-documented process. For more complicated setups, it might take a bit of figuring out how to get it to work. An example is me trying to figure out how to get Yi to see its own available libraries at runtime which is required for dynamic reloading &c.

  • The workflow is a bit different from what you might be used to. Currently I’m using eval "$configurePhase" && eval "$buildPhase" in my projects which behind the scenes runs cabal. Note that there are people who use nix and stick with their usual development workflow of using cabal configure/build themselves so it is

  • Rarely it might be necessary to run cabal by hand if your project requires it. My use-case was generating symbols that we get from cabal such as those used by CPP library version pragmas. This is not too common however.

  • There are two places to update when you add/remove dependencies to the project: nix expression and cabal file. I consider this very minor considering it’s probably a word change in each. To be clear, your cabal projects keep their cabal files, using nix does not mean that yourproject.cabal is no longer used.


I’ll give a breakdown of what I like and dislike about nix and NixOS so far.

What I like:

  • NixOS configuration is a pleasure. You no longer have to run around all over your system in hunt of configuration files, you now have a config file that you yourself decide how to split up (if at all) and if you screw anything up, you can always roll back.

  • Packaging software is fairly easy. There are things that are difficult to package but in huge majority of cases, it is a few lines. It’s not terribly difficult to get started.

  • Binaries for Haskell packages that aren’t terribly out of date. Many binary distros out there have outdated Haskell packages if at all. Here there are tools to generate expressions from cabal files so updating is not a chore. If all you’re doing is a version bump then it’s as easy as changing a line or two and making a pull request. Hydra is nearly always churning through new Haskell packages to make sure it’s all up to date with change dependencies.

  • I’m not losing sleep over possibility of cabal hell.

  • I’m not losing days of my life staring at lens or text build.

  • Switching between GHC versions is trivial. In the past I was switching symlinks between GHC versions and carefully sandboxing everything. While it worked for development, it certainly did not work for anything using currently-active package databases (ghc-mod anyone?).

  • I don’t have to think about things clashing. If one project wants text-0.11, another -1.0 and another -dev999999Ijustmadeachange then there’s no real hassle on my part.

  • Easy deployment. If you’re a company, you can set up Hydra to build your software for you. If you’re a sysadmin, you can install nix and your users are able to install they software they want without bothering you: nix allows regular users to install software into their profile.

  • You can roll-back through upgrades whether it be system upgrades or user profile upgrades. Every time you run nix-env -i to install a package, a new generation is created so you can roll-back later if you need to.

What I dislike:

  • The documentation is a bit scarce. I end up having to look through package or NixOS module sources more than I’d like to.

  • The nix expression language is not statically typed yet and error messages are often complete ass.

  • On more popular distros, often one can use search engines to find people who already had the problem. On NixOS such information sometimes just does not exist. I have been relentlessly posting to the nix mailing list to hopefully change this a bit and to actually find out what I wanted.

  • One has to either disallow unfree packages completely or allow them. It’s not possible to say that we’re OK with pulling in unfree nvidia drivers but other than that we want nothing unfree on our system.

  • I’m used to being able to customise each software package in 50 different ways. Often in nixpkgs the package maintainers don’t take time to expose various options. To follow up, Hydra only builds packages with the default flags. The current hack is to define multiple packages with different default flags.

  • Pull requests in certain areas can take a longer time and/or some reminding before getting merged. Haskell-related PRs get merged quickly however.

  • The package managment tools are not as up to scratch as they are on older distributions. Gentoo has some great tooling. I put it towards young age of the distribution.

  • Getting fixes from nixpkgs newer than your channel is a bit of a pain. You either check out the appropriate commit and apply patches on top or rebuild half of your system. I used to run against HEAD version of nixpkgs and found myself compiling a lot of stuff because Hydra didn’t build it yet. I recommend nixos-unstable channel which is usually not far behind HEAD.

  • systemd

  • There’s a policy to only keep latest versions of software around unless it’s necessary to have more. This means that when you generate a nix expression from cabal file, it will try to use the defaults in nixpkgs rather than specific versions. While I dislike this quite a bit, there are a few things that can be done to keep things sane:

    • When you really need an older version, you can explicitly refer to it if in nixpkgs or refer to your local expression if it isn’t in nixpkgs

    • If the package works with a newer version and it’s just the case of a bump in cabal file, you can set ‘jailbreak = true’ which ignores what cabal says about versions.

    • Many Haskell packages already have multiple versions available so I find that in practice it is not a huge worry anyway. I initially feared (and still do a bit) a horrible version mess but it seems to well enough.

  • There are no binaries yet for 7.8.2 of package versions which means if you use those, you’ll have to wait a bit while they build just like you would have to with cabal install anyway. This is only temporary but might be a slight annoyance if you’re expecting binaries for those. I think building binaries for 7.8 will be switched on with 7.8.3 out but this is speculation.

  • It can be a bit disk-space heavy because we potentially hold onto many versions of the same package, just built with different dependencies. The two ways to save space are: optimise your store which uses hardlinks for identical files to save space (saves GBs) and garbage-collect which removes software that is no longer dependended on. Even after you say that you no longer want some software with nix-env -e, it stays on your system until garbage-collected.

It may look like there are many dislikes but they are mostly annoyances or my incompetence. I definitely would recommend nix or even NixOS if you are already considering a switch. I have to say that I can not recommend switching to NixOS if you need your machine in development-ready mode next morning because it can take a few days to get everything going just the way you need it. I don’t have this worry with nix itself however which you can install alongside your distro. If you’re a working man, I believe you could set up NixOS in a VM first and then simply carry over the config when you have everything ready.

In general I find that I never worry whether my package database will screw up or anything like that.

If you’re interested, please swing by #nixos on Freenode. This and the mailing list is the majority of my help has been coming from.

It’s a bit of a hectic post so please feel free to contact me if you have questions and I’ll try to answer to best of my knowledge. Note that I’ll almost certainly not see and/or not reply to questions on reddit if this is to find its way there, sorry.

June 28, 2014 08:14 AM

Danny Gratzer

Some Useful Agda

Posted on June 28, 2014

I’ve been using Agda for a few months now. I’ve always meant to figure out how it handles IO but never have.

Today I decided to change that! So off I went to the related Agda wiki page. So hello world in Agda apparently looks like this

    open import IO
    main = run (putStrLn "test")

The first time I tried running this I got an error about an IO.FFI, if you get this you need to go into your standard library and run cabal install in the ffi folder.

Now, on to what this actually does. Like Haskell, Agda has an IO monad. In fact, near as I can tell this isn’t a coincidence at all, Agda’s primitive IO seems to be a direct call to Haskell’s IO.

Unlike Haskell, Agda has two IO monads, a “raw” primitive one and a higher level pure one found in IO.agda. What few docs there are make it clear that you are not intended to write the “primitive IO”.

Instead, one writes in this higher level IO monad and then uses a function called run which converts everything to the primitive IO.

So one might ask: what exactly is this strange IO monad and how does it actually provide return and >>=? Well the docs don’t actually seem to exist so poking about the source reveals

    data IO {a} (A : Set a) : Set (suc a) where
      lift   : (m : Prim.IO A) → IO A
      return : (x : A) → IO A
      _>>=_  : {B : Set a} (m : ∞ (IO B)) (f : (x : B) → ∞ (IO A)) → IO A
      _>>_   : {B : Set a} (m₁ : ∞ (IO B)) (m₂ : ∞ (IO A)) → IO A

Wow.. I don’t know about you, but this was a bit different than I was expecting.

So this actually just forms a syntax tree! There’s something quite special about this tree though, those ∞ annotations mean that it’s a “coinductive” tree. So we can construct infinite IO tree. Otherwise it’s just a normal tree.

Right below that in the source is the definition of run

    run : ∀ {a} {A : Set a} → IO A → Prim.IO A
    run (lift m)   = m
    run (return x) = Prim.return x
    run (m  >>= f) = Prim._>>=_ (run (♭ m )) λ x → run (♭ (f x))
    run (m₁ >> m₂) = Prim._>>=_ (run (♭ m₁)) λ _ → run (♭ m₂)

So here’s where the evilness comes in! We can loop forever transforming our IO into a Prim.IO.

Now I had never used Agda’s coinductive features before and if you haven’t either than they’re not terribly complicated.

is a prefix operator that stands for a “coinductive computation” which is roughly a thunk. is a prefix operator that delays a computation and forces it.

There are reasonably complex rules that govern what qualifies as a “safe” way to force things. Guarded recursion seems to always work though. So we can write something like

    open import Coinduction
    open import Data.Unit
    data Cothingy (A : Set) : Set where
      conil  : Cothingy A
      coCons : A → ∞ (Cothingy A) → Cothingy A
    lotsa-units : Cothingy ⊤
    lotsa-units = coCons tt (♯ lotsa-units)

Now using ♯ we can actually construct programs with infinite output.

    forever : IO ⊤
    forever = ♯ putStrLn "Hi" >> ♯ forever

    main = run forever

This when run will output “Hi” forever. This is actually quite pleasant when you think about it! You can view you’re resulting computation as a normal, first class data structure and then reify it to actual computations with run.

So with all of this figured out, I wanted to write a simple program in Agda just to make sure that I got it all.


I decided to write the fizz-buzz program. For those unfamiliar, the specification of the program is

For each of the numbers 0 to 100, if the number is divisible by 3 print fizz, if it’s divisible by 5 print buzz, if it’s divisible by both print fizzbuzz. Otherwise just print the number.

This program is pretty straightforward. First, the laundry list of imports

    module fizzbuzz where
    import Data.Nat        as N
    import Data.Nat.DivMod as N
    import Data.Nat.Show   as N
    import Data.Bool       as B
    import Data.Fin        as F
    import Data.Unit       as U
    import Data.String     as S
    open import Data.Product using (_,_ ; _×_)
    open import IO
    open import Coinduction
    open import Relation.Nullary
    open import Function

This seems to be the downside of finely grained modules.. Tons and tons of imports.

Now we need a function which takes to ℕs and returns true if the first mod the second is zero.

    congruent : N.ℕ → N.ℕ → B.Bool
    congruent n    = B.false
    congruent n (N.suc m) with N._≟_ 0 $ F.toℕ (N._mod_ n (N.suc m) {})
    ... | yes _ = B.true
    ... | no  _ = B.false

Now from here we can combine this into the actual worker for the program

    _and_ : {A B : Set} → A → B → A × B
    _and_ = _,_
    fizzbuzz : N.ℕ → S.String
    fizzbuzz    = "fizzbuzz"
    fizzbuzz n with congruent n 3 and congruent n 5
    ... | B.true  , B.true   = "fizzbuzz"
    ... | B.true  , B.false  = "fizz"
    ... | B.false , B.true   = "buzz"
    ... | B.false , B.false  = n

Now all that’s left is the IO glue

    worker : N.ℕ → IO U.⊤
    worker    = putStrLn $ fizzbuzz
    worker (N.suc n) = ♯ worker n >> ♯ putStrLn (fizzbuzz $ N.suc n)
    main = run $ worker 100

There. A somewhat real, IO based program written in Agda. It only took me 8 months to figure out how to write it :)

<script type="text/javascript"> /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */ var disqus_shortname = 'codeco'; // required: replace example with your forum shortname /* * * DON'T EDIT BELOW THIS LINE * * */ (function() { var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; dsq.src = '//' + disqus_shortname + ''; (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq); })(); </script> <noscript>Please enable JavaScript to view the comments powered by Disqus.</noscript> comments powered by Disqus

June 28, 2014 12:00 AM

June 27, 2014

Bill Atkins

Unit Testing in Swift

Since Swift was released at the beginning of the month, I've been doing using it for most of my iOS development. It's been a pleasant experience: I've been able to discard huge amounts of boilerplate and take advantage of a few functional programming techniques that were previously unavailable on the iPhone and iPad.

One area where Swift has made huge improvements over Objective-C is unit tests. Objective-C's verbosity made it difficult to create small, focused classes to perform specific tasks. Plus, the language's insistence on keeping only one class to a file and the cumbersome pairing of every implementation file with a header imposed a hefty penalty on programmers who tried to divide their work up into discrete, testable components.

Unit testing in Swift is done with the same XCTest framework introduced back in Xcode 5 for Objective-C. But Swift's concision and its inclusion of modern language features like closures makes XCTest much more pleasant than it was to use under Objective-C. We'll walk through a very simple example of Swift unit testing below.

To get started, create an empty iOS Application project in Xcode called Counter. Xcode will generate a CounterTests folder for you and an associated test target.

First, let's create a simple class to be tested. Create the file "Counter.swift" and add the following code to it:

import Foundation

class Counter {
  var count: Int
  init(count: Int) {
    self.count = count
  convenience init() {
    self.init(count: 0)
  func increment() {


This is a very simple class, but it will be enough to illustrate how to use XCTest to test your own Swift code.

Create a file called "CounterTest.swift" in the CounterTests folder Xcode generated for you (this simple test will be your "Hello, world" for Swift testing):

import XCTest
import Counter

class CounterTest: XCTestCase {
  func testSimpleAddition() {
    let counter = Counter()
    XCTAssertEqual(0, counter.count)


NOTE: In the current version of Swift (Beta 2), you have to import your main target into the test target to get your tests to compile and run. This is why we import Counter at the top.

NOTE: I've seen a few Swift tutorials recommend that you use the built-in Swift function assert in your test cases - do not do this! assert will terminate your entire program if it fails. Using the XCTAssert functions provides a number of important benefits:

  • If one test case fails, your other cases can continue running; assert stops the entire program.
  • Because the XCTAssert functions are more explicit about what you're expecting, they can print helpful failure messages (e.g. "2 was not equal to 3") whereas assert can only report that its condition was false. There's a broad variety of assert functions, including XCTAssertLessThan, XCTAssertNil, etc.
  • The Swift language specification explicitly forbids string interpolation in the message passed to assert; the XCTAssert functions don't face this limitation.
To try your test code out, click "Test" on the "Product" menu. Your single test should pass.

We'll add two more test cases to create and exercise several instances of Counter and to ensure that the counter wraps around when it overflows:

import XCTest
import Test

class CounterTest: XCTestCase {
  func testInvariants() {
    let counter = Counter()
    XCTAssertEqual(0, counter.count, "Counter not initialized to 0")
    XCTAssertEqual(1, counter.count, "Increment is broken")

    XCTAssertEqual(1, counter.count, "Count has unwanted side effects!")
  func testMultipleIncrements() {
    let counts = [1, 2, 3, 4, 5, 6]
    for count in counts {
      let counter = Counter()
      for i in 0..count {
      XCTAssertEqual(counter.count, count, "Incremented value does not match expected")
  func testWraparound() {
    let counter = Counter(count: Int.max)
    XCTAssertEqual(counter.count, Int.min)

These tests should pass as well.

You can find out more about XCTest in the Apple guide "Testing with Xcode." I hope this was helpful - please feel free to comment if anything is unclear.

by More Indirection ( at June 27, 2014 03:24 AM

Philip Wadler

Propositions as Types, updated

Propositions as Types has been updated. Thanks to all the readers and reviewers who helped me improve the paper.

Propositions as Types
Philip Wadler
Draft, 26 June 2014

The principle of Propositions as Types links logic to computation. At first sight it appears to be a simple coincidence---almost a pun---but it turns out to be remarkably robust, inspiring the design of theorem provers and programming languages, and continuing to influence the forefronts of computing. Propositions as Types has many names and many origins, and is a notion with depth, breadth, and mystery.
Comments still solicited!

by Philip Wadler ( at June 27, 2014 01:42 AM