Planet Haskell

February 21, 2017

Mark Jason Dominus

I found the best anagram in English

I planned to publish this last week sometime but then I wrote a line of code with three errors and that took over the blog.

A few years ago I mentioned in passing that in the 1990s I had constructed a listing of all the anagrams in Webster's Second International dictionary. (The Webster's headword list was available online.)

This was easy to do, even at the time, when the word list itself, at 2.5 megabytes, was a file of significant size. Perl and its cousins were not yet common; in those days I used Awk. But the task is not very different in any reasonable language:

  # Process word list
  while (my $word = <>) {
    chomp $word;
    my $sorted = join "", sort split //, $word;  # normal form
    push @{$anagrams{$sorted}}, $word;
  }

  for my $words (values %anagrams) {
    print "@$words\n" if @$words > 1;
  }

The key technique is to reduce each word to a normal form so that two words have the same normal form if and only if they are anagrams of one another. In this case we do this by sorting the letters into alphabetical order, so that both megalodon and moonglade become adeglmnoo.

Then we insert the words into a (hash | associative array | dictionary), keyed by their normal forms, and two or more words are anagrams if they fall into the same hash bucket. (There is some discussion of this technique in Higher-Order Perl pages 218–219 and elsewhere.)
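The same bucketing translates directly to Haskell. Here is a minimal sketch (my own, not code from the original post), using Data.Map as the associative array:

```haskell
import Data.List (sort)
import qualified Data.Map.Strict as M

-- Bucket each word under its normal form: the word's letters in
-- alphabetical order.  Two words are anagrams iff they share a bucket.
anagramSets :: [String] -> [[String]]
anagramSets ws =
  filter ((> 1) . length) $
    M.elems (M.fromListWith (flip (++)) [ (sort w, [w]) | w <- ws ])
```

For example, `anagramSets ["megalodon", "moonglade", "zebra"]` produces `[["megalodon","moonglade"]]`.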

(The thing you do not want to do is to compute every permutation of the letters of each word, looking for permutations that appear in the word list. That is akin to sorting a list by computing every permutation of the list and looking for the one that is sorted. I wouldn't have mentioned this, but someone on StackExchange actually asked this question.)

Anyway, I digress. This article is about how I was unhappy with the results of the simple procedure above. From the Webster's Second list, which contains about 234,000 words, it finds about 14,000 anagram sets (some with more than two words), consisting of 46,351 pairs of anagrams. The list starts with

aal ala

and ends with

zolotink zolotnik

which exemplify the problems with this simple approach: many of the 46,351 anagrams are obvious, uninteresting or even trivial. There must be good ones in the list, but how to find them?

I looked in the list to find the longest anagrams, but they were also disappointing:

cholecystoduodenostomy duodenocholecystostomy

(Webster's Second contains a large amount of scientific and medical jargon. A cholecystoduodenostomy is a surgical operation to create a channel between the gall bladder (cholecysto-) and the duodenum (duodeno-). A duodenocholecystostomy is the same thing.)

This example made clear at least one of the problems with boring anagrams: it's not that they are too short, it's that they are too simple. Cholecystoduodenostomy and duodenocholecystostomy are 22 letters long, but the anagrammatic relation between them is obvious: chop cholecystoduodenostomy into three parts:

cholecysto duodeno stomy

and rearrange the first two:

duodeno cholecysto stomy

and there you have it.

This gave me the idea to score a pair of anagrams according to how many chunks one had to be cut into in order to rearrange it to make the other one. On this plan, the “cholecystoduodenostomy / duodenocholecystostomy” pair would score 3, just barely above the minimum possible score of 2. Something even a tiny bit more interesting, say “abler / blare” would score higher, in this case 4. Even if this strategy didn't lead me directly to the most interesting anagrams, it would be a big step in the right direction, allowing me to eliminate the least interesting.

This rule would judge both “aal / ala” and “zolotink / zolotnik” as being uninteresting (scores 2 and 4 respectively), which is a good outcome. Note that some other boring-anagram problems can be seen as special cases of this one. For example, short anagrams never need to be cut into many parts: no four-letter anagrams can score higher than 4. The trivial anagramming of a word to itself always scores 1, and nontrivial anagrams always score more than this.

So what we need to do is: for each anagram pair, say acrididae (grasshoppers) and cidaridae (sea urchins), find the smallest number of chunks into which we can chop acrididae so that the chunks can be rearranged into cidaridae.

One could do this with a clever algorithm, if one were available. There is a clever algorithm, based on finding maximal independent sets in a certain graph. (More about this tomorrow.) I did not find this algorithm at the time; nor did I try. Instead, I used a brute-force search. Or rather, I used a very small amount of cleverness to reduce the search space, and then used brute-force search to search the reduced space.

Let's consider an example, scoring the anagram “abscise / scabies”. You do not have to consider every possible permutation of abscise. Rather, there are only two possible mappings from the letters of abscise to the letters of scabies. You know that the C must map to the C, the A must map to the A, and so forth. The only question is whether the first S of abscise maps to the first or to the second S of scabies. The first mapping gives us:

ab sc i s e

(five chunks) and the second gives us

ab s c i s e

(six chunks) because the S and the C no longer go to adjoining positions. So the minimum number of chunks is 5, and this anagram pair gets a score of 5.
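The scoring search can be sketched in a few lines of Haskell. This is a naive reconstruction of the idea (the original code is lost): it tries every permutation of positions and keeps only the letter-preserving ones, so unlike the real search it does not restrict itself to consistent mappings up front, and is practical only for short words.

```haskell
import Data.List (permutations)

-- Minimum number of chunks over all letter-preserving mappings from
-- positions of `a` to positions of `b`.  Exponential brute force:
-- fine for short words, hopeless for cholecystoduodenostomy.
chunkScore :: String -> String -> Int
chunkScore a b =
  minimum [ chunks p | p <- permutations [0 .. length b - 1]
                     , and [ b !! j == c | (c, j) <- zip a p ] ]
  where
    -- a new chunk starts wherever the mapped positions stop being consecutive
    chunks p = 1 + length [ () | (i, j) <- zip p (tail p), j /= i + 1 ]
```

On the examples above, `chunkScore "aal" "ala"` is 2, `chunkScore "abler" "blare"` is 4, and `chunkScore "abscise" "scabies"` is 5.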

To fully analyze cholecystoduodenostomy by this method required considering 7680 mappings. (120 ways to map the five O's × 2 ways to map the two C's × 2 ways to map the two D's, etc.) In the 1990s this took a while, but not prohibitively long, and it worked well enough that I did not bother to try to find a better algorithm. In 2016 it would probably still run quicker than implementing the maximal independent set algorithm. Unfortunately I have lost the code that I wrote then so I can't compare.

Assigning scores in this way produced a scored anagram list which began

2 aal ala

and ended

4 zolotink zolotnik

and somewhere in the middle was

3 cholecystoduodenostomy duodenocholecystostomy

all poor scores. But sorted by score, there were treasures at the end, and the clear winner was

14 cinematographer megachiropteran

I declare this the single best anagram in English. It is 15 letters long, and the only letters that stay together are the E and the R. “Cinematographer” is as familiar as a 15-letter word can be, and “megachiropteran” means a giant bat. GIANT BAT! DEATH FROM ABOVE!!!

And there is no serious competition. There was another 14-pointer, but both its words are Webster's Second jargon that nobody knows:

14 rotundifoliate titanofluoride

There are no score 13 pairs, and the score 12 pairs are all obscure. So this is the winner, and a deserving winner it is.

I think there is something in the list to make everyone happy. If you are the type of person who enjoys anagrams, the list rewards casual browsing. A few examples:

7 admirer married
7 admires sidearm

8 negativism timesaving
8 peripatetic precipitate
8 scepters respects
8 shortened threnodes
8 soapstone teaspoons

9 earringed grenadier
9 excitation intoxicate
9 integrals triangles
9 ivoriness revisions
9 masculine calumnies

10 coprophagist topographics
10 chuprassie haruspices
10 citronella interlocal

11 clitoridean directional
11 dispensable piebaldness
11 endometritria intermediator

“Clitoridean / directional” has been one of my favorites for years. But my favorite of all, although it scores only 6, is

6 yttrious touristy

I think I might love it just because the word yttrious is so delightful. (What a debt we owe to Ytterby, Sweden!)

I also rather like

5 notaries senorita

which shows that even some of the low-scorers can be worth looking at. Clearly my chunk score is not the end of the story, because “notaries / senorita” should score better than “abets / baste” (which is boring) or “Acephali / Phacelia” (whatever those are), also 5-pointers. The length of the words should be worth something, and the familiarity of the words should be worth even more.

Here are the results.

In former times there was a restaurant in Philadelphia named “Soupmaster”. My best unassisted anagram discovery was noticing that this is an anagram of “mousetraps”.

[ Addendum: There is a followup article, which will become available on 22 February 2017. ]

by Mark Dominus at February 21, 2017 03:31 PM

Brent Yorgey

Signed sets and ballots, part 1

The other day, Anders Claesson wrote a very nice blog post explaining a more combinatorial way to understand multiplicative inverses of virtual species (as opposed to the rather algebraic way I explained it in my previous post). In the middle of his post he makes an offhanded assumption which I stubbornly refused to take at face value; after thinking about it for a while and discussing it with Anders, I’m very glad I did, because there’s definitely more going on here than meets the eye and it’s given me a lot of interesting things to think about.

Recall that E denotes the species of sets, defined by E[U] = \{U\}, that is, the only E-structure on a given label set U is the set of labels itself. Recall also that the exponential generating function of a species F is given by

\displaystyle F(x) = \sum_{n \geq 0} f_n \frac{x^n}{n!}

where f_n counts the number of labelled F-structures of size n. In the case of E, we have f_n = 1 for all n, so

\displaystyle E(x) = \sum_{n \geq 0} \frac{x^n}{n!} = e^x.

(This is why E is such a good name for the species of sets—though in a fantastic coincidence, it seems to originally come from the French word for set, ensemble, rather than from the fact that E(x) = e^x (though on the other hand calling it a “coincidence” is probably too strong, since Joyal must surely have picked the notation with the generating function already in mind!).)

Now, from my previous post we know that

\displaystyle E^{-1} = (1 + E_+)^{-1} = \sum_{k \geq 0} (-1)^k (E_+)^k.

Let’s first consider \sum_k (E_+)^k (without the (-1)^k). This means that we have, for some k \geq 0, a k-ary product of E_+ structures—in other words, a list of nonempty sets. This is the species of ballots, also known as ordered set partitions, and can also be written L \circ E_+. As an example, here is a ballot on the set of labels \{1, \dots, 8\}:

The order of the parts matters, so this is a different ballot:

But the order of labels within each part doesn’t matter (since each part is a set). As another example, here is the complete collection of ballot structures on the labels \{1,2,3\}:

We can see that there are 13 in total: six where the labels are each in their own separate part (corresponding to the six possible permutations of the labels); six where two labels share a part and the other label is a singleton part (corresponding to the three ways to choose the solitary label, times the two ways to order the parts); and one final ballot where all three labels are grouped in the same part. (As an exercise, can you verify that there are 75 different ballot structures on a set of four labels?)
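As a sanity check on these counts, here is a brute-force Haskell enumeration (a quick sketch of mine, not from the post):

```haskell
-- Enumerate ballots (ordered set partitions) of a list of labels.
-- Each part is kept as a list, but we only ever choose subsets (never
-- reorder within a part), so part-internal order does not matter.
ballots :: [a] -> [[[a]]]
ballots [] = [[]]
ballots xs =
  [ part : rest | (part, others) <- splits xs, not (null part)
                , rest <- ballots others ]
  where
    -- all ways to split the labels into a first part and the remainder
    splits []       = [([], [])]
    splits (y : ys) = [ (y : p, q) | (p, q) <- splits ys ]
                   ++ [ (p, y : q) | (p, q) <- splits ys ]
```

Indeed `length (ballots [1..3])` is 13, and `length (ballots [1..4])` is 75.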

Returning to E^{-1} = \sum_k (-1)^k (E_+)^k, we can see that it consists of signed ballots, where the sign of a ballot is the parity of its number of parts, that is, a ballot with k parts has sign (-1)^k. The second half of Anders’ post gives a nice combinatorial proof that E \cdot E^{-1} = 1, via a sign-reversing involution: if we consider E \cdot E^{-1}-structures, i.e. pairs of sets and signed ballots, there is a natural1 way to pair them up, matching positive and negative structures so everything cancels (except in the case of the empty label set, which is why we get 1 instead of 0).

However, Anders is trying to do more than that. Note first that since multiplication of EGFs corresponds to multiplication of species, the EGF for E^{-1} is of course 1/e^x = e^{-x}. But this ought to also be the EGF for the virtual species E(-X), and the rest of his post hinges on identifying E(-X) and E^{-1}. As Anders and I discovered, however, this is precisely the point where it is worth being more careful.

First of all, what is E(-X)? Intuitively, an E(-X) structure consists of a set of negative atoms; since each set can be thought of as an (unordered) product of atoms, the whole set acquires a sign given by the parity of the number of atoms. In other words, intuitively it seems that E(-X) should be the species of signed sets, where an even-sized set is considered positive and an odd-sized set negative. That is,

\displaystyle E(-X) = \sum_{n \geq 0} (-1)^n E_n,

where E_n denotes the species of sets of size exactly n. As a sanity check, this makes sense as an EGF equation too, since the EGF of E_n is just x^n/n! and indeed

\displaystyle e^{-x} = \sum_{n \geq 0} \frac{(-x)^n}{n!} = \sum_{n \geq 0} (-1)^n \frac{x^n}{n!}.

But hold on a minute, what does E(-X) really mean, formally? It is the composition of the species E with the virtual species -X, and it turns out that it is not at all a priori obvious how to define composition for virtual species! We can find the definition on p. 127 of Bergeron et al. A special case (which is enough for our present purposes) is

\displaystyle \Phi(X - Y) = \Phi(X + Y) \times (E(X)E^{-1}(Y))

where X and Y are two sorts of atoms, and \times denotes Cartesian product of species. In our case,

\displaystyle E(0 - X) = E(0 + X) \times ((E(0) E^{-1}(X)) = E(X) \times E^{-1}(X) = E^{-1}(X)

since E is the identity for Cartesian product (overlaying an additional E structure on a set of labels does not add any structure, since there is only one possible E-structure).

All of this is to say, E(-X) is actually defined as E^{-1}! So at first glance it may seem we actually have nothing to prove: E(-X) and E^{-1} are the same by definition, end of story. …but in fact, all we have done is shift the burden of proof elsewhere: now it is our intuitive idea of E(-X) representing signed sets that requires proof!

To sum up, we know that E(-X) = E^{-1} = \sum_k (-1)^k (E_+)^k is the species of signed ballots, with sign given by parity of the number of parts; and intuitively, we also believe that E(-X) should correspond to parity-signed sets, \sum_n (-1)^n E_n. So, is there a nice combinatorial proof showing the correspondence between signed sets and signed ballots?

One can use the law of excluded middle to show that the answer must be “yes”: suppose the answer were “no”; but then I would not be writing this blog post, which is a contradiction since I am writing this blog post. But is there a constructive proof? Fear not! This blog post has gotten long enough, so I will stop here for now and let interested readers puzzle over it; in my next post I will explain what I came up with, along with some musings on linear orders and naturality.

  1. I am indeed using the word natural in a technical, categorical sense here! This will play an important role in my second post…

by Brent at February 21, 2017 03:42 AM

February 20, 2017

Ken T Takusagawa

[uyyrhizz] IDE type annotations

Ideas for a desirable feature of a Haskell IDE:

Good: IDE pops up the type signature of the library function or symbol under the point.  Emacs haskell-mode can do this.

Better: IDE is aware of static scoping, let binding, and imports to really know what function you are referring to.  However, if you forgot to import, it still tries to be helpful, guessing at a library function and offering its signature as well as a reminder that you need to import it.

Better: If the function does not have an explicit type signature, the IDE does type inference to figure it out.

Better: if the type is polymorphic, the IDE also provides the type of the function as instantiated where it is used, instead of just the polymorphic type where it was declared.
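As a concrete illustration of that last point (my example, not from the post), the declared type and the use-site type can differ substantially:

```haskell
-- Declared, polymorphic type:
--   map :: (a -> b) -> [a] -> [b]
--
-- At this call site, an IDE aware of the instantiation could instead report:
--   map :: (Int -> String) -> [Int] -> [String]
rendered :: [String]
rendered = map show ([1, 2, 3] :: [Int])
```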

by Ken at February 20, 2017 05:08 PM

Gabriel Gonzalez

The Curry-Howard correspondence between programs and proofs


This post will explain the connection between programming languages and logical proofs, known as the Curry-Howard correspondence. I will provide several examples of this correspondence to help you build a working intuition for how these two fields relate to one another.

The Curry-Howard correspondence states that:

  • Logical propositions correspond to programming types
  • Logical proofs correspond to programming values
  • Proving a proposition corresponds to creating a value of a given type

I'll use my Dhall programming language to illustrate the above connections since Dhall is an explicitly typed total programming language without any escape hatches. If you're not familiar with Dhall, you can read the Dhall tutorial; I will also make analogies to Haskell along the way.


Let's begin with the following logical proposition:

∀(a ∈ Prop): a ⇒ a

You can read that as "for any proposition (which we will denote using the letter a), if the proposition a is true then the proposition a is true". This is true no matter what the proposition a is. For example, suppose that a were the following proposition:

a = {The sky is blue}

Then we could conclude that "if the sky is blue then the sky is blue":

{The sky is blue} ⇒ {The sky is blue}

Here the right double arrow (⇒) denotes logical implication. Anywhere you see (A ⇒ B) you should read that as "if the proposition A is true then the proposition B is true". You can also say "the proposition A implies the proposition B" or just "A implies B" for short.

However, what if a were another proposition like:

a = {The sky is green}

giving us:

{The sky is green} ⇒ {The sky is green}

This is true, even though the sky might not be green. We only state that "if the sky is green then the sky is green", which is definitely a true statement, whether or not the sky is green.

The upside down A (i.e. ∀) in our original proposition is mathematical shorthand that means "for all" (although I sometimes describe this as "for any"). The symbol ∈ is shorthand for "in" (i.e. set membership). So whenever you see this:

∀(a ∈ S): p

... you can read that as "for any element a in the set S the proposition p is true". Usually the set S is Prop (i.e. the set of all possible propositions) but we will see some examples below where S can be another set.


This logical proposition:

∀(a ∈ Prop): a ⇒ a

... corresponds to the following Dhall type:

∀(a : Type) → a → a

... which you can read as "for any type (which we will denote using the letter a), this is the type of a function that transforms an input of type a to an output of type a". Here's the corresponding function that has this type:

λ(a : Type) → λ(x : a) → x

This is an anonymous function of two arguments:

  • the first argument is named a and a is a Type
  • the second argument is named x and x has type a

The result is the function's second argument (i.e. x) which still has type a.

The equivalent Haskell function is id, the polymorphic identity function:

id :: a -> a
id x = x

-- or using the `ExplicitForAll` language extension
id :: forall a . a -> a
id x = x

The main difference is that Haskell does not require you to explicitly bind the polymorphic type a as an additional argument. Dhall does require this because Dhall is an explicitly typed language.


The Curry-Howard correspondence says that we can use the type checker of a programming language as a proof checker. Any time we want to prove a logical proposition, we:

  • translate the logical proposition to the corresponding type in our programming language
  • create a value in our programming language that has the given type
  • use the type-checker to verify that our value has the specified type

If we find a value of the given type then that completes the proof of our original proposition.

For example, if we want to prove this proposition:

∀(a ∈ Prop): a ⇒ a

... then we translate the proposition to the corresponding type in the Dhall programming language:

∀(a : Type) → a → a

... then define a value in the Dhall language that has this type:

λ(a : Type) → λ(x : a) → x

... and then use Dhall's type checker to verify that this value has the given type:

$ dhall <<< 'λ(a : Type) → λ(x : a) → x'
∀(a : Type) → ∀(x : a) → a

λ(a : Type) → λ(x : a) → x

The first line is the inferred type of our value and the second line is the value's normal form (which in this case is the same as the original value).

Note that the inferred type slightly differs from the original type we expected:

-- What we expected:
∀(a : Type) → a → a

-- What the compiler inferred:
∀(a : Type) → ∀(x : a) → a

These are both still the same type. The difference is that the compiler's inferred type also includes the name of the second argument: x. If we were to translate this back to logical proposition notation, we might write:

∀(a ∈ Prop): ∀(x ∈ a): a

... which you could read as: "for any proposition named a, given a proof of a named x, the proposition a is true". This is equivalent to our original proposition but now we've given a name (i.e. x) to the proof of a.

The reason this works is that you can think of a proposition as a set, too, where the elements of the set are the proofs of that proposition. Some propositions are false and have no elements (i.e. no valid proofs), while other propositions can have multiple elements (i.e. multiple valid proofs).

Similarly, you can think of a type as a set, where the elements of the set are the values that have that type. Some types are empty and have no elements (i.e. no values of that type), while other types can have multiple elements (i.e. multiple values of that type).

Here's an example of a proposition that has multiple valid proofs:

∀(a ∈ Prop): a ⇒ a ⇒ a

The corresponding type is:

∀(a : Type) → a → a → a

... and there are two values that have the above type. The first value is:

λ(a : Type) → λ(x : a) → λ(y : a) → x

... and the second value is:

λ(a : Type) → λ(x : a) → λ(y : a) → y

We can even translate these two values back into informal logical proofs. For example, this value:

λ(a : Type) → λ(x : a) → λ(y : a) → x

... corresponds to this informal proof:

For each "a" in the set of all propositions:

Given a proof that "a" is true [Let's call this proof "x"]

Given a proof that "a" is true [Let's call this proof "y"]

We can conclude that "a" is true, according to the proof "x"


Similarly, this value:

λ(a : Type) → λ(x : a) → λ(y : a) → y

... corresponds to this informal proof:

For each "a" in the set of all propositions:

Given a proof that "a" is true [Let's call this proof "x"]

Given a proof that "a" is true [Let's call this proof "y"]

We can conclude that "a" is true, according to the proof "y"


We can actually give these proofs a formal structure that parallels our code but that is outside of the scope of this post.
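In Haskell, these two proofs correspond to two distinct inhabitants of the same polymorphic type (essentially const and flip const); a brief sketch:

```haskell
-- Two distinct inhabitants of a -> a -> a, corresponding to the
-- two proofs of ∀(a ∈ Prop): a ⇒ a ⇒ a.
firstProof :: a -> a -> a
firstProof x _ = x   -- "conclude a according to the proof x"

secondProof :: a -> a -> a
secondProof _ y = y  -- "conclude a according to the proof y"
```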

Function composition

Now consider this more complex proposition:

∀(a ∈ Prop):
∀(b ∈ Prop):
∀(c ∈ Prop):
(b ⇒ c) ⇒ (a ⇒ b) ⇒ (a ⇒ c)

You can read that as saying: "for any three propositions named a, b, and c, if b implies c, then if a implies b, then a implies c".

The corresponding type is:

    ∀(a : Type)
→   ∀(b : Type)
→   ∀(c : Type)
→   (b → c) → (a → b) → (a → c)

... and we can define a value that has this type in order to prove that the proposition is true:

    λ(a : Type)
→   λ(b : Type)
→   λ(c : Type)
→   λ(f : b → c)
→   λ(g : a → b)
→   λ(x : a)
→   f (g x)

... and the compiler will infer that our value has the correct type:

$ dhall
    λ(a : Type)
→   λ(b : Type)
→   λ(c : Type)
→   λ(f : b → c)
→   λ(g : a → b)
→   λ(x : a)
→   f (g x)
∀(a : Type) → ∀(b : Type) → ∀(c : Type) → ∀(f : b → c) → ∀(g : a → b) → ∀(x : a) → c

λ(a : Type) → λ(b : Type) → λ(c : Type) → λ(f : b → c) → λ(g : a → b) → λ(x : a) → f (g x)

Note that this function is equivalent to Haskell's function composition operator (i.e. (.)):

(.) :: (b -> c) -> (a -> b) -> (a -> c)
(f . g) x = f (g x)

This is because proofs and programs are equivalent to one another: whenever we prove a logical proposition we get a potentially useful function for free.

Circular reasoning

There are some propositions that we cannot prove, like this one:

∀(a ∈ Prop): a

... which you can read as saying "all propositions are true". This proposition is false, because if we translate the proposition to the corresponding type:

∀(a : Type) → a

... then there is no value in the Dhall language which has that type. If there were such a value then we could automatically create any value of any type.

However, Haskell does have values which inhabit the above type, such as this one:

example :: a
example = let x = x in x

-- or using the `ExplicitForAll` language extension:
example :: forall a . a
example = let x = x in x

This value just cheats and endlessly loops, which satisfies Haskell's type checker but doesn't produce a useful program. Unrestricted recursion like this is the programming equivalent of "circular reasoning" (i.e. trying to claim that a is true because a is true).

We can't cheat like this in Dhall and if we try we just get this error:

$ dhall <<< 'let x = x in x'

Use "dhall --explain" for detailed errors

Error: Unbound variable



You cannot define x in terms of itself because Dhall does not permit any recursion and therefore does not permit circular reasoning when used as a proof checker.


We will use the symbol ∧ to denote logical "and", so you can read the following proposition:

∀(a ∈ Prop): ∀(b ∈ Prop): (a ∧ b) ⇒ a

... as saying "for any two propositions a and b, if a and b are both true then a is true"

The type-level equivalent of logical "and" is a record with two fields:

∀(a : Type) → ∀(b : Type) → { fst : a, snd : b } → a

... which is equivalent to this Haskell type:

(a, b) -> a

-- or using `ExplicitForAll`:
forall a b . (a, b) -> a

The programming value that has this type is:

λ(a : Type) → λ(b : Type) → λ(r : { fst : a, snd : b }) → r.fst

... which is equivalent to this Haskell value:

fst

Similarly, we can conclude that:

∀(a ∈ Prop): ∀(b ∈ Prop): (a ∧ b) ⇒ b

... which corresponds to this type:

∀(a : Type) → ∀(b : Type) → { fst : a, snd : b } → b

... and this value of that type as the proof:

λ(a : Type) → λ(b : Type) → λ(r : { fst : a, snd : b }) → r.snd

Now let's try a slightly more complex proposition:

∀(a ∈ Prop): ∀(b ∈ Prop): (a ∧ (a ⇒ b)) ⇒ b

You can read this as saying "for any propositions a and b, if the proposition a is true, and a implies b, then the proposition b is true".

That corresponds to this type:

∀(a : Type) → ∀(b : Type) → { fst : a, snd : a → b } → b

... and the following value of that type which proves that the proposition is true:

    λ(a : Type)
→   λ(b : Type)
→   λ(r : { fst : a, snd : a → b })
→   r.snd r.fst

Here's a slightly more complex example:

∀(a ∈ Prop): ∀(b ∈ Prop): ∀(c ∈ Prop): ((a ∧ b) ⇒ c) ⇒ (a ⇒ b ⇒ c)

You can read that as saying: "for any three propositions named a, b, and c, if a and b imply c, then a implies that b implies c".

Here's the corresponding type:

    ∀(a : Type)
→   ∀(b : Type)
→   ∀(c : Type)
→   ({ fst : a, snd : b } → c) → (a → b → c)

... and the corresponding value of that type:

    λ(a : Type)
→   λ(b : Type)
→   λ(c : Type)
→   λ(f : { fst : a, snd : b } → c)
→   λ(x : a)
→   λ(y : b)
→   f { fst = x, snd = y }

Note that this is equivalent to Haskell's curry function:

curry :: ((a, b) -> c) -> (a -> b -> c)
curry f x y = f (x, y)


We will use the symbol ∨ to denote logical "or", so you can read the following proposition:

∀(a ∈ Prop): ∀(b ∈ Prop): a ⇒ a ∨ b

... as saying "for any propositions a and b, if a is true, then either a is true or b is true".

The type-level equivalent of logical "or" is a sum type:

∀(a : Type) → ∀(b : Type) → a → < Left : a | Right : b >

... which is equivalent to this Haskell type:

a -> Either a b

-- or using `ExplicitForAll`
forall a b . a -> Either a b

... and the value that has this type is:

λ(a : Type) → λ(b : Type) → λ(x : a) → < Left = x | Right : b >

... which is equivalent to this Haskell value:

Left

Similarly, for this proposition:

∀(a ∈ Prop): ∀(b ∈ Prop): b ⇒ a ∨ b

... the equivalent type is:

∀(a : Type) → ∀(b : Type) → b → < Left : a | Right : b >

... and the value that has this type is:

λ(a : Type) → λ(b : Type) → λ(x : b) → < Right = x | Left : a >

Let's consider a more complex example:

∀(a ∈ Prop): a ∨ a ⇒ a

... which says: "for any proposition a, if a is true or a is true then a is true".

The corresponding type is:

∀(a : Type) → < Left : a | Right : a > → a

... which is equivalent to this Haskell type:

Either a a -> a

-- or using `ExplicitForAll`
forall a . Either a a -> a

... and we can create a value of this type by pattern matching on the sum type:

    λ(a : Type)
→   λ(s : < Left : a | Right : a >)
→   merge
    { Left  = λ(x : a) → x
    , Right = λ(x : a) → x
    }
    s : a

... which is equivalent to this Haskell code:

example :: Either a a -> a
example (Left x) = x
example (Right x) = x

You can read this "proof" as saying: "There are two possibilities. The first possibility (named "Left") is that a is true, therefore a must be true in that case. The second possibility (named "Right") is that a is true, therefore a must be true in that case. Since a is true in both cases we can conclude that a is true, period."


We'll use True to denote a logical proposition that is always true:

True

... and the corresponding type is an empty record with no fields:

{}

... which is equivalent to Haskell's "unit" type:

()

An empty record literal is the only value that has the type of an empty record:

{=}

... which is equivalent to Haskell's "unit" value:

()

We'll use False to denote a logical proposition that is never true:

False

... and the corresponding type is an empty sum type (i.e. a type with no constructors/alternatives):

<>
... which is equivalent to Haskell's Void type.

There is no value that has the above type, so you cannot prove a False proposition.

The logical "principle of explosion" states that if you assume a False proposition then you can prove any other proposition. In other words:

False ⇒ ∀(a ∈ Prop): a

... which you can read as saying: "if you assume a false proposition, then for any proposition named a you can prove that a is true".

The corresponding type is:

<> → ∀(a : Type) → a

... and the value that inhabits the above type is:

λ(s : <>) → λ(a : Type) → merge {=} s : a

You can read that "proof" as saying: "There are zero possibilities. Since we handled all possibilities then the proposition a must be true." This line of reasoning is called a "vacuous truth", meaning that if there are no cases to consider then any statement you make is technically true for all cases.

The equivalent Haskell type would be:

Void -> a

... and the equivalent Haskell value of that type would be:

absurd

We'll use not to negate a logical proposition, meaning that:

not(True) = False

not(False) = True

We can encode logical not in terms of logical implication and logical False, like this:

not(a) = a ⇒ False

not(True) = True ⇒ False = False

not(False) = False ⇒ False = True

If logical not is a function from a proposition to another proposition then type-level not must be a function from a type to a type:

λ(a : Type) → (a → <>)

We can save the above type-level function to a file to create a convenient ./not utility we will reuse later on:

$ echo 'λ(a : Type) → a → <>' > ./not
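For comparison, the same negation pattern can be sketched in Haskell (the names here are ours): Void plays the role of the empty sum <>, so negation of a is a function from a into Void.

```haskell
import Data.Void (Void, absurd)

-- Negation as a function into the empty type.
type Not a = a -> Void

-- not(False) is True: a value of type Not Void exists, namely id.
notFalse :: Not Void
notFalse = id

-- A refutation of a is useful: given Not a, we can always extract the b
-- from an Either a b, dispatching the impossible branch with absurd.
extract :: Not a -> Either a b -> b
extract refute (Left a)  = absurd (refute a)
extract _      (Right b) = b

main :: IO ()
main = print (extract notFalse (Right 42 :: Either Void Int))
```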

Now let's try to prove a proposition like:

not(True) ⇒ False

The corresponding type is:

./not {} → <>

... which expands out to:

({} → <>) → <>

... and the value that has this type is:

λ(f : {} → <>) → f {=}

Double negation

However, suppose we try to prove "double negation":

∀(a ∈ Prop): not(not(a)) ⇒ a

... which says that "if a is not false, then a must be true".

The corresponding type would be:

∀(a : Type) → ./not (./not a) → a

... which expands out to:

∀(a : Type) → ((a → <>) → <>) → a

... but there is no value in the Dhall language that has this type!

The reason why is that every type checker corresponds to a specific logic system and not all logic systems are the same. Each logic system has different rules and axioms about what you can prove within the system.

Dhall's type checker is based on System Fω which corresponds to an intuitionistic (or constructive) logic. This sort of logic system does not assume the law of excluded middle.

The law of excluded middle says that every proposition must be true or false, which we can write like this:

∀(a ∈ Prop): a ∨ not(a)

You can read that as saying: "any proposition a is either true or false". This seems pretty reasonable until you try to translate the proposition to the corresponding type:

∀(a : Type) → < Left : a | Right : a → <> >

... which in Haskell would be:

example :: Either a (a -> Void)

We cannot create such a value because if we could then that would imply that for any type we could either:

  • produce a value of that type, or:
  • produce a "counter-example" by creating Void from any value of that type

While we can do this for some types, our type checker does not let us do this automatically for every type.
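For specific, concrete types we can indeed produce the witness by hand. A Haskell sketch (the names here are ours):

```haskell
import Data.Void (Void)

-- Excluded middle cannot be proven generically, but for a known type we
-- can often decide it ourselves. For the unit type we take Left:
exMiddleUnit :: Either () (() -> Void)
exMiddleUnit = Left ()

-- ... and for Void itself we take Right, since Void -> Void is inhabited:
exMiddleVoid :: Either Void (Void -> Void)
exMiddleVoid = Right id

main :: IO ()
main = putStrLn (case exMiddleUnit of { Left () -> "Left"; Right _ -> "Right" })
```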

This is the same reason that we can't prove double negation, which implicitly assumes that there are only two choices (i.e. "true or false"). However, if we don't assume the law of the excluded middle then there might be other choices like: "I don't know".

Not all hope is lost, though! Even though our type checker doesn't support the law of the excluded middle, we can still add new axioms freely to our logic system. All we have to do is assume them the same way we assume other propositions.

For example, we can modify our original proposition to now say:

(∀(b ∈ Prop): b ∨ not(b)) ⇒ (∀(a ∈ Prop): not(not(a)) ⇒ a)

... which you can read as saying: "if we assume that all propositions are either true or false, then if a proposition is not false it must be true".

That corresponds to this type:

    (∀(b : Type) → < Left : b | Right : b → <> >)
→   (∀(a : Type) → ((a → <>) → <>) → a)

... and we can create a value of that type:

    λ(noMiddle : ∀(b : Type) → < Left : b | Right : b → <> >)
→   λ(a : Type)
→   λ(doubleNegation : (a → <>) → <>)
→   merge
    { Left  = λ(x : a) → x
    , Right = λ(x : a → <>) → merge {=} (doubleNegation x) : a
    }
    (noMiddle a)
    : a

... which the type checker confirms has the correct type:

$ dhall
    λ(noMiddle : ∀(b : Type) → < Left : b | Right : b → <> >)
→   λ(a : Type)
→   λ(doubleNegation : (a → <>) → <>)
→   merge
    { Left  = λ(x : a) → x
    , Right = λ(x : a → <>) → merge {=} (doubleNegation x) : a
    }
    (noMiddle a)
    : a
∀(noMiddle : ∀(b : Type) → < Left : b | Right : b → <> >) → ∀(a : Type) → ∀(doubleNegation : (a → <>) → <>) → a

λ(noMiddle : ∀(b : Type) → < Left : b | Right : b → <> >) → λ(a : Type) → λ(doubleNegation : (a → <>) → <>) → merge { Left = λ(x : a) → x, Right = λ(x : a → <>) → merge {=} (doubleNegation x) : a } (noMiddle a) : a

You can read that proof as saying:

  • The law of the excluded middle says that there are two possibilities: either a is true or a is false
    • If a is true then we're done: we trivially conclude a is true
    • If a is false then our assumption of not(not(a)) is also false
      • we can conclude anything from a false assumption, therefore we conclude that a is true

What's neat about this is that the compiler mechanically checks this reasoning process. You don't have to understand or trust my explanation of how the proof works because you can delegate your trust to the compiler, which does all the work for you.


This post gives a brief overview of how you can concretely translate logical propositions and proofs to types and programs. This can let you leverage your logical intuitions to reason about types or, vice versa, leverage your programming intuitions to reason about propositions. For example, if you can prove a false proposition in a logic system, then that's typically an escape hatch in the corresponding type system. Similarly, if you can create a value of the empty type in a programming language, that implies that the corresponding logic is not sound.

There are many kinds of type systems just like there are many kinds of logic systems. For every new logic system (like linear logic) you get a type system for free (like linear types). For example, Rust's type checker is an example of an affine type system which corresponds to an affine logic system.

To my knowledge, there are more logic systems in academic literature than there are type systems that have been implemented for programming languages. This in turn suggests that there are many awesome type systems waiting to be implemented.


by Gabriel Gonzalez at February 20, 2017 02:49 PM

February 18, 2017

Wolfgang Jeltsch

MIU in Haskell

In last week's Theory Lunch, James Chapman talked about the MU puzzle from Douglas Hofstadter's book Gödel, Escher, Bach. This puzzle is about a string rewriting system. James presented a Haskell program that computes derivations of strings. Inspired by this, I wrote my own implementation, with the goal of improving efficiency. This blog post presents this implementation. As usual, it is available as a literate Haskell file, which you can load into GHCi.

The puzzle

Let me first describe the MU puzzle shortly. The puzzle deals with strings that may contain the characters \mathrm M, \mathrm I, and \mathrm U. We can derive new strings from old ones using the following rewriting system:

\begin{array}{rcl} x\mathrm I & \rightarrow & x\mathrm{IU} \\ \mathrm Mx & \rightarrow & \mathrm Mxx \\ x\mathrm{III}y & \rightarrow & x\mathrm Uy \\ x\mathrm{UU}y & \rightarrow & xy \end{array}

The question is whether it is possible to turn the string \mathrm{MI} into the string \mathrm{MU} using these rules.

You may want to try to solve this puzzle yourself, or you may want to look up the solution on the Wikipedia page.

The code

The code is not only concerned with deriving \mathrm{MU} from \mathrm{MI}, but with derivations as such.


We import Data.List:

import Data.List

Basic things

We define the type Sym of symbols and the type Str of symbol strings:

data Sym = M | I | U deriving Eq

type Str = [Sym]

instance Show Sym where

    show M = "M"
    show I = "I"
    show U = "U"

    showList str = (concatMap show str ++)

Next, we define the type Rule of rules as well as the list rules that contains all rules:

data Rule = R1 | R2 | R3 | R4 deriving Show

rules :: [Rule]
rules = [R1,R2,R3,R4]

Rule application

We first introduce a helper function that takes a string and returns the list of all splits of this string. Here, a split of a string str is a pair of strings str1 and str2 such that str1 ++ str2 == str. A straightforward implementation of splitting is as follows:

splits' :: Str -> [(Str,Str)]
splits' str = zip (inits str) (tails str)

The problem with this implementation is that walking through the result list takes quadratic time, even if the elements of the list are left unevaluated. The following implementation solves this problem:

splits :: Str -> [(Str,Str)]
splits str = zip (map (flip take str) [0 ..]) (tails str)

Next, we define a helper function replace. An expression replace old new str yields the list of all strings that can be constructed by replacing the string old inside str by new.

replace :: Str -> Str -> Str -> [Str]
replace old new str = [front ++ new ++ rear |
                          (front,rest) <- splits str,
                          old `isPrefixOf` rest,
                          let rear = drop (length old) rest]

We are now ready to implement the function apply, which performs rule application. This function takes a rule and a string and produces all strings that can be derived from the given string using the given rule exactly once.

apply :: Rule -> Str -> [Str]
apply R1 str | not (null str), last str == I = [str ++ [U]]
apply R2 (M : tail)                 = [M : tail ++ tail]
apply R3 str                        = replace [I,I,I] [U] str
apply R4 str                        = replace [U,U]   []  str
apply _  _                          = []
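As a quick sanity check, here is the rule-application code bundled into a self-contained snippet together with two example applications: rule 2 doubles the part after the leading M, and rule 3 finds every occurrence of III that can be rewritten to U.

```haskell
import Data.List (isPrefixOf, tails)

data Sym = M | I | U deriving (Eq, Show)
type Str = [Sym]
data Rule = R1 | R2 | R3 | R4 deriving Show

-- all splits of a string, as in the post
splits :: Str -> [(Str, Str)]
splits str = zip (map (`take` str) [0 ..]) (tails str)

-- all ways of replacing one occurrence of old by new
replace :: Str -> Str -> Str -> [Str]
replace old new str =
    [ front ++ new ++ drop (length old) rest
    | (front, rest) <- splits str, old `isPrefixOf` rest ]

apply :: Rule -> Str -> [Str]
apply R1 str | not (null str), last str == I = [str ++ [U]]
apply R2 (M : rest)                          = [M : rest ++ rest]
apply R3 str                                 = replace [I, I, I] [U] str
apply R4 str                                 = replace [U, U] [] str
apply _  _                                   = []

main :: IO ()
main = do
    print (apply R2 [M, I])           -- MI  becomes MII
    print (apply R3 [M, I, I, I, I])  -- MIIII becomes MUI or MIU
```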

Derivation trees

Now we want to build derivation trees. A derivation tree for a string str has the following properties:

  • The root is labeled with str.
  • The subtrees of the root are the derivation trees for the strings that can be generated from str by a single rule application.
  • The edges from the root to its subtrees are marked with the respective rules that are applied.

We first define types for representing derivation trees:

data DTree = DTree Str [DSub]

data DSub  = DSub Rule DTree

Now we define the function dTree that turns a string into its derivation tree:

dTree :: Str -> DTree
dTree str = DTree str [DSub rule subtree |
                          rule <- rules,
                          subStr <- apply rule str,
                          let subtree = dTree subStr]


A derivation is a sequence of strings with rules between them such that each rule takes the string before it to the string after it. We define types for representing derivations:

data Deriv = Deriv [DStep] Str

data DStep = DStep Str Rule

instance Show Deriv where

    show (Deriv steps goal) = "        "           ++
                              concatMap show steps ++
                              show goal            ++
                              "\n"

    showList derivs
        = (concatMap ((++ "\n") . show) derivs ++)

instance Show DStep where

    show (DStep origin rule) = show origin ++
                               "\n-> ("    ++
                               show rule   ++
                               ") "

Now we implement a function derivs that converts a derivation tree into the list of all derivations that start with the tree’s root label. The function derivs traverses the tree in breadth-first order.

derivs :: DTree -> [Deriv]
derivs tree = worker [([],tree)] where

    worker :: [([DStep],DTree)] -> [Deriv]
    worker tasks = rootDerivs tasks        ++
                   worker (subtasks tasks)

    rootDerivs :: [([DStep],DTree)] -> [Deriv]
    rootDerivs tasks = [Deriv (reverse revSteps) root |
                           (revSteps,DTree root _) <- tasks]

    subtasks :: [([DStep],DTree)] -> [([DStep],DTree)]
    subtasks tasks = [(DStep root rule : revSteps,subtree) |
                         (revSteps,DTree root subs) <- tasks,
                         DSub rule subtree          <- subs]

Finally, we implement the function derivations which takes two strings and returns the list of those derivations that turn the first string into the second:

derivations :: Str -> Str -> [Deriv]
derivations start end
    = [deriv | deriv@(Deriv _ goal) <- derivs (dTree start),
               goal == end]

You may want to enter

derivations [M,I] [M,U,I]

at the GHCi prompt to see the derivations function in action. You can also enter

derivations [M,I] [M,U]

to get an idea about the solution to the MU puzzle.
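To see the whole pipeline run end to end, here is a condensed, self-contained variant of the program above. It tracks only the intermediate strings of a derivation (dropping the rule names) but preserves the breadth-first search order, so the first derivation it finds for [M,U,I] is the three-step one via rules 2, 2, and 3:

```haskell
import Data.List (isPrefixOf, tails)

data Sym = M | I | U deriving (Eq, Show)
type Str = [Sym]

splits :: Str -> [(Str, Str)]
splits str = zip (map (`take` str) [0 ..]) (tails str)

replace :: Str -> Str -> Str -> [Str]
replace old new str =
    [ front ++ new ++ drop (length old) rest
    | (front, rest) <- splits str, old `isPrefixOf` rest ]

-- all strings reachable by a single rule application, in rule order
step :: Str -> [Str]
step str = concat
    [ if not (null str) && last str == I then [str ++ [U]] else []
    , case str of { M : rest -> [M : rest ++ rest]; _ -> [] }
    , replace [I, I, I] [U] str
    , replace [U, U] [] str ]

-- all derivation paths from a start string, in breadth-first order;
-- each path is kept in reverse while searching
paths :: Str -> [[Str]]
paths start = worker [[start]] where
    worker pss = map reverse pss ++
                 worker [s' : ps | ps@(s : _) <- pss, s' <- step s]

derivations :: Str -> Str -> [[Str]]
derivations start end = [ps | ps <- paths start, last ps == end]

main :: IO ()
main = print (head (derivations [M, I] [M, U, I]))
-- [[M,I],[M,I,I],[M,I,I,I,I],[M,U,I]]
```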

Tagged: Douglas Hofstadter, functional programming, Gödel, Escher, Bach (book), Haskell, Institute of Cybernetics, James Chapman, literate programming, MU puzzle, string rewriting, talk, Theory Lunch

by Wolfgang Jeltsch at February 18, 2017 03:42 AM

Generic programming in Haskell

Generic programming is a powerful way to define a function that works in an analogous way for a class of types. In this article, I describe the latest approach to generic programming that is implemented in GHC. This approach goes back to the paper A Generic Deriving Mechanism for Haskell by José Pedro Magalhães, Atze Dijkstra, Johan Jeuring, and Andres Löh.

This article is a writeup of a Theory Lunch talk I gave on 4 February 2016. As usual, the source of this article is a literate Haskell file, which you can download, load into GHCi, and play with.


Parametric polymorphism allows you to write functions that deal with values of any type. An example of such a function is the reverse function, whose type is [a] -> [a]. You can apply reverse to any list, no matter what types the elements have.

However, parametric polymorphism does not allow your functions to depend on the structure of the concrete types that are used in place of type variables. So values of these types are always treated as black boxes. For example, the reverse function only reorders the elements of the given list. A function of type [a] -> [a] could also drop elements (like the tail function does) or duplicate elements (like the cycle function does), but it could never invent new elements (except for ⊥) or analyze elements.

Now there are situations where a function is suitable for a class of types that share certain properties. For example, the sum function works for all types that have a notion of binary addition. Haskell uses type classes to support such functions. For example, the Num class provides the method (+), which is used in the definition of sum, whose type Num a => [a] -> a contains the corresponding class constraint.

The methods of a class have to be implemented separately for every type that is an instance of the class. This is reasonable for methods like (+), where the implementations for the different instances differ fundamentally. However, it is unfortunate for methods that are implemented in an analogous way for most of the class instances. An example of such a method is (==), since there is a canonical way of checking values of algebraic data types for equality: first compare the outermost data constructors of the two given values and, if they match, compare the individual fields. Only when the data constructors and all the fields match are the two values considered equal.

For several standard classes, including Eq, Haskell provides the deriving mechanism to generate instances with default method implementations whose precise functionality depends on the structure of the type. Unfortunately, there is no possibility in standard Haskell to extend this deriving mechanism to user-defined classes. Generic programming is a way out of this problem.


For generic programming, we need several language extensions. The good thing is that only one of them, DeriveGeneric, is specific to generic programming. The other ones have uses in other areas as well. Furthermore, DeriveGeneric is a very small extension. So the generic programming approach we describe here can be considered very lightweight.

We state the full set of necessary extensions with the following pragma:

{-# LANGUAGE DefaultSignatures,
             DeriveAnyClass,
             DeriveGeneric,
             FlexibleContexts,
             TypeFamilies,
             TypeOperators #-}

Apart from these language extensions, we need the module GHC.Generics:

import GHC.Generics

Our running example

As our running example, we pick serialization and deserialization of values. Serialization means converting a value into a bit string, and deserialization means parsing a bit string in order to get back a value.

We introduce a type Bit for representing bits:

data Bit = O | I deriving (Eq, Show)

Furthermore, we define the class of all types that support serialization and deserialization as follows:

class Serializable a where

    put :: a -> [Bit]

    get :: [Bit] -> (a, [Bit])

There is a canonical way of serializing values of algebraic data types. It works by first encoding the data constructor of the given value as a sequence of bits and then serializing the individual fields. To show this approach in action, we define an algebraic data type Tree, which is a type of labeled binary trees:

data Tree a = Leaf | Branch (Tree a) a (Tree a) deriving Show

An instantiation of Serializable for Tree that follows the canonical serialization approach can be carried out as follows:

instance Serializable a => Serializable (Tree a) where

    put Leaf                     = [O]
    put (Branch left root right) = [I]       ++
                                   put left  ++
                                   put root  ++
                                   put right

    get (O : bits) = (Leaf, bits)
    get (I : bits) = (Branch left root right, bits''') where

        (left,  bits')   = get bits
        (root,  bits'')  = get bits'
        (right, bits''') = get bits''

Of course, it quickly becomes cumbersome to provide such an instance declaration for every algebraic data type that should use the canonical serialization approach. So we want to implement the canonical approach once and for all and make it easily usable for arbitrary types that are amenable to it. Generic programming makes this possible.
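To see the hand-written instance in action before we generalize it, here is a self-contained round trip (using Bit itself as the label type, which requires the small extra Serializable Bit instance shown):

```haskell
data Bit = O | I deriving (Eq, Show)

class Serializable a where
    put :: a -> [Bit]
    get :: [Bit] -> (a, [Bit])

data Tree a = Leaf | Branch (Tree a) a (Tree a) deriving (Eq, Show)

-- the canonical encoding from the post: a constructor tag followed by
-- the serialized fields
instance Serializable a => Serializable (Tree a) where
    put Leaf           = [O]
    put (Branch l x r) = [I] ++ put l ++ put x ++ put r

    get (O : bits) = (Leaf, bits)
    get (I : bits) = (Branch l x r, bits''') where
        (l, bits')   = get bits
        (x, bits'')  = get bits'
        (r, bits''') = get bits''

-- a Bit is its own one-bit encoding
instance Serializable Bit where
    put b          = [b]
    get (b : bits) = (b, bits)

exampleTree :: Tree Bit
exampleTree = Branch Leaf I (Branch Leaf O Leaf)

main :: IO ()
main = do
    print (put exampleTree)                      -- [I,O,I,I,O,O,O]
    print (fst (get (put exampleTree)) == exampleTree)
```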


An algebraic data type is essentially a sum of products where the terms “sum” and “product” are understood as follows:

  • A sum is a variant type. In Haskell, Either is the canonical type constructor for binary sums, and the empty type Void from the void package is the nullary sum.

  • A product is a tuple type. In Haskell, (,) is the canonical type constructor for binary products, and () is the nullary product.

The key idea of generic programming is to map types to representations that make the sum-of-products structure explicit and to implement canonical behavior based on these representations instead of the actual types.

The GHC.Generics module defines a number of type constructors for constructing representations:

data V1 p

infixr 5 :+:
data (:+:) f g p = L1 (f p) | R1 (g p)

data U1 p = U1

infixr 6 :*:
data (:*:) f g p = f p :*: g p

newtype K1 i a p = K1 { unK1 :: a }

newtype M1 i c f p = M1 { unM1 :: f p }

All of these type constructors take a final parameter p. This parameter is relevant only when dealing with higher-order classes. In this article, however, we only discuss generic programming with first-order classes. In this case, the parameter p is ignored. The different type constructors play the following roles:

  • V1 is for the nullary sum.

  • (:+:) is for binary sums.

  • U1 is for the nullary product.

  • (:*:) is for binary products.

  • K1 is a wrapper for fields of algebraic data types. Its parameter i was used to provide some information about the field at the type level but is now obsolete.

  • M1 is a wrapper for attaching meta information at the type level. Its parameter i denotes the kind of the language construct the meta information refers to, and its parameter c provides access to the meta information.

The GHC.Generics module furthermore introduces a class Generic, whose instances are the types for which a representation exists. Its definition is as follows:

class Generic a where

  type Rep a :: * -> *

  from :: a -> (Rep a) p

  to :: (Rep a) p -> a

A type Rep a is the representation of the type a. The methods from and to convert from values of the actual type to values of the representation type and vice versa.

To see all this in action, we make Tree a an instance of Generic:

instance Generic (Tree a) where

    type Rep (Tree a) =
        M1 D D1_Tree (
            M1 C C1_Tree_Leaf U1
            :+:
            M1 C C1_Tree_Branch (
                M1 S NoSelector (K1 R (Tree a))
                :*:
                M1 S NoSelector (K1 R a)
                :*:
                M1 S NoSelector (K1 R (Tree a))
            )
        )

    from Leaf                     = M1 (L1 (M1 U1))
    from (Branch left root right) = M1 (
                                        R1 (
                                            M1 (
                                                M1 (K1 left)
                                                :*:
                                                M1 (K1 root)
                                                :*:
                                                M1 (K1 right)
                                            )
                                        )
                                    )

    to (M1 (L1 (M1 U1)))      = Leaf
    to (M1 (
            R1 (
                M1 (
                    M1 (K1 left)
                    :*:
                    M1 (K1 root)
                    :*:
                    M1 (K1 right)
                )
            )
        ))                    = Branch left root right

The types D1_Tree, C1_Tree_Leaf, and C1_Tree_Branch are type-level representations of the type constructor Tree, the data constructor Leaf, and the data constructor Branch, respectively. We declare them as empty types:

data D1_Tree
data C1_Tree_Leaf
data C1_Tree_Branch

We need to make these types instances of the classes Datatype and Constructor, which are part of GHC.Generics as well. These classes provide a link between the type-level representations of type and data constructors and the meta information related to them. This meta information particularly covers the identifiers of the type and data constructors, which are needed when implementing canonical implementations for methods like show and read. The instance declarations for the Tree-related types are as follows:

instance Datatype D1_Tree where

  datatypeName _ = "Tree"

  moduleName _ = "Main"

instance Constructor C1_Tree_Leaf where

  conName _ = "Leaf"

instance Constructor C1_Tree_Branch where

  conName _ = "Branch"

Instantiating the Generic class as shown above is obviously an extremely tedious task. However, it is possible to instantiate Generic completely automatically for any given algebraic data type, using the deriving syntax. This is what the DeriveGeneric language extension makes possible.

So instead of making Tree a an instance of Generic by hand, as we have done above, we could have declared the Tree type as follows in the first place:

data Tree a = Leaf | Branch (Tree a) a (Tree a)
              deriving (Show, Generic)

Implementing canonical behavior

As mentioned above, we implement canonical behavior based on representations. Let us see how this works in the case of the Serializable class.

We introduce a new class Serializable' whose methods provide serialization and deserialization for representation types:

class Serializable' f where

    put' :: f p -> [Bit]

    get' :: [Bit] -> (f p, [Bit])

We instantiate this class for all the representation types:

instance Serializable' U1 where

    put' U1 = []

    get' bits = (U1, bits)

instance (Serializable' r, Serializable' s) =>
         Serializable' (r :*: s) where

    put' (rep1 :*: rep2) = put' rep1 ++ put' rep2

    get' bits = (rep1 :*: rep2, bits'') where

        (rep1, bits')  = get' bits
        (rep2, bits'') = get' bits'

instance Serializable' V1 where

    put' _ = error "attempt to put a void value"

    get' _ = error "attempt to get a void value"

instance (Serializable' r, Serializable' s) =>
         Serializable' (r :+: s) where

    put' (L1 rep) = O : put' rep
    put' (R1 rep) = I : put' rep

    get' (O : bits) = let (rep, bits') = get' bits in
                      (L1 rep, bits')
    get' (I : bits) = let (rep, bits') = get' bits in
                      (R1 rep, bits')

instance Serializable' r => Serializable' (M1 i a r) where

    put' (M1 rep) = put' rep

    get' bits = (M1 rep, bits') where

        (rep, bits') = get' bits

instance Serializable a => Serializable' (K1 i a) where

    put' (K1 val) = put val

    get' bits = (K1 val, bits') where

        (val, bits') = get bits

Note that in the case of K1, the context mentions Serializable, not Serializable', and the methods put' and get' call put and get, not put' and get'. The reason is that the value wrapped in K1 has an ordinary type, not a representation type.

Accessing canonical behavior

We can now apply canonical behavior to ordinary types using the methods from and to from the Generic class. For example, we can implement functions defaultPut and defaultGet that provide canonical serialization and deserialization for all instances of Generic:

defaultPut :: (Generic a, Serializable' (Rep a)) =>
              a -> [Bit]
defaultPut = put' . from

defaultGet :: (Generic a, Serializable' (Rep a)) =>
              [Bit] -> (a, [Bit])
defaultGet bits = (to rep, bits') where

    (rep, bits') = get' bits

We can use these functions in instance declarations for Serializable. For example, we can make Tree a an instance of Serializable in the following way:

instance Serializable a => Serializable (Tree a) where
    put = defaultPut

    get = defaultGet

Compared to the instance declaration we had initially, this one is a real improvement, since we do not have to implement the desired behavior of put and get by hand anymore. However, it still contains boilerplate code in the form of the trivial method declarations. It would be better to establish defaultPut and defaultGet as defaults in the class declaration:

class Serializable a where

    put :: a -> [Bit]
    put = defaultPut

    get :: [Bit] -> (a, [Bit])
    get = defaultGet

However, this is not possible, since the types of defaultPut and defaultGet are less general than the types of put and get, as they put additional constraints on the type a. Luckily, GHC supports the language extension DefaultSignatures, which allows us to give default implementations that have less general types than the actual methods (and consequently work only for those instances that are compatible with these less general types). Using DefaultSignatures, we can declare the Serializable class as follows:

class Serializable a where

    put :: a -> [Bit]
    default put :: (Generic a, Serializable' (Rep a)) =>
                   a -> [Bit]
    put = defaultPut

    get :: [Bit] -> (a, [Bit])
    default get :: (Generic a, Serializable' (Rep a)) =>
                   [Bit] -> (a, [Bit])
    get = defaultGet

With this class declaration in place, we can make Tree a an instance of Serializable as follows:

instance Serializable a => Serializable (Tree a)

With the minor extension DeriveAnyClass, which is provided by GHC starting from Version 7.10, we can even use the deriving keyword to instantiate Serializable for Tree a. As a result, we only have to write the following in order to define the Tree type and make it an instance of Serializable:

data Tree a = Leaf | Branch (Tree a) a (Tree a)
              deriving (Show, Generic, Serializable)

So finally, we can use our own classes like the Haskell standard classes regarding the use of deriving clauses, except that we have to additionally derive an instance declaration for Generic.
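Putting the pieces together, here is a self-contained program assembled from the definitions above: the generic machinery, a derived Generic instance for Tree, an empty Serializable instance that picks up the defaults, and a round-trip check (the Serializable Bool instance is our own small addition so the tree has something to carry):

```haskell
{-# LANGUAGE DefaultSignatures, DeriveGeneric,
             FlexibleContexts, TypeOperators #-}

import GHC.Generics

data Bit = O | I deriving (Eq, Show)

class Serializable a where
    put :: a -> [Bit]
    default put :: (Generic a, Serializable' (Rep a)) => a -> [Bit]
    put = put' . from

    get :: [Bit] -> (a, [Bit])
    default get :: (Generic a, Serializable' (Rep a)) => [Bit] -> (a, [Bit])
    get bits = let (rep, bits') = get' bits in (to rep, bits')

class Serializable' f where
    put' :: f p -> [Bit]
    get' :: [Bit] -> (f p, [Bit])

instance Serializable' U1 where
    put' U1   = []
    get' bits = (U1, bits)

instance (Serializable' r, Serializable' s) => Serializable' (r :*: s) where
    put' (rep1 :*: rep2) = put' rep1 ++ put' rep2
    get' bits = (rep1 :*: rep2, bits'') where
        (rep1, bits')  = get' bits
        (rep2, bits'') = get' bits'

instance (Serializable' r, Serializable' s) => Serializable' (r :+: s) where
    put' (L1 rep) = O : put' rep
    put' (R1 rep) = I : put' rep
    get' (O : bits) = let (rep, bits') = get' bits in (L1 rep, bits')
    get' (I : bits) = let (rep, bits') = get' bits in (R1 rep, bits')

instance Serializable' r => Serializable' (M1 i c r) where
    put' (M1 rep) = put' rep
    get' bits     = let (rep, bits') = get' bits in (M1 rep, bits')

instance Serializable a => Serializable' (K1 i a) where
    put' (K1 val) = put val
    get' bits     = let (val, bits') = get bits in (K1 val, bits')

-- a hand-written instance for a non-algebraic leaf type
instance Serializable Bool where
    put False      = [O]
    put True       = [I]
    get (b : bits) = (b == I, bits)

data Tree a = Leaf | Branch (Tree a) a (Tree a)
              deriving (Eq, Show, Generic)

-- the whole point: an empty instance, filled in generically
instance Serializable a => Serializable (Tree a)

exampleTree :: Tree Bool
exampleTree = Branch (Branch Leaf True Leaf) False Leaf

main :: IO ()
main = do
    print (put exampleTree)                      -- [I,I,O,I,O,O,O]
    print (fst (get (put exampleTree)) == exampleTree)
```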

Specialized implementations

Usually, not all instances of a class should or even can be generated by means of generic programming, but some instances have to be crafted by hand. For example, making Int an instance of Serializable requires manual work, since Int is not an algebraic data type.

However, there is no problem with this, since we still have the opportunity to write explicit instance declarations, despite the presence of a generic solution. This is in line with the standard deriving mechanism: you can make use of it, but you are not forced to do so. So we can have the following instance declaration, for example:

instance Serializable Int where

    put val = replicate val I ++ [O]

    get bits = (length is, bits') where

        (is, O : bits') = span (== I) bits

Of course, the serialization approach we use here is not very efficient, but the instance declaration illustrates the point we want to make.

Tagged: functional programming, generic programming, GHC, Haskell, Institute of Cybernetics, literate programming, parametric polymorphism, talk, Theory Lunch, type class, type family, void (Haskell package)

by Wolfgang Jeltsch at February 18, 2017 03:36 AM

Constrained monads

There are Haskell types that have an associated monad structure, but cannot be made instances of the Monad class. The reason is typically that the return or the bind operation of such a type m has a constraint on the type parameter of m. As a result, all the nice library support for monads is unusable for such types. This problem is called the constrained-monad problem.

In my article The Constraint kind, I described a solution to this problem, which involved changing the Monad class. In this article, I present a solution that works with the standard Monad class. This solution has been developed by Neil Sculthorpe, Jan Bracker, George Giorgidze, and Andy Gill. It is described in their paper The Constrained-Monad Problem and implemented in the constrained-normal package.

This article is a write-up of a Theory Lunch talk I gave quite some time ago. As usual, the source of this article is a literate Haskell file, which you can download, load into GHCi, and play with.


We have to enable a couple of language extensions:

{-# LANGUAGE ConstraintKinds,
             GADTs,
             Rank2Types #-}

Furthermore, we need to import some modules:

import Data.Set     hiding (fold, map)
import Data.Natural hiding (fold)

These imports require the packages containers and natural-numbers to be installed.

The set monad

The Set type has an associated monad structure, consisting of a return and a bind operation:

returnSet :: a -> Set a
returnSet = singleton

bindSet :: Ord b => Set a -> (a -> Set b) -> Set b
bindSet sa g = unions (map g (toList sa))

We cannot make Set an instance of Monad though, since bindSet has an Ord constraint on the element type of the result set, which is caused by the use of unions.

For a solution, let us first look at how monadic computations on sets would be expressed if Set was an instance of Monad. A monadic expression would be built from non-monadic expressions and applications of return and (>>=). For every such expression, there would be a normal form of the shape

s1 >>= \ x1 -> s2 >>= \ x2 -> … -> sn >>= \ xn -> return r

where the si would be non-monadic expressions of type Set. The existence of a normal form would follow from the monad laws.

We define a type UniSet of such normal forms:

data UniSet a where

    ReturnSet  :: a -> UniSet a

    AtmBindSet :: Set a -> (a -> UniSet b) -> UniSet b

We can make UniSet an instance of Monad where the monad operations build expressions and normalize them on the fly:

instance Monad UniSet where

    return a = ReturnSet a

    ReturnSet a     >>= f = f a
    AtmBindSet sa h >>= f = AtmBindSet sa h' where

        h' a = h a >>= f

Note that these monad operations are analogous to operations on lists, with return corresponding to singleton construction and (>>=) corresponding to concatenation. Normalization happens in (>>=) by applying the left-identity and the associativity law for monads.

We can use UniSet as an alternative set type, representing a set by a normal form that evaluates to this set. This way, we get a set type that is an instance of Monad. For this to be sane, we have to hide the data constructors of UniSet, so that different normal forms that evaluate to the same set cannot be distinguished.

Now we need functions that convert between Set and UniSet. Conversion from Set to UniSet is simple:

toUniSet :: Set a -> UniSet a
toUniSet sa = AtmBindSet sa ReturnSet

Conversion from UniSet to Set is expression evaluation:

fromUniSet :: Ord a => UniSet a -> Set a
fromUniSet (ReturnSet a)     = returnSet a
fromUniSet (AtmBindSet sa h) = bindSet sa g where

    g a = fromUniSet (h a)

The type of fromUniSet constrains the element type to be an instance of Ord. This single constraint is enough to make all invocations of bindSet throughout the conversion legal. The reason is our use of normal forms. Since normal forms are “right-leaning”, all applications of (>>=) in them have the same result type as the whole expression.
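For concreteness, here is a self-contained sketch that collects the definitions above and runs a monadic set computation through them. It uses an explicit bindU in place of (>>=), to sidestep the Functor and Applicative boilerplate that recent GHCs require for a Monad instance; everything else follows the code above.

```haskell
{-# LANGUAGE GADTs #-}

import Data.Set (Set, fromList, singleton, toList, unions)

data UniSet a where
    ReturnSet  :: a -> UniSet a
    AtmBindSet :: Set a -> (a -> UniSet b) -> UniSet b

-- (>>=) on normal forms, as in the Monad instance above
bindU :: UniSet a -> (a -> UniSet b) -> UniSet b
ReturnSet a     `bindU` f = f a
AtmBindSet sa h `bindU` f = AtmBindSet sa (\a -> h a `bindU` f)

toUniSet :: Set a -> UniSet a
toUniSet sa = AtmBindSet sa ReturnSet

fromUniSet :: Ord a => UniSet a -> Set a
fromUniSet (ReturnSet a)     = singleton a
fromUniSet (AtmBindSet sa h) = unions (map (fromUniSet . h) (toList sa))

-- all pairwise sums of two sets, written monadically
sums :: Set Int
sums = fromUniSet $
    toUniSet (fromList [1, 2, 3]) `bindU` \x ->
    toUniSet (fromList [10, 20])  `bindU` \y ->
    ReturnSet (x + y)

main :: IO ()
main = print sums  -- fromList [11,12,13,21,22,23]
```

Note that the single Ord constraint on fromUniSet suffices for the whole evaluation, exactly as described above.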

The multiset monad

Let us now look at a different monad, the multiset monad.

We represent a multiset as a function that maps each value of the element type to its multiplicity in the multiset, with a multiplicity of zero denoting absence of this value:

newtype MSet a = MSet { mult :: a -> Natural }

Now we define the return operation:

returnMSet :: Eq a => a -> MSet a
returnMSet a = MSet ma where

    ma b | a == b    = 1
         | otherwise = 0

For defining the bind operation, we need to define a class Finite of finite types whose sole method enumerates all the values of the respective type:

class Finite a where

    values :: [a]

The implementation of the bind operation is as follows:

bindMSet :: Finite a => MSet a -> (a -> MSet b) -> MSet b
bindMSet msa g = MSet mb where

    mb b = sum [mult msa a * mult (g a) b | a <- values]

Note that the multiset monad differs from the set monad in its use of constraints. The set monad imposes a constraint on the result element type of bind, while the multiset monad imposes a constraint on the first argument element type of bind and another constraint on the result element type of return.
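To see returnMSet and bindMSet in action, here is a self-contained sketch using a small Finite instance; the Coin type and flipCoin are my own example, not from the article.

```haskell
import Numeric.Natural (Natural)

newtype MSet a = MSet { mult :: a -> Natural }

class Finite a where
    values :: [a]

returnMSet :: Eq a => a -> MSet a
returnMSet a = MSet (\b -> if a == b then 1 else 0)

bindMSet :: Finite a => MSet a -> (a -> MSet b) -> MSet b
bindMSet msa g = MSet mb where
    mb b = sum [mult msa a * mult (g a) b | a <- values]

-- a hypothetical two-element type to test with
data Coin = Heads | Tails deriving Eq

instance Finite Coin where
    values = [Heads, Tails]

flipCoin :: Coin -> Coin
flipCoin Heads = Tails
flipCoin Tails = Heads

main :: IO ()
main = do
    -- two heads and one tail; flipping swaps the multiplicities
    let biased  = MSet (\c -> if c == Heads then 2 else 1)
        flipped = biased `bindMSet` (returnMSet . flipCoin)
    print (mult flipped Heads)  -- 1
    print (mult flipped Tails)  -- 2
```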

Like in the case of sets, we define a type of monadic normal forms:

data UniMSet a where

    ReturnMSet  :: a -> UniMSet a

    AtmBindMSet :: Finite a =>
                   MSet a -> (a -> UniMSet b) -> UniMSet b

The key difference to UniSet is that UniMSet involves the constraint of the bind operation, so that normal forms must respect this constraint. Without this restriction, it would not be possible to evaluate normal forms later.

The Monad instance declaration for UniMSet is analogous to the one for UniSet:

instance Monad UniMSet where

    return a = ReturnMSet a

    ReturnMSet a      >>= f = f a
    AtmBindMSet msa h >>= f = AtmBindMSet msa h' where

        h' a = h a >>= f

Now we define conversion from MSet to UniMSet:

toUniMSet :: Finite a => MSet a -> UniMSet a
toUniMSet msa = AtmBindMSet msa ReturnMSet

Note that we need to constrain the element type in order to fulfill the constraint incorporated into the UniMSet type.

Finally, we define conversion from UniMSet to MSet:

fromUniMSet :: Eq a => UniMSet a -> MSet a
fromUniMSet (ReturnMSet a)      = returnMSet a
fromUniMSet (AtmBindMSet msa h) = bindMSet msa g where

    g a = fromUniMSet (h a)

Here we need to impose an Eq constraint on the element type. Note that this single constraint is enough to make all invocations of returnMSet throughout the conversion legal. The reason is again our use of normal forms.

A generic solution

The solutions to the constrained-monad problem for sets and multisets are very similar. It is certainly not good if we have to write almost the same code for every new constrained monad that we want to make accessible via the Monad class. Therefore, we define a generic type that covers all such monads:

data UniMonad c t a where

    Return  :: a -> UniMonad c t a

    AtmBind :: c a =>
               t a -> (a -> UniMonad c t b) -> UniMonad c t b

The parameter t of UniMonad is the underlying data type, like Set or MSet, and the parameter c is the constraint that has to be imposed on the type parameter of the first argument of the bind operation.

For every c and t, we make UniMonad c t an instance of Monad:

instance Monad (UniMonad c t) where

    return a = Return a

    Return a     >>= f = f a
    AtmBind ta h >>= f = AtmBind ta h' where

        h' a = h a >>= f

We define a function lift that converts from the underlying data type to UniMonad and thus generalizes toUniSet and toUniMSet:

lift :: c a => t a -> UniMonad c t a
lift ta = AtmBind ta Return

Evaluation of normal forms is just folding with the return and bind operations of the underlying data type. Therefore, we implement a fold operator for UniMonad:

fold :: (a -> r)
     -> (forall a . c a => t a -> (a -> r) -> r)
     -> UniMonad c t a
     -> r
fold return _       (Return a)     = return a
fold return atmBind (AtmBind ta h) = atmBind ta g where

    g a = fold return atmBind (h a)

Note that fold does not need to deal with constraints, neither with constraints on the result type parameter of return (like Eq in the case of MSet), nor with constraints on the result type parameter of bind (like Ord in the case of Set). This is because fold works with any result type r.

Now let us implement Monad-compatible sets and multisets based on UniMonad.

In the case of sets, we face the problem that UniMonad takes a constraint for the type parameter of the first bind argument, but bindSet does not have such a constraint. To solve this issue, we introduce a type class Unconstrained of which every type is an instance:

class Unconstrained a

instance Unconstrained a

The implementation of Monad-compatible sets is now straightforward:

type UniMonadSet = UniMonad Unconstrained Set

toUniMonadSet :: Set a -> UniMonadSet a
toUniMonadSet = lift

fromUniMonadSet :: Ord a => UniMonadSet a -> Set a
fromUniMonadSet = fold returnSet bindSet

The implementation of Monad-compatible multisets does not need any utility definitions, but can be given right away:

type UniMonadMSet = UniMonad Finite MSet

toUniMonadMSet :: Finite a => MSet a -> UniMonadMSet a
toUniMonadMSet = lift

fromUniMonadMSet :: Eq a => UniMonadMSet a -> MSet a
fromUniMonadMSet = fold returnMSet bindMSet
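Collecting the generic machinery, here is a self-contained sketch that also supplies the Functor and Applicative instances recent GHCs require, and runs a set computation in do-notation; the extension list and the andThen helper name are mine.

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE FlexibleInstances #-}

import Data.Kind (Constraint, Type)
import Data.Set (Set, fromList, singleton, toList, unions)

data UniMonad (c :: Type -> Constraint) (t :: Type -> Type) a where
    Return  :: a -> UniMonad c t a
    AtmBind :: c a => t a -> (a -> UniMonad c t b) -> UniMonad c t b

-- normalization on the fly, as in the article
andThen :: UniMonad c t a -> (a -> UniMonad c t b) -> UniMonad c t b
Return a     `andThen` f = f a
AtmBind ta h `andThen` f = AtmBind ta (\a -> h a `andThen` f)

instance Functor (UniMonad c t) where
    fmap f m = m `andThen` (Return . f)

instance Applicative (UniMonad c t) where
    pure      = Return
    mf <*> ma = mf `andThen` \f -> fmap f ma

instance Monad (UniMonad c t) where
    (>>=) = andThen

class Unconstrained a
instance Unconstrained a

lift :: c a => t a -> UniMonad c t a
lift ta = AtmBind ta Return

fold :: (a -> r)
     -> (forall b . c b => t b -> (b -> r) -> r)
     -> UniMonad c t a
     -> r
fold ret _       (Return a)     = ret a
fold ret atmBind (AtmBind ta h) = atmBind ta (\a -> fold ret atmBind (h a))

type UniMonadSet = UniMonad Unconstrained Set

fromUniMonadSet :: Ord a => UniMonadSet a -> Set a
fromUniMonadSet = fold singleton (\sa g -> unions (map g (toList sa)))

main :: IO ()
main = print $ fromUniMonadSet $ do
    x <- lift (fromList [1, 2, 3 :: Int])
    y <- lift (fromList [0, 100])
    return (x + y)
```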

Tagged: Andy Gill, constrained-normal (Haskell package), Constraint (kind), containers (Haskell package), functional programming, GADT, George Giorgidze, GHC, Haskell, Institute of Cybernetics, Jan Bracker, literate programming, monad, natural-numbers (Haskell package), Neil Sculthorpe, normal form, talk, Theory Lunch

by Wolfgang Jeltsch at February 18, 2017 03:32 AM

MIU in Curry

More than two years ago, my colleague Denis Firsov and I gave a series of three Theory Lunch talks about the MIU string rewriting system from Douglas Hofstadter’s MU puzzle. The first talk was about a Haskell implementation of MIU, the second talk was an introduction to the functional logic programming language Curry, and the third talk was about a Curry implementation of MIU. The blog articles MIU in Haskell and A taste of Curry are write-ups of the first two talks. However, a write-up of the third talk has never seen the light of day so far. This is changed with this article.

As usual, this article is written using literate programming. The article source is a literate Curry file, which you can load into KiCS2 to play with the code.

I want to thank all the people from the Curry mailing list who have helped me improving the code in this article.


We import the module SearchTree:

import SearchTree

Basic things

We define the type Sym of symbols and the type Str of symbol strings:

data Sym = M | I | U

showSym :: Sym -> String
showSym M = "M"
showSym I = "I"
showSym U = "U"

type Str = [Sym]

showStr :: Str -> String
showStr str = concatMap showSym str

Next, we define the type Rule of rules:

data Rule = R1 | R2 | R3 | R4

showRule :: Rule -> String
showRule R1 = "R1"
showRule R2 = "R2"
showRule R3 = "R3"
showRule R4 = "R4"

So far, the Curry code is basically the same as the Haskell code. However, this is going to change below.

Rule application

Rule application becomes a lot simpler in Curry. In fact, we can code the rewriting rules almost directly to get a rule application function:

applyRule :: Rule -> Str -> Str
applyRule R1 (init ++ [I])              = init ++ [I, U]
applyRule R2 ([M] ++ tail)              = [M] ++ tail ++ tail
applyRule R3 (pre ++ [I, I, I] ++ post) = pre ++ [U] ++ post
applyRule R4 (pre ++ [U, U] ++ post)    = pre ++ post

Note that we do not return a list of derivable strings, as we did in the Haskell solution. Instead, we use the fact that functions in Curry are nondeterministic.

Furthermore, we do not need the helper functions splits and replace that we used in the Haskell implementation. Instead, we use the ++-operator in conjunction with functional patterns to achieve the same functionality.

Now we implement a utility function applyRules for repeated rule application. Our implementation uses a similar trick as the famous Haskell implementation of the Fibonacci sequence:

applyRules :: [Rule] -> Str -> [Str]
applyRules rules str = tail strs where

    strs = str : zipWith applyRule rules strs

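For comparison, the famous Fibonacci definition uses the same knot-tying trick: the list is defined in terms of itself, one step behind, just as strs feeds its own elements back into zipWith.

```haskell
-- the classic self-referential stream: each element is computed
-- from earlier elements of the very list being defined
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

main :: IO ()
main = print (take 8 fibs)  -- [0,1,1,2,3,5,8,13]
```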
The Haskell implementation does not need the applyRules function, but it needs a lot of code about derivation trees instead. In the Curry solution, derivation trees are implicit, thanks to nondeterminism.


A derivation is a sequence of strings with rules between them such that each rule takes the string before it to the string after it. We define types for representing derivations:

data Deriv = Deriv [DStep] Str

data DStep = DStep Str Rule

showDeriv :: Deriv -> String
showDeriv (Deriv steps goal) = "        "                ++
                               concatMap showDStep steps ++
                               showStr goal              ++
                               "\n"

showDerivs :: [Deriv] -> String
showDerivs derivs = concatMap ((++ "\n") . showDeriv) derivs

showDStep :: DStep -> String
showDStep (DStep origin rule) = showStr origin ++
                                "\n-> ("       ++
                                showRule rule  ++
                                ") "

Now we implement a function derivation that takes two strings and returns the derivations that turn the first string into the second:

derivation :: Str -> Str -> Deriv
derivation start end
    | start : applyRules rules start =:= init ++ [end]
        = Deriv (zipWith DStep init rules) end where

    rules :: [Rule]
    rules free

    init :: [Str]
    init free

Finally, we define a function printDerivations that explicitly invokes a breadth-first search to compute and ultimately print derivations:

printDerivations :: Str -> Str -> IO ()
printDerivations start end = do
    searchTree <- getSearchTree (derivation start end)
    putStr $ showDerivs (allValuesBFS searchTree)

You may want to enter

printDerivations [M, I] [M, I, U]

at the KiCS2 prompt to see the derivations function in action.

Tagged: breadth-first search, Curry, Denis Firsov, Douglas Hofstadter, functional logic programming, functional pattern, functional programming, Haskell, Institute of Cybernetics, KiCS2, literate programming, logic programming, MU puzzle, string rewriting, talk, Theory Lunch

by Wolfgang Jeltsch at February 18, 2017 03:17 AM

A taste of Curry

Curry is a programming language that integrates functional and logic programming. Last week, Denis Firsov and I had a look at Curry, and Thursday, I gave an introductory talk about Curry in the Theory Lunch. This blog post is mostly a write-up of my talk.

Like Haskell, Curry has support for literate programming. So I wrote this blog post as a literate Curry file, which is available for download. If you want to try out the code, you have to install the Curry system KiCS2. The code uses the functional patterns language extension, which is only supported by KiCS2, as far as I know.

Functional programming

The functional fragment of Curry is very similar to Haskell. The only fundamental difference is that Curry does not support type classes.

Let us do some functional programming in Curry. First, we define a type whose values denote me and some of my relatives.

data Person = Paul
            | Joachim
            | Rita
            | Wolfgang
            | Veronika
            | Johanna
            | Jonathan
            | Jaromir

Now we define a function that yields the father of a given person if this father is covered by the Person type.

father :: Person -> Person
father Joachim  = Paul
father Rita     = Joachim
father Wolfgang = Joachim
father Veronika = Joachim
father Johanna  = Wolfgang
father Jonathan = Wolfgang
father Jaromir  = Wolfgang

Based on father, we define a function for computing grandfathers. To keep things simple, we only consider fathers of fathers to be grandfathers, not fathers of mothers.

grandfather :: Person -> Person
grandfather = father . father

Combining functional and logic programming

Logic programming languages like Prolog are able to search for variable assignments that make a given proposition true. Curry, on the other hand, can search for variable assignments that make a certain expression defined.

For example, we can search for all persons that have a grandfather according to the above data. We just enter

grandfather person where person free

at the KiCS2 prompt. KiCS2 then outputs all assignments to the person variable for which grandfather person is defined. For each of these assignments, it additionally prints the result of the expression grandfather person.


Functions in Curry can actually be non-deterministic, that is, they can return multiple results. For example, we can define a function element that returns any element of a given list. To achieve this, we use overlapping patterns in our function definition. If several equations of a function definition match a particular function application, Curry takes all of them, not only the first one, as Haskell does.

element :: [el] -> el
element (el : _)   = el
element (_  : els) = element els

Now we can enter

element "Hello!"

at the KiCS2 prompt, and the system outputs six different results.

Logic programming

We have already seen how to combine functional and logic programming with Curry. Now we want to do pure logic programming. This means that we only want to search for variable assignments, but are not interested in expression results. If you are not interested in results, you typically use a result type with only a single value. Curry provides the type Success with the single value success for doing logic programming.

Let us write some example code about routes between countries. We first introduce a type of some European and American countries.

data Country = Canada
             | Estonia
             | Germany
             | Latvia
             | Lithuania
             | Mexico
             | Poland
             | Russia
             | USA

Now we want to define a relation called borders that tells us which country borders which other country. We implement this relation as a function of type

Country -> Country -> Success

that has the trivial result success if the first country borders the second one, and has no result otherwise.

Note that this approach of implementing a relation is different from what we do in functional programming. In functional programming, we use Bool as the result type and signal falsity by the result False. In Curry, however, we signal falsity by the absence of a result.

Our borders relation only relates countries with those neighbouring countries whose names come later in alphabetical order. We will soon compute the symmetric closure of borders to also get the opposite relationships.

borders :: Country -> Country -> Success
Canada    `borders` USA       = success
Estonia   `borders` Latvia    = success
Estonia   `borders` Russia    = success
Germany   `borders` Poland    = success
Latvia    `borders` Lithuania = success
Latvia    `borders` Russia    = success
Lithuania `borders` Poland    = success
Mexico    `borders` USA       = success

Now we want to define a relation isConnected that tells whether two countries can be reached from each other via a land route. Clearly, isConnected is the equivalence relation that is generated by borders. In Prolog, we would write clauses that directly express this relationship between borders and isConnected. In Curry, on the other hand, we can write a function that generates an equivalence relation from any given relation and therefore does not only work with borders.

We first define a type alias Relation for the sake of convenience.

type Relation val = val -> val -> Success

Now we define what reflexive, symmetric, and transitive closures are.

reflClosure :: Relation val -> Relation val
reflClosure rel val1 val2 = rel val1 val2
reflClosure rel val  val  = success

symClosure :: Relation val -> Relation val
symClosure rel val1 val2 = rel val1 val2
symClosure rel val2 val1 = rel val1 val2

transClosure :: Relation val -> Relation val
transClosure rel val1 val2 = rel val1 val2
transClosure rel val1 val3 = rel val1 val2 &
                             transClosure rel val2 val3

    where val2 free

The operator & used in the definition of transClosure has type

Success -> Success -> Success

and denotes conjunction.

We define the function for generating equivalence relations as a composition of the above closure operators. Note that it is crucial that the transitive closure operator is applied after the symmetric closure operator, since the symmetric closure of a transitive relation is not necessarily transitive.

equivalence :: Relation val -> Relation val
equivalence = reflClosure . transClosure . symClosure

The implementation of isConnected is now trivial.

isConnected :: Country -> Country -> Success
isConnected = equivalence borders

Now we let KiCS2 compute which countries I can reach from Estonia without a ship or plane. We do so by entering

Estonia `isConnected` country where country free

at the prompt.

We can also implement a nondeterministic function that turns a country into the countries connected to it. For this, we use a guard that is of type Success. Such a guard succeeds if it has a result at all, which can only be success, of course.

connected :: Country -> Country
connected country1
    | country1 `isConnected` country2 = country2

    where country2 free

Equational constraints

Curry has a predefined operator

=:= :: val -> val -> Success

that stands for equality.

We can use this operator, for example, to define a nondeterministic function that yields the grandchildren of a given person. Again, we keep things simple by only considering relationships that solely go via fathers.

grandchild :: Person -> Person
grandchild person
    | grandfather grandkid =:= person = grandkid

    where grandkid free

Note that grandchild is the inverse of grandfather.

Functional patterns

Functional patterns are a language extension that allows us to use ordinary functions in patterns, not just data constructors. Functional patterns are implemented by KiCS2.

Let us look at an example again. We want to define a function split that nondeterministically splits a list into two parts.1 Without functional patterns, we can implement splitting as follows.

split' :: [el] -> ([el],[el])
split' list | front ++ rear =:= list = (front,rear)

    where front, rear free

With functional patterns, we can implement splitting in a much simpler way.

split :: [el] -> ([el],[el])
split (front ++ rear) = (front,rear)

As a second example, let us define a function sublist that yields the sublists of a given list.

sublist :: [el] -> [el]
sublist (_ ++ sub ++ _) = sub

Inverting functions

In the grandchild example, we showed how we can define the inverse of a particular function. We can go further and implement a generic function inversion operator.

inverse :: (val -> val') -> (val' -> val)
inverse fun val' | fun val =:= val' = val where val free

With this operator, we could also implement grandchild as inverse grandfather.

Inverting functions can make our lives a lot easier. Consider the example of parsing. A parser takes a string and returns a syntax tree. Writing a parser directly is a non-trivial task. However, generating a string from a syntax tree is just a simple functional programming exercise. So we can implement a parser in a simple way by writing a converter from syntax trees to strings and inverting it.

We show this for the language of all arithmetic expressions that can be built from addition, multiplication, and integer constants. We first define types for representing abstract syntax trees. These types resemble a grammar that takes precedence into account.

type Expr = Sum

data Sum     = Sum Product [Product]
data Product = Product Atom [Atom]
data Atom    = Num Int | Para Sum

Now we implement the conversion from abstract syntax trees to strings.

toString :: Expr -> String
toString = sumToString

sumToString :: Sum -> String
sumToString (Sum product products)
    = productToString product                           ++
      concatMap ((" + " ++) . productToString) products

productToString :: Product -> String
productToString (Product atom atoms)
    = atomToString atom                           ++
      concatMap ((" * " ++) . atomToString) atoms

atomToString :: Atom -> String
atomToString (Num num)  = show num
atomToString (Para sum) = "(" ++ sumToString sum ++ ")"

Implementing the parser is now extremely simple.

parse :: String -> Expr
parse = inverse toString

KiCS2 uses a depth-first search strategy by default. However, our parser implementation does not work with depth-first search. So we switch to breadth-first search by entering

:set bfs

at the KiCS2 prompt. Now we can try out the parser by entering

parse "2 * (3 + 4)" .

  1. Note that our split function is not the same as the split function in Curry’s List module.

Tagged: breadth-first search, Curry, Denis Firsov, depth-first search, functional logic programming, functional pattern, functional programming, Institute of Cybernetics, KiCS2, literate programming, logic programming, parsing, Prolog, talk, Theory Lunch, type class

by Wolfgang Jeltsch at February 18, 2017 03:14 AM

February 16, 2017

Mark Jason Dominus

Automatically checking for syntax errors with Git's pre-commit hook

Previous related article
Earlier related article

Over the past couple of days I've written about how I committed a syntax error on a cron script, and a co-worker had to fix it on Saturday morning. I observed that I should have remembered to check the script for syntax errors before committing it, and several people wrote to point out to me that this is the sort of thing one should automate.

(By the way, please don't try to contact me on Twitter. It won't work. I have been on Twitter Vacation for months and have no current plans to return.)

Git has a “pre-commit hook” feature, which means that you can set up a program that will be run every time you attempt a commit, and which can abort the commit if it doesn't like what it sees. This is the natural place to put an automatic syntax check. Some people suggested that it should be part of the CI system, or even the deployment system, but I don't control those, and anyway it is much better to catch this sort of thing as early as possible. I decided to try to implement a pre-commit hook to check syntax.

Unlike some of the git hooks, the pre-commit hook is very simple to use. It gets run when you try to make a commit, and the commit is aborted if the hook exits with a nonzero status.

I made one mistake right off the bat: I wrote the hook in Bourne shell, even though I swore years ago to stop writing shell scripts. Everything that I want to write in shell should be written in Perl instead or in some equivalently good language like Python. But the sample pre-commit hook was written in shell and when I saw it I went into automatic shell scripting mode and now I have yet another shell script that will have to be replaced with Perl when it gets bigger. I wish I would stop doing this.

Here is the hook, which, I should say up front, I have not yet tried in day-to-day use. The complete and current version is on github.


    function typeof () {
        filename=$1
        case $filename in
            *.pl | *.pm) echo perl; exit ;;
        esac

        line1=$(head -1 $1)
        case $line1 in '#!'*perl )
            echo perl; exit ;;
        esac
    }

Some of the sample programs people showed me decided which files needed to be checked based only on the filename. This is not good enough. My most important Perl programs have filenames with no extension. This typeof function decides which set of checks to apply to each file, and the minimal demonstration version here can do that based on filename or by looking for the #!...perl line in the first line of the file contents. I expect that this function will expand to include other file types; for example

               *.py ) echo python; exit ;;

is an obvious next step.

    if [ ! -z $COMMIT_OK ]; then
        exit 0;
    fi

This block is an escape hatch. One day I will want to bypass the hook and make a commit without performing the checks, and then I can COMMIT_OK=1 git commit …. There is actually a --no-verify flag to git-commit that will skip the hook entirely, but I am unlikely to remember it.

(I am also unlikely to remember COMMIT_OK=1. But I know from experience that I will guess that I might have put an escape hatch into the hook. I will also guess that there might be a flag to git-commit that does what I want, but that will seem less likely to be true, so I will look in the hook program first. This will be a good move because my hook is much shorter than the git-commit man page. So I will want the escape hatch, I will look for it in the best place, and I will find it. That is worth two lines of code. Sometimes I feel like the guy in Memento. I have not yet resorted to tattooing COMMIT_OK=1 on my chest.)

    exec 1>&2

This redirects the standard output of all subsequent commands to go to standard error instead. It makes it more convenient to issue error messages with echo and such like. All the output this hook produces is diagnostic, so it is appropriate for it to go to standard error.

    allOK=true
    badFiles=''
    for file in $(git diff --cached --name-only | sort) ; do

allOK is true if every file so far has passed its checks. badFiles is a list of files that failed their checks. The git diff --cached --name-only command interrogates the Git index for a list of the files that have been staged for commit.

        type=$(typeof "$file")

This invokes the typeof function from above to decide the type of the current file.


When a check discovers that the current file is bad, it will signal this by setting BAD to true.

        BAD=false
        echo "##  Checking file $file (type $type)"
        case $type in
            perl )
                perl -cw $file || BAD=true
                [ -x $file ] || { echo "File is not executable"; BAD=true; }
                ;;
            * )
                echo "Unknown file type: $file; no checks"
                ;;
        esac
This is the actual checking. To check Python files, we would add a python) … ;; block here. The * ) case is a catchall. The perl checks run perl -cw, which does syntax checking without executing the program. It then checks to make sure the file is executable, which I am sure is a mistake, because these checks are run for .pm files, which are not normally supposed to be executable. But I wanted to test it with more than one kind of check.

        if $BAD; then
            allOK=false
            badFiles="$badFiles;$file"
        fi
    done

If the current file was bad, the allOK flag is set false, and the commit will be aborted. The current filename is appended to badFiles for a later report. Bash has array variables but I don't remember how they work and the manual made it sound gross. Already I regret not writing this in a real language.

After the modified files have been checked, the hook exits successfully if they were all okay, and prints a summary if not:

    if $allOK; then
        exit 0;
    else
        echo ''
        echo '## Aborting commit.  Failed checks:'
        for file in $(echo $badFiles | tr ';' ' '); do
            echo "    $file"
        done
        exit 1;
    fi

This hook might be useful, but I don't know yet; as I said, I haven't really tried it. But I can see ahead of time that it has a couple of drawbacks. Of course it needs to be built out with more checks. A minor bug is that I'd like to apply that is-executable check only to Perl files that do not end in .pm, but that will be an easy fix.

But it does have one serious problem I don't know how to fix yet. The hook checks the versions of the files that are in the working tree, but not the versions that are actually staged for the commit!

The most obvious problem this might cause is that I might try to commit some files, and then the hook properly fails because the files are broken. Then I fix the files, but forget to add the fixes to the index. But because the hook is looking at the fixed versions in the working tree, the checks pass, and the broken files are committed!

A similar sort of problem, but going the other way, is that I might make several changes to some file, use git add -p to add the part I am ready to commit, but then the commit hook fails, even though the commit would be correct, because the incomplete changes are still in the working tree.

I did a little tinkering with git stash save -k to try to stash the unstaged changes before running the checks, something like this:

        git stash save -k "pre-commit stash" || exit 2
        trap "git stash pop" EXIT

but I wasn't able to get anything to work reliably. Stashing a modified index has never worked properly for me, perhaps because there is something I don't understand. Maybe I will get it to work in the future. Or maybe I will try a different method; I can think of several offhand:

  • The hook could copy each file to a temporary file and then run the check on the temporary file. But then the diagnostics emitted by the checks would contain the wrong filenames.

  • It could move each file out of the way, check out the currently-staged version of the file, check that, and then restore the working tree version. (It can skip this process for files where the staged and working versions are identical.) This is not too complicated, but if it messes up it could catastrophically destroy the unstaged changes in the working tree.

  • Check out the entire repository and modified index into a fresh working tree and check that, then discard the temporary working tree. This is probably too expensive.

  • This one is kind of weird. It could temporarily commit the current index (using --no-verify), stash the working tree changes, and check the files. When the checks are finished, it would unstash the working tree changes, use git-reset --soft to undo the temporary commit, and proceed with the real commit if appropriate.

  • Come to think of it, this last one suggests a much better version of the same thing: instead of a pre-commit hook, use a post-commit hook. The post-commit hook will stash any leftover working tree changes, check the committed versions of the files, unstash the changes, and, if the checks failed, undo the commit with git-reset --soft.

Right now the last one looks much the best but perhaps there's something straightforward that I didn't think of yet.

[ Thanks to Adam Sjøgren, Jeffrey McClelland, and Jack Vickeridge for discussing this with me. Jeffrey McClelland also suggested that syntax checks could be profitably incorporated as a post-receive hook, which is run on the remote side when new commits are pushed to a remote. I said above that running the checks in the CI process seems too late, but the post-receive hook is earlier and might be just the thing. ]

[ Addendum: Daniel Holz wrote to tell me that the Yelp pre-commit framework handles the worrisome case of unstaged working tree changes. The strategy is different from the ones I suggested above. If I'm reading this correctly, it records the unstaged changes in a patch file, which it sticks somewhere, and then checks out the index. If all the checks succeed, it completes the commit and then tries to apply the patch to restore the working tree changes. The checks in Yelp's framework might modify the staged files, and if they do, the patch might not apply; in this case it rolls back the whole commit. Thank you, M. Holz! ]

by Mark Dominus at February 16, 2017 03:10 PM

Michael Snoyman

Better Exception Messages

Better exception messages

Let's write a really silly, highly inefficient (my favorite kind!) program that connects to multiple HTTP servers and sends a very simple request. Using the network package, this is really straightforward:

#!/usr/bin/env stack
-- stack --install-ghc --resolver lts-8.0 runghc --package network -- -Wall -Werror
import Control.Monad (forM, forM_)
import Network       (PortID (PortNumber), PortNumber, connectTo)
import System.IO     (hClose, hPutStrLn)

dests :: [(String, PortNumber)]
dests =
    [ ("localhost", 80)
    , ("localhost", 8080)
    , ("", 80)
    ]

main :: IO ()
main = do
    handles <- forM dests $ \(host, port) -> connectTo host (PortNumber port)
    forM_ handles $ \h -> hPutStrLn h "GET / HTTP/1.1\r\n\r\n"
    forM_ handles hClose

We have our destinations. We open a connection to each of them, send our data, and then close the connection. You may have plenty of objections to how I've written this: we shouldn't be using String, we should flush the Handle, and so on. Just ignore that for now. I'm going to run this on my local system, and get the following output:

$ ./foo.hs 
foo.hs: connect: does not exist (Connection refused)

Riddle me this: which of the destinations above did the connection fail for? Answer: without changing our program, we have no idea. And that's the point of this blog post: all too often in the Haskell world, we get error messages from a program without nearly enough information to debug it. Prelude.undefined, no parse, and Prelude.head: empty list are all infamous examples where a nice stack trace would save lots of pain. I'm talking about something slightly different.

When you throw an exception in your code, whether it be via throwIO, returning Left, using fail, or using error, please give us some context. During development, it's a pain to have to dive into the code, add some trace statements, figure out what the actual problem is, and then remove the trace statements. When running in production, that extra information can be the difference between a two-minute operations-level fix (like opening a port in the firewall) and a multi-hour debugging excursion.

Concretely, here's an example of how I'd recommend collecting more information from connectTo:

#!/usr/bin/env stack
-- stack --install-ghc --resolver lts-5.10 runghc --package network -- -Wall -Werror
{-# LANGUAGE DeriveDataTypeable #-}
import Control.Exception (Exception, IOException, catch, throwIO)
import Control.Monad     (forM, forM_)
import Data.Typeable     (Typeable)
import Network           (HostName, PortID (PortNumber), PortNumber, connectTo)
import System.IO         (Handle, hClose, hPutStrLn)

data ConnectException = ConnectException HostName PortID IOException
    deriving (Show, Typeable)
instance Exception ConnectException

connectTo' :: HostName -> PortID -> IO Handle
connectTo' host port = connectTo host port `catch`
    \e -> throwIO (ConnectException host port e)

dests :: [(String, PortNumber)]
dests =
    [ ("localhost", 80)
    , ("localhost", 8080)
    , ("", 80)
    ]

main :: IO ()
main = do
    handles <- forM dests $ \(host, port) -> connectTo' host (PortNumber port)
    forM_ handles $ \h -> hPutStrLn h "GET / HTTP/1.1\r\n\r\n"
    forM_ handles hClose

Notice how the ConnectException datatype provides plenty of information about the context that connectTo' was called from (in fact, all available information). If I run this program, the problem is immediately obvious:

$ ./bar.hs 
bar.hs: ConnectException "localhost" (PortNumber 80) connect: does not exist (Connection refused)

My web server isn't running locally on port 80. My ops team can now go kick the nginx/Warp process or do whatever other magic they need to do to get things running. All without bothering me at 2am :)

You may be thinking that this extra data type declaration is a lot of boilerplate overhead. While it does add some tedium, the benefit of being able to not only catch the exact exception we care about, but also easily extract the relevant context information, can pay off in completely unexpected ways in the future. I highly recommend it.
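To make that concrete, here is a minimal, self-contained sketch of catching such an exception and using its context. The simplified ConnectException fields and the describeFailure helper are my own illustration, not from any library:

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Control.Exception (Exception, throwIO, try)
import Data.Typeable     (Typeable)

-- Simplified stand-in for the post's ConnectException: host, port, and
-- the underlying failure (illustrative field types).
data ConnectException = ConnectException String Int IOError
    deriving (Show, Typeable)
instance Exception ConnectException

-- Because the context travels with the exception, a handler can report
-- (or retry) per destination instead of seeing a bare "Connection refused".
describeFailure :: ConnectException -> String
describeFailure (ConnectException host port _) =
    "could not reach " ++ host ++ ":" ++ show port

main :: IO ()
main = do
    result <- try (throwIO (ConnectException "localhost" 80 (userError "refused")))
    case result of
        Left e   -> putStrLn (describeFailure e)
        Right () -> putStrLn "connected"
```

Pattern matching on the constructor hands the handler the host and port directly, which is exactly the information the original error message was missing.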

Since no Haskell blog post about exceptions is complete without it, let me cover some controversy:

  • I know some people absolutely hate runtime exceptions. This point is orthogonal: however you decide to report exceptions to your users (Left, ExceptT, impure exceptions, etc), be kind to them and provide this extra context information.
  • There are some problems with the approach I gave above regarding hierarchical exceptions. I'm specifically not diving into the details of hierarchical exceptions right now, since it's a complex topic that deserves its own dedicated post.
  • Similar to the above point, it's a fair question whether you should group all exceptions together in one type with lots of data constructors for a package/module, or create lots of separate datatypes. Again, proper design of an exception type really deserves its own post. FWIW, in http-client, I elected to have an HttpException type with lots of data constructors.

Also, I left it out for brevity, but including a displayException method in your Exception instance can allow programs to display much more user-friendly error messages to end users.
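As a sketch of that, with message wording that is mine rather than from any existing library:

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Control.Exception (Exception (displayException))
import Data.Typeable     (Typeable)

-- Illustrative exception type: host, port, and a failure description.
data ConnectException = ConnectException String Int String
    deriving (Show, Typeable)

instance Exception ConnectException where
    -- The default displayException is just show; overriding it gives
    -- end users a friendlier message without losing the structured data.
    displayException (ConnectException host port reason) =
        "Could not connect to " ++ host ++ " on port " ++ show port
        ++ ": " ++ reason

main :: IO ()
main = putStrLn (displayException (ConnectException "localhost" 80 "connection refused"))
```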

While nothing I've said here is revolutionary, it's a small tweak to a library author's development style that can have a profound impact on users of the library, both at the dev level and those running the executable itself.

February 16, 2017 01:24 PM

Tom Schrijvers

TFP 2017: Call for Papers

                     C A L L   F O R   P A P E R S

                     ======== TFP 2017 ===========

           18th Symposium on Trends in Functional Programming
                            19-21 June, 2017
                      University of Kent, Canterbury

The symposium on Trends in Functional Programming (TFP) is an
international forum for researchers with interests in all aspects of
functional programming, taking a broad view of current and future
trends in the area. It aspires to be a lively environment for
presenting the latest research results, and other contributions (see
below). Authors of draft papers will be invited to submit revised
papers based on the feedback received at the symposium.  A
post-symposium refereeing process will then select a subset of these
articles for formal publication.

TFP 2017 will be the main event of a pair of functional programming
events. TFP 2017 will be accompanied by the International Workshop on
Trends in Functional Programming in Education (TFPIE), which will take
place on 22 June.

The TFP symposium is the heir of the successful series of Scottish
Functional Programming Workshops. Previous TFP symposia were held in
    * Edinburgh (Scotland) in 2003;
    * Munich (Germany) in 2004;
    * Tallinn (Estonia) in 2005;
    * Nottingham (UK) in 2006;
    * New York (USA) in 2007;
    * Nijmegen (The Netherlands) in 2008;
    * Komarno (Slovakia) in 2009;
    * Oklahoma (USA) in 2010;
    * Madrid (Spain) in 2011;
    * St. Andrews (UK) in 2012;
    * Provo (Utah, USA) in 2013;
    * Soesterberg (The Netherlands) in 2014;
    * Inria Sophia-Antipolis (France) in 2015;
    * and Maryland (USA) in 2016.

For further general information about TFP please see the TFP homepage.

== SCOPE ==

The symposium recognizes that new trends may arise through various
routes.  As part of the Symposium's focus on trends we therefore
identify the following five article categories. High-quality articles
are solicited in any of these categories:

Research Articles: leading-edge, previously unpublished research work
Position Articles: on what new trends should or should not be
Project Articles: descriptions of recently started new projects
Evaluation Articles: what lessons can be drawn from a finished project
Overview Articles: summarizing work with respect to a trendy subject

Articles must be original and not simultaneously submitted for
publication to any other forum. They may consider any aspect of
functional programming: theoretical, implementation-oriented, or
experience-oriented.  Applications of functional programming
techniques to other languages are also within the scope of the
symposium.

Topics suitable for the symposium include, but are not limited to:

      Functional programming and multicore/manycore computing
      Functional programming in the cloud
      High performance functional computing
      Extra-functional (behavioural) properties of functional programs
      Dependently typed functional programming
      Validation and verification of functional programs
      Debugging and profiling for functional languages
      Functional programming in different application areas:
        security, mobility, telecommunications applications, embedded
        systems, global computing, grids, etc.
      Interoperability with imperative programming languages
      Novel memory management techniques
      Program analysis and transformation techniques
      Empirical performance studies
      Abstract/virtual machines and compilers for functional languages
      (Embedded) domain specific languages
      New implementation strategies
      Any new emerging trend in the functional programming area

If you are in doubt on whether your article is within the scope of
TFP, please contact the TFP 2017 program chairs, Scott Owens and Meng Wang.


== BEST PAPER AWARDS ==

To reward excellent contributions, TFP awards a prize for the best paper
accepted for the formal proceedings.

TFP traditionally pays special attention to research students,
acknowledging that students are almost by definition part of new
subject trends. A student paper is one for which the authors state
that the paper is mainly the work of students, the students are listed
as first authors, and a student would present the paper. A prize for
the best student paper is awarded each year.

In both cases, it is the PC of TFP that awards the prize. In case the
best paper happens to be a student paper, that paper will then receive
both prizes.




== PAPER SUBMISSIONS ==

Acceptance of articles for presentation at the symposium is based on a
lightweight peer review process of extended abstracts (4 to 10 pages
in length) or full papers (20 pages). The submission must clearly
indicate which category it belongs to: research, position, project,
evaluation, or overview paper. It should also indicate which authors
are research students, and whether the main author(s) are students.  A
draft paper for which ALL authors are students will receive additional
feedback by one of the PC members shortly after the symposium has
taken place.

We use EasyChair for the refereeing process. Papers must be submitted at:

Papers must be written in English, and written using the LNCS
style. For more information about formatting please consult the
Springer LNCS web site:


== IMPORTANT DATES ==

Submission of draft papers:     5 May, 2017
Notification:                   12 May, 2017
Registration:                   11 June, 2017
TFP Symposium:                  19-21 June, 2017
Student papers feedback:        29 June, 2017
Submission for formal review:   2 August, 2017
Notification of acceptance:     3 November, 2017
Camera ready paper:             2 December, 2017



by Tom Schrijvers at February 16, 2017 08:58 AM

February 15, 2017

Functional Jobs

Data Engineer at Takt, Inc. (Full-time)

Takt is seeking data engineers to help develop our flagship product. Our platform learns and adapts to people's preferences, habits, and feedback—orchestrating highly relevant experiences that are truly unique to each person. Our vision will change the way people engage across multiple industries, be it retail, finance, or healthcare.

We share your passion for using data to solve complex problems. You understand that legacy code is the work you did yesterday. You'll work in small, self-sufficient teams with a common goal: deliver excellent software anchored in an agile culture of quality, delivery, and innovation.

Key Responsibilities:

  • Apply data to solve complex problems—understanding the importance of data pipelines and quality
  • Refine, harden, and productionize models developed by data scientists
  • Build infrastructure for real-time analytics and real-time predictive intelligence based on large, diverse, and dynamic data sets
  • Apply software testing practices to the work you do
  • Help grow our engineering team

Skills and Experience:

  • Profound and extensive experience with cluster computing frameworks; we work with Spark and Scala
  • Strong programming skills in at least one language; functional languages (Haskell, Scala, Clojure, Erlang) are preferred
  • Significant experience with SQL and NoSQL databases
  • Demonstrated experience with cloud platforms, such as AWS

Bonus Points:

  • You welcome the responsibility and thrill that comes with being a member of a founding team
  • You're motivated, dependable, and continuously focused on excellence
  • You court perfection, but are grounded and practical
  • You're interested in improving humanity's relationship with technology

Get information on how to apply for this position.

February 15, 2017 09:35 PM

Systems and Infrastructure Engineer at Takt, Inc. (Full-time)

Takt is seeking Systems Infrastructure Engineers to support the development of our flagship product. Our platform learns and adapts to people's preferences, habits, and feedback—orchestrating highly relevant experiences that are truly unique to each person. Our vision will change the way people engage with their favorite brands across multiple industries, be it retail, finance, or healthcare.

As a Systems and Infrastructure Engineer at Takt you understand that legacy code is the work you did yesterday. You’re well versed in modern technologies and select tools based on what is best for the team, product and organization. If those tools don’t exist, you’ll roll up your sleeves and build them. At Takt “DevOps” isn’t a role, but an approach to collaboration. We work in small, self-sufficient teams with the shared goal of delivering excellent software anchored in an agile culture of quality, delivery, and innovation.

Key Responsibilities:

  • Implement next-generation operations tools (AWS, Terraform, Datadog)
  • Evaluate available self-service tools in addition to designing and implementing new automation tools in Haskell (previous Haskell experience not required)
  • Unify metrics, monitoring, auto-scaling, automation, orchestration, service management, and deployments
  • Utilize agile approach to promote better working relationships and product delivery
  • Help grow our engineering team

Skills and Experience:

  • Extensive experience with cloud services platforms (AWS, OpenStack, Azure, Google Cloud)
  • Demonstrated proficiency in at least one programming language; functional languages are preferred (but not required)
  • In-depth knowledge of Linux-based operating systems
  • Knowledge of standard configuration tools to launch and control changes to infrastructure (Terraform, Chef, Puppet, Ansible, Docker, Kubernetes, Mesos)
  • Proficiency with performance analytics tools (Datadog, New Relic, LogiMonitor, AppDynamics)
  • Experience with continuous integration and deployment

Bonus Points:

  • Passion for functional programming and type systems
  • Prior experience with enterprise SaaS products
  • Knowledge of database architecture, high availability, failover, and performance tuning
  • Expertise in secure network design and implementation
  • Experience with implementing monitoring, alerting, dashboarding and metrics collection

Get information on how to apply for this position.

February 15, 2017 09:30 PM

Mark Jason Dominus

More thoughts on a line of code with three errors

Yesterday I wrote, in great irritation, about a line of code I had written that contained three errors.

I said:

What can I learn from this? Most obviously, that I should have tested my code before I checked it in.

Afterward, I felt that this was inane, and that the matter required a little more reflection. We do not test every single line of every program we write; in most applications that would be prohibitively expensive, and in this case it would have been excessive.

The change I was making was in the format of the diagnostic that the program emitted as it finished to report how long it had taken to run. This is not an essential feature. If the program does its job properly, it is of no real concern if it incorrectly reports how long it took to run. Two of my errors were in the construction of the message. The third, however, was a syntax error that prevented the program from running at all.

Having reflected on it a little more, I have decided that I am only really upset about the last one, which necessitated an emergency Saturday-morning repair by a co-worker. It was quite acceptable not to notice ahead of time that the report would be wrong, to notice it the following day, and to fix it then. I would have said “oops” and quietly corrected the code without feeling like an ass.

The third problem, however, was serious. And I could have prevented it with a truly minimal amount of effort, just by running:

    perl -cw the-script

This would have diagnosed the syntax error, and avoided the main problem at hardly any cost. I think I usually remember to do something like this. Had I done it this time, the modified script would have gone into production, would have run correctly, and then I could have fixed the broken timing calculation on Monday.

In the previous article I showed the test program that I wrote to test the time calculation after the program produced the wrong output. I think it was reasonable to postpone writing this until after the program ran and produced the wrong output. (The program's behavior in all other respects was correct and unmodified; it was only its report about its running time that was incorrect.) To have written the test ahead of time might be an excess of caution.

There has to be a tradeoff between cautious preparation and risk. Here I put everything on the side of risk, even though a tiny amount of caution would have eliminated most of the risk. In my haste, I made a bad trade.

[ Addendum 20170216: I am looking into automating the perl -cw check. ]

by Mark Dominus at February 15, 2017 12:54 AM

February 14, 2017

Michael Snoyman

Hackage Security and Stack

Back in 2015, there were two proposals made for securing package distribution in Haskell. The Stackage team proposed and implemented a solution using HTTPS and Git, which was then used as the default in Stack. Meanwhile, the Hackage team moved ahead with hackage-security. Over the past few weeks, I've been working on moving Stack over to hackage-security (more on motivation below). The current status of the overall hackage-security roll-out is:

  • Hackage is now providing the relevant data for hackage-security (the 01-index.tar file and signature files)
  • cabal-install will move over to hackage-security in its next release
  • The FP Complete Hackage mirror is using hackage-security (and in particular Herbert's hackage-mirror-tool) to run its S3-backed mirror.
  • On the master branch, Stack defaults to using hackage-security for downloading package metadata. We may even remove support for Git-based indices entirely, but that's a discussion for another day.

One upside to this is more reliable package index download time. We have had complaints from some firewalled users of slow Git clone time, so this is a good thing. We're still planning on maintaining the Git-based package indices for people using them (to my knowledge they are still being used by Nix, and all-cabal-metadata is still used to power a lot of the information on

However, there's one significant downside I've encountered in the current implementation that I want to discuss.


Quick summary of how hackage-security works: there is a 01-index.tar file, the contents of which I'll discuss momentarily. This is the file which is downloaded by Stack/cabal-install when you "update your index." It is signed by a cryptographic algorithm specified within the hackage-security project, and whenever a client does an update, it must verify the signature. In theory, when that signature is verified, we know that the contents of the 01-index.tar file are unmodified.

Within this file are two (relevant) kinds of files: the .cabal files for every upload to Hackage (including revisions), and .json files containing metadata about the package tarballs themselves. Importantly, this includes a SHA256 checksum and the size of the tarball. Using these already-validated-to-be-correct JSON files, we can download and verify a package tarball, even over an insecure connection.
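That check can be sketched as follows, using hashlazy from the cryptonite package; the verifyTarball name and its arguments are illustrative, not the actual hackage-security API:

```haskell
import           Crypto.Hash          (Digest, SHA256, hashlazy)
import qualified Data.ByteString.Lazy as L

-- Accept a downloaded tarball only if both the size and the SHA256
-- digest match the values recorded in the signed .json metadata.
verifyTarball :: Integer -> String -> L.ByteString -> Bool
verifyTarball expectedSize expectedSha256 bytes =
    fromIntegral (L.length bytes) == expectedSize
        && show (hashlazy bytes :: Digest SHA256) == expectedSha256

main :: IO ()
main =
    -- SHA256 of the empty string, standing in for real metadata values.
    print (verifyTarball 0
        "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
        L.empty)
```

Since the .json file is covered by the index signature, a match here means the tarball can be trusted even when fetched over an insecure connection.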

The alternative Git-based approach that the Stackage team proposed has an almost-identical JSON file concept in the all-cabal-hashes repo. Originally, these were generated by downloading tarballs from (note the HTTPS). However, a number of months back it became known that the connection between the CDN in front of Hackage and Hackage itself was not TLS-secured, and therefore reliance on HTTPS was not possible. We now rely on the JSON files provided by hackage-security to generate the JSON files used in the Git repo.

The problem

With that background, the bug is easy to describe: sometimes the .json files are missing from the 01-index.tar file. This was originally opened in April 2016 (for Americans: on tax day no less), and then I rediscovered the issue three weeks ago when working on Stack.

Over the weekend, another .json file went missing, resulting in the FP Complete mirror not receiving updates until I manually updated the list of missing index files. Due to the inability to securely generate the .json file in the all-cabal-hashes Git repo without the file existing upstream, that file is now missing in all-cabal-hashes, causing downstream issues to the Nix team.

How it manifests

There are a number of outcomes to be aware of from this issue:

  • The FP Complete mirror, and any other mirror using Herbert's tool, will sometimes stop updating if a new JSON file is missing. This is an annoyance for end users, and a frustration for the mirror maintainers. Fortunately, updating the mirror tool code with the added index isn't too heavy a burden. Unfortunately, due to the lack of HTTPS between Hackage and its CDN, there's no truly secure way to do this update.
  • End users cannot currently use packages securely if they are affected by this bug. You can see the full list at the time of writing this post.
  • Stack has had code in place to reject indices that do not provide complete signature cover for a long while (I think since its initial release). Unfortunately, this code cannot be turned on for hackage-security (which is how I discovered this bug in the first place). We can implement a new functionality with weaker requirements (refuse to download a package that is missing signature information), but ideally we could use the more strict semantics.
  • The Nix team cannot rely on hashes being present in all-cabal-hashes. I can't speak to the Nix team internal processes, and cannot therefore assess how big an impact that is.


Overall, I'm still very happy that we've moved Stack over to hackage-security:

  • It fixed an immediate problem for users behind a firewall, which we otherwise would have needed to work around with new code (downloading a Git repo snapshot). Avoiding writing new code is always a win :).
  • Layering the HTTPS/Git-based security system on top of hackage-security doesn't make things more secure, it just adds two layers for security holes to exist in instead of one. From a security standpoint, if Hackage is providing a security mechanism, it makes sense to leverage it directly. Said another way: if it turns out that hackage-security is completely insecure, our Git-based layer would have been vulnerable anyway since it relied on hackage-security.
  • By moving both Stack and cabal-install over to hackage-security for client access, we'll be able to test that code more thoroughly, hopefully resulting in a more reliable security mechanism for both projects to share (small example of such stress-testing).
  • Stack has always maintained compatibility with some form of non-Git index, so we've always had two code paths for index updates. As hinted at above, this change opens the door to removing the Git-based code path. And removing code is almost as good as avoiding writing new code.
  • I would still feel more comfortable with the security of Hackage if HTTPS was used throughout, if only as a level of sanity in case all else fails. I hope that in the future the connection between Hackage and its CDN switches from insecure to secure. I also hope that cabal-install is still planning on moving over to using HTTPS for its downloads.

February 14, 2017 03:24 PM

Mark Jason Dominus

How I got three errors into one line of code

At work we had this script that was trying to report how long it had taken to run, and it was using DateTime::Duration:

    my $duration = $end_time->subtract_datetime($start_time);
    my ( $hours, $minutes, $seconds ) =
    $duration->in_units( 'hours', 'minutes', 'seconds' );

    log_info "it took $hours hours $minutes minutes and $seconds seconds to run"

This looks plausible, but because DateTime::Duration is shit, it didn't work. Typical output:

    it took 0 hours 263 minutes and 19 seconds to run

I could explain to you why it does this, but it's not worth your time.

I got tired of seeing 0 hours 263 minutes show up in my cron email every morning, so I went to fix it. Here's what I changed it to:

    my $duration = $end_time->subtract_datetime_absolute($start_time)->seconds;
    my ( $hours, $minutes, $minutes ) = (int(duration/3600), int($duration/60)%60, $duration%3600);

I was at some pains to get that first line right, because getting DateTime to produce a useful time interval value is a tricky proposition. I did get the first line right. But the second line is just simple arithmetic, I have written it several times before, so I dashed it off, and it contains a syntax error, that duration/3600 is missing its dollar sign, which caused the cron job to crash the next day.

A co-worker got there before I did and fixed it for me. While he was there he also fixed the $hours, $minutes, $minutes that should have been $hours, $minutes, $seconds.

I came in this morning and looked at the cron mail and it said

    it took 4 hours 23 minutes and 1399 seconds to run

so I went back to fix the third error, which is that $duration%3600 should have been $duration%60. The thrice-corrected line has

    my ( $hours, $minutes, $seconds ) = (int($duration/3600), int($duration/60)%60, $duration%60);

What can I learn from this? Most obviously, that I should have tested my code before I checked it in. Back in 2013 I wrote:

Usually I like to draw some larger lesson from this sort of thing. … “Just write the tests, fool!”

This was a “just write the tests, fool!” moment if ever there was one. Madame Experience runs an expensive school, but fools will learn in no other.

I am not completely incorrigible. I did at least test the fixed code before I checked that in. The test program looks like this:

    sub dur {
      my $duration = shift;
      my ($hours, $minutes, $seconds ) = (int($duration/3600), int($duration/60)%60, $duration%60);
      sprintf  "%d:%02d:%02d", $hours, $minutes, $seconds;
    }

    use Test::More;
    is(dur(0),  "0:00:00");
    is(dur(1),  "0:00:01");
    is(dur(59), "0:00:59");
    is(dur(60), "0:01:00");
    is(dur(62), "0:01:02");
    is(dur(122), "0:02:02");
    is(dur(3599), "0:59:59");
    is(dur(3600), "1:00:00");
    is(dur(10000), "2:46:40");

It was not necessary to commit the test program, but it was necessary to write it and to run it. By the way, the test program failed the first two times I ran it.

Three errors in one line isn't even a personal worst. In 2012 I posted here about getting four errors into a one-line program.

[ Addendum 20170215: I have some further thoughts on this. ]

by Mark Dominus at February 14, 2017 02:55 AM

February 13, 2017

Malcolm Wallace

We are hiring Functional Programmers.

The Modelling and Analytics Group was the original introducer of Functional Programming (especially Haskell) to the Financial Markets business of Standard Chartered Bank, and although there are now several other teams who contribute to our large codebase and who are also hiring, we now have some vacancies in the Core team.  Please think about applying!

Job Description:
  • Join the Modeling and Analytics’ Core team, part of Financial Markets Front Office, in Singapore or London.
  • Apply Functional Programming concepts to the design and implementation of the unified bank’s Analytics library.
  • Support infrastructure requests from the Quant group, flow trading desks, structured trading desks & structuring worldwide.

Candidate qualifications:
  • Master or PhD in Computer Science, with a focus among: functional programming, language & compiler design, efficient data processing.
  • Excellent programming skills in one of the major statically typed functional languages (ideally Haskell), and preferably exhibited by academic output (research / teaching) or open-source development.
  • Proficiency with C/C++ and debugging / performance tuning tools is a strong advantage.
  • Good communication skills required for interactions with other team members and with trading desks.
  • Familiarity with financial markets is a plus but not required.

How to apply:
  • In the first instance, send your CV to

by malcolm at February 13, 2017 07:51 PM

FP Complete

Immutability, Docker, and Haskell's ST type

In managing projects at FP Complete, I get to see both the software development and devops sides of our engineering practice. Over the years, I've been struck by the recurrence of a single word appearing repeatedly in both worlds: immutability.

On the software side, one of the strongest tenets of functional programming is immutable data structures. These are values which - once created - can never be changed again through the course of running the application. These reduce coupling between components, simplify concurrency and parallelism, and decrease the total number of moving pieces in a system, making it easier to maintain and develop over time.

On the devops side, immutable infrastructure is a relatively recent discovery. By creating machine images and replacing rather than modifying existing components, we have a more reliable hosting setup to target, minimize the differences between test and production systems, and reduce the amount of error-prone, manual fiddling that leads to 3am coffee-fueled emergency recovery sessions.

It's no secret that containerization in general, and Docker in particular, has become very popular in the devops space. I've noticed that there's a strong parallel between how Docker images are built, and a technique from functional programming - the ST (State Thread) type. This blog post will explain both sides of the puzzle, and then explain how they match up.

Dockerfile: mutable steps, immutable outcome

A Docker image is a complete Linux filesystem, providing all of the tools, libraries, and data files needed for its task. As a simple example, I recently created a simple Docker image containing the Stack build tool (more on that later) and Apache FOP for generating some PDFs. In the Docker world, the formula you use for creating a Docker image is a Dockerfile. Let's look at the (very simple) Dockerfile I wrote:

FROM fpco/pid1:16.04

RUN DEBIAN_FRONTEND=noninteractive apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y wget default-jre && \
    wget -qO- | sh
RUN wget -q && \
    tar zxf fop-2.1-bin.tar.gz && \
    rm -f fop-2.1-bin.tar.gz && \
    mv fop-2.1 /usr/local/share

In this file, I'm starting off from the fpco/pid1 base image, which provides us with a filesystem to start off with (it would obviously be pretty difficult to create a complete filesystem each time we wanted to create a new image). Then we provide a series of actions to take to modify that image. Looking at the example above, we:

  • Update the list of APT packages available
  • Install wget and the default Java Runtime Environment
  • Install the Stack build tool by running a script
  • Download the FOP binary bundle
  • Unpack the bundle and move it to /usr/local/share

Look at that list of steps. In no world could those actions be called "immutable." Every single one of them mutates the filesystem, either modifying files, adding files, or removing files. The end result of this mutation process is a new filesystem, captured in a Docker image.

And here's the important bit: this new image is totally immutable. You cannot in any way modify the image. You can create a new image based on it, but the original will remain unchanged. For all of history, this image will remain identical.*

In other words: a Dockerfile is a series of mutations which generates an immutable data structure.

* You may argue that you can delete the image, or you could create a new image with the same name. That's true, but as long as you're working with the image in question, it does not change. By contrast, each time you access the /tmp/foobar file, it may have different contents.

The ST type

In a purely functional programming language like Haskell, data is immutable by default. This means that, if you have a variable holding an Int, you cannot change it. Consider this example code, playing around with a Map structure (also known as a dictionary or lookup table):

myMap <- makeSomeMap
print myMap
useMap myMap
print myMap

We make our initial Map using the makeSomeMap function, print its contents, pass it to some other function (useMap), and then print it again. Pop quiz: is there any way that the two print operations will print different values?

If you're accustomed to mutable languages like Java or Python, you'd probably say yes: myMap is (presumably) an object with mutable state, and the useMap function might modify it. In Haskell, that can't happen: you've passed a reference to myMap to your useMap function, but useMap is not allowed to modify it.

Of course, we would like to be able to create different values, so saying "you can't ever change anything" is a little daunting. The primary way of working with Haskell's immutable data structures is to have functions which create new values based on old ones. In this process, we create a new value by giving it some instructions for the change. For example, if in our example above, the myMap value had a mapping from names to ages, we could insert an extra value:

myMap <- makeSomeMap
let myModifiedMap = insert "Alice" 35 myMap
print myModifiedMap
useMap myModifiedMap
print myModifiedMap

However, this isn't real mutation: the original myMap remains the same. There are cases in which creating a completely new version of the data each time would be highly inefficient. Most sorting algorithms fall into this category, as they involve a large number of intermediate swaps. If each of those swaps generated a brand new array, it would be very slow with huge amounts of memory allocation.

Instead, Haskell provides the ST type, which allows for local mutations. While within an ST block, you can directly mutate local variables, such as mutable vectors. But none of those mutated values can escape the ST block, only immutable variants. To see how this works, look at this Haskell code (save it to Main.hs and run with stack Main.hs using the Stack build tool):

#!/usr/bin/env stack
{- stack --resolver lts-7.14 --install-ghc runghc
    --package vector-algorithms -}
import Data.Vector (Vector, fromList, modify, freeze, thaw)
import Data.Vector.Algorithms.Insertion (sort)
import Control.Monad.ST

-- longer version, to demonstrate what's actually happening
immutableSort :: Vector Int -> Vector Int
immutableSort original = runST $ do
    mutableVector <- thaw original
    sort mutableVector
    freeze mutableVector

-- short version, what we'd use in practice, using the modify helper
-- immutableSort :: Vector Int -> Vector Int
-- immutableSort = modify sort

main = do
    let unsortedVector = fromList [1, 4, 2, 0, 8, 9, 5]
        sortedVector = immutableSort unsortedVector
    print unsortedVector
    print sortedVector

The immutableSort function takes an immutable vector of integers, and returns a new immutable vector of integers. Internally, though, it runs everything inside an ST block. First we thaw the immutable vector into a mutable copy of the original. Now that we have a fresh copy, we're free to - within the ST block - modify it to our heart's content, without impacting the original at all. To do this, we use the mutating sort function. When we're done, we freeze that mutable vector into a new immutable vector, which can be passed outside of the ST block.

(I've also included a shorter version of the function which uses the modify function to automate the freezing and thawing. Under the surface, it's doing almost exactly the same thing... see extra credit at the bottom for more details.)

Using this technique, we get to have our cake and eat it too: an efficient sorting algorithm (insertion sort) based on mutations to a random-access vector, while maintaining the invariant that our original vector remains unchanged.

Parallels between Docker and functional programming

After analyzing both Dockerfiles and the ST type, I think we can draw some interesting parallels. Both techniques accept that there are some things which are either easier or more efficient to do with direct mutation. But instead of throwing out the baby with the bathwater, they both value immutability as a goal. To achieve this, both of them have the concept of constrained mutation: you can only mutate in some specific places.

There's another interesting parallel to be observed: both Docker and functional programming hide some mutation from the user. For example, when you code 2 + 3, under the surface your compiler is generating something like:

  • Write the value 2 to a machine register
  • Write the value 3 to another machine register
  • Perform the ADD machine instruction
  • Copy the result in the output machine register to some location in memory

All four of these steps are inherently mutating the state of your machine, but you probably never think about that. (This applies to all common programming languages, not just functional languages.) While mutation is happening all the time, we'd often rather not think about it, and instead focus on the higher level goal (in this case: add two numbers together).

When you launch a Docker container, Docker is making a lot of mutating changes. When you execute docker run busybox echo Hello World!, Docker creates a new control group (c-group), creates some temporary files, forks processes, and so on. Again, each of these actions is inherently a state mutation, but taken as a whole, we can view the sum total as an immutable action that uses a non-changing file system to run a command in an isolated environment that generates some output on the command line.

Of course, you can also use Docker to run mutating commands, such as bind-mounting the host file system and modifying files. Similarly, from within a functional programming language you can cause mutations of similar magnitude. But that's up to you; the system itself tries to hide away a bunch of intermediate mutations as a single, immutable action.

Further insights

I always enjoy finding a nexus between two seemingly unrelated fields. While the lines of reasoning that brought them there are quite distinct, I'm very intrigued that both the devops and functional programming worlds seem to be thriving today on immutability. I'd be interested to hear others' experiences with similar intersections between these worlds, or other worlds.

FP Complete is regularly in the business of combining modern devops practices with cutting edge functional programming. If you'd like to learn more, check out our consulting offerings or reach out for a free consultation.

If you're interested in learning more about Haskell, check out our Haskell syllabus.

Extra credit

I made a comment above about "almost the same thing" with the two versions of immutable sort. The primary difference is in safe versus unsafe freezing. In our longer version, we're using the safe variants of both freeze and thaw, which operate by making a new copy of the original buffer. In the case of thaw, this ensures that the original, immutable version of the vector is never modified. In the case of freeze, this ensures that we don't create a falsely-immutable vector, which can have its values changed when the original, mutable vector is tweaked.

Based on this, our long version of the function does the following operations:

  • Create a new memory buffer the same size as the original. Let's call this buffer A.
  • Copy the values into A from the original.
  • Sort the values inside A using mutation.
  • Create a new memory buffer of the same size. Let's call this buffer B.
  • Copy the values from A into B.
  • Make B immutable and return it.

But if you pay close attention, that intermediate memory buffer A can never be modified after the end of our ST block, and therefore making that extra B buffer and copying into it is unnecessary. Therefore, the modify helper function does an unsafe freeze on the A memory buffer, avoiding the unneeded allocation and copy. While this operation may be unsafe in general, we know in our usage it's perfectly safe. This is another great tenet of functional programming: wrapping up operations which may be dangerous on their own into helper functions that guarantee safety.
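To make the two shapes concrete, here is a sketch in the same spirit. So that it runs with a stock GHC and no extra dependencies, it uses the bundled array package in place of vector, and the names withCopy and swapFirstTwo are invented for this example. The safe path would freeze (allocating and copying into buffer B); here unsafeFreeze reuses buffer A directly, which is safe because A is never touched again:

```haskell
{-# LANGUAGE RankNTypes #-}
-- Illustrative sketch (not the post's code): the array package stands in
-- for vector, and withCopy/swapFirstTwo are made-up names.
import Control.Monad.ST (ST, runST)
import Data.Array (Array, elems, listArray)
import Data.Array.MArray (thaw, readArray, writeArray, getBounds)
import Data.Array.ST (STArray)
import Data.Array.Unsafe (unsafeFreeze)

-- Run a mutating action on a private copy of an immutable array.
-- thaw allocates buffer A and copies into it; after the action, A can
-- never be touched again, so unsafeFreeze returns it without the extra
-- copy that a safe freeze (buffer B) would make.
withCopy :: (forall s. STArray s Int Int -> ST s ()) -> Array Int Int -> Array Int Int
withCopy f original = runST $ do
  a <- thaw original   -- buffer A: fresh mutable copy of the original
  f a                  -- arbitrary in-place mutation
  unsafeFreeze a       -- reuse A as the immutable result (no buffer B)

-- A tiny mutation to demonstrate the pattern.
swapFirstTwo :: STArray s Int Int -> ST s ()
swapFirstTwo arr = do
  (lo, _) <- getBounds arr
  x <- readArray arr lo
  y <- readArray arr (lo + 1)
  writeArray arr lo y
  writeArray arr (lo + 1) x

main :: IO ()
main = do
  let original = listArray (0, 2) [1, 2, 3]
  print (elems (withCopy swapFirstTwo original))  -- [2,1,3]
  print (elems original)                          -- [1,2,3], unchanged
```

As in the vector version, the caller only ever sees immutable arrays; the mutation is confined to the ST block.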

February 13, 2017 03:24 PM

February 10, 2017

Brent Yorgey

Virtual species suffice

Over six years ago, I wrote a post explaining how virtual species are defined. Ever since then (time flies!) I’ve been meaning to write a follow-up post explaining a bit more about virtual species and how they actually suffice to give us not just additive inverses, but also (somewhat surprisingly) multiplicative inverses.

Recall that the intuitive idea of a combinatorial species is a family of labelled structures which are invariant under relabelling. If you’ve never seen the formal definition before, don’t worry: just think “data structures” or “algebraic data types” for now.

The basic idea of virtual species is to work with pairs of species (P,N) where P is considered “positive” and N “negative”. Formally, we consider equivalence classes of such pairs under the equivalence relation defined by (P_1, N_1) \cong (P_2, N_2) iff P_1 + N_2 = P_2 + N_1.1 This parallels the way one typically gives a formal definition of the integers starting from the natural numbers (the “Grothendieck construction”); see my previous post for more details.


How can we build intuition for virtual species, and for additive inverses of species in particular? To be honest I have been struggling with this question for many years.

Multiplicative inverses are much simpler to think about: they are like matter and antimatter. Having both an F-structure and an F^{-1} structure is the same as having nothing; they annihilate each other. By “having nothing” we mean “having no information”, that is, having a unit value: F F^{-1} = 1.

What about additive inverses? Note first that the 0 species does not correspond to having nothing; the word “nothing” corresponds to the 1 (i.e. unit) species. Instead the 0 (i.e. uninhabited) species corresponds to (logical) impossibility. So to interpret -F we have to imagine something where having either F or -F is impossible.

…yeah, me neither. This seems deeply strange. If someone says, “I either have an F or a -F”, you can confidently call them a liar, because it is impossible to have either an F or a -F; that is, F - F = 0. But surely if you actually have an F-structure, it should also be true to say “I have either an F or a G”? Well, that works for normal, positive species—in which case we can define a canonical injection F \to F + G. But once we introduce negative species this completely breaks down. As another example, if someone truthfully says, “I have either a tree or a negative non-empty tree”, you should be able to say, “Aha! I know what you have—it must be an empty tree.” In general, it’s strange that expressing a disjunction can cause some possibilities to be ruled out. Normally, we are used to disjunctions only increasing the number of possibilities.

Inspired by James and Sabry’s really cool paper The Two Dualities of Computation: Negative and Fractional Types, I have thought a bit about whether there is some plausible interpretation involving travelling backwards in time, but I haven’t been able to come up with one. I can’t quite seem to make the formalism of the paper match up with my intuition about species (though this may just be a failure of my imagination or intuition).

Multiplicative Inverses

In any case, let’s see why the ring of virtual species actually has multiplicative inverses—at least, all the ones we could possibly hope for. This is somewhat surprising, since when we build integers from natural numbers by considering equivalence classes of pairs, we certainly don’t get any multiplicative inverses, only additive ones. To get multiplicative inverses we have to do the same process a second time, building the rational numbers as equivalence classes of pairs of integers. But species already have enough extra structure that throwing in additive inverses is all it takes.

First, a caveat: we don’t get multiplicative inverses for all species, but only those species G such that G(0) = 1: that is, species G with only a single structure of size zero, which are of the form G = 1 + X(\dots). With any constant term other than 1, we clearly have no hope of finding another species H such that GH = 1, since the constant term of GH will be a multiple of G’s constant term.

So given such a G, write G = 1 + G_+, where G_+ denotes “non-empty G-structures”. Then we can define the multiplicative inverse of G as follows:

\displaystyle G^{-1} = \sum_{k \geq 0} (-1)^k (G_+)^k = 1 - G_+ + G_+^2 - G_+^3 + \dots

That is, a G^{-1}-structure consists of a list of nonempty G-structures, except that even-length lists are considered “positive” and odd-length lists considered “negative”.

We can easily check that this indeed defines a multiplicative inverse for G:

\displaystyle \begin{array}{rcl}G G^{-1} &=& (1 + G_+) (1 - G_+ + G_+^2 - G_+^3 + \dots) \\[0.5em] &=& (1 - G_+ + G_+^2 - G_+^3 + \dots) + (G_+ - G_+^2 + G_+^3 - G_+^4 + \dots) \\[0.5em] &=& 1 \end{array}

The infinite sums telescope down to leave only 1. Notice this really isn’t about species in particular, but really about infinite power series (of which species are the categorification): any infinite power series with integer coefficients and a constant term of 1 has a multiplicative inverse which is also such a power series.

As an example, consider 1/(1-X) = (1-X)^{-1}. We know this is “supposed” to be the species of lists (since it results from solving L = 1 + XL for L), but let’s see what happens. In this case G = 1-X and G_+ = -X. So the inverse ought to be

\displaystyle (1-X)^{-1} = \sum_{k \geq 0} (-1)^k (-X)^k = \sum_{k \geq 0} X^k = 1 + X + X^2 + X^3 + \dots

And hey, look at that! Lists!
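Since this construction is really about power series, it is easy to check mechanically. Below is a small Haskell sketch (invSeries is a name made up for this example) that computes the coefficients of the inverse of a power series with constant term 1, using the recurrence h_0 = 1, h_n = -(g_1 h_{n-1} + ... + g_n h_0) that falls out of requiring GH = 1:

```haskell
-- Sketch: invert a power series with constant term 1.
-- A series is represented by its list of coefficients; a finite list
-- means all remaining coefficients are zero.
invSeries :: [Integer] -> [Integer]
invSeries [] = error "empty series"
invSeries (c0 : cs)
  | c0 /= 1   = error "constant term must be 1"
  | otherwise = hs
  where
    gs = cs ++ repeat 0  -- g_1, g_2, ...
    -- h_0 = 1,  h_n = - (g_1 h_{n-1} + g_2 h_{n-2} + ... + g_n h_0)
    hs = 1 : [ negate (sum (zipWith (*) gs (reverse (take n hs))))
             | n <- [1 ..] ]

main :: IO ()
main = do
  -- G = 1 - X, so G_+ = -X: the inverse is the list series 1 + X + X^2 + ...
  print (take 6 (invSeries [1, -1]))  -- [1,1,1,1,1,1]
  -- G = 1 + X, so G_+ = X: the alternating series 1 - X + X^2 - ...
  print (take 6 (invSeries [1, 1]))   -- [1,-1,1,-1,1,-1]
```

The second example is the alternating-sign list of nonempty structures described above, specialized to G_+ = X.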

A field of species?

So what would we need to get a true field, i.e. a multiplicative inverse for every nonzero species? Well, for that we would need to throw in rational coefficients. I forget exactly where I read this—some paper by Baez and Dolan, most likely—but I believe the proper way to interpret this would be as groupoid-valued species, since there is a sense in which the “cardinality” of groupoids can be interpreted as rational numbers. But to be honest I have no idea where this leads.

  1. Note that species sum is cancellative—that is, if A + C = B + C then A = B—so this is a coherent definition. This cancellative property is probably worth another post of its own since the reason for it is not entirely trivial.

by Brent at February 10, 2017 06:11 PM

February 09, 2017

Edward Z. Yang

How to integrate GHC API programs with Cabal

GHC is not just a compiler: it is also a library, which provides a variety of functionality useful to anyone interested in doing any sort of analysis on Haskell source code. Haddock, hint and ghc-mod are all packages which use the GHC API.

One of the challenges for any program that wants to use the GHC API is integration with Cabal (and, transitively, cabal-install and Stack). The most obvious problem is that, when building against packages installed by Cabal, GHC needs to be passed appropriate flags telling it which package databases and actual packages should be used. At this point, people tend to adopt some hacky strategy to get these flags, and hope for the best. For commonly used packages, this strategy will get the job done, but for the rare package that needs something extra--preprocessing, extra GHC flags, building C sources--it is unlikely that it will be handled correctly.

A more reliable way to integrate a GHC API program with Cabal is inversion of control: have Cabal call your GHC API program, not the other way around! How are we going to get Cabal/Stack to call our GHC API program? We will replace the GHC executable with a wrapper that passes through all commands to an ordinary GHC, except for ghc --interactive, which it instead hands off to the GHC API program. Then, we will call cabal repl/stack repl with our overloaded GHC, and where we would have opened a GHCi prompt, instead our API program gets run.

With this, all of the flags which would have been passed to the invocation of ghc --interactive are passed to our GHC API program. How should we go about parsing the flags? The most convenient way to do this is by creating a frontend plugin, which lets you create a new major mode for GHC. By the time your code is called, all flags have already been processed (no need to muck about with DynFlags!).

Enough talk, time for some code. First, let's take a look at a simple frontend plugin:

module Hello (frontendPlugin) where

import GhcPlugins
import DriverPhases
import GhcMonad

frontendPlugin :: FrontendPlugin
frontendPlugin = defaultFrontendPlugin {
  frontend = hello
  }

hello :: [String] -> [(String, Maybe Phase)] -> Ghc ()
hello flags args = do
    liftIO $ print flags
    liftIO $ print args

This frontend plugin is taken straight from the GHC documentation (but with enough imports to make it compile ;-). It prints out the arguments passed to it.

Next, we need a wrapper program around GHC which will invoke our plugin instead of regular GHC when we are called with the --interactive flag. Here is a simple script which works on Unix-like systems:

import GHC.Paths
import System.Posix.Process
import System.Environment

main = do
  args <- getArgs
  let interactive = "--interactive" `elem` args
      args' = do
        arg <- args
        case arg of
          "--interactive" ->
            ["--frontend", "Hello",
             "-plugin-package", "hello-plugin"]
          _ -> return arg
  executeFile ghc False (args' ++ if interactive then ["-user-package-db"] else []) Nothing

Give this a Cabal file, and then install it to the user package database with cabal install (see the second bullet point below if you want to use a non-standard GHC via the -w flag):

name:                hello-plugin
license:             BSD3
author:              Edward Z. Yang
build-type:          Simple
cabal-version:       >=1.10

library
  exposed-modules:     Hello
  build-depends:       base, ghc >= 8.0
  default-language:    Haskell2010

executable hello-plugin
  main-is:             HelloWrapper.hs
  build-depends:       base, ghc-paths, unix
  default-language:    Haskell2010

Now, to run your plugin, you can do any of the following:

  • cabal repl -w hello-plugin
  • cabal new-repl -w hello-plugin
  • stack repl --system-ghc --with-ghc hello-plugin

To run the plugin on a specific package, pass the appropriate flags to the repl command.

The full code for this example can be retrieved at ezyang/hello-plugin on GitHub.

Here are a few miscellaneous tips and tricks:

  • To pass extra flags to the plugin, add --ghc-options=-ffrontend-opt=arg as necessary (if you like, make another wrapper script around this!)
  • If you installed hello-plugin with a GHC that is not the one from your PATH, you will need to put the correct ghc/ghc-pkg/etc executables first in the PATH; Cabal's autodetection will get confused if you just use -w. If you are running cabal, another way to solve this problem is to pass --with-ghc-pkg=PATH to specify where ghc-pkg lives (Stack does not support this.)
  • You don't have to install the plugin to your user package database, but then the wrapper program needs to be adjusted to be able to find wherever the package does end up being installed. I don't know of a way to get this information without writing a Custom setup script with Cabal; hopefully installation to the user package database is not too onerous for casual users.
  • cabal-install and stack differ slightly in how they go about passing home modules to the invocation of GHCi: cabal-install will call GHC with an argument for every module in the home package; Stack will pass a GHCi script of things to load. I'm not sure which is more convenient, but it probably doesn't matter too much if you already know which module you want to look at (perhaps you got it from a frontend option.)

by Edward Z. Yang at February 09, 2017 12:45 AM

February 08, 2017

Yesod Web Framework

Changes to Yesod's CI

I've made some changes in the past few days to the CI setup for the yesodweb/yesod repo that I thought contributors may be interested in knowing about.

  • We were regularly running into build timeouts on OS X on Travis. To work around this, we no longer build benchmarks and Haddocks on OS X, and compile with -O0. Relevant Travis changes
  • I've (finally) bit the bullet and made the repo compile with -Wall -Werror enabled, at least for recent versions of GHC. From now on, PRs will need to maintain warning-cleanliness. Relevant Travis changes
  • There's now an AppVeyor configuration, so PRs can be checked against Windows (in addition to the existing Linux and OS X coverage provided by Travis). This did, in fact, reveal two issues around line endings. Relevant AppVeyor addition.

Nothing major, just a few changes that contributors should be aware of. Hopefully that green checkmark on a PR will now have fewer false positives and false negatives.

February 08, 2017 05:15 AM

February 07, 2017

Mark Jason Dominus

How many 24 puzzles are there?

[ Note: The tables in this article are important, and look unusually crappy if you read this blog through an aggregator. The properly-formatted version on my blog may be easier to follow. ]

A few months ago I wrote about puzzles of the following type: take four digits, say 1, 2, 7, 7, and, using only +, -, ×, and ÷, combine them to make the number 24. Since then I have been accumulating more and more material about these puzzles, which will eventually appear here. But meantime here is a delightful tangent.

In the course of investigating this I wrote programs to enumerate the solutions of all possible puzzles, and these programs were always much faster than I expected at first. It appears as if there are 10,000 possible puzzles, from «0,0,0,0» through «9,9,9,9». But a moment's thought shows that there are considerably fewer, because, for example, the puzzles «7,2,7,1», «1,2,7,7», «7,7,2,1», and «2,7,7,1» are all the same puzzle. How many puzzles are there really?

A back-of-the-envelope estimate is that only about 1 in 24 puzzles is really distinct (because there are typically 24 ways to rearrange the elements of a puzzle) and so there ought to be around $\frac{10000}{24} \approx 417$ puzzles. This is an undercount, because there are fewer duplicates of many puzzles; for example there are not 24 variations of «1,2,7,7», but only 12. The actual number of puzzles turns out to be 715, which I think is not an obvious thing to guess.

Let's write $S(d,n)$ for the set of sequences of length $n$ containing up to $d$ different symbols, with the duplicates removed: when two sequences are the same except for the order of their symbols, we will consider them the same sequence.

Or more concretely, we may imagine that the symbols are sorted into nondecreasing order, so that $S(d,n)$ is the set of nondecreasing sequences of length $n$ of $d$ different symbols.

Let's also write $C(d,n)$ for the number of elements of $S(d,n)$.

Then $S(10,4)$ is the set of puzzles whose input is four digits. The claim that there are 715 such puzzles is just that $C(10,4) = 715$. A tabulation of $C(d,n)$ reveals that it is closely related to binomial coefficients, and indeed that $$C(d,n)=\binom{n+d-1}{d-1}.\tag{$\heartsuit$}$$

so that the surprising $715$ is actually $\binom{13}{9}$. This is not hard to prove by induction, because $\binom{n+d-1}{d-1}$ is easily shown to obey the same recurrence as $C$: $$C(d,n) = C(d-1,n) + C(d,n-1).\tag{$\spadesuit$}$$

To see this, observe that an element of $S(d,n)$ either begins with a zero or with some other symbol. If it begins with a zero, there are $C(d,n-1)$ ways to choose the remaining symbols in the sequence. But if it begins with one of the other symbols it cannot contain any zeroes, and what we really have is a length-$n$ sequence over the $d-1$ symbols $1,\dots,d-1$, of which there are $C(d-1,n)$.
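The formula is easy to check by brute force. Here is a short Haskell sketch (the function names are mine, not from the article) that enumerates the nondecreasing sequences directly and compares the count against the binomial formula:

```haskell
-- Sketch: enumerate the set S(d,n) of nondecreasing length-n sequences
-- over the symbols 0..d-1, and count it with the binomial formula.
nondecreasing :: Int -> Int -> [[Int]]
nondecreasing d n = go 0 n
  where
    go _  0 = [[]]
    go lo k = [ x : xs | x <- [lo .. d - 1], xs <- go x (k - 1) ]

binomial :: Integer -> Integer -> Integer
binomial n k = product [n - k + 1 .. n] `div` product [1 .. k]

-- C(d,n) = binom(n+d-1, d-1)
c :: Integer -> Integer -> Integer
c d n = binomial (n + d - 1) (d - 1)

main :: IO ()
main = do
  print (length (nondecreasing 10 4))  -- 715 distinct four-digit puzzles
  print (c 10 4)                       -- 715
  print (c 4 4, c 5 3)                 -- (35,35)
```

The last line checks the coincidence discussed below: $C(4,4)$ and $C(5,3)$ are both 35.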

<style> table .rt td { padding: 2px 8px } </style>

0 0 0 0 1 1 1
0 0 0 1 1 1 2
0 0 0 2 1 1 3
0 0 0 3 1 1 4
0 0 1 1 1 2 2
0 0 1 2 1 2 3
0 0 1 3 1 2 4
0 0 2 2 1 3 3
0 0 2 3 1 3 4
0 0 3 3 1 4 4
0 1 1 1 2 2 2
0 1 1 2 2 2 3
0 1 1 3 2 2 4
0 1 2 2 2 3 3
0 1 2 3 2 3 4
0 1 3 3 2 4 4
0 2 2 2 3 3 3
0 2 2 3 3 3 4
0 2 3 3 3 4 4
0 3 3 3 4 4 4

Now we can observe that $C(4,4) = C(5,3)$ (they are both 35), so that $|S(4,4)| = |S(5,3)|$. We might ask if there is a combinatorial proof of this fact, consisting of a natural bijection between $S(4,4)$ and $S(5,3)$. Using the relation $(\spadesuit)$ we have:

$$ \begin{eqnarray} C(4,4) & = & C(3, 4) + & C(4,3) \\ C(5,3) & = & & C(4,3) + C(5,2) \\ \end{eqnarray}$$

so part of the bijection, at least, is clear: There are $C(4,3)$ elements of $S(4,4)$ that begin with a zero, and also $C(4,3)$ elements of $S(5,3)$ that do not begin with a zero, so whatever the bijection is, it ought to match up these two subsets of size 20. This is perfectly straightforward; simply match up $\langle 0,a,b,c\rangle$ (blue) with $\langle a+1,b+1,c+1\rangle$ (pink), as shown at right.

But finding the other half of the bijection, between the $C(3,4)$ remaining elements of $S(4,4)$ and the $C(5,2)$ remaining elements of $S(5,3)$, is not so straightforward. (Both have 15 elements, but we are looking for not just any bijection but for one that respects the structure of the elements.) We could apply the recurrence again, to obtain:

$$ \begin{eqnarray} C(3,4) & = \color{darkred}{C(2, 4)} + \color{darkblue}{C(3,3)} \\ C(5,2) & = \color{darkblue}{C(4,2)} + \color{darkred}{C(5,1)} \end{eqnarray}$$

and since $$ \begin{eqnarray} \color{darkred}{C(2, 4)} & = \color{darkred}{C(5,1)} \\ \color{darkblue}{C(3,3)} & = \color{darkblue}{C(4,2)} \end{eqnarray}$$

we might expect the bijection to continue in that way, mapping $\color{darkred}{C(2,4) \leftrightarrow C(5,1)}$ and $\color{darkblue}{C(3,3) \leftrightarrow C(4,2)}$. Indeed there is such a bijection, and it is very nice.

To find the bijection we will take a detour through bitstrings. There is a natural bijection between $S(d,n)$ and the bit strings that contain $d-1$ zeroes and $n$ ones. Rather than explain it with pseudocode, I will give some examples, which I think will make the point clear. Consider the sequence $\langle 1, 1, 3, 4\rangle$. Suppose you are trying to communicate this sequence to a computer. It will ask you the following questions, and you should give the corresponding answers:

  • “Is the first symbol 0?” (“No”)
  • “Is the first symbol 1?” (“Yes”)
  • “Is the second symbol 1?” (“Yes”)
  • “Is the third symbol 1?” (“No”)
  • “Is the third symbol 2?” (“No”)
  • “Is the third symbol 3?” (“Yes”)
  • “Is the fourth symbol 3?” (“No”)
  • “Is the fourth symbol 4?” (“Yes”)

At each stage the computer asks about the identity of the next symbol. If the answer is “yes” the computer has learned another symbol and moves on to the next element of the sequence. If it is “no” the computer tries guessing a different symbol. The “yes” answers become ones and “no” answers become zeroes, so that the resulting bit string is 0 1 1 0 0 1 0 1.

It sometimes happens that the computer figures out all the elements of the sequence before using up its questions; in this case we pad out the bit string with zeroes, or we can imagine that the computer asks some pointless questions to which the answer is “no”. For example, suppose the sequence is $\langle 0, 1, 1, 1\rangle$:

  • “Is the first symbol 0?” (“Yes”)
  • “Is the second symbol 0?” (“No”)
  • “Is the second symbol 1?” (“Yes”)
  • “Is the third symbol 1?” (“Yes”)
  • “Is the fourth symbol 1?” (“Yes”)

The bit string is 1 0 1 1 1 0 0 0, where the final three 0 bits are the padding.

We can reverse the process, simply taking over the role of the computer. To find the sequence that corresponds to the bit string 0 1 1 0 1 0 0 1, we ask the questions ourselves and use the bits as the answers:

  • “Is the first symbol 0?” (“No”)
  • “Is the first symbol 1?” (“Yes”)
  • “Is the second symbol 1?” (“Yes”)
  • “Is the third symbol 1?” (“No”)
  • “Is the third symbol 2?” (“Yes”)
  • “Is the fourth symbol 2?” (“No”)
  • “Is the fourth symbol 3?” (“No”)
  • “Is the fourth symbol 4?” (“Yes”)

We have recovered the sequence $\langle 1, 1, 2, 4\rangle$ from the bit string 0 1 1 0 1 0 0 1.
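The question-and-answer procedure above is easy to mechanize. Here is a Haskell sketch of the bijection (the names toBits and fromBits are mine): a 1 bit means “yes, the guessed symbol is next,” a 0 bit means “no, advance the guess,” and the string is padded with zeroes out to length $n+d-1$:

```haskell
-- Sketch of the bijection between S(d,n) and the bit strings with
-- n ones and d-1 zeroes, following the question-and-answer procedure.

-- Encode a nondecreasing sequence over the symbols 0..d-1.
toBits :: Int -> [Int] -> [Int]
toBits d xs = pad (go 0 xs)
  where
    go _ [] = []
    go g (x : rest)
      | g == x    = 1 : go g rest              -- "yes": symbol identified
      | otherwise = 0 : go (g + 1) (x : rest)  -- "no": guess the next symbol
    pad bits = bits ++ replicate (length xs + d - 1 - length bits) 0

-- Decode a bit string back into a sequence of length n.
fromBits :: Int -> [Int] -> [Int]
fromBits n = go 0 n
  where
    go _ 0 _          = []      -- all n symbols recovered; rest is padding
    go g k (1 : rest) = g : go g (k - 1) rest
    go g k (0 : rest) = go (g + 1) k rest
    go _ _ []         = []

main :: IO ()
main = do
  print (toBits 5 [1, 1, 3, 4])                -- [0,1,1,0,0,1,0,1]
  print (toBits 5 [0, 1, 1, 1])                -- [1,0,1,1,1,0,0,0]
  print (fromBits 4 [0, 1, 1, 0, 1, 0, 0, 1])  -- [1,1,2,4]
```

The three outputs reproduce the worked examples from the article.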

This correspondence establishes relation $(\heartsuit)$ in a different way from before: since there is a natural bijection between $S(d,n)$ and the bit strings with $d-1$ zeroes and $n$ ones, there are certainly $\binom{n+d-1}{d-1}$ of them, as $(\heartsuit)$ says, because there are $n+d-1$ bits and we may choose any $d-1$ of them to be the zeroes.

We wanted to see why $C(4,4) = C(5,3)$. The detour above shows that there is a simple bijection between

$S(5,3)$ and the bit strings with 4 zeroes and 3 ones

on one hand, and between

$S(4,4)$ and the bit strings with 3 zeroes and 4 ones

on the other hand. And of course the bijection between the two sets of bit strings is completely obvious: just exchange the zeroes and the ones.

The table below shows the complete bijection between $S(4,4)$ and its descriptive bit strings (on the left in blue) and between $S(5,3)$ and its descriptive bit strings (on the right in pink), and that the two sets of bit strings are complementary. Furthermore the top portion of the table shows that the $C(4,3)$-element subsets of the two families correspond, as they should—although the correct correspondence is the reverse of the one that was displayed earlier in the article, not the $\langle 0,a,b,c\rangle \leftrightarrow \langle a+1,b+1,c+1\rangle$ mapping suggested at all. Instead, in the correct table, the initial digit of each entry says how many zeroes appear in the corresponding entry on the other side, and vice versa; then the increment to the next digit says how many ones, and so forth.

<style> table .tb { padding-color: purple; } table .tb td { text-align: center; padding: 1px 4px } td .c2 { border-right: dotted thin black } th .c2 { border-right: dotted thin black } </style>

$S(4,4)$ (bits) (complement bits) $S(5,3)$
0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 1 1 1 4 4 4
0 0 0 1 1 1 1 0 1 0 0 0 0 0 1 0 1 1 3 4 4
0 0 0 2 1 1 1 0 0 1 0 0 0 0 1 1 0 1 3 3 4
0 0 0 3 1 1 1 0 0 0 1 0 0 0 1 1 1 0 3 3 3
0 0 1 1 1 1 0 1 1 0 0 0 0 1 0 0 1 1 2 4 4
0 0 1 2 1 1 0 1 0 1 0 0 0 1 0 1 0 1 2 3 4
0 0 1 3 1 1 0 1 0 0 1 0 0 1 0 1 1 0 2 3 3
0 0 2 2 1 1 0 0 1 1 0 0 0 1 1 0 0 1 2 2 4
0 0 2 3 1 1 0 0 1 0 1 0 0 1 1 0 1 0 2 2 3
0 0 3 3 1 1 0 0 0 1 1 0 0 1 1 1 0 0 2 2 2
0 1 1 1 1 0 1 1 1 0 0 0 1 0 0 0 1 1 1 4 4
0 1 1 2 1 0 1 1 0 1 0 0 1 0 0 1 0 1 1 3 4
0 1 1 3 1 0 1 1 0 0 1 0 1 0 0 1 1 0 1 3 3
0 1 2 2 1 0 1 0 1 1 0 0 1 0 1 0 0 1 1 2 4
0 1 2 3 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 2 3
0 1 3 3 1 0 1 0 0 1 1 0 1 0 1 1 0 0 1 2 2
0 2 2 2 1 0 0 1 1 1 0 0 1 1 0 0 0 1 1 1 4
0 2 2 3 1 0 0 1 1 0 1 0 1 1 0 0 1 0 1 1 3
0 2 3 3 1 0 0 1 0 1 1 0 1 1 0 1 0 0 1 1 2
0 3 3 3 1 0 0 0 1 1 1 0 1 1 1 0 0 0 1 1 1
1 1 1 1 0 1 1 1 1 0 0 1 0 0 0 0 1 1 0 4 4
1 1 1 2 0 1 1 1 0 1 0 1 0 0 0 1 0 1 0 3 4
1 1 1 3 0 1 1 1 0 0 1 1 0 0 0 1 1 0 0 3 3
1 1 2 2 0 1 1 0 1 1 0 1 0 0 1 0 0 1 0 2 4
1 1 2 3 0 1 1 0 1 0 1 1 0 0 1 0 1 0 0 2 3
1 1 3 3 0 1 1 0 0 1 1 1 0 0 1 1 0 0 0 2 2
1 2 2 2 0 1 0 1 1 1 0 1 0 1 0 0 0 1 0 1 4
1 2 2 3 0 1 0 1 1 0 1 1 0 1 0 0 1 0 0 1 3
1 2 3 3 0 1 0 1 0 1 1 1 0 1 0 1 0 0 0 1 2
1 3 3 3 0 1 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1
2 2 2 2 0 0 1 1 1 1 0 1 1 0 0 0 0 1 0 0 4
2 2 2 3 0 0 1 1 1 0 1 1 1 0 0 0 1 0 0 0 3
2 2 3 3 0 0 1 1 0 1 1 1 1 0 0 1 0 0 0 0 2
2 3 3 3 0 0 1 0 1 1 1 1 1 0 1 0 0 0 0 0 1
3 3 3 3 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0

Observe that we have in general that , which may be surprising. One might have guessed that since , the relation was , and that would have the same structure as , but it isn't so. The two arguments exchange roles. Following the same path, we can identify many similar ‘coincidences’. For example, there is a simple bijection between the original set of 715 puzzles, which was , and , the set of nondecreasing sequences of of length .

[ Thanks to Bence Kodaj for a correction. ]

by Mark Dominus ( at February 07, 2017 08:31 AM

Dimitri Sabadie

Lifetimes limits – self borrowing and dropchecker

Lately, I’ve been playing around with alto in my demoscene framework. This crate is the replacement for openal-rs, which has been deprecated because it is unsound. It’s a wrapper over OpenAL, which enables you to play 3D sounds and gives you several physical properties and effects you can apply.

The problem

Just to let you fully understand the problem, let me introduce a few principles from alto. As a wrapper over OpenAL, it exposes much the same interface, but adds several safety-related types. In order to use the API, you need three objects:

  • an Alto object, which represents the API object (it holds dynamic library handles, function pointers, etc.; we don’t need to know about that)
  • a Device object, a regular device (a sound card, for example)
  • a Context object, used to create audio resources, handle the audio context, etc.

There are well-defined relationships between those objects that constrain their lifetimes. An Alto object must outlive the Device, and the Device must outlive the Context. Basically:

let alto = Alto::load_default(None).unwrap(); // bring the default OpenAL implementation in
let dev =; // open the default device
let ctx = dev.new_context(None).unwrap(); // create a default context with no OpenAL extension

As you can see here, the lifetimes are not violated, because alto outlives dev which outlives ctx. Let’s dig in the type and function signatures to get the lifetimes right (documentation here).

fn Alto::open<'s, S: Into<Option<&'s CStr>>>(&self, spec: S) -> AltoResult<Device>

The S type is just a convenient type to select a specific implementation. We need the default one, so just pass None. However, have a look at the result. AltoResult<Device>. I told you about lifetime relationships. This one might be tricky, but you always have to wonder “is there an elided lifetime here?”. Look at the Device type:

pub struct Device<'a> { /* fields omitted */ }

Yep! So, what’s the lifetime of the Device in AltoResult<Device>? Well, that’s simple: the lifetime elision rule in action is one of the simplest:

If there are multiple input lifetime positions, but one of them is &self or &mut self, the lifetime of self is assigned to all elided output lifetimes. (source)

So let’s rewrite the Alto::open function to make it clearer:

fn Alto::open<'a, 's, S: Into<Option<&'s CStr>>>(&'a self, spec: S) -> AltoResult<Device<'a>> // exact same thing as above

So, what you can see here is that the Device must be valid for the same lifetime as the reference we pass in. Which means that Device cannot outlive the reference. Hence, it cannot outlive the Alto object.

impl<'a> Device<'a> {
// …
fn new_context<A: Into<Option<ContextAttrs>>>(&self, attrs: A) -> AltoResult<Context>
// …
}

That looks a bit similar. Let’s have a look at Context:

pub struct Context<'d> { /* fields omitted */ }

Yep, same thing! Let’s rewrite the whole thing:

impl<'a> Device<'a> {
// …
fn new_context<'b, A: Into<Option<ContextAttrs>>>(&'b self, attrs: A) -> AltoResult<Context<'b>>
// …
}

Plus, keep in mind that self is actually Device<'a>. The first argument of this function then awaits a &'b Device<'a> object!

rustc is smart enough to automatically insert the 'a: 'b lifetime bound here – i.e. the 'a lifetime outlives 'b. Which makes sense: the reference will die before the Device<'a> is dropped.

Ok, ok. So, what’s the problem then?!

The (real) problem

The snippet of code above about how to create the three objects is straight-forward (though we don’t take into account errors, but that’s another topic). However, in my demoscene framework, I really don’t want people to use that kind of types. The framework should be completely agnostic about which technology or API is used internally. For my purposes, I just need a single type with a few methods to work with.

Something like that:

struct Audio {}

impl Audio {
pub fn new<P>(track_path: P) -> Result<Self> where P: AsRef<Path> {}

pub fn toggle(&mut self) -> bool {}

pub fn playback_cursor(&self) -> f32 {}

pub fn set_playback_cursor(&self, t: f32) {}
}

impl Drop for Audio {
fn drop(&mut self) {
// stop the music if playing; do additional audio cleanup
}
}
This is a very simple interface, yet I don’t need more. Audio::set_playback_cursor is cool when I debug my demos in realtime by clicking a time panel to quickly jump to a part of the music. Audio::toggle() enables me to pause the demo to inspect an effect in the demo. Etc.

However, how can I implement Audio::new?

The (current) limits of borrowing

The problem kicks in as we need to wrap the three types – Alto, Device and Context – as the fields of Audio:

struct Audio<'a> {
alto: Alto,
dev: Device<'a>,
context: Context<'a>
}

We have a problem if we do this. Even though the type is correct, we cannot correctly implement Audio::new. Let’s try:

impl<'a> Audio<'a> {
pub fn new<P>(_: P) -> Result<Self> where P: AsRef<Path> {
let alto = Alto::load_default(None).unwrap();
let dev =;
let ctx = dev.new_context(None).unwrap();

Ok(Audio {
alto: alto,
dev: dev,
context: ctx
})
}
}

As you can see, that cannot work:

error: `alto` does not live long enough
--> /tmp/alto/src/
14 | let dev =;
| ^^^^ does not live long enough
22 | }
| - borrowed value only lives until here
note: borrowed value must be valid for the lifetime 'a as defined on the body at 12:19...
--> /tmp/alto/src/
12 | fn new() -> Self {
| ^

error: `dev` does not live long enough
--> /tmp/alto/src/
15 | let ctx = dev.new_context(None).unwrap();
| ^^^ does not live long enough
22 | }
| - borrowed value only lives until here
note: borrowed value must be valid for the lifetime 'a as defined on the body at 12:19...
--> /tmp/alto/src/
12 | fn new() -> Self {
| ^

error: aborting due to 2 previous errors

What’s going on here? Well, we’re hitting a problem called the problem of self-borrowing. Look at the first two lines of our implementation of Audio::new:

let alto = Alto::load_default(None).unwrap();
let dev =;

As you can see, the call to Alto::open borrows alto – via a &Alto reference. And of course, you cannot move a value that is borrowed – that would invalidate all the references pointing to it. We also have another problem: imagine we could do that. All those types implement Drop. Because they basically all have the same lifetime, there’s no way to know which one borrows information from which. The dropchecker has no way to know that. It will then refuse to compile code creating objects of this type, because dropping might be unsafe in that case.

What can we do about it?

Currently, this problem is linked to the fact that the lifetime system is a bit too restrictive and doesn’t allow for self-borrowing. Plus, you also have the dropchecker issue to figure out. Even though we were able to bring in alto and device altogether, how do you handle context? The dropchecker doesn’t know which one must be dropped first – there’s no obvious link at this stage between alto and all the others anymore, because that link was made with a reference to alto that died – we’re moving out of the scope of the Audio::new function.

That’s a bit tough. The current solution I implemented to fix the issue is ok–ish, but I dislike it because it adds a significant performance overhead: I just moved the initialization code in a thread that stays awake until the Audio object dies, and I use a synchronized channel to communicate with the objects in that thread. That works because the thread provides us with a stack, that is the support of lifetimes – think of scopes.

Another solution would be to move that initialization code in a function that would accept a closure – your application. Once everything is initialized, the closure is called with a few callbacks to toggle / set the cursor of the object living “behind” on the stack. I don’t like that solution because it modifies the main design – having an Audio object was the goal.
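As a rough sketch of what that closure-based design could look like (entirely my own illustration, with a dummy Device type standing in for the alto objects):

```rust
// Sketch of the closure-based alternative: the resources live on the stack
// of `with_audio` for the whole duration of the closure, so the usual
// lifetime rules apply. `Device` is a stand-in for the real alto types.
struct Device {
    playing: bool,
}

// Handle passed to the user's code; it can only borrow the device.
struct AudioHandle<'a> {
    dev: &'a mut Device,
}

impl<'a> AudioHandle<'a> {
    fn toggle(&mut self) -> bool {
        self.dev.playing = !self.dev.playing;
        self.dev.playing
    }
}

fn with_audio<F, R>(f: F) -> R
where
    F: FnOnce(&mut AudioHandle) -> R,
{
    // In the real code this would be Alto::load_default,,
    // Device::new_context, etc.
    let mut dev = Device { playing: false };
    let mut handle = AudioHandle { dev: &mut dev };
    let r = f(&mut handle);
    // `dev` is dropped here, after the closure returns: cleanup is guaranteed.
    r
}

fn main() {
    let ended_playing = with_audio(|audio| audio.toggle());
    println!("{}", ended_playing); // true
}
```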

Other solutions are:

  • std::mem::transmute to remove the lifetimes (replace them with 'static). That’s hyper dangerous and we are just breaking Rust’s lifetimes… not okay :(
  • change our design to meet the same as alto’s (in a word: use the same three objects)
  • cry deeply

I don’t have a satisfying solution to that problem yet. My thread solution works and lets me have a single type abstracting all of that, but having a thread for such a thing is a waste of resources to me. I think I’ll implement the closure solution as, currently, it’s not possible to embed that kind of self-borrowing logic in a struct. I guess that’s okay; the concept is pretty young and we’re still kind of experimenting with it. But clearly, lifetimes hit a hard problem here that they cannot solve correctly. Keep in mind that even if unsafe solutions exist, we’re talking about a library that’s designed to work with Rust lifetimes as a pretty high level of abstraction. Reaching for transmute is very symptomatic of something wrong. I’m open to suggestions, because I’ve been thinking about the problem all day long without finding a proper solution.

Keep the vibe!

by Dimitri Sabadie ( at February 07, 2017 02:04 AM

Joachim Breitner

Why prove programs equivalent when your compiler can do that for you?

Last week, while working on CodeWorld, via a sequence of yak shavings, I ended up creating a nicely small library that provides Control.Applicative.Succs, a new applicative functor. And because I am trying to keep my Haskell karma good1, I wanted to actually prove that my code fulfills the Applicative and Monad laws.

This led me to insert long comments into my code, filled with lines like this:

The second Applicative law:

  pure (.) <*> Succs u us <*> Succs v vs <*> Succs w ws
= Succs (.) [] <*> Succs u us <*> Succs v vs <*> Succs w ws
= Succs (u .) (map (.) us) <*> Succs v vs <*> Succs w ws
= Succs (u . v) (map ($v) (map (.) us) ++ map (u .) vs) <*> Succs w ws
= Succs (u . v) (map (($v).(.)) us ++ map (u .) vs) <*> Succs w ws
= Succs ((u . v) w) (map ($w) (map (($v).(.)) us ++ map (u .) vs) ++ map (u.v) ws)
= Succs ((u . v) w) (map (($w).($v).(.)) us ++ map (($w).(u.)) vs ++ map (u.v) ws)
= Succs (u (v w)) (map (\u -> u (v w)) us ++ map (\v -> u (v w)) vs ++ map (\w -> u (v w)) ws)
= Succs (u (v w)) (map ($(v w)) us ++ map u (map ($w) vs ++ map v ws))
= Succs u us <*> Succs (v w) (map ($w) vs ++ map v ws)
= Succs u us <*> (Succs v vs <*> Succs w ws)

Honk if you have done something like this before!

I proved all the laws, but I was very unhappy. I have a PhD on something about Haskell and theorem proving. I have worked with Isabelle, Agda and Coq. Both Haskell and theorem proving are decades old. And yet, I sit here and tediously write manual proofs by hand. Is this really the best we can do?

Of course I could have taken my code, rewritten it in, say, Agda, and proved it correct there. But (right now) I don’t care about Agda code. I care about my Haskell code! I don’t want to write it twice, worry about copying mistakes and mismatches in semantics, and have external proofs to maintain. Instead, I want to prove where I code, and have the proofs checked together with my code!

Then it dawned on me that this is, to some extent, possible. The Haskell compiler comes with a sophisticated program transformation machinery, which is meant to simplify and optimize code. But it can also be used to prove Haskell expressions to be equivalent! The idea is simple: Take two expressions, run both through the compiler’s simplifier, and check if the results are the same. If they are, then the expressions are, as far as the compiler is concerned, equivalent.
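As a toy illustration of the kind of equivalence involved (my example, not from the post): with optimizations on, GHC's fusion rules should rewrite both of these definitions to the same code, and of course they also agree semantically:

```haskell
-- Two pipelines that GHC (with -O, via foldr/build fusion) should compile
-- to the same code; here we merely check pointwise that they agree.
lhs, rhs :: [Int] -> [Int]
lhs = map (+ 1) . map (* 2)
rhs = map ((+ 1) . (* 2))

main :: IO ()
main = print (all (\xs -> lhs xs == rhs xs) [[], [1, 2, 3], [-5 .. 5]])
```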

A handful of hours later, I was able to write proof tasks like

app_law_2 = (\ a b (c::Succs a) -> pure (.) <*> a <*> b <*> c)
        === (\ a b c -> a <*> (b <*> c))

and others into my source file, and the compiler would tell me happily:

[1 of 1] Compiling Successors       ( Successors.hs, Successors.o )
GHC.Proof: Proving getCurrent_proof1 …
GHC.Proof: Proving getCurrent_proof2 …
GHC.Proof: Proving getCurrent_proof3 …
GHC.Proof: Proving ap_star …
GHC.Proof: Proving getSuccs_proof1 …
GHC.Proof: Proving getSuccs_proof2 …
GHC.Proof: Proving getSuccs_proof3 …
GHC.Proof: Proving app_law_1 …
GHC.Proof: Proving app_law_2 …
GHC.Proof: Proving app_law_3 …
GHC.Proof: Proving app_law_4 …
GHC.Proof: Proving monad_law_1 …
GHC.Proof: Proving monad_law_2 …
GHC.Proof: Proving monad_law_3 …
GHC.Proof: Proving return_pure …
GHC.Proof proved 15 equalities

This is how I want to prove stuff about my code!

Do you also want to prove stuff about your code? I packaged this up as a GHC plugin in the Haskell library ghc-proofs (not yet on Hackage). The README of the repository has a bit more detail on how to use this plugin, how it works, what its limitations are and where this is heading.

This is still only a small step, but finally there is a step towards low threshold program equivalence proofs in Haskell.

  1. Or rather to recover my karma after abominations such as ghc-dup, seal-module or ghc-heap-view.

by Joachim Breitner ( at February 07, 2017 12:38 AM

February 06, 2017

FP Complete

MonadMask vs MonadBracket

The exceptions package provides three typeclasses for generalizing exception handling to monads beyond IO:

  • MonadThrow is for monads which allow reporting an exception
  • MonadCatch is for monads which also allow catching a thrown exception
  • MonadMask is for monads which also allow safely acquiring resources in the presence of asynchronous exceptions

For reference, these are defined as:

class Monad m => MonadThrow m where
  throwM :: Exception e => e -> m a

class MonadThrow m => MonadCatch m where
  catch :: Exception e => m a -> (e -> m a) -> m a

class MonadCatch m => MonadMask m where
  mask :: ((forall a. m a -> m a) -> m b) -> m b
  uninterruptibleMask :: ((forall a. m a -> m a) -> m b) -> m b

This breakdown of the typeclasses is fully intentional, as each added capability excludes some class of monads, e.g.:

  • Maybe is a valid instance of MonadThrow, but since it throws away information on the exception that was thrown, it cannot be a MonadCatch
  • Continuation-based monads like Conduit are capable of catching synchronously thrown exceptions and are therefore valid MonadCatch instances, but cannot provide guarantees of safe resource cleanup (which is why the resourcet package exists), and are therefore not MonadMask instances
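The Maybe point is easy to see concretely. Its MonadThrow instance (this is the actual instance shape from the exceptions package, restated with a local copy of the class so the snippet is self-contained) throws the exception value away entirely, so a catch would have nothing to recover:

```haskell
import Control.Exception (Exception)

-- Local restatement of the class from the exceptions package.
class Monad m => MonadThrow m where
  throwM :: Exception e => e -> m a

-- The exception is discarded: all we learn is that *some* failure happened.
-- That is exactly why no useful MonadCatch instance can exist for Maybe.
instance MonadThrow Maybe where
  throwM _ = Nothing

main :: IO ()
main = print (throwM (userError "boom") :: Maybe Int) -- Nothing
```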

However, there are two tightly related questions around MonadMask which trip people up a lot:

  • Why is there no instance for MonadMask for EitherT (or its new synonym ExceptT)? It's certainly possible to safely acquire resources within an EitherT transformer (see below for an example).
  • It seems perfectly reasonable to define an instance of MonadMask for a monad like Conduit, as its only methods are mask and uninterruptibleMask, which can certainly be implemented in a way that respects the types. The same applies to EitherT for that matter.

Let's look at the docs for the MonadMask typeclass for a little more insight:

Instances should ensure that, in the following code:

f `finally` g

The action g is called regardless of what occurs within f, including async exceptions.

Well, this makes sense: finally is a good example of a function that guarantees cleanup in the event of any exception, so we'd want this (fairly straightforward) constraint to be met. The thing is, the finally function is not part of the MonadMask typeclass, but is instead defined on its own as (doing some aggressive inlining):

finally :: MonadMask m => m a -> m b -> m a
finally action finalizer = mask $ \unmasked -> do
  result <- unmasked action `catch` \e -> do
    _ <- finalizer
    throwM (e :: SomeException)
  _ <- finalizer
  return result

Let's specialize the type signature to the ExceptT MyError IO type:

finally :: ExceptT MyError IO a
        -> ExceptT MyError IO b
        -> ExceptT MyError IO a

If we remember that ExceptT is defined as:

newtype ExceptT e m a = ExceptT (m (Either e a))

We can rewrite that signature to put the IO on the outside with an explicit Either return value. Inlining the Monad instance for ExceptT into the above implementation of finally, we get:

finally :: IO (Either MyError a)
        -> IO (Either MyError b)
        -> IO (Either MyError a)
finally action finalizer = mask $ \unmasked -> do
  eresult <- unmasked action `catch` \e -> do
    _ <- finalizer
    throwM (e :: SomeException)
  case eresult of
    Left err -> return (Left err)
    Right result -> do
      _ <- finalizer
      return (Right result)

(I took some shortcuts in this implementation to focus on the bad part, take it as an exercise to the reader to make a fully faithful implementation of this function.)

With this inlined implementation, the problem becomes much easier to spot. We run action, which may result in a runtime exception. If it does, our catch function kicks in, we run the finalizer, and rethrow the exception, awesome.

If there's no runtime exception, we have two cases to deal with: the result is either Right or Left. In the case of Right, we run our finalizer and return the result. Awesome.

But the problem is in the Left case. Notice how we're not running the finalizer at all, which is clearly problematic behavior. I'm not pointing out anything new here, as this has been well known in the Haskell world, with packages like MonadCatchIO-transformers in the past.

Just as importantly, I'd like to point out that it's exceedingly trivial to write a correct version of finally for the IO (Either MyError a) case, and therefore for the ExceptT MyError IO a case as well:

finally :: IO (Either MyError a)
        -> IO (Either MyError b)
        -> IO (Either MyError a)
finally action finalizer = mask $ \unmasked -> do
  eresult <- unmasked action `catch` \e -> do
    _ <- finalizer
    throwM (e :: SomeException)
  _ <- finalizer
  return eresult

While this may look identical to the original, unspecialized version we have in terms of MonadMask and MonadCatch, there's an important difference: the monad used in the do-notation is IO, not ExceptT, and therefore the presence of a Left return value no longer has any special effect on control flow.

There are arguments to be had about the proper behavior to be displayed when the finalizer has some error condition, but I'm conveniently eliding that point right now. The point is: we can implement it when specializing Either or ExceptT.

Enter MonadBracket

A few weeks ago I was working on a pull request for the foundation package, adding a ResourceT transformer. At the time, foundation didn't have anything like MonadMask, so I needed to create such a typeclass. I could have gone with something matching the exceptions package; instead, I went for the following:

class MonadCatch m => MonadBracket m where
    -- | A generalized version of the standard bracket function which
    -- allows distinguishing different exit cases.
    generalBracket
        :: m a
        -- ^ acquire some resource
        -> (a -> b -> m ignored1)
        -- ^ cleanup, no exception thrown
        -> (a -> E.SomeException -> m ignored2)
        -- ^ cleanup, some exception thrown. The exception will be rethrown
        -> (a -> m b)
        -- ^ inner action to perform with the resource
        -> m b

This is a generalization of the bracket function. Importantly, it allows you to provide different cleanup functions for the success and failure cases. It also provides you with more information for cleanup, namely the exception that occurred or the success value.

I think this is a better abstraction than MonadMask:

  • It allows for a natural and trivial definition of all of the cleanup combinators (bracket, finally, onException, etc) in terms of this one primitive.
  • The primitive can be defined with full knowledge of the implementation details of the monad in question.
  • It makes invalid instances of MonadBracket look "obviously wrong" instead of just being accidentally wrong.
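To make the first bullet concrete, here is a sketch (mine) of an IO-specialized generalBracket matching the signature above, with finally derived from it. It is illustrative only: the real thing would also mask async exceptions around acquisition and cleanup.

```haskell
import qualified Control.Exception as E

-- IO-specialized sketch of generalBracket (simplified: no masking).
generalBracket
  :: IO a                                   -- acquire some resource
  -> (a -> b -> IO ignored1)                -- cleanup, no exception thrown
  -> (a -> E.SomeException -> IO ignored2)  -- cleanup on exception; rethrown after
  -> (a -> IO b)                            -- inner action
  -> IO b
generalBracket acquire onSuccess onError inner = do
  resource <- acquire
  result <- inner resource `E.catch` \e -> do
    _ <- onError resource e
    E.throwIO (e :: E.SomeException)
  _ <- onSuccess resource result
  return result

-- finally falls out trivially: the same cleanup on both exits.
finally :: IO a -> IO b -> IO a
finally action finalizer =
  generalBracket (return ())
                 (\() _ -> finalizer)
                 (\() _ -> finalizer)
                 (\() -> action)

main :: IO ()
main = do
  r <- return (42 :: Int) `finally` putStrLn "cleanup ran"
  print r
```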

We can fiddle around with the exact definition of generalBracket. For example, with the type signature above, there is no way to create an instance for ExceptT, since in the case of a Left return value from the action:

  • We won't have a runtime exception to pass to the exceptional cleanup function
  • We won't have a success value to pass to the success cleanup function

This can easily be fixed by replacing:

-> (a -> b -> m ignored1)
-- ^ cleanup, no exception thrown

with:

-> (a -> m ignored1)
-- ^ cleanup, no exception thrown

The point is: this formulation can allow for more valid instances, make it clearer why some instances don't exist, and prevent people from accidentally creating broken, buggy instances.

Note that I'm not actually proposing any changes to the exceptions package right now, I'm merely commenting on this new point in the design space. Backwards compatibility is something we need to seriously consider before rolling out changes.

February 06, 2017 02:44 PM

Functional Jobs

Back-end Software Engineer at Maxwell Health (Full-time)

At Maxwell, we are inspired by the opportunity to make a difference in society by helping people make better health, wellness and financial decisions. We build web and mobile applications that support a rich set of interactions among consumers, employers, brokers and product vendors. Our domain requires us to consider usability, design, business workflow, self-service and process scalability. We view ourselves as the stewards of change for a very complex industry.

About Our Team:

We passionately solve problems and commit to quality as the leading design principle.

We are agile practitioners who think critically and systemically.

We are building a culture of automation.

We scale ourselves through small, self-empowered and self-organizing teams.

We focus on learning, at an organizational and personal level.

We value a diverse team with a variety of educational backgrounds.

About You:

You are joining us to teach us something new and to learn a few tricks from us too.

You care deeply about engineering high quality and resilient features.

You are excited to work with a team that strives to delight customers no matter what it takes.

You are an idealist at heart, and our mission resonates with you.


This position involves joining and bootstrapping a new team to set up a new service focused around products and benefit plans.

This new service will be used across teams within and outside of Maxwell.

This work is critical for Maxwell to deliver a strong and seamless on-boarding experience to our customers.

In addition to this new project, as part of the Maxwell engineering team your day-to-day responsibilities will involve:

Contribute to the advancement of Maxwell Health web and mobile applications.

Solve challenging software design problems.

Practice TDD and BDD on a day-to-day basis within agile teams.

Teach and mentor colleagues who are looking to improve their craftsmanship.

Invest in constant personal development and learning.

Must Have:

At least 3 years of experience writing and delivering software

Interest in functional programming and basic knowledge of at least one functional language (e.g. Haskell, Scala, Clojure, Elixir...)

Strong communication skills and experience working in an agile, highly collaborative environment

Experience working on distributed platforms

Experience using and designing web-based APIs (e.g. REST, SOAP, GraphQL, etc)

Experience with relational and/or NoSQL databases

Nice to have:

Experience with a functional, statically-typed language such as Haskell, OCaml or Scala

Experience with Domain Driven Design and complex domain modeling

Experience with the insurance/benefits industry

Get information on how to apply for this position.

February 06, 2017 02:35 PM

February 05, 2017

Gabriel Gonzalez

Program JSON and YAML with Dhall

This is short post to announce a new dhall-json library which lets you compile the Dhall configuration language to both JSON and YAML. This in turn means that you can now program JSON and YAML using a non-Turing-complete programming language.

Here's a quick example of the library in action:

$ dhall-to-json <<< "{ foo = 1, bar = [1, 2, 3] }"
$ dhall-to-yaml <<< "{ foo = 1, bar = [1, 2, 3] }"
foo: 1
bar:
- 1
- 2
- 3
$ cat example
let indexed
in let map
in let words
   = [ "Lorem"
     , "ipsum"
     , "dolor"
     , "sit"
     , "amet"
     , "consectetur"
     , "adipiscing"
     , "elit"
     ]
in let makeRow
   = λ(r : { index : Natural, value : Text })
   → { index = r.index
     , background = if Natural/even r.index then "Green" else "Red"
     , contents = r.value
     }
in map
   { index : Natural, value : Text }
   { index : Natural, background : Text, contents : Text }
   makeRow
   (indexed Text words)
$ dhall-to-json <<< "./example"
$ dhall-to-yaml <<< "./example"
- contents: Lorem
index: 0
background: Green
- contents: ipsum
index: 1
background: Red
- contents: dolor
index: 2
background: Green
- contents: sit
index: 3
background: Red
- contents: amet
index: 4
background: Green
- contents: consectetur
index: 5
background: Red
- contents: adipiscing
index: 6
background: Green
- contents: elit
index: 7
background: Red

This library bundles both JSON and YAML functionality together because Haskell's yaml library reuses the exact same data type that Haskell's aeson library uses to represent JSON (i.e. the Value type). This means that if you can compile a Dhall expression to a Value then you can render that Value as both JSON and YAML.
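In miniature (with a toy Value type of my own, not aeson's), the point is that a single tree can feed two renderers:

```haskell
import Data.List (intercalate)

-- A toy stand-in for aeson's Value: one data type, two output formats.
data Value = Number Int | Str String | Object [(String, Value)]

toJson :: Value -> String
toJson (Number n)  = show n
toJson (Str s)     = show s
toJson (Object fs) =
  "{" ++ intercalate "," [ show k ++ ":" ++ toJson v | (k, v) <- fs ] ++ "}"

-- Deliberately simplified YAML: flat objects only.
toYaml :: Value -> String
toYaml (Number n)  = show n
toYaml (Str s)     = s
toYaml (Object fs) = intercalate "\n" [ k ++ ": " ++ toYaml v | (k, v) <- fs ]

main :: IO ()
main = do
  let v = Object [("foo", Number 1), ("bar", Str "baz")]
  putStrLn (toJson v) -- {"foo":1,"bar":"baz"}
  putStrLn (toYaml v) -- foo: 1, then bar: baz on the next line
```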

Unlike the Dhall bindings to Nix, you can't compile most Dhall features to JSON or YAML, since they aren't real programming languages. After all, the whole point of this binding is to make JSON and YAML programmable! So if you try to compile anything other than primitive types or records you will get a compile error:

$ dhall-to-json <<< "λ(x : Bool) → x"

Error: Cannot translate to JSON

Explanation: Only primitive values, records, ❰List❱s, and ❰Optional❱ values can
be translated from Dhall to JSON

The following Dhall expression could not be translated to JSON:

↳ λ(x : Bool) → x

Right now I'm slowly adding Dhall integrations with new languages and configuration file formats so that people can use Dhall as a universal configuration language. So far I've targeted integrations that can reuse the initial Haskell implementation of Dhall (i.e. Nix, JSON, and YAML bindings). However, my next integration will probably reimplement Dhall in another language so that Dhall can be used to natively configure a wider range of languages without having to invoke a command-line dhall executable.

I'm most likely to build the second implementation of Dhall in Rust so that I can also reuse the same implementation to expose a C and C++ API. However, I've never done this sort of thing before, so if this is a dumb idea, let me know!

You can find the dhall-json package on Github or Hackage and if you would like to contribute to Dhall in general you can open an issue here.

by Gabriel Gonzalez ( at February 05, 2017 11:30 PM

Dan Piponi (sigfpe)

Logarithms and exponentials of functions


A popular question in mathematics is this: given a function f, what is its "square root" g in the sense that g∘g = f? There are many questions about this on mathoverflow but it's also a popular subject in mathematics forums for non-experts. This question seems to have a certain amount of notoriety because it's easy to ask but hard to answer fully. I want to look at an approach that works nicely for formal power series, following from the Haskell code I wrote here. There are some methods for directly finding "functional square roots" for formal power series that start as f = z + …, but I want to approach the problem indirectly. When working with real numbers we can find square roots, say, by using √x = exp(½ log x). I want to use an analogue of this for functions. So my goal is to make sense of the idea of the logarithm and exponential of a formal power series as composable functions. Warning: the arguments are all going to be informal.


There's potential for a lot of ambiguous notation here, especially as the usual mathematical notation for nth powers of trig functions is so misleading. I'm going to use ∘ for composition of functions and power series, and I'm going to use the notation f^∘n to mean the nth iterate of f. So f^∘2 = f∘f and f^∘(n+1) = f∘f^∘n. As I'll be working mostly in the ring of formal power series R[[z]] for some ring R, I'll reserve the variable z to refer only to the corresponding element in this ring. I'll also use formal power series somewhat interchangeably with functions. So z can be thought of as representing the identity function. To make sure we're on the same page, here are some small theorems in this notation:

  1. f^∘0 = z
  2. f^∘m∘f^∘n = f^∘(m+n)
  3. (z+1)^∘n = z+n

That last one simply says that adding one n times is the same as adding n.

As I'm going to have ordinary logarithms and exponentials sitting around, as well as functional logarithms and exponentials, I'm going to introduce the notation flog for the functional logarithm and fexp for functional exponentiation.


The first goal is to define a non-trivial function flog with the fundamental property that

flog (f∘g) = flog f + flog g

First, let's note some basic algebraic facts. The formal power series form a commutative ring with operations + and × (ordinary multiplication) and with additive identity 0 and multiplicative identity 1. The formal power series also form a ring-like algebraic structure with operation + and partial operation ∘, with additive identity 0 and multiplicative identity z. But it's not actually a ring or even a near-ring. Composition isn't defined for all formal power series and even when it's defined, we don't have distributivity. For example, in general f∘(g+h) ≠ f∘g + f∘h; after all, there's no reason to expect sin(g+h) to equal sin g + sin h. We do have right-distributivity however, i.e.

(f+g)∘h = f∘h + g∘h

more or less by definition of +.
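These distributivity claims are easy to spot-check numerically (my snippet, not from the post): represent polynomials as coefficient lists and compare both sides pointwise.

```haskell
-- Spot-check: composition is right-distributive over + but not
-- left-distributive, evaluated pointwise on sample polynomials.
evalP :: [Int] -> Int -> Int
evalP cs x = sum (zipWith (\c k -> c * x ^ k) cs [0 :: Int ..])

addP :: [Int] -> [Int] -> [Int]
addP xs ys = zipWith (+) (pad xs) (pad ys)
  where n = max (length xs) (length ys)
        pad zs = take n (zs ++ repeat 0)

main :: IO ()
main = do
  let f  = [0, 0, 1] -- z^2
      g  = [1, 1]    -- 1 + z
      h  = [0, 2]    -- 2z
      xs = [-3 .. 3]
  -- (f + g) ∘ h == f ∘ h + g ∘ h : right-distributivity holds
  print (and [ evalP (addP f g) (evalP h x)
                 == evalP f (evalP h x) + evalP g (evalP h x) | x <- xs ])
  -- f ∘ (g + h) /= f ∘ g + f ∘ h : left-distributivity fails
  print (or [ evalP f (evalP (addP g h) x)
                 /= evalP f (evalP g x) + evalP f (evalP h x) | x <- xs ])
```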

We can't use power series on our power series

There's an obvious approach: just use power series of power series. So we might tentatively suggest that

flog f = (f - z) - (f - z)^∘2/2 + (f - z)^∘3/3 - …

Note that I consider f - z rather than f - 1 because z is the multiplicative identity in our ring-like structure.

Unfortunately this doesn't work. The reason is this: if we try to use standard reasoning to show that the resulting function has the fundamental property we seek we end up using distributivity. We don't have distributivity.

Sleight of hand

There's a beautiful trick I spotted on mathoverflow recently that allows us to bring back distributivity. (I can't find the trick again, but when I do I'll come back and add a link and credit here.) Consider the function defined by . In other words is right-composition by . (Ambiguity alert, I'm using here to mean right. It has nothing to do with the ring underlying our formal power series.) Because we have right-distributivity, is a bona fide linear operator on the space of formal power series. If you think of formal power series as being infinitely long vectors of coefficients then can be thought of as an infinitely sized matrix. This means that as long as we have convergence, we can get away with using power series to compute with the property that . Define:

We have:

log R_f = Σ_{n≥1} (−1)^{n+1}/n · (R_f − I)^n,

where I'm using I to mean the identity linear operator. And now we have (using R_{f∘f} = R_f ∘ R_f, since (g ∘ f) ∘ f = g ∘ (f ∘ f)):

flog(f ∘ f) = (log R_{f∘f})(z) = (log(R_f ∘ R_f))(z) = 2 (log R_f)(z) = 2 flog(f).
But does it converge? Suppose f is of the form f = z + a₂z² + a₃z³ + ⋯. Then (R_f − I)(g) = g ∘ f − g. The leading term in g ∘ f is the same as the leading term in g. So R_f − I kills the first term of whatever it is applied to, which means that when we sum the terms in log R_f, each extra term leaves one more coefficient fixed, and we only need finitely many terms to get a power series correct to any given number of coefficients. Reusing my code from here, I call (log R_f)(z) by the name flog. Here is its implementation:

> import Data.Ratio

> flog :: (Eq a, Fractional a) => [a] -> [a]
> flog f@(0 : 1 : _) =
>     flog' 1 (repeat 0) (0 : 1 : repeat 0)
>     where flog' n total term = take (n+1) total ++ (
>               drop (n+1) $
>               let pz = p term
>               in flog' (n+1) (total-map (((-1)^n / fromIntegral n) *) pz) pz)
>           p total = (total ○ f) - total

The take and drop are how I tell Haskell that the first n+1 coefficients have been exactly computed and so no more terms are necessary.

Does it work?

Here's an example using the twice iterated sin function:

> ex1 = do
>     let lhs = flog (sin (sin z))
>     let rhs = 2*flog (sin z)
>     mapM_ print $ take 20 (lhs-rhs)

Works to 20 coefficients. Dare we try an inverse function?

> ex2 = do
>     let lhs = flog (sin z)
>     let rhs = flog (asin z)
>     mapM_ print $ take 20 (lhs+rhs)

Seems to work!


It's no good having logarithms if we can't invert them. One way to think about the exponential function is that

exp(x) = lim_{n→∞} (1 + x/n)^n.

We get better and better approximations by writing the expression inside the limit as a product of more and more terms. We can derive the usual power series for exp from this, but only if distributivity holds. So let's try to use the above expression directly:

fexp(f) = lim_{n→∞} (z + f/n)^{∘n}

and get

fexp(f) ≈ (z + f/n) ∘ (z + f/n) ∘ ⋯ ∘ (z + f/n)  (n times, for large n).

Unfortunately, even though R_{z+f/n} is linear, z + f/n itself isn't. So it's going to take some extra work to raise z + f/n to the compositional power n.

The good news is that we're dealing with the special case where f/n is something small. We have

g ∘ (z + f/n) = g + (f/n)·g′ + higher-order terms.

So R_{z+f/n} is actually I + (f/n)·d/dz modulo higher-order terms. This gives us

fexp(f) = lim_{n→∞} (I + (f/n)·d/dz)^n (z) = exp(f·d/dz)(z).

This is something we can implement using the power series for ordinary exp:

fexp(f) = Σ_{n≥0} 1/n! · (f·d/dz)^n (z).
In code that becomes:

> fexp f@(0 : 0 : _) = fexp' f 0 z 1
> fexp' f total term n = take (n-1) total ++ drop (n-1)
>     (fexp' f (total+term) (map (/fromIntegral n) (f*d term)) (n+1))

Note how when we differentiate a power series we shift the coefficients down by one place. To counter the effect of that, so as to ensure convergence, we need f to look like a₂z² + a₃z³ + ⋯. Luckily this is exactly the kind of series flog gives us.

But does it successfully invert flog? Let's try:

> ex3 = do
>     let lhs = sin z
>     let rhs = fexp (flog (sin z))
>     mapM_ print $ take 20 (lhs-rhs)

Now we can start computing fractional iterates. Square root first:

> ex4 = do
>     mapM_ print $ take 20 $ fexp (flog (sin z)/2)

That matches the results at A048602 and A048603.

Cube root:

> ex5 = do
>     mapM_ print $ take 20 $ fexp (flog (sin z)/3)

Matches A052132 and A052135.

And this gives an alternative to Lagrange inversion for computing power series for inverse functions:

> ex6 = do
>     let lhs = fexp (-flog (sin z))
>     let rhs = asin z
>     mapM_ print $ take 20 (lhs-rhs)

What's really going on with fexp?

Let's approach fexp in a slightly different way. In effect, fexp(f) is the composition of lots of copies of z + f/n with each other. So let's try composing these one at a time, with one composition every 1/n seconds. After one second we should have our final result. We can write this as:

x(t + 1/n) = x(t) + f(x(t))/n,

and x(t + δt) ≈ x(t) + f(x(t))·δt to first order. So we're solving the differential equation:

dx/dt = f(x)

with x(0) = z.

So fexp(f) is the function that solves one of the most fundamental differential equations. This also means I can use Mathematica to solve the equation symbolically and check my results. For example, Mathematica says that the solution to

dx/dt = sin²(x)

at t = 1 is

x(1) = arctan(tan(z)/(1 − tan(z))),

so let's check:

> ex7 = do
>     let lhs = fexp ((sin z)^2)
>     let rhs = atan (tan z/(1-tan z))
>     mapM_ print $ take 20 (lhs-rhs)

I like this example because it leads to the generalized Catalan numbers A004148:

> ex8 = do
>     mapM_ print $ take 20 $ fexp (z^2/(1-z^2))

That suggests this question: what does fexp mean combinatorially? I don't have a straightforward answer, but solving this class of differential equation motivated the original introduction, by Cayley, of the abstract notion of a tree. See here.

What is going on geometrically?

For those who know some differential geometry, the differential equation

dx/dt = f(x)

describes a flow on the real line (or complex plane). You can think of f as being a one-dimensional vector field describing how points move from time t to t + dt. When we solve the differential equation we get integral curves that these points follow, and fexp(f) tells us where the points end up after one unit of time. So fexp is the exponential map. In fact, fexp(f) is essentially the exponential of the vector field f(x)·d/dx, where we're now using the differential geometer's notion of a vector field as a differential operator.

Final word

Unfortunately the power series you get from using flog and fexp don't always have good convergence properties. For example, I'm not sure, but I think the series for the functional square root of sin has radius of convergence zero. If you truncate the series you get a half-decent approximation to a functional square root in the vicinity of the origin, but the approximation gets worse, not better, if you use more terms.

And the rest of the code

> (*!) _ 0 = 0
> (*!) a b = a*b
> (!*) 0 _ = 0
> (!*) a b = a*b
> (^+) a b = zipWith (+) a b
> (^-) a b = zipWith (-) a b

> ~(a:as) ⊗ (b:bs) = (a *! b):
>     ((map (a !*) bs) ^+ (as ⊗ (b:bs)))
> (○) (f:fs) (0:gs) = f:(gs ⊗ (fs ○ (0:gs)))
> inverse (0:f:fs) = x where
>     x = map (recip f *) (0:1:g)
>     _:_:g = map negate ((0:0:fs) ○ x)
> invert x = r where
>     r = map (/x0) ((1:repeat 0) ^- (r ⊗ (0:xs)))
>     x0:xs = x

> (^/) (0:a) (0:b) = a ^/ b
> (^/) a b = a ⊗ (invert b)

> z :: [Rational]
> z = 0:1:repeat 0

> d (_:x) = zipWith (*) (map fromInteger [1..]) x

> integrate x = 0 : zipWith (/) x (map fromInteger [1..])

> instance (Eq r, Num r) => Num [r] where
>     x+y = zipWith (+) x y
>     x-y = zipWith (-) x y
>     ~x*y = x ⊗ y
>     fromInteger x = fromInteger x:repeat 0
>     negate x = map negate x
>     signum (x:_) = signum x : repeat 0
>     abs (x:xs) = error "Can't form abs of a power series"

> instance (Eq r, Fractional r) => Fractional [r] where
>     x/y = x ^/ y
>     fromRational x = fromRational x:repeat 0

> sqrt' x = 1 : rs where
>     rs = map (/2) (xs ^- (rs ⊗ (0:rs)))
>     _ : xs = x
> instance (Eq r, Fractional r) => Floating [r] where
>     sqrt (1 : x) = sqrt' (1 : x)
>     sqrt _ = error "Can only find sqrt when leading term is 1"
>     exp x = e where e = 1+integrate (e * d x)
>     log x = integrate (d x/x)
>     sin x = integrate ((cos x)*(d x))
>     cos x = [1] ... negate (integrate ((sin x)*(d x)))
>     asin x = integrate (d x/sqrt(1-x*x))
>     atan x = integrate (d x/(1+x*x))
>     acos x = error "Unable to form power series for acos"
>     sinh x = integrate ((cosh x)*(d x))
>     cosh x = [1] ... integrate ((sinh x)*(d x))
>     asinh x = integrate (d x/sqrt(1+x*x))
>     atanh x = integrate (d x/(1-x*x))
>     acosh x = error "Unable to form power series for acosh"
>     pi = error "There is no formal power series for pi"

> lead [] x = x
> lead (a:as) x = a : (lead as (tail x))
> a ... x = lead a x

> (//) :: Fractional a => [a] -> (Integer -> Bool) -> [a]
> (//) a c = zipWith (\a-> \b->(if (c a :: Bool) then b else 0)) [(0::Integer)..] a

A direct functional square root that doesn't use fexp and flog:

> fsqrt (0 : 1 : fs) =
>     let gs = (fs-(0 : gs*((0 : delta gs gs)+((2 : gs)*(gs*g)))))/2
>         g = 0 : 1 : gs
>         delta (g : gs) h = let g' = delta gs h
>                            in (0 : ((1 : h) * g')) + gs
>     in g

by Dan Piponi at February 05, 2017 06:30 PM

February 01, 2017

Douglas M. Auclair (geophf)

January 2017 1HaskellADay problems and solutions

by geophf at February 01, 2017 03:59 AM

January 31, 2017

Don Stewart (dons)

Haskell developer roles at Standard Chartered

The Strats team at Standard Chartered is growing again. We have 7 new open roles currently, in a range of areas.

  • Haskell dev for liquidity management analytics.
  • Haskell devs for trade monitoring and control.
  • Contract analysis in Haskell (via FpML)
  • Haskell devs for low latency components in soft real-time non-linear pricing charges service.

You would join an existing team of 30 Haskell developers in Singapore or London. Generally our roles involve directly working with traders to automate their work and improve their efficiency. We use Haskell for all tasks, either GHC Haskell or our own (“Mu”) implementation, and this is a rare chance to join a large, experienced Haskell dev team.

We offer permanent or contractor positions, at Director and Associate Director level, with very competitive compensation. Demonstrated experience in typed FP (Haskell, OCaml, F#, etc.) is required.

All roles require some physical presence in either Singapore or London, and we offer flexibility within these constraints (with work from home available in some cases). No financial background is required or assumed.

More info about our development process is in the 2012 PADL keynote, and a 2013 HaskellCast interview.

If this sounds exciting to you, please send your PDF resume to me – donald.stewart <at>

Tagged: jobs

by Don Stewart at January 31, 2017 10:53 AM

Magnus Therning

On mocks and stubs in python (free monad or interpreter pattern)

A few weeks ago I watched a video where Ken Scambler talks about mocks and stubs. In particular he talks about how to get rid of them.

One part is about coding IO operations as data and using the GoF interpreter pattern to interpret them.

What he’s talking about is of course free monads, but I feel he’s glossing over a lot of details. Based on some of the questions asked during the talk I think I share that feeling with some people in the audience. Specifically I feel he skipped over the following:

  • How does one actually write such code in a mainstream OO/imperative language?
  • What’s required of the language in order to allow using the techniques he’s talking about?
  • Errors tend to break abstractions, so how does one deal with errors (i.e. exceptions)?

Every time I’ve used mocks and stubs for unit testing I’ve had a feeling that “this can’t be how it’s supposed to be done!” So to me, Ken’s talk offered some hope, and I really want to know how applicable the ideas are in mainstream OO/imperative languages.

The example

To play around with this I picked the following function (in Python):

def count_chars_of_file(fn):
    fd = os.open(fn, os.O_RDONLY)
    text = os.read(fd, 10000)
    n = len(text)
    os.close(fd)
    return n

It’s small and simple, but I think it suffices to highlight a few important points. So the goal is to rewrite this function such that calls to IO operations (actions) (e.g. are replaced by data (an instance of some data type) conveying the intent of the operation. This data can later be passed to an interpreter of actions.

Thoughts on the execution of actions and the interpreter pattern

When reading the examples in the description of the interpreter pattern what stands out to me is that they are either

  1. a list of expressions, or
  2. a tree of expressions

that is passed to an interpreter. Will this do for us when trying to rewrite count_chars_of_file?

No, it won’t! Here’s why:

  • A tree of actions doesn’t really make sense. Our actions are small and simple, they encode the intent of a single IO operation.
  • A list of actions can’t deal with interspersed non-actions, in this case it’s the line n = len(text) that causes a problem.

The interpreter pattern misses something that is crucial in this case: the running of the interpreter must be intermingled with running non-interpreted code. The way I think of it is that not only does the action need to be present and dealt with, but so does the rest of the program; that latter thing is commonly called a continuation.
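As a minimal illustration of the continuation idea (a toy add_cps, not from the talk): instead of returning a value, a CPS-style function hands it to the rest of the program.

```python
# CPS style: a function takes an extra argument k, "the rest of the program",
# and passes its result to k instead of returning it directly.
def add_cps(a, b, k):
    return k(a + b)

# the continuation here multiplies the result of the addition by 10
result = add_cps(1, 2, lambda n: n * 10)  # 30
```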

So, can we introduce actions and rewrite count_chars_of_file such that we pause the program when interpretation of an action is required, interpret it, and then resume where we left off?

Sure, but it’s not really idiomatic Python code!

Actions and continuations

The IO operations (actions) are represented as a named tuple:

Op = collections.namedtuple('Op', ['op', 'args', 'k'])

and the functions returning actions can then be written as

def cps_open(fn, k):
    return Op('open', [fn], k)

def cps_read(fd, k):
    return Op('read', [fd], k)

def cps_close(fd, k):
    return Op('close', [fd], k)

The interpreter is then an if statement checking the value of op.op with each branch executing the IO operation and passing the result to the rest of the program. I decided to wrap it directly in the program runner:

def runProgram(prog):
    def runOp(op):
        if op.op == 'open':
            fd = os.open(*op.args, os.O_RDONLY)
            return op.k(fd)
        elif op.op == 'read':
            text = os.read(*op.args, 10000)
            return op.k(text)
        elif op.op == 'close':
            os.close(*op.args)
            return op.k()

    while isinstance(prog, Op):
        prog = runOp(prog)

    return prog

So far so good, but what will all of this do to count_chars_of_file?

Well, it’s not quite as easy to read any more (basically it’s rewritten in CPS):

def count_chars_of_file(fn):

    def cont_1(text, fd):
        n = len(text)
        return cps_close(fd, lambda n=n: n)

    def cont_0(fd):
        return cps_read(fd, lambda text, fd=fd: cont_1(text, fd))

    return cps_open(fn, cont_0)

Generators to the rescue

Python does have a notion of continuations in the form of generators.1 By making count_chars_of_file into a generator it's possible to remove the explicit continuations and the program actually resembles the original one again.
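To make that concrete before diving into the rewrite, here is the send/StopIteration protocol in isolation (a toy generator, not part of the example):

```python
def toy():
    # pause, hand 1 to the driver, resume with whatever the driver sends back
    x = yield 1
    return x + 1

it = toy()
first = it.send(None)    # start the generator; receives the yielded 1
try:
    it.send(41)          # resume with x = 41; the generator then returns
except StopIteration as e:
    result = e.value     # a generator's return value rides on StopIteration
```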

The type for the actions loses one member, and the functions creating them lose an argument:

Op = collections.namedtuple('Op', ['op', 'args'])

def gen_open(fn):
    return Op('open', [fn])

def gen_read(fd):
    return Op('read', [fd])

def gen_close(fd):
    return Op('close', [fd])

The interpreter and program runner must be modified to step the generator until its end:

def runProgram(prog):
    def runOp(op):
        if op.op == 'open':
            fd = os.open(op.args[0], os.O_RDONLY)
            return fd
        elif op.op == 'read':
            text = os.read(op.args[0], 10000)
            return text
        elif op.op == 'close':
            os.close(op.args[0])
            return None

    try:
        op = prog.send(None)
        while True:
            r = runOp(op)
            op = prog.send(r)
    except StopIteration as e:
        return e.value

Finally, the generator-version of count_chars_of_file goes back to being a bit more readable:

def count_chars_of_file(fn):
    fd = yield gen_open(fn)
    text = yield gen_read(fd)
    n = len(text)
    yield gen_close(fd)
    return n

Generators all the way

Limitations of Python generators mean that we either have to push the interpreter (runProgram) down to where count_chars_of_file is used, or make all intermediate layers into generators and rewrite the interpreter to deal with this. It could look something like this then:

def runProgram(prog):
    def runOp(op):
        if op.op == 'open':
            fd = os.open(op.args[0], os.O_RDONLY)
            return fd
        elif op.op == 'read':
            text = os.read(op.args[0], 10000)
            return text
        elif op.op == 'close':
            os.close(op.args[0])
            return None

    try:
        op = prog.send(None)
        while True:
            if isinstance(op, Op):
                r = runOp(op)
                op = prog.send(r)
            elif isinstance(op, types.GeneratorType):
                r = runProgram(op)
                op = prog.send(r)
    except StopIteration as e:
        return e.value

Final thoughts

I think I’ve shown one way to achieve, at least parts of, what Ken talks about. The resulting code looks almost like “normal Python”. There are some things to note:

  1. Exception handling is missing. I know of no way to inject an exception into a generator in Python so I’m guessing that exceptions from running the IO operations would have to be passed in via generator.send as a normal value, which means that exception handling code would have to look decidedly non-Pythonic.
  2. Using this approach means the language must have support for generators (or some other way to represent the rest of the program). I think this rules out Java, but probably it can be done in C#.
  3. I’ve only used a single interpreter here, but I see no issues with combining interpreters (to combine domains of operations like file operations and network operations). I also think it’d be possible to use it to realize what De Goes calls Onion Architecture.

Would I ever advocate this approach for a larger Python project, or even any project in an OO/imperative language?

I’m not sure! I think that testing using mocks and stubs has lead to a smelly code base each an every time I’ve used it, but this approach feels a little too far from how OO/imperative code usually is written. I would love to try it out and see what the implications really are.

  1. I know, I know, coroutines! I’m simplifying and brushing over details here, but I don’t think I’m brushing over any details that are important for this example.

January 31, 2017 12:00 AM

January 30, 2017

Ken T Takusagawa

[tdoyjpic] Testing the Collatz conjecture on 10-million-bit numbers

tl;dr: No counterexamples to the Collatz conjecture were found.

Assume the Collatz conjecture is false; that is, there exist starting numbers which do not end in the 4-2-1 cycle.

Further major assumption: these counterexamples become much more numerous for larger integers; the reason the Collatz conjecture has seemed true in the range that it has been verified so far is because of the Strong Law of Small Numbers.

Then, we might be able to find counterexamples simply by testing some random large starting numbers.  One problem: when testing a number, how can one tell if it is a counterexample, not headed to a 4-2-1 cycle?  How can one tell if one has iterated enough?

Simple practical solution: start by testing small random numbers and gradually increase the size.  The smaller numbers will all reach 4-2-1, and the number of iterations and computation time can be recorded for each of them.  As the numbers get larger, the number of iterations and computation time will increase sort of smoothly.  If we encounter a number which is taking much longer than the extrapolation of the trend seen thus far, then we know something weird is going on with it.
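In standalone Python (rather than the Haskell actually used), the strategy might be sketched like this, where collatz_steps and the size schedule are illustrative:

```python
import random

def collatz_steps(n):
    """Number of Collatz iterations needed to reach 1."""
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

# test random starting numbers of gradually increasing size and record the
# iteration counts; a number far off the trend would merit a closer look
for bits in (8, 16, 32, 64, 128):
    n = random.getrandbits(bits) | (1 << (bits - 1)) | 1  # force size, make odd
    print(bits, collatz_steps(n))
```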

We tested random starting numbers as large as 10313058 bits, the last one taking 74494956 iterations over 12 hours of computing time (though it was not very optimized code).  Every number tested converged to 4-2-1.

Source code in Haskell and logs.

We wish we had SIMD accelerated (e.g., GPU) small*large arbitrary precision multiplication (previously mentioned) to compute 3*x for large x.  x/2 could also be accelerated with SIMD.  x+1 will only astronomically rarely overflow the least significant limb.

Previous similar attempt, which was much slower because then large integers were represented as lists of Bool.

by Ken at January 30, 2017 12:18 PM

January 29, 2017

Brent Yorgey

Advent of code #16 solution: an algebra of bitstrings

I had fun this past December solving Advent of Code problems in Haskell. I was particularly proud of my solution to one particular problem involving generating and processing large bitstrings, which I’d like to share here. I think it really shows off the power of an algebraic, DSL-based approach to problem solving.

This post is literate Haskell—download it and play along!

> {-# LANGUAGE GADTs #-}
> import Control.Arrow   ((***))
> import Data.Bits       (xor)
> import Data.List       (unfoldr)
> import Data.List.Split (chunksOf)
> import Data.Maybe      (fromJust)

The problem

You can go read the problem description if you like, but it’s rather verbose—I’ll try to give a more concise description here, illustrated with Haskell code.

The problem is concerned with strings of bits:

> type BitString = [Bool]

We’ll start just by defining a few utility functions to view and input bitstrings conveniently.

> readbits :: String -> BitString
> readbits = map (=='1')
> showbits :: BitString -> String
> showbits = map (\b -> if b then '1' else '0')
> withbits :: (BitString -> BitString) -> String -> String
> withbits f = showbits . f . readbits

Now on to the problem proper. There is a central operation—which I’ll call the “dragon transform”—which makes a longer bitstring from a shorter one. Given a bitstring s, append a 0 to the end, and then append a reversed and inverted version of s (where “invert” means to flip all the bits). Like so:

> invert :: BitString -> BitString
> invert = map not
> dragon :: BitString -> BitString
> dragon s = s ++ [False] ++ invert (reverse s)

For example,

ghci> withbits dragon "1"
"100"

ghci> withbits dragon "1101111"
"110111100000100"
(Incidentally, this operation is called dragon since it is related to the classic dragon curve. Hint: interpret 0 as a left turn and 1 as a right turn.)

Given a starting bitstring, and a target length, we are supposed to iterate dragon until we have at least the number of target bits, and then truncate the string to the desired length:

> fill :: Int -> BitString -> BitString
> fill len = take len . head . dropWhile ((< len) . length) . iterate dragon

For example, if we start with 1, we have to iterate dragon three times to end up with at least ten bits.

ghci> map showbits . take 4 $ iterate dragon [True]
["1","100","1000110","100011001001110"]

ghci> withbits (fill 10) "1"
"1000110010"

Finally, after extending an initial bitstring to a given length, we perform a checksum operation:

  • If there are an odd number of bits, we are done.
  • Otherwise, take the bits two at a time and compute the negation of their exclusive or: that is, 1 if the bits are the same and 0 if they are different (otherwise known as (==)). This results in a bitstring half as long. Now repeat the process, continuing to halve the length until we have an odd number of bits remaining.

In code:

> checksum :: BitString -> BitString
> checksum a
>   | odd (length a) = a
>   | otherwise = checksum . map xnor . chunksOf 2 $ a
>   where
>     xnor [x,y] = x == y

The first task

So, we now have a simple reference implementation that directly follows the specification. We can use this to solve the first task, which just asks to start with a given short bitstring, extend it to length 272, and then compute the checksum. I think different logged-in users get different starting strings, but mine was 01000100010010111:

> input = "01000100010010111"
ghci> withbits (checksum . fill 272) input

Notice that 272 = 17 \cdot 2^4, so after expanding to that length and then repeatedly halving the length, we end up with a checksum of length 17.

The second task

That was easy. Bring on the second task! Well… of course, it is much bigger. It asks to use the same starting bitstring, but this time extend it to length 35651584 = 17 \cdot 2^{21} before computing the checksum (which will again end up having length 17). Using this naive, unoptimized implementation completely blows up: it turns out that generating a list of 35 million booleans is really not a good idea. Using actual lists with a cons cell for each bit incurs a whole lot of memory and allocation overhead; it just made my computer grind to a halt.

As you may realize, there is a lot of low-hanging fruit here: for example, we can use an unboxed Vector instead of a list, or even do some deforestation to avoid allocation (the former code is by Eric Mertens aka glguy, the latter by Daniel Wagner aka dmwit). Using techniques like that, it’s possible to get the runtime and memory requirements down to something reasonable. But that’s not what I want to talk about. Though more efficient, those solutions are still actually computing every single bit. It seemed to me we shouldn’t have to do that: the computation has a lot of nice structure, and seemingly a lot of opportunity for sharing intermediate results. I went off in search of a way to compute the correct checksum without actually generating the entire intermediate bitstring.

Interlude: xnor

The first order of business was to work out an algebraic understanding of the xnor operation, which I will denote \overline{x \oplus y} (the circled plus operator \oplus denotes xor, and the overbar denotes logical negation). One fundamental fact is that

\displaystyle \overline{x \oplus y} = \overline{x} \oplus y = x \oplus \overline{y}

(checking whether x and y are equal is the same as first negating one and then checking whether they are unequal). From this, and the fact that \oplus is associative, we can prove associativity of xnor:

\displaystyle \overline{\overline{x \oplus y} \oplus z} = (\overline{x} \oplus y) \oplus \overline{z} = \overline{x} \oplus (y \oplus \overline{z}) = \overline{x \oplus \overline{y \oplus z}}

Associativity, along with the fact that 1 is an identity for the operation, means it forms a monoid. When we repeatedly take the xnor of adjacent bits, we are therefore basically doing an mconcat using a strictly balanced combining scheme. But associativity means we can be freer about the order in which we do the combining. If we start with a bitstring of length n \cdot 2^k, the checksumming operation iterates k times, and each consecutive sequence of 2^k bits gets folded down into a single bit via mconcat. In other words, the checksum operation can be reimplemented like this:

> checksum2 :: BitString -> BitString
> checksum2 a = map combine . chunksOf (powTwo (length a)) $ a
>   where
>     combine = foldr (==) True
>     -- Find the biggest power of two that divides n
>     powTwo n
>       | odd n     = 1
>       | otherwise = 2 * powTwo (n `div` 2)

Let’s check that this works:

ghci> withbits (checksum2 . fill 272) input

ghci> let bits = fill 272 (readbits input) in checksum bits == checksum2 bits
True

Now, this isn’t really any faster yet; but this idea will be important later!

There’s one more thing we can observe about xnor: if we fold an odd number of bits with xnor, it’s the same as taking the xor of all the bits; if we fold an even number of bits, it’s the same as taking the xor of all the bits and then negating the result. That is,

\displaystyle \begin{array}{rcl} \overline{x_1 \oplus x_2} &=& \overline{x_1 \oplus x_2} \\[0.5em] \overline{x_1 \oplus \overline{x_2 \oplus x_3}} &=& x_1 \oplus x_2 \oplus x_3 \\[0.5em] \overline{x_1 \oplus \overline{x_2 \oplus \overline{x_3 \oplus x_4}}} &=& \overline{x_1 \oplus x_2 \oplus x_3 \oplus x_4} \\[0.5em] \overline{x_1 \oplus \overline{x_2 \oplus \overline{x_3 \oplus \overline{x_4 \oplus x_5}}}} &=& x_1 \oplus x_2 \oplus x_3 \oplus x_4 \oplus x_5 \end{array}

and so on. The proof is a simple induction argument, making use of the relation \overline{x \oplus y} = \overline{x} \oplus y we noted before. So when folding xnor, as a simple optimization, we can avoid doing a lot of negations by just computing the xor and then negating appropriately based on the parity of the number of bits.
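This parity rule is easy to machine-check; in this standalone Python sketch, fold_xnor mirrors the right-nested folds displayed above:

```python
from functools import reduce
from itertools import product

def xnor(x, y):
    # 1 if the bits are equal, 0 otherwise
    return x == y

def fold_xnor(bits):
    # right-nested fold: xnor(b1, xnor(b2, ... xnor(b_{n-1}, b_n)))
    return reduce(lambda acc, b: xnor(b, acc), reversed(bits[:-1]), bits[-1])

def xor_all(bits):
    return reduce(lambda a, b: a != b, bits)

# an odd number of bits folds to the plain xor; an even number to its negation
for n in range(2, 7):
    for bs in product([False, True], repeat=n):
        bs = list(bs)
        expected = xor_all(bs) if n % 2 else not xor_all(bs)
        assert fold_xnor(bs) == expected
```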

The algebra of bitstrings

With that under our belts, we can move on to the real meat of the solution. The central idea is that instead of representing bitstrings directly as lists (or vectors, or whatever) of bits, we represent them using a deep embedding of a little bitstring algebra (aka DSL). That is, we represent each bitstring operation as a constructor of an algebraic data type, which allows us to directly manipulate bitstring expressions. The point is that this algebra/DSL has a lot of nice structure that allows us to work at an abstract, algebraic level instead of working directly with bits.

There’s one more twist to note before actually seeing the data type definition. We know that we will need to talk about the length of bitstrings as well as their xnor/xor. Instead of having to recalculate these every time we need them, we can cache them at each node of a bitstring expression. We’ll see how these cached values come in handy later.

> data BitExpr where

So, what does our algebra of bitstrings need? First, it’s useful to have an explicit representation of the empty bitstring, as well as a singleton bit. We don’t need to cache length or xor values here, since they are obvious and can be computed in constant time.

>   Emp :: BitExpr
>   Bit :: Bool -> BitExpr

Next, we need to be able to append bitstrings. Notice the Bool, which represents the cached xor of the entire bitstring, as well as the Integer which represents the length.

>   App :: !Bool -> !Integer -> BitExpr -> BitExpr -> BitExpr

Finally, we need three unary operations on bitstrings: invert, reverse, and dragon. Each also carries a cached length and xor.

>   Inv :: !Bool -> !Integer -> BitExpr -> BitExpr
>   Rev :: !Bool -> !Integer -> BitExpr -> BitExpr
>   Drg :: !Bool -> !Integer -> BitExpr -> BitExpr
>   deriving Show

Note that Drg is redundant in some sense, since the dragon transform can be encoded in terms of append, inverse, and reverse. However, it’s critical that we include it explicitly: since the dragon transform uses the input bitstring twice, expanding an iterated application of Drg in terms of the other constructors would result in an exponential blowup in the size of the expression.

To be concrete, let’s write a straightforward interpreter which formally connects a bitstring expression with its intended semantics as a bitstring. This comes in handy for testing, but other than testing, the whole point is that we will not use this—we want to solve the problem at the level of bitstring expressions, without ever actually generating their corresponding bitstrings.

> toBits :: BitExpr -> BitString
> toBits Emp = []
> toBits (Bit b) = [b]
> toBits (App _ _ s1 s2) = toBits s1 ++ toBits s2
> toBits (Inv _ _ s) = invert  (toBits s)
> toBits (Rev _ _ s) = reverse (toBits s)
> toBits (Drg _ _ s) = dragon  (toBits s)

Next, let’s write some simple utility functions to extract the cached length or xor from the root of a bitstring expression:

> bsLen :: BitExpr -> Integer
> bsLen Emp           = 0
> bsLen (Bit _)       = 1
> bsLen (App _ l _ _) = l
> bsLen (Inv _ l _)   = l
> bsLen (Rev _ l _)   = l
> bsLen (Drg _ l _)   = l
> bsXor :: BitExpr -> Bool
> bsXor Emp           = False
> bsXor (Bit b)       = b
> bsXor (App b _ _ _) = b
> bsXor (Inv b _ _)   = b
> bsXor (Rev b _ _)   = b
> bsXor (Drg b _ _)   = b

Next, we’ll write some smart constructors which automatically take care of properly computing the cached length and xor.

> bit :: Bool -> BitExpr
> bit = Bit

Appending combines xor values with xor and adds lengths. app also does a bit of optimization when appending with the empty bitstring. For convenience, we can also use app to create a function bits to convert a literal bitstring into a BitExpr.

> app :: BitExpr -> BitExpr -> BitExpr
> app s1 Emp = s1
> app s1 s2 = App (bsXor s1 `xor` bsXor s2) (bsLen s1 + bsLen s2) s1 s2
> bits :: String -> BitExpr
> bits = foldr (app . bit . (=='1')) Emp

Inverting a bitstring preserves the xor when it has even length, and inverts the xor when it has odd length. Note how we make use of both the cached xor and length values to compute the new cached xor.

> inv :: BitExpr -> BitExpr
> inv s = Inv (if even (bsLen s) then bsXor s else not (bsXor s))
>             (bsLen s)
>             s

Reversing preserves xor and length.

> rev :: BitExpr -> BitExpr
> rev s = Rev (bsXor s) (bsLen s) s

Finally, the dragon operation: the xor of dragon s is the xor of s combined with the xor of inv s; the length is one more than twice the length of s.

> drg :: BitExpr -> BitExpr
> drg s = Drg (bsXor s `xor` bsXor (inv s)) (2*(bsLen s) + 1) s

We can test these:

ghci> let t = drg (bits "11" `app` inv (bits "10000"))
ghci> showbits . toBits $ t
"110111100000100"

ghci> bsLen t
15

Remember that our high-level goal is to take the expanded version of our bitstring, split it into blocks of length 2^k, and then separately reduce each block with xnor. It turns out that we have enough information to split a bitstring expression into two bitstring expressions which correspond to splitting off a block of a given size from the beginning of the corresponding bitstring. That is, we will write a function splitBits :: Integer -> BitExpr -> (BitExpr, BitExpr) which works like splitAt, but on bitstring expressions instead of bitstrings. In other words, it will satisfy the property

splitAt n . toBits == (toBits *** toBits) . splitBits n

We’ll go through the implementation case by case. You might like to try implementing splitBits yourself before peeking at mine; it makes for a nice exercise.

> splitBits :: Integer -> BitExpr -> (BitExpr, BitExpr)

In the base cases, to split zero bits off the front of a bitstring, or if we are asked to split off more bits than there are, just generate the empty bitstring expression.

> splitBits 0 s                = (Emp, s)
> splitBits n s | n >= bsLen s = (s, Emp)

To split an App node, compare the number of bits we want to split off with the length of the first bitstring, and recursively split in either the left or right side appropriately, remembering to subtract the length of the first bitstring from the number of bits to split if we recurse on the right side.

> splitBits n (App _ _ s1 s2)
>   | n < bsLen s1
>     = let (s1a, s1b) = splitBits n s1 in (s1a, s1b `app` s2)
>   | otherwise
>     = let (s2a, s2b) = splitBits (n - bsLen s1) s2 in (s1 `app` s2a, s2b)

Inverting commutes with splitting, so to split an Inv node, we can just split recursively and then rewrap the results with inv.

> splitBits n (Inv _ _ s) = (inv *** inv) $ splitBits n s

To split Rev and Drg nodes, we expand the expressions a bit to get rid of the top-level constructor before re-calling splitBits.

> splitBits n (Rev _ _ s) = splitBits n (pushRev s)
> splitBits n (Drg _ _ s) = splitBits n (expandDragon s)

In the case of Rev, we can “push the reverse through” one level, transforming it into an equivalent expression which no longer has a Rev node at the top. We make use of some nice algebraic properties governing the interaction of reverse with the other operations:

  • Reversing an empty or singleton bitstring does nothing.
  • reverse (s1 ++ s2) == reverse s2 ++ reverse s1
  • reverse . invert = invert . reverse
  • reverse . reverse = id
  • Finally, reverse . dragon = dragon . invert, which can be easily proved by expanding dragon in terms of the other operations and then applying the above algebraic laws.

Using these properties, we can implement pushRev as follows:

> pushRev :: BitExpr -> BitExpr
> pushRev Emp     = Emp
> pushRev (Bit b) = Bit b
> pushRev (App _ _ s1 s2) = rev s2 `app` rev s1
> pushRev (Inv _ _ s) = inv (rev s)
> pushRev (Rev _ _ s) = s
> pushRev (Drg _ _ s) = drg (inv s)

Finally, expandDragon just expands a dragon operation in terms of the other operations. Although this approximately doubles the size of the bitstring expression, we only do this lazily, when we are actually trying to split the result of a dragon transform. It’s only natural that splitting an expression results in somewhat larger expressions.

> expandDragon :: BitExpr -> BitExpr
> expandDragon s = s `app` (bit False `app` inv (rev s))

Filling and checksumming

We’re almost there! We can now implement the fill and checksum operations at the level of bitstring expressions.

fill is straightforward: keep applying the drg smart constructor until the cached length is sufficient, then use splitBits to create an expression corresponding to only the first n bits.

> fillE :: Integer -> String -> BitExpr
> fillE n str = fst . splitBits n $ go (bits str)
>   where
>     go s | bsLen s >= n = s
>          | otherwise    = go (drg s)

Finally, we can implement checksumE using the same pattern as checksum2, where we break up the string into chunks of size 2^k and then reduce each chunk. The only difference is that now we use splitBits to split, and the cached xor to compute the reduction. We know each of the blocks has an even length, so the xnor is just the negation of the cached xor.

> checksumE :: BitExpr -> BitString
> checksumE s = map (not . bsXor) . unfoldr doSplit $ s
>   where
>     doSplit Emp = Nothing
>     doSplit s   = Just (splitBits blockLen s)
>     blockLen = powTwo (bsLen s)
>     powTwo n
>       | odd n     = 1
>       | otherwise = 2 * powTwo (n `div` 2)

Let’s check that we get the same answer for the first task:

ghci> showbits $ checksumE (fillE 272 input)

ghci> withbits (checksum . fill 272) input

Great! And now for the second task:

ghci> showbits $ checksumE (fillE (17 * 2^21) input)

On my machine this finishes pretty much instantaneously, taking only 0.02 seconds. In order to generate enough bits, the dragon transform must be applied 21 times, but that just generates a small expression with 21 Drg constructors. Splitting into chunks of length 2^21 certainly expands the size of the expressions a bit, but everything stays nice and logarithmic since many of the Drg constructors can remain unexpanded.

In fact, this can easily handle MUCH larger problem instances. For example:

ghci> showbits $ checksumE (fillE (17 * 2^80) input)

ghci> showbits $ checksumE (fillE (17 * 2^81) input)

Semantically, this corresponds to generating yottabytes worth of bits (I had to look up the proper prefix) and then checksumming them; operationally, though, these are still basically instantaneous. (Interestingly, I also tried 17 * 2^200, and it instantaneously printed the first 11 bits of the answer and then segfaulted. Perhaps I have found a bug in GHC 8.0.2.)

Notice that the checksum for 17 * 2^81 is actually the same as that for 17 * 2^21. After playing around with it a bit, the checksums for 17 * 2^k seem to have a period of 12, but I’m not sure how to prove it!

by Brent at January 29, 2017 08:57 PM

Gabriel Gonzalez

Typed Nix programming using Dhall


I recently released a typed and total configuration language named Dhall that you can use to configure Haskell programs. However, Dhall would be more useful if you could configure other programming languages, like Nix.

Nix powers the Nix package manager and the Nix operating system and if you've never used Nix before then please give Nix a try. We use Nix heavily at Awake Networks for managing our deployments but one of our biggest complaints about Nix is the type system:

  • Nix expressions cannot be annotated with types to guide the user
  • You can't use sum types to make invalid states unrepresentable
  • Type errors are not very helpful and poorly located
  • Many Nix builtins and idioms are inherently difficult to statically analyze

To mitigate this problem I contributed a new Dhall-to-Nix compiler that lets you carve out a restricted subset of Nix with a more user-friendly type system.

This post covers:

  • How to use this in your own Nix projects
  • Details of the translation process from Dhall to Nix
  • Benefits and drawbacks of using Dhall to program Nix

User interface

I began by creating a Dhall to Nix compiler that can translate arbitrary Dhall expressions to equivalent Nix expressions. This compiler is not limited to simple values like records or literals: you can even compile Dhall functions to Nix functions.

The compiler takes a Dhall expression on standard input and emits the corresponding Nix expression on standard output:

$ dhall-nix <<< "{ foo = 1, bar = True }"
{ bar = true; foo = 1; }
$ dhall-nix <<< "λ(x : Bool) → x == False"
x: x == false

However, you do not need to install or manually run the compiler to use Dhall within your Nix project. I went a step further and added a dhallToNix utility to nixpkgs which automatically converts Dhall expressions to Nix expressions for you. This utility automatically bootstraps and caches the dhall-to-nix compiler as part of the evaluation process.

Here's an example of how you can use the dhallToNix function to embed arbitrary Dhall expressions inside of Nix:

let
  pkgs = import <nixpkgs> { };

  inherit (pkgs) dhallToNix;

in
  { example0 = dhallToNix "λ(x : Bool) → x == False" false;
    example1 = (dhallToNix "{ foo = 1, bar = True }").foo;
    example2 = dhallToNix "List/reverse Integer" [1 2 3];
  }

... and that will evaluate to:

{ example0 = true;
  example1 = 1;
  example2 = [3 2 1];
}
You can find a larger test suite here exercising all the features of the Dhall language by compiling them to Nix.


The Dhall language is heavily inspired by Nix, so many Dhall language features translate mechanically to Nix language features. For example:

  • Dhall's records translate to Nix's records
  • Primitive Dhall values (like integers) translate to primitive Nix values
  • Dhall's anonymous functions translate to Nix's anonymous functions

... and so on

However, Dhall does some things differently from Nix, and these differences fell into four categories:

  • Dhall supports anonymous sum types, while Nix does not
  • Dhall provides a different set of built-in functions from Nix
  • Dhall is explicitly typed and uses type abstraction and type application
  • Dhall supports floating point numbers, while Nix does not

Compiling sum types was pretty simple: just Church-encode them. For example, a sum type like this:

< Left = 2 | Right : Bool >

... translates into the following Nix expression:

{ Left, Right }: Left 2

In other words, you can encode a sum type as a "pre-formed pattern match", which you consume in Nix by providing one function per alternative to handle:

dhallToNix "< Left = 2 | Right : Bool >" {
  Left  = n : n == 0;  # Handler for the `Left` case
  Right = b : b;       # Handler for the `Right` case
}
The dhallToNix invocation evaluates to:

({ Left, Right }: Left 2) {
  Left  = n : n == 0;
  Right = b : b;
}
... which in turn reduces to:

(n : n == 0) 2

... which in turn reduces to:

false
Built-in functions which were missing in Nix required the most effort. I had to translate them to efficient implementations based on other Nix-builtins. For example, Dhall's List/reverse primitive uses Nix's builtins.genList and builtins.elemAt under the hood:

$ dhall-to-nix <<< "List/reverse"
t: xs: let
  n = builtins.length xs;
in builtins.genList (i:
  builtins.elemAt xs (n - i - 1)) n

The most complex builtin to translate was Dhall's (∧) operation for merging records, which is similar to Nix's (//) operator except that (∧) also merges children recursively:

$ dhall-to-nix <<< "λ(x : { foo : Bool }) → λ(y : { bar : Text }) → x ∧ y"
x: y: let
  combine = kvsL: kvsR: let
    ks = builtins.attrNames kvsL ++ builtins.attrNames kvsR;
    toKeyVals = k:
      if builtins.hasAttr k kvsL
      then if builtins.hasAttr k kvsR
           then if builtins.isAttrs (builtins.getAttr k kvsL) && builtins.isAttrs (builtins.getAttr k kvsR)
                then [ { name = k;
                         value = combine (builtins.getAttr k kvsL) (builtins.getAttr k kvsR);
                       } ]
                else [ { name = k;
                         value = builtins.getAttr k kvsL;
                       } ]
           else [ { name = k;
                    value = builtins.getAttr k kvsL;
                  } ]
      else if builtins.hasAttr k kvsR
           then [ { name = k;
                    value = builtins.getAttr k kvsR;
                  } ]
           else [];
  in builtins.listToAttrs (builtins.concatLists (map toKeyVals ks));
in combine x y

The last tricky part was translating Dhall's explicit type abstraction and type application. "Explicit type abstraction" means that polymorphic (or "generic") functions in Dhall are encoded as ordinary functions that take a type as a function argument. For example, this is how you encode the polymorphic identity function in Dhall:

λ(a : Type) → λ(x : a) → x

"Explicit type application" means that you specialize polymorphic functions by applying them to a type argument specifying which type to use. For example, this is how you use the polymorphic identity function:

$ echo "λ(a : Type) → λ(x : a) → x" > id
$ dhall <<< "./id Integer 4"
Integer

4

$ dhall <<< "./id Integer"  # Just specialize the function
∀(x : Integer) → Integer

λ(x : Integer) → x

dhall-to-nix translates polymorphic functions to functions that just ignore their type argument. For example, the polymorphic identity function becomes:

$ dhall-to-nix <<< "./id"
a : x : x

The first argument named a is the type and the corresponding Nix function still expects the argument but never uses it.

For type application, dhall-to-nix translates all types to an empty Nix record:

$ dhall-to-nix <<< "Integer"
{}
... which is then ignored by any polymorphic function:

$ dhall-to-nix <<< "./id"
a : x : x
$ dhall-to-nix <<< "./id Integer"
x : x
$ dhall-to-nix <<< "./id Integer 4"
4
Some Dhall built-in functions are also polymorphic, and we treat them the same way. For example, the List/reverse function is polymorphic, which is why the first argument in the corresponding Nix expression is an unused type argument named t:

$ dhall-to-nix <<< "List/reverse"
t: xs: let
  n = builtins.length xs;
in builtins.genList (i:
  builtins.elemAt xs (n - i - 1)) n
$ dhall-to-nix <<< "List/reverse Integer"
(t: xs: let
  n = builtins.length xs;
in builtins.genList (i:
  builtins.elemAt xs (n - i - 1)) n) {}
$ dhall-to-nix <<< "List/reverse Integer ([1, 2, 3] : List Integer)"
[3 2 1]

Finally, floating point numbers are not supported in Nix at all, so the dhall-to-nix compiler must reject Double values:

$ dhall-to-nix <<< "1.0"

Error: No doubles

Explanation: Dhall values of type ❰Double❱ cannot be converted to Nix
expressions because the Nix language provides no way to represent floating point
values

You provided the following value:

↳ 1.0

... which cannot be translated to Nix


When first learning Nix, particularly NixOS, you'll frequently run into the issue where you're not sure what values you're expected to provide due to the lack of a type system. Dhall can fix that in several ways:

  • You can request the inferred types of functions so that you know what type of function argument you need to supply
  • You can also provide users with a "schema" for an expected value, which in Dhall is just an ordinary type annotation pointing to a remote path
  • You can replace weakly typed values (like strings) with more strongly typed representations

The following example illustrates all of the above points.

For example, a derivation in Nix can be minimally represented in Dhall as the following type:

{ name : Text, builder : Text, system : Text }

... which you can save to a file named derivation and use to check if other expressions match the expected type:

$ echo "{ name : Text, builder : Text, system : Text }" > derivation
$ dhall <<EOF
{ name = "empty"

-- Dhall supports Nix-style multi-line string literals, too
, builder = ''
touch $out
''

, system = "x86_64-linux"
} : ./derivation
EOF
{ builder : Text, name : Text, system : Text }

{ builder = "\ntouch $out\n\n", name = "empty", system = "x86_64-linux" }

If we mistype a field name the type annotation will flag the error:

$ dhall <<EOF
{ name = "empty", builder = "touch $out", sytem = "x86_64-linux" }
: ./derivation
EOF
Use "dhall --explain" for detailed errors

Error: Expression doesn't match annotation

... and we can ask for more detailed error messages:

$ dhall --explain <<EOF
{ name = "empty", builder = "touch $out", sytem = "x86_64-linux" }
: ./derivation
EOF

Error: Expression doesn't match annotation

Explanation: You can annotate an expression with its type or kind using the
❰:❱ symbol, like this:

│ x : t │ ❰x❱ is an expression and ❰t❱ is the annotated type or kind of ❰x❱

The type checker verifies that the expression's type or kind matches the
provided annotation

For example, all of the following are valid annotations that the type checker
will accept:

│ 1 : Integer │ ❰1❱ is an expression that has type ❰Integer❱, so the type
└─────────────┘ checker accepts the annotation

│ Natural/even +2 : Bool │ ❰Natural/even +2❱ has type ❰Bool❱, so the type
└────────────────────────┘ checker accepts the annotation

│ List : Type → Type │ ❰List❱ is an expression that has kind ❰Type → Type❱,
└────────────────────┘ so the type checker accepts the annotation

│ List Text : Type │ ❰List Text❱ is an expression that has kind ❰Type❱, so
└──────────────────┘ the type checker accepts the annotation

However, the following annotations are not valid and the type checker will
reject them:

│ 1 : Text │ The type checker rejects this because ❰1❱ does not have type
└──────────┘ ❰Text❱

│ List : Type │ ❰List❱ does not have kind ❰Type❱
└─────────────┘

You or the interpreter annotated this expression:

↳ { builder = "touch $out", name = "empty", sytem = "x86_64-linux" }

... with this type or kind:

↳ { builder : Text, name : Text, system : Text }

... but the inferred type or kind of the expression is actually:

↳ { builder : Text, name : Text, sytem : Text }

Some common reasons why you might get this error:

● The Haskell Dhall interpreter implicitly inserts a top-level annotation
matching the expected type

For example, if you run the following Haskell code:

│ >>> input auto "1" :: IO Text

... then the interpreter will actually type check the following annotated
expression:

│ 1 : Text │

... and then type-checking will fail


We can also take advantage of the fact that Dhall supports sum types so that we can make invalid states unrepresentable. For example, the system field really shouldn't be Text because not all strings are valid systems.

We can fix this by creating a type that more accurately represents all supported platforms. First, we need a sum type for all supported operating systems:

$ dhall > OperatingSystem <<EOF
< cygwin : {}
| darwin : {}
| linux : {}
| netbsd : {}
| openbsd : {}
| solaris : {}
>
EOF

... along with helper utilities to build each operating system:

$ dhall > linux <<EOF
< linux = {=}
| cygwin : {}
| darwin : {}
| netbsd : {}
| openbsd : {}
| solaris : {}
>
EOF

Then we can create a sum type for each supported architecture:

$ dhall > Architecture <<EOF
< aarch64 : {}
| armv5tel : {}
| armv6l : {}
| armv7l : {}
| i686 : {}
| mips64el : {}
| x86_64 : {}
>
EOF

... as well as helper constructors for each architecture type:

$ dhall > x86_64 <<EOF
< x86_64 = {=}
| aarch64 : {}
| armv5tel : {}
| armv6l : {}
| armv7l : {}
| i686 : {}
| mips64el : {}
>
EOF

... and then we can create a type for supported Nix platforms:

$ dhall > Platform <<EOF
{ operatingSystem : ./OperatingSystem
, architecture : ./Architecture
}
EOF

... and helper utilities for each platform:

$ dhall > x86_64-linux <<EOF
{ architecture = ./x86_64
, operatingSystem = ./linux
}
EOF

... and verify that they type-check against our Platform type:

$ dhall <<< "./x86_64-linux : ./Platform"
{ architecture : < aarch64 : {} | armv5tel : {} | armv6l : {} | armv7l : {} | i686 : {} | mips64el : {} | x86_64 : {} >, operatingSystem : < cygwin : {} | darwin : {} | linux : {} | netbsd : {} | openbsd : {} | solaris : {} > }

{ architecture = < x86_64 = {=} | aarch64 : {} | armv5tel : {} | armv6l : {} | armv7l : {} | i686 : {} | mips64el : {} >, operatingSystem = < linux = {=} | cygwin : {} | darwin : {} | netbsd : {} | openbsd : {} | solaris : {} > }

Then we can always add a Nix translation layer that converts the strongly typed Dhall version to the weakly typed Nix string:

# platform.nix

let
  pkgs = import <nixpkgs> { };

  inherit (pkgs) dhallToNix;

  architectureToText = architecture:
    architecture {
      aarch64  = _ : "aarch64" ;
      armv5tel = _ : "armv5tel";
      armv6l   = _ : "armv6l"  ;
      armv7l   = _ : "armv7l"  ;
      i686     = _ : "i686"    ;
      mips64el = _ : "mips64el";
      x86_64   = _ : "x86_64"  ;
    };

  operatingSystemToText = operatingSystem:
    operatingSystem {
      cygwin  = _ : "cygwin" ;
      darwin  = _ : "darwin" ;
      linux   = _ : "linux"  ;
      netbsd  = _ : "netbsd" ;
      openbsd = _ : "openbsd";
      solaris = _ : "solaris";
    };

  platformToText = { architecture, operatingSystem }:
    let
      arch = architectureToText architecture;
      os   = operatingSystemToText operatingSystem;
    in
      "${arch}-${os}";
in
  platformToText (dhallToNix "${./x86_64-linux} : ${./Platform}")

... which would type-check our ./x86_64-linux file against the ./Platform type and return the following result:

"x86_64-linux"
We can even go a step further and implement the intermediate functions in Dhall, too:

let
  pkgs = import <nixpkgs> { };

  inherit (pkgs) dhallToNix;

  platformToText = dhallToNix ''
        let architectureToText
              = λ(x : ${./Architecture})
              → merge
                { aarch64  = λ(_ : {}) → "aarch64"
                , armv5tel = λ(_ : {}) → "armv5tel"
                , armv6l   = λ(_ : {}) → "armv6l"
                , armv7l   = λ(_ : {}) → "armv7l"
                , i686     = λ(_ : {}) → "i686"
                , mips64el = λ(_ : {}) → "mips64el"
                , x86_64   = λ(_ : {}) → "x86_64"
                }
                x : Text

    in  let operatingSystemToText
              = λ(x : ${./OperatingSystem})
              → merge
                { cygwin  = λ(_ : {}) → "cygwin"
                , darwin  = λ(_ : {}) → "darwin"
                , linux   = λ(_ : {}) → "linux"
                , netbsd  = λ(_ : {}) → "netbsd"
                , openbsd = λ(_ : {}) → "openbsd"
                , solaris = λ(_ : {}) → "solaris"
                }
                x : Text

    in  let platformToText
              = λ(x : ${./Platform})
              →   architectureToText x.architecture
               ++ "-"
               ++ operatingSystemToText x.operatingSystem

    in  platformToText
  '';
in
  platformToText (dhallToNix "${./x86_64-linux} : ${./Platform}")

However, in practice you'd like to keep the platform expression as the original strongly typed record instead of converting the platform to a string. The original record lets you more easily extract the architecture and operating system fields and make decisions on their values using exhaustive pattern matching.


The largest drawback of using Dhall to program Nix is that Dhall cannot encode many common idioms used in nixpkgs. Some examples of idioms that do not translate well to Dhall are:

  • The callPackage idiom that nixpkgs uses very heavily for easily updating dependencies. This relies on reflection and recursive fixpoints, neither of which Dhall supports
  • Anything which uses builtins.listToAttrs or does reflection on record field names

I don't expect Dhall to be used at all in nixpkgs, but I do believe Dhall can benefit end users or companies for their own internal Nix projects.


The dhallToNix utility is available right now in the nixpkgs-unstable channel if you would like to try this out in your own project:

Also, if you would like to use the dhall-to-nix compiler for other purposes you can find the compiler on Hackage or Github:

If you're new to Dhall you can learn more about the configuration language by reading the Dhall tutorial.

by Gabriel Gonzalez ( at January 29, 2017 01:55 AM

January 27, 2017

Dominic Steinitz

Warming up for NUTS (No U-Turn)

I have been thinking about writing a blog post on why the no-U-turn sampler (NUTS) works, rather than describing the actual algorithm. This led me to look at Jared Tobin’s Haskell implementation. His example tries to explore the Himmelblau function but only finds one local minimum. This is not unexpected; as the excellent Stan manual notes:

Being able to carry out such invariant inferences in practice is an altogether different matter. It is almost always intractable to find even a single posterior mode, much less balance the exploration of the neighborhoods of multiple local maxima according to the probability masses.


For HMC and NUTS, the problem is that the sampler gets stuck in one of the two "bowls" around the modes and cannot gather enough energy from random momentum assignment to move from one mode to another.

rm(list = ls(all.names=TRUE))

## [1] "2.12.0"
rstan_options(auto_write = TRUE)

On the Rosenbrock function it fares much better.


plot of chunk unnamed-chunk-2

We can’t (at least I don’t know how to) try Stan out on Rosenbrock as it’s not a distribution, but we can try it out on another nasty problem: the funnel. Some of this is taken directly from the Stan manual.

Here’s the Stan:

parameters {
  real y;
  vector[9] x;
}
model {
  y ~ normal(0,3);
  x ~ normal(0,exp(y/2));
}
which we can run with the following R:

funnel_fit <- stan(file='funnel.stan', cores=4, iter=10000)
## Warning: There were 92 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## Warning: Examine the pairs() plot to diagnose sampling problems
funnel_samples <- extract(funnel_fit,permuted=TRUE,inc_warmup=FALSE);
funnel_df <- data.frame(x1=funnel_samples$x[,1],y=funnel_samples$y[])

Plotting the data requires some unpleasantness but shows the neck of the funnel does not get explored. So even HMC and NUTS do not perform well.

midpoints <- function(x, dp=2){
    lower <- as.numeric(gsub(",.*","",gsub("\\(|\\[|\\)|\\]","", x)))
    upper <- as.numeric(gsub(".*,","",gsub("\\(|\\[|\\)|\\]","", x)))
    return(round(lower+(upper-lower)/2, dp))
}
df <- funnel_df[funnel_df$x1 < 20 & funnel_df$x1 > -20 & funnel_df$y < 9 & funnel_df$y > -9,]
x_c <- cut(df$x1, 20)
y_c <- cut(df$y, 20)
z <- table(x_c, y_c)
z_df <- as.data.frame(z)
a_df <- data.frame(x=midpoints(z_df$x_c),y=midpoints(z_df$y_c),f=z_df$Freq)

m = as.matrix(dcast(a_df,x ~ y))
## Using f as value column: use value.var to override.
hist3D(x=m[,"x"],y=as.double(colnames(m)[2:21]),z=(as.matrix(dcast(a_df,x ~ y)))[,2:21], border="black",ticktype = "detailed",theta=5,phi=20)
## Using f as value column: use value.var to override.

Since the analytic form of the distribution is known, one can apply a trick to correct this problem and then one is sampling from unit normals!

parameters {
  real y_raw;
  vector[9] x_raw;
}
transformed parameters {
  real y;
  vector[9] x;

  y <- 3.0 * y_raw;
  x <- exp(y/2) * x_raw;
}
model {
  y_raw ~ normal(0,1);
  x_raw ~ normal(0,1);
}

And now the neck of the funnel is explored.

funnel_fit <- stan(file='funnel_reparam.stan', cores=4, iter=10000)
funnel_samples <- extract(funnel_fit,permuted=TRUE,inc_warmup=FALSE);
funnel_df <- data.frame(x1=funnel_samples$x[,1],y=funnel_samples$y[])

df <- funnel_df[funnel_df$x1 < 20 & funnel_df$x1 > -20 & funnel_df$y < 9 & funnel_df$y > -9,]
x_c <- cut(df$x1, 20)
y_c <- cut(df$y, 20)
z <- table(x_c, y_c)
z_df <- as.data.frame(z)
a_df <- data.frame(x=midpoints(z_df$x_c),y=midpoints(z_df$y_c),f=z_df$Freq)

m = as.matrix(dcast(a_df,x ~ y))
## Using f as value column: use value.var to override.
hist3D(x=m[,"x"],y=as.double(colnames(m)[2:21]),z=(as.matrix(dcast(a_df,x ~ y)))[,2:21], border="black",ticktype = "detailed",theta=5,phi=20)
## Using f as value column: use value.var to override.

We’d expect the Haskell implementation to also fail to explore the neck. Maybe I will return to this after the article on why NUTS works.

by Dominic Steinitz at January 27, 2017 05:02 PM

January 25, 2017

Roman Cheplyaka

How much space does an 8-bit integer occupy in C and Haskell?

How much space does an unsigned 8-bit integer occupy in C and Haskell?

Neither the C99 standard nor the Haskell2010 standard specifies such low-level details, so the answer could in theory be anything. To have something to work with, let’s make the following assumptions:

  • architecture: x86-64
  • ABI/calling conventions: System V
  • C compiler: GCC 6.3
  • Haskell compiler: GHC 8.0


In C, the unsigned 8-bit integer type is called uint8_t. It is defined in the header stdint.h. Its width is guaranteed to be exactly 8 bits; thus, its size is 1 byte.

But how much space does it really occupy? That depends on two factors

  • whether it is a function argument or return value, or a (local or global) variable
  • whether it is part of an array or struct

Function arguments and return values

According to the AMD64 System V ABI, the first 6 integer arguments are passed via registers and the rest are passed on the stack. If a function returns a single integer value, it is passed back in a register. Since the integer registers are 64-bit wide, when a uint8_t value is passed in a register, it effectively occupies 8 bytes.

To illustrate, consider this function:

uint8_t plus(uint8_t a, uint8_t b) {
  return a+b;
}
GCC generates the following code:

lea    (%rsi,%rdi,1),%eax

The two arguments are passed in the 64-bit registers %rsi and %rdi. Although the result is written to the 32-bit register %eax, it is part of the 64-bit register %rax, and the other 32 bits of that register cannot easily be reused while %eax is occupied.

What about the arguments passed through the stack? The ABI dictates that their sizes, too, are rounded up to 8 bytes. This preserves stack alignment without complicating the calling conventions.


uint8_t plus(uint8_t a, uint8_t b, uint8_t c,
             uint8_t d, uint8_t e, uint8_t f,
             uint8_t g) {
  return a+g;
}
translates into

mov    %edi,%eax
add    0x8(%rsp),%al

We see that the g argument is 8 bytes below the stack boundary, (%rsp). These whole 8 bytes are dedicated to our tiny int.

When uint8_t’s are part of a struct or similar, they occupy one byte each. Curiously, if the struct is 16 bytes or smaller, the uint8_t’s will be packed into registers!

struct twobytes {
  uint8_t a;
  uint8_t b;
};

uint8_t plus(struct twobytes p) {
  return p.a+p.b;
}
compiles into

mov    %edi,%eax
movzbl %ah,%eax
add    %edi,%eax

Both bytes are passed inside %edi, and the intermediate 1-byte %ah register is used to take them apart.

Local and global variables

Like function arguments, local variables can reside in registers or on the stack. But unlike function arguments, local variables are not constrained by calling conventions; the compiler can do whatever it wants.

When an 8-bit local variable is stored in a register, it effectively occupies the whole 64-bit register, as there is only one 8-bit “subregister” per general-purpose register (unlike in x86).

What happens to the local uint8_t variables stored on the stack? We can compile this test program to find out:

uint8_t plus(uint8_t a, uint8_t b) {
  volatile uint8_t c = a+b;
  return c;
}

add    %edi,%esi
mov    %sil,-0x1(%rsp)
movzbl -0x1(%rsp),%eax

The volatile keyword is needed to force the compiler to store the local variable c on the stack rather than in a register. As we see, c is stored at -0x1(%rsp), so 1 byte is enough here. This is because there is no alignment requirement for 8-bit integers. The same is true for global variables.


In Haskell, the unsigned 8-bit integer type is called Word8. Its canonical module according to the standard is Data.Word, but in GHC, it is originally defined in GHC.Word and then re-exported from Data.Word.

Word8 is a boxed type. The space occupied by every boxed type in Haskell consists of two parts: the header and the payload. Here is a helpful picture from the GHC wiki:

<figure> </figure>

Note that stuff on the bottom of the picture — the info table and the entry code — is read-only static data shared among all instances of the given type and even across multiple copies of the same program, so we don’t count it towards the space occupied by a value.

The header is a structure that (on x86-64) normally consists of 8 bytes — a pointer to the entry code for the object.

The value of our byte is stored in the payload. But how exactly? Let’s look at the definition of Word8 in GHC.Word:

-- Word8 is represented in the same way as Word. Operations may assume
-- and must ensure that it holds only values from its logical range.

data Word8 = W8# Word#
-- ^ 8-bit unsigned integer type

Word# is an unboxed machine-word-sized unsigned integer, i.e. a 64-bit integer for x86-64.

In total, a Word8 lying around occupies 16 bytes. When computing with Word8’s inside some kind of inner loop, they will normally be unboxed into Word#’s and passed around in 8-byte registers or in 8-byte cells on the (Haskell) stack — more or less like in C.

Thus, during computation, Haskell is not that different from C. But what about storage? Can multiple Word8’s be packed together densely?


Say, we need a structure, TwoBytes, consisting of two Word8’s. We intend to use it as a key and/or element type in a large dictionary, so we’d like to keep it as compact as possible. (Note that Data.Map already adds a 48-byte overhead per key/value.)

If we declare TwoBytes in the most naive way

data TwoBytes = TwoBytes Word8 Word8

the structure will occupy 56 bytes! TwoBytes would consist of a header (8 bytes) and a payload consisting of two pointers (8 bytes each), each pointing to a Word8 (16 bytes each).

A more efficient way to declare TwoBytes is

data TwoBytes = TwoBytes {-# UNPACK #-} !Word8
                         {-# UNPACK #-} !Word8

This makes the fields strict and unpacked, so that the two bytes are stored directly in TwoBytes’s payload. This occupies 24 bytes — “only” 12 bytes per Word8. Compared to a single Word8, we see some economy, but it only amortizes the header. No matter how many Word8’s we put together, the size won’t get below 8 bytes per Word8.

To pack bytes together, we can use an unboxed vector:

data TwoBytes = TwoBytes {-# UNPACK #-} !(Vector Word8)

To see how much memory this structure occupies, we need to see the definition of Vector and the underlying ByteArray:

-- | Unboxed vectors of primitive types
data Vector a = Vector {-# UNPACK #-} !Int
                       {-# UNPACK #-} !Int
                       {-# UNPACK #-} !ByteArray -- ^ offset, length, underlying byte array
data ByteArray = ByteArray ByteArray#

The runtime representation of ByteArray# is a pointer to the StgArrBytes structure defined in includes/rts/storage/Closures.h:

typedef struct {
    StgHeader  header;
    StgWord    bytes;
    StgWord    payload[FLEXIBLE_ARRAY];
} StgArrBytes;

The space required for a ByteArray# is 8 bytes for the header, 8 bytes for the length, and the payload, rounded up to whole words (see stg_newByteArrayzh in rts/PrimOps.cmm) — so 8 bytes in our case, 24 in total.

The size of Vector, therefore, is 8 bytes for the header, 16 bytes for the offset and length (needed to provide O(1) slicing for vectors), 8 bytes for the pointer to the ByteArray#, and 24 bytes for ByteArray# itself; total of 56 bytes.

This is the opposite of the previous definition in that the representation is asymptotically efficient, requiring 1 byte per Word8, but the upfront cost makes it absolutely impractical for TwoBytes.

Even if we cut out the middleman and used ByteArray directly:

data TwoBytes = TwoBytes {-# UNPACK #-} !ByteArray

… it would only get us to 40 bytes.

The most frugal approach for the case of two bytes is to define

data TwoBytes = TwoBytes {-# UNPACK #-} !Word

(16 bytes) and do packing/unpacking by hand. This is a rare case where a Haskell programmer needs to write code that a C compiler would generate (recall two bytes packed into %edi) and not the other way around.
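A minimal sketch of that hand-packing (the function names are mine, not from any library; the two bytes live in the low 16 bits of the Word):

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8)

data TwoBytes = TwoBytes {-# UNPACK #-} !Word
  deriving (Eq, Show)

-- Pack the two bytes into the low 16 bits of the Word.
packBytes :: Word8 -> Word8 -> TwoBytes
packBytes hi lo = TwoBytes (fromIntegral hi `shiftL` 8 .|. fromIntegral lo)

-- Recover the two bytes from the packed Word.
unpackBytes :: TwoBytes -> (Word8, Word8)
unpackBytes (TwoBytes w) =
  (fromIntegral (w `shiftR` 8), fromIntegral (w .&. 0xff))
```

The fromIntegral calls are free at runtime here: widening Word8 to Word and truncating back compile down to plain register moves and masks.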

If GHC provided a Word8# unboxed type, we could use the earlier defined

data TwoBytes = TwoBytes {-# UNPACK #-} !Word8
                         {-# UNPACK #-} !Word8

which would still occupy 16 bytes but be more convenient to work with than a single Word. But that would require a major change to the compiler, and it’s probably not worth the hassle.


In both C and Haskell, a byte-sized integer occupies 8 bytes when it is actively worked upon (i.e. kept in a register) and 1 byte when many of them are stored in an array/vector.

However, when storing single Word8’s or small structures like TwoBytes, Haskell is not as memory-efficient. Primarily this is because idiomatic Haskell relies heavily on pointers and everything is word-aligned.

January 25, 2017 08:00 PM

January 24, 2017

FP Complete

QuickCheck and Magic of Testing

Haskell is an amazing language. With its extremely powerful type system and a pure functional paradigm, it prevents programmers from introducing many kinds of bugs that are notorious in other languages. Despite those powers, code is still written by humans, and bugs are inevitable, so writing a quality test suite is just as important as writing the application itself.

Over the course of history, buggy software has cost the industry billions of dollars in damage and even human lives, so I cannot stress enough how essential testing is for any project.

One way to test software is to write unit tests, but since it is not feasible to test all possible inputs exhaustively for most functions, we usually check some corner cases and occasionally test with other arbitrary values. Systematic generation of random input that is biased towards corner cases can be very helpful in that scenario, and that's where QuickCheck comes into play. This state-of-the-art property testing library was originally invented in Haskell, and, because it turned out to be so powerful, it was later ported to other languages. However, the real power of random testing is unleashed when it is combined with the purity of Haskell.


Let's start by looking at these exemplary properties of the reverse function:

reverse (reverse xs) == xs

reverse (xs ++ ys) == reverse ys ++ reverse xs

We know that they will hold for all finite lists with total values. Naturally, there are ways to prove them manually, and there are even tools for Haskell, such as LiquidHaskell, that can help you automate proving some properties. A formal proof of correctness of a program is not always possible: some properties are either too hard or impossible to prove. Regardless of our ability to prove a property of a function, we at least need to check that it works correctly on some finite set of inputs.

import Test.QuickCheck

prop_RevRev :: Eq a => [a] -> Bool
prop_RevRev xs = reverse (reverse xs) == xs

prop_RevApp :: [Int] -> [Int] -> Bool
prop_RevApp xs ys = reverse (xs ++ ys) == reverse ys ++ reverse xs

We can load those properties into GHCi and run quickCheck on them. Here is a quick way to do it from a terminal, and there is a detailed guide on how to get started with stack.

$ stack --resolver lts-7.16 ghci --package QuickCheck
Configuring GHCi with the following packages: 
GHCi, version 8.0.1:  :? for help
Loaded GHCi configuration from /tmp/ghci3260/ghci-script
Prelude> :load examples.hs 
[1 of 1] Compiling Main             ( examples.hs, interpreted )
Ok, modules loaded: Main.
*Main> quickCheck prop_RevRev
+++ OK, passed 100 tests.
*Main> quickCheck prop_RevApp
+++ OK, passed 100 tests.

What just happened? QuickCheck called prop_RevRev and prop_RevApp 100 times each with random lists as arguments, and declared those tests as passing because all calls resulted in True. That is far beyond what a common unit test could have done.

It is worth noting that, in reality, not just prop_RevRev but both of those properties are polymorphic, and quickCheck is happy to work with such functions even if their type signatures were inferred; it will run just fine in GHCi. On the other hand, while writing a test suite, we have to restrict the type signature of every property to a concrete type, such as [Int] or Char, otherwise the type checker will get confused. For example, this program will not compile:

import Test.QuickCheck

main :: IO ()
main = quickCheck (const True)

For the sake of example, let's write a couple more self-explanatory properties:

prop_PrefixSuffix :: [Int] -> Int -> Bool
prop_PrefixSuffix xs n = isPrefixOf prefix xs &&
                         isSuffixOf (reverse prefix) (reverse xs)
  where prefix = take n xs

prop_Sqrt :: Double -> Bool
prop_Sqrt x
  | x < 0            = isNaN sqrtX
  | x == 0 || x == 1 = sqrtX == x
  | x < 1            = sqrtX > x
  | x > 1            = sqrtX > 0 && sqrtX < x
  where sqrtX = sqrt x

Now, this is great, but how did we just pass various functions with different numbers of arguments of different types to quickCheck, and how did it know what to do with them? Let's look at its type signature:

λ> :t quickCheck
quickCheck :: Testable prop => prop -> IO ()


So it seems that QuickCheck can test anything that is Testable:

λ> :i Testable
class Testable prop where
  property :: prop -> Property
  exhaustive :: prop -> Bool
instance [safe] Testable Property
instance [safe] Testable prop => Testable (Gen prop)
instance [safe] Testable Discard
instance [safe] Testable Bool
instance [safe] (Arbitrary a, Show a, Testable prop) => Testable (a -> prop)

The last instance is for a function (a -> prop) that returns a prop, which, in turn, must also be an instance of Testable. This magic trick of a recursive constraint in the instance definition allows quickCheck to test a function with any number of arguments, as long as each one of them is an instance of Arbitrary and Show. So here is a checklist of requirements for writing a testable property:

  • Zero or more arguments, each of which has an instance of Arbitrary, which is used for generating random input. More on that later.
  • Arguments must also be instances of Show, so that if a test fails, the offending value can be displayed back to the programmer.
  • Return value is either:

    • True/False - to indicate pass/fail of a test case.
    • Discard - to skip the test case (e.g. a precondition fails).
    • Result - to customize pass/fail/discard test result behavior, collect extra information about the test outcome, provide callbacks and other advanced features.
    • Property for a much finer control of test logic. Such properties can be used as combinators to construct more complex test cases.
    • Prop, used to implement Property.
  • The name starts with prop_ or prop, followed by the usual camelCase, but that is just a convention, not a requirement.
  • It has no side effects. Also not a requirement, but strongly suggested, since referential transparency is lost with IO and test results can be inconsistent between runs. At the same time, there are capabilities for testing monadic code, which we will not go into here.
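To see why the recursive (a -> prop) instance lets functions of any arity be tested, here is a toy analogue using only base (the class name and canned inputs are mine, purely illustrative; QuickCheck does the same thing with random inputs and a richer result type):

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- A Bool is checkable directly; a function is checkable by feeding it
-- a few canned Int inputs and recursively checking the results.
class Checkable a where
  check :: a -> Bool

instance Checkable Bool where
  check = id

instance Checkable prop => Checkable (Int -> prop) where
  check f = all (check . f) [0, 1, -1, 42, 1000]
```

Now check (\x y -> x + y == y + (x :: Int)) type-checks: the (Int -> prop) instance fires twice, once per argument, before bottoming out at the Bool instance. quickCheck resolves Testable in exactly this fashion.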


Here is another very simple property of lists, xs !! n == head (drop n xs), so let's define it as is:

prop_Index_v1 :: [Integer] -> Int -> Bool
prop_Index_v1 xs n = xs !! n == head (drop n xs)

Naturally, you can see a problem with that function: it cannot accept just any random Int to be used for indexing. quickCheck quickly finds that problem for us and prints out the violating input along with an error:

λ> quickCheck prop_Index_v1
*** Failed! Exception: 'Prelude.!!: index too large' (after 1 test): 

Interestingly, if you try to run this example on any computer, there is a very good chance that it will give exactly the same output, so it seems that input to properties is not completely random. In fact, thanks to the function sized, the first input to our property will always be an empty list and the integer 0, which tend to be really good corner cases to test for. In our case, though, !! and head are undefined for empty lists and negative numbers. We could add some guards, but there are facilities provided for such common cases:

prop_Index_v2 :: (NonEmptyList Integer) -> NonNegative Int -> Bool
prop_Index_v2 (NonEmpty xs) (NonNegative n) = xs !! n == head (drop n xs)

This version is still not quite right, since we have another precondition: n < length xs. However, it would be a bit complicated to describe this relation through the type system, so we will specify this precondition at runtime using the implication operator (==>). Note that the return type has changed too:

prop_Index_v3 :: (NonEmptyList Integer) -> NonNegative Int -> Property
prop_Index_v3 (NonEmpty xs) (NonNegative n) =
  n < length xs ==> xs !! n == head (drop n xs)

Test cases with values that do not satisfy the precondition will simply get discarded, but not to worry: it will still run the 100 tests. In fact, it will generate up to 1000 candidates before giving up. An alternative way to achieve a similar effect is to generate a valid index within the property itself:

prop_Index_v4 :: (NonEmptyList Integer) -> Property
prop_Index_v4 (NonEmpty xs) =
  forAll (choose (0, length xs-1)) $ \n -> xs !! n == head (drop n xs)
λ> quickCheck prop_Index_v3 >> quickCheck prop_Index_v4
+++ OK, passed 100 tests.
+++ OK, passed 100 tests.

Just in case, let's quickly dissect this forAll business. It takes a random value generator, which choose happens to produce, and a property that operates on the generator's values, and returns a Property; i.e. it applies values from a specific generator to the supplied property.

λ> :t forAll
forAll :: (Show a, Testable prop) => Gen a -> (a -> prop) -> Property
λ> sample' $ choose (0, 3)

There is a very subtle difference between the last two versions, namely that _v3 will discard tests that do not satisfy the precondition, while _v4 will always generate a value for n that is safe to pass to the index function. This is not important for this example, which is good, but that is not always the case. Whenever a precondition is too strict, QuickCheck might give up early while looking for valid values for a test, but more importantly, it can give a false sense of validity, since most of the values that it does find could be trivial ones.


For this section we will use prime numbers in our examples, but rather than reinventing the wheel and writing functions for prime numbers ourselves, we will use the primes package. Just for fun, let's write a property for primeFactors, which is based on the Fundamental Theorem of Arithmetic:

prop_PrimeFactors :: (Positive Int) -> Bool
prop_PrimeFactors (Positive n) = isPrime n || all isPrime (primeFactors n)
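In case you have not used the primes package: it exports primes, isPrime and primeFactors. As a rough mental model only (the package's real algorithms are far more efficient than this), trial-division versions of the two functions could look like:

```haskell
-- Illustrative trial-division stand-ins for the primes package's
-- isPrime and primeFactors; only here to make the semantics concrete.
isPrime :: Int -> Bool
isPrime n = n > 1 && all (\d -> n `mod` d /= 0) [2 .. isqrt n]
  where
    isqrt :: Int -> Int
    isqrt = floor . sqrt . (fromIntegral :: Int -> Double)

-- Prime factorization in non-decreasing order, e.g. 12 -> [2,2,3].
primeFactors :: Int -> [Int]
primeFactors = go 2
  where
    go d n
      | n < 2          = []
      | d * d > n      = [n]
      | n `mod` d == 0 = d : go d (n `div` d)
      | otherwise      = go (d + 1) n
```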

That was incredibly easy and is almost a direct translation of the theorem itself. Now consider the fact that every prime number larger than 2 is odd; from it we can easily derive the property that the sum of any two prime numbers greater than 2 is even. Here is a naive way to test that property:

prop_PrimeSum_v1 :: Int -> Int -> Property
prop_PrimeSum_v1 p q =
  p > 2 && q > 2 && isPrime p && isPrime q ==> even (p + q)

As you can imagine, it is not too often that a random number will be prime, and this certainly affects the quality of the test:

λ> quickCheck prop_PrimeSum_v1
*** Gave up! Passed only 26 tests.

It only found 26 satisfiable tests out of the 1000 generated; that's bad. There is even more to it: in order to convince ourselves that we are testing functions with data that resembles what we expect in real life, we should always try to inspect the values being generated for a property. An easy way to do that is to classify them by some shared trait:

prop_PrimeSum_v1' :: Int -> Int -> Property
prop_PrimeSum_v1' p q =
  p > 2 && q > 2 && isPrime p && isPrime q ==>
  classify (p < 20 && q < 20) "trivial" $ even (p + q)
λ> quickCheck prop_PrimeSum_v1'
*** Gave up! Passed only 29 tests (96% trivial).
λ> quickCheckWith stdArgs { maxSuccess = 500 } prop_PrimeSum_v1'
*** Gave up! Passed only 94 tests (44% trivial).

Almost all the values this property was tested on are in fact trivial ones. Increasing the number of tests was not much help because, by default, the values generated for integers are pretty small. We could try to fix that with appropriate types, but this time we will also generate a histogram of unique pairs of discovered prime numbers:

prop_PrimeSum_v2 :: (Positive (Large Int)) -> (Positive (Large Int)) -> Property
prop_PrimeSum_v2 (Positive (Large p)) (Positive (Large q)) =
  p > 2 && q > 2 && isPrime p && isPrime q ==>
  collect (if p < q then (p, q) else (q, p)) $ even (p + q)
λ> quickCheck prop_PrimeSum_v2
*** Gave up! Passed only 24 tests:
16% (3,3)
 8% (11,41)
 4% (9413,24019)
 4% (93479,129917)

This is better: there are fewer trivial values, but the number of tests is still far from satisfactory. It is also extremely inefficient to look for prime values that way, and, for any really large value passed to the property, it will take forever to check its primality. A much better approach is to choose from the list of prime values that we have readily available:

prop_PrimeSum_v3 :: Property
prop_PrimeSum_v3 =
  forAll (choose (1, 1000)) $ \ i ->
    forAll (choose (1, 1000)) $ \ j ->
      let (p, q) = (primes !! i, primes !! j) in
      collect (if p < q then (p, q) else (q, p)) $ even (p + q)
λ> quickCheck prop_PrimeSum_v3
+++ OK, passed 100 tests:
 1% (983,6473)
 1% (953,5059)
 1% (911,5471)


There could be a scenario where we need prime values for many tests; then it would be a burden to generate them this way for each property. In such cases the solution is to write an instance of Arbitrary:

newtype Prime a = Prime a deriving Show

instance (Integral a, Arbitrary a) => Arbitrary (Prime a) where
  arbitrary = do
    x <- frequency [ (10, choose (0, 1000))
                   , (5, choose (1001, 10000))
                   , (1, choose (10001, 50000)) ]
    return $ Prime (primes !! x)

Calculating large prime numbers is pretty expensive, so we could simply use something like choose (0, 1000), similarly to how it was done in prop_PrimeSum_v3, but there is no reason to exclude generating large prime numbers completely; instead, we can reduce their likelihood by describing a custom distribution with the frequency function.

Now writing prop_PrimeSum is a piece of cake:

prop_PrimeSum_v4 :: Prime Int -> Prime Int -> Property
prop_PrimeSum_v4 (Prime p) (Prime q) =
  p > 2 && q > 2 ==> classify (p < 1000 || q < 1000) "has small prime" $ even (p + q)
λ> quickCheck prop_PrimeSum_v4
+++ OK, passed 100 tests (21% has small prime).


There are quite a few instances of Arbitrary (many common data types from base have one), but the most peculiar one is for a function:

λ> :i Arbitrary
class Arbitrary a where
  arbitrary :: Gen a
  shrink :: a -> [a]
instance [safe] (CoArbitrary a, Arbitrary b) => Arbitrary (a -> b)

That's right, QuickCheck can even generate functions for us! One restriction is that the argument to the function must be an instance of CoArbitrary, which also has an instance for functions; consequently, functions of any arity can be generated. Another caveat is that we need an instance of Show for functions, which is not standard practice in Haskell, so wrapping a function in a newtype would be more appropriate. For clarity we will opt out of this suggestion and instead demonstrate this cool feature in action. One huge benefit is that it allows us to easily write properties for higher-order functions:

instance Show (Int -> Char) where
  show _ = "Function: (Int -> Char)"

instance Show (Char -> Maybe Double) where
  show _ = "Function: (Char -> Maybe Double)"

prop_MapMap :: (Int -> Char) -> (Char -> Maybe Double) -> [Int] -> Bool
prop_MapMap f g xs = map g (map f xs) == map (g . f) xs


One of the first concerns that programmers usually raise when coming to Haskell from other languages is that there are situations where unit tests are invaluable, but QuickCheck does not provide an easy way to write them. Bear in mind that QuickCheck's random testing is not a limitation, but rather a priceless feature of the testing paradigm in Haskell. Regular-style unit tests and other QA functionality (code coverage, continuous integration, etc.) can be done just as easily as in any other modern language using specialized libraries. In fact, those libraries play beautifully together and complement each other in many ways.

Here is an example of how we can use hspec to create a test suite containing all the properties we have discussed so far, plus a few extra unit tests to complete the picture.

#!/usr/bin/env stack
-- stack --resolver lts-7.16 runghc --package QuickCheck --package hspec --package primes
module Main where
import Test.Hspec
import Test.QuickCheck
import Data.List (isPrefixOf, isSuffixOf)
import Data.Numbers.Primes (isPrime, primeFactors, primes)

-- (the property definitions from the previous sections go here)


main :: IO ()
main = hspec $ do
  describe "Reverse Properties" $
    do it "prop_RevRev" $ property prop_RevRev
       it "prop_RevApp" $ property prop_RevApp
       it "prop_PrefixSuffix" $ property prop_PrefixSuffix
  describe "Number Properties" $
    do it "prop_Sqrt" $ property prop_Sqrt
  describe "Index Properties" $
    do it "prop_Index_v3" $ property prop_Index_v3
       it "prop_Index_v4" $ property prop_Index_v4
       it "unit_negativeIndex" $ shouldThrow (return $! ([1,2,3] !! (-1))) anyException
       it "unit_emptyIndex" $ shouldThrow (return $! ([] !! 0)) anyException
       it "unit_properIndex" $ shouldBe (([1,2,3] !! 1)) 2
  describe "Prime Numbers" $
    do it "prop_PrimeFactors" $ property prop_PrimeFactors
       it "prop_PrimeSum_v3" $ property prop_PrimeSum_v3
       it "prop_PrimeSum_v4" $ property prop_PrimeSum_v4
  describe "High Order" $
    do it "prop_MapMap" $ property prop_MapMap


Random testing can be mistakenly regarded as an inferior way of testing software, but many studies have shown that this is not the case. To quote D. Hamlet:

By taking 20% more points in a random test, any advantage a partition test might have had is wiped out.

It is very easy to start using QuickCheck to test properties of pure functions. There is also a very similar toolbox included in the library for testing monadic functions, allowing for a straightforward way of testing properties of functions that do mutation, depend on state, run concurrently, or even perform I/O. Most importantly, this library provides yet another technique for making Haskell programs even safer.

Writing tests doesn't have to be a chore; it can be fun. We certainly find it fun at FP Complete and will be happy to provide training, consulting, or development work.

Further reading

January 24, 2017 02:44 PM

January 23, 2017

Michael Snoyman

Stackage design choices: making Haskell curated package sets

This post is going to talk about some of the design choices made over the years around the Stackage project, a curated package set for Haskell. While many of these points will be Haskell- and Stackage-specific, I think the ideas would translate well to other languages interested in creating curated package sets. This blog post was inspired by a short discussion on Twitter, which made it clear that I'd never really shared design thoughts on the Stackage project.


In understanding why Stackage is the way it is today, it will be important to take into account:

  • The goals of the project
  • The historical circumstances when decisions were made
  • Social pressures in the community agitating for specific decisions
  • Inertia in the project making significant changes difficult

Apologies in advance: this turned out longer than I'd intended.


Before Stackage, the most common way to find a set of libraries to use in a Haskell project was using cabal-install's dependency solver, based on bounds information specified by authors. There were certainly some efforts at creating curated package sets previously (Haskell Platform provided a limited set; the Yesod Platform provided a full set of packages for the Yesod Web Framework; various Linux distros had binary packages). But I think it's fair to say that the vast majority of people writing Haskell code were using dependency solving.

I'm not going to get into the argument of dependency solving vs curation here. I will simply say that for many people - myself included - having official combinations of packages which are known to compile together, and which can be given to end users and teammates on a project, was very appealing. This was the motivation for my initial Stackage call for participation.

While the primary goal - create curated package sets - is obvious, the secondary goals are not. In fact, many of them only really became clear to me in 20-20 hindsight:

  • Require as little maintenance as possible. Stackage should be as much an automated process as can be created, since human time is a valuable, scarce resource. In other words: I'm lazy :).

  • Require as little change in behavior from package authors as possible. In my opinion, the only reasonable way to bootstrap a project is to make it trivial for people to participate. The barrier to entry for Stackage had to be minimal.

    • Even past the "bootstrapping" phase, a nice quality of any system is requiring little effort on the part of users. Therefore, even today, where Stackage is arguably successful and well-established, this goal still applies.
  • It needed to work well with existing tooling. In 2012, the Stack project hadn't even been dreamt up yet, so figuring out a way to work with cabal-install (via the cabal.config file) was vital. Compatibility with cabal-install is still a nice thing today, but not nearly as vital as it was then.

  • We need to maximize the number of packages that can be included in a single snapshot. The two ways in which two packages can be incompatible are:

    • There is an actual incompatibility in the API, such as a function being removed or its type signature changed in a new release of a dependency.

    • There is a stated upper or lower bound in a package which precludes a build plan, but the code itself would actually compile. (This is the case --allow-newer is designed for.)
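As an aside, the cabal-install compatibility mentioned above works by publishing a freeze-style cabal.config that pins every package in the snapshot to an exact version. A hypothetical excerpt (package versions invented for illustration) might look like:

```
-- Hypothetical excerpt of a Stackage-generated cabal.config
constraints: aeson ==0.11.2.0,
             base ==4.9.0.0,
             bytestring ==0.10.8.1,
             text ==1.2.2.1
```

With this file in a project directory, cabal-install's solver has no freedom left: it can only pick the curated versions.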

Initial choices

Based on these goals, I created the initial version of Stackage. While many decisions came into play (e.g., what file format should we use to let package authors submit packages?), I'm going to focus on the interesting choices that fell out of the goals above, and which today may be noteworthy.

  • As I'd learnt from maintaining Yesod, many Windows users in particular were using the Haskell Platform (HP), and trying to specify different versions of packages from what HP provided could cause problems. Therefore, it was important to keep compatibility with the Haskell Platform set of packages. This resulted in multiple builds of Stackage: a "current GHC", "previous GHC", and "Haskell Platform superset."

  • We should always try to take the latest available version of a package, as it may include bug fixes, feature enhancements, and generally because the Haskell community loves the bleeding edge :). However, there would be cases where a new version of a package caused enough breakage to warrant holding it back, so some concept of enforced upper bounds was necessary too.

  • It was theoretically possible to ignore version bound information in cabal files, and instead ensure compatibility based on compiling and running test suites. However, this would have some serious downsides:

    • Users would have regularly needed to run builds with --allow-newer
    • If there were non-API-breaking semantic changes in a package, a version bound was present to avoid those changes, and there was no test suite to cover that behavior, ignoring bounds would cause those semantic changes to slip in (in my experience, this is an exceedingly rare case, but it can happen)
    • It's arguably very confusing behavior that a package set specifies versions of packages which claim to be incompatible with each other

    Therefore, version bounds needed to be respected. However...

  • Due to the frequency of overly restrictive version bounds and trivial compatibility patches which were slow to make it upstream, Stackage allowed for locally modified packages. That means that, for example, Stackage snapshot foo could have a different set of code associated with mtl-2.2.1 than what Hackage reports. Note that this feature was more aggressive than Hackage cabal file revisions, in that it allowed the code itself to change, not just the cabal file.

These decisions lasted for (IIRC) about a year, and were overall successful at letting Stackage become a thriving project. I was soon able to shut down the Yesod Platform initiative in favor of Stackage, which was a huge relief for me. At this point, outside of the Yesod community, I think Stackage was viewed mostly as a "ecosystem-wide CI system" than something for end users. It wasn't until Stack defaulted to Stackage snapshots that end users en masse started using Stackage.

Changes over time

Stackage today is quite a bit different from the above decisions:

  • I eventually dropped the Haskell Platform superset. There was a time when that package set wasn't updated, and the complication of trying to find a compatible set of packages on top of it was simply too high. In addition, HP included a version of aeson with a significant security hole (DoS attack with small inputs), and continuing to supply such a package set was not something I felt comfortable doing.

  • Due to the burden of maintaining bleeding-edge Stackages for multiple GHC versions - both on myself as the curator and on package authors - I also dropped support for older GHC releases. Instead, I introduced LTS Haskell, which keeps compatibility with older GHCs without adding (significant) package author burden.

  • When working on the GPS Haskell collaboration, I removed support for locally modified packages. This was done due to requests from the Hackage and Haskell Platform maintainers, who wanted a single definition of a package. With this change, unresponsive package maintainers can really hold things up in Stackage. However, this overall led to a number of simplifications in code, and ultimately allowed for better binary cache support in Stack. So despite the initial pain, I think this was a good change.

  • Hackage revisions make it possible for a package set to contain packages which are no longer compatible by their latest cabal files. Therefore, we needed to add support to Stackage to track which version of a cabal file was included in a snapshot, not just the version of the package itself. I only mention this here because it weakens our previous decision to respect cabal file constraints due to avoiding user confusion.

  • We have an expanded team! I'm happy to say that I am now one of five Stackage curators, and no longer have to either handle all the work myself, or make unilateral decisions. In other words, I get to share the blame with others :). Many thanks to Adam Bergmark, Dan Burton, Jens Petersen, and our newest member, Luke Murphy.

Changes to consider today

Alright, this post has turned out way longer than I'd expected, apologies for this. I guess there was more decision making that occurred than I'd realized. Anyway, I hope that gives some context for where things are at today. Which brings us to the original discussion that brought this whole blog post into existence: should we be changing anything about Stackage? Here are some changes either proposed by others or that I've thought of, and some remarks.

  • The curator team overall has been pretty lax about booting packages that block newer versions of dependencies. There have definitely been calls for us to be more proactive about that, and aggressively kick out packages that are holding back dependencies.

    • Pros: Stackage Nightly will live up to its bleeding edge mission statement more effectively, we'll overall have less incidental pain on package authors who are staying up to date with their dependencies.

    • Cons: it will decrease the number of packages in Stackage Nightly for end users, and adds extra burden on package authors to be more quick to respond to requests.

  • As a relaxed version of the above: be stricter with package authors, but only in the case of cabal file upper bounds. The argument here is stronger, since the work required is fairly minimal, and - at least in my experience - waiting for relaxed upper bounds is what takes up a lot of the time when curating. An extreme version of this is demanding that upper bounds just be removed.

  • Or an interesting alternative to that: should Stackage simply ignore constraints in cabal files entirely? It would be fairly easy to extend Stack to recognize a flag in snapshots to say "ignore the constraints when building," or even make that the default behavior.

    • Pros: less time spent on bounds issues, Stackage doesn't get held back by trivial version bounds issues, and for PVP bounds enthusiasts it could encourage people to add bounds during upload more often (not sure of that).

    • Cons: cabal users with Stackage snapshots wouldn't have as nice a time, it could be confusing for users, and if the upper bounds are in place due to semantic changes we won't catch it.

  • Since GPS Haskell isn't happening, we could add back the ability for the Stackage curator team to modify packages (both cabal files and source files). I think the pros and cons of this were pretty well established above, I'm not going to repeat it here.

  • People have asked for running multiple nightly lines with different GHC versions.

    • Pros: instead of having slightly outdated LTS versions for older GHCs, we'd have the bleeding edge all over again.

    • Cons: we'd need new naming schemes for snapshots, a lot more work for the curator team, and potentially a lot more work for package authors who would need to maintain further GHC compatibility with their most recent releases.

  • I've had some private discussions around this, and thought I should share the idea here. Right now, Stackage requires that any package added must be available on Hackage. A number of newer build systems have been going the route of allowing packages to be present only in a Git repository. Stack has built-in support for specifying such locations, but snapshots do not support it. Should we add support to Stackage to allow packages to be pulled from places besides Hackage?

    • Pros: knocks down another barrier to entry for publishing packages.

    • Cons: Stackage snapshots will not automatically work with cabal-install anymore, extra work to be done to make this functional, and some issues around determining who owns a package name need to be worked out.

There are likely other changes that I haven't mentioned; feel free to raise them in the comments below. Also, if anyone really wants to follow up on these topics, the best place to do that is the Stackage mailing list.

January 23, 2017 12:00 AM

January 21, 2017

JP Moresmau

So long Haskell, and thanks for all the functional fish

I've realized I haven't written or read a line of Haskell in the past 6 months. After roughly ten years of tinkering with it, it seems that I've given up on it. There is a big obvious reason, and other smaller ones.

Professionally, my job got a lot more challenging (I'm officially an **architect** now, which means I still write lots of code but I have to draw pretty diagrams too (-: ) and involves a lot of research and learning new stuff, things like microservices, Docker, messaging, Angular2, mobile apps, etc. A lot of my time is thus dedicated to work or to learning for work, and I don't have the time left to play around with something quite unrelated like Haskell.

I suppose to be honest there also was a bit of lassitude with Haskell. I got tired of the less than optimal IDEs, I realized Haskell was not going to get me a great job, and there were a few little skirmishes on the web that got ugly and made me see the community in a less favorable light.

This was fun, though, and I certainly learned a lot, and I hope it has made me a better programmer. A lot of my coding now is done in Java 8, and it's good to be able to apply some functional idioms practiced in Haskell; more general ideas like data immutability and small, pure functions do help make better - more testable, easier to understand - code.

So maybe I'll come back to Haskell some day, but not for now. To all the open source projects I've contributed to, I wish you the best!

Happy Haskell (or any other language) Hacking!

by JP Moresmau ( at January 21, 2017 11:05 AM

January 20, 2017

Joachim Breitner

Global almost-constants for Haskell

More than five years ago I blogged about the “configuration problem” and a proposed solution for Haskell, which turned into some Template Haskell hacks in the seal-module package.

With the new GHC proposal process in place, I am suddenly much more inclined to write up my weird wishes for the Haskell language in proposal form, to make them more coherent, get feedback, and maybe (maybe) actually get them implemented. But even if the proposal is rejected, it is still a nice forum to discuss these ideas.

So I turned my Template Haskell hack into a proposed new syntactic feature. The idea is shamelessly stolen from Isabelle, including some of the keywords, and would allow you to write

context fixes progName in
  foo :: Maybe Int -> Either String Int
  foo Nothing  = Left $ progName ++ ": no number given"
  foo (Just i) = bar i

  bar :: Int -> Either String Int
  bar 0 = Left $ progName ++ ": zero no good"
  bar n = Right $ n + 1

instead of

foo :: String -> Maybe Int -> Either String Int
foo progName Nothing  = Left $ progName ++ ": no number given"
foo progName (Just i) = bar progName  i

bar :: String -> Int -> Either String Int
bar progName 0 = Left $ progName ++ ": zero no good"
bar progName n = Right $ n + 1

when you want to have an “almost constant” parameter.

I am happy to get feedback at the GitHub pull request.

by Joachim Breitner ( at January 20, 2017 06:03 PM

Douglas M. Auclair (geophf)

December 2016 1HaskellADay 1Liners

  • December 22nd, 2016:  f :: (Either a b, c) -> Either (a, c) (b, c), define f, snaps for elegance, e.g.: f (Left 4, "Hi") = Left (4, "Hi")
    • bazzargh @bazzargh uncurry (flip (join bimap . (,) ))
      • Denis Stoyanov @xgrommx need (Left 4, "Hi") = Left (4, "Hi") but your version Left ("Hi", 4)
    • Thomas D @tthomasdd Do tuple sections count? do I have access to Data.Bifunctor?
      • f (eab,c) = bimap (,c) (,c) eab
    • SocialJusticeCleric @walkstherain uncurry $ either ((Left .).(,)) ((Right .).(,))
    • Denis Stoyanov @xgrommx or f (e, a) = (join bimap (\x -> (x, a))) e
    • Nickolay Kudasov @crazy_fizruk most elegant IMO:
      f (Left a, c) = Left (a, c)
      f (Right b, c) = Right (b, c)
  • December 22nd, 2016: define a function that writes out an infinite, alternating stream of 1's and 0's as below. 
    • Philipp Maier @AkiiZedd mapM putStrLn $ join $ repeat ["0","1"]
      • Eyal Lotem @EyalL join . repeat = cycle?
    • mavant @mavant f = putStr "10" >> f
    • Eyal Lotem @EyalL mapM putStrLn $ cycle ["0","1"]
  • December 10th, 2016:
    startsWith :: [String] -> String
    points-free so that:
    startsWith ["ΜΗΛΟΝ", "ΗΔΟΝΗ"] = "ΛΟ"
    That is: (length list)+1 Char of each word
    • SocialJusticeCleric @walkstherain 
      • I prefer `uncurry (!!) . (Data.List.transpose &&& length)`
      • but `map . flip (!!) . length =<< id` only uses the Prelude
    • Nick @crazy_fizruk zipWith (!!) <*> repeat . length
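As a quick sanity check of the last solution (a self-contained sketch; with the 0-based (!!), the "(length list)+1"-th character is index `length ws`):

```haskell
-- The points-free solution from the thread, spelled out: in the
-- function Applicative, `zipWith (!!) <*> repeat . length` means
-- \ws -> zipWith (!!) ws (repeat (length ws)).
startsWith :: [String] -> String
startsWith = zipWith (!!) <*> repeat . length
```

For a two-word list this picks the third character of each word, giving "ΛΟ" for the example above.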

by geophf ( at January 20, 2017 12:13 AM

January 19, 2017

Michael Snoyman

Follow up on mapM_

This is a short follow-up to my blog post about mapM_ and Maybe. Roman Cheplyaka started a discussion on that post, and ultimately we came up with the following implementation of mapM_ which works for all Foldables and avoids the non-tail-recursive case for Maybe as desired:

mapM_ :: (Applicative m, Foldable f) => (a -> m ()) -> f a -> m ()
mapM_ f a =
    go (toList a)
  where
    go [] = pure ()
    go [x] = f x -- here's the magic
    go (x:xs) = f x *> go xs

Why is this useful? If you implement mapM_ directly in terms of foldr or foldMap, there is no way to tell that you are currently looking at the last element in the structure, and therefore you will always end up with the equivalent of f x *> pure () in your expanded code. By contrast, with explicit pattern matching on the list-ified version, we can easily pattern match with go [x] and avoid the *> pure () bit, thereby making tail recursion possible.
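To make the contrast concrete, here is a self-contained sketch (the names are mine) putting a foldr-based definition next to the list-ified one:

```haskell
import Data.Foldable (toList)
import Prelude hiding (mapM_)

-- A foldr-based definition in the style of base (a simplified sketch):
-- there is no way to special-case the last element, so every step
-- expands to `f x *> rest`, with a trailing `pure ()` after the
-- final element.
mapM_foldr :: (Applicative m, Foldable f) => (a -> m ()) -> f a -> m ()
mapM_foldr f = foldr (\x rest -> f x *> rest) (pure ())

-- The list-ified version from above, which can see the last element
-- and so avoid the trailing `pure ()`:
mapM_ :: (Applicative m, Foldable f) => (a -> m ()) -> f a -> m ()
mapM_ f a = go (toList a)
  where
    go []     = pure ()
    go [x]    = f x
    go (x:xs) = f x *> go xs
```

Both compute the same results; they differ only in whether the expansion of the last element ends with `f x` or `f x *> pure ()`.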

Some interesting things to note:

  • Using () <$ f x instead of f x *> pure () or f x >> return () seemed to make no difference for tail recursion purposes.
  • As a result of that, we still need to have the ()-specialized type signature I describe in the previous blog post; there doesn't seem to be a way around that.
  • As you can see from the benchmark which I unceremoniously ripped off from Roman, there do not appear to be cases where this version has more memory residency than mapM_ from base. Roman had raised the concern that the intermediate list may involve extra allocations, though it appears that GHC is smart enough to avoid them.

Here are the results. Notice the significantly higher residency numbers for base:

   5000      roman          36,064 bytes
   5000    michael          36,064 bytes
   5000       base          36,064 bytes
  50000      roman          36,064 bytes
  50000    michael          36,064 bytes
  50000       base         133,200 bytes
 500000      roman          44,384 bytes
 500000    michael          44,384 bytes
 500000       base       2,354,216 bytes
5000000      roman          44,384 bytes
5000000    michael          44,384 bytes
5000000       base      38,235,176 bytes

My takeaway from all of this: it's probably too late to change the type signature of mapM_ and forM_ in base, but this alternative implementation is a good fit for mono-traversable. Perhaps there are some rewrite rules that could be applied in base to get the benefits of this implementation as well.

Completely tangential, but: as long as I'm linking to pull requests based on blog posts, I've put together a PR for classy-prelude and conduit-combinators that gets rid of generalized I/O operations, based on my readFile blog post.

January 19, 2017 12:00 AM

January 18, 2017

FP Complete

Speeding up a distributed computation in Haskell

While helping a client ship a medical device we were tasked with making its response time bearable. This was no easy feat, given that each request to this device requires running a simulation that takes hours if run on a single CPU. This long response time would make it impossible for doctors to use this device interactively, which in turn would make the device much less desirable -- think of a doctor having to wait hours between inputting the patient data and getting results, as opposed to getting results immediately as the data is available.

Luckily the simulations in question are embarrassingly parallel, and thus one obvious path to reduce the response time is to run it on multiple CPUs.

At the core of this device sits a Haskell program that performs the simulation. Thus the first step was to exploit Haskell's built-in multi-core parallelism to achieve the parallelization. However the results were unsatisfactory, since we were unable to scale decently beyond 7 to 10 CPUs. We therefore created a custom distribution algorithm where separate Haskell runtimes communicate over TCP sockets, similar to what happens in Erlang. This also allowed us to scale beyond a single machine. We've described this effort in the past; see the report Scaling Up a Scientific Computation and the talk Parallelizing and distributing scientific software in Haskell.

This first effort allowed us to run simulations in a much shorter time, but it still did not allow us to scale nicely to hundreds of CPUs. This article describes how we fixed that by bypassing one of the high level facilities that Haskell provides.

High level languages are all about offering such facilities, to be able to write correct programs quicker. Haskell offers a great number of abstractions to help in this regard, such as garbage collection and laziness, and GHC also is full of tools on top of the language itself to write an ever greater number of programs in a more comfortable way.

One of the features that makes GHC stand out is the sophistication of the runtime it provides. Apart from being an impressive piece of work even just for implementing Haskell efficiently, it also offers features that are very useful for the kind of systems programming that writing a distributed application requires. Specifically, green threads and the GHC event manager make writing a fast multi-threaded server much easier than in other languages. For example the first versions of Warp, Haskell's most popular web server, outperformed most web servers in just 500 lines of code, largely thanks to these facilities -- you can find more info about this effort in the report Warp: A Haskell Web Server. Warp has since grown in code size to add new features, but the core is still using the same facilities and performing well.

Since the core of the software that we built is a server coordinating the work of many slaves, for our first version we reached for these facilities to write it. The server was reasonably fast and served us for a while, but we hit a ceiling pretty quickly beyond which we were unable to scale.

However, a nice thing about GHC Haskell is that it's very easy to drop down to a lower-level programming style when needed. This can be accomplished through the excellent foreign function interface to C paired with the low-level utilities in base. By doing so we were able to scale to hundreds of cores and run simulations up to 5 times faster than the best time we achieved with the previous version.

The program

As mentioned, the server in question is the master process in a distributed computing application. The application is essentially a particle filter, distributed across many processes which might be on different machines. Since we want multi-machine distribution, we use TCP sockets to communicate between the processes doing the computation.

At the core of the program logic we have a function taking some State and some Input, and generating some new states and an output associated with each one:

type Evolve = State -> Input -> [(State, Output)]

Note that a single state and input pair generates multiple states and outputs. The multiple outputs are due to the fact that in a particle filter each state (or rather each "particle") can be sampled zero or more times. We need to run one such function on thousands of inputs:

-- Apply the `Evolve` to every given `State`, return
-- the new states and output.
evolveMany :: Evolve -> [State] -> [Input] -> [[(State, Output)]]
evolveMany f = zipWith f
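As a purely illustrative sketch -- the concrete types here are hypothetical, since the real State, Input, and Output are domain-specific -- a toy Evolve shows how a particle can die, survive, or be sampled more than once:

```haskell
-- Toy instantiation of the specification above, with made-up types.
type State  = Int
type Input  = Int
type Output = Int

type Evolve = State -> Input -> [(State, Output)]

evolveMany :: Evolve -> [State] -> [Input] -> [[(State, Output)]]
evolveMany f = zipWith f

-- A step in which small particles die and large ones are sampled
-- twice, illustrating why the number of states drifts over time.
step :: Evolve
step s i
  | s + i <= 0  = []                             -- particle dies
  | s + i >= 10 = [(s + i, i), (s + i + 1, i)]   -- sampled twice
  | otherwise   = [(s + i, i)]                   -- sampled once
```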

Given this initial specification, there are a couple of adjustments we need to make if we want to be able to distribute the computation. First, the function will have to live in IO, since communication will happen through Sockets. Second, we won't refer to the states directly, but rather refer to them using tokens provided by the system. At the beginning we'll provide the initial states and get back tokens in return, and at each call to evolveMany we'll get -- instead of new States -- new tokens.

We can do this because we do not care about the content of the states (while we do care about the outputs), and by referring to them with tokens rather than directly we avoid transferring them to other processes each time we need to operate on them, saving a lot of bandwidth and speeding up the computation greatly.

Thus, we'll also need to book-keep which slave processes are holding which state.

Finally, we'll need Sockets to communicate with the slave processes.

This gives us a new API:

-- We use `Map` and `Set` from `containers` for illustrative purposes, `HashMap`
-- from `unordered-containers` or a mutable hash table from `hashtables`
-- will most likely be more performant.
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Set (Set)
import qualified Data.Set as Set

-- Some token representing a `State` on some slave.
data StateId

-- Some token representing a slave.
data SlaveId

-- Reset the states in the system to the given ones, returns a
-- 'StateId' for each state.
resetStates ::
     Map SlaveId Socket -- Connection to the slaves
  -> [State]
  -> IO (Map SlaveId (Set StateId), [StateId])
  -- Returns how the states have been repartitioned on the slaves
  -- and a list to know which `StateId` corresponds to which `State`.

-- Evolves the states with the given inputs, returns the outputs and
-- the new 'StateId's resulting from the evolution. 
evolveMany ::
     Map SlaveId Socket -- Connections to the slaves
  -> Map SlaveId (Set StateId) -- Which states are on which slave
  -> Map StateId Input -- Inputs to each state
  -> IO (Map SlaveId (Set StateId), Map StateId [(StateId, Output)])
  -- Returns the new mapping from slaves to states, and
  -- the outputs.

When using this API, the usual pattern is to call resetStates at the beginning with the initial states, and then a series of evolveMany calls afterwards, each using the StateIds returned by the previous call -- by resetStates the first time, and by evolveMany afterwards.

The challenge is to implement evolveMany as efficiently as possible.

To give an idea of the time involved, we usually have around 2000 states, a few tens of calls to evolveMany, and each call to Evolve takes a few hundredths of a second to complete, giving a single-threaded running time of a few hours, e.g.

 2000 * -- Number of states
   80 * -- number of calls to evolveMany
0.03s = -- Evolve time 
1h 20m  -- Total running time

High level overview of the implementation

resetStates just assigns a unique StateId to each state, and then splits up and uploads the states evenly between the slaves.

All the complexity lies in evolveMany: the goal is to utilize the slaves as efficiently as possible.

We found pretty early that naively evolving the states present on each slave would not work, because:

  • Each call to Evolve results in many (possibly zero) child states (since the return type is a list), and we cannot predict how many we'll get in advance. This would cause different slaves to have a different number of states after a few calls to evolveMany, which in turn would cause the slaves not to be used efficiently, since some would end up being idle;
  • The runtime of an individual Evolve depends on the state and on the input, and we cannot predict it. This also can cause some slaves to finish earlier than others, causing inefficiencies.

More concretely, imagine a situation with 10 states, where 9 of the states take 1 second each while one odd state takes 10 seconds. If we have 2 slaves at our disposal, the most efficient distribution is to assign the slow state to one slave, and all the others to the other slave, with one slave taking 10 seconds and the other taking 9. If we just distribute the states evenly between the slaves, one slave will take 14 seconds and the other 5. Since the total runtime is constrained by the slowest slave, we must be careful to avoid such long tails.
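The arithmetic of this example can be checked with a toy helper (not part of the real system): the makespan of an assignment is the total runtime of its slowest slave.

```haskell
-- Each slave is a list of per-state runtimes; the overall runtime is
-- determined by the slave with the largest total.
makespan :: [[Double]] -> Double
makespan = maximum . map sum

-- Nine 1-second states and one 10-second state, split over two slaves:
evenSplit, skewedSplit :: [[Double]]
evenSplit   = [replicate 4 1 ++ [10], replicate 5 1]  -- 14s and 5s
skewedSplit = [[10], replicate 9 1]                   -- 10s and 9s
```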

So we switched to a simple but effective method to utilize the slaves efficiently. The master process keeps track of the states present on each slave, and asks the slaves to process them in batches, say of 5. When a slave finishes its batch, it sends the output back to the master and waits for further instructions. If the slave still has states to evolve, the master sends a request for a new batch to be evolved. If the slave does not have states to update, the master will search for a slave with states to spare, and request them. When a slave receives such a request it sends back the states to the master, which forwards them to the needy slave. When there are no more states to update, evolveMany is done.

The algorithm can be summed up as two state machines, one for the master and one for the slave:

-- This is what the master sends to the slave.
data Request
  -- Evolve the selected states
  = EvolveStates [StateId]
  -- Add the given states
  | AddStates [(StateId, State)]
  -- Remove the requested states, and return them to the master
  | RemoveStates [StateId]

-- This is what the slaves reply to the master.
data Response
  = StatesEvolved [(StateId, [(StateId, Output)])]
  | StatesAdded
  | StatesRemoved [(StateId, State)]

-- The slave has a set of `State`s indexed by `StateId`, and it updates
-- it at each request from the master.
slaveStateMachine :: Map StateId State -> Request -> (Map StateId State, Response)

-- Some type to refer to slaves uniquely.
data SlaveId

-- The master keeps track of which states each slave has, and will update
-- it. It also records the outputs we have received from the slaves so far.
data MasterState = MasterState
  { msSlavesStates :: Map SlaveId (Set StateId)
  , msStatesToEvolve :: Map StateId Input
  , msEvolvedStates :: Map StateId [(StateId, Output)]
  }

-- At each response from a slave the master updates its state and then
-- might reply with a new `Request`. Note that the `Request` might not
-- be directed at the same slave that sent the `Response`, since sometimes
-- we need to steal slaves from other slaves since the slave at hand does
-- not have states to update.
masterStateMachine ::
     MasterState -> SlaveId -> Response
  -> (MasterState, Maybe (SlaveId, Request))
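To make the protocol concrete, here is a self-contained toy model of the slave state machine (the types are simplified, and the "evolution" is a made-up increment rather than the real simulation):

```haskell
import Data.Map (Map)
import qualified Data.Map as Map

type StateId = Int
type State   = Int
type Output  = Int

data Request
  = EvolveStates [StateId]
  | AddStates [(StateId, State)]
  | RemoveStates [StateId]

data Response
  = StatesEvolved [(StateId, [(StateId, Output)])]
  | StatesAdded
  | StatesRemoved [(StateId, State)]
  deriving (Eq, Show)

slaveStateMachine :: Map StateId State -> Request -> (Map StateId State, Response)
slaveStateMachine states req = case req of
  AddStates new ->
    (Map.union (Map.fromList new) states, StatesAdded)
  RemoveStates ids ->
    ( foldr Map.delete states ids
    , StatesRemoved [(i, states Map.! i) | i <- ids] )
  EvolveStates ids ->
    -- Toy evolution: each parent produces one child whose id is
    -- parent + 1000, whose state is incremented, and whose output
    -- is the parent's old state.
    let children = [(i, [(i + 1000, states Map.! i)]) | i <- ids]
        removed  = foldr Map.delete states ids
        added    = Map.fromList [(i + 1000, states Map.! i + 1) | i <- ids]
    in (Map.union added removed, StatesEvolved children)
```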

The most common pattern of interaction between slave and master will be of a loop of EvolveStates and StatesEvolved:

EvolveStates and StatesEvolved

This interaction between slave and master will continue until one slave runs out of states to evolve. In that case, the master will have to reach out to some other slave to be able to provide the needy slave with something to evolve. For example, this is what will happen if slave 3 runs out of states and the master decides to ship some states to it from slave 2:

Running out of states

The exact implementation of the state machines is not relevant; given their types, what's worth noting is that:

  • The slave will be a very simple loop that just waits for a request, processes it, and then replies to the master.
  • The master, on the other hand, is a bit more complicated: it needs to wait for responses from any slave, which means that we'll have to multiplex over multiple channels; and then it can reply to any slave.

First attempt, and performance

Now that we have abstracted out the logic of the master and the slaves in self-contained state machines, we can describe the slave and master processes. We'll assume IO functions to send and receive messages.

The slave implementation is trivial and won't change:

-- These functions will use `recv`/`send` to work with the `Socket`s,
-- and the `store` library to efficiently deserialize and serialize
-- the requests and responses.
receiveRequest :: Socket -> IO Request
sendResponse :: Socket -> Response -> IO ()

slave ::
     Socket -- Connection to master
  -> IO a
slave sock = loop mempty -- No states at the beginning
  where
    loop :: Map StateId State -> IO a
    loop states = do
      req <- receiveRequest sock
      let (states', resp) = slaveStateMachine states req
      sendResponse sock resp
      loop states'

Note that a slave process is not bound to a single call to evolveMany, it just takes requests from a master.

The master on the other hand is essentially the implementation of evolveMany, and we have a lot more options to implement it. Our first version is a pretty idiomatic Haskell program, using one thread per slave so that we can wait on all of them at once, with the master state stored in an MVar that can be accessed from all the slave threads:

First implementation

Each slave thread will run code waiting on a slave, modifying the shared state using the master state machine:

import Control.Concurrent.MVar

receiveResponse :: Socket -> IO Response
sendRequest :: Socket -> Request -> IO ()

-- Terminates when there is nothing left to do.
slaveThread :: Map SlaveId Socket -> MVar MasterState -> SlaveId -> IO ()
slaveThread slaveSockets masterStateVar slaveId = do
  resp <- receiveResponse (slaveSockets Map.! slaveId)
  (masterState, mbReq) <- modifyMVar masterStateVar $ \masterState -> do
    let (masterState', mbReq) =
          masterStateMachine masterState slaveId resp
    return (masterState', (masterState', mbReq))
  -- Send the request if needed
  forM_ mbReq $ \(slaveId', req) ->
    sendRequest (slaveSockets Map.! slaveId') req
  -- Continue if there are still states to evolve
  unless (Map.null (msStatesToEvolve masterState)) $
    slaveThread slaveSockets masterStateVar slaveId

-- Runs the provided actions in separate threads, returns as
-- soon as any of them exits
raceMany_ :: [IO ()] -> IO ()
raceMany_ xs0 = case xs0 of
  -- `race_` is from the `async` package.
  [] -> return ()
  [x] -> x
  x : xs -> race_ x (raceMany_ xs)

evolveMany ::
     Map SlaveId Socket
  -> Map SlaveId (Set StateId)
  -> Map StateId Input 
  -> IO (Map SlaveId (Set StateId), Map StateId [(StateId, Output)])
evolveMany slaveSockets slaveStates inputs = do
  masterStateVar <- newMVar MasterState
    { msSlavesStates = slaveStates
    , msStatesToEvolve = inputs
    , msEvolvedStates = mempty
    }
  -- Run one thread per slave until one receives the response
  -- after which there are no states to evolve
  raceMany_ (map (slaveThread masterStateVar) (Map.keys slaveStates))
  -- Return the results in the `MasterState`
  masterState <- readMVar masterStateVar
  return (msSlavesStates masterState, msEvolvedStates masterState)

This implementation is simple and quite obviously correct, and it's also pretty fast. In fact, we were able to scale up to around 20 slaves quite well with it:

Performance up to 17 slaves, first implementation

Note that both axes for this and every other plot in this article are logarithmic: if we scaled perfectly we'd get a straight line, which we're pretty close to.

However, things go downhill if we try to scale beyond 20 slaves. Here is a sample of the runtime with up to 450 slaves for six different scenarios:

Performance up to 450 slaves, first implementation

These measurements were all taken on clusters of c4.8xlarge AWS instances with 18 physical cores, with up to 30 machines running at once. The benchmarking was automated using terraform, which was invaluable when evaluating the improvements.

It's evident that the distribution does not scale beyond around 40 slaves, and stalls completely between 50 and 100 slaves, after which adding slaves is detrimental to the runtime. Note that for the scenarios taking more time the scaling is better: this is because for those scenarios each individual call to the Evolve function takes longer, and thus the overhead of the distribution is less substantial. This is the case for scenario D, which starts out being the slowest with 17 slaves, taking more than 4000 seconds rather than 800-1000 seconds, but scaling much better.

From this data it was clear that if we wanted to be able to leverage a large number of machines to run our simulations in a minute or less we had to improve the performance of evolveMany.

Small aside: note how these plots contain a line "with taskset" and one without, with the one without performing noticeably worse. The line with taskset indicates measurements taken where each Haskell process is pinned to a physical CPU core: this improves performance substantially compared to letting the kernel schedule them. After finding this out we ran all subsequent tests pinning slave processes to physical cores. Hyperthreading was also detrimental to the runtime, since the increased distribution overhead far outweighed the gained CPU time; so we used only one process per physical CPU core and avoided hyperthreading. Keep in mind that since we're distributing the work manually using TCP sockets, each slave is a separate OS process that runs a dedicated Haskell runtime, which is why it makes sense to pin it to a single core.

Second attempt

By measuring how much time each slave spent working and how much time it spent waiting for instructions from the master, it became clear that the program was getting slower because the slaves spent more and more time waiting for instructions, rather than actually working. Thus, if we wanted proper scaling, we needed to lower the latency between the time a response reached the master and the time the slave received the next request.

Now, we tried to gain conclusive evidence of why our first version of evolveMany was slow, but profiling these sorts of applications is quite hard unless you're intimately familiar with the Haskell runtime -- which is almost like saying "if you are Simon Marlow".

We had however some hypotheses of why our program was slow. One possibility is that the event manager can simply not handle hundreds of connections at the same time efficiently, at least in our use case.

Another suspicion was that the multi-threaded design of the first version worked to our disadvantage, since there would be a lot of pointless context switches while one thread was already modifying the MVar MasterState. In other words, any context switch between slave threads while one slave thread is already holding the MVar MasterState is (almost) wasted: the newly scheduled thread will block on the MVar right after receiving a slave response and yield, delaying the completion of the loop body in the thread that was already processing the MasterState.

While our second version was based on these hypotheses, we were quite short on time and did not want to take the risk of rewriting the program only to find that we still could not scale as desired. Thus, we set out to write the fastest version of evolveMany that we could think of.

The main change we wanted was to turn the server from a multi-threaded server multiplexing through the event manager to a single-threaded application multiplexing the sockets directly.

In Linux, the epoll set of syscalls exist for this exact reason: you can register multiple sockets to wait on with epoll_ctl, and then wait for any of them to be ready using epoll_wait.

However in Haskell epoll is abstracted over by the GHC event manager, so there is no library to use these facilities directly. The GHC event manager does offer an interface to it in the form of GHC.Event.registerFd. However all these functions are callback based -- they take a function that will be called in a green thread when the socket is ready. Thus we cannot easily write a single-threaded program directly using it. If we want to write a single-threaded loop we're forced to go through an additional synchronization primitive such as an MVar to signal, in the callback provided to registerFd, that a socket is ready to be read from. Note that the normal blocking read for Haskell sockets is implemented using threadWaitRead, which uses registerFd in exactly this way, by having the callback fill in an MVar that threadWaitRead will wait on. We tried this approach and got no performance improvement.

Thus we decided to just write the loop using epoll directly, which proved very painless given that the GHC codebase already contains bindings to the epoll functions, as part of the event manager. We released a simple library for people that need to do the same, simple-poll. Right now it only supports epoll, and is thus limited to Linux, but it should be easy to extend to other platforms by copy-pasting other bits of code from the GHC event manager.

Updating the old loop to an explicit multiplexing style, we have:

-- `System.Poll.EPoll` comes from the `simple-poll` package
import System.Poll.EPoll (EPoll)
import qualified System.Poll.EPoll as EPoll
import Network.Socket (Socket(MkSocket))
import System.Posix.Types (Fd(Fd))

-- Receives first responses to arrive from any of the slaves.
-- This amounts to calling `EPoll.wait` to get back a list of
-- sockets to read from, and then draining them in turn to
-- decode the `Response`.
-- Note that draining them might still not give us a response,
-- since the full response might not be available all at once,
-- and thus in the full version of the software this function will have
-- to hold some state for partially read messages.
-- Also note that in the real software it's much better to return
-- a list of `(SlaveId, Response)` pairs. We have it return only
-- one for simplicity.
receiveFromAnySlave ::
     EPoll
  -> Map Fd SlaveId
  -- Reverse lookup table from `Fd`s to `SlaveId`s. We need it
  -- since `EPoll.wait` gives us the `Fd`s which are ready to
  -- be read from, and from that we need to get back which
  -- `SlaveId` it corresponds to, to return it.
  -> IO (SlaveId, Response)

-- Utility to get a file descriptor out of a `Socket`
socketFd :: Socket -> Fd
socketFd (MkSocket fd _ _ _ _) = Fd fd

evolveMany ::
     Map SlaveId Socket -- All the connections to the slaves
  -> Map SlaveId (Set StateId) -- The states held by each slave
  -> Map StateId Input -- The inputs to each state
  -> IO (Map SlaveId (Set StateId), Map StateId [(StateId, Output)])
evolveMany slaveSockets slaveStates inputs = EPoll.with 256 $ \epoll -> do
  -- First register all the sockets with `epoll_ctl`. `epollIn` is to
  -- indicate that we want to be notified when a socket can be read from.
  forM_ slaveSockets $ \socket ->
    EPoll.control epoll EPoll.controlOpAdd (socketFd socket) EPoll.epollIn
  -- Then start the event loop
  masterState <- loop epoll MasterState
    { msSlavesStates = slaveStates
    , msStatesToEvolve = inputs
    , msEvolvedStates = mempty
    }
  return (msSlavesStates masterState, msEvolvedStates masterState)
  where
    fdToSlaveIds :: Map Fd SlaveId
    fdToSlaveIds =
      Map.fromList [(socketFd sock, slaveId) | (slaveId, sock) <- Map.toList slaveSockets]

    loop :: EPoll -> MasterState -> IO MasterState
    loop epoll masterState = do
      -- Get a response from some slave
      (slaveId, resp) <- receiveFromAnySlave epoll fdToSlaveIds
      -- Update the state accordingly
      let (masterState', mbReq) =
            masterStateMachine masterState slaveId resp
      -- Send the new request, if any
      forM_ mbReq $ \(slaveId', req) ->
        sendRequest (slaveSockets Map.! slaveId') req
      -- Continue if we're not done, otherwise return the final state
      if Map.null (msStatesToEvolve masterState')
        then return masterState'
        else loop epoll masterState'

Once we did this, performance increased dramatically, fulfilling our current scaling needs and probably coming quite close to optimal scaling for our use case, although we have not investigated how much room for improvement remains, since we do not need it for now.

Going back to the original set of plots, the blue line shows the improved performance with our second implementation:

Improved performance

The plots clearly show a much nicer scaling pattern as the number of slaves increases, and runtimes often of 100 seconds or less, which represents a 2x to 5x improvement over the first version.

We also integrated other micro-optimizations that yielded less substantial improvements (in the 5 to 10% range), such as

  • Using mutable hashtables instead of unordered-containers for most of the bookkeeping.
  • Reading from the Socket directly into a ByteBuffer and deserializing directly from there, rather than copying into intermediate ByteStrings. Since the buffer that the socket data is read into is allocated upfront, this drastically reduces the allocations needed for deserialization.


Our biggest takeaway from this experience is that in Haskell we can be confident that, with relative ease, we'll always be able to make the task at hand as fast as it needs to be. Writing the epoll-based version took around a day, including factoring the bindings out of the GHC event manager into a library.

Moreover, it's important to remember that the normal facilities for fast IO in Haskell (green threads plus transparent evented IO) are fast enough for the overwhelming majority of cases, and much easier to manage and reason about than manual evented IO. Michael Snoyman recently compared green threads to garbage collection, an apt comparison. Our software is one of the cases where the abstraction gets in the way of performance, and thus we needed to work without it.

Finally, it would be great to have hard evidence of why the first program was slow, rather than just hypotheses. We tried quite hard to understand it but could not reach conclusive evidence in the time we had. We hope to get to the bottom of this issue when we have the time, and perhaps make profiling these kinds of programs easier in the meantime.


The work described was performed with Philipp Kant and Niklas Hambüchen. Thanks to Michael Snoyman, Philipp Kant, and Niklas Hambüchen for reviewing drafts of this blog post.

January 18, 2017 03:10 PM

Edward Z. Yang

Try Backpack: Cabal packages

This post is part two of a series about how you can try out Backpack, a new mixin package system for Haskell. In the previous post, we described how to use a new ghc --backpack mode in GHC to quickly try out Backpack's new signature features. Unfortunately, there is no way to distribute the input files to this mode as packages on Hackage. So in this post, we walk through how to assemble equivalent Cabal packages which have the same functionality.

Download a cabal-install nightly

Along with the GHC nightly, you will need a cabal-install nightly to run these examples. Assuming that you have installed hvr's PPA already, just aptitude install cabal-install-head and you will get a Backpack-ready cabal-install in /opt/cabal/head/bin/.

Otherwise, you will need to build cabal-install from source. I recommend using a released version of GHC (e.g., your system GHC, not a nightly) to build cabal-install.

Where we are going

Here is an abridged copy of the code we developed in the last post, where I have removed all of the module/signature contents:

unit str-bytestring where
    module Str

unit str-string where
    module Str

unit regex-types where
    module Regex.Types

unit regex-indef where
    dependency regex-types
    signature Str
    module Regex

unit main where
    dependency regex-types
    dependency regex-indef[Str=str-string:Str]     (Regex as Regex.String)
    dependency regex-indef[Str=str-bytestring:Str] (Regex as Regex.ByteString)
    module Main

One obvious way to translate this file into Cabal packages is to define a package per unit. However, we can also define a single package with many internal libraries—a new feature, independent of Backpack, which lets you define private helper libraries inside a single package. Since this approach involves less boilerplate, we'll describe it first, before "productionizing" the libraries into separate packages.

For all of these examples, we assume that the source code of the modules and signatures has been copy-pasted into the appropriate hs and hsig files respectively. You can find these files in the source-only branch of backpack-regex-example.

Single package layout

In this section, we'll step through the Cabal file which defines each unit as an internal library. You can find all the files for this version at the single-package branch of backpack-regex-example. This package can be built with a conventional cabal configure -w ghc-head (replace ghc-head with the path to your copy of GHC HEAD) and then cabal build.

The header of the package file is fairly ordinary, but as Backpack uses new Cabal features, cabal-version must be set to >=1.25 (note that Backpack does NOT work with Custom setup):

name:                regex-example
build-type:          Simple
cabal-version:       >=1.25

Private libraries. str-bytestring, str-string and regex-types are completely conventional Cabal libraries that only have modules. In previous versions of Cabal, we would have to make a package for each of them. However, with private libraries, we can simply list multiple library stanzas annotated with the internal name of the library:

library str-bytestring
  build-depends:       base, bytestring
  exposed-modules:     Str
  hs-source-dirs:      str-bytestring

library str-string
  build-depends:       base
  exposed-modules:     Str
  hs-source-dirs:      str-string

library regex-types
  build-depends:       base
  exposed-modules:     Regex.Types
  hs-source-dirs:      regex-types

To keep the modules for each of these internal libraries separate, we give each a distinct hs-source-dirs. These libraries can be depended upon inside this package, but are hidden from external clients; only the public library (denoted by a library stanza with no name) is publicly visible.

Indefinite libraries. regex-indef is slightly different, in that it has a signature. But writing a library for it is not too different: signatures go in the aptly named signatures field:

library regex-indef
  build-depends:       base, regex-types
  signatures:          Str
  exposed-modules:     Regex
  hs-source-dirs:      regex-indef

Instantiating. How do we instantiate regex-indef? In our bkp file, we had to explicitly specify how the signatures of the package were to be filled:

dependency regex-indef[Str=str-string:Str]     (Regex as Regex.String)
dependency regex-indef[Str=str-bytestring:Str] (Regex as Regex.ByteString)

With Cabal, these instantiations can be specified through a more indirect process of mix-in linking, whereby the dependencies of a package are "mixed together", with required signatures of one dependency being filled by exposed modules of another dependency. Before writing the regex-example executable, let's write a regex library, which is like regex-indef, except that it is specialized for String:

library regex
  build-depends:       regex-indef, str-string
  reexported-modules:  Regex as Regex.String

Here, regex-indef and str-string are mix-in linked together: the Str module from str-string fills the Str requirement from regex-indef. This library then reexports Regex under a new name that makes it clear it's the String instantiation.

We can easily do the same for a ByteString instantiated version of regex-indef:

library regex-bytestring
  build-depends:       regex-indef, str-bytestring
  reexported-modules:  Regex as Regex.ByteString

Tie it all together. It's simple enough to add the executable and then build the code:

executable regex-example
  main-is:             Main.hs
  build-depends:       base, regex, regex-bytestring, regex-types
  hs-source-dirs:      regex-example

In the root directory of the package, you can cabal configure; cabal build the package (make sure you pass -w ghc-head!). Alternatively, you can use cabal new-build to the same effect.

There's more than one way to do it

In the previous code sample, we used reexported-modules to rename modules at declaration-time, so that they did not conflict with each other. However, this was possible only because we created extra regex and regex-bytestring libraries. In some situations (especially if we are actually creating new packages as opposed to internal libraries), this can be quite cumbersome, so Backpack offers a way to rename modules at use-time, using the mixins field. It works like this: any package declared in build-depends can be specified in mixins with an explicit renaming, specifying which modules should be brought into scope, with what name.

For example, str-string and str-bytestring both export a module named Str. To refer to both modules without using package-qualified imports, we can rename them as follows:

executable str-example
  main-is:             Main.hs
  build-depends:       base, str-string, str-bytestring
  mixins:              str-string     (Str as Str.String),
                       str-bytestring (Str as Str.ByteString)
  hs-source-dirs:      str-example

The semantics of the mixins field is that we bring only the modules explicitly listed in the import specification (Str as Str.String) into scope for import. If a package never occurs in mixins, then we default to bringing all modules into scope (giving us the traditional behavior of build-depends). This does mean that if you say mixins: str-string (), you can force a component to have a dependency on str-string, but NOT bring any of its modules into scope.

It has been argued that package authors should avoid defining packages with conflicting module names. So suppose that we restructure str-string and str-bytestring to have unique module names:

library str-string
  build-depends:       base
  exposed-modules:     Str.String
  hs-source-dirs:      str-string

library str-bytestring
  build-depends:       base, bytestring
  exposed-modules:     Str.ByteString
  hs-source-dirs:      str-bytestring

We would then need to rewrite regex and regex-bytestring to rename Str.String and Str.ByteString to Str, so that they fill the hole of regex-indef:

library regex
  build-depends:       regex-indef, str-string
  mixins:              str-string (Str.String as Str)
  reexported-modules:  Regex as Regex.String

library regex-bytestring
  build-depends:       regex-indef, str-bytestring
  mixins:              str-bytestring (Str.ByteString as Str)
  reexported-modules:  Regex as Regex.ByteString

In fact, with the mixins field, we can avoid defining the regex and regex-bytestring shim libraries entirely. We can do this by declaring regex-indef twice in mixins, renaming the requirements of each separately:

executable regex-example
  main-is:             Main.hs
  build-depends:       base, regex-indef, str-string, str-bytestring, regex-types
  mixins:              regex-indef (Regex as Regex.String)
                          requires (Str as Str.String),
                       regex-indef (Regex as Regex.ByteString)
                          requires (Str as Str.ByteString)
  hs-source-dirs:      regex-example

This particular example is given in its entirety at the better-single-package branch in backpack-regex-example.

Note that requirement renamings are syntactically preceded by the requires keyword.

The art of writing Backpack packages is still in its infancy, so it's unclear what conventions will win out in the end. But here is my suggestion: when defining a module intended to implement a signature, follow the existing no-conflicting module names convention. However, add a reexport of your module to the name of the signature. This trick takes advantage of the fact that Cabal will not report that a module is redundant unless it is actually used. So, suppose we have:

library str-string
  build-depends:       base
  exposed-modules:     Str.String
  reexported-modules:  Str.String as Str
  hs-source-dirs:      str-string

library str-bytestring
  build-depends:       base, bytestring
  exposed-modules:     Str.ByteString
  reexported-modules:  Str.ByteString as Str
  hs-source-dirs:      str-bytestring

Now all of the following components work:

library regex
  build-depends:       regex-indef, str-string
  reexported-modules:  Regex as Regex.String

library regex-bytestring
  build-depends:       regex-indef, str-bytestring
  reexported-modules:  Regex as Regex.ByteString

-- "import Str.String" is unambiguous, even if "import Str" is
executable str-example
  main-is:             Main.hs
  build-depends:       base, str-string, str-bytestring
  hs-source-dirs:      str-example

-- All requirements are renamed away from Str, so all the
-- instantiations are unambiguous
executable regex-example
  main-is:             Main.hs
  build-depends:       base, regex-indef, str-string, str-bytestring, regex-types
  mixins:              regex-indef (Regex as Regex.String)
                          requires (Str as Str.String),
                       regex-indef (Regex as Regex.ByteString)
                          requires (Str as Str.ByteString)
  hs-source-dirs:      regex-example

Separate packages

OK, so how do we actually scale this up into an ecosystem of indefinite packages, each of which can be used individually and maintained by separate individuals? The library stanzas stay essentially the same as above; just create a separate package for each one. Rather than reproduce all of the boilerplate here, the full source code is available in the multiple-packages branch of backpack-regex-example.
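For instance, the standalone version of str-string would get its own package with a .cabal file along these lines. This is an illustrative sketch, not the exact contents of the multiple-packages branch; the version number and hs-source-dirs are assumptions:

```
name:                str-string
version:             0.1.0.0
build-type:          Simple
cabal-version:       >=1.25

library
  build-depends:       base
  exposed-modules:     Str
  hs-source-dirs:      src
```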

There is one important gotcha: the package manager needs to know how to instantiate and build these Backpack packages (in the single package case, the smarts were encapsulated entirely inside the Cabal library). As of writing, the only command that knows how to do this is cabal new-build (I plan on adding support to stack eventually, but not until after I am done writing my thesis; and I do not plan on adding support to old-style cabal install ever.)

Fortunately, it's very easy to use cabal new-build to build regex-example; just say cabal new-build -w ghc-head regex-example. Done!


If you actually want to use Backpack for real, what can you do? There are a number of possibilities:

  1. If you are willing to use GHC 8.2 only, and you only need to parametrize code internally (where the public library looks like an ordinary, non-Backpack package), using Backpack with internal libraries is a good fit. The resulting package will be buildable with Stack and cabal-install, as long as you are using GHC 8.2. This is probably the most pragmatic way you can make use of Backpack; the primary problem is that Haddock doesn't know how to deal with reexported modules, but this should be fixable.
  2. If you are willing to use cabal new-build only, then you can also write packages which have requirements, and let clients decide however they want to implement their packages.

Probably the biggest "real-world" impediment to using Backpack, besides any lurking bugs, is subpar support for Haddock. But if you are willing to overlook this (for now, in any case), please give it a try!

by Edward Z. Yang at January 18, 2017 04:17 AM

January 17, 2017

Jasper Van der Jeugt

Lazy I/O and graphs: Winterfell to King's Landing


This post is about Haskell, and lazy I/O in particular. It is a bit longer than usual, so I will start with a high-level overview of what you can expect:

  • We talk about how we can represent graphs in a “shallow embedding”. This means we will not use a dedicated Graph type and rather represent edges by directly referencing other Haskell values.

  • This is a fairly good match when we want to encode infinite graphs. When dealing with infinite graphs, there is no need to “reify” the graph and enumerate all the nodes and edges – this would be futile anyway.

  • We discuss a Haskell implementation of shortest path search in a weighted graph that works on these infinite graphs and that has good performance characteristics.

  • We show how we can implement lazy I/O to model infinite graphs as pure values in Haskell, in a way that only the “necessary” parts of the graph are loaded from a database. This is done using the unsafeInterleaveIO primitive.

  • Finally, we discuss the disadvantages of this approach as well, and we review some of common problems associated with lazy I/O.

Let’s get to it!

As usual, this is a literate Haskell file, which means that you can just load this blogpost into GHCi and play with it. You can find the raw .lhs file here.

> {-# LANGUAGE OverloadedStrings   #-}
> {-# LANGUAGE ScopedTypeVariables #-}
> import           Control.Concurrent.MVar (MVar, modifyMVar, newMVar)
> import           Control.Monad           (forM_, unless)
> import           Control.Monad.State     (State, gets, modify, runState)
> import           Data.Hashable           (Hashable)
> import qualified Data.HashMap.Strict     as HMS
> import qualified Data.HashPSQ            as HashPSQ
> import           Data.Monoid             ((<>))
> import qualified Data.Text               as T
> import qualified Data.Text.IO            as T
> import qualified Database.SQLite.Simple  as SQLite
> import qualified System.IO.Unsafe        as IO

The problem at hand

As an example problem, we will look at finding the shortest path between cities in Westeros, the fictional location where the A Song of Ice and Fire novels (and HBO’s Game of Thrones) take place.

We model the different cities in a straightforward way. In addition to a unique ID used to identify them, they also have a name, a position (X,Y coordinates) and a list of reachable cities, with an associated time (in days) it takes to travel there. This travel time, also referred to as the cost, is not necessarily deducible from the sets of X,Y coordinates: some roads are faster than others.

> type CityId = T.Text
> data City = City
>     { cityId         :: CityId
>     , cityName       :: T.Text
>     , cityPos        :: (Double, Double)
>     , cityNeighbours :: [(Double, City)]
>     }

Having direct access to the neighbouring cities, instead of having to go through CityIds, has both advantages and disadvantages.

On one hand, updating these values becomes cumbersome at best, and impossible at worst. If we wanted to change a city’s name, we would have to traverse all other cities to update possible references to the changed city.

On the other hand, it makes access more convenient (and faster!). Since we want a read-only view on the data, it works well in this case.

Getting the data

We will be using extracted data, conveniently licensed under a Creative Commons license. You can find the complete SQL dump here. The schema of the database should not be too surprising:

CREATE TABLE cities (
  id   text  PRIMARY KEY NOT NULL,
  name text  NOT NULL,
  x    float NOT NULL,
  y    float NOT NULL
);

CREATE TABLE roads (
  origin      text  NOT NULL,
  destination text  NOT NULL,
  cost        float NOT NULL,
  PRIMARY KEY (origin, destination)
);

CREATE INDEX roads_origin ON roads (origin);

The road costs have been generated by multiplying the actual distances by a random number uniformly chosen between 0.6 and 1.4. Cities have been (bidirectionally) connected to at least their four closest neighbours. This ensures that every city is reachable.

We will use sqlite in our example because there is almost no setup involved. You can load this database by issuing:

curl -L | sqlite3 got.db

But instead of considering the whole database (which we’ll get to later), let’s construct a simple example in Haskell so we can demonstrate the interface a bit. We can use a let to create bindings that refer to one another easily.

> test01 :: IO ()
> test01 = do
>     let winterfell = City "wtf" "Winterfell" (-105, 78)
>           [(13, moatCailin), (12, whiteHarbor)]
>         whiteHarbor = City "wih" "White Harbor" (-96, 74)
>           [(15, braavos), (12, winterfell)]
>         moatCailin = City "mtc" "Moat Cailin" (-104, 72)
>           [(20, crossroads), (13, winterfell)]
>         braavos = City "brv" "Braavos" (-43, 67)
>           [(17, kingsLanding), (15, whiteHarbor)]
>         crossroads = City "crs" "Crossroads Inn" (-94, 58)
>           [(7, kingsLanding), (20, moatCailin)]
>         kingsLanding = City "kgl" "King's Landing" (-84, 45)
>           [(7, crossroads), (17, braavos)]
>     printSolution $
>         shortestPath cityId cityNeighbours winterfell kingsLanding

Illustration of test01

printSolution is defined as:

> printSolution :: Maybe (Double, [City]) -> IO ()
> printSolution Nothing             = T.putStrLn "No solution found"
> printSolution (Just (cost, path)) = T.putStrLn $
>     "cost: " <> T.pack (show cost) <>
>     ", path: " <> T.intercalate " -> " (map cityName path)

We get exactly what we expect in GHCi:

*Main> test01
cost: 40.0, path: Winterfell -> Moat Cailin ->
Crossroads Inn -> King's Landing

So far so good! Now let’s dig in to how shortestPath works.

The Shortest Path algorithm

The following algorithm is known as Uniform Cost Search. It is a variant of Dijkstra’s graph search algorithm that is able to work with infinite graphs (or graphs that do not fit in memory anyway). It returns the shortest path between a known start and goal in a weighted directed graph.

Because this algorithm attempts to solve the problem the right way, including keeping back references, it is not simple. Therefore, if you are only interested in the part about lazy I/O, feel free to skip to this section and return to the algorithm later.

We have two auxiliary datatypes.

BackRef is a wrapper around a node and the previous node on the shortest path to that node. Keeping these references around is necessary to construct a list describing the entire path at the end.

> data BackRef node = BackRef {brNode :: node, brPrev :: node}

We will be using a State monad to implement the shortest path algorithm. This is our state:

> data SearchState node key cost = SearchState
>     { ssQueue    :: HashPSQ.HashPSQ key cost (BackRef node)
>     , ssBackRefs :: HMS.HashMap key node
>     }

In our state, we have:

  • A priority queue of nodes we will visit next in ssQueue, including back references. Using a priority queue will let us grab the next node with the lowest associated cost in a trivial way.

  • Secondly, we have the ssBackRefs map. That one serves two purposes: to keep track of which nodes we have already explored (the keys in the map), and to keep the back references of those locations (the values in the map).

These two datatypes are only used internally in the shortestPath function. Ideally, we would be able to put them in the where clause, but that is not possible in Haskell.

Instead of declaring a Node typeclass (possibly with associated types for the key and cost types), I decided to go with simple higher-order functions. We only need two of those function arguments after all: a function to give you a node’s key (nodeKey) and a function to get the node’s neighbours and associated costs (nodeNeighbours).

> shortestPath
>     :: forall node key cost.
>           (Ord key, Hashable key, Ord cost, Num cost)
>     => (node -> key)
>     -> (node -> [(cost, node)])
>     -> node
>     -> node
>     -> Maybe (cost, [node])
> shortestPath nodeKey nodeNeighbours start goal =

We start by creating an initial SearchState for our algorithm. Our initial queue holds one item (implying that we still need to explore the start) and our initial back references map is empty (we haven’t explored anything yet).

>     let startbr      = BackRef start start
>         queue0       = HashPSQ.singleton (nodeKey start) 0 startbr
>         backRefs0    = HMS.empty
>         searchState0 = SearchState queue0 backRefs0

walk is the main body of the shortest path search. We call that and if we found a shortest path, we return its cost together with the path which we can reconstruct from the back references (followBackRefs).

>         (mbCost, searchState1) = runState walk searchState0 in
>     case mbCost of
>         Nothing   -> Nothing
>         Just cost -> Just
>             (cost, followBackRefs (ssBackRefs searchState1))
>   where

Now, we have a bunch of functions that are used within the algorithm. The first one, walk, is the main body. We start by exploring the next node in the queue. By construction, this is always a node we haven’t explored before. If this node is the goal, we’re done. Otherwise, we check the node’s neighbours and update the queue with those neighbours. Then, we recursively call walk.

>     walk :: State (SearchState node key cost) (Maybe cost)
>     walk = do
>         mbNode <- exploreNextNode
>         case mbNode of
>             Nothing -> return Nothing
>             Just (cost, curr)
>                 | nodeKey curr == nodeKey goal ->
>                     return (Just cost)
>                 | otherwise -> do
>                     forM_ (nodeNeighbours curr) $ \(c, next) ->
>                         updateQueue (cost + c) (BackRef next curr)
>                     walk

Exploring the next node is fairly easy to implement using a priority queue: we simply need to pop the element with the minimal priority (cost) using minView. We also need to indicate that we reached this node and save the back reference by inserting that info into ssBackRefs.

>     exploreNextNode
>         :: State (SearchState node key cost) (Maybe (cost, node))
>     exploreNextNode = do
>         queue0 <- gets ssQueue
>         case HashPSQ.minView queue0 of
>             Nothing                                   -> return Nothing
>             Just (_, cost, BackRef curr prev, queue1) -> do
>                 modify $ \ss -> ss
>                     { ssQueue    = queue1
>                     , ssBackRefs =
>                         HMS.insert (nodeKey curr) prev (ssBackRefs ss)
>                     }
>                 return $ Just (cost, curr)

updateQueue is called as new neighbours are discovered. We are careful about adding new nodes to the queue:

  1. If we have already explored this neighbour, we don’t need to add it. This is done by checking if the neighbour key is in ssBackRefs.
  2. If the neighbour is already present in the queue with a lower priority (cost), we don’t need to add it, since we want the shortest path. This is taken care of by the utility insertIfLowerPrio, which is defined below.
>     updateQueue
>         :: cost -> BackRef node -> State (SearchState node key cost) ()
>     updateQueue cost backRef = do
>         let node = brNode backRef
>         explored <- gets ssBackRefs
>         unless (nodeKey node `HMS.member` explored) $ modify $ \ss -> ss
>             { ssQueue = insertIfLowerPrio
>                 (nodeKey node) cost backRef (ssQueue ss)
>             }

If the algorithm finishes, we have found the lowest cost from the start to the goal, but we don’t have the path ready. We need to reconstruct this by following the back references we saved earlier. followBackRefs does that for us. It recursively looks up nodes in the map, constructing the path in the accumulator acc on the way, until we reach the start.

>     followBackRefs :: HMS.HashMap key node -> [node]
>     followBackRefs paths = go [goal] goal
>       where
>         go acc node0 = case HMS.lookup (nodeKey node0) paths of
>             Nothing    -> acc
>             Just node1 ->
>                 if nodeKey node1 == nodeKey start
>                    then start : acc
>                    else go (node1 : acc) node1
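To see the walk in isolation, here is a self-contained sketch of the same back-reference following over plain keys (shown as a standalone snippet rather than part of the literate program), using Data.Map in place of the HashMap; the names mirror the code above but are otherwise hypothetical:

```haskell
import qualified Data.Map.Strict as Map

-- Follow predecessor links from the goal back to the start,
-- accumulating the path front to back.
followPath :: Ord k => k -> k -> Map.Map k k -> [k]
followPath start goal prevs = go [goal] goal
  where
    go acc node = case Map.lookup node prevs of
        Nothing -> acc
        Just prev
            | prev == start -> start : acc
            | otherwise     -> go (prev : acc) prev

main :: IO ()
main = do
    -- Back references recorded while searching a -> b -> c -> d
    let prevs = Map.fromList [("d", "c"), ("c", "b"), ("b", "a")]
    print (followPath "a" "d" prevs)
```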

That’s it! The only utility left is the insertIfLowerPrio function. Fortunately, we can easily define this using the alter function from the psqueues package. That function allows us to change a key’s associated value and priority. It also allows us to return an additional result, but we don’t need that, so we just use () there.

> insertIfLowerPrio
>     :: (Hashable k, Ord p, Ord k)
>     => k -> p -> v -> HashPSQ.HashPSQ k p v -> HashPSQ.HashPSQ k p v
> insertIfLowerPrio key prio val = snd . HashPSQ.alter
>     (\mbOldVal -> case mbOldVal of
>         Just (oldPrio, _)
>             | prio < oldPrio -> ((), Just (prio, val))
>             | otherwise      -> ((), mbOldVal)
>         Nothing              -> ((), Just (prio, val)))
>     key
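The same “insert only if the new priority is lower” logic can be sketched with Data.Map’s alter, modelling the queue as a map from keys to (priority, value) pairs. This is a stand-in for the psqueues version above, for illustration only (a real Map is a poor priority queue, since finding the minimum priority requires a scan):

```haskell
import qualified Data.Map.Strict as Map

-- Insert (prio, val) at key, but keep the existing entry if it
-- already has a lower (i.e. better) priority.
insertIfLowerPrio
    :: (Ord k, Ord p)
    => k -> p -> v -> Map.Map k (p, v) -> Map.Map k (p, v)
insertIfLowerPrio key prio val = Map.alter f key
  where
    f (Just (oldPrio, oldVal))
        | prio < oldPrio = Just (prio, val)
        | otherwise      = Just (oldPrio, oldVal)
    f Nothing            = Just (prio, val)

main :: IO ()
main = do
    let m0 = Map.fromList [("a", (10 :: Int, "first"))]
        m1 = insertIfLowerPrio "a" 5 "better" m0  -- lower priority wins
        m2 = insertIfLowerPrio "a" 7 "worse" m1   -- higher priority is ignored
    print (Map.lookup "a" m2)
```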

Interlude: A (very) simple cache

Lazy I/O will guarantee that we only load the nodes in the graph when necessary.

However, since we know that the nodes in the graph do not change over time, we can build an additional cache around it. That way, we can also guarantee that we only load every node once.

Implementing such a cache is very simple in Haskell. We can simply use an MVar, which will even take care of blocking when we have concurrent access to the cache (assuming that is what we want).

> type Cache k v = MVar (HMS.HashMap k v)
> newCache :: IO (Cache k v)
> newCache = newMVar HMS.empty
> cached :: (Hashable k, Ord k) => Cache k v -> k -> IO v -> IO v
> cached mvar k iov = modifyMVar mvar $ \cache -> do
>     case HMS.lookup k cache of
>         Just v  -> return (cache, v)
>         Nothing -> do
>             v <- iov
>             return (HMS.insert k v cache, v)

Note that we don’t really delete things from the cache. In order to keep things simple, we can assume that we will use a new cache for every shortest path we want to find, and that we throw away that cache afterwards.
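Here is the cache in action, as a standalone snippet using Data.Map (with an Ord constraint) instead of the HashMap above, so that the sketch is self-contained; a counter verifies that the loading action runs only once per key:

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar, newMVar)
import qualified Data.Map.Strict as Map
import Data.IORef

type Cache k v = MVar (Map.Map k v)

newCache :: IO (Cache k v)
newCache = newMVar Map.empty

-- Same combinator as above, with Ord standing in for Hashable.
cached :: Ord k => Cache k v -> k -> IO v -> IO v
cached mvar k iov = modifyMVar mvar $ \cache ->
    case Map.lookup k cache of
        Just v  -> return (cache, v)
        Nothing -> do
            v <- iov
            return (Map.insert k v cache, v)

main :: IO ()
main = do
    cache   <- newCache
    counter <- newIORef (0 :: Int)
    -- A "load" action that counts how often it actually runs.
    let load = modifyIORef counter (+ 1) >> return (42 :: Int)
    a <- cached cache "winterfell" load
    b <- cached cache "winterfell" load
    runs <- readIORef counter
    print (a, b, runs)  -- the loader ran only once
```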

Loading the graph using Lazy I/O

Now, we get to the main focus of the blogpost: how to use lazy I/O primitives to ensure resources are only loaded when they are needed. Since we are only concerned about one datatype (City) our loading code is fairly easy.

The most important loading function takes the SQLite connection, the cache we wrote up previously, and a city ID. We immediately use the cached combinator in the implementation, to make sure we load every CityId only once.

> getCityById
>     :: SQLite.Connection -> Cache CityId City -> CityId
>     -> IO City
> getCityById conn cache id' = cached cache id' $ do

Now, we get some information from the database. We play it a bit loose here and assume a singleton list will be returned from the query.

>     [(name, x, y)] <- SQLite.query conn
>         "SELECT name, x, y FROM cities WHERE id = ?" [id']

The neighbours are stored in a different table because we have a properly normalised database. We can write a simple query to obtain all roads starting from the current city:

>     roads <- SQLite.query conn
>         "SELECT cost, destination FROM roads WHERE origin = ?"
>         [id'] :: IO [(Double, CityId)]

This leads us to the crux of the matter. The roads variable contains something of the type [(Double, CityId)], and what we really want is [(Double, City)]. We need to recursively call getCityById to load what we want. However, doing this “the normal way” would cause problems:

  1. Since the IO monad is strict, we would end up in an infinite loop if there is a cycle in the graph (which is almost always the case for roads and cities).
  2. Even if there was no cycle, we would run into trouble with our usage of MVar in the Cache. We block access to the Cache while we are in the cached combinator, so calling getCityById again would cause a deadlock.

This is where Lazy I/O shines. We can implement lazy I/O by using the unsafeInterleaveIO primitive. Its type is very simple and doesn’t look as threatening as unsafePerformIO.

unsafeInterleaveIO :: IO a -> IO a

It takes an IO action and defers it. This means that the IO action is not executed right now, but only when the value is demanded. That is exactly what we want!
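A tiny standalone experiment makes the deferral visible: an IORef is only bumped once the lazily produced value is actually demanded:

```haskell
import qualified System.IO.Unsafe as IO
import Data.IORef

main :: IO ()
main = do
    ref <- newIORef (0 :: Int)
    -- The action is not run here; it is deferred until `x` is demanded.
    x <- IO.unsafeInterleaveIO $ do
        modifyIORef ref (+ 1)
        return (42 :: Int)
    before <- readIORef ref
    print before  -- 0: the deferred action has not run yet
    print x       -- demanding x runs the deferred action
    after <- readIORef ref
    print after   -- 1: the action ran exactly once
```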

We can simply wrap the recursive calls to getCityById using unsafeInterleaveIO:

>     neighbours <- IO.unsafeInterleaveIO $
>         mapM (traverse (getCityById conn cache)) roads

And then return the City we constructed:

>     return $ City id' name (x, y) neighbours

Lastly, we will add a quick-and-dirty wrapper around getCityById so that we are also able to load cities by name. Its implementation is trivial:

> getCityByName
>     :: SQLite.Connection -> Cache CityId City -> T.Text
>     -> IO City
> getCityByName conn cache name = do
>     [[id']] <- SQLite.query conn
>         "SELECT id FROM cities WHERE name = ?" [name]
>     getCityById conn cache id'

Now we can neatly wrap things up in our main function:

> main :: IO ()
> main = do
>     cache <- newCache
>     conn  <- SQLite.open "got.db"
>     winterfell <- getCityByName conn cache "Winterfell"
>     kings      <- getCityByName conn cache "King's Landing"
>     printSolution $
>         shortestPath cityId cityNeighbours winterfell kings

This works as expected:

*Main> :main
cost: 40.23610549037591, path: Winterfell -> Moat Cailin ->
Greywater Watch -> Inn of the Kneeling Man -> Fairmarket ->
Brotherhood Without Banners Hideout -> Crossroads Inn ->
Darry -> Saltpans -> Quiet Isle -> Antlers -> Sow's Horn ->
Brindlewood -> Hayford -> King's Landing

Disadvantages of Lazy I/O

Lazy I/O also has many disadvantages, which have been widely discussed. Among those are:

  1. Code becomes harder to reason about. In a setting without lazy I/O, you can casually reason about an Int as either an integer that’s already computed, or as something that will do some (pure) computation and then yield an Int.

    When lazy I/O enters the picture, things become more complicated. That Int you wanted to print? Yeah, it fired a bunch of missiles and returned the bodycount.

    This is why I would not seriously consider using lazy I/O when working with a team or on a large project – it can be easy to forget what is lazily loaded and what is not, and there’s no easy way to tell.

  2. Scarce resources can easily become a problem if you are not careful. If we keep a reference to a City in our heap, that means we also keep a reference to the cache and the SQLite connection.

    We must ensure that we fully evaluate the solution to something that doesn’t refer to these resources (e.g. to a printed string) so that the references can be garbage collected and the connections can be closed.

    Closing the connections is a problem in itself – if we cannot guarantee that e.g. streams will be fully read, we need to rely on finalizers, which are pretty unreliable…

  3. If we go a step further and add concurrency to our application, it becomes even trickier. Deadlocks are not easy to reason about – so how about reasoning about deadlocks when you’re not sure when the IO is going to be executed at all?

Despite all these shortcomings, I believe lazy I/O is a powerful and elegant tool that belongs in every Haskeller’s toolbox. Like pretty much anything, you need to be aware of what you are doing and understand the advantages as well as the disadvantages.

For example, the above downsides do not really apply if lazy I/O is only used within a module. For this blogpost, that means we could safely export the following interface:

> shortestPathBetweenCities
>     :: FilePath                       -- ^ Database name
>     -> CityId                         -- ^ Start city ID
>     -> CityId                         -- ^ Goal city ID
>     -> IO (Maybe (Double, [CityId]))  -- ^ Cost and path
> shortestPathBetweenCities dbFilePath startId goalId = do
>     cache <- newCache
>     conn  <- SQLite.open dbFilePath
>     start <- getCityById conn cache startId
>     goal  <- getCityById conn cache goalId
>     case shortestPath cityId cityNeighbours start goal of
>         Nothing           -> return Nothing
>         Just (cost, path) ->
>             let ids = map cityId path in
>             cost `seq` foldr seq () ids `seq`
>             return (Just (cost, ids))

Thanks for reading – and I hope I was able to offer you a nuanced view on lazy I/O. Special thanks to Jared Tobin for proofreading.

  1. In this blogpost, I frequently talk about “infinite graphs”. Of course most of these examples are not truly infinite, but we can consider examples that do not fit in memory completely, and in that way we can regard them as “infinite for practical purposes”.

  2. While blocking is good in this case, it might hurt performance when running in a concurrent environment. A good solution to that would be to stripe the MVars based on the keys, but that is beyond the scope of this blogpost. If you are interested in the subject, I talk about it a bit here.

by Jasper Van der Jeugt at January 17, 2017 12:00 AM

January 16, 2017

Michael Snoyman

safe-prelude: a thought experiment

This blog post is to share a very rough first stab at a new prelude I played around with earlier this month. I haven't used it in any significant way, and haven't spent more than a few hours on it total. I wrote it because I knew it was the only way to get the idea out of my head, and am sharing it in case anyone finds the idea intriguing or useful.

The project is available on Github at snoyberg/safe-prelude, and I've uploaded the Haddocks for easier reading (though, be warned, they aren't well organized at all). The rest of this post is just a copy of the README file for the project.

This is a thought experiment in a different point in the alternative prelude design space. After my blog post on readFile, I realized I was unhappy with the polymorphic nature of readFile in classy-prelude. Add to that Haskell Pitfalls, and I've been itching to try something else. I have a lot of hope for the foundation project, but wanted to play with this in the short term.


  • No partial functions, period. If a function can fail, its return type must express that. (And for our purposes: IO functions with runtime exceptions are not partial.)
  • Choose best in class libraries and promote them. bytestring and text fit that bill, as an example. Full listing below.
  • Regardless of the versions of underlying libraries, this package will always export a consistent API, so that CPP usage should be constrained to just inside this package.
  • Use generalization (via type classes) when they are well established. For example: Foldable and Traversable yes, MonoFoldable no.

    • Controversial: Avoid providing list-specific functions. This connects to the parent point. Most of the time, I'd argue that lists are not the correct choice, and instead a Vector should be used. There is no standard for sequence-like typeclasses (though many exist), so we're not going to generalize. But we're also not going to use a less efficient representation.

      I was torn on this, but decided in favor of leaving out functions initially, on the basis that it's easier to add something in later rather than remove it.

  • Encourage qualified imports with a consistent naming scheme. This is a strong departure from classy-prelude, which tried to make it unnecessary to use qualified imports. I'll save my feelings about qualified imports for another time; this is just a pragmatic choice given the other constraints.
  • Export any non-conflicting and not-discouraged names from this module that make sense, e.g. ByteString, Text, or readIORef.
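
To illustrate the first principle, a total function pushes failure into the return type instead of throwing at runtime. These examples are my own illustrations, not functions exported by safe-prelude:

```haskell
-- A total alternative to Prelude's partial 'head'.
safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x

-- Division that makes the zero-divisor case explicit in the type.
safeDiv :: Integral a => a -> a -> Maybe a
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

main :: IO ()
main = do
    print (safeHead ([] :: [Int]))     -- Nothing, where 'head' would crash
    print (safeHead [1, 2, 3 :: Int])  -- Just 1
    print (safeDiv 7 (0 :: Int))       -- Nothing, where 'div' would throw
```

Callers are then forced by the type checker to handle the failure case.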


This list may fall out of date, so check the .cabal file for a current and complete listing. I'm keeping this here to include reasoning for some libraries:

  • bytestring and text, despite some complaints, are clearly the most popular representation for binary and textual data, respectively
  • containers and unordered-containers are both commonly used. Due to lack of generalization, this library doesn't expose any functions for working with their types, but they are common enough that adding the dependency just for exposing the type name is worth it
  • safe-exceptions hides the complexity of asynchronous exceptions, and should be used in place of Control.Exception
  • transformers and mtl are clear winners in the monad transformer space, at least for now
  • While young, say has been very useful for me in avoiding interleaved output issues
  • Others without real competitors: deepseq, semigroups

Packages I considered but have not included yet:

  • stm is an obvious winner, and while I use it constantly, I'm not convinced everyone else uses it as much as I do. Also, there are some questions around generalizing its functions (e.g., atomically could be in MonadIO), and I don't want to make that decision yet.

    • stm-chans falls into this category too
  • async is an amazing library, and in particular the race, concurrently, and Concurrently bits are an easy win. I've left it out for now due to questions of generalizing to MonadBaseControl (see lifted-async and its .Safe module)

  • Similar argument applies to monad-unlift

  • I didn't bother with exposing the Vector type... because which one would I expose? The Vector typeclass? Boxed Vector? Unboxed? I could do the classy-prelude thing and define type UVector = Data.Vector.Unboxed.Vector, but I'd rather not do such renamings.

Qualified imports

Here are the recommended qualified imports when working with safe-prelude.

import qualified "bytestring" Data.ByteString as B
import qualified "bytestring" Data.ByteString.Lazy as BL
import qualified "text" Data.Text as T
import qualified "text" Data.Text.Lazy as TL
import qualified "containers" Data.Map.Strict as Map
import qualified "containers" Data.Set as Set
import qualified "unordered-containers" Data.HashMap.Strict as HashMap
import qualified "unordered-containers" Data.HashSet as HashSet

January 16, 2017 12:00 AM

January 14, 2017

Dominic Steinitz

Calling Haskell from C

As part of improving the random number generation story for Haskell, I want to be able to use the testu01 library with the minimal amount of Haskell wrapping. testu01 assumes that there is a C function which returns the random number. The ghc manual gives an example but does not give all the specifics. These are my notes on how to get the example working under OS X (El Capitan 10.11.5 to be precise).

The Haskell:

{-# OPTIONS_GHC -Wall                 #-}

{-# LANGUAGE ForeignFunctionInterface #-}

module Foo where

foreign export ccall foo :: Int -> IO Int

foo :: Int -> IO Int
foo n = return (length (f n))

f :: Int -> [Int]
f 0 = []
f n = n:(f (n-1))

The .cabal:

name:                test-via-c
homepage:            TBD
license:             MIT
author:              Dominic Steinitz
category:            System
build-type:          Simple
cabal-version:       >=1.10

executable Foo.dylib
  main-is: Foo.hs
  other-extensions:    ForeignFunctionInterface
  build-depends:       base >=4.7 && =0.6 && <0.7
  hs-source-dirs:      src
  default-language:    Haskell2010
  include-dirs:        src
  ghc-options:         -O2 -shared -fPIC -dynamic
  extra-libraries:     HSrts-ghc8.0.1

On my computer running

cabal install

places the library in


The C:

#include <stdio.h>
#include "HsFFI.h"

#include "../dist/build/Foo.dylib/Foo.dylib-tmp/Foo_stub.h"

int main(int argc, char *argv[])
{
  int i;

  hs_init(&argc, &argv);

  for (i = 0; i < 5; i++) {
    printf("%d\n", foo(2500));
  }

  hs_exit();

  return 0;
}

On my computer this can be compiled with

gcc-6 Bar.c

and can be run with


N.B. setting DYLD_LIBRARY_PATH like this is not recommended as it is a good way of breaking things. I have tried setting DYLD_FALLBACK_LIBRARY_PATH but only to get an error message. Hopefully, at some point I will be able to post a robust way of getting the executable to pick up the required dynamic libraries.

by Dominic Steinitz at January 14, 2017 01:39 PM

January 13, 2017

Brent Yorgey

My new programming languages course

tl;dr: my new PL course is now finished, and all the course materials are freely available. Working through all the exercises should be a great option for anyone wishing to learn some basics of programming language design and implementation.

Last May, I wrote about my ideas for designing a new PL course, and got a lot of great comments and feedback. Well, somehow I survived the semester, and the course is now over. In the end I’m pretty happy with how it went (though of course there are always things that can be improved next time).

I decided to use class time in an unconventional way: for each class meeting I created a “module”, consisting of a literate Haskell file with some example code, explanatory text, and lots of holes where students needed to write answers to exercises or fill in code. I split the students into groups, and they spent class time just working through the module. Instead of standing at the front lecturing, I just wandered around watching them work and answering questions. It took a bit of getting used to—for the first few classes I couldn’t shake the feeling that I wasn’t really doing my job—but it quickly became clear that the students were really learning and engaging with the material in a way that they would not have been able to if I had just lectured.

A happy byproduct of this approach is that the modules are fairly self-contained and can now be used by anyone to learn the material. Reading through all the modules and working through the exercises should be a great option for anyone wishing to learn some basics of programming language design and implementation. For example, I know I will probably reuse it to get summer research students up to speed. Note that the course assumes no knowledge of Haskell (so those familiar with Haskell can safely skip the first few modules), but introduces just enough to get where I want to go.

I don’t plan to release any solutions, so don’t ask. But other than that, questions, comments, bug reports, etc. are welcome!

by Brent at January 13, 2017 09:55 PM

January 12, 2017

FP Complete

Containerizing a legacy application: an overview

An overview of what containerization is, the reasons to consider running a legacy application in Docker containers, the process to get it there, the issues you may run into, and next steps once you are deploying with containers. You'll reduce the stress of deployments, and take your first steps on the path toward no downtime and horizontal scaling.

Note: This post focuses on simplifying deployment of the application. It does not cover topics that may require re-architecting parts of the application, such as high-availability and horizontal scaling.


What is a "Legacy" App?

There's no one set of attributes that typifies all legacy apps, but common attributes include:

  • Using the local filesystem for persistent storage, with data files intermingled with application files.
  • Running many services on one server, such as a MySQL database, Redis server, Nginx web server, a Ruby on Rails application, and a bunch of cron jobs.
  • Installation and upgrades use a hodgepodge of scripts and manual processes (often poorly documented).
  • Configuration is stored in files, often in multiple places and intermingled with application files.
  • Inter-process communication uses the local filesystem (e.g. dropping files in one place for another process to pick up) rather than TCP/IP.
  • Designed assuming one instance of the application would run on a single server.

Disadvantages of the legacy approach

  • Automating deployments is difficult
  • If you need multiple customized instances of the application, it's hard to "share" a single server between multiple instances.
  • If the server goes down, it can take a while to replace due to manual processes.
  • Deploying new versions is a fraught manual or semi-manual process which is hard to roll back.
  • It's possible for test and production environments to drift apart, which leads to problems in production that were not detected during testing.
  • You cannot easily scale horizontally by adding more instances of the application.

What is "Containerization"?

"Containerizing" an application is the process of making it able to run and deploy under Docker containers and similar technologies that encapsulate an application with its operating system environment (a full system image). Since containers provide the application with an environment very similar to having full control of a system, this is a way to begin modernizing the deployment of the application while making minimal or no changes to the application itself. This provides a basis for incrementally making the application's architecture more "cloud-friendly."

Benefits of Containerization

  • Deployment becomes much easier: replacing the whole container image with a new one.
  • It's relatively easy to automate deployments, even having them driven completely from a CI (continuous integration) system.
  • Rolling back a bad deployment is just a matter of switching back to the previous image.
  • It's very easy to automate application updates since there are no "intermediate state" steps that can fail (either the whole deployment succeeds, or it all fails).
  • The same container image can be tested in a separate test environment, and then deployed to the production environment. You can be sure that what you tested is exactly the same as what is running in production.
  • Recovering a failed system is much easier, since a new container with exactly the same application can be automatically spun up on new hardware and attached to the same data stores.
  • Developers can also run containers locally to test their work in progress in a realistic environment.
  • Hardware can be used more efficiently, by running multiple containerized applications on a single host that ordinarily could not easily share a single system.
  • Containerizing is a good first step toward supporting no-downtime upgrades, canary deployments, high availability, and horizontal scaling.

Alternatives to containerization

  • Configuration management tools like Puppet and Chef help with some of the "legacy" issues such as keeping environments consistent, but they do not support the "atomic" deployment or rollback of the entire environment and application at once. This can still go wrong partway through a deployment with no easy way to roll everything back.

  • Virtual machine images are another way to achieve many of the same goals, and there are cases where it makes more sense to do the "atomic" deployment operations using entire VMs rather than containers running on a host. The main disadvantage is that hardware utilization may be less efficient, since VMs need dedicated resources (CPU, RAM, disk), whereas containers can share a single host's resources between them.

How to containerize


Identify filesystem locations where persistent data is written

Since deploying a new version of the application is performed by replacing the Docker image, any persistent data must be stored outside of the container. If you're lucky, the application already writes all its data to a specific path, but many legacy applications spread their data all over the filesystem and intermingle it with the application itself. Either way, Docker's volume mounts let us expose the host's filesystem to specific locations in the container filesystem so that data survives between containers, so we must identify the locations to persist.

You may at this stage consider modifying the application to support writing all data within a single tree in the filesystem, as that will simplify deployment of the containerized version. However, this is not necessary if modifying the application is impractical.

Identify configuration files and values that will vary by environment

Since a single image should be usable in multiple environments (e.g. test and production) to ensure consistency, any configuration values that will vary by environment must be identified so that the container can be configured at startup time. These could take the form of environment variables, or of values within one or more configuration files.

You may at this stage want to consider modifying the application to support reading all configuration from environment variables, as that will simplify containerizing it. However, this is not necessary if modifying the application is impractical.
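
As a sketch of what startup-time configuration from the environment can look like – here in Haskell, with hypothetical variable names (MYAPP_DB_HOST, MYAPP_DB_PORT) invented for illustration:

```haskell
import Data.Maybe (fromMaybe)
import System.Environment (lookupEnv)

data Config = Config
    { dbHost :: String
    , dbPort :: Int
    } deriving Show

-- Read environment-dependent settings at startup, falling back to
-- defaults suitable for local development when the variables are unset.
loadConfig :: IO Config
loadConfig = do
    host <- fromMaybe "localhost" <$> lookupEnv "MYAPP_DB_HOST"
    port <- maybe 5432 read <$> lookupEnv "MYAPP_DB_PORT"
    return (Config host port)

main :: IO ()
main = loadConfig >>= print
```

The same container image then runs unchanged in test and production; only the environment passed to `docker run -e` differs.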

Identify services that can be easily externalized

The application may use some services running on the local machine that are easy to externalize due to being highly independent and supporting communication by TCP/IP. For example, if you run a database such as MySQL or PostgreSQL or a cache such as Redis on the local system, that should be easy to run externally. You may need to adjust configuration to support specifying a hostname and port rather than assuming the service can be reached on localhost.

Creating the image

Create a Dockerfile that installs the application

If you already have the installation process automated via scripts or using a configuration management tool such as Chef or Puppet, this should be relatively easy. Start with an image of your preferred operating system, install any prerequisites, and then run the scripts.

If the current setup process is more manual, this will involve some new scripting. But since the exact state of the image is known, it's easier to script the process than it would be when you have to deal with the potentially inconsistent state of a raw system.

If you identified externalizable services earlier, you should modify the scripts to not install them.

A simple example Dockerfile:

# Start with an official Ubuntu 16.04 Docker image
FROM ubuntu:16.04

# Install prerequisite Ubuntu packages
RUN apt-get install -y <REQUIRED UBUNTU PACKAGES> \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*

# Copy the application into the image
ADD . /app

# Run the app setup script
RUN /app/

# Switch to the application directory
WORKDIR /app

# Specify the application startup script

Startup script for configuration

If the application takes all its configuration as environment variables already, then you don't need to do anything. However, if you have environment-dependent configuration values in configuration files, you will need to create an application startup script that reads these values from environment variables and then updates the configuration files.

A simple example startup script:

#!/usr/bin/env bash
set -e

# Append to the config file using $MYAPPCONFIG environment variable.
cat >>/app/config.txt <<END
my_app_config = "${MYAPPCONFIG}"
END

# Run the application using $MYAPPARG environment variable for an argument.
/app/bin/my-app --my-arg="${MYAPPARG}"

Push the image

After building the image (using docker build), it must be pushed to a Docker Registry so that it can be pulled on the machine where it will be deployed (if you are running on the same machine the image was built on, then this is not necessary).

You can use Docker Hub for images (a paid account lets you create private image repositories), or most cloud providers also provide their own container registries (e.g. Amazon ECR).

Give the image a tag (e.g. docker tag myimage mycompany/myimage:mytag) and then push it (e.g. docker push mycompany/myimage:mytag). Each image for a version of the application should have a unique tag, so that you always know which version you're using and so that images for older versions are available to roll back to.

How to deploy

Deploying containers is a big topic, and this section just focuses on directly running containers using docker commands. Tools like docker-compose (for simple cases where all containers run on a single server) and Kubernetes (for container orchestration across a cluster) should be considered in real-world usage.

Externalized services

Services you identified for externalization earlier can be run in separate Docker containers that will be linked to the main application. Alternatively, it is often easiest to outsource to managed services. For example, if you are using AWS, using RDS for a database or Elasticache for a cache significantly simplifies your life since they take care of maintenance, high availability, and backups for you.

An example of running a Postgres database container:

docker run \
    -d \
    --name db \
    -v /usr/local/var/docker/volumes/postgresql/data:/var/lib/postgresql/data \
    postgres

The application

To run the application in a Docker container, you use a command-line such as this:

docker run \
    -d \
    -p 8080:80 \
    --name myapp \
    -v /usr/local/var/docker/volumes/myappdata:/var/lib/myappdata \
    -e MYAPPCONFIG=myvalue \
    -e MYAPPARG=myarg \
    --link db:db \
    mycompany/myimage:mytag

The -p argument exposes the container's port 80 on the host's port 8080, the -v argument sets up the volume mount for persistent data (in hostpath:containerpath format), the -e argument sets a configuration environment variable (-v and -e may both be repeated for additional volumes and variables), and the --link argument links the database container so the application can communicate with it. The container will be started with the startup script you specified in the Dockerfile's CMD.


To upgrade to a new version of the application, stop the old container (e.g., docker rm -f myapp) and start a new one with the new image tag (this will require a brief down time). Rolling back is similar, except that you use the old image tag.

Additional considerations

"init" process (PID 1)

Legacy applications often run multiple processes, and it's not uncommon for orphan processes to accumulate if there is no "init" (PID 1) daemon to clean them up. Docker does not, by default, provide such a daemon, so it's recommended to add one as the ENTRYPOINT in your Dockerfile. dumb-init is an example lightweight init daemon, among others. phusion/baseimage is a fully-featured base image that includes an init daemon in addition to other services.

See our blog post dedicated to this topic: Docker demons: PID-1, orphans, zombies, and signals.

Daemons and cron jobs

The usual way to use Docker containers is to have a single process per container. Ideally, any cron jobs and daemons can be externalized into separate containers, but this is not always possible in legacy applications without re-architecting them. There is no intrinsic reason why containers cannot run many processes, but it does require some extra setup since standard base images do not include process managers and schedulers. Minimal process supervisors, such as runit, are more appropriate to use in containers than full-fledged systems like systemd. phusion/baseimage is a fully-featured base image that includes runit and cron, in addition to other services.

Volume-mount permissions

It's common (though not necessarily recommended) to run all processes in containers as the root user. Legacy applications often have more complex user requirements, and may need to run as a different user (or multiple processes as multiple users). This can present a challenge when using volume mounts, because Docker makes the mount points owned by root by default, which means non-root processes will not be able to write to them. There are two ways to deal with this.

The first approach is to create the directories on the host first, owned by the correct UID/GID, before starting the container. Note that since the container and host's users don't match up, you have to be careful to use the same UID/GID as the container, and not merely the same usernames.

The other approach is for the container itself to adjust the ownership of the mount points during its startup. This has to happen while running as root, before switching to a non-root user to start the application.

Database migrations

Database schema migrations always present a challenge for deployments, because the database schema can be very tightly coupled with the application, and that makes controlling the timing of the migration important, as well as making rolling back to an older version of the application more difficult since database migrations can't always be rolled back easily.

A way to mitigate this is to take a staged approach to migrations: if you need to make an incompatible schema change, you split that change over two application deployments. For example, if you want to move a piece of data from one location to another, these would be the phases:

  1. Write the data to both the old and new locations, and read it from the new location. This means that if you roll the application back to the previous version, any new data is still where it expects to find it.

  2. Stop writing it to the old location.

Note that if you want to have deployments with no downtime, that means running multiple versions of the application at the same time, which makes this even more of a challenge.

Backing up data

Backing up from a containerized application is usually easier than the non-containerized deployment. Data files can be backed up from the host and you don't risk any intermingling of data files with application files because they are strictly separated. If you've moved databases to managed services such as RDS, those can take care of backups for you (at least if your needs are relatively simple).

Migrating existing data

To transition the production application to the new containerized version, you will need to migrate the old deployment's data. How to do this will vary, but usually the simplest is to stop the old deployment, back up all the data, and restore it to the new deployment. This should be practiced in advance, and will necessitate some down time.


While it requires some up-front work, containerizing a legacy application will help you get control of, automate, and minimize the stress of deploying it. It sets you on a path toward modernizing your application and supporting no-downtime deployments, high availability, and horizontal scaling.

FP Complete has undertaken this process many times in addition to building containerized applications from the ground up. If you'd like to get on the path to modern and stress-free deployment of your applications, you can learn more about our Devops and Consulting services, or contact us straight away!

January 12, 2017 03:45 PM

January 11, 2017

Toby Goodwin

Artificial Superintelligence

I like Tim Urban's Wait But Why? site (tagline: new post every sometimes). But I thought his article on The AI Revolution - The Road to Superintelligence was dead wrong.

Vaguely at the back of my mind was that I ought to try to write some kind of rebuttal, but it would have taken a lot of time (which I don't have) to research the topic properly, and write it up.

So I was delighted to come across Superintelligence - The Idea That Eats Smart People which does a far better job than I ever could have done.

I recommend reading both of them.

January 11, 2017 09:50 PM

The GHC Team

GHC 8.0.2 is available!

The GHC team is happy to at last announce the 8.0.2 release of the Glasgow Haskell Compiler. Source and binary distributions are available at

This is the second release of the 8.0 series and fixes nearly two-hundred bugs. These include:

  • Interface file build determinism (#4012).
  • Compatibility with macOS Sierra and GCC compilers which compile position-independent executables by default
  • Compatibility with systems which use the gold linker
  • Runtime linker fixes on Windows (see #12797)
  • A compiler bug which resulted in undefined reference errors while compiling some packages (see #12076)
  • A number of memory consistency bugs in the runtime system
  • A number of efficiency issues in the threaded runtime which manifest on larger core counts and large numbers of bound threads.
  • A typechecker bug which caused some programs using -XDefaultSignatures to be incorrectly accepted.
  • More than two-hundred other bugs. See Trac for a complete listing.
  • #12757, which led to broken runtime behavior and even crashes in the presence of primitive strings.
  • #12844, a type inference issue affecting partial type signatures.
  • A bump of the directory library, fixing buggy path canonicalization behavior (#12894). Unfortunately this required a major version bump in directory and minor bumps in several other libraries.
  • #12912, where use of the select system call would lead to runtime system failures with large numbers of open file handles.
  • #10635, wherein -Wredundant-constraints was included in the -Wall warning set

A more detailed list of the changes included in this release can be found in the release notes.

Please note that this release breaks with our usual tendency to avoid major version bumps of core libraries in minor GHC releases by including an upgrade of the directory library to

Also note that, due to a rather serious bug (#13100) affecting Windows noticed late in the release cycle, the Windows binary distributions were produced using a slightly patched source tree. Users compiling from source for Windows should be certain to include this patch in their build.

This release is the result of six months of effort by the GHC development community. We'd like to thank everyone who has contributed code, bug reports, and feedback to this release. It's only due to their efforts that GHC remains a vibrant and exciting project.

How to get it

Both the source tarball and binary distributions for a wide variety of platforms are available here.


Haskell is a standardized lazy functional programming language.

The Glasgow Haskell Compiler (GHC) is a state-of-the-art programming suite for Haskell. Included is an optimising compiler generating efficient code for a variety of platforms, together with an interactive system for convenient, quick development. The distribution includes space and time profiling facilities, a large collection of libraries, and support for various language extensions, including concurrency, exceptions, and foreign language interfaces. GHC is distributed under a BSD-style open source license.

Supported Platforms

The list of platforms we support, and the people responsible for them, can be found on the GHC wiki

Ports to other platforms are possible with varying degrees of difficulty. The Building Guide describes how to go about porting to a new platform.


We welcome new contributors. Instructions on getting started with hacking on GHC are available from GHC's developer site.

Community Resources

There are mailing lists for GHC users, developers, and monitoring bug tracker activity; to subscribe, use the Mailman web interface.

There are several other Haskell- and GHC-related mailing lists; for the full list, see the lists page.

Some GHC developers hang out in the #ghc and #haskell channels on the Freenode IRC network, too. See the Haskell wiki for details.

Please report bugs using our bug tracking system. Instructions on reporting bugs can be found here.

by bgamari at January 11, 2017 06:40 PM

Christopher Done

Fast Haskell: Competing with C at parsing XML

In this post we’re going to look at parsing XML in Haskell, how it compares with an efficient C parser, and steps you can take in Haskell to build a fast library from the ground up. We’re going to get fairly detailed and get our hands dirty.

A new kid on the block

A few weeks ago Neil Mitchell posted a blog post about a new XML library that he’d written. The parser is written in C, and the API is written in Haskell which uses the C library. He writes that it’s very fast:

Hexml has been designed for speed. In the very limited benchmarks I’ve done it is typically just over 2x faster at parsing than Pugixml, where Pugixml is the gold standard for fast XML DOM parsers. In my uses it has turned XML parsing from a bottleneck to an irrelevance, so it works for me.

In order to achieve that speed, he cheats by not performing operations he doesn’t care about:

To gain that speed, Hexml cheats. Primarily it doesn’t do entity expansion, so &amp; remains as &amp; in the output. It also doesn’t handle CData sections (but that’s because I’m lazy) and comment locations are not remembered. It also doesn’t deal with most of the XML standard, ignoring the DOCTYPE stuff. [..] I only work on UTF8, which for the bits of UTF8 I care about, is the same as ASCII - I don’t need to do any character decoding.

Cheating is fine when you describe in detail how you cheat. That’s just changing the rules of the game!

But C has problems

This post caught my attention because it seemed to me a pity to use C. Whether you use Haskell, Python, or whatever, there are a few problems with dropping down to C from your high-level language:

  • The program is more likely to segfault. I’ll take an exception over a segfault any day!
  • The program opens itself up to possible exploitation due to lack of memory safety.
  • If people want to extend your software, they have to use C, and not your high-level language.
  • Portability (i.e. Windows) is a pain in the arse with C.

Sure enough, it wasn’t long before Austin Seipp posted a rundown of bugs in the C code:

At the moment, sorry to say – I wouldn’t use this library to parse any arbitrary XML, since it could be considered hostile, and get me owned. Using American Fuzzy Lop, just after a few minutes, I’ve already found around ~30 unique crashes.

But C is really fast right? Like 100s of times faster than Haskell! It’s worth the risk.

But-but C is fast!

Let’s benchmark it. We’re going to parse a 4KB, a 31KB and a 211KB XML file.

Using the Criterion benchmarking package, we can compare Hexml against the pretty old Haskell xml package…

File   hexml    xml
4KB    6.26 μs  1.94 ms (1940 μs)
31KB   9.41 μs  13.6 ms (13600 μs)
211KB  260  μs  25.9 ms (25900 μs)

Ouch! Those numbers don’t look good. The xml package is 100-300x slower.

Okay, I’m being unfair. The xml package isn’t known for speed. Its package description is simply A simple XML library. Let’s compare with the hexpat package. That one has this in its description:

The design goals are speed, speed, speed, interface simplicity and modularity.

So that’s probably more representative of the best in Haskell XML parsers. It’s also based on the C expat library, which is supposed to be fast.

File   hexml     hexpat
4KB    6.395 μs  320.3 μs
31KB   9.474 μs  378.3 μs
211KB  256.2 μs  25.68 ms

That’s a bit better. We’re now between 40-100x slower than Hexml. I’d prefer 10x slower, but it’s a more reasonable outcome. The hexpat package handles keeping location information, reasonable parse errors, and the complete XML standard. Hexml doesn’t do any of that.

Let’s set us a challenge. Can we match or beat the Hexml package in plain old Haskell? This is an itch that got under my skin. I emailed Neil and he was fine with it:

I don’t think it’s unfair or attacky to use Hexml as the baseline - I’d welcome it!

I’ll walk you through my approach. I called my library Xeno (for obvious reasons).

Start with the simplest thing possible

…and make sure it’s fast. Here’s the first thing I wrote, to see how fast it was to walk across a file compared with Hexml.

module Xeno (parse) where
import           Data.ByteString (ByteString)
import qualified Data.ByteString as S
import           Data.Word

-- | Parse an XML document.
parse :: ByteString -> ()
parse str = parseTags 0
  where
    parseTags index =
      case elemIndexFrom 60 str index of
        Nothing -> ()
        Just fromLt ->
          case elemIndexFrom 62 str fromLt of
            Nothing -> ()
            Just fromGt ->
              parseTags (fromGt + 1)

-- | Get index of an element starting from offset.
elemIndexFrom :: Word8 -> ByteString -> Int -> Maybe Int
elemIndexFrom c str offset = fmap (+ offset) (S.elemIndex c (S.drop offset str))
{-# INLINE elemIndexFrom #-}

The numbers 60 and 62 are < and >. In XML the only characters that matter are < and > (if you don’t care about entities). < and > can’t appear inside speech marks (attributes). They are the only important things to search for. Results:

File   hexml     xeno
4KB    6.395 μs  2.630 μs
42KB   37.55 μs  7.814 μs

So the baseline performance of walking across the file in jumps is quite fast! Why is it fast? Let’s look at that for a minute:

  • The ByteString data type is a safe wrapper around a vector of bytes. Underneath, it’s equivalent to a char* in C.
  • With that in mind, the S.elemIndex function is implemented using the standard C function memchr(3). As we all know, memchr jumps across your file in word-sized strides, or even uses SIMD operations, meaning it’s bloody fast. But the elemIndex function itself is safe.

So we’re effectively doing a for(..) { s=memchr(s,..) } loop over the file.
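To make that concrete, here’s a tiny standalone check of my own (not code from the post) showing S.elemIndex locating the byte values for < and >:

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8

main :: IO ()
main = do
  let doc = S8.pack "<a><b>hi</b></a>"
  -- 60 and 62 are the byte values of '<' and '>';
  -- S.elemIndex is backed by memchr(3) on the underlying byte buffer.
  print (S.elemIndex 60 doc)  -- Just 0
  print (S.elemIndex 62 doc)  -- Just 2
```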

Keep an eye on the allocations

Using the weigh package for memory allocation tracking, we can also look at allocations of our code right now:

Case         Bytes  GCs  Check
4kb parse    1,168    0  OK
42kb parse   1,560    0  OK
52kb parse   1,168    0  OK
182kb parse  1,168    0  OK

We see that it’s constant. Okay, it varies by a few bytes, but it doesn’t increase linearly or anything. That’s good! One thing that stood out to me is that we didn’t pay for allocation of the Maybe values. For 1000 < and > characters, we should have seen 1000 allocations of Just/Nothing. Let’s go down that rabbit hole for a second.

Looking at the Core

Well, if you compile the source like this

stack ghc -- -O2 -ddump-simpl Xeno.hs

You’ll see a dump of the real Core code that is generated after the Haskell code is desugared, and before it’s compiled to machine code. At this stage you can already see optimizations based on inlining, common-sub-expression elimination, deforestation, and other things.

The output is rather large. Core is verbose, and fast code tends to be longer. Here is the output, but you don’t have to understand it. Just note that there’s no mention of Maybe, Just or Nothing in there. It skips that altogether. See here specifically. There is a call to memchr, then there is an eqAddr comparison with NULL, to see whether the memchr is done or not. But we’re still doing safety checks so that the resulting code is safe.

Inlining counts

The curious reader might have noticed that INLINE line in my first code sample.

{-# INLINE elemIndexFrom #-}

Without the INLINE, the whole function is twice as slow and has linear allocation.

With the INLINE:

Case        Bytes  GCs  Check
4kb parse   1,472    0  OK
42kb parse  1,160    0  OK
52kb parse  1,160    0  OK

benchmarking 4KB/xeno
time                 2.512 μs   (2.477 μs .. 2.545 μs)
benchmarking 211KB/xeno
time                 129.9 μs   (128.7 μs .. 131.2 μs)
benchmarking 31KB/xeno
time                 1.930 μs   (1.909 μs .. 1.958 μs)


Without the INLINE:

Case         Bytes  GCs  Check
4kb parse   12,416    0  OK
42kb parse  30,080    0  OK
52kb parse  46,208    0  OK

benchmarking 4KB/xeno
time                 5.258 μs   (5.249 μs .. 5.266 μs)
benchmarking 211KB/xeno
time                 265.9 μs   (262.4 μs .. 271.4 μs)
benchmarking 31KB/xeno
time                 3.212 μs   (3.209 μs .. 3.218 μs)

Always pay attention to things like this. You don’t want to put INLINE on everything. Sometimes it adds slowdown, most times it makes no difference. So check with your benchmark suite.

Loop unrolling manually

Some things need to be done manually. I added comment parsing to our little function:

+        Just fromLt -> checkOpenComment (fromLt + 1)
+    checkOpenComment index =
+      if S.isPrefixOf "!--" (S.drop index str)
+         then findCommentEnd (index + 3)
+         else findLT index
+    findCommentEnd index =
+      case elemIndexFrom commentChar str index of
+        Nothing -> () -- error!
+        Just fromDash ->
+          if S.isPrefixOf "->" (S.drop (fromDash + 1) str)
+             then findGT (fromDash + 2)
+             else findCommentEnd (fromDash + 1)

And it became 2x slower:

Before:

benchmarking 4KB/xeno
time                 2.512 μs   (2.477 μs .. 2.545 μs)


After:

benchmarking 4KB/xeno
time                 4.296 μs   (4.240 μs .. 4.348 μs)

So I unrolled the S.isPrefixOf call into individual S.index calls, like this:

-      if S.isPrefixOf "!--" (S.drop index str)
-         then findCommentEnd (index + 3)
-         else findLt index
+      if S.index this 0 == bangChar &&
+         S.index this 1 == commentChar &&
+         S.index this 2 == commentChar
+        then findCommentEnd (index + 3)
+        else findLt index
+      where
+        this = S.drop index str

And it dropped back down to our base speed again.
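As a standalone sketch of the unrolled check (assuming, as the names in the diff suggest, that bangChar and commentChar are the bytes for '!' and '-'):

```haskell
import           Data.ByteString (ByteString)
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
import           Data.Word (Word8)

bangChar, commentChar :: Word8
bangChar = 33     -- '!'
commentChar = 45  -- '-'

-- True if the bytes at 'index' start a comment opener "!--".
-- Note: like the unrolled code above, this assumes at least one byte
-- remains at 'index'; the later checks short-circuit on mismatch.
isOpenComment :: ByteString -> Int -> Bool
isOpenComment str index =
  S.index this 0 == bangChar &&
  S.index this 1 == commentChar &&
  S.index this 2 == commentChar
  where this = S.drop index str

main :: IO ()
main = print (isOpenComment (S8.pack "<!-- hi -->") 1)  -- True
```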

Finding tag names

I implemented finding tag names like this:

+    findTagName index0 =
+      case S.findIndex (not . isTagName) (S.drop index str) of
+        Nothing -> error "Couldn't find end of tag name."
+        Just ((+ index) -> spaceOrCloseTag) ->
+          if S.head this == closeTagChar
+            then findLT spaceOrCloseTag
+            else if S.head this == spaceChar
+                   then findGT spaceOrCloseTag
+                   else error
+                          ("Expecting space or closing '>' after tag name, but got: " ++
+                           show this)
+          where this = S.drop spaceOrCloseTag str
+      where
+        index =
+          if S.head (S.drop index0 str) == questionChar ||
+             S.head (S.drop index0 str) == slashChar
+            then index0 + 1
+            else index0

And I immediately noticed a big slowdown, from:

Case        Bytes  GCs  Check
4kb parse   1,160    0  OK
42kb parse  1,472    0  OK
52kb parse  1,160    0  OK
Benchmark xeno-memory-bench: FINISH
Benchmark xeno-speed-bench: RUNNING...
benchmarking 4KB/hexml
time                 6.149 μs   (6.125 μs .. 6.183 μs)
benchmarking 4KB/xeno
time                 2.691 μs   (2.665 μs .. 2.712 μs)


to:

Case          Bytes  GCs  Check
4kb parse    26,096    0  OK
42kb parse   65,696    0  OK
52kb parse  102,128    0  OK
Benchmark xeno-memory-bench: FINISH
Benchmark xeno-speed-bench: RUNNING...
benchmarking 4KB/hexml
time                 6.225 μs   (6.178 μs .. 6.269 μs)
benchmarking 4KB/xeno
time                 10.34 μs   (10.06 μs .. 10.59 μs)

The first thing that should jump out at you is the allocations. What’s going on there? To find out, I ran stack bench --profile and looked at the profiler output.

	Wed Jan 11 17:41 2017 Time and Allocation Profiling Report  (Final)

	   xeno-speed-bench +RTS -N -p -RTS 4KB/xeno

	total time  =        8.09 secs   (8085 ticks @ 1000 us, 1 processor)
	total alloc = 6,075,628,752 bytes  (excludes profiling overheads)

COST CENTRE            MODULE                             %time %alloc

parse.findTagName      Xeno                                35.8   72.7
getOverhead            Criterion.Monad                     13.6    0.0
parse.checkOpenComment Xeno                                 9.9    0.0
parse.findLT           Xeno                                 8.9    0.0
parse                  Xeno                                 8.4    0.0
>>=                    Data.Vector.Fusion.Util              4.6    7.7
getGCStats             Criterion.Measurement                2.8    0.0
basicUnsafeIndexM      Data.Vector.Primitive                1.6    2.0
fmap                   Data.Vector.Fusion.Stream.Monadic    1.3    2.2
rSquare.p              Statistics.Regression                1.3    1.5
basicUnsafeWrite       Data.Vector.Primitive.Mutable        1.2    1.4
innerProduct.\         Statistics.Matrix.Algorithms         1.0    1.6
qr.\.\                 Statistics.Matrix.Algorithms         0.8    1.2
basicUnsafeSlice       Data.Vector.Primitive.Mutable        0.5    1.1
transpose              Statistics.Matrix                    0.5    1.3

Right at the top, we have findTagName, doing all the allocations. So I looked at the code and found that the only thing that could be allocating is S.drop. This function skips n elements at the start of a ByteString. It turns out that S.head (S.drop index0 str) was allocating an intermediate ByteString just to get at its first character. It wasn’t copying the whole string, but it was allocating a new pointer to it.

So I realised that I could just replace S.head (S.drop n s) with S.index s n:

-          if S.head this == closeTagChar
+          if S.index str spaceOrCloseTag == closeTagChar
             then findLT spaceOrCloseTag
-            else if S.head this == spaceChar
+            else if S.index str spaceOrCloseTag == spaceChar
                    then findGT spaceOrCloseTag
                    else error "Expecting space or closing '>' after tag name."
-          where this = S.drop spaceOrCloseTag str
         index =
-          if S.head (S.drop index0 str) == questionChar ||
-             S.head (S.drop index0 str) == slashChar
+          if S.index str index0 == questionChar ||
+             S.index str index0 == slashChar

And sure enough, the allocations disappeared:

Case        Bytes  GCs  Check
4kb parse   1,160    0  OK
42kb parse  1,160    0  OK
52kb parse  1,472    0  OK
Benchmark xeno-memory-bench: FINISH
Benchmark xeno-speed-bench: RUNNING...
benchmarking 4KB/hexml
time                 6.190 μs   (6.159 μs .. 6.230 μs)
benchmarking 4KB/xeno
time                 4.215 μs   (4.175 μs .. 4.247 μs)

Down to 4.215 μs. That’s not as fast as our pre-name-parsing 2.691 μs. But we had to pay something for the extra operations per tag. We’re just not allocating anymore, which is great.
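As a sanity check that the replacement is behavior-preserving, here’s a tiny standalone check of my own: for any in-bounds index, S.head (S.drop n s) and S.index s n yield the same byte.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8

main :: IO ()
main = do
  let s = S8.pack "<foo>"
  -- Same byte, but S.index doesn't allocate the intermediate
  -- ByteString header that S.drop does.
  print (S.head (S.drop 3 s) == S.index s 3)  -- True
```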

SAX for free

Eventually I ended up with a function called process that parses XML and triggers events in a SAX style:

process
  :: Monad m
  => (ByteString -> m ())               -- ^ Open tag.
  -> (ByteString -> ByteString -> m ()) -- ^ Tag attribute.
  -> (ByteString -> m ())               -- ^ End open tag.
  -> (ByteString -> m ())               -- ^ Text.
  -> (ByteString -> m ())               -- ^ Close tag.
  -> ByteString -> m ()

Thanks again to GHC’s optimizations, calling this function purely and doing nothing is exactly equal to the function before SAX-ization:

-- | Parse the XML but return no result, process no events.
validate :: ByteString -> Bool
validate s =
  case spork
         (runIdentity
            (process
               (\_ -> pure ())
               (\_ _ -> pure ())
               (\_ -> pure ())
               (\_ -> pure ())
               (\_ -> pure ())
               s)) of
    Left (_ :: XenoException) -> False
    Right _ -> True

Case        Bytes  GCs  Check
4kb parse   1,472    0  OK
42kb parse  1,160    0  OK
52kb parse  1,472    0  OK

benchmarking 4KB/xeno
time                 4.320 μs   (4.282 μs .. 4.361 μs)

This function performs at the same speed as process before it accepted any callback arguments. This means that the only overhead to SAX’ing will be the activities that the callback functions themselves do.
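To make the callback-driven style concrete, here is a toy, self-contained walker in the same spirit. This is my own simplified sketch, not Xeno’s actual process: it only fires an open-tag event and uses a crude letters-only name scan.

```haskell
import           Data.ByteString (ByteString)
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
import           Data.IORef

-- Fire 'onOpen' for each opening tag name; everything else is ignored.
openTags :: Monad m => (ByteString -> m ()) -> ByteString -> m ()
openTags onOpen str = go 0
  where
    go i =
      case S.elemIndex 60 (S.drop i str) of    -- 60 == '<'
        Nothing -> pure ()
        Just d -> do
          let after = i + d + 1                -- first byte past '<'
              name = S.takeWhile isNameByte (S.drop after str)
          if S.null name then pure () else onOpen name
          go after
    -- ASCII letters only; close tags start with '/' and so are skipped.
    isNameByte b = (b >= 97 && b <= 122) || (b >= 65 && b <= 90)

main :: IO ()
main = do
  ref <- newIORef []
  openTags (\t -> modifyIORef ref (t :)) (S8.pack "<foo><bar>hi</bar></foo>")
  tags <- readIORef ref
  print (reverse tags)  -- ["foo","bar"]
```

Because the walker is polymorphic in the monad, a pure caller pays nothing extra, mirroring the point above.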

Specialization is for insects (and, as it happens, optimized programs)

One point of interest is that adding a SPECIALIZE pragma for the process function increases speed by roughly 1 μs. A generic (type-class polymorphic) function normally accepts a dictionary argument at runtime for the particular instance; specialization instead generates a separate piece of code compiled for that exact instance. Below is the specialized type of process for the Identity monad (i.e. just pure, does nothing).

process
      :: (ByteString -> Identity ())
      -> (ByteString -> ByteString -> Identity ())
      -> (ByteString -> Identity ())
      -> (ByteString -> Identity ())
      -> (ByteString -> Identity ())
      -> ByteString
      -> Identity ()


Without specialization:

benchmarking 4KB/xeno-sax
time                 5.877 μs   (5.837 μs .. 5.926 μs)
benchmarking 211KB/xeno-sax
time                 285.8 μs   (284.7 μs .. 287.4 μs)


With specialization:

benchmarking 4KB/xeno-sax
time                 5.046 μs   (5.036 μs .. 5.056 μs)
benchmarking 211KB/xeno-sax
time                 240.6 μs   (240.0 μs .. 241.5 μs)

In the 4KB case it’s only 800 ns, but as we say in Britain, take care of the pennies and the pounds will look after themselves. The 285->240 μs difference isn’t big in practical terms, but when we’re playing the speed game, we pay attention to things like that.
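For reference, the pragma mechanism itself is small; here is a toy example of my own (not Xeno’s process) showing SPECIALIZE on an overloaded function:

```haskell
double :: Num a => a -> a
double x = x + x
-- Ask GHC to also compile a dictionary-free copy for the Int instance:
{-# SPECIALIZE double :: Int -> Int #-}

main :: IO ()
main = print (double (21 :: Int))  -- 42
```

Calls at type Int are then routed to the specialized copy, skipping the Num dictionary lookup at runtime.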

Where we stand: Xeno vs Hexml

Currently the SAX interface in Xeno outperforms Hexml in space and time. Hurrah! We’re as fast as C!

File   hexml-dom  xeno-sax
4KB    6.134 μs   5.147 μs
31KB   9.299 μs   2.879 μs
211KB  257.3 μs   241.0 μs

It’s also worth noting that Haskell does this all safely. All the functions I’m using are standard ByteString functions which do bounds checking and throw an exception on out-of-bounds access. We don’t accidentally access memory that we shouldn’t, and we don’t segfault. The server keeps running.

If you’re interested, if we switch to unsafe functions (unsafeTake, unsafeIndex from the Data.ByteString.Unsafe module), we get a notable speed increase:

File   hexml-dom  xeno-sax
4KB    6.134 μs   4.344 μs
31KB   9.299 μs   2.570 μs
211KB  257.3 μs   206.9 μs

We don’t need to show off, though. We’ve already made our point. We’re Haskellers, we like safety. I’ll keep my safe functions.

But Hexml does more!

I’d be remiss if I didn’t address the fact that Hexml does more useful things than we’ve done here. Hexml allocates a DOM for random access. Oh no! Allocation: Haskell’s worst enemy!

We’ve seen that Haskell allocates a lot normally. Actually, have we looked at that properly?

Case                   Bytes  GCs  Check
4kb/hexpat-sax       444,176    0  OK
31kb/hexpat-sax      492,576    0  OK
211kb/hexpat-sax  21,112,392   40  OK
4kb/hexpat-dom       519,128    0  OK
31kb/hexpat-dom      575,232    0  OK
211kb/hexpat-dom  23,182,560   44  OK


Implementing a DOM parser for Xeno

All isn’t lost. Hexml isn’t a dumb parser that’s fast because it’s in C, it’s also a decent algorithm. Rather than allocating a tree, it allocates a big flat vector of nodes and attributes, which contain offsets into the original string. We can do that in Haskell too!

Here’s my design of a data structure contained in a vector. We want to store just integers in the vector. Integers that point to offsets in the original string. Here’s what I came up with.

We have three kinds of payloads. Elements, text and attributes:

1.  00 # Type tag: element
2.  00 # Parent index (within this array)
3.  01 # Start of the tag name in the original string
4.  01 # Length of the tag name
5.  05 # End index of the tag (within this array)
1.  02 # Type tag: attribute
2.  01 # Start of the key
3.  05 # Length of the key
4.  06 # Start of the value
5.  03 # Length of the value
1.  01 # Type tag: text
2.  01 # Start of the text
3.  10 # Length of the text

That’s all the detail I’m going to go into. You can read the code if you want to know more. It’s not a highly optimized format. Once we have such a vector, it’s possible to define a DOM API on top of it which can let you navigate the tree as usual, which we’ll see later.
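As a rough illustration of the idea (the offsets below are mine and only indicative; the authoritative layout is the code), the document <e>hello</e> might flatten to:

```haskell
import qualified Data.Vector.Unboxed as UV

-- "<e>hello</e>": tag name "e" at string offset 1 (length 1),
-- text "hello" at string offset 3 (length 5).
flat :: UV.Vector Int
flat = UV.fromList
  [ 0x00, 0, 1, 1, 8  -- element: parent 0, name start 1, name len 1, ends at array index 8
  , 0x01, 3, 5        -- text: start 3, length 5
  ]

main :: IO ()
main = print (UV.length flat)  -- 8
```

Everything is an Int offset into either the original string or the vector itself, so the whole structure lives in one unboxed allocation.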

We’re going to use our SAX parser (the process function) and implement a function that writes to a big array. This is a very imperative algorithm. Haskellers don’t like imperative algorithms much, but Haskell’s fine with them.

The function ends up looking something like this:

  (do nil <- UMV.new 1000
      vecRef <- newSTRef nil
      sizeRef <- fmap asURef (newRef 0)
      parentRef <- fmap asURef (newRef 0)
      process
        (\(PS _ name_start name_len) ->
            <write the open tag elements>)
        (\(PS _ key_start key_len) (PS _ value_start value_len) ->
            <write an attribute into the vector>)
        (\_ -> <ignore>)
        (\(PS _ text_start text_len) ->
            <write a text entry into the vector>)
        (\_ ->
            <set the end position of the parent>
            <set the current element to the parent>)
      wet <- readSTRef vecRef
      arr <- UV.unsafeFreeze wet
      size <- readRef sizeRef
      return (UV.unsafeSlice 0 size arr))

The function runs in the ST monad which lets us locally read and write to mutable variables and vectors, while staying pure on the outside.

I allocate an array of 1000 64-bit Ints (on 64-bit arch), I keep a variable of the current size, and the current parent (if any). The current parent variable lets us, upon seeing a </close> tag, assign the position in the vector of where the parent is closed.

Whenever we get an event and the array is too small, I grow the array by doubling its size. This strategy is copied from the Hexml package.

Finally, when we’re done, we get the mutable vector, “freeze” it (this means making an immutable version of it), and then return that copy. We use unsafeFreeze to re-use the array without copying, which includes a promise that we don’t use the mutable vector afterwards, which we don’t.
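The grow-by-doubling plus unsafeFreeze pattern can be sketched in isolation like this (my own minimal version, not Xeno’s code):

```haskell
import           Control.Monad (forM_)
import           Control.Monad.ST (runST)
import           Data.STRef
import qualified Data.Vector.Unboxed as UV
import qualified Data.Vector.Unboxed.Mutable as UMV

-- Append each Int to a mutable vector, doubling capacity on overflow,
-- then freeze without copying and slice down to the used size.
collect :: [Int] -> UV.Vector Int
collect xs = runST $ do
  vec0 <- UMV.new 4
  vecRef <- newSTRef vec0
  sizeRef <- newSTRef (0 :: Int)
  forM_ xs $ \x -> do
    vec <- readSTRef vecRef
    size <- readSTRef sizeRef
    vec' <-
      if size < UMV.length vec
        then pure vec
        else do v' <- UMV.grow vec (UMV.length vec)  -- double the capacity
                writeSTRef vecRef v'
                pure v'
    UMV.write vec' size x
    writeSTRef sizeRef (size + 1)
  vec <- readSTRef vecRef
  size <- readSTRef sizeRef
  arr <- UV.unsafeFreeze vec  -- no copy; we promise not to mutate again
  pure (UV.slice 0 size arr)

main :: IO ()
main = print (collect [1 .. 10] == UV.fromList [1 .. 10])  -- True
```

ST keeps the mutation local, so collect is observably pure despite the imperative loop.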

The DOM speed

Let’s take a look at the speeds:

File   hexml-dom  xeno-sax  xeno-dom
4KB    6.123 μs   5.038 μs  10.35 μs
31KB   9.417 μs   2.875 μs  5.714 μs
211KB  256.3 μs   240.4 μs  514.2 μs

Not bad! The DOM parser is only <2x slower than Hexml (except on the 31KB file, where it’s actually faster; shrug). Here is where I stopped optimizing and decided it was good enough. But we can review some of the decisions made along the way.

In the code we’re using unboxed mutable references for the current size and parent, the mutable references are provided by the mutable-containers package. See these two lines here:

      sizeRef <- fmap asURef (newRef 0)
      parentRef <- fmap asURef (newRef 0)

Originally, I had tried STRefs, which are boxed. Boxed just means it’s a pointer to an integer instead of an actual integer. An unboxed Int is a proper machine integer. Using an STRef, we get worse speeds:

File   xeno-dom
4KB    12.18 μs
31KB   6.412 μs
211KB  631.1 μs

Which is a noticeable speed loss.
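Swapping a boxed reference for an unboxed one is a one-line change with the mutable-containers package; here is a minimal sketch of that API as used above:

```haskell
import Data.Mutable (asURef, newRef, readRef, modifyRef')

main :: IO ()
main = do
  -- asURef pins the reference type to URef: an unboxed machine Int,
  -- not a pointer to a heap-allocated Int.
  ref <- fmap asURef (newRef (0 :: Int))
  modifyRef' ref (+ 1)
  n <- readRef ref
  print n  -- 1
```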

Another thing to take into consideration is the array type. I’m using the unboxed mutable vectors from the vector package. When using atomic types like Int, it can be a leg-up to use unboxed vectors. If I use the regular boxed vectors from Data.Vector, the speed regresses to:

File   xeno-dom
4KB    11.95 μs (from 10.35 μs)
31KB   6.430 μs (from 5.714 μs)
211KB  1.402 ms (from 514.2 μs)

Aside from taking a bit more time to do writes, it also allocates 1.5x more stuff:

Unboxed vectors:

Case                 Bytes  GCs  Check
4kb/xeno/dom        11,240    0  OK
31kb/xeno/dom       10,232    0  OK
211kb/xeno/dom   1,082,696    0  OK


Boxed vectors:

Case                 Bytes  GCs  Check
4kb/xeno/dom        22,816    0  OK
31kb/xeno/dom       14,968    0  OK
211kb/xeno/dom   1,638,392    1  OK

See that GC there? We shouldn’t need it.

Finally, one more remark for the DOM parser. If we forsake safety and use the unsafeWrite and unsafeRead methods from the vector package, we do see a small speed-up:

File   xeno-dom
4KB    9.827 μs
31KB   5.545 μs
211KB  490.1 μs

But it’s nothing to write home about. I’ll prefer memory safety over a few microseconds this time.


I wrote some functions to access our vector and provide a DOM-like API:

> let Right node = parse "<foo k='123'><p>hi</p>ok</foo>"
> node
(Node "foo" [("k","123")] [Element (Node "p" [] [Text "hi"]),Text "ok"])
> name node
"foo"
> children node
[(Node "p" [] [Text "hi"])]
> attributes node
[("k","123")]
> contents node
[Element (Node "p" [] [Text "hi"]),Text "ok"]

So that works.


The final results are in. And just to check that a 1MB file doesn’t give wildly different results:

benchmarking 1MB/hexml-dom
time                 1.225 ms   (1.221 ms .. 1.229 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.239 ms   (1.234 ms .. 1.249 ms)
std dev              25.23 μs   (12.28 μs .. 40.84 μs)

benchmarking 1MB/xeno-sax
time                 1.206 ms   (1.203 ms .. 1.211 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.213 ms   (1.210 ms .. 1.218 ms)
std dev              14.58 μs   (10.18 μs .. 21.34 μs)

benchmarking 1MB/xeno-dom
time                 2.768 ms   (2.756 ms .. 2.779 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.801 ms   (2.791 ms .. 2.816 ms)
std dev              41.10 μs   (30.14 μs .. 62.60 μs)

Tada! We matched Hexml, in pure Haskell, using safe accessor functions. We provided a SAX API which is very fast, and a simple demonstration DOM parser with a familiar API which is also quite fast. We use reasonably little memory in doing so.

UPDATE: Some people asked for comparisons with libxml2 (assuming that it’s somehow faster). Here is a Criterion report.

This package is an experiment for educational purposes, to show what Haskell can do and what it can’t, for a very specific domain problem. If you would like to use this package, consider adopting it and giving it a good home. I’m not looking for more packages to maintain.

January 11, 2017 12:00 AM

January 10, 2017

Roman Cheplyaka

Nested monadic loops may cause space leaks

Consider the following trivial Haskell program:

main :: IO ()
main = worker

{-# NOINLINE worker #-}
worker :: (Monad m) => m ()
worker =
  let loop = poll >> loop
  in loop

poll :: (Monad m) => m a
poll = return () >> poll

It doesn’t do much — except, as it turns out, eat a lot of memory!

% ./test +RTS -s & sleep 1s && kill -SIGINT %1
     751,551,192 bytes allocated in the heap                                               
   1,359,059,768 bytes copied during GC
     450,901,152 bytes maximum residency (11 sample(s))
       7,166,816 bytes maximum slop
             888 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1429 colls,     0 par    0.265s   0.265s     0.0002s    0.0005s
  Gen  1        11 colls,     0 par    0.701s   0.703s     0.0639s    0.3266s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.218s  (  0.218s elapsed)
  GC      time    0.966s  (  0.968s elapsed)
  EXIT    time    0.036s  (  0.036s elapsed)
  Total   time    1.223s  (  1.222s elapsed)

  %GC     time      79.0%  (79.2% elapsed)

  Alloc rate    3,450,267,071 bytes per MUT second

  Productivity  21.0% of total user, 21.0% of total elapsed

These nested loops happen often in server-side programming. About a year ago, when I worked for Signal Vine, this happened to my code: the inner loop was a big streaming computation; the outer loop was something that would restart the inner loop should it fail.

Later that year, Edsko de Vries blogged about a very similar issue.

Recently, Sean Clark Hess observed something similar. In his case, the inner loop waits for a particular AMQP message, and the outer loop calls the inner loop repeatedly to extract all such messages.

So why would such an innocent-looking piece of code consume unbounded amounts of memory? To find out, let’s trace the program execution on the STG level.

Background: STG and IO

The runtime model of ghc-compiled programs is described in the paper Making a Fast Curry: Push/Enter vs. Eval/Apply for Higher-order Languages. Here are the grammar and the reduction rules, for quick reference.



It is going to be important that the IO type in GHC is a function type:

newtype IO a = IO (State# RealWorld -> (# State# RealWorld, a #))

Here are a few good introductions to the internals of IO: from Edsko de Vries, Edward Z. Yang, and Michael Snoyman.
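To see that IO really is just a state-passing function under the newtype, here is a tiny GHC-specific sketch of my own, using the internal constructor purely for illustration:

```haskell
{-# LANGUAGE UnboxedTuples #-}
import GHC.IO (IO (..))

-- An IO action written directly as a state transformer:
-- thread the RealWorld state token through and return unit.
unit :: IO ()
unit = IO (\s -> (# s, () #))

main :: IO ()
main = unit >> putStrLn "done"
```

Every bind is, operationally, just a function application that passes the state token along, which is what the STG trace below makes visible.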

Our program in STG

Let’s see now how our program translates to STG. This is a translation done by ghc 8.0.1 with -O -ddump-stg -dsuppress-all:

poll_rnN =
    sat-only \r srt:SRT:[] [$dMonad_s312]
        let { sat_s314 = \u srt:SRT:[] [] poll_rnN $dMonad_s312; } in
        let { sat_s313 = \u srt:SRT:[] [] return $dMonad_s312 ();
        } in  >> $dMonad_s312 sat_s313 sat_s314;

worker =
    \r srt:SRT:[] [$dMonad_s315]
        let {
          loop_s316 =
              \u srt:SRT:[] []
                  let { sat_s317 = \u srt:SRT:[] [] poll_rnN $dMonad_s315;
                  } in  >> $dMonad_s315 sat_s317 loop_s316;
        } in  loop_s316;

main = \u srt:SRT:[r2 :-> $fMonadIO] [] worker $fMonadIO;

This is the STG as understood by ghc itself. In the notation of the fast curry paper introduced above, this (roughly) translates to:

main = THUNK(worker monadIO realWorld);

worker = FUN(monad ->
  let {
    loop = THUNK(let {worker_poll_thunk = THUNK(poll monad);}
                 in then monad worker_poll_thunk loop);
  } in loop
);

poll = FUN(monad ->
  let {
    ret_thunk = THUNK(return monad unit);
    poll_poll_thunk = THUNK(poll monad);
  } in then monad ret_thunk poll_poll_thunk
);

monadIO is the record (“dictionary”) that contains the Monad methods >>=, >>, and return for the IO type. We will need return and >> (called then here) in particular; here is how they are defined:

returnIO = FUN(x s -> (# s, x #));
thenIO = FUN(m k s ->
  case m s of {
    (# new_s, result #) -> k new_s
  });
monadIO = CON(Monad returnIO thenIO);
return = FUN(monad ->
  case monad of {
    Monad return then -> return
  });
then = FUN(monad ->
  case monad of {
    Monad return then -> then
  });

STG interpreters

We could run our STG program by hand following the reduction rules listed above. If you have never done it, I highly recommend performing several reductions by hand as an exercise. But it is a bit tedious and error-prone. That’s why we will use Bernie Pope’s Ministg interpreter. My fork of Ministg adds support for unboxed tuples and recursive let bindings necessary to run our program.

There is another STG interpreter, stgi, by David Luposchainsky. It is more recent and looks nicer, but it doesn’t support the eval/apply execution model used by ghc, which is a deal breaker for our purposes.

We run Ministg like this:

ministg --noprelude --trace --maxsteps=100 --style=EA --tracedir leak.trace leak.stg

Ministg will print an error message saying that the program hasn't finished running in 100 steps — as we would expect — and it will also generate a directory leak.trace containing html files. Each html file shows the state of the STG machine after a single evaluation step. You can browse these files here.

Tracing the program

Steps 0 through 16 take us from main to poll monadIO, which is where things get interesting, because from this point on, only code inside poll will be executing. Remember, poll is an infinite loop, so it will never give worker a chance to run again.

Each iteration of the poll loop consists of two phases. During the first phase, poll monadIO is evaluated. This is the “pure” part. No IO gets done during this part; we are just figuring out what is going to be executed. The first phase runs up until step 24.

On step 25, we grab the RealWorld token from the stack, and the second phase — the IO phase — begins. It ends on step 42, when the next iteration of the loop begins with poll monadIO.

Let’s look at the first phase in more detail. In steps 18 and 19, the let-expression

let {
  ret_thunk = THUNK(return monad unit);
  poll_poll_thunk = THUNK(poll monad);
} in then monad ret_thunk poll_poll_thunk

is evaluated. The thunks ret_thunk and poll_poll_thunk are allocated on the heap at addresses $3 and $4, respectively.

Later these thunks will be evaluated/updated to partial applications: $3=PAP(returnIO unit) on step 35 and $4=PAP(thenIO $7 $8) on step 50.

We would hope that these partial applications will eventually be garbage-collected. Unfortunately, they are not. The partial application $1=PAP(thenIO $3 $4) is defined in terms of $3 and $4, and $1 is worker_poll_thunk, the “next” instance of the poll loop invoked by worker, so $3 and $4 stay reachable.

This is why the leak doesn’t occur if there’s no outer loop: nothing would reference $3 and $4, so they would be executed and then garbage-collected.

IO that doesn’t leak

The memory leak has two causes. As we discussed above, the first is the outer loop, which holds on to a reference to the inner loop.

The second is that IO happens here in two phases: a pure phase, during which we “compute” the IO action, and a second phase, during which we run the computed action. If there were no first phase, there would be nothing to remember.
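The two phases are easy to see in ordinary Haskell: building IO actions is a pure computation, and nothing runs until the actions are actually sequenced.

```haskell
-- Phase one is pure: we only build a list of IO actions.
-- Nothing is printed here.
actions :: [IO ()]
actions = map print [1 .. 3 :: Int]

-- Phase two actually runs them; only here does any output happen.
runAll :: IO ()
runAll = sequence_ actions
```

In the leaking program, the results of phase one (the PAPs above) are what the outer loop keeps alive.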

Consider this version of the nested loop. Here, I moved NOINLINE to poll. (NOINLINE is needed because otherwise ghc would realize that our program doesn’t do anything and would simplify it down to a single infinite loop.)

main :: IO ()
main = worker

worker :: (Monad m) => m ()
worker =
  let loop = poll >> loop
  in loop

{-# NOINLINE poll #-}
poll :: (Monad m) => m a
poll = return () >> poll

In this version, ghc would inline worker into main and specialize it to IO. Here is ghc’s STG code:

poll_rqk =
    sat-only \r srt:SRT:[] [$dMonad_s322]
        let { sat_s324 = \u srt:SRT:[] [] poll_rqk $dMonad_s322; } in
        let { sat_s323 = \u srt:SRT:[] [] return $dMonad_s322 ();
        } in  >> $dMonad_s322 sat_s323 sat_s324;

main1 =
    \r srt:SRT:[r3 :-> main1, r54 :-> $fMonadIO] [s_s325]
        case poll_rqk $fMonadIO s_s325 of _ {
          (#,#) ipv_s327 _ -> main1 ipv_s327;
        };

Here, poll still runs in two phases, but main1 (the outer loop) doesn’t. This program still allocates memory and does not run as efficiently as it could, but at least it runs in constant memory. This is because the compiler realizes that poll_rqk $fMonadIO is not computing anything useful, so there’s no point in caching that value. (I am actually curious what exactly ghc’s logic is here.)

What if we push NOINLINE even further down?

main :: IO ()
main = worker

worker :: (Monad m) => m ()
worker =
  let loop = poll >> loop
  in loop

poll :: (Monad m) => m a
poll = do_stuff >> poll

{-# NOINLINE do_stuff #-}
do_stuff :: Monad m => m ()
do_stuff = return ()


do_stuff_rql =
    sat-only \r srt:SRT:[] [$dMonad_s32i] return $dMonad_s32i ();

$spoll_r2SR =
    sat-only \r srt:SRT:[r54 :-> $fMonadIO,
                         r2SR :-> $spoll_r2SR] [s_s32j]
        case do_stuff_rql $fMonadIO s_s32j of _ {
          (#,#) ipv_s32l _ -> $spoll_r2SR ipv_s32l;
        };

main1 =
    \r srt:SRT:[r3 :-> main1, r2SR :-> $spoll_r2SR] [s_s32n]
        case $spoll_r2SR s_s32n of _ {
          (#,#) ipv_s32p _ -> main1 ipv_s32p;
        };

This code runs very efficiently, in a single phase, and doesn’t allocate at all.

Of course, in practice we wouldn’t deliberately put these NOINLINEs in our code just to make it inefficient. Instead, inlining or specialization may fail to happen because the function is too big, because it resides in a different module, or for some other reason.


Arities provide an important perspective on the two-phase computation issue. The arity of then is 1: it is just a record selector. The arity of thenIO is 3: it takes the two monadic values and the RealWorld state token.

Arities influence what happens at runtime, as can be seen from the STG reduction rules. Because thenIO has arity 3, a partial application is created for thenIO ret_thunk poll_poll_thunk. Let’s change the arity of thenIO to 2, so that no PAPs get created:

thenIO = FUN(m k ->
  case m realWorld of {
    (# new_s, result #) -> k
  });

(This is similar to how unsafePerformIO works.) Now we no longer have PAPs, but our heap fills up with exactly the same number of BLACKHOLEs.

More importantly, arities also influence what happens during compile time: what shape the generated STG code has. Because then has arity 1, ghc decides to create a chain of thens before passing the RealWorld token. Let’s change (“eta-expand”) the poll code as if then had arity 4, without actually changing then or thenIO or their runtime arities:

# added a dummy argument s
poll = FUN(monad s ->
  let {
    ret_thunk = THUNK(return monad unit);
    poll_poll_thunk = THUNK(poll monad);
  } in then monad ret_thunk poll_poll_thunk s
);
# no change in then or thenIO
then = FUN(monad ->
  case monad of {
    Monad return then -> then
  });
thenIO = FUN(m k s ->
  case m s of {
    (# new_s, result #) -> k new_s
  });

This code now runs in constant memory!

What inlining and specialization accomplish, then, is to let the compiler see the true arity of a function such as then. (Of course, they would also allow the compiler to replace then with thenIO.)


Let me tell you how you can avoid any such space leaks in your code by following a simple rule:

I don’t know.

In some cases, -fno-full-laziness or -fno-state-hack help. In this case, they don’t.

In 2012, I wrote about why reasoning about space usage in Haskell is hard. I don’t think anything has changed since then; it is a hard problem to solve. I filed GHC bug #13080 in case the ghc developers can figure out a way to address this particular issue.

Most of the time everything works great, but once in a while you stumble upon something like this. Such is life.

Thanks to Reid Barton for pointing out that my original theory regarding this leak was incomplete at best.

January 10, 2017 08:00 PM

January 09, 2017

Dan Piponi (sigfpe)

Building free arrows from components


Gabriel Gonzalez has written quite a bit about the practical applications of free monads. And "haoformayor" wrote a great stackoverflow post on how arrows are related to strong profunctors. So I thought I'd combine these and apply them to arrows built from profunctors: free arrows. What you get is a way to use arrow notation to build programs, but defer the interpretation of those programs until later.


Using the notation here I'm going to call an element of a type P a b, where P is a profunctor, a heteromorphism.

A product that isn't much of a product

As I described a while back you can compose profunctors. Take a look at the code I used, and also Data.Functor.Composition.

data Compose f g d c = forall a. Compose (f d a) (g a c)

An element of Compose f g d c is just a pair of heteromorphisms, one from each of the profunctors, f and g, with the proviso that the "output" type of one is compatible with the "input" type of the other. As products go it's pretty weak in the sense that no composition happens beyond the two objects being stored with each other. And that's the basis of what I'm going to talk about. The Compose type is just a placeholder for pairs of heteromorphisms whose actual "multiplication" is being deferred until later. This is similar to the situation with the free monoid, otherwise known as a list. We can "multiply" two lists together using mappend but all that really does is combine the elements into a bigger list. The elements themselves aren't touched in any way. That suggests the idea of using profunctor composition in the same way that (:) is used to pair elements and lists.
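The list analogy looks like this in code: (<>) on lists only concatenates, and the real "multiplication" is deferred until a later fold.

```haskell
import Data.Monoid (Sum(..))

-- Concatenation stores the elements side by side; no addition happens.
deferred :: [Sum Int]
deferred = [Sum 1, Sum 2] <> [Sum 3]

-- Only when we fold does the deferred multiplication actually run.
forced :: Sum Int
forced = mconcat deferred
```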

Free Arrows

Here's some code:

> {-# OPTIONS -W #-}
> {-# LANGUAGE ExistentialQuantification #-}
> {-# LANGUAGE Arrows #-}
> {-# LANGUAGE RankNTypes #-}
> {-# LANGUAGE TypeOperators #-}
> {-# LANGUAGE FlexibleInstances #-}

> import Prelude hiding ((.), id)
> import Control.Arrow
> import Control.Category
> import Data.Profunctor
> import Data.Monoid

> infixr :-

> data FreeA p a b = PureP (a -> b)
>                  | forall x. p a x :- FreeA p x b

First look at the second line of the definition of FreeA. It says that a FreeA p a b might be a pair consisting of a head heteromorphism whose output matches the input of another FreeA. There's also the PureP case which is acting like the empty list []. The reason we use this is that for our composition, (->) acts a lot like the identity. In particular Compose (->) p a b is isomorphic to p a b (modulo all the usual stuff about non-terminating computations and so on). This is because an element of this type is a pair consisting of a function a -> x and a heteromorphism p x b for some type x we don't get to see. We can't project back out either of these items without information about the type of x escaping. So the only thing we can possibly do is use lmap to apply the function to the heteromorphism giving us an element of p a b.
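To make that isomorphism concrete, here is a standalone sketch, with its own minimal Profunctor class and the Compose type restated so that it is self-contained (absorb is my name, not the post's):

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Minimal Profunctor class, restated so this sketch compiles on its own.
class Profunctor p where
  dimap :: (a' -> a) -> (b -> b') -> p a b -> p a' b'

instance Profunctor (->) where
  dimap f g h = g . h . f

data Compose f g d c = forall a. Compose (f d a) (g a c)

-- The only thing we can do with the hidden a -> x function is absorb
-- it into the heteromorphism on the left (lmap is dimap _ id).
absorb :: Profunctor p => Compose (->) p d c -> p d c
absorb (Compose f g) = dimap f id g
```

With p = (->), for example, absorb (Compose (+1) show) is the function that increments and then shows its argument.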

Here is a special case of PureP we'll use later:

> nil :: Profunctor p => FreeA p a a
> nil = PureP id

So an element of FreeA is a sequence of heteromorphisms. If heteromorphisms are thought of as operations of some sort, then an element of FreeA is a sequence of operations waiting to be composed together into a program that does something. And that's just like the situation with free monads. Once we've build a free monad structure we apply an interpreter to it to evaluate it. This allows us to separate the "pure" structure representing what we want to do from the code that actually does it.

The first thing to note is our new type is also a profunctor. We can apply lmap and rmap to a PureP function straightforwardly. We apply lmap directly to the head of the list and we use recursion to apply rmap to the PureP at the end:

> instance Profunctor b => Profunctor (FreeA b) where
>     lmap f (PureP g) = PureP (g . f)
>     lmap f (g :- h) = (lmap f g) :- h
>     rmap f (PureP g) = PureP (f . g)
>     rmap f (g :- h) = g :- (rmap f h)

We also get a strong profunctor by applying first' all the way down the list:

> instance Strong p => Strong (FreeA p) where
>     first' (PureP f) = PureP (first' f)
>     first' (f :- g) = (first' f) :- (first' g)

We can now concatenate our lists of heteromorphisms using code that looks a lot like the typical implementation of (++):

> instance Profunctor p => Category (FreeA p) where
>     id = PureP id
>     g . PureP f = lmap f g
>     k . (g :- h) = g :- (k . h)

Note that it's slightly different from what you might expect by comparison with (++), because we tend to write composition of functions "backwards". Additionally, there is another definition of FreeA we could have used that's analogous to using snoc lists instead of cons lists.
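The snoc-list variant might look something like this (a sketch of my own; FreeA', :-:, and len are not from the post):

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Snoc-list variant: the heteromorphism is attached at the *end* of
-- the chain rather than at the head.
data FreeA' p a b = PureP' (a -> b)
                  | forall x. FreeA' p a x :-: p x b

-- Length of the chain of heteromorphisms, for illustration.
len :: FreeA' p a b -> Int
len (PureP' _) = 0
len (h :-: _)  = 1 + len h
```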

And now we have an arrow. I'll leave the proofs that the arrow laws are obeyed as an exercise :-)

> instance (Profunctor p, Strong p) => Arrow (FreeA p) where
>     arr = PureP
>     first = first'

The important thing about free things is that we can apply interpreters to them. For lists we have folds:

foldr :: (a -> b -> b) -> b -> [a] -> b

In foldr f e we can think of f as saying how (:) should be interpreted and e as saying how [] should be interpreted.
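For instance, interpreting (:) as (+) and [] as 0 turns a list into its sum:

```haskell
-- foldr as an interpreter: (+) says how to read (:), 0 how to read [].
sumViaFold :: [Int] -> Int
sumViaFold = foldr (+) 0
```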

Analogously, in Control.Monad.Free in the free package we have:

foldFree :: Monad m => (forall x . f x -> m x) -> Free f a -> m a
foldFree _ (Pure a) = return a
foldFree f (Free as) = f as >>= foldFree f

Given a natural transformation from f to m, foldFree extends it to all of Free f.

Now we need a fold for free arrows:

> foldFreeA :: (Profunctor p, Arrow a) =>
>              (forall b c. p b c -> a b c) -> FreeA p b c -> a b c
> foldFreeA _ (PureP g) = arr g
> foldFreeA f (g :- h) = foldFreeA f h . f g

It's a lot like an ordinary fold but uses the arrow composition law to combine the interpretation of the head with the interpretation of the tail.

"Electronic" components

Let me revisit the example from my previous article. I'm going to remove things I won't need so my definition of Circuit is less general here. Free arrows are going to allow us to define individual components for a circuit, but defer exactly how those components are interpreted until later.

I'll use four components this time: a register we can read from, one we can write from and a register incrementer, as well as a "pure" component. But before that, let's revisit Gabriel's article that gives some clues about how components should be built. In particular, look at the definition of TeletypeF:

data TeletypeF x
= PutStrLn String x
| GetLine (String -> x)
| ExitSuccess

We use GetLine to read a string, and yet the type of GetLine k could be TeletypeF a for any a. The reason is that free monads work with continuations. Instead of GetLine returning a string to us, it's a holder for a function that says what we'd like to do with the string once we have it. That means we can leave open the question of where the string comes from. The function foldFree can be used to provide the actual string getter.

Free arrows are like "two-sided" free monads. We don't just provide a continuation saying what we'd like to do to our output. We also get to say how we prepare our data for input.

There's also some burden put on us. Free arrows need strong profunctors. Strong profunctors need to be able to convey extra data alongside the data we care about - that's what first' is all about. This means that even though Load is functionally similar to GetLine, it can't simply ignore its input. So we don't have Load (Int -> b), and instead have Load ((a, Int) -> b). Here is our component type:

> data Component a b = Load ((a, Int) -> b)
>                    | Store (a -> (b, Int))
>                    | Inc (a -> b)

The Component only knows about the data passing through, of type a and b. It doesn't know anything about how the data in the registers is stored. That's the part that will be deferred to later. We intend for Inc to increment a register. But as it doesn't know anything about registers nothing in the type of Inc refers to that. (It took a bit of experimentation for me to figure this out and there may be other ways of doing things. Often with code guided by category theory you can just "follow your nose" as there's one way that works and type checks. Here I found a certain amount of flexibility in how much you store in the Component and how much is deferred to the interpreter.)

I could implement the strong profunctor instances using various combinators but I think it might be easiest to understand when written explicitly with lambdas:

> instance Profunctor Component where
>     lmap f (Load g) = Load $ \(a, s) -> g (f a, s)
>     lmap f (Store g) = Store (g . f)
>     lmap f (Inc g) = Inc (g . f)

>     rmap f (Load g) = Load (f . g)
>     rmap f (Store g) = Store $ \a -> let (b, t) = g a
>                                      in (f b, t)
>     rmap f (Inc g) = Inc (f . g)

> instance Strong Component where
>     first' (Load g) = Load $ \((a, x), s) -> (g (a, s), x)
>     first' (Store g) = Store $ \(a, x) -> let (b, t) = g a
>                                           in ((b, x), t)
>     first' (Inc g) = Inc (first' g)

And now we can implement individual components. First a completely "pure" component:

> add :: Num a => FreeA Component (a, a) a
> add = PureP $ uncurry (+)

And now the load and store operations.

> load :: FreeA Component () Int
> load = Load (\(_, a) -> a) :- nil

> store :: FreeA Component Int ()
> store = Store (\a -> ((), a)) :- nil

> inc :: FreeA Component a a
> inc = Inc id :- nil

Finally we can tie it all together in a complete function using arrow notation:

> test = proc () -> do
>     () <- inc -< ()
>     a <- load -< ()
>     b <- load -< ()
>     c <- add -< (a, b)
>     () <- store -< c

>     returnA -< ()

At this point, the test object is just a list of operations waiting to be executed. Now I'll give three examples of semantics we could provide. The first uses a state arrow type similar to the previous article:

> newtype Circuit s a b = C { runC :: (a, s) -> (b, s) }

> instance Category (Circuit s) where
>     id = C id
>     C f . C g = C (f . g)

> instance Arrow (Circuit s) where
>     arr f = C $ \(a, s) -> (f a, s)
>     first (C g) = C $ \((a, x), s) -> let (b, t) = g (a, s)
>                                       in ((b, x), t)

Here is an interpreter that interprets each of our components as an arrow. Note that this is where, among other things, we provide the meaning of the Inc operation:

> exec :: Component a b -> Circuit Int a b
> exec (Load g) = C $ \(a, s) -> (g (a, s), s)
> exec (Store g) = C $ \(a, _) -> g a
> exec (Inc g) = C $ \(a, s) -> (g a, s+1)

Here's a completely different interpreter that is going to make you do the work of maintaining the state used by the registers. You'll be told what to do! We'll use the Kleisli IO arrow to do the I/O.

> exec' :: Component a b -> Kleisli IO a b
> exec' (Load g) = Kleisli $ \a -> do
>     putStrLn "What is your number now?"
>     s <- fmap read getLine
>     return $ g (a, s)
> exec' (Store g) = Kleisli $ \a -> do
>     let (b, t) = g a
>     putStrLn $ "Your number is now " ++ show t ++ "."
>     return b
> exec' (Inc g) = Kleisli $ \a -> do
>     putStrLn "Increment your number."
>     return $ g a

The last interpreter is simply going to sum values associated to various components. They could be costs in dollars, time to execute, or even strings representing some kind of simple execution trace.

> newtype Labelled m a b = Labelled { unLabelled :: m }

> instance Monoid m => Category (Labelled m) where
>     id = Labelled mempty
>     Labelled a . Labelled b = Labelled (a `mappend` b)

> instance Monoid m => Arrow (Labelled m) where
>     arr _ = Labelled mempty
>     first (Labelled m) = Labelled m

> exec'' (Load _) = Labelled (Sum 1)
> exec'' (Store _) = Labelled (Sum 1)
> exec'' (Inc _) = Labelled (Sum 2)

Note that we can't assign non-trivial values to "pure" operations.

And now we execute all three:

> main = do
>     print $ runC (foldFreeA exec test) ((), 10)
>     putStrLn "Your number is 10." >> runKleisli (foldFreeA exec' test) ()
>     print $ getSum $ unLabelled $ foldFreeA exec'' test

Various thoughts

I don't know if free arrows are anywhere near as useful as free monads, but I hope I've successfully illustrated one application. Note that because arrow composition is essentially list concatenation it may be more efficient to use a version of Hughes lists. This is what the Cayley representation is about in the monoid notions paper. But it's easier to see the naive list version first. Something missing from here that is essential for electronics simulation is the possibility of using loops. I haven't yet thought too much about what it means to build instances of ArrowLoop freely.

Profunctors have been described as decategorised matrices in the sense that p a b, with p a profunctor, is similar to the matrix M_ab. Or, if you're working in a context where you distinguish between co- and contravariant vectors, it's similar to M_a^b. The Compose operation is a lot like the definition of matrix product. From this perspective, the FreeA operation is a lot like the function on matrices that takes M to (1 - M)^(-1) = 1 + M + M^2 + M^3 + ... To work with ArrowLoop we need a trace-like operation.

One nice application of free monads is in writing plugin APIs. Users can write plugins that link to a small library based on a free monad. These can then be dynamically loaded and interpreted by an application at runtime, completely insulating the plugin-writer from the details of the application. You can think of it as a Haskell version of the PIMPL idiom. Free arrows might give a nice way to write plugins for dataflow applications.

People typically think of functors as containers. So in a free monad, each element is a container of possible futures. In a free arrow the relationship between the current heteromorphism and its "future" (and "past") is a bit more symmetrical. For example, for some definitions of P, a heteromorphism P a b can act on some as to give us some bs. But some definitions of P can run "backwards" and act on elements of b -> r to give us elements of a -> r. So when I use the words "input" and "output" above, you might not want to take them too literally.

by Dan Piponi ( at January 09, 2017 04:33 PM

Functional Jobs

Developer (Evanston Campus) at Northwestern University (Full-time)

Northwestern University Opportunity (Job ID 30057):

Northwestern University seeks to employ a varied and diverse range of dynamic people who understand the importance of our mission and vision. When you consider a career at Northwestern University, you know that you are joining an institution with a deep history of academic, professional and personal excellence.

Currently, we have a career opportunity as a Developer (Evanston Campus).

Job Summary:

The CCL is looking for a full-time Software Developer to work on NetLogo. This Software Developer position is based at Northwestern University's Center for Connected Learning and Computer-Based Modeling (CCL), working in a small collaborative development team in a university research group that also includes professors, postdocs, graduate students, and undergraduates, supporting the needs of multiple research projects. A major focus would be on development of NetLogo, an open-source modeling environment for both education and scientific research. CCL grants also involve development work on HubNet and other associated tools for NetLogo, including research and educational NSF grants involving building, delivering, and assessing NetLogo-based science curricula for secondary schools.

NetLogo is a programming language and agent-based modeling environment. The NetLogo language is a dialect of Logo/Lisp specialized for building agent-based simulations of natural and social phenomena. NetLogo has tens of thousands of users ranging from grade school students to advanced researchers. A collaborative extension of NetLogo, called HubNet, enables groups of participants to run participatory simulation activities in classrooms and distributed participatory simulations in social science research.

The Northwestern campus is in Evanston, Illinois on the Lake Michigan shore, adjacent to Chicago and easily reachable by public transportation.

To apply, please contact Uri Wilensky, Director of the CCL, at the following email address: ccl-developer-job [at] ccl [dot] northwestern [dot] edu. Please put 'Developer Job' in the subject line. Please include a PDF version of your resume, a list of references, and optionally, links to any relevant development projects you feel illustrate your abilities.

Specific Responsibilities:

  • Collaborates with the NetLogo development team in designing features for NetLogo, HubNet and web-based versions of these applications; writes code independently, and in the context of a team of experienced software engineers and principal investigator;
  • Creates, updates and documents existing models using NetLogo, HubNet and web-based applications; creates new such models;
  • Supports development of new devices to interact with HubNet; interacts with commercial and academic partners to help determine design and functional requirements for NetLogo and HubNet; interacts with user community including responding to bug reports, questions, and suggestions, and interacting with open-source contributors;
  • Performs data collection, organization, and summarization for projects; assists with coordination of team activities.
  • Performs other duties as required or assigned.

Minimum Qualifications:

  • Successful completion of a full 4-year course of study in an accredited college or university leading to a bachelor's or higher degree; OR appropriate combination of education and experience.
  • 2 years of relevant experience required.
  • Demonstrated experience and enthusiasm for writing clean, modular, well-tested code.

Preferred Qualifications: (Education and experience)

  • Experience with working effectively as part of a small software development team, including close collaboration, distributed version control, and automated testing;
  • Experience with building web-based applications, both server-side and client-side components, particularly with html5 and JavaScript and/or CoffeeScript;
  • Experience with at least one JVM language such as Java;
  • Experience with Scala programming, or enthusiasm for learning it;
  • Experience with Haskell, Lisp, or other functional languages;
  • Interest in and experience with programming language implementation, functional programming, and metaprogramming;
  • Experience with GUI design; language design and compilers;
  • Interest in and experience with computer-based modeling and simulation, especially agent-based simulation;
  • Interest in and experience with distributed, multiplayer, networked systems like HubNet;
  • Experience with physical computing;
  • Experience with participatory simulations;
  • Experience with cross-platform mobile development;
  • Experience working on research projects in an academic environment;
  • Experience with open-source software development and supporting the growth of an open-source community;
  • Interest in education and an understanding of secondary school math and science content.

Working at Northwestern University:

Beyond being a place to learn and grow professionally, Northwestern is an exciting and fulfilling place to work! Northwestern offers many benefit options to full and part-time employees including: competitive compensation; excellent retirement plans; comprehensive medical, dental and vision coverage; dependent care match; vacation, sick and holiday pay; professional development opportunities and tuition reimbursement. Northwestern greatly values work/life balance amongst its employees. Take advantage of recreational, cultural, and enrichment opportunities on campus. Employees also receive access to childcare solutions, retail discounts, and other work/life balance resources.

Northwestern University is an equal opportunity employer and strongly believes in creating an environment that welcomes students, faculty and staff of all races, nationalities and religions. In doing so, we offer our students the opportunity to learn and grow in diverse communities preparing them for successful careers in an increasingly global and diverse work force.

For consideration, please click on the link below. You will be directed to Northwestern University's electronic recruiting system, eRecruit, where you will apply for current openings. Once you apply, you will receive an email confirming submission of your resume. For all resumes received, if there is interest in your candidacy, the human resources recruiter or the department hiring manager will contact you. Job Opening ID number for this position is 30057.


As per Northwestern University policy, this position requires a criminal background check. Successful applicants will need to submit to a criminal background check prior to employment.

Northwestern University is an Equal Opportunity, Affirmative Action Employer of all protected classes, including veterans and individuals with disabilities. Women, racial and ethnic minorities, individuals with disabilities, and veterans are encouraged to apply. Hiring is contingent upon eligibility to work in the United States.

Get information on how to apply for this position.

January 09, 2017 04:03 PM

Ken T Takusagawa

[lvbetgkb] Right section of a function

A left section of a binary (two-argument) function is easy to write in Haskell using partial function application: just omit the last (right) argument.  A right section is a little bit more awkward, requiring backquotes, lambda, or flip.

import Data.Function((&));

-- example binary function (not an operator)
f2 :: a -> [a] -> [a];
f2 = (:);

-- we will use the larger functions later
f3 :: Int -> a -> [a] -> [a];
f3 _ = (:);

f4 :: Bool -> Int -> a -> [a] -> [a];
f4 _ _ = (:);

test :: [String];
test = map (\f -> f 'h') -- all of these evaluate 'h':("el"++"lo") yielding hello
  [ (`f2` ("el" ++ "lo")) -- backquotes (grave accents) are inline operator syntax. An inline operator followed by an argument, all surrounded by parentheses, is operator right section syntax: one is supposed to imagine a hole in front of the backquotes: (__ `f2` ("el" ++ "lo"))
  , (\arg1 -> f2 arg1 ("el" ++ "lo")) -- lambda syntax
  , (\arg1 -> f2 arg1 $ "el" ++ "lo")
  , ((flip f2) ("el" ++ "lo"))
  , ((flip f2) $ "el" ++ "lo")
  , (flip f2 $ "el" ++ "lo")
  , (flip f2 ("el" ++ "lo")) -- It might be a little surprising that this one works if one had thought of flip as a function taking only one argument, namely the function to be flipped. Because of currying, however, it actually takes three arguments: flip :: (a -> b -> c) -> b -> a -> c.
  , ("el" ++ "lo" & flip f2)

  -- For these 3- and 4-argument cases, we would like to create a lambda on the penultimate argument.
  -- , (`f3 (2 + 3)` ("el" ++ "lo")) -- This does not work: the contents of the backquotes must be a single identifier, not an expression.
  , (let { t2 = f3 (2 + 3) } in (`t2` ("el" ++ "lo")))
  , (\penultimate -> f3 (2 + 3) penultimate ("el" ++ "lo"))
  , (\penultimate -> f3 (2 + 3) penultimate $ "el" ++ "lo") -- this wordy lambda syntax is among the best in terms of low parenthesis count and avoiding deeply nested parentheses.
  , (flip (f3 (2 + 3)) ("el" ++ "lo")) -- similar to "a little surprising" above
  , (flip (f3 (2 + 3)) $ "el" ++ "lo")
  , (flip (f3 $ 2 + 3) $ "el" ++ "lo")
  , ((flip $ f3 (2 + 3)) $ "el" ++ "lo")
  , ((flip $ f3 $ 2 + 3) $ "el" ++ "lo")
  , ("el" ++ "lo" & (f3 (2 + 3) & flip))
  , ("el" ++ "lo" & (2 + 3 & f3 & flip))

  , (\penultimate -> f4 (not True) (2 + 3) penultimate ("el" ++ "lo"))
  , (\penultimate -> f4 (not True) (2 + 3) penultimate $ "el" ++ "lo")
  , (let { t2 = f4 (not True) (2 + 3) } in (`t2` ("el" ++ "lo")))
  , (flip (f4 (not True) (2 + 3)) ("el" ++ "lo"))
  , (flip (f4 (not True) (2 + 3)) $ "el" ++ "lo")
  , ((flip $ f4 (not True) (2 + 3)) $ "el" ++ "lo")
  , ((flip $ f4 (not True) $ 2 + 3) $ "el" ++ "lo")
  , ("el" ++ "lo" & (f4 (not True) (2 + 3) & flip))
  , ("el" ++ "lo" & (2 + 3 & f4 (not True) & flip))
  , ("el" ++ "lo" & (2 + 3 & (not True & f4) & flip))
  ];

(\f -> f 'h') could have been written ($ 'h'), itself a right section, but we deliberately avoid a potentially obscure form in the test harness.
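As a quick standalone illustration (a hypothetical snippet, not part of the post's harness), ($ 'h') really does behave exactly like \f -> f 'h': it is a right section of ($) with a hole in front of the operator.

```haskell
-- ($ 'h') f == f $ 'h' == f 'h'
applied :: [String]
applied = map ($ 'h') [(: "i"), replicate 3, (\c -> [c])]

main :: IO ()
main = print applied  -- ["hi","hhh","h"]
```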

by Ken at January 09, 2017 05:36 AM

January 08, 2017

wren gayle romano

ANN: containers


The containers package contains efficient general-purpose implementations of various basic immutable container types. The declared cost of each operation is either worst-case or amortized, but remains valid even if structures are shared.

Changes since the previous release (2016-08-31)

The headline change is adding merge and mergeA for Data.IntMap. The Data.Map versions were introduced earlier, so this change restores parity between the interfaces. With this in place, we hope this version will make it into GHC 8.2.

Other changes include:

  • Add instances for Data.Graph.SCC: Foldable, Traversable, Data, Generic, Generic1, Eq, Eq1, Show, Show1, Read, and Read1.
  • Add lifted instances (from Data.Functor.Classes) for Data.Sequence, Data.Map, Data.Set, Data.IntMap, and Data.Tree. (Thanks to Oleg Grenrus for doing a lot of this work.)
  • Properly deprecate functions in Data.IntMap that have long been documented as deprecated.
  • Rename several internal modules for clarity. Thanks to esoeylemez for starting this process.
  • Make Data.Map.fromDistinctAscList and Data.Map.fromDistinctDescList more eager, improving performance.
  • Plug space leaks in Data.Map.Lazy.fromAscList and Data.Map.Lazy.fromDescList by manually inlining constant functions.
  • Add lookupMin and lookupMax to Data.Set and Data.Map as total alternatives to findMin and findMax.
  • Add (!?) to Data.Map as a total alternative to (!).
  • Avoid using deleteFindMin and deleteFindMax internally, preferring total functions instead. New implementations of said functions lead to slight performance improvements overall.
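For instance, the new total lookups avoid the partiality of (!) and findMin/findMax. A minimal sketch, assuming a containers version that includes these additions:

```haskell
import qualified Data.Map as Map

m :: Map.Map Int String
m = Map.fromList [(1, "one"), (2, "two")]

main :: IO ()
main = do
  print (m Map.!? 3)       -- Nothing, where m Map.! 3 would throw an error
  print (Map.lookupMin m)  -- Just (1,"one")
  print (Map.lookupMax m)  -- Just (2,"two")
```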



January 08, 2017 09:25 PM

Chris Smith

Call for interest: Haskell in middle school math education

Just a pointer to this post in haskell-cafe: Call for interest: Haskell in middle school math education

The TL;DR version is that a few people have put together a sizable budget to make the next big push to get CodeWorld and Haskell into middle school mathematics.  We’re looking to produce high-quality resources like video, study materials, etc. to enable teachers to easily use Haskell to make mathematics more tangible and creative in their classrooms for students ages about 11 to 14.  If this interests you, read the announcement!

by cdsmith at January 08, 2017 06:52 AM

January 07, 2017

Dan Piponi (sigfpe)

Addressing Pieces of State with Profunctors

Attempted segue

Since I first wrote about profunctors there has been quite a bit of activity in the area so I think it's about time I revisited them. I could just carry on from where I left off 5 years ago but there have been so many tutorials on the subject that I think I'll have to assume you've looked at them. My favourite is probably Phil Freeman's Fun with Profunctors. What I intend to do here is solve a practical problem with profunctors.

The problem

Arrows are a nice mechanism for building circuit-like entities in code. In fact, they're quite good for simulating electronic circuits. Many circuits are very much like pieces of functional code. For example, an AND gate can be nicely modelled using a pure function: c = a && b. But some components, like flip-flops, have internal state. What comes out of the outputs isn't a simple function of the inputs right now, but depends on what has happened in the past. (Alternatively you can take the view that the inputs and outputs aren't the current values but the complete history of the values.)

We'll use (Hughes) arrows rather than simple functions. For example, one kind of arrow is the Kleisli arrow. For the case of Kleisli arrows built from the state monad, these are essentially functions of type a -> s -> (b, s) where s is our state. We can write these more symmetrically as functions of type (a, s) -> (b, s). We can think of these as "functions" from a to b where the output is allowed to depend on some internal state s. I'll just go ahead and define arrows like this right now.
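The two shapes carry the same information and are interconvertible by (un)currying; a minimal sketch (the names toSym and fromSym are mine, not from the post):

```haskell
-- a -> s -> (b, s) and (a, s) -> (b, s) are the same function, reshaped
toSym :: (a -> s -> (b, s)) -> ((a, s) -> (b, s))
toSym = uncurry

fromSym :: ((a, s) -> (b, s)) -> (a -> s -> (b, s))
fromSym = curry

main :: IO ()
main = print (toSym (\a s -> (a + s, s + 1)) (1 :: Int, 10))  -- (11,11)
```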

First the extensions and imports:

> {-# OPTIONS -W #-}
> {-# LANGUAGE Arrows #-}
> {-# LANGUAGE RankNTypes #-}
> {-# LANGUAGE FlexibleInstances #-}

> import Prelude hiding ((.), id)
> import Control.Arrow
> import Control.Category
> import Data.Profunctor
> import Data.Tuple

And now I'll define our stateful circuits. I'm going to make these slightly more general than I described, allowing circuits to change the type of their state:

> newtype Circuit s t a b = C { runC :: (a, s) -> (b, t) }

> instance Category (Circuit s s) where
>   id = C id
>   C f . C g = C (f . g)

> instance Arrow (Circuit s s) where
>   arr f = C $ \(a, s) -> (f a, s)
>   first (C g) = C $ \((a, x), s) -> let (b, t) = g (a, s)
>                                     in ((b, x), t)

This is just a more symmetrical rewrite of the state monad as an arrow. The first method allows us to pass through some extra state, x, untouched.

Now for some circuit components. First the "pure" operations, a multiplier and a negater:

> mul :: Circuit s s (Int, Int) Int
> mul = C $ \((x, y), s) -> (x*y, s)

> neg :: Circuit s s Int Int
> neg = C $ \(x, s) -> (-x, s)

And now some "impure" ones that read and write some registers as well as an accumulator:

> store :: Circuit Int Int Int ()
> store = C $ \(x, _) -> ((), x)

> load :: Circuit Int Int () Int
> load = C $ \((), s) -> (s, s)

> accumulate :: Circuit Int Int Int Int
> accumulate = C $ \(a, s) -> (a, s+a)

I'd like to make a circuit that has lots of these components, each with its own state. I'd like to store all of these bits of state in a larger container. But that means that each of these components needs to have a way to address its own particular substate. That's the problem I'd like to solve.

Practical profunctor optics

In an alternative universe lenses were defined using profunctors. To find out more I recommend Phil Freeman's talk that I linked to above. Most of the next paragraph is just a reminder of what he says in that talk and I'm going to use the bare minimum to do the job I want.

Remember that one of the things lenses allow you to do is this: suppose we have a record s containing a field of type a and another similar enough kind of record t with a field of type b. Among other things, a lens gives a way to take a rule for modifying the a field to a b field and extend it to a way to modify the s record into a t record. So we can think of lenses as giving us functions of type (a -> b) -> (s -> t). Now if p is a profunctor then you can think of p a b as being a bit function-like. Like functions, profunctors typically (kinda, sorta) get used to consume (zero or more) objects of type a and output (zero or more) objects of type b. So it makes sense to ask our lenses to work with these more general objects too, i.e. we'd like to be able to get something of type p a b -> p s t out of a lens. A strong profunctor is one that comes pre-packed with a lens that can do this for the special case where the types s and t are 2-tuples. But you can think of simple records as being syntactic sugar for tuples of fields, so strong profunctors also automatically give us lenses for records. Again, watch Phil's talk for details.
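To make that concrete, here is a small sketch (separate from the post's code) of how a profunctor lens collapses to the familiar modifier-lifting shape when p is the function profunctor (->), which is Strong:

```haskell
-- At p = (->), p a b -> p s t is literally (a -> b) -> (s -> t)
type SimpleLens s t a b = (a -> b) -> (s -> t)

-- first' specialised to functions: modify the first component, pass
-- the rest through untouched
firstFn :: SimpleLens (a, x) (b, x) a b
firstFn f (a, x) = (f a, x)

main :: IO ()
main = print (firstFn (+ 1) (41 :: Int, "unchanged"))  -- (42,"unchanged")
```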

So here is our lens type:

> type Lens s t a b = forall p. Strong p => p a b -> p s t

Here are lenses that mimic the well known ones from Control.Lens:

> _1 :: Lens (a, x) (b, x) a b
> _1 = first'

> _2 :: Lens (x, a) (x, b) a b
> _2 = dimap swap swap . first'

(Remember that dimap is a function to pre- and post-compose a function with two others.)
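For the plain function profunctor, dimap f g h is just g . h . f; a tiny self-contained sketch (dimapFn is my name for this specialisation):

```haskell
-- dimap at (->): pre-compose with f, post-compose with g
dimapFn :: (a' -> a) -> (b -> b') -> (a -> b) -> (a' -> b')
dimapFn f g h = g . h . f

main :: IO ()
main = print (dimapFn (+ 1) (* 2) negate (4 :: Int))  -- 2 * negate (4 + 1) = -10
```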

Arrows are profunctors. So Circuit s s, when wrapped in WrappedArrow, is a profunctor. So now we can directly use the Circuit type with profunctor lenses. This is cool, but it doesn't directly solve our problem. So we're not going to use this fact. We're interested in addressing the state of type s, not the values of type a and b passed through our circuits. In other words, we're interested in the fact that Circuit s t a b is a profunctor in s and t, not a and b. To make this explicit we need a suitable way to permute the arguments to Circuit:

> newtype Flipped p s t a b = F { unF :: p a b s t }

(It was tempting to call that ComedyDoubleAct.)

And now we can define:

> instance Profunctor (Flipped Circuit a b) where
>   lmap f (F (C g)) = F $ C $ \(a, s) -> g (a, f s)
>   rmap f (F (C g)) = F $ C $ \(a, s) -> let (b, t) = g (a, s)
>                                         in (b, f t)

> instance Strong (Flipped Circuit a b) where
>   first' (F (C g)) = F $ C $ \(a, (s, x)) -> let (b, t) = g (a, s)
>                                              in (b, (t, x))

Any time we want to use this instance of Profunctor with a Circuit we have to wrap everything with F and unF. The function dimap gives us a convenient way to implement such wrappings.

Let's implement an imaginary circuit with four bits of state in it.

Here is the state:

> data CPU = CPU { _x :: Int, _y :: Int, _z :: Int, _t :: Int } deriving Show

As I don't have a complete profunctor version of a library like Control.Lens with its Template Haskell magic, I'll set things up by hand. Here's a strong-profunctor-friendly version of the CPU and a useful isomorphism to go with it:

> type ExplodedCPU = (Int, (Int, (Int, Int)))

> explode :: CPU -> ExplodedCPU
> explode (CPU u v w t) = (u, (v, (w, t)))

> implode :: ExplodedCPU -> CPU
> implode (u, (v, (w, t))) = CPU u v w t

And now we need adapters that take lenses for an ExplodedCPU and (1) apply them to a CPU the way Control.Lens would...

> upgrade :: Profunctor p =>
>            (p a a -> p ExplodedCPU ExplodedCPU) ->
>            (p a a -> p CPU CPU)
> upgrade f = dimap explode implode . f

> x, y, z, t :: Flipped Circuit a b Int Int -> Flipped Circuit a b CPU CPU
> x = upgrade _1
> y = upgrade $ _2 . _1
> z = upgrade $ _2 . _2 . _1
> t = upgrade $ _2 . _2 . _2

...and (2) wrap them so they can be used on the flipped profunctor instance of Circuit:

> (!) :: p s t a b -> (Flipped p a b s t -> Flipped p a b s' t') ->
>        p s' t' a b
> x ! f = dimap F unF f x

After all that we can now write a short piece of code that represents our circuit. Notice how we can apply the lenses x, ..., t directly to our components to get them to use the right pieces of state:

> test :: Circuit CPU CPU () ()
> test = proc () -> do
>   a <- load ! x -< ()
>   b <- load ! y -< ()
>   c <- mul -< (a, b)
>   d <- neg -< c
>   e <- accumulate ! t -< d
>   () <- store ! z -< e
>   returnA -< ()

> main :: IO ()
> main = do
>   print $ runC test ((), CPU 2 30 400 5000)

Of course with a suitable profunctor lens library you can do a lot more, like work with traversable containers of components.

Note that we could also write a version of all this code using monads instead of arrows. But it's easier to see the symmetry in Flipped Circuit when using arrows, and it also sets the scene for the next thing I want to write about...

by Dan Piponi at January 07, 2017 09:46 PM