Planet Haskell

January 25, 2022

Oleg Grenrus

Folding unfoldable

Posted on 2022-01-25 by Oleg Grenrus

You may be aware of the Foldable type-class. It’s quite a useful one. For example, instead of writing your own sum1 as

sum' :: Num a => [a] -> a
sum' = Data.List.foldl' (+) 0

you may generalize it to an arbitrary Foldable2:

sum' :: (Foldable f, Num a) => f a -> a
sum' = Data.Foldable.foldl' (+) 0

And then everything would be great...

... except if your data comes in an unboxed vector. You may try to use that generic sum algorithm:

values :: U.Vector Double
values = U.fromList [1,2,3,4,5,6,7,7,7]

result :: Double
result = sum' values

and then GHC just says, without further explanation

No instance for (Foldable U.Vector) arising from a use of ‘sum'’

"Why not?!" you wonder.

Unboxed vectors are backed by bytearrays, so you need an Unbox instance to be able to even read any values from there. (That’s different from e.g. Set, which is Foldable, as you can walk over Set without having Ord instance for the elements).


One idea is to bundle the instance together with the data:

data Bundle a where
    Bundle :: U.Unbox a => U.Vector a -> Bundle a

When the Unbox instance is next to the data, we are able to write a Foldable instance: pattern match on the Bundle and use that "local" instance to fold. However, people have told me that sometimes it doesn’t work that well: GHC may not specialize things, even when the dictionary is (almost) right there. Though in my small experiments it did:

sumU :: (Num a, U.Unbox a) => U.Vector a -> a
sumU xs = sum' (Bundle xs)

produced nice loops.

Yet, having to bundle the instance feels somehow wrong. Distant datatype-contexts vibes, brr...

There is another way to make Foldable work, with a different wrapper:

data Hack a b where
    Hack :: U.Vector a -> Hack a a

This is a two type-parameter wrapper, but the types are always the same! (I wish that could be a newtype). The Foldable instance is simply:

instance U.Unbox a => Foldable (Hack a) where
    foldr f z (Hack v)  = U.foldr f z v
    foldl' f z (Hack v) = U.foldl' f z v


and the specialized sum' for unboxed vectors looks the same as with Bundle:

sumU :: (Num a, U.Unbox a) => U.Vector a -> a
sumU xs = sum' (Hack xs)

but now the Unbox instance comes from the "outside": it’s needed by the Foldable instance, not for wrapping the vector in Hack. When GHC sees just Foldable (Hack X) ... it could already start simplifying stuff, if it knows something about X (i.e. its Unbox instance), without waiting to see what the members of the instance are applied to!

We could also write

{-# SPECIALIZE instance Foldable (Hack Double) #-}

to force GHC to do some work in advance. We couldn’t do that with the Bundle approach.

Is this Hack terrible or terrific? I’m not sure, yet.

Anyhow, that’s all I have this time. This (just a little) tongue-in-cheek post is "inspired" by the fact that statistics package wants unboxed vectors everywhere, for "performance" reasons, and that is soooo inconvenient.

Please, use Foldable for inputs you will fold over anyway. (Asking for a selector function, as foldMap does, would avoid creating intermediate structures!) People can choose to Bundle or Hack their way around to provide unboxed (or storable) vectors or primarrays to your algorithm, and others don’t need to suffer when they play with your lib in GHCi.

P.S. I leave this here:

data HackText a where
    HackText :: Text -> HackText Char
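For completeness, here is what the corresponding Foldable instance might look like. This is a sketch, not from the original post; it assumes the text package and repeats the HackText type so the block is self-contained:

```haskell
{-# LANGUAGE GADTs #-}
import Data.Text (Text)
import qualified Data.Text as T

-- Same trick as Hack: a two-parameter GADT where both
-- parameters are forced to Char.
data HackText a where
  HackText :: Text -> HackText Char

-- Text's own folds do the work; no per-element boxing required
-- to *define* the instance.
instance Foldable HackText where
  foldr f z (HackText t) = T.foldr f z t
  length (HackText t)    = T.length t
```

With this, generic Foldable consumers (length, elem, foldr-based code) work on a wrapped Text.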

P.P.S. I know there is MonoFoldable, and lens with its Folds and a lot of other stuff. But Foldable is right there, in our Prelude!

  1. it’s already that way in base, check yourself↩︎

  2. Though you probably should write it using strict foldMap' as in base to let container decide how to do it best↩︎

January 25, 2022 12:00 AM

January 24, 2022

Mark Jason Dominus

Excessive precision in crib slat spacing?

A couple of years back I wrote:

You sometimes read news articles that say that some object is 98.42 feet tall, and it is clear what happened was that the object was originally reported to be 30 meters tall …

As an expectant parent, I was warned that if crib slats are too far apart, the baby can get its head wedged in between them and die. How far is too far apart? According to everyone, 2⅜ inches is the maximum safe distance. Having been told this repeatedly, I asked in one training class if 2⅜ inches was really the maximum safe distance; had 2½ inches been determined to be unsafe? I was assured that 2⅜ inches was the maximum. And there's the opposite question: why not just say 2¼ inches, which is presumably safe and easier to measure accurately?

But sometime later I guessed what had happened: someone had determined that 6 cm was a safe separation, and 6 cm is 2.362 inches. 2⅜ inches exceeds this by only about 0.013 inch, about half a percent. 7 cm would have been 2¾ in, and that probably is too big or they would have said so.

The 2⅜, I have learned, is actually codified in U.S. consumer product safety law. (Formerly it was at 16 CFR 1508; it has since moved and I don't know where it is now.) And looking at that document I see that it actually says:

The distance between components (such as slats, spindles, crib rods, and corner posts) shall not be greater than 6 centimeters (2⅜ inches) at any point.

Uh huh. Nailed it.

I still don't know where they got the 6cm from. I guess there is someone at the Commerce Department whose job is jamming babies’ heads between crib bars.

by Mark Dominus ( at January 24, 2022 07:12 PM

Monday Morning Haskell

Math-y List Operations

Earlier this month we explored some functions related to booleans and lists. Today we'll consider a few simple helpers related to lists of numbers.

As in Python, Haskell allows you to easily take the product and sum of a list. These functions are very straightforward:

sum :: (Num a) => [a] -> a

product :: (Num a) => [a] -> a


>> sum [1, 2, 3, 4]
10
>> sum [-1, 1]
0
>> product [1, 2, 3, 4]
24
>> product [5, 6, -2]
-60

As with our boolean functions, I originally gave the type signatures in terms of lists, but they actually work for any "Foldable" type. So both the inner and outer type of our input are parameterized.

sum :: (Foldable t, Num a) => t a -> a

product :: (Foldable t, Num a) => t a -> a


>> sum $ Set.fromList [1, 2, 3]
6

Both of these functions also work with empty lists, since these operations have an identity to start with.

>> sum []
0
>> product []
1

The same cannot be said for the functions minimum and maximum. These require non-empty lists, or they will throw an error!

minimum :: (Foldable t, Ord a) => t a -> a

maximum :: (Foldable t, Ord a) => t a -> a


>> maximum [5, 6, -1, 4]
6
>> minimum [5, 6, -1, 4]
-1
>> maximum []
*** Exception: Prelude.maximum: empty list

And while they can be used with numbers, as above, they can also be used with any orderable type, such as strings.

>> maximum ["hello", "world"]
"world"

Remember how we mentioned a while ago, with groupBy, that many functions have another version ending in By? This applies to our min and max functions too. These allow us to provide a custom comparison operation.

maximumBy :: (Foldable t) => (a -> a -> Ordering) -> t a -> a
minimumBy :: (Foldable t) => (a -> a -> Ordering) -> t a -> a

As an example, let's consider that when comparing tuples, the first element is used to sort them. Only if there is a tie do we go to the second element:

>> maximum [(1, 4), (2, 1), (2, 3)]
(2, 3)

However, we can instead use the second element as the primary comparison like so:

>> let f (x1, x2) (y1, y2) = if x2 == y2 then x1 `compare` y1 else x2 `compare` y2
>> maximumBy f [(1, 4), (2, 1), (2, 3)]
(1, 4)
>> minimumBy f [(1, 4), (2, 1), (2, 3)]
(2, 1)

This approach is also very helpful when you're dealing with more complex types, but you only care about comparing a single field.
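For the common single-field case, the comparison function can be built with comparing from Data.Ord instead of being written by hand. A quick sketch (the helper names are mine, and unlike f above, comparing snd does no tie-breaking on the first element):

```haskell
import Data.List (maximumBy, minimumBy)
import Data.Ord (comparing)

-- Compare tuples by their second element only.
largestBySecond :: [(Int, Int)] -> (Int, Int)
largestBySecond = maximumBy (comparing snd)

smallestBySecond :: [(Int, Int)] -> (Int, Int)
smallestBySecond = minimumBy (comparing snd)
```

So largestBySecond [(1, 4), (2, 1), (2, 3)] gives (1, 4), matching the maximumBy f example above.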

As a final note, sort and sortBy follow this same pattern. The first assumes an Ord instance; the second lets you supply a custom ordering.

sort :: (Ord a) => [a] -> [a]

sortBy :: (a -> a -> Ordering) -> [a] -> [a]

For more tricks like this, keep following the Monday Morning Haskell blog every Monday and Thursday! You can subscribe to get our monthly newsletter as a summary. This will also get you access to subscriber resources like our Beginners Checklist.

by James Bowen at January 24, 2022 03:30 PM

Chris Smith 2

Monoids are Composable List Summarizers

The standard definition of a monoid goes something like this: a monoid is any set (if you’re a mathematician) or type (if you’re a Haskell programmer), together with an associative binary operation ∗ (if you’re a mathematician) or <> (if you’re a Haskeller) that has an identity. That’s nice and abstract, and those without much math background can find it difficult to follow or think of examples.

So here’s a different definition that I’ve used when talking to programmers in particular. I’m sure this isn’t new, but I don’t know of a reference I can point to for it. If you know of a good beginner-accessible reference for this, I’d like to know. It goes something like this: a monoid is any way to summarize a list so that you can combine just the summaries of two lists to get a summary of their concatenation.

This second definition is interesting to programmers because it suggests all sorts of strategies for decomposition, parallelism, and distributed computing. Anything that I want to compute from a list is much easier to do if I can take this approach of (1) splitting up the list into parts, (2) computing on each of the parts, then (3) merging the results together. I can do this to build distributed systems where each node computes on only the part of the data it has. I can do it to build parallel code where I effectively use multiple CPUs or even GPUs or SIMD instructions to perform operations on different subsets of data. And it’s useful just for making it easier to express and reason about a program even if there’s nothing fancy going on. (For example, foldMap in Haskell’s Foldable class uses monoids in this way.)


Let’s look at some examples:

  • Counts: I can summarize a finite list by counting its elements. If I have the counts of two lists, I can compute the count of their concatenation by adding the two counts together.
  • Sums: I can summarize a finite list by its sum. If I have the sum of two lists, I can compute the sum of their concatenation, again by adding.
  • Minima and Maxima: I can summarize a finite list by its minimum and maximum elements. (Lists may be empty, but we can define the minimum and maximum of an empty list to be ∞ and -∞, respectively.) And again, if I have the minimum and maximum elements of two lists, I can compute the minimum and maximum of their concatenation, by taking the least of the two minima and the greater of the two maxima.
  • GCDs: If I’m doing something with number theory, I might summarize a finite list with its greatest common divisor. If I have the greatest common divisors of two lists, I can compute the greatest common divisor of their concatenation, by computing the gcd of the gcds.
  • Heads: I can summarize a list by its head (i.e., first element), if it exists! If the list is empty, I will just note that it has no head. Given the heads of two lists, I can tell you the first element of their concatenation: it’s the head of the left-hand list if it exists, and the head of the right-hand list otherwise.
  • The Whole List: This one may seem a bit silly, but I’ll reference it later. You can trivially “summarize” a list by just keeping the whole list. Your summarizer here is the identity function! That obviously gives you enough information to know the concatenation, as well, so it’s a perfectly good monoid.
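In Haskell terms, several of these summarizers are just foldMap into monoids that already live in base. A sketch (the function names are mine):

```haskell
import Data.Monoid (First (..), Sum (..))

-- Counts: every element contributes Sum 1; <> adds the counts.
count :: [a] -> Int
count = getSum . foldMap (const (Sum 1))

-- Sums: each element is its own summary; <> adds them.
total :: [Int] -> Int
total = getSum . foldMap Sum

-- Heads: First keeps the leftmost Just; <> prefers the left summary.
firstElem :: [a] -> Maybe a
firstElem = getFirst . foldMap (First . Just)
```

The monoid law is exactly the "combine summaries of concatenations" property, e.g. total (xs ++ ys) == total xs + total ys.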

Counter-examples and how to fix them

At this point, our examples of monoids include counts, sums, minima, maxima, gcds, and heads. With so many examples, one might start to suspect that any function on lists is a monoid. But of course, that’s not the case. We can look at some instructive counterexamples.

  • Means: If you’re thinking of ways to summarize a list of numbers, one that probably comes to mind is the mean. The mean is not quite a monoid, because knowing the means of two lists isn’t enough information to compute the mean of their concatenation. Just averaging the averages doesn’t work, unless the lists happen to be the same length!

While means themselves aren’t monoids, I can quite easily deal with means using a monoid structure: instead of keeping the mean itself, I can keep an ordered pair of sum and count, which is a monoid, and actually do the division only when I need the final result. By finding an actual monoid, pairs of counts and sums, I can use it as a sort of halfway point to compute with parts of the data, though I still have to do a (small) bit of extra computation to get from there to an actual mean.

Imagine you are designing a parallel or distributed algorithm for computing the mean of some data set. The monoid structure means you can use a divide-and-conquer approach to take advantage of data parallelism when computing the sums and counts. Split up the list, sum and count each piece, and then combine them. Or maybe the data already lives on different compute nodes, so you can have each node compute the count and sum locally before combining their results. Once you have the overall monoid value (sum and count), it still remains to compute the mean; but luckily, that last step of computation is pretty easy: just one division!
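A minimal sketch of this sum-and-count monoid (the names are mine). Pairs of monoids combine componentwise in base, so no new instance is needed:

```haskell
import Data.Monoid (Sum (..))

-- The monoidal "halfway point": a running sum and a running count.
type MeanAcc = (Sum Double, Sum Int)

-- Summarize one chunk of the data.
summarizeMean :: [Double] -> MeanAcc
summarizeMean xs = (Sum (sum xs), Sum (length xs))

-- The final, non-monoidal step: one division.
mean :: MeanAcc -> Double
mean (Sum s, Sum n) = s / fromIntegral n
```

Each node (or thread) runs summarizeMean on its chunk; the summaries are combined with <> and divided once at the end.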

  • Medians: Going down the list of averages, you might also want to know the median of a list. Now things get more dicey, but also more intriguing! Once again, the median itself is not a monoid: knowing the medians of two lists still doesn’t tell you the median of the concatenation.

This leads to the same question as before. Is there some other monoid I can use as my halfway point? There’s one answer that always works here, but isn’t useful: remember that the whole list is a monoid. But that answer corresponds to just concatenating the whole list together and only running the summarizer function on the final concatenated list. That avoids interesting uses of monoids entirely.

For the median, can we do better? Honestly, it looks difficult. In general, any element of a list could wind up being the median of some concatenated list containing it. So it seems that any monoid we choose here will need to contain information about each element of the list. That’s already looking a lot like the trivial “the whole list” monoid above.

There’s one improvement we can make, though. We don’t care about the order of elements in the list! We can then choose the sorted list as our monoid.

  • First, is that even a monoid? Well, (a) given a list I can always sort it. And (b) If I have the sorted versions of two lists, I can merge them to get the sorted concatenation of those lists. So yes, it’s a monoid.
  • Not only is it a monoid, but the monoid structure is helpful. It’s actually easier to merge sorted lists than it is to sort the concatenation of the originals — this is the whole basis of merge sort, after all.
  • Finally, this monoid helps with computing a median. Although we don’t save on the size of data, it’s trivial to find the median of a list after sorting it.

So we do find an approach to computing medians that can take advantage of monoid structure after all. Ultimately, it amounts to this: sort the list using merge sort, then pick the middle element after sorting. Indeed, merge sort is often used because it has many of the properties we’re interested in: it’s easy to parallelize, distribute over multiple nodes, etc. That’s exactly because of the monoid structure it exploits.
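Here is a sketch of that sorted-list monoid in Haskell (the newtype and names are mine): <> is the merge step of merge sort, and the median falls out by indexing the final sorted list.

```haskell
newtype Sorted a = Sorted { getSorted :: [a] }
  deriving Show

-- <> merges two sorted lists, exactly as in merge sort.
instance Ord a => Semigroup (Sorted a) where
  Sorted xs <> Sorted ys = Sorted (merge xs ys)
    where
      merge [] bs = bs
      merge as [] = as
      merge (a : as) (b : bs)
        | a <= b    = a : merge as (b : bs)
        | otherwise = b : merge (a : as) bs

instance Ord a => Monoid (Sorted a) where
  mempty = Sorted []

-- Summarize a chunk: singleton summaries combined with the monoid.
toSorted :: Ord a => [a] -> Sorted a
toSorted = foldMap (\x -> Sorted [x])

-- The final step: pick the middle element of the sorted list.
median :: Ord a => Sorted a -> Maybe a
median (Sorted []) = Nothing
median (Sorted xs) = Just (xs !! (length xs `div` 2))
```

Splitting the data, sorting each piece, and merging with <> is exactly the divide-and-conquer structure of merge sort.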

It’s worth mentioning, though, that sorting and then picking the middle element is not the classically optimal way to find a median. The median-of-medians algorithm can famously do the computation in O(n) time. Perhaps there is some suitably clever monoid that takes advantage of these ideas, but I don’t see how that would happen. In the end, I suppose not all classical algorithms on lists can be expressed as a monoid in this way.

Why is this equivalent to abstract monoids?

I haven’t yet justified calling these list-summarizers monoids. Are they really the same thing as monoids in their traditional abstract definition? They are, indeed, and the reason is that finite lists form a construction known as the free monoid. The list summarizer function is a monoid homomorphism, meaning that it preserves the monoid structure of this free monoid.

I’ll sketch a proof in two directions:

  • Suppose you have a list-summarizer like this. It must be a monoid because lists themselves form a monoid, and the summarizer is a monoid homomorphism so that it preserves the monoid structure. (One subtle point here is that the abstract monoid we’re talking about is on things in the image of the summarizer function. If the codomain of the function is larger than the image, you do not necessarily have a monoid on the entire codomain.)
  • Conversely, suppose you have a monoid M. Then there is a monoid homomorphism from the free monoid generated by M back to M itself, so you have a list-summarizer as well. (I’m skipping the details, but this is well known. See, for example, this source I just found with Google, which says “every monoid arises as a homomorphic image of a free monoid.”)

And that’s it!

by Chris Smith at January 24, 2022 07:36 AM

January 23, 2022

Mark Jason Dominus

Annoying mathematical notation

Recently I've been thinking that maybe the thing I really dislike about set theory might be the power set axiom. I need to do a lot more research about this, so any blog articles about it will be in the distant future. But while looking into it I ran across an example of a mathematical notation that annoyed me.

This paper of Gitman, Hamkins, and Johnstone considers a subtheory of ZFC, which they call “”, obtained by omitting the power set axiom. Fine so far. But the main point of the paper:

Nevertheless, these deficits of are completely repaired by strengthening it to the theory , obtained by using collection rather than replacement in the axiomatization above.

Got that? They are comparing two theories that they call “” and “”.

(Blog post by Gitman)

[ Previously ]

by Mark Dominus ( at January 23, 2022 06:43 PM

Sandy Maguire

Review: Clowns to the Left of Me, Jokers to the Right

Another week, another paper. This week it’s McBride’s Clowns to the Left of Me, Jokers to the Right (CJ). At a high level, CJ generalizes the results from The Derivative of a Regular Type is its Type of One-Hole Contexts, wondering about what happens to a zipper when we don’t require the elements on either side to have the same type. This turns out to be not just an idle curiosity; the technique can be used to automatically turn a catamorphism into a tail-recursive function. Why is THAT useful? It lets us run functional programs on stock hardware.

The paper begins by reminding us that all algebraic types can be built out of compositions of functors. Furthermore, any recursive ADT can be represented as the fix-point of its base functor. For example, rather than writing

data Expr = Val Int | Add Expr Expr

we can instead pull the recursive inlining of Expr out into a type argument:

data ExprF a = ValF Int | AddF a a

and then can tie the knot:

newtype Fix f = Fix { unFix :: f (Fix f) }

type Expr ~= Fix ExprF

This is all standard bananas-and-barbed-wires machinery, so refer to that paper if you’d like a deeper presentation than I’ve provided here.

Rather than go through the paper’s presentation of this section, I will merely point out that GHC.Generics witnesses the “all ADTs can be built out of functor composition,” and give ExprF a Generic1 instance:

data ExprF a = ValF Int | AddF a a
  deriving stock (Functor, Generic1)

Clowns and Jokers

The title of CJ is a throw-back to some boomer song, whose lyrics go

Clowns to the left of me! Jokers to the right! Here I am stuck in the middle with you.

While this is an apt idea for what’s going on in the paper, it’s also an awful mnemonic for those of us who don’t have strong associations with the song. My mnemonic is that “clowns” come sooner in a lexicographical ordering than “jokers” do. Likewise, work you’ve already done comes before work you haven’t yet done, which is really what CJ is about.

So here’s the core idea of CJ: we can “dissect” a traversal into work we’ve already done, and work we haven’t yet done. The work we’ve already done can have a different type than the stuff left to do. These dissections are a rather natural way of representing a suspended computation. Along with the dissection itself is the ability to make progress. A dissection is spiritually a zipper with different types on either side, so we can make progress by transforming the focused element from “to-do” to “done”, and then focusing on the next element left undone.

CJ implements all of this as a typeclass with fundeps, but I prefer type families. And furthermore, since this is all generic anyway, why not implement it over GHC.Generics? So the game here is thus to compute the type of the dissection for each of the Generic1 constructors.

Begin by building the associated type family. The dissected version of a functor is necessarily a bifunctor, since we want slots to store our clowns and our jokers:

class GDissectable p where
  type GDissected p :: Type -> Type -> Type

As usual, we lift GDissectable over M1:

instance GDissectable f => GDissectable (M1 _1 _2 f) where
  type GDissected (M1 _1 _2 f) = GDissected f

Because a dissection is a separation of the work we have and haven’t done yet, the cases for U1 and K1 are uninspired — there is no work to do, since they’re constants!

instance GDissectable U1 where
  type GDissected U1 = K2 Void

instance GDissectable (K1 _1 a) where
  type GDissected (K1 _1 a) = K2 Void

where K2 is the constant bifunctor:

data K2 a x y = K2 a

A different way to think about these dissections is as generalized zippers, which are the derivatives of their underlying types. Since U1 and K1 are constants, their derivatives are zero, which we have shown here via K2 Void.

The Par1 generic constructor is used to encode usages of the functor’s type parameter. Under the view of the derivative, this is a linear use of the variable, and thus its derivative is one:

instance GDissectable Par1 where
  type GDissected Par1 = K2 ()

We’re left with sums and products. Sums are easy enough: the dissection of the sum is the sum of the dissections.

instance (GDissectable f, GDissectable g) => GDissectable (f :+: g) where
  type GDissected (f :+: g) = Sum2 (GDissected f) (GDissected g)


data Sum2 f g a b = L2 (f a b) | R2 (g a b)

Again, this aligns with our notion of the derivative, as well as with our intuition. If I want to suspend a coproduct computation half way, I either have an L1 I need to suspend, or I have an R1. Nifty.

Finally we come to products:

instance (GDissectable f, GDissectable g) => GDissectable (f :*: g) where
  type GDissected (f :*: g) =
    Sum2 (Product2 (GDissected f) (Joker g))
         (Product2 (Clown f) (GDissected g))


data Clown p a b = Clown (p a)
data Joker p a b = Joker (p b)
data Product2 f g a b = Product2 (f a b) (g a b)

Let’s reason by intuition here first. I have both an f and a g stuck together. If I’d like to suspend a traversal through this thing, either I am suspended in the f, with g not yet touched (Joker g), or I have made it through the f entirely (Clown f), and have suspended inside of g.

Rather unsurprisingly (but also surprisingly, depending on your point of view!), this corresponds exactly to the product rule:

\[ \frac{d}{dx}[f(x)\cdot{}g(x)] = f(x)\cdot{}g'(x) + f'(x)\cdot{}g(x) \]

Curry-Howard strikes in the most interesting of places!

Getting Started

With our dissected types defined, it’s now time to put them to use. The paper jumbles a bunch of disparate pieces together, but I’m going to split them up for my personal understanding. The first thing we’d like to be able to do is to begin traversing a structure, which is to say, to split it into its first joker and the resulting dissection.

We’ll make a helper structure:

data Suspension p k c j
  = Done (p c)
  | More j (k c j)
  deriving Functor

A Suspension p k c j is either a p fully saturated with clowns (that is, we’ve finished traversing it), or a joker and more structure (k c j) to be traversed. k will always be GDissected p, but for technical reasons, we’re going to need to keep it as a second type parameter.

Armed with Suspension, we’re ready to add our first method to GDissectable. gstart takes a fully-saturated p j and gives us back a suspension:

class GDissectable p where
  type GDissected p :: Type -> Type -> Type
  gstart :: p j -> Suspension p (GDissected p) c j

These instances are all pretty easy. Given a double natural transformation over Suspension:

bihoist
    :: (forall x. p x -> p' x)
    -> (forall a b. k a b -> k' a b)
    -> Suspension p  k  c j
    -> Suspension p' k' c j
bihoist _ g (More j kcj) = More j (g kcj)
bihoist f _ (Done pc)    = Done (f pc)

Wingman can write U1, K1, Par1, M1 and :+: all for us:

  gstart _ = Done U1

  gstart (K1 a) = Done (K1 a)

  gstart (Par1 j) = More j (K2 ())

  gstart (M1 fj) = bihoist M1 id $ gstart fj

  gstart (L1 fj) = bihoist L1 L2 $ gstart fj
  gstart (R1 gj) = bihoist R1 R2 $ gstart gj

For products, gstart attempts to start the first element, and hoists its continuation if it got More. Otherwise, it starts the second element. This is done with a couple of helper functions:

mindp
    :: GDissectable g
    => Suspension f (GDissected f) c j
    -> g j
    -> Suspension (f :*: g)
                 (Sum2 (Product2 (GDissected f) (Joker g))
                       (Product2 (Clown f) (GDissected g)))
                 c j
mindp (More j pd) qj = More j $ L2 $ Product2 pd $ Joker qj
mindp (Done pc) qj = mindq pc (gstart qj)

mindq
    :: f c
    -> Suspension g (GDissected g) c j
    -> Suspension (f :*: g)
                 (Sum2 (Product2 (GDissected f) (Joker g))
                       (Product2 (Clown f) (GDissected g)))
                 c j
mindq pc (More j qd) = More j $ R2 $ Product2 (Clown pc) qd
mindq pc (Done qc) = Done (pc :*: qc)

and then

  gstart (pj :*: qj) = mindp (gstart @f pj) qj

Making Progress

Getting started is nice, but it’s only the first step in the process. Once we have a More suspension, how do we move the needle? Enter gproceed, which takes a clown to fill the current hole and a suspension, and gives back a new suspension corresponding to the next joker.

class GDissectable p where
  type GDissected p :: Type -> Type -> Type
  gstart :: p j -> Suspension p (GDissected p) c j
  gproceed :: c -> GDissected p c j -> Suspension p (GDissected p) c j

By pumping gproceed, we can make our way through a suspension, transforming each joker into a clown. Eventually our suspension will be Done, at which point we’ve traversed the entire data structure.

For the most part, gproceed is also Wingman-easy:

  -- U1
  gproceed _ (K2 v) = absurd v

  -- K1
  gproceed _ (K2 v) = absurd v

  gproceed c _ = Done (Par1 c)

  gproceed fc = bihoist M1 id . gproceed fc

  gproceed c (L2 dis) = bihoist L1 L2 $ gproceed c dis
  gproceed c (R2 dis) = bihoist R1 R2 $ gproceed c dis

Products are again a little tricky. If we’re still working on the left half, we want to proceed through it, unless we finish, in which case we want to start on the right half. When the right half finishes, we need to lift that success all the way through the product. Our helper functions mindp and mindq take care of this:

  gproceed c (L2 (Product2 pd (Joker qj))) = mindp (gproceed @f c pd) qj
  gproceed c (R2 (Product2 (Clown pc) qd)) = mindq pc (gproceed @g c qd)

Plugging Holes

McBride points out that if we forget the distinction between jokers and clowns, what we have is a genuine zipper. In that case, we can just plug the existing hole, and give back a fully saturated type. This is witnessed by the final method of GDissectable, gplug:

class GDissectable p where
  type GDissected p :: Type -> Type -> Type
  gstart :: p j -> Suspension p (GDissected p) c j
  gproceed :: c -> GDissected p c j -> Suspension p (GDissected p) c j
  gplug :: x -> GDissected p x x -> p x

Again, things are Wingman-easy. This time, we can even synthesize the product case for free:

  -- U1
  gplug _ (K2 vo) = absurd vo

  -- K1
  gplug _ (K2 vo) = absurd vo

  gplug x _ = Par1 x

  gplug x dis = M1 $ gplug x dis

  gplug x (L2 dis) = L1 (gplug x dis)
  gplug x (R2 dis) = R1 (gplug x dis)

  gplug x (L2 (Product2 f (Joker g))) = gplug x f :*: g
  gplug x (R2 (Product2 (Clown f) g)) = f :*: gplug x g

This sums up GDissectable.

Nongeneric Representations

GDissectable is great and all, but it would be nice to not need to deal with generic representations. This bit isn’t in the paper, but we can lift everything back into the land of real types by making a copy of GDissectable:

class (Functor p, Bifunctor (Dissected p)) => Dissectable p where
  type Dissected p :: Type -> Type -> Type
  start :: p j -> Suspension p (Dissected p) c j
  proceed :: c -> Dissected p c j -> Suspension p (Dissected p) c j
  plug :: x -> Dissected p x x -> p x

and then a little machinery to do -XDerivingVia:

newtype Generically p a = Generically { unGenerically :: p a }
  deriving Functor

instance ( Generic1 p
         , Functor p
         , Bifunctor (GDissected (Rep1 p))
         , GDissectable (Rep1 p)
         )
    => Dissectable (Generically p) where
  type Dissected (Generically p) = GDissected (Rep1 p)
  start (Generically pj) =
    bihoist (Generically . to1) id $ gstart $ from1 pj
  proceed x = bihoist (Generically . to1) id . gproceed x
  plug x = Generically . to1 . gplug x

With this out of the way, we can now get Dissectable for free on ExprF from above:

data ExprF a = ValF Int | AddF a a
  deriving stock (Functor, Generic, Generic1, Show)
  deriving Dissectable via (Generically ExprF)

Dissectable Fmap, Sequence and Catamorphisms

Given a Dissectable constraint, we can write a version of fmap that explicitly walks the traversal, transforming each element as it goes. Of course, this is silly, since we already have Functor for any Dissectable, but it’s a nice little sanity check:

tmap :: forall p a b. Dissectable p => (a -> b) -> p a -> p b
tmap fab = pump . start
  where
    pump :: Suspension p (Dissected p) b a -> p b
    pump (More a dis) = pump $ proceed (fab a) dis
    pump (Done j) = j

We start the dissection, and then pump its suspension until we’re done, applying fab as we go.

Perhaps more interestingly, we can almost get Traversable with this machinery:

tsequence :: forall p f a. (Dissectable p, Monad f) => p (f a) -> f (p a)
tsequence = pump . start
  where
    pump :: Suspension p (Dissected p) a (f a) -> f (p a)
    pump (More fa dis) = do
      a <- fa
      pump $ proceed a dis
    pump (Done pa) = pure pa

It’s not quite Traversable, since it requires a Monad instance instead of merely Applicative. Why’s that? I don’t know, but MonoidMusician suggested it’s because applicatives don’t care about the order in which you sequence them, but this Dissectable is very clearly an explicit ordering on the data dependencies in the container. Thanks MonoidMusician!

Finally, we can implement the stack-based, tail-recursive catamorphism that we’ve been promised all along. The idea is simple — we use the Dissected type as our stack, pushing them on as we unfold the functor fixpoint, and resuming them as we finish calls.

tcata :: forall p v. Dissectable p => (p v -> v) -> Fix p -> v
tcata f t = load' t []
  where
    load'
        :: Fix p
        -> [Dissected p v (Fix p)]
        -> v
    load' (Fix t) stk = next (start t) stk

    next
        :: Suspension p (Dissected p) v (Fix p)
        -> [Dissected p v (Fix p)]
        -> v
    next (More p dis) stk = load' p (dis : stk)
    next (Done p) stk = unload' (f p) stk

    unload'
        :: v
        -> [Dissected p v (Fix p)]
        -> v
    unload' v [] = v
    unload' v (pd : stk) = next (proceed v pd) stk

Compare this with the usual implementation of cata:

cata :: Functor f => (f a -> a) -> Fix f -> a
cata f (Fix fc) = f $ fmap (cata f) fc

which just goes absolutely ham, expanding nodes and fmapping over them, destroying any chance at TCO.
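As a standalone refresher (my own sketch, not code from the paper), here is cata running against a hypothetical list base functor ListF with a summing algebra sumAlg — every ConsF node adds a stack frame, which is exactly what tcata's explicit stack avoids:

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- The functor fixpoint, as in the post.
newtype Fix f = Fix (f (Fix f))

-- The usual, non-tail-recursive catamorphism.
cata :: Functor f => (f a -> a) -> Fix f -> a
cata f (Fix fc) = f (fmap (cata f) fc)

-- A base functor for lists, and an algebra that sums the elements.
data ListF a r = NilF | ConsF a r
  deriving Functor

sumAlg :: ListF Int Int -> Int
sumAlg NilF        = 0
sumAlg (ConsF a r) = a + r

toFix :: [Int] -> Fix (ListF Int)
toFix = foldr (\a r -> Fix (ConsF a r)) (Fix NilF)
```

Here cata sumAlg (toFix [1..10]) evaluates to 55, but each recursive call nests inside the fmap, so the stack depth grows with the list.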

The paper also has something to say about free monads, but it wasn’t able to hold my attention. It’s an application of this stuff, though in my opinion the approach is much more interesting than its applications. So we can pretend the paper is done here.

But that’s not all…

Functor Composition

Although the paper doesn’t present it, there should also be another instance of GDissectable here, for functor composition. Based on the composite chain rule, it should be:

instance (Dissectable f, GDissectable g) => GDissectable (f :.: g) where
  type GDissected (f :.: g) =
    Product2 (Compose2 (Dissected f) g)
             (GDissected g)

newtype Compose2 f g c j = Compose2 (f (g c) (g j))

GDissected is clearly correct by the chain rule, but Compose2 isn’t as clear. We stick the clowns in the left side of the composite of f . g, and the jokers on the right.

Intuitively, we’ve done the same trick here as the stack machine example. The first element of the Product2 in GDissected keeps track of the context of the f traversal, and the second element is the g traversal we’re working our way through. Whenever the g finishes, we can get a new g by continuing the f traversal!

It’s important to note that I didn’t actually reason this out—I just wrote the chain rule from calculus and fought with everything until it typechecked. Then I rewrote my examples that used :+: and :*: to instead compose over Either and (,), and amazingly I got the same results! Proof by typechecker!

After a truly devoted amount of time, I managed to work out gstart for composition as well.

  gstart (Comp1 fg) =
    case start @f fg of
      More gj gd -> continue gj gd
      Done f -> Done $ Comp1 f
    where
      continue
          :: g j
          -> Dissected f (g c) (g j)
          -> Suspension
               (f :.: g)
               (Product2 (Compose2 (Dissected f) g) (GDissected g))
               c j
      continue gj gd =
        case gstart gj of
          More j gd' ->
            More j $ Product2 (Compose2 gd) gd'
          Done g ->
            case proceed @f g gd of
              More gj gd -> continue gj gd
              Done fg -> Done $ Comp1 fg

The idea is that you start f, which gives you a g to start, and you need to keep starting g until you find one that isn’t immediately done.

gproceed is similar, except dual. If all goes well, we can just proceed down the g we’re currently working on. The tricky part is now when we finish a g node, we need to keep proceeding down f nodes until we find one that admits a More:

  gproceed c (Product2 cfg@(Compose2 fg) gd) =
    case gproceed @g c gd of
      More j gd -> More j $ Product2 cfg gd
      Done gc -> finish gc
    where
      -- finish
      --     :: g c
      --     -> Suspension
      --          (f :.: g)
      --          (Product2 (Compose2 (Dissected f) g) (GDissected g))
      --          c j
      finish gc =
        case proceed @f gc fg of
          More gj gd ->
            case gstart gj of
              More j gd' -> More j $ Product2 (Compose2 gd) gd'
              Done gc -> finish gc
          Done f -> Done $ Comp1 f

I’m particularly proud of this; not only did I get the type for GDissected right on my first try, I was also capable of working through these methods, which probably took upwards of two hours.

GHC.Generics isn’t so kind as to just let us test it, however. Due to some quirk of the representation, we need to add an instance for Rec1, which is like K1 but for types that use the functor argument. We can give an instance of GDissectable by transferring control back to Dissectable:

instance (Generic1 f, Dissectable f) => GDissectable (Rec1 f) where
  type GDissected (Rec1 f) = Dissected f
  gstart (Rec1 f) = bihoist Rec1 id $ start f
  gproceed c f = bihoist Rec1 id $ proceed c f
  gplug x gd = Rec1 $ plug x gd

Now, a little work to be able to express AddF as a composition, rather than a product:

data Pair a = Pair a a
  deriving (Functor, Show, Generic1)
  deriving Dissectable via (Generically Pair)

deriving via Generically (Either a) instance Dissectable (Either a)

and we can rewrite ExprF as a composition of functors:

data ExprF' a = ExprF (Either Int (Pair a))
  deriving stock (Functor, Generic, Generic1, Show)
  deriving Dissectable via (Generically ExprF')

pattern ValF' :: Int -> ExprF' a
pattern ValF' a = ExprF (Left a)

pattern AddF' :: a -> a -> ExprF' a
pattern AddF' a b = ExprF (Right (Pair a b))

Everything typechecks, and tcata gives us the same results for equivalent values over ExprF and ExprF'. As one final sanity check, we can compare the computed dissected types:

*> :kind! Dissected ExprF
Dissected ExprF :: * -> * -> *
= Sum2
    (K2 Void)
    (Sum2
       (Product2 (K2 ()) (Joker Par1))
       (Product2 (Clown Par1) (K2 ())))

*> :kind! Dissected ExprF'
Dissected ExprF' :: * -> * -> *
= Product2
    (Compose2 (Sum2 (K2 Void) (K2 ())) (Rec1 Pair))
    (Sum2
       (Product2 (K2 ()) (Joker Par1))
       (Product2 (Clown Par1) (K2 ())))

They’re not equal, but are they isomorphic? We should hope so! The first one is Sum2 0 x, which is clearly isomorphic to x. The second is harder:

Product2 (Compose2 (Sum2 (K2 Void) (K2 ())) (Rec1 Pair)) x

If that first argument to Product2 is 1, then these two types are isomorphic. So let’s see:

    Compose2 (Sum2 (K2 Void) (K2 ())) (Rec1 Pair)
= symbolic rewriting
    Compose2 (0 + 1) (Rec1 Pair)
= 0 is an identity for +
    Compose2 1 (Rec1 Pair)
= definition of Compose2
    K2 () (Rec1 Pair c) (Rec1 Pair j)
= K2 () is still 1
    1

Look at that, baby. Isomorphic types, that compute the same answer.

As usual, today’s code is available on Github.

January 23, 2022 01:49 PM

January 22, 2022

Mark Jason Dominus

Bad writing

A couple of weeks ago I had this dumb game on my phone, there are these characters fighting monsters. Each character has a special power that charges up over time, and then when you push a button the character announces their catch phrase and the special power activates.

This one character with the biggest hat had the catch phrase

I follow my own destiny!

and I began to dread activating this character's power. Every time, I wanted to grab them by the shoulders and yell “That's what destiny is, you don't get a choice!” But they kept on saying it.

So I had to delete the whole thing.

by Mark Dominus at January 22, 2022 05:27 PM

January 21, 2022

Mark Jason Dominus

A proposal for improved language around divisibility

Divisibility and modular residues are among the most important concepts in elementary number theory, but the terminology for them is clumsy and hard to pronounce.

  • m is divisible by n
  • m is a multiple of n
  • n divides m

The first two are 8 syllables long. The last one is tolerably short but is backwards. Similarly:

  • The mod-n residue of m is r

is awful. It can be abbreviated to

  • m has the form kn + r

but that is also long, and introduces a dummy k that may be completely superfluous. You can say “m is r mod n” or “m mod n is r” but people find that confusing if there is a lot of it piled up.

Common terms should be short and clean. I wish there were a mathematical jargon term for “has the form kn + r” that was not so cumbersome. And I would like a term for “mod-5 residue” that is comparable in length and simplicity to “fifth root”.

For mod-2 residues we have the special term “parity”. I wonder if something like “n-ity” could catch on? This doesn't seem too barbaric to me. It's quite similar to the terminology we already use for n-gons. What is the name for a polygon with 13 sides? Is it a triskaidecawhatever? No, it's just a 13-gon, simple.

Then one might say things like:

  • “Primes larger than 3 have 6-ity of 1 or 5”

  • “The 4-ity of a square is 0 or 1” or “a perfect square always has 4-ity of 0 or 1”

  • “A number is a sum of two squares if and only if its prime factorization includes every prime with 4-ity 3 an even number of times.”

  • “For each n, the set of numbers of n-ity 1 is closed under multiplication”

For “multiple of n” I suggest that “even” and “odd” be extended so that "n-even" means a multiple of n, and "n-odd" means a nonmultiple of n. I think “m is 5-odd” is a clear improvement on “m is a nonmultiple of 5”:

  • “The sum or product of two n-even numbers is n-even; the product of two n-odd numbers is n-odd, if n is prime, but the sum may not be. (n = 2 is a special case)”

  • “If the sum of three squares is 4-even, then at least one of the squares is 4-even, because 4-odd squares have 4-ity 1, and you cannot add three 1s to get zero”

  • “A number is 3-even if the sum of its digits is 3-even”

It's conceivable that “5-ity” could be mistaken for “five-eighty” but I don't think it will be a big problem in practice. The stress is different, the vowel is different, and also, numbers like 580 just do not come up that often.

The next mouth-full-of-marbles term I'd want to take on would be “is relatively prime to”. I'd want it to be short, punchy, and symmetric-sounding. I wonder if it would be enough to abbreviate “least common multiple” and “greatest common divisor” to “join” and “meet” respectively? Then “m and n are relatively prime” becomes “m meet n is 1” and we get short phrasings like “If m is n-even, then m join n is just m”. We might abbreviate a little further: “m meet n is 1” becomes just “m meets n”.

[ Addendum: Eirikr Åsheim reminds me that “m and n are coprime” is already standard and is shorter than “m is relatively prime to n”. True, I had forgotten. ]

by Mark Dominus at January 21, 2022 04:57 PM

January 20, 2022

Mark Jason Dominus

Testing for divisibility by 8

I recently wrote:

Instead of multiplying the total by 3 at each step, you can multiply it by 2, which gives you a (correct but useless) test for divisibility by 8.

But one reader was surprised that I called it “useless”, saying:

I only know of one test for divisibility by 8: if the last three digits of a number are divisible by 8, so is the original number. Fine … until the last three digits are something like 696.

Most of these divisibility tricks are of limited usefulness, because they are not less effort than short division, which takes care of the general problem. I discussed short division in the first article in this series with this example:

Suppose you want to see if 1234 is divisible by 7. It's 1200-something, so take away 700, which leaves 500-something. 500-what? 530-something. So take away 490, leaving 40-something. 40-what? 44. Now take away 42, leaving 2. That's not 0, so 1234 is not divisible by 7.

For a number like 696, take away 640, leaving 56. 56 is divisible by 8, so 696 is also. Suppose we were given 996 instead? From 996 take away 800 leaving 196, and then take away 160 leaving 36, which is not divisible by 8. For divisibility by 8 you can ignore all but the last three digits; short division also works quite well for other small divisors, even when the dividend is large.

This is not what I usually do myself, though. My own method is a bit hard to describe but I will try. The number has the form 100a + b, where b, the last two digits, is a multiple of 4, or else we would not be checking it in the first place. The b part has a ⸢parity⸣: it is either an even multiple of 4 (that is, a multiple of 8) or an odd multiple of 4 (otherwise). This ⸢parity⸣ must match the (ordinary) parity of a. The number is divisible by 8 if and only if the parities match. For example, 104 is divisible by 8 because both parts are ⸢odd⸣. Similarly 696 where both parts are ⸢even⸣. But 852 is not divisible by 8, because the 8 is even but the 52 is ⸢odd⸣.
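The parity-matching test described above can be transcribed into a short function. This is my sketch, not code from the post; it assumes the decomposition n = 100a + b with b the last two digits, and relies on 100 ≡ 4 (mod 8):

```haskell
-- Divisibility by 8 via parity matching: with n = 100*a + b, n is
-- divisible by 8 iff b is a multiple of 4 and the (ordinary) parity
-- of a matches the ⸢parity⸣ of b, i.e. the parity of b `div` 4.
div8 :: Int -> Bool
div8 n = b `mod` 4 == 0 && even a == even (b `div` 4)
  where
    (a, b) = n `divMod` 100
```

For example, div8 696 is True (both parts are ⸢even⸣), while div8 852 is False (the 8 is even but the 52 is ⸢odd⸣).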

by Mark Dominus at January 20, 2022 07:30 PM

Sandy Maguire

Automating Wordle

It’s been a weird day.

Erin’s family has recently been into a word game called Wordle. Inevitably it spilled into Erin’s life, and subsequently into mine. The rules are simple: there’s a secret five-letter word, and you need to find it by guessing words. If your word shares a letter in the same place as the secret word, that letter is marked as green. If you have a letter in a different place, but also in the secret word, it’s marked as yellow.

The goal is to find the secret word in six guesses or fewer. Yesterday’s, for example, was “pilot.”

After two days of doing it by hand, like a damn pleb, I decided it would be more fun to try to automate this game. So I spent all day thinking about how to do it, and eventually came up with a nice strategy. This blog post documents it, taking time to explain how it works, and more importantly, why.

Measuring Information

The trick to Wordle is to extract as much information from your guesses as possible. But what does it mean to “extract” information? Is information something we can put a number on?

Rather surprisingly, the answer is yes.

Let’s illustrate the idea by ignoring Wordle for a moment. Instead, imagine I have a buried treasure, somewhere on this map:

You don’t know where the treasure is, but you can ask me some yes/no questions, and I promise to answer truthfully. In six questions, can you find the treasure?

The trick here is to be very strategic about how you ask your questions. For example, the first question you ask might be “is the treasure on the left half of the map?”, to which I reply yes. We can now redraw the map, using red to highlight the places the treasure could still be:

Next you can ask “is the treasure on the bottom half of the remaining red region?” I say no. Thus the treasure is on the top half, and our refined map looks like this:

“Is the treasure on the right half?” Yes.

“Top?” No.

You get the idea. By phrasing subsequent questions like this, each time we cut in half the remaining possible hiding spots for the treasure. When we find the treasure, we’re done.

To quantify the amount of information necessary to find the treasure, we need only count how many questions we asked. If we can go from the full map to finding the treasure in 7 questions, we say we needed 7 bits of information to find it.

In general, the information required to solve a problem is the number of times we need to split the space in half in order to find what we were looking for. Information is measured in “bits.”

Back To Wordle

How does any of this apply to Wordle? The first question to ask ourselves is just how much information is required to win the game. But what does that mean? We’re trying to find one particular five-letter word in the entire English language. So, how many five-letter words are there in the English language? Nobody knows for sure, but I wrote a program to look through the dictionary, and it came up with 5150 words.

If we need to find one word in particular out of these 5150, how many times do we need to cut it in half? Let’s do the math:

  5150 / 2
= 2575 / 2
= 1288 / 2
= 644  / 2
= 322  / 2
= 161  / 2
= 81   / 2
= 41   / 2
= 21   / 2
= 11   / 2
= 6    / 2
= 3    / 2
= 2    / 2
= 1

Thirteen cuts! It takes thirteen cuts to trim down the search space of all possible Wordle words down to a single word. Thus, analogously to our hidden treasure, we need thirteen bits of information in order to find the secret word.
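The repeated halving above is just a base-2 logarithm, rounded up. A quick sketch (cutsNeeded is a hypothetical name, not from the post):

```haskell
-- Number of halvings needed to isolate one item among n possibilities.
cutsNeeded :: Int -> Int
cutsNeeded n = ceiling (logBase 2 (fromIntegral n))
```

cutsNeeded 5150 gives 13, matching the hand count.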

Discovering Information

Knowing the amount of information necessary to solve Wordle is one thing, but where does that information actually come from? Recall, the rules of the game don’t even let us ask yes or no questions; all we’re allowed to do is guess a five-letter word.

How can we turn a five-letter word into a yes/no question? Much like with the buried treasure, it helps to have a lay of the land. Imagine that by some chance, exactly half the words in the dictionary had an e in them, and the other half had no e. Then, by guessing a word that contains an e, we could narrow down the possible words by half depending on whether or not we got a yellow result from Wordle.

Imagine by another coincidence that exactly half the words in the dictionary had an s in them, and the other half didn’t. We could further refine our possibilities by guessing a word that has an s as well as an e.

So that’s the idea. Of course, no letter is going to be in exactly half of the words, but some will be more “exactly half” than others. We can inspect the dictionary, and find the letters which are most “balanced.” Doing that, we get the following:

e: 203
a: 497
r: 641
o: 969
t: 981
l: 1019
i: 1021
s: 1079
n: 1215
u: 1401
c: 1419
y: 1481
h: 1557
d: 1575
p: 1623
g: 1715
m: 1719
b: 1781
f: 1901
k: 1911
w: 1927
v: 2017
x: 2241
z: 2245
q: 2257
j: 2261

The numbers here measure the imbalance of each letter. That is, there are 203 fewer words that contain e than do not. On the other end, there are 2261 more words that don’t contain j than do. This means that by guessing e, we are going to get a much more even split than by guessing j.

The letters with lower numbers give us more information on average than the letters with big numbers. And remember, information is the name of the game here.
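Here is a sketch of how such a table can be computed; imbalance and rankedLetters are hypothetical names (not from the post), and the ws argument stands in for the real dictionary:

```haskell
import Data.List (sortOn)

-- Score a letter by how unevenly it splits the word list: the absolute
-- difference between words that contain it and words that don't.
imbalance :: [String] -> Char -> Int
imbalance ws c = abs (with - without)
  where
    with    = length (filter (c `elem`) ws)
    without = length ws - with

-- All letters, sorted from most balanced (most informative) to least.
rankedLetters :: [String] -> [(Char, Int)]
rankedLetters ws = sortOn snd [ (c, imbalance ws c) | c <- ['a'..'z'] ]
```

A perfectly balanced letter scores 0; a letter that appears in every word (or none) scores the full length of the list.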

By forming a five-letter word out of the most-balanced letters on this list, we can extract approximately five bits of information from the system. So that means we’d like to come up with a word from the letters earot if at all possible. Unfortunately, there is no such word, so we need to look a little further and also pull in l. Now we can make a word from earotl—later!

Since later is formed from the most balanced letters in the word set, it has the highest expected information. By trying later first, we are statistically most likely to learn more than any other guess.

Let’s see how it does against yesterday’s word pilot. We get:


No greens, but we know that the secret word (pilot) doesn’t have any as, es or rs. Furthermore, we know it does have both a l and a t. Therefore, we can eliminate a huge chunk of our words, for example:

  • titan because the secret word has no a
  • cupid because it doesn’t have an l

and, as you can imagine, lots of other words.

In fact, the number of words remaining is 27. They are:

sloth, spilt, split, still, stilt, stool, tulip, unlit, until

We can check how many bits of information we extracted:

log2 (5150 / 27) = 7.58

We managed to extract nearly 8 bits of information from this one guess! That’s significantly better than the 5 we should have gotten “theoretically.� Not bad at all!
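That figure is just the base-2 logarithm of how much the candidate set shrank; a one-liner sketch (bitsExtracted is a hypothetical name):

```haskell
-- Bits of information gained by shrinking the candidate set
-- from `before` possibilities down to `after`.
bitsExtracted :: Double -> Double -> Double
bitsExtracted before after = logBase 2 (before / after)
```

bitsExtracted 5150 27 comes out just under 7.58.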

Our next guess can be found in the same way. Take the 27 remaining words, and figure out which letters are best balanced among them:

u: 5
i: 7
o: 9
s: 13
h: 17
n: 17
f: 19
p: 19
b: 21
g: 21
y: 21
c: 23
m: 23
q: 25
z: 25
a: 27
d: 27
e: 27
j: 27
k: 27
l: 27
r: 27
t: 27
v: 27
w: 27
x: 27

Notice that several letters have a fully unbalanced count of 27. This means that either all 27 words have the letter or none of them do; either way, these are completely unhelpful letters to guess.

Of our remaining 27, the most balanced word we can make from these letters is until. But notice that until uses both t and l, which we already learned from later!

We can do better by picking a word from the original dictionary which is most balanced according to these numbers. That word is using. Let’s use it for our next guess, which results in:


We’re left with only four words:


Rinse and repeat, by finding the most balanced letters in the remaining possibilities, and then finding the best word in the dictionary made out of those letters. The next guess is morph:


Which eliminates all words except for pilot. Nice.

With that, we’ve successfully automated away playing a fun game. Yay? This strategy works for any word, and homes in on it extremely quickly.

All the code can be found on Github.

January 20, 2022 06:02 PM

Monday Morning Haskell

Changing and Re-Arranging

Any time I go to a wedding or some kind of family gathering where we take a lot of pictures, it can seem like it goes on for a while. It seems like we have to take a picture with every different combination of people in it. Imagine how much worse it would be if we needed to also get every different ordering (or permutation) of people as well!

Even if it got to that level, it would still be easy to write a Haskell program to handle this problem! There are a couple specific list functions we can use. The "subsequences" function will give us the list of every subsequence of the input list.

subsequences :: [a] -> [[a]]

This would be helpful if all we want to know is the different combinations of people we would get in the pictures.

>> subsequences ["Christina", "Andrew", "Katharine"]
[[], ["Christina"], ["Andrew"], ["Christina", "Andrew"], ["Katharine"], ["Christina", "Katharine"], ["Andrew", "Katharine"], ["Christina", "Andrew", "Katharine"]]

Note a couple things. First, our result includes the empty sequence! Second of all, the order of the names is always the same. Christina is always before Andrew, who is always before Katharine.

Now let's suppose we have a different problem. We want everyone in the picture, but we can order them any way we want. How would we do that? The answer is with the "permutations" function.

permutations :: [a] -> [[a]]

This will give us every different ordering of our three people.

>> permutations ["Christina", "Andrew", "Katharine"]
[["Christina", "Andrew", "Katharine"], ["Andrew", "Christina", "Katharine"], ["Katharine", "Andrew", "Christina"], ["Andrew", "Katharine", "Christina"], ["Katharine", "Christina", "Andrew"], ["Christina", "Katharine", "Andrew"]]

Be wary though! These functions are mostly useful with small input lists. The number of subsequences of a list grows exponentially. With permutations, it grows with the factorial! By the time you get up to 10, you're already dealing with over 3 million possible permutations!

>> length (subsequences [1, 2, 3, 4])
16
>> length (subsequences [1, 2, 3, 4, 5])
32
>> length (subsequences [1, 2, 3, 4, 5, 6])
64
>> length (permutations [1, 2, 3, 4])
24
>> length (permutations [1, 2, 3, 4, 5])
120
>> length (permutations [1, 2, 3, 4, 5, 6])
720

If such cases are really necessary for you to handle, you might need to take advantage of Haskell's laziness and treat the result similar to an infinite list, as we'll cover later this month.
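As a quick sketch of that laziness point (firstThree is a hypothetical name): permutations produces its results incrementally, so demanding only a prefix never forces all 3.6 million orderings of a 10-element list.

```haskell
import Data.List (permutations)

-- Only the demanded prefix of the 10!-element result is ever computed.
firstThree :: [[Int]]
firstThree = take 3 (permutations [1 .. 10])
```

This returns immediately, and note that permutations always yields the input list itself as its first result.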

If you want to keep learning about random interesting functions, you should subscribe to Monday Morning Haskell! You'll get access to all our subscriber resources, including our Beginners Checklist.

by James Bowen at January 20, 2022 03:30 PM


Haskell development job with Well-Typed

tl;dr If you’d like a job with us, send your application as soon as possible.

Over the next few months, we are looking for one or more Haskell experts to join our team at Well-Typed. At the moment, we are looking particularly for someone who is knowledgeable and interested in one or more of the following areas:

  • General Haskell development, possibly with a focus on networking and/or performance.
  • Teaching Haskell, both at the introductory and the advanced level.
  • Smart contract development with Plutus (note that this still requires very solid Haskell knowledge as a basis).

This is a great opportunity for someone who is passionate about Haskell and who is keen to improve and promote Haskell in a professional context.

About Well-Typed

We are a team of top notch Haskell experts. Founded in 2008, we were the first company dedicated to promoting the mainstream commercial use of Haskell. To achieve this aim, we help companies that are using or moving to Haskell by providing a range of services including consulting, development, training, and support and improvement of the Haskell development tools. We work with a wide range of clients, from tiny startups to well-known multinationals. We have established a track record of technical excellence and satisfied customers.

Our company has a strong engineering culture. All our managers and decision makers are themselves Haskell developers. Most of us have an academic background and we are not afraid to apply proper computer science to customers’ problems, particularly the fruits of FP and PL research.

We are a self-funded company so we are not beholden to external investors and can concentrate on the interests of our clients, our staff and the Haskell community.

About the job

The roles are not tied to a single specific project or task, and are fully remote.

In general, work for Well-Typed could cover any of the projects and activities that we are involved in as a company. The work may involve:

  • working on GHC, libraries and tools;

  • Haskell application development;

  • working directly with clients to solve their problems;

  • teaching Haskell and developing training materials.

We try wherever possible to arrange tasks within our team to suit people's preferences and to rotate to provide variety and interest.

Well-Typed has a variety of clients. For some we do proprietary Haskell development and consulting. For others, much of the work involves open-source development and cooperating with the rest of the Haskell community: the commercial, open-source and academic users.

About you

Our ideal candidate has excellent knowledge of Haskell, whether from industry, academia or personal interest. Familiarity with other languages, low-level programming and good software engineering practices are also useful. Good organisation and ability to manage your own time and reliably meet deadlines is important. You should also have good communication skills.

You are likely to have a bachelor’s degree or higher in computer science or a related field, although this isn’t a requirement.

Further (optional) bonus skills:

  • experience in teaching Haskell or other technical topics,

  • experience of consulting or running a business,

  • experience with Cardano and/or Plutus,

  • knowledge of and experience in applying formal methods,

  • familiarity with (E)DSL design,

  • knowledge of networking, concurrency and/or systems programming,

  • experience with working on GHC,

  • experience with web programming (in particular front-end),

  • … (you tell us!)

Offer details

The offer is initially for one year full time, with the intention of a long term arrangement. Living in England is not required. We may be able to offer either employment or sub-contracting, depending on the jurisdiction in which you live. The salary range is 50k–90k GBP per year.

If you are interested, please apply by email to . Tell us why you are interested and why you would be a good fit for Well-Typed, and attach your CV. Please indicate how soon you might be able to start.

We are expecting that we need to fill multiple roles over the next few months, with some flexibility as to the starting dates, so there is no firm application deadline.

by andres, duncan, adam, christine at January 20, 2022 12:00 AM

January 19, 2022

Tweag I/O

Why Liquid Haskell matters

Since the inception of the Haskell language, the community around it has steadily innovated with language extensions and abstractions. To mention a few, we had the IO monad, all flavors of type classes and constraints, Template Haskell, generalized algebraic data types (GADTs), data kinds, etc. Then we had to deal with the programs resulting from all this innovation, which eventually gave rise to feelings that ranged from sticking to Simple Haskell to supporting a full move towards Dependent Haskell.

There seem to be two design goals in conflict in current discussions. On the one hand, we want our programs to be easy to maintain. For instance, changes to one part of the code shouldn’t propagate to many others, and the meaning of a piece of code should remain reasonably deducible from the text.

On the other hand, we want our programs to do what we mean, and we want assistance from the computer to help us achieve this goal. Our human brains can only deal with a limited amount of complexity, and industrial-scale software has largely exceeded our capacity in the last several decades. But can we prove the correctness of our programs with a type-system that doesn’t make them hard to read or change?

In this post, I’m arguing that Liquid Haskell offers an alternative angle to approach this question. Liquid Haskell is a tool that can analyse a program and calculate proof obligations that would ensure that the program meets some specification (not unlike Dafny, Why3, or F*). The specification is included in the program as a special comment inserted by the programmer. The compiler ignores this comment, but Liquid Haskell can find it. Once the proof obligations are identified, they are given to a theorem prover (an SMT solver specifically) in an attempt to save the programmer the trouble of writing a proof.

The conjecture that I pose is that many of the properties that would usually require dependent types to ensure at compile time, can be described in Liquid Haskell specifications. The point is not so much that proofs are easier with Liquid Haskell, but rather that we are in front of an approach that integrates well with a programming language as is, and yet it leverages the power of tools specialized to reason about logic when verifying programs.

Verifying functions

Let us consider an indexing function for lists.

elemAt :: [a] -> Int -> a
elemAt (x :  _) 0 = x
elemAt (_ : xs) i = elemAt xs (i-1)

This function can fail at runtime if the index is negative or greater than or equal to the length of the list. If we wanted to do indexing safely, a formulation with dependent types could change the type of the function. The list type is replaced with the type of lists of a given length, and the integer type is replaced with the type of natural numbers smaller than the length.

elemAt1 :: Vec a n -> Fin n -> a
elemAt1 xs i = case (i, xs) of
  (FZ, x :>  _) -> x
  (FS i, _ :> xs) -> elemAt1 xs i

-- Vec a n = lists of length n
data Vec :: Type -> Nat -> Type where
  VNil :: Vec a Zero
  (:>) :: a -> Vec a n -> Vec a (Succ n)

-- Fin n = natural numbers smaller than n
data Fin :: Nat -> Type where
  FZ :: Fin (Succ n)
  FS :: Fin n -> Fin (Succ n)

data Nat = Zero | Succ Nat

We include the types of Vec and Fin to illustrate some of the complexity that needs to be introduced to work with dependent types, but we are not overly concerned with their details. A gentle presentation of these and the following examples with dependent types can be found in this tutorial.

The original function has been rewritten to be total. The types of the arguments have been changed to exclude from the domain the invalid indices. Being a total function, it can no longer fail at runtime. Moreover, the new types force all invocations to provide a valid index. In the transformation, though, we lost the original simplicity. The list type is no longer the standard list type for which many functions are offered in the standard libraries, and the efficient Int type has been replaced by a representation that counts sticks. To avoid conversions between the standard types and the new Vec and Fin types, we could be tempted to use these types in code that needs list indexing, which then would make this change less local than we could wish for.

With Liquid Haskell, instead, we could get our safety guarantees by writing a specification of the original function expressed with predicate logic. We use a refinement type on the index argument to state its validity.

-- specifications go between special comments `{-@ ... @-}`
{-@ elemAt :: xs:[a] -> { i:Int | 0 <= i && i < len xs } -> a @-}
elemAt :: [a] -> Int -> a
elemAt (x :  _) 0 = x
elemAt (_ : xs) i = elemAt xs (i-1)

A refinement type denotes a subtype of another type, characterized by a given predicate. In our example, { i:Int | 0 <= i && i < len xs } is a subtype of Int, whose values i satisfy the predicate 0 <= i && i < len xs. The specifications allow us to name the arguments of functions and to refer to them in the predicates, such as the list argument named xs. The language used to describe a predicate is not Haskell, but a separate language with functions and logical connectives. When predicates need to refer to Haskell functions, the functions can be translated to the logic.

The function elemAt continues to be partial in Haskell, but it is total according to its specification. And Liquid Haskell is powerful enough to check that elemAt meets the specification, i.e. given a valid index, the function won’t fail. It is up to the user whether to use Liquid Haskell to ensure that invocations meet the specification as well, or to leave them unverified. The solution not only leaves the original function untouched, which dependent types could also achieve, but it is remarkably economical in expressing the property we care about.
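For reference, here is how the verified function behaves at call sites: in-bounds calls go through unchanged, while the commented-out call is exactly the kind that Liquid Haskell rejects (a self-contained sketch repeating the definitions above):

```haskell
-- Same function and specification as above, repeated so the snippet
-- stands alone.
{-@ elemAt :: xs:[a] -> { i:Int | 0 <= i && i < len xs } -> a @-}
elemAt :: [a] -> Int -> a
elemAt (x : _)  0 = x
elemAt (_ : xs) i = elemAt xs (i - 1)

ok :: Int
ok = elemAt [10, 20, 30] 2    -- in bounds: accepted, evaluates to 30

-- bad :: Int
-- bad = elemAt [10, 20, 30] 3   -- out of bounds: Liquid Haskell
--                               -- rejects this call statically
```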

Beyond bounds checking

Bounds checking is useful enough, but given the ability of Liquid Haskell to translate pure Haskell functions into the logic, we can aspire to verify other kinds of properties too. The following example is a datatype that can be used to represent untyped lambda expressions. It has constructors for variables, lambda abstractions, and applications.

-- | Unchecked expression, indexed by a bound of the allowed free variables
data UExp (n :: Nat)
  = UVar (Fin n)   -- ^ de Bruijn index for a variable
  | ULam (UExp (Succ n))
  | UApp (UExp n) (UExp n)
  deriving Show

Variables are represented with a de Bruijn index. That is, the variable UVar i refers to the variable bound by the i-th lambda abstraction surrounding the variable occurrence. Thus, ULam (ULam (UVar FZ)) stands for the lambda expression \x . \y . y, and ULam (ULam (UVar (FS FZ))) stands for \x. \y. x.

The type index of UExp allows us to refer conveniently to closed expressions as UExp Zero. In a closed expression, all variables have a matching lambda abstraction. Thus, ULam (UVar FZ) :: UExp Zero type checks, but UVar FZ :: UExp Zero wouldn’t typecheck because there is no lambda abstraction to bind UVar FZ and it has type forall n. UExp (Succ n).

Besides the fact that we are still counting sticks to identify variables, the type index needs to be carried around everywhere we deal with these expressions regardless of the task at hand. The Liquid Haskell counterpart looks as follows.

-- In the specification, variable indices are refined to be non-negative.
{-@ data UExp
      = UVar { i:Int | 0 <= i }
      | ULam UExp
      | UApp UExp UExp
  @-}
data UExp
  = UVar Int
  | ULam UExp
  | UApp UExp UExp

-- | Computes an upper bound of the variables that appear free
-- in an expression.
{-@ reflect freeVarBound @-}
{-@ freeVarBound :: UExp -> { i:Int | 0 <= i } @-}
freeVarBound :: UExp -> Int
freeVarBound (UVar v) = v + 1
freeVarBound (ULam body) = max 0 (freeVarBound body - 1)
freeVarBound (UApp e1 e2) = max (freeVarBound e1) (freeVarBound e2)

With these definitions we can mimic the original indexed type with the following type synonyms.

-- Type synonym parameters starting with an upper case letter stand
-- for variables that are substituted with term-level expressions
-- instead of types.
{-@ type UExpN N = { e:UExp | freeVarBound e <= N } @-}
{-@ type ClosedUExp = UExpN 0 @-}

One feature of the Liquid Haskell way is that we don’t need to use a type index in the Haskell datatypes. We are only concerned with the ability to express the type of closed expressions in specifications, not in the actual program.

Another feature is that we are using a simple Haskell function to express what the bound of free variables is for a particular expression, and then we are using this function in the logic to say that an expression is closed. The reflect keyword in the specification of freeVarBound is directing Liquid Haskell to translate the function to the logic language.

Now we can test our type specifications with example expressions, and this is possible without a single GADT or a promoted data constructor.

{-@ e0 :: ClosedUExp @-}
e0 :: UExp
e0 = ULam (UVar 0)

{-@ e1 :: ClosedUExp @-} -- Fails verification
{-@ e1 :: UExpN 1 @-} -- Passes verification
e1 :: UExp
e1 = UVar 0
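We can also evaluate the bound at runtime, which makes the arithmetic behind these verdicts concrete. A plain-Haskell sketch repeating the definitions above, without the Liquid Haskell annotations:

```haskell
data UExp
  = UVar Int
  | ULam UExp
  | UApp UExp UExp
  deriving Show

-- A variable UVar v needs at least v+1 enclosing lambdas;
-- each ULam discharges one of them.
freeVarBound :: UExp -> Int
freeVarBound (UVar v)     = v + 1
freeVarBound (ULam body)  = max 0 (freeVarBound body - 1)
freeVarBound (UApp e1 e2) = max (freeVarBound e1) (freeVarBound e2)

boundE0, boundE1 :: Int
boundE0 = freeVarBound (ULam (UVar 0))   -- 0: e0 is closed
boundE1 = freeVarBound (UVar 0)          -- 1: e1 needs one enclosing lambda
```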

Beyond closed expressions

As a final example, let us consider a juicier Haskell datatype to represent typed lambda expressions. These expressions are functions manipulating values of some opaque type T.

-- | @Exp ctx ty@ is a well-typed expression of type @ty@ in context
-- @ctx@. Note that a context is a list of types, where a type's index
-- in the list indicates the de Bruijn index of the associated term-level
-- variable.
data Exp :: forall n. Vec Ty n -> Ty -> Type where
  Var   :: Elem ctx ty -> Exp ctx ty
  Lam   :: Exp (arg :> ctx) res -> Exp ctx (arg :-> res)
  App   :: Exp ctx (arg :-> res) -> Exp ctx arg -> Exp ctx res

-- | A type encoding types. @T@ is the atom while @:->@ refers to
-- the type arrow '->'.
data Ty = T | Ty :-> Ty

-- | @Elem xs x@ is evidence that @x@ is in the vector @xs@.
-- @EZ :: Elem xs x@ is evidence that @x@ is the first element of @xs@.
-- @ES ev :: Elem xs x@ is evidence that @x@ is one position later in
-- @xs@ than is indicated in @ev@
data Elem :: forall a n. Vec a n -> a -> Type where
  EZ :: Elem (x :> xs) x
  ES :: Elem xs x -> Elem (y :> xs) x

The Exp datatype still tracks whether variables are in scope, this time with an Elem type that stands for a proof that a variable is in the typing context. Additionally, the datatype ensures that the expressions are well typed.

In Liquid Haskell we start with untyped expressions as before, with the novelty that the ULam constructor now has an extra argument to indicate the type of the binding. The extra argument is necessary to implement a type inference function named inferType.

data UExp
  = UVar Int
  | ULam Ty UExp
  | UApp UExp UExp
  deriving Show

{-@ reflect elemAt @-}
{-@ reflect inferType @-}
{-@ inferType :: ctx:[Ty] -> UExpN (len ctx) -> Maybe Ty @-}
inferType :: [Ty] -> UExp -> Maybe Ty
inferType ctx (UVar i) = Just (elemAt ctx i)
inferType ctx (ULam t body) =
  case inferType (t : ctx) body of
    Just r -> Just (t :-> r)
    Nothing -> Nothing
inferType ctx (UApp e0 e1) =
  case inferType ctx e0 of
    Just (a :-> r) -> case inferType ctx e1 of
      Just t -> if a == t then Just r else Nothing
      Nothing -> Nothing
    _ -> Nothing

Liquid Haskell verifies that the list indexing in the first equation of inferType is safe, thanks to the refinement type of the expression that ensures that the index is within the bounds of the list. The same refinement type ensures that we don’t forget to grow the context when we infer the type for the body of a lambda expression. And we also get the assurance that inferType terminates on inputs that meet the specification. In order to express well-typed terms, we now only need a type synonym.

{-@ type WellTypedExp CTX TY = { e:UExp | freeVarBound e <= len CTX && inferType CTX e == Just TY } @-}

{-@ e2 :: WellTypedExp [T] T @-}
e2 :: UExp
e2 = UVar 0

{-@ e3 :: WellTypedExp [] (T :-> T) @-}
e3 :: UExp
e3 = ULam T (UVar 0)
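The same verdicts can be reproduced by evaluation. A self-contained plain-Haskell sketch of the definitions above (the guard formulation of the UApp case is a cosmetic variation):

```haskell
data Ty = T | Ty :-> Ty deriving (Eq, Show)
infixr 5 :->

data UExp
  = UVar Int
  | ULam Ty UExp
  | UApp UExp UExp
  deriving Show

elemAt :: [a] -> Int -> a
elemAt (x : _)  0 = x
elemAt (_ : xs) i = elemAt xs (i - 1)

-- Partial in plain Haskell: an out-of-bounds UVar crashes. The
-- refinement types are what guarantee this cannot happen.
inferType :: [Ty] -> UExp -> Maybe Ty
inferType ctx (UVar i)      = Just (elemAt ctx i)
inferType ctx (ULam t body) = fmap (t :->) (inferType (t : ctx) body)
inferType ctx (UApp e0 e1)  =
  case inferType ctx e0 of
    Just (a :-> r) | inferType ctx e1 == Just a -> Just r
    _                                           -> Nothing
```

Evaluating inferType [T] (UVar 0) and inferType [] (ULam T (UVar 0)) reproduces the types of e2 and e3.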

The specification of well-typed terms borrows the function inferType from Haskell, which has been translated to the logic. This lets the user reuse their understanding of inferType to express what being well-typed means. When verifying e2 and e3, Liquid Haskell relies on the SMT solver and other reasoning mechanisms of its own.

Summing up

The examples in this post that use Liquid Haskell can be found here. If you would like to see more substantial use of Liquid Haskell, there are two implementations of an interpreter for lambda expressions that can be compared side by side. The first implementation relies on emulating dependent types in Haskell and was implemented by fellow Tweager Richard Eisenberg. The second implementation is my reimplementation of Richard’s using Liquid Haskell.

In this post I have presented some examples where Liquid Haskell is expressive enough to deal with properties that are typically associated with the need of a dependently-typed language. Yet, unlike dependent types, Liquid Haskell allows us to keep programs to verify unchanged from the perspective of the compiler, and the trouble of discharging proof obligations is addressed in cooperation with theorem provers.

The downside of this way is that we now have to deal with the complexities of the integration. We need to orchestrate the communication between the different tools, translate our programs into the logics of the theorem provers, and translate the errors back to terms that the user can relate to the program being verified. There are more opportunities to make mistakes in the seams between the tools, while dependently-typed languages are often built on a single compiler with a possibly smaller trusted core.

On the user side, though, we can turn Liquid Haskell into a philosophy for approaching verification. We are not necessarily doomed to build a behemoth language-implementation capable of verifying it all. Instead, we can separate our verification from our programming needs and tackle each group with dedicated tools in order to better reuse code and techniques on both fronts.

January 19, 2022 12:00 AM

January 17, 2022

Monday Morning Haskell

Transposing Rows

In our last article we explored how groupBy could allow us to arrange a Matrix (represented as an Array) into its rows.

myArray :: Array (Int, Int) Double
myArray = listArray ((0, 0), (1, 1)) [1..4]

rows = groupBy (\((a, _), _) ((b, _), _) -> a == b) (assocs myArray)


>> rows
[[((0, 0), 1), ((0, 1), 2)], [((1, 0), 3), ((1, 1), 4)]]

But what about a different case? Suppose we were most concerned about grouping the columns together? At first glance, it doesn't seem as though this would be possible using groupBy, since the elements we want aren't actually next to each other in the assocs list.

However, there is another simple list function we can use to complete this process. The transpose function takes a list of lists, which we can think of as a "matrix". Then it transposes the matrix so that the rows are now the columns.

transpose :: [[a]] -> [[a]]


>> transpose [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]

Clearly we could use this function to achieve the result we want with our 2D array!

>> rows = groupBy (\((a, _), _) ((b, _), _) -> a == b) (assocs myArray)
>> transpose rows
[[((0, 0), 1), ((1, 0), 3)], [((0, 1), 2), ((1, 1), 4)]]

This function, of course, works even if the input is not a "square" matrix.

>> transpose [[1, 2, 3, 4], [5, 6, 7, 8]]
[[1, 5], [2, 6], [3, 7], [4, 8]]

It even works if the input is not rectangular! If some "rows" are shorter than others, the result is still sound. Essentially, the first row of the result contains every first element of a list, the second row every second element, and so on. Every element of the original list appears in the result; nothing is dropped, as would happen with zip.

>> transpose [[1, 2, 3], [4], [5, 6], [], [7, 8, 9, 10]]
[[1, 4, 5, 7], [2, 6, 8], [3, 9], [10]]
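The "nothing is dropped" property is easy to check mechanically. A small sketch:

```haskell
import Data.List (sort, transpose)

-- transpose may rearrange elements across ragged rows, but the
-- multiset of elements is preserved: sorting the flattened input and
-- output gives the same list.
preservesElements :: Ord a => [[a]] -> Bool
preservesElements xss =
  sort (concat (transpose xss)) == sort (concat xss)
```

For instance, preservesElements [[1,2,3], [4], [5,6], [], [7,8,9,10]] is True.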

Hopefully this is an interesting function that you hadn't thought of using before! If you want to keep seeing tricks like this, subscribe to our monthly newsletter so you can keep up to date with the different tricks we're exploring here. You'll also get access to our subscriber resources!

by James Bowen at January 17, 2022 03:30 PM

January 15, 2022

Ken T Takusagawa

[vgkevtpx] sampling large binomial with random-fu

here are some results of single samples from the binomial distribution using Data.Random.Distribution.Binomial in the random-fu Haskell package, version .

N=320000000000000000000000000000000 p=0.5 sample=160000000000000017763568049748834

N=330000000000000000000000000000000 p=0.5 sample=165000000000000000000000000000000

N=320000000000000000000000000000000 p=6.25e-2 sample=20000000000000004440892003541839

N=330000000000000000000000000000000 p=6.25e-2 sample=20625000000000000000000000000000

Haskell source code.

in the first line, our one sample from the binomial distribution is equivalent to simulating flipping a fair coin 3.2*10^32 times and counting the number of heads.  (however, all these results are calculated instantly.)  things seem to work correctly: about half heads, with statistical noise of order sqrt(N) as would be expected from a random sample.

in the second line, we flip 3.3*10^32 times.  things have gone wrong: the number of heads is precisely half with no statistical noise.

the remaining two lines use an unfair coin whose probability of success is 1/16.  (we use probabilities expressible exactly in binary to eliminate decimal-to-binary conversion as a possible source of noise.)  things go wrong similarly: the threshold does not depend on the probability.

the threshold seems to be around N=2^108, probably related to 53 bits of mantissa in double precision (108 / 2 = 54).  (incidentally, 2^108 ~= 3.2*10^32.  the digits of the decimal mantissa coincide with the exponent.  previously similar.)
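the mantissa limit is easy to observe directly: above 2^53, consecutive integers collide when converted to Double.  a small sketch:

```haskell
-- 2^52 + 1 still fits in a 53-bit significand, so the conversion to
-- Double is exact; 2^53 + 1 does not fit and rounds back to 2^53.
exact, collides :: Bool
exact    = (fromInteger (2 ^ 52 + 1) :: Double) /= fromInteger (2 ^ 52)
collides = (fromInteger (2 ^ 53 + 1) :: Double) == fromInteger (2 ^ 53)
```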

this issue is mentioned in comments in the source code of integralBinomial random-fu, but not in the Haddock documentation.  (that Hackage makes it easy to browse source code makes it much better than similar library documentation for other programming languages.  one wonders if the Log4j/Log4shell vulnerability would have been discovered sooner if Java documentation had made it easier to view implementation source code.)

on one hand, of course it's unfortunate that things go wrong for large N.  on the other hand, the failure mode is not too bad.  (there might not be much incentive to fix the bug.)  if you're going to sample the binomial, getting the mode (the most likely result) as an answer might be fine for many applications.

inspired by wanting to draw lots and lots of stars.  if one always gets the mode, there will not be pixels containing a statistically unusual large number of stars.

sampling the binomial distribution efficiently for large N is a non-trivial but seemingly well studied topic.  the comment in random-fu's source code cites Knuth's TAOCP.  wikipedia cites:

Devroye, Luc (1986) Non-Uniform Random Variate Generation, New York: Springer-Verlag.

and links to an online chapter of the book.

by Unknown at January 15, 2022 05:25 PM

January 14, 2022

Sandy Maguire

Review: Shall We Vote on Values, But Bet on Beliefs?

Another week, another paper review. This week we’re looking at Robin Hanson’s “Shall We Vote on Values, But Bet on Beliefs?” (SWVVBBB). In SWVVBBB, Hanson proposes a new form of government he calls “futarchy,” which injects prediction markets into democratic government.

The preamble starts like this:

  1. Democracies fail by not aggregating information
  2. Prediction markets are really good at aggregating information
  3. We can postdict which nations did better than others.

These three assumptions seem relatively self-evident, though perhaps the first is the least obvious. In support of this assumption, Hanson presents some evidence:

  1. Most people vote the same way they’ve always voted.
  2. Most people don’t know what the government is doing.
  3. Most people don’t know which platforms parties stand for.
  4. It’s hard for governments to disseminate information.
  5. The populace often has different ideas about what government should do than what experts say it should do.
  6. At large, the populace believes in a bunch of rather crazy things (e.g. 85% of Americans believe Jesus was born to a virgin, and 52% believe astrology has some scientific proof).
  7. It seems untenable that bad policy decisions would be adopted if they were known to be bad policy decisions.

The first three points are pretty easy to believe as well. The fourth is also tenable; the government relies on mainstream media to get messages out, where it can be editorialized before being published. Points five and six taken together suggest that the people are often wrong about what constitutes good policy.

Hanson presents sources for these claims, but I don’t have any issues taking them on faith — this isn’t the crux of the paper. It’s safe to say that failure to aggregate information is a serious problem in democracies.

Why does this matter? Because democracy gives us one vote, with which not only do we need to vote on our values, but also on how we’d like the government to bring about those values. For example, political parties’ platforms involve both values (“we care about healthcare, equality, housing, etc.”), and strategy (“we will build 1,000,000 new houses and hire 50,000 new nurses.”)

Personally, I support the values of Canada’s left-most party, but I don’t think they’d be very capable if they actually got into power. This tension leads me away from voting for them, in an attempt to find a better mix of competence and representation of what I care about.

The question that SWVVBBB answers is “how can we separate voting for values from voting on the execution of those values?” And as the title suggests, the trick is to do it by putting our money where our mouths are.

Betting on Beliefs

The main contribution of SWVVBBB in my opinion is its clever mechanism design to pull probabilities out of prices. This will take some explanation.

At a high level, the idea is we should vote for a party based only on our values. The winning party is responsible for choosing an explicit mathematical function that represents how well the country is doing on its values. For example, this function might be “GDP”, or “percent of the population employed,” or “global happiness ranking,” or what have you. Probably, it will be some combination of these things.

The government’s only job is to define what success looks like, and how we’re going to measure it. That’s all the government does. They don’t get to raise taxes, or allocate spending, or appoint judges, or anything like that. They are responsible only to pick the utility function, and to create any agencies that might be required to measure it.

Here’s where it gets interesting.

We now put a prediction market in place. For a fee, anybody can propose any intervention on the way the country is run. Collectively, people buy and sell bonds corresponding to whether they think the proposal will help maximize the government’s value function. By looking at the market price of the bonds, we can get a good sense of whether or not the market thinks the proposal is a good idea. If it is, the proposal immediately gets turned into law.

The details on how to actually go about building this market are a good chunk of the paper, which we will get to in the next section. For now, let’s continue thinking about the high-level idea.

By connecting the law to a market, we immediately get a few benefits. The first is that we now incentivize people to have true beliefs about governance. If my ideas about how the country should be run are in fact good, I can make money off of that skill. Furthermore, it incentivizes transparency. If people can make lots of money off of this market, you can bet they’ll watch it extremely closely.

Perhaps best of all, it pushes stupid people out of the market. If you are consistently wrong about what constitutes good governance, you will quickly price yourself out of the market — similar to people who go broke trying to play the stock market on the advice of their uncle.

To be clear, this doesn’t disenfranchise people. They still get to vote on the government. But it requires questions of policy to be answered by putting your money where your mouth is. Thus, under this scheme, it becomes prohibitively expensive to have stupid beliefs strongly held.

Mechanism Design

So, that’s the high level idea. How do we actually go about implementing it?

Discovering Probability

Imagine a particular question of fact that can be definitively observed in the future. For example, maybe we want to determine whether or not it will be raining next Friday at 10am in the park beside my house. The more specific the question, the better.

The bank can offer a pair of assets:

  1. Pays \$1 if it is raining on Friday at 10am.
  2. Pays \$1 if it is not raining on Friday at 10am.

and the bank is happy to sell these assets for \$1, because exactly one of them will actually pay off. In the meantime, the bank can collect interest for free.

Suppose Market-Making Marty buys 10,000 of these pairs from the bank. Marty can now sell the assets individually, for example, he might sell some not raining assets to Dusty, and some raining assets to Misty. Initially, he might price both assets at \$0.60.

By selling both sides of the pair at \$0.60, Marty safely makes \$2,000 off of his \$10,000 investment. It’s safe because he no longer holds any assets except for cold hard cash.

Dusty figures that it’s sunny more than 60% of the time, so paying \$0.60 for an asset that pays \$1.00 is a good deal. If he estimates the likelihood of it being sunny on Friday at 80%, then he expects an 80% chance of making \$10,000, and a 20% chance of making \$0. Adding these together, he computes his expected value at \(0.8 * 10000 + 0.2 * 0 = 8000\), which is \$2,000 more in expectation than the cost of buying all the assets at \$6,000.

Misty does some chain of reasoning that makes her believe that her money is also well spent.

Summer has been thinking hard, and is pretty sure the chance of rain on Friday is actually closer to 5%. So she approaches Dusty, and offers to buy some of his no-rain assets for \$0.90. Dusty thinks this is too confident, so he happily unloads his options to Summer, since again he expects to be making money on this trade.

When everything settles, no-rain assets are trading at a market price of \$0.83, while rain assets are at \$0.17 (in the limit, these must add up to \$1.00, or else you can make money by holding both sides.)
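Normalized prices can be read directly as probabilities. A minimal sketch of that reading (the function name is mine, not from the paper):

```haskell
-- Convert the prices of the two sides of an asset pair into the
-- market's implied probability of the event. Normalizing by the sum
-- keeps the reading sensible even when quotes drift slightly away
-- from summing to exactly $1.00.
impliedProbability :: Double -> Double -> Double
impliedProbability priceYes priceNo = priceYes / (priceYes + priceNo)
```

With rain trading at \$0.17 and no-rain at \$0.83, impliedProbability 0.17 0.83 reads as a 17% chance of rain.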

Patty, who was thinking about having a picnic in the park on Friday, looks at the asset prices, and decides it’s not going to rain, since no-rain is trading above rain.

Thus, Patty has made a decision about the future, based on information aggregated from Dusty, Misty, Summer, and whoever else might have been in on the market.

Friday comes along, and it doesn’t rain. Patty is happy, as is everyone holding no-rain assets, since they all made money.

Discovering Expected Value

We can play the same game to extract information about expected values from the market. Suppose we want to guess the price of flour next week. The bank can look at the historical high of flour price, and sell pairs of assets:

  1. Pays \$x, where x is the percentage of the cost of flour with respect to its all-time high. For example, if the all-time high was \$40, and the cost of flour next week was \$30, this asset pays \$0.75 (\(30/40\)).
  2. Pays \$(1 - x).

Again, the bank is happy to make this trade because they are still only paying out \$1, and they still get to make interest on that dollar until the market pays out.

The trading price of these assets now corresponds to the market’s opinion of the expected value of the price of flour next week. If \$x assets are trading at \$0.30, we can read the expected value of flour next week to be \(0.3 * 40 = 12\).

Conditional Assets

There’s one last trick we can do. We can make conditional assets, that is, assets which only pay out when a certain precondition is met. We can consider the case of whether or not a big law firm BLF wins its case, conditional on whether or not they put Litigious Larry on as the lead prosecutor. In this case, the bank offers quads of assets for \$1:

  1. Pays \$1 if BLF wins with Larry
  2. Pays \$1 if BLF wins without Larry
  3. Pays \$1 if BLF loses with Larry
  4. Pays \$1 if BLF loses without Larry

Again, only one of these four cases will actually pay out (ignoring the possibility that it doesn’t go to court, but that’s a simplification for the example, not a limitation of the technique.)

Like Patty in the park, BLF can make a decision about the future: whether or not they should put Larry on the case based on whether Win|Larry is trading higher than Win|-Larry.

Furthermore, they can rethink whether or not they want to settle out of court if the BLF loses assets are trading at better than BLF wins.

Putting It Into Practice

With the mechanism design under our belt, we can now think about implementing futarchy.

The people vote on the government based on parties’ values. The government puts forward its value function. Now, anyone can pay a transaction fee (perhaps quite high) to propose a policy intervention. The bank can offer a pair of assets:

  1. Pays \$x if the intervention is made
  2. Pays \$(1-x) if the intervention is made

where x is linear in the observed value function at some point in the future. The idea is to tie the payoff of the asset to how well it helps influence our value function.

For example, maybe the government decides the value function is GDP. Maybe the target GDP in five years is \$30 trillion. Phoebe then proposes building a high-speed train between Toronto and Vancouver. The bank can offer assets as above, where x is the observed percentage of GDP out of \$30 trillion.

After some period of trading, the \$x assets are trading well above \$(1-x). This is interpreted as the market thinking this train would be good for long term GDP. Immediately, the decision to build the train is ensconced in law.

That doesn’t mean all the details are necessarily worked out. If Phoebe had the whole plan for the train worked out, she could have put those in her proposal. But let’s assume she didn’t. Instead, someone can make a new proposal, that Cost-Cutting Carlos should get put in charge of the project. At the same time, there is a proposal that Safety Susan should be put in charge. Markets pop up for both, and whichever is trading higher gets the bid (unless neither is trading well!)

We can follow this process ad infinitum, to work out more and more particulars. If, at any time, someone thinks the train is actually a bad idea, they can make a proposal to stop its development. We need not worry about the inefficiency inherent in this sort of flip-flopping; the market will necessarily price in the sunk costs.

January 14, 2022 03:55 PM

Tweag I/O

Trustix - Good things come in trees

In the previous Trustix post Adam introduced Trustix and the general ideas behind it. In this post we will talk you through the data structures behind Trustix, in particular Merkle trees.

At its core, Trustix verifies computations by comparing pairs of hashable inputs and outputs between multiple builders by using build logs. A simple set of input-output pairs would be one way to implement this build log — the log would be queried with an input hash, and would then yield the corresponding output hash. Users could then compare it to the output hashes from other logs or binary caches, to check whether they have all obtained the same answer to a computation.

For security, these input-output pairs could be signed by the builder, which ensures their integrity as long as the signature key is not compromised.

Trustix goes further than this, using a “verifiable log” to which data can be appended but that cannot be modified in any other way. This log is structured as a tree where sets of entries have an identifying hash called the log head, which clients can use to verify that the past data has been preserved unmodified. This ensures that the log isn’t tampered with retroactively and if, for instance, a signature key is compromised, we only have to invalidate entries from that point forward.

Merkle trees

The way the append-only log works is somewhat similar to Git commits. In a Git repository, each commit is given a hash, which depends on its content and on the previous commits. If you modify an earlier commit X (and push --force) then all the commits after X see their hash change, and anybody who has a reference to one of these commits will immediately notice that the branch has been modified. Therefore, a Git branch acts as a verifiably append-only log of commits.

Trustix logs are similar, except that they use a tree structure, called a Merkle tree, instead of a chain. Merkle trees are ubiquitous in computer science and can be found in Blockchains, databases, file systems, peer-to-peer software and many other use cases.

Why a tree? While a simple chain of hashes, like Git’s, is sufficient to ensure that the log is append-only, looking up a particular input-output pair would require a linear scan of the entire log. With a Merkle tree, a lookup only requires walking and downloading one (logarithmically sized) branch, as we describe below.

How Merkle trees work

A Merkle tree has data in its leaf nodes, with immediate parents of those nodes being the hashes of each datum. In our case the data are the input-output hashes of the builds, and the first-level parent nodes thus contain hashes of these pairs of hashes. At the second level, we have hashes once more — but now we have the hashes for sets of nodes at the first level. That is, we have hashes of hashes of the input-output hashes. This goes on and on.

The important point here is that hashes are gradually aggregated into fewer and fewer nodes until the root node is reached. This root node (the log head) transitively depends on all leaf nodes and thus on all data stored in the log.

Consider, for example, the following tree:

        root0
       /    \
      /      \
     /        \
    m          n
   / \        / \
  a   b      c   d
  |   |      |   |
  d0  d1     d2  d3

The d0, d1, d2, d3 nodes are data nodes and contain the input-output hashes in the log. a=h(d0), b=h(d1), c=h(d2), d=h(d3) are their hashes computed with the hash function h. The aggregated hashes m=h(a,b)=h(h(d0),h(d1)) and n=h(c,d)=h(h(d2),h(d3)) coalesce in the root hash root0 = h(m, n), which depends on all leaf nodes.
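A toy version of this computation, with a small FNV-style function standing in for a real cryptographic hash such as SHA-256:

```haskell
import Data.Bits (xor)
import Data.List (foldl')

-- Toy stand-in for a cryptographic hash; a real log would use SHA-256.
h :: String -> Int
h = foldl' (\acc c -> (acc `xor` fromEnum c) * 16777619) 2166136261

data Tree = Leaf String | Node Tree Tree

-- A node's hash depends on the hashes of both children, so the root
-- transitively depends on every leaf.
rootHash :: Tree -> Int
rootHash (Leaf d)   = h d
rootHash (Node l r) = h (show (rootHash l) ++ show (rootHash r))

root0 :: Int
root0 = rootHash (Node (Node (Leaf "d0") (Leaf "d1"))
                       (Node (Leaf "d2") (Leaf "d3")))
```

Changing any of the four leaves changes root0.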

Let’s append two more nodes, maintaining the binary tree structure, and see what happens:

               root1
              /    \
             /      \
            /        \
           /          \
        root0          \
       /    \           \
      /      \           \
     /        \           \
    m          n          o
   / \        / \        / \
  a   b      c   d      e   f
  |   |      |   |      |   |
  d0  d1     d2  d3    d4  d5

Since we already have root0 from the previous state of the tree, the only thing we need to do to calculate root1 is to hash d4 and d5, let that propagate up the tree to o, and hash root0 combined with o. Appending new branches thus doesn’t require us to traverse the whole tree again.

We can verify that nothing from the previous state was modified by seeing that root0 is still in the tree; we have only appended to it.1

It is also possible to easily verify that a branch or a leaf node belongs in the tree. For instance, if we want to check that d2 is in the tree, the builder sends us the hashes d, m, and o: the hashes of the siblings of the nodes on the path from root1 to d2. With d2, we compute c; with c and d, we compute n; with n and m, we compute root0; and with root0 and o, we compute root1. If this root1 coincides with the root of the builder’s log, then indeed we have verified that d2 belongs to the log.
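The check just described is a fold over the sibling hashes. A sketch, with a toy hash standing in for a real cryptographic one:

```haskell
import Data.Bits (xor)
import Data.List (foldl')

-- Toy stand-in for a cryptographic hash.
h :: String -> Int
h = foldl' (\acc c -> (acc `xor` fromEnum c) * 16777619) 2166136261

-- Hash of a parent node from its two children's hashes.
combine :: Int -> Int -> Int
combine l r = h (show l ++ show r)

-- Each step supplies a sibling hash and says on which side of the
-- node being rebuilt that sibling sits.
data Side = SiblingOnLeft | SiblingOnRight

verifyPath :: Int -> [(Side, Int)] -> Int -> Bool
verifyPath leafHash path root = foldl' step leafHash path == root
  where
    step acc (SiblingOnLeft,  sib) = combine sib acc
    step acc (SiblingOnRight, sib) = combine acc sib
```

For c = h "d2" in the example tree, the path is [(SiblingOnRight, d), (SiblingOnLeft, m), (SiblingOnRight, o)], and the fold reproduces root1.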

Let’s explore this in more detail in the context of Trustix.

Solving trust with Merkle trees

Let’s say we want to verify that an output of the builder is indeed what that builder built for that input derivation. We get pointed to an entry in the build log (a leaf node in a Merkle tree) that has the derivation’s input hash, and the build’s output hash as its values.

We need to verify that this is not a faked entry of the tree, so we take the path from the leaf node up to the root tree. We hash each node with its sibling to calculate their parent’s hash, and eventually we reach the root hash. If the result we get is the same as the root hash in the tree, then this entry is indeed a part of the log.

In addition, the root hash is signed with the log’s private key, which allows us to verify that the tree as a whole is correct according to its owner. We now have a single signed entity from which we can verify everything, as opposed to the current Nix binary cache which signs each entry individually. One advantage of this new approach is a defense against targeted attacks. Everyone can publish the log head they receive from a builder, so others can check they are receiving the same one.

The downside of a Merkle tree is that while it’s fast to verify an entry, it’s slow to find that entry — in general, we have to search the whole tree. In other words, the questions “is this (input hash, output hash) pair in the tree?” and “what is the output hash value of this input hash?” are inefficient to answer.

Sparse Merkle trees to the rescue!

The sparse Merkle tree is a very clever indexed variation on the standard Merkle tree. Suppose that our hashes are 256 bits long; we make a Merkle tree with 2^256 leaves. Each leaf is initially empty. To add an input-output pair, we compute I, the hash of the input, and we change the I-th leaf to contain the (hash of the) output. Effectively the leaves form an array of length 2^256.

Now we can easily find entries (or show they’re not present in the tree), by hashing the input and looking up the corresponding leaf. We can still verify that the entry belongs in the tree once we find it, by hashing our way up as we did before.

There are two problems. First, sparse Merkle trees are huge and time-consuming to generate. This can be solved by (ab)using the fact that most nodes in the tree are empty, to cache large sections of the tree that all look the same. Second, as you may have noticed our tree is no longer append-only. We’re not appending entries anymore, we’re modifying them from empty → something.

Combining trees for fun and profit

By combining both types of trees in a single log we can get the best of both worlds!

We keep the log itself as an append-only standard Merkle tree, and we use an input-hash-indexed sparse Merkle tree for lookups, as before; but instead of storing outputs directly in the sparse tree, we store a reference to the input-output pair in the standard Merkle tree.

The new submission process is:

  1. Append (input hash, output hash) to the standard Merkle tree
  2. Get the new root hash of the tree
  3. Sign the new root hash
  4. Write a reference to that entry into the sparse Merkle tree, indexed by the input hash
  5. Get the new root hash of the sparse tree
  6. Sign the new sparse root hash
  7. Publish the signed roots at a well-known location

The lookup process is:

  1. Find the input hash of the derivation you want
  2. Look up that input hash in the sparse Merkle tree
  3. Verify that entry belongs in the tree, and the tree is correctly signed
  4. Follow the reference to the standard Merkle tree
  5. Verify that entry belongs in the tree, and the tree is correctly signed

Success! We can look up log entries, and prove that the log itself is append-only.

Readily available implementations

Why blockchains are not fit for purpose

Some astute readers may have noticed already that what we have described above is awfully close to a blockchain, and you may think that intuitively a blockchain would make sense as a foundation of trust. After all, isn’t this what blockchains are all about? The problem comes down to the consensus models. Blockchains are all about distributed consensus, but what Trustix aims to solve requires a far more local idea of what consensus means, which makes all current blockchains unsuitable as a foundation.

Consensus, and therefore blockchains, comes with required financial models such as Proof-of-Work and Proof-of-Stake. Our feeling is that neither of these models is applicable to something like Trustix. They might be great for financial transactions, but they carry too much inherent cost for a system like Trustix, where we want log operation to come at essentially zero extra cost (hosting aside).


Trillian

Trillian is a mature implementation of Merkle trees that’s already widely used at internet scale, mainly for Certificate Transparency, something that has greatly inspired Trustix.

The performance of Trillian is excellent and it runs on top of many popular storage layers like MariaDB/MySQL, Google Cloud Spanner and PostgreSQL. Thanks to certificate transparency already being deployed at large scale there are many caching patterns to adopt from this ecosystem that apply directly to Trillian.

Trillian has support for smartcard based signing using the PKCS #11 standard, something you don’t necessarily get that easily with the other approaches.

This makes Trillian a very solid foundation to build on. It does, however, require a more complex setup than the other solutions considered, and it ties you to an RDBMS like MySQL or PostgreSQL, making it a heavyweight solution.


Git

The one major thing Git has going for it is that it’s a simple, well-understood format that could be stored easily, potentially even for free at providers like GitHub or GitLab. Git is also based on a structure of Merkle trees; however, these are not exposed or designed in a way that makes them suitable for Trustix.

The performance numbers we saw from Trillian were also far better: around 3,300 submissions per second versus the roughly 200 we achieved with the Git-based approach. This shows that other solutions can be much better optimized, and that Git is too much of a bottleneck.

Rolling your own (or using lower level libraries)

Rolling our own implementation from scratch has some major advantages: it lets us control the implementation and optimize for problems specific to the package management verification space, as well as for the requirements the NixOS Foundation has for deploying this at scale. This makes it much easier to optimize the structures.

Another advantage of this approach is that we entirely control the storage. A valuable property we get from this is that Trustix can run entirely self-contained with its own embedded database.

This turned out to be the best solution for Trustix as we highly benefit from the level of customization we can do.


By combining the strengths of two data structures — traditional and sparse Merkle trees — we can get the best of both worlds and prove the following about a log efficiently:

  • Non-inclusion of a derivation (i.e. this log never built a package)
  • Inclusion of a derivation (this entry is indeed a part of the log)
  • Correct operation (the append-only property of the log)

It requires a full log audit to prove that the sparse Merkle tree is append-only. This is needed less often, though, and can be offloaded onto semi-trusted verifiers. This will be explained in depth in a future blog post. When we look up builds, we verify them in the standard Merkle tree, which is easily verified to be append-only.

In the next post in the series we will elaborate on how Trustix compares logs between builders and makes decisions about which build outputs to use.

The development of Trustix is funded by NLnet through the PET (privacy and trust enhancing technologies) fund.



  1. In general it may take slightly more work to verify that a Merkle tree is append-only. Imagine if we add more nodes at this point: o will get hashed with something new, and that will get hashed with root0, replacing root1. However, we can still find root0 and o in the tree, and reconstruct root1, showing it is contained unmodified in the new tree. Importantly, this is still a fast operation.

January 14, 2022 12:00 AM

January 13, 2022

Monday Morning Haskell

"Group" Theory

Last time around we saw the function "intercalate". It takes a list of lists and produces a single list, with the list from the first argument as the separator. But today, we'll take a look at a couple of functions that work in reverse. The "group" functions take a flat, single-level list and turn it into a list of lists.

group :: (Eq a) => [a] -> [[a]]

groupBy :: (a -> a -> Bool) -> [a] -> [[a]]

When you use the basic group function, you'll take your list and turn it into a list of lists where each list contains successive elements that are equal.

>> group [1, 2, 2, 3, 4, 4, 4]
[[1], [2, 2], [3], [4, 4, 4]]

This is useful for answering a question like "What is the longest consecutive sequence in my list?" Like any list function, it can be used on strings.

>> group "Hello Jimmy"
["H", "e", "ll", "o", " ", "J", "i", "mm", "y"]

The groupBy function provides more utility. It lets you pass in your own comparison test. For example, here's a rudimentary way to separate a raw CSV string. You "group" all characters that aren't commas, and then filter out the comma groups.

separateCSV :: String -> [String]
separateCSV s = filter (/= ",") (groupBy (\c1 c2 -> c1 /= ',' && c2 /= ',') s)


>> separateCSV "Hello,there,my,friend"
["Hello", "there", "my", "friend"]

Here's another example. Suppose you have an array representing a two-dimensional matrix:

myArray :: Array (Int, Int) Double

The assocs function of the Array library will give you a flat list. But you can make this into a list of lists, where each list is a row:

myArray :: Array (Int, Int) Double

rows = groupBy (\((a, _), _) ((b, _), _) -> a == b) (assocs myArray)

This is a common pattern you can observe in list functions. There's often a "basic" version of a function, and then a version ending in By that you can augment with a predicate or comparison function so you can use it on more complicated data.

If you enjoyed this article, sign up for our mailing list so that you get access to all our subscriber resources. One example is our Beginners Checklist! This is a useful resource if you're still taking your first steps with Haskell!

by James Bowen at January 13, 2022 03:30 PM

Matthew Sackman

Let's build! A distributed, concurrent editor: Part 6 - Testing

In this series:

So far, I’ve defined some semantics for this distributed, concurrent editor. I’ve defined a protocol, built a really basic browser-based client in TypeScript, and a server in Go. But I’ve not written a single test. This must be rectified: some of the algorithms are not trivial or obviously correct, particularly those that deal with calculating the document state when reading from disk.

I’ve built the server using actors, and each document is managed by its own actor. The actor provides a nice API to test against. All it depends on is the disk-store, for which I’m using bbolt. Because this is an embedded key-value store, using it in tests is very easy: just create a fresh new database file, use it, and make sure you delete it at the end. No need for horrendous choreography to spin up fresh PostgreSQL or MySQL databases, no maddening complexity of trying to rendezvous on some fresh network port, and no need for mocks. Testing nirvana.

I’ve written a little documentTester struct. This has a few fields and methods to help support the tests: the ability to create a fresh database, spawn an actor manager, spawn a document, and shut everything down. With that in place, I can start writing some tests that verify an empty document is indeed empty.

func TestListenEmptyDocument(t *testing.T) {
   is := is.New(t)
   documentTester := newDocumentTester(t, seed, plainManagerSpawnFun)
   defer documentTester.close()

   listener := documentTester.newListeningClient()
   str, ok := listener.nextDocumentRendering()
   is.True(ok)
   is.Equal(str, "") // it's a new document so it must be empty

   documentTester.close() // terminate the document actor
   _, ok = listener.nextDocumentRendering()
   is.True(!ok) // listener should have observed death of document
}

This checks that I can create a new document, that I can subscribe to it (listening to changes to the document), and that as soon as I subscribe I am given the current state of the document, which should be the empty string. It also checks that once the document actor has been terminated, the subscription is cancelled. This is all machinery that the WebSocket code relies on: as soon as a browser-client connects, I ensure the correct document actor is running, and subscribe to changes to the document. That subscription should immediately be sent the current state of the document so it can be sent down to the browser.

I’ve chosen to use Mat Ryer’s is test assertions library, rather than the more normal (I think) testify. So that’s why some of the assertions look a little different if you’re used to testify. Mainly, I wanted to try something different. I don’t think there’s anything too wrong with testify, but its API is bonkersly huge.

As this test shows, the documentTester has a newListeningClient method that allows me to subscribe to the document actor and listen for updates. That listener has its own, independent, representation of the document which is based purely on the updates it has received from the document actor. I can test that this document, constructed from updates received from the actor, matches the document that the test thinks should have been constructed.

The documentTester also has a newMutatingClient method which allows me to create a document and mutate it in the test code, and then send those mutations up to the document actor. This mutating client models a document as a list of words, just like the browser-client and like the server. Its API allows for the document to be modified: existing words can be edited or deleted, new words can be added, and all such changes can be sent to the document actor. I can use these methods to make basic changes to the document and check the listener receives updates that result in the expected document; the document can be rendered to a simple string by joining together all of its words. These strings can be tested for equality.

func TestSingleMutationToRoot(t *testing.T) {
   is := is.New(t)
   documentTester := newDocumentTester(t, seed, plainManagerSpawnFun)
   defer documentTester.close()

   mutator := documentTester.newMutatingClientWithEmptyRoot()
   // ... mutate the root word and send the change to the document actor ...
   expected := mutator.unsentGeneration.String()

   listener := documentTester.newListeningClient()
   got, ok := listener.nextDocumentRendering()
   is.True(ok)
   documentTester.log.Debug().Str("expected", expected).Str("received", got).Send()
   is.Equal(got, expected) // we created the listener AFTER sending the mutation. So we should definitely observe the mutation

   documentTester.close() // terminate the document actor
   _, ok = listener.nextDocumentRendering()
   is.True(!ok) // listener should have observed death of document
}

What is this unsentGeneration thing and this generation type? Back in episode 2 where I work on the client-server protocol, and again in episode 4 where I decide what I want to write to disk, I discuss different ways of modelling how the document evolves over time. In this test-code, I take the view it’s best to try to keep the code simple and obvious at the expense of efficiency. A generation is a snapshot of the entire document, and I maintain a list of generations which show how the document evolved. I can use this to look back at the previous generation and see what I expect the document to be if I send an undo message to the document actor. Equivalently for redo. The unsent generation is a copy of the current generation but potentially with modifications that are yet to be sent to the document actor. Once they get sent, the unsent generation gets moved appropriately into the list of generations and a new unsent generation gets created based on what’s just been sent.

I can now write tests that check that when I send undo and redo messages, the document formed from the updates the listener receives matches the document the mutating client thinks it’s created:

func TestUndoAndOverwrite(t *testing.T) {
   is := is.New(t)
   documentTester := newDocumentTester(t, seed, plainManagerSpawnFun)
   defer documentTester.close()

   mutator := documentTester.newMutatingClientWithEmptyRoot()
   // mutate the root word twice
   expected := mutator.unsentGeneration.String()
   // undo twice, and redo the first change.
   // mutate again, then undo it

   listener := documentTester.newListeningClient()
   got, ok := listener.nextDocumentRendering()
   is.True(ok)
   documentTester.log.Debug().Str("expected", expected).Str("received", got).Send()
   is.Equal(got, expected)
}

Randomised sequences of events

One of the “fun” challenges of programs that deal with event streams is that bugs often result from a particular order of events. Some sequence of events, which had been overlooked when designing the system, causes the system to misbehave. If I only write bog standard unit tests, I can end up merely verifying the effect of a single event at a time, or really short sequences of events, as demonstrated above. Whilst each individual event could be correctly handled (at least as far as the limited unit test is concerned), a longer sequence of events could be incorrectly handled. Ideally, for any set of events, you want to test every permutation. But that can become tricky, because the number of permutations of a set of items is the factorial of the number of items: 10 events lead to over 3.6 million permutations, which could take a while to test.

What I tend to do in these situations is to generate a stream of random events. Yes, given enough time, it will generate every possible permutation of events; though extremely inefficiently. But I think what’s more important is that it doesn’t spend hours exploring a large number of permutations which all start with a certain sequence of events, that guarantee no error will occur. It’s quite inviting to construct the permutations depth-first, but that means you then have to go through a lot of the permutations before the first items change. You could generate them all, and then shuffle them randomly before “running” any of them, but you can spend a lot of time (and memory) generating all those permutations. If you want to, you can structure the test code in such a way as to make it easy to switch between a randomly-generated stream of events, and a permutation.

It’s important that the tests are still repeatable and deterministic, even when randomly generating events. For this, I create a new random number generator, and seed it with either a value given on the command line, or with the current Unix time, and I make sure I print it to the log. Then, if an error does occur, I can re-run the test with the exact same seed, and the random number generator will generate the same sequence of numbers, which will lead to the same sequence of events, and the bug should continue to manifest itself until it’s fixed. I’ve already been using this random number generator in the tests covered so far. For example the editExistingWord method uses the random number generator to pick which word in the document to edit, and what to change it to. The behaviour of the system should be the same no matter which word is edited or what that word is changed to. I believe tests that are precise about what properties they are testing for, and what properties they are agnostic about, lead to a more robust system: I would consider these tests less useful if they’d hard-coded which word is being edited or what the word is being changed to.

For this style of longer-running randomised tests (I tend to call these soak tests), I think it’s preferable to validate the state of the system after every event, or as close to that as possible. That way, if an error does crop up, I should be able to find it quickly and not have to wade back through hundreds of events wondering which one caused the problem.

Given the machinery already built, my soak test isn’t too long. The guts of it are this:

func (self *soaker) performRandomAction(log zerolog.Logger) {
   n := self.rng.Intn(10)
   switch {
   case n == 0:
      log.Trace().Msg("restarting document")


      self.mutator.documentClient = self.documentClientFactory.NewClient()
      self.listener = self.newListeningClient()

   case n <= 2 && self.mutator.canUndo():

   case n <= 4 && self.mutator.canRedo():

      log.Trace().Msg("mutating words")
      for mutations := self.rng.Intn(5); mutations >= 0; mutations-- {
         action := self.rng.Intn(5)
         switch {
         case action == 0:
         case action == 1 && len(self.mutator.unsentGeneration.orderedWords) > 1:

I do have to hard-code the probability of each event type. I could either try to make weird and bizarre sequences of events more likely (which might expose more corner cases), or I could try to make the sequence of events kinda similar to what a user might do. Here I think I’ve gone for the latter: about 50% of the time the document’s going to get edited – a few changes to a few words. About 20% of the time the event will be an undo, and 20% of the time it’ll be a redo. And the remaining 10% of the time will terminate the document actor, spawn a new one (which will read the list of events back in from disk), and create a new listener. After each event, I validate that the document as the mutator believes it is, matches the document as the listener believes it is.

Because every actor has a single mailbox (a single linear queue of messages), it doesn’t matter if these events are coming from one client or several: it’ll make no difference to the document actor as all the events get added to the same mailbox. The mutating client does though make sure that when it sends an update, the updated words all carry a version number that’s high enough to guarantee it’ll take effect. I probably should extend this test with the ability to send updates with lower version numbers, to simulate several users editing the same document at the same time and occasionally colliding when editing the same word.

Running all of these tests with code coverage turned on shows I’m hitting around 90% of document.go, db.go, frame.go, and wordmanager.go, which seems quite good. I’m not testing any part of the document registry though, so at some point I should add some tests for that. Code coverage metrics are a minimal measurement: they show me which bits of my code I should definitely work on adding tests for; but even 100% code coverage is far from sufficient, which is kinda the point of this whole thing: bugs can manifest from the order of events; you could get to 100% code coverage just by writing unit tests and never uncover such bugs.

I do want to adapt this soak test so that it can be used with fuzz testing: my plan is either for the fuzzer to provide the seed for the random number generator, or for the fuzzer to provide a slice of bytes, and the test grabs numbers from that slice of bytes in place of grabbing numbers from the random number generator. However, Go’s support for fuzz testing is arriving in version 1.18, which is expected at some point in February 2022 - a month away. The fuzzer is meant to be coverage-guided: i.e. it’s meant to try to vary the test input to provoke more code to be run. Whether that’s just “does this line of code get run?” or “is this particular path through the control-flow-graph run?” I don’t yet know. The latter would be much more powerful, because that would enable it to explore different sequences of events. But until Go 1.18 is released, I think I’ll have to put this series on hold.

Last week, when I wrote about my binding to LMDB, I mentioned there’s a soak test in there too. It doesn’t benefit from quite the same level of machinery and framework as I’ve built here, but the principle is the same.

January 13, 2022 11:01 AM

January 12, 2022

Don Stewart (dons)

Glean on aarch64 on Apple Silicon : part 3

Creating a Glean index for React

See part 1 (creating a working aarch64 env on Mac) and part 2 (building Glean for ARM).

In the last post we got a working Glean installation built on aarch64 with native emulation on the ARM-based M1 MacBook Air. To be useful, we need to “index” some code and store the code facts in Glean. See (What is Glean?).

Indexing is the process of analysing source code and logging the things we find. Indexers look a lot like parsers or compiler front-ends: they take source code and build up semantic information, then record it to files for ingestion into Glean, or write directly to the database.

Almost all linters, static analysis tools, compilers, transpilers, formatters parse the code, analyse and write out transformed results, usually throwing away what they learned in the process. What Glean lets us do is efficiently persist the semantic information the compiler discovered, in a way that can be very efficiently queried. We turn compilers into data sources for a distributed cache.

This splits up the usual compiler frontend <-> backend coupling, with Glean as a cache between the two phases. This lets us scale: we can index a repo frequently, and share the results with thousands of engineers, or support millions of concurrent queries to a Glean database, with very low latency. It’s like the compiler AST and type environment are now in memcache, and our IDEs and code analysis tools can hit the cache instead of painfully reconstructing the frontend themselves.

What makes a good index?

What we index is described by a Glean schema written in Angle. The schema describes the types, predicates (tables) and references between facts, as well as how they are represented. It’s broadly “pay as you go” — you don’t need to capture everything about the language, just what you need for specific tasks. There are schemas for most common languages, as well as many mini-languages (like GraphQL, Thrift, Buck). (But n.b. there aren’t _indexers_ for most languages, just schemas. The indexers are quite a bit more work, as they usually hook into custom compiler toolchains.)

Common examples of things we would capture are:

  • file names
  • declarations, definition locations
  • uses of definitions (“xrefs”)
  • language elements: module names, type names, functions, methods, classes, ..

That’s usually enough to get a working navigation service up (e.g. jump-to-definition or find-references). For more sophisticated static analysis you will need to capture more of the compiler environment. It’s a good idea to float out strings and values that are repeated a lot into their own predicates, so as to maximise sharing. And to have a sense of the queries you need to write when constructing the index schema.

Once you have a schema, you can store data in that format. Indexers are standalone programs, often with no dependency on Glean itself, that parse code (and add other type information, resolve names, resolve packages), before writing out lines and lines of JSON in the schema format you specified (or writing directly to the Glean db over binary thrift).

Ok, let’s index some JavaScript

Let’s see if we can index the React codebase. React is written in JavaScript, and uses the Flow type system. Flow knows about Glean, and can be run directly as an indexer. My aim here is to use the aarch64 VM as the glean server, but the indexer can run anywhere we want — on any box. We just have to write the data into Glean on the VM. Let’s have a go at installing Flow on aarch64/Debian though, for fun, as arm64/Linux is supported by Flow.

We can build from source (needs opam/OCaml) or install a pre-built binary from . I installed the binary into my aarch64 VM and we are in business:

$ flow/flow --version
Flow, a static type checker for JavaScript, version 0.169.0

We’ll need the React source, so get that. This will be our source code to index:

git clone

Initialize flow and process the source using flow glean:

$ flow glean packages --output-dir=/tmp/flow-out/ --write-root="react"
>>> Launching report...
Wrote facts about 787 JavaScript files.

And we are in business! What did we index? Have a look in /tmp/flow-out at the raw JSON. The index is in textual JSON format: just arrays of predicate and fact pairs. Each predicate has a set of facts associated with it (and facts are unique in Glean; any duplicates will be de-duped when ingested).

Flow indexer output

The whole React index is about 58M of JSON, while the raw source code was 17M. We have a few predicates defined with facts:

$ sed 's/"predicate":"\([^"]*\)",/\n\1\n/g' * | grep '^flow.' | sort | uniq

The definitions of these predicates are in flow.angle, which define all the possible predicates we might have facts for in Flow/JavaScript. Looking at the first entry in the first file:


We can parse this fact as:

There is a “LocalDeclarationReference” of “ownerDocument” in react/packages/react-devtools-shared/src/devtools/views/SearchInput.js at bytespan 1849, length 13, which has a reference at offset 1901.

Seems plausible. The structure of the schema tells us the shape of the JSON we need to generate. E.g. LocalDeclarationReferences are bytespans associated with the use of a Declaration. Represented in Angle as:

predicate Range: {
  module : Module,
  span: src.ByteSpan,
}

predicate Name: string

predicate Declaration: {
  name: Name,
  loc: Range,
}

# connects a variable to its declaration within the same file
predicate LocalDeclarationReference: {
  declaration: Declaration,
  loc: Range,
}

Write the data to Glean

Now let’s ingest that into Glean to query. I’ll make a directory in $HOME/gleandbs to store the Glean DB images. We can install the standard schema, or just point Glean at the source. Now load all that JSON in. You can do this in parallel on multiple cores to speed things up (+RTS -N8 -A32m -RTS) if it is a very, very big DB, but this one is fine to run single-threaded.

Our first DB will be called “react” with tag “main”, but in a production setting you would probably use the commit hash as the tag to identify the data set. Glean data is immutable once the DB is finished, so it’s fine to use the commit hash if it is also immutable.

$ mkdir ~/gleandbs
$ glean --db-root $HOME/gleandbs --schema $HOME/Glean/glean/schema/source/ create --repo react/main /tmp/flow-out/*.json

And the Glean RTS does some work:

We can look at the DB in the glean shell:

$ glean shell --db-root $HOME/gleandbs --schema $HOME/Glean/glean/schema/source/
Glean Shell, built on 2022-01-08 07:22:56.472585205 UTC, from rev 9adc5e80b7f6f7fb9b556fbf3d7a8774fa77d254
type :help for help.
> :list
react/main (incomplete)
  Created: 2022-01-12 02:01:04 UTC (3 minutes ago)
> :pager on
> :db react/main
> :stat

Use :stat to see a summary of the data we have stored. It’s already a basic form of code analysis:

  count: 33527
  size:  1111222 (1.06 MB) 7.1861%
  count: 33610
  size:  1177304 (1.12 MB) 7.6135%
  count: 2215
  size:  61615 (60.17 kB) 0.3985%
  count: 1803
  size:  53050 (51.81 kB) 0.3431%
  count: 5504
  size:  196000 (191.41 kB) 1.2675%
  count: 86297
  size:  2922952 (2.79 MB) 18.9024%

So, just basic info, but the React project has 33k unique declarations, 5,500 import declarations, 86k local variable uses, 3,600 type declarations, 1,117 modules, and 904 files. You can use these summaries over time to understand code complexity growth. Things may be missing here: it’s up to the indexer owner to run the indexer and capture all the things that need capturing. Glean is just reporting what was actually found.

The DB is in “incomplete” state, meaning we could write more data to it (e.g. if the indexer failed part way through we could restart it and resume safely, or we could shard analysis of very large projects). But before we “finish” the DB to freeze it, we need to derive additional predicates.

Note there are some limitations here: the Glean index needs to know about the JavaScript and Flow module system (in particular, names of modules to strings, and module string names to filepaths), so that imports like ‘react’ resolve to the correct module filepath.

import {useDebugValue, useEffect, useState} from 'react';

However, if we look closely at our default Flow index, the string to file facts are all empty. This will limit our ability to see through file names imported via string names (e.g. “React.js” gets imported as ‘react’).


I think this means I haven’t configured Flow correctly or set up the module maps properly (halp Flow folks?).

Derived predicates in Glean

A bit like stored procedures, we can write Angle predicates that are defined in terms of other, existing predicates. This is how we do abstraction. It’s morally equivalent to defining SQL tables on the fly in terms of other tables, with some different guarantees as Glean is more like datalog than a relational system. Derived predicates can be computed on the fly, or fully generated and stored. A very common use case is to compute inverse indices (e.g. find-references is the inverse of jump-to-definition). We can index all uses of definitions, then compute the inverse by deriving.

An example is the “FileXRef” predicate in Flow, which builds an index of File name facts to cross-references in those files. You would do this to quickly discover all outbound references from a file.

This is a stored predicate. The indexer doesn’t write facts of this sort — they are defined in terms of other facts: LocalDeclarationReferences etc. To populate this index we need to derive it first. Let’s do that:

$ glean --db-root $HOME/gleandbs --schema $HOME/Glean/glean/schema/source/ derive --repo react/main flow.FileXRef
I0112 12:20:10.484172 241107 Open.hs:344] react/main: opening
I0112 12:20:10.526576 241107 rocksdb.cpp:605] loadOwnershipSets loaded 0 sets, 0 bytes
I0112 12:20:10.526618 241107 Open.hs:350] react/main: opened
I0112 12:20:10.749966 241107 Open.hs:352] react/main: schema has 799 predicates
flow.FileXRef : 0 facts
I0112 12:20:11.119634 241107 Stats.hs:223] mut_lat: 59ms [59ms] mut_thp: - [-] ded_thp: - [-] dup_thp: - [-] rnm_thp: - [-] cmt_thp: - [-] ibk_mis: - [-] tbi_mis: - [-] fbi_mis: - [-] lch_mem: 0B lch_cnt: 0
I0112 12:20:11.547000 241108 rocksdb.cpp:605] loadOwnershipSets loaded 0 sets, 0 bytes
flow.FileXRef : 112662 facts

We generated 112,662 facts about cross-references. Taking a peek at the DB now with :stat

  count: 112662
  size:  4043028 (3.86 MB) 20.7266%

We’ve increased the DB size by 3.8M. We can derive the rest of the stored predicates now and finalize the DB. Note we have to derive in dependency order, as some stored predicates depend on the results of others. I just do this in two phases:

$ glean --db-root $HOME/gleandbs --schema $HOME/Glean/glean/schema/source/ derive --repo react/main flow.NameLowerCase flow.FileDeclaration flow.FileXRef flow.FlowEntityImportUses flow.FlowTypeEntityImportUses
I0112 12:43:12.098162 241911 Open.hs:344] react/main: opening
I0112 12:43:12.141024 241911 rocksdb.cpp:605] loadOwnershipSets loaded 0 sets, 0 bytes
I0112 12:43:12.141064 241911 Open.hs:350] react/main: opened
I0112 12:43:12.322456 241911 Open.hs:352] react/main: schema has 799 predicates
I0112 12:43:12.367130 242084 Stats.hs:223] mut_lat: 112us [112us] mut_thp: - [-] ded_thp: - [-] dup_thp: - [-] rnm_thp: - [-] cmt_thp: - [-] ibk_mis: - [-] tbi_mis: - [-] fbi_mis: - [-] lch_mem: 0B lch_cnt: 0
flow.FileDeclaration : 46594 facts
flow.FileXRef : 112662 facts
flow.FlowEntityImportUses : 3022 facts
flow.NameLowerCase : 9621 facts
flow.FlowTypeEntityImportUses : 692 facts

And freeze the data.

$ glean --db-root $HOME/gleandbs --schema $HOME/Glean/glean/schema/source finish --repo react/main
I0112 12:45:54.415550 242274 Open.hs:344] react/main: opening
I0112 12:45:54.451892 242274 Open.hs:350] react/main: opened
I0112 12:45:54.671070 242274 Open.hs:352] react/main: schema has 799 predicates
I0112 12:45:54.701830 242270 Work.hs:506] workFinished Work {work_repo = Repo {repo_name = "react", repo_hash = "main"}, work_task = "", work_parcelIndex = 0, work_parcelCount = 0, work_handle = "glean@9adc5e80b7f6f7fb9b556fbf3d7a8774fa77d254"}
I0112 12:45:54.707198 242274 Backup.hs:334] thinned schema for react/main contains src.1, flow.3
I0112 12:45:54.707224 242274 Open.hs:287] updating schema for: react/main
I0112 12:45:54.824131 242274 Open.hs:299] done updating schema for open DBs
I0112 12:45:54.824172 242274 Backup.hs:299] react/main: finalize: finished

The DB is now frozen and cannot be changed.

> :list
react/main (complete)
  Created: 2022-01-12 02:01:04 UTC (45 minutes ago)
  Completed: 2022-01-12 02:45:55 UTC (51 seconds ago)

Poking around a Glean database

We can look at this data by querying in the Glean shell. E.g. to count all xrefs in ReactHooks.js:

react> :limit 0
react> :count flow.FileXRef { file = "react/packages/react/src/ReactHooks.js" }
134 results, 605 facts, 2.84ms, 310224 bytes, 1032 compiled bytes

To see, say, only local references, and just the names of the definitions they point at:

react> N where flow.FileXRef { file = "react/packages/react/src/ReactHooks.js", ref = { localRef = { declaration =  { name = N } } } }
{ "id": 14052, "key": "deps" }
{ "id": 4327, "key": "callback" }
{ "id": 13980, "key": "dispatcher" }
{ "id": 9459, "key": "create" }
{ "id": 5957, "key": "ReactCurrentDispatcher" }
{ "id": 1353, "key": "getServerSnapshot" }
{ "id": 1266, "key": "getSnapshot" }
{ "id": 1279, "key": "subscribe" }
{ "id": 14130, "key": "initialValue" }
{ "id": 3073, "key": "init" }
{ "id": 5465, "key": "source" }...

Or we could query for types used in the file:

react> N where flow.FileXRef { file = "react/packages/react/src/ReactHooks.js", ref = { typeRef =  { typeDeclaration = { name =  N }  } } }
{ "id": 1416, "key": "T" }
{ "id": 3728, "key": "A" }
{ "id": 14493, "key": "Dispatch" }
{ "id": 14498, "key": "S" }
{ "id": 14505, "key": "I" }
{ "id": 14522, "key": "BasicStateAction" }
{ "id": 3318, "key": "ReactContext" }
{ "id": 2363, "key": "Snapshot" }
{ "id": 14551, "key": "AbortSignal" }
{ "id": 6196, "key": "Source" }
{ "id": 8059, "key": "Dispatcher" }
{ "id": 7362, "key": "MutableSourceSubscribeFn" }
{ "id": 7357, "key": "MutableSourceGetSnapshotFn" }
{ "id": 7369, "key": "MutableSource" }

Ok this is starting to get useful.

We’re doing some basic code analysis on the fly in the shell. But I had to know / explore the flow schema to make these queries. That doesn’t really scale if we have a client that needs to look at multiple languages — we can’t reasonably expect the client to know how declarations and definitions etc are defined in every single language. Luckily, Glean defines abstractions for us in code.angle and codemarkup.angle to generically query for common code structures.

Querying generically

Entities are an Angle abstraction for “things that have definitions” in programming languages — like types, modules, classes etc. There are some common queries we need across any language:

  • files to their cross-references, of any entity sort
  • references to definitions
  • definitions in this file
  • entity to its definition location and file

For these common operations, a language-agnostic layer is defined in codemarkup.angle, taking care of all the subtleties of resolving imports/headers/etc. for each language. E.g. for find-references, there’s a derived “EntityUses” predicate defined for a bunch of languages.

We can use this to query Flow too. E.g. how many known entities are defined or declared in ReactHooks.js? 99.

react> :count codemarkup.FileEntityLocations { file = "react/packages/react/src/ReactHooks.js" }
99 results, 354 facts, 13.15ms, 9297888 bytes, 54232 compiled bytes

And how many uses (xrefs) are in that file? 132.

:count codemarkup.FileEntityXRefLocations { file = "react/packages/react/src/ReactHooks.js" }
132 results, 329 facts, 40.44ms, 27210432 bytes, 160552 compiled bytes

Quick and dirty find-references for JavaScript

So we probably have enough now to do some basic semantic code search. i.e. not just textual search like grep, but semantically precise search as the compiler would see it. Let’s pick an entity and find its references. Since React is basically purely functional programming for UIs, let’s look for how often state is used — find-references to useState.

First, we get the entity. This tells us the definition site. The Glean key of the entity is $575875, and its structure is shown below. Note the compound query here (the semicolon), where I name the entity ‘E’, then filter on its body for only those ‘E’s with the name “useState”:

react> E where codemarkup.FileEntityLocations { file = "react/packages/react/src/ReactHooks.js", entity = E } ; { flow = { decl = { localDecl = { name = "useState" } } } }  = E
{
  "id": 575875,
  "key": {
    "flow": {
      "decl": {
        "localDecl": {
          "id": 14269,
          "key": {
            "name": { "id": 1317, "key": "useState" },
            "loc": {
              "id": 14268,
              "key": {
                "module": {
                  "id": 12232,
                  "key": {
                    "file": { "id": 12231, "key": "react/packages/react/src/ReactHooks.js" }
                  }
                },
                "span": { "start": 2841, "length": 8 }
              }
            }
          }
        }
      }
    }
  }
}

Now, to find direct references to this elsewhere in React, we add codemarkup.EntityUses { target = E, file = F } to the query and return the files F:

react> F where codemarkup.FileEntityLocations { file = "react/packages/react/src/ReactHooks.js", entity = E } ; { flow = { decl = { localDecl = { name = "useState" } } } } = E ; codemarkup.EntityUses { target = E, file = F }
{ "id": 10971, "key": "react/packages/react/src/React.js" }

1 results, 1 facts, 9.19ms, 5460072 bytes, 8140 compiled bytes

So that finds the first-order direct reference to useState, from ReactHooks.js to React.js. To find the actual uses in the rest of the react package, we need a proper index of module names to strings, so that an import of ‘react’ can be resolved to ‘React.js’ and thus to the origin. Glean knows about this, but my indexer doesn’t have StringToModule facts — I need the flow indexer to generate these somehow.

For now, this is enough. We are alive.

In the next part I’ll look at writing a simple code search client to the Glean server running on the VM.

by Don Stewart at January 12, 2022 03:48 AM

January 10, 2022

Monday Morning Haskell

In the Middle: Intersperse and Intercalate

This week we continue looking at some useful list-based functions in the Haskell basic libraries. Last time around we looked at boolean functions, this time we'll explore a couple functions to add elements in the middle of our lists.

These functions are intersperse and intercalate:

intersperse :: a -> [a] -> [a]

intercalate :: [a] -> [[a]] -> [a]

Using "intersperse" will place a single item between each pair of adjacent items in your list. For example:

>> intersperse 0 [1, 2, 1]
[1, 0, 2, 0, 1]

This can be used in strings, for example, to space out the letters:

>> intersperse ' ' "Hello"
"H e l l o"
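If you're curious, intersperse is simple enough to define yourself. Here's a minimal sketch (named intersperse' to avoid clashing with Data.List; the library version is written a little differently but behaves the same on finite lists):

```haskell
-- A sketch of intersperse: put `sep` between each pair of adjacent elements.
intersperse' :: a -> [a] -> [a]
intersperse' _   []       = []
intersperse' sep (x : xs) = x : concatMap (\y -> [sep, y]) xs

main :: IO ()
main = do
  print (intersperse' 0 [1, 2, 1 :: Int])  -- [1,0,2,0,1]
  putStrLn (intersperse' ' ' "Hello")      -- H e l l o
```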

The "intercalate" function is similar but "scaled up". Instead of taking a single element and a list of elements as its two inputs, it takes a list and a "list of lists". This is even more flexible when it comes to the use case of separating strings. For example, you can comma-separate a series of words:

>> intercalate ", " ["Hello", "my", "friend"]
"Hello, my, friend"

Using intercalate combined with map show can make it very easy to legibly print a series of items if you use commas, spaces, and newlines!
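For instance, a small helper for printing any list of showable items (a sketch; the helper name is made up):

```haskell
import Data.List (intercalate)

-- Comma-separate the shown form of each item for legible output.
showAll :: Show a => [a] -> String
showAll = intercalate ", " . map show

main :: IO ()
main = putStrLn (showAll [1.5, 2.5, 3.5 :: Double])  -- prints 1.5, 2.5, 3.5
```

Swap the separator for "\n" and you get one item per line instead.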

Another use case for intercalate could be if you are constructing a numeric matrix, but you want to add extra columns to pad on the right side. Suppose this is our starting point:

1 2 3 4
5 6 7 8
9 8 7 6
5 4 3 2

But what we want is:

1 2 3 4 0 0 0
5 6 7 8 0 0 0
9 8 7 6 0 0 0
5 4 3 2 0 0 0

Here's how we do this:

>> intercalate [0, 0, 0] [[1, 2, 3, 4], [5, 6, 7, 8], [9, 8, 7, 6], [5, 4, 3, 2]]
[ 1, 2, 3, 4, 0, 0, 0
, 5, 6, 7, 8, 0, 0, 0
, 9, 8, 7, 6, 0, 0, 0
, 5, 4, 3, 2, 0, 0, 0
]

The result ends up as just a single list though, so you can't really compose calls to intercalate. Keep that in mind! Also, unlike the boolean functions from last time, these only work on lists, not any Foldable object.
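One last note: intercalate is definable directly in terms of intersperse, which makes the relationship between the two functions clear (a sketch, but it matches the library's behavior):

```haskell
import Data.List (intersperse)

-- intercalate is just intersperse at the list level, flattened.
intercalate' :: [a] -> [[a]] -> [a]
intercalate' sep = concat . intersperse sep

main :: IO ()
main = putStrLn (intercalate' ", " ["Hello", "my", "friend"])  -- Hello, my, friend
```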

Make sure you subscribe to our monthly newsletter so you can stay up to date with the topic each month! Subscribing will give you access to all our subscriber resources, including our Beginners Checklist, so don't miss out!

by James Bowen at January 10, 2022 03:30 PM

Don Stewart (dons)

Glean on aarch64 on Apple Silicon : part 2

See: Part 1: get an aarch64/Linux VM running in UTM on the M1

I want to develop and use Glean on ARM as I have a MacBook Air (road warrior mode) and I’m interested in making Glean more useful for local developer IDE backends. (c.f What is Glean?)

To build Glean just read the fine instructions and fix any compilation errors, right? Actually, we need a few patches to disable Intel-specific things, but otherwise the instructions are the same. It’s a fairly normal-ish Haskell set of projects with an FFI into some moderately bespoke C++ runtime relying on folly and a few other C++ libs.

Thankfully, all the non-portable parts of Glean are easily isolated to the rts/ownership parts of the Glean database runtime. In this case “ownership” is only used for incremental updates to the database and other semi-advanced things I don’t need right now.

The only real bits of non-portable code are:

  • Flags to tell folly and thrift to use haswell or corei7 (we will ignore this on non-x86_64)
  • An implementation of 256-bit bitsets (via AVX).
  • Use of folly’s Elias-Fano coding, for efficient compression of sorted integer lists or sets as offsets (how we represent ownership of facts by the things they depend on).

Why is this stuff in Glean? Well, Glean is a database for storing and querying very large scale code information, represented as 64-bit keys into “tables” (predicates) of facts. These facts relate to each other, forming DAGs. A Glean DB is millions (or billions) of facts across hundreds of predicates — i.e. lots of 64-bit values.

So we’re in classic information retrieval territory – hence the focus on efficient bit and word encodings and operations. Generally, you flatten AST information (or other code facts) into tables, then write those tables into Glean. Glean then goes to a lot of work to store that efficiently. That’s how we get the sub-millisecond query times.

What is a “fact about code”? A single true statement about the code. E.g. for a method M in file F we might have quite a lot of information:

M is a method
M is located at file F
M is located at span 102-105
M has parent P
F is a file
M has type signature T
M is referred to by file/spans (G, 107-110) and (H, 23-26)

Real code bases have millions of such facts, all relating things in the code to each other – types to methods, methods to container modules, declarations to uses, definitions to declarations etc. We want that to be efficient, hence all the bit fiddling.
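To make the shape of this concrete, here's a hypothetical Haskell sketch of the fact model described above. The types are purely illustrative (not Glean's actual representation): facts carry 64-bit ids and refer to each other by id, forming a DAG.

```haskell
import Data.Word (Word64)

-- Illustrative only: facts are identified by 64-bit keys and refer
-- to other facts by id.
newtype FactId = FactId Word64 deriving (Eq, Show)

data Fact
  = FileFact   { filePath :: String }
  | MethodFact { methodName :: String
               , definedIn  :: FactId      -- id of a FileFact
               , byteSpan   :: (Int, Int)  -- span within the file
               }
  deriving Show

main :: IO ()
main = do
  let fileF   = FileFact "F.js"                        -- imagine this fact got id 1
      methodM = MethodFact "M" (FactId 1) (102, 105)   -- "M is located at file F, span 102-105"
  print fileF
  print methodM
```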

So let’s try to build this on non-x86 and see what breaks.

Building Glean from scratch

The normal way to build Glean is from source. There are two repos:

I’ve put some PRs up for the non-x86_64 builds, so if you’re building for ARM or something else, you’ll need these from here:

git clone
cd Glean
git clone

Worth doing a cabal update as well, just in case you never built Haskell stuff before.

Now we can build the dependent libraries and the thrift compiler (n.b. we need some stuff installed in /usr/local, which needs sudo):

export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

So let’s build thrift and folly:

 cd hsthrift
 ./install-deps --sudo

I’m doing this on the aarch64 Debian 11 image running in UTM on a Macbook Air.

Now… first time build I seem to reliably get a gcc segfault on both x86 and aarch64, which I will conveniently sidestep by running it again. This seems mildly concerning as open source thrift might be miscompiled with gcc. I should likely be using clang here.

[ 57%] Building CXX object thrift/lib/cpp2/CMakeFiles/thriftprotocol.dir/protocol/TableBasedSerializer.cpp.o
In file included from :
/usr/include/stdc-predef.h: In substitution of ‘template constexpr T apache::thrift::detail::identity(T) [with T = ]’:
/home/dons/Glean/hsthrift/fbthrift/thrift/lib/cpp2/protocol/TableBasedSerializer.cpp:37:1: required from here
/usr/include/stdc-predef.h:32:92: internal compiler error: Segmentation fault
32 | whether the overall intent is to support these features; otherwise,
| ^
Please submit a full bug report,
with preprocessed source if appropriate.
See for instructions.
make[2]: *** [thrift/lib/cpp2/CMakeFiles/thriftprotocol.dir/build.make:173: thrift/lib/cpp2/CMakeFiles/thriftprotocol.dir/protocol/TableBasedSerializer.cpp.o] Error 1

Vanilla Debian gcc.

$ gcc --version
gcc (Debian 10.2.1-6) 10.2.1 20210110

That really looks like a gcc bug and probably other things lurking there. Urk. Re-run the command and it seems to make progress. Hmm. Compilers in memory-unsafe languages eh? Moving along quickly…

Build the Glean rts and Haskell bits

Once hsthrift is built and installed, we have all the various C++ deps (folly, xxhash etc). So we can try building Glean itself now. Glean is a mixture of Haskell tools over a C++ runtime. There’s a ton of schemas, bytecode generators, thrift mungers. Glean is sort of an ecosystem of indexers (analyzing code and spitting out facts as logs), a database runtime coupled to a Thrift server (“Glean” itself) and tooling for building up distributed systems around this (for restoring/migrating/monitoring/administering clusters of Glean services).

Building Glean: if you get an error about missing HUnit, that means we haven’t synced the cabal package list. I got this on the first go with a blank Debian ISO, as the initial cabal package list is a basic one.

Resolving dependencies…
cabal: Could not resolve dependencies:
[__0] trying: fb-stubs- (user goal)
[__1] unknown package: HUnit (dependency of fb-stubs)
[__1] fail (backjumping, conflict set: HUnit, fb-stubs)

That’s fixable with a cabal update.

If you’re not using my branch, and building on non-x86 you’ll fail at the first AVX header.

Preprocessing library 'rts' for glean-
Building library 'rts' for glean-
In file included from ./glean/rts/ownership/uset.h:11,
from ./glean/rts/ownership.h:12, from glean/rts/ffi.cpp:18:0: error:
glean/rts/ownership/setu32.h:11:10: error:
fatal error: immintrin.h: No such file or directory
11 | #include <immintrin.h>

Similarly, hsthrift needed some patches where the intel arch was baked in, otherwise you’ll get:

cc1plus: error: unknown value ‘haswell’ for ‘-march’
cc1plus: note: valid arguments are: armv8-a armv8.1-a armv8.2-a armv8.3-a armv8.4-a

I fixed up all the .cabal files and other bits:

$ find . -type f -exec grep -hl haswell {} \;

See this PR for the tweaks for hsthrift

AVX instructions

Now, Glean itself uses a whole set of AVX instructions for different things. To see what’s actually needed I added a define to conditionally compile immintrin.h on arm, and then sub out each of the methods until the compiler was happy.

$ find . -type f -exec grep -hl immintrin.h {} \;

The methods we need to stub out are:

int _mm256_testc_si256(__m256i __M, __m256i __V);
int _mm256_testz_si256(__m256i __M, __m256i __V);
__m256i _mm256_setzero_si256();
__m256i _mm256_set1_epi32(int __A);
__m256i _mm256_sllv_epi32(__m256i __X, __m256i __Y);
__m256i _mm256_sub_epi32(__m256i __A, __m256i __B);
__m256i _mm256_set_epi32(int __A, int __B, int __C, int __D, int __E, int __F, int __G, int __H);

Ooh, AVX512

long long _mm_popcnt_u64(unsigned long long __X);


unsigned long long _lzcnt_u64(unsigned long long __X);
__m256i _mm256_or_si256(__m256i __A, __m256i __B);
__m256i _mm256_and_si256(__m256i __A, __m256i __B);
__m256i _mm256_xor_si256(__m256i __A, __m256i __B);

Figuring out what these are all used for is interesting. We have 256-bit bitsets everywhere, and e.g. four 64-bit popcnts to count things (fact counts?).

size_t count() const {
  const uint64_t* p = reinterpret_cast<const uint64_t*>(&value);
  // _mm256_popcnt instructions require AVX512, so count with four
  // 64-bit popcounts instead
  return _mm_popcnt_u64(p[0]) +
         _mm_popcnt_u64(p[1]) +
         _mm_popcnt_u64(p[2]) +
         _mm_popcnt_u64(p[3]);
}

Anyway, it's relatively trivial to stub these out, match the types, and we have a mocked AVX layer. It's left to the reader to write a portable shim for 256-bit bitsets that does these things on vectors of words.
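As a sanity check of the idea, the portable replacement for the 256-bit popcount really is just four word-level popcounts. Here it is sketched in Haskell with Data.Bits (the real shim would of course be C++):

```haskell
import Data.Bits (popCount)
import Data.Word (Word64)

-- A 256-bit bitset as four 64-bit words; count the set bits with
-- four word-level popcounts, as the AVX-free shim would.
count256 :: (Word64, Word64, Word64, Word64) -> Int
count256 (a, b, c, d) = popCount a + popCount b + popCount c + popCount d

main :: IO ()
main = print (count256 (0xFF, 0x0F, 1, 0))  -- 8 + 4 + 1 + 0 = 13
```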

Elias Fano

So the other bit is a little more hairy. Glean uses Elias Fano to compress all these sets of 64 bit keys we have floating around. Tons of sets indicating facts are owned or related to other facts. The folly implementation of Elias Fano is x86_64 only, so just falls over on aarch64:

/usr/local/include/folly/experimental/EliasFanoCoding.h:43:2: error:
error: #error EliasFanoCoding.h requires x86_64
43 | #error EliasFanoCoding.h requires x86_64
| ^~~~~

So, hmm. Reimplement? No, it's Saturday, so I'm going to sub this out as well, just enough to get it to compile. My guess is we don't use many methods here: read/write/iterate and some constructors. So I copy just enough of the canonical implementation declarations and dummy bodies to get it all to go through. hsthrift under aarch64 emulation in UTM on an arm64 M1 takes about 10 mins to build with no custom flags.

Build and test

So we should be good to go now. Compile the big thing: Glean. Some of these generated bits of schema are pretty big too.

Glean storage is described via “schemas” for languages. Schemas represent what predicates (tables and their types) we want to capture. Uniquely, Glean’s Angle language is rich enough to support abstracting over types and predicates, building up layers of API that let you hide language complexity. You can paper over differences between languages while also providing precise language-specific capture.

To see an example, look at the multi-language find-references layer in codemarkup.angle:

The joy of this is that a client only has to know to query codemarkup:find-references, and the right query will be issued for the right language. The client doesn’t have to know language-specific stuff; it’s all hidden in the database engine.

But… that does end up meaning we generate quite a lot of code. With some trial and error, I found I needed something under 16G of memory to compile the “codemarkup” abstraction layer (a language-agnostic navigation layer over the Glean schemas).

make test

That should pass and we are in business. We can run the little hello world demo.

$ uname -msr
Linux 5.10.0-10-arm64 aarch64

$ glean shell --db-root /tmp/glean/db/ --schema /tmp/glean/schema/
Glean Shell, built on 2022-01-08 07:22:56.472585205 UTC, from rev 9adc5e80b7f6f7fb9b556fbf3d7a8774fa77d254
type :help for help.

Check our little db created from the walkthrough:

facts/0 (complete)
Created: 2022-01-08 10:59:17 UTC (1 day, 18 hours ago)
Completed: 2022-01-08 10:59:18 UTC (1 day, 18 hours ago)

What predicates does it have?

facts> :schema
predicate example.Member.1 :
{ method : { name : string, doc : maybe string } | variable : { name : string } | }

predicate example.FileClasses.1 : { file : string, classes : [example.Class.1] }

predicate example.Reference.1 :
{ file : string, line : nat, column : nat }
-> example.Class.1

predicate example.Class.1 : { name : string, line : nat }

predicate example.Parent.1 : { child : example.Class.1, parent : example.Class.1 }

predicate example.Has.1 :
{ class_ : example.Class.1, has : example.Member.1, access : enum { Public | Private | } }

predicate example.Child.1 : { parent : example.Class.1, child : example.Class.1 }

Try a query or two: e.g. “How many classes do we have?”

facts> example.Class _
{ "id": 1026, "key": { "name": "Fish", "line": 30 } }
{ "id": 1027, "key": { "name": "Goldfish", "line": 40 } }
{ "id": 1025, "key": { "name": "Lizard", "line": 20 } }
{ "id": 1024, "key": { "name": "Pet", "line": 10 } }

What is the parent of the Fish class?

facts> example.Parent { child = { name = "Fish" } }
{
  "id": 1029,
  "key": {
    "child": { "id": 1026, "key": { "name": "Fish", "line": 30 } },
    "parent": { "id": 1024, "key": { "name": "Pet", "line": 10 } }
  }
}

1 results, 3 facts, 5.59ms, 172320 bytes, 1014 compiled bytes

Ok we have a working ARM64 port of Glean. In the next post I’ll look at indexing some real code and serving up queries.

by Don Stewart at January 10, 2022 11:58 AM

Philip Wadler

Gödel, Animated


A five-minute primer on Gödel and incompleteness, courtesy of Marcus du Sautoy and TED-Ed. While not named, the fellow in the hat disgruntled by the discovery is clearly Hilbert.

by Philip Wadler ( at January 10, 2022 11:34 AM

January 08, 2022

Don Stewart (dons)

Glean on aarch64 on Apple Silicon : part 1

Get a working aarch64 box

This post shows how to get a working aarch64 env on the MacBook Air (M1) for Haskell.

I’m working on the road at the moment, so I picked up a MacBook Air with the M1 chip, to travel light. I wanted to use it as a development environment for Glean (c.f. what is Glean), the code search system I work on. But Glean is Linux/x86_64-only at the moment, due to the use of some fancy AVX extensions deep down in the runtime. Let’s fix that.

Motivation: getting Glean working on Apple ARM chips could be useful for a few reasons. Apple Silicon is becoming really common, and a lot of devs have MacBooks as their primary development environment (essentially expensive dumb terminals to run VS Code). Glean is/could be the core of a lot of developer environments, as it indexes source code and serves up queries extremely efficiently, so it could be killer as a local language server backend for your Mac IDE. (e.g. a common backend for all your languages, with unified search, jump-to-def, find-refs etc).

Setup up UTM

Glean is still very Linux-focused. So we need a VM. I’m building on an M1 MacBook Air (ARM64). So I install UTM from the app store or internet – this will be our fancy iOS QEMU virtualization layer.

Configure the OS image for aarch64 Debian, following the linked guides for the basic configuration.

In particular, I set up the following.

  • Information -> Style: Operating system 
  • System -> Hardware -> Architecture: aarch64
  • System -> Memory -> 16G (compiling stuff!)
  • Drives -> VirtIO at least 20G, this will be the install drive and build artifacts
  • Drives -> Removable USB , for the installation .iso
  • Display -> console only (we’ll use ssh)
  • Network -> Mode: Emulated VLAN
VM disk configuration

I’ll point VS Code and other things at this VM, so I’m going to forward port 2200 on the Mac to port 22 on the Debian VM.

Network settings for the VM

Choose OS installer and boot

Set the CD/DVD to the Debian ISO file path. I used the arm64 netinst iso for Debian 11 from

Boot the machine and run the Debian install. It’s like 1999 here. (So much nostalgia when I used to scavenge x86 boxes from dumpsters in the Sydney CBD 20 years ago to put Linux on them).


Boot the image and log in. Now we have a working Linux aarch64 box on the M1, running very close to native speed (arm on arm virtualization).

You can ssh into this from the Mac OS side, or set it up as a remote host for VS Code just fine, which is shockingly convenient (on port 2200).

Install the dev env

This is a really basic Debian image, so you need a couple of things to get started with a barebones Haskell env:

apt install sudo curl cabal-install

We have a basic dev env now.

$ uname -msr
Linux 5.10.0-10-arm64 aarch64

$ ghci
GHCi, version 8.8.4:  :? for help
Prelude> System.Info.arch
"aarch64"
Prelude> let s = 1 : 1 : zipWith (+) s (tail s) in take 20 s
[1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765]

To build Glean per the install instructions, we need to update Cabal to 3.6.x or greater, as Glean uses some fancy Cabal configuration features.

Update cabal

We need cabal > 3.6.x  which isn’t in Debian stable, so I’ll just use the pre-built binary from

Choose: Binary download for Debian 10 (aarch64, requires glibc 2.12 or later): cabal-install-

Unpack that. You’ll also need to apt install libnuma-dev if you use that binary.

$ tar xvfJ  cabal-install-
$ ./cabal --version
cabal-install version
compiled using version of the Cabal library

I just copy that over the system cabal for great good. It’s a good idea now to sync the package list for Hackage with a cabal update, before we start trying to build anything Haskell.

Install the Glean dependencies

To build Glean we need a bunch of C++ things. Glean itself will bootstrap the Haskell parts. The Debian packages needed are identical to those for Ubuntu in the Glean install instructions, except you might see “Package ‘libmysqlclient-dev’ has no installation candidate”; we instead need default-libmysqlclient-dev. We also need libfmt-dev.

So the full set of Debian Glean dependencies are:

> apt install g++ \
    cmake \
    bison flex \
    git cmake \
    libzstd-dev \
    libboost-all-dev \
    libevent-dev \
    libdouble-conversion-dev \
    libgoogle-glog-dev \
    libgflags-dev \
    libiberty-dev \
    liblz4-dev \
    liblzma-dev \
    libsnappy-dev \
    make \
    zlib1g-dev \
    binutils-dev \
    libjemalloc-dev \
    default-libmysqlclient-dev \
    libssl-dev \
    pkg-config \
    libunwind-dev \
    libsodium-dev \
    curl \
    libpcre3-dev \
    libfftw3-dev \
    librocksdb-dev \
    libxxhash-dev

Now we have a machine ready to build Glean. We’ll do the ARM port of Glean in the next post and get something running.

by Don Stewart at January 08, 2022 01:41 AM

January 07, 2022

Matthew Sackman

Another Go binding for LMDB

In the series on designing and building a distributed concurrent editor, I’ve been using bbolt as the embedded key-value store. That is inspired by LMDB, which is written in C, is extremely fast, is very widely used, and has been around for a fair while now.

LMDB makes read-write transactions fully serialised (literally, only one can happen at a time). When a read-only transaction starts, it works on a snapshot of the database that contains every committed read-write transaction up to the moment the read-only transaction started. The read-only transaction continues to see this snapshot even if other read-write transactions start and commit concurrently with the read-only transaction. These are very nice and easy to work with semantics: you never have to worry about seeing data from uncommitted transactions, you always see a consistent snapshot, and you don’t have to worry about weird interactions between concurrent read-write transactions because there are none.

Several years ago, I tweaked an existing Go binding for LMDB and extended it a bit. I recently went back and had a look at that code, and the normal thing happened when looking at your own code from more than 6 months ago: “Which idiot wrote this? This code is terrible. I would never write it like this today”. Etc. So, using the actors framework I wrote about last week, I decided to write a new Go binding to LMDB.

The most comprehensive existing Go binding I came across is Bryan Matsuo’s. However, even that is several years bit-rotten: the tests don’t pass without a bit of tweaking, and although he’s clearly gone to some trouble to minimise unnecessary copying of data between Go land and C land, on close inspection the C code does violate the Cgo contract that forbids storing Go pointers in C memory. I suspect the code was written before that detail of Cgo was formalised anyway.

All of the existing bindings that I could find (including my previous one), are low-level bindings: they expose the LMDB API fairly faithfully. That has pros and cons: you get a lot of flexibility, but you also have to rebuild a lot of the things that these days you might expect to get for free – particularly things you get for free with BBolt. So I wanted this new binding to be pretty high-level: scenarios where it’s both fairly obvious what the ideal behaviour is, and it’s achievable, the binding should take care of things for you. Especially when the result is an API and semantics that are somewhat more idiomatic and user-friendly.

With LMDB you get some similar stuff to BBolt:

  • keys and values are byte arrays.
  • normal get and put APIs.
  • keys are sorted lexically (though there are flags to change that to numerically sorted keys).
  • you can use cursors to move about, based on the key sorting.
  • many concurrent readers are supported, but just one writer at a time.

But you also get some bits I wanted to hide:

  • if the database file gets full, you get an error. There’s an API to increase the size, but it’s up to you to call it.
  • the cursors API is a little weird and tersely documented.
  • once you start a read-write transaction, you need to stay in the same OS-thread, which in Go means using os.LockOSThread.

Additionally, committing lots of small read-write transactions can be slower than you might expect. This can be alleviated by batching read-write transactions together where possible, thus amortising the cost of each commit.
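That batching pattern can be sketched in miniature (in Haskell here, as a language-neutral illustration of the idea rather than the Go binding's actual code): a single writer drains everything currently queued and commits it as one batch.

```haskell
import Control.Concurrent.STM

-- Drain everything currently queued (at least one item), so a single
-- commit can cover a whole batch of pending writes.
drainBatch :: TQueue a -> STM [a]
drainBatch q = do
  x  <- readTQueue q   -- block until at least one write is queued
  xs <- flushTQueue q  -- then take everything else without blocking
  pure (x : xs)

main :: IO ()
main = do
  q <- newTQueueIO
  atomically (mapM_ (writeTQueue q) [1, 2, 3 :: Int])
  batch <- atomically (drainBatch q)
  print batch  -- [1,2,3]
```

The writer loop would call drainBatch, apply the whole batch inside one read-write transaction, then commit once.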

An actor for the read-write side works quite nicely: it can take care of the batching and auto-resizing, and the server-side goroutine can lock itself to an OS thread. Read-only transactions can hang off the same struct for a clean API design, but they don’t talk to the actor at all, other than using an RW-mutex to make sure there’s no resize going on and so it’s safe to start.
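The write-batching part of that design is easy to sketch. The following is a minimal Python sketch of the pattern, not the binding's actual Go code; `WriteActor`, `submit`, and the commit callback are names I've made up for illustration:

```python
import queue
import threading

class WriteActor:
    """Sketch of a write-batching actor: clients enqueue write closures,
    and a single worker drains the queue, applies each drained batch,
    and pays the commit cost once per batch rather than once per write."""

    def __init__(self, commit):
        self._q = queue.Queue()
        self._commit = commit  # called once per batch, with the batch size
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, write):
        # Returns an Event that is set once this write has been committed.
        done = threading.Event()
        self._q.put((write, done))
        return done

    def _run(self):
        while True:
            batch = [self._q.get()]       # block waiting for the first write
            while True:                   # then drain whatever else is queued
                try:
                    batch.append(self._q.get_nowait())
                except queue.Empty:
                    break
            for write, _ in batch:
                write()                   # apply all writes in one "transaction"
            self._commit(len(batch))      # one commit for the whole batch
            for _, done in batch:
                done.set()
```

Each drained batch costs one commit regardless of how many writes it contains, which is exactly the amortisation described above.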

I’ve tried not to hide too much of LMDB, but there are a number of features I’ve decided to not support as they just don’t fit with the API design. I don’t particularly want to duplicate the LMDB documentation (except where it’s really necessary to improve it), so I frequently do refer back to the upstream API docs.

It’s all done, and it seems to work. I’ve written what I think is a fairly reasonable soak test: i.e. a test that runs for as long as you want, and stresses the API and concurrency. Nearly all of the rest of the API is covered by tests too.

I am aware that there is a degree of competitiveness between LMDB and other designs for embedded key-value stores: LMDB is B+tree based and uses mmap. Many other key-value stores use log-structured merge-trees. If you go searching, you’ll easily find some rivalry between these two camps, and evidence that performing benchmarks which are meaningful and make best use of all these designs is tricky, and so it’s easy to make conclusions from unsafe data. There’s also a very recent paper Are You Sure You Want to Use MMAP in Your Database Management System? which explains that once your dataset gets bigger than your RAM, relying on the OS to page data in and out (as is the case when using mmap) is almost certainly a bad idea. So, I’m not at all suggesting that LMDB (or BBolt) are the last word, or even the best option, in embedded key-value database designs. But, I’ve used them a fair amount over the years with success.

January 07, 2022 04:01 PM

Sandy Maguire

Review: Adders and Arrows

This year my goal is to go through one paper a week, and produce some resulting artifact to help hammer in my understanding. Today’s paper is Conal Elliott’s Adders and Arrows.

In my opinion, Adders and Arrows is an attempt to answer the questions “where do digital circuits come from?” and “how can we be convinced our digital circuits do what they claim?” by focusing on the concrete case of adders. What’s quite remarkable to me is that the paper takes 17 pages to build up to the “known” ripple-carry adder, which is essentially day 1 of any digital circuitry class. This feels a bit ridiculous at first blush, but immediately pays off for itself by using the same machinery to derive a carry-forward adder. Carry-forward adders come like a week later in circuitry class, and are completely unrelated to ripple-carry adders, but Elliott derives them essentially for free. He then gives a third variation on the theme, which is a ripple-carry adder in time, rather than space, and again gets it in one page.

So that’s the impressive stuff. What’s also important is that the paper doesn’t require us to trust that the corresponding circuits do what they claim — the underlying machinery passes around a big chain of equivalence proofs, which automatically get composed and witness that addition over the naturals is a model for each circuit.

Something I really liked about this paper is that it’s the first time I’ve ever gotten a glimpse of why it might be valuable to understand category theory. It’s not just good for classifying things! Turns out you can do cool things with it too. But nobody who teaches it ever says that.

The paper itself is only a draft, and it shows in several places. Notably, the core abstraction (the arrow category) is missing, and the paper ends extremely abruptly. After some spelunking, I managed to track down the implementation of the arrow category, and was pleased to realize I’d already implemented it independently.

We’ll proceed section by section.

Section 1: A model of addition

Section 1 gives us a model of the natural numbers via the Peano encoding, and also defines addition. It drives home the point that this unary encoding is only the model of our computation — its specification — and not the actual implementation. This is a common theme in Elliott’s work: “how do we even know what question we’re trying to answer?” We know by posing the dumbest possible model of the problem, and then proving any actual solution to the problem is equivalent. He stresses that the “dumbest possible” thing is an important quality — this is the only part of the problem we get no checks or balances on, so our only hope is to make it so simple that we can’t screw it up.

data Nat : Set where
  zero : Nat
  suc  : Nat -> Nat

_+_ : Nat -> Nat -> Nat
zero  + n = n
suc m + n = suc (m + n)
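For readers who don't speak Agda, the same "dumbest possible" specification transliterates directly into Python. This is my own sketch; `Zero`, `Suc`, `add`, and `to_int` mirror the Agda names:

```python
# Peano naturals: a numeral is either Zero or Suc applied to a numeral.
class Zero:
    pass

class Suc:
    def __init__(self, pred):
        self.pred = pred

def add(m, n):
    # zero  + n = n
    # suc m + n = suc (m + n)
    return n if isinstance(m, Zero) else Suc(add(m.pred, n))

def to_int(m):
    # Observe a unary numeral as an ordinary integer.
    return 0 if isinstance(m, Zero) else 1 + to_int(m.pred)
```

It is hopeless as an implementation (addition is linear in the first argument), which is exactly the point: it is simple enough to be obviously correct.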

Because arbitrary categories don’t support currying, we can uncurry addition, which will come in handy later:

<+> : Nat2 -> Nat
<+> (x , y) = x + y

Section 2: Bounded numbers

Section 2 defines the finitary numbers: natural numbers guaranteed to be smaller than a given upper bound.

data Fin : Nat -> Set where
  zero : {n : Nat} -> Fin (suc n)
  suc  : {n : Nat} -> Fin n -> Fin (suc n)

Thus, Fin 2 is the type of binary digits, and Fin 10 is that of decimal digits.

Elliott gives us an observation of Fin in terms of Nat:

toN : {n : Nat} -> Fin n -> Nat
toN zero    = zero
toN (suc f) = suc (toN f)

We’d like to find a model-preserving implementation of + over Fin (let’s call it <+F>). But what does model-preserving mean? As usual, the answer is “homomorphism”: namely, that the following square must commute:

<+> . bimap toN toN = toN . <+F>

The paper says “solving this equation for <+F> yields the following recursive definition.” It’s unclear if this answer was “solved for” in the sense of being manipulated algebraically, or if it just happens to be a solution to the problem. I hope it’s the former, because I would love to see how to do that, but I suspect it’s the latter. Anyway, the definition of _+F_ is given below, plus the work I needed to do to get everything to typecheck.

n+suc : (n m : Nat) -> n + suc m == suc (n + m)
n+suc zero m = refl
n+suc (suc n) m rewrite n+suc n m = refl

weaken : {m n : Nat} -> (y : Fin n) -> Fin (m + n)
weaken {zero} y = y
weaken {suc m} zero = zero
weaken {suc m} {suc n} (suc y) rewrite n+suc m n = suc (weaken y)

infixl 5 _+F_
_+F_ : {m n : Nat} -> Fin (suc m) -> Fin n -> Fin (m + n)
_+F_ {m} zero y = weaken y
_+F_ {suc _} (suc x) y = suc (x +F y)
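To see what _+F_ is computing, here's a rough Python model of my own devising. It tracks the bound dynamically at runtime rather than statically in the type, so it only approximates what Fin guarantees, but it makes the homomorphism square easy to check:

```python
def fin(value, bound):
    # An element of Fin bound: a natural strictly below the bound.
    assert 0 <= value < bound
    return (value, bound)

def toN(f):
    # Forget the bound, keeping the underlying natural.
    return f[0]

def addF(x, y):
    # Models _+F_ : Fin (suc m) -> Fin n -> Fin (m + n):
    # inputs bounded by m+1 and n give an output bounded by m+n.
    (v1, b1), (v2, b2) = x, y
    return fin(v1 + v2, (b1 - 1) + b2)
```

The commuting square from above says that adding and then observing equals observing and then adding: `toN(addF(x, y)) == toN(x) + toN(y)`.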

Something that ended up challenging me here was that Elliott uses extensional equality of functions for his commutative diagrams, but my implementation of arrow categories (coming up) requires a propositional equality. I got around the problem by postulating extensionality (which I stole from McBride), and then by defining a type-alias for extensional equality that plays well with extensionality:

postulate
  extensionality : {S : Set}{T : S -> Set}
                   {f g : (x : S) -> T x} ->
                   ((x : S) -> f x == g x) ->
                   f == g

infix 1 _=o=_
_=o=_ : {A B : Set} -> (A -> B) -> (A -> B) -> Set
_=o=_ {A} f g = (x : A) -> f x == g x

and then we can give most (it’s hard to prove things, OK?) of the proof for the commutative diagram:

toN-weaken : {m n : Nat} -> (y : Fin n) -> toN (weaken {m} y) == toN y
toN-weaken {zero} y = refl
toN-weaken {suc m} zero = refl
toN-weaken {suc m} (suc y) = ?

toN-+F :  {(m , n) : Nat2} -> toN {m + n} ∘ <+F> =o= <+> ∘ toN2 {suc m , n}
toN-+F {zero , n} (zero , y) = refl
toN-+F {suc m , n} (zero , y)
  rewrite toN-+F {m , n} (zero , y)
        | toN-weaken {suc m} y = refl
toN-+F {suc m , n} (suc x , y) rewrite toN-+F {m , n} (x , y) = refl

Section 3: The arrow category

Section 3 presents the fact that we can bundle up the commutative diagram with its proof into an object. Unfortunately, it only gives the barest hint about how to actually go about doing that. I’m presenting here what I think it is, but if something goes wrong in the rest of the paper, here’s where the problem is.

The paper is keen to point out that we have five things we’re bundling together:

  1. A mapping from implementation input to specification input.
  2. A mapping from implementation output to specification output.
  3. An implementation.
  4. A specification.
  5. A proof that the square commutes.

Colloquially, I’d call the first two points our “observations” of the system.

The idea is, we’d like to bundle all of these things together, ideally in some sort of composable packaging. Composable usually means categorical, so we should look at our old friend SET, the category whose objects are types and arrows are functions. By itself, SET doesn’t give us any of the machinery we’d want for thinking about commutative squares. So instead, we’ll construct the arrow category over SET. Let’s call it >-SET->.

But what is the arrow category? The paper is quiet on this front, but my understanding is that it transforms the arrows in SET into objects in >-SET->. An arrow in >-SET-> is thus a pair of arrows in SET which form a commutative square. For example, consider the arrows in SET:

Fin n x Fin n        Nat x Nat
      |                  |
      | <+F>             | <+>
      v                  v
    Fin n               Nat

In >-SET->, these are just two objects, and a morphism between them is something that makes the whole square commute. In >-SET->:

       bimap toNat toNat
<+F>  ------------------>  <+>

and again in SET:

                bimap toNat toNat
Fin n x Fin n  ------------------> Nat x Nat
      |                                |
      | <+F>                           | <+>
      v               toNat            v
    Fin n      ------------------>    Nat

So we can consider the arrow category to be a “view” on its underlying category, like how databases present views over their data. The arrow category lets us talk about arrows directly, and ensures the commutativity of any constructions we’re able to form. As such, it’s a natural fit for our purposes of specification — we are literally unable to construct any arrows in >-SET-> which violate our specification.
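That "literally unable to construct arrows which violate the specification" property can at least be imitated dynamically. Here is a small Python sketch (`commutes` and `compose` are my names, and the commutativity check is pointwise over sample inputs rather than a proof):

```python
# Objects of the arrow category are functions; a morphism from h1 to h2
# is a pair (f, g) of functions making the square commute: g . h1 == h2 . f.

def commutes(h1, h2, f, g, samples):
    # Check the square pointwise on some sample inputs.
    return all(g(h1(x)) == h2(f(x)) for x in samples)

def compose(m1, m2):
    # Composing two squares composes the top arrows and the bottom arrows;
    # if both squares commute, so does the composite.
    f1, g1 = m1
    f2, g2 = m2
    return (lambda x: f2(f1(x)), lambda y: g2(g1(y)))
```

In Agda the `commute` field carries an actual proof, so composition is checked once and for all instead of being sampled.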

Building Categories

How do we go about actually building an arrow category? First, some preliminaries to build a category:

record Category : Set where
  infix 6 _~>_
  field
    Obj : Set
    _~>_ : (A B : Obj) -> Set

    id : {A : Obj} -> A ~> A
    _>>_ : {A B C : Obj} -> A ~> B -> B ~> C -> A ~> C

    id-l : {A B : Obj} (f : A ~> B) -> id >> f == f
    id-r : {A B : Obj} (f : A ~> B) -> f >> id == f
    >>-assoc : {A B C D : Obj}
               (f : A ~> B)
            -> (g : B ~> C)
            -> (h : C ~> D)
            -> f >> (g >> h) == (f >> g) >> h

And I have some syntactic sugar for dealing with arrows and composition in an arbitrary category:

infix 5 _[_,_]
_[_,_] : (C : Category) -> Obj C -> Obj C -> Set
C [ X , Y ] = _~>_ C X Y

infix 5 _[_>>_]
_[_>>_] : (C : Category)
       -> {X Y Z : Obj C}
       -> C [ X , Y ]
       -> C [ Y , Z ]
       -> C [ X , Z ]
C [ f >> g ] = _>>_ C f g

With this, we can describe an arrow in SET via SET [ A , B ], and composition via SET [ foo >> bar ].

Building arrow categories

An arrow category is parameterized by the category whose arrows form its objects:

module ARROW (C : Category) where
  open Category

  record ArrowObj : Set where
    constructor arrow-obj
    field
      {alpha} : Obj C
      {beta}  : Obj C
      hom : C [ alpha , beta ]

  open ArrowObj

  record ArrowArr (X Y : ArrowObj) : Set where
    constructor arrow-hom
    field
      f : C [ X .alpha , Y .alpha ]
      g : C [ X .beta , Y .beta ]
      commute : C [ f >> Y .hom ] == C [ X .hom >> g ]

  Arrow : Category
  Obj Arrow = ArrowObj
  _~>_ Arrow = ArrowArr
  -- ...

My implementation for Arrow is actually in terms of the comma category, which is the same idea except it does some functor stuff. In the code accompanying this post, ARROW {c} is implemented as COMMA {c} ID=> ID=>, where ID=> is the identity functor.

Back to section 3

For convenience, the paper implicitly defines a type synonym for constructing arrows in the arrow category. This is called _⮆_ in the paper, but I hate unicode, so mine is defined as =>>:

infix 0 _=>>_
_=>>_ : {s1 t1 : Set}
        {s2 t2 : Set}
     -> (s1 -> s2) -> (t1 -> t2) -> Set
g =>> h = ArrowArr (arrow-obj g) (arrow-obj h)

With all of this machinery in place, we’re now ready to continue on the paper. We can construct a morphism in the arrow category corresponding to the fact that <+> is a model for <+F>, as witnessed by toN-+F:

+=>> : {(m , n) : Nat × Nat} -> toN2 { suc m , n } =>> toN { m + n }
+=>> = arrow-hom <+F> <+> $ extensionality toN-+F

Again, the necessity of extensionality here is a byproduct of me not having parameterized my Category by a notion of equivalence. The arrow category wants to use extensional equality, but I’ve hard-baked in propositional equality, and didn’t have time to fix it before my deadline to get this post out.

Although not presented in the paper, arrow categories also have a notion of “transposition,” corresponding to flipping which arrows (in SET) lie on the horizontal and vertical axes. Because _=>>_ names two arrows, leaving the other two implicit, transposition changes the type of our arrows, making the implicit ones explicit and vice versa. We’ll need this later in section 7.

transpose : {A B : CommaObj} ((comma-hom f g _) : CommaArr A B) -> f =>> g
transpose {comma-obj h} {comma-obj h'} (comma-hom f g p)
  = comma-hom h h' (sym p)

As an aside, it’s super cool that Agda can do this sort of pattern matching in a type signature.

Section 4: Carry-in

In order to make non-unary adders compositional, we need to support carrying in. The play is familiar: define what we want (the specification) over the naturals, write it over Fin, and then give an equivalence proof. The paper defines some type synonyms:

Nat3 : Set
Nat3 = Nat × Nat2

Fin3 : Nat3 -> Set
Fin3 (m , n) = Fin m × Fin2 n

toN3 : {(c , m , n) : Nat3} -> Fin3 (c , m , n) -> Nat3
toN3 (c , mn) = toN c , toN2 mn

(where Nat2 and Fin2 are exactly what you’d expect.)

It’s easy to define the specification (addN), and implementation (addF), and the proof is trivial too:

addN : Nat3 -> Nat
addN (c , a , b) = c + a + b

addF : {(m , n) : Nat2} -> Fin3 (2 , m , n) -> Fin (m + n)
addF (c , a , b) = c +F a +F b

toN-addF : {mn@(m , n) : Nat2}
        -> SET [ addF >> toN ] =o= SET [ toN3 >> addN ]
toN-addF {mn} (c , a , b)
  rewrite toN-+F {mn} (c +F a , b)
        | toN-+F (c , a) = refl

Bundling these up into an arrow proves that addN is a model for addF:

add=>>0 : {(m , n) : Nat2} -> toN3 {2 , m , n} =>> toN {m + n}
add=>>0 = comma-hom addF addN $ extensionality toN-addF

The paper also makes clear that we can show that <+> is a model for addN via carryIn:

carryIn : Nat3 -> Nat2
carryIn (c , a , b) = c + a , b

addN=>> : carryIn =>> id {A = Nat}
addN=>> = comma-hom addN <+> refl
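In Python terms, the claim that <+> models addN via carryIn is just the following identity (a sketch with my own snake_case names):

```python
def addN(c, a, b):
    # Specification: three-argument addition over the naturals.
    return c + a + b

def carry_in(c, a, b):
    # Fold the carry into the first addend, leaving an ordinary pair.
    return (c + a, b)

def plus(pair):
    # Uncurried two-argument addition, the <+> of section 1.
    x, y = pair
    return x + y

# The square: plus(carry_in(c, a, b)) == addN(c, a, b), definitionally.
```

The Agda proof is `refl` precisely because both sides reduce to the same expression.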


At this point, it’s starting to become clear what this paper is really about. The idea is that we specify super simple pieces, and then build slightly more complicated things, showing equivalence to the last piece we built. In this way, we can slowly derive complexity. Not only does it give us a fool-proof means of getting results, but it also means we can reuse the proof work. As someone whose first real project in Agda was to implement and prove the correctness of a few adders, this is a godsend. I wrote a ripple-carry adder, but was unable to use my half-adder proof to prove it correctly implements addition. And then I had to throw all of that work away when I wanted to subsequently write and prove a carry-forward adder.

Section 5: Category stuff

This section shows we went through too much effort to implement add=>>0. Really, what’s going on is that we’re doing three things in a row, each at the level of specification, implementation and proof. First, we reassociate the tuple from N x (N x N) to (N x N) x N. Then we do addition on the first element of the pair, and finally addition on the resulting pair.

This is all stuff you can do in any category with all finite products. But I was too lazy to implement that in full generality, so I hard-coded it. Because arrow categories lift products, and in SET products are just the product type, it’s easy to implement product objects:

_×C_ : Obj Comma -> Obj Comma -> Obj Comma
_×C_ (comma-obj {A1} {B1} f) (comma-obj {A2} {B2} g) =
  comma-obj {A1 × A2} {B1 × B2} (\ { (x , y) -> f x , g y })

And then we can implement first and assoc:

first : {A B X : Obj Comma} -> Comma [ A , B ] -> Comma [ (A ×C X) , (B ×C X) ]
first {A} {B} {X} (comma-hom f g p) =
  comma-hom (do-first f) (do-first g) $ cong (\ k -> \ { (a , x) -> k a , x }) p
  where
    do-first : {A B X : Set} -> (A -> B) -> A × X -> B × X
    do-first f = \ { (a , x) -> f a , x }

assoc : {A B X : Obj Comma} -> Comma [ A ×C (B ×C X) , (A ×C B) ×C X ]
assoc =
  comma-hom reassoc reassoc ?
  where
    reassoc : {A B C : Set} -> A × (B × C) -> (A × B) × C
    reassoc = \ { (a , b , c) -> (a , b) , c }

where the proof is left as an exercise to the reader :)

We can now implement add=>> more succinctly:

add=>> : {(m , n) : Nat2} -> toN3 {2 , m , n} =>> toN {m + n}
add=>> = Comma [ Comma [ assoc >> first +=>> ] >> +=>> ]

While this is cool, I must admit I’m a little confused. Do first and assoc have most-general types when expressed via _=>>_? Thinking aloud here, I think not. Using the _=>>_ notation means choosing two particular morphisms in SET, while using the more general Comma [ X , Y ] works for any pair of morphisms in SET with the right type. But I’m not confident about this.

Section 6: Carrying out

Carry-in is great, but what about going the other direction?

Elliott starts by making the following observation: if we fix m = n, then the type of our finitary adder is Fin2 (m , m) -> Fin (m + m); we can rewrite the codomain as Fin (2 * m) and then reinterpret it as Fin 2 × Fin m. That is to say, a finitary adder outputs a single digit in base m, plus a bit recording whether or not a carry occurred. This is a great little reminder of the value of type isomorphisms. How cool is it that we can get carry-outs for free just with some algebraic manipulation?

Of course, the trick is to prove it. Start by defining two helper type synonyms:

CarryIn : Nat -> Set
CarryIn m = Fin3 (2 , m , m)

CarryOut : Nat -> Set
CarryOut m = Fin2 (2 , m)

Elliott presents the following “puzzle” of a commutative diagram:

CarryIn m   --------> Fin (m + m)
   ^                      ^
id |                      | ?
   |           ?          |
CarryIn m   --------> Fin (2 * m)
   ^                      ^
id |                      | ?
   |           ?          |
CarryIn m   --------> CarryOut m

It’s unclear how exactly one formulates these diagrams in the first place. I guess the top is pinned by addF, while the bottom corners are pinned by our desired carry-out. The middle is thus the isomorphism presented immediately above. All of that makes sense, but I’m not convinced I could reproduce it on my own yet.

So the game now is to fill in the question marks. I don’t know how to get Agda to help me with this, so I’m just going to try literally putting question marks in and seeing what happens. When doing that, we get this:

puzzle1 : {m : Nat}
       -> Comma [ comma-obj {CarryIn m}   {CarryIn m}   id
                , comma-obj {Fin (2 * m)} {Fin (m + m)} ? ]
puzzle1 = comma-hom ? addF ?

Figuring out the first question-mark is simple enough, it’s an isomorphism on Fin:

n+zero : (n : Nat) -> n + zero == n
n+zero zero = refl
n+zero (suc n) rewrite n+zero n = refl

2*m==m+m : (m : Nat) -> (2 * m) == m + m
2*m==m+m zero = refl
2*m==m+m (suc m) rewrite 2*m==m+m m | n+zero m = refl

cast : {m n : Nat} -> (m == n) -> Fin m -> Fin n
cast p rewrite p = id

Our first hole is thus cast $ 2*m==m+m m. Interestingly, this doesn’t refine our other hole, since it’s already fully specified by the vertical id components and the horizontal component addF. However, as the paper points out, we can get the second hole for free. Because cast is invertible, we can make this square commute by taking id >> addF followed by the inverse of cast. It feels a bit like cheating, but it does indeed satisfy the commutativity diagram:

puzzle1 : {m : Nat}
       -> Comma [ comma-obj {CarryIn m}   {CarryIn m}   id
                , comma-obj {Fin (2 * m)} {Fin (m + m)} (cast $ 2*m==m+m m) ]
puzzle1 {m} =
  comma-hom
    (SET [ addF >> cast (sym (2*m==m+m m)) ])
    addF
    ?

where the proof is trivial (but I don’t know how to make it terse):

   SET [ SET [ addF >> cast (sym (2*m==m+m m)) ] >> cast $ 2*m==m+m m ]
== SET [ id >> addF ]

Probably there is a category of proofs, where I can just do reassoc >> second sym (sym-is-id $ 2*m==m+m m) >> id-r addF >> sym (id-l addF). But I don’t have that setup, and this would be annoying to do in the equational reasoning style. So it remains a hole, and you, gentle reader, can fill it in if you are keen. Also, I’d love to know how to write a proof as simple as my sketch above.

So that gives us the first half of our puzzle. Now that we have the middle arrow, let’s play the same game:

CarryIn m   -------------------------------> Fin (m + m)
   ^                                           ^
id |                                           | cast $ 2*m==m+m m
   |                                           |
   |          addF >> cast (sym (2*m==m+m m))  |
CarryIn m   -------------------------------> Fin (2 * m)
   ^                                           ^
id |                                           | ?
   |                       ?                   |
CarryIn m   -----------------------------> CarryOut m

So how do we turn a CarryOut m = Fin2 (2 , m) into a Fin (2 * m)? Algebraically I think this is a bit tricky, but thankfully Data.Fin.Base has us covered:

combine : ∀ {n k} → Fin n → Fin k → Fin (n * k)

which we can uncurry:

comb : {m n : Nat} -> Fin2 (m , n) -> Fin (m * n)
comb (f1 , f2) = combine f1 f2
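Numerically, combine and remQuot are just mixed-radix pairing: multiplication with remainder. A quick Python sketch (the argument order is mine; in Agda the first bound is implicit):

```python
def combine(x, k, y):
    # Models combine : Fin n -> Fin k -> Fin (n * k).
    # Given x < n and y < k, we have x * k + y < n * k.
    return x * k + y

def rem_quot(z, k):
    # The inverse direction, Fin (n * k) -> Fin n x Fin k:
    # recover the pair by division with remainder.
    return divmod(z, k)
```

The roundtrip `rem_quot(combine(x, k, y), k) == (x, y)` is what makes the pair an isomorphism.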

and then use this to fill in the vertical arrow. Because comb is one half of an isomorphism (the other half is formed by remQuot : ∀ {n} k → Fin (n * k) → Fin n × Fin k), we can do the same trick to get the horizontal arrow for free:

puzzle2 : {m : Nat}
      -> Comma [ comma-obj {CarryIn m}  {CarryIn m}   id
               , comma-obj {CarryOut m} {Fin (2 * m)} comb ]
puzzle2 {m} =
  comma-hom
    (SET [ SET [ addF >> cast (sym (2*m==m+m m)) ] >> remQuot m ])
    (SET [ addF >> cast (sym (2*m==m+m m)) ])
    ?

The proof is again trivial but verbose.

Let’s call the implementation arrow addCarry, because we’ll need it in section 8.

addCarry : {m : Nat} -> SET [ CarryIn m , CarryOut m ]
addCarry {m} =
  SET [ SET [ addF >> cast (sym (2*m==m+m m)) ] >> remQuot m ]

Section 7: Vertical composition

Finally, we can use vertical composition to combine our two puzzles:

puzzle : {m : Nat} -> id =>> (cast (2*m==m+m m) ∘ comb {2} {m})
puzzle = transpose $ Comma [ transpose puzzle2 >> transpose puzzle1 ]

using our transpose machinery from earlier. Vertical composition composes along the axis of specification — if the implementation of one arrow matches the specification another, we can combine them into one.

Section 8: Moving away from unary representations

Unary sucks. Let’s generalize our adding machinery to any arbitrary type. First we’ll make types corresponding to CarryIn and CarryOut:

DIn : Set -> Set
DIn t = Bool × t × t

DOut : Set -> Set
DOut t = t × Bool

I’m going to go rogue for a while, and try to do this section without referencing the paper. We want to make a morphism in the arrow category corresponding to using addCarry as the specification for addition over DIn and DOut. Let’s play the same puzzle game, and set up a commutative diagram.

At the top is our specification, addCarry : CarryIn m -> CarryOut m. That then pins our top two objects, and obviously our bottom two are DIn t and DOut t:

              CarryIn m  --------->  CarryOut m
                 ^                      ^
 bval x (μ x μ)  |                      | μ x bval
                 |         addD         |
               DIn t  ------------->  DOut t

where bval : Bool -> Fin 2. Here, μ plays the part of toNat, and addD is addition over D t-indexed numbers.

We can package this up into a record indexed by μ:

record Adder {t : Set} {m : Nat} (μ : t -> Fin m) : Set where
  constructor _-|_
  field
    addD : DIn t -> DOut t
    is-adder : SET [ addD >> bimap μ bval ]
            == SET [ bimap bval (bimap μ μ) >> addCarry ]

and trivially construct the desired commutative diagram from an Adder:

Adder=>> : {t : Set} {m : Nat} {μ : t -> Fin m}
        -> Adder μ
        -> bimap bval (bimap μ μ) =>> bimap μ bval
Adder=>> (addD -| proof) = comma-hom addD addCarry proof

So let’s implement a full-adder. This is a “well-known” result, but I didn’t know it offhand. I’m sure I could have sussed this out on my own, but instead I just found it on Wikipedia:

and : Bool -> Bool -> Bool
and true b = b
and false b = false

or : Bool -> Bool -> Bool
or true b = true
or false b = b

xor : Bool -> Bool -> Bool
xor true true = false
xor true false = true
xor false true = true
xor false false = false

full-add : DIn Bool -> DOut Bool
full-add (cin , a , b) =
  let o = xor a b
   in xor o cin , or (and a b) (and o cin)
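The same full adder transliterates to Python (a sketch of my own, using `!=` for xor and returning the (sum, carry-out) pair that DOut Bool describes):

```python
def full_add(cin, a, b):
    # One-bit full adder: sum bit is the xor of all three inputs;
    # carry-out fires when a and b are set, or when exactly one of
    # them is set together with the carry-in.
    o = a != b                                # xor a b
    return (o != cin, (a and b) or (o and cin))
```

The defining property, checked over all eight input combinations, is that the sum bit plus twice the carry-out equals the number of set inputs.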

We can construct an Adder for full-add with observation bval by case-bashing our way through the proof:

BitAdder : Adder bval
BitAdder = full-add -| extensionality
  \ { (false , false , false) -> refl
    ; (false , false , true) -> refl
    ; (false , true , false) -> refl
    ; (false , true , true) -> refl
    ; (true , false , false) -> refl
    ; (true , false , true) -> refl
    ; (true , true , false) -> refl
    ; (true , true , true) -> refl
    }

The next step is obviously to figure out how to compose Adders — ideally to construct adders for vectors. But I don’t know how to do this offhand. So it’s time to look back at the paper.

OK, right. So we have an obvious tensor over μ, which is just to lift two of them over a pair:

tensorμ : {tm tn : Set}
   -> {m n : Nat}
   -> (tm -> Fin m)
   -> (tn -> Fin n)
   -> (tm × tn -> Fin (n * m))
tensorμ μm μn (tm , tn) = comb $ μn tn , μm tm

Likewise, we have one over the adding functions themselves, pushing the carry-out of one into the carry-in of the next:

tensorAdd : {m n : Set}
   -> (DIn m -> DOut m)
   -> (DIn n -> DOut n)
   -> (DIn (m × n) -> DOut (m × n))
tensorAdd addm addn (cin , (ma , na) , (mb , nb)) =
  let (m , cin') = addm $ cin  , ma , mb
      (n , cout) = addn $ cin' , na , nb
   in (m , n) , cout
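Here is a Python rendering of the chaining (a sketch; `tensor_add` mirrors tensorAdd, with a one-bit full adder inlined to keep the example self-contained):

```python
def full_add(cin, a, b):
    # One-bit full adder, as in the previous section.
    o = a != b
    return (o != cin, (a and b) or (o and cin))

def tensor_add(add_m, add_n):
    # Chain two digit adders: the carry-out of the low digit (m)
    # becomes the carry-in of the high digit (n).
    def add(cin, x, y):
        (ma, na), (mb, nb) = x, y
        m, c1 = add_m(cin, ma, mb)
        n, cout = add_n(c1, na, nb)
        return (m, n), cout
    return add
```

Tensoring two bit adders gives a two-bit little-endian ripple-carry adder, which we can check exhaustively against integer addition.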

Allegedly these form a true adder as witnessed by Adder (tensorμ μm μn), but the proof isn’t included in the paper, and I had a hard time tracking it down in the source code. So rather than taking this by fiat, let’s see if we can convince ourselves.

As a first attempt at convincing myself, I tried to construct adder22 : Adder (tensorμ bval bval), which is a tensor of two full-adders. I constructed a case bash for the proof, and Agda complained! After some sleuthing, it turned out I had missed a swap somewhere in the paper, and thus had my carry bit in the wrong place in full-add.

After sorting that out, the case bash works on adder22. So that’s a good sanity check that this works as promised. But why? Presumably I should be able to run my commutative diagram all the way to its specification to debug what’s going on.

A few hours later…

I came up with the following, which can run a commutative diagram:

arrowOf : {A B : CommaObj} -> CommaArr A B -> CommaObj × CommaObj
arrowOf {A} {B} _ = A , B

debug : {A B C D : Set} -> {f : A -> B} -> {g : C -> D} -> f =>> g -> A -> D
debug arr x =
  let (_ , comma-obj y) = arrowOf arr
      (comma-hom f _ _) = arr
  in y (f x)

Of course, the diagrams we get from Adder=>> only get us as far as addCarry. In order to get all the way to Nats, we need to vertically compose a bunch of other diagrams too. In order, they’re puzzle, addF=>> and addN=>>. The actual diagram I came up with was this atrocious thing:

Adder=>>N
    : Cat2.CommaArr.f
        (Comma [
         Comma [ Comma [ transpose (Adder=>> adder22) >> transpose puzzle ]
         >> transpose addF=>> ]
         >> transpose addN=>> ]) =>> Cat2.CommaArr.g
                 (Comma [
                  Comma [ Comma [ transpose (Adder=>> adder22)
                               >> transpose puzzle ]
                  >> transpose addF=>> ]
                  >> transpose addN=>> ])
Adder=>>N = transpose $
    Comma
    [ Comma
    [ Comma
    [ transpose (Adder=>> adder22)
   >> transpose puzzle ]
   >> transpose addF=>> ]
   >> transpose addN=>> ]

and finally, I can evaluate the thing:

debug' : Nat
debug' = debug Adder=>>N (false , (true , false) , (true , false))

Nice. OK, so back to answering the question. Each of the Bool × Bool tuples is a little-endian vector; these get added together, plus the carry. In the process of doing all of this work, I inadvertently figured out how the tensoring works.

data One : Set where
  one : One

oneval : One -> Fin 1
oneval one = zero

trivial-add : DIn One -> DOut One
trivial-add (b , _ , _) = one , b

If we construct tensorAdd trivial-add full-add, we get an adder whose vectors are One × Bool. This is an extremely interesting representation, as it means our number system doesn’t need to have the same base for each digit. In fact, that’s where the compositionality comes from. We’re pretty comfortable assigning 2^i to each digit position, but this representation makes it clear that there’s nothing stopping us from choosing arbitrary bases. What’s really doing the work here is our old friend comb : {m n : Nat} -> Fin2 (m , n) -> Fin (m * n). Expanding the type synonym makes it a little clearer: comb : {m n : Nat} -> Fin m × Fin n -> Fin (m * n). This thing is literally multiplying two finite numbers into one!

Looking at tensorAdd under this new lens makes it clearer too. Recall:

tensorAdd addm addn (cin , (ma , na) , (mb , nb)) =
  let (m , cin') = addm $ cin  , ma , mb
      (n , cout) = addn $ cin' , na , nb
   in (m , n) , cout

Here we’re pointwise adding the digits, where m is the least significant of the two digits. Our carry-in goes into the m, and its carry-out goes into n. OK, so this thing is just doing adder-chaining.

Section 8.4 talks about extending this to vectors, but the trick is just to fold them via tensorAdd. The paper uses a right fold. I’m curious about what happens if you do a left fold, but I might circle back around to that question, since I only have a few more hours to get this post out and I want to spend some time in Mexico while I’m here. Upon deeper thought, I don’t think anything would change: we’d still get a ripple-carry adder. Worth playing around with, though.

Section 9: Speculation

Section 9 is the most exciting part of the paper in my eyes. It defines speculation, which Elliott uses to implement a carry-ahead adder. I think the paper cheats a bit in this section — and makes it clear that we might have been cheating a bit earlier too. But first some preliminaries. The paper defines speculate:

speculate : {A C : Set} -> (Bool × A -> C) -> (Bool × A -> C)
speculate f (b , a) = if b then f (true , a) else f (false , a)
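In Python, speculate looks like this (a sketch; the point is that it's extensionally the identity, even though it evaluates f on both possible values of the boolean):

```python
def speculate(f):
    # Evaluate f at both values of the boolean up front, and select
    # between the results afterwards. In SET this is a no-op; in a
    # circuit semantics it inlines both branches and muxes them.
    def g(b, a):
        on_true = f(True, a)
        on_false = f(False, a)
        return on_true if b else on_false
    return g
```

Pointwise, `speculate(f)(b, a) == f(b, a)` for every input, which is the `speculate f =o= f` proof in miniature.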

This looks like it should be a no-op, and indeed, there’s a trivial proof that speculate f =o= f. The trick then is to lift speculate over an adder:

spec : {t : Set} {m : Nat} {μ : t -> Fin m}
   -> Adder μ
   -> Adder μ
spec (adder -| proof) = speculate adder -| ?

and the claim is that if we now do the same vector fold over spec a instead of a, we will get a carry-ahead adder! Sorcery! Magic!

But also… wat? Is that actually true?

I think here is where the paper is playing fast and loose. In SET, speculate a is exactly a. But the paper shows us a circuit diagram for this speculated fold, and it does indeed show the right behavior. So what’s going on?

What’s going on is that the paper isn’t actually working in SET. Well, it is. Sorta. In fact, I think there’s some fancy-pants compiling-to-categories going on here. In the same way that xor presented above is a SET-equivalent version of the xor operation in the CIRCUIT category (but is not actually xor in CIRCUIT), if_then_else_ is actually the SET version of an equivalent operation in CIRCUIT. In CIRCUIT, if_then_else_ is actually implemented as inlining both its true and false branches, and switching between them by anding their outputs against the conditional.
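Concretely, that CIRCUIT-style conditional can be pictured as a mux built from gates: both branches are always computed, and the conditional merely selects which output gets through. A hedged sketch, not code from the paper:

```haskell
-- A two-way mux out of AND/OR/NOT: both inputs are always "computed";
-- the select line just gates which one reaches the output.
mux :: Bool -> Bool -> Bool -> Bool
mux sel onTrue onFalse = (sel && onTrue) || (not sel && onFalse)
```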

So, the carry-ahead adder is not nearly as simple as it’s presented in this paper. There’s a huge amount of work going on behind the scenes:

  1. defining the CIRCUIT category
  2. implementing if_then_else_ in CIRCUIT
  3. showing that the arrow category lifts if_then_else_

Furthermore, I’m not exactly sure how this all works. Like, when we define speculate as if b then f (true , a) else f (false , a), are we literally inlining f with the conditional fixed, and simplifying the resulting circuit? I mean, we could just fix true by tying it to HIGH, but the diagrams included in the paper don’t appear to do that. If so, who is responsible for simplifying? Does it matter? This is a big hole in the paper, and in my opinion, greatly diminishes its impact, since it’s the claim I was most excited about.

To the paper’s credit, the vector fold and speculative fold turn into nice combinators, and give us a little language for building adders, which is extremely cool.

Section 10: Reusing circuitry over time

Ripple-carry adders are slow but use few gates. Carry-ahead adders are much faster, but use asymptotically more gates. Clearly there is a tradeoff here between latency and manufacturing cost. Section 10 gives us another point in the design space where we just build one full-adder, and loop it into itself. This also sounds exciting, but I’m a bit wary after section 9.

And as presented, I don’t think I trust the paper to deliver on this front. There is some finagling, but at its core, we are given a looping construct:

loop : {A B S : Set}
   -> {n : Nat}
   -> (S × A -> B × S)
   -> S × Vec A n
   -> Vec B n × S
loop f (s , nil) = nil , s
loop f (s , cons a v) =
  let b , s' = f (s , a)
   in bimap (cons b) id (loop f (s' , v))
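In Haskell terms, loop is essentially Data.List.mapAccumL with the tuples shuffled. A sketch over lists rather than length-indexed vectors:

```haskell
-- Thread a state through a list, collecting outputs and the final state.
loop :: ((s, a) -> (b, s)) -> (s, [a]) -> ([b], s)
loop _ (s, [])     = ([], s)
loop f (s, a : as) =
  let (b, s')   = f (s, a)
      (bs, s'') = loop f (s', as)
   in (b : bs, s'')
```

Instantiating f with a full adder and s with the carry recovers a ripple-carry adder, but with a single adder reused over time rather than n copies laid out in space.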

Thinking about what this would mean in CIRCUIT makes it unclear how we would go about implementing such a thing in real hardware — especially so if the embedding sticks f into the hardware and then loops over it over time. You’re going to need some sort of ring buffer to read off the outputs and stick them in the resulting vector. You’re going to need timing signals to know when your ring buffer is consistent. There’s clearly a lot going on in this section that is left unsaid, and there aren’t even any pretty pictures to help us reverse engineer the missing bits.

So I’m going to leave it there.


Adders and Arrows was a fun paper to go through. It forced me to up my category game, and I got a much better sense of what the arrow category does, and how it can be useful. Furthermore, just going through a non-trivial project aggressively improved my Agda competencies, and I’m excited to do more along these lines.

The paper itself is a bit too terse for my liking. I would have liked a section saying “here’s what we’re going to do, and here’s how we’re going to go about it,” rather than just being thrown in and trying to deduce this for myself. In particular, it took me an embarrassing amount of time to realize how to get natural numbers out of my adder arrows, and why the first 6 sections were worth having done.

Technically, I found the ergonomics of working with arrow-category arrows very challenging. Two of the SET morphisms show up in the type, but the other two show up as values, and there is no easy way to see which diagrams can be vertically composed. My Adder=>>N arrow above shows the pain of trying to give a type to such a thing.

I had two major points of complaint about this paper. The first is that the source code isn’t very accessible. It exists in a repo, but is scattered around a bunch of modules and whenever I wanted to find something I resorted to just looking at each — being unable to make rhyme or reason of how things were organized. Worse, a huge chunk of the underlying machinery is in a separate repo, one which is significantly more advanced in its Agda usage than I am. A proliferation of weird unicode symbols that aren’t the ones that show up in the PDF make this especially challenging to navigate.

My other major complaint is that sections 9 and 10 were extremely underwhelming. If the paper does what it promises, it doesn’t actually show how. There is a lot going on behind the scenes that isn’t even alluded to in the paper. Granted, the version I’m reading is a draft, so hopefully this will be cleared up.

I don’t yet have a major takeaway from this paper, other than that arrow categories are cool for specifying problems and proving that your implementations adhere to those specifications. But as implemented, for my given adeptness at Agda, they are too hard to use. Composition is tricky to wrap one’s head around given the type signatures used in this paper, but hopefully that’s an aesthetic problem more than a fundamental issue. In particular, transpose needs to have type CommaArr A B C D -> CommaArr C D A B — this would make vertical and horizontal composition much easier to think about.

All in all, powering through this paper has given me some new tools for thought, and helped me see how category theory might be useful to mere mortals.

My implementation of this code is available on Github.

January 07, 2022 02:27 PM


The Interface for Multiple Home Units

Over the last few weeks I have been finishing and improving the implementation of support for Multiple Home Units. A lot of the preliminary work was completed by Fendor. In short, multiple home units allow you to load different packages which may depend on each other into one GHC session. This will allow both GHCi and HLS to support multi-component projects more naturally. For a more complete overview of the why, you should first consult Fendor’s excellent introduction.

This post will concentrate on the interface and implementation of the feature. In particular we will talk about the solution to two of the issues he summarises at the end of the post, all the flags you need to know about and other limitations of the current implementation.

I originally implemented support for multiple components in Haskell Language Server (HLS) at the start of 2020 but the implementation has always been hacky. Stack also has rudimentary support for loading multiple home units into one GHCi session but doesn’t allow different options or dependencies per component. Given the increasing importance of HLS, fundamental issues which affect it are now being given more priority under the normal GHC maintenance budget. Multiple Home Units is one of the first bigger projects which we hope will allow the language server to be implemented more robustly.

Interface of Multiple Home Units

Imagine that you have a project which contains two libraries, named lib-core and lib. lib-core contains some utility functions which are used by lib, so when editing lib you are often also editing lib-core. Multiple home units can make this less painful by allowing a build tool to compile lib and lib-core with one command line invocation. How would a build tool make use of this feature?

In order to specify multiple units, the -unit @⟨filename⟩ flag is given multiple times with a response file containing the arguments for each unit. The response file contains a newline separated list of arguments.

ghc -unit @unitLibCore -unit @unitLib

where the unitLibCore response file contains the normal arguments that cabal would pass to --make mode.

-this-unit-id lib-core-

The response file for lib can specify a dependency on lib-core, so that modules in lib can use modules from lib-core.

-this-unit-id lib-
-package-id lib-core-

Then when the compiler starts in --make mode it will compile both units lib and lib-core.

There is also very basic support for multiple home units in GHCi: at the moment you can start a GHCi session with multiple units but only the :reload command is supported. Most commands in GHCi assume a single home unit, and so it is additional work (#20889) to modify the interface to support multiple loaded home units.

Options used when working with Multiple Home Units

There are a few extra flags which have been introduced specifically for working with multiple home units. The flags allow a home unit to pretend it’s more like an installed package, for example, specifying the package name, module visibility and reexported modules.

-working-dir ⟨dir⟩

It is common to assume that a package is compiled in the directory where its cabal file resides, and thus that all paths used in the compiler are relative to this directory. When there are multiple home units the compiler is often not operating in this directory but rather in the directory where the cabal.project file is located. In this case the -working-dir option can be passed, which specifies the path from the current directory to the directory the unit assumes to be its root, normally the directory which contains the cabal file.

When the flag is passed, any relative paths used by the compiler are offset by the working directory. Notably this includes -i and -I⟨dir⟩ flags.

-this-package-name ⟨name⟩

This flag papers over the awkward interaction of PackageImports and multiple home units. When using PackageImports you can specify the name of the package in an import to disambiguate between modules which appear in multiple packages with the same name.

This flag allows a home unit to be given a package name so that you can also disambiguate between multiple home units which provide modules with the same name.

This solves one problem that Fendor described in his blog post.

-hidden-module ⟨module name⟩

This flag can be supplied multiple times in order to specify which modules in a home unit should not be visible outside of the unit it belongs to.

The main use of this flag is to be able to recreate the difference between an exposed and hidden module for installed packages.

Fendor talked about the issue of module visibility in his blog post, and this flag solves the issue.

-reexported-module ⟨module name⟩

This flag can be supplied multiple times in order to specify which modules are not defined in a unit but should be reexported. The effect is that other units will see this module as if it was defined in this unit.

The use of this flag is to be able to replicate the reexported modules feature of packages with multiple home units.

Offsetting Paths in Template Haskell splices

When using Template Haskell to embed files into your program, traditionally the paths have been interpreted relative to the directory where the .cabal file resides. This causes problems for multiple home units as we are compiling many different libraries at once which have .cabal files in different directories.

For this purpose we have introduced a way to query the value of the -working-dir flag from the Template Haskell API. Using this, we can implement a makeRelativeToProject function which offsets a path relative to the original project root by the value of -working-dir.

import Data.FileEmbed             ( embedFile )
import Language.Haskell.TH.Syntax ( makeRelativeToProject )

foo = $(makeRelativeToProject "./relative/path" >>= embedFile)

If you write a relative path in a Template Haskell splice you should use the makeRelativeToProject function so that your library works correctly with multiple home units.

A similar function already exists in the file-embed library. The function in template-haskell implements this function in a more robust manner by honouring the -working-dir flag rather than searching the file system.

Closure Property for Home Units

For tools or libraries using the GHC API there is one very important closure property which must be adhered to:

Any dependency which is not a home unit must not (transitively) depend on a home unit.

For example, if you have three packages p, q and r, then if p depends on q which depends on r then it is illegal to load both p and r as home units but not q, because q is a dependency of the home unit p which depends on another home unit r.

If you are using GHC by the command line then this property is checked, but if you are using the GHC API then you need to check this property yourself. If you get it wrong you will probably get some very confusing errors about overlapping instances.
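For illustration, the closure property can be checked over an explicit dependency map. This is a hypothetical sketch, not the GHC API's actual check, and it assumes an acyclic dependency graph:

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

-- The closure property holds if no non-home unit (transitively) depends
-- on a home unit. `deps` maps each unit to its direct dependencies.
closureOK :: Ord u => M.Map u [u] -> S.Set u -> Bool
closureOK deps home = all ok (M.keys deps)
  where
    -- transitive dependencies of a unit (assumes no dependency cycles)
    reach u = let ds = M.findWithDefault [] u deps
              in S.fromList ds `S.union` S.unions (map reach ds)
    ok u = u `S.member` home || S.null (reach u `S.intersection` home)
```

With dependencies p -> q -> r, loading {p, r} as home units fails the check (q transitively depends on the home unit r), while {p, q, r} passes.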

Limitations of Multiple Home Units

There are a few limitations of the initial implementation which will be smoothed out on user demand.

  • Package thinning/renaming syntax is not supported (#20888)
  • More complicated reexports/renaming are not yet supported.
  • It’s more common to run into existing linker bugs when loading a large number of packages in a session (for example #20674, #20689)
  • Backpack is not yet supported when using multiple home units (#20890)
  • Dependency chasing can be quite slow with a large number of modules and packages (#20891).
  • Loading wired-in packages as home units is currently not supported (this only really affects GHC developers attempting to load template-haskell).
  • Barely any normal GHCi features are supported (#20889). It would be good to support enough for ghcid to work correctly.

Despite these limitations, the implementation already works for nearly all packages. It has been tested on large dependency closures, including the whole of head.hackage, which is a total of 4784 modules from 452 packages.


With the first iteration of this implementation the necessary foundational aspects have been implemented to allow GHC API clients such as HLS to load multiple home units at once. The next steps are for the maintainers of build tools such as Cabal and Stack to modify their repl commands to support the new interface.

Well-Typed is able to work on GHC, HLS, Cabal and other core Haskell infrastructure thanks to funding from various sponsors. If your company is interested in contributing to this work, sponsoring maintenance efforts, or funding the implementation of other features, please get in touch.

by matthew at January 07, 2022 12:00 AM

January 06, 2022

Chris Reade

Graphs, Kites and Darts


Figure 1: Three Coloured Patches

Non-periodic tilings with Penrose’s kites and darts

We continue our investigation of the tilings using Haskell with Haskell Diagrams. What is new is the introduction of a planar graph representation. This allows us to define more operations on finite tilings, in particular forcing and composing.

Previously in Diagrams for Penrose Tiles we implemented tools to create and draw finite patches of Penrose kites and darts (such as the samples depicted in figure 1). The code for this and for the new graph representation and tools described here can be found on GitHub

To describe the tiling operations it is convenient to work with the half-tiles: LD (left dart), RD (right dart), LK (left kite), RK (right kite) using a polymorphic type HalfTile (defined in a module HalfTile)

data HalfTile rep 
 = LD rep | RD rep | LK rep | RK rep   deriving (Show,Eq)

Here rep is a type variable for a representation to be chosen. For drawing purposes, we chose two-dimensional vectors (V2 Double) and called these Pieces.

type Piece = HalfTile (V2 Double)

The vector represents the join edge of the half tile (see figure 2) and thus the scale and orientation are determined (the other tile edges are derived from this when producing a diagram).

Figure 2: The (half-tile) pieces showing join edges (dashed) and origin vertices (red dots)

Finite tilings or patches are then lists of located pieces.

type Patch = [Located Piece]

Both Piece and Patch are made transformable, so rotate and scale can be applied to both and translate can be applied to a Patch. (Translate has no effect on a Piece unless it is located.)

In Diagrams for Penrose Tiles we also discussed the rules for legal tilings and specifically the problem of incorrect tilings which are legal but get stuck so cannot continue to infinity. In order to create correct tilings we implemented the decompose operation on patches.

The vector representation that we use for drawing is not well suited to exploring properties of a patch such as neighbours of pieces. Knowing about neighbouring tiles is important for being able to reason about composition of patches (inverting a decomposition) and to find which pieces are determined (forced) on the boundary of a patch.

However, the polymorphic type HalfTile allows us to introduce our alternative graph representation alongside Pieces.

Tile Graphs

In the module Tgraph, we have the new representation which treats half tiles as triangular faces of a planar graph – a TileFace – by specialising HalfTile with a triple of vertices (clockwise starting with the tile origin).

type Vertex = Int
type TileFace = HalfTile (Vertex,Vertex,Vertex)

For example

LD (1,3,4)       RK (6,4,3)

When we need to refer to particular vertices from a TileFace we use originV (the first vertex – red dot in figure 2), oppV (the vertex at the opposite end of the join edge – dashed edge in figure 2), wingV (the remaining vertex not on the join edge).

originV, oppV, wingV :: TileFace -> Vertex
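One plausible way to define these selectors, repeating the type definitions so the sketch is self-contained. The opp/wing positions are inferred from the fool example (RD (1,2,3) and LD (1,3,4) share join edge (1,3), so the opp vertex sits in a different triple position for left and right halves); the actual module's definitions may differ:

```haskell
data HalfTile rep = LD rep | RD rep | LK rep | RK rep deriving (Show, Eq)

type Vertex   = Int
type TileFace = HalfTile (Vertex, Vertex, Vertex)

-- The origin is always the first vertex of the clockwise triple.
originV :: TileFace -> Vertex
originV (LD (a, _, _)) = a
originV (RD (a, _, _)) = a
originV (LK (a, _, _)) = a
originV (RK (a, _, _)) = a

-- The opposite end of the join edge: second vertex for left darts and
-- right kites, third for right darts and left kites, so that paired
-- halves share their join edge.
oppV :: TileFace -> Vertex
oppV (LD (_, b, _)) = b
oppV (RD (_, _, c)) = c
oppV (LK (_, _, c)) = c
oppV (RK (_, b, _)) = b

-- The wing is the remaining vertex, not on the join edge.
wingV :: TileFace -> Vertex
wingV (LD (_, _, c)) = c
wingV (RD (_, b, _)) = b
wingV (LK (_, b, _)) = b
wingV (RK (_, _, c)) = c
```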


The Tile Graphs implementation uses a type Tgraph which has a list of graph vertices and a list of tile faces.

data Tgraph = Tgraph { vertices :: [Vertex]
                     , faces    :: [TileFace]
                     }  deriving (Show)

For example, fool (short for a fool’s kite) is a Tgraph with 6 faces and 7 vertices, shown in figure 3.

fool = Tgraph { vertices = [1,2,3,4,5,6,7]
              , faces = [RD (1,2,3),LD (1,3,4),RK (6,2,5)
                        ,LK (6,3,2),RK (6,4,3),LK (6,7,4)
                        ]
              }

(The fool is also called an ace in the literature)

Figure 3: fool

With this representation we can investigate how composition works with whole patches. Figure 4 shows a twice decomposed sun on the left and a once decomposed sun on the right (both with vertex labels). In addition to decomposing the right graph to form the left graph, we can also compose the left graph to get the right graph.

Figure 4: sunD2 and sunD

After implementing composition, we also explore a force operation and an emplace operation to extend tilings.

There are some constraints we impose on Tgraphs.

  • No spurious vertices. Every vertex of a Tgraph face must be one of the Tgraph vertices and each of the Tgraph vertices occurs in at least one of the Tgraph faces.
  • Connected. The collection of faces must be a single connected component.
  • No crossing boundaries. By this we mean that vertices on the boundary are incident with exactly two boundary edges. The boundary consists of the edges between the Tgraph faces and exterior region(s). This is important for adding faces.
  • Face connected. Roughly, this means that if we collect the faces of a Tgraph by starting from any single face and then add faces which share an edge with those already collected, we get all the Tgraph faces. This is important for drawing purposes.

In fact, if a Tgraph is connected with no crossing boundaries, then it must be face connected. (We could define face connected to mean that the dual graph excluding exterior regions is connected.)

Figure 5 shows two excluded graphs which have crossing boundaries at 4 (left graph) and 13 (right graph). The left graph is still face connected but the right is not face connected (the two faces at the top right do not have an edge in common with the rest of the faces.)

Although we have allowed for Tgraphs with holes (multiple exterior regions), we note that such holes cannot be created by adding faces one at a time without creating a crossing boundary. They can be created by removing faces from a Tgraph without necessarily creating a crossing boundary.

Important We are using face as an abbreviation for half-tile face of a Tgraph here, and we do not count the exterior of a patch of faces to be a face. The exterior can also be disconnected when we have holes in a patch of faces and the holes are not counted as faces either. In graph theory, the term face would generally include these other regions, but we will call them exterior regions rather than faces.

Figure 5: A face-connected graph with crossing boundaries at 4, and a non face-connected graph

In addition to the constructor Tgraph we also use

checkTgraph:: [TileFace] -> Tgraph

which creates a Tgraph from a list of faces, but also performs checks on the required properties of Tgraphs. We can then remove or select faces from a Tgraph and then use checkTgraph to ensure the resulting Tgraph still satisfies the required properties.

selectFaces, removeFaces  :: [TileFace] -> Tgraph -> Tgraph
selectFaces fcs g = checkTgraph (faces g `intersect` fcs)
removeFaces fcs g = checkTgraph (faces g \\ fcs)

Edges and Directed Edges

We do not explicitly record edges as part of a Tgraph, but calculate them as needed. Implicitly we are requiring

  • No spurious edges. The edges of a Tgraph are the edges of the faces of the Tgraph.

To represent edges, a pair of vertices (a,b) is regarded as a directed edge from a to b. A list of such pairs will usually be regarded as a directed edge list. In the special case that the list is symmetrically closed [(b,a) is in the list whenever (a,b) is in the list] we will refer to this as an edge list rather than a directed edge list.
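The distinction can be stated as a quick predicate (a sketch; the name is hypothetical):

```haskell
-- A directed edge list is an (undirected) edge list precisely when it is
-- symmetrically closed: every (a,b) has its reverse (b,a) present.
isEdgeList :: Eq v => [(v, v)] -> Bool
isEdgeList des = all (\(a, b) -> (b, a) `elem` des) des
```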

The following functions on TileFaces all produce directed edges (going clockwise round a face).

  -- join edge - dashed in figure 2
joinE  :: TileFace -> (Vertex,Vertex)
  -- the short edge which is not a join edge
shortE :: TileFace -> (Vertex,Vertex)
  -- the long edge which is not a join edge
longE  :: TileFace -> (Vertex,Vertex)
 -- all three directed edges clockwise from origin
faceDedges :: TileFace -> [(Vertex,Vertex)]

For the whole Tgraph, we often want a list of all the directed edges of all the faces.

graphDedges :: Tgraph -> [(Vertex,Vertex)]
graphDedges g = concatMap faceDedges (faces g)

Because our graphs represent tilings they are planar (can be embedded in a plane) so we know that at most two faces can share an edge and they will have opposite directions of the edge. No two faces can have the same directed edge. So from graphDedges g we can easily calculate internal edges (edges shared by 2 faces) and boundary directed edges (directed edges round the external regions).

internalEdges, boundaryDedges :: Tgraph -> [(Vertex,Vertex)]

The internal edges of g are those edges which occur in both directions in graphDedges g. The boundary directed edges of g are the missing reverse directions in graphDedges g.
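Following that description, both can be computed from the directed edge list alone. This is a simplified quadratic sketch over a raw edge list, with primed names so as not to claim these are the library's definitions:

```haskell
type Vertex = Int
type DEdge  = (Vertex, Vertex)

-- Internal edges occur in both directions among the faces' directed edges.
internalEdges' :: [DEdge] -> [DEdge]
internalEdges' des = [e | e@(a, b) <- des, (b, a) `elem` des]

-- Boundary directed edges are those whose reverse direction is missing.
boundaryDedges' :: [DEdge] -> [DEdge]
boundaryDedges' des = [e | e@(a, b) <- des, (b, a) `notElem` des]
```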

We also refer to all the long edges of a Tgraph (including kite join edges) as phiEdges (both directions of these edges).

phiEdges :: Tgraph -> [(Vertex, Vertex)]

This is so named because, when drawn, these long edges are phi times the length of the short edges (phi being the golden ratio which is approximately 1.618).
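For reference, phi is the positive root of x² = x + 1:

```haskell
phi :: Double
phi = (1 + sqrt 5) / 2  -- ≈ 1.618; satisfies phi * phi == phi + 1
```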

Drawing Tgraphs (Patches and VPatches)

The module GraphConvert contains functions to convert a Tgraph to our previous vector representation (Patch) defined in TileLib so we can use the existing tools to produce diagrams.

makePatch :: Tgraph -> Patch

drawPatch :: Patch -> Diagram B -- defined in module TileLib

drawGraph :: Tgraph -> Diagram B
drawGraph = drawPatch . makePatch

However, it is also useful to have an intermediate stage (a VPatch = Vertex Patch) which contains both face (vertices) and vectors. This allows vertex labels to be drawn and for faces to be identified and retained/excluded after the vector information is calculated.

data VPatch  = VPatch {lVertices :: [Located Vertex]
                      ,lHybrids :: [Located Hybrid]
                      }

A VPatch has a list of located vertices and a list of located hybrids, where a Hybrid is a HalfTile with a dual representation of the face (vertices) and vector (join edge). We make VPatch transformable so it can also be an argument type for rotate, translate, and scale.

The conversion functions include

makeVPatch   :: Tgraph -> VPatch
dropVertices :: VPatch -> Patch -- discards vertex information
drawVPatch   :: VPatch -> Diagram B  -- draws labels as well

drawVGraph   :: Tgraph -> Diagram B
drawVGraph = drawVPatch . makeVPatch

One consequence of using abstract graphs is that there is no unique predefined way to orient or scale or position the patch arising from a graph representation. Our implementation selects a particular join edge and aligns it along the x-axis (unit length for a dart, phi length for a kite), and face-connectedness ensures the rest of the patch can be calculated from this.

We also have functions to re-orient a VPatch and lists of VPatches using chosen pairs of vertices. [Simply doing rotations on the final diagrams can cause problems if these include vertex labels. We do not, in general, want to rotate the labels – so we need to orient the VPatch before converting to a diagram.]

Decomposing Graphs

We previously implemented decomposition for patches which splits each half-tile into two or three smaller scale half-tiles.

decompose :: Patch -> Patch

We now have a Tgraph version of decomposition in the module Tgraphs:

decomposeG :: Tgraph -> Tgraph

Graph decomposition is particularly simple. We start by introducing one new vertex for each long edge (the phiEdges) of the Tgraph. We then build the new faces from each old face using the new vertices.

As a running example we take fool (mentioned above) and its decomposition foolD

*Main> foolD = decomposeG fool

*Main> foolD
Tgraph { vertices = [1,8,3,2,9,4,5,13,10,6,11,14,7,12]
       , faces = [LK (1,8,3),RD (2,3,8),RK (1,3,9)
                 ,LD (4,9,3),RK (5,13,2),LK (5,10,13)
                 ,RD (6,13,10),LK (3,2,13),RK (3,13,11)
                 ,LD (6,11,13),RK (3,14,4),LK (3,11,14)
                 ,RD (6,14,11),LK (7,4,14),RK (7,14,12)
                 ,LD (6,12,14)
                 ]
       }

which are best seen together (fool followed by foolD) in figure 6.

Figure 6: fool and foolD (= decomposeG fool)

Composing graphs, and Unknowns

Composing is meant to be an inverse to decomposing, and one of the main reasons for introducing our graph representation. In the literature, decomposition and composition are defined for infinite tilings and in that context they are unique inverses to each other. For finite patches, however, we will see that composition is not always uniquely determined.

In figure 7 (Two Levels) we have emphasised the larger scale faces on top of the smaller scale faces.

Figure 7: Two Levels

How do we identify the composed tiles? We start by classifying vertices which are at the wing tips of the (smaller) darts as these determine how things compose. In the interior of a graph/patch (e.g in figure 7), a dart wing tip always coincides with a second dart wing tip, and either

  1. the 2 dart halves share a long edge. The shared wing tip is then classified as a largeKiteCentre and is at the centre of a larger kite. (See left vertex type in figure 8), or
  2. the 2 dart halves touch at their wing tips without sharing an edge. This shared wing tip is classified as a largeDartBase and is the base of a larger dart. (See right vertex type in figure 8)
Figure 8: largeKiteCentre (left) and largeDartBase (right)

[We also call these (respectively) a deuce vertex type and a jack vertex type later in figure 10]

Around the boundary of a graph, the dart wing tips may not share with a second dart. Sometimes the wing tip has to be classified as unknown, but often it can be decided by looking at neighbouring tiles. For example, with a four times decomposed sun (sunD4), it is possible to classify all the dart wing tips as largeKiteCentres or largeDartBases, so there are no unknowns.

If there are no unknowns, then we have a function to produce the unique composed graph.

composeG:: Tgraph -> Tgraph

Any correct decomposed graph without unknowns will necessarily compose back to its original. This makes composeG a left inverse to decomposeG provided there are no unknowns.

For example, with an (n times) decomposed sun we will have no unknowns, so these will all compose back up to a sun after n applications of composeG. For n=4 (sunD4 – the smaller scale shown in figure 7) the dart wing classification returns 70 largeKiteCentres, 45 largeDartBases, and no unknowns.

Similarly with the simpler foolD example, if we classify the dart wings we get

largeKiteCentres = [14,13]
largeDartBases = [3]
unknowns = []

In foolD (the right hand graph in figure 6), nodes 14 and 13 are new kite centres and node 3 is a new dart base. There are no unknowns so we can use composeG safely

*Main> composeG foolD
Tgraph { vertices = [1,2,3,4,5,6,7]
       , faces = [RD (1,2,3),LD (1,3,4),RK (6,2,5)
                 ,RK (6,4,3),LK (6,3,2),LK (6,7,4)
                 ]
       }

which reproduces the original fool (left hand graph in figure 6).

However, if we now check out unknowns for fool we get

largeKiteCentres = []
largeDartBases = []
unknowns = [4,2]    

So both nodes 2 and 4 are unknowns. It had looked as though fool would simply compose into two half kites back-to-back (sharing their long edge not their join), but the unknowns show there are other possible choices. Each unknown could become a largeKiteCentre or a largeDartBase.

The question is then what to do with unknowns.

Partial Compositions

In fact our composeG resolves two problems when dealing with finite patches. One is the unknowns and the other is critical missing faces needed to make up a new face (e.g the absence of any half dart).

It is implemented using an intermediary function for partial composition

partCompose:: Tgraph -> ([TileFace],Tgraph) 

partCompose will compose everything that is uniquely determined, but will leave out faces round the boundary which cannot be determined or cannot be included in a new face. It returns the faces of the argument graph that were not used, along with the composed graph.

Figure 9 shows the result of partCompose applied to two graphs. [These are force kiteD3 and force dartD3 on the left. Force is described later]. In each case, the excluded faces of the starting graph are shown in pale green, overlaid by the composed graph on the right.

Figure 9: partCompose for two graphs (force kiteD3 top row and force dartD3 bottom row)

Then composeG is simply defined to keep the composed faces and ignore the unused faces produced by partCompose.

composeG:: Tgraph -> Tgraph
composeG = snd . partCompose 

This approach avoids making a decision about unknowns when composing, but it may lose some information by throwing away the uncomposed faces.

For correct Tgraphs g, if decomposeG g has no unknowns, then composeG is a left inverse to decomposeG. However, if we take g to be two kite halves sharing their long edge (not their join edge), then these decompose to fool, which produces an empty graph when recomposed. Thus we do not have g = composeG (decomposeG g) in general. On the other hand, we do have g = composeG (decomposeG g) for correct whole-tile Tgraphs g (whole-tile means all half-tiles of g have their matching half-tile on their join edge in g).

Later (figure 21) we show another exception to g = composeG(decomposeG g) with an incorrect tiling.

We make use of

selectFacesVP    :: [TileFace] -> VPatch -> VPatch
removeFacesVP    :: [TileFace] -> VPatch -> VPatch
selectFacesGtoVP :: [TileFace] -> Tgraph -> VPatch
removeFacesGtoVP :: [TileFace] -> Tgraph -> VPatch

for creating VPatches from selected tile faces of a Tgraph or VPatch. This allows us to represent and draw a subgraph which need not be connected nor satisfy the no crossing boundaries property, provided the Tgraph it was derived from had these properties.

Forcing


When building up a tiling, following the rules, there is often no choice about what tile can be added alongside certain tile edges at the boundary. Such additions are forced by the existing patch of tiles and the rules. For example, if a half tile has its join edge on the boundary, the unique mirror half tile is the only possibility for adding a face to that edge. Similarly, the short edge of a left (respectively, right) dart can only be matched with the short edge of a right (respectively, left) kite. We also make use of the fact that only 7 types of vertex can appear in (the interior of) a patch, so on a boundary vertex we sometimes have enough of the faces to determine the vertex type. These are given the following names in the literature (shown in figure 10): sun, star, jack (=largeDartBase), queen, king, ace, deuce (=largeKiteCentre).

Figure 10: Vertex types
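The seven interior vertex types can be captured as a plain enumeration. This is a self-contained sketch using the names given above; the article's library may represent vertex types differently.

```haskell
-- Sketch only: the seven interior vertex types named above.
data VertexType = Sun | Star | Jack | Queen | King | Ace | Deuce
  deriving (Show, Eq, Enum, Bounded)

-- All seven, e.g. for exhaustive case analysis at a boundary vertex.
allVertexTypes :: [VertexType]
allVertexTypes = [minBound .. maxBound]
```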

The function

force :: Tgraph -> Tgraph

will add some faces on the boundary that are forced (i.e. new faces where there is exactly one possible choice). For example:

  • When a join edge is on the boundary – add the missing half tile to make a whole tile.
  • When a half dart has its short edge on the boundary – add the half kite that must be on the short edge.
  • When a vertex is both a dart origin and a kite wing (it must be a queen or king vertex) – if there is a boundary short edge of a kite half at the vertex, add another kite half sharing the short edge (this converts 1 kite to 2, and 3 kites to 4 in combination with the first rule).
  • When two half kites share a short edge their common oppV vertex must be a deuce vertex – add any missing half darts needed to complete the vertex.

Figure 11 shows foolDminus (which is foolD with 3 faces removed) on the left, and the result of forcing, i.e. force foolDminus, on the right, which is the same graph we get from force foolD.

foolDminus = 
    removeFaces [RD(6,14,11), LD(6,12,14), RK(5,13,2)] foolD
Figure 11: foolDminus and force foolDminus = force foolD

Figures 12, 13 and 14 illustrate the result of forcing a 5-times decomposed kite, a 5-times decomposed dart, and a 5-times decomposed sun (respectively). The first two figures reproduce diagrams from an article by Roger Penrose illustrating the extent of influence of tiles round a decomposed kite and dart. [Penrose R, Tilings and quasi-crystals; a non-local growth problem? in Aperiodicity and Order 2, edited by Jarich M, Academic Press, 1989. (fig 14)].

Figure 12: force kiteD5 with kiteD5 shown in red
Figure 13: force dartD5 with dartD5 shown in red
Figure 14: force sunD5 with sunD5 shown in red

In figure 15, the bottom row shows successive decompositions of a dart (dashed blue arrows from right to left), so applying composeG to each dart will go back (green arrows from left to right). The black vertical arrows are force. The solid blue arrows from right to left are (force . decomposeG) being applied to the successive forced graphs. The green arrows in the reverse direction are composeG again and the intermediate (partCompose) figures are shown in the top row with the ignored faces in pale green.

Figure 15: Arrows: black = force, green = composeG, solid blue = (force . decomposeG)

Figure 16 shows the forced graphs of the seven vertex types (with the starting graphs in red) along with a kite (top right).

Figure 16: Relating the forced seven vertex types and the kite

These are related to each other as shown in the columns. Each graph composes to the one above (an empty graph for the ones in the top row) and the graph below is its forced decomposition. [The rows have been scaled differently to make the vertex types easier to see.]

Adding Faces to a Tgraph

This is technically tricky because we need to discover what vertices (and implicitly edges) need to be newly created and which ones already exist in the Tgraph. This goes beyond a simple graph operation and requires use of the geometry of the faces. We have chosen not to do a full conversion to vectors to work out all the geometry, but instead we introduce a local representation of relative directions of edges at a vertex allowing a simple equality test.

Edge directions

All directions are integer multiples of 1/10th turn (mod 10) so we use these integers for comparing directions. The face adding process always adds to the right of a given directed edge (a,b) which must be a boundary directed edge. [Adding to the left of an edge (a,b) would mean that (b,a) will be the boundary direction and so we are really adding to the right of (b,a)]. Face adding looks to see if either of the two other edges already exist in the graph by considering the end points a and b to which the new face is to be added, and inspecting edges of existing faces at a going anti-clockwise from (a,b) and at b going clockwise from (b,a).
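The direction bookkeeping can be sketched self-contained. The names below (`Direction`, `turn`, `oppositeDir`) are illustrative, not the library's:

```haskell
-- Directions are tenths of a full turn, compared modulo 10.
type Direction = Int  -- values 0..9

-- Turn anti-clockwise by n tenths of a turn.
turn :: Int -> Direction -> Direction
turn n d = (d + n) `mod` 10

-- The reversed directed edge (b,a) points half a turn from (a,b).
oppositeDir :: Direction -> Direction
oppositeDir = turn 5
```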

This allows an edge in a particular sought direction to be discovered. If it is not found it is assumed not to exist. However, the search will be undermined, and will report a crossing boundaries error if a gap (= second boundary edge) is encountered before all the faces at the vertex are accounted for. In this case there must be more than two boundary directed edges at the vertex and it is unsafe to assume the edge being sought is not already present in the Tgraph.

Establishing the no crossing boundaries property ensures these failures cannot occur. We can easily check this property for newly created graphs (with checkTgraph) and the face adding operations cannot create crossing boundaries.

Touching Vertices and Crossing Boundaries

When a new face to be added on (a,b) has neither of the other two edges already in the graph, the third vertex needs to be created. However it could already exist in the Tgraph – it is not on an edge coming from a or b but from another non-local part of the Tgraph. We call this a touching vertex. If we simply added a new vertex without checking for a clash this would create a nonsense graph. However, if we do check and find an existing vertex, we still cannot add the face using this because it would create a crossing boundary.

Our version of forcing prevents face additions that would create a touching vertex/crossing boundary by calculating the positions of boundary vertices.

No conflicting edges

There is a final (simple) check when adding a new face, to prevent a long edge (phiEdge) sharing with a short edge. This can arise if we force an incorrect graph (as we will see later).

Implementing Forcing

Our order of forcing prioritises updates (face additions) which do not introduce a new vertex. Such safe updates are easy to recognise and they do not require a touching vertex check. Surprisingly, this pretty much removes the problem of touching vertices altogether.

As an illustration, consider foolDminus again on the left of figure 11. Adding the left dart onto edge (12,14) is not a safe addition (and would create a crossing boundary at 6). However, adding the right dart RD(6,14,11) is safe, and it creates the new edge (6,14) which then makes the left dart addition safe. In fact it takes some contrivance to come up with a Tgraph containing an update that could fail the check during forcing when safe cases are always done first. Figure 17 shows such a contrived Tgraph, formed by removing the faces shown in green from a twice decomposed sun on the left. The forced result is shown on the right. When there are no safe cases, we need to try an unsafe one. The four green faces at the bottom are blocked by the touching vertex check. This leaves any one of 9 half-kites at the centre which would pass the check. But after just one of these is added, the check is not needed again. There is always a safe addition to be done at each step until all the green faces are added.

Figure 17: A contrived example requiring a touching vertex check

Boundary information

The implementation of forcing has been made more efficient by calculating some boundary information in advance. This boundary information uses a type Boundary

data Boundary 
  = Boundary
    { bDedges     :: [(Vertex,Vertex)]
    , vFaceAssoc  :: AssocList Vertex [TileFace]
    , vPointAssoc :: AssocList Vertex (Point V2 Double)
    , allFaces    :: [TileFace]
    , allVertices :: [Vertex]
    , nextVertex  :: Vertex
    } deriving (Show)

This records the boundary directed edges (bDedges) plus an association list of the faces incident with each boundary vertex (vFaceAssoc) plus an association list with the position of each boundary vertex (vPointAssoc). It also keeps track of all the faces and vertices. The boundary information is easily incremented for each face addition without being recalculated from scratch, and a final graph with all the new faces is easily recovered from the boundary information when there are no more updates.

force:: Tgraph -> Tgraph
force = recoverGraph . forceAll updatesBD . makeBoundary

makeBoundary  :: Tgraph -> Boundary
recoverGraph  :: Boundary -> Tgraph
forceAll      :: (Boundary -> [Update]) -> Boundary -> Boundary

updatesBD:: Boundary -> [Update]

The recursive (forceAll updatesBD) first uses updatesBD to calculate a list of possible updates then selects a safe update, only doing an unsafe update if there are no safe ones. After doing an update it recurses, so it recalculates a new list of updates at each step (after each update). The saving that comes from using boundaries lies in efficient incremental changes to boundary information and, of course, in avoiding the need to consider internal faces.
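The safe-first loop just described can be sketched generically. This is an illustration of the strategy only, with hypothetical names (`forceLoop`, `pick`), not the library's code:

```haskell
-- Recompute the update list after every step, prefer a safe update,
-- fall back to the first unsafe one, and stop when no updates remain.
forceLoop :: (s -> [u])       -- calculate possible updates (cf. updatesBD)
          -> (u -> Bool)      -- is this update safe?
          -> (u -> s -> s)    -- apply one update
          -> s -> s
forceLoop updates isSafe apply = go
  where
    go s = case updates s of
      [] -> s
      us -> go (apply (pick us) s)
    pick us = case filter isSafe us of
      (safe : _) -> safe     -- always do a safe update first
      []         -> head us  -- only then risk an unsafe one
```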

In addition to force we can easily define

wholeTiles:: Tgraph -> Tgraph
wholeTiles = recoverGraph . forceAll wholeTileUpdates . makeBoundary 

which just uses the first forcing rule to make sure every half-tile has a matching other half.

We also have a version of force which stops after a given number of face additions.

stepForce :: Int -> Tgraph -> Boundary

This proved essential in uncovering problems of accumulated inaccuracy in calculating boundary positions (now fixed).

Some Other Experiments

Below we describe results of some experiments using the tools introduced above. Specifically: emplacements, sub-Tgraphs, incorrect tilings, and composition choices.

Emplacements


The finite number of rules used in forcing are based on local boundary vertex and edge information only. We may be able to improve on this by considering a composition and forcing at the next level up before decomposing and forcing again, thus taking slightly broader local information into account. In fact we can iterate this process up through all the higher levels of composition. Some graphs produce an empty graph when composed, so we can regard those as maximal compositions. For example composeG fool produces an empty graph.

The idea now is to take an arbitrary graph and apply (composeG . force) repeatedly to find its maximally composed graph, then to force the maximal graph before applying (force . decomposeG) repeatedly back down to the starting level (so the same number of decompositions as compositions).

We call the function emplace, and call the result the emplacement of the starting graph as it shows a region of influence around the starting graph.

With earlier versions of forcing when we had fewer rules, emplace g often extended force g for a Tgraph g. This allowed the identification of some new rules. Since adding the new rules we have not yet found graphs with different results from force and emplace. [Although, the vertex labelling of the result will usually be different].

Sub-Tgraphs


In figure 18 on the left we have a four times decomposed dart dartD4 followed by two sub-Tgraphs brokenDart and badlyBrokenDart which are constructed by removing faces from dartD4 (but retaining the connectedness condition and the no crossing boundaries condition). These all produce the same forced result (depicted middle row left in figure 15).

Figure 18: dartD4, brokenDart, badlyBrokenDart

However, if we do compositions without forcing first we find badlyBrokenDart fails because it produces a graph with crossing boundaries after 3 compositions. So composeG on its own is not always safe, where safe means guaranteed to produce a valid Tgraph from a valid correct Tgraph.

In other experiments we tried force on Tgraphs with holes and on incomplete boundaries around a potential hole. For example, we have taken the boundary faces of a forced, 5 times decomposed dart, then removed a few more faces to make a gap (which is still a valid Tgraph). This is shown at the top in figure 19. The result of forcing reconstructs the complete original forced graph. The bottom figure shows an intermediate stage after 2200 face additions. The gap cannot be closed off to make a hole as this would create a crossing boundary, but the channel does get filled and eventually closes the gap without creating a hole.

Figure 19: Forcing boundary faces with a gap (after 2200 steps)

Incorrect Tilings

When we say a Tgraph g is a correct graph (respectively: incorrect graph), we mean g represents a correct tiling (respectively: incorrect tiling). A simple example of an incorrect graph is a kite with a dart on each side (called a mistake by Penrose) shown on the left of figure 20.

*Main> mistake
Tgraph { vertices = [1,2,4,3,5,6,7,8]
       , faces = [RK (1,2,4),LK (1,3,2),RD (3,1,5)
                 ,LD (4,6,1),LD (3,5,7),RD (4,8,6)]}

If we try to force (or emplace) this graph it produces an error in construction which is detected by the test for conflicting edge types (a phiEdge sharing with a non-phiEdge).

*Main> force mistake
Tgraph {vertices = *** Exception: doUpdate:(incorrect tiling)
Conflicting new face RK (11,1,6)
with neighbouring faces
[RK (9,1,11),LK (9,5,1),RK (1,2,4),LK (1,3,2),RD (3,1,5),LD (4,6,1),RD (4,8,6)]
in boundary
Boundary ...

In figure 20 on the right, we see that after successfully constructing the two whole kites on the top dart short edges, there is an attempt to add an RK on edge (1,6). The process finds an existing edge (1,11) in the correct direction for one of the new edges so tries to add the erroneous RK (11,1,6) which fails a noConflicts test.

Figure 20: An incorrect graph (mistake), and the point at which force mistake fails

So it is certainly true that incorrect graphs may fail on forcing, but forcing cannot create an incorrect graph from a correct graph.

If we apply decomposeG to mistake it produces another incorrect graph (which is similarly detected if we apply force), but will nevertheless still compose back to mistake if we do not try to force.

Interestingly, though, the incorrectness of a graph is not always preserved by decomposeG. If we start with mistake1 which is mistake with just two of the half darts (and also an incorrect tiling) we still get a similar failure on forcing, but decomposeG mistake1 is no longer incorrect. If we apply composeG to the result or force then composeG the mistake is thrown away to leave just a kite (see figure 21). This is an example where composeG is not a left inverse to either decomposeG or (force . decomposeG).

Figure 21: mistake1 with its decomposition, forced decomposition, and recomposed.

Composing with Choices

We know that unknowns indicate possible choices (although some choices may lead to incorrect graphs). As an experiment we introduce

makeChoices :: Tgraph -> [Tgraph]

which produces 2^n alternatives for the 2 choices of each of n unknowns (prior to composing). This uses forceLDB which forces an unknown to be a largeDartBase by adding an appropriate joined half dart at the node, and forceLKC which forces an unknown to be a largeKiteCentre by adding a half dart and a whole kite at the node (making up the 3 pieces for a larger half kite).
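The 2^n enumeration of choices can be sketched with `traverse` over the list of unknowns and the list applicative. The names here (`Choice`, `allChoices`) are illustrative, not the library's:

```haskell
-- The two possible resolutions of an unknown vertex.
data Choice = LargeDartBase | LargeKiteCentre
  deriving (Show, Eq)

-- For n unknown vertices, all 2^n assignments of a choice to each;
-- the list applicative takes the cartesian product of the per-vertex options.
allChoices :: [v] -> [[(v, Choice)]]
allChoices = traverse (\v -> [(v, LargeDartBase), (v, LargeKiteCentre)])
```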

Figure 22 illustrates the four choices for composing fool this way. The top row has the four choices of makeChoices fool (with the fool shown embedded in red in each case). The bottom row shows the result of applying composeG to each choice.

Figure 22: makeChoices fool (top row) and composeG of each choice (bottom row)

In this case, all four compositions are correct tilings. The problem is that, in general, some of the choices may lead to incorrect tilings. More specifically, a choice of one unknown can determine what other unknowns have to become with constraints such as

  • a and b have to be opposite choices
  • a and b have to be the same choice
  • a and b cannot both be largeKiteCentres
  • a and b cannot both be largeDartBases

This analysis of constraints on unknowns is not trivial. The potentially exponential number of results from the choices suggests we should compose and force as much as possible and only consider the unknowns of a maximal graph.

For calculating the emplacement of a graph, we first find the forced maximal graph before decomposing. We could also consider using makeChoices at this top step when there are unknowns, i.e. a version of emplace which produces these alternative results (emplaceChoices).

The result of emplaceChoices is illustrated for foolD in figure 23. The first force and composition is unique, producing the fool level, at which point we get 4 alternatives, each of which composes further as previously illustrated in figure 22. Each of these is forced, then decomposed and forced, decomposed and forced again, back down to the starting level. In figure 23, foolD is overlaid on the 4 alternative results. What they have in common is (as you might expect) emplace foolD, which equals force foolD and is the graph shown on the right of figure 11.

Figure 23: emplaceChoices foolD

Future Work

I am collaborating with Stephen Huggett who suggested the use of graphs for exploring properties of the tilings. We now have some tools to experiment with but we would also like to complete some formalisation and proofs. For example, we do not know if force g always produces the same result as emplace g.

It would also be good to establish that g is incorrect iff force g fails.

We have other conjectures relating to subgraph ordering of Tgraphs and Galois connections to explore.

by readerunner at January 06, 2022 03:22 PM

January 05, 2022

Philip Wadler

Beyond the Scope


Researchers tend to focus on research, ignoring relevant politics. But of course we should pay attention to the context of our work. Here is a comic on that subject, by Max Easton and Lizzie Nagy. Spotted in The Nib.

by Philip Wadler at January 05, 2022 04:32 PM

Tweag I/O

Scoped Effect Resources for Polysemy

Effect systems like Polysemy provide the programmer with a flexible way to keep the business logic of a program as abstract as possible by separating the definition of effects from their interpretation. This is useful for many reasons, but especially for testing and mocking. For example, instead of using an interpreter that runs a task over the network, one can swap in an in-memory implementation when running tests. This makes it possible to test features in isolation.

To achieve this goal, effects should clearly convey functionality without exposing their implementation. In many cases, this means exposing interpreters involving low-level constructs, such as IO, StateT or exceptions, only at the outermost levels of the application.

However, for some kinds of effects, it can be hard to design an expressive interface due to the semantics of their primitive resources. One instance of those are resources whose lifetime is scoped to a small part of a program (called a region in this post), like a database connection.

In this post I will show why these effects are tricky, and outline the thought process that led to a solution that allows for transparent locally scoped resources.

The Use Case: A Synchronization Effect

Take for example an abstraction of an MVar, named Sync, used to signal a synchronization point between two threads in this program:

import Polysemy
import Polysemy.Async

program :: Sem [Sync, Async, Output Text] ()
program = do
  async do -- uses Async
    output "background thread" -- uses Output Text
    signal -- uses Sync
  wait -- uses Sync
  output "main thread" -- uses Output Text

main :: IO ()
main = do
  (log, _) <- (runFinal . asyncToIOFinal . runOutputList . interpretSync) do
    program
    program
  traverse_ putStrLn log

The semantics of Sync are that wait should block until signal gets executed; and when running program twice in a row, the semantics shouldn’t change.
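Stripped of the effect machinery, the intended wait/signal semantics correspond directly to base's MVar operations. This is a self-contained sketch (not the interpreter itself):

```haskell
import Control.Concurrent

-- wait = takeMVar, signal = putMVar: takeMVar blocks on an empty MVar
-- until some thread puts a value into it.
demo :: IO ()
demo = do
  mv <- newEmptyMVar
  _ <- forkIO $ do
    putStrLn "background thread"
    putMVar mv ()        -- signal
  takeMVar mv            -- wait: blocks until the background thread signals
  putStrLn "main thread"
```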

A simple implementation might look like the following:

data Sync :: Effect where
  Wait :: Sync m ()
  Signal :: Sync m ()

interpretSync ::
  Member (Embed IO) r =>
  InterpreterFor Sync r
interpretSync sem = do
  mv <- embed newEmptyMVar
  run mv sem
  where
    run mv =
      interpret \case
        Wait -> embed (takeMVar mv)
        Signal -> embed (putMVar mv ())

This interpreter chooses a concrete implementation with the primitives MVar and IO, which embody the “low-level constructs” that, as mentioned in the introduction, should be run as far removed from the logic as possible.

Despite the MVar being shared among the two executions of program, this construct works as intended, since the calls to wait are sequential.

However, the problems of this naive implementation start to show when running two instances of program concurrently, causing a race condition – the second call to wait might take the MVar while the first call to signal is executed. In other words, the interpreter cannot distinguish between the consumers of the effect:

main :: IO ()
main = do
  (runFinal . asyncToIOFinal . interpretSync) do
    async program
    program

Leaky Abstraction: Using Interpreters in Business Logic

A straightforward solution for the race condition above would be to run interpretSync directly at the call site.

program :: Sem [Async, Output Text, Embed IO] ()
program = do
  interpretSync do -- Transforms the `Sync` requirement into `Embed IO`
    async do
      output "background thread"
      signal
    wait
    output "main thread"

This solution is nice because it restricts the use of the corresponding MVar to the region in which it is used. A restriction of a resource to a region, or scoping of a resource, is commonly performed using the bracket combinator; the resource in question for this example is the MVar.
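The bracket-style scoping mentioned above can be sketched directly in IO. `withRegionMVar` is an illustrative name (not part of any library); an MVar needs no cleanup, so the release action is a no-op:

```haskell
import Control.Concurrent (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Exception (bracket)

-- bracket acquire release use: the MVar resource exists only for the
-- duration of the region passed in, mirroring the scoping of interpretSync.
withRegionMVar :: (MVar () -> IO a) -> IO a
withRegionMVar = bracket newEmptyMVar (\_ -> pure ())
```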

Unfortunately, like bracket, the interpreter acquires a concrete resource in the supposedly abstract business logic that propagates its constraints to any program that calls this function, as is evident from the Embed IO member constraint.

This issue is even more clearly undesirable for effects that do actual I/O work, like database transactions:

data Database :: Effect where
  Query :: AbstractQuery a -> Database query m a
  Transact :: m a -> Database query m a

interpretDatabasePostgres ::
  Member PostgresConnection r =>
  InterpreterFor (Database PostgresQuery) r
interpretDatabasePostgres = ...

postgresProgram ::
  Member PostgresConnection r =>
  Sem r ()
postgresProgram =
  interpretDatabasePostgres do
    transact do
      query (AbstractQuery.fetchById 1)

This effect’s implementation (only sketched here) is more complex than Sync’s, but it illustrates how committing to a concrete resource (here, a database connection) can ruin the flexibility that effect systems provide — using interpretDatabasePostgres in postgresProgram causes the implementation to be fixed to PostgreSQL, prohibiting the testing of postgresProgram with an in-memory version of Database.

The Old Interpreter Switcheroo: Hiding the Implementation with Higher-Order Effects

In order to fix that implementation leak, the scoping part of interpretSync/interpretDatabasePostgres has to be separated from the rest of the interpretation, so that the interpreter for Wait and Signal is provided with a dynamically allocated resource.

Transact’s signature hints at a feature that can be exploited to achieve this: Higher-order effects. This term denotes an effect constructor that uses the monad m in its parameters, allowing it to store an entire region for evaluation in an interpreter.

A higher-order Sync.use :: Member Sync r => Sem r a -> Sem r a should have the following semantics, using program from before:

main :: IO ()
main = do
  (runFinal . asyncToIOFinal . interpretSync) do
    async (Sync.use program) -- both calls to `use` should have their own `MVar`
    Sync.use program

This snippet introduces a new effect constructor, Sync.use, which stores one instance of program. Higher-order regions are notoriously difficult to deal with in interpreters, so the following sketches a simplified version:

data Sync :: Effect where
  Wait :: Sync m ()
  Signal :: Sync m ()
  Use :: m a -> Sync m a

interpretSyncWithMVar ::
  Members [Error Text, Embed IO] r =>
  MVar () ->
  InterpreterFor Sync r
interpretSyncWithMVar mv =
  interpretH \case
    Wait -> embed (takeMVar mv)
    Signal -> embed (putMVar mv)
    Use region -> do
      mv <- embed newEmptyMVar
      interpretSyncWithMVar mv =<< runT region

interpretSync ::
  Members [Error Text, Embed IO] r =>
  InterpreterFor Sync r
interpretSync =
  interpretH \case
    Wait -> throw "Called Wait without Use"
    Signal -> throw "Called Signal without Use"
    Use region -> do
      mv <- embed newEmptyMVar
      interpretSyncWithMVar mv =<< runT region

These two interpreters split the work – interpretSync is allocating the MVar resource, while interpretSyncWithMVar implements the effect logic, the former refusing to handle any action it’s not equipped to deal with by throwing runtime errors.

Our interpreter, interpretSync, makes use of one of Polysemy’s features for higher-order interpretation: the function runT does not interpret the Sync effect in region. This lets us switch from interpretSync to interpretSyncWithMVar when interpreting Use.

The caveat of this solution is that runtime errors don’t prevent incorrect programs from being compiled; in other words, the interpreter is unsound. In the following example, it allows an accidental call to wait outside of the use region:

prog1 :: Members [Sync, Async] r => Sem r ()
prog1 = do
  use do
    async do
      signal
  wait -- compiles, but `wait` is accidentally outside the `use` region
In the rest of this article, however, I will build upon this idea, and describe the general scoped-resource abstraction that was built for Polysemy, where only sound programs can be written.

Generalizing the Problem

Let’s forget the specifics of Sync and focus on the subject matter: the allocation of resources scoped to a region of the program. The interpreter for a resource scoping effect, aptly named Scoped, should:

  • Allocate a resource (the MVar) whose lifetime is restricted to the region in which the effect is used
  • Allow multiple resource allocations within one interpreter
  • Hide as much of the implementation from the use site as possible
  • Be sound, i.e. not require exceptions for incorrect use
  • Allow the business logic to explicitly specify where the resource is used, without knowledge of its implementation details

In the previous section the job of the outer interpreter, interpretSync, was precisely to acquire the resource and pass it to the inner interpreter, interpretSyncWithMVar, which executes the effect-specific logic. Consequently, the generalized version of it takes a resource acquisition action and a parameterized interpreter:

interpretScoped ::
  Sem r resource ->
  (resource -> InterpreterFor effect r) ->
  InterpreterFor (Scoped resource effect) r
interpretScoped acquireResource scopedInterpreter = ...

This already looks better – the second parameter’s type has the exact shape of interpretSyncWithMVar. The implementation now has to acquire the resource with the first argument and use the second one to interpret the higher-order region.

Note that interpretScoped is an interpreter for Scoped resource effect. You should understand a program with effect Scoped resource effect as a program which can use effect under the condition that it has acquired a resource. The inner region (stored in a Use in our example), on the other hand, does actually use effect directly. So the inner interpreter is an interpreter for effect itself.

We will also need something to play the role of Use: a function to allocate a resource for a region. The region uses effect, but it is used in a program that uses Scoped resource effect. In Polysemy, a function that changes the effects available to a region is written as an interpreter, which we will be calling scoped.

scoped ::
  Member (Scoped resource effect) r =>
  InterpreterFor effect r

In our concrete example, the effect parameter is Sync, but the resource parameter must stay polymorphic, because the concrete implementation should remain hidden from the business logic. The hard part, however, is figuring out the implementation of scoped, and this requires some knowledge about Polysemy’s internals.

Here Be Dragons: The Full Implementation

The Scoped resource effect effect must perform two tasks:

  • Store the region in which the scope should be active
  • Interpret effects of type effect in a scope where a resource exists

This suggests these two constructors for Scoped:

data Scoped (resource :: Type) (effect :: Effect) :: Effect where
  InScope :: (resource -> m a) -> Scoped resource effect m a
  Run :: resource -> effect m a -> Scoped resource effect m a

scoped ::
  Member (Scoped resource effect) r =>
  InterpreterFor effect r
scoped region =
  send $ InScope \resource -> transform (Run resource) region

We can store a region with InScope. Regions are stored as functions resource -> m a so that the interpreter will be able to create and inject a scoped resource. We then store the resource in the Run constructor, which simply pairs up an effect with the scoped resource. The implementation of scoped uses the transform combinator from Polysemy, which converts an effect type into another.

The implementation of the interpreter, interpretScoped, follows:

interpretScoped ::
  Sem r resource ->
  (resource -> InterpreterFor effect r) ->
  InterpreterFor (Scoped resource effect) r
interpretScoped acquireResource scopedInterpreter =
  interpretH \case
    Run resource action ->
      scopedInterpreter resource (send action)
    InScope region -> do
      resource <- raise acquireResource
      interpretScoped acquireResource scopedInterpreter =<< runT (region resource)

Now Sync can be interpreted in terms of Scoped with all its benefits:

data Sync :: Effect where
  Wait :: Sync m ()
  Signal :: Sync m ()

interpretSync ::
  Member (Embed IO) r =>
  MVar () ->
  InterpreterFor Sync r
interpretSync mv =
  interpret \case
    Wait -> embed (takeMVar mv)
    Signal -> embed (putMVar mv)

program :: Sem [Scoped resource Sync, Async, Output Text] ()
program = do
  scoped do
    async do
      output "background thread"
    output "main thread"

main :: IO ()
main = do
  (log, _) <- (runFinal . asyncToIOFinal . runOutputList . interpretScoped (embed newEmptyMVar) interpretSync) do
    async program
  traverse_ putStrLn log

The resource parameter stays polymorphic in program and is only instantiated as MVar when interpretSync is called in main. This hides the implementation from the program logic, while still giving GHC the information it needs to connect the resource to its use site.

Wrapping Up

I’ve worked with Polysemy quite intensely, but when I started using this pattern I was surprised by how ergonomic it is in practice. I have already built several useful effects with it, like a publish/subscribe mechanism built on unagi channels that duplicates the channel for each subscriber:

program = do
  async do
    Events.subscribe do
      assertEqual 1 =<< Events.consume
  async do
    Events.subscribe do
      assertEqual 1 =<< Events.consume
  Events.publish 1

Finally, I’d like to acknowledge the brilliant people who made this possible: Love Waern, whose genius manifested the magic of the implementation, Georgi Lyubenov, who stated the problem that motivated it, and Sandy Maguire, the creator of the amazing Polysemy.

January 05, 2022 12:00 AM

January 03, 2022

Magnus Therning

Accessing the host from inside a Docker container

To give the container access to a service running on the host, add extra_hosts to its definition in the Compose file:

    extra_hosts:
      - "host.docker.internal:host-gateway"

Then it's possible to access it as host.docker.internal. Just don't forget to bind the service on the host to something other than localhost, or it won't be reachable from the container.

January 03, 2022 12:48 PM

Joachim Breitner

Telegram bots in Python made easy

A while ago I set out to get some teenagers interested in programming, and thought about a good way to achieve that: a way that allows them to get started with very little friction, to quickly build something relevant to their current life, and that avoids frustration.

They were old enough to have their own smartphone, and they were already happily chatting with their friends, using the Telegram messenger. I have already experimented a bit with writing bots for Telegram (e.g. @Umklappbot or @Kaleidogen), and it occurred to me that this might be a good starting point: Chat bot interactions have a very simple data model: message in, response out, all simple text. Much simpler than anything graphical or even web programming. In a way it combines the simplicity of the typical initial programming exercises on the command-line with the impact and relevance of web programming.

But of course “real” bot programming is still too hard – installing a programming environment, setting up a server, deploying, dealing with access tokens, understanding the Telegram Bot API and mapping it to your programming language.


So I built a browser-based Python programming environment for Telegram bots that takes care of all of that. You simply write a single Python function, click the “Deploy” button, and the bot is live. That’s it!

This environment provides a much simpler “API” for the bots: Define a function like the following:

  def private_message(sender, text):
     return "Hello!"

This gets called upon a message, and if it returns a string, that’s the response. That’s it! Not enough to build every kind of Telegram bot, but sufficient for many fun applications.

A chatbot

In fact, my nephew and niece use this to build a simple interactive fiction game, where the player says where they are going (“house”, ”forest”, “lake”) and thus explore the story, and in the end kill the dragon. And my girlfriend created a shopping list bot that we are using “productively”.

If you are curious, you can follow the instructions to create your own bot. There you can also find the source code and instructions for hosting your own instance (on Amazon Web Services).

Help with the project (e.g. improving the sandbox for running untrustworthy python code; making the front-end work better) is of course highly appreciated, too. The frontend is written in PureScript, and the backend in Python, building on Amazon lambda and Amazon DynamoDB.

by Joachim Breitner at January 03, 2022 10:20 AM

January 01, 2022

Magnus Therning

Trimming newline on code block variable

Today I found ob-http and decided to try it out a little. I quickly ran into a problem of a trailing newline. Basically I tried to do something like this:

#+name: id
#+begin_src http :select .id :cache yes
POST /foo
Content-Type: application/json

{
  "foo": "toto",
  "bar": "tata"
}
#+end_src
#+RESULTS[c5fd99206822a2109d7ac1d140185e6ec3f4f1d9]: id

#+header: :var id=id
#+begin_src http
POST /foo/${id}/fix
#+end_src

The trailing newline messes up the URL though, and the second code block fails.

I found two ways to deal with it: using a table, and using org-sbe.

Using a table

#+name: id
#+begin_src http :select .id :cache yes :results table
POST /foo
Content-Type: application/json

{
  "foo": "toto",
  "bar": "tata"
}
#+end_src
#+RESULTS[c5fd99206822a2109d7ac1d140185e6ec3f4f1d9]: id
| 48722051-f81b-433f-acb4-a65d961ec841 |

#+header: :var id=id[0,0]
#+begin_src http
POST /foo/${id}/fix
#+end_src

Using org-sbe

#+name: id
#+begin_src http :select .id :cache yes
POST /foo
Content-Type: application/json

{
  "foo": "toto",
  "bar": "tata"
}
#+end_src
#+RESULTS[c5fd99206822a2109d7ac1d140185e6ec3f4f1d9]: id

#+header: :var id=(org-sbe id)
#+begin_src http
POST /foo/${id}/fix
#+end_src

January 01, 2022 06:24 PM

December 30, 2021

Matthew Sackman

Let's build! A distributed, concurrent editor: Part 5 - Actors

In this series:

Last week I explored how the server could store a document and its history on disk, and how it can calculate and communicate the correct document state when it receives an undo or redo message. This week I want to look at the overall architecture of the server, and servers in general. I want to discuss the Actors model for dealing with concurrency, and why I believe it is an excellent approach to architecting servers. There are definitely opinions ahead, which are a result of my experiences and biases.

A brief history

When I was at university, in both undergrad and PhD courses, I studied several formal models for concurrent programming. At the time, study into concurrency was a hot topic: consumer CPUs had started going parallel, with hyper-threading appearing. Everyone knew that writing programs that make good use of these new CPUs was difficult: parallel computation had been around for a long time in more expensive and rarer CPUs. But now it was starting to cause headaches for everyone.

The dominant approach, both then and now, is to use locks to regulate how multiple concurrent threads access data structures. But as a program gets larger, it becomes difficult to use locks correctly: it is easy to accidentally introduce deadlocks.

  • Locks don’t compose, so you always have to reason about the program as a whole, not small parts in isolation: small parts can be correct, but the overall program can still be faulty.
  • Pointer aliasing and other features of mainstream programming languages make it impossible for static analysis (type-checkers) to prove a program is free of deadlocks, without significant changes to the type-systems of those languages.
  • Tests normally don’t help much because whether a faulty program deadlocks when running is dependent on how its threads get scheduled, which is not something a program typically has any control over.

Lots of ideas were receiving attention, from ownership types (which I believe went on to form a major part of the design of Rust) to software transactional memory (which didn’t really go on to anything much as far as I know; although I’ve implemented it a few times), and others. At the time, there was still significant resistance from some quarters to statically-typed languages: a lot of people seemed to revel in the fact that a human is smarter than a computer (at some tasks), and so objected to type-checkers that tell them their program can’t be proven safe. Thankfully, from what I can tell, attitudes seem to have changed. But I suspect that if Rust 1.0 had been released 10 years earlier, it would have been dead on arrival.

Ownership types, and similar research at the time, were focused on proving at compile-time (i.e. before running the program) that a program which shares data between multiple threads is free from data-races and deadlocks. Transactional memory was focused on identifying unsafe concurrent access to shared data at run-time, and making sure that when it occurs, the effects are undone and tidied up without doing any damage. By contrast, the Actor model prohibits sharing mutable data between actors. Instead, data is sent between actors, but a piece of data should only ever be accessible by a single actor at a time.

Actor implementations can rely on ownership types, or use transactional memory if they want to. Ultimately, communication between actors is achieved with a shared data structure: a mailbox or queue of some sort, which must be safe for multiple actors to concurrently send to, and a single actor to receive from. But really, Actors don’t require fancy new type-systems, or anything particularly novel from a language run-time. The Actors model has been around for some time: Carl Hewitt and others created it in the 1970s, and Gul Agha further developed it in the 1980s. Type-systems have been developed that allow the communication patterns between actors (protocols) to be specified and verified (session types). To my knowledge, they have not enjoyed mainstream adoption.

Towards the end of my time at university, I did more and more work on RabbitMQ, which is written in Erlang. Erlang is an Actor programming language. So a lot of my early career was spent building a pretty scalable and reasonably well-performing distributed messaging system, using Actors. This probably explains why I find myself biased in favour of Actors, and why I have built Actor frameworks a few times.

What are Actors?

An actor is a little server. It has a mailbox; other actors will send it messages by posting them into the mailbox. When a message appears in the mailbox, the actor will retrieve it, and will process it in some way. Then it’ll go back to sleep until the next message arrives. In the course of processing a message, an actor can send messages to other actors it knows about (including sending to itself), it can mutate its own state, it can spawn new actors if it wishes, and it can choose whether to terminate instead of waiting for the next message. When a message is sent to an actor, ownership of that message transfers to the recipient.

Unlike most other models of concurrency, actors combine the unit of concurrency (a thread, or thread-like thing) with state ownership: you cannot talk about state without talking about the actor who owns and manages that state. It is nonsensical to talk about multiple actors having access to the same state. If you want an actor to change its state then you send it a message asking it to do so. Sending a message to an actor is typically asynchronous: the sender does not block, waiting for either the message to be received, or some sort of reply to be issued. But if your message requires a reply then you can include your own mailbox as part of that message, and the actor can use that to send you its response (amongst other techniques). A single actor is always a single thread. So sometimes you’ll want to make sure you spawn enough actors of a particular type so that you have one per CPU core, to ensure you can make best use of the CPU resources available to you.

It’s this concept of unified state and concurrency that I think appeals to me: one model to keep in my head and think about instead of two or more. Uncomplicated rules about state ownership.

Actors can be short-lived or long-lived. Some programs might create a bunch of actors as soon as they start, and those actors will stay alive for a long time. Equally, it should be very cheap to spawn short-lived actors: actors that come into existence just to carry out some specific task and then terminate. Because of the expectation that spawning new actors should be very cheap, Actor languages and frameworks often work best on green-threads: virtual threads that are typically managed by the language run-time. They’re lighter-weight than OS-level threads (i.e. context switching between them is cheaper) because the language run-time is able to take advantage of extra knowledge it has, to do less work to switch between threads. It’s common for such language run-times to create one OS-level thread for each CPU core, and then the run-time chooses how to schedule its actors (or green-threads) across those OS-level threads.

Go has always supported green-threads, in the form of Go-routines. Its channels are a little like mailboxes. It makes no further provisions for Actors, but it’s not too difficult to fill in the missing parts, which I’ve done with my actors library. There are Actor frameworks in most mainstream languages; and several in Go (though many of them look abandoned to me).
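To make that concrete, here is a minimal, self-contained sketch of an actor built from nothing but a goroutine and a channel serving as its mailbox. All the names here are made up for illustration; this is not the API of the actors library mentioned above.

```go
package main

import "fmt"

// getMsg asks the actor for its current total; the reply channel plays the
// role of "including your own mailbox" in the message.
type getMsg struct{ reply chan int }

// spawnCounter starts a minimal actor: a goroutine that owns an integer and
// drains a channel acting as its mailbox. The state n is never shared.
func spawnCounter() chan interface{} {
	mailbox := make(chan interface{}, 16)
	go func() {
		n := 0 // state owned exclusively by this goroutine
		for msg := range mailbox {
			switch m := msg.(type) {
			case int: // an "add" message, fire-and-forget
				n += m
			case getMsg: // a request/response message
				m.reply <- n
			}
		}
	}()
	return mailbox
}

func main() {
	mb := spawnCounter()
	mb <- 1
	mb <- 2
	reply := make(chan int)
	mb <- getMsg{reply: reply}
	fmt.Println(<-reply) // prints 3: both adds were processed first (FIFO)
	close(mb)
}
```

The point of the sketch is that the only shared data structure is the mailbox itself; everything else is owned by exactly one goroutine.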

Actors have some ideas in common with micro-services. But actors don’t need to use TCP or HTTP or GRPC to send messages to each other, and you don’t need docker-compose or mini-kube or any other pile of ridiculous, unnecessary, and accidental complexity to orchestrate them. An essential feature of any Actor framework is the provision for creating, managing, and terminating actors. You don’t need external tools for it: it’s all baked in. Being notified that an actor has terminated (and why), or being able to terminate a set of actors in a careful and deterministic manner, is critical for a reliable and well-behaved program. Erlang has an entire set of libraries and principles to help with this, and I’ve borrowed several design ideas from there.

Actors also differ from micro-services in that micro-services normally form a distributed system: each service may run on a different machine, and they use a network to pass messages between them. Each of those machines could fail independently (catch fire etc), and messages could be lost on the network. That friendly dog could come bounding in and chew up a network cable. These sorts of failure scenarios are part and parcel of a distributed system. For actors all running within a single program, a single OS-process, these sorts of failures can’t happen. Nevertheless, some Actor frameworks also work in a distributed setting, allowing messages to be sent between actors on different machines, without the code having to show any knowledge of the location of the actors. Erlang can do this, for example. My actor framework cannot; in Go, it seems very difficult to send objects between machines and maintain pointer equality properties (if one actor sends the same pointer to another actor, twice, then the recipient should be able to see that both pointers have the same value (point to the same object). In light of garbage collection and the fact Go does not support weak references, I currently believe it’s not possible to implement this correctly without changes to the language run-time).

So Actors allow you to architect your program around a set of servers, which have a simple combined model of state ownership and concurrency. They send messages to each other to coordinate and communicate. The Actor framework provides mechanisms to spawn new actors, to monitor actors for termination, and to manage the life-cycle of actors. I believe that building programs using actors helps you practise thinking about things like:

  • the different orders in which messages might be sent and received;
  • the different orders in which your actors might be scheduled and preempted;
  • what bits of state should belong together because they need to be updated at the same time;
  • how to run multiple actors of the same type to scale your program and make good use of a parallel CPU.

Thinking about these sorts of things regularly helps when it comes to designing and building distributed systems: it’s all the same stuff, apart from that distributed systems can fail in even more exciting ways.

Actor Life-cycle

An actor can be divided into two parts: the client-side API, and the server-side. If you’re familiar with Go, you may have heard of a general design principle which says “don’t make channels part of your API”. Go’s channels can have complex and subtle semantics, and it’s generally advisable to wrap them in a friendlier API. The same is true of an actor and its mailbox: if you expose just the mailbox as an actor’s API then it’s not clear what messages the actor responds to, whether there are any ordering requirements on particular messages, and so on. So instead, it’s advised that you wrap the mailbox with a client-side API. This client-side code presents to the world the methods that your actor supports, but hides the details of posting messages into the actor’s mailbox, maybe waiting for a response, and any other logic, from the user of your actor.


Figure 1: Spawning an actor.

Figure 1 depicts spawning a new actor. The server-side of the new actor completes its call to the Init method before the call to Spawn returns. If the Init method returns an error then that error will be returned from Spawn. It also means that whilst Init is running, no other actor can know of the existence of the new actor. This can be useful: for example it means that the new actor, in its Init method, can send itself messages. Those messages are then guaranteed to be the first items in the actor’s mailbox, which means it can safely do some long, complex initialisation asynchronously, and not keep its spawner blocked. Figure 2 shows the flow of messages and control in an actor.

Figure 2: The flow of control and messages in an actor and its mailbox.

If actor X posts two messages in some order (m1, then m2) to actor Z’s mailbox, then it is guaranteed that if both messages are received by actor Z, m1 will be received before m2. If actors X and Y concurrently post messages to actor Z’s mailbox, then without some other mechanism to impose a specific order, the messages could be received by Z in either order.
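This per-sender FIFO guarantee is exactly what a buffered Go channel already provides, which the following self-contained sketch (hypothetical names, not the library's API) demonstrates:

```go
package main

import "fmt"

// deliverInOrder models actor X posting m1 then m2 to actor Z's mailbox.
// Messages posted by the same goroutine are received in posting order;
// the interleaving with a second, concurrent sender would be unspecified.
func deliverInOrder() []string {
	mailbox := make(chan string, 4)
	done := make(chan []string)

	// actor Z: drain the mailbox, recording arrival order
	go func() {
		var seen []string
		for m := range mailbox {
			seen = append(seen, m)
		}
		done <- seen
	}()

	// actor X posts m1 then m2
	mailbox <- "m1"
	mailbox <- "m2"
	close(mailbox)
	return <-done
}

func main() {
	fmt.Println(deliverInOrder()) // prints [m1 m2]
}
```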

The server-side of the actor is the code that runs in the actor’s own go-routine. There are 3 call-backs that the server-side code can provide:

  1. Init(arguments) error. This is always the first thing that gets called by the actor’s new go-routine as soon as it gets created. It should do any setup work necessary, any state initialisation that’s required. If it returns a non-nil error then the actor will terminate.
  2. HandleMsg(msg) error. This is called for each message received from the actor’s mailbox. As part of processing the received message, the actor can send messages to other actors, spawn new actors, and, depending on the type of message, send a reply to it. It can also choose to terminate: if the method returns a non-nil error then the actor will terminate.
  3. Terminate(reason). The actor terminates when either Init or HandleMsg return a non-nil error or panic. When that occurs, the actor’s go-routine will call this method. If the server-side wants to terminate “normally”, then there is a special error value ErrNormalActorTermination which provides a non-nil error for triggering termination but does not cause any alarming details to be logged.

Some methods are always available to every actor: suitable code provided in both the base client and base server types. One of those is the client TerminateSync method. This sends a message into the mailbox, asking the actor to terminate. The default server-side handler for this message returns the ErrNormalActorTermination error which causes the actor to terminate. The client-side method waits until the actor has finished running its Terminate method before it returns. TerminateSync is also idempotent.
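One plausible way to get TerminateSync's blocking, idempotent semantics from plain goroutines is sketched below. The names (Actor, Spawn, terminateMsg) are illustrative stand-ins, not the library's types:

```go
package main

import (
	"fmt"
	"sync"
)

// terminateMsg asks the actor to stop, like the default terminate handler.
type terminateMsg struct{}

type Actor struct {
	mailbox  chan interface{}
	finished chan struct{} // closed once the server side has fully stopped
	once     sync.Once     // makes TerminateSync idempotent
}

func Spawn() *Actor {
	a := &Actor{mailbox: make(chan interface{}, 16), finished: make(chan struct{})}
	go func() {
		defer close(a.finished) // runs after the loop, like a Terminate callback
		for msg := range a.mailbox {
			if _, ok := msg.(terminateMsg); ok {
				return // the "normal termination" case
			}
			// ... handle other messages ...
		}
	}()
	return a
}

// TerminateSync posts the terminate request at most once, then blocks
// until the server side has actually finished.
func (a *Actor) TerminateSync() {
	a.once.Do(func() { a.mailbox <- terminateMsg{} })
	<-a.finished
}

func main() {
	a := Spawn()
	a.TerminateSync()
	a.TerminateSync() // idempotent: returns immediately the second time
	fmt.Println("terminated")
}
```

Because finished is only closed after the message loop exits, a caller of TerminateSync knows the server-side clean-up has completed by the time the call returns.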

Other client-side methods that are always available include OnTermination(callback) which allows you to register a callback to be invoked when the actor terminates. Through this mechanism you can monitor actors and receive the reason they terminated.

When an actor terminates, any messages left in its mailbox are discarded. For any of these messages which required a reply, the client-side API is able to detect that the server-side terminated before it received the message. Similarly, if the client-side code attempts to post any message to a mailbox after the server-side has terminated, then it can immediately spot that the actor has already terminated and that the message can’t possibly be processed. But, successfully posting a message into a mailbox is no guarantee that the message will be received by the server-side and processed: the server-side could terminate before that occurs.

Shopping-Basket Example

Let’s build an actor that represents a shopping basket. Lots of people will be able to concurrently add to the shopping basket, and the shopping basket actor will be able to answer questions about the number of items in it, and the cost of the entire basket. First, the server-side:

import (
   // imports elided
)

type basketServer struct {
   actors.ServerBase
   items map[*Item]uint
}

var _ actors.Server = (*basketServer)(nil)

func (self *basketServer) Init(log zerolog.Logger, mailboxReader *mailbox.MailboxReader, selfClient *actors.ClientBase) (err error) {
   self.items = make(map[*Item]uint)
   return self.ServerBase.Init(log, mailboxReader, selfClient)
}
type Item struct {
   Name string
   Cost uint
}

type summariseMsg struct {
   actors.MsgSyncBase
   ItemCount uint // reply field
   TotalCost uint // reply field
}

func (self *basketServer) HandleMsg(msg mailbox.Msg) (err error) {
   switch msgT := msg.(type) {
   case *Item:
      self.items[msgT] += 1
      return nil

   case *summariseMsg:
      totalCount := uint(0)
      totalCost := uint(0)
      for item, count := range self.items {
         totalCount += count
         totalCost += (count * item.Cost)
      }
      msgT.ItemCount = totalCount
      msgT.TotalCost = totalCost
      msgT.MarkProcessed()
      return nil

   default:
      return self.ServerBase.HandleMsg(msg)
   }
}

The basketServer embeds a ServerBase which provides default implementations of all the callbacks. I need to override two of the callbacks: Init and HandleMsg. Whenever I override one of these callbacks, I must make sure I also call the default embedded handler otherwise things will break. In HandleMsg there are two message types that this basket actor cares about:

  1. Receiving an *Item means adding the item to the basket. This is asynchronous; there is no reply to the caller. I just add the item into the map in the private server-side state.
  2. Receiving a *summariseMsg message. This needs a response: it is asking the actor to summarise what’s in the basket. The summariseMsg type has actors.MsgSyncBase embedded within it, which adds a little machinery to allow replies to be issued straight into the message itself. So the handler fills in a couple of fields in the message, and then calls MarkProcessed on it, which provides the signal that the client-side can now proceed and safely access the fields.

The client-side looks like this:

type BasketClient struct {
   *actors.ClientBase
}

func SpawnBasket(log zerolog.Logger) (*BasketClient, error) {
   clientBase, err := actors.Spawn(log, &basketServer{}, "basket")
   if err != nil {
      return nil, err
   }
   return &BasketClient{ClientBase: clientBase}, nil
}

func (self *BasketClient) AddItem(item *Item) {
   self.Send(item)
}

func (self *BasketClient) Summarise() (itemCount, totalCost uint, success bool) {
   msg := &summariseMsg{}
   if self.SendSync(msg, true) {
      return msg.ItemCount, msg.TotalCost, true
   } else {
      return 0, 0, false
   }
}
The BasketClient is the public API to this actor. It has two methods: AddItem and Summarise, neither of which expose the mailbox, or any implementation detail of the actor. The BasketClient type embeds a *ClientBase which is created by actors.Spawn. This *ClientBase value wraps the actor’s mailbox and provides useful methods like Send and SendSync. Whilst any value can be sent to an actor, only values which embed actors.MsgSyncBase can be passed to SendSync. SendSync takes a 2nd parameter, waitForReply, which if true causes the call to SendSync to block until the server has called MarkProcessed on the message. I set this to true so that when the call to SendSync returns, I know the server-side has processed the message and I can find the answers I’m looking for in the fields of the message. This is why it’s important that on the server-side, MarkProcessed is called after the fields have been filled in. SendSync also returns a boolean. If the value returned is false then it means the server terminated before it received and processed the message: i.e. the message was not processed.

Because BasketClient embeds an *actors.ClientBase value, it also gains TerminateSync and OnTermination methods.

Not a lot of code, but it shows all the key design points:

  • Client-side and server-side use different types (structs) so there’s no danger that private server-side state is publicly exposed in the client.
  • The server-side code and state is only ever run by a single go-routine, so there’s no need for any locks when manipulating private state.
  • The server-side embeds an actors.ServerBase (or similar) which makes my basketServer type a valid actor server.
  • The client-side uses the *actors.ClientBase it gets back from Spawn to post to the actor’s mailbox.
  • Any value of any type can be sent to an actor.
  • Values which embed actors.MsgSyncBase can also be sent to the actor using SendSync. This allows the client-side to block until the server-side has received the message, processed it, and called MarkProcessed on the message. This makes it safe to use a single message to both send values to, and receive values from the actor.

There is a little boilerplate, particularly on the server-side; and you have to remember to call the embedded default implementation of a callback when you override it. But there are only 3 callbacks, so hopefully it’s not too onerous nor the API too wide. The semantics around the synchronisation for both Spawn/Init and TerminateSync/Terminated provide important guarantees about the state of the actor.

Hierarchies of Actors

One very common pattern is to use an actor to manage a set of other (child) actors. I can ensure that:

  • If a child actor terminates with ErrNormalActorTermination (normal termination) then that is no cause for alarm and everything else continues to work.
  • If a child actor terminates for any other reason (abnormal termination) then the manager actor itself terminates.
  • Whenever the manager terminates, it makes sure that all its child actors have terminated.
  • Given the synchronous design of TerminateSync, calling TerminateSync on the manager will not return until all its children have also fully terminated too.
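The last guarantee can be sketched with plain channels. Here, manager and child (and their methods) are hypothetical stand-ins, not the library's types:

```go
package main

import "fmt"

// child models a managed actor: stop asks it to terminate, done is closed
// once it has fully terminated.
type child struct {
	stop chan struct{}
	done chan struct{}
}

func spawnChild() *child {
	c := &child{stop: make(chan struct{}), done: make(chan struct{})}
	go func() {
		defer close(c.done)
		<-c.stop // a real child would also drain a mailbox here
	}()
	return c
}

// manager owns its children: it records every child it spawns.
type manager struct{ children []*child }

func (m *manager) Spawn() *child {
	c := spawnChild()
	m.children = append(m.children, c)
	return c
}

// TerminateSync asks every child to stop, then waits until each has fully
// terminated before returning, mirroring the guarantee described above.
func (m *manager) TerminateSync() {
	for _, c := range m.children {
		close(c.stop)
	}
	for _, c := range m.children {
		<-c.done
	}
}

func main() {
	m := &manager{}
	m.Spawn()
	m.Spawn()
	m.TerminateSync()
	fmt.Println("all children terminated")
}
```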

My actors library provides exactly these properties with its ManagerClient type, and SpawnManager function. I can adjust the spawning of baskets so that a new basket is a child of a manager:

func SpawnBasket(manager actors.ManagerClient, name string) (*BasketClient, error) {
   clientBase, err := manager.Spawn(&basketServer{}, name)
   if err != nil {
      return nil, err
   }
   return &BasketClient{ClientBase: clientBase}, nil
}

Instead of calling actors.Spawn, I call manager.Spawn. I can now create some sort of registry actor, which allows me to access baskets by basket identifier. Again, I’ll do the server-side first:

type basketRegistryServer struct {
   actors.ServerBase
   baskets map[uint64]*BasketClient
   manager *actors.ManagerClientBase
}

var _ actors.Server = (*basketRegistryServer)(nil)

func (self *basketRegistryServer) Init(log zerolog.Logger, mailboxReader *mailbox.MailboxReader, selfClient *actors.ClientBase) (err error) {
   self.baskets = make(map[uint64]*BasketClient)
   manager, err := actors.SpawnManager(log, "basket manager")
   if err != nil {
      return err
   }
   self.manager = manager
   return self.ServerBase.Init(log, mailboxReader, selfClient)
}

func (self *basketRegistryServer) Terminated(err error, caughtPanic interface{}) {
   self.manager.TerminateSync()
   self.ServerBase.Terminated(err, caughtPanic)
}

type ensureBasketMsg struct {
   actors.MsgSyncBase
   basketId uint64        // query field
   basket   *BasketClient // reply field
}

func (self *basketRegistryServer) HandleMsg(msg mailbox.Msg) (err error) {
   switch msgT := msg.(type) {
   case *ensureBasketMsg:
      basket, err := self.ensureBasket(msgT.basketId)
      if err != nil {
         return err
      }
      msgT.basket = basket
      msgT.MarkProcessed()
      return nil

   default:
      return self.ServerBase.HandleMsg(msg)
   }
}

func (self *basketRegistryServer) ensureBasket(basketId uint64) (basket *BasketClient, err error) {
   basket, found := self.baskets[basketId]
   if !found {
      basket, err = SpawnBasket(self.manager, fmt.Sprintf("basket(%d)", basketId))
      if err == nil {
         self.baskets[basketId] = basket
      }
   }
   return basket, err
}

The basketRegistryServer has two bits of private state this time, both of which get set up in Init: a map to allow me to find baskets by identifier, and the manager itself. I think of the manager as a child of the registry, and then the baskets will be children of the manager, as shown in figure 3.

The Basket Registry owns the Manager, which owns the Basket actors

Figure 3: Hierarchy of actors

Because there’s a manager and other child actors involved, I override Terminated too, to make sure that when the registry terminates, it terminates the manager (which will in turn terminate all its children). As normal, I have to make sure I also call the default base implementation.

There’s only one message type the registry cares about currently: ensureBasketMsg which returns the basket for the given identifier, creating it if it doesn’t already exist. This is a message which gets a reply, hence embedding actors.MsgSyncBase. In the ensureBasket method, I spawn the new basket as a child of the registry’s manager. I also provide a name for the new basket actor which includes its identifier. This will show up in logs and generally makes tracing easier.

Next the client-side:

type BasketRegistryClient struct {
   *actors.ClientBase
}

func SpawnBasketRegistry(log zerolog.Logger) (*BasketRegistryClient, error) {
   clientBase, err := actors.Spawn(log, &basketRegistryServer{}, "basket registry")
   if err != nil {
      return nil, err
   }
   return &BasketRegistryClient{ClientBase: clientBase}, nil
}

func (self *BasketRegistryClient) EnsureBasket(basketId uint64) *BasketClient {
   msg := &ensureBasketMsg{basketId: basketId}
   if self.SendSync(msg, true) {
      return msg.basket
   }
   return nil
}

A couple of improvements are needed to the server though:

  1. If a basket terminates abnormally, then the manager will terminate, which will cause all the baskets to be terminated. It would be good if this cascade continued up so that the registry terminates too. So I want the registry to observe the termination of the manager.
  2. If a basket terminates normally (perhaps the customer goes through the checkout and pays), there is currently no way for the registry to delete its reference to the basket from its map. So the registry should also observe the normal termination of each basket.

For both of these I need to use the OnTermination facility to create a subscription that observes the termination of an actor. Firstly, for the manager, I can make a few changes to the Init function in the registry:

func (self *basketRegistryServer) Init(log zerolog.Logger, mailboxReader *mailbox.MailboxReader, selfClient *actors.ClientBase) (err error) {
   self.baskets = make(map[uint64]*BasketClient)
   manager, err := actors.SpawnManager(log, "basket manager")
   if err != nil {
      return err
   }
   subscription := manager.OnTermination(func(subscription *actors.TerminationSubscription, err error, caughtPanic interface{}) {
      // ask the registry itself to terminate, via its own client
      selfClient.TerminateSync()
   })
   if subscription == nil {
      return errors.New("Unable to create subscription to manager.")
   }
   self.manager = manager
   return self.ServerBase.Init(log, mailboxReader, selfClient)
}

The function that I pass to OnTermination will get run when the manager terminates. It will run in its own new go-routine. So to get it to terminate the registry, it uses the client for the registry to ask the registry to terminate. The effect is that if the manager terminates for any reason (which could be caused by a basket terminating abnormally) then the registry will be asked to terminate too. In this way, errors can propagate between actors, and you can make sure that if something goes wrong, actors can be terminated in a controlled and deterministic fashion.

Solving the second problem looks very similar, except I need to subscribe to each new basket. If the basket terminates, the callback will be run in a new go-routine as before, which means to tidy up the registry I need to send it a suitable message:

type deleteBasketMsg uint64

func (self *basketRegistryServer) ensureBasket(basketId uint64) (basket *BasketClient, err error) {
   basket, found := self.baskets[basketId]
   if !found {
      basket, err = SpawnBasket(self.manager, fmt.Sprintf("basket(%d)", basketId))
      if err == nil {
         self.baskets[basketId] = basket
         subscription := basket.OnTermination(func(subscription *actors.TerminationSubscription, err error, caughtPanic interface{}) {
            if err == actors.ErrNormalActorTermination {
               // Body elided here: it sends a deleteBasketMsg for this
               // basketId to the registry so it can tidy up its map.
            }
         })
         if subscription == nil {
            return nil, errors.New("Unable to create subscription to basket")
         }
      }
   }
   return basket, err
}

I only bother sending the new deleteBasketMsg if the basket terminated normally: in all abnormal cases, the manager would also terminate, which I already detect and handle suitably. Lastly, I need to add the extra case to HandleMsg:

func (self *basketRegistryServer) HandleMsg(msg mailbox.Msg) (err error) {
   switch msgT := msg.(type) {
   case *ensureBasketMsg:
      basket, err := self.ensureBasket(msgT.basketId)
      if err != nil {
         return err
      }
      msgT.basket = basket
      return nil
   case deleteBasketMsg:
      delete(self.baskets, uint64(msgT))
      return nil
   default:
      return self.ServerBase.HandleMsg(msg)
   }
}

Managing child actors and observing their termination takes a little more code and effort. The reward is a sensible hierarchy of actors, and control over the way the program behaves when errors occur and propagate.

Back to the Distributed Editor

With all that covered, I can now return to the distributed editor I’ve been building and explain how I’ve used my Actors library to structure the server.

Back in part 3 I briefly talked about the URL structure for the HTTP server. I said:

The WebSockets will be available under /documents/$documentName.

What I want to achieve is:

  • One actor manages one document.
  • All of these actors are children of some registry actor which allows me to find such actors by document name.
  • Each of these actors only gets created when a WebSocket to that document name is created.
  • When there are no more WebSockets open for a particular document name, the corresponding actor terminates normally.
  • As long as a WebSocket connection stays alive, it sends messages it receives from the browser to the document’s actor, and updates that it receives from the document’s actor it sends down to the browser over the WebSocket.

I linked to the document.go file last week. Hopefully it's now clear how that file is divided into the client-side API and the server-side implementation. The implementations of Init and HandleMsg should no longer be mysterious. There are 3 different messages the document actor understands:

  1. Messages from a browser, received via a WebSocket. These include edits, undo, and redo messages.
  2. documentSubscribeMsg and documentUnsubscribeMsg messages. These are used by the code that handles the WebSockets to subscribe to a particular document actor. You can see in the code for unsubscribe how if the number of subscribers drops to 0, then the actor will exit normally.

To create a subscription you provide a callback function. The document actor will invoke this function with the bytes that should be sent out of the WebSocket and down to the browser.
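As a sketch of that shape (the type name documentSubscribeMsg appears above, but the field name and signature here are illustrative, not the actual definition from document.go):

```go
// Hypothetical sketch of the subscribe message: the subscriber provides a
// callback which the document actor invokes with the bytes that should be
// forwarded down the WebSocket to the browser.
type documentSubscribeMsg struct {
	send func(outbound []byte) // called by the document actor
}
```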

The document registry is structured almost exactly the same way as the basket registry from earlier.

That really just leaves the code that handles each WebSocket. It creates the subscription to the document actor and also sends to the document actor any messages it receives from the WebSocket that it’s able to decode. Defers make sure that whatever causes the WebSocket to close, the subscription to the document actor will be cancelled.

That covers pretty much all the interesting bits of the client and server. In about 850 lines of TypeScript and 1800 lines of Go I have a pretty reasonably engineered implementation of a distributed text editor that should be enough for Alice and Bob to start playing with.

What remains to be covered is testing. Testing a distributed system is always fun: as I mentioned earlier, bugs frequently only show themselves if certain events happen in a particular order. I find it very effective to create tests that can generate a random stream of events and feed that into the system. How to do that and how to verify the effect of those events is a subject for next week.

December 30, 2021 11:01 AM

December 27, 2021

Stackage Blog

December 25, 2021

Ken T Takusagawa

[qxjnommp] initializing generalized Fibonacci for monotonic growth

we consider a generalized Fibonacci sequence generated by a state vector.  the next term is the sum of the two oldest elements of the state vector, then, the oldest element is discarded.  here is a demonstration of the evolution of an initial state vector [3,20,100] for 3 iterations.  the oldest elements are on the left; the newest element is added to the right, shifting everything to the left.
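the elided demonstration can be reproduced with a sketch of one iteration step (written in Go here rather than the Haskell mentioned at the end of this post; the rule is the same):

```go
// step performs one iteration: the next term is the sum of the two oldest
// (leftmost) elements, appended on the right; the oldest element is then
// discarded, shifting everything left.
func step(state []int) []int {
	next := state[0] + state[1]
	out := make([]int, len(state))
	copy(out, state[1:])
	out[len(out)-1] = next
	return out
}
```

starting from [3,20,100], three iterations give [20,100,23], then [100,23,120], then [23,120,123].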


the simplest possible initial state vector is all zeros, but that generates only zeros.

the next simplest initial state vector has one 1 and the rest zeros.  this family of initial states, in particular putting the 1 at the oldest slot, is discussed in previous posts.  for such a state vector of width W, it takes (W-1)^2+1 iterations until the state vector has no zeros.  (this is the first of many conjectures in this post inspired by experimental verification for small sizes.)  in the example below, we can see rows of Pascal's triangle shift along the state vector.
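the conjecture is easy to check experimentally for small widths; a sketch in Go:

```go
// iterationsUntilNoZeros starts from the state vector [1,0,...,0] of width
// w (the 1 in the oldest slot) and counts iterations of the generalized
// Fibonacci rule until the state vector contains no zeros.
func iterationsUntilNoZeros(w int) int {
	state := make([]int, w)
	state[0] = 1
	for n := 1; ; n++ {
		next := state[0] + state[1]
		state = append(state[1:], next)
		hasZero := false
		for _, x := range state {
			if x == 0 {
				hasZero = true
			}
		}
		if !hasZero {
			return n
		}
	}
}
```

for W = 3, 4, 5 this returns 5, 10, 17, matching (W-1)^2+1.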


the sequence initially oscillates.  it takes many iterations before the sequence becomes monotonically strictly increasing.

the next simplest initial state is all ones.  this generates a sequence that is monotonically nondecreasing.  it takes (W-1)^2+1 iterations until the state vector has no repeated elements, after which point the sequence becomes monotonically strictly increasing.


consider running an initial state of all ones backward as far as possible avoiding negative numbers.  if the width of the state vector is even, the earliest such state alternates ones and zeros.


if the state vector width is odd, there is a state of alternating ones and zeros part way through, but that state can be rewound further backward to a state with a prefix [1,0,0] followed by alternating ones and zeros.


the following initial state vectors generate sequences that are monotonically strictly increasing.  (the first generates the Fibonacci sequence.)  the first two elements sum to 1 larger than the last element.


if we run an even-length such sequence backward, the earliest non-negative state vector (in the example below) has 4s alternated with [0,1,2,3].  the point where the output sequence (last elements) becomes strictly increasing also has a simple form of doubled integers in order [4,4,5,5,6,6,7,7].


for an odd-length initial state vector (width 19 below), the sequence can be rewound further to something with a difficult-to-describe prefix followed by 9s alternated with [0,1,2,3,4,5].


here is the earliest nonnegative state vector of width 201.  the prefix consists of triangular numbers in decreasing order alternated with a sequence which causes pairs of elements to sum to a value which increases by 1 each pair.  91 is unpaired; 9+78 = 87, 22+66 = 88, 34+55 = 89.  [91,78,66,55] are decreasing triangular numbers.

[91, 9, 78, 22, 66, 34, 55, 45, 45, 55, 36, 64, 28, 72, 21, 79, 15, 85, 10, 90, 6, 94, 3, 97, 1, 99, 0, 100, 0, 100, 1, 100, 2, 100, 3, 100, 4, 100, 5, 100, 6, 100, 7, 100, 8, 100, 9, 100, 10, 100, 11, 100, 12, 100, 13, 100, 14, 100, 15, 100, 16, 100, 17, 100, 18, 100, 19, 100, 20, 100, 21, 100, 22, 100, 23, 100, 24, 100, 25, 100, 26, 100, 27, 100, 28, 100, 29, 100, 30, 100, 31, 100, 32, 100, 33, 100, 34, 100, 35, 100, 36, 100, 37, 100, 38, 100, 39, 100, 40, 100, 41, 100, 42, 100, 43, 100, 44, 100, 45, 100, 46, 100, 47, 100, 48, 100, 49, 100, 50, 100, 51, 100, 52, 100, 53, 100, 54, 100, 55, 100, 56, 100, 57, 100, 58, 100, 59, 100, 60, 100, 61, 100, 62, 100, 63, 100, 64, 100, 65, 100, 66, 100, 67, 100, 68, 100, 69, 100, 70, 100, 71, 100, 72, 100, 73, 100, 74, 100, 75, 100, 76, 100, 77, 100, 78, 100, 79, 100, 80, 100, 81, 100, 82, 100, 83, 100, 84, 100, 85, 100, 86]

here is Haskell source code which investigates these sequences.  although the state vector would have been best implemented with a circular array, we did not pursue this because it seems difficult in a purely functional language (probably need a state monad, e.g., circular).  we instead used Data.Sequence, which probably caused an asymptotic performance penalty of logarithmic versus constant time in the width of the state vector.

by Unknown at December 25, 2021 05:04 PM

GHC Developer Blog

GHC 9.0.2 is now available

GHC 9.0.2 is now available

Zubin Duggal - 2021-12-25

The GHC developers are very happy to at long last announce the availability of GHC 9.0.2. Binary distributions, source distributions, and documentation are available at

GHC 9.0.2 adds first class AArch64/Darwin support using the LLVM backend, as well as fixes for a number of critical correctness bugs present in the 9.0.1 release, along with numerous improvements to compiler performance and memory usage.

A complete list of bug fixes and improvements can be found in the release notes.

Finally, thank you to Microsoft Research, GitHub, IOHK, the Zw3rk stake pool, Tweag I/O, Serokell, Equinix, SimSpace, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Moreover, this release would not have been possible without the hundreds of open-source contributors whose work comprises this release.

As always, do open a ticket if you see anything amiss.

by ghc-devs at December 25, 2021 12:00 AM

Gil Mizrahi

Things I worked on in 2021

In this post I'd like to highlight a few (programming related) things I worked on in 2021.


In 2021 I wrote 11 new blog posts (including this one)!

I'm especially happy with the Typing Giml series. I've been trying for a long time to learn how type inference algorithms work and how to implement them, and found most papers and articles either lacking some important details or too jargon-heavy for me.

In this series I tried to write the tutorial I wish I had, covering "the French approach" to type inference (constraint generation and solving in different phases), as well as more exotic features - extensible records and polymorphic variants.

I hope these articles will be useful to others who are trying to learn about type inference.


In 2021 I started working on Strema, which was later renamed to Giml.

Giml is a purely functional programming language with emphasis on structural typing. Almost all of the work on the Strema/Giml compiler, which currently compiles Giml to JavaScript, was done while streaming on twitch.

This youtube playlist covers almost all of the sessions:

I started streaming in January and wanted to achieve 4 goals:

  • Demonstrate and demystify the process of building a non-trivial Haskell program
  • Demonstrate building a compiler in Haskell (to serve as a tutorial of sorts)
  • Learn more about type inference, and prototype a language I wanted to build for a long time
  • Have fun and communicate with others

I am quite happy with the result. I was able to stick with this project long enough to make a non-trivial compiler and share the process with others. I hope it helped a few people get into compilers or even just Haskell in general.

At some point Strema progressed enough that I felt I could switch the focus to Giml, the language I've been wanting to build for several years (and so I did).

For reasons beyond my control streaming halted in May and I haven't continued since. I hope to continue working on Giml while streaming in Q1 of 2022.

Open-Source Software

I've selected a few notable fun projects I worked on this year:


Late last year I published a scotty tutorial where we built a very basic bulletin board. After that, I decided to spend some time and expand the initial project to provide a more featureful example project - one with user authentication, cookies, a database, and more.

The result of this work and my first project of 2021 was bulletin-app. In this blog post I describe the various libraries used, how to run the program, and link to the following demo video:


I participated in Ludum Dare 48, a game jam where participants build games in 48 or 72 hours. This Ludum Dare's theme was "Deeper and deeper", and I built a "game" named Deep where the player needs to dig into the ground... and that's it. Due to time constraints and the fact that I wasn't very prepared for the jam, the game isn't very interesting and frankly isn't much of a game.

During the game jam I tried to experiment with destructible terrain and multiplayer using alpaca-netcode. The game's codebase is heavily refactored but based on nyx-game.

You can see a demo of the multiplayer mode in this demo video:

If you are interested in trying the multiplayer mode, make sure you read the warning.


As I mentioned before, I spent most of the first half of 2021 working on Giml while streaming on twitch.

With Giml, I'd like to explore another point in the design space - a strict, purely functional programming language that provides extensible records, polymorphic variants and higher kinded types, but without type classes and many other advanced type-level features.

I hope to continue working on Giml in 2022 and add more features such as modules, operators, and lazy arguments, additional backends such as an interpreter, a WASM backend or a native backend, and maybe even development tools and a step debugger.

If you are interested in exploring the Giml codebase, you can find the repository on Gitlab, or you can explore the auto generated haddocks (which btw, I totally recommend creating as part of your CI step!).



Tapir is a website generator I hacked in a couple of weekends in order to build Giml's website.

It mainly uses mustache templates, toml and markdown to create the website. The mustache templates are used to define page templates and look kinda like HTML with holes to plug in content, navigation, etc.

Each page contains two parts, a metadata header with information, such as the chosen template and page name, and the content of the page in markdown format.

Tapir will take this description of a website and will inject the markdown pages into their chosen templates and create HTML pages!

Tapir was pretty fun to build, and I'm quite satisfied with the result.


Last but not least, I wrote a Haskell book!

Learn Haskell by building a blog generator is an introductory, project-oriented, free, and online book teaching Haskell while building a static blog generator.

With this book, I attempt to create introduction material for Haskell which focuses on software development in Haskell and "getting things done", rather than providing a comprehensive walkthrough to the language. I try to describe and demonstrate design patterns such as the combinator pattern and functional core / imperative shell, and cover topics such as error handling, writing tests, generating documentation, and more.

I tried to keep the book relatively short to reduce the amount of time and commitment a learner must invest in order to try learning Haskell, while still covering frequently asked questions and topics that I've seen beginners struggle with when they try to apply their Haskell knowledge to build useful programs.

I hope I've been able to achieve these goals in some way, and I really appreciate any constructive feedback I can get on the book. I would like to thank everyone who opened issues in the issue tracker, provided feedback, and submitted editing and typo fixes! Your help is greatly appreciated.

Book logo

Final notes

These were the main things I worked on in 2021. I'm quite happy with what I got to create. If there's anything here that catches your eye, here's your opportunity to check it out!

If you found the things I work on interesting and you'd like to get more frequent updates from me, you might want to follow me on twitter.

Thank you for reading. I wish you all the best in the coming year. May your 2022 be filled with happiness and peace.

Happy new year!

December 25, 2021 12:00 AM

December 23, 2021

Matthew Sackman

Let's build! A distributed, concurrent editor: Part 4 - Writing to disk

In this series:

Last week’s article was not very exciting. I know, and I’m sorry. As I’ve said, I don’t enjoy programming browsers, I struggle to build up much enthusiasm for it, or for writing about it. This week is much more my bag. This week, and the next few articles in this series (I reckon I’m about half way through), are what I consider fun.

I need to write the server. There’s some normal stuff to do: running an HTTP server; accepting WebSockets; managing a bit of state associated with that. But there’s also managing the state of the document, and recording it and its history on disk. I need to figure out what to do when the server receives an undo or redo message: firstly what the resulting document is, but secondly what then to send out to the clients in order to update them. Only the server has the document history, not the clients. So the server has to interpret the undo and redo messages and update the clients with the consequence of those messages.

Deltas or Consequences?

It’s difficult to know where to start, in terms of figuring out what to write to disk. One decision I need to make is whether to try and store deltas, i.e. modifications to the document, or the consequences of those modifications, i.e. the full document.

  • A delta, or a change to the document, might look like: “change word with identifier 5ce63g from dog to cat
  • Whereas the consequence would be the resulting document: the best possible pet is a cat
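To make the delta concrete, here is a hypothetical shape for it in Go (the real wire format is the protocol from the earlier articles; these names are purely illustrative):

```go
// Illustrative only: a delta names the word, carries its bumped version,
// and gives the replacement letters.
type wordDelta struct {
	wordId  string // e.g. "5ce63g"
	version uint64 // the word's new version
	letters string // e.g. "cat"
}
```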

The client is already sending the server deltas; in last week’s article I covered how the client models the document as a linked list of words. When the user types, the client marks the modified words as dirty. Only the dirty words get sent up to the server.

The client would be happy to receive either deltas or consequences: it expects the full document whenever it establishes a connection to the server; but as each word has its own version, and the version is pretty much a Lamport Clock, the client would behave the same way regardless of whether it receives a stream of deltas or consequences. This is quite a nice aspect of the design: the protocol and algorithms I’ve covered so far are all (I believe) fairly simple, but they still permit flexibility in the design and implementation of other parts of the system.
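That per-word rule can be sketched in a few lines (hypothetical types, not the editor's actual ones): an incoming word only replaces the stored copy if its version is strictly greater, which is why a stream of deltas and a stream of consequences look the same to the client.

```go
type storedWord struct {
	version uint64
	letters string
}

// applyWord implements the Lamport-clock-style rule: an incoming update is
// ignored unless its version is strictly greater than the one we hold.
func applyWord(doc map[string]storedWord, id string, w storedWord) bool {
	cur, ok := doc[id]
	if ok && w.version <= cur.version {
		return false // stale update: ignore it
	}
	doc[id] = w
	return true
}
```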

The client doesn’t really care what it receives, so it doesn’t help guide my decision as to what to write to disk. I could still store endless copies of the full document on disk, but keep track of what’s been sent to the clients, and calculate the difference between the two whenever I need to update the clients. Thus I could still be storing consequences but sending deltas.

If I choose to store deltas, and those deltas include the plain undo and redo messages that the clients send, then what’s on disk would be a list of these deltas, from the creation of the empty document onwards. Replaying them, and correctly interpreting them, should result in the correct current state of the document. It feels quite nice that I could receive deltas from clients, append them to some file on the server, and make no further writes to disk at all: that seems fairly simple and efficient to me. It’s not at all clear how the interpretation of those deltas would work: I’d need to figure out some algorithms there; but it’s appealing to me that the interpretation of the deltas is a distinct activity to writing to disk. Writing to disk this way (not much data to write; appending only; not modifying existing data on disk) feels like it could be fast and efficient, and that would help the server to scale to many documents and many users. One downside I can see is that it’s not great to have to replay and interpret the document’s entire editing history from the dawn of time in order to calculate the current state of the document. Some sort of check-point mechanism might be helpful if I go down this route.

But what about the alternative: writing consequences? I have to keep the history of the document on disk so that I can restart the server and still be able to cope with new undo messages arriving. In the simplest design, I would write out the full document after every modification. This seems distasteful, but I want to see if other things get easier if I follow through with this design.

A scenario: a document has been created, and a few modifications have happened. The document currently consists of four words, “The brown fox jumps”. This is shown in figure 1, which also shows the full document after some preceding modifications.

'The red fox', then 'The red fox jumps', and now 'The brown fox jumps'

Figure 1: A few modifications have led to this document.

So now a user presses undo, and the client sends the server an undo message. What should I write to disk? It’s tempting to append the previous version of the document; this seems simple and reasonably intuitive. On disk, this would look like:

First 'The red fox', then 'The red fox jumps', followed by 'The brown fox jumps'. The user presses undo so lastly back to 'The red fox jumps'

Figure 2: After the user presses undo

What if the user now presses redo? How do I figure out that the redo is firstly legal (i.e. something has been undone which can now be redone), and secondly find where it is and append the consequence of the redo? After thinking about multiple undos and redos, this starts to look very tricky to me. Maybe I shouldn’t write the consequences of undo and redo to disk, and instead maintain some sort of pointer so that I know where in the sequence of normal edits I am?

First 'The red fox', then 'The red fox jumps', followed by 'The brown fox jumps'. The user presses undo but this moves the pointer back to the previous state of the document

Figure 3: After the user presses undo, but now with a pointer

This would mean for a normal edit, I must write to disk the new document and make sure the pointer is on disk pointing at the latest state of the document. If I receive an undo, I must update the pointer on disk, moving it one to the left (assuming I can go left), and if I receive a redo, I must update the pointer on disk moving it one to the right (assuming I can go right). What happens if I receive an undo followed by a normal edit?

The user presses undo, moving the pointer back to 'The red fox jumps', and then edits to become 'The red fox leaps'

Figure 4: The user presses undo and then edits

In this case, the consequence of the new edit must replace everything to the right of the pointer. So the consequence of an edit is not simply appended to the list of consequences; first that list must be truncated. I explored these sorts of sequences of events back in the 2nd article in this series. Having to maintain on disk this pointer is not the end of the world, and nor is having to truncate the list or overwrite existing data. But certainly the use of disk is now more complex than if I just write deltas.
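The pointer bookkeeping just described can be sketched as follows (a sketch of the design under discussion, not code from the repository; strings stand in for full document states on disk):

```go
// history models the on-disk list of document consequences plus the
// pointer. It must start with at least one state (the initial document).
type history struct {
	states []string
	ptr    int // index of the current document state
}

// edit truncates everything to the right of the pointer, then appends the
// new document state and points at it.
func (h *history) edit(doc string) {
	h.states = append(h.states[:h.ptr+1], doc)
	h.ptr = len(h.states) - 1
}

// undo moves the pointer one to the left, if it can go left.
func (h *history) undo() {
	if h.ptr > 0 {
		h.ptr--
	}
}

// redo moves the pointer one to the right, if it can go right.
func (h *history) redo() {
	if h.ptr < len(h.states)-1 {
		h.ptr++
	}
}
```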

Will this do though? If I implement this will it work? I want to think more about what happens when the server receives that undo message. I know it can update the pointer on disk, which will reveal to the server the correct new state of the document, and that document state now needs sending to all the clients. Assuming that on disk I’ve stored the document as a linked list of words, using the protocol I’ve created and used in previous articles, can I just read that document from disk and send it to all the clients?

Sadly, no. The reason why not affects me regardless of whether I'm writing out deltas or consequences, so this isn't a nail in the coffin for writing consequences. But it does show that the apparent simplicity of storing consequences is illusory.

Last week I worked through the algorithms in the client. The client will ignore an updated word it receives if the version of that word is not greater than the version it currently has. This is how the system brings some small amount of sanity to concurrent modification. If I draw out the consequences of the edits, but now add the versions of each word, in green, above the word, it would look like this:

Before the user presses undo, the words of 'The brown fox jumps' are at versions 3, 5, 3, 1, respectively.

Figure 5: Just before the user presses undo, with word versions shown

The server now receives undo. It wants to change the word brown back into the word red. The word brown is currently at version 5. If the server just reads the previous document and sends that out to all the clients, they'll receive an update trying to change the word to red, but that update will carry version 4. And so the clients will ignore it, because they already have that word at version 5. It needs to be sent out at version 6, or higher.

So, I have to make sure that the words in the new state of the document, that differ from what the server has previously sent to the clients, have higher version numbers than the client has previously seen. This is not as simple as bumping all the words in the previous state of the document; this is making sure the server knows, for each word, what is the highest version number the server has ever sent to the clients, and then bumping that. This is extra state that I could store on disk alongside the pointer: a mapping from WordId to Version to record the highest version of each word ever sent out.

This is the case for both undo and redo. Consider a sequence of events: undo, redo, undo, redo, undo, redo, then the server is restarted, and lastly another undo. How is the server meant to know which version numbers to use for this final undo? It should be at least 7 greater (because of the 6 previous events) than the version numbers recorded in the consequences on disk. So the consequences on disk, and the position of the pointer, are not sufficient on their own to allow me to derive the current version number for any word. Hence the extra state must be written to disk too. This now means for undo and redo events, the server will have to adjust the pointer, bump version numbers as appropriate, and then write all that data back to disk. For normal edits, it would have to do the same, plus truncating and appending the new document state to disk.

This design has become too ugly for me. There’s quite a lot of state on disk now, and the updates to that state are not trivial. I’m going to abandon this approach, and pick appending deltas (including plain undo and redo messages) to disk only. In other words, writing an event log to disk.

Interpreting deltas

The plan so far: receive messages from the clients, append them to disk in some way. For edit messages, the only other thing to do is to send the edit to everyone else. All the fun comes from undos and redos. This is why I gave Bob and Alice such death-stares back in episode 1 when they asked for this feature.

I shall define a term here: a frame. Each frame combines a single edit event with counts of the number of times that edit event has been undone, and redone. Every edit event is wrapped by exactly one frame. Assume I manage to transform my event log of deltas into a list of frames somehow. I could then walk through this list of frames to calculate the state of the document. If, for example, I see a frame where the undo count is greater than the redo count, then I know I shouldn’t apply the edit in that frame to the document state, at least at this point. But I’m getting ahead of myself; what exactly does this list of frames look like?
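A frame might be represented like this (a minimal sketch; the wrapped edit's type is left abstract):

```go
// frame wraps exactly one edit event together with counts of how many
// times that edit has been undone and redone.
type frame struct {
	edit      interface{} // the wrapped edit event
	undoCount int
	redoCount int
	undoNext  bool // true if the next undo/redo occurrence is an undo
}

// newFrame wraps an edit: undoCount = 0, redoCount = 0, undoNext = true.
func newFrame(edit interface{}) *frame {
	return &frame{edit: edit, undoNext: true}
}
```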

Each delta becomes a pointer to a frame. The same frame can appear several times.

Figure 6: Transforming a sequence of deltas into pointers to frames

The list of frames is the same length as the list of deltas. But, a single frame can appear several times in the list: the first time it appears (left most / oldest) will be when the frame’s edit message happened. The next time it appears will be the first time it was undone. The time after that is when it was redone, and so on.

I can now walk over this list of frames and calculate the document. An edit must be applied to the document at the correct point: if the edit was ultimately undone then it should never be applied. I.e. undoing an edit is the same as not applying that edit in the first place. If an edit was undone and redone then it should be applied to the document at the point where it was redone, and not earlier. I use the following logic to decide whether or not to apply an edit to the document as I loop through the list of frames:

  • If undoCount == 0 && redoCount == 0 && undoNext then apply the frame’s edit to the document. This frame will not occur later in the list.
  • Else, if undoNext then decrement undoCount by 1, set undoNext = false.
  • Else, if !undoNext then decrement redoCount by 1, set undoNext = true.
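This decision procedure can be sketched directly in Go (a sketch of the three rules above, not the repository's actual code):

```go
type frameCounters struct {
	undoCount int
	redoCount int
	undoNext  bool
}

// applyNow decides, for one occurrence of a frame in the list of frames,
// whether its edit is applied now, updating the counters per the rules.
func applyNow(f *frameCounters) bool {
	switch {
	case f.undoCount == 0 && f.redoCount == 0 && f.undoNext:
		return true // apply the edit; this frame will not occur again
	case f.undoNext:
		f.undoCount--
		f.undoNext = false
		return false
	default: // !f.undoNext
		f.redoCount--
		f.undoNext = true
		return false
	}
}
```

An edit that was undone once and redone once starts with undoCount = 1, redoCount = 1, undoNext = true, and its edit is applied only on the frame's third occurrence.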

If I try out a few base-cases, this appears to be correct:

  • If an edit was never undone (or redone), then its frame will appear once, undoCount and redoCount will be 0, and undoNext will be true (the default value when a frame is constructed). So the first time the frame is encountered in the list, its edit will be applied.

  • If an edit was undone (and never redone), then its frame will appear twice. Initially undoCount will be 1. When the frame is first encountered, the edit will not be applied, undoCount will be decremented to 0, and undoNext will be flipped from true to false. When the frame is next encountered, undoCount and redoCount will both be 0, but undoNext will be false, so the edit will not be applied, and the frame is never seen again. Thus the edit has been undone.

  • If an edit was undone and redone once each, then its frame will appear three times. Initially undoCount and redoCount will both be 1. The last time the frame is encountered, both undoCount and redoCount will be 0, and undoNext will be true, so the edit will be applied to the document at that point, and not before.

Earlier, I talked about the fact that undos and redos need to bump versions of words; they cannot use the versions of the words that are in the corresponding edit. If they do, then the clients will ignore the messages they receive from the server, as the versions will be too low. I can sort this out here:

  • The first time I see a frame, I must make sure that every word edited by the frame, that exists in the document state, has a version at least as high as in the edit. Think about the sequence of events as they would have been received by the server: a client would have sent the edit to the server, and any of the words in that edit that had greater version numbers than the server knew about would have won and altered the document. So I have to have that logic here: if the version number is greater in the edit, then copy the version over to the document. I just don’t necessarily apply the edit itself unless undoCount == 0 && redoCount == 0 && undoNext, as covered previously.

  • All other times I see the frame will correspond to undo and redo events. For these, every word in the frame’s edit should be looked up in the document state, and whatever version is found there should be bumped by 1. If you think back to the design idea for writing to disk consequences and not deltas, this is the same as keeping track of the highest version number seen for each word. It’s just here, I can derive it from the event log, and not have to write it out to disk explicitly.

Imagine a client edits word 5ce63g, changing its version from 4 to 5, and its letters from dog to cat. It sends this edit to the server, which appends it to disk. The client then sends an undo message to the server. The server appends this to disk too, and then recalculates the document. The list of frames ends with two pointers to the same frame: firstly for this edit, and secondly for its undo.

The first time the frame is encountered, word 5ce63g will exist in the document state, with the letters dog. I copy the version from the frame’s edit, thus making sure word 5ce63g is at version 5 (or greater) in the document state. In the frame, undoCount == 1, so the edit is not applied. I decrement undoCount to 0, and set undoNext to false. The next (and final) item in the list of frames is this same frame again, corresponding to the undo event. I bump the version of word 5ce63g in the document state by 1. Although undoCount is now 0, undoNext is false, so the edit is never applied, which is what we want: the edit was undone!

The final document state contains word 5ce63g at version 6 (or greater), with the letters dog. I can send this to all the clients: the version is greater than any version I’ve received from the clients, so they will apply it, and the letters are dog, not cat. I.e. the edit has been correctly undone.

Building the list of frames

This list of frames seems to do the job, but how do I build it? If I walk through the list of deltas, then for each delta, I need to figure out what the current frame is, and append that current frame to the list of frames.

Assume that each frame has previous and next pointers to other frames. These will not relate to the order of frames in the list of frames. They simply help with determining the current frame.

In pseudo-code:

for delta in eventLog:
  if isEdit(delta):
    frame := newFrame(delta) // undoCount = 0, redoCount = 0, undoNext = true
    frame.previous = currentFrame
    currentFrame.next = frame
    currentFrame = frame
    frames = append(frames, currentFrame)

  if isUndo(delta): // it's an undo of the current frame
    currentFrame.undoCount += 1
    frames = append(frames, currentFrame)
    currentFrame = currentFrame.previous

  if isRedo(delta): // it's a redo of the next frame
    currentFrame = currentFrame.next
    currentFrame.redoCount += 1
    frames = append(frames, currentFrame)
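As a sanity check, here is one possible Go transcription of this walk (again with hypothetical types, plus a sentinel root frame so the previous/next pointers never dereference nil). Note the ordering for undo events: the current frame is appended again before stepping back to its previous frame, so an undone frame really does appear again in the list of frames:

```go
package main

import "fmt"

type Delta struct {
	Kind string // "edit", "undo", or "redo"; hypothetical encoding
}

type Frame struct {
	UndoCount, RedoCount int
	UndoNext             bool
	Previous, Next       *Frame
}

// buildFrames walks the event log once, maintaining the current frame, and
// appends the relevant frame pointer for every delta.
func buildFrames(log []Delta) []*Frame {
	var frames []*Frame
	current := &Frame{} // sentinel root: "before the first edit"
	for _, d := range log {
		switch d.Kind {
		case "edit":
			f := &Frame{UndoNext: true, Previous: current}
			current.Next = f
			current = f
			frames = append(frames, current)
		case "undo": // an undo of the current frame
			current.UndoCount++
			frames = append(frames, current)
			current = current.Previous
		case "redo": // a redo of the next frame
			current = current.Next
			current.RedoCount++
			frames = append(frames, current)
		}
	}
	return frames
}

func main() {
	// edit, then undo it, then redo it: the same frame appears three times.
	frames := buildFrames([]Delta{{"edit"}, {"undo"}, {"redo"}})
	fmt.Println(len(frames), frames[0] == frames[2], frames[0].UndoCount, frames[0].RedoCount)
}
```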


It all seems to be falling into place: writing an event log is nice and simple, and hopefully fast and efficient. For edit events, I can send the event out to all the other clients and that’s it. For undo and redo events I can’t send those out to the clients because the clients don’t have the edit history of the document, and so the server has to calculate what the document has become. I’ve covered the algorithms necessary to do this, which rely on replaying the event log … from the dawn of time.

Yeah. That might not scale for a large document. I’m quite happy to believe using a little more CPU to calculate the document from the event stream is preferable to storing lots of extra data and state on disk. But it’s only a matter of time before the event log becomes so long that replaying it is slow. At least I only have to replay it on undo and redo events: I would expect these to be rarer than normal edits. Even so, this is something I’d like to fix.

Replaying the event log, and applying the calculated frames to the document state, starts with an empty document state. There’s no reason it has to be empty. From time to time, I could write out the complete document, creating a check-point. I could then load that, and only have to process events that happened after that check-point was created. This would put a bound on the maximum number of events that would need to be replayed, thus guaranteeing acceptable levels of performance.

Well, almost. Suppose the server creates a check-point, and then the very next event it receives from a client is an undo. I do not want the undo to undo the check-point: that would result in an empty document! No, instead, I have to ignore the check-point, and look further back in the event log to identify the normal edit event which matches up with this new undo.

Creating the list of frames now changes: instead of walking forwards from the dawn of time, I walk backwards from the most recent event. If I reach a check-point, and none of the frames I’ve created so far are missing their edit event, then I can stop, and rely on that check-point: I can start with the document state as defined by the check-point, and interpret the frames in the normal forwards direction. But if there are some frames which are missing their edit event, then I must skip past this check-point and keep heading back in time. This explains why I’ve not been linking to the real code so far: it runs backwards! This also gives me some clues about when it’s appropriate to try to create a check-point: it’s only sensible just before writing out an edit event. Creating a check-point before an undo or redo event is appended to the event log is counter-productive because that check-point will always have to be ignored and skipped over.

If your cat falls asleep on your keyboard and is pressing Ctrl-z, then your entire document will be undone. This design does mean that as more and more of your document is undone, each undo will take longer and longer because it has to replay ever more of the event log. But I think that’s a reasonable trade-off: when editing a document, I often find myself undoing and redoing changes I’ve made recently (and it’s those events that have now been optimised to some extent by the addition of check-points), but it’s rare that I (deliberately) undo hours and hours of work. So I believe this design should be acceptable.

Code and Technology

In terms of writing to disk, there are lots of options: my requirements are very light. I could use individual files and append directly using basic file operations; one file per document seems reasonable. If I put the check-points within the event log then I’d need a way of indicating that an event is a check-point or not; I could extend the bebop protocol to do this conveniently, or introduce some extra random byte to do this inconveniently. I would want to be careful about making sure fsync and friends get called correctly and the data really does get written to disk. I’d probably attempt a simple length-prefix encoding.

At the other end of the spectrum, I could reach straight for an SQL store. But I am not a fan. Every time you have to transcribe some data structure from one language to another, you hit problems: little inconsistencies that bite you hard later on. So far I’ve been very lucky that the modelling of the document in TypeScript, and Bebop, and (it turns out) in Go is all very uniform. But I don’t want to push my luck and attempt it in SQL. Modelling data structures and objects in general in SQL is always a disaster, in my opinion; ORMs group data together in tables in unnatural ways that do not reflect the structure of the data at all. Also, from a testing and deployment point of view, it massively complicates things: it’s a huge extra moving part which needs a lot of careful choreography.

Instead I’m picking a half-way house: bbolt. It’s an embedded key-value store, which supports transactions. Keys and values are simple byte arrays. Perfect for what I need: the values will be the bytes I receive from the client off the WebSocket, and bbolt supports auto-incrementing sequence numbers which will help with the keys. It also supports cursors which will allow me to easily walk backwards from the end of the event log. It’s just a Go library so it compiles in to the binary: no external moving parts to mess about with. Simple semantics, and easy to test with and deploy.

Aside: I did also try to use badger which may well have better performance than bbolt. However, I found the quality of the documentation and API design to be much poorer than bbolt, so I abandoned badger.

The code that loads the event log from disk should have some resemblance to the earlier pseudo-code. Some of the logic is a bit different because it’s working backwards over the event log, not forwards. Building the document from the frames looks fairly different. One of the reasons for this is that the server always keeps track of the current frame. This is very useful as it allows me to easily check whether an undo or redo message I’ve just received is legal or not, by looking at the previous and next fields. But it means that the logic about updating undoCount and redoCount etc is slightly different (though equivalent, hopefully!) than how I’ve introduced it in this prose, because object management is a little different.

Filtering edits for words that have greater version numbers should seem very similar to last week’s article, as should performing garbage collection. Because Go supports structural equality, I can have WordIds as keys in a map and not have to convert to strings like I had to in JavaScript. Although I’ve not discussed it, the server does filter edits it receives from clients before writing to disk or sending to other clients. This means that if the server receives an ancient edit from a slow client, it may well not need to write anything to disk or send any update to any client if all the edited words in the received message are already out of date. It’s an easy optimisation to make and makes sure that use of disk and network bandwidth is minimal.

I mentioned much earlier that even if I store consequences on disk, I could still send deltas to the clients by keeping track of what’s already been sent to them. Well, the server recalculates the document after an undo or redo event. Rather than send the entire document to the clients, I do keep track of what the server currently thinks it has sent to the clients (i.e. the current state of the document), and so I can calculate the difference and continue to send only deltas to the clients.
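That delta calculation is just a per-word version comparison. A minimal Go sketch, with hypothetical types and names:

```go
package main

import "fmt"

// Hypothetical, simplified representation of a word in the document.
type Word struct {
	Version int
	Letters string
}

// delta returns only the words in next that a client which has seen prev
// does not yet know about: a word is included when it is new or its
// version has grown. The real server's types and bookkeeping will differ.
func delta(prev, next map[string]Word) map[string]Word {
	out := map[string]Word{}
	for id, w := range next {
		if old, ok := prev[id]; !ok || w.Version > old.Version {
			out[id] = w
		}
	}
	return out
}

func main() {
	// What the server believes the clients currently have...
	sent := map[string]Word{"5ce63g": {5, "cat"}}
	// ...versus the freshly recalculated document (aa11bb is a made-up id).
	recalculated := map[string]Word{"5ce63g": {6, "dog"}, "aa11bb": {1, "the"}}
	d := delta(sent, recalculated)
	fmt.Println(len(d), d["5ce63g"].Letters)
}
```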

Creating a check-point is not much more than appending the current state of the document to disk. And when a new client connects, the server can send a check-point of the document state to the new client as its part of the initial synchronisation with the client. This is equivalent to the client marking all the words it knows about in the document as dirty and sending all of those up to the server, which I discussed last week.

There are a very large number of ways of skinning this particular cat and safely storing the necessary data on disk. I have no doubt there are many other solutions that are better under various metrics, from simplicity to performance. Hopefully not robustness though, as I have done some solid testing on this! That said, the server currently completely trusts the changes any client sends, which could cause problems: there’s nothing to stop the client sending changes that make the linked list of words turn into a cycle. That could cause significant problems, for example the current GC algorithm would never terminate if this happens. Ooops!

If you come up with your own design and want to share it with me please do; I’m curious about other designs. You can reach me by email at matthew at this website’s domain name, or on twitter (DMs are open).

For anyone having a gander at the server code, you’ll spot it seems to be written using some sort of actors library. I think that’s going to be the subject of next week’s article: I’m a big fan of using actors for concurrent programming, and I’d like to spend some words extolling its virtues (and also write a tutorial for my actors library)! After that, it’s on to testing: I’ve already written some integration tests and some soak tests. Hopefully I’ll add some fuzz tests before this series comes to a close.

December 23, 2021 11:01 AM

December 20, 2021

Tweag I/O

Nix 2.4 and 2.5

A couple of weeks ago Nix 2.4 was released. This was the first release in more than two years. More than 195 individuals contributed to this release. Since Tweag is the biggest contributor to the Nix project, I’d like to highlight some of the features that Tweag has worked on.


Flakes

Flakes are a new format to package Nix-based projects in a more discoverable, composable, consistent and reproducible way. A flake is just a repository or tarball containing a file named flake.nix that specifies dependencies on other flakes and returns any Nix assets such as packages, Nixpkgs overlays, NixOS modules or CI tests.
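For illustration, a minimal flake.nix might look something like this (the description and the particular outputs shown here are just one possibility, not a canonical template):

```nix
{
  description = "A demo flake";

  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-21.11";

  outputs = { self, nixpkgs }: {
    packages.x86_64-linux.hello = nixpkgs.legacyPackages.x86_64-linux.hello;
    defaultPackage.x86_64-linux = self.packages.x86_64-linux.hello;
  };
}
```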

You can read more about flakes in the following blog posts:

The development of flakes was sponsored by Target Corporation and Tweag.

Content-addressed store

Nix’s store can now be content-addressed, meaning that the hash component of a store path is the hash of the path’s contents. Previously Nix could only build input-addressed store paths, where the hash is computed from the derivation dependency graph. Content-addressing allows deduplication, early cutoff in build systems, and unprivileged closure copying.

The content-addressed store (CAS) is described in detail in RFC 0062. It is still marked as experimental, and your input is welcome. You can read more about CAS in these blog posts:

CAS was developed by Tweag and Obsidian Systems, who were supported by an IPFS Grant.

UX improvements

The Nix command line interface (CLI) - commands such as nix-env and nix-build - is pretty old and doesn’t provide a very good user experience. A couple of years ago we started working on a new CLI: a single nix command to replace the nix-* commands that aims to be more modern, consistent, discoverable and pleasant to use.

However, work on the new CLI had stalled somewhat because we didn’t have a discoverable packaging mechanism for Nix projects. Thanks to flakes, we now do! As a result, in Nix 2.4, the nix command has seen a lot of work and is now almost at feature parity with the old CLI. It is centered around flakes; for example, a command like

> nix run nixpkgs#hello

runs the hello application from the nixpkgs flake.

Most of the work on the new CLI was done by Tweag. We organized a Nix UX team to review the state of the Nix user experience and plan improvements. A major result of the UX team is a set of CLI guidelines for the Nix project. More UX improvements are coming up, including an interactive progress indicator.

Experimental features and release schedule

The previous Nix release (2.3) was in September 2019. Having a 2-year gap between releases is something we want to avoid in the future, since it’s bad for both contributors and users that there is an unbounded amount of time before a new feature shows up in a stable release. The thing that has historically caused long gaps between Nix releases is new experimental features landing in master that we weren’t quite sure about, and doing a new release meant having to support these features indefinitely. However, Nix 2.4 introduces a mechanism to mark features as experimental, requiring them to be enabled explicitly on the command line or in the nix.conf configuration file. Thanks to this, we can merge experimental features in a way that still allows them to be changed or removed, while still getting feedback from adventurous users.
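For example, to opt in to the flakes and new-CLI experimental features, a user adds a line like the following to nix.conf:

```
experimental-features = nix-command flakes
```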

Therefore, starting with Nix 2.4, we have switched to a 6-weekly release schedule, meaning that we do a new release every 6 weeks. In fact, Nix 2.5.0 was already released a few days ago!

Non-blocking garbage collector

A very old annoyance with large Nix stores (such as CI systems) is that garbage collection could take a long time, and during that time, you couldn’t start new builds. Instead you would get the infamous message

waiting for the big garbage collector lock...

Nix 2.5 has a new garbage collector that makes this a thing of the past: the collector no longer prevents new builds from proceeding. The development of the new collector was sponsored by Flox.

December 20, 2021 12:00 AM

December 19, 2021

Matthew Sackman

Static Site Generator

Quick little bonus post this week. I wrote my own static site generator. If you are familiar with Go and Go’s templates then you might like it. It’s only 1200 lines of code so it doesn’t do a great deal, but its semantics are quite simple and it’s quite flexible. See the README for the details and some examples.

I got fed up with Hugo. I think it’s not aimed at programmers really. It doesn’t let you call templates from within posts, and its serve functionality would endlessly crash over editor backup files. Despite that ticket being marked closed, it still happens a lot, for a lot of people.

Writing a static site generator is a fun little project; after 3 days I had everything I wanted, including a server that watches the filesystem for changes and automatically rebuilds. It’s certainly not as polished or slick as others out there, but it does what I wanted: simple, predictable, and flexible. It’s what builds this site now.

December 19, 2021 04:01 PM

December 17, 2021


Type-level sharing in Haskell, now

The lack of type-level sharing in Haskell is a long-standing problem. It means, for example, that this simple Haskell code

apBaseline :: Applicative f => (A -> B -> C -> r) -> f r
apBaseline f =
        pure f
    <*> pure A
    <*> pure B
    <*> pure C

results in core code that is quadratic in size, due to type arguments; in pseudo-Haskell, it will look something like

apBaseline :: Applicative f => (A -> B -> C -> r) -> f r
apBaseline f =
                          (pure f)
    <*> @A @(B -> C -> r) (pure A)
    <*> @B @(     C -> r) (pure B)
    <*> @C @(          r) (pure C)

Each application of (<*>) records both the type of the argument that we are applying, as well as the types of all remaining arguments. Since the latter is linear in the number of arguments, and we have a linear number of applications of <*>, the core size of this function becomes quadratic in the number of arguments.

We recently discovered a way to solve this problem, in ghc as it is today (tested with 8.8, 8.10, 9.0 and 9.2). In this blog post we will describe the approach, as well as introduce a new typechecker plugin (available from Hackage or GitHub) which makes the process more convenient.

It is exciting that we now have an answer to a previously unsolved problem, but it must be admitted that the resulting code is not very elegant, and fairly unpleasant to write. It will be necessary to explore this new approach, find new ways to use it and adapt it to improve the user experience. We are nonetheless releasing this blog post, and the plugin, in the hope that others might feel inspired to experiment with it and devise new ways in which it can be used.

Vanilla ghc

Before we introduce the new typechecker plugin, we will first demonstrate the concept in vanilla ghc. Here’s the main idea: we will represent a type-level let-binding as an existentially quantified type variable, along with an equality that specifies the value of that variable; the equality will be opaque to ghc until we reveal it.

That probably sounds rather abstract, so let’s make this more concrete:

data a :~: b where             -- defined in base (Data.Type.Equality)
  Refl :: a :~: a

data LetT :: a -> Type where   -- new
  LetT :: (b :~: a) -> LetT a

The LetT constructor has two type variables, a and b; b is the existential type variable mentioned above, while a is a regular type variable, and will correspond to the value of the type variable we are “let-binding”. In other words, think of this as a type-level assignment b := a. The argument b :~: a records the equality between a and b; it is opaque to ghc in the sense that ghc will not be aware of this equality until we pattern match on the Refl constructor.

When we construct a let-binding, a and b will (by definition) have the same value, and so we can introduce a helper function:

{-# NOINLINE letT #-}
letT :: LetT a
letT = LetT Refl

This gives us a let-binding with value a, for an existential variable that we will discover when we pattern match on the LetT constructor.1

This is all probably still quite abstract, so let’s see a simple example of how we might use this:

castSingleLet :: Int -> Int
castSingleLet x =
    case letT of { LetT (p :: b :~: Int) ->   -- (1)
      let x' :: b
          x' = case p of Refl -> x            -- (2)
      in case p of Refl -> x' }               -- (3)

In (1), we introduce a type-level let-binding b := Int. Then in (2) we define a value x' of type b; we know that b := Int, but ghc doesn’t, and so we explicitly pattern match on the equality proof. Finally, in (3) we want to use x' as the result of the function; for this we need to cast back from b to Int.

Of course, this example is a bit pointless, so let’s consider how we might actually use this to solve a problem.

Heterogenous lists

We will come back to the applicative example from the introduction a bit later, but let’s consider a slightly simpler example first. Recall this definition of heterogenous lists:

data HList :: [Type] -> Type where
  HNil  :: HList '[]
  HCons :: x -> HList xs -> HList (x : xs)

Without type-level sharing, we cannot construct values of type HList without resulting in quadratic core code size, for much the same reason as before. For example,

hlistBaseline :: HList '[A, B, C]
hlistBaseline =
      HCons A
    $ HCons B
    $ HCons C
    $ HNil

will be expanded with type variables to something like

hlistBaseline :: HList '[A, B, C]
hlistBaseline =
      HCons @'[A, B, C] A
    $ HCons @'[   B, C] B
    $ HCons @'[      C] C
    $ HNil

where we again have a linear number of calls to HCons, each of which has a list of type arguments which is itself linear; hence, this value is quadratic in size.

Let’s fix that. Instead of repeating the list each time, we will introduce type-level sharing so that we can express, “this list is like that other list over there, but with an additional value at the front”. Let’s first define the various type-level lists:

hlist1 :: HList '[A, B, C]
hlist1 =
   case letT of { LetT (p2 :: r2 :~: (C : '[])) ->   -- r2 := C : []
   case letT of { LetT (p1 :: r1 :~: (B : r2 )) ->   -- r1 := B : r2
   case letT of { LetT (p0 :: r0 :~: (A : r1 )) ->   -- r0 := A : r1

With the type-level lists defined, we can now define the corresponding values. Just like before, we need to cast explicitly. For example, the list HCons C HNil has type HList (C : '[]); we know that this is the same as HList r2, but to convince ghc of that fact, we need to appeal to the explicit equality.

let xs2 :: HList r2
    xs1 :: HList r1
    xs0 :: HList r0

    xs2 = case p2 of Refl -> HCons C HNil
    xs1 = case p1 of Refl -> HCons B xs2
    xs0 = case p0 of Refl -> HCons A xs1

Finally, we need to cast back from HList r0 to HList '[A, B, C]; we will need to appeal to all equalities in order to do so. The full function is:

hlist1 :: HList '[A, B, C]
hlist1 =
   case letT of { LetT (p2 :: r2 :~: (C : '[])) ->
   case letT of { LetT (p1 :: r1 :~: (B : r2 )) ->
   case letT of { LetT (p0 :: r0 :~: (A : r1 )) ->

     let xs2 :: HList r2
         xs1 :: HList r1
         xs0 :: HList r0

         xs2 = case p2 of Refl -> HCons C HNil
         xs1 = case p1 of Refl -> HCons B xs2
         xs0 = case p0 of Refl -> HCons A xs1

     in case p0 of { Refl ->
        case p1 of { Refl ->
        case p2 of { Refl ->

          xs0 }}}


Unpacking equalities

It is critical that ghc cannot see the equalities we introduce, because if it did, it would just unfold the definition and we’d lose the sharing we worked so hard to introduce. Nonetheless, the need to match on all these equality proofs in order to cast values to the right type is certainly inconvenient. It is also easy to get wrong; we will discuss those problems in this section, but fortunately we can avoid the need for pattern matching altogether when we use the typelet type checker plugin; we will introduce this plugin in the next section.

The reason that the pattern matches are easy to get wrong is that we need to match in the right order. Concretely, if instead of the order above, we instead did

case p2 of { Refl ->
case p1 of { Refl ->
case p0 of { Refl ->

we would end up with quadratic code again.

This is due to the shape of the equality proof that ghc constructs: xs0 has type HList r0, but we want to use it at type HList '[A, B, C]. There are sufficient equalities in scope to enable ghc to prove that these two types are in fact the same, which is why the program is accepted, but the resulting core code will look like

   xs0 `cast` {- .. proof that HList r0 ~ HList '[A, B, C] .. -}

In the linear version (where we pattern match on p0 first), we end up with the proof

let co2 :: r2 ~ (C : [])
    co1 :: r1 ~ (B : r2)
    co0 :: r0 ~ (A : r1)

    co2 = ..
    co1 = ..
    co0 = ..

in .. co2 .. co1 .. co0 ..

which mirrors our own definitions very closely. However, if we match in the wrong order, we get this proof instead:

let co2 :: r2 ~ '[      C]
    co1 :: r1 ~ '[   B, C]
    co0 :: r0 ~ '[A, B, C]

    co2 = ..
    co1 = .. co2 ..
    co0 = .. co1 ..

in .. co0 ..

When we unpack the equalities in the right order, ghc first learns that r0 ~ (A : r1), without yet knowing what r1 is, and so it just constructs a proof for that; similarly, on the next equality, it learns that r1 ~ (B : r2), without knowing what r2 is, and so it constructs the corresponding proof (without modifying the proof it generated previously). When we do things in the opposite order, ghc first learns that r2 ~ (C : '[]); then when it learns that r1 ~ (B : r2), it already knows what r2 is, and so it constructs a proof for r1 ~ (B : C), and we have lost sharing.

Of course, we don’t really want these proofs at all, and indeed, when we use the plugin, we won’t get them.

The typelet typechecker plugin

To use the typelet typechecker plugin, just add

{-# OPTIONS_GHC -fplugin=TypeLet #-}

at the top of your module. When we use the plugin, we can write the HList example as

hlistLet :: HList '[A, B, C]
hlistLet =
    case letT (Proxy @(C : '[])) of { LetT (_ :: Proxy r2) ->
    case letT (Proxy @(B : r2))  of { LetT (_ :: Proxy r1) ->
    case letT (Proxy @(A : r1))  of { LetT (_ :: Proxy r0) ->

     let xs2 :: HList r2
         xs1 :: HList r1
         xs0 :: HList r0

         xs2 = castEqual (HCons C HNil)
         xs1 = castEqual (HCons B xs2)
         xs0 = castEqual (HCons A xs1)

     in castEqual xs0 }}}

We still need to be explicit about when we want to cast, but we don’t need to be explicit anymore about how we want to cast. As an additional bonus, the resulting core code also doesn’t have any coercion proofs. (We will see below how we can rewrite this example more compactly using another combinator from the library.)

In the remainder of this blog post we will discuss the type system extension provided by the plugin. We will not discuss how it works internally; it is a reasonably simple type checker plugin and a discussion of the implementation is not relevant for our goal here, which is type-level sharing.

Let and Equal

The typelet library introduces two new classes, Let and Equal. Let is defined as

class Let (a :: k) (b :: k)

A constraint Let x t, where x is an existentially bound type variable, models a type-level let-binding x := t. Only constraints of this shape (with x a variable2) are valid, and let-bindings cannot be recursive; if either of these conditions are not met, the plugin will emit a type error.

Let has a single instance for reflexivity (much like the use of Refl in letT above):

instance Let a a

In order to introduce the existential type variable, we define a LetT type much like we did above, but now carrying evidence of a Let constraint:

data LetT (a :: k) where
  LetT :: Let b a => Proxy b -> LetT a

letT :: Proxy a -> LetT a
letT p = LetT p

Of course, introducing let bindings is only one half of the story. We must also be able to apply them. This is where the second class, Equal, comes in:

class Equal (a :: k) (b :: k)

castEqual :: Equal a b => a -> b
castEqual = unsafeCoerce

Equal is a class without any instances; constraints Equal a b are instead solved by the plugin. The function castEqual allows us to coerce from a to b whenever the plugin can prove that a and b are equal3. Formally:

    Q defines substitution σ        Q ⊩ σ(a) ~N σ(b)
    ------------------------------------------------
                    Q ⊩ Equal a b

In other words, the Let constraints define an (idempotent) substitution, and an Equal a b constraint is solvable whenever a and b are nominally equal types after applying that substitution.

For a trivial example, two types that are already nominally equal will also be Equal:

castReflexive :: Int -> Int
castReflexive = castEqual

The following example is slightly more interesting, and is the equivalent of castSingleLet that we already saw above, but now using the plugin:

castSingleLet :: Int -> Int
castSingleLet x =
    case letT (Proxy @Int) of
      LetT (_ :: Proxy t1) ->
        let y :: t1
            y = castEqual x
        in castEqual y

We saw a more realistic example above, in the definition of hlistLet.

Combining a type-level let with a cast

In castSingleLet we define a type variable t1, and then immediately cast a value to that type. That is such a common idiom that the typelet library provides a custom combinator for it:

data LetAs f (a :: k) where
  LetAs :: Let b a => f b -> LetAs f a

letAs :: f a -> LetAs f a
letAs x = LetAs x

Most of the time, we don’t want to hide the entire type of some value, because then that value would become unusable without a cast (we’d have no information about its type). That’s why LetAs is parameterised by some functor f; when we have a value of type f a, letAs introduces a let-binding b := a, and then casts the value to f b. Here is a simple example:

castSingleLetAs :: Identity Int -> Identity Int
castSingleLetAs x =
    case letAs x of
      LetAs (x' :: Identity t1) ->
        castEqual x'

For a more realistic example, let’s consider the HList example once more. Here too we see the same idiom: we introduce a type-level let binding for the type-level list, and then cast a term-level value. Using letAs we can do that in one go:

hlistLetAs :: HList '[A, B, C]
hlistLetAs =
    case letAs (HCons C HNil) of { LetAs (xs02 :: HList t02) ->
    case letAs (HCons B xs02) of { LetAs (xs01 :: HList t01) ->
    case letAs (HCons A xs01) of { LetAs (xs00 :: HList t00) ->
      castEqual xs00 }}}


Both letT and letAs introduce a data constructor, only for us to then directly pattern match on it again. The obvious question then is whether we might be able to avoid this using CPS form. Indeed we can, but we do have to be careful. The library defines CPS forms of both letT and letAs:

letT' :: forall r a. Proxy a -> (forall b. Let b a => Proxy b -> r) -> r
letT' pa k = case letT pa of LetT pb -> k pb

letAs' :: forall r f a. f a -> (forall b. Let b a => f b -> r) -> r
letAs' fa k = case letAs fa of LetAs fb -> k fb

The problem is that these abstractions introduce a type variable for the continuation (r), which may itself require sharing. The “obvious but wrong” translation of hlistLetAs, above, is

hlistLetAsCPS_bad :: HList '[A, B, C]
hlistLetAsCPS_bad =
    letAs' (HCons C HNil) $ \(xs02 :: HList t02) ->
    letAs' (HCons B xs02) $ \(xs01 :: HList t01) ->
    letAs' (HCons A xs01) $ \(xs00 :: HList t00) ->
      castEqual xs00

This is wrong, because the continuation variable r for each call to letAs' is HList '[A, B, C]: the type of the final result. But this means that we have n occurrences of the full n elements of the list, and so we are back to code that is O(n²) in size. If we do want to use CPS form, we have to introduce one additional let binding for the final result:

hlistLetAsCPS :: HList '[A, B, C]
hlistLetAsCPS =
    letT' (Proxy @'[A, B, C]) $ \(_ :: Proxy r) -> castEqual $
      letAs' @(HList r) (HCons C HNil) $ \(xs02 :: HList t02) ->
      letAs' @(HList r) (HCons B xs02) $ \(xs01 :: HList t01) ->
      letAs' @(HList r) (HCons A xs01) $ \(xs00 :: HList t00) ->
        castEqual xs00


For the HList example, we can give a type-level let for the tail of the list (C ': []), and at the same time provide a value of that type. The fact that we do things “in the same order” at the type level is what made it possible to use the letAs combinator.

Unfortunately, that is not always the case. For example, in the applicative chain example from the introduction the order doesn’t quite work out: we must give type-level let bindings starting at the end

l02 := C -> r
l01 := B -> l02
l00 := A -> l01

but we need to apply arguments in the reverse order (A, then B, then C). Put another way, associativity at the type-level and at the term-level match for the HList example (both right-associative), but mismatch for the applicative example (right associative for the function type and left associative for function application).

Perhaps we will discover further combinators that help with this example, but for now it means that we need to use the more verbose option to write the function:

apLet :: forall f r. Applicative f => (A -> B -> C -> r) -> f r
apLet f =
    case letT (Proxy @(C -> r))   of { LetT (_ :: Proxy l02) ->
    case letT (Proxy @(B -> l02)) of { LetT (_ :: Proxy l01) ->
    case letT (Proxy @(A -> l01)) of { LetT (_ :: Proxy l00) ->

      let f00 :: f l00
          f01 :: f l01
          f02 :: f l02

          res :: f r

          f00 = pure (castEqual f)
          f01 = castEqual f00 <*> pure A
          f02 = castEqual f01 <*> pure B
          res = castEqual f02 <*> pure C

      in res
    }}}
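For comparison, the baseline that apLet introduces sharing for is the plain applicative chain from the introduction. A sketch, with A, B, and C given nullary constructors here since the post's definitions aren't shown:

```haskell
data A = A deriving (Eq, Show)
data B = B deriving (Eq, Show)
data C = C deriving (Eq, Show)

-- The unshared baseline: each application step repeats the remaining
-- function type in full, which is what leads to quadratic core.
apBaseline :: Applicative f => (A -> B -> C -> r) -> f r
apBaseline f = pure f <*> pure A <*> pure B <*> pure C
```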



Is all this really worth it? It depends. For the applicative chain, the difference is not huge, and the let-bindings add some overhead. For GADTs, however, the difference is much more dramatic, both after desugaring and after the simplifier. (Core-size graphs for the applicative chain, and for the HList example after desugaring and after the simplifier, are omitted here.)

This is all with -O0; the primary goal here is to optimize compilation time during development. Talking of compilation time, let’s measure that too. Unfortunately, the performance of the HList example using letAs (rather than the CPS version) depends critically on the performance of ghc’s pattern match exhaustiveness checker, which differs quite a bit between ghc versions. Let’s first disable that altogether:

-Wno-incomplete-patterns -Wno-incomplete-uni-patterns -Wno-overlapping-patterns

With these options, compilation time for the three HList variations (baseline without sharing, letAs, and the CPS version letAs') are similar across ghc 8.8, 8.10, 9.0 and 9.2, and look something like (compilation time in ms versus number of entries in the list):

With the exhaustiveness checker enabled, times vary wildly between ghc versions for the non-CPS version (note: these graphs have different ranges on their y-axes):

(graphs for ghc 8.8, 8.10, 9.0 and 9.2)

The non-CPS version is up to 1.6x faster than the baseline in 8.8, but up to 44x slower in 8.10. The situation improves a bit in 9.0, but it’s still up to 10x slower than the baseline, until sanity is restored in 9.2 and the non-CPS version is up to 3x faster than the baseline.

The CPS version meanwhile is more stable across versions: up to 3x faster in 8.8 and 8.10, a slightly less impressive improvement of up to 2.4x faster in 9.0, and then back to up to 3.4x faster in 9.2.


The typelet type checker plugin offers an API that makes it possible to introduce type-level sharing in ghc as it is today (tested with 8.8, 8.10, 9.0 and 9.2). This is pretty exciting, but like any new abstraction, we need to experiment with it to discover the best way it can be used. We are releasing the plugin as well as this blog post at this early stage in the hope that others will feel inspired to try it and share their discoveries.

Our own motivation for developing this now is our continued efforts on behalf of Juspay to improve their compile times. In particular, we are currently looking at how we could support large anonymous records that compile in less-than-quadratic core space. Of course, type level sharing is only one weapon in our arsenal if we want to optimize for compilation time; my previous two blog posts in this series (Avoiding quadratic core code size with large records and Induction without core-size blow-up) discuss many others, and the search is not yet over.

  1. For this version (without the plugin), it is crucial that letT is marked as NOINLINE, because if it isn’t, even with -O0 the optimizer will inline it, evaluate the case expressions, and we lose all sharing again.↩︎

  2. In fact, x must be a skolem variable: one that cannot unify with anything.↩︎

  3. It is possible that the lack of an explicit dependency of castEqual on the evidence for Equal (and then, transitively, the lack of an explicit dependency of the evidence for Equal on the Let constraints that justify it), may under certain circumstances result in ghc’s optimizer floating the unsafeCoerce too far up. It is not clear to me at present whether this can actually happen. Although the Equal evidence is trivial, it does at least record the full types of the left-hand side and right-hand side of the equality; we have also marked castEqual NOINLINE in an attempt to make sure that the unsafeCoerce does not escape the scope of the Equal evidence. Nonetheless, it is conceivable that we might have to revisit this.↩︎

by edsko, andres, adam, sam at December 17, 2021 12:00 AM

December 15, 2021

Tweag I/O

21.11 Zero Hydra Failures Moscow Hackathon: Report and Suggestions

Twice a year, in May and November, nixpkgs maintainers and contributors get together to fix as many build failures as possible for the new release. The event is, as per tradition, called Zero Hydra Failures, or ZHF for short.

This year, fellow hacker cab404 and I organized a hackathon to help the cause and spread the Nix love around. The basic premise was, quoting cab404:


We wanted to fix as many builds as possible, right before the branchoff. Fixing the broken builds would allow NixOS 21.11 to have a more complete package set.

The main point of this post is to share the experience with people looking to organize similar events.


Due to the current lockdown in Moscow, we weren’t able to decide whether the hackathon would happen at all until about a week before the last possible date (the branchoff). This limited our ability to advertise the event in time for all potentially interested people to be able to join. Despite this, we tried our best to advertise the event using the following channels:


Obviously, we should have planned the event earlier. A week’s notice is way too little for many people, especially on a Friday. In hindsight, we could have anticipated that the branch-off was going to be late and run the event on Saturday.


The event took place in undefspace, both physically and virtually.

We had provided lots of tea and snacks to physical participants, and a build machine to speed up builds.


First of all, the hackerspace was quite small. All of us managed to fit, but only just. Had the event attracted any more people, it would have become a problem. Plan your capacity!

Another problem was with the build server setup: while it was running, we didn’t have time to provide people with actual instructions on using it, so a lot of time was spent building packages on slow laptop CPUs instead of the significantly more powerful build machine. The theme of lack of instructions limiting the impact of the event deserves a separate section.


As often happens, most participants came in late. This meant that the spoken instructions I gave at the beginning weren’t heard by everyone, which slowed down the hacking while people tried to understand the process.

Another issue was that the written instructions on the website were aimed mostly at experienced contributors, but most participants didn’t have much nixpkgs experience — in fact, two of them made their first open-source contributions during the hackathon!


The takeaway here is that a lot of attention should be given to making instructions for the hacking you’re going to do — make sure to have something for newcomers as well as experienced hackers.


Friends were made

Picture from the hackathon

Nix{,OS,pkgs, Flakes} knowledge was shared

One of the best things about in-person hackathons is the ability to share knowledge and ideas — and plenty were shared!

  • I have learned some neat git-over-ssh tricks from Cab (in particular, I was pleasantly surprised that one doesn’t need any special git server for an ssh remote!). Also, thanks to Alexander I know about genericBuild which runs all phases as is done inside the nix sandbox during a regular nix build. And finally, I have confirmed I can install NixOS on a headless box with only the keyboard connected!
  • Nommy and Alexander explored the wonders of Nix Flakes: learned the usage basics (nix build, nix shell, nix develop), wrote their first flake.nix-es, and used their new knowledge to simplify the process of build debugging;
  • Anton has refreshed his Nix skills and explored the nixpkgs Haskell infrastructure in the process of fixing the mime-strings package.
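The git-over-ssh point above generalizes nicely: git needs no server-side software beyond git itself. A small sketch with hypothetical paths (over ssh, the remote URL would simply be user@host:path/to/repo.git):

```shell
# Create a bare repository to act as the "server" side.
git init --bare /tmp/zhf-demo-remote.git

# Create a local repository and point it at the bare one; over ssh this
# would be: git remote add origin myuser@example.com:repos/project.git
git init /tmp/zhf-demo-local
git -C /tmp/zhf-demo-local remote add origin /tmp/zhf-demo-remote.git
```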

And, most importantly, builds were fixed!

In total, 10 PRs were submitted during the hackathon:

We fixed too many packages to count manually, and it’s not an easy thing to count programmatically. However, the openjfx11 fix on x86_64-linux has fixed a lot of Java packages, and other pull requests typically fixed one or two packages.

How are we going to improve the next one?

Pick the right time, in advance: we will try our best to arrange the hackathon on a weekend, with at least two weeks’ notice.

Inform people about the build server: a fast build server speeds up the debugging process significantly. Telling people about it, together with instructions on setting up a build server, is important.

Provide better instructions for all skill levels: prominently displayed instructions on what exactly people need to do, together with links to learning materials for novices, should reduce the need for repeated explanations tête-à-tête, and speed up the hacking significantly.

December 15, 2021 12:00 AM

December 13, 2021

Gabriel Gonzalez

Funding isn't the problem with open source


I’m writing this partially in response to the recent log4j vulnerability, where I see several people suggest that funding open source projects could have prevented this security meltdown from happening. In this post I hope to convince you that the biggest problem with open source projects isn’t the lack of funding.

Don’t get me wrong: I do believe that open source contributors should be paid a comfortable wage so that they can contribute full-time to the project. However, there is a greater disease that afflicts open source projects and funding them in the wrong way will only worsen the affliction.

The true problem with open source projects is catering to the needs of large and exploitative tech companies. Open source projects will continue to compromise on quality, ethics, and fair working conditions so long as they view adoption by these companies as the highest measure of success.

The log4j vulnerability

I don’t mean to dunk on log4j or its maintainers, but the log4j vulnerability perfectly illustrates the problem with putting the needs of big business first.

The maintainers of the log4j project knew that one of the lesser-known features was potentially problematic (although perhaps they underestimated the impact). However, they did not remove the feature out of concern for breaking backwards compatibility. This is covered in more detail here:

As the above post notes, if large tech companies had funded the log4j project that would have only increased the pressure to not break backwards compatibility. Large companies are notorious for being loath to migrate their codebases to accommodate breaking changes, to the point that they will significantly fall behind on upgrades or vendor an open source project. These companies consistently place their priorities ahead of the needs of the wider open source ecosystem.


The log4j incident is a symptom of a larger problem: many large and publicly-traded companies are exploitative and abusive and open source projects that simp for these large companies aren’t doing themselves any favors.

Not all companies are bad, but we can all tell when a given company has lost all sense of ethics or social responsibility when they do things like:

  • engaging in anti-competitive business practices

  • exposing sensitive user data through willful neglect

  • busting unions unscrupulously

  • doing business with authoritarian regimes

  • not even attempting to morally justify their actions

    “The shareholders made us do it”

Is having an ethically dubious logo on your project’s page really something to be proud of? Or is it actually a red flag? Think about it:

Do you believe that a company that asks their employees to cut corners won’t ask open source projects they sponsor to hack around problems?

Do you believe that a company that colludes with other employers to depress wages will agree to fair working conditions for the open source projects they depend on?

Do you believe that a company that has compromised on its own morals won’t pressure its dependencies to do the same?

Free software vs open source

I’m not exaggerating when I say that businesses encourage open source developers to compromise on their morals, because this has already happened.

The most prominent example is that the predecessor to the open source movement was the free software movement, which was not just about making source code freely available, but also securing certain freedoms for end users, such as the right to inspect, modify and run the software their lives depend on.

The free software movement is fundamentally a moral movement, and whether or not you agree with their goals, their stated purpose is grounded in ethical arguments, not a business justification. Richard Stallman discusses this in Why Open Source Misses the Point of Free Software:

For the free software movement, free software is an ethical imperative, essential respect for the users’ freedom. By contrast, the philosophy of open source considers issues in terms of how to make software “better”—in a practical sense only. It says that nonfree software is an inferior solution to the practical problem at hand.

In contrast, the open source movement originated when businesses pressured developers to compromise on their ethical principles (especially copyleft licenses) in favor of being as easy to commercialize as possible (especially with permissive licenses). The same essay notes:

That is, however, what the leaders of open source decided to do. They figured that by keeping quiet about ethics and freedom, and talking only about the immediate practical benefits of certain free software, they might be able to “sell” the software more effectively to certain users, especially business.

… and they were successful, if you define success as wildly enriching tech companies.


I’m not an expert on fixing this problem, but I also don’t believe that open source developers need to accept exploitative working conditions. I’ll briefly mention some alternatives that people have shared with me when I’ve brought up this subject in the past.

First off, funding is still important (despite this post’s title) and anybody interested in doing open source full-time should read:

… and if you do accept sponsorships, try to steer clear of accepting funding from companies with a dubious record. Even if they never explicitly threaten to withdraw their sponsorship the implicit risk will always loom over you and influence your actions.

If you form a business around your open source software you should prefer more ethical and sustainable business models. For example, consider staying away from venture capital and instead see if something like a co-op might be more appropriate, which is described in more detail here:

This is more common than you think and you can find a list of open source projects backed by co-ops here:

However, open source developers should first reflect upon what success looks like for their project before pursuing funding. If your measure of success is “large companies use my project or fund my project”, you still run the risk of being taken advantage of by those same companies. If your goal is to be used then, well, you’ll be used.

More generally, I believe that open source development should return to its roots as a free software movement guided by moral principles. Doing so would help the open source community set better boundaries, which would in turn improve software quality, funding, and working conditions. Without a moral center to give developers a spine, they’ll continue to race to the bottom to please corporate interests.

by Gabriella Gonzalez ( at December 13, 2021 04:25 PM


GHC activities report: October-November 2021

This is the ninth edition of our GHC activities report, which describes the work on GHC and related projects that we are doing at Well-Typed. The current edition covers roughly the months of October and November 2021.

You can find the previous editions collected under the ghc-activities-report tag.

A bit of background: One aspect of our work at Well-Typed is to support GHC and the Haskell core infrastructure. Several companies, including IOHK, Facebook, and GitHub via the Haskell Foundation, are providing us with funding to do this work. We are also working with Hasura on better debugging tools. We are very grateful on behalf of the whole Haskell community for the support these companies provide.

If you are interested in also contributing funding to ensure we can continue or even scale up this kind of work, please get in touch.

Of course, GHC is a large community effort, and Well-Typed’s contributions are just a small part of this. This report does not aim to give an exhaustive picture of all GHC work that is ongoing, and there are many fantastic features currently being worked on that are omitted here simply because none of us are currently involved in them in any way. Furthermore, the aspects we do mention are still the work of many people. In many cases, we have just been helping with the last few steps of integration. We are immensely grateful to everyone contributing to GHC. Please keep doing so (or start)!


The current GHC team consists of Ben Gamari, Andreas Klebinger, Matthew Pickering, Zubin Duggal and Sam Derbyshire.

Many others within Well-Typed, including Adam Gundry, Alfredo Di Napoli, Alp Mestanogullari, Douglas Wilson and Oleg Grenrus, are contributing to GHC more occasionally.


  • Ben released GHC 9.2.1 and started work on preparing 9.2.2.
  • Ben and Zubin continue to finalize the 9.0.2 release.


  • Sam introduced Concrete# constraints, which allow GHC to enforce the representation-polymorphism invariants during typechecking. This brings GHC in line with the French approach (generate constraints then solve them).

  • Matt has started to refactor how the provenance of type variables is tracked to solve the dreaded “no skolem info” error messages (#20732).

  • Sam rectified some oddities surrounding defaulting in type families, fixing #17536.

  • Sam improved the way GHC understands class declarations with no methods in hs-boot files (#20661).

  • Sam fixed various bugs in the typechecker, such as incoherence stemming from GHC confusing Type and Constraint in some circumstances (#20521) and a panic involving implication constraints (#20043).

  • Sam ensured that GHC rejects GADT pattern matches in arrow notation, as Alexis King discovered that the current implementation suffers from severe problems (#20469, #20470).

Code generation


  • Ben fixed a desugarer bug that caused modules to fail to load in GHCi (#20570), enabling the entirety of GHC to be loaded into GHCi.

Runtime system

  • Ben debugged a runtime crash, identifying the cause as an interaction between the garbage collector’s treatment of CAFs and the linker’s code unloading logic (#20649).

  • Ben debugged and fixed a missing non-moving write barrier in the MVar implementation (#20399)

Error messages

  • Sam has been improving the error messages reported to the user in the case of unsolved constraints. For example, GHC can now remind the user about how overlapping instances work (#20542).


  • Matt has finished the multiple home units patch (!6805) which is now waiting for review. The patch allows multiple packages to be compiled in one GHC session. The largest example which has been tried so far is loading the whole of head.hackage at once. This amounts to 4700 modules and 450 packages in a single session.

  • Ben investigated a compilation failure on Windows (#20682) which ended up being due to an incorrect object-merging implementation in Cabal. To avoid this, he implemented a new GHC mode, --merge-objs, which tools like Cabal can use to avoid repeating subtle linking logic throughout the ecosystem.

  • Matt fixed and clarified the logic around regenerating interface files (!6846) in --make mode. This only affects projects which have hs-boot files, but was causing confusing core lint errors for packages such as Agda when built with HEAD.

  • Matt corrected some more bugs in -dynamic-too recompilation checking and tried to tidy up some loose ends to do with -dynamic-too. The testsuite coverage is also now much better for this feature (!6583).

  • Matt enabled support in GHCi for CApi FFI calls (!6904).


  • Matt and Andreas have been working on improvements to ticky profiling. Each ticky counter is now given a source location using the info table map. This makes it much easier to work out which part of your program each counter has come from. We have also modernised how to inspect a ticky profile by adding support to eventlog2html. The profile is now rendered as an interactive, searchable, sortable table rather than the fixed textual format.

  • Ben fixed a data race in the ticky profiler which can result in runtime hangs during profile generation (#20451)

  • Ben introduced support for running hpc code coverage collection on GHC and is working to significantly improve performance of the hpc library


  • Andreas revived an older patch for adding HasCallStack constraints to some notorious functions in base. The patch was initially written by Oleg Grenrus and the proposal was discussed in #17040.

  • Matt has finished the revert of the Data.List specialisation patch which caused a large amount of unexpected ecosystem churn. The proposal now awaits a new plan from the revamped CLC.

  • Ben integrated text-2.0 into the compiler, fixing a number of issues pertaining to C++ linkage in the process (#20346).

  • Ben debugged a link failure when building foreign libraries on Windows and submitted a fix upstream to Cabal (#20520).

Compiler Performance

  • Matt continued to look into memory usage and found some large improvements for GHCi users. Now the memory overhead when reloading packages should be lower (!6773) and some leaks when using -fno-code were sorted out (!6775). Peak memory usage when loading Agda into GHCi is reduced by half, from 5GB to 2.5GB.

  • Matt, Sam and Zubin worked together to identify that part of the backpack implementation had adverse memory usage consequences for standard users. (!6763)

  • Matt wrote down (!6758) some heap structure invariants which we think should hold. These invariants are things which can be checked using ghc-debug.

  • Adam and Sam have continued to work on directed coercions. These store less information than ordinary coercions, which helps avoid generating quadratically-large Core when reducing type families (which is one of the main causes behind slow compilation of programs using type families, #8095).

Runtime Performance

  • Andreas looked at some issues regarding the CmmSink optimization in GHC (#20679, #20334). They were partially resolved by !6981. The remaining issues are discussed in #20679 and there is a WIP patch in !6988 which should fix these for good. This improves register allocation in certain edge cases involving unlifted data types or records with a large number of fields.


  • Matt stabilised the CI performance tests by realising they were sensitive to the size of the environment and hence the length of a commit message (!6612).

  • Ben worked to fix a number of remaining testsuite failures in the statically-linked Alpine build (#20574, #20523, #20706).

  • Ben reworked the provisioning of the FreeBSD CI runners, started work to add FreeBSD 13 targets, and worked to debug many of the testsuite failures present on FreeBSD (#20095, #19723, #20354).

  • Ben reworked the infrastructure for managing the Linux runners provided by Azure.

  • Ben fixed a number of packaging issues (#19963, #20592, #20707).

  • Ben started collecting patches to remove the make build system from GHC, in preparation for a full migration to Hadrian.

by ben, andreask, matthew, zubin, sam, adam at December 13, 2021 12:00 AM

December 08, 2021

Magnus Therning

Magit/forge and self-hosted GitLab

As I found the documentation for adding a self-hosted instance of GitLab to magit/forge a bit difficult, I thought I'd write a note for my future self (and anyone else who might find it useful).

First put the following in `~/.gitconfig`

[gitlab ""]
  user = my.username

Then create an access token on GitLab. I ticked api and write_repository, which seems to work fine so far. Put the token in ~/.authinfo.gpg

machine login my.user^forge password <token>

(Remember that a newline is needed at the end of the file.)

Finally, add the GitLab instance to 'forge-alist

 '(("" "" "" forge-gitlab-repository)
   ("" "" "" forge-github-repository)
   ("" "" "" forge-gitlab-repository))

That's it!

December 08, 2021 05:57 AM

November 30, 2021

Matt Parsons

RankNTypes via Lambda Calculus

RankNTypes is a language extension in Haskell that allows you to write even more polymorphic programs. The most basic explanation is that it allows the implementer of a function to pick a type, rather than the caller of the function. A very brief version of this explanation follows:

The Typical Explanation

Consider the identity function, or const:

id :: a -> a
id x = x

const :: a -> b -> a
const a b = a

These functions work for any types that the caller of the function picks. Which means that, as implementers, we can’t know anything about the types involved.
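One way to make “the caller picks the type” visible is GHC’s TypeApplications extension (a side sketch, not part of the original explanation), where the caller spells out its instantiations explicitly:

```haskell
{-# LANGUAGE TypeApplications #-}

-- The caller, not the implementer, instantiates the type variables:
anInt :: Int
anInt = id @Int 5                    -- a := Int

aChar :: Char
aChar = const @Char @Bool 'a' True   -- a := Char, b := Bool
```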

Let’s say we want to apply a function to each element in a tuple. Without a type signature, we can write:

applyToBoth f (a, b) = (f a, f b)

Vanilla Haskell will provide this type:

applyToBoth :: (a -> a) -> (a, a) -> (a, a)

This is a perfectly useful type, but what if we want to apply it to a tuple containing two different types? Well, we can’t do anything terribly interesting with that - if we don’t know anything about the type, the only thing we can provide is id.

applyToBoth :: (forall x. x -> x) -> (a, b) -> (a, b)
applyToBoth f (a, b) = (f a, f b)

And that forall x inside the parentheses is a RankNType. It allows the implementer of the function to select the type that the function will be used at, and the caller of the function must provide something sufficiently polymorphic.
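Calling it shows the flip side: the caller must supply something that works at every type, so effectively only identity-like functions are accepted. A small sketch:

```haskell
{-# LANGUAGE RankNTypes #-}

applyToBoth :: (forall x. x -> x) -> (a, b) -> (a, b)
applyToBoth f (a, b) = (f a, f b)

-- OK: id works at every type
both :: (Int, Bool)
both = applyToBoth id (3, True)

-- Rejected by the type checker: `not` only works at Bool
-- bad = applyToBoth not (3, True)
```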

This explanation is a bit weird and difficult, even though it captures the basic intuition. It’s not super obvious why the caller or the implementer gets to pick types, though. Fortunately, by leveraging the lambda calculus, we can make this more precise!

Whirlwind Tour of Lambda

Feel free to skip this section if you’re familiar with the lambda calculus. We’re going to work from untyped, to simply typed, and finally to the polymorphic lambda calculus. This will be sufficient for us to get a feeling for what RankNTypes are.

Untyped Lambda Calculus

The untyped lambda calculus is an extremely simple programming language with three things:

  1. Variables
  2. Anonymous Functions (aka lambdas)
  3. Function Application

This language is Turing complete, surprisingly. We’ll use Haskell syntax, but basically, you can write things like:

id = \x -> x

const = \a -> \b -> a

apply = \f -> \a -> f a

Simply Typed Lambda Calculus

The simply typed lambda calculus adds an extremely simple type system to the untyped lambda calculus. All terms must be given a type, and we will have a pretty simple type system - we’ll only have Unit and function arrows. A lambda will always introduce a function arrow, and a function application always eliminates it.

id :: Unit -> Unit
id = \(x :: Unit) -> x

idFn :: (Unit -> Unit) -> (Unit -> Unit)
idFn = \(f :: Unit -> Unit) -> f

const :: Unit -> Unit -> Unit
const = \(a :: Unit) -> \(b :: Unit) -> a

apply :: (Unit -> Unit) -> Unit -> Unit
apply = \(f :: Unit -> Unit) -> \(a :: Unit) -> f a

This is a much less powerful programming language - it is not even Turing Complete. This makes sense - type systems forbid certain valid programs that are otherwise syntactically valid.

The type system here can only refer to the constants that we provide. Since we only have Unit and -> as valid type constants, we have a super limited ability to write programs. We can still do quite a bit - natural numbers and Boolean types are perfectly expressible, but many higher order combinators are impossible.
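The Boolean claim can be made concrete: Church booleans are typeable with nothing but Unit and arrows. A sketch in Haskell, with a local Unit type standing in for the calculus's base type:

```haskell
data Unit = Unit deriving (Eq, Show)

-- Church booleans at the monomorphic type Unit -> Unit -> Unit:
-- `true` selects its first argument, `false` its second.
true, false :: Unit -> Unit -> Unit
true  = \t -> \_ -> t
false = \_ -> \f -> f

-- if-then-else is just function application
ifThenElse :: (Unit -> Unit -> Unit) -> Unit -> Unit -> Unit
ifThenElse b t f = b t f
```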

Let’s add polymorphic types.

Polymorphic Lambda Calculus

The magic of the lambda calculus is that we have a means of introducing variables. The problem with the simply typed lambda calculus is that we don’t have variables at the type level. So we can introduce type variables.

Like Haskell, we’ll use forall to introduce type variables. In a type signature, the syntax will be the same. However, unlike Haskell, we’re going to have explicit type variable application and introduction at the value level as well.

Let’s write id with our new explicit type variables.

id :: forall a. a -> a
id = forall a. \(x :: a) -> x

Let’s write const and apply.

const :: forall a. forall b. a -> b -> a
const = forall a. forall b. \(x :: a) -> \(y :: b) -> x

apply :: forall a. forall b. (a -> b) -> a -> b
apply = forall a. forall b. \(f :: a -> b) -> \(x :: a) -> f x

Finally, let’s apply some type variables.

constUnit :: Unit -> Unit -> Unit
constUnit = 
    const @Unit @Unit 

idUnitFn :: (Unit -> Unit) -> (Unit -> Unit)
idUnitFn = 
    id @(Unit -> Unit)

idReturnUnitFn :: forall a. (a -> Unit) -> (a -> Unit)
idReturnUnitFn =
    forall a. id @(a -> Unit)

constUnitFn :: Unit -> (Unit -> Unit) -> Unit
constUnitFn = 
    const @Unit @(Unit -> Unit)

We’re passing types to functions. With all of these simple functions, the caller gets to provide the type. If we want the implementer to provide a type, then we’d just put the forall inside a parentheses. Let’s look at the applyBoth from above. This time, we’ll have explicit type annotations and introductions!

applyBoth
    :: forall a. forall b. (forall x. x -> x) -> (a, b) -> (a, b)
applyBoth =
    forall a. forall b.           -- [1]
    \(f :: forall x. x -> x) ->   -- [2]
    \((k, h) :: (a, b)) ->        -- [3]
        (f @a k, f @b h)          -- [4]

There’s a good bit going on here, so let’s break it down on a line-by-line basis.

  1. Here, we’re introducing our type variables a and b so that we can refer to them in the type signatures of our variables, and apply them to our functions.
  2. Here, we’re introducing our first value parameter - the function f, which itself has a type that accepts a type variable.
  3. Now, we’re accepting our second value parameter - a tuple (k, h) :: (a, b). We can refer to a and b in this signature because we’ve introduced them in step 1.
  4. Finally, we’re supplying the type @a to our function f in the left hand of the tuple, and the type @b to the type in the right. This allows our types to check.

Let’s see what it looks like to call this function. To give us some more interesting types to work with, we’ll include Int and Bool literals.

foo :: (Int, Bool)
foo = 
    applyBoth
        @Int @Bool 
        _f
        (3, True)

We haven’t decided what _f will look like exactly, but the type of the value is forall x. x -> x. So, syntactically, we’ll introduce our type variable, then our value-variable:

foo :: (Int, Bool)
foo = 
    applyBoth
        @Int @Bool 
        (forall x. \(a :: x) -> (_ :: x))
        (3, True)

As it happens, the only value we can possibly plug in here is a :: x to satisfy this. We know absolutely nothing about the type x, so we cannot do anything with it.

foo :: (Int, Bool)
foo = 
    applyBoth
        @Int @Bool 
        (forall x. \(a :: x) -> a)
        (3, True)

Tug of War

applyBoth is an awful example of RankNTypes because there’s literally nothing useful you can do with it. The reason is that we don’t give the caller of the function any options! By giving the caller of the function more information, they can do more useful and interesting things with the results.

This mirrors the guarantee of parametric polymorphism. The less that we know about our inputs, the less we can do with them - until we get to types like const :: a -> b -> a where the implementation is completely constrained.

What this means is that we provide, as arguments to the callback function, more information!

Let’s consider this other type:

applyBothList :: (forall x. [x] -> Int) -> ([a], [b]) -> (Int, Int)
applyBothList f (as, bs) = 
    (f as, f bs)

Now the function knows a good bit more: we have a list as our input (even if we don’t know anything about the type), and the output is an Int. Let’s translate this to our polymorphic lambda calculus.

applyBothList =
    forall a. forall b.
    \(f :: forall x. [x] -> Int) ->
    \( as :: [a], bs :: [b] ) ->
    ( f @a as, f @b bs )

When we call this function, this is what it looks like:

applyBothList
    @Int @Char
    (forall x. \(xs :: [x]) -> length @x xs * 2)
    ( [1, 2, 3], ['a', 'b', 'c', 'd'] )


In Haskell, a type class constraint is elaborated into a record-of-functions that is indexed by the type.

class Num a where
    fromInteger :: Integer -> a
    (+) :: a -> a -> a
    (*) :: a -> a -> a
    (-) :: a -> a -> a
    -- etc...

-- under the hood, this is the same thing:

data NumDict a = NumDict
    { fromInteger :: Integer -> a
    , (+) :: a -> a -> a
    , (-) :: a -> a -> a
    , (*) :: a -> a -> a
    }

When you have a function that accepts a Num a argument, GHC turns it into a NumDict a and passes it explicitly.

-- Regular Haskell:
square :: Num a => a -> a
square a = a * a

-- What happens at runtime:
square :: NumDict a -> a -> a
square NumDict {..} a = a * a

Or, for a simpler variant, let’s consider Eq.

-- Regular Haskell:
class Eq a where
    (==) :: a -> a -> Bool

-- Runtime dictionary:
newtype EqDict a = EqDict { (==) :: a -> a -> Bool }

-- Regular:
allEqual :: (Eq a) => a -> a -> a -> Bool
allEqual x y z =
    x == y && y == z && x == z

-- Runtime dictionary:
allEqual :: EqDict a -> a -> a -> a -> Bool
allEqual (EqDict (==)) x y z =
    x == y && y == z && x == z

(Note that binding a variable name to an operator is perfectly legal!)
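For instance, a minimal sketch (with a hypothetical helper name) that binds the operator as an ordinary argument, shadowing the Eq method:

```haskell
-- bothEqual is a made-up example: (==) here names the function argument,
-- not Eq's method, so no Eq constraint is needed at all.
bothEqual :: (a -> a -> Bool) -> a -> a -> a -> Bool
bothEqual (==) x y z = x == y && y == z
```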

One common way to extend the power or flexibility of a RankNTypes program is to include allowed constraints in the callback function. Knowing how and when things come into scope can be tricky, but if we remember our polymorphic lambda calculus, this becomes easy.

Consider this weird signature:

weirdNum :: (forall a. Num a => a) -> String
weirdNum someNumber = 
    show (someNumber @Int)

This isn’t exactly a function. What sort of things can we call here?

Well, we have to produce an a. And we know that we have a Num a constraint. This means we can call fromInteger :: Integer -> a. And, we can also use any other Num methods - so we can add to it, double it, square it, etc.

So, calling it might look like this:

main = do
    putStrLn $ weirdNum (fromInteger 3 + 6 * 2)

Let’s elaborate this to our lambda calculus. We’ll convert the type class constraint into an explicit dictionary, and then everything should work normally.

weirdNum =
    \(number :: forall a. NumDict a -> a) ->
        show @Int intShowDict (number @Int intNumDict)

Now, let’s call this:

weirdNum
    ( forall a. 
      \(dict :: NumDict a) -> 
        fromInteger dict 3
    )

More on the Lambda Calculus

If you’ve found this elaboration interesting, you may want to consider reading Type Theory and Formal Proof. This book is extremely accessible, and it taught me almost everything I know about the lambda calculus.

November 30, 2021 12:00 AM

November 28, 2021

Joachim Breitner

Zero-downtime upgrades of Internet Computer canisters

TL;DR: Zero-downtime upgrades are possible if you stick to the basic actor model.


DFINITY’s Internet Computer provides a kind of serverless compute platform, where the services are WebAssembly programs called “canisters”. These services run without stopping (or at least that’s what it feels like from the service’s perspective; this is called “orthogonal persistence”), and process one message after another. Messages not only come from the outside (“ingress” calls), but are also exchanged between canisters.

On top of these uni-directional messages, the system provides the concept of “inter-canister calls”, which associates a response message with the outgoing message, and guarantees that a response will come. This RPC-like interface allows canister developers to program in the popular async/await model, where these inter-canister calls look almost like normal function calls, and the subsequent code is suspended until the response comes back.

The problem

This is all very well, until you try to upgrade your canister, i.e. install new code to fix a bug or add a feature. Because if you used the await pattern, there may still be suspended computations waiting for the response. If you swap out the program now, the code of that suspended computation will no longer be present, and the response cannot be handled! Worse, because of an infelicity with the current system’s API, when the response comes back, it may actually corrupt your service’s state.

That is why upgrading a canister requires stopping it first, which means waiting for all outstanding calls to come back. During this time, your canister is not available for new calls (so there is downtime), and worse, the length of the downtime is at the whims of the canisters you called – they could withhold the response ad infinitum, rendering your canister unupgradeable.

Clearly, this is not acceptable for any serious application. In this post, I’ll explore some of the ways to mitigate this problem, and how to create canisters that are safely and instantaneously (no downtime) upgradeable.

It’s a spectrum

Some canisters are trivially upgradeable, for others all hope is lost; it depends on what the canister does and how. As an overview, here is the spectrum:

  1. A canister that never performs inter-canister calls can always be upgraded without stopping.
  2. A canister that only does one-way calls, and does them in a particular way (see below), can always be upgraded without stopping.
  3. A canister that performs calls, and where it is acceptable to simply drop outstanding responses, can always be upgraded without stopping, once the System API has been improved and your Canister Development Kit (CDK; Motoko or Rust) has adapted.
  4. A canister that performs calls, but uses explicit continuations to handle responses instead of the await convenience, based on an eventually fixed System API, can be upgraded without stopping, and will even handle responses afterwards.
  5. A canister that uses await to do inter-canister calls cannot be upgraded without stopping.

In this post I will explain 2, which is possible now, in more detail. Variants 3 and 4 only become reality if and when the System API has improved.

One-way calls

A one-way call is a call where you don’t care about the response; neither the replied data, nor possible failure conditions.

Since you don’t care about the response, you can pass an invalid continuation to the system (technical detail: a Wasm table index of -1). Because it is invalid for any (realistic) Wasm module, it will stay invalid even after an upgrade, and the problem of silent corruption mentioned above is avoided. And otherwise it’s fine for this to be invalid: it means the canister “traps” once the response comes back, which is harmless (and possibly even cheaper than a do-nothing computation).

This requires your CDK to support this kind of call. Mostly incidentally, Motoko (and Candid) actually have the concept of a one-way call in their type system, namely shared functions with return type () instead of async ... (Motoko is actually older than the system, and not every prediction about what the system will provide has proven successful). So, pending this PR to be released, Motoko will implement one-way calls in this way. On Rust, you have to use the System API directly or wait for cdk-rs to provide this ability (patches welcome, happy to advise).

You might wonder: How are calls useful if I don’t get to look at the response? Of course, this is a set-back – calls with responses are useful, and await is convenient. And if you have to integrate with an existing service that only provides normal calls, you are out of luck.

But if you get to design the canister and all called canisters together, it may be possible to use only one-way messages. You’d be programming in the plain actor model now, with all its advantages (simple concurrency, easy to upgrade, general robustness).

Consider for example a token ledger canister, not unlike the ICP ledger canister. For the most part, it doesn’t have to do any outgoing calls (and thus be trivially upgradeable). But say we need to add notify functionality, where the ledger canister tells other canisters about a transaction. This is a good example for a one-way call: Maybe the ledger canister doesn’t care if that notification was received? The ICP ledger does care (once it comes back successful, this particular notification cannot be sent again), but maybe your ledger can do it differently: let the other canister confirm the receipt via another one-way call, instead of via the reply; or simply charge for each notification and do not worry about repeated notifications.

Maybe you want to add archiving functionality, where the ledger canister streams its data to an archive canister. There, again, instead of using successful responses to confirm receipt, the archive canister can ping the ledger canister with the latest received index directly.

Yes, it changes the programming model a bit, and all involved parties have to play together, but the gain (zero-downtime upgrades) is quite valuable, and removes a fair number of other sources of issues.

And in the future?

The above is possible with today’s Internet Computer. If the System API improves the way I hope it will, you have a possible middle ground: You still don’t get to use await and instead have to write your response handler as separate functions, but this way you can call any canister again, and you get the system’s assistance in mapping responses to calls. With this in place, any canister can be rewritten to a form that supports zero-downtime upgrades, without affecting its interface or what the canister can do.

by Joachim Breitner ( at November 28, 2021 05:11 PM

November 27, 2021

Magnus Therning

Fallback of actions

In a tool I'm writing I want to load a file that may reside on the local disk, but if it isn't there I want to fetch it from the web. Basically it's very similar to having a cache and dealing with a miss, except in my case I don't populate the cache.

Let me first define the functions to play with

loadFromDisk :: String -> IO (Either String Int)
loadFromDisk k@"bad key" = do
    putStrLn $ "local: " <> k
    pure $ Left $ "no such local key: " <> k
loadFromDisk k = do
    putStrLn $ "local: " <> k
    pure $ Right $ length k

loadFromWeb :: String -> IO (Either String Int)
loadFromWeb k@"bad key" = do
    putStrLn $ "web: " <> k
    pure $ Left $ "no such remote key: " <> k
loadFromWeb k = do
    putStrLn $ "web: " <> k
    pure $ Right $ length k

Discarded solution: using the Alternative of IO directly

It's fairly easy to get the desired behaviour but Alternative of IO is based on exceptions which doesn't strike me as a good idea unless one is using IO directly. That is fine in a smallish application, but in my case it makes sense to use tagless style (or ReaderT pattern) so I'll skip exploring this option completely.
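For the record, the behaviour being discarded looks like the following sketch (not the tool's actual code): IO's Alternative is built on exception handling, so (<|>) catches the IOError raised by its first argument and runs the second action instead.

```haskell
import Control.Applicative ((<|>))

-- IO's (<|>) recovers from an IOError thrown by the first action:
-- the exception is caught and the second action runs instead.
recovered :: IO Int
recovered = (ioError (userError "no such local key") >> pure 1) <|> pure 2

main :: IO ()
main = recovered >>= print
```

This is exactly the behaviour we want, but it only works because failure is encoded as an exception, which is why it's unattractive outside of plain IO.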

First attempt: lifting into the Alternative of Either e

There's an instance of Alternative for Either e in version 0.5 of transformers. It's deprecated and it's gone in newer versions of the library as one really should use Except or ExceptT instead. Even if I don't think it's where I want to end up, it's not an altogether bad place to start.

Now let's define a function using liftA2 (<|>) to make it easy to see what the behaviour is

fallBack ::
    Applicative m =>
    m (Either String res) ->
    m (Either String res) ->
    m (Either String res)
fallBack = liftA2 (<|>)
λ> loadFromDisk "bad key" `fallBack` loadFromWeb "good key"
local: bad key
web: good key
Right 8

λ> loadFromDisk "bad key" `fallBack` loadFromWeb "bad key"
local: bad key
web: bad key
Left "no such remote key: bad key"

The first example shows that it falls back to loading from the web, and the second one shows that it's only the last failure that survives. The latter part, that only the last failure survives, isn't ideal but I think I can live with that. If I were interested in collecting all failures I would reach for Validation from validation-selective (there's one in validation that should work too).

So far so good, but the next example shows a behaviour I don't want

λ> loadFromDisk "good key" `fallBack` loadFromWeb "good key"
local: good key
web: good key
Right 8

or to make it even more explicit

λ> loadFromDisk "good key" `fallBack` undefined
local: good key
*** Exception: Prelude.undefined
CallStack (from HasCallStack):
  error, called at libraries/base/GHC/Err.hs:79:14 in base:GHC.Err
  undefined, called at <interactive>:451:36 in interactive:Ghci4

There's no short-circuiting!1

The behaviour I want is of course that if the first action is successful, then the second action shouldn't take place at all.

It looks like either <|> is strict in its second argument, or maybe it's liftA2 that forces it. I've not bothered digging into the details, it's enough to observe it to realise that this approach isn't good enough.
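One way to see what's going on: liftA2 g a b is just g <$> a <*> b, and (<*>) for IO runs both actions unconditionally; Either's <|> only combines the two results after both effects have already happened. A small demonstration (using Maybe, whose Alternative instance lives in base, to sidestep the deprecated Either instance):

```haskell
import Control.Applicative (liftA2, (<|>))

-- Both IO actions run before (<|>) ever sees their results, so the
-- success of the first cannot prevent the second's effects.
demo :: IO (Maybe Int)
demo = liftA2 (<|>) (putStrLn "ran first"  >> pure (Just 1))
                    (putStrLn "ran second" >> pure (Just 2))

main :: IO ()
main = demo >>= print
```

Both lines are printed even though the first action already succeeded, confirming that the sequencing of effects, not strictness, is the issue.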

Second attempt: cutting it short, manually

Fixing the lack of short-circuiting the evaluation after the first success isn't too difficult to do manually. Something like this does it

fallBack ::
    Monad m =>
    m (Either String a) ->
    m (Either String a) ->
    m (Either String a)
fallBack first other = do
    first >>= \case
        r@(Right _) -> pure r
        r@(Left _) -> (r <|>) <$> other

It does indeed show the behaviour I want

λ> loadFromDisk "bad key" `fallBack` loadFromWeb "good key"
local: bad key
web: good key
Right 8

λ> loadFromDisk "bad key" `fallBack` loadFromWeb "bad key"
local: bad key
web: bad key
Left "no such remote key: bad key"

λ> loadFromDisk "good key" `fallBack` undefined
local: good key
Right 8

Excellent! And to switch over to use Validation one just has to switch constructors, Right becomes Success and Left becomes Failure. Though collecting the failures by concatenating strings isn't the best idea of course. Switching to some other Monoid (that's the constraint on the failure type) isn't too difficult.

fallBack ::
    (Monad m, Monoid e) =>
    m (Validation e a) ->
    m (Validation e a) ->
    m (Validation e a)
fallBack first other = do
    first >>= \case
        r@(Success _) -> pure r
        r@(Failure _) -> (r <|>) <$> other

Third attempt: pulling failures out to MonadPlus

After writing the fallBack function I still wanted to explore other solutions. There's almost always something more out there in the Haskell ecosystem, right? So I asked in the #haskell-beginners channel on the Functional Programming Slack. The way I asked the question resulted in answers that iterate over a list of actions, cutting at the first success.

The first suggestion had me a little confused at first, but once I re-organised the helper function a little it made more sense to me.

mFromRight :: MonadPlus m => m (Either err res) -> m res
mFromRight = (either (const mzero) return =<<)

To use it put the actions in a list, map the helper above, and finally run asum on it all2. I think it makes it a little clearer what happens if it's rewritten like this.

firstRightM :: MonadPlus m => [m (Either err res)] -> m res
firstRightM = asum . fmap go
  where
    go m = m >>= either (const mzero) return
λ> firstRightM [loadFromDisk "bad key", loadFromWeb "good key"]
local: bad key
web: good key
8

λ> firstRightM [loadFromDisk "good key", undefined]
local: good key
8

So far so good, but I left out the case where both fail, because that's sort of the fly in the ointment here

λ> firstRightM [loadFromDisk "bad key", loadFromWeb "bad key"]
local: bad key
web: bad key
*** Exception: user error (mzero)

It's not nice to be back to deal with exceptions, but it's possible to recover, e.g. by appending <|> pure 0.

λ> firstRightM [loadFromDisk "bad key", loadFromWeb "bad key"] <|> pure 0
local: bad key
web: bad key
0

However that removes the ability to deal with the situation where all actions fail. Not nice! Add to that the difficulty of coming up with a good MonadPlus instance for an application monad; one basically have to resort to the same thing as for IO, i.e. to throw an exception. Also not nice!

Fourth attempt: wrapping in ExceptT to get its Alternative behaviour

This was another suggestion from the Slack channel, and it is the one I like the most. Again it was suggested as a way to stop at the first successful action in a list of actions.

firstRightM ::
    (Foldable t, Functor t, Monad m, Monoid err) =>
    t (m (Either err res)) ->
    m (Either err res)
firstRightM = runExceptT . asum . fmap ExceptT

Which can be used similarly to the previous one. It's also easy to write a variant of fallBack for it.

fallBack ::
    (Monad m, Monoid err) =>
    m (Either err res) ->
    m (Either err res) ->
    m (Either err res)
fallBack first other = runExceptT $ ExceptT first <|> ExceptT other
λ> loadFromDisk "bad key" `fallBack` loadFromWeb "good key"
local: bad key
web: good key
Right 8

λ> loadFromDisk "good key" `fallBack` undefined
local: good key
Right 8

λ> loadFromDisk "bad key" `fallBack` loadFromWeb "bad key"
local: bad key
web: bad key
Left "no such local key: bad keyno such remote key: bad key"

Yay! This solution has the short-circuiting behaviour I want, as well as collecting all errors on failure.


I'm still a little disappointed that liftA2 (<|>) isn't short-circuiting as I still think it's the easiest of the approaches. However, it's a problem that one has to rely on a deprecated instance of Alternative for Either String, but switching to use Validation would be only a minor change.

Manually writing the fallBack function, as I did in the second attempt, results in very explicit code which is nice as it often reduces the cognitive load for the reader. It's a contender, but using the deprecated Alternative instance is problematic and introducing Validation, an arguably not very common type, takes away a little of the appeal.

In the end I prefer the fourth attempt. It behaves exactly like I want and even though ExceptT lives in transformers I feel that it (I pull it in via mtl) is in such wide use that most Haskell programmers will be familiar with it.

One final thing to add is that the documentation of Validation is an excellent inspiration when it comes to the behaviour of its instances. I wish that the documentation of other packages, in particular commonly used ones like base, transformers, and mtl, would be more like it.



I'm not sure if it's a good term to use in this case as Wikipedia says it's for Boolean operators. I hope it's not too far a stretch to use it in this context too.


In the version of base I'm using there is no asum, so I simply copied the implementation from a later version:

asum :: (Foldable t, Alternative f) => t (f a) -> f a
asum = foldr (<|>) empty

November 27, 2021 10:31 AM

November 23, 2021

Edward Z. Yang

Interactive scraping with Jupyter and Puppeteer

One of the annoying things about scraping websites is bouncing back and forth between the browser where you are using Dev Tools to work out what selectors you should be using to scrape out data, and your actual scraping script, which is usually some batch program that may have to take a few steps before the step you are debugging. A batch script is fine once your scraper is up and running, but while developing, it's really handy to pause the scraping process at some page and fiddle around with the DOM to see what to do.

This interactive-style development is exactly what Jupyter notebooks shine at; when used in conjunction with a browser-based scraping library like Puppeteer, you can have exactly this workflow. Here's the setup:

  1. Puppeteer is a JavaScript library, so you'll need a JavaScript kernel for Jupyter to run it. As an extra complication, Puppeteer is also async, so you'll need a kernel that supports async execution. Fortunately, ijavascript-await provides exactly this. Note that on recent versions of node this package does not compile; you can install this PR which makes it work. Hypothetically, we should be able to use stock ijavascript when node supports top level await, but this currently does not work.
  2. Inside the directory where you will store your notebooks, you'll need to npm install puppeteer so that it's available for your notebooks.
  3. Launch Puppeteer with let puppeteer = require('puppeteer'); let browser = await puppeteer.launch({headless: false}); and profit!

There will be a live browser instance which you can poke at using Dev Tools, and you type commands into the Jupyter notebook and see how they affect the browser state.

I tweeted about this and the commenters had some good suggestions about other things you could try:

  • You don't have to use Puppeteer; Selenium can also drive the browser, and it has a Python API to boot (so no faffing about with alternate Jupyter kernels necessary). I personally prefer working in JavaScript for crawlers, since the page scripting itself is also in JavaScript, but this is mostly a personal preference thing.
  • For simple interactions, where all you really want is to just do a few interactions and record them, Headless Recorder provides a nice extension for just directly recording operations in your browser and then getting them out in executable form. I haven't tried it out yet but it seems like it would be very easy to use.

by Edward Z. Yang at November 23, 2021 02:28 PM

November 21, 2021

Sandy Maguire

Automatically Migrating Eq of No (/=)

We’ve all spent more time talking about Eq of no (/=) than it deserves. Today Bodigrim published Migration guide for Eq of no (/=) which describes all the steps you’ll need to take in order to update your codebase for the century of the fruitbat.

But that made me think — why do humans need to do this by hand? Computers are good at this sort of thing. So I wrote a tiny little comby config that does the replacements we want. Comby is a fantastic “parser parser combinator” — which is to say, a little DSL for writing program transformations. You just write the pattern you want to match, and comby lifts it to work over whitespace, and ensures that your greedy matches are parenthesis-aware, and that sort of thing. It’s quite lovely. The config I wrote is listed at the end of this post.

Here’s a problematic module that will be very broken by Eq of no (/=):

module Neq where

import Prelude (Eq (..), Bool(..), (||))
import Data.Eq (Eq (..))

data A = A Bool Bool

instance Eq A where
  A x1 x2 /= A y1 y2 = x1 /= y1 || x2 /= x2

data B = B Bool

instance Eq B where
  B x == B y = x == y
  B x /= B y = x /= y

data C a = C a

instance Eq a => Eq (C a) where
  C x == C y = x == y
  C x /= C y = x /= y

data D = D Bool

instance Eq D where
  D x /= D y =
    x /= y
  D x == D y =
    x == y

data E = E Bool

instance Eq E where
  E x /= E y =
    let foo = x /= y in foo

After running comby, we get the following diff:

 module Neq where

-import Prelude (Eq (..), Bool)
-import Data.Eq (Eq (..))
-data A = A Bool
+import Prelude (Eq, (==), (/=), Bool(..), (||))
+import Data.Eq (Eq, (==), (/=))
+data A = A Bool Bool

 instance Eq A where
-  A x1 x2 /= A y1 y2 = x1 /= y1 || x2 /= x2
+  A x1 x2 == A y1 y2 = not $ x1 /= y1 || x2 /= x2

 data B = B Bool

 instance Eq B where
   B x == B y = x == y
-  B x /= B y = x /= y

 data C a = C a

 instance Eq a => Eq (C a) where
   C x == C y = x == y
-  C x /= C y = x /= y

 data D = D Bool

 instance Eq D where
-  D x /= D y = x /= y
   D x == D y = x == y

 data E = E Bool

 instance Eq E where
-  E x /= E y =
-    let foo = x /= y in foo
+  E x == E y = not $ let foo = x /= y in foo

Is it perfect? No, but it’s pretty good for the 10 minutes it took me to write. A little effort here goes a long way!

My config file to automatically migrate Eq of no (/=):

instance :[ctx]Eq :[name] where
  :[x] /= :[y] = :[z\n]
instance :[ctx]Eq :[name] where
  :[x] == :[y] = not $ :[z]

instance :[ctx]Eq :[name] where
  :[x1] == :[y1] = :[z1\n]
  :[x2] /= :[y2] = :[z2\n]
instance :[ctx]Eq :[name] where
  :[x1] == :[y1] = :[z1]

instance :[ctx]Eq :[name] where
  :[x2] /= :[y2] = :[z2\n]
  :[x1] == :[y1] = :[z1\n]
instance :[ctx]Eq :[name] where
  :[x1] == :[y1] = :[z1]

import Prelude (:[pre]Eq (..):[post])
import Prelude (:[pre]Eq, (==), (/=):[post])

import Data.Eq (:[pre]Eq (..):[post])
import Data.Eq (:[pre]Eq, (==), (/=):[post])

Save this file as eq.toml, and run comby in your project root via:

$ comby -config eq.toml -matcher .hs -i -f .hs

Comby will find and make all the changes you need, in place. Check the diff, and make whatever changes you might need. In particular, it might bork some of your whitespace — there’s an issue to get comby to play more nicely with layout-aware languages. A more specialized tool that had better awareness of Haskell’s idiosyncrasies would help here, if you have some spare engineering cycles. But when all’s said and done, comby does a damn fine job.

November 21, 2021 01:38 PM

November 17, 2021

Brent Yorgey

Competitive programming in Haskell: BFS, part 4 (implementation via STUArray)

In a previous post, we saw one way to implement our BFS API, but I claimed that it is not fast enough to solve Modulo Solitaire. Today, I want to demonstrate a faster implementation. (It’s almost certainly possible to make it faster still; I welcome suggestions!)

Once again, the idea is to replace the HashMaps from last time with mutable arrays, but in such a way that we get to keep the same pure API—almost. In order to allow arbitrary vertex types, while storing the vertices efficiently in a mutable array, we will require one extra argument to our bfs function, namely, an Enumeration specifying a way to map back and forth between vertices and array indices.

So why not instead just restrict vertices to some type that can be used as keys of a mutable array? That would work, but would unnecessarily restrict the API. For example, it is very common to see competitive programming problems that are “just” a standard graph algorithm, but on a non-obvious graph where the vertices are conceptually some more complex algebraic type, or on a graph where the vertices are specified as strings. Typically, competitive programmers just implement a mapping between vertices to integers on the fly—using either some math or some lookup data structures on the side—but wouldn’t it be nicer to be able to compositionally construct such a mapping and then have the graph search algorithm automatically handle the conversion back and forth? This is exactly what the Enumeration abstraction gives us.
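As a reminder of the interface, an Enumeration can be pictured as a cardinality together with a bijection between values and array indices. The following is only a sketch consistent with how bfs uses card, locate, and select below; the real Enumeration module in the comprog-hs repo may differ in details:

```haskell
-- A sketch of the Enumeration interface that bfs relies on.
data Enumeration a = Enumeration
  { card   :: Int       -- number of values being enumerated
  , select :: Int -> a  -- map an index to its value
  , locate :: a -> Int  -- map a value back to its index
  }

-- The identity enumeration on [0 .. n-1], used as finiteE m at the end:
finiteE :: Int -> Enumeration Int
finiteE n = Enumeration n id id
```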

This post is literate Haskell; you can obtain the source from the darcs repo. The source code (without accompanying blog post) can also be found in my comprog-hs repo.

{-# LANGUAGE FlexibleContexts    #-}
{-# LANGUAGE RankNTypes          #-}
{-# LANGUAGE RecordWildCards     #-}
{-# LANGUAGE ScopedTypeVariables #-}

module Graph where

import Enumeration

import           Control.Arrow       ((>>>))
import           Control.Monad
import           Control.Monad.ST
import qualified Data.Array.IArray   as IA
import           Data.Array.ST
import           Data.Array.Unboxed  (UArray)
import qualified Data.Array.Unboxed  as U
import           Data.Array.Unsafe   (unsafeFreeze)
import           Data.Sequence       (Seq (..), ViewL (..), (<|), (|>))
import qualified Data.Sequence       as Seq

infixl 0 >$>
(>$>) :: a -> (a -> b) -> b
(>$>) = flip ($)
{-# INLINE (>$>) #-}

exhaustM is like exhaust from the last post, but in the context of an arbitrary Monad. Each step will now be able to have effects (namely, updating mutable arrays) so needs to be monadic.

exhaustM :: Monad m => (a -> m (Maybe a)) -> a -> m a
exhaustM f = go
  where
    go a = do
      ma <- f a
      maybe (return a) go ma

The BFSResult type is the same as before.

data BFSResult v =
  BFSR { getLevel :: v -> Maybe Int, getParent :: v -> Maybe v }

Instead of using HashMaps in our BFSState as before, we will use STUArrays.1 These are unboxed, mutable arrays which we can use in the ST monad. Note we also define V as a synonym for Int, just as a mnemonic way to remind ourselves which Int values are supposed to represent vertices.

type V = Int
data BFSState s =
  BS { level :: STUArray s V Int, parent :: STUArray s V V, queue :: Seq V }

To initialize a BFS state, we allocate new mutable level and parent arrays (initializing them to all -1 values), and fill in the level array and queue with the given start vertices. Notice how we need to be explicitly given the size of the arrays we should allocate; we will get this size from the Enumeration passed to bfs.

initBFSState :: Int -> [V] -> ST s (BFSState s)
initBFSState n vs = do
  l <- newArray (0,n-1) (-1)
  p <- newArray (0,n-1) (-1)

  forM_ vs $ \v -> writeArray l v 0
  return $ BS l p (Seq.fromList vs)

The bfs' function implements the BFS algorithm itself. Notice that it is not polymorphic in the vertex type; we will fix that with a wrapper function later. If you squint, the implementation looks very similar to the implementation of bfs from my previous post, with the big difference that everything has to be in the ST monad now.

bfs' :: Int -> [V] -> (V -> [V]) -> (V -> Bool) -> ST s (BFSState s)
bfs' n vs next goal = do
  st <- initBFSState n vs
  exhaustM bfsStep st
  where
    bfsStep st@BS{..} = case Seq.viewl queue of
      EmptyL -> return Nothing
      v :< q'
        | goal v -> return Nothing
        | otherwise -> v >$> next >>> filterM (fmap not . visited st)
            >=> foldM (upd v) (st{queue=q'}) >>> fmap Just

    upd p b@BS{..} v = do
      lp <- readArray level p
      writeArray level v (lp + 1)
      writeArray parent v p
      return $ b{queue = queue |> v}

visited :: BFSState s -> V -> ST s Bool
visited BS{..} v = (/= -1) <$> readArray level v
{-# INLINE visited #-}

The bfs function is a wrapper around bfs'. It presents the same API as before, with the exception that it requires an extra Enumeration v argument, and uses it to convert vertices to integers for the inner bfs' call, and then back to vertices for the final result. It also handles freezing the mutable arrays returned from bfs' and constructing level and parent lookup functions that index into them. Note, the use of unsafeFreeze seems unavoidable, since runSTUArray only allows us to work with a single mutable array; in any case, it is safe for the same reason the use of unsafeFreeze in the implementation of runSTUArray itself is safe: we can see from the type of toResult that the s parameter cannot escape, so the type system will not allow any further mutation to the arrays after it completes.

bfs :: forall v. Enumeration v -> [v] -> (v -> [v]) -> (v -> Bool) -> BFSResult v
bfs Enumeration{..} vs next goal
  = toResult $ bfs' card (map locate vs) (map locate . next . select) (goal . select)
  where
    toResult :: (forall s. ST s (BFSState s)) -> BFSResult v
    toResult m = runST $ do
      st <- m
      (level' :: UArray V Int) <- unsafeFreeze (level st)
      (parent' :: UArray V V) <- unsafeFreeze (parent st)
      return $
        BFSR
          ((\l -> guard (l /= -1) >> Just l) . (level' IA.!) . locate)
          ((\p -> guard (p /= -1) >> Just (select p)) . (parent' IA.!) . locate)

Incidentally, instead of adding an Enumeration v argument, why don’t we just make a type class Enumerable, like this?

class Enumerable v where
  enumeration :: Enumeration v

bfs :: forall v. Enumerable v => [v] -> ...

This would allow us to keep the same API for BFS, up to only different type class constraints on v. We could do this, but it doesn’t particularly seem worth it. It would typically require us to make a newtype for our vertex type (necessitating extra code to map in and out of the newtype) and to declare an Enumerable instance; in comparison, the current approach with an extra argument to bfs requires us to do nothing other than constructing the Enumeration itself.
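
For comparison, here is a sketch of what that approach would require. Everything below is hypothetical (the NodeID newtype and the hard-coded bound are made up for illustration); note in particular that an instance cannot depend on the runtime modulus m, which is exactly the kind of awkwardness the extra-argument approach avoids:

```haskell
-- Hypothetical sketch only: NodeID and the hard-coded bound are made up.
-- Enumeration mirrors the definition from the Enumeration module.
data Enumeration a = Enumeration
  { card   :: Int
  , select :: Int -> a
  , locate :: a -> Int
  }

class Enumerable v where
  enumeration :: Enumeration v

-- We need a newtype to hang the instance on...
newtype NodeID = NodeID Int deriving (Eq, Show)

-- ...and the instance must fix a static bound, since the real modulus m
-- is only known at runtime.
instance Enumerable NodeID where
  enumeration = Enumeration 1000000 NodeID (\(NodeID i) -> i)
```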

Using this implementation, bfs is finally fast enough to solve Modulo Solitaire, like this:

main = C.interact $ runScanner tc >>> solve >>> format

data Move = Move { a :: !Int, b :: !Int } deriving (Eq, Show)
data TC = TC { m :: Int, s0 :: Int, moves :: [Move] } deriving (Eq, Show)

tc :: Scanner TC
tc = do
  m <- int
  n <- int
  TC m <$> int <*> n >< (Move <$> int <*> int)

type Output = Maybe Int

format :: Output -> ByteString
format = maybe "-1" showB

solve :: TC -> Output
solve TC{..} = getLevel res 0
  where
    res = bfs (finiteE m) [s0] (\v -> map (step m v) moves) (==0)

step :: Int -> Int -> Move -> Int
step m v (Move a b) = (a*v + b) `mod` m
{-# INLINE step #-}

It’s pretty much unchanged from before, except for the need to pass an Enumeration to bfs (in this case we just use finiteE m, which is the identity on the interval [0 .. m)).
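
Concretely, finiteE m is just the identity bijection with cardinality m. A minimal self-contained sketch (these definitions mirror the Enumeration module from the earlier post):

```haskell
-- Mirrors the definitions in the Enumeration module.
data Enumeration a = Enumeration
  { card   :: Int
  , select :: Int -> a
  , locate :: a -> Int
  }

finiteE :: Int -> Enumeration Int
finiteE n = Enumeration n id id

enumerate :: Enumeration a -> [a]
enumerate e = map (select e) [0 .. card e - 1]

-- enumerate (finiteE 5) yields [0,1,2,3,4]; select and locate are both id.
```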

Some remaining questions

This is definitely not the end of the story.

  • Submitting all this code (BFS, Enumeration, and the above solution itself) as a single file gives a 2x speedup over submitting them as three separate modules. That’s annoying—why is that?

  • Can we make this even faster? My solution to Modulo Solitaire runs in 0.57s. There are faster Haskell solutions (for example, Anurudh Peduri has a solution that runs in 0.32s), and there are Java solutions as fast as 0.18s, so it seems to me there ought to be ways to make it much faster. If you have an idea for optimizing this code I’d be very interested to hear it! I am far from an expert in Haskell optimization.

  • Can we generalize this nicely to other kinds of graph search algorithms (at a minimum, DFS and Dijkstra)? I definitely plan to explore this question in the future.

For next time: Breaking Bad

Next time, I want to look at a few other applications of this BFS code (and perhaps see if we can improve it along the way); I challenge you to solve Breaking Bad.

  1. Why not use Vector, you ask? It’s probably even a bit faster, but the vector library is not supported on as many platforms.↩

by Brent at November 17, 2021 03:52 PM

FP Complete

Levana NFT Launch

FP Complete Corporation, headquartered in Charlotte, North Carolina, is a global technology company building next-generation software to solve complex problems.  We specialize in Server-Side Software Engineering, DevSecOps, Cloud-Native Computing, Distributed Ledger, and Advanced Programming Languages. We have been a full-stack technology partner in business for 10+ years, delivering reliable, repeatable, and highly secure software.  Our team of engineers, strategically located in over 13 countries, offers our clients one-stop advanced software engineering no matter their size.

For the past few months, the FP Complete engineering team has been working with Levana Protocol on a DeFi platform for leveraged assets on the Terra blockchain. But more recently, we've additionally been helping launch the Levana Dragons meteor shower. This NFT launch completed in the middle of last week, and to date is the largest single NFT event in the Terra ecosystem. We were very excited to be a part of this. You can read more about the NFT launch itself on the Levana Protocol blog post.

We received a lot of positive feedback about the smoothness of this launch, which was pretty wonderful feedback to hear. People expressed interest in learning about the technical decisions we made that led to such a smooth event. We also had a few hiccups occur during the launch and post-launch that are worth addressing as well.

So strap in for a journey involving cloud technologies, DevOps practices, Rust, React, and—of course—Dragons.

Overview of the event

The Levana Dragons meteor shower was an event consisting of 44 separate "showers", or drops during which NFT meteors would be issued. Participants in a shower competed by contributing UST (a Terra-specific stablecoin tied to US Dollars) to a specific Terra wallet. Contributions from a single wallet across the shower were aggregated into a single contribution, and contributions of a higher amount resulted in a better meteor. At the least granular level, this meant stratification into legendary, ancient, rare, and common meteors. But higher contributions also led to a greater likelihood of receiving an egg inside your meteor.

Each shower was separated from the next by 1 hour, and we opened up the site about 24 hours before the first shower occurred. That means the site was active for contributions for about 67 hours straight. Then, following the showers, we needed to mint the actual NFTs, ship them to users' wallets, and open up the "cave" page where users could view their NFTs.

So all told, this was an event that spanned many days, had bouts of high activity, and revolved around a game incorporating many financial transactions; any downtime, slowness, or poor behavior could result in user frustration or worse. On top of that, given the short timeframe the event was intended to be active, an attack such as a DDoS taking down the site could have been catastrophic for the success of the showers. And the absolute worst case would be a compromise allowing an attacker to redirect funds to a different wallet.

All that said, let's dive in.

Backend server

A major component of the meteor drop was to track contributions to the destination wallet, and provide high level data back to users about these activities. This kind of high level data included the floor prices per shower, the timestamps of the upcoming drops, total meteors a user had acquired so far, and more. All this information is publicly available on the blockchain, and in principle could have been written as frontend logic. However, the overhead of having every visitor to the site downloading essentially the entire history of transactions with the destination wallet would have made the site unusable.

Instead, we implemented a backend web server. We used Rust (with Axum) for this for multiple reasons:

  • We're very familiar with Rust
  • Rust is a high performance language, and there were serious concerns about needing to withstand surges in traffic and DDoS attacks
  • Due to CosmWasm already heavily leveraging Rust, Rust was already in use on the project

The server was responsible for keeping track of configuration data (like the shower timestamps and destination wallet address), downloading transaction information from the blockchain (using the Full Client Daemon), and answering queries to the frontend (described next) providing this information.

We could have kept data in a mutable database like PostgreSQL, but instead we decided to keep all data in memory and download from scratch from the blockchain on each application load. Given the size of the data, these two decisions initially seemed very wise. We'll see some outcomes of this when we analyze performance and look at some of our mistakes below.

React frontend

The primary interface users interacted with was a standard React frontend application. We used TypeScript, but otherwise stuck with generic tools and libraries wherever possible. We didn't end up using any state management libraries or custom CSS systems. Another thing to note is that this frontend is going to expand and evolve over time to include additional functionality around the evolving NFT concept, some of which has already happened, and we'll discuss below.

One specific item that popped up was mobile optimization. Initially, the plan was for the meteor shower site to be desktop-only. After a few beta runs, it became apparent that the majority of users were using mobile devices. As a DAO, a primary goal of Levana is to allow for distributed governance of all products and services, and therefore we felt it vital to be responsive to this community request. Redesigning the interface for mobile and then rewriting the relevant HTML and CSS took up a decent chunk of time.

Hosting infrastructure

Many DApps sites are exclusively client side, leveraging frontend logic interacting with the blockchain and smart contracts exclusively. For these kinds of sites, hosting options like Vercel work out very nicely. However, as described above, this application was a combo frontend/backend. Instead of splitting the hosting between two different options, we decided to host both the static frontend app and the backend dynamic app in a single place.

At FP Complete, we typically use Kubernetes for this kind of deployment. In this case, however, we went with Amazon ECS. This isn't a terribly large delta from our standard Kubernetes deployments, following many of the same patterns: container-based application, rolling deployments with health checks, autoscaling and load balancers, externalized TLS cert management, and centralized monitoring and logging. No major issues there.

Additionally, to help reduce burden on the backend application and provide a better global experience for the site, we put Amazon CloudFront in front of the application, which allowed caching the static files in data centers around the world.

Finally, we codified all of this infrastructure using Terraform, our standard tool for Infrastructure as Code.


GitLab is a standard part of our FP Complete toolchain. We leverage it for internal projects for its code hosting, issue tracking, Docker registry, and CI integration. While we will often adapt our tools to match our client needs, in this case we ended up using our standard tool, and things went very well.

We ended up with a four-stage CI build process:

  1. Lint and build the frontend code, producing an artifact with the built static assets
  2. Build a static Rust application from the backend, embedding the static files from (1), and run standard Rust lints (clippy and fmt), producing an artifact with the single file compiled binary
  3. Generate a Docker image from the static binary in (2)
  4. Deploy the new Docker image to either the dev or prod ECS cluster

Steps (3) and (4) are set up to only run on the master and prod branches. This kind of automated deployment setup made it easy for our distributed team to get changes into a real environment for review quickly. However, it also opened a security hole we needed to address.

AWS lockdown

Due to the nature of this application, any kind of downtime during the active showers could have resulted in a lot of egg on our faces and a missed opportunity for the NFT raise. However, there was a far scarier potential outcome. Changing a single config value in production—the destination wallet—would have enabled a nefarious actor to siphon away funds intended for NFTs. This was the primary concern we had during the launch.

We considered multiple social engineering approaches to the problem, such as advertising to potential users the correct wallet address they should be using. However, we decided that most likely users would not be checking addresses before sending their funds. We did set up an emergency "shower halted" page and put in place an on-call team to detect and deploy such measures if necessary, but fortunately nothing along those lines occurred.

However, during the meteor shower, we did instate an AWS account lockdown. This included:

  • Switching Zehut, a tool we use for granting temporary AWS credentials, into read-only credentials mode
  • Disabling GitLab CI's production credentials, so that GitLab users could not cause a change in prod

We additionally vetted all other components in the pipeline of DNS resolution, such as domain name registrar, Route 53, and other AWS services for hosting.

These are generally good practices, and over time we intend to refine the AWS permissions setup for Levana's AWS account in general. However, this launch was the first time we needed to use AWS for app deployment, and time did not permit a thorough AWS permissions analysis and configuration.

During the shower

As I just mentioned, during the shower we had an on-call team ready to jump into action and a playbook to address potential issues. Issues essentially fell into three categories:

  1. Site is slow/down/bad in some way
  2. Site is actively malicious, serving the wrong content and potentially scamming people
  3. Some kind of social engineering attack is underway

The FP Complete team were responsible for observing (1) and (2). I'll be honest that this is not our strong suit. We are a team that typically builds backends and designs DevOps solutions, not an on-call operations team. However, we were the experts in both the DevOps hosting, as well as the app itself. Fortunately, no major issues popped up, and the on-call team got to sit on their hands the whole time.

Out of an abundance of caution, we did take a few extra steps before the showers started to try to ensure we were ready for any attack:

  1. We bumped the replica count in ECS from 2 desired instances to 5. We had autoscaling in place already, but we wanted extra buffer just to be safe.
  2. We increased the instance size from 512 CPU units to 2048 CPU units.

In all of our load testing pre-launch, we had seen that 512 CPU units was sufficient to handle 100,000 requests per second per instance with 99th percentile latency of 3.78ms. With these bumped limits in production, and in the middle of the highest activity on the site, we were very pleased to see the following CPU and memory usage graphs:

CPU usage

Memory usage

This was a nice testament to the power of a Rust-written web service, combined with proper autoscaling and CloudFront caching.

Image creation

Alright, let's put the app itself to the side for a second. We knew that, at the end of the shower, we would need to quickly mint NFTs for every wallet that donated more than $8 during a single shower. There are a few problems with this:

  • We had no idea how many users would contribute.
  • Generating the images is a relatively slow process.
  • Making the images available on IPFS—necessary for how NFTs work—was potentially going to be a bottleneck.

What we ended up doing was writing a Python script that pregenerated 100,000 or so meteor images. We did this generation directly on an Amazon EC2 instance. Then, instead of uploading the images to an IPFS hosting/pinning service, we ran the IPFS daemon directly on this EC2 instance. We additionally backed up all the images on S3 for redundant storage. Then we launched a second EC2 instance for redundant IPFS hosting.

This Python script not only generated the images, but also generated a CSV file mapping the image Content ID (IPFS address) together with various pieces of metadata about the meteor image, such as the meteor body. We'll use this CID/meteor image metadata mapping for correct minting next.

All in all, this worked just fine. However, there were some hurdles getting there, and we have plans to change this going forward in future stages of the NFT evolution. We'll mention those below.


Once the shower finished, we needed to get NFTs into user wallets as quickly as possible. That meant we needed two different things:

  1. All the NFT images on IPFS, which we had.
  2. A set of CSV files providing the NFTs to be generated, together with all of their metadata and owners.

The former was handled by the previous step. The latter was additional pieces of Rust tooling we wrote that leveraged the same internal libraries we wrote for the backend application. The purpose of this tooling was to:

  • Aggregate the total set of contributions from the blockchain.
  • Stratify contributions into individual meteors of different rarity.
  • Apply the appropriate algorithms to randomly decide which meteors receive an egg and which don't.
  • Assign eggs among the meteors.
  • Assign additional metadata to the meteors.
  • Choose an appropriate and unique meteor image for each meteor based on its needed metadata. (This relies on the Python-generated CSV file above.)

This process produced a few different pieces of data:

  • CSV files for meteor NFT generation. There's nothing secret about these, you could reconstruct them yourself by analyzing the NFT minting on the blockchain.
  • The distribution of attributes (such as essence, crystals, distance, etc.) among the meteors for calculating rarity of individual traits. Again, this can be derived easily from public information.
  • A file that tracks the meteor/egg mapping. This is the one outcome from this process that is a closely guarded secret.

This final point is also influencing the design of the next few stages of this project. Specifically, while a smart contract would be the more natural way to interact with NFTs in general, we cannot expose the meteor/egg mapping on the blockchain. Therefore, the "cracking" phase (which will allow users to exchange meteors for their potential eggs) will need to work with another backend application.

In any event, this metadata-generation process was something we tested multiple times on data from our beta runs, and we were ready to produce the data and send it over for minting soon after the shower. I believe users got NFTs in their wallets within 8 hours of the end of the shower, which was a pretty good timeframe overall.

Opening the cave

The final step was opening the cave, a new page on the meteor site that allows users to view their meteors. This phase was achieved by updating the configuration values of the backend to include:

  • The smart contract address of the NFT collection
  • The total number of meteors
  • The trait distribution

Once we switched the config values, the cave opened up, and users were able to access it. Besides pulling the static information mentioned above from the server, all cave page interactions occur fully client side, with the client querying the blockchain using the Terra.js library.

And that's where we're at today. The showers completed, users got their meteors, the cave is open, and we're back to work on implementing the cracking phase of this project. W00t!


Overall, this project went pretty smoothly in production. However, there were a few gotcha moments worth mentioning.

FCD rate limiting

The biggest issue we hit during the showers, and the one that had the biggest potential to break everything, was FCD rate limiting. We'd done extensive testing prior to the real showers on testnet, with many volunteer testers in addition to bots. We never ran into a single example that I'm aware of where rate limiting kicked in.

However, the real production shower ran into such rate limiting issues about 10 showers into the event. (We'll look at how they manifested in a moment.) There are multiple potential contributing factors:

  • There was simply far greater activity in the real event than we had tested for.
  • Most of our testing was limited to just 10 showers, and the real event went for 44.
  • There may be different rate limiting rules for FCD on mainnet versus testnet.

Whatever the case, we began to notice the rate limiting when we tried to roll out a new feature. We implemented the Telescope functionality, which allowed users to see the historical floor prices in previous showers.


After pushing the change to ECS, however, we noticed that the new deployment didn't go live. The reason was that, during the initial data load process, the new processes were receiving rate limiting responses and dying. We tried fixing this by adding a delay or other kinds of retry logic. However, none of these combinations allowed the application to begin processing requests within ECS's readiness check period. (We could have simply turned off health checks, but that would have opened a new can of worms.)

This problem was fairly critical. Not being able to roll out new features or bug fixes was worrying. But more troubling was the lack of autohealing. The existing instances continued to run fine, because they only needed to download small amounts of data from FCD to stay up-to-date, and therefore never triggered the rate limiting. But if any of those instances went down, ECS wouldn't be able to replace them with healthy instances.

Fortunately, we had already written the majority of a caching solution in prior weeks, and had not finished the work because we thought it wasn't a priority. After a few hair-raising hours of effort, we got a solution in place which:

  • Saved all transactions to a YAML file (a binary format would have been a better choice, but YAML was the easiest to roll out)
  • Uploaded this YAML file to S3
  • Ran this save/upload process on a loop, updating every 10 minutes
  • Modified the application logic to start off by first downloading the YAML file from S3, and then doing a delta load from there using FCD

This reduced startup time significantly, bypassed the rate limiting completely, and allowed us to roll out new features and not worry about the entire site going down.

IPFS hosting

FP Complete's DevOps approach is decidedly cloud-focused. For large blob storage, our go-to solution is almost always cloud-based blob storage, which would be S3 in the case of Amazon. We had zero experience with large scale IPFS data hosting prior to this project, which presented a unique challenge.

As mentioned, we didn't want to go with one of the IPFS pinning services, since the rate limiting may have prevented us from uploading all the pregenerated images. (Rate limiting is beginning to sound like a pattern here...) Being comfortable with S3, we initially tried hosting the images using go-ds-s3, a plugin for the ipfs CLI that uses S3 for storage. We still don't know why, but this never worked correctly for us. Instead, we reverted to storing the raw image data on Amazon EBS, which is more expensive and less durable, but actually worked. To fix the durability issue, we backed up all the raw image files to S3.

Overall, however, we're not happy with this outcome. The cost for this hosting is relatively high, and we haven't set up a truly fault-tolerant, highly available hosting. At this point, we would like to switch over to an IPFS pinning service, such as Pinata. Now that the images are available on IPFS, issuing API calls to pin those files should be easier than uploading the complete images. We're planning on using this as a framework going forward for other images, namely:

  • Generate the raw images on EC2
  • Upload for durability to S3
  • Run ipfs locally to make the images available on IPFS
  • Pin the images to a service like Pinata
  • Take down the EC2 instance

The next issue we ran into was... RATE LIMITING, again. This time, we discovered that Cloudflare's IPFS gateway was rate limiting users on downloading their meteor images, resulting in a situation where users would see only some of their meteors appear in their cave page. We solved this one by sticking CloudFront in front of the S3 bucket holding the meteor images and serving from there instead.

Going forward, when it's available, Cloudflare R2 is a promising alternative to the S3+CloudFront offering, due to reduced storage cost and entirely removed bandwidth costs.

Lessons learned

This project was a great mix of leveraging existing expertise and pairing with some new challenges. Some of the top lessons we learned here were:

  1. We got a lot of experience with working directly with the LCD and FCD APIs for Terra from Rust code. Previously, with our DeFi work, this almost exclusively sat behind Terra.js usage.
  2. IPFS was a brand-new topic for us, and we got to play with some pretty extreme cases right off the bat. Understanding the concepts in pinning and gateways will help us immensely with future NFT work.
  3. Since ECS is a relatively unusual technology for us, we got to learn quite a few of the idiosyncrasies it has versus Kubernetes, our more standard toolchain.
  4. While rate limiting is a concept we're familiar with and have worked with many times in the past, these particular obstacles were all new, and each of them surprising in different ways. Typically, we would have some simpler workarounds for these rate limiting issues, such as using authenticated requests. Having to solve each problem in such an extreme way was surprising.
  5. And while we've been involved in blockchain and smart contract work for years, this was our first time working directly with NFTs. This was probably the simplest lesson learned. The API for querying the NFTs contracts is fairly straightforward, and represented a small portion of the time spent on this project.


We're very excited to have been part of such a successful event as the Levana Dragons NFT meteor shower. This was a fun site to work on, with a huge and active user base, and some interesting challenges. It was great to pair together some of our standard cloud DevOps practices with blockchain and smart contract common practices. And using Rust brought some great advantages we're quite happy with.

Going forward, we're looking forward to getting to continue evolving the backend, frontend, and DevOps of this project, just like the NFTs themselves will be evolving. Happy dragon luck to all!


Does this kind of work sound interesting? Consider applying to work at FP Complete.

November 17, 2021 12:00 AM

November 15, 2021

Brent Yorgey

Competitive programming in Haskell: Enumeration

I’m in the middle of a multi-part series on implementing BFS in Haskell. In my last post, we saw one implementation, but I claimed that it is not fast enough to solve Modulo Solitaire, and I promised to show off a faster implementation in this post, but I lied; we have to take a small detour first.

The main idea to make a faster BFS implementation is to replace the HashMaps from last time with mutable arrays, but hopefully in such a way that we get to keep the same pure API. Using mutable arrays introduces a few wrinkles, though.

  1. The API we have says we get to use any type v for our vertices, as long as it is an instance of Ord and Hashable. However, this is not going to work so well for mutable arrays. We still want the external API to allow us to use any type for our vertices, but we will need a way to convert vertices to and from Int values we can use to index the internal mutable array.

  2. A data structure like HashMap is dynamically sized, but we don’t have this luxury with arrays. We will have to know the size of the array up front.

In other words, we need to provide a way to bijectively map vertices to a finite prefix of the natural numbers; that is, we need what I call invertible enumerations. This idea has come up for me multiple times: in 2016, I wrote about using such an abstraction to solve another competitive programming problem, and in 2019 I published a library for working with invertible enumerations. I’ve now put together a lightweight version of that library for use in competitive programming. I’ll walk through the code below, and you can also find the source code in my comprog-hs repository.

First, some extensions and imports.

{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}

module Enumeration where

import qualified Data.List as L
import Data.Hashable
import qualified Data.Array as A
import qualified Data.HashMap.Strict as HM

An Enumeration a consists of a cardinality, and two functions, select and locate, which together form a bijection between (some subset of) values of type a and a finite prefix of the natural numbers. We can convert an Enumeration into a list just by mapping the select function over that finite prefix.

data Enumeration a = Enumeration
  { card   :: !Int
  , select :: Int -> a
  , locate :: a -> Int
  }

enumerate :: Enumeration a -> [a]
enumerate e = map (select e) [0 .. card e-1]

Since a occurs both positively and negatively, Enumeration is not a Functor, but we can map over Enumerations as long as we provide both directions of a bijection a <-> b.

mapE :: (a -> b) -> (b -> a) -> Enumeration a -> Enumeration b
mapE f g (Enumeration c s l) = Enumeration c (f . s) (l . g)
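
As a quick illustration (a made-up example, not part of the original library), mapE can turn the identity enumeration on [0 .. 25] into an enumeration of the lowercase letters:

```haskell
import Data.Char (chr, ord)

-- Mirrors the definitions above, reproduced so the snippet stands alone.
data Enumeration a = Enumeration
  { card   :: Int
  , select :: Int -> a
  , locate :: a -> Int
  }

mapE :: (a -> b) -> (b -> a) -> Enumeration a -> Enumeration b
mapE f g (Enumeration c s l) = Enumeration c (f . s) (l . g)

finiteE :: Int -> Enumeration Int
finiteE n = Enumeration n id id

-- The bijection between [0 .. 25] and ['a' .. 'z'].
lowercase :: Enumeration Char
lowercase = mapE (\i -> chr (i + ord 'a')) (\c -> ord c - ord 'a') (finiteE 26)
```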

We have various fundamental ways to build enumerations: empty and unit enumerations, and an identity enumeration on a finite prefix of natural numbers.

voidE :: Enumeration a
voidE = Enumeration 0 (error "select void") (error "locate void")

unitE :: Enumeration ()
unitE = singletonE ()

singletonE :: a -> Enumeration a
singletonE a = Enumeration 1 (const a) (const 0)

finiteE :: Int -> Enumeration Int
finiteE n = Enumeration n id id

We can automatically enumerate all the values of a Bounded Enum instance. This is useful, for example, when we have made a custom enumeration type.

boundedEnum :: forall a. (Enum a, Bounded a) => Enumeration a
boundedEnum = Enumeration
  { card   = hi - lo + 1
  , select = toEnum . (+lo)
  , locate = subtract lo . fromEnum
  }
  where
    lo, hi :: Int
    lo = fromIntegral (fromEnum (minBound @a))
    hi = fromIntegral (fromEnum (maxBound @a))
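
For instance, with a made-up direction type (a self-contained sketch; boundedEnum is reproduced from above):

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}

-- Mirrors the definitions above, reproduced so the snippet stands alone.
data Enumeration a = Enumeration
  { card   :: Int
  , select :: Int -> a
  , locate :: a -> Int
  }

boundedEnum :: forall a. (Enum a, Bounded a) => Enumeration a
boundedEnum = Enumeration
  { card   = hi - lo + 1
  , select = toEnum . (+ lo)
  , locate = subtract lo . fromEnum
  }
  where
    lo, hi :: Int
    lo = fromEnum (minBound @a)
    hi = fromEnum (maxBound @a)

-- Dir is a made-up example type.
data Dir = North | East | South | West
  deriving (Show, Eq, Enum, Bounded)
```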

We can also build an enumeration from an explicit list. We want to make sure this is efficient, since it is easy to imagine using this e.g. on a very large list of vertex values given as part of the input of a problem. So we build an array and a hashmap to allow fast lookups in both directions.

listE :: forall a. (Hashable a, Eq a) => [a] -> Enumeration a
listE as = Enumeration n (toA A.!) (fromA HM.!)
  where
    n = length as
    toA :: A.Array Int a
    toA = A.listArray (0,n-1) as
    fromA :: HM.HashMap a Int
    fromA = HM.fromList (zip as [0 :: Int ..])
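
To see the intended behaviour in isolation, here is a simplified stand-in for listE (linear search instead of the Array/HashMap pair, so the semantics match but lookups are O(n); listE' is a name made up for this sketch):

```haskell
import Data.List (elemIndex)
import Data.Maybe (fromJust)

-- Mirrors the Enumeration definition above.
data Enumeration a = Enumeration
  { card   :: Int
  , select :: Int -> a
  , locate :: a -> Int
  }

-- Simplified, self-contained variant of listE. Like listE, it assumes
-- the input list has no duplicates.
listE' :: Eq a => [a] -> Enumeration a
listE' as = Enumeration (length as) (as !!) (\a -> fromJust (elemIndex a as))
```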

Finally, we have a couple ways to combine enumerations into more complex ones, via sum and product.

(>+<) :: Enumeration a -> Enumeration b -> Enumeration (Either a b)
a >+< b = Enumeration
  { card   = card a + card b
  , select = \k -> if k < card a then Left (select a k) else Right (select b (k - card a))
  , locate = either (locate a) ((+card a) . locate b)
  }

(>*<) :: Enumeration a -> Enumeration b -> Enumeration (a,b)
a >*< b = Enumeration
  { card = card a * card b
  , select = \k -> let (i,j) = k `divMod` card b in (select a i, select b j)
  , locate = \(x,y) -> card b * locate a x + locate b y
  }
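
For example, the product combinator enumerates pairs in row-major order. A self-contained sketch (definitions reproduced from above; the 3x4 grid is a made-up example):

```haskell
-- Mirrors the definitions above, reproduced so the snippet stands alone.
data Enumeration a = Enumeration
  { card   :: Int
  , select :: Int -> a
  , locate :: a -> Int
  }

finiteE :: Int -> Enumeration Int
finiteE n = Enumeration n id id

(>*<) :: Enumeration a -> Enumeration b -> Enumeration (a, b)
a >*< b = Enumeration
  { card   = card a * card b
  , select = \k -> let (i, j) = k `divMod` card b in (select a i, select b j)
  , locate = \(x, y) -> card b * locate a x + locate b y
  }

-- All 12 cells of a 3x4 grid, enumerated row by row.
grid :: Enumeration (Int, Int)
grid = finiteE 3 >*< finiteE 4
```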

There are a few more combinators in the source code but I don’t know whether I’ll ever use them. You can read about them if you want. For now, let’s try using this to solve a problem!

…ah, who am I kidding, I can’t find any problems that can be directly solved using this framework. Invertibility is a double-edged sword—we absolutely need it for creating an efficient BFS with arbitrary vertices, and the combinators will come in quite handy if we want to use some complex type for vertices. However, requiring invertibility also limits the expressiveness of the library. For example, there is no Monad instance. This is why my simple-enumeration library has both invertible and non-invertible variants.

by Brent at November 15, 2021 10:29 PM

November 12, 2021

Daniel Mlot (duplode)

Divisible and the Monoidal Quartet

A recent blog post by Gabriella Gonzalez, Co-Applicative programming style, has sparked discussion on ergonomic ways to make use of the Divisible type class. The conversation pointed to an interesting rabbit hole, and jumping into it resulted in these notes, in which I attempt to get a clearer picture of the constellation of monoidal functor classes that Divisible belongs to. The gist of it is that “Divisible is a contravariant Applicative, and Decidable is a contravariant Alternative” is not a full picture of the relationships between the classes, as there are a few extra connections that aren’t obvious to the naked eye.

Besides Gabriella’s post, which is an excellent introduction to Divisible, I recommend as background reading Tom Ellis’ Alternatives convert products to sums, which conveys the core intuition about monoidal functor classes in an accessible manner. There is a second post by Tom, The Mysterious Incomposability of Decidable, that this article will be in constant dialogue with, in particular as a source of examples. From now on I will refer to it as “the Decidable post”. Thanks to Gabriella and Tom for inspiring the writing of this article.

For those of you reading with GHCi on the side, the key general definitions in this post are available from this .hs file.


As I hinted at in the introduction, this post is not solely about Divisible, but more broadly about monoidal functor classes. To start from familiar ground and set up a reference point, I will first look at the best known of those classes, Applicative. We won’t, however, stick with the usual presentation of Applicative in terms of (<*>), as it doesn’t generalise to the other classes we’re interested in. Instead, we will switch to the monoidal presentation: 1

zipped :: Applicative f => f a -> f b -> f (a, b)
zipped = liftA2 (,)

-- An operator spelling, for convenience.
(&*&) :: Applicative f => f a -> f b -> f (a, b)
(&*&) = zipped
infixr 5 &*&

unit :: Applicative f => f ()
unit = pure ()

-- Laws:

-- unit &*& v ~ v
-- u &*& unit ~ u
-- (u &*& v) &*& w ~ u &*& (v &*& w)

(Purely for the sake of consistency, I will try to stick to the Data.Functor.Contravariant.Divisible naming conventions for functions like zipped.)

The matter with (<*>) (and also liftA2) that stops it from being generalised for our purposes is that it leans heavily on the fact that Hask is a Cartesian closed category, with pairs as the relevant product. Without that, the currying and the partial application we rely on when writing in applicative style become unfeasible.

While keeping ourselves away from (<*>) and liftA2, we can recover, if not the full flexibility, the power of applicative style with a variant of liftA2 that takes an uncurried function:

lizip :: Applicative f => ((a, b) -> c) -> f a -> f b -> f c
lizip f u v = fmap f (zipped u v)

(That is admittedly a weird name; all the clashing naming conventions around this topic have left me with few good options.)
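For a concrete feel of lizip, here is a small sketch using the Maybe applicative (zipped and lizip restated from above):

```haskell
import Control.Applicative (liftA2)

zipped :: Applicative f => f a -> f b -> f (a, b)
zipped = liftA2 (,)

-- liftA2 with an uncurried function, as in the post.
lizip :: Applicative f => ((a, b) -> c) -> f a -> f b -> f c
lizip f u v = fmap f (zipped u v)

total :: Maybe Int
total = lizip (\(a, b) -> a + b) (Just 1) (Just 2)  -- Just 3
```

As with liftA2, a Nothing on either side makes the whole result Nothing.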

On a closing note for this section, my choice of operator for zipped is motivated by the similarity with (&&&) from Control.Arrow:

(&&&) :: Arrow p => p a b -> p a c -> p a (b, c)

In particular, (&*&) for the function Applicative coincides with (&&&) for the function Arrow.
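That coincidence is easy to check concretely (a sketch, with (&*&) restated from above):

```haskell
import Control.Applicative (liftA2)
import Control.Arrow ((&&&))

(&*&) :: Applicative f => f a -> f b -> f (a, b)
(&*&) = liftA2 (,)

-- For the function Applicative, both combinators fan the argument out.
viaApplicative, viaArrow :: Int -> (Int, Int)
viaApplicative = (* 2) &*& (+ 1)
viaArrow       = (* 2) &&& (+ 1)
```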

Leaning on connections like this one, I will use Control.Arrow combinators liberally, beginning with the definition of the following two convenience functions that will show up shortly:

dup :: a -> (a, a)
dup = id &&& id

forget :: Either a a -> a
forget = id ||| id


As summarised at the beginning of the Decidable post, while Applicative converts products to products covariantly, Divisible converts products to products contravariantly. From that point of view, I will take divided, the counterpart to zipped, as the fundamental combinator of the class:

-- This is the divided operator featured on Gabriella's post, soon to
-- become available from Data.Functor.Contravariant.Divisible
(>*<) :: Divisible k => k a -> k b -> k (a, b)
(>*<) = divided
infixr 5 >*<

-- Laws:

-- conquered >*< v ~ v
-- u >*< conquered ~ u
-- (u >*< v) >*< w ~ u >*< (v >*< w)

Recovering divide from divided is straightforward, and entirely analogous to how lizip can be obtained from zipped:

divide :: Divisible k => (a -> (b, c)) -> k b -> k c -> k a
divide f u v = contramap f (divided u v)

Lessened currying aside, we might say that divide plays the role of liftA2 in Divisible.

It’s about time for an example. For that, I will borrow the one from Gabriella’s post:

data Point = Point { x :: Double, y :: Double, z :: Double }
    deriving Show

nonNegative :: Predicate Double
nonNegative = Predicate (0 <=)

-- (>$<) = contramap
nonNegativeOctant :: Predicate Point
nonNegativeOctant =
    adapt >$< nonNegative >*< nonNegative >*< nonNegative
    where
    adapt = x &&& y &&& z

The slight distortion to Gabriella’s style in using (&&&) to write adapt pointfree is meant to emphasise how that deconstructor can be cleanly assembled out of the component projection functions x, y and z. Importantly, that holds in general: pair-producing functions a -> (b, c) are isomorphic to (a -> b, a -> c) pairs of projections. That gives us a variant of divide that takes the projections separately:

tdivide :: Divisible k => (a -> b) -> (a -> c) -> k b -> k c -> k a
tdivide f g u v = divide (f &&& g) u v

Besides offering an extra option with respect to ergonomics, tdivide hints at extra structure available from the Divisible class. Let’s play with the definitions a little:

tdivide f g u v
divide (f &&& g) u v
contramap (f &&& g) (divided u v)
contramap ((f *** g) . dup) (divided u v)
(contramap dup . contramap (f *** g)) (divided u v)
contramap dup (divided (contramap f u) (contramap g v))
divide dup (contramap f u) (contramap g v)

divide dup, which duplicates input in order to feed each of its arguments, is a combinator worthy of a name, or even two:

dplus :: Divisible k => k a -> k a -> k a
dplus = divide dup

(>+<) :: Divisible k => k a -> k a -> k a
(>+<) = dplus
infixr 5 >+<

So we have:

tdivide f g u v = dplus (contramap f u) (contramap g v)

Or, using the operators:

tdivide f g u v = f >$< u >+< g >$< v

An alternative to using the projections to set up a deconstructor to be used with divide is to contramap each projection to its corresponding divisible value and combine the pieces with (>+<). That is the style favoured by Tom Ellis, 2 which is where the “t” prefix of tdivide comes from. For instance, Gabriella Gonzalez’s example would be spelled as follows in this style:

nonNegativeOctantT :: Predicate Point
nonNegativeOctantT =
    x >$< nonNegative >+< y >$< nonNegative >+< z >$< nonNegative

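Both spellings can be cross-checked against each other with minimal stand-ins for Predicate and the three operators, restated here rather than imported from the contravariant package so the snippet is self-contained (the fixity of (>*<) matches the one in the post):

```haskell
import Control.Arrow ((&&&))

-- Minimal stand-in for Data.Functor.Contravariant.Predicate.
newtype Predicate a = Predicate { getPredicate :: a -> Bool }

(>$<) :: (a -> b) -> Predicate b -> Predicate a
f >$< Predicate p = Predicate (p . f)

infixr 5 >*<
(>*<) :: Predicate a -> Predicate b -> Predicate (a, b)
Predicate p >*< Predicate q = Predicate (\(a, b) -> p a && q b)

(>+<) :: Predicate a -> Predicate a -> Predicate a
Predicate p >+< Predicate q = Predicate (\a -> p a && q a)

data Point = Point { x :: Double, y :: Double, z :: Double }

nonNegative :: Predicate Double
nonNegative = Predicate (0 <=)

-- divide style: one deconstructor, combined with (>*<).
octantD :: Predicate Point
octantD = (x &&& y &&& z) >$< (nonNegative >*< nonNegative >*< nonNegative)

-- Tom's style: contramap each projection, combined with (>+<).
octantT :: Predicate Point
octantT = (x >$< nonNegative) >+< (y >$< nonNegative) >+< (z >$< nonNegative)
```

The two predicates accept and reject exactly the same points.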

The (>+<) combinator defined above is strikingly similar to (<|>) from Alternative, down to its implied monoidal nature: 3

(>+<) :: Divisible k => k a -> k a -> k a

(<|>) :: Alternative f => f a -> f a -> f a

It is surprising that (>+<) springs forth in Divisible rather than Decidable, which might look like the more obvious candidate to be Alternative’s contravariant counterpart. To understand what is going on, it helps to look at Alternative from the same perspective we have used here for Applicative and Divisible. For that, first of all we need an analogue to divided. Let’s borrow the definition from Alternatives convert products to sums:

combined :: Alternative f => f a -> f b -> f (Either a b)
combined u v = Left <$> u <|> Right <$> v

(-|-) :: Alternative f => f a -> f b -> f (Either a b)
(-|-) = combined
infixr 5 -|-

-- We also need a suitable identity:
zero :: Alternative f => f Void
zero = empty

-- Laws:

-- zero -|- v ~ v
-- u -|- zero ~ u
-- (u -|- v) -|- w ~ u -|- (v -|- w)

(I won’t entertain the various controversies about the Alternative laws here, nor any interaction laws involving Applicative. Those might be interesting matters to think about from this vantage point, though.)
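Here is combined at work on two familiar Alternative instances (a small sketch, with the definition restated from above):

```haskell
import Control.Applicative (Alternative, (<|>))

combined :: Alternative f => f a -> f b -> f (Either a b)
combined u v = Left <$> u <|> Right <$> v

-- For lists, (<|>) is concatenation, so both sides are kept, tagged.
listsCombined :: [Either Int String]
listsCombined = combined [1, 2] ["x"]
-- [Left 1, Left 2, Right "x"]

-- For Maybe, (<|>) is left-biased choice.
maybeCombined :: Maybe (Either Int String)
maybeCombined = combined Nothing (Just "fallback")
-- Just (Right "fallback")
```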

A divide analogue follows:

combine :: Alternative f => (Either a b -> c) -> f a -> f b -> f c
combine f u v = fmap f (combined u v)

Crucially, Either a b -> c can be split in a way dual to what we have seen earlier with a -> (b, c): an Either-consuming function amounts to a pair of functions, one to deal with each component. That being so, we can pull the same trick we used for Divisible, with everything dualised:

tcombine :: Alternative f => (a -> c) -> (b -> c) -> f a -> f b -> f c
tcombine f g = combine (f ||| g)
tcombine f g u v
combine (f ||| g) u v
fmap (f ||| g) (combined u v)
fmap (forget . (f +++ g)) (combined u v)
fmap forget (combined (fmap f u) (fmap g v))
combine forget (fmap f u) (fmap g v)

To keep things symmetrical, let’s define:

aplus :: Alternative f => f a -> f a -> f a
aplus = combine forget
-- (<|>) = aplus

So that we end up with:

tcombine f g u v = aplus (fmap f u) (fmap g v)

tcombine f g u v = f <$> u <|> g <$> v

For instance, here is the Alternative composition example from the Decidable post…

alternativeCompose :: [String]
alternativeCompose = show <$> [1,2] <|> reverse <$> ["hello", "world"]

… and how it might be rendered using combine/(-|-):

alternativeComposeG :: [String]
alternativeComposeG = merge <$> [1,2] -|- ["hello", "world"]
    where
    merge = show ||| reverse

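The two spellings produce the same list, which a quick standalone sketch can confirm (both definitions restated, with combined as given earlier):

```haskell
import Control.Applicative (Alternative, (<|>))
import Control.Arrow ((|||))

combined, (-|-) :: Alternative f => f a -> f b -> f (Either a b)
combined u v = Left <$> u <|> Right <$> v
(-|-) = combined
infixr 5 -|-

alternativeCompose :: [String]
alternativeCompose = show <$> [1 :: Int, 2] <|> reverse <$> ["hello", "world"]

alternativeComposeG :: [String]
alternativeComposeG = merge <$> [1 :: Int, 2] -|- ["hello", "world"]
    where
    merge = show ||| reverse
```

Both evaluate to ["1","2","olleh","dlrow"].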
There is, therefore, something of a subterranean connection between Alternative and Divisible. The function arguments to both combine and divide, whose types are dual to each other, can be split in a way that not only reveals an underlying monoidal operation, (<|>) and (>+<) respectively, but also allows for a certain flexibility in using the class combinators.


Last, but not least, there is Decidable to deal with. Data.Functor.Contravariant.Divisible already provides chosen as the divided analogue, so let’s just supply the corresponding operator variant: 4

(|-|) :: Decidable k => k a -> k b -> k (Either a b)
(|-|) = chosen
infixr 5 |-|

-- Laws:

-- lost |-| v ~ v
-- u |-| lost ~ u
-- (u |-| v) |-| w ~ u |-| (v |-| w)

choose can be recovered from chosen in the usual way:

choose :: Decidable k => (a -> Either b c) -> k b -> k c -> k a
choose f u v = contramap f (chosen u v)

The a -> Either b c argument of choose, however, is not amenable to the function splitting trick we have used for divide and combine. Either-producing functions cannot be decomposed in that manner, as the case analysis to decide whether to return Left or Right cannot be disentangled. This is ultimately what Tom’s complaint about the “mysterious incomposability” of Decidable is about. Below is a paraphrased version of the Decidable example from the Decidable post:

data Foo = Bar String | Baz Bool | Quux Int
    deriving Show

pString :: Predicate String
pString = Predicate (const False)

pBool :: Predicate Bool
pBool = Predicate id

pInt :: Predicate Int
pInt = Predicate (>= 0)

decidableCompose :: Predicate Foo
decidableCompose = analyse >$< pString |-| pBool |-| pInt
    where
    analyse = \case
        Bar s -> Left s
        Baz b -> Right (Left b)
        Quux n -> Right (Right n)

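With minimal stand-ins for Predicate, (>$<) and the (|-|) operator (restated here so the snippet runs without the contravariant package), the composed predicate behaves as expected on each constructor:

```haskell
newtype Predicate a = Predicate { getPredicate :: a -> Bool }

(>$<) :: (a -> b) -> Predicate b -> Predicate a
f >$< Predicate p = Predicate (p . f)

-- What chosen amounts to for Predicate: case analysis on Either.
infixr 5 |-|
(|-|) :: Predicate a -> Predicate b -> Predicate (Either a b)
Predicate p |-| Predicate q = Predicate (either p q)

data Foo = Bar String | Baz Bool | Quux Int

pString :: Predicate String
pString = Predicate (const False)

pBool :: Predicate Bool
pBool = Predicate id

pInt :: Predicate Int
pInt = Predicate (>= 0)

decidableCompose :: Predicate Foo
decidableCompose = analyse >$< (pString |-| pBool |-| pInt)
    where
    analyse foo = case foo of
      Bar s  -> Left s
      Baz b  -> Right (Left b)
      Quux n -> Right (Right n)
```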
The problem identified in the post is that there is no straightforward way around having to write “the explicit unpacking into an Either” performed by analyse. In the Divisible and Alternative examples, it was possible to avoid tuple or Either shuffling by decomposing the counterparts to analyse, but that is not possible here. 5

In the last few paragraphs, we have mentioned Divisible, Alternative and Decidable. What about Applicative, though? The Applicative example from the Decidable post is written in the usual applicative style:

applicativeCompose :: [[String]]
applicativeCompose =
    f <$> [1, 2] <*> [True, False] <*> ["hello", "world"]
    where
    f = (\a b c -> replicate a (if b then c else "False"))

As noted earlier, though, applicative style is a fortunate consequence of Hask being Cartesian closed, which makes it possible to turn (a, b) -> c into a -> b -> c. If we leave out (<*>) and restrict ourselves to (&*&), we end up having to deal explicitly with tuples, which is a dual version of the Decidable issue:

monoidalCompose :: [[String]]
monoidalCompose =
    consume <$> [1, 2] &*& [True, False] &*& ["hello", "world"]
    where
    consume (a, (b, c)) = replicate a (if b then c else "False")

Just like a -> Either b c functions, (a, b) -> c functions cannot be decomposed: the c value can be produced by using the a and b components in arbitrary ways, and there is no easy way to disentangle that.

Decidable, then, relates to Applicative in an analogous way to how Divisible does to Alternative. There are a few other similarities between them that are worth pointing out:

  • Neither Applicative nor Decidable offers a monoidal f a -> f a -> f a operation like the ones of Alternative and Divisible. A related observation is that, for example, Op’s Decidable instance inherits a Monoid constraint from Divisible but doesn’t actually use it in the method implementations.

  • choose Left and choose Right can be used to combine consumers so that one of them doesn’t actually receive input. That is analogous to how (<*) = lizip fst and (*>) = lizip snd combine applicative values while discarding the output from one of them.

  • Dually to how zipped/&*& for the function functor is (&&&), chosen for decidables such as Op and Predicate amounts to (|||). My choice of |-| as the corresponding operator hints at that.

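The last point can be made concrete with a minimal stand-in for Op (chosenOp here is a hypothetical name for what chosen works out to for the real Op):

```haskell
import Control.Arrow ((|||))

-- Minimal stand-in for Data.Functor.Contravariant.Op.
newtype Op r a = Op { getOp :: a -> r }

-- chosen for Op amounts to case analysis, i.e. (|||).
chosenOp :: Op r a -> Op r b -> Op r (Either a b)
chosenOp (Op f) (Op g) = Op (f ||| g)

describe :: Op String (Either Int Bool)
describe = chosenOp (Op show) (Op (\b -> if b then "yes" else "no"))
```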
In summary

To wrap things up, here is a visual summary of the parallels between the four classes:

Diagram of the four monoidal functor classes under consideration, with Applicative and Decidable in one diagonal, and Alternative and Divisible in the other.
Diagram of the four monoidal functor classes under consideration, with Applicative and Decidable in one diagonal, and Alternative and Divisible in the other.

To my eyes, the main takeaway of our figure of eight trip around this diagram has to do with its diagonals. Thanks to a peculiar kind of duality, classes in opposite corners of it are similar to each other in quite a few ways. In particular, the orange diagonal classes, Alternative and Divisible, have monoidal operations of f a -> f a -> f a signature that emerge from their monoidal functor structure.

That Divisible, from this perspective, appears to have more to do with Alternative than with Applicative leaves us a question to ponder: what does that mean for the relationship between Divisible and Decidable? The current class hierarchy, with Decidable being a subclass of Divisible, mirrors the Alternative-Applicative relationship on the other side of the covariant-contravariant divide. That, however, is not the only reasonable arrangement, and possibly not even the most natural one. 6


dplus is a monoidal operation

If we are to show that (>+<) is a monoidal operation, first of all we need an identity for it. conquer :: f a sounds like a reasonable candidate. It can be expressed in terms of conquered, the unit for divided, as follows:

-- conquer discards input.
conquer = const () >$< conquered

The identity laws do come out all right:

conquer >+< v = v  -- Goal
conquer >+< v  -- LHS
dup >$< (conquer >*< v)
dup >$< ((const () >$< conquered) >*< v)
dup >$< (first (const ()) >$< (conquered >*< v))
first (const ()) . dup >$< (conquered >*< v)
-- conquered >*< v ~ v
first (const ()) . dup >$< (snd >$< v)
snd . first (const ()) . dup >$< v
v  -- LHS = RHS

u >+< conquer = u  -- Goal
u >+< conquer  -- LHS
dup >$< (u >*< conquer)
dup >$< (u >*< (const () >$< conquered))
dup >$< (second (const ()) >$< (u >*< conquered))
second (const ()) . dup >$< (u >*< conquered)
-- u >*< conquered ~ u
second (const ()) . dup >$< (fst >$< u)
fst . second (const ()) . dup >$< u
u  -- LHS = RHS

And so does the associativity one:

(u >+< v) >+< w = u >+< (v >+< w)  -- Goal
(u >+< v) >+< w  -- LHS
dup >$< ((dup >$< (u >*< v)) >*< w)
dup >$< (first dup >$< ((u >*< v) >*< w))
first dup . dup >$< ((u >*< v) >*< w)
u >+< (v >+< w)  -- RHS
dup >$< (u >*< (dup >$< (v >*< w)))
dup >$< (second dup >$< (u >*< (v >*< w)))
second dup . dup >$< (u >*< (v >*< w))
-- (u >*< v) >*< w ~ u >*< (v >*< w)
-- assoc ((x, y), z) = (x, (y, z))
second dup . dup >$< (assoc >$< ((u >*< v) >*< w))
assoc . second dup . dup >$< ((u >*< v) >*< w)
first dup . dup >$< ((u >*< v) >*< w)  -- LHS = RHS

Handling nested Either

The examples in this appendix are available as a separate .hs file.

There is a certain awkwardness in dealing with nested Either as anonymous sums that is hard to get rid of completely. Prisms are a tool worth looking into in this context, as they are largely about expressing pattern matching in a composable way. Let’s bring lens into Tom’s Decidable example, then:

data Foo = Bar String | Baz Bool | Quux Int
    deriving (Show)
makePrisms ''Foo

A cute trick with prisms is using outside to fill in the missing cases of a partial function (in this case, (^?! _Quux)):

anonSum :: APrism' s a -> (s -> b) -> s -> Either a b
anonSum p cases = set (outside p) Left (Right . cases)

decidableOutside :: Predicate Foo
decidableOutside = analyse >$< pString |-| pBool |-| pInt
    analyse = _Bar `anonSum` (_Baz `anonSum` (^?! _Quux))

An alternative is using matching to write it in a more self-explanatory way:

matchingL :: APrism' s a -> s -> Either a s
matchingL p = view swapped . matching p

decidableMatching :: Predicate Foo
decidableMatching =
    choose (matchingL _Bar) pString $
    choose (matchingL _Baz) pBool $
    choose (matchingL _Quux) pInt $
    error "Missing case in decidableMatching"

These implementations have a few inconveniences of their own, the main one perhaps being that there is nothing to stop us from forgetting one of the prisms. The combinators from the total package improve on that by incorporating exhaustiveness checking for prisms, at the cost of requiring the sum type to be defined in a particular way.

There presumably also is the option of bringing in heavy machinery, and setting up an anonymous sum wrangler with Template Haskell or generics. In fact, it appears the shapely-data package used to offer precisely that. It might be worth taking a moment to make it build with recent GHCs.

All in all, these approaches feel like attempts to approximate extra language support for juggling sum types. As it happens, though, there is a corner of the language which does provide extra support: arrow notation. Converting the example to arrows provides a glimpse of what might be:

-- I'm going to play nice, rather than making b phantom and writing a
-- blatantly unlawful Arrow instance just for the sake of the notation.
newtype Pipecate a b = Pipecate { getPipecate :: a -> (Bool, b) }

instance Category Pipecate where
    id = Pipecate (True,)
    Pipecate q . Pipecate p = Pipecate $ \x ->
        let (bx, y) = p x
            (by, z) = q y
        in (bx && by, z)

instance Arrow Pipecate where
    arr f = Pipecate (const True &&& f)
    first (Pipecate p) = Pipecate $ \(x, o) ->
         let (bx, y) = p x
         in (bx, (y, o))

instance ArrowChoice Pipecate where
    left (Pipecate p) = Pipecate $ \case
        Left x ->
            let (bx, y) = p x
            in (bx, Left y)
        Right o -> (True, Right o)

fromPred :: Predicate a -> Pipecate a ()
fromPred (Predicate p) = Pipecate (p &&& const ())

toPred :: Pipecate a x -> Predicate a
toPred (Pipecate p) = Predicate (fst . p)

decidableArrowised :: Predicate Foo
decidableArrowised = toPred $ proc foo -> case foo of
    Bar s -> fromPred pString -< s
    Baz b -> fromPred pBool -< b
    Quux n -> fromPred pInt -< n

decidableArrowised corresponds quite closely to the various Decidable-powered implementations. Behind the scenes, case commands in arrow notation give rise to nested eithers. Said eithers are dealt with by the arrows, which are combined in an appropriate way with (|||). (|||), in turn, can be seen as an arrow counterpart to chosen/(|-|). Even the -< “feed” syntax, which the example above doesn’t really take advantage of, amounts to slots for contramapping. If someone ever feels like arranging a do-esque notation for Decidable to go with Gabriella’s DivisibleFrom, it seems case commands would be a nice starting point.

  1. See the relevant section of the Typeclassopedia for a brief explanation of it.↩︎

  2. See, for instance, this Twitter conversation, or the Divisible example in the Decidable post. Note that, though I’m using (>$<) here for ease of comparison, the examples in this style arguably look tidier when spelled with contramap.

    Speaking of operator usage, it is not straightforward to decide on the right fixities for all those operators, and it is entirely possible that I have overlooked something. I have picked them aiming to have both styles work without parentheses, and to have the pairs associated to the right, that is:

    adapt >$< u >*< v >*< w
      = adapt >$< (u >*< (v >*< w))
    f >$< u >+< g >$< v >+< h >$< w
      = (f >$< u) >+< (g >$< v) >+< (h >$< w)
  3. A proof that (>+<) is indeed monoidal is in an end note to this post.

    On a related note, my choice of >+< as the dplus operator is, in part, a pun on (<+>) from ArrowPlus. (>+<) for many instances works very much like (<+>), monoidally combining outputs, even if there probably isn’t a sensible way to actually make the types underlying the various Divisible functors instances of ArrowPlus.↩︎

  4. Both dhall and co-log-core define (>|<) as chosen-like operators. To my eyes, though, >|< fits dplus better. As a compromise, I opted not to use >|< for either of them here.↩︎

  5. I will play with a couple of approaches to nested Either ergonomics at the end of the post, in an appendix.↩︎

  6. See also contravariant issue #64, which suggests no longer making Decidable a subclass of Divisible. Though the argument made by Zemyla is a different one, there are resonances with the observations made here. On a related development, semigroupoids has recently introduced a Conclude class, which amounts to “Decidable without a superclass constraint on Divisible”.↩︎

by Daniel Mlot at November 12, 2021 02:50 AM

November 10, 2021

Douglas M. Auclair (geophf)

November, 2021 1HaskellADay 1Liners

  • 2021-11-09: You have: \k _v -> f k. Curry away the arguments.
  • 2021-11-09: Hello, all. It's been a minute.

    Here's a #1Liner #Haskell problem

    You have m :: Map a b

    You want to filter it by s :: Set a

    so that m has keys only in s.

    How would you do that?

    • O_O @dysinger: let map = Data.Map.fromList [(1, "one"), (2, "two"), (3, "three")]
      set = Data.Set.fromList [1,3,5,7,9]
      in Data.Map.fromList [ elem | elem <- Data.Map.toList map, Data.Set.member (fst elem) set ]
    • ephemient @ephemient: Map.filterWithKey (\k _ -> Set.member k set) map
    • ephemient @ephemient: Map.intersectionWith const map $ Map.fromDistinctAscList [(k, ()) | k <- Set.toAscList set]
    • じょお @gotoki_no_joe Map.intersection m (Map.fromSet (const ()) s)

by geophf at November 10, 2021 12:12 AM

November 09, 2021

Joachim Breitner

How to audit an Internet Computer canister

I was recently called upon by Origyn to audit the source code of some of their Internet Computer canisters (“canisters” are services or smart contracts on the Internet Computer), which were written in the Motoko programming language. Both the application model of the Internet Computer as well as Motoko bring with them their own particular pitfalls and possible sources for bugs. So given that I was involved in the creation of both, they reached out to me.

In the course of that audit work I collected a list of things to watch out for, and general advice around them. Origyn generously allowed me to share that list here, in the hope that it will be helpful to the wider community.

Inter-canister calls

The Internet Computer system provides inter-canister communication that follows the actor model: Inter-canister calls are implemented via two asynchronous messages, one to initiate the call, and one to return the response. Canisters process messages atomically (and roll back upon certain error conditions), but not complete calls. This makes programming with inter-canister calls error-prone. Possible common sources for bugs, vulnerabilities or simply unexpected behavior are:

  • Reading global state before issuing an inter-canister call, and assuming it to still hold when the call comes back.

  • Changing global state before issuing an inter-canister call, changing it again in the response handler, but assuming nothing else changes the state in between (reentrancy).

  • Changing global state before issuing an inter-canister call, and not handling failures correctly, e.g. when the code handling the callback rolls back.

If you find such patterns in your code, you should analyze whether a malicious party can trigger them, and assess the severity of the effect.

These issues apply to all canisters, and are not Motoko-specific.


Even in the absence of inter-canister calls the behavior of rollbacks can be surprising. In particular, rejecting (i.e. throw) does not roll back state changes done before, while trapping (e.g. Debug.trap, assert …, out of cycle conditions) does.

Therefore, one should check all public update call entry points for unwanted state changes or unwanted rollbacks. In particular, look for methods (or rather, messages, i.e. the code between commit points) where a state change is followed by a throw.

These issues apply to all canisters, and are not Motoko-specific, although other CDKs may not turn exceptions into rejects (which don’t roll back).

Talking to malicious canisters

Talking to untrustworthy canisters can be risky, for the following (likely incomplete) reasons:

  • The other canister can withhold a response. Although the bidirectional messaging paradigm of the Internet Computer was designed to guarantee a response eventually, the other party can busy-loop for as long as they are willing to pay for before responding. Worse, there are ways to deadlock a canister.

  • The other canister can respond with invalidly encoded Candid. This will cause a Motoko-implemented canister to trap in the reply handler, with no easy way to recover. Other CDKs may give you better ways to handle invalid Candid, but even then you will have to worry about Candid cycle bombs that will cause your reply handler to trap.

Many canisters do not even do inter-canister calls, or only call other trustworthy canisters. For the others, the impact of this needs to be carefully assessed.

Canister upgrade: overview

For most services it is crucial that canisters can be upgraded reliably. This can be broken down into the following aspects:

  1. Can the canister be upgraded at all?
  2. Will the canister upgrade retain all data?
  3. Can the canister be upgraded promptly?
  4. Is there a recovery plan for when upgrading is not possible?

Canister upgradeability

A canister that traps, for whatever reason, in its canister_preupgrade system method is no longer upgradeable. This is a major risk. The canister_preupgrade method of a Motoko canister consists of the developer-written code in any system func preupgrade() block, followed by the system-generated code that serializes the content of any stable var into a binary format, and then copies that to stable memory.

Since the Motoko-internal serialization code will first serialize into a scratch space in the main heap, and then copy that to stable memory, canisters with more than 2GB of live data will likely be unupgradeable. But this is unlikely to be the first limit hit:

The system imposes an instruction limit on upgrading a canister (spanning both canister_preupgrade and canister_postupgrade). This limit is a subnet configuration value, separate from (and likely higher than) the normal per-message limit, and not easily determined. If the canister’s live data becomes too large to be serialized within this limit, the canister becomes non-upgradeable.

This risk cannot be eliminated completely, as long as Motoko and Stable Variables are used. It can be mitigated by appropriate load testing:

Install a canister, fill it up with live data, and exercise the upgrade. If this succeeds with a live data set exceeding the expected amount of data by a margin, this risk is probably acceptable. Bonus points for adding functionality that will prevent the canister’s live data from increasing above a certain size.

If this testing is to be done on a local replica, extra care needs to be taken to make sure the local replica actually performs instruction counting and has the same resource limits as the production subnet.

An alternative mitigation is to avoid canister_preupgrade as much as possible. This means no use of stable var (or restricting it to small, fixed-size configuration data). All other data could be

  • mirrored off the canister (possibly off chain), and manually re-hydrated after an upgrade.
  • stored in stable memory manually, during each update call, using the ExperimentalStableMemory API. While this matches what high-assurance Rust canisters (e.g. the Internet Identity) do, this requires manual binary encoding of the data, and is marked experimental, so I cannot recommend it at the moment.
  • not put into a Motoko canister until Motoko has a scalable solution for stable variables (for example keeping them in stable memory permanently, with smart caching in main memory, and thus obliterating the need for pre-upgrade code.)

Data retention on upgrades

Obviously, all live data ought to be retained during upgrades. Motoko automatically ensures this for stable var data. But often canisters want to work with their data in a different format (e.g. in objects that are not shared and thus cannot be put in stable vars, such as HashMap or Buffer objects), and thus may follow the following idiom:

stable var fooStable = …;
var foo = fooFromStable(fooStable);
system func preupgrade() { fooStable := fooToStable(foo); };
system func postupgrade() { fooStable := (empty); };

In this case, it is important to check that

  • All non-stable global vars, or global lets with mutable values, have a stable companion.
  • The assignments to foo and fooStable are not forgotten.
  • fooToStable and fooFromStable form bijections.

An example would be HashMaps stored as arrays via Iter.toArray(….entries()) and HashMap.fromIter(….val