Planet Haskell

June 23, 2017

Joachim Breitner

The perils of live demonstrations

Yesterday, I was giving a talk at the South SF Bay Haskell User Group about how implementing lock-step simulation is trivial in Haskell and how Chris Smith and I are using this to make CodeWorld even more attractive to students. I had given the talk before, at Compose::Conference in New York City earlier this year, so I felt well prepared. On the flight to the West Coast I slightly extended the slides, and as I was too cheap to buy in-flight WiFi, I tested them only locally.

So I arrived at the offices of Target1 in Sunnyvale, got on the WiFi, uploaded my slides, which are in fact one large interactive CodeWorld program, and tried to run it. But I got a type error…

Turns out that the API of CodeWorld was changed just the day before:

commit 054c811b494746ec7304c3d495675046727ab114
Author: Chris Smith <>
Date:   Wed Jun 21 23:53:53 2017 +0000

    Change dilated to take one parameter.
    Function is nearly unused, so I'm not concerned about breakage.
    This new version better aligns with standard educational usage,
    in which "dilation" means uniform scaling.  Taken as a separate
    operation, it commutes with rotation, and preserves similarity
    of shapes, neither of which is true of scaling in general.

Ok, that was quick to fix, and the CodeWorld server started to compile my code, and compiled, and aborted. It turned out that my program, presumably the largest CodeWorld interaction out there, hit the time limit of the compiler.

Luckily, Chris Smith just arrived at the venue, and he emergency-bumped the compiler time limit. The program compiled and I could start my presentation.

Unfortunately, the biggest blunder was still awaiting me. I came to the slide where two instances of pong are played over a simulated network, and my point was that the two instances are perfectly in sync. Unfortunately, they were not. I guess it did support my point that lock-step simulation can easily go wrong, but it really left me out in the rain there, and I could not explain it – I had not modified this code since New York, and there it had worked flawlessly2. In the end, I could save face a bit by running the real pong game against an attendee over the network, and no desynchronisation could be observed there.

Today I dug into it, and it took me a while, but it turned out that the problem was not in CodeWorld, or in the lock-step simulation code discussed in our paper about it, but in the code in my presentation that simulated the delayed network messages; in some instances it would deliver the UI events in a different order to the two simulated players, and hence cause them to do something different. Phew.
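To see concretely why delivery order matters, here is a minimal, self-contained Haskell sketch (the names and state are made up, not taken from the talk or CodeWorld): each lock-step client folds the same stream of UI events into its state, so if the simulated network hands two clients the events in different orders, their states diverge as soon as two event handlers fail to commute.

```haskell
import Data.List (foldl')

-- A toy event type and a deliberately non-commutative state update.
data Event = MoveLeft | Boost deriving Show

step :: Int -> Event -> Int
step pos MoveLeft = pos - 1
step pos Boost    = pos * 2

-- Each lock-step client replays the events it received, in order.
replay :: [Event] -> Int
replay = foldl' step 0

main :: IO ()
main = do
  print (replay [MoveLeft, Boost])  -- (0 - 1) * 2 == -2
  print (replay [Boost, MoveLeft])  -- (0 * 2) - 1 == -1
```

The two replays start from the same state and see the same events, yet end in different states; this is exactly the kind of desynchronisation a buggy message-delay simulation produces.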

  1. Yes, the retail giant. Turns out that they have a small but enthusiastic Haskell-using group in their IT department.

  2. I hope the video is going to be online soon, then you can check for yourself.

by Joachim Breitner ( at June 23, 2017 11:54 PM

wren gayle romano

"Spring semester" in review

Hi all, long time no post. A lot has been going on, but I’m finally starting to get on top of things again. I’ve been meaning to write in a bit more depth about some of this, but that want for perfection has been the enemy of writing anything at all. So, here’s a quick synopsis of what’s been going on in my neck of the woods.

Both of L’s parents passed away. We’ve known this was coming, but it’s still hard of course. L was out there for a bit over a month taking care of her mom. They died very close together, so we ended up having a single combined service. I was out there for about a week helping to wrap things up before whisking L back home.

I finally got back the results of the genetics test. Turns out I don’t have Loeys–Dietz, or at least not the same genetic variant my mother did. But I definitely have something. So it’s back to the diagnostic swamp trying to figure out how to give it a name so that doctors’ll take it seriously. Current working hypothesis is hypermobility-type Ehlers–Danlos. Alas, “hypermobility-type” is medical jargon for “we have no idea what this is, but it kinda looks similar to the forms of Ehlers–Danlos we do know stuff about, so let’s call it that.” So, yeah, no medical tests to “prove” that’s what it is; just your usual game of convincing folks you have enough of the symptoms to match the syndrome.

I’ve been getting used to paying attention to my ADHD and working with it rather than trying to plow through it. It helps a lot to recognize that it’s not a failing on my part (e.g., that I can’t focus on boring things for as long as other people) but rather just part of how I’m wired. That makes it a lot easier to stop beating myself up over things, and instead figure out better ways to work with my brain rather than trying to force it into a shape it won’t take. As I’ve gotten better at this I’ve finally started getting caught up on a bunch of things that’ve fallen to the wayside over the past few years.

For example, I’m slowly getting caught up on the backlog of bug reports and feature requests for my various Haskell packages. Mostly been focusing on logfloat and unification-fd so far, but will make it around to the others in time. So, if you sent me an email about some bug or feature over the past few years and it seems to have fallen into the void, consider filing a ticket.

Still working on getting caught up to where I should be on my dissertation.

Work has also been going excellently. It’s all seekrit and nonsense, so I can’t say too much about it. But lately I’ve been doing a bunch of work on characterizing families of mathematical objects, and discovering their symmetries so we can exploit them to simplify and optimize things. So lots of mathy goodness going on. It’s a bit more geometric and combinatorial than my usual algebraic fare, but it’s the sort of stuff that arises from algebraic structures so it’s not too far from home base. (If that doesn’t make sense to you, maybe take a look at Brent Yorgey’s thesis to see an example of the connection between combinatorics and algebraic data types.) Plus, it helps that I’ve been getting to know some of the hella queer ladies who work in my building :)

In other health-y news, round about the time I got officially diagnosed with ADHD I had a bunch of friends going on about what the symptoms of allism (aka non-autism) are. Though I have a bunch of autistic friends, I’ve never really known much about what autism’s really like because all the literature is written by allistic folks, for allistic folks, so they’re all “patient has underdeveloped/insufficient blah” and I’m like “according to what baseline? How much blah does it take to count as having ‘sufficient’ blah? What are the diagnostic details for measuring how much blah you really have?” So I finally got to hear some details from the autistic side of the fence, where people actually explain shit and elucidate the differences. And based on that: I’m hella not allistic. I can (and should! and have been meaning to!) write a whole separate post on this topic. I’m still not entirely sure I feel comfortable adopting the “autistic” label (for reasons which are, themselves, further symptoms of autism), because my experiences don’t match up perfectly with some of the parts of what is traditionally called “autism”, but I’m absolutely non-allistic. I think the spectrum of non-allism is far larger and more diverse than allistic people currently believe, but —again— a post for another time.


June 23, 2017 05:37 AM

June 22, 2017

Philip Wadler


Please submit to RADICAL 2017, Recent Advances in Concurrency and Logic, a workshop co-located with QONFEST (CONCUR, QEST, FORMATS, and EPEW), Berlin (Germany), September 4, 2017.
As you know, submissions to RADICAL could be, for instance:

  • reports of ongoing work and/or preliminary results;
  • summaries of an already published paper (even at CONCUR'17 - see below);
  • overviews of (recent) PhD theses;
  • descriptions of research projects and consortia;
  • manifestos, calls to action, personal views on current and future challenges;
  • overviews of interesting yet underrepresented problems.
Many thanks for your cooperation!

Julian and Jorge

by Philip Wadler ( at June 22, 2017 02:26 PM

June 21, 2017

Keegan McAllister

A Rust view on Effective Modern C++

Recently I've been reading Effective Modern C++ by Scott Meyers. It's a great book that contains tons of practical advice, as well as horror stories to astound your friends and confuse your enemies. Since Rust shares many core ideas with modern C++, I thought I'd describe how some of the C++ advice translates to Rust, or doesn't.

This is not a general-purpose Rust / C++ comparison. Honestly, it might not make a lot of sense if you haven't read the book I'm referencing. There are a number of C++ features missing in Rust, for example integer template arguments and advanced template metaprogramming. I'll say no more about those because they aren't new to modern C++.

I may have a clear bias here because I think Rust is a better language for most new development. However, I massively respect the effort the C++ designers have put into modernizing the language, and I think it's still the best choice for many tasks.

There's a common theme that I'll avoid repeating: most of the C++ pitfalls that result in undefined behavior will produce compiler or occasionally runtime errors in Rust.

Chapters 1 & 2: Deducing Types / auto

This is what Rust and many other languages call "type inference". C++ has always had it for calls to function templates, but it became much more powerful in C++11 with the auto keyword.

Rust's type inference seems to be a lot simpler. I think the biggest reason is that Rust treats references as just another type, rather than the weird quasi-transparent things that they are in C++. Also, Rust doesn't require the auto keyword — whenever you want type inference, you just don't write the type. Rust also lacks std::initializer_list, which simplifies the rules further.

The main disadvantage in Rust is that there's no support to infer return types for fn functions, only for lambdas. Mostly I think it's good style to write out those types anyway; GHC Haskell warns when you don't. But it does mean that returning a closure without boxing is impossible, and returning a complex iterator chain without boxing is extremely painful. Rust is starting to improve the situation with -> impl Trait.
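For comparison, the GHC behaviour alluded to here looks like this (a minimal sketch; the function name is made up): the type is still inferred, but -Wmissing-signatures reports any top-level binding without a signature.

```haskell
{-# OPTIONS_GHC -Wmissing-signatures #-}
module Main where

-- GHC infers addOne :: Num a => a -> a, but emits a warning that the
-- top-level type signature is missing.
addOne x = x + 1

main :: IO ()
main = print (addOne 41)
```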

Rust lacks decltype and this is certainly a limitation. Some of the uses of decltype are covered by trait associated types. For example,

template<typename Container, typename Index>
auto get(Container& c, Index i)
-> decltype(c[i])
{ … }


fn get<Container, Index, Output>(c: &Container, i: Index) -> &Output
where Container: ops::Index<Index, Output=Output>
{ … }

The advice to see inferred types by intentionally producing a type error applies equally well in Rust.

Chapter 3: Moving to Modern C++

Initializing values in Rust is much simpler. Constructors are just static methods named by convention, and they take arguments in the ordinary way. For good or for ill, there's no std::initializer_list.

nullptr is not an issue in Rust. &T and &mut T can't be null, and you can make null raw pointers with ptr::null() or ptr::null_mut(). There are no implicit conversions between pointers and integral types.

Regarding aliases vs. typedefs, Rust also supports two syntaxes:

use foo::Bar as Baz;
type Baz = foo::Bar;

type is a lot more common, and it supports type parameters.

Rust enums are always strongly typed. They are scoped unless you explicitly use MyEnum::*;. A C-like enum (one with no data fields) can be cast to an integral type.

f() = delete; has no equivalent in Rust, because Rust doesn't implicitly define functions for you in the first place.

Similar to the C++ override keyword, Rust requires a default keyword to enable trait specialization. Unlike in C++, it's mandatory.

As in C++, Rust methods can be declared to take self either by reference or by move. Unlike in C++, you can't easily overload the same method to allow either.

Rust supports const iterators smoothly. It's up to the iterator whether it yields T, &T, or &mut T (or even something else entirely).

The IntoIterator trait takes the place of functions like std::begin that produce an iterator from any collection.

Rust has no equivalent to noexcept. Any function can panic, unless panics are disabled globally. This is pretty unfortunate when writing unsafe code to implement data types that have to be exception-safe. However, recoverable errors in Rust use Result, which is part of the function's type.

Rust supports a limited form of compile-time evaluation, but it's not yet nearly as powerful as C++14 constexpr. This is set to improve with the introduction of miri.

In Rust you mostly don't have to worry about "making const member functions thread safe". If something is shared between threads, the compiler will ensure it's free of thread-related undefined behavior. (This to me is one of the coolest features of Rust!) However, you might run into higher-level issues such as deadlocks that Rust's type system can't prevent.

There are no special member functions in Rust, e.g. copy constructors. If you want your type to be Clone or Copy, you have to opt-in with a derive or a manual impl.

Chapter 4: Smart Pointers

Smart pointers are very important in Rust, as in modern C++. Much of the advice in this chapter applies directly to Rust.

std::unique_ptr corresponds directly to Rust's Box type. However, Box doesn't support custom deallocation code. If you need that, you have to either make it part of impl Drop on the underlying type, or write your own smart pointer. Box also does not support custom allocators.

std::shared_ptr corresponds to Rust's Arc type. Both provide thread-safe reference counting. Rust also supports much faster thread-local refcounting with the Rc type. Don't worry, the compiler will complain if you try to send an Rc between threads.

C++ standard libraries usually implement shared_ptr as a "fat pointer" containing both a pointer to the underlying value and a pointer to a refcount struct. Rust's Rc and Arc store the refcounts directly before the value in memory. This means that Rc and Arc are half the size of shared_ptr, and may perform better due to fewer indirections. On the downside, it means you can't upgrade Box to Rc/Arc without a reallocation and copy. It could also introduce performance problems on certain workloads, due to cache line sharing between the refcounts and the data. (I would love to hear from anyone who has run into this!) Boost supports intrusive_ptr which should perform very similarly to Rust's Arc.

Like Box, Rc and Arc don't support custom deleters or allocators.

Rust supports weak pointer variants of both Rc and Arc. Rather than panicking or returning NULL, the "upgrade" operation returns None, as you'd expect in Rust.

Chapter 5: Rvalue References, Move Semantics, and Perfect Forwarding

This is a big one. Move semantics are rare among programming languages, but they're key in both Rust and C++. However, the two languages take very different approaches, owing to the fact that Rust was designed around moves whereas they're a late addition to C++.

There's no std::move in Rust. Moves are the default for non-Copy types. The behavior of a move or copy is always a shallow bit-wise copy; there is no way to override it. This can greatly improve performance. For example, when a Rust Vec changes address due to resizing, it will use a highly optimized memcpy. In comparison, C++'s std::vector has to call the move constructor on every element, or the copy constructor if there's no noexcept move constructor.

However the inability to hook moves and the difficulty of creating immovable types is an obstacle for certain kinds of advanced memory management, such as intrusive pointers and interacting with external garbage collectors.

Moves in C++ leave the source value in an unspecified but valid state — for example, an empty vector or a NULL unique pointer. This has several weird consequences:

  • A move counts as mutating a source variable, so "Move requests on const objects are silently transformed into copy operations". This is a surprising performance leak.
  • The moved-out-of variable can still be used after the move, and you don't necessarily know what you'll get.
  • The destructor will still run and must take care not to invoke undefined behavior.

The first two points don't apply in Rust. You can move out of a non-mut variable. The value isn't considered mutated, it's considered gone. And the compiler will complain if you try to use it after the move.

The third point is somewhat similar to old Rust, where types with a destructor would contain an implicit "drop flag" indicating whether they had already been moved from. As of Rust 1.12 (September 2016), these hidden struct fields are gone, and good riddance! If a variable has been moved from, the compiler simply omits a call to its destructor. In the situations where a value may or may not have been moved (e.g. move in an if branch), Rust uses local variables on the stack.

Rust doesn't have a feature for perfect forwarding. There's no need to treat references specially, as they're just another type. Because there are no rvalue references in Rust, there's also no need for universal / forwarding references, and no std::forward.

However, Rust lacks variadic generics, so you can't do things like "factory function that forwards all arguments to constructor".

Item 29 says "Assume that move operations are not present, not cheap, and not used". I find this quite dispiriting! There are so many ways in C++ to think that you're moving a value when you're actually calling an expensive copy constructor — and compilers won't even warn you!

In Rust, moves are always available, always as cheap as memcpy, and always used when passing by value. Copy types don't have move semantics, but they act the same at runtime. The only difference is whether the static checks allow you to use the source location afterwards.

All in all, moves in Rust are more ergonomic and less surprising. Rust's treatment of moves should also perform better, because there's no need to leave the source object in a valid state, and there's no need to call move constructors on individual elements of a collection. (But can we benchmark this?)

There's a bunch of other stuff in this chapter that doesn't apply to Rust. For example, "The interaction among perfect-forwarding constructors and compiler-generated copy and move operations develops even more wrinkles when inheritance enters the picture." This is the kind of sentence that will make me run away screaming. Rust doesn't have any of those features, gets by fine without them, and thus avoids such bizarre interactions.

Chapter 6: Lambda Expressions

C++ allows closures to be copied; Rust doesn't.

In C++ you can specify whether a lambda expression's captures are taken into the closure by reference or by value, either individually or for all captures at once. In Rust this is mostly inferred by how you use the captures: whether they are mutated, and whether they are moved from. However, you can prefix the move keyword to force all captures to be taken by value. This is useful when the closure itself will outlive its environment, common when spawning threads for example.

Rust uses this inference for another purpose: determining which Fn* traits a closure will implement. If the lambda body moves out of a capture, it can only implement FnOnce, whose "call" operator takes self by value. If it doesn't move but does mutate captures, it will implement FnOnce and FnMut, whose "call" takes &mut self. And if it neither moves nor mutates, it will implement all of FnOnce, FnMut, and Fn. C++ doesn't have traits (yet) and doesn't distinguish these cases. If your lambda moves from a capture, you can call it again and you'll see whatever "empty" value was left behind by the move constructor.

Rust doesn't support init capture; however, move capture is supported natively. You can do whatever init you like outside the lambda and then move the result in.

Like C++, Rust allows inference of closure parameter types. Unlike C++, an individual closure cannot be generic.

Chapter 7: The Concurrency API

Rust doesn't have futures in the standard library; they're part of an external library maintained by a core Rust developer. They're also used for async I/O.

In C++, dropping a std::thread that is still running terminates the program, which certainly seems un-fun to me. The behavior is justified by the possibility that the thread captures by reference something from its spawning context. If the thread then outlived that context, it would result in undefined behavior. In Rust, this can't happen because thread::spawn(f) has a 'static bound on the type of f. So, when a Rust JoinHandle falls out of scope, the thread is safely detached and continues to run.

The other possibility, in either language, is to join threads on drop, waiting for the thread to finish. However this has surprising performance implications and still isn't enough to allow threads to safely borrow from their spawning environment. Such "scoped threads" are provided by libraries in Rust and use a different technique to ensure safety.

C++ and Rust both provide atomic variables. In C++ they support standard operations such as assignment, ++, and atomic reads by conversion to the underlying type. These all use the "sequentially consistent" memory ordering, which provides the strongest guarantees. Rust is more explicit, using dedicated methods like fetch_add which also specify the memory ordering. (This kind of API is also available in C++.)

This chapter also talks about the C++ type qualifier volatile, even though it has to do with stuff like memory-mapped I/O and not threads. Rust doesn't have volatile types; instead, a volatile read or write is done using an intrinsic function.

Chapter 8: Tweaks

Rust containers don't have methods like emplace_back. You can however use the experimental placement-new feature.


Rust and C++ share many features, allowing a detailed comparison between them. Rust is a much newer design that isn't burdened with 20 years of backwards compatibility. This I think is why Rust's versions of these core features tend to be simpler and easier to reason about. On the other hand, Rust gains some complexity by enforcing strong static guarantees.

There are of course some differences of principle, not just historical quirks. C++ has an object system based on classes and inheritance, even allowing multiple inheritance. There's no equivalent in Rust. Rust also prefers simple and explicit semantics, while C++ allows a huge amount of implicit behavior. You see this for example with implicit copy construction, implicit conversions, ad-hoc function overloading, quasi-transparent references, and the operators on atomic values. There are still some implicit behaviors in Rust, but they're carefully constrained. Personally I prefer Rust's explicit style; I find there are too many cases where C++ doesn't "do what I mean". But other programmers may disagree, and that's fine.

I hope and expect that C++ and Rust will converge on similar feature-sets. C++ is scheduled to get a proper module system, a "concepts" system similar to traits, and a subset with statically-checkable memory safety. Rust will eventually have integer generics, variadic generics, and more powerful const fn. It's an exciting time for both languages :)

by keegan ( at June 21, 2017 07:58 PM

June 20, 2017

Neil Mitchell

Announcing Weeder: dead export detection

Most projects accumulate code over time. To combat that, I've written Weeder which detects unused Haskell exports, allowing dead code to be removed (pulling up the weeds). When used in conjunction with GHC -fwarn-unused-binds -fwarn-unused-imports and HLint it will enable deleting unused definitions, imports and extensions.

Weeder piggy-backs off files generated by stack, so first obtain stack, then:

  • Install weeder by running stack install weeder --resolver=nightly.
  • Ensure your project has a stack.yaml file. If you don't normally build with stack then run stack init to generate one.
  • Run weeder . --build, which builds your project with stack and reports any weeds.

What does Weeder detect?

Weeder detects a bunch of weeds, including:

  • You export a function helper from module Foo.Bar, but nothing else in your package uses helper, and Foo.Bar is not an exposed-module. Therefore, the export of helper is a weed. Note that helper itself may or may not be a weed - once it is no longer exported -fwarn-unused-binds will tell you if it is entirely redundant.
  • Your package depends on another package but doesn't use anything from it - the dependency should usually be deleted. This functionality is similar to packunused, but implemented quite differently.
  • Your package has entries in the other-modules field that are either unused (and thus should be deleted), or are missing (and thus should be added). The stack tool warns about the latter already.
  • A source file is used between two different sections in a .cabal file - e.g. in both the library and the executable. Usually it's better to arrange for the executable to depend on the library, but sometimes that would unnecessarily pollute the interface. Useful to be aware of, and sometimes worth fixing, but not always.
  • A file has not been compiled despite being mentioned in the .cabal file. This situation can be because the file is unused, or the stack compilation was incomplete. I recommend compiling both benchmarks and tests to avoid this warning where possible - running weeder . --build will use a suitable command line.
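As a concrete illustration of the first kind of weed (module and function names here are hypothetical, not from any real package): if Foo.Bar is listed under other-modules and nothing else in the package imports helper, Weeder flags the export, and removing it from the export list then lets -fwarn-unused-binds decide whether helper itself is dead.

```haskell
module Foo.Bar (describe, helper) where

-- Used elsewhere in the package: not a weed.
describe :: String -> String
describe name = "package " ++ name

-- Exported, but nothing else in the package uses it, and Foo.Bar is
-- not an exposed-module: Weeder reports the export of 'helper' as a
-- weed. It may or may not be dead code itself; once un-exported,
-- -fwarn-unused-binds will tell you.
helper :: Int -> Int
helper = (+ 1)
```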

Beware of conditional compilation (e.g. CPP and the Cabal flag mechanism), as these may mean that something is currently a weed, but in different configurations it is not.

I recommend fixing the warnings relating to other-modules and files not being compiled first, as these may cause other warnings to disappear.

Ignoring weeds

If you want your package to be detected as "weed free", but it has some weeds you know about but don't consider important, you can add a .weeder.yaml file adjacent to the stack.yaml with a list of exclusions. To generate an initial list of exclusions run weeder . --yaml > .weeder.yaml.

You may wish to generalise/simplify the .weeder.yaml by removing anything above or below the interesting part. As an example, here is the .weeder.yaml file from ghcid:

- message: Module reused between components
- message:
  - name: Weeds exported
  - identifier: withWaiterPoll

This configuration declares that I am not interested in the message about modules being reused between components (that's the way ghcid works, and I am aware of it). It also says that I am not concerned about withWaiterPoll being a weed - it's a simplified method of file change detection I use for debugging, so even though it's dead now, I sometimes do switch to it.

Running with Continuous Integration

Before running Weeder on your continuous integration (CI) server, you should first ensure there are no existing weeds. One way to achieve that is to ignore existing hints by running weeder . --yaml > .weeder.yaml and checking in the resulting .weeder.yaml.

On the CI you should then run weeder . (or weeder . --build to compile as well). To avoid the cost of compilation you may wish to fetch the latest Weeder binary release. For certain CI environments there are helper scripts to do that.

Travis: Execute the following command:

curl -sL | sh -s .

The arguments after -s are passed to weeder, so modify the final . if you want other arguments.

Appveyor: Add the following statement to .appveyor.yml:

- ps: Invoke-Command ([Scriptblock]::Create((Invoke-WebRequest '').Content)) -ArgumentList @('.')

The arguments inside @() are passed to weeder, so add new arguments surrounded by ', comma separated - e.g. @('.', '--build').

What about Cabal users?

Weeder requires the textual .hi file for each source file in the project. Stack generates those already, so it was easy to integrate with. There's no reason that information couldn't be extracted by either passing flags to Cabal, or converting the .hi files afterwards. I welcome patches to do that integration.

by Neil Mitchell ( at June 20, 2017 09:33 PM

June 19, 2017

FP Complete

Understanding ResourceT

This blog post came out of two unrelated sets of questions I received last week about usage of the resourcet library. For those unfamiliar with it, the library is often used in combination with the Conduit streaming data library; basically every conduit tutorial will quickly jump into usage of the resourcet library.

Instead of just teaching you how to use the library, this post will demonstrate why you need it and how it works internally, to help you avoid some of the potential pitfalls of the library. And stay tuned in the next week or two for a fun debugging story around resourcet, bracket, and monad-control.

Anyway, back to our topic. To start off, consider some code to read a file and print its size:

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
import qualified Data.ByteString as B
import qualified System.IO as IO

main :: IO ()
main = do
  bs <- myReadFile "/usr/share/dict/words"
  print $ B.length bs

myReadFile :: FilePath -> IO B.ByteString
myReadFile fp = IO.withBinaryFile fp IO.ReadMode $ \h ->
  -- Highly inefficient, use a builder instead
  let loop front = do
        next <- B.hGetSome h 4096
        if B.null next
          then return front
          else loop $ B.append front next
   in loop B.empty

However, this is highly inefficient: it reads the entire contents of the file into memory at once, when we don't need that. Instead, let's calculate that in a streaming fashion:

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
import qualified System.IO as IO

main :: IO ()
main = do
  len <- myFileLength "/usr/share/dict/words"
  print len

-- Yes, there's hFileSize... ignore that
myFileLength :: FilePath -> IO Int
myFileLength fp = IO.withBinaryFile fp IO.ReadMode $ \h ->
  let loop !total = do
        next <- B.hGetSome h 4096
        if B.null next
          then return total
          else loop $ total + B.length next
   in loop 0

Notice that in both of these implementations, we've used withBinaryFile to open the file in such a way that the handle will be closed when we're done with it, regardless of whether an exception is thrown.

Introduce continuations

But it's pretty unfortunate that we've coupled together our file read logic with the logic that consumes the file. Let's make an abstraction similar to conduit to address that. We'll have an action which returns the next chunk of data from the file, together with the following action to perform.

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
import qualified System.IO as IO

data IOSource a
  = IOChunk a (IO (IOSource a))
  | IODone

sourceHandle :: IO.Handle -> IO (IOSource B.ByteString)
sourceHandle h = do
  next <- B.hGetSome h 4096
  return $
    if B.null next
      then IODone
      else IOChunk next (sourceHandle h)

sourceFile :: FilePath -> IO (IOSource B.ByteString)
sourceFile fp = IO.withBinaryFile fp IO.ReadMode sourceHandle

sourceLength :: IO (IOSource B.ByteString) -> IO Int
sourceLength =
    loop 0
  where
    loop !total mnext = do
      next <- mnext
      case next of
        IOChunk bs mnext' -> loop (total + B.length bs) mnext'
        IODone -> return total

main :: IO ()
main = do
  len <- sourceLength $ sourceFile "/usr/share/dict/words"
  print len

Our IOSource is essentially a slimmed-down conduit which can't consume any input, only produce output. That's good enough for proving our point. The sourceHandle function has the same basic structure as our first two code examples: read a chunk of data, see if it's null, and if not, return that chunk and keep going. We then do a trivial wrapping of sourceHandle in sourceFile, which uses the same withBinaryFile we had before. Finally, sourceLength just grabs the successive chunks from a given IOSource and counts the total bytes.

There's a major bug in this program. Try to spot it. Think through the control flow of this program. I encourage you to actually figure it out for yourself instead of just continuing to my explanation below.

Hint 1 This isn't a subtle exception-handling bug: it makes the program above completely broken in all cases, and you will never get a valid result. (The one exception, interestingly, is an empty file.)

Hint 2 The output when I run this program is /usr/share/dict/words: hGetBufSome: illegal operation (handle is closed).

Explanation When we enter the sourceFile function, we first call withBinaryFile. This opens up a file handle. We hand this file handle to sourceHandle, which reads the first chunk of data from the file, and returns an IOChunk value containing that chunk and a continuation, or instruction on what to do next. This continuation is an IO action, and it refers to that file handle we were given by sourceFile. (This bit is vital.) We then return this IOChunk value from sourceHandle to sourceFile. Inside sourceFile, we now trigger the cleanup bit of withBinaryFile, which closes the handle, and then return the IOChunk value back to the caller.

When we consume that IOChunk value, we will proceed to perform that continuation we were handed back. That continuation refers to the previously opened file handle, and will try to read from it. See the problem? We've already closed it! There is nothing we can do with it anymore.
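The failure mode is easy to reproduce in isolation. Here's a minimal sketch (my own, not from the original code above) that smuggles a handle out of withBinaryFile and then tries to use it:

```haskell
import qualified System.IO as IO
import Control.Exception (try, IOException)

main :: IO ()
main = do
  writeFile "/tmp/use-after-close-demo.txt" "hello\n"
  -- withBinaryFile closes the handle as soon as its callback returns,
  -- so the handle we return out of it is already dead.
  h <- IO.withBinaryFile "/tmp/use-after-close-demo.txt" IO.ReadMode return
  result <- try (IO.hGetLine h) :: IO (Either IOException String)
  case result of
    Left _  -> putStrLn "read failed: handle is closed"
    Right s -> putStrLn s
```

This is exactly what happens to the continuation returned from sourceFile: by the time anyone runs it, the handle it captured is gone.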

Explicit close

Let's try rewriting this to delay the closing of the file handle until the handle is fully consumed. Also, let's replace our sourceLength function with a new function: it tells us what the first byte in the file is. I've also added a putStrLn to tell us when we're closing the file handle.

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
import qualified System.IO as IO
import Data.Word (Word8)

data IOSource a
  = IOChunk a (IO (IOSource a))
  | IODone

sourceHandle :: IO.Handle -> IO (IOSource B.ByteString)
sourceHandle h = do
  next <- B.hGetSome h 4096
  if B.null next
    then do
      putStrLn "Closing file handle"
      IO.hClose h
      return IODone
    else return $ IOChunk next (sourceHandle h)

sourceFile :: FilePath -> IO (IOSource B.ByteString)
sourceFile fp = do
  h <- IO.openBinaryFile fp IO.ReadMode
  sourceHandle h

firstByte :: IO (IOSource B.ByteString) -> IO (Maybe Word8)
firstByte mnext = do
  next <- mnext
  return $
    case next of
      IOChunk bs _mnext' -> Just $ B.head bs
      IODone             -> Nothing

main :: IO ()
main = do
  mbyte <- firstByte $ sourceFile "/usr/share/dict/words"
  print mbyte

OK, take a guess at the output. In particular, will our file handle be closed, and why?

It turns out that, when dealing with continuations, there is no way to guarantee that your continuation will ever get called. In our case, we're only interested in reading the first chunk of data from the file, and want to ignore the rest. As a result, our cleanup code will never get called. This doesn't even get into the fact that, if an exception is thrown, we have no exception handler in place to perform cleanup. The moral of the story:

Continuation based approaches, like conduit or ContT, cannot guarantee that cleanup code will be run.

(Side note: conduit actually adds a concept called finalizers to address the non-exception case and to ensure cleanup happens promptly. But that's not our topic today.)
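The same point can be made with no file handles at all. In this sketch (mine, not from the original), the "cleanup" lives at the end of a chain of continuations, so a consumer that stops early simply never reaches it:

```haskell
import Data.IORef

-- A stream whose cleanup is buried in its final continuation.
data Source = Chunk Int (IO Source) | Done

mkSource :: IORef Bool -> Int -> IO Source
mkSource cleanedUp 0 = do
  writeIORef cleanedUp True  -- "cleanup": only runs if fully consumed
  return Done
mkSource cleanedUp n = return $ Chunk n (mkSource cleanedUp (n - 1))

main :: IO ()
main = do
  cleanedUp <- newIORef False
  src <- mkSource cleanedUp 3
  -- Consume only the first chunk and drop the continuation.
  case src of
    Chunk x _ -> print x
    Done      -> putStrLn "empty"
  readIORef cleanedUp >>= print  -- False: the cleanup never ran
```

No exception was thrown anywhere; the continuation holding the cleanup was simply discarded.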

So what's the right way to write this code? You have to use withBinaryFile outside of your sourceHandle call entirely, like this:

main :: IO ()
main = do
  mbyte <- IO.withBinaryFile "/usr/share/dict/words" IO.ReadMode
         $ \h -> firstByte $ sourceHandle h
  print mbyte

Why this is bad

Firstly, there's an aesthetic argument against the above code. A function like sourceFile is convenient, elegant, and simple to teach. Telling people that they need to open their file handles first can be confusing. But this isn't the only problem. Let's consider a few more complicated cases:

  1. I want to create an IOSource that reads from two files, not just one. Ideally, we would keep only one file handle open at a time. If you follow through on the withBinaryFile approach above, you'll realize you need to open both files before you get started. That's a resource problem: we hold more handles open than we need.
  2. Suppose you want to read a file, and each line in that file will tell you a new file to open and stream from. In this case, we won't know statically how many files to open, or even which files to open. Since these facts are dynamically determined, our withBinaryFile approach won't work at all.
  3. If the previous example seems a bit far-fetched, that's exactly the case when doing a deep directory traversal. We start with a top level directory, and for each entry, may or may not need to open up a new directory handle, depending on whether it's a directory or not.

In other words: this approach is a bit cumbersome to use, resource-inefficient, and prevents some programs from being written at all. We need something better.
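To make case 1 concrete, here's a hypothetical sketch (file names made up) of the withBinaryFile approach applied to two files. Note that the second handle is already open while we're still consuming the first:

```haskell
import qualified Data.ByteString.Char8 as B8
import qualified System.IO as IO

main :: IO ()
main = do
  writeFile "/tmp/two-files-demo-1.txt" "hello\n"
  writeFile "/tmp/two-files-demo-2.txt" "world\n"
  out <- IO.withBinaryFile "/tmp/two-files-demo-1.txt" IO.ReadMode $ \h1 ->
    IO.withBinaryFile "/tmp/two-files-demo-2.txt" IO.ReadMode $ \h2 -> do
      -- Both h1 and h2 are open for this entire block, even though we
      -- are completely done with h1 before we ever touch h2.
      a <- B8.hGetContents h1  -- strict read
      b <- B8.hGetContents h2
      return (B8.append a b)
  B8.putStr out
```

With two files this is merely wasteful; with a dynamically determined list of files (cases 2 and 3), there's no place to write the nested withBinaryFile calls at all.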

Why withBinaryFile works

The reason that withBinaryFile solves our problems is that it lives outside of our continuation framework. It is not subject to the whims of whether a specific continuation will or will not be called. It lives in IO directly, and we know how to install a cleanup function which will always be called, regardless of whether an exception is thrown or not. Specifically: we can just use bracket.
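For reference, bracket (from Control.Exception) is exactly this pattern: acquire, release, use, with the release guaranteed to run whether or not the use throws. A tiny demonstration:

```haskell
import Control.Exception (bracket)

main :: IO ()
main = bracket
  (putStrLn "acquire" >> return (42 :: Int))  -- acquire a "resource"
  (\_ -> putStrLn "release")                  -- always runs, even on exceptions
  (\n -> print n)                             -- use the resource
```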

We need some way to pair the control that bracket provides from outside our continuation with the dynamic allocations we want to perform inside our continuations.

A simplified ResourceT

In order to make this work, we'll implement a simplified version of ResourceT. We'll keep a list of file handles that need to be closed. But since we need to be able to update that list dynamically from within our continuation code, this will be a mutable list (wrapped in an IORef). Also, for simplicity, we'll make it ResourceIO instead of a proper monad transformer.

Note that, by sticking to just a list of file handles, we've simplified our work significantly. File handles can be closed multiple times, and closing a file handle is not supposed to throw an exception itself (though it can in some corner cases; we're ignoring that). The actual code for ResourceT ensures that cleanups only happen one time and explicitly deals with exceptions from cleanup code.

{-# LANGUAGE DeriveFunctor #-}
module ResourceIO
  ( ResourceIO
  , runResourceIO
  , openBinaryFile
  ) where

import Data.IORef
import qualified System.IO as IO
import Control.Exception
import Control.Monad
import Control.Monad.IO.Class

newtype ResourceIO a = ResourceIO (IORef [IO.Handle] -> IO a)
  deriving Functor

instance Applicative ResourceIO where
  pure x = ResourceIO $ \_ -> return x
  (<*>) = ap
instance Monad ResourceIO where
  return = pure
  ResourceIO f >>= g = ResourceIO $ \ref -> do
    x <- f ref
    let ResourceIO g' = g x
    g' ref
instance MonadIO ResourceIO where
  liftIO m = ResourceIO $ \_ref -> m

runResourceIO :: ResourceIO a -> IO a
runResourceIO (ResourceIO inner) = bracket
  (newIORef [])
  cleanup
  inner
  where
    cleanup ref = do
      handles <- readIORef ref
      mapM_ IO.hClose handles

openBinaryFile :: FilePath -> IO.IOMode -> ResourceIO IO.Handle
openBinaryFile fp mode = ResourceIO $ \ref -> mask $ \restore -> do
  h <- restore $ IO.openBinaryFile fp mode
  atomicModifyIORef' ref $ \hs -> (h:hs, ())
  return h

Most of the code here is involved in implementing a Monad/MonadIO interface for ResourceIO. If you focus on runResourceIO, you'll see that, as promised, we're using bracket. We create our shared mutable reference, ensure that cleanup is called regardless of exceptions, and then run the user-provided action.

openBinaryFile demonstrates how we would allocate resources. We open the file, and immediately modify our list of open handles to include the newly opened handle. In the real ResourceT, this is generalized to IO () actions to perform arbitrary cleanup.

Side note: if you're confused about the usage of mask here, it's to deal with the possibility of asynchronous exceptions, and to make sure an exception is not thrown between the call to openBinaryFile and atomicModifyIORef'. Proper async exception handling is a complicated topic, which is why it's best to stick to library functions like bracket and libraries like safe-exceptions that are designed to handle them.
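To illustrate the generalization mentioned above, here's a hypothetical variant of ResourceIO that keeps an IORef [IO ()] of arbitrary cleanup actions instead of a list of handles. The allocate name mirrors the real ResourceT API, but this is a sketch of the idea, not the real implementation:

```haskell
import Data.IORef
import Control.Exception (bracket, mask)

-- A variant that registers arbitrary cleanup actions, not just handles.
newtype ResourceIO a = ResourceIO (IORef [IO ()] -> IO a)

runResourceIO :: ResourceIO a -> IO a
runResourceIO (ResourceIO inner) = bracket
  (newIORef [])
  (\ref -> readIORef ref >>= sequence_)  -- run every registered cleanup
  inner

-- Pair an acquire action with its cleanup action.
allocate :: IO a -> (a -> IO ()) -> ResourceIO a
allocate acquire release = ResourceIO $ \ref -> mask $ \restore -> do
  a <- restore acquire
  atomicModifyIORef' ref $ \cleanups -> (release a : cleanups, ())
  return a

main :: IO ()
main = runResourceIO $ ResourceIO $ \ref -> do
  let ResourceIO alloc = allocate (putStrLn "acquire") (\() -> putStrLn "release")
  () <- alloc ref
  putStrLn "use"
```

The "release" line prints after "use" because the cleanup runs in bracket's release slot, once the whole runResourceIO block is done.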

Using it

We need to make some minor modifications to our program in order to use this. Firstly, we had specialized IOSource to IO actions only. We're now going to want this thing to run in ResourceIO, so let's add a type parameter to indicate the base monad (just like ConduitM has). And let's call a spade a spade, and rename IOSource to ListT. This is, after all, the correctly implemented list monad transformer. (Ignore the one from the transformers package; it's completely broken.)

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
import qualified System.IO as IO
import Data.Word (Word8)
import ResourceIO
import Control.Monad.IO.Class

data ListT m a
  = ConsT a (m (ListT m a))
  | NilT

sourceHandle :: MonadIO m => IO.Handle -> m (ListT m B.ByteString)
sourceHandle h = liftIO $ do
  next <- B.hGetSome h 4096
  if B.null next
    then do
      IO.hClose h
      return NilT
    else return $ ConsT next (sourceHandle h)

sourceFile :: FilePath -> ResourceIO (ListT ResourceIO B.ByteString)
sourceFile fp = do
  h <- openBinaryFile fp IO.ReadMode
  sourceHandle h

firstByte :: Monad m => m (ListT m B.ByteString) -> m (Maybe Word8)
firstByte mnext = do
  next <- mnext
  return $
    case next of
      ConsT bs _mnext' -> Just $ B.head bs
      NilT             -> Nothing

main :: IO ()
main = do
  mbyte <- runResourceIO $ firstByte $ sourceFile "/usr/share/dict/words"
  print mbyte

Note that there's no longer any call to withBinaryFile, and we have all of the exception safety guarantees we want. We can even implement something which reads two files in sequence, and have the desired behavior of only having one file open at a time:

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
import qualified System.IO as IO
import Data.Word (Word8)
import ResourceIO
import Control.Monad.IO.Class

data ListT m a
  = ConsT a (m (ListT m a))
  | NilT

appendListT :: Monad m
            => m (ListT m a)
            -> m (ListT m a)
            -> m (ListT m a)
appendListT left0 right =
    loop left0
  where
    loop mnext = do
      next <- mnext
      case next of
        ConsT x mnext' -> return $ ConsT x $ loop mnext'
        NilT           -> right

sourceHandle :: MonadIO m => IO.Handle -> m (ListT m B.ByteString)
sourceHandle h = liftIO $ do
  next <- B.hGetSome h 4096
  if B.null next
    then do
      IO.hClose h
      return NilT
    else return $ ConsT next (sourceHandle h)

sourceFile :: FilePath -> ResourceIO (ListT ResourceIO B.ByteString)
sourceFile fp = do
  h <- openBinaryFile fp IO.ReadMode
  sourceHandle h

sourceLength :: Monad m => m (ListT m B.ByteString) -> m Int
sourceLength =
    loop 0
  where
    loop !total mnext = do
      next <- mnext
      case next of
        ConsT bs mnext' -> loop (total + B.length bs) mnext'
        NilT            -> return total

main :: IO ()
main = do
  len <- runResourceIO $ sourceLength $ appendListT
    (sourceFile "/usr/share/dict/words")
    (sourceFile "/usr/share/dict/words")
  print len


Concurrency

If you looked at the code above, I used atomicModifyIORef' to add a new file handle to the cleanup queue. You may think that this means we're concurrency-friendly. However, we aren't at all. Let's start by adding a new function to our ResourceIO interface:

asyncResourceIO :: ResourceIO a -> ResourceIO (Async a)
asyncResourceIO (ResourceIO f) = ResourceIO $ \ref -> async $ f ref

This uses the async library to fork a thread and provides an Async value to retrieve the value from that thread when it completes. Now let's naively use it in our main function:

main :: IO ()
main = do
  alen <- runResourceIO $ asyncResourceIO $ sourceLength $
    (sourceFile "/usr/share/dict/words")
  putStrLn "Do some other work in the main thread, may take a while..."
  threadDelay 100000
  len <- wait alen
  print len

With the ominous introduction I gave this, answer this question: do you think this is going to work? And why or why not?

Let's step through what's going to happen here:

  1. runResourceIO creates a mutable reference to hold onto file handles to be closed
  2. asyncResourceIO forks a child thread
  3. Child thread opens up a file handle and adds it to the mutable reference of things to clean up
  4. Parent thread finishes forking the child thread, and (from within runResourceIO) calls the cleanup action, closing the file handle
  5. Child thread continues to do work, but throws an exception trying to read from the (now closed) file handle

Actually, that's just one possible scenario. Another possibility is that the parent thread will call cleanup before the child thread grabs the file handle. In which case, the reads will succeed, but we'll have no guarantee that the file handle will be cleaned up. In other words, we have a race condition.

This should stress the importance of getting concurrency and ResourceT right. We need to make sure that runResourceT does not close any resources that are still being consumed by child threads. One way to do that is to use the resourceForkIO function, which introduces a reference counting scheme to ensure that resources are closed only when all threads are done with them.

Unfortunately, due to how the monad-control instances for ResourceT work, using concurrency functions from lifted-base or lifted-async will not use this reference counting behavior. Overall, my recommendation is: don't fork threads when inside ResourceT if you can avoid it.

Other ways to abuse ResourceT

There is no actual scoping of the resources you get from ResourceT to ensure that they are still alive. Such techniques do exist (e.g., regions), but the types are significantly more complicated, which is why the conduit ecosystem sticks to ResourceT.

The simplest demonstration of breaking this is:

main :: IO ()
main = do
  h <- runResourceIO $ openBinaryFile "/usr/share/dict/words" IO.ReadMode
  len <- sourceLength $ sourceHandle h
  print len

The handle we get back from openBinaryFile will be closed before we ever get a chance to pass it to sourceHandle. This code is just as broken as:

main :: IO ()
main = do
  h <- IO.withBinaryFile "/usr/share/dict/words" IO.ReadMode return
  len <- sourceLength $ sourceHandle h
  print len

But for many, the latter is more obviously wrong. The rule: make sure that your runResourceIO call lives around the entire scope that the resources will be used in.

As a more real-world example taken from a Twitter discussion, consider the following code that you might achieve by playing Type Tetris with Conduit:

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
import Conduit

main :: IO ()
main = do
  len <- runConduit
       $ transPipe runResourceT (sourceFile "/usr/share/dict/words")
      .| lengthCE
  print len

transPipe applies some kind of a monad transformation at each step of the running of the given conduit. So each time we try to perform some action in sourceFile, we'll create a new mutable reference of cleanup actions, perform the action, and then immediately clean up the resources we allocated. In reality, we want those resources to persist through later continuations within the sourceFile. We would rewrite the code above to:

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
import Conduit

main :: IO ()
main = do
  len <- runResourceT
       $ runConduit
       $ sourceFile "/usr/share/dict/words"
      .| lengthCE
  print len

Or, since runConduitRes = runResourceT . runConduit:

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
import Conduit

main :: IO ()
main = do
  len <- runConduitRes
       $ sourceFile "/usr/share/dict/words"
      .| lengthCE
  print len

June 19, 2017 08:52 AM

Mark Jason Dominus

Git's rejected push error

On Saturday I posted an article explaining how remote branches and remote-tracking branches work in Git. That article is a prerequisite for this one. But here's the quick summary:

When dealing with a branch (say, master) copied from a remote repository (say, remote), there are three branches one must consider:
  1. The copy of master in the local repository
  2. The copy of master in the remote repository
  3. The local branch origin/master that records the last known position of the remote branch
Branch 3 is known as a “remote-tracking branch”. This is because it tracks the remote branch, not because it is itself a remote branch. Actually it is a local copy of the remote branch. From now on I will just call it a “tracking branch”.

The git-fetch command (green) copies branch (2) to (3).

The git-push command (red) copies branch (1) to (2), and incidentally updates (3) to match the new (2).

The diagram at right summarizes this.

We will consider the following typical workflow:

  1. Fetch the remote master branch and check it out.
  2. Do some work and commit it on the local master.
  3. Push the new work back to the remote.

But step 3 fails, saying something like:

    ! [rejected]        master -> master (fetch first)
    error: failed to push some refs to '../remote/'
    hint: Updates were rejected because the remote contains work that you do
    hint: not have locally. This is usually caused by another repository pushing
    hint: to the same ref. You may want to first integrate the remote changes
    hint: (e.g., 'git pull ...') before pushing again.
    hint: See the 'Note about fast-forwards' in 'git push --help' for details.

In older versions of Git the hint was a little shorter:

    hint: Updates were rejected because the tip of your current branch is behind
    hint: its remote counterpart. Merge the remote changes (e.g. 'git pull')
    hint: before pushing again.
    hint: See the 'Note about fast-forwards' in 'git push --help' for details.

Everyone at some point gets one of these messages, and in my experience it is one of the most confusing and distressing things for beginners. It cannot be avoided, worked around, or postponed; it must be understood and dealt with.

Not everyone gets a clear explanation. (Reading it over, the actual message seems reasonably clear, but I know many people find it long and frightening and ignore it. It is tough in cases like this to decide how to trade off making the message shorter (and perhaps thereby harder to understand) or longer (and frightening people away). There may be no good solution. But here we are, and I am going to try to explain it myself, with pictures.)

In a large project, the remote branch is always moving, as other people add to it, and they do this without your knowing about it. Immediately after you do the fetch in step 1 above, the tracking branch origin/master reflects the state of the remote branch. Ten seconds later, it may not; someone else may have come along and put some more commits on the remote branch in the interval. This is a fundamental reality that new Git users must internalize.

Typical workflow

We were trying to do this:

  1. Fetch the remote master branch and check it out.
  2. Do some work and commit it on the local master.
  3. Push the new work back to the remote.

and the failure occurred in step 3. Let's look at what each of these operations actually does.

1. Fetch the remote master branch and check it out.

git fetch origin master
git checkout master

The black circles at the top represent some commits that we want to fetch from the remote repository. The fetch copies them to the local repository, and the tracking branch origin/master points to the local copy. Then we check out master and the local branch master also points to the local copy.

Branch names like master or origin/master are called “refs”. At this moment all three refs refer to the same commit (although there are separate copies in the two repositories) and the three branches have identical contents.

2. Do some work and commit it on the local master.

git add …
git commit …

The blue dots on the local master branch are your new commits. This happens entirely inside your local repository and doesn't involve the remote one at all.

But unbeknownst to you, something else is happening where you can't see it. Your collaborators or co-workers are doing their own work in their own repositories, and some of them have published this work to the remote repository. These commits are represented by the red dots in the remote repository. They are there, but you don't know it yet because you haven't looked at the remote repository since they appeared.

3. Push the new work back to the remote.

git push origin master

Here we are trying to push our local master, which means that we are asking the remote repo to overwrite its master with our local one. If the remote repo agreed to this, the red commits would be lost (possibly forever!) and would be completely replaced by the blue commits. The error message that is the subject of this article is Git quite properly refusing to fulfill your request:

    ! [rejected]        master -> master (fetch first)
    error: failed to push some refs to '../remote/'
    hint: Updates were rejected because the remote contains work that you do
    hint: not have locally. This is usually caused by another repository pushing
    hint: to the same ref. You may want to first integrate the remote changes
    hint: (e.g., 'git pull ...') before pushing again.
    hint: See the 'Note about fast-forwards' in 'git push --help' for details.

Let's read through that slowly:

Updates were rejected because the remote contains work that you do not have locally.

This refers specifically to the red commits.

This is usually caused by another repository pushing to the same ref.

In this case, the other repository is your co-worker's repo, not shown in the diagram. They pushed to the same ref (master) before you did.

You may want to first integrate the remote changes (e.g., 'git pull ...') before pushing again.

This is a little vague. There are many ways one could conceivably “integrate the remote changes” and not all of them will solve the problem.

One alternative (which does not integrate the changes) is to use git push -f. The -f is for “force”, and instructs the remote repository that you really do want to discard the red commits in favor of the blue ones. Depending on who owns it and how it is configured, the remote repository may agree to this and discard the red commits, or it may refuse. (And if it does agree, the coworker whose commits you just destroyed may try to feed you poisoned lemonade, so use -f with caution.)

See the 'Note about fast-forwards' in 'git push --help' for details.

To “fast-forward” the remote ref means that your local branch is a direct forward extension of the remote branch, containing everything that the remote branch does, in exactly the same order. If this is the case, overwriting the remote branch with the local branch is perfectly safe. Nothing will be lost or changed, because the local branch contains everything the remote branch already had. The only change will be the addition of new commits at the end.

There are several ways to construct such a local branch, and choosing between them depends on many factors including personal preference, your familiarity with the Git tool set, and the repository owner's policies. Discussing all of this is outside the scope of the article, so I'll just use one as an example: We are going to rebase the blue commits onto the red ones.

4. Refresh the tracking branch.

git fetch origin master

The first thing to do is to copy the red commits into the local repo; we haven't even seen them yet. We do that as before, with git-fetch. This updates the tracking branch with a copy of the remote branch just as it did in step 1.

If instead of git fetch origin master we did git pull --rebase origin master, Git would do exactly the same fetch, and then automatically do a rebase as described in the next section. If we did git pull origin master without --rebase, it would do exactly the same fetch, and then instead of a rebase it would do a merge, which I am not planning to describe. The point to remember is that git pull is just a convenient way to combine the commands of this section and the next one, nothing more.

5. Rewrite the local changes.

git rebase origin/master

Now is the moment when we “integrate the remote changes” with our own changes. One way to do this is git rebase origin/master. This tells Git to try to construct new commits that are just like the blue ones, but instead of starting from the last black commit, they will start from the last red one. (For more details about how this works, see my talk slides about it.) There are many alternatives here to rebase, some quite elaborate, but that is a subject for another article, or several other articles.

If none of the files modified in the blue commits have also been modified in any of the red commits, there is no issue and everything proceeds automatically. And if some of the same files are modified, but only in non-overlapping portions, Git can automatically combine them. But if some of the files are modified in incompatible ways, the rebase process will stop in the middle and ask how to proceed, which is another subject for another article. This article will suppose that the rebase completed automatically. In this case the blue commits have been “rebased onto” the red commits, as in the diagram at right.

The diagram is a bit misleading here: it looks as though those black and red commits appear in two places in the local repository, once on the local master branch and once on the tracking branch. They don't. The two branches share those commits, which are stored only once.

Notice that the command is git rebase origin/master. This is different in form from git fetch origin master or git push origin master. Why a slash instead of a space? Because with git-fetch or git-push, we tell it the name of the remote repo, origin, and the name of the remote branch we want to fetch or push, master. But git-rebase operates locally and has no use for the name of a remote repo. Instead, we give it the name of the branch onto which we want to rebase the new commits. In this case, the target branch is the tracking branch origin/master.

6. Try the push again.

git push origin master

We try the exact same git push origin master that failed in step 3, and this time it succeeds, because this time the operation is a “fast-forward”. Before, our blue commits would have replaced the red commits. But our rewritten local branch does not have that problem: it includes the red commits in exactly the same places as they are already on the remote branch. When the remote repository replaces its master with the one we are pushing, it loses nothing, because the red commits are identical. All it needs to do is to add the blue commits onto the end and then move its master ref forward to point to the last blue commit instead of to the last red commit. This is a “fast-forward”.

At this point, the push is successful, and the git-push command also updates the tracking branch to reflect that the remote branch has moved forward. I did not show this in the illustration.

But wait, what if someone else had added yet more commits to the remote master while we were executing steps 4 and 5? Wouldn't our new push attempt fail just like the first one did? Yes, absolutely! We would have to repeat steps 4 and 5 and try a third time. It is possible, in principle, to be completely prevented from pushing commits to a remote repo because it is always changing so quickly that you never get caught up on its current state. Repeated push failures of this type are a sign that the project is large enough that the repository's owner needs to set up a more structured code release mechanism than “everyone lands stuff on master whenever they feel like it”.

An earlier draft of this article ended at this point with “That is all I have to say about this.” Ha!

Unavoidable problems

Everyone suffers through this issue at some point or another. It is tempting to wonder if Git couldn't somehow make it easier for people to deal with. I think the answer is no. Git has multiple, distributed repositories. To abandon that feature would be to go back to the dark ages of galley slaves, smallpox, and SVN. But if you have multiple distributed anythings, you must face the issue of how to synchronize them. This is intrinsic to distributed systems: two components receive different updates at the same time, and how do you reconcile them?

For reasons I have discussed before, it does not appear possible to automate the reconciliation in every case in a source code control system, because sometimes the reconciliation may require going over to a co-worker's desk and arguing for two hours, then calling in three managers and the CTO and making a strategic decision which then has to be approved by a representative of the legal department. The VCS is not going to do this for you.

I'm going to digress a bit and then come back to the main point. Twenty-five years ago I taught an introductory programming class in C. The previous curriculum had tried hard to defer pointers to the middle of the semester, as K&R does (chapter 7, I think). I decided this was a mistake. Pointers are everywhere in C and without them you can't call scanf or pass an array to a function (or access the command-line arguments or operate on strings or use most of the standard library or return anything that isn't a number…). Looking back a few years later I wrote:

Pointers are an essential part of [C's] solution to the data hiding problem, which is an essential issue. Therefore, they cannot be avoided, and in fact should be addressed as soon as possible. … They presented themselves in the earliest parts of the material not out of perversity, but because they were central to the topic.

I developed a new curriculum that began treating pointers early on, as early as possible, and which then came back to them repeatedly, each time elaborating on the idea. This was a big success. I am certain that it is the right way to do it.

(And I've been intending since 2006 to write an article about K&R's crappy discussion of pointers and how its deficiencies and omissions have been replicated down the years by generation after generation of C programmers.)

I think there's an important pedagogical principle here. A good teacher makes the subject as simple as possible, but no simpler. Many difficult issues, perhaps most, can be ignored, postponed, hidden, prevaricated, fudged, glossed over, or even solved. But some must be met head-on and dealt with, and for these I think the sooner they are met and dealt with, the better.

Push conflicts in Git, like pointers in C, are not minor or peripheral; they are an intrinsic and central issue. Almost everyone is going to run into push conflicts, not eventually, but right away. They are going to be completely stuck until they have dealt with it, so they had better be prepared to deal with it right away.

If I were to write a book about Git, this discussion would be in chapter 2. Dealing with merge conflicts would be in chapter 3. All the other stuff could wait.

That is all I have to say about this. Thank you for your kind attention, and thanks to Sumana Harihareswara and AJ Jordan for inspiration.

by Mark Dominus ( at June 19, 2017 04:34 AM

June 18, 2017

Holden Karau

SF Queer Trans Focused Scooter Group

Outside of the wonderful world of distributed systems, open source software, and being kind of emo about that, I help organize a queer & trans focused scooter group in San Francisco called the Sparkling Pink Pandas. Many of our members ride rental scoots (no motorcycle license required) and our most frequent destinations are the 7-11 and the cookie shop. Every year around Pride we try to recruit some new members, since that's when people realize how much more awesome it would be if there were more of us. If this sounds like fun to you, and you are not an asshole, come join us at

by Holden Karau at June 18, 2017 05:41 PM

Gabriel Gonzalez

Dhall is now a template engine


Dhall is a typed and programmable configuration language, and now you can also use it as a template engine with the newly released dhall-text library, which provides a dhall-to-text executable for templating text.

This executable actually does not do very much: all the code does is check that the Dhall expression has type Text and then renders the Text. Most of the work to support template engine features actually consists of improvements to the core Dhall language. That means that all the features I'm highlighting in this post also benefit the other Dhall integrations.

You can learn more about Dhall by reading the official tutorial but I can also illustrate how dhall-to-text works by comparing to Mustache, which is one of the more widely used template engines. All of the following examples come from the Mustache manual for the Ruby library.

Initial example

Mustache is a text templating engine that subdivides the work of templating into two parts:

  • The text to template
  • The data to template the text with

For example, given the following template:

Hello {{name}}
You have just won {{value}} dollars!
Well, {{taxed_value}} dollars, after taxes.

... and the following data:

{
  "name": "Chris",
  "value": 10000,
  "taxed_value": 10000 - (10000 * 0.4),
  "in_ca": true
}

... we get the following output when we combine the two:

Hello Chris
You have just won 10000 dollars!
Well, 6000.0 dollars, after taxes.

In Dhall, there is no distinction between the template and the data. They are both Dhall expressions. A template is just a Dhall function and the data is just an argument that we pass to that function.

For example, the above template translates to this Dhall file:

$ cat function
    \(record : { name        : Text
               , value       : Double
               , taxed_value : Double
               , in_ca       : Bool
               }
    )   ->  ''
Hello ${record.name}
You have just won ${Double/show record.value} dollars!
${ if record.in_ca
   then "Well, ${Double/show record.taxed_value} dollars, after taxes"
   else ""
}
''

... and the above data payload translates to this Dhall file:

$ cat value
{ name        = "Chris"
, value       = 10000.0
, taxed_value = 6000.0
, in_ca       = True
}

... and we can combine the two using the dhall-to-text executable by applying the function to the argument:

$ dhall-to-text <<< './function ./value'

Hello Chris
You have just won 10000.0 dollars!
Well, 6000.0 dollars, after taxes

This example already highlights several features of Dhall, which the next section will walk through.

Dhall basics

Dhall is a functional programming language and supports anonymous functions of the form:

\(functionArgumentName : functionArgumentType) -> functionResult

For example, this template:

    \(record : { name        : Text
               , value       : Double
               , taxed_value : Double
               , in_ca       : Bool
               }
    )   ->  ''
Hello ${record.name}
You have just won ${Double/show record.value} dollars!
${ if record.in_ca
   then "Well, ${Double/show record.taxed_value} dollars, after taxes"
   else ""
}
''

... is just one large function where:

  • the function argument name is record

  • the function argument type is the following anonymous record type:

    { name        : Text
    , value       : Double
    , taxed_value : Double
    , in_ca       : Bool
    }

  • the function result is a multiline string literal

    ''
    Hello ${record.name}
    You have just won ${Double/show record.value} dollars!
    ${ if record.in_ca
       then "Well, ${Double/show record.taxed_value} dollars, after taxes"
       else ""
    }
    ''

Multiline string literals use the same syntax as the Nix language: two single quotes to open and close the string. Dhall also supports the ordinary string literals you know and love, using double quotes, such as:

"Well, ${Double/show record.taxed_value} dollars, after taxes"

We can interpolate any Dhall expression of type Text into a string literal using ${...} syntax (another newly added Dhall feature). We cannot automatically interpolate other types of values like Doubles, so we have to explicitly convert them with a function like Double/show.

Interpolation works for arbitrarily long Dhall expressions as long as they have type Text. This is why we can interpolate an if expression, like this:

${ if record.in_ca
   then "Well, ${Double/show record.taxed_value} dollars, after taxes"
   else ""
}

Dhall lets us import other Dhall expressions by their file path, URL, or even via environment variables. For example, we were already using this feature when evaluating our template:

$ dhall-to-text <<< './function ./value'

./function ./value is yet another valid Dhall expression that replaces ./function and ./value with the corresponding expression stored within each respective file.


Dhall is typed and will catch errors in our template files. If our record is missing any fields then that's a type error. For example:

$ dhall-to-text <<< './function { name = "Chris" }'
Error: Wrong type of function argument

./example0 { name = "Chris" }


We can also obtain more detailed information by adding the --explain flag:

$ dhall-to-text --explain <<< './function { name = "Chris" }'

Error: Wrong type of function argument

Explanation: Every function declares what type or kind of argu...

For example:

│ λ(x : Bool) → x : Bool → Bool │ This anonymous function...
└───────────────────────────────┘ arguments that have typ...

The function's input type


{ Lots of helpful explanation that I'm cutting out for brevity }


You tried to invoke the following function:

↳ λ(record : { in_ca : Bool, name : Text, taxed_value : Double...

... which expects an argument of type or kind:

↳ { in_ca : Bool, name : Text, taxed_value : Double, value : D...

... on the following argument:

↳ { name = "Chris" }

... which has a different type or kind:

↳ { name : Text }


./example0 { name = "Chris" }


These type safety guarantees protect us against unintentional templating errors.

Dhall does support optional fields and values, but you have to opt into them explicitly, because all values are required by default. The next section covers how to produce and consume optional values.

Optional fields

In Mustache, if we provide a template like this:

* {{name}}
* {{age}}
* {{company}}
* {{{company}}}

... and we don't supply all the fields:

"name": "Chris",
"company": "<b>GitHub</b>"

... then by default any missing fields render as empty text (although this behavior is configurable in Mustache):

* Chris
* &lt;b&gt;GitHub&lt;/b&gt;
* <b>GitHub</b>

Mustache also provides support for escaping HTML (and Dhall does not), as the above example illustrates.

If we ignore the ability to escape HTML, then the corresponding Dhall template would be:

$ cat function1
    \(record : { name    : Text
               , age     : Optional Integer
               , company : Text
               }
    )   ->  ''
* ${record.name}
* ${Optional/fold Integer record.age Text Integer/show ""}
* ${record.company}
''

... and the corresponding data would be:

$ cat value1
{ name    = "Chris"
, age     = [] : Optional Integer
, company = "<b>GitHub</b>"
}

... which renders like this:

$ dhall-to-text <<< './function1 ./value1'

* Chris
* <b>GitHub</b>

Dhall forces us to declare which values are Optional (such as age) and which values are required (such as name). However, we do have the luxury of specifying that individual values are Optional, whereas Mustache requires us to specify globally whether all values are optional or required.

We also still have to supply an Optional field, even if the field is empty. We can never omit a record field in Dhall, since that changes the type of the record.

We cannot interpolate record.age directly into the string because the type of record.age is Optional Integer and not Text. We have to explicitly convert to Text, like this:

Optional/fold Integer record.age Text Integer/show ""

Informally, you can read this code as saying:

  • If the record.age value is present, then use Integer/show to render the value
  • If the record.age value is absent, then return the empty string

Optional/fold is a builtin that provides the most general way to consume an Optional value. However, the type is a bit long:

Optional/fold
    :   ∀(a : Type)   -- The element type of the `Optional` value
    →   Optional a    -- The `Optional` value to consume
    →   ∀(r : Type)   -- The type of result we will produce
    →   (a → r)       -- Function to produce the result if the value is present
    →   r             -- Result if the value is absent
    →   r             -- The final result

We can work through this large type by seeing the inferred type of Optional/fold applied to successively more arguments:

Optional/fold
    : ∀(a : Type) → Optional a → ∀(r : Type) → (a → r) → r → r

Optional/fold Integer
    : Optional Integer → ∀(r : Type) → (Integer → r) → r → r

Optional/fold Integer record.age
    : ∀(r : Type) → (Integer → r) → r → r

Optional/fold Integer record.age Text
    : (Integer → Text) → Text → Text

Optional/fold Integer record.age Text Integer/show
    : Text → Text

Optional/fold Integer record.age Text Integer/show ""
    : Text

We could also make every field of the record optional, too:

    \(record : { name    : Optional Text
               , age     : Optional Integer
               , company : Optional Text
               }
    )   ->  let id = \(t : Text) -> t
        in  ''
* ${Optional/fold Text record.name Text id ""}
* ${Optional/fold Integer record.age Text Integer/show ""}
* ${Optional/fold Text record.company Text id ""}
''

... which would also require matching changes in the data:

{ name    = ["Chris"]         : Optional Text
, age     = []                : Optional Integer
, company = ["<b>GitHub</b>"] : Optional Text
}

This is quite verbose, but we can take advantage of the fact that Dhall is a real programming language and define helper functions to reduce repetition. For example, we could save the following two files:

$ cat optionalText
    \(x : Optional Text)
->  Optional/fold Text x Text
        (\(t : Text) -> t)  -- What to do if the value is present
        ""                  -- What to do if the value is absent

$ cat optionalInteger
    \(x : Optional Integer)
->  Optional/fold Integer x Text
        Integer/show        -- What to do if the value is present
        ""                  -- What to do if the value is absent

... and then use those two functions to reduce the boilerplate of our template:

    \(record : { name    : Optional Text
               , age     : Optional Integer
               , company : Optional Text
               }
    )   ->  ''
* ${./optionalText    record.name   }
* ${./optionalInteger record.age    }
* ${./optionalText    record.company}
''

However, we might not even want to render the bullet at all if the value is missing. We could instead define the following two utilities:

$ cat textBullet
    \(x : Optional Text)
->  Optional/fold Text x Text
        (\(t : Text) -> "* ${t}\n")
        ""

$ cat integerBullet
    \(x : Optional Integer)
->  Optional/fold Integer x Text
        (\(t : Integer) -> "* ${Integer/show t}\n")
        ""

... and then we could write our template like this:

    \(record : { name    : Optional Text
               , age     : Optional Integer
               , company : Optional Text
               }
    )   ->      ./textBullet    record.name
            ++  ./integerBullet record.age
            ++  ./textBullet    record.company

... which would render like this:

* Chris
* <b>GitHub</b>

This illustrates how Dhall gives you greater precision in controlling the layout of your template. A template language like Mustache is limited by the fact that the templating logic must be expressed inline within the templated file itself. With Dhall you can separate the template from the logic if you want to avoid accidentally introducing superfluous newlines or whitespace.


Mustache lets you guard a section of text to only display if a boolean value is True:

{{#person}}
Never shown!
{{/person}}

If you render that with this data:

"person": false

... then you get this result:


The literal translation of that template to Dhall would be:

    \(record : { person : Bool })
->  ''
${ if record.person
   then "Never shown!"
   else ""
}
''

However, Dhall does not have to wrap everything in a record like Mustache does. We could just provide a naked Bool argument to our function directly:

-- ./function2

    \(person : Bool)
->  ''
${ if person
   then "Never shown!"
   else ""
}
''

We also don't need to separate the argument out into a separate file. We can just apply the function directly to the argument like this:

$ dhall-to-text <<< './function2 False'


... or we could combine both of them into the same file if we never intended to change the data:

    let person = False
in  ''
${ if person
   then "Never shown!"
   else ""
}
''

Mustache also has a notion of "truthiness", meaning that you can use other types of values in place of boolean values. For example, the Mustache template permits person to also be a List or an Optional value, and Mustache would treat the absence of a value as equivalent to False and the presence of at least one value as equivalent to True.

Dhall does not automatically treat Bool/List/Optional as interchangeable. You have to explicitly convert between them in order to avoid type errors.
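For example, here is a minimal sketch of such an explicit conversion, reusing the Optional/fold builtin from earlier (the strings and the argument name are illustrative):

```dhall
-- A sketch: explicitly converting an Optional Text to a Bool,
-- since Dhall has no notion of "truthiness".
    \(person : Optional Text)
->  let present = Optional/fold Text person Bool (\(t : Text) -> True) False
    in  if present then "Somebody is here" else "Nobody is here"
```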


Mustache uses a similar syntax to render a list of values. For example, if you template this file:

{{#repo}}
  <b>{{name}}</b>
{{/repo}}
... with this data:

{
  "repo": [
    { "name": "resque" },
    { "name": "hub" },
    { "name": "rip" }
  ]
}
... then you would get this result:

<b>resque</b>
<b>hub</b>
<b>rip</b>
The equivalent Dhall template is:

    let concatMap =
in \(repo : List Text)
-> concatMap Text (\(name : Text) -> "<b>${name}</b>\n") repo

... and the equivalent Dhall payload is:

[ "resque"
, "hub"
, "rip"
]
Again, we don't need to wrap each value of the list in a one-field record like we do with Mustache. That's why we can get away with passing a list of naked Text values (i.e. List Text) instead of a list of one-field records (i.e. List { name : Text }).

This example also illustrates how Dhall can import expressions by URL. Dhall hosts a Prelude of utilities online that you can use anywhere within your program by pasting their URL. The web is Dhall's "package system", except that instead of distributing code grouped in modules or packages you distribute code at the granularity of individual expressions.

The above example retrieves Dhall's concatMap function from a URL hosted on IPFS (a distributed hashtable for the web). You don't have to use IPFS to distribute Dhall expressions, though; you can host code anywhere that can serve raw text, such as a pastebin, GitHub, or your own server.


Mustache also lets you supply user-defined functions, using the same syntax as for boolean values and lists. For example, you can template this file:

{{#wrapped}}
{{name}} is awesome.
{{/wrapped}}

... with this data:

{
  "name": "Willy",
  "wrapped": function() {
    return function(text, render) {
      return "<b>" + render(text) + "</b>"
    }
  }
}
... and Mustache will call the function on the block wrapped with the function's name:

<b>Willy is awesome.</b>

Dhall makes no distinction between functions and data because Dhall is a functional language where functions are first-class values. We can translate the above template to Dhall like this:

    \(record : { wrapped : Text -> Text, name : Text })
->  record.wrapped "${record.name} is awesome"

... and the data translates to:

{ name    = "Willy"
, wrapped = \(text : Text) -> "<b>${text}</b>"
}
Additional examples

We can translate the remaining examples from the Mustache manual fairly straightforwardly using the concepts introduced above.

Optional records in Mustache:

{{#person?}}
  Hi {{name}}!
{{/person?}}

... translate to Optional values in Dhall consumed using Optional/fold:

    \(person : Optional Text)
->  Optional/fold Text person Text
        (\(name : Text) -> "Hi ${name}!")
        ""

The following inverted section in Mustache:

{{#repo}}
  <b>{{name}}</b>
{{/repo}}
{{^repo}}
  No repos :(
{{/repo}}

... is also just a special case of Optional/fold in Dhall:

    \(repo : Optional Text)
-> Optional/fold Text repo Text
(\(name : Text) -> "<b>${name}</b>")
"No repos :("

Inline template comments in Mustache:

<h1>Today{{! ignore me }}.</h1>

... are more verbose in Dhall:

"<h1>Today${"" {- ignore me -}}.</h1>"

What Mustache calls "partial" values:

$ cat base.mustache:
{{> user}}
$ cat user.mustache:
<strong>{{name}}</strong>
... correspond to Dhall's support for importing paths as expressions:

$ cat base
    let concatMap =
in  \(names : List Text)
->  ''
${concatMap Text ./user names}
''

$ cat user
\(name : Text) -> "<strong>${name}</strong>"


If this interests you then you can test drive dhall-to-text by installing the executable from Hackage or by building from source on GitHub.

People most commonly adopt Dhall when they prefer to use a programming language without sacrificing safety. Dhall is a total (i.e. non-Turing-complete) programming language, meaning that evaluation never crashes, hangs, throws exceptions, or otherwise fails.

Dhall also supports other programming features besides the ones introduced in this post. Read the Dhall tutorial if you would like to learn about the full set of features that Dhall supports.


by Gabriel Gonzalez at June 18, 2017 01:11 PM

June 17, 2017

Mark Jason Dominus

Git remote branches and Git's missing terminology

Beginning and even intermediate Git users have several common problem areas, and one of these is the relationship between remote and local branches. I think the basic confusion is that it seems like there ought to be two things, the remote branch and the local one, and you copy back and forth between them. But there are not two but three, and the Git documentation does not clearly point this out or adopt clear terminology to distinguish between the three.

Let's suppose we have a remote repository, which could be called anything, but is typically named origin. And we have a local repository which has no name; it's just the local repo. And let's suppose we're working on a branch named master, as one often does.

There are not two but three branches of interest, and they might all be pointing to different commits:

  1. The branch named master in the local repo. This is where we do our work and make our commits. This is the local branch. It is at the lower left in the diagram.

  2. The branch named master in the remote repo. This is the remote branch, at the top of the diagram. We cannot normally see this at all because it is (typically) on another computer and (typically) requires a network operation to interact with it. So instead, we mainly deal with…

  3. The branch named origin/master in the local repo. This is the tracking branch, at the lower right in the diagram.

    We never modify the tracking branch ourselves. It is automatically maintained for us by Git. Whenever Git communicates with the remote repo and learns something about the disposition of the remote master branch, it updates the local branch origin/master to reflect what it has learned.

I think this triangle diagram is the first thing one ought to see when starting to deal with remote repositories and with git-fetch and git-push.

The Git documentation often calls the tracking branch the “remote-tracking branch”. It is important to understand that the remote-tracking branch is a local branch in the local repository. It is called the “remote-tracking” branch because it tracks the state of the remote branch, not because it is itself remote. From now on I will just call it the “tracking branch”.

Now let's consider a typical workflow:

  1. We use git fetch origin master. This copies the remote branch master from the remote repo to the tracking branch origin/master in the local repo. This is the green arrow in the diagram.

    If other people have added commits to the remote master branch since our last fetch, now is when we find out what they are. We can compare the local branch master with the tracking branch origin/master to see what is new. We might use git log origin/master to see the new commits, or git diff origin/master to compare the new versions of the files with the ones we had before. These commands do not look at the remote branch! They look at the copy of the remote branch that Git retrieved for us. If a long time elapses between the fetch and the compare, the actual remote branch might be in a completely different place than it was when we fetched it.

    (Maybe you use pull instead of fetch. But pull is exactly like fetch, except that it does a merge or rebase after the fetch completes. So the process is the same; it merely combines this step and the next step into one command.)

  2. We decide how to combine our local master with origin/master. We might use git merge origin/master to merge the two branches, or we might use git rebase origin/master to copy our new local commits onto the commits we just fetched. Or we could use git reset --hard origin/master to throw away our local commits (if any) and just take the ones on the tracking branch. There are a lot of things that could happen here, but the blue arrow in the diagram shows the general idea: we see new stuff in origin/master and update the local master to include that new stuff in some way.

  3. After doing some more work on the local master, we want to publish the new work. We use git push origin master. This is the red arrow in the diagram. It copies the local master to the remote master, updating the remote master in the process. If it is successful, it also updates the tracking branch origin/master to reflect the new position of the remote master.

In the last step, why is there no slash in git push origin master? Because origin/master is the name of the tracking branch, and the tracking branch is not involved. The push command gets two arguments: the name of the remote (origin) and the branch to push (master) and then it copies the local branch to the remote one of the same name.
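The whole triangle can be reproduced with a throwaway repository. Here is a sketch that uses a local bare repository to play the role of origin, so no network is needed (the paths and the demo identity are illustrative, and the -b flag assumes Git 2.28 or later):

```shell
# Demonstrate the three branches, using a local bare repository
# to stand in for the remote repo.
set -e
cd "$(mktemp -d)"

git init --bare -b master remote.git         # plays the role of origin
git clone remote.git work
cd work
git config user.email "demo@example.com"     # identity for the demo commit
git config user.name  "Demo"
git commit --allow-empty -m "first commit"
git push origin master                       # red arrow: local -> remote

# All three branches now point at the same commit:
git rev-parse master                             # the local branch
git rev-parse origin/master                      # the tracking branch
git --git-dir=../remote.git rev-parse master     # the remote branch
```

After a push succeeds, the three rev-parse commands print the same commit ID, which is exactly the "all in sync" state of the diagram.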

Deleting a branch

How do we delete branches? For the local branch, it's easy: git branch -d master does it instantly.

For the tracking branch, we include the -r flag: git branch -d -r origin/master. This deletes the tracking branch, and has no effect whatever on the remote repo. This is a very unusual thing to do.

To delete the remote branch, we have to use git-push because that is the only way to affect the remote repo. We use git push origin :master. As is usual with a push, if this is successful Git also deletes the tracking branch origin/master.
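All three deletions can be tried safely against a throwaway repository; a sketch (branch name, paths, and identity are illustrative, and -b assumes Git 2.28 or later):

```shell
# Demonstrate deleting the tracking, remote, and local branches in turn.
set -e
cd "$(mktemp -d)"
git init --bare -b master remote.git
git clone remote.git work
cd work
git config user.email "demo@example.com"
git config user.name  "Demo"
git commit --allow-empty -m "first commit"
git push origin master

git branch topic                  # a second branch, safe to delete
git push origin topic             # creates the remote branch and the tracking branch

git branch -d -r origin/topic     # deletes only the tracking branch...
git fetch origin                  # ...and a fetch re-creates it
git push origin :topic            # deletes the remote branch (and the tracking branch)
git branch -d topic               # finally, deletes the local ref
```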

This section has glossed over an important point: git branch -d master does not delete the master branch; it only deletes the ref, which is the name for the branch. The branch itself remains. If there are other refs that refer to it, it will remain as long as they do. If there are no other refs that point to it, it will be deleted in due course, but not immediately. Until the branch is actually deleted, its contents can be recovered.


Another way to delete a local ref (whether tracking or not) is just to go into the repository and remove it. The repository is usually in a subdirectory .git of your working tree, and if you cd .git/refs you can see where Git records the branch names and what they refer to. The master branch is nothing more nor less than a file heads/master in this directory, and its contents are the commit ID of the commit to which it refers. If you edit this commit ID, you have pointed the ref at a different commit. If you remove the file, the ref is gone. It is that simple.

Tracking branches are similar. The origin/master ref is in .git/refs/remotes/origin/master.

The remote master branch, of course, is not in your repository at all; it's in the remote repository.

Poking around in Git's repository is fun and rewarding. (If it worries you, make another clone of the repo, poke around in the clone, and throw it away when you are finished poking.) Tinkering with the refs is a good place to start Git repo hacking: create a couple of branches, move them around, examine them, delete them again, all without using git-branch. Git won't know the difference. Bonus fun activity: HEAD is defined by the file .git/HEAD. When you make a new commit, HEAD moves forward. How does that work?
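Assuming a fresh repository whose refs are still loose files (Git packs refs over time), the poking described above looks like this (paths and identity are illustrative):

```shell
# Peek at the files behind the master ref and HEAD in a throwaway repo.
set -e
cd "$(mktemp -d)"
git init -b master repo
cd repo
git config user.email "demo@example.com"
git config user.name  "Demo"
git commit --allow-empty -m "first commit"

# The master branch is just a file containing a commit ID:
cat .git/refs/heads/master
git rev-parse master          # prints the same commit ID

# HEAD is a file too; while on master it names the ref it points to:
cat .git/HEAD                 # "ref: refs/heads/master"
```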

There is a gitrepository-layout manual that says what else you can find in the repository.

Failed pushes

We're now in a good position to understand one of the most common problems that Git beginners face: they have committed some work, and they want to push it to the remote repository, but Git says

      ! [rejected]        master -> master (fetch first)
      error: failed to push some refs to 'remote'
      something something fast-forward, whatever that is

My article explaining this will appear here on Monday. (No, I really mean it.)

Terminology problems

I think one of the reasons this part of Git is so poorly understood is that there's a lack of good terminology in this area. There needs to be a way to say "the local branch named master” and “the branch named master in the remote named origin” without writing a five- or nine-word phrase every time. The name origin/master looks like it might be the second of these, but it isn't. The documentation uses the descriptive but somewhat confusing term “remote-tracking branch” to refer to it. I think abbreviating this to “tracking branch” would tend to clear things up more than otherwise.

I haven't thought of a good solution to the rest of it yet. It's tempting to suggest that we should abbreviate “the branch named master in the remote named origin” to something like “origin:master” but I think that would be a disaster. It would be too easy to confuse with origin/master and also with the use of the colon in the refspec arguments to git-push. Maybe something like origin -> master that can't possibly be mistaken for part of a shell command and that looks different enough from origin/master to make clear that it's related but not the same thing.

Git piles yet another confusion on this:

    $ git checkout master 
    Branch master set up to track remote branch master from origin.

This sounds like it has something to do with the remote-tracking branch, but it does not! It means that the local branch master has been associated with the remote origin so that fetches and pushes that pertain to it will default to using that remote.

I will think this over and try to come up with something that sucks a little less. Suggestions are welcome.

by Mark Dominus at June 17, 2017 10:26 PM

Roman Cheplyaka

Generic unification

The unification-fd package by wren gayle romano is the de-facto standard way to do unification in Haskell. You’d use it if you need to implement type inference for your DSL, for example.

To use unification-fd, we first need to express our Type type as a fixpoint of a functor, a.k.a. an initial algebra.

For instance, let’s say we want to implement type inference for the simply typed lambda calculus (STLC). The types in STLC can be represented by a Haskell type

data Type = BaseType String
          | Fun Type Type

Note that Type cannot represent type variables.

Type can be equivalently represented as a fixpoint of a functor, TypeF:

-- defined in Data.Functor.Fixedpoint in unification-fd
newtype Fix f = Fix { unFix :: f (Fix f) }

data TypeF a = BaseType String
             | Fun a a
  deriving (Functor, Foldable, Traversable)

type Type = Fix TypeF

So Fix TypeF still cannot represent any type variables, but UTerm TypeF can. UTerm is another type defined in unification-fd that is similar to Fix except it includes another constructor for type variables:

-- defined in Control.Unification in unification-fd
data UTerm t v
    = UVar  !v               -- ^ A unification variable.
    | UTerm !(t (UTerm t v)) -- ^ Some structure containing subterms.

type PolyType = UTerm TypeF IntVar

UTerm, by the way, is the free monad over the functor t.


The Control.Unification module exposes several algorithms (unification, alpha equivalence) that work on any UTerm, provided that the underlying functor t (TypeF in our example) implements a zipMatch function:

class (Traversable t) => Unifiable t where
    -- | Perform one level of equality testing for terms. If the
    -- term constructors are unequal then return @Nothing@; if they
    -- are equal, then return the one-level spine filled with
    -- resolved subterms and\/or pairs of subterms to be recursively
    -- checked.
    zipMatch :: t a -> t a -> Maybe (t (Either a (a,a)))

zipMatch essentially tells the algorithms which constructors of our TypeF functor are the same, which are different, and which fields correspond to subterms that need to be unified recursively. So for TypeF it could look like

instance Unifiable TypeF where
  zipMatch (BaseType a) (BaseType b) =
    if a == b
      then Just $ BaseType a
      else Nothing
  zipMatch (Fun a1 a2) (Fun b1 b2) =
    Just $ Fun (Right (a1, b1)) (Right (a2, b2))
  zipMatch _ _ = Nothing

Now, I prefer the following style instead:

instance Unifiable TypeF where
  zipMatch a b =
    case a of
      BaseType a' -> do
        BaseType b' <- return b
        guard $ a' == b'
        return $ BaseType a'
      Fun a1 a2 -> do
        Fun b1 b2 <- return b
        return $ Fun (Right (a1, b1)) (Right (a2, b2))

Why? First, I really don’t like multi-clause definitions. But the main reason is that the second definition behaves more reliably when we add new constructors to TypeF. Namely, if we enable ghc warnings (-Wall) and extend TypeF to include tuples:

data TypeF a = BaseType String
             | Fun a a
             | Tuple a a
  deriving (Functor, Foldable, Traversable)

… we’ll get a warning telling us not to forget to implement zipMatch for tuples:

warning: [-Wincomplete-patterns]
    Pattern match(es) are non-exhaustive
    In a case alternative: Patterns not matched: (Tuple _ _)

If we went with the first version, however, we would get no warning, because it contains a catch-all clause

  zipMatch _ _ = Nothing

As a result, it is likely that we forget to update zipMatch, and our tuples will never unify.

This is a common mistake people make when implementing binary operations in Haskell, so I just wanted to point it out. But other than that, both definitions are verbose and boilerplate-heavy.

And it goes without saying that in real-life situations, the types we want to unify tend to be bigger, and the boilerplate becomes even more tedious.

For instance, I’ve been working recently on implementing type inference for the nstack DSL, which includes tuples, records, sum types, optionals, arrays, the void type, and many primitive types. Naturally, I wasn’t eager to write zipMatch by hand.

Generic Unifiable

Generic programming is a set of techniques to avoid writing boilerplate such as our implementation of zipMatch above.

Over the years, Haskell has acquired a lot of different generic programming libraries.

For most of my generic programming needs, I pick uniplate. Uniplate is very simple to use and reasonably efficient. Occasionally I have a problem that requires something more sophisticated, like generics-sop to parse YAML or traverse-with-class to resolve names in a Haskell AST.

But none of these libraries can help us to implement a generic zipMatch.

Consider the following type:

data TypeF a = Foo a a
             | Bar Int String

A proper zipMatch implementation works very differently for Foo and Bar: Foo has two subterms to unify whereas Bar has none.

But most generics libraries don’t see this difference between Foo and Bar. They don’t distinguish between polymorphic and non-polymorphic fields. Instead, they treat all fields as non-polymorphic. From their point of view, TypeF Bool is exactly equivalent to

data TypeF = Foo Bool Bool
           | Bar Int String

Luckily, there is a generic programming library that lets us “see” type parameters. Well, just one type parameter, but that’s exactly enough for zipMatch. In other words, this library provides a generic representation for type constructors of kind * -> *, whereas most other libraries only concern themselves with ordinary types of kind *.

What is that library called? base.

Seriously, starting from GHC 7.6 (released in 2012), the base library includes a module GHC.Generics. The module consists of:

  1. Several types (constants K1, parameters Par1, sums :+:, products :*:, compositions :.:) out of which we can build different algebraic types of kind * -> *.
  2. A class for representable algebraic data types, Generic1:

    class Generic1 f where
      type Rep1 f :: * -> *
      from1  :: f a -> (Rep1 f) a
      to1    :: (Rep1 f) a -> f a

    The associated type synonym Rep1 maps an algebraic data type like TypeF to an isomorphic type composed out of the primitives like K1 and :*:. The functions from1 and to1 allow converting between the two.
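To see what these primitives buy us, here is approximately what the derived representation of our TypeF looks like. (This is a simplified sketch: I’m omitting the M1 metadata wrappers that GHC actually inserts, so it is not the literal Rep1.)

```haskell
-- Simplified shape of Rep1 TypeF, with metadata wrappers omitted:
--
--   Rep1 TypeF  ≈    K1 R String        -- BaseType: a constant field
--              :+: (Par1 :*: Par1)      -- Fun: two occurrences of the parameter
--              :+: (Par1 :*: Par1)      -- Tuple: likewise
```

The crucial point is that Par1 marks exactly the fields of type a, which is the information the other libraries throw away.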

The compiler itself knows how to derive the Generic1 instance for eligible types. Here is what it looks like:

{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
{-# LANGUAGE DeriveGeneric #-}

import GHC.Generics (Generic1)

data TypeF a = BaseType String
             | Fun a a
             | Tuple a a
  deriving (Functor, Foldable, Traversable, Generic1)

So, in order to have a generic Unifiable instance, all I had to do was:

  1. Implement Unifiable for the primitive types in GHC.Generics.
  2. Add a default zipMatch implementation to the Unifiable class.

You can see the details in the pull request.
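Step 2 is essentially a one-liner. Here is a sketch of what the default method could look like, assuming the DefaultSignatures extension and the instances from step 1 (the actual code in the pull request may differ in details):

```haskell
class Traversable t => Unifiable t where
  zipMatch :: t a -> t a -> Maybe (t (Either a (a, a)))

  -- round-trip through the generic representation,
  -- where step 1 has already defined how to zip-match
  default zipMatch
    :: (Generic1 t, Unifiable (Rep1 t))
    => t a -> t a -> Maybe (t (Either a (a, a)))
  zipMatch a b = to1 <$> zipMatch (from1 a) (from1 b)
```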

Complete example

Here is a complete example that unifies a -> (c, d) with c -> (a, b -> a).

{-# OPTIONS_GHC -Wall #-}
{-# LANGUAGE DeriveFunctor, DeriveFoldable,
             DeriveTraversable, DeriveGeneric,
             DeriveAnyClass, FlexibleContexts #-}

import Data.Functor.Identity
import Control.Unification
import Control.Unification.IntVar
import Control.Unification.Types
import Control.Monad.Trans.Except
import Control.Monad.Trans.Class
import qualified Data.Map as Map
import GHC.Generics

data TypeF a = BaseType String
             | Fun a a
             | Tuple a a
  deriving (Functor, Foldable, Traversable, Show, Generic1, Unifiable)
  --                                              ^^^^^^^^^^^^^^^^^^^
  --                                           the magic happens here

unified :: IntBindingT TypeF Identity
    (Either (UFailure TypeF IntVar)
            (UTerm TypeF String))
unified = runExceptT $ do
  a_var <- lift freeVar
  b_var <- lift freeVar
  c_var <- lift freeVar
  d_var <- lift freeVar
  let a = UVar a_var
      b = UVar b_var
      c = UVar c_var
      d = UVar d_var
      term1 = UTerm (Fun a (UTerm $ Tuple c d))
      term2 = UTerm (Fun c (UTerm $ Tuple a (UTerm $ Fun b a)))

  result <- applyBindings =<< unify term1 term2

  -- replace integer variable identifiers with variable names
  let all_vars = Map.fromList
        [(getVarID a_var, "a")
        ,(getVarID b_var, "b")
        ,(getVarID c_var, "c")
        ,(getVarID d_var, "d")]

  return $ fmap ((all_vars Map.!) . getVarID) result

main :: IO ()
main = print . runIdentity $ evalIntBindingT unified


The output:

Right (Fun "c" (Tuple "c" (Fun "b" "c")))

June 17, 2017 08:00 PM

Douglas M. Auclair (geophf)

May 2017 1Liners 1HaskellADay

  • May 10th, 2017:
    Define (^) :: (a -> a) -> Int -> (a -> a)
    The 'power'-function where f ^ 3 = f . f . f
    • Conor McBride @pigworker flip ((ala Endo foldMap .) . replicate)
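For comparison, a more pedestrian definition of the same function (called pow here to avoid clashing with the Prelude’s numeric (^)):

```haskell
pow :: (a -> a) -> Int -> (a -> a)
pow f n = foldr (.) id (replicate n f)

-- pow (+1) 3 0 == 3
```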

by geophf ( at June 17, 2017 04:26 AM

June 16, 2017

Russell O'Connor

Some Random Thoughts of an Advanced Haskeller

Recently I was thinking about a programming problem that would need access to random values. I thought it might be fun to write up my thought process as an advanced Haskeller while working through this particular problem.

In Haskell, one would write such a program by using a random monad to access an oracle providing random numbers. The traditional way to implement such a random monad is using the state monad. The Gen type holds the state of the (pseudo-)random number generator, and the randomInt function returns a new random number and updates the state of the generator.

type Random a = State Gen a

randomInt :: Gen -> (Int,Gen)

randomOracle :: Random Int
randomOracle = state randomInt

Then I write my program inside the Random monad, making calls to the randomOracle as needed

myProg :: Random Result
myProg = do
  {- ... -}
  x <- randomOracle
  {- ... -}
  y <- randomOracle
  {- ... -}

In order to run my program, I need to provide it with a random seed.

evalRandom :: Random result -> Gen -> result
evalRandom = evalState

For deterministic testing, we can pass fixed generators to evalRandom. If we use StdGen, we can map our Random program to IO and use the system random number generator.

type Gen = System.Random.StdGen
randomInt = System.Random.random

evalRandomIO :: Random result -> IO result
evalRandomIO = getStdRandom . runState

For the most general possible random number generator, the type for the generator state is simply an infinite stream of random values.

data Stream a = Cons a (Stream a)

unfoldStream :: (g -> (a, g)) -> g -> Stream a
unfoldStream next = go
  where
    go seed = Cons value (go nextSeed)
      where
        (value, nextSeed) = next seed

type Gen = Stream Int
randomInt (Cons hd tl) = (hd, tl)

evalRandomStdGen :: Random result -> StdGen -> result
evalRandomStdGen rr = evalRandom rr . unfoldStream System.Random.random

evalRandomIO :: Random result -> IO result
evalRandomIO rr = evalRandomStdGen rr <$> newStdGen
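With this representation, deterministic testing is as simple as handing evalRandom a hand-rolled stream. For instance (the naturals generator below is my own toy example, not part of the program above):

```haskell
-- A fully deterministic oracle: yields 0, 1, 2, ...
naturals :: Stream Int
naturals = unfoldStream (\n -> (n, n + 1)) 0

-- evalRandom myProg naturals gives the same result on every run
```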

Before, when I was an intermediate Haskeller, I would probably stop at this point pretty satisfied with this. And let me be clear that this is a fine solution. However, now that I am an advanced Haskeller, I cannot help but feel a little dissatisfied with this solution. The problem with this implementation of the Random monad is that the type is too broad. Since the Random type is the State monad, there are operations allowed by the type that should not be allowed for a Random program. For instance, within the Random type, someone could store the state of the generator and restore it later causing random values to be replayed, or someone might completely replace the state of the generator with their own value.

One way to solve this problem is to use Haskell’s module system to abstract the monad and only expose the randomOracle operation. While this is a reasonable solution (in fact it is a very good solution as we will see), it would be nicer if we could instead use the type system to create a monad that is only capable of representing the programs we want to allow, and disallows other programs that would try to manipulate the generator state in ways we do not want. Essentially we want our Random programs to only be able to query the random oracle, and that is it. After reflecting on this problem and the various kinds of monads I know about, I realized that a suitable free monad captures exactly this notion of providing only an operation to query the random oracle. Specifically we want Control.Monad.Free.Free (Reader Int) or, more directly (but more obtusely written), Control.Monad.Free.Free ((->) Int). We truly want a free monad because any sequence of responses from the random oracle is valid.

One problem with this Free monad is that the bind operation can be slow because it needs to traverse through a, possibly long, data structure. There are several solutions to this, but for this particular free monad, I happen to know that the Van Laarhoven free monad representation is possible: The type forall m. Monad m => m Int -> m a is isomorphic to Control.Monad.Free.Free ((->) Int) a.

newtype Random a = Random { instantiateRandom :: forall m. Monad m => m Int -> m a }

instance Monad Random where
  return a = Random $ \_ -> return a
  ma >>= mf = Random . runReaderT
            $ ReaderT (instantiateRandom ma) >>= ReaderT . instantiateRandom . mf
instance Applicative Random where
  pure = return
  (<*>) = ap

instance Functor Random where
  fmap = liftM

In this representation, the random oracle function just fetches the argument.

randomOracle :: Random Int
randomOracle = Random id

For deterministic testing purposes we can evaluate the random monad by instantiating it with our state monad from before.

evalRandom :: Random result -> Stream Int -> result
evalRandom rr = evalState . instantiateRandom rr . state $ \(Cons hd tl) -> (hd, tl)

However, we can also directly instantiate it with the IO monad for production purposes.

evalRandomIO :: Random result -> IO result
evalRandomIO rr = instantiateRandom rr (getStdRandom random)

This is all very good; however, the advanced Haskeller in me still thinks that I ought to be able to write evalRandom without the need to invoke the State monad. This is because if we were actually using the Free ((->) Int) monad, I would be writing evalRandom using iterA.

iterA :: (Applicative p, Functor f) => (f (p a) -> p a) -> Free f a -> p a 

evalFreeRandom :: Free ((->) Int) result -> Stream Int -> result
evalFreeRandom = iterA (\fpa (Cons hd tl) -> fpa hd tl)

No need for any state monad business in evalFreeRandom. How can I get that elegant solution with the Van Laarhoven free monad? One way would be to convert to the Free ((->) Int) representation

freeRandom :: Random result -> Free ((->) Int) result
freeRandom rr = instantiateRandom rr (liftF id)

evalRandom :: Random result -> Stream Int -> result
evalRandom = evalFreeRandom . freeRandom

Surely there must be a way to do this directly?

Before solving this I turned to another interesting problem. The iterA function recurses over the free monad structure to do its evaluation. The Free monad comes with its own general recursive elimination function called foldFree

foldFree :: Monad m => (forall x. f x -> m x) -> Free f a -> m a

This foldFree function captures everything about the free monad, so it should be possible to write iterA by using foldFree to do all the recursion. But how is that even possible? The types of foldFree and iterA look too far apart. foldFree requires a natural transformation as input, and the arguments to iterA provide nothing resembling that.

To solve this I turned to the #haskell IRC channel for help. With assistance from glguy, ReinH, and byorgey, I turned to the well-known idea that if you want to turn something into or out of a natural transformation, you use some sort of Yoneda / continuation-like construction. In this particular case, the Cont (p a) monad seems to capture what we need. Following the types (and forgetting about the semantics), we end up with the following.

iterA :: (Applicative p, Functor f) => (f (p a) -> p a) -> Free f a -> p a 
iterA h ff = runCont (foldFree (\fx -> Cont $ \k -> h (k <$> fx)) ff) pure

As an aside, glguy has a more “natural” solution, but it technically has a less general type, so I will not talk about it here, even if I do feel it is better.
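Incidentally, foldFree also gives us the inverse of freeRandom, witnessing the isomorphism mentioned earlier (the name fromFree is my own):

```haskell
-- Interpret each "ask the oracle" node of the free monad
-- by running the supplied oracle action.
fromFree :: Free ((->) Int) result -> Random result
fromFree ff = Random $ \oracle -> foldFree (\g -> g <$> oracle) ff
```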

Turning back to our original problem of directly writing evalRandom without using the state monad, we can try to see if Cont will solve our problem.

evalRandom :: Random result -> Stream Int -> result
evalRandom rr = runCont (instantiateRandom rr (Cont $ \k (Cons hd tl) -> k hd tl)) const

We can compare the Cont solution to the State solution and see that the code is pretty similar.

evalRandom :: Random result -> Stream Int -> result
evalRandom rr = evalState (instantiateRandom rr (state $ \(Cons hd tl) -> (hd, tl)))

The inner Cont construction looks very similar to the inner state construction. The outer call to const in the Cont solution discards the final "state" which captures the same functionality that evalState has for the State solution. Now we can ask, which has better performance, the State solution, with its tupling and untupling of values, or the Cont solution which uses continuations instead? I will leave it to the GHC experts to figure that one out.

Arguably most of this is an exercise in academics, but it only took me an hour or three to go through this whole thought process. As an advanced Haskeller, I have slowly gathered, over many years, experience with these sorts of abstractions so that it starts becoming easy to do this sort of reasoning. While it may or may not matter much for my particular application, eventually this kind of reasoning becomes important. For example, the modern stream fusion in GHC exploits constructions that resemble this kind of reasoning, and that has had a big impact on performance.

For the non-advanced Haskellers out there, do not be deterred. Keep practicing your craft; keep reading about new abstractions, even if you do not fully get it; keep looking out for potential applications of those abstractions to solidify your understanding. Eventually you will have lots of very powerful problem solving tools at your disposal for making safe software.

June 16, 2017 11:45 AM

June 15, 2017

Michael Snoyman

A Very Naive Overview of Exercise (Part 3)

This blog post is part 3 of a series on nutrition and exercise. If you haven't seen them already, I recommend reading part 1 now, which provides a general overview, and part 2 detailing nutrition. This blog post will go into more details on exercise.

I'm going to break down exercise into three broad categories:

  • Resistance training
  • Cardio
  • Mobility/flexibility

These categories can overlap. For example, a weighted squat could be seen as both resistance training and mobility work. Circuit training could be seen as cardio and resistance. But typically there are distinct benefits for each category, and fairly distinct activities that achieve those goals.

For the completely impatient, here are my recommendations on where you should get started. I strongly encourage reading the rest of the post so that these recommendations make sense and you can tweak them for your own personal needs:

  1. Perform bodyweight exercises three days a week. A simple routine will include exercises from the squat, pushup, pullup, and leg raise progressions.
  2. Run at least twice a week. I would focus on high-intensity sprinting, such as running as fast as you can for 20 seconds, resting for 40 seconds, and repeating for 5 sprints.
  3. Stay active regularly. Try to find excuses to get out and walk, take a bike ride, go for a swim, or just play with your kids.

Health vs fitness

Before diving into the details, I want to talk about two related but distinct terms. Definitions on these two terms vary quite a bit, but I'd like to give my own simplified definitions based on the input of many other sources:

  • Health is a measure of your ability to live life without sickness, crippling weakness, premature death, or other debilitating conditions.
  • Fitness is a measure of your ability to perform tasks. In our context, we're talking about the ability to perform specific physical feats, such as running a mile in a certain amount of time, bench press a certain amount of weight, etc.

What I'm trying to get across in these definitions is that health is about reaching a baseline where your body is not working against you. By contrast, fitness lets you push the boundaries of what you're capable of.

Often times, these go hand in hand. Being able to run a mile in 15 minutes, for instance, is a good indication that you are not suffering from any respiratory conditions, your bones are strong enough to withstand the impact of running, you have decent lower body muscle mass, and so on.

However, these two concepts can and do diverge. The ability to deadlift 300kg (660lbs) is not by any reasonable standard a prerequisite for a healthy body, but it certainly measures fitness. Running a 4 minute mile is an amazing feat of fitness, but it doesn't really tell me you're healthier than the person running an 8 minute mile.

I point this distinction out here because this series of posts is intended to cover health, and using nutrition and exercise to achieve it. It is very tempting to get caught up in numbers and goals that measure fitness, while throwing health to the wind. For the most trivial example of this: taking steroids to improve your powerlifting numbers will certainly improve your fitness. However, I'd argue pretty strongly against it, since it's bad for your health.

All that said, there's nothing wrong with pursuing fitness goals, and as I mentioned in why I lift, doing so can be a lot of fun. Having something to compete against—even yourself—is a huge motivator. Just make sure you're not sacrificing your health in the process.

Resistance training

This is also known as strength training. Let's rip off the definition straight from Wikipedia:

Strength training is a type of physical exercise specializing in the use of resistance to induce muscular contraction which builds the strength, anaerobic endurance, and size of skeletal muscles.

The term strength training tells us the why, whereas resistance training hints more at how we achieve these goals. Resistance training involves exerting your muscles against some external resistance. Probably the most emblematic version of this is resisting against gravity in the form of lifting weights, but we'll see that there are many other approaches available.


This may just be my own personal experience that others have not shared, but growing up I always had the impression that training for strength was somehow bad. Lifting weights was a vain pursuit of bigger muscles, and real health benefits only came from cardio like jogging.

If you never had these misconceptions, congratulations. I certainly did. And in case others do as well, let me dispel them:

  • Muscle mass has a protective effect on your body. For example, if you have more muscle, you can withstand a larger impact.
  • If you're capable of moving larger weights, then day to day activities are easier. For example, if you can deadlift 100kg, then picking up your 30kg child is a much easier activity, and won't exhaust you as quickly.
  • Strength training doesn't just increase muscle mass; it also increases your bone density and strengthens your tendons. This makes strength training a great way to fight off osteoporosis, making it a vital activity for older people, and especially older women. (Unfortunately, this is the group most likely to not bother strength training.)
  • While strength training doesn't burn as many calories as cardio, it does encourage your body to use calories consumed to build and maintain muscle mass instead of fat mass. This means you can get away with eating some level of extra calories without gaining fat.
  • Because strength training uses up muscle glycogen, it can be a great way to help control blood glucose levels. After a heavy training session, your muscles will be primed to absorb glucose to rebuild glycogen, instead of leaving the glucose in your blood to convert into fat or (in the case of diabetics) simply harm your body with toxic glucose levels.
  • Increased strength can help avoid injuries. Prior to 2016, despite no longer being overweight and having a decent strength base, I was constantly throwing out my back from normal day-to-day activities (like sitting at a computer for too long). This was my biggest motivation for getting into weight lifting that year, and my back has been much happier since.
  • Strength training helps improve many health markers, like blood lipid profiles (cholesterol) and hormone levels.

That's a lot of benefits, and it's far from a complete list. You may not relate to all of the points above, but hopefully it makes the point that strength training is not just for young guys wanting to impress people with their biceps. Strength training is a vital component of good health for everyone, regardless of age or gender.


All strength/resistance training fits into the same basic idea. You want to move some part of your body by contracting a muscle. You want to use some technique to make that contraction difficult so that your muscle has to struggle. By challenging the muscle, you trigger—through various pathways—your body to:

  • Make the muscle stronger
  • Increase toughness of the tendons
  • Increase bone density

These benefits occur during recovery, or the time after you stop exercising. This is important: if you keep exercising non-stop for days on end, you will get weaker, not stronger. The formula then is:

  • Perform exercise against resistance
  • Rest/recover
  • Repeat

This kind of exercise is anaerobic, meaning "without air." Because resistance training is short bursts of heavy intensity, it mostly relies upon glycogen for energy, which can be burned without oxygen. This may seem to imply that resistance training has no benefits on the cardiovascular (heart and lung) system, and doesn't help burn fat (which requires oxygen to break down). Neither of these is true, however. During the recovery phase, your body will need to rely on your fat stores to provide energy to rebuild muscles, which will put demands on the cardiovascular system to provide additional oxygen.


Last bit of theory here before we dive into how to do all of this. Another way of looking at exercise is a stress we are applying to our body. Stress has a bad rap, and for good reason: chronic stress, such as we experience in our daily life from work and continual electronic stimulation, is damaging. However, in small doses, stress is wonderful for our body.

When we temporarily stress our body, it provides a stimulus for our body to get better, so it is able to more easily handle the stress in the future. Stressing our muscles causes them to get stronger. Stressing our bones makes them more dense. And stressing our cardiovascular system with extra oxygen demands makes our heart and lungs more efficient.

Temporary stress with proper recovery is the very heart of exercise, and will carry through to everything in this post.


OK, let's actually talk about some exercises! The most easily accessible form of resistance training is body weight exercises, or bodyweights. The concept here is simple: use your own body and gravity to provide a resistance for your muscles to exert against.

Probably the most famous example of this is the pushup. You are pushing against the ground with your arm, shoulder, and chest muscles to create enough force to move your body against gravity. Your own body weight is working against your muscles.

If you read the word "pushup" and thought "I can't do that," no need to worry. Bodyweight exercises usually follow some form of progression, where you can start with easier forms of the exercise and gradually move to more difficult versions. Taking a pushup as an example, a progression may look something like:

  1. Stand in front of a wall and push your body away from it
  2. Put your hands on a table and push up from that position
  3. Do pushups with your knees on the ground
  4. A standard pushup, with only your feet and hands touching the ground
  5. Put your feet on a stool and push up
  6. Put your feet high on a wall and perform a vertical pushup

There are other variations you can perform: changing the width of your grip by putting your hands closer or farther apart to focus on different muscles. You can also follow a one-arm pushup progression instead of a vertical pushup progression. Vertical pushups put more stress on your shoulder muscles, while one-arm pushups put more focus on your chest muscles.

If all of this sounds confusing and a bit daunting, don't worry. Some very helpful people online have already created programs around bodyweights. Some references:

All of these routines follow the same basic principles: use compound movements to target all of your major muscle groups, progressively overload those muscles, and provide ample time for recovery. If those terms are confusing, don't worry, there are sections below dedicated to explaining them.

If you're feeling overwhelmed or confused, let me remind of something from the first post in this series: don't let confusion get in your way! These are all great routines, and doing something is better than doing nothing. Pick something and do it for a few weeks, and after you get comfortable, you'll be ready to make a more informed decision about how you want to proceed.

Let's see how bodyweight exercises stack up against alternatives:


  • Requires little to no equipment, making it an easy method to start with or use on the road
  • Less risk of injury vs free weights, since there's no barbell trying to crush you. (Notice I said less, not none. Be careful.)
  • Because you are working against your own body weight, reducing your body fat makes your bodyweight exercises more successful. Typically, practitioners of bodyweight routines will be leaner than weight lifters.


  • Increasing intensity is more complicated than simply adding more weight to a bar
  • Some muscles groups are difficult to properly stress. While you can get a pretty good shoulder workout with vertical pushups, it's difficult to develop your posterior chain (hamstrings, glutes, and lower back) with bodyweights. This was the reason I started weight lifting in the first place.

Weight lifting

Weight lifting is the act of moving some external weight against gravity (or sometimes against friction and inertia). The category breaks down broadly into machines and free weights. Free weights are things like barbells, dumbbells, and kettlebells. For those unfamiliar with these terms:

  • A barbell is a long metal bar (about 2 meters or 6 feet) that you usually hold with both hands.
  • A dumbbell is a shorter metal bar usually held in one hand
  • A kettlebell is a weight with a handle on the top
  • A machine is some kind of, well, machine

Free weights have an advantage over machines in that they are unstable. This means you need to use more muscle groups to keep control of the weight. By contrast, a machine keeps the weight in more or less a straight line, which takes some of the stress off of your body. Additionally, machines are usually easier to learn to use and less dangerous.

If you're too intimidated by free weights, by all means start right away with machines. But if you avoid free weights indefinitely, you're limiting yourself significantly. I strongly recommend you get comfortable with using a barbell. Start with low weights and build up slowly. Focus on getting the movements correct (aka good form), and slowly build up to heavy weights (where heavy is a personal assessment of what is difficult for you).

If you're going to pursue a machine-based routine, I'd recommend speaking with one of the trainers at the gym you're attending. I'm not familiar with good machine-based routines available online, and it will depend a lot on what equipment you have available.

If you want to get started with free weights, there are two very popular routines to consider:

If you go to popular weight lifting forums, you'll see a lot of flamewars between these two routines. To cut through some of this: Starting Strength was the original program, is designed by a coach (Mark Rippetoe) with a huge amount of experience training individuals, and was groundbreaking when first released. StrongLifts is basically a variation of Starting Strength and doesn't have as much experience to back it up.

Based on that, it would seem that Starting Strength is the way to go. I personally decided to go with StrongLifts, and my reasons were:

  • It has a really nice smartphone app. Yes, I'm that shallow, but it makes it dead simple to get started
  • StrongLifts uses a barbell row in place of a power clean. I agree with the StrongLifts creator (Mehdi) that the latter is more complicated to learn, and that the former is a great upper back exercise missing from Starting Strength.

I'm sure these reasons sound shallow, and some people will be upset with this. But the reality is: do whichever routine you want (or a completely different one). As long as you're lifting, you're better off.

And one word of warning from my own experience: don't become so obsessed with progressing through the program that you ignore your body's complaints. I trained to injury a few times because I ignored pain and put on extra weight when I shouldn't have. Don't be stupid!

Resistance bands

I'm not going to say much about these, since I haven't really used them. But I wanted to make it clear that there are drastically different approaches to resistance training. Resistance bands are pieces of rubber which you can stretch, and which become harder to stretch the further you've pulled them. You can use them in place of weights for many kinds of workouts. Your body doesn't care what's causing the resistance. It just wants something to be resisting it.

There's a YouTube channel which I find very beginner-friendly, called "Picture Fit." Here are two videos I recommend watching that summarize the three categories mentioned:

Combine them!

I've presented the information so far as a choice among competitors. This is far from the case. Many of these techniques can be combined to gain the advantages of each. For example, consider a workout routine consisting of:

  • Bench press (free weight)
  • Pushups (body weight)
  • Seated press machine (machine, duh)
  • Overhead band (resistance bands)

There's no reason to avoid mixing and matching. However, building your own routine is a more advanced activity. When you're getting started, I recommend choosing one of the routines I linked to above and sticking to it until you get comfortable with the exercises.

Sets and reps

Let's talk nomenclature. A rep is short for a repetition, and it describes performing one complete exercise. For example, with a pushup, a repetition consists of lowering your body to the ground and raising yourself back up to the starting position.

A set is a collection of repetitions performed without rest. For example, a set may consist of 8 reps.

Often times, workout programs will be given in terms of sets and reps like so:

  • Pushups 3x8
  • Bench press 3x5
  • Overhead press 1xF

This means:

  • Perform three sets of eight repetitions of pushups
  • Perform three sets of five repetitions of bench press
  • Perform one set of overhead press to failure (as many reps as you can do)

You'll also need to consider how long to rest between sets. Usually your program will tell you this. Valid answers here can be as little as 30 seconds and as much as 5 minutes. Typically different rest periods will work your body in different ways: shorter rest gives more endurance training, whereas longer rest gives more strength gains.

Compound vs isolation

Think of a bench press: you're lying on your back with a barbell over you. You bend your elbows, your wrist bends, and your shoulder joint activates. You push back up using your chest muscles, your shoulder muscles, and your arm muscles (triceps in particular).

Now think of a bicep curl: you hold a dumbbell in your hand and you bend your elbow.

The former is called a compound movement: it involves multiple muscle groups moving multiple joints in your body. The latter is an isolation exercise: it targets just one muscle group via one joint.

Generally speaking, you'll get better results by focusing on compound movements. They stress the body more, and in more natural ways. They lead to more balanced development of muscles. And they are more time efficient: you work more muscles in less time.

That's not to say you should never use isolation exercises, but in my opinion they should be considered accessories to the main, compound movements. Use them to help develop weak spots in your strength.

You'll notice that the routines I listed above all focus on compound movements. That's not by chance.

Progressive overload

If you do 10 pushups a day for the rest of your life, after a certain point you aren't going to get stronger. In order to reap the full benefits of strength training, you need to progressively overload your muscles by increasing the stress/stimulus. You can do this in multiple ways:

  • Adding more weight to the bar/machine
  • Doing more reps
  • Doing more sets
  • Changing the tempo (slower exercises are harder)
  • Changing the exercise you're doing (full pushups vs knee pushups)

A good program will build in this kind of progressive overload, as do the programs I linked to above. The basic idea is to avoid stagnating by constantly challenging yourself to improve.

Plate math

In order to modify the weight of a barbell, we can add extra weight to it. These weights come in the form of plates, circular pieces of metal—sometimes rubberized—that are put on the sides of the bar.

If you're going to be doing barbell exercises, it's important to get comfortable with adding up weights, also known as plate math. I'll start with the metric system, since it's easier to handle, and it's what I use.

A standard barbell weighs 20kg. The plates you'll put on the barbell must be balanced: you put the same amount on the left and right side. If you put a 10kg and 5kg weight on each side, you'll end up with:

  • 20kg bar
  • 10kg times 2 (one per side) = 20kg
  • 5kg times 2 (one per side) = 10kg
  • Total 20+20+10=50kg

I find it easiest in most cases to add up the weight per side of the bar, double it, and add 20. So in the above example, I'd do "10 + 5 = 15, 15 * 2 = 30, 30 + 20 = 50." This is just arithmetic, so don't get too hung up on it, and do what's comfortable.

Now let's do this in reverse. Suppose you're planning on benching 70kg. In order to figure out what to put on the bar, you would do this:

  • 70kg - 20kg for the bar = 50kg in plates
  • 50kg total plates / 2 = 25kg in plates per side
  • Start finding the largest plates that will add up to your number. In this case, you're probably looking at a 20kg and 5kg.

Try not to just match the total weight, but also the plate distribution. In other words, don't put a 20kg plate on one side of the bar and four 5kg plates on the other. That will feel unbalanced. Most gyms will have plates of size 20kg, 10kg, 5kg, 2.5kg, and 1.25kg. Some may also have 25kg and 15kg.
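For the developers reading along, the arithmetic above can be sketched in a few lines of Haskell. These helper names (`barWeight`, `totalWeight`, `platesPerSide`) are my own invention, purely for illustration:

```haskell
-- Hypothetical helpers to sketch plate math on a standard 20kg metric bar.
barWeight :: Double
barWeight = 20

-- Total barbell weight, given the plates loaded on ONE side.
totalWeight :: [Double] -> Double
totalWeight perSide = barWeight + 2 * sum perSide

-- Greedy breakdown: which plates to load per side to reach a target total,
-- using the sizes most gyms carry.
platesPerSide :: Double -> [Double]
platesPerSide target = go ((target - barWeight) / 2) [20, 10, 5, 2.5, 1.25]
  where
    go _ [] = []
    go remaining (p:ps)
      | remaining >= p = p : go (remaining - p) (p:ps)
      | otherwise      = go remaining ps

main :: IO ()
main = do
  print (totalWeight [10, 5])  -- the 10kg + 5kg per side example above
  print (platesPerSide 70)     -- plates per side for a 70kg bench
```

As the text says, this is just arithmetic; the greedy breakdown mirrors "start finding the largest plates that add up to your number."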

You may also hear people say things like "squatting 2 plate," or on stranger parts of the internet, "2pl8." This means you have two 20kg plates on each side of the barbell. Why 20kg? Convention. Do the math; I'll give you the total weight for this at the end of this section.

For you Americans, the numbers are slightly different. Instead of a barbell weighing 20kg, it weighs 45lbs, which is just slightly more than 20kg (20.4kg). And the plates come in sizes of 45lbs, 35lbs, 25lbs, 10lbs, 5lbs, and 2.5lbs. As a developer, I love the power-of-2 system employed by the metric plates, but if you have to use imperial measurements, just get used to doing the math.

This has the funny side-effect that if you say "I squatted 2 plate," it means something different between America and the rest of the world. (Go ahead and figure out what that total pound value is.) The numbers are close, but not exactly the same.

Answer: 2 plate is 100kg, or 225lbs.

Importance of proper form

You'll read this just about everywhere that discusses weight lifting, but I'll say it here too: using proper form on your lifts is absolutely crucial. Using proper form will:

  • Ensure you are getting the full value from your workout
  • Help you avoid injuries
  • Make sure you don't end up in an embarrassing video on YouTube

There are two particular points of proper form that I want to point out:

  • The act of lowering the weight is known as the eccentric portion of the exercise. It is common to see people lose control of the weight during this portion. If you do this, you are hindering your progress dramatically! Most of the muscle tearing that leads to muscle regrowth occurs during the eccentric portion. Lowering the weight at a controlled, steady pace is difficult, but well worth it.
  • Be sure to follow full range of motion. You'll often hear people say they don't want to squat to parallel because it will injure their knees. This is in fact a myth: squatting with insufficient depth leads to muscular imbalances and injuries. (I'm well aware that I haven't actually described how to do a squat in this post; please see the linked routines above, which describe how to do a squat properly.)

Full body vs splits

A full body workout is a routine that exercises all (or most) muscle groups each day you train. A split routine somehow splits up days of the week to specific muscle groups. There are many tradeoffs between these two approaches, and I won't be able to cover them all here. But here's a basic idea: you should always have a day of rest between training a specific muscle group. But having too many rest days in between limits your growth potential.

If you're going to work out three days a week, you can do a full body routine each of those days and have 1 or 2 days of rest in between. By contrast, if you're going to work out 6 days a week, doing a full body routine each day won't give you any time to rest and recover.

The routines above are all full body routines. That's probably the right place to start; I would highly advise against strength training for more than three days a week as a beginner. If you later want to progress to more days of working out a week, you can consider some kind of split. There are many preexisting routines based on splits, and you can of course make your own.

Personally, I've found the PPL (Push/Pull/Leg) split approach to be pretty good. The idea is to first separate out all lower-body/leg exercises to their own day. Then, of upper body exercises, break them up by whether they push the weight away from your body (like a bench press) or are pulling the weight toward your body (like a curl or barbell row). This ends up pretty cleanly dividing up the upper body muscle groups.

How to eat

If you're just getting started with strength training, you don't need to worry too much about eating. Follow nutrition advice from the previous post. If you're trying to lose fat, eat at a caloric deficit. When you're initially going from untrained to trained, you get to experience what are known as "noob gains," which lifters treat as the magical ability for your body to get stronger and leaner at the same time.

Once you're past that initial beginner phase, it gets harder to pull this off. You'll hear people talk about bulking and cutting, on the premise that you need to eat extra food to fuel muscle growth (bulk), and then go for a period of caloric deficit to burn off the extra fat you gained (cut). Other approaches believe in trying for a recomp, or body recomposition, consisting of careful balancing of calories to get just enough to gain muscle and burn fat. Other approaches like Lean Gains believe in carb and calorie cycling: eating more carbs and calories on training days, and fewer carbs and calories on rest days.

This is all rocket science versus what we're discussing here. I'm mentioning it all so that you know you don't need to freak out about it. Remember, your goal is to get used to training, enjoy it, nail down form, and get basic strength gains. If you decide to pursue strength training more aggressively (like I have), there will be plenty of time in the gym to read hundreds of articles on the right way to eat. For now: eat healthy and lift heavy things.

Final note: be sure to get plenty of protein while strength training. You'll be using protein to rebuild your muscles after working them in the gym. If you don't have enough protein in your diet, your body will be unable to recover.

Muscle groups

There are many different muscles in your body. However, when talking about weight lifting, we usually break the body down into major muscle groups. The basic breakdown often discussed is:

  • Trapezius, or traps: muscles between shoulders and neck
  • Deltoids, or delts: shoulder muscles
  • Triceps: back of the arm muscles (used to extend your elbow)
  • Biceps: front of the arm muscles (used to bend/flex your elbow)
  • Pectoralis, or pecs: chest muscles
  • Latissimus, or lats: upper back
  • Core: stomach and lower back stabilizing muscles. This includes your abs
  • Gluteus, or glutes: your butt muscles
  • Quadriceps, or quads: front of the leg muscles (used to extend your knee)
  • Hamstrings: back of the leg muscles (used to bend/flex your knee)

You should get comfortable with identifying these muscle groups, and with flexing the different muscle groups. Some exercises will say things like "activate your glutes" or "stabilize with your lats." Don't worry if you're having trouble feeling your pecs or lats; working them out will help.

Make sure that, with whatever exercise routine you're following, you're hitting all of these muscle groups at least once per week (and ideally 2-3 times).

Summary of resistance training

Wow, that was a lot! I honestly didn't realize I had that much to say on the subject of resistance training, and there's still a lot more worth saying. But hopefully this gives you a good place to start. In sum:

  • Strength training is for everyone
  • Don't forget to focus on health, not just pushing some numbers
  • Body weights are an easy way to get started and require little equipment
  • If you have access to a gym and/or weights, a weight lifting routine (such as StrongLifts) can be a great approach
  • Start light, get your form down, and progressively increase the load
  • Focus on compound movements, adding in isolation movements as desired
  • Eat healthy, and be sure to get plenty of protein


I'll say right now that I know more about resistance training than cardio and mobility, so these two sections will not be as detailed as the resistance training one. (And after everything you just read through, you may be relieved to hear that.)

Cardio

Cardio is also known as aerobic exercise. Aerobic means "with oxygen," and describes the energy system used during typical cardio workouts. When you go for a 30-minute jog, you'll end up using fat as a major energy source, which requires oxygen to break down. This energy production is not as fast as burning glycogen, but we don't need to have the same level of explosive energy as we do with weight lifting.

Advantages of cardio:

  • It increases the efficiency of your respiratory system in order to provide sufficient oxygen to your body
  • It increases the efficiency of your circulatory system, also in order to provide sufficient oxygen to your body
  • It's good for burning fat

    • Because you can sustain cardio exercise for a longer period of time than intense weight lifting, you can cumulatively burn more calories
    • Since the primary energy source for cardio is fat, you'll burn fat directly, which you won't do with weight lifting
    • Both of these points are more nuanced than I've implied; keep reading for more
  • Improvements to blood lipids (cholesterol)
  • Numerous other, less tangible benefits, like decreased chronic stress

There are also some downsides:

  • Many forms of cardio (like jogging) put strain on our bones and joints, which can lead to injury over time
  • You may have heard the meme "cardio kills your gains," implying that cardio destroys muscle mass. While the meme is certainly overplayed, there's no question that 30 minutes of cardio will not result in as much muscle synthesis stimulation as 30 minutes of weight lifting.
  • Subjectively: it's boring. Some people really love running or biking. Others (like me) find it difficult to stay motivated for longer cardio sessions. If you love cardio: great, keep doing it. If, like me, you find it a slog, I'll present an alternative below.

There are many different ways you can perform cardio. Some of the most popular are:

  • Running/jogging
  • Cycling
  • Swimming
  • Elliptical (my personal favorite, due to significantly lowered joint impact)
  • Jumping rope
  • Stair climbing

Cardio can be performed on a daily basis. There is far less concern about overtraining than with weight training, since the exercise will not break down your muscles to the same extent. Common durations for a session range from 15 minutes to an hour. My recommendation: start off with something you can manage easily, get used to the activity, and then ramp it up over time.

I haven't personally done this program, but I've heard good reviews of the Couch to 5k program, which trains you to be able to run 5 kilometers (or just over 3 miles) in 9 weeks.

High Intensity Interval Training

It may be slightly incorrect to include High Intensity Interval Training, or HIIT, as a subheading within cardio, but I'll explain my motivation shortly. Cardio as described above is also known as Low Intensity Steady State (LISS), where you keep to a mostly-fixed level of exertion which can be maintained for a significant period of time. By contrast, HIIT uses short bursts of high intensity exertion for a shorter period of time.

A typical HIIT protocol may look like: perform a cycle of 8 sprints. For each sprint, run as fast as you possibly can for 20 seconds, and then rest for 10 seconds. (This specific protocol is known as Tabata.) This full workout will take only 4 minutes, but as I saw someone once describe it, "it's 4 minutes of suck." Also, since HIIT is more physically taxing than LISS, you should take at least one rest day between sessions.
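The "4 minutes" figure is just the interval arithmetic. A trivial sketch (the `sessionSeconds` helper is hypothetical, my own naming):

```haskell
-- Hypothetical helper: total length, in seconds, of an interval protocol
-- of `rounds` repetitions of `work` seconds of effort followed by
-- `rest` seconds of rest.
sessionSeconds :: Int -> Int -> Int -> Int
sessionSeconds rounds work rest = rounds * (work + rest)

main :: IO ()
main = print (sessionSeconds 8 20 10)  -- Tabata: 8 * (20 + 10) = 240s = 4 minutes
```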

Before getting into the physical comparison, I want to point out that both HIIT and LISS are appealing. HIIT is anything but boring, and it's incredibly time efficient (imagine replacing a daily 30 minute run with a 4 minute sprint 3 days a week). But it's a hard workout. In fact, it's hard enough that I'd encourage people to not try to start exercising with regular HIIT sessions, as it may encourage you to give up. Instead, try a HIIT session interspersed with other workouts, and only consider making it part of your routine when you're confident that you won't give up. Remember, any exercise is better than no exercise.

So, if HIIT is so very different than normal cardio, why did I include it here? Because research indicates that it can deliver the same benefits people try to get from LISS cardio:

  • While you burn less energy during the workout than with LISS, HIIT triggers something known as Excess Post-exercise Oxygen Consumption (EPOC), also known as the afterburn effect thanks to some spammy infomercials. What this means is that you continue to burn energy at a higher rate for about 48 hours after a HIIT session to recover.
  • Since this EPOC involves increased oxygen usage, it puts a stress on the respiratory and cardiovascular system, providing similar health benefits to those systems as LISS. (I encourage you to do the research yourself on which form actually causes better adaptations.)
  • While you will use glycogen more than fat during a HIIT session, the recovery period will use more fat burning, resulting in plenty of fat loss. (Again, please check out the research yourself.)

In addition, HIIT claims some advantages over LISS, like more favorable hormonal responses and possibly better blood glucose control.

Short story: there is a lot of positive to be said about HIIT, but the science is not conclusive yet. If you want to try HIIT, and you don't believe you'll be discouraged by the intensity, go for it.

To make my biases clear in this: I almost never do dedicated LISS cardio sessions, but instead rely on HIIT for cardiovascular health. It's worked well for me, with improvements in my blood pressure, pulse, and respiratory system (far fewer symptoms of asthma). But given that HIIT is still considered somewhat less established than LISS, I want it to be clear that I am not advocating for anyone to stop standard cardio workouts.

You can do HIIT with lots of different exercises:

  • Running (sprinting)
  • Cycling
  • Elliptical (again, my favorite)
  • Swimming

There are also similar programs, like circuit training, which involve high intensity as well as weight lifting.

Weight lifting for cardio health?

One other very interesting approach for overall strength and cardiovascular health is presented in the book "Body by Science." I'm throwing this in here just to give a taste of how varied theories of good exercise are, and to encourage you to continue research beyond this naive overview.

Body by Science makes the bold claim that you can get "strength training, body building, and complete fitness in 12 minutes a week." I'll present a massively simplified version of what they claim, and encourage you to read the book itself if you're interested in more.

  • We can use just 5 big, compound weight lifting movements to target all of the major muscles groups in the body.
  • It's possible to perform each of these 5 movements for 90 seconds continuously to fully exhaust the muscles and deplete their glycogen stores. (5 * 90 seconds plus rest time is where the 12 minute claim comes from.)
  • It takes approximately a week for your body to fully recover from such an ordeal.
  • By fully exhausting the muscles, you send a trigger to your body to increase your muscle mass so you're more well prepared for the next time this happens. This is because your body reads this event as a fight-or-flight, life-or-death situation.
  • In order to provide energy to replenish glycogen and rebuild the muscles, your body will have significant respiratory and cardiovascular demands, which will cause improvements in those systems (like HIIT).

I've never done this program myself, but that's mostly because I actually enjoy my time in the gym, and don't want to reduce it to just 12 minutes a week. At the very least, the book is a great read with lots of valuable information.

Undoing your workout with food

This is a very common problem with people doing cardio: get on the treadmill for 45 minutes, walk at a decent (but not particularly strenuous) pace, and then get some kind of recovery smoothie (or insert other food item here). Take a guess: how many calories did the treadmill burn, and how many are in the smoothie?

Unfortunately, for many people, the smoothie completely outweighs the workout itself. Don't fall into this trap! Figure out your nutrition, and stick to it. Don't convince yourself that you're free to eat whatever you want because you went for a run today. You'll be undoing all of your hard work.

Move slowly, often

Another idea worth throwing in: outside of formal "exercise," it's a good idea to simply be more active. Taking a nightly walk, taking the stairs instead of the elevator, playing some easy sports, taking a break at the office to step outside, or a dozen other tweaks you can make throughout your day, all make you less sedentary. Sure, these activities help you burn a few more calories. But I would argue—as would many better authorities—that simply being more active is a reward in and of itself.


Mobility

Flexibility measures the range of movement of a joint. Flexibility can be improved with stretching. Given the sedentary lifestyles most of us live today, we end up with reduced flexibility. While flexibility and stretching typically have to do with the static range of motion of our joints, mobility refers to our ability to effectively move our joints.

An important distinction to make in these kinds of routines is dynamic vs static. Dynamic movements will involve moving a joint constantly. These are good to warm up before another exercise session. By contrast, static stretches will hold your joints in a fixed position. These can increase overall flexibility, but are generally best saved for after a workout.

This is the area in this post I am least familiar with, so I'm not going to go into much detail. Probably the most popular technique out there right now for improving your flexibility and mobility is Yoga. Many other people can give better advice than I can for getting started with it.

Two programs I followed (for less time than I should have) for mobility and flexibility are Molding Mobility and Starting Stretching. I found it much easier to grasp when I watched a set of YouTube videos demonstrating them:

The idea with this order is to perform the dynamic mobility routine first, perform any resistance training next, and then finally perform static stretches at the end.


Thank you for making it through these three posts, I know I didn't make it easy. Hopefully they have provided you with lots of information, a good idea of the terms at play, and encouragement to go read more from better sources. And, of course, I hope you don't just make this an intellectual endeavor, but start taking control of your health!

My recommendation for getting started: improve your nutrition to the point where you're comfortable with your daily eating routine. Try not to focus on a scale goal; focus on eating better. Experiment, and find what works. Introduce some exercise. Make sure you're ultimately getting in exercise that both improves your strength level, and improves your cardiovascular system.

I hope this was useful. If you have questions, please send them to me. I still haven't decided if I'll be making more health-related posts. If this is something you'd like to see from me, please say so, it's more likely to happen with such feedback.

June 15, 2017 03:00 PM

Functional Jobs

Software Engineer (Haskell — Full Stack) at Capital Match Holdings Pte. Ltd. (Full-time)

At Capital Match, we are looking for a software engineer primarily using Haskell to develop features and integrations of the platform with the financial system in Southeast Asia.

Capital Match is the leading peer-to-peer lending company in Singapore, and we are in the process of expanding to the rest of Southeast Asia.

The candidate should be interested in all aspects of the creation, growth and operations of a secure web-based platform: front-to-back feature development, distributed deployment and automation in the cloud, build and test automation.

We are inviting developers with at least 5 years of coding experience. She/he should have been involved in development of multiple web-application products. Experience using Haskell or a functional language is strongly preferred, but we also welcome those who don't have Haskell experience but can learn very quickly.

We practice agile development with rapid iteration and frequent deployments, and we are constantly looking to improve our development practices too.

Capital Match is a successful fin-tech startup, founded in 2014 in Singapore. We have already funded S$40+ million in loans to date.

We offer competitive compensation, depending on experience.

You'll work in Singapore, Bangkok, or remotely. If you are relocating to Singapore, visa sponsorship will be provided.

Get information on how to apply for this position.

June 15, 2017 10:20 AM

Software Engineer (Haskell - Full Stack) at Capital Match (Full-time)

Capital Match is a leading marketplace lending platform in Southeast Asia, headquartered in Singapore.

We are looking for a Full Stack Software Engineer primarily using Haskell to develop features and integrations with the financial system in Singapore and other countries in the region.

The candidate should be interested in all aspects of the creation, growth and operations of a secure web-based platform: front-to-back feature development, distributed deployment and automation in the cloud, build and test automation.

We are inviting developers with at least 5 years of coding experience. She/he should have been involved in development of multiple web-application products. Experience using Haskell or a functional language is strongly preferred, but we also welcome those who don't have Haskell experience but can learn very quickly.

Senior applicants might be considered for a Head of Engineering position.

We practice agile development with rapid iteration and frequent deployments, and we are constantly improving our development practices.

Capital Match is a successful fintech venture having already processed over S$40 million loans to date. The company was founded in 2014 and has since attracted substantial VC funding.

We offer competitive compensation, including equity and depending on experience.

You can work in Singapore, Bangkok or remotely. If you are relocating to Singapore, visa sponsorship will be provided.

Get information on how to apply for this position.

June 15, 2017 10:20 AM

Software Engineer (Haskell - Full Stack) at Capital Match Holdings Pte. Ltd. (Full-time)

The Software Engineer (Haskell - Full Stack) will be responsible for developing features and integrations of the platform with the financial system in Southeast Asia.

The candidate should be interested in all aspects of the creation, growth and operations of a secure web-based platform: front-to-back feature development, distributed deployment and automation in the cloud, build and test automation.

We are inviting developers with a minimum of 5 years coding experience. She/he should have been involved in development of multiple web-application products.

Capital Match is a successful fintech start-up, founded in 2014 in Singapore, making its way across Southeast Asia.

Competitive compensation, depending on experience.

Location: Singapore or remote.

Get information on how to apply for this position.

June 15, 2017 10:20 AM


Binary instances for GADTs
(or: RTTI in Haskell)

In this blog post we consider the problem of defining Binary instances for GADTs such as

data Val :: * -> * where
  VI :: Int    -> Val Int
  VD :: Double -> Val Double

If you want to play along, the full source code for the examples in this blog post can be found on GitHub.

Failed attempt

The “obvious” way in which you might attempt to serialize and deserialize Val could look something like

instance Binary (Val a) where
  put (VI i) = putWord8 0 >> put i
  put (VD d) = putWord8 1 >> put d

  get = do
    tag <- getWord8
    case tag of
      0 -> VI <$> get -- Couldn't match type ‘a’ with ‘Int’
      1 -> VD <$> get -- Couldn't match type ‘a’ with ‘Double’
      _ -> error "invalid tag"

However, this does not work. The definition of put is type correct (but dubious), but the definition of get is not type correct. And actually this makes sense: we are claiming that we can define Binary (Val a) for any a; but if the tag is 0, then that a can only be Int, and if the tag is 1, then that a can only be Double.

One option is to instead give a Binary (Some Val) instance with Some defined as

data Some :: (* -> *) -> * where
  Exists :: forall f x. f x -> Some f

That is often independently useful, but is a different goal: in such a case we are discovering type information when we deserialize. That’s not what we’re trying to achieve in this blog post; we want to write a Binary instance that can be used when we know from the context what the type must be.
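For contrast, here is a rough sketch of what such a Binary (Some Val) instance might look like, reusing the tag-byte idea from the failed attempt above. This is my own illustration, not code from the post:

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GADTs             #-}
{-# LANGUAGE KindSignatures    #-}
module Main where

import Data.Binary (Binary (..), decode, encode, getWord8, putWord8)

data Val :: * -> * where
  VI :: Int    -> Val Int
  VD :: Double -> Val Double

data Some :: (* -> *) -> * where
  Exists :: f x -> Some f

-- The tag byte records which type we will "discover" during deserialization.
instance Binary (Some Val) where
  put (Exists (VI i)) = putWord8 0 >> put i
  put (Exists (VD d)) = putWord8 1 >> put d
  get = do
    tag <- getWord8
    case tag of
      0 -> Exists . VI <$> get
      1 -> Exists . VD <$> get
      _ -> fail "invalid tag"

-- Roundtrip check: the Int case comes back out, wrapped in Exists.
main :: IO ()
main =
  case decode (encode (Exists (VI 42))) :: Some Val of
    Exists (VI i) -> print i
    Exists (VD d) -> print d
```

Note how get here is type correct precisely because Some hides the index: every branch produces a Some Val, regardless of which type the tag tells us to decode.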

Working, but inconvenient

The next thing we might try is to introduce Binary instances for the specific instantiations of that a type variable:

instance Binary (Val Int) where
  put (VI i) = put i
  get = VI <$> get

instance Binary (Val Double) where
  put (VD d) = put d
  get = VD <$> get

Note that there is no need to worry about any tags in the encoded bytestring; we always know the type. Although this works, it’s not very convenient; for example, we cannot define

encodeVal :: Val a -> ByteString
encodeVal = encode

because we don’t have a polymorphic instance Binary (Val a). Instead we’d have to define

encodeVal :: Binary (Val a) => Val a -> ByteString
encodeVal = encode

but that’s annoying: we know that that a can only be Int or Double, and we have Binary instances for both of those cases. Can’t we do better?

Introducing RTTI

Although we know that a can only be Int or Double, we cannot take advantage of this information in the code. Haskell types are erased at compile time, and hence we cannot do any kind of pattern matching on them. The key to solving this problem then is to introduce some explicit runtime type information (RTTI).

We start by introducing a data family associating with each indexed datatype a corresponding datatype with RTTI:

data family RTTI (f :: k -> *) :: (k -> *)

For the example Val this runtime type information tells us whether we’re dealing with Int or Double:

data instance RTTI Val a where
  RttiValInt    :: RTTI Val Int
  RttiValDouble :: RTTI Val Double

For serialization we don’t need to make use of this:

putVal :: Val a -> Put
putVal (VI i) = put i
putVal (VD d) = put d

but for deserialization we can now pattern match on the RTTI to figure out what kind of value we’re expecting:

getVal :: RTTI Val a -> Get (Val a)
getVal RttiValInt    = VI <$> get
getVal RttiValDouble = VD <$> get

We’re now almost done: the last thing we need to express is that if we know at the type level that we have some RTTI available, then we can serialize. For this purpose we introduce a type class that returns the RTTI:

class HasRTTI f a where
  rtti :: RTTI f a

which we can use as follows:

instance HasRTTI Val a => Binary (Val a) where
  put = putVal
  get = getVal rtti

This states precisely what we described in words above: as long as we have some RTTI available, we can serialize and deserialize any kind of Val value.

The last piece of the puzzle is to define some instances for HasRTTI; right now, if we try to do encode (VI 1234), ghc will complain

No instance for (HasRTTI Val Int)

Fortunately, these instances are easily defined:

instance HasRTTI Val Int    where rtti = RttiValInt
instance HasRTTI Val Double where rtti = RttiValDouble

and the good news is that this means that whenever we construct specific Vals we never have to construct the RTTI by hand; ghc’s type class resolution takes care of it for us.
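To sanity-check the roundtrip, here is the code from this section assembled into a self-contained module. The language pragmas and the main check are my additions; the definitions are the ones developed above:

```haskell
{-# LANGUAGE FlexibleContexts      #-}
{-# LANGUAGE FlexibleInstances     #-}
{-# LANGUAGE GADTs                 #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE PolyKinds             #-}
{-# LANGUAGE TypeFamilies          #-}
{-# LANGUAGE UndecidableInstances  #-}
module Main where

import Data.Binary (Binary (..), Get, Put, decode, encode)

data Val :: * -> * where
  VI :: Int    -> Val Int
  VD :: Double -> Val Double

data family RTTI (f :: k -> *) :: (k -> *)

data instance RTTI Val a where
  RttiValInt    :: RTTI Val Int
  RttiValDouble :: RTTI Val Double

class HasRTTI f a where
  rtti :: RTTI f a

instance HasRTTI Val Int    where rtti = RttiValInt
instance HasRTTI Val Double where rtti = RttiValDouble

putVal :: Val a -> Put
putVal (VI i) = put i
putVal (VD d) = put d

getVal :: RTTI Val a -> Get (Val a)
getVal RttiValInt    = VI <$> get
getVal RttiValDouble = VD <$> get

instance HasRTTI Val a => Binary (Val a) where
  put = putVal
  get = getVal rtti

-- My addition: decode . encode should give us back what we put in,
-- with the type annotation selecting the RTTI (and hence Binary) instance.
main :: IO ()
main =
  case decode (encode (VI 1234)) :: Val Int of
    VI i -> print i
```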

Taking stock

Instead of writing

encodeVal :: Binary (Val a) => Val a -> ByteString
encodeVal = encode

we can now write

encodeVal :: HasRTTI Val a => Val a -> ByteString
encodeVal = encode

While it may seem we haven’t gained very much, HasRTTI is a much more fine-grained constraint than Binary; from HasRTTI we can derive Binary constraints, like we have done here, but also other constraints that rely on RTTI. So while we do still have to carry these RTTI constraints around, those are – ideally – the only constraints that we still need to carry around. Moreover, as we shall see a little bit further down, RTTI also scales nicely to composite type-level structures such as type-level lists.

Another example: heterogeneous lists

As a second—slightly more involved—example, let's consider heterogeneous lists or n-ary products:

data NP (f :: k -> *) (xs :: [k]) where
  Nil  :: NP f '[]
  (:*) :: f x -> NP f xs -> NP f (x ': xs)

An example of such a heterogeneous list is

VI 12 :* VD 34.56 :* Nil :: NP Val '[Int, Double]

The type here says that this is a list of two Vals, the first Val being indexed by Int and the second Val being indexed by Double. If that makes zero sense to you, you may wish to study Well-Typed’s Applying Type-Level and Generic Programming in Haskell lecture notes.

As was the case for Val, we always statically know how long such a list is, so there should be no need to include any kind of length information in the encoded bytestring. Again, for serialization we don’t need to do anything very special:

putNP :: All Binary f xs => NP f xs -> Put
putNP Nil       = return ()
putNP (x :* xs) = put x >> putNP xs

The only minor complication here is that we need Binary instances for all the elements of the list; we guarantee this using the All type family (which is a minor generalization of the All type family explained in the same set of lecture notes linked above):

type family All p f xs :: Constraint where
  All p f '[]       = ()
  All p f (x ': xs) = (p (f x), All p f xs)

Deserialization however needs to make use of RTTI again. This means we need to define what we mean by RTTI for these heterogeneous lists:

data instance RTTI (NP f) xs where
  RttiNpNil  :: RTTI (NP f) '[]
  RttiNpCons :: (HasRTTI f x, HasRTTI (NP f) xs)
             => RTTI (NP f) (x ': xs)

instance HasRTTI (NP f) '[] where
  rtti = RttiNpNil
instance (HasRTTI f x, HasRTTI (NP f) xs)
      => HasRTTI (NP f) (x ': xs) where
  rtti = RttiNpCons

In this case the RTTI gives us the shape of the list. We can take advantage of this during deserialization:

getNP :: All Binary f xs => RTTI (NP f) xs -> Get (NP f xs)
getNP RttiNpNil  = return Nil
getNP RttiNpCons = (:*) <$> get <*> getNP rtti

allowing us to give the Binary instance as follows:

instance (All Binary f xs, HasRTTI (NP f) xs)
      => Binary (NP f xs) where
  put = putNP
  get = getNP rtti
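As a sanity check, the NP machinery can be exercised end to end. The following self-contained sketch reproduces the definitions from the post (spelling out Type and Constraint explicitly) and round-trips a two-element heterogeneous list:

```haskell
{-# LANGUAGE GADTs, DataKinds, TypeFamilies, TypeOperators, PolyKinds,
             KindSignatures, ConstraintKinds, MultiParamTypeClasses,
             FlexibleInstances, FlexibleContexts, UndecidableInstances #-}

import Data.Binary (Binary (..), Get, Put, decode, encode)
import Data.Kind (Constraint, Type)

data Val a where
  VI :: Int    -> Val Int
  VD :: Double -> Val Double

data NP (f :: k -> Type) (xs :: [k]) where
  Nil  :: NP f '[]
  (:*) :: f x -> NP f xs -> NP f (x ': xs)
infixr 5 :*

data family RTTI (f :: k -> Type) (a :: k)

data instance RTTI Val a where
  RttiValInt    :: RTTI Val Int
  RttiValDouble :: RTTI Val Double

class HasRTTI f a where
  rtti :: RTTI f a

instance HasRTTI Val Int    where rtti = RttiValInt
instance HasRTTI Val Double where rtti = RttiValDouble

data instance RTTI (NP f) xs where
  RttiNpNil  :: RTTI (NP f) '[]
  RttiNpCons :: (HasRTTI f x, HasRTTI (NP f) xs) => RTTI (NP f) (x ': xs)

instance HasRTTI (NP f) '[] where rtti = RttiNpNil
instance (HasRTTI f x, HasRTTI (NP f) xs) => HasRTTI (NP f) (x ': xs) where
  rtti = RttiNpCons

type family All (p :: Type -> Constraint) (f :: k -> Type) (xs :: [k]) :: Constraint where
  All p f '[]       = ()
  All p f (x ': xs) = (p (f x), All p f xs)

putVal :: Val a -> Put
putVal (VI i) = put i
putVal (VD d) = put d

getVal :: RTTI Val a -> Get (Val a)
getVal RttiValInt    = VI <$> get
getVal RttiValDouble = VD <$> get

instance HasRTTI Val a => Binary (Val a) where
  put = putVal
  get = getVal rtti

putNP :: All Binary f xs => NP f xs -> Put
putNP Nil       = return ()
putNP (x :* xs) = put x >> putNP xs

getNP :: All Binary f xs => RTTI (NP f) xs -> Get (NP f xs)
getNP RttiNpNil  = return Nil
getNP RttiNpCons = (:*) <$> get <*> getNP rtti

instance (All Binary f xs, HasRTTI (NP f) xs) => Binary (NP f xs) where
  put = putNP
  get = getNP rtti

main :: IO ()
main =
  -- Round-trip a list whose shape (and hence RTTI) is known statically.
  case decode (encode (VI 12 :* VD 34.5 :* Nil)) :: NP Val '[Int, Double] of
    VI i :* VD d :* Nil -> print (i, d)
```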

Serializing lists of Vals

If we use this Binary instance to serialize a list of Vals, we would end up with a type such as

decodeVals :: (HasRTTI (NP Val) xs, All Binary Val xs)
           => ByteString -> NP Val xs
decodeVals = decode

This All Binary Val xs constraint however is unfortunate, because we know that all Vals can be deserialized! Fortunately, we can do better. The RTTI for the (:*) case (RttiNpCons) includes RTTI for the elements of the list. We made no use of it above, but we can take advantage of it when giving a specialized instance for lists of Vals:

putNpVal :: NP Val xs -> Put
putNpVal Nil       = return ()
putNpVal (x :* xs) = putVal x >> putNpVal xs

getNpVal :: RTTI (NP Val) xs -> Get (NP Val xs)
getNpVal RttiNpNil  = return Nil
getNpVal RttiNpCons = (:*) <$> get <*> getNpVal rtti

instance {-# OVERLAPPING #-} HasRTTI (NP Val) xs
      => Binary (NP Val xs) where
  put = putNpVal
  get = getNpVal rtti

This allows us to define

decodeVals :: HasRTTI (NP Val) xs => ByteString -> NP Val xs
decodeVals = decode

Note that this use of overlapping type class instances is perfectly safe: the overlapping instance is fully compatible with the instance it overlaps, so it doesn’t make a difference which one gets picked. The overlapping instance just allows us to be more economical with our constraints.

Here we can appreciate the choice of making RTTI a data family indexed by f; indeed, the constraint HasRTTI f x in RttiNpCons is as generic as possible. Concretely, decodeVals required only a single HasRTTI constraint, as promised above. It is this compositionality, along with the fact that we can derive many type classes from just having RTTI around, that gives this approach its strength.

Advanced example

To show how all this might work in a more advanced example, consider the following EDSL describing simple functions:

data Fn :: (*,*) -> * where
  Exp   :: Fn '(Double, Double)
  Sqrt  :: Fn '(Double, Double)
  Mod   :: Int -> Fn '(Int, Int)
  Round :: Fn '(Double, Int)
  Comp  :: (HasRTTI Fn '(b,c), HasRTTI Fn '(a,b))
        => Fn '(b,c) -> Fn '(a,b) -> Fn '(a,c)

If you are new to EDSLs (embedded languages) in Haskell, you may wish to watch the Well-Typed talk Haskell for embedded domain-specific languages. However, hopefully the intent behind Fn is not too difficult to see: we have a datatype that describes functions: exponentiation, square root, integer modulus, rounding, and function composition. The two type indices of Fn describe the function input and output types. A simple interpreter for Fn would be

eval :: Fn '(a,b) -> a -> b
eval Exp          = exp
eval Sqrt         = sqrt
eval (Mod m)      = (`mod` m)
eval Round        = round
eval (g `Comp` f) = eval g . eval f
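The interpreter can be exercised directly. In this sketch the HasRTTI constraints on Comp are dropped, since they only matter for serialization; everything else is as in the post:

```haskell
{-# LANGUAGE GADTs, DataKinds, KindSignatures #-}

import Data.Kind (Type)

-- Same Fn as in the post, minus the HasRTTI constraints on Comp.
data Fn :: (Type, Type) -> Type where
  Exp   :: Fn '(Double, Double)
  Sqrt  :: Fn '(Double, Double)
  Mod   :: Int -> Fn '(Int, Int)
  Round :: Fn '(Double, Int)
  Comp  :: Fn '(b, c) -> Fn '(a, b) -> Fn '(a, c)

eval :: Fn '(a, b) -> a -> b
eval Exp          = exp
eval Sqrt         = sqrt
eval (Mod m)      = (`mod` m)
eval Round        = round
eval (g `Comp` f) = eval g . eval f

main :: IO ()
main = do
  print (eval (Round `Comp` Sqrt) 2.0)               -- round (sqrt 2)
  print (eval (Mod 10 `Comp` Round `Comp` Exp) 5.0)  -- round (exp 5) `mod` 10
```

Note how the type indices guarantee at compile time that only well-typed pipelines can be composed.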

In the remainder of this blog post we will consider how we can define a Binary instance for Fn. Compared to the previous examples, Fn poses two new challenges:

  • The type index does not uniquely determine which constructor is used; if the type is (Double, Double) then it could be Exp, Sqrt or indeed the composition of some functions.
  • Trickier still, Comp actually introduces an existential type: the type “in the middle” b. This means that when we serialize and deserialize we do need to include some type information in the encoded bytestring.

RTTI for Fn

To start with, let’s define the RTTI for Fn:

data instance RTTI Fn ab where
  RttiFnDD :: RTTI Fn '(Double, Double)
  RttiFnII :: RTTI Fn '(Int, Int)
  RttiFnDI :: RTTI Fn '(Double, Int)

instance HasRTTI Fn '(Double, Double) where rtti = RttiFnDD
instance HasRTTI Fn '(Int, Int)       where rtti = RttiFnII
instance HasRTTI Fn '(Double, Int)    where rtti = RttiFnDI

For our DSL of functions, we only have functions from Double to Double, from Int to Int, and from Double to Int (and this is closed under composition).

Serializing type information

The next question is: when we serialize a Comp constructor, how much information do we need to serialize about that existential type? To bring this into focus, let’s consider the type information we have when we are dealing with composition:

data RttiComp :: (*,*) -> * where
  RttiComp :: RTTI Fn '(b,c) -> RTTI Fn '(a,b) -> RttiComp '(a,c)

Whenever we are deserializing a Fn, if that Fn happens to be the composition of two other functions, we have RTTI for the composition as a whole; but since the “type in the middle” b is unknown, we have no information about it at all. So what do we need to store? Let’s start with serialization:

putRttiComp :: RTTI Fn '(a,c) -> RttiComp '(a,c) -> Put

The first argument here is the RTTI about the composition as a whole, and sets the context. We can look at that context to determine what we need to output:

putRttiComp :: RTTI Fn '(a,c) -> RttiComp '(a,c) -> Put
putRttiComp rac (RttiComp rbc rab) = go rac rbc rab
  where
    go :: RTTI Fn '(a,c) -> RTTI Fn '(b,c) -> RTTI Fn '(a,b) -> Put
    go RttiFnDD RttiFnDD RttiFnDD = return ()

    go RttiFnII RttiFnII RttiFnII = return ()
    go RttiFnII RttiFnDI rAB      = case rAB of {}

    go RttiFnDI RttiFnII RttiFnDI = putWord8 0
    go RttiFnDI RttiFnDI RttiFnDD = putWord8 1

Let’s take a look at what’s going on here. When we know from the context that the composition has type Double -> Double, then we know that the types of both functions in the composition must also be Double -> Double, and hence we don’t need to output any type information. The same goes when the composition has type Int -> Int, although we need to work a bit harder to convince ghc in this case. However, when the composition has type Double -> Int then the first function might be Double -> Int and the second might be Int -> Int, or the first function might be Double -> Double and the second might be Double -> Int. Thus, we need to distinguish between these two cases (in principle a single bit would suffice).

Having gone through this thought process, deserialization is now easy: remember that we know the context (the RTTI for the composition):

getRttiComp :: RTTI Fn '(a,c) -> Get (RttiComp '(a,c))
getRttiComp RttiFnDD = return $ RttiComp RttiFnDD RttiFnDD
getRttiComp RttiFnII = return $ RttiComp RttiFnII RttiFnII
getRttiComp RttiFnDI = do
    tag <- getWord8
    case tag of
      0 -> return $ RttiComp RttiFnII RttiFnDI
      1 -> return $ RttiComp RttiFnDI RttiFnDD
      _ -> fail "invalid tag"
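To check the tag logic in isolation, the following self-contained sketch reproduces putRttiComp and getRttiComp (with Fn left as an empty datatype, since its constructors are not needed to exercise the type information) and round-trips the RTTI for a Double -> Int composition:

```haskell
{-# LANGUAGE GADTs, DataKinds, TypeFamilies, KindSignatures, EmptyCase #-}

import Data.Binary.Get (Get, getWord8, runGet)
import Data.Binary.Put (Put, putWord8, runPut)
import Data.Kind (Type)

-- Only the index matters here; Fn's constructors are omitted.
data Fn (ab :: (Type, Type))

data family RTTI (f :: (Type, Type) -> Type) (a :: (Type, Type))

data instance RTTI Fn ab where
  RttiFnDD :: RTTI Fn '(Double, Double)
  RttiFnII :: RTTI Fn '(Int, Int)
  RttiFnDI :: RTTI Fn '(Double, Int)

data RttiComp :: (Type, Type) -> Type where
  RttiComp :: RTTI Fn '(b, c) -> RTTI Fn '(a, b) -> RttiComp '(a, c)

putRttiComp :: RTTI Fn '(a, c) -> RttiComp '(a, c) -> Put
putRttiComp rac (RttiComp rbc rab) = go rac rbc rab
  where
    go :: RTTI Fn '(a, c) -> RTTI Fn '(b, c) -> RTTI Fn '(a, b) -> Put
    go RttiFnDD RttiFnDD RttiFnDD = return ()
    go RttiFnII RttiFnII RttiFnII = return ()
    go RttiFnII RttiFnDI rAB      = case rAB of {}
    go RttiFnDI RttiFnII RttiFnDI = putWord8 0
    go RttiFnDI RttiFnDI RttiFnDD = putWord8 1

getRttiComp :: RTTI Fn '(a, c) -> Get (RttiComp '(a, c))
getRttiComp RttiFnDD = return $ RttiComp RttiFnDD RttiFnDD
getRttiComp RttiFnII = return $ RttiComp RttiFnII RttiFnII
getRttiComp RttiFnDI = do
  tag <- getWord8
  case tag of
    0 -> return $ RttiComp RttiFnII RttiFnDI
    1 -> return $ RttiComp RttiFnDI RttiFnDD
    _ -> fail "invalid tag"

main :: IO ()
main =
  -- Double -> Int composed as (Double -> Double) then (Double -> Int): tag 1.
  case runGet (getRttiComp RttiFnDI)
              (runPut (putRttiComp RttiFnDI (RttiComp RttiFnDI RttiFnDD))) of
    RttiComp RttiFnDI RttiFnDD -> putStrLn "round-trip ok"
    RttiComp RttiFnII RttiFnDI -> putStrLn "wrong branch"
```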

Binary instance for Fn

The hard work is now mostly done. Although it is probably not essential, during serialization we can clarify the code by looking at the RTTI context to know which possibilities we need to consider at each type index. For example, if we are serializing a function of type Double -> Double, there are three possibilities (Exp, Sqrt, Comp). We did something similar in the previous section.

putAct :: RTTI Fn a -> Fn a -> Put
putAct = go
  where
    go :: RTTI Fn a -> Fn a -> Put
    go r@RttiFnDD fn =
      case fn of
        Exp      -> putWord8 0
        Sqrt     -> putWord8 1
        Comp g f -> putWord8 255 >> goComp r (rtti, g) (rtti, f)
    go r@RttiFnII fn =
      case fn of
        Mod m    -> putWord8 0   >> put m
        Comp g f -> putWord8 255 >> goComp r (rtti, g) (rtti, f)
    go r@RttiFnDI fn =
      case fn of
        Round    -> putWord8 0
        Comp g f -> putWord8 255 >> goComp r (rtti, g) (rtti, f)

    goComp :: RTTI Fn '(a,c)
           -> (RTTI Fn '(b,c), Fn '(b,c))
           -> (RTTI Fn '(a,b), Fn '(a,b))
           -> Put
    goComp rAC (rBC, g) (rAB, f) = do
      putRttiComp rAC (RttiComp rBC rAB)
      go rBC g
      go rAB f

Deserialization proceeds along very similar lines; the only difficulty is that when we deserialize RTTI using getRttiComp we somehow need to reflect that to the type level; for this purpose we can provide a function

reflectRTTI :: RTTI f a -> (HasRTTI f a => b) -> b

Its definition is beyond the scope of this blog post; refer to the source code on GitHub instead. With this function in hand, however, deserialization is no longer difficult:

getAct :: RTTI Fn a -> Get (Fn a)
getAct = go
  where
    go :: RTTI Fn a -> Get (Fn a)
    go r@RttiFnDD = do
      tag <- getWord8
      case tag of
        0   -> return Exp
        1   -> return Sqrt
        255 -> goComp r
        _   -> fail "invalid tag"
    go r@RttiFnII = do
      tag <- getWord8
      case tag of
        0   -> Mod <$> get
        255 -> goComp r
        _   -> fail "invalid tag"
    go r@RttiFnDI = do
      tag <- getWord8
      case tag of
        0   -> return Round
        255 -> goComp r
        _   -> fail "invalid tag"

    goComp :: RTTI Fn '(a,c) -> Get (Fn '(a,c))
    goComp rAC = do
      RttiComp rBC rAB <- getRttiComp rAC
      reflectRTTI rBC $ reflectRTTI rAB $
        Comp <$> go rBC <*> go rAB

We can define the corresponding Binary instance for Fn simply using

instance HasRTTI Fn a => Binary (Fn a) where
  put = putAct rtti
  get = getAct rtti

If desired, a specialized instance for NP Fn can be defined that relies only on RTTI, just like we did for Val (left as an exercise for the reader).

Conclusions

Giving type class instances for GADTs, in particular for type classes that produce values of these GADTs (deserialization, translation from Java values, etc.) can be tricky. If not kept in check, this can result in a code base with a lot of unnecessarily complicated function signatures or frequent use of explicit computation of evidence of type class instances. By using run-time type information we can avoid this, keeping the code clean and allowing programmers to focus on the problem at hand rather than worry about type class instances.

PS: Singletons

RTTI looks a lot like singletons, and indeed things can be set up in such a way that singletons would do the job. The key here is to define a new kind for the type indices; for example, instead of

data Val :: * -> * where
  VI :: Int    -> Val Int
  VD :: Double -> Val Double

we’d write something like

data U = Int | Double

data instance Sing (u :: U) where
  SI :: Sing 'Int
  SD :: Sing 'Double

data Val :: U -> * where
  VI :: Int    -> Val 'Int
  VD :: Double -> Val 'Double

instance SingI u => Binary (Val u) where
  put (VI i) = put i
  put (VD d) = put d

  get = case sing :: Sing u of
          SI -> VI <$> get
          SD -> VD <$> get

In such a setup singletons can be used as RTTI. Which approach is preferable depends on questions such as: are singletons already in use in the project, and how much of their infrastructure can be reused? A downside of using singletons rather than the more direct RTTI encoding presented in this blog post is that singletons probably require introducing some kind of type-level decoding (in this example, a type family U -> *); on the other hand, having specific kinds for specific purposes may also clarify the code. Either way the main ideas are the same.

by edsko at June 15, 2017 09:48 AM

Mark Jason Dominus

Base-4 fractions in Telugu

Rik Signes brought to my attention that since version 5.1 Unicode has contained the following excitingly-named characters:

    ౸ TELUGU FRACTION DIGIT ZERO FOR ODD POWERS OF FOUR
    ౹ TELUGU FRACTION DIGIT ONE FOR ODD POWERS OF FOUR
    ౺ TELUGU FRACTION DIGIT TWO FOR ODD POWERS OF FOUR
    ౻ TELUGU FRACTION DIGIT THREE FOR ODD POWERS OF FOUR
    ౼ TELUGU FRACTION DIGIT ONE FOR EVEN POWERS OF FOUR
    ౽ TELUGU FRACTION DIGIT TWO FOR EVEN POWERS OF FOUR
    ౾ TELUGU FRACTION DIGIT THREE FOR EVEN POWERS OF FOUR
I looked into this a little and found out what they are for. It makes a lot of sense! The details were provided by “Telugu Measures and Arithmetic Marks” by Nāgārjuna Venna.

Telugu is the third-most widely spoken language in India, spoken mostly in the southeast part of the country. Traditional Telugu units of measurement are often divided into four or eight subunits. For example, the tūmu is divided into four kuṁcamulu, the kuṁcamulu into four mānikalu, and the mānikalu into four sōlalu.

These days they mainly use liters like everyone else. But the traditional measurements are mostly divided into fours, so amounts are written with a base-10 integer part and a base-4 fractional part. The characters above are the base-4 fractional digits.

To make the point clearer, I hope, let's imagine that we are using the Telugu system, but with the familiar western-style symbols 0123456789 instead of the Telugu digits ౦౧౨౩౪౫౬౭౮౯. (The Telugu had theirs first of course.) And let's use 0-=Z as our base-four fractional digits, analogous to Telugu ౦౼౽౾. (As in Telugu, we'll use the same zero symbol for both the integer and the fractional parts.) Then to write the number of gallons (7.4805195) in a cubic foot, we say

7.-Z=Z0
which is 7 gallons plus one (-) quart plus three (Z) cups plus two (=) quarter-cups plus three (Z) tablespoons plus zero (0) drams, a total of 7660 drams almost exactly. Or we could just round off to 7.=, seven and a half gallons.

(For the benefit of readers who might be a bit rusty on the details of these traditional European measurements, I should mention that there are four drams in a tablespoon, four tablespoons in a quarter cup, four quarter cups in a cup, four cups in a quart, and four quarts in a gallon, so 4⁵ = 1024 drams in a gallon and 7.4805195·4⁵ = 7660.052 drams in a cubic foot. Note also that these are volume (fluid) drams, not mass drams, which are different.)

We can omit the decimal point (as the Telugu did) and write

7-Z=Z0
and it is still clear where the integer part leaves off and the fraction begins, because we are using special symbols for the fractional part. But no, this isn't quite enough, because if we wrote 20ZZ= it might not be clear whether we meant 20.ZZ= or 2.0ZZ=.

So the system has an elaboration. In the odd positions, we don't use the 0-=Z symbols; we use Q|HN instead. And we don't write 7-Z=Z0, we write

7|ZHZQ
This is always unambiguous: 20.ZZ= is actually written 20NZH and 2.0ZZ= is written 2QZN=, quite different.

This is all fanciful in English, but Telugu actually did this. Instead of 0-=Z they had ౦౼౽౾ as I mentioned before. And instead of Q|HN they had ౸౹౺౻. So if the Telugu were trying to write 7.4805195, where we had 7|ZHZQ they might have written ౭౹౾౺౾౸. Like us, they then appended an abbreviation for the unit of measurement. Instead of “gal.” for gallon they might have put ఘ (letter “gha”), so ౭౹౾౺౾౸ఘ. It's all reasonably straightforward, and also quite sensible. If you have ౭౹౾౺ tūmu, you can read off instantly that there are ౺ (two) sōlalu left over, just as you can see that $7.43 has three pennies left over.
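The digit computation running through this example is easy to mechanize: repeatedly multiply the fractional part by four and peel off the integer part. Here is a short illustrative sketch (the function name is mine):

```haskell
-- Peel off n base-4 fractional digits of x (0 <= x < 1), most significant
-- first, by repeatedly multiplying the fractional part by 4.
base4Digits :: Int -> Double -> [Int]
base4Digits 0 _ = []
base4Digits n x = d : base4Digits (n - 1) (y - fromIntegral d)
  where
    y = x * 4
    d = floor y

main :: IO ()
main =
  -- 7.4805195 gallons: quart, cups, quarter-cups, tablespoons, drams.
  print (base4Digits 5 0.4805195)
```

Running this prints [1,3,2,3,0], matching the digits -Z=Z0 above.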

Notice that both sets of Telugu fraction digits are easy to remember: the digits for 3 have either three horizontal strokes ౾ or three vertical strokes ౻, and the others similarly.

I have an idea that the alternating vertical-horizontal system might have served as an error-detection mechanism: if a digit is omitted, you notice right away because the next symbol is wrong.

I find this delightful. A few years back I read all of The Number Concept: Its Origin and Development (1931) by Levi Leonard Conant, hoping to learn something really weird, and I was somewhat disappointed. Conant spends most of his book describing the number words and number systems used by dozens of cultures and almost all of them are based on ten, and a few on five or twenty. (“Any number system which passes the limit 10 is reasonably sure to have either a quinary, a decimal, or a vigesimal structure.”) But he does not mention Telugu!

by Mark Dominus ( at June 15, 2017 07:02 AM

June 14, 2017

Michael Snoyman

A Very Naive Overview of Nutrition (Part 2)

This blog post is part 2 of a series on nutrition and exercise. If you haven't seen it already, I recommend reading part 1 now. This blog post will go into more details on nutrition.

For the completely impatient, here are my recommendations on where you should get started, in a priority-sorted list (start with #1, and add more recommendations as you're ready):

  1. Avoid eating processed foods. For example: sweet potato with butter? OK. Potato chips? Avoid.
  2. Eat protein at each meal. Protein helps you feel full longer, helping avoid overeating.
  3. Reduce your sugar intake. Sugar is addictive, has significantly negative health impacts, and encourages you to eat more than you should at each meal.
  4. Pay attention to hunger cues. Stop eating before you feel "stuffed."

Of course, I strongly recommend you read the rest of this blog post for more details.


We need to get two different things from our food:

  • Essential nutrients
  • Energy

Essential nutrients are things that our body requires to live, and cannot make itself. Energy is what powers us. Without either of these, we die. You've probably heard of calories before. A calorie is a unit of measurement for energy. Each person has different requirements for both essential nutrients and calories, which we'll get to shortly.

The thing is that these two requirements overlap significantly. For example, Omega 3 fatty acids are an essential nutrient, but they also provide energy. Therefore, it's impossible to say something like "I'm going to get all of my energy from carbohydrates," since you'll be required to eat protein and fat as well.

Alright, let's break down nutrients:

  • Macronutrients, aka macros, are either protein, carbohydrates (carbs), or fat. All three of these provide some level of energy (more on that later). As far as the essential aspects of these are concerned:

    • Protein is made up of amino acids. There are 21 different amino acids, of which 9 are essential. Amino acids are used by your body for building most of its structure (muscles, organs, bones).

    • There are two essential fatty acids: Omega 3 and Omega 6. You've probably heard a lot about Omega 3. That's because our modern diets (for reasons I won't get into) have a much higher level of Omega 6 relative to Omega 3, which is theorized to be a cause of many diseases via inflammation. That means you likely don't need to worry about getting enough Omega 6, but may want to supplement Omega 3 (such as with fish oil pills).

      Other than that, you don't need to eat any fats. Your body can create its own fat (via de novo lipogenesis) for fat storage.

    • There are no essential carbs. Fiber is a form of carbs that our bodies don't break down well, and it helps with digestion. Fiber also helps us feel full. But by saying it is non-essential, my point is: you can eat a diet without any carbs at all and survive. (Whether you should is a different issue.)

  • Micronutrients are vitamins and minerals. There are many of these, and I'm not going to be getting into too many details here, because it's complicated, and I'm not all that familiar on the details. You can supplement these with multivitamins. But much better in my opinion is to eat real foods (as opposed to processed foods) that give you a good variety of micronutrients. A good general rule when choosing foods is: prefer foods which are dense in micronutrients, meaning lots of vitamins and minerals per calorie of food.

NOTE You also get calories from alcohol. I'm not going to discuss that here; alcohol is completely unnecessary in your diet, and has many negative impacts on health. I certainly enjoy a drink from time to time, but if you're drinking enough that the calorie impact of the alcohol is meaningful, you're sabotaging your health significantly.


Unimportant side note: One calorie is the amount of energy needed to raise one gram of water one degree Celsius. When you read calories on food, it's actually talking about kilo-calories, or Calories (capital C), or food calories. The point is: there are a thousand "real" calories in a food calorie. I only mention this because it can be a point of confusion. We'll in general be talking about food calories, and just referring to them as calories.

Each of the macronutrients provides a different amount of calories:

  • Fat: 9 calories/gram
  • Carbs: 4 calories/gram
  • Protein: 4 calories/gram

But these numbers don't add up exactly as you'd expect. For example, protein is harder to convert into usable energy than the other two, and therefore it takes more energy to perform the breakdown. This is called the thermic effect of food, and means that you'll get less net energy from 9 grams of protein than from 4 grams of fat or 9 grams of carbs, even though in theory they should be the same.

This brings us to our first important point: during digestion, each macronutrient follows a different metabolic pathway, and therefore can have different effects on the body. We'll cover the difference between carbs and fat in a later section. For now, I want to point out that protein is a suboptimal energy source. This greatly affects how we want to consider protein as part of our diet (also in an upcoming section).

Total Daily Energy Expenditure

Your body needs energy to operate. The total energy it needs on a daily basis is the TDEE, or Total Daily Energy Expenditure. If you eat more energy than this number, the excess will be stored as fat. If you eat less, the difference will be taken from fat. This is known as calories-in/calories-out.

You'll see lots of debates online about this point. Here's my personal take: it's a truism, but misses a lot of the point. Yes, if you eat a lot more food, you'll put on weight. But the situation is quite a bit more complicated than this. The amount and type of food you eat affects hormone levels that influence your energy expenditure and hunger levels. And while my simplified model talks about adding and losing fat, we have other body mass (glycogen and muscle) which will be affected as well.

What's my point in all of this? Yes, you should be aware of your TDEE. Let it be a general guide (in addition to hunger signals) to how much you should eat. But realize it's an estimate, and that trying to change it (such as by eating only 500 calories a day) will not immediately result in losing the amount of fat you expect. Your body may slow down its metabolism to compensate, you may cheat more often, etc.

You can find lots of TDEE calculators online, here's one I find with a quick search. Also, one pound of body fat contains 3500 calories (7700 per kilogram), so in theory, you'd need to eat at a calorie deficit of 500 calories per day for a week to lose one pound of fat.
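As a quick illustration of that arithmetic, here is a toy sketch using the rule-of-thumb numbers quoted above (an illustration of the calorie bookkeeping, not a physiological model):

```haskell
-- Rule of thumb from the text: one pound of body fat ~ 3500 calories.
caloriesPerPound :: Double
caloriesPerPound = 3500

-- Days needed to lose the given pounds of fat at a constant daily calorie
-- deficit, assuming (naively!) the deficit comes entirely out of body fat.
daysToLose :: Double -> Double -> Double
daysToLose pounds dailyDeficit = pounds * caloriesPerPound / dailyDeficit

main :: IO ()
main = print (daysToLose 1 500)  -- one pound at a 500-calorie daily deficit
```

This prints 7.0, the one-week figure mentioned above; as the text stresses, real bodies adapt, so treat this purely as an estimate.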

Protein Requirements

Since, as we said above, protein isn't a great source for energy, we primarily want to include protein in our diet for its non-energy aspects. This involves the "essential" bit about providing amino acids. However, there's another big benefit that comes from eating protein: you tend to stay full longer when you eat protein. One recommendation that I like to follow is to include a protein source in every meal.

So then the question is: how much protein do you need? You'll see lots of values thrown around for this. For example, 1 gram of protein per pound of bodyweight. That means, if you weigh 170 pounds (77kg for those of us outside the US), you'd target 170 grams of protein per day. But numbers really vary all over the place. Some standards place this as a certain number of grams per pound of lean body weight (meaning, ignoring your body fat). How much you need also varies with what kind of activity you're doing: if you're trying to build muscle, you'll usually want to eat more protein.

I'd recommend doing some research yourself on how much protein you need to get per day, I'm going to avoid making a recommendation. I will, instead, try to debunk some myths:

  • If you eat only 100% protein all day, you're not going to grow super muscles. Eventually, you'll die from something known as rabbit starvation.
  • That said, eating a high-protein diet, above the Recommended Daily Allowance, isn't going to send you into renal (kidney) failure. Unless you have some preexisting condition, you'll be able to handle a fairly high protein level without issue.

One of the biggest downsides with protein is that it tends to be relatively expensive (compare the cost of a steak vs a loaf of bread). Also, different protein sources have different absorption rates in the body. Finally, referring back to the essential amino acids, not all protein sources are complete, especially not vegan ones. (Complete here means it contains all 9 essential amino acids.) If you're eating animal products, you're probably fine. With vegan products, do a little more research on what you're eating (hemp seed and quinoa are both complete proteins).

Summary Get enough protein, and eat it at each meal to help you stay full longer.

Carbs vs fat

Alright, once you're done putting protein into your diet, you'll be filling up the rest of your calories from carbs and fat. This is probably one of the biggest areas where that issue of complication I mentioned comes into play. If you want my simple recommendation: start off by getting adequate protein and avoiding processed foods. In my opinion, you'll be getting 80% of the way to a great diet with just those steps.

OK, you want to get into the details of carbs vs fat? I would say that, first and foremost, a lot of the most sensational claims out there are simply not true. Fat doesn't clog your arteries. Carbs don't magically make you fat. Things are far more nuanced. I'm going to give a list of benefits for each of these macronutrients.

Benefits of carbs

  • Since they are less calorically dense than fat, you can eat more of them and get the same amount of calories
  • Carbs are part of what people often consider healthy foods, like fruits, vegetables, legumes, and grains. (I encourage you to especially research whether fruits and grains should be considered healthy in general. I'd recommend moderating fruit intake due to high sugar, and especially fructose, levels.)
  • Carbs tend to be the cheapest macronutrient available
  • Many high carb foods are also high fiber foods, which is good for digestion and satiety
  • Carbs are broken down into glucose in the body, and stored in the body as glycogen, which is a faster burning energy source than fat. This makes carbs good for explosive activity (like weight lifting or sprinting).
  • Unlike fats, carbs cannot be stored directly in the body as fat. They need to first be converted to fat via a process called de novo lipogenesis, which loses some energy in the process. In other words, 500 calories of excess carbs will result in less body fat than 500 calories of excess fat.

    • That said, if you eat both fat and carbs in your diet, your body will prefer to burn the carbs and store the fat, so given a fair mix of both macronutrients, this won't matter too much.

Benefits of fats

  • Fat tends to leave you feeling fuller longer, since digestion of fat is slower. This is very likely the primary mechanism by which low-carb diets help you lose weight.
  • If you almost completely eliminate carbs, your body will enter a state called ketosis, where your liver generates ketone bodies for your brain and other organs to run off of. This can have great fat burning results, and can be used for treating some neurological conditions (like epilepsy).
  • Eating insufficient fat can lead to hormonal imbalances, and the so-called "starvation mode." Having a high-fat low-carb diet can allow you to eat fewer total calories without having your appetite ramped up or your metabolism turned down.
  • If you eat primarily fat, your body gets better at turning fat into usable energy. This doesn't just apply to dietary fat, but to your body fat too. This is sometimes referred to as being a "fat burner."
  • Glycogen (stored carbs) is very limited in capacity in the body. By contrast, even extremely lean people have many tens of thousands of calories available in fat. If your body is good at burning fat, it can be a big advantage for endurance activities like marathon running or cycling.
  • Fats taste good. Carbs can taste good too, but that usually depends on the presence of sugar. Most people agree today that sugar is a pretty dangerous substance for the body and should be avoided.

There are clearly arguments in favor of both macronutrients. Throughout human history we have eaten diets high in carbs, high in fat, and high in both, and we can evidently survive well on any of them. I've personally used all kinds of diets with good results.

There is one thing I've seen claimed that I think has a lot of logic to it. Some of the most successful diets today seem to be based around banning either carbs or fat. Perhaps the reason they work is that the biggest reward foods—ice cream, potato chips, chocolate, etc—are high in both carbs and fat. By allowing yourself large quantities of food, but naturally avoiding these highly tempting and easy-to-binge reward foods, it becomes much easier to adhere to a diet.

My recommendation Unless you have some ethical or religious reason guiding your eating, try out whatever popular diet plan appeals to you. Give it a few weeks at least, ideally a few months, and see how you respond. If you find that you're constantly fighting cravings even after trying the diet for a few weeks, consider trying something else. And if you are not losing body fat, either the diet's a bad one (don't fall for the ice cream diet!) or you're not following it well.

Glycogen and water weight

I mentioned above that carbs get stored as glycogen. When your body stores glycogen, it stores some water to go along with it. This is one of the reasons why low carb diets have such amazing short term results: when you first become fat adapted, you burn up your glycogen stores quickly, and flush out that extra water (in your urine) at the same time. You can lose a few pounds/kilos in a few short days.

Don't fall into this all-too-common trap:

Wow, I lost 3 pounds in my first week alone! This is great! If I just continue like this for the next 2 months, I'll lose 25 pounds in no time!

Then, when you of course can't continue peeing out 2.5 pounds of water per week and you eventually hit a weight loss plateau, you decide your diet isn't working and give up. In other words:

Be wary of the scale, it will lie to you!

Intermittent fasting

Something popping up much more recently is intermittent fasting, where you spend a certain number of hours per day not eating. Perhaps the most common is the 16-8 fast: you fast 16 hours and only eat for 8. That might sound rough, but when you realize that sleep is part of this, and the schedule is "fit all of your eating into 11am-7pm or similar", it's not too bad.

There are some theoretical health benefits of fasting on its own. Our bodies can swing between catabolic (breaking down) and anabolic (building up) phases, and there are advantages to both. If we're constantly stuffing our faces, our body never has to enter catabolism, which can be detrimental.

But intermittent fasting has a much simpler motivator: it makes it easier to eat within your TDEE if you don't spend all day eating. And during the part of the day you're not eating, it's much easier to control yourself. At least for me, a simple binary on/off switch for "am I allowed to eat" is easy.

Do you have to do this? Absolutely not. But if you're feeling like trying something, go for it. If nothing else, convincing yourself that you're strong enough to go regularly without eating is a good psychological barrier to overcome.

Different types of fat

Saturated. Unsaturated. Monounsaturated. Polyunsaturated. Omegas. Trans. What's up with all of this? Well, it's just chemistry. Fats are chains of carbons. Each carbon can form four bonds, and hydrogen can form one bond. So in theory, each carbon can bond to the carbon on its left, the carbon on its right, and two hydrogens. If that happens, you have a saturated fat: each carbon is fully saturated with hydrogens.

However, sometimes we'll be missing hydrogens. Instead of binding to two hydrogens, two carbons can form a double bond. Each of those carbons will bond with one hydrogen and one other neighboring carbon. When such a double bond forms, we have an unsaturated fat. Because double bonds are more flexible, unsaturated fats melt (turn liquid) at lower temperatures. That's why saturated fats (like butter) tend to be solid at room temperature, but unsaturated fats (like olive oil) are liquid.

If a fat has just one double bond in it, it's monounsaturated. If it has more than one, it's polyunsaturated. Two of these polyunsaturated fats are special: omega 3 and omega 6 are differentiated by the distance between the tail of the carbon chain and the first double bond.

Trans fats are unsaturated fats which have been chemically altered to make them solid at higher temperatures. This is done by hydrogenating them. Because trans fats rarely occur in nature, it seems that our bodies are not particularly good at digesting them, with the result being that they're bad for our health. Basically: avoid trans fats.

As mentioned above, both omega 3 and omega 6 are essential fatty acids. We get plenty of omega 6, so you should try to get more omega 3.

Beyond that, what kind of fats should you go for? That's a topic of much debate. Up until recently, the answer would be to prefer polyunsaturated vegetable oils. However, newer evidence points to saturated fat not being the villain it was thought to be, and vegetable oil in fact being dangerous. Monounsaturated fats—especially olive oil—seem to be pretty well accepted as being good for us.

Personally, I avoid vegetable oils and don't avoid saturated fats. But you'll get lots of conflicting advice in this area. I recommend reading up.

Different types of carbs

Simple. Complex. Sugar. Glucose. Fructose. Lactose. Starch. What exactly are carbs? Time for some more chemistry!

Saccharide is another term for carbohydrate. The monosaccharides and disaccharides make up what we call the sugars. The most common monosaccharides are:

  • Glucose
  • Fructose
  • Galactose

Disaccharides are pairs of monosaccharides, such as:

  • Sucrose (table sugar) = glucose + fructose
  • Lactose (milk sugar) = galactose + glucose
  • Maltose = glucose + glucose

Longer chains of saccharides form polysaccharides, such as starch (as you'd find in potatoes or rice) and cellulose. Cellulose gives plants their structure and is indigestible (for the most part) to humans; you've already seen it referred to here as dietary fiber. However, some gut bacteria can digest fiber and generate molecules we can digest.

When digesting, our body will break down carbohydrates into monosaccharides so they can be absorbed in the small intestine. Because this breakdown takes time, the more complex the carbohydrate (meaning the more saccharides are bound together), the slower the digestion. This will leave you feeling full longer and avoid a blood sugar spike.

When your blood sugar spikes, your body releases insulin to remove the toxic levels of sugar from the blood and store it as glycogen and fat. One working theory is that, when you eat a diet filled with simple sugars, you bounce between sugar highs and sugar crashes, the latter leaving you hungry and irritable, and reaching for that next sugary snack. All this is to say: avoid simple sugars!

One method for measuring how quickly carbs are absorbed is the glycemic index (GI), where a higher value means the food is more quickly absorbed. By this standard, you should probably stick to low GI foods, unless you have a specific reason to do otherwise (such as some kind of athletic competition or muscle recovery... but that's complicated and you should do research on it before trying it out).

Of the three monosaccharides, glucose is the one that our body cells can use directly. Fructose and galactose must be processed first by the liver. There are some claims that having a high-fructose diet can put undue strain on the liver, giving one reason why High Fructose Corn Syrup has such a bad rap. This is also a reason why binge-eating fruit—which is high in fructose—may not be a great idea.


Salt

I'm only including this section because people will ask. The story with salt is, in my opinion, completely unclear. There are many contradictory studies. If you have hypertension, the general consensus is to reduce salt. Beyond that, conventional wisdom says reducing salt is a good thing, but many newer studies show that it has no benefit. Also, if you're going for a ketogenic diet, make sure to get plenty of electrolytes, including salt, potassium, and magnesium.

Summary of Nutrition

Whew, that's a lot of information! Let me try to simplify all of that down into some practical advice.

  • Avoid processed foods. They're made up of the worst combination of foods that basically everyone agrees will kill you: processed oils, simple sugars and starches, chemicals, and excess salt. Honestly, just following this one piece of advice is in my opinion the best thing you can do for your health.
  • Eat plenty of protein, and try to get it with each meal.
  • Don't eat too many calories in the course of a day.
  • Balance your carbs and fats based on your calorie needs. Try out variations of that balance and see what works for you.
  • Get sufficient omega 3s.
  • If necessary, supplement vitamins and minerals.

I'll tie up this series in my next post, which will go into details on exercise.

June 14, 2017 03:00 AM

June 13, 2017

Functional Jobs

Senior Backend Engineer (Haskell) at Front Row Education (Full-time)


Senior Backend Engineer to join fast-growing education startup transforming the way 6.5+ million K-12 students learn Math, English, Social Studies and more.

What you tell your friends you do

“You know how teachers in public schools are always overworked and overstressed with 30 kids per classroom and never ending state tests? I make their lives possible and help their students make it pretty far in life”

What you really will be doing

Architect, design and develop new web applications, tools and distributed systems for the Front Row ecosystem in Haskell, Flow, PostgreSQL, Ansible and many others. You will get to work on your deliverable end-to-end, from the UX to the deployment logic

Mentor and support more junior developers in the organization

Create, improve and refine workflows and processes for delivering quality software on time and without incurring debt

Work closely with Front Row educators, product managers, customer support representatives and account executives to help the business move fast and efficiently through relentless automation.

How you will do this

You’re part of an agile, multidisciplinary team. You bring your own unique skill set to the table and collaborate with others to accomplish your team’s goals.

You prioritize your work with the team and its product owner, weighing both the business and technical value of each task.

You experiment, test, try, fail and learn all the time

You don’t do things just because they were always done that way, you bring your experience and expertise with you and help the team make the best decisions

What have we worked on in the last quarter

We have rewritten our business logic to be decoupled from the Common Core math standards, supporting US state-specific standards and international math systems

Prototyped and tested a High School Math MVP product in classrooms

Changed assigning Math and English to a work queue metaphor across all products for conceptual product simplicity and consistency

Implemented a Selenium QA test suite 100% in Haskell

Released multiple open source libraries for generating automated unit test fixtures, integrating with AWS, parsing and visualizing Postgres logs and much more

Made numerous performance optimization passes on the system for supporting classrooms with weak Internet bandwidth


We’re an agile and lean small team of engineers, teachers and product people working on solving important problems in education. We hyper-focus on speed, communication and prioritizing what matters to our millions of users.


  • You’re smart and can find a way to show us.
  • A track record of 5+ years of working in, or leading, teams that rapidly ship high quality web-based software that provides great value to users. Having done this at a startup is a plus.
  • Awesome at a Functional Programming language: Haskell / Scala / Clojure / Erlang etc
  • Exceptional emotional intelligence and people skills
  • Organized and meticulous, but still able to focus on the big picture of the product
  • A ton of startup hustle: we're a fast-growing, VC-backed, Silicon Valley tech company that works hard to achieve the greatest impact we can.


  • Money, sweet
  • Medical, dental, vision
  • Incredible opportunity to grow, learn and build lifetime bonds with other passionate people who share your values
  • Food, catered lunch & dinner 4 days a week + snacks on snacks
  • Room for you to do things your way at our downtown San Francisco location right by the Powell Station BART, or you can work remotely from anywhere in the US, if that’s how you roll
  • Awesome monthly team events + smaller get-togethers (board game nights, trivia, etc)

Get information on how to apply for this position.

June 13, 2017 04:59 PM

Michael Snoyman

A Very Naive Overview of Nutrition and Exercise (Part 1)

Some family and friends have been asking me to write up my thoughts on the topic of nutrition and exercise. To give proper warning, I want to say right from the beginning of this that I am not in any way a qualified expert. I'm a computer programmer who was overweight and unhealthy for most of my life until my mid-twenties, when I decided to take control, did a bunch of reading, and have been (mostly) in shape and far healthier since.

I don't want you to take anything I say as gospel; it's not. Hopefully this will give you ideas of where to start, topics worth researching, and short-circuit some of the very self-defeating confusion that I think most of us have suffered through. I'm not providing sources for what I'm writing, partly because I want you to read up on topics yourself, and mostly because I'm too lazy :).

This is something of a continuation on my post on why I lift, though in reality I started on this post first. Also, I had originally intended to make one massive post covering nutrition and exercise. Instead, I'm breaking this up into three parts. This post will set the tone and give some background information, and the following two posts will dive into each of nutrition and exercise in more detail (though still as a "naive overview").

This post series is very off the beaten track for me, and I'm still unsure if I'll be writing more like it. If you do like it and want to see more, or have some specific questions, please mention so in the comments and I'll be more likely to make future posts on these topics.


My philosophy

I've come up with the following philosophical points about health and fitness, which guide my own decisions a lot:

  • Overcomplication is a major enemy. Should you follow a vegan diet, a paleo diet, go ketogenic, or respect GI values? Should you run, jog, sprint, lift weights, do bodyweights? This abundance of seemingly contradictory advice is the most demotivating thing out there, and prevents so many of us from getting healthy.
  • While these complications are real, you can get the vast majority of benefits by following many simpler guidelines (I'll talk about those later) that almost everyone agrees on. Do the simple stuff first, worry about the rocket science later.
  • If you read any nutrition study, odds are pretty high there's another study that shows the opposite result. Nutrition science is greatly lacking in replication studies, so take everything you read with a grain of salt (and yes, studies on salt are contradictory too).
  • You'll be best served by following basic guidelines, getting comfortable with those, and then experimenting with different approaches from that baseline. If you're motivated to, go ahead and spend a week or three on a vegan diet, on a keto diet, and anything else you believe has a chance of working. Pay attention to how you respond to it.

Who am I?

I mentioned this a bit in the why I lift post, but I want to give a little more background here. Odds are pretty good that my baseline level of health and fitness is lower than yours. As a child and young adult, I was overweight. I ate junk food constantly. I hardly exercised. I had a few brief bouts where I lost some weight, but it always came back within a year, and with a vengeance.

I've been programming since I was 10 years old. I spent hours on end almost every day since then on a computer or playing video games. I wasn't quite at the stereotype of sitting in a darkened room eating Cheetos and Mountain Dew, but I was pretty close.

Around the age of 25 (give or take a few years), I decided I had enough. I was tired of being overweight. I was scared of developing diabetes. I could barely sit at my desk for 10 minutes without back pain. I woke up in the morning and had trouble getting out of bed. I finally decided that bad health—at least in my case—wasn't a curse of genetics, but something I'd brought on myself, and only I would be able to fix it.

So as you read these posts, I don't want you to become discouraged and think "well, this guy can do this, but I never could, I'm just your average office worker." It's quite the opposite. If I've been able to overcome a lifetime of bad habits and genetic predispositions to negative health conditions, you can too.


Goals

It's useless to talk about "getting healthy" or "getting fit" without some definition of what that means. Some people are going to have very specific goals; for example, a power lifter may want to deadlift as much weight as possible, even if the process shortens his/her lifespan by 10 years. If you have such specific goals, odds are this post isn't for you.

I'm going to guess that most people reading this will probably have the same three goals, though their priorities among the goals will differ:

  • Lose fat
  • Gain muscle
  • Improve general health/increase longevity/feel better. This would include improvements in things like:

    • Cardiovascular function
    • Cholesterol levels

I was specific in my wording on those first two bullets. You may think you want to lose weight, but you won't be happy if you lose weight in the form of muscle mass or (worse) organs. Similarly, you may not think you want to gain muscle, but I'd argue that you do:

  • More muscle = more calories burned, making fat loss easier
  • More muscle makes moving around in day to day life easier
  • You'll look better (both men and women) with more muscle

Caveat: I'm not talking about bodybuilder levels here.

Nutrition and Exercise

Nutrition is what food you put into your body. Exercise is what activities you do with your body. Based on the goals above, you need to work on both nutrition and exercise. Believing you can get away with only one of them is the first big mistake I'll address in this post.

  • If you eat a bunch of junk food, almost no level of exercise you perform will burn off the extra fat you're gaining.
  • If you don't do any exercise, your body will get weaker, regardless of what you're eating.

So this is important: you need to do both. Period. If you're going to pick one of them to start off with... I guess I'd say start with nutrition, but it's really a personal call. I'd recommend starting with whatever you believe you're more likely to stick with.

Up next

My next post will dive into details on the nutrition half of the equation, and the following post will dive into exercise. If there are enough questions raised in the comments in these three posts, I'll likely add a fourth Q&A post to this series.

And if you're just desperate to read more now, don't forget about my why I lift post.

June 13, 2017 03:00 AM

June 12, 2017

Roman Cheplyaka

On friendly contributing policies

Neil Mitchell gave a talk at ZuriHac about drive-by contributions. I did not attend ZuriHac this year, but Neil published the slides of his talk. While I was skimming through them, one caught my attention:

The quote on the bottom-right is taken from the haskell-src-exts contributing policy, which I wrote back when I was its maintainer.

As I said, I didn’t have the chance to attend the talk, and the video does not seem to be released yet, so I can only guess the message of this slide (perhaps Neil or someone who was present will clarify) — but it looks to me that this quote is an example of an unfriendly message that is contrasted to the first, welcoming, message.

If that’s the case, this doesn’t seem fair to me: the quote is cherry-picked from a section that starts with

So, you’ve fixed a bug or implemented an extension. Awesome!

We strive to get every such pull request reviewed and merged within a month.

For best results, please follow these guidelines:

You could argue whether this is friendly enough or not, but you have to admit that it is at least considerate of contributors’ time and interests.

But what about those two particular sentences on the slide? Are they rude? Is it because they don’t have “please” in them?

I guess that, taken in isolation, they do seem rude for that reason, but they are part of a list of instructions, and you can make a list of instructions only so polite. In particular, I don’t think that simply adding “please” in front of every imperative would make it any nicer — but I am not a native English speaker and could be wrong.

On the other hand, if I wrote that policy as a list of polite requests (“Could you please not put multiple unrelated changes in a single pull request?”), I feel that it would take more time and effort for a potential contributor to comprehend and extract the essence of what’s written. That would be disrespectful of the contributor’s time. Besides, if you have 10+ polite requests in a row, it starts to look like a caricature.

That said, I am open to constructive criticism. If you have ideas about how I could have written it better, please drop me a line.

Update. Neil has written a nice detailed response, which I greatly appreciate.

June 12, 2017 08:00 PM

FP Complete

The ReaderT Design Pattern

Oftentimes I'll receive or read questions online about "design patterns" in Haskell. A common response is that Haskell doesn't have them: what many languages address via patterns, Haskell addresses via language features (like built-in immutability, lambdas, laziness, etc). However, I believe there is still room for some high-level guidance on structuring programs, which I'll loosely refer to as a Haskell design pattern.

The pattern I'm going to describe today is what I've been referring to as the "ReaderT pattern" for a few years now, at least in informal discussions. I use it as the basis for the design of Yesod's Handler type, it describes the majority of the Stack code base, and I've recommended it to friends, colleagues, and customers regularly.

That said, this pattern is not universally held in the Haskell world, and plenty of people design their code differently. So remember that, like other articles I've written, this is highly opinionated, but represents my personal and FP Complete's best practices recommendations.

Let's summarize the pattern, and then get into details and exceptions:

  • Your application should define a core data type (call it Env if you want).
  • This data type will contain all runtime configuration and global functions that may be mockable (like logging functions or database access).
  • If you must have some mutable state, put it in Env as a mutable reference (IORef, TVar, etc).
  • Your application code will, in general, live in ReaderT Env IO. Define it as type App = ReaderT Env IO if you wish, or use a newtype wrapper instead of ReaderT directly.
  • You can use additional monad transformers on occasion, but only for small subsets of your application, and it's best if those subsets are pure code.
  • Optional: instead of directly using the App datatype, write your functions in terms of mtl-style typeclasses like MonadReader and MonadIO, which will allow you to recover some of the purity you think I just told you to throw away with IO and mutable references.

That's a lot to absorb all at once, some of it (like the mtl typeclasses) may be unclear, and other parts (especially that mutable reference bit) probably seems completely wrong. Let me motivate these statements and explain the nuances.
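To make the summary above concrete, here's a minimal sketch of the pattern. All the names (`Env`, `App`, `logMsg`, `bumpCounter`) are illustrative, not taken from any particular codebase:

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.Reader
import Data.IORef

-- Illustrative Env: runtime config, a mockable logging function,
-- and a piece of mutable state held in an IORef.
data Env = Env
  { envLogLevel :: Int
  , envLog      :: String -> IO ()
  , envCounter  :: IORef Int
  }

type App = ReaderT Env IO

-- mtl-style: usable from any monad that can read an Env and run IO.
logMsg :: (MonadReader Env m, MonadIO m) => Int -> String -> m ()
logMsg level msg = do
  env <- ask
  when (level >= envLogLevel env) $
    liftIO $ envLog env msg

-- Concrete App action: bump the shared counter, return the new value.
bumpCounter :: App Int
bumpCounter = do
  ref <- asks envCounter
  liftIO $ atomicModifyIORef' ref (\n -> (n + 1, n + 1))

main :: IO ()
main = do
  ref <- newIORef 0
  let env = Env 1 putStrLn ref
  n <- runReaderT (logMsg 1 "starting up" >> bumpCounter) env
  print n  -- prints "starting up" then 1
```

Note that `logMsg` asks only for `MonadReader Env` and `MonadIO`, so it can be reused in any stack that can supply an `Env`, while `bumpCounter` is pinned to the concrete `App` type.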

Better globals

Let's knock out the easy parts of this. Global variables are bad, and mutable globals are far worse. Let's say you have an application which wants to configure its logging level (e.g., should we print or swallow DEBUG level messages?). There are three common ways you might do this in Haskell:

  1. Use a compile-time flag to control which logging code gets included in your executable
  2. Define a global value which reads a config file (or environment variables) with unsafePerformIO
  3. Read the config file in your main function and then pass the value (explicitly, or implicitly via ReaderT) to the rest of your code

(1) is tempting, but experience tells me it's a terrible solution. Every time you have conditional compilation in your codebase, you're adding in a fractal of possible build failures. If you have 5 conditionals, you now have 32 (2^5) possible build configurations. Are you sure you have the right set of import statements for all of those 32 configurations? It's just a pain to deal with. Moreover, do you really want to decide at compile time that you're not going to need debug information? I'd much rather be able to flip a false to true in a config file and restart my app to get more information while debugging.

(By the way, even better than this is the ability to signal the process while it's running to change debug levels, but we'll leave that out for now. The ReaderT+mutable variable pattern is one of the best ways to achieve this.)

OK, so you've agreed with me that you shouldn't conditionally compile. Now that you've written your whole application, however, you're probably hesitant to have to rewrite it to thread through some configuration value everywhere. I get that, it's a pain. So you decide you'll just use unsafePerformIO, since the file will be read once, it's a pure-ish value for the entire runtime, and everything seems mostly safe. However:

  • You now have a slight level of non-determinism of where exceptions will occur. If you have a missing or invalid config file, where will the exception get thrown from? I'd much rather that occur immediately on app initialization.
  • Suppose you want to run one small part of the application with louder debug information (because you know it's more likely to fail than the rest of your code). You basically can't do this at all.
  • Just wait until you use STM inside your config file parsing for some reason, and then your config value first gets evaluated inside a different STM block. (And there's no way that could ever happen.)
  • Every time you use unsafePerformIO, a kitten dies.

It's time to just bite the bullet, define some Env data type, put the config file values in it, and thread it through your application. If you design your application from the beginning like this: great, no harm done. Doing it later in application development is certainly a pain, but the pain can be mitigated by some of what I'll say below. And it is absolutely better to suck up a little pain with mechanical code rewriting than be faced with race conditions around unsafePerformIO. Remember, this is Haskell: we'd rather face compile-time pain than runtime pain.

Once you've accepted your fate and gone all-in on (3), you're on easy street:

  • Have some new config value to pass around? Easy, just augment the Env type.
  • Want to temporarily bump the log level? Use local and you're golden.

You're now far less likely to resort to ugly hacks like CPP code (1) or global variables (2), because you've eliminated the potential pain from doing it the Right Way.
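The local trick can be sketched like this, with a hypothetical Env carrying only a log level (illustrative names, not code from any real project):

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.Reader

-- Hypothetical Env holding only a log level threshold.
newtype Env = Env { envLogLevel :: Int }

-- Temporarily drop the threshold to 0 (log everything) for the
-- wrapped action only; the outer environment is untouched.
withVerbose :: MonadReader Env m => m a -> m a
withVerbose = local (\env -> env { envLogLevel = 0 })

currentLevel :: MonadReader Env m => m Int
currentLevel = asks envLogLevel

main :: IO ()
main = do
  (inside, outside) <- runReaderT
    ((,) <$> withVerbose currentLevel <*> currentLevel)
    (Env 2)
  print (inside, outside)  -- (0,2)
```

Because local scopes the change to the wrapped computation, there's no cleanup step to forget: the original level is restored automatically when withVerbose returns.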

Initializing resources

The case of reading a config value is nice, but even nicer is initializing some resources. Suppose you want to initialize a random number generator, open up a Handle for sending log messages to, set up a database pool, or create a temporary directory to store files in. These are all far more logical to do inside main than from some global position.

One advantage of the global variable approach for these kinds of initializations is that it can be deferred until the first time the value is used, which is nice if you think some resources may not always be needed. But if that's what you want, you can use an approach like runOnce.
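The idea behind a runOnce-style helper can be sketched as follows. This is a hypothetical reimplementation for illustration, not the actual library function:

```haskell
import Control.Concurrent.MVar
import Data.IORef

-- Hypothetical once-helper: wrap an initializer so it runs at most
-- once; later calls return the cached result. The MVar also makes
-- this safe to call from several threads at the same time.
mkOnce :: IO a -> IO (IO a)
mkOnce action = do
  cache <- newMVar Nothing
  return $ modifyMVar cache $ \st ->
    case st of
      Just a  -> return (Just a, a)
      Nothing -> do
        a <- action
        return (Just a, a)

main :: IO ()
main = do
  runs <- newIORef (0 :: Int)
  getResource <- mkOnce $ do
    modifyIORef' runs (+ 1)  -- stand-in for an expensive setup step
    return "resource"
  _ <- getResource
  _ <- getResource
  readIORef runs >>= print  -- the initializer ran only once: prints 1
```

You would build the deferred action in main and store it in Env like any other resource; code that never calls it never pays for the initialization.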

Avoiding WriterT and StateT

Why in the world would I ever recommend mutable references as anything but a last resort? We all know that purity in Haskell is paramount, and mutability is the devil. And besides, we have these wonderful things called WriterT and StateT if we have some values that need to change over time in our application, so why not use them?

In fact, early versions of Yesod did just that: they used a StateT kind of approach within Handler to allow you to modify the user session values and set response headers. However, I switched over to mutable references quite a while ago, and here's why:

Exception-survival: If you have a runtime exception, you will lose your state in WriterT and StateT. Not so with a mutable reference: you can read the last available state before the runtime exception was thrown. We use this to great benefit in Yesod, to be able to set response headers even if the response fails with something like a notFound.
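A small sketch of that exception-survival property, using a bare IORef (illustrative, not Yesod's actual code):

```haskell
import Control.Exception
import Data.IORef

main :: IO ()
main = do
  ref <- newIORef (0 :: Int)
  -- Writes made before the exception are not rolled back...
  result <- try $ do
    writeIORef ref 1
    writeIORef ref 2
    throwIO (userError "boom")
  case result of
    Left e   -> putStrLn ("caught: " ++ show (e :: IOException))
    Right () -> return ()
  -- ...so the last state written before the throw is still there.
  readIORef ref >>= print  -- prints 2
```

With StateT, the state accumulated inside the `try` block would be gone along with the thrown computation; with the IORef, it survives for inspection.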

False purity: We say WriterT and StateT are pure, and technically they are. But let's be honest: if you have an application which is entirely living within a StateT, you're not getting the benefits of restrained mutation that you want from pure code. May as well call a spade a spade, and accept that you have a mutable variable.

Concurrency: What's the result of put 4 >> concurrently (modify (+ 1)) (modify (+ 2)) >> get? You may want to say that it will be 7, but it definitely won't be. Your options, depending on how concurrently is implemented with regards to the state provided by StateT, are 4, 5, or 6. Don't believe me? Play around with:

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
import Control.Concurrent.Async.Lifted
import Control.Monad.State.Strict

main :: IO ()
main = execStateT
    (concurrently (modify (+ 1)) (modify (+ 2)))
    4 >>= print

The issue is that we need to clone the state from the parent thread into both child threads, and then arbitrarily pick which child state will survive. Or, if we want to, we can just throw both child states away and continue with the original parent state. (By the way, if you think the fact that this code compiles is a bad thing, I agree, and suggest you use Control.Concurrent.Async.Lifted.Safe.)

Dealing with mutable state between different threads is a hard problem, but StateT doesn't fix the problem, it hides it. If you use a mutable variable, you'll be forced to think about this. What semantics do we want? Should we use an IORef, and stick to atomicModifyIORef? Should we use a TVar? These are fair questions, and ones we'll be forced to examine. For a TVar-like approach:

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
{-# LANGUAGE FlexibleContexts #-}
import Control.Concurrent.Async.Lifted.Safe
import Control.Monad.Reader
import Control.Concurrent.STM

modify :: (MonadReader (TVar Int) m, MonadIO m)
       => (Int -> Int)
       -> m ()
modify f = do
  ref <- ask
  liftIO $ atomically $ modifyTVar' ref f

main :: IO ()
main = do
  ref <- newTVarIO 4
  runReaderT (concurrently (modify (+ 1)) (modify (+ 2))) ref
  readTVarIO ref >>= print

And you can even get a little fancier with prebaked transformers.

WriterT is broken Don't forget that, as Gabriel Gonzalez has demonstrated, even the strict WriterT has a space leak.

Caveats I still do use StateT and WriterT sometimes. One prime example is Yesod's WidgetT, which is essentially a WriterT sitting on top of HandlerT. It makes sense in that context because:

  • The mutable state is expected to be modified for a small subset of the application
  • Although we can perform side-effects while building up the widget, the widget construction itself is a morally pure activity
  • We don't need to let state survive an exception: if something goes wrong, we'll send back an error page instead
  • There's no good reason to use concurrency when constructing a widget.
  • Despite my space leak concerns, I thoroughly benchmarked WriterT against alternatives, and found it to be the fastest for this use case. (Numbers beat reasoning.)

The other great exception to this rule is pure code. If you have some subset of your application which can perform no IO but needs some kind of mutable state, absolutely, 100%, please use StateT.
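To illustrate, here is a minimal sketch of the kind of purely local StateT usage I mean (the label function is a made-up example, not from any real codebase):

```haskell
#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
import Control.Monad.State.Strict

-- Number each element of a list. The state is the running counter,
-- and it never escapes this pure function.
label :: [a] -> [(Int, a)]
label xs = evalState (traverse go xs) 0
  where
    go x = do
      n <- get
      put (n + 1)
      return (n, x)

main :: IO ()
main = print (label "abc") -- [(0,'a'),(1,'b'),(2,'c')]
```

Because the state lives and dies inside label, none of the exception, concurrency, or false-purity concerns above apply.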

Avoiding ExceptT

I'm already strongly on record as saying that ExceptT over IO is a bad idea. To briefly repeat myself: the contract of IO is that any exception can be thrown at any time, so ExceptT doesn't actually document possible exceptions, it misleads. You can see that blog post for more details.
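As a contrived sketch of that mismatch (the dangerous function is invented for illustration): the type advertises failures in the Left channel, yet the actual failure arrives as a runtime exception that the ExceptT layer never sees.

```haskell
#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception (ArithException, try)
import Control.Monad.Except (ExceptT, runExceptT)

-- The signature claims failures are reported via ExceptT String...
dangerous :: ExceptT String IO Int
dangerous = return (1 `div` 0)

main :: IO ()
main = do
  -- ...but the division by zero surfaces as a runtime exception,
  -- bypassing the Either entirely.
  result <- try (runExceptT dangerous >>= print)
  case result of
    Left (e :: ArithException) -> putStrLn ("runtime exception: " ++ show e)
    Right ()                   -> putStrLn "no exception"
```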

I'm rehashing this here because some of the downsides of StateT and WriterT apply to ExceptT as well. For example, how do you handle concurrency in an ExceptT? With runtime exceptions, the behavior is clear: when using concurrently, if any child thread throws an exception, the other thread is killed and the exception rethrown in the parent. What behavior do you want with ExceptT?

Again, you can use ExceptT from pure code where a runtime exception is not part of the contract, just like you should use StateT in pure code. But once we've eliminated StateT, WriterT, and ExceptT from our main application transformers, we're left with...

Just ReaderT

And now you know why I call this "the ReaderT design pattern." ReaderT has a huge advantage over the other three transformers listed: it has no mutable state. It's simply a convenient manner of passing an extra parameter to all functions. And even if that parameter contains mutable references, that parameter itself is fully immutable. Given that:

  • We get to ignore all of the state-overwriting issues I mentioned with concurrency. Notice how we were able to use the .Safe module in the example above. That's because it is actually safe to do concurrency with a ReaderT.
  • Similarly, you can use the monad-unlift library package
  • Deep monad transformer stacks are confusing. Knocking it all down to just one transformer reduces complexity, significantly.
  • It's not just simpler for you. It's simpler for GHC too, which tends to have a much better time optimizing one-layer ReaderT code versus 5-transformer-deep code.

By the way, once you've bought into ReaderT, you can just throw it away entirely, and manually pass your Env around. Most of us don't do that, because it feels masochistic (imagine having to tell every call to logDebug where to get the logging function). But if you're trying to write a simpler codebase that doesn't require understanding of transformers, it's now within your grasp.
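For comparison, here is a tiny sketch of that transformer-free style (the Env and logDebug names are hypothetical):

```haskell
#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
data Env = Env
  { envLog :: !(String -> IO ())
  }

-- No ReaderT: every function takes the environment explicitly.
logDebug :: Env -> String -> IO ()
logDebug env msg = envLog env ("DEBUG: " ++ msg)

main :: IO ()
main = do
  let env = Env { envLog = putStrLn }
  logDebug env "starting up"
  logDebug env "shutting down"
```

Exactly the same information flows around; you have simply traded the convenience of ask for explicit arguments at every call site.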

Has typeclass approach

Let's say we're going to expand our mutable variable example above to include a logging function. It may look something like this:

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
{-# LANGUAGE FlexibleContexts #-}
import Control.Concurrent.Async.Lifted.Safe
import Control.Monad.Reader
import Control.Concurrent.STM
import Say

data Env = Env
  { envLog :: !(String -> IO ())
  , envBalance :: !(TVar Int)
  }

modify :: (MonadReader Env m, MonadIO m)
       => (Int -> Int)
       -> m ()
modify f = do
  env <- ask
  liftIO $ atomically $ modifyTVar' (envBalance env) f

logSomething :: (MonadReader Env m, MonadIO m)
             => String
             -> m ()
logSomething msg = do
  env <- ask
  liftIO $ envLog env msg

main :: IO ()
main = do
  ref <- newTVarIO 4
  let env = Env
        { envLog = sayString
        , envBalance = ref
        }
  runReaderT
    (concurrently
      (modify (+ 1))
      (logSomething "Increasing account balance"))
    env
  balance <- readTVarIO ref
  sayString $ "Final balance: " ++ show balance

Your first reaction to this is probably that defining this Env data type for your application looks like overhead and boilerplate. You're right, it is. Like I said above though, it's better to just suck up some of the pain initially to make a better long-term application development practice. Now let me double down on that...

There's a bigger problem with this code: it's too coupled. Our modify function takes in an entire Env value, even though it never uses the logging function. And similarly, logSomething never uses the mutable variable it's provided. Exposing too much state to a function is bad:

  • We can't, from the type signature, get any idea about what the code is doing
  • It's more difficult to test. In order to see if modify is doing the right thing, we need to provide it some garbage logging function.

So let's double down on that boilerplate, and use the Has typeclass trick. This composes well with MonadReader and other mtl classes like MonadThrow and MonadIO to allow us to state exactly what our function needs, at the cost of having to define a lot of typeclasses upfront. Let's see how this looks:

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
import Control.Concurrent.Async.Lifted.Safe
import Control.Monad.Reader
import Control.Concurrent.STM
import Say

data Env = Env
  { envLog :: !(String -> IO ())
  , envBalance :: !(TVar Int)
  }

class HasLog a where
  getLog :: a -> (String -> IO ())
instance HasLog (String -> IO ()) where
  getLog = id
instance HasLog Env where
  getLog = envLog

class HasBalance a where
  getBalance :: a -> TVar Int
instance HasBalance (TVar Int) where
  getBalance = id
instance HasBalance Env where
  getBalance = envBalance

modify :: (MonadReader env m, HasBalance env, MonadIO m)
       => (Int -> Int)
       -> m ()
modify f = do
  env <- ask
  liftIO $ atomically $ modifyTVar' (getBalance env) f

logSomething :: (MonadReader env m, HasLog env, MonadIO m)
             => String
             -> m ()
logSomething msg = do
  env <- ask
  liftIO $ getLog env msg

main :: IO ()
main = do
  ref <- newTVarIO 4
  let env = Env
        { envLog = sayString
        , envBalance = ref
        }
  runReaderT
    (concurrently
      (modify (+ 1))
      (logSomething "Increasing account balance"))
    env
  balance <- readTVarIO ref
  sayString $ "Final balance: " ++ show balance

Holy creeping boilerplate batman! Yes, type signatures get longer, rote instances get written. But our type signatures are now deeply informative, and we can test our functions with ease, e.g.:

main :: IO ()
main = hspec $ do
  describe "modify" $ do
    it "works" $ do
      var <- newTVarIO (1 :: Int)
      runReaderT (modify (+ 2)) var
      res <- readTVarIO var
      res `shouldBe` 3
  describe "logSomething" $ do
    it "works" $ do
      var <- newTVarIO ""
      let logFunc msg = atomically $ modifyTVar var (++ msg)
          msg1 = "Hello "
          msg2 = "World\n"
      runReaderT (logSomething msg1 >> logSomething msg2) logFunc
      res <- readTVarIO var
      res `shouldBe` (msg1 ++ msg2)

And if defining all of these classes manually bothers you, or you're just a big fan of the library, you're free to use lens:

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FunctionalDependencies #-}
import Control.Concurrent.Async.Lifted.Safe
import Control.Monad.Reader
import Control.Concurrent.STM
import Say
import Control.Lens
import Prelude hiding (log)

data Env = Env
  { envLog :: !(String -> IO ())
  , envBalance :: !(TVar Int)
  }

makeLensesWith camelCaseFields ''Env

modify :: (MonadReader env m, HasBalance env (TVar Int), MonadIO m)
       => (Int -> Int)
       -> m ()
modify f = do
  env <- ask
  liftIO $ atomically $ modifyTVar' (env^.balance) f

logSomething :: (MonadReader env m, HasLog env (String -> IO ()), MonadIO m)
             => String
             -> m ()
logSomething msg = do
  env <- ask
  liftIO $ (env^.log) msg

main :: IO ()
main = do
  ref <- newTVarIO 4
  let env = Env
        { envLog = sayString
        , envBalance = ref
        }
  runReaderT
    (concurrently
      (modify (+ 1))
      (logSomething "Increasing account balance"))
    env
  balance <- readTVarIO ref
  sayString $ "Final balance: " ++ show balance

In our case, where Env doesn't have any immutable config-style data in it, the advantages of the lens approach aren't as apparent. But if you have some deeply nested config value, and especially want to play around with using local to tweak some values in it throughout your application, the lens approach can pay off.
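As a sketch of what I mean by tweaking values with local (the Config and report names are invented for the example):

```haskell
#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
import Control.Monad.Reader

data Config = Config
  { verbose :: !Bool
  }

report :: ReaderT Config IO ()
report = do
  v <- asks verbose
  liftIO $ putStrLn (if v then "verbose on" else "verbose off")

main :: IO ()
main = flip runReaderT (Config False) $ do
  report                                    -- uses the outer config
  local (\c -> c { verbose = True }) report -- config overridden just here
```

Because the environment is immutable, local gives you a scoped override with no cleanup step: the outer config is untouched once the inner action returns.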

So to summarize: this approach really is about biting the bullet and absorbing some initial pain and boilerplate. I argue that the myriad benefits you get from it during app development are well worth it. Remember: you'll pay that upfront cost once, you'll reap its rewards daily.

Regain purity

It's unfortunate that our modify function has a MonadIO constraint on it. Even though our real implementation requires IO to perform side-effects (specifically, to read and write a TVar), we've now infected all callers of the function by saying "we have the right to perform any side-effects, including launching the missiles, or worse, throwing a runtime exception." Can we regain some level of purity? The answer is yes, it just requires a bit more boilerplate:

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
import Control.Concurrent.Async.Lifted.Safe
import Control.Monad.Reader
import qualified Control.Monad.State.Strict as State
import Control.Concurrent.STM
import Say
import Test.Hspec

data Env = Env
  { envLog :: !(String -> IO ())
  , envBalance :: !(TVar Int)
  }

class HasLog a where
  getLog :: a -> (String -> IO ())
instance HasLog (String -> IO ()) where
  getLog = id
instance HasLog Env where
  getLog = envLog

class HasBalance a where
  getBalance :: a -> TVar Int
instance HasBalance (TVar Int) where
  getBalance = id
instance HasBalance Env where
  getBalance = envBalance

class Monad m => MonadBalance m where
  modifyBalance :: (Int -> Int) -> m ()
instance (HasBalance env, MonadIO m) => MonadBalance (ReaderT env m) where
  modifyBalance f = do
    env <- ask
    liftIO $ atomically $ modifyTVar' (getBalance env) f
instance Monad m => MonadBalance (State.StateT Int m) where
  modifyBalance = State.modify

modify :: MonadBalance m => (Int -> Int) -> m ()
modify f = do
  -- Now I know there's no way I'm performing IO here
  modifyBalance f

logSomething :: (MonadReader env m, HasLog env, MonadIO m)
             => String
             -> m ()
logSomething msg = do
  env <- ask
  liftIO $ getLog env msg

main :: IO ()
main = hspec $ do
  describe "modify" $ do
    it "works, IO" $ do
      var <- newTVarIO (1 :: Int)
      runReaderT (modify (+ 2)) var
      res <- readTVarIO var
      res `shouldBe` 3
    it "works, pure" $ do
      let res = State.execState (modify (+ 2)) (1 :: Int)
      res `shouldBe` 3
  describe "logSomething" $ do
    it "works" $ do
      var <- newTVarIO ""
      let logFunc msg = atomically $ modifyTVar var (++ msg)
          msg1 = "Hello "
          msg2 = "World\n"
      runReaderT (logSomething msg1 >> logSomething msg2) logFunc
      res <- readTVarIO var
      res `shouldBe` (msg1 ++ msg2)

It's silly in an example this short, since the entirety of the modify function is now in a typeclass. But with larger examples, you can see how we'd be able to specify that entire portions of our logic perform no arbitrary side-effects, while still using the ReaderT pattern to its fullest.

To put this another way: the function foo :: Monad m => Int -> m Double may appear to be impure, because it lives in a Monad. But this isn't true: by giving it a constraint of "any arbitrary instance of Monad", we're stating "this has no real side-effects." After all, the type above unifies with Identity, which of course is pure.
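For example (foo here is a made-up function matching that shape), the same code can be run in Identity, demonstrating that no concrete effects are available to it:

```haskell
#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
import Data.Functor.Identity

-- "Works in any Monad" means the body cannot perform real side-effects.
foo :: Monad m => Int -> m Double
foo n = return (fromIntegral n / 2)

main :: IO ()
main = do
  print (runIdentity (foo 5)) -- runs purely in Identity
  foo 5 >>= print             -- the very same code, instantiated at IO
```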

This example may seem a bit funny, but what about parseInt :: MonadThrow m => Text -> m Int? You may think "that's impure, it throws a runtime exception." But the type unifies with parseInt :: Text -> Maybe Int, which of course is pure. We've gained a lot of knowledge about our function and can feel safe calling it.
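A sketch of such a parseInt (built on readMaybe; the ParseError type is invented here) shows how the caller's choice of monad picks the failure mode:

```haskell
#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
{-# LANGUAGE OverloadedStrings #-}
import Control.Exception (Exception)
import Control.Monad.Catch (MonadThrow, throwM)
import Data.Text (Text, unpack)
import Text.Read (readMaybe)

newtype ParseError = ParseError Text deriving Show
instance Exception ParseError

-- The caller picks the monad, and with it the failure behavior:
-- Maybe gives Nothing, IO throws a runtime exception, etc.
parseInt :: MonadThrow m => Text -> m Int
parseInt t =
  case readMaybe (unpack t) of
    Just n  -> return n
    Nothing -> throwM (ParseError t)

main :: IO ()
main = do
  print (parseInt "42"   :: Maybe Int) -- pure: Just 42
  print (parseInt "oops" :: Maybe Int) -- pure: Nothing
```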

So the take-away here is: if you can generalize your functions to mtl-style Monad constraints, do it, you'll regain a lot of the benefits you'd have with purity.


While the technique here is certainly a bit heavy-handed, for any large-scale application or library development that cost will be amortized. I've found the benefits of working in this style to far outweigh the costs in many real world projects.

There are other problems it causes, like more confusing error messages and more cognitive overhead for someone joining the project. But in my experience, once someone is onboarded to the approach, it works out well.

In addition to the concrete benefits I listed above, using this approach automatically navigates you around many common monad transformer stack pain points that you'll see people experiencing in the real world. I encourage others to share their real-world examples of them. I personally haven't hit those problems in a long while, since I've stuck to this approach.

Post-publish updates

June 15, 2017 The comment below from Ashley about ImplicitParams spawned a Reddit discussion about the problems with that extension. Please do read the discussion yourself, but the takeaway for me is that MonadReader is a better choice.

June 12, 2017 04:24 PM

Gabriel Gonzalez

Translating a C++ parser to Haskell


Recently I translated Nix's derivation parser to Haskell and I thought this would make an instructive example for how C++ idioms map to Haskell idioms. This post targets people who understand Haskell's basic syntax but perhaps have difficulty translating imperative style to a functional style. I will also throw in some benchmarks at the end, too, comparing Haskell performance to C++.

Nix derivations

Nix uses "derivations" to store instructions for how to build something. The corresponding C++ type is called a Derivation, which is located here:

struct Derivation : BasicDerivation
{
    DerivationInputs inputDrvs; /* inputs that are sub-derivations */

    /* Print a derivation. */
    std::string unparse() const;
};

... which in turn references this BasicDerivation type:

struct BasicDerivation
{
    DerivationOutputs outputs; /* keyed on symbolic IDs */
    PathSet inputSrcs; /* inputs that are sources */
    string platform;
    Path builder;
    Strings args;
    StringPairs env;

    virtual ~BasicDerivation() { };

    /* Return the path corresponding to the output identifier `id' in
       the given derivation. */
    Path findOutput(const string & id) const;

    bool willBuildLocally() const;

    bool substitutesAllowed() const;

    bool isBuiltin() const;

    bool canBuildLocally() const;

    /* Return true iff this is a fixed-output derivation. */
    bool isFixedOutput() const;

    /* Return the output paths of a derivation. */
    PathSet outputPaths() const;
};


We can translate the above C++ types to Haskell, even though Haskell is not an object-oriented language.

First, we can translate inheritance to Haskell by either (A) using composition instead of inheritance, like this:

struct Derivation
{
    BasicDerivation basicDrv;

    DerivationInputs inputDrvs; /* inputs that are sub-derivations */

    /* Print a derivation. */
    std::string unparse() const;
};

... or (B) flattening the class hierarchy by combining both classes into an equivalent single-class definition, like this:

struct Derivation
{
    DerivationOutputs outputs; /* keyed on symbolic IDs */
    PathSet inputSrcs; /* inputs that are sources */
    string platform;
    Path builder;
    Strings args;
    StringPairs env;
    DerivationInputs inputDrvs; /* inputs that are sub-derivations */

    virtual ~Derivation() { };

    /* Return the path corresponding to the output identifier `id' in
       the given derivation. */
    Path findOutput(const string & id) const;

    bool willBuildLocally() const;

    bool substitutesAllowed() const;

    bool isBuiltin() const;

    bool canBuildLocally() const;

    /* Return true iff this is a fixed-output derivation. */
    bool isFixedOutput() const;

    /* Return the output paths of a derivation. */
    PathSet outputPaths() const;

    /* Print a derivation. */
    std::string unparse() const;
};

This post will flatten the class hierarchy for simplicity, but in general composition is the more flexible approach for translating inheritance to a functional style.

Second, we separate out all methods into standalone functions that take an object of that class as their first argument:

struct Derivation
{
    DerivationOutputs outputs; /* keyed on symbolic IDs */
    PathSet inputSrcs; /* inputs that are sources */
    string platform;
    Path builder;
    Strings args;
    StringPairs env;
    DerivationInputs inputDrvs; /* inputs that are sub-derivations */

    virtual ~Derivation() { };
};

/* Return the path corresponding to the output identifier `id' in
   the given derivation. */
Path findOutput(Derivation drv, const string & id);

bool willBuildLocally(Derivation drv);

bool substitutesAllowed(Derivation drv);

bool isBuiltin(Derivation drv);

bool canBuildLocally(Derivation drv);

/* Return true iff this is a fixed-output derivation. */
bool isFixedOutput(Derivation drv);

/* Return the output paths of a derivation. */
PathSet outputPaths(Derivation drv);

/* Print a derivation. */
std::string unparse(Derivation drv);

This is how people encoded object-oriented programming before there was such a thing as object-oriented programming, and this pattern is common in functional languages. The disadvantage is that this leads to an import-heavy programming style.

We can now translate this C++ to Haskell now that we've reduced the code to simple data types and functions on those types:

data Derivation = Derivation
{ outputs :: DerivationOutputs
-- ^ keyed on symbolic IDs
, inputSrcs :: PathSet
-- ^ inputs that are sources
, platform :: String
, builder :: String
, args :: Strings
, env :: StringPairs
, inputDrvs :: DerivationInputs
}

-- | Return the path corresponding to the output identifier `id' in
-- the given derivation.
findOutput :: Derivation -> String -> Path

willBuildLocally :: Derivation -> Bool

substitutesAllowed :: Derivation -> Bool

isBuiltin :: Derivation -> Bool

canBuildLocally :: Derivation -> Bool

-- | Return true iff this is a fixed-output derivation.
isFixedOutput :: Derivation -> Bool

-- | Return the output paths of a derivation.
outputPaths :: Derivation -> PathSet

-- | Print a derivation.
unparse :: Derivation -> String

Since this post is all about parsing we won't be defining or using any of these methods, so we'll throw them away for now and stick to the datatype definition:

data Derivation = Derivation
{ outputs :: DerivationOutputs -- ^ keyed on symbolic IDs
, inputSrcs :: PathSet -- ^ inputs that are sources
, platform :: String
, builder :: String
, args :: Strings
, env :: StringPairs
, inputDrvs :: DerivationInputs
}

This isn't valid Haskell code, yet, because we haven't defined any of these other types, like DerivationOutputs or PathSet. We'll need to translate their respective C++ definitions to Haskell, too.

The DerivationOutput class resides in the same file:

struct DerivationOutput
{
    Path path;
    string hashAlgo; /* hash used for expected hash computation */
    string hash; /* expected hash, may be null */
    DerivationOutput(Path path, string hashAlgo, string hash)
    {
        this->path = path;
        this->hashAlgo = hashAlgo;
        this->hash = hash;
    }
    void parseHashInfo(bool & recursive, Hash & hash) const;
};

When we strip the methods, that translates to Haskell as:

data DerivationOutput = DerivationOutput
{ path :: Path
, hashAlgo :: String -- ^ hash used for expected hash computation
, hash :: String -- ^ expected hash, may be null
}

All of the other C++ types are typedefs which reside in either the same file or in Nix's types.hh file:

I'll consolidate all the relevant typedefs here:

typedef string Path;
typedef set<Path> PathSet;
typedef list<string> Strings;
typedef set<string> StringSet;
typedef std::map<string, string> StringPairs;
typedef std::map<Path, StringSet> DerivationInputs;
typedef std::map<string, DerivationOutput> DerivationOutputs;

The Haskell analog of a C++ typedef is a type synonym, and the above C++ typedefs translate to the following type synonyms:

import Data.Map (Map)
import Data.Set (Set)

type Path = String
type PathSet = Set Path
type Strings = [String] -- [a] is Haskell syntax for "list of `a`s"
type StringSet = Set String
type StringPairs = Map String String
type DerivationInputs = Map Path StringSet
type DerivationOutputs = Map String DerivationOutput

Note that Haskell type synonyms reverse the order of the types compared to C++. The new type that you define goes on the left and the body of the definition goes on the right. Haskell's order makes more sense to me, since I'm used to the same order when defining values like x = 5.

There are a few more changes that I'd like to make before we proceed to the parsing code:

First, Haskell's String type and default list type are inefficient for both performance and space utilization, so we will replace them with Text and Vector, respectively. The latter types are more compact and provide better performance:

import Data.Map (Map)
import Data.Set (Set)
import Data.Text (Text)
import Data.Vector (Vector)

type Path = Text
type PathSet = Set Path
type Strings = Vector Text
type StringSet = Set Text
type StringPairs = Map Text Text
type DerivationInputs = Map Path StringSet
type DerivationOutputs = Map Text DerivationOutput

Second, I prefer to use a separate type for Paths that is not synonymous with Text in order to avoid accidentally conflating the two:

import Filesystem.Path.CurrentOS (FilePath)
import Data.Map (Map)
import Data.Set (Set)
import Data.Text (Text)
import Data.Vector (Vector)
-- The Prelude `FilePath` is a synonym for `String`
import Prelude hiding (FilePath)

type Path = FilePath
type PathSet = Set Path
type Strings = Vector Text
type StringSet = Set Text
type StringPairs = Map Text Text
type DerivationInputs = Map Path StringSet
type DerivationOutputs = Map Text DerivationOutput

Third, I prefer to avoid type synonyms, since I believe they make Haskell code harder to read. Instead, I fully inline all types, like this:

import Filesystem.Path.CurrentOS (FilePath)
import Data.Map (Map)
import Data.Set (Set)
import Data.Text (Text)
import Data.Vector (Vector)
import Prelude hiding (FilePath)

data Derivation = Derivation
{ outputs :: Map Text DerivationOutput -- ^ keyed on symbolic IDs
, inputSrcs :: Set FilePath -- ^ inputs that are sources
, platform :: Text
, builder :: Text
, args :: Vector Text
, env :: Map Text Text
, inputDrvs :: Map FilePath (Set Text)
}

data DerivationOutput = DerivationOutput
{ path :: FilePath
, hashAlgo :: Text -- ^ hash used for expected hash computation
, hash :: Text -- ^ expected hash, may be null
}

Fourth, Haskell lets you auto-generate code to render the data type, which is useful for debugging purposes. All you have to do is add deriving (Show) to the end of the datatype definition, like this:

import Filesystem.Path.CurrentOS (FilePath)
import Data.Map (Map)
import Data.Set (Set)
import Data.Text (Text)
import Data.Vector (Vector)
import Prelude hiding (FilePath)

data Derivation = Derivation
{ outputs :: Map Text DerivationOutput -- ^ keyed on symbolic IDs
, inputSrcs :: Set FilePath -- ^ inputs that are sources
, platform :: Text
, builder :: Text
, args :: Vector Text
, env :: Map Text Text
, inputDrvs :: Map FilePath (Set Text)
} deriving (Show)

data DerivationOutput = DerivationOutput
{ path :: FilePath
, hashAlgo :: Text -- ^ hash used for expected hash computation
, hash :: Text -- ^ expected hash, may be null
} deriving (Show)

Finally, we'll change the order of the Derivation fields to match the order that they are stored when serialized to disk:

import Filesystem.Path.CurrentOS (FilePath)
import Data.Map (Map)
import Data.Set (Set)
import Data.Text (Text)
import Data.Vector (Vector)
import Prelude hiding (FilePath)

data Derivation = Derivation
{ outputs :: Map Text DerivationOutput -- ^ keyed on symbolic IDs
, inputDrvs :: Map FilePath (Set Text)
, inputSrcs :: Set FilePath -- ^ inputs that are sources
, platform :: Text
, builder :: Text
, args :: Vector Text
, env :: Map Text Text
} deriving (Show)

data DerivationOutput = DerivationOutput
{ path :: FilePath
, hashAlgo :: Text -- ^ hash used for expected hash computation
, hash :: Text -- ^ expected hash, may be null
} deriving (Show)

Derivation format

Nix stores derivations as *.drv files underneath the /nix/store directory. For example, here is what one such file looks like:

$ cat /nix/store/zzhs4fb83x5ygvjqn5rdpmpnishpdgy6-perl-MIME-Types-2.13.drv
t devdoc"),("propagatedBuildInputs",""),("propagatedNativeBuildInputs",""),("src

This corresponds to the following Haskell value using the types we just defined:

{ outputs =
[ ( "devdoc"
, DerivationOutput
{ path =
, hashAlgo = ""
, hash = ""
, ( "out"
, DerivationOutput
{ path =
, hashAlgo = ""
, hash = ""
, inputDrvs =
[ ( "/nix/store/57h2hjsdkdiwbzilcjqkn46138n1xb4a-perl-5.22.3.drv"
, Data.Set.fromList [ "out" ]
, ( "/nix/store/cvdbbvnvg131bz9bwyyk97jpq1crclqr-MIME-Types-2.13.tar.gz.drv"
, Data.Set.fromList [ "out" ]
, ( "/nix/store/p5g31bc5x92awghx9dlm065d7j773l0r-stdenv.drv"
, Data.Set.fromList [ "out" ]
, ( "/nix/store/x50y5qihwsn0lfjhrf1s81b5hgb9w632-bash-4.4-p5.drv"
, Data.Set.fromList [ "out" ]
, inputSrcs =
[ "/nix/store/"
, platform = "x86_64-linux"
, builder =
, args =
[ "-e" , "/nix/store/" ]
, env =
, ( "PERL_AUTOINSTALL" , "--skipdeps" )
, ( "buildInputs" , "" )
, ( "builder"
, "/nix/store/fi3mbd2ml4pbgzyasrlnp0wyy6qi48fh-bash-4.4-p5/bin/bash"
, ( "checkTarget" , "test" )
, ( "devdoc"
, "/nix/store/15x9ii8c3n5wb5lg80cm8x0yk6zy7rha-perl-MIME-Types-2.13-devdoc"
, ( "doCheck" , "1" )
, ( "installTargets" , "pure_install" )
, ( "name" , "perl-MIME-Types-2.13" )
, ( "nativeBuildInputs"
, "/nix/store/nsa311yg8h93wfaacjk16c96a98bs09f-perl-5.22.3"
, ( "out"
, "/nix/store/93d75ghjyibmbxgfzwhh4b5zwsxzs44w-perl-MIME-Types-2.13"
, ( "outputs" , "out devdoc" )
, ( "propagatedBuildInputs" , "" )
, ( "propagatedNativeBuildInputs" , "" )
, ( "src"
, "/nix/store/5smhymz7viq8p47mc3jgyvqd003ab732-MIME-Types-2.13.tar.gz"
, ( "stdenv"
, "/nix/store/s3rlr45jzlzx0d6k2azlpxa5zwzr7xyy-stdenv"
, ( "system" , "x86_64-linux" )

We can express the serialization format using the following Extended Backus-Naur Form:

derivation = 'Derive('
, outputs
, ','
, inputDrvs
, ','
, inputSrcs
, ','
, platform
, ','
, builder
, ','
, args
, ','
, env
, ')'

outputs = '[]' | '[', output, { ',', output }, ']'

output = '(', string, ',', path, ',', string, ',', string, ')'

inputDrvs = '[]' | '[', inputDrv, { ',', inputDrv }, ']'

inputDrv = '(', path, ',', strings, ')'

inputSrcs = paths

platform = string

builder = string

args = strings

env = '[]' | '[', stringPair, { ',', stringPair }, ']'

stringPair = '(', string, ',', string, ')'

strings = '[]' | '[', string, { ',', string }, ']'

paths = '[]' | '[', path, { ',', path }, ']'

string = '"', { char }, '"'

path = '"/', { char }, '"'

char = ( '\', <any character> ) | <character other than '"' or '\'>

Now we just need a way to convert from Nix's serialization format to the Derivation type.

Parsing derivations

You can find Nix's parseDerivation function here:

... which is what we will translate to Haskell. If you would like to follow along you can find the completed parser code in Appendix A.

Let's start from the top:

static Derivation parseDerivation(const string & s)
{
    Derivation drv;
    istringstream_nocopy str(s);
    expect(str, "Derive([");

    /* Parse the list of outputs. */
    while (!endOfList(str)) {
        DerivationOutput out;
        expect(str, "("); string id = parseString(str);
        expect(str, ","); out.path = parsePath(str);
        expect(str, ","); out.hashAlgo = parseString(str);
        expect(str, ","); out.hash = parseString(str);
        expect(str, ")");
        drv.outputs[id] = out;
    }


static bool endOfList(std::istream & str)
{
    if (str.peek() == ',') {
        str.get();
        return false;
    }
    if (str.peek() == ']') {
        str.get();
        return true;
    }
    return false;
}

The first thing that the C++ code parses is the string "Derive(" followed by a list of DerivationOutputs. The code consolidates the first '[' character of the list with the string "Derive(", which is why the code actually matches "Derive([".

This code corresponds to the outputs field of our Derivation type:

data Derivation = Derivation
    { outputs :: Map Text DerivationOutput  -- ^ keyed on symbolic IDs
    } deriving (Show)

data DerivationOutput = DerivationOutput
    { path     :: FilePath
    , hashAlgo :: Text  -- ^ hash used for expected hash computation
    , hash     :: Text  -- ^ expected hash, may be null
    } deriving (Show)

Derivation files store the outputs field of our Derivation type as a list of 4-tuples. The first field of each 4-tuple is a key in our Map and the remaining three fields are the corresponding value, which is marshalled into a DerivationOutput.

The C++ code interleaves the logic for parsing the list structure and parsing each element but our Haskell code will separate the two for clarity:

{-# LANGUAGE RecordWildCards   #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text.Lazy (Parser)
import Data.Map (Map)
import Data.Text (Text)

parseDerivation :: Parser Derivation
parseDerivation = do
    "Derive("

    let keyValue0 :: Parser (Text, DerivationOutput)
        keyValue0 = do
            "("
            key <- string
            ","
            path <- filepath
            ","
            hashAlgo <- string
            ","
            hash <- string
            ")"
            return (key, DerivationOutput {..})
    outputs <- mapOf keyValue0

    ...

-- We will fill these in later

mapOf :: Ord k => Parser (k, v) -> Parser (Map k v)
mapOf = ???

string :: Parser Text
string = ???

filepath :: Parser FilePath
filepath = ???

You can read the Haskell code as saying:

  • First match the string "Derive("
  • Now define a parser for a key-value pair called keyValue0, which will:
    • Match the string "("
    • Parse a string and store the result as a value named key
    • Match the string ","
    • Parse a path and store the result as a value named path
    • Match the string ","
    • Parse a string and store the result as a value named hashAlgo
    • Match the string ","
    • Parse a string and store the result as a value named hash
    • Match the string ")"
    • Return a key-value pair:
      • The key is key
      • The value is a DerivationOutput built from path/hashAlgo/hash
        • The {..} populates record fields with values of the same name
  • Use the mapOf utility to parse a list of key-value pairs as a Map

Also, the OverloadedStrings extension is the reason we can use naked string literals as parsers that match the given literal.
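To see why this works, here is a minimal, self-contained sketch (mine, not from the post; derivePrefix is a hypothetical name) of a literal acting as a parser, using attoparsec's strict-Text interface:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text (Parser, parseOnly)

-- With OverloadedStrings, the literal "Derive(" in a Parser context
-- desugars to attoparsec's `string` parser, which matches exactly that
-- text and fails on anything else
derivePrefix :: Parser ()
derivePrefix = do
    "Derive("
    return ()
```

Running `parseOnly derivePrefix "Derive("` succeeds with `Right ()`, while any other input produces a `Left` with an error message.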

If we really wanted to be like the C++ code we could put more than one statement on each line using semicolons, like this:

    let keyValue0 :: Parser (Text, DerivationOutput)
        keyValue0 = do
            "("; key <- string
            ","; path <- filepath
            ","; hashAlgo <- string
            ","; hash <- string
            ")"
            return (key, DerivationOutput {..})

... but I prefer to keep them on separate lines for readability.

The code has placeholders for three utilities we haven't defined yet with the following types:

-- This is a utility function that transforms a parser of key-value pairs into a
-- parser for a `Map`
mapOf
    :: Ord k
    -- ^ This is a "constraint" and not a function argument. This constraint
    --   says that `k` can be any type as long as we can compare two values of
    --   type `k`
    => Parser (k, v)
    -- ^ This is the actual first function argument: a parser of key-value
    --   pairs. The type of the key (which we denote as `k`) can be any type
    --   as long as `k` is comparable (due to the `Ord k` constraint
    --   immediately preceding this). The type of the value (which we denote
    --   as `v`) can be any type
    -> Parser (Map k v)
    -- ^ This is the function output: a parser of a `Map k v` (i.e. a map from
    --   keys of type `k` to values of type `v`)

-- This is a utility which parses a string literal according to the EBNF rule
-- named `string`
string :: Parser Text

-- This is a utility which parses a string literal according to the EBNF rule
-- named `path`
filepath :: Parser FilePath

mapOf is fairly simple to define:

import qualified Data.Attoparsec.Text.Lazy
import qualified Data.Map

mapOf :: Ord k => Parser (k, v) -> Parser (Map k v)
mapOf keyValue = do
    keyValues <- listOf keyValue
    return (Data.Map.fromList keyValues)

-- | Given a parser for an element, return a parser for a list of elements
listOf :: Parser a -> Parser [a]
listOf element = do
    "["
    es <- Data.Attoparsec.Text.Lazy.sepBy element ","
    "]"
    return es

mapOf uses a helper function named listOf, which parses a list of values. This parser takes advantage of the handy sepBy utility (short for "separated by") provided by Haskell's attoparsec library. You can read the implementation of listOf as saying:

  • Match the string "["
  • Match 0 or more elements separated by commas
  • Match the string "]"

Then you can read the implementation of mapOf as saying:

  • Parse a list of keyValue pairs
  • Use Data.Map.fromList to transform that into the corresponding Map
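For example (a standalone sketch, not from the post, using a hypothetical numeric pair parser), mapOf and listOf can parse a bracketed list of pairs straight into a Map:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text (Parser, decimal, parseOnly)
import Data.Map (Map)

import qualified Data.Attoparsec.Text
import qualified Data.Map

-- Same definitions as in the text, reproduced (with the strict attoparsec
-- interface) so that this sketch compiles on its own
listOf :: Parser a -> Parser [a]
listOf element = do
    "["
    es <- Data.Attoparsec.Text.sepBy element ","
    "]"
    return es

mapOf :: Ord k => Parser (k, v) -> Parser (Map k v)
mapOf keyValue = do
    keyValues <- listOf keyValue
    return (Data.Map.fromList keyValues)

-- A toy key-value parser for input like "(1,2)"
pair :: Parser (Int, Int)
pair = do
    "("
    key <- decimal
    ","
    value <- decimal
    ")"
    return (key, value)
```

Here `parseOnly (mapOf pair) "[(1,2),(3,4)]"` returns `Right (Data.Map.fromList [(1,2),(3,4)])`.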

We can now use mapOf and listOf to transform the next block of parsing code, too:

static Derivation parseDerivation(const string & s)
{
    ...

    /* Parse the list of input derivations. */
    expect(str, ",[");
    while (!endOfList(str)) {
        expect(str, "(");
        Path drvPath = parsePath(str);
        expect(str, ",[");
        drv.inputDrvs[drvPath] = parseStrings(str, false);
        expect(str, ")");
    }

    ...

static StringSet parseStrings(std::istream & str, bool arePaths)
{
    StringSet res;
    while (!endOfList(str))
        res.insert(arePaths ? parsePath(str) : parseString(str));
    return res;
}

The corresponding Haskell code is:

import qualified Data.Set

parseDerivation :: Parser Derivation
parseDerivation = do
    ...

    let keyValue1 = do
            "("
            key <- filepath
            ","
            value <- setOf string
            ")"
            return (key, value)
    inputDrvs <- mapOf keyValue1

    ...

setOf :: Ord a => Parser a -> Parser (Set a)
setOf element = do
    es <- listOf element
    return (Data.Set.fromList es)

The only difference is that the Haskell code doesn't define a parser for a set of strings. Instead, the Haskell code defines a more general parser for a set of any type of value.
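As an illustration of that generality (again a standalone sketch with a numeric element parser, not taken from the post), setOf works for any orderable element type, and duplicates collapse in the resulting Set:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text (Parser, decimal, parseOnly)
import Data.Set (Set)

import qualified Data.Attoparsec.Text
import qualified Data.Set

-- `listOf` and `setOf` as in the text, reproduced for self-containment
listOf :: Parser a -> Parser [a]
listOf element = do
    "["
    es <- Data.Attoparsec.Text.sepBy element ","
    "]"
    return es

setOf :: Ord a => Parser a -> Parser (Set a)
setOf element = do
    es <- listOf element
    return (Data.Set.fromList es)
```

For instance, `parseOnly (setOf (decimal :: Parser Int)) "[3,1,2,3]"` gives `Right (Data.Set.fromList [1,2,3])`: the repeated 3 appears only once.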

The remaining parsing logic is fairly straightforward to translate. This C++ code:

static Derivation parseDerivation(const string & s)
{
    ...

    expect(str, ",["); drv.inputSrcs = parseStrings(str, true);
    expect(str, ","); drv.platform = parseString(str);
    expect(str, ","); drv.builder = parseString(str);

    /* Parse the builder arguments. */
    expect(str, ",[");
    while (!endOfList(str))
        drv.args.push_back(parseString(str));

    /* Parse the environment variables. */
    expect(str, ",[");
    while (!endOfList(str)) {
        expect(str, "("); string name = parseString(str);
        expect(str, ","); string value = parseString(str);
        expect(str, ")");
        drv.env[name] = value;
    }

    expect(str, ")");
    return drv;
}

... becomes this Haskell code:

import qualified Data.Vector

parseDerivation :: Parser Derivation
parseDerivation = do
    ...

    ","
    inputSrcs <- setOf filepath

    ","
    platform <- string

    ","
    builder <- string

    ","
    args <- vectorOf string

    ","
    let keyValue2 = do
            "("
            key <- string
            ","
            value <- string
            ")"
            return (key, value)
    env <- mapOf keyValue2

    ")"

    return (Derivation {..})

vectorOf :: Parser a -> Parser (Vector a)
vectorOf element = do
    es <- listOf element
    return (Data.Vector.fromList es)

The only thing missing is to translate the C++ code for parsing strings and paths to Haskell. The original C++ code is:

/* Read a C-style string from stream `str'. */
static string parseString(std::istream & str)
{
    string res;
    expect(str, "\"");
    int c;
    while ((c = str.get()) != '"')
        if (c == '\\') {
            c = str.get();
            if (c == 'n') res += '\n';
            else if (c == 'r') res += '\r';
            else if (c == 't') res += '\t';
            else res += c;
        }
        else res += c;
    return res;
}

However, we won't naively translate that to Haskell because this is on our parser's critical path for performance. Haskell's attoparsec library only guarantees good performance if you use bulk parsing primitives when possible instead of character-at-a-time parsing loops.

Our Haskell string literal parser will be a loop, but each iteration of the loop will parse a string block instead of a single character:

import qualified Data.Text.Lazy

string :: Parser Text
string = do
    "\""
    let predicate c = not (c == '"' || c == '\\')
    let loop = do
            text0 <- Data.Attoparsec.Text.Lazy.takeWhile predicate
            char0 <- Data.Attoparsec.Text.Lazy.anyChar
            text2 <- case char0 of
                '"' -> return ""
                _   -> do
                    char1 <- Data.Attoparsec.Text.Lazy.anyChar
                    char2 <- case char1 of
                        'n' -> return '\n'
                        'r' -> return '\r'
                        't' -> return '\t'
                        _   -> return char1
                    text1 <- loop
                    return (Data.Text.Lazy.cons char2 text1)
            return (Data.Text.Lazy.fromStrict text0 <> text2)
    text <- loop
    return (Data.Text.Lazy.toStrict text)

In Haskell, loops become recursive definitions such as the above loop. You can read the above parser as saying:

  • Match a double quote character: "\""
  • Now, define a function named predicate
    • predicate takes a single character c as input
    • predicate returns True if c is neither a quote nor a backslash
  • Now define a loop named loop, which will:
    • Consume consecutive characters up to first quote or backslash (text0)
    • Consume the next character (char0) and branch on its value:
      • If char0 is a double quote, then text2 is the empty string
      • If char0 is a backslash then:
        • Consume the next character (char1) and branch on its value:
          • If char1 is n/r/t then char2 is the matching escape code
          • Otherwise, char2 is just char1
        • Run loop again to parse the rest of the string (text1)
        • text2 is char2 prepended onto text1
    • Return text0 (converted to lazy Text) concatenated with text2
      • Concatenation is more efficient for lazy Text than strict Text
  • Run our recursive loop and store the result as text
  • Transform our lazy Text back into a strict Text result
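To check the escape handling, here is the same parser assembled into a self-contained sketch (switched to attoparsec's strict-Text interface purely for brevity; the logic is unchanged):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text (Parser, parseOnly)
import Data.Monoid ((<>))
import Data.Text (Text)

import qualified Data.Attoparsec.Text
import qualified Data.Text.Lazy

-- The `string` parser from the text: bulk-consume safe characters with
-- takeWhile, then handle a closing quote or an escape sequence
string :: Parser Text
string = do
    "\""
    let predicate c = not (c == '"' || c == '\\')
    let loop = do
            text0 <- Data.Attoparsec.Text.takeWhile predicate
            char0 <- Data.Attoparsec.Text.anyChar
            text2 <- case char0 of
                '"' -> return ""
                _   -> do
                    char1 <- Data.Attoparsec.Text.anyChar
                    char2 <- case char1 of
                        'n' -> return '\n'
                        'r' -> return '\r'
                        't' -> return '\t'
                        _   -> return char1
                    text1 <- loop
                    return (Data.Text.Lazy.cons char2 text1)
            return (Data.Text.Lazy.fromStrict text0 <> text2)
    text <- loop
    return (Data.Text.Lazy.toStrict text)
```

Then `parseOnly string "\"a\\nb\""` returns `Right "a\nb"`: the backslash-n in the input decodes to a real newline.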

Once we have a string parser we can then implement the filepath parser. The C++ version is:

static Path parsePath(std::istream & str)
{
    string s = parseString(str);
    if (s.size() == 0 || s[0] != '/')
        throw FormatError(format("bad path ‘%1%’ in derivation") % s);
    return s;
}

The corresponding Haskell code is:

filepath :: Parser FilePath
filepath = do
    text <- string
    case Data.Text.uncons text of
        Just ('/', _) -> do
            return (Filesystem.Path.CurrentOS.fromText text)
        _ -> do
            fail ("bad path ‘" <> Data.Text.unpack text <> "’ in derivation")

You can read that as saying:

  • Parse a string and store the result as a value named text
  • Inspect the beginning of the string and branch on the result:
    • If the beginning of the string is '/', then convert the string to a path
    • If the string is empty or does not begin with '/', then die

If we wanted to more closely match the C++ version, we could have done something like this:

import Prelude hiding (FilePath)
import Filesystem.Path.CurrentOS (FilePath)

import qualified Data.Text
import qualified Filesystem.Path.CurrentOS

filepath :: Parser FilePath
filepath = do
    text <- string
    case Data.Text.uncons text of
        Just ('/', _) -> do
            return ()
        _ -> do
            fail ("bad path ‘" <> Data.Text.unpack text <> "’ in derivation")
    return (Filesystem.Path.CurrentOS.fromText text)

The reason this works is that Haskell's return is not the same as return in C/C++/Java: the Haskell return does not exit from the surrounding subroutine. Indeed, there is no such thing as a "surrounding subroutine" in Haskell, and that's a good thing!

In this context the return function is like a Parser that does not parse anything and returns any value that you want. More generally, return is used to denote a subroutine that does nothing and produces a value that can be stored just like any other command.
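A tiny illustration of this point (mine, not the post's), using the Maybe monad since the behavior is the same in any monad:

```haskell
-- `return` is just a value-producing step, not an early exit: the first
-- `return 1` binds 1 to `_` and execution continues to the next line
example :: Maybe Int
example = do
    _ <- return 1  -- does NOT exit `example` with the value 1
    return 2       -- the overall result comes from the final expression
```

Here `example` evaluates to `Just 2`, not `Just 1`.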

Performance

Let's test the performance of our parser on a sample derivation file:

import Criterion (Benchmark)

import qualified Criterion
import qualified Criterion.Main
import qualified Data.Attoparsec.Text.Lazy
import qualified Data.Text.Lazy.IO
import qualified Nix.Derivation

main :: IO ()
main = Criterion.Main.defaultMain benchmarks

benchmarks :: [Benchmark]
benchmarks =
    [ Criterion.Main.env
        (Data.Text.Lazy.IO.readFile "/nix/store/zx3rshaya690y0xlc64jb8i12ljr8nyp-ghc-8.0.2-with-packages.drv")
        bench0
    ]
  where
    bench0 example =
        Criterion.bench "example" (Criterion.nf parseExample example)

    parseExample =
        Data.Attoparsec.Text.Lazy.parse Nix.Derivation.parseDerivation

... where /nix/store/zx3rshaya690y0xlc64jb8i12ljr8nyp-ghc-8.0.2-with-packages.drv is a 15 KB file that you can find in Appendix B of this post. This benchmark gives the following results:

Running 1 benchmarks...
Benchmark benchmark: RUNNING...
benchmarking example
time 3.230 ms (3.215 ms .. 3.244 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 3.265 ms (3.251 ms .. 3.285 ms)
std dev 54.87 μs (41.41 μs .. 74.99 μs)

Benchmark benchmark: FINISH

Our derivation file is 15,210 characters long, so that comes out to about 200 nanoseconds per character to parse. That's not bad, but could still use improvement. However, I stopped optimizing at this point because I did some experiments that showed that parsing was no longer the bottleneck for even a trivial program.

I compared the performance of an executable written in Haskell to the nix-store executable (written in C++) to see how fast each one could display the outputs of a list of derivations. I ran them on 169 derivations all beginning with the letter z in their hash:

$ ls -d /nix/store/z*.drv | wc -l
169
The nix-store command lets you do this with nix-store --query --outputs:

$ nix-store --query --outputs /nix/store/z*.drv

I compared that to the following Haskell program, which parses a list of paths from the command line and then displays their outputs:

{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text.Lazy (Result(..))

import qualified Data.Attoparsec.Text.Lazy
import qualified Data.Text.Lazy.IO
import qualified Nix.Derivation
import qualified Options.Generic

main :: IO ()
main = do
    paths <- Options.Generic.getRecord "Get the outputs of a Nix derivation"
    mapM_ process (paths :: [FilePath])

process :: FilePath -> IO ()
process path = do
    text <- Data.Text.Lazy.IO.readFile path
    case Data.Attoparsec.Text.Lazy.parse Nix.Derivation.parseDerivation text of
        Fail _ _ string -> fail string
        Done _ derivation -> do
            let printOutput output = print (Nix.Derivation.path output)
            mapM_ printOutput (Nix.Derivation.outputs derivation)

... which gives this output:

$ query-outputs /nix/store/z*.drv
FilePath "/nix/store/qq46wcgwk7lh7v5hvlsbr3gi30wh7a81-ansi-wl-pprint-"
FilePath "/nix/store/sn0v9rkg0q5pdhm6246c7sigrih22k9h-tagged-0.8.5"
FilePath "/nix/store/zsryzwadshszfnkm740b2412v88iqgi4-semigroups-0.18.2"
FilePath "/nix/store/mxl1p0033xf8yd6r5i6h3jraz40akqyb-perl-DBIx-Class-0.082840-devdoc"

I benchmarked both of these executables using my bench utility. Benchmarks show that both executables take the same amount of time to process all 169 derivation files:

$ bench 'nix-store --query /nix/store/z*.drv'
benchmarking nix-store --query /nix/store/z*.drv
time 84.19 ms (83.16 ms .. 85.40 ms)
1.000 R² (0.999 R² .. 1.000 R²)
mean 84.33 ms (83.92 ms .. 84.84 ms)
std dev 781.0 μs (581.5 μs .. 1.008 ms)

$ bench 'query-outputs /nix/store/z*.drv'
benchmarking query-outputs /nix/store/z*.drv
time 83.52 ms (82.88 ms .. 83.85 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 84.12 ms (83.83 ms .. 84.67 ms)
std dev 606.0 μs (161.1 μs .. 849.9 μs)

Also, note that 9 milliseconds are due to the overhead of the benchmark tool running a subprocess:

$ bench true
benchmarking true
time 9.274 ms (9.161 ms .. 9.348 ms)
0.998 R² (0.995 R² .. 1.000 R²)
mean 9.324 ms (9.233 ms .. 9.502 ms)
std dev 333.5 μs (183.1 μs .. 561.9 μs)
variance introduced by outliers: 15% (moderately inflated)

... so if you factor in that overhead then both tools process derivations at a rate of about 440 microseconds per file. Given that the Haskell executable is as efficient as the C++ one, I figured that there was no point optimizing further. The first draft is simple, clear, and efficient enough.

Conclusion

Hopefully this helps people see that you can translate C++ parsing code to Haskell. The main difference is that Haskell parsing libraries provide some higher-level abstractions and Haskell programs tend to define loops via recursion instead of iteration.

The Haskell code is simpler than the C++ code and efficient, too! This is why I recommend Haskell to people who want a high-level programming language without sacrificing performance.

I also released the above Haskell parser as part of the nix-derivation library in case people are interested in using this code. You can find the library on Hackage or on GitHub.

Appendix A: Completed parser

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RecordWildCards #-}

import Data.Attoparsec.Text.Lazy (Parser)
import Data.Map (Map)
import Data.Monoid ((<>))
import Data.Set (Set)
import Data.Text (Text)
import Data.Vector (Vector)
import Nix.Derivation.Types (Derivation(..), DerivationOutput(..))
import Prelude hiding (FilePath)
import Filesystem.Path.CurrentOS (FilePath)

import qualified Data.Attoparsec.Text.Lazy
import qualified Data.Map
import qualified Data.Set
import qualified Data.Text
import qualified Data.Text.Lazy
import qualified Data.Vector
import qualified Filesystem.Path.CurrentOS

-- | Parse a derivation
parseDerivation :: Parser Derivation
parseDerivation = do
    "Derive("

    let keyValue0 = do
            "("
            key <- string
            ","
            path <- filepath
            ","
            hashAlgo <- string
            ","
            hash <- string
            ")"
            return (key, DerivationOutput {..})
    outputs <- mapOf keyValue0

    ","

    let keyValue1 = do
            "("
            key <- filepath
            ","
            value <- setOf string
            ")"
            return (key, value)
    inputDrvs <- mapOf keyValue1

    ","

    inputSrcs <- setOf filepath

    ","

    platform <- string

    ","

    builder <- string

    ","

    args <- vectorOf string

    ","

    let keyValue2 = do
            "("
            key <- string
            ","
            value <- string
            ")"
            return (key, value)
    env <- mapOf keyValue2

    ")"

    return (Derivation {..})

string :: Parser Text
string = do
    "\""
    let predicate c = not (c == '"' || c == '\\')
    let loop = do
            text0 <- Data.Attoparsec.Text.Lazy.takeWhile predicate
            char0 <- Data.Attoparsec.Text.Lazy.anyChar
            text2 <- case char0 of
                '"' -> return ""
                _   -> do
                    char1 <- Data.Attoparsec.Text.Lazy.anyChar
                    char2 <- case char1 of
                        'n' -> return '\n'
                        'r' -> return '\r'
                        't' -> return '\t'
                        _   -> return char1
                    text1 <- loop
                    return (Data.Text.Lazy.cons char2 text1)
            return (Data.Text.Lazy.fromStrict text0 <> text2)
    text <- loop
    return (Data.Text.Lazy.toStrict text)

filepath :: Parser FilePath
filepath = do
    text <- string
    case Data.Text.uncons text of
        Just ('/', _) -> do
            return (Filesystem.Path.CurrentOS.fromText text)
        _ -> do
            fail ("bad path ‘" <> Data.Text.unpack text <> "’ in derivation")

listOf :: Parser a -> Parser [a]
listOf element = do
    "["
    es <- Data.Attoparsec.Text.Lazy.sepBy element ","
    "]"
    return es

setOf :: Ord a => Parser a -> Parser (Set a)
setOf element = do
    es <- listOf element
    return (Data.Set.fromList es)

vectorOf :: Parser a -> Parser (Vector a)
vectorOf element = do
    es <- listOf element
    return (Data.Vector.fromList es)

mapOf :: Ord k => Parser (k, v) -> Parser (Map k v)
mapOf keyValue = do
    keyValues <- listOf keyValue
    return (Data.Map.fromList keyValues)

Appendix B: Example derivation

qi48fh-bash-4.4-p5/bin/bash",["-e","/nix/store/9krlzvny65gdc8s7kpb6lkx8cd02c25b-"],[("allowSubstitutes",""),("buildCommand","mkdir -p $out\nfo
r i in $paths; do\n /nix/store/lnai0im3lcpb03arxfi0wx1dm7anf4f8-lndir-1.0.3/bin
/lndir $i $out\ndone\n. /nix/store/plmya6mkfvq658ba7z6j6n36r5pdbxk5-hook/nix-sup
port/setup-hook\n\n# wrap compiler executables with correct env variables\n\nfor
prg in ghc ghci ghc-8.0.2 ghci-8.0.2; do\n if [[ -x \"/nix/store/s0hpng652hsn4
0jy4kjdh1x0jm86dx9l-ghc-8.0.2/bin/$prg\" ]]; then\n rm -f $out/bin/$prg\n
makeWrapper /nix/store/s0hpng652hsn40jy4kjdh1x0jm86dx9l-ghc-8.0.2/bin/$prg $out/
bin/$prg \\\n --add-flags '\"-B$NIX_GHC_LIBDIR\"'
\\\n --set \"NIX_GHC\" \"$out/bin/ghc\" \\\n
--set \"NIX_GHCPKG\" \"$out/bin/ghc-pkg\" \\\n --set \"NIX_GHC_DOC
DIR\" \"$out/share/doc/ghc/html\" \\\n --set \"NIX_GHC_LIB
DIR\" \"$out/lib/ghc-8.0.2\" \\\n \n fi\ndone\n\nfor prg
in runghc runhaskell; do\n if [[ -x \"/nix/store/s0hpng652hsn40jy4kjdh1x0jm86dx
9l-ghc-8.0.2/bin/$prg\" ]]; then\n rm -f $out/bin/$prg\n makeWrapper /nix/
store/s0hpng652hsn40jy4kjdh1x0jm86dx9l-ghc-8.0.2/bin/$prg $out/bin/$prg
\\\n --add-flags \"-f $out/bin/ghc\"
\\\n --set \"NIX_GHC\" \"$out/bin/ghc\" \\\n --set \"
NIX_GHCPKG\" \"$out/bin/ghc-pkg\" \\\n --set \"NIX_GHC_DOCDIR\" \"$out/
share/doc/ghc/html\" \\\n --set \"NIX_GHC_LIBDIR\" \"$out/
lib/ghc-8.0.2\"\n fi\ndone\n\nfor prg in ghc-pkg ghc-pkg-8.0.2; do\n if [[ -x
\"/nix/store/s0hpng652hsn40jy4kjdh1x0jm86dx9l-ghc-8.0.2/bin/$prg\" ]]; then\n
rm -f $out/bin/$prg\n makeWrapper /nix/store/s0hpng652hsn40jy4kjdh1x0jm86dx9
l-ghc-8.0.2/bin/$prg $out/bin/$prg --add-flags \"--global-package-db=$out/lib/gh
c-8.0.2/package.conf.d\"\n fi\ndone\n$out/bin/ghc-pkg recache\n\n$out/bin/ghc-p
kg check\n\n"),("buildInputs",""),("builder","/nix/store/fi3mbd2ml4pbgzyasrlnp0w
yy6qi48fh-bash-4.4-p5/bin/bash"),("extraOutputsToInstall","out doc"),("ignoreCol
ing-compat-0.3.6 /nix/store/v2qsqznrik64f46msahvgg7dmaiag18k-hnix-0.3.4 /nix/sto
re/vbkqj8zdckqqiyjh08ykx75fwc90gwg4-optparse-applicative- /nix/store/6m7
qia8q0rkdkzvmiak38kdscf27malf-optparse-generic-1.1.5 /nix/store/r687llig7vn9x15h
hkmfak01ff7082n6-utf8-string- /nix/store/j6gvad67dav8fl3vdbqmar84kgmh5gar
-reducers-3.12.1 /nix/store/i8wf08764lknc0f9ja12miqvg509jn1k-fingertree-
/nix/store/301hq4fabrpbi3l47n908gvakkzq1s88-blaze-markup- /nix/store/055m
hi44s20x5xgxdjr82vmhnyv79pzl-blaze-html- /nix/store/vnc1yyig90skcwx3l1xrb
p1jqwmmb9xv-trifecta- /nix/store/vraffi24marw5sks8b78xrim6c8i1ng6-double-
conversion- /nix/store/kwdk03p0lyk5lyll1fp7a6z20j17b3sx-text-format-0.3.1
.1 /nix/store/zn5hlw3y94sbli4ssygr2w04mpb396zs-system-filepath- /nix/sto
re/jn7lbnk0gsirj8kb02an31v8idy7ym3c-system-fileio- /nix/store/9frfci9ywf
9lc216ci9nwc1yy0qwrn1b-integer-logarithms-1.0.1 /nix/store/rps46jwa7yyab629p27la
r094gk8dal2-scientific- /nix/store/c4a3ynvnv3kdxgd7ngmnjhka4mvfk8ll-atto
parsec- /nix/store/kc34l1gpzh65y4gclmv4dgv6agpmagdi-parsers-0.12.4 /nix/
store/1kf78yxf3lliagb5rc5din24iq40g96y-base-prelude- /nix/store/hi868d12p
kzcbzyvp7a7cigc58mp2lmg-neat-interpolation- /nix/store/h00jrbdvzj4yfy796j
8vq00lkd1gxr6w-primitive- /nix/store/vys8qsf317rn8qwy00p80zlywb47lqwz-vec
tor- /nix/store/wchch11312m3lxkwl8rad04x02svcs3i-reflection-2.1.2 /nix/s
tore/jj1kfv52mjxp54flz8v5ba64va3hvy22-parallel- /nix/store/jwj23y7vfvs14j
drkw1py9q7lm9fyhy4-adjunctions-4.3 /nix/store/px4979la9b98knwv36551zg3p5jb69lw-k
an-extensions-5.0.1 /nix/store/2cp1ar0f73jrcn231ai07zpwayy735j2-semigroupoids-5.
1 /nix/store/3nkxw5wdadckz28laijrvwdkkfqp07sb-profunctors-5.2 /nix/store/bd3njvy
0ahcsqw47vaz5zayhx34hari7-prelude-extras- /nix/store/zdp7zqasz1l1wifpngbg
6ngq189gbbqh-free-4.12.4 /nix/store/n7c5ynfqc6j570bbyaajqx34c3pvfvph-tagged-0.8.
5 /nix/store/xdkhd7mkqj2mmcami8ycmf7j0valwp5h-distributive-0.5.2 /nix/store/9dxb
a4g9x0xjj21r3vchqnh4rdwbc31b-void-0.7.2 /nix/store/dahah2ivrn4hc5gjygnlvxlad2399
zqh-StateVar- /nix/store/f2rdi1bx46fs165n1j316k5w90ab6lwy-contravariant-1
.4 /nix/store/mgg9rsvhvn4dd4qzv559nn24iqvspjnb-comonad-5.0.1 /nix/store/18n8i570
pf4gpszdyc0bki9qxm1p9xd7-bifunctors-5.4.2 /nix/store/d8ys5wq4wrvdjqw0bzv3y23zqpr
khjs2-base-orphans-0.5.4 /nix/store/j4hbyhnj4a2z4z4vb1437vk7ha0b287a-lens-4.15.1
/nix/store/ra3jh12mbyz82n4gvj2bam77vl8aabbq-x509-system-1.6.4 /nix/store/ps8915
q1047frp891jg1anp85ads0s9b-x509-validation-1.6.5 /nix/store/5vrgrls6l1cdsbbznis3
9chx8scq2r98-x509-store-1.6.2 /nix/store/7vvg8y8fp0s50qiciq11irfvh31f1q58-pem-0.
2.2 /nix/store/myv75wk9s19f8vms2dcy6sl773288zy4-asn1-parse-0.9.4 /nix/store/kwyc
1jdz09lazw21qpc96wyamxalcg11-x509-1.6.5 /nix/store/gadc7c6d1lqn0wqk29bhn56is67x0
r45-cryptonite-0.21 /nix/store/ix26y5rpidwpgjzrsixz0ff59j1p1swr-foundation-0.0.7
/nix/store/n784p4qh18zx9v8ag3n3ypszq1kifjjr-memory-0.14.3 /nix/store/h3qq6m5ahd
b4kw784gcvx2skil8ilks8-hourglass-0.2.10 /nix/store/dn65dl65spk4j0sky2zpdig75c42y
cj1-asn1-types-0.3.2 /nix/store/s5jklkk0y6i7d8h3akgsciv1kv2js786-asn1-encoding-0
.9.5 /nix/store/g5qjgns5cyz9c5xw4w5s2iji1kbhg47z-tls-1.3.10 /nix/store/iyllk46by
75f428pwis9v74jpr1rmk4x-cereal- /nix/store/b22wyyl3wdl6kb7gkpk3yxnynk340l
ya-socks-0.5.5 /nix/store/05r3i8w2n7hbxqyb4w8rina9rldyacd3-byteable-0.1.1 /nix/s
tore/xjbl6w60czyfqlfwwfs5q93by144yr1n-connection-0.2.8 /nix/store/j10yqzk323rvnw
gsk3nj7rgmvqlv035a-http-client-tls- /nix/store/vf84v2398g55mai2gjh2d9gipw
izhhzd-zlib- /nix/store/7h7vy3mi603y536dgvxwfglaacxw5ra8-async- /n
ix/store/y6hh2ifv35afw1j5phpzp1y72x532izn-streaming-commons-0.1.17 /nix/store/f5
jdarp8djisa1wrv4bv1saimrabcb3f-random-1.1 /nix/store/18vpnmd28bnjib6andw8bx522wc
b3zwa-parsec-3.1.11 /nix/store/i3ra66pcpj0v9wq3m00gh9i72br2bki3-network-uri-2.6.
1.0 /nix/store/2ck9avbwacfpi16p2ib2shw951mx33pz-network- /nix/store/rz022
7nv8n8kdrxjg3arya6r2ixxjh4h-mime-types- /nix/store/rx71j4kg0l02dginiswnmw
swdq9i9msv-http-types-0.9.1 /nix/store/y2ca4scn0n2f9qsmvsiixcnx11793jlf-transfor
mers-compat- /nix/store/bzicr83ibzzzbab6cjkb3i95sc8cvxy9-stm- /nix
/store/qk5pl6r2h0vfkhhwjgrv8x1ldf8dyj5a-mtl-2.2.1 /nix/store/0d6k71ljl108dgq1l7l
3pz12bfwv0z4h-exceptions-0.8.3 /nix/store/z5k23ymwjhhpd670a7mcsm1869hlpncf-old-l
ocale- /nix/store/k4an783d4j3m48fqhx7gpnizqg2ns38j-data-default-class-0.1
.2.0 /nix/store/p5867jsig02zi0ynww9w4916nm0k527s-cookie- /nix/store/wy7j4
2kqlw1sskagmyc1bzb0xv04s2na-case-insensitive- /nix/store/j35339b0nk7k3qaq
3m75nl3i4x603rqf-blaze-builder- /nix/store/33mip0ql9x1jjbhi34kf8izh4ilyf2
k0-base64-bytestring- /nix/store/29a73kd2jkwvfdcrhysmi5xjr7nysrxf-http-cl
ient- /nix/store/d2hy666g79qvhmbh520x5jclwvnr1gk2-text- /nix/store
/2bdzia66lg08d5zngmllcjry2c08m96j-hashable- /nix/store/7kdgc6c0b21s9j5qgg
0s0gxj7iid2wk5-unordered-containers- /nix/store/zsryzwadshszfnkm740b2412v
88iqgi4-semigroups-0.18.2 /nix/store/h2c0kz3m83x6fkl2jzkmin8xvkmfgs7s-charset-0.
3.7.1 /nix/store/gapj6j0ya5bi9q9dxspda15k50gx8f1v-ansi-terminal- /nix/sto
re/l46769n2p6rlh936zrbwznq3zxxa6mjd-ansi-wl-pprint- /nix/store/p7zmpgz0sq
5pamgrf1xvhvidc3m4cfmk-dhall-1.3.0 /nix/store/938ndd0mqfm148367lwhl6pk5smv5bm0-d
ata-fix-0.0.4 /nix/store/s0hpng652hsn40jy4kjdh1x0jm86dx9l-ghc-8.0.2"),("preferLo

by Gabriel Gonzalez at June 12, 2017 02:21 AM

June 11, 2017

Neil Mitchell

Haskell Website Working Group - Update

Summary: We have agreed a set of principles for the website and are collecting information.

I'm writing this partly in my capacity as the chair of the Haskell Website Working Group, and partly as an individual (so blame me rather than the committee). It's fair to say that the original goal of the committee was to make sure everyone agrees on the download page. Discussions amongst the committee led to a shared goal that the download page should clearly direct users along a precise path, without requiring beginners to make decisions requiring judgement. That probably means the download page should describe only one installation path, pushing alternatives onto a separate page.

To decide what should go on the download page, our first step was to evaluate what was currently available, and what experience a beginner might have. We've started that on this page. As an example, it says how to install Haskell, how to open ghci, how to install a tool etc - all using the different options.

When I actually tried installing and using the various options listed on the current download page, they all had confusing steps, unintuitive behaviour and problems that I had to overcome. As an example, Stack on Windows recommends using the 32bit version, while noting that only the 64bit version works. At the same time, Core Platform starts by telling me to edit a global Cabal config file.

I invite everyone to help contribute to that page, via pull requests. At the same time, it would be great if the issues raised could be ironed out, leading to a smooth beginner experience (I'm talking to maintainers in person and raising GitHub tickets). Once the information is collected, and ideally the tools have improved, it will be time to make a decision between the options. When the decision comes, it will be technically motivated, and hopefully unambiguous.

by Neil Mitchell at June 11, 2017 05:17 AM

June 07, 2017

Mark Jason Dominus

Annual self-evaluation time, woo-hoo!

It's annual performance evaluation time at my employer, ZipRecruiter, and as part of that I have to write a self-evaluation. I know many people dread these, and I used to dread them, but these days I like doing it. Instead of being a torture or even a chore, for the last couple of years I have come out of it feeling much better about my job and my performance than I went in.

I think that is about 20% because my company does it in a good way, 30% because it suits my personality, and 50% because I have learned how to handle it well. The first half of that might not help you much, but if you're an evaluation loather, you might be able to transfer some of the second half and make it a little less horrible for yourself.

How ZipRecruiter does self-evaluations

I will get this out of the way because it's quick. ZipRecruiter does these evaluations in a way that works well for me. They do not pester me with a million questions. They ask only four, which are roughly:

  1. What were your main accomplishments this year?
  2. Describe areas you feel require improvement.
  3. What do you hope to accomplish in the coming year?
  4. How can your manager help you?

I very much appreciate this minimalist approach. It gets right to the point, covers all the important issues and nothing more. None of these questions feels to me like a meaningless bureaucratism or a waste of time.

Answering the questions thoroughly takes (only) two or three hours, but would take less if I didn't write such detailed answers; I'm sure I could write an acceptable report in an hour. I can see going in that it will be a finite process.

Why this suits my personality well

If you have followed this blog for a while, you may have noticed that I like writing essays, particularly essays about things I have been working on or thinking about. ZipRecruiter's self-evaluation process invites me to write a blog post about my year's work. This is not everyone's cup of tea, but it is right up my alley. Tea alley. Hee hee.

My brain has problems

My big problems with writing a self-evaluation are first, that I have a very poor memory, and second, that I think of myself as a lazy slacker who spends a lot of time goofing off and who accomplishes very little. These combine badly at evaluation time.

In the past, I would try to remember what work I did in the previous year so I could write it up. My memory is poor, so I wouldn't remember most of what I had done, and then it was easy to come to the conclusion that I had not done very much, probably because I was a lazy slacker who spent a lot of time goofing off. I would go through several iterations of this until, tormented by guilt and self-hatred, I would write that into the self-evaluation. This is not a practice I would recommend.

If there were two projects, A and B, and I promptly finished A but B dragged on and was running late, which one would I be more likely to remember when the time came to write the self-evaluation report? B, of course. It was still on my mind because I spent so long thinking about it and because it was still in progress. But I had forgotten about A immediately after putting it to rest. Since I could remember only the unfinished projects, I would conclude that I was a lazy slacker who never finished anything, and write that into the self-evaluation. This is also a practice I recommend you avoid.

The ticketing system is my bionic brain

The way I have been able to escape this horrible trap is by tracking every piece of work I do, every piece, as a ticket in our ticketing system. People often come to me and ask me to do stuff for them, and I either write up a ticket or I say “sure, write me a ticket”. If they ask why I insist on the ticket (they usually don't), I say it's because when self-evaluation time comes around I want to be able to take credit for working on their problem. Everyone seems to find this reasonable.

Then, when it's time to write the self-evaluation, the first thing I do is visit the ticket website, select all my tickets from the past year, and sort them into chronological order. I look over the list of ticket titles and make a list of stuff that might be worth mentioning on the evaluation. I will have forgotten about three-fourths of it. If I didn't have the list in the ticketing system, I would only remember the most recent one-fourth and conclude that I spent three-fourths of my time goofing off because I am a lazy slacker. Instead, there is this long list of the year's accomplishments, too many to actually include in the report.

Well, this is not rocket science. One is called upon to describe the year's accomplishments. Even people with better memory than me surely cannot remember all this without keeping records, can they? Anyway I surely cannot, so I must keep records and then consult them when the time comes. Put that way, it seems obvious. Why did it take so long to figure out? But there are a lot of happy little details that were not so obvious.

  • Instead of thinking “Why didn't I finish big project X? I must have been goofing off. What a lazy slacker I am” I think “holy cow, I resolved 67 tickets related to big project X! That is great progress! No wonder I got hardly anything else done last fall” and also “holy cow, X has 78 resolved tickets and 23 still open. It is huge! No wonder it is not finished yet.”

    Writing “I completed 67 tickets related to X” is a lot more concrete than “I worked hard on X”. If you are neurotic in the way I am, and concerned that you might be a lazy slacker, it feels much more persuasive. I have an idea that it sounds better to my boss also, particularly if he were to be called upon to discuss it with his manager. (“Under my leadership, Mark completed 67 tickets related to X!”) Andy Lester says that your job is to make your boss happy, and that means making it easy for him to do his job, which is to make his boss happy. So this is no small thing.

  • Instead of thinking “Gee, the CTO declared top priority initiative Y, and while everyone else was working hard on it I mostly ignored it because I am a lazy slacker” I might see that I have tagged 9 tickets “top priority initiative Y”. Then on the report, I proudly boast “I completed 9 tickets in support of the CTO's mandate, including (most important one) and (most impressive one).” This also comes under the heading of “make it easy for your boss to do his job”.

  • Instead of completely forgetting that I did project Z, I see the tickets and can put it in my report.

  • Instead of remembering awful project W, which dragged on for months, and thinking what a lazy slacker I was because I couldn't get it done, I have a progress record in the ticket and the details might suggest a different interpretation: Project W sucked, but I nevertheless pursued it doggedly to completion, even though it took months.

  • I might remember that I once helped Jones, but what did I help him with? Did I really spend much time on him? Without looking at the ticket list, I might not realize that I helped Jones every few weeks all year long. This sort of pattern is often apparent only in the retrospective summary. With the ticket system, instead of “oh, Jones sometimes asks me questions, I guess” I can see that supporting Jones was an ongoing thing and he kept coming back. This goes into the report: “I provided ongoing support to Jones, including (some cherry-picked example that makes me look especially good).”

  • One question (#2) on the report form is “Describe areas you feel require improvement”. If I wrote in last year's report that I would like to improve at doing X, I can look in the ticket system for specific evidence that I might have improved, even if I wasn't consciously trying to improve X at the time. Probably there is something somewhere that can at least be spun as an attempt to improve at X. And if it didn't actually improve X, I can still ask myself why it didn't and what might work instead, and put that in the report as something to try next time, which is question #3.

    Hey, look at that, I am evaluating my prior performance and making informed corrections. That might be a useful practice. Wouldn't it be great if I took time every so often to do that? Some sort of periodic self-evaluation perhaps?

  • Another question (#3) is “What would you like to do in the coming year?” If last year's report said “I would like to do more of X” I can look for evidence that I did do that, and then write it into this year's report: “Last year I said I would try to do more of X, and I did.”

  • Even if I were having a bad year and got very little done—and this has happened—having a list of the stuff I did get done leaves me in a much better position to write the report than not having such a list.

None of this good stuff would be possible without an actual record of what I did. If there weren't a ticketing system, I would have to invent one or maybe tattoo it on my chest like the guy in Memento. Even aside from its use in writing annual self-evaluations, keeping a work diary is crucial for me, because without it I can't remember day-to-day what I am working on and what needs to happen next. And even for people with better memory than me, are they really going to remember all 317 things they did for work this year, or that 67 of them pertained to big project X? If they can that's great but I doubt it.

Keeping a work record is part of my job

I think it is important to maintain the correct attitude to this. It would be easy to imagine ticket management as unproductive time that I wasted instead of accomplishing something useful. This is wrong. The correct attitude is to consider ticket updates to be part of my work product: I produce code. I produce bug fixes. I produce documentation, reports, and support interactions. And I also produce ticket updates. This is part of my job and while I am doing it I am not goofing off, I am not procrastinating, I am doing my job and earning my salary. If I spent the whole day doing nothing but updating tickets, that would be a day well-spent.

Compare “I produce ticket updates” with “I produce unit tests”. The attitude for ticket updates is the same as for testing. When something happens in a project, I update the ticket, because keeping the tickets updated is part of the project, just like writing tests is. A day spent doing nothing but writing tests is a day well-spent. An organization that fails to support ticket updates is broken in the same way as one that fails to support test development.

My boss gets email every time I update a ticket. I don't know if he reads these, but he has the option to, and I don't need to worry as much that maybe he thinks I am a lazy slacker who is goofing off, because he is getting a stream of automatic notifications about what I am up to. I'm not my boss but if I were I would appreciate this very much.

Maybe some of this can help you?

There might be some use in this even for people who aren't already in the habit of writing self-absorbed blog posts.

If doing the annual self-evaluation makes you suffer, maybe it would help to practice writing some blog posts. You don't have to publish them or show anyone. Next time you finish a project, set aside thirty or sixty minutes to try to write up a little project report: What worked, what didn't, what are you pleased about, what was surprising, what was fun, what was annoying? I'm not certain this will help, but writing this sort of thing seems like a skill that gets easier with practice, and the annual self-evaluation should get easier along with it. Also, you would have a little file of material to draw on and would not have to start from zero.

If your employer's process requires you to fill in some giant questionnaire, it might be easier to do if you go into it with answers to the four basic questions prepared ahead of time. (I imagine that it's even possible that if you address the four big questions and ignore everything on the giant questionnaire that doesn't pertain to them, everyone will be perfectly satisfied and nobody will notice the omissions.)

And keep a work diary! Tattoo it on your chest if you have to. If it seems daunting, realize that you don't have to do it all at once. Keeping a work diary of some sort is forgiving in the same way as writing unit tests:

  • It's not all-or-nothing, you don't have to test every piece of code to get any benefit from testing. If you write tests for 1% of the code, you get about 1% of the benefit, and you can ramp up.

  • If you break your test-writing streak you don't have to start over from zero. If you didn't write any tests for the code you wrote last week, that's a shame, but it doesn't affect the benefit you can get from writing a unit test for whatever you're working on today.

The work diary is similar. When time comes to write your evaluation, a small and incomplete record is better than no record at all. If you forget to write in the diary for a month, that's a shame, but it doesn't affect the benefit you can get from writing down today what you did today.

Our ticketing system

This isn't important, but I expect someone will want to know: At work we use FogBugz. Some of my co-workers dislike it but I have no major complaints. If you want to try it on your own, they have a seven-day free trial offer, after which you can sign up for a permanent free account that supports up to two users. I am experimenting with using a free tier account to manage my personal to-do list.

Coming soon

I wrote another 2,000 words about my specific practices for managing tickets. I hope it'll be along in a few days.

by Mark Dominus ( at June 07, 2017 07:54 PM

Brent Yorgey

The Typeclassopedia is now up-to-date

The title pretty much says it all: I have finally finished (I hope) updating the Typeclassopedia to reflect various recent changes to the language and standard libraries (such as the AMP and BBP/FTP). Along the way I also added more links to related reading as well as more exercises.

How you can help

I am always on the lookout for more exercises to add and for more links to interesting further reading. If you know of a cool exercise or a cool paper or blog post that helps explain/illustrate/apply a standard Haskell type class, please let me know (or just add it yourself, it’s a wiki!). And, of course, the same goes if you notice any errors or confusing bits.

Happy Haskelling!

by Brent at June 07, 2017 07:47 PM

Robert Harper

What, if anything, is a programming paradigm?

Just out, an essay on the Cambridge University Press author’s blog about “programming paradigms”, and why I did not structure Practical Foundations for Programming Languages around them.



Filed under: Programming, Teaching Tagged: programming languages

by Robert Harper at June 07, 2017 05:12 PM

Douglas M. Auclair (geophf)

May 2017 1HaskellADay problems and solutions

by geophf ( at June 07, 2017 12:41 PM

Yesod Web Framework

Updated Yesod Scaffolding

A few days ago I released an update to the Yesod scaffolding. It's nothing major, but it has some new niceness I thought people would be interested in:

  1. I've (finally) moved the Haskell source files into a src directory. I had rejected such moves in the past. But since then, this style has become the dominant style in the Haskell world, and it makes sense to embrace it.
  2. Instead of putting language extensions in the default-extensions field of the cabal file, they are now in LANGUAGE pragmas in each source file. This was not an obvious decision to make, and there are still people (myself included) who are conflicted on it. There was some discussion of this on Twitter.
  3. We've moved from a cabal file to an hpack package.yaml file. I only started using hpack a few months back, but it's completely won me over already. For those not familiar, check out the hpack repo. Note that hpack generates a cabal file, so there is full compatibility with cabal-the-build-system. We just get some niceties, like leaving off exposed-modules.
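For those who haven't used hpack, a minimal package.yaml looks roughly like this (the field names follow hpack's documented format, but the package name, version bounds, and directory layout here are illustrative, not taken from the actual scaffolding):

```yaml
# Illustrative package.yaml; hpack generates the .cabal file from this.
name: mysite
version: 0.0.0

dependencies:
- base >=4.9 && <5
- yesod

# No exposed-modules field needed: hpack infers the module list
# from the files it finds under source-dirs.
library:
  source-dirs: src

executables:
  mysite:
    main: main.hs
    source-dirs: app
    dependencies:
    - mysite
```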

Next time you create a scaffolded Yesod project (by running, e.g. stack new mysite yesod-postgres), you'll automatically get this updated scaffolding.

June 07, 2017 05:15 AM

Dan Piponi (sigfpe)

A relaxation technique


Sometimes you want to differentiate the expected value of something. I've written about some tools that can help with this. For example you can use Automatic Differentiation for the derivative part and probability monads for the expectation. But the probability monad I described in that article computes the complete probability distribution for your problem. Frequently this is intractably large. Instead people often use Monte Carlo methods. They'll compute the "something" many times, substituting pseudo-random numbers for the random variables, and then average the results. This provides an estimate of the expected value and is ubiquitous in many branches of computer science. For example it's the basis of ray-tracing and path-tracing algorithms in 3D rendering, and plays a major role in machine learning when used in the form of stochastic gradient descent.

But there's a catch. Suppose we want to compute the derivative, with respect to p, of the expected value of some function of random variables X1, …, Xn, where each Xi is drawn from the Bernoulli distribution B(p). I.e. each Xi has a probability p of being 1 and probability 1-p of being 0. If we compute this using a Monte Carlo approach we'll repeatedly generate pseudo-random numbers for each of the Xi. Each one will be 0 or 1. This means that our estimate depends on p via subexpressions that can't meaningfully be differentiated with respect to p. So how can we use automatic differentiation with the Monte Carlo method? I'm proposing an approach that may or may not already be in the literature. Whether it is or not, I think it's fun to get there by combining many of the things I've previously talked about here, such as free monads, negative probabilities and automatic differentiation. I'm going to assume you're familiar with using dual numbers to compute derivatives as I've written about this before and Wikipedia has the basics.

A probability monad

I want to play with a number of different approaches to using monads with probability theory. Rather than define lots of monads I think that the easiest thing is to simply work with one free monad and then provide different interpreters for it.

First some imports:

> import Control.Monad
> import qualified System.Random as R
> import qualified Data.Map.Strict as M

I'm going to use a minimal free monad that effectively gives us a DSL with a new function that allows us to talk about random Bernoulli variables:

> data Random p a = Pure a | Bernoulli p (Int -> Random p a)

The idea is that Pure a represents the value a and Bernoulli p f is used to say "if we had a random value x, f x is the value we're interested in". The Random type isn't going to do anything other than represent these kinds of expressions. There's no implication that we actually have a random value for x yet.

> instance Functor (Random p) where
>     fmap f (Pure a) = Pure (f a)
>     fmap f (Bernoulli p g) = Bernoulli p (fmap f . g)

> instance Applicative (Random p) where
>     pure = return
>     (<*>) = ap

> instance Monad (Random p) where
>     return = Pure
>     Pure a >>= f = f a
>     Bernoulli p g >>= f = Bernoulli p (\x -> g x >>= f)

We'll use bernoulli p to represent a random Bernoulli variable drawn from B(p).

> bernoulli :: p -> Random p Int
> bernoulli p = Bernoulli p return

So let's write our first random expression:

> test1 :: Random Float Float
> test1 = do
>     xs <- replicateM 4 (bernoulli 0.75)
>     return $ fromIntegral $ sum xs

It sums 4 Bernoulli random variables drawn from B(0.75) and converts the result to a Float. The expected value is 3.

We don't yet have a way to do anything with this expression. So let's write an interpreter that can substitute pseudo-random values for each occurrence of bernoulli p. It's essentially interpreting our free monad as a state monad where the state is the random number seed:

> interpret1 :: (Ord p, R.Random p, R.RandomGen g) => Random p a -> g -> (a, g)
> interpret1 (Pure a) seed = (a, seed)
> interpret1 (Bernoulli prob f) seed =
>     let (r, seed') = R.random seed
>         b = if r <= prob then 1 else 0
>     in interpret1 (f b) seed'

You can use the expression R.getStdRandom (interpret1 test1) if you want to generate some random samples for yourself.

We're interested in the expected value, so here's a function to compute that:

> expect1 :: (Fractional p, Ord p, R.Random p, R.RandomGen g) => Random p p -> Int -> g -> (p, g)
> expect1 r n g =
>     let (x, g') = sum1 0 r n g
>     in (x/fromIntegral n, g')

> sum1 :: (Ord p, Num p, R.Random p, R.RandomGen g) => p -> Random p p -> Int -> g -> (p, g)
> sum1 t r 0 g = (t, g)
> sum1 t r n g =
>     let (a, g') = interpret1 r g
>     in sum1 (t+a) r (n-1) g'

You can test it out with R.getStdRandom (expect1 test1 1000). You should get values around 3.

We can try completely different semantics for Random. This time we compute the entire probability distribution:

> interpret2 :: (Num p) => Random p a -> [(a, p)]
> interpret2 (Pure a) = [(a, 1)]
> interpret2 (Bernoulli p f) =
>     scale p (interpret2 (f 1)) ++ scale (1-p) (interpret2 (f 0))

> scale :: Num p => p -> [(a, p)] -> [(a, p)]
> scale s = map (\(a, p) -> (a, s*p))

You can try it with interpret2 test1.

Unfortunately, as it stands it doesn't collect together multiple occurrences of the same value. We can do that with this function:

> collect :: (Ord a, Num b) => [(a, b)] -> [(a, b)]
> collect = M.toList . M.fromListWith (+)

And now you can use collect (interpret2 test1).

Let's compute some expected values:

> expect2 :: (Num p) => Random p p -> p
> expect2 r = sum $ map (uncurry (*)) (interpret2 r)

The value of expect2 test1 should be exactly 3. One nice thing about interpret2 is that it is differentiable with respect to the Bernoulli parameter when this is meaningful. Unfortunately it has one very big catch: the value of interpret2 can be a very long list. Even a small simulation can result in lists too big to store in the known universe. But interpret1 doesn't produce differentiable results. Is there something in-between these two interpreters?

Importance sampling

Frequently in Monte Carlo sampling it isn't convenient to sample from the distribution you want. For example it might be intractably hard to do so, or you might have proven that the resulting estimate has a high variance. So instead you can sample from a different, but possibly related distribution. This is known as importance sampling. Whenever you do this you must keep track of how "wrong" your probability was and patch up your expectation estimate at the end. For example, suppose a coin comes up heads 3/4 of the time. Instead of simulating a coin toss that comes up 3/4 of the time you could simulate one that comes up heads half of the time. Suppose at one point in the simulation it does come up heads. Then you used a probability of 1/2 when you should have used 3/4. So when you compute the expectation you need to scale the contribution from this sample by (3/4)/(1/2) = 3/2. You need to scale appropriately for every random variable used. A straightforward way to see this for the case of a single Bernoulli variable is to note that

p f(1) + (1-p) f(0) = q (p/q) f(1) + (1-q) ((1-p)/(1-q)) f(0).

We've replaced probabilities p and 1-p with q and 1-q, but we had to scale appropriately in each of the cases x = 1 and x = 0 to keep the final value the same. I'm going to call the scale value the importance. If we generate several random numbers in a row we need to multiply all of the importance values that we generate. This is a perfect job for the Writer monad using the Product monoid. (See Eric Kidd's paper for some discussion about the connection between Writer and importance sampling.) However I'm just going to write an explicit interpreter for our free monad to make it clear what's going where.
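To make the reweighting concrete, here's a tiny self-contained check (my own illustration, with a made-up payoff function f; it is not part of the post's development): sampling with probabilities q and 1-q while carrying importance weights p/q and (1-p)/(1-q) leaves the expectation unchanged.

```haskell
-- True probability p = 3/4, sampling probability q = 1/2.
p, q :: Double
p = 0.75
q = 0.5

-- An arbitrary payoff for heads (1) and tails (0).
f :: Int -> Double
f 1 = 10.0
f _ = 2.0

-- Expectation under the true probabilities, and the reweighted
-- version sampled at q with importance corrections.
direct, reweighted :: Double
direct     = p * f 1 + (1 - p) * f 0
reweighted = q * (p / q) * f 1 + (1 - q) * ((1 - p) / (1 - q)) * f 0

main :: IO ()
main = print (direct, reweighted)  -- prints (8.0,8.0)
```

Both expressions evaluate to 8.0: the importance weights exactly cancel the "wrong" sampling probabilities.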

This interpreter is going to take an additional argument as input. It'll be a rule saying what probability we should sample with when handling a variable drawn from B(p). The probability should be a real number in the interval [0,1].

> interpret3 :: (Fractional p, R.RandomGen g) =>
>     (p -> Float) -> Random p a -> g -> ((a, p), g)
> interpret3 rule (Pure a) g = ((a, 1), g)
> interpret3 rule (Bernoulli p f) g =
>     let (r, g') = R.random g
>         prob = rule p
>         (b, i) = if (r :: Float) <= prob
>                     then (1, p/realToFrac prob)
>                     else (0, (1-p)/realToFrac (1-prob))
>         ((a, i'), g'') = interpret3 rule (f b) g'
>     in ((a, i*i'), g'')

Here's the accompanying code for the expectation:

> expect3 :: (Fractional p, R.RandomGen g) =>
>     (p -> Float) -> Random p p -> Int -> g -> (p, g)
> expect3 rule r n g =
>     let (x, g') = sum3 rule 0 r n g
>     in (x/fromIntegral n, g')

> sum3 :: (Fractional p, R.RandomGen g) =>
>     (p -> Float) -> p -> Random p p -> Int -> g -> (p, g)
> sum3 rule t r 0 g = (t, g)
> sum3 rule t r n g =
>     let ((a, imp), g') = interpret3 rule r g
>     in sum3 rule (t+a*imp) r (n-1) g'

For example, you can estimate the expectation of test1 using unbiased coin tosses by evaluating R.getStdRandom (expect3 (const 0.5) test1 1000).

Generalising probability

Did you notice I made my code slightly more general than seems to be needed? Although I use probabilities of type Float to generate my Bernoulli samples, the argument to the function bernoulli can be of a more general type. This means that we can use importance sampling to compute expected values for generalised measures that take values in a more general algebraic structure than the interval [0,1]. For example, we could use negative probabilities. An Operational Interpretation of Negative Probabilities and No-Signalling Models by Abramsky and Brandenburger gives a way to interpret expressions involving negative probabilities. We can implement it using interpret3 and the rule \p -> abs p/(abs p+abs (1-p)). Note that it is guaranteed to produce values in the range [0,1] (if you start with dual numbers with real parts that are ordinary probabilities) and reproduces the usual behaviour when given ordinary probabilities.
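As a quick sanity check (my own illustration, not from the post), the rule behaves as claimed: it leaves ordinary probabilities untouched and maps a "probability" outside [0,1], such as 2, back into [0,1].

```haskell
-- The sampling rule from the text, specialised to Double.
rule :: Double -> Double
rule p = abs p / (abs p + abs (1 - p))

main :: IO ()
main = do
  print (rule 0.75)  -- ordinary probability, unchanged: 0.75
  print (rule 2)     -- "probability" 2 maps to 2/(2+1) = 2/3
```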

Here's a simple expression using a sample from a "Bernoulli distribution" with parameter 2:

> test2 = do
>     a <- bernoulli 2
>     return $ if a==1 then 2.0 else 1.0

Its expected value is 3. We can get this exactly using expect2 test2. For a Monte Carlo estimate use

R.getStdRandom (expect3 (\p -> abs p/(abs p+abs (1-p))) test2 1000)

Note that estimates involving negative probabilities can have quite high variances so try a few times until you get something close to 3 :-)

We don't have to stick with real numbers. We can use this approach to estimate with complex probabilities (aka quantum mechanics) or other algebraic structures.

Discrete yet differentiable

And now comes the trick: automatic differentiation uses the algebra of dual numbers. It's not obvious at all what a probability like p + ε means when ε is infinitesimal. However, we can use interpret3 to give it meaningful semantics.

Let's define the duals in the usual way first:

> data Dual a = D { real :: a, infinitesimal :: a }

> instance (Ord a, Num a) => Num (Dual a) where
>     D a b + D a' b' = D (a+a') (b+b')
>     D a b * D a' b' = D (a*a') (a*b'+a'*b)
>     negate (D a b) = D (negate a) (negate b)
>     abs (D a b) = if a > 0 then D a b else D (-a) (-b)
>     signum (D a b) = D (signum a) 0
>     fromInteger a = D (fromInteger a) 0

> instance (Ord a, Fractional a) => Fractional (Dual a) where
>     fromRational a = D (fromRational a) 0
>     recip (D a b) = let ia = 1/a in D ia (-b*ia*ia)

> instance Show a => Show (Dual a) where
>     show (D a b) = show a ++ "[" ++ show b ++ "]"

Now we can use the rule real to extract a real-valued probability from a dual number. The function expect3 will push the infinitesimal part into the importance value so it doesn't get forgotten about. And now expect3 gives us an estimate that is differentiable despite the fact that our random variables are discrete.

Let's try an expression:

> test3 p = do
>     a <- bernoulli p
>     b <- bernoulli p
>     return $ if a == 1 && b == 1 then 1.0 else 0.0

The expected value is p² and the derivative is 2p. We can evaluate both exactly at p = 0.5 with expect2 (test3 (D 0.5 1)). And we can estimate it with

R.getStdRandom (expect3 real (test3 (D 0.5 1)) 1000)

What's neat is that we can parameterise our distributions in a more complex way and we can freely mix with conventional expressions in our parameter. Here's an example:

> test4 p = do
>     a <- bernoulli p
>     b <- bernoulli (p*p)
>     return $ p*fromIntegral a*fromIntegral b

Try evaluating expect2 (test4 (D 0.5 1)) and

R.getStdRandom (expect3 real (test4 (D 0.5 1)) 1000)

I've collected the above examples together here:

> main = do
>     print =<< R.getStdRandom (interpret1 test1)
>     print $ collect $ interpret2 test1
>     print =<< R.getStdRandom (expect1 test1 1000)
>     print (expect2 test1)
>     print =<< R.getStdRandom (expect3 id test1 1000)
>     print =<< R.getStdRandom (expect3 (const 0.5) test1 1000)
>     print "---"
>     print $ expect2 test2
>     print =<< R.getStdRandom (expect3 (\p -> abs p/(abs p+abs (1-p))) test2 1000)
>     print "---"
>     print $ expect2 (test3 (D 0.5 1))
>     print =<< R.getStdRandom (expect3 real (test3 (D 0.5 1)) 1000)
>     print "---"
>     print $ expect2 (test4 (D 0.5 1))
>     print =<< R.getStdRandom (expect3 real (test4 (D 0.5 1)) 1000)

What just happened?

You can think of a dual number as a real number that has been infinitesimally slightly deformed. To differentiate something we need to deform something. But we can't deform 0 or 1 and have them stay 0 or 1. So the trick is to embed probability sampling in something "bigger", namely importance sampling, where samples carry around an importance value. This bigger thing does allow infinitesimal deformations. And that allows differentiation. This process of turning something discrete into something continuously "deformable" is generally called relaxation.

Implementation details

I've made no attempt to make my code fast. However I don't think there's anything about this approach that's incompatible with performance. There's no need to use a monad. Instead you can track the importance value through your code by hand and implement everything in C. Additionally, I've previously written about the fact that for any trick involving forward mode AD there is another corresponding trick you can use with reverse mode AD. So this method is perfectly compatible with back-propagation. Note also that the dual number importances always have real part 1 which means you don't actually need to store them.

The bad news is that the derivative estimate can sometimes have a high variance. Nonetheless, I've used it successfully for some toy optimisation problems. I don't know if this approach is effective for industrial strength problems. Your mileage may vary :-)


Sometimes you may find that it is acceptable to deform the samples from your discrete distribution. In that case you can use the concrete relaxation.

Continuous variables

The above method can be adapted to work with continuous variables. There is a non-trivial step which I'll leave as an exercise but I've tested it in some Python code. I think it reproduces a standard technique and it gives an alternative way to think about that trick. That article is also useful for ways to deal with the variance issues. Note also that importance sampling is normally used itself as a variance reduction technique. So there are probably helpful ways to modify the rule argument to interpret3 to simultaneously estimate derivatives and keep the variance low.

Personal note

I've thought about this problem a couple of times over the years. Each time I've ended up thinking "there's no easy way to extend AD to work with random variables so don't waste any more time thinking about it". So don't listen to anything I say. Also, I like that this method sort of comes "for free" once you combine methods I've described previously.


I think it was Eric Kidd's paper on building probability monads that first brought to my attention that there are many kinds of semantics you can use with probability theory - i.e. there are many interpreters you can write for the Random monad. I think there is an interesting design space worth exploring here.

Answer to exercise

I set the continuous case as an exercise above. Here is a solution.

Suppose you're sampling from a distribution parameterised by θ with pdf f(θ, x). To compute the derivative with respect to θ you need to consider sampling from f(θ + dθ, x) where dθ is an infinitesimal.

As we don't know how to sample from a pdf with infinitesimals in it, we instead sample using f(θ, x) as usual, but use an importance of

f(θ + dθ, x)/f(θ, x) = 1 + dθ (∂/∂θ) log f(θ, x).

The coefficient of the dθ gives the derivative. So we need to compute the expectation, scaling each sample with this coefficient. In other words, to estimate (∂/∂θ) E[g(X)] we use the average, over N samples, of g(xᵢ) (∂/∂θ) log f(θ, xᵢ), where the xᵢ are drawn from the original distribution. This is exactly what is described at Shakir Mohamed's blog.

Final word

I managed to find the method in the literature. It's part of the REINFORCE method. For example, see equation (5) there.
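For comparison, here's a rough standalone sketch of that score-function (REINFORCE) estimator for a single Bernoulli variable, with g(x) = x so the true derivative d/dp E[g] is 1. All names here are mine, for illustration only, and it is independent of the literate code above.

```haskell
import System.Random

-- Estimate d/dp E[X] for X ~ Bernoulli(p) via the score function:
-- d/dp log f(p, x) = x/p - (1-x)/(1-p), averaged against g(x) = x.
estimateGrad :: Double -> Int -> StdGen -> Double
estimateGrad p n g0 = total / fromIntegral n
  where
    rs = take n (randomRs (0, 1) g0) :: [Double]
    score x = x / p - (1 - x) / (1 - p)
    total = sum [ let x = if r <= p then 1 else 0
                  in x * score x
                | r <- rs ]

main :: IO ()
main = do
  g <- getStdGen
  -- The true derivative is 1, so this should print something near 1.
  print (estimateGrad 0.75 100000 g)
```

The estimate converges to 1 but, as the post warns, score-function estimators like this can have high variance.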

by Dan Piponi ( at June 07, 2017 03:32 AM

June 06, 2017

Philip Wadler

Monbiot: I’ve never voted with hope before. Jeremy Corbyn has changed that

Leave it to George Monbiot to make the most effective case for Labour.
On policy after policy, the Labour manifesto accords with what people say they want. It offers a strong and stable National Health Service, in which privatisation is reversed, clinical budgets rise and staff are properly paid. It promises more investment in schools, smaller class sizes, and an end to the stifling micromanagement driving teachers out of the profession. It will restore free education at universities. It will ensure that railways, water, energy and the postal service are owned for the benefit of everyone, rather than only the bosses and shareholders. It will smoke out tax avoidance, and bring the banks under control.
While Theresa May will use Brexit as a wrecking ball to be swung at workers’ rights, environmental laws and other regulations the Conservative party has long wanted to destroy, Labour has promised to enhance these public protections. It will ban zero-hours contracts, prevent companies from forcing their staff into bogus self-employment, and give all workers – whether temporary or permanent – equal rights. The unemployed will be treated with respect. Both carers and people with disabilities will be properly supported. Those who need homes will find them, and tenants will be protected from the new generation of rack-renting slumlords. Who, apart from the richest beneficiaries of the current regime, would not wish to live in such a nation?  ...
[May] won’t stand up to anyone who wields power. She will say nothing against Donald Trump, even when he peddles blatant falsehoods in the aftermath of terrorist attacks in this nation, exploiting our grief to support his disgusting prejudices; even when he pulls out of the global agreement on climate change.
She is even more sycophantic towards this revolting man than Tony Blair was to George W Bush. She won’t confront Saudi Arabia over terrorism or Yemen or anything else. ...
She won’t stand up to the polluters lavishly funding the Conservative party, whose role explains both her weakness on climate change and her miserable failure to address our air pollution crisis. She won’t stand up to the fanatics in her party who call for the hardest of possible Brexits. She won’t stand up on television to debate these policies because she knows that the more we see, the less we like. The party machine’s attempt to build a personality cult around her fell at an obvious hurdle: first, you need a personality.  ...
The election now hangs on whether the young people who claim they will vote Labour are prepared to act on this intention. We know that older Conservative voters will make good their promise: they always do. Will the young electors, who will lose most from another five years of unresponsive government, walk a couple of hundred metres to their polling stations? Or will they let this unprecedented chance to change the nation slip through their fingers? The world belongs to those who turn up.

by Philip Wadler ( at June 06, 2017 09:05 PM

Marking the Death of Zhi Min Soh

I slipped a couple of years ago on the tram tracks, a hundred meters from where this accident happened. I broke my little finger, Zhi Min Soh lost her life. Please consider attending tomorrow morning's event in her memory.
The death on Wednesday (31st May) of a young woman in Edinburgh has hit a nerve with cyclists across Scotland. She appears to have been killed after her bike slipped on the tram tracks on Princes Street.
Edinburgh’s tram tracks have been described as an accident waiting to happen from the moment they were unveiled. [H]undreds of cyclists have been injured in falls on the tracks, and thousands more have had close shaves. This Wednesday (7th June), at 8:30 am, cyclists in Edinburgh will be marking Zhi Min Soh’s death. There will be a short, respectful protest at the junction where she died, reflecting the emotion that has bubbled up in the days since this senseless death. Although we are not organising it, we fully support this action and ask anyone who can to come and join in, on bike or on foot, whether you cycle or not. If you can attend, please make your way directly to Shandwick Place for 8:30 a.m. If you can, bring a sign or a placard letting people know what it is about. People will gather at the junction for a minute’s silence, and a lament from a piper, to remember this death and to ask the City of Edinburgh to take action to ensure that it will be the last.

by Philip Wadler at June 06, 2017 11:46 AM

Michael Snoyman

How to send me a pull request

I find myself repeating a lot of the same comments in pull requests, so I decided to put together a list of what I consider the most important features of a good pull request. Other people will have different feelings on some of these, but the points below are what apply to my projects. If you have thoughts on things I've left out, or reasons why you disagree with these points, please comment below.

Many of these points only make sense for source code, and even more specifically for code written in Haskell. My content repos (like this site's content) and non-Haskell repos (do I have any of those???) would be slightly different.

NOTE: I'm not the maintainer of Stack, so the comments below do not necessarily apply there. Stack has its own contribution rules, so please don't take my personal opinions here as relevant to that project.

  • Every top-level identifier exported from a library needs to have a Haddock comment. It's irrelevant if the identifier name is completely self-commenting; a comment is still necessary. (These are the comments that look like -- | Describe what the function does.)
  • My packages all follow PVP-style version numbers, and pull requests should include bumps to the version number. For those unfamiliar: PVP version numbers consist of four components, A.B.C.D.

    • If your change fixes a bug without modifying the API at all, then bump D (1.2.3.4 becomes 1.2.3.5, or 1.2.3 becomes 1.2.3.1).
    • If your change adds something new to the API without changing something that already exists, bump C (1.2.3.4 becomes 1.2.4).
    • If you change the existing API (e.g., remove a function, change semantics, modify a data type), bump either A or B (1.2.3.4 becomes either 1.3.0 or 2.0.0, depending on how big a change you think this is).

      • By the way, I'm unlikely to include a breaking change unless you have a really good reason. I consider backwards compatibility really important. Consider exporting a new function and (optionally) deprecating the old one instead.
  • To elaborate on motivation for the previous point: I follow a policy of releasing new code fairly quickly in most cases, as opposed to batching up a number of changes. In that situation, it makes sense for a PR to include the new version number immediately. Many other projects work differently, and do not encourage contributors to do version bumps.

    • Also, I sometimes forget to make a new release after merging a PR. If I do forget, don't be shy about pinging me to make a release.
  • Include a @since annotation on each new identifier, giving the new version number you just bumped to. This is vital for letting users of a library easily specify correct lower bounds on their dependencies.

    • Don't include @since for un-exported identifiers.
    • If you are exporting a pre-existing identifier that was previously not exported, include the @since.
    • In other words: @since indicates when it was added to the public API, not when it came into existence.


    -- | Download a JPEG image.
    --
    -- @since 1.5.2
    downloadJPEG :: MonadIO m
                 => Request -- ^ URL to download from
                 -> m JPEG
  • Include a ChangeLog entry. If the project doesn't have a ChangeLog, create one. Odds are the content you write in the changelog can be identical to the pull request description. This is a huge end-user convenience. Example:

    ## 1.5.2
    * Added the `downloadJPEG` function
  • Slight exception to the above: if you're making a doc-only PR, don't bother with a version bump, but instead add a ChangeLog entry with ## Unreleased and a description of the change.
  • Do not include unrelated changes in your PR: they make it difficult to review, more likely to be delayed, and more likely to conflict with other changes. Put separate changes in separate PRs.
  • Keep coding style consistent with the rest of the code. I'm not a big stickler for coding style guidelines in general, but I do consider it very jarring to have conflicting styles in the same file, or, even worse, the same function.
  • Similar to the previous point: think hard before sending a pull request to modify the style of code. Again, I'm not a big stickler on coding style, and I consider style in general a pretty arbitrary and unimportant aspect of the code. But that makes me even less likely to want to spend my time reviewing and discussing changes like whether records should be aligned. (For the record, I don't think they should be aligned, as it makes diffs noisier than necessary.)
  • If you have a PR for addressing a typo, making a trivial fix, or adding a straightforward feature: just send a PR without any prior discussion. However, if you want to make a major overhaul, change behavior, or break API compatibility: it's worth opening an issue first to discuss. I don't like rejecting PRs or causing people to waste their time, so I'd rather hash out details before you spend a lot of time on something.
  • Don't use partial functions. I've received lots of PRs which, for example, use fromJust because "by analyzing the rest of the code, you can tell it will always be a Just value and not Nothing." That doesn't cut it for me:

    • I'm lazy, and I don't want to reason about the code. I want the compiler to do it for me.
    • I'm stupid, and I don't trust my own reasoning.
    • Such logic does not withstand future refactorings of the code, making it fragile.

    There are definitely some exceptions to this rule, but odds are pretty good your case won't be one of them :).

  • Added test cases are always a good thing. They also make it easier for me to understand why you're writing this PR, and ensure I don't break your work in the future (remember from the previous point: I'm stupid).
  • When submitting PRs to libraries (as opposed to an application):

    • Keep compatibility with older versions of dependencies whenever possible. I try to keep as broad a range of potential package versions as I can to help avoid "dependency hell." (For the record: dependency hell is not related to tooling in any way, it's an intrinsic aspect of having dependencies.)
    • In some cases, if you drop compatibility with a major version of a dependency (e.g., change from transformers >= 0.3 to transformers >= 0.4), I may consider it a breaking change in the library worthy of a major version bump.
    • Avoid adding dependencies. I personally am not of the opinion that reducing the dependency footprint is that important, and strongly believe that such behavior in general leads to Not Invented Here (NIH) syndrome. However, since enough users of my libraries do feel this way, it's easier on me if you don't incur unnecessary dependencies.
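One of the points above — avoiding partial functions — is easiest to see in code. Here is a short sketch with a hypothetical lookupAge function (not taken from any particular project): instead of reaching for fromJust and reasoning informally that the value "must" be a Just, make the possibility of failure explicit so the compiler tracks it for you.

```haskell
-- Hypothetical example of avoiding a partial function. The partial
-- style would be:  age = fromJust (lookup name db)  -- crashes when wrong.
-- The total style makes failure part of the type, so every caller
-- is forced by the compiler to handle it.
lookupAge :: String -> [(String, Int)] -> Either String Int
lookupAge name db =
    case lookup name db of
        Just age -> Right age
        Nothing  -> Left ("no such person: " ++ name)

main :: IO ()
main = do
    print (lookupAge "ann" [("ann", 30)])  -- Right 30
    print (lookupAge "bob" [("ann", 30)])  -- Left "no such person: bob"
```

The total version also survives future refactorings: if the invariant that made fromJust "safe" ever breaks, the type error (or the Left case) surfaces immediately instead of as a runtime crash.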

June 06, 2017 10:00 AM

Wolfgang Jeltsch

Generic programming in Haskell

Generic programming is a powerful way to define a function that works in an analogous way for a class of types. In this article, I describe the latest approach to generic programming that is implemented in GHC. This approach goes back to the paper A Generic Deriving Mechanism for Haskell by José Pedro Magalhães, Atze Dijkstra, Johan Jeuring, and Andres Löh.

This article is a writeup of a Theory Lunch talk I gave on 4 February 2016. As usual, the source of this article is a literate Haskell file, which you can download, load into GHCi, and play with.


Parametric polymorphism allows you to write functions that deal with values of any type. An example of such a function is the reverse function, whose type is [a] -> [a]. You can apply reverse to any list, no matter what types the elements have.

However, parametric polymorphism does not allow your functions to depend on the structure of the concrete types that are used in place of type variables. So values of these types are always treated as black boxes. For example, the reverse function only reorders the elements of the given list. A function of type [a] -> [a] could also drop elements (like the tail function does) or duplicate elements (like the cycle function does), but it could never invent new elements (except for ⊥) or analyze elements.

Now there are situations where a function is suitable for a class of types that share certain properties. For example, the sum function works for all types that have a notion of binary addition. Haskell uses type classes to support such functions. For example, the Num class provides the method (+), which is used in the definition of sum, whose type Num a => [a] -> a contains a respective class constraint.

The methods of a class have to be implemented separately for every type that is an instance of the class. This is reasonable for methods like (+), where the implementations for the different instances differ fundamentally. However, it is unfortunate for methods that are implemented in an analogous way for most of the class instances. An example of such a method is (==), since there is a canonical way of checking values of algebraic data types for equality. It works by first comparing the outermost data constructors of the two given values and, if they match, the individual fields. Only when the data constructors and all the fields match are the two values considered equal.

For several standard classes, including Eq, Haskell provides the deriving mechanism to generate instances with default method implementations whose precise functionality depends on the structure of the type. Unfortunately, there is no possibility in standard Haskell to extend this deriving mechanism to user-defined classes. Generic programming is a way out of this problem.


For generic programming, we need several language extensions. The good thing is that only one of them, DeriveGeneric, is specific to generic programming. The other ones have uses in other areas as well. Furthermore, DeriveGeneric is a very small extension. So the generic programming approach we describe here can be considered very lightweight.

We state the full set of necessary extensions with the following pragma:

{-# LANGUAGE DefaultSignatures,
             DeriveGeneric,
             FlexibleContexts,
             TypeFamilies,
             TypeOperators #-}

Apart from these language extensions, we need the module GHC.Generics:

import GHC.Generics

Our running example

As our running example, we pick serialization and deserialization of values. Serialization means converting a value into a bit string, and deserialization means parsing a bit string in order to get back a value.

We introduce a type Bit for representing bits:

data Bit = O | I deriving (Eq, Show)

Furthermore, we define the class of all types that support serialization and deserialization as follows:

class Serializable a where

    put :: a -> [Bit]

    get :: [Bit] -> (a, [Bit])

There is a canonical way of serializing values of algebraic data types. It works by first encoding the data constructor of the given value as a sequence of bits and then serializing the individual fields. To show this approach in action, we define an algebraic data type Tree, which is a type of labeled binary trees:

data Tree a = Leaf | Branch (Tree a) a (Tree a) deriving Show

An instantiation of Serializable for Tree that follows the canonical serialization approach can be carried out as follows:

instance Serializable a => Serializable (Tree a) where

    put Leaf                     = [O]
    put (Branch left root right) = [I]       ++
                                   put left  ++
                                   put root  ++
                                   put right

    get (O : bits) = (Leaf, bits)
    get (I : bits) = (Branch left root right, bits''') where

        (left,  bits')   = get bits
        (root,  bits'')  = get bits'
        (right, bits''') = get bits''

Of course, it quickly becomes cumbersome to provide such an instance declaration for every algebraic data type that should use the canonical serialization approach. So we want to implement the canonical approach once and for all and make it easily usable for arbitrary types that are amenable to it. Generic programming makes this possible.


An algebraic data type is essentially a sum of products where the terms “sum” and “product” are understood as follows:

  • A sum is a variant type. In Haskell, Either is the canonical type constructor for binary sums, and the empty type Void from the void package is the nullary sum.

  • A product is a tuple type. In Haskell, (,) is the canonical type constructor for binary products, and () is the nullary product.

The key idea of generic programming is to map types to representations that make the sum-of-products structure explicit and to implement canonical behavior based on these representations instead of the actual types.
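To make the sum-of-products view concrete before introducing the GHC.Generics machinery, the structure of the Tree type from above can be spelled out directly with Either and pairs. The TreeRep synonym and the conversion functions below are a hypothetical illustration, not part of GHC.Generics:

```haskell
-- The Tree type from the article.
data Tree a = Leaf | Branch (Tree a) a (Tree a) deriving Show

-- Hypothetical sum-of-products view: Tree a is a binary sum (Either)
-- of a nullary product (Leaf carries no fields) and a product of the
-- three Branch fields, expressed as nested pairs.
type TreeRep a = Either () (Tree a, (a, Tree a))

toRep :: Tree a -> TreeRep a
toRep Leaf           = Left ()
toRep (Branch l x r) = Right (l, (x, r))

fromRep :: TreeRep a -> Tree a
fromRep (Left ())           = Leaf
fromRep (Right (l, (x, r))) = Branch l x r

main :: IO ()
main = print (fromRep (toRep (Branch Leaf (1 :: Int) Leaf)))
```

The two conversion functions form an isomorphism, which is exactly the role played by from and to in the Generic class introduced below, except that GHC.Generics uses its own sum and product types instead of Either and (,).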

The GHC.Generics module defines a number of type constructors for constructing representations:

data V1 p

infixr 5 :+:
data (:+:) f g p = L1 (f p) | R1 (g p)

data U1 p = U1

infixr 6 :*:
data (:*:) f g p = f p :*: g p

newtype K1 i a p = K1 { unK1 :: a }

newtype M1 i c f p = M1 { unM1 :: f p }

All of these type constructors take a final parameter p. This parameter is relevant only when dealing with higher-order classes. In this article, however, we only discuss generic programming with first-order classes. In this case, the parameter p is ignored. The different type constructors play the following roles:

  • V1 is for the nullary sum.

  • (:+:) is for binary sums.

  • U1 is for the nullary product.

  • (:*:) is for binary products.

  • K1 is a wrapper for fields of algebraic data types. Its parameter i used to provide some information about the field at the type level, but is now obsolete.

  • M1 is a wrapper for attaching meta information at the type level. Its parameter i denotes the kind of the language construct the meta information refers to, and its parameter c provides access to the meta information.

The GHC.Generics module furthermore introduces a class Generic, whose instances are the types for which a representation exists. Its definition is as follows:

class Generic a where

  type Rep a :: * -> *

  from :: a -> (Rep a) p

  to :: (Rep a) p -> a

A type Rep a is the representation of the type a. The methods from and to convert from values of the actual type to values of the representation type and vice versa.

To see all this in action, we make Tree a an instance of Generic:

instance Generic (Tree a) where

    type Rep (Tree a) =
        M1 D D1_Tree (
            M1 C C1_Tree_Leaf U1
            :+:
            M1 C C1_Tree_Branch (
                M1 S NoSelector (K1 R (Tree a))
                :*:
                M1 S NoSelector (K1 R a)
                :*:
                M1 S NoSelector (K1 R (Tree a))
            )
        )

    from Leaf                     = M1 (L1 (M1 U1))
    from (Branch left root right) = M1 (
                                        R1 (
                                        M1 (
                                            M1 (K1 left)
                                            :*:
                                            M1 (K1 root)
                                            :*:
                                            M1 (K1 right)
                                        )
                                        )
                                    )

    to (M1 (L1 (M1 U1)))      = Leaf
    to (M1 (
            R1 (
            M1 (
                M1 (K1 left)
                :*:
                M1 (K1 root)
                :*:
                M1 (K1 right)
            )
            )
        ))                    = Branch left root right

The types D1_Tree, C1_Tree_Leaf, and C1_Tree_Branch are type-level representations of the type constructor Tree, the data constructor Leaf, and the data constructor Branch, respectively. We declare them as empty types:

data D1_Tree
data C1_Tree_Leaf
data C1_Tree_Branch

We need to make these types instances of the classes Datatype and Constructor, which are part of GHC.Generics as well. These classes provide a link between the type-level representations of type and data constructors and the meta information related to them. This meta information particularly covers the identifiers of the type and data constructors, which are needed when implementing canonical implementations for methods like show and read. The instance declarations for the Tree-related types are as follows:

instance Datatype D1_Tree where

  datatypeName _ = "Tree"

  moduleName _ = "Main"

instance Constructor C1_Tree_Leaf where

  conName _ = "Leaf"

instance Constructor C1_Tree_Branch where

  conName _ = "Branch"

Instantiating the Generic class as shown above is obviously an extremely tedious task. However, it is possible to instantiate Generic completely automatically for any given algebraic data type, using the deriving syntax. This is what the DeriveGeneric language extension makes possible.

So instead of making Tree a an instance of Generic by hand, as we have done above, we could have declared the Tree type as follows in the first place:

data Tree a = Leaf | Branch (Tree a) a (Tree a)
              deriving (Show, Generic)

Implementing canonical behavior

As mentioned above, we implement canonical behavior based on representations. Let us see how this works in the case of the Serializable class.

We introduce a new class Serializable' whose methods provide serialization and deserialization for representation types:

class Serializable' f where

    put' :: f p -> [Bit]

    get' :: [Bit] -> (f p, [Bit])

We instantiate this class for all the representation types:

instance Serializable' U1 where

    put' U1 = []

    get' bits = (U1, bits)

instance (Serializable' r, Serializable' s) =>
         Serializable' (r :*: s) where

    put' (rep1 :*: rep2) = put' rep1 ++ put' rep2

    get' bits = (rep1 :*: rep2, bits'') where

        (rep1, bits')  = get' bits
        (rep2, bits'') = get' bits'

instance Serializable' V1 where

    put' _ = error "attempt to put a void value"

    get' _ = error "attempt to get a void value"

instance (Serializable' r, Serializable' s) =>
         Serializable' (r :+: s) where

    put' (L1 rep) = O : put' rep
    put' (R1 rep) = I : put' rep

    get' (O : bits) = let (rep, bits') = get' bits in
                      (L1 rep, bits')
    get' (I : bits) = let (rep, bits') = get' bits in
                      (R1 rep, bits')

instance Serializable' r => Serializable' (M1 i a r) where

    put' (M1 rep) = put' rep

    get' bits = (M1 rep, bits') where

        (rep, bits') = get' bits

instance Serializable a => Serializable' (K1 i a) where

    put' (K1 val) = put val

    get' bits = (K1 val, bits') where

        (val, bits') = get bits

Note that in the case of K1, the context mentions Serializable, not Serializable', and the methods put' and get' call put and get, not put' and get'. The reason is that the value wrapped in K1 has an ordinary type, not a representation type.

Accessing canonical behavior

We can now apply canonical behavior to ordinary types using the methods from and to from the Generic class. For example, we can implement functions defaultPut and defaultGet that provide canonical serialization and deserialization for all instances of Generic:

defaultPut :: (Generic a, Serializable' (Rep a)) =>
              a -> [Bit]
defaultPut = put' . from

defaultGet :: (Generic a, Serializable' (Rep a)) =>
              [Bit] -> (a, [Bit])
defaultGet bits = (to rep, bits') where

    (rep, bits') = get' bits

We can use these functions in instance declarations for Serializable. For example, we can make Tree a an instance of Serializable in the following way:

instance Serializable a => Serializable (Tree a) where
    put = defaultPut

    get = defaultGet

Compared to the instance declaration we had initially, this one is a real improvement, since we do not have to implement the desired behavior of put and get by hand anymore. However, it still contains boilerplate code in the form of the trivial method declarations. It would be better to establish defaultPut and defaultGet as defaults in the class declaration:

class Serializable a where

    put :: a -> [Bit]
    put = defaultPut

    get :: [Bit] -> (a, [Bit])
    get = defaultGet

However, this is not possible, since the types of defaultPut and defaultGet are less general than the types of put and get, as they put additional constraints on the type a. Luckily, GHC supports the language extension DefaultSignatures, which allows us to give default implementations that have less general types than the actual methods (and consequently work only for those instances that are compatible with these less general types). Using DefaultSignatures, we can declare the Serializable class as follows:

class Serializable a where

    put :: a -> [Bit]
    default put :: (Generic a, Serializable' (Rep a)) =>
                   a -> [Bit]
    put = defaultPut

    get :: [Bit] -> (a, [Bit])
    default get :: (Generic a, Serializable' (Rep a)) =>
                   [Bit] -> (a, [Bit])
    get = defaultGet

With this class declaration in place, we can make Tree a an instance of Serializable as follows:

instance Serializable a => Serializable (Tree a)

With the minor extension DeriveAnyClass, which is provided by GHC starting from version 7.10, we can even use the deriving keyword to instantiate Serializable for Tree a. As a result, we only have to write the following in order to define the Tree type and make it an instance of Serializable:

data Tree a = Leaf | Branch (Tree a) a (Tree a)
              deriving (Show, Generic, Serializable)

So finally, we can use deriving clauses with our own classes just like with the Haskell standard classes, except that we additionally have to derive an instance of Generic.

Specialized implementations

Usually, not all instances of a class should or even can be generated by means of generic programming, but some instances have to be crafted by hand. For example, making Int an instance of Serializable requires manual work, since Int is not an algebraic data type.

However, there is no problem with this, since we still have the opportunity to write explicit instance declarations, despite the presence of a generic solution. This is in line with the standard deriving mechanism: you can make use of it, but you are not forced to do so. So we can have the following instance declaration, for example:

instance Serializable Int where

    put val = replicate val I ++ [O]

    get bits = (length is, bits') where

        (is, O : bits') = span (== I) bits

Of course, the serialization approach we use here is not very efficient, but the instance declaration illustrates the point we want to make.
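Putting the pieces of this article together, here is a condensed, self-contained version of the code (using the explicit instance declaration for Tree rather than DeriveAnyClass, and inlining defaultPut and defaultGet into the default method bodies) together with a small round trip:

```haskell
{-# LANGUAGE DefaultSignatures, DeriveGeneric,
             FlexibleContexts, TypeOperators #-}

import GHC.Generics

data Bit = O | I deriving (Eq, Show)

class Serializable a where
    put :: a -> [Bit]
    default put :: (Generic a, Serializable' (Rep a)) => a -> [Bit]
    put = put' . from

    get :: [Bit] -> (a, [Bit])
    default get :: (Generic a, Serializable' (Rep a)) => [Bit] -> (a, [Bit])
    get bits = let (rep, bits') = get' bits in (to rep, bits')

class Serializable' f where
    put' :: f p -> [Bit]
    get' :: [Bit] -> (f p, [Bit])

instance Serializable' U1 where
    put' U1   = []
    get' bits = (U1, bits)

instance (Serializable' r, Serializable' s) => Serializable' (r :*: s) where
    put' (rep1 :*: rep2) = put' rep1 ++ put' rep2
    get' bits = (rep1 :*: rep2, bits'')
      where (rep1, bits')  = get' bits
            (rep2, bits'') = get' bits'

instance (Serializable' r, Serializable' s) => Serializable' (r :+: s) where
    put' (L1 rep) = O : put' rep
    put' (R1 rep) = I : put' rep
    get' (O : bits) = let (rep, bits') = get' bits in (L1 rep, bits')
    get' (I : bits) = let (rep, bits') = get' bits in (R1 rep, bits')

instance Serializable' r => Serializable' (M1 i c r) where
    put' (M1 rep) = put' rep
    get' bits = let (rep, bits') = get' bits in (M1 rep, bits')

instance Serializable a => Serializable' (K1 i a) where
    put' (K1 val) = put val
    get' bits = let (val, bits') = get bits in (K1 val, bits')

-- Unary encoding of Int, as in the article.
instance Serializable Int where
    put val  = replicate val I ++ [O]
    get bits = let (is, O : bits') = span (== I) bits in (length is, bits')

data Tree a = Leaf | Branch (Tree a) a (Tree a)
              deriving (Show, Generic)

instance Serializable a => Serializable (Tree a)

main :: IO ()
main = do
    let tree = Branch Leaf (3 :: Int) Leaf
        bits = put tree
    print bits                          -- [I,O,I,I,I,O,O]
    print (fst (get bits) :: Tree Int)  -- Branch Leaf 3 Leaf
```

The first bit distinguishes Branch (I) from Leaf (O), and the next six bits are the unary encoding of the left subtree, the root label 3, and the right subtree.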

Tagged: functional programming, generic programming, GHC, Haskell, Institute of Cybernetics, literate programming, parametric polymorphism, talk, Theory Lunch, type class, type family, void (Haskell package)

by Wolfgang Jeltsch at June 06, 2017 12:11 AM

June 05, 2017

Roman Cheplyaka

Word vs Int

When dealing with bounded non-negative integer quantities in Haskell, should we use Word or Int to represent them?

Some argue that we should use Word because then we automatically know that our quantities are non-negative.

There is a maxim, famous in typed functional programming circles, that says “make illegal states unrepresentable”. Following this maxim generally leads to code that is more likely correct, but in each specific instance we should check that we are indeed getting closer to our goal (writing correct code) instead of following a cargo cult.

So let’s examine whether avoiding unrepresentable negative states serves us well here.

If our program is correct and never arrives at a negative result, it does not matter whether we use Int or Word. (I’m setting overflow issues aside for now.)

Thus, we only need to consider a case when we subtract a bigger number from a smaller number because of a logic flaw in our program.

Here is what happens depending on what type we use:

> 2 - 3 :: Int
-1
> 2 - 3 :: Word
18446744073709551615

Which answer would you prefer?

Even though technically -1 doesn’t satisfy our constraints and 18446744073709551615 does, I would choose -1 over 18446744073709551615 any day, for two reasons:

  1. There is a chance that some downstream computation will recognize a negative number and report an error.

    A stock exchange won’t let me buy -1 shares, and the engine won’t let me set the speed to -1 km/h. (These are terrible examples, I know, but hopefully they illustrate my point.)

    Will those systems also reject 18446744073709551615 shares or km/h? If they are well designed, yes, but I’d rather not test this in production.

  2. For a human, it is easier to notice the mistake if the answer does not make any sense at all than if the answer kinda makes sense.

    If an experienced programmer sees an unexpectedly huge number like 18446744073709551615, she will easily connect it to an underflow, although it’s an extra logical step she has to make. A less experienced programmer might spend quite a bit of time figuring this out.

In any case, I don’t see any advantage of replacing an invalid number such as -1 with a technically-valid-but-meaningless number like 18446744073709551615.

Ben Millwood said it well:

I moreover feel like, e.g. length :: [a] -> Word (or things of that ilk) would be even more of a mistake, because type inference will spread that Word everywhere, and 2 - 3 :: Word is catastrophically wrong. Although it seems nice to have an output type for non-negative functions that only has non-negative values, in fact Word happily supports subtraction, conversion from Integer, even negation (!) without a hint that anything has gone amiss. So I just don’t believe that it is a type suitable for general “positive integer” applications.


There is a way to get the best of both worlds, though: the Natural type from Numeric.Natural.

  1. It is an arbitrary-precision integer, so we don’t have to worry about overflow.
  2. It is a non-negative type, so the invalid state is not representable.
  3. It raises an exception upon underflow, making the errors in the code prominent:

    > 2 - 3 :: Natural
    *** Exception: arithmetic underflow

There are perhaps a couple of valid use cases for Word that I can think of, but they are fairly exotic. (I am not talking here about types such as Word32 or Word64, which are indispensable for bit manipulation.) Most of the time we should prefer either Int or Natural to Word.
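The three behaviours discussed above can be observed side by side in one small program. This is a sketch; the Word result assumes a 64-bit platform:

```haskell
import Control.Exception (ArithException (Underflow), evaluate, try)
import Numeric.Natural (Natural)

main :: IO ()
main = do
    print (2 - 3 :: Int)   -- -1: invalid, but easy to recognize
    print (2 - 3 :: Word)  -- 18446744073709551615 on a 64-bit system
    -- Natural raises an exception on underflow, making the bug loud.
    r <- try (evaluate (2 - 3 :: Natural))
    case r of
        Left Underflow -> putStrLn "Natural: arithmetic underflow"
        Left e         -> print e
        Right n        -> print n
```

Note that evaluate is needed so the underflow is forced inside try; without it, the lazily deferred exception would escape the handler.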

June 05, 2017 08:00 PM

June 04, 2017

Roman Cheplyaka

Universally stateless monads

Background: monad-control and stateless monads

The monad-control package allows us to lift IO functions such as

forkIO :: IO () -> IO ThreadId

catch :: Exception e => IO a -> (e -> IO a) -> IO a

allocate :: MonadResource m => IO a -> (a -> IO ()) -> m (ReleaseKey, a)

to IO-based monad stacks such as StateT Int (ReaderT Bool IO).

The core idea of the package is the associated type StM, which, for a given monad m and result type a, calculates the “state” of m at a.

The “state” of a monad is whatever the “run” function for this monad returns.

For instance, for StateT Int IO Char, we have

runStateT :: StateT Int IO Char -> Int -> IO (Char, Int)

The result type of this function (minus the outer monad constructor, IO, which is always there) is (Char, Int), and that is what StM (StateT Int IO) Char should expand to:

> :kind! StM (StateT Int IO) Char
StM (StateT Int IO) Char :: *
= (Char, Int)

In this case, StM m a is not the same as a; it contains a plus some extra information.

In other cases, StM m a may not contain an a at all; for instance

> :kind! StM (ExceptT Text IO) Char
StM (ExceptT Text IO) Char :: *
= Either Text Char

and we cannot always extract a Char from Either Text Char.

For some monads, though, StM m a reduces precisely to a. I call such monads “stateless”. A notable example is the reader monad:

> :kind! StM (ReaderT Int IO) Bool
StM (ReaderT Int IO) Bool :: *
= Bool

Note that a monad like ReaderT (IORef Int) IO is also stateless, even though one can use it to implement stateful programs.

The important feature of stateless monads is that we can fork them without duplicating the state and terminate them without losing the state. The monad-control package works best with stateless monads: it is less tricky to understand, and you can do some things with stateless monads that are hard or impossible to do with arbitrary MonadBaseControl monads.
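For instance, because running a reader monad yields the result directly, forking it amounts to re-running the computation in the new thread. The forkReaderT helper below is a hypothetical sketch written directly against transformers, without monad-control, just to show why statelessness makes this trivial:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Reader (ReaderT (..), ask)

-- Hypothetical helper: a ReaderT computation can be forked by simply
-- re-running it with the same environment in the child thread.
-- There is no state to duplicate or to lose when the thread exits.
forkReaderT :: ReaderT r IO () -> ReaderT r IO ()
forkReaderT action = ReaderT $ \r -> () <$ forkIO (runReaderT action r)

main :: IO ()
main = do
    done <- newEmptyMVar
    flip runReaderT (42 :: Int) $
        forkReaderT $ do
            r <- ask
            liftIO (putMVar done r)
    takeMVar done >>= print  -- prints 42
```

With a stateful transformer like StateT, by contrast, the child thread would get its own copy of the state, and any updates it makes would be silently discarded.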

Universally stateless monads

When both the monad m and the result type a are known, the compiler can expand the associated synonym StM m a and decide whether StM m a ~ a.

However, there are good reasons to keep the monad m polymorphic and instead impose the constraints (e.g. MonadReader Int m) that m must satisfy.

In this case, the compiler cannot know a priori that m is stateless, and we need to explicitly state that in the function signature. In Taking advantage of type synonyms in monad-control, I showed one such example: running a web application with access to the database. In order to convince the compiler that m is stateless, I needed to add the constraint

StM m ResponseReceived ~ ResponseReceived

to the type signature.

As you can see, this doesn’t quite say “monad m is stateless”; instead it says “monad m is stateless at type a” (where a is ResponseReceived in the above example).

This is fine if we only use monad-control at one result type. But if we use monad-control functions at many different types, the number of constraints required quickly gets out of hand.

As an example, consider the allocate function from resourcet’s Control.Monad.Trans.Resource:

allocate :: MonadResource m => IO a -> (a -> IO ()) -> m (ReleaseKey, a)

As the module’s documentation says,

One point to note: all register cleanup actions live in the IO monad, not the main monad. This allows both more efficient code, and for monads to be transformed.

In practice, it is often useful for the register and cleanup actions to live in the main monad. monad-control lets us lift the allocate function:

{-# LANGUAGE FlexibleContexts, TypeFamilies, ScopedTypeVariables #-}
import Control.Monad.Trans.Control
import Control.Monad.Trans.Resource

allocateM
  :: forall m a . (MonadBaseControl IO m, MonadResource m,
                   StM m a ~ a, StM m () ~ (), StM m (ReleaseKey, a) ~ (ReleaseKey, a))
  => m a -> (a -> m ()) -> m (ReleaseKey, a)
allocateM acquire release =
    liftBaseWith (\runInIO -> runInIO $ allocate
      (runInIO acquire)
      (runInIO . release))

This small function requires three different stateless constraints — constraints of the form StM m x ~ x — and two additional stateless constraints for each additional type a at which we use it.

These constraints are artefacts of the way type synonyms work in Haskell; StM m a is not really supposed to depend on a. If a monad is stateless, it is stateless at every result type.

In order to capture this universal statelessness in a single constraint, we can use Forall from the constraints package.

First, we need to transform the constraint StM m a ~ a to a form in which it can be partially applied, as we want to abstract over a. Simply saying

type StatelessAt m a = StM m a ~ a

won’t do because type synonyms need to be fully applied: StatelessAt m is not a valid type.

We need to use a trick to create a class synonym:

class    StM m a ~ a => StatelessAt m a
instance StM m a ~ a => StatelessAt m a

Now StatelessAt is a legitimate class constraint (not a type synonym), and so we can abstract over its second argument with Forall:

type Stateless m = Forall (StatelessAt m)

Now we only need to include the Stateless m constraint, and every time we need to prove that StM m a ~ a for some result type a, we wrap the monadic computation in assertStateless @a (...), where assertStateless is defined as follows:

assertStateless :: forall a m b . Stateless m => (StM m a ~ a => m b) -> m b
assertStateless action = action \\ (inst :: Stateless m :- StatelessAt m a)

The type signature of assertStateless is crafted in such a way that we only need to specify a, and m is inferred to be the “current” monad. We could have given assertStateless a more general type

assertStateless :: forall m a r . Stateless m => (StM m a ~ a => r) -> r

but then we would have to apply it explicitly to both m and a.
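Concretely, with a hypothetical action at result type Int (a sketch, assuming the Stateless and assertStateless definitions below), the chosen signature needs a single type application, with m inferred from context:

```haskell
{-# LANGUAGE TypeApplications #-}

example :: Stateless m => m Int -> m Int
example act = assertStateless @Int act
-- With the more general signature we would have to write
--   assertStateless @m @Int act
-- supplying the monad explicitly (and bringing m into scope with
-- ScopedTypeVariables).
```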

As an example of using assertStateless, let’s rewrite the lifted allocate function to include a single stateless constraint:

allocateM
  :: forall m a . (MonadBaseControl IO m, MonadResource m, Stateless m)
  => m a -> (a -> m ()) -> m (ReleaseKey, a)
allocateM acquire release =
  assertStateless @a $
  assertStateless @() $
  assertStateless @(ReleaseKey, a) $
    control $ \runInIO -> runInIO $ allocate
      (runInIO acquire)
      (runInIO . release)

Here, assertStateless generated all three StM m x ~ x constraints for us on the fly, from the single universally-quantified constraint Stateless m.

Stateless monad transformers

Let’s say we are writing a function that works in some stateless monad, m:

foo :: (MonadBaseControl IO m, MonadResource m, Stateless m) => m ()
foo = do ...

But locally, it adds another layer on top of m:

foo :: (MonadBaseControl IO m, MonadResource m, Stateless m) => m ()
foo = do
  thing <- getThing
  flip runReaderT thing $ do

And somewhere in there we need to allocate something:

foo :: (MonadBaseControl IO m, MonadResource m, Stateless m) => m ()
foo = do
  thing <- getThing
  flip runReaderT thing $ do
    resource <- allocateM acq rel

The compiler won’t accept this, though:

• Could not deduce: StM
                      m (Data.Constraint.Forall.Skolem (StatelessAt (ReaderT () m)))
                    ~ Data.Constraint.Forall.Skolem (StatelessAt (ReaderT () m))
    arising from a use of ‘allocateM’
  from the context: (MonadBaseControl IO m,
                     MonadResource m,
                     Stateless m)
    bound by the type signature for:
               foo :: (MonadBaseControl IO m, MonadResource m, Stateless m) =>
                      m ()

In order to run allocateM in the inner environment, ReaderT Thing m, we need to satisfy the constraint Stateless (ReaderT Thing m), which is different from the Stateless m that we have in scope.

If the acq and rel actions do not need to access the thing, we can avoid the problem by lifting the action to the outer environment:

foo :: (MonadBaseControl IO m, MonadResource m, Stateless m) => m ()
foo = do
  thing <- getThing
  flip runReaderT thing $ do
    resource <- lift $
      -- this happens in m
      allocateM acq rel

But what if acq and rel do need to know the thing?

In that case, we need to prove to the compiler that for every monad m, Stateless m implies Stateless (ReaderT Thing m). This should follow from the fact that ReaderT e is itself a “stateless transformer”, meaning that it doesn’t change the state of the monad that it transforms. As with Stateless, we put this in the form of a partially applicable class and then abstract over a (and m):

class    StM (t m) a ~ StM m a => StatelessTAt t (m :: * -> *) a
instance StM (t m) a ~ StM m a => StatelessTAt t m a

type StatelessT t = ForallV (StatelessTAt t)
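As a quick sanity check (a sketch assuming the definitions above plus Dict from Data.Constraint), ReaderT e satisfies the pointwise constraint, because monad-control defines StT (ReaderT e) a = a and hence StM (ReaderT e m) a reduces to StM m a:

```haskell
import Control.Monad.Trans.Reader (ReaderT)
import Data.Constraint (Dict (..))

-- GHC solves the equality by reducing StM (ReaderT e m) a to StM m a.
readerTIsStatelessT :: Dict (StatelessTAt (ReaderT e) m a)
readerTIsStatelessT = Dict
```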

Now we need to prove that StatelessT t and Stateless m together imply Stateless (t m). In the notation of the constraints package, this statement can be written as

statelessT :: forall t m . (StatelessT t, Stateless m) :- Stateless (t m)

How to prove it in Haskell is not completely obvious, and I recommend that you try it yourself. I also posted a simplified version of this exercise on Twitter the other day.

Anyway, here is one possible answer:

statelessT = Sub $ forall prf
  where
    prf :: forall a . (StatelessT t, Stateless m) => Dict (StatelessAt (t m) a)
    prf = Dict \\ (instV :: StatelessT t :- StatelessTAt t m a)
               \\ (inst  :: Stateless m  :- StatelessAt m a)

Finally, here is a function analogous to assertStateless that brings the Stateless (t m) constraint into scope:

liftStatelessT
  :: forall t m b . (StatelessT t, Stateless m)
  => (Stateless (t m) => (t m) b) -> (t m) b
liftStatelessT action = action \\ statelessT @t @m

And here is a minimal working example that demonstrates the usage of liftStatelessT:

foo :: (MonadBaseControl IO m, MonadResource m, Stateless m) => m ()
foo = do
  flip runReaderT () $ liftStatelessT $ do
    resource <- allocateM (return ()) (const $ return ())
    return ()

Complete code for Stateless and StatelessT

{-# LANGUAGE GADTs, ConstraintKinds, MultiParamTypeClasses, FlexibleInstances,
             ScopedTypeVariables, RankNTypes, AllowAmbiguousTypes,
             TypeApplications, TypeOperators, KindSignatures,
             UndecidableInstances, UndecidableSuperClasses #-}

module Stateless where

import Data.Constraint
import Data.Constraint.Forall
import Control.Monad.Trans.Control

class    StM m a ~ a => StatelessAt m a
instance StM m a ~ a => StatelessAt m a

-- | A constraint that asserts that a given monad is stateless
type Stateless m = Forall (StatelessAt m)

-- | Instantiate the stateless claim at a particular monad and type
assertStateless :: forall a m b . Stateless m => (StM m a ~ a => m b) -> m b
assertStateless action = action \\ (inst :: Stateless m :- StatelessAt m a)

class    StM (t m) a ~ StM m a => StatelessTAt t (m :: * -> *) a
instance StM (t m) a ~ StM m a => StatelessTAt t m a

-- | A statement that a monad transformer doesn't alter the state type
type StatelessT t = ForallV (StatelessTAt t)

-- | A proof that if @t@ is a stateless transformer and @m@ is a stateless monad,
-- then @t m@ is a stateless monad
statelessT :: forall t m . (StatelessT t, Stateless m) :- Stateless (t m)
statelessT = Sub $ forall prf
  where
    prf :: forall a . (StatelessT t, Stateless m) => Dict (StatelessAt (t m) a)
    prf = Dict \\ (instV :: StatelessT t :- StatelessTAt t m a)
               \\ (inst  :: Stateless m  :- StatelessAt m a)

-- | Derive the 'Stateless' constraint for a transformed monad @t m@
liftStatelessT
  :: forall t m b . (StatelessT t, Stateless m)
  => (Stateless (t m) => (t m) b) -> (t m) b
liftStatelessT action = action \\ statelessT @t @m

June 04, 2017 08:00 PM

June 02, 2017

Dominic Steinitz

Haskell for Numerics?



Back in January, a colleague pointed out to me that GHC did not produce very efficient code for performing floating point abs. I have yet to produce a write-up of my notes about hacking on GHC: in summary it wasn’t as difficult as I had feared and the #ghc folks were extremely helpful.

But maybe getting GHC to produce high-performance numerical code is “swimming uphill”. Below is a comparison of a “state of the art” numerical language, Julia, and an alternative Haskell approach using a Haskell domain-specific embedded language (DSEL), accelerate. The problem is a real one, albeit quite simple from a programming point of view, so the comparison should be taken with a grain of salt.

A bit more background

The reason for improving GHC’s (numerical) performance is that I’d like to have a fast way of performing statistical inference, either via Hamiltonian Monte Carlo or Sequential Monte Carlo. Stan is my “go to” tool for offline analysis, but its inference engine is fixed and, according to Fast Forward Labs, “it’s very awkward to productize”. What one would really like is the ability to choose and compose inference methods, as can be done in monad-bayes. Although I haven’t compared monad-bayes and Stan in much depth, the latter seems faster in my very limited experience.

The view of one of the folks who has worked hard on making Haskell a great tool for numerical applications is that trying to get GHC to produce LLVM on which tools like Polly can work is “swimming uphill”.

A lot of people who work on numerical problems use MATLAB, but it seems that if you want to put the code into production then you have to re-write it. Julia is an attempt to improve this situation and also to provide higher performance.

A sort of manifesto

To summarise: what we’d like is type-safe blazingly fast numerical code. Type-safe can give us e.g. static guarantees that matrix multiplication is between compatible matrices and assurances that the units are correct. Here’s an example of static typing helping ensure the correctness of Kalman filter usage and here’s a package that I have used successfully on a medium-sized project to ensure all units are correct.
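As a tiny illustration of the kind of static guarantee meant here (a hand-rolled sketch using GHC.TypeLits, not the API of any of the packages linked above), matrix dimensions can live in the types so that incompatible multiplications are rejected at compile time:

```haskell
{-# LANGUAGE DataKinds, KindSignatures #-}

import Data.List (transpose)
import GHC.TypeLits (Nat)

-- A matrix tagged with phantom type-level dimensions. (Nothing checks the
-- list lengths against the Nats here; a real library would hide the
-- constructor behind smart constructors.)
newtype Matrix (m :: Nat) (n :: Nat) = Matrix [[Double]]

-- Multiplication type-checks only when the inner dimensions agree:
-- an (m x n) matrix times an (n x p) matrix yields an (m x p) matrix.
matMul :: Matrix m n -> Matrix n p -> Matrix m p
matMul (Matrix a) (Matrix b) =
  Matrix [ [ sum (zipWith (*) row col) | col <- transpose b ] | row <- a ]
```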

Symplectic Integrators

Let’s take a simple solver, encode it in Haskell and Julia and see how well we can do.

> {-# OPTIONS_GHC -Wall                   #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults #-}
> {-# LANGUAGE FlexibleContexts #-}
> {-# LANGUAGE TypeOperators    #-}
> module Symplectic (
>     runSteps
>   , runSteps'
>   , reallyRunSteps'
>   , inits
>   , h
>   , bigH1
>   , hs
>   , nabla1
>   , nabla2
>   , runStepsH98
>   , bigH2BodyH98
>   ) where
> import Data.Number.Symbolic
> import Numeric.AD
> import Prelude                            as P
> import Data.Array.Accelerate              as A   hiding ((^))
> import Data.Array.Accelerate.LLVM.Native  as CPU
> import Data.Array.Accelerate.Linear              hiding (trace)
> import Data.Array.Accelerate.Control.Lens
> import qualified Linear                   as L

The Störmer-Verlet scheme is an implicit symplectic method of order 2. Being symplectic is important as it (approximately) conserves the energy of the system being solved. If, for example, we were to use RK4 to simulate planetary motion, then the planets would either crash into the sun or leave the solar system: not a very accurate representation of reality.

\displaystyle   \begin{aligned}  p_{n+1/2} &= p_n         - \frac{h}{2}\nabla_q H(p_{n+1/2}, q_n) \\  q_{n+1}   &= q_n         + \frac{h}{2}\left(\nabla_p H(p_{n+1/2}, q_n) + \nabla_p H(p_{n+1/2}, q_{n+1})\right) \\  p_{n+1}   &= p_{n+1/2} - \frac{h}{2}\nabla_q H(p_{n+1/2}, q_{n+1})  \end{aligned}

Let’s assume that the Hamiltonian is of the form H(p, q) = T(q) + V(p). Then \nabla_q H depends only on q and \nabla_p H depends only on p, and the scheme becomes explicit.

\displaystyle   \begin{aligned}  p_{n+1/2} &= p_n         - \frac{h}{2}\nabla_q T(q_n) \\  q_{n+1}   &= q_n         + \frac{h}{2}\left(\nabla_p V(p_{n+1/2}) + \nabla_p V(p_{n+1/2})\right) \\  p_{n+1}   &= p_{n+1/2} - \frac{h}{2}\nabla_q T(q_{n+1})  \end{aligned}

Simple Pendulum

Consider the following Hamiltonian for the pendulum:

\displaystyle   H(q,p) = \frac{1}{2}p^2 - \cos q

> bigH1 :: P.Floating a => a -> a -> a
> bigH1 q p = p^2 / 2 - cos q
> ham1 :: P.Floating a => [a] -> a
> ham1 [qq1, pp1] = 0.5 * pp1^2 - cos qq1
> ham1 _ = error "Hamiltonian defined for 2 generalized co-ordinates only"

Although it is trivial to find the derivatives in this case, let us check by running automatic differentiation over symbolic values.

> q1, q2, p1, p2 :: Sym a
> q1 = var "q1"
> q2 = var "q2"
> p1 = var "p1"
> p2 = var "p2"
> nabla1 :: [[Sym Double]]
> nabla1 = jacobian ((\x -> [x]) . ham1) [q1, p1]
ghci> nabla1
  [[sin q1,0.5*p1+0.5*p1]]

which after a bit of simplification gives us \nabla_q T(q) = \sin q and \nabla_p V(p) = p or

> nablaQ, nablaP :: P.Floating a => a -> a
> nablaQ = sin
> nablaP = id

One step of the Störmer-Verlet scheme

> oneStep :: P.Floating a => (a -> a) ->
>                                  (a -> a) ->
>                                  a ->
>                                  (a, a) ->
>                                  (a, a)
> oneStep nablaQQ nablaPP hh (qPrev, pPrev) = (qNew, pNew)
>   where
>     h2 = hh / 2
>     pp2 = pPrev - h2 * nablaQQ qPrev
>     qNew = qPrev + hh * nablaPP pp2
>     pNew = pp2 - h2 * nablaQQ qNew
> h :: Double
> h = 0.01
> hs :: (Double, Double) -> [(Double, Double)]
> hs = P.iterate (oneStep nablaQ nablaP h)

We can check that the energy is conserved directly

ghci> P.map (P.uncurry bigH1) $ P.take 5 $               hs (pi/4, 0.0)

ghci> P.map (P.uncurry bigH1) $ P.take 5 $ P.drop 1000 $ hs (pi/4, 0.0)

Two body problem

Newton’s equations of motions for the two body problem are

\displaystyle   \ddot{q}_1 = -\frac{q_1}{(q_1^2 + q_2^2)^{3/2}}, \quad  \ddot{q}_2 = -\frac{q_2}{(q_1^2 + q_2^2)^{3/2}}

And we can rewrite these to use Hamilton’s equations with this Hamiltonian

\displaystyle   H(p_1,p_2,q_1,q_2) = \frac{1}{2}(p_1^2 +p_2^2) - \frac{1}{\sqrt{q_1^2 + q_2^2}}

The advantage of using this small example is that we know the exact solution should we need it.

> ham2 :: P.Floating a => [a] -> a
> ham2 [qq1, qq2, pp1, pp2] = 0.5 * (pp1^2 + pp2^2) - recip (sqrt (qq1^2 + qq2^2))
> ham2 _ = error "Hamiltonian defined for 4 generalized co-ordinates only"

Again we can calculate the derivatives

> nabla2 :: [[Sym Double]]
> nabla2 = jacobian ((\x -> [x]) . ham2) [q1, q2, p1, p2]
ghci> (P.mapM_ . P.mapM_) putStrLn . (P.map . P.map) show $ nabla2
  q1/(2.0*sqrt (q1*q1+q2*q2))/sqrt (q1*q1+q2*q2)/sqrt (q1*q1+q2*q2)+q1/(2.0*sqrt (q1*q1+q2*q2))/sqrt (q1*q1+q2*q2)/sqrt (q1*q1+q2*q2)
  q2/(2.0*sqrt (q1*q1+q2*q2))/sqrt (q1*q1+q2*q2)/sqrt (q1*q1+q2*q2)+q2/(2.0*sqrt (q1*q1+q2*q2))/sqrt (q1*q1+q2*q2)/sqrt (q1*q1+q2*q2)

which after some simplification becomes

\displaystyle   \begin{matrix}  \frac{q_1}{(q_1^2 + q_2^2)^{3/2}} \\  \frac{q_2}{(q_1^2 + q_2^2)^{3/2}} \\  p_1 \\  p_2  \end{matrix}

Here’s one step of Störmer-Verlet using Haskell 98.

> oneStepH98 :: Double -> V2 (V2 Double) -> V2 (V2 Double)
> oneStepH98 hh prev = V2 qNew pNew
>   where
>     h2 = hh / 2
>     hhs = V2 hh hh
>     hh2s = V2 h2 h2
>     pp2 = psPrev - hh2s * nablaQ' qsPrev
>     qNew = qsPrev + hhs * nablaP' pp2
>     pNew = pp2 - hh2s * nablaQ' qNew
>     qsPrev = prev ^. L._x
>     psPrev = prev ^. L._y
>     nablaQ' qs = V2 (qq1 / r) (qq2 / r)
>       where
>         qq1 = qs ^. L._x
>         qq2 = qs ^. L._y
>         r   = (qq1 ^ 2 + qq2 ^ 2) ** (3/2)
>     nablaP' ps = ps

And here is the same thing using accelerate.

> oneStep2 :: Double -> Exp (V2 (V2 Double)) -> Exp (V2 (V2 Double))
> oneStep2 hh prev = lift $ V2 qNew pNew
>   where
>     h2 = hh / 2
>     hhs = lift $ V2 hh hh
>     hh2s = lift $ V2 h2 h2
>     pp2 = psPrev - hh2s * nablaQ' qsPrev
>     qNew = qsPrev + hhs * nablaP' pp2
>     pNew = pp2 - hh2s * nablaQ' qNew
>     qsPrev :: Exp (V2 Double)
>     qsPrev = prev ^. _x
>     psPrev = prev ^. _y
>     nablaQ' :: Exp (V2 Double) -> Exp (V2 Double)
>     nablaQ' qs = lift (V2 (qq1 / r) (qq2 / r))
>       where
>         qq1 = qs ^. _x
>         qq2 = qs ^. _y
>         r   = (qq1 ^ 2 + qq2 ^ 2) ** (3/2)
>     nablaP' :: Exp (V2 Double) -> Exp (V2 Double)
>     nablaP' ps = ps

With the initial values below, the solution is an ellipse with eccentricity e.

> e, q10, q20, p10, p20 :: Double
> e = 0.6
> q10 = 1 - e
> q20 = 0.0
> p10 = 0.0
> p20 = sqrt ((1 + e) / (1 - e))
> initsH98 :: V2 (V2 Double)
> initsH98 = V2 (V2 q10 q20) (V2 p10 p20)
> inits :: Exp (V2 (V2 Double))
> inits = lift initsH98
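As a quick sanity check on the energy values below: with these initial conditions the initial energy works out to -1/2 independently of the eccentricity, since

```latex
H(q_{10}, q_{20}, p_{10}, p_{20})
  = \frac{1}{2}\left(p_{10}^2 + p_{20}^2\right) - \frac{1}{\sqrt{q_{10}^2 + q_{20}^2}}
  = \frac{1}{2}\cdot\frac{1+e}{1-e} - \frac{1}{1-e}
  = \frac{(1+e) - 2}{2(1-e)}
  = -\frac{1}{2}
```

so the conservation checks further down should hover around this value.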

We can either keep all the steps of the simulation using accelerate and the CPU

> nSteps :: Int
> nSteps = 10000
> dummyInputs :: Acc (Array DIM1 (V2 (V2 Double)))
> dummyInputs = A.use $ A.fromList (Z :. nSteps) $
>                P.replicate nSteps (V2 (V2 0.0 0.0) (V2 0.0 0.0))
> runSteps :: Array DIM1 (V2 (V2 Double))
> runSteps = CPU.run $ A.scanl (\s _x -> (oneStep2 h s)) inits dummyInputs

Or we can do the same in plain Haskell

> dummyInputsH98 :: [V2 (V2 Double)]
> dummyInputsH98 = P.replicate nSteps (V2 (V2 0.0 0.0) (V2 0.0 0.0))
> runStepsH98 :: [V2 (V2 Double)]
> runStepsH98 = P.scanl (\s _x -> (oneStepH98 h s)) initsH98 dummyInputsH98

The fact that the phase diagram for the two objects is periodic is encouraging.

And again we can check directly that the energy is conserved.

> bigH2BodyH98 :: V2 (V2 Double) -> Double
> bigH2BodyH98 z = ke + pe
>   where
>     q = z ^. L._x
>     p = z ^. L._y
>     pe = let V2 q1' q2' = q in negate $ recip (sqrt (q1'^2 + q2'^2))
>     ke = let V2 p1' p2' = p in 0.5 * (p1'^2 + p2'^2)
ghci> P.maximum $ P.map bigH2BodyH98 $ P.take 100 $ P.drop 1000 runStepsH98

ghci> P.minimum $ P.map bigH2BodyH98 $ P.take 100 $ P.drop 1000 runStepsH98

We’d like to measure performance and running the above for many steps might use up all available memory. Let’s confine ourselves to looking at the final result.

> runSteps' :: Int -> Exp (V2 (V2 Double)) -> Exp (V2 (V2 Double))
> runSteps' nSteps = A.iterate (lift nSteps) (oneStep2 h)
> reallyRunSteps' :: Int -> Array DIM1 (V2 (V2 Double))
> reallyRunSteps' nSteps = CPU.run $
>                          A.scanl (\s _x -> runSteps' nSteps s) inits
>                          (A.use $ A.fromList (Z :. 1) [V2 (V2 0.0 0.0) (V2 0.0 0.0)])


Accelerate’s LLVM

Let’s see what accelerate generates with

{-# OPTIONS_GHC -Wall                      #-}
{-# OPTIONS_GHC -fno-warn-name-shadowing   #-}
{-# OPTIONS_GHC -fno-warn-type-defaults    #-}
{-# OPTIONS_GHC -fno-warn-unused-do-bind   #-}
{-# OPTIONS_GHC -fno-warn-missing-methods  #-}
{-# OPTIONS_GHC -fno-warn-orphans          #-}

import Symplectic

main :: IO ()
main = do
  putStrLn $ show $ reallyRunSteps' 100000000

It’s a bit verbose but we can look at the key “loop”: while5.

[   0.023] llvm: generated 102 instructions in 10 blocks
[   0.023] llvm: generated 127 instructions in 15 blocks
[   0.023] llvm: generated 86 instructions in 7 blocks
[   0.024] llvm: generated 84 instructions in 7 blocks
[   0.024] llvm: generated 18 instructions in 4 blocks
[   0.071] llvm: optimisation did work? True
[   0.072] ; ModuleID = 'scanS'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-darwin15.5.0"

@.memset_pattern = private unnamed_addr constant [2 x double] [double 2.000000e+00, double 2.000000e+00], align 16
@.memset_pattern.1 = private unnamed_addr constant [2 x double] [double 4.000000e-01, double 4.000000e-01], align 16

; Function Attrs: nounwind
define void @scanS(i64 %ix.start, i64 %ix.end, double* noalias nocapture %out.ad3, double* noalias nocapture %out.ad2, double* noalias nocapture %out.ad1, double* noalias nocapture %out.ad0, i64 %out.sh0, double* noalias nocapture readnone %fv0.ad3, double* noalias nocapture readnone %fv0.ad2, double* noalias nocapture readnone %fv0.ad1, double* noalias nocapture readnone %fv0.ad0, i64 %fv0.sh0) local_unnamed_addr #0 {
  %0 = add i64 %fv0.sh0, 1
  %1 = icmp slt i64 %ix.start, %ix.end
  br i1 %1, label, label %while1.exit                             ; preds = %entry
  %2 = add i64 %ix.start, 1
  %3 = mul i64 %2, %fv0.sh0
  br label                                       ; preds = %while3.exit,
  %indvars.iv = phi i64 [ %3, ], [, %while3.exit ]
  %4 = phi i64 [ %ix.start, ], [ %52, %while3.exit ]
  %5 = mul i64 %4, %fv0.sh0
  %6 = mul i64 %4, %0
  %7 = getelementptr double, double* %out.ad0, i64 %6
  store double 2.000000e+00, double* %7, align 8
  %8 = getelementptr double, double* %out.ad1, i64 %6
  store double 0.000000e+00, double* %8, align 8
  %9 = getelementptr double, double* %out.ad2, i64 %6
  store double 0.000000e+00, double* %9, align 8
  %10 = getelementptr double, double* %out.ad3, i64 %6
  store double 4.000000e-01, double* %10, align 8
  %11 = add i64 %5, %fv0.sh0
  %12 = icmp slt i64 %5, %11
  br i1 %12, label, label %while3.exit                             ; preds =
  br label                                       ; preds =, %while5.exit = phi i64 [ %44, %while5.exit ], [ %6, ]
  %13 = phi i64 [ %51, %while5.exit ], [ %5, ]
  %14 = phi <2 x double> [ %43, %while5.exit ], [ <double 2.000000e+00, double 0.000000e+00>, ]
  %15 = phi <2 x double> [ %32, %while5.exit ], [ <double 0.000000e+00, double 4.000000e-01>, ]
  br label                                       ; preds =,
  %16 = phi i64 [ 0, ], [ %19, ]
  %17 = phi <2 x double> [ %14, ], [ %43, ]
  %18 = phi <2 x double> [ %15, ], [ %32, ]
  %19 = add nuw nsw i64 %16, 1
  %20 = extractelement <2 x double> %18, i32 1
  %21 = fmul fast double %20, %20
  %22 = extractelement <2 x double> %18, i32 0
  %23 = fmul fast double %22, %22
  %24 = fadd fast double %21, %23
  %25 = tail call double @llvm.pow.f64(double %24, double 1.500000e+00) #2
  %26 = insertelement <2 x double> undef, double %25, i32 0
  %27 = shufflevector <2 x double> %26, <2 x double> undef, <2 x i32> zeroinitializer
  %28 = fdiv fast <2 x double> %18, %27
  %29 = fmul fast <2 x double> %28, <double 5.000000e-03, double 5.000000e-03>
  %30 = fsub fast <2 x double> %17, %29
  %31 = fmul fast <2 x double> %30, <double 1.000000e-02, double 1.000000e-02>
  %32 = fadd fast <2 x double> %31, %18
  %33 = extractelement <2 x double> %32, i32 1
  %34 = fmul fast double %33, %33
  %35 = extractelement <2 x double> %32, i32 0
  %36 = fmul fast double %35, %35
  %37 = fadd fast double %34, %36
  %38 = tail call double @llvm.pow.f64(double %37, double 1.500000e+00) #2
  %39 = insertelement <2 x double> undef, double %38, i32 0
  %40 = shufflevector <2 x double> %39, <2 x double> undef, <2 x i32> zeroinitializer
  %41 = fdiv fast <2 x double> %32, %40
  %42 = fmul fast <2 x double> %41, <double 5.000000e-03, double 5.000000e-03>
  %43 = fsub fast <2 x double> %30, %42
  %exitcond = icmp eq i64 %19, 100000000
  br i1 %exitcond, label %while5.exit, label

while5.exit:                                      ; preds =
  %44 = add i64, 1
  %45 = getelementptr double, double* %out.ad0, i64 %44
  %46 = extractelement <2 x double> %43, i32 0
  store double %46, double* %45, align 8
  %47 = getelementptr double, double* %out.ad1, i64 %44
  %48 = extractelement <2 x double> %43, i32 1
  store double %48, double* %47, align 8
  %49 = getelementptr double, double* %out.ad2, i64 %44
  store double %35, double* %49, align 8
  %50 = getelementptr double, double* %out.ad3, i64 %44
  store double %33, double* %50, align 8
  %51 = add i64 %13, 1
  %exitcond7 = icmp eq i64 %51, %indvars.iv
  br i1 %exitcond7, label %while3.exit.loopexit, label

while3.exit.loopexit:                             ; preds = %while5.exit
  br label %while3.exit

while3.exit:                                      ; preds = %while3.exit.loopexit,
  %52 = add nsw i64 %4, 1 = add i64 %indvars.iv, %fv0.sh0
  %exitcond8 = icmp eq i64 %52, %ix.end
  br i1 %exitcond8, label %while1.exit.loopexit, label

while1.exit.loopexit:                             ; preds = %while3.exit
  br label %while1.exit

while1.exit:                                      ; preds = %while1.exit.loopexit, %entry
  ret void

; Function Attrs: nounwind
define void @scanP1(i64 %ix.start, i64 %ix.end, i64 %ix.stride, i64 %ix.steps, double* noalias nocapture %out.ad3, double* noalias nocapture %out.ad2, double* noalias nocapture %out.ad1, double* noalias nocapture %out.ad0, i64 %out.sh0, double* noalias nocapture %tmp.ad3, double* noalias nocapture %tmp.ad2, double* noalias nocapture %tmp.ad1, double* noalias nocapture %tmp.ad0, i64 %tmp.sh0, double* noalias nocapture readonly %fv0.ad3, double* noalias nocapture readonly %fv0.ad2, double* noalias nocapture readonly %fv0.ad1, double* noalias nocapture readonly %fv0.ad0, i64 %fv0.sh0) local_unnamed_addr #0 {
  %0 = mul i64 %ix.stride, %ix.start
  %1 = add i64 %0, %ix.stride
  %2 = icmp sle i64 %1, %fv0.sh0
  %3 = select i1 %2, i64 %1, i64 %fv0.sh0
  %4 = icmp eq i64 %ix.start, 0
  %5 = add i64 %0, 1
  %6 = select i1 %4, i64 %0, i64 %5
  br i1 %4, label %if5.exit, label %if5.else

if5.else:                                         ; preds = %entry
  %7 = getelementptr double, double* %fv0.ad3, i64 %0
  %8 = load double, double* %7, align 8
  %9 = getelementptr double, double* %fv0.ad2, i64 %0
  %10 = load double, double* %9, align 8
  %11 = getelementptr double, double* %fv0.ad1, i64 %0
  %12 = load double, double* %11, align 8
  %13 = getelementptr double, double* %fv0.ad0, i64 %0
  %14 = load double, double* %13, align 8
  %15 = insertelement <2 x double> undef, double %14, i32 0
  %16 = insertelement <2 x double> %15, double %12, i32 1
  %17 = insertelement <2 x double> undef, double %10, i32 0
  %18 = insertelement <2 x double> %17, double %8, i32 1
  br label %if5.exit

if5.exit:                                         ; preds = %entry, %if5.else
  %19 = phi i64 [ %5, %if5.else ], [ %0, %entry ]
  %20 = phi <2 x double> [ %16, %if5.else ], [ <double 2.000000e+00, double 0.000000e+00>, %entry ]
  %21 = phi <2 x double> [ %18, %if5.else ], [ <double 0.000000e+00, double 4.000000e-01>, %entry ]
  %22 = getelementptr double, double* %out.ad0, i64 %6
  %23 = extractelement <2 x double> %20, i32 0
  store double %23, double* %22, align 8
  %24 = getelementptr double, double* %out.ad1, i64 %6
  %25 = extractelement <2 x double> %20, i32 1
  store double %25, double* %24, align 8
  %26 = getelementptr double, double* %out.ad2, i64 %6
  %27 = extractelement <2 x double> %21, i32 0
  store double %27, double* %26, align 8
  %28 = getelementptr double, double* %out.ad3, i64 %6
  %29 = extractelement <2 x double> %21, i32 1
  store double %29, double* %28, align 8
  %30 = icmp slt i64 %19, %3
  br i1 %30, label, label %while9.exit                             ; preds = %if5.exit
  br label                                       ; preds =, %while11.exit = phi i64 [ %62, %while11.exit ], [ %6, ]
  %31 = phi i64 [ %69, %while11.exit ], [ %19, ]
  %32 = phi <2 x double> [ %61, %while11.exit ], [ %20, ]
  %33 = phi <2 x double> [ %50, %while11.exit ], [ %21, ]
  br label                                      ; preds =,
  %34 = phi i64 [ 0, ], [ %37, ]
  %35 = phi <2 x double> [ %32, ], [ %61, ]
  %36 = phi <2 x double> [ %33, ], [ %50, ]
  %37 = add nuw nsw i64 %34, 1
  %38 = extractelement <2 x double> %36, i32 1
  %39 = fmul fast double %38, %38
  %40 = extractelement <2 x double> %36, i32 0
  %41 = fmul fast double %40, %40
  %42 = fadd fast double %39, %41
  %43 = tail call double @llvm.pow.f64(double %42, double 1.500000e+00) #2
  %44 = insertelement <2 x double> undef, double %43, i32 0
  %45 = shufflevector <2 x double> %44, <2 x double> undef, <2 x i32> zeroinitializer
  %46 = fdiv fast <2 x double> %36, %45
  %47 = fmul fast <2 x double> %46, <double 5.000000e-03, double 5.000000e-03>
  %48 = fsub fast <2 x double> %35, %47
  %49 = fmul fast <2 x double> %48, <double 1.000000e-02, double 1.000000e-02>
  %50 = fadd fast <2 x double> %49, %36
  %51 = extractelement <2 x double> %50, i32 1
  %52 = fmul fast double %51, %51
  %53 = extractelement <2 x double> %50, i32 0
  %54 = fmul fast double %53, %53
  %55 = fadd fast double %52, %54
  %56 = tail call double @llvm.pow.f64(double %55, double 1.500000e+00) #2
  %57 = insertelement <2 x double> undef, double %56, i32 0
  %58 = shufflevector <2 x double> %57, <2 x double> undef, <2 x i32> zeroinitializer
  %59 = fdiv fast <2 x double> %50, %58
  %60 = fmul fast <2 x double> %59, <double 5.000000e-03, double 5.000000e-03>
  %61 = fsub fast <2 x double> %48, %60
  %exitcond = icmp eq i64 %37, 100000000
  br i1 %exitcond, label %while11.exit, label

while11.exit:                                     ; preds =
  %62 = add i64, 1
  %63 = getelementptr double, double* %out.ad0, i64 %62
  %64 = extractelement <2 x double> %61, i32 0
  store double %64, double* %63, align 8
  %65 = getelementptr double, double* %out.ad1, i64 %62
  %66 = extractelement <2 x double> %61, i32 1
  store double %66, double* %65, align 8
  %67 = getelementptr double, double* %out.ad2, i64 %62
  store double %53, double* %67, align 8
  %68 = getelementptr double, double* %out.ad3, i64 %62
  store double %51, double* %68, align 8
  %69 = add nsw i64 %31, 1
  %70 = icmp slt i64 %69, %3
  br i1 %70, label, label %while9.exit.loopexit

while9.exit.loopexit:                             ; preds = %while11.exit
  br label %while9.exit

while9.exit:                                      ; preds = %while9.exit.loopexit, %if5.exit
  %71 = phi <2 x double> [ %20, %if5.exit ], [ %61, %while9.exit.loopexit ]
  %72 = phi <2 x double> [ %21, %if5.exit ], [ %50, %while9.exit.loopexit ]
  %73 = getelementptr double, double* %tmp.ad0, i64 %ix.start
  %74 = extractelement <2 x double> %71, i32 0
  store double %74, double* %73, align 8
  %75 = getelementptr double, double* %tmp.ad1, i64 %ix.start
  %76 = extractelement <2 x double> %71, i32 1
  store double %76, double* %75, align 8
  %77 = getelementptr double, double* %tmp.ad2, i64 %ix.start
  %78 = extractelement <2 x double> %72, i32 0
  store double %78, double* %77, align 8
  %79 = getelementptr double, double* %tmp.ad3, i64 %ix.start
  %80 = extractelement <2 x double> %72, i32 1
  store double %80, double* %79, align 8
  ret void

; Function Attrs: nounwind
define void @scanP2(i64 %ix.start, i64 %ix.end, double* noalias nocapture %tmp.ad3, double* noalias nocapture %tmp.ad2, double* noalias nocapture %tmp.ad1, double* noalias nocapture %tmp.ad0, i64 %tmp.sh0, double* noalias nocapture readnone %fv0.ad3, double* noalias nocapture readnone %fv0.ad2, double* noalias nocapture readnone %fv0.ad1, double* noalias nocapture readnone %fv0.ad0, i64 %fv0.sh0) local_unnamed_addr #0 {
  %0 = add i64 %ix.start, 1
  %1 = icmp slt i64 %0, %ix.end
  br i1 %1, label, label %while1.exit                             ; preds = %entry
  %2 = getelementptr double, double* %tmp.ad3, i64 %ix.start
  %3 = load double, double* %2, align 8
  %4 = getelementptr double, double* %tmp.ad2, i64 %ix.start
  %5 = load double, double* %4, align 8
  %6 = getelementptr double, double* %tmp.ad1, i64 %ix.start
  %7 = load double, double* %6, align 8
  %8 = getelementptr double, double* %tmp.ad0, i64 %ix.start
  %9 = load double, double* %8, align 8
  %10 = insertelement <2 x double> undef, double %5, i32 0
  %11 = insertelement <2 x double> %10, double %3, i32 1
  %12 = insertelement <2 x double> undef, double %9, i32 0
  %13 = insertelement <2 x double> %12, double %7, i32 1
  br label                                       ; preds = %while3.exit,
  %14 = phi i64 [ %45, %while3.exit ], [ %0, ]
  %15 = phi <2 x double> [ %44, %while3.exit ], [ %13, ]
  %16 = phi <2 x double> [ %33, %while3.exit ], [ %11, ]
  br label                                       ; preds =,
  %17 = phi i64 [ 0, ], [ %20, ]
  %18 = phi <2 x double> [ %15, ], [ %44, ]
  %19 = phi <2 x double> [ %16, ], [ %33, ]
  %20 = add nuw nsw i64 %17, 1
  %21 = extractelement <2 x double> %19, i32 1
  %22 = fmul fast double %21, %21
  %23 = extractelement <2 x double> %19, i32 0
  %24 = fmul fast double %23, %23
  %25 = fadd fast double %22, %24
  %26 = tail call double @llvm.pow.f64(double %25, double 1.500000e+00) #2
  %27 = insertelement <2 x double> undef, double %26, i32 0
  %28 = shufflevector <2 x double> %27, <2 x double> undef, <2 x i32> zeroinitializer
  %29 = fdiv fast <2 x double> %19, %28
  %30 = fmul fast <2 x double> %29, <double 5.000000e-03, double 5.000000e-03>
  %31 = fsub fast <2 x double> %18, %30
  %32 = fmul fast <2 x double> %31, <double 1.000000e-02, double 1.000000e-02>
  %33 = fadd fast <2 x double> %32, %19
  %34 = extractelement <2 x double> %33, i32 1
  %35 = fmul fast double %34, %34
  %36 = extractelement <2 x double> %33, i32 0
  %37 = fmul fast double %36, %36
  %38 = fadd fast double %35, %37
  %39 = tail call double @llvm.pow.f64(double %38, double 1.500000e+00) #2
  %40 = insertelement <2 x double> undef, double %39, i32 0
  %41 = shufflevector <2 x double> %40, <2 x double> undef, <2 x i32> zeroinitializer
  %42 = fdiv fast <2 x double> %33, %41
  %43 = fmul fast <2 x double> %42, <double 5.000000e-03, double 5.000000e-03>
  %44 = fsub fast <2 x double> %31, %43
  %exitcond = icmp eq i64 %20, 100000000
  br i1 %exitcond, label %while3.exit, label

while3.exit:                                      ; preds =
  %45 = add i64 %14, 1
  %46 = getelementptr double, double* %tmp.ad0, i64 %14
  %47 = extractelement <2 x double> %44, i32 0
  store double %47, double* %46, align 8
  %48 = getelementptr double, double* %tmp.ad1, i64 %14
  %49 = extractelement <2 x double> %44, i32 1
  store double %49, double* %48, align 8
  %50 = getelementptr double, double* %tmp.ad2, i64 %14
  store double %36, double* %50, align 8
  %51 = getelementptr double, double* %tmp.ad3, i64 %14
  store double %34, double* %51, align 8
  %exitcond7 = icmp eq i64 %45, %ix.end
  br i1 %exitcond7, label %while1.exit.loopexit, label

while1.exit.loopexit:                             ; preds = %while3.exit
  br label %while1.exit

while1.exit:                                      ; preds = %while1.exit.loopexit, %entry
  ret void
}

; Function Attrs: nounwind
define void @scanP3(i64 %ix.start, i64 %ix.end, i64 %ix.stride, double* noalias nocapture %out.ad3, double* noalias nocapture %out.ad2, double* noalias nocapture %out.ad1, double* noalias nocapture %out.ad0, i64 %out.sh0, double* noalias nocapture readonly %tmp.ad3, double* noalias nocapture readonly %tmp.ad2, double* noalias nocapture readonly %tmp.ad1, double* noalias nocapture readonly %tmp.ad0, i64 %tmp.sh0, double* noalias nocapture readnone %fv0.ad3, double* noalias nocapture readnone %fv0.ad2, double* noalias nocapture readnone %fv0.ad1, double* noalias nocapture readnone %fv0.ad0, i64 %fv0.sh0) local_unnamed_addr #0 {
  %0 = add i64 %ix.start, 1
  %1 = mul i64 %0, %ix.stride
  %2 = add i64 %1, %ix.stride
  %3 = icmp sle i64 %2, %out.sh0
  %4 = select i1 %3, i64 %2, i64 %out.sh0
  %5 = add i64 %1, 1
  %6 = add i64 %4, 1
  %7 = icmp slt i64 %5, %6
  br i1 %7, label, label %while1.exit                             ; preds = %entry
  %8 = getelementptr double, double* %tmp.ad0, i64 %ix.start
  %9 = load double, double* %8, align 8
  %10 = getelementptr double, double* %tmp.ad1, i64 %ix.start
  %11 = load double, double* %10, align 8
  %12 = getelementptr double, double* %tmp.ad2, i64 %ix.start
  %13 = load double, double* %12, align 8
  %14 = getelementptr double, double* %tmp.ad3, i64 %ix.start
  %15 = load double, double* %14, align 8
  %16 = insertelement <2 x double> undef, double %9, i32 0
  %17 = insertelement <2 x double> %16, double %11, i32 1
  %18 = insertelement <2 x double> undef, double %13, i32 0
  %19 = insertelement <2 x double> %18, double %15, i32 1
  br label                                       ; preds =, %while3.exit
  %20 = phi i64 [ %55, %while3.exit ], [ %5, ]
  br label                                       ; preds =,
  %21 = phi i64 [ 0, ], [ %24, ]
  %22 = phi <2 x double> [ %17, ], [ %48, ]
  %23 = phi <2 x double> [ %19, ], [ %37, ]
  %24 = add nuw nsw i64 %21, 1
  %25 = extractelement <2 x double> %23, i32 1
  %26 = fmul fast double %25, %25
  %27 = extractelement <2 x double> %23, i32 0
  %28 = fmul fast double %27, %27
  %29 = fadd fast double %26, %28
  %30 = tail call double @llvm.pow.f64(double %29, double 1.500000e+00) #2
  %31 = insertelement <2 x double> undef, double %30, i32 0
  %32 = shufflevector <2 x double> %31, <2 x double> undef, <2 x i32> zeroinitializer
  %33 = fdiv fast <2 x double> %23, %32
  %34 = fmul fast <2 x double> %33, <double 5.000000e-03, double 5.000000e-03>
  %35 = fsub fast <2 x double> %22, %34
  %36 = fmul fast <2 x double> %35, <double 1.000000e-02, double 1.000000e-02>
  %37 = fadd fast <2 x double> %36, %23
  %38 = extractelement <2 x double> %37, i32 1
  %39 = fmul fast double %38, %38
  %40 = extractelement <2 x double> %37, i32 0
  %41 = fmul fast double %40, %40
  %42 = fadd fast double %39, %41
  %43 = tail call double @llvm.pow.f64(double %42, double 1.500000e+00) #2
  %44 = insertelement <2 x double> undef, double %43, i32 0
  %45 = shufflevector <2 x double> %44, <2 x double> undef, <2 x i32> zeroinitializer
  %46 = fdiv fast <2 x double> %37, %45
  %47 = fmul fast <2 x double> %46, <double 5.000000e-03, double 5.000000e-03>
  %48 = fsub fast <2 x double> %35, %47
  %exitcond = icmp eq i64 %24, 100000000
  br i1 %exitcond, label %while3.exit, label

while3.exit:                                      ; preds =
  %49 = getelementptr double, double* %out.ad0, i64 %20
  %50 = extractelement <2 x double> %48, i32 0
  store double %50, double* %49, align 8
  %51 = getelementptr double, double* %out.ad1, i64 %20
  %52 = extractelement <2 x double> %48, i32 1
  store double %52, double* %51, align 8
  %53 = getelementptr double, double* %out.ad2, i64 %20
  store double %40, double* %53, align 8
  %54 = getelementptr double, double* %out.ad3, i64 %20
  store double %38, double* %54, align 8
  %55 = add i64 %20, 1
  %56 = icmp slt i64 %55, %6
  br i1 %56, label, label %while1.exit.loopexit

while1.exit.loopexit:                             ; preds = %while3.exit
  br label %while1.exit

while1.exit:                                      ; preds = %while1.exit.loopexit, %entry
  ret void
}

; Function Attrs: norecurse nounwind
define void @generate(i64 %ix.start, i64 %ix.end, double* noalias nocapture %out.ad3, double* noalias nocapture %out.ad2, double* noalias nocapture %out.ad1, double* noalias nocapture %out.ad0, i64 %out.sh0, double* noalias nocapture readnone %fv0.ad3, double* noalias nocapture readnone %fv0.ad2, double* noalias nocapture readnone %fv0.ad1, double* noalias nocapture readnone %fv0.ad0, i64 %fv0.sh0) local_unnamed_addr #1 {
  %0 = icmp sgt i64 %ix.end, %ix.start
  br i1 %0, label, label %while1.exit                             ; preds = %entry
  %scevgep = getelementptr double, double* %out.ad1, i64 %ix.start
  %scevgep1 = bitcast double* %scevgep to i8*
  %1 = sub i64 %ix.end, %ix.start
  %2 = shl i64 %1, 3
  call void @llvm.memset.p0i8.i64(i8* %scevgep1, i8 0, i64 %2, i32 8, i1 false)
  %scevgep2 = getelementptr double, double* %out.ad2, i64 %ix.start
  %scevgep23 = bitcast double* %scevgep2 to i8*
  call void @llvm.memset.p0i8.i64(i8* %scevgep23, i8 0, i64 %2, i32 8, i1 false)
  %scevgep4 = getelementptr double, double* %out.ad0, i64 %ix.start
  %scevgep45 = bitcast double* %scevgep4 to i8*
  call void @memset_pattern16(i8* %scevgep45, i8* bitcast ([2 x double]* @.memset_pattern to i8*), i64 %2) #0
  %scevgep6 = getelementptr double, double* %out.ad3, i64 %ix.start
  %scevgep67 = bitcast double* %scevgep6 to i8*
  call void @memset_pattern16(i8* %scevgep67, i8* bitcast ([2 x double]* @.memset_pattern.1 to i8*), i64 %2) #0
  br label %while1.exit

while1.exit:                                      ; preds =, %entry
  ret void
}

; Function Attrs: nounwind readonly
declare double @llvm.pow.f64(double, double) #2

; Function Attrs: argmemonly nounwind
declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i32, i1) #3

; Function Attrs: argmemonly
declare void @memset_pattern16(i8* nocapture, i8* nocapture readonly, i64) #4

attributes #0 = { nounwind }
attributes #1 = { norecurse nounwind }
attributes #2 = { nounwind readonly }
attributes #3 = { argmemonly nounwind }
attributes #4 = { argmemonly }

And then we can run it for 10^8 steps and see how long it takes.

       9,031,008 bytes allocated in the heap
       1,101,152 bytes copied during GC
         156,640 bytes maximum residency (2 sample(s))
          28,088 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        16 colls,     0 par    0.002s   0.006s     0.0004s    0.0041s
  Gen  1         2 colls,     0 par    0.002s   0.099s     0.0496s    0.0990s

  INIT    time    0.000s  (  0.002s elapsed)
  MUT     time   10.195s  ( 10.501s elapsed)
  GC      time    0.004s  (  0.105s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   10.206s  ( 10.608s elapsed)

  %GC     time       0.0%  (1.0% elapsed)

  Alloc rate    885,824 bytes per MUT second

  Productivity 100.0% of total user, 99.0% of total elapsed
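
For readers who want to reproduce the numbers, here is a minimal Haskell sketch of the stepper behind this IR. This is a reconstruction of mine using plain pairs rather than the unboxed machinery the compiler actually sees, and the names (`oneStep`, `nablaQ`) are my own; it mirrors the Julia version later in the post.

```haskell
-- One Störmer-Verlet (symplectic) step for the Kepler problem.
-- A hand-written sketch, not the code that generated the IR above.
type V2 = (Double, Double)

oneStep :: Double -> (V2, V2) -> (V2, V2)
oneStep h ((q1, q2), (p1, p2)) =
  let h2 = h / 2
      -- gradient of the potential: q / |q|^3
      nablaQ (x, y) = let r = (x * x + y * y) ** 1.5 in (x / r, y / r)
      (g1, g2)   = nablaQ (q1, q2)
      p1h = p1 - h2 * g1            -- half kick
      p2h = p2 - h2 * g2
      q1' = q1 + h * p1h            -- drift
      q2' = q2 + h * p2h
      (g1', g2') = nablaQ (q1', q2')
  in ((q1', q2'), (p1h - h2 * g1', p2h - h2 * g2'))  -- second half kick
```

Iterating this with a strict fold for 10^8 steps is what the timings above measure.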

Julia’s LLVM

Let’s try the same problem in Julia.

using StaticArrays

e = 0.6
q10 = 1 - e
q20 = 0.0
p10 = 0.0
p20 = sqrt((1 + e) / (1 - e))

h = 0.01

x1 = SVector{2,Float64}(q10, q20)
x2 = SVector{2,Float64}(p10, p20)
x3 = SVector{2,SVector{2,Float64}}(x1,x2)

@inline function oneStep(h, prev)
    h2 = h / 2
    @inbounds qsPrev = prev[1]
    @inbounds psPrev = prev[2]
    @inline function nablaQQ(qs)
        @inbounds q1 = qs[1]
        @inbounds q2 = qs[2]
        r = (q1^2 + q2^2) ^ (3/2)
        return SVector{2,Float64}(q1 / r, q2 / r)
    end
    @inline function nablaPP(ps)
        return ps
    end
    p2 = psPrev - h2 * nablaQQ(qsPrev)
    qNew = qsPrev + h * nablaPP(p2)
    pNew = p2 - h2 * nablaQQ(qNew)
    return SVector{2,SVector{2,Float64}}(qNew, pNew)
end

function manyStepsFinal(n,h,prev)
    for i in 1:n
        prev = oneStep(h,prev)
    end
    return prev
end

final = manyStepsFinal(100000000,h,x3)

Again, we can see how long it takes:

       22.20 real        22.18 user         0.32 sys

Surprisingly, it takes longer, but I am a Julia novice so it could be some rookie error. Two things occur to me:

  1. Let’s look at the LLVM and see if we can find an explanation.

  2. Let’s analyse execution time versus number of steps to separate the code-generation cost from the execution cost. It may be that Julia takes longer to generate code but has better execution times.
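
For the second point, a crude way to time just the execution phase in Haskell is to evaluate the computation between two clock readings. The helper below (`timed`) is hypothetical, not from the post; running it at several step counts and fitting a line would give the fixed start-up cost as the intercept.

```haskell
import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)

-- Time the evaluation of a value to WHNF, in seconds of CPU time.
-- A rough sketch: good enough to compare run times at several step counts.
timed :: a -> IO Double
timed x = do
  t0 <- getCPUTime
  _  <- evaluate x
  t1 <- getCPUTime
  pure (fromIntegral (t1 - t0) / 1e12)  -- getCPUTime is in picoseconds
```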

define void @julia_oneStep_71735(%SVector.65* noalias sret, double, %SVector.65*) #0 {
  %3 = fmul double %1, 5.000000e-01
  %4 = getelementptr inbounds %SVector.65, %SVector.65* %2, i64 0, i32 0, i64 0, i32 0, i64 0
  %5 = load double, double* %4, align 1
  %6 = getelementptr inbounds %SVector.65, %SVector.65* %2, i64 0, i32 0, i64 0, i32 0, i64 1
  %7 = load double, double* %6, align 1
  %8 = getelementptr inbounds %SVector.65, %SVector.65* %2, i64 0, i32 0, i64 1, i32 0, i64 0
  %9 = load double, double* %8, align 1
  %10 = getelementptr inbounds %SVector.65, %SVector.65* %2, i64 0, i32 0, i64 1, i32 0, i64 1
  %11 = load double, double* %10, align 1
  %12 = fmul double %5, %5
  %13 = fmul double %7, %7
  %14 = fadd double %12, %13
  %15 = call double @"julia_^_71741"(double %14, double 1.500000e+00) #0
  %16 = fdiv double %5, %15
  %17 = fdiv double %7, %15
  %18 = fmul double %3, %16
  %19 = fmul double %3, %17
  %20 = fsub double %9, %18
  %21 = fsub double %11, %19
  %22 = fmul double %20, %1
  %23 = fmul double %21, %1
  %24 = fadd double %5, %22
  %25 = fadd double %7, %23
  %26 = fmul double %24, %24
  %27 = fmul double %25, %25
  %28 = fadd double %26, %27
  %29 = call double @"julia_^_71741"(double %28, double 1.500000e+00) #0
  %30 = fdiv double %24, %29
  %31 = fdiv double %25, %29
  %32 = fmul double %3, %30
  %33 = fmul double %3, %31
  %34 = fsub double %20, %32
  %35 = fsub double %21, %33
  %36 = getelementptr inbounds %SVector.65, %SVector.65* %0, i64 0, i32 0, i64 0, i32 0, i64 0
  store double %24, double* %36, align 8
  %37 = getelementptr inbounds %SVector.65, %SVector.65* %0, i64 0, i32 0, i64 0, i32 0, i64 1
  store double %25, double* %37, align 8
  %38 = getelementptr inbounds %SVector.65, %SVector.65* %0, i64 0, i32 0, i64 1, i32 0, i64 0
  store double %34, double* %38, align 8
  %39 = getelementptr inbounds %SVector.65, %SVector.65* %0, i64 0, i32 0, i64 1, i32 0, i64 1
  store double %35, double* %39, align 8
  ret void
}

We can see two things:

  1. Julia doesn’t use SIMD by default. We can change this by using -O3. In the event (I don’t reproduce it here), this makes very little difference to performance.

  2. Julia generates

    %15 = call double @"julia_^_71741"(double %14, double 1.500000e+00) #0

    whereas GHC generates

    %26 = tail call double @llvm.pow.f64(double %25, double 1.500000e+00) #2

    Now it is entirely possible that this results in usage of different libms, with Julia calling openlibm and GHC calling the system libm, which on my machine is the one that comes with Mac OS X and is apparently quite a lot faster. We could try compiling the actual LLVM and replacing the Julia calls with pow, but maybe that is the subject for another blog post.
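
One quick way to probe the libm hypothesis from Haskell is to call the system pow directly over the FFI and compare it against GHC's (**). The FFI import is standard; the comparison is my own sketch, not something from the post.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

-- Call the system libm's pow directly; comparing its results (and timing)
-- against GHC's (**) is a cheap probe of the "different libms" hypothesis.
foreign import ccall unsafe "math.h pow"
  c_pow :: Double -> Double -> Double
```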

Just in case “on-the-fly” compilation is obscuring runtime performance, let’s try running both the Haskell and the Julia programs for 20m, 40m, 80m and 100m steps.

[Charts: run time against number of steps for the Haskell and Julia programs.]
Clearly the negative compilation time for Haskell is wrong, but I think it’s fair to infer that Julia has a higher start-up cost and that Haskell is about 2 times quicker, though as noted above this may be because of different math libraries.


I used shake to build some of the material “on the fly”. There are still some manual steps in producing the blog post. The code can be downloaded from GitHub.

by Dominic Steinitz at June 02, 2017 03:47 PM

June 01, 2017

Philip Wadler

Lambdaman meets Spider-Moy

I was delighted and gobsmacked to hear from Moises Vazquez, a university teacher from Mexico who has a penchant for dressing up as Spider-Man. Reuters reports:
"I do the same job as anyone else, I don't think it's the best class in the world just because I put on a suit. But I assure you I want to be the most honest and dedicated there is, I just want to make the classroom a better place," he said.
I am blown away by his costume, with amazing detail such as Barendregt's lambda cube on the back and categorical arrows for webbing. Time for a superhero team-up?

by Philip Wadler ( at June 01, 2017 05:19 PM

Michael Snoyman

Why I lift

WARNING Believe it or not, this post is about health and fitness, not monad transformers.

Anyone following me on Twitter over the past year has probably noticed that I've started weight lifting. At the request of some friends and family, I've been meaning for a while to write up an in-depth blog post on some topics around exercise and nutrition. But after some discussions with others at LambdaConf last week, I decided to start off with a much smaller blog post: why I started lifting, and why I think most developers—or anyone with a desk job—should do the same.

My background on this is, unfortunately, pretty standard these days.

  • I have a family history of medical conditions such as stroke, cancer, and diabetes.
  • For the majority of my life, I've been significantly overweight, at times bordering on obese. I've tried losing weight on many occasions, but historically always gained it back.
  • Slightly more unusual: I have a specific injury (I fell down a staircase onto my shoulder when I was 18) which required surgery and can cause me issues when unattended.
  • And perhaps the one almost every reader will nod in agreement to: my back used to go out regularly.

About five years ago, I decided to stop ignoring nagging health issues, fight off the laziness, do research, and make myself healthier. I've learned a lot in that time, much of which I hope to share in an upcoming blog post. In this blog post, I just want to motivate why I am so passionate about this topic now, and why I think you should be too.

While the main focus here is on weight lifting, in the sense of going to the gym and picking up heavy pieces of metal, I'd like to point out that it's hardly been an isolated activity in this journey. Highly related changes have included:

  • Significant improvements to my nutrition (again, more on this in a later post)
  • About three years of doing bodyweight workout routines (pushups, bodyweight squats, pullups, etc)
  • Performing High Intensity Interval Training (HIIT), which can basically be described as short bouts (~20 seconds) of intense sprinting with quick recovery periods (~40 seconds)

Most of the benefits I'll list below are more generally about resistance training, which will include bodyweight training, resistance bands, and weight machines. If for some reason you really don't want to lift weights, consider those as good alternatives.

NOTE Despite many stereotypes out there, my comments here are not targeted specifically at men. I believe that barring specific male advantages (like spiking testosterone production), weight lifting is just as important for women as for men. The points I list below are gender neutral. If you read any of them and think they don't apply to women, I encourage you to rethink your stance.

We have more control than we think

I can't speak universally, but I definitely know this for myself and a number of people in my life. There's a pervasive idea in the modern world that our bodies just break down, and we need doctors and medicine to fix them. While this is certainly the case sometimes, I think we've often become overly reliant on pills, where lifestyle changes could be more effective with less side effects.

Weight lifting and the other changes I mentioned are a way of taking that control back.

Strength protects us

I think there's often a negative stereotype of a "meathead" or similar who trades in brain for brawn. Besides being contradicted by actual correlative studies, this stereotype implies that strength isn't an inherently good thing for our bodies.

The main thing that got me to start going into the gym was my bad back. I would wake up some mornings and be almost unable to move. It would happen seemingly randomly. I also have a family history of this, so I just assumed this would be a regular part of my life, and I would pop ibuprofen as needed.

But then I started deadlifting. As I raised the weight (meaning I had gotten stronger), an amazing thing happened: my back didn't trouble me as much. Focus on good posture, sitting correctly at my desk, and regular heating of my back muscles have all contributed. But I believe that the most important change I've made has been the deadlifts and, to a slightly lesser extent, barbell squats.

I should also mention the negative side of lifting on this topic: if you aren't careful, you can injure yourself in the gym. I've unfortunately done this. My advice—which is ripped off from others—is to start off with very light weights and focus on perfecting your form. Don't "ego lift," meaning don't lift more weight than you can handle to feel good about some number. Lift what you can safely handle, and add weight as you get stronger and are ready for it.

Increased muscle mass is healthy

If I had heard "increased muscle mass" 10 years ago, I probably would have thought of some bodybuilder, and thought it was a vain pursuit of aesthetic beauty. And while increased muscle mass can in fact be an aesthetic feature, I believe the health benefits are even greater.

When we get sick, we'll often lose muscle mass. Having a little extra reserve prevents us from getting into danger levels. Increased muscle mass also leads to increased caloric requirements. Meaning if you eat a specific number of calories but have extra muscle on you, you won't put on as much fat.

Nutrient partitioning

We've all heard back-and-forths over the past decade about whether carbs are good or bad for us. I don't want to get into that now. However, I will say that elevated blood glucose levels are clearly unhealthy, as is insulin insensitivity.

One of the great things about lifting weights is that it burns muscle glycogen, your major body storage of carbohydrates. When you do this, your muscles will quickly soak up the glucose in your bloodstream, instead of it getting turned into fat or, worse, harming your body organs.

Psychologically beneficial

We all have stress in our lives. Family, work, finances, and hundreds of others. It can be difficult to cope with it.

Lifting weights is also a stress, but it's a completely different kind of stress. It's physical (though sometimes a bit mentally terrifying as well). And it's short-term, instead of the really unhealthy, cortisol-producing chronic stress many of us have.

But here's the best part for me: lifting is an escape, in a way that most other "leisure" activities (reading, watching TV, etc) are not. When I'm doing a bench press, it is a simple battle between me and gravity. There are no complex deadlocks to debug, no decisions around corporate strategy to make, no challenges with children's education.

There's a heavy thing. I pick it up. I put it down. I do it again. Its simplicity is its virtue.

More productive

Far too many of us developers spend our entire lives on electronic devices. We wake up and check Reddit/Hacker News/whatever. We watch some TV or YouTube videos. We spend all day in an office (or, in my case, home office) writing code or discussing issues on Slack/IRC/whatever. We're constantly answering emails on our phones. And then when work is over, how many of us either play video games, read Reddit some more, or work on hobby programming projects.

Far be it from me to tell people not to have open source development they do outside of work (that would be pretty hypocritical). But having something which is explicitly different from the rest of our electronic activities is a much needed break, and at least for me leads to better productivity in the rest of my day.

A new challenge

I mentioned ego lifting above, and this section is certainly bordering on it. But it's also one of the things I enjoy the most about lifting: it's a constant new challenge. I can set new goals all the time: increase the weight on my lifts, do more reps at the same weight, or even modifying my tempo.

I need to regularly remind myself not to be stupid and push too hard. And I also need to admit that I have done stupid things in the gym in the past while trying to challenge myself. But if done safely, the constant challenges can be invigorating, exciting, and fun.

Life is more fun

Speaking of fun: when you get stronger, life is just more fun. I have four kids. They like to play (well, the three month old is slightly less active than the others). They like to play physically. And the fact that I can pick them up, toss them in the air, run around with them, and even climb through play structures in the park makes that play so much better.

Since lifting, I can play with them for longer, despite the fact that they've all grown bigger and heavier. (Though I'll admit that tossing my 4 year old is significantly easier than tossing my 9 year old.) I can't imagine what life would be like if I was unable to do more than a half-hearted game of catch for five minutes.

I've never really been into sports, so I'm not using that as a personal example. But I'd imagine those of you who do enjoy sports will find much more enjoyment in it once you've increased your strength and endurance.

Learning how to use our bodies

I remember reading a tutorial on deadlifting, and seeing the comment that it teaches you the right way to lift objects. I now regularly find myself, while moving objects around the house, naturally using deadlift techniques and cues. This helps prevent injury, and makes me a more useful person.

I have a genetic history that leads to an achilles tendon which is overly short. This leads to such issues as toe-walking, foot pain while standing, and lack of mobility/flexibility. In line with the comment above of taking control of our bodies, working on my squat has drastically improved the situation with my tendon, leading to improvements in all mentioned areas. (Though to be fair, I do still have issues here.)

In order to lift correctly, you'll need to learn how to tighten your core, tighten your lats, engage your glutes. You'll discover muscle groups you can control that you didn't know you could. You'll need to perfect your posture to nail down form.

Don't think of weight lifting as learning an isolated skill. It's a transferable activity with payoff in almost every aspect of your day to day life.

How to get started

I'm hoping to dive into details in a later blog post, but if you want to get started, my advice is: don't overcomplicate! I have a theory that a significant part of why so many people are unhealthy is the confusion around the topic (think of the paradox of choice). Choose a resistance activity and do it.

I personally think that StrongLifts 5x5 is a great place to start with weight lifting. If you want to start with bodyweights instead, I had a lot of success with the Start Bodyweight program.

This blog post is obviously an anomaly versus most of my other development-related posts. If this is something you like and would enjoy more of, please let me know. Depending on interest in it and the volume of posts on the topic, I may put the health and fitness posts in a separate part of my site. Feedback welcome!

June 01, 2017 10:00 AM

May 26, 2017

José Pedro Magalhães

Four openings for Haskell developers at Standard Chartered

I'm happy to announce that the Strats team at Standard Chartered is still hiring! We currently have openings for four roles. These will typically involve direct contact with traders to automate processes, often in rapid delivery style.

The growing Strats team now consists of about 40 developers in either Singapore or London, working exclusively in Haskell. Dozens of people from other teams are also writing Haskell code, so this is a chance to join what's probably the largest Haskell dev team in the world, and work on the largest Haskell codebase (over 3 million lines of code).

We currently offer only contractor positions (with the possibility of conversion to permanent in the future) with very competitive compensation. We require demonstrated experience in typed FP (Haskell, OCaml, F# etc); no financial background is required. We also require physical presence in either Singapore or London. Flexible work arrangements are possible, but relocation to the UK or Singapore is necessary.

If this sounds exciting to you, please email your CV and a short motivation text to Feel free to also ask any questions you might have.

by José Pedro Magalhães ( at May 26, 2017 01:01 PM

May 21, 2017

Neil Mitchell

Proving fib equivalence

Summary: Using Idris I proved the exponential and linear time fib functions are equivalent.

The Haskell wiki proclaims that Implementing the Fibonacci sequence is considered the "Hello, world!" of Haskell programming. It also provides multiple versions of fib - an exponential version and a linear version. We can translate these into Idris fairly easily:

fibExp : Nat -> Nat
fibExp Z = 0
fibExp (S Z) = 1
fibExp (S (S n)) = fibExp (S n) + fibExp n

fibLin' : Nat -> Nat -> Nat -> Nat
fibLin' Z a b = b
fibLin' (S n) a b = fibLin' n (a + b) a

fibLin : Nat -> Nat
fibLin n = fibLin' n 1 0
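
For reference, the Haskell counterparts of these definitions look like the following (a sketch of mine, with naming matched to the Idris; the wiki's versions differ cosmetically). Before proving anything, it is reassuring to check the two agree empirically on small inputs.

```haskell
-- Exponential-time fib, straight from the recurrence.
fibExp :: Int -> Integer
fibExp 0 = 0
fibExp 1 = 1
fibExp n = fibExp (n - 1) + fibExp (n - 2)

-- Linear-time fib with an untupled accumulator, as in the Idris version.
fibLin' :: Int -> Integer -> Integer -> Integer
fibLin' 0 _ b = b
fibLin' n a b = fibLin' (n - 1) (a + b) a

fibLin :: Int -> Integer
fibLin n = fibLin' n 1 0
```

Testing over a range of inputs gives confidence, but of course only the Idris proof covers all naturals.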

We've made the intermediate go function in fibLin top-level, named it fibLin' and untupled the accumulator - but it's still fundamentally the same. Now we've got the power of Idris, it would be nice to prove that fibExp and fibLin are equivalent. To do that, it's first useful to think about why fibLin' works at all. In essence we're decrementing the first argument, and making the next two arguments be fib of increasingly large values. If we think more carefully we can come up with the invariant:

fibLinInvariant : (d : Nat) -> (u : Nat) ->
fibLin' d (fibExp (1+u)) (fibExp u) = fibExp (d+u)

Namely that at all recursive calls we must have the fib of two successive values (u and u+1), and after d additional loops we end up with fib (d+u). Actually proving this invariant isn't too hard:

fibLinInvariant Z u = Refl
fibLinInvariant (S d) u =
rewrite fibLinInvariant d (S u) in
rewrite plusSuccRightSucc d u in

Idris simplifies the base case away for us. For the recursive case we apply the induction hypothesis to leave us with:

fibExp (plus d (S u)) = fibExp (S (plus d u))

Ignoring the fibExp we just need to prove d+(u+1) = (d+u)+1. That statement is obviously true because + is associative, but in our particular case, we use plusSuccRightSucc which is the associative lemma where +1 is the special S constructor. After that, everything has been proven.

Armed with the fundamental correctness invariant for fibLin', we can prove the complete equivalence.

fibEq : (n : Nat) -> fibLin n = fibExp n
fibEq n =
rewrite fibLinInvariant n 0 in
rewrite plusZeroRightNeutral n in

We appeal to the invariant linking fibLin' and fibExp, do some minor arithmetic manipulation (x+0 = x), and are done. Note that depending on exactly how you define fibLinInvariant you require different arithmetic lemmas, e.g. if you write d+u or u+d. Idris is equipped with everything required.

I was rather impressed that proving fib equivalence wasn't all that complicated, and really just requires figuring out what fundamental property makes fibLin actually work. In many ways the proof makes things clearer.

by Neil Mitchell ( at May 21, 2017 08:01 PM

May 19, 2017

Daniel Mlot (duplode)

Traversable: A Remix

Traversable is a fun type class. It lies at a crossroad, where many basic Haskell concepts meet, and it can be presented in multiple ways that provide complementary intuitions. In this post, Traversable will be described from a slightly unusual point of view, or at least one that is not put into foreground all that often. We will suspend for a moment the picture of walking across a container while using an effectful function, and instead start by considering what can be done with effectful functions.

Weird fishes

Let’s begin with a familiar sight:

a -> F b

There are quite a few overlapping ways of talking about functions with such a type. If F is a Functor, we can say the function produces a functorial context; if it is an Applicative, we (also) say it produces an effect; and if it is a Monad we (also) call it a Kleisli arrow. Kleisli arrows are the functions we use with (>>=). Kleisli arrows for a specific Monad form a category, with return as identity and the fish operator, (<=<), as composition. If we pick join as the fundamental Monad operation, (<=<) can be defined in terms of it as:

(<=<) :: Monad m => (b -> m c) -> (a -> m b) -> (a -> m c)
g <=< f = join . fmap g . f

The category laws, then, become an alternative presentation of the monad laws:

return <=< f = f
f <=< return = f
(h <=< g) <=< f = h <=< (g <=< f)

All of that is very well-known. Something less often noted, though, is that there is an interesting category for a -> F b functions even if F is not a Monad. Getting to it is amusingly easy: we just have to take the Kleisli category operators and erase the monad-specific parts from their definitions. In the case of (<=<), that means removing the join (and, for type bookkeeping purposes, slipping in a Compose in its place):

(<%<) :: (Functor f, Functor g) =>
    (b -> g c) -> (a -> f b) -> (a -> Compose f g c)
g <%< f = Compose . fmap g . f

While (<=<) creates two monadic layers and merges them, (<%<) creates two functorial layers and leaves both in place. Note that doing away with join means the Functors introduced by the functions being composed can differ, and so the category we are setting up has all functions that fit Functor f => a -> f b as arrows. That is unlike what we have with (<=<) and the corresponding Kleisli categories, which only concern a single specific monad.
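
A tiny example may make this concrete. The functions halve and twice below are made up for illustration; note how the Maybe and list layers both survive in the result, held apart by Compose.

```haskell
import Data.Functor.Compose (Compose (..))

-- Composition for effectful functions with merely Functor contexts,
-- as defined in the article.
(<%<) :: (Functor f, Functor g) => (b -> g c) -> (a -> f b) -> (a -> Compose f g c)
g <%< f = Compose . fmap g . f

halve :: Int -> Maybe Int        -- an illustrative Maybe-producing function
halve n = if even n then Just (n `div` 2) else Nothing

twice :: Int -> [Int]            -- an illustrative list-producing function
twice n = [n, n]

-- getCompose ((twice <%< halve) 4) is Just [2,2]; applied to 3 it is Nothing.
```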

As for return, not relying on Monad means we need a different identity. Given the freedom to pick any Functor mentioned just above, it makes perfect sense to replace bringing a value into a Monad in a boring way by bringing a value into the boring Functor par excellence, Identity:

Identity :: a -> Identity a

With (<%<) as composition and Identity as identity, we can state the following category laws:

Identity <%< f ~ f
f <%< Identity ~ f
(h <%< g) <%< f ~ h <%< (g <%< f)

Why didn’t I write them as equalities? Once the definition of (<%<) is substituted, it becomes clear that they do not hold literally as equalities: the left hand sides of the identity laws will have a stray Identity, and the uses of Compose on either side of the associativity law will be associated differently. Since Identity and Compose are essentially bookkeeping boilerplate, however, it would be entirely reasonable to ignore such differences. If we do that, it becomes clear that the laws do hold. All in all, we have a category, even though we can’t go all the way and shape it into a Category instance, not only due to the trivialities we chose to overlook, but also because of how each a -> F b function introduces a functorial layer F in a way that is not reflected in the target object b.

The first thing to do after figuring out we have a category on our hands is looking for functors involving it.1 One of the simplest paths towards one is considering a way to, given some Functor T, change the source and target objects in an a -> F b function to T a and T b (that is precisely what fmap does with regular functions). This would give an endofunctor, whose arrow mapping would have a signature shaped like this:

(a -> F b) -> T a -> F (T b)

This signature shape, however, should ring a bell:

class (Functor t, Foldable t) => Traversable t where
    traverse :: Applicative f => (a -> f b) -> t a -> f (t b)
    -- etc.

If traverse were the arrow mapping of our endofunctor, the relevant functor laws would be:

traverse Identity = Identity
traverse (g <%< f) = traverse g <%< traverse f

Substituting the definition of (<%<) reveals these are the identity and composition laws of Traversable:

traverse Identity = Identity
traverse (Compose . fmap g . f) = Compose . fmap (traverse g) . traverse f

There it is: a Traversable instance is an endofunctor for a category made of arbitrary context-producing functions.2

Is it really, though? You may have noticed I have glossed over something quite glaring: if (<%<) only involved Functor constraints, where does the Applicative in traverse come from?


Let’s pretend we have just invented the Traversable class by building it around the aforementioned endofunctor. At this point, there is no reason for using anything more restrictive than Functor in the signature of its arrow mapping:

-- Tentative signature:
traverse :: (Functor f, Traversable t) => (a -> f b) -> t a -> f (t b)

The natural thing to do now is trying to write traverse for various choices of t. Let’s try it for one of the simplest Functors around: the pair functor, (,) e – values with something extra attached:

instance Traversable ((,) e) where
    -- traverse :: Functor f => (a -> f b) -> (e, a) -> f (e, b)
    traverse f (e, x) = ((,) e) <$> f x

Simple enough: apply the function to the contained value, and then shift the extra stuff into the functorial context with fmap. The resulting traverse follows the functor laws just fine.

If we try to do it for different functors, though, we quickly run into trouble. Maybe looks simple enough…

instance Traversable Maybe where
    -- traverse :: Functor f => (a -> f b) -> Maybe a -> f (Maybe b)
    traverse f (Just x) = Just <$> f x
    traverse f Nothing  = -- ex nihilo

… but the Nothing case stumps us: there is no value that can be supplied to f, which means the functorial context would have to be created out of nothing.

For another example, consider what we might do with a homogeneous pair type (or, if you will, a vector of length two):

data Duo a = Duo a a

instance Functor Duo where
    fmap f (Duo x y) = Duo (f x) (f y)

instance Traversable Duo where
    -- traverse :: Functor f => (a -> f b) -> Duo a -> f (Duo b)
    traverse f (Duo x y) = -- dilemma

Here, we seemingly have to choose between applying f to x or to y, and then using fmap (\z -> Duo z z) on the result. No matter the choice, though, discarding one of the values means the functor laws will be broken. A lawful implementation would require somehow combining the functorial values f x and f y.

As luck would have it, though, there is a type class which provides ways to both create a functorial context out of nothing and to combine functorial values: Applicative. pure solves the first problem; (<*>), the second:

instance Traversable Maybe where
    -- traverse :: Applicative f => (a -> f b) -> Maybe a -> f (Maybe b)
    traverse f (Just x) = Just <$> f x
    traverse f Nothing  = pure Nothing

instance Traversable Duo where
    -- traverse :: Applicative f => (a -> f b) -> Duo a -> f (Duo b)
    traverse f (Duo x y) = Duo <$> f x <*> f y

Shifting to the terminology of containers for a moment, we can describe the matter by saying the version of traverse with the Functor constraint can only handle containers that hold exactly one value. Once the constraint is strengthened to Applicative, however, we have the means to deal with containers that may hold zero or many values. This is a very general solution: there are instances of Traversable for the Identity, Const, Sum, and Product functors, which suffice to encode any algebraic data type.3 That explains why the DeriveTraversable GHC extension exists. (Note, though, that Traversable instances in general aren’t unique.)
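To make the container story concrete, here is a small sketch (my example, not the author's) using DeriveTraversable on a binary tree, a shape with zero-or-many values that the Functor-only traverse could not handle:

```haskell
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}

data Tree a = Leaf | Node (Tree a) a (Tree a)
    deriving (Show, Eq, Functor, Foldable, Traversable)

halve :: Int -> Maybe Int
halve n = if even n then Just (n `div` 2) else Nothing

-- pure handles the empty Leaf; (<*>) combines the contexts of the subtrees,
-- so a single odd element collapses the whole result to Nothing.
evens, withOdd :: Maybe (Tree Int)
evens   = traverse halve (Node (Node Leaf 2 Leaf) 4 Leaf)
withOdd = traverse halve (Node (Node Leaf 2 Leaf) 3 Leaf)
```

Here evens is Just (Node (Node Leaf 1 Leaf) 2 Leaf), while withOdd is Nothing.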

It must be noted that our reconstruction does not reflect how Traversable was discovered, as the idea of using it to walk across containers holding an arbitrary number of values was there from the start. That being so, Applicative plays an essential role in the usual presentations of Traversable. To illustrate that, I will now paraphrase Definition 3.3 in Jaskelioff and Rypacek’s An Investigation of the Laws of Traversals. It is formulated not in terms of traverse, but of sequenceA:

sequenceA :: (Applicative f, Traversable t) => t (f a) -> f (t a)

sequenceA is characterised as a natural transformation in the category of applicative functors which “respects the monoidal structure of applicative functor composition”. It is worth taking a few moments to unpack that:

  • The category of applicative functors has what the Data.Traversable documentation calls “applicative transformations” as arrows – functions of general type (Applicative f, Applicative g) => f a -> g a which preserve pure and (<*>).

  • sequenceA is a natural transformation in the aforementioned category of applicative functors. The two functors it maps between amount to the two ways of composing an applicative functor with the relevant traversable functor. The naturality law of Traversable

    -- t is an applicative transformation
    t . sequenceA = sequenceA . fmap t

… captures that fact (which, thanks to parametricity, is a given in Haskell).

  • Applicative functors form a monoid, with Identity as unit and functor composition as multiplication. sequenceA preserves these monoidal operations, and the identity and composition laws of Traversable express that:

    sequenceA . fmap Identity = Identity
    sequenceA . fmap Compose = Compose . fmap sequenceA . sequenceA

All of that seems only accidentally related to what we have done up to this point. However, if sequenceA is taken as the starting point, traverse can be defined in terms of it:

traverse f = sequenceA . fmap f

Crucially, the opposite path is also possible. It follows from parametricity4 that…

traverse f = traverse id . fmap f

… which allows us to start from traverse, define…

sequenceA = traverse id

… and continue as before. At this point, our narrative merges with the traditional account of Traversable.
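The interdefinability can be spot-checked with a tiny sketch (the names traverseS, sequenceT and positive are mine):

```haskell
-- traverse defined from sequenceA...
traverseS :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
traverseS f = sequenceA . fmap f

-- ...and sequenceA defined from traverse.
sequenceT :: (Traversable t, Applicative f) => t (f a) -> f (t a)
sequenceT = traverse id

positive :: Int -> Maybe Int
positive n = if n > 0 then Just n else Nothing
```

Both agree with the library versions: traverseS positive [1,2,3] is Just [1,2,3], traverseS positive [1,0,3] is Nothing, and sequenceT [Just 1, Nothing] is Nothing.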

A note about lenses

In the previous section, we saw how using Applicative rather than Functor in the type of traverse made it possible to handle containers which don’t necessarily hold just one value. It is not a coincidence that, in lens, this is precisely the difference between Traversal and Lens:

type Traversal s t a b = forall f. Applicative f => (a -> f b) -> s -> f t
type Lens s t a b = forall f. Functor f => (a -> f b) -> s -> f t

A Lens targets exactly one value. A Traversal might reach zero, one or many targets, which requires a strengthening of the constraint. Van Laarhoven (i.e. lens-style) Traversals and Lenses can be seen as a straightforward generalisation of the traverse-as-arrow-mapping view we have been discussing here, in which the, so to say, functoriality of the container isn’t necessarily reflected at type level in a direct way.
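The point can be demonstrated without importing lens at all. In this self-contained sketch, the names _1, both and over mirror lens's, but the definitions are mine:

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Identity (Identity (..))

type Lens s t a b      = forall f. Functor f     => (a -> f b) -> s -> f t
type Traversal s t a b = forall f. Applicative f => (a -> f b) -> s -> f t

-- Exactly one target: Functor suffices, just as in traverse for ((,) e).
_1 :: Lens (a, c) (b, c) a b
_1 f (x, y) = (\x' -> (x', y)) <$> f x

-- Two targets: (<*>) is needed to combine the contexts, hence Applicative.
both :: Traversal (a, a) (b, b) a b
both f (x, y) = (,) <$> f x <*> f y

-- Pure modification falls out by instantiating f to Identity.
over :: ((a -> Identity b) -> s -> Identity t) -> (a -> b) -> s -> t
over l f = runIdentity . l (Identity . f)
```

For instance, over _1 (+ 1) (1, "a") gives (2, "a"), and over both (* 2) (3, 4) gives (6, 8).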

A note about profunctors

Early on, we noted that (<%<) gave us a category that cannot be expressed as a Haskell Category because its composition is too quirky. We have a general-purpose class that is often a good fit for things that look like functions, arrows and/or Category instances but don’t compose in conventional ways: Profunctor. And sure enough: profunctors defines a profunctor called Star

-- | Lift a 'Functor' into a 'Profunctor' (forwards).
newtype Star f d c = Star { runStar :: d -> f c }

… which corresponds to the arrows of the category we presented in the first section. It should come as no surprise that Star is an instance of a class called Traversing

-- Abridged definition.
class (Choice p, Strong p) => Traversing p where
  traverse' :: Traversable f => p a b -> p (f a) (f b)
  wander :: (forall f. Applicative f => (a -> f b) -> s -> f t) -> p a b -> p s t

instance Applicative m => Traversing (Star m) where
  traverse' (Star m) = Star (traverse m)
  wander f (Star amb) = Star (f amb)

… which is a profunctor-oriented generalisation of Traversable.

Amusingly, it turns out there is a baroque way of expressing (<%<) composition with the profunctors vocabulary. Data.Profunctor.Composition gives us a notion of profunctor composition:

data Procompose p q d c where
  Procompose :: p x c -> q d x -> Procompose p q d c

Procompose simply pairs two profunctorial values with matching extremities. That is unlike Category composition, which welds two arrows5 into one:

(.) :: Category cat => cat b c -> cat a b -> cat a c

The difference is rather like that between combining functorial layers at type level with Compose and fusing monadic layers with join6.

Among a handful of other interesting things, Data.Profunctor.Composition offers a lens-style isomorphism

stars :: Functor f => Iso' (Procompose (Star f) (Star g) d c) (Star (Compose f g) d c)

… which gives us a rather lyrical encoding of (<%<):

GHCi> import Data.Profunctor
GHCi> import Data.Profunctor.Composition
GHCi> import Data.Profunctor.Traversing
GHCi> import Data.Functor.Compose
GHCi> import Control.Lens
GHCi> f = Star $ \x -> print x *> pure x
GHCi> g = Star $ \x -> [0..x]
GHCi> getCompose $ runStar (traverse' (view stars (g `Procompose` f))) [0..2]

If you feel like playing with that, note that Data.Profunctor.Sieve offers a more compact (though prosaic) spelling:

GHCi> import Data.Profunctor.Sieve
GHCi> :t sieve
sieve :: Sieve p f => p a b -> a -> f b
GHCi> getCompose $ traverse (sieve (g `Procompose` f)) [0..2]

Further reading

  • The already mentioned An Investigation of the Laws of Traversals, by Mauro Jaskelioff and Ondrej Rypacek, is a fine entry point to the ways of formulating Traversable. It also touches upon some important matters I didn’t explore here, such as how the notion of container Traversable mobilises can be made precise, or the implications of the Traversable laws. I plan to discuss some aspects of these issues in a follow-up post.

  • Will Fancher’s Profunctors, Arrows, & Static Analysis is a good applied introduction to profunctors. In its final sections, it demonstrates some use cases for the Traversing class mentioned here.

  • The explanation of profunctor composition in this post is intentionally cursory. If you want to dig deeper, Dan Piponi’s Profunctors in Haskell can be a starting point. (N.B.: Wherever you see “cofunctor” there, read “contravariant functor” instead). Another option is going to Bartosz Milewski’s blog and searching for “profunctor” (most of the results will be relevant).

<section class="footnotes">
  1. For why that is a good idea, see Gabriel Gonzalez’s The functor design pattern.

  2. A more proper derivation for the results in this section can be found in this Stack Overflow answer, which I didn’t transcribe here to avoid boring you.

  3. Suffice, that is, with the help of the trivial data types, () (unit) and Void. As an arbitrary example, Maybe can be encoded using this functor toolkit as Sum (Const ()) Identity.

  4. The property is an immediate consequence of the free theorem for traverse. Cf. this Stack Overflow answer by Rein Heinrichs.

  5. I mean “arrows” in the general sense, and not necessarily Arrows as in Control.Arrow!

  6. This is not merely a loose analogy. For details, see Bartosz Milewski’s Monoids on Steroids, and in particular its section about Arrows.

Comment on GitHub (see the full post for a reddit link)

by Daniel Mlot at May 19, 2017 07:30 AM

May 16, 2017

Neil Mitchell

Idris reverse proofs

Summary: I proved O(n) reverse is equivalent to O(n^2) reverse.

Following on from my previous post I left the goal of given two reverse implementations:

reverse1 : List a -> List a
reverse1 [] = []
reverse1 (x :: xs) = reverse1 xs ++ [x]

rev2 : List a -> List a -> List a
rev2 acc [] = acc
rev2 acc (x :: xs) = rev2 (x :: acc) xs

reverse2 : List a -> List a
reverse2 = rev2 []

Proving the following laws hold:

proof_reverse1_reverse1 : (xs : List a) -> xs = reverse1 (reverse1 xs)
proof_reverse2_reverse2 : (xs : List a) -> xs = reverse2 (reverse2 xs)
proof_reverse1_reverse2 : (xs : List a) -> reverse1 xs = reverse2 xs
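The definitions and properties translate directly to Haskell, where the laws can at least be spot-checked before attempting the proofs (this quick sketch is mine, not from the post; it provides evidence, not a proof):

```haskell
reverse1 :: [a] -> [a]
reverse1 []       = []
reverse1 (x : xs) = reverse1 xs ++ [x]

rev2 :: [a] -> [a] -> [a]
rev2 acc []       = acc
rev2 acc (x : xs) = rev2 (x : acc) xs

reverse2 :: [a] -> [a]
reverse2 = rev2 []

-- Check all three properties on lists of length 0 through 10.
lawsHold :: Bool
lawsHold = and [ xs == reverse1 (reverse1 xs)
              && xs == reverse2 (reverse2 xs)
              && reverse1 xs == reverse2 xs
               | n <- [0 .. 10 :: Int], let xs = [1 .. n] ]
```

Running lawsHold in GHCi returns True.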

The complete proofs are available in my Idris github repo, and if you want to get the most out of the code, it's probably best to step through the types in Idris. In this post I'll talk through a few of the details.

Directly proving reverse1 (reverse1 x) = x by induction is hard, since the recursion in reverse1 doesn't mirror the structure of the statement directly. Instead we need to define a helper lemma.

proof_reverse1_append : (xs : List a) -> (ys : List a) ->
reverse1 (xs ++ ys) = reverse1 ys ++ reverse1 xs

Coming up with these helper lemmas is the magic of writing proofs - and is part intuition, part thinking about what might be useful and part thinking about what invariants are obeyed. With that lemma, you can prove by induction on the first argument (which is the obvious one to choose since ++ evaluates the first argument first). Proving the base case is trivial:

proof_reverse1_append [] ys =
rewrite appendNilRightNeutral (reverse1 ys) in Refl

The proof immediately reduces to reverse1 ys = reverse1 ys ++ [] and we can invoke the standard proof that if ++ has [] on the right we can ignore it. After applying that rewrite, we're straight back to Refl.

The :: case is not really any harder: we immediately get to reverse1 ys ++ (reverse1 xs ++ [x]), but bracketed the wrong way round, so we have to apply appendAssociative to rebracket so it fits the inductive lemma. The proof looks like:

proof_reverse1_append (x :: xs) ys =
rewrite appendAssociative (reverse1 ys) (reverse1 xs) [x] in
rewrite proof_reverse1_append xs ys in Refl

Once we have the proof of how to move reverse1 over ++ we use it inductively to prove the original lemma:

proof_reverse1_reverse1 : (xs : List a) -> xs = reverse1 (reverse1 xs)
proof_reverse1_reverse1 [] = Refl
proof_reverse1_reverse1 (x :: xs) =
rewrite proof_reverse1_append (reverse1 xs) [x] in
rewrite proof_reverse1_reverse1 xs in Refl

For the remaining two proofs, I first decided to prove reverse1 x = reverse2 x and then prove the reverse2 (reverse2 x) = x in terms of that. To prove equivalence of the two functions I first needed information about the fundamental property that rev2 obeys:

proof_rev : (xs : List a) -> (ys : List a) ->
rev2 xs ys = reverse2 ys ++ xs

Namely that the accumulator can be tacked on the end. The proof itself isn't very complicated, although it does require two calls to the inductive hypothesis (you basically expand out rev2 on both sides and show they are equivalent):

proof_rev xs [] = Refl
proof_rev xs (y :: ys) =
rewrite proof_rev [y] ys in
rewrite sym $ appendAssociative (rev2 [] ys) [y] xs in
rewrite proof_rev (y :: xs) ys in Refl

The only slight complexity is that I needed to apply sym to switch the way the appendAssociative proof is applied. With that proof available, proving equivalence isn't that hard:

proof_reverse1_reverse2 : (xs : List a) -> reverse1 xs = reverse2 xs
proof_reverse1_reverse2 [] = Refl
proof_reverse1_reverse2 (x :: xs) =
rewrite proof_rev [x] xs in
rewrite proof_reverse1_reverse2 xs in Refl

In essence the proof_rev term shows that rev2 behaves like the O(n^2) reverse.

Finally, actually proving that reverse2 (reverse2 x) = x just involves replacing all the occurrences of reverse2 with reverse1, then applying the proof that the property holds for reverse1. Nothing complicated at all:

proof_reverse2_reverse2 : (xs : List a) -> xs = reverse2 (reverse2 xs)
proof_reverse2_reverse2 xs =
rewrite sym $ proof_reverse1_reverse2 xs in
rewrite sym $ proof_reverse1_reverse2 (reverse1 xs) in
rewrite proof_reverse1_reverse1 xs in Refl

If you've got this far and are still hungry for more proof exercises I recommend Exercises on Generalizing the Induction Hypothesis which I have now worked through (solutions if you want to cheat).

by Neil Mitchell at May 16, 2017 08:08 PM

May 15, 2017

Brent Yorgey

Algorithms lecture notes and assignments

I just finished teaching our algorithms course for the second time. The first time around, last spring, I was always scrambling at the last minute to prepare for class, make new assignments, and so on (although I did have some excellent material from Brent Heeringa to start with). This time around, I had a bit more breathing room to develop a fuller set of assignments and actually TeX up all my hand-written lecture notes. The course is loosely based on the approach taken by Kleinberg and Tardos, though I don’t really rely on the book.

Feel free to use any of the lecture notes, assignments, or even exam questions. I didn’t leave the exams linked by accident; I use an exam format where the students have a week or so to prepare solutions to the exam, using any resources they want, and then have to come in on exam day and write down their solutions without referring to anything (I got this idea from Scott Weinstein). So leaving the exams posted publicly on the web isn’t a problem for me.

Please don’t ask for solutions; I won’t give any, even if you are an instructor. But questions, comments, bug reports, etc. are most welcome.

by Brent at May 15, 2017 06:10 PM

May 11, 2017

Mark Jason Dominus

Zomg lots more anagram stuff

I'm almost done with anagrams. For now, anyway. I think. This article is to mop up the last few leftover anagram-related matters so that I can put the subject to rest.

(Earlier articles: [1] [2] [3] [•] )

Code is available

Almost all the code I wrote for this project is available on Github.

The documentation is not too terrible, I think.

Anagram lists are available

I have also placed my scored anagram lists on my web site. Currently available are:

  • Original file from the 1990s. This contains 23,521 anagram pairs, the results of my original scoring algorithm on a hand-built dictionary that includes the Unix spellcheck dictionary (/usr/dict/words), the Webster's Second International Dictionary word list, and some lexicons copied from a contemporaneous release of WordNet. This file has been in the same place on my web site since 1997 and is certainly older than that.

  • New file from February. Unfortunately I forget what went into this file. Certainly everything in the previous file, and whatever else I had lying around, probably including the Moby Word Lists. It contains 38,333 anagram pairs.

  • Very big listing of Wikipedia article titles. (11 MB compressed) I acquired the current list of article titles from the English Wikipedia; there are around 13,000,000 of these. I scored these along with the other lexicons I had on hand. The results include 1,657,150 anagram pairs. See below for more discussion of this.

!‌!Con talk

On Saturday I gave a talk about the anagram-scoring work at !‌!Con in New York. The talk was not my best work, since I really needed 15 minutes to do a good job and I was unwilling to cut it short enough. (I did go overtime, which I deeply regret.) At least nobody came up to me afterward and complained.

Talk materials are on my web site and I will link other talk-related stuff from there when it becomes available. The video will be available around the end of May, and the text transcript probably before that.

[ Addendum 20170518: The video is available thanks to Confreaks. ]

Both algorithms are exponential

The day after the talk an attendee asked me a very good question: why did I say that one algorithm for scoring anagrams was better than the other, when they are both exponential? (Sorry, I don't remember who you were—if you would like credit please drop me a note.)

The two algorithms are:

  • A brute-force search to construct all possible mappings from word A to word B, and then calculate the minimum score over all mappings (more details)

  • The two words are converted into a graph; we find the maximum independent set in the graph, and the size of the MIS gives the score (more details)

The answer to this excellent question begins with: just because two problems are both hard doesn't mean they are equally hard. In this case, the MIS algorithm is better for several reasons:

  1. The number of possible mappings from A to B depends on the number of repeated letters in each word. For words of length n, in the worst case this is something like n!. This quantity is superexponential; it eventually exceeds c^n for all constants c. The naïve algorithm for MIS is only exponential, having run time O(2^n).

  2. The problem size for the mapping algorithm depends on the number of repeated letters in the words. The problem size for the MIS algorithm depends on the number of shared adjacent letter pairs in the two words. This is almost always much smaller.

  3. There appears to be no way to score all the mappings without constructing the mappings and scoring them. In contrast, MIS is well-studied and if you don't like the obvious algorithm you can do something cleverer that still runs in exponential time, but with a much smaller base.

  4. Branch-and-bound techniques are much more effective for the MIS problem, and in this particular case we know something about the graph structure, which can be exploited to make them even more effective. For example, when calculating the score for

    chromophotolithograph photochromolithograph

    my MIS implementation notices the matching trailing olithograph parts right away, and can then prune out any part of the MIS search that cannot produce a mapping with fewer than 11 chunks. Doing this in the mapping-generating algorithm is much more troublesome.
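For a feel of point 1: the number of letter-position bijections between two anagrams is the product, over each distinct letter, of the factorial of its multiplicity — all letters distinct gives exactly one mapping, all letters equal gives n!. A back-of-envelope Haskell sketch of that count (mine, not the code from the talk):

```haskell
import Data.List (group, sort)

-- A letter occurring k times in word A can be matched with its k occurrences
-- in word B in k! ways, independently for each letter, so the counts multiply.
mappingCount :: String -> Integer
mappingCount w = product [ factorial (length g) | g <- group (sort w) ]
  where
    factorial k = product [1 .. fromIntegral k]
```

For instance, mappingCount "stop" is 1 (all letters distinct), while mappingCount "aaaa" is already 4! = 24.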

Stuff that didn't go into the talk

On Wednesday I tried out the talk on Katara and learned that it was around 75% too long. I had violated my own #1 content rule: “Do not begin with a long introduction”. My draft talk started with a tour of all my favorite anagrams, with illustrations. Included were:

  • “Please” and “asleep” and “elapse”.

  • “Spectrum” and “crumpets” ; my wife noticed this while we were at a figure-skating event at the Philadelphia Spectrum, depicted above.

  • “English” and “shingle” ; I came up with this looking at a teabag while at breakfast with my wife's parents. This prompted my mother-in-law to remark that it must be hard to always be thinking about such things—but then she admitted that when she sees long numerals she always checks them for divisibility by 9.

  • “Soupmaster” and “mousetraps”. The picture here is not perfect. I wanted a picture of the Soupmaster restaurant that was at the Liberty Place food court in Philadelphia, but I couldn't find one.

  • I also wanted to show the back end of a Honda Integra and a picture of granite, but I couldn't find a good picture of either one before I deleted them from the talk. (My wife also gets credit for noticing this one.) [ Addendum 20170515: On the road yesterday I was reminded of another one my wife noticed: “Pontiac” / “caption”. ]

Slide #1 defines what anagrams actually are, with an example of “soapstone” / “teaspoons”. I had originally thought I might pander to the left-wing sensibilities of the !‌!Con crowd by using the example “Donald Trump” / “Lord Dampnut” and even made the illustration. I eventually rejected this for a couple of reasons. First, it was misleading because I only intended to discuss single-word anagrams. Second, !‌!Con is supposed to be fun and who wants to hear about Donald Trump?

But the illustration might be useful for someone else, so here it is. Share and enjoy.

After I rejected this I spent some time putting together an alternative, depicting “I am Lord Voldemort” / “Tom Marvolo Riddle”. I am glad I went with the soapstone teaspoons instead.

People Magazine

Clearly one important ingredient in finding good anagrams is that they should have good semantics. I did not make much of an effort in this direction. But it did occur to me that if I found a list of names of well-known people I might get something amusing out of it. For example, it is well known that “Britney Spears” is an anagram of “Presbyterians” which may not be meaningful but at least provides something to mull over.

I had some trouble finding a list of names of well-known people, probably because I do not know where to look, but I did eventually find a list of a few hundred on the People Magazine web site so I threw it into the mix and was amply rewarded:

Cheryl Burke Huckleberry

I thought Cheryl Burke was sufficiently famous, sufficiently recently, that most people might have heard of her. (Even I know who she is!) But I gave a version of the !‌!Con talk to the Philadelphia Perl Mongers the following Monday and I was the only one in the room who knew. (That version of the talk took around 75 minutes, but we took a lot of time to stroll around and look at the scenery, much of which is in this article.)

I had a struggle finding the right Cheryl Burke picture for the !‌!Con talk. The usual image searches turned up lots of glamour and fashion pictures and swimsuit pictures. I wanted a picture of her actually dancing and for some reason this was not easy to find. The few I found showed her from the back, or were motion blurred. I was glad when I found the one above.


A few days before the !‌!Con talk my original anagram-scoring article hit #1 on Hacker News. Hacker News user Pxtl suggested using the Wikipedia article title list as an input lexicon. The article title list is available for download from the Wikimedia Foundation so you don't have to scrape the pages as Pxtl suggested. There are around 13 million titles and I found all the anagrams and scored them; this took around 25 minutes with my current code.
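Finding the anagram pairs in the first place is the cheap part: bucket titles by their sorted letters and keep any bucket with two or more entries. A sketch of that step (my reconstruction, not the actual code from the Github repo):

```haskell
import Data.Char (isAlpha, toLower)
import Data.List (sort)
import qualified Data.Map.Strict as Map

-- A title's signature: its letters only, lowercased and sorted.
signature :: String -> String
signature = sort . map toLower . filter isAlpha

-- Group titles sharing a signature; buckets of two or more are anagram sets.
anagramSets :: [String] -> [[String]]
anagramSets titles =
    filter ((> 1) . length) . Map.elems $
        Map.fromListWith (flip (++)) [ (signature t, [t]) | t <- titles ]
```

For example, anagramSets ["Winona Ryder", "Wynona Rider", "granite"] returns [["Winona Ryder","Wynona Rider"]]; scoring the resulting pairs is where the real work happens.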

The results were not exactly disappointing, but neither did they deliver anything as awesomely successful as “cinematographer” / “megachiropteran”. The top scorer by far was “ACEEEFFGHHIILLMMNNOORRSSSTUV”, which is the pseudonym of 17th-century German writer Hans Jakob Christoffel von Grimmelshausen. Obviously, Grimmelshausen constructed his pseudonym by sorting the letters of his name into alphabetical order.

(Robert Hooke famously used the same scheme to claim priority for discovery of his spring law without actually revealing it. He published the statement as “ceiiinosssttuv” and then was able to claim, two years later, that this was an anagram of the actual law, which was “ut tensio, sic vis”. (“As the extension, so the force.”) An attendee of my Monday talk wondered if there is some other Latin phrase that Hooke could have claimed to have intended. Perhaps someone else can take the baton from me on this project.)

Anyway, the next few top scorers demonstrate several different problems:

    21 Abcdefghijklmnopqrstuvwxyz / Qwertyuiopasdfghjklzxcvbnm
    21 Abcdefghijklmnopqrstuvwxyz / Qwertzuiopasdfghjklyxcvbnm
    21 Ashland County Courthouse / Odontorhynchus aculeatus
    21 Daniel Francois Malherbe / Mindenhall Air Force Base

    20 Christine Amongin Aporu / Ethnic groups in Romania
    20 Message force multiplier / Petroleum fiscal regimes

    19 Cholesterol lowering agent / North West Regional College
    19 Louise de Maisonblanche / Schoenobius damienella
    19 Scorpaenodes littoralis / Steroidal spirolactones

The “Qwerty” ones are intrinsically uninteresting and anyway we could have predicted ahead of time that they would be there. And the others are just sort of flat. “Odontorhynchus aculeatus” has the usual problems. One can imagine that there could be some delicious irony in “Daniel Francois Malherbe” / “Mindenhall Air Force Base” but as far as I can tell there isn't any and neither was Louise de Maisonblanche killed by an S. damienella. (It's a moth. Mme de Maisonblanche was actually killed by Variola which is not an anagram of anything interesting.)

Wikipedia article titles include many trivial variations. For example, many people will misspell “Winona Ryder” as “Wynona Rider”, so Wikipedia has pages for both, with the real article at the correct spelling and the incorrect one redirecting to it. The anagram detector cheerfully picks these up although they do not get high scores. Similarly:

  • there are a lot of articles about weasels that have alternate titles about “weasles”
  • there are a lot of articles about the United States or the United Kingdom that have alternate titles about the “Untied States” or the “Untied Kingdom”
  • Articles about the “Center for” something or other with redirects to (or from) the “Centre for” the same thing.
  • There is an article about “Major professional sports leagues in Canada and the United States” with a redirect from “Major professional sports leagues in the United States and Canada”.
  • You get the idea.

The anagram scorer often had quite a bit of trouble with items like these because they are long and full of repeated letter pairs. The older algorithm would have done even worse. If you're still wondering about the difference between two exponential algorithms, some of these would make good example cases to consider.

As I mentioned above you can download the Wikipedia anagrams from my web site and check for yourself. My favorite item so far is:

    18 Atlantis Casino Resort Spa / Carter assassination plot


Some words appear with surprising frequency and I don't know why. As I mentioned above one of the top scorers was “Ethnic groups in Romania” and for some reason Romania appears in the anagram list over and over again:

    20 Christine Amongin Aporu / Ethnic groups in Romania
    17 List of Romanian actors / Social transformation
    15 Imperial Coronation  / Romanian riot police
    14 Rakhine Mountains / Romanians in the UK
    14 Mindanao rasbora / Romanians abroad
    13 Romanian poets / ramosopinnate
    13 Aleuron carinatum / Aromanian culture
    11 Resita Montana / Romanian state
    11 Monte Schiara / The Romaniacs
    11 Monetarianism / Romanian Times
    11 Marion Barnes / Romanian Serb
    11 Maarsen railway station / Romanian State Railways
    11 Eilema androconia / Nicolae de Romania
    11 Ana Maria Norbis / Arabs in Romania

    ( 170 more )

Also I had never thought of this before, but Romania appears in this unexpected context:

    09 Alicia Morton / Clitoromania
    09 Carinito Malo / Clitoromania

(Alicia Morton played Annie in the 1999 film. Carinito Malo is actually Cariñito Malo. I've already discussed the nonequivalence of “n” and “ñ” so I won't beat that horse again.)

Well, this is something I can investigate. For each string of letters, we have here the number of Wikipedia article titles in which the string appears (middle column), the number of anagram pairs in which the string appears (left column; anagrams with score less than 6 are not counted) and the quotient of the two (right column).

            romania               110  4106  2.7%
            serbia                109  4400  2.5%
            croatia                68  3882  1.8%
            belarus                24  1810  1.3%

            ireland               140 11426  1.2%
            andorra                 7   607  1.2%
            austria                60  5427  1.1%
            russia                137 15944  0.9%

            macedonia              28  3167  0.9%
            france                111 14785  0.8%
            spain                  64  8880  0.7%
            slovenia               18  2833  0.6%

            wales                  47  9438  0.5%
            portugal               17  3737  0.5%
            italy                  21  4353  0.5%
            denmark                19  3698  0.5%

            ukraine                12  2793  0.4%
            england                37  8719  0.4%
            sweden                 11  4233  0.3%
            scotland               16  4945  0.3%

            poland                 22  6400  0.3%
            montenegro              4  1446  0.3%
            germany                16  5733  0.3%
            finland                 6  2234  0.3%

            albania                10  3268  0.3%
            slovakia                3  1549  0.2%
            norway                  9  3619  0.2%
            greece                 10  8307  0.1%

            belgium                 3  2414  0.1%
            switzerland             0  5439  0.0%
            netherlands             1  3522  0.0%
            czechia                 0    75  0.0%

As we see, Romania and Serbia are substantially ahead of the others. I suspect that it is a combination of some lexical property (the interesting part) and the relatively low coverage of those countries in English Wikipedia. That is, I think if we were to identify the lexical component, we might well find that russia has more of it, but scores lower than romania because Russia is much more important. My apologies if I accidentally omitted your favorite European country.
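The right-hand column is just the left column divided by the middle one; as a sketch in Haskell (the matching rule is my guess at how the counts were produced, not the code actually used):

```haskell
import Data.Char (isAlpha, toLower)
import Data.List (isInfixOf)

-- Guessed matching rule: a string "appears" in a title if it occurs as a
-- contiguous run of letters, ignoring case and punctuation.
appearsIn :: String -> String -> Bool
appearsIn w title = w `isInfixOf` map toLower (filter isAlpha title)
-- "romania" `appearsIn` "Clitoromania" → True

-- The right-hand column: scoring anagram pairs per hundred titles.
percent :: Int -> Int -> Double
percent pairs titles = 100 * fromIntegral pairs / fromIntegral titles
-- percent 110 4106 ≈ 2.7, the "romania" row
```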

[ Oh, crap, I just realized I left out Bosnia. ]


Another one of the better high scorers turns out to be the delightful:

   16 Lesbian intercourse / Sunrise Celebration

“Lesbian”, like “Romania”, seems to turn up over and over; the next few are:

    11 Lesbian erotica / Oreste Bilancia
    11 Pitane albicollis / Political lesbian
    12 Balearic islands / Radical lesbians
    12 Blaise reaction / Lesbian erotica

    (43 more)

Wikipedia says:

The Blaise reaction is an organic reaction that forms a β-ketoester from the reaction of zinc metal with an α-bromoester and a nitrile.

A hundred points to anyone who can make a genuinely funny joke out of this.

Oreste Bilancia is an Italian silent-film star, and Pitane albicollis is another moth. I did not know there were so many anagrammatic moths. Christian Bale is an anagram of Birthana cleis, yet another moth.

I ran the same sort of analysis on lesbian as on romania, except that since it wasn't clear what to compare it to, I picked a bunch of random words.

    nosehair                 3     3 100.0%
    margarine                4    16  25.0%
    penis                   95   573  16.6%
    weasel                  11   271   4.1%
    phallus                  5   128   3.9%
    lesbian                 26   863   3.0%
    center                 340 23969   1.4%
    flowers                 14  1038   1.3%
    trumpet                  6   487   1.2%
    potato                  10   941   1.1%
    octopus                  4   445   0.9%
    coffee                  12  1531   0.8%

It seems that lesbian appears with unusually high but not remarkably high frequency. The unusual part is its participation in so many anagrams with very high scores. The outstanding item here is penis. (The top two being rare outliers.) But penis still wins even if I throw away anagrams with scores less than 10 (instead of less than 6):

    margarine               1    16   6.2%
    penis                  13   573   2.3%
    lesbian                 8   863   0.9%
    trumpet                 2   487   0.4%
    flowers                 4  1038   0.4%
    center                 69 23969   0.3%
    potato                  2   941   0.2%
    octopus                 1   445   0.2%
    coffee                  1  1531   0.1%
    weasel                  0   271   0.0%
    phallus                 0   128   0.0%
    nosehair                0     3   0.0%

Since I'm sure you are wondering, here are the anagrams of margarine and nosehair:

    07 Nosehair / Rehsonia
    08 Aso Shrine / Nosehairs
    09 Nosehairs / hoariness

    04 Margaret Hines / The Margarines
    07 Magerrain / margarine
    07 Ramiengar / margarine
    08 Rae Ingram / margarine
    11 Erika Armstrong / Stork margarine

I think “Margaret Hines” / “The Margarines” should score more than 4, and that this exposes a defect in my method.

Acrididae graphs 

Here is the graph constructed by the MIS algorithm for the pair “acrididae” / “cidaridae”, which I discussed in an earlier article and also mentioned in my talk.

Each maximum independent set in this graph corresponds to a minimum-chunk mapping between “acrididae” and “cidaridae”. In the earlier article, I claimed:

This one has two maximum independent sets

which is wrong; it has three, yielding three different mappings with five chunks:

My daughter Katara points out that the graphs above resemble grasshoppers. My Gentle Readers will no doubt recall that acrididae is the family of grasshoppers, comprising around 10,000 species. I wanted to find an anagram “grasshopper” / “?????? graph”. There are many anagrams of “eoprs” and “eoprss” but I was not able to find anything good. The best I could do was “spore graphs”.
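At this scale the maximum independent sets can even be found by brute force; here is a sketch in Haskell (my own, not the MIS algorithm from the earlier article):

```haskell
import Data.List (subsequences)

-- A small graph: its vertices and its edges.
type Graph a = ([a], [(a, a)])

-- All independent sets, by brute force -- fine when the conflict graph
-- has only a handful of vertices, as here.
independentSets :: Eq a => Graph a -> [[a]]
independentSets (vs, es) =
  [ s | s <- subsequences vs
      , and [ (u, v) `notElem` es && (v, u) `notElem` es
            | u <- s, v <- s, u /= v ] ]

-- The maximum independent sets are the independent sets of largest size.
maximumIndependentSets :: Eq a => Graph a -> [[a]]
maximumIndependentSets g = filter ((== k) . length) sets
  where
    sets = independentSets g
    k    = maximum (map length sets)
```

For example, `maximumIndependentSets (["a","b","c"], [("a","b")])` is `[["a","c"],["b","c"]]`: every maximum set omits one endpoint of the single edge.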

Thank you, Gentle Readers, for taking this journey with me. I hope nobody walks up to me in the next year to complain that my blog does not feature enough anagram-related material.

by Mark Dominus ( at May 11, 2017 05:26 PM


Haskell development jobs with Well-Typed

tl;dr If you’d like a job with us, send your application as soon as possible.

We are looking for several (probably two) Haskell experts to join our team at Well-Typed. This is a great opportunity for someone who is passionate about Haskell and who is keen to improve and promote Haskell in a professional context.

About Well-Typed

We are a team of top notch Haskell experts. Founded in 2008, we were the first company dedicated to promoting the mainstream commercial use of Haskell. To achieve this aim, we help companies that are using or moving to Haskell by providing a range of services including consulting, development, training, and support and improvement of the Haskell development tools. We work with a wide range of clients, from tiny startups to well-known multinationals. We have established a track record of technical excellence and satisfied customers.

Our company has a strong engineering culture. All our managers and decision makers are themselves Haskell developers. Most of us have an academic background and we are not afraid to apply proper computer science to customers’ problems, particularly the fruits of FP and PL research.

We are a self-funded company so we are not beholden to external investors and can concentrate on the interests of our clients, our staff and the Haskell community.

About the jobs

Generally, the roles are not tied to a single specific project or task, and allow remote work. However, we are also looking for someone to work on a specific project with one of our clients, and that requires work on-site in London.

Please indicate in your application whether on-site work in London is an option for you.

In general, work for Well-Typed could cover any of the projects and activities that we are involved in as a company. The work may involve:

  • working on GHC, libraries and tools;

  • Haskell application development;

  • working directly with clients to solve their problems;

  • teaching Haskell and developing training materials.

We try wherever possible to arrange tasks within our team to suit people’s preferences, and to rotate work to provide variety and interest.

Well-Typed has a variety of clients. For some we do proprietary Haskell development and consulting. For others, much of the work involves open-source development and cooperating with the rest of the Haskell community: the commercial, open-source and academic users.

Our ideal candidate has excellent knowledge of Haskell, whether from industry, academia or personal interest. Familiarity with other languages, low-level programming and good software engineering practices are also useful. Good organisation and the ability to manage your own time and reliably meet deadlines are important. You should also have good communication skills.

You are likely to have a bachelor’s degree or higher in computer science or a related field, although this isn’t a requirement.

Further (optional) bonus skills:

  • experience in teaching Haskell or other technical topics,

  • experience of consulting or running a business,

  • knowledge of and experience in applying formal methods,

  • familiarity with (E)DSL design,

  • knowledge of concurrency and/or systems programming,

  • experience with working on GHC,

  • experience with web programming (in particular front-end),

  • … (you tell us!)

Offer details

The offer is initially for one year full time, with the intention of a long term arrangement. For the remote role(s), living in England is not required. For the on-site role, you have to be allowed to work in England. We may be able to offer either employment or sub-contracting, depending on the jurisdiction in which you live.

If you are interested, please apply via Tell us why you are interested and why you would be a good fit for Well-Typed, and attach your CV. Please indicate whether the on-site work in London is an option for you. Please also indicate how soon you might be able to start.

We are more than happy to answer informal enquiries. Contact Duncan Coutts (, dcoutts on IRC), Adam Gundry (, agundry on IRC) or Andres Löh (, kosmikus on IRC) for further information.

We will consider applications as soon as we receive them, and will try to fill the positions as soon as possible. In any case, please try to get your application to us by 8 June 2017.

by andres, duncan, adam at May 11, 2017 01:56 PM

May 10, 2017

Douglas M. Auclair (geophf)

April 2017 1HaskellADay 1Liners

  • April 14th, 2017: given
    eitherOr, neitherNor :: Eq a => a -> a -> a -> Bool

    Is eitherOr not neitherNor?

    Prove or disprove.
  • April 14th, 2017: given
    neitherNor :: Eq a => a -> a -> a -> Bool
    andNot :: Eq a => a -> a -> Bool

    How do you compose neitherNor 1 0 and andNot 4?
  • April 11th, 2017:

    opts :: Credentials -> Options
    opts c = defaults & auth ?~ basicAuth (pack $ keyToken c) (pack $ secretToken c)



    data Credentials = Credentials { keyToken, secretToken :: String }

    and (?~) and (&) are from Control.Lens

    Snaps for elegance

    The above code is from quillio/Twillo.hs by ismailmustafa
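For the composition puzzle, one possible answer (the function bodies below are my guesses; the puzzle only gives the types) is to lift (&&) with the reader Applicative:

```haskell
import Control.Applicative (liftA2)

-- Guessed implementations matching the stated types:
neitherNor :: Eq a => a -> a -> a -> Bool
neitherNor a b x = x /= a && x /= b

andNot :: Eq a => a -> a -> Bool
andNot a x = x /= a

-- Compose the two predicates pointwise with the reader Applicative:
check :: (Eq a, Num a) => a -> Bool
check = liftA2 (&&) (neitherNor 1 0) (andNot 4)
-- check 3 → True; check 0 → False; check 4 → False
```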

by geophf ( at May 10, 2017 06:44 PM

May 07, 2017

Neil Mitchell

Proving stuff with Idris

Summary: I've been learning Idris to play around with writing simple proofs. It seems pretty cool.

The Idris programming language is a lot like Haskell, but with quite a few cleanups (better export syntax, more thought-out type classes), and also dependent types. I'm not a massive fan of dependent types as commonly presented - I want to write my code so it's easy to prove, not intertwine my beautiful code with lots of invariants. To try out Idris I've been proving some lemmas about list functions. I'm very much an Idris beginner, but thought I would share what I've done so far, partly to teach, partly to remember, and partly to get feedback/criticism about all the beginner mistakes I've made.

Before proving stuff, you need to write some definitions. Idris comes with an append function (named ++ as you might expect), but to keep everything simple and out in the open I've defined:

append : List a -> List a -> List a
append [] ys = ys
append (x :: xs) ys = x :: append xs ys

Coming from Haskell the only real difference is that cons is :: and types are :. Given how Haskell code now looks, that's probably the right way around anyway. We can load that code in the Idris interactive environment and run it:

$ idris Main.idr
Main> append [1,2] [3]
[1,2,3] : List Integer

What we can also do, but can't easily do in Haskell, is start proving lemmas about it. We'd like to prove that append xs [] = xs, so we write a proof:

proof_append_nil : (xs : List a) -> append xs [] = xs
proof_append_nil xs = ?todo

We name the proof proof_append_nil and then say that given xs (of type List a) we can prove that append xs [] = xs. That's pretty direct and simple. Now we have to write the proof - we first write out the definition leaving ?todo where we want the proof to go. Now we can load the proof in Idris and type :t todo to get the bit of the proof that is required, and Idris says:

todo : append xs [] = xs

That's fair - we haven't proven anything yet. Since we know the proof will proceed by splitting apart the xs we can rewrite as:

proof_append_nil [] = ?todo_nil
proof_append_nil (x :: xs) = ?todo_cons

We can now ask for the types of the two remaining todo bits:

todo_nil : [] = []
todo_cons : x :: append xs [] = x :: xs

The first todo_nil is obviously true since the things on either side of the equality match, so we can replace it with Refl. The second statement looks like the inductive case, so we want to apply proof_append_nil to it. We can do that with rewrite proof_append_nil xs, which expands to append xs [] = xs, and rewrites the left-hand-side of the proof to be more like the right. Refined, we end up with:

proof_append_nil [] = Refl
proof_append_nil (x :: xs) = rewrite proof_append_nil xs in ?todo_cons

Reloading again and asking for the type of todo_cons we get:

todo_cons : x :: xs = x :: xs

These things are equal, so we replace todo_cons with Refl to get a complete proof of:

proof_append_nil : (xs : List a) -> append xs [] = xs
proof_append_nil [] = Refl
proof_append_nil (x :: xs) = rewrite proof_append_nil xs in Refl

Totality Checking

Proofs are only proofs if you have code that terminates. In Idris, seemingly sprinkling the statement:

%default total

At the top of the file turns on the totality checker, which ensures the proof is really true. With the statement turned on I don't get any warnings about totality issues, so we have proved append really does have [] as a right-identity.

Avoiding rewrite

The rewrite keyword seems like a very big hammer, and in our case we know exactly where we want to apply the rewrite. Namely at the end. In this case we could have equally written:

cong (proof_append_nil xs)

Not sure which would be generally preferred style in the Idris world, but as the proofs get more complex using rewrite certainly seems easier.

Differences from Haskell

Coming from a Haskell background, and sticking to simple things, the main differences in Idris were that modules don't have export lists (yay), cons is :: and types are : (yay), functions can only use functions defined before them (sad, but I guess a restriction to make dependent types work), and case/lambda both use => instead of -> (meh).

Next steps

For my first task in Idris I also defined two reverse functions:

reverse1 : List a -> List a
reverse1 [] = []
reverse1 (x :: xs) = reverse1 xs `append` [x]

rev2 : List a -> List a -> List a
rev2 acc [] = acc
rev2 acc (x :: xs) = rev2 (x :: acc) xs

reverse2 : List a -> List a
reverse2 = rev2 []

The first reverse1 function is reverse in O(n^2), the second is reverse in O(n). I then tried to prove the three lemmas:

proof_reverse1_reverse1 : (xs : List a) -> xs = reverse1 (reverse1 xs)
proof_reverse2_reverse2 : (xs : List a) -> xs = reverse2 (reverse2 xs)
proof_reverse1_reverse2 : (xs : List a) -> reverse1 xs = reverse2 xs

Namely that both reverse1 and reverse2 are self-inverses, and that the two functions are equivalent. Proving these lemmas required a few helpers, but only one additional Idris function/feature, namely:

sym : (left = right) -> right = left

A function which transforms a proof of equality from one way around to the other way around. I also required a bunch of helper lemmas, including:

proof_append_assoc : (xs : List a) -> (ys : List a) -> (zs : List a) -> append xs (append ys zs) = append (append xs ys) zs

Developing these proofs in Idris took me about two hours, so they made a nice introductory exercise (with the caveat that I've proven these lemmas before, although not in 4+ years). I'd invite anyone interested in learning this aspect of Idris to have a go, and I'll post my proofs sometime in the coming week.

Update: additional notes on this material can be found here from Reddit user chshersh.

by Neil Mitchell ( at May 07, 2017 08:35 PM

Robert Harper

A “proof by contradiction” is not a proof that ends with a contradiction

It is well-known that constructivists renounce “proof by contradiction”, and that classicists scoff at the critique.  “Those constructivists,” the criticism goes, “want to rule out proofs by contradiction.  How absurd!  Look, Pythagoras showed that the square root of two is irrational by deriving a contradiction from the assumption that it is rational.  There is nothing wrong with this.  Ignore them!”

On examination that sort of critique fails, because a proof by contradiction is not a proof that derives a contradiction.  Pythagoras’s  proof is valid, one of the eternal gems of mathematics.  No one questions the validity of that argument, even if they question proof by contradiction. 

Pythagoras’s Theorem expresses a negation: it is not the case that the square root of two can be expressed as the ratio of two integers.  Assume that it can be so represented.  A quick deduction shows that this is impossible.  So the assumption is false.  Done.  This is a direct proof of a negative assertion; it is not a “proof by contradiction”.

What, then, is a proof by contradiction?  It is the affirmation of a positive statement by refutation of its denial.  It is a direct proof of the negation of a negative assertion that is then pressed into service as a direct proof of the assertion, which it is not.  Anyone is free to ignore the distinction for the sake of convenience, as a philosophical issue, or as a sly use of “goto” in a proof, but the distinction nevertheless exists and is important.  Indeed, part of the beauty of constructive mathematics is that one can draw such distinctions, for, once drawn, one can selectively disregard them.  Once blurred, forever blurred, a pure loss of expressiveness.

For the sake of explanation, let me rehearse a standard example of a proof by contradiction.  The claim is that there exist irrationals a and b such that a to the b power is rational.  Here is an indirect proof, a true proof by contradiction.  Move number one: let us prove instead that it is impossible that every two irrationals a and b are such that a to the b power is irrational.  This is a negative statement, so of course one proves it by deriving a contradiction from assuming it.  But it is not the original statement!  This will be clear from examining the information content of the proof.

Suppose, for a contradiction, that every two irrationals a and b are such that a to the b power is irrational.  We know from Pythagoras that root two is irrational, so plug it in for both a and b, and conclude that root two to the root two power is irrational.  Now use the assumption again, taking a to be root two to the root two, and b to be root two.  Calculate a to the power of b: it is two, which is eminently rational.  Contradiction.
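The calculation in that last step is just the power law for exponents:

```latex
\left(\sqrt{2}^{\sqrt{2}}\right)^{\sqrt{2}}
  = \sqrt{2}^{\sqrt{2}\cdot\sqrt{2}}
  = \sqrt{2}^{2}
  = 2 .
```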

We have now proved that it is not the case that every pair of irrationals, when exponentiated, give an irrational.  There is nothing questionable about this proof as far as I am aware.  But it does not prove that there are two irrationals that exponentiate to a rational!  If you think it does, then I ask you, please name them for me.  That information is not in this proof (there are other proofs that do name them, but that is not relevant for my purposes).  You may, if you wish, disregard the distinction I am drawing, that is your prerogative, and neither I nor anyone has any problem with that.  But you cannot claim that it is a direct proof; it is rather an indirect proof that proceeds by refuting the negative of the intended assertion.

So why am I writing this?  Because I have learned, to my dismay, that in U.S. computer science departments–of all places!–students are being taught, erroneously, that any proof that derives a contradiction is a “proof by contradiction”.  It is not.  Any proof of a negative must proceed by contradiction.  A proof by contradiction in the long-established sense of the term is, contrarily, an indirect proof of a positive by refutation of the negative.  This distinction is important, even if you want to “mod out” by it in your work, for it is only by drawing the distinction that one can even define the equivalence with which to quotient.

That’s my main point.  But for those who may not be familiar with the distinction between direct and indirect proof, let me take the opportunity to comment on why one might care to draw such a distinction.  It is entirely a matter of intellectual honesty: the information content of the foregoing indirect proof does not fulfill the expectation stated in the theorem.  It is a kind of boast, an overstatement, to claim otherwise.  Compare the original statement with the reformulation used in the proof.  The claim that it is not the case that every pair of irrationals exponentiate to an irrational is uncontroversial.  The proof proves it directly, and there is nothing particularly surprising about it.  One would even wonder why anyone would bother to state it.  Yet the supposedly equivalent claim stated at the outset appears much more fascinating, because most people cannot easily think up an example of two irrationals that exponentiate to rationals.  Nor does the proof provide one. Once, when shown the indirect proof, a student of mine blurted out “oh that’s so cheap.”  Precisely.

Why should you care?  Maybe you don’t, but there are nice benefits to keeping the distinction, because it demarcates the boundary between constructive proofs, which have direct interpretation as functional programs, and classical proofs, which have only an indirect such interpretation (using continuations, to be precise, and giving up canonicity).  Speaking as a computer scientist, this distinction matters, and it’s not costly to maintain.  May I ask that you adhere to it?
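That boundary is visible in Haskell types, as a standard sketch (my illustration, not taken from the post): read negation as a function into the empty type, and the classical principle behind proof by contradiction becomes a type with no total program.

```haskell
import Data.Void (Void, absurd)

-- Constructively, a negation is a direct construction: a function into Void.
type Not a = a -> Void

-- Double-negation introduction has a program...
dni :: a -> Not (Not a)
dni a k = k a

-- ...and ex falso quodlibet is fine too.
exFalso :: Not a -> a -> b
exFalso notA a = absurd (notA a)

-- But double-negation elimination, Not (Not a) -> a, which is exactly what a
-- "proof by contradiction" of a positive statement relies on, has no total
-- implementation; classically it is recovered only indirectly, via
-- continuations, giving up canonicity.
```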

Edit: rewrote final paragraph, sketchy and irrelevant, and improved prose throughout. 


Filed under: Programming, Research, Teaching

by Robert Harper at May 07, 2017 02:31 AM

May 05, 2017

Joachim Breitner

Farewell, green cap

For the last two years, I was known among swing dancers for my green flat cap:

Monti, a better model than me

This cap was very special: It was a gift from a good friend who sewed it by hand from what used to be a table cloth of my deceased granny, and it has traveled with me to many corners of the world.

Just like last week, when I was in Paris, where I attended the Charleston class of Romuald and Laura on Saturday (April 29). The following Tuesday I went to a Swing Social and wanted to put on the hat, only to notice that it was gone. The next day I bugged the manager and the caretaker of the venue of the class (Salles Sainte-Roche), and it seems that the hat was still there that morning, in the Salle Kurtz1, but when I went there it was gone.

And that is sad.

The last picture with the hat

  1. How fitting, given that my granny’s maiden name is Kurz.

by Joachim Breitner ( at May 05, 2017 10:13 PM

May 04, 2017

Roman Cheplyaka

Please help me with these open source Haskell projects

Here are some pull requests and issues that have been awaiting my attention for months or years.

Sadly, I haven’t found and probably won’t find enough time for them. So I am asking you for help.

If you can work on and close at least one of the items below, that would be awesome.

Thanks to people who have helped so far:

  • Alexey Zabelin @alexeyzab — immortal: add wait

Update (2017-06-14). I now have processed most of the PRs here myself.

Pull requests awaiting reviews

These are pull requests submitted to my projects. For each one, I describe what needs to be done.

ansi-terminal: add save, restore, report cursor

Needs a review and testing on different platforms.

ansi-terminal: compatibility with Win32- and above

Needs review and testing.

tasty: build with GHCJS

This one actually has little to do with tasty or ghcjs; it only needs a PR for the clock package to fall back to time.

tasty: make --quickcheck-show-replay always show the seed (not just on failure)

Need to figure out how to fix this without going back to random (see this comment).

temporary: canonicalized versions of withSystemTempDirectory and withTempDirectory

Needs another pair of eyes — especially regarding the mysterious withCanonicalizedSystemTempDirectory.

tasty: add soft timeouts

When a QuickCheck test runs into a timeout, Tasty does not report the input values generated by QuickCheck for the interrupted run.

This PR is probably affected by this change in exception handling — you’ll need to figure out how exactly and update the pull request appropriately.

immortal: add wait

Done on 2017-05-07 by Alexey Zabelin

For graceful shutdown, I wait for threads to finish in my server's main loop. I would like to do the same with Immortal threads.

What needs to be done:

  1. Rebase
  2. Add a test

tasty-golden: add golden test for text files

I think the point of this PR is to add an internal diff implementation, but I am not sure.

If you understand and like the idea, please update the PR and make a new case for it; otherwise I’ll probably close it.

tasty: Silence a few GHC warnings

Needs rebase and review.

Issues that need to be fixed

Here are a couple of issues that I care about and was going to fix myself, but can’t find the time.

stack: no package names in haddock index

xmonad: notifications pop up on top of xscreensaver

May 04, 2017 08:00 PM

May 03, 2017

Christopher Allen

What a Haskell Study Group is Not

This article is by Steven Syrek. I’m reposting it here because I endorse what he’s saying. I believe Steven brings a valuable perspective on the Haskell Book, reading groups, and education in general.

Steven posted this article on his Medium.

He has also written some extended notes on pedagogy tied to this post here.

The most rewarding thing about learning Haskell is the feeling that you are making real progress toward understanding not just the syntax of a programming language but a way of solving problems that is more universal than the domain of software development. The most rewarding thing about teaching Haskell is watching others make this same progress. The most frustrating thing about both learning and teaching Haskell is the widespread attitude that functional programming in general, or Haskell specifically, is difficult.

The goal of any teacher should be to expel this myth. The Haskell community is too small to support much formal education and those most apt to teach have precious little time to do so. And so the myth persists, and many people who could or would learn Haskell do not, whether out of fear, misprision, or lack of communal encouragement. Independent study groups, however, can meet the needs of language learners just as well, but only if they are committed at the outset to pragmatic goals and maintain a culture of mutual support and accountability.

Over the course of about eight months, from September 2016 to April 2017, I ran a Haskell study group in New York City based on Chris Allen and Julie Moronuki’s book, Haskell Programming from First Principles. Since this book was designed to provide a curriculum, it was ideally suited to our purposes. We covered one chapter a week, more or less, as there is just enough content per chapter to keep a dedicated group humming along: challenged but not overwhelmed.

In this article, I will share my thoughts on what made our group successful, where we went astray, and in particular on what a Haskell study group should not be—as the pitfalls are many but advice for avoiding them, based on actual experience, scant. My own background happens to be in education, which has no doubt informed my observations, but I was just as much a student in this study group, even if I happened to be its organizer. I hope this dual role has given me insight from both sides of the pedagogical divide that will be useful to others thinking about starting their own Haskell study groups, a course of action I highly recommend.

1. A Haskell study group is not meant to serve any purpose other than helping people learn Haskell.

Keep this mantra in mind: the point of a Haskell study group is to help people learn Haskell. Nothing else matters. Refer to this mantra whenever you’re in doubt about anything else.

2. A Haskell study group is not a place for people to learn about monads.

Or any other specific language feature. Do not let the random people who will inevitably show up derail your meetings with general, off-topic questions. Make a schedule, post it publicly, and stick to it. Once you embark on diversions, you may never get back on track. Respect the time, commitment, and ongoing progress of your regulars and let newcomers catch up on their own. Always assume anyone who comes to a meeting has: 1) read the chapter for that week, 2) typed in all the code in the chapter and, separately, 3) attempted to complete all the exercises in the chapter. Avoid spending too much time on topics everyone should already know or any time at all on questions about the language from Meetup shoppers.

3. A Haskell study group should not reinforce mistaken or unhelpful stereotypes about Haskell.

Many people have had a hard time learning Haskell. This has given the language a reputation for being hard to learn. In reality, Haskell is elegant and far easier to learn than many other, less elegant languages. The dearth of teaching materials and low quality of documentation have made it difficult for entrants to gain a foothold in the community as compared to language communities that go out of their way to be approachable to newcomers. The Haskell Book directly addresses this problem, however, and it does no one any good to credit assumptions that Haskell requires you to be a math genius. Haskell is capable of representing extremely abstract mathematical entities, but so is math itself, and you hardly need to understand linear algebra to balance your checkbook. I’ve been a college writing instructor for over a decade. Writing is hard. Teaching writing is hard. But anyone can become a proficient writer, even if few will ever write literary masterpieces. We can respect the challenge without succumbing to it or using it as an excuse. Practice, and you’ll improve. Complain, and you’ll remain mediocre. Focus on the positives and the cumulative effect of breakthroughs in the Haskell learning experience.

4. A Haskell study group is not a language advocacy committee.

Given Haskell’s rather esoteric reputation in the world of professional developers, you are likely to be issued demands to defend Haskell’s general usefulness, the practical utility of certain concepts, or the viability of functional programming as a whole. Don’t do it. People have Google if they want to research such things and the entire rest of the Internet if they want to bring their spoons to a knife fight. Your group exists to teach people how to code, not to teach developers how to do their jobs. Focus on the people. Ignore the trolls, don’t use facile comparisons to OOP to explain anything, and don’t waste your energy trying to convince the skeptical. Actually, examples of what you can do in Haskell won’t necessarily help your cause, because most programming languages can do anything any other language can do. The power of Haskell isn’t in any one abstraction. It’s in leveraging all of them together to build software, and software is complex, composed of many interlocking parts. You aren’t likely to persuade by showing someone “the amazing things you can do with Functor” or what have you. Ultimately, they will have to find out for themselves.

5. A Haskell study group is not an experiment in deliberative democracy.

A successful study group requires leadership. Leadership means that someone has to be in charge. Someone has to find space for the meetings, plan them, show up to all of them on time, be responsible for any contingencies that arise, and enforce norms of behavior. If there is no point person, it is unlikely that the group as a whole will make much progress. Sometimes, leaders emerge organically, but it’s better for someone to volunteer to be the organizer in advance. Natural leaders can serve a useful purpose, but they can also be destructive if they aren’t reliable, grow resentful at having assumed a role they didn’t sign up for, or turn your study group into a cult of personality instead of a collaborative learning environment. It’s also essential for the organizer to have buy-in from the rest of the group. On the one hand, participants should appreciate the amount of work it takes to put a study group together and keep it together. On the other hand, an effective leader makes decisions without turning every possible choice into a referendum. Democracy is a fine thing, but in this situation, it is more likely to result in anarchy and listlessness than efficient and decisive action. Respect for the organizer(s) is also relevant to the next point:

6. A Haskell study group should not turn anyone into a martyr.

Whoever is running your group, it’s a good idea if the person in charge is a fellow learner. Someone already proficient in Haskell will need to be especially motivated to teach others to stick with a study group for beginners. Another beginner, however, will have intrinsic motivation. In fact, the organizer will have an even stronger incentive to keep up with the work. Beware any situation in which a single person has assumed all of the responsibility but has little incentive to continue participating or is otherwise crushed by the demands of a dysfunctional group culture—it’s not sustainable.

7. A Haskell study group is not a traditional classroom.

While it is advantageous for the organizer to have prior teaching experience, it is not essential. Since such experience is in short supply, most groups will have to go with what and whom they have. That means the culture of the group is even more important, because people who don’t know how to teach should probably not try to do it. Go over exercises, break into pairs or smaller groups if necessary, but avoid devoting too much of any given meeting to lectures or presentations. Talking is no substitute for coding.

8. A Haskell study group is not a hackathon.

That is, it shouldn’t be a freeform, come-and-do-as-you-please affair. Adhere to a long term curriculum, keep individual meetings structured, and enforce norms of accountability. People who drift in and out or consistently show up unprepared are only going to drain energy from the room. Shower those who participate in good faith with praise and attention. Marginalize those who only come for Haskell social hour or for the free food and drinks. Speaking of which:

9. A Haskell study group is not a Meetup.

Go ahead and use Meetup to schedule your meetings. I did. But don’t give people the impression that it’s one of those Meetups where you can show up to eat pizza, drink beer, and contribute nothing. Likewise:

10. A Haskell study group is not a tech talk.

Your study group is not a form of entertainment. Don’t just give talks or show slides. Some of that may be an inducement to attend, but you want students, not audience members. Your best bet is to plug in to a projector, display the REPL side-by-side with the PDF of the book, and code. You can do pairing, mobbing, taking turns on the exercises, or whatever other method you desire as long as you’re coding and everyone else has the chance to code right along with you. Live coding can be scary, so make it an exercise in solidarity. If everyone did their homework, then everyone should have something to contribute. I suggest you ask them to close their computers and have them do the exercises again, on the spot, for reinforcement. You’ll probably find that everyone ends up loving that in spite of themselves. And failures along the way are fine. In fact, the type checker invites “failure-driven development.” Work things out together. Learn to love the REPL together. Encourage hands-on work, and figure out as a group how to leverage error messages to produce provably correct code.

11. Haskell study group meetings should not go on forever.

People are busy. If you get anyone to dedicate an hour a week to attending a Haskell meeting, it’s a miracle. Don’t ask for too much, or you’re just going to discourage people who would otherwise like to come. As with learning anything, people have to do most of the work of learning Haskell on their own. The study group can provide motivation and moral support (and asynchronous assistance if you use Slack or something similar), but the meetings themselves shouldn’t be longer than an hour or two. Use them to catch up, go over as many of the exercises as possible, and answer questions relevant to that week’s chapter assignment. The book provides a curriculum, and it’s best to just follow it. There is no need to diverge into advanced language features or demonstrate practical applications of every concept. Keep it short, and keep it simple. Also, you don’t have to make sure everyone completely understands everything. Give people time to figure things out, and help them do so however you can, but if it takes too long, move on. You can always work with people one-on-one if they get stuck somewhere.

12. Haskell study groups themselves should not go on forever.

Choose a realistic goal for your study group. Covering chapters 1 to 18 is a realistic goal, at least to start. That will get you through monads (the denouement for many students), at which point you can decide how to proceed with the more advanced material. Some will have had enough, while others will be ready to learn independently. At any rate, don’t give people the impression that they’ll be visiting your co-working cubicle farm every week for the rest of their lives.

13. A Haskell study group is not group therapy.

Remember the point of a Haskell study group? It isn’t to make people feel good about themselves or give them a place to go at night. It’s to teach people Haskell. You don’t need to be an intimidating jerk, but you aren’t doing anyone who attends any favors by not prodding them at least a little bit. If group members come prepared, they should be able to participate in group discussion of the exercises. You can organize smaller groups if some members are truly shrinking violets, but I don’t recommend letting people off the hook just because they’re reluctant to speak up. Create a supportive environment, and there’s no reason for anyone to be terrified of contributing. You never know when an otherwise shy person may be withholding a valuable insight just for being reluctant to be the center of attention. Moreover, you may never know when one such person’s confusion about a concept is shared by others, just because no one wants to admit it. Seek out opportunities for productive conversations. Don’t let people be dominating or rude, but also don’t let them be window dressing. Both attitudes are often just forms of pride, and you need to break through that to reach the vulnerable, yearning students within. Likewise, don’t be afraid to ask people to leave if their attitude or behavior is causing problems or their lack of preparedness makes their ongoing participation pointless and a negative example for others.

14. A Haskell study group is not a race.

Do not skip material in the book, and do not try to cover too much at once. A chapter a week, more or less, should be the pace you adopt. Any faster is overwhelming. Any slower is boring and ineffective. Some chapters are long enough to warrant coverage over multiple weeks, but that should be a rarity. And precious few chapters (certainly not chapter 1) are skippable. Set a reasonable pace, and trust the curriculum. Don’t feel the need to move quickly or to slow down, even if some people ask for it. I repeat, most of the work they do, they should do at home. The meetings are for review, as much as you can reasonably do, and not to serve the needs of any one student. Also, don’t skip the chapter on QuickCheck, and don’t let anyone skip writing all those quickBatch tests. We know who you are.

15. A Haskell study group is not a competition.

Programmers often have strong, competitive personalities. Do your best to contain that. Even better, turn it to useful ends: make the game about helping others, not being smarter or further along in the book. Encourage more experienced students to help less experienced students, in pairs if necessary, so the value of collective progress is enhanced and “being in charge, because I’m the best” diminished. That said, a competitive spirit can be motivating as long as it isn’t toxic. Whatever keeps people showing up without discouraging others—it’s often a fine balance.

16. A Haskell study group should not encourage or facilitate cheating.

Implore group members to at least try the exercises for themselves and avoid looking for solutions online. Insist that they do not post solutions publicly themselves. On a related note, make sure they buy the book. The authors worked hard on it, of course, but most people will value something more highly if they have paid for it. You’ll want your group to have that feeling of “buy-in” however you can get it.

17. A Haskell study group should not let its culture develop spontaneously.

The culture of any group of people has at least as much to do with how that group behaves and what it considers acceptable, en masse, as its laws do. And culture has inertia. If the culture of your study group is misaligned with the goal of learning Haskell, you will have a hard time changing it. Therefore, establish and reinforce your group’s culture from the outset. Be serious about the responsibilities of its members and that it’s not OK to show up unprepared. Think about creating the proper incentives for productive behavior, and avoid giving in to a culture of impunity. Over time, as the group becomes habituated to doing things the right way, you won’t need to be an enforcer as much. On the other hand, a culture that has gone astray will not easily be corrected with new rules, because by that point, leadership will no longer have the credibility to overrule custom: consuetudo pro lege servatur.

18. A Haskell study group does not need anything other than Haskell in order to teach people Haskell.

Use the tools provided by the language to teach the language: the REPL, the type checker, handy compiler extensions such as InstanceSigs, etc. Empower students to use these tools to develop an intuition for the type system and how to use it to solve problems. You don’t need to explain anything in terms of JavaScript, and be careful about overusing metaphors, too. Comparing Haskell to other languages and making abstract concepts more concrete feel like easy wins, but don’t do either as a substitute for teaching and learning the language on its own terms.

19. A Haskell study group is not for people who can’t handle discipline and hard work.

The Haskell Book is nothing if not demanding. But the intensity of the curriculum is meant to guide students to develop an intuition for how the type system works and the ability to interpret the terse but expressive syntax of Haskell code as quickly as possible. It helps you build up a repertoire of fundamental concepts and techniques that lead to ever more complex, but still comprehensible, abstractions down the road. There are no shortcuts here. Diligent learners will rapidly establish for themselves that foundation of understanding, and it will help them move more easily, and with a greater sense of achievement, as more advanced concepts are introduced. Conversely, the less studious are unlikely to overcome the psychological barriers that make people think Haskell is difficult. They will grow frustrated and give up as their more enthusiastic peers come to seem like magicians when working with the REPL. The truth is just the difference in commitment.

20. A Haskell study group is not only about discipline and hard work.

After your meetings, go out with the group to a local bar or cafe. Pick a regular place so it becomes a tradition. Not only is it good for community cohesion, which is good for the continuation of your study group, but it gives people a venue for venting their natural desire to socialize—a venue that is not the meeting place itself. You’re likely going to end up with a core group of people who always come. Cultivate them. Cherish them. Protect their interests. Get the less committed to want to be a part of the in-crowd, and welcome them with open arms when they prove themselves worthy.

I know this site is a bit of a disaster zone, but if you like my writing or think you could learn something useful from me, please take a look at the Haskell book I've been writing. There's a free sample available too!

May 03, 2017 12:00 AM

May 01, 2017

FP Complete

What pure functional programming is all about: Part 2

In the last post, we covered the following:

  • What purity is and what it isn't.
  • We looked at functions and function composition.
  • We've looked at how reasoning is easier if you only have functions.

In this post, we'll explore:

  • SQL as a pure language, which is a familiar language to almost everybody.
  • How pure languages deal with the real-world and doing side-effects, which are obviously practical things to do.
  • Evaluation vs execution, the generalization of pure vs impure languages.

In the next post, we'll explore the performance and declarative-code-writing benefits of pure functional languages in particular detail, looking at generated code, assembly, performance, etc.

In a familiar setting: SQL

In the last post, we established that pure languages (like Haskell or PureScript) are unable to make observations about the real world directly.

So, then, the question is: How do pure languages interact with the real world? Let's look at SQL first, because everyone knows SQL, and then use that as a bridge towards how it's done in Haskell and other general purpose pure languages.

Imagine a subset of SQL which is pure. It's not a big leap: SQL is mostly pure anyway. I don't like it when people write an article and ask me to imagine a syntax that's in their heads, so here is a trivial PEG grammar, which you can try online here.

Select = 'SELECT ' Fields ' FROM ' Names Where?
Names = Name (', ' Name)*
Fields = Exp (', ' Exp)*
Where = ' WHERE ' Exp
Exp = Prefix? (Funcall / Condition / Name) (Connective Exp)?
Funcall = Name '(' Exp ')'
Condition = (Funcall / Name) Op Exp
Prefix = 'NOT '
Connective = ' AND ' / ' OR '
Op = ' = ' / ' <> ' / ' < ' / ' > ' / ' * '
Name = [a-zA-Z0-9]+

The following is an example of the above grammar.

SELECT sin(foo), bar * 3 FROM table1, table2 WHERE foo > 1 AND NOT bar = floor(foo)

This language is pure. The result of evaluating one of these expressions is the same every time. You can swap the order of commutative expressions like floor(foo) = bar and it doesn't change the meaning of the program. floor(x) * sin(y) = sin(y) * floor(x).
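The same property is easy to check directly in a pure language. As a minimal illustrative sketch (not from the original post), here are the two orderings of that product as Haskell functions; because both are pure, they are interchangeable:

```haskell
-- Illustrative sketch: the commuted forms of floor(x) * sin(y),
-- as in the SQL fragment above. Both sides are pure functions of
-- their arguments, so swapping the factors cannot change the result.
lhs :: Double -> Double -> Double
lhs x y = fromIntegral (floor x :: Int) * sin y

rhs :: Double -> Double -> Double
rhs x y = sin y * fromIntegral (floor x :: Int)
```

In GHCi, `lhs 2.5 1.0 == rhs 2.5 1.0` evaluates to True, and no amount of reordering or re-evaluation can make it come out differently.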

You give a description of the work you want done, and then the database engine (PostgreSQL, MySQL, SQL Server, etc.), usually creating an optimized query plan based on what it knows about your query and the database schema and contents, executes the query on the database, returning a collection of results.

You didn't write instructions about how to get the database, where to look on the database, how to map indices across tables and whether to do an index scan or a sequence scan, allocating a buffer, etc.

And yet, above, we are clearly giving the SQL engine a little program (not a Turing-complete one) that can compute conditionals and even apply functions (floor and sin) that we specified freely.

Because our little program is pure, the SQL engine is capable of rewriting it completely, reducing it to a more normalized form without redundancy, and executing something far more efficient.

Look at what PostgreSQL is able to do with your queries. Here I do a query which works on an indexed field on a table with 37,626,086 rows. PostgreSQL knows both the query and my data, and plans accordingly, able to estimate the cost of how much work we'll have to do.

ircbrowse=> explain select * from event where id > 123 and id < 200*2;
                                    QUERY PLAN
 Index Scan using event_unique_id on event  (cost=0.56..857.20 rows=309 width=87)
   Index Cond: ((id > 123) AND (id < 400))

(The 200*2 was evaluated statically.)

Simply based on a static analysis of the SQL program.

Evaluation vs execution

The SQL example is a specific case of a more general way of separating two concerns: evaluation and execution (or interpretation). On the one side you have your declarative language that you can evaluate, pretty much in terms of substitution steps (and that's how you do your reasoning, in terms of the denotational semantics), and on the other you have a separate program which interprets that language in "the real world" (associated with operational semantics).

I'm sure any programmer reading this easily understands the informal denotational semantics of the SQL mini language above, and would feel comfortable implementing one or more ways of executing it.

In the same way, Haskell and other pure languages construct declarative descriptions of work which a separate program can execute.


What's evaluation, as distinct from execution? It means performing your language's regular evaluation steps, as you might do on paper. If you've read your Structure and Interpretation of Computer Programs by MIT, you'll know this as the substitution model. Here's an example.

If you want to count the length of a linked list, you might do:

length [1, 2, 3]

And length is implemented like this:

length [] = 0
length (x:xs) = 1 + length xs

So let's evaluate the expression, step-by-step:

length [1, 2, 3]

1 + length [2, 3]

1 + (1 + length [3])

1 + (1 + (1 + length []))

1 + (1 + (1 + 0))

1 + (1 + 1)

1 + 2


Even if you don't know Haskell, I think the simple substitution steps to evaluate the expression are straightforward and make sense. That's evaluation.
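The definition above can also be run, not just evaluated on paper. Here is a self-contained sketch of it (renamed to `len` so it doesn't shadow Prelude's `length`) that you can load into GHCi and compare against the substitution steps:

```haskell
-- The list-length function from the substitution example,
-- renamed so it doesn't clash with Prelude.length.
len :: [a] -> Int
len []     = 0
len (_:xs) = 1 + len xs
```

Evaluating `len [1, 2, 3]` yields 3, matching the final substitution step above.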

A Terminal mini-language

Let's look at a data type that models a list of terminal actions to perform. I realise that if you don't know Haskell or ML, the below is mostly nonsense. I'll also bring in JavaScript as a "lingua franca", so that the idea translates nicely.

(If it looks concise in Haskell and verbose in JavaScript, it's because JavaScript was designed for imperative, OO-style programming, and Haskell was designed for pure functional programming.)

data Terminal a
  = Print String (Terminal a)
  | GetLine (String -> Terminal a)
  | Return a
function Print(string, next) {
  this.line = string;
  this.next = next;
}
function GetLine(cont) {
  this.f = cont;
}
function Return(a) {
  this.a = a;
}

It can describe printing to stdout, getting a line from stdin, or returning a result. An example might be:

  "Please enter your name: "
     (\name ->
           ("Hello, " ++ name ++ "!")
           (Return ())))
new Print(
  "Please enter your name: ",
  new GetLine(function(name){
    return new Print("Hello, " + name + "!",
                     new Return(null));}))

Think of this like an abstract syntax tree. It's the description of what to do. When there is a function in the AST, that's a callback. The callback is just another pure function that returns a new action to be interpreted.

The expressions above can be evaluated, but they perform no side-effects. It's a piece of purely functional code which generates a data structure. Our substitution model above still works.

Hammering the point home, consider a greeting action like this:

greeting :: String -> Terminal ()
greeting msg =
  Print
    msg
    (GetLine
       (\name ->
          if name == "Chris"
            then Print "That's my name too." (Return ())
            else Print "Hello, stranger!" (Return ())))

The greeting function itself is pure. It takes a message string, and constructs a Print using that. That's done by evaluation. Also, the sub-expression (\name -> ...) is another pure function that takes a name and conditionally produces one action or another.

Now, onto execution.


We can write a number of ways to interpret this AST. Here is one pure implementation. It takes as input (1) some action list to run and (2) lines of input, then it returns a value and some output lines:

data Result a = Success a | Failure String deriving Show
interpretPure :: Terminal a -> [String] -> Result (a,[String])
interpretPure action stdin =
  case action of
    Return a -> Success (a, [])
    Print line next ->
      case interpretPure next stdin of
        Success (a, stdout) -> Success (a, line : stdout)
        Failure e -> Failure e
    GetLine f ->
      case stdin of
        [] -> Failure "stdin closed"
        (line:stdin') -> interpretPure (f line) stdin'

Note the part,

    GetLine f ->
      case stdin of
        [] -> Failure "stdin closed"
        (line:stdin') -> interpretPure (f line) stdin'
-------------------------------------- ^

Above is where we call the pure function from GetLine to produce another action for us to interpret.

In JavaScript:

function Success(x){ this.success = x; }
function Failure(e){ this.error = e; }
function interpretPure(action, stdin) {
  if (action instanceof Return) {
    return new Success({ result: action.a, stdout: [] });
  } else if (action instanceof Print) {
    var result = interpretPure(, stdin);
    if (result instanceof Success) {
      return new Success({
        result: result.success.result,
        stdout: [action.line].concat(result.success.stdout)
      });
    } else return result;
  } else if (action instanceof GetLine) {
    if (stdin.length === 0) {
      return new Failure("stdin closed");
    } else {
      var line = stdin[0];
      var stdin_ = stdin.slice(1);
      return interpretPure(action.f(line), stdin_);
    }
  }
}
Which we can run like this:

> interpretPure demo ["Dave"]
Success ((),["Please enter your name: ","Hello, Dave!"])
> interpretPure demo []
Failure "stdin closed"
> JSON.stringify(interpretPure(demo, ["Dave"]));
"{"success":{"result":null,"stdout":["Please enter your name: ","Hello, Dave!"]}}"
> JSON.stringify(interpretPure(demo, []));
"{"error":"stdin closed"}"

If we liked, we could interpret this as real I/O that talks to the world directly.

In Haskell, we translate it to a different type called IO. In JavaScript, let's do a good olde-fashioned 90's-style alert/prompt user interaction.

interpretIO :: Terminal a -> IO a
interpretIO terminal =
  case terminal of
    Return a -> return a
    Print str next -> do
      putStrLn str
      interpretIO next
    GetLine f -> do
      line <- getLine
      interpretIO (f line)
function interpretIO(action) {
  if (action instanceof Return) {
    return action.a;
  } else if (action instanceof Print) {
    alert(action.line);
    return interpretIO(;
  } else if (action instanceof GetLine) {
    var line = prompt();
    return interpretIO(action.f(line));
  }
}

In this case, we're just translating from one model to another. The IO a type is interpreted by the Haskell runtime either in a REPL prompt, or in a compiled program. JavaScript is capable of executing things directly during evaluation, so we were able to implement something similar to what Haskell's runtime does for IO.

In SICP, they talk about programs as being spells, and the interpreter being like a spirit that does your bidding.

(In practice, Haskell's IO type is not a closed set of operations, but an open set that can be extended with e.g. bindings to C libraries and such, and it isn't interpreted at runtime but rather compiled down to bytecode or machine code. The strict evaluation/execution separation remains in place, however.)


We've covered an informal comparison between "evaluating" and "executing" code, which is a distinction often discussed in the functional programming community.

We've explored making a declarative data type that represents terminal interaction, and implemented two different ways of interpreting it: as a pure function, or executing it directly. There may be other interpretations, such as talking to a server. We could write an interpreter which logs every action performed. We could also write a test suite which feeds randomized or specific inputs and demands consistent outputs.
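To make the "logs every action" idea concrete, here is a hedged sketch of such an interpreter. The name `interpretLogged` and the trace format are mine, not from the post; the Terminal type is repeated so the snippet is self-contained:

```haskell
data Terminal a
  = Print String (Terminal a)
  | GetLine (String -> Terminal a)
  | Return a

-- Interpret against canned stdin lines, recording a trace entry
-- for every action performed. Purely functional: same inputs,
-- same trace, every time.
interpretLogged :: Terminal a -> [String] -> (Maybe a, [String])
interpretLogged action stdin =
  case action of
    Return a -> (Just a, ["return"])
    Print line next ->
      let (r, rest) = interpretLogged next stdin
      in (r, ("print: " ++ line) : rest)
    GetLine f ->
      case stdin of
        [] -> (Nothing, ["getLine: <stdin closed>"])
        (line:stdin') ->
          let (r, rest) = interpretLogged (f line) stdin'
          in (r, ("getLine: " ++ line) : rest)
```

Run against the name-asking program from earlier with input ["Dave"], this produces a trace of one entry per action, in order, alongside the final result.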

We looked at how SQL is an example of this evaluation vs execution separation (1 + 1 is evaluated before the query is executed). In SELECT customer.purchases + 1 FROM customer, the customer.purchases + 1 expression is evaluated by the interpreter for each row in the table.

In the next post, we'll explore:

  • Performance of pure code.
  • Optimization of pure code, in particular: Inlining, deforestation, common sub-expression elimination, fusion, rewrite rules, etc.
  • And how these optimizations allow for low- or zero-cost abstractions.

We may further explore different types of interpreters (software transactional memory, local mutation, parsers, etc).



May 01, 2017 04:04 PM

Douglas M. Auclair (geophf)

April 2017 1HaskellADay Problems and Solutions

by geophf ( at May 01, 2017 01:42 PM

April 30, 2017

Erik de Castro Lopo

What do you mean ExceptT doesn't Compose?

Disclaimer: I work at Ambiata (our Github presence) probably the biggest Haskell shop in the southern hemisphere. Although I mention some of Ambiata's coding practices, in this blog post I am speaking for myself and not for Ambiata. However, the way I'm using ExceptT and handling exceptions in this post is something I learned from my colleagues at Ambiata.

At work, I've been spending some time tracking down exceptions in some of our Haskell code that have been bubbling up to the top level and killing a complex multi-threaded program. On Friday I posted a somewhat flippant comment to Google Plus:

Using exceptions for control flow is the root of many evils in software.

Lennart Kolmodin, whom I remember from my very earliest days of using Haskell in 2008 and whom I met for the first time at ICFP in Copenhagen in 2011, responded:

Yet what to do if you want composable code? Currently I have
type Rpc a = ExceptT RpcError IO a
which is terrible

But what do we mean by "composable"? I like the wikipedia definition:

Composability is a system design principle that deals with the inter-relationships of components. A highly composable system provides recombinant components that can be selected and assembled in various combinations to satisfy specific user requirements.

The ensuing discussion, which also included Sean Leather, suggested that these two experienced Haskellers were not aware that with the help of some combinator functions, ExceptT composes very nicely and results in more readable and more reliable code.

At Ambiata, our coding guidelines strongly discourage the use of partial functions. Since the type signature of a function doesn't include information about the exceptions it might throw, the use of exceptions is strongly discouraged. When using library functions that may throw exceptions, we try to catch those exceptions as close as possible to their source and turn them into errors that are explicit in the type signatures of the code we write. Finally, we avoid using String to hold errors. Instead we construct data types to carry error messages and render functions to convert them to Text.

In order to properly demonstrate the ideas, I've written some demo code and made it available in this GitHub repo. It compiles and even runs (providing you give it the required number of command line arguments) and hopefully does a good job demonstrating how the bits fit together.

So let's look at the naive version of a program that doesn't do any exception handling at all.

  import Data.ByteString.Char8 (readFile, writeFile)

  import Naive.Cat (Cat, parseCat)
  import Naive.Db (Result, processWithDb, renderResult, withDatabaseConnection)
  import Naive.Dog (Dog, parseDog)

  import Prelude hiding (readFile, writeFile)

  import System.Environment (getArgs)
  import System.Exit (exitFailure)

  main :: IO ()
  main = do
    args <- getArgs
    case args of
      [inFile1, infile2, outFile] -> processFiles inFile1 infile2 outFile
      _ -> putStrLn "Expected three file names." >> exitFailure

  readCatFile :: FilePath -> IO Cat
  readCatFile fpath = do
    putStrLn "Reading Cat file."
    parseCat <$> readFile fpath

  readDogFile :: FilePath -> IO Dog
  readDogFile fpath = do
    putStrLn "Reading Dog file."
    parseDog <$> readFile fpath

  writeResultFile :: FilePath -> Result -> IO ()
  writeResultFile fpath result = do
    putStrLn "Writing Result file."
    writeFile fpath $ renderResult result

  processFiles :: FilePath -> FilePath -> FilePath -> IO ()
  processFiles infile1 infile2 outfile = do
    cat <- readCatFile infile1
    dog <- readDogFile infile2
    result <- withDatabaseConnection $ \ db ->
                 processWithDb db cat dog
    writeResultFile outfile result

Once built as per the instructions in the repo, it can be run with:

  dist/build/improved/improved Naive/Cat.hs Naive/Dog.hs /dev/null
  Reading Cat file 'Naive/Cat.hs'
  Reading Dog file 'Naive/Dog.hs'.
  Writing Result file '/dev/null'.

The above code is pretty naive and there is zero indication of what can and cannot fail or how it can fail. Here's a list of some of the obvious failures that may result in an exception being thrown:

  • Either of the two readFile calls.
  • The writeFile call.
  • The parsing functions parseCat and parseDog.
  • Opening the database connection.
  • The database connection could terminate during the processing stage.

So let's see how the use of the standard Either type, ExceptT from the transformers package, and combinators from Gabriel Gonzalez's errors package can improve things.

Firstly, the types of parseCat and parseDog were ridiculous. Parsers can fail with parse errors, so these should both return an Either type. Just about everything else should be in the ExceptT e IO monad. Let's see what that looks like:
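Before the full listing, the shape being described can be sketched in a few lines. The concrete constructors here are hypothetical stand-ins, not the repo's real Improved.Cat code:

```haskell
-- Hypothetical stand-in for the repo's parser: failure is an
-- explicit value in the return type, not a thrown exception.
data CatParseError = EmptyInput
  deriving (Show, Eq)

newtype Cat = Cat String
  deriving (Show, Eq)

parseCat :: String -> Either CatParseError Cat
parseCat s
  | null s    = Left EmptyInput
  | otherwise = Right (Cat s)
```

A caller must pattern match on the Either, so the possibility of failure cannot be silently ignored, and it is visible right there in the type signature.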

  {-# LANGUAGE OverloadedStrings #-}
  import           Control.Exception (SomeException)
  import           Control.Monad.IO.Class (liftIO)
  import           Control.Error (ExceptT, fmapL, fmapLT, handleExceptT
                                 , hoistEither, runExceptT)

  import           Data.ByteString.Char8 (readFile, writeFile)
  import           Data.Monoid ((<>))
  import           Data.Text (Text)
  import qualified Data.Text as T
  import qualified Data.Text.IO as T

  import           Improved.Cat (Cat, CatParseError, parseCat, renderCatParseError)
  import           Improved.Db (DbError, Result, processWithDb, renderDbError
                               , renderResult, withDatabaseConnection)
  import           Improved.Dog (Dog, DogParseError, parseDog, renderDogParseError)

  import           Prelude hiding (readFile, writeFile)

  import           System.Environment (getArgs)
  import           System.Exit (exitFailure)

  data ProcessError
    = ECat CatParseError
    | EDog DogParseError
    | EReadFile FilePath Text
    | EWriteFile FilePath Text
    | EDb DbError

  main :: IO ()
  main = do
    args <- getArgs
    case args of
      [inFile1, infile2, outFile] ->
              report =<< runExceptT (processFiles inFile1 infile2 outFile)
      _ -> do
          putStrLn "Expected three file names, the first two are input, the last output."
          exitFailure

  report :: Either ProcessError () -> IO ()
  report (Right _) = pure ()
  report (Left e) = T.putStrLn $ renderProcessError e

  renderProcessError :: ProcessError -> Text
  renderProcessError pe =
    case pe of
      ECat ec -> renderCatParseError ec
      EDog ed -> renderDogParseError ed
      EReadFile fpath msg -> "Error reading '" <> T.pack fpath <> "' : " <> msg
      EWriteFile fpath msg -> "Error writing '" <> T.pack fpath <> "' : " <> msg
      EDb dbe -> renderDbError dbe

  readCatFile :: FilePath -> ExceptT ProcessError IO Cat
  readCatFile fpath = do
    liftIO $ putStrLn "Reading Cat file."
    bs <- handleExceptT handler $ readFile fpath
    hoistEither . fmapL ECat $ parseCat bs
    where
      handler :: SomeException -> ProcessError
      handler e = EReadFile fpath (T.pack $ show e)

  readDogFile :: FilePath -> ExceptT ProcessError IO Dog
  readDogFile fpath = do
    liftIO $ putStrLn "Reading Dog file."
    bs <- handleExceptT handler $ readFile fpath
    hoistEither . fmapL EDog $ parseDog bs
    where
      handler :: SomeException -> ProcessError
      handler e = EReadFile fpath (T.pack $ show e)

  writeResultFile :: FilePath -> Result -> ExceptT ProcessError IO ()
  writeResultFile fpath result = do
    liftIO $ putStrLn "Writing Result file."
    handleExceptT handler . writeFile fpath $ renderResult result
    where
      handler :: SomeException -> ProcessError
      handler e = EWriteFile fpath (T.pack $ show e)

  processFiles :: FilePath -> FilePath -> FilePath -> ExceptT ProcessError IO ()
  processFiles infile1 infile2 outfile = do
    cat <- readCatFile infile1
    dog <- readDogFile infile2
    result <- fmapLT EDb . withDatabaseConnection $ \ db ->
                 processWithDb db cat dog
    writeResultFile outfile result

The first thing to notice is that the changes to the structure of the main processing function processFiles are minor, but all errors are now handled explicitly. In addition, all possible exceptions are caught as close as possible to the source and turned into errors that are explicit in the function return types. Sceptical? Try replacing one of the readFile calls with an error call or a throw and see it get caught and turned into an error as specified by the type of the function.
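The catch-at-the-source pattern can be sketched in a self-contained way. The following uses only base and transformers, with a locally defined handleExceptT standing in for the one imported from the errors package above (a behavioural sketch, not that package's exact implementation):

```haskell
import Control.Exception (SomeException, try)
import Control.Monad.Trans.Except (ExceptT (..), runExceptT)

-- Run an IO action, converting any synchronous exception into an
-- explicit error value carried by ExceptT.
handleExceptT :: (SomeException -> e) -> IO a -> ExceptT e IO a
handleExceptT handler action =
  ExceptT (either (Left . handler) Right <$> try action)

-- A file read whose failure mode is visible in the type.
readSafely :: FilePath -> ExceptT String IO String
readSafely fpath =
  handleExceptT (\e -> "Error reading '" ++ fpath ++ "': " ++ show e)
                (readFile fpath)

main :: IO ()
main = do
  result <- runExceptT (readSafely "/no/such/file")
  case result of
    Left err -> putStrLn ("Caught as a value: " ++ err)
    Right _  -> putStrLn "Unexpectedly succeeded."
```

Running this against a missing file prints the failure as an ordinary error message instead of crashing with an uncaught IOException.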

We also see that despite having many different error types (which happens when code is split up into many packages and modules), a constructor for an error type higher in the stack can encapsulate error types lower in the stack. For example, this value of type ProcessError:

  EDb (DbError3 ResultError1)

contains a DbError which in turn contains a ResultError. Nesting error types like this aids composition, as does the separation of error rendering (turning an error data type into text to be printed) from printing.
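To make the nesting concrete, here is a hypothetical sketch: the post does not show the Improved.Db module, so the constructor names (DbError3, ResultError1) and the message strings below are invented to match the example value above.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Text (Text)
import qualified Data.Text.IO as T

-- Invented error types mirroring the nesting in the example value.
data ResultError = ResultError1 | ResultError2
data DbError = DbError3 ResultError | DbConnectFailed Text
data ProcessError = EDb DbError  -- other constructors elided

renderResultError :: ResultError -> Text
renderResultError ResultError1 = "query returned no rows"
renderResultError ResultError2 = "query returned malformed rows"

renderDbError :: DbError -> Text
renderDbError (DbError3 re)       = "Database result error: " <> renderResultError re
renderDbError (DbConnectFailed m) = "Database connection failed: " <> m

-- Rendering at the top level simply delegates downward.
renderProcessError :: ProcessError -> Text
renderProcessError (EDb dbe) = renderDbError dbe

main :: IO ()
main = T.putStrLn (renderProcessError (EDb (DbError3 ResultError1)))
-- prints: Database result error: query returned no rows
```

Each layer only knows how to render its own constructors and delegates the nested error to the renderer one level down, which is what keeps the rendering composable.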

We also see that the use of combinators like fmapLT, together with the nested error types of the previous paragraph, means that ExceptT monad transformers do compose.
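A minimal sketch of that composition, with invented error types; withExceptT from transformers plays the role that fmapLT from the errors package plays above:

```haskell
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE, withExceptT)

data DbError = DbTimeout deriving Show
data ProcessError = EDb DbError deriving Show

-- A lower-level action that can only fail with a DbError ...
queryDb :: ExceptT DbError IO Int
queryDb = throwE DbTimeout

-- ... lifted into the higher-level error type with one combinator.
process :: ExceptT ProcessError IO Int
process = withExceptT EDb queryDb

main :: IO ()
main = print =<< runExceptT process
-- prints: Left (EDb DbTimeout)
```

The lower layer stays ignorant of ProcessError; only the call site decides how its errors embed into the larger type.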

Using ExceptT with the combinators from the errors package to catch exceptions as close as possible to their source and converting them to errors has numerous benefits including:

  • Errors are explicit in the types of the functions, making the code easier to reason about.
  • It's easier to provide better error messages and more context than what is normally provided by the Show instance of most exceptions.
  • The programmer spends less time chasing the source of exceptions in large complex code bases.
  • More robust code, because the programmer is forced to think about and write code to handle errors instead of error handling being an optional afterthought.

Want to discuss this? Try reddit.

April 30, 2017 02:22 AM

April 28, 2017

Joachim Breitner

ghc-proofs rules more now

A few weeks ago I blogged about an experiment of mine, where I proved equalities of Haskell programs by (ab)using the GHC simplifier. For more details, please see that post, or the video of my talk at the Zürich Haskell User Group, but one reason why this approach has any chance of being useful is the compiler’s support for rewrite rules.

Rewrite rules are program equations that the programmer specifies in the source file, and which the compiler then applies, from left to right, whenever some intermediate code matches the left-hand side of the equation. One such rule, for example, is

{-# RULES "foldr/nil" forall k z. foldr k z [] = z #-}

taken right out of the standard library.
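A rule of one's own can be declared the same way; as an illustration (the function and rule names here are invented), compiling this with `ghc -O -ddump-rule-firings` shows whether the rule fires:

```haskell
-- A recursive map, kept out of the inliner so the rule has a
-- chance to match its call sites.
myMap :: (a -> b) -> [a] -> [b]
myMap _ []     = []
myMap f (x:xs) = f x : myMap f xs
{-# NOINLINE myMap #-}

-- The compiler may replace the left-hand side with the right-hand
-- side wherever some intermediate code matches it.
{-# RULES "myMap/id" forall xs. myMap id xs = xs #-}

main :: IO ()
main = print (myMap id [1, 2, 3 :: Int])
-- prints: [1,2,3]
```

Since a rewrite rule must not change the program's meaning, the output is the same whether or not the rule fires; only the generated code differs.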

In my blog post I went through the algebraic laws that a small library of mine, successors, should fulfill, and sure enough, once I got to more interesting proofs, they would not go through just like that. At that point I had to add additional rules to the file I was editing, which helped the compiler to finish the proofs. Some of these rules were simple like

{-# RULES "mapFB/id" forall c. mapFB c (\x -> x) = c #-}
{-# RULES "foldr/nil" forall k n. GHC.Base.foldr k n [] = n #-}
{-# RULES "foldr/undo" forall xs. GHC.Base.foldr (:) [] xs = xs #-}

and some are more intricate like

{-# RULES "foldr/mapFB" forall c f g n1 n2 xs.
    GHC.Base.foldr (mapFB c f) n1 (GHC.Base.foldr (mapFB (:) g) n2 xs)
    = GHC.Base.foldr (mapFB c (f.g)) (GHC.Base.foldr (mapFB c f) n1 n2) xs
  #-}

But there is something fishy going on here: the foldr/nil rule is identical to a rule in the standard library! I should not have to add that to my file as I am proving things. So I knew that the GHC plugin, which I wrote to do these proofs, was doing something wrong, but I did not investigate for a while.

I returned to this problem recently, and with the efficient and quick help of Simon Peyton Jones, I learned what I was doing wrong.1 After fixing it, I could remove all the simple rules from the files with my proofs. And to my surprise, I could remove the intricate rule as well!

So with this bug fixed, ghc-proofs is able to prove all the Functor, Applicative and Monad rules of the Succs functor without any additional rewrite rules, as you can see in the example file! (I still have to strategically place seqs in a few places.)

That’s great,