Planet Haskell

August 27, 2014

Philip Wadler

Informatics Independence Referendum Debate - the result

Here is the outcome of the debate announced earlier:
<embed allowfullscreen="true" allowscriptaccess="always" flashvars="config=;file=" height="360" src="" width="740"></embed>
Yes 19
No 25
Undecided 28

Yes 28
No 31
Undecided 26

(Either some people entered after the debate began, or some people began the debate unsure even whether they were undecided.)

Thank you to Alan, Mike, and the audience for a fantastic debate. The audience asked amazing questions on both sides, clearly much involved with the process.

Video of debate here.
Alan's slides here.
My slides here.

by Philip Wadler ( at August 27, 2014 01:43 PM

Bill Atkins

NSNotificationCenter, Swift and blocks

The conventional way to register observers with NSNotificationCenter is to use the target-action pattern. While this gets the job done, it's inherently not type-safe.

For example, the following Swift snippet will compile perfectly:

    NSNotificationCenter.defaultCenter().addObserver(self, selector: Selector("itemAdded:"),
      name: MyNotificationItemAdded, object: nil)

even though at runtime it will fail unless self has a method named itemAdded that takes exactly one parameter (leaving off that last colon in the selector will turn this line into a no-op). Plus, this method gives you no way to take advantages of Swift's closures, which would allow the observer to access local variables in the method that adds the observer and would eliminate the need to create a dedicated method to handle the event.

A better way to do this is to use blocks. And NSNotificationCenter does include a block-based API:

    NSNotificationCenter.defaultCenter().addObserverForName(MyNotificationItemAdded, object: nil, queue: nil) { note in
      // ...

This is much nicer, especially with Swift's trailing closure syntax. There are no method names to be looked up at runtime, we can refer to local variables in the method that registered the observer and we can perform small bits of logic in reaction to events without having to create and name dedicated methods.

The catch comes in resource management. It's very important that an object remove its event observers when it's deallocated, or else NSNotificationCenter will try to invoke methods on invalid pointers.

The traditional target-action method has the one advantage that we can easily handle this requirement with a single call in deinit:

  deinit {

With the block API, however, since there is no explicit target object, each call to addObserverForName returns "an opaque object to act as observer." So your observer class would need to track all of these objects and then remove them all from the notification center in deinit, which is a pain.

In fact, the hassle of having to do bookkeeping on the observer objects almost cancels out the convenience of using the block API. Frustrated by this situation, I sat down and created a simple helper class, NotificationManager:

class NotificationManager {
  private var observerTokens: [AnyObject] = []

  deinit {

  func deregisterAll() {
    for token in observerTokens {

    observerTokens = []

  func registerObserver(name: String!, block: (NSNotification! -> ()?)) {
    let newToken = NSNotificationCenter.defaultCenter().addObserverForName(name, object: nil, queue: nil) {note in

  func registerObserver(name: String!, forObject object: AnyObject!, block: (NSNotification! -> ()?)) {
    let newToken = NSNotificationCenter.defaultCenter().addObserverForName(name, object: object, queue: nil) {note in

First, this simple class provides a Swift-specialized API around NSNotificationCenter.  It provides an additional convenience method without an object parameter (rarely used, in my experience) to make it easier to use trailing-closure syntax. But most importantly, it keeps track of the observer objects generated when observers are registered, and removes them when the object is deinit'd.

A client of this class can simply keep a member variable of type NotificationManager and use it to register its observers. When the parent class is deallocated, the deinit method will automatically be called on its NotificationManager member variable, and its observers will be properly disposed of:

class MyController: UIViewController {
  private let notificationManager = NotificationManager()
  override init() {
    notificationManager.registerObserver(MyNotificationItemAdded) { note in
      println("item added!")
  required init(coder: NSCoder) {
    fatalError("decoding not implemented")

When the MyController instance is deallocated, its NotificationManager member variable will be automatically deallocated, triggering the call to deregisterAll that will remove the dead objects from NSNotificationCenter.

In my apps, I add a notificationManager instance to my common UIViewController base class so I don't have to explicitly declare the member variable in all of my controller subclasses.

Another benefit of using my own wrapper around NSNotificationCenter is that I can add useful functionality, like group observers: an observer that's triggered when any one of a group of notifications are posted:

struct NotificationGroup {
  let entries: [String]
  init(_ newEntries: String...) {
    entries = newEntries


extension NotificationManager {
  func registerGroupObserver(group: NotificationGroup, block: (NSNotification! -> ()?)) {
    for name in group.entries {
      registerObserver(name, block: block)

This can be a great way to easily set up an event handler to run when, for example, an item is changed in any way at all:

   let MyNotificationItemsChanged = NotificationGroup(

    notificationManager.registerGroupObserver(MyNotificationItemsChanged) { note in
      // ...

by More Indirection ( at August 27, 2014 11:21 AM

Simon Michael

Creating well-behaved Hakyll blog posts

Posts in a Hakyll-powered blog need to be created with care, if you want your feed to work well with clients and aggregators. There are many things to remember:

  • If you have clones of your site, decide which one to work in and make sure it’s up to date
  • Create the file in the right place
  • Name it consistently (I use
  • In my setup, prefix it with _ if it’s a draft (I render but don’t publish those)
  • Set title, tags, and author with a metadata block
  • Set published time with metadata to get a more precise timestamp than Hakyll can guess from the filename. Include a time zone. Use the right format.
  • When moving a post from draft to published:
    • Update the published time
    • Update the file name if the title or publish date has changed
  • If changing a post after it has been published: set updated time in the metadata
  • At some point, commit it to version control and sync it to other clones

I find this makes blogging feel tedious, especially after an absence when I’ve forgotten the details. Case in point, I managed to share an ugly template post with Planet Haskell readers while working on this one.

So I’m trying out this bash shell script, maybe it will help. Adjust to suit your setup.
(updated 2014/8/27)

# add to ~/.bashrc


# List recent blog posts.
alias blog-ls="ls $BLOGDIR | tail -10"

# Create a new hakyll-compatible draft blog post.
# blog-new ["The Title" ["tag1, tag2" ["Author Name"]]]
function blog-new() {
    TITLE=${1:-Default Title}
    TAGS=${2:-defaulttag1, defaulttag2}
    AUTHOR=${3:-Default Author Name}
    SLUG=${TITLE// /-}
    DATE=`date +"%Y-%m-%d"`
    TIME=`date +"%Y-%m-%d %H:%M:%S%Z"`
    echo creating $BLOGDIR/$FILE
    cat <<EOF >>$BLOGDIR/$FILE && emacsclient -n $BLOGDIR/$FILE
title:     $TITLE
tags:      $TAGS
author:    $AUTHOR
published: $TIME


An example:

$ blog-new 'Scripted Hakyll blog post creation' 'hakyll, haskell'
(file opens in emacs, edit & save)
$ make
./site build
  Creating store...
  Creating provider...
  Running rules...
Checking for out-of-date items
  updated blog/

See also part 2.

August 27, 2014 02:15 AM

Well-behaved Hakyll blog posts, continued

More hakyll blog fixes:

Ugly things showing on planets

My posts were showing unwanted things on planet haskell - double heading, redundant date, tag links, and ugly disqus html. By comparing with Jasper Van der Jeugt’s blog, I found the problem: I was snapshotting content for the feed at the wrong time, after applying the post template:

>>= return . fmap demoteHeaders
>>= loadAndApplyTemplate "templates/post.html"    (postCtx tags)
>>= saveSnapshot "content"
>>= loadAndApplyTemplate "templates/default.html" defaultContext


>>= saveSnapshot "content"  --
>>= return . fmap demoteHeaders
>>= loadAndApplyTemplate "templates/post.html"    (postCtx tags)
>>= loadAndApplyTemplate "templates/default.html" defaultContext

Manual feed publishing

The main blog feed is now generated with a _ prefix, and I must manually rename it (with make feed) to make it live it on Planet Haskell. This will hopefully reduce snafus (and not create new ones).

./site.hs 95
-    create ["blog.xml"] $ do
+    create ["_blog.xml"] $ do

./Makefile 14
+feed: _site/blog.xml
+_site/blog.xml: _site/_blog.xml
+	cp _site/_blog.xml _site/blog.xml

Better HTML titles

Changed the “Joyful Systems” prefix to a suffix in the HTML page titles, making search results and browser tab names more useful.

August 27, 2014 02:00 AM

FP Complete

IAP: conduit stream fusion

Both the changes described in this blog post, and in the previous blog post, are now merged to the master branch of conduit, and have been released to Hackage as conduit 1.2.0. That doesn't indicate stream fusion is complete (far from it!). Rather, the optimizations we have so far are valuable enough that I want them to be available immediately, and future stream fusion work is highly unlikely to introduce further breaking changes. Having the code on Hackage will hopefully also make it easier for others to participate in the discussion around this code.

Stream fusion

Last time, I talked about applying the codensity transform to speed up conduit. This greatly increases performance when performing many monadic binds. However, this does nothing to help us with speeding up the "categorical composition" of conduit, where we connect two components of a pipeline together so the output from the first flows into the second. conduit usually refers to this as fusion, but given the topic at hand (stream fusion), I think that nomenclature will become confusing. So let's stick to categorical composition, even though conduit isn't actually a category.

Duncan Coutts, Roman Leshchinskiy and Don Stewart wrote the stream fusion paper, and that technique has become integral to getting high performance in the vector and text packages. The paper is well worth the read, but for those unfamiliar with the technique, let me give a very brief summary:

  • GHC is very good at optimising non-recursive functions.
  • We express all of our streaming functions has a combination of some internal state, and a function to step over that state.
  • Stepping either indicates that the stream is complete, there's a new value and a new state, or there's a new state without a new value (this last case helps avoid recursion for a number of functions like filter).
  • A stream transformers (like map) takes a Stream as input and produces a new Stream as output.
  • The final consuming functions, like fold, are the only place where recursion happens. This allows all other components of the pipeline to be inlined, rewritten to more efficient formats, and optimized by GHC.

Let's see how this looks compared to conduit.

Data types

I'm going to slightly rename data types from stream fusion to avoid conflicts with existing conduit names. I'm also going to add an extra type parameter to represent the final return value of a stream; this is a concept that exists in conduit, but not common stream fusion.

data Step s o r
    = Emit s o
    | Skip s
    | Stop r
data Stream m o r = forall s. Stream
    (s -> m (Step s o r))
    (m s)

The Step datatype takes three parameters. s is the internal state used by the stream, o is the type of the stream of values it generates, and r is the final result value. The Stream datatype uses an existential to hide away that internal state. It then consists of a step function that takes a state and gives us a new Step, as well as an initial state value (which is a monadic action, for cases where we want to do some initialization when starting a stream).

Let's look at some functions to get a feel for what this programming style looks like:

enumFromToS_int :: (Integral a, Monad m) => a -> a -> Stream m a ()
enumFromToS_int !x0 !y =
    Stream step (return x0)
    step x | x <= y    = return $ Emit (x + 1) x
           | otherwise = return $ Stop ()

This function generates a stream of integral values from x0 to y. The internal state is the current value to be emitted. If the current value is less than or equal to y, we emit our current value, and update our state to be the next value. Otherwise, we stop.

We can also write a function that transforms an existing stream. mapS is likely the simplest example of this:

mapS :: Monad m => (a -> b) -> Stream m a r -> Stream m b r
mapS f (Stream step ms0) =
    Stream step' ms0
    step' s = do
        res <- step s
        return $ case res of
            Stop r -> Stop r
            Emit s' a -> Emit s' (f a)
            Skip s' -> Skip s'

The trick here is to make a function from one Stream to another. We unpack the input Stream constructor to get the input step and state functions. Since mapS has no state of its own, we simply keep the input state unmodified. We then provide our modified step' function. This calls the input step function, and any time it sees an Emit, applies the user-provided f function to the emitted value.

Finally, let's consider the consumption of a stream with a strict left fold:

foldS :: Monad m => (b -> a -> b) -> b -> Stream m a () -> m b
foldS f b0 (Stream step ms0) =
    ms0 >>= loop b0
    loop !b s = do
        res <- step s
        case res of
            Stop () -> return b
            Skip s' -> loop b s'
            Emit s' a -> loop (f b a) s'

We unpack the input Stream constructor again, get the initial state, and then loop. Each loop, we run the input step function.

Match and mismatch with conduit

There's a simple, straightforward conversion from a Stream to a Source:

toSource :: Monad m => Stream m a () -> Producer m a
toSource (Stream step ms0) =
    lift ms0 >>= loop
    loop s = do
        res <- lift $ step s
        case res of
            Stop () -> return ()
            Skip s' -> loop s'
            Emit s' a -> yield a >> loop s'

We extract the state, and then loop over it, calling yield for each emitted value. And ignoring finalizers for the moment, there's even a way to convert a Source into a Stream:

fromSource :: Monad m => Source m a -> Stream m a ()
fromSource (ConduitM src0) =
    Stream step (return $ src0 Done)
    step (Done ()) = return $ Stop ()
    step (Leftover p ()) = return $ Skip p
    step (NeedInput _ p) = return $ Skip $ p ()
    step (PipeM mp) = liftM Skip mp
    step (HaveOutput p _finalizer o) = return $ Emit p o

Unfortunately, there's no straightforward conversion for Conduits (transformers) and Sinks (consumers). There's simply a mismatch in the conduit world- which is fully continuation based- to the stream world- where the upstream is provided in an encapsulated value. I did find a few representations that mostly work, but the performance characteristics are terrible.

If anyone has insights into this that I missed, please contact me, as this could have an important impact on the future of stream fusion in conduit. But for the remainder of this blog post, I will continue under the assumption that only Source and Stream can be efficiently converted.


Once I accepted that I wouldn't be able to convert a stream transformation into a conduit transformation, I was left with a simple approach to start working on fusion: have two representations of each function we want to be able to fuse. The first representation would use normal conduit code, and the second would be streaming. This looks like:

data StreamConduit i o m r = StreamConduit
    (ConduitM i o m r)
    (Stream m i () -> Stream m o r)

Notice that the second field uses the stream fusion concept of a Stream-transforming function. At first, this may seem like it doesn't properly address Sources and Sinks, since the former doesn't have an input Stream, and the latter results in a single output value, not a Stream. However, those are really just special cases of the more general form used here. For Sources, we provide an empty input stream, and for Sinks, we continue executing the Stream until we get a Stop constructor with the final result. You can see both of these in the implementation of the connectStream function (whose purpose I'll explain in a moment):

connectStream :: Monad m
              => StreamConduit () i    m ()
              -> StreamConduit i  Void m r
              -> m r
connectStream (StreamConduit _ stream) (StreamConduit _ f) =
    run $ f $ stream $ Stream emptyStep (return ())
    emptyStep _ = return $ Stop ()
    run (Stream step ms0) =
        ms0 >>= loop
        loop s = do
            res <- step s
            case res of
                Stop r -> return r
                Skip s' -> loop s'
                Emit _ o -> absurd o

Notice how we've created an empty Stream using emptyStep and a dummy () state. And on the run side, we loop through the results. The type system (via the Void datatype) prevents the possibility of a meaningful Emit constructor, and we witness this with the absurd function. For Stop we return the final value, and Skip implies another loop.

Fusing StreamConduit

Assuming we have some functions that use StreamConduit, how do we get things to fuse? We still need all of our functions to have a ConduitM type signature, so we start off with a function to convert a StreamConduit into a ConduitM:

unstream :: StreamConduit i o m r -> ConduitM i o m r
unstream (StreamConduit c _) = c
{-# INLINE [0] unstream #-}

Note that we hold off on any inlining until simplification phase 0. This is vital to our next few rewrite rules, which is where all the magic happens.

The next thing we want to be able to do is categorically compose two StreamConduits together. This is easy to do, since a StreamConduit is made up of ConduitMs which compose via the =$= operator, and Stream transformers, which compose via normal function composition. This results in a function:

fuseStream :: Monad m
           => StreamConduit a b m ()
           -> StreamConduit b c m r
           -> StreamConduit a c m r
fuseStream (StreamConduit a x) (StreamConduit b y) = StreamConduit (a =$= b) (y . x)
{-# INLINE fuseStream #-}

That's very logical, but still not magical. The final trick is a rewrite rule:

{-# RULES "fuseStream" forall left right.
        unstream left =$= unstream right = unstream (fuseStream left right)

We're telling GHC that, if we see a composition of two streamable conduits, then we can compose the stream versions of them and get the same result. But this isn't enough yet; unstream will still end up throwing away the stream version. We now need to deal with running these things. The first case we'll handle is connecting two streamable conduits, which is where the connectStream function from above comes into play. If you go back and look at that code, you'll see that the ConduitM fields are never used. All that's left is telling GHC to use connectStream when appropriate:

{-# RULES "connectStream" forall left right.
        unstream left $$ unstream right = connectStream left right

The next case we'll handle is when we connect a streamable source to a non-streamable sink. This is less efficient than the previous case, since it still requires allocating ConduitM constructors, and doesn't expose as many opportunities for GHC to inline and optimize our code. However, it's still better than nothing:

connectStream1 :: Monad m
               => StreamConduit () i    m ()
               -> ConduitM      i  Void m r
               -> m r
connectStream1 (StreamConduit _ fstream) (ConduitM sink0) =
    case fstream $ Stream (const $ return $ Stop ()) (return ()) of
        Stream step ms0 ->
            let loop _ (Done r) _ = return r
                loop ls (PipeM mp) s = mp >>= flip (loop ls) s
                loop ls (Leftover p l) s = loop (l:ls) p s
                loop _ (HaveOutput _ _ o) _ = absurd o
                loop (l:ls) (NeedInput p _) s = loop ls (p l) s
                loop [] (NeedInput p c) s = do
                    res <- step s
                    case res of
                        Stop () -> loop [] (c ()) s
                        Skip s' -> loop [] (NeedInput p c) s'
                        Emit s' i -> loop [] (p i) s'
             in ms0 >>= loop [] (sink0 Done)
{-# INLINE connectStream1 #-}

{-# RULES "connectStream1" forall left right.
        unstream left $$ right = connectStream1 left right

There's a third case that's worth considering: a streamable sink and non-streamable source. However, I ran into two problems when implementing such a rewrite rule:

  • GHC did not end up firing the rule.

  • There are some corner cases regarding finalizers that need to be dealt with. In our previous examples, the upstream was always a stream, which has no concept of finalizers. But when the upstream is a conduit, we need to make sure to call them appropriately.

So for now, fusion only works for cases where all of the functions can by fused, or all of the functions before the $$ operator can be fused. Otherwise, we'll revert to the normal performance of conduit code.


I took the benchmarks from our previous blog post and modified them slightly. The biggest addition was including an example of enumFromTo =$= map =$= map =$= fold, which really stresses out the fusion capabilities, and demonstrates the performance gap stream fusion offers.

The other thing to note is that, in the "before fusion" benchmarks, the sum results are skewed by the fact that we have the overly eager rewrite rules for enumFromTo $$ fold (for more information, see the previous blog post). For the "after fusion" benchmarks, there are no special-case rewrite rules in place. Instead, the results you're seeing are actual artifacts of having a proper fusion framework in place. In other words, you can expect this to translate into real-world speedups.

You can compare before fusion and after fusion. Let me provide a few select comparisons:

Benchmark Low level or vector Before fusion After fusion Speedup
map + sum 5.95us 636us 5.96us 99%
monte carlo 3.45ms 5.34ms 3.70ms 71%
sliding window size 10, Seq 1.53ms 1.89ms 1.53ms 21%
sliding vector size 10, unboxed 2.25ms 4.05ms 2.33ms 42%

Note at the map + sum benchmark is very extreme, since the inner loop is doing very cheap work, so the conduit overhead dominated the analysis.

Streamifying a conduit

Here's an example of making a conduit function stream fusion-compliant, using the map function:

mapC :: Monad m => (a -> b) -> Conduit a m b
mapC f = awaitForever $ yield . f
{-# INLINE mapC #-}

mapS :: Monad m => (a -> b) -> Stream m a r -> Stream m b r
mapS f (Stream step ms0) =
    Stream step' ms0
    step' s = do
        res <- step s
        return $ case res of
            Stop r -> Stop r
            Emit s' a -> Emit s' (f a)
            Skip s' -> Skip s'
{-# INLINE mapS #-}

map :: Monad m => (a -> b) -> Conduit a m b
map = mapC
{-# INLINE [0] map #-}
{-# RULES "unstream map" forall f.
    map f = unstream (StreamConduit (mapC f) (mapS f))

Notice the three steps here:

  • Define a pure-conduit implementation (mapC), which looks just like conduit 1.1's map function.
  • Define a pure-stream implementation (mapS), which looks very similar to vector's mapS.
  • Define map, which by default simply reexposes mapC. But then, use an INLINE statement to delay inlining until simplification phase 0, and use a rewrite rule to rewrite map in terms of unstream and our two helper functions mapC and mapS.

While tedious, this is all we need to do for each function to expose it to the fusion framework.

Vector vs conduit, mapM style

Overall, vector has been both the inspiration for the work I've done here, and the bar I've used to compare against, since it is generally the fastest implementation you can get in Haskell (and tends to be high-level code to boot). However, there seems to be one workflow where conduit drastically outperforms vector: chaining together monadic transformations.

I put together a benchmark which does the same enumFromTo+map+sum benchmark I demonstrated previously. But this time, I have four versions: vector with pure functions, vector with IO functions, conduit with pure functions, and conduit with IO functions. You can see the results here, the important takeaway is:

  • Pure is always faster, since it exposes more optimizations to GHC.
  • vector and conduit pure are almost identical, at 57.7us and 58.1us.
  • Monadic conduit code does have a slowdown (86.3us). However, monadic vector code has a drastic slowdown (305us), presumably because monadic binds defeat its fusion framework.

So there seems to be at least one workflow for which conduit's fusion framework can outperform even vector!


The biggest downside to this implementation of stream fusion is that we need to write all of our algorithms twice. This can possibly be mitigated by having a few helper functions in place, and implementing others in terms of those. For example, mapM_ can be implemented in terms foldM.

There's one exception to this: using the streamSource function, we can convert a Stream into a Source without having to write our algorithm twice. However, due to differences in how monadic actions are performed between Stream and Conduit, this could introduce a performance degredation for pure Sources. We can work around that with a special case function streamSourcePure for the Identity monad as a base.

Getting good performance

In order to take advantage of the new stream fusion framework, try to follow these guidelines:

  • Use fusion functions whenever possible. Explicit usage of await and yield will immediately kick you back to non-fusion (the same as explicit pattern matching defeats list fusion).
  • If you absolutely cannot use an existing fusion function, consider writing your own fusion variant.
  • When mixing fusion and non-fusion, put as many fusion functions as possible together with the $= operator before the connect operator $$.

Next steps

Even though this work is now publicly available on Hackage, there's still a lot of work to be done. This falls into three main categories:

  • Continue rewriting core library functions in streaming style. Michael Sloan has been working on a lot of these functions, and we're hoping to have almost all the combinators from Data.Conduit.List and Data.Conduit.Combinators done soon.
  • Research why rewrite rules and inlining don't play nicely together. In a number of places, we've had to explicitly use rewrite rules to force fusion to happen, when theoretically inlining should have taken care of it for us.
  • Look into any possible alternative formulations of stream fusion that provide better code reuse or more reliable rewrite rule firing.

Community assistance on all three points, but especially 2 and 3, are much appreciated!

August 27, 2014 12:00 AM

August 26, 2014

Edward Z. Yang

A taste of Cabalized Backpack

So perhaps you've bought into modules and modularity and want to get to using Backpack straightaway. How can you do it? In this blog post, I want to give a tutorial-style taste of how to program Cabal in the Backpack style. None of these examples are executable, because only some of this system is in GHC HEAD--the rest are on branches awaiting code review or complete vaporware. However, we've got a pretty good idea how the overall design and user experience should go, and so the purpose of this blog post is to communicate that idea. Comments and suggestions would be much appreciated; while the design here is theoretically well-founded, for obvious reasons, we don't have much on-the-ground programmer feedback yet.

A simple package in today's Cabal

To start, let's briefly review how Haskell modules and Cabal packages work today. Our running example will be the bytestring package, although I'll inline, simplify and omit definitions to enhance clarity.

Let's suppose that you are writing a library, and you want to use efficient, packed strings for some binary processing you are doing. Fortunately for you, the venerable Don Stewart has already written a bytestring package which implements this functionality for you. This package consists of a few modules: an implementation of strict ByteStrings...

module Data.ByteString(ByteString, empty, singleton, ...) where
  data ByteString = PS !(ForeignPtr Word8) !Int !Int
  empty :: ByteString
  empty = PS nullForeignPtr 0 0

...and an implementation of lazy ByteStrings:

module Data.ByteString.Lazy(ByteString, empty, singleton, ...) where
  data ByteString = Empty | Chunk !S.ByteString ByteString
  empty :: ByteString
  empty = Empty

These modules are packaged up into a package which is specified using a Cabal file (for now, we'll ignore the ability to define libraries/executables in the same Cabal file and assume everything is in a library):

name: bytestring
build-depends: base >= 4.2 && < 5, ghc-prim, deepseq
exposed-modules: Data.ByteString, Data.ByteString.Lazy, ...
other-modules: ...

We can then make a simple module and package which depends on the bytestring package:

module Utils where
  import Data.ByteString.Lazy as B
  blank :: IO ()
  blank = B.putStr B.empty
name: utilities
version: 0.1
build-depends: base, bytestring >= 0.10
exposed-modules: Utils

It's worth noting a few things about this completely standard module setup:

  1. It's not possible to switch Utils from using lazy ByteStrings to strict ByteStrings without literally editing the Utils module. And even if you do that, you can't have Utils depending on strict ByteString, and Utils depending on lazy ByteString, in the same program, without copying the entire module text. (This is not too surprising, since the code really is different.)
  2. Nevertheless, there is some amount of indirection here: while Utils includes a specific ByteString module, it is unspecified which version of ByteString it will be. If (hypothetically) the bytestring library released a new version where lazy byte-strings were actually strict, the functionality of Utils would change accordingly when the user re-ran dependency resolution.
  3. I used a qualified import to refer to identifiers in Data.ByteString.Lazy. This is a pretty common pattern when developing Haskell code: we think of B as an alias to the actual model. Textually, this is also helpful, because it means I only have to edit the import statement to change which ByteString I refer to.

Generalizing Utils with a signature

To generalize Utils with some Backpack magic, we need to create a signature for ByteString, which specifies what the interface of the module providing ByteStrings is. Here one such signature, which is placed in the file Data/ByteString.hsig inside the utilities package:

module Data.ByteString where
  import Data.Word
  data ByteString
  instance Eq ByteString
  empty :: ByteString
  singleton :: Word8 -> ByteString
  putStr :: ByteString -> IO ()

The format of a signature is essentially the same of that of an hs-boot file: we have normal Haskell declarations, but omitting the actual implementations of values.

The utilities package now needs a new field to record signatures:

name: utilities
indefinite: True
build-depends: base
exposed-modules: Utils
required-signatures: Data.ByteString

Notice that there have been three changes: (1) We've removed the direct dependency on the bytestring package, (2) we've added a new field indefinite, which indicates that this indefinite package has signatures and cannot be compiled until those signatures are filled in with implementations (this field is strictly redundant, but is useful for documentation purposes, as we will see later), and (3) we have a new field required-signatures which simply lists the names of the signature files (also known as holes) that we need filled in.

How do we actually use the utilities package, then? Let's suppose our goal is to produce a new module, Utils.Strict, which is Utils but using strict ByteStrings (which is exported by the bytestring package under the module name Data.ByteString). To do this, we'll need to create a new package:

name: strict-utilities
build-depends: utilities, bytestring
reexported-modules: Utils as Utils.Strict

That's it! strict-utilities exports a single module Utils.Strict which is utilities using Data.ByteString from bytestring (which is the strict implementation). This is called a mix-in: in the same dependency list, we simply mix together:

  • utilities, which requires a module named Data.ByteString, and
  • bytestring, which supplies a module named Data.ByteString.

Cabal automatically figures out that how to instantiate the utilities package by matching together module names. Specifically, the two packages above are connected through the module name Data.ByteString. This makes for a very convenient (and as it turns out, expressive) mode of package instantiation. By the way, reexported-modules is a new (orthogonal) feature which lets us reexport a module from the current package or a dependency to the outside world under a different name. The modules that are exported by the package are the exposed-modules and the reexported-modules. The reason we distinguish them is to make clear which modules have source code in the package (exposed-modules).

Unusually, strict-utilities is a package that contains no code! Its sole purpose is to mix existing packages.

Now, you might be wondering: how do we instantiate utilities with the lazy ByteString implementation? That implementation was put in Data.ByteString.Lazy, so the names don't match up. In this case, we can use another new feature, module thinning and renaming:

name: lazy-utilities
  bytestring (Data.ByteString.Lazy as Data.ByteString)
reexported-modules: Utils as Utils.Lazy

The utilities dependency is business as usual, but bytestring has a little parenthesized expression next to it. This expression is the thinning and renaming applied to the package import: it controls what modules are brought into the scope of the current package from a dependency, possibly renaming them to different names. When I write build-depends: bytestring (Data.ByteString.Lazy as Data.ByteString), I am saying "I depend on the bytestring package, but please only make the Data.ByteString.Lazy module available under the name Data.ByteString when considering module imports, and ignore all the other exposed modules." In strict-utilities, you could have also written bytestring (Data.ByteString), because this is the only module that utilities uses from bytestring.

An interesting duality is that you can do the renaming the other way:

name: lazy-utilities
  utilities (Utils, Data.ByteString as Data.ByteString.Lazy),

Instead of renaming the implementation, I renamed the hole! It's equivalent: the thing that matters it that the signature and implementation need to be mixed under the same name in order for linking (the instantiation of the signature with the implementation) to occur.

There are a few things to note about signature usage:

  1. If you are using a signature, there's not much point in also specifying an explicit import list when you import it: you are guaranteed to only see types and definitions that are in the signature (modulo type classes... a topic for another day). Signature files act like a type-safe import list which you can share across modules.

  2. A signature can, and indeed often must, import other modules. In the type signature for singleton in Data/ByteString.hsig, we needed to refer to a type Word8, so we must bring it into scope by importing Data.Word.

    Now, when we compile the signature in the utilities package, we need to know where Data.Word came from. It could have come from another signature, but in this case, it's provided by the definite package base: it's a proper concrete module with an implementation! Signatures can depend on implementations: since we can only refer to types from those modules, we are saying, in effect: any implementation of the singleton function and any representation of the ByteString type is acceptable, but regarding Word8 you must use the specific type from Data.Word in prelude.

  3. What happens if, independently of my packages strict-utilities, someone else also instantiatiates utilities with Data.ByteString? Backpack is clever enough to reuse the instantiation of utilities: this property is called applicativity of the module system. The specific rule that we use to decide if the instantiation is the same is to look at how all of the holes needed by a package are instantiated, and if they are instantiated with precisely the same modules, the instantiated packages are considered type equal. So there is no need to actually create strict-utilities or lazy-utilities: you can just instantiate utilities on the fly.

Mini-quiz: What does this package do?

name: quiz-utilities
  utilities (Utils, Data.ByteString as B),
  bytestring (Data.ByteString.Lazy as B)

Sharing signatures

It's all very nice to be able to explicitly write a signature for Data.ByteString in my package, but this could get old if I have to do this for every single package I depend on. It would be much nicer if I could just put all my signatures in a package and include that when I want to share it. I want all of the Hackage mechanisms to apply to my signatures as well as my normal packages (e.g. versioning). Well, you can!

The author of bytestring can write a bytestring-sig package which contains only signatures:

name: bytestring-sig
version: 1.0
indefinite: True
build-depends: base
exposed-signatures: Data.ByteString

...and declare that the bytestring package satisfies this signature:

name: bytestring
implements: bytestring-sig-1.0

The implements fields is purely advisory: it offers a proactive check to library authors to make sure they aren't breaking compatibility with signatures, and it also helps Cabal offer suggestions for how to provide implementations for signatures.

Now, utilities can include this package to indicate its dependence on the signature:

name: utilities
indefinite: True
build-depends: base, bytestring-sig-1.0
exposed-modules: Utils

Unlike normal dependencies, signature dependencies should be exact: after all, while you might want an upgraded implementation, you don't want the signature to change on you!

Another interesting difference is that we specified the signatures using exposed-signatures, as opposed to required-signatures. We can summarize all of the fields as follows:

  1. exposed-modules says that there is a public module defined in this package
  2. other-modules says that there is a private module defined in this package
  3. exposed-signatures says that there is a public signature defined in this package
  4. required-signatures says that there is a "private" signature defined in this package
  5. reexported-modules says that there is a public module or signature defined in a dependency.

In this list, public means that it is available to clients. Notice the first four fields list all of the source code in this package. Here is a simple example of a client:

name: utilities-extras
indefinite: True
build-depends: utilities
exposed-modules: Utils.Extra

Utils/Extra.hs defined in this package can import Utils (because it's exposed by utilities) but can't import Data.ByteString (because it's not exposed). Had we said reexported-modules: Data.ByteString in utilities, then Data.ByteString would have been accessible.

Do note, however, that the package is still indefinite (since it depends on an indefinite package). Despite Data.ByteString being "private" to utilities (not importable), a client may still refer to it in a renaming clause in order to instantiate the module:

name: utilities-extras-lazy
  utilities-extras (Data.ByteString as Data.ByteString.Lazy),

You can't "hide" holes altogether: that would be like saying, "I'm never going to say what the actual implementation is!" But you can choose not to directly rely on them.

By the way, if Utils/Extra.hs, in utilities-extras, wanted to import Data.ByteString (even though utilities did not expose it), utilities-extras simply needs directly depend on the signature package:

name: utilities-extras
indefinite: True
build-depends: utilities, bytestring-sig == 1.0
exposed-modules: Utils.Extra

The Data.ByteString hole from utilities and the new hole included here are automatically checked for compatibility and linked together: you only need to provide one implementation for both of them.

Mini-quiz: What does this package do? Specifically, if I include it in a package, what modules are available for import?

name: attoparsec-sig
version: 1.0
indefinite: True
build-depends: base, bytestring-sig
exposed-signatures: Data.Attoparsec


We've covered a lot of ground, but when it comes down to it, Backpack really comes together because of set of orthogonal features which interact in a good way:

  1. Module signatures (mostly implemented but needs lots of testing): the heart of a module system, giving us the ability to write indefinite packages and mix together implementations,
  2. Module reexports (fully implemented and in HEAD): the ability to take locally available modules and reexport them under a different name, and
  3. Module thinning and renaming (fully implemented and in code review): the ability to selectively make available modules from a dependency.

To compile a Backpack package, we first run the traditional version dependency solving, getting exact versions for all packages involved, and then we calculate how to link the packages together. That's it! In a future blog post, I plan to more comprehensively describe the semantics of these new features, especially module signatures, which can be subtle at times. Also, note that I've said nothing about how to type-check against just a signature, without having any implementation in mind. As of right now, this functionality is vaporware; in a future blog post, I also plan on saying why this is so challenging.

by Edward Z. Yang at August 26, 2014 10:01 PM

Chris Smith

On CodeWorld and Haskell

I’ve been pouring a lot of effort into CodeWorld lately… and I wanted to write a sort of apology to the Haskell community.  Well, perhaps not an apology, because I believe I did the right thing.  But at the same time, I realize that decisions I’ve made haven’t been entirely popular among Haskell programmers.  I’d like to explain what happened, and try to make it up to you!

What Happened

Originally, I started this project using Haskell and the excellent gloss package, by Ben Lippmeier.  CodeWorld has been moving slowly further and further away from the rest of the Haskell community.  This has happened in a sequence of steps:

  1. Way back in 2011, I started “CodeWorld”, but at the time, I called it Haskell for Kids.  At the time, I understood that the reasons I’d chosen Haskell as a language were not about cool stuff like type classes (which I love) and monads and categories and other commonplace uses of solid abstractions (which fascinate me).  Instead, I chose Haskell for the simple reason that it looked like math.  The rest of Haskell came with the territory.  I built the first CodeWorld web site in a weekend, and I had to settle on a language and accept all that came with it.
  2. From the beginning, I made some changes for pedagogical reasons.  For example, gloss defines rotation to be clockwise.  I insisted on rotation working in the counter-clockwise direction, because that’s the convention universally used in math.  Later, I resized the canvas to 20×20, so that typical programs would need to use fractions and decimals, which is a middle school math education goal.  I made thes changes, even though they broke compatibility with a widely used package.  Sorry for anyone that’s struggled with this.
  3. I rebranded “Haskell for Kids” as CodeWorld, and stopped explicitly depending on gloss in favor of just reproducing its general approach in a new Prelude.  This was a deliberate attempt to get away from focusing on the Haskell language and libraries, and also to the accompanying import statements and such.  This hid the ways that Haskell was a general purpose language with uses outside this toy environment.  That is unfortunate.
  4. I rewrote the Haskell Prelude, to remove type classes.  Along the way, I collapsed the whole numeric type class hierarchy into a single type, and even got Luite (the author of GHCJS) to help me with some deep black magic to implement equality on arbitrary Haskell types without type classes.  This threw away much of the beauty of Haskell… in favor of dramatically improved error messages, and fewer things you need to know to get started.  It was a real loss.
  5. Finally, I commited the unforgivable sin.  I dropped curried functions, in favor of defining functions of multiple parameters using tuples.  This finally makes CodeWorld feel like a completely different language from Haskell.  That really sucks, and I know some people are frustrated.

Why It Happened?

First, I want to point out some things that are not the reason for any of this:

  • I did not do this because I think there’s something wrong with Haskell.  I love type classes.  I love currying, and especially love how it’s not just a convenient trick, but sometimes introduces whole new perspectives by viewing tedious functions of multiple parameters as simple, clean, and elegant higher-order functions.
  • I also did not do this because I think anyone is incapable of learning full-fledged Haskell.  In fact, I taught full-fledged Haskell to middle schoolers for a year.  I know they can do it.

So why did I do it?  Two reasons:

  • Teaching mathematics has always been more important to me than teaching Haskell.  While Haskell is an awesome programming language, mathematics is just an awesome perspective on life.  For every student who benefits from learning an inspiring programming language, many students will benefit from learning that humanity has a method called mathematics for thinking about fundamental truths in a systematic, logical way that can capture things precisely.  So any time I have to choose between pushing students further toward their math education or away from it, I’ll choose toward it.
  • Details matter.  Even though I know kids are capable of a lot, they are capable of a lot more without artificial obstacles in their way.  I learned this the hard way teaching this class the first time.  The smallest little things, with absolutely no great significance as a language, matter a lot.  Having to put parentheses around negative numbers obscures students from reaching leaps of understanding.  Confusing error messages mean the difference between a student who spends a weekend learning, and one who gives up on Friday afternoon and doesn’t think about it until the next school day.  Different surface syntax means that a lot of kids never fully make the connection that functions here are the same thing as functions there.

In the end, I do think these were the right decisions… despite the frustration they can cause for Haskell programmers who know there’s a better way.

Making Up For It

A couple weekends ago, though, I worked on something to hopefully restore some of this loss for Haskellers.  You see, all the changes I’ve made, in the end, come from replacing the Prelude module with my own alternative.  Specifically:

  1. I deliberately replaced functions from the Prelude with my modified versions.
  2. Because I provided an alternative Prelude, I had to hide the base package, which made it impossible to import things like Control.Monad.  This was not a deliberate decision.  It just happened.

So I fixed this.  I added to the codeworld-base package re-exports of all of the modules from base.  I renamed Prelude to HaskellPrelude in the process, so that it doesn’t conflict with my own Prelude.  And finally, I added a new module, CodeWorld, that exports all the really new stuff from CodeWorld like pictures, colors, and the interpreters for pictures, animations, simulations, etc.  The result is that you can now start your programs with the following:

import Prelude()
import HaskellPrelude
import CodeWorld -- If you still want to do pictures, etc.

main = putStrLn "Hello, World"

At this point, you can write any Haskell you like!  You aren’t even constrained to pure code, or safe code.  (The exception: TemplateHaskell is still rejected, since the compiler runs on the server, so TH code would execute code on the server.)

In fact, it’s even better!  You’re free to use GHCJS JavaScript foreign imports, to interact with the browser environment!  See a brief example here.  Now you’re out of the sandbox, and are free to play around however you like.

Right now, the CodeWorld module still uses uncurried functions and other CodeWorld conventions like Number for numbers, etc.  There’s no reason for this, and it’s something that I should probably change.  Anyone want to send a pull request?

by cdsmith at August 26, 2014 04:38 PM

Dominic Steinitz

Haskell Vectors and Sampling from a Categorical Distribution


Suppose we have a vector of weights which sum to 1.0 and we wish to sample n samples randomly according to these weights. There is a well known trick in Matlab / Octave using sampling from a uniform distribution.

num_particles = 2*10^7
likelihood = zeros(num_particles,1);
likelihood(:,1) = 1/num_particles;
[_,index] = histc(rand(num_particles,1),[0;cumsum(likelihood/sum(likelihood))]);
s = sum(index);

Using tic and toc this produces an answer with

Elapsed time is 10.7763 seconds.


I could find no equivalent function in Haskell nor could I easily find a binary search function.

> {-# OPTIONS_GHC -Wall                     #-}
> {-# OPTIONS_GHC -fno-warn-name-shadowing  #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults   #-}
> {-# OPTIONS_GHC -fno-warn-unused-do-bind  #-}
> {-# OPTIONS_GHC -fno-warn-missing-methods #-}
> {-# OPTIONS_GHC -fno-warn-orphans         #-}
> {-# LANGUAGE BangPatterns                 #-}
> import System.Random.MWC
> import qualified Data.Vector.Unboxed as V
> import Control.Monad.ST
> import qualified Data.Vector.Algorithms.Search as S
> import Data.Bits
> n :: Int
> n = 2*10^7

Let’s create some random data. For a change let’s use mwc-random rather than random-fu.

> vs  :: V.Vector Double
> vs = runST (create >>= (asGenST $ \gen -> uniformVector gen n))

Again, I could find no equivalent of cumsum but we can write our own.

> weightsV, cumSumWeightsV :: V.Vector Double
> weightsV = V.replicate n (recip $ fromIntegral n)
> cumSumWeightsV = V.scanl (+) 0 weightsV

Binary search on a sorted vector is straightforward and a cumulative sum ensures that the vector is sorted.

> binarySearch :: (V.Unbox a, Ord a) =>
>                 V.Vector a -> a -> Int
> binarySearch vec x = loop 0 (V.length vec - 1)
>   where
>     loop !l !u
>       | u <= l    = l
>       | otherwise = let e = vec V.! k in if x <= e then loop l k else loop (k+1) u
>       where k = l + (u - l) `shiftR` 1
> indices :: V.Vector Double -> V.Vector Double -> V.Vector Int
> indices bs xs = (binarySearch bs) xs

To see how well this performs, let’s sum the indices (of course, we wouldn’t do this in practice) as we did for the Matlab implementation.

> js :: V.Vector Int
> js = indices (V.tail cumSumWeightsV) vs
> main :: IO ()
> main = do
>   print $ V.foldl' (+) 0 js

Using +RTS -s we get

Total   time   10.80s  ( 11.06s elapsed)

which is almost the same as the Matlab version.

I did eventually find a binary search function in vector-algorithms and since one should not re-invent the wheel, let us try using it.

> indices' :: (V.Unbox a, Ord a) => V.Vector a -> V.Vector a -> V.Vector Int
> indices' sv x = runST $ do
>   st <- V.unsafeThaw (V.tail sv)
>   V.mapM (S.binarySearch st) x
> main' :: IO ()
> main' = do
>   print $  V.foldl' (+) 0 $ indices' cumSumWeightsV vs

Again using +RTS -s we get

Total   time   11.34s  ( 11.73s elapsed)

So the library version seems very slightly slower.

by Dominic Steinitz at August 26, 2014 03:05 PM

Douglas M. Auclair (geophf)

Dylan: the harsh realities of the market

So, this is a little case study.

I did everything for Dylan. And when I say everything, I mean everything.  Here's my resumé:

  • I got excited about Dylan as a user, and I used it. I bought an old Mac that I don't ever remember the designation for, it's so '90's old, and got the floppies for the Dylan IDE from Apple research.
I'm not joking.

  • I integrated Dylan into my work at work, building an XML parser then open-sourcing it to the community under the (then) non-restrictive license. I think mine was the only XML parser that was industrial-strength for Dylan. Can't claim originality: I ported over the Common-LISP one, but it was a lot of (fun) work.
  • I made improvements to the gwydion-dylan compiler, including some library documentation (you can see my name right there, right in the compiler code), including some library functionality, did I work on the compiler itself? The Dylan syntax extensions or type system? I don't recall; if not in those places, I know I've looked at those guts: I had my fingers all over parts of the compiler.
I was in the Dylan compiler code. For you ll-types ('little language') that's no big deal.

But ask a software developer in industry if they've ever been in their compiler code. I have, too: I've found bugs in Java Sun-compiler that I fixed locally and reported up the chain.
  • I taught a course at our community college on Dylan. I had five students from our company that made satellite mission software.
  • I effing had commercial licenses bought when the boss asked me: what do we have to do to get this (my system) done/integrated into the build. I put my job on the line, for Dylan. ... The boss bought the licenses: he'd rather spend the $x than spending six weeks to back-port down to Java or C++.
  • I built a rule-based man-power scheduling system that had previously took three administrative assistants three days each quarter to generate. My system did it, and printed out a PDF in less than one second. I sold it, so that means I started a commercial company and sold my software.
I sold commercial Dylan software. That I wrote. Myself. And sold. Because people bought it. Because it was that good.

Hells yeah.

Question: what more could I have done?

I kept Dylan alive for awhile. In industry. For real.

So why is Dylan dead?

That's not the question.

Or, that question is answered over and over and over again.

Good languages, beautiful languages, right-thing languages languish and die all the time.

Dylan was the right-thing, and they (Apple) killed it in the lab, and for a reason.

Who is Dylan for?

That's not the question either. Because you get vague, general, useless answers.

The question is to ask it like Paul Graham answered it for LISP.

Lisp is a pointless, useless, weird language that nobody uses.

But Paul and his partner didn't care. They didn't give a ...


... what anybody else thought. They knew that this language, the language they loved, was built and designed and made for them. Just them and only them, because the only other people who were using it were college kids on comp.lang.lisp asking for the answers for problem-set 3 on last night's homework.

That's what Lisp was good for: nothing.
That's who Lisp was good for: nobody.

Same exact scenario for Erlang. Exactly the same. Erlang was only good for Joe Armstrong and a couple of buddies/weirdos like him, you know: kooks, who believed that Erlang was the right-thing for what they were doing, because they were on a mission, see, and nothing nobody could say could stop them nor stand against them, and all who would rise up against them would fall.


What made Lisp and Haskell and Erlang and Scala and Prolog (yes, Prolog, although you'll never hear that success story publicly, but $26M and three lives saved? Because of a Prolog system I wrote? And that's just one day in one month's report for data? I call that a success) work when nobody sane would say that these things would work?

Well, it took a few crazy ones to say, no, not: 'say' that it would work, but would make it work with their beloved programming language come hell or high water or, worse: indifferent silence, or ridicule, or pity from the rest of the world.

That is the lesson of perl and python and all these other languages. They're not good for anything. They suck. And they suck in libraries and syntax and semantics and weirdness-factor and everything.

But two, not one, but at least two people loved that language enough to risk everything, and ...

They lost.

Wait. What?

Did you think I was going to paint the rosy picture and lie to you and say 'they won'?

Because they didn't.

Who uses Lisp commercially? Or Haskell, except some fringers, or Scala or Clojure or Erlang or Smalltalk or Prolog

... or Dylan.

These languages are defined, right there in the dictionary.

Erlang: see 'career wrecker.'

Nobody uses those languages nor admits to even touching them with a 10-foot (3-meter) pole. I had an intern from college. 'Yeah, we studied this weird language called ML in Comp.sci. Nobody uses it.'

She was blown away when I started singing ML's praises and what it can do.

A meta-language, and she called it useless? Seriously?

Because that's what the mainstream sees.

Newsflash. I'm sorry. Dylan, Haskell, Idris: these aren't main-stream, and they never will be.

Algebraic types? Dependent types? You'll never see them. They're too ... research-y. They stink of academe, which is: they stink of uselessness-to-industry. You'll be dead and buried to see them in this form, even after they discover the eternity elixir. Sorry.

Or you'll see them in Visual Basic as a new Type-class form that only a few Microserfs use because they happened to have written those extensions. Everybody else?


Here's how Dylan will succeed, right now.

Bruce and I will put our heads together, start a company, and we'll code something. Not for anybody else to use and to love and to cherish, just for us, only for us, and it will blow out the effing doors, and we'll be bought out for $40M because our real worth is $127M.

And the first thing that Apple will do, after they bought us, is to show us the door, then convert the code into Java. Or Swift. Or Objective-C, or whatever.

And that's how we'll win.

Not the $40M. Not the lecture series on 'How to Make Functional Programming Work in Industry for Real' afterwards at FLoC and ICFP conferences with fan-bois and -girls wanting to talk to us afterwards and ask us how they can get a job doing functional programming.

Not that.

We'll win because we made something in Dylan, and it was real, and it worked, and it actually did something for enough people that we can now go to our graves knowing that we did something once with our lives (and we can do it again and again, too: there's no upper limit on the successes you're allowed to have, people) that meant something to some bodies. And we did that. With Dylan.


I've done that several times already, by my counting: the Prolog project, the Dylan project, the Mercury project, and my writing.

I'm ready to do that, again.

Because, actually, fundamentally, doing something in this world and for it ... there's nothing like it.

You write that research paper, and I come up to you, waving it in your face, demanding you implement your research because I need it to do my job in Industry?

I've done that to three professors so far. Effing changed their world-view in that moment. "What?" they said, to a person, "somebody actually wants to use this?" The look of bemused surprise on their faces?

It was sad, actually, because they did write something that somebody out there (moiself) needed, but they never knew that what they were doing meant something.

And it did.

Effing change your world-view. Your job? Your research? Your programming language?

That's status quo, and that's good and necessary and dulce and deleche (or decorum, I forget which).

But get up out of the level you're at, and do something with it so that that other person, slouched in their chair, sits up and takes notice, and a light comes over their face and they say, 'Ooh! That does that? Wow!' and watch their world change, because of you and what you've done.

Dylan is for nothing and for nobody.

So is everything under the Sun, my friend.

Put your hand to the plow, and with the sweat of your brow, make it yours for this specific thing.

Regardless of the long hours, long months of unrewarded work, and regardless of the hecklers, naysayers, and concerned friends and parents, and regardless of the mountain of unpaid bills.

You make it work, and you don't stop until it does.

That's how I've won.

Every time.

by geophf ( at August 26, 2014 12:09 PM

August 25, 2014

Danny Gratzer

Introduction to Dependent Types: Haskell on Steroids

Posted on August 25, 2014

I’d like to start another series of blog posts. This time on something that I’ve wanted to write about for a while, dependent types.

There’s a noticeable lack of accessible materials introducing dependent types at a high level aimed at functional programmers. That’s what this series sets out help fill. Therefore, if you’re a Haskell programmer and don’t understand something, it’s a bug! Please comment so I can help make this a more useful resource for you :)

There are four parts to this series, each answering one question

  1. What are dependent types?
  2. What does a dependently typed language look like?
  3. What does it feel like to write programs with dependent types?
  4. What does it mean to “prove” something?

So first things first, what are dependent types? Most people by now have heard the unhelpful quick answer

A dependent type is a type that depends on a value, not just other types.

But that’s not helpful! What does this actually look like? To try to understand this we’re going to write some Haskell code that pushes us as close as we can get to dependent types in Haskell.

Kicking GHC in the Teeth

Let’s start with the flurry of extensions we need

{-# LANGUAGE DataKinds            #-}
{-# LANGUAGE KindSignatures       #-}
{-# LANGUAGE GADTs                #-}
{-# LANGUAGE TypeFamilies         #-}
{-# LANGUAGE UndecidableInstances #-}

Now our first definition is a standard formulation of natural numbers

    data Nat = Z | S Nat

Here Z represents 0 and S means + 1. So you should read S Z as 1, S (S Z) as 2 and so on and so on.

If you’re having some trouble, this function to convert an Int to a Nat might help

    -- Naively assume n >= 0
    toNat :: Int -> Nat
    toNat 0 = Z
    toNat n = S (toNat $ n - 1)

We can use this definition to formulate addition

    plus :: Nat -> Nat -> Nat
    plus Z n     = n
    plus (S n) m = S (plus n m)

This definition proceeds by “structural induction”. That’s a scary word that pops up around dependent types. It’s not all that complicated, all that it means is that we use recursion only on strictly smaller terms.

There is a way to formally define smaller, if a term is a constructor applied to several (recursive) arguments. Any argument to the constructor is strictly smaller than the original terms. In a strict language if we restrict ourselves to only structural recursion we’re guaranteed that our function will terminate. This isn’t quite the case in Haskell since we have infinite structures.

    toInt :: Nat -> Int
    toInt (S n) = 1 + toInt n
    toInt Z     = 0

    bigNumber = S bigNumber

    main = print (toInt bigNumber) -- Uh oh!

Often people will cheerfully ignore this part of Haskell when talking about reasoning with Haskell and I’ll stick to that tradition (for now).

Now back to the matter at hand. Since our definition of Nat is quite straightforward, it get’s promoted to the kind level by DataKinds.

Now we can “reflect” values back up to this new kind with a second GADTed definition of natural numbers.

    data RNat :: Nat -> * where
      RZ :: RNat Z
      RS :: RNat n -> RNat (S n)

Now, let’s precisely specify the somewhat handwavy term “reflection”. I’m using it in the imprecise sense meaning that we’ve lifted a value into something isomorphic at the type level. Later we’ll talk about reflection precisely mean lifting a value into the type level. That’s currently not possible since we can’t have values in our types!

What on earth could that be useful for? Well with this we can do something fancy with the definition of addition.

    type family Plus n m :: Nat where
      Plus Z n     = n
      Plus (S n) m = S (Plus n m)

Now we’ve reflected our definition of addition to the type family. More than that, what we’ve written above is fairly obviously correct. We can now force our value level definition of addition to respect this type family

    plus' :: RNat n -> RNat m -> RNat (Plus n m)
    plus' RZ n     = n
    plus' (RS n) m = RS (plus' n m)

Now if we messed up this definition we’d get a type error!

    plus' :: RNat n -> RNat m -> RNat (Plus n m)
    plus' RZ n     = n
    plus' (RS n) m = plus' n m -- Unification error! n ~ S n

Super! We know have types that express strict guarantees about our program. But how useable is this?

To put it to the test, let’s try to write some code that reads to integers for standard input and prints their sum.

We can easily do this with our normal plus

    readNat :: IO Nat
    readNat = toNat <$> readLn

    main :: IO ()
    main = plus <$> readNat <*> readNat

Easy as pie! But what about RNat, how can we convert a Nat to an RNat? Well we could try something with type classes I guess

class Reify a where
  type N
  reify :: a -> RNat N

But wait, that doesn’t work since we can only have once instance for all Nats. What if we did the opposite

class Reify (n :: Nat) where
  nat :: RNat n -> Nat

This let’s us go in the other direction.. but that doesn’t help us! In fact there’s no obvious way to propagate runtime values back into the types. We’re stuck.

GHC with Iron Dentures

Now, if we could add some magical extension to GHC could we write something like above program? Yes of course! The key idea is to not reflect up our types with data kinds, but rather just allow the values to exist in the types on their own.

For these I propose two basic ideas

  1. A special reflective function type
  2. Lifting expressions into types

For our special function types, we allow the return type to use the supplied value. These are called pi types. We’ll give this the following syntax

(x :: A) -> B x

Where A :: * and B :: Nat -> * are some sort of type. Notice that that Nat in B’s kind isn’t the data kind promoted version, but just the goodness to honest normal value.

Now in order to allow B to actually make use of it’s supplied value, our second idea let’s normal types be indexed on values! Just like how GADTs can be indexed on types. We’ll call these GGADTs.

So let’s define a new version of RNat

    data RNat :: Nat -> * where
      RZ :: RNat Z
      RS :: RNat n -> RNat (S n)

This looks exactly like what we had before, but our semantics are different now. Those Z’s and S’s are meant to represent actual values, not members of some kind. There’s no promoting types to singleton kinds anymore, just plain old values being held in fancier types.

Because we can depend on normal values, we don’t even have to use our simple custom natural numbers.

    data RInt :: Int -> * where
      RZ :: RInt 0
      RS :: RInt n -> RInt (1 + n)

Notice that we allowed our types to call functions, like +. This can potentially be undecidable, something that we’ll address later.

Now we can write our function with a combination of these two ideas

    toRInt :: (n :: Int) -> RInt n
    toRInt 0 = RZ
    toRInt n = RS (toRInt $ n - 1)

Notice how we used pi types to change the return type dependent on the input value. Now we can feed this any old value, including ones we read from standard input.

    main = print . toInt $ plus' <$> fmap toRInt readLn <*> fmap toRInt readLn

Now, one might wonder how the typechecker could possibly know how to handle such things, after all how could it know what’ll be read from stdin!

The answer is that it doesn’t. When a value is reflected to the type level we can’t do anything with it. For example, if we had a type like

    (n :: Int) -> (if n == 0 then Bool else ())

Then we would have to pattern match on n at the value level to propagate information about n back to the type level.

If we did something like

    foo :: (n :: Int) -> (if n == 0 then Bool else ())
    foo n = case n of
      0 -> True
      _ -> ()

Then the typechecker would see that we’re matching on n, so if we get into the 0 -> ... branch then n must be 0. It can then reduce the return type to if 0 == 0 then Bool else () and finally Bool. A very important thing to note here is that the typechecker doesn’t evaluate the program. It’s examining the function in isolation of all other values. This means we sometimes have to hold its hand to ensure that it can figure out that all branches have the correct type.

This means that when we use pi types we often have to pattern match on our arguments in order to help the typechecker figure out what’s going on.

To make this clear, let’s play the typechecker for this function. I’m reverting to the Nat type since it’s nicer for pattern matching.

    toRNat :: (n :: Nat) -> RNat n
    toRNat Z = RZ -- We know that n is `Z` in this branch
    toRNat (S n) = RS (toRNat n {- This has the type RNat n' -})

    p :: (n :: Nat) -> (m :: Int) -> RNat (plus n m)
    p Z m     = toRNat m
    p (S n) m = RS (toRNat n m)

First the type checker goes through toRNat.

In the first branch we have n equals Z, so RZ trivially typechecks. Next we have the case S n.

  • We know that toRNat n has the type RNat n' by induction
  • We also know that S n' = n.
  • Therefore RS builds us a term of type RNat n.

Now for p. We start in much the same manner.

if we enter the p Z m case

  • we know that n is Z.
  • we can reduce plus n m since plus Z m is by definition equal to m Look at the definition of plus to confirm this).
  • We know how to produce RNat m easily since we have a function toRNat :: (n :: Nat) -> RNat n.
  • We can apply this to m and the resulting term has the type RNat m.

In the RS case we know that we’re trying to produce a term of type RNat (plus (S n) m).

  • Now since we know that the constructor for the first argument of plus, we can reduce plus (S n) m to S (plus n m) by the definition of plus.
  • We’re looking to build a term of type plus n m and that’s as simple as a recursive call.
  • From here we just need to apply RS to give us S (plus n m)
  • As we previously noted S (plus n m) is equal to plus (S n) m

Notice how as we stepped through this as the typechecker we never needed to do any arbitrary reductions. We only ever reduce definitions when we have the outer constructor (WHNF) of one of the arguments.

While I’m not actually proposing adding {-# LANGUAGE PiTypes #-} to GHC, it’s clear that with only a few orthogonal editions to system F we can get some seriously cool types.

Wrap Up

Believe or not we’ve just gone through two of the most central concepts in dependent types

  • Indexed type families (GGADTs)
  • Dependent function types (Pi types)

Not so bad was it? :) From here we’ll look in the next post how to translate our faux Haskell into actual Agda code. From there we’ll go through a few more detailed examples of pi types and GGADTs by poking through some of the Agda standard library.

Thanks for reading, I must run since I’m late for class. It’s an FP class ironically enough.

<script type="text/javascript"> /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */ var disqus_shortname = 'codeco'; // required: replace example with your forum shortname /* * * DON'T EDIT BELOW THIS LINE * * */ (function() { var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; dsq.src = '//' + disqus_shortname + ''; (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq); })(); </script> <noscript>Please enable JavaScript to view the comments powered by Disqus.</noscript> comments powered by Disqus

August 25, 2014 12:00 AM

August 24, 2014

Alejandro Serrano Mena

Using Emacs for Haskell development

In the last months, the toolchain for using Haskell within Emacs has changed a lot, and has become a lot better. Apart from my additions to ghc-mod, new autocompletion packages such as company-ghc have appeared.

In the past, I've felt that there was a need for a comprehensive article of all the available options for Haskell development in Emacs, including haskell-mode, ghc-mod, company-ghc, HaRe and structured-haskell-mode. To fill this gap, I have written a tutorial covering installation, configuration and use of these tools, especially keeping an eye into making all of them work nicely when put together.

Hope it helps!

by Alejandro Serrano ( at August 24, 2014 10:48 AM

Summer of Code on Emacs!

This summer I've been participating in Google Summer of Code, as I did some years ago. My aim was the same: to make it easier for Haskell developers to interact with their code. But instead of Eclipse, I've focused on another very well-known editor: Emacs. In particular, I've been extending the already excellent ghc-mod.

During the last year I've turned increasingly jealous of the Emacs modes for Agda and Idris, two programming languages which resemble Haskell but add dependent types to the mix. Using those modes, you can work interactively with your code, write pattern matches automatically, refine certain parts of your code, ask the compiler what is the type that a certain code should have, and so on. Furthermore, since version 7.8 GHC includes support for typed holes, so it seemed like all the necessary infrastructure from the compiler was in place to do this.

Instead of a boring description of the outcome of the project, I have prepared a video demostration ;)

<iframe allowfullscreen="allowfullscreen" frameborder="0" height="450" src="" width="600"></iframe>

As a summary, here is the list of new key bindings that you can use since the release (just a few days ago) of ghc-mod 5.0:

  • C-u M-t: create the skeleton of a function from its signature, or the skeleton of a type class instance from its declaration,
  • M-t: perform case splitting on variables;
  • C-c M-n and C-c M-p: navigate between typed holes in your program, to the next or the previous one, respectively;
  • C-c C-f: refine a hole through an expression, including as much holes as needed to make it type check;
  • C-c C-a: try automatic completion of a hole by calling Djinn.

I would like to thank eveybody who has helped me during this summer, especially my mentor David Raymond Christiansen (whose work in idris-mode is just amazing) and Kazu Yamamoto, the creator and maintainer of ghc-mod.

by Alejandro Serrano ( at August 24, 2014 09:09 AM

August 23, 2014

Antti-Juhani Kaijanaho (ibid)

A milestone toward a doctorate

Yesterday I received my official diploma for the degree of Licentiate of Philosophy. The degree lies between a Master’s degree and a doctorate, and is not required; it consists of the coursework required for a doctorate, and a Licentiate Thesis, “in which the student demonstrates good conversance with the field of research and the capability of independently and critically applying scientific research methods” (official translation of the Government decree on university degrees 794/2004, Section 23 Paragraph 2).

The title and abstract of my Licentiate Thesis follow:

Kaijanaho, Antti-Juhani
The extent of empirical evidence that could inform evidence-based design of programming languages. A systematic mapping study.
Jyväskylä: University of Jyväskylä, 2014, 243 p.
(Jyväskylä Licentiate Theses in Computing,
ISSN 1795-9713; 18)
ISBN 978-951-39-5790-2 (nid.)
ISBN 978-951-39-5791-9 (PDF)
Finnish summary

Background: Programming language design is not usually informed by empirical studies. In other fields similar problems have inspired an evidence-based paradigm of practice. Central to it are secondary studies summarizing and consolidating the research literature. Aims: This systematic mapping study looks for empirical research that could inform evidence-based design of programming languages. Method: Manual and keyword-based searches were performed, as was a single round of snowballing. There were 2056 potentially relevant publications, of which 180 were selected for inclusion, because they reported empirical evidence on the efficacy of potential design decisions and were published on or before 2012. A thematic synthesis was created. Results: Included studies span four decades, but activity has been sparse until the last five years or so. The form of conditional statements and loops, as well as the choice between static and dynamic typing have all been studied empirically for efficacy in at least five studies each. Error proneness, programming comprehension, and human effort are the most common forms of efficacy studied. Experimenting with programmer participants is the most popular method. Conclusions: There clearly are language design decisions for which empirical evidence regarding efficacy exists; they may be of some use to language designers, and several of them may be ripe for systematic reviewing. There is concern that the lack of interest generated by studies in this topic area until the recent surge of activity may indicate serious issues in their research approach.

Keywords: programming languages, programming language design, evidence-based paradigm, efficacy, research methods, systematic mapping study, thematic synthesis

A Licentiate Thesis is assessed by two examiners, usually drawn from outside of the home university; they write (either jointly or separately) a substantiated statement about the thesis, in which they suggest a grade. The final grade is almost always the one suggested by the examiners. I was very fortunate to have such prominent scientists as Dr. Stefan Hanenberg and Prof. Stein Krogdahl as the examiners of my thesis. They recommended, and I received, the grade “very good” (4 on a scale of 1–5).

The thesis has been accepted for publication in our faculty’s licentiate thesis series and will in due course appear in our university’s electronic database (along with a very small number of printed copies). In the mean time, if anyone wants an electronic preprint, send me email at

<figure class="wp-caption aligncenter" id="attachment_1622" style="width: 334px;">Figure 1 of the thesis: an overview of the mapping process<figcaption class="wp-caption-text">Figure 1 of the thesis: an overview of the mapping process</figcaption></figure>

As you can imagine, the last couple of months in the spring were very stressful for me, as I pressed on to submit this thesis. After submission, it took me nearly two months to recover (which certain people who emailed me on Planet Haskell business during that period certainly noticed). It represents the fruit of almost four years of work (way more than normally is taken to complete a Licentiate Thesis, but never mind that), as I designed this study in Fall 2010.

<figure class="wp-caption aligncenter" id="attachment_1625" style="width: 330px;">Figure 8 of the thesis: Core studies per publication year<figcaption class="wp-caption-text">Figure 8 of the thesis: Core studies per publication year</figcaption></figure>

Recently, I have been writing in my blog a series of posts in which I have been trying to clear my head about certain foundational issues that irritated me during the writing of the thesis. The thesis contains some of that, but that part of it is not very strong, as my examiners put it, for various reasons. The posts have been a deliberately non-academic attempt to shape the thoughts into words, to see what they look like fixed into a tangible form. (If you go read them, be warned: many of them are deliberately provocative, and many of them are intended as tentative in fact if not in phrasing; the series also is very incomplete at this time.)

I closed my previous post, the latest post in that series, as follows:

In fact, the whole of 20th Century philosophy of science is a big pile of failed attempts to explain science; not one explanation is fully satisfactory. [...] Most scientists enjoy not pondering it, for it’s a bit like being a cartoon character: so long as you don’t look down, you can walk on air.

I wrote my Master’s Thesis (PDF) in 2002. It was about the formal method called “B”; but I took a lot of time and pages to examine the history and content of formal logic. My supervisor was, understandably, exasperated, but I did receive the highest possible grade for it (which I never have fully accepted I deserved). The main reason for that digression: I looked down, and I just had to go poke the bridge I was standing on to make sure I was not, in fact, walking on air. In the many years since, I’ve taken a lot of time to study foundations, first of mathematics, and more recently of science. It is one reason it took me about eight years to come up with a doable doctoral project (and I am still amazed that my department kept employing me; but I suppose they like my teaching, as do I). The other reason was, it took me that long to realize how to study the design of programming languages without going where everyone has gone before.

Debian people, if any are still reading, may find it interesting that I found significant use for the dctrl-tools toolset I have been writing for Debian for about fifteen years: I stored my data collection as a big pile of dctrl-format files. I ended up making some changes to the existing tools (I should upload the new version soon, I suppose), and I wrote another toolset (unfortunately one that is not general purpose, like the dctrl-tools are) in the process.

For the Haskell people, I mainly have an apology for not attending to Planet Haskell duties in the summer; but I am back in business now. I also note, somewhat to my regret, that I found very few studies dealing with Haskell. I just checked; I mention Haskell several times in the background chapter, but it is not mentioned in the results chapter (because there were not studies worthy of special notice).

I am already working on extending this work into a doctoral thesis. I expect, and hope, to complete that one faster.

by Antti-Juhani Kaijanaho at August 23, 2014 05:44 PM

Joachim Breitner

This blog goes static

After a bit more than 9 years, I am replacing Serendipity, which as been hosting my blog, by a self-made static solution. This means that when you are reading this, my server no longer has to execute some rather large body of untyped code to produce the bytes sent to you. Instead, that happens once in a while on my laptop, and they are stored as static files on the server.

I hope to get a little performance boost from this, so that my site can more easily hold up to being mentioned on hackernews. I also do not want to worry about security issues in Serendipity – static files are not hacked.

Of course there are down-sides to having a static blog. The editing is a bit more annoying: I need to use my laptop (previously I could post from anywhere) and I edit text files instead of using a JavaScript-based WYSIWYG editor (but I was slightly annoyed by that as well). But most importantly your readers cannot comment on static pages. There are cloud-based solutions that integrate commenting via JavaScript on your static pages, but I decided to go for something even more low-level: You can comment by writing an e-mail to me, and I’ll put your comment on the page. This has the nice benefit of solving the blog comment spam problem.

The actual implementation of the blog is rather masochistic, as my web page runs on one of these weird obfuscated languages (XSLT). Previously, it contained of XSLT stylesheets producing makefiles calling XSLT sheets. Now it is a bit more-self-contained, with one XSLT stylesheet writing out all the various html and rss files.

I managed to import all my old posts and comments thanks to this script by Michael Hamann (I had played around with this some months ago and just spend what seemed to be an hour to me to find this script again) and a small Haskell script. Old URLs are rewritten (using mod_rewrite) to the new paths, but feed readers might still be confused by this.

This opens the door to a long due re-design of my webpage. But not today...

by Joachim Breitner ( at August 23, 2014 03:54 PM

Dominic Steinitz

Importance Sampling

Importance Sampling

Suppose we have an random variable X with pdf 1/2\exp{-\lvert x\rvert} and we wish to find its second moment numerically. However, the random-fu package does not support sampling from such as distribution. We notice that

\displaystyle   \int_{-\infty}^\infty x^2 \frac{1}{2} \exp{-\lvert x\rvert} \mathrm{d}x =  \int_{-\infty}^\infty x^2 \frac{\frac{1}{2} \exp{-\lvert x\rvert}}                                 {\frac{1}{\sqrt{8\pi}}{\exp{-x^2/8}}}                        \frac{1}{\sqrt{8\pi}}{\exp{-x^2/8}}  \,\mathrm{d}x

So we can sample from {\cal{N}}(0, 4) and evaluate

\displaystyle   x^2 \frac{\frac{1}{2} \exp{-\lvert x\rvert}}           {\frac{1}{\sqrt{8\pi}}{\exp{-x^2/8}}}

> {-# OPTIONS_GHC -Wall                     #-}
> {-# OPTIONS_GHC -fno-warn-name-shadowing  #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults   #-}
> {-# OPTIONS_GHC -fno-warn-unused-do-bind  #-}
> {-# OPTIONS_GHC -fno-warn-missing-methods #-}
> {-# OPTIONS_GHC -fno-warn-orphans         #-}
> module Importance where
> import Control.Monad
> import Data.Random.Source.PureMT
> import Data.Random
> import Data.Random.Distribution.Binomial
> import Data.Random.Distribution.Beta
> import Control.Monad.State
> import qualified Control.Monad.Writer as W
> sampleImportance :: RVarT (W.Writer [Double]) ()
> sampleImportance = do
>   x <- rvarT $ Normal 0.0 2.0
>   let x2 = x^2
>       u = x2 * 0.5 * exp (-(abs x))
>       v = (exp ((-x2)/8)) * (recip (sqrt (8*pi)))
>       w = u / v
>   lift $ W.tell [w]
>   return ()
> runImportance :: Int -> [Double]
> runImportance n =
>   snd $
>   W.runWriter $
>   evalStateT (sample (replicateM n sampleImportance))
>              (pureMT 2)

We can run this 10,000 times to get an estimate.

ghci> import Formatting
ghci> format (fixed 2) (sum (runImportance 10000) / 10000)

Since we know that the n-th moment of the exponential distribution is n! / \lambda^n where \lambda is the rate (1 in this example), the exact answer is 2 which is not too far from our estimate using importance sampling.

The value of

\displaystyle   w(x) = \frac{1}{N}\frac{\frac{1}{2} \exp{-\lvert x\rvert}}                         {\frac{1}{\sqrt{8\pi}}{\exp{-x^2/8}}}       = \frac{p(x)}{\pi(x)}

is called the weight, p is the pdf from which we wish to sample and \pi is the pdf of the importance distribution.

Importance Sampling Approximation of the Posterior

Suppose that the posterior distribution of a model in which we are interested has a complicated functional form and that we therefore wish to approximate it in some way. First assume that we wish to calculate the expectation of some arbitrary function f of the parameters.

\displaystyle   {\mathbb{E}}(f({x}) \,\vert\, y_1, \ldots y_T) =  \int_\Omega f({x}) p({x} \, \vert \, y_1, \ldots y_T) \,\mathrm{d}{x}

Using Bayes

\displaystyle   \int_\Omega f({x}) {p\left(x \,\vert\, y_1, \ldots y_T\right)} \,\mathrm{d}{x} =  \frac{1}{Z}\int_\Omega f({x}) {p\left(y_1, \ldots y_T \,\vert\, x\right)}p(x) \,\mathrm{d}{x}

where Z is some normalizing constant.

As before we can re-write this using a proposal distribution \pi(x)

\displaystyle   \frac{1}{Z}\int_\Omega f({x}) {p\left(y_1, \ldots y_T \,\vert\, x\right)}p(x) \,\mathrm{d}{x} =  \frac{1}{Z}\int_\Omega \frac{f({x}) {p\left(y_1, \ldots y_T \,\vert\, x\right)}p(x)}{\pi(x)}\pi(x) \,\mathrm{d}{x}

We can now sample X^{(i)} \sim \pi({x}) repeatedly to obtain

\displaystyle   {\mathbb{E}}(f({x}) \,\vert\, y_1, \ldots y_T) \approx \frac{1}{ZN}\sum_1^N  f({X^{(i)}}) \frac{p(y_1, \ldots y_T \, \vert \, {X^{(i)}})p({X^{(i)}})}                              {\pi({X^{(i)}})} =  \sum_1^N w_if({X^{(i)}})

where the weights w_i are defined as before by

\displaystyle   w_i = \frac{1}{ZN} \frac{p(y_1, \ldots y_T \, \vert \, {X^{(i)}})p({X^{(i)}})}                          {\pi({X^{(i)}})}

We follow Alex Cook and use the example from (Rerks-Ngarm et al. 2009). We take the prior as \sim {\cal{Be}}(1,1) and use {\cal{U}}(0.0,1.0) as the proposal distribution. In this case the proposal and the prior are identical just expressed differently and therefore cancel.

Note that we use the log of the pdf in our calculations otherwise we suffer from (silent) underflow, e.g.,

ghci> pdf (Binomial nv (0.4 :: Double)) xv

On the other hand if we use the log pdf form

ghci> logPdf (Binomial nv (0.4 :: Double)) xv
> xv, nv :: Int
> xv = 51
> nv = 8197
> sampleUniform :: RVarT (W.Writer [Double]) ()
> sampleUniform = do
>   x <- rvarT StdUniform
>   lift $ W.tell [x]
>   return ()
> runSampler :: RVarT (W.Writer [Double]) () ->
>               Int -> Int -> [Double]
> runSampler sampler seed n =
>   snd $
>   W.runWriter $
>   evalStateT (sample (replicateM n sampler))
>              (pureMT (fromIntegral seed))
> sampleSize :: Int
> sampleSize = 1000
> pv :: [Double]
> pv = runSampler sampleUniform 2 sampleSize
> logWeightsRaw :: [Double]
> logWeightsRaw = map (\p -> logPdf (Beta 1.0 1.0) p +
>                            logPdf (Binomial nv p) xv -
>                            logPdf StdUniform p) pv
> logWeightsMax :: Double
> logWeightsMax = maximum logWeightsRaw
> weightsRaw :: [Double]
> weightsRaw = map (\w -> exp (w - logWeightsMax)) logWeightsRaw
> weightsSum :: Double
> weightsSum = sum weightsRaw
> weights :: [Double]
> weights = map (/ weightsSum) weightsRaw
> meanPv :: Double
> meanPv = sum $ zipWith (*) pv weights
> meanPv2 :: Double
> meanPv2 = sum $ zipWith (\p w -> p * p * w) pv weights
> varPv :: Double
> varPv = meanPv2 - meanPv * meanPv

We get the answer

ghci> meanPv

But if we look at the size of the weights and the effective sample size

ghci> length $ filter (>= 1e-6) weights

ghci> (sum weights)^2 / (sum $ map (^2) weights)

so we may not be getting a very good estimate. Let’s try

> sampleNormal :: RVarT (W.Writer [Double]) ()
> sampleNormal = do
>   x <- rvarT $ Normal meanPv (sqrt varPv)
>   lift $ W.tell [x]
>   return ()
> pvC :: [Double]
> pvC = runSampler sampleNormal 3 sampleSize
> logWeightsRawC :: [Double]
> logWeightsRawC = map (\p -> logPdf (Beta 1.0 1.0) p +
>                             logPdf (Binomial nv p) xv -
>                             logPdf (Normal meanPv (sqrt varPv)) p) pvC
> logWeightsMaxC :: Double
> logWeightsMaxC = maximum logWeightsRawC
> weightsRawC :: [Double]
> weightsRawC = map (\w -> exp (w - logWeightsMaxC)) logWeightsRawC
> weightsSumC :: Double
> weightsSumC = sum weightsRawC
> weightsC :: [Double]
> weightsC = map (/ weightsSumC) weightsRawC
> meanPvC :: Double
> meanPvC = sum $ zipWith (*) pvC weightsC
> meanPvC2 :: Double
> meanPvC2 = sum $ zipWith (\p w -> p * p * w) pvC weightsC
> varPvC :: Double
> varPvC = meanPvC2 - meanPvC * meanPvC

Now the weights and the effective size are more re-assuring

ghci> length $ filter (>= 1e-6) weightsC

ghci> (sum weightsC)^2 / (sum $ map (^2) weightsC)

And we can take more confidence in the estimate

ghci> meanPvC


Rerks-Ngarm, Supachai, Punnee Pitisuttithum, Sorachai Nitayaphan, Jaranit Kaewkungwal, Joseph Chiu, Robert Paris, Nakorn Premsri, et al. 2009. “Vaccination with ALVAC and AIDSVAX to Prevent HIV-1 Infection in Thailand.” New England Journal of Medicine 361 (23) (December 3): 2209–2220. doi:10.1056/nejmoa0908492.

by Dominic Steinitz at August 23, 2014 08:05 AM

August 22, 2014

Philip Wadler

Informatics Independence Referendum Debate

School of Informatics, University of Edinburgh
Independence Referendum Debate
4.00--5.30pm Monday 25 August
Appleton Tower Lecture Room 2

For the NAYs: Prof. Alan Bundy
For the AYEs: Prof. Philip Wadler
Moderator: Prof. Mike Fourman

All members of the School of Informatics
and the wider university community welcome

(This is a debate among colleagues and not a formal University event.
All views expressed are those of the individuals who express them,
and not the University of Edinburgh.)

by Philip Wadler ( at August 22, 2014 10:43 AM

Research funding in an independent Scotland

A useful summary, written by a colleague.
In Summary:
  •  the difference between the Scottish tax contribution and RCUK spending in Scotland is small compared to savings that will be made in other areas such as defence
  • Funding levels per institution are actually similar in Scotland to those in the rest of the UK, it’s just that there are more institutions here
  • Because of its relative importance any independent Scottish government would prioritise research
  • If rUK rejects a common research area it would lose the benefits of its previous investments, and the Scottish research capacity, which is supported by the Scottish government and the excellence of our universities
  • There are significant disadvantages with a No vote, resulting from UK immigration policy and the possibility of exiting the EU

by Philip Wadler ( at August 22, 2014 10:32 AM

August 21, 2014

Philip Wadler

Scotland can't save England

Salmond concluded his debate with Darling by observing that for half his lifetime Scotland had been ruled by governments that Scotland had not elected. Many take this the other way, and fret that if Scotland leaves the UK, then Labour would never win an election. Wings Over Scotland reviews the figures. While Scotland has an effect on the size of the majority, elections would yield the same ruling party with or without Scotland in 65 of the last 67 years. To a first approximation, Scotland's impact over the rest of the UK is nil, while the rest of the UK overwhelms Scotland's choice half the time.

1945 Labour govt (Attlee)

Labour majority: 146
Labour majority without any Scottish MPs in Parliament: 143

1950 Labour govt (Attlee)

Labour majority: 5
Without Scottish MPs: 2

1951 Conservative govt (Churchill/Eden)

Conservative majority: 17
Without Scottish MPs: 16

1955 Conservative govt (Eden/Macmillan)

Conservative majority: 60
Without Scottish MPs: 61

1959 Conservative govt (Macmillan/Douglas-Home)

Conservative majority: 100
Without Scottish MPs: 109

1964 Labour govt (Wilson)

Labour majority: 4
Without Scottish MPs: -11

1966 Labour govt (Wilson)

Labour majority: 98
Without Scottish MPs: 77

1970 Conservative govt (Heath)

Conservative majority: 30
Without Scottish MPs: 55

1974 Minority Labour govt (Wilson)

Labour majority: -33
Without Scottish MPs: -42

1974b Labour govt (Wilson/Callaghan)

Labour majority: 3
Without Scottish MPs: -8

1979 Conservative govt (Thatcher)

Conservative majority: 43
Without Scottish MPs: 70

1983 Conservative govt (Thatcher)

Conservative majority: 144
Without Scottish MPs: 174

1987 Conservative govt (Thatcher/Major)

Conservative majority: 102
Without Scottish MPs: 154

1992 Conservative govt (Major)

Conservative majority: 21
Without Scottish MPs: 71

1997 Labour govt (Blair)

Labour majority: 179
Without Scottish MPs: 139

2001 Labour govt (Blair)

Labour majority: 167
Without Scottish MPs: 129

2005 Labour govt (Blair/Brown)

Labour majority: 66
Without Scottish MPs:  43

2010 Coalition govt (Cameron)

Conservative majority: -38
Without Scottish MPs: 19

by Philip Wadler ( at August 21, 2014 10:05 PM

How Scotland will be robbed

Thanks to the Barnett Formula, the UK government provides more funding per head in Scotland than in the rest of the UK. Better Together touts this as an extra £1400 in each person's pocket that will be lost if Scotland votes 'Aye' (famously illustrated with Lego). Put to one side the argument as to whether the extra £1400 is a fair reflection of the extra Scotland contributes to the UK economy, through oil and other means. The Barnett Formula is up for renegotiation. Will it be maintained if Scotland votes 'Nay'?

Wings over Scotland lays out the argument that if Scotland opts to stick with Westminster then Westminster will stick it to Scotland.
The Barnett Formula is the system used to decide the size of the “block grant” sent every year from London to the Scottish Government to run devolved services. ...
Until now, however, it’s been politically impossible to abolish the Formula, as such a manifestly unfair move would lead to an upsurge in support for independence. In the wake of a No vote in the referendum, that obstacle would be removed – Scots will have nothing left with which to threaten Westminster.
It would still be an unwise move for the UK governing party to be seen to simply obviously “punish” Scotland after a No vote. But the pledge of all three Unionist parties to give Holyrood “more powers” provides the smokescreen under which the abolition of Barnett can be executed and the English electorate placated.
The block grant is a distribution of tax revenue. The “increased devolution” plans of the UK parties will instead make the Scottish Government responsible for collecting its own income taxes. The Office of Budget Responsibility has explained in detail how the block grant from the UK government to Scotland will then be reduced to reflect the fiscal impact of the devolution of these tax-raising powers.” (page 4).But if Holyrood sets Scottish income tax at the same level as the UK, that’ll mean the per-person receipts are also the same, which means that there won’t be the money to pay for the “extra” £1400 of spending currently returned as part-compensation for Scottish oil revenues, because the oil revenues will be staying at Westminster. ...
We’ve explained the political motivations behind the move at length before. The above is simply the mechanical explanation of how it will happen if Scotland votes No. The“if” is not in question – all the UK parties are united behind the plan.
A gigantic act of theft will be disguised as a gift. The victories of devolution will be lost, because there’ll no longer be the money to pay for them. Tuition fees and prescription charges will return. Labour’s “One Nation” will manifest itself, with the ideologically troublesome differences between Scotland and the rest of the UK eliminated.
And what’s more, it’ll all have been done fairly and above-board, because the Unionist parties have all laid out their intentions in black and white. They’ll be able to say, with justification, “Look, you can’t complain, this is exactly what we TOLD you we’d do”.
This analysis looks persuasive to me, and I've not seen it put so clearly elsewhere. Please comment below if you know sources for similar arguments.

by Philip Wadler ( at August 21, 2014 09:36 PM

Theory Lunch (Institute of Cybernetics, Tallinn)

Transgressing the limits

Today, the 14th of January 2014, we had a special session of our Theory Lunch. I spoke about ultrafilters and how they allow to generalize the notion of limit.

Consider the space \ell^\infty of bounded sequences of real numbers, together with the supremum norm. We would like to define a notion of limit which holds for every \{x_n\}_{n \geq 0} \in \ell^\infty and satisfies the well known properties of standard limit:

  1. Linearity: \lim_{n \to \infty} (\lambda x_n + \mu y_n) = \lambda \lim_{n \to \infty} x_n + \mu \lim_{n \to \infty} y_n.
  2. Homogeneity: \lim_{n \to \infty} (x_n \cdot y_n) = (\lim_{n \to \infty} x_n) \cdot (\lim_{n \to \infty} y_n).
  3. Monotonicity: if x_n \leq y_n for every n \geq 0 then \lim_{n \to \infty} x_n \leq \lim_{n \to \infty} y_n.
  4. Nontriviality: if x_n = 1 for every n \geq 0 then \lim_{n \to \infty} x_n = 1.
  5. Consistency: if the limit exists in the classical sense, then the two notions coincide.

The consistency condition is reasonable also because it avoids trivial cases: if we fix n_0 \in \mathbb{N} and we define the limit of the sequence x_n as the value x_{n_0}, then the first four properties are satisfied.

Let us recall the classical definition of limit: we say that x_n converges to x if and only if, for every \varepsilon > 0, the set of values n \in \mathbb{N} such that |x_n - x| < \varepsilon is cofinite, i.e., has a finite complement: the inequality |x_n - x| \geq \varepsilon can be satisfied at most for finitely many values of n. The family \mathcal{F} of cofinite subsets of \mathbb{N} (in fact, of any set X) has the following properties:

  1. Upper closure: if A \in \mathcal{F} and B \supseteq A then B \in \mathcal{F}.
  2. Meet stability: if A,B \in \mathcal{F} then A \cap B \in \mathcal{F}.

A family \mathcal{F} of subsets of X with the two properties above is called a filter on X. An immediate example is the trivial filter \mathcal{F} = \{X\}; another example is the improper filter \mathcal{F} = \mathcal{P}(X). The family \mathcal{F}(X) of cofinite subset of X is called the Fréchet filter on X. The Fréchet filter is not the improper one if and only if X is infinite.

An ultrafilter on X is a filter \mathcal{U} on X satisfying the following additional conditions:

  1. Properness: \emptyset \not \in \mathcal{U}.
  2. Maximality: for every A \subseteq X, either A \in \mathcal{U} or  X \setminus A \in \mathcal{U}.

For example, if x \in X, then (x) = \{ A \subseteq X \mid x \in A \} is an ultrafilter on X, called the principal ultrafilter generated by x. Observe that \bigcap_{A \in (x)} A = \{x\}: if \bigcap_{A \in \mathcal{U}} A = \emptyset we say that \mathcal{U} is free. These are, in fact, the only two options.

Lemma 1. For a proper filter \mathcal{F} to be an ultrafilter, it is necessary and sufficient that it satisfies the following condition: for every n \geq 2 and nonempty A_1, \ldots, A_n \subseteq X, if \bigcup_{i=1}^n A_i \in \mathcal{F} then A_i \in \mathcal{F} for at least one i \in \{1, \ldots, n\}.

Proof: It is sufficient to prove the thesis with n=2. If A \cup B \in \mathcal{F} with A,B \not \in \mathcal{F}, then \mathcal{F}' = \{ B' \subseteq X \mid B' \neq \emptyset, A \cup B' \in \mathcal{F} \} is a proper filter that properly contains \mathcal{F}. If the condition is satisfied, for every A \subseteq X which is neither \emptyset nor X we have A \cup (X \setminus A) = X \in \mathcal{F}, thus either A \in \mathcal{F} or X \setminus A \in \mathcal{F}. \Box

Theorem 1. Every nonprincipal ultrafilter is free. In addition, an ultrafilter is free if and only if it extends the Fréchet filter. In particular, every ultrafilter over a finite set is principal.

Proof: Let \mathcal{U} be a nonprincipal ultrafilter. Let x \in X: then \mathcal{U} \neq (x), so either there exists B \subseteq X such that x \not \in B and B \in \mathcal{U}, or there exists B' \subseteq X such that x \in B' and B' \not \in \mathcal{U}. In the first case, x \not \in \bigcap_{A \in \mathcal{U}} A; in the second case, we consider B = X \setminus B' and reduce to the first case. As x is arbitrary, \mathcal{U} is free.

Now, for every x \in X the set X \setminus \{x\} belongs to \mathcal{F}(X) but not to (x): therefore, no principal ultrafilter extends the Fréchet filter. On the other hand, if \mathcal{U} is an ultrafilter, A \subseteq X is finite, and X \setminus A \not \in \mathcal{U}, then A \in \mathcal{U} by maximality, hence \{x\} \in \mathcal{U} for some x \in A because of Lemma 1, thus \mathcal{U} = (x) cannot be a free ultrafilter. \Box

So it seems that free ultrafilters are the right thing to consider when trying to expand the concept of limit. There is an issue, though: we have not seen any single example of a free ultrafilter; in fact, we do not even (yet) know whether free ultrafilters do exist! The answer to this problem comes, in a shamelessly nonconstructive way, from the following

Ultrafilter lemma. Every proper filter can be extended to an ultrafilter.

The ultrafilter lemma, together with Theorem 1, implies the existence of free ultrafilters on every infinite set, and in particular on \mathbb{N}. On the other hand, to prove the ultrafilter lemma the Axiom of Choice is required, in the form of Zorn’s lemma. Before giving such proof, we recall that a family of sets has the finite intersection property if every finite subfamily has a nonempty intersection: every proper filter has the finite intersection property.

Proof of the ultrafilter lemma. Let \mathcal{F} be a proper filter on X and let \mathcal{M} be the family of the collections of subsets of X that extend \mathcal{F} and have the finite intersection property, ordered by inclusion. Let \{U_i\}_{i \in I} be a totally ordered subfamily of \mathcal{M}: then U = \bigcup_{i \in I} U_i extends \mathcal{F} and has the finite intersection property, because for every finitely many A_1, \ldots, A_n \in U there exists by construction i \in I such that A_1, \ldots, A_n \in U_i.

By Zorn’s lemma, \mathcal{M} has a maximal element \mathcal{U}, which surely satisfies \emptyset \not \in \mathcal{U} and \mathcal{F} \subseteq \mathcal{U}. If A \in \mathcal{U} and B \supseteq A, then \mathcal{U} \cup \{B\} still has the finite intersection property, therefore B \in \mathcal{U} by maximality. If A,B \in \mathcal{U} then \mathcal{U} \cup \{A \cap B\} still has the finite intersection property, therefore again A \cap B \in \mathcal{U} by maximality.

Suppose, for the sake of contradiction, that there exists A \subseteq X such that A \not \in \mathcal{U} and X \setminus A \not \in \mathcal{U}: then neither \mathcal{U} \cup \{A\} nor \mathcal{U} \cup \{X \setminus A\} have the finite intersection property, hence there exist A_1, \ldots, A_m, B_1, \ldots, B_n \in \mathcal{U} such that A_1 \cap \ldots \cap A_m \cap A = B_1 \cap \ldots \cap B_n \cap (X \setminus A) = \emptyset. But A_1 \cap \ldots \cap A_m \cap A = \emptyset means A_1 \cap \ldots \cap A_m \subseteq X \setminus A, and B_1 \cap \ldots \cap B_n \cap (X \setminus A) = \emptyset means B_1 \cap \ldots \cap B_n \subseteq A: therefore,

A_1 \cap \ldots \cap A_m \cap B_1 \cap \ldots \cap B_n \subseteq (X \setminus A) \cap A = \emptyset,

against \mathcal{U} having the finite intersection property. \Box

We are now ready to expand the idea of limit. Let (X,d) be a metric space and let \mathcal{U} be an ultrafilter on X: we say that x \in X is the ultralimit of the sequence \{x_n\}_{n \geq 0} \subseteq X along \mathcal{U} if for every \varepsilon > 0 the set

\{ n \geq 0 \mid d(x_n, x) < \varepsilon \}

belongs to \mathcal{U}. (Observe how, in the standard definition of limit, the above set is required to belong to the Fréchet filter.) If this is the case, we write

\lim_{n \to \mathcal{U}} x_n = x

Ultralimits, if they exist, are unique and satisfy our first four conditions. Moreover, the choice of a principal ultrafilter \mathcal{U} = (n_0) corresponds to the trivial definition \lim_{n \to \mathcal{U}} x_n = x_{n_0}. So, what about free ultrafilters?

Theorem 2. Every bounded sequence of real numbers has an ultralimit along every free ultrafilter on \mathbb{N}.

Proof: It is not restrictive to suppose x_n \in [0,1] for every n \geq 0. Let \mathcal{U} be an arbitrary, but fixed, free ultrafilter on \mathbb{N}. We will construct a sequence of closed intervals A_k, k \geq 0, such that A_{k+1} \subseteq A_k and \mathrm{diam} \, A_k = 2^{-k} for every k \geq 0. By the Cantor intersection theorem it will be \bigcap_{k \geq 0} A_k = \{x\}: we will then show that \lim_{n \to \mathcal{U}} x_n = x.

Let A_0 = [0,1]. Let A_1 be either [0,1/2] or [1/2,1], chosen according to the following criterion: \{n \geq 0 \mid x_n \in A_1\} \in \mathcal{U}. If both halves satisfy the criterion, then we just choose one once and for all. We iterate the procedure by always choosing A_{k+1} as one of the two halves of A_k such that \{n \geq 0 \mid x_n \in A_{k+1}\} \in \mathcal{U}.

Let \bigcap_{k \geq 0} A_k = \{x\}. Let \varepsilon > 0, and let k be so large that 2^{-k} < \varepsilon: then A_k \subseteq (x-\varepsilon, x+\varepsilon), thus \{n \geq 0 \mid x_n \in A_k\} \subseteq \{n \geq 0 \mid |x_n-x| < \varepsilon\}. As the smaller set belongs to \mathcal{U}, so does the larger one. \Box

We have thus almost achieved our original target: a notion of limit which applies to every bounded sequence of real numbers. Such notion will depend on the specific free ultrafilter we choose: but it is already very reassuring that such a notion exists at all! To complete our job we need one more check: we have to be sure that the definition is consistent with the classical one. And this is indeed what happens!

Theorem 3. Let \{x_n\}_{n \geq 0} be a sequence of real numbers and let x \in \mathbb{R}. Then \lim_{n \to \infty} x_n = x in the classical sense if and only if \lim_{n \to \mathcal{U}} x_n = x for every free ultrafilter \mathcal{U} on \mathbb{N}.

To prove Theorem 3 we make use of an auxiliary result, which is of interest by itself.

Lemma 2. Let \mathcal{M}(X) be the family of collections of subsets of X that have the finite intersection property. The maximal elements of \mathcal{M} are precisely the ultrafilters.

Proof: Every ultrafilter is clearly maximal in \mathcal{M}. If \mathcal{U} is maximal in \mathcal{M}, then it is clearly proper and upper closed, and we can reason as in the proof of the ultrafilter lemma to show that it is actually an ultrafilter. \Box

Proof of Theorem 3: Suppose x_n does not converge to x in the classical sense. Fix \varepsilon_0 > 0 such that the set S = \{n \geq 0 \mid |x_n-x| \geq \varepsilon_0\} is infinite. Then the family \mathcal{V} = \{S \setminus \{n\} \mid n \geq 0\} has the finite intersection property: an ultrafilter \mathcal{U} that extends \mathcal{V} must be free. Then S \in \mathcal{U}, and x_n does not have an ultralimit along \mathcal{U}.

The converse implication follows from the classical definition of limit, together with the very notion of free ultrafilter. \Box

Theorem 3 does hold for sequences of real numbers, but does not extend to arbitrary metric spaces. In fact, the following holds, which we state without proving.

Theorem 4. Let X be a metric space. The following are equivalent.

  1. For some free ultrafilter \mathcal{U} on \mathbb{N}, every sequence in X has an ultralimit along \mathcal{U}.
  2. For every free ultrafilter \mathcal{U} on \mathbb{N}, every sequence in X has an ultralimit along \mathcal{U}.
  3. X is compact.

Ultrafilters are useful in many other contexts. For instance, they are used to construct hyperreal numbers, which in turn allow a rigorous definition of infinitesimals and the foundation of calculus over those. But this might be the topic for another Theory Lunch talk.

by Silvio Capobianco at August 21, 2014 03:15 PM

Edward Z. Yang

The fundamental problem of programming language package management

Why are there so many goddamn package managers? They sprawl across both operating systems (apt, yum, pacman, Homebrew) as well as for programming languages (Bundler, Cabal, Composer, CPAN, CRAN, CTAN, EasyInstall, Go Get, Maven, npm, NuGet, OPAM, PEAR, pip, RubyGems, etc etc etc). "It is a truth universally acknowledged that a programming language must be in want of a package manager." What is the fatal attraction of package management that makes programming language after programming language jump off this cliff? Why can't we just, you know, reuse an existing package manager?

You can probably think of a few reasons why trying to use apt to manage your Ruby gems would end in tears. "System and language package managers are completely different! Distributions are vetted, but that's completely unreasonable for most libraries tossed up on GitHub. Distributions move too slowly. Every programming language is different. The different communities don't talk to each other. Distributions install packages globally. I want control over what libraries are used." These reasons are all right, but they are missing the essence of the problem.

The fundamental problem is that programming languages package management is decentralized.

This decentralization starts with the central premise of a package manager: that is, to install software and libraries that would otherwise not be locally available. Even with an idealized, centralized distribution curating the packages, there are still two parties involved: the distribution and the programmer who is building applications locally on top of these libraries. In real life, however, the library ecosystem is further fragmented, composed of packages provided by a huge variety of developers. Sure, the packages may all be uploaded and indexed in one place, but that doesn't mean that any given author knows about any other given package. And then there's what the Perl world calls DarkPAN: the uncountable lines of code which probably exist, but which we have no insight into because they are locked away on proprietary servers and source code repositories. Decentralization can only be avoided when you control absolutely all of the lines of code in your application.. but in that case, you hardly need a package manager, do you? (By the way, my industry friends tell me this is basically mandatory for software projects beyond a certain size, like the Windows operating system or the Google Chrome browser.)

Decentralized systems are hard. Really, really hard. Unless you design your package manager accordingly, your developers will fall into dependency hell. Nor is there a one "right" way to solve this problem: I can identify at least three distinct approaches to the problem among the emerging generation of package managers, each of which has their benefits and downsides.

Pinned versions. Perhaps the most popular school of thought is that developers should aggressively pin package versions; this approach advocated by Ruby's Bundler, PHP's Composer, Python's virtualenv and pip, and generally any package manager which describes itself as inspired by the Ruby/node.js communities (e.g. Java's Gradle, Rust's Cargo). Reproduceability of builds is king: these package managers solve the decentralization problem by simply pretending the ecosystem doesn't exist once you have pinned the versions. The primary benefit of this approach is that you are always in control of the code you are running. Of course, the downside of this approach is that you are always in control of the code you are running. An all-to-common occurrence is for dependencies to be pinned, and then forgotten about, even if there are important security updates to the libraries involved. Keeping bundled dependencies up-to-date requires developer cycles--cycles that more often than not are spent on other things (like new features).

A stable distribution. If bundling requires every individual application developer to spend effort keeping dependencies up-to-date and testing if they keep working with their application, we might wonder if there is a way to centralize this effort. This leads to the second school of thought: to centralize the package repository, creating a blessed distribution of packages which are known to play well together, and which will receive bug fixes and security fixes while maintaining backwards compatibility. In programming languages, this is much less common: the two I am aware of are Anaconda for Python and Stackage for Haskell. But if we look closely, this model is exactly the same as the model of most operating system distributions. As a system administrator, I often recommend my users use libraries that are provided by the operating system as much as possible. They won't take backwards incompatible changes until we do a release upgrade, and at the same time you'll still get bugfixes and security updates for your code. (You won't get the new hotness, but that's essentially contradictory with stability!)

Embracing decentralization. Up until now, both of these approaches have thrown out decentralization, requiring a central authority, either the application developer or the distribution manager, for updates. Is this throwing out the baby with the bathwater? The primary downside of centralization is the huge amount of work it takes to maintain a stable distribution or keep an individual application up-to-date. Furthermore, one might not expect the entirety of the universe to be compatible with one another, but this doesn't stop subsets of packages from being useful together. An ideal decentralized ecosystem distributes the problem of identifying what subsets of packages work across everyone participating in the system. Which brings us to the fundamental, unanswered question of programming languages package management:

How can we create a decentralized package ecosystem that works?

Here are a few things that can help:

  1. Stronger encapsulation for dependencies. One of the reasons why dependency hell is so insidious is the dependency of a package is often an inextricable part of its outwards facing API: thus, the choice of a dependency is not a local choice, but rather a global choice which affects the entire application. Of course, if a library uses some library internally, but this choice is entirely an implementation detail, this shouldn't result in any sort of global constraint. Node.js's NPM takes this choice to its logical extreme: by default, it doesn't deduplicate dependencies at all, giving each library its own copy of each of its dependencies. While I'm a little dubious about duplicating everything (it certainly occurs in the Java/Maven ecosystem), I certainly agree that keeping dependency constraints local improves composability.
  2. Advancing semantic versioning. In a decentralized system, it's especially important that library writers give accurate information, so that tools and users can make informed decisions. Wishful, invented version ranges and artistic version number bumps simply exacerbate an already hard problem (as I mentioned in my previous post). If you can enforce semantic versioning, or better yet, ditch semantic versions and record the true, type-level dependency on interfaces, our tools can make better choices. The gold standard of information in a decentralized system is, "Is package A compatible with package B", and this information is often difficult (or impossible, for dynamically typed systems) to calculate.
  3. Centralization as a special-case. The point of a decentralized system is that every participant can make policy choices which are appropriate for them. This includes maintaining their own central authority, or deferring to someone else's central authority: centralization is a special-case. If we suspect users are going to attempt to create their own, operating system style stable distributions, we need to give them the tools to do so... and make them easy to use!

For a long time, the source control management ecosystem was completely focused on centralized systems. Distributed version control systems such as Git fundamentally changed the landscape: although Git may be more difficult to use than Subversion for a non-technical user, the benefits of decentralization are diverse. The Git of package management doesn't exist yet: if someone tells you that package management is solved, just reimplement Bundler, I entreat you: think about decentralization as well!

by Edward Z. Yang at August 21, 2014 01:02 PM

Lee Pike

SmartChecking Matt Might’s Red-Black Trees

Matt Might gave a nice intro to QuickCheck via testing red-black trees recently. Of course, QuickCheck has been around for over a decade now, but it’s still useful (if underused–why aren’t you QuickChecking your programs!?).

In a couple of weeks, I’m presenting a paper on an alternative to QuickCheck called SmartCheck at the Haskell Symposium.

SmartCheck focuses on efficiently shrinking and generalizing large counterexamples. I thought it’d be fun to try some of Matt’s examples with SmartCheck.

The kinds of properties Matt Checked really aren’t in the sweet spot of SmartCheck, since the counterexamples are so small (Matt didn’t even have to define instances for shrink!). SmartCheck focuses on shrinking and generalizing large counterexamples.

Still, let’s see what it looks like. (The code can be found here.)

SmartCheck is only interesting for failed properties, so let’s look at an early example in Matt’s blog post where something goes wrong. A lot of the blog post focuses on generating sufficiently constrained arbitrary red-black trees. In the section entitled, “A property for balanced black depth”, a property is given to check that the path from the root of a tree to every leaf passes through the same number of black nodes. An early generator for trees fails to satisfy the property.

To get the code to work with SmartCheck, we derive Typeable and Generic instances for the datatypes, and use GHC Generics to automatically derive instances for SmartCheck’s typeclass. The only other main issue is that SmartCheck doesn’t support a `forall` function like in QuickCheck. So instead of a call to QuickCheck such as

> quickCheck (forAll nrrTree prop_BlackBalanced)

We change the arbitrary instance to be the nrrTree generator.

Because it is so easy to find a small counterexample, SmartCheck’s reduction algorithm does a little bit of automatic shrinking, but not too much. For example, a typical minimal counterexample returned by SmartCheck looks like

T R E 2 (T B E 5 E)

which is about as small as possible. Now onto generalization!

There are three generalization phases in SmartCheck, but we’ll look at just one, in which a formula is returned that is universally quantified if every test case fails. For the test case above, SmartCheck returns the following formula:

forall values x0 x1:
T R E 2 (T B x1 5 x0)

Intuitively, this means that for any well-typed trees chosen that could replace the variables x0 and x1, the resulting formula is still a counterexample.

The benefit to developers is seeing instantly that those subterms in the counterexample probably don’t matter. The real issue is that E on the left is unbalanced with (T B E 5 E) on the right.

One of the early design decisions in SmartCheck was focus on structurally shrinking data types and essentially ignore “base types” like Int, char, etc. The motivation was to improve efficiency on shrinking large counterexamples.

But for a case like this, generalizing base types would be interesting. We’d hypothetically get something like

forall values (x0, x1 :: RBSet Int) (x2, x3 :: Int):
T R E x2 (T B x1 x3 x0)

further generalizing the counterexample. It may be worth adding this behavior to SmartCheck.

SmartCheck’s generalization begins to bridge the gap from specific counterexamples to formulas characterizing counterexamples. The idea is related to QuickSpec, another cool tool developed by Claessen and Hughes (and SmallBone). Moreover, it’s a bridge between testing and verification, or as Matt puts it, from the 80% to the 20%.

by Lee Pike at August 21, 2014 04:53 AM

FP Complete

IAP: Speeding up conduit

This post contains fragments of active Haskell code, best viewed and executed at

As most of us know, performance isn't a one-dimensional spectrum. There are in fact multiple different ways to judge performance of a program. A commonly recognized tradeoff is that between CPU and memory usage. Often times, a program can be sped up by caching more data, for example.

conduit is a streaming data library. In that sense, it has two very specific performance criterion it aims for:

  • Constant memory usage.
  • Efficient usage of scarce resources, such as closing file descriptors as early as possible.

While CPU performance is always a nice goal, it has never been my top priority in the library's design, especially given that in the main use case for conduit (streaming data in an I/O context), the I/O cost almost always far outweighs any CPU overhead from conduit.

However, for our upcoming Integrated Analysis Platform (IAP) release, this is no longer the case. conduit will be used in tight loops, where we do need to optimize for the lowest CPU overhead possible.

This blog post covers the first set of optimizations I've applied to conduit. There is still more work to be done, and throughout this blogpost I'll be describing some of the upcoming changes I am attempting.

I'll give a brief summary up front:

  • Applying the codensity transform results in much better complexity of monadic bind.
  • We're also less reliant on rewrite rules firing, which has always been unreliable (and now I know why).
  • This change does represent a breaking API change. However, it only affects users of the Data.Conduit.Internal module. If you've just been using the public API, your code will be unaffected, besides getting an automatic speedup.
  • These changes will soon be released as conduit 1.2.0, after a period for community feedback.

Note that this blog post follows the actual steps I went through (more or less) in identifying the performance issues I wanted to solve. If you want to skip ahead to the solution itself, you may want to skip to the discussion on difference lists, or even straight to continuation passing style, church-encoding, codensity.

By the way, after I originally wrote this blog post, I continued working on the optimizations I describe as possible future enhancements. Those are actually working out far better than I expected, and it looks like conduit 1.2.0 will be able to ship with them. I'll be writing a separate blog post detailing those changes. A bit of a teaser is: for vector-equivalent code, conduit now generates identical core as vector itself.

The benchmarks

Before embarking on any kind of serious optimizations, it's important to have some benchmarks. I defined three benchmarks for the work I was going to be doing:

  • A simple sum: adding up the numbers from 1 to 10000. This is to get a baseline of the overhead coming from conduit.

  • A monte carlo analysis: This was based on a previous IAP blog post. I noticed when working on that benchmark that, while the conduit solution was highly memory efficient, there was still room to speed up the benchmark.

  • Sliding vectors: Naren Sundar recently sent a sliding windows pull requests, which allow us to get a view of a fixed size of a stream of values. This feature is very useful for a number of financial analyses, especially regarding time series.

    Naren's pull request was based on immutable data structures, and for those cases it is highly efficient. However, it's possible to be far more memory efficient by writing to a mutable vector instead, and then taking immutable slices of that vector. Mihaly Barasz sent a pull request for this feature, and much to our disappointment, for small window sizes, it performed worse than sliding windows. We want to understand why.

You can see the benchmark code, which stays mostly unchanged for the rest of this blog post (a few new cases are added to demonstrate extra points). The benchmarks always contain a low-level base case representing the optimal performance we can expect from hand-written Haskell (without resorting to any kind of FFI tricks or the like).

You can see the first run results which reflect conduit 1.1.7, plus inlining of a few functions. Some initial analysis:

  • Control.Monad.foldM is surpringly slow.
  • Data.Conduit.List.foldM has a rather steep performance hit versus Data.Conduit.List.fold.
  • There's a very high overhead in the monte carlo analysis.
  • For sliding vector, the conduit overhead is more pronounced at smaller window sizes.
  • But even with large window sizes, mutable vector conduits still have a large overhead. The sliding window/immutable approach, however, shows almost no overhead.

That hopefully sets the scene enough for us to begin to dive in.

Rewrite rules: lift

GHC offers a very powerful optimization technique: rewrite rules. This allows you to tell the compiler that a certain expression can be rewritten to a more efficient one. A common example of a rewrite rule would be to state that map f . map g is the same as map (f . g). This can be expressed as:

{-# RULES "map f . map g" forall f g. map f . map g = map (f . g) #-}

Note that GHC's list rewrite rules are actually more complicated than this, and revolve around a concept called build/foldr fusion.

Let's look at the implementation of the yield function in conduit (with some newtypes stripped away):

yield :: Monad m => o -> ConduitM i o m ()
yield o = HaveOutput (Done ()) (return ()) o
{-# INLINE [1] yield #-}
    "yield o >> p" forall o (p :: ConduitM i o m r).
    yield o >> p = HaveOutput p (return ()) o

The core datatype of conduit is recursive. The HaveOutput constructor contains a field for "what to do next." In the case of yield, there isn't anything to do next, so we fill that with Done (). However, creating that Done () value just to throw it away after a monadic bind is wasteful. So we have a rewrite rule to fuse those two steps together.

But no such rewrite rule exists for lift! My first step was to add such a rule, and check the results. Unfortunately, the rule didn't have any real impact, because it wasn't firing. Let's put that issue to the side; we'll come back to it later.

Cleanup, inlining

One of the nice features introduced in (I believe) GHC 7.8 is that the compiler will now warn you when a rewrite rule may not fire. When compiling conduit, I saw messages like:

Data/Conduit/List.hs:274:11: Warning:
    Rule "source/map fusion $=" may never fire
      because ‘$=’ might inline first
    Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘$=’

Data/Conduit/List.hs:275:11: Warning:
    Rule "source/map fusion =$=" may never fire
      because ‘=$=’ might inline first
    Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘=$=’

Data/Conduit/List.hs:542:11: Warning:
    Rule "source/filter fusion $=" may never fire
      because ‘$=’ might inline first
    Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘$=’

Data/Conduit/List.hs:543:11: Warning:
    Rule "source/filter fusion =$=" may never fire
      because ‘=$=’ might inline first
    Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘=$=’

Data/Conduit/List.hs:552:11: Warning:
    Rule "connect to sinkNull" may never fire
      because ‘$$’ might inline first
    Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘$$’

This demonstrates an important interaction between inlining and rewrite rules. We need to make sure that expressions that need to be rewritten are not inlined first. If they are first inlined, then GHC won't be able to rewrite them to our more optimized version.

A common approach to this is to delay inlining of functions until a later simplification phase. The GHC simplification process runs in multiple steps, and we can state that rules and inlining should only happen before or after a certain phase. The phases count down from 2 to 0, so we commonly want to delay inlining of functions until phase 0, if they may be subject to rewriting.

Conversely, some functions need to be inlined before a rewrite rule can fire. In stream fusion, for example, the fusion framework depends on the following sequencing to get good performance:

map f . map g
-- inline map
unstream . mapS f . stream . unstream . mapS g . stream
-- rewrite stream . unstream
unstream . mapS f . mapS g . stream
-- rewrite mapS . mapS
unstream . mapS (f . g) . stream

In conduit, we need to make sure that all of this is happening in the correct order. There was one particular complexity that made it difficult to ensure this happened. conduit in fact has two core datatypes: Pipe and ConduitM, with the latter being a more friendly newtype wrapper around the first. Up until this point, the code for the two was jumbled into a single internal module, making it difficult to track which things were being written in which version of the API.

My next step was to split things into .Pipe and .Conduit internal modules, and then clean up GHC's warnings to get rules to fire more reliably. This gave a modest performance boost to the sliding vector benchmarks, but not much else. But it does pave the way for future improvements.

Getting serious about sum, by cheating

The results so far have been uninspiring. We've identified a core problem (too many of those Done data constructors being used), and noticed that the rewrite rules that should fix that don't seem to be doing their job. Now let's take our first stab at really improving performance: with aggressive rewrite rules.

Our sum benchmark is really simple: use enumFromTo to create a stream of values, and fold (or foldM) to consume that. The thing that slows us down is that, in between these two simple functions, we end up allocating a bunch of temporary data structures. Let's get rid of them with rewrite rules!

This certainly did the trick. The conduit implementation jumped from 185us to just 8.63us. For comparison, the low level approach (or vector's stream fusion) clocks in at 5.77us, whereas foldl' on a list is 80.6us. This is a huge win!

But it's also misleading. All we've done here is sneakily rewritten our conduit algorithm into a low-level format. This solves the specific problem on the table (connecting enumFromTo with fold), but won't fully generalize to other cases. A more representative demonstration of this improvement is the speedup for foldM, which went from 1180us to 81us. The reason this is more realistic is that the rewrite rule is not specialized to enumFromTo, but rather works on any Source.

I took a big detour at this point, and ended up writing an initial implementation of stream fusion in conduit. Unfortunately, I ran into a dead end on that branch, and had to put that work to the side temporarily. However, the improvements discussed in the rest of this blog post will hopefully reopen the door to stream fusion, which I hope to investigate next.

Monte carlo, and associativity

Now that I'd made the results of the sum benchmark thoroughly useless, I decided to focus on the results of monte carlo, where the low level implementation still won by a considerable margin (3.42ms vs 10.6ms). The question was: why was this happening? To understand, let's start by looking at the code:

analysis = do
    successes <- sourceRandomN count
              $$ CL.fold (\t (x, y) ->
                            if (x*x + y*(y :: Double) < 1)
                                then t + 1
                                else t)
                    (0 :: Int)
    return $ fromIntegral successes / fromIntegral count * 4

sourceRandomN :: (MWC.Variate a, MonadIO m) => Int -> Source m a
sourceRandomN cnt0 = do
    gen <- liftIO MWC.createSystemRandom
    let loop 0 = return ()
        loop cnt = do
            liftIO (MWC.uniform gen) >>= yield >> loop (cnt - 1)
    loop cnt0

The analysis function is not very interesting: it simply connects sourceRandomN with a fold. Given that we now have a well behaved and consistently-firing rewrite rule for connecting to folds, it's safe to say that was not the source of our slowdown. So our slowdown must be coming from:

liftIO (MWC.uniform gen) >>= yield >> loop (cnt - 1)

This should in theory generate really efficient code. yield >> loop (cnt - 1) should be rewritten to \x -> HaveOutput (loop (cnt - 1)) (return ()) x), and then liftIO should get rewritten to generate:

PipeM $ do
    x <- MWC.uniform gen
    return $ HaveOutput (loop $ cnt - 1) (return ()) x

I added another commit to include a few more versions of the monte carlo benchmark (results here). The two most interesting are:

  • Explicit usage of the Pipe constructors:

    sourceRandomNConstr :: (MWC.Variate a, MonadIO m) => Int -> Source m a
    sourceRandomNConstr cnt0 = ConduitM $ PipeM $ do
        gen <- liftIO MWC.createSystemRandom
        let loop 0 = return $ Done ()
            loop cnt = do
                x <- liftIO (MWC.uniform gen)
                return $ HaveOutput (PipeM $ loop (cnt - 1)) (return ()) x
        loop cnt0

    This version ran in 4.84ms, vs the original conduit version which ran in 15.8ms. So this is definitely the problem!

  • Explicitly force right-associated binding order:

    sourceRandomNBind :: (MWC.Variate a, MonadIO m) => Int -> Source m a
    sourceRandomNBind cnt0 = lift (liftIO MWC.createSystemRandom) >>= \gen ->
        let loop 0 = return ()
            loop cnt = do
                lift (liftIO $ MWC.uniform gen) >>= (\o -> yield o >> loop (cnt - 1))
         in loop cnt0

    Or to zoom in on the important bit:

    lift (liftIO $ MWC.uniform gen) >>= (\o -> yield o >> loop (cnt - 1))

    By the monad laws, this code is identical to the original. However, instead of standard left-associativity, we have right associativity or monadic bind. This code ran in 5.19ms, an approximate threefold speedup vs the left associative code!

This issue of associativity was something Roman Cheplyaka told me about back in April, so I wasn't surprised to see it here. Back then, I'd looked into using Codensity together with ConduitM, but didn't get immediate results, and therefore postponed further research until I had more time.

OK, so why exactly does left-associativity hurt us so much? There are two reasons actually:

  • Generally speaking, many monads perform better when they are right associated. This is especially true for free monads, of which conduit is just a special case. Janis Voigtl ̈ander's paper Asymptotic Improvement of Computations over Free Monads and Edward Kmett's blog post series free monads for less do a far better job of explaining the issue than I could.
  • In the case of conduit, left associativity prevented the lift and yield rewrite rules from firing, which introduced extra, unnecessary monadic bind operations. Forcing right associativity allows these rules to fire, avoiding a lot of unnecessary data constructor allocation and analysis.

At this point, it became obvious at this point that the main slowdown I was seeing was driven by this problem. The question is: how should we solve it?

Difference lists

To pave the way for the next step, I want to take a quick detour and talk about something simpler: difference lists. Consider the following code:

(((w ++ x) ++ y) ++ z)

Most experienced Haskellers will cringe upon reading that. The append operation for a list needs to traverse every cons cell in its left value. When we left-associate append operations like this, we will need to traverse every cell in w, then every cell in w ++ x, then every cell in w ++ x ++ y. This is highly inefficient, and would clearly be better done in a right-associated style (sound familiar?).

But forcing programmers to ensure that their code is always right-associated isn't always practical. So instead, we have two common alternatives. The first is: use a better datastructure. In particular, Data.Sequence has far cheaper append operations than lists.

The other approach is to use difference lists. Difference lists are functions instead of actual list values. They are instructions for adding values to the beginning of the list. In order to append, you use normal function composition. And to convert them to a list, you apply the resulting function to an empty list. As an example:

type DList a = [a] -> [a]

dlist1 :: DList Int
dlist1 rest = 1 : 2 : rest

dlist2 :: DList Int
dlist2 rest = 3 : 4 : rest

final :: [Int]
final = dlist1 . dlist2 $ []

main :: IO ()
main = print final

Both difference lists and sequences have advantages. Probably the simplest summary is:

  • Difference lists have smaller constant factors for appending.
  • Sequences allow you to analyze them directly, without having to convert them to a different data type first.

That second point is important. If you need to regularly analyze your list and then continue to append, the performance of a difference list will be abysmal. You will constantly be swapping representations, and converting from a list to a difference list is an O(n) operation. But if you will simply be constructing a list once without any analysis, odds are difference lists will be faster.

This situation is almost identical to our problems with conduit. Our monadic composition operator- like list's append operator- needs to traverse the entire left hand side. This connection is more clearly spelled out in Reflection without Remorse by Atze van der Ploeg and Oleg Kiselyov (and for me, care of Roman).

Alright, with that out of the way, let's finally fix conduit!

Continuation passing style, church-encoding, codensity

There are essentially two things we need to do with conduits:

  • Monadically compose them to sequence two streams into a larger stream.
  • Categorically compose them to connect one stream to the next in a pipeline.

The latter requires that we be able to case analyze our datatypes, while theoretically the former does not: something like difference lists for simple appending would be ideal. In the past, I've tried out a number of different alternative implementations of conduit, none of which worked well enough. The problem I always ran into was that either monadic bind became too expensive, or categorical composition became too expensive.

Roman, Mihaly, Edward and I discussed these issues a bit on Github, and based on Roman's advice, I went ahead with writing a benchmark of different conduit implementations. I currently have four implementations in this benchmark (and hope to add more):

  • Standard, which looks very much like conduit 1.1, just a bit simplified (no rewrite rules, no finalizers, no leftovers).
  • Free, which is conduit rewritten to explicitly use the free monad transformer.
  • Church, which modifies Free to instead use the Church-encoded free monad transformer.
  • Codensity, which is a Codensity-transform-inspired version of conduit.

You can see the benchmark results, which clearly show the codensity version to be the winner. Though it would be interesting, I think I'll avoid going into depth on the other three implementations for now (this blog post is long enough already).

What is Codensity?

Implementing Codensity in conduit just means changing the ConduitM newtype wrapper to look like this:

newtype ConduitM i o m r = ConduitM
    { unConduitM :: forall b.
                    (r -> Pipe i i o () m b) -> Pipe i i o () m b

What this says is "I'm going to provide an r value. If you give me a function that needs an r value, I'll give it that r value and then continue with the resulting Pipe." Notice how similar this looks to the type signature of monadic bind itself:

(>>=) :: Pipe i i o () m r
      -> (r -> Pipe i i o () m b)
      -> Pipe i i o () m b

This isn't by chance, it's by construction. More information is available in the Haddocks of kan-extension, or in the above-linked paper and blog posts by Janis and Edward. To see why this change is important, let's look at the new implementations of some of the core conduit functions and type classes:

yield o = ConduitM $ \rest -> HaveOutput (rest ()) (return ()) o

await = ConduitM $ \f -> NeedInput (f . Just) (const $ f Nothing)

instance Monad (ConduitM i o m) where
    return x = ConduitM ($ x)
    ConduitM f >>= g = ConduitM $ \h -> f $ \a -> unConduitM (g a) h

instance MonadTrans (ConduitM i o) where
    lift mr = ConduitM $ \rest -> PipeM (liftM rest mr)

Instead of having explicit Done constructors in yield, await, and lift, we use the continuation rest. This is the exact same transformation we were previously relying on rewrite rules to provide. However, our rewrite rules couldn't fire properly in a left-associated monadic binding. Now we've avoided the whole problem!

Our Monad instance also became much smaller. Notice that in order to monadically compose, there is no longer any need to case-analyze the left hand side, which avoids the high penalty of left association.

Another interesting quirk is that our Monad instance on ConduitM no longer requires that the base m type constructor itself be a Monad. This is nice feature of Codensity.

So that's half the story. What about categorical composition? That certainly does require analyzing both the left and right hand structures. So don't we lose all of our speed gains of Codensity with this? Actually, I think not. Let's look at the code for categorical composition:

ConduitM left0 =$= ConduitM right0 = ConduitM $ \rest ->
    let goRight final left right =
            case right of
                HaveOutput p c o  -> HaveOutput (recurse p) (c >> final) o
                NeedInput rp rc   -> goLeft rp rc final left
                Done r2           -> PipeM (final >> return (rest r2))
                PipeM mp          -> PipeM (liftM recurse mp)
                Leftover right' i -> goRight final (HaveOutput left final i) right'
            recurse = goRight final left

        goLeft rp rc final left =
            case left of
                HaveOutput left' final' o -> goRight final' left' (rp o)
                NeedInput left' lc        -> NeedInput (recurse . left') (recurse . lc)
                Done r1                   -> goRight (return ()) (Done r1) (rc r1)
                PipeM mp                  -> PipeM (liftM recurse mp)
                Leftover left' i          -> Leftover (recurse left') i
            recurse = goLeft rp rc final
     in goRight (return ()) (left0 Done) (right0 Done)

In the last line, we apply left0 and right0 to Done, which is how we convert our Codensity version into something we can actually analyze. (This is equivalent to applying a difference list to an empty list.) We then traverse these values in the same way that we did in conduit 1.1 and earlier.

The important difference is how we ultimately finish. The code in question is the Done clause of the goRight's case analysis, namely:

Done r2           -> PipeM (final >> return (rest r2))

Notice the usage of rest, instead of what we would have previously done: used the Done constructor. By doing this, we're immediately recreating a Codensity version of our resulting Pipe, which allows us to only traverse our incoming Pipe values once each, and not need to retraverse the outgoing Pipe for future monadic binding.

This trick doesn't just work for composition. There are a large number of functions in conduit that need to analyze a Pipe, such as addCleanup and catchC. All of them are now implemented in this same style.

After implementing this change, the resulting benchmarks look much better. The naive implementation of monte carlo is now quite close to the low-level version (5.28ms vs 3.44ms, as opposed to the original 15ms). Sliding vector is also much better: the unboxed, 1000-size window benchmark went from 7.96ms to 4.05ms, vs a low-level implementation at 1.87ms.

Type-indexed sequences

One approach that I haven't tried yet is the type-indexed sequence approach from Reflection without Remorse. I still intend to add it to my conduit benchmark, but I'm not optimistic about it beating out Codensity. My guess is that a sequence data type will have a higher constant factor overhead, and based on the way composition is implemented in conduit, we won't get any benefit from avoiding the need to transition between two representations.

Edward said he's hoping to get an implementation of such a data structure into the free package, at which point I'll update my benchmark to see how it performs.

To pursue next: streamProducer, streamConsumer, and more

While this round of benchmarking produced some very nice results, we're clearly not yet at the same level as low-level code. My goal is to focus on that next. I have some experiments going already relating to getting conduit to expose stream fusion rules. In simple cases, I've generated a conduit-compatible API with the same performance as vector.

The sticking point is getting something which is efficient not just for functions explicitly written in stream style, but also provides decent performance when composed with the await/yield approach. While the latter approach will almost certainly be slower than stream fusion, I'm hoping we can get it to degrade to current-conduit performance levels, and allow stream fusion to provide a significant speedup when categorically composing two Conduits written in that style.

The code discussed in this post is now available on the next-cps branch of conduit. conduit-extra, conduit-combinators, and a number of other packages either compile out-of-the-box with these changes, or require minor tweaks (already implemented), so I'm hoping that this API change does not affect too many people.

As I mentioned initially, I'd like to have some time for community discussion on this before I make this next release.

August 21, 2014 12:00 AM

August 20, 2014

Neil Mitchell

Continuations and Exceptions

Summary: In moving Shake to continuations, exceptions were the biggest headache. I figured out how to somewhat integrate continuations and exception handling.

The git repo for Shake now suspends inactive computations by capturing their continuation instead of blocking their thread, based on the continuations I described in a previous blog post. The most difficult part was managing exceptions. I needed to define a monad where I could capture continuations and work with exceptions, requiring the definitions:

data M a = ... deriving (Functor, Applicative, Monad, MonadIO)
throwM :: SomeException -> M a
catchM :: M a -> (SomeException -> M a) -> M a
captureM :: ((a -> IO ()) -> IO ()) -> M a

I'm using M as the name of the monad. I want equivalents of throwIO and catch for M, along with a function to capture continuations.

The first observation is that since catchM must catch any exceptions, including those thrown by users calling error, then throwM can be defined as:

throwM = liftIO . throwIO

Using throwIO gives better guarantees about when the exception is raised, compared to just throw.

The second observation is that sometimes I want to raise an exception on the continuation, rather than passing back a value. I can build that on top of captureM with:

captureM' :: ((Either SomeException a -> IO ()) -> IO ()) -> M a
captureM' k = either throwM return =<< captureM k

The third observation (which I observed after a few weeks trying not to follow it) is that the continuation may never be called, and that means you cannot implement a robust finallyM function. In particular, if the person who was intending to run the continuation themselves raises an exception, the continuation is likely to be lost. I originally tried to come up with schemes for defining the function passed the continuation to guarantee the continuation was called, but it became messy very quickly.

The properties we expect of the implementation, to a rough approximation, include:

  • catchM (x >> throwM e) (\_ -> y) >> z === x >> y >> z -- if you throw an exception inside a catchM, you must run the handler.
  • captureM (\k -> x) >>= y === x -- if you execute something not using the continuation inside captureM it must behave like it does outside captureM. In particular, if the captureM is inside a catchM, that catchM must not catch the exception.
  • captureM (\k -> k x) >>= y === x >>= y -- if you capture the continuation then continue that must be equivalent to not capturing the continuation.
  • captureM (\k -> k x >> k x) >>= y === (x >>= y) >> (x >>= y) -- if you run the continuation twice it must do the same IO actions each time. In particular, if the first gets its exceptions caught, the second must do also.

These properties are incomplete (there are other things you expect), and fuzzy (for example, the second property isn't type correct) - but hopefully they give an intuition.

The implementation was non-trivial and (sadly) non-elegant. I suspect a better implementation is known in the literature, and I'd welcome a pointer. My implementation defines M as:

type M a = ContT () (ReaderT (IORef (SomeException -> IO ())) IO) a

Here we have a continuation monad wrapping a reader monad. The reader contains an IORef which stores the exception handler. The basic idea is that whenever we start running anything in M we call the Haskell catch function, and the exception handler forwards to the IORef. We can define catchM as:

catchM :: M a -> (SomeException -> M a) -> M a
catchM m hdl = ContT $ \k -> ReaderT $ \s -> do
old <- liftIO $ readIORef s
writeIORef s $ \e -> do
s <- newIORef old
hdl e `runContT` k `runReaderT` s `catch`
\e -> ($ e) =<< readIORef s
flip runReaderT s $ m `runContT` \v -> do
s <- ask
liftIO $ writeIORef s old
k v
  • We store the previous exception handler as old, and insert a new one. After the code has finished (we have left the catchM block) we restore the old exception handler. In effect, we have a stack of exception handlers.
  • When running the handler we pop off the current exception handler by restoring old, then since we have already used up our catch, we add a new catch to catch exceptions in the handler.

We then define captureM as:

captureM :: ((a -> IO ()) -> IO ()) -> M a
captureM f = ContT $ \k -> ReaderT $ \s -> do
old <- readIORef s
writeIORef s throwIO
f $ \x -> do
s <- newIORef old
flip runReaderT s (k x) `E.catch`
\e -> ($ e) =<< readIORef s
writeIORef s throwIO
  • We make sure to switch the IORef back to throwIO before we start running the users code, and after we have finished running our code and switch back to user code. As a result, if the function that captures the continuation throws an exception, it will be raised as normal.
  • When running the continuation we create a new IORef for the handler, since the continuation might be called twice in parallel, and the separate IORef ensures they don't conflict with each other.

Finally, we need a way to run the computation. I've called that runM:

runM :: M a -> (Either SomeException a -> IO ()) -> IO ()
runM m k = do
let mm = do
captureM $ \k -> k ()
catchM (Right <$> m) (return . Left)
s <- newIORef throwIO
mm `runContT` (liftIO . k) `runReaderT` s

The signature of runM ends up being the only signature the makes sense given the underlying mechanisms. We define mm by using the facilities of captureM to insert a catch and catchM to ensure we never end up in an exception state from runM. The rest is just matching up the types.

Stack depth could potentially become a problem with this solution. If you regularly do:

captureM (\k -> k ())

Then each time a catch will be wrapped around the function. You can avoid that by changing captureM to throw an exception:

captureM :: ((a -> IO ()) -> IO ()) -> M a
captureM f = ContT $ \k -> ReaderT $ \s -> do
old <- readIORef s
writeIORef s $ \_ ->
f $ \x -> do
s <- newIORef old
flip runReaderT s (k x) `E.catch`
\e -> ($ e) =<< readIORef s
throwIO anyException

Here we unwind the catch by doing a throwIO, after installing our exception handler which actually passes the continuation. It is a bit ugly, and I haven't checked if either the catch is a problem, or that this solution solves it.

The implementation in Shake is a bit different to that described above. In Shake I know that captured continuations are never called more than once, so I
can avoid creating a new IORef in captureM, and I can reuse the existing one. Since I never change the handler, I can use a slightly less powerful definition of M:

type M a = ReaderT (IORef (SomeException -> IO ())) (ContT () IO) a

The resulting code is Development.Shake.Monad, which implements the RAW monad, and also does a few extra things which are irrelevant to this post.

The cool thing about Haskell is that I've been able to completely replace the underlying Shake Action monad from StateT/IO, to ReaderT/IO, to ReaderT/ContT/IO, without ever breaking any users of Shake. Haskell allows me to produce effective and flexible abstractions.

by Neil Mitchell ( at August 20, 2014 05:39 PM

August 19, 2014

Brent Yorgey

Maniac week postmortem

My maniac week was a great success! First things first: here’s a time-lapse video1 (I recommend watching it at the full size, 1280×720).

<iframe class="youtube-player" frameborder="0" height="315" src=";rel=1&amp;fs=1&amp;showsearch=0&amp;showinfo=1&amp;iv_load_policy=1&amp;wmode=transparent" type="text/html" width="560"></iframe>

Some statistics2:

  • Total hours of productive work: 55.5 (74 pings)
  • Average hours of work per day3: 11
  • Average hours of sleep per night: 7.8 (52 pings over 5 nights)4
  • Total hours not working or sleeping: 27.25 (37 pings)
  • Average hours not working per day: 5.5
  • Pages of dissertation written: 24 (157 to 181)

[I was planning to also make a visualization of my TagTime data showing when I was sleeping, working, or not-working, but putting together the video and this blog post has taken long enough already! Perhaps I’ll get around to it later.]

Overall, I would call the experiment a huge success—although as you can see, I was a full 2.5 hours per day off my target of 13.5 hours of productive work each day. What with eating, showering, making lunch, getting dinner, taking breaks (both intentional breaks as well as slacking off), and a few miscellaneous things I had to take care of like taking the car to get the tire pressure adjusted… it all adds up surprisingly fast. I think this was one of the biggest revelations for me; going into it I thought 3 hours of not-work per day was extremely generous. I now think three hours of not-work per day is probably within reach for me but would be extremely difficult, and would probably require things like planning out meals ahead of time. In any case, 55 hours of actual, focused work is still fantastic.

Some random observations/thoughts:

  • Having multiple projects to work on was really valuable; when I got tired of working on one thing I could often just switch to something else instead of taking an actual break. I can imagine this might be different if I were working on a big coding project (as most of the other maniac weeks have been). The big project would itself provide multiple different subtasks to work on, but more importantly, coding provides immediate feedback that is really addictive. Code a new feature, and you can actually run the new code! And it does something cool! That it didn’t do before! In contrast, when I write another page of my dissertation I just have… another page of my dissertation. I am, in fact, relatively excited about my dissertation, but it can’t provide that same sort of immediate reinforcing feedback, and it was difficult to keep going at times.

  • I found that having music playing really helped me get into a state of “flow”. The first few days I would play some album and then it would stop and I wouldn’t think to put on more. Later in the week I would just queue up many hours of music at a time and that worked great.

  • I was definitely feeling worn out by the end of the week—the last two days in particular, it felt a lot harder to get into a flow. I think I felt so good the first few days that I became overconfident—which is good to keep in mind if I do this again. The evening of 12 August was particularly bad; I just couldn’t focus. It might have been better in the long run to just go home and read a book or something; I’m just not sure how to tell in the moment when I should push through and when it’s better to cut my losses.

  • Blocking Facebook, turning off email notifications, etc. was really helpful. I did end up allowing myself to check email using my phone (I edited the rules a few hours before I started) and I think it was a good idea—I ended up still needing to communicate with some people, so it was very convenient and not too distracting.

  • Note there are two places on Tuesday afternoon where you can see the clock jump ahead by an hour or so; of course those are times when I turned off the recording. One corresponded to a time when I needed to read and write some sensitive emails; during the other, I was putting student pictures into an anki deck, and turned off the recording to avoid running afoul of FERPA.

That’s all I can think of for now; questions or comments, of course, are welcome.

  1. Some technical notes (don’t try this at home; see for some recommendations on making your own timelapse). To record and create the video I used a homegrown concoction of scrot, streamer, ImageMagick, ffmpeg, with some zsh and Haskell scripts to tie it all together, and using diagrams to generate the clock and tag displays. I took about 3GB worth of raw screenshots, and it takes probably about a half hour to process all of it into a video.

  2. These statistics are according to TagTime, i.e. gathered via random sampling, so there is a bit of inherent uncertainty. I leave it as an exercise for the reader to calculate the proper error bars on these times (given that I use a standard ping interval of 45 minutes).

  3. Computed as 74/(171 – 9) pings multiplied by 24 hours; 9 pings occurred on Sunday morning which I did not count as part of the maniac week.

  4. This is somewhat inflated by Saturday night/Sunday morning, when I both slept in and got a higher-than-average number of pings; the average excluding that night is 6.75 hours, which sounds about right.

by Brent at August 19, 2014 06:18 PM

August 18, 2014

Functional Programming Group at the University of Kansas

Bluetooth on Haskell

I'm presenting a very early draft of my bluetooth library. As its name suggests, bluetooth is a Haskell frontend to low-level Bluetooth APIs, making it similar in spirit to Python's PyBluez and Java's BlueCove.

What it can do

Currently, bluetooth only supports Linux. It has the capability of running an RFCOMM server and client. Theoretically, it should also support L2CAP servers and clients, although this has not been tested yet.

What it will eventually do

I plan to have bluetooth support each of the GHC Tier 1 platforms—that is, Windows, OS X, Linux, and FreeBSD. I want to have the capability to run the full gamut of L2CAP and RFCOMM-related functions on each OS, as well as any additional OS-specific functionality.


Bluetooth programming on Haskell is currently in a very primitive state. As of now, there are only two packages on Hackage that make any mention of Bluetooth (as far as I can tell):

  1. network hints in its Network.Socket module that there is an AF_BLUETOOTH socket address family. However, trying to use it with network will fail, since there is no corresponding Bluetooth SockAddr.
  2. simple-bluetooth by Stephen Blackheath offers some of what my own bluetooth package offers (namely, RFCOMM client capability on Windows and Linux).
However, there is currently no comprehensive, cross-platform Haskell Bluetooth library à la PyBluez or BlueCove. I want bluetooth to fill that niche.

How bluetooth works

bluetooth can be thought of as a wrapper around network. After all, Bluetooth programming is socket-based, so Network.Socket already provides most of what one needs to implement a Bluetooth server or client. There are several gotchas with Bluetooth programming, however:
  • Every major OS has a completely different Bluetooth software stack. For example, Linux uses BlueZ, and Windows has several different stacks, including Winsock and Widcomm. Therefore, bluetooth is not likely to work identically on every OS.
  • Windows in particular is challenging to support since several Winsock functions do not work correctly on the version of MinGW-w64 that is currently shipped with GHC for Windows (only the 64-bit version, no less). For this reason, I probably won't develop a Windows version of bluetooth until this issue is resolved.
It is recommended that you have a basic understanding of Bluetooth programming before attempting to use bluetooth. I recommend this introduction by Albert Huang.


The following are abridged examples of the RFCOMM client and server examples from the bluetooth repo.

RFCOMM server

module Main where
import Data.Set

import Network.Bluetooth
import Network.Socket

main :: IO ()
main = withSocketsDo $ do
let uuid = serviceClassToUUID SerialPort
proto = RFCOMM
settings = defaultSDPInfo {
sdpServiceName = Just "Roto-Rooter Data Router"
, sdpProviderName = Just "Roto-Rooter"
, sdpDescription = Just "An experimental plumbing router"
, sdpServiceClasses = singleton SerialPort
, sdpProfiles = singleton SerialPort

handshakeSock <- bluetoothSocket proto
btPort <- bluetoothBindAnyPort handshakeSock anyAddr
bluetoothListen handshakeSock 1
service <- registerSDPService uuid settings proto btPort
(connSock, connAddr) <- bluetoothAccept handshakeSock
putStrLn $ "Established connection with address " ++ show connAddr

message <- recv connSock 4096
putStrLn $ "Received message! [" ++ message ++ "]"
let response = reverse message
respBytes <- send connSock response
putStrLn $ "Sent response! " ++ show respBytes ++ " bytes."

close connSock
close handshakeSock
closeSDPService service

RFCOMM client

module Main where

import Network.Bluetooth
import Network.Socket

import System.IO

main :: IO ()
main = withSocketsDo $ do
let addr = read "12:34:56:78:90:00"
port = 1

sock <- bluetoothSocket RFCOMM
bluetoothConnect sock addr port
putStrLn "Established a connection!"

putStr "Please enter a message to send: "
hFlush stdout
message <- getLine

messBytes <- send sock message
response <- recv sock 4096
putStrLn $ "Received reponse! [" ++ reponse ++ "]"

close sock
<script type="text/javascript"> SyntaxHighlighter.highlight(); </script>

by Ryan Scott ( at August 18, 2014 09:04 PM

JP Moresmau

Fame at last!

I was reading the book "Haskell Data Analysis Cookbook" when suddenly, my name pops up! Funny to see a link to a 7 year old blog entry, who knew I would go down in history for a few line of codes for a perceptron? It's deep in Chapter 7, for those interested. Maybe this is a sign that I should abandon everything else and spend my time on AI, since it's obviously where fame and riches abound! Right...

by JP Moresmau ( at August 18, 2014 03:10 PM

Tom Schrijvers

Algebraic Effect Handlers, Twice

I have two new papers on algebraic effect handlers:

  • Effect Handlers in Scope
    Nicolas Wu, Tom Schrijvers, Ralf Hinze.
    To appear at the Haskell Symposium 2014.

    Algebraic effect handlers are a powerful means for describing effectful computations. They provide a lightweight and orthogonal technique to define and compose the syntax and semantics of different effects. The semantics is captured by handlers, which are functions that transform syntax trees.
    Unfortunately, the approach does not support syntax for scoping constructs, which arise in a number of scenarios. While handlers can be used to provide a limited form of scope, we demonstrate that this approach constrains the possible interactions of effects and rules out some desired semantics.
    This paper presents two different ways to capture scoped constructs in syntax, and shows how to achieve different semantics by reordering handlers. The first approach expresses scopes using the existing algebraic handlers framework, but has some limitations. The problem is fully solved in the second approach where we introduce higher-order syntax.
  • Heuristics Entwined with Handlers Combined
    Tom Schrijvers, Nicolas Wu, Benoit Desouter, Bart Demoen.
    To appear at the PPDP Symposium 2014.

    A long-standing problem in logic programming is how to cleanly separate logic and control. While solutions exist, they fall short in one of two ways: some are too intrusive, because they require significant changes to Prolog’s underlying implementation; others are lacking a clean semantic grounding. We resolve both of these issues in this paper.
    We derive a solution that is both lightweight and principled. We do so by starting from a functional specification of Prolog based on monads, and extend this with the effect handlers approach to capture the dynamic search tree as syntax. Effect handlers then express heuristics in terms of tree transformations. Moreover, we can declaratively express many heuristics as trees themselves that are combined with search problems using a generic entwining handler. Our solution is not restricted to a functional model: we show how to implement this technique as a library in Prolog by means of delimited continuations.

by Tom Schrijvers ( at August 18, 2014 08:05 AM

August 17, 2014

Don Stewart (dons)

Haskell development job at Standard Chartered

The Strats team at Standard Chartered is hiring expert typed FP developers for Haskell dev roles in London.

This is a “front office” finance role – meaning you will work directly with traders building software to automate and improve their efficiency. The role is very development focused, and you will use Haskell for almost all tasks: data analysis, DSLs, market data publishing, databases, web services, desktop GUIs, large parallel tasks, quantitative models, solvers, everything. There may be a small amount of C++ or F# on occasion. This is a fast paced role – code you write today will be deployed within hours to hundreds of users and has to work.

You will join an expert team in London, and significant, demonstrable experience in typed FP (Haskell, OCaml, F# etc) is strongly preferred. We have more than 2 million lines of Haskell, and our own compiler. In this context we look for skill and taste in typed API design to capture and abstract over complex, messy systems.

Experience writing typed APIs to external systems such as databases, web services, pub/sub platforms is very desirable. We like working code, so if you have Hackage or github libraries, we want to see them. We also like StackOverflow answers, blog posts, academic papers, or other arenas where you can show broad FP ability.

The role requires physical presence on the trading floor in London. Remote work isn’t an option. Ideally you have some project and client management skills — you will talk to users, understand their problems, and then implement and deliver what they really need. No financial background is required.

More info about our development process is in the 2012 PADL keynote, and a 2013 HaskellCast interview.

If this sounds exciting to you, please send CVs to me – dons00 at

Tagged: community, jobs, london

by Don Stewart at August 17, 2014 01:07 PM

August 16, 2014

Noam Lewis

xml-to-json – new version released, constant memory usage

I’ve released a new version (1.0.0) of xml-to-json, which aims to solve memory issues encountered when converting large XML files. The new version includes two executables: the regular (aka “classic”) version, xml-to-json, which includes the various features, and the newly added executable xml-to-json-fast, which runs with constant memory usage and can process files of arbitrary size. It does this by not validating the input xml, and by basically streaming json output as it encounters xml elements (tags) in the input. The implementation takes advantage of the cool tagsoup library for processing XML.

Check the for more details. Hackage is updated.

Tagged: Haskell

by sinelaw at August 16, 2014 07:27 PM

wren gayle romano

Citation, Recitation, and Resuscitation

Citation is a necessary practice for any sort of intellectual engagement, whether formal or colloquial, and whether academic or activistic. It is crucial to give credit to the originators of ideas— for ethical honesty: to acknowledge those who've enlightened you; for professional honesty: to make clear where your contributions begin; and for intellectual honesty: to allow others to read the sources for themselves and to follow up on other extensions and criticisms of that work.

When encountering a new idea or text, I often engage in a practice I call "encitation". In order to more thoroughly understand and ingrain a text's intellectual content, I try (temporarily) to view all other ideas and arguments through its lens. This is why when I was reading Whipping Girl I was citing it left and right, just as when I was reading Killing Rage I quoted it incessantly. To understand structuralism, I embraced the structuralist theory and viewed all things in structuralist terms; to understand functionalism, or Marxism, or Freudianism, or performativity, I did the same. Of course, every framework is incomplete and emphasizes certain things to the exclusion of observing others; so viewing the world entirely from within any single framework distorts your perception of reality. The point of the exercise is not to embrace the framework per se, it's to roleplay the embracing of it. The point of this roleplay is to come to understand the emphases and limitations of the framework— not abstractly but specifically. This is especially important for trying to understand frameworks you disagree with. When we disagree with things, the instinct is to discount everything they say. But it's intellectually dishonest to refuse to understand why you disagree. And it's counterproductive, since you cannot debunk the theory nor convince people to change their minds without knowing and addressing where they're coming from.

I engage in encitation not only for anthropological or philosophical ideas, I also do it for mathematical ideas. By trying to view all of mathematics through a particular idea or framework, you come to understand both what it's good at and what it cannot handle. That's one of the things I really love about the way Jason Eisner teaches NLP and declarative methods. While it's brutal to give people a framework (like PCFGs or SAT solving) and then ask them to solve a problem just barely outside of what that framework can handle, it gives you a deep understanding of exactly where and why the framework fails. This is the sort of knowledge you usually have to go out into industry and beat your head against for a while before you see it. But certain fields, like anthropology and writing, do try to teach encitation as a practice for improving oneself. I wonder how much of Jason's technique comes from his background in psychology. Regardless, this practice is one which should, imo, be used (and taught explicitly) more often in mathematics and computer science. A lot of the arguing over OO vs FP would go away if people did this. Instead, we only teach people hybridized approaches, and they fail to internalize the core philosophical goals of notions like objects, functions, types, and so on. These philosophical goals can be at odds, and even irreconcilable, but that does not make one or the other "wrong". The problem with teaching only hybridized approaches is that this irreconcilability means necessarily compromising on the full philosophical commitment to these goals. Without understanding the full philosophical goals of these different approaches, we cannot accurately discuss why sometimes one philosophy is more expedient or practical than another, and yet why that philosophy is not universally superior to others.

The thing to watch out for, whether engaging in the roleplay of encitation or giving citations for actual work, is when you start reciting quotes and texts like catechisms. Once things become a reflexive response, that's a sign that you are no longer thinking. Mantras may be good for meditation, but they are not good critical praxis. This is, no doubt, what Aoife is referring to when she castigates playing Serano says. This is also why it's so dangerous to engage with standardized narratives. The more people engage in recitations of The Narrative, the more it becomes conventionalized and stripped of whatever humanity it may once have had. Moreover, reiterating The Narrative to everyone you meet is the surest way to drive off anyone who doesn't believe in that narrative, or who believes the content but disagrees with the message. Even if I was "born this way", saying so doesn't make it any more true or any more acceptable to those who who would like Jesus to save me from myself. More to the point, saying so places undue emphasis on one very tiny aspect of the whole. I'd much rather convince people of the violent nature of gender enculturation, and get them to recognize the psychological damage that abuse causes, than get them to believe that transgender has a natal origin.

As time goes on, we ask different questions. Consequently, we end up discarding old theories and embracing new ones when the old theory cannot handle our new questions. In our tireless pursuit of the "truth", educators are often reticent to teach defunct theories because we "know" they are "wrong". The new theory is "superior" in being able to address our new questions, but we often lose track of the crucial insights of the old theory along the way. For this reason, it's often important to revive old theories in order to re-highlight those insights and to refocus on old questions which may have become relevant once more. In a way, this revitalization is similar to encitation: the goal is not to say that the old theory is "right", the goal is to understand what the theory is saying and why it's important to say those things.

But again, one must be careful. When new theories arise, practitioners of the immediately-old theory often try to derail the asking of new questions by overemphasizing the questions which gave rise to the preceding theory. This attempt to keep moribund theories on life support often fuels generational divides: the new theoreticians cannot admit to any positives of the old theory lest they undermine their own work, while the old theoreticians feel like they must defend their work against the unrelenting tide lest it be lost forever. I think this is part of why radfems have been spewing such vitriol lately. The theoretical framework of radical feminism has always excluded and marginalized trans women, sex workers, and countless others; but the framework does not justify doxxing, stalking, and harassing those women who dare refute the tenets of The Doctrine. This reactionary violence bears a striking resemblance to the violence of religious fundamentalists1. And as with the religious fundamentalists, I think the reactionary violence of radfems stems from living in a world they can no longer relate to or make sense of.

Major changes in mathematics often result in similar conflicts, though they are seldom so violent. The embracing/rejection of constructivism as a successor to classical mathematics. The embracing/rejection of category theory as an alternative to ZFC set theory. Both of these are radical changes to the philosophical foundations of mathematical thought, and both of these are highly politicized, with advocates on both sides who refuse to hear what the other side is saying. Bob Harper's ranting and railing against Haskell and lazy evaluation is much the same. Yes, having simple cost models and allowing benign side effects is important; but so is having simple semantic models and referential transparency. From where we stand now, those philosophical goals seem to be at odds. But before we can make any progress on reconciling them, we must be willing to embrace both positions long enough to understand their crucial insights and to objectively recognize where and how both fail.

[1] To be clear: I do not draw this analogy as a way of insulting radfems; only to try and make sense of their behavior. There are many religious people (even among those who follow literalist interpretations of their religious texts) who are not terrorists; so too, there are women who believe in the radfem ideology and don't support the behavior of TERFs, SWERFs, etc. It is important to recognize both halves of each community in order to make sense of either side's reactions; and it's important to try to understand the mechanism that leads to these sorts of splits. But exploring this analogy any further is off-topic for this post. Perhaps another time.

comment count unavailable comments

August 16, 2014 12:26 AM

August 15, 2014

Ken T Takusagawa

[leuzkdqp] Units

Some notes on dimensional quantities and type systems:

Addition, subtraction, assignment, and comparison should fail if the units are incompatible.

Multiplication, division, and exponentiation by a rational dimensionless power always work.  These operations assume commutativity.

Distinguishing addition from multiplication vaguely reminds me of the difference between floating point and fixed point.

Unit conversion: a quantity can be read in one set of units then shown in another set.  Abstractly it does not exist as a real number in either.

Converting between different families of units requires exact linear algebra on rational numbers.

In some functions, units pass through just fine.  Others, e.g., trigonometric, require dimensionless numbers.

Not all dimensionless numbers are the same unit: adding an angle to the fine structure constant seems as meaningless as adding a foot to a volt.  But multiplying them could be reasonable.

One can take any compound type with a dimensionless internal type and turn it into a new compound type with that internal type having units.  But should this be considered a "new" type?  Of course, this is useless unless the internal type defined arithmetic operations: "True" miles per hour seems meaningless.

Creating such compound types is analogous to the "function" idea above by viewing a compound type as a data constructor function of a base type.  Constructors do not do operations which can fail, like addition, so the function always succeeds.

Creating a list populated by successive time derivatives of position seems like a useful thing to be able do.  But every element of the list will have different dimensions, which violates the naive idea of a list being items all of the same type.

We would like to catch all dimensionality errors at compile time, but this may not be possible.  The extreme example would be implementing the "units" program.  Is that an anomaly?

It is OK to add a vector to a coordinate (which has an origin) but not a coordinate to a coordinate.  There seems to be a concept of units and "delta" units.

It is OK to subtract coordinates to get a delta.

Maybe multiplying coordinates is also illegal.

Coordinates versus vectors, units versus delta units, seems like an orthogonal problem to "regular" units.  Separate them in software so one can use either concept independently, for example, distinguishing dimensionless from delta dimensionless.

Go further than just "delta" to distinguish first, second, etc., differences.

An X component and Y component of a vector might have the same units, say, length, but one wants to avoid adding them, as this is typically a typo.  But sometimes, for rotations, one does add them.

A Haskell Wiki page: The units package seems promising.

by Ken ( at August 15, 2014 06:39 PM

August 14, 2014

FP Complete

Announcing Stackage Server

A New Service

A couple months ago I made a post explaining Stackage server, its motivations and use-cases, and that it would be available in the coming months. It's now officially available in beta!

Stackage server.

As a quick recap: the essence of Stackage is that rather than publishing at the granularity of packages, like Hackage, it publishes at the granularity of package sets: Either everything builds together, or nothing is published until it does. We call these published things “snapshots.”

Note: A snapshot is an exact image that can be reproduced at any time. With the snapshot's digest hash, you can download and install the same index and install packages based on that index all the time. Subsequently generated snapshots have no effect on previous ones.

I've been using it for a couple months for every project I've worked on, private and public. It's perfect for application developers and teams who want to share a common always-building package set. Provided you're using one of the 500+ packages we're publishing in the snapshots, there will always be a build plan for the package you want to use in your project.

And if your library is in Stackage, as explained in the previous post, you will get a heads up on Github if your updates or other people's updates cause a build failure related to your library.

How it Works

Snapshots are built every couple days. It takes about 16 hours to complete a build. You can view the build progress at

There are two types of snapshots published by FP Complete:

  1. Exclusive: this excludes packages not specified in the Stackage configuration. This means anything that you try to install from this snapshot will have a build plan.
  2. Inclusive: this includes Hackage packages not known to build. If you try to install a package not tracked by Stackage, it may or may not build.

You can use whichever suites your needs. If you want everything to always build, the former is an attractive choice. If you need to use a package not currently on Stackage, the latter choice makes sense.

Try it Right Now

Choose a snapshot. Each snapshot applies to a specific GHC version. For example, the latest (as of writing) GHC 7.8 build. You'll see something like this:

To use, copy the following to your ~/.cabal/config:

remote-repo: stackage:

Note: Remove or comment out any existing remote-repo line.

Run the following to update your packages:

$ cabal update

If you already have installed some packages, it's better to clear out your package set. See this page in the FAQ for how to do that.


How does this interact with sandboxes? Good question. Here's the rundown:

  • hsenv: Yes, works fine. Edit the .hsenv/cabal/config file and off you go.
  • cabal sandbox: Not yet! There is an open issue about this. But I have tried cabal sandboxes inside hsenv, which worked.

We've added this to the FAQ on the wiki. Contributions to this wiki page are welcome!


Personally, I'm very satisfied with this service so far. I just use my existing tools with a different remote-repo.

Others familiar with Nix have asked how they compare. They are very similar solutions in terms of versioning and curation (although Stackage has full-time maintenance); the main advantage to Stackage is that it just uses existing tools, so you don't have to learn a new tool and way of working to have a better user experience.

We'd like feedback on a few points:

  • Is the inclusive/exclusive separation useful?
  • Is the process of using Stackage in an existing system (clearing the package set and starting fresh) easy?
  • Should we establish a convention for storing Stackage snapshot hashes in projects source-tracking repositories?

And any other feedback you come up with while using it.

Stackage for businesses

As part of my last announcement for Stackage I mentioned there will also be custom services for businesses looking to build their development platform on Stackage.

These commercial services include:

  1. Premium support - FP Complete will quickly respond and make improvements or fixes to the public Stackage server as they need to happen.
  2. Private snapshots with premium support - very helpful for commercial users looking to add proprietary or custom libraries.
  3. Validated pre-compiled build images based on public or private snapshots. These can be used on developer systems or automated build systems.
  4. Packaged Jenkins server using the pre-compiled build images.

All these additional commercial services are meant to be helpful add-ons and we look forward to hearing more about what features you think would be beneficial to you. For more information email us at:

August 14, 2014 12:00 AM

August 12, 2014

Neil Mitchell

Safe library - generalising functions

Summary: The Safe library now has exact versions of take/drop, with twelve functions implemented on top of a generalised splitAt.

The Safe library is a simple Haskell library that provides versions of standard Prelude and Data.List functions that usually throw errors (e.g. tail), but wrapped to provide better error messages (e.g. tailNote), default values (e.g. tailDef) and Maybe results (e.g. tailMay).

I recently released version 0.3.5, which provides a new module Safe.Exact containing crashing versions of functions such as zip/zipWith (which error if the lists are not equal length) and take/drop/splitAt (which error if there are not enough elements), then wraps them to provide safe variants. As an example, the library provides:

takeExact    :: Int -> [a] -> [a]
takeExactMay :: Int -> [a] -> Maybe [a]

These are like take, but if the Int is larger than the length of the list it will throw an error or return Nothing. Some sample evaluations:

takeExactMay 2 [1,2,3] == Just [1,2]
takeExact 2 [1,2,3] == [1,2]
takeExactMay 2 [1] == Nothing
takeExact 2 [1] ==
1:error "Safe.Exact.takeExact, index too large, index=2, length=1"
take 1 (takeExact 2 [1]) == [1]

So takeExactMay computes up-front whether the whole computation will succeed, and returns a Nothing if it will fail. In contrast, takeExact produces elements while they are present, but if you demand an additional element that is missing it will result in an error. All the exceptions in the Safe library are designed to provide the maximum level of detail about what went wrong, here telling us the index we were after and the length of the list.

The library provides takeExact, dropExact and splitAtExact, plus Def/May/Note versions, resulting in twelve similar functions. While the implementation of any one function is reasonably short (although not that short, once proper error messages are provided), I didn't want to write the same code twelve times. However, generalising over functions that check up-front and those that check on-demand requires a bit of thought. In the end I settled for:

splitAtExact_ :: (String -> r) -> ([a] -> r) -> (a -> r -> r) -> Int -> [a] -> r
splitAtExact_ err nil cons o xs
| o < 0 = err $ "index must not be negative, index=" ++ show o
| otherwise = f o xs
f 0 xs = nil xs
f i (x:xs) = x `cons` f (i-1) xs
f i [] = err $
"index too large, index=" ++ show o ++ ", length=" ++ show (o-i)

Here the splitAtExact_ function has a parameterised return type r, along with three functional arguments that construct and consume the r values. The functional arguments are:

  • err :: String -> r, says how to convert an error into a result value. For up-front checks this produces a Nothing, for on-demand checks this calls error.
  • nil :: [a] -> r, says what to do once we have consumed the full number of elements. For take we discard all the remaining elements, for drop we are only interested in the remaining elements.
  • cons :: a -> r -> r, says how to deal with one element before we reach the index. For take this will be (:), but for functions producing a Maybe we have to check the r parameter first.

With this generalisation, I was able to write all twelve variants. As a few examples:

addNote fun msg = error $ "Safe.Exact." ++ fun ++ ", " ++ msg

takeExact = splitAtExact_ (addNote "takeExact") (const []) (:)

dropExact = splitAtExact_ (addNote "dropExact") id (flip const)

takeExactMay = splitAtExact_ (const Nothing) (const $ Just []) (\a -> fmap (a:))

dropExactMay = splitAtExact_ (const Nothing) Just (flip const)

splitAtExact = splitAtExact_ (addNote "splitAtExact")
(\x -> ([], x)) (\a b -> first (a:) b)

splitAtExactMay = splitAtExact_ (const Nothing)
(\x -> Just ([], x)) (\a b -> fmap (first (a:)) b)

Normally I would have defined takeExact and dropExact in terms of fst/snd on top of splitAtExact. However, in the Safe library error messages are of paramount importance, so I go to additional effort to ensure the error says takeExact and not splitAtExact.

by Neil Mitchell ( at August 12, 2014 09:08 PM

August 11, 2014

Dan Burton

Similarities: Monoid, MonadPlus, Category

This is perhaps obvious to anyone who has thoroughly studied category theory, but the similarities between Monoid, MonadPlus, and Category, have really struck me lately. I’m going to take a smidgeon of artistic license to present this train of thought. … Continue reading

by Dan Burton at August 11, 2014 09:41 PM

Functional Jobs

Big Data Engineer / Data Scientist at Recruit IT (Full-time)

  • Are you a Big Data Engineer who wants to work on innovative cloud and real-time data analytic technologies?
  • Do you have a passion for turning data into meaningful information?
  • Does working on a world-class big data project excite you?

Our client is currently looking in growth phase and looking for passionate and creative Data Scientists who can design, development, and implement robust, and scalable big data solutions. This is a role where you will need to enjoy being on the cusp of emerging technologies, and have a genuine interest in breaking new ground.

Your skills and experience will cover the majority of the following:

  • Experience working across real-time data analytics, machine learning, and big data solutions
  • Experience working with large data sets and cloud clusters
  • Experience with various NoSQL technologies and Big Data platforms including; Hadoop, Cassandra, HBASE, Accumulo, and MapReduce
  • Experience with various functional programming languages including; Scala, R, Clojure, Erlang, F#, Caml, Haskell, Common Lisp, or Scheme

This is an excellent opportunity for someone who is interested in a change in lifestyle, and where you would be joining other similar experienced professionals!

New Zealand awaits!

Get information on how to apply for this position.

August 11, 2014 12:32 AM

Gabriel Gonzalez

Equational reasoning at scale

Haskell programmers care about the correctness of their software and they specify correctness conditions in the form of equations that their code must satisfy. They can then verify the correctness of these equations using equational reasoning to prove that the abstractions they build are sound. To an outsider this might seem like a futile, academic exercise: proving the correctness of small abstractions is difficult, so what hope do we have to prove larger abstractions correct? This post explains how to do precisely that: scale proofs to large and complex abstractions.

Purely functional programming uses composition to scale programs, meaning that:

  • We build small components that we can verify correct in isolation
  • We compose smaller components into larger components

If you saw "components" and thought "functions", think again! We can compose things that do not even remotely resemble functions, such as proofs! In fact, Haskell programmers prove large-scale properties exactly the same way we build large-scale programs:

  • We build small proofs that we can verify correct in isolation
  • We compose smaller proofs into larger proofs

The following sections illustrate in detail how this works in practice, using Monoids as the running example. We will prove the Monoid laws for simple types and work our way up to proving the Monoid laws for much more complex types. Along the way we'll learn how to keep the proof complexity flat as the types grow in size.


Haskell's Prelude provides the following Monoid type class:

class Monoid m where
mempty :: m
mappend :: m -> m -> m

-- An infix operator equivalent to `mappend`
(<>) :: Monoid m => m -> m -> m
x <> y = mappend x y

... and all Monoid instances must obey the following two laws:

mempty <> x = x                -- Left identity

x <> mempty = x -- Right identity

(x <> y) <> z = x <> (y <> z) -- Associativity

For example, Ints form a Monoid:

-- See "Appendix A" for some caveats
instance Monoid Int where
mempty = 0
mappend = (+)

... and the Monoid laws for Ints are just the laws of addition:

0 + x = x

x + 0 = x

(x + y) + z = x + (y + z)

Now we can use (<>) and mempty instead of (+) and 0:

>>> 4 <> 2
>>> 5 <> mempty <> 5

This appears useless at first glance. We already have (+) and 0, so why are we using the Monoid operations?

Extending Monoids

Well, what if I want to combine things other than Ints, like pairs of Ints. I want to be able to write code like this:

>>> (1, 2) <> (3, 4)
(4, 6)

Well, that seems mildly interesting. Let's try to define a Monoid instance for pairs of Ints:

instance Monoid (Int, Int) where
mempty = (0, 0)
mappend (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)

Now my wish is true and I can "add" binary tuples together using (<>) and mempty:

>>> (1, 2) <> (3, 4)
(4, 6)
>>> (1, 2) <> (3, mempty) <> (mempty, 4)
(4, 6)
>>> (1, 2) <> mempty <> (3, 4)
(4, 6)

However, I still haven't proven that this new Monoid instance obeys the Monoid laws. Fortunately, this is a very simple proof.

I'll begin with the first Monoid law, which requires that:

mempty <> x = x

We will begin from the left-hand side of the equation and try to arrive at the right-hand side by substituting equals-for-equals (a.k.a. "equational reasoning"):

-- Left-hand side of the equation
mempty <> x

-- x <> y = mappend x y
= mappend mempty x

-- `mempty = (0, 0)`
= mappend (0, 0) x

-- Define: x = (xL, xR), since `x` is a tuple
= mappend (0, 0) (xL, xR)

-- mappend (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)
= (0 + xL, 0 + xR)

-- 0 + x = x
= (xL, xR)

-- x = (xL, xR)
= x

The proof for the second Monoid law is symmetric

-- Left-hand side of the equation
= x <> mempty

-- x <> y = mappend x y
= mappend x mempty

-- mempty = (0, 0)
= mappend x (0, 0)

-- Define: x = (xL, xR), since `x` is a tuple
= mappend (xL, xR) (0, 0)

-- mappend (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)
= (xL + 0, xR + 0)

-- x + 0 = x
= (xL, xR)

-- x = (xL, xR)
= x

The third Monoid law requires that (<>) is associative:

(x <> y) <> z = x <> (y <> z)

Again I'll begin from the left side of the equation:

-- Left-hand side
(x <> y) <> z

-- x <> y = mappend x y
= mappend (mappend x y) z

-- x = (xL, xR)
-- y = (yL, yR)
-- z = (zL, zR)
= mappend (mappend (xL, xR) (yL, yR)) (zL, zR)

-- mappend (x1, y1) (x2 , y2) = (x1 + x2, y1 + y2)
= mappend (xL + yL, xR + yR) (zL, zR)

-- mappend (x1, y1) (x2 , y2) = (x1 + x2, y1 + y2)
= mappend ((xL + yL) + zL, (xR + yR) + zR)

-- (x + y) + z = x + (y + z)
= mappend (xL + (yL + zL), xR + (yR + zR))

-- mappend (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)
= mappend (xL, xR) (yL + zL, yR + zR)

-- mappend (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)
= mappend (xL, xR) (mappend (yL, yR) (zL, zR))

-- x = (xL, xR)
-- y = (yL, yR)
-- z = (zL, zR)
= mappend x (mappend y z)

-- x <> y = mappend x y
= x <> (y <> z)

That completes the proof of the three Monoid laws, but I'm not satisfied with these proofs.

Generalizing proofs

I don't like the above proofs because they are disposable, meaning that I cannot reuse them to prove other properties of interest. I'm a programmer, so I loathe busy work and unnecessary repetition, both for code and proofs. I would like to find a way to generalize the above proofs so that I can use them in more places.

We improve proof reuse in the same way that we improve code reuse. To see why, consider the following sort function:

sort :: [Int] -> [Int]

This sort function is disposable because it only works on Ints. For example, I cannot use the above function to sort a list of Doubles.

Fortunately, programming languages with generics let us generalize sort by parametrizing sort on the element type of the list:

sort :: Ord a => [a] -> [a]

That type says that we can call sort on any list of as, so long as the type a implements the Ord type class (a comparison interface). This works because sort doesn't really care whether or not the elements are Ints; sort only cares if they are comparable.

Similarly, we can make the proof more "generic". If we inspect the proof closely, we will notice that we don't really care whether or not the tuple contains Ints. The only Int-specific properties we use in our proof are:

0 + x = x

x + 0 = x

(x + y) + z = x + (y + z)

However, these properties hold true for all Monoids, not just Ints. Therefore, we can generalize our Monoid instance for tuples by parametrizing it on the type of each field of the tuple:

instance (Monoid a, Monoid b) => Monoid (a, b) where
mempty = (mempty, mempty)

mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)

The above Monoid instance says that we can combine tuples so long as we can combine their individual fields. Our original Monoid instance was just a special case of this instance where both the a and b types are Ints.

Note: The mempty and mappend on the left-hand side of each equation are for tuples. The memptys and mappends on the right-hand side of each equation are for the types a and b. Haskell overloads type class methods like mempty and mappend to work on any type that implements the Monoid type class, and the compiler distinguishes them by their inferred types.

We can similarly generalize our original proofs, too, by just replacing the Int-specific parts with their more general Monoid counterparts.

Here is the generalized proof of the left identity law:

-- Left-hand side of the equation
mempty <> x

-- x <> y = mappend x y
= mappend mempty x

-- `mempty = (mempty, mempty)`
= mappend (mempty, mempty) x

-- Define: x = (xL, xR), since `x` is a tuple
= mappend (mempty, mempty) (xL, xR)

-- mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)
= (mappend mempty xL, mappend mempty xR)

-- Monoid law: mappend mempty x = x
= (xL, xR)

-- x = (xL, xR)
= x

... the right identity law:

-- Left-hand side of the equation
= x <> mempty

-- x <> y = mappend x y
= mappend x mempty

-- mempty = (mempty, mempty)
= mappend x (mempty, mempty)

-- Define: x = (xL, xR), since `x` is a tuple
= mappend (xL, xR) (mempty, mempty)

-- mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)
= (mappend xL mempty, mappend xR mempty)

-- Monoid law: mappend x mempty = x
= (xL, xR)

-- x = (xL, xR)
= x

... and the associativity law:

-- Left-hand side
(x <> y) <> z

-- x <> y = mappend x y
= mappend (mappend x y) z

-- x = (xL, xR)
-- y = (yL, yR)
-- z = (zL, zR)
= mappend (mappend (xL, xR) (yL, yR)) (zL, zR)

-- mappend (x1, y1) (x2 , y2) = (mappend x1 x2, mappend y1 y2)
= mappend (mappend xL yL, mappend xR yR) (zL, zR)

-- mappend (x1, y1) (x2 , y2) = (mappend x1 x2, mappend y1 y2)
= (mappend (mappend xL yL) zL, mappend (mappend xR yR) zR)

-- Monoid law: mappend (mappend x y) z = mappend x (mappend y z)
= (mappend xL (mappend yL zL), mappend xR (mappend yR zR))

-- mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)
= mappend (xL, xR) (mappend yL zL, mappend yR zR)

-- mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)
= mappend (xL, xR) (mappend (yL, yR) (zL, zR))

-- x = (xL, xR)
-- y = (yL, yR)
-- z = (zL, zR)
= mappend x (mappend y z)

-- x <> y = mappend x y
= x <> (y <> z)

This more general Monoid instance lets us stick any Monoids inside the tuple fields and we can still combine the tuples. For example, lists form a Monoid:

-- Exercise: Prove the monoid laws for lists
instance Monoid [a] where
mempty = []

mappend = (++)

... so we can stick lists inside the right field of each tuple and still combine them:

>>> (1, [2, 3]) <> (4, [5, 6])
(5, [2, 3, 5, 6])
>>> (1, [2, 3]) <> (4, mempty) <> (mempty, [5, 6])
(5, [2, 3, 5, 6])
>>> (1, [2, 3]) <> mempty <> (4, [5, 6])
(5, [2, 3, 5, 6])

Why, we can even stick yet another tuple inside the right field and still combine them:

>>> (1, (2, 3)) <> (4, (5, 6))
(5, (7, 9))

We can try even more exotic permutations and everything still "just works":

>>> ((1,[2, 3]), ([4, 5], 6)) <> ((7, [8, 9]), ([10, 11), 12)
((8, [2, 3, 8, 9]), ([4, 5, 10, 11], 18))

This is our first example of a "scalable proof". We began from three primitive building blocks:

  • Int is a Monoid
  • [a] is a Monoid
  • (a, b) is a Monoid if a is a Monoid and b is a Monoid

... and we connected those three building blocks to assemble a variety of new Monoid instances. No matter how many tuples we nest the result is still a Monoid and obeys the Monoid laws. We don't need to re-prove the Monoid laws every time we assemble a new permutation of these building blocks.

However, these building blocks are still pretty limited. What other useful things can we combine to build new Monoids?


We're so used to thinking of Monoids as data, so let's define a new Monoid instance for something entirely un-data-like:

-- See "Appendix A" for some caveats
instance Monoid b => Monoid (IO b) where
mempty = return mempty

mappend io1 io2 = do
a1 <- io1
a2 <- io2
return (mappend a1 a2)

The above instance says: "If a is a Monoid, then an IO action that returns an a is also a Monoid". Let's test this using the getLine function from the Prelude:

-- Read one line of input from stdin
getLine :: IO String

String is a Monoid, since a String is just a list of characters, so we should be able to mappend multiple getLine statements together. Let's see what happens:

>>> getLine  -- Reads one line of input
>>> getLine <> getLine
>>> getLine <> getLine <> getLine

Neat! When we combine multiple commands we combine their effects and their results.

Of course, we don't have to limit ourselves to reading strings. We can use readLn from the Prelude to read in anything that implements the Read type class:

-- Parse a `Read`able value from one line of stdin
readLn :: Read a => IO a

All we have to do is tell the compiler which type a we intend to Read by providing a type signature:

>>> readLn :: IO (Int, Int)
(1, 2)<Enter>
(1 ,2)
>>> readLn <> readLn :: IO (Int, Int)
>>> readLn <> readLn <> readLn :: IO (Int, Int)

This works because:

  • Int is a Monoid
  • Therefore, (Int, Int) is a Monoid
  • Therefore, IO (Int, Int) is a Monoid

Or let's flip things around and nest IO actions inside of a tuple:

>>> let ios = (getLine, readLn) :: (IO String, IO (Int, Int))
>>> let (getLines, readLns) = ios <> ios <> ios
>>> getLines
>>> readLns

We can very easily reason that the type (IO String, IO (Int, Int)) obeys the Monoid laws because:

  • String is a Monoid
  • If String is a Monoid then IO String is also a Monoid
  • Int is a Monoid
  • If Int is a Monoid, then (Int, Int) is also a `Monoid
  • If (Int, Int) is a Monoid, then IO (Int, Int) is also a Monoid
  • If IO String is a Monoid and IO (Int, Int) is a Monoid, then (IO String, IO (Int, Int)) is also a Monoid

However, we don't really have to reason about this at all. The compiler will automatically assemble the correct Monoid instance for us. The only thing we need to verify is that the primitive Monoid instances obey the Monoid laws, and then we can trust that any larger Monoid instance the compiler derives will also obey the Monoid laws.

The Unit Monoid

Haskell Prelude also provides the putStrLn function, which echoes a String to standard output with a newline:

putStrLn :: String -> IO ()

Is putStrLn combinable? There's only one way to find out!

>>> putStrLn "Hello" <> putStrLn "World"

Interesting, but why does that work? Well, let's look at the types of the commands we are combining:

putStrLn "Hello" :: IO ()
putStrLn "World" :: IO ()

Well, we said that IO b is a Monoid if b is a Monoid, and b in this case is () (pronounced "unit"), which you can think of as an "empty tuple". Therefore, () must form a Monoid of some sort, and if we dig into Data.Monoid, we will discover the following Monoid instance:

-- Exercise: Prove the monoid laws for `()`
instance Monoid () where
mempty = ()

mappend () () = ()

This says that empty tuples form a trivial Monoid, since there's only one possible value (ignoring bottom) for an empty tuple: (). Therefore, we can derive that IO () is a Monoid because () is a Monoid.


Alright, so we can combine putStrLn "Hello" with putStrLn "World", but can we combine naked putStrLn functions?

>>> (putStrLn <> putStrLn) "Hello"

Woah, how does that work?

We never wrote a Monoid instance for the type String -> IO (), yet somehow the compiler magically accepted the above code and produced a sensible result.

This works because of the following Monoid instance for functions:

instance Monoid b => Monoid (a -> b) where
mempty = \_ -> mempty

mappend f g = \a -> mappend (f a) (g a)

This says: "If b is a Monoid, then any function that returns a b is also a Monoid".

The compiler then deduced that:

  • () is a Monoid
  • If () is a Monoid, then IO () is also a Monoid
  • If IO () is a Monoid then String -> IO () is also a Monoid

The compiler is a trusted friend, deducing Monoid instances we never knew existed.

Monoid plugins

Now we have enough building blocks to assemble a non-trivial example. Let's build a key logger with a Monoid-based plugin system.

The central scaffold of our program is a simple main loop that echoes characters from standard input to standard output:

main = do
hSetEcho stdin False
forever $ do
c <- getChar
putChar c

However, we would like to intercept key strokes for nefarious purposes, so we will slightly modify this program to install a handler at the beginning of the program that we will invoke on every incoming character:

install :: IO (Char -> IO ())
install = ???

main = do
hSetEcho stdin False
handleChar <- install
forever $ do
c <- getChar
handleChar c
putChar c

Notice that the type of install is exactly the correct type to be a Monoid:

  • () is a Monoid
  • Therefore, IO () is also a Monoid
  • Therefore Char -> IO () is also a Monoid
  • Therefore IO (Char -> IO ()) is also a Monoid

Therefore, we can combine key logging plugins together using Monoid operations. Here is one such example:

type Plugin = IO (Char -> IO ())

logTo :: FilePath -> Plugin
logTo filePath = do
handle <- openFile filePath WriteMode
hSetBuffering handle NoBuffering
return (hPutChar handle)

main = do
hSetEcho stdin False
handleChar <- logTo "file1.txt" <> logTo "file2.txt"
forever $ do
c <- getChar
handleChar c
putChar c

Now, every key stroke will be recorded to both file1.txt and file2.txt. Let's confirm that this works as expected:

$ ./logger
$ cat file1.txt
$ cat file2.txt

Try writing your own Plugins and mixing them in with (<>) to see what happens. "Appendix C" contains the complete code for this section so you can experiment with your own Plugins.


Notice that I never actually proved the Monoid laws for the following two Monoid instances:

instance Monoid b => Monoid (a -> b) where
mempty = \_ -> mempty
mappend f g = \a -> mappend (f a) (g a)

instance Monoid a => Monoid (IO a) where
mempty = return mempty

mappend io1 io2 = do
a1 <- io1
a2 <- io2
return (mappend a1 a2)

The reason why is that they are both special cases of a more general pattern. We can detect the pattern if we rewrite both of them to use the pure and liftA2 functions from Control.Applicative:

import Control.Applicative (pure, liftA2)

instance Monoid b => Monoid (a -> b) where
mempty = pure mempty

mappend = liftA2 mappend

instance Monoid b => Monoid (IO b) where
mempty = pure mempty

mappend = liftA2 mappend

This works because both IO and functions implement the following Applicative interface:

class Functor f => Applicative f where
pure :: a -> f a
(<*>) :: f (a -> b) -> f a -> f b

-- Lift a binary function over the functor `f`
liftA2 :: Applicative f => (a -> b -> c) -> f a -> f b -> f c
liftA2 f x y = (pure f <*> x) <*> y

... and all Applicative instances must obey several Applicative laws:

pure id <*> v = v

((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)

pure f <*> pure x = pure (f x)

u <*> pure x = pure (\f -> f y) <*> u

These laws may seem a bit adhoc, but this paper explains that you can reorganize the Applicative class to this equivalent type class:

class Functor f => Monoidal f where
unit :: f ()
(#) :: f a -> f b -> f (a, b)

Then the corresponding laws become much more symmetric:

fmap snd (unit # x) = x                 -- Left identity

fmap fst (x # unit) = x -- Right identity

fmap assoc ((x # y) # z) = x # (y # z) -- Associativity
assoc ((a, b), c) = (a, (b, c))

fmap (f *** g) (x # y) = fmap f x # fmap g y -- Naturality
(f *** g) (a, b) = (f a, g b)

I personally prefer the Monoidal formulation, but you go to war with the army you have, so we will use the Applicative type class for this post.

All Applicatives possess a very powerful property: they can all automatically lift Monoid operations using the following instance:

instance (Applicative f, Monoid b) => Monoid (f b) where
mempty = pure mempty

mappend = liftA2 mappend

This says: "If f is an Applicative and b is a Monoid, then f b is also a Monoid." In other words, we can automatically extend any existing Monoid with some new feature f and get back a new Monoid.

Note: The above instance is bad Haskell because it overlaps with other type class instances. In practice we have to duplicate the above code once for each Applicative. Also, for some Applicatives we may want a different Monoid instance.

We can prove that the above instance obeys the Monoid laws without knowing anything about f and b, other than the fact that f obeys the Applicative laws and b obeys the Applicative laws. These proofs are a little long, so I've included them in Appendix B.

Both IO and functions implement the Applicative type class:

instance Applicative IO where
pure = return

iof <*> iox = do
f <- iof
x <- iox
return (f x)

instance Applicative ((->) a) where
pure x = \_ -> x

kf <*> kx = \a ->
let f = kf a
x = kx a
in f x

This means that we can kill two birds with one stone. Every time we prove the Applicative laws for some functor F:

instance Applicative F where ...

... we automatically prove that the following Monoid instance is correct for free:

instance Monoid b => Monoid (F b) where
mempty = pure mempty

mappend = liftA2 mappend

In the interest of brevity, I will skip the proofs of the Applicative laws, but I may cover them in a subsequent post.

The beauty of Applicative Functors is that every new Applicative instance we discover adds a new building block to our Monoid toolbox, and Haskell programmers have already discovered lots of Applicative Functors.

Revisiting tuples

One of the very first Monoid instances we wrote was:

instance (Monoid a, Monoid b) => Monoid (a, b) where
mempty = (mempty, mempty)

mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)

Check this out:

instance (Monoid a, Monoid b) => Monoid (a, b) where
mempty = pure mempty

mappend = liftA2 mappend

This Monoid instance is yet another special case of the Applicative pattern we just covered!

This works because of the following Applicative instance in Control.Applicative:

instance Monoid a => Applicative ((,) a) where
pure b = (mempty, b)

(a1, f) <*> (a2, x) = (mappend a1 a2, f x)

This instance obeys the Applicative laws (proof omitted), so our Monoid instance for tuples is automatically correct, too.

Composing applicatives

In the very first section I wrote:

Haskell programmers prove large-scale properties exactly the same way we build large-scale programs:

  • We build small proofs that we can verify correct in isolation
  • We compose smaller proofs into larger proofs

I don't like to use the word compose lightly. In the context of category theory, compose has a very rigorous meaning, indicating composition of morphisms in some category. This final section will show that we can actually compose Monoid proofs in a very rigorous sense of the word.

We can define a category of Monoid proofs:

So in our Plugin example, we began on the proof that () was a Monoid and then composed three Applicative morphisms to prove that Plugin was a Monoid. I will use the following diagram to illustrate this:

| |
| Legend: * = Object |
| |
| v |
| | = Morphism |
| v |
| |

* `()` is a `Monoid`

| IO

* `IO ()` is a `Monoid`

| ((->) String)

* `String -> IO ()` is a `Monoid`

| IO

* `IO (String -> IO ())` (i.e. `Plugin`) is a `Monoid`

Therefore, we were literally composing proofs together.


You can equationally reason at scale by decomposing larger proofs into smaller reusable proofs, the same way we decompose programs into smaller and more reusable components. There is no limit to how many proofs you can compose together, and therefore there is no limit to how complex of a program you can tame using equational reasoning.

This post only gave one example of composing proofs within Haskell. The more you learn the language, the more examples of composable proofs you will encounter. Another common example is automatically deriving Monad proofs by composing monad transformers.

As you learn Haskell, you will discover that the hard part is not proving things. Rather, the challenge is learning how to decompose proofs into smaller proofs and you can cultivate this skill by studying category theory and abstract algebra. These mathematical disciplines teach you how to extract common and reusable proofs and patterns from what appears to be disposable and idiosyncratic code.

Appendix A - Missing Monoid instances

These Monoid instance from this post do not actually appear in the Haskell standard library:

instance Monoid b => Monoid (IO b)

instance Monoid Int

The first instance was recently proposed here on the Glasgow Haskell Users mailing list. However, in the short term you can work around it by writing your own Monoid instances by hand just by inserting a sufficient number of pures and liftA2s.

For example, suppose we wanted to provide a Monoid instance for Plugin. We would just newtype Plugin and write:

newtype Plugin = Plugin { install :: IO (String -> IO ()) }

instance Monoid Plugin where
mempty = Plugin (pure (pure (pure mempty)))

mappend (Plugin p1) (Plugin p2) =
Plugin (liftA2 (liftA2 (liftA2 mappend)) p1 p2)

This is what the compiler would have derived by hand.

Alternatively, you could define an orphan Monoid instance for IO, but this is generally frowned upon.

There is no default Monoid instance for Int because there are actually two possible instances to choose from:

-- Alternative #1
instance Monoid Int where
mempty = 0

mappend = (+)

-- Alternative #2
instance Monoid Int where
mempty = 1

mappend = (*)

So instead, Data.Monoid sidesteps the issue by providing two newtypes to distinguish which instance we prefer:

newtype Sum a = Sum { getSum :: a }

instance Num a => Monoid (Sum a)

newtype Product a = Product { getProduct :: a}

instance Num a => Monoid (Product a)

An even better solution is to use a semiring, which allows two Monoid instances to coexist for the same type. You can think of Haskell's Num class as an approximation of the semiring class:

class Num a where
fromInteger :: Integer -> a

(+) :: a -> a -> a

(*) :: a -> a -> a

-- ... and other operations unrelated to semirings

Note that we can also lift the Num class over the Applicative class, exactly the same way we lifted the Monoid class. Here's the code:

instance (Applicative f, Num a) => Num (f a) where
fromInteger n = pure (fromInteger n)

(+) = liftA2 (+)

(*) = liftA2 (*)

(-) = liftA2 (-)

negate = fmap negate

abs = fmap abs

signum = fmap signum

This lifting guarantees that if a obeys the semiring laws then so will f a. Of course, you will have to specialize the above instance to every concrete Applicative because otherwise you will get overlapping instances.

Appendix B

These are the proofs to establish that the following Monoid instance obeys the Monoid laws:

instance (Applicative f, Monoid b) => Monoid (f b) where
mempty = pure mempty

mappend = liftA2 mappend

... meaning that if f obeys the Applicative laws and b obeys the Monoid laws, then f b also obeys the Monoid laws.

Proof of the left identity law:

mempty <> x

-- x <> y = mappend x y
= mappend mempty x

-- mappend = liftA2 mappend
= liftA2 mappend mempty x

-- mempty = pure mempty
= liftA2 mappend (pure mempty) x

-- liftA2 f x y = (pure f <*> x) <*> y
= (pure mappend <*> pure mempty) <*> x

-- Applicative law: pure f <*> pure x = pure (f x)
= pure (mappend mempty) <*> x

-- Eta conversion
= pure (\a -> mappend mempty a) <*> x

-- mappend mempty x = x
= pure (\a -> a) <*> x

-- id = \x -> x
= pure id <*> x

-- Applicative law: pure id <*> v = v
= x

Proof of the right identity law:

x <> mempty = x

-- x <> y = mappend x y
= mappend x mempty

-- mappend = liftA2 mappend
= liftA2 mappend x mempty

-- mempty = pure mempty
= liftA2 mappend x (pure mempty)

-- liftA2 f x y = (pure f <*> x) <*> y
= (pure mappend <*> x) <*> pure mempty

-- Applicative law: u <*> pure y = pure (\f -> f y) <*> u
= pure (\f -> f mempty) <*> (pure mappend <*> x)

-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
= ((pure (.) <*> pure (\f -> f mempty)) <*> pure mappend) <*> x

-- Applicative law: pure f <*> pure x = pure (f x)
= (pure ((.) (\f -> f mempty)) <*> pure mappend) <*> x

-- Applicative law : pure f <*> pure x = pure (f x)
= pure ((.) (\f -> f mempty) mappend) <*> x

-- `(.) f g` is just prefix notation for `f . g`
= pure ((\f -> f mempty) . mappend) <*> x

-- f . g = \x -> f (g x)
= pure (\x -> (\f -> f mempty) (mappend x)) <*> x

-- Apply the lambda
= pure (\x -> mappend x mempty) <*> x

-- Monoid law: mappend x mempty = x
= pure (\x -> x) <*> x

-- id = \x -> x
= pure id <*> x

-- Applicative law: pure id <*> v = v
= x

Proof of the associativity law:

(x <> y) <> z

-- x <> y = mappend x y
= mappend (mappend x y) z

-- mappend = liftA2 mappend
= liftA2 mappend (liftA2 mappend x y) z

-- liftA2 f x y = (pure f <*> x) <*> y
= (pure mappend <*> ((pure mappend <*> x) <*> y)) <*> z

-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
= (((pure (.) <*> pure mappend) <*> (pure mappend <*> x)) <*> y) <*> z

-- Applicative law: pure f <*> pure x = pure (f x)
= ((pure f <*> (pure mappend <*> x)) <*> y) <*> z
f = (.) mappend

-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
= ((((pure (.) <*> pure f) <*> pure mappend) <*> x) <*> y) <*> z
f = (.) mappend

-- Applicative law: pure f <*> pure x = pure (f x)
= (((pure f <*> pure mappend) <*> x) <*> y) <*> z
f = (.) ((.) mappend)

-- Applicative law: pure f <*> pure x = pure (f x)
= ((pure f <*> x) <*> y) <*> z
f = (.) ((.) mappend) mappend

-- (.) f g = f . g
= ((pure f <*> x) <*> y) <*> z
f = ((.) mappend) . mappend

-- Eta conversion
= ((pure f <*> x) <*> y) <*> z
f x = (((.) mappend) . mappend) x

-- (f . g) x = f (g x)
= ((pure f <*> x) <*> y) <*> z
f x = (.) mappend (mappend x)

-- (.) f g = f . g
= ((pure f <*> x) <*> y) <*> z
f x = mappend . (mappend x)

-- Eta conversion
= ((pure f <*> x) <*> y) <*> z
f x y = (mappend . (mappend x)) y

-- (f . g) x = f (g x)
= ((pure f <*> x) <*> y) <*> z
f x y = mappend (mappend x y)

-- Eta conversion
= ((pure f <*> x) <*> y) <*> z
f x y z = mappend (mappend x y) z

-- Monoid law: mappend (mappend x y) z = mappend x (mappend y z)
= ((pure f <*> x) <*> y) <*> z
f x y z = mappend x (mappend y z)

-- (f . g) x = f (g x)
= ((pure f <*> x) <*> y) <*> z
f x y z = (mappend x . mappend y) z

-- Eta conversion
= ((pure f <*> x) <*> y) <*> z
f x y = mappend x . mappend y

-- (.) f g = f . g
= ((pure f <*> x) <*> y) <*> z
f x y = (.) (mappend x) (mappend y)

-- (f . g) x = f
= ((pure f <*> x) <*> y) <*> z
f x y = (((.) . mappend) x) (mappend y)

-- (f . g) x = f (g x)
= ((pure f <*> x) <*> y) <*> z
f x y = ((((.) . mappend) x) . mappend) y

-- Eta conversion
= ((pure f <*> x) <*> y) <*> z
f x = (((.) . mappend) x) . mappend

-- (.) f g = f . g
= ((pure f <*> x) <*> y) <*> z
f x = (.) (((.) . mappend) x) mappend

-- Lambda abstraction
= ((pure f <*> x) <*> y) <*> z
f x = (\k -> k mappend) ((.) (((.) . mappend) x))

-- (f . g) x = f (g x)
= ((pure f <*> x) <*> y) <*> z
f x = (\k -> k mappend) (((.) . ((.) . mappend)) x)

-- Eta conversion
= ((pure f <*> x) <*> y) <*> z
f = (\k -> k mappend) . ((.) . ((.) . mappend))

-- (.) f g = f . g
= ((pure f <*> x) <*> y) <*> z
f = (.) (\k -> k mappend) ((.) . ((.) . mappend))

-- Applicative law: pure f <*> pure x = pure (f x)
= (((pure g <*> pure f) <*> x) <*> y) <*> z
g = (.) (\k -> k mappend)
f = (.) . ((.) . mappend)

-- Applicative law: pure f <*> pure x = pure (f x)
= ((((pure (.) <*> pure (\k -> k mappend)) <*> pure f) <*> x) <*> y) <*> z
f = (.) . ((.) . mappend)

-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
= ((pure (\k -> k mappend) <*> (pure f <*> x)) <*> y) <*> z
f = (.) . ((.) . mappend)

-- u <*> pure y = pure (\k -> k y) <*> u
= (((pure f <*> x) <*> pure mappend) <*> y) <*> z
f = (.) . ((.) . mappend)

-- (.) f g = f . g
= (((pure f <*> x) <*> pure mappend) <*> y) <*> z
f = (.) (.) ((.) . mappend)

-- Applicative law: pure f <*> pure x = pure (f x)
= ((((pure g <*> pure f) <*> x) <*> pure mappend) <*> y) <*> z
g = (.) (.)
f = (.) . mappend

-- Applicative law: pure f <*> pure x = pure (f x)
= (((((pure (.) <*> pure (.)) <*> pure f) <*> x) <*> pure mappend) <*> y) <*> z
f = (.) . mappend

-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
= (((pure (.) <*> (pure f <*> x)) <*> pure mappend) <*> y) <*> z
f = (.) . mappend

-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
= ((pure f <*> x) <*> (pure mappend <*> y)) <*> z
f = (.) . mappend

-- (.) f g = f . g
= ((pure f <*> x) <*> (pure mappend <*> y)) <*> z
f = (.) (.) mappend

-- Applicative law: pure f <*> pure x = pure (f x)
= (((pure f <*> pure mappend) <*> x) <*> (pure mappend <*> y)) <*> z
f = (.) (.)

-- Applicative law: pure f <*> pure x = pure (f x)
= ((((pure (.) <*> pure (.)) <*> pure mappend) <*> x) <*> (pure mappend <*> y)) <*> z

-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
= ((pure (.) <*> (pure mappend <*> x)) <*> (pure mappend <*> y)) <*> z

-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
= (pure mappend <*> x) <*> ((pure mappend <*> y) <*> z)

-- liftA2 f x y = (pure f <*> x) <*> y
= liftA2 mappend x (liftA2 mappend y z)

-- mappend = liftA2 mappend
= mappend x (mappend y z)

-- x <> y = mappend x y
= x <> (y <> z)

Appendix C: Monoid key logging

Here is the complete program for a key logger with a Monoid-based plugin system:

import Control.Applicative (pure, liftA2)
import Control.Monad (forever)
import Data.Monoid
import System.IO

instance Monoid b => Monoid (IO b) where
mempty = pure mempty

mappend = liftA2 mappend

type Plugin = IO (Char -> IO ())

logTo :: FilePath -> Plugin
logTo filePath = do
handle <- openFile filePath WriteMode
hSetBuffering handle NoBuffering
return (hPutChar handle)

main = do
hSetEcho stdin False
handleChar <- logTo "file1.txt" <> logTo "file2.txt"
forever $ do
c <- getChar
handleChar c
putChar c

by Gabriel Gonzalez ( at August 11, 2014 12:13 AM

August 10, 2014

Gabriel Gonzalez

managed-1.0.0: A monad for managed resources

I'm splitting off the Managed type from the mvc library into its own stand-alone library. I've wanted to use this type outside of mvc for some time now, because it's an incredibly useful Applicative that I find myself reaching for in my own code whenever I need to acquire resources.

If you're not familiar with the Managed type, it's simple:

-- The real implementation uses smart constructors
newtype Managed a =
Managed { with :: forall r . (a -> IO r) -> IO r }

-- It's a `Functor`/`Applicative`/`Monad`
instance Functor Managed where ...
instance Applicative Managed where ...
instance Monad Managed where ...

-- ... and also implements `MonadIO`
instance MonadIO Managed where ...

Here's an example of mixing the Managed monad with pipes to copy one file to another:

import Control.Monad.Managed
import System.IO
import Pipes
import qualified Pipes.Prelude as Pipes

main = runManaged $ do
hIn <- managed (withFile "in.txt" ReadMode)
hOut <- managed (withFile "out.txt" WriteMode)
liftIO $ runEffect $
Pipes.fromHandle hIn >-> Pipes.toHandle hOut

However, this is not much more concise than the equivalent callback-based version. The real value of the Managed type is its Applicative instance, which you can use to lift operations from values that it wraps.

Equational reasoning

My previous post on equational reasoning at scale describes how you can use Applicatives to automatically extend Monoids while preserving the Monoid operations. The Managed Applicative is no different and provides the following type class instance that automatically lifts Monoid operations:

instance Monoid a => Monoid (Managed a)

Therefore, you can treat the Managed Applicative as yet another useful building block in your Monoid tool box.

However, Applicatives can do more than extend Monoids; they can extend Categorys, too. Given any Category, if you extend it with an Applicative you can automatically derive a new Category. Here's the general solution:

import Control.Applicative
import Control.Category
import Prelude hiding ((.), id)

newtype Extend f c a b = Extend (f (c a b))

instance (Applicative f, Category c)
=> Category (Extend f c) where
id = Extend (pure id)

Extend f . Extend g = Extend (liftA2 (.) f g)

So let's take advantage of this fact to extend one of the pipes categories with simple resource management. All we have to do is wrap the pull-based pipes category in a bona-fide Category instance:

import Pipes

newtype Pull m a b = Pull (Pipe a b m ())

instance Monad m => Category (Pull m) where
id = Pull cat

Pull p1 . Pull p2 = Pull (p1 <-< p2)

Now we can automatically define resource-managed pipes by Extending them with the Managed Applicative:

import Control.Monad.Managed
import qualified Pipes.Prelude as Pipes
import System.IO

fromFile :: FilePath -> Extend Managed (Pull IO) () String
fromFile filePath = Extend $ do
handle <- managed (withFile filePath ReadMode)
return (Pull (Pipes.fromHandle handle))

toFile :: FilePath -> Extend Managed (Pull IO) String X
toFile filePath = Extend $ do
handle <- managed (withFile filePath WriteMode)
return (Pull (Pipes.toHandle handle))

All we need is a way to run Extended pipes and then we're good to go:

runPipeline :: Extend Managed (Pull IO) () X -> IO ()
runPipeline (Extend mp) = runManaged $ do
Pull p <- mp
liftIO $ runEffect (return () >~ p)

If we compose and run these Extended pipes they just "do the right thing":

main :: IO ()
main = runPipeline (fromFile "in.txt" >>> toFile "out.txt")

Let's check it out:

$ cat in.txt
$ ./example
$ cat out.txt

We can even reuse existing pipes, too:

reuse :: Monad m => Pipe a b m () -> Extend Managed (Pull m) a b
reuse = Extend . pure . Pull

main = runPipeline $
fromFile "in.txt" >>> reuse (Pipes.take 2) >>> toFile "out.txt"

... and reuse does the right thing:

$ ./example
$ cat out.txt

What does it mean for reuse to "do the right thing"? Well, we can specify the correctness conditions for reuse as the following functor laws:

reuse (p1 >-> p2) = reuse p1 >>> reuse p2

reuse cat = id

These two laws enforce that reuse is "well-behaved" in a rigorous sense.

This is just one example of how you can use the Managed type to extend an existing Category. As an exercise, try to take other categories and extend them this way and see what surprising new connectable components you can create.


Experts will recognize that Managed is a special case of Codensity or ContT. The reason for defining a separate type is:

  • simpler inferred types,
  • additional type class instances, and:
  • a more beginner-friendly name.

Managed is closely related in spirit to the Resource monad, which is now part of resourcet. The main difference between the two is:

  • Resource preserves the open and close operations
  • Managed works for arbitrary callbacks, even unrelated to resources

This is why I view the them as complementary Monads.

Like all Applicatives, the Managed type is deceptively simple. This type does not do much in isolation, but it grows in power the more you compose it with other Applicatives to generate new Applicatives.

by Gabriel Gonzalez ( at August 10, 2014 11:58 PM


Field Accessors Considered Harmful

It's pretty well known these days that Haskell's field accessors are rather cumbersome syntactically and not composable.  The lens abstraction that has gotten much more popular recently (thanks in part to Edward Kmett's lens package) solves these problems.  But I recently ran into a bigger problem with field accessors that I had not thought about before.  Consider the following scenario.  You have a package with code something like this:

data Config = Config { configFieldA :: [Text] }

So your Config data structure gives your users getters and setters for field A (and any other fields you might have).  Your users are happy and life is good.  Then one day you decide to add a new feature and that feature requires expanding and restructuring Config.  Now you have this:
data MConfig = MConfig { mconfigFieldA :: [Text] }
data Config = Config { configMC :: MConfig , configFieldX :: Text , configFieldY :: Bool }

This is a nice solution because your users get to keep the functionality over the portion of the Config that they are used to, and they still get the new functionality.  But now there's a problem.  You're still breaking them because configFieldA changed names to mconfigFieldA and now refers to the MConfig structure instead of Config.  If this was not a data structure, you could preserve backwards compatibility by creating another function:

configFieldA = mconfigFieldA . configMC

But alas, that won't work here because configFieldA is not a normal function.  It is a special field accessor generated by GHC and you know that your users are using it as a setter.  It seems to me that we are at an impasse.  It is completely impossible to deliver your new feature without breaking backwards compatibility somehow.  No amount of deprecated cycles can ease the transition.  The sad thing is that it seems like it should have been totally doable.  Obviously there are some kinds of changes that understandably will break backwards compatibility.  But this doesn't seem like one of them since it is an additive change.  Yes, yes, I's impossible to do this change without changing the type of the Config constructor, so that means that at least that function will break.  But we should be able to minimize the breakage to the field accessor functions, and field accessors prevent us from doing that.

However, we could have avoided this problem.  If we had a bit more foresight, we could have done this.

module Foo (mkConfig, configFieldA) where

data Config = Config { _configFieldA :: [Text] }

mkConfig :: [Text] -> Config
mkConfig = Config

configFieldA = lens _configFieldA (\c a -> c { _configFieldA = a })

This would allow us to avoid breaking backwards compatibility by continuing to export appropriate versions of these symbols.  It would look something like this.
module Foo
  ( MConfig
  , mkMConfig
  , mconfigFieldA
  , Config
  , mkConfig
  , configFieldA
  , configMC
  -- ...
  ) where

data MConfig = MConfig { _mconfigFieldA :: [Text] }
data Config = Config { _configMC :: MConfig
                     , _configFieldX :: Text
                     , _configFieldY :: Bool }

mkMConfig = MConfig

mkConfig a = Config (mkMConfig a) "" False

mconfigFieldA = lens _mconfigFieldA (\c a -> c { _mconfigFieldA = a })
configMC = lens _configMC (\c mc -> c { _configMC = mc })

-- The rest of the field lenses here

configFieldA = configMC . mconfigFieldA
Note that the type signatures for mkConfig and configFieldA stay exactly the same.  We weren't able to do this with field accessors because they are not composable.  Lenses solve this problem for us because they are composable and we have complete control over their definition.

For quite some time now I have thought that I understood the advantage that lenses give you over field accessors.  Discovering this added ability of lenses in helping us preserve backwards compatibility came as a pleasant surprise.  I'll refrain from opining on how this should affect your development practices, but I think it makes the case for using lenses in your code a bit stronger than it was before.

by mightybyte ( at August 10, 2014 09:33 PM

Brent Yorgey

Readers wanted!

tl;dr: Read a draft of my thesis and send me your feedback by September 9!

Over the past year I’ve had several people say things along the lines of, “let me know if you want me to read through your thesis”. I never took them all that seriously (it’s easy to say you are willing to read a 200-page document…), but it never hurts to ask, right?

My thesis defense is scheduled for October 14, and I’m currently undertaking a massive writing/editing push to try to get as much of it wrapped up as I can before classes start on September 4. So, if there’s anyone out there actually interested in reading a draft and giving feedback, now is your chance!

The basic idea of my dissertation is to put combinatorial species and related variants (including a port of the theory to HoTT) in a common categorical framework, and then be able to use them for working with/talking about data types. If you’re brave enough to read it, you’ll find lots of category theory and type theory, and very little code—but I can promise lots of examples and pretty pictures. I’ve tried to make it somewhat self-contained, so it may be a good way to learn a bit of category theory or homotopy type theory, if you’ve been curious to learn more about those topics.

You can find the latest draft here (auto-updated every time I commit); more generally, you can find the git repo here. If you notice any typos or grammatical errors, feel free to open a pull request. For anything more substantial—thoughts on the organization, notes or questions about things you found confusing, suggestions for improvement, pointers to other references—please send me an email (first initial last name at gmail). And finally, please send me any feedback by September 9 at the latest (but the earlier the better). I need to have a final version to my committee by September 23.

Last but not least, if you’re interested to read it but don’t have the time or inclination to provide feedback on a draft, never fear—I’ll post an announcement when the final version is ready for your perusal!

by Brent at August 10, 2014 07:51 PM

Manuel M T Chakravarty

What is currying (in Swift)?

A blog post by Ole Begemann has led some people interested in Swift wondering what exactly curried functions and currying are — for example, listen to the discussion on the Mobile Couch podcast Episode 37.

Let’s look at currying in Swift. Here is a binary add function

func add(x: Int, #y: Int) -> Int {
    return x + y

and next we have a curried version of the same function

func curriedAdd(x: Int)(y: Int) -> Int {
  return x + y

The difference between the two is that add takes two arguments (two Ints) and returns an Int, whereas curriedAdd takes only one argument and returns a function of type (y: Int) -> Int. If you put those two definitions into a Playground, both add(1, y: 2) and curriedAdd(1)(y: 2) yield 3.

In add(1, 2), we supply two arguments at once, but in curriedAdd(1)(y: 2), we supply only one argument, get a new function as the result, and then, apply that new function to the second argument. In other words, add requires two arguments at once, whereas its curried variant requires the two arguments one after the other in two separate function calls.

This works not only for binary functions, but also for functions expecting three or more arguments. More generally, currying refers to the fact that any n-ary function (i.e., any function expecting n arguments) can be rewritten as a computationally equivalent function that doesn’t get all n arguments at once, but gets them one after the other, always returning an intermediate function for each of the n function applications.

That’s…interesting, but why should we care? Curried functions are more versatile than their uncurried counterparts. We can apply the function add only to two arguments. That’s it. In contrast, we can apply curriedAdd to either one or two arguments. If we want to define an increment function, we can do that easily in terms of curriedAdd:

let inc = curriedAdd(1)

As expected, inc(y: 2) gives 3.

For a simple function, such as, add, this extra versatility is not very impressive. However, Ole’s blog post explains how this ultimately enables the target-action pattern in Swift and that is pretty impressive!

As a side note, in the functional language Haskell all functions are curried by default. In fact, the concept was called currying after the mathematician Haskell B. Curry in whose honour the language was called Haskell.

August 10, 2014 09:19 AM

Mike Izbicki

Scribal traditions of "ancient" Hebrew scrolls

Scribal traditions of "ancient" Hebrew scrolls

posted on 2014-08-10

In 2006, I saw the dead sea scrolls in San Diego. The experience changed my life. I realized I knew nothing about ancient Judea, and decided to immerse myself in it. I studied biblical Hebrew and began a collection of Hebrew scrolls.

a pile of Torah scrolls

a pile of Torah scrolls

Each scroll is between 100 to 600 years old, and is a fragment of the Torah. These scrolls were used by synagogues throughout Southern Europe, Africa, and the Middle East. As we’ll see in a bit, each region has subtly different scribal traditions. But they all take their Torah very seriously.

The first thing that strikes me about a scroll is its color. Scrolls are made from animal skin, and the color is determined by the type of animal and method of curing the skin. The methods and animals used depend on the local resources, so color gives us a clue about where the scroll originally came from. For example, scrolls with a deep red color usually come from North Africa. As the scroll ages, the color may either fade or deepen slightly, but remains largely the same. The final parchment is called either gevil or klaf depending on the quality and preparation method.

The four scrolls below show the range of colors scrolls come in:

4 Torah scrolls side by side with different ages

4 Torah scrolls side by side with different ages

My largest scroll is about 60 feet long. Here I have it partially stretched out on the couch in my living room:

Torah scroll stretched out on couch

Torah scroll stretched out on couch

The scroll is about 300 years old, and contains most of Exodus, Leviticus, and Numbers. A complete Torah scroll would also have Genesis and Deuteronomy and be around 150 feet long. Sadly, this scroll has been damaged throughout its long life, and the damaged sections were removed.

As you can imagine, many hides were needed to make these large scrolls. These hides get sewn together to form the full scroll. You can easily see the stitching on the back of the scroll:

back of a Torah scroll

back of a Torah scroll

Also notice how rough that skin is! The scribes (for obvious reasons) chose to use the nice side of the skin to write on.

Here is a front end, rotated view of the same seam above. Some columns of text are separated at these seems, but some columns are not.

front of Torah seam

front of Torah seam

Animal hides come in many sizes. The hide in this image is pretty large and holds five columns of text:

5 panels of parchment

5 panels of parchment

But this hide is smaller and holds only three columns:

3 panels of parchment

3 panels of parchment

The coolest part of these scrolls is their calligraphy. Here’s a zoomed in look on one of the columns of text above:

zoomed in Hebrew Torah scroll

zoomed in Hebrew Torah scroll

There’s a lot to notice in this picture:

  1. The detail is amazing. Many characters have small strokes decorating them. These strokes are called tagin (or crowns in English). A bit farther down the page we’ll see different ways other scribal traditions decorate these characters. Because of this detail in every letter, a scribe (or sopher) might spend the whole day writing without finishing a single piece of parchment. The average sopher takes between nine months to a year to complete a Torah scroll.

  2. There are faint indentations in the parchment that the sopher used to ensure he was writing straight. We learned to write straight in grade school by writing our letters on top of lines on paper. But in biblical Hebrew, the sopher writes their letters below the line!

  3. Hebrew is read and written right to left (backwards from English). To keep the left margin crisp, the letters on the left can be stretched to fill space. This effect is used in different amounts throughout the text. The stretching is more noticeable in this next section:

Hebrew stretched letters in Torah scroll

Hebrew stretched letters in Torah scroll

And sometimes the sopher goes crazy and stretches all the things:

scribe stretched all the letters on this line of a Hebrew Torah

scribe stretched all the letters on this line of a Hebrew Torah

If you look at the pictures above carefully, you can see that only certain letters get stretched: ת ד ח ה ר ל. These letters look nice when stretched because they have a single horizontal stroke.

The next picture shows a fairly rare example of stretching the letter ש. It looks much less elegant than the other stretched letters:

stretching the shem letter in a Hebrew Torah scroll

stretching the shem letter in a Hebrew Torah scroll

Usually these stretched characters are considered mistakes. An experienced sopher evenly spaces the letters to fill the line exactly. But a novice sopher can’t predict their space usage as well. When they hit the end of the line and realize they can’t fit another full word, they’ll add one of these stretched characters to fill the space.

In certain sections, however, stretched lettering is expected. It is one of the signs of poetic text in the Torah. For example, in the following picture, the sopher intentionally stretched each line, even when they didn’t have to:

closeup of Torah scroll with cool calligraphy

closeup of Torah scroll with cool calligraphy

Keeping the left margin justified isn’t just about looks. The Torah is divided into thematic sections called parashot. There are two types of breaks separating parashot. The petuha (open) is a jagged edge, much like we end paragraphs in English. The setumah (closed) break is a long space in the middle of the line. The picture below shows both types of breaks:

Torah scroll containing a petuha and setumah parashah break

Torah scroll containing a petuha and setumah parashah break

A sopher takes these parashot divisions very seriously. If the sopher accidentally adds or removes parashot from the text, the entire scroll becomes non-kosher and cannot be used. A mistake like this would typically be fixed by removing the offending piece of parchment from the scroll, rewriting it, and adding the corrected version back in. (We’ll see some pictures of less serious mistakes at the bottom of this page.)

The vast majority of of the Torah is formatted as simple blocks of text. But certain sections must be formatted in special ways. This is a visual cue that the text is more poetic.

The passage below is of Numbers 10:35-36. Here we see an example of the inverted nun character being used to highlight some text. This is the only section of the Torah where this formatting appears (although it also appears seven times in the book of psalms). The inverted nun characters are set all by themselves, and surround a command about the Ark of the Covenant:

Moses gives a command about the ark of the covenant in fancy Hebrew script; inverted nun character

Moses gives a command about the ark of the covenant in fancy Hebrew script; inverted nun character

It’s really cool when two different scrolls have sections that overlap. We can compare them side-by-side to watch the development of different scribal traditions. The image below shows two versions of Numbers 6:22-27.

The lord bless you and keep you rendered in a Hebrew Torah in fancy Hebrew script

The lord bless you and keep you rendered in a Hebrew Torah in fancy Hebrew script

The writing is almost identical in both versions, with one exception. On the first line with special formatting, the left scroll has two words in the right column: אמור להם, but the right scroll only has the world אמור) להם is the last word on the previous line). When the sopher is copying a scroll, he does his best to preserve the formatting in these special sections. But due to the vast Jewish diaspora, small mistakes like this get made and propagate. Eventually they form entirely new scribal traditions. (Note that if a single letter is missing from a Torah, then the Torah is not kosher and is considered unfit for use. These scribal differences are purely stylistic.)

Many individual characters and words also receive special formatting throughout the text. Both images below come from the same piece of parchment (in Genesis 23) and were created by the same sopher. The image on the left shows the letter פ in its standard form, and the image on the right shows it in a modified form.

a whirled pe in the Hebrew Torah side by side with a normal pe

a whirled pe in the Hebrew Torah side by side with a normal pe

The meaning of these special characters is not fully known, and every scribal tradition exhibits some variation in what letters get these extra decorations. In the scroll above, the whirled פ appears only once. But some scrolls exhibit the special character dozens of times. Here is another example where you can see a whirled פ a few letters to the right of its normal version:

a whirled pe and normal pe in the Hebrew Torah in the same sentence

a whirled pe and normal pe in the Hebrew Torah in the same sentence

Another special marker is when dots are placed over the Hebrew letters. The picture below comes from the story when Esau is reconciling with Jacob in Genesis 33. Normally, the dotted word would mean that Esau kissed Jacob in reconciliation; but tradition states that these dots indicate that Esau was being incincere. Some rabbis say that this word, when dotted, could be more accurately translated as Esau “bit” Jacob.

dots above words on the Hebrew Torah

dots above words on the Hebrew Torah

Next, let’s take a look at God’s name written in many different styles. In Hebrew, God’s name is written יהוה. Christians often pronounce God’s name as Yahweh or Jehovah. Jews, however, never say God’s name. Instead, they say the word adonai, which means “lord.” In English old testaments, anytime you see the word Lord rendered in small caps, the Hebrew is actually God’s name. When writing in English, Jews will usually write God’s name as YHWH. Removing the vowels is a reminder to not say the name out loud.

Below are nine selected images of YHWH. Each comes from a different scroll and illustrates the decorations added by a different scribal tradition. A few are starting to fade from age, but they were the best examples I could find in the same style. The simplest letters are in the top left, and the most intricate in the bottom right. In the same scroll, YHWH is always written in the same style.

yahweh, jehova, the name of god, in many different Hebrew scripts

yahweh, jehova, the name of god, in many different Hebrew scripts

The next image shows the word YHWH at the end of the line. The ה letters get stretched just like in any other word. When I first saw this I was surprised a sopher would stretch the name of God like this—the name of God is taken very seriously and must be handled according to special rules. I can just imagine rabbis 300 years ago getting into heated debates about whether or not this was kosher!

stretched yahweh in Hebrew Torah scroll

stretched yahweh in Hebrew Torah scroll

There is another oddity in the image above. The letter yod (the small, apostrophe looking letter at the beginning of YHWH) appears in each line. But it is written differently in the last line. Here, it is given two tagin, but everywhere else it only has one. Usually, the sopher consistently applies the same styling throughout the scroll. Changes like this typically indicate the sopher is trying to emphasize some aspect of the text. Exactly what the changes mean, however, would depend on the specific scribal tradition.

The more general word for god in Hebrew is אלוהים, pronounced elohim. This word can refer to either YHWH or a non-Jewish god. Here it is below in two separate scrolls:

elohim, god, in Hebrew

elohim, god, in Hebrew

Many Christians, when they first learn Hebrew, get confused by the word elohim. The ending im on Hebrew words is used to make a word plural, much like the ending s in English. (For example, the plural of sopher is sophrim.) Christians sometimes claim that because the Hebrew word for god looks plural, ancient Jews must have believed in the Christian doctrine of the trinity. But this is very wrong, and rather offensive to Jews.

Tradition has that Moses is the sole author of the Torah, and that Jewish sophrim have given us perfect copies of Moses’ original manuscripts. Most modern scholars, however, believe in the documentary hypothesis, which challenges this tradition. The hypothesis claims that two different writers wrote the Torah. One writer always referenced God as YHWH, whereas the other always referenced God as elohim. The main evidence for the documentary hypothesis is that some stories in the Torah are repeated twice with slightly different details; in one version God is always called YHWH, whereas in the other God is always called elohim. The documentary hypothesis suggests that some later editor merged two sources together, but didn’t feel comfortable editing out the discrepancies, so left them exactly as they were. Orthodox Jews reject the documentary hypothesis, but some strains of Judaism and most Christian denominations are willing to consider that the hypothesis might be true. This controversy is a very important distinction between different Jewish sects, but most Christians aren’t even aware of the controversy in their holy book.

The next two pictures show common gramatical modifications of the words YHWH and elohim: they have letters attached to them in the front. The word YHWH below has a ל in front. This signifies that something is being done to YHWH or for YHWH. The word elohim has a ה in front. This signifies that we’re talking about the God, not just a god. In Hebrew, prepositions like “to,” “for,” and “the” are not separate words. They’re just letters that get attached to the words they modify.

lamed on yhwh adonai plus a he on elohium

lamed on yhwh adonai plus a he on elohium

Names are very important in Hebrew. Most names are actually phrases. The name Jacob, for example, means “heel puller.” Jacob earned his name because he was pulling the heel of his twin brother Esau when they were born in Genesis 25:26. Below are two different versions of the word Jacob:

the name jacob written in fancy Hebrew script; genesis 25:26

the name jacob written in fancy Hebrew script; genesis 25:26

But names often change in the book of Genesis. In fact, Jacob’s name is changed to Israel in two separate locations: first in Genesis 32 after Jacob wrestles with “a man”; then again in Genesis 35 after Jacob builds an alter to elohim. (This is one of the stories cited as evidence for the documentary hypothesis.) The name Israel is appropriate because it literally means “persevered with God.” The el at the end of Israel is a shortened form of elohim and is another Hebrew word for god.

Here is the name Israel in two different scripts:

Israel in Torah script Hebrew

Israel in Torah script Hebrew

Another important Hebrew name is ישוע. In Hebrew, this name is pronounced yeshua, but Christians commonly pronounce it Jesus! The name literally translates as “salvation.” That’s why the angel in Matthew 1:21 and Luke 1:31 gives Jesus this name. My scrolls are only of the old testament, so I don’t have any examples to show of Jesus’ name!

To wrap up the discussion of scribal writing styles, let’s take a look at the most common phrase in the Torah: ודבר יהוה אל משה. This translates to “and the Lord said to Moses.” Here is is rendered in three different styles:

vaydaber adonai lmosheh

vaydaber adonai lmosheh

vaydaber adonai lmosheh

vaydaber adonai lmosheh

vaydaber adonai lmosheh

vaydaber adonai lmosheh

Now let’s move on to what happens when the sophrim make mistakes.

Copying all of these intricate characters was exhausting work! And hard! So mistakes are bound to happen. But if even a single letter is wrong anywhere in the scroll, the entire scroll is considered unusable. The rules are incredibly strict, and this is why Orthodox Jews reject the documentary hypothesis. To them, it is simply inconceivable to use a version of the Torah that was combined from multiple sources.

The most common way to correct a mistake is to scratch off the outer layer of the parchment, removing the ink. In the picture below, the sopher has written the name Aaron (אהרן) over the scratched off parchment:

scribe mistake in Hebrew Torah scroll

scribe mistake in Hebrew Torah scroll

The next picture shows the end of a line. Because of the mistake, however, the sopher must write several characters in the margin of the text, ruining the nice sharp edge they created with the stretched characters. Writing that enters the margins like this is not kosher.

scribe mistake in Hebrew Torah scroll

scribe mistake in Hebrew Torah scroll

Sometimes a sopher doesn’t realize they’ve made a mistake until several lines later. In the picture below, the sopher has had to scratch off and replace three and a half lines:

scribe makes a big mistake and scratches off several lines in a Torah scroll

scribe makes a big mistake and scratches off several lines in a Torah scroll

Scratching the parchment makes it thinner and weaker. Occasionally the parchment is already very thin, and scratching would tear through to the other side. In this case, the sopher can take a thin piece of blank parchment and attach it to the surface. In the following picture, you can see that the attached parchment has a different color and texture.

parchment mistake added ontop Torah scroll

parchment mistake added ontop Torah scroll

The next picture shows a rather curious instance of this technique. The new parchment is placed so as to cover only parts of words on multiple lines. I can’t imagine how a sopher would make a mistake that would best be fixed in this manner. So my guess is that this patch was applied some time later, by a different sopher to repair some damage that had occurred to the scroll while it was in use.

parchment of repair added to the top of a Torah scroll

parchment of repair added to the top of a Torah scroll

Our last example of correcting mistakes is the most rare. Below, the sopher completely forgot a word when copying the scroll, and added it in superscript above the standard text:

superscript mistake fixing in Toral scroll

superscript mistake fixing in Toral scroll

If we zoom in, you can see that the superscript word is slightly more faded than the surrounding text. This might be because the word was discovered to be missing a long time (days or weeks) after the original text was written, so a different batch of ink was used to write the word.

superscript mistake fixing in Toral scroll

superscript mistake fixing in Toral scroll

Since these scrolls are several hundred years old, they’ve had plenty of time to accumulate damage. When stored improperly, the parchment can tear in some places and bunch up in others:

parchment Torah scroll damage

parchment Torah scroll damage

One of the worst things that can happen to a scroll is water. It damages the parchment and makes the ink run. If this happens, the scroll is ruined permanently.

water damage on a torah scroll

water damage on a torah scroll

you should learn Hebrew!

If you’ve read this far and enjoyed it, then you should learn biblical Hebrew. It’s a lot of fun! You can start right now at any of these great sites:

When you’re ready to get serious, you’ll need to get some books. The books that helped me the most were:

These books all have lots of exercises and make self study pretty simple. The Biblical Hebrew Workbook is for absolute beginners. Within the first few sessions you’re translating actual bible verses and learning the nuances that get lost in the process. I spent two days a week with this book, two hours at each session. It took about four months to finish.

The other two books start right where the workbook stops. They walk you through many important passages and even entire books of the old testament. After finishing these books, I felt comfortable enough to start reading the old testament by myself. Of course I was still very slow and was constantly looking things up in the dictionary!

For me, learning the vocabulary was the hardest part. I used a great free piece of software called FoundationStone to help. The program remembers which words you struggle with and quizes you on them more frequently.

Finally, let’s end with my favorite picture of them all. Here we’re looking down through a rolled up Torah scroll at one of my sandals.

torah sandals james bond

torah sandals james bond

August 10, 2014 12:00 AM

August 09, 2014

Edward Z. Yang

What’s a module system good for anyway?

This summer, I've been working at Microsoft Research implementing Backpack, a module system for Haskell. Interestingly, Backpack is not really a single monolothic feature, but, rather, an agglomeration of small, infrastructural changes which combine together in an interesting way. In this series of blog posts, I want to talk about what these individual features are, as well as how the whole is greater than the sum of the parts.

But first, there's an important question that I need to answer: What's a module system good for anyway? Why should you, an average Haskell programmer, care about such nebulous things as module systems and modularity. At the end of the day, you want your tools to solve specific problems you have, and it is sometimes difficult to understand what problem a module system like Backpack solves. As tomejaguar puts it: "Can someone explain clearly the precise problem that Backpack addresses? I've read the paper and I know the problem is 'modularity' but I fear I am lacking the imagination to really grasp what the issue is."

Look no further. In this blog post, I want to talk concretely about problems Haskellers have today, explain what the underlying causes of these problems are, and say why a module system could help you out.

The String, Text, ByteString problem

As experienced Haskellers are well aware, there are multitude of string types in Haskell: String, ByteString (both lazy and strict), Text (also both lazy and strict). To make matters worse, there is no one "correct" choice of a string type: different types are appropriate in different cases. String is convenient and native to Haskell'98, but very slow; ByteString is fast but are simply arrays of bytes; Text is slower but Unicode aware.

In an ideal world, a programmer might choose the string representation most appropriate for their application, and write all their code accordingly. However, this is little solace for library writers, who don't know what string type their users are using! What's a library writer to do? There are only a few choices:

  1. They "commit" to one particular string representation, leaving users to manually convert from one representation to another when there is a mismatch. Or, more likely, the library writer used the default because it was easy. Examples: base (uses Strings because it completely predates the other representations), diagrams (uses Strings because it doesn't really do heavy string manipulation).
  2. They can provide separate functions for each variant, perhaps identically named but placed in separate modules. This pattern is frequently employed to support both strict/lazy variants Text and ByteStringExamples: aeson (providing decode/decodeStrict for lazy/strict ByteString), attoparsec (providing Data.Attoparsec.ByteString/Data.Attoparsec.ByteString.Lazy), lens (providing Data.ByteString.Lazy.Lens/Data.ByteString.Strict.Lens).
  3. They can use type-classes to overload functions to work with multiple representations. The particular type class used hugely varies: there is ListLike, which is used by a handful of packages, but a large portion of packages simply roll their own. Examples: SqlValue in HDBC, an internal StringLike in tagsoup, and yet another internal StringLike in web-encodings.

The last two methods have different trade offs. Defining separate functions as in (2) is a straightforward and easy to understand approach, but you are still saying no to modularity: the ability to support multiple string representations. Despite providing implementations for each representation, the user still has to commit to particular representation when they do an import. If they want to change their string representation, they have to go through all of their modules and rename their imports; and if they want to support multiple representations, they'll still have to write separate modules for each of them.

Using type classes (3) to regain modularity may seem like an attractive approach. But this approach has both practical and theoretical problems. First and foremost, how do you choose which methods go into the type class? Ideally, you'd pick a minimal set, from which all other operations could be derived. However, many operations are most efficient when directly implemented, which leads to a bloated type class, and a rough time for other people who have their own string types and need to write their own instances. Second, type classes make your type signatures more ugly String -> String to StringLike s => s -> s and can make type inference more difficult (for example, by introducing ambiguity). Finally, the type class StringLike has a very different character from the type class Monad, which has a minimal set of operations and laws governing their operation. It is difficult (or impossible) to characterize what the laws of an interface like this should be. All-in-all, it's much less pleasant to program against type classes than concrete implementations.

Wouldn't it be nice if I could import String, giving me the type String and operations on it, but then later decide which concrete implementation I want to instantiate it with? This is something a module system can do for you! This Reddit thread describes a number of other situations where an ML-style module would come in handy.

(PS: Why can't you just write a pile of preprocessor macros to swap in the implementation you want? The answer is, "Yes, you can; but how are you going to type check the thing, without trying it against every single implementation?")

Destructive package reinstalls

Have you ever gotten this error message when attempting to install a new package?

$ cabal install hakyll
cabal: The following packages are likely to be broken by the reinstalls:
Use --force-reinstalls if you want to install anyway.

Somehow, Cabal has concluded that the only way to install hakyll is to reinstall some dependency. Here's one situation where a situation like this could come about:

  1. pandoc and Graphalyze are compiled against the latest unordered-containers-, which itself was compiled against the latest hashable-
  2. hakyll also has a dependency on unordered-containers and hashable, but it has an upper bound restriction on hashable which excludes the latest hashable version. Cabal decides we need to install an old version of hashable, say hashable-
  3. If hashable- is installed, we also need to build unordered-containers against this older version for Hakyll to see consistent types. However, the resulting version is the same as the preexisting version: thus, reinstall!

The root cause of this error an invariant Cabal currently enforces on a package database: there can only be one instance of a package for any given package name and version. In particular, this means that it is not possible to install a package multiple times, compiled against different dependencies. This is a bit troublesome, because sometimes you really do want the same package installed multiple times with different dependencies: as seen above, it may be the only way to fulfill the version bounds of all packages involved. Currently, the only way to work around this problem is to use a Cabal sandbox (or blow away your package database and reinstall everything, which is basically the same thing).

You might be wondering, however, how could a module system possibly help with this? It doesn't... at least, not directly. Rather, nondestructive reinstalls of a package are a critical feature for implementing a module system like Backpack (a package may be installed multiple times with different concrete implementations of modules). Implementing Backpack necessitates fixing this problem, moving Haskell's package management a lot closer to that of Nix's or NPM.

Version bounds and the neglected PVP

While we're on the subject of cabal-install giving errors, have you ever gotten this error attempting to install a new package?

$ cabal install hledger-0.18
Resolving dependencies...
cabal: Could not resolve dependencies:
# pile of output

There are a number of possible reasons why this could occur, but usually it's because some of the packages involved have over-constrained version bounds (especially upper bounds), resulting in an unsatisfiable set of constraints. To add insult to injury, often these bounds have no grounding in reality (the package author simply guessed the range) and removing it would result in a working compilation. This situation is so common that Cabal has a flag --allow-newer which lets you override the upper bounds of packages. The annoyance of managing bounds has lead to the development of tools like cabal-bounds, which try to make it less tedious to keep upper bounds up-to-date.

But as much as we like to rag on them, version bounds have a very important function: they prevent you from attempting to compile packages against dependencies which don't work at all! An under-constrained set of version bounds can easily have compiling against a version of the dependency which doesn't type check.

How can a module system help? At the end of the day, version numbers are trying to capture something about the API exported by a package, described by the package versioning policy. But the current state-of-the-art requires a user to manually translate changes to the API into version numbers: an error prone process, even when assisted by various tools. A module system, on the other hand, turns the API into a first-class entity understood by the compiler itself: a module signature. Wouldn't it be great if packages depended upon signatures rather than versions: then you would never have to worry about version numbers being inaccurate with respect to type checking. (Of course, versions would still be useful for recording changes to semantics not seen in the types, but their role here would be secondary in importance.) Some full disclosure is warranted here: I am not going to have this implemented by the end of my internship, but I'm hoping to make some good infrastructural contributions toward it.


If you skimmed the introduction to the Backpack paper, you might have come away with the impression that Backpack is something about random number generators, recursive linking and applicative semantics. While these are all true "facts" about Backpack, they understate the impact a good module system can have on the day-to-day problems of a working programmer. In this post, I hope I've elucidated some of these problems, even if I haven't convinced you that a module system like Backpack actually goes about solving these problems: that's for the next series of posts. Stay tuned!

by Edward Z. Yang at August 09, 2014 11:21 PM


7 Startups - part 5 - the XMPP backend

Note: I ran out of time weeks ago. I could never finish this serie as I envisionned, and I don’t see much free time on the horizon. Instead of letting this linger forever, here is a truncated conclusion. The previous episodes were :

  • Part 1 : probably the best episode, about the basic game types.
  • Part 2 : definition of the game rules in an unspecified monad.
  • Part 3 : writing an interpreter for the rules.
  • Part 4 : stumbling and failure in writing a clean backend system.

In the previous episode I added a ton of STM code and helper functions in several 15 minutes sessions. The result was not pretty, and left me dissatisfied.

For this episode, I decided to release my constraints. For now, I am only going to support the following :

  • The backend list will not be dynamic : a bunch of backends are going to be registered once, and it will be not be possible to remove an existing or add a previous backend once this is done.
  • The backends will be text-line based (XMPP and IRC are good protocols for this). This will unfortunately make it harder to write a nice web interface for the game too, but given how much time I can devote to this side-project this doesn’t matter much …

The MVC paradigm

A great man once said that “if you have category theory, everything looks like a pipe. Or a monad. Or a traversal. Or perhaps it’s a cosomething”. With the previously mentionned restrictions, I was able to shoehorn my problem in the shape of the mvc package, which I wanted to try for a while. It might be a bit different that what people usually expect when talking about the model - view - controller pattern, and is basically :

  • Some kind of pollable input (the controllers),
  • a pure stream based computation (the model), sporting an internal state and transforming the data coming from the inputs into something that is passed to …
  • … IO functions that run the actual effects (the views).

Each of these components can be reasoned about separately, and combined together in various ways.

There is however one obvious problem with this pattern, due to the way the game is modeled. Currently, the game is supposed to be able to receive data from the players, and to send data to them. It would need to live entirely in the model for this to work as expected, but the way it is currently written doesn’t make it obvious.

It might be possible to have the game be explicitely CPS, so that the pure part would run the game until communication with the players is required, which would translate nicely in an output that could be consumed by a view.

This would however require some refactoring and a lot of thinking, which I currently don’t have time for, so here is instead how the information flows :

Information flow

Here PInput and GInput are the type of the inputs (respectively from player and games). The blue boxes are two models that will be combined together. The pink ones are the type of outputs emitted from the models. The backends serve as drivers for player communication. The games run in their respective threads, and the game manager spawns and manages the game threads.

Comparison with the “bunch of STM functions” model

I originally started with a global TVar containing the state information of each players (for example if they are part of a game, still joining, due to answer to a game query, etc.). There were a bunch of “helper functions” that would manipulate the global state in a way that would ensure its consistency. The catch is that the backends were responsible for calling these helper functions at appropriate times and for not messing with the global state.

The MVC pattern forces the structure of your program. In my particular case, it means a trick is necessary to integrate it with the current game logic (that will be explained later). The “boSf” pattern is more flexible, but carries a higher cognitive cost.

With the “boSf” pattern, response to player inputs could be :

  • Messages to players, which fits well with the model, as it happened over STM channels, so the whole processing / state manipulation / player output could be of type Input -> STM ().
  • Spawning a game. This time we need forkIO and state manipulation. This means a type like c :: Input -> STM (IO ()), with a call like join (atomically (c input)).

Now there are helper functions that return an IO action, and some that don’t. When some functionnality is added, some functions need to start returning IO actions. This is ugly and makes it harder to extend.

Conclusion of the serie

Unfortunately I ran out of time for working on this serie a few weeks ago. The code is out, the game works and it’s fun. My original motivation for writing this post was as an exposure on basic type-directed design to my non-Haskeller friends, but I think it’s not approachable to non Haskellers, so I never shown them.

The main takeaways are :

Game rules

The game rules have first been written with an unspecified monad that exposed several functions required for user interaction. That’s the reason I started with defining a typeclass, that way I wouldn’t have to worry about implementing the “hard” part and could concentrate on writing the rules instead. For me, this was the fun part, and it was also the quickest.

As of the implementation of the aforementionned functions, I then used the operational package, that would let me write and “interpreter” for my game rules. One of them is pure, and used in tests. There are two other interpreters, one of them for the console version of the game, the other for the multi-backends system.

Backend system

The backends are, I think, easy to expand. Building the core of the multi-game logic with the mvc package very straightforward. It would be obvious to add an IRC backend to the XMPP one, if there weren’t that many IRC packages to choose from on hackage …

A web backend doesn’t seem terribly complicated to write, until you want to take into account some common web application constraints, such as having several redundant servers. In order to do so, the game interpreter should be explicitely turned into an explicit continuation-like system (with the twist it only returns on blocking calls) and the game state serialized in a shared storage system.


My main motivation was to show it was possible to eliminate tons of bug classes by encoding of the invariants in the type system. I would say this was a success.

The area where I expected to have a ton of problems was the card list. It’s a tedious manual process, but some tests weeded out most of the errors (it helps that there are some properties that can be verified on the deck). The other one was the XMPP message processing in its XML horror. It looks terrible.

The area where I wanted this process to work well was a success. I wrote the game rules in one go, without any feedback. Once they were completed, I wrote the backends and tested the game. It turned out they were very few bugs, especially when considering the fact that the game is a moderately complicated board game :

  • One of the special capabilities was replaced with another, and handled at the wrong moment in the game. This was quickly debugged.
  • I used traverse instead of both for tuples. I expected them to have the same result, and it “typechecked” because my tuple was of type (a,a), but the Applicative instance for tuples made it obvious this wasn’t the case. That took a bit longer to find out, as it impacted half of the military victory points, which are distributed only three times per game.
  • I didn’t listen to my own advice, and didn’t take the time to properly encode that some functions only worked with nonempty lists as arguments. This was also quickly found out, using quickcheck.

The game seems to run fine now. There is a minor rule bugs identified (the interaction between card-recycling abilities and the last turn for example), but I don’t have time to fix it.

There might be some interest with the types of the Hub, as they also encode a lot of invariants.

Also off-topic, but I really like using the lens vocabulary to encode the relationship between types these days. A trivial example can be found here.

The game

That might be the most important part. I played a score of games, and it was a lot of fun. The game is playable, and just requires a valid account on an XMPP server. Have fun !

August 09, 2014 08:48 AM

Yesod Web Framework

Deprecating yesod-platform

I want to deprecate the yesod-platform, and instead switch to Stackage server as the recommended installation method for Yesod for end users. To explain why, let me explain the purpose of yesod-platform, the problems I've encountered maintaining it, and how Stackage Server can fit in. I'll also explain some unfortunate complications with Stackage Server.

Why yesod-platform exists

Imagine a simpler Yesod installation path:

  1. cabal install yesod-bin, which provides the yesod executable.
  2. yesod init to create a scaffolding.
  3. cabal install inside that directory, which downloads and installs all of the necessary dependencies.

This in fact used to be the installation procedure, more or less. However, this led to a number of user problems:

  • Back in the earlier days of cabal-install, it was difficult for the dependency solver to find a build plan in this situation. Fortunately, cabal-install has improved drastically since then.
    • This does still happen occasionally, especially with packages with restrictive upper bounds. Using --max-backjumps=-1 usually fixes that.
  • It sometimes happens that an upstream package from Yesod breaks Yesod, either by changing an API accidentally, or by introducing a runtime bug.

This is where yesod-platform comes into play. Instead of leaving it up to cabal-install to track down a consistent build plan, it specifies exact versions of all depedencies to ensure a consistent build plan.

Conflicts with GHC deps/Haskell Platform

Yesod depends on aeson. So logically, yesod-platform should have a strict dependency on aeson. We try to always use the newest versions of dependencies, so today, that would be aeson == In turn, this demands text >= However, if you look at the Haskell Platform changelog, there's no version of the platform that provides a new enough version of text to support that constraint.

yesod-platform could instead specify an older version of aeson, but that would unnecessarily constrain users who aren't sticking to the Haskell Platform versions (which, in my experience, is the majority of users). This would also cause more dependency headaches down the road, as you'd now also need to force older versions of packages like criterion.

To avoid this conflict, yesod-platform has taken the approach of simply omitting constraints on any packages in the platform, as well as any packages with strict bounds on those packages. And if you look at yesod-platform today, you'll that there is no mention of aeson or text.

A similar issue pops up for packages that are a dependency of the GHC package (a.k.a., GHC-the-library). The primary problem there is the binary package. In this case, the allowed version of the package depends on which version of GHC is being used, not the presence or absence of the Haskell Platform.

This results in two problems:

  • It's very difficult to maintain this list of excluded packages correctly. I get large number of bug reports about these kinds of build plan problems.

  • We're giving up quite a bit of the guaranteed buildability that yesod-platform was supposed to provide. If aeson (as an example) doesn't work with yesod-form, yesod-platform won't be able to prevent such a build plan from happening.

There's also an issue with the inability to specify dependencies on executable-only packages, like alex, happy, and yesod-bin.

Stackage Server

Stackage Server solves exactly the same problem. It provides a consistent set of packages that can be installed together. Unlike yesod-platform, it can be distinguished based on GHC version. And it's far simpler to maintain. Firstly, I'm already maintaining Stackage Server full time. And secondly, all of the testing work is handled by a very automated process.

So here's what I'm proposing: I'll deprecate the yesod-platform package, and change the Yesod quickstart guide to have the following instructions:

  • Choose an appropriate Stackage snapshot from
  • Modify your cabal config file appropriately
  • cabal install yesod-bin alex happy
  • Use yesod init to set up a scaffolding
  • cabal install --enable-tests in the new directory

For users wishing to live on more of a bleeding edge, the option is always available to simply not use Stackage. Such a usage will give more control over package versions, but will also lack some stability.

The problems

There are a few issues that need to be ironed out.

  • cabal sandbox does not allow changing the remote-repo. Fortunately, Luite seems to have this solved, so hopefully this won't be a problem for long. Until then, you can either use a single Stackage snapshot for all your development, or use a separate sandboxing technology like hsenv.

  • Haskell Platform conflicts still exist. The problem I mentioned above with aeson and text is a real problem. The theoretically correct solution is to create a Stackage snapshot for GHC 7.8 + Haskell Platform. And if there's demand for that, I'll bite the bullet and do it, but it's not an easy bullet to bite. But frankly, I'm not hearing a lot of users saying that they want to peg Haskell Platform versions specifically.

    In fact, the only users who really seem to want to stick to Haskell Platform versions are Windows users, and the main reason for this is the complexity in installing the network package on Windows. I think there are three possible solutions to this issue, without forcing Windows users onto old versions of packages:

    1. Modify the network package to be easier to install on Windows. I really hope this has some progress. If this is too unstable to be included in the official Hackage release, we could instead have an experimental Stackage snapshot for Windows with that modification applied.
    2. Tell Windows users to simply bypass Stackage and yesod-platform, with the possibility of more build problems on that platform.
      • We could similarly recommend Windows users develop in a Linux virtual machine/Docker image.
    3. Provide a Windows distribution of GHC + cabal-install + network. With the newly split network/network-uri, this is a serious possibility.

Despite these issues, I think Stackage Server is a definite improvement on yesod-platform on Linux and Mac, and will likely still improve the situation on Windows, once we figure out the Haskell Platform problems.

I'm not making any immediate changes. I'd very much like to hear people using Yesod on various operating systems to see how these changes will affect them.

August 09, 2014 05:10 AM

August 08, 2014

Brent Yorgey

Maniac week

Inspired by Bethany Soule (and indirectly by Nick Winter, and also by the fact that my dissertation defense and the start of the semester are looming), I am planning a “maniac week” while Joyia and Noah will be at the beach with my family (I will join them just for the weekend). The idea is to eliminate as many distractions as possible and to do a ton of focused work. Publically committing (like this) to a time frame, ground rules, and to putting up a time-lapse video of it afterwards are what actually make it work—if I don’t succeed I’ll have to admit it here on my blog; if I waste time on Facebook the whole internet will see it in the video; etc. (There’s actually no danger of wasting time on Facebook in particular since I have it blocked, but you get the idea.)

Here are the rules:

  • I will start at 6pm (or thereabouts) on Friday, August 8.
  • I will continue until 10pm on Wednesday, August 13, with the exception of the morning of Sunday, August 10 (until 2pm).
  • I will get at least 7.5 hours of sleep each night.
  • I will not eat cereal for any meal other than breakfast.
  • I will reserve 3 hours per day for things like showering, eating, and just plain resting.  Such things will be tracked by the TagTime tag “notwork”.
  • I will spend the remaining 13.5 hours per day working productively. Things that will count as productive work:
    • Working on my dissertation
    • Course prep for CS 354 (lecture and assignment planning, etc.) and CS 134 (reading through the textbook); making anki decks with names and faces for both courses
    • Updating my academic website (finish converting to Hakyll 4; add potential research and independent study topics for undergraduates)
    • Processing FogBugz tickets
    • I may work on other research or coding projects (e.g. diagrams) each day, but only after spending at least 7 hours on my dissertation.
  • I will not go on IRC at all during the week.  I will disable email notifications on my phone (but keep the phone around for TagTime), and close and block gmail in my browser.  I will also disable the program I use to check my UPenn email account.
  • For FogBugz tickets which require responding to emails, I will simply write the email in a text file and send it later.
  • I may read incoming email and write short replies on my phone, but will keep it to a bare minimum.
  • I will not read any RSS feeds during the week.  I will block feedly in my browser.
  • On August 18 I will post a time-lapse video of August 8-13.  I’ll probably also write a post-mortem blog post, if I feel like I have anything interesting to say.
  • I reserve the right to tweak these rules (by editing this post) up until August 8 at 6pm.  After that point it’s shut up and work time, and I cannot change the rules any more.

And no, I’m not crazy. You (yes, you) could do this too.

by Brent at August 08, 2014 07:43 PM

Bryan O'Sullivan

criterion 1.0

<html xmlns=""> <head> <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/> <meta content="text/css" http-equiv="Content-Style-Type"/> <meta content="pandoc" name="generator"/> <style type="text/css">code{white-space: pre;}</style> </head> <body>

Almost five years after I initially released criterion, I'm delighted to announce a major release with a large number of appealing new features.

As always, you can install the latest goodness using cabal install criterion, or fetch the source from github.

Please let me know if you find criterion useful!

New documentation

I built both a home page and a thorough tutorial for criterion. I've also extended the inline documentation and added a number of new examples.

All of the documentation lives in the github repo, so if you'd like to see something improved, please send a bug report or pull request.

New execution engine

Criterion's model of execution has evolved, becoming vastly more reliable and accurate. It can now measure events that take just a few hundred picoseconds.

benchmarking return ()
time                 512.9 ps   (512.8 ps .. 513.1 ps)

While almost all of the core types have changed, criterion should remain API-compatible with the vast majority of your benchmarking code.

New metrics

In addition to wall-clock time, criterion can now measure and regress on the following metrics:

  • CPU time
  • CPU cycles
  • bytes allocated
  • number of garbage collections
  • number of bytes copied during GC
  • wall-clock time spent in mutator threads
  • CPU time spent running mutator threads
  • wall-clock time spent doing GC
  • CPU time spent doing GC

Linear regression

Criterion now supports linear regression of a number of metrics.

Here's a regression conducted using --regress cycles:iters:

cycles:              1.000 R²   (1.000 R² .. 1.000 R²)
  iters              47.718     (47.657 .. 47.805)

The first line of the output is the R² goodness-of-fit measure for this regression, and the second is the number of CPU cycles (measured using the rdtsc instruction) to execute the operation in question (integer division).

This next regression uses --regress allocated:iters to measure the number of bytes allocated while constructing an IntMap of 40,000 values.

allocated:           1.000 R²   (1.000 R² .. 1.000 R²)
  iters              4.382e7    (4.379e7 .. 4.384e7)

(That's a little under 42 megabytes.)

New outputs

While its support for active HTML has improved, criterion can also now output JSON and JUnit XML files.

New internals

Criterion has received its first spring cleaning, and is much easier to understand as a result.


I was inspired into some of this work by the efforts of the authors of the OCaml Core_bench package.

</body> </html>

by Bryan O'Sullivan at August 08, 2014 10:02 AM

August 07, 2014

wren gayle romano

On being the "same" or "different": Introduction to Apartness

Meanwhile, back in math land... A couple-few months ago I was doing some work on apartness relations. In particular, I was looking into foundational issues, and into what an apartness-based (rather than equality-based) dependently-typed programming language would look like. Unfortunately, too many folks think "constructive mathematics" only means BHK-style intuitionistic logic— whereas constructive mathematics includes all sorts of other concepts, and they really should be better known!

So I started writing a preamble post, introducing the basic definitions and ideas behind apartnesses, and... well, I kinda got carried away. Instead of a blog post I kinda ended up with a short chapter. And then, well, panic struck. In the interests of Publish Ever, Publish Often, I thought I might as well share it: a brief introduction to apartness relations. As with my blog posts, I'm releasing it under Creative Commons Attribution-NonCommercial-NoDerivs 4.0; so feel free to share it and use it for classes. But, unlike the other columbicubiculomania files, it is not ShareAlike— since I may actually turn it into a published chapter someday. So do respect that. And if you have a book that needs some chapters on apartness relations, get in touch!

The intro goes a little something like this:

We often talk about values being "the same as" or "different from" one another. But how can we formalize these notions? In particular, how should we do so in a constructive setting?

Constructively, we lack a general axiom for double-negation elimination; therefore, every primitive notion gives rise to both strong (strictly positive) and weak (doubly-negated) propositions. Thus, from the denial of (weak) difference we can only conclude weak sameness. Consequently, in the constructive setting it is often desirable to take difference to be a primitive— so that, from the denial of strong difference we can in fact conclude strong sameness.

This ability "un-negate" sameness is the principal reason for taking difference to be one of our primitive notions. While nice in and of itself, it also causes the strong and weak notions of sameness to become logically equivalent (thm 1.4); enabling us to drop the qualifiers when discussing sameness.

But if not being different is enough to be considered the same, then do we still need sameness to be primitive? To simplify our reasoning, we may wish to have sameness be defined as the lack of difference. However, this is not without complications. Sameness has been considered a natural primitive for so long that it has accrued many additional non-propositional properties (e.g., the substitution principle). So, if we eliminate the propositional notion of primitive equality, we will need somewhere else to hang those coats.

The rest of the paper fleshes out these various ideas.

comment count unavailable comments

August 07, 2014 09:51 AM

Oliver Charles

Working with postgresql-simple with generics-sop

The least interesting part of my job as a programmer is the act of pressing keys on a keyboard, and thus I actively seek ways to reduce typing. As programmers, we aim for reuse in a our programs - abstracting commonality into reusable functions such that our programs get more concise. Functional programmers are aware of the benefits of higher-order functions as one form of generic programming, but another powerful technique is that of data type generic programming.

This variant of generic programming allows one to build programs that work over arbitrary data types, providing they have some sort of known “shape”. We describe the shape of data types by representing them via a code - often we can describe a data type as a sum of products. By sum, we are talking about the choice of a constructor in a data type (such as choosing between Left and Right to construct Either values), and by product we mean the individual fields in a constructor (such as the individual fields in a record).

Last month, Edsko and Löh announced a new library for generic programming: generics-sop. I’ve been playing with this library in the last couple of days, and I absolutely love the approach. In today’s short post, I want to demonstrate how easy it is to use this library. I don’t plan to go into a lot of detail, but I encourage interested readers to check out the associated paper - True Sums of Products - a paper with a lovely balance of theory and a plethora of examples.


When working with postgresql-simple, one often defines records and corresponding FromRow and ToRow instances. Let’s assume we’re modelling a library. No library is complete without books, so we might begin with a record such as:

data Book = Book
  { bookTitle :: Text
  , bookAuthor :: Text
  , bookISBN :: ISBN
  , bookPublishYear :: Int

In order to store and retrieve these in our database, we need to write the following instances:

instance FromRow Book where
  toRow = Book <$> field <*> field <*> field <*> field

instance ToRow Book where
  toRow Book{..} =
    [ toField bookTitle
    , toField bookAuthor
    , toField bookISBN
    , toField bookPublishYear

As you can see - that’s a lot of boilerplate. In fact, it’s nearly twice as much code as the data type itself! The definitions of these instances are trivial, so it’s frustrating that I have to manually type the implementation bodies by hand. It’s here that we turn to generics-sop.

First, we’re going to need a bit of boiler-plate in order to manipulate Books generically:

data Book = ...
  deriving (GHC.Generics.Generic)

instance Generics.SOP.Generic Book

We derive generic representations of our Book using GHC.Generics, and in turn use this generic representation to derive the Generics.SOP.Generic instance. With this out of the way, we’re ready to work with Books in a generic manner.


The generics-sop library works by manipulating heterogeneous lists of data. If we look at our Book data type, we can see that the following two are morally describing the same data:

book = Book "Conceptual Mathematics" "Lawvere, Schanuel" "978-0-521-71916-2" 2009
book = [ "Conceptual Mathematics", "Lawvere, Schanuel", "978-0-521-71916-2", 2009 ]

Of course, we can’t actually write such a thing in Haskell - lists are required to have all their elements of the same type. However, using modern GHC extensions, we can get very close to modelling this:

data HList :: [*] -> * where
  Nil :: HList '[]
  (:*) :: x -> HList xs -> HList (x ': xs)

book :: HList '[Text, Text, ISBN, Int]
book = "Conceptual Mathematics"
    :* "Lawvere, Schanuel"
    :* "978-0-521-71916-2"
    :* 2009
    :* Nil

Once we begin working in this domain, a lot of the techniques we’re already familiar with continue fairly naturally. We can map over these lists, exploit their applicative functor-like structure, fold them, and so on.

generics-sop continues in the trend, using kind polymorphism and a few other techniques to maximise generality. We can see what exactly is going on with generics-sop if we ask GHCI for the :kind! of Book’s generic Code:

> :kind! Code Book
Code Book = SOP I '[ '[ Text, Text, ISBN, Int ] ]

The list of fields is contained within another list of all possible constructors - as Book only has one constructor, thus there is only one element in the outer list.

FromRow, Generically

How does this help us solve the problem of our FromRow and ToRow instances? First, let’s think about what’s happening when we write instances of FromRow. Our Book data type has four fields, so we need to use field four times. field has side effects in the RowParser functor, so we sequence all of these calls using applicative syntax, finally applying the results to the Book constructor.

Now that we’ve broken the problem down, we’ll start by solving our first problem - calling field the correct number of times. Calling field means we need to have an instance of FromField for each field in a constructor, so to enforce this, we can use All to require all fields have an instance of a type class. We also use a little trick with Proxy to specify which type class we need to use. We combine all of this with hcpure, which is a variant of pure that can be used to build a product:

fields :: (All FromField xs, SingI xs) => NP RowParser xs
fields = hcpure fromField field
  where fromField = Proxy :: Proxy FromField

So far, we’ve built a product of field calls, which you can think of as being a list of RowParsers - something akin to [RowParser ..]. However, we need a single row parser returning multiple values, which is more like RowParser [..]. In the Prelude we have a function to sequence a list of monadic actions:

sequence :: Monad m => [m a] -> m [a]

There is an equivalent in generics-sop for working with heterogeneous lists - hsequence. Thus if we hsequence our fields, we build a single RowParser that returns a product of values:

fields :: (All FromField xs, SingI xs) => RowParser (NP I xs)
fields = hsequence (hcpure fromField field)
  where fromField = Proxy :: Proxy FromField

(I is the “do nothing” identity functor).

Remarkably, these few lines of code are enough to construct data types. All we need to do is embed this product in a constructor of a sum, and then switch from the generic representation to a concrete data type. We’ll restrict ourselves to data types that have only one constructor, and this constraint is mentioned in the type below (Code a ~ '[ xs ] forces a to have only one constructor):

  :: (All FromField xs, Code a ~ '[xs], SingI xs, Generic a)
  => RowParser a
gfrowRow = to . SOP . Z <$> hsequence (hcpure fromField field)
  where fromField = Proxy :: Proxy FromField

That’s all there is to it! No type class instances, no skipping over meta-data - we just build a list of field calls, sequence them, and turn the result into our data type.

ToRow, Generically

It’s not hard to apply the same ideas for ToRow. Recall the definition of ToRow:

class ToRow a where
  toRow :: a -> [Action]

toRow takes a value of type a and turns it into a list of actions. Usually, we have one action for each field - we just call toField on each field in the record.

To work with data generically, we first need move from the original data type to its generic representation, which we can do with from and a little bit of pattern matching:

gtoRow :: (Generic a, Code a ~ '[xs]) => a -> [Action]
gtoRow a =
  case from a of
    SOP (Z xs) -> _

Here we pattern match into the fields of the first constructor of the data type. xs is now a product of all fields, and we can begin turning into Actions. The most natural way to do this is simply to map toField over each field, collecting the resulting Actions into a list. That is, we’d like to do:

map toField xs

That’s not quite possible in generics-sop, but we can get very close. Using hcliftA, we can lift a method of a type class over a heterogeneous list:

gtoRow :: (Generic a, Code a ~ '[xs], All ToField xs, SingI xs) => a -> [Action]
gtoRow a =
  case from a of
    SOP (Z xs) -> _ (hcliftA toFieldP (K . toField . unI) xs)

  where toFieldP = Proxy :: Proxy ToField

We unwrap from the identity functor I, call toField on the value, and then pack this back up using the constant functor K. The details here are a little subtle, but essentially this moves us from a heterogeneous list to a homogeneous list, where each element of the list is an Action. Now that we have a homogeneous list, we can switch back to a more basic representation by collapsing the structure with hcollapse:

gtoRow :: (Generic a, Code a ~ '[xs], All ToField xs, SingI xs) => a -> [Action]
gtoRow a =
  case from a of
    SOP (Z xs) -> hcollapse (hcliftA toFieldP (K . toField . unI) xs)

  where toFieldP = Proxy :: Proxy ToField

Admittedly this definition is a little more complicated than one might hope, but it’s still extremely concise and declarative - there’s only a little bit of noise added. However, again we should note there was no need to write type class instances, perform explicit recursion or deal with meta-data - generics-sop stayed out of way and gave us just what we needed.


Now that we have gfromRow and gtoRow it’s easy to extend our application. Perhaps we now want to extend our database with Author objects. We’re now free to do so, with minimal boiler plate:

data Book = Book
  { bookId :: Int
  , bookTitle :: Text
  , bookAuthorId :: Int
  , bookISBN :: ISBN
  , bookPublishYear :: Int
  } deriving (GHC.Generics.Generic)

instance Generic.SOP.Generic Book
instance FromRow Book where fromRow = gfromRow
instance ToRow Book where toRow = gtoRow

data Author = Author
  { authorId :: Int
  , authorName :: Text
  , authorCountry :: Country
  } deriving (GHC.Generics.Generic)

instance Generic.SOP.Generic Author
instance FromRow Author where fromRow = gfromRow
instance ToRow Author where toRow = gtoRow

generics-sop is a powerful library for dealing with data generically. By using heterogeneous lists, the techniques we’ve learnt at the value level naturally extend and we can begin to think of working with generic data in a declarative manner. For me, this appeal to familiar techniques makes it easy to dive straight in to writing generic functions - I’ve already spent time learning to think in maps and folds, it’s nice to see the ideas transfer to yet another problem domain.

generics-sop goes a lot further than we’ve seen in this post. For more real-world examples, see the links at the top of the generics-sop Hackage page.

August 07, 2014 12:00 AM

August 06, 2014

Yesod Web Framework

Announcing auto-update

Kazu and I are happy to announce the first release of auto-update, a library to run update actions on a given schedule. To make it more concrete, let's start with a motivating example.

Suppose you're writing a web service which will return the current time. This is simple enough with WAI and Warp, e.g.:

{-# LANGUAGE OverloadedStrings #-}
import           Data.ByteString.Lazy.Char8 (pack)
import           Data.Time                  (formatTime, getCurrentTime)
import           Network.HTTP.Types         (status200)
import           Network.Wai                (responseLBS)
import           Network.Wai.Handler.Warp   (run)
import           System.Locale              (defaultTimeLocale)

main :: IO ()
main =
    run 3000 app
    app _ respond = do
        now <- getCurrentTime
        respond $ responseLBS status200 [("Content-Type", "text/plain")]
                $ pack $ formatTime defaultTimeLocale "%c" now

This is all well and good, but it's a bit inefficient. Imagine if you have a thousand requests per second (some people really like do know what time it is). We will end up recalculating the string representation of the time a 999 extra times than is necessary! To work around this, we have a simple solution: spawn a worker thread to calculate the time once per second. (Note: it will actually calculate it slightly less than once per second due to the way threadDelay works; we're assuming we have a little bit of latitude in returning a value thats a few milliseconds off.)

{-# LANGUAGE OverloadedStrings #-}
import           Control.Concurrent         (forkIO, threadDelay)
import           Control.Monad              (forever)
import           Data.ByteString.Lazy.Char8 (ByteString, pack)
import           Data.IORef                 (newIORef, readIORef, writeIORef)
import           Data.Time                  (formatTime, getCurrentTime)
import           Network.HTTP.Types         (status200)
import           Network.Wai                (responseLBS)
import           Network.Wai.Handler.Warp   (run)
import           System.Locale              (defaultTimeLocale)

getCurrentTimeString :: IO ByteString
getCurrentTimeString = do
    now <- getCurrentTime
    return $ pack $ formatTime defaultTimeLocale "%c" now

main :: IO ()
main = do
    timeRef <- getCurrentTimeString >>= newIORef
    _ <- forkIO $ forever $ do
        threadDelay 1000000
        getCurrentTimeString >>= writeIORef timeRef
    run 3000 (app timeRef)
    app timeRef _ respond = do
        time <- readIORef timeRef
        respond $ responseLBS status200 [("Content-Type", "text/plain")] time

Now we will calculate the current time once per second, which is far more efficient... right? Well, it depends on server load. Previously, we talked about a server getting a thousand requests per second. Let's instead reverse it: a server that gets one request every thousand seconds. In that case, our optimization turns into a pessimization.

This problem doesn't just affect getting the current time. Another example is flushing logs. A hot web server could be crippled by flushing logs to disk on every request, whereas flushing once a second on a less popular server simply keeps the process running for no reason. One option is to put the power in the hands of users of a library to decide how often to flush. But often times, we won't know until runtime how frequently a service will be requested. Or even more complicated: traffic will come in spikes, with both busy and idle times.

(Note that I've only given examples of running web servers, though I'm certain there are plenty of other examples out there to draw from.)

This is the problem that auto-update comes to solve. With auto-update, you declare an update function, a frequency with which it should run, and a threshold at which it should "daemonize". The first few times you request a value, it's calculated in the main thread. Once you cross the daemonize threshold, a dedicated worker thread is spawned to recalculate the value. If the value is not requested during an update period, the worker thread is shut down, and we go back to the beginning.

Let's see how our running example works out with this:

{-# LANGUAGE OverloadedStrings #-}
import           Control.AutoUpdate         (defaultUpdateSettings,
                                             mkAutoUpdate, updateAction)
import           Data.ByteString.Lazy.Char8 (ByteString, pack)
import           Data.Time                  (formatTime, getCurrentTime)
import           Network.HTTP.Types         (status200)
import           Network.Wai                (responseLBS)
import           Network.Wai.Handler.Warp   (run)
import           System.Locale              (defaultTimeLocale)

getCurrentTimeString :: IO ByteString
getCurrentTimeString = do
    now <- getCurrentTime
    return $ pack $ formatTime defaultTimeLocale "%c" now

main :: IO ()
main = do
    getTime <- mkAutoUpdate defaultUpdateSettings
        { updateAction = getCurrentTimeString
    run 3000 (app getTime)
    app getTime _ respond = do
        time <- getTime
        respond $ responseLBS status200 [("Content-Type", "text/plain")] time

If you want to see the impact of this change, add a putStrLn call to getCurrentTimeString and make a bunch of requests to the service. You should see just one request per second, once you get past that initial threshold period (default of 3).

Kazu and I have started using this library in a few places:

  • fast-logger no longer requires explicit flushing; it's handled for you automatically.
  • wai-logger and wai-extra's request logger, by extension, inherit this functionality.
  • Warp no longer has a dedicated thread for getting the current time.
  • The Yesod scaffolding was able to get rid of an annoying bit of commentary.

Hopefully others will enjoy and use this library as well.


The second module in auto-update is Control.Reaper. This provides something similar, but slightly different, from Control.AutoUpdate. The goal is to spawn reaper/cleanup threads on demand. These threads can handle such things as:

  • Recycling resources in a resource pool.
  • Closing out unused connections in a connection pool.
  • Terminating threads that have overstayed a timeout.

This module is currently being used in Warp for slowloris timeouts and file descriptor cache management, though I will likely use it in http-client in the near future as well for its connection manager management.

August 06, 2014 07:10 AM

Dominic Steinitz

Fun with (Kalman) Filters Part II


Suppose we have particle moving in at constant velocity in 1 dimension, where the velocity is sampled from a distribution. We can observe the position of the particle at fixed intervals and we wish to estimate its initial velocity. For generality, let us assume that the positions and the velocities can be perturbed at each interval and that our measurements are noisy.

A point of Haskell interest: using type level literals caught a bug in the mathematical description (one of the dimensions of a matrix was incorrect). Of course, this would have become apparent at run-time but proof checking of this nature is surely the future for mathematicians. One could conceive of writing an implementation of an algorithm or proof, compiling it but never actually running it purely to check that some aspects of the algorithm or proof are correct.

The Mathematical Model

We take the position as x_i and the velocity v_i:

\displaystyle   \begin{aligned}  x_i &= x_{i-1} + \Delta T v_{i-1} + \psi^{(x)}_i \\  v_i &= v_{i-1} + \psi^{(v)}_i \\  y_i &= a_i x_i + \upsilon_i  \end{aligned}

where \psi^{(x)}_i, \psi^{(v)}_i and \upsilon_i are all IID normal with means of 0 and variances of \sigma^2_x, \sigma^2_v and \sigma^2_y

We can re-write this as

\displaystyle   \begin{aligned}  \boldsymbol{x}_i &= \boldsymbol{A}_{i-1}\boldsymbol{x}_{i-1} + \boldsymbol{\psi}_{i-1} \\  \boldsymbol{y}_i &= \boldsymbol{H}_i\boldsymbol{x}_i + \boldsymbol{\upsilon}_i  \end{aligned}


\displaystyle   \boldsymbol{A}_i =    \begin{bmatrix}      1 & \Delta T\\      0 & 1\\    \end{bmatrix}  ,\quad  \boldsymbol{H}_i =    \begin{bmatrix}      a_i & 0 \\    \end{bmatrix}  ,\quad  \boldsymbol{\psi}_i \sim {\cal{N}}\big(0,\boldsymbol{\Sigma}^{(x)}_i\big)  ,\quad  \boldsymbol{\Sigma}^{(x)}_i =    \begin{bmatrix}      \sigma^2_{x} & 0\\      0 & \sigma^2_{v} \\    \end{bmatrix}  ,\quad  \boldsymbol{\upsilon}_i \sim {\cal{N}}\big(0,\boldsymbol{\Sigma}^{(y)}_i\big)  ,\quad  \boldsymbol{\Sigma}^{(y)}_i =    \begin{bmatrix}      \sigma^2_{z} \\    \end{bmatrix}

Let us denote the mean and variance of \boldsymbol{X}_i\,\vert\,\boldsymbol{Y}_{i-1} as \hat{\boldsymbol{x}}^\flat_i and \hat{\boldsymbol{\Sigma}}^\flat_i respectively and note that

\displaystyle   \begin{aligned}  {\boldsymbol{Y}_i}\,\vert\,{\boldsymbol{Y}_{i-1}} =  {\boldsymbol{H}_i\boldsymbol{X}_i\,\vert\,{\boldsymbol{Y}_{i-1}} + \boldsymbol{\Upsilon}_i}\,\vert\,{\boldsymbol{Y}_{i-1}} =  {\boldsymbol{H}_i\boldsymbol{X}_i\,\vert\,{\boldsymbol{Y}_{i-1}} + \boldsymbol{\Upsilon}_i}  \end{aligned}

Since {\boldsymbol{X}_i}\,\vert\,{\boldsymbol{Y}_{i-1}} and {\boldsymbol{Y}_i}\,\vert\,{\boldsymbol{Y}_{i-1}} are jointly Gaussian and recalling that ({\hat{\boldsymbol{\Sigma}}^\flat_i})^\top = \hat{\boldsymbol{\Sigma}}^\flat_i as covariance matrices are symmetric, we can calculate their mean and covariance matrix as

\displaystyle   \begin{bmatrix}      \hat{\boldsymbol{x}}^\flat_i \\      \boldsymbol{H}_i\hat{\boldsymbol{x}}^\flat_i  \end{bmatrix}  ,\quad  \begin{bmatrix}      \hat{\boldsymbol{\Sigma}}^\flat_i & \hat{\boldsymbol{\Sigma}}^\flat_i \boldsymbol{H}_i^\top \\       \boldsymbol{H}_i \hat{\boldsymbol{\Sigma}}^\flat_i & \boldsymbol{H}_i \hat{\boldsymbol{\Sigma}}^\flat_i \boldsymbol{H}_i^\top + \boldsymbol{\Sigma}^{(y)}_i \\  \end{bmatrix}

We can now use standard formulæ which say if

\displaystyle   \begin{bmatrix}      \boldsymbol{X} \\      \boldsymbol{Y}  \end{bmatrix}  \sim  {\cal{N}}  \begin{bmatrix}  \begin{bmatrix}      \boldsymbol{\mu}_x \\      \boldsymbol{\mu}_y  \end{bmatrix}  &  ,  &  \begin{bmatrix}      \boldsymbol{\Sigma}_x & \boldsymbol{\Sigma}_{xy} \\      \boldsymbol{\Sigma}^\top_{xy} & \boldsymbol{\Sigma}_y  \end{bmatrix}  \end{bmatrix}


\displaystyle   \boldsymbol{X}\,\vert\,\boldsymbol{Y}=\boldsymbol{y} \sim {{\cal{N}}\big( \boldsymbol{\mu}_x + \boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}^{-1}_y(\boldsymbol{y} - \boldsymbol{\mu}_y) , \boldsymbol{\Sigma}_x - \boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}^{-1}_y\boldsymbol{\Sigma}^\top_{xy}\big)}

and apply this to

\displaystyle   (\boldsymbol{X}_i\,\vert\, \boldsymbol{Y}_{i-1})\,\vert\,(\boldsymbol{Y}_i\,\vert\, \boldsymbol{Y}_{i-1})

to give

\displaystyle   \boldsymbol{X}_i\,\vert\, \boldsymbol{Y}_{i} = \boldsymbol{y}_i  \sim  {{\cal{N}}\big( \hat{\boldsymbol{x}}^\flat_i + \hat{\boldsymbol{\Sigma}}^\flat_i \boldsymbol{H}_i^\top  \big(\boldsymbol{H}_i \hat{\boldsymbol{\Sigma}}^\flat_i \boldsymbol{H}_i^\top + \boldsymbol{\Sigma}^{(y)}_i\big)^{-1}  (\boldsymbol{y}_i - \boldsymbol{H}_i\hat{\boldsymbol{x}}^\flat_i) , \hat{\boldsymbol{\Sigma}}^\flat_i - \hat{\boldsymbol{\Sigma}}^\flat_i \boldsymbol{H}_i^\top(\boldsymbol{H}_i \hat{\boldsymbol{\Sigma}}^\flat_i \boldsymbol{H}_i^\top + \boldsymbol{\Sigma}^{(y)}_i)^{-1}\boldsymbol{H}_i \hat{\boldsymbol{\Sigma}}^\flat_i\big)}

This is called the measurement update; more explicitly

\displaystyle   \begin{aligned}  \hat{\boldsymbol{x}}^i &\triangleq  \hat{\boldsymbol{x}}^\flat_i +  \hat{\boldsymbol{\Sigma}}^\flat_i  \boldsymbol{H}_i^\top  \big(\boldsymbol{H}_i \hat{\boldsymbol{\Sigma}}^\flat_i \boldsymbol{H}_i^\top + \boldsymbol{\Sigma}^{(y)}_i\big)^{-1}  (\boldsymbol{y}_i - \boldsymbol{H}_i\hat{\boldsymbol{x}}^\flat_i) \\  \hat{\boldsymbol{\Sigma}}_i &\triangleq  {\hat{\boldsymbol{\Sigma}}^\flat_i - \hat{\boldsymbol{\Sigma}}^\flat_i \boldsymbol{H}_i^\top(\boldsymbol{H}_i \hat{\boldsymbol{\Sigma}}^\flat_i \boldsymbol{H}_i^\top + \boldsymbol{\Sigma}^{(y)}_i)^{-1}\boldsymbol{H}_i \hat{\boldsymbol{\Sigma}}^\flat_i}  \end{aligned}

Sometimes the measurement residual \boldsymbol{v}_i, the measurement prediction covariance \boldsymbol{S}_i and the filter gain \boldsymbol{K}_i are defined and the measurement update is written as

\displaystyle   \begin{aligned}  \boldsymbol{v}_i & \triangleq  \boldsymbol{y}_i - \boldsymbol{H}_i\hat{\boldsymbol{x}}^\flat_i \\  \boldsymbol{S}_i & \triangleq  \boldsymbol{H}_i \hat{\boldsymbol{\Sigma}}^\flat_i  \boldsymbol{H}_i^\top + \boldsymbol{\Sigma}^{(y)}_i \\  \boldsymbol{K}_i & \triangleq \hat{\boldsymbol{\Sigma}}^\flat_i  \boldsymbol{H}_i^\top\boldsymbol{S}^{-1}_i \\  \hat{\boldsymbol{x}}^i &\triangleq \hat{\boldsymbol{x}}^\flat_i + \boldsymbol{K}_i\boldsymbol{v}_i \\  \hat{\boldsymbol{\Sigma}}_i &\triangleq \hat{\boldsymbol{\Sigma}}^\flat_i - \boldsymbol{K}_i\boldsymbol{S}_i\boldsymbol{K}^\top_i  \end{aligned}

We further have that

\displaystyle   \begin{aligned}  {\boldsymbol{X}_i}\,\vert\,{\boldsymbol{Y}_{i-1}} =  {\boldsymbol{A}_i\boldsymbol{X}_{i-1}\,\vert\,{\boldsymbol{Y}_{i-1}} + \boldsymbol{\Psi}_{i-1}}\,\vert\,{\boldsymbol{Y}_{i-1}} =  {\boldsymbol{A}_i\boldsymbol{X}_{i-1}\,\vert\,{\boldsymbol{Y}_{i-1}} + \boldsymbol{\Psi}_i}  \end{aligned}

We thus obtain the Kalman filter prediction step:

\displaystyle   \begin{aligned}  \hat{\boldsymbol{x}}^\flat_i &=  \boldsymbol{A}_{i-1}\hat{\boldsymbol{x}}_{i-1} \\  \hat{\boldsymbol{\Sigma}}^\flat_i &= \boldsymbol{A}_{i-1}                                       \hat{\boldsymbol{\Sigma}}_{i-1}                                       \boldsymbol{A}_{i-1}^\top                                     + \boldsymbol{\Sigma}^{(x)}_{i-1}  \end{aligned}

Further information can be found in (Boyd 2008), (Kleeman 1996) and (Särkkä 2013).

A Haskell Implementation

The hmatrix now uses type level literals via the DataKind extension in ghc to enforce compatibility of matrix and vector operations at the type level. See here for more details. Sadly a bug in the hmatrix implementation means we can’t currently use this excellent feature and we content ourselves with comments describing what the types would be were it possible to use it.

> {-# OPTIONS_GHC -Wall                     #-}
> {-# OPTIONS_GHC -fno-warn-name-shadowing  #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults   #-}
> {-# OPTIONS_GHC -fno-warn-unused-do-bind  #-}
> {-# OPTIONS_GHC -fno-warn-missing-methods #-}
> {-# OPTIONS_GHC -fno-warn-orphans         #-}
> {-# LANGUAGE DataKinds                    #-}
> {-# LANGUAGE ScopedTypeVariables          #-}
> {-# LANGUAGE RankNTypes                   #-}
> module FunWithKalmanPart1a where
> import Numeric.LinearAlgebra.HMatrix hiding ( outer )
> import Data.Random.Source.PureMT
> import Data.Random hiding ( gamma )
> import Control.Monad.State
> import qualified Control.Monad.Writer as W
> import Control.Monad.Loops

Let us make our model almost deterministic but with noisy observations.

> stateVariance :: Double
> stateVariance = 1e-6
> obsVariance :: Double
> obsVariance = 1.0

And let us start with a prior normal distribution with a mean position and velocity of 0 with moderate variances and no correlation.

> -- muPrior :: R 2
> muPrior :: Vector Double
> muPrior = vector [0.0, 0.0]
> -- sigmaPrior :: Sq 2
> sigmaPrior :: Matrix Double
> sigmaPrior = (2 >< 2) [ 1e1,   0.0
>                       , 0.0,   1e1
>                       ]

We now set up the parameters for our model as outlined in the preceeding section.

> deltaT :: Double
> deltaT = 0.001
> -- bigA :: Sq 2
> bigA :: Matrix Double
> bigA = (2 >< 2) [ 1, deltaT
>                 , 0,      1
>                 ]
> a :: Double
> a = 1.0
> -- bigH :: L 1 2
> bigH :: Matrix Double
> bigH = (1 >< 2) [ a, 0
>                 ]
> -- bigSigmaY :: Sq 1
> bigSigmaY :: Matrix Double
> bigSigmaY = (1 >< 1) [ obsVariance ]
> -- bigSigmaX :: Sq 2
> bigSigmaX :: Matrix Double
> bigSigmaX = (2 >< 2) [ stateVariance, 0.0
>                      , 0.0,           stateVariance
>                      ]

The implementation of the Kalman filter using the hmatrix package is straightforward.

> -- outer ::  forall m n . (KnownNat m, KnownNat n) =>
> --           R n -> Sq n -> L m n -> Sq m -> Sq n -> Sq n -> [R m] -> [(R n, Sq n)]
> outer :: Vector Double
>          -> Matrix Double
>          -> Matrix Double
>          -> Matrix Double
>          -> Matrix Double
>          -> Matrix Double
>          -> [Vector Double]
>          -> [(Vector Double, Matrix Double)]
> outer muPrior sigmaPrior bigH bigSigmaY bigA bigSigmaX ys = result
>   where
>     result = scanl update (muPrior, sigmaPrior) ys
>     -- update :: (R n, Sq n) -> R m -> (R n, Sq n)
>     update (xHatFlat, bigSigmaHatFlat) y =
>       (xHatFlatNew, bigSigmaHatFlatNew)
>       where
>         -- v :: R m
>         v = y - bigH #> xHatFlat
>         -- bigS :: Sq m
>         bigS = bigH <> bigSigmaHatFlat <> (tr bigH) + bigSigmaY
>         -- bigK :: L n m
>         bigK = bigSigmaHatFlat <> (tr bigH) <> (inv bigS)
>         -- xHat :: R n
>         xHat = xHatFlat + bigK #> v
>         -- bigSigmaHat :: Sq n
>         bigSigmaHat = bigSigmaHatFlat - bigK <> bigS <> (tr bigK)
>         -- xHatFlatNew :: R n
>         xHatFlatNew = bigA #> xHat
>         -- bigSigmaHatFlatNew :: Sq n
>         bigSigmaHatFlatNew = bigA <> bigSigmaHat <> (tr bigA) + bigSigmaX

We create some ranodm data using our model parameters.

> singleSample ::(Double, Double) ->
>                RVarT (W.Writer [(Double, (Double, Double))]) (Double, Double)
> singleSample (xPrev, vPrev) = do
>   psiX <- rvarT (Normal 0.0 stateVariance)
>   let xNew = xPrev + deltaT * vPrev + psiX
>   psiV <- rvarT (Normal 0.0 stateVariance)
>   let vNew = vPrev + psiV
>   upsilon <- rvarT (Normal 0.0 obsVariance)
>   let y = a * xNew + upsilon
>   lift $ W.tell [(y, (xNew, vNew))]
>   return (xNew, vNew)
> streamSample :: RVarT (W.Writer [(Double, (Double, Double))]) (Double, Double)
> streamSample = iterateM_ singleSample (1.0, 1.0)
> samples :: ((Double, Double), [(Double, (Double, Double))])
> samples = W.runWriter (evalStateT (sample streamSample) (pureMT 2))

Here are the actual values of the randomly generated positions.

> actualXs :: [Double]
> actualXs = map (fst . snd) $ take nObs $ snd samples
> test :: [(Vector Double, Matrix Double)]
> test = outer muPrior sigmaPrior bigH bigSigmaY bigA bigSigmaX
>        (map (\x -> vector [x]) $ map fst $ snd samples)

And using the Kalman filter we can estimate the positions.

> estXs :: [Double]
> estXs = map (!!0) $ map toList $ map fst $ take nObs test
> nObs :: Int
> nObs = 1000

And we can see that the estimates track the actual positions quite nicely.

Of course we really wanted to estimate the velocity.

> actualVs :: [Double]
> actualVs = map (snd . snd) $ take nObs $ snd samples
> estVs :: [Double]
> estVs = map (!!1) $ map toList $ map fst $ take nObs test


Boyd, Stephen. 2008. “EE363 Linear Dynamical Systems.”

Kleeman, Lindsay. 1996. “Understanding and Applying Kalman Filtering.” In Proceedings of the Second Workshop on Perceptive Systems, Curtin University of Technology, Perth Western Australia (25-26 January 1996).

Särkkä, Simo. 2013. Bayesian Filtering and Smoothing. Vol. 3. Cambridge University Press.

by Dominic Steinitz at August 06, 2014 06:34 AM

Danny Gratzer

Equality is Hard

Posted on August 6, 2014

Equality seems like one of the simplest things to talk about in a theorem prover. After all, the notion of equality is something any small child can intuitively grasp. The sad bit is, while it’s quite easy to hand-wave about, how equality is formalized seems to be a rather complex topic.

In this post I’m going to attempt to cover a few of the main different means of “equality proofs” or identity types and the surrounding concepts. I’m opting for a slightly more informal approach in the hopes of covering more ground.

Definitional Equality

This is not really an equality type per say, but it’s worth stating explicitly what definitional equality is since I must refer to it several times throughout this post.

Two terms A and B are definitional equal is a judgment notated

Γ ⊢ A ≡ B

This is not a user level proof but rather a primitive, untyped judgment in the meta-theory of the language itself. The typing rules of the language will likely include a rule along the lines of

Γ ⊢ A ≡ B, Γ ⊢ x : A
     Γ ⊢ x : B

So this isn’t an identity type you would prove something with, but a much more magical notion that two things are completely the same to the typechecker.

Now in most type theories we have a slightly more powerful notion of definitional equality where not only are x ≡ y if x is y only by definition but also by computation.

So in Coq for example

(2 + 2) ≡ 4

Even though “definitionally” these are entirely separate entities. In most theories, definitionally equal means “inlining all definitions and with normalization”, but not all.

In type theories that distinguish between the two, the judgment that when normalized x is y is called judgmental equality. I won’t distinguish between the two further because most don’t, but it’s worth noting that they can be seen as separate concepts.

Propositional Equality

This is the sort of equality that we’ll spend the rest of our time discussing. Propositional equality is a particular type constructor with the type/kind

Id : (A : Set) → A → A → Type

We should be able to prove a number of definitions like

reflexivity  : (A : Set)(x     : A) → Id x x
symmetry     : (A : Set)(x y   : A) → Id x y → Id y x
transitivity : (A : Set)(x y z : A) → Id x y → Id y z → Id x z

This is an entirely separate issue from definitional equality since propositional equality is a concept that users can hypothesis about.

One very important difference is that we can make proofs like

sanity : Id 1 2 → ⊥

Since the identity proposition is a type family which can be used just like any other proposition. This is in stark contrast to definitional equality which a user can’t even normally utter!


This is arguably the simplest form of equality. Identity types are just normal inductive types with normal induction principles. The most common is equality given by Martin Lof

data Id (A : Set) : A → A → Type where
   Refl : (x : A) → Id x x

This yields a simple induction principle

id-ind : (P : (x y : A) → Id x y → Type)
       → ((x : A) → P x x (Refl x))
       → (x y : A)(p : Id x y) → P x y p

In other words, if we can prove that P holds for the reflexivity case, than P holds for any x and y where Id x y.

We can actually phrase Id in a number of ways, including

data Id (A : Set)(x : A) : A → Set where
  Refl : Id x x

This really makes a difference in the resulting induction principle

j : (A : Set)(x : A)(P : (y : A) → Id x y → Set)
  → P x Refl
  → (y : A)(p : Id x y) → P y p

This clearly turned out a bit differently! In particular now P is only parametrized over one value of A, y. This particular elimination is traditionally named j.

These alternative phrasings can have serious impacts on proofs that use them. It also has even more subtle effects on things like heterogeneous equality which we’ll discuss later.

The fact that this only relies on simple inductive principles is also a win for typechecking. Equality/substitution fall straight out of how normal inductive types are handled! This also means that we can keep decidability within reason.

The price we pay of course is that this is much more painful to work with. An intensional identity type means the burden of constructing our equality proofs falls on users. Furthermore, we lose the ability to talk about observational equality.

Observational equality is the idea that two “thingies” are indistinguishable by any test.

It’s clear that we can prove that if Id x y, then f x = f y, but it’s less clear how to go the other way and prove something like

fun_ext : (A B : Set)(f g : A → B)
         → ((x : A) → Id (f x) (g x)) → Id f g
fun_ext f g p = ??

Even though this is clearly desirable. If we know that f and g behave exactly the same way, we’d like our equality to be able to state that. However, we don’t know that f and g are constructed the same way, making this impossible to prove.

This can be introduced as an axiom but to maintain our inductively defined equality type we have to sacrifice one of the following

  1. Coherence
  2. Inductive types
  3. Extensionality
  4. Decidability

Some this has been avoided by regarding equality as an induction over the class of types as in Martin Lof’s intuitionist type theory.

In the type theory that we’ve outlined, this isn’t expressible sadly.

Definitional + Extensional

Some type theories go a different route to equality, giving us back the extensionality in the process. One of those type theories is extensional type theory.

In the simplest formulation, we have intensional type theory with a new rule, reflection

Γ ⊢ p : Id x y
  Γ ⊢ x ≡ y

This means that our normal propositional equality can be shoved back into the more magical definitional equality. This gives us a lot more power, all the typecheckers magic and support of definitional equality can be used with our equality types!

It isn’t all puppies an kittens though, arbitrary reflection can also make things undecidable in general. For example Martin Lof’s system is undecidable in with extensional equality.

It’s worth noting that no extensional type theory is implemented this way. Instead they’ve taken a different approach to defining types themselves!

In this model of ETT types are regarded as a partial equivalence relation (PER) over unityped (untyped if you want to get in a flamewar) lambda calculus terms.

These PERs precisely reflect the extensional equality at that “type” and we then check membership by reflexivity. So a : T is synonymous with (a, a) ∈ T. Notice that since we are dealing with a PER, we know that ∀ a. (a, a) ∈ T need not hold. This is reassuring, otherwise we’d be able to prove that every type was inhabited by every term!

The actual NuRPL&friends theory is a little more complicated than that. It’s not entirely dependent on PERs and allows a few different ways of introducing types, but I find that PERs are a helpful idea.

Propositional Extensionality

This is another flavor of extensional type theory which is really just intensional type theory plus some axioms.

We can arrive at this type theory in a number of ways, the simplest is to add axiom K

k : (A : Set)(x : A)(P : (x : A) → Id x x → Type)
  → P x (Refl x) → (p : Id x x) → P x p

This says that if we can prove that for any property P, P x (Refl x) holds, then it holds for any proof that Id x x. This is subtly different than straightforward induction on Id because here we’re not proving that a property parameterized over two different values of A, but only one.

This is horribly inconsistent in something like homotopy type theory but lends a bit of convenience to theories where we don’t give Id as much meaning.

Using k we can prove that for any p q : Id x y, then Id p q. In Agda notation

    prop : (A : Set)(x y : A)(p q : x ≡ y)
         → p ≡ q
    prop A x .x refl q = k A P (λ _ → refl) x q
      where P : (x : A) → x ≡ x → Set
            P _ p = refl ≡ p

This can be further refined to show that that we can eliminate all proofs that Id x x are Refl x

    rec : (A : Set)(P : A → Set)(x y : A)(p : P x) → x ≡ y → P y
    rec A P x .x p refl = p

    rec-refl-is-useless : (A : Set)(P : A → Set)(x : A)
                        → (p : P x)(eq : x ≡ x) → p ≡ rec A P x x p eq 
    rec-refl-is-useless A P x p eq with prop A x x eq refl
    rec-refl-is-useless A P x p .refl | refl = refl

This form of extensional type theory still leaves a clear distinction between propositional equality and definitional equality by avoiding a reflection rule. However, with rec-refl-is–useless we can do much of the same things, whenever we have something that matches on an equality proof we can just remove it.

We essentially have normal propositional equality, but with the knowledge that things can only be equal in 1 way, up to propositional equality!

Heterogeneous Equality

The next form of equality we’ll talk about is slightly different than previous ones. Heterogeneous equality is designed to co-exist in some other type theory and supplement the existing form of equality.

Heterogeneous equality is most commonly defined with John Major equality

    data JMeq : (A B : Set) → A → B → Set where
      JMrefl : (A : Set)(x : A) → JMeq A A x x

This is termed after a British politician since while it promises that any two terms can be equal regardless of their class (type), only two things from the same class can ever be equal.

Now remember how earlier I’d mentioned that how we phrase these inductive equality types can have a huge impact? We’ll here we can see that because the above definition doesn’t typecheck in Agda!

That’s because Agda is predicative, meaning that a type constructor can’t quantify over the same universe it occupies. We can however, cleverly phrase JMeq so to avoid this

    data JMeq (A : Set) : (B : Set) → A → B → Set where
      JMrefl : (a : A) → JMeq A A a a

Now the constructor avoids quantifying over Set and therefore fits inside the same universe as A and B.

JMeq is usually paired with an axiom to reflect heterogeneous equality back into our normal equality proof.

reflect : (A : Set)(x y : A) → JMeq x y → Id x y

This reflection doesn’t look necessary, but arises for similar reasons that dictate that k is unprovable.

It looks like this heterogeneous equality is a lot more trouble than it’s worth at first. It really shines when we’re working with terms that we know must be the same, but require pattern matching or other jiggering to prove.

If you’re looking for a concrete example, look no further than Observational Equality Now!. This paper gives allows observational equality to be jammed into a principally intensional system!

Wrap Up

So this has been a whirlwind tour through a lot of different type theories. I partially wrote this to gather some of this information in one (free) place. If there’s something here missing that you’d like to see added, feel free to comment or email me.

Thanks to Jon Sterling for proof reading and many subtle corrections :)

<script type="text/javascript"> /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */ var disqus_shortname = 'codeco'; // required: replace example with your forum shortname /* * * DON'T EDIT BELOW THIS LINE * * */ (function() { var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; dsq.src = '//' + disqus_shortname + ''; (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq); })(); </script> <noscript>Please enable JavaScript to view the comments powered by Disqus.</noscript> comments powered by Disqus

August 06, 2014 12:00 AM

August 05, 2014

wren gayle romano

Imagine that this is not an academic debate

A followup to my previous [reddit version]:

The examples are of limited utility. The problem is not a few bad apples or a few bad words; were that the case it would be easier to address. The problem is a subtle one: it's in the tone and tenor of conversation, it's in the things not talked about, in the implicitization of assumptions, and in a decentering of the sorts of communities of engagement that Haskell was founded on.

Back in 2003 and 2005, communities like Haskell Cafe were communities of praxis. That is, we gathered because we do Haskell, and our gathering was a way to meet others who do Haskell. Our discussions were centered on this praxis and on how we could improve our own doing of Haskell. Naturally, as a place of learning it was also a place of teaching— but teaching was never the goal, teaching was a necessary means to the end of improving our own understandings of being lazy with class. The assumptions implicit in the community at the time were that Haskell was a path to explore, and an obscure one at that. It is not The Way™ by any stretch of the imagination. And being a small community it was easy to know every person in it, to converse as you would with a friend not as you would online.

Over time the tone and nature of the Cafe changed considerably. It's hard to explain the shift without overly praising the way things were before or overly condemning the shift. Whereas the Cafe used to be a place for people to encounter one another on their solitary journeys, in time it became less of a resting stop (or dare I say: cafe) and more of a meeting hall. No longer a place to meet those who do Haskell, but rather a place for a certain communal doing of Haskell. I single the Cafe out only because I have the longest history with that community, but the same overall shift has occurred everywhere I've seen. Whereas previously it was a community of praxis, now it is more a community of educationalism. In the public spaces there is more teaching of Haskell than doing of it. There's nothing wrong with teaching, but when teaching becomes the thing-being-done rather than a means to an end, it twists the message. It's no longer people asking for help and receiving personal guidance, it's offering up half-baked monad tutorials to the faceless masses. And from tutorialization it's a very short path to proselytizing and evangelizing. And this weaponization of knowledge always serves to marginalize and exclude very specific voices from the community.

One class of voices being excluded is women. To see an example of this, consider the response to Doaitse Swierstra's comment at the 2012 Haskell Symposium. Stop thinking about the comment. The comment is not the point. The point is, once the problematic nature of the comment was raised, how did the community respond? If you want a specific example, this is it. The example is not in what Swierstra said, the example is in how the Haskell community responded to being called out. If you don't recall how this went down, here's the reddit version; though it's worth pointing out that there were many other conversations outside of reddit. A very small number of people acquitted themselves well. A handful of people knew how to speak the party line but flubbed it by mansplaining, engaging in flamewars, or allowing the conversation to be derailed. And a great many people were showing their asses all over the place. Now I want you to go through and read every single comment there, including the ones below threshold. I want you to read those comments and imagine that this is not an academic debate. Imagine that this is your life. Imagine that you are the unnamed party under discussion. That your feelings are the ones everyone thinks they know so much about. That you personally are the one each commenter is accusing of overreacting. Imagine that you are a woman, that you are walking down the street in the middle of the night in an unfamiliar town after a long day of talks. It was raining earlier so the streets are wet. You're probably wearing flats, but your feet still hurt. You're tired. Perhaps you had a drink over dinner with other conference-goers, or perhaps not. Reading each comment, before going on to the next one, stop and ask yourself: would you feel safe if this commenter decided to follow you home on that darkened street? Do you feel like this person can comprehend that you are a human being on that wet street? Do you trust this person's intentions in being around you late at night? And ask yourself, when some other commenter on that thread follows you home at night and rapes you in the hotel, do you feel safe going to the comment's author to tell them what happened? Because none of this is academic. As a woman you go to conferences and this is how you are treated. And the metric of whether you can be around someone is not whether they seem interesting or smart or anything else, the metric is: do you feel safe? If you can understand anything about what this is like, then reading that thread will make you extremely uncomfortable. The problem is not that some person makes a comment. The problem is that masculinized communities are not safe for women. The problem is that certain modes of interaction are actively hostile to certain participants. The problem is finding yourself in an uncomfortable situation and knowing that noone has your back. Knowing that anyone who agrees with you will remain silent because they do not think you are worth the time and energy to bother supporting. Because that's what silence says. Silence says you are not worth it. Silence says you are not one of us. Silence says I do not think you are entirely human. And for all the upvotes and all the conversation my previous comment has sparked on twitter, irc, and elsewhere, I sure don't hear anyone here speaking up to say they got my back.

This is not a problem about women in Haskell. Women are just the go-to example, the example cis het middle-class educated able white men are used to engaging. Countless voices are excluded by the current atmosphere in Haskell communities. I know they are excluded because I personally watched them walk out the door after incidents like the one above, and I've been watching them leave for a decade. I'm in various communities for queer programmers, and many of the folks there use Haskell but none of them will come within ten feet of "official" Haskell communities. That aversion is even stronger in the transgender/genderqueer community. I personally know at least a dozen trans Haskellers, but I'm the only one who participates in the "official" Haskell community. Last fall I got hatemail from Haskellers for bringing up the violence against trans women of color on my blog, since that blog is syndicated to Planet Haskell. Again, when I brought this up, people would express their dismay in private conversations, but noone would say a damn thing in public nor even acknowledge that I had spoken. Ours has never been a great community for people of color, and when I talk to POC about Haskell I do not even consider directing them to the "official" channels. When Ken Shan gave the program chair report at the Haskell symposium last year, there was a similarly unwholesome response as with Swierstra's comment the year before. A number of people have shared their experiences in response to Ken's call, but overwhelmingly people feel like their stories of being marginalized and excluded "don't count" or "aren't enough to mention". Stop. Think about that. A lot of people are coming forward to talk about how they've been made to feel uncomfortable, and while telling those stories they feel the need to qualify. While actively explaining their own experiences of racism, sexism, heterosexism, cissexism, ablism, sanism, etc, they feel the simultaneous need to point out that these experiences are not out of the ordinary. Experiencing bigotry is so within the ordinary that people feel like they're being a bother to even mention it. This is what I'm talking about. This is what I mean when I say that there is a growing miasma in our community. This is how racism and sexism and ablism work. It's not smacking someone on the ass or using the N-word. It's a pervasive and insidious tone in the conversation, a thousand and one not-so-subtle clues about who gets to be included and who doesn't. And yes the sexual assaults and slurs and all that factor in, but that's the marzipan on top of the cake. The cake is made out of assuming someone who dresses "like a rapper" can't be a hacker. The cake is made out of assuming that "mother" and "professional" are exclusive categories. The cake is made out of well-actuallys and feigned surprise. And it works this way because this is how it avoids being called into question. So when you ask for specific examples you're missing the point. I can give examples, but doing so only contributes to the errant belief that bigotry happens in moments. Bigotry is not a moment. Bigotry is a sustained state of being that permeates one's actions and how one forms and engages with community. So knowing about that hatemail, or knowing about when I had to call someone out for sharing titty pictures on Haskell Cafe, or knowing about the formation of #nothaskell, or knowing about how tepid the response to Tim's article or Ken's report were, knowing about none of these specifics helps to engage with the actual problem.

comment count unavailable comments

August 05, 2014 04:01 PM

August 04, 2014

Douglas M. Auclair (geophf)

Are you serious? "What degree?"

So, serious question.

An aside (get used to them, by the way), people are never sure if I'm serious.

I'm, like, seriously? C'mon!

I am always silly (well, oftentimes silly, compared to my dour workmates), but I am always serious in my silliness.

Chesterton said it well: the opposite of funny is not serious: the opposite of funny is not funny.

Okay, that aside is done. Now onto the topic at hand.

Hands up, those who have ever used your education in your jobs. Hands up, those who needed proof of your degree to prove competency as a prerequisite of employment.

(sound of crickets.)

Thought so.

Actually, more of you might raise your hands to me than to most of my colleagues, because why? Because I have close ties to academe, that's why. So there are more than a few of you who are required to have your Master's degree or, those of you who are post-docs, to have your Ph.D. to get that research position or grant that you're working on.

The rest of the world?


Education, in the real world, is a detriment to doing your job.

Across the board.


Do you know how many Ph.D.s we fire? Do you know how many Ph.D.s and matriculated students we turn away, because of their education and their lack of real-world experience?

We did a survey: a sure bell-weather for a prospective employee? The amount of their education: the more they have, the more likely they are to be useless on the job.

It's the Ph.D-disease: 'Ph.D: "piled high and deep."' People get ed-ju-ma-kated and then they think because they have a sheepskin or that they have a certain GPA at a certain school and know a certain ... educated way of doing things, that they know how to do this-and-that other thing, totally unrelated on the job.

"I've studied the Martin-Löf's intuitionistic type theory."

"Great, how do you connect to our database."

"Uh ...."


You'll bridle at this, but you know who agrees most strongly with me?


I went to the ICFP2006 specifically looking to roll dependent types into the programming language I was using, in industry, and I could not get the time of day from a single implementer of the type-theory in Twelf, and you know why?

"Why would you be interested in this? This is purely theoretical."

Uh, huh. As in "not applicable to industry."

I reread the paper again, later: "Dependent types make programs easier to understand and more reliable."

And why would I want that in my programs in industry, where it mattered.

I spent four years at the United States Coast Guard Academy: 10,000 students apply, only 300 are allowed in, only 150 graduate, each year. It cost, at the time, $250,000 to graduate a student from the Academy, making it the most expensive school in the nation.

Thank you, taxpayers, for my education.

I graduated with a dual major: mathematics and computer science.

How much of my education did I use on the job to save 150 lives from a capsized Arctic research exploration vessel (boy, they surely used their education, didn't they! ... to capsize their ship), act as the translator when we boarded Japanese trawlers, provide civil rights education and mediate EEO complaints and ...

None. Zip. Zilch.

After my stint, how much of my college education did I use on the job.

My job was encoding matrices in FORTRAN. How much FORTRAN did I study in college?

Zip. Zilch. Nada.

How much of the Advanced Calculus II did I use on the job.

People, it was frikken matrix manipulation! Something you can (now) look up on wikipedia and pick up in, oh, two hours if you're really slow.

Java. C#. Swift. Visual whatever. Spring. SQL. HBase. Angular. JavaScript.

All these things (Python, Ruby on Rails) can be taught in college, but they can be taught in high school, and I learned them in school, did I?

No, I did not. I learned them on my own, thank you very much.

Design patterns, frameworks, data structures. Do educated people know these things?

Some of them do. Most people with a 'computer science' degree DO NOT, people.

They do not. They woefully do not, as comp sci teachers lament over and over again, and as I, the hiring manager, scratch my head wondering what, precisely, did these kids learn in school, because, insofar as I see, they did not learn abstraction, polymorphism, typing, or data structures.

They learned the if-statement.

They learned the if-statement that n-dispatches within a for-loop from 1 to n off an array. That's the data structure they know: the array.

Maps? Sets?

The array.


We got you covered: the if-statement.

Functional decomposition?

Well, there's always main(String[] args), with a big-ole if-statement.

The word on the street is education is a canard at best and a detriment at most, and at worst, it's a project-sinker.

That's a shame, because there are educated people who are productive and smart and effective in their field, and can help.

How to Solve It: a Modern Approach claims that one billion dollars is wasted on software because it's written in the absence of very simple techniques, such as linear programming.


Our authors, Michalewicz and Fogel, are off. Way off. I know. By a factor of at least one-hundred.

We wasted a billion dollars on a file-indexing system for the FBI. Oh, it had email, too. Government project. Never used because it was never delivered in a useable state. Do you know how many go through that cycle?

I don't. I do know know I've seen project after project just ...

God. The waste.

And I have friends. And they tell me stories.

But, you get MITRE in there, or you get a Ph.D. or two in there, and what happens?

They study the issue.

They study it for six-plus months, and then they write you a nice little white paper that states the obvious search criteria that you knew from day one, but what do you have to say? Where is your ROC-analyses? So your bayesian system that was cranking out results month after month was killed by the bean-counting pointy-heads and they submit a ... working solution that could go into production? Oh, no. They submit a white paper calling for a research grant to allow for a year of surveying and further study of the issue.


Then they get fired or they move on to more interesting research areas leaving behind us to clean up the mess and get a working system out the door in some serviceable shape that used zero percent of their research.

Zero percent.

You see, they modeled the situation, but the model doesn't fit the data, which is raw and dirty, so their solution solved the model, not your problem, not even close.

Your degree.

How much have you used your degree on your job?

If you're a researcher, you probably use quite a bit of what you've studied in your research, and you are contributing more to your field of study.

If you're not, then you're told this, day one, on your job: "Friend, put those books away, you're never going to use them again."

I mean, seriously: did you really bring your college books to your job thinking you'd use them?


This, here, is the real world. The ivory tower is for the academics. In the real world, you roll up your sleeves, get to work, and get some results; because if you don't, the door is right over there.

You were expecting to use your degree on your job? This is America, people. We don' need no edjumakashun.

Now, if this were Soviet Russia, your degree uses you.


So, silliness, and serious silliness aside.


You were expecting to use your degree on your job?

English major. This is America, we don't talk English, we talk American, so, that's nice that you have that degree.

Mathematics major. This is America, we don't do 'maths,' nor trig, nor geometric algebras, nor category theory, how would you use any of that on your job?

I was seriously asked that on my interview for a job overseeing a 1.7 petabyte-sized database.

I said: 'uh, map-reduce are straight from category theory.'

"Yes, but how do you use that on your job?"

We both blinked at each other dumbly.

The gulf.

You don't go to school to get trained to do a job well, ladies and gentlemen.

I mean, too many of you do that, and too many others go do school to party some, to sex some, to blaze some, and then get to work after your folks financed your four-plus year bacchanal.

College is not a technical training-institute and has nothing to do with acquiring skills or proficiency on repetitive stress disorder, oh, I meant: 'your job.' Your job, almost without exception, can be just as proficiently performed by nearly anyone they drag off the street and put in your chair for eight hours a day. They sit in your chair for a few days, and everyone else won't even know you're gone.

Most jobs.

My wife beat the pants off her entire payroll division with an excel spreadsheet because they didn't have simple accounting principles and deductive reasoning. Why? Because they were well-regulated at their jobs, proficient at it, in fact, and their job was to make continuous clerical errors because they had absolutely no rigor. Why would they? They weren't paid for rigor. They were paid for doing their jobs, which was: don't make waves.

I regularly go into situations where other software engineers (a misnomer, they are more like computer programmers, not engineers) say such-and-so cannot be done in programming language X.

Then, I implement a little bit of category theory, in programming language X, do some simple mappings and natural transformations, and, voilà! those 50,000 lines of code that didn't solve the problem but only made things worse? I replace all that with 500 lines of code that actually delivers the solution.

Unit tested: all the edge cases.

And meeting their requirements, because I've translated the requirements into a declarative DSL on top of their programming language X.

Of course they couldn't solve the insurmountable problem in programming language X, not because they were using programming language X (although it helped with the colossal fail being object-disoriented and improvably/mutatatively impure), but because they couldn't think outside the box that 'you can only do this and that' as a software engineer. They were caught in their own domain and can't even see that they had boxed themselves in.

Because they were educated that way. Comp Sci 101: this is how you write a program. This is the 'if'-statement. This is the for-loop. If that doesn't work, add more if-statements wrapped by more for-loops, and this statement is perfectly acceptable:

x = x + 1

Go to town.

That's what their education gave them: they went to school to acquire a trade and a proficiency at the if-statement, and gave up their ability to see and to think.

And some, many, academics are the most bigoted, most blundering blinders-on fools out there, because they see it their way, and they see their way as the only way, which requires a six-month research grant and further study after that.

With co-authorship on the American Mathematical Society journal article.

And the uneducated are the worst, most pigheaded fools out there, so sure that the educated have nothing to offer, that they have no dirt under their perfectly manicured fingernails attached silky smooth hands that have never seen an honest-day's work nor, God forbid! a callous, so what do they know, these blowhards, so the uneducated ignore the advances of research into type-theory, category theory, object theory (polymorphism does help at times), any theory, and just code and code and code until they have something that 'looks good.'

How to solve this?

Start with you.

Not with your education, that is: not with your education that tells you who you are.

Start with how you can help, and then help.

  • Project one: I saw how fractal dimensionality would solve a spectrum analysis problem. Did I say the words 'fractal' or 'dimensions'? No. I was working with real-programmers. If I asked if I could try this, do you know what they would say?

    Pfft. Yeah, right. Get back to work, geophf!

    But, instead, I implemented the algorithm. I sat with a user who had been working on those signals and knew what he needed, iterated through the result a week.

    Just a week. While I did my job-job full time. I did the fractal spectrum analysis on my own time.

    My 'thing' floored the software management team. They had seen straight-line approximations before. They thought I was doing actual signal analysis. I mean: with actual signals.

    They showed my 'thing' to the prospective customer. And got funded.

  • Another project: data transformation and storage, built a system that encompassed six-hundred data elements using a monadic framework to handle the semideterminism. That was an unsolvable problem in Java.

    I used Java.

    Java with my monadic framework, yes, but Java, to solve the problem.

  • Third project: calculating a 'sunset date' over a data vector of dimension five over a time continuum.

    Hm: continuum.

    Unsolvable problem. Three teams of software developers tackled it over six months. Nobody could get close to the solution.


    I used a comonadic framework.

    Took me, along with a tester who was the SME on the problem, and a front-end developer to get the presentation layer just right, about a month, and we solved that baby and put it to bed.

    Unit tested. All edge cases.

    Did I tell them I used a comonadic framework?

    Nah, they tripped over themselves when they saw the word 'tuple.'

    No joke, my functional programming language friends: they, 'software engineers,' were afraid of the word 'tuple.'

    So I explained as much as anyone wanted to know when anyone asked. I wrote design documents, showing unit test case results, and they left me alone. They knew I knew what I was doing, and I got them their results. That's what they needed.

    They didn't need my degree.

    They didn't need to know I used predicate logic to optimize SQL queries that took four hours to run to a query that took forty-five seconds.

    They didn't need to know I refactored using type theory, that A + B are disjoint types and A * B are type instances and A ^ B are function applications so I could look at a program, construct a mathematical model of it and get rid of 90% of it because it was all redundantly-duplicated code inside if-clauses, so I simply extracted (2A + 2B ... ad nauseam) to 2(A + B ...) and then used a continuation, for God's sake, with 'in the middle of the procedure' code, or, heaven help me, parameterization over a simple functional decomposition exercise to reduce a nightmare of copy-and-paste to something that had a story to tell that made sense.

How do you connect to a database?

Do you need a college degree for that?

Kids with college degrees don't know the answer to that simple interview question.

And they don't know the Spring framework, making 'how to connect to a database' a stupid-superfluous question.

They don't know what unit tests give them. They don't know what unit tests don't give them. Because they, college kids and crusty old 'software engineers,' don't write them, so they have no consistency nor security in their code: they can't change anything here because it might break something over there, and they have no unit tests as a safety net to provide that feedback to them, and since they are programming in language X, a nice, strict, object-oriented programming language, they have no programs-as-proofs to know that what they are writing is at all good or right or anything.

A college degree gives you not that. A not college degree gives you not that.

A college degree is supposed to what, then?

It's suppose to open your mind to the possibility of a larger world, and it's supposed to give you the tools to think, and to inquire, so that you can discern.

"This, not that. That, and then this. This causes that. That is a consequence of this. I choose this for these reasons. These reasons are sound because of these premises. This reason here. Hm. I wonder about that one. It seems unsound. No: unfamiliar. Is it sound or unsound? Let me find out and know why."

English. Mathematics. Art. Literature. Music. Philosophy. All of these things are the humanities. The sciences and the above. Law. Physics. All these lead one to the tools of inquiry.

In school, you are supposed to have been given tools to reason.

Then, you're sent back out into the world.

And then you are supposed to reason.

And with your reason, you make the world a better place, or a worse place.

These things at school, these are the humanities, and they are there to make you human.

Not good at your job, not so you can 'use' your degree as a skill at work, but to make you human.

And, as human, are you good at your job?


And, as human, do you make your world a place such that others are good and happy at their jobs?


The end of being human is not to be skilled, nor proficient ... 'good' at your job.

But it's an accident of it, a happy accident.

The 'end' of being human?

Well: that's your inquiry.

That's what school, that's what everything, is for: for you to answer the unanswered question.

Your way.

And, if you accept that, and are fully realized as a human being, then your way is the best way in the world, and your way has the ability to change lives. First, your own, then others. Perhaps your coworkers.

Perhaps hundreds of others.

Perhaps thousands.

Perhaps you will change the entire world.

But you won't know that until you take that first step of inquiry.

Then the next.

Then the next.

And you look back, and you see how far you've come, and ... wow.

Just wow.

That's what school is for. Not for your job.

For you.

by geophf ( at August 04, 2014 07:33 PM

wren gayle romano

On my pulling away from Haskell communities

Gershom Bazerman gave some excellent advice for activism and teaching. His focus was on teaching Haskell and advocating for Haskell, but the advice is much more widely applicable and I recommend it to anyone interested in activism, social justice, or education. The piece has garnered a good deal of support on reddit— but, some people have expressed their impression that Gershom's advice is targeting a theoretical or future problem, rather than a very concrete and very contemporary one. I gave a reply there about how this is indeed a very real issue, not a wispy one out there in the distance. However, I know that a lot of people like me —i.e., the people who bear the brunt of these problems— tend to avoid reddit because it is an unsafe place for us, and I think my point is deserving of a wider audience. So I've decided to repeat it here:

This is a very real and current problem. (Regardless of whether things are less bad in Haskell communities than in other programming communities.) I used to devote a lot of energy towards teaching folks online about the ideas behind Haskell. However, over time, I've become disinclined to do so as these issues have become more prevalent. I used to commend Haskell communities for offering a safe and welcoming space, until I stopped feeling quite so safe and welcomed myself.

I do not say this to shame anyone here. I say it as an observation about why I have found myself pulling away from the Haskell community over time. It is not a deliberate act, but it is fact all the same. The thing is, if someone like me —who supports the ideology which gave rise to Haskell, who is well-educated on the issues at hand, who uses Haskell professionally, who teaches Haskell professionally, and most importantly: who takes joy in fostering understanding and in building communities— if someone like me starts instinctively pulling away, that's a problem.

There are few specific instances where I was made to feel unsafe directly, but for years there has been a growing ambiance which lets me know that I am not welcome, that I am not seen as being part of the audience. The ambiance (or should I say miasma?) is one that pervades most computer science and programming/tech communities, and things like dogmatic activism, dragon slaying, smarter-than-thou "teaching", anti-intellectualism, hyper-intellectualism, and talking over the people asking questions, are all just examples of the overarching problem of elitism and exclusion. The problem is not that I personally do not feel as welcomed as I once did, the problem is that many people do not feel welcome. The problem is not that my experience and expertise are too valuable to lose, it's that everyone's experience and expertise is too valuable to lose. The problem is not that I can't teach people anymore, it's that people need teachers and mentors and guides. And when the tenor of conversation causes mentors and guides to pull away, causes the silencing of experience and expertise, causes the exclusion and expulsion of large swaths of people, that always has an extremely detrimental impact on the community.

comment count unavailable comments

August 04, 2014 02:30 AM

August 03, 2014

Thiago Negri

Code reuse considered harmful

The title is intended to call for attention. This post is about one perspective of software development in the light of my own experience in the area, it won't contain anything really revealing and is not to be taken as an absolute true for life. It's a rant. I hope you have a good time reading it, feel free to leave me any kind of feedback.

I see a bunch of people praising reuse as being the prime thing of good software development, and few talking about replaceability. There seems to be a constant seek to avoid writing code that is used only once, as if it was a really bad thing. Then we end up with software that is made of conceptual factories that create factories that create the things the software really needs, yes there are two levels of factories, or more. Is this really necessary? How much time do we save by this extreme look for reusing code?

First, let me ask and answer a simple question: why duplicated code is annoying? Well, duplicated code makes it harder to change stuff. When you have the same piece of code written multiple times in a code base and you find that it needs a change, e.g. bug fix or new feature, you will need to change it in all places. Things can get worst if you don't know all places where the code is duplicated, so you may forget to change one of these spots. The result is that duplicated code is a sign of harder maintenance and a fertile ground for further bugs to spawn. That's why we learned to hate it. We started fighting this anti-pattern with all strength we had.

Code reuse is the perfect counter to code duplication, right? Sure, it is right, if we reuse a piece of code in two places, we have no duplication between these places. So, we did it! We found the Holy Grail of code quality, no more duplicated code, yay! But something unintended happened. Remember the old saying: with great powers, comes great responsibility. People started to be obsessed with it. As soon as they learned to use the hammer of code reuse, everything turned into a nail, when it didn't work out in the first hit, they adjust the size of the hammer and hit it again with more effort.

This seek after code reuse led us to a plethora of abstractions that seems to handle every problem by reusing some code. Don't get me wrong, lots of them are useful, these are the ones that were created from observation. The problem is the ones that are created from "it's cool to abstract", or other random reason that is not true observation. We see frameworks after frameworks that try to fit every problem of the world into a single model. Developers learn to use these frameworks and suddenly find out that the framework creator is wrong and create yet another abstraction over it or creates yet another framework that tries to use a different model to solve the world.

What happens when we have a bug in one of these abstractions or we need to enhance it? Silence, for a while, then the sky turns black, you take a break, go for a walk, come back to your computer and start blaming the other developer that created the bug or that "got the abstraction wrong", because your vision was the right one. What happened? We reused code to avoid code duplication, but we are still having the same problems: code that is hard to maintain and evolve.

My guess? We missed the enemy. Code duplication is not our enemy. Maintenance problem and rigidity of code is.

My tip? Give more focus on replaceability of code instead of reuse in your talks, codes, classes, etc. Create the right abstraction to fix the problem at hand in a way that is easy to replace the underlying code when needed. Some time in the future, you will need to change it anyway. That's what agile methodologies try to teach us: embrace change. Planning for a design to be reused says: "my design will be so awesome, that I will reuse it everywhere." That's what agile says: "your design will need to change sometime, because the requirements will change, plan for the replaceability of it." People are doing things like service oriented architecture in the wrong way because they are looking for reuse of services and not for replaceability of services, they end up with a Big Web of Mud.

That's all folks. Thanks for your time.

by Thiago Negri ( at August 03, 2014 09:24 PM

Unity3D, Bejeweled and Domain-driven design

I'm working in a new game like Bejeweled. I'm happy with the freedom of code organization that Unity3D engine allows. During my first contacts with it, I thought that almost everything would be oriented to MonoBehaviour class, but this showed to be false. This class is necessary just as a glue point between any C# code and the objects of the engine. I'll report how I've started coding this game and the changes I made so far, you can watch the following video to see the current state of the game: <iframe allowfullscreen="allowfullscreen" frameborder="0" height="315" src="" width="420"></iframe>

I started creating a GameObject for every entity that I identified in the game mechanics:

  1. Board
  2. Piece
The board contains every pieces and manages them:

public class Board : MonoBehaviour {
private GameObject[,] pieces;
void Awake() {
pieces = /* Initialize pieces */;
The piece type is defined by a MonoBehaviour that exposes an enumeration:

public class Piece : MonoBehaviour {
public PieceType Type;
public enum PieceType {

After the definition of the entities participating in the game, I started to code game's logic inside these classes. It worked for a while, but some problems appeared. The same classes had lots of different responsibilities (i.e. game's rules, animations, handling input) and this made it hard to code some stuff, because I needed to maintain a mind map of all concerns to avoid breaking something. Also, during animations, the board in memory was in an inconsistent state, waiting for the end of the animation to then continue processing.

Recently I've read some stuff about Domain-driven design (DDD) and decided to apply a bit of it in this game. My first step was to separate my code domain from the others, I selected game's mechanics as my core domain: if this part is not well behaving and it's hard to maintain, I'll be in a bad spot. Then I went to create this domain classes completely separated from the rest of the game, I ignored the existence of Unity3D at this point.

I only seen a single entity for this domain: the board. It makes no sense for the piece to exist on its own, everything that involves pieces always happens inside the board. I still have a class for the piece, but it is an internal thing of the board. My design became this:

public class BoardPosition {
public readonly int Row;
public readonly int Column;
public BoardPosition(int row, int column) {
Row = row;
Column = column;

public class Board {
private Piece[,] pieces;
public Board() {
pieces = /* Initialize pieces */;

#region Queries
public Piece PieceAt(BoardPosition p) { /* ... */ }

#region Events
public delegate void PieceCreatedDelegate(BoardPosition position, Piece piece);
public event PieceCreatedDelegate PieceCreated;

public delegate void PieceDestroyedDelegate(BoardPosition position);
public event PieceDestroyedDelegate PieceDestroyed;

public delegate void PieceMovedDelegate(BoardPosition from, BoardPosition to);
public event PieceMovedDelegate PieceMoved;

public delegate void PiecesSwappedDelegate(BoardPosition a, BoardPosition b);
public event PiecesSwappedDelegate PiecesSwapped;

#region Commands
public void SwapPieces(BoardPosition a, BoardPosition b) {
...; // Swap pieces
PiecesSwapped(a, b);

public void StepGameState() {
...; // Destroy pieces
...; // Move pieces
...; // Create pieces

for (...) {
for (...) {
for (...) {
This way, the view part of the game register itself to handle the events generated by the board and update the user interface as needed.

public class BoardView : MonoBehaviour {
private Board board;
private GameObject[,] pieces;
void Awake() {
board = new Board();
board.PieceCreated += HandlePieceCreated;
board.PieceDestroyed += HandlePieceDestroyed;
board.PieceMoved += HandlePieceMoved;
board.PiecesSwapped += HandlePiecesSwapped;
pieces = /* Initialize pieces based on 'board' */;

public void HandlePieceCreated(BoardPosition position, Piece piece) { /* ... */ }
public void HandlePieceDestroyed(BoardPosition position) { /* ... */ }
public void HandlePieceMoved(BoardPosition from, BoardPosition to) { /* ... */ }
public void HandlePiecesSwapped(BoardPosition a, BoardPosition b) { /* ... */ }

void Update() {
if (/* ... */) {
board.SwapPieces(a, b);

This design made it hard to sync time between the model and the view. The model calls the methods of the view to notify about changes, the view has little space left to decide when to handle each event. In my case, some events started animations that needed to hold other events from happening, i.e. there is a temporal sequencing between some events.

I changed the model to return a list of events that happened at each command, instead of calling the handler directly:

#region Events
public interface BoardEvent {}
public class PieceCreated : BoardEvent { /* ... */ }
public class PieceDestroyed : BoardEvent { /* ... */ }
public class PieceMoved : BoardEvent { /* ... */ }
public class PiecesSwapped : BoardEvent { /* ... */ }

#region Commands
public List<BoardEvent> SwapPieces(BoardPosition a, BoardPosition b) { /* ... */ }
public List<BoardEvent> StepGameState() { /* ... */ }
Now, the view needs to call the handlers itself, but can decide when to handle each event:

public class BoardView : MonoBehaviour {
private List<BoardEvent> events;
void Update() {
if (events.Count < 1) { events = board.StepGameState(); }
foreach (BoardEvent e in events) {
if (CanHandleNow(e)) {
// ...
if (HandledEverything) { events.Clear(); }
After this, I still felt that this temporal sequencing was not clear, it was "floating in the air". I decided to put it into the model, it's part of my domain: every event has a temporal identifier:

public class Board {
private int timeCounter;
public List<BoardEvent> StepGameState() {
...; // Destroy pieces
for (...) {
events.add(new PieceDestroyed(timeCounter, ...));
if (eventHappened) { timeCounter++; }

...; // Move pieces
for (...) {
events.add(new PieceMoved(timeCounter, ...));
if (eventHappened) { timeCounter++; }

...; // Create pieces
for (...) {
events.add(new PieceCreated(timeCounter, ...));
if (eventHappened) { timeCounter++; }

return events;

public class BoardView : MonoBehaviour {
private int timeCounter;
private List<BoardEvent> events;
void Update() {
if (events.Count < 1) { events = board.StepGameState(); }
foreach (BoardEvent e in events) {
if (e.When() == timeCounter) Handle(e);
if (e.When() > timeCounter) {
stillHasEventsToHandle = true;
if (/*handledAnimationOfAllEventsOfMyTimeCounter*/) {
// Advance time perception of view
if (!stillHasEventsToHandle) {
events.Clear(); // Will step game state at next frame
Both view and model has a temporal identifier and the sync is more evident.

The actual code is looking very similar to the listed here. The model is handling well up to now. I feel bad about one thing: the Step command of the model may leave the board in a "not-consolidated" state, as it makes a single interaction to check for matching groups to be removed from the board. The view then needs to call the Step command more than once between handling two inputs from the user. I didn't want to make a lot of interactions in a single Step to avoid putting lots of stuff in memory before anything is handled by the interface, looks like a waste to me. I miss the lazy part of Haskell.

I still have lots of stuff to add to the game's mechanics (my core domain). I'll see the problems of this design in the next days and will post news with the next changes. Critics and suggestions are welcome.

by Thiago Negri ( at August 03, 2014 05:12 PM

Russell O'Connor

ICFP 2014 Post-Mortem

I participated in the 2014 ICFP programming contest this year. This year’s task was to write an AI for a simplified Pac-Man game called Lambda-Man. You could write the AI in any language you wanted, as long as it complies to a specific SECD machine architecture invented for the contest. At the end of the lightening round, it was announced that the final task included writing an AI for the ghosts as well. Again, the ghost AI could be written in any language, as long as it compiles to a separate 8-bit architecture invented for the contest.

I spent the first several hours implementing my own simulator of the arcade. Eventually I realized that I would have to start working on the AI if I was going to have an entry for the 24-hour lightening division. It was at that point I realized that the provided on-line simulator was plenty adequate for my needs and I never completed my simulator.

I have some previous experience writing assembler DSLs in Haskell to handle linking. After the 2006 ICFP contest, our team wrote a fake UM-DOS shell so that we could submit our solution in UM format. This lead me to writing an article in The Monad Reader about how to write an assembler using recursive do. After that, I encountered a really elegant and simple formulation of an assembler monad on some paste site. Unfortunately, I do not recall the author, but here is how the implementation looks.

newtype Label = Label { unLabel :: Int }
data ASMMonad w a = ASMMonad { runASM :: Label -> ([w],a) }

instance Monad (ASMMonad w) where
  return a = ASMMonad $ \_ -> ([], a)
  x >>= y = ASMMonad (\(Label i) -> let (o0, a) = runASM x (Label i)
                                        (o1, b) = runASM (y a) (Label (i+length o0))
                                     in (o0 ++ o1, b))

instance MonadFix (ASMMonad w) where
  mfix f = ASMMonad (\i -> let (o0, a) = runASM (f a) i in (o0, a))
execASM :: ASMMonad w a -> [w]
execASM m = fst $ runASM m (Label 0)

Next one adds two primitive operations. The tell function is similar to the version for the writer monad. The label function returns the current index of the output stream.

tell :: [w] -> ASMMonad w ()
tell l = ASMMonad $ \_ -> (l,())

label :: ASMMonad w Label
label = ASMMonad $ \i -> ([],i)

Lastly one makes an ASMMonadic value for each assembly instruction

data ASM = LDC Int32  -- load constant
         | LD Int Int -- load variable
         | LDF Label  -- load function
         | ADD
         {- … -}
         deriving Show

ldc x = tell [LDC x]
ld x y = tell [LD x y]
ldf x = tell [LDF x]
add = tell [ADD]
{- … -}

At the risk of jumping ahead too far, my compiler can produce linked assembly code very simply. The clause below compiles a lambda abstraction to linked SECD assembly using recursive do.

compileH env (Abs vars body) = mdo
  jmp end
  begin <- label
  compileH (update env vars) body
  end <- label
  ldf begin

Thanks to recursive do, the first line, jmp end, refers to the end label which is bound in the second last line.

With a DSL assembler written in Haskell, I turned to creating another DSL language in Haskell to compile to this assembly language. The SECD machine is designed for Lisp compilers, so I created a little Lisp language.

data Binding a = a := Lisp a

data Lisp a = Var a
            | Const Int32
            | Cons (Lisp a) (Lisp a)
            | Abs [a] (Lisp a)
            | Rec [Binding a] (Lisp a)
            {- … -}

The Abs constructor builds an n-ary lambda function. The Rec constructor plays the role of letrec to build mutually recursive references. With some abuse of the Num class and OverloadedStrings, this Lisp DSL is barely tolerable to program with directly in Haskell.

  Rec [ {- … -}
      ,"heapNew" := ["cmp"]! (Cons "cmp" 0) -- heap layout 0 = leaf | (Cons (Cons /heap is full/ /value/) (Cons /left tree/ /right tree/))
                                            -- "cmp" @@ ["x","y"] returns true when "x" < "y"
      ,"heapIsFull" := ["h"]! If (Atom "h") 1 (caar "h")
      ,"heapInsert" := ["cmpHeap", "v"]! Rec ["cmp" := (car "cmpHeap")
                                             ,"insert" := ["heap", "v"]! -- returns (Cons /new heap is full/ /new heap/)
                                                If (Atom "heap") (Cons (Cons 1 "v") (Cons 0 0))
                                                (Rec ["root" := cdar "heap"
                                                     ,"left" := cadr "heap"
                                                     ,"right" := cddr "heap"
                                                     ] $
                                                 Rec ["swap" := "cmp" @@ ["v", "root"]] $
                                                 Rec ["newRoot" := If "swap" "v" "root"
                                                     ,"newV" := If "swap" "root" "v"
                                                     ] $
                                                 If (caar "heap" `ou` Not ("heapIsFull" @@ ["left"]))
                                                    (Rec ["rec" := "insert" @@ ["left", "newV"]] $
                                                     Cons (Cons 0 "newRoot") (Cons "rec" "right"))
                                                    (Rec ["rec" := "insert" @@ ["right", "newV"]] $
                                                     Cons (Cons ("heapIsFull" @@ ["rec"]) "newRoot") (Cons "left" "rec")))
                                             (Cons "cmp" ("insert" @@ [cdr "cmpHeap","v"]))
  {- … -}

The @@ operator is infix application for the Lisp langauge and the ! operator is infix lambda abstraction for the Lisp langauge.

This Lisp language compiles to the SECD assembly and the assembly is printed out. The compiler is very simple. It does not even implement tail call optimization. There is a bit of an annoying problem with the compiler; the assembly code is structured in exactly the same way that the original Lisp is structured. In particular, lambda abstractions are compiled directly in place, and since lambda expressions are typically not executed in the location they are declared, I have to jump over the compiled code. You can see this happening in the snippet of my compiler above. I would have preferred to write

compileH env (Abs vars body) = do
  fun <- proc (compileH (update env vars) body)
  ldf fun
where proc is some function that takes an ASMMonad value and sticks the assembly code “at the end” and returns a label holding the location where the assembly code got stashed. However, I could not figure out a clever and elegent way of modifing the assembly monad to support this new primitive. This is something for you to ponder.

My Lambda AI, written in my Lisp variant, is fairly simple and similar to other entries. Lambda-Man searches out the maze for the nearest edible object. It searches down each path until it hits a junction and inserts the location of the junction into a binary heap. It also inserts the junction into a binary tree of encountered junctions. If the junction is already in the binary tree, it does not insert the junction into the heap because it has already considered it. The closest junction is popped off the heap, and the search is resumed.

There is at least one bit of surprising behaviour. If there is more than one path from one junction to another, sometimes Lambda-Man ends up taking the longer path. This behaviour did not seem to be bothersome enough to warrant fixing.

This programming task has renewed my appreciation for typed languages. The Lisp language I developed is untyped, and I made several type errors programming in it. Although it is true that I did detect (all?) my errors at run-time, they were still frustrating to debug. In a typed language, when an invariant enforced by the type system is violated, you get a compile time error that, more or less, points to the code location where the invariant is violated. In an untyped language, when an invariant is violated, you get a run-time error that, more or less, points to some point in the code where missing invariant has caused a problem. While this often is enough to determine what invariant was violated, I had little idea where the code breaking the invariant was located.

With some effort I probably could have used GADTs to bring Haskell’s type checker to the Lisp DSL, but I was not confident enough I could pull that off in time.

I also needed to write some ghost AIs. The 8-bit machine that the ghosts run on is so constrained, 256 bytes of data memory; 256 code locations; 8 registers, that it seemed to make sense to write the code in raw assembly.

The first thing I tried was to make the ghosts move randomly. This meant I needed to write my own pseudo-random number generator. Wikipedia lead me to a paper on how to write long period xorshift random number generators. The examples in that paper are all for 32-bit or 64-bit machines, but I had an 8-bit architecture. I wrote a little Haskell program to find analogous random number generators for 8-bit machines. It found 6 possibilities for 32-bit state random number generator composed of four 8-bit words that satisfied the xorshift constraints described in the paper. Here is the assembly code for getting a 2 bit pseudo-random value.

mov a,[0]
div a,2
xor [0],a
mov a,[0]
mul a,2
xor a,[0]
mov [0],[1]
mov [1],[2]
mov [2],[3]
mul [3],8
xor [3],[2]
xor [3],a
; get 2 bits
mov a,[3]
div a,64

The random seed is held in memory locations [0] through [3]. After moving to the successive the state, this code takes 2 pseudo-random bits from memory location [3] and puts it into register a.

I did not check the quality of this random number generator beyond constructing it so that it has a period of 232-1. I expect the bit stream to appear to be quite random.

My Lambda-Man performed reasonably well against my random ghosts, so I put some effort into making my random ghosts a little smarter. I wrote a ghost AI that tried to get above Lambda-man and attack him from above. Then I made each other ghost try to attack Lambda-man from the other three directions in the same manner. The idea is to try to trap Lambda-man between two ghosts.

These smarter ghosts were quite a bit more successful against my simple Lambda-man AI. At this point I was out of contest time, so that was it for my 2014 ICFP contest submission.

Thanks to the organizers for a terrific contest problem. I am looking forward to see the final rankings.

August 03, 2014 03:09 AM

August 02, 2014

Robin KAY

HsQML released: One Thousand Downloads

A few days ago I released HsQML, a bug fix to my Haskell binding to the Qt Quick GUI library. You can download the latest release from Hackage as usual.

The primary purpose of this release was to fix issue 20. HsQML has code which monitors variant values created using the Qt library in order to prevent objects which are referenced by a variant from being garbage collected. A flaw in this code caused it to examine the data held inside variants even when it wasn't valid, causing a crash in certain circumstances.

release- - 2014.07.31

* Fixed crash when storing Haskell objects in QML variants.
* Fixed corrupted logging output caused by threading.

In related news, HsQML has now reached over 1000 downloads from Hackage since Hackage 2 started collecting download statistics late last year. See the bar chart below:-

The spike in May was owed to the transition to Qt 5 brought about by the release of Hopefully, the graph will climb to new heights with the release of more features in the future!

My target for the next release is to support rendering OpenGL graphics directly from Haskell code and into the QML scene, to better support applications with sophisticated requirements for custom drawing. This is tracked by issue 10.

by Robin KAY ( at August 02, 2014 11:59 AM

August 01, 2014

Douglas M. Auclair (geophf)

1HaskellADay July 2014 problems and solutions

  • July 1st, 2014: (text:) "Hi, all! @1HaskellADay problems are back! #YAY First (renewed) problem: a verbose way of saying, 'Give me a deque!'" Deque, last, and all that (verbose version with hints) (solution: Deque the halls (with my solution): Data.Deque)
  • July 2nd, 2014: (text:) "Today's Haskell exercise: Vectors, length in constant time, and (bonus) reverse return in constant time." Vector (solution: Vector: Magnitude, and Direction, OH, YEAH! Data.Vector)
  • July 4th, 2014: (text:) "Today's exercise(s). pack/unpack. encode/decode. Cheer up, sweet B'Genes!" Cheer up, Sweet B'Genes (solution: GATTACA)
  • July 7th, 2014: (text:) "#haskell daily exercise: ROLL with it, Baby!  ('cause I'm feeling a little #forth'y')" Roll (solution: Rollin' on the (finite) river)
  • Bonus problem: July 7th, 2014: (text:) "For those who found the 'roll'-exercise trivial; here's (a more than) a bit more of a challenge for you to play with." Acid rules! (solution: "A solution set to today's challenge exercise: BASIC ...  ... and Acidic  ... WHEW! That was fun!" BASIC ... and Acitic)
  • July 8th, 2014: (text:) "Today's #Haskell exercise: LOTTO! Powerball! Mega-whatever! Who's the big winner?" Lotto (solution: "And the big winner today is ... solution-set to today's #Haskell lotto exercise" ... and the winner is ...)
  • Bonus problem: July 8th, 2014: (text:) "#bonus #haskell exercise: Well, that was RND... Randomness, and such (or 'as such')" Well, that was RND (solution: For YESTERDAY's bonus question of roll-your-own-rnd-nbr-generator, here's one as comonadic cellular automata (*WHEW*) Data.Random)
  • July 9th, 2014: (text:) "Okay, ... WHERE did yesterday and today GO? :/ #haskell exercise today: "Hey, Buddy!"  I will post solution in 4 hours" Hey, Buddy! Distinct sets-of-an-original-set. (solution: "Here's a story ..." A(n inefficient) solution to bunches and cliques." Brady Bunch)
  • July 10th, 2014: (text:) "Today's #haskell list-exercise: "Get out of the pool!"  Will post a solution at 9 pm EDT (which is what time CET? ;)" (solution: "She's a supa-freak! She's supa-freaky! (Bass riff) A solution to today's #haskell exercise about list-length-ordering")
  • July 11th, 2014: (text:) ""It's Friday, Friday!" So, does that mean Rebecca Black wants to code #haskell, too? Today is a Calendar #exercise" (solution: ""In a New York Minute": a solution to today's #haskell exercise that took WAAAY more than a minute to complete! #WHEW")
  • July 14th, 2014: (text:) "Today's #haskell exercise: isPrime with some numbers to test against. They aren't even really Mp-hard. ;)" First stab at primality-test (solution: "A simple, straightforward stab at the test for primality. #haskell #exercise" The start of a primal inquiry
  • July 15th, 2014: (text:) "Primes and ... 'not-primes.' For a prime, p, a run of p-consecutive 'plain' numbers is today's #haskell exercise:" (solution: "So ya gotta ask yerself da question: are ya feelin' lucky, punk? Run of p non-primes in linear time  #haskell exercise." Alternate solution by Gautier:
  • July 16th, 2014: (text:) "Difference lists? We no need no steenkin' Difference lists!"  DList in #haskell for today's exercise. (solution: "DLists? We got'cher DList right here! A solution to today's #haskell exercise is posted at")
  • July 17th, 2014 (text:) " Prélude à l'après-midi d'un Stream ... I thought that last word was ... something else...? #haskell exercise today." Comonads for lists and Id. (solution: "Control.Comonad: That was easy! #haskell exercise #solution" Learn you a Comonad for Greater GoodFunny story, bro'! id is not necessarily Id. (I knew that.) #haskell solution")
  • Bonus exercise: July 17th, 2014 (text:) "Streams are natural, streams are fun, streams are best when ... THEY'RE BONUS QUESTIONS! #bonus #haskell exercise" LET'S GET THIS PARTY STARTED! (solution: "Take this Stream and ... it! #solution to today's #haskell #bonus exercises")
  • July 18th, 2014: (text: "Today's #haskell exercise: Frère Mersenne would like a prime, please.") (see solution next bullet)
  • Bonus exercise: July 18th, 2014 (text: "#bonus prime-time! Frère Mersenne would be pleased with a partial proof of a prime ... in good time.") (solution: "A #haskell #solution for (monadic?) primes and the #bonus interruptible primes.") Primary primes.
  • Bonus-bonus exercise: July 18th, 2014 (text: "Ooh! π-charts! No. Wait. #bonus-bonus #haskell exercise.") (solution: "#bonus-bonus: a #solution")

  • July 21st, 2014: (text: "Demonstrating coprimality of two integers with examples #haskell exercise") (solution: "A coprimes solution #haskell problem is at")
  • July 22nd, 2014: (text: "The prime factors of a number (and grouping thereof) as today's #haskell exercise.") (solution: "OKAY, THEN! Some prime factors for ya, ... after much iteration (torquing) over this #haskell exercise solution.")
  • Bonus exercise: July 22nd, 2014: ("For today's #bonus #haskell exercise you'll find a Bag 'o gold at the end of the rainbow") (solution: "Second things first: a definition for the Bag data type as today's #bonus #haskell exercise.")
  • July 23rd, 2014: (text: "Today's #haskell exercise, two variations of Euler's totient function") (solution: "And, for a very small φ ...  is a solution-set to today's #haskell exercise.")
  • July 24th, 2014: (text: "WEAKSAUCE! Goldbach's conjecture irreverently presented as a #haskell exercise.") (solution: "That solution to today's #haskell exercise will cost you one Goldbach (*groan!*)")
  • July 25th, 2014: LOGIC! Peano series: it's as easy as p1, p2, p3 ... ... in today's #haskell exercise. "Excuse me, Miss, where do we put this Grande Peano?" A solution to today's #Haskell exercise in the HA!-DSL
  • Bonus: July 25th, 2014: Bonus #haskell problem for today. But not as easy as λa, λb, λc ... Church numerals and booleans. Correction: Ooh! forall! Church encodings and Haskell have a funny-kind of relationship. Updated the #bonus #haskell exercise with rank-types and forall. Solution: "Gimme that olde-time Church encoding ... it's good enough for me!" A solution to today's #bonus #haskell exercise
  • July 28th, 2014: George Boole, I presume? Today's #haskell exercise: Solution: This and-or That ... a NAND-implementation of today's #haskell exercise at
  • July 29th, 2014: Readin' 'Ritin' 'Rithmetic: today's #haskell exercise Solution: That's alotta NANDs! A solution to today's exercise at
  • July 30th, 2014: ACHTUNG! BlinkenLights! Today's #haskell exercise Solution: Let it Snow! Let it Snow! Let it (binary) Snow! A solution to today's exercise is at 
  • July 31st, 2014: π-time! Today's #haskell exercise. BLARG! UPDATE! Please read the update attached to the problem statement, simplifying the calculation quite a bit: Solution: Apple or coconut π? A solution to today's problem

Notes on the problems
  • July 9th, 2014. I didn't quite know how to go about this, so I made several attempts with the State pattern. But how to describe it? It's the base pool from which you draw, and each (sub-)choice-point affects it, what's that type? I spent way too much time trying to discern the type, and failing. But now, a much simpler approach suggests itself to me (after experiencing the New York Minute exercise): this is simply a permutation of the list, and then that permutation is partitioned by the sizes of the groups! Implementing permute-then-partition is a much simpler approach than tracking some monster monadic state transformer.

    No, that didn't work, either. A permutation will give you [[1,2], ...] and[[2,1], ...] That is, all solutions, even the redundant ones. So, I reworked the problem simply following the data. With takeout feeding the iterative-deepening function, I finally got a guarded state-like thingie working fast and correctly. The new solution is on the same page as the old one.
  • July 11th, 2014. The New York Minute problem demonstrates the implementation of a rule-based classifer. It takes unclassified numeric inputs, and based on the cues from the rules, either types each number into day, month, year, hour, minute, or rejects the input data as unclassifiable. I was pleased to have implemented this entire system in less than two days of work! Sweet!
  • July 22nd, 2014. So, I've been running up against the double-exponential cost of computing a stream of primes for some time now since I gave the solution to the question of demonstrating the Prime Number Theorem. So now I have to tackle of bringing down that extraordinary, or unreasonable, cost down to something useable, and along those lines (of feasibility), I'm thinking of instead of regenerating and re-searching the primesish stream that we have some kind of State-like thing of ([already generated primes], indexed-primesish) ... something like that. Solution: "General improvement of problem-solving modules in anticipation of solving today's #haskell exercise, including primes:"

by geophf ( at August 01, 2014 09:57 PM