Planet Haskell

June 17, 2019

Monday Morning Haskell

Loading Games and Changing Colors!


Last week we added functionality for serializing our world state. This allowed us to save our game with the press of a button. We also parameterized our application so that we could customize many aspects of it. We didn't explore all the possibilities though! This week, we'll see how we can load a previous game-state by using command line options. We'll also use options to specify how the game appears!

As always, take a look at our Github repository to see the full code. This article corresponds to the part-9 branch.

All this is part of our effort to make our game more "mature". But you should also consider some libraries that will be more useful in industry! Take a look at our Production Checklist. Whether it's web servers or databases, you'll learn how Haskell interacts with more advanced concepts!

Command Line Options

Everything we do this week will happen through command line options. For a quick refresher on these, take a look at this article! We'll have a couple main concerns. First, we want to take an optional filename argument to load the initial world state. If we get this, we'll throw the player right back into their game! If we don't get this argument, we'll generate a new, random state for them to work with.

Second, we'll make command line parameters for all our different render parameters. This will allow anyone invoking the game to customize the appearance! We'll also allow them to specify a whole file for these as well. This will involve quite a bit of work with the Options.Applicative library.

Our main goal is this function:

parseOptions :: IO (Maybe FilePath, RenderParameters)

This will return us a possible file path to load our world state from, as well as a set of render parameters. Let's start with some basic framework code.

Parser Setup

The first item we want is a generic parser that gives us Maybe values. As far as I could tell, the options library doesn't have this built-in, but it's not too hard to write. We'll start with the option function. It will attempt the given parser, and if it succeeds, we wrap the result in Just. Then we append a default value of Nothing to the options in case it fails.

maybeParser ::
  ReadM a -> Mod OptionFields (Maybe a) -> Parser (Maybe a)
maybeParser reader opts =
  option (Just <$> reader) (opts <> value Nothing)
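If you want to play with the succeed-with-Just, default-to-Nothing idea without pulling in Options.Applicative, here's a dependency-free sketch. The lookupFlag helper is hypothetical (it's not part of the game's code); it scans raw `--flag=value` arguments by hand:

```haskell
import Data.List (stripPrefix)
import Data.Maybe (listToMaybe, mapMaybe)

-- Hypothetical stand-in for the options machinery: look for "--flag=value"
-- among raw arguments, succeeding with Just on a match and defaulting to
-- Nothing otherwise (mirroring the `value Nothing` fallback above).
lookupFlag :: String -> [String] -> Maybe String
lookupFlag flag args =
  listToMaybe (mapMaybe (stripPrefix ("--" ++ flag ++ "=")) args)

main :: IO ()
main = do
  print (lookupFlag "load-file" ["--load-file=save.json"])  -- Just "save.json"
  print (lookupFlag "load-file" ["--other=x"])              -- Nothing
```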

We can now use this to build a parser for the maze file:

mazeFileParser :: Parser (Maybe FilePath)
mazeFileParser = maybeParser str
 (long "load-file" <> short 'f'
   <> help "A file to use to load the world state")

And now we just apply execParser on this, supplying some simple Info for our parser:

parseOptions :: IO (Maybe FilePath)
parseOptions = execParser $ info mazeFileParser commandInfo

commandInfo :: InfoMod (Maybe FilePath)
commandInfo = fullDesc <> progDesc "Haskell Maze Game"

Runner Updates

The next step is a short function for loading our world from the file. Since we have our JSON instance on the World type, we'll rely on decodeFileStrict'. There's one caveat. If the game parameters don't have a random seed value, we'll use a new one. Otherwise we'll use mkStdGen on the seed:

loadWorldFromFile :: FilePath -> IO World
loadWorldFromFile fp = do
 parseResult <- Data.Aeson.decodeFileStrict' fp
 case parseResult of
   Just w -> do
     gen <- case randomGeneratorSeed (worldParameters w) of
       Nothing -> getStdGen
       Just i -> return $ mkStdGen i
     return $ w { worldRandomGenerator = gen }
   Nothing -> error $ "Couldn't parse world from file " ++ fp ++ "!"
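The decode-or-fallback shape here recurs later for render parameters (where a failed parse falls back to defaults instead of an error). As a dependency-free sketch of that shape, with base's Read standing in for Aeson decoding (decodeOrElse is a hypothetical helper, not part of the game):

```haskell
-- Hypothetical helper: parse a value, falling back to a default on failure.
-- `reads` is the base-library stand-in for JSON decoding here.
decodeOrElse :: Read a => a -> String -> a
decodeOrElse fallback contents =
  case reads contents of
    [(x, "")] -> x
    _         -> fallback

main :: IO ()
main = do
  print (decodeOrElse (0 :: Int) "42")        -- 42
  print (decodeOrElse (0 :: Int) "not json")  -- 0
```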

Now we want to make some changes to our main running function. We'll run our argument parser first. If we get a file from the options, we'll load the initial World from that file. Otherwise, we'll use our previous flow with generating a random maze.

main :: IO ()
main = do
 maybeLoadFile <- parseOptions
 initialWorld <- case maybeLoadFile of
   Just loadFile -> loadWorldFromFile loadFile
   Nothing -> …
 play ...

Parsing a Render File

This is good enough to load the world, so we can re-start from a saved (or derived) position. But suppose we wanted to go a step further. Suppose we wanted to also load our render parameters. We could use another file if we liked. We could start with another parser for a file path:

renderFileParser :: Parser (Maybe FilePath)
renderFileParser = maybeParser str
 (long "render-param-file" <> short 'r'
   <> help "A file to use to load render parameters")

Then we'll combine our two parsers together like so:

parser :: Parser (Maybe FilePath, Maybe FilePath)
parser = (,) <$>
 mazeFileParser <*>
 renderFileParser

Now we'll add a bit more logic to the wrapper function. If we have a file, we should use it to load the RenderParameters object:

parseOptions :: IO (Maybe FilePath, RenderParameters)
parseOptions = do
 (mazeFile, renderFile) <- execParser $ info parser commandInfo
 case renderFile of
   Nothing -> return (mazeFile, defaultRenderParameters)
   Just fp -> do
     parseResult <- decodeFileStrict' fp
     case parseResult of
       Nothing -> return (mazeFile, defaultRenderParameters)
       Just fileRenderParams -> return (mazeFile, fileRenderParams)

Note that the type of our commandInfo will also need to change as a result of this. But then we just have the simple task of getting these items out in our main function:

main :: IO ()
main = do
 (maybeLoadFile, renderParams) <- parseOptions

Individual Render Parameters

We have one last trick though! Suppose we want to change one thing about the game's appearance and we don't want to use a JSON file. We can add individual options on render elements. We've got a lot of possible elements. We'll wrap them all in a basic type, matching the fields we have in the different sub-components. Each of these fields is optional. We'll "merge" them with a complete set of render parameters to get a final result.

data RenderParamInfo = RenderParamInfo
 -- Screen/Text Parameters
 (Maybe Int)
 (Maybe Int)
 (Maybe Int)
 (Maybe Float)
 (Maybe Float)
 (Maybe Float)
 (Maybe Float)
 -- Player Parameters
 (Maybe Float)
 (Maybe Color)
 (Maybe Float)
 (Maybe Color)
 -- Enemy Parameters
 (Maybe Float)
 (Maybe Color)
 (Maybe Color)
 -- Cell Parameters
 (Maybe Color)
 (Maybe Color)
 (Maybe Float)

Each field will have its own parser. These will all be variations on our maybeParser:

maybeIntParser :: Mod OptionFields (Maybe Int) -> Parser (Maybe Int)
maybeIntParser = maybeParser auto

maybeFloatParser ::
  Mod OptionFields (Maybe Float) -> Parser (Maybe Float)
maybeFloatParser = maybeParser auto

maybeColorParser ::
  Mod OptionFields (Maybe Color) -> Parser (Maybe Color)
maybeColorParser = maybeParser (maybeReader colorReader)
  where
    colorReader "blue" = Just blue
    … -- other colors
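To see the colorReader idea in isolation, here's a dependency-free sketch: a total function mapping color names to values, with Nothing for unrecognized input. MyColor is an invented stand-in for Gloss's Color type:

```haskell
-- Stand-in for Gloss's Color, so the sketch needs no extra packages.
data MyColor = Blue | Red | Yellow | Green deriving (Eq, Show)

-- Map a color name to a value; unknown names fail with Nothing,
-- which is exactly what maybeReader expects.
readColor :: String -> Maybe MyColor
readColor "blue"   = Just Blue
readColor "red"    = Just Red
readColor "yellow" = Just Yellow
readColor "green"  = Just Green
readColor _        = Nothing

main :: IO ()
main = print (readColor "blue", readColor "mauve")
```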

Then we can combine them using applicative syntax, and providing some help information:

parseRenderInfo :: Parser RenderParamInfo
parseRenderInfo = RenderParamInfo <$>
 maybeIntParser (long "screen-dimen"
   <> help "The screen width/height") <*>
 maybeIntParser (long "screen-offset-x"
   <> help "The screen offset in the x direction") <*>
 …

Next we'll write a "merge" function. This will take a RenderParameters item with default values, and apply the Just values.

mergeOptions ::
  RenderParameters -> RenderParamInfo -> RenderParameters
mergeOptions rp (RenderParamInfo sd_ sox_ ...)
 = RenderParameters
   (fromMaybe (screenDimen rp) sd_)
   (fromMaybe (screenOffsetX rp) sox_)
   …
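The merge pattern is easy to test on its own. Here's a self-contained sketch with cut-down, invented stand-ins for RenderParameters and RenderParamInfo (just two fields each):

```haskell
import Data.Maybe (fromMaybe)

-- Cut-down stand-in for RenderParameters with only two fields.
data Params = Params { dimen :: Int, wallWidth :: Float }
  deriving (Eq, Show)

-- Cut-down stand-in for RenderParamInfo: optional overrides.
data Overrides = Overrides (Maybe Int) (Maybe Float)

-- Each Just field overrides the default; Nothing keeps it.
mergeParams :: Params -> Overrides -> Params
mergeParams p (Overrides d w) =
  Params (fromMaybe (dimen p) d) (fromMaybe (wallWidth p) w)

main :: IO ()
main = print (mergeParams (Params 625 2.5) (Overrides (Just 800) Nothing))
```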

Then we add this new parser to our set:

parser :: Parser (Maybe FilePath, Maybe FilePath, RenderParamInfo)
parser = (,,) <$>
 mazeFileParser <*>
 renderFileParser <*>
 parseRenderInfo

And our original function should now reflect the need to merge parameters.

parseOptions :: IO (Maybe FilePath, RenderParameters)
parseOptions = do
 (mazeFile, renderFile, renderInfo) <- execParser $
                                         info parser commandInfo
 case renderFile of
   Nothing -> return
     (mazeFile, mergeOptions defaultRenderParameters renderInfo)
   Just fp -> do
     parseResult <- decodeFileStrict' fp
     case parseResult of
       Nothing -> return
         (mazeFile, mergeOptions defaultRenderParameters renderInfo)
       Just fileRenderParams -> return
         (mazeFile, mergeOptions fileRenderParams renderInfo)

Wrapping Up

Now we've got a lot of interesting possibilities. We can save our game from a particular state.


Then we can load it back up with different colors. For example, we can make some obnoxious green walls:

stack exec -- maze-game --load-file=maze_game_save_1557121650\
  --cell-wall-color=green --enemy-base-color=red


Now that our game is customizable and re-loadable we'll be able to do a lot more interesting things with it. We'll be able to run many simulations that test the difficulty. Some day, we'll be able to make the AI much better! For the time being though, there are still some more features we can add. So the next couple parts of this series will explore some other exciting twists in the game-play. This will lead us to a point where other types of refactors will be necessary.

Making a project like this requires a good knowledge of the Stack tool. To learn more, take our free mini-course on Using Stack!

by James Bowen at June 17, 2019 02:30 PM

June 16, 2019

Gabriel Gonzalez

The CAP theorem for software engineering


The CAP theorem says that distributed computing systems cannot simultaneously guarantee all three of:

  • Consistency - Every read receives the most recent write or an error

  • Availability - Every request receives a (non-error) response - without the guarantee that it contains the most recent write

  • Partition tolerance - The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes

Source: CAP theorem - Wikipedia

Since we cannot guarantee all three, we must typically sacrifice at least one of those guarantees (i.e. sacrifice availability or sacrifice partition tolerance).

However, what if we were to squint and apply the CAP theorem to another distributed system: a team of software engineers working towards a common goal?

In particular:

  • What if our data store were a distributed version control system?

  • What if our “nodes” were software developers instead of machines?

If we view engineering through this lens, we can recognize many common software engineering tradeoffs as special cases of CAP theorem tradeoffs. In other words, many architectural patterns also require us to sacrifice at least one of consistency, availability, or partition tolerance among developers.

Before we get into examples, I’d like to review two points that come up in most discussions of the CAP theorem:

Partition tolerance

What does it mean to sacrifice partition tolerance? In the context of machines, this would require us to rule out the possibility of any sort of network failure.

Now replace machines with developers. We’d have to assume that people never miscommunicate, lose internet access, or fetch the wrong branch.

The possibility of partitions is what makes a system a distributed system, so we're usually not interested in the option of sacrificing partition tolerance. That would be like a computing system with only one machine or a team with only one developer.

Instead, we’ll typically focus on sacrificing either availability or consistency.

Spectrums of tradeoffs

In most systems you’re always sacrificing all three of consistency, availability, and partition tolerance to some degree if you look closely enough. For example, even a healthy machine is not 100% available if you consider that even the fastest network request still has an irreducible delay of around a few hundred microseconds on today’s machines.

In practice, we ignore these vanishingly small inconsistencies or unavailabilities, but they still illustrate a general pattern: we can think of system health/availability/consistency as spectrums rather than boolean options.

For example, if we say we choose availability over consistency, we really mean that we choose to make our system’s unavailability vanishingly small and that our system could be consistent, but not all the time. Indeed, if our hardware or network were both fast and extremely reliable we could enjoy both high consistency and high availability, but when things fail then we need to prioritize which of consistency or availability that we sacrifice to accommodate that failure.

We can also choose to sacrifice a non-trivial amount of both availability and consistency. Sometimes exclusively prioritizing one or the other is not the right engineering choice!

With those caveats out of the way, let’s view some common software engineering tradeoffs through the lens of the CAP theorem.

Monorepo vs. Polyrepo

In revision control systems, a monorepo (syllabic abbreviation of monolithic repository) is a software development strategy where code for many projects are stored in the same repository

A “polyrepo” is the opposite software development strategy where each project gets a different source repository. In a monorepo, projects depend on each other by their relative paths. In a polyrepo, a project can depend on another project by referencing a specific release/revision/build of the dependency.

The tradeoff between a monorepo and a polyrepo is a tradeoff between consistency and availability. A monorepo prioritizes consistency over availability. Conversely, a polyrepo prioritizes availability over consistency.

To see why, let’s pretend that project A depends on project B and we wish to make a breaking change to project B that requires matching fixes to project A. Let’s also assume that we have some sort of continuous integration that ensures that the master branch of any repository must build and pass tests.

In a polyrepo, we can make the breaking change to the master branch of project B before we are prepared to make the matching fix to project A. The continuous integration that we run for project B’s repository does not check that other “downstream” projects that depend on B will continue to build if they incorporate the change. In this scenario, we’ve deferred the work of integrating the two projects together and left the system in a state where the master branch of project B is not compatible with the master branch of project A. (Note: the master branch of project A may still build and pass tests, but only because it depends on an older version of project B).

In a monorepo, we must bundle the breaking change to project B and the fix to project A in a single logical commit to the master branch of our monorepo. The continuous integration for the monorepo prevents us from leaving the master branch in a state where some of the projects don’t build. Fixing these package incompatibilities up-front will delay merging work into the master branch (i.e. sacrificing availability of our work product) but imposing this restriction ensures that the entire software engineering organization has a unified view of the codebase (i.e. preserving consistency).
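The two CI policies described above can be captured in a toy model (all names here are invented for illustration; this is a sketch of the tradeoff, not anyone's actual CI system):

```haskell
-- Toy model: two projects, where `builds` reports whether each project
-- still compiles after a proposed change.
data Project = ProjA | ProjB deriving (Eq, Show, Enum, Bounded)

-- Monorepo CI: a commit lands only if *every* project still builds
-- (consistency-first: integration cost paid up front).
monorepoAccepts :: (Project -> Bool) -> Bool
monorepoAccepts builds = all builds [minBound .. maxBound]

-- Polyrepo CI: only the repository being changed is checked
-- (availability-first: integration cost deferred downstream).
polyrepoAccepts :: (Project -> Bool) -> Project -> Bool
polyrepoAccepts builds changed = builds changed

main :: IO ()
main = do
  let builds p = p == ProjB  -- the breaking change to B leaves A broken
  print (monorepoAccepts builds)        -- False: the commit is blocked
  print (polyrepoAccepts builds ProjB)  -- True: B's repo merges anyway
```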

To make the analogy precise, let’s revisit the original definitions of consistency, availability, and partition tolerance:

  • Consistency - Every read receives the most recent write or an error

  • Availability - Every request receives a (non-error) response - without the guarantee that it contains the most recent write

  • Partition tolerance - The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes

… and change them to reflect the metaphor of a distributed team of developers collaborating via GitHub/GitLab:

  • Consistency - Every git pull receives the latest versions of all dependencies

  • Availability - Every pull request succeeds - without the guarantee that it contains the latest versions of all dependencies

  • Partition tolerance - git operations continue to work even if GitHub/GitLab is unavailable

Trunk based development vs. Long-lived branches

Our previous scenario assumed that each repository was using “trunk-based development”, defined as:

A source-control branching model, where developers collaborate on code in a single branch called “trunk” [and] resist any pressure to create other long-lived development branches by employing documented techniques.


The opposite of trunk-based development is “long-lived branches” that are not the master branch (i.e. the “trunk” branch).

Here are some examples of long-lived branches you’ll commonly find in the wild:

  • A develop branch that is used as the base branch of pull requests. This develop branch is periodically merged into master (typically at release boundaries)

  • Release branches that are supported for months or years (i.e. long-term support releases)

  • Feature branches that people work on for an extended period of time before merging their work into master

The choice between trunk-based development and long-lived branches is a choice between consistency and availability. Trunk-based development prioritizes consistency over availability. Long-lived branches prioritize availability over consistency.

To see why, imagine merging a feature branch over a year old back into the master branch. You’ll likely run into a large number of merge conflicts because up until now you sacrificed consistency by basing your work on an old version of the master branch. However, perhaps you would have slowed down your iteration speed (i.e. sacrificing availability of your local work product) if you had to ensure that each of your commits built against the latest master.

You might notice that trunk-based development vs. long-lived branches closely parallels monorepo vs. polyrepo. Indeed, organizations that prefer monorepos also tend to prefer trunk-based development because they both reflect the same preference for developers sharing a unified view of the codebase. Vice versa, organizations that prefer polyrepo also tend to prefer long-lived branches because both choices emerge from the same preference to prioritize availability of developers’ work product. These are not perfect correlations, though.

Continuous integration vs. Test team

Continuous Integration (CI) is a development practice that requires developers to integrate code into a shared repository several times a day. Each check-in is then verified by an automated build, allowing teams to detect problems early.

Source: Continuous Integration

So far we’ve been assuming the use of continuous integration to ensure that master stays “green”, but not all organization operate that way. Some don’t use continuous integration and rely on a test team to identify integration issues.

You can probably guess where this is going:

  • Continuous integration prioritizes consistency over availability
  • Use of a test team prioritizes availability over consistency

The more you rely on continuous integration, the more you need to catch (and fix) errors up front, since the error-detection process is automated. The more you rely on a test team, the more developers tend to defer detection of errors and bugs, leaving the system in a potentially buggy state for features not covered by automated tests.

Organizations that use a test team prioritize availability of developers’ work product, but at the expense of possibly deferring consistency between components system-wide. Vice-versa, organizations that rely on continuous integration prioritize consistency of the fully integrated system, albeit sometimes at the expense of the progress of certain components.


Remember that each one of these choices is really a spectrum. For example:

  • Many monorepos are partially polyrepos, too, if you count their third-party dependencies. The only true monorepo is one with no external dependencies

  • The distinction between trunk-based development and long-lived branches is a matter of degree. There isn’t a bright line that separates a short-lived branch from a long-lived one.

  • Many organizations use a mix of continuous integration (to catch low-level issues) and a test team (to catch high-level issues). Also, every organization has an implicit test team: their customers, who will report bugs that even the best automation will miss.


This post is not an exhaustive list of software engineering tradeoffs that mirror the CAP theorem. I’ll wager that as you read this several other examples came to mind. Once you recognize the pattern you will begin to see this tension between consistency and availability everywhere (even outside of software engineering).

Hopefully this post can help provide a consistent language for talking about these choices so that people can frame these discussions in terms of their organization’s core preference for consistency vs availability. For example, maybe in the course of reading this you noticed that your organization prefers availability in some cases but consistency in others. Maybe that’s a mistake you need to correct or maybe it’s an inevitability since we can never truly have 100% availability or 100% consistency.

You might be interested in what happens if you take availability or consistency to their logical conclusion. For example, Kent Beck experiments with an extreme preference for consistency over availability in test && commit || revert.

On the other hand, if you prioritize availability over consistency at all costs you get … the open source ecosystem.

This is not the first post exploring the relationship between the CAP theorem and software development. For example, Jessica Kerr already explored this idea of treating teams as distributed systems in Tradeoffs in Coordination Among Teams.

by Gabriel Gonzalez at June 16, 2019 04:39 PM

June 13, 2019

Neil Mitchell

HLint's path to the GHC parser

Summary: HLint is going to switch to the GHC parser over the next few months. The plan is below.

For some time, HLint has been accumulating a list of those files which are valid GHC Haskell but don't parse with haskell-src-exts. The list of differences keeps growing. While I have appreciated all the maintainers of haskell-src-exts, there have been a fair few of them recently, and the project has felt like it was in maintenance mode, rather than a vibrant project.

To solve this problem, I decided to switch to the GHC parser. However, the GHC parse tree changes significantly with each version of GHC, and HLint needs to support more than one version of GHC. The solution was ghc-lib - a decoupling of the GHC API, turning it into a reusable library. As of now, the latest haskell-src-exts maintainer has recommended people move to ghc-lib.

The plan for HLint is tracked in a GitHub issue. The first step was to switch so all files are parsed with both haskell-src-exts and ghc-lib - with a failure if either parser fails - that step has been completed and released (with much work from Shayne Fletcher, who is my partner in crime for this transition).

The next step was to abstract over the Language.Haskell.HLint3 API to produce a version that didn't fundamentally rely on the haskell-src-exts data types. That has led to the Language.Haskell.HLint4 API which makes things like parsed modules abstract, and removes functions that Aelve Codesearch showed weren't being used in practice (e.g. functions for approximate Scope resolution).

The next release will ship with a 0.1 breaking-change bump and HLint3 reexporting what is currently HLint4. If you think the HLint4 API does not include necessary functions, please let me know ASAP. After that release, we'll start changing hints one by one to use the GHC parse tree. Once that is complete, we will drop the dependency on haskell-src-exts and the project will be complete.

For command line users of HLint you should notice greater compatibility with GHC, but relatively little else.

by Neil Mitchell at June 13, 2019 09:32 AM

June 12, 2019

Holden Karau

Validating Big Data Jobs An exploration with @ApacheSpark & @ApacheAirflow (+ friends) @ FOSDEM

Thanks for joining me on 2019-02-03 at FOSDEM 2019 Brussels, Belgium for Validating Big Data Jobs An exploration with @ApacheSpark & @ApacheAirflow (+ friends).

The slides are at

Comment below to join in the discussion :). Talk feedback is appreciated at

by Holden Karau at June 12, 2019 10:56 AM

Holden @ Kiwi Code Mania: talk title TBD @ Code Mania

Thanks for joining me on 2019-05-15 at Code Mania 2019 Auckland, New Zealand for Holden @ Kiwi Code Mania: talk title TBD. I'll update this post with the slides soon. Comment below to join in the discussion :). Talk feedback is appreciated at

by Holden Karau at June 12, 2019 09:42 AM

The Importance of Reviewing PR comments @ Scala Days EU 2019

Come join me on Wednesday 12 June @ 17:45 for The Importance of Reviewing PR comments. I'll update this post with the slides soon. Come to the talk or comment below to join in the discussion :). Talk feedback is appreciated at

by Holden Karau at June 12, 2019 09:42 AM

June 10, 2019

Monday Morning Haskell

Spring Cleaning: Parameters and Saving!


Our game is a lot more interesting after the changes we made last week. But there's still lots of room for improvement. There are many things that could make the game more or less interesting, depending on how we tune things. For instance, how many enemies is too many? What's a good stun duration? How quickly does the game get harder if we tweak these parameters?

Right now, it would be hard for us to answer these questions in a systematic way. We've baked many of these parameters directly into the code. So we would have to recompile everything if we wanted to test out another version of the game. This is a bad sign. We should be able to run the same binary with different sets of parameters.

Another issue with our game is that the only real "flow" is to start off with a random arrangement. We don't know what the map will be or where the enemies start. But if we want to test how well certain concepts work, we'll want true re-playability. In other words, we'll want to be able to start the game from a certain state we've established.

This week, we'll start to clean these things up. The first job will be to move a lot of our magic numbers into the World type. Then we'll devise a way to serialize the complete world state. We've already done the hard work of ensuring we can serialize the map. The rest will be pretty straightforward using JSON instances. We'll wrap up this week by adding the option to save the game in the middle of the action. Then next week, we'll add some options to our game so we can load a particular starting state from a file.

As with each of the different phases of this project, you can take a look at our Github repository to see how we do everything. For this article, you should be following the part-8 branch. We'll also provide a couple of commit links so you can follow along step-by-step.

As you get better at using Haskell, you'll be able to use it for more and more types of projects. Download our Production Checklist for some ideas!

Parameterizing the App

There are a lot of "magic numbers" floating around our app right now. Some of these have to do with game-play logic. How many enemies are there? What's their cool-down time? How long is our player's stun timer? Then there are other parameters that have to do with how we draw the game. For instance, what colors do we use? How big are the cells?

We make this distinction because we should be able to run our game without any render information. At some point, we'll run simulations of this game that don't get drawn at all. So it would be useless to have drawing information around. Thus we'll have GameParameters types that will live in the World. Then we'll have RenderParameters types for drawing everything.

With that said, let's start devising what information these types contain. We'll start out with types describing the player and enemy parameters:

data PlayerGameParameters = PlayerGameParameters
  { initialStunTimer :: Word
  , stunTimerIncrease :: Word
  , stunTimerMax :: Word
  , stunRadius :: Int
  }

data EnemyGameParameters = EnemyGameParameters
  { initialStunTime :: Word
  , stunTimeDecrease :: Word
  , minStunTime :: Word
  , enemyRandomMoveChance :: Word
  , initialLagTime :: Word
  , minLagTime :: Word
  }

Now we can use these to populate a bigger type with more generic game parameters:

data GameParameters = GameParameters
  { numRows :: Int
  , numColumns :: Int
  , numEnemies :: Int
  , tickRate :: Int
  , playerGameParameters :: PlayerGameParameters
  , enemyGameParameters :: EnemyGameParameters
  , randomGeneratorSeed :: Maybe Int
  }

Notice the random seed is a Maybe value. In the normal circumstances of running the program, we don't want to fix the random generator. But there are cases where we'll want to load from a specific stored state. If we fix the generator seed value, gameplay will be deterministic. This could be a desirable property in some circumstances. In most cases though, this will be Nothing.
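To see why a fixed seed makes gameplay deterministic, here's a self-contained sketch using a toy linear congruential generator as a stand-in for System.Random's StdGen (the constants and names are invented for illustration):

```haskell
-- Toy pure generator standing in for StdGen.
newtype Gen = Gen Int

-- One step of a linear congruential generator: same state in, same
-- value and next state out, every time.
nextVal :: Gen -> (Int, Gen)
nextVal (Gen s) =
  let s' = (s * 1103515245 + 12345) `mod` 2147483648
  in (s', Gen s')

-- Draw n values, threading the generator state through.
rolls :: Int -> Gen -> [Int]
rolls 0 _ = []
rolls n g = let (x, g') = nextVal g in x : rolls (n - 1) g'

main :: IO ()
main = do
  print (rolls 3 (Gen 42) == rolls 3 (Gen 42))  -- True: same seed, same game
  print (rolls 3 (Gen 1) == rolls 3 (Gen 2))    -- False: different seeds diverge
```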

With all these types in place, we'll now add the game parameters to our World:

data World = World
  { …
  , worldParameters :: GameParameters
  }

We'll go through a similar process with RenderParameters. The main difference will be that we will not attach the type to the World. There will also be a CellRenderParameters type as well as types for the Player and Enemy. This gives us information about how individual cells get displayed on our screen. Here's a quick sample of this code. You can see the other types at the bottom as an appendix.

data RenderParameters = RenderParameters
  { screenWidth :: Float
  , screenHeight :: Float
  , screenOffsetX :: Float
  , screenOffsetY :: Float
  , textOffset :: (Float, Float)
  , textScale :: (Float, Float)
  , playerRenderParameters :: PlayerRenderParameters
  , enemyRenderParameters :: EnemyRenderParameters
  , cellRenderParameters :: CellRenderParameters
  }

data CellRenderParameters = CellRenderParameters
  { cellWallColor :: Color
  , cellStunColor :: Color
  , cellWallWidth :: Float
  }

No More Magic

Once we have these types in place, our next step is to replace the magic numbers (and colors) in our application. We'll need to add the parameters as arguments in a few places. Most of all, the drawingFunc will need the RenderParameters argument.

drawingFunc :: RenderParameters -> World -> Picture

This process isn't too much of a challenge, as all our important functions take the World as an input. Then for now, we'll pass our default parameter packs as arguments when running the program. Here's a quick look at changes to our main function:

main :: IO ()
main = do
  gen <- getStdGen
  let gameParams = defaultGameParameters
      renderParams = defaultRenderParameters
      (maze, gen') = generateRandomMaze
        gen (numRows gameParams, numColumns gameParams)
      (randomLocations, gen'') = runState
        (replicateM (numEnemies gameParams)
          (generateRandomLocation
            (numRows gameParams, numColumns gameParams)))
        gen'
      enemies = (mkNewEnemy
        (enemyGameParameters gameParams)) <$> randomLocations
      endCell = (numColumns gameParams - 1, numRows gameParams - 1)
      initialWorld = World
        (newPlayer (playerGameParameters gameParams))
        (0,0) endCell maze GameInProgress gen'' enemies [] 0
  play
    (windowDisplay renderParams)
    …
    (tickRate gameParams)
    initialWorld
    (drawingFunc renderParams)
    …

Take a look at this commit for a longer look at all our parameter changes.

Serializing Our World

Now that we've updated our World type, we'll want to determine how we can serialize it. For simplicity's sake, we'll use JSON serialization. This is mostly a matter of creating (or, if you wish, deriving) a bunch of ToJSON and FromJSON instances. Check out this article for a refresher on the Data.Aeson library.

Most of this code is pretty simple. With game parameters, a lot of the instances are a simple matter of creating and parsing pairs. Here's an example with Player:

instance FromJSON Player where
  parseJSON = withObject "Player" $ \o -> do
    location <- o .: "location"
    currentStunDelay <- o .: "currentStunDelay"
    nextStunDelay <- o .: "nextStunDelay"
    return $ Player location currentStunDelay nextStunDelay

instance ToJSON Player where
  toJSON p = object
    [ "location" .= playerLocation p
    , "currentStunDelay" .= playerCurrentStunDelay p
    , "nextStunDelay" .= playerNextStunDelay p
    ]

But there are a few caveats. To start, we need to make a separate file for these instances. We'll need our maze parsing code for the World type, and this depends on the Types module. We have to avoid the resulting dependency cycle.

It's generally a bad practice to separate instances from the type declarations. We call these "orphan" instances, and the compiler will warn you about them. A way around this is to create wrapper types. This is a little tedious, so we won't do it for all the types. But we will show the concept for the Color type from Graphics.Gloss. Let's start with a wrapper type:

newtype ColorWrapper = ColorWrapper { unColor :: Color }

Now we can create instances on this wrapper, and they're considered valid. To cover all cases of color, we'd use RGB arrays. But we'll only cover the 9 or so colors we care about and parse them as strings. Here's what the instances look like. Notice how we wrap and unwrap the actual library functions for the colors:

instance ToJSON ColorWrapper where
  toJSON (ColorWrapper c) = Ae.String colorStr
    where
      colorStr
        | c == blue = "blue"
        | c == red = "red"
        | c == yellow = "yellow"
        | c == green = "green"
        | c == cyan = "cyan"
        | c == orange = "orange"
        | c == magenta = "magenta"
        | c == rose = "rose"
        | c == black = "black"
        | otherwise = error "Unsupported color!"

instance FromJSON ColorWrapper where
  parseJSON = withText "ColorWrapper" parseText
    where
      parseText "blue" = return (ColorWrapper blue)
      parseText "red" = return (ColorWrapper red)
      parseText "yellow" = return (ColorWrapper yellow)
      parseText "green" = return (ColorWrapper green)
      parseText "cyan" = return (ColorWrapper cyan)
      parseText "orange" = return (ColorWrapper orange)
      parseText "magenta" = return (ColorWrapper magenta)
      parseText "rose" = return (ColorWrapper rose)
      parseText "black" = return (ColorWrapper black)
      parseText _ = error "Couldn't parse color!"

Then we can use these instances within other parsers for render parameters. The other caveat now is to parse out our World in two stages. We'll get all the basic fields and parameters first:

instance FromJSON World where
  parseJSON = withObject "World" $ \o -> do
    player <- o .: "player"
    startLoc <- o .: "startLocation"
    endLoc <- o .: "endLocation"
    result <- o .: "result"
    enemies <- o .: "enemies"
    stunCells <- o .: "stunCells"
    time <- o .: "time"
    params <- o .: "gameParameters"

Now we'll get the boundaries as a Text item. We'll parse the maze boundaries out using our parser as well as the number of rows and columns.

instance FromJSON World where
  parseJSON = withObject "World" $ \o -> do
    (boundaryString :: Text) <- o .: "boundaries"
    let (rs, cs) = (numRows params, numColumns params)
    let boundaries = case runParser
          (mazeParser (rs, cs)) "" boundaryString of
            Right result -> result
            _ -> error "Map parse failed!"

As a last trick, we'll check what the random seed is within our parameters. If it's Nothing, we'll fix the generator with a seed of 1 and rely on other code to change it:

instance FromJSON World where
  parseJSON = withObject "World" $ \o -> do
    let gen = case randomGeneratorSeed params of
          Just i -> mkStdGen i
          _ -> mkStdGen 1
    return $ World player startLoc endLoc boundaries
      result gen enemies stunCells time params

Take a look at this commit to see the full code for these instances. Now let's see how we use them!

Saving Our World

We'd like to make it so that our user can save their game-state by hitting the s key at any point in the game. This idea starts out simple enough. We add a handler for the key in our inputHandler.

inputHandler :: Event -> World -> World
inputHandler event w
  | worldResult w == GameWon = ...
  | worldResult w == GameLost = ...
  | otherwise = case event of
      (EventKey (Char 's') Down _ _) -> ...

But now we're a little stuck! We want to write out to a file, but our handler is a pure function! There miiight be a way to do this without breaking functional purity. Perhaps we could keep a list of saved world states and add a handler to save them at the end of our program. But Gloss wasn't made for that. So we're going to break the rules a bit and resort to unsafePerformIO. This allows us to run an IO computation from a seemingly pure context. Here's the basic layout:

inputHandler :: Event -> World -> World
inputHandler event w
  | otherwise = case event of
      (EventKey (Char 's') Down _ _) -> unsafeSaveWorldToFile w

unsafeSaveWorldToFile :: World -> World
unsafeSaveWorldToFile w = unsafePerformIO $ do
  return w

Since we have a JSON instance for our World, we'll lean on the encodeFile function. The rest of the work here comes from generating a filename using the current time, for uniqueness:

unsafeSaveWorldToFile :: World -> World
unsafeSaveWorldToFile w = unsafePerformIO $ do
  timeAsString <- show . floor <$> getPOSIXTime
  currentDir <- getCurrentDirectory
  let filename = currentDir ++ "/maze_game_save_" ++ timeAsString
  encodeFile filename w
  return w

And that's all it takes for us to save some game files! Perhaps you've heard the phrase "don't try this at home." When it comes to unsafePerformIO, feel free to try it at home, but don't try it at work! Take a look at this commit for details on saving the state.


In spite of unsafePerformIO, our game feels like a much more "grown-up" program now. The code quality is much better with our parameters. We now have many more options for improving it. Saving the world state is the first step towards solving some interesting problems. Next week, we'll explore how we can load the saved game states we've created.

As we move forward, we'll keep trying to turn this game into a more mature program. Eventually though, you should think about using Haskell for more common production use cases. To learn about different libraries you can use, download our Production Checklist!

Appendix: Render Parameter Types

data RenderParameters = RenderParameters
  { screenDimen :: Int
  , screenOffsetX :: Int
  , screenOffsetY :: Int
  , textOffset :: (Float, Float)
  , textScale :: (Float, Float)
  , playerRenderParameters :: PlayerRenderParameters
  , enemyRenderParameters :: EnemyRenderParameters
  , cellRenderParameters :: CellRenderParameters
  }

data PlayerRenderParameters = PlayerRenderParameters
  { playerIndicatorSize :: Float
  , playerIndicatorColor :: Color
  , playerStunIndicatorSize :: Float
  , playerStunIndicatorColor :: Color
  }

data EnemyRenderParameters = EnemyRenderParameters
  { enemySize :: Float
  , enemyBaseColor :: Color
  , enemyStunnedColor :: Color
  }

data CellRenderParameters = CellRenderParameters
  { cellWallColor :: Color
  , cellStunColor :: Color
  , cellWallWidth :: Float
  }

defaultRenderParameters :: RenderParameters
defaultRenderParameters = RenderParameters
  625 10 10 (-275, 0) (0.12, 0.25) playerParams enemyParams cellParams
  where
    playerParams = PlayerRenderParameters 10 black 5 red
    enemyParams = EnemyRenderParameters 10 orange yellow
    cellParams = CellRenderParameters blue cyan 2

by James Bowen at June 10, 2019 02:30 PM

June 08, 2019

Mark Jason Dominus

The inside-outness of category theory

I have pondered category theory periodically for the past 35 years, but not often enough to really stay comfortable with it. Today I was pondering again. I wanted to prove that $1\times A\cong A$ and I was having trouble. I eventually realized my difficulty: my brain had slipped out of category theory mode so that the theorem I was trying to prove did not even make sense.

In most of mathematics, $1\times A$ would denote some specific entity and we would then show that that entity had a certain property. For example, in set theory we might define $1\times A$ to be some set, say the set of all Kuratowski pairs $\langle \varnothing, a\rangle$ where $a\in A$:

$$ 1×A =_{\text{def}} \{ z : \exists a\in A : z = \{\{\varnothing\}, \{\varnothing, a\}\} \} $$

and then we would explicitly construct a bijection $f : A\to 1\times A$:

$$ f(a) = \{\{\varnothing\}, \{\varnothing, a\}\}$$

In category theory, this is not what we do. Everything is less concrete. $\times$ looks like an operator, one that takes two objects and yields a third. It is not. $1\times A$ does not denote any particular entity with any particular construction. (Nor does $1$, for that matter.) Instead, it denotes an unspecified entity, which happens to have a certain universal property, perhaps just one of many such entities with that property, and there is no canonical representative of the class. It's a mistake to think of it as a single thing, and it's also a mistake to ask the question the way I did ask it. You can't show that $1\times A$ has any specific property, because it's not a specific thing. All you can do is show that anything with the one required universal property must also have the other property. We should rephrase the question like this:

Let $1\times A$ be a product of $1$ and $A$. Then $1\times A \cong A$.

Maybe a better phrasing is:

Let $1\times A$ be some object that is a product of $1$ and $A$. Then $1\times A \cong A$.

The notation is still misleading, because it looks like $1\times A$ denotes the result of some operation, and it isn't. We can do a little better:

Let $P$ be a product of $1$ and $A$. Then $P \cong A$.

That's it, that's the real theorem. It seems at first to be more difficult — where do we start? But it's actually easier! Because now it's enough to simply prove that $A$ itself is a product of $1$ and $A$, which is easily done: its projection morphisms are evidently $!$ and $\mathrm{id}_A$. And by a previous theorem that all products are isomorphic, any other product, such as $1\times A$, must be isomorphic to this one, which is $A$ itself.
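
To spell out that last step in symbols (my notation, not the post's): given any object $X$ with arrows $x_1 : X \to 1$ and $x_2 : X \to A$, a mediating arrow $u : X \to A$ for the candidate product $A$ must satisfy

$$ \mathrm{id}_A \circ u = x_2 \quad\text{and}\quad {!} \circ u = x_1 $$

The first equation forces $u = x_2$, so the mediating arrow exists and is unique; the second holds automatically because $1$ is terminal.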

(We can similarly imagine that any theorem that mentions $1$ is preceded by the phrase “Let $1$ be some terminal object.”)

by Mark Dominus at June 08, 2019 06:16 AM

June 07, 2019

Alson Kemp

On Euthanizing A Companion Animal

[This is rather off-topic but it’s cathartic and might be helpful to someone.]

We recently euthanized a much beloved family cat. The process was both straightforward and bewildering. Herewith, notes on our experience along with suggestions about how we might approach it differently in the future.


This is about the mechanics of euthanizing a particular animal.  It should be applicable to larger animals in different environments.  Emotional and spiritual aspects are not addressed; those are difficult enough but not understanding the mechanics of the process only compounds the difficulty.

The Animal

He was a gregarious and happy cat, though he was a little “well fed”. In the last month or two, he’d looked rather slimmer, had taken to “hiding” in unused rooms and then to snuggling aggressively, was not eating or drinking as he normally would. Tests were done, nothing was found and the downward spiral continued over the next few weeks.

We took him to another vet. They looked at his teeth, listened to his heart, squeezed his belly… and said we’re going to take him in the back room for a moment. They came back with ultrasound pictures (no charge) of a significant tumor.

At this point, the discussion turned to heroic (tumor resection + chemo) and/or palliative measures (he might be comfortable for a few more weeks with prednisone), no doubt to assure the pet owners that euthanasia was not the only option. This discussion was quickly cut off: we appreciate the situation, we know his condition, we know where this ends, further pain is not warranted.

Chronology [roughly]

  • 10:00:00 – Arrive in exam room.
  • 10:10:00 – Physical exam complete. Per earlier note, the cat is carried to the backroom.
  • 10:15:00 – Vet returns with ultrasound pictures of tumor.
  • 10:16:00 – Discussion.
  • 10:18:00 – End of discussion. Let’s sort this. Explanation of process.
  • 10:20:00 – The cat is carried to the backroom to be sedated and catheterized. Wife and I sit around glumly.
  • 10:30:00 – The cat and staff return. A warm hospital blanket is fluffed on a cold, stainless steel exam table. The cat is placed upon the fluffed blanket. He looks fine, though his right arm is taped and has a catheter port. The vet and assistant say that they’ll leave now and we can knock on the door whenever we’re ready.
  • 10:31:00 – What are we waiting for? We knock.
  • 10:32:00 – Everyone is positioned (vet at the fore; assistant to one side; us to the other). The vet explains the three syringes: the first is just saline to verify the catheter is flowing properly; the second is propofol to put him to sleep (this is the “count backwards from ten” juice everyone knows from surgery); the third is an unspecified chemical to stop the heart.
  • 10:32:30 – The vet assistant is holding the cat firmly though caringly; he’s petting the cat; we’re petting the cat; the vet attaches the first syringe and pushes it. Seems fine.
  • 10:32:35 – The vet attaches the second syringe (propofol) and pushes it. Seems fine.
  • 10:32:38 – The vet assistant cradles the cat as he begins to slump onto the blanket. We continue petting him.
  • 10:32:45 – The cat is fully lying down. The vet attaches and pushes the third syringe. We continue petting the cat.
  • 10:32:50 – “Excuse me, sorry”. The vet needs me to move aside so that he can apply the stethoscope to the cat’s chest.
  • 10:32:52 – “It’s over. He’s gone.” We shake hands, say thank yous, they gracefully excuse themselves with a “Take as long as you need.” More petting. His eyes are still open but he looks peaceful. It’s actually fairly cathartic to look at your dead pet: in the last few minutes he’s ventured just about as far as he had from his healthy self a few months in the past… A favor has been done.
  • 10:34:00 – To the wife: “Let’s get out of here!”
  • 10:34:15 – Stop. Muzak, waiting at front desk for receipt to print, card swipe, signature, etc.
  • 10:36:00 – Drive away.

Look at those times. It's quick. [Longer for larger animals as their circulatory systems are larger?]

The Environment

He was euthanized at a local animal hospital (Montclair Animal Hospital) by a wonderful vet (Dr. Scriffignano) and assistant (Tim), neither of whom had met the cat before. Two thumbs up, five stars, etc. There are in-home euthanasia options, softly-lighted-couch options, etc. But it turns out: he’s a cat. He’s not being chased by coyotes; he’s got a bunch of caring people around him; lots of petting; fresh food; certain unpleasant things are going to happen that he can’t comprehend; and then it’s naptime. We would have taken the chipmunk chasing option, but it wasn’t offered.

In the end, these are cats/animals and there are pots of money to be made making the human happier during the process.  We looked closely into these and they may be completely worth it to others. Our cat was in pain, the staff was kind and caring, so, yes, he was euthanized on a warm blanket on top of a cold, stainless steel examination table in an unfamiliar environment. It was quick and he’s no longer in pain. Quick, comfortable, happy-human: pick two.

The Order of Operations

In talking with other people about euthanizing their pets, the order of operations and its timing is quite variable (even when using the same vet). This is the bit over which you can possibly exert control. The end will still be the end but how you get there is the question.

After our cat was down, and we had driven away and processed the experience, we were left with three issues:

  1. Why was the cat taken away to be catheterized? I’m sure it can be unpleasant, especially if your animal is vocal or nippy. That might not be something you want to experience and so it makes sense and is a safer experience if they take the animal away. Our cat was not that way: I’m fairly sure he was purring when they catheterized him. We would have preferred to spend those 10 minutes with him.
  2. The procedure was quick: My wife and I would have loved to spend 30 seconds gazing at and petting a peacefully sleeping cat before the third syringe. Perhaps there are strong reasons to push the third syringe quickly? If not, next time we’ll ask for more time to watch them relaxing.
  3. Paying afterwards sucks: Immediately after euthanasia, we were in the mood to run and I was thinking that I’d suggest that we call in the payment… Next thing I knew we were standing at the front desk waiting to sign a card slip. Cue muzak…

Next Time

Once the decision is made, we’ll do the following:

  • Explain our animal’s temperament (e.g. if it’s a meowy, hissy cat, then maybe it’s best they go to the back room while the pros deal with an angry cat; if he’s mellow and trusting, then sedate and catheterize him here with us).
  • Explain our temperament (e.g. are we going to faint at catheter placement?).
  • Ask for the precise order of operations. When will the animal be away from us? Which absences and timings are required? Which can we change? Adjust as necessary/possible.
  • Pay.
  • Go through the procedure.
  • Pay respects and exit quickly.

by alson at June 07, 2019 08:07 PM

Roman Cheplyaka

Why would you use ContT?

The ContT transformer is one of the more exotic and less used transformers provided by mtl. The Control.Monad.Cont module now even includes the following warning:

Before using the Continuation monad, be sure that you have a firm understanding of continuation-passing style and that continuations represent the best solution to your particular design problem. Many algorithms which require continuations in other languages do not require them in Haskell, due to Haskell’s lazy semantics. Abuse of the Continuation monad can produce code that is impossible to understand and maintain.

So what is ContT, and when does it represent the best solution to a problem?

Consider the following three functions from the base library:
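
The code block here didn't survive extraction. Judging from the cleanup actions described just below, the three functions are plausibly these (signatures as they appear in base):

```haskell
openFile    :: FilePath -> IOMode -> IO Handle    -- System.IO
takeMVar    :: MVar a -> IO a                     -- Control.Concurrent.MVar
mallocArray :: Storable a => Int -> IO (Ptr a)    -- Foreign.Marshal.Array
```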

They are all “unsafe” in the sense that they return a resource to you but don’t clean up after you: they don’t close the file you’ve just opened, put back the MVar you’ve just taken, or free the array’s memory. If you forget to perform the cleanup, it’s on you.

Indeed, a function that closed the file right after opening it would be rather useless. Therefore, a “safer” function should know what you intend to do with the file handle and insert that action between opening and closing the file.

The base library provides the following safer versions of the above functions:
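
This block was also lost in extraction; the safe counterparts being described are presumably:

```haskell
withFile    :: FilePath -> IOMode -> (Handle -> IO r) -> IO r   -- System.IO
withMVar    :: MVar a -> (a -> IO b) -> IO b                    -- Control.Concurrent.MVar
allocaArray :: Storable a => Int -> (Ptr a -> IO b) -> IO b     -- Foreign.Marshal.Array
```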

Notice how these functions follow the same pattern and how they relate to their unsafe versions: if the unsafe function returned IO a, then the corresponding safe function takes an additional argument of the form (a -> IO r) and returns IO r. This style of writing functions is called the continuation-passing style (CPS), and the argument a -> IO r is called the continuation.

ContT gives this pattern its own type:
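
The definition, as found in Control.Monad.Trans.Cont:

```haskell
newtype ContT r m a = ContT { runContT :: (a -> m r) -> m r }
```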

(In this article, the underlying monad m will always be IO.)

You can construct ContT functions from the functions written in the CPS:
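
The original snippet is missing; wrapping the three with-style functions would look something like this (the *Cont names are mine, not the post's):

```haskell
withFileCont :: FilePath -> IOMode -> ContT r IO Handle
withFileCont path mode = ContT (withFile path mode)

withMVarCont :: MVar a -> ContT r IO a
withMVarCont mvar = ContT (withMVar mvar)

allocaArrayCont :: Storable a => Int -> ContT r IO (Ptr a)
allocaArrayCont n = ContT (allocaArray n)
```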

Notice how the types of the safe functions, once wrapped in ContT, become even more similar to the types of their unsafe versions: the only difference is ContT r IO instead of IO in the return type. And because ContT turns out to be a monad transformer, we can use and combine the safe functions in all the same ways as we could the unsafe ones—getting the added safety essentially for free. Once you no longer need the resources you’ve allocated inside ContT, you turn it back into IO using runContT, and when you exit a ContT block, all the cleanup actions are run—in the reverse order, of course.

Let’s consider some practical examples.


In the llvm-hs package’s API, the continuation-passing style is used to reliably deallocate the allocated LLVM structures and classes, which are written in C++.

Here’s an example from llvm-hs’s test suite, reformatted such that each nesting level gets its own indentation level:

And here’s the same test rewritten with ContT:

Most of the nesting and indentation is gone. The code is completely linear now, just as if we were using the unsafe, non-CPS functions. Applying ContT on every line is a bit awkward, but that would be gone if the library’s API was designed so that the CPS functions were already in the ContT monad.

Reading from a list of files

In the above example, we had merely a high nesting level. But what if the nesting level is unknown in advance?

Consider a program that takes a list of files and prints them interleaved—a line from the first file, then a line from the second, and so on, printing line m+1 of the first file after line m of the last file.

To implement such a program, we need to keep all the n files open all the time. If we want to use the safe withFile function, we need to nest it n times, where n is not known until the program is run. How is that even possible?

Well, it is possible with ContT.
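
Here is a sketch of the idea (helper names are mine, and the interleaving loop is simplified): traverse runs one ContT (withFile ...) layer per file, so the "nesting" depth follows the length of the list, and every handle stays open while the continuation runs.

```haskell
import Control.Monad.Trans.Cont (ContT (..), runContT)
import System.IO

interleave :: [FilePath] -> IO ()
interleave paths = runContT (traverse open paths) go
  where
    open p = ContT (withFile p ReadMode)

    -- Round-robin: print one line from each not-yet-exhausted file
    -- until every handle is at EOF.
    go :: [Handle] -> IO ()
    go hs = do
      done <- and <$> mapM hIsEOF hs
      if done
        then return ()
        else mapM_ printLine hs >> go hs

    printLine h = do
      eof <- hIsEOF h
      if eof then return () else hGetLine h >>= putStrLn
```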

Allocating a linked list

This example is similar to the previous one, except it actually occurred in my practice. See Way 2 in 6 ways to manage allocated memory in Haskell.

The withX-style safe functions are far from the only possible use for CPS and ContT, but it’s the one that I encounter most often. So if you were wondering why you’d use a continuation monad, hopefully this gives you some ideas.

June 07, 2019 08:00 PM

Mark Jason Dominus

Disco ball technology

A little while ago I wrote:

I think a disco ball would not be out of place at Versailles.

I just remembered that good mirror technology is perhaps too recent for disco balls to have been at Versailles. Hmmm. Early mirrors were made of polished metal or even stone, clearly unsuitable. Back-silvering of glass wasn't invented until the mid-19th century.

Still, a disco ball is a very forgiving application of mirrors. For a looking-glass you want a large, smooth, flat mirror with no color distortion. For a disco ball you don't need any of those things. Large sheets of flat glass were unavailable before the invention of float glass in the 1950s, but for a disco ball you don't need plate glass, just little shards, leftovers even.

The 17th century could produce mirrors by gluing metal foil to the back of a piece of glass, so I wonder why they didn't. They wouldn't have been able to spotlight it, but they certainly could have hung it under an orbiculum. Was there a technological limitation, or did nobody happen to think of it?

by Mark Dominus at June 07, 2019 03:21 PM


Free training sessions at ZuriHac 2019

For the first time we will be providing free training sessions at ZuriHac. There will be two independent sessions, which will take place on Friday 14th June at 2–5pm and Saturday 15th June at 1.30–4pm, and are open to anyone who is registered for ZuriHac.

Adam Gundry: Advanced Types

Friday 14th June 2019, 2–5pm

On Friday afternoon Adam Gundry will lead the session on Advanced Types. This will assume familiarity with Haskell fundamentals such as defining your own data types and type classes. We will explore some more advanced extensions to the type system and techniques for dependently-typed programming in GHC Haskell, including:

  • GADTs
  • type families
  • equality constraints
  • the Constraint kind
  • singleton types

Duncan Coutts: Evaluation and Performance

Saturday 15th June 2019, 1.30–4pm

On Saturday afternoon Duncan Coutts will lead the session on Evaluation and Performance. This is aimed at intermediate Haskell programmers who are comfortable understanding what their programs compute and are ready to move on to understanding how their programs compute. This is an important step towards being able to confidently write Haskell programs that will work within reasonable time and memory. Topics will include:

  • understanding demand and strictness
  • how laziness works
  • understanding the heap and space use during evaluation
  • controlling evaluation
  • space profiling
  • avoiding common causes of space leaks

We would be delighted to welcome you at these sessions or to discuss any interesting topics with you at this fantastic Hackathon.

Other courses

If you cannot make it to ZuriHac but are still interested in our courses or other services, check our Training page, Services page, or just send us an email.

by christine, adam, duncan at June 07, 2019 02:34 PM

June 04, 2019

Monday Morning Haskell

Fighting Back!


In last week's article, we made our enemies a lot smarter. We gave them a breadth-first search algorithm so they could find the shortest path to reach us. This made it much harder to avoid them. This week, we fight back! We'll develop a mechanism so that our player can stun nearby enemies and bypass them.

None of the elements we're going to implement are particularly challenging in isolation. The focus this week is on maintaining a methodical development process. To that end, it'll help a lot to take a look at the Github Repository for this project when reading this article. The code for this part is on the part-7 branch.

We won't go over every detail in this article. Instead, each section will describe one discrete stage in developing these features. We'll examine the important parts, and give some high level guidelines for the rest. Then there will be a single commit, in case you want to examine anything else that changed.

Haskell is a great language for following a methodical process. This is especially true if you use the idea of compile driven development (CDD). If you've never written any Haskell before, you should try it out! Download our Beginners Checklist and get started! You can also read about CDD and other topics in our Haskell Brain series!

Feature Overview

To start, let's formalize the definition of our new feature.

  1. The player can stun all enemies within a 5x5 tile radius (ignoring walls) around them.
  2. This will stun enemies for a set duration of time. However, the stun duration will go down each time an enemy gets stunned.
  3. The player can only use the stun functionality once every few seconds. This delay should increase each time they use the stun.
  4. Enemies will move faster each time they recover from getting stunned.
  5. Stunned enemies appear as a different color.
  6. Affected tiles briefly appear as a different color.
  7. When the player's stun is ready, their avatar should have an indicator.

It seems like there are a lot of different criteria here. But no need to worry! We'll follow our development process and it'll be fine! We'll need more state in our game for a lot of these changes. So, as we have in the past, let's start by modifying our World and related types.

World State Modifications

The first big change is that we're going to add a Player type to carry more information about our character. This will replace the playerLocation field in our World. It will have current location, as well as timer values related to our stun weapon. The first value will be the time remaining until we can use it again. The second value will be the next delay after we use it. This second value is the one that will increase each time we use the stun. We'll use Word (unsigned int) values for all our timers.

data Player = Player
  { playerLocation :: Location
  , playerCurrentStunDelay :: Word
  , playerNextStunDelay :: Word
  }

data World = World
  { worldPlayer :: Player
  , ... -- other fields as before
  }

We'll add some similar new fields to the enemy. The first of these is a lagTime. That is, the number of ticks an enemy will wait before moving. The more times we stun them, the lower this will go, and the faster they'll get. Then, just as we keep track of a stun delay for the player, each enemy will have a stun remaining time. (If the enemy is active, this will be 0). We'll also store the "next stun duration", like we did with the Player. For the enemy, this delay will decrease each time the enemy gets stunned, so the game gets harder.

data Enemy = Enemy
  { enemyLocation :: Location
  , enemyLagTime :: Word
  , enemyNextStunDuration :: Word
  , enemyCurrentStunTimer :: Word
  }

Finally, we'll add a couple fields to our world. First, a list of locations affected by the stun. These will briefly highlight when we use the stun and then go away. Second, we need a worldTime. This will help us keep track of when enemies should move.

data World = World
  { worldPlayer :: Player
  , startLocation :: Location
  , endLocation :: Location
  , worldBoundaries :: Maze
  , worldResult :: GameResult
  , worldRandomGenerator :: StdGen
  , worldEnemies :: [Enemy]
  , stunCells :: [Location]
  , worldTime :: Word
  }

At this point, we should stop thinking about our new features for a second and get the rest of our code to compile. Here are the broad steps we need to take.

  1. Every instance of playerLocation w should change to access playerLocation (worldPlayer w).
  2. We should make a newPlayer expression and use it whenever we re-initialize the world.
  3. We should make a similar function mkNewEnemy. This should take a location and initialize an Enemy.
  4. Any instances of Enemy constructors in pattern matches need the new arguments. Use wildcards for now.
  5. Other places where we initialize the World should add extra arguments as well. Use the empty list for the stunCells and 0 for the world timer.

Take a look at this commit for details!
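
For illustration, minimal versions of the initializers from steps 2 and 3 might look like the following. The starting numbers are placeholders of my choosing (except the lag time of 20, which is mentioned later in the article), not necessarily what the repository uses:

```haskell
newPlayer :: Player
newPlayer = Player (0, 0) 0 200  -- stun ready now; first recharge delay assumed 200 ticks

mkNewEnemy :: Location -> Enemy
mkNewEnemy loc = Enemy loc 20 60 0  -- lag 20 ticks; initial stun duration assumed 60; not stunned
```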

A Matter of Time

For the next step, we want to ensure all our time updates occur. Our game entities now have several fields that should be changing each tick. Our world timer should go up, our stun delay timers should go down. Let's start with a simple function that will increment the world timer:

incrementWorldTime :: World -> World
incrementWorldTime w = w { worldTime = worldTime w + 1 }

In our normal case of the update function, we want to apply this increment:

updateFunc :: Float -> World -> World
updateFunc _ w
  | otherwise = incrementWorldTime (w
    { worldRandomGenerator = newGen
    , worldEnemies = newEnemies
    })

Now there are some timers we'll want to decrement. Let's make a quick helper function:

decrementIfPositive :: Word -> Word
decrementIfPositive 0 = 0
decrementIfPositive x = x - 1

We can use this to create a function to update our player each tick. All we need to do is reduce the stun delay. We'll apply this function within our update function for the world.

updatePlayerOnTick :: Player -> Player
updatePlayerOnTick p = p
  { playerCurrentStunDelay =
      decrementIfPositive (playerCurrentStunDelay p)
  }

updateFunc :: Float -> World -> World
updateFunc _ w
  | otherwise = incrementWorldTime (w
    { worldPlayer = newPlayer
    , ...
    })
  where
    player = worldPlayer w
    newPlayer = updatePlayerOnTick player

Now we need to change how we update enemies:

  1. The function needs the world time. Enemies should only move when the world time is a multiple of their lag time.
  2. Enemies should also only move if they aren't stunned.
  3. Reduce the stun timer if it exists.

updateEnemy
  :: Word
  -> Maze
  -> Location
  -> Enemy
  -> State StdGen Enemy
updateEnemy time maze playerLocation
  e@(Enemy location lagTime nextStun currentStun) =
    if not shouldUpdate
      then return e
      else do
        … -- Make the new move!
        return (Enemy newLocation lagTime nextStun
                 (decrementIfPositive currentStun))
  where
    isUpdateTick = time `mod` lagTime == 0
    shouldUpdate = isUpdateTick &&
                     currentStun == 0 &&
                     not (null potentialLocs)
    potentialLocs = …

There are also a couple minor modifications elsewhere.

  1. The time step argument for the play function should now be 20 steps per second, not 1.
  2. Enemies should start with 20 for their lag time.

We haven't affected the game yet, since we can't use the stun! This is the next step. But this is important groundwork for making everything work. Take a look at this commit for how this part went down.

Activating the Stun

Let's make that stun work! We'll do this with the space-bar key. Most of this logic will go into the event handler. Let's set up the point where we enter this command:

inputHandler :: Event -> World -> World
inputHandler event w
  | otherwise = case event of
      … -- (movement keys)
      (EventKey (SpecialKey KeySpace) Down _ _) -> ...

What are all the different things that need to happen?

  1. Enemies within range should get stunned. This means they receive their "next stun timer" value for their current stun timer.
  2. Their "next stun timers" should decrease (let's say by 5 to a minimum of 20).
  3. Our player stun delay timer should get the "next" value as well. Then we'll increase the "next" value by 10.
  4. Our "stun cells" list should include all cells within range.

None of these things are challenging on their own. But combining them all is a bit tricky. Let's start with some mutation functions:

activatePlayerStun :: Player -> Player
activatePlayerStun (Player loc _ nextStunTimer) =
  Player loc nextStunTimer (nextStunTimer + 10)

stunEnemy :: Enemy -> Enemy
stunEnemy (Enemy loc lag nextStun _) =
  Enemy loc newLag newNextStun nextStun
  where
    newNextStun = max 20 (nextStun - 5)
    newLag = max 10 (lag - 1)

Now we want to apply these mutators within our input handler. To start, let's remember that we should only be able to trigger any of this logic if the player's stun timer is already 0!

inputHandler :: Event -> World -> World
inputHandler event w
  | otherwise = case event of
      … -- (movement keys)
      (EventKey (SpecialKey KeySpace) Down _ _) ->
        if playerCurrentStunDelay currentPlayer /= 0 then w
          else ...

Now let's add a helper that will give us all the locations affected by the stun. We want everything in a 5x5 grid around our player, but we also want bounds checking. Luckily, we can do all this with a neat list comprehension!

    stunAffectedCells :: [Location]
    stunAffectedCells =
      let (cx, cy) = playerLocation currentPlayer
      in  [(x,y) | x <- [(cx-2)..(cx+2)], y <- [(cy-2)..(cy+2)], 
            x >= 0 && x <= 24, y >= 0 && y <= 24]

Now we'll make a wrapper around our enemy mutation to determine which enemies get stunned:

    stunEnemyIfClose :: Enemy -> Enemy
    stunEnemyIfClose e = if enemyLocation e `elem` stunAffectedCells
      then stunEnemy e
      else e

Now we can incorporate all our functions into a final update!

inputHandler :: Event -> World -> World
inputHandler event w
  | otherwise = case event of
      … -- (movement keys)
      (EventKey (SpecialKey KeySpace) Down _ _) ->
        if playerCurrentStunDelay currentPlayer /= 0 
          then w
          else w
            { worldPlayer = activatePlayerStun currentPlayer
            , worldEnemies = stunEnemyIfClose <$> worldEnemies w
            , stunCells = stunAffectedCells
            }

Other small updates:

  1. When initializing game objects, they should get default values for their "next" timers. For the player, we give 200 (10 seconds). For the enemies, we stun them for 60 ticks (3 seconds) initially.
  2. When updating the world, clear out the "stun cells". Use another mutator function to achieve this:
clearStunCells :: World -> World
clearStunCells w = w { stunCells = []}

Take a look at this commit for a review on this part!

Drawing the Changes

Our game works as expected now! But as our last update, let's make sure we represent these changes on the screen. This will make the game a much better experience. Here are some changes:

  1. Enemies will turn yellow when stunned
  2. Affected squares will flash teal
  3. Our player will have a red inner circle when the stun is ready

Each of these is pretty simple! For our enemies, we'll add a little extra logic around what color to use, depending on the stun timer:

enemyPic :: Enemy -> Picture
enemyPic (Enemy loc _ _ currentStun) =
  let enemyColor = if currentStun == 0 then orange else yellow
  in  Color enemyColor (Polygon [tl, tr, br, bl])

For the player, we'll add some similar logic. The indicator will be a smaller red circle inside of the normal black circle:

stunReadyCircle = if playerCurrentStunDelay (worldPlayer world) == 0
  then Color red (Circle 5)
  else Blank
playerMarker = translate px py (Pictures [stunReadyCircle, Circle 10])

Finally, for the walls, we need to check if a location is among the stunCells. If so, we'll add a teal (cyan) background.

makeWallPictures :: (Location, CellBoundaries) -> [Picture]
makeWallPictures ((x,y), CellBoundaries up right down left) =
  let coords = conversion (x,y)
      tl = cellTopLeft coords
      tr = cellTopRight coords
      bl = cellBottomLeft coords
      br = cellBottomRight coords
      stunBackground = if (x, y) `elem` stunCells world
        then Color cyan (Polygon [tl, tr, br, bl])
        else Blank
  in  [ stunBackground
      , … -- (wall edges)
      ]

And that's all! We can now tell what is happening in our game, so we're done with these features! You can take a look at this commit for all the changes we made to the drawing!


Now our game is a lot more interesting. There's a lot of tuning we can do with various parameters to make our levels more and more competitive. For instance, how many enemies is appropriate per level? What's a good stun delay timer? If we're going to experiment with all these, we'll want to be able to load full game states from the file system. We've got a good start with serializing mazes. But now we want to include information about the player, enemies, and timers.

So next week, we'll go further and serialize our complete game state. We'll also look at how we parameterize the application and fix all the "magic numbers". This will add new options for customization and flexibility. It will also enable us to build a full game that gets harder as it goes on, and allow saving and loading of your progress.

Throughout this article (and series), we've tried to use a clean, precise development process. Read our Haskell Brain series to learn more about this! You can also download our Beginners Checklist if you are less familiar with the language!

by James Bowen at June 04, 2019 02:30 AM

Donnacha Oisín Kidney

Solving Programming Puzzles without using your Brain

Posted on June 4, 2019
Tags: Python

This post is a write-up of a solution to part of a programming puzzle I did yesterday. It’s a little different than the usual “solution + theory” approach, though: I’m going to talk about the actual steps you’d need to take to get to the solution (i.e. what to google, what intermediate code looks like, etc.). Often write ups like this are presented as finished artefacts, with little info on the tricks or techniques the author used to jog their intuition into figuring out the puzzle (or where some intermediate step requires a leap of insight). In actual fact, this particular puzzle requires almost no insight at all: I’m going to show how to get to a working solution without understanding any of the theory behind it!

Spoilers ahead for the google foobar problem “Distract the Guards”.

The Problem

We’re interested in a particular type of sequence of pairs of numbers. These sequences are generated from a starting pair n and m like so:

If n and m are equal, the sequence stops.

Otherwise, the smaller number is subtracted from the larger, and then the smaller is doubled, and the sequence continues with these two numbers.

Here’s an example starting with 3 and 5:

     3, 5
     6, 2
     4, 4
---- done ----

Once it hits 4, 4, the first condition is met, and the sequence stops. Not all of these sequences stop, however:

     1, 4
     2, 3
     1, 4
---- done ---- 

As you can see, in this case we loop back around to 1, 4: our task is to figure out, given a pair of numbers, whether the sequence generated by them loops forever, or stops at some point.

Step 1: Write a Dumb Solution

This step is crucial: before trying to figure out any of the deep mathematics behind the problem, write the dumbest thing that could work. You’re going to need it, anyway, to test your faster versions against, and besides, it might be good enough as-is!

The first function actually generates the sequence we’re interested in: it uses python’s generators to do so. The second function is just a generic function that checks a sequence for duplicates. Finally, the last function answers the question we’re interested in.
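A minimal sketch matching that description (the function names, and the choice of which slot of the pair gets the doubled number, are my own, picked to match the worked example above) might be:

```python
def pairs(n, m):
    """Generate the sequence of pairs, stopping when the two numbers are equal."""
    while n != m:
        yield (n, m)
        if n < m:
            # double the smaller, subtract it from the larger
            n, m = 2 * n, m - n
        else:
            n, m = n - m, 2 * m
    yield (n, m)

def has_duplicate(seq):
    """Check a (possibly infinite) sequence for a repeated element."""
    seen = set()
    for x in seq:
        if x in seen:
            return True
        seen.add(x)
    return False

def loops(n, m):
    """Does the sequence starting from (n, m) loop forever?"""
    return has_duplicate(pairs(n, m))
```

Note this always terminates: the sum of the pair is invariant, so there are only finitely many states, and every sequence either reaches an equal pair or revisits one it has seen.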

Step 2: Graph it

Next, we want to try and spot some patterns in the answers the function generates. Remember, we’re not really interested in figuring out the theory at this point: if we find out that a loop only happens when both numbers are even (for instance), that’s good enough for us and we can stop there!

We humans are pattern-matching machines: to leverage our abilities, though, we will need to visualise the data somehow. In this case, I’m going to plot a simple scatter graph to the terminal, using the following code (I apologise for my terrible indentation style):
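The plotting code can be as crude as this sketch (the slow duplicate check from step 1 is inlined so the snippet stands alone, and the 80-column width matches the output below):

```python
def loops(n, m):
    """Slow duplicate check, as in step 1."""
    seen = set()
    while n != m:
        if (n, m) in seen:
            return True
        seen.add((n, m))
        if n < m:
            n, m = 2 * n, m - n
        else:
            n, m = n - m, 2 * m
    return False

def scatter(predicate, size=80):
    """Rows of a terminal scatter plot: '*' where predicate(x, y) holds."""
    return [''.join('*' if predicate(x, y) else ' '
                    for x in range(1, size + 1))
            for y in range(1, size + 1)]

for row in scatter(loops):
    print(row)
```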

And we get the following output:


*************************** ******************************* ********************
**************************** *** *********** ******************************* ***
************* *************** **************************************************
****************************** *************************************************
******************************* ************************************************
******************************** *********************** ******* ***************
********************************* **********************************************
** *************************** *** *********************************************
*********************************** ********************************************
************ ******* *************** *******************************************
***** *********************** ******* *************** *************** **********
************************************** *****************************************
*************************************** ****************************************
******** ******************* *********** ***************************************
***************************************** **************************************
****************************************** ******* *********************** *****
*********** *************** *************** ************************************
******************************************** ***********************************
********************************************* **********************************
************** *********** ******************* *************** *****************
*********************************************** ******************************* 
************************************************ ***************************** *
***************** ******* *********************** *************************** **
********** *********************** *************** ************************* ***
*************************************************** *********************** ****
**** *************** *** ******************* ******* ********************* *****
***************************************************** ******************* ******
****************************************************** ***************** *******
*********************** ******************************* *************** ********
******************************************************** ************* *********
********* ******************************* *************** *********** **********
********************** *** ******************************* ********* ***********
*********************************************************** ******* ************
************************************************************ ***** *************
********************* ******* ******************************* *** **************
************** *********************** *********************** * ***************
*************************************************************** ****************
 ******* *********** *********** *************** ************* * ***************
* *********************************************************** *** **************
** ********************************************************* ***** *************
*** *************** *************** *********************** ******* ************
**** ***************************************************** ********* ***********
***** *************************************************** *********** **********
****** *********** ******************* ***************** ************* *********
******* *********************************************** *************** ********
******** *************** ******* ********************* ***************** *******
********* ******* *********************** *********** ******************* ******
********** ***************************************** ********************* *****
*********** *************************************** *********************** ****
************ *** *************************** ***** ************************* ***
************* *************** ******************* *************************** **
****** ******* ********************************* ************* *************** *
*************** ******************************* ******************************* 
**************** ***************************** *********************************
***************** *************************** **********************************
** *********** *** ******* ******* ********* ***** *********************** *****
******************* *********************** ************************************
******************** ********************* *************************************
***** ******* ******* ******************* *********** *************** **********
********************** ***************** ***************************************
*********************** *************** ****************************************
******** *** *********** ************* ***************** ******* ***************
************************* *********** ******************************************
************************** ********* *******************************************
*********** *************** ******* *********************** ********************
**** *************** ******* ***** ********* ******************************* ***
***************************** *** **********************************************
********** *** *************** * ********************* ******* *****************
******************************* ************************************************
 ***************************** * ***********************************************
* ******* ******* *********** *** *************** *************** **************
** ************************* ***** *********************************************
*** *********************** ******* ********************************************
**** *** *********** ***** ********* ******* *********************** ***********
***** ******************* *********** *************************************** **
****** ******* ********* ************* *************** ******************* *****
******* *************** *************** ******************************* ********
******** ************* ***************** *************************** ***********
********* *********** ******************* *********************** **************
** *** *** ********* ***** ******* ******* ******************* *********** *****
*********** ******* *********************** *************** ********************
************ ***** ************************* *********** ***********************
***** ******* *** *********** *************** ******* *********************** **
************** * ***************************** *** *****************************
*************** ******************************* ********************************
 *** *** ***** * ******* ******* *********** *** *************** ***************
* *********** *** *********************** ******* ******************************
** ********* ***** ******************* *********** *****************************
*** ******* ******* *************** *************** ****************************
**** ***** ********* *********** ******************* *********************** ***
***** *** *********** ******* *********************** *************** **********
** *** * ***** ******* *** *********** *************** ******* *****************
******* *************** ******************************* ************************
 ***** * *********** *** *********************** ******* ***********************
* *** *** ******* ******* *************** *************** **********************
** * ***** *** *********** ******* *********************** *************** *****
*** ******* *************** ******************************* ********************
 * * *** *** ******* ******* *************** *************** *******************
* *** ******* *************** ******************************* ******************
 * *** ******* *************** ******************************* *****************

There’s a clear pattern there, but it might be easier to see if we inverted it, plotting those things which don’t loop:


                           *                               *                    
                            *   *           *                               *   
             *               *                                                  
                                *                       *       *               
  *                           *   *                                             
            *       *               *                                           
     *                       *       *               *               *          
        *                   *           *                                       
                                          *       *                       *     
           *               *               *                                    
              *           *                   *               *                 
                                               *                               *
                                                *                             * 
                 *       *                       *                           *  
          *                       *               *                         *   
                                                   *                       *    
    *               *   *                   *       *                     *     
                                                     *                   *      
                                                      *                 *       
                       *                               *               *        
                                                        *             *         
         *                               *               *           *          
                      *   *                               *         *           
                                                           *       *            
                                                            *     *             
                     *       *                               *   *              
              *                       *                       * *               
*       *           *           *               *             * *               
 *                                                           *   *              
  *                                                         *     *             
   *               *               *                       *       *            
    *                                                     *         *           
     *                                                   *           *          
      *           *                   *                 *             *         
       *                                               *               *        
        *               *       *                     *                 *       
         *       *                       *           *                   *      
          *                                         *                     *     
           *                                       *                       *    
            *   *                           *     *                         *   
             *               *                   *                           *  
      *       *                                 *             *               * 
               *                               *                               *
                *                             *                                 
                 *                           *                                  
  *           *   *       *       *         *     *                       *     
                   *                       *                                    
                    *                     *                                     
     *       *       *                   *           *               *          
                      *                 *                                       
                       *               *                                        
        *   *           *             *                 *       *               
                         *           *                                          
                          *         *                                           
           *               *       *                       *                    
    *               *       *     *         *                               *   
                             *   *                                              
          *   *               * *                     *       *                 
*                             * *                                               
 *       *       *           *   *               *               *              
  *                         *     *                                             
   *                       *       *                                            
    *   *           *     *         *       *                       *           
     *                   *           *                                       *  
      *       *         *             *               *                   *     
       *               *               *                               *        
        *             *                 *                           *           
         *           *                   *                       *              
  *   *   *         *     *       *       *                   *           *     
           *       *                       *               *                    
            *     *                         *           *                       
     *       *   *           *               *       *                       *  
              * *                             *   *                             
               *                               *                                
*   *   *     * *       *       *           *   *               *               
 *           *   *                       *       *                              
  *         *     *                   *           *                             
   *       *       *               *               *                            
    *     *         *           *                   *                       *   
     *   *           *       *                       *               *          
  *   * *     *       *   *           *               *       *                 
       *               *                               *                        
*     * *           *   *                       *       *                       
 *   *   *       *       *               *               *                      
  * *     *   *           *       *                       *               *     
   *       *               *                               *                    
* * *   *   *       *       *               *               *                   
 *   *       *               *                               *                  
* *   *       *               *                               *                 

For this kind of thing it’s also worth getting familiar with gnuplot.

Step 3: Reduce The Space

The clearest pattern in the graph above is the straight lines coming from the origin. This tells me, straight away, that we have an opportunity for optimisation if we wanted to memoize. We can’t yet be sure, but it looks like every point belongs to one of these straight lines. That means that once we find a non-looping pair like 3, 5, we can extend that line out to 6, 10 and 9, 15, etc.

We can also see that the graph has a symmetry through the line x = y. This means that if 3, 5 doesn’t loop, neither does 5, 3.

Both of these techniques allow us to reduce the arguments to a canonical form, making the memoization table smaller, also. In code:
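A sketch of that reduction (dividing by the gcd handles the lines through the origin, sorting handles the x = y symmetry; the names, and the inlined slow check from step 1, are mine):

```python
from functools import lru_cache
from math import gcd

def canonical(n, m):
    """Reduce a pair to canonical form: divide out the gcd and sort."""
    g = gcd(n, m)
    n, m = n // g, m // g
    return (n, m) if n <= m else (m, n)

@lru_cache(maxsize=None)
def loops_canonical(n, m):
    # the slow check from step 1, now memoized over canonical pairs only
    seen = set()
    while n != m:
        if (n, m) in seen:
            return True
        seen.add((n, m))
        if n < m:
            n, m = 2 * n, m - n
        else:
            n, m = n - m, 2 * m
    return False

def loops_fast(n, m):
    return loops_canonical(*canonical(n, m))
```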

Step 4: Test

Now that we have our faster version, we want to be able to quickly check that it’s equivalent to the slow. While Python is usually great for programming puzzles, this step in particular is crying out for something like QuickCheck: without it, we’ll have to roll our own.

We’re not looking for certainty here, just something that will quickly spot an error if one exists.
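Rolled by hand, that can be as little as this sketch (the trial count and input bound are arbitrary choices of mine):

```python
import random

def check_agree(slow, fast, trials=1000, bound=200):
    """A poor man's QuickCheck: random inputs, assert both functions agree."""
    for _ in range(trials):
        n, m = random.randint(1, bound), random.randint(1, bound)
        assert slow(n, m) == fast(n, m), (n, m)
    return True
```

Called with the slow and fast predicates, a disagreement shows up immediately as an assertion error carrying the offending pair.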

Step 5: More Sophisticated Patterns

Now that we’ve made some of the more obvious optimisations, it’s time to move on to finding another pattern in the output. To do this, we’ll use the OEIS: we want to find whether the pairs which pass our test follow some sequence with a simple generating function which we can adapt into a test.

Since the things we’re testing are pairs, rather than individual numbers, we’ll have to fix one of them and see if there’s a pattern in the other.
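For example, fixing the first element to 1 and scanning the second (with the slow check from step 1 inlined again for completeness):

```python
def loops(n, m):
    """Slow duplicate check, as in step 1."""
    seen = set()
    while n != m:
        if (n, m) in seen:
            return True
        seen.add((n, m))
        if n < m:
            n, m = 2 * n, m - n
        else:
            n, m = n - m, 2 * m
    return False

# second elements m for which the pair (1, m) terminates
print([m for m in range(1, 100) if not loops(1, m)])
```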

This prints the following sequence:

[1, 3, 7, 15, 31, 63]

And when we search for it on oeis, we get this as the top result:

0, 1, 3, 7, 15, 31, 63, 127, 255, 511, 1023, 2047, 4095, 8191, 16383, 32767,
65535, 131071, 262143, 524287, 1048575, 2097151, 4194303, 8388607, 16777215,
33554431, 67108863...

And looking at the comments under the sequence, we see the following:

Numbers n for which the expression 2^n/(n+1) is an integer. - Paolo P. Lava, May 12 2006

A test for members of the sequence, all packaged up for us!

But how do we generalise to pairs whose first element isn’t 1? Well, as a total guess, we can see that 1 appears in one place in the formula: why not replace that with the other member of the pair?

After that, we get the following function to test:
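Combined with the gcd-and-sort reduction from step 3, the guessed test might look like this (a sketch of the idea, not verified theory; the name is mine):

```python
from math import gcd

def stops(n, m):
    """Guess: reduce the pair, then test whether 2^m is divisible by n + m,
    i.e. the OEIS formula with its 1 replaced by the other pair member."""
    g = gcd(n, m)
    n, m = sorted((n // g, m // g))
    return 2 ** m % (n + m) == 0
```

This is what gets handed to the random tester from step 4, compared against the slow version.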

And it works!

Step 6: Look For Algorithms

This last step is pretty straightforward: see if there’s an algorithm already out there that solves your problem. In our case, taking the modulus is still pretty slow, but it turns out that modular exponentiation (i.e. computing expressions of the form x^y mod z) can be done faster than the naive way. In fact, python provides this algorithm as a function in the standard library, making our last version of the function the following:
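Using the built-in three-argument pow, which does fast modular exponentiation, a final version might read like this (returning True when the pair loops, as the puzzle asks; the function name is my own):

```python
from math import gcd

def solution(n, m):
    """True if the sequence starting from (n, m) loops forever."""
    g = gcd(n, m)
    n, m = sorted((n // g, m // g))
    # pow(2, m, n + m) computes 2^m mod (n + m) without the huge intermediate
    return pow(2, m, n + m) != 0
```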

I’m not sure if this function is fully correct, but it was accepted as a solution to the puzzle.

Anyway, in conclusion: you can get quite far through a programming puzzle by applying some educated guesses and googling!

by Donnacha Oisín Kidney at June 04, 2019 12:00 AM

June 03, 2019

Brandon Simmons

GHC LLVM LTO Experiments Scratch Notes

This is a followup to this previous post where I play with GCC’s link-time optimization in GHC-compiled programs. I note there that this is limited since it only works on sources compiled from C (crucially the RTS). Here I’ve compiled rough notes doing the same thing with LLVM LTO, where I attempt to get true whole-program optimization cooking.

The goals were to play with some new tech and possibly discover some significant performance gains (without thinking very hard) that could be ported back or realized in a less hacky way. Results were underwhelming and so I don’t do much real digging into assembly (the other thing besides good results that would have made for an interesting post). So after waffling I decided to basically dump my notes here, lightly edited, in case it’s helpful to someone who wants to take up similar work later.

The GHC branch with scratch work is also a mess since it took a lot of fiddling and some manual intervention (below) before a full build completed. If you’re really interested in this work shoot me an email and I can try to help.

Building GHC for LLVM LTO

I used shims as before to quickly manipulate clang, lld, etc for LTO. The first issue I ran into was that symbols StgReturn and StgRun (both defined with inline asm) weren’t making it into the RTS library archive (in particular StgCRun.thr_o), possibly due to dead-code elimination; note all objects are actually LLVM bitcode, not native object code.

I worked around this by adding the functions back directly to the LLVM IR which was quite nasty. See commit message.

The next big issue was we weren’t able to produce LLVM IR bitcode from haskell code (e.g. in libraries) due to the LLVM mangler which fiddles directly with assembly, and is supposedly necessary for tables-next-to-code.

Normally this is what happens when you compile with -fllvm

opt-6.0 -O2 -enable-tbaa -tbaa '-relocation-model=static' hello.ll -o /tmp/ghc15832_0/ghc_2.bc
llc-6.0 -O2 -enable-tbaa '-relocation-model=static' '-mcpu=x86-64' '-mattr=+sse2,+sse' /tmp/ghc15832_0/ghc_2.bc -o /tmp/ghc15832_0/ghc_3.lm_s
*** LLVM Mangler
clang -fno-stack-protector -DTABLES_NEXT_TO_CODE -iquote. -Qunused-arguments -x assembler -c /tmp/ghc15832_0/ghc_4.s -o hello.o
Linking blaaaah.lto ...
*** C Compiler:
clang -fno-stack-protector -DTABLES_NEXT_TO_CODE -c /tmp/ghc15832_0/ghc_5.c -o /tmp/ghc15832_0/ghc_6.o ...
*** C Compiler:
clang -fno-stack-protector -DTABLES_NEXT_TO_CODE -c /tmp/ghc15832_0/ghc_8.s -o /tmp/ghc15832_0/ghc_9.o ...
*** Linker:
clang -fno-stack-protector -DTABLES_NEXT_TO_CODE -o blaaaah.lto -lm -Wl,--gc-sections hello.o ...
link: done

But I found an (undocumented/experimental?) flag -fast-llvm, that bypasses the mangler like so:

opt-6.0 -O2 -enable-tbaa -tbaa '-relocation-model=static' hello.ll -o /tmp/ghc18922_0/ghc_2.bc
llc-6.0 -O2 -enable-tbaa '-relocation-model=static' '-filetype=obj' '-mcpu=x86-64' '-mattr=+sse2,+sse' /tmp/ghc18922_0/ghc_2.bc -o hello.o
Linking blaaaah.lto ...
*** C Compiler:
clang -fno-stack-protector -DTABLES_NEXT_TO_CODE -c /tmp/ghc18922_0/ghc_3.c -o /tmp/ghc18922_0/ghc_4.o ...
*** C Compiler:
clang -fno-stack-protector -DTABLES_NEXT_TO_CODE -c /tmp/ghc18922_0/ghc_6.s -o /tmp/ghc18922_0/ghc_7.o ...
*** Linker:
clang -fno-stack-protector -DTABLES_NEXT_TO_CODE -o blaaaah.lto -lm -Wl,--gc-sections hello.o ...

So I manually added -fast-llvm to all the cabal files in the GHC tree, shimmed llc-6.0 to be just a cp (so hello.o stays an LLVM IR bitcode that can participate in LTO).

It’s not clear to me why the mangler doesn’t seem to actually be necessary (for the binaries I tested, on my machine).

The shims contained a fallback to compiling to object code, but eventually I got a GHC source tree of mostly LLVM bitcode:

$ for f in `find libraries -name '*.o'`; do grep -q LLVM <(file "$f") && echo "$f"; done  2>/dev/null | wc -l
$ for f in `find libraries -name '*.o'`; do grep -q  ELF <(file "$f") && echo "$f"; done  2>/dev/null | wc -l


I played with a few things in addition to just LTO (again timing hlint).


This was basically the same process as before, using AutoFDO. It wasn't clear if it was really working as well as it could. I got a lot of warnings like:

ld.lld: warning: No debug information found in function r3UWn_info$def: Function profile not used
ld.lld: warning: No debug information found in function r3UUK_info$def: Function profile not used
ld.lld: warning: No debug information found in function hlintzm2zi0zi12zminplace_HintziPattern_zdwpatternHint_info$def: Function profile not used

I should have tried again with -fprofile-sample-accurate, which treats functions with no call data as cold rather than unknown. This isn't in clang 6, but I learned later that newer clang/LLVM seem to work pretty well with IR from GHC's LLVM backend.


POLLY is a research optimization framework targeting LLVM IR which is included/installed by default with my llvm-6 distribution, and it's straightforward to try from opt or clang.

As an aside, to see what opt -polly (or other flags) actually does, you can do:

    $ touch fake
    $ opt fake  -O3 -polly -debug-pass=Arguments   # and compare to...
    $ opt fake  -O2 -debug-pass=Arguments

It’s a bit sketchy since optimization passes are not generally idempotent, but I decided to just run opt -O3 -polly over all the (optimized) LLVM IR in the GHC tree and ~/.cabal/store. There are a couple scripts for unpacking/repacking library archives in the SHIMS directory, if you ever need something like that.


Souper is an experimental tool from Google that was annoying to install but actually pretty straightforward to use. It tries to discover new peephole-style optimization opportunities using an SMT solver (there's like a whole paper and stuff), so it's more something for compiler writers than something to include in an application compilation pipeline.

I first focused on Evac.c, a hotspot in the RTS. This found a few optimizations:

$ souper -z3-path=/home/me/.local/bin/z3  Evac_thr.thr_o

Ran much longer and found a few more:

$ souper -z3-path=/home/me/.local/bin/z3  -souper-exhaustive-synthesis   Evac_thr.thr_o

Here’s one example picked at random that you can stare at for a while:


Timing Results

Again an hlint run over the lens code tree, like:

hlint lens-4.17 --report=/dev/null -q --threads=4 +RTS -s -A32M
  • Stock GHC 8.6 (native codegen):

     30,029,597,382 instructions
     3.577023660 seconds time elapsed
  • Stock LLVM (stock ghc + fllvm)

     29,361,619,562 instructions:u            ( +-  0.45% )
     3.412444661 seconds time elapsed      ( +-  1.49% )
  • LLVM LTO after successfully compiling libraries to LTO objs:

     28,858,041,897 instructions:u            ( +-  0.03% )
     3.372028439 seconds time elapsed      ( +-  0.19% )
  • LLVM LTO (-O3):

     27,912,731,892 instructions:u            ( +-  0.17% )
     3.330403689 seconds time elapsed      ( +-  1.00% )
  • First attempt at PGO (lots of warnings about lack of profile data (mangling issue?)):

     27,965,192,528 instructions:u            ( +-  0.25% )
     3.396919779 seconds time elapsed      ( +-  1.97% )
  • PGO with O3:

     27,882,344,880 instructions:u            ( +-  0.06% )
     3.332535151 seconds time elapsed      ( +-  0.88% )

     27,509,027,644 instructions:u            ( +-  0.41% )
     3.370770830 seconds time elapsed      ( +-  1.51% )
  • LLVM LTO O3 + Souper over full cabal new-build:

     27,799,718,133 instructions:u            ( +-  0.18% )
     3.292543353 seconds time elapsed      ( +-  0.73% )
  • LLVM LTO O3 + souper on Evac (RTS):

     27,905,099,478 instructions:u            ( +-  0.16% )
     3.308439822 seconds time elapsed      ( +-  0.47% )

Stripped hlint executable sizes for LLVM lto experiments…

    20945776    llvm_lto
    20966256    llvm_lto_pgo1
    21113712    llvm_lto_O3
    21336792    llvm_lto_souper_evac_lld9
    21349120    llvm_lto_O3_souper_all
    21349184    llvm_lto_O3_souper_all_exhaustive
    21357304    llvm_lto_souper_evac
    22039256    llvm_lto_polly

…and for previous gcc LTO work:

    22415344    lto_pgo
    22423424    lto_Os
    22431616    lto_basic_O2
    22431616    lto_march_native
    22431616    lto_march_native2
    22431616    lto_march_native_O3
    22432216    gcc_stock
    24612696    llvm_stock

June 03, 2019 01:29 AM

June 02, 2019

Shayne Fletcher

Have GHC parsing respect dynamic pragmas


This post about Handling GHC parse errors shows that using qualified in postpositive position is a syntax error unless the ImportQualifiedPost language extension is enabled. In that post, it is explained that the program

module M where
import Data.List qualified
is invalid whereas,
{-# LANGUAGE ImportQualifiedPost #-}
module M where
import Data.List qualified
which enables the extension via a "dynamic pragma", is legit.

Perhaps surprisingly, running the second of these programs through the parsing code presented in that post continues to generate the error

     Found `qualified' in postpositive position.
To allow this, enable language extension 'ImportQualifiedPost'
Evidently, our parse-fu needs an upgrade to respect dynamic pragmas and that's what this post provides.

This code exercises the GHC API to parse a module.

parse :: String -> DynFlags -> String -> ParseResult (Located (HsModule GhcPs))
parse filename flags str =
  unP Parser.parseModule parseState
  where
    location = mkRealSrcLoc (mkFastString filename) 1 1
    buffer = stringToStringBuffer str
    parseState = mkPState flags buffer location

Note in the above, the second argument flags :: DynFlags. In order for parse to take into account extensions enabled by pragmas in the source argument str, flags must be set up to do so a priori. That is, before jumping into parse, a "first pass" must be made to sniff out flags. There is a GHC API for that. It's called parseDynamicFilePragma.

Here's a function to harvest flags from pragmas that makes that call to parseDynamicFilePragma.

parsePragmasIntoDynFlags :: DynFlags -> FilePath -> String -> IO (Maybe DynFlags)
parsePragmasIntoDynFlags flags filepath str =
  catchErrors $ do
    let opts = getOptions flags (stringToStringBuffer str) filepath
    (flags, _, _) <- parseDynamicFilePragma flags opts
    return $ Just flags
  where
    catchErrors :: IO (Maybe DynFlags) -> IO (Maybe DynFlags)
    catchErrors act = handleGhcException reportErr
                        (handleSourceError reportErr act)
    reportErr e = do putStrLn $ "error : " ++ show e; return Nothing
The main contribution of this function is to account for the complication that parseDynamicFilePragma can throw two kinds of exceptions: GhcException and SourceError. The GHC API functions handleGhcException and handleSourceError are the means to achieve that.

Putting it all together then, here's an outline of how to parse in the presence of dynamic pragmas.

s <- readFile' file
flags <-
  parsePragmasIntoDynFlags
    (defaultDynFlags fakeSettings fakeLlvmConfig) file s
whenJust flags $ \flags ->
  case parse file flags s of
    PFailed s ->
      report flags $ snd (getMessages s flags)
    POk s m -> do
      let (wrns, errs) = getMessages s flags
      report flags wrns
      report flags errs
      when (null errs) $ analyzeModule flags m
For a complete working program that utilizes this function, see this example in the ghc-lib repo.

by Shayne Fletcher at June 02, 2019 03:57 PM

Handling GHC parser errors right


Did you know, a POk parse result from the GHC parser doesn't necessarily mean the parse was OK? This blog explains what's up with that. The source code below is from this example in the ghc-lib repo.

Here is code that tries to make a parse tree of a Haskell module.

parse :: String -> DynFlags -> String -> ParseResult (Located (HsModule GhcPs))
parse filename flags str =
  unP Parser.parseModule parseState
  where
    location = mkRealSrcLoc (mkFastString filename) 1 1
    buffer = stringToStringBuffer str
    parseState = mkPState flags buffer location

The way to call the above code is like this.

case parse file flags s of
  PFailed s ->
    report flags $ snd (getMessages s flags)
  POk s m -> do
    report flags $ fst (getMessages s flags)
    analyzeModule flags m

In the PFailed s case (where s is the parse state), the expression snd (getMessages s flags) retrieves the errors and we report them. In the POk case, we report warnings and do whatever it is we wanted to do with the parse tree m, right?

Not quite. The problem is that the parser produces two sorts of errors: "fatal" and "non-fatal". Thus far, we have only considered the "fatal" ones.

Fatal errors are such that production of a parse tree is impossible. Non-fatal parse errors are those that don't prevent construction of a parse tree. A parse that generates non-fatal errors is associated with a parse tree that in some way fails to conform to the Haskell language specification.

The right way to write the POk case is like this.

POk s m -> do
  let (warns, errs) = getMessages s flags
  report flags warns
  report flags errs
  when (null errs) $ analyzeModule flags m

The key point is that analyzeModule is called only if there are absolutely no parse errors at all.

A non-fatal error example is provided by the ImportQualifiedPost language extension (see this post for how to add a GHC language extension). Specifically, it is only legal to write import M qualified if the extension is in effect via pragma or the option -XImportQualifiedPost. In the event this syntax is used when the extension is not in effect, the user should see an error like

 test/MiniHlintTest_non_fatal_error.hs:6:18: error:
Found `qualified' in postpositive position.
To allow this, enable language extension 'ImportQualifiedPost'
and further analysis of the parse is abandoned.

by Shayne Fletcher at June 02, 2019 03:53 PM

JP Moresmau

Rust interviews questions

A few days ago this blog entry sparked a big discussion on Reddit, and it got me thinking that I just couldn't rely on my 20 years of experience in professional programming if I ever needed another job. I need to practice small, low-level coding problems.

So of course I decided to use Rust. I'm putting my code on Github so feel free to have a look if you're interested. I tried to not look at existing resources, I'm sure you can find plenty of Rust interview answers and toy problems if you look around :-).

First I tackled some array operations like finding and removing duplicates, implementing quicksort, etc. I just used Vec and implemented the algorithms only. In some cases a Set is of course what makes sense (finding duplicates), so I just used another vector as a set; implementing an efficient integer set would be a good problem too. Here I didn't run into any major issue.

The second (and last, for now) thing I did was to implement a Linked List from scratch. The list owns its values of course, so implementing pop/push methods and iterator support proved "interesting"! Now I really had to understand the borrow checker, owning vs referencing, etc. I even tried to do things using unsafe code like Box::into_raw! And? Well, my program... crashed. I think I'll tread carefully if I ever need to use unsafe.
In the end, mem::replace was my savior. I could get and own a cell from my list by swapping it with an empty cell, do whatever I needed to do, and reset the cell to a proper value. That worked a charm!

So, I'm still happy enough with Rust. The error messages are for the most part clear enough to be able to understand, I do find enough on the web between the official docs and Stack Overflow to save me from being stuck. Rust is good!

Happy Rust Hacking!

by JP Moresmau at June 02, 2019 01:46 PM

May 31, 2019

Sandy Maguire

Polysemy Internals: Freer Interpretations of Higher-Order Effects

aka "what the hell is that Yo type?"

This is the first post in a series of implementation details in polysemy --- a fast, powerful and low-boilerplate effect-system library.

Even if you're not particularly interested in polysemy, there are some functional pearls here --- and a crash course on the history on the implementations of free monads in Haskell.

Critics of free monads often make the claim that higher-order effects aren't possible. This has historically been true, but Wu, Schrijvers and Hinze's paper Effect Handlers in Scope gives a technique for lifting the restriction. Today I want to illustrate the problem, discuss Wu et al.'s solution, and then show what changes polysemy makes to remove the boilerplate. In the process, we'll look at finding free constructions for tricky typeclasses.

The Problem

Let's consider the Error e effect, in which we'd like to be able to throw errors of type e, and catch any errors thrown within a specific block of code. You're already familiar with this concept, in transformers it's called ExceptT e, and in mtl, MonadError e. A typical usage of this effect might be:

foo = catch
    (do            -- computation to run
       when (not someBool) $ throw SomeError
       pure True)
    (\SomeError -> -- error handler
       pure False)

We would expect foo to be pure False whenever someBool is False; and vice versa. The idea is that a throw should short-circuit the rest of the computation, until it reaches the end of a catch statement. This is the basis of every exception system of all time, so we won't belabor the example any further.
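The intended semantics can be sketched with plain Either standing in for the effect. This is a minimal illustration, not polysemy's API; throwE, catchE and foo' are names made up for the sketch:

```haskell
data SomeError = SomeError deriving (Eq, Show)

-- Left short-circuits the rest of the do-block, just like throw:
throwE :: e -> Either e a
throwE = Left

-- The handler runs only when an error short-circuited the block:
catchE :: Either e a -> (e -> Either e a) -> Either e a
catchE (Left e)  h = h e
catchE (Right a) _ = Right a

foo' :: Bool -> Either SomeError Bool
foo' someBool =
  catchE
    (do if not someBool then throwE SomeError else Right ()
        Right True)
    (\SomeError -> Right False)
```

Here foo' False lands in the handler and yields Right False, while foo' True reaches the end of the computation and yields Right True.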

Given some appropriate m, we'd like to model this problem with the following interface:

throw :: e -> m a
catch :: m a -> (e -> m a) -> m a

In first-order effect systems such as freer-simple, our effects have kind * -> *. With such a kind, we can easily model throw, but it's less clear how to model catch:

data Error e a where
  Throw :: e -> Error e a
  Catch :: ??

We simply don't have an m available to us in order to write something equivalent to m a -> (e -> m a) -> m a. There are a few unsatisfactory solutions here --- you can either choose a concrete m and bake it in (which defeats the entire purpose of effect systems), or you can attempt to encode m somewhere inside of the Error e part. Neither is fruitful.

freer-simple actually takes a pretty clever approach to this problem. Instead of modeling catch in the Error e effect, it just provides catch as a function:

catch
    :: Member (Error e) r
    => Eff r a
    -> (e -> Eff r a)
    -> Eff r a
catch ma f = -- replace every call to `throw e` in `ma` with `f e`

And what do you know, this solution actually works pretty well. It accurately captures the semantics of catch for ExceptT. Success! For most people, most of the time, this implementation of catch is perfectly fine.

But let's consider an interpretation of Error e which isn't completely analogous to ExceptT. After all, the whole point of effect-systems is to be able to arbitrarily reinterpret the meaning of your programs. So let's pretend that we're writing an interpretation of the system which wants to audit the happy code path. As a result, we'd like to log whether or not we successfully got to the end of a catch block.

In essence, we'd like to replace every call to catch ma f with:

catch' ma f = catch (ma <* logSuccessfulExit) f

meaning logSuccessfulExit will be called if and only if ma didn't contain a throw statement.

Unfortunately, the clever encoding of catch as a separate function outside of Error e means that this interpretation of catch is impossible. The problem is fundamentally that, by virtue of being outside the effect, catch must choose its own interpretation of catching effects, and you're out of luck if its choice isn't what you want.

This is a bit of a contrived example, but it shows up every time you want to embed a computation; such as doing callbacks, coroutines, asynchronous work, or resource bracketing. It's a big class of problems that quickly become untenable in the first-order world.

Effect Handlers in Scope

Wu et al. give us a real solution for the problem above. Instead of modeling our effects with kind * -> *, we give them a kind (* -> *) -> * -> *. This extra (* -> *) is enough to hold a monad in. As such, Error e is now modeled as:

data Error e m a where
  Throw :: e -> Error e m a
  Catch :: m a -> (e -> m a) -> Error e m a

This extra m parameter lets us write Catch as a constructor, meaning it is now part of the effect algebra. By writing clever constructors, we can force m to be the effect stack we're running in:

catch
    :: Member (Error e) r
    => Eff r a -> (e -> Eff r a) -> Eff r a

which nicely ties the recursive knot.

This change is pretty straightforward, and has probably occurred to most people who've spent any time playing around with the internals of first-order free monads. However, here is where the first problem sets in.

Effect systems model interpretations of effects as functions. For example, let's assume we have a State s effect to play with. We can give an interpretation of it with the type:

runState :: s -> Eff (State s ': r) a -> Eff r (s, a)

In the first-order world, you can just have runState walk through every action in Eff and handle the State s ones. In the higher-order world, however, we also need to run runState on all of the embedded computations (like Catch) --- and then somehow merge the resulting side states back into the main thread.
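To make "walk through every action" concrete, here's a toy first-order free monad with a State handler. This is purely illustrative (the real Eff and runState are more sophisticated); Free, StateF, get', put' and prog are names invented for the sketch:

```haskell
-- A classic free monad over a functor of state actions.
data Free f a = Pure a | Impure (f (Free f a))

data StateF s k = Get (s -> k) | Put s k

instance Functor (StateF s) where
  fmap f (Get k)   = Get (f . k)
  fmap f (Put s k) = Put s (f k)

instance Functor f => Functor (Free f) where
  fmap f (Pure a)   = Pure (f a)
  fmap f (Impure g) = Impure (fmap (fmap f) g)

instance Functor f => Applicative (Free f) where
  pure = Pure
  Pure f   <*> x = fmap f x
  Impure g <*> x = Impure (fmap (<*> x) g)

instance Functor f => Monad (Free f) where
  Pure a   >>= f = f a
  Impure g >>= f = Impure (fmap (>>= f) g)

get' :: Free (StateF s) s
get' = Impure (Get Pure)

put' :: s -> Free (StateF s) ()
put' s = Impure (Put s (Pure ()))

-- The handler literally walks the actions one by one, threading s:
runState :: s -> Free (StateF s) a -> (s, a)
runState s (Pure a)           = (s, a)
runState s (Impure (Get k))   = runState s (k s)
runState _ (Impure (Put s k)) = runState s k

prog :: Free (StateF Int) Int
prog = do
  s <- get'
  put' (s + 1)
  get'
```

First-order handlers are easy precisely because every action sits at the top level of this chain; the trouble starts when an action (like Catch) itself contains whole sub-programs.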

Recall above that we tied the recursive knot on catch, so that the m in Error e m was always equal to the actual Eff monad it's being run in. By calling runState, we're promising that that m is of the form Eff (State s ': r). But now we're eliminating the State s effect, and we want to maintain the invariant that m is the same monad. Which means, we need to somehow use runState to eliminate the State s inside of Catch.

It makes my head spin, too. English is not particularly good at describing these kinds of things, so pay attention to the types here:

  1. We called catch :: Eff r0 a -> (e -> Eff r0 a) -> Eff r0 a somewhere in our application code
  2. We then interpret the application via runState :: s -> Eff (State s ': r1) a -> Eff r1 (s, a)
  3. As such, we learn that r0 ~ (State s ': r1)
  4. After calling runState, we are left only with r1 in our effect stack.
  5. But catch still contains r0. We need to transform it into r1 to maintain our invariant that the computations embedded inside catch are in same monad as the call to catch.

Doing such a thing is going to require a function:

call'runState'InsideError
    :: s
    -> Error e (Eff (State s ': r)) a
    -> Error e (Eff r) (s, a)

which for reasons that will become clearer later, we will uncurry into:

call'runState'InsideError
    :: (s, Error e (Eff (State s ': r)) a)
    -> Error e (Eff r) (s, a)

The implementation of this function is guided by the types, and looks like this:

call'runState'InsideError
    :: (s, Error e (Eff (State s ': r)) a)
    -> Error e (Eff r) (s, a)
call'runState'InsideError (_, Throw e) = Throw e
call'runState'InsideError (s, Catch ma f) =
  Catch (runState s ma)
        (\e -> runState s $ f e)

Such an example is helpful for building intuition, but is completely infeasible in the real world. Not only do we need one of these functions for every effect inside of our stack, but we also need one for every interpretation of every effect in our stack! This is O(m*n) functions in the number of effects and interpretations we have.

The insight of Wu et al. is that we can get this down to O(n) --- one function analogous to call'runState'InsideError for each effect. Let's go through the derivation together.

The first thing to notice is that we don't need to hard-code runState in call'runState'InsideError. It's fine to just pass it in as a parameter:

elimStateInsideError
    :: (forall x. (s, Eff (State s ': r) x) -> Eff r (s, x))
    -> (s, Error e (Eff (State s ': r)) a)
    -> Error e (Eff r) (s, a)
elimStateInsideError _ (_, Throw e) = Throw e
elimStateInsideError elimState (s, Catch ma f) =
  Catch (elimState (s, ma))
        (\e -> elimState (s, f e))

Note that the elimState function must be rank-2 so that we can use it on every instance of Catch --- there's no guarantee that they'll all be called to produce the same type.

The next step is to notice that there's a homomorphism here; we're transforming an (s, m a) into m' (s, a) by somehow pushing the (,) s bit through the monad. We can make that a little more clear by explicitly factoring it out:

elimStateInsideError
    :: (f ~ ((,) s))
    => (forall x. f (Eff (State s ': r) x) -> Eff r (f x))
    -> f (Error e (Eff (State s ': r)) a)
    -> Error e (Eff r) (f a)

This type is identical to before, we've just renamed (,) s to f. Let's do the same renaming trick on Eff (State s ': r):

elimStateInsideError
    :: ( f ~ ((,) s)
       , m ~ Eff (State s ': r)
       )
    => (forall x. f (m x) -> Eff r (f x))
    -> f (Error e m a)
    -> Error e (Eff r) (f a)

and then again on Eff r:

elimStateInsideError
    :: ( f ~ ((,) s)
       , m ~ Eff (State s ': r)
       , n ~ Eff r
       )
    => (forall x. f (m x) -> n (f x))
    -> f (Error e m a)
    -> Error e n (f a)

As it stands, our current implementation of elimStateInsideError will actually work for any m and n; so we can just get rid of those renames:

elimEffectInsideError
    :: (f ~ ((,) s))
    => (forall x. f (m x) -> n (f x))
    -> f (Error e m a)
    -> Error e n (f a)
elimEffectInsideError _ (_, Throw e) = Throw e
elimEffectInsideError elim (s, Catch ma f) =
  Catch (elim (s, ma))
        (\e -> elim (s, f e))

Let's now undo our uncurrying of our s -> Error m a -> ... as (s, Error m a) -> .... But since we've renamed s away, we're not allowed to reference it anymore. Instead, we can use f (), aka (s, ()), which you'll notice is isomorphic to s.

elimEffectInsideError
    :: (f ~ ((,) s))
    => (forall x. f (m x) -> n (f x))
    -> f ()
    -> Error e m a
    -> Error e n (f a)
elimEffectInsideError _ _ (Throw e) = Throw e
elimEffectInsideError elim (s, ()) (Catch ma f) =
  Catch (elim (s, ma))
        (\e -> elim (s, f e))

As one last step, we can rewrite the explicit destructuring of the f () parameter using its functor instance. Given the ice-cream cone function (<$) :: Functor f => a -> f b -> f a, which replaces the contents of a functor, we can rewrite elimEffectInsideError as follows:

elimEffectInsideError
    :: (f ~ ((,) s))
    => (forall x. f (m x) -> n (f x))
    -> f ()
    -> Error e m a
    -> Error e n (f a)
elimEffectInsideError _ _ (Throw e) = Throw e
elimEffectInsideError elim s (Catch ma f) =
  Catch (elim $ ma <$ s)
        (\e -> elim $ f e <$ s)

and in doing so, are now fully functor-agnostic, so we can get rid of the f-renaming now:

elimEffectInsideError
    :: Functor f
    => (forall x. f (m x) -> n (f x))
    -> f ()
    -> Error e m a
    -> Error e n (f a)

That was a lot of work! But we've bought ourselves a huge amount with this. Now elimEffectInsideError is general enough that it supports eliminating any effect inside of Error. The last step is to wrap this thing up into a typeclass, which Wu et al. call weave:

class (forall m. Functor m => Functor (e m)) => Effect e where
  weave
      :: (Functor f, Functor m, Functor n)
      => f ()
      -> (forall x. f (m x) -> n (f x))
      -> e m a
      -> e n (f a)

Don't worry about the extra mentions of Functor in this definition; they're there for reasons we don't care about today.

By giving an instance of Effect for e, we can now thread any other effects through e. If we give an instance of Effect for every effect, we get higher-order effects that can be run through one another in any order. Happy days!

This weave transformation is the major contribution of Effect Handlers in Scope. And while it does indeed solve the problem of higher-order effects, such a thing brings with it a lot of boilerplate; we need to write an instance of Effect for each of our effects, which is non-trivial and can't be automated via today's support for generics.

Free Effects

Back in the bad old days of free, we would have had to model the first-order version of Error e above (the one that just has Throw) as follows:

data Error e a = forall x. Throw e (x -> a)

while State s would look like this:

data State s a
  = Get (s -> a)
  | Put s (() -> a)

It's gross, and you'd need to give Functor instances for both. AND you can't even derive Functor for Error e due to the existential.

The specifics here aren't very important, but the point is that this was a bunch of boilerplate that got in the way of doing any work. The main contribution of Kiselyov and Ishii's paper Freer Monads, More Extensible Effects is that we can use a free functor to automate away this boilerplate. The result is what puts the "simple" in freer-simple1.

The free functor is called Coyoneda2, and it looks like this:

data Coyoneda f b where
  Coyoneda :: f a -> (a -> b) -> Coyoneda f b

instance Functor (Coyoneda f) where
  fmap f' (Coyoneda fa f) = Coyoneda fa (f' . f)

As you can see, Coyoneda f is a Functor, even when f itself isn't. Coyoneda just accumulates all of the fmaps you wanted to do, and you can choose later what to do with the resulting function.
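A standard lift/lower pair (spelled out here so the sketch is self-contained; liftCoyoneda and lowerCoyoneda are the usual names, not from the post) shows the accumulated fmaps being applied all at once when we lower:

```haskell
{-# LANGUAGE GADTs #-}

data Coyoneda f b where
  Coyoneda :: f a -> (a -> b) -> Coyoneda f b

instance Functor (Coyoneda f) where
  fmap f' (Coyoneda fa f) = Coyoneda fa (f' . f)

-- Lifting fills the continuation with id...
liftCoyoneda :: f a -> Coyoneda f a
liftCoyoneda fa = Coyoneda fa id

-- ...and lowering applies every accumulated fmap in one pass,
-- using the real Functor instance for f only at the very end.
lowerCoyoneda :: Functor f => Coyoneda f a -> f a
lowerCoyoneda (Coyoneda fa f) = fmap f fa

fused :: [Int]
fused = lowerCoyoneda (fmap (+ 1) (fmap (* 2) (liftCoyoneda [1, 2, 3])))
```

The two fmaps compose into a single function, so the list is traversed once at lowering time.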

This got me to thinking. Maybe there's a free Effect that can likewise accumulate all of the weaveing we'd like to do, so that library users don't need to write those instances themselves.

The "trick" to making a free construction is to just make a datatype that stores each parameter to the characteristic function. In the Functor example, you'll notice a similarity between the types of (flipped) fmap and Coyoneda:

flip fmap :: f a -> (a -> b) -> f b
Coyoneda  :: f a -> (a -> b) -> Coyoneda f b

So let's do the same thing, for weave, and construct an equivalent datatype. Recall the type of weave:

weave
    :: (Functor f, Functor m, Functor n)
    => f ()
    -> (forall x. f (m x) -> n (f x))
    -> e m a
    -> e n (f a)

As a first attempt, let's just turn this thing into a GADT and see what happens. I called it Yo a little because it's sorta like Coyoneda, but mostly because naming things is hard.

data Yo e m a where
  Yo :: Functor f
     => e m a
     -> f ()
     -> (forall x. f (m x) -> n (f x))
     -> Yo e n (f a)

While this looks right, it turns out to be a no-go. We can't actually give an instance of Effect for Yo e. We can get close by realizing that the composition of any two functors is also a functor (given via the Compose newtype). With that in mind, it's just a little work to make all of the types line up:

instance Effect (Yo e) where
  weave s' elim' (Yo e s elim) =
    Yo e (Compose $ s <$ s')
         (fmap Compose . elim' . fmap elim . getCompose)

Unfortunately, this definition doesn't quite work. The problem is that weave s elim is supposed to result in a e m a -> e n (f a), but ours has type e m (g a) -> e n (Compose f g a)! By hard-coding that f into the result of our GADT, we've painted ourselves into a corner. Similar problems would crop up if we wanted to give a Functor instance to Yo e m.

As is so often the case in this line of work, the solution is to make f existential, and to take another function which is responsible for producing the desired type. We add a (f a -> b) parameter to Yo, and make it return Yo e n b:

data Yo e m a where
  Yo :: Functor f
     => e m a
     -> f ()
     -> (forall x. f (m x) -> n (f x))
     -> (f a -> b)
     -> Yo e n b

We can now call getCompose in this last function --- in order to undo our trick of packing the two pieces of state together.

instance Effect (Yo e) where
  weave s' elim' (Yo e s elim f) =
    Yo e (Compose $ s <$ s')
         (fmap Compose . elim' . fmap elim . getCompose)
         (fmap f . getCompose)

Giving an instance of Functor (Yo e m) can also riff on this final parameter, exactly in the same way that Coyoneda did:

instance Functor (Yo e m) where
  fmap f' (Yo e s elim f) = Yo e s elim (f' . f)

(The real implementation also needs hoist :: (forall x. m x -> n x) -> e m a -> e n a, which turns out to be a special case of weave. This is left as an exercise for the ambitious reader.)

All that's left is to be able to lift e m as into Yo e m as. In every free construction I've ever seen, this operation just fills all of your parameters with identity --- and this case is no different!

liftYo :: Functor m => e m a -> Yo e m a
liftYo e = Yo e (Identity ()) (fmap Identity . runIdentity) runIdentity

We're done! This funny Yo construction is powerful enough to coalesce entire chains of effect interpreters into a single call. We haven't done anything magical here --- someone still needs to figure out what these functions actually mean for their interpretation. By collecting it all into a single place, we can cut down on boilerplate and find easier ways to express these concepts to the end-user.

But that's a tale for another time, when we talk about polysemy's Tactics machinery.

  1. Plus, it provides better combinators and more helpful error messages.

  2. For further discussion of Coyoneda and how it can help performance, perhaps you might be interested in my book.

May 31, 2019 01:11 AM

May 30, 2019

FP Complete

Blockchain and Cryptocurrency Security

Why Audit?

FP Complete now offers blockchain audit services.
Why have we chosen to work in this field, and what are we aiming to accomplish?

Our corporate mission is to drive the successful adoption of better IT engineering tools and practices.

Experience shows us again and again: quality and productivity are driven more by these substantive improvements than by simply deciding to try harder. Any engineer, and any team, can be more successful using the right tools and best practices. This was true when I built and ran Microsoft’s Productivity Tools Team (for Windows and Office engineering), and when I was in charge of Visual C++ and parts of Visual Studio. And it remains true today as we see with powerful tools like Stack for Haskell, or Kubernetes, or a wide range of corporate projects FP Complete has worked on.


Blockchain Now Needs Stronger Engineering

Good engineering involves a lot of pieces beyond just having a strong algorithm paper: coding standards, continuous integration, automated test coverage, documentation management, reproducible cloud deployment, dependency tracking, and more. The stronger the engineering infrastructure, the more likely you can expect a reliable and secure result that works as intended under a wide range of conditions—in other words, quality.

The blockchain field, including cryptocurrency, is of course fairly new. And these technologies are of course very sensitive to quality. Unfortunately as we have all seen, they don’t all live up to their promises. Engineering teams, perhaps feeling the pressure to get to market quickly, sometimes overlook valuable opportunities to improve quality. To be blunt, a lot of blockchain implementation work needs improvement.

We believe over the next few years the bar for engineering excellence is going way up. People are staking their money, their privacy, their businesses on the correct operation of these systems. So we’ve been asking directly: what can be done to increase the quality of engineering in the whole blockchain industry?

More even than other open technologies, blockchain relies upon community trust. We need to give blockchain groups a way to earn that trust by actually doing proper work, with an independent inspection that it’s being done right. A cryptocurrency cannot be a hack job—and if it is done right, users want to know. Thus the audit, an inspection to verify that the project lives up to good engineering standards. By making these standards clear, we give teams something specific to shoot for and give credit to those who've got it right. For users and investors, knowledge is power.

Months ago Cardano announced their decision to appoint us as the auditors of their cryptocurrency engineering. This cryptocurrency has a market capitalization over US$ 4 Billion, and they want users to know that the system can be trusted. We’ve already provided them with interim results which are being published, and the work is ongoing.

At the same time, we’re working on several other non-published cryptocurrency projects, and in talks with more. So we decided it was time to formalize the audit program and announce it publicly.

Learn More!

Blockchain Success Program

Membership has its benefits!

Think you know what you are doing, but need a little support? This program is designed with you and your company in mind! FP Complete's Success Programs are packed with the tools and techniques your team needs to guide its Blockchain project towards success, at a fraction of the industry standard pricing.

  • Offering 3 strategic program levels.
  • Engineer to engineer mentoring that pays for itself!
  • Accelerate your IT team's expertise in cloud tools and automation.
  • Free your team to use their talents.
Enroll Today!

Levels of Auditing

We hope to encourage a great many blockchain and cryptocurrency projects to seek an outside engineering audit, whether from FP Complete or another qualified firm. We look forward to the day when users expect to see an audit on any sensitive cryptocurrency or blockchain work. And that means we need to provide people with a path to get started.

Therefore we’ve chosen to offer several audit plans, using different amounts of labor (and thus, costs) to achieve different amounts of scrutiny and certification. For ease of understanding by general audiences we are calling these Bronze, Silver and Gold; and we will use “stars” to further summarize how well the project is doing. We will be publishing the criteria for each level; obviously the more auditing work is done, the more parts of the engineering can be checked and potentially certified. What's crucial right now is to get every project on the path to verifiable quality.

Auditing is not the same as a 100% inspection. Given that all blockchain projects are moving targets, our goal is to achieve a reasonable level of scrutiny with sampling, and report accurately on whether each audited project appears to be living up to a reasonable standard of engineering practices. As part of any public certification we will report on the nature of what we’ve inspected, what standards it met, and exceptions we’ve found.

At a basic level of scrutiny, we will focus on the tools, development processes, and quality control processes in use: are good engineering systems used, in line with best practices for predictable results? At a higher level of scrutiny we will delve much deeper into a larger percentage of the source code, tests, and so on, greatly increasing the density of checks that can be done. Are the software and the distributed system being built in a way that is most likely to operate as specified? Or is the team operating on just caffeine and hope?

Clearly, signing up for an audit is no guarantee of a passing grade: a project may fail an audit and earn no certification at all. In such cases, our intention is to provide the team with as much constructive feedback as possible on how they can improve. We hope in such cases the chance to work up to a certification will serve as a “carrot,” an incentive to implement improvements that would lead to a passing grade or better.

As you probably know, FP Complete offers extensive services in FinTech software engineering, cloud engineering, and DevOps. To avoid any conflict of interest, of course we will not issue an audit grade for a project where we ran the engineering. In any such case we will bring in an outside firm to compare the engineering work with the published criteria and determine the grade.


Raising the Standards

Right now we see a wide range of engineering quality levels on blockchain projects. Frankly, I don’t expect to see many Gold or even Silver certifications in the short term. However, we hope to see some. Moreover, as industry standards rise (as they must), we expect to add further criteria, increasing the bar for each level of certification. Even a Bronze certification in 2020 may involve far more requirements than one in 2019 or 2018. This will be spelled out in the published criteria for each level at any given time.

FP Complete does not have the capacity to audit the over 1600 cryptocurrencies already in existence, plus all of the other blockchain projects and wallets. We certainly hope to make a dent, but realistically other companies will need to enter this space as well. We will welcome them to use criteria modeled on our own, or to create their own lists of what constitutes proper engineering. What’s important is that they not lower the bar, but raise the bar, for quality in this industry. The blockchain engineering audit field needs to grow rapidly for the public good, and we will promote its growth in a constructive and timely manner.

Note that a technology audit will never be the same thing as a financial audit. Technical excellence doesn’t mean that a particular cryptocurrency is a good investment, or that a particular blockchain is suited for some particular use. But it should mean that the implementation team is following best practices to bring their implementation in line with what’s been described and specified.

We hope the day will come when consumers of any blockchain will ask: where’s the audit? It’s long been expected in the stock market, and crypto users deserve no less. Home and business users alike deserve to know if they can trust the technology on which they are staking so much. By demanding evidence of excellence, we give providers the backing they need to invest more in quality, safety, and security.


For further reading

  1. How to select the right level of QA for your project
  2. Best practices: multifaceted testing
  3. Best practices: DevOps priorities for FinTech
  4. DevOps to prepare for a blockchain world (video presentation)
  5. From my colleague Steve Bogdan: getting past IT operations into DevOps
  6. From my colleague Niklas Hambüchen: the Haskell language and cryptocurrencies
  7. Details on our blockchain auditing services



by Aaron Contorer at May 30, 2019 07:14 PM

Functional Jobs

Scala Developer; can cross train from Clojure, F# or Erlang at Flexys (Full-time)

Flexys is hiring for a Software Developer with skills in functional or reactive programming.

Functional programming languages are entering the mainstream and if you've already explored Scala, Clojure, F# or Erlang, you'll find Flexys a supportive environment in which to grow. We can equally provide progression for a practiced Scala Developer ready for a challenge.

We're building highly responsive and scalable applications that manage debt more effectively and provide a collaborative experience for consumers which aims to promote sustainable and positive outcomes.

With our first customer deployments made, we're now expanding our engineering team to build out our digital engagement platform, applying machine learning and other AI techniques through Agile approaches. This is a back end oriented position, working primarily in Scala on systems with an event sourcing architecture.

We strongly value learning and doing things properly. We are continuously improving our functional programming techniques. We automate heavily and spend where it makes sense. Importantly, we like all of our technical people to get some exposure to how customers use our software and how their businesses run.

We're always open to new things and you can influence where we go and how we get there. At all times you'll have a lot of freedom in design, technology choice and influence over the platform architecture.

Upcoming projects for the team

  • Integrating with Open Banking for payments and affordability checking
  • Using AI to make collections processes more efficient
  • Designing a streaming API to support delivery of real-time analytics
  • Implementing a way of dynamically applying custom data models
  • Exploring new open source ecosystems and services
  • Exploring different programming languages in research and prototyping

We're looking for

  • Someone with solid functional or reactive programming skills, with a positive attitude to learning Scala, transferring from any of Clojure, Haskell, F# etc.
  • Familiarity with Scala - from online learning, courses or tutorials, hobby projects, open source contributions, or from production coding
  • A clear desire to learn and improve
  • Familiarity with the principles of Agile, TDD and continuous delivery
  • Enthusiasm, and informed opinions about modern software engineering practice

Considering functional programming or Scala jobs in Bristol such as: Scala Developer | Scala Software Engineer | Back End Scala Developer | Back End Software Engineer | Clojure Developer | Haskell Developer | F# Developer | Erlang Developer etc.

Current stack - we'll help you get up to speed

Scala | Cats | Akka | Cassandra | Kafka | Docker | Kubernetes | ScalaCheck | Linux | and more

Salary and benefits

  • £55,000 to £65,000+ negotiable - we're keeping an open mind
  • Discretionary relocation assistance
  • Flexible working environment - tell us what you need
  • 30 days holiday + bank holidays | medical cover | life cover | contributory pension | open source friendly IP Rights agreement | personal development budget available | conference involvement encouraged | time for open source contributions | on site cafe / restaurant | bicycle racks, on site showers and changing rooms | option to bring your own device | standing desks, supportive chairs, high quality screens, choice of teas, coffee

About us

Flexys was formed in 2016 by a group of debt management solution specialists looking to dramatically improve on the status quo in the industry. This is a chance to be a formative team member, building things from the ground up and making a positive impact. We offer a flat team structure with responsibility and autonomy. We strive to be honest, transparent, non-bureaucratic and supportive, and we offer real creative freedom. We strongly value learning and doing things properly. We're also proponents of open source, functional and reactive programming, TDD and Agile. A growing team, we're hiring for someone who shares our belief in providing highly effective Scala solutions and who is positive about improving their craft in software engineering.


Future Space, UWE, north Bristol - well served by public transport, cycle routes, and with on-site parking - just off the M4/M32

RECRUITERS: Flexys has selected techfolk as our recruitment partner and cold calling is not welcomed.

Get information on how to apply for this position.

May 30, 2019 02:51 PM

Magnus Therning

Hedgehog on a REST API

Last year I wrote a little bit about my attempt to use QuickCheck to test a REST API. Back then I got as far as generating test programs, running them, and validating an in-test model against the observed behaviour of the web service under test. One thing that I didn’t implement was shrinking. I had some ideas, and got some better ideas in a comment on that post, but I’ve not taken the time to actually sit down and work it out. Then, during this spring, a couple of blog posts from Oskar Wickström (intro, part 1, part 2) made me aware of another library for doing property-based testing, hedgehog. It differs quite a bit from QuickCheck, most notably in the way it generates random data and, this is the bit that made me sit up and pay attention, in its integrated shrinking.

My first plan was to use the same approach as I used with QuickCheck, but after finding out that there’s explicit support for state machine tests everything turned out to be a bit easier than I had expected.

Well, it still wasn’t exactly easy to work out the details, but with the registry example in the hedgehog source repo together with a (slightly dated) example, I managed to work it out (I think).


The API is the same as in the post on using QuickCheck, with one little difference: I’ve been lazy when implementing GET /users/:id and return a list of users (that makes it easy to represent a missing :id).

Method  Route       Example in                                Example out
POST    /users      {"userId": 0, "userName": "Yogi Berra"}   {"userId": 42, "userName": "Yogi Berra"}
DELETE  /users/:id
GET     /users                                                [0,3,7]
GET     /users/:id                                            [{"userId": 42, "userName": "Yogi Berra"}]
GET     /users/:id                                            [] (when there’s no user with :id)
POST    /reset

The model state

Just like last time I’m using as simple a model as I think I can get away with, based on the API above:
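The code block for the model seems to have been lost in syndication. Here is a hedged reconstruction, inferring the shape from the `State m` patterns and `M.keys m` calls visible in the failure output later in the post:

```haskell
{-# LANGUAGE KindSignatures #-}

import qualified Data.Map as M
import Data.Text (Text)

-- The model: a map from user ID to user name. The extra `v` parameter
-- is required by hedgehog's state machine machinery.
newtype State (v :: * -> *) = State (M.Map Int Text)
  deriving (Eq, Show)

-- The initial model: no users.
initialState :: State v
initialState = State M.empty
```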

That extra v is something that hedgehog requires. Why? I don’t really know, and luckily I don’t have to care to make it all work. One thing though: the language pragma KindSignatures is necessary to use that kind of syntax.

Representing API calls

Representing an API call requires three things

  1. a type
  2. an implementation of HTraversable for the type
  3. a function producing a Command for the type

I represent the three API calls with these three types
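The type definitions were lost in syndication; a hedged sketch, with payloads inferred from the uses elsewhere in the post (a name for AddUser, an id for DeleteUser and GetUser):

```haskell
{-# LANGUAGE KindSignatures #-}

import Data.Text (Text)

-- One type per API call; the `v` parameter is again hedgehog's.
newtype AddUser    (v :: * -> *) = AddUser Text    deriving (Eq, Show)
newtype DeleteUser (v :: * -> *) = DeleteUser Int  deriving (Eq, Show)
newtype GetUser    (v :: * -> *) = GetUser Int     deriving (Eq, Show)
```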

Again that v pops up, but as with the model state, there’s no need to pay any attention to it.

For the implementations of HTraversable I was greatly helped by the registry example. They are fairly straightforward, which is a good thing since the need for them is internal to hedgehog.

Once these two things are out of the way we get to the meat of the implementation of the API calls, a function creating a Command instance for each type of API call. The exact type for all three functions will be
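The snippet itself is missing here, but the signature survives on getUser in the failure output quoted later; for all three commands it is:

```haskell
addUser, deleteUser, getUser :: (MonadGen n, MonadIO m) => Command n m State
```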

which doesn’t say a whole lot, I think. After reading the documentation I found it a little clearer, but the two examples, state machine testing and registry, were what cleared things up for me.1 In an attempt at being overly explicit I wrote these functions in the same style. This is what it ended up looking like for the AddUser type:
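The AddUser code block was lost in syndication. Below is a hedged sketch of what such a Command might look like, written to match the four pieces described next; the User type, the status code checked, and the fresh-id choice in u are my assumptions, not the post's exact code:

```haskell
addUser :: (MonadGen n, MonadIO m) => Command n m State
addUser = Command gen exec [ Update u, Ensure e ]
  where
    -- Generate a user name of at most 42 characters; the state is unused.
    gen _ = Just $ AddUser <$> Gen.text (Range.linear 0 42) Gen.alpha

    -- POST the new user and decode the response (assumed shape).
    exec (AddUser n) = liftIO $ do
      mgr <- newManager defaultManagerSettings
      addReq <- parseRequest "POST http://localhost:3000/users"
      let body = RequestBodyLBS $ encode $
            object ["userId" .= (0 :: Int), "userName" .= n]
      addResp <- httpLbs addReq { requestBody = body } mgr
      let user = decode (responseBody addResp) :: Maybe User
      return (status200 == responseStatus addResp, user)

    -- Pick a fresh userId and associate it with the generated name.
    u (State m) (AddUser n) _ =
      let k = head ([0 ..] \\ M.keys m)
      in  State (M.insert k n m)

    -- Post-conditions: the HTTP response code and the returned user name.
    e _ _ (AddUser n) (r, user) = do
      r === True
      (userName <$> user) === Just n
```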

Piece by piece:

  1. gen is the generator of data. It takes one argument, the current state, but for AddUser I have no use for it. The user name is generated using a generator for Text, and rather arbitrarily I limit the names to 42 characters.
  2. exec is the action that calls the web service. Here I’m using http-client to make the call and aeson to parse the response into a User. It produces output.
  3. u is a function for updating the model state. It’s given the current state, the command and the output. All I need to do for AddUser is to pick a userId and associate it with the generated name.
  4. e is a function for checking post-conditions, in other words checking properties that must hold after exec has run and the state has been updated. It’s given four arguments, the previous state, the updated state, the command and the output. The tests here are on the HTTP response code and the returned user name. I think that will do for the time being.

The function for DeleteUser follows the same pattern
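Again the code block was lost; here is a hedged sketch matching the two points explained next (only generating ids present in the model, plus the Require pre-condition). The exec body and status check are my assumptions:

```haskell
deleteUser :: (MonadGen n, MonadIO m) => Command n m State
deleteUser = Command gen exec [ Require r, Update u, Ensure e ]
  where
    -- Only generate deletes for users that exist in the model.
    gen (State m) = case M.keys m of
      [] -> Nothing
      ks -> Just $ DeleteUser <$> Gen.element ks

    exec (DeleteUser i) = liftIO $ do
      mgr <- newManager defaultManagerSettings
      delReq <- parseRequest $ "DELETE http://localhost:3000/users/" ++ show i
      delResp <- httpLbs delReq mgr
      return (status200 == responseStatus delResp)

    -- Pre-condition: the user must exist, also during shrinking.
    r (State m) (DeleteUser i) = i `elem` M.keys m

    u (State m) (DeleteUser i) _ = State (M.delete i m)

    e _ _ (DeleteUser _) ok = ok === True
```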

I think only two pieces need further explanation:

  1. gen only returns a DeleteUser with an index actually present in the model state. If there are no users in the model then Nothing is returned. As far as I understand that means that generated programs will only make calls to delete existing users.2
  2. r is a pre-condition that programs only delete users that exist. At first I had skipped this pre-condition, thinking that it’d be enough to have gen only create delete calls for existing users. However, after reading the documentation of Command and Callback a bit more closely I realised that I might need a pre-condition to make sure that this holds true also while shrinking.

The final function, for GetUser requires no further explanation so I only present it here
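The code block did not survive syndication, but the full definition is visible in the failing-test output quoted below; reproduced here for readability:

```haskell
getUser :: (MonadGen n, MonadIO m) => Command n m State
getUser = Command gen exec [ Require r
                           , Ensure e
                           ]
  where
    gen (State m) = case M.keys m of
      [] -> Nothing
      ks -> Just $ GetUser <$> Gen.element ks

    exec (GetUser i) = liftIO $ do
      mgr <- newManager defaultManagerSettings
      getReq <- parseRequest $ "GET http://localhost:3000/users/" ++ show i
      getResp <- httpLbs getReq mgr
      let us = decode $ responseBody getResp :: Maybe [User]
      return (status200 == responseStatus getResp, us)

    r (State m) (GetUser i) = i `elem` M.keys m

    e _ _ (GetUser _) (r, us) = do
      r === True
      assert $ isJust us
      (length <$> us) === Just 1
```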

The property and test

It looks like there are two obvious top-level properties

  1. the web service works as expected when all calls are made one at a time (sequential), and
  2. the web service works as expected when all calls are made in parallel.

Hedgehog provides two pairs of functions for this

  1. a sequential generator with executeSequential, and
  2. a parallel generator with executeParallel.

I started with the former only
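This code block was also lost, but the definition appears verbatim in the failure output below; it reads:

```haskell
prop_seq :: Property
prop_seq = property $ do
  actions <- forAll $ Gen.sequential (Range.linear 1 10) initialState [addUser, deleteUser, getUser]
  resetWS
  executeSequential initialState actions
```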

This first creates a generator of programs of at most length 10,3 then turns that into a Sequential, which can be passed to executeSequential to turn into a Property.

The function resetWS clears out the web service to make sure that the tests start with a clean slate each time. Its definition is
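The definition was lost in syndication; a hedged sketch, assuming the same http-client setup as the other calls and the POST /reset route from the API table:

```haskell
resetWS :: MonadIO m => m ()
resetWS = liftIO $ do
  mgr <- newManager defaultManagerSettings
  resetReq <- parseRequest "POST http://localhost:3000/reset"
  _ <- httpLbs resetReq mgr
  return ()
```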

The final bit is the main function, which I wrote like this
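The block is missing; a hedged sketch matching the description, with the group and property names taken from the "Main"/"sequential" labels visible in the test output:

```haskell
main :: IO ()
main = do
  res <- checkSequential $ Group "Main" [("sequential", prop_seq)]
  unless res exitFailure
```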

That is, first run the property sequentially (checkSequential) and if that fails exit with failure.

Running the test

When run, the test fails and gives me a program that breaks the property, showing exactly what fails:

━━━ Main ━━━
  ✗ sequential failed after 13 tests and 1 shrink.
        ┏━━ tst/test-01.hs ━━━
     89 ┃ getUser :: (MonadGen n, MonadIO m) => Command n m State
     90 ┃ getUser = Command gen exec [ Require r
     91 ┃                            , Ensure e
     92 ┃                            ]
     93 ┃   where
     94 ┃     gen (State m) = case M.keys m of
     95 ┃       [] -> Nothing
     96 ┃       ks -> Just $ GetUser <$> Gen.element ks
     97 ┃ 
     98 ┃     exec (GetUser i) = liftIO $ do
     99 ┃       mgr <- newManager defaultManagerSettings
    100 ┃       getReq <- parseRequest $ "GET http://localhost:3000/users/" ++ show i
    101 ┃       getResp <- httpLbs getReq mgr
    102 ┃       let us = decode $ responseBody getResp :: Maybe [User]
    103 ┃       return (status200 == responseStatus getResp, us)
    104 ┃ 
    105 ┃     r (State m) (GetUser i) = i `elem` M.keys m
    106 ┃ 
    107 ┃     e _ _ (GetUser _) (r, us) = do
    108 ┃       r === True
    109 ┃       assert $ isJust us
    110 ┃       (length <$> us) === Just 1
        ┃       ^^^^^^^^^^^^^^^^^^^^^^^^^^
        ┃       │ Failed (- lhs =/= + rhs)
        ┃       │ - Just 0
        ┃       │ + Just 1
        ┏━━ tst/test-01.hs ━━━
    118 ┃ prop_seq :: Property
    119 ┃ prop_seq = property $ do
    120 ┃   actions <- forAll $ Gen.sequential (Range.linear 1 10) initialState [addUser, deleteUser, getUser]
        ┃   │ Var 0 = AddUser ""
        ┃   │ Var 1 = GetUser 1
    121 ┃   resetWS
    122 ┃   executeSequential initialState actions
    This failure can be reproduced by running:
    > recheck (Size 12) (Seed 6041776208714975061 (-2279196309322888437)) sequential
  ✗ 1 failed.

My goodness, that is pretty output!

Anyway, the failing program has been shrunk to be minimal, so all in all this is a big step up from what I had earlier. Sure, using the hedgehog state machine API is slightly involved, but once worked out I find it fairly straightforward, and it’s most likely written by people much more knowledgeable than me and better than anything I could produce. Having to use generators explicitly (the hedgehog way) is neither easier nor more complicated than defining a few type class instances (the QuickCheck way). Finally, the integrated shrinking is rather brilliant and not having to implement that myself is definitely a big benefit.

Now I only have to fix the errors in the web service that the tests reveal. This post is already rather long, so I’ll keep that for a future post.

  1. There is still one thing that’s unclear to me though, and that’s how to get to the output in an update function.

  2. Put another way, programs will never test how the web service behaves when asking for non-existing users. I think that, if I want to test that, I’ll opt for using a separate API call type for it.

  3. At least that’s my understanding of the impact of Range.linear 1 10.

May 30, 2019 12:00 AM

May 29, 2019

FP Complete

Faking SQL Server in Haskell


At FP Complete we develop many tools for our clients to help them achieve their goals. Most of these tools are written in Haskell (and, more recently, some are in Rust), and that has helped us write them more quickly and produce more maintainable apps going forward.

by Chris Done at May 29, 2019 12:00 PM

May 28, 2019

Joachim Breitner


At a dance event in Freiburg someone told me that my first name would be spelled Ιωακείμ in Greek.

I am happy that my name contains ω (the first infinite ordinal), α (which identifies programs that differ only in names), ε (the crucial variable in calculus proofs) and μ (which builds recursive types).

Sorry ι and κ, you just aren’t that cool.

by Joachim Breitner at May 28, 2019 01:24 PM

Donnacha Oisín Kidney

Deriving a Linear-Time Applicative Traversal of a Rose Tree

Posted on May 28, 2019
Tags: Haskell

The Story so Far

Currently, we have several different ways to enumerate a tree in breadth-first order. The typical solution (which is the usual recommended approach in imperative programming as well) uses a queue, as described by Okasaki (2000). If we take the simplest possible queue (a list), we get a quadratic-time algorithm, with an albeit simple implementation. The next simplest version is to use a banker’s queue (which is just a pair of lists). From this version, if we inline and apply identities like the following:
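The identities and derived definition from the original post were lost in syndication and I won't guess at them, but for context, here is a hedged sketch of the simple list-queue enumeration the paragraph describes (the `++` on the queue is what makes it quadratic):

```haskell
import Data.Tree (Tree (..))

-- Breadth-first enumeration with the simplest possible queue: a list.
-- Children are appended at the back, so each step pays for the append.
bfe :: Tree a -> [a]
bfe t = go [t]
  where
    go []                 = []
    go (Node x cs : rest) = x : go (rest ++ cs)
```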

We’ll get to the following definition:

We can get from this function to others (like one which uses a corecursive queue, and so on) through a similar derivation. I might some day write a post on each derivation, starting from the simple version and demonstrating how to get to the more efficient at each step.

For today, though, I’m interested in the traversal of a rose tree. Traversal, here, of course, is in the applicative sense.

Thus far, I’ve managed to write linear-time traversals, but they’ve been unsatisfying. They work by enumerating the tree, traversing the effectful function over the list, and then rebuilding the tree. Since each of those steps only takes linear time, the whole thing is indeed a linear-time traversal, but I hadn’t been able to fuse away the intermediate step.


The template for the algorithm I want comes from the Phases applicative (Easterly 2019):

We can use it to write a breadth-first traversal like so:

The key component that makes this work is that it combines applicative effects in parallel:

We’re also using the following helper functions:
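The code blocks for Phases, bft and the helpers were lost in syndication. Here is one self-contained hedged reconstruction of the whole setup, a Phases-style applicative in the spirit of Easterly's tree-traversals package rather than the post's exact definitions. The Applicative instance is the key piece: <*> lines up effect layers pointwise (in parallel) instead of sequencing one computation entirely after the other:

```haskell
{-# LANGUAGE GADTs #-}

import Control.Applicative (Const (..), liftA2)
import Data.Tree (Tree (..))

-- A list of effect "layers": either no effects, or one layer of
-- effects followed by the remaining layers.
data Phases f a where
  Pure :: a -> Phases f a
  Link :: (x -> y -> a) -> f x -> Phases f y -> Phases f a

instance Functor f => Functor (Phases f) where
  fmap f (Pure x)      = Pure (f x)
  fmap f (Link k x xs) = Link (\a b -> f (k a b)) x xs

-- <*> combines effects at the same depth in parallel: layer i of the
-- left operand is paired with layer i of the right operand.
instance Applicative f => Applicative (Phases f) where
  pure = Pure
  Pure f      <*> ys          = fmap f ys
  fs          <*> Pure y      = fmap ($ y) fs
  Link k x xs <*> Link l y ys =
    Link (\(a, c) (b, d) -> k a b (l c d))
         (liftA2 (,) x y)
         (liftA2 (,) xs ys)

-- Run the layers in order, front to back.
runPhases :: Applicative f => Phases f a -> f a
runPhases (Pure x)      = pure x
runPhases (Link k x xs) = liftA2 k x (runPhases xs)

-- Helpers: run an effect in the current layer, or push a whole
-- computation one layer down.
now :: f a -> Phases f a
now x = Link const x (Pure ())

later :: Applicative f => Phases f a -> Phases f a
later = Link (const id) (pure ())

-- Breadth-first traversal: the root's effect happens now, the
-- children's effects one layer later, so effects run level by level.
bft :: Applicative f => (a -> f b) -> Tree a -> f (Tree b)
bft f = runPhases . go
  where
    go (Node x ts) = liftA2 Node (now (f x)) (later (traverse go ts))
```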

The problem is that it’s quadratic: the traverse over a node’s children hides some expensive calls to <*>.

A Roadmap for Optimisation

The problem with the Phases traversal is actually analogous to another function for enumeration: levels from Gibbons (2015).
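The two definitions got lost in syndication; here is a hedged sketch after Gibbons (2015), with lzw ("long zip with") playing the role that <*> plays in Phases:

```haskell
import Data.Tree (Tree (..))

-- "Long zip with": like zipWith, but keeps the tail of the longer list.
lzw :: (a -> a -> a) -> [a] -> [a] -> [a]
lzw f (x:xs) (y:ys) = f x y : lzw f xs ys
lzw _ xs     []     = xs
lzw _ []     ys     = ys

-- Quadratic breadth-first enumeration: zip the children's level lists
-- together at every node.
levels :: Tree a -> [[a]]
levels (Node x ts) = [x] : foldr (lzw (++)) [] (map levels ts)
```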

lzw takes the place of <*> here, but the overall issue is the same: we’re zipping at every point, making the whole thing quadratic.

However, from the above function we can derive a linear time enumeration. It looks like this:
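A hedged sketch of that linear-time version: instead of zipping finished level lists, each subtree pushes its elements onto the levels already built by the subtrees to its right:

```haskell
import Data.Tree (Tree (..))

-- Linear-time level enumeration: thread the remaining levels through
-- the fold rather than zipping whole lists at each node.
levels :: Tree a -> [[a]]
levels t = f t []
  where
    f (Node x ts) (q:qs) = (x : q) : foldr f qs ts
    f (Node x ts) []     = [x]     : foldr f [] ts
```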

Our objective is clear, then: try to derive the linear-time implementation of bft from the quadratic, in a way analogous to the above two functions. This is actually relatively straightforward once the target is clear: the rest of this post is devoted to the derivation.


First, we start off with the original bft.

Inline traverse.

Factor out go''.

Inline go' (and rename go'' to go')

Definition of liftA2

Definition of liftA2 (pattern-matching on ys)

Definition of <*>.

Fuse liftA2 with fmap


At this point, we actually hit a wall: the expression above is what makes the whole thing quadratic. We need to find a way to thread that liftA2 along with the fold to get it to linear. This is the only real trick in the derivation: I’ll use polymorphic recursion to avoid the extra zip.

And that’s it!

Avoiding Maps

We can finally write a slightly different version that avoids some unnecessary fmaps by basing Phases on liftA2 rather than <*>.


Easterly, Noah. 2019. “Functions and newtype wrappers for traversing Trees: Rampion/tree-traversals.”

Gibbons, Jeremy. 2015. “Breadth-First Traversal.” Patterns in Functional Programming.

Okasaki, Chris. 2000. “Breadth-first Numbering: Lessons from a Small Exercise in Algorithm Design.” In Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming, 131–136. ICFP ’00. New York, NY, USA: ACM. doi:10.1145/351240.351253.

by Donnacha Oisín Kidney at May 28, 2019 12:00 AM

May 27, 2019

Monday Morning Haskell

Smarter Enemies with BFS!


Last week we added enemies to our maze. These little squares will rove around the maze, and if they touch our character, we have to restart the maze. We made it so that these enemies moved around at random. Thus they're not particularly efficient at getting to us.

This week, we're going to make them much more dangerous! They'll use the breadth first search algorithm to find the shortest path towards our player. We'll use three kinds of data structures from the containers package. So if you want to get a little more familiar with that, this article is a great start! Take a look at our Github Repository to see the full code! Look at the part-6 branch for this article!

We'll also make use of the state monad throughout. If you're still a little uncomfortable with monads, make sure to read our series on them! It'll help you with the basics. By the end you'll know about the state monad and how to use it in conjunction with other monads! If you're new to Haskell, you should also take a look at our Beginners Checklist!

BFS Overview

The goal of our breadth first search will be to return the fastest path from one location to another. We'll be writing this function:

getShortestPath :: Maze -> Location -> Location -> [Location]

It will return all the locations on the path from the initial location to the target location. If there's no possible path, we'll return the empty list. In practice, we'll usually only want to take the first element of this list. But there are use cases for having the whole path that we'll explore later. Here's a basic outline of our algorithm:

  1. Keep a queue of locations that we'll visit in the future. At the start, this should contain our starting location.
  2. Dequeue the first location (if the queue is empty, return the empty list). Mark this location as visited. If it is our target location, skip to step 5.
  3. Find all adjacent locations that we haven't visited/enqueued yet. Put them into the search queue. Mark the dequeued location as the "parent" location for each of these new locations.
  4. Continue dequeuing elements and inserting their unvisited neighbors. Stop when we dequeue the target location.
  5. Once we have the target location, use the "parents" map to create the full path from start to finish.

Data Structures Galore

Now let's start getting into the details. As we'll see, there are several different data structures we'll need for this! We'll do some of the same things we did for depth first search (the first time around). We'll make a type to represent our current algorithm state. Then we'll make a recursive, stateful function over that type. In this case, we'll want three items in our search state.

  1. A set of "visited" cells
  2. A queue for cells we are waiting to visit
  3. A mapping of cells to their "parent"

And for all three of these, we'll want different structures. Data.Set will suffice for our visited cells. Then we'll want Data.Map for the parent map. For the search queue though, we'll use something that we haven't used on this blog before: Data.Sequence. This structure allows us to add to the back and remove from the front quickly. Here's our search state type:

data BFSState = BFSState
  { bfsSearchQueue :: Seq.Seq Location
  , bfsVisitedLocations :: Set.Set Location
  , bfsParents :: Map.Map Location Location
  }

Before we get carried away with our search function, let's fill in our wrapper function. This will initialize the state with the starting location. Then it will call evalState to get the result:

getShortestPath :: Maze -> Location -> Location -> [Location]
getShortestPath maze initialLocation targetLocation = evalState
  (bfs maze initialLocation targetLocation)
  (BFSState
    (Seq.singleton initialLocation)
    (Set.singleton initialLocation)
    Map.empty)

bfs :: Maze -> Location -> Location -> State BFSState [Location]
bfs = ...

As with depth first search, we'll start by retrieving the current state. Then we'll ask if the search queue is empty. If it is, this means we've exhausted all possibilities, and should return the empty list. This indicates no path is possible:

bfs :: Maze -> Location -> Location -> State BFSState [Location]
bfs maze initialLocation targetLocation = do
  BFSState searchQueue visitedSet parentsMap <- get
  if Seq.null searchQueue
    then return []
    else do

Now let's consider the first element in our queue. If it's our target location, we're done. We'll write the exact helper for this part later. But first let's get into the meat of the algorithm:

bfs maze initialLocation targetLocation = do
  BFSState searchQueue visitedSet parentsMap <- get
  if Seq.null searchQueue
    then return []
    else do
      let nextLoc = Seq.index searchQueue 0
      if nextLoc == targetLocation
        then … -- Get results
        else do

Now our code will actually look imperative, to match the algorithm description above:

  1. Get adjacent cells and filter based on those we haven't visited
  2. Insert the current cell into the visited set
  3. Insert the new cells at the end of the search queue, but drop the current (first) element from the queue as well.
  4. Mark the current cell as the "parent" for each of these new cells. The new cell should be the "key", the current should be the value.

There's a couple tricky folds involved here, but nothing too bad. Here's what it looks like:

bfs :: Maze -> Location -> Location -> State BFSState [Location]
bfs maze initialLocation targetLocation = do
  BFSState searchQueue visitedSet parentsMap <- get
  if Seq.null searchQueue
    then return []
    else do
      let nextLoc = Seq.index searchQueue 0
      if nextLoc == targetLocation
        then ...
        else do
              -- Step 1 (Find next locations)
          let adjacentCells = getAdjacentLocations maze nextLoc
              unvisitedNextCells = filter
                (\loc -> not (Set.member loc visitedSet))
                adjacentCells

              -- Step 2 (Mark as visited)
              newVisitedSet = Set.insert nextLoc visitedSet

              -- Step 3 (Enqueue new elements)
              newSearchQueue = foldr
                (flip (Seq.|>))
                -- (Notice we remove the first element!)
                (Seq.drop 1 searchQueue)
                unvisitedNextCells

              -- Step 4 (Record parents)
              newParentsMap = foldr
                (\loc -> Map.insert loc nextLoc)
                parentsMap
                unvisitedNextCells

Then once we're done, we'll insert these new elements into our search state. Then we'll make a recursive call to bfs to continue the process!

bfs :: Maze -> Location -> Location -> State BFSState [Location]
bfs maze initialLocation targetLocation = do
  BFSState searchQueue visitedSet parentsMap <- get
  if Seq.null searchQueue
    then return []
    else do
      let nextLoc = Seq.index searchQueue 0
      if nextLoc == targetLocation
        then ...
        else do
              -- Step 1
          let adjacentCells = getAdjacentLocations maze nextLoc
              unvisitedNextCells = filter
                (\loc -> not (Set.member loc visitedSet))
                adjacentCells
              -- Step 2
              newVisitedSet = Set.insert nextLoc visitedSet
              -- Step 3
              newSearchQueue = foldr
                (flip (Seq.|>))
                -- (Notice we remove the first element!)
                (Seq.drop 1 searchQueue)
                unvisitedNextCells
              -- Step 4
              newParentsMap = foldr
                (\loc -> Map.insert loc nextLoc)
                parentsMap
                unvisitedNextCells

          -- Replace the state and make the recursive call!
          put (BFSState newSearchQueue newVisitedSet newParentsMap)
          bfs maze initialLocation targetLocation

For the last part of this, we need to consider what happens when we hit our target. In this case, we'll "unwind" the path using the parents map. We'll start with the target location in our path list. Then we'll look up its parent, and add it to the front of the list. Then we'll look up the parent's parent. And so on. We do this with recursion (of course).

bfs :: Maze -> Location -> Location -> State BFSState [Location]
bfs maze initialLocation targetLocation = do
  BFSState searchQueue visitedSet parentsMap <- get
  if Seq.null searchQueue
    then return []
    else do
      let nextLoc = Seq.index searchQueue 0
      if nextLoc == targetLocation
        then return (unwindPath parentsMap [targetLocation])
        else do
          ...
  where
    unwindPath parentsMap currentPath =
      case Map.lookup (head currentPath) parentsMap of
        Nothing -> tail currentPath
        Just parent -> unwindPath parentsMap (parent : currentPath)

The only cell we should find without a parent is the initial cell. So when we hit this case, we return the tail of the current path (removing the initial cell from it). And that's all!
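
To make the unwinding concrete, here's a self-contained sketch running unwindPath on a toy parents map (the (Int, Int) locations and the three-cell corridor are made up for illustration):

```haskell
import qualified Data.Map as Map

type Location = (Int, Int)

-- Same definition as in bfs above, pulled out to run standalone.
unwindPath :: Map.Map Location Location -> [Location] -> [Location]
unwindPath parentsMap currentPath =
  case Map.lookup (head currentPath) parentsMap of
    Nothing -> tail currentPath
    Just parent -> unwindPath parentsMap (parent : currentPath)

-- Suppose BFS walked a corridor (0,0) -> (1,0) -> (2,0), recording
-- each new cell's parent along the way:
corridorParents :: Map.Map Location Location
corridorParents = Map.fromList [((1,0), (0,0)), ((2,0), (1,0))]

-- Unwinding from the target builds [(0,0),(1,0),(2,0)], then the
-- Nothing case drops the initial cell:
-- λ> unwindPath corridorParents [(2,0)]
-- [(1,0),(2,0)]
```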

Modifying the Game

All we have to do to wrap things up is call this function instead of our random function for the enemy movements. We'll keep things a little fresh by having them make a random move about 20% of the time. (We'll make this a tunable parameter in the future). Here's the bit where we keep some randomness, like what we have now:

updateEnemy :: Maze -> Location -> Enemy -> State StdGen Enemy
updateEnemy maze playerLocation e@(Enemy location) =
  if (null potentialLocs)
    then return e
    else do
      gen <- get
      let (randomMoveRoll, gen') = randomR (1 :: Int, 5) gen
      let (newLocation, newGen) = if randomMoveRoll == 1
            then
              let (randomIndex, gen'') =
                    randomR (0, (length potentialLocs) - 1) gen'
              in  (potentialLocs !! randomIndex, gen'')
            else ...
      ...
  where
    potentialLocs = getAdjacentLocations maze location

And in the rest of the cases, we'll call our getShortestPath function!

updateEnemy :: Maze -> Location -> Enemy -> State StdGen Enemy
updateEnemy maze playerLocation e@(Enemy location) =
  if (null potentialLocs)
    then return e
    else do
      gen <- get
      let (randomMoveRoll, gen') = randomR (1 :: Int, 5) gen
      let (newLocation, newGen) = if randomMoveRoll == 1
            then
              let (randomIndex, gen'') =
                    randomR (0, (length potentialLocs) - 1) gen'
              in  (potentialLocs !! randomIndex, gen'')
            else
              let shortestPath =
                    getShortestPath maze location playerLocation
              in  (if null shortestPath then location
                     else head shortestPath, gen')
      put newGen
      return (Enemy newLocation)
  where
    potentialLocs = getAdjacentLocations maze location

And now the enemies will chase us around! They're hard to avoid!


With our enemies now being more intelligent, we'll want to allow our player to fight back against them! Next week, we'll create a mechanism to stun the ghosts to give ourselves a better chance! After that, we'll look at some other ways to power up our player!

If you've never programmed in Haskell, hopefully this series is giving you some good ideas of the possibilities! We have a lot of resources for beginners! Check out our Beginners Checklist as well as our Liftoff Series!

by James Bowen at May 27, 2019 02:30 PM

Tweag I/O

Format Haskell code like never before

Mark Karpov

If we think of the average Haskeller, who do we see? A poor, broken person. Tired head falls into the open palms, sobbing, deprived of the basic human right for automated source code formatting.

Is it at all conceivable that so many attempts were made and none quite succeeded? The design space is surprisingly large. Perhaps the sweet spot for large projects with several contributors hasn't been found yet.

I'd like to announce a new project called Ormolu. It's still vaporware, but that's just a bug we're a long way into fixing, and I want to convince you that the principles are sound.


Ormolu is a formatter that follows a few simple ideas that make it quite unlike other similar projects in the Haskell land. I'm going to explain them now.

What are code formatters good for? Normalizing what does not impact readability and therefore need not be under the programmer's control. But use of whitespace does impact readability, and therefore should be under at least partial control of the programmer. In other words, there is no gain in tolerating 5 different type signature styles, when there is gain in letting the programmer decide whether some if-then-else should be single line or multiline.

In Ormolu, the layout of the input influences the layout of the output. This means that the choices between single line/multiline layouts in each particular situation are made by the author of the original source code, not by an algorithm. While giving more precise control to the user, as a bonus we also get a simpler and faster implementation.

Both Hindent and Brittany try to make their own decisions about use of whitespace. While Hindent's decisions are simpler than Brittany's, great care needs to be taken in the implementation to avoid exponential blowups when formatting deeply nested expressions. On the other hand Brittany tries hard to avoid those, at the cost of more complex data structures.

Code formatters are also good to take away some of the tedium of writing code in the first place. With Ormolu, if you decide that a particular case-expression or type signature should be multiline, you don't have to painstakingly write out each line properly indented. Just introduce a line break anywhere at least once and Ormolu will do the rest.

Let's see an example of Ormolu's approach. The input:

-- | Foo performs foo and sometimes bar.
foo :: Thoroughness
  -> Int -> Int
foo t x = if x > 20
    then case t of
           Thorough -> x + 50
           Somewhat -> x + 20
           NotAtAll -> 0
    else 10 + 1

Results in the following formatted code:

-- | Foo performs foo and sometimes bar.
foo
  :: Thoroughness
  -> Int
  -> Int
foo t x =
  if x > 20
  then
    case t of
      Thorough -> x + 50
      Somewhat -> x + 20
      NotAtAll -> 0
  else 10 + 1

The fact that the signature of foo occupies two lines in the original source code causes the multiline version of the type signature to be used in the formatted version. The same principle applies to the body of foo. Note the difference between formatting of then and else clauses: then is multiline and else is single line.

Other features that are worth mentioning:

  • Idempotency: formatting already formatted code is a no-op. This is an important property for any code formatter to have, which still holds under our multiline-in-multiline-out policy, even if the formatting for a given parse tree is not unique.

  • The project aims to implement one “true” formatting style which admits no configuration. Similarly to what's described in the blog post about Hindent 5, we concluded that if formatting is done automatically, it's better to embrace one style and avoid stylistic fragmentation. This way everyone who uses Ormolu will be automatically on the same page.

  • The formatting style aims to result in minimal diffs while still remaining close to conventional Haskell formatting. Certain formatting practices, like vertically aligning the bodies of let-bindings or allowing the length of a type or variable name to influence indentation level lead to diff amplification. Therefore, we try to avoid that.
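
The idempotency property above is simple enough to state as a QuickCheck-style law. (formatSource here is a hypothetical pure wrapper around Ormolu's formatting entry point, used only to express the property; it is not the project's actual API.)

```haskell
-- Hypothetical stand-in for Ormolu's formatting function.
formatSource :: String -> String
formatSource = undefined  -- assumed: parse with GHC's parser, then render

-- Idempotency: formatting already formatted code is a no-op.
prop_idempotent :: String -> Bool
prop_idempotent src =
  formatSource (formatSource src) == formatSource src
```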

Why Ormolu?

There are a few solutions for formatting Haskell source code, why would this project be more successful?

  • Ormolu uses GHC's own parser to avoid parsing problems caused by haskell-src-exts. Many similar projects suffer from the fact that they don't use the same parser that GHC does. Like Brittany, we are using the parser from the ghc package and therefore work with the same AST that GHC uses.

  • The code of the formatter is written so that it's easy to modify and maintain. Roughly, it means that the project follows the path of Hindent and is very much about printing the AST in a particular way. So far I think the goal is met and the code base is hacking-friendly.

  • There is a good testing scheme in place that allows us to grow the collection of examples easily. This will keep the project well-tested and robust to the point that it can be used in large projects without exposing unfortunate, disappointing bugs here and there.

  • It is an open project that anyone is free to fork and it is actively maintained by a large commercial contributor, that is, Tweag. This makes the odds very high that it'll be maintained in the future and bugs will be fixed.

Want it sooner? You can help!

Right now some parts of the AST are implemented fully, such as data type definitions, module export lists, and a few others. Most importantly, handling of comments is dealt with implicitly by the rendering combinators, allowing us to focus on rendering the AST nodes only.

But the GHC AST is huge. This is why contributions are welcome. The printing framework and the approach to testing that we use makes it very easy to implement the rendering of the missing parts of AST. So right now it's just a matter of time before we have a fully featured formatter for Haskell code.

Let's do it iteratively: spend an evening and implement rendering of a little bit of Haskell syntax, throw in a few files in the test suite and boom, we're a bit closer. One can pick up something really simple, such as e.g. role annotations. It takes 1 hour or so, but a whole new type of declarations will be supported! Lots of fun.

What is more, we're taking Ormolu to ZuriHac, where everyone will be able to help developing the project. So come and contribute to a tool that you'll be able to use proudly at your daily job. Or indeed, in the intimacy of a late evening. As a little secret. Just between you and your source code.

May 27, 2019 12:00 AM

May 25, 2019

Sandy Maguire

Faking Fundeps with Typechecker Plugins

The approach here, and my original implementation are both lifted almost entirely from Luka Horvat's plugin for simple-effects. All praise should be directed to him.

Last time we chatted about using a GHC plugin to run custom Core-to-Core transformations on the programs that GHC is compiling. Doing so allows us to add custom optimization passes, and even other, more exotic things like rewriting lambda expressions as categorical operations.

Today I want to talk about another sort of GHC plugin: type-checker plugins! TC plugins let you hook into GHC's constraint machinery and help it solve domain-specific problems that it wouldn't be able to otherwise. One of the more interesting examples of a TC plugin is nomeata's ghc-justdoit --- which will automatically generate a value of the correct type, essentially letting you leave implementations as "exercises for the compiler."

Polysemy uses a TC plugin in order to improve type-inference. The result is that it can provide type-inference that is as good as mtl's, without succumbing to the pitfalls that accompany mtl's approach.

The Problem

Consider the following program:

foo :: MonadState Int m => m ()
foo = modify (+ 1)

Such a thing compiles and runs no problem. There are no surprises here for any Haskell programmers who have ever run into mtl. But the reason it works is actually quite subtle. If we look at the type of modify we see:

modify :: MonadState s m => (s -> s) -> m ()

which suggests that the s -> s function we pass to it should determine the s parameter. But our function (+ 1) has type Num a => a -> a, therefore the type of modify (+1) should be this:

modify (+ 1) :: (MonadState s m, Num s) => m ()

So the question is, why the heck is GHC willing to use a MonadState Int m constraint to solve the wanted (MonadState s m, Num s) constraint arising from a use of modify (+1)? The problem feels analogous to this one, which doesn't work:

bar :: Show Bool => a -> String
bar b = show b  -- doesn't work

Just because we have a Show Bool constraint in scope doesn't mean that a is a Bool! So how come we're allowed to use our MonadState Int m constraint, to solve a (MonadState s m, Num s)? Completely analogously, we don't know that s is an Int!

The solution to this puzzler is in the definition of MonadState:

class Monad m => MonadState s (m :: * -> *) | m -> s where

Notice this | m -> s bit, which is known as a functional dependency or a fundep for short. The fundep says "if you know m, you also know s," or equivalently, "s is completely determined by m." And so, when typechecking foo, GHC is asked to solve both MonadState Int m and (Num s, MonadState s m). But since there can only be a single instance of MonadState for m, this means that MonadState Int m and MonadState s m must be the same. Therefore s ~ Int.
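
You can watch the fundep do this work with mtl directly (the starting value 41 here is arbitrary):

```haskell
import Control.Monad.State

-- The given MonadState Int m determines s via the fundep m -> s,
-- so the (Num s) constraint arising from (+ 1) is solved at s ~ Int.
foo :: MonadState Int m => m ()
foo = modify (+ 1)

main :: IO ()
main = print (execState foo 41)  -- prints 42
```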

This is an elegant solution, but it comes at a cost --- namely that we're only allowed to use a single MonadState at a time! If you're a longtime Haskell programmer, this probably doesn't feel like a limitation to you; just stick all the pieces of state you want into a single type, and then use some classy fields to access them, right? Matt Parsons has a blog post on the pain points, and some bandages, for doing this with typed errors. At the end of the day, the real problem is that we're only allowed a single MonadError constraint.

Polysemy "fixes the glitch" by just not using fundeps. This means you're completely free to use as many state, error, and whatever effects you want all at the same time. The downside? Type-inference sucks again. Indeed, the equivalent program to foo in polysemy doesn't compile by default:

foo' :: Member (State Int) r => Sem r ()
foo' = modify (+ 1)
• Ambiguous use of effect 'State'
  Possible fix:
    add (Member (State s0) r) to the context of
      the type signature
  If you already have the constraint you want, instead
    add a type application to specify
      's0' directly, or activate polysemy-plugin which
        can usually infer the type correctly.
• In the expression: modify (+ 1)
  In an equation for ‘foo'’: foo' = modify (+ 1)

This situation blows chunks. It's obvious what this program should do, so let's just fix it.

The Solution

Let's forget about the compiler for a second and ask ourselves how the Human Brain Typechecker(TM) would type-check this problem. Given the program:

foo' :: Member (State Int) r => Sem r ()
foo' = modify (+ 1)

A human would look at the modify here, and probably run an algorithm similar to this:

  • Okay, what State is modify running over here?
  • Well, it's some sort of Num.
  • Oh, look, there's a Member (State Int) r constraint in scope.
  • That thing wouldn't be there if it wasn't necessary.
  • I guess modify is running over State Int.

Pretty great algorithm! Instead, here's what GHC does:

  • Okay, what State is modify running over here?
  • Well, it's some sort of Num.
  • But that thing is polymorphic.
  • Guess I'll emit a (Num n, Member (State n) r) constraint.
  • Why did the stupid human put an unnecessary Member (State Int) r constraint here?
  • What an idiot!

And then worse, it won't compile because the generated n type is now ambiguous and not mentioned anywhere in the type signature!

Instead, let's use a TC plugin to make GHC reason more like a human when it comes to Member constraints. In particular, we're going to mock the fundep lookup algorithm:

  • Whenever GHC is trying to solve a Member (effect a) r constraint
  • And there is exactly one constraint in scope of the form Member (effect b) r
  • Then emit an a ~ b constraint, allowing GHC to use the given Member (effect b) r constraint to solve the wanted Member (effect a) r

TC Plugins

At its heart, a TC plugin is a value of type TcPlugin, a record of three methods:

data TcPlugin = forall s. TcPlugin
  { tcPluginInit  :: TcPluginM s
  , tcPluginSolve :: s -> [Ct] -> [Ct] -> [Ct] -> TcPluginM TcPluginResult
  , tcPluginStop  :: s -> TcPluginM ()
  }

The tcPluginInit field can be used to allocate a piece of state that is passed to the other two functions, and tcPluginStop finalizes that state. Most plugins I've seen use the s parameter to look up the GHC representation of classes that they want to help solve. However, the most interesting bit is the tcPluginSolve function.

tcPluginSolve takes three lists of Cts, which are different varieties of constraints relevant to the problem.

  1. The first list is the given constraints --- the ones a user has explicitly written out in a type signature.
  2. The second list is the derived constraints --- things GHC has inferred from context.
  3. The third list is the wanted constraints --- the ones that GHC can't solve on its own.

From these three lists, we are expected to provide a TcPluginResult, which for our purposes is a pair of new Cts we'd like GHC to solve; and a list of the Cts we solved, along with the corresponding dictionaries. Returning two empty lists here signals to GHC "I can't do any more work!"
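
As a minimal sketch of the shape involved, here is a solver that always gives up (using the types described above; it is not part of any real plugin):

```haskell
-- A solver that does no work: it solves no constraints and emits no
-- new ones, signaling to GHC "I can't do any more work!"
noOpSolve :: s -> [Ct] -> [Ct] -> [Ct] -> TcPluginM TcPluginResult
noOpSolve _state _givens _deriveds _wanteds = pure (TcPluginOk [] [])
```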

So let's get to work. The first thing we need to do is get our hands on the Member class we want to solve. In polysemy, Member is actually just a type synonym for a few other typeclasses; so the real typeclass we'd like to solve for is called Find.

As a brief aside on the Find class, its definition is this:

class Find (r :: [k]) (t :: k) where

and it means "lookup the index of t inside r". In Polysemy, r is usually left polymorphic, for the same reasons that we leave the m polymorphic in MonadState s m.

Anyway, we want to find the Find class. We can do this by writing a function for our tcPluginInit function:

findFindClass :: TcPluginM Class
findFindClass = do
  md <- lookupModule
          (mkModuleName "Polysemy.Internal.Union")
          (fsLit "polysemy")
  find_tc <- lookupName md $ mkTcOcc "Find"
  tcLookupClass find_tc

We first lookup the defining module, here Polysemy.Internal.Union in package polysemy. We then lookup the Find name in that module, and then lookup the class with that name. By setting findFindClass as our tcPluginInit, our tcPluginSolve function will receive the Find class as a parameter.

Before diving into tcPluginSolve, we're going to need some helper functions.

allFindCts :: Class -> [Ct] -> [(CtLoc, (Type, Type, Type))]
allFindCts cls cts = do
  ct <- cts
  CDictCan { cc_tyargs = [ _, r, eff ] } <- pure ct
  guard $ cls == cc_class ct
  let eff_name = getEffName eff
  pure (ctLoc ct, (eff_name, eff, r))

getEffName :: Type -> Type
getEffName t = fst $ splitAppTys t

The allFindCts function searches through the Cts for Find constraints, and unpacks the pieces we're going to need. We first pattern match on whether the Ct is a CDictCan, which corresponds to everyday typeclass-y constraints. We ensure it has exactly three type args (Find takes a kind, and then the two parameters we care about), and ensure that this class is the cls we're looking for.

We return four things for each matching Ct:

  1. We need to keep track of its CtLoc --- corresponding to where the constraint came from. This is necessary to keep around so GHC can give good error messages if things go wrong.
  2. The effect "name". This is just the head of the effect, in our ongoing example, it's State.
  3. The actual effect we're looking for. This corresponds to the t parameter in a Find constraint. In the ongoing example, State s.
  4. The effect stack we're searching in (r in the Find constraint).

So remember, our idea is "see if there is exactly one matching given Find constraint for any wanted Find constraint --- and if so, unify the two."

findMatchingEffect
    :: (Type, Type, Type)
    -> [(Type, Type, Type)]
    -> Maybe Type
findMatchingEffect (eff_name, _, r) ts =
  singleListToJust $ do
    (eff_name', eff', r') <- ts
    guard $ eqType eff_name eff_name'
    guard $ eqType r r'
    pure eff'

singleListToJust :: [a] -> Maybe a
singleListToJust [a] = Just a
singleListToJust _ = Nothing

findMatchingEffect takes the output of allFindCts for a single wanted constraint, and all of the given constraints, and sees if there's a single match between the two. If so, it returns the matching effect.

We need one last helper before we're ready to put everything together. We wanted to be able to generate new wanted constraints of the form a ~ b. Emitting such a thing as a new wanted constraint will cause GHC to unify a and b; which is exactly what we'd like in order to convince it to use one given constraint in place of another.

mkWanted :: CtLoc -> Type -> Type -> TcPluginM (Maybe Ct)
mkWanted loc eff eff' = do
  if eqType (getEffName eff) (getEffName eff')
     then do
       (ev, _) <- unsafeTcPluginTcM
                . runTcSDeriveds
                $ newWantedEq loc Nominal eff eff'
       pure . Just $ CNonCanonical ev
     else pure Nothing

What's going on here? Well we check if the two effects we want to unify have the same effect name. Then if so, we use the wanted's CtLoc to generate a new, derived wanted constraint of the form eff ~ eff'. In essence, we're promising the compiler that it can solve the wanted if it can solve eff ~ eff'.

And finally we're ready to roll.

solveFundep :: Class -> [Ct] -> [Ct] -> [Ct] -> TcPluginM TcPluginResult
solveFundep find_cls giv _ want = do
    let wanted_effs = allFindCts find_cls want
        given_effs  = fmap snd $ allFindCts find_cls giv

    eqs <- forM wanted_effs $ \(loc, e@(_, eff, r)) ->
      case findMatchingEffect e given_effs of
        Just eff' -> mkWanted loc eff eff'
        Nothing -> do
          case splitAppTys r of
            (_, [_, eff', _]) -> mkWanted loc eff eff'
            _                 -> pure Nothing

    pure . TcPluginOk [] $ catMaybes eqs

We get all of the Find constraints in the givens and the wanteds. Then, for each wanted, we see if there is a singularly matching given, and if so, generate a wanted constraint unifying the two.

However, if we don't find a singularly matching effect, we're not necessarily in hot water. We attempt to decompose r into a type constructor and its arguments. Since r has kind [k], there are three possibilities here:

  1. r is a polymorphic type variable, in which case we can do nothing.
  2. r is '[], so we have no effects to possibly unify, and so we can do nothing.
  3. r has form e ': es, in which case we attempt to unify e with the wanted.

What's going on with this? Why is this bit necessary? Well, consider the case where we want to run our effect stack. Let's say we have this program:

foo' :: Member (State Int) r => Sem r ()
foo' = modify (+ 1)

main :: IO ()
main = do
  result <- runM . runState 5 $ foo'
  print result

The type of runM . runState 5 is Num a => Sem '[State a, Lift IO] x -> IO x. But foo' still wants a State Int constraint, however, main doesn't have any givens! Instead, the wanted we see is of the form Find '[State a, Lift IO] (State Int), and so we're justified in our logic above to unify State Int with the head of the list.

Finally we can bundle everything up:

plugin :: Plugin
plugin = defaultPlugin
    { tcPlugin = const $ Just fundepPlugin
    }

fundepPlugin :: TcPlugin
fundepPlugin = TcPlugin
    { tcPluginInit = findFindClass
    , tcPluginSolve = solveFundep
    , tcPluginStop = const $ pure ()
    }

and voila, upon loading our module via the -fplugin flag, GHC will automatically start solving Member constraints as though they were fundeps!

This isn't the whole story; there are still a few kinks in the implementation for when your given is more polymorphic than your wanted (in which case they shouldn't unify), but this is enough to get a feeling for the idea. As always, the full source code is on Github.

As we've seen, TC plugins are extraordinarily powerful for helping GHC solve domain-specific problems, and simultaneously quite easy to write. They're not often the right solution, but they're a great thing to keep in your tool belt!

May 25, 2019 02:17 AM

May 22, 2019

Philip Wadler

Please vote!

If you are a UK citizen, please remember to vote in the EU elections tomorrow. Latest poll results for Scotland from the Herald are above (and linked). I will vote Green.

Farage's Brexit Party is doing distressingly well. If you are considering sitting this one out, I remind you of the words of Robert Heinlein:
If you are part of a society that votes, then do so. There may be no candidates and no measures you want to vote for ... but there are certain to be ones you want to vote against. [from Time Enough for Love]

by Philip Wadler at May 22, 2019 08:01 PM

PLFA in Padova

Next week, I will teach six hours of classes based on PLFA, and deliver a departmental seminar on the same topic. Maybe I will see you there!

Seminar: 16.00-17.00 Tue 28 May (details)

Classes: 12.30-15.30 Wed-Thu 29-30 May (details)

by Philip Wadler at May 22, 2019 07:17 PM

Brent Yorgey

Competitive Programming in Haskell: Scanner

In my previous post I explored solving a simple competitive programming problem in Haskell. The input of the problem just consisted of a bunch of lines containing specific data, so that we could parse it using lines and words. There is another common class of problems, however, which follow this pattern:

The first line of the input consists of an integer T. Each of the next T lines consists of…

That is, the input contains integers which are not input data per se but just tell you how many things are to follow. This is really easy to process in an imperative language like Java or C++. For example, in Java we might write code like this:

Scanner in = new Scanner(System.in);
int T = in.nextInt();
for (int i = 0; i < T; i++) {
   // process each line
}

Occasionally, we can get away with completely ignoring the extra information in Haskell. For example, if the input consists of a number T followed by T lines, each of which contains a number n followed by a list of n numbers, we can just write

main = interact $
  lines >>> drop 1 >>> map (words >>> drop 1 >>> map read) >>> ...

That is, we can ignore the first line containing T since the end-of-file will tell us how many lines there are; and we can ignore the n at the beginning of each line, since the newline character tells us when the list on that line is done.
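
For instance, on a small made-up input, dropping the T line and each leading n still recovers all the data:

```haskell
import Control.Arrow ((>>>))

-- Input (hypothetical):
--   2
--   3 1 2 3
--   2 10 20
parseLists :: String -> [[Int]]
parseLists = lines >>> drop 1 >>> map (words >>> drop 1 >>> map read)

-- λ> parseLists "2\n3 1 2 3\n2 10 20"
-- [[1,2,3],[10,20]]
```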

Sometimes, however, this isn’t possible, especially when there are multiple test cases, or when a single test case has multiple parts, each of which can have a variable length. For example, consider Popular Vote, which describes its input as follows:

The first line of input contains a single positive integer T ≤ 500 indicating the number of test cases. The first line of each test case also contains a single positive integer n indicating the number of candidates in the election. This is followed by n lines, with the ith line containing a single nonnegative integer indicating the number of votes candidate i received.

How would we parse this? We could still ignore T—just keep reading until the end of the file—but there’s no way we can ignore the n values. Since the values for each test case are all on separate lines instead of on one line, there’s otherwise no way to know when one test case ends and the next begins.

Once upon a time, I would have done this using splitAt and explicit recursion, like so:

type Election = [Int]

readInput :: String -> [Election]
readInput = lines >>> drop 1 {- ignore T -} >>> map read >>> go
  where
    go :: [Int] -> [Election]
    go []     = []
    go (n:xs) = votes : go rest
      where (votes,rest) = splitAt n xs

However, this is really annoying to write and easy to get wrong. There are way too many variable names to keep track of (n, xs, votes, rest, go) and for more complex inputs it becomes simply unmanageable. You might think we should switch to using a real parser combinator library—parsec is indeed installed in the environment Kattis uses to run Haskell solutions—and although sometimes a full-blown parser combinator library is needed, in this case it’s quite a bit more heavyweight than we would like. I can never remember which modules I have to import to get parsec set up; there’s a bunch of boilerplate needed to set up a lexer; and so on. Using parsec is only worth it if we’re parsing something really complex.


The heart of the issue is that we want to be able to specify a high-level description of the sequence of things we expect to see in the input, without worrying about managing the stream of tokens explicitly. Another key insight is that 99% of the time, we don’t need the ability to deal with parse failure or the ability to parse multiple alternatives. With these insights in mind, we can create a very simple Scanner abstraction, which is just a Stateful computation over a list of tokens:

type Scanner = State [String]

runScanner :: Scanner a -> String -> a
runScanner s = evalState s . words

To run a scanner, we just feed it the entire input as a String, which gets chopped into tokens using words. (Of course in some scenarios we might want to use lines instead of words, or even do more complex tokenization.)

Note since Scanner is just a type synonym for State [String], it is automatically an instance of Functor, Applicative, and Monad (but not Alternative).

So let’s develop a little Scanner DSL. The most fundamental thing we can do is read the next token.

str :: Scanner String
str = get >>= \case { s:ss -> put ss >> return s }

(This uses the LambdaCase extension, though we could easily rewrite it without.) str gets the current list of tokens, puts it back without the first token, and returns the first token. Note that I purposely didn’t include a case for the empty list. You might think we want to include a case for the empty token list and have it return the empty string or something like that. But since the input will always be properly formatted, if this scenario ever happens it means my program has a bug—e.g. perhaps I misunderstood the description of the input format. In this scenario I want it to crash loudly, as soon as possible, rather than continuing on with some bogus data.

We can now add some scanners for reading specific token types other than String, simply by mapping the read function over the output of str:

int :: Scanner Int
int = read <$> str

integer :: Scanner Integer
integer = read <$> str

double :: Scanner Double
double = read <$> str

Again, these will crash if they see a token in an unexpected format, and that is a very deliberate choice.

Now, as I explained earlier, a very common pattern is to have an integer n followed by n copies of something. So let’s make a combinator to encapsulate that pattern:

numberOf :: Scanner a -> Scanner [a]
numberOf s = int >>= flip replicateM s

numberOf s expects to first see an Int value n, and then it runs the provided scanner n times, returning a list of the results.
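
Putting the pieces so far together, here is a self-contained demo of nested numberOf calls on a hand-written token string (the input values are made up):

```haskell
import Control.Monad (replicateM)
import Control.Monad.State

type Scanner = State [String]

runScanner :: Scanner a -> String -> a
runScanner s = evalState s . words

str :: Scanner String
str = get >>= \(s:ss) -> put ss >> return s

int :: Scanner Int
int = read <$> str

numberOf :: Scanner a -> Scanner [a]
numberOf s = int >>= flip replicateM s

-- Two "elections": one with 3 vote counts, one with 2.
-- λ> runScanner (numberOf (numberOf int)) "2 3 1 2 3 2 10 20"
-- [[1,2,3],[10,20]]
```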

It’s also sometimes useful to have a way to repeat a Scanner some unknown number of times until encountering EOF (for example, the input for some problems doesn’t specify the number of test cases up front the way that Popular Vote does). This is similar to the many combinator from Alternative.

many :: Scanner a -> Scanner [a]
many s = get >>= \case { [] -> return []; _ -> (:) <$> s <*> many s }

many s repeats the scanner s as many times as it can, returning a list of the results. In particular it first peeks at the current token list to see if it is empty. If so, it returns the empty list of results; if there are more tokens, it runs s once and then recursively calls many s, consing the results together.

Finally, it’s quite common to want to parse a specific small number of something, e.g. two double values representing a 2D coordinate pair. We could just write replicateM 2 double, but this is common enough that I find it helpful to define dedicated combinators with short names:

two, three, four :: Scanner a -> Scanner [a]
[two, three, four] = map replicateM [2..4]

The complete file can be found on GitHub. As I continue this series I’ll be putting more code into that repository. Note I do not intend to make this into a Hackage package, since that wouldn’t be useful: you can’t tell Kattis to go download a package from Hackage before running your submission. However, it is possible to submit multiple files at once, so you can include Scanner.hs in your submission and just import Scanner at the top of your main module.


So what have we gained? Writing the parser for Popular Vote is now almost trivial:

type Election = [Int]

main = interact $ runScanner elections >>> ...

elections :: Scanner [Election]
elections = numberOf (numberOf int)

In practice I would probably just inline the definition of elections directly: interact $ runScanner (numberOf (numberOf int)) >>> ...

As a slightly more involved example, chosen almost at random, consider Board Wrapping:

On the first line of input there is one integer, N \leq 50, giving the number of test cases (moulds) in the input. After this line, N test cases follow. Each test case starts with a line containing one integer n, 1 \leq n \leq 600, which is the number of boards in the mould. Then n lines follow, each with five floating point numbers x,y,w,h,v where 0 \leq x,y,w,h \leq 10000 and -90^{\circ} < v \leq 90^{\circ}. The x and y are the coordinates of the center of the board and w and h are the width and height of the board, respectively. v is the angle between the height axis of the board to the y-axis in degrees, positive clockwise.

Here’s how I would set up the input, using Scanner and a custom data type to represent boards.

import Scanner

type V = [Double]     -- 2D vectors/points
newtype A = A Double  -- angle (radians)
                      -- newtype helps avoid conversion errors

fromDeg :: Double -> A
fromDeg d = A (d * pi / 180)

data Board = Board { boardLoc :: V, boardDims :: V, boardAngle :: A }

board :: Scanner Board
board = Board
  <$> two double
  <*> two double
  <*> ((fromDeg . negate) <$> double)

main = interact $
  runScanner (numberOf (numberOf board)) >>> ...

by Brent at May 22, 2019 05:58 PM

Sandy Maguire

Writing Custom Optimization Passes

I've been paying a lot of attention to performance in polysemy. Getting it to be fast has been really hard. It's clearly possible, but for the longest time I was afraid I'd need to fork the compiler. And that didn't seem like a thing that would attract a large user base.

For example, polysemy benefits greatly from a late specialization pass, and would benefit further from aggressive inlining after the late specialization pass. Unfortunately, GHC doesn't do any inlining passes after -flate-specialise, so it feels like we're stuck on this front.

Thankfully, the eternally helpful mpickering pointed me at the GHC plugin interface, which has support for directing the optimizer to do things it wouldn't usually.

Today, I want to talk about how I made the polysemy-plugin run two optimizations that greatly benefit code written with polysemy.

The gist of writing a GHC plugin is to import ghc:Plugins, and to create an exported top-level bind plugin :: Plugin. Other code can use this plugin by specifying the -fplugin= option to point at this module.

Installing Core ToDos

Plugins have a field called installCoreToDos with type [CommandLineOption] -> [CoreToDo] -> CoreM [CoreToDo]. A CoreToDo is GHC's oddly-named concept of a compiler pass over Core. This function receives the list of CoreToDos it was planning to do, and you can change that list if you want.

By default there's a big flowchart of CoreToDos that the compiler will run through in order to compile a module. The optimization level (-O) affects which passes get run, as do many of the individual optimization flags.

By attaching our extra optimization passes to the end of this list, we can make GHC optimize harder than it usually would. But because most code won't benefit from this extra work, we guard the new optimization passes behind two conditions. The user must be compiling with optimizations turned on, and the module being compiled must import Polysemy.

Checking for the optimization level is simple enough, we can pull it out of the DynFlags (GHC's datatype that stores all of the crazy flags you might have set):

  dflags <- getDynFlags
  case optLevel dflags of
    0 -> -- corresponds to -O0
    1 -> -- corresponds to -O
    2 -> -- corresponds to -O2

Checking, however, for presence of the Polysemy module is less straightforward. Honestly I'm not sure what the "correct" solution to this problem is, but I'm pretty happy with the disgusting hack I came up with.

The CoreM monad (which is what you're running in when you install CoreToDos) doesn't exactly have stellar documentation. It has access to the HscEnv, which in turn has a hsc_mod_graph :: ModuleGraph --- which sounds like the sort of thing that might contain the modules currently in scope. Unfortunately this is not so; hsc_mod_graph contains the modules defined in the package being compiled.

If we could get our hands on the ModGuts (GHC's representation of a Haskell module), we could inspect its mg_deps :: Dependencies field, which would surely have what we need. Unfortunately, I couldn't find any easy way to get access to the ModGuts in a CoreM without jumping through several hoops.

But one thing caught my eye! There is an operation getVisibleOrphanMods :: CoreM ModuleSet, which after some investigation, turns out to contain any module in scope (directly or otherwise) that defines an orphan instance.

It's disgusting, but I made an internal module in polysemy that contains the following definitions:

module Polysemy.Internal.PluginLookup where

class PluginLookup t
data Plugin

and the corresponding orphan instance in the module I wanted to track in my plugin:

{-# OPTIONS_GHC -fno-warn-orphans #-}

import Polysemy.Internal.PluginLookup

instance PluginLookup Plugin

I know, I know. But because the module that defines these things is internal, there's no way for anyone else to define instances of this thing. So at least it's a safe use of orphans.

Sure enough, this little gem is enough to get my module noticed by getVisibleOrphanMods, and so I can check for the presence of my module via:

  mods <- moduleSetElts <$> getVisibleOrphanMods
  if any ((== mkModuleName "Polysemy.Internal") . moduleName) mods
     then ...

And voila, we're now ready to install our extra CoreToDos. In this case, I just cargo-culted a few from GHC's existing passes list. Namely I added a CoreDoSpecialising, a CoreDoStaticArgs, yet another CoreDoSpecialising, and a bevy of simplification passes. The result might be overkill, but it's sufficient to massage this scary core into this --- and get roughly a 1000x runtime performance improvement in the process.

Inlining Recursive Calls

But this lack of optimization passes wasn't the only thing slowing polysemy down. The library depends on several library- and user-written functions that are complicated and necessarily self-recursive.

GHC is understandably hesitant to inline recursive functions --- the result would diverge --- but as a side-effect, it seems to refuse to optimize big recursive functions whatsoever. For my purposes, this meant that most of the crucial machinery in the library was being completely ignored by GHC's best optimization pass.

I accidentally stumbled upon a fix. To illustrate, let's pretend like the factorial function is my complicated self-recursive function. The optimizer would refuse to fire when the function was written like this:

factorial :: Int -> Int
factorial 0 = 1
factorial n = n * factorial (n - 1)
{-# INLINE factorial #-}

But, a minor syntactic tweak was enough to trick the compiler into optimizing it:

factorial :: Int -> Int
factorial 0 = 1
factorial n = n * factorial' (n - 1)
{-# INLINE factorial #-}

factorial' :: Int -> Int
factorial' = factorial
{-# NOINLINE factorial' #-}

Now factorial is no longer self-recursive. It's mutually recursive, and for some reason, the NO/INLINE pragmas are enough to keep GHC off our back. This is an easy fix, but it's annoying boilerplate. And I hate annoying boilerplate.

Early versions of polysemy shipped with a function inlineRecursiveCalls :: Q [Dec] -> Q [Dec] which would use Template Haskell to transform our slow, self-recursive factorial into the fast, mutually-recursive version. While this worked, it was unsatisfactory; TH splices don't play nicely with haddock or with text editors.

But this isn't something that regular users should need to care about! Optimization concerns should be solely the responsibility of library authors --- not of their users. It seemed like a good opportunity to write a custom optimization pass, and like any curious boy, I took it.

We can use the CoreDoPluginPass :: String -> (ModGuts -> CoreM ModGuts) -> CoreToDo constructor to inject our own ModGuts transformation as an optimization pass. Recall that ModGuts is GHC's definition of a module. For our purposes, we're interested in its mg_binds field, which contains all of the value-level things in the module.

A mg_binds is a [Bind CoreBndr], and a Bind CoreBndr is a pair of a name and its corresponding expression definition. More specifically, the definition for Bind is:

data Bind b = NonRec b (Expr b)
            | Rec [(b, (Expr b))]

A non-recursive binding is something like x = 5, while a recursive binding is anything that is self- or mutually-recursive.

So, if we want to transform self-recursive calls into mutually-recursive calls, we first need to identify if a definition is self-recursive. Fortunately, the incredible syb library comes in handy here, as it lets us write small queries that get lifted over the entire datatype.

We can write containsName using everything, mkQ and the Any monoid to determine if the CoreBndr name is used anywhere in the CoreExpr1.

containsName :: CoreBndr -> CoreExpr -> Bool
containsName n =
    getAny .
      everything (<>)
        (mkQ (Any False) matches)
  where
    matches :: CoreExpr -> Any
    matches (Var n') | n == n' = Any True
    matches _ = Any False

If containsName b e is True for any (b, e) in the mg_binds, then that function is self-recursive. As such, we'd like to generate a new NOINLINE bind for it, and then replace the original self-call to be to this new bind.

Replacing a call is just as easy as finding the recursion:

replace :: CoreBndr -> CoreBndr -> CoreExpr -> CoreExpr
replace n n' = everywhere $ mkT go
  where
    go :: CoreExpr -> CoreExpr
    go v@(Var nn)
      | nn == n   = Var n'
      | otherwise = v
    go x = x
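If you haven't used syb before, both patterns here (the Any-monoid query and the mkT rewrite) are easier to see on a toy expression type than on Core. The Toy type and its constructors below are made up purely for illustration:

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Generics (Data, Typeable, everything, everywhere, mkQ, mkT)
import Data.Monoid (Any (..))

-- A toy stand-in for CoreExpr, just for this example.
data Toy = TVar String | TLit Int | TApp Toy Toy
  deriving (Data, Typeable, Show, Eq)

-- The query pattern: does the name occur anywhere in the expression?
containsName :: String -> Toy -> Bool
containsName n = getAny . everything (<>) (mkQ (Any False) matches)
  where
    matches (TVar n') | n == n' = Any True
    matches _ = Any False

-- The transformation pattern: rename every occurrence of a variable.
replaceName :: String -> String -> Toy -> Toy
replaceName n n' = everywhere (mkT go)
  where
    go (TVar v) | v == n = TVar n'
    go e = e

main :: IO ()
main = do
  let e = TApp (TVar "f") (TApp (TVar "f") (TLit 1))
  print (containsName "f" e)      -- True
  print (replaceName "f" "g" e)   -- TApp (TVar "g") (TApp (TVar "g") (TLit 1))
```

The Core versions work the same way; the only difference is that the traversals run over GHC's Expr type instead of Toy.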

But creating the new binding is rather more work; we need to construct a new name for it, and then fiddle with its IdInfo in order to set the inlining information we'd like.

loopbreaker :: Uniq -> CoreBndr -> CoreExpr -> [(Var, CoreExpr)]
loopbreaker newUniq n e =
  let Just info = zapUsageInfo $ idInfo n
      info' = setInlinePragInfo info alwaysInlinePragma
      n' = mkLocalVar
             (idDetails n)
             (mkInternalName newUniq (occName n) noSrcSpan)
             (idType n)
         $ setInlinePragInfo vanillaIdInfo neverInlinePragma
   in [ (lazySetIdInfo n info', replace n n' e)
      , (n', Var n)
      ]

First we use zapUsageInfo to make GHC forget that this binding is self-recursive2, and then use setInlinePragInfo to spiritually inject a {-# INLINE n #-} pragma onto it. We then construct a new name (a nontrivial affair; loopbreaker above is simplified in order to get the new Uniq to ensure our variable is hygienic), and replace the self-recursive call with a call to the new name. Finally, we need to spit out the two resulting binds.

There's a little machinery to call loopbreaker on the mg_binds, but it's uninteresting and this post is already long enough. If you're interested, the full code is available on Github. In total, it's a little less than 100 lines long; pretty good for adding a completely new optimization pass!

That's enough about writing plugins for improving performance; in the next post we'll discuss typechecker plugins, and how they can be used to extend GHC's constraint-solving machinery. Stay tuned!

  1. GHC has a bad habit of using type synonyms. A CoreExpr is just an Expr CoreBndr.

  2. I'm not sure this part is necessary, but it doesn't seem to hurt.

May 22, 2019 01:39 PM

May 21, 2019

Mark Jason Dominus

Super-obscure bug in my code

Say $dt is a Perl DateTime object.

You are allowed to say

  $dt->add( days => 2 )
  $dt->subtract( days => 2 )

Today Jeff Boes pointed out that I had written a program that used

  $dt->add({ days => 2 })

which as far as I can tell is not documented to work. But it did work. (I wrote it in 2016 and would surely have noticed by now if it hadn't.) Jeff told me he noticed when he copied my code and got a warning. When I tried it, no warning.

It turns out that

  $dt->add({ days => 2 })
  $dt->subtract({ days => 2 })

both work, except that:

  1. The subtract call produces a warning (add doesn't! and Jeff had changed my add to subtract)

  2. If you included an end_of_month => $mode parameter in the arguments to subtract, it would get lost.

Also, the working-ness of what I wrote is a lucky fluke. It is undocumented (I think) and works only because of a quirk of the implementation. ->add passes its arguments to DateTime::Duration->new, which passes them to Params::Validate::validate. The latter is documented to accept either form. But its use by DateTime::Duration is an undocumented implementation detail.

->subtract works the same way, except that it does a little bit of preprocessing on the arguments before calling DateTime::Duration->new. That's where the warning comes from, and why end_of_month won't work with the hashref form.

(All this is as of version 1.27. The current version is 1.51. Matthew Horsfall points out that 1.51 does not raise a warning, because of a different change to the same interface.)

This computer stuff is amazingly complicated. I don't know how anyone gets anything done.

by Mark Dominus at May 21, 2019 03:04 PM

May 20, 2019

Mark Jason Dominus

Alphabetical order in Korean

Alphabetical order in Korean has an interesting twist I haven't seen in any other language.

(Perhaps I should mention up front that Korean does not denote words with individual symbols the way Chinese does. It has a 24-letter alphabet, invented in the 15th century.)

Consider the Korean word “문어”, which means “octopus”. This is made up of five letters ㅁㅜㄴㅇㅓ. The ㅁㅜㄴ are respectively equivalent to English ‘m’, ‘oo’ (as in ‘moon’), and ‘n’. The ㅇ is silent, just like ‘k’ in “knit”. The ㅓ is a vowel we don't have in English, partway between “saw” and “bud”. Confusingly, it is usually rendered in Latin script as ‘eo’. (It is the first vowel in “Seoul”, for example.) So “문어” is transliterated to Latin script as “muneo”, or “munǒ”, and approximately pronounced “moon-aw”.

But as you see, it's not written as “ㅁㅜㄴㅇㅓ” but as “문어”. The letters are grouped into syllables of two or three letters each. (Or, more rarely, four or even five.)

Now consider the word “무해” (“harmless”) This word is made of the four letters ㅁㅜㅎㅐ. The first two, as before, are ‘m’, ‘oo’. The ㅎ is ‘h’ and the ‘ㅐ’ is a vowel that is something like the vowel in “air”, usually rendered in Latin script as ‘ae’. So it is written “muhae” and pronounced something like “moo-heh”.

ㅎ is the last letter of the alphabet. Because ㅎ follows ㄴ, you might think that 무해 would follow 문어. But it does not. In Korean, alphabetization is also done at the syllable level. The syllable 무 comes before 문, because it is a proper prefix, so 무해 comes before 문어. If the syllable break in 문어 were different, causing it to be spelled 무너, it would indeed come before 무해. But it isn't, so it doesn't. (“무너” does not seem to be an actual word, but it appears as a constituent in words like 무너지다 (“collapse”) and 무너뜨리다 (“demolish”) which do come before 무해 in the dictionary.)
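Incidentally, this syllable-level ordering is exactly what you get by sorting precomposed hangul strings by Unicode code point, because the Hangul Syllables block is laid out in dictionary order (무 is U+BB34, 문 is U+BB38). A quick check in Haskell, using the default code-point ordering on Char:

```haskell
import Data.List (sort)

main :: IO ()
main = mapM_ putStrLn (sort ["문어", "무해"])
-- prints 무해 before 문어, matching Korean dictionary order
```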

As far as I know, there is nothing in Korean analogous to the English alphabet song.

Or to alphabet soup! Koreans love soup! And they love the alphabet, so why no hangeul-tang? There is a hundred dollar bill lying on the sidewalk here, waiting to be picked up.

[ Previously, but just barely related: Medieval Chinese typesetting technique. ]

by Mark Dominus at May 20, 2019 04:25 PM

Monday Morning Haskell

Running From Enemies!


We've spent a few weeks now refactoring a few things in our game. We made it more performant and examined some related concepts. This week, we're going to get back to adding new features to the game! We'll add some enemies, represented by little squares, to rove around our maze! If they touch our player, we'll have to re-start the level!

In the next couple weeks, we'll make these enemies smarter by giving them a better search strategy. Then later, we'll give ourselves the ability to fight back against the enemies. So there will be interesting trade-offs in features.

Remember we have a Github Repository for this project! You can find all the code for this part in the part-5 branch! For some other interesting Haskell project ideas, download our Production Checklist!


Let's remind ourselves of our process for adding new features. Remember that at the code level, our game has a few main elements:

  1. The World state type
  2. The update function
  3. The drawing function
  4. The event handler

So to change our game, we should update each of these in turn. Let's start with the changes to our world type. First, it's now possible for us to "lose" the game. So we'll need to expand our GameResult type:

data GameResult = GameInProgress | GameWon | GameLost

Now we need to store the enemies. We'll add more data about our enemies as the game develops. So let's make a formal data type and store a list of them in our World. But for right now, all we need to know about them is their current location:

data Enemy = Enemy
  { enemyLocation :: Location
  }

data World = World
  { …
  , worldEnemies :: [Enemy]
  }

Updating The Game

Now that our game contains information about the enemies, let's determine what they do! Enemies won't respond to any input events from the player. Instead, they'll update at a regular interval via our updateFunc. Our first concern will be the game end condition. If the player's current location is one of the enemies' locations, we've "lost".

updateFunc :: Float -> World -> World
updateFunc _ w
  -- Game Win Condition
  | playerLocation w == endLocation w = w { worldResult = GameWon }
  -- Game Loss Condition
  | playerLocation w `elem` (enemyLocation <$> worldEnemies w) =
      w { worldResult = GameLost }
  | otherwise = ...

Now we'll need a function that updates the location for an individual enemy. We'll have the enemies move at random. This means we'll need to manipulate the random generator in our world. Let's make this function stateful over the random generator.

updateEnemy :: Maze -> Enemy -> State StdGen Enemy

We'll want to examine the enemy's location, and find all the possible locations it can move to. Then we'll select from them at random. This will look a lot like the logic we used when generating our random mazes. It would also be a great spot to use prisms if we were generating them for our types! We might explore this possibility later on in this series.

updateEnemy :: Maze -> Enemy -> State StdGen Enemy
updateEnemy maze e@(Enemy location) = if null potentialLocs
  then return e
  else do
    gen <- get
    let (randomIndex, newGen) = randomR
                                  (0, length potentialLocs - 1) gen
        newLocation = potentialLocs !! randomIndex
    put newGen
    return (Enemy newLocation)
  where
    bounds = maze Array.! location
    maybeUpLoc = case upBoundary bounds of
      (AdjacentCell loc) -> Just loc
      _ -> Nothing
    maybeRightLoc = case rightBoundary bounds of
      (AdjacentCell loc) -> Just loc
      _ -> Nothing
    maybeDownLoc = case downBoundary bounds of
      (AdjacentCell loc) -> Just loc
      _ -> Nothing
    maybeLeftLoc = case leftBoundary bounds of
      (AdjacentCell loc) -> Just loc
      _ -> Nothing
    potentialLocs = catMaybes
      [maybeUpLoc, maybeRightLoc, maybeDownLoc, maybeLeftLoc]

Now that we have this function, we can incorporate it into our main update function. It's a little tricky though. We have to use the sequence function to combine all these stateful actions together. This will also give us our final list of enemies. Then we can insert the new generator and the new enemies into our state!

updateFunc _ w
  | otherwise =
      w { worldRandomGenerator = newGen, worldEnemies = newEnemies }
  where
    (newEnemies, newGen) = runState
      (sequence (updateEnemy (worldBoundaries w) <$> worldEnemies w))
      (worldRandomGenerator w)
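This sequence-over-State pattern is worth seeing in isolation. Here's a self-contained sketch of the same idea, using a toy integer "generator" in place of StdGen (nextRand is a made-up stand-in, not the real random API):

```haskell
import Control.Monad.State (State, get, put, runState)

-- A toy pseudo-random step standing in for StdGen's randomR.
nextRand :: Int -> (Int, Int)
nextRand g = let g' = (1103515245 * g + 12345) `mod` 2147483648 in (g', g')

-- Pick a "random" element from a list, threading the generator state.
pick :: [a] -> State Int a
pick xs = do
  gen <- get
  let (r, gen') = nextRand gen
  put gen'
  return (xs !! (r `mod` length xs))

main :: IO ()
main = do
  -- sequence combines the per-element stateful actions into one action,
  -- returning all the results and the final generator.
  let (choices, _) = runState (sequence (pick <$> [[1, 2], [3, 4], [5, 6 :: Int]])) 42
  print (length choices)  -- 3
```

Each pick consumes the generator left by the previous one, exactly as each updateEnemy does in the game code above.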

Drawing our Enemies

Now we need to draw our enemies on the board. Most of the information is already there. We have a conversion function to get the drawing coordinates. Then we'll derive the corner points of the square within that cell, and draw an orange square.

drawingFunc (xOffset, yOffset) cellSize world
  | otherwise = Pictures
      [..., Pictures (enemyPic <$> worldEnemies world)]
  where
    enemyPic :: Enemy -> Picture
    enemyPic (Enemy loc) =
      let (centerX, centerY) = cellCenter $ conversion loc
          tl = (centerX - 5, centerY + 5)
          tr = (centerX + 5, centerY + 5)
          br = (centerX + 5, centerY - 5)
          bl = (centerX - 5, centerY - 5)
      in  Color orange (Polygon [tl, tr, br, bl])

One extra part of updating the drawing function is that we'll have to draw a "losing" message. This will be a lot like the winning message.

drawingFunc :: (Float, Float) -> Float -> World -> Picture
drawingFunc (xOffset, yOffset) cellSize world
 | worldResult world == GameLost =
     Translate (-275) 0 $ Scale 0.12 0.25
       (Text "Oh no! You've lost! Press enter to restart this maze!")

Odds and Ends

Two little things remain. First, we want a function to randomize the locations of the enemies. We'll use this to decide their positions at the beginning and when we restart. In the future we may add a power-up that allows the player to run this randomizer. As with other random functions, we'll make this function stateful over the StdGen element.

generateRandomLocation :: (Int, Int) -> State StdGen Location
generateRandomLocation (numCols, numRows) = do
  gen <- get
  let (randomCol, gen') = randomR (0, numCols - 1) gen
      (randomRow, gen'') = randomR (0, numRows - 1) gen'
  put gen''
  return (randomCol, randomRow)

As before, we can sequence these stateful actions together. In the case of initializing the board, we'll use replicateM and the number of enemies. Then we can use the locations to make our enemies, and then place the final generator back into our world.

main = do
  gen <- getStdGen
  let (maze, gen') = generateRandomMaze gen (25, 25)
      numEnemies = 4
      (randomLocations, gen'') = runState
        (replicateM numEnemies (generateRandomLocation (25, 25)))
        gen'
      enemies = Enemy <$> randomLocations
      initialWorld = World (0, 0) (0, 0) (24, 24)
                       maze GameInProgress gen'' enemies
  play ...

The second thing we'll want to do is update the event handler so that it restarts the game when we lose. We'll have similar code to when we win. However, we'll stick with the original maze rather than re-randomizing.

inputHandler :: Event -> World -> World
inputHandler event w
  | worldResult w == GameLost = case event of
      (EventKey (SpecialKey KeyEnter) Down _ _) ->
        let (newLocations, gen') = runState
              (replicateM (length (worldEnemies w)) 
                (generateRandomLocation (25, 25)))
                (worldRandomGenerator w)
        in  World (0,0) (0,0) (24, 24)
             (worldBoundaries w) GameInProgress gen'
             (Enemy <$> newLocations)
      _ -> w

(Note we also have to update the game winning code!) And now we have enemies roving around our maze. Awesome!



Next week we'll step up the difficulty of our game! We'll make the enemies much smarter so that they'll move towards us! This will give us an opportunity to learn about the breadth first search algorithm. There are a few nuances to writing this in Haskell. So don't miss it! The week after, we'll develop a way to stun the enemies. Remember you can follow this project on our Github! The code for this article is on the part-5 branch.

We've used monads, particularly the State monad, quite a bit in this series. Hopefully you can see now how important they are! But they don't have to be difficult to learn! Check out our series on Functional Structures to learn more! It starts with simpler structures like functors. But it will ultimately teach you all the common monads!

by James Bowen at May 20, 2019 02:30 PM

Neil Mitchell

Hoogle XSS Vulnerability

Summary: Hoogle and below have an XSS vulnerability, fixed in later versions.

On Friday afternoon I got an email from Alexander Gugel with the subject line "Non-persistent XSS vulnerability on" - never a good thing to get. He had found that Hoogle was echoing the user search string back into the page, meaning that if you searched for %27"><marquee style you could make all the results scroll past in a disturbingly hypnotic manner. Oh dear!

Step 1: Fix the website

The first concern was to fix the website. While there aren't any cookies stored by Hoogle, and there are no logon forms or similar, the Project Zero blog has taught me that given the tiniest chink, everything can be broken. Fortunately, Alex emailed me using the email address on my webpage, described the problem, and provided a 3 line diff that escaped all the problematic variables. I applied this fix and pushed a new version to

Step 2: Use the type system

Like any good Haskeller, my first thought on encountering a bug is to use the type system to prevent it by construction. The problem boils down to taking user input and splicing it into an HTML page. My initial fix was to introduce a type Taint:

newtype Taint a = Taint a

escapeUntaint :: Taint String -> String
escapeUntaint (Taint x) = escapeHTML x

The idea is that instead of the query parameters to the web page being String's that can be carelessly spliced into the output, they were Taint String values whose only real unwrapping function involves escaping any HTML they may contain. Furthermore, Taint can have instances for Monad etc, meaning you can work on tainted values, but the result will always remain tainted.
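To make that concrete, here's a sketch of what those instances might look like. The escapeHTML body here is a simplified stand-in for illustration, not Hoogle's actual implementation:

```haskell
newtype Taint a = Taint a

-- Simplified HTML escaping, standing in for the real escapeHTML.
escapeHTML :: String -> String
escapeHTML = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c = [c]

-- The only way to unwrap a tainted string is to escape it.
escapeUntaint :: Taint String -> String
escapeUntaint (Taint x) = escapeHTML x

-- Working on tainted values is fine; the result stays tainted.
instance Functor Taint where
  fmap f (Taint x) = Taint (f x)

instance Applicative Taint where
  pure = Taint
  Taint f <*> Taint x = Taint (f x)

instance Monad Taint where
  Taint x >>= f = f x

main :: IO ()
main = putStrLn (escapeUntaint (fmap (++ "!") (Taint "<marquee>")))
-- prints &lt;marquee&gt;!
```

Because Taint's constructor isn't exported from the module boundary, every path from user input to the page has to go through escapeUntaint.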

Using this approach uncovered no additional problems, but gave me much more confidence there weren't any I just hadn't found.

Step 3: Make a release

At this point I made a release of Hoogle. This version has no known XSS issues.

Step 4: Switch to blaze-html

While Taint is an effective tool for some domains, the real problem for Hoogle was that I was building up HTML values using String - making it way too easy to create invalid HTML, and providing an easy attack vector. The next change was to switch to blaze-html, which uses strong typing to ensure the HTML is always valid. Instead of having to call escapeHTML to turn bad String into good String, I instead used H.string to turn bad String into good Markup. For the rare case where there genuinely was String that contained HTML for good reasons I used H.preEscapedString, making the "don't escape" explicit and longer, and the "do escape" the default - a much safer default.

Step 5: Use Content Security Policy headers

There are a whole suite of headers that can be returned by the server to opt in to additional checking, known as CSP headers. These headers can ban inline script, detect XSS attacks, avoid confusion with MIME types, avoid http downgrade attacks and more. Thanks to Gary Verhaegen many of these are now applied to Hoogle, meaning that even if my code is wrong, the chances of it causing any damage (even just hypnotic scrolling) are much reduced.

Step 6: Relax

Hoogle has all the security fixes listed and is deployed to Hopefully no more security issues for a while!

Many thanks to Alexander Gugel for the responsible disclosure, and to Gary Verhaegen for his work on CSP headers.

by Neil Mitchell at May 20, 2019 08:34 AM

May 18, 2019

Derek Elkins

Absolute Colimits

In category theory a concept is called absolute if it is preserved by all functors. Identity arrows and composition are absolute by the definition of functor. Less trivially, isomorphisms are absolute. In general, anything that is described by a diagram commuting is absolute as that diagram will be preserved by any functor. This is generally the case, but if I tell you something is an absolute epimorphism, it’s not clear what diagram is causing that; the notion of epimorphism itself doesn’t reduce to the commutativity of a particular diagram.

Below I’ll be focused primarily on absolute colimits as those are the most commonly used examples. They are an important part of the theory of monadicity. The trick to many questions about absolute colimits and related concepts is to see what it means for |\newcommand{\Set}{\mathbf{Set}}\newcommand{\Hom}{\mathsf{Hom}} \newcommand{onto}{\twoheadrightarrow}\Hom| functors to preserve them.


To start, we can show that certain colimits cannot be absolute, at least for |\Set|-enriched category theory. In particular, initial objects and coproducts are never absolute. Using the trick above, this is easily proven.

\[\Hom(0,0)\cong 1 \not\cong 0\]

\[\Set(\Hom(X+Y,Z),1)\cong 1 \not\cong 2\cong\Set(\Hom(X,Z),1)+\Set(\Hom(Y,Z),1)\]

Absolute Epimorphisms

What do absolute epimorphisms look like? We know that there are absolute epimorphisms because a split epimorphism is defined by a certain diagram commuting. Are there other absolute epimorphisms? To find out, we apply our trick.

Let |r:X\onto Y| be our epimorphism. Then we have the surjection \[\Hom(Y,r):\Hom(Y,X)\onto\Hom(Y,Y)\] but this means that for every arrow |f:Y\to Y|, there’s an arrow |s:Y\to X| such that |f = r \circ s|. As you can no doubt guess, we want to choose |f=id_Y|, and we then have that |r| is a split epimorphism. Therefore split epimorphisms are the only examples of absolute epimorphisms.

Split Coequalizers

Now let’s consider the coequalizer case. Let |f,g:X\to Y| and |e:Y\onto C| be their coequalizer which we’ll assume is absolute. Before we pull out our trick, we can immediately use the previous result to show that |e| has a section, i.e. an arrow |s : C\rightarrowtail Y| such that |id_C=e\circ s|. Moving on we use the trick to get the diagram: \[\Hom(Y,X)\rightrightarrows\Hom(Y,Y)\onto\Hom(Y,C)\]

Next, we use the explicit construction of the coequalizer in |\Set| which |\Hom(Y,C)| is supposed to be canonically isomorphic to. That is, the coequalizer of |\Hom(Y,f)| and |\Hom(Y,g)| is |\Hom(Y,Y)| quotiented by the equivalence relation generated by the relation which identifies |h,k:Y\to Y| when |\exists j:Y\to X.h = f\circ j \land k = g\circ j|. Let |[h]| represent the equivalence class of |h|. The claim that |\Hom(Y,C)| is (with |\Hom(Y,e)|) a coequalizer of the above arrows implies that |e\circ h = \bar e([h])| and |[h]=\bar e^{-1}(e\circ h)| with |\bar e| and |\bar e^{-1}| forming an isomorphism. Of course our next move is to choose |h=id_Y| giving |e=\bar e([id_Y])|. However, |e=e\circ s\circ e = \bar e([s\circ e])| so we get |[id_Y]=[s\circ e]| because |\bar e| is invertible.

If we call the earlier relation |\sim| and write |\sim^*| for its reflexive, symmetric, transitive closure, then |[id_Y] = \{h:Y\to Y\mid id_Y\sim^* h\}|. Therefore |id_Y \sim^* s\circ e|. Now let’s make a simplifying assumption and assume further that |id_Y \sim s\circ e|, i.e. that |id_Y| is directly related to |s\circ e| by |\sim|. By definition of |\sim| this means there is a |t : Y\to X| such that |id_Y = f\circ t| and |s\circ e = g\circ t|. Data consisting of |f|, |g|, and |e| such that |e\circ f = e\circ g|, equipped with an |s : C\to Y| and a |t : Y\to X| satisfying the previous two equations along with |e\circ s = id_C|, is called a split coequalizer. This data is specified diagrammatically and so is preserved by all functors, thus split coequalizers are absolute. All that we need to show is that this data is enough, on its own, to show that |e| is a coequalizer.

Given any |q : Y\to Z| such that |q\circ f = q\circ g|, we need to show that there exists a unique arrow |C\to Z| which |q| factors through. The obvious candidate is |q\circ s| leading to us needing to verify that |q=q\circ s\circ e|. We calculate as follows: \[ \begin{align} q \circ s \circ e & = q \circ g \circ t \\ & = q \circ f \circ t \\ & = q \end{align}\] Uniqueness then quickly follows since if |q = k\circ e| then |q\circ s = k\circ e \circ s = k|. |\square|

There’s actually an even simpler example where |s\circ e = id_Y| which corresponds to the trivial case where |f=g|.

Absolute Coequalizers

Split coequalizers show that (non-trivial) absolute coequalizers can exist, but they don’t exhaust all the possibilities. The obvious cause of this is the simplifying assumption we made where we said |id_Y\sim s\circ e| rather than |id_Y\sim^* s\circ e|. In the general case, the equivalence will be witnessed by a sequence of arrows |t_i : Y\to X| such that we have either |s\circ e = g\circ t_0| or |s \circ e = f\circ t_0|, then |f\circ t_0 = g\circ t_1| or |g\circ t_0 = f\circ t_1| respectively, and so on until we get to |f\circ t_n = id_Y| or |g\circ t_n = id_Y|. As a diagram, this is a fan of diamonds of the form |f\circ t_i = g\circ t_{i+1}| or |g\circ t_i = f\circ t_{i+1}| with a diamond with side |s\circ e| on one end of the fan and a triangle with |id_Y| on the other. All this data is diagrammatic so it is preserved by all functors making the coequalizer absolute. That it is a coequalizer uses the same proof as for split coequalizers except that we have a series of equalities to show that |q\circ s \circ e = q|, namely all those pasted together diamonds. There is no conceptual difficulty here; it’s just awkward to notate.

Absolute Colimits

The absolute coequalizer case captures the spirit of the general case, but you can see a description here. I’m not going to work through it, but you could as an exercise. Less tediously, you could work through absolute pushouts. If |P| is the pushout of |Y \leftarrow X \to Z|, then the functors to consider are |\Hom(P,-)| and |\Hom(Y,-)+\Hom(Z,-)|. For each, the pushout in |\Set| can be turned into a coequalizer of a coproduct. For the first functor, as before, this gives us an inverse image of |id_P| which will either be an arrow |P\to Y| or an arrow |P\to Z| which will play the role of |s|. The other functor produces a coequalizer of |\Hom(Y,Y)+\Hom(Z,Y)+\Hom(Y,Z)+\Hom(Z,Z)|. The generating relation of the equivalence relation states that there exists either an arrow |Y\to X| or an arrow |Z\to X| such that the appropriate equation holds. The story plays out much as before except that we have a sequence of arrows from two different hom-sets.

May 18, 2019 11:59 PM

Sandy Maguire

GHC's Specializer: Much More Than You Wanted to Know

In the course of tracking down why free monads were so slow, I fell into a deep labyrinth of scary GHC internals. Six weeks later, I emerged, significantly more knowledgeable, and having implemented some changes in the compiler that will allow polysemy to be optimized much better. The improvements will be available in 8.10.1.

All of this seems like a good opportunity to share what I've learned, so today let's talk about GHC's specialization pass. This optimization is more popularly known as "the reason why mtl is so fast."

At a high level, the specialization pass is responsible for optimizing away uses of ad-hoc polymorphism (typeclasses) in Haskell source code. When -fspecialise is enabled, GHC will make a monomorphic copy of every polymorphic method --- one for every unique type it's called with. The result should feel similar to anyone who's written modern C++, as it's completely analogous to how templates work.

While polymorphic functions are great for humans to write, they're significantly slower for machines to execute, since you need to pass around vtables and perform dynamic dispatches, and all sorts of crazy things. This is exactly the purpose of GHC's specialization pass, to simply get rid of all of that machinery and keep only the pieces that are explicitly used.

Let's take an example. Consider the following program:

{-# LANGUAGE FlexibleContexts #-}

import           Control.Monad.State.Class
import qualified Control.Monad.Trans.State as S

countdown :: S.StateT Int IO ()
countdown = do
  v <- get
  case v of
    0 -> pure ()
    _ -> do
      put $ v - 1

main :: IO ()
main = S.evalStateT countdown 10

When compiled via ghc Example.hs -O -fno-specialise [1], we can look directly at the resulting Core of this program. If you're unfamiliar with Core, it's GHC's intermediate language between source-level Haskell and the generated machine code. Core differs in two notable ways from source Haskell: its evaluation is explicit via case expressions, and both types and typeclass instances are explicitly passed around.

Anyway, here's the relevant Core for our above program:

Rec {
-- RHS size: {terms: 14, types: 13, coercions: 0, joins: 0/0}
$wcountdown
  :: Int# -> State# RealWorld -> (# State# RealWorld, ((), Int) #)
$wcountdown
  = \ (ww_s49L :: Int#) (w_s49I :: State# RealWorld) ->
      case ww_s49L of ds_X2I1 {
        __DEFAULT -> $wcountdown (-# ds_X2I1 1#) w_s49I;
        0# -> (# w_s49I, lvl1_r4ap #)
      }
end Rec }

-- RHS size: {terms: 12, types: 29, coercions: 0, joins: 0/0}
main1 :: State# RealWorld -> (# State# RealWorld, () #)
main1
  = \ (s_a2Ks :: State# RealWorld) ->
      case $wcountdown 10# s_a2Ks of { (# ipv_a2Kv, ipv1_a2Kw #) ->
      (# ipv_a2Kv,
         case ipv1_a2Kw of { (a1_a2I6, ds2_a2I7) -> a1_a2I6 } #)
      }

As you can see, this is very short and to the point. Reading Core is a bit of an art, but the gist of it is this: main1 calls $wcountdown, which recursively calls itself until its counter argument ww_s49L reaches 0#. It's probably exactly the same code you'd write by hand, if for some reason you were writing Core by hand.

Our program above is written directly against transformers, but nobody actually writes code against transformers in the real world. Choosing a concrete monad transformer stack is limiting, and at the same time, prevents you from restricting access to pieces of the stack. Instead, we're encouraged to write code against abstract monad capabilities, traditionally mtl.

So let's subtly change the type of countdown above:

countdown :: MonadState Int m => m ()

Nothing else in the program needs to change. Let's now compile this program again via ghc Example.hs -O -fno-specialise. The result is horrendously worse Core:

Rec {
-- RHS size: {terms: 35, types: 47, coercions: 0, joins: 0/2}
$wcountdown
  :: forall (m :: * -> *).
     Applicative m =>
     (forall a b. m a -> (a -> m b) -> m b)
     -> (forall a b. m a -> m b -> m b)
     -> m Int
     -> (Int -> m ())
     -> m ()
$wcountdown
  = \ (@ (m_s4WK :: * -> *))
      (ww_s4WR :: Applicative m_s4WK)
      (ww1_s4WS :: forall a b. m_s4WK a -> (a -> m_s4WK b) -> m_s4WK b)
      (ww2_s4WT :: forall a b. m_s4WK a -> m_s4WK b -> m_s4WK b)
      (ww3_s4WX :: m_s4WK Int)
      (ww4_s4WY :: Int -> m_s4WK ()) ->
      let {
        lvl6_s4W1 :: m_s4WK ()
        lvl6_s4W1
          = $wcountdown
              @ m_s4WK ww_s4WR ww1_s4WS ww2_s4WT ww3_s4WX ww4_s4WY } in
      let {
        lvl7_s4W2 :: m_s4WK ()
        lvl7_s4W2 = pure @ m_s4WK ww_s4WR @ () () } in
      ww1_s4WS
        @ Int
        @ ()
        ww3_s4WX
        (\ (v_a192 :: Int) ->
           case v_a192 of { I# ds_d3xJ ->
           case ds_d3xJ of ds1_X3xT {
             __DEFAULT ->
               ww2_s4WT @ () @ () (ww4_s4WY (I# (-# ds1_X3xT 1#))) lvl6_s4W1;
             0# -> lvl7_s4W2
           }
           })
end Rec }

-- RHS size: {terms: 17, types: 32, coercions: 21, joins: 0/0}
main1 :: State# RealWorld -> (# State# RealWorld, () #)
main1
  = \ (s_a3z5 :: State# RealWorld) ->
      case (((($wcountdown
                 @ (StateT Int IO)
                 (lvl3_r50k `cast` <Co:13>)
              `cast` <Co:4>)
            `cast` <Co:4>)
      { (# ipv_a3z8, ipv1_a3z9 #) ->
      (# ipv_a3z8,
         case ipv1_a3z9 of { (a1_a3y3, ds2_a3y4) -> a1_a3y3 } #)

Yikes! What a mess! It's amazing how much of a difference that one type signature made! Our simple mtl program above has turned into an unholy mess of passing around overly-polymorphic functions. We've paid an awful price to abstract away our monad stack, even though the actual program being run didn't change!

Of course, this isn't a real problem in the wild. Compile the program again, this time without the -fno-specialise flag [2] --- ghc Example.hs -O:

Rec {
-- RHS size: {terms: 14, types: 13, coercions: 0, joins: 0/0}
$w$scountdown
  :: Int# -> State# RealWorld -> (# State# RealWorld, ((), Int) #)
$w$scountdown
  = \ (ww_s5dY :: Int#) (w_s5dV :: State# RealWorld) ->
      case ww_s5dY of ds_X3xU {
        __DEFAULT -> $w$scountdown (-# ds_X3xU 1#) w_s5dV;
        0# -> (# w_s5dV, lvl1_r5jV #)
      }
end Rec }

-- RHS size: {terms: 12, types: 29, coercions: 0, joins: 0/0}
main1 :: State# RealWorld -> (# State# RealWorld, () #)
main1
  = \ (s_X3Bw :: State# RealWorld) ->
      case $w$scountdown 10# s_X3Bw of { (# ipv_a3z9, ipv1_a3za #) ->
      (# ipv_a3z9,
         case ipv1_a3za of { (a1_a3y4, ds2_a3y5) -> a1_a3y4 } #)
      }

Whew! We're back to the speedy program we started with. -fspecialise has done the hard work of transforming our abstract code into fast code for us --- exactly as a good compiler should.
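You can also request specializations by hand, rather than relying on the flag to notice a saturated call site. Here's a minimal sketch using a hypothetical sumSquares function (my own example, not from the program above), with GHC's SPECIALISE pragma:

```haskell
-- A polymorphic function that would otherwise receive a Num dictionary
-- at every call.
sumSquares :: Num a => [a] -> a
sumSquares = sum . map (\x -> x * x)

-- Ask GHC to emit a monomorphic copy (plus a rewrite rule targeting it),
-- much like -fspecialise would do at a saturated call site.
{-# SPECIALISE sumSquares :: [Int] -> Int #-}

main :: IO ()
main = print (sumSquares [1, 2, 3 :: Int])  -- prints 14
```

This is also why library authors mark polymorphic functions INLINABLE: it keeps the unfolding in the interface file so the specializer can do this across module boundaries.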

What's Going On?

It's amazing how drastic the differences are in the generated code, just from flipping a switch!

Before we can discuss exactly how this transformation helps, we need to first go over some details of how GHC implements a few source-level Haskell features. The first is dictionaries, which are how typeclass dispatch works.


Consider the following program in source-level Haskell:

-- (hiding the Prelude's Eq so the example compiles on its own)
import Prelude hiding (Eq, (==))

class Eq a where
  (==) :: a -> a -> Bool

instance Eq () where
  (==) _ _ = True

equate :: Eq a => a -> a -> Bool
equate a1 a2 = a1 == a2

main :: IO ()
main = print $ equate () ()

Internally, GHC will generate the equivalent program:

data Eq a = Eq  -- #1
  (a -> a -> Bool)

(==) :: Eq a -> a -> a -> Bool
(==) dEq'a =  -- #2
  case dEq'a of
    Eq eqMethod -> eqMethod

eqUnit :: Eq ()  -- #3
eqUnit = Eq
  (\_ _ -> True)

equate :: Eq a -> a -> a -> Bool  -- #4
equate dEq'a a1 a2 = (==) dEq'a a1 a2  -- #5

main :: IO ()
main = print $ equate eqUnit () ()  -- #6

Notably, the following changes occur:

  1. The class Eq a is transformed into data Eq a, with each class method becoming a field.
  2. The class method (==) receives a new Eq a parameter, and becomes a function which pattern matches on it.
  3. The instance Eq () becomes a top-level declaration of an Eq () value.
  4. The Eq a constraint on equate becomes a parameter of the new Eq a datatype.
  5. The usage of (==) in equate receives the new dEq'a parameter.
  6. The usage of equate at type a ~ () in main receives the new top-level eqUnit :: Eq () value as an argument.

We call the values eqUnit and dEq'a dictionaries. More precisely, a dictionary is any value whose type is a data type corresponding to a typeclass. Dictionaries do not exist in source-level Haskell, only in the generated Core. In real Core, dictionaries have names that start with $d, but we'll omit the leading $ today, so we don't get it confused with the ($) operator.

From all of this we see that, under the hood, class definitions are just data definitions, and that constraints are just invisible parameters.

Case of Known Constructor

Consider the following program:

blah =
  case True of
    True  -> foo
    False -> bar

Because we're scrutinizing on a constant value here, the result of this expression must always be foo. As such, it's safe to replace the entire pattern match expression with foo:

blah = foo

This transformation is known as the case of known constructor optimization. While humans would never write such a thing by hand, expressions like these often come up as the result of other optimizing transformations.

Rewrite Rules

One final thing to discuss is GHC's term rewriting mechanism, known as rewrite rules.

Rewrite rules are little statements that say "this thing can be written as that thing." Whenever GHC encounters this thing, it will duly rewrite it as that thing. The motivating use case is to allow library authors to implement domain-specific optimizations --- such as ensuring composing functions don't generate intermediate structures. You might have heard of "list fusion," which is implemented via rewrite rules.
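As a small sketch of the idea (GHC's real list fusion is phrased in terms of build and foldr, so this exact rule is just illustrative), here is a fusion-style rule that collapses two list traversals into one:

```haskell
-- Rewrite two traversals into a single one. Both sides produce the same
-- list, so the rule only changes how the result is computed.
{-# RULES
  "map/map"  forall f g xs.  map f (map g xs) = map (f . g) xs
  #-}

main :: IO ()
main = print (map (+ 1) (map (* 2) [1, 2, 3 :: Int]))  -- prints [3,5,7]
```

With -O, any pipeline of two maps that survives to the simplifier can be rewritten to a single map, avoiding the intermediate list.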

Rewrite rules must preserve the type of the expression, but besides that are free to do anything they'd like. Just as an example, we can write a program which prints hello world seemingly from nowhere:

  "it's magic!"
    pure () = putStrLn "hello world"

main :: IO ()
main = pure ()

Compiling this with -O0 won't print any message when run, but will print hello world when compiled with -O. Spooky!

When -XTypeApplications is enabled, rewrite rules are allowed to match on types too! For example, the following program will print 2 1 1:

{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE RankNTypes          #-}
{-# LANGUAGE TypeApplications    #-}

magic :: forall b a. a -> a
magic = id
{-# NOINLINE magic #-}

{-# RULES "it's magic!"
      forall (a :: Int).
        magic @String a = a + 1
  #-}

main :: IO ()
main = do
  print $ magic @String (1 :: Int)
  print $ magic @Bool   (1 :: Int)
  print $ magic @String (1 :: Integer)

Of course, you shouldn't abuse rewrite rules like this --- make sure any rules you write are just more efficient versions of an equivalent program --- but it's helpful to demonstrate what's going on.

Internally, GHC uses lots of rewrite rules itself! All of its constant-folding (e.g. replacing 2 + 3 with 5 at compile time) is done via rewrite rules, which helps separate that logic from the main compiler.


So with all of that background information out of the way, we're finally ready to talk about how the specializer works.

Recall our original mtl program, transformed so it has its dictionaries explicitly passed:

countdown :: Monad m -> MonadState Int m -> m ()
-- There is a `Monad m` constraint on `MonadState s m`, which is where this
-- extra constraint comes from.
countdown dMonad'm dMonadState'm = do dMonad'm
  v <- get dMonadState'm
  case v of
    0 -> pure dMonad'm ()
    _ -> do dMonad'm
      put dMonadState'm $ v - 1
      countdown dMonad'm dMonadState'm

main :: IO ()
main =
  S.evalStateT
    (countdown @(StateT Int IO)
      (dMonadStateT dMonadIO)
      (dMonadStateStateT dMonadIO))
    10

When -fspecialise is set, the specializer will look for any calls to polymorphic functions with all of their dictionaries saturated by "interesting" dictionaries. The dictionaries dMonad'm and dMonadState'm in countdown aren't interesting, since they're just opaque dictionary variables; we don't know anything about them.

However, GHC notices that countdown is called with m ~ StateT Int IO, and that all of its dictionaries are statically known. As such, it emits a specialized version of countdown, monomorphized to StateT Int IO ():

scountdown_StateT :: StateT Int IO ()
scountdown_StateT = do (dMonadStateT dMonadIO)
  v <- get (dMonadStateStateT dMonadIO)
  case v of
    0 -> pure (dMonadStateT dMonadIO) ()
    _ -> do (dMonadStateT dMonadIO)
      put (dMonadStateStateT dMonadIO) $ v - 1
      scountdown_StateT

In addition, the specializer will emit a rewrite rule:

{-# RULES "SPEC countdown @ (StateT Int IO)"
      forall (dMonad'm :: Monad (StateT Int IO))
             (dMonadState'm :: MonadState Int (StateT Int IO)).
        countdown @(StateT Int IO) dMonad'm dMonadState'm =

This rewrite rule will find any call to countdown at m ~ StateT Int IO, ignore the dictionaries passed to it, and replace the entire expression with the specialized scountdown_StateT function.

In particular, this means that main becomes:

main :: IO ()
main = S.evalStateT scountdown_StateT 10

The rule takes advantage of the fact that dictionaries are known to be consistent (all expressions for a dictionary of a given type eventually evaluate to the same record), so it can completely ignore its two dictionary arguments. However, in principle there's absolutely no reason this same technique couldn't be used to specialize on other, non-dictionary, arguments!

Notice now that pure, get, and the two do-blocks in scountdown_StateT are now called with interesting dictionaries, so pure, get and >>= can now all also be specialized at StateT Int IO.

Eventually the concrete dictionaries and corresponding specializations propagate throughout the entire program. The optimizer can now take advantage of two other properties: class methods were already transformed into pattern matches, and all of the dictionaries are statically known. Which means we have created several places where case-of-known-constructor can now fire!

For example, let's consider the get in countdown. It now looks something like this:

  v <- case MonadState (StateT $ \s -> implOfPureForIO (s, s)) ... of
         MonadState getMethod _ _ -> getMethod

which can obviously be simplified to

  v <- StateT $ \s -> implOfPureForIO (s, s)

This is already a great improvement! But it gets better: recall that we're binding in the StateT monad, which in turn calls bind in IO. But bind in IO is itself implemented as a pattern match, and so case-of-known-constructor applies there too!

The end result is that GHC spins for a while, alternatingly specializing, inlining, case-of-known-casing, and performing a few other optimizations. Each of these in turn opens up additional opportunities for the others to fire. After a few iterations of this, the resulting code is often orders of magnitude faster!

Coming in 8.10.1...

Everything described above is how the compiler behaves today in GHC 8.6.5 (and has, since like 2007 or something.) However, when digging into the performance of my free monad library polysemy, I noticed that code written against my library wasn't benefiting from the specialization pass! As a result, my library was performing anywhere between 10x and 1000x worse than mtl, even though the eventual code being run was identical to mtl.

Like our experiments above into mtl, I was paying a performance cost for abstraction, even though the concrete program was identical.

Some investigation by the indefatigable mpickering pointed out that the specializer was failing to specialize. As it happens, the specializer is more than happy to optimize away dictionaries that are passed as the first non-type arguments to a function, but no others.

That means it will go home early if it runs into a function whose signature is of the form:

foo :: Int -> forall a. Eq a => ...

Again, humans would never write such a thing, but the optimizer is more than happy to spit these things out. Additionally, code like this often shows up whenever you use a newtype to get around GHC's annoying error that it "does not (yet) support impredicative polymorphism".

Anyway, all of this is to say that in 8.10.1, the specialization pass is now smart enough to specialize functions like foo. As a result, we should see very real performance improvements in libraries like polysemy and lens, and, excitingly, any programs which use them!

  1. The meaning of the flags is --- -O: enable optimizations; -fno-specialise: disable the specialization pass.

  2. -fspecialise is included in -O.

May 18, 2019 09:59 PM

May 17, 2019

Mark Jason Dominus

What's the difference between 0/0 and 1/0?

Last year a new Math Stack Exchange user asked What's the difference between $\frac00$ and $\frac10$?

I wrote an answer I thought was pretty good, but the question was downvoted and deleted as “not about mathematics”. This is bullshit, but what can I do?

I can repatriate my answer here, anyway.

This long answer has two parts. The first one is about the arithmetic, and is fairly simple, and is not very different from the other answers here: neither $\frac00$ nor $\frac10$ has any clear meaning. But your intuition is a good one: if one looks at the situation more carefully, $\frac00$ and $\frac10$ behave rather differently, and there is more to the story than can be understood just from the arithmetic part. The second half of my answer tries to go into these developments.

The notation $\frac ab$ has a specific meaning:

The number $x$ for which $$x\cdot b=a.$$

Usually this is simple enough. There is exactly one number $x$ for which $x\cdot 2=8$, namely $4$, so $\frac82=4$. There is exactly one number $x$ for which $x\cdot 5=-10$, namely $-2$, so $\frac{-10}5=-2$.

But when $b=0$ we can't keep the promise that is implied by the word "the" in "The number for which...". Let's see what goes wrong. When $b=0$ the definition says:

The number $x$ for which $$x\cdot 0=a.$$

When $a\ne 0$ this goes severely wrong. The left-hand side is zero and the right-hand side is not, so there is no number $x$ that satisfies the condition. Suppose $x$ is the ugliest gorilla on the dairy farm. But the farm has no gorillas, only cows. Any further questions you have about $x$ are pointless: is $x$ a male or female gorilla? Is its fur black or dark gray? Does $x$ prefer bananas or melons? There is no such $x$, so the questions are unanswerable.

When $a$ and $b$ are both zero, something different goes wrong:

The number $x$ for which $$x\cdot 0=0.$$

It still doesn't work to speak of "The number $x$ for which..." because any $x$ will work. Now it's like saying that $x$ is ‘the’ cow from the dairy farm. But there are many cows, so questions about $x$ are still pointless, although in a different way: Does $x$ have spots? I dunno man, what is $x$?

Asking about this $x$, as an individual number, never makes sense, for one reason or the other: either because there is no such $x$ at all ($\frac a0$ when $a\ne 0$) or because the description is not specific enough to tell you anything ($\frac00$).

If you are trying to understand this as a matter of simple arithmetic, using analogies about putting cookies into boxes, this is the best you can do. That is a blunt instrument, and for a finer understanding you need to use more delicate tools. In some contexts, the two situations ($\frac00$ and $\frac10$) are distinguishable, but you need to be more careful.

Suppose $f$ and $g$ are some functions of $x$, each with definite values for all numbers $x$, and in particular $g(0)=0$. We can consider the quantity $$q(x) = \frac{f(x)}{g(x)}$$ and ask what happens to $q(x)$ when $x$ gets very close to $0$. The quantity $q(0)$ itself is undefined, because at $x=0$ the denominator $g(0)$ is $0$. But we can still ask what happens to $q(x)$ when $x$ gets close to zero, but before it gets all the way there. It's possible that as $x$ gets closer and closer to zero, $q(x)$ might get closer and closer to some particular number, say $\ell$; we can ask if there is such a number $\ell$, and if so what it is.

It turns out we can distinguish quite different situations depending on whether the numerator $f(0)$ is zero or nonzero. When $f(0)\ne 0$, we can state decisively that there is no such $\ell$. For if there were, it would have to satisfy $\ell\cdot 0 = f(0)$, which is impossible; $\ell$ would have to be a gorilla on the dairy farm. There are a number of different ways that $q$ can behave in such cases, when its denominator approaches zero and its numerator does not, but all of the possible behaviors are bad: $q$ can increase or decrease without bound as $x$ gets close to zero; or it can do both depending on whether we approach zero from the left or the right; or it can oscillate more and more wildly, but in no case does it do anything like gently and politely approaching a single number $\ell$.

But if $f(0)=0$, the answer is more complicated, because $\ell$ (if it exists at all) would only need to satisfy $\ell\cdot 0=0$, which is easy. So there might actually be an $\ell$ that works; it depends on further details of $f$ and $g$, and sometimes there is and sometimes there isn't. For example, when $f(x)=x^2+x$ and $g(x)=x$ then $q(x)=\frac{x^2+x}{x}$. This is still undefined at $x=0$ but at any other value of $x$ it is equal to $x+1$, and as $x$ approaches zero, $q(x)$ slides smoothly in toward $1$ along the straight line $y=x+1$. When $x$ is close to (but not equal to) zero, $q(x)$ is close to (but not equal to) $1$; for example when $x=0.001$ then $q(x)=1.001$, and as $x$ gets closer to zero $q(x)$ gets even closer to $1$. So the number $\ell$ we were asking about does exist, and is in fact equal to $1$. On the other hand if $f(x)=x$ and $g(x)=x^2$ then there is still no such $\ell$.
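To watch the two behaviors numerically, here is a quick Haskell sketch; the particular choices of $f$ and $g$ are mine, picked only to illustrate each case:

```haskell
-- A 0/0-style quotient: numerator and denominator both vanish at 0,
-- and the quotient quietly approaches a limit.
f, g :: Double -> Double
f x = x * x + x   -- f(0) = 0
g x = x           -- g(0) = 0

q :: Double -> Double
q x = f x / g x   -- undefined at 0, but equal to x + 1 elsewhere

-- A 1/0-style quotient: the numerator stays at 1, so near 0 the
-- quotient grows without bound instead of approaching anything.
r :: Double -> Double
r x = 1 / g x

main :: IO ()
main = do
  mapM_ (print . q) [0.1, 0.01, 0.001]  -- creeps toward 1
  mapM_ (print . r) [0.1, 0.01, 0.001]  -- blows up
```

Shrinking the sample points further makes the first column settle down and the second column run away, which is exactly the distinction the calculus formalizes.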

The details of how this all works, when there is an $\ell$ and when there isn't, and how to find it, are very interesting, and are the basic idea that underpins all of calculus. The calculus part was invented first, but it bothered everyone because although it seemed to work, it depended on an incoherent idea about how division by zero worked. Trying to frame it as a simple matter of putting cookies into boxes was no longer good enough. Getting it properly straightened out was a long process that took around 150 years, but we did eventually get there and now I think we understand the difference between $\frac00$ and $\frac10$ pretty well. But to really understand the difference you probably need to use the calculus approach, which may be more delicate than what you are used to. But if you are interested in this question, and you want the full answer, that is definitely the way to go.

by Mark Dominus at May 17, 2019 07:40 PM

Edward Z. Yang

PyTorch internals

This post is a long form essay version of a talk about PyTorch internals, that I gave at the PyTorch NYC meetup on May 14, 2019.

Hi everyone! Today I want to talk about the internals of PyTorch.

This talk is for those of you who have used PyTorch, and thought to yourself, "It would be great if I could contribute to PyTorch," but were scared by PyTorch's behemoth of a C++ codebase. I'm not going to lie: the PyTorch codebase can be a bit overwhelming at times. The purpose of this talk is to put a map in your hands: to tell you about the basic conceptual structure of a "tensor library that supports automatic differentiation", and give you some tools and tricks for finding your way around the codebase. I'm going to assume that you've written some PyTorch before, but haven't necessarily delved deeper into how a machine learning library is written.

The talk is in two parts: in the first part, I'm going to first introduce you to the conceptual universe of a tensor library. I'll start by talking about the tensor data type you know and love, and give a more detailed discussion about what exactly this data type provides, which will lead us to a better understanding of how it is actually implemented under the hood. If you're an advanced user of PyTorch, you'll be familiar with most of this material. We'll also talk about the trinity of "extension points", layout, device and dtype, which guide how we think about extensions to the tensor class. In the live talk at PyTorch NYC, I skipped the slides about autograd, but I'll talk a little bit about them in these notes as well.

The second part grapples with the actual nitty gritty details involved with actually coding in PyTorch. I'll tell you how to cut your way through swaths of autograd code, what code actually matters and what is legacy, and also all of the cool tools that PyTorch gives you for writing kernels.

The tensor is the central data structure in PyTorch. You probably have a pretty good idea about what a tensor intuitively represents: it's an n-dimensional data structure containing some sort of scalar type, e.g., floats, ints, et cetera. We can think of a tensor as consisting of some data, and then some metadata describing the size of the tensor, the type of the elements it contains (dtype), and what device the tensor lives on (CPU memory? CUDA memory?).

There's also a little piece of metadata you might be less familiar with: the stride. Strides are actually one of the distinctive features of PyTorch, so it's worth discussing them a little more.

A tensor is a mathematical concept. But to represent it on our computers, we have to define some sort of physical representation for them. The most common representation is to lay out each element of the tensor contiguously in memory (that's where the term contiguous comes from), writing out each row to memory, as you see above. In the example above, I've specified that the tensor contains 32-bit integers, so you can see that each integer lies in a physical address, each offset four bytes from each other. To remember what the actual dimensions of the tensor are, we have to also record what the sizes are as extra metadata.

So, what do strides have to do with this picture?

Suppose that I want to access the element at position tensor[0, 1] in my logical representation. How do I translate this logical position into a location in physical memory? Strides tell me how to do this: to find out where any element for a tensor lives, I multiply each index with the respective stride for that dimension, and sum them all together. In the picture above, I've color coded the first dimension blue and the second dimension red, so you can follow the index and stride in the stride calculation. Doing this sum, I get two (zero-indexed), and indeed, the number three lives two below the beginning of the contiguous array.
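The index arithmetic itself is tiny; here it is as a Haskell sketch. (The strides (2, 1) below are my assumption for a contiguous 2×2 row-major tensor, matching the layout described above.)

```haskell
-- Physical offset of a logical index: the dot product of the index
-- with the strides.
physicalOffset :: [Int] -> [Int] -> Int
physicalOffset strides idxs = sum (zipWith (*) strides idxs)

main :: IO ()
main = do
  -- Contiguous 2x2 tensor: row stride 2, column stride 1.
  print (physicalOffset [2, 1] [1, 0])  -- element [1,0] lives at offset 2
  print (physicalOffset [2, 1] [1, 1])  -- element [1,1] lives at offset 3
```

The element's byte address is then the base pointer plus this offset times the element size (four bytes for 32-bit integers).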

(Later in the talk, I'll talk about TensorAccessor, a convenience class that handles the indexing calculation. When you use TensorAccessor, rather than raw pointers, this calculation is handled under the covers for you.)

Strides are the fundamental basis of how we provide views to PyTorch users. For example, suppose that I want to extract out a tensor that represents the second row of the tensor above:

Using advanced indexing support, I can just write tensor[1, :] to get this row. Here's the important thing: when I do this, I don't create a new tensor; instead, I just return a tensor which is a different view on the underlying data. This means that if I, for example, edit the data in that view, it will be reflected in the original tensor. In this case, it's not too hard to see how to do this: three and four live in contiguous memory, and all we need to do is record an offset saying that the data of this (logical) tensor lives two down from the top. (Every tensor records an offset, but most of the time it's zero, and I'll omit it from my diagrams when that's the case.)

Question from the talk: If I take a view on a tensor, how do I free the memory of the underlying tensor?

Answer: You have to make a copy of the view, thus disconnecting it from the original physical memory. There's really not much else you can do. By the way, if you have written Java in the old days, taking substrings of strings has a similar problem, because by default no copy is made, so the substring retains a reference to the original (possibly very large) string. Apparently, they fixed this in Java 7u6.

A more interesting case is if I want to take the first column:

When we look at the physical memory, we see that the elements of the column are not contiguous: there's a gap of one element between each one. Here, strides come to the rescue: instead of specifying a stride of one, we specify a stride of two, saying that between one element and the next, you need to jump two slots. (By the way, this is why it's called a "stride": if we think of an index as walking across the layout, the stride says how many locations we stride forward every time we take a step.)

The stride representation can actually let you represent all sorts of interesting views on tensors; if you want to play around with the possibilities, check out the Stride Visualizer.

Let's step back for a moment, and think about how we would actually implement this functionality (after all, this is an internals talk.) If we can have views on tensor, this means we have to decouple the notion of the tensor (the user-visible concept that you know and love), and the actual physical data that stores the data of the tensor (called storage):

There may be multiple tensors which share the same storage. Storage defines the dtype and physical size of the tensor, while each tensor records the sizes, strides and offset, defining the logical interpretation of the physical memory.

One thing to realize is that there is always a pair of Tensor-Storage, even for "simple" cases where you don't really need a storage (e.g., you just allocated a contiguous tensor with torch.zeros(2, 2)).
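As a toy model of this decoupling (a Haskell sketch of my own, not PyTorch's actual C++ types), a tensor is just sizes, strides, and an offset layered over a shared storage:

```haskell
-- One flat buffer of physical data...
newtype Storage = Storage { buffer :: [Float] }

-- ...and any number of logical interpretations of it.
data Tensor = Tensor
  { storage :: Storage
  , sizes   :: [Int]
  , strides :: [Int]
  , offset  :: Int
  }

-- Read a logical index via the stride formula.
index :: Tensor -> [Int] -> Float
index t idxs =
  buffer (storage t) !! (offset t + sum (zipWith (*) (strides t) idxs))

-- A view of row i: same storage, bumped offset, one fewer dimension.
row :: Int -> Tensor -> Tensor
row i t = t { sizes   = drop 1 (sizes t)
            , strides = drop 1 (strides t)
            , offset  = offset t + i * head (strides t)
            }

main :: IO ()
main = do
  let base = Tensor (Storage [1, 2, 3, 4]) [2, 2] [2, 1] 0
  print (index base [1, 0])       -- prints 3.0
  print (index (row 1 base) [0])  -- the view sees the same 3.0
```

Because row only rewrites metadata, writing through a (mutable) view in the real system is visible through every other tensor sharing the storage.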

By the way, we're interested in making this picture not true; instead of having a separate concept of storage, just define a view to be a tensor that is backed by a base tensor. This is a little more complicated, but it has the benefit that contiguous tensors get a much more direct representation without the Storage indirection. A change like this would make PyTorch's internal representation a bit more like Numpy's.

We've talked quite a bit about the data layout of tensor (some might say, if you get the data representation right, everything else falls in place). But it's also worth briefly talking about how operations on the tensor are implemented. At the very most abstract level, when you call an operator like torch.mm, two dispatches happen:

The first dispatch is based on the device type and layout of a tensor: e.g., whether or not it is a CPU tensor or a CUDA tensor (and also, e.g., whether or not it is a strided tensor or a sparse one). This is a dynamic dispatch: it's a virtual function call (exactly where that virtual function call occurs will be the subject of the second half of this talk). It should make sense that you need to do a dispatch here: the implementation of CPU matrix multiply is quite different from a CUDA implementation. It is a dynamic dispatch because these kernels may live in separate libraries (e.g., a CPU library versus a CUDA library), and so you have no choice: if you want to get into a library that you don't have a direct dependency on, you have to dynamic dispatch your way there.

The second dispatch is a dispatch on the dtype in question. This dispatch is just a simple switch-statement for whatever dtypes a kernel chooses to support. Upon reflection, it should also make sense that we need to do a dispatch here: the CPU code (or CUDA code, as the case may be) that implements multiplication on float is different from the code for int. It stands to reason you need separate kernels for each dtype.

This is probably the most important mental picture to have in your head, if you're trying to understand the way operators in PyTorch are invoked. We'll return to this picture when it's time to look more at code.
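A toy model of this mental picture (pure Python, with hypothetical names; nothing here is from the real codebase) might look like:

```python
def cpu_add(dtype, a, b):
    # Second dispatch: a plain switch on dtype inside the device kernel.
    if dtype == "float":
        return [float(x) + float(y) for x, y in zip(a, b)]
    elif dtype == "int":
        return [int(x) + int(y) for x, y in zip(a, b)]
    raise TypeError(f"cpu_add: unsupported dtype {dtype}")

# First dispatch: pick a kernel based on (device, layout). In C++ this is a
# virtual call, since the kernels may live in separately loaded libraries.
DISPATCH_TABLE = {("cpu", "strided"): {"add": cpu_add}}

def add(device, layout, dtype, a, b):
    return DISPATCH_TABLE[(device, layout)]["add"](dtype, a, b)

assert add("cpu", "strided", "int", [1, 2], [3, 4]) == [4, 6]
```

The key point the sketch preserves: the device/layout dispatch happens first and is dynamic, while the dtype dispatch is just a switch inside the chosen kernel.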

Since we have been talking about Tensor, I also want to take a little time to talk about the world of tensor extensions. After all, there's more to life than dense, CPU float tensors. There's all sorts of interesting extensions going on, like XLA tensors, or quantized tensors, or MKL-DNN tensors, and one of the things we have to think about, as a tensor library, is how to accommodate these extensions.

Our current model for extensions offers four extension points on tensors. First, there is the trinity of three parameters which uniquely determine what a tensor is:

  • The device, the description of where the tensor's physical memory is actually stored, e.g., on a CPU, on an NVIDIA GPU (cuda), or perhaps on an AMD GPU (hip) or a TPU (xla). The distinguishing characteristic of a device is that it has its own allocator, that doesn't work with any other device.
  • The layout, which describes how we logically interpret this physical memory. The most common layout is a strided tensor, but sparse tensors have a different layout involving a pair of tensors, one for indices, and one for data; MKL-DNN tensors may have even more exotic layout, like blocked layout, which can't be represented using merely strides.
  • The dtype, which describes what it is that is actually stored in each element of the tensor. This could be floats or integers, or it could be, for example, quantized integers.

If you want to add an extension to PyTorch tensors (by the way, if that's what you want to do, please talk to us! None of these things can be done out-of-tree at the moment), you should think about which of these parameters you would extend. The Cartesian product of these parameters define all of the possible tensors you can make. Now, not all of these combinations may actually have kernels (who's got kernels for sparse, quantized tensors on FPGA?) but in principle the combination could make sense, and thus we support expressing it, at the very least.

There's one last way you can make an "extension" to Tensor functionality, and that's write a wrapper class around PyTorch tensors that implements your object type. This perhaps sounds obvious, but sometimes people reach for extending one of the three parameters when they should have just made a wrapper class instead. One notable merit of wrapper classes is they can be developed entirely out of tree.

When should you write a tensor wrapper, versus extending PyTorch itself? The key test is whether or not you need to pass this tensor along during the autograd backwards pass. This test, for example, tells us that sparse tensors should be a true tensor extension, and not just a Python object that contains an indices tensor and a values tensor: when doing optimization on networks involving embeddings, we want the gradient generated by the embedding to be sparse.

Our philosophy on extensions also has an impact on the data layout of the tensor itself. One thing we really want out of our tensor struct is for it to have a fixed layout: we don't want fundamental (and very frequently called) operations like "What's the size of a tensor?" to require virtual dispatches. So when you look at the actual layout of a Tensor (defined in the TensorImpl struct), what we see is a common prefix of all fields that we consider all "tensor"-like things to universally have, plus a few fields that are only really applicable for strided tensors, but are so important that we've kept them in the main struct, and then a suffix of custom fields that can be done on a per-Tensor basis. Sparse tensors, for example, store their indices and values in this suffix.

I told you all about tensors, but if that was the only thing PyTorch provided, we'd basically just be a Numpy clone. The distinguishing characteristic of PyTorch when it was originally released was that it provided automatic differentiation on tensors (these days, we have other cool features like TorchScript; but back then, this was it!)

What does automatic differentiation do? It's the machinery that's responsible for taking a neural network:

...and filling in the missing code that actually computes the gradients of your network:

Take a moment to study this diagram. There's a lot to unpack; here's what to look at:

  1. First, rest your eyes on the variables in red and blue. PyTorch implements reverse-mode automatic differentiation, which means that we effectively walk the forward computations "backward" to compute the gradients. You can see this if you look at the variable names: at the bottom of the red, we compute loss; then, the first thing we do in the blue part of the program is compute grad_loss. loss was computed from next_h2, so we compute grad_next_h2. Technically, these variables which we call grad_ are not really gradients; they're really Jacobians left-multiplied by a vector, but in PyTorch we just call them grad and mostly everyone knows what we mean.
  2. While the structure of the code stays the same, the behavior doesn't: each line from forwards is replaced with a different computation, that represents the derivative of the forward operation. For example, the tanh operation is translated into a tanh_backward operation (these two lines are connected via a grey line on the left hand side of the diagram). The inputs and outputs of the forward and backward operations are swapped: if the forward operation produced next_h2, the backward operation takes grad_next_h2 as an input.

The whole point of autograd is to do the computation that is described by this diagram, but without actually ever generating this source. PyTorch autograd doesn't do a source-to-source transformation (though PyTorch JIT does know how to do symbolic differentiation).
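Written out by hand for a tiny computation, the pattern looks like this (a sketch in plain Python of what autograd records and replays for you; the variable names follow the diagram's grad_ convention but are otherwise ours):

```python
import math

x = 0.5
h = math.tanh(x)          # forward:  h = tanh(x)
loss = h * h              # forward:  loss = h^2

# Backward: each forward line gets a matching backward line, run in reverse,
# with inputs and outputs swapped.
grad_loss = 1.0                   # start from d(loss)/d(loss) = 1
grad_h = grad_loss * 2 * h        # backward of loss = h^2
grad_x = grad_h * (1 - h * h)     # "tanh_backward": d tanh(x)/dx = 1 - tanh(x)^2

# Sanity check against a finite-difference approximation of d(loss)/dx.
eps = 1e-6
numeric = (math.tanh(x + eps) ** 2 - math.tanh(x - eps) ** 2) / (2 * eps)
assert abs(grad_x - numeric) < 1e-6
```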

To do this, we need to store more metadata when we carry out operations on tensors. Let's adjust our picture of the tensor data structure: now instead of just a tensor which points to a storage, we now have a variable which wraps this tensor, and also stores more information (AutogradMeta), which is needed for performing autograd when a user calls loss.backward() in their PyTorch script.

This is yet another slide which will hopefully be out of date in the near future. Will Feng is working on a Variable-Tensor merge in C++, following a simple merge which happened to PyTorch's frontend interface.

We also have to update our picture about dispatch:

Before we dispatch to CPU or CUDA implementations, there is another dispatch on variables, which is responsible for unwrapping variables, calling the underlying implementation (in green), and then rewrapping the results into variables and recording the necessary autograd metadata for backwards.

Some implementations don't unwrap; they just call into other variable implementations. So you might spend a while in the Variable universe. However, once you unwrap and go into the non-Variable Tensor universe, that's it; you never go back to Variable (except by returning from your function.)
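A toy sketch of this wrap/unwrap layer (plain Python with our own names, not the actual C++ Variable machinery):

```python
class Variable:
    def __init__(self, data, grad_fn=None):
        self.data = data          # the underlying "tensor"
        self.grad_fn = grad_fn    # autograd metadata recorded on the way in

def raw_add(a, b):
    # The non-Variable kernel: once we're down here, we never see Variables.
    return a + b

def variable_add(va, vb):
    # Unwrap the variables, call the underlying implementation...
    out = raw_add(va.data, vb.data)
    # ...then rewrap the result, recording what's needed for backwards.
    return Variable(out, grad_fn=("AddBackward", va, vb))

r = variable_add(Variable(2.0), Variable(3.0))
assert r.data == 5.0 and r.grad_fn[0] == "AddBackward"
```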

In my NY meetup talk, I skipped the following seven slides. I'm also going to delay writeup for them; you'll have to wait for the sequel for some text.

Enough about concepts, let's look at some code.

PyTorch has a lot of folders, and there is a very detailed description of what they are in the CONTRIBUTING document, but really, there are only four directories you really need to know about:
  • First, torch/ contains what you are most familiar with: the actual Python modules that you import and use. This stuff is Python code and easy to hack on (just make a change and see what happens). However, lurking not too deep below the surface is...
  • torch/csrc/, the C++ code that implements what you might call the frontend of PyTorch. In more descriptive terms, it implements the binding code that translates between the Python and C++ universe, and also some pretty important pieces of PyTorch, like the autograd engine and the JIT compiler. It also contains the C++ frontend code.
  • aten/, short for "A Tensor Library" (coined by Zachary DeVito), is a C++ library that implements the operations of Tensors. If you're looking for where some kernel code lives, chances are it's in ATen. ATen itself bifurcates into two neighborhoods of operators: the "native" operators, which are modern, C++ implementations of operators, and the "legacy" operators (TH, THC, THNN, THCUNN), which are legacy, C implementations. The legacy operators are the bad part of town; try not to spend too much time there if you can.
  • c10/, which is a pun on Caffe2 and A"Ten" (get it? Caffe 10) contains the core abstractions of PyTorch, including the actual implementations of the Tensor and Storage data structures.

That's a lot of places to look for code; we should probably simplify the directory structure, but that's how it is. If you're trying to work on operators, you'll spend most of your time in aten.

Let's see how this separation of code breaks down in practice:

When you call a function like torch.add, what actually happens? If you remember the discussion we had about dispatching, you already have the basic picture in your head:

  1. We have to translate from Python realm to the C++ realm (Python argument parsing)
  2. We handle variable dispatch (VariableType--Type, by the way, doesn't really have anything to do with programming language types, and is just a gadget for doing dispatch.)
  3. We handle device type / layout dispatch (Type)
  4. We have the actual kernel, which is either a modern native function, or a legacy TH function.

Each of these steps corresponds concretely to some code. Let's cut our way through the jungle.

Our initial landing point in the C++ code is the C implementation of a Python function, which we've exposed to the Python side as something like torch._C.VariableFunctions.add. THPVariable_add is one such implementation.

One important thing to know about this code is that it is auto-generated. If you search in the GitHub repository, you won't find it, because you have to actually build PyTorch to see it. Another important thing is, you don't have to really deeply understand what this code is doing; the idea is to skim over it and get a sense for what it is doing. Above, I've annotated some of the most important bits in blue: you can see that there is a use of a class PythonArgParser to actually pull out C++ objects out of the Python args and kwargs; we then call a dispatch_add function (which I've inlined in red); this releases the global interpreter lock and then calls a plain old method on the C++ Tensor self. On its way back, we rewrap the returned Tensor back into a PyObject.

(At this point, there's an error in the slides: I'm supposed to tell you about the Variable dispatch code. I haven't fixed it here yet. Some magic happens, then...)

When we call the add method on the Tensor class, no virtual dispatch happens yet. Instead, we have an inline method which calls a virtual method on a "Type" object. This method is the actual virtual method (this is why I say Type is just a "gadget" that gets you dynamic dispatch.) In the particular case of this example, this virtual call dispatches to an implementation of add on a class named TypeDefault. This happens to be because we have an implementation of add that is the same for every device type (both CPU and CUDA); if we had happened to have different implementations, we might have instead landed on something like CPUFloatType::add. It is this implementation of the virtual method that finally gets us to the actual kernel code.

Hopefully, this slide will be out-of-date very soon too; Roy Li is working on replacing Type dispatch with another mechanism which will help us better support PyTorch on mobile.

It's worth reemphasizing that all of the code, until we got to the kernel, is automatically generated.

It's a bit twisty and turny, so once you have some basic orientation about what's going on, I recommend just jumping straight to the kernels.

PyTorch offers a lot of useful tools for prospective kernel writers. In this section, we'll walk through a few of them. But first of all, what do you need to write a kernel?

We generally think of a kernel in PyTorch as consisting of the following parts:

  1. First, there's some metadata which we write about the kernel, which powers the code generation and lets you get all the bindings to Python, without having to write a single line of code.
  2. Once you've gotten to the kernel, you're past the device type / layout dispatch. The first thing you need to write is error checking, to make sure the input tensors are the correct dimensions. (Error checking is really important! Don't skimp on it!)
  3. Next, we generally have to allocate the result tensor which we are going to write the output into.
  4. Time for the kernel proper. At this point, you now should do the second, dtype dispatch, to jump into a kernel which is specialized per dtype it operates on. (You don't want to do this too early, because then you will be uselessly duplicating code that looks the same in any case.)
  5. Most performant kernels need some sort of parallelization, so that you can take advantage of multi-CPU systems. (CUDA kernels are "implicitly" parallelized, since their programming model is built on top of massive parallelization).
  6. Finally, you need to access the data and do the computation you wanted to do!

In the subsequent slides, we'll walk through some of the tools PyTorch has for helping you implement these steps.
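Before looking at those tools, the steps above can be condensed into a miniature end-to-end sketch (pure Python with our own names; the real machinery is C++ plus code generation):

```python
def add_kernel(a, b, a_dtype, b_dtype):
    # 2. Error checking first: validate shapes and dtypes.
    if len(a) != len(b):
        raise ValueError(f"size mismatch: {len(a)} vs {len(b)}")
    if a_dtype != b_dtype:
        raise TypeError(f"dtype mismatch: {a_dtype} vs {b_dtype}")
    # 3. Allocate the result tensor.
    out = [None] * len(a)
    # 4. Dtype dispatch, as late as possible, into a specialized inner loop.
    impl = {"float": lambda x, y: float(x + y),
            "int": lambda x, y: int(x + y)}[a_dtype]
    # 5./6. (Parallelization elided here.) Access the data and compute.
    for i in range(len(a)):
        out[i] = impl(a[i], b[i])
    return out

assert add_kernel([1, 2], [3, 4], "int", "int") == [4, 6]
```

Notice how the dtype switch happens after the checks and allocation, which is exactly the ordering argued for in step 4.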

To take advantage of all of the code generation which PyTorch brings, you need to write a schema for your operator. The schema gives a mypy-esque type of your function, and also controls whether or not we generate bindings for methods or functions on Tensor. You also tell the schema what implementations of your operator should be called for given device-layout combinations. Check out the README in native for more information about this format.

You also may need to define a derivative for your operation in derivatives.yaml.

Error checking can be done by way of either a low level or a high level API. The low level API is just a macro, TORCH_CHECK, which takes a boolean, and then any number of arguments to make up the error string to render if the boolean is not true. One nice thing about this macro is that you can intermix strings with non-string data; everything is formatted using their implementation of operator<<, and most important data types in PyTorch have operator<< implementations.

The high level API saves you from having to write up repetitive error messages over and over again. The way it works is you first wrap each Tensor into a TensorArg, which contains information about where the tensor came from (e.g., its argument name). It then provides a number of pre-canned functions for checking various properties; e.g., checkDim() tests if the tensor's dimensionality is a fixed number. If it's not, the function provides a user-friendly error message based on the TensorArg metadata.
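A Python analogy of that high-level checking API (the class and function names here are ours, loosely modeled on the C++ TensorArg/checkDim described above):

```python
class TensorArg:
    """Wrap a tensor with metadata about where it came from."""
    def __init__(self, tensor, name, pos):
        self.tensor, self.name, self.pos = tensor, name, pos

def check_dim(fn_name, arg, expected_dim):
    # A pre-canned check: if it fails, the TensorArg metadata lets us
    # produce a friendly error naming the offending argument.
    actual = len(arg.tensor["sizes"])
    if actual != expected_dim:
        raise ValueError(
            f"{fn_name}: expected {expected_dim}-dim tensor for "
            f"argument #{arg.pos} '{arg.name}', but got {actual}-dim")

weight = TensorArg({"sizes": (2, 3)}, name="weight", pos=1)
check_dim("my_op", weight, 2)   # passes silently
try:
    check_dim("my_op", weight, 4)
except ValueError as e:
    assert "'weight'" in str(e)
```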

One important thing to be aware of when writing operators in PyTorch is that you are often signing up to write three operators: abs_out, which operates on a preallocated output (this implements the out= keyword argument), abs_, which operates inplace, and abs, which is the plain old functional version of an operator.

Most of the time, abs_out is the real workhorse, and abs and abs_ are just thin wrappers around abs_out; but sometimes writing a specialized implementation for each case is warranted.
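Sketched in Python (the bodies are our own illustration, not ATen code; the functional version is named abs_fn here only to avoid shadowing Python's builtin abs):

```python
def abs_out(result, x):
    # The workhorse: writes into a preallocated output (the out= variant).
    for i, v in enumerate(x):
        result[i] = v if v >= 0 else -v
    return result

def abs_(x):
    # Inplace variant: the input doubles as the output buffer.
    return abs_out(x, x)

def abs_fn(x):
    # Functional variant: allocate a fresh output, then delegate.
    return abs_out([0] * len(x), x)

assert abs_fn([-1, 2, -3]) == [1, 2, 3]
buf = [-5, 5]
abs_(buf)
assert buf == [5, 5]
```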

To do dtype dispatch, you should use the AT_DISPATCH_ALL_TYPES macro. This takes in the dtype of the tensor you want to dispatch over, and a lambda which will be specialized for each dtype that is dispatchable from the macro. Usually, this lambda just calls a templated helper function.

This macro doesn't just "do dispatch", it also decides what dtypes your kernel will support. As such, there are actually quite a few versions of this macro, which let you pick different subsets of dtypes to generate specializations for. Most of the time, you'll just want AT_DISPATCH_ALL_TYPES, but keep an eye out for situations when you might want to dispatch to some more types. There's guidance in Dispatch.h for how to select the correct one for your use-case.

On CPU, you frequently want to parallelize your code. In the past, this was usually done by directly sprinkling OpenMP pragmas in your code.

At some point, we have to actually access the data. PyTorch offers quite a few options for doing this.

  1. If you just want to get a value at some specific location, you should use TensorAccessor. A tensor accessor is like a tensor, but it hard codes the dimensionality and dtype of the tensor as template parameters. When you retrieve an accessor like x.accessor<float, 3>();, we do a runtime test to make sure that the tensor really is this format; but after that, every access is unchecked. Tensor accessors handle strides correctly, so you should prefer using them over raw pointer access (which, unfortunately, some legacy kernels do.) There is also a PackedTensorAccessor, which is specifically useful for sending an accessor over a CUDA launch, so that you can get accessors from inside your CUDA kernel. (One notable gotcha: TensorAccessor defaults to 64-bit indexing, which is much slower than 32-bit indexing in CUDA!)
  2. If you're writing some sort of operator with very regular element access, for example, a pointwise operation, you are much better off using a higher level of abstraction, the TensorIterator. This helper class automatically handles broadcasting and type promotion for you, and is quite handy.
  3. For true speed on CPU, you may need to write your kernel using vectorized CPU instructions. We've got helpers for that too! The Vec256 class represents a vector of scalars and provides a number of methods which perform vectorized operations on them all at once. Helpers like binary_kernel_vec then let you easily run vectorized operations, and then finish everything that doesn't round nicely into vector instructions using plain old instructions. The infrastructure here also manages compiling your kernel multiple times under different instruction sets, and then testing at runtime what instructions your CPU supports, and using the best kernel in those situations.

A lot of kernels in PyTorch are still written in the legacy TH style. (By the way, TH stands for TorcH. It's a pretty nice acronym, but unfortunately it is a bit poisoned; if you see TH in the name, assume that it's legacy.) What do I mean by the legacy TH style?

  1. It's written in C style, no (or very little) use of C++.
  2. It's manually refcounted (with manual calls to THTensor_free to decrease refcounts when you're done using tensors), and
  3. It lives in generic/ directory, which means that we are actually going to compile the file multiple times, but with different #define scalar_t.

This code is pretty crazy, and we hate reviewing it, so please don't add to it. One of the more useful tasks that you can do, if you like to code but don't know too much about kernel writing, is to port some of these TH functions to ATen.

To wrap up, I want to talk a little bit about working efficiently on PyTorch. If the largeness of PyTorch's C++ codebase is the first gatekeeper that stops people from contributing to PyTorch, the efficiency of your workflow is the second gatekeeper. If you try to work on C++ with Python habits, you will have a bad time: it will take forever to recompile PyTorch, and it will take you forever to tell if your changes worked or not.

How to work efficiently could probably be a talk in and of itself, but this slide calls out some of the most common anti-patterns I've seen when someone complains: "It's hard to work on PyTorch."

  1. If you edit a header, especially one that is included by many source files (and especially if it is included by CUDA files), expect a very long rebuild. Try to stick to editing cpp files, and edit headers sparingly!
  2. Our CI is a very wonderful, zero-setup way to test if your changes worked or not. But expect to wait an hour or two before you get back signal. If you are working on a change that will require lots of experimentation, spend the time setting up a local development environment. Similarly, if you run into a hard to debug problem on a specific CI configuration, set it up locally. You can download and run the Docker images locally.
  3. The CONTRIBUTING guide explains how to setup ccache; this is highly recommended, because sometimes it will help you get lucky and avoid a massive recompile when you edit a header. It also helps cover up bugs in our build system, when we recompile files when we shouldn't.
  4. At the end of the day, we have a lot of C++ code, and you will have a much more pleasant experience if you build on a beefy server with plenty of CPUs and RAM. In particular, I don't recommend doing CUDA builds on a laptop; building CUDA is sloooooow and laptops tend to not have enough juice to turnaround quickly enough.

So that's it for a whirlwind tour of PyTorch's internals! Many, many things have been omitted; but hopefully the descriptions and explanations here can help you get a grip on at least a substantial portion of the codebase.

Where should you go from here? What kinds of contributions can you make? A good place to start is our issue tracker. Starting earlier this year, we have been triaging issues; issues labeled triaged mean that at least one PyTorch developer has looked at them and made an initial assessment. You can use these labels to find out what issues we think are high priority, look up issues specific to some module (e.g., autograd), or find issues which we think are small (word of warning: we're sometimes wrong!)

Even if you don't want to get started with coding right away, there are many other useful activities like improving documentation (I love merging documentation PRs, they are so great), helping us reproduce bug reports from other users, and also just helping us discuss RFCs on the issue tracker. PyTorch would not be where it is today without our open source contributors; we hope you can join us too!

by Edward Z. Yang at May 17, 2019 02:11 AM

May 15, 2019

Edward Z. Yang

A short note about functional linear maps

Some notes collected from a close read of Conal Elliot's Compiling to Categories and The Simple Essence of Automatic Differentiation.

A colleague of mine was trying to define a "tree structure" of tensors, with the hope of thereby generalizing the concept to also work with tensors that have "ragged dimensions." Let's take a look:

Suppose we have a (2, 3) matrix:

tensor([[1, 2, 3],
        [4, 5, 6]])

One way to think about this is that we have a "tree" of some sort, where the root of the tree branches to two subnodes, and then each subnode branches to three nodes:

       /- ROOT -\
  ROW 1          ROW 2
 /  |  \        /  |  \
1   2   3      4   5   6

Suppose you wanted to define this data structure in Haskell. One obvious way of going about doing this is to just say that a matrix is just a bunch of nested lists, [[Float]]. This works, true, but it isn't very illuminating, and it is certainly not type safe. Type safety could be achieved with sized vectors, but we are still left wondering, "what does it mean?"

Often, inductive definitions fall out of how we compose things together, in the same way that the inductive data structure for a programming language tells us how we take smaller programs and put them together to form a larger program. With matrices, we can think of a pictorial way of composing them, by either attaching matrices together vertically or horizontally. That gives us this vocabulary for putting together matrices, which would let us (non-uniquely) represent every matrix (Compiling to Categories, Section 8):

data Matrix
  = Scalar Float
  | Horizontal Matrix Matrix
  | Vertical Matrix Matrix

But what does it mean? Well, every matrix represents a linear map: if A : (n, m) is your matrix, the linear map is the function R^m -> R^n defined by f(x) = A x. (We'll call a linear map from a to b Linear a b.) So the question we ask now is, what does it mean to "paste" two matrices together? It's a way of composing two linear maps together into a new linear map:

-- A function definition does not a category make!  You have to
-- prove that the resulting functions are linear.

horizontal :: Linear a c -> Linear b c -> Linear (a, b) c
horizontal f g = \(a, b) -> f a + g b

-- In matrix form:
--              [ a ]
-- [ F  |  G ]  [ - ] = [ F a + G b ]
--              [ b ]

vertical :: Linear a c -> Linear a d -> Linear a (c, d)
vertical f g = \a -> (f a, g a)

-- In matrix form:
-- [ F ]         [ F a ]
-- [ - ] [ a ] = [  -  ]
-- [ G ]         [ G a ]

Now we're cooking! Notice that the pasting shows up in the type of the linear map: if we paste horizontally, that just means that the vectors this linear map takes in have to be pasted together (with the tuple constructor); similarly, if we paste vertically, we'll produce output vectors that are the pasted results.

Cool, so we can add some type indexes, and write Linear as a GADT to refine the indices when you apply the constructor:

data Linear a b where
  Scalar :: Float -> Linear Float Float
  Horizontal :: Linear a c -> Linear b c -> Linear (a, b) c
  Vertical :: Linear a c -> Linear a d -> Linear a (c, d)

Is this the end of the story? Not quite. There are many ways you can go about combining linear maps; for example, you could (literally) compose two linear maps together (in the same sense of function composition). It's true that you can paste together any matrix you like with the data type above; how do we decide what should and shouldn't go in our language of linear maps?

To this end, Conal Elliot calls on the language of category theory to adjudicate. A category should define identity and function composition:

identity :: Linear a a
identity a = a

-- In matrix form: the identity matrix

compose :: Linear b c -> Linear a b -> Linear a c
compose g f = \a -> g (f a)

-- In matrix form: matrix multiply

We find that Horizontal and Vertical are the elimination and introduction operations of cocartesian and cartesian categories (respectively).

But should we just slap Identity and Compose constructors onto our data type? Linear map composition is a computationally interesting operation: if we just keep it around as syntax (rather than doing what is, morally, a matrix multiply), then it will be quite expensive to do operations on the final linear map. Where do representable functors come in? I'm not exactly sure how to explain this, and I've run out of time for this post; stay tuned for a follow-up.

by Edward Z. Yang at May 15, 2019 09:42 PM

May 14, 2019

Neil Mitchell

Shake with Applicative Parallelism

Summary: Shake now does that Applicative trick from Haxl.

In Shake 0.17.9 and below, need xs >> need ys builds xs in parallel, then afterwards builds ys in parallel. The same is true of need xs *> need ys, where *> is the applicative equivalent of >>. From Shake 0.18 onwards both versions run everything in parallel. Hopefully that makes some Shake-based build systems go faster.

What change is being made?

If you make two calls to apply without any IO, monadic-bind or state operations in between then they will be executed as though you had made a single call to apply. As examples, need, askOracle and getDirectoryFiles are all calls to apply under the hood, so can be merged. However, note that the invariants are somewhat subtle. Something as simple as:

myNeed xs = do putNormal "Needing here"; need xs

will not be merged with a preceding need: the function putNormal queries the state (what is the verbosity level?), does IO, and contains a monadic bind.

Why are you making this change?

I am making the change for two reasons: 1) people have kept asking for it since Haxl does it; 2) the Hadrian build probably benefits from it. The downsides are relatively low (more complexity inside Shake, slightly slower Action operations) but the benefits are potentially large.

Why didn't you make this change sooner?

My previous reasoning for not making the change was:

Shake could follow the Haxl approach, but does not, mainly because they are targeting different problems. In Haxl, the operations are typically read-only, and any single step is likely to involve lots of operations. In contrast, with Shake the operations definitely change the file system, and there are typically only one or two per rule. Consequently, Shake opts for an explicit approach, rather than allow users to use *> (and then inevitably add a comment because it's an unusual thing to do).

I stand by that comment - explicit grouping of need or explicit use of parallel is often better - all it takes is a sneaky >>= and the parallelism disappears. But if this change improves some build times, it's hard to argue strongly against.

Will it break any build systems?

Potentially, but unlikely, and those it will break were already on thin ice. As some examples:

  • If a depends on some state change from b (e.g. creating a directory), but doesn't have a dependency on it, then need [a] >> need [b] might have worked, while need [a,b] might not. The correct solution is for a to depend on b, if it does in fact depend on b, or at the very least use orderOnly.
  • If you use getDirectoryFiles on generated files (something the documentation says is a bad idea) then if merged with the thing that generates the files you will get incoherent results. The solution is to avoid using getDirectoryFiles on generated files.

Thanks to Pepe Iborra for encouraging, testing and troubleshooting this change.

by Neil Mitchell ( at May 14, 2019 08:41 PM

Roman Cheplyaka

How (not) to convert CDouble to Double

What’s wrong with the following code?

If you use QuickCheck to test the equivalence of Acos.acos and Prelude.acos, you’ll quickly find a counterexample:

You might think this is a difference in the semantics of Haskell acos vs C acos, but the acos(3) manpage disproves that:

If x is outside the range [-1, 1], a domain error occurs, and a NaN is returned.

Moreover, you’ll notice the discrepancy only when compiling the Haskell program with -O0. If you compile with -O1 or higher, both versions will result in NaN. So what’s going on here?

What turns the NaN into Infinity is realToFrac, which is defined as fromRational . toRational.

Unlike Double, Rational, which is defined as a ratio of two Integers, has no way to represent special values such as NaN. Instead, toRational (acos 1.1) results in a fraction with some ridiculously large numerator, which turns into Infinity when converted back to Double.

When you compile with -O1 or higher, the following rewrite rules fire and avoid the round trip through Rational:

Unfortunately, the Haskell 2010 Report doesn’t give you any reliable way to convert between Double and CDouble. According to the Report, CDouble is an abstract newtype, about which all you know is the list of instances, including Real and Fractional. So if you want to stay portable, realToFrac seems to be the only solution available.

However, if you only care about GHC and its base library (which pretty much everyone is using nowadays), then you can take advantage of the fact that the constructor of the CDouble newtype is exported. You can use coerce from Data.Coerce or apply the data constructor CDouble directly.

So here’s a reliable, but not portable, version of the Acos module above:
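A reconstruction along the lines the post describes (acosD stands in for the module's acos; coerce relies on the GHC-specific fact that CDouble is a newtype around Double whose constructor is in scope):

```haskell
import Data.Coerce (coerce)
import Foreign.C (CDouble (..))

foreign import ccall "math.h acos"
  c_acos :: CDouble -> CDouble

-- GHC-only: coerce the CDouble -> CDouble function directly to
-- Double -> Double, with no Rational round trip.
acosD :: Double -> Double
acosD = coerce c_acos
```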

May 14, 2019 08:00 PM

Brent Yorgey

Lightweight, efficiently sampleable enumerations in Haskell

For another project I’m working on, I needed a way to enumerate and randomly sample values from various potentially infinite collections. There are plenty of packages in this space, but none of them quite fit my needs:

  • universe (and related packages) is very nice, but it’s focused on enumerating values of Haskell data types, not arbitrary sets: since it uses type classes, you have to make a new Haskell type for each thing you want to enumerate. It also uses actual Haskell lists of values, which doesn’t play nicely with sampling.
  • enumerable has not been updated in a long time and seems to be superseded by universe.
  • enumerate is likewise focused on generating values of Haskell data types, with accompanying generic deriving machinery.
  • size-based is used as the basis for the venerable testing-feat library, but these are again focused on generating values of Haskell data types. I’m also not sure I need the added complexity of size-indexed enumerations.
  • enumeration looks super interesting, and I might be able to use it for what I want, but (a) I’m not sure whether it’s maintained anymore, and (b) it seems rather more complex than I need.

I really want something like Racket’s nice data/enumerate package, but nothing like that seems to exist in Haskell. So, of course, I made my own! For now you can find it on GitHub.1 Here’s the package in a nutshell:

  • Enumerations are represented by the parameterized type Enumeration, which is an instance of Functor, Applicative, and Alternative (but not Monad).
  • Enumerations keep track of their cardinality, which could be either countably infinite or a specific natural number.
  • Enumerations are represented as functions from index to value, so they can be efficiently indexed (which also enables efficient random sampling).
  • The provided combinators will always do something sensible so that every value in the resulting enumeration occurs at a finite index. For example, if you take the disjoint union of two infinite enumerations, the resulting enumeration will alternate between values from the two inputs.
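A minimal sketch of what a representation matching those bullets might look like (hypothetical; the real package's types and names may differ):

```haskell
data Cardinality = Finite Integer | Countable
  deriving (Eq, Show)

-- An enumeration is its cardinality plus an index function, so both
-- indexing and uniform random sampling cost one call to select.
data Enumeration a = Enumeration
  { card   :: Cardinality
  , select :: Integer -> a   -- defined for 0 <= i < card
  }

-- Build a finite enumeration from a list (illustrative helper)
finiteList :: [a] -> Enumeration a
finiteList xs =
  Enumeration (Finite (fromIntegral (length xs)))
              (\i -> xs !! fromIntegral i)
```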

I wrote about something similar a few years ago. The main difference is that in that post I limited myself to only finite enumerations. There’s a lot more I could say but for now I think I will just show some examples:

>>> enumerate empty
>>> enumerate unit
>>> enumerate $ empty <|> unit <|> unit

>>> enumerate $ finite 4 >< finiteList [27,84,17]

>>> select (finite 4000000000000 >< finite 123456789) 0
>>> select (finite 4000000000000 >< finite 123456789) 196598723084073
>>> card (finite 4000000000000 >< finite 123456789)
Finite 493827156000000000000

>>> :set -XTypeApplications
>>> enumerate $ takeE 26 . dropE 65 $ boundedEnum @Char

>>> take 10 . enumerate $ nat >< nat
>>> take 10 . enumerate $ cw
[1 % 1,1 % 2,2 % 1,1 % 3,3 % 2,2 % 3,3 % 1,1 % 4,4 % 3,3 % 5]

>>> take 15 . enumerate $ listOf nat

data Tree = L | B Tree Tree
  deriving (Eq, Show)

trees :: Enumeration Tree
trees = infinite $ singleton L <|> B <$> trees <*> trees

>>> take 3 . enumerate $ trees
[L,B L L,B L (B L L)]
>>> select trees 87239862967296
B (B (B (B (B L L) (B (B (B L L) L) L)) (B L (B L (B L L)))) (B (B (B L (B L (B L L))) (B (B L L) (B L L))) (B (B L (B L (B L L))) L))) (B (B L (B (B (B L (B L L)) (B L L)) L)) (B (B (B L (B L L)) L) L))

treesOfDepthUpTo :: Int -> Enumeration Tree
treesOfDepthUpTo 0 = singleton L
treesOfDepthUpTo n = singleton L <|> B <$> t' <*> t'
  where t' = treesOfDepthUpTo (n-1)

>>> card (treesOfDepthUpTo 0)
Finite 1
>>> card (treesOfDepthUpTo 1)
Finite 2
>>> card (treesOfDepthUpTo 3)
Finite 26
>>> card (treesOfDepthUpTo 10)
>>> select (treesOfDepthUpTo 10) (2^50)
B L (B L (B L (B (B L (B (B L (B (B L L) L)) (B (B (B (B L L) (B L L)) (B L (B L L))) (B (B (B L L) L) (B (B L L) L))))) (B (B (B (B (B (B L L) L) (B (B L L) L)) (B L L)) (B (B (B (B L L) L) (B L (B L L))) (B (B (B L L) (B L L)) L))) (B (B (B (B L L) (B L L)) (B (B (B L L) L) L)) (B (B L L) (B (B (B L L) L) (B (B L L) L))))))))

Comments, questions, suggestions for additional features, etc. are all very welcome!

  1. I chose the name enumeration before I realized there was already a package of that name on Hackage! So now I have to come up with another name that’s not already taken. Suggestions welcome…↩

by Brent at May 14, 2019 07:48 PM

Gabriel Gonzalez

Release early and often


This post summarizes the virtues of cutting frequent releases for software projects. You might find this post useful if you are trying to convince others to release more frequently (such as your company or an open source project you contribute to).

Easing migration

Frequent releases provide a smoother migration path for end-users of your software.

For example, suppose that your software is currently version “1.0” and you have two breaking changes (“A” and “B”) that you plan to make. Now consider the following two release strategies:

  • Release strategy #0 - More frequent releases

    * Version 1.0 - Initial release

    * Version 2.0 - Breaking change “A”

    * Version 3.0 - Breaking change “B”

  • Release strategy #1 - Less frequent releases

    * Version 1.0 - Initial release

    * Version 2.0 - Breaking changes “A” and “B”
The first release strategy is better from the end-user’s point of view because they have the option to upgrade in two smaller steps. In other words, they can upgrade from version 1.0 to version 2.0 and then upgrade from version 2.0 to version 3.0 at a later date.

Under both release strategies users can elect to skip straight to the latest release if they are willing to pay down the full upgrade cost up front, but releasing more frequently provides users the option to pay down the upgrade cost in smaller installments. To make an analogy: walking up a staircase is easier than scaling a sheer cliff of the same height.

In particular, you want to avoid the catastrophic scenario where a large number of users refuse to upgrade if one release bundles too many breaking changes. The textbook example of this is the Python 2 to 3 upgrade where a large fraction of the community refused to upgrade because too many breaking changes were bundled into a single release instead of spread out over several releases.

Keeping trains running on time

You don’t need to delay a release to wait for a particular feature if you release frequently. Just postpone the change for the next release if it’s not ready. After all, if you release frequently then the next release is right around the corner.

Conversely, if you release infrequently, you will frequently run into the following vicious cycle:

  • Important feature X is close to completion but perhaps not quite ready to merge

    Perhaps the feature has insufficient tests or there are unresolved concerns during code review

  • A new release is about to be cut

    Should you wait to merge feature X? It might be a long time (3 months?) before the next release, even though X could be ready with just 1 more week of work.

  • You choose to delay the release to wait for important feature X

  • Now another important feature Y requests to also slip in before the release

    … further delaying the release

  • 3 months have passed and you still haven’t cut the release

    New features keep (justifiably) slipping in out of fear that they will have to otherwise wait for the next release

Eventually you do cut a release, but each iteration of this process decreases the release frequency and compounds the problem. The less frequently you release software the more incentive to slip in last-minute changes before the release cutoff, further delaying the release. Even worse, the longer you wait to cut each release the greater the pressure to compromise on quality to get the release out the door.

Sticking to a strict and frequent release schedule staves off this vicious cycle because then you can always safely postpone incomplete features to the next release.

Avoiding “crunch time”

Infrequent “big bang” releases create pressure for developers to work excessive hours in the lead up to a release cutoff. This can happen even when developers are unpaid, such as on open source projects: the peer pressure of holding up the release for others can induce people to work unhealthy schedules they wouldn’t work otherwise.

I won’t claim that frequent release schedules will prevent paid developers from working late nights and weekends, but at least management can’t hide behind a looming release deadline to justify the overtime.

Accelerating the feedback loop

Releases are opportunities to correct course because you don’t know how users will react to a feature until you put the feature into their hands. If you implement a feature and the next release is 3 months away, that’s 3 months where you don’t know if the feature is what the user actually needs.

Even worse: suppose that the first implementation of the feature does not do what the user wants: now you have to wait another 3 months to get the next iteration of the feature into their hands. That slow feedback loop is a recipe for a poorly-designed product.

Incentivizing automation

Fast release cycles force you to automate and accelerate release-related processes that you would otherwise do manually (i.e. continuous integration), including:

  • Testing
  • Publication of software artifacts
  • Collecting quality and health metrics

That automation in turn means that you spend more time in the long run developing features and less time delivering them to end users.


Releasing more frequently isn’t free: as the previous section suggests, you need to invest in automation to be able to make frequent releases a reality.

However, I do hope that people reading this post will recognize when symptoms of infrequent releases creep up on them so that they can get ahead of them and make the case to others to invest in improving release frequency.

by Gabriel Gonzalez ( at May 14, 2019 01:48 PM

Donnacha Oisín Kidney

Implicit Corecursive Queues

Posted on May 14, 2019
Tags: Haskell


I was looking again at one of my implementations of breadth-first traversals:

And I was wondering if I could fuse away the intermediate list. On the following line:

The xs : bw is a little annoying, because we know it’s going to be consumed eventually by a fold. When that happens, it’s often a good idea to remove the list, and just inline the fold. In other words, if you see the following:

You should replace it with this:
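The two snippets are missing here; the general shape of the transformation is the following (an illustration, not the post's exact code). A list built and then immediately consumed by a fold:

```haskell
-- Build a list, then fold it away: the list is pure overhead.
sumTo :: Int -> Int
sumTo n = foldr (+) 0 (go n)
  where
    go 0 = []
    go k = k : go (k - 1)

-- Inline the fold into the producer, removing the intermediate list:
-- (:) becomes (+) and [] becomes 0.
sumTo' :: Int -> Int
sumTo' n = go n
  where
    go 0 = 0
    go k = k + go (k - 1)
```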

If you try and do that with the above definition, you get something like the following:

Infinite Types

The trouble is that the above comes with type errors:

Cannot construct the infinite type: b ~ (b -> c) -> [a]

This error shows up occasionally when you try and do heavy church-encoding in Haskell. You get a similar error when trying to encode the Y combinator:

• Occurs check: cannot construct the infinite type: t0 ~ t0 -> t

The solution for the y combinator is to use a newtype, where we can catch the recursion at a certain point to help the typechecker.
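For reference, the standard newtype trick for the Y combinator looks something like this (fix' is a stand-in name):

```haskell
-- The newtype ties the recursive knot that the infinite type b ~ b -> a
-- would otherwise require.
newtype Rec a = Rec { unRec :: Rec a -> a }

fix' :: (a -> a) -> a
fix' f = (\x -> f (unRec x x)) (Rec (\x -> f (unRec x x)))
```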

The trick for our queue is similar:

This is actually equivalent to the continuation monad:


There’s a problem though: this algorithm never checks for an end. That’s ok if there isn’t one, mind you. For instance, with the following “unfold” function:

We can write a decent enumeration of the rationals.

However, if we do want to stop at some point, we need a slight change to the queue type.


We can actually add in a monad to the above unfold without much difficulty.

And it passes the torture tests for a linear-time breadth-first unfold from Feuer (2015). It breaks when you try and use it to build a tree, though.


Finally, we can try and make the above code a little more modular, by actually packaging up the queue type as a queue.

At this point, our type is starting to look a lot like the Phases type from Noah Easterly’s tree-traversals package. This is exciting: the Phases type has the ideal interface for level-wise traversals. Unfortunately, it has the wrong time complexity for <*> and so on: my suspicion is that the queue type above here is to Phases as the continuation monad is to the free monad. In other words, we’ll get efficient construction at the expense of no inspection. Unfortunately, I can’t figure out how to turn the above type into an applicative. Maybe in a future post!

Finally, a lot of this is working towards finally understanding Smith (2009) and Allison (2006).

Allison, Lloyd. 2006. “Circular Programs and Self-Referential Structures.” Software: Practice and Experience 19 (2) (October): 99–109. doi:10.1002/spe.4380190202.

Feuer, David. 2015. “Is a lazy, breadth-first monadic rose tree unfold possible?” Question. Stack Overflow.

Smith, Leon P. 2009. “Lloyd Allison’s Corecursive Queues: Why Continuations Matter.” The Monad.Reader 14 (14) (July): 28.

by Donnacha Oisín Kidney at May 14, 2019 12:00 AM

May 13, 2019


Integrated versus Manual Shrinking

TL;DR: Even with integrated shrinking, you still have to think about shrinking. There is no free lunch. Also, important new Hedgehog release!

Property-based testing is an approach to software testing where instead of writing tests we generate tests, based on properties that the software should have. To make this work, we need to be able to generate test data and, when we find a counter example, we need to shrink that test data to attempt to construct a minimal test case.

In Haskell, the library QuickCheck has long been the library of choice for property based testing, but recently another library called Hedgehog has been gaining popularity. One of the key differences between these two libraries is that in QuickCheck one writes explicit generation and shrinking functions, whereas in Hedgehog shrinking is integrated in generation. In this blog post we will explain what that means by developing a mini-QuickCheck and mini-Hedgehog and compare the two. We will consider some examples where integrated shrinking gives us benefits over manual shrinking, but we we will also see that the belief that integrated shrinking basically means that we can forget about shrinking altogether is not justified. There is no such thing as a free shrinker.

The release of this blog post coincides with release 1.0 of Hedgehog. This is an important update which, amongst lots of other goodies, includes many bug fixes and improvements to shrinking based on earlier drafts of this blog post. Upgrading is strongly recommended.

This blog post is not intended as an introduction to property-based testing. We will assume the reader has at least a superficial familiarity with setting up property based tests (in QuickCheck, Hedgehog, or otherwise). If you want to follow along, the code we present here is available from GitHub.


In this section we will develop a mini-QuickCheck interface which will enable us to study how shrinking works in QuickCheck's manual approach. Although many readers will be more familiar with this than they might be with the integrated shrinking approach, understanding how shrinking works exactly can be quite subtle and so we will spend a bit of time here to set up our running examples. We will then come back to these examples when we look at integrated shrinking in the next section.


When we want to test a property requiring input data of type a, we have to write a generator that produces random elements of type a. In order to be able to do that we need access to some kind of pseudo-random number generator, and so we define the type of generators for type a as

where R is some module providing PRNGs; in this blog post we will use System.Random for simplicity’s sake1. Gen forms a monad; in return we simply ignore the PRNG, and in (>>=) we split the PRNG into two:
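A sketch of the Gen type and its Monad instance as just described (the post uses System.Random's StdGen; here a toy splittable seed stands in so the snippet is self-contained):

```haskell
-- Toy stand-in for a splittable PRNG state such as R.StdGen
newtype Seed = Seed Int

splitSeed :: Seed -> (Seed, Seed)
splitSeed (Seed s) =
  (Seed (s * 6364136223846793005 + 1442695040888963407), Seed (s + 1))

newtype Gen a = Gen { runGen :: Seed -> a }

instance Functor Gen where
  fmap f (Gen g) = Gen (f . g)

instance Applicative Gen where
  pure x = Gen (\_ -> x)             -- 'return' simply ignores the PRNG
  gf <*> gx = gf >>= \f -> fmap f gx -- the implied instance

instance Monad Gen where
  Gen g >>= f = Gen $ \prng ->       -- split the PRNG into two
    let (l, r) = splitSeed prng
    in runGen (f (g l)) r
```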

(the Applicative instance is then the implied one). Technically speaking this breaks the monad laws since

but we can argue that this satisfies the monad laws “up to choice of PRNG”, which is modelling randomness anyway and should not be observable2.


Generating random test data is not sufficient. For example, consider testing the property that “for any pair (x, y), the sum x + y must be zero”. Clearly this property does not hold, and a good generator will easily find a counter-example. However, the counter-example we find might not be minimal; for instance, we might find the counter-example (28,89). It is therefore important that we can shrink counter-examples to construct minimal test cases, just like one might do when testing something by hand.3 In this example, a minimal test case might be (0, 1) or (1, 0).

In QuickCheck’s manual approach to shrinking, shrinking is modelled by a function that produces possible smaller values from a given value; we package up the generator and the shrinking together4
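The packaged-up type looks something like this (a sketch; Gen is the PRNG-function type described above, reduced here to a minimal stand-in so the snippet is self-contained):

```haskell
-- Stand-in for the Gen type from earlier in the post
newtype Gen a = Gen (Int -> a)

-- A manual generator pairs generation with a shrinking function
-- that produces candidate smaller values.
data Manual a = Manual
  { gen    :: Gen a
  , shrink :: a -> [a]
  }
```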

Primitive generators

As a very simple first example, consider generating boolean values, shrinking True to False:

It is important that values don’t shrink to themselves; when we are trying to find a counter-example, we will shrink the test case until we can’t shrink any more; if a value would shrink to itself, this process would loop indefinitely.

As a slightly more involved example, consider writing a generator for a positive integer in the range (0, hi):
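The shrinking half of that generator is the interesting part; it might look like this (a sketch; the generation half just picks a uniform value in the range):

```haskell
-- A value shrinks to its half and to its predecessor -- and never
-- to itself, since shrinking must terminate.
shrinkWord :: Word -> [Word]
shrinkWord x = [ x `div` 2 | x > 2 ] ++ [ x - 1 | x > 0 ]
```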

In the generator we simply pick a random value5, and in the shrinker we return half the value and one less than the value. Consider testing the property that “all numbers are less than 12”. If we start with the counter example 72, this will quickly shrink to 36, then 18, and then shrink more slowly to 17, 16, 15, 14, 13 and finally 12, which is indeed the minimal counter-example. (Note that a more realistic version of shrinkWord will try numbers in a different order for improved efficiency6.)

Generating pairs

Although Gen is a monad, Manual is not (indeed, it’s not even a functor). When we compose Manual instances together we must manually compose the generator (easy, since we have a Monad interface available) and the shrinker (harder). For example, here is a generator for pairs:
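The shrinking half of such a pair generator might look like this (a sketch; the generation half is just the applicative composition of the two gens):

```haskell
-- Shrinking a pair: first try shrinking the left component,
-- then the right -- this ordering is the source of the slight
-- bias discussed below.
shrinkPair :: (a -> [a]) -> (b -> [b]) -> (a, b) -> [(a, b)]
shrinkPair shrinkA shrinkB (x, y) =
     [ (x', y) | x' <- shrinkA x ]
  ++ [ (x, y') | y' <- shrinkB y ]
```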

First attempting to shrink the left element introduces a slight bias. For example, consider again the example “for all pairs (x, y), the sum x + y is zero”. Starting from a counter-example (9, 11), due to this bias shrinking will shrink the first component

(9,11) ⇝ (4,11) ⇝ (2, 11) ⇝ (1, 11) ⇝ (0, 11)

and then the second component

(0, 11) ⇝ (0, 5) ⇝ (0, 2) ⇝ (0, 1)

Thus, no matter what counter-example we start with, we will always reduce that counter-example to (0, 1), not (1, 0) (unless the original counter-example happens to have a zero in the second component, of course).

In practice, however, this bias is not usually a concern, since we can shrink either the left or the right at every step in the shrinking process.7 For example, consider the property “for all pairs (x, y), x < y”. Starting with the counter example (8, 6), we will first shrink the first component

(8, 6) ⇝ (7, 6) ⇝ (6, 6)

At this point we cannot shrink the left component any further, and so we shrink the right component instead

(6, 6) ⇝ (6, 3)

Now we can shrink the left component again, and shrinking continues in this “interleaved” fashion

(6, 3) ⇝ (3, 3) ⇝ (3, 1) ⇝ (1, 1) ⇝ (1, 0) ⇝ (0, 0)

We’re putting so much emphasis on this ordering because this will become a concern once we start looking at integrated shrinking.

Recursive data types

As a simple example of generating recursive data types, we will consider how to generate lists of an arbitrary length:

The generator is straight-forward: we generate an arbitrary length n, and then use the standard monadic replicateM combinator to generate n elements.

The shrinker is more interesting: not only can we shrink any of the elements of the list, like we did for pairs, but now we can also drop elements from the list altogether. At every step it either drops an element or shrinks an element, using the function pickOne to choose an element:
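The list shrinker might look like this (a sketch; the exact ordering of drop versus shrink candidates may differ from the post's):

```haskell
-- Select each element in turn, together with the elements before
-- and after it.
pickOne :: [a] -> [([a], a, [a])]
pickOne []       = []
pickOne (x : xs) = ([], x, xs)
                 : [ (x : pre, y, suf) | (pre, y, suf) <- pickOne xs ]

-- Either drop one element, or shrink one element in place.
shrinkList :: (a -> [a]) -> [a] -> [[a]]
shrinkList shrinkA xs =
     [ pre ++ suf      | (pre, _, suf) <- pickOne xs ]
  ++ [ pre ++ x' : suf | (pre, x, suf) <- pickOne xs, x' <- shrinkA x ]
```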

Consider how this shrinker works for the property “all elements of a list are greater than or equal to the length of the list”. Suppose the original counter-example we find is [5,2,65]; this list will shrink as follows:

[5,2,65] ⇝ [2,2,65] ⇝ [1,2,65] ⇝ [1,65] ⇝ [0,65] ⇝ [0]

The length of this list is 3, and so the element that violates the property is 2. However, if we were to drop any element from this list, the length would become 2, and so no matter which element we would drop, we would not have a counter-example to the property anymore. We must therefore shrink one of the elements first; mList tries them in order, and so we shrink the first one to 2 and then to 1. At this point we can drop the 2 from the list because the resulting list [1, 65] has length 2 and so the element 1 still violates the property. This process repeats one more time, interleaving dropping elements with shrinking elements, until we reach the minimal counter example [0].


The final example we will consider is how to generate elements satisfying a given predicate. We will first define a simple helper function that runs a monadic action as often as needed8 to generate a value satisfying a predicate:

This in hand, we can write a filter combinator as follows:
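The combinator might look like this (a self-contained sketch; Gen and Manual are minimal stand-ins for the post's types, and the toy repeatUntil simply steps the seed until the predicate holds):

```haskell
newtype Gen a = Gen { runGen :: Int -> a }  -- toy PRNG state: Int

data Manual a = Manual { gen :: Gen a, shrink :: a -> [a] }

-- Rerun the generator (with fresh seeds) until the predicate holds
repeatUntil :: (a -> Bool) -> Gen a -> Gen a
repeatUntil p (Gen g) = Gen go
  where go s = let x = g s in if p x then x else go (s + 1)

mSuchThat_ :: (a -> Bool) -> Manual a -> Manual a
mSuchThat_ p m = Manual
  { gen    = repeatUntil p (gen m)      -- generate-then-test
  , shrink = filter p . shrink m        -- discard failing shrinks
  }
```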

For the generator we repeat the generator until we hit on an element that satisfies the predicate, and for the shrinker we filter out any shrunk elements that don’t satisfy the predicate.

Although this combinator is not wrong, and occasionally useful, it is not always optimal. Consider using this filter to generate even numbers:

Suppose we are testing the property that “all even numbers are less than 5”, and we start with the counter example 88; this will now shrink as follows:

88 ⇝ 44 ⇝ 22

and then shrink no further. The problem is that 22 can only shrink to either 11 or 21, neither of which are even, and so mSuchThat_ filters both of them out, leaving us with no further shrink steps.

There are two solutions to this problem. One is to define a variant on mSuchThat_ that instead of removing a shrunk value that doesn’t satisfy the predicate, instead shrinks it again, in the hope of finding even smaller values that do satisfy the predicate:
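The shrinking half of that variant might look like this (a sketch, written over a plain shrink function):

```haskell
-- A shrunk value that fails the predicate is shrunk again rather
-- than discarded, so shrinking can step "through" failing values.
shrinkSuchThat :: (a -> Bool) -> (a -> [a]) -> (a -> [a])
shrinkSuchThat p shrinkA = go
  where
    go x = concatMap (\x' -> if p x' then [x'] else go x')
                     (shrinkA x)
```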

If we use this combinator instead, the same counter example now shrinks

88 ⇝ 44 ⇝ 22 ⇝ 10 ⇝ 8 ⇝ 6

because 22 shrinks to 11 (which is not even), which in turn shrinks to 5 (not even) and 10 (even), and we end up with 6, which is indeed the smallest even number which is not less than 5.

The alternative solution is not to use filter at all. Instead of generate-then-test, we can write a generator that produces even numbers by construction by generating any number and then multiplying it by two:
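A sketch of both halves (with the generation side reduced to the pure doubling step, so the snippet is self-contained); note how evenness logic appears in both places:

```haskell
-- Generation: double any generated value, so the result is even
-- by construction.
genEven :: Word -> Word
genEven = (* 2)

-- Shrinking: halve, shrink the half, and double back, so every
-- shrunk candidate is even again.
shrinkEven :: Word -> [Word]
shrinkEven x = map (* 2) (shrinkHalf (x `div` 2))
  where
    shrinkHalf y = [ y `div` 2 | y > 2 ] ++ [ y - 1 | y > 0 ]
```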

While the generator is simple, the shrinker is not, and we have logic for “evenness” both in the generator and in the shrinker. As we will see later, this is an example where integrated shrinking has clear benefits.

Integrated shrinking

It is now time to turn our attention to integrated shrinking. The key idea is straight-forward enough: instead of having the generator producing a single value, it will instead produce a tree of values. The root of the tree will correspond to the original value produced, the immediate children of the root correspond to the immediate shrink steps from the root, and so on.

where Tree here means “rose tree”: trees with an arbitrary number of children at every step:
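The types might be sketched as follows (Seed stands in for the PRNG state the post builds on):

```haskell
type Seed = Int   -- stand-in for the splittable PRNG state

-- An integrated generator produces a whole shrink tree at once
newtype Integrated a = Integrated (Seed -> Tree a)

-- A rose tree: a root plus any number of subtrees
data Tree a = Node { root :: a, subtrees :: [Tree a] }
```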

For example, here is the tree that corresponds to the shrinker that we defined in mWord, shrinking a value x to half x or x - 1:

5
├─ 2
│  └─ 1
│     └─ 0
└─ 4
   ├─ 2
   │  └─ 1
   │     └─ 0
   └─ 3
      ├─ 1
      │  └─ 0
      └─ 2
         └─ 1
            └─ 0

Primitive examples

The easiest way to write primitive generators (generators not defined in terms of other generators) is to translate from a manual generator to an integrated one, constructing the tree by repeatedly applying the shrink function:

where unfoldTree builds a tree from a root and a function to construct the immediate children of that root:
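The tree-building half can be sketched as:

```haskell
data Tree a = Node a [Tree a]
  deriving (Eq, Show)

-- Build the shrink tree by repeatedly applying the shrink function:
-- the root is the unshrunk value, the children are its shrinks.
unfoldTree :: (a -> [a]) -> a -> Tree a
unfoldTree f x = Node x (map (unfoldTree f) (f x))
```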

For example, we can write integrated shrinkers for Bool and Word using


For primitive generators integrated shrinking provides little benefit, but once we start composing generators things get more interesting. We can equip Integrated with an Applicative instance:

For pure we just return a singleton tree, but the case for (<*>) is more complicated. After we split the PRNG into two and use it, we end up with a Tree (a -> b) of functions and a Tree a of arguments, and need to construct a Tree b of results.

How might we combine these two trees? Remember that these trees are shrink trees: the roots are the unshrunk values, and the subtrees are different ways in which we can shrink those values. Thus, to combine the “left” tree of functions and the “right” tree of arguments, the root of the new tree will combine the unshrunk root of both trees, and then shrink either the function in the left tree or an argument from the right tree, much like we did for pairs above in mPair:
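The interleaving itself can be sketched as follows (the Applicative instance then splits the PRNG, runs both generators, and interleaves the two resulting trees):

```haskell
data Tree a = Node a [Tree a]
  deriving (Eq, Show)

instance Functor Tree where
  fmap f (Node x ts) = Node (f x) (map (fmap f) ts)

-- The new root applies the unshrunk function to the unshrunk
-- argument; each child shrinks either the left or the right tree.
interleave :: Tree (a -> b) -> Tree a -> Tree b
interleave l@(Node f ls) r@(Node x rs) =
  Node (f x) $
       [ interleave l' r | l' <- ls ]   -- shrink the function side first
    ++ [ interleave l r' | r' <- rs ]   -- then the argument side
```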

Just like mPair, this has a slight bias because it shrinks the left argument first, but in practice this bias does not matter too much.

Laws. We should verify that this definition of interleave is correct; that is, satisfies the laws for Applicative. This boils down to showing that

In the repository there is a Coq file that verifies these laws. Note that these laws are true whether we shrink the left tree first or the right one.

Example: generating pairs

The above description of the Applicative instance is rather abstract, so let’s consider a concrete example. We can write a combinator for generating pairs using

(Indeed, such combinators are so simple that there is no need to provide them explicitly; the Applicative interface suffices.) Let’s consider what happens when we use this to generate a pair of a boolean and a number. Suppose the boolean we pick is True, which has shrink tree

└─ False

and the number we pick is 2, with shrink tree

└─ 1
   └─ 0

We first fmap the function (,) over that first tree to end up with the tree

└─ (False,_)

When we then interleave these two trees, the final result is

├─ (False,2)
│  └─ (False,1)
│     └─ (False,0)
└─ (True,1)
   ├─ (False,1)
   │  └─ (False,0)
   └─ (True,0)
      └─ (False,0)

Note how this tree matches our intuition precisely: we start with the unshrunk value (True, 2); this has two immediate children, one first shrinking the bool (False, 2) and one first shrinking the number (True, 1). If we do first shrink the number, we again have the choice to shrink the bool or the number first.

The advantage of the applicative interface is that this is not restricted to pairs, but can be used for any number of elements. For example, we can write a generator for triples using

If we start with the shrink trees

└─ False

└─ 0

└─ 'a'

then the final interleaved tree will be

├─ (False,1,'b')
│  ├─ (False,0,'b')
│  │  └─ (False,0,'a')
│  └─ (False,1,'a')
│     └─ (False,0,'a')
├─ (True,0,'b')
│  ├─ (False,0,'b')
│  │  └─ (False,0,'a')
│  └─ (True,0,'a')
│     └─ (False,0,'a')
└─ (True,1,'a')
   ├─ (False,1,'a')
   │  └─ (False,0,'a')
   └─ (True,0,'a')
      └─ (False,0,'a')

Notice how this models that we can pick any element in the triple to reduce at any given moment.


Using the applicative interface to generate lists of a fixed length is easy. Indeed, we can use a standard combinator on Applicative:9

to define

However, there is no way to use the Applicative interface to write a generator for lists of an arbitrary size. The problem is that in order to do that, we first need to generate the length, and then depending on the value n of the length that we pick, run the generator for the elements n times. This kind of dependency between generators is impossible using only an Applicative interface; instead, we need a Monad interface. This is however where trouble starts.

Joining trees

Suppose we have a shrink tree corresponding to the length of the list

and a function

for some a that produces a shrink tree for the list itself given a length. The natural thing to try is to apply function f at every length

This gives us a tree of trees: for every value n in len, the corresponding shrink tree for lists of length n. The only thing left to do is to collapse this tree-of-trees into a tree. This is a standard combinator on monads called join. For trees, we can implement it as follows:
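One way to write it is the following (joinTree is a stand-in name; the ordering of the children matches the behaviour discussed below, trying the outer value's shrinks before the inner ones):

```haskell
data Tree a = Node a [Tree a]
  deriving (Eq, Show)

-- Keep the outer root's value; its children are first the joined
-- shrinks of the outer tree, then the shrinks of the inner tree.
joinTree :: Tree (Tree a) -> Tree a
joinTree (Node (Node x xs) ts) = Node x (map joinTree ts ++ xs)
```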

Laws. As for the Applicative interface, join should satisfy a number of laws:

The Coq file in the repo contains proofs of these properties.

However, we will not equip Integrated with a monad instance. In order to understand why, let’s suppose that it did have a monad instance. We could then write this alternative definition of iPair:

This looks deceptively simple, and almost identical to iPair, but iPair and iPairWRONG have very different behaviour. Starting from the tree ((,) <$> genA)

we get the tree

(True,2) ⇝ (True,1) ⇝ (True,0)
└─ (False,2) ⇝ (False,1) ⇝ (False,0)

which after join looks like

Monad                  Applicative
(True,2)               (True,2)
├─ (False,2)           ├─ (False,2)
│  └─ (False,1)        │  └─ (False,1)
│     └─ (False,0)     │     └─ (False,0)
└─ (True,1)            └─ (True,1)
   └─ (True,0)            ├─ (False,1)
                          │  └─ (False,0)
                          └─ (True,0)
                             └─ (False,0)

where for comparison we have reproduced the shrink tree we got using the Applicative interface on the right. Notice the difference between the two trees: after we shrink the number, we do not go back to shrink the boolean anymore. This is important to remember: in a generator of the form

x >>= f

as soon as we start shrinking f we will not go back to shrink x. We can’t; since f is a function, we must first decide on a value of x before we can do anything with f x.

This has real consequences for testing. For example, consider the property that “for all pairs (x, y), x < y”. If we start with a counter example (80, 57), this will shrink as follows:

(80, 57) ⇝ (79, 57) ⇝  .. ⇝  (57, 57) ⇝ (57, 28) ⇝ .. ⇝  (57, 0)

When we reach (57, 57), we cannot shrink the first component anymore, since (56, 57) isn’t a counter-example to the property. However, as soon as we start shrinking the second component, we will not go back to the first component anymore, and so we end up with (57, 0) as our rather poor “minimal” counter-example.

We have a choice in join in the order of the subtrees; we could have defined it like

For our example tree from above, this results in the tree

├─ (True,1)
│  └─ (True,0)
└─ (False,2)
   └─ (False,1)
      └─ (False,0)

Although this version of join still satisfies the monad laws, it is strictly worse. As before, when shrinking x >>= f, as soon as we start shrinking f, we will not go back anymore to shrink x. But worse, rather than trying to shrink x first, we will now first try to shrink f! This means that for the same property above, if we started with the counter example (80, 57), we would end up with the counter-example (80, 0). Even less “minimal” than before.

Dependent generators

Let’s go back to writing a generator for lists of an arbitrary length. Let’s suppose we did have a Monad instance available for Integrated. Writing a generator for lists is then easy:

If integrated shrinking really meant that we could mostly just forget about shrinking, this definition would be fine. However, it isn’t. Just like iPairWRONG above, this shrinker shrinks very poorly. Part of the problem is actually very similar to the problem with iPairWRONG. Consider checking the property that “all lists are sorted”, and suppose the initial counter-example we find is [81,27]; this will shrink as follows:

When we reach [28,27] we cannot shrink the first element any further, and so we start shrinking the next, never returning to the first element anymore. This problem is easily fixed though; the elements of the list are clearly independent from each other, and so we can use our iListOfSize function from above instead; this uses the Applicative interface and does not introduce this unwanted ordering between shrinking the elements of the list:

The dependency on the length however is a real dependency, and so cannot be removed. Although iListWRONG' shrinks a bit better than iListWRONG, it can still result in non-minimal counter-examples. For example, if for the same property (all lists are sorted) the initial counter-example we find is [28,66,13], we cannot first shrink the length, because the shorter list [28,66] is sorted. However, after we then start shrinking the elements of the list we never go back to try to shrink the length again, ending up with the non-minimal counter-example [0,1,0].

Freezing the tree

So how do we fix it? If the implied shrinker for dependent generators is not good, we need a way to override it and define our own. In order to do that, we need to be able to manipulate the shrink trees explicitly. We therefore introduce a useful function called freeze:

This changes an integrated shrinker into a simple (“manual”) shrinker for trees; this is the key step that makes it possible to manipulate shrink trees explicitly. We will also find it useful to define a variant on freeze which just throws away any subtrees, leaving only the unshrunk root:
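A sketch of freeze and the root-only variant, using toy stand-ins for the post's types (the Seed PRNG and the name dontShrink are mine, not necessarily the post's):

```haskell
-- Toy stand-ins; a real implementation would use a proper
-- splittable PRNG such as splitmix
newtype Seed = Seed Int

data Tree a = Node { root :: a, subtrees :: [Tree a] }

newtype Gen a = Gen { runGen :: Seed -> a }          -- manual-style generator
newtype Integrated a = Integrated (Seed -> Tree a)   -- integrated generator

-- Expose the shrink tree: turn an integrated generator into a
-- manual generator of trees
freeze :: Integrated a -> Gen (Tree a)
freeze (Integrated f) = Gen f

-- Variant that throws away all subtrees, keeping only the
-- unshrunk root
dontShrink :: Integrated a -> Gen a
dontShrink (Integrated f) = Gen (root . f)
```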

We can use these two combinators to define an explicit generator for lists of trees:

This is almost identical to our naive first attempt iListWRONG, but produces a list of shrink trees instead of a list of elements. In order to turn this into a proper shrink tree for lists of elements, we need to turn a list of trees into a tree of lists, and this operation corresponds precisely to the shrinker for lists that we defined back when we considered generating lists in the manual approach (mList, above):
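A sketch of that tree-level operation (I call it interleave here; pickOne enumerates the ways of focusing on one element): a list shrinks either by dropping one element or by shrinking one element, mirroring mList:

```haskell
data Tree a = Node { root :: a, subtrees :: [Tree a] }

-- All ways of focusing on one element of a list
pickOne :: [a] -> [([a], a, [a])]
pickOne []       = []
pickOne (x : xs) = ([], x, xs)
                 : [ (x : as, b, cs) | (as, b, cs) <- pickOne xs ]

-- Turn a list of shrink trees into a shrink tree of lists:
-- shrink by dropping one element, or by shrinking one element
interleave :: [Tree a] -> Tree [a]
interleave ts = Node (map root ts) $
     [ interleave (as ++ cs)
     | (as, _t, cs) <- pickOne ts ]
  ++ [ interleave (as ++ [t'] ++ cs)
     | (as, t, cs) <- pickOne ts, t' <- subtrees t ]
```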

All that’s left now is to turn a dependent generator that explicitly manipulates shrink trees back into a regular integrated generator:

leaving us with the following generator for lists:

Derived generators

At this point you might be wondering whether all of this is worth it: the combination of Integrated and freeze seems like a roundabout way to reintroduce manual shrinking, all that just to get back to where we were in mini-QuickCheck. That is however overly pessimistic.

Suppose we have some application-specific datatype that counts elements:

and suppose we want to write a generator for this. One approach is to apply the same set of steps that we did in the previous section. We can write a function

and a specialized shrinking function for Count

and finally

This is however a fair bit of work; interleaveCount in particular is non-trivial (see repo). However, if we have a function

we can avoid all this work and piggy-back on the generator for lists:

and we’re done with a one-line generator. This is much more difficult to do in QuickCheck. Although it is easy to generate a list and then generate the Count from that, when it comes to shrinking we don’t have the list available anymore and so we can’t piggy-back on the shrinker for lists there. We can introduce what’s known as a “shrink wrapper”

which pairs a Count value with the list that generated it; if we do that, then we can piggy-back on the generator and shrinker for lists:

However, this approach does not compose: if we have some other datatype that expects a Count as a subterm, we cannot use a WrapCount term there. This limits the applicability of this pattern rather severely. In integrated shrinking, however, this is really easy to do; a clear win.

Caution. Defining the generator for Count in terms of the one for lists is really only a valid approach if all Count values can be generated by some list. If this is not the case, your tests don’t cover all cases. This should be stated and tested separately.

Monad instance

What if we don’t care about shrinking, or feel that the implied shrinker is okay, even for dependent generators? To support just this use case we can introduce a Dependent alias which does have the Monad instance available10:

where the corresponding Applicative instance is the implied one.

When we define dependent generators we often want to use integrated ones, and so it is useful to “lift” an integrated shrinker to a dependent one:

Going in the other direction however is unsafe, as we have seen; unless we take special precautions, the implied shrinking behaviour of dependent generators is very poor:
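Sketching the alias and the two conversions with the same toy stand-ins as before (the names dependent and unsafeIntegrated are my guesses, not necessarily the post's):

```haskell
newtype Seed = Seed Int
data Tree a = Node { root :: a, subtrees :: [Tree a] }

-- Same representation; only the type records which interfaces
-- are safe to use
newtype Integrated a = Integrated (Seed -> Tree a)  -- Applicative only
newtype Dependent  a = Dependent  (Seed -> Tree a)  -- Monad available

-- Lifting an integrated generator to a dependent one is always safe
dependent :: Integrated a -> Dependent a
dependent (Integrated g) = Dependent g

-- The other direction is unsafe: the implied shrinker of a
-- dependent generator may be very poor
unsafeIntegrated :: Dependent a -> Integrated a
unsafeIntegrated (Dependent g) = Integrated g
```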


As our final example, we will reconsider filtering in the context of integrated shrinking. Like generating lists, filtering requires a Monad interface. After all, the effects we need (how often we generate a value) depends on the value that we generated previously. If we did have a Monad instance for Integrated available, we could simply define

As for iListWRONG, this function looks simple, but like iListWRONG it is wrong; in fact, this one is unusably wrong. Remember what the monad interface does: it applies a function at every level in the shrink tree. In iSuchThatWRONG, the function we apply checks the predicate and, if it fails, reruns the generator. This means that as soon as we shrink a value to something that no longer satisfies the predicate, we start over from scratch, pick an entirely new value (possibly even larger than what we started with), and repeat ad nauseam.

What we want to do, of course, is first generate the shrink tree, and then filter out the elements from that tree that don’t satisfy the predicate. When we discussed this in the context of manual shrinking, we mentioned that we had two possibilities: either stop as soon as we find an element that doesn’t satisfy the predicate, or else recursively apply shrinking in the hope of finding an even smaller element that does satisfy the predicate. Translated to trees, the former corresponds to

and the latter to
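Both variants can be sketched on the shrink-tree type as follows (the names filterTree_ and filterTree are mine; the first stops at a failing value, the second keeps searching its subtrees):

```haskell
import Data.Maybe (mapMaybe)

data Tree a = Node { root :: a, subtrees :: [Tree a] }

-- The former: stop as soon as a value fails the predicate,
-- discarding its entire subtree
filterTree_ :: (a -> Bool) -> Tree a -> Maybe (Tree a)
filterTree_ p (Node x xs)
  | p x       = Just (Node x (mapMaybe (filterTree_ p) xs))
  | otherwise = Nothing

-- The latter: when a value fails, keep searching its subtrees
-- for smaller values that do satisfy the predicate
filterTree :: (a -> Bool) -> Tree a -> [Tree a]
filterTree p (Node x xs)
  | p x       = [Node x (concatMap (filterTree p) xs)]
  | otherwise = concatMap (filterTree p) xs
```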

Defining filtering is now easy; since we want to explicitly manipulate the shrink tree, freeze comes in handy again:11

As an example use case, consider once more generating even numbers. As for manual shrinking, we have two options: generate-then-test or generate-even-by-construction. For the former, we can do

where we must use iSuchThat instead of iSuchThat_ (for the same reason that mEvenWRONG was wrong). For the latter, we can do

If we compare this to mEven', we can see that integrated shrinking here again gives us a clear advantage. In the manual case we had to reason about “evenness” in both the shrinker and the generator; no such duplication of logic happens here.


In this blog post we have compared the manual approach to shrinking from QuickCheck with the integrated approach from Hedgehog. There are many other differences between these two libraries that we have completely ignored here, and I can strongly recommend watching Jacob Stanley’s excellent Lambda Jam talk Gens N’ Roses: Appetite for Reduction. Even if you have no intention of switching from QuickCheck to Hedgehog, many of the gotchas of QuickCheck that Jacob mentions in that talk are well worth thinking about.

One of the problems that Jacob mentions in his talk is a social one: most QuickCheck users simply don’t write shrinkers. Indeed, this is true. Part of the problem comes from the fact that the QuickCheck type class Arbitrary has a default implementation of shrink that returns the empty list. This means that by default the values you generate don’t shrink at all. This is clearly not good.

Unfortunately, it is not obvious that integrated shrinking solves this social problem. As we have seen, the implicitly defined shrinkers for dependent generators (generators that require the monad interface) are very poor (iListWRONG), and in some cases even unusable (iSuchThatWRONG). It simply isn’t the case that we don’t have to think about shrinking, no matter which approach we use. Perhaps it can be argued that a default shrinker that shrinks a bit is better than one that doesn’t shrink at all; but not by much. Admittedly, for generators that use only the Applicative interface the implied shrinker is fine, but this case is easy in QuickCheck also (it corresponds precisely to the behaviour of genericShrink).

Perhaps we can construct integrated generators with good shrinkers by writing them in clever ways. It is however not obvious how to do this, even for the relatively simple case of lists. This is an interesting topic of future work.

  1. In reality System.Random is a poor choice, and we should choose something different such as splitmix.

  2. A more obvious type for the generator might have been

    which would allow us to thread the PRNG through. We don’t do this because we would lose laziness; for example, when generating a pair of values, we would not be able to generate the second value until we finished generating the first. This can make testing much slower, and makes it impossible to generate infinite values.

  3. To some degree we can reduce the need for shrinking by trying small counter examples first; both QuickCheck and Hedgehog do this, though Hedgehog’s approach using first-class “ranges” is arguably nicer here. However, this is not sufficient. It is often the case that the probability that a larger test case hits a given bug is disproportionally larger than the probability that a small test case does, and we are therefore more likely to find bugs in larger test cases than smaller ones.

  4. QuickCheck uses type-classes instead of explicit records; for simplicity and to keep the comparison with Hedgehog as focussed as possible, we will not do that in this blog post.

  5. Picking a random value uniformly in the range (0, hi) might not be the best choice; we may wish to generate “edge cases” such as 0 with a higher probability. Moreover, if we want to generate smaller test cases first, we’d also do that here.

  6. This shrinker is sub-optimal; it will use binary search if the minimum test case happens to be near zero, but linear search if the value happens to be near the upper end of the range. The shrinkers in this blog post are intended to illustrate how QuickCheck and Hedgehog work under the hood, not as examples of how to write good shrinkers.

  7. Sometimes the bias is a problem. For example, consider the property “for all pairs (x, y), x /= y”. If we start with a counter-example, say, (46, 46), we could only shrink this if we shrink both components at the same time. We can write shrinkers like this, but in general there are O(2^n) possible combinations of values to choose to shrink together when given n values, which makes shrinking much too costly.

  8. In reality we will want to impose a maximum number of iterations here and give up if we cannot find an element satisfying the predicate within that bound.

  9. In recent versions of base the function replicateM has this signature; we define this custom combinator for the sake of this blog post because the difference between the Monad and Applicative interface to the integrated shrinkers is crucial.

  10. In Hedgehog no distinction is made at the type level between generators satisfying the Applicative instance and the Monad instance, and so it is up to the programmer to make sure not to use the Monad instance where the Applicative instance would suffice, or to override the shrinker when the Monad instance is required. This can result in poor shrinkers, a problem which might materialize only much later when a bug is found and suddenly a test case does not shrink properly.

  11. The use of head and fromJust in these definitions is justified by the fact that we know that the very root of the tree must satisfy the predicate.

by edsko at May 13, 2019 10:10 AM

Joachim Breitner

Artsy desktop background

Friends of mine took part in a competition where they had to present an art project of theirs using a video. At some point we had the plan of creating a time lapse video of a drawing being created, and for that mounted a camera above the drawing desk.

With paper


In the end we did not actually use the video, but it turns out that the still from the beginning (with blank paper) and the end of the video (no paper) are pretty nice, too. So I am sharing them here, in case anyone wants to use them as a desktop background or what not.

Without paper


Feel free to re-use these photos under the terms of the Creative Commons Attribution 4.0 International License.

by Joachim Breitner at May 13, 2019 06:37 AM

May 11, 2019

Donnacha Oisín Kidney

Concatenative Programming; The Free Monoid of Programming Languages

Posted on May 11, 2019

This post demonstrates a simple encoding of a (typed) concatenative language in Haskell.

Point-free style is one of the distinctive markers of functional programming languages. Want to sum a list? That’s as easy as:

Now I want to sum every number after adding one to it.
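The two definitions alluded to were presumably along these lines (reconstructions; the names are mine):

```haskell
-- Sum a list, point-free
sum' :: Num a => [a] -> a
sum' = foldr (+) 0

-- Sum every number after adding one to it
sumSucc :: Num a => [a] -> a
sumSucc = foldr (+) 0 . map (+ 1)
```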

One more step to make this function truly abstract™ and general™: we’ll allow the user to supply their own number to add
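The fully point-free attempt was presumably `foldr (+) 0 . map . (+)` (a reconstruction consistent with the inferred type in the error message quoted next); for contrast, here is a version that keeps the point and does what was intended:

```haskell
-- The tempting fully point-free attempt, which does NOT type check:
--   sumThoseThat = foldr (+) 0 . map . (+)

-- What was actually intended:
sumThoseThat :: Num a => a -> [a] -> a
sumThoseThat n = foldr (+) 0 . map (n +)
```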

And here the trouble begins. The above expression won’t actually type check. In fact, it’ll give a pretty terrible error message:

• Non type-variable argument in the constraint: Num [a]
  (Use FlexibleContexts to permit this)
• When checking the inferred type
    sumThoseThat :: forall a.
                    (Num [a], Foldable ((->) [a])) =>
                    a -> [a]

I remember as a beginner being confused by similar messages. What’s FlexibleContexts? I had thought that the “point-free style” just meant removing the last variable from an expression if it’s also the last argument:

Why doesn’t it work here?

Well, it doesn’t work because the types don’t line up, but I’m going to try and explain a slightly different perspective on the problem, which is associativity.

To make it a little clearer, let’s see what happens when we point-fill the expression:

Indeed, the problem is the placement of the parentheses. What we want at the end is:

But, no matter. We have to jiggle the arguments around, or we could use something terrible like this:
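One plausible reconstruction of the “terrible” version: route the second argument through by composing with a section of (.) itself:

```haskell
-- Point-free, type-correct, and thoroughly unreadable: compose
-- foldr (+) 0 "underneath" both arguments of map . (+)
sumThoseThat :: Num a => a -> [a] -> a
sumThoseThat = (foldr (+) 0 .) . (map . (+))
```

Here sumThoseThat 1 adds one to each element before summing, as intended.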

Is there something, though, that could do this automatically?


We run into a similar problem in Agda. We’re forever having to prove statements like:

There are a couple of ways to get around the issue, and for monoids there’s a rich theory of techniques. I’ll just show one for now, which relies on the endomorphism monoid. This monoid is created by partially applying the monoid’s binary operator:

And you can get back to the underlying monoid by applying it to the neutral element:

Here are the important parts: first, we can lift the underlying operation into the endomorphism:

And second, it’s definitionally associative.
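In Haskell this endomorphism monoid is Endo from Data.Monoid; a sketch of the lift and lower maps (the names inj and lower are mine):

```haskell
import Data.Monoid (Endo (..))

-- Lift an element into the endomorphism monoid by partially
-- applying the binary operation
inj :: Semigroup a => a -> Endo a
inj x = Endo (x <>)

-- Recover the underlying element by applying to the neutral element
lower :: Monoid a => Endo a -> a
lower e = appEndo e mempty

-- Composition of Endo values is function composition, which is
-- associative by definition
```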

These are all clues as to how to solve the composition problem in the Haskell code above. We need definitional associativity, somehow. Maybe we can get it from the endomorphism monoid?


You’re probably familiar with Haskell’s state monad:
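The definition alluded to is presumably the classic one (a hand-rolled sketch; the real thing lives in transformers, where State s is StateT s Identity):

```haskell
-- A state computation threads a state value through, returning
-- a result alongside the updated state
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State (\s -> let (a, s') = g s in (f a, s'))

instance Applicative (State s) where
  pure a = State (\s -> (a, s))
  State f <*> State g = State $ \s ->
    let (h, s')  = f s
        (a, s'') = g s'
    in (h a, s'')

instance Monad (State s) where
  State g >>= k = State $ \s ->
    let (a, s') = g s in runState (k a) s'

get :: State s s
get = State (\s -> (s, s))

put :: s -> State s ()
put s = State (\_ -> ((), s))
```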

It can help a lot when you’re threading around fiddly accumulators and so on.

Of course, these days state is a transformer:

This lets us stack multiple effects on top of each other: error handling, IO, randomness, even another state monad. In fact, if you do stack another state monad on top, you might be surprised by the efficiency of the code it generates:

It’s nothing earth shattering, but it inlines and optimises well. That output is effectively a left-nested list, also.


If we can do one, and we can do two, why not more? Can we generalise the state pattern to an arbitrary number of variables? First we’ll need a generic tuple:

Then, the state type.

We can actually clean the definition up a little: instead of a tuple at the other end, why not push it onto the stack.

In fact, let’s make this as polymorphic as possible. We should be able to change the state if we so desire.

And suddenly, our endomorphism type from above shows up again.

We can, of course, get back our original types.

And it comes with all of the instances you might expect:


But what’s the point? So far we’ve basically just encoded an unnecessarily complicated state transformer. Think back to the stacking of states. Written in the mtl style, the main advantage of stacking monads like that is that you can write code like the following:

In other words, we don’t care about the rest of m, we just care that it has, somewhere, state for an [a].

This logic should apply to our stack transformer, as well. If it only cares about the top two variables, it shouldn’t care what the rest of the list is. In types:

And straight away we can write some of the standard combinators:

You’ll immediately run into trouble if you try to work with some of the more involved combinators, though. Quote should have the following type, for instance:

But GHC complains again:

• Illegal polymorphic type: xs :-> ys
  GHC doesn't yet support impredicative polymorphism
• In the type signature:
    quote :: (xs :-> ys) -> '[] :-> '[xs :-> ys]

I won’t go into the detail of this particular error: if you’ve been around the block with Haskell you know that it means “wrap it in a newtype”. If we do that, though, we get yet more errors:

• Couldn't match type ‘ys ++ zs0’ with ‘ys ++ zs’
  Expected type: Stack (xs ++ zs) -> Stack (ys ++ zs)
    Actual type: Stack (xs ++ zs0) -> Stack (ys ++ zs0)
  NB: ‘++’ is a type function, and may not be injective

This injectivity error comes up often. It means that GHC needs to prove that the input to two functions is equal, but it only knows that their outputs are. This is a doubly serious problem for us, as we can’t do type family injectivity on two type variables (in current Haskell). To solve the problem, we need to rely on a weird mishmash of type families and functional dependencies:

And it does indeed work:

Interestingly, these combinators represent the monadic operations on state (dot = join, pure = pure, etc.)

And can we get the nicer composition of the function from the intro? Kind of:

Here are some references for concatenative languages: Okasaki (2002), Purdy (2012), Kerby (2007), Okasaki (2003).

Kerby, Brent. 2007. “The Theory of Concatenative Combinators.”\%7Eiepos/joy.html.

Okasaki, Chris. 2002. “Techniques for embedding postfix languages in Haskell.” In Proceedings of the ACM SIGPLAN workshop on Haskell - Haskell ’02, 105–113. Pittsburgh, Pennsylvania: ACM Press. doi:10.1145/581690.581699.

———. 2003. “THEORETICAL PEARLS: Flattening combinators: Surviving without parentheses.” Journal of Functional Programming 13 (4) (July): 815–822. doi:10.1017/S0956796802004483.

Purdy, Jon. 2012. “The Big Mud Puddle: Why Concatenative Programming Matters.” The Big Mud Puddle.

by Donnacha Oisín Kidney at May 11, 2019 12:00 AM

May 10, 2019

Ken T Takusagawa

[efkabuai] Always have an escape plan

We consider a situation in which an internet site anticipates that it might disappear from its current address, perhaps because of censorship.  In preparation for potentially moving to a yet undecided location, we propose a method of preemptively publishing "digital breadcrumbs" (referencing Hansel and Gretel) on its current site as a way for users to find the next location if or when the site moves.  The site encourages users to save a local copy of the breadcrumbs.

The published digital breadcrumbs consist of two parts: a public key and the specification of a deterministic pseudorandom word generator including its random seed.

As an example, we'll specify digital breadcrumbs for this blog.

Here is the public key for this blog, with fingerprint 2C95 B41D A4CE 7C5F 0110 C27A 561D DCBA 1BDA E2F7:

Version: GnuPG v1


Messages signed with a site's public key should contain the text Serial: NNN where NNN is a string in the format of a Debian software package version number, roughly epoch:version, where the epoch: is optional and version is a dotted number string.  (Inspired by the serial number of DNS zone records.  Some sites use serial numbers like 20190228 to ensure the serial number monotonically increases.  This would be nicer if dots were allowed: 2019.02.28.)  These serial numbers can be compared as version numbers as specified by Debian.  Having comparable serial numbers establishes a total ordering on messages, useful if later messages need to invalidate older messages.

Here is the first message signed with our public key, giving an example serial number.  This message is otherwise uselessly empty.

Hash: SHA1

Serial: 0.2015.6
Version: GnuPG v1


We leave unsolved the standard hard problems of Public Key Infrastructure: what to do if your private key is compromised, what to do if you lose access to your private key.

We now turn our attention to the random word generator.

To combat yet unknown countermeasures an adversarial censor might deploy, the digital breadcrumbs provide only hints toward the many possibilities where the next site might be.  Exactly how generated random words specify the site's next location is deliberately left unstated in order to flexibly route around the yet unknown censorship: perhaps DNS, perhaps it should be used as a web search term, perhaps a key into some censorship-resistant medium.  It is the responsibility of the user, the recipient of the breadcrumbs, to try things and figure it out.  The infinite stream of random words provides many choices in case some of the earlier words are not available (perhaps due to censorship).  The public key allows the user to verify, perhaps even in an automated fashion, that he or she has found the right new site.

(Previous vaguely related idea: search the entire internet for the latest "message" signed by a public key.)

(Unfortunately, the random words won't directly be usable as addresses for things like Onion hidden services or Freenet CHK because for those, you don't have the ability to choose your own address.  Freenet KSK might be usable, at least until spammers overwrite it.)

A cryptographically secure pseudorandom number generator (PRNG) is not strictly necessary; however, we use one in the sample implementation below because cryptographic primitives are standardized, and therefore widely available and easily portable.

Two unrelated entities (perhaps even adversaries) can use exactly the same random word generator, perhaps because they chose the same seed.  We expect this to be only a slight inconvenience, because their different public keys can be used to distinguish them.  They can also choose different words in the same random stream.

Here is the random word generator for this blog, suggesting a template for others.  The random word generator is implemented in Haskell, similar to this stream cipher example.

Hash: SHA1

{- This code is public domain. -}
{- Serial: 0.2019.1 -}
{-# LANGUAGE PackageImports #-}
module Main where {
import "cipher-aes" Crypto.Cipher.AES as AES;
import qualified Data.ByteString as ByteString;
import Data.Word (Word8);
import Data.Maybe (catMaybes);
import qualified Crypto.Hash.SHA256 as SHA256;
import Data.Text.Encoding (encodeUtf8);
import qualified Data.Text as Text;
import qualified Data.List as List;

url :: String;
url = "";

main :: IO();
main = putStr $ catMaybes $ map filt27 $ to_bytes $ aes_stream
$ AES.initAES $ SHA256.hash $ encodeUtf8 $ Text.pack url;

aes_stream :: AES.AES -> [ByteString.ByteString];
aes_stream key = List.unfoldr (Just . (next_block key)) zero;

next_block :: AES.AES -> AES.AESIV ->
(ByteString.ByteString, AES.AESIV);
next_block key iv = AES.genCounter key iv 1; {- Although we ask
for just 1 byte, genCounter will return a full 16-byte block. -}

{- Initial counter value -}
zero :: AES.AESIV;
zero = AES.aesIV_ $ ByteString.replicate 16 0;

to_bytes :: [ByteString.ByteString] -> [Word8];
to_bytes = concat . map ByteString.unpack;

filt27 :: Word8 -> Maybe Char;
filt27 z = case mod z 32 of {
0 -> Just ' ';
x | x<=26 -> Just $ toEnum $ fromIntegral x + fromEnum 'a' - 1;
_ -> Nothing;
};
}
Version: GnuPG v1


Prose description of what the code does: The seed string is hashed with SHA-256, (yielding f78183140c842e8f4a550c3a5eb5663a33706fc07eeacc9687e70823d44511c4), which is then used (unsalted) as a key for AES-256 in counter mode (CTR) to generate a stream of bytes.  The counter starts at the zero block and increments as a 128-bit big-endian integer.  We read each AES output block as bytes in big-endian order.  Each byte is truncated to its least significant five bits (i.e., modulo 32), then encoded into letters by the following substitution: 0=space, 1=A, 2=B,... 26=Z.  Values 27 through 31 are ignored, encoding to nothing.  (They might be useful for future extensions.)  Occasionally there may be two or more consecutive spaces, a rare occurrence which currently has no special meaning but may be useful for future extensions.  For reference, our random word stream begins: d efkabuai yqwrmhspliasmvokhvwhvz fdvhdwhoxkcqfujilomqjubxfzjtug ...

This idea is similar to and kind of an elaboration of the Document ReFinding Key also deployed on this blog which helps in finding, via a search engine, new addresses of individual posts (for which you have an old URL) should this blog move.  Rather than associating just a single random word with an internet resource, we now associate an infinite stream of random words.  The template above can easily be extended to arbitrary URLs, such as those of individual posts.

The technical details were inspired by a mechanism used by some computer virus to call home and update itself, using a deterministic pseudorandom word generator to generate internet domains, one of which the virus author would anonymously purchase and post update code.

by Ken at May 10, 2019 05:29 AM

May 09, 2019


Tweag I/O

seamless JavaScript/Haskell interop

Shao Cheng has a bit of a history with language interop. By this point, we have created or collaborated with others in the community on HaskellR, inline-c, inline-java, and now inline-js. The original idea for this style of interop was realized in language-c-inline by Manuel Chakravarty a few years before joining, concurrently with HaskellR. Manuel wrote a blog post about the design principles that underpin all these different libraries. Others in the community have since created similar libraries such as clr-inline, inline-rust and more. In this post, we'll present our latest contribution to the family: inline-js.

The tagline for inline-js: program Node.js from Haskell.

A quick taste of inline-js

Here is a quick demo of calling the Node.js DNS Promises API to resolve a domain:

import Data.Aeson
import GHC.Generics
import Language.JavaScript.Inline

data DNSRecord = DNSRecord
  { address :: String
  , family :: Int
  } deriving (FromJSON, Generic, Show)

dnsLookup :: String -> IO [DNSRecord]
dnsLookup hostname =
  withJSSession defJSSessionOpts
    [block|
        const dns = (await import("dns")).promises;
        return dns.lookup($hostname, {all: true});
    |]

To run it in ghci:

*Blog> dnsLookup ""
[DNSRecord {address = "", family = 4},DNSRecord {address = "", family = 4},DNSRecord {address = "2606:4700:30::681f:44a3", family = 6},DNSRecord {address = "2606:4700:30::681f:45a3", family = 6}]

We can see that the A/AAAA records of are returned as Haskell values.

This demo is relatively small, yet already enough to present some important features described below.

The QuasiQuoters

In the example above, we used block to embed a JavaScript snippet. Naturally, two questions arise: what content can be quoted, and what's the generated expression's type?

block quotes a series of JavaScript statements, and in-scope Haskell variables can be referred to by prefixing their names with $. Before evaluation, we wrap the code in a JavaScript async function, and this clearly has advantages against evaluating unmodified code:

  • When different blocks of code share a JSSession, the local bindings in one block don't pollute the scope of another block. And it's still possible to add global bindings by explicitly operating on global; these global bindings will persist within the same JSSession.

  • We can return the result back to Haskell any time we want; otherwise we'll need to ensure the last executed statement happens to be the result value itself, which can be tricky to get right.

  • Since it's an async function, we have await at our disposal, so working with async APIs becomes much more pleasant.

When we call dnsLookup "", the constructed JavaScript code looks like this:

(async ($hostname) => {
  const dns = (await import("dns")).promises;
  return dns.lookup($hostname, {all: true});
})("").then(r => JSON.stringify(r))

As we can see, the Haskell variables are serialized and put into the argument list of the async function. Since we're relying on FromJSON to parse the result in this case, the result of the async function is further mapped with JSON.stringify.

We also provide an expr QuasiQuoter for when the quoted code is expected to be a single expression. Under the hood it adds a return and reuses the implementation of block, to save a few keystrokes for the user.

Haskell/JavaScript data marshaling

The type of block's generated expression is JSSession -> IO r, with hidden constraints placed on r. In our example, we're returning [DNSRecord] which has a FromJSON instance, so that instance is picked up, and on the JavaScript side, JSON.stringify() is called automatically before returning the result back to Haskell. Likewise, since hostname is a String which supports ToJSON, upon calling dnsLookup, hostname is serialized to JSON to be embedded in the JavaScript code.

For marshaling user-defined types, ToJSON/FromJSON is sufficient. This is quite convenient when binding a JavaScript function, since the ToJSON/FromJSON instances are often free due to Haskell's amazing generics mechanism. However, there are also a few other useful non-JSON types which are supported here. These non-JSON types are:

  • The ByteString types in the bytestring package, including strict/lazy/short versions. It's possible to pass a Haskell ByteString to JavaScript, which shows up as a Buffer. Going in the other direction works too.

  • The JSVal type which is an opaque reference to a JavaScript value, described in later sections of this post.

  • The () type (only as a return value), meaning that the JavaScript return value is discarded.

Ensuring the expr/block QuasiQuoters work with both JSON/non-JSON types involves quite a bit of type hackery, so we hide the relevant internal classes and it's currently not possible for inline-js users to add new such non-JSON types.

Importing modules & managing sessions

When prototyping inline-js, we felt the need to support the importing of modules, either built-in or user-supplied ones. Currently, there are two different import mechanisms coexisting in Node.js: the old CommonJS-style require() and the new ECMAScript native import. It's quite non-trivial to support both, and we eventually chose to support ECMAScript dynamic import() since it works out-of-the-box on both web and Node, making it more future-proof.

Importing a built-in module is straightforward: import(module_name) returns a Promise which resolves to that module's namespace object. When we need to import npm-installed modules, we need to specify their location in the settings to initialize JSSession:

import Data.ByteString (ByteString)
import Data.Foldable
import Language.JavaScript.Inline
import System.Directory
import System.IO.Temp
import System.Process

getMagnet :: String -> FilePath -> IO ByteString
getMagnet magnet filename =
  withSystemTempDirectory "" $ \tmpdir -> do
    withCurrentDirectory tmpdir $
      traverse_
        callCommand
        ["npm init --yes", "npm install --save --save-exact webtorrent@0.103.1"]
    withJSSession
      defJSSessionOpts {nodeWorkDir = Just tmpdir}
      [block|
        const WebTorrent = (await import("webtorrent")).default,
          client = new WebTorrent();

        return new Promise((resolve, reject) =>
          client.add($magnet, torrent =>
              .find(file => === $filename)
              .getBuffer((err, buf) => (err ? reject(err) : resolve(buf)))));
      |]

Here, we rely on the webtorrent npm package to implement a simple BitTorrent client function getMagnet, which fetches the file content based on a magnet URI and a filename. First, we allocate a temporary directory and run npm install in it; then we supply the directory path in the nodeWorkDir field of session config, so inline-js knows where node_modules is. And finally, we use the webtorrent API to perform downloading, returning the result as a Haskell ByteString.

Naturally, running npm install for every single getMagnet call doesn't sound like a good idea. In a real-world Haskell application which calls npm-installed modules with inline-js, the required modules should be installed by the package build process, e.g. by using Cabal hooks to install to the package's data directory, and getMagnet can then use the data directory as the working directory of Node.

Now, it's clear that all code created by the QuasiQuoters in inline-js requires a JSSession state, which can be created by newJSSession or withJSSession. There are a couple of config fields available, which allow one to specify the working directory of Node, pass extra arguments, or redirect the Node process's standard error output.

How it works

Interacting with Node from Haskell

There are multiple possible methods to interact with Node in other applications, including in particular:

  • Whenever we evaluate some code, start a Node process to run it, and fetch the result either via standard output or a temporary file; persistent Node state can be serialized via structural cloning. This is the easiest way but also has the highest overhead.

  • Use pipes/sockets for IPC, with inline-js starting a script to get the code, perform evaluation and return results, reusing the same Node process throughout the session. This requires more work and has less overhead than calling Node for each call.

  • Use the Node.js N-API to build a native addon, and whatever Haskell application relying on inline-js gets linked with the addon, moving the program entry point to the Node side. We have ABI stability with N-API, and building a native addon is surely less troublesome than building the whole Node stack. Although the IPC overhead is spared, this complicates the Haskell build process.

  • Try to link with Node either as a static or dynamic library, then directly call internal functions. Given that the build system of Node and V8 is a large beast, we thought it would take a considerable amount of effort; even if it's known to work for a specific revision of Node, there's no guarantee later revisions won't break it.

The current implementation uses the second method listed above. inline-js starts an "eval server" which passes binary messages between Node and the host Haskell process via a pair of pipes. At the cost of a bit of IPC-related overhead, we make inline-js capable of working with multiple installations of Node without recompiling. The schema of binary messages and implementation of "eval server" is hidden from users and thus can evolve without breaking the exposed API of inline-js.

The "eval server"

The JavaScript specification provides the eval() function, allowing a dynamically constructed code string to be run anywhere. However, it's better to use the built-in vm module of Node.js, since it's possible to supply a custom global object where JavaScript evaluation happens, so we can prevent the eval server's declarations leaking into the global scope of the evaluated code, while still being able to add custom classes or useful functions to the eval server.

Once started, the eval server accepts binary requests from the host Haskell process and returns responses. Upon an "eval request" containing a piece of UTF-8 encoded JavaScript code, it first evaluates the code, expecting a Promise to be returned. When the Promise resolves with a final result, the result is serialized and returned. Given the asynchronous nature of this pipeline, it's perfectly possible for the Haskell process to dispatch a batch of eval requests, and the eval server to process them concurrently, therefore we also export a set of "async" APIs in Language.JavaScript.Inline which decouples sending requests and fetching responses.

On the Haskell side, we use STM to implement send/receive queues, and they are accompanied by threads which perform the actual sending/receiving. All user-facing interfaces either enqueue a request or try to fetch the corresponding response from a TVar, blocking if the response is not ready yet. In this way, we make almost all exposed interfaces of inline-js thread-safe.

Marshaling data based on types

Typically, the JavaScript code sent to the eval server is generated by the QuasiQuoter's returned code, potentially including some serialized Haskell variables in the code, and the raw binary data included in the eval response is deserialized into a Haskell value. So how are the Haskell variables recognized in quoted code, and how does the Haskell/JavaScript marshaling take place?

To recognize Haskell variables, one could simply use a regex to match any token starting with $ and assume it's a captured Haskell variable, yet this introduces a lot of false positives, e.g. "$not_var", where $not_var actually appears inside a string literal. So in the QuasiQuoters of inline-js, we perform JavaScript lexical analysis on the quoted code, borrowing the lexer in language-javascript. After the Haskell variables are found, the QuasiQuoters generate a Haskell expression including them as free variables, and at runtime, they can be serialized as parts of the quoted JavaScript code.

To perform type-based marshaling between Haskell and JavaScript data, the simplest thing to do is rely solely on aeson's FromJSON/ToJSON classes. All captured variables should have a ToJSON instance, serialized to JSON which is also a valid piece of ECMAScript, and whatever value is returned should also have a FromJSON instance. However, there are annoying exceptions which can't be handled appropriately via FromJSON/ToJSON instances.

One such type is ByteString. It's very important to be able to support Haskell ByteString variables and expect them to convert to Buffer on the Node side (or vice versa). Unfortunately, the JSON spec doesn't have a variant for raw binary data. While there are other cross-language serialization schemes (e.g. CBOR) that support it, they introduce heavy npm dependencies to the eval server. Therefore, a reasonable choice is: expect inline-js users to rely solely on FromJSON/ToJSON for their custom types, while also supporting a few special types which have different serialization logic.

Therefore, we have a pair of internal classes for this purpose: ToJSCode and FromEvalResult. All ToJSON instances are also ToJSCode instances, while for ByteString, we encode it with base64 and generate an expression which recovers a Buffer and is safe to embed in any JavaScript code. The FromEvalResult class contains two functions: one to generate a "post-processing" JavaScript function that encodes the result to binary on the Node side, another to deserialize from binary on the Haskell side. For the instances derived from FromJSON, the "post-processing" code is r => JSON.stringify(r), and for ByteString it's simply r => r.

To keep the public API simple, ToJSCode/FromEvalResult is not exposed, and although type inference is quite fragile for QuasiQuoter output, everything works well as long as the relevant variables and return values have explicit type annotations.

Passing references to arbitrary JavaScript values

It's also possible to pass opaque references to arbitrary JavaScript values between Haskell and Node. On the Haskell side, we have a JSVal type to represent such references, and when the returned value's type is annotated to be a JSVal, on the Node side, we allocate a JSVal table slot for the result and pass the table index back. JSVal can also be included in quoted JavaScript code, and they convert to JavaScript expressions which fetch the indexed value.

Exporting Haskell functions to the JavaScript world

Finally, here's another important feature worth noting: inline-js supports a limited form of exporting Haskell functions to the JavaScript world! For functions of type [ByteString] -> IO ByteString, we can use exportHSFunc to get the JSVal corresponding to a JavaScript wrapper function which calls this Haskell function. When the wrapper function is called, it expects all parameters to be convertible to Buffer, then sends a request back to the Haskell process. The regular response-processor Haskell thread has special logic to handle them; it fetches the indexed Haskell function, calls it with the serialized JavaScript parameters in a forked thread, then the result is sent back to the Node side. The wrapper function is async and returns a Promise which resolves once the expected response is received from the Haskell side. Due to the async nature of message processing on both the Node and Haskell side, it's even possible for an exported Haskell function to call into Node again, and it also works the other way.

Normally, the JavaScript wrapper function is async, and async functions work nicely for most cases. There are corner cases where we need the JavaScript function to be synchronous, blocking when the Haskell response is not ready and returning the result without firing a callback. One such example is WebAssembly imports: the JavaScript embedding spec of WebAssembly doesn't allow async functions to be used as imports, since this involves the "suspending" and "resuming" of WebAssembly instance state, which might not be economical to implement in today's JavaScript engines. Therefore, we also provide exportSyncHSFunc which makes a synchronous wrapper function to be used in such scenarios. Since it involves completely locking up the main thread in Node with Atomics, this is an extremely heavy hammer and should be used with much caution. We also lose reentrancy with this "sync mode"; when the exported Haskell function calls back into Node, the relevant request will be forever stuck in the message queue, freezing both the Haskell and Node processes.


We've presented how inline-js allows JavaScript code to be used directly from Haskell, and explained several key aspects of inline-js internals. The core ideas are quite simple, and the potential use cases are endless, given the enormous ecosystem the Node.js community has accumulated over the past few years. Even for development tasks that are not specifically tied to Node.js, it is still nice to have the ability to easily call relevant JavaScript libraries, to accelerate prototyping in Haskell and to compare correctness/performance of Haskell/JavaScript implementations.

There are still potential improvements to make, e.g. implementing type-based exporting of Haskell functions. But we decided that now is a good time to announce the framework and collect some first-hand use experience, spot more bugs and hear user opinions on how it can be improved. When we get enough confidence from the feedback of seed users, we can prepare an initial Hackage release. Please spread the word, make actual stuff with inline-js and tell us what you think :)

May 09, 2019 12:00 AM

May 08, 2019

FP Complete

Maximizing Haskell Webinar Review

We are happy to announce that we have been sponsoring free webinars for over a year now. The feedback we have been receiving from the IT community has been overwhelmingly positive. We have been working towards producing a new webinar topic every month, and we plan to keep moving at that pace. In this webinar, Michael Snoyman, Vice President of Engineering at FP Complete, discusses how to maximize the "Haskell Success Program". We had 189 people registered for the event, which aired on Wednesday, May 1st at 10:00 am PDT.


About the Webinar

In this month's webinar, Michael Snoyman demonstrated just how easy it is to get started with FP Complete's "Haskell Success Program".

Topics covered:

During the webinar we tried to answer these questions:

  • Why you should use Haskell
  • The benefits of Haskell in a commercial setting
  • How you can get your team to adopt more Haskell
  • About the training resources we provide online
  • How the FP Complete team can help your company succeed with Haskell


Watch the Webinar


We decided to include the chat log for this webinar, and it can be seen at the end of this blog post.

We have a winner!

Haskell Success Program Winning Draw is...

Congratulations Lauri Lättilä of Helsinki, Finland's SimAnalytics! Lauri & SimAnalytics have won FP Complete's $1 Haskell Success Program Drawing. FP Complete looks forward to working together with Lauri and his SimAnalytics' Haskell team.

NEW! 2 Success Programs!

Following the great feedback and success of its new Haskell Success Program, FP Complete would like to take this opportunity to introduce and welcome its 2 new Success Programs!

BlockChain Success Program

Bringing world-class engineering to your rapid blockchain projects.

Engineer to engineer blockchain mentoring that saves time, slashes risks, and pays for itself!



DevOps Success Program

Accelerate your team's expertise in cloud tools & automation 

Mentoring, teaching, and collaboration customized to your team's needs

Engineer To Engineer

Fixing broken processes with DevOps. Engineer to engineer mentoring that pays for itself! 


Do You Know FP Complete?

At FP Complete, we do so many things to help companies that it's hard to encapsulate our impact in a few words. They say a picture is worth a thousand words, so a video has to be worth 10,000 words (at least). Therefore, to tell all we can in as little time as possible, check out our explainer video. It's only 108 seconds to get the full story of FP Complete.

We want your feedback for webinar topics

We would like to hear your suggestions for future webinar topics. The simplest way to accomplish this is to add a comment to this blog post with your suggestion. Alternatively, send your suggestion via email to

Webinar Chat Log

We find it useful to share what was chatted about during the webinar. You can see the chat flow below.

00:06:56 Yanik Koval: hey
00:07:08 Chris Done: hey!
00:07:37 Yanik Koval: Will you record this?
00:07:51 Chris Done: yes, it's being recorded
00:08:02 Chris Done: it'll be uploaded to our YouTube channel
00:08:15 Yanik Koval: perfect, thank you
01:09:31 Do you mean 20% of Haskell gives 80% of the benefits? Is that just a type?
01:16:30 Bruce Alspaugh: I see languages like Java have lots of free or inexpensive MOOCs available, but it is hard to find very many on platforms like EdX, Coursera, Linda, etc. but there are very few for Haskell. Is there any effort underway to expand these offerings?
01:19:22 : I have seen some university/academic lecturers giving talks about teaching Haskell via MOOCs, but I'm not aware of commercial initiatives in this area.

01:24:45 Bruce Alspaugh: Do you have any recommendations for how to work with existing local user groups, or establishing new Haskell-oriented user groups to encourage developers to learn Haskell, and companies to start pilot projects?
01:35:13 Bruce Alspaugh: I see languages like Java have lots of free or inexpensive MOOCs available on platforms like EdX, Coursera, Lynda, etc., but there are very few available for Haskell. Is there any effort underway to expand these offerings?
01:39:51 Bruce Alspaugh: Is Haskell on the JVM using Eta or Frege a viable option?
01:41:04 Byron Hale: To All Panelists : My experience with Python is that it's like an elephant in a china-shop when it comes to package management. A carefully curated system will be overwritten without a by-your-leave. Python is the preferred language for machine-learning. What can be done?
01:46:03 Bruce Alspaugh: Does Haskell have good libraries for PDF report generation?
01:47:07 Byron Hale: For example, can Python be added to Nix? Guix?
01:59:21 Dan Banta: Thank you.
01:59:36 Chris Done: Thanks all!!
01:59:53 Bulent Basaran: Thanks You!

by Greg Morancey ( at May 08, 2019 05:30 PM

Donnacha Oisín Kidney

Some Tricks for List Manipulation

Posted on May 8, 2019
Tags: Haskell

This post is a collection of some of the tricks I’ve learned for manipulating lists in Haskell. Each one starts with a puzzle: you should try the puzzle yourself before seeing the solution!

The Tortoise and the Hare

How can you split a list in half, in one pass, without taking its length?

This first one is a relatively well-known trick, but it occasionally comes in handy, so I thought I’d mention it. The naive way is as follows:
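A sketch of the naive version (my formulation of what the post describes):

```haskell
-- Naive split: one pass to take the length, another inside splitAt
splitHalf :: [a] -> ([a], [a])
splitHalf xs = splitAt (length xs `div` 2) xs
```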

But it’s unsatisfying: we have to traverse the list twice, and we’re taking its length (which is almost always a bad idea). Instead, we use the following function:
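Here is a sketch of that one-pass function (the helper is named go to match the explanation of the trick; the exact formulation is mine):

```haskell
-- Tortoise and hare: go's second argument is consumed two cells at a
-- time, so when it runs out, the first argument sits at the halfway mark.
splitHalf :: [a] -> ([a], [a])
splitHalf xs = go xs xs
  where
    go (y:ys) (_:_:zs) = let (f, s) = go ys zs in (y : f, s)
    go ys _            = ([], ys)
```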

The “tortoise and the hare” are the two arguments to go: it traverses the second one twice as fast, so when it hits the end, we know that the first list must be halfway done.

There and Back Again

Given two lists, xs and ys, write a function which zips xs with the reverse of ys (in one pass).

There’s a lovely paper (Danvy and Goldberg 2005) which goes though a number of tricks for how to do certain list manipulations “in reverse”. Their technique is known as “there and back again”. However, I’d like to describe a different way to get to the same technique, using folds.

Whenever I need to do some list manipulation in reverse (i.e., I need the input list to be reversed), I first see if I can rewrite the function as a fold, and then just switch out foldr for foldl.

For our puzzle here, we need to first write zip as a fold:
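A sketch of that definition (renamed zip' here so it doesn't clash with the Prelude; the details are my reconstruction):

```haskell
-- zip as a foldr over its first argument: f consumes an x and a
-- continuation expecting the remaining ys; b handles an exhausted xs.
zip' :: [a] -> [b] -> [(a, b)]
zip' = foldr f b
  where
    f x k (y:ys) = (x, y) : k ys
    f _ _ []     = []
    b _          = []
```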

If that looks complex, or difficult to write, don’t worry! There’s a systematic way to get to the above definition from the normal version of zip. First, let’s start with a normal zip:

Then, we need to turn it into a case-tree, where the first branch is on the list we want to fold over. In other words, we want the function to look like this:

To figure out the cases, we factor out the cases in the original function. Since the second clause (zip xs [] = []) is only reachable when xs /= [], it’s effectively a case for the x:xs branch.

Now, we rewrite the different cases to be auxiliary functions:

And finally, we refactor the recursive call to the first case expression.

Then those two auxiliary functions are what you pass to foldr!

So, to reverse it, we simply take wherever we wrote foldr f b, and replace it with foldl (flip f) b:
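Following that recipe with the f and b from the zip fold gives a sketch like this (my formulation):

```haskell
-- Same f and b as in the zip fold, with foldr swapped for foldl (flip f):
-- this zips the *reverse* of the folded list against the other one.
zipRev' :: [a] -> [b] -> [(a, b)]
zipRev' xs = foldl (flip f) b xs
  where
    f x k (y:ys) = (x, y) : k ys
    f _ _ []     = []
    b _          = []
```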

Of course, we’re reversing the wrong list here. Fixing that is simple:
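A sketch of the fixed version, folding over ys instead so that it is the list that comes back reversed (again, the details are my reconstruction):

```haskell
-- Fold over ys with foldl; the continuation now threads the remainder
-- of xs, so the result is zip xs (reverse ys).
zipRev :: [a] -> [b] -> [(a, b)]
zipRev xs ys = foldl (flip f) b ys xs
  where
    f y k (x:xs') = (x, y) : k xs'
    f _ _ []      = []
    b _           = []
```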

Maintaining Laziness

Rewrite the above function without using continuations.

zipRev, as written above, actually uses continuation-passing style. In most languages (including Standard ML, which was the one used in Danvy and Goldberg (2005)), this is pretty much equivalent to a direct-style implementation (modulo some performance weirdness). In a lazy language like Haskell, though, continuation-passing style often makes things unnecessarily strict.

Consider the church-encoded pairs:
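A sketch of what such pairs look like (assuming RankNTypes; names are mine):

```haskell
{-# LANGUAGE RankNTypes #-}

-- A Church-encoded pair is represented by its own eliminator: consuming
-- it means applying a continuation, not lazily pattern-matching.
newtype Pair a b = Pair { unPair :: forall c. (a -> b -> c) -> c }

pair :: a -> b -> Pair a b
pair x y = Pair (\k -> k x y)

fstP :: Pair a b -> a
fstP p = unPair p (\x _ -> x)

sndP :: Pair a b -> b
sndP p = unPair p (\_ y -> y)
```

With pairs represented this way, extracting even one component forces the function that builds the whole pair to run, which is one way CPS code ends up stricter than its direct-style counterpart.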

So it’s sometimes worth trying to avoid continuations if there is a fast direct-style solution. (alternatively, continuations can give you extra strictness when you do want it)

First, I’m going to write a different version of zipRev, which folds on the first list, not the second.

Then, we inline the definition of foldl:

Then, as a hint, we tuple up the two accumulating parameters:

What we can see here is that we have two continuations stacked on top of each other. When this happens, they can often “cancel out”, like so:

And we have our direct-style implementation!
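A sketch of the direct-style result (assuming, for simplicity, that ys is at least as long as xs): recursing along xs threads ys through the return value, so its elements come back paired in reverse order, with no continuations in sight.

```haskell
-- Direct style: go returns the unconsumed suffix of ys alongside the
-- zipped result, pairing each x with a y on the way back up.
zipRev :: [a] -> [b] -> [(a, b)]
zipRev xs ys = snd (go xs)
  where
    go []      = (ys, [])
    go (x:xs') = let (y:ys', r) = go xs' in (ys', (x, y) : r)
```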

Note 14/05/2019: the “cancel-out” explanation there is a little handwavy, as I’m sure you’ll notice. However, there are a number of excellent explanations on this Stack Overflow thread which explain it much better than I ever could. Thanks to Anders Kaseorg, Will Ness, user11228628, and to Joseph Sible (2019) for asking the question.

Manual Fusion

Detect that a list is a palindrome, in one pass.

We now know a good way to split a list in two, and a good way to zip a list with its reverse. We can combine the two to get a program that checks if a list is a palindrome. Here’s a first attempt:
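A sketch of that first attempt, with simplified stand-ins for the two helpers inlined so the snippet stands alone (the post's versions are the one-pass ones developed above):

```haskell
-- Three passes: split, reverse-zip, then check all pairs are equal.
palindrome :: Eq a => [a] -> Bool
palindrome xs = all (uncurry (==)) (uncurry zipRev (splitHalf xs))
  where
    splitHalf ws = go ws ws
      where
        go (v:vs) (_:_:us) = let (f, s) = go vs us in (v : f, s)
        go vs _            = ([], vs)
    zipRev vs us = zip vs (reverse us)  -- stand-in for the one-pass version
```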

But this is doing three passes!

To get around it, we can manually do some fusion. Fusion is a technique where we can spot scenarios like the following:

And translate them into a version without a list:

The trick is making sure that the consumer is written as a fold, and then we just put its f and b in place of the : and [] in the producer.
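As a toy illustration of that substitution (my example, not the post's): a consumer written as a fold fuses with a producer by replacing the producer's (:) and [] with the fold's f and b.

```haskell
-- Consumer as a fold: f = (+), b = 0
consumer :: [Int] -> Int
consumer = foldr (+) 0

-- Producer builds a countdown list with (:) and []
producer :: Int -> [Int]
producer 0 = []                    -- [] corresponds to b
producer n = n : producer (n - 1)  -- (:) corresponds to f

-- Fused: (+) and 0 stand where (:) and [] were; no list is built
fused :: Int -> Int
fused 0 = 0
fused n = n + fused (n - 1)
```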

So, when we inline the definition of splitHalf into zipRev, we get the following:

(adding a special case for odd-length lists)

Finally, the all (uncurry (==)) is implemented as a fold also. So we can fuse it with the rest of the definitions:

You may have spotted the writer monad over All there. Indeed, we can rewrite it to use the monadic bind:

Eliminating Multiple Passes with Laziness

Construct a Braun tree from a list in linear time.

This is also a very well-known trick (Bird 1984), but today I’m going to use it to write a function for constructing Braun trees.

A Braun tree is a peculiar structure. It’s a binary tree, where adjacent branches can differ in size by only 1. When used as an array, it has O(log n) lookup times. It’s enumerated like so:

     ┌─7
   ┌3┤
   │ └11
 ┌1┤
 │ │ ┌─9
 │ └5┤
 │   └13
0┤
 │   ┌─8
 │ ┌4┤
 │ │ └12
 └2┤
   │ ┌10
   └6┤
     └14

The objective is to construct a tree from a list in linear time, in the order defined above. Okasaki (1997) observed that, from the list:

Each level in the tree is constructed from chunks of powers of two. In other words:

From this, we can write the following function:
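The row-chunking core of that function might be sketched like this (rows is the name the post uses later; the rest of the construction, build, is discussed next):

```haskell
-- Chunk a list into rows of sizes 1, 2, 4, 8, ... — one per tree level
rows :: [a] -> [[a]]
rows = go 1
  where
    go _ [] = []
    go n xs = let (row, rest) = splitAt n xs in row : go (2 * n) rest
```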

The first place we’ll look to eliminate a pass is the build function. It combines two rows by splitting the second in half, and zipping it with the first.

We don’t need to store the length of the first list, though, as we are only using it to split the second, and we can do that at the same time as the zipping.

Using this function in build looks like the following:

That top-level zipWith is also unnecessary, though. If we make the program circular, we can produce ts2 as we consume it, making the whole thing single-pass.

That zip3Node is a good candidate for rewriting as a fold, also, making the whole thing look like this:

To fuse all of those definitions, we first will need to rewrite rows as a fold:

Once we have everything as a fold, the rest of the transformation is pretty mechanical. At the end of it all, we get the following linear-time function for constructing a Braun tree from a list:


Bird, R. S. 1984. “Using Circular Programs to Eliminate Multiple Traversals of Data.” Acta Inf. 21 (3) (October): 239–250. doi:10.1007/BF00264249.

Danvy, Olivier, and Mayer Goldberg. 2005. “There and Back Again.” BRICS Report Series 12 (3). doi:10.7146/brics.v12i3.21869.

Okasaki, Chris. 1997. “Three Algorithms on Braun Trees.” Journal of Functional Programming 7 (6) (November): 661–666. doi:10.1017/S0956796897002876.

Sible, Joseph. 2019. “How can two continuations cancel each other out?” Stack Overflow.

by Donnacha Oisín Kidney at May 08, 2019 12:00 AM

May 07, 2019

Brandon Simmons

A little tool for visualizing ghc verbose timing

I wrote a little tool to graph the fine-grained timing logs produced by ghc when the -v (verbose) flag is given. These logs are written to STDERR and look like, e.g.:

!!! Liberate case [Data.Binary.Class]: finished in 32.06 milliseconds, allocated 14.879 megabytes
!!! Simplifier [Data.Binary.Class]: finished in 873.97 milliseconds, allocated 563.867 megabytes

The project is on GitHub at jberryman/ghc-timing-treemap, along with a screenshot.

Navigating within the graph is pretty slow, at least in Firefox, and there are some other improvements that could be made: for instance, some of the phases are run multiple times on the same module, and it would be nice to see these grouped where the module name is logged.

May 07, 2019 06:31 PM