Planet Haskell

January 19, 2019

Haskell at Work

Purely Functional GTK+, Part 2: TodoMVC

In the last episode we built a "Hello, World" application using gi-gtk-declarative. It's now time to convert it into a to-do list application, in the style of TodoMVC.

To convert the “Hello, World!” application to a to-do list application, we begin by adjusting our data types. The Todo data type represents a single item, with a Text field for its name. We also need to import the Text type from Data.Text.

data Todo = Todo
  { name :: Text
  }

Our state will no longer be (), but a data type holding a Vector of Todo items. This means we also need to import Vector from Data.Vector.

data State = State
  { todos :: Vector Todo
  }

As the run function returns the last state value of the state reducer loop, we need to discard that return value in main. We wrap the run action in void, imported from Control.Monad.

Let’s rewrite our view function. We change the title to “TodoGTK+” and replace the label with a todoList, which we’ll define in a where binding. We use container to declare a Gtk.Box, with vertical orientation, containing all the to-do items. Using fmap and a typed hole, we see that we need a function Todo -> BoxChild Event.

view' :: State -> AppView Gtk.Window Event
view' s = bin
  Gtk.Window
  [#title := "TodoGTK+", on #deleteEvent (const (True, Closed))]
  todoList
  where
    todoList = container Gtk.Box
                         [#orientation := Gtk.OrientationVertical]
                         (fmap _ (todos s))

The todoItem will render a Todo value as a Gtk.Label displaying the name.

view' :: State -> AppView Gtk.Window Event
view' s = bin
  Gtk.Window
  [#title := "TodoGTK+", on #deleteEvent (const (True, Closed))]
  todoList
  where
    todoList = container Gtk.Box
                         [#orientation := Gtk.OrientationVertical]
                         (fmap todoItem (todos s))
    todoItem todo = widget Gtk.Label [#label := name todo]

Now, GHC tells us there’s a “non-type variable argument in the constraint”. The type of todoList requires us to add the FlexibleContexts language extension.

{-# LANGUAGE FlexibleContexts  #-}
{-# LANGUAGE OverloadedLabels  #-}
{-# LANGUAGE OverloadedLists   #-}
{-# LANGUAGE OverloadedStrings #-}
module Main where

The remaining type error is in the definition of main, where the initial state cannot be a () value. We construct a State value with an empty vector.

main :: IO ()
main = void $ run App
  { view         = view'
  , update       = update'
  , inputs       = []
  , initialState = State {todos = mempty}
  }

Adding New To-Do Items

While our application type-checks and runs, there are no to-do items to display, and there’s no way of adding new ones. We need to implement a form, where the user inserts text and hits the Enter key to add a new to-do item. To represent these events, we’ll add two new constructors to our Event type.

data Event
  = TodoTextChanged Text
  | TodoSubmitted
  | Closed

TodoTextChanged will be emitted each time the text in the form changes, carrying the current text value. The TodoSubmitted event will be emitted when the user hits Enter.

When the to-do item is submitted, we need to know the current text to use, so we add a currentText field to the state type.

data State = State
  { todos       :: Vector Todo
  , currentText :: Text
  }

We modify the initialState value to include an empty Text value.

main :: IO ()
main = void $ run App
  { view         = view'
  , update       = update'
  , inputs       = []
  , initialState = State {todos = mempty, currentText = mempty}
  }

Now, let’s add the form. We wrap our todoList in a vertical box, containing the todoList and a newTodoForm widget.

view' :: State -> AppView Gtk.Window Event
view' s = bin
  Gtk.Window
  [#title := "TodoGTK+", on #deleteEvent (const (True, Closed))]
  (container Gtk.Box
             [#orientation := Gtk.OrientationVertical]
             [todoList, newTodoForm]
  )

The form consists of a Gtk.Entry widget, with the currentText of our state as its text value. The placeholder text will be shown when the entry isn’t focused. We use onM to attach an effectful event handler to the changed signal.

view' :: State -> AppView Gtk.Window Event
view' s = bin
  Gtk.Window
  [#title := "TodoGTK+", on #deleteEvent (const (True, Closed))]
  (container Gtk.Box
             [#orientation := Gtk.OrientationVertical]
             [todoList, newTodoForm]
  )
  where
    newTodoForm = widget
      Gtk.Entry
      [ #text := currentText s
      , #placeholderText := "What needs to be done?"
      , onM #changed _
      ]

The typed hole tells us we need a function Gtk.Entry -> IO Event. The reason we use onM is to have that IO action return the event, instead of having a pure function. We need it to query the underlying GTK+ widget for its current text value. By using entryGetText, and mapping our event constructor over that IO action, we get a function of the correct type.

    newTodoForm = widget
      Gtk.Entry
      [ #text := currentText s
      , #placeholderText := "What needs to be done?"
      , onM #changed (fmap TodoTextChanged . Gtk.entryGetText)
      ]

It is often necessary to use onM and effectful GTK+ operations in event handlers, as the callback type signatures rarely have enough information in their arguments. But for the next event, TodoSubmitted, we don’t need any more information, and we can use on to declare a pure event handler for the activate signal.

    newTodoForm = widget
      Gtk.Entry
      [ #text := currentText s
      , #placeholderText := "What needs to be done?"
      , onM #changed (fmap TodoTextChanged . Gtk.entryGetText)
      , on #activate TodoSubmitted
      ]

Moving to the next warning, we see that the update' function is no longer total. We are missing cases for our new events. Let’s give the arguments names and pattern match on the event. The case for Closed will be the same as before.

update' :: State -> Event -> Transition State Event
update' s e = case e of
  Closed -> Exit

When the to-do text value changes, we’ll update the currentText state using a Transition. The first argument is the new state, and the second argument is an action of type IO (Maybe Event). We don’t want to emit any new event, so we use (pure Nothing).

update' :: State -> Event -> Transition State Event
update' s e = case e of
  TodoTextChanged t -> Transition s { currentText = t } (pure Nothing)
  Closed -> Exit

For the TodoSubmitted event, we define a newTodo value with the currentText as its name, and transition to a new state with the newTodo item appended to the todos vector. We also reset the currentText to be empty.

To use Vector.snoc, we need to add a qualified import.

import           Control.Monad                 (void)
import           Data.Text                     (Text)
import           Data.Vector                   (Vector)
import qualified Data.Vector                   as Vector
import qualified GI.Gtk                        as Gtk
import           GI.Gtk.Declarative
import           GI.Gtk.Declarative.App.Simple

Running the application, we can start adding to-do items.
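The state transitions so far can also be checked without starting GTK. The following is a minimal sketch, not part of gi-gtk-declarative: it models the pure part of update' with a hypothetical step function, using a plain list and String in place of Vector and Text so that it needs only base. Nothing stands in for Exit, and runEvents is likewise a helper invented for this illustration.

```haskell
-- A GTK-free model of the reducer, for experimentation only.
-- Lists and String stand in for Vector and Text (an assumption made
-- here so that the sketch depends on base alone).

data Todo = Todo
  { name :: String
  } deriving (Eq, Show)

data State = State
  { todos       :: [Todo]
  , currentText :: String
  } deriving (Eq, Show)

data Event
  = TodoTextChanged String
  | TodoSubmitted
  | Closed
  deriving (Eq, Show)

-- | Hypothetical pure counterpart of update': the next state, or
-- Nothing when the application should exit.
step :: State -> Event -> Maybe State
step s e = case e of
  TodoTextChanged t -> Just s { currentText = t }
  TodoSubmitted ->
    let newTodo = Todo { name = currentText s }
    in  Just s { todos = todos s ++ [newTodo], currentText = "" }
  Closed -> Nothing

-- | Feed a sequence of events through the reducer, stopping at exit.
runEvents :: State -> [Event] -> Maybe State
runEvents s0 = foldl (\ms e -> ms >>= \s -> step s e) (Just s0)
```

Typing "milk" and hitting Enter corresponds to runEvents applied to [TodoTextChanged "milk", TodoSubmitted], which appends one item and clears the entry text.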

Improving the Layout

Our application doesn’t look very good yet, so let’s improve the layout a bit. We’ll begin by left-aligning the to-do items.

todoItem todo = widget
  Gtk.Label
  [#label := name todo, #halign := Gtk.AlignStart]

To push the form down to the bottom of the window, we’ll wrap the todoList in a BoxChild, and override the defaultBoxChildProperties to have the child widget expand and fill all the available space of the box.

todoList =
  BoxChild defaultBoxChildProperties { expand = True, fill = True }
    $ container Gtk.Box
                [#orientation := Gtk.OrientationVertical]
                (fmap todoItem (todos s))

We re-run the application, and see it has a nicer layout.

Completing To-Do Items

There’s one very important feature missing: being able to mark a to-do item as completed. We add a Bool field called completed to the Todo data type.

data Todo = Todo
  { name      :: Text
  , completed :: Bool
  }

When creating new items, we set it to False.

update' :: State -> Event -> Transition State Event
update' s e = case e of
  TodoSubmitted ->
    let newTodo = Todo {name = currentText s, completed = False}
    in  Transition
          s { todos = todos s `Vector.snoc` newTodo, currentText = mempty }
          (pure Nothing)

Instead of simply rendering the name, we’ll use strike-through markup if the item is completed. We define completedMarkup, and using guards we’ll either render the new markup or render the plain name. To make it strike-through, we wrap the text value in <s> tags.

    todoItem todo = widget
      Gtk.Label
      [ #label := completedMarkup todo
      , #halign := Gtk.AlignStart
      ]
      where
        completedMarkup todo
          | completed todo = "<s>" <> name todo <> "</s>"
          | otherwise      = name todo

For this to work, we need to enable markup for the label by setting #useMarkup to True.

    todoItem todo = widget
      Gtk.Label
      [ #label := completedMarkup todo
      , #useMarkup := True
      , #halign := Gtk.AlignStart
      ]
      where
        completedMarkup todo
          | completed todo = "<s>" <> name todo <> "</s>"
          | otherwise      = name todo
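One caveat once markup is enabled: the label text is parsed as markup, so a to-do name containing characters such as < or & is no longer treated as plain text. Below is a minimal sketch of one way to guard against that, assuming XML-style escaping is what the markup parser expects. escapeMarkup and the String-based completedMarkup' are hypothetical helpers written for this illustration, not functions from gi-gtk or gi-gtk-declarative.

```haskell
-- | Escape the characters that carry special meaning in XML-like markup.
-- Hypothetical helper; String is used instead of Text to stay within base.
escapeMarkup :: String -> String
escapeMarkup = concatMap escapeChar
  where
    escapeChar '&'  = "&amp;"
    escapeChar '<'  = "&lt;"
    escapeChar '>'  = "&gt;"
    escapeChar '"'  = "&quot;"
    escapeChar '\'' = "&#39;"
    escapeChar c    = [c]

-- | Variant of completedMarkup that escapes the name before wrapping it
-- in strike-through tags.
completedMarkup' :: Bool -> String -> String
completedMarkup' isCompleted todoName
  | isCompleted = "<s>" ++ escaped ++ "</s>"
  | otherwise   = escaped
  where
    escaped = escapeMarkup todoName
```

In a real application, GLib's own g_markup_escape_text is probably the more natural choice; the pure version here only shows where the escaping has to happen.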

In order for the user to be able to toggle the completed status, we wrap the label in a Gtk.CheckButton bin. The #active property will be set to the current completed status of the Todo value. When the check button is toggled, we want to emit a new event called TodoToggled.

todoItem i todo =
  bin Gtk.CheckButton
      [#active := completed todo, on #toggled (TodoToggled i)]
    $ widget
        Gtk.Label
        [ #label := completedMarkup todo
        , #useMarkup := True
        , #halign := Gtk.AlignStart
        ]

Let’s add the new constructor to the Event data type. It will carry the index of the to-do item.

data Event
  = TodoTextChanged Text
  | TodoSubmitted
  | TodoToggled Int
  | Closed

To get the corresponding index of each Todo value, we’ll iterate using Vector.imap instead of using fmap.

    todoList =
      BoxChild defaultBoxChildProperties { expand = True, fill = True }
        $ container Gtk.Box
                    [#orientation := Gtk.OrientationVertical]
                    (Vector.imap todoItem (todos s))
    todoItem i todo =

The pattern match on events in the update' function is now missing a case for the new event constructor. Again, we’ll do a transition where we update the todos somehow.

update' :: State -> Event -> Transition State Event
update' s e = case e of
  TodoToggled i -> Transition s { todos = _ (todos s) } (pure Nothing)

We need a function Vector Todo -> Vector Todo that modifies the value at the index i. There’s no handy function like that available in the vector package, so we’ll create our own. Let’s call it mapAt.

update' :: State -> Event -> Transition State Event
update' s e = case e of
  TodoToggled i -> Transition s { todos = mapAt i _ (todos s) } (pure Nothing)

It will take as arguments the index, a mapping function, and a Vector a, and return a Vector a.

mapAt :: Int -> (a -> a) -> Vector a -> Vector a

We implement it using Vector.modify, and actions on the mutable representation of the vector. We overwrite the value at i with the result of mapping f over the existing value at i.

mapAt :: Int -> (a -> a) -> Vector a -> Vector a
mapAt i f = Vector.modify (\v -> MVector.write v i . f =<< MVector.read v i)

To use mutable vector operations through the MVector name, we add the qualified import.

import qualified Data.Vector.Mutable           as MVector
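The intended behaviour of mapAt can be pinned down with a small list-based model that runs with base alone. This is only a sketch for reasoning about the function: mapAtList is a hypothetical name, the Todo type is redeclared with String so the block stands on its own, and leaving the list unchanged for an out-of-range index is a choice made for this sketch.

```haskell
data Todo = Todo
  { name      :: String
  , completed :: Bool
  } deriving (Eq, Show)

-- | Apply f to the element at index i and leave every other element
-- untouched. An index outside the list changes nothing.
mapAtList :: Int -> (a -> a) -> [a] -> [a]
mapAtList i f xs = [ if j == i then f x else x | (j, x) <- zip [0 ..] xs ]

-- | Flip the completed flag of a single item.
toggleCompleted :: Todo -> Todo
toggleCompleted todo = todo { completed = not (completed todo) }
```

Toggling index 1 of a three-item list flips only the middle item, which is the property the TodoToggled case relies on.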

Finally, we implement the function to map, called toggleCompleted.

toggleCompleted :: Todo -> Todo
toggleCompleted todo = todo { completed = not (completed todo) }

update' :: State -> Event -> Transition State Event
update' s e = case e of
  TodoToggled i -> Transition s { todos = mapAt i toggleCompleted (todos s) } (pure Nothing)

Now, we run our application, add some to-do items, and mark or unmark them as completed. We’re done!

Learning More

Building our to-do list application, we have learned the basics of gi-gtk-declarative and the “App.Simple” architecture. There’s more to learn, though, and I recommend checking out the project documentation. There are also a bunch of examples in the Git repository.

Please note that this project is very young, and that APIs are not necessarily stable yet. I think, however, that it’s a much nicer way to build GTK+ applications using Haskell than the underlying APIs provided by the auto-generated bindings.

Now, have fun building your own functional GTK+ applications!

by Oskar Wickström at January 19, 2019 12:00 AM

December 28, 2018

Oskar Wickström

Why I'm No Longer Taking Donations

Haskell at Work, the screencast focused on Haskell in practice, is approaching its one year birthday. Today, I decided to stop taking donations through Patreon due to the negative stress I’ve been experiencing.

The Beginning

This journey started in January 2018. Having a wave of inspiration after watching some of Gary Bernhardt’s new videos, I decided to try making my own videos about practical Haskell programming. Not only producing high-quality content, but with high video and audio quality, was the goal. Haskell at Work was born, and the first video was surprisingly well-received by followers on Twitter.

With the subsequent episodes being published in rapid succession, a follower base on YouTube grew quickly. A thousand or so followers might not be exceptional for a programming screencast channel on YouTube, but to me this was exciting and unexpected. To be honest, Haskell is not exactly a mainstream programming language.

Early on, encouraged by some followers, and being eager to develop the concept, I decided to set up Patreon as a way for people to donate to Haskell at Work. Much like the follower count, the number of patrons and their monthly donations grew rapidly, beyond any hopes I had.

Fatigue Kicks In

The majority of screencasts were published between January and May. Then came the summer and my month-long vacation, in which I attended ZuriHac and spent three weeks in Bali with my wife and friends. Also, I had started getting side-tracked by my project to build a screencast video editor in Haskell. Working on Komposition also spawned the Haskell package gi-gtk-declarative, and my focus got swept away from screencasts. In all fairness, I’m not great at consistently doing one thing for an extended period. My creativity and energy come in bursts, and they may not strike where and when I hope. Maybe this can be managed or controlled somehow, but I don’t know how.

With the lower publishing pace over the summer, a vicious circle of anxiety and low productivity grew. I had thoughts about shutting down the Patreon back then, but decided to instead pause it for a few months.

Regaining Energy

By October, I had recovered some energy. I got very good feedback and lots of encouragement from people at Haskell eXchange, and decided to throw myself back into the game. I published one screencast in November, but something was still there nagging me. I felt pressure and guilt. That I had not delivered on the promise given.

By this time, the Patreon donations had covered my recording equipment expenses, hosting costs over the year, and a few programming books I bought. The donations were still coming in, however, at around $160 per month, with me producing no obvious value for the patrons. The guilt was still there, even stronger than before.

I’m certain that this is all in my head. I do not blame any supporter for these feelings. You have all been great! With all the words of caution you hear about not reading the comments, having a YouTube channel filled with positive feedback, and almost exclusively thumbs-up ratings, I’m beyond thankful for the support I have received.

Trying Something Else

After Christmas this year, I had planned to record and publish a new screencast. Various personal events got in the way, though, and I had very little time to spend on working with it, resulting in the same kind of stress. I took a step back and thought about it carefully, and I’ve realized that money is not a good driver for the free material and open-source code work that I do, and that it’s time for a change.

I want to make screencasts because I love doing it, and I will do so when I have time and energy.

From the remaining funds in my PayPal account, I have allocated enough to keep the domain name and hosting costs covered for another year, and I have donated the remaining amount (USD 450) to

Please keep giving me feedback and suggestions for future episodes. Your ideas are great! I’m looking forward to making more Haskell at Work videos in the future, and I’m toying around with ideas on how to bring in guests, and possibly trying out new formats. Stay tuned, and thank you all for your support!

December 28, 2018 11:00 PM

June 22, 2013

Shayne Fletcher


There are different approaches to the issue of not having a value to return. One idiom to deal with this in C++ is the use of boost::optional<T> or std::pair<bool, T>.

class boost::optional<T> //Discriminated-union wrapper for values.

Maybe is a polymorphic sum type with two constructors: Nothing and Just a.
Here's how Maybe is defined in Haskell.

{- The Maybe type encapsulates an optional value. A value of type
Maybe a either contains a value of type a (represented as Just a), or
it is empty (represented as Nothing). Using Maybe is a good way to
deal with errors or exceptional cases without resorting to drastic
measures such as error.

The Maybe type is also a monad.
It is a simple kind of error monad, where all errors are
represented by Nothing. -}

data Maybe a = Nothing | Just a

{- The maybe function takes a default value, a function, and a Maybe
value. If the Maybe value is Nothing, the function returns the default
value. Otherwise, it applies the function to the value inside the Just
and returns the result. -}

maybe :: b -> (a -> b) -> Maybe a -> b
maybe n _ Nothing = n
maybe _ f (Just x) = f x
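To see maybe and the monad instance at work, here is a minimal sketch using only the Prelude. safeDiv, describe and compute are hypothetical names chosen for this illustration.

```haskell
-- | Division that reports failure through Nothing instead of crashing.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv n d = Just (n `quot` d)

-- | Collapse a Maybe Int into a String: a default for Nothing, and
-- show applied to the value inside a Just.
describe :: Maybe Int -> String
describe = maybe "undefined" show

-- | Chain two divisions in the Maybe monad. The first Nothing
-- short-circuits the rest of the computation.
compute :: Int -> Int -> Int -> Maybe Int
compute a b c = safeDiv a b >>= \q -> safeDiv q c
```

Here describe (safeDiv 1 0) falls back to the default string, while compute threads the intermediate quotient through >>= without any explicit case analysis.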

I haven't tried to compile the following OCaml yet but I think it should be roughly OK.

type 'a option = None | Some of 'a ;;

let maybe n f a =
match a with
| None -> n
| Some x -> f x

Here's another variant on the Maybe monad this time in Felix. It is applied to the problem of "safe arithmetic" i.e. the usual integer arithmetic but with guards against under/overflow and division by zero.

union success[T] =
| Success of T
| Failure of string

fun str[T] (x:success[T]) =>
match x with
| Success ?t => "Success " + str(t)
| Failure ?s => "Failure " + s

typedef fun Fallible (t:TYPE) : TYPE => success[t] ;

instance Monad[Fallible]
fun bind[a, b] (x:Fallible a, f: a -> Fallible b) =>
match x with
| Success ?a => f a
| Failure[a] ?s => Failure[b] s

fun ret[a](x:a):Fallible a => Success x ;

//Safe arithmetic.

const INT_MAX:int requires Cxx_headers::cstdlib ;
const INT_MIN:int requires Cxx_headers::cstdlib ;

fun madd (x:int) (y:int) : success[int] =>
if x > 0 and y > (INT_MAX - x) then
Failure[int] "overflow"
else
Success (y + x)
endif ;

fun msub (x:int) (y:int) : success[int] =>
if x > 0 and y < (INT_MIN + x) then
Failure[int] "underflow"
else
Success (y - x)
endif ;

fun mmul (x:int) (y:int) : success[int] =>
if x != 0 and y > (INT_MAX / x) then
Failure[int] "overflow"
else
Success (y * x)
endif ;

fun mdiv (x:int) (y:int) : success[int] =>
if (x == 0) then
Failure[int] "attempted division by zero"
else
Success (y / x)
endif ;


open Monad[Fallible] ;

//Evaluate some simple expressions.

val zero = ret 0 ;
val zero_over_one = bind ((Success 0), (mdiv 1)) ;
val undefined = bind ((Success 1),(mdiv 0)) ;
val two = bind((ret 1), (madd 1)) ;
val two_by_one_plus_one = bind (two , (mmul 2)) ;

println$ "zero = " + str zero ;
println$ "1 / 0 = " + str undefined ;
println$ "0 / 1 = " + str zero_over_one ;
println$ "1 + 1 = " + str two ;
println$ "2 * (1 + 1) = " + str (bind (bind((ret 1), (madd 1)) , (mmul 2))) ;
println$ "INT_MAX - 1 = " + str (bind ((ret INT_MAX), (msub 1))) ;
println$ "INT_MAX + 1 = " + str (bind ((ret INT_MAX), (madd 1))) ;
println$ "INT_MIN - 1 = " + str (bind ((ret INT_MIN), (msub 1))) ;
println$ "INT_MIN + 1 = " + str (bind ((ret INT_MIN), (madd 1))) ;

println$ "--" ;

//We do it again, this time using the "traditional" rshift-assign

syntax monad //Override the right shift assignment operator.
x[ssetunion_pri] := x[ssetunion_pri] ">>=" x[>ssetunion_pri] =># "`(ast_apply ,_sr (bind (,_1 ,_3)))";
open syntax monad;

println$ "zero = " + str (ret 0) ;
println$ "1 / 0 = " + str (ret 1 >>= mdiv 0) ;
println$ "0 / 1 = " + str (ret 0 >>= mdiv 1) ;
println$ "1 + 1 = " + str (ret 1 >>= madd 1) ;
println$ "2 * (1 + 1) = " + str (ret 1 >>= madd 1 >>= mmul 2) ;
println$ "INT_MAX = " + str (INT_MAX) ;
println$ "INT_MAX - 1 = " + str (ret INT_MAX >>= msub 1) ;
println$ "INT_MAX + 1 = " + str (ret INT_MAX >>= madd 1) ;
println$ "INT_MIN = " + str (INT_MIN) ;
println$ "INT_MIN - 1 = " + str (ret INT_MIN >>= msub 1) ;
println$ "INT_MIN + 1 = " + str (ret INT_MIN >>= madd 1) ;
println$ "2 * (INT_MAX/2) = " + str (ret INT_MAX >>= mdiv 2 >>= mmul 2 >>= madd 1) ; //The last one since we know INT_MAX is odd and that division will truncate.
println$ "2 * (INT_MAX/2 + 1) = " + str (ret INT_MAX >>= mdiv 2 >>= madd 1 >>= mmul 2) ;

That last block using the >>= syntax produces (in part) the following output (the last two print statements have been truncated away; the very last one produces an expected overflow).

by Shayne Fletcher at June 22, 2013 09:07 PM

November 04, 2013

Tim Docker

Cabal version consistency

Thanks to some great work done over the Google Summer of Code, the chart library has gained much new functionality over the last 6 months. A consequence of this is that it has gained plenty of dependencies on other software. Furthermore, where the library previously had 2 cabal files to build the system, it now has 4. It's important that the versioning of dependencies is consistent across these cabal files, but manually checking is tedious. As best I could tell there is not yet a tool to facilitate this.

Hence, I spent a little time learning about the cabal API, and wrote a short script that:

  1. reads several cabal files specified on the command line
  2. merges these into one overall set of dependencies
  3. displays the dependencies in such a way that inconsistent version constraints are obvious

Here's some example output:

$ runghc ~/repos/merge-cabal-deps/mergeCabalDeps.hs `find . -name '*.cabal'`
* loaded Chart-gtk-1.1
* loaded Chart-1.1
* loaded Chart-tests-1.1
* loaded Chart-cairo-1.1
* loaded Chart-diagrams-1.1
    >=1.1 && <1.2 (Chart-cairo,Chart-diagrams,Chart-gtk,Chart-tests)
    >=1.1 && <1.2 (Chart-gtk,Chart-tests)
    >=1.1 && <1.2 (Chart-tests)
    >=1.1 && <1.2 (Chart-tests)
    >=1.4 && <1.5 (Chart-diagrams)
    -any (Chart,Chart-cairo,Chart-gtk,Chart-tests)
    >=3 && <5 (Chart,Chart-cairo,Chart-diagrams,Chart-gtk,Chart-tests)
    >=0.3.3 (Chart-diagrams,Chart-tests)
    >=0.9 && <1.0 (Chart-diagrams,Chart-tests)
    >=0.9.11 (Chart-cairo,Chart-gtk,Chart-tests)
    >=2.2.0 (Chart-diagrams)
    >=2.2.1 && <2.4 (Chart,Chart-cairo,Chart-gtk,Chart-tests)
    >=0.4 && <0.6 (Chart-diagrams,Chart-tests)
    <0.1 (Chart,Chart-cairo,Chart-diagrams,Chart-tests)
    >=0.7 && <0.8 (Chart-tests)
    >=0.7 && <0.8 (Chart-diagrams,Chart-tests)
    >=0.7 && <0.8 (Chart-diagrams,Chart-tests)

As should be evident, all of the imported cabal packages are referenced with consistent version constraints except for colour (which is lacking an upper bound in Chart-diagrams).

The script is pretty straightforward:

import Control.Monad
import Data.List(intercalate)
import System.Environment(getArgs)

import qualified Data.Map as Map
import qualified Data.Set as Set

import Distribution.Package
import Distribution.Version
import Distribution.Verbosity
import Distribution.Text(display)
import Distribution.PackageDescription
import Distribution.PackageDescription.Parse
import Distribution.PackageDescription.Configuration

type VersionRangeS = String

type DependencyMap = Map.Map PackageName (Map.Map VersionRangeS (Set.Set PackageName))

getDependencyMap :: PackageDescription -> DependencyMap
getDependencyMap pd = foldr f Map.empty (buildDepends pd)
  where
    f :: Dependency -> DependencyMap  -> DependencyMap
    f (Dependency p vr) = Map.insert p (Map.singleton (display vr) (Set.singleton (pkgName (package pd))))

printMergedDependencies :: [PackageDescription] -> IO ()
printMergedDependencies pds = do
  forM_ (Map.toList dmap) $ \(pn,versions) -> do
    putStrLn (display pn ++ ":")
    forM_ (Map.toList versions) $ \(version,pnset) -> do
       putStrLn ("    " ++ version ++ " (" ++ intercalate "," (map display (Set.toList pnset)) ++ ")")
  where
    dmap :: DependencyMap
    dmap = Map.unionsWith (Map.unionWith Set.union) (map getDependencyMap pds)

scanPackages :: [FilePath] -> IO ()
scanPackages fpaths = do
    pds <- mapM loadPackageDescription fpaths
    printMergedDependencies pds
  where
    loadPackageDescription path = do
      pd <- fmap flattenPackageDescription (readPackageDescription silent path)
      putStrLn ("* loaded " ++ display (package pd))
      return pd

main = getArgs >>= scanPackages      
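The merge step at the heart of the script can be tried in isolation, without the Cabal API. The following is a minimal sketch under stated assumptions: package names and version ranges are plain Strings, fromPackage, mergeDeps and inconsistent are hypothetical helpers, and only Data.Map and Data.Set from the containers package are used.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

type PackageName   = String
type VersionRangeS = String

type DependencyMap = Map.Map PackageName (Map.Map VersionRangeS (Set.Set PackageName))

-- | The dependencies declared by one package, given as
-- (dependency, version range) pairs.
fromPackage :: PackageName -> [(PackageName, VersionRangeS)] -> DependencyMap
fromPackage pkg deps =
  Map.fromList
    [ (dep, Map.singleton range (Set.singleton pkg)) | (dep, range) <- deps ]

-- | Merge per-package maps: the same dependency with the same range
-- accumulates the set of packages that reference it.
mergeDeps :: [DependencyMap] -> DependencyMap
mergeDeps = Map.unionsWith (Map.unionWith Set.union)

-- | Dependencies referenced with more than one distinct range string.
inconsistent :: DependencyMap -> [PackageName]
inconsistent dmap =
  [ dep | (dep, ranges) <- Map.toList dmap, Map.size ranges > 1 ]
```

Map.unionsWith resolves collisions on the dependency name, and the inner Map.unionWith Set.union merges the sets of packages that share the same range string, so any dependency left with more than one range is a candidate inconsistency.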

I'd be interested in other tools used for managing suites of cabal configurations.

November 04, 2013 12:00 AM

July 15, 2020

Neil Mitchell

Managing Haskell Extensions

Summary: You can divide extensions into yes, no and maybe, and then use HLint to enforce that.

I've worked in multiple moderately sized multinational teams of Haskell programmers. One debate that almost always comes up is which extensions to enable. It's important to have some consistency, so that everyone is using similar dialects of Haskell and can share/review code easily. The way I've solved this debate in the past is by, as a team, dividing the extensions into three categories:

  • Always-on. For example, ScopedTypeVariables is just how Haskell should work and should be enabled everywhere. We turn these extensions on globally in Cabal with default-extensions, and then write an HLint rule to ban turning the extension on manually. In one quick stroke, a large amount of {-# LANGUAGE #-} boilerplate at the top of each file disappears.
  • Always-off, not enabled and mostly banned. For example, TransformListComp steals the group keyword and never got much traction. These extensions can be similarly banned by HLint, or you can default unmentioned extensions to being disabled. If you really need to turn on one of these extensions, you need to both turn it on in the file, and add an HLint exception. Such an edit can trigger wider code review, and serves as a good signal that something needs looking at carefully.
  • Sometimes-on, written at the top of the file as {-# LANGUAGE #-} pragmas when needed. These are extensions which show up sometimes, but not that often. A typical file might have zero to three of them. The reasons for making an extension sometimes-on fall into three categories:
    • The extension has a harmful compile-time impact, e.g. CPP or TemplateHaskell. It's better to only turn these extensions on if they are needed, but they are needed fairly often.
    • The extension can break some code, e.g. OverloadedStrings, so enabling it everywhere would cause compile failures. Generally, we work to minimize such cases, aiming to fix all code to compile with most extensions turned on.
    • The extension is used rarely within the code base and is a signal to the reader that something unusual is going on. Depending on the code base that might be things like RankNTypes or GADTs. But for certain code bases, those extensions will be very common, so it very much varies by code base.
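
The OverloadedStrings case can be made concrete. The sketch below is an illustration written for this point, not code from any particular code base, and the Name type is hypothetical. With the extension on, a string literal takes whatever IsString type its context asks for, which is convenient, but an expression such as length "abc" no longer has a single obvious type and needs an annotation to compile.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.String (IsString (..))

-- | A hypothetical wrapper type with an IsString instance.
newtype Name = Name String
  deriving (Eq, Show)

instance IsString Name where
  fromString = Name

-- The literal is typed by its signature, so no constructor is needed.
greeting :: Name
greeting = "world"

-- Under OverloadedStrings the literal could be any IsString type and
-- length accepts any Foldable, so the annotation is required here.
nameLength :: Int
nameLength = length ("abc" :: String)
```

Code that compiled without the extension may need annotations like the one in nameLength after it is enabled globally, which is why it often ends up in the sometimes-on pile.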

The features that are often most debated are the syntax features, e.g. BlockArguments or LambdaCase. Most projects should either use these extensions commonly (always-on), or never (banned). They provide some syntactic convenience, but if used rarely, tend to mostly confuse things.

Using this approach every large team I've worked on has had one initial debate to classify extensions, then every few months someone will suggest moving an extension from one pile to another. However, it's pretty much entirely silenced the issue from normal discussion thereafter, leaving us to focus on actual coding.

by Neil Mitchell at July 15, 2020 06:17 PM

Mark Jason Dominus

More trivia about megafauna and poisonous plants

A couple of people expressed disappointment with yesterday's article, which asked were giant ground sloths immune to poison ivy?, but then failed to deliver on the implied promise. I hope today's article will make up for that.


I said:

Mangoes are tropical fruit and I haven't been able to find any examples of Pleistocene megafauna that lived in the tropics…

David Formosa points out what should have been obvious: elephants are megafauna, elephants live where mangoes grow (both in Africa and in India), elephants love eating mangoes [1] [2] [3], and, not obvious at all…

Elephants are immune to poison ivy!

Captive elephants have been known to eat poison ivy, not just a little bite, but devouring entire vines, leaves and even digging up the roots. To most people this would have cause a horrific rash … To the elephants, there was no rash and no ill effect at all…

It's sad that we no longer have megatherium. But we do have elephants, which is pretty awesome.

Idiot fruit

The idiot fruit is just another one of those legendarily awful creatures that seem to infest every corner of Australia (see also: box jellyfish, stonefish, gympie gympie, etc.); Wikipedia says:

The seeds are so toxic that most animals cannot eat them without being severely poisoned.

At present the seeds are mostly dispersed by gravity. The plant is believed to be an evolutionary anachronism. What Pleistocene megafauna formerly dispersed the poisonous seeds of the idiot fruit?

A wombat. A six-foot-tall wombat.

I am speechless with delight.

by Mark Dominus at July 15, 2020 04:20 PM

Douglas M. Auclair (geophf)

July 2020 1HaskellADay Problems and Solutions

by geophf at July 15, 2020 02:59 PM

July 14, 2020

Mark Jason Dominus

Were giant ground sloths immune to poison ivy?

The skin of the mango fruit contains urushiol, the same irritating chemical that is found in poison ivy. But why? From the mango's point of view, the whole point of the mango fruit is to get someone to come along and eat it, so that they will leave the seed somewhere else. Poisoning the skin seems counterproductive.

An analogous case is the chili pepper, which contains an irritating chemical, capsaicin. I think the answer here is believed to be that while capsaicin irritates mammals, birds are unaffected. The chili's intended target is birds; you can tell from the small seeds, which are the right size to be pooped out by birds. So chilis have a chemical that encourages mammals to leave the fruit in place for birds.

What's the intended target for the mango fruit? Who's going to poop out a seed the size of a mango pit? You'd need a very large animal, large enough to swallow a whole mango. There aren't many of these now, but that's because they became extinct at the end of the Pleistocene epoch: woolly mammoths and rhinoceroses, huge crocodiles, giant ground sloths, and so on. We may have eaten the animals themselves, but we seem to have quite a lot of fruits around that evolved to have their seeds dispersed by Pleistocene megafauna that are now extinct. So my first thought was, maybe the mango is expecting to be gobbled up by a giant ground sloth, and have its giant seed pooped out elsewhere. And perhaps its urushiol-laden skin makes it unpalatable to smaller animals that might not disperse the seeds as widely, but the giant ground sloth is immune. (Similarly, I'm told that goats are immune to urushiol, and devour poison ivy as they do everything else.)

Well, maybe this theory is partly correct, but even if so, the animal definitely wasn't a giant ground sloth, because those lived only in South America, whereas the mango is native to South Asia. Ground sloths and avocados, yes; mangoes, no.

Still the theory seems reasonable, except that mangoes are tropical fruit and I haven't been able to find any examples of Pleistocene megafauna that lived in the tropics. Still I didn't look very hard.

Wikipedia has an article on evolutionary anachronisms that lists a great many plants, but not the mango.

[ Addendum: I've eaten many mangoes but never noticed any irritation from the peel. I speculate that cultivated mangoes are varieties that have been bred to contain little or no urushiol, or that there is a post-harvest process that removes or inactivates the urushiol, or both. ]

[ Addendum 20200715: I know this article was a little disappointing and that it does not resolve the question in the title. Sorry. But I wrote a followup that you might enjoy anyway. ]

by Mark Dominus at July 14, 2020 06:11 PM

Gabriel Gonzalez

Record constructors


This is a short post documenting various record-related idioms in the Haskell ecosystem. First-time package users can use this post to better understand record API idioms they encounter in the wild.

For package authors, I also include a brief recommendation near the end of the post explaining which idiom I personally prefer.

The example

I’ll use the following record type as the running example for this post:

module Example where

data Person = Person{ name :: String , admin :: Bool }

There are a few ways you can create a Person record if the package author exports the record constructors.

The simplest approach requires no extensions. You can initialize the value of every field in a single expression, like this:

example :: Person
example = Person{ name = "John Doe", admin = True }

Some record literals can get quite large, so the language provides two extensions which can help with record assembly.

First, you can use the NamedFieldPuns extension, to author a record like this:

{-# LANGUAGE NamedFieldPuns #-}

example :: Person
example = Person{ name, admin }
  where
    name = "John Doe"

    admin = True

This works because the NamedFieldPuns extension translates Person{ name, admin } to Person{ name = name, admin = admin }.

The RecordWildCards extension goes a step further and allows you to initialize a record literal without naming all of the fields (again), like this:

{-# LANGUAGE RecordWildCards #-}

example :: Person
example = Person{..}
  where
    name = "John Doe"

    admin = True

Vice versa, you can destructure a record literal in a few ways. For example, you can access record fields using accessor functions:

render :: Person -> String
render person = name person ++ suffix
  where
    suffix = if admin person then " - Admin" else ""

… or you can pattern match on a record literal:

render :: Person -> String
render Person{ name = name, admin = admin } = name ++ suffix
  where
    suffix = if admin then " - Admin" else ""

… or you can use the NamedFieldPuns extension (which also works in reverse):

render :: Person -> String
render Person{ name, admin } = name ++ suffix
  where
    suffix = if admin then " - Admin" else ""

… or you can use the RecordWildCards extension (which also works in reverse):

render :: Person -> String
render Person{..} = name ++ suffix
  where
    suffix = if admin then " - Admin" else ""

Also, once the RecordDotSyntax extension is available you can use ordinary dot syntax to access record fields:

render :: Person -> String
render person = person.name ++ suffix
  where
    suffix = if person.admin then " - Admin" else ""
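For readers on a newer compiler: GHC 9.2 and later ship this dot syntax under the extension name OverloadedRecordDot. The following is a minimal sketch of the same render function, assuming such a compiler; the Person type is repeated so that the snippet stands alone.

```haskell
{-# LANGUAGE OverloadedRecordDot #-}

-- The running example type from this post, repeated for self-containment.
data Person = Person{ name :: String, admin :: Bool }

-- Field access with dot syntax; note there are no spaces around the dot.
render :: Person -> String
render person = person.name ++ suffix
  where
    suffix = if person.admin then " - Admin" else ""
```

Loaded into GHCi, render Person{ name = "John Doe", admin = True } evaluates to "John Doe - Admin".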

Opaque record types

Some Haskell packages will elect to not export the record constructor. When they do so they will instead provide a function that initializes a record value with all required fields and defaults the remaining fields.

For example, suppose the name field were required for our Person type and the admin field were optional (defaulting to False). The API might look like this:

module Example (
Person(name, admin)
, makePerson
) where

data Person = Person{ name :: String, admin :: Bool }

makePerson :: String -> Person
makePerson name = Person{ name = name, admin = False }

Carefully note that the module exports the Person type and all of the fields, but not the Person constructor. So the only way that a user can create a Person record is to use the makePerson “smart constructor”. The typical idiom goes like this:

example :: Person
example = (makePerson "John Doe"){ admin = True }

In other words, the user is supposed to initialize required fields using the “smart constructor” and then set the remaining non-required fields using record syntax. This works because you can update a record type using exported fields even if the constructor is not exported.

The wai package is one of the more commonly used packages that observes this idiom. For example, the Request record is opaque but the accessors are still exported, so you can create a defaultRequest and then update that Request using record syntax:

example :: Request
example = defaultRequest{ requestMethod = "GET", isSecure = True }

… and you can still access fields using the exported accessor functions:

requestMethod example

This approach also works in conjunction with NamedFieldPuns for assembly (but not disassembly), so something like this is valid:

example :: Request
example = defaultRequest{ requestMethod, isSecure }
  where
    requestMethod = "GET"

    isSecure = True

However, this approach does not work with the RecordWildCards language extension.

Some other packages go a step further and instead of exporting the accessors they export lenses for the accessor fields. For example, the amazonka-* family of packages does this, leading to record construction code like this:

example :: PutObject
example =
    putObject "my-example-bucket" "some-key" "some-body"
        & poContentLength .~ Just 9
        & poStorageClass .~ ReducedRedundancy

… and you access fields using the lenses:

view poContentLength example

My recommendation

I believe that package authors should prefer to export record constructors instead of using smart constructors. Specifically, the smart constructor idiom requires too much specialized language knowledge to create a record, something that should be an introductory task for a functional programming language.

Package authors typically justify smart constructors to improve API stability since they permit adding new default-valued fields in a backwards compatible way. However, I personally do not weight such stability highly (both as a package author and a package user) because Haskell is a typed language and these changes are easy for reverse dependencies to accommodate with the aid of the type-checker.

I place a higher premium on improving the experience for new contributors so that Haskell projects can more easily take root within a polyglot engineering organization. Management tends to be less reluctant to accept Haskell projects within their organization if they feel that other teams can confidently contribute to the Haskell code.

Future directions

One long-term solution that could provide the best of both worlds is if the language had first-class support for default-valued fields. In other words, perhaps you could author a record type like this:

data Person = Person{ name :: String , admin :: Bool = False }

… and then you could safely omit default-valued fields when initializing a record. Of course, I haven’t fully thought through the implications of such a change.

by Gabriel Gonzalez at July 14, 2020 02:31 PM

July 13, 2020

Monday Morning Haskell

Preparing for Rust!


Next week, we're going to change gears a bit and start some interesting projects with Rust! Towards the end of last year, we dabbled a bit with Rust and explored some of the basics of the language. In our next series of blog articles, we're going to take a deep dive into some more advanced concepts.

We'll explore several different Rust libraries in various topics. We'll consider data serialization, web servers and databases, among others. We'll build a couple small apps, and compare the results to our earlier work with Haskell.

To get ready for this series, you should brush up on your Rust basics! To help, we've wrapped up our Rust content into a permanent series on the Beginners page! Here's an overview of that series:

Part 1: Basic Syntax

We start out by learning about Rust's syntax. We'll see quite a few differences to Haskell. But there are also some similarities in unexpected places.

Part 2: Memory Management

One of the major things that sets Rust apart from other languages is how it manages memory. In the second part, we'll learn a bit about how Rust's memory system works.

Part 3: Data Types

In the third part of the series, we'll explore how to make our own data types in Rust. We'll see that Rust borrows some of Haskell's neat ideas!

Part 4: Cargo Package Manager

Cargo is Rust's equivalent of Stack and Cabal. It will be our package and dependency manager. In part 4, we see how to make basic Rust projects using Cargo.

Part 5: Lifetimes and Collections

In the final part, we'll look at some more advanced collection types in Rust. Because of Rust's memory model, we'll need some special rules for handling items in collections. This will lead us to the idea of lifetimes.

If you prefer video content, our Rust Video Tutorial also provides a solid foundation. It goes through all the topics in this series, starting from installation. Either way, stay tuned for new blog content, starting next week!

by James Bowen at July 13, 2020 02:30 PM

Tweag I/O

Qualified do: rebind your do-notation the right way

Announcement of the upcoming QualifiedDo language extension.

July 13, 2020 12:00 AM

July 11, 2020

Brent Yorgey

Competitive programming in Haskell: 2D cross product, part 1

Time for some more geometry! In my previous post I challenged you to solve Cookie Cutters, which asks us to scale the vertices of a polygon so that it has a certain prescribed area. It’s possible to solve this just by looking up an algorithm for computing the area of a polygon (see the “shoelace formula”). But the way to get good at solving geometry problems is not by memorizing a bunch of formulas, but rather by understanding a few general primitives and principles which can be assembled to solve a wide range of problems.

Incidentally, if you’re serious about getting good at geometric problems in competitive programming, then you absolutely must read Victor Lecomte’s Handbook of geometry for competitive programmers. (It’s still a great read even if you’re not serious!)

The 2D cross product

In two dimensions, given vectors \mathbf{u} = (u_x, u_y) and \mathbf{v} = (v_x, v_y), we can compute their cross product as

\mathbf{u} \times \mathbf{v} = \begin{vmatrix} u_x & v_x \\ u_y & v_y \end{vmatrix} = u_x v_y - v_x u_y.

One useful way to understand this is as giving the signed area of the parallelogram determined by \mathbf{u} and \mathbf{v}. The area is positive when \mathbf{v} is counterclockwise from \mathbf{u}, negative when it is clockwise, and zero when the two vectors are colinear (i.e. parallel or antiparallel).

I’m not going to prove this here, since to be quite honest I don’t remember off the top of my head how to derive it. (Also, geometric algebra does a much better job of explaining where this comes from and generalizing to any number of dimensions; in particular, \mathbf{u} \times \mathbf{v} is the coefficient of the bivector resulting from the outer product of \mathbf{u} and \mathbf{v}. But that would take us much too far afield for now!)

So let’s write some Haskell code to compute the cross product of 2D vectors. (All this code has of course been added to Geom.hs.)

cross :: Num s => V2 s -> V2 s -> s
cross (V2 ux uy) (V2 vx vy) = ux*vy - vx*uy

crossP :: Num s => P2 s -> P2 s -> P2 s -> s
crossP p1 p2 p3 = cross (p2 ^-^ p1) (p3 ^-^ p1)

type P2 s = V2 s
type P2D  = P2 Double

A few things to note:

  • cross works over any scalar type which is an instance of Num. In solving Cookie Cutters, this is going to be Double, but it could also be, e.g. Integer.
  • For convenience, crossP is a variant of cross that takes three points as arguments, and computes the cross product of the vector from the first to the second with the vector from the first to the third. In many instances where we want to use the cross product, we actually have the coordinates of three points/vertices.
  • We’ve added P2 and P2D as type aliases for V2 and V2D. They are just aliases, not newtypes, to reduce the need for separate operators that work on points vs vectors, but it’s still helpful to have different type aliases to at least alert us to whether our functions morally want to be given vectors or points as arguments.

Now, keeping in mind the fundamental interpretation of the 2D cross product as computing the signed area of a parallelogram, we can derive a few other operations. First, given the three vertices of a triangle, we can compute the signed area of the triangle as half of the cross product (because the triangle is half the parallelogram). Note that the order of the vertices matters: the area will be positive if they are in counterclockwise order, and negative if clockwise. Swapping any two vertices negates the result. If we want the normal nonnegative area of a triangle regardless of the order of the vertices, of course we can just take the absolute value.

signedTriArea :: Fractional s => P2 s -> P2 s -> P2 s -> s
signedTriArea p1 p2 p3 = crossP p1 p2 p3 / 2

triArea :: Fractional s => P2 s -> P2 s -> P2 s -> s
triArea p1 p2 p3 = abs (signedTriArea p1 p2 p3)

(Notice the Fractional constraint since we have to divide by two.) At first glance, you might think the concept of “signed triangle area” is silly and useless. But it turns out to be the key to understanding the “shoelace formula”.

The shoelace formula for polygon area

Imagine first that we have a convex polygon. If we pick a point somewhere in its interior (say, the centroid) and draw lines from the central point to every vertex, we chop up the polygon into triangles. Obviously, adding up the areas of these triangles will give us the area of the polygon.

What’s much less obvious is that if we add up the signed area of each triangle, it still works even if (1) the polygon is not convex, and/or (2) the “central point” is not in the interior of the polygon! That is, we just pick some arbitrary “central point” (the origin works nicely) and compute the signed area of the triangle formed by the origin and each edge of the polygon. A sort of magical inclusion-exclusion thing happens where all the area outside the polygon gets canceled out, and all the area inside ends up getting counted exactly once. Rather than try to prove this to you, I’ll just illustrate some examples.

So, here’s the Haskell code. signedPolyArea yields a positive area if the vertices of the polygon are in “counterclockwise order” (puzzle: what does “counterclockwise order” mean for a non-convex polygon? Hint: look up “winding number”; this is also the key to a formal proof that all of this works), and negative if they are clockwise.

signedPolyArea :: Fractional s => [P2 s] -> s
signedPolyArea pts = sum $ zipWith (signedTriArea zero) pts (tail pts ++ [head pts])

polyArea :: Fractional s => [P2 s] -> s
polyArea = abs . signedPolyArea

The “shoelace formula”, as it is usually presented, falls out if you inline the zero argument to signedTriArea and then simplify the result. It would be possible to do this and code an optimized version of signedPolyArea that uses the shoelace formula more directly, but I much prefer having this version which is built out of meaningful and reusable components!
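To make the inlining step concrete, here is a small self-contained sketch. It is not the code from Geom.hs: the V2 type is redeclared locally, and the names rotate and shoelace are hypothetical helpers introduced only for this illustration. Because the shared apex is the origin, signedTriArea zero p q reduces to cross p q / 2, and pulling the division out of the sum gives the shoelace formula in its usual form.

```haskell
-- Local stand-in for the V2 type from Geom.hs (an assumption of this sketch).
data V2 s = V2 !s !s deriving (Eq, Show)

-- 2D cross product, as defined earlier in the post.
cross :: Num s => V2 s -> V2 s -> s
cross (V2 ux uy) (V2 vx vy) = ux*vy - vx*uy

-- Hypothetical helper: move the first element to the end, so that
-- zipping a list with its rotation pairs each vertex with the next one.
rotate :: [a] -> [a]
rotate []     = []
rotate (x:xs) = xs ++ [x]

-- Sum of signed triangle areas with the origin as the shared apex.
-- With p1 = zero, crossP zero p q is just cross p q.
signedPolyArea :: Fractional s => [V2 s] -> s
signedPolyArea pts = sum (zipWith (\p q -> cross p q / 2) pts (rotate pts))

-- The same sum with cross written out and the division by two factored
-- out: the shoelace formula as it is usually presented.
shoelace :: Fractional s => [V2 s] -> s
shoelace pts =
  sum [ x1*y2 - x2*y1 | (V2 x1 y1, V2 x2 y2) <- zip pts (rotate pts) ] / 2
```

Both functions give 1 for the counterclockwise unit square [V2 0 0, V2 1 0, V2 1 1, V2 0 1], -1 for the same vertices in reverse order, and 3 for the non-convex L-shaped polygon [V2 0 0, V2 2 0, V2 2 1, V2 1 1, V2 1 2, V2 0 2].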

Incidentally, there is a 3D analogue to the shoelace formula for computing the volume of a 3D polyhedron, but it requires some care to first make sure all the faces are oriented in a compatible way; see section 3.5 of Lecomte.

Other utilities

I added a couple more utilities to Geom.hs which we will need. First, since we need to scale polygons up or down to give a required area, we need the concept of multiplying a vector by a scalar:

(*^) :: Num s => s -> V2 s -> V2 s
k *^ (V2 x y) = V2 (k*x) (k*y)

Also, to help with reading vectors from the input, I added this combinator:

v2 :: Applicative f => f s -> f (V2 s)
v2 s = V2 <$> s <*> s

The idea is to use it with f ~ Scanner. For example, if double :: Scanner Double then we can write v2 double :: Scanner (V2 Double).

Last but not least, I also added getX and getY field labels to the V2 type, for when we need to extract the coordinates of a point or vector:

data V2 s = V2 { getX :: !s, getY :: !s } deriving (Eq, Ord, Show)

Finally, here’s my solution to Cookie Cutters. First, some imports and main, which just scans the input, generates the required scaled and translated list of vertices, and then formats the output.

import           Control.Arrow
import qualified Data.Foldable as F
import           Text.Printf

import           Geom
import           Scanner

main = interact $
  runScanner scan >>> solve >>> map (F.toList >>> map (printf "%.5f") >>> unwords) >>> unlines

Here’s the data type for storing the input, along with a Scanner for it. Notice how we use v2 double' to read in 2D vectors (well, actually points!) in the input. The annoying thing is that some floating-point values in the input are formatted like .5, with no leading 0, and read ".5" :: Double crashes. Hence the need for the double' scanner below, which reads a string token and potentially adds a leading zero before conversion to Double.

data TC = TC { polygon :: [P2D], area :: Double }

scan :: Scanner TC
scan = do
  n <- int
  TC <$> n `times` (v2 double') <*> double'

double' :: Scanner Double
double' = (read . fixup) <$> str
  where
    fixup s@('.':_) = '0':s
    fixup s         = s
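The parsing problem can be checked in isolation. The sketch below uses readMaybe from Text.Read instead of read, so that the failure shows up as Nothing rather than as an exception; parseDouble is a hypothetical name introduced here and is not part of the Scanner module.

```haskell
import Text.Read (readMaybe)

-- Same fix as in double': prepend a zero when the token starts with a dot.
fixup :: String -> String
fixup s@('.':_) = '0':s
fixup s         = s

-- Hypothetical total variant of the conversion step in double'.
parseDouble :: String -> Maybe Double
parseDouble = readMaybe . fixup
```

Here readMaybe ".5" :: Maybe Double is Nothing, while parseDouble ".5" is Just 0.5. Note that fixup only looks at the first character, so a token such as "-.5" would still fail to parse; the sketch assumes, as the solution above does, that such tokens do not occur in the input.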

And finally, putting the pieces together to solve the meat of the problem. We first compute the area of the given polygon using polyArea, then divide the desired area by the original area to find the factor by which the area must increase (or decrease). Area scales as the square of distance, so we must take the square root of this factor to find the factor by which the vertices must scale. We simply scale all the vertices appropriately, then find the minimum x and y coordinates so we can translate by their negation, to make the polygon touch the positive x and y axes as required.

solve :: TC -> [P2D]
solve (TC ps a) = map (^-^ V2 xmin ymin) ps'
  where
    a0 = polyArea ps
    s  = sqrt (a / a0)     -- scaling factor to get the desired area
    ps' = map (s *^) ps
    xmin = minimum (map getX ps')
    ymin = minimum (map getY ps')
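As a worked example of the scaling argument, take the unit square with corners (1,1), (2,1), (2,2), (1,2) and a desired area of 4. The area must grow by a factor of 4, so the vertices scale by sqrt 4 = 2, and the scaled square is then shifted so that its minimum coordinates are zero. The sketch below is a standalone restatement, not the solution above: V2 is fixed to Double, and scaleToArea is a hypothetical name that plays the role of solve without the TC wrapper.

```haskell
-- Local, monomorphic stand-in for the V2 type (an assumption of this sketch).
data V2 = V2 Double Double deriving (Eq, Show)

-- Unsigned polygon area as a sum of signed triangle areas from the origin.
polyArea :: [V2] -> Double
polyArea pts = abs (sum (zipWith tri pts (drop 1 pts ++ take 1 pts)))
  where
    tri (V2 x1 y1) (V2 x2 y2) = (x1*y2 - x2*y1) / 2

-- Scale a polygon to the desired area, then translate it so that its
-- smallest x and y coordinates are both zero.
scaleToArea :: Double -> [V2] -> [V2]
scaleToArea a ps = [ V2 (x - xmin) (y - ymin) | V2 x y <- ps' ]
  where
    s    = sqrt (a / polyArea ps)   -- area scales as the square of distance
    ps'  = [ V2 (s*x) (s*y) | V2 x y <- ps ]
    xmin = minimum [ x | V2 x _ <- ps' ]
    ymin = minimum [ y | V2 _ y <- ps' ]
```

For the square above, scaleToArea 4 [V2 1 1, V2 2 1, V2 2 2, V2 1 2] yields [V2 0 0, V2 2 0, V2 2 2, V2 0 2], whose area is 4.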

Next time: Chair Hopping

For next time I invite you to solve Chair Hopping. Warning, this one is rather difficult! But I had a lot of fun solving it, and the solution touches on several interesting topics (in fact, I’ll probably need more than one blog post).

by Brent at July 11, 2020 02:43 AM

July 08, 2020

Roman Cheplyaka

How I integrate ghcid with vim/neovim

ghcid by Neil Mitchell is a simple but robust tool to get instant error messages for your Haskell code.

For the most part, it doesn’t require any integration with your editor or IDE, which is exactly what makes it robust—if you can run ghci, you can run ghcid. There’s one feature though for which the editor and ghcid have to talk to one another: the ability to quickly jump to the location of the error.

The “official” way to integrate ghcid with neovim is the plugin. However, the plugin insists on running ghcid from within nvim, which makes the whole thing less robust. For instance, I often need to run ghci/ghcid in a different environment than my editor, like in a nix shell or a docker container.

Therefore, I use a simpler, plugin-less setup. After all, vim/nvim already have a feature to read the compiler output, called quickfix, and ghcid is able to write ghci’s output to a file. All we need is a few tweaks to make them play well together. This article describes the setup, which I’ve been happily using for 1.5 years now.

ghcid setup

ghcid passes some flags to ghci which makes its output a bit harder to parse.

Therefore, I build a modified version of ghcid, with a different default set of flags.

(There are probably ways to achieve this that do not require recompiling ghcid, but this is what I prefer—so that when I run ghcid, it simply does what I want.)

The patch you need to apply is very simple:

--- src/Ghcid.hs
+++ src/Ghcid.hs
@@ -97,7 +97,7 @@ options = cmdArgsMode $ Options
     ,restart = [] &= typ "PATH" &= help "Restart the command when the given file or directory contents change (defaults to .ghci and any .cabal file, unless when using stack or a custom command)"
     ,reload = [] &= typ "PATH" &= help "Reload when the given file or directory contents change (defaults to none)"
     ,directory = "." &= typDir &= name "C" &= help "Set the current directory"
-    ,outputfile = [] &= typFile &= name "o" &= help "File to write the full output to"
+    ,outputfile = ["quickfix"] &= typFile &= name "o" &= help "File to write the full output to"
     ,ignoreLoaded = False &= explicit &= name "ignore-loaded" &= help "Keep going if no files are loaded. Requires --reload to be set."
     ,poll = Nothing &= typ "SECONDS" &= opt "0.1" &= explicit &= name "poll" &= help "Use polling every N seconds (defaults to using notifiers)"
     ,max_messages = Nothing &= name "n" &= help "Maximum number of messages to print"
--- src/Language/Haskell/Ghcid/Util.hs
+++ src/Language/Haskell/Ghcid/Util.hs
@@ -47,7 +47,8 @@ ghciFlagsRequiredVersioned =
 -- | Flags that make ghcid work better and are supported on all GHC versions
 ghciFlagsUseful :: [String]
 ghciFlagsUseful =
-    ["-ferror-spans" -- see #148
+    ["-fno-error-spans"
+    ,"-fno-diagnostics-show-caret"
     ,"-j" -- see #153, GHC 7.8 and above, but that's all I support anyway

Alternatively, you can clone my fork of ghcid, which already contains the patch.

Apart from changing the default flags passed to ghci, it also tells ghcid to write the ghci output to the file called quickfix by default, so that you don’t have to write -o quickfix on the command line every time.

vim/neovim setup

Here are the vim pieces that you’ll need to put into your .vimrc or init.vim. First, set the errorformat option to tell vim how to parse ghci’s error messages:

set errorformat=%C%*\\s•\ %m,
               \%-C\ %.%#,
               \%A%f:%l:%c:\ %t%.%#

Don’t ask me how it works—it’s been a long time since I wrote it—but it works.

Next, I prefer to define a few keybindings that make quickfix’ing easier:

map <F5> :cfile quickfix<CR>
map <C-j> :cnext<CR>
map <C-k> :cprevious<CR>

When I see any errors in the ghcid window, I press F5 to load them into vim and jump to the first error. Then, if I need to, I use Ctrl-j and Ctrl-k to jump between different errors.

July 08, 2020 08:00 PM

Mark Jason Dominus

Ron Graham has died

Ron Graham has died. He had a good run. When I check out I will probably not be as accomplished or as missed as Graham, even if I make it to 84.

I met Graham once and he was very nice to me, as he apparently was to everyone. I was planning to write up a reminiscence of the time, but I find I've already done it so you can read that if you care.

Graham's little book Rudiments of Ramsey Theory made a big impression on me when I was an undergraduate. Chapter 1, if I remember correctly, is a large collection of examples, which suited me fine. Chapter 2 begins by introducing a certain notation of Erdős and Rado: $\left[{\Bbb N\atop k}\right]$ is the family of subsets of $\Bbb N$ of size $k$, and

$$\left[{\Bbb N\atop k}\right] \to \left[{\Bbb N\atop k}\right]_r$$

is an abbreviation of the statement that for any $r$-coloring of members of $\left[{\Bbb N\atop k}\right]$ there is always an infinite subset $S\subseteq\Bbb N$ for which every member of $\left[{S\atop k}\right]$ is the same color. I still do not find this notation perspicuous, and at the time, with much less experience, I was boggled. In the midst of my bogglement I was hit with the next sentence, which completely derailed me:

Scan of two lines from _Rudiments of Ramsey Theory_ including the sentence “We will occasionally use this arrow notation unless there is danger of no confusion.”

After this I could no longer think about the mathematics, but only about the sentence.

Outside the mathematical community Graham is probably best-known for juggling, or for Graham's number, which Wikipedia describes:

At the time of its introduction, it was the largest specific positive integer ever to have been used in a published mathematical proof.

One of my better Math Stack Exchange posts was in answer to the question Graham's Number : Why so big?. I love the phrasing of this question! And that, even with the strange phrasing, there is an answer! This type of huge number is quite typical in proofs of Ramsey theory, and I answered in detail.

The sense of humor that led Graham to write “danger of no confusion” is very much on display in the paper that gave us Graham's number. If you are wondering about Graham's number, check out my post.

by Mark Dominus at July 08, 2020 04:40 PM

Addendum to “Weirdos during the Depression”

[ Previously ]

Ran Prieur had a take on this that I thought was insightful:

I would frame it like this: If you break rules that other people are following, you have to pretend to be unhappy, or they'll get really mad, because they don't want to face the grief that they could have been breaking the rules themselves all this time.

by Mark Dominus at July 08, 2020 06:40 AM

Tweag I/O

Setting up Buildkite for Nix-based projects using Terraform and GCP

How to set up a Buildkite-based CI for Nix projects with workers running on GCP.

July 08, 2020 12:00 AM

July 07, 2020

Neil Mitchell

How I Interview

Summary: In previous companies I had a lot of freedom to design an interview. This article describes what I came up with.

Over the years, I've interviewed hundreds of candidates for software engineering jobs (at least 500, probably quite a bit more). I've interviewed for many companies, for teams I was setting up, for teams I was managing, for teams I worked in, and for different teams at the same company. In most places, I've been free to set the majority of the interview. This post describes why and how I designed my interview process. I'm making this post now because where I currently work has a pre-existing interview process, so I won't be following the process below anymore.

I have always run my interviews as a complete assessment of a candidate, aiming to form a complete answer. Sometimes I did that as a phone screen, and sometimes as part of a set of interviews, but I never relied on other people to cover different aspects of a candidate. (Well, I did once, and it went badly...)

When interviewing, there are three questions I want to answer for myself, in order of importance.

Will they be happy here?

If the candidate joined, would they be happy? If people aren't happy, it won't be a pleasant experience, and likely, they won't be very successful. Whether they are happy is the most important criterion because an employee who can't do the job but is happy can be trained or can use their skills for other purposes. But an employee who is unhappy will just drag the morale of the whole team down.

To figure out whether a candidate would be happy, I explain the job (including any office hours/environment/location) and discuss it in contrast to their previous experience. The best person to judge if they would be happy is the candidate themselves - so I ask that question. The tricky part is that it's an interview setting, so they have prepared to say "Yes, that sounds good" to every question. I try and alleviate that by building a rapport with the candidate first, being honest about my experiences, and trying to discuss what they like in the abstract first. If I'm not convinced they are being truthful or properly thinking it through, I ask deeper questions, for example how they like to split their day etc.

A great sign is when a candidate, during the interview, concludes for themselves that this job just isn't what they were looking for. I've had that happen 5 times during the actual interview, and 2 times as an email afterwards. It isn't awkward, and has saved some candidates an inappropriate job (at least 2 would have likely been offered a job otherwise).

While I'm trying to find out if the candidate will be happy, at the same time, I'm also attempting to persuade the candidate that they want to join. It's a hard balance and being open and honest is the only way I have managed it. Assuming I am happy where I work, I can use my enthusiasm to convince the candidate it's a great place, but also give them a sense of what I do.

Can they do the job?

There are two ways I used to figure out if someone can do the job. Firstly, I discuss their background, coding preferences etc. Do the things they've done in the past match the kind of things required in the job? Have they got experience with the non-technical portions of the job, or domain expertise? Most of these aspects are on their CV, so it involves talking about their CV, past projects, what worked well etc.

Secondly, I give them a short technical problem. My standard problem can be solved in under a minute in a single line of code by the best candidates. The problem is not complex, and has no trick-question or clever-approach element. The result can then be used as a springboard to talk about algorithmic efficiency, runtime implementation, parallelism, testing, verification etc. However, my experience is that candidates who struggle at the initial problem go on to struggle with any of the extensions, and candidates that do well at the initial question continue to do well on the extensions. The correlation has been so complete that over time I have started to use the extensions more for candidates who did adequately but not great on the initial problem.

My approach of an incredibly simple problem does not seem to be standard or adopted elsewhere. One reason might be that if it was used at scale, the ability to cheat would be immense (I actually have 2 backup questions for people I've interviewed previously).

Given such a simple question, there have been times when 5 candidates in a row ace the question, and I wonder if the question is just too simple. But usually then the next 5 candidates all struggle terribly and I decide it still has value.

Will I be happy with them doing the job?

The final thing I wonder is would I be happy with them being a part of the team/company. The usual answer is yes. However, if the candidate displays nasty characteristics (belittling, angry, racist, sexist, lying) then it's a no. This question definitely isn't code for "culture fit" or "would I go for a beer with them", but specific negative traits. Generally I answer this question based on whether I see these characteristics reflected in the interactions I have with the candidate, not specific questions. I've never actually had a candidate who was successful at the above questions, and yet failed at this question. I think approximately 5-10 candidates have failed on this question.

by Neil Mitchell at July 07, 2020 09:21 PM

Mark Jason Dominus

Weird constants in math problems

Michael Lugo recently considered a problem involving the allocation of swimmers to swim lanes at random, ending with:

If we compute this for large $n$ we get $f(n) \sim 0.4323n$, which agrees with the Monte Carlo simulations… The constant $0.4323$ is $$\frac{(1-e^{-2})}2.$$

I love when stuff like this happens. The computer is great at doing a quick random simulation and getting you some weird number, and you have no idea what it really means. But mathematical technique can unmask the weird number and learn its true identity. (“It was Old Man Haskins all along!”)

A couple of years back Math Stack Exchange had Expected Number and Size of Contiguously Filled Bins, and although it wasn't exactly what was asked, I ended up looking into this question: We take $n$ balls and throw them at random into $n$ bins that are lined up in a row. A maximal contiguous sequence of all-empty or all-nonempty bins is called a “cluster”. For example, here we have 13 balls that I placed randomly into 13 bins:

13 boxes, some with blue balls.  The boxes contain, respectively, 1, 0, 3, 0, 1, 2, 1, 1, 0, 1, 2, 1, 0 balls.

In this example, there are 8 clusters, of sizes 1, 1, 1, 1, 4, 1, 3, 1. Is this typical? What's the expected cluster size?
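The cluster sizes for this example can be checked mechanically. Here is a minimal Haskell sketch; the names `clusterSizes` and `exampleBins` are illustrative and not taken from the original question.

```haskell
import Data.Function (on)
import Data.List (groupBy)

-- | Sizes of the maximal runs of all-empty or all-nonempty bins,
-- given the number of balls in each bin, read left to right.
clusterSizes :: [Int] -> [Int]
clusterSizes = map length . groupBy ((==) `on` (> 0))

-- | The 13 bins from the example above.
exampleBins :: [Int]
exampleBins = [1, 0, 3, 0, 1, 2, 1, 1, 0, 1, 2, 1, 0]
```

Evaluating `clusterSizes exampleBins` gives `[1,1,1,1,4,1,3,1]`, so the average cluster size in this one sample is 13/8.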

It's easy to use Monte Carlo methods and find that when N is large, the average cluster size is approximately 2.15013. Do you recognize this number? I didn't.
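A simulation of this kind might look like the following sketch. It assumes the number of balls equals the number of bins (call it n, with n at least 1), measures the average cluster size of a trial as n divided by the number of clusters, and uses a small hand-rolled linear congruential generator only to stay dependency-free. It also assumes a 64-bit `Int`. All names here are hypothetical.

```haskell
import Data.List (group, sort)

-- | One step of a linear congruential generator (Numerical Recipes
-- constants). An assumption for illustration; use a real PRNG library
-- for serious work.
nextSeed :: Int -> Int
nextSeed s = (1664525 * s + 1013904223) `mod` 4294967296

-- | Throw n balls into n bins; return the bins that were hit (with
-- repeats) and the new seed. The high bits of the state are used
-- because the low bits of this kind of generator are weak.
throwBalls :: Int -> Int -> ([Int], Int)
throwBalls n seed = go n seed []
  where
    go :: Int -> Int -> [Int] -> ([Int], Int)
    go 0 s acc = (acc, s)
    go k s acc =
      let s' = nextSeed s
       in go (k - 1) s' ((s' `div` 65536) `mod` n : acc)

-- | Number of clusters: maximal runs of equal occupancy over bins 0..n-1.
clusterCount :: Int -> [Int] -> Int
clusterCount n hits = length (group (occupancy 0 distinctHits))
  where
    distinctHits = [b | (b : _) <- group (sort hits)]
    occupancy :: Int -> [Int] -> [Bool]
    occupancy i bs
      | i >= n = []
      | (b : rest) <- bs, b == i = True : occupancy (i + 1) rest
      | otherwise = False : occupancy (i + 1) bs

-- | n / (number of clusters), averaged over the given number of trials.
meanClusterSize :: Int -> Int -> Int -> Double
meanClusterSize n trials seed0 = go trials seed0 0 / fromIntegral trials
  where
    go :: Int -> Int -> Double -> Double
    go 0 _ total = total
    go k s total =
      let (hits, s') = throwBalls n s
          size = fromIntegral n / fromIntegral (clusterCount n hits)
       in go (k - 1) s' (total + size)
```

A call such as `meanClusterSize 2000 50 42` should land close to 2.15; the exact digits depend on the generator and the seed.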

But it's not hard to do the calculation analytically and discover that the reason it's approximately 2.15013 is that the actual answer is $$\frac1{2(e^{-1} - e^{-2})}$$ which is approximately 2.15013.
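Here is a sketch of where that closed form can come from. The assumptions are mine rather than spelled out above: there are N balls and N bins, and "average cluster size" is read as N divided by the expected number of clusters. A new cluster starts exactly where two adjacent bins differ in occupancy, so:

$$\begin{gather} \Pr[\text{bin } i \text{ empty}] = \left(1 - \tfrac1N\right)^N \to e^{-1}, \qquad \Pr[\text{bins } i \text{ and } i+1 \text{ both empty}] = \left(1 - \tfrac2N\right)^N \to e^{-2} \\ \Pr[\text{bins } i \text{ and } i+1 \text{ differ}] = 2\left(\Pr[\text{bin } i \text{ empty}] - \Pr[\text{both empty}]\right) \to 2(e^{-1} - e^{-2}) \\ E[\text{clusters}] = 1 + (N-1)\Pr[\text{adjacent bins differ}] \approx 2N(e^{-1} - e^{-2}), \qquad \frac{N}{E[\text{clusters}]} \to \frac1{2(e^{-1} - e^{-2})} \approx 2.15013 \end{gather}$$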

Math is awesome and wonderful.

(Incidentally, I tried the Inverse Symbolic Calculator just now, but it was no help. It's also not in Plouffe's Miscellaneous Mathematical Constants.)

[ Addendum 20200707: WolframAlpha does correctly identify the constant. ]

by Mark Dominus at July 07, 2020 01:42 AM

July 06, 2020

Monday Morning Haskell

Summer Course Sale!


This week we have some exciting news! Back in March, we opened our Practical Haskell course for enrollment. The first round of students has had a chance to go through the course. So we're now opening it up for general enrollment!

This course goes through some more practical concepts and libraries you might use on a real world project. Here's a sneak peek at some of the skills you'll learn:

  1. Making a web server with Persistent and Servant
  2. Deploying a Haskell project using Heroku and Circle CI
  3. Making a web frontend with Elm, and connecting it to the Haskell backend
  4. Using Monad Transformers and Free Effects to organize our application
  5. Test driven development in Haskell

As a special bonus, for this week only, both of our courses are on sale, $100 off their normal prices! So if you're not ready for Practical Haskell, you can take a look at Haskell From Scratch. With that said, if you buy either course now, you'll have access to all the materials indefinitely! Prices will go back to normal after this Sunday, so head to the course pages now!

Next week, we'll start getting back into the swing of things by reviewing some of our Rust basics!

by James Bowen at July 06, 2020 02:30 PM

Derek Elkins

Enriched Indexed Categories, Syntactically


This is part 3 in a series. See the previous part about internal languages for indexed monoidal categories upon which this part heavily depends.

In category theory, the hom-sets between two objects can often be equipped with some extra structure which is respected by identities and composition. For example, the set of group homomorphisms between two abelian groups is itself an abelian group by defining the operations pointwise. Similarly, the set of monotonic functions between two partially ordered sets (posets) is a poset again by defining the ordering pointwise. Linear functions between vector spaces form a vector space. The set of functors between small categories is a small category. Of course, the structure on the hom-sets can be different than the objects. Trivially, with the earlier examples a vector space is an abelian group, so we could say that linear functions form an abelian group instead of a vector space. Likewise groups are monoids. Less trivially, the set of relations between two sets is a partially ordered set via inclusion. There are many cases where instead of hom-sets we have hom-objects that aren’t naturally thought of as sets. For example, we can have hom-objects be non-negative (extended) real numbers from which the category laws become the laws of a generalized metric space. We can identify posets with categories whose hom-objects are elements of a two element set or, even better, a two element poset with one element less than or equal to the other.
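To make the generalized metric space example concrete, here is a small Haskell sketch. It assumes hom-objects are non-negative `Double` distances, ordered by `>=` and combined by `+`, and it only checks the laws on a finite list of points. The names `isGeneralizedMetric`, `climb`, and `squared` are illustrative.

```haskell
-- | Check the two category laws for distance-valued hom-objects on a
-- finite list of points: an identity 0 >= d(a, a), and a composition
-- d(a, b) + d(b, c) >= d(a, c), i.e. the triangle inequality.
-- Non-negativity of d is assumed, not checked.
isGeneralizedMetric :: [a] -> (a -> a -> Double) -> Bool
isGeneralizedMetric pts d =
  and [0 >= d a a | a <- pts]
    && and [d a b + d b c >= d a c | a <- pts, b <- pts, c <- pts]

-- | An asymmetric example: climbing costs the height gained, descending is free.
climb :: Double -> Double -> Double
climb a b = max 0 (b - a)

-- | A non-example: squared distance violates the triangle inequality.
squared :: Double -> Double -> Double
squared a b = (a - b) * (a - b)
```

On the points `[0, 1, 3]`, `isGeneralizedMetric` accepts `climb` and rejects `squared`, since 1 + 4 is less than 9. Note that symmetry is not required, which is what makes the metric "generalized".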

This general process is called enriching a category in some other category which is almost always called |\newcommand{\V}{\mathcal V}\V| in the generic case. We then talk about having |\V|-categories and |\V|-functors, etc. In a specific case, it will be something like |\mathbf{Ab}|-categories for an |\mathbf{Ab}|-enriched category, where |\mathbf{Ab}| is the category of abelian groups. Unsurprisingly, not just any category will do for |\V|. However, it turns out very little structure is needed to define a notion of |\V|-category, |\V|-functor, |\V|-natural transformation, and |\V|-profunctor. The usual “baseline” is that |\V| is a monoidal category. As mentioned in the previous post, paraphrasing Bénabou, notions of “families of objects/arrows” are ubiquitous and fundamental in category theory. It is useful for our purposes to make this structure explicit. For very little cost, this will also provide a vastly more general notion that will readily capture enriched categories, indexed categories, and categories that are simultaneously indexed and enriched, of which internal categories are an example. The tool for this is a (Grothendieck) fibration aka a fibered category or the mostly equivalent concept of an indexed category.1

To that end, instead of just a monoidal category, we’ll be using indexed monoidal categories. Typically, to get an experience as much like ordinary category theory as possible, additional structure is assumed on |\V|. In particular, it is assumed to be an (indexed) cosmos which means that it is an indexed symmetric monoidally closed category with indexed coproducts preserved by |\otimes| and indexed products and fiberwise finite limits and colimits (preserved by the indexed structure). This is quite a lot more structure which I’ll introduce in later parts. In this part, I’ll make no assumptions beyond having an indexed monoidal category.

Basic Category Theory in Indexed Monoidal Categories

The purpose of the machinery of the previous posts is to make this section seem boring and pedestrian. Other than being a little more explicit and formal, for most of the following concepts it will look like we’re restating the usual definitions of categories, functors, and natural transformations. The main exception is profunctors which will be presented in a quite different manner, though still in a manner that is easy to connect to the usual presentation. (We will see how to recover the usual presentation in later parts.)

While I’ll start by being rather explicit about indexes and such, I will start to suppress that detail over time as most of it is inferrable. One big exception is that right from the start I’ll omit the explicit dependence of primitive terms on indexes. For example, while I’ll write |\mathsf F(\mathsf{id}) = \mathsf{id}| for the first functor law, what the syntax of the previous posts says I should be writing is |\mathsf F(s, s; \mathsf{id}(s)) = \mathsf{id}(s)|.

To start, I want to introduce two different notions of |\V|-category, small |\V|-categories and large |\V|-categories, and talk about what this distinction actually means. I will proceed afterwards with the “large” notions, e.g. |\V|-functors between large |\V|-categories, as the small case will be an easy special case.

Small |\V|-categories

The theory of a small |\V|-category consists of:

  • an index type |\newcommand{\O}{\mathsf O}\O|,
  • a linear type |\newcommand{\A}{\mathsf A}s, t : \O \vdash \A(t, s)|,
  • a linear term |s : \O; \vdash \mathsf{id} : \A(s, s)|, and
  • a linear term |s, u, t : \O; g : \A(t, u), f : \A(u, s) \vdash g \circ f : \A(t, s)|

satisfying

$$\begin{gather} s, t : \O; f : \A(t, s) \vdash \mathsf{id} \circ f = f = f \circ \mathsf{id} : \A(t, s)\qquad \text{and} \\ \\ s, u, v, t : \O; h : \A(t, v), g : \A(v, u), f : \A(u, s) \vdash h \circ (g \circ f) = (h \circ g) \circ f : \A(t, s) \end{gather}$$

In the notation of the previous posts, I’m saying |\O : \mathsf{IxType}|, |\A : (\O, \O) \to \mathsf{Type}|, |\mathsf{id} : (s : \O;) \to \A(s, s)|, and |\circ : (s, u, t : \O; \A(t, u), \A(u, s)) \to \A(t, s)| are primitives added to the signature of the theory. I’ll continue to use the earlier, more pointwise presentation above to describe the signature.

A small |\V|-category for an |\mathbf S|-indexed monoidal category |\V| is then an interpretation of this theory. That is, an object |O| of |\mathbf S| as the interpretation of |\O|, and an object |A| of |\V^{O\times O}| as the interpretation of |\A|. The interpretation of |\mathsf{id}| is an arrow |I_O \to \Delta_O^* A| of |\V^O|, where |\Delta_O : O \to O\times O| is the diagonal arrow |\langle id, id\rangle| in |\mathbf S|. The interpretation of |\circ| is an arrow |\pi_{23}^* A \otimes \pi_{12}^* A \to \pi_{13}^* A| of |\V^{O\times O \times O}| where |\pi_{ij} : X_1 \times X_2 \times X_3 \to X_i \times X_j| are the appropriate projections.

Since we can prove in the internal language that the choice of |()| for |\O|, |s, t : () \vdash I| for |\A|, |s, t : (); x : I \vdash x : I| for |\mathsf{id}|, and |s, u, t : (); f : I, g: I \vdash \mathsf{match}\ f\ \mathsf{as}\ *\ \mathsf{in}\ g : I| for |\circ| satisfies the laws of the theory of a small |\V|-category, we know we have a |\V|-category which I’ll call |\mathbb I| for any |\V|2.

For |\V = \mathcal Fam(\mathbf V)|, |O| is a set of objects. |A| is an |(O\times O)|-indexed family of objects of |\mathbf V| which we can write |\{A(t,s)\}_{s,t\in O}|. The interpretation of |\mathsf{id}| is an |O|-indexed family of arrows of |\mathbf V|, |\{ id_s : I_s \to A(s, s) \}_{s\in O}|. Finally, the interpretation of |\circ| is a family of arrows of |\mathbf V|, |\{ \circ_{s,u,t} : A(t, u)\otimes A(u, s) \to A(t, s) \}_{s,u,t \in O}|. This is exactly the data of a (small) |\mathbf V|-enriched category. One example is when |\mathbf V = \mathbf{Cat}| which produces (strict) |2|-categories.

For |\V = \mathcal Self(\mathbf S)|, |O| is an object of |\mathbf S|. |A| is an arrow of |\mathbf S| into |O\times O|, i.e. an object of |\mathbf S/O\times O|. I’ll write the object part of this as |A| as well, i.e. |A : A \to O\times O|. The idea is that the two projections are the target and source of the arrow. The interpretation of |\mathsf{id}| is an arrow |ids : O \to A| such that |A \circ ids = \Delta_O|, i.e. the arrow produced by |ids| should have the same target and source. Finally, the interpretation of |\circ| is an arrow |c| from the pullback of |\pi_2 \circ A| and |\pi_1 \circ A| to |A|. The source of this arrow is the object of composable pairs of arrows. We further require |c| to produce an arrow with the appropriate target and source of the composable pair. This is exactly the data for a category internal to |\mathbf S|. An interesting case for contrast with the previous paragraph is that a category internal to |\mathbf{Cat}| is a double category.

For |\V = \mathcal Const(\mathbf V)|, the above data is exactly the data of a monoid object in |\mathbf V|. This is a formal illustration that a (|\mathbf V|-enriched) category is just an “indexed monoid”. Indeed, |\mathcal Const(\mathbf V)|-functors will be monoid homomorphisms and |\mathcal Const(\mathbf V)|-profunctors will be double-sided monoid actions. In particular, when |\mathbf V = \mathbf{Ab}|, we get rings, ring homomorphisms, and bimodules of rings. The intuitions here are the guiding ones for the construction we’re realizing.
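A loose Haskell analogy, in the ordinary cartesian setting rather than a general |\V|: if the hom type ignores its object indices, then a `Category` instance is exactly a `Monoid`. The names `OneObject` and `unOneObject` are hypothetical and only serve this sketch.

```haskell
import Prelude hiding (id, (.))
import Control.Category (Category (..))

-- | A hom type that ignores its source and target indices.
newtype OneObject m a b = OneObject m

unOneObject :: OneObject m a b -> m
unOneObject (OneObject m) = m

-- | Identity is the monoid unit and composition is the monoid product,
-- so the category laws are literally the monoid laws.
instance Monoid m => Category (OneObject m) where
  id = OneObject mempty
  OneObject g . OneObject f = OneObject (g <> f)
```

For example, `unOneObject (OneObject "ab" . OneObject "cd")` is `"abcd"`.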

One aspect of working in a not-necessarily-symmetric (indexed) monoidal category is that the choice between the standard order of composition and diagrammatic order is not so trivial, since it is not even possible to state what it means for them to be equivalent. To be clear, this definition isn’t really taking a stance on the issue. We can interpret |\A(t, s)| as the type of arrows |s \to t| and then |\circ| will be the standard order of composition, or as the type of arrows |t \to s| and then |\circ| will be the diagrammatic order. In fact, there’s nothing in this definition that stops us from having |\A(t, s)| being the type of arrows |t \to s| while still having |\circ| be standard order composition as usual. The issue comes up only once we consider |\V|-profunctors as we will see.
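For readers who know the two orders from Haskell, here is a small sketch of the distinction in the ordinary (symmetric) setting of functions, where the two conventions are freely interchangeable. The function names are illustrative.

```haskell
import Control.Category ((>>>))

addOne, double :: Int -> Int
addOne = (+ 1)
double = (* 2)

-- | Standard order: the function applied last is written first.
standard :: Int -> Int
standard = double . addOne

-- | Diagrammatic order: the function applied first is written first.
diagrammatic :: Int -> Int
diagrammatic = addOne >>> double
```

Both send 3 to 8. Translating between the two swaps the arguments of composition, and that swap is precisely what a non-symmetric |\V| does not provide.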

Large |\V|-categories

A large |\V|-category is a model of a theory of the following form. There is

  • a collection of index types |\O_x|,
  • for each pair of index types |\O_x| and |\O_y|, a linear type |s : \O_x, t : \O_y \vdash \A_{yx}(t, s)|,
  • for each index type |\O_x|, a linear term |s : \O_x; \vdash \mathsf{id}_x : \A_{xx}(s, s)|, and
  • for each triple of index types, |\O_x|, |\O_y|, and |\O_z|, a linear term |s: \O_x, u : \O_y, t : \O_z; g : \A_{zy}(t, u), f : \A_{yx}(u, s) \vdash g \circ_{xyz} f : \A_{zx}(t, s)|

satisfying the same laws as small |\V|-categories, just with some extra subscripts. Clearly, a small |\V|-category is just a large |\V|-category where the collection of index types consists of just a single index type.

Small versus Large

The typical way of describing the difference between small and large (|\V|-)categories would be to say something like: “By having a collection of index types in a large |\V|-category, we can have a proper class of them. In a small |\V|-category, the index type of objects is interpreted as an object in a category, and a proper class can’t be an object of a category3.” However, for us, there’s a more directly relevant distinction. Namely, while we had a single theory of small |\V|-categories, there is no single theory of large |\V|-categories. Different large |\V|-categories correspond to models of (potentially) different theories. In other words, the notion of a small |\V|-category is able to be captured by our notion of theory but not the concept of a large |\V|-category. This extends to |\V|-functors, |\V|-natural transformations, and |\V|-profunctors. In the small case, we can define a single theory which captures each of these concepts, but that isn’t possible in the large case. In general, notions of “large” and “small” are about what we can internalize within the relevant object language, usually a set theory. Arguably, the only reason we speak of “size” and of proper classes being “large” is that the Axiom of Specification outright states that any subclass of a set is a set, so proper classes in ZFC can’t be subsets of any set. As I’ve mentioned elsewhere, you can definitely have set theories with proper classes that are contained in even finite sets, so the issue isn’t one of “bigness”.

The above discussion also explains the hand-wavy word “collection”. The collection is a collection in the meta-language in which we’re discussing/formalizing the notion of theory. When working within the theory of a particular large |\V|-category, all the various types and terms are just available ab initio and are independent. There is no notion of “collection of types” within the theory and nothing indicating that some types are part of a “collection” with others.

Another perspective on this distinction between large and small |\V|-categories is that small |\V|-categories have a family of arrows, identities, and compositions with respect to the notion of “family” represented by our internal language. If we hadn’t wanted to bother with formulating the internal language of an indexed monoidal category, we could have still defined the notion of |\V|-category with respect to the internal language of a (non-indexed) monoidal category. It’s just that all such |\V|-categories (except for monoid objects) would have to be large |\V|-categories. That is, the indexing and notion of “family” would be at a meta-level. Since most of the |\V|-categories of interest will be large (though, generally a special case called a |\V|-fibration which reins in the size a bit), it may seem that there was no real benefit to the indexing stuff. Where it comes in, or rather where small |\V|-categories come in, is that our notion of (co)complete means “has all (co)limits of small diagrams” and small diagrams are |\V|-functors from small |\V|-categories.4 There are several other places, e.g. the notion of presheaf, where we implicitly depend on what we mean by “small |\V|-category”. So while we won’t usually be focused on small |\V|-categories, which |\V|-categories are small impacts the structure of the whole theory.

|\V|-functors


The formulation of |\V|-functors is straightforward. As mentioned before, I’ll only present the “large” version.

Formally, we can’t formulate a theory of just a |\V|-functor, but rather we need to formulate a theory of “a pair of |\V|-categories and a |\V|-functor between them”.

A |\V|-functor between (large) |\V|-categories |\mathcal C| and |\mathcal D| is a model of a theory consisting of a theory of a large |\V|-category, of which |\mathcal C| is a model, and a theory of a large |\V|-category which I’ll write with primes, of which |\mathcal D| is a model, and a model of the following additional data:

  • for each index type |\O_x|, an index type |\O'_{F_x}| and an index term, |s : \O_x \vdash \mathsf F_x(s) : \O'_{F_x}|, and
  • for each pair of index types, |\O_x| and |\O_y|, a linear term |s : \O_x, t : \O_y; f : \A_{yx}(t, s) \vdash \mathsf F_{yx}(f) : \A'_{F_yF_x}(\mathsf F_y(t), \mathsf F_x(s))|

satisfying
$$\begin{gather} s : \O_x; \vdash \mathsf F_{xx}(\mathsf{id}_x) = \mathsf{id}'_{F_x}: \A'_{F_xF_x}(\mathsf F_x(s), \mathsf F_x(s))\qquad\text{and} \\ s : \O_x, u : \O_y, t : \O_z; g : \A_{zy}(t, u), f : \A_{yx}(u, s) \vdash \mathsf F_{zx}(g \circ_{xyz} f) = \mathsf F_{zy}(g) \circ'_{F_xF_yF_z} \mathsf F_{yx}(f) : \A'_{F_zF_x}(\mathsf F_z(t), \mathsf F_x(s)) \end{gather}$$

The assignment of |\O'_{F_x}| for |\O_x| is, again, purely metatheoretical. From within the theory, all we know is that we happen to have some index types named |\O_x| and |\O'_{F_x}| and some data relating them. The fact that there is some kind of mapping of one to the other is not part of the data.

Next, I’ll define |\V|-natural transformations. As before, what we’re really doing is defining |\V|-natural transformations as a model of a theory of “a pair of (large) |\V|-categories with a pair of |\V|-functors between them and a |\V|-natural transformation between those”. As before, I’ll use primes to indicate the types and terms of the second of each pair of subtheories. Unlike before, I’ll only mention what is added which is:

  • for each index type |\O_x|, a linear term |s : \O_x; \vdash \tau_x : \A'_{F'_xF_x}(\mathsf F'_x(s), \mathsf F_x(s))|

satisfying

$$\begin{gather} s : \O_x, t : \O_y; f : \A_{yx}(t, s) \vdash \mathsf F'_{yx}(f) \circ'_{F_xF'_xF'_y} \tau_x = \tau_y \circ'_{F_xF_yF'_y} \mathsf F_{yx}(f) : \A'_{F'_yF_x}(\mathsf F'_y(t), \mathsf F_x(s)) \end{gather}$$

In practice, I’ll suppress the subscripts on all but index types as the rest are inferrable. This makes the above equation the much more readable
$$\begin{gather} s : \O_x, t : \O_y; f : \A(t, s) \vdash \mathsf F'(f) \circ' \tau = \tau \circ' \mathsf F(f) : \A'(\mathsf F'(t), \mathsf F(s)) \end{gather}$$

|\V|-profunctors


Here’s where we need to depart from the usual story. In the usual story, a |\mathbf V|-enriched profunctor |\newcommand{\proarrow}{\mathrel{-\!\!\!\mapsto}} P : \mathcal C \proarrow \mathcal D| is a |\mathbf V|-enriched functor |P : \mathcal C\otimes\mathcal D^{op}\to\mathbf V| (or, often, the opposite convention is used |P : \mathcal C^{op}\otimes\mathcal D \to \mathbf V|). There are many problems with this definition in our context.

  1. Without symmetry, we have no definition of opposite category.
  2. Without symmetry, the tensor product of |\mathbf V|-enriched categories doesn’t make sense.
  3. |\mathbf V| is not itself a |\mathbf V|-enriched category, so it doesn’t make sense to talk about |\mathbf V|-enriched functors into it.
  4. Even if it was, we’d need some way of converting between arrows of |\mathbf V| as a category and arrows of |\mathbf V| as a |\mathbf V|-enriched category.
  5. The equation |P(g \circ f, h \circ k) = P(g, k) \circ P(f, h)| requires symmetry. (This is arguably 2 again.)

All of these problems are solved when |\mathbf V| is a symmetric monoidally closed category.

Alternatively, we can reformulate the notion of a |\V|-profunctor so that it works in our context and is equivalent to the usual one when it makes sense.5 To this end, at a low level a |\mathbf V|-enriched profunctor is a family of arrows
$$\begin{gather}P : \mathcal C(t, s)\otimes\mathcal D(s', t') \to [P(s, s'), P(t, t')]\end{gather}$$
which satisfies
$$\begin{gather}P(g \circ f, h \circ k)(p) = P(g, k)(P(f, h)(p))\end{gather}$$
in the internal language of a symmetric monoidally closed category among other laws. We can uncurry |P| to eliminate the need for closure, getting
$$\begin{gather}P : \mathcal C(t, s)\otimes \mathcal D(s', t')\otimes P(s, s') \to P(t, t')\end{gather}$$
satisfying
$$\begin{gather}P(g \circ f, h \circ k, p) = P(g, k, P(f, h, p))\end{gather}$$
We see that we’re always going to need to permute |f| and |h| past |k| unless we move the third argument to the second producing the nice
$$\begin{gather}P : \mathcal C(t, s)\otimes P(s, s') \otimes \mathcal D(s', t') \to P(t, t')\end{gather}$$
and the law
$$\begin{gather}P(g \circ f, p, h \circ k) = P(g, P(f, p, h), k)\end{gather}$$
which no longer requires symmetry. This is also where the order of the arguments of |\circ| drives the order of the arguments of |\V|-profunctors.

A |\V|-profunctor, |P : \mathcal C \proarrow \mathcal D|, is a model of the theory (containing subtheories for |\mathcal C| and |\mathcal D| etc. as in the |\V|-functor case) having:

  • for each pair of index types |\O_x| and |\O'_{x'}|, a linear type |s : \O_x, t : \O'_{x'} \vdash \mathsf P_{xx'}(s, t)|, and
  • for each quadruple of index types |\O_x|, |\O_y|, |\O'_{x'}|, and |\O'_{y'}|, a linear term |s : \O_x, s' : \O'_{x'}, t : \O_y, t' : \O'_{y'}; f : \A_{yx}(t, s), p : \mathsf P_{xx'}(s, s'), h : \A'_{x'y'}(s', t') \vdash \mathsf P_{yxx'y'}(f, p, h) : \mathsf P_{yy'}(t, t')|

satisfying

$$\begin{align} & s : \O_x, s' : \O'_{x'}; p : \mathsf P(s, s') \vdash \mathsf P(\mathsf{id}, p, \mathsf{id}') = p : \mathsf P(s, s') \\ \\ & s : \O_x, s' : \O'_{x'}, u : \O_y, u' : \O'_{y'}, t : \O_z, t' : \O'_{z'}; \\ & g : \A(t, u), f : \A(u, s), p : \mathsf P(s, s'), h : \A'(s', u'), k : \A'(u', t') \\ \vdash\ & \mathsf P(g \circ f, p, h \circ' k) = \mathsf P(g, \mathsf P(f, p, h), k) : \mathsf P(t, t') \end{align}$$

This can also be equivalently presented as a pair of a left and a right action satisfying bimodule laws. We’ll make the following definitions |\mathsf P_l(f, p) = \mathsf P(f, p, \mathsf{id})| and |\mathsf P_r(p, h) = \mathsf P(\mathsf{id}, p, h)|.
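The relationship between the three-argument action and the two one-sided actions can be sketched in Haskell. This lives in the ordinary setting of types and functions, so it cannot show the non-symmetric subtleties; it only mirrors the shape of the definitions. It reads |\A(t, s)| as arrows |s \to t|, and the names `Bimodule`, `act`, `actL`, `actR`, `Hom`, and `runHom` are all hypothetical.

```haskell
-- | A stand-in for a profunctor with the three-argument action P(f, p, h):
-- covariant in the first index, contravariant in the second.
class Bimodule p where
  act :: (s -> t) -> p s s' -> (t' -> s') -> p t t'

-- | Left action P_l(f, p) = P(f, p, id).
actL :: Bimodule p => (s -> t) -> p s s' -> p t s'
actL f p = act f p id

-- | Right action P_r(p, h) = P(id, p, h).
actR :: Bimodule p => p s s' -> (t' -> s') -> p s t'
actR p h = act id p h

-- | The hom profunctor: an element of Hom s s' is an arrow s' -> s,
-- and the action is the composite f . g . h.
newtype Hom s s' = Hom (s' -> s)

instance Bimodule Hom where
  act f (Hom g) h = Hom (f . g . h)

runHom :: Hom s s' -> s' -> s
runHom (Hom g) = g
```

For `Hom`, acting on the left and then on the right agrees with acting on both sides at once, which is the bimodule compatibility law: `actL f (actR p h)`, `actR (actL f p) h`, and `act f p h` all denote the same composite.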

A |\V|-presheaf on |\mathcal C| is a |\V|-profunctor |P : \mathbb I \proarrow \mathcal C|. Similarly, a |\V|-copresheaf on |\mathcal C| is a |\V|-profunctor |P : \mathcal C \proarrow \mathbb I|.

Of course, we have the fact that the term
$$\begin{gather} s : \O_x, t : \O_y, s' : \O_z, t' : \O_w; h : \A(t, s), g : \A(s, s'), f : \A(s', t') \vdash h \circ g \circ f : \A(t, t') \end{gather}$$
witnesses the interpretation of |\A| as a |\V|-profunctor |\mathcal C \proarrow \mathcal C| for any |\V|-category, |\mathcal C|, which we’ll call the hom |\V|-profunctor. More generally, given a |\V|-profunctor |P : \mathcal C \proarrow \mathcal D|, and |\V|-functors |F : \mathcal C' \to \mathcal C| and |F' : \mathcal D' \to \mathcal D|, we have the |\V|-profunctor |P(F, F') : \mathcal C' \proarrow \mathcal D'| defined as
$$\begin{gather} s : \O_x, s' : \O'_{x'}, t : \O_y, t' : \O'_{y'}; f : \A(t, s), p : \mathsf P(\mathsf F(s), \mathsf F'(s')), f' : \A'(s', t') \vdash \mathsf P(\mathsf F(f), p, \mathsf F'(f')) : \mathsf P(\mathsf F(t), \mathsf F'(t')) \end{gather}$$
In particular, we have the representable |\V|-profunctors when |P| is the hom |\V|-profunctor and either |F| or |F'| is the identity |\V|-functor, e.g. |\mathcal C(Id, F)| or |\mathcal C(F, Id)|.

|\V|-multimorphisms


There’s a natural notion of morphism of |\V|-profunctors which we could derive either via passing the notion of natural transformation of the bifunctorial view through the same reformulations as above, or by generalizing the notion of a bimodule homomorphism. This would produce a notion like: a |\V|-natural transformation |\alpha : P \to Q| is a family |\alpha : P(t, s) \to Q(t, s)| satisfying |\alpha(P(f, p, h)) = Q(f, \alpha(p), h)|. While there’s nothing wrong with this definition, it doesn’t quite meet our needs. One way to see this is that it would be nice to have a bicategory whose |0|-cells were |\V|-categories, |1|-cells |\V|-profunctors, and |2|-cells |\V|-natural transformations as above. The problem there isn’t the |\V|-natural transformations but the |1|-cells. In particular, we don’t have composition of |\V|-profunctors. In the analogy with bimodules, we don’t have tensor products so we can’t reduce multilinear maps to linear maps; therefore, linear maps don’t suffice, and we really want a notion of multilinear maps.

So, instead of a bicategory what we’ll have is a virtual bicategory (or, more generally, a virtual double category). A virtual bicategory is to a bicategory what a multicategory is to a monoidal category, i.e. multicategories are “virtual monoidal categories”. The only difference between a virtual bicategory and a multicategory is that instead of our multimorphisms having arbitrary lists of objects as their sources, our “objects” (|1|-cells) themselves have sources and targets (|0|-cells) and our multimorphisms (|2|-cells) have composable sequences of |1|-cells as their sources.

A |\V|-multimorphism from a composable sequence of |\V|-profunctors |P_1, \dots, P_n| to the |\V|-profunctor |Q| is a model of the theory consisting of the various necessary subtheories and:

  • a linear term, |s_0 : \O_{x_0}^0, \dots, s_n : \O_{x_n}^n; p_1 : \mathsf P_{x_0x_1}^1(s_0, s_1), \dots, p_n : \mathsf P_{x_{n-1}x_n}^n(s_{n-1}, s_n) \vdash \tau_{x_0\cdots x_n}(p_1, \dots, p_n) : \mathsf Q_{x_0x_n}(s_0, s_n)|

satisfying
$$\begin{align} & t, s_0 : \O^0, \dots, s_n : \O^n; f : \A^0(t, s_0), p_1 : \mathsf P^1(s_0, s_1), \dots, p_n : \mathsf P^n(s_{n-1}, s_n) \\ \vdash\ & \tau(\mathsf P_l^1(f, p_1), \dots, p_n) = \mathsf Q_l(f, \tau(p_1, \dots, p_n)) : \mathsf Q(t, s_n) \\ \\ & s_0 : \O^0, \dots, s_n, s : \O^n; p_1 : \mathsf P^1(s_0, s_1), \dots, p_n : \mathsf P^n(s_{n-1}, s_n), f : \A^n(s_n, s) \\ \vdash\ & \tau(p_1, \dots, \mathsf P_r^n(p_n, f)) = \mathsf Q_r(\tau(p_1, \dots, p_n), f) : \mathsf Q(s_0, s) \\ \\ & s_0 : \O^0, \dots, s_i, s'_i : \O^i, \dots, s_n : \O^n; \\ & p_1 : \mathsf P^1(s_0, s_1), \dots, p_i : \mathsf P^i(s_{i-1}, s_i), f : \A^i(s_i, s'_i), p_{i+1} : \mathsf P^{i+1}(s'_i, s_{i+1}), \dots, p_n : \mathsf P^n(s_{n-1}, s_n) \\ \vdash\ & \tau(p_1, \dots, \mathsf P_r^i(p_i, f), p_{i+1}, \dots, p_n) = \tau(p_1, \dots, p_i, \mathsf P_l^{i+1}(f, p_{i+1}), \dots, p_n) : \mathsf Q(s_0, s_n) \end{align}$$
except for the |n=0| case in which case the only law is
$$\begin{gather} t, s : \O^0; f : \A^0(t, s) \vdash \mathsf Q_l(f, \tau()) = \mathsf Q_r(\tau(), f) : \mathsf Q(t, s) \end{gather}$$

The laws involving the action of |\mathsf Q| are called external equivariance, while the remaining law is called internal equivariance. We’ll write |\V\mathbf{Prof}(P_1, \dots, P_n; Q)| for the set of |\V|-multimorphisms from the composable sequence of |\V|-profunctors |P_1, \dots, P_n| to the |\V|-profunctor |Q|.

As with multilinear maps, we can characterize composition via a universal property. Write |Q_1\diamond\cdots\diamond Q_n| for the composite |\V|-profunctor (when it exists) of the composable sequence |Q_1, \dots, Q_n|. We then have for any pair of composable sequences |R_1, \dots, R_m| and |S_1, \dots, S_k| which compose with |Q_1, \dots, Q_n|,
$$\begin{gather} \V\mathbf{Prof}(R_1,\dots, R_m, Q_1 \diamond \cdots \diamond Q_n, S_1, \dots, S_k; -) \cong \V\mathbf{Prof}(R_1,\dots, R_m, Q_1, \dots, Q_n, S_1, \dots, S_k; -) \end{gather}$$
where the forward direction is induced by precomposition with a |\V|-multimorphism |Q_1, \dots, Q_n \to Q_1 \diamond \cdots \diamond Q_n|. A |\V|-multimorphism with this property is called opcartesian. The |n=0| case is particularly important and, for a |\V|-category |\mathcal C|, produces the unit |\V|-profunctor, |U_\mathcal C : \mathcal C \proarrow \mathcal C| as the composite of the empty sequence. When we have all composites, |\V\mathbf{Prof}| becomes an actual bicategory rather than a virtual bicategory. |\V\mathbf{Prof}| always has all units, namely the hom |\V|-profunctors. Much like we can define the tensor product of modules by quotienting the tensor product of their underlying abelian groups by internal equivariance, we will find that we can make composites when we have enough (well-behaved) colimits6.

Related to composites, we can talk about left/right closure of |\V\mathbf{Prof}|. In this case we have the natural isomorphisms:
$$\begin{gather} \V\mathbf{Prof}(Q_1,\dots, Q_n, R; S) \cong \V\mathbf{Prof}(Q_1, \dots, Q_n; R \triangleright S) \\ \V\mathbf{Prof}(R, Q_1, \dots, Q_n; S) \cong \V\mathbf{Prof}(Q_1, \dots, Q_n;S \triangleleft R) \end{gather}$$
Like composites, this merely characterizes these constructs; they need not exist in general. These will be important when we talk about Yoneda and (co)limits in |\V|-categories.

A |\V|-natural transformation |\alpha : F \to G : \mathcal C \to \mathcal D| is the same as |\alpha\in\V\mathbf{Prof}(;\mathcal D(G, F))|.

Example Proof

Just as an example, let’s prove a basic fact about categories for arbitrary |\V|-categories. This will use an informal style.

The fact will be that full and faithful functors reflect isomorphisms. Let’s go through the typical proof for the ordinary category case.

Suppose we have a natural transformation |\varphi : \mathcal D(FA, FB) \to \mathcal C(A, B)| natural in |A| and |B| such that |\varphi| is an inverse to |F|, i.e. the action of the functor |F| on arrows. If |Ff \circ Fg = id| and |Fg \circ Ff = id|, then by the naturality of |\varphi|, |\varphi(id) = \varphi(Ff \circ id \circ Fg) = f \circ \varphi(id) \circ g| and similarly with |f| and |g| switched. We now just need to show that |\varphi(id) = id| but |id = F(id)|, so |\varphi(id) = \varphi(F(id)) = id|. |\square|

Now in the internal language. We’ll start with the theory of a |\V|-functor, so we have |\O|, |\O'|, |\A|, |\A'|, and |\mathsf F|. While the previous paragraph talks about a natural transformation, we can readily see that it’s really a multimorphism. In our case, it is a |\V|-multimorphism |\varphi| from |\A'(\mathsf F, \mathsf F)| to |\A|. Before we do that though, we need to show that |\mathsf F| itself is a |\V|-multimorphism. This corresponds to the naturality of the action on arrows of |F| which we took for granted in the previous paragraph. This is quickly verified: the external equivariance equations are just the functor law for composites. The additional data we have is two linear terms |\mathsf f| and |\mathsf g| such that |\mathsf F(\mathsf f) \circ \mathsf F(\mathsf g) = \mathsf{id}| and |\mathsf F(\mathsf g) \circ \mathsf F(\mathsf f) = \mathsf{id}|. Also, |\varphi(\mathsf F(h)) = h|. The result follows almost identically to the previous paragraph: |\varphi(\mathsf{id}) = \varphi(\mathsf F(\mathsf f) \circ \mathsf F(\mathsf g)) = \varphi(\mathsf F(\mathsf f) \circ \mathsf{id} \circ \mathsf F(\mathsf g))|, and applying external equivariance twice gives |\mathsf f \circ \varphi(\mathsf{id}) \circ \mathsf g|. The functor law for |\mathsf{id}| gives |\varphi(\mathsf{id}) = \varphi(\mathsf F(\mathsf{id})) = \mathsf{id}|. A quick glance verifies that all these equations use their free variables linearly as required. |\square|

As a warning, in the above |\mathsf f| and |\mathsf g| are not free variables but constants, i.e. primitive linear terms. Thus there is no issue with an equation like |\mathsf F(\mathsf f) \circ \mathsf F(\mathsf g) = \mathsf{id}| as both sides have no free variables.

This is a very basic result but, again, the payoff here is how boring and similar to the usual case this is. For contrast, the definition of an internal profunctor is given here. This definition is easier to connect to our notion of |\V|-presheaf, specifically a |\mathcal Self(\mathbf S)|-presheaf, than it is to the usual |\mathbf{Set}|-valued functor definition. While not hard, it would take me a bit of time to even formulate the above proposition, and a proof in terms of the explicit definitions would be hard to recognize as just the ordinary proof.

For fun, let’s figure out what the |\mathcal Const(\mathbf{Ab})| case of this result says explicitly. A |\mathcal Const(\mathbf{Ab})|-category is a ring, a |\mathcal Const(\mathbf{Ab})|-functor is a ring homomorphism, and a |\mathcal Const(\mathbf{Ab})|-profunctor is a bimodule. Let |R| and |S| be rings and |f : R \to S| be a ring homomorphism. An isomorphism in |R| viewed as a |\mathcal Const(\mathbf{Ab})|-category is just an invertible element. Every ring, |R|, is an |R|-|R|-bimodule. Given any |S|-|S|-bimodule |P|, we have an |R|-|R|-bimodule |f^*(P)| via restriction of scalars, i.e. |f^*(P)| has the same elements as |P| and for |p \in f^*(P)|, |rpr' = f(r)pf(r')|. In particular, |f| gives rise to a bimodule homomorphism, i.e. a linear function, |f : R \to f^*(S)| which corresponds to its action on arrows from the perspective of |f| as a |\mathcal Const(\mathbf{Ab})|-functor. If this linear transformation has an inverse, then the above result states that when |f(r)| is invertible so is |r|. So to restate this all in purely ring theoretic terms, given a ring homomorphism |f : R \to S| and an abelian group homomorphism |\varphi : S \to R| satisfying |\varphi(f(rst)) = r\varphi(f(s))t| and |\varphi(f(r)) = r|, then if |f(r)| is invertible so is |r|.
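
The ring-theoretic argument can be spelled out as a short calculation mirroring the internal-language proof. This is a sketch under one assumption: the equivariance hypothesis is read in its bimodule form, i.e. |\varphi : f^*(S) \to R| is an |R|-|R|-bimodule homomorphism, so |\varphi(f(r)sf(t)) = r\varphi(s)t| for arbitrary |s \in S|. Writing |s = f(r)^{-1}|, the candidate inverse of |r| is |\varphi(s)|:
$$\begin{gather} r\varphi(s) = \varphi(f(r)\,s\,f(1_R)) = \varphi(f(r)s) = \varphi(1_S) = \varphi(f(1_R)) = 1_R \\ \varphi(s)r = \varphi(f(1_R)\,s\,f(r)) = \varphi(sf(r)) = \varphi(1_S) = \varphi(f(1_R)) = 1_R \end{gather}$$
The first step in each line is equivariance, and the last uses |f(1_R) = 1_S| together with |\varphi(f(r)) = r|.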

  1. Indexed categories are equivalent to cloven fibrations and, if you have the Axiom of Choice, all fibrations can be cloven. Indexed categories can be viewed as presentations of fibrations.↩︎

  2. This suggests that we could define a small |\V|-category |\mathcal C \otimes \mathcal D| where |\mathcal C| and |\mathcal D| are small |\V|-categories. Start formulating a definition of such a |\V|-category. You will get stuck. Where? Why? This implies that the (ordinary, or better, 2-)category of small |\V|-categories does not have a monoidal product with |\mathbb I| as unit in general.↩︎

  3. With a good understanding of what a class is, it’s clear that it doesn’t even make sense to have a proper class be an object. In frameworks with an explicit notion of "class", this is often manifested by saying that a class that is an element of another class is a set (and thus not a proper class).↩︎

  4. This suggests that it might be interesting to consider categories that are (co)complete with respect to this monoid notion of “small”. I don’t think I’ve ever seen a study of such categories. (Co)limits of monoids are not trivial.↩︎

  5. This is one of the main things I like about working in weak foundations. It forces you to come up with better definitions that make it clear what is and is not important and eliminates coincidences. Of course, it also produces definitions and theorems that are inherently more general too.↩︎

  6. This connection isn’t much of a surprise as the tensor product of modules is exactly the (small) |\mathcal Const(\mathbf{Ab})| case of this.↩︎

July 06, 2020 05:03 AM

Internal Language of Indexed Monoidal Categories


This is part 2 in a series. See the previous part about internal languages for (non-indexed) monoidal categories. The main application I have in mind – enriching in indexed monoidal categories – is covered in the next post.

As Jean Bénabou pointed out in Fibered Categories and the Foundations of Naive Category Theory (PDF), notions of “families of objects/arrows” are ubiquitous and fundamental in category theory. One of the more noticeable places early on is in the definition of a natural transformation as a family of arrows. However, even in the definition of category, identities and compositions are families of functions, or, in the enriched case, arrows of |\mathbf V|. From a foundational perspective, one place where this gets really in-your-face is when trying to formalize the notion of (co)completeness. It is straightforward to make a first-order theory of a finitely complete category, e.g. this one. For arbitrary products and thus limits, we need to talk about families of objects. To formalize the usual meaning of this in a first-order theory would require attaching an entire first-order theory of sets, e.g. ZFC, to our notion of complete category. If your goals are of a foundational nature like Bénabou’s were, then this is unsatisfactory. Instead, we can abstract out what we need of the notion of “family”. The result turns out to be equivalent to the notion of a fibration.

My motivations here are not foundational but leaving the notion of “family” entirely meta-theoretical means not being able to talk about it except in the semantics. Bénabou’s comment suggests that at the semantic level we want not just a monoidal category, but a fibration of monoidal categories1. At the syntactic level, it suggests that there should be a built-in notion of “family” in our language. We accomplish both of these goals by formulating the internal language of an indexed monoidal category.

As a benefit, we can generalize to other notions of “family” than set-indexed families. We’ll clearly be able to formulate the notion of an enriched category. It’s also clear that we’ll be able to formulate the notion of an indexed category. Of course, we’ll also be able to formulate the notion of a category that is both enriched and indexed which includes the important special case of an internal category. We can also consider cases with trivial indexing which, in the unenriched case, will give us monoids, and in the |\mathbf{Ab}|-enriched case will give us rings.

Indexed Monoidal Categories

Following Shulman’s Enriched indexed categories, let |\mathbf{S}| be a category with a cartesian monoidal structure, i.e. finite products. Then an |\mathbf{S}|-indexed monoidal category is simply a pseudofunctor |\newcommand{\V}{\mathcal V}\V : \mathbf{S}^{op} \to \mathbf{MonCat}|. A pseudofunctor is like a functor except that the functor laws only hold up to isomorphism, e.g. |\V(id)\cong id|. |\mathbf{MonCat}| is the |2|-category of monoidal categories, strong monoidal functors2, and monoidal natural transformations. We’ll write |\V(X)| as |\V^X| and |\V(f)| as |f^*|. We’ll never have multiple relevant indexed monoidal categories so this notation will never be ambiguous. We’ll call the categories |\V^X| fiber categories and the functors |f^*| reindexing functors. The cartesian monoidal structure on |\mathbf S| becomes relevant when we want to equip the total category, |\int\V|, (computed via the Grothendieck construction in the usual way) with a monoidal structure. In particular, the tensor product of |A \in \V^X| and |B \in \V^Y| is an object |A\otimes B \in \V^{X\times Y}| calculated as |\pi_1^*(A) \otimes_{X\times Y} \pi_2^*(B)| where |\otimes_{X\times Y}| is the monoidal tensor in |\V^{X\times Y}|. The unit, |I|, is the unit |I_1 \in \V^1|.

The two main examples are: |\mathcal Fam(\mathbf V)| where |\mathbf V| is a (non-indexed) monoidal category and |\mathcal Self(\mathbf S)| where |\mathbf S| is a category with finite limits. |\mathcal Fam(\mathbf V)| is a |\mathbf{Set}|-indexed monoidal category with |\mathcal Fam(\mathbf V)^X| defined as the category whose objects are |X|-indexed families of objects of |\mathbf V| and whose arrows are families of arrows between them, equipped with an index-wise monoidal product. We can identify |\mathcal Fam(\mathbf V)^X| with the functor category |[DX, \mathbf V]| where |D : \mathbf{Set} \to \mathbf{cat}| takes a set |X| to the small discrete category on |X|. Enriching in the indexed monoidal category |\mathcal Fam(\mathbf V)| will be equivalent to enriching in the non-indexed monoidal category |\mathbf V|, i.e. the usual notion of enrichment in a monoidal category. |\mathcal Self(\mathbf S)| is an |\mathbf S|-indexed monoidal category and |\mathcal Self(\mathbf S)^X| is the slice category |\mathbf S/X| with its cartesian monoidal structure. |f^*| is the pullback functor. |\mathcal Self(\mathbf S)|-enriched categories are categories internal to |\mathbf S|. A third example we’ll find interesting is |\mathcal Const(\mathbf V)| for a (non-indexed) monoidal category |\mathbf V|. It is a |\mathbf 1|-indexed monoidal category and so corresponds to a single object of |\mathbf{MonCat}|, namely |\mathbf V|.
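
To make the |\mathcal Fam(\mathbf V)| case concrete, here is how the general definitions unfold when an object of |\mathcal Fam(\mathbf V)^X| is written as a family |(A_x)_{x \in X}|. This is only a worked instance of the definitions above, using the standard fact that reindexing along |f : W \to X| is precomposition under the identification with |[DX, \mathbf V]|. The names |W|, |w|, and |\star| (the unique element of |1|) are notation introduced just for this illustration.
$$\begin{gather} f^*(A)_w = A_{f(w)} \\ (A \otimes B)_{(x, y)} = \pi_1^*(A)_{(x, y)} \otimes \pi_2^*(B)_{(x, y)} = A_x \otimes B_y \\ I_\star = I_{\mathbf V} \end{gather}$$
So the tensor product on the total category pairs every member of one family with every member of the other, which is why it lands in the fiber over |X \times Y|.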

The Internal Language of Indexed Monoidal Categories

This builds on the internal language of a monoidal category described in the previous post. We’ll again have linear types and linear terms which will be interpreted into objects and arrows in the fiber categories. To indicate the dependence on the indexing, we’ll use two contexts: |\Gamma| will be an index context containing index types and index variables, which will be interpreted into objects and arrows of |\mathbf S|, while |\Delta|, the linear context, will contain linear types and linear variables as before except now linear types will be able to depend on index terms. So we’ll have judgements that look like:
$$\begin{gather} \Gamma \vdash A \quad \text{and} \quad \Gamma; \Delta \vdash E : B \end{gather}$$
The former indicates that |A| is a linear type indexed by the index variables of |\Gamma|. The latter states that |E| is a linear term of linear type |B| in the linear context |\Delta| indexed by the index variables of |\Gamma|. We’ll also have judgements for index types and index terms:
$$\begin{gather} \vdash X : \square \quad \text{and} \quad \Gamma \vdash E : Y \end{gather}$$
The former asserts that |X| is an index type. The latter asserts that |E| is an index term of index type |Y| in the index context |\Gamma|.

Since each fiber category is monoidal, we’ll have all the rules from before just with an extra |\Gamma| hanging around. Since our indexing category, |\mathbf S|, is also monoidal, we’ll also have copies of these rules at the level of indexes. However, since |\mathbf S| is cartesian monoidal, we’ll also have the structural rules of weakening, exchange, and contraction for index terms and types. To emphasize the cartesian monoidal structure of indexes, I’ll use the more traditional Cartesian product and tuple notation: |\times| and |(E_1, \dots, E_n)|. This notation allows a bit more uniformity as the |n=0| case can be notated by |()|.

The only really new rule is the rule that allows us to move linear types and terms from one index context to another, i.e. the rule that would correspond to applying a reindexing functor. I call this rule Reindex and, like Cut, it will be witnessed by substitution. Like Cut, it will also be a rule which we can eliminate. At the semantic level, this elimination corresponds to the fact that to understand the interpretation of any particular (linear) term, we can first reindex everything, i.e. all the interpretations of all subterms, into the same fiber category and then we can work entirely within that one fiber category. The Reindex rule is:
$$\begin{gather} \dfrac{\Gamma \vdash E : X \quad \Gamma', x : X; a_1 : A_1, \dots, a_n : A_n \vdash E' : B}{\Gamma',\Gamma; a_1 : A_1[E/x], \dots, a_n : A_n[E/x] \vdash E'[E/x] : B[E/x]}\text{Reindex} \end{gather}$$
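
As a small illustration of the rule, suppose a hypothetical signature with primitive index types |\mathsf O| and |\mathsf W|, a primitive index operation |\mathsf F : (\mathsf W) \to \mathsf O|, and a primitive linear type |\mathsf A : (\mathsf O, \mathsf O) \to \mathsf{Type}|. None of these names come from the rules themselves; they are chosen only for the example. Taking |\Gamma = w : \mathsf W|, |\Gamma' = y : \mathsf O|, and |E = \mathsf F(w)|, one instance of Reindex is:
$$\begin{gather} \dfrac{w : \mathsf W \vdash \mathsf F(w) : \mathsf O \quad y : \mathsf O, x : \mathsf O; a : \mathsf A(y, x) \vdash a : \mathsf A(y, x)}{y : \mathsf O, w : \mathsf W; a : \mathsf A(y, \mathsf F(w)) \vdash a : \mathsf A(y, \mathsf F(w))}\text{Reindex} \end{gather}$$
In a |\mathcal Fam(\mathbf V)|-style semantics this corresponds to reindexing the family |(y, x) \mapsto \mathsf A(y, x)| along the function |(y, w) \mapsto (y, \mathsf F(w))|.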

By representing reindexing by syntactic substitution, we’re requiring the semantics of (linear) type and term formation operations to be respected by reindexing functors. This is exactly the right thing to do as the appropriate notion of, say, indexed coproducts, which would correspond to sum types, is coproducts in each fiber category which are preserved by reindexing functors.

Below I provide a listing of rules and equations.

Relation to Parameterized and Dependent Types

None of this section is necessary for anything else.

This notion of (linear) types and terms being indexed by other types and terms is reminiscent of parametric types or dependent types. The machinery of indexed/fibered categories is also commonly used in the categorical semantics of parameterized and dependent types. However, there are important differences between those cases and our case.

In the case of parameterized types, we have types and terms that depend on other types. In this case, we have kinds, which are “types of types”, which classify types which in turn classify terms. If we try to set up an analogy to our situation, index types would correspond to kinds and index terms would correspond to types. The most natural thing to continue would be to have linear terms correspond to terms, but we start to see the problem. Linear terms are classified by linear types, but linear types are not index terms. They don’t even induce index terms. In the categorical semantics of parameterized types, this identification of types with (certain) expressions classified by kinds is handled by the notion of a generic object. A generic object corresponds to the kind |\mathsf{Type}| (what Haskell calls *). The assumption of a generic object is a rather strong assumption and one that none of our example indexed monoidal categories support in general.

A similar issue occurs when we try to make an analogy to dependent types. The defining feature of a dependent type system is that types can depend on terms. The problem with such a potential analogy is that linear types and terms do not induce index types and terms. A nice way to model the semantics of dependent types is the notion of a comprehension category. This, however, is additional structure beyond what we are given by an indexed monoidal category. However, comprehension categories will implicitly come up later when we talk about adding |\mathbf S|-indexed (co)products. These comprehension categories will share the same index category as our indexed monoidal categories, namely |\mathbf S|, but will have different total categories. Essentially, a comprehension category shows how objects (and arrows) of a total category can be represented in the index category. We can then talk about having (co)products in a different total category with the same index category with respect to those objects picked out by the comprehension category. We get dependent types in the case where the total categories are the same. (More precisely, the fibrations are the same.) Sure enough, we will see that when |\mathcal Self(\mathbf S)| has |\mathbf S|-indexed products, then |\mathbf S| is, indeed, a model of a dependent type theory. In particular, it is locally cartesian closed.

Rules for an Indexed Monoidal Category

$$\begin{gather} \dfrac{\vdash X : \square}{x : X \vdash x : X}\text{IxAx} \qquad \dfrac{\Gamma\vdash E : X \quad \Gamma', x : X \vdash E': Y}{\Gamma',\Gamma \vdash E'[E/x] : Y}\text{IxCut} \\ \\ \dfrac{\vdash Y : \square \quad \Gamma\vdash E : X}{\Gamma, y : Y \vdash E : X}\text{Weakening},\ y\text{ fresh} \qquad \dfrac{\Gamma, x : X, y : Y, \Gamma' \vdash E : Z}{\Gamma, y : Y, x : X, \Gamma' \vdash E : Z}\text{Exchange} \qquad \dfrac{\Gamma, x : X, y : Y \vdash E : Z}{\Gamma, x : X \vdash E[x/y] : Z}\text{Contraction} \\ \\ \dfrac{\mathsf X : \mathsf{IxType}}{\vdash \mathsf X : \square}\text{PrimIxType} \qquad \dfrac{\vdash X_1 : \square \quad \cdots \quad \vdash X_n : \square}{\vdash (X_1, \dots, X_n) : \square}{\times_n}\text{F} \\ \\ \dfrac{\Gamma \vdash E_1 : X_1 \quad \cdots \quad \Gamma \vdash E_n : X_n \quad \mathsf F : (X_1, \dots, X_n) \to Y}{\Gamma \vdash \mathsf F(E_1, \dots, E_n) : Y}\text{PrimIxTerm} \\ \\ \dfrac{\Gamma_1 \vdash E_1 : X_1 \quad \cdots \quad \Gamma_n \vdash E_n : X_n}{\Gamma_1,\dots,\Gamma_n \vdash (E_1, \dots, E_n) : (X_1, \dots, X_n)}{\times_n}\text{I} \qquad \dfrac{\Gamma \vdash E : (X_1, \dots, X_n) \quad x_1 : X_1, \dots, x_n : X_n, \Gamma' \vdash E' : Y}{\Gamma, \Gamma' \vdash \mathsf{match}\ E\ \mathsf{as}\ (x_1, \dots, x_n)\ \mathsf{in}\ E' : Y}{\times_n}\text{E} \\ \\ \dfrac{\Gamma \vdash E_1 : X_1 \quad \cdots \quad \Gamma \vdash E_n : X_n \quad \mathsf A : (X_1, \dots, X_n) \to \mathsf{Type}}{\Gamma \vdash \mathsf A(E_1, \dots, E_n)}\text{PrimType} \\ \\ \dfrac{\Gamma \vdash A}{\Gamma; a : A \vdash a : A}\text{Ax} \qquad \dfrac{\Gamma; \Delta_1 \vdash E_1 : A_1 \quad \cdots \quad \Gamma; \Delta_n \vdash E_n : A_n \quad \Gamma; \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E: B}{\Gamma; \Delta_l, \Delta_1, \dots, \Delta_n, \Delta_r \vdash E[E_1/a_1, \dots, E_n/a_n] : B}\text{Cut} \\ \\ \dfrac{\Gamma \vdash E : X \quad \Gamma', x : X; a_1 : A_1, \dots, a_n : A_n \vdash E' : B}{\Gamma',\Gamma; a_1 : A_1[E/x], \dots, 
a_n : A_n[E/x] \vdash E'[E/x] : B[E/x]}\text{Reindex} \\ \\ \dfrac{}{\Gamma\vdash I}I\text{F} \qquad \dfrac{\Gamma\vdash A_1 \quad \cdots \quad \Gamma \vdash A_n}{\Gamma \vdash A_1 \otimes \cdots \otimes A_n}{\otimes_n}\text{F}, n \geq 1 \\ \\ \dfrac{\Gamma \vdash E_1 : X_1 \quad \cdots \quad \Gamma \vdash E_n : X_n \quad \Gamma; \Delta_1 \vdash E_1' : A_1 \quad \cdots \quad \Gamma; \Delta_m \vdash E_m' : A_m \quad \mathsf f : (x_1 : X_1, \dots, x_n : X_n; A_1, \dots, A_m) \to B}{\Gamma; \Delta_1, \dots, \Delta_m \vdash \mathsf f(E_1, \dots, E_n; E_1', \dots, E_m') : B}\text{PrimTerm} \\ \\ \dfrac{}{\Gamma; \vdash * : I}I\text{I} \qquad \dfrac{\Gamma; \Delta \vdash E : I \quad \Gamma; \Delta_l, \Delta_r \vdash E' : B}{\Gamma; \Delta_l, \Delta, \Delta_r \vdash \mathsf{match}\ E\ \mathsf{as}\ *\ \mathsf{in}\ E' : B}I\text{E} \\ \\ \dfrac{\Gamma; \Delta_1 \vdash E_1 : A_1 \quad \cdots \quad \Gamma; \Delta_n \vdash E_n : A_n}{\Gamma; \Delta_1,\dots,\Delta_n \vdash E_1 \otimes \cdots \otimes E_n : A_1 \otimes \cdots \otimes A_n}{\otimes_n}\text{I} \\ \\ \dfrac{\Gamma; \Delta \vdash E : A_1 \otimes \cdots \otimes A_n \quad \Gamma; \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E' : B}{\Gamma; \Delta_l, \Delta, \Delta_r \vdash \mathsf{match}\ E\ \mathsf{as}\ (a_1 \otimes \cdots \otimes a_n)\ \mathsf{in}\ E' : B}{\otimes_n}\text{E},n \geq 1 \end{gather}$$


$$\begin{gather} \dfrac{\Gamma_1 \vdash E_1 : X_1 \quad \cdots \quad \Gamma_n \vdash E_n : X_n \qquad x_1 : X_1, \dots, x_n : X_n, \Gamma \vdash E : Y}{\Gamma_1, \dots, \Gamma_n, \Gamma \vdash (\mathsf{match}\ (E_1, \dots, E_n)\ \mathsf{as}\ (x_1, \dots, x_n)\ \mathsf{in}\ E) = E[E_1/x_1, \dots, E_n/x_n] : Y}{\times_n}\beta \\ \\ \dfrac{\Gamma \vdash E : (X_1, \dots, X_n) \qquad \Gamma, x : (X_1, \dots, X_n) \vdash E' : B}{\Gamma \vdash E'[E/x] = \mathsf{match}\ E\ \mathsf{as}\ (x_1, \dots, x_n)\ \mathsf{in}\ E'[(x_1, \dots, x_n)/x] : B}{\times_n}\eta \\ \\ \dfrac{\Gamma \vdash E_1 : (X_1, \dots, X_n) \qquad x_1 : X_1, \dots, x_n : X_n \vdash E_2 : Y \quad y : Y \vdash E_3 : Z}{\Gamma \vdash (\mathsf{match}\ E_1\ \mathsf{as}\ (x_1, \dots, x_n)\ \mathsf{in}\ E_3[E_2/y]) = E_3[(\mathsf{match}\ E_1\ \mathsf{as}\ (x_1, \dots, x_n)\ \mathsf{in}\ E_2)/y] : Z}{\times_n}\text{CC} \\ \\ \dfrac{\Gamma;\vdash E : B}{\Gamma;\vdash (\mathsf{match}\ *\ \mathsf{as}\ *\ \mathsf{in}\ E) = E : B}{*}\beta \qquad \dfrac{\Gamma; \Delta \vdash E : I \qquad \Gamma; \Delta_l, a : I, \Delta_r \vdash E' : B}{\Gamma; \Delta_l, \Delta, \Delta_r \vdash E'[E/a] = (\mathsf{match}\ E\ \mathsf{as}\ *\ \mathsf{in}\ E'[{*}/a]) : B}{*}\eta \\ \\ \dfrac{\Gamma; \Delta_1 \vdash E_1 : A_1 \quad \cdots \quad \Gamma; \Delta_n \vdash E_n : A_n \qquad \Gamma; \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E : B}{\Gamma; \Delta_l, \Delta_1, \dots, \Delta_n, \Delta_r \vdash (\mathsf{match}\ E_1\otimes\cdots\otimes E_n\ \mathsf{as}\ a_1\otimes\cdots\otimes a_n\ \mathsf{in}\ E) = E[E_1/a_1, \dots, E_n/a_n] : B}{\otimes_n}\beta \\ \\ \dfrac{\Gamma; \Delta \vdash E : A_1 \otimes \cdots \otimes A_n \qquad \Gamma; \Delta_l, a : A_1 \otimes \cdots \otimes A_n, \Delta_r \vdash E' : B}{\Gamma; \Delta_l, \Delta, \Delta_r \vdash E'[E/a] = \mathsf{match}\ E\ \mathsf{as}\ a_1\otimes\cdots\otimes a_n\ \mathsf{in}\ E'[(a_1\otimes\cdots\otimes a_n)/a] : B}{\otimes_n}\eta \\ \\ \dfrac{\Gamma; \Delta \vdash E_1 : I 
\qquad \Gamma; \Delta_l, \Delta_r \vdash E_2 : B \qquad \Gamma; b : B \vdash E_3 : C}{\Gamma; \Delta_l, \Delta, \Delta_r \vdash (\mathsf{match}\ E_1\ \mathsf{as}\ *\ \mathsf{in}\ E_3[E_2/b]) = E_3[(\mathsf{match}\ E_1\ \mathsf{as}\ *\ \mathsf{in}\ E_2)/b] : C}{*}\text{CC} \\ \\ \dfrac{\Gamma; \Delta \vdash E_1 : A_1 \otimes \cdots \otimes A_n \qquad \Gamma; \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E_2 : B \qquad \Gamma; b : B \vdash E_3 : C}{\Gamma; \Delta_l, \Delta, \Delta_r \vdash (\mathsf{match}\ E_1\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_n\ \mathsf{in}\ E_3[E_2/b]) = E_3[(\mathsf{match}\ E_1\ \mathsf{as}\ a_1 \otimes \dots \otimes a_n\ \mathsf{in}\ E_2)/b] : C}{\otimes_n}\text{CC} \end{gather}$$

|\mathsf X : \mathsf{IxType}| means |\mathsf X| is a primitive index type in the signature. |\mathsf A : (X_1, \dots, X_n) \to \mathsf{Type}| means that |\mathsf A| is a primitive linear type in the signature. |\mathsf F : (X_1, \dots, X_n) \to Y| and |\mathsf f : (x_1 : X_1, \dots, x_n : X_n; A_1, \dots, A_m) \to B| mean that |\mathsf F| and |\mathsf f| are assigned these types in the signature. In the latter case, it is assumed that |x_1 : X_1, \dots, x_n : X_n \vdash A_i| for |i = 1, \dots, m| and |x_1 : X_1, \dots, x_n : X_n \vdash B|. Alternatively, these assumptions could be added as additional hypotheses to the PrimTerm rule. Generally, every |x_i| will be used in some |A_j| or in |B|, though this isn’t technically required.

As before, I did not write the usual laws for equality (reflexivity and indiscernibility of identicals) but they also should be included.

See the discussion in the previous part about the commuting conversion (|\text{CC}|) rules.

A theory in this language is free to introduce additional index types, operations on indexes, linear types, and linear operations.

Interpretation into an |\mathbf S|-indexed monoidal category

Fix an |\mathbf S|-indexed monoidal category |\V|. Write |\newcommand{\den}[1]{[\![#1]\!]}\den{-}| for the (overloaded) interpretation function. Its value on primitive operations is left as a parameter.

Associators for the semantic |\times| and |\otimes| will be omitted below.

Interpretation of Index Types

$$\begin{align} \vdash X : \square \implies & \den{X} \in \mathsf{Ob}(\mathbf S) \\ \\ \den{\Gamma} = & \prod_{i=1}^n \den{X_i}\text{ where } \Gamma = x_1 : X_1, \dots, x_n : X_n \\ \den{(X_1, \dots, X_n)} = & \prod_{i=1}^n \den{X_i} \end{align}$$

Interpretation of Index Terms

$$\begin{align} \Gamma \vdash E : X \implies & \den{E} \in \mathbf{S}(\den{\Gamma}, \den{X}) \\ \\ \den{x_i} =\, & \pi_i \text{ where } x_1 : X_1, \dots, x_n : X_n \vdash x_i : X_i \\ \den{(E_1, \dots, E_n)} =\, & \den{E_1} \times \cdots \times \den{E_n} \\ \den{\mathsf{match}\ E\ \mathsf{as}\ (x_1, \dots, x_n)\ \mathsf{in}\ E'} =\, & \den{E'} \circ (\den{E} \times id_{\den{\Gamma'}}) \text{ where } \Gamma' \vdash E' : Y \\ \den{\mathsf F(E_1, \dots, E_n)} =\, & \den{\mathsf F} \circ (\den{E_1} \times \cdots \times \den{E_n}) \\ & \quad \text{ where }\mathsf F\text{ is an appropriately typed index operation} \end{align}$$
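
To see these clauses in action, here is a minimal Haskell sketch of the interpretation of index terms for the special case where the index category is (a fragment of) sets and functions. Everything named here (`Val`, `Ix`, `interp`, the `plus` operation) is hypothetical and introduced only for illustration. An environment plays the role of an element of |\den{\Gamma}|, so a term becomes a partial function out of environments, and the sketch shares one environment rather than splitting contexts, which is harmless because weakening and contraction are available for indexes.

```haskell
import qualified Data.Map as M

-- A hypothetical universe of index values for this sketch.
data Val = VInt Int | VTuple [Val]
  deriving (Eq, Show)

-- Index terms: variables, tuples, match, and primitive operations.
data Ix
  = Var String
  | Tuple [Ix]
  | Match Ix [String] Ix
  | Prim String [Ix]

type Env   = M.Map String Val
type Prims = M.Map String ([Val] -> Maybe Val)

-- Interpret a term as a partial function from environments to values.
interp :: Prims -> Ix -> Env -> Maybe Val
-- A variable is a projection out of the environment.
interp _  (Var x)        env = M.lookup x env
-- A tuple interprets each component and collects the results.
interp ps (Tuple es)     env = VTuple <$> traverse (\e -> interp ps e env) es
-- match: interpret the scrutinee, then the body with the components bound.
interp ps (Match e xs b) env = do
  v <- interp ps e env
  case v of
    VTuple vs | length vs == length xs ->
      interp ps b (M.union (M.fromList (zip xs vs)) env)
    _ -> Nothing
-- A primitive operation is composed with the interpreted arguments.
interp ps (Prim f es)    env = do
  op <- M.lookup f ps
  vs <- traverse (\e -> interp ps e env) es
  op vs

-- A sample primitive index operation (an assumption of this example).
plus :: [Val] -> Maybe Val
plus [VInt a, VInt b] = Just (VInt (a + b))
plus _                = Nothing

main :: IO ()
main = do
  let ps   = M.fromList [("plus", plus)]
      env  = M.fromList [("p", VTuple [VInt 3, VInt 4])]
      term = Match (Var "p") ["x", "y"] (Prim "plus" [Var "x", Var "y"])
  print (interp ps term env)  -- prints Just (VInt 7)
```

The `Maybe` only accounts for ill-typed terms, since the sketch does no type checking. On well-typed terms the function is total, matching the fact that |\den{E}| is an honest arrow of |\mathbf S|.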

Witnesses of Index Derivations

IxAx is witnessed by identity, and IxCut by composition in |\mathbf S|. Weakening is witnessed by projection. Exchange and Contraction are witnessed by expressions that can be built from projections and tupling. This is very standard.

Interpretation of Linear Types

$$\begin{align} \Gamma \vdash A \implies & \den{A} \in \mathsf{Ob}(\V^{\den{\Gamma}}) \\ \\ \den{\Delta} =\, & \den{A_1}\otimes_{\den{\Gamma}}\cdots\otimes_{\den{\Gamma}}\den{A_n} \text{ where } \Delta = a_1 : A_1, \dots, a_n : A_n \\ \den{I} =\, & I_{\den{\Gamma}}\text{ where } \Gamma \vdash I \\ \den{A_1 \otimes \cdots \otimes A_n} =\, & \den{A_1}\otimes_{\den{\Gamma}} \cdots \otimes_{\den{\Gamma}} \den{A_n}\text{ where } \Gamma \vdash A_i \\ \den{\mathsf A(E_1, \dots, E_n)} =\, & \langle \den{E_1}, \dots, \den{E_n}\rangle^*(\den{\mathsf A}) \\ & \quad \text{ where }\mathsf A\text{ is an appropriately typed linear type operation} \end{align}$$

Interpretation of Linear Terms

$$\begin{align} \Gamma; \Delta \vdash E : A \implies & \den{E} \in \V^{\den{\Gamma}}(\den{\Delta}, \den{A}) \\ \\ \den{a} =\, & id_{\den{A}} \text{ where } a : A \\ \den{*} =\, & id_{I_{\den{\Gamma}}} \text{ where } \Gamma;\vdash * : I \\ \den{E_1 \otimes \cdots \otimes E_n} =\, & \den{E_1} \otimes_{\den{\Gamma}} \cdots \otimes_{\den{\Gamma}} \den{E_n} \text{ where } \Gamma; \Delta_i \vdash E_i : A_i \\ \den{\mathsf{match}\ E\ \mathsf{as}\ {*}\ \mathsf{in}\ E'} =\, & \den{E'} \circ (id_{\den{\Delta_l}} \otimes_{\den{\Gamma}} (\lambda_{\den{\Delta_r}} \circ (\den{E} \otimes_{\den{\Gamma}} id_{\den{\Delta_r}}))) \\ \den{\mathsf{match}\ E\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_n\ \mathsf{in}\ E'} =\, & \den{E'} \circ (id_{\den{\Delta_l}} \otimes_{\den{\Gamma}} \den{E} \otimes_{\den{\Gamma}} id_{\den{\Delta_r}}) \\ \den{\mathsf f(E_1, \dots, E_n; E_1', \dots, E_m')} =\, & \langle \den{E_1}, \dots, \den{E_n}\rangle^*(\den{\mathsf f}) \circ (\den{E_1'} \otimes_{\den{\Gamma}} \cdots \otimes_{\den{\Gamma}} \den{E_m'}) \\ & \quad \text{ where }\mathsf f\text{ is an appropriately typed linear operation} \end{align}$$

Witnesses of Linear Derivations

As with the index derivations, Ax is witnessed by the identity, in this case in |\V^{\den{\Gamma}}|.

|\den{E[E_1/a_1, \dots, E_n/a_n]} = \den{E} \circ (\den{E_1}\otimes\cdots\otimes\den{E_n})| witnesses Cut.

Roughly speaking, Reindex is witnessed by |\den{E}^*(\den{E'})|. If we were content to restrict ourselves to semantics in |\mathbf S|-indexed monoidal categories witnessed by functors, as opposed to pseudofunctors, into strict monoidal categories, then this would suffice. For an arbitrary |\mathbf S|-indexed monoidal category, we can’t be sure that the naive interpretation of |A[E/x][E'/y]|, i.e. |\den{E'}^*(\den{E}^*(\den{A}))|, which we’d get from two applications of the Reindex rule, is the same as the interpretation of |A[E[E'/y]/x]|, i.e. |\den{E \circ E'}^*(\den{A})|, which we’d get from IxCut followed by Reindex. On the other hand, |A[E/x][E'/y] = A[E[E'/y]/x]| is simply true syntactically by the definition of substitution (which I have not provided but is the obvious, usual thing). There are similar issues for (meta-)equations like |I[E/x] = I| and |(A_1 \otimes A_2)[E/x] = A_1[E/x] \otimes A_2[E/x]|.

The solution is that we essentially use a normal form where we eliminate the uses of Reindex. These normal form derivations will be reached by rewrites such as:
$$\begin{gather} \dfrac{\dfrac{\mathcal D}{\Gamma' \vdash E : X} \qquad \dfrac{\dfrac{\mathcal D_1}{\Gamma, x : X; \Delta_1 \vdash E_1 : A_1} \quad \cdots \quad \dfrac{\mathcal D_n}{\Gamma, x : X; \Delta_n \vdash E_n : A_n}} {\Gamma, x : X; \Delta_1, \dots, \Delta_n \vdash E_1 \otimes \cdots \otimes E_n : A_1 \otimes \cdots \otimes A_n}} {\Gamma, \Gamma'; \Delta_1[E/x], \dots, \Delta_n[E/x] \vdash E_1[E/x] \otimes \cdots \otimes E_n[E/x] : A_1[E/x] \otimes \cdots \otimes A_n[E/x]} \\ \Downarrow \\ \dfrac{\dfrac{\dfrac{\mathcal D}{\Gamma' \vdash E : X} \quad \dfrac{\mathcal D_1}{\Gamma, x : X; \Delta_1 \vdash E_1 : A_1}} {\Gamma, \Gamma'; \Delta_1[E/x] \vdash E_1[E/x] : A_1[E/x]} \quad \cdots \quad \dfrac{\dfrac{\mathcal D}{\Gamma' \vdash E : X} \quad \dfrac{\mathcal D_n}{\Gamma, x : X; \Delta_n \vdash E_n : A_n}} {\Gamma, \Gamma'; \Delta_n[E/x] \vdash E_n[E/x] : A_n[E/x]}} {\Gamma, \Gamma'; \Delta_1[E/x], \dots, \Delta_n[E/x] \vdash E_1[E/x] \otimes \cdots \otimes E_n[E/x] : A_1[E/x] \otimes \cdots \otimes A_n[E/x]} \end{gather}$$

Semantically, this is witnessed by the strong monoidal structure, i.e. |\den{E}^*(\den{E_1} \otimes \cdots \otimes \den{E_n}) \cong \den{E}^*(\den{E_1}) \otimes \cdots \otimes \den{E}^*(\den{E_n})|. We need such rewrites for all (linear) rules that can immediately precede Reindex in a derivation. For |I\text{I}|, |I\text{E}|, |\otimes_n\text{E}|, and, as we’ve just seen, |\otimes_n\text{I}|, these rewrites are witnessed by |\den{E}^*| being a strong monoidal functor. The rewrites for |\text{Ax}| and |\text{Cut}| are witnessed by functoriality of |\den{E}^*| and also strong monoidality for Cut. Finally, two adjacent uses of Reindex become an IxCut and a Reindex and are witnessed by the pseudofunctoriality of |(\_)^*|. (While we’re normalizing, we may as well eliminate Cut and IxCut as well.)

  1. As the previous post alludes, monoidal structure is more than we need. If we pursue the generalizations described there in this indexed context, we eventually end up at augmented virtual double categories or virtual equipment.↩︎

  2. The terminology here is a mess. Leinster calls strong monoidal functors “weak”. “Strong” also refers to tensorial strength, and it’s quite possible to have a “strong lax monoidal functor”. (In fact, this is what applicative functors are usually described as, though a strong lax closed functor would be a more direct connection.) Or the functors we’re talking about which are not-strong strong monoidal functors…↩︎

July 06, 2020 02:00 AM

July 05, 2020

Neil Mitchell

Automatic UI's for Command Lines with cmdargs

Summary: Run cmdargs-browser hlint and you can fill out arguments easily.

The Haskell command line parsing library cmdargs contains a data type that represents a command line. I always thought it would be a neat trick to transform that into a web page, to make it easier to explore command line options interactively - similar to how the custom-written wget::gui wraps wget.

I wrote a demo to do just that, named cmdargs-browser. Given any program that uses cmdargs (e.g. hlint), you can install cmdargs-browser (with cabal install cmdargs-browser) and run:

cmdargs-browser hlint

And it will pop up:

As we can see, the HLint modes are listed on the left (you can use lint, grep or test), the possible options on the right (e.g. normal arguments and --color) and the command line it produces at the bottom. As you change mode or add/remove flags, the command line updates. If you hit OK it then runs the program with the command line. The help is included next to the argument, and if you make a mistake (e.g. write foo for the --color flag) it tells you immediately. It could be more polished (e.g. browse buttons for file selections, better styling) but the basic concept works well.

Technical implementation

I wanted every cmdargs-using program to support this automatic UI, but also didn't want to increase the dependency footprint or compile-time overhead for cmdargs. I didn't want to tie cmdargs to this particular approach to a UI - I wanted a flexible mechanism that anyone could use for other purposes.

To that end, I built out a Helper module that is included in cmdargs. That API provides all the capabilities on which cmdargs-browser is built. The Helper module is only 350 lines.

If you run cmdargs with either $CMDARGS_HELPER or $CMDARGS_HELPER_HLINT set (in the case of HLint) then cmdargs will run the command line you specify, passing over the explicit Mode data type on the stdin. That Mode data type includes functions, and using a simplistic communication channel on the stdin/stdout, the helper process can invoke those functions. As an example, when cmdargs-browser wants to validate the --color flag, it does so by calling a function in Mode, that secretly talks back to hlint to validate it.

At the end, the helper program can choose to either give an error message (to stop the program, e.g. if you press Cancel), or give some command lines to use to run the program.

Future plans

This demo was a cool project, which may turn out to be useful for some, but I have no intention to develop it further. I think something along these lines should be universally available for all command line tools, and built into all command line parsing libraries.

Historical context

All the code that makes this approach work was written over seven years ago. Specifically, it was my hacking project in the hospital while waiting for my son to be born. Having a little baby is a hectic time of life, so I never got round to telling anyone about its existence.

This weekend I resurrected the code and published an updated version to Hackage, deliberately making as few changes as possible. The three necessary changes were:

  1. jQuery deprecated the live function replacing it with on, meaning the code didn't work.
  2. I had originally put an upper bound of 0.4 for the transformers library. Deleting the upper bound made it work.
  3. Hackage now requires that all your uploaded .cabal files declare that they require a version of 1.10 or above of Cabal itself, even if they don't.

Overall, to recover a project that is over 7 years old, it was surprisingly little effort.

by Neil Mitchell at July 05, 2020 10:03 AM

July 04, 2020

Don Stewart (dons)

Back to old tricks .. (or, baby steps in Rust)

I’ve been learning Rust for the past twenty days or so, working through the Blandy & Orendorff book, coding up things as I go. Once I got into playing with Rust traits and closures and associated types, the similarities to programming in Haskell with typeclasses, data structures, closure passing and associated types were pretty obvious.

As a warm up I thought I’d try porting the stream fusion core from Haskell to Rust. This was code I was working on more than a decade ago. How much of it would work or even make sense in today’s Rust?

Footnote: this is the first code I’ve attempted to seriously write for more than 2 years, as I’m diving back into software engineering after an extended sojourn in team building and eng management. I was feeling a bit .. rusty. Let me know if I got anything confused.


  • Two versions of the basic stream/list/vector APIs: a data structure with boxed closure-passing and pure state
  • And a more closure-avoiding trait encoding that uses types to index all the code statically

Fun things to discover:

  • most of the typeclass ‘way of seeing’ works pretty much the same in Rust. You index/dispatch/use a similar mental model to program generically. I was able to start guessing the syntax after a couple of days
  • it’s actually easier to get first class dictionaries and play with them
  • compared to the hard work we do in GHC to make everything strict, unboxed and not heap-allocated, this is the default in Rust which makes the optimization story a lot simpler
  • Rust has no GC, instead using a nested-region like allocation strategy by default. I have to commit to specific sharing and linear memory use up front. This feels a lot like an ST-monad-like state threading system. Borrowing and move semantics take a bit of practice to get used to.
  • the trait version looks pretty similar to the core of the standard Rust iterators, and the performance in toy examples is very good for the amount of effort I put in
  • Rust seems to push towards method/trait/data structure-per-generic-function programming, which is interesting. Encoding things in types does good things, just as it does in Haskell.
  • Non-fancy Haskell, including type class designs, can basically be ported directly to Rust, though you now annotate allocation behavior explicitly.
  • cargo is a highly polished ‘cabal’ with lots of sensible defaults and way more constraints on what is allowed. And a focus on good UX.

As a meta point, it’s amazing to wander into a fresh programming language, 15 years after the original “associated types with class” paper, and find basically all the original concepts available, and extended in some interesting ways. There’s a lot of feeling of deja vu for someone who worked on GHC optimizations/unboxing/streams when writing in Rust.

Version 1: direct Haskell translation

First, a direct port of the original paper. A data type for stepping through the elements of a stream, including ‘Skip’ so we can filter things. And a struct holding the stream stepper function, and the state. Compared to the Haskell version (in the comment) there are more rituals to declare what goes on the heap, and to link the lifetimes of objects together. My reward for doing this is not needing a garbage collector. There’s quite an enjoyable serotonin reward when a borrow checking program type checks :-)
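A minimal sketch of what such a pair of definitions might look like, written for illustration and not taken from the post. The field names and the `countdown` example are assumptions; the Haskell types are shown in comments for comparison.

```rust
// One step of a stream, mirroring the Haskell type
//   data Step s a = Yield a s | Skip s | Done
pub enum Step<S, A> {
    Yield(A, S), // an element, plus the state to continue from
    Skip(S),     // no element this time (used by filter), just a new state
    Done,        // the stream is finished
}

// A stream is a boxed stepper closure plus the state to start from, mirroring
//   data Stream a = forall s. Stream (s -> Step s a) s
// except that the state type S stays visible in the Rust type.
pub struct Stream<'s, S, A> {
    pub next: Box<dyn Fn(S) -> Step<S, A> + 's>,
    pub seed: S,
}

// Hypothetical example: count down from n to 1.
pub fn countdown<'s>(n: u32) -> Stream<'s, u32, u32> {
    Stream {
        next: Box::new(|s: u32| {
            if s == 0 {
                Step::Done
            } else {
                Step::Yield(s, s - 1)
            }
        }),
        seed: n,
    }
}
```

The `Box<dyn Fn ...>` is where the heap allocation and the dynamic dispatch come from; the lifetime `'s` bounds how long anything captured by the closure must live.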

You can probably infer from the syntax this will be operationally expensive: a closure boxed up onto the heap. Rust’s explicit control of where things are stored and for how long feels a lot like a type system for scripting allocators, and compared to Haskell you’re certainly exactly aware of what you’re asking the machine to do.

Overall, it’s a fairly pleasing translation, and even though I’m putting the closure on the heap I’m still having it de-allocated when the stream is dropped. Look mum, closure passing without a GC.

We can create empty streams, or streams with a single element in them. Remember a stream is just a function going from one value to the next, and a state to kick it off:
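As a hedged illustration (the names `empty` and `singleton` and the choice of state types are assumptions, not the post's own code), the two constructors could be sketched like this:

```rust
pub enum Step<S, A> {
    Yield(A, S),
    Skip(S),
    Done,
}

pub struct Stream<'s, S, A> {
    pub next: Box<dyn Fn(S) -> Step<S, A> + 's>,
    pub seed: S,
}

// The empty stream: whatever the state, we are immediately done.
pub fn empty<'s, A: 's>() -> Stream<'s, (), A> {
    Stream {
        next: Box::new(|_: ()| -> Step<(), A> { Step::Done }),
        seed: (),
    }
}

// A one-element stream. The boolean state records whether the element
// is still waiting to be yielded; the element is copied into the closure.
pub fn singleton<'s, A: Copy + 's>(a: A) -> Stream<'s, bool, A> {
    Stream {
        next: Box::new(move |pending: bool| {
            if pending {
                Step::Yield(a, false)
            } else {
                Step::Done
            }
        }),
        seed: true,
    }
}
```

The `move` keyword makes the closure own its copy of `a`, and the `Copy` bound lets the closure hand out the element without giving up its own copy.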

The lambda syntax feels a bit heavy at first, but you get used to it. The hardest parts of these definitions were:

  • being explicit about whether my closure was going to borrow or own the lifetime of values it captures. Stuff you never think about in Haskell, with a GC to take care of all that thinking (for a price). You end up with a unique type per closure showing what is captured, which feels _very_ explicit and controlled.
  • no rank-2 type to hide the stream state type behind. The closest I could get was the ‘impl Seed’ opaque return type, but it doesn’t behave much like a proper existential type and tends to leak through the implementation. I’d love to see the canonical way to hide this state value at the type level without being forced to box it up.
  • a la vector ‘Prim’ types in Haskell, we use Copy to say something about when we want the values to be cheap to move (at least, that’s my early mental model)

I can generate a stream of values, a la replicate:
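A possible shape for such a generator, as a sketch under the same assumed `Step` and `Stream` definitions (the name `replicate` follows the Haskell function; everything else is illustrative):

```rust
pub enum Step<S, A> {
    Yield(A, S),
    Skip(S),
    Done,
}

pub struct Stream<'s, S, A> {
    pub next: Box<dyn Fn(S) -> Step<S, A> + 's>,
    pub seed: S,
}

// Yield `a` exactly `n` times. The state counts how many copies remain,
// and the element itself is captured (by copy) in the boxed closure.
pub fn replicate<'s, A: Copy + 's>(n: usize, a: A) -> Stream<'s, usize, A> {
    Stream {
        next: Box::new(move |remaining: usize| {
            if remaining == 0 {
                Step::Done
            } else {
                Step::Yield(a, remaining - 1)
            }
        }),
        seed: n,
    }
}
```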

As long as I’m careful threading lifetime parameters around I can box values and generate streams without using a GC. This is sort of amazing. (And the unique lifetime token-passing discipline feels very like the ST monad and its extensions into regions/nesting). Again, you can sort of “feel” how expensive this is going to be, with the capturing and boxing to the heap explicit. That boxed closure dynamically invoked will have a cost.

Let’s consume a stream, via a fold:
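A sketch of such a fold as an explicit loop over a `mut` state and accumulator. The name `foldl`, its argument order, and the `replicate` helper used to exercise it are assumptions for illustration:

```rust
pub enum Step<S, A> {
    Yield(A, S),
    Skip(S),
    Done,
}

pub struct Stream<'s, S, A> {
    pub next: Box<dyn Fn(S) -> Step<S, A> + 's>,
    pub seed: S,
}

// Left fold: step the stream until Done, threading an accumulator through.
// The recursion of the Haskell version becomes a loop over mutable bindings.
pub fn foldl<'s, S: Copy, A, B>(f: impl Fn(B, A) -> B, z: B, s: &Stream<'s, S, A>) -> B {
    let mut state = s.seed;
    let mut acc = z;
    loop {
        match (s.next)(state) {
            Step::Yield(a, rest) => {
                acc = f(acc, a);
                state = rest;
            }
            Step::Skip(rest) => state = rest,
            Step::Done => return acc,
        }
    }
}

// Hypothetical generator used to exercise the fold.
pub fn replicate<'s, A: Copy + 's>(n: usize, a: A) -> Stream<'s, usize, A> {
    Stream {
        next: Box::new(move |remaining: usize| {
            if remaining == 0 {
                Step::Done
            } else {
                Step::Yield(a, remaining - 1)
            }
        }),
        seed: n,
    }
}
```

For example, `foldl(|acc: i64, x: i64| acc + x, 0, &replicate(4, 10))` sums four tens.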

Not too bad. The lack of tail recursion shows up here, so while I’d normally write this as a ‘go’ local work function with a stack parameter, to get a loop, instead in Rust we just write a loop and peek and poke the memory directly via a ‘mut’ binding. Sigh, fine, but I promise I’m still thinking in recursion.

Now I can do real functional programming:

What about something a bit more generic: enumFromTo to fill a range with consecutive integer values, of any type supporting addition?
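One way this could be sketched, with a small local `One` trait standing in for whatever numeric trait supplies `A::one()` (the trait, its `i64` instance and the name `enum_from_to` are assumptions made so the example is self-contained):

```rust
use std::ops::Add;

pub enum Step<S, A> {
    Yield(A, S),
    Skip(S),
    Done,
}

pub struct Stream<'s, S, A> {
    pub next: Box<dyn Fn(S) -> Step<S, A> + 's>,
    pub seed: S,
}

// Hypothetical stand-in for a numeric "one" trait.
pub trait One {
    fn one() -> Self;
}

impl One for i64 {
    fn one() -> Self {
        1
    }
}

// Inclusive range [from, to]: the state is the next value to yield.
// Note that `n + A::one()` can overflow if `to` is the maximum value of A.
pub fn enum_from_to<'s, A>(from: A, to: A) -> Stream<'s, A, A>
where
    A: Copy + PartialOrd + Add<Output = A> + One + 's,
{
    Stream {
        next: Box::new(move |n: A| {
            if n > to {
                Step::Done
            } else {
                Step::Yield(n, n + A::one())
            }
        }),
        seed: from,
    }
}
```

Each bound in the `where` clause names exactly one capability the closure uses: copying, comparison, addition, and the unit value.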

The trait parameters feel a lot like a super layered version of the Haskell Num class, where I’m really picking and choosing which methods I want to dispatch to. The numeric overloading is also a bit different: A::one() instead of an overloaded literal. Again, it is almost identical to the Haskell version, but with explicit memory annotations, and a more structured type class/trait hierarchy. Other operations, like map, filter, etc. all fall out fairly straightforwardly. Nice: starting to feel like I can definitely be productive in this language.

Now I can even write a functional pipeline — equivalent to:

   sum . map (*2) . filter (\n -> n`mod`2 == 1) $ [1..1000000::Int]
=> 500000000000

As barebones Rust:

It actually runs and does roughly what I’d expect, for N=1_000_000:

$ cargo run


Another moment of deja vu, installing Criterion to benchmark the thing. “cargo bench” integration is pretty sweet:

So about 7 ms for four logical loops over 1M i64 elements. That’s sort of plausible and actually not too bad considering I don’t know what I’m doing.

The overhead of the dynamic dispatch to the boxed closure is almost certainly going to dominate, and then likely breaks inlining and arithmetic optimization, so while we do get a fused loop, we get all the steps of the loop in sequence. I fiddled a bit with inlining the consuming loop, which shaved 1ms off, but that’s about it.

A quick peek at the assembly from --release mode, which I assume does good things, and yeah, this isn’t going to be fast. Tons of registers, allocs and dispatching everywhere. Urk.

But it works! The first thing directly translated works, and it has basically the behavior you’d expect with explicit closure calls. Not bad!

A trait API

That boxed closure bothers me a bit. Dispatching to something that’s known statically. The usual trick for resolving things statically is to move the work to the type system. In this case, we want to look up the right ‘step’ function by type. So I’ll need a type for each generator and transformer function in the stream API. We can take this approach in Rust too.

Basic idea:

  • move the type of the ‘step’ function of streams into a Stream trait
  • create a data type for each generator or transformer function, then impl that for Stream. This is the key to removing the overhead resolving the step functions
  • stream elem types can be generic parameters, or specialized associated types
  • the seed state can be associated type-indexed

I banged my head against this a few different ways, and settled on putting the state data into the ‘API key’ type. This actually looks really like something we already knew how to do – streams as Rust iterators – Snoyman already wrote about it 3 years ago! — I’ve basically adapted his approach here after a bit of n00b trial and error.

The ‘Step’ type is almost the same, and the polymorphic ‘Stream’ type with its existential seed becomes a trait definition:
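A minimal sketch of how that could look, assuming the state lives in the implementing type itself so that `next` returns the successor stream value. The exact trait shape and the `Countdown` example are assumptions, not the post's code:

```rust
pub enum Step<S, A> {
    Yield(A, S),
    Skip(S),
    Done,
}

// Each stream type carries its own state; stepping returns the next state
// as a new value of the same type. No boxed closure is involved.
pub trait Stream: Sized {
    type Item;
    fn next(&self) -> Step<Self, Self::Item>;
}

// Hypothetical example instance: count down from n to 1.
pub struct Countdown(pub u32);

impl Stream for Countdown {
    type Item = u32;

    fn next(&self) -> Step<Self, u32> {
        if self.0 == 0 {
            Step::Done
        } else {
            Step::Yield(self.0, Countdown(self.0 - 1))
        }
    }
}
```

Because `next` is an ordinary trait method on a concrete type, the compiler can resolve and inline it statically.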

What’s a bit different now is how we’re going to resolve the function to generate each step of the stream. That’s now a trait method associated with some instance and element type.

So e.g. if I want to generate an empty stream, I need a type, an instance and a wrapper:
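Those three pieces might be sketched as follows, under the assumed trait shape where `next` returns the successor stream (the `Empty` type and `empty` wrapper are illustrative names):

```rust
use std::marker::PhantomData;

pub enum Step<S, A> {
    Yield(A, S),
    Skip(S),
    Done,
}

pub trait Stream: Sized {
    type Item;
    fn next(&self) -> Step<Self, Self::Item>;
}

// The type: it holds no data, only a marker for the element type.
pub struct Empty<A> {
    _elem: PhantomData<A>,
}

// The instance: an empty stream is always done.
impl<A> Stream for Empty<A> {
    type Item = A;

    fn next(&self) -> Step<Self, A> {
        Step::Done
    }
}

// The wrapper: a function to construct the stream value.
pub fn empty<A>() -> Empty<A> {
    Empty { _elem: PhantomData }
}
```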

Ok not too bad. My closure for stepping over streams is now a ‘next’ method. What would have been a Stream ‘object’ with an embedded closure is now a trait instance where the ‘next’ function can be resolved statically.

I can convert all the generator functions like this. For example, to replicate a stream I need to know how many elements, and what the element is. Instead of capturing the element in a closure, it’s in an explicit data type:
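For illustration, a `Replicate` struct along these lines could hold the count and the element as plain fields (field names and the trait shape are assumptions):

```rust
pub enum Step<S, A> {
    Yield(A, S),
    Skip(S),
    Done,
}

pub trait Stream: Sized {
    type Item;
    fn next(&self) -> Step<Self, Self::Item>;
}

// What used to be captured by a closure is now explicit data.
pub struct Replicate<A> {
    remaining: usize,
    elem: A,
}

impl<A: Copy> Stream for Replicate<A> {
    type Item = A;

    fn next(&self) -> Step<Self, A> {
        if self.remaining == 0 {
            Step::Done
        } else {
            // Yield the element and a successor state with one fewer copy left.
            let rest = Replicate {
                remaining: self.remaining - 1,
                elem: self.elem,
            };
            Step::Yield(self.elem, rest)
        }
    }
}

pub fn replicate<A: Copy>(n: usize, a: A) -> Replicate<A> {
    Replicate { remaining: n, elem: a }
}
```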

The step function is still basically the same as in the Haskell version, but to get the nice Rust method syntax we have it all talk to ‘self’.

We also need a type for each stream transformer: so a ‘map’ is now a struct with the mapper function, paired with the underlying stream object it maps over.

This part is a bit more involved — when a map is applied to a stream element, we return f(x) of the element, and lift the stream state into a Map stream state for the next step.
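A hedged sketch of such a transformer. The `Map` struct, its `map` wrapper, the `Copy` bound on the function and the `Countdown` source stream are all assumptions chosen to keep the example self-contained:

```rust
pub enum Step<S, A> {
    Yield(A, S),
    Skip(S),
    Done,
}

pub trait Stream: Sized {
    type Item;
    fn next(&self) -> Step<Self, Self::Item>;
}

// A mapped stream: the function paired with the stream it maps over.
pub struct Map<S, F> {
    pub stream: S,
    pub f: F,
}

impl<S, F, B> Stream for Map<S, F>
where
    S: Stream,
    F: Fn(S::Item) -> B + Copy,
{
    type Item = B;

    fn next(&self) -> Step<Self, B> {
        match self.stream.next() {
            // Apply f to the element and re-wrap the inner successor state.
            Step::Yield(x, rest) => Step::Yield((self.f)(x), Map { stream: rest, f: self.f }),
            Step::Skip(rest) => Step::Skip(Map { stream: rest, f: self.f }),
            Step::Done => Step::Done,
        }
    }
}

pub fn map<S: Stream, B, F: Fn(S::Item) -> B + Copy>(f: F, stream: S) -> Map<S, F> {
    Map { stream, f }
}

// Hypothetical source stream: count down from n to 1.
pub struct Countdown(pub u32);

impl Stream for Countdown {
    type Item = u32;

    fn next(&self) -> Step<Self, u32> {
        if self.0 == 0 {
            Step::Done
        } else {
            Step::Yield(self.0, Countdown(self.0 - 1))
        }
    }
}
```

Requiring `F: Copy` is a simplification: closures that capture only `Copy` data satisfy it, and it lets each successor state carry its own copy of the function.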

I can implement Stream-generic folds now. Again, since I have no tail recursion to consume the stream I’m looping explicitly. This is our real ‘driver’ of work, the actual loop pulling on the chain of ‘next’s we’ve built up.
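A sketch of a fold that works for any `Stream` instance, written as a free function for brevity (the name `foldl`, the trait shape and the `Countdown` stream are assumptions):

```rust
pub enum Step<S, A> {
    Yield(A, S),
    Skip(S),
    Done,
}

pub trait Stream: Sized {
    type Item;
    fn next(&self) -> Step<Self, Self::Item>;
}

// The driver loop: keep asking the current state for its next step,
// replacing the state with its successor until the stream is done.
pub fn foldl<S: Stream, B>(f: impl Fn(B, S::Item) -> B, init: B, stream: S) -> B {
    let mut state = stream;
    let mut acc = init;
    loop {
        match state.next() {
            Step::Yield(x, rest) => {
                acc = f(acc, x);
                state = rest;
            }
            Step::Skip(rest) => state = rest,
            Step::Done => return acc,
        }
    }
}

// Hypothetical source stream: count down from n to 1.
pub struct Countdown(pub u32);

impl Stream for Countdown {
    type Item = u32;

    fn next(&self) -> Step<Self, u32> {
        if self.0 == 0 {
            Step::Done
        } else {
            Step::Yield(self.0, Countdown(self.0 - 1))
        }
    }
}
```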

Ok so with the method syntax this looks pretty nice:
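To make the whole chain concrete, here is a self-contained sketch of a filter/map/fold pipeline using provided methods on the trait. All names (`Range`, `Filter`, `Map`, `foldl`, `pipeline`) and the trait shape are assumptions for illustration, not the post's listing:

```rust
pub enum Step<S, A> {
    Yield(A, S),
    Skip(S),
    Done,
}

pub trait Stream: Sized {
    type Item;
    fn next(&self) -> Step<Self, Self::Item>;

    fn map<B, F: Fn(Self::Item) -> B + Copy>(self, f: F) -> Map<Self, F> {
        Map { stream: self, f }
    }
    fn filter<P: Fn(&Self::Item) -> bool + Copy>(self, p: P) -> Filter<Self, P> {
        Filter { stream: self, p }
    }
    fn foldl<B>(self, f: impl Fn(B, Self::Item) -> B, init: B) -> B {
        let (mut state, mut acc) = (self, init);
        loop {
            match state.next() {
                Step::Yield(x, rest) => {
                    acc = f(acc, x);
                    state = rest;
                }
                Step::Skip(rest) => state = rest,
                Step::Done => return acc,
            }
        }
    }
}

// Inclusive range of i64 values.
pub struct Range { cur: i64, last: i64 }

impl Stream for Range {
    type Item = i64;
    fn next(&self) -> Step<Self, i64> {
        if self.cur > self.last {
            Step::Done
        } else {
            Step::Yield(self.cur, Range { cur: self.cur + 1, last: self.last })
        }
    }
}

pub struct Filter<S, P> { stream: S, p: P }

impl<S: Stream, P: Fn(&S::Item) -> bool + Copy> Stream for Filter<S, P> {
    type Item = S::Item;
    fn next(&self) -> Step<Self, S::Item> {
        match self.stream.next() {
            Step::Yield(x, rest) => {
                if (self.p)(&x) {
                    Step::Yield(x, Filter { stream: rest, p: self.p })
                } else {
                    // Rejected elements become Skip, so no inner loop is needed.
                    Step::Skip(Filter { stream: rest, p: self.p })
                }
            }
            Step::Skip(rest) => Step::Skip(Filter { stream: rest, p: self.p }),
            Step::Done => Step::Done,
        }
    }
}

pub struct Map<S, F> { stream: S, f: F }

impl<S: Stream, B, F: Fn(S::Item) -> B + Copy> Stream for Map<S, F> {
    type Item = B;
    fn next(&self) -> Step<Self, B> {
        match self.stream.next() {
            Step::Yield(x, rest) => Step::Yield((self.f)(x), Map { stream: rest, f: self.f }),
            Step::Skip(rest) => Step::Skip(Map { stream: rest, f: self.f }),
            Step::Done => Step::Done,
        }
    }
}

// sum . map (*2) . filter odd $ [1..n]
pub fn pipeline(n: i64) -> i64 {
    Range { cur: 1, last: n }
        .filter(|x: &i64| *x % 2 == 1)
        .map(|x: i64| x * 2)
        .foldl(|acc: i64, x: i64| acc + x, 0)
}
```

The value built before `foldl` runs has a nested type recording every stage, so each `next` call can be resolved at compile time.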

I had to write out the types here to understand how the method resolving works. We build up a nice chain of type information about exactly what function we want to use at what type. The whole pipeline is a Map<Filter<Range < … type> , all an instance of Stream.

So this should do ok right? No boxing of closures, there could be some lookups and dispatch but there’s enough type information here to know all calls statically. I don’t have much intuition for how Rust will optimize the chain of nested Yield/Skip constructors.. but I’m hopeful given the tags fit in 2 bits, and I don’t use Skip anywhere in the specific program.

288 microseconds to collapse a 1_000_000 element stream. Or about 25x faster. Nice!

So the type information and commitment to not allocating to the heap does a lot of work for us here. I ask cargo rustc --bin stream_test --release -- --emit asm for fun. And this is basically what I want to see: a single loop, no allocation, a bit of math. Great.

It’s converted the %2 / *2 body into a straight i64 addition loop with strides. I suspect with a bit of prodding it could resolve this statically to a constant but that’s just a toy anyway. All the intermediate data structures are gone.

Overall, that’s a pretty satisfying result. With minimal effort I got a fusing iterator/stream API that performs well out of the box. The Rust defaults nudge code towards low overhead by default. That can feel quite satisfying.

by Don Stewart at July 04, 2020 11:35 PM

July 03, 2020

Philip Wadler

Haskell, Then and Now. Got Questions? Ask them here!

IOHK Cardano Virtual Summit continues. Today's sessions include:

16.00 Fri 3 Jul Haskell, then and now: What is the future for functional programming languages? Prof Simon Peyton-Jones, Prof John Hughes, Prof Philip Wadler, Dr Kevin Hammond, Dr Duncan Coutts.

You can submit questions via Reddit. Register for the summit here. You can log in with your registered username and password here.

by Philip Wadler at July 03, 2020 09:05 AM

An Incredible Scientific Breakthrough Discovery to Beat Covid

I almost never see masks in Edinburgh, not even in stores or on busses. Brazil has serious problems, but no one in Rio de Janeiro goes outside without a mask. Courtesy of Tom the Dancing Bug.

by Philip Wadler at July 03, 2020 09:00 AM

July 02, 2020

Douglas M. Auclair (geophf)

June 2020 1HaskellADay Problems and Solutions

  • YAY! HELLO! Our first #haskell exercise in a while!... and this exercise is about ... wait for it ... exercise
  • For today's #haskell exercise we convert a set of arcs to a graph. #GraphTheory 
by geophf at July 02, 2020 02:38 PM

    July 01, 2020

    Joachim Breitner

    Template Haskell recompilation

    I was wondering: What happens if I have a Haskell module with Template Haskell that embeds some information from the environment (time, environment variables). Will such a module be reliably recompiled? And what if it gets recompiled, but the source code produced by Template Haskell is actually unchanged (e.g., because the environment variable has not changed), will all depending modules be recompiled (which would be bad)?

    Here is a quick experiment, using GHC-8.8:

    /tmp/th-recom-test $ cat Foo.hs
    {-# LANGUAGE TemplateHaskell #-}
    {-# OPTIONS_GHC -fforce-recomp #-}
    module Foo where
    import Language.Haskell.TH
    import Language.Haskell.TH.Syntax
    import System.Process
    theMinute :: String
    theMinute = $(runIO (readProcess "date" ["+%M"] "") >>= stringE)
    [jojo@kirk:2] Mi, der 01.07.2020 um 17:18 Uhr ☺
    /tmp/th-recom-test $ cat Main.hs
    import Foo
    main = putStrLn theMinute

    Note that I had to set {-# OPTIONS_GHC -fforce-recomp #-} – by default, GHC will not recompile a module, even if it uses Template Haskell and runIO. If you are reading from a file you can use addDependentFile to tell the compiler about that dependency, but that does not help with reading from the environment.

    So here is the test, and we get the desired behaviour: The Foo module is recompiled every time, but unless the minute has changed (see my prompt), Main is not recompiled:

    /tmp/th-recom-test $ ghc --make -O2 Main.hs -o test
    [1 of 2] Compiling Foo              ( Foo.hs, Foo.o )
    [2 of 2] Compiling Main             ( Main.hs, Main.o )
    Linking test ...
    [jojo@kirk:2] Mi, der 01.07.2020 um 17:20 Uhr ☺
    /tmp/th-recom-test $ ghc --make -O2 Main.hs -o test
    [1 of 2] Compiling Foo              ( Foo.hs, Foo.o )
    Linking test ...
    [jojo@kirk:2] Mi, der 01.07.2020 um 17:20 Uhr ☺
    /tmp/th-recom-test $ ghc --make -O2 Main.hs -o test
    [1 of 2] Compiling Foo              ( Foo.hs, Foo.o )
    [2 of 2] Compiling Main             ( Main.hs, Main.o ) [Foo changed]
    Linking test ...

    So all well!

    Update: It seems that while this works with ghc --make, the -fforce-recomp does not cause cabal build to rebuild the module. That’s unfortunate.

    by Joachim Breitner at July 01, 2020 03:16 PM

    Neil Mitchell

    A Rust self-ownership lifetime trick (that doesn't work)

    Summary: I came up with a clever trick to encode lifetimes of allocated values in Rust. It doesn't work.

    Let's imagine we are using Rust to implement some kind of container that can allocate values, and a special value can be associated with the container. It's a bug if the allocated value gets freed while it is the special value of a container. We might hope to use lifetimes to encode that relationship:

    struct Value<'v> {...}
    struct Container {...}

    impl Container {
        fn alloc<'v>(&'v self) -> Value<'v> {...}
        fn set_special<'v>(&'v self, x: Value<'v>) {...}
    }

    Here we have a Container (which has no lifetime arguments), and a Value<'v> (where 'v ties it to the right container). Within our container we can implement alloc and set_special. In both cases, we take &'v self and then work with a Value<'v>, which ensures that the lifetime of the Container and Value match. (We ignore details of how to implement these functions - it's possible but requires unsafe).

    Unfortunately, the following code compiles:

    fn set_cheat<'v1, 'v2>(to: &'v1 Container, x: Value<'v2>) {
        to.set_special(x);
    }

    The Rust compiler has taken advantage of the fact that Container can be reborrowed, and that Value is variant, and rewritten the code to:

    fn set_cheat<'v1, 'v2>(to: &'v1 Container, x: Value<'v2>) {
        'v3: {
            let x : Value<'v3> = x; // Value is variant, 'v2 : 'v3
            let to : &'v3 Container = &*to;
            to.set_special(x);
        }
    }

    The code with lifetime annotations doesn't actually compile, it's just what the compiler did under the hood. But we can stop Value being variant by making it contain PhantomData<Cell<&'v ()>>, since lifetimes under Cell are invariant. Now the above code no longer compiles. Unfortunately, there is a closely related variant which does compile:

    fn set_cheat_alloc<'v1, 'v2>(to: &'v1 Container, from: &'v2 Container) {
        let x = from.alloc();
        to.set_special(x);
    }

    While Value isn't variant, &Container is, so the compiler has rewritten this code as:

    fn set_cheat_alloc<'v1, 'v2>(to: &'v1 Container, from: &'v2 Container) {
        'v3: {
            let from : &'v3 Container = &*from;
            let x : Value<'v3> = from.alloc();
            let to : &'v3 Container = &*to;
            to.set_special(x);
        }
    }

    Since lifetimes on & are always variant, I don't think there is a trick to make this work safely. Much of the information in this post was gleaned from this StackOverflow question.

    by Neil Mitchell at July 01, 2020 08:50 AM

    June 30, 2020

    Philip Wadler

    Cardano Virtual Summit 2020

    I'm participating in four sessions at Cardano Virtual Summit 2020, and there are many other sessions too. All times UK/BST.

    16.00 Thu 2 Jul An overview of IOHK research Prof Aggelos Kiayias, Prof Elias Koutsoupias, Prof Alexander Russell, Prof Phil Wadler.

    18.30 Thu 2 Jul Architecting the internet: what I would have done differently... Vint Cerf, Internet pioneer and Google internet evangelist, Prof Aggelos Kiayias, panel moderated by Prof Philip Wadler.

    20.00 Thu 2 Jul Functional smart contracts on Cardano Prof Philip Wadler, Dr Manuel Chakravarty, Prof Simon Thompson.

    16.00 Fri 3 Jul Haskell, then and now: What is the future for functional programming languages? Prof Simon Peyton-Jones, Prof John Hughes, Prof Philip Wadler, Dr Kevin Hammond, Dr Duncan Coutts.

    by Philip Wadler at June 30, 2020 11:23 AM

    June 29, 2020

    Brent Yorgey

    Competitive programming in Haskell: data representation and optimization, with cake

    In my previous post I challenged you to solve Checking Break, which presents us with a cake in the shape of a rectangular prism, with chocolate chips at various locations, and asks us to check whether a proposed division of the cake is valid. A division of the cake is valid if it is a partition (no pieces overlap and every part of the cake is in some piece) and every piece contains one chocolate chip.

    No one posted a solution—I don’t know if that’s because people have lost interest, or because no one was able to solve it—but in any case, don’t read this post yet if you still want to try solving it! As a very small hint, part of the reason I chose this problem is that it is an interesting example of a case where just getting the correct asymptotic time complexity is not enough—we actually have to work a bit to optimize our code so it fits within the allotted time limit.

    The algorithm

    When solving this problem I first just spent some time thinking about the different things I would have to compute and what algorithms and data structures I could use to accomplish them.

    • The first thing that jumped out at me is that we are going to want some kind of abstractions for 3D coordinates, and for 3D rectangular prisms (i.e. boxes, i.e. pieces of cake). Probably we can just represent boxes as a pair of points at opposite corners of the box (in fact this is how boxes are given to us). As we plan out how the rest of the solution is going to work we will come up with a list of operations these will need to support.

      As an aside, when working in Java I rarely make any classes beyond the single main class, because it’s too heavyweight. When working in Haskell, on the other hand, I often define little abstractions (i.e. data types and operations on them) because they are so lightweight, and being able to cleanly separate things into different layers of abstraction helps me write solutions that are more obviously correct.

    • We need to check that the coordinates of each given box are valid.

    • We will need to check that every piece of cake contains exactly one chocolate chip. At first this sounds difficult—given a chip, how do we find out which box(es) it is in? Or given a box, how can we find out which chips are in it? To do this efficiently seems like it will require some kind of advanced 3D space partitioning data structure, like an octree or a BSP tree. BUT this is a situation where reading carefully pays off: the problem statement actually says that “the i-th part must contain the i-th chocolate chip”. So all we have to do is zip the list of pieces together with the list of chips. We just need an operation to test whether a given point is contained in a given box.

    • We have to check that none of the boxes overlap. We can make a primitive to check whether two boxes intersect, but how do we make sure that none of the boxes intersect? Again, complicated space-partitioning data structures come to mind; but since there are at most 10^3 boxes, the number of pairs is on the order of 10^6. There can be multiple test cases, though, and the input specification says the sum of values for m (the number of pieces) over all test cases will be at most 5 \times 10^4. That means that in the worst case, we could get up to 50 test cases with 10^3 pieces of cake (and thus on the order of 10^6 pairs of pieces) per test case. Given 10^8 operations per second as a rough rule of thumb, it should be just barely manageable to do a brute-force check over every possible pair of boxes.

    • Finally, we have to check that the pieces account for every last bit of the cake. If we think about checking this directly, it is quite tricky. One could imagine making a 3D array representing every cubic unit of cake, and simply marking off the cubes covered by each piece, but this is completely out of the question since the cake could be up to 10^6 \times 10^6 \times 10^6 in size! Or we could again imagine some complicated space-partitioning structure to keep track of which parts have and have not been covered so far.

      But there is a much simpler way: just add up the volume of all the pieces and make sure it is the same as the volume of the whole cake! Of course this relies on the fact that we are also checking to make sure none of the pieces overlap: the volumes being equal implies that the whole cake is covered if and only if none of the pieces overlap. In any case, we will need a way to compute the volume of a box.

    Implementation and optimization

    Let’s start with some preliminaries: LANGUAGE pragmas, imports, main, and the parser.

    {-# LANGUAGE OverloadedStrings #-}
    {-# LANGUAGE RecordWildCards   #-}
    {-# LANGUAGE TupleSections     #-}
    import           Control.Arrow
    import           Data.Bool
    import qualified Data.ByteString.Lazy.Char8 as C
    import           Data.Monoid
    import           ScannerBS
    main = C.interact $
      runScanner (many tc) >>> init >>>
      map (solve >>> bool "NO" "YES") >>>
      C.unlines
    data TC = TC { cake :: Box, chips :: [Pos], parts :: [Box] }
    tc :: Scanner TC
    tc = do
      a <- int
      case a of
        -1 -> return undefined
        _  -> do
          xs <- three int
          let [b,c,m] = xs
              cake    = Box (Pos 1 1 1) (Pos a b c)
          TC cake <$> m `times` pos <*> m `times` box

    The parser is worth remarking upon. The input consists of multiple test cases, with a single value of -1 marking the end of the input. This is annoying: ideally, we would have a many combinator that keeps running a Scanner until it fails, but we don’t. To keep things simple and fast, our Scanner abstraction does not support parse failures and alternatives! The many combinator we made keeps running a given Scanner until the end of input, not until it fails. The quick-and-dirty solution I adopted is to make the test case Scanner return undefined if it sees a -1, and then simply ignore the final element of the list of test cases via init. Not pretty but it gets the job done.

    Representing positions and boxes

    Next let’s consider building abstractions for 3D coordinates and boxes. It is very tempting to do something like this:

    type Pos = [Integer]
    type Box = [Pos]
    -- Check whether one position is componentwise <= another
    posLTE :: Pos -> Pos -> Bool
    posLTE p1 p2 = and $ zipWith (<=) p1 p2
    -- ... and so on

    Using list combinators like zipWith to work with Pos and Box values is quite convenient. And for some problems, using lists is totally fine. Having a small number of large lists—e.g. reading in a list of 10^5 integers and processing them somehow—is rarely a problem. But having a large number of small lists, as we would if we use lists to represent Pos and Box here, slows things down a lot (as I learned the hard way). I won’t go into the details of why—I am no expert on Haskell performance—but suffice to say that lists are a linked structure with a large memory overhead.

    So let’s do something more direct. We’ll represent both Pos and Box as data types with strict fields (the strict fields make a big difference, especially in the case of Pos), and make some trivial Scanners for them. The volume function computes the volume of a box; given that the coordinates are coordinates of the cubes that make up the pieces, and are both inclusive, we have to add one to the difference between the coordinates. Note we assume that the first coordinate of a Box should be elementwise less than or equal to the second; otherwise, the call to max 0 ensures we will get a volume of zero.

    data Pos = Pos !Int !Int !Int
    data Box = Box !Pos !Pos
    pos :: Scanner Pos
    pos = Pos <$> int <*> int <*> int
    box :: Scanner Box
    box = Box <$> pos <*> pos
    volume :: Box -> Int
    volume (Box (Pos x1 y1 z1) (Pos x2 y2 z2)) = (x2 -. x1) * (y2 -. y1) * (z2 -. z1)
      where
        x -. y = max 0 (x - y + 1)

    Another very important note is that we are using Int instead of Integer. Using Integer is lovely when we can get away with it, since it means not worrying about overflow at all; but in this case using Int instead of Integer yields a huge speedup (some quick and dirty tests show about a factor of 6 speedup on my local machine, and replacing Int with Integer, without changing anything else, makes my solution no longer accepted on Kattis). Of course, this comes with an obligation to think about potential overflow: the cake can be at most 10^6 units on each side, giving a maximum possible volume of 10^{18}. On a 64-bit machine, that just fits within an Int (maxBound :: Int is approximately 9.2 \times 10^{18}). Since the Kattis test environment is definitely 64-bit, we are good to go. In fact, limits for competitive programming problems are often chosen so that required values will fit within 64-bit signed integers (C++ has no built-in facilities for arbitrary-size integers); I’m quite certain that’s why 10^6 was chosen as the maximum size of one dimension of the cake.

    Pos and Box utilities

    Next, some utilities for checking whether one Pos is elementwise less than or equal to another, and for taking the elementwise max and min of two Pos values. Checking whether a Box contains a Pos simply reduces to doing two calls to posLTE (again assuming a valid Box with the first corner componentwise no greater than the second).

    posLTE (Pos x1 y1 z1) (Pos x2 y2 z2) = x1 <= x2 && y1 <= y2 && z1 <= z2
    posMax (Pos x1 y1 z1) (Pos x2 y2 z2) = Pos (max x1 x2) (max y1 y2) (max z1 z2)
    posMin (Pos x1 y1 z1) (Pos x2 y2 z2) = Pos (min x1 x2) (min y1 y2) (min z1 z2)
    contains :: Box -> Pos -> Bool
    contains (Box lo hi) p = posLTE lo p && posLTE p hi

    To test whether a box is a valid box within a given cake, we test that its corners are in the correct order and fit within the low and high coordinates of the cake.

    valid :: Box -> Box -> Bool
    valid (Box lo hi) (Box c1 c2) = posLTE lo c1 && posLTE c1 c2 && posLTE c2 hi

    How to test whether two given boxes intersect or not? There are probably many ways to do this, but the nicest way I could come up with is to first find the actual Box which represents their intersection, and check whether it has a positive volume (relying on the fact that volume returns 0 for degenerate boxes with out-of-order coordinates). In turn, to find the intersection of two boxes, we just take the coordinatewise max of their lower corners, and the coordinatewise min of their upper corners.

    intersection :: Box -> Box -> Box
    intersection (Box c11 c12) (Box c21 c22) = Box (posMax c11 c21) (posMin c12 c22)
    disjoint :: Box -> Box -> Bool
    disjoint b1 b2 = volume (intersection b1 b2) == 0

    The solution

    Finally, we can put the pieces together to write the solve function. We simply check that all the given cake parts are valid; that every part contains its corresponding chocolate chip; that every pair of parts is disjoint; and that the sum of the volumes of all parts equals the volume of the entire cake.

    solve :: TC -> Bool
    solve (TC{..}) = and
      [ all (valid cake) parts
      , and $ zipWith contains parts chips
      , all (uncurry disjoint) (pairs parts)
      , sum (map volume parts) == volume cake
      ]

    Computing all pairs

    Actually, there’s still one missing piece: how to compute all possible pairs of parts. The simplest possible thing would be to use a list comprehension like

    [(x,y) | x <- parts, y <- parts]

    but this has problems: first, it includes a pairing of each part with itself, which will definitely have a nonzero intersection. We could exclude such pairs by adding x /= y as a guard, but there is another problem: (p2,p1) is included whenever (p1,p2) is included, but this is redundant since disjoint is commutative. In fact, we don’t really want all pairs; we want all unordered pairs, that is, all sets of size two. We can do that with the below utility function (which I have now added to Util.hs):

    pairs :: [a] -> [(a,a)]
    pairs []     = []
    pairs [_]    = []
    pairs (a:as) = map (a,) as ++ pairs as

    This is accepted, and runs in about 0.91 seconds (the time limit is 2 seconds). However, I was curious whether we are paying anything here for all the list operations, so I wrote the following version, which takes a binary operation for combining list elements, and a Monoid specifying how to combine the results, and directly returns the monoidal result of combining all the pairs, without ever constructing any intermediate lists or tuples at all. It’s sort of like taking the above pairs function, following it by a call to foldMap, and then manually fusing the two to get rid of the intermediate list.

    withPairs :: Monoid r => (a -> a -> r) -> [a] -> r
    withPairs _ []     = mempty
    withPairs _ [_]    = mempty
    withPairs f (a:as) = go as
      where
        go []        = withPairs f as
        go (a2:rest) = f a a2 <> go rest

    To use this, we have to change the solve function slightly: instead of

      , all (uncurry disjoint) (pairs parts)

    we now have

      , getAll $ withPairs (\p1 p2 -> All $ disjoint p1 p2) parts

    This version runs significantly faster on Kattis—0.72 seconds as opposed to 0.91 seconds. (In fact, it’s faster than the currently-fastest Java solution (0.75 seconds), though there is still a big gap to the fastest C++ solution (0.06 seconds).) I don’t completely understand why this version is faster—perhaps one of you will be able to enlighten us!
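    To make the relationship between the two versions concrete, here is a small self-contained sketch that runs both on the same input. The interval type and the disjointIv, viaPairs, viaWithPairs and countPairs names are hypothetical stand-ins made up for illustration; they are not part of the actual solution or of Util.hs.

```haskell
{-# LANGUAGE TupleSections #-}

import Data.Monoid (All (..), Sum (..))

-- All unordered pairs of list elements, as defined in the post.
pairs :: [a] -> [(a, a)]
pairs []       = []
pairs [_]      = []
pairs (a : as) = map (a,) as ++ pairs as

-- The fused version from the post: combine every unordered pair with f
-- and accumulate the results monoidally, without building a list of pairs.
withPairs :: Monoid r => (a -> a -> r) -> [a] -> r
withPairs _ []       = mempty
withPairs _ [_]      = mempty
withPairs f (a : as) = go as
  where
    go []          = withPairs f as
    go (a2 : rest) = f a a2 <> go rest

-- Hypothetical stand-in for 'disjoint': closed integer intervals (lo, hi).
disjointIv :: (Int, Int) -> (Int, Int) -> Bool
disjointIv (lo1, hi1) (lo2, hi2) = hi1 < lo2 || hi2 < lo1

-- The same check written both ways.
viaPairs, viaWithPairs :: [(Int, Int)] -> Bool
viaPairs     = all (uncurry disjointIv) . pairs
viaWithPairs = getAll . withPairs (\p q -> All (disjointIv p q))

-- Choosing a different monoid reuses the same traversal,
-- here to count the unordered pairs.
countPairs :: [a] -> Int
countPairs = getSum . withPairs (\_ _ -> Sum 1)
```

    In GHCi, pairs [1,2,3] gives [(1,2),(1,3),(2,3)], countPairs "abcd" gives 6, and viaPairs and viaWithPairs agree on any list of intervals.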

    For next time

    For next time, we’ll go back to computational geometry: I invite you to solve Cookie Cutters.

    by Brent at June 29, 2020 09:55 PM

    Monday Morning Haskell

    Mid-Summer Break, Open AI Gym Series!


    We're taking a little bit of a mid-summer break from new content here at MMH. But we have done some extra work in organizing the site! Last week we wrapped up our series on Haskell and the Open AI Gym. We've now added that series as a permanent fixture on the advanced section of the page!

    Here's a quick summary of the series:

    Part 1: Frozen Lake Primer

    The first part introduces the Open AI framework and goes through the Frozen lake example. It presents the core concept of an environment.

    Part 2: Frozen Lake in Haskell

    In the second part, we write a basic version of Frozen Lake in Haskell.

    Part 3: Blackjack

    Next, we expand on our knowledge of games and environments to write a second game. This one is based on casino Blackjack, and it will start to show us common elements in games.

    Part 4: Q-Learning

    Now we start getting into the ideas of reinforcement learning. We'll explore Q-Learning, one of the simplest techniques in this field. We'll apply this approach to both of our games.

    Part 5: Generalized Environments

    Now that we've seen the learning process in action, we can start generalizing our games. We'll create an abstract notion of what an Environment is. Just as Python has a specific API for their games, so will we! In true Haskell fashion, we'll represent this API with a type family!

    Part 6: Q-Learning with Tensors in Python

    In part 6, we'll take our Q-learning process a step further by using TensorFlow. We'll see how we can learn a more general function than we had before. We'll start this process in Python, where the mathematical operations are clearer.

    Part 7: Q-Learning with Tensors in Haskell

    Once we know how Q-Learning works with Python, we'll apply these techniques in Haskell as well! Once you get here, you'd better be ready to use your Haskell TensorFlow skills!

    Part 8: Rendering with Gloss

    In the final part of the series, we'll see how we can use the Gloss library to render our Haskell games!

    You can take a look at the series summary page for more details!

    In a couple weeks, we'll be back, this time with some fresh Rust content! Take a look at our Rust Video Tutorial to get a headstart on that!

    by James Bowen at June 29, 2020 02:30 PM

    Tweag I/O

    Splittable pseudo-random number generators in Haskell: random v1.1 and v1.2

    How we ensured that the implementation of random v1.2 produces higher quality random numbers than v1.1.

    June 29, 2020 12:00 AM

    The team

    Rust at FP Complete, 2020 update

    At FP Complete, we have long spoken about the three pillars of a software development language: productivity, robustness, and performance. Oftentimes, these three pillars are in conflict with each other, or at least appear to be. Getting to market quickly (productivity) often involves skimping on quality assurance (robustness), or writing inefficient code (performance). Or you can write simple code which is easy to test and validate (productivity and robustness), but end up with a slow algorithm (performance). Optimizing the code takes time and may introduce new bugs.

    For the entire history of our company, our contention has been that while some level of trade-off here is inevitable, we can leverage better tools, languages, and methodologies to improve our standing on all of these pillars. We initially focused on Haskell, a functional programming language that uses a strong type system and offers decent performance. We still love and continue to use Haskell. However, realizing that code was only half the battle, we then began adopting DevOps methodologies and tools.

    We've watched with great interest as the Rust programming language has developed, matured, and been adopted in industry. Virtually all major technology companies are now putting significant effort behind Rust. Most recently, Microsoft has been quite publicly embracing Rust.

    In this post, I wanted to share some thoughts on why we're thrilled to see Rust's adoption in industry, what we're using Rust for at FP Complete, and give some advice to interested companies on how they can begin adopting this language.

    Why Rust?

    We're big believers in using the computer itself to help us write better code. Some of this can be done with methodologies like test-driven development (TDD). But there are two weak links in the chain of techniques like TDD:

    • It requires active effort to think through what needs to be tested
    • It's possible to ignore these test failures and ship broken code

    The latter might sound contrived, but we've seen it happen in industry. The limitations of testing are well known, and we've previously blogged about recommended testing strategies. And don't get me wrong: testing is an absolutely vital part of software development, and you should be doing more of it!

    But industry experience has shown us that many bugs slip through testing. Perhaps the most common and dangerous class of bug is memory safety issues. These include buffer overruns, use-after-free and double-free. What is especially worrying about these classes of bugs is that, typically, the best case scenario is your program crashing. Worst case scenario includes major security and privacy breaches.

    The industry standard approach has been to bypass these bugs by using managed languages. Managed languages bypass explicit memory management and instead rely on garbage collection. This introduces some downsides, latency being the biggest one. Typically, garbage collected languages are more memory hungry as well. This is the typical efficiency-vs-correctness trade-off mentioned above. We've been quite happy to make that trade-off ourselves, using languages like Haskell and accepting some level of performance hit.

    Rust took a different approach, one we admire deeply. By introducing concepts around ownership and borrowing, Rust seeks to drastically reduce the presence of memory safety errors, without introducing the overhead of garbage collection. This fits completely with FP Complete's mindset of using better tools when possible.

    The downside to this is complexity. Understanding ownership can be a challenge. But see below for information on how to get started with Rust. This is an area where FP Complete as a company, and I personally, have taken a lot of interest.

    Going beyond memory safety issues, however, is the rest of the Rust language design. As a relatively new language, Rust has the opportunity to learn from many other languages on the market already. And in our opinion, it has selected some of the best features available from other languages, especially our beloved Haskell. Some of these features include:

    • Strong typing
    • Sum types (aka enums) and pattern matching
    • Explicit error handling, but with a beautiful syntax
    • Async syntax
    • Functional style via closures and Iterator pipelines

    In other words: Rust has fully embraced the concepts of using better approaches to solve problems, and to steal great ideas that have been tried and tested. We believe Rust has the potential to drastically improve software quality in the world, and lead to more maintainable solutions. We think Rust can be instrumental in solving the global software crisis.

    Rust at FP Complete

    We've taken a three-pronged approach to Rust at FP Complete until now. This has included:

    • Producing educational material for both internal and external audiences
    • Using Rust for internal tooling
    • Writing product code with Rust

    The primary educational offering we've created is our Rust Crash Course, which we'll provide at the end of this post. This course has been honed to address the most common pitfalls we've seen developers hit when onboarding with Rust.

    Also, as a personal project, I decided to see if Rust could be taught as a first programming language, and I think it can.

    For internal tooling and product code, we always have the debate: should we use Rust or Haskell? We've been giving our engineers more freedom to make that decision themselves in the past year. Personally, I'm still more comfortable with Haskell, which isn't really surprising: I've been using Haskell professionally longer than Rust has existed. But the progress we're seeing in Rust—both in the library ecosystem and the language itself—means that Rust becomes more competitive on an almost monthly basis.

    At this point, we have some specific times when Rust is a clear winner:

    • When performance is critical, we prefer Rust. Haskell is usually fast enough, but microoptimizing Haskell code ends up taking more time than writing it in Rust.
    • For client-side code (e.g., command line tooling) we've been leaning towards Rust. Overall, it has better cross-OS support than Haskell.
    • There are some domains that have much better library coverage in Rust than in Haskell, and then we'll gravitate towards them. (The same applies in the other direction too.)
    • And as we're engineers who like playing with shiny tools, if someone wants to have extra fun, Rust is usually it. In most places in the world, Haskell would probably be considered the shiny toy. FP Complete is pretty exceptional there.

    We're beginning to expand to a fourth area of Rust at FP Complete: consulting services. The market for Rust has been steadily growing over the past few years. We believe at this point Rust is ready for much broader adoption, and we're eager to help companies adopt this wonderful language. If you're interested in learning more, please contact our consulting team for more information.

    Getting started

    How do you get started with a language like Rust? Fortunately, the tooling and documentation for Rust are top notch. We can strongly recommend checking out the Rust homepage for guidance on installing Rust and getting started. The freely available Rust book is great too, covering many aspects of the language.

    That said, my recommendation is to check out our Rust Crash Course eBook (linked below). We've tried to focus this book on answering the most common questions about Rust first, and get you up and running quickly.

    If you're interested in getting your team started with Rust, you may also want to reach out to us for information on our training programs.

    Want to read more about Rust? Check out the FP Complete Rust homepage.

    Want to learn more about FP Complete offerings? Please reach out to us any time.


    June 29, 2020 12:00 AM

    June 28, 2020

    Ken T Takusagawa

    [orfveorb] More Generalized Fermat Primes

    Consider numbers of the form a^2^n + b^2^n.  For each exponent n, we give the first 20 prime numbers of that form in ascending order.  Primes of this form are known as Generalized Fermat Primes, which is confusingly a generalization of another form (b restricted to 1) also known as Generalized Fermat Primes.

    a^2^0 + b^2^0 : [1,1] [2,0] [2,1] [3,0] [3,2] [4,1] [5,0] [4,3] [5,2] [6,1] [7,0] [6,5] [7,4] [8,3] [9,2] [10,1] [11,0] [7,6] [8,5] [9,4] [10,3]

    a^2^1 + b^2^1 : [1,1] [2,1] [3,2] [4,1] [5,2] [6,1] [5,4] [7,2] [6,5] [8,3] [8,5] [9,4] [10,1] [10,3] [8,7] [11,4] [10,7] [11,6] [13,2] [10,9] [12,7]

    a^2^2 + b^2^2 : [1,1] [2,1] [3,2] [4,1] [4,3] [5,2] [5,4] [6,1] [7,2] [7,4] [7,6] [8,3] [8,5] [9,2] [9,8] [10,7] [11,2] [11,4] [11,6] [10,9] [13,4]

    a^2^3 + b^2^3 : [1,1] [2,1] [4,1] [6,5] [10,3] [12,7] [13,2] [13,8] [14,3] [15,8] [16,9] [16,13] [17,4] [19,4] [18,17] [20,7] [20,17] [21,2] [21,8] [22,3] [22,5]

    a^2^4 + b^2^4 : [1,1] [2,1] [4,3] [6,5] [7,6] [8,7] [12,5] [13,8] [13,10] [14,5] [15,14] [16,11] [17,10] [17,14] [18,7] [18,11] [19,16] [21,8] [21,10] [22,9] [22,15]

    a^2^5 + b^2^5 : [1,1] [9,8] [11,10] [13,12] [18,13] [19,10] [23,22] [29,2] [29,22] [30,1] [33,4] [34,5] [37,18] [37,30] [37,34] [38,11] [38,29] [38,31] [39,4] [39,32] [40,3]

    a^2^6 + b^2^6 : [1,1] [11,8] [13,6] [16,7] [17,14] [29,12] [32,3] [32,27] [37,2] [38,7] [38,23] [39,4] [41,10] [49,6] [51,14] [52,3] [53,2] [55,4] [55,36] [58,21] [60,47]

    a^2^7 + b^2^7 : [1,1] [27,20] [32,31] [38,37] [44,25] [45,38] [47,10] [47,14] [47,24] [53,10] [54,47] [62,11] [66,65] [68,31] [69,40] [77,38] [78,53] [84,13] [85,6] [87,82] [88,21]

    a^2^8 + b^2^8 : [1,1] [14,5] [14,9] [16,3] [34,25] [38,13] [43,34] [50,17] [52,25] [54,31] [65,54] [68,49] [70,9] [73,44] [76,73] [77,12] [83,26] [83,48] [86,85] [87,86] [88,85]

    a^2^9 + b^2^9 : [1,1] [13,2] [29,22] [35,24] [38,35] [44,3] [46,1] [60,59] [68,61] [89,62] [92,89] [96,17] [99,94] [115,8] [115,18] [116,99] [117,10] [119,54] [124,19] [136,25] [143,68]

    a^2^10 + b^2^10 : [1,1] [47,26] [56,39] [67,28] [68,59] [72,47] [80,43] [82,79] [84,71] [103,32] [104,9] [114,97] [115,6] [119,86] [128,93] [134,49] [144,125] [146,107] [149,122] [157,22] [162,53]

    a^2^11 + b^2^11 : [1,1] [22,3] [43,2] [78,41] [82,9] [101,18] [106,29] [109,18] [150,1] [150,7] [163,78] [188,15] [209,142] [211,190] [236,101] [254,109] [259,76] [263,88] [264,71] [271,2] [281,52]

    a^2^12 + b^2^12 : [1,1] [53,2] [53,48] [122,69] [137,10] [153,40] [155,66] [215,98] [221,198] [228,211] [251,174] [260,77] [269,142] [281,188] [310,169] [311,30] [312,311] [317,74] [330,47]

    a^2^13 + b^2^13 : [1,1] [72,43] ... [257,52]

    Exponents 0 through 11 took 2.5 hours total.  The largest prime in that batch, 281^2048 + 52^2048, has size 16660 bits.

    The short list for exponent 12 is the result of 24 hours of computing.  The largest prime on that line, 330^4096 + 47^4096, has size 34269 bits.

    The first two primes for exponent 13 took 12 hours to find.  72^8192 + 43^8192 has size 50545 bits.  The third listed prime, 257^8192 + 52^8192 (65583 bits), was found when we accidentally started searching at the wrong start point.  There is an unsearched gap indicated by ellipses.

    We also searched for primes of the form (a^2^n + b^2^n)/2.  For exponents greater than 0, this requires both a and b to be odd, a parity combination not possible above.

    (a^2^0 + b^2^0)/2 : [2,2] [3,1] [4,0] [3,3] [4,2] [5,1] [6,0] [5,5] [6,4] [7,3] [8,2] [9,1] [10,0] [7,7] [8,6] [9,5] [10,4] [11,3] [12,2] [13,1] [14,0]

    (a^2^1 + b^2^1)/2 : [2,0] [3,1] [5,1] [5,3] [7,3] [7,5] [9,1] [9,5] [11,1] [11,5] [13,3] [13,5] [11,9] [13,7] [15,1] [15,7] [17,3] [17,5] [15,11] [19,1] [19,5]

    (a^2^2 + b^2^2)/2 : [3,1] [5,1] [5,3] [7,1] [9,5] [9,7] [11,1] [11,7] [11,9] [13,1] [13,3] [13,5] [13,11] [15,7] [15,11] [17,1] [17,3] [17,5] [17,7] [17,11] [17,13]

    (a^2^3 + b^2^3)/2 : [5,3] [9,1] [11,3] [13,1] [13,9] [17,3] [19,17] [23,3] [25,19] [27,7] [27,11] [29,5] [29,17] [29,23] [29,27] [31,5] [31,7] [31,11] [31,27] [31,29] [33,1]

    (a^2^4 + b^2^4)/2 : [3,1] [7,5] [9,1] [11,7] [13,5] [15,13] [17,3] [19,3] [23,5] [23,19] [25,11] [25,13] [27,5] [27,25] [29,1] [31,23] [31,25] [31,27] [35,9] [35,13] [35,31]

    (a^2^5 + b^2^5)/2 : [3,1] [9,1] [11,7] [21,1] [21,19] [25,7] [31,11] [33,13] [33,29] [35,29] [39,35] [41,7] [43,3] [43,19] [47,37] [49,3] [49,37] [51,25] [53,3] [53,23] [53,29]

    (a^2^6 + b^2^6)/2 : [3,1] [19,11] [33,19] [35,1] [41,17] [41,37] [43,29] [51,1] [51,49] [55,27] [59,51] [61,19] [65,17] [71,23] [75,23] [75,37] [81,67] [83,61] [85,1] [89,49] [91,81]

    (a^2^7 + b^2^7)/2 : [49,9] [51,7] [59,23] [67,11] [67,35] [69,43] [69,53] [71,39] [73,11] [87,37] [89,17] [91,85] [99,61] [113,1] [113,15] [121,89] [121,113] [125,3] [127,27] [127,115] [131,37]

    (a^2^8 + b^2^8)/2 : [7,3] [21,5] [37,11] [37,17] [45,23] [51,23] [57,35] [61,53] [75,37] [75,43] [81,59] [89,63] [95,31] [101,83] [103,11] [107,63] [111,35] [111,89] [115,61] [121,7] [121,13]

    (a^2^9 + b^2^9)/2 : [35,9] [41,17] [51,13] [67,15] [81,37] [83,37] [89,83] [101,3] [113,91] [115,79] [123,47] [123,85] [127,51] [127,107] [131,51] [135,79] [137,31] [149,87] [155,39] [155,43] [159,13]

    (a^2^10 + b^2^10)/2 : [67,57] [77,15] [79,7] [93,85] [95,61] [117,29] [151,33] [181,71] [181,155] [183,37] [185,147] [191,111] [193,11] [199,55] [211,29] [211,113] [215,21] [223,73] [223,83] [229,93] [229,185]

    (a^2^11 + b^2^11)/2 : [75,49] [109,69] [109,81] [167,75] [193,155] [195,41] [227,53] [249,107] [259,223] [275,39] [281,107] [299,35] [311,117] [333,287] [335,239] [349,259] [351,239] [353,125] [409,357] [431,39] [431,167]

    This list took 3.5 hours.  The largest prime has size 17923 bits.

    Haskell source code here.  We use Data.List.Ordered.mergeBy to merge infinite lists to generate numbers in order for primality testing.  The code is generalized to handle an arbitrary number of terms, a^2^n + b^2^n + c^2^n + ..., though we only investigated 2 terms.  The tricky bit of code, in the cchildren function, avoids generating the same number multiple times.  This saves an exponential amount of work.
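    As a rough illustration of what is being enumerated, here is a brute-force sketch for tiny inputs. It is not the linked source code: it does not use Data.List.Ordered.mergeBy or infinite lists, it tests primality by trial division, and the names isPrime, gfValue and gfPrimesBelow are invented for this example. It lists every pair [a,b] with a >= b >= 0 whose value a^2^n + b^2^n is prime and smaller than (maxA+1)^2^n, in ascending order of value; any pair with a > maxA has a value of at least (maxA+1)^2^n, so nothing below that bound is missed.

```haskell
import Data.List (sort)

-- Trial-division primality test; only suitable for small numbers.
isPrime :: Integer -> Bool
isPrime n = n > 1 && all (\d -> n `mod` d /= 0) (takeWhile (\d -> d * d <= n) [2 ..])

-- a^(2^n) + b^(2^n)
gfValue :: Int -> Integer -> Integer -> Integer
gfValue n a b = a ^ k + b ^ k
  where
    k = 2 ^ n :: Integer

-- All pairs (a, b) with maxA >= a >= b >= 0 whose value is prime and
-- strictly below (maxA + 1)^(2^n), ordered by value (ties broken by a).
gfPrimesBelow :: Int -> Integer -> [(Integer, Integer)]
gfPrimesBelow n maxA = [(a, b) | (v, a, b) <- sort candidates, isPrime v]
  where
    limit = (maxA + 1) ^ (2 ^ n :: Integer)
    candidates =
      [ (v, a, b)
      | a <- [1 .. maxA]
      , b <- [0 .. a]
      , let v = gfValue n a b
      , v < limit
      ]
```

    For example, gfPrimesBelow 1 5 yields [(1,1),(2,1),(3,2),(4,1),(5,2)], which matches the start of the a^2^1 + b^2^1 line above. This approach is hopeless at the sizes reported in this post, where each candidate has thousands of bits.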

    Update (2020-06-28): minor wordsmithing.

    by Unknown at June 28, 2020 07:22 AM

    June 27, 2020

    Chris Smith 2

    Using client-side Haskell web frameworks in CodeWorld

    I’ve made another change to make the CodeWorld environment more useful for general Haskell programming.


    Here is the calculator from

    Here is the calculator running inside CodeWorld:


    You can also use Miso from CodeWorld:


    You can even build your own HTML with ghcjs-dom, jsaddle, or even just the JavaScript FFI. For example, here’s how you can incorporate the diagrams library into CodeWorld:


    I’d love to add anything else that’s useful. The framework must be usable in an otherwise blank page, so frameworks that expect lots of preexisting HTML, CSS, images, etc. won’t work out so well. If you would like to use CodeWorld with a different client-side web framework or library that meets this description, just leave a comment and let me know, or send a GitHub pull request to modify this file and add it.


    Setting up a development environment with GHCJS can be a real pain. For serious work, nix is emerging as the standard tooling, but nix is the most complex build tool available for Haskell. Building GHCJS with cabal or stack is possible, but the setup takes hours. Entire tooling projects (see Obelisk) have been born out of the goal of helping make it easier to get up and running with a GHCJS client-side web project.

    But what if you’re visiting a friend and want to show them what Haskell can do? They don’t have GHC installed, and you don’t want to start your pitch by walking them through an hour of software setup. Or maybe you’re teaching a survey class on different applications of functional programming. Maybe you want to post something to Twitter, or your blog, and let readers tinker and experiment with it easily?

    CodeWorld is the answer. Just write code, click Run, and voila! Want to share your code? Just copy and paste the URL and the people you share it with can see, edit, and run their modified versions of your code. Zero installation, no down time.


    This was actually a pretty easy change to make, once I realized I should do it. It used to be that the CodeWorld environment ran applications in an HTML page with a giant Canvas, and the codeworld-api package looked up that canvas and drew onto it. After this change, the CodeWorld environment runs applications in a document with an empty body. Any client-side framework, whether it’s codeworld-api, reflex-dom, etc., can add its own elements to the document. The codeworld-api package now dynamically adds its canvas to the document before drawing to it.

    There were a few more details to work out. I don’t want the output page to pop up when you’re only running console-mode code in the CodeWorld environment, so there’s a trick with MutationObserver to detect when the running app modifies its own document, and change the parent UI to show it if so.

    Implementation details are documented at, if you’re excessively curious.

    by Chris Smith at June 27, 2020 01:23 AM

    Thanks, Paul.

    Thanks, Paul. Glad you've found it useful. Good news: I added smallcheck to the available package list.

    by Chris Smith at June 27, 2020 12:33 AM

    June 26, 2020

    Chris Smith 2

    Teaching quadratic functions

    Teaching quadratic expressions

    Quadratic expressions are the first category of non-linear expressions that algebra students typically study in detail. How do you motivate quadratics as an interesting object of study?

    There are a few answers to this question. For example:

    • Khan Academy starts with parabolas. After a hint that the shape of a parabola has to do with ballistics and the path of a thrown object, they go on to describe the shape of the graph in detail, including the vertex, the axis of symmetry, and the x and y intercepts.
    • EngageNY starts with algebraic techniques. Specifically, it is concerned with distributing and factoring. Engage goes out of its way to avoid giving any early significance to quadratics, instead focusing on the more general category of polynomials, and solving problems with polynomials using factoring. Quadratics are presented as a special case, and an associated set of tricks, for that general problem.

    I have always liked a third option, which I’m writing about here. Quadratics can be taught as a natural generalization of linear expressions, keeping the focus squarely on what these expressions mean.

    Reviewing linear expressions

    To follow this line of reasoning, students will need a previous understanding of linear expressions. I’ll keep this brief, but the building blocks they need most are here.

    First, they must understand that an expression represents a number. The specific number that it represents, though, may change depending on the values of the variables used in that expression.

    I find students often get confused about the meaning of what they are doing as soon as x and y coordinates get involved. Suddenly they think the meaning of a linear expression is a line. (It’s the other way around: a line is one of several ways to represent a linear relationship!) This confusion is reinforced by terms like slope and especially y-intercept, which talk about the graph instead of the relationship between quantities. This has consequences: students who learn this way can answer questions about graphs, but don’t transfer that understanding to other changing values.

    For this reason, I prefer to leave x and y out of it, and talk in terms of t, which can represent either time or just a generic parameter, instead. An expression like mt + b represents a number, m represents the rate of change of that number (relative to change in t), and b represents the starting value of that number (when t = 0). That m is also the slope of a graph, and b the y-intercept of the graph, is a secondary meaning.

    Students should also understand that the defining characteristic of a linear expression is that it changes at a constant rate. (Specifically, that rate is m.)

    Generalizing to non-linear functions

    To reach non-linear functions, one simply changes the assumption of the constant rate of change, instead using another linear expression for the rate of change.

    The resulting expression looks like: (m t + v) t + b. The rate of change is now the entire linear expression: m t + v. Now students can dig into the rate of change, and they will see that it has its own initial value and its own rate of change. (There’s one important caveat, though, discussed in the next paragraph.) There’s also just one quick application of the distributive property between this form and the more popular m t² + v t + b. But this time, the meanings of the coefficients are front and center.

    Here’s the caveat: m does not represent the acceleration, or change in the instantaneous rate. Instead, it represents the change in the average rate so far. A bit of guided exploration can clarify that this must be the case: to decide how far something has traveled, you need to know its average speed, not its speed right this moment. The starting rate is v. The rate at time t is a t + v (where a is the acceleration). That means the average rate so far is the sum of these, divided by 2, or 0.5 a t + v, so m is only half of the instantaneous acceleration.
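    Spelled out as a short derivation, assuming a constant acceleration a (so that the rate is itself linear in t, and its average over an interval is the average of its two endpoint values):

```latex
\begin{align*}
\text{rate at time } t &= a t + v \\
\text{average rate over } [0, t] &= \frac{v + (a t + v)}{2} = \tfrac{1}{2} a t + v \\
\text{value at time } t &= \left(\tfrac{1}{2} a t + v\right) t + b
  = \tfrac{1}{2} a t^2 + v t + b
\end{align*}
```

    Comparing the last line with (m t + v) t + b gives m = a/2.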

    If a vehicle is accelerating, the distance it travels is related to its average speed.

    The upshot of this is that if students accept that a linear expression is the simplest kind of smooth change, then a quadratic expression is the simplest kind of smooth non-linear change!

    What next?

    Of course, once the importance of quadratic expressions is established, it’s still important to talk about the parabolas that appear in their graphs. It’s still important to talk about techniques for solving them. It’s important to talk about situations, such as trajectories of thrown objects, where they come up a lot. But as you do this, students will hopefully understand this as talking about an idea that has a fundamentally important meaning. They aren’t studying quadratics because parabolas are an interesting shape or because the quadratic formula is so cool; they are studying them because once you need to talk about non-linear change, quadratics are the simplest model for doing that.

    If you are the whimsical sort, though, you might notice one more connection. The choice of inserting a linear expression for rate of change was the simplest option, but ultimately arbitrary. In fact, any continuous non-linear function can be written as f(t) = a(t) t + b, if a(t) is a function giving the average rate of change of f(t) over the range from t = 0 to its current value. How could one find such a value for a(t)?

    In calculus, students will learn that the derivative of a function gives the instantaneous rate of change of a function at any point in time. We want a(t), then, to represent the average value of that derivative. That integrals are so closely related to average values of a function over a range of input is less well-understood by early calculus students. Again, it’s more popular to present these ideas in terminology about areas, which confuses the graph representation with the fundamental meaning. But in fact, a(t) t is precisely the integral (specifically: the definite integral evaluated from 0 to t) of the derivative. A constant factor is lost by taking the derivative, so b recovers that detail. Everything fits nicely together.
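    In symbols, and assuming f is differentiable with an integrable derivative (a stronger condition than the continuity mentioned earlier), the relationship reads:

```latex
\[
a(t) = \frac{f(t) - f(0)}{t} = \frac{1}{t} \int_0^t f'(s)\, ds \quad (t \neq 0),
\qquad b = f(0),
\qquad\text{so}\qquad
f(t) = a(t)\, t + b .
\]
```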

    You wouldn’t get into this with an algebra class, but it’s an interesting follow-on.

    by Chris Smith at June 26, 2020 09:41 PM

    Don Stewart (dons)

    Writing code again…

    For the past 3 years I’ve been managing the Testing + Verification org at Facebook London, a group of teams that build bug finding tools, focusing on test infrastructure, dynamic analysis, automated test generation, type systems and static analysis.

    Now, I’m switching track to a software engineering role at Facebook, in the same organization. I’m interested in bug prevention strategies when you have a very large scale, highly automated CI system. Given all the great tools we have to find bugs, how do you best integrate them into development processes, particularly when there are thousands of engineers in the loop?

    And, in case you’re wondering, I’ll be hacking in some mix of Rust, Hack, maybe a bit of Haskell or OCaml, probably some Python in the mix, and some ML + data viz/data analysis.

    by Don Stewart at June 26, 2020 10:23 AM

    June 25, 2020

    Philip Wadler

    Coronavirus: Why You Must Act Now

    Unclear on what is happening with Coronavirus or what you should do about it? Tomas Pueyo presents a stunning analysis with lots of charts, a computer model you can use, and some clear and evidence-based conclusions. Please read it and do as he says!

    by Philip Wadler at June 25, 2020 10:46 AM

    Kim Stanley Robinson: The Coronavirus Is Rewriting Our Imaginations

    One of the more thoughtful and hopeful analyses I've seen, from sf writer Kim Stanley Robinson in The New Yorker.
    The critic Raymond Williams once wrote that every historical period has its own “structure of feeling.” How everything seemed in the nineteen-sixties, the way the Victorians understood one another, the chivalry of the Middle Ages, the world view of Tang-dynasty China: each period, Williams thought, had a distinct way of organizing basic human emotions into an overarching cultural system. Each had its own way of experiencing being alive.
    In mid-March, in a prior age, I spent a week rafting down the Grand Canyon. When I left for the trip, the United States was still beginning to grapple with the reality of the coronavirus pandemic. Italy was suffering; the N.B.A. had just suspended its season; Tom Hanks had been reported ill. When I hiked back up, on March 19th, it was into a different world. I’ve spent my life writing science-fiction novels that try to convey some of the strangeness of the future. But I was still shocked by how much had changed, and how quickly.
    Schools and borders had closed; the governor of California, like governors elsewhere, had asked residents to begin staying at home. But the change that struck me seemed more abstract and internal. It was a change in the way we were looking at things, and it is still ongoing. The virus is rewriting our imaginations. What felt impossible has become thinkable. We’re getting a different sense of our place in history. We know we’re entering a new world, a new era. We seem to be learning our way into a new structure of feeling.

    by Philip Wadler at June 25, 2020 10:44 AM


    ZuriHac 2020 Advanced Track Materials

    We had a lot of fun at ZuriHac this year, and are very grateful for the many attendees of our two Advanced Track lectures on “Datatype-generic programming” and “Haskell and Infosec”.

    Thanks to the organisers for a wonderful event!

    The videos of our sessions as well as the materials used in them are available online, so those who could not attend during the event itself can still do so by following the links below.

    Datatype-Generic Programming (by Andres Löh)

    Watch video on YouTube | Read lecture notes on GitHub | Repository with source code

    Datatype-Generic programming is a powerful tool that allows the implementation of functions that adapt themselves to a large class of datatypes and can be made available on new datatypes easily by means such as “deriving”.

    In this workshop, we focus on the ideas at the core of two popular approaches to generic programming: GHC.Generics and generics-sop. Both can be difficult to understand at first. We build simpler versions of both approaches to illustrate some of the design decisions taken. This exploration will lead to a better understanding of the trade-offs and ultimately also make using these libraries easier.

    This presentation involves various type-level programming concepts such as type families, data kinds, GADTs and higher-rank types. It’s not a requirement to be an expert in using these features, but I do not focus on explaining them in detail either; so having some basic familiarity with the syntax and semantics is helpful.
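    For readers who have not seen the style before, here is a minimal sketch of a datatype-generic function written against GHC.Generics. It is not taken from the workshop materials; the GCount class, countFields and the Shape type are invented for illustration. The function counts the fields of whichever constructor a value was built with, and works for any type that derives Generic.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeOperators #-}

import GHC.Generics

-- One instance per building block of the generic representation.
class GCount f where
  gcount :: f p -> Int

-- A constructor without fields.
instance GCount U1 where
  gcount _ = 0

-- A single field.
instance GCount (K1 i c) where
  gcount _ = 1

-- Several fields side by side: add up both halves.
instance (GCount f, GCount g) => GCount (f :*: g) where
  gcount (x :*: y) = gcount x + gcount y

-- A choice between constructors: follow the one that was used.
instance (GCount f, GCount g) => GCount (f :+: g) where
  gcount (L1 x) = gcount x
  gcount (R1 y) = gcount y

-- Metadata wrappers (datatype, constructor, selector): look through them.
instance GCount f => GCount (M1 i c f) where
  gcount (M1 x) = gcount x

-- The user-facing function: convert to the representation, then count.
countFields :: (Generic a, GCount (Rep a)) => a -> Int
countFields = gcount . from

-- Any type becomes supported simply by deriving Generic.
data Shape = Circle Double | Rect Double Double | Origin
  deriving (Generic)
```

    With these definitions, countFields (Rect 1 2) is 2, countFields (Circle 1) is 1, and countFields Origin is 0.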

    Haskell and Infosec (by Tobias Dammers)

    Watch video on YouTube | View slides on Google Drive

    In this workshop, we look at Haskell from an information security point of view. How do security vulnerabilities such as SQL Injection (SQLi) or Cross-Site Scripting (XSS) work? How does a hacker exploit them? What can we, as Haskell programmers, do to prevent them, and how can Haskell help us with that? And what principles can we extract from this to develop more security-aware coding habits?

    No knowledge of or experience with information security is required for this course, but participants are expected to have a working knowledge of practical Haskell. If you can write a simple web application with, e.g., Scotty, you should be fine.

    Other ZuriHac videos

    Of course, there was more to ZuriHac than just the Advanced Track. If you haven’t yet, you might want to have a look at the YouTube playlist, which also contains all the keynotes as well as the lectures of the GHC track.

    Well-Typed courses and services

    If you are interested in our courses or other services, check our Training page, Services page, or just send us an email.

    by andres at June 25, 2020 12:00 AM

    Tweag I/O

    Nix Flakes, Part 2: Evaluation caching

    How Nix flakes enable caching of evaluation results of Nix expressions.

    June 25, 2020 12:00 AM

    June 24, 2020

    Brent Yorgey

    Competitive programming in Haskell: vectors and 2D geometry

    In my previous post (apologies it has been so long!) I challenged you to solve Vacuumba, which asks us to figure out where a robot ends up after following a sequence of instructions. Mathematically, this corresponds to adding up a bunch of vectors, but the interesting point is that the instructions are always relative to the robot’s current state, so robot programs are imperative programs.

    Vector basics

    The first order of business is to code up some primitives for dealing with (2D) vectors. I have accumulated a lot of library code for doing geometric stuff, but it’s kind of a mess; I’m using this as an opportunity to clean it up bit by bit. So there won’t be much code at first, but the library will grow as we do more geometry problems. The code so far (explained below) can be found in the comprog-hs repository.

    First, a basic representation for 2D vectors, the zero vector, and addition and subtraction of vectors.

    {-# LANGUAGE GeneralizedNewtypeDeriving #-}
    module Geom where
    -- 2D points and vectors
    data V2 s = V2 !s !s deriving (Eq, Ord, Show)
    type V2D  = V2 Double
    instance Foldable V2 where
      foldMap f (V2 x y) = f x <> f y
    zero :: Num s => V2 s
    zero = V2 0 0
    -- Adding and subtracting vectors
    (^+^), (^-^) :: Num s => V2 s -> V2 s -> V2 s
    V2 x1 y1 ^+^ V2 x2 y2 = V2 (x1+x2) (y1+y2)
    V2 x1 y1 ^-^ V2 x2 y2 = V2 (x1-x2) (y1-y2)

    A few things to point out:

    • The V2 type is parameterized over the type of scalars, but we define V2D as a synonym for V2 Double, which is very common. The reason for making V2 polymorphic in the first place, though, is that some problems require the use of exact integer arithmetic. It’s nice to be able to share code where we can, and have the type system enforce what we can and can’t do with vectors over various scalar types.

    • For a long time I just represented vectors as lists, type V2 s = [s]. This makes implementing addition and subtraction very convenient: for example, (^+^) = zipWith (+). Although this has worked just fine for solving many geometry problems, I have recently been reminded that having lots of small lists can be bad for performance. As long as we’re making a library anyway we might as well use a proper data type for vectors!

    • Elsewhere I have made a big deal out of the fact that vectors and points ought to be represented as separate types. But in a competitive programming context I have always just used a single type for both and it hasn’t bit me (yet!).

    • The Foldable instance for V2 gets us toList. It also gets us things like sum and maximum which could occasionally come in handy.
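
    As a small illustration of the points above, the following sketch repeats the V2 definitions so that it stands alone and exercises them at two scalar types. The main function and the sample values are assumptions made for this example, not part of the library.

    ```haskell
    import Data.Foldable (toList)

    -- Same representation as above: a strict pair of scalars.
    data V2 s = V2 !s !s deriving (Eq, Ord, Show)

    instance Foldable V2 where
      foldMap f (V2 x y) = f x <> f y

    (^+^) :: Num s => V2 s -> V2 s -> V2 s
    V2 x1 y1 ^+^ V2 x2 y2 = V2 (x1 + x2) (y1 + y2)

    main :: IO ()
    main = do
      let v = V2 3 4 :: V2 Integer      -- exact integer scalars
          w = V2 0.5 1.5 :: V2 Double   -- floating-point scalars
      print (v ^+^ V2 1 1)  -- V2 4 5
      print (toList w)      -- [0.5,1.5]
      print (sum v)         -- 7
      print (maximum w)     -- 1.5
    ```

    The same (^+^) works for both scalar types, and toList, sum and maximum all come from the Foldable instance.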

    Angles and rotation

    The other thing we are going to need for this problem is angles.

    -- Angles
    newtype Angle = A Double  -- angle (radians)
      deriving (Show, Eq, Ord, Num, Fractional, Floating)
    fromDeg :: Double -> Angle
    fromDeg d = A (d * pi / 180)
    fromRad :: Double -> Angle
    fromRad = A
    toDeg :: Angle -> Double
    toDeg (A r) = r * 180 / pi
    toRad :: Angle -> Double
    toRad (A r) = r
    -- Construct a vector in polar coordinates.
    fromPolar :: Double -> Angle -> V2D
    fromPolar r θ = rot θ (V2 r 0)
    -- Rotate a vector counterclockwise by a given angle.
    rot :: Angle -> V2D -> V2D
    rot (A θ) (V2 x y) = V2 (cos θ * x - sin θ * y) (sin θ * x + cos θ * y)

    Nothing too complicated going on here: we have a type to represent angles, conversions to and from degrees and radians, and then two uses for angles: a function to construct a vector in polar coordinates, and a function to perform rotation.

    Incidentally, one could of course define type Angle = Double, which would be simpler in some ways, but after getting bitten several times by forgetting to convert from degrees to radians, I decided it was much better to use a newtype and entirely prevent that class of error.
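
    To make the rotation concrete, here is a minimal standalone sketch that rotates the unit x vector by a quarter turn. The approxEq helper is hypothetical (it is not part of the library) and only exists because floating-point results are not exact; the Angle newtype here also skips the derived numeric instances to keep the example short.

    ```haskell
    data V2 s = V2 !s !s deriving (Eq, Ord, Show)
    type V2D = V2 Double

    newtype Angle = A Double deriving (Show, Eq, Ord)

    fromDeg :: Double -> Angle
    fromDeg d = A (d * pi / 180)

    -- Rotate a vector counterclockwise by a given angle.
    rot :: Angle -> V2D -> V2D
    rot (A t) (V2 x y) = V2 (cos t * x - sin t * y) (sin t * x + cos t * y)

    -- Hypothetical helper: compare vectors up to a small tolerance.
    approxEq :: V2D -> V2D -> Bool
    approxEq (V2 x1 y1) (V2 x2 y2) = abs (x1 - x2) < 1e-9 && abs (y1 - y2) < 1e-9

    main :: IO ()
    main = do
      -- A 90 degree counterclockwise turn takes the x axis to the y axis.
      print (rot (fromDeg 90) (V2 1 0) `approxEq` V2 0 1)
      -- A 180 degree turn negates the vector.
      print (rot (fromDeg 180) (V2 1 0) `approxEq` V2 (-1) 0)
    ```

    Because rot takes an Angle, a Double variable holding degrees cannot be passed to it without going through fromDeg first.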

    Solving Vacuumba

    Now we just put the pieces together to solve the problem. First, some imports:

    {-# LANGUAGE FlexibleContexts #-}
    {-# LANGUAGE RecordWildCards  #-}
    import           Control.Arrow
    import           Control.Monad.State
    import qualified Data.Foldable       as F
    import           Text.Printf
    import           Geom
    import           Scanner

    We make a data type for representing robot instructions, and a corresponding Scanner. Notice how we are forced to use fromDeg to convert the raw input into an appropriate type.

    data Instr = I { turn :: Angle, dist :: Double }
    instr :: Scanner Instr
    instr = I <$> (fromDeg <$> double) <*> double

    The high-level solution then reads the input via a Scanner, solves each scenario, and formats the output. The output is a V2D, so we just convert it to a list with F.toList and use printf to format each coordinate.

    main = interact $
      runScanner (numberOf (numberOf instr)) >>>
      map (solve >>> F.toList >>> map (printf "%.6f") >>> unwords) >>> unlines

    Our solve function needs to take a list of instructions, and output the final location of the robot. Since the instructions can be seen as an imperative program for updating the state of the robot, it’s entirely appropriate to use a localized State computation.

    First, a data type to represent the robot’s current state, consisting of a 2D vector recording the position, and an angle to record the current heading. initRS records the robot’s initial state (noting that it starts out facing north, corresponding to an angle of 90° as measured counterclockwise from the positive x-axis).

    data RobotState = RS { pos :: V2D, heading :: Angle }
    initRS = RS zero (fromDeg 90)

    Finally, the solve function itself executes each instruction in sequence as a State RobotState computation, uses execState to run the resulting overall computation and extract the final state, and then projects out the robot’s final position. Executing a single instruction is where the geometry happens: we look up the current robot state, calculate its new heading by adding the turn angle to the current heading, construct a movement vector in the direction of the new heading using polar coordinates, and add the movement to the current position.

    solve :: [Instr] -> V2D
    solve = mapM_ exec >>> flip execState initRS >>> pos
      where
        exec :: Instr -> State RobotState ()
        exec (I θ d) = do
          RS{..} <- get
          let heading' = heading + θ
              move     = fromPolar d heading'
          put $ RS (pos ^+^ move) heading'
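
    For comparison, the same computation can be sketched as a plain left fold instead of a State computation. This is a swapped-in alternative, not the approach used above: step and solveFold are names invented for this example, and the types are repeated (with the polar-coordinate arithmetic inlined) so the block stands alone.

    ```haskell
    import Data.List (foldl')

    data V2 s = V2 !s !s deriving (Eq, Show)
    type V2D = V2 Double

    newtype Angle = A Double deriving (Eq, Show)

    fromDeg :: Double -> Angle
    fromDeg d = A (d * pi / 180)

    data Instr = I { turn :: Angle, dist :: Double }
    data RobotState = RS { pos :: V2D, heading :: Angle }

    -- One instruction: turn first, then move along the new heading.
    step :: RobotState -> Instr -> RobotState
    step (RS (V2 x y) (A h)) (I (A t) d) =
      RS (V2 (x + d * cos h') (y + d * sin h')) (A h')
      where
        h' = h + t

    -- Fold the instructions over the initial state (origin, facing north).
    solveFold :: [Instr] -> V2D
    solveFold = pos . foldl' step (RS (V2 0 0) (fromDeg 90))

    main :: IO ()
    main = do
      -- Move 1 unit north, turn 90 degrees counterclockwise (now facing
      -- west), then move 2 units: the robot ends up near (-2, 1).
      let V2 x y = solveFold [I (fromDeg 0) 1, I (fromDeg 90) 2]
      print (abs (x + 2) < 1e-9, abs (y - 1) < 1e-9)
    ```

    The fold makes the state threading explicit, while the State version reads more like the imperative program the instructions describe.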

    For next time

    We’ll definitely be doing more geometry, but for the next post I feel like doing something different. I invite you to solve Checking Break.

    by Brent at June 24, 2020 08:21 PM

    Chris Smith 2

    Toy Machine Learning with Haskell

    In this post, I show how you can use Haskell and the ad package (automatic differentiation) to build a toy machine learning model in Haskell. I’ve tried to write enough that someone without a machine learning background can follow the code. If you do have an ML background, just skim those parts!

    I’m going to try something different. Instead of writing in Medium, I’ve written up this post in comments inside of CodeWorld.

    Part 1: Model Structure

    In this part, I explain what a machine learning model is, define the Haskell types for the parts of the model, and write the code to make the model work. We end with a visualization of a pre-trained model to recognize which points are in a circle.


    Part 2: Training

    In this part, I show how to use automatic differentiation to train the model that we defined in the previous section.


    by Chris Smith at June 24, 2020 04:20 PM

    June 22, 2020

    Neil Mitchell

    The HLint Match Engine

    Summary: HLint has a match engine which powers most of the rules.

    The Haskell linter HLint has two forms of lint: some are built in, written as Haskell code over the GHC AST (e.g. unused extension detection), but 700+ hints are written using a matching engine. As an example, we can replace map f (map g xs) with map (f . g) xs. Doing so might be more efficient, but importantly for HLint, it's often clearer. That rule is defined in HLint as:

    - hint: {lhs: map f (map g x), rhs: map (f . g) x}

    All single-letter variables are wildcard matches, so the above rule will match:

    map isDigit (map toUpper "test")

    And suggest:

    map (isDigit . toUpper) "test"

    However, Haskell programmers are uniquely creative in specifying functions - with a huge variety of $ and . operators, infix operators etc. The HLint matching engine in HLint v3.1.4 would match this rule to all of the following (I'm using sort as a convenient function, replacing it with foo below would not change any matches):

    • map f . map g
    • sort . map f . map g . sort
    • concatMap (map f . map g)
    • map f (map (g xs) xs)
    • f `map` (g `map` xs)
    • map f $ map g xs
    • map f (map g $ xs)
    • map f (map (\x -> g x) xs)
    • f ( g xs)
    • map f ((sort . map g) xs)

    That's a large variety of ways to write a nested map. In this post I'll explain how HLint matches everything above, and describe the bug, fixed in HLint v3.1.5, that used to cause it to match even the final line (which isn't a legitimate match).


    Given a hint comprising lhs and rhs, the first thing HLint does is determine if it can eta-contract the hint, producing a version without the final argument. If it can do so for both sides, it generates a completely fresh hint. In the case of map f (map g x) it generates:

    - hint: {lhs: map f . map g, rhs: map (f . g)}

    For the examples above, the first three match with this eta-contracted version, and the rest match with the original form. Now that we've generated two hints, it's important that our matching isn't so fuzzy that both match the same expression, as that would generate twice as many warnings as appropriate.

    Root matching

    The next step is root matching, which happens only when trying to match at the root of some match. If we have (foo . bar) x then it would be reasonable for that to match bar x, despite the fact that bar x is not a subexpression. We overcome that by transforming the expression to foo (bar x), unifying only on bar x, and recording that we need to add back foo . at the start of the replacement.

    Expression matching

    After splitting off any extra prefix, HLint tries to unify the single-letter variables with expressions, and build a substitution table with type Maybe [(String, Expr)]. The substitution is Nothing to denote the expressions are incompatible, or Just a mapping of variables to the expression they matched. If two expressions have the same structure, we descend into all child terms and match further. If they don't have the same structure, but are similar in a number of ways, we adjust the source expression and continue.

    Examples of adjustments include expanding out $, removing infix application such as f `map` x and ignoring redundant brackets. We translate (f . g) x to f (g x), but not at the root - otherwise we might match both the eta-expanded and non-eta-expanded variants. We also re-associate (.) where needed, e.g. for expressions like sort . map f . map g . sort the bracketing means we have sort . (map f . (map g . sort)). We can see that map f . map g is not a subexpression of that expression, but given that . is associative, we can adjust the source.

    When we get down to a terminal name like map, we use the scope information HLint knows to determine if the two map's are equivalent. I'm not going to talk about that too much, as it's slated to be rewritten in a future version of HLint, and is currently both slow and a bit approximate.

    Substitution validity

    Once we have a substitution, we see if there are any variables which map to multiple distinct expressions. If so, the substitution is invalid, and we don't match. However, in our example above, there are no duplicate variables so any matching substitution must be valid.
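
    The unification and validity steps can be sketched on a toy expression type. Everything here is an assumption made for illustration: HLint works on the GHC AST and performs the adjustments described above, whereas this Expr type, unify, valid and match only cover plain variables and application.

    ```haskell
    import Data.Char (isLower)
    import Data.List (nub)

    -- Hypothetical miniature expression type (HLint itself uses the GHC AST).
    data Expr
      = Var String
      | App Expr Expr
      deriving (Eq, Show)

    -- Single-letter lowercase names in the pattern act as wildcards.
    isWildcard :: String -> Bool
    isWildcard [c] = isLower c
    isWildcard _   = False

    -- Structural unification of a pattern against an expression. Nothing means
    -- the two are incompatible; Just lists every (variable, expression) binding.
    unify :: Expr -> Expr -> Maybe [(String, Expr)]
    unify (Var v) e | isWildcard v = Just [(v, e)]
    unify (Var a) (Var b) | a == b = Just []
    unify (App f x) (App g y) = (++) <$> unify f g <*> unify x y
    unify _ _ = Nothing

    -- A substitution is valid if no variable maps to two distinct expressions.
    valid :: [(String, Expr)] -> Bool
    valid sub = all single (nub (map fst sub))
      where
        single v = length (nub [e | (v', e) <- sub, v' == v]) == 1

    match :: Expr -> Expr -> Maybe [(String, Expr)]
    match pat e = do
      sub <- unify pat e
      if valid sub then Just (nub sub) else Nothing

    -- Helper for building a two-argument application.
    app2 :: Expr -> Expr -> Expr -> Expr
    app2 f a b = App (App f a) b

    main :: IO ()
    main = do
      let hint   = app2 (Var "map") (Var "f") (app2 (Var "map") (Var "g") (Var "x"))
          target = app2 (Var "map") (Var "isDigit")
                        (app2 (Var "map") (Var "toUpper") (Var "str"))
      -- f, g and x each bind exactly one expression, so this matches.
      print (match hint target)
      -- Here x would have to be both a and b, so the substitution is invalid.
      print (match (app2 (Var "foo") (Var "x") (Var "x"))
                   (app2 (Var "foo") (Var "a") (Var "b")))
    ```

    The second call shows the duplicate-variable check in action: unification succeeds structurally, but the resulting substitution is rejected.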

    Side conditions

    Next we check any side conditions - e.g. we could decide that the above hint only makes sense if x is atomic - i.e. does not need brackets in any circumstance. We could have expressed that with side: isAtom x, and any such conditions are checked in a fairly straightforward manner.


    Finally, we substitute the variables into the provided replacement. When doing the replacement, we keep track of the free variables, and if the resulting expression has more free variables than it started with, we assume the hint doesn't apply cleanly. As an example, consider the hint \x -> a <$> b x to fmap a . b. It looks like a perfectly reasonable hint, but what if we apply it to the expression \x -> f <$> g x x? Now b matches g x, but we are throwing away the \x binding and x is now dangling, so we reject it.
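
    The free-variable check can be illustrated with a toy lambda-calculus AST. This is a sketch under stated assumptions rather than HLint's implementation: the Expr type, freeVars and dangling are invented for the example, and names introduced by the hint's right-hand side (here fmap and .) are passed in explicitly as allowed.

    ```haskell
    import Data.List (nub)

    -- Hypothetical miniature AST, for illustration only.
    data Expr
      = Var String
      | App Expr Expr
      | Lam String Expr
      deriving (Eq, Show)

    freeVars :: Expr -> [String]
    freeVars (Var v)   = [v]
    freeVars (App f x) = nub (freeVars f ++ freeVars x)
    freeVars (Lam v b) = filter (/= v) (freeVars b)

    -- Variables that are free in the replacement but were neither free in the
    -- original expression nor named by the hint itself.
    dangling :: [String] -> Expr -> Expr -> [String]
    dangling hintNames original replacement =
      filter (`notElem` allowed) (freeVars replacement)
      where
        allowed = hintNames ++ freeVars original

    -- \x -> f <$> g x x
    original :: Expr
    original =
      Lam "x" (App (App (Var "<$>") (Var "f")) (App (App (Var "g") (Var "x")) (Var "x")))

    -- fmap f . g x, obtained with a = f and b = g x
    replacement :: Expr
    replacement =
      App (App (Var ".") (App (Var "fmap") (Var "f"))) (App (Var "g") (Var "x"))

    main :: IO ()
    main = print (dangling ["fmap", "."] original replacement)  -- ["x"]
    ```

    A non-empty result means the replacement refers to a variable whose binder was discarded, so the hint should be rejected for that expression.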

    When performing the substitution, we use knowledge of the AST we want, and the brackets required to parse that expression, to ensure we insert the right brackets, but not too many.

    Bug #1055

    Hopefully all the above sounds quite reasonable. Unfortunately, at some point, the root-matching lost the check that it really was at the root, and started applying the translation to terms such as sort . in map f ((sort . map g) xs). Having generated the sort ., it decided since it wasn't at the root, there was nowhere for it to go, so promptly threw it away. Oops. HLint v3.1.5 fixes the bug in two distinct ways (for defence in depth):

    1. It checks the root boolean before doing the root matching rule.
    2. If it would have to throw away any extra expression, it fails, as throwing away that expression is certain to lead to a correctness bug.


    The matching engine of HLint is relatively complex, but I always assumed that one day it would be replaced with a finite-state-machine scanner that could match n hints against an expression in O(size-of-expression), rather than the current O(n * size-of-expression). However, it's never been the bottleneck, so I've kept with the more direct version.

    I'm glad HLint has a simple external lint format. It allows easy contributions and makes hint authoring accessible to everyone. For large projects it's easy to define your own hints to capture common coding patterns. When using languages whose linter does not have an external matching language (e.g. Rust's Clippy) I certainly miss the easy customization.

    by Neil Mitchell at June 22, 2020 08:22 PM

    Monday Morning Haskell

    Rendering Frozen Lake with Gloss!


    We've spent the last few weeks exploring some of the ideas in the Open AI Gym framework. We made a couple games, generalized them, and applied some machine learning techniques. When it comes to rendering our games though, we're still relying on a very basic command line text format.

    But if we want to design agents for more visually appealing games, we'll need a better solution! Last year, we spent quite a lot of time learning about the Gloss library. This library makes it easy to create simple games and render them using OpenGL. Take a look at this article for a summary of our work there and some links to the basics.

    In this article, we'll explore how we can draw some connections between Gloss and our Open AI Gym work. We'll see how we can take the functions we've already written and use them within Gloss!

    Gloss Basics

    The key entrypoint for a Gloss game is the play function. At its core is the world type parameter, which we'll define for ourselves later.

    play :: Display -> Color -> Int
      -> world
      -> (world -> Picture)
      -> (Event -> world -> world)
      -> (Float -> world -> world)
      -> IO ()

    We won't go into the first three parameters, but the rest are important. Of these, the first is our initial world state. The second is our rendering function, which creates a Picture for the current state. Then comes an "event handler", which takes user input events and updates the world based on the actions. Finally there is the update function, which changes the world based on the passage of time, rather than specific user inputs.

    This structure should sound familiar, because it's a lot like our Open AI environments! The initial world is like the "reset" function. Then both systems have a "render" function. And the update functions are like our stepEnv function.

    The main difference we'll see is that Gloss's functions work in a pure way. Recall our "environment" functions use the "State" monad. Let's explore this some more.

    Re-Writing Environment Functions

    Let's take a look at the basic form of these environment functions, in the Frozen Lake context:

    resetEnv :: (Monad m) => StateT FrozenLakeEnvironment m Observation
    stepEnv :: (Monad m) =>
      Action -> StateT FrozenLakeEnvironment m (Observation, Double, Bool)
    renderEnv :: (MonadIO m) => StateT FrozenLakeEnvironment m ()

    These all use State. This makes it easy to chain them together. But if we look at the implementations, a lot of them don't really need to use State. They tend to unwrap the environment at the start with get, calculate new results, and then have a final put call.

    This means we can rewrite them to fit more within Gloss's pure structure! We'll ignore rendering, since that will be very different. But here are some alternate type signatures:

    resetEnv' :: FrozenLakeEnvironment -> FrozenLakeEnvironment
    stepEnv' :: Action -> FrozenLakeEnvironment
      -> (FrozenLakeEnvironment, Double, Bool)

    We'll exclude Observation as an output, since the environment contains that through currentObservation. The implementation for each of these looks like the original. Here's what resetting looks like:

    resetEnv' :: FrozenLakeEnvironment -> FrozenLakeEnvironment
    resetEnv' fle = fle
      { currentObservation = 0
      , previousAction = Nothing
      }

    Now for stepping our environment forward:

    stepEnv' :: Action -> FrozenLakeEnvironment -> (FrozenLakeEnvironment, Double, Bool)
    stepEnv' act fle = (finalEnv, reward, done)
      where
        currentObs = currentObservation fle
        (slipRoll, gen') = randomR (0.0, 1.0) (randomGenerator fle)
        allLegalMoves = legalMoves currentObs (dimens fle)
        numMoves = length allLegalMoves - 1
        (randomMoveIndex, finalGen) = randomR (0, numMoves) gen'
        newObservation = ... -- Random move, or apply the action
        (done, reward) = case (grid fle) A.! newObservation of
          Goal -> (True, 1.0)
          Hole -> (True, 0.0)
          _ -> (False, 0.0)
        finalEnv = fle
          { currentObservation = newObservation
          , randomGenerator = finalGen
          , previousAction = Just act
          }

    What's even better is that we can now rewrite our original State functions using these!

    resetEnv :: (Monad m) => StateT FrozenLakeEnvironment m Observation
    resetEnv = do
      modify resetEnv'
      gets currentObservation
    stepEnv :: (Monad m) =>
      Action -> StateT FrozenLakeEnvironment m (Observation, Double, Bool)
    stepEnv act = do
      fle <- get
      let (finalEnv, reward, done) = stepEnv' act fle
      put finalEnv
      return (currentObservation finalEnv, reward, done)
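
    The general pattern, a pure step function plus a thin monadic wrapper, can be tried out on a toy environment. Everything in this sketch is hypothetical: Env, Action, stepPure and stepM are stand-ins for the Frozen Lake types, and it uses State from the transformers package rather than the StateT-over-any-monad signatures above.

    ```haskell
    import Control.Monad.Trans.State.Strict (State, execState, state)

    -- Hypothetical toy environment: a position on a line with a goal at 3.
    newtype Env = Env { position :: Int } deriving (Eq, Show)

    data Action = StepLeft | StepRight deriving (Eq, Show)

    -- Pure step, shaped like stepEnv': new environment, reward, done flag.
    stepPure :: Action -> Env -> (Env, Double, Bool)
    stepPure act (Env p) = (Env p', if done then 1.0 else 0.0, done)
      where
        p' = case act of
          StepLeft  -> p - 1
          StepRight -> p + 1
        done = p' == 3

    -- Monadic wrapper: `state` expects a function s -> (a, s), so the tuple
    -- returned by the pure function only needs rearranging.
    stepM :: Action -> State Env (Double, Bool)
    stepM act = state $ \env ->
      let (env', reward, done) = stepPure act env
      in ((reward, done), env')

    main :: IO ()
    main = do
      print (stepPure StepRight (Env 2))
      print (execState (mapM_ stepM [StepRight, StepRight, StepLeft]) (Env 0))
    ```

    The pure function can be handed to a framework like Gloss directly, while the wrapper keeps existing State-based code working.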

    Implementing Gloss

    Now let's see how this ties in with Gloss. It might be tempting to use our Environment as the world type. But it can be useful to attach other information as well. For one example, we can also include the current GameResult, telling us if we've won, lost, or if the game is still going.

    data GameResult =
      GameInProgress |
      GameWon |
      GameLost
      deriving (Show, Eq)
    data World = World
      { environment :: FrozenLakeEnvironment
      , gameResult :: GameResult
      }

    Now we can start building the other pieces of our game. There aren't really any "time" updates in our game, except to update the result based on our location:

    updateWorldTime :: Float -> World -> World
    updateWorldTime _ w = case tile of
      Goal -> World fle GameWon
      Hole -> World fle GameLost
      _ -> w
      where
        fle = environment w
        obs = currentObservation fle
        tile = grid fle A.! obs

    When it comes to handling inputs, we need to start with the case of restarting the game. When the game isn't in the GameInProgress state, only the "enter" button matters. This resets everything, using resetEnv':

    handleInputs :: Event -> World -> World
    handleInputs event w
      | gameResult w /= GameInProgress = case event of
          (EventKey (SpecialKey KeyEnter) Down _ _) ->
            World (resetEnv' fle) GameInProgress
          _ -> w
      where
        fle = environment w

    Now we handle each directional input key. We'll make a helper function at the bottom that does the business of calling stepEnv'.

    handleInputs :: Event -> World -> World
    handleInputs event w
      | gameResult w /= GameInProgress = case event of
          (EventKey (SpecialKey KeyEnter) Down _ _) ->
            World (resetEnv' fle) GameInProgress
          _ -> w
      | otherwise = case event of
          (EventKey (SpecialKey KeyUp) Down _ _) ->
            w {environment = finalEnv MoveUp }
          (EventKey (SpecialKey KeyRight) Down _ _) ->
            w {environment = finalEnv MoveRight }
          (EventKey (SpecialKey KeyDown) Down _ _) ->
            w {environment = finalEnv MoveDown }
          (EventKey (SpecialKey KeyLeft) Down _ _) ->
            w {environment = finalEnv MoveLeft }
          _ -> w
      where
        fle = environment w
        finalEnv action =
          let (fe, _, _) = stepEnv' action fle
          in  fe

    The last step is rendering the environment with a draw function. This just requires a working knowledge of constructing the Picture type in Gloss. It's a little tedious, so I've included the full implementation as an appendix at the bottom. We can then combine all these pieces like so:

    main :: IO ()
    main = do
      env <- basicEnv
      play windowDisplay white 20
        (World env GameInProgress)
        drawEnvironment
        handleInputs
        updateWorldTime

    After we have all these pieces, we can run our game, moving our player around to reach the green tile while avoiding the black tiles!



    With a little more plumbing, it would be possible to combine this with the rest of our "Environment" work. There are some definite challenges. Our current environment setup doesn't have a "time update" function. Combining machine learning with Gloss rendering would also be interesting. This is the end of our Open AI Gym series for now, but I'll definitely be working on this project more in the future! Next week we'll have a summary and review what we've learned!

    Take a look at our Github repository to see all the code we wrote in this series! The code for this article is on the gloss branch. And don't forget to Subscribe to Monday Morning Haskell to get our monthly newsletter!

    Appendix: Rendering Frozen Lake

    A lot of numbers here are hard-coded for a 4x4 grid, where each cell is 100x100. Notice particularly that we have a text message if we've won or lost.

    windowDisplay :: Display
    windowDisplay = InWindow "Window" (400, 400) (10, 10)
    drawEnvironment :: World -> Picture
    drawEnvironment world
      | gameResult world == GameWon = Translate (-150) 0 $ Scale 0.12 0.25
          (Text "You've won! Press enter to restart!")
      | gameResult world == GameLost = Translate (-150) 0 $ Scale 0.12 0.25
          (Text "You've lost :( Press enter to restart.")
      | otherwise = Pictures [tiles, playerMarker]
      where
        observationToCoords :: Word -> (Word, Word)
        observationToCoords w = quotRem w 4
        renderTile :: (Word, TileType) -> Picture
        renderTile (obs, tileType ) =
          let (centerX, centerY) = rowColToCoords . observationToCoords $ obs
              color' = case tileType of
                Goal -> green
                Hole -> black
                _ -> blue
           in Translate centerX centerY (Color color' (Polygon [(-50, -50), (-50, 50), (50, 50), (50, -50)]))
        tiles = Pictures $ map renderTile (A.assocs (grid . environment $ world))
        (px, py) = rowColToCoords . observationToCoords $ (currentObservation . environment $ world)
        playerMarker = translate px py (Color red (ThickCircle 10 3))
    rowColToCoords :: (Word, Word) -> (Float, Float)
    rowColToCoords (row, col) = (100 * (fromIntegral col - 1.5), 100 * (1.5 - fromIntegral row))
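
    The coordinate arithmetic is easy to check in isolation, without Gloss. The sketch below copies the two conversion functions and prints the tile centers for the four corner observations of the 4x4 grid; the main function is an addition for this example.

    ```haskell
    -- Observation index -> (row, column) on a grid with 4 columns.
    observationToCoords :: Word -> (Word, Word)
    observationToCoords w = quotRem w 4

    -- (row, column) -> center of the 100x100 tile, with the window origin
    -- in the middle of the 400x400 grid and the y axis pointing up.
    rowColToCoords :: (Word, Word) -> (Float, Float)
    rowColToCoords (row, col) =
      (100 * (fromIntegral col - 1.5), 100 * (1.5 - fromIntegral row))

    main :: IO ()
    main = mapM_ (print . center) [0, 3, 12, 15]
      where
        center obs = (obs, rowColToCoords (observationToCoords obs))
    -- (0,(-150.0,150.0))    top-left
    -- (3,(150.0,150.0))     top-right
    -- (12,(-150.0,-150.0))  bottom-left
    -- (15,(150.0,-150.0))   bottom-right
    ```

    Row 0 ends up at the top of the window because the row index is subtracted from 1.5 rather than added to it.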

    by James Bowen at June 22, 2020 02:30 PM

    Magnus Therning

    Better Nix setup for Spacemacs

    In an earlier post I documented my setup for getting Spacemacs/Emacs to work with Nix. I’ve since found a much more elegant solution based on

    No more Emacs packages for Nix and no need to define functions that wrap executables in an invocation of nix-shell.

    There’s a nice bonus too: with this setup I don’t need to run nix-shell, which always drops me at a bash prompt; instead I get a working setup in my shell of choice.

    Setting up direnv

    The steps for setting up direnv depend a bit on your setup, but luckily I found the official instructions for installing direnv to be very clear and easy to follow. There’s not much I can add to that.

    Setting up Spacemacs

    Since emacs-direnv isn’t included by default in Spacemacs I needed to do a bit of setup. I opted to create a layer for it, rather than just drop it in the list dotspacemacs-additional-packages. Yes, a little more complicated, but not difficult and I nurture an intention of submitting the layer for inclusion in Spacemacs itself at some point. I’ll see where that goes.

    For now, I put the following in the file ~/.emacs.d/private/layers/direnv/packages.el:

    (defconst direnv-packages
      '(direnv))
    (defun direnv/init-direnv ()
      (use-package direnv
        :config
        (direnv-mode)))

    Setting up the project folders

    In each project folder I then add the file .envrc containing a single line:


    Then I either run direnv allow from the command line, or run the function direnv-allow after opening the folder in Emacs.

    Using it

    It’s as simple as moving into the folder in a shell – all required envvars are set up on entry and unset on exit.

    In Emacs it’s just as simple, just open a file in a project and the envvars are set. When switching to a buffer outside the project the envvars are unset.

    There is only one little caveat: nix-build doesn’t work inside a Nix shell. I found out that running

    IN_NIX_SHELL= nix-build

    does work though.

    June 22, 2020 12:00 AM

    June 19, 2020

    Tweag I/O

    Linear types are merged in GHC

    Looking back at the journey which brought us there, and forward to what still lies ahead.

    June 19, 2020 12:00 AM

    June 16, 2020


    Using Template Haskell to generate static data

    Template Haskell (TH) is a powerful tool for specializing programs and allows shifting some work from runtime to compile time. It can be a bit intimidating to use for beginners. So I thought I would write up how to use TH to turn certain kinds of runtime computations into compile time computations.

    In particular we will turn the initialization of a fully static data structure into a compile time operation. This pattern works for many data structures but we will look at IntSet in particular.

    A working example

    As an example, consider a function such as:

    isStaticId :: Int -> Bool
    isStaticId x =
        x `elem` staticIds
      where
        staticIds = [1,2,3,5,7 :: Int]

    We have a set of known things here, represented by a list named staticIds.

    We use Int as it makes the example easier. But these could be Strings or all kinds of things. In particular, I was inspired by GHC’s list of known builtin functions.

    Upsides of the list representation

    The advantage of the code as written above is that the list is statically known. As a result the list will be built into the final object code as static data, and accessing it will not require any allocation/computation.

    You can check this by looking at the core dump (-ddump-simpl). Don’t forget to enable optimizations or this might not work as expected. In the core there should be a number of definitions like the one below.

    -- RHS size: {terms: 3, types: 1, coercions: 0, joins: 0/0}
    isStaticId3 = : isStaticId8 isStaticId4

    Note that the above is Core syntax for isStaticId3 = isStaticId8 : isStaticId4, i.e., this is just denoting a part of the list, and each element gets its own identifier. All these definitions will be compiled to static data, and will eventually be represented as just a number of words encoding the constructor and its fields.

    We can confirm this by looking at the Cmm output where the corresponding fragment will look like this:

    [section ""data" . isStaticId3_closure" {
             const :_con_info;
             const isStaticId8_closure+1;
             const isStaticId4_closure+2;
             const 3;
     }]

    I won’t go into the details of how to read the Cmm, but it shows us that the binding will end up in the data section. The constant :_con_info; tells us that we are dealing with a Cons cell, and then we have the actual data stored in the cell.

    What is important here is that this is static data. The GC won’t have to traverse it so having the data around does not affect GC performance. We also don’t need to compute it at runtime as it’s present in the object file in its fully evaluated form.

    Switching to IntSet

    What if we aggregate more data? If we blow up the list to a hundred, a thousand or more elements, it’s likely that performing a linear search will become a bottleneck for performance.

    So we rewrite our function to use a set as follows:

    isStaticIdSet :: Int -> Bool
    isStaticIdSet x =
        x `S.member` staticIds
      where
        staticIds = S.fromList [1,2,3,5,7 :: Int] :: IntSet

    This looks perfectly fine on the surface. Instead of having O(n) lookups we should get O(log(n)) lookups, right?

    Pitfalls of runtime initialization

    However, what happens at runtime? In order to query the set we have to first convert the list into a set. This is where disaster strikes. We are no longer querying static data, as the list argument has to be converted into a set. The S.fromList function call will not be evaluated at compilation time.

    In many cases, GHC may manage to at least share our created set staticIds across calls. But this is fragile, and depending on the exact code in question, it might not. Then we can end up paying the cost of set construction for each call to isStaticIdSet.

    So while we reduced the lookup cost from O(n) to O(log(n)), the total cost is now O(n*min(n,W) + log(n)), where n*min(n,W) is the cost of constructing the set from a list. We could optimize this somewhat by making sure the list is sorted and has no duplicates. But we would still end up worse than with the list-based code we started out with.
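    One way to make the sharing explicit, rather than hoping GHC floats the set out of the function, is to give the set its own top-level binding. The following is a minimal sketch using the names from the example above; it does not remove the runtime cost, it only ensures the set is built at most once, on first use.

```haskell
import Data.IntSet (IntSet)
import qualified Data.IntSet as S

-- A top-level constant: evaluated on first use, then shared by all later calls.
-- The set is still constructed at runtime, just not once per lookup.
staticIds :: IntSet
staticIds = S.fromList [1, 2, 3, 5, 7]

isStaticIdSet :: Int -> Bool
isStaticIdSet x = x `S.member` staticIds
```

    This still leaves a thunk in the program that has to be forced at runtime, which is the remaining cost the rest of this post removes.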

    It’s a shame that GHC can’t evaluate S.fromList at compile time … or can it?

    Template Haskell (TH) to the rescue

    What we really want is to force GHC to fully evaluate our input data to an IntSet, and then ensure that the IntSet is stored as static data, just as happens for the list in our initial example.

    How can TH help?

    Template Haskell allows us to specify parts of the program to compute at compile time.

    So we “simply” tell GHC to compute the set at compile time and are done.

    Like so:

    {-# NOINLINE isStaticIdSet #-}
    isStaticIdSet :: Int -> Bool
    isStaticIdSet x =
        x `S.member` staticIds
      where
        staticIds = $$( liftTyped (S.fromList [1,2,3,5,7] :: IntSet))

    This results in Core that is even simpler than in the [Int] example above:

    -- RHS size: {terms: 3, types: 0, coercions: 0, joins: 0/0}
    isStaticIdSet1 = Tip 0# 174##

    -- RHS size: {terms: 7, types: 3, coercions: 0, joins: 0/0}
    isStaticIdSet
      = \ x_a5ar ->
          case x_a5ar of { I# ww1_i5r2 -> $wmember ww1_i5r2 isStaticIdSet1 }

    No longer will we allocate the set at runtime; instead the whole set is encoded in isStaticIdSet1. We only get a single constructor because IntSet can encode small sets using a single constructor.
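    To see where the 174## in the Core above comes from, it helps to know that a Tip stores a prefix together with a bitmap in which bit i is set when the key prefix + i is in the set. The sketch below recomputes that bitmap for our five keys; bitmapOf is a hypothetical helper written for this illustration, not part of the containers API, and it assumes all keys fall into the same small range so that they fit into one Tip.

```haskell
import Data.Bits (setBit)

-- Hypothetical helper: set one bit per key, starting from an empty bitmap.
-- Only meaningful for keys small enough to fit into a single machine word.
bitmapOf :: [Int] -> Word
bitmapOf = foldl setBit 0
```

    For the keys 1, 2, 3, 5 and 7 this gives 2 + 4 + 8 + 32 + 128 = 174, matching the literal in the Core output.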

    How it works

    From the outside in:

    $$( .. ) is TH syntax for a typed splice.1 Splicing is the process of inserting generated syntax into our program. The splice construct takes an expression denoting a syntax tree, evaluates it and inserts the resulting syntax at the place where the splice occurs.

    The next piece of magic is liftTyped. It takes a regular Haskell expression and evaluates it at compile time, producing an abstract syntax tree that, when spliced, equals the evaluated value of the Haskell expression.

    This leaves S.fromList [1,2,3,5,7] which is regular set creation.

    Putting these together, during compilation GHC will:

    • Evaluate S.fromList [1,2,3,5,7].
    • Turn the resulting set into an abstract syntax tree using liftTyped.
    • Splice that abstract syntax tree into our program using $$(..), effectively inserting the fully evaluated set expression into our program.

    The resulting code will be compiled like any other, in this case resulting in fully static data.
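    The "evaluate, then turn into syntax" step can also be observed outside of a splice. The following is a small sketch using the untyped lift on a list, since list and Int instances ship with template-haskell; liftedList and renderLifted are names made up for this illustration. Running the Q action in IO yields the syntax tree of the already evaluated list, not of the map expression that produced it.

```haskell
import Language.Haskell.TH
import Language.Haskell.TH.Syntax (lift)

-- The argument is an ordinary Haskell value; lift turns the evaluated
-- value into the syntax tree of a list literal.
liftedList :: Q Exp
liftedList = lift (map (* 2) [1, 2, 3 :: Int])

-- Pretty-print the generated syntax for inspection, e.g. in GHCi.
renderLifted :: IO String
renderLifted = pprint <$> runQ liftedList
```

    Calling renderLifted in GHCi shows a plain list literal; the call to map is gone, which is the same effect the splice has on S.fromList.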

    Full example

    Now you might think this was too easy, and you are partially right. The main issue is that liftTyped requires an instance of the Lift typeclass.

    But for the case of IntSet, we can have GHC derive one for us. So all it costs us is slightly more boilerplate.

    Here is a full working example for you to play around with:

    -- First module
    {-# LANGUAGE TemplateHaskell #-} -- Enable TH
    {-# LANGUAGE StandaloneDeriving #-}
    {-# LANGUAGE DeriveLift #-}
    module TH_Lift  where
    import Language.Haskell.TH.Syntax
    import Data.IntSet.Internal
    deriving instance Lift (IntSet)

    -- Second module
    {-# LANGUAGE TemplateHaskell #-}
    module M (isStaticIdSet) where
    import TH_Lift
    import Data.IntSet as S
    import Language.Haskell.TH
    import Language.Haskell.TH.Syntax
    type Id = Int
    isStaticIdSet :: Int -> Bool
    isStaticIdSet x =
        x `S.member` staticSet
      where
        staticSet = $$(liftTyped (S.fromList [1,2,3,5,7] :: IntSet))

    Why do we require two modules?

    We translate liftTyped (S.fromList [1,2,3,5,7] :: IntSet) into a TH expression at compile time. For this, GHC will call the (already compiled) lift method of the Lift instance.

    However if we define isStaticIdSet and the Lift instance in the same module, GHC can’t call liftTyped as it’s not yet compiled by the time we need it.

    In practice most packages have companions which already offer Lift instances. For example, th-lift-instances offers instances for the containers package.2

    Disclaimer: This won’t work for all data types!

    For many data types the result of liftTyped will be an expression that can be compiled to static data as long as the contents are known.

    This is in particular true for “simple” ADTs like the ones used by IntSet or Set.

    However, certain primitives like arrays can’t be allocated at compile time. This sadly means this trick won’t currently work for Arrays or Vectors. There is a ticket about removing this restriction on arrays on GHC’s issue tracker. So hopefully, we will be able to lift arrays at some point in the future.

    Note furthermore that lifting won’t work for infinite data structures, as it usually requires its argument to be evaluated completely if we want it to result in static data.

    1. We are using Typed Template Haskell here in order to advertise it a bit better. Typed Template Haskell ensures that we are building type-correct code. In an example as simple as this, it hardly makes a difference, because even normal Template Haskell does type-check the generated code. We could equally well have written

      staticSet = $(lift (S.fromList [1,2,3,5,7] :: IntSet))
    2. Unfortunately, some of the instances defined in th-lift-instances up to version 0.1.16 are “wrong” for the purposes of this post. For example, the IntSet instance is based on a call to fromList, not statically building the internal representation. Make sure that you use th-lift-instances version 0.1.17 or later.

    by andreask at June 16, 2020 12:00 AM

    June 15, 2020

    Monday Morning Haskell

    Training our Agent with Haskell!


    In the previous part of the series, we used the ideas of Q-Learning together with TensorFlow. We got a more general solution to our agent that didn't need a table for every state of the game.

    This week, we'll take the final step and implement this TensorFlow approach in Haskell. We'll see how to integrate this library with our existing Environment system. It works out quite smoothly, with a nice separation between our TensorFlow logic and our normal environment logic!

    This article requires a working knowledge of the Haskell TensorFlow integration. If you're new to this, you should download our Guide showing how to work with this framework. You can also read our original Machine Learning Series for some more details! In particular, the second part will go through the basics of tensors.

    Building Our TF Model

    The first thing we want to do is construct a "model". This model type will store three items. The first will be the tensor for the weights we have. Then the second two will be functions in the TensorFlow Session monad. The first function will provide scores for the different moves in a position, so we can choose our move. The second will allow us to train the model and update the weights.

    data Model = Model
      { weightsT :: Variable Float
      , chooseActionStep :: TensorData Float -> Session (Vector Float)
      , learnStep :: TensorData Float -> TensorData Float -> Session ()
      }

    The input for choosing an action is our world observation state, converted to a Float and put in a size 16-vector. The result will be 4 floating point values for the scores. Then our learning step will take in the observation as well as a set of 4 values. These are the "target" values we're training our model on.

    We can construct our model within the Session monad. In the first part of this process we define our weights and use them to determine the score of each move (results).

    createModel :: Session Model
    createModel = do
      -- Choose Action
      inputs <- placeholder (Shape [1, 16])
      weights <- truncatedNormal (vector [16, 4]) >>= initializedVariable
      let results = inputs `matMul` readValue weights
      returnedOutputs <- render results

    Now we make our "trainer". Our "loss" function is the reduced, squared difference between our results and the "target" outputs. We'll use the adam optimizer to learn values for our weights to minimize this loss.

    createModel :: Session Model
    createModel = do
      -- Choose Action
      -- Train Network
      (nextOutputs :: Tensor Value Float) <- placeholder (Shape [4, 1])
      let (diff :: Tensor Build Float) = nextOutputs `sub` results
      let (loss :: Tensor Build Float) = reduceSum (diff `mul` diff)
      trainer_ <- minimizeWith adam loss [weights]
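    As a plain-list illustration of the loss described above, with no TensorFlow involved, the "reduced, squared difference" is just the sum of the squared element-wise differences between the target scores and the model's scores. The name squaredLoss is hypothetical and only serves this sketch.

```haskell
-- Sum of squared differences between target scores and predicted scores.
-- Both lists are expected to hold one value per move (four in this game).
squaredLoss :: [Float] -> [Float] -> Float
squaredLoss targets results =
  sum [ (t - r) ^ (2 :: Int) | (t, r) <- zip targets results ]
```

    For example, targets [1, 2] against results [3, 5] give 4 + 9 = 13. The optimizer's job is to adjust the weights so that this number shrinks.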

    Finally, we wrap these tensors into functions we can call using runWithFeeds. Recall that each feed provides us with a way to fill in one of our placeholder tensors.

    createModel :: Session Model
    createModel = do
      -- Choose Action
      -- Train Network
      -- Create Model
      let chooseStep = \inputFeed ->
            runWithFeeds [feed inputs inputFeed] returnedOutputs
      let trainStep = \inputFeed nextOutputFeed ->
            runWithFeeds [ feed inputs inputFeed
                         , feed nextOutputs nextOutputFeed
                         ]
                         trainer_
      return $ Model weights chooseStep trainStep

    Our model now wraps all the different tensor operations we need! All we have to do is provide it with the correct TensorData. To see how that works, let's start integrating with our EnvironmentMonad!

    Integrating With Environment

    Our model's functions exist within the TensorFlow monad Session. So how then, do we integrate this with our existing Environment code? The answer is, of course, to construct a new monad! This monad will wrap Session, while still giving us our FrozenLakeEnvironment! We'll keep the environment within a State, but we'll also keep a reference to our Model.

    newtype FrozenLake a = FrozenLake
      (StateT (FrozenLakeEnvironment, Model) Session a)
      deriving (Functor, Applicative, Monad)
    instance (MonadState FrozenLakeEnvironment) FrozenLake where
      get = FrozenLake (fst <$> get)
      put fle = FrozenLake $ do
        (_, model) <- get
        put (fle, model)

    Now we can start implementing the actual EnvironmentMonad instance. Most of our existing types and functions will work with trivial modification. The only real change is that runEnv will need to run a TensorFlow session and create the model. Then it can use evalStateT.

    instance EnvironmentMonad FrozenLake where
      type (Observation FrozenLake) = FrozenLakeObservation
      type (Action FrozenLake) = FrozenLakeAction
      type (EnvironmentState FrozenLake) = FrozenLakeEnvironment
      baseEnv = basicEnv
      currentObservation = currentObs <$> get
      resetEnv = resetFrozenLake
      stepEnv = stepFrozenLake
      runEnv env (FrozenLake action) = runSession $ do
        model <- createModel
        evalStateT action (env, model)

    This is all we need to define the first class. But, with TensorFlow, our environment is only useful if we use the tensor model! This means we need to fill in LearningEnvironment as well. This has two functions, chooseActionBrain and learnEnv using our tensors. Let's see how that works.

    Choosing an Action

    Choosing an action is straightforward. We'll once again start with the same format for sometimes choosing a random move:

    chooseActionTensor :: FrozenLake FrozenLakeAction
    chooseActionTensor = FrozenLake $ do
      (fle, model) <- get
      let (exploreRoll, gen') = randomR (0.0, 1.0) (randomGenerator fle)
      if exploreRoll < flExplorationRate fle
        then do
          let (actionRoll, gen'') = Rand.randomR (0, 3) gen'
          put $ (fle { randomGenerator = gen'' }, model)
          return (toEnum actionRoll)
        else do

    As in Python, we'll need to convert an observation to a tensor type. This time, we'll create TensorData. This type wraps a vector, and our input should have the size 1x16. It has the format of a oneHot tensor. But it's easier to make this a pure function, rather than using a TensorFlow monad.

    obsToTensor :: FrozenLakeObservation -> TensorData Float
    obsToTensor obs = encodeTensorData (Shape [1, 16]) (V.fromList asList)
      where
        asList = replicate (fromIntegral obs) 0.0 ++
                   [1.0] ++
                   replicate (fromIntegral (15 - obs)) 0.0

    Since we've already defined our chooseAction step within the model, it's easy to use this! We convert the current observation, get the result values, and then pick the best index!

    chooseActionTensor :: FrozenLake FrozenLakeAction
    chooseActionTensor = FrozenLake $ do
      (fle, model) <- get
      -- Random move
        else do
          let obs1 = currentObs fle
          let obs1Data = obsToTensor obs1
          -- Use model!
          results <- lift ((chooseActionStep model) obs1Data)
          let bestMoveIndex = V.maxIndex results
          put $ (fle { randomGenerator = gen' }, model)
          return (toEnum bestMoveIndex)
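    Stripped of the monad and the tensors, the decision made by chooseActionTensor is small. The sketch below takes the two random rolls as arguments so that it stays pure; epsilonGreedy and its parameter names are hypothetical and not part of the series' code. Note that on ties maximumBy may pick a different index than V.maxIndex would.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- rate:         current exploration rate
-- exploreRoll:  a pre-drawn random number in [0, 1)
-- randomAction: a pre-drawn random action index, used when exploring
-- scores:       the model's score for each action (must be non-empty)
epsilonGreedy :: Double -> Double -> Int -> [Float] -> Int
epsilonGreedy rate exploreRoll randomAction scores
  | exploreRoll < rate = randomAction
  | otherwise = fst (maximumBy (comparing snd) (zip [0 ..] scores))
```

    With an exploration rate of 0.1, a roll of 0.05 returns the random action, while a roll of 0.5 returns the index of the highest score.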

    Learning From the Environment

    One unfortunate part of our current design is that we have to repeat some work in our learning function. To learn from our action, we need to use all the values, not just the chosen action. So to start our learning function, we'll call chooseActionStep again. This time we'll get the best index AND the max score.

    learnTensor ::
      FrozenLakeObservation -> FrozenLakeObservation ->
      Reward -> FrozenLakeAction ->
      FrozenLake ()
    learnTensor obs1 obs2 (Reward reward) action = FrozenLake $ do
      model <- snd <$> get
      let obs1Data = obsToTensor obs1
      -- Use the model!
      results <- lift ((chooseActionStep model) obs1Data)
      let (bestMoveIndex, maxScore) =
            (V.maxIndex results, V.maximum results)

    We can now get our "target" values by substituting in the reward and max score at the proper index. Then we convert the second observation to a tensor, and we have all our inputs to call our training step!

    learnTensor ::
      FrozenLakeObservation -> FrozenLakeObservation ->
      Reward -> FrozenLakeAction ->
      FrozenLake ()
    learnTensor obs1 obs2 (Reward reward) action = FrozenLake $ do
      let (bestMoveIndex, maxScore) =
            (V.maxIndex results, V.maximum results)
      let targetActionValues = results V.//
            [(bestMoveIndex, double2Float reward + (gamma * maxScore))]
      let obs2Data = obsToTensor obs2
      let targetActionData = encodeTensorData
            (Shape [4, 1])
            targetActionValues
      -- Use the model!
      lift $ (learnStep model) obs2Data targetActionData
      where
        gamma = 0.81
    Using these two functions, we can now fill in our LearningEnvironment class!

    instance LearningEnvironment FrozenLake where
      chooseActionBrain = chooseActionTensor
      learnEnv = learnTensor
      -- Same as before
      explorationRate = ...
      reduceExploration = ...

    We'll then be able to run this code just as we would our other Q-learning examples!


    This wraps up the machine learning part of this series. We'll have one more article about Open Gym next week. We'll compare our current setup and the Gloss library. Gloss offers much more extensive possibilities for rendering our game and accepting input. So using it would expand the range of games we could play!

    We'll definitely continue to expand on the Open Gym concept in the future! Expect a more formal approach to this at some point! For now, take a look at our Github repository for this series! This article's code is on the tensorflow branch!

    by James Bowen at June 15, 2020 02:30 PM

    June 14, 2020

    Sandy Maguire

    Polysemy: Mea Culpa

    Alexis King gave an utterly fantastic talk today on the deep inner workings of Haskell programs’ performance profiles. It’s really very excellent and you should go watch it if you haven’t already. I’ve been extremely burned out on Polysemy and effect-system-related topics lately, but it seems like as good a time as any to discuss what’s going on with the library. Why do Alexis’ benchmarks clearly show something other than my claim that Polysemy was “zero cost?” In short:

    I screwed up.

    The core Haskell that’s being run in Alexis’ benchmark probably looks like this, though at one point I did indeed get the countdown benchmark to completely optimize away. My claim to being zero-cost was based on this result, which was possible, but required enabling -flate-specialise -O2 -fstatic-argument-transformation -fmax-simplifier-iterations=10 as well as a GHC patch cajoling the optimizer into running extra hard.

    My patch to GHC just barely missed the 8.8 deadline, which meant it wouldn’t be publicly available until GHC 8.10, roughly a year away. And until then, Polysemy had no chance of being fast.

    The result of all this: fast code, relying on a house of cards of optimization, only on a compiler that didn’t exist yet. It worked, but was a huge hassle to test, and because of that, I didn’t do it very often, nor did I make it easy for others to verify my claims.

    My mindset has always been that the “free monads are too slow” argument is overblown and irrelevant to 99% of programs, and my original goal with Polysemy was to show that there was nothing fundamentally wrong with the approach; that if we tried hard enough, we really could pull amazing performance out of free monads.

    It’s been about a year now, so my recollection is hazy, but I think I must have somehow conflated “fast programs are possible in Polysemy” with “Polysemy is zero-cost.” There was absolutely no deception intended, but it appears I deceived myself, and the community because of that. I’m sorry.

    Sometime near the end of 2019, Lexi showed me her research into why the effect system benchmarks were extremely misleading (as mentioned in her talk.) Her research made it very evident that all effect systems were “cheating” in the benchmark shootout, and I attributed Polysemy’s pre-super-optimized terrible benchmark numbers to “not cheating as much.” If the optimizer was what was making other effect systems fast, but only in single-module programs, presumably they would also perform badly in real-world, multiple-module programs, and would see the same performance characteristics as Polysemy. I didn’t confirm this experimentally.

    Plus, I figured, if performance truly is a problem, and not the overactive fear I thought it was, surely someone would have filed a bug complaining that Polysemy wasn’t as fast as it claimed. To date, nobody has filed that bug, and I continue to believe it’s an overblown issue — though that isn’t to say we shouldn’t fix it if we can. Lexi’s package eff seems to be working towards that solution, and I applaud her for all of the work she’s been putting into this problem.

    So that’s more or less the story. But there are a few loose ends, such as why Lexi and I are seeing different benchmarking results. I realize this doesn’t actually matter, and I agree with her that Polysemy is in fact slow. That being said, I feel like enough of my reputation is on the line that I’d like to put forward some more evidence that I didn’t fabricate the whole thing. Also, the investigation will unearth some more systematic problems.

    First and foremost, the last time I looked at the source of Lexi’s benchmarks, I noted that they don’t use polysemy-plugin, which the documentation states is necessary for the good performance. I don’t remember where these benchmarks actually are, but it doesn’t matter, because even if she had enabled the plugin, Polysemy would still not optimize away.

    Why not? Polysemy’s performance was extremely reliant on unfolding of its recursive bind operation. As described here, you could trick GHC into unfolding a recursive call once by explicitly giving a loop-breaker. In essence, it required transforming the following recursive call:

    factorial :: Int -> Int
    factorial 0 = 1
    factorial n = n * factorial (n - 1)
    {-# INLINE factorial #-}

    Into this:

    factorial :: Int -> Int
    factorial 0 = 1
    factorial n = n * factorial' (n - 1)
    {-# INLINE factorial #-}
    factorial' :: Int -> Int
    factorial' = factorial
    {-# NOINLINE factorial' #-}

    For whatever reason, this trick exposed enough of Polysemy’s bind that the simplifier could inline away the expensive bits. But this was tedious! Every recursive call needed an explicit loop-breaker, and missing one would silently jeopardize your performance! Doing this by hand seemed antithetical to Polysemy’s other goal of no-boilerplate, and so at some point we factored out this logic into a GHC plugin, and then removed our hand-written loop-breakers. The initial implementation of that plugin is described in this blog post.

    In retrospect, this explicit breaking-of-loops doesn’t seem to be required in the benchmark — only in Polysemy — but that escaped my attention at the time and believing that user-code required this optimization was the main motivation in turning it into a GHC plugin. Anyway…

    As it turns out, this plugin didn’t actually work! It was successfully rewriting the core into the explicitly loop-broken version, but for whatever reason, the simplifier wasn’t picking up where we left off. To this day I don’t know why it doesn’t work, but it doesn’t. Instead we proposed to implement this plugin as a renamer pass, but that presents serious implementation problems. Since there was no way in hell Polysemy could possibly be fast before GHC 8.10 (to be released roughly a year in the future) motivation to find a solution to this problem was understandably low, and it fell by the wayside. It has never been fixed, and remains disabled and half-worked around in Polysemy to this day.

    Hopefully this is the only reason why Polysemy doesn’t show the excellent (though, admittedly unrepresentative) countdown benchmark results I claimed it did. I’m not invested enough to check for myself, but if you’re interested, I suspect you’ll see excellent core produced by my single-file repro if you compile it on GHC 8.10 under -O2 with the polysemy-plugin and the above flags enabled. If so, I suspect rolling back #8bbd9dc would get the real Polysemy library also doing well on the benchmark. But again, the benchmark performance is meaningless!

    Enough history for today. Before ending this post, I’d like to do a tiny STAMP on what went wrong, in the hope that we can all learn something. The goal here is not to pass the buck, but to get a sense of just how much went wrong, how, and why.

    By my analysis, the following issues all contributed to Polysemy’s colossal failure:

    • Haskell’s performance is not well understood
      • The effect system benchmarks were meaningless, and if anyone knew that, it was not common knowledge.
      • MTL is widely believed to be more performant than it is.
      • Existing effect systems’ performance is tied largely to GHC’s optimizer firing.
      • Because of lack of understanding, I was tackling bad-performance symptoms rather than causes.
    • Polysemy’s performance was unreliable
      • Required several interlocking pieces to work: a patched compiler, a core plugin, explicit loop-breakers, obscure GHC options.
      • Because the performance was hard to test, we didn’t notice when these pieces didn’t work.
        • Upon noticing the loop-breaking plugin didn’t work, it was unclear how to fix it.
          • Because of requiring a patched GHC, it was not a priority to fix.
            • Not being a priority meant it wasn’t motivating, and so it didn’t get done.
      • Debugging the simplifier is hard work. I was looking at thousands of lines of generated core by eye. Tooling exists, but it is more helpful for navigating core than diffing it.
    • Polysemy’s performance was too hard to test.
      • I missed the GHC deadline
        • My patch lingered for weeks in a finished state
          • Only reviewable by one person, who was on vacation.
          • Stuck doing drive-by improvements that were only suggestions, and not blockers to being merged. This was not made clear to me.
          • The simplifier is really hairy. It’s under-documented, and the function I was touching was over 150 lines of code.
      • I use Stack for my development, and Stack doesn’t easily support custom-built GHCs. Therefore I couldn’t use my usual tools to test.
      • I don’t know how to use cabal
        • The documentation is notoriously lacking. As best I can tell, there are no “quick start” tutorials, and the relevant parts of the user manual are mentioned only under a heading that mentions “Nix”.
      • Because of the above two points, I only tested on the single module, and never on the library itself.
    • I had too much ego in the project.
      • I wanted to believe I had accomplished something “impossible.”
      • I had invested several engineering-months of my time working on this problem.
      • I had invested a large chunk of my reputation into free monads.

    This post is long enough without diving into those points in more detail, but I’m happy to expand on individual points. Let me know in the comments if you’re interested.

    All in all, this has been an embarrassing affair. But then again, if you haven’t failed in recent memory, you’re not trying hard enough. I’ll strive to do better in the future.

    June 14, 2020 10:36 PM

    June 11, 2020

    Joey Hess

    bracketing and async exceptions in haskell

    I've been digging into async exceptions in haskell, and getting more and more concerned. In particular, bracket seems to be often used in ways that are not async exception safe. I've found multiple libraries with problems.

    Here's an example:

    withTempFile a = bracket setup cleanup a
      where
        setup = openTempFile "/tmp" "tmpfile"
        cleanup (name, h) = do
            hClose h
            removeFile name

    This looks reasonably good, it makes sure to clean up after itself even when the action throws an exception.

    But, in fact that code can leave stale temp files lying around. If the thread receives an async exception when hClose is running, it will be interrupted before the file is removed.

    We normally think of bracket as masking exceptions, but it doesn't prevent async exceptions in all cases. See Control.Exception on "interruptible operations", which can receive async exceptions even when other exceptions are masked.

    It's a bit surprising, but hClose is such an interruptible operation, because it flushes the write buffer. The only way to know is to read the code.

    It can be quite hard to determine if an operation is interruptible, since it can come down to whether it retries an STM transaction, or uses an MVar that is not always full. I've been auditing libraries and I often have to look at code several dependencies away, and even then may not be sure if a library has this problem.

    • process's withCreateProcess could fail to wait on the process, leaving a zombie. Might also leak file descriptors?

    • http-client's withResponse might fail to close a network connection. (If an MVar happened to be empty when it's called.)

      Worth noting that there are plenty of examples of using http-client to eg, race downloading two urls and cancel the slower download. Which is just the kind of use of an async exception that could cause a problem.

    • persistent's withSqlPool and withSqlConn might fail to clean up, when used with persistent-postgresql. (If another thread is using the connection and so an MVar over in postgresql-simple is empty.)

    • concurrent-output has some locking code that is not async exception safe. (My library, so I've fixed part of it, and hope to fix the rest.)

    So far, around half of the libraries I've looked at, that use bracket or onException or the like probably have this problem.

    What can libraries do?

    • Document whether these things are async exception safe. Or perhaps there should be an expectation that "withFoo" always is, but if so the Haskell community has some work ahead of it.

    • Use finally. Good mostly in simple situations; more complicated things would be hard to write this way.

      hClose h `finally` removeFile name

    • Use uninterruptibleMask, but it's a big hammer and is often not the right tool for the job. If the operation takes a while to run, the program will not respond to ctrl-c during that time.

    • May be better to run the actions in worker threads, to insulate them from receiving any async exceptions.

      bracketInsulated :: IO a -> (a -> IO b) -> (a -> IO c) -> IO c
      bracketInsulated a b = bracket
        (uninterruptibleMask $ \u -> async (u a) >>= u . wait)
        (\v -> uninterruptibleMask $ \u -> async (u (b v)) >>= u . wait)
      (Note use of uninterruptibleMask here in case async itself does an interruptible operation. My first version got that wrong. This is hard!)
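    Applied to the temp file example from the start of this post, the finally option could look like the sketch below. It only guarantees that removeFile is attempted even if hClose is interrupted; whether removeFile itself can be interrupted is not something this sketch checks. The leftoverAfterUse helper is made up for this illustration and just reports whether the file survived.

```haskell
import Control.Exception (bracket, finally)
import System.Directory (doesFileExist, removeFile)
import System.IO (Handle, hClose, hPutStrLn, openTempFile)

withTempFile :: ((FilePath, Handle) -> IO a) -> IO a
withTempFile = bracket setup cleanup
  where
    setup = openTempFile "/tmp" "tmpfile"
    -- If hClose is interrupted, removeFile still runs before the
    -- exception propagates.
    cleanup (name, h) = hClose h `finally` removeFile name

-- Hypothetical check: True means the temp file was left behind.
leftoverAfterUse :: IO Bool
leftoverAfterUse = do
  name <- withTempFile (\(name, h) -> hPutStrLn h "scratch" >> pure name)
  doesFileExist name
```

    In the normal, uninterrupted case leftoverAfterUse should return False; the interesting case, an async exception arriving during hClose, is much harder to provoke in a test.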

    My impression of the state of things now is that you should be very cautious using race or cancel or withAsync or the like, unless the thread is small and easy to audit for these problems. Kind of a shame, since I had wanted to be able to cancel a thread that is big and sprawling and uses all the libraries mentioned above.

    This work was sponsored by Jake Vosloo and Graham Spencer on Patreon.

    June 11, 2020 05:24 PM

    June 10, 2020

    Jeremy Gibbons

    How to design co-programs

    I recently attended the Matthias Felleisen Half-Time Show, a symposium held in Boston on 3rd November in celebration of Matthias’s 60th birthday. I was honoured to be able to give a talk there; this post is a record of what I (wished I had) said.

    Matthias Felleisen

    Matthias is known for many contributions to the field of Programming Languages. He received the SIGPLAN Programming Languages Achievement Award in 2012, the citation for which states:

    He introduced evaluation contexts as a notation for specifying operational semantics, and progress-and-preservation proofs of type safety, both of which are used in scores of research papers each year, often without citation. His other contributions include small-step operational semantics for control and state, A-normal form, delimited continuations, mixin classes and mixin modules, a fully-abstract semantics for Sequential PCF, web programming techniques, higher-order contracts with blame, and static typing for dynamic languages.

    Absent from this list, perhaps because it wasn’t brought back into the collective memory until 2013, is a very early presentation of the idea of using effects and handlers for extensible language design.

    However, perhaps most prominent among Matthias’s contributions to our field is a long series of projects on teaching introductory programming, from TeachScheme! through Program By Design to a latest incarnation in the form of the How to Design Programs textbook (“HtDP”), co-authored with Robby Findler, Matthew Flatt, and Shriram Krishnamurthi, and now in its second edition. The HtDP book is my focus in this post; I have access only to the First Edition, but the text of the Second Edition is online. I make no apologies for using Haskell syntax in my examples, but at least that gave Matthias something to shout at!

    Design Recipes

    One key aspect of HtDP is the emphasis on design recipes for solving programming tasks. A design recipe is a template for the solution to a problem: a contract for the function, analogous to a type signature (but HtDP takes an untyped approach, so this signature is informal); a statement of purpose; a function header; example inputs and outputs; and a skeleton of the function body. Following the design recipe entails completing the template—filling in a particular contract, etc—then fleshing out the function body from its skeleton, and finally testing the resulting program against the initial examples.

    The primary strategy for problem solving in the book is via analysis of the structure of the input. When the input is composite, like a record, the skeleton should name the available fields as likely ingredients of the solution. When the input has “mixed data”, such as a union type, the skeleton should enumerate the alternatives, leading to a case analysis in the solution. When the input is of a recursive type, the skeleton encapsulates structural recursion—a case analysis between the base case and the inductive case, the latter case entailing recursive calls.

    So, the design recipe for structural recursion looks like this:

    Phase: Data Analysis and Design
    Goal: to formulate a data definition
    Activity: develop a data definition for mixed data with at least two alternatives; one alternative must not refer to the definition; explicitly identify all self-references in the data definition

    Phase: Contract Purpose and Header
    Goal: to name the function; to specify its classes of input data and its class of output data; to describe its purpose; to formulate a header
    Activity: name the function, the classes of input data, the class of output data, and specify its purpose:

    ;; name : in1 in2 … -> out

    ;; to compute … from x1 …

    (define (name x1 x2 …) …)

    Phase: Examples
    Goal: to characterize the input-output relationship via examples
    Activity: create examples of the input-output relationship; make sure there is at least one example per subclass

    Phase: Template
    Goal: to formulate an outline
    Activity: develop a cond-expression with one clause per alternative; add selector expressions to each clause; annotate the body with natural recursions; Test: the self-references in this template and the data definition match!

    Phase: Body
    Goal: to define the function
    Activity: formulate a Scheme expression for each simple cond-line; explain for all other cond-clauses what each natural recursion computes according to the purpose statement

    Phase: Test
    Goal: to discover mistakes (“typos” and logic)
    Activity: apply the function to the inputs of the examples; check that the outputs are as predicted

    The motivating example for structural recursion is Insertion Sort: recursing on the tail of a non-empty list and inserting the head into the sorted subresult.

    \displaystyle  \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{insertsort} :: [\mathit{Integer}] \rightarrow [\mathit{Integer}]} \\ \mathit{insertsort}\;[\,] &= [\,] \\ \mathit{insertsort}\;(a:x) &= \mathit{insert}\;a\;(\mathit{insertsort}\;x) \medskip \\ \multicolumn{2}{@{}l}{\mathit{insert} :: \mathit{Integer} \rightarrow [\mathit{Integer}] \rightarrow [\mathit{Integer}]} \\ \mathit{insert}\;b\;[\,] &= [b] \\ \mathit{insert}\;b\;(a:x) \\ \qquad \mid b \le a &= b : a : x \\ \qquad \mid b > a &= a : \mathit{insert}\;b\;x \end{array}
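    As a runnable aside (not from HtDP, and using the standard library's Data.List.insert rather than the hand-written insert above), the structural-recursion skeleton can be packaged as a fold. The name insertSortF is made up for this sketch:

```haskell
import Data.List (insert)

-- Structural recursion over a list, expressed with foldr:
-- the empty-list case yields [], and the cons case inserts the head
-- into the (already sorted) result of the recursive call on the tail.
insertSortF :: [Integer] -> [Integer]
insertSortF = foldr insert []
```

    For instance, insertSortF [3,1,2] evaluates to [1,2,3]. The fold makes the recipe's template explicit: one argument per alternative of the list datatype.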

    A secondary, more advanced, strategy is to use generative recursion, otherwise known as divide-and-conquer. The skeleton in this design recipe incorporates a test for triviality; in the non-trivial cases, it splits the problem into subproblems, recursively solves the subproblems, and assembles the subresults into an overall result. The motivating example for generative recursion is QuickSort (but not the fast in-place version): dividing a non-empty input list into two parts using the head as the pivot, recursively sorting both parts, and concatenating the results with the pivot in the middle.

    \displaystyle  \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{quicksort} :: [\mathit{Integer}] \rightarrow [\mathit{Integer}]} \\ \mathit{quicksort}\;[\,] & = [\,] \\ \mathit{quicksort}\;(a:x) &= \mathit{quicksort}\;y \mathbin{{+}\!\!\!{+}} [a] \mathbin{{+}\!\!\!{+}} \mathit{quicksort}\;z \\ & \qquad \mathbf{where}\; \begin{array}[t]{@{}ll} y &= [ b \mid b \leftarrow x, b \le a ] \\ z &= [ b \mid b \leftarrow x, b > a ] \end{array} \end{array}

    As far as I can see, no other program structures than structural recursion and generative recursion are considered in HtDP. (Other design recipes are considered, in particular accumulating parameters and imperative features. But these do not determine the gross structure of the resulting program. In fact, I believe that the imperative recipe has been dropped in the Second Edition.)


    My thesis is that HtDP has missed an opportunity to reinforce its core message, that data structure determines program structure. Specifically, I believe that the next design recipe to consider after structural recursion, in which the shape of the program is determined by the shape of the input, should be structural corecursion, in which the shape of the program is determined instead by the shape of the output.

    More concretely, a function that generates “mixed output”—whether that is a union type, or simply a boolean—might be defined by case analysis over the output. A function that generates a record might be composed of subprograms that generate each of the fields of that record. A function that generates a recursive data structure from some input data might be defined with a case analysis as to whether the result is trivial, and for non-trivial cases with recursive calls to generate substructures of the result. HtDP should present explicit design recipes to address these possibilities, as it does for program structures determined by the input data.

    For an example of mixed output, consider a program that may fail, such as division, guarded so as to return an alternative value in that case:

    \displaystyle  \begin{array}{@{}lll} \multicolumn{3}{@{}l}{\mathit{safeDiv} :: \mathit{Integer} \rightarrow \mathit{Integer} \rightarrow \mathsf{Maybe}\;\mathit{Integer}} \\ \mathit{safeDiv}\;x\;y & \mid y == 0 & = \mathit{Nothing} \\ & \mid \mathbf{otherwise} & = \mathit{Just}\;(x \mathbin{\underline{\smash{\mathit{div}}}} y) \end{array}

    The program performs a case analysis, and of course the analysis depends on the input data; but the analysis is not determined by the structure of the input, only its value. So a better explanation of the program structure is that it is determined by the structure of the output data.

    For an example of generating composite output, consider the problem of extracting a date, represented as a record:

    \displaystyle  \mathbf{data}\;\mathit{Date} = \mathit{Date} \{ \mathit{day} :: \mathit{Day}, \mathit{month} :: \mathit{Month}, \mathit{year} :: \mathit{Year} \}

    from a formatted string. The function is naturally structured to match the output type:

    \displaystyle  \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{readDate} :: \mathit{String} \rightarrow \mathit{Date}} \\ \mathit{readDate}\;s & = \mathit{Date}\;\{ \mathit{day} = d, \mathit{month} = m, \mathit{year} = y \} \\ & \qquad \mathbf{where}\; \begin{array}[t]{@{}ll} d & = ... s ... \\ m & = ... s ... \\ y & = ... s ... \end{array} \end{array}

    For an example of corecursion, consider the problem of “zipping” together two input lists to a list of pairs—taking {[1,2,3]} and {[4,5,6]} to {[(1,4),(2,5),(3,6)]}, and for simplicity let’s say pruning the result to the length of the shorter input. One can again solve the problem by case analysis on the input, but the fact that there are two inputs makes that a bit awkward—whether to do case analysis on one list in favour of the other, or to analyse both in parallel. For example, here is the outcome of case analysis on the first input, followed if that is non-empty by a case analysis on the second input:

    \displaystyle  \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{zip} :: [\alpha] \rightarrow [\beta] \rightarrow [(\alpha,\beta)]} \\ \mathit{zip}\;x\;y &= \begin{array}[t]{@{}l} \mathbf{if}\; \mathit{null}\;x \;\mathbf{then}\; [\,] \;\mathbf{else} \\ \qquad \mathbf{if}\; \mathit{null}\;y \;\mathbf{then}\; [\,] \;\mathbf{else} \\ \qquad \qquad (\mathit{head}\;x,\mathit{head}\;y) : \mathit{zip}\;(\mathit{tail}\;x)\;(\mathit{tail}\;y) \end{array} \end{array}

    Case analysis on both inputs would lead to four cases rather than three, which would not be an improvement. One can instead solve the problem by case analysis on the output—and it is arguably more natural to do so, because there is only one output rather than two. When is the output empty? (When either input is empty.) If it isn’t empty, what is the head of the output? (The pair of input heads.) And from what data is the tail of the output recursively constructed? (The pair of input tails.)

    \displaystyle  \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{zip} :: [\alpha] \rightarrow [\beta] \rightarrow [(\alpha,\beta)]} \\ \mathit{zip}\;x\;y &= \begin{array}[t]{@{}l} \mathbf{if}\; \mathit{null}\;x \lor \mathit{null}\;y \;\mathbf{then}\; [\,] \;\mathbf{else} \\ \qquad (\mathit{head}\;x,\mathit{head}\;y) : \mathit{zip}\;(\mathit{tail}\;x)\;(\mathit{tail}\;y) \end{array} \end{array}

    And whereas Insertion Sort is a structural recursion over the input list, inserting elements one by one into a sorted intermediate result, Selection Sort is a structural corecursion towards the output list, repeatedly extracting the minimum remaining element as the next element of the output: When is the output empty? (When the input is empty.) If the output isn’t empty, what is its head? (The minimum of the input.) And from what data is the tail recursively generated? (The input without this minimum element.)

    \displaystyle  \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{selectSort} :: [\mathit{Integer}] \rightarrow [\mathit{Integer}]} \\ \mathit{selectSort}\;x & = \mathbf{if}\; \mathit{null}\;x \;\mathbf{then}\; [\,] \;\mathbf{else}\; a : \mathit{selectSort}\;(x \mathbin{\backslash\!\backslash} [a]) \\ & \qquad \mathbf{where}\; a = \mathit{minimum}\;x \end{array}

    (here, {x \mathbin{\backslash\!\backslash} y} denotes list {x} with the elements of list {y} removed).
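    One way to make the "three questions" concrete is to answer them once in a step function and hand that to unfoldr from Data.List. This is only a sketch; the names zipU and selectSortU are invented here, and delete (which removes the first occurrence of an element) stands in for the list difference used above:

```haskell
import Data.List (delete, unfoldr)

-- Corecursive zip: the step function says when the output is empty
-- (Nothing), and otherwise gives the head of the output together with
-- the seed from which the tail is generated.
zipU :: [a] -> [b] -> [(a, b)]
zipU x y = unfoldr step (x, y)
  where
    step (a : x', b : y') = Just ((a, b), (x', y'))
    step _                = Nothing

-- Corecursive selection sort: the head of the output is the minimum of
-- the seed, and the tail is generated from the seed without that minimum.
selectSortU :: [Integer] -> [Integer]
selectSortU = unfoldr step
  where
    step [] = Nothing
    step x  = Just (a, delete a x) where a = minimum x
```

    Both definitions terminate on finite inputs because each step strictly shrinks the seed.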

    I’m not alone in making this assertion. Norman Ramsey wrote a nice paper On Teaching HtDP; his experience led him to the lesson that

    Last, and rarely, you could design a function’s template around the introduction form for the result type. When I teach [HtDP] again, I will make my students aware of this decision point in the construction of a function’s template: should they use elimination forms, function composition, or an introduction form? They should use elimination forms usually, function composition sometimes, and an introduction form rarely.

    He also elaborates on test coverage:

    Check functional examples to be sure every choice of input is represented. Check functional examples to be sure every choice of output is represented. This activity is especially valuable for functions returning Booleans.

    (his emphasis). We should pay attention to output data structure as well as to input data structure.

    Generative recursion, aka divide-and-conquer

    Only once this dual form of program structure has been explored should students be encouraged to move on to generative recursion, because this exploits both structural recursion and structural corecursion. For example, the QuickSort algorithm that is used as the main motivating example is really structured as a corecursion {\mathit{build}} to construct an intermediate tree, followed by structural recursion {\mathit{flatten}} over that tree to produce the resulting list:

    \displaystyle  \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{quicksort} :: [\mathit{Integer}] \rightarrow [\mathit{Integer}]} \\ \mathit{quicksort} & = \mathit{flatten} \cdot \mathit{build} \medskip \\ \multicolumn{2}{@{}l}{\mathbf{data}\;\mathit{Tree} = \mathit{Empty} \mid \mathit{Node}\;\mathit{Tree}\;\mathit{Integer}\;\mathit{Tree}} \medskip\\ \multicolumn{2}{@{}l}{\mathit{build} :: [\mathit{Integer}] \rightarrow \mathit{Tree}} \\ \mathit{build}\;[\,] & = \mathit{Empty} \\ \mathit{build}\;(a:x) &= \mathit{Node}\;(\mathit{build}\;y)\;a\;(\mathit{build}\;z) \\ & \qquad \mathbf{where}\; \begin{array}[t]{@{}ll} y &= [ b \mid b \leftarrow x, b \le a ] \\ z &= [ b \mid b \leftarrow x, b > a ] \end{array} \medskip \\ \multicolumn{2}{@{}l}{\mathit{flatten} :: \mathit{Tree} \rightarrow [\mathit{Integer}]} \\ \mathit{flatten}\;\mathit{Empty} & = [\,] \\ \mathit{flatten}\;(\mathit{Node}\;t\;a\;u) &= \mathit{flatten}\;t \mathbin{{+}\!\!\!{+}} [a] \mathbin{{+}\!\!\!{+}} \mathit{flatten}\;u \end{array}

    The structure of both functions {\mathit{build}} and {\mathit{flatten}} is determined by the structure of the intermediate {\mathit{Tree}} datatype, the first as structural corecursion and the second as structural recursion. A similar explanation applies to any divide-and-conquer algorithm; for example, MergeSort is another divide-and-conquer sorting algorithm, with the same intermediate tree shape, but this time with a simple splitting phase and all the comparisons in the recombining phase:

    \displaystyle  \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{mergesort} :: [\mathit{Integer}] \rightarrow [\mathit{Integer}]} \\ \mathit{mergesort} & = \mathit{mergeAll} \cdot \mathit{splitUp} \medskip \\ \multicolumn{2}{@{}l}{\mathit{splitUp} :: [\mathit{Integer}] \rightarrow \mathit{Tree}} \\ \mathit{splitUp}\;[\,] & = \mathit{Empty} \\ \mathit{splitUp}\;(a:x) &= \mathit{Node}\;(\mathit{splitUp}\;y)\;a\;(\mathit{splitUp}\;z) \\ & \qquad \mathbf{where}\; (y,z) = \mathit{halve}\;x \medskip \\ \multicolumn{2}{@{}l}{\mathit{halve} :: [\mathit{Integer}] \rightarrow ([\mathit{Integer}],[\mathit{Integer}])} \\ \mathit{halve}\;[\,] & = ( [\,], [\,]) \\ \mathit{halve}\;[a] & = ( [a], [\,]) \\ \mathit{halve}\;(a:b:x) & = (a:y,b:z) \;\mathbf{where}\; (y,z) = \mathit{halve}\;x \medskip \\ \multicolumn{2}{@{}l}{\mathit{mergeAll} :: \mathit{Tree} \rightarrow [\mathit{Integer}]} \\ \mathit{mergeAll}\;\mathit{Empty} & = [\,] \\ \mathit{mergeAll}\;(\mathit{Node}\;t\;a\;u) &= \mathit{merge}\;(\mathit{mergeAll}\;t)\;(\mathit{merge}\;[a]\;(\mathit{mergeAll}\;u)) \medskip \\ \multicolumn{2}{@{}l}{\mathit{merge} :: [\mathit{Integer}] \rightarrow [\mathit{Integer}] \rightarrow [\mathit{Integer}]} \\ \mathit{merge}\;[\,]\;y & = y \\ \mathit{merge}\;x\;[\,] & = x \\ \mathit{merge}\;(a:x)\;(b:y) & = \begin{array}[t]{@{}l@{}l@{}l} \mathbf{if}\; a \le b\; & \mathbf{then}\; & a : \mathit{merge}\;x\;(b:y) \\ & \mathbf{else}\; & b : \mathit{merge}\;(a:x)\;y \end{array} \end{array}

    (Choosing this particular tree type is a bit clunky for MergeSort, because of the two calls to {\mathit{merge}} required in {\mathit{mergeAll}}. It would be neater to use non-empty externally labelled binary trees, with elements at the leaves and none at the branches:

    \displaystyle  \begin{array}{@{}l} \mathbf{data}\;\mathit{Tree}_2 = \mathit{Tip}\;\mathit{Integer} \mid \mathit{Bin}\;\mathit{Tree}_2\;\mathit{Tree}_2 \end{array}

    Then you define the main function to work only for non-empty lists, and provide a separate case for sorting the empty list. This clunkiness is a learning opportunity: to realise the problem, come up with a fix (no node labels in the tree), rearrange the furniture accordingly, then replay the development and compare the results.)
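    Here is one possible replay of that development, as a sketch rather than a definitive version. The names Tree2, mergesort2, splitUp2 and mergeAll2 are invented for this illustration, and the error clause for the empty list is an assumption about how to handle the case that the main function rules out:

```haskell
-- Non-empty, externally labelled binary trees: elements only at the tips.
data Tree2 = Tip Integer | Bin Tree2 Tree2

-- The empty list gets its own case; everything else goes via the tree.
mergesort2 :: [Integer] -> [Integer]
mergesort2 [] = []
mergesort2 x  = mergeAll2 (splitUp2 x)

-- Corecursion towards Tree2. Precondition: the input is non-empty.
-- For inputs of length two or more, both halves are non-empty and shorter.
splitUp2 :: [Integer] -> Tree2
splitUp2 []  = error "splitUp2: empty list"
splitUp2 [a] = Tip a
splitUp2 x   = Bin (splitUp2 y) (splitUp2 z)
  where (y, z) = halve x

halve :: [Integer] -> ([Integer], [Integer])
halve []          = ([], [])
halve [a]         = ([a], [])
halve (a : b : x) = (a : y, b : z) where (y, z) = halve x

-- Structural recursion over Tree2: now only one merge per branch.
mergeAll2 :: Tree2 -> [Integer]
mergeAll2 (Tip a)   = [a]
mergeAll2 (Bin t u) = merge (mergeAll2 t) (mergeAll2 u)

merge :: [Integer] -> [Integer] -> [Integer]
merge [] y = y
merge x [] = x
merge (a : x) (b : y)
  | a <= b    = a : merge x (b : y)
  | otherwise = b : merge (a : x) y
```

    Compared with the labelled-node version, mergeAll2 needs a single call to merge per branch, at the price of the extra top-level case for the empty list.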

    Having identified the two parts, structural recursion and structural corecursion, they may be studied separately; separation of concerns is a crucial lesson in introductory programming. Moreover, the parts may be put together in different ways. The divide-and-conquer pattern is known in the MPC community as a hylomorphism, an unfold to generate a call tree followed by a fold to consume that tree. As the QuickSort example suggests, the tree can always be deforested—it is a virtual data structure. But the converse pattern, of a fold from some structured input to some intermediate value, followed by an unfold to a different structured output, is also interesting—you can see this as a change of structured representation, so I called it a metamorphism. One simple application is to convert a number from an input base (a sequence of digits in that base), via an intermediate representation (the represented number), to an output base (a different sequence of digits). More interesting applications include encoding and data compression algorithms, such as arithmetic coding.
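    The base-conversion example can be sketched as a fold followed by an unfold. The names below are invented for this illustration, and the sketch assumes least-significant-digit-first lists, bases of at least 2, and non-negative digits, so that both phases are a plain foldr and a plain unfoldr:

```haskell
import Data.List (unfoldr)

-- Fold: consume the input digits (least significant first) into a number.
fromDigits :: Integer -> [Integer] -> Integer
fromDigits b = foldr (\d n -> d + b * n) 0

-- Unfold: produce the output digits (least significant first) from a number.
-- Zero is represented by the empty digit list.
toDigits :: Integer -> Integer -> [Integer]
toDigits b = unfoldr step
  where
    step 0 = Nothing
    step n = Just (n `mod` b, n `div` b)

-- Metamorphism: a fold to the intermediate value, then an unfold from it.
convertBase :: Integer -> Integer -> [Integer] -> [Integer]
convertBase from to = toDigits to . fromDigits from
```

    For example, convertBase 2 10 [0,1,0,1] reads binary 1010 (written least significant digit first) and yields [0,1], the decimal digits of ten in the same order.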


    Although I have used Haskell as a notation, nothing above depends on laziness; it would all work as well in ML or Scheme. It is true that the mathematical structures underlying structural recursion and structural corecursion are prettier when you admit infinite data structures—the final coalgebra of the base functor for lists is the datatype of finite and infinite lists, and without admitting the infinite ones some recursive definitions have no solution. But that sophistication is beyond the scope of introductory programming, and it suffices to restrict attention to finite data structures.

    HtDP already stipulates a termination argument in the design recipe for generative recursion; the same kind of argument should be required for structural corecursion (and is easy to make for the sorting examples given above). Of course, structural recursion over finite data structures is necessarily terminating. Laziness is unnecessary for co-programming.

    Structured programming

    HtDP actually draws on a long tradition of relating data structure and program structure, which is a theme dear to my own heart (and indeed, the motto of this blog). Tony Hoare wrote in his Notes on Data Structuring in 1972:

    There are certain close analogies between the methods used for structuring data and the methods for structuring a program which processes that data.

    Fred Brooks put it pithily in The Mythical Man-Month in 1975:

    Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.

    which was modernized by Eric Raymond in his 1997 essay The Cathedral and the Bazaar to:

    Show me your code and conceal your data structures, and I shall continue to be mystified. Show me your data structures, and I won’t usually need your code; it’ll be obvious.

    HtDP credits Jackson Structured Programming as partial inspiration for the design recipe approach. As Jackson wrote, also in 1975:

    The central theme of this book has been the relationship between data and program structures. The data provides a model of the problem environment, and by basing our program structures on data structures we ensure that our programs will be intelligible and easy to maintain.


    The structure of a program must be based on the structures of all of the data it processes.

    (my emphasis). In a retrospective lecture in 2001, he clarified:

    program structure should be dictated by the structure of its input and output data streams

    —the JSP approach was designed for processing sequential streams of input records to similar streams of output records, and the essence of the approach is to identify the structures of each of the data files (input and output) in terms of sequences, selections, and iterations (that is, as regular languages), to refine them all to a common structure that matches all of them simultaneously, and to use that common data structure as the program structure. So even way back in 1975 it was clear that we need to pay attention to the structure of output data as well as to that of input data.


    HtDP apologizes that generative recursion (which it identifies with the field of algorithm design)

    is much more of an ad hoc activity than the data-driven design of structurally recursive functions. Indeed, it is almost better to call it inventing an algorithm than designing one. Inventing an algorithm requires a new insight—a “eureka”.

    It goes on to suggest that mere programmers cannot generally be expected to have such algorithmic insights:

    In practice, new complex algorithms are often developed by mathematicians and mathematical computer scientists; programmers, though, must th[o]roughly understand the underlying ideas so that they can invent the simple algorithms on their own and communicate with scientists about the others.

    I say that this defeatism is a consequence of not following through on the core message, that data structure determines program structure. QuickSort and MergeSort are not ad hoc. Admittedly, they do require some insight in order to identify structure that is not present in either the input or the output. But having identified that structure, there is no further mystery, and no ad-hockery required.

    by jeremygibbons at June 10, 2020 10:42 AM

    June 09, 2020

    Brent Yorgey

    Anyone willing to help me get set up with something like miso?

    For the last few years I’ve been working (off and on) on a teaching language for FP and discrete math. One of the big goals is to build a browser-based IDE which runs completely client-side, so students can code in their browsers without installing anything, and I don’t have to worry about running a server. Thanks to amazing technology like GHCJS, miso, reflex, etc., this seems like it should be entirely doable.

    However, every time I sit down to try building such a thing, I end up getting completely bogged down in details of nix and stack and .cabal files and whatnot, and never even get off the ground. There are usually nice examples of building a new site from scratch, but I can never figure out the right way to incorporate my large amount of existing Haskell code. Should I have one package? Two separate packages for the website and the language implementation? How can I set things up to build either/both a command-line REPL and a web IDE?

    I’m wondering if there is someone experienced with GHCJS and miso who would be willing to help me get things set up. (I’m also open to being convinced that some other combination of technologies would be better for my use case.) I’m imagining a sort of pair-programming session via videoconference. I am pretty flexible in terms of times so don’t worry about whether your time zone matches mine. And if there’s something I can help you with I’m happy to do an exchange.

    If you’re interested and willing, send me an email: byorgey at gmail. Thanks!

    by Brent at June 09, 2020 01:30 PM

    June 08, 2020

    Michael Snoyman

    New book available: Begin Rust

    Miriam (aka @LambdaMom) and I are very happy to announce the release of Begin Rust. We’ve been working on this book since the end of last year, and we’re very excited to release it to the world.

    Begin Rust is a complete guide to programming with the Rust programming language, targeted at new and experienced programmers alike.

    We started working on the content in this book when trying to teach our children some programming. We eventually decided to expand it to a full introduction. We’ve strived to make the content accessible to younger audiences who have less experience with computers. We hope it can also be useful for experienced programmers looking to learn about the Rust programming language.

    The first three chapters are available online if you’d like to see them. We also have a Discourse instance for discussion and a @beginrust Twitter account.

    June 08, 2020 01:00 PM


    Well-Typed Advanced Track at ZuriHac 2020

    Just as last year, we will offer an Advanced Track comprising two (completely independent) workshops at this year’s ZuriHac. This year’s ZuriHac will take place on the weekend of 13–14 June 2020, and it will be an online event. This means the two sessions are going to be streamed live to YouTube and are free to attend for everyone who is interested.

    If you want to ask questions during the sessions or otherwise discuss the workshops in more detail, you should sign up for ZuriHac and join its Discord server if you have not already. More details on registration, the program, and how the setup with YouTube and Discord works will be available via the ZuriHac site.

    The Advanced Track sessions are as follows:

    Andres Löh: Datatype-Generic Programming

    Saturday, 13 June 2020, 1400-1700 CEST (1300-1600 BST)

    On Saturday, Andres Löh will lead a session on Datatype-Generic Programming in Haskell.

    Datatype-Generic programming is a powerful tool that allows the implementation of functions that adapt themselves to a large class of datatypes and can be made available on new datatypes easily by means such as “deriving”.

    In this workshop, we focus on the ideas at the core of two popular approaches to generic programming: GHC.Generics and generics-sop. Both can be difficult to understand at first. We will build simpler versions of both approaches to illustrate some of the design decisions taken. This exploration will lead to a better understanding of the trade-offs and ultimately also make using these libraries easier.

    This presentation will involve various type-level programming concepts such as type families, data kinds, GADTs and higher-rank types. It’s not a requirement to be an expert in using these features, but I will not focus on explaining them in detail either; so having some basic familiarity with the syntax and semantics will be helpful.

    Tobias Dammers: Haskell and Infosec

    Sunday, 14 June 2020, 1400-1700 CEST (1300-1600 BST)

    On Sunday afternoon, Tobias Dammers will lead a session on Haskell and Infosec.

    In this workshop, we will look at Haskell from an information security point of view. How do security vulnerabilities such as SQL Injection (SQLi) or Cross-Site Scripting (XSS) work? How does a hacker exploit them? What can we, as Haskell programmers, do to prevent them, and how can Haskell help us with that? And what principles can we extract from this to develop more security-aware coding habits?

    No knowledge of or experience with information security is required for this course, but participants are expected to have a working knowledge of practical Haskell. If you can write a simple web application with, e.g., Scotty, you should be fine.

    We would be delighted to welcome you at these sessions or to discuss any interesting topics with you at ZuriHac.

    Other courses

    If you cannot make it to ZuriHac but are still interested in our courses or other services, check our Training page, Services page, or just send us an email.

    by andres at June 08, 2020 12:00 AM

    June 07, 2020

    Joachim Breitner

    Managed by an eleven year old

    This weekend I had some pretty good uncle time. My eleven-year-old nephew (after beating me in a fair match of tennis) wanted to play with his Lego Mindstorms set. He never even touches it unless I am around to help him, but then he quite enjoys it. As expected, when I ask him what we should try to build, he tends to come up with completely unrealistic or impossible ideas, and we have to somehow reduce the scope to something doable.

    This time round, inspired by their autonomous vacuum cleaner, he wanted to build something like that. That was convenient, because now I could sell him what I wanted to try to do, namely make the robot follow a wall at more or less constant distance, as a useful first step.

    We mounted the infrared distance sensor at a 45° angle to the front-left, and then I built a very simple control loop: measure the distance, subtract 40, apply a factor, feed that into the “steering” input of the drive action, and repeat. I must admit that I was pretty proud to see just how well that very simple circuit worked: the little robot turned sharp corners in both directions, and otherwise drove nicely parallel and with constant distance to the wall.

    I was satisfied with that achievement and would have happily ended it here before we might be disappointed by more ambitious and then failing goals.

    But my nephew had not forgotten about the vacuum cleaner, and casually asked if I could make the robot draw an outline of its path, and thus of the room, on the display. Ok, where do I begin to explain just how unfeasible that is … yes, it seemed one can draw to the display. But how should the robot know where it is? How far it turns? How far it went? This is programming by connecting boxes, and I would not expect such an interface to allow for complex logic (right?). And I would need trigonometry and stuff.

    But he didn’t quite believe it, and thus got me thinking … indeed, the arithmetic block can evaluate more complex formulas, involving sin  and cos  … so maybe if I can somehow keep track of where the robot is heading … well, let’s give it a try.

    So by dragging boxes and connecting wires, I implemented a simple logic (three variables, x and y for current position, alpha for current heading; in each loop add the “steering” input onto alpha, and add (sin(alpha),cos(alpha)) onto the current position; throw in some linear factors to calibrate; draw pixel at (x, y)). And, to my nephew’s joy and my astonishment, the robot was drawing a curvy line that clearly correlates with the taken path!
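    For readers who prefer text to boxes and wires, the same logic can be sketched in Haskell. This is not the Mindstorms program; the names and the calibration parameters (gain, turnScale, stepScale) are hypothetical stand-ins for the linear factors mentioned above:

```haskell
-- Current position and heading of the robot.
data Pose = Pose { posX :: Double, posY :: Double, heading :: Double }

-- The wall-following control loop: steering is proportional to the
-- deviation of the measured distance from the target of 40.
steeringFor :: Double -> Double -> Double
steeringFor gain distance = gain * (distance - 40)

-- One iteration of dead reckoning: add the (scaled) steering onto the
-- heading, then add (sin alpha, cos alpha) (scaled) onto the position.
advance :: Double -> Double -> Pose -> Double -> Pose
advance turnScale stepScale (Pose x y alpha) steering =
  Pose (x + stepScale * sin alpha') (y + stepScale * cos alpha') alpha'
  where alpha' = alpha + turnScale * steering

-- The pixels to draw, starting at the origin with heading 0, given the
-- sequence of steering inputs.
pathPixels :: Double -> Double -> [Double] -> [(Int, Int)]
pathPixels turnScale stepScale =
  map pixel . scanl (advance turnScale stepScale) (Pose 0 0 0)
  where pixel (Pose x y _) = (round x, round y)
```

    With no steering at all the path is a straight line along the y axis, for example pathPixels 1 1 [0,0] gives [(0,0),(0,1),(0,2)].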

    It wasn’t respecting the angles perfectly, and a square might not properly close, but half an hour earlier I would have actively bet against us pulling this off!

    After a few turns


    I was, I must admit, a bit proud. But not only about the technical nerding, but also about my uncleing: That night, according to his parents, my nephew said that he can’t remember ever being so happy!

    by Joachim Breitner at June 07, 2020 08:21 PM

    June 04, 2020

    Dominic Orchard

    A short exploration of GHC’s instance resolution hiding mistakes from the type checker.

    I was recently working on some Haskell code (for research, with Jack Hughes) and happened to be using a monoid (via the Monoid type class) and I was rushing. I accidentally wrote x `mempty` y instead of x `mappend` y. The code with mempty type checked and compiled, but I quickly noticed some tests giving unexpected results. After a pause, I checked the recent diff and noticed this mistake, but I had to think for a moment about why this mistake was not leading to a type error. I thought this was an interesting little example of how type class instance resolution can sometimes trip you up in Haskell, and how to uncover what is going on. This also points to a need for GHC to explain its instance resolution, something others have thought about; I will briefly mention some links at the end.

    The nub of the problem

    In isolation, my mistake was essentially this:

    whoops :: Monoid d => d -> d -> d
    whoops x y = mempty x y  -- Bug here. Should be "mappend x y"

    So, given two parameters of type d for which there is a monoid structure on d, we use both parameters as arguments to mempty.

    Recall the Monoid type class is essentially:

    class Monoid d where
      mempty :: d
      mappend :: d -> d -> d

    (note that Monoid also has a derived operation mconcat and is now decomposed into Semigroup and Monoid, but I elide that detail here).

    We might naively think that whoops would therefore not type check since we do not know that d is a function type. However, whoops is well-typed and evaluating
    whoops [1,2,3] [4,5,6] returns []. If the code had been as I intended, (using mappend here instead of mempty) then we would expect [1,2,3,4,5,6] according to the usual monoid on lists.

    The reason this is not a type error is because of GHC’s instance resolution and the following provided instance of `Monoid`:

    instance Monoid b => Monoid (a -> b) where
      mempty = \_ -> mempty
      mappend f g = \x -> f x `mappend` g x

    That is, functions are monoids if their codomain is a monoid, with mempty as the constant function returning the mempty element of b and mappend as the pointwise lifting of the monoid on b to the function space.

    In this case, we can dig into what is happening by compiling[1] with -ddump-ds-preopt and looking at GHC’s desugared output before optimisation, where all the type class instances have been resolved. I’ve cleaned up the output a little (mostly renaming):

    whoops :: forall d. Monoid d => d -> d -> d
    whoops = \ (@ d) ($dMonoid :: Monoid d) ->
      let {
        $dMonoid_f :: Monoid (d -> d)
        $dMonoid_f = GHC.Base.$fMonoid-> @ d @ d $dMonoid } in
      let {
        $dMonoid_ff :: Monoid (d -> d -> d)
        $dMonoid_ff = GHC.Base.$fMonoid-> @ (d -> d) @ d $dMonoid_f } in
      \ (x :: d) (y :: d) -> mempty @ (d -> d -> d) $dMonoid_ff x y

    The second line shows whoops has a type parameter (written @ d) and the incoming dictionary $dMonoid representing the Monoid d type class instance (type classes are implemented as data types called dictionaries, and I use the names $dMonoidX for these here).

    Via the explicit type application (of the form @ t for a type term t) we can see in the last line that mempty is being resolved at the type d -> d -> d with the monoid instance Monoid (d -> d -> d) given here by the $dMonoid_ff construction just above. This is in turn derived from the Monoid (d -> d) given by the dictionary construction $dMonoid_f just above that. Thus we have gone twice through the lifting of a monoid to a function space, and so our use of mempty here is:

    mempty @ (d -> d -> d) $dMonoid_ff = \_ -> (\_ -> mempty @ d $dMonoid)

    and therefore

    mempty @ (d -> d -> d) $dMonoid_ff x y = mempty @ d $dMonoid

    That’s why the program type checks and we see the mempty element of the original intended monoid on d when applying whoops to some arguments and evaluating.
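    The instance resolution can also be spelled out in source code. The following sketch assumes the ScopedTypeVariables and TypeApplications extensions; the names whoopsExplicit and unfolded are hypothetical. The first definition pins mempty to the type d -> d -> d explicitly, and the second writes out what the two-fold lifting reduces to.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}

-- mempty is instantiated at the function type d -> d -> d,
-- which is what GHC infers silently in the original program.
whoopsExplicit :: forall d. Monoid d => d -> d -> d
whoopsExplicit x y = mempty @(d -> d -> d) x y

-- The same function after unfolding the function instance twice:
-- both arguments are ignored and the mempty of d is returned.
unfolded :: Monoid d => d -> d -> d
unfolded = \_ -> \_ -> mempty
```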


    I luckily spotted my mistake quite quickly, but this kind of bug can be a confounding experience for beginners. There has been some discussion about extending GHCi with a feature allowing users to ask GHC to explain its instance resolution. Michael Sloan has a nice write-up discussing the idea, and there is a GHC ticket by Icelandjack proposing something similar, which seems like it would work well in this context where you want to ask what the instance resolution was for a particular expression. There are many much more confusing situations possible that get hidden by the implicit nature of instance resolution, so I think this would be a very useful feature, for beginner and expert Haskell programmers alike. And it certainly would have explained this particular error quickly to me without me having to scribble on paper and then check -ddump-ds-preopt to confirm my suspicion.

    Additional: I should also point out that this kind of situation could be avoided if there were ways to scope, import, and even name type class instances. The monoid instance Monoid b => Monoid (a -> b) is very useful, but having it in scope by default as part of base was really the main issue here.

    1 To compile, put this in a file with a main stub, e.g.

    main :: IO ()
    main = return ()

    by dorchard at June 04, 2020 08:16 AM


    Fix-ing regular expressions

    TL;DR: We add variables, let bindings, and explicit recursion via fixed points to classic regular expressions. It turns out that the resulting explicitly recursive, finitely described languages are well suited for analysis and introspection.

    It’s been almost a year since I touched the kleene library, and almost two years since I published it – a good time to write a little about regular expressions.

    I like regular expressions very much. They are a truly declarative way to write down a grammar… as long as the grammar is expressible in regular expressions.

    Matthew Might, David Darais and Daniel Spiewak have written a paper Functional Pearl: Parsing with Derivatives published in ICFP ’11 Proceedings [Might2011] in which regular expressions are extended to handle context-free languages. However, they rely on memoization, and – as structures are infinite – also on reference equality. In short, their approach is not implementable in idiomatic Haskell.1

    There’s another technique that works for a subset of context-free languages. In my opinion, it is very elegant, and it is at least not painfully slow. The result is available on Hackage: the rere library. The idea is to treat regular expressions as a proper programming language, and add the constructions that proper languages should have: variables and recursion.

    This blog post will describe the approach taken by rere in more detail.

    Regular expression recap

    The abstract syntax of a regular expression (over the alphabet of unicode characters) is given by the following “constructors”:

    • Null regexp: \rerenull
    • Empty string: \rereeps
    • Characters: \rerelit{a} , \rerelit{b} etc
    • Concatenation: \rereconcat{\rerevarsub{r}{1}}{\rerevarsub{r}{2}}
    • Alternation: \rerealt{\rerevarsub{r}{1}}{\rerevarsub{r}{2}}
    • Kleene star: \rerestar{\rerevar{r}}

    The above can be translated directly into Haskell:

    data RE
        = Empty
        | Eps
        | Ch Char
        | App RE RE
        | Alt RE RE
        | Star RE

    In the rere implementation, instead of bare Char we use a set of characters, CharSet, as recommended by Owens et al. in Regular-expression derivatives re-examined [Owens2009]. This makes the implementation more efficient, as a common case of character sets is explicitly taken into account. We write them in curly braces: \{\lit{0} \ldots \lit{9}\} .

    We can give declarative semantics to these constructors. These will look like typing rules. A judgment \matches{\rerestr{\ensuremath{\Gamma}}}{\rerevar{r}} denotes that the regular expression \rerevar{r} successfully recognises the string \rerestr{\ensuremath{\Gamma}} .

    For example, the rule for application now looks like:

    \prftree[r]{\rulename{App}} {\matches{\rerestr{\ensuremath{\Gamma_1}}}{\rerevarsub{r}{1}}} {\matches{\rerestr{\ensuremath{\Gamma_2}}}{\rerevarsub{r}{2}}} {\matches{\rerestr{\ensuremath{\Gamma_1\Gamma_2}}}{\rereconcat{\rerevarsub{r}{1}}{\rerevarsub{r}{2}}}}

    This rule states that if \rerevarsub{r}{1} recognises \rerestr{\ensuremath{\Gamma_1}} , and \rerevarsub{r}{2} recognises \rerestr{\ensuremath{\Gamma_2}} , then the concatenation expression \rereconcat{\rerevarsub{r}{1}}\rerespace {\rerevarsub{r}{2}} recognises the concatenated string \rerestr{\ensuremath{\Gamma_1\Gamma_2}} .

    For alternation we have two rules, one for each of the alternatives:

    \begin{aligned} \prftree[r]{\rulename{Alt$_1$}} {\matches{\str{\Gamma}}{\var{r_1}}} {\matches{\str{\Gamma}}{\var{r_1}\cup\var{r_2}}} &\quad& \prftree[r]{\rulename{Alt$_2$}} {\matches{\str{\Gamma}}{\var{r_2}}} {\matches{\str{\Gamma}}{\var{r_1}\cup\var{r_2}}} \end{aligned}

    The rules resemble the structure of non-commutative intuitionistic linear logic, if you are into such stuff. Not only do you have to use everything exactly once; you also have to use everything in order. There are no structural rules: no weakening, no contraction, and not even exchange. I will omit the rest of the rules; look them up (and think about how the rules for the Kleene star would resemble those of the ‘why not’ exponential).

    It’s a good idea to define smart versions of the constructors, which simplify regular expressions as they are created. For example, in the following Semigroup instance for concatenation, <> is a smart version of App:

    instance Semigroup RE where
        -- Empty annihilates
        Empty  <> _     = Empty
        _      <> Empty = Empty
        -- Eps is unit of <>
        Eps    <> r     = r
        r      <> Eps   = r
        -- otherwise use App
        r      <> s     = App r s

    The smart version of Alt is called \/, and the smart version of Star is called star.
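    The post does not show these two smart constructors, so the following is only a sketch of what they might look like, not the code used in rere or kleene. It assumes a derived Eq instance to detect identical alternatives, and the fixity infixr 5 is chosen so that \/ binds less tightly than <>.

```haskell
data RE
  = Empty
  | Eps
  | Ch Char
  | App RE RE
  | Alt RE RE
  | Star RE
  deriving (Eq, Show)

infixr 5 \/

-- Smart alternation: Empty is the unit, and r \/ r collapses to r.
(\/) :: RE -> RE -> RE
Empty \/ r     = r
r     \/ Empty = r
r     \/ s
  | r == s     = r
  | otherwise  = Alt r s

-- Smart star: the star of Empty or Eps matches only the empty string,
-- and starring a star changes nothing.
star :: RE -> RE
star Empty      = Eps
star Eps        = Eps
star r@(Star _) = r
star r          = Star r
```

    The real libraries may apply more rewrites than these (for example, reordering or flattening alternatives).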

    We can check that the simplifications performed by the smart constructors are sound, by using the semantic rules. For example, the simplification Eps <> r = r is justified by the following equivalence of derivation trees:

    \begin{aligned} \prftree[r]{\rulename{App}}% {\prftree[r]{\rulename{Eps}}{\matches\eps\eps}}% {\prfsummary{}{\matches{\str\Gamma}{\var{r}}}}% {\matches{\str\Gamma}{\eps\var{r}}}% \qquad=\qquad \prfsummary{}{\matches{\str\Gamma}{\var{r}}} \end{aligned}

    If string \str\Gamma is matched by \eps\var{r} , then “the match” can be constructed in only one way, by applying the \ruleref{App} rule. Therefore \str\Gamma is also matched by bare \var{r} . If we introduced proof terms, we’d have concrete evidence of the match as terms in this language.

    There is, however, a problem: matching using declarative rules is not practical. At several points in these rules, we have to guess. We have to guess whether we should pick the left or the right branch, or where we should split the string to match a concatenated regular expression. For a practical implementation, we need a syntax-directed approach. Interestingly, we then need just two rules:

    \begin{aligned} \prftree[r]{\rulename{Nullable}}% {\nullable{\var{r}}} {\matches\eps{\var{r}}} &\qquad& \prftree[r]{\rulename{Derivative}}% {\matches{\str\Gamma}{\D\gamma{\var{r}}}} {\matches{\str{\gamma\Gamma}}{\var{r}}}% \end{aligned}

    In the above rules, we use two operations: The decision procedure “nullable” that tells whether a regular expression can recognise the empty string, and a mapping \D{\str\gamma}{\var{r}} that, given a single character \str\gamma and a regular expression \rerevar{r} computes a new regular expression called the derivative of \rerevar{r} with respect to \str\gamma .

    Both operations are quite easy to map to Haskell. The function nullable is defined as a straight-forward recursive function:

    nullable :: RE -> Bool
    nullable Empty     = False
    nullable Eps       = True
    nullable (Ch _)    = False
    nullable (App r s) = nullable r && nullable s
    nullable (Alt r s) = nullable r || nullable s
    nullable (Star _)  = True

    The Brzozowski derivative is best understood by considering the formal language L regular expressions represent:

    \D{\str\gamma}{L} = \{ \str{\Gamma} \mid \str{\gamma}\str{\Gamma} \in L \}

    In Haskell terms: derivative c r matches string str if and only if r matches c : str. From this equivalence, we can more or less directly infer an implementation:

    derivative :: Char -> RE -> RE
    derivative _ Empty       = Empty
    derivative _ Eps         = Empty
    derivative c (Ch x)
        | c == x             = Eps
        | otherwise          = Empty
    derivative c (App r s)
        | nullable r         = derivative c s \/ derivative c r <> s
        | otherwise          =                   derivative c r <> s
    derivative c (Alt r s)   = derivative c r \/ derivative c s
    derivative c r0@(Star r) = derivative c r <> r0

    We could try to show that the declarative and syntax directed systems are equivalent, but I omit it here, because it’s been done often enough in the literature (though probably not in exactly this way and notation).

    We can now watch how a regular expression “evolves” while matching a string. For example, if we take the regular expression \rep{(\lit{a}\lit{b})} , which in code looks like

    ex1 :: RE
    ex1 = star (Ch 'a' <> Ch 'b')

    then the following is how match ex1 "abab" proceeds:

    \begin{reretrace} \reretraceline[]{\rerestr{abab}}{\rerestar{(\rereconcat{\rerelit{a}}{\rerelit{b}})}} \reretraceline[]{\rerestr{bab}}{\rereconcat{\rerelit{b}}{\rerestar{(\rereconcat{\rerelit{a}}{\rerelit{b}})}}} \reretraceline[]{\rerestr{ab}}{\rerestar{(\rereconcat{\rerelit{a}}{\rerelit{b}})}} \reretraceline[]{\rerestr{b}}{\rereconcat{\rerelit{b}}{\rerestar{(\rereconcat{\rerelit{a}}{\rerelit{b}})}}} \reretraceline[]{\rereeps}{\rerestar{(\rereconcat{\rerelit{a}}{\rerelit{b}})}} \end{reretrace}

    We can see that there’s implicitly a small finite state automaton, with two states: an initial state \rerestar{(\rereconcat{\rerelit{a}}{\rerelit{b}})} and secondary state \rereconcat{\rerelit{b}}{\rerestar{(\rereconcat{\rerelit{a}}{\rerelit{b}})}} . This is the approach taken by the kleene package to transform regular expressions into finite state machines. There is an additional character set optimization from Regular-expression derivatives re-examined [Owens2009] by Owens, Reppy and Turon, but in essence, the approach works as follows: Try all possible derivatives, and in the process collect all the states and construct a transition function.2

    The string is accepted as the matching process stops at the \rerestar{(\rereconcat{\rerelit{a}}{\rerelit{b}})} state, which is nullable.
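    Putting the pieces of this section together gives a complete matcher. The sketch below is self-contained; the match function is not shown in the post, so its definition here (fold derivative over the input, then check nullable) is an assumption, as are the simple versions of \/ and star.

```haskell
data RE = Empty | Eps | Ch Char | App RE RE | Alt RE RE | Star RE
  deriving (Eq, Show)

instance Semigroup RE where
  Empty <> _     = Empty
  _     <> Empty = Empty
  Eps   <> r     = r
  r     <> Eps   = r
  r     <> s     = App r s

infixr 5 \/

(\/) :: RE -> RE -> RE
Empty \/ r     = r
r     \/ Empty = r
r     \/ s
  | r == s     = r
  | otherwise  = Alt r s

star :: RE -> RE
star Empty      = Eps
star Eps        = Eps
star r@(Star _) = r
star r          = Star r

nullable :: RE -> Bool
nullable Empty     = False
nullable Eps       = True
nullable (Ch _)    = False
nullable (App r s) = nullable r && nullable s
nullable (Alt r s) = nullable r || nullable s
nullable (Star _)  = True

derivative :: Char -> RE -> RE
derivative _ Empty       = Empty
derivative _ Eps         = Empty
derivative c (Ch x)
  | c == x               = Eps
  | otherwise            = Empty
derivative c (App r s)
  | nullable r           = derivative c s \/ derivative c r <> s
  | otherwise            = derivative c r <> s
derivative c (Alt r s)   = derivative c r \/ derivative c s
derivative c r0@(Star r) = derivative c r <> r0

-- Take the derivative with respect to each character in turn,
-- then accept if the remaining expression is nullable.
match :: RE -> String -> Bool
match r = nullable . foldl (flip derivative) r

ex1 :: RE
ex1 = star (Ch 'a' <> Ch 'b')
```

    With these definitions, match ex1 "abab" is True and match ex1 "aba" is False, following the two-state trace shown above.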

    Variables and let-expressions

    The first new construct we add to regular expressions is the let expression. Let expressions alone do not add any matching power, but they are a prerequisite for allowing recursive expressions.

    We already used meta-variables in the rules in the previous section. Let expressions allow us to internalise this notion. The declarative rule for let expressions is:

    \prftree[r]{\rulename{Let}} {\matches{\str{\Gamma}}{\subst{\var{s}}{\var{x}}{\var{r}}}} {\matches{\str{\Gamma}}{\letin{\var{x}}{\var{r}}{\var{s}}}}

    Here, the notation \subst{\var{s}}{\var{x}}{\var{r}} denotes substituting the variable \rerevar{x} by the regular expression \rerevar{r} in the regular expression \rerevar{s} .

    To have let expressions in our implementation, we need to represent variables and we must be able to perform substitution. My tool of choice for handling variables and substitution in general is the bound library. But for the sake of keeping the blog post self-contained, we’ll define the needed bits inline. We’ll reproduce a simple variant of bound, which amounts to using de Bruijn indices and polymorphic recursion.

    We define our own datatype to represent variables (which is isomorphic to Maybe):

    data Var a
        = B     -- ^ bound
        | F a   -- ^ free
      deriving (Eq, Show, Functor, Foldable, Traversable)

    With this, we can extend regular expressions with let. First we make it a functor, i.e., change RE to RE a, and then also add two new constructors: Var and Let:

    data RE a
        = Empty
        | Eps
        | Ch Char
        | App (RE a) (RE a)
        | Alt (RE a) (RE a)
        | Star (RE a)
        | Var a
        | Let (RE a) (RE (Var a))

    Note that we keep the argument a unchanged in all recursive occurrences of RE, with the exception of the body of the Let, where we use Var a instead, indicating that we can use B in that body to refer to the variable bound by the Let.

    In the actual rere library, the Let (and later Fix) constructors additionally have an irrelevant Name field, which allows us to retain the variable names and use them for pretty-printing. I omit them from the presentation in this blog post.

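    Now, we can write a regular expression with repetitions like \rereletin{\rerevar{r}}{\rerestar{\rerelit{a}}}{\rereconcat{\rerevar{r}}{\rerevar{r}}} instead of \rereconcat{\rerestar{\rerelit{a}}}{\rerestar{\rerelit{a}}} ; or in Haskell: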

    ex2 :: RE Void
    ex2 = Let (star (Ch 'a')) (Var B <> Var B)

    The use of Void as parameter tells us that the expression is closed, i.e., doesn’t contain any free variables.

    We still need to extend nullable and derivative to work with the new constructors. For nullable, we’ll simply pass a function telling whether variables in context are nullable. The existing constructors just pass a context around:

    nullable :: RE Void -> Bool
    nullable = nullable' absurd
    nullable' :: (a -> Bool) -> RE a -> Bool
    nullable' _ Empty     = False
    nullable' _ Eps       = True
    nullable' _ (Ch _)    = False
    nullable' f (App r s) = nullable' f r && nullable' f s
    nullable' f (Alt r s) = nullable' f r || nullable' f s
    nullable' _ (Star _)  = True

    The cases for Var and Let use and extend the context, respectively:

    -- Var: look in the context
    nullable' f (Var a)   = f a
    -- Let: - compute `nullable r`
    --      - extend the context
    --      - continue with `s`
    nullable' f (Let r s) = nullable' (unvar (nullable' f r) f) s

    The unvar function corresponds to maybe, but transported to our Var type:

    unvar :: r -> (a -> r) -> Var a -> r
    unvar b _ B     = b
    unvar _ f (F x) = f x
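    The fragments above can be assembled into one small module. This is a sketch that uses the raw constructors directly (no smart constructors); exNonNull is a hypothetical counterexample added for contrast.

```haskell
import Data.Void (Void, absurd)

data Var a = B | F a

data RE a
  = Empty
  | Eps
  | Ch Char
  | App (RE a) (RE a)
  | Alt (RE a) (RE a)
  | Star (RE a)
  | Var a
  | Let (RE a) (RE (Var a))

unvar :: r -> (a -> r) -> Var a -> r
unvar b _ B     = b
unvar _ f (F x) = f x

-- The type signature is required: the Let case recurses at type RE (Var a).
nullable' :: (a -> Bool) -> RE a -> Bool
nullable' _ Empty     = False
nullable' _ Eps       = True
nullable' _ (Ch _)    = False
nullable' f (App r s) = nullable' f r && nullable' f s
nullable' f (Alt r s) = nullable' f r || nullable' f s
nullable' _ (Star _)  = True
nullable' f (Var a)   = f a
nullable' f (Let r s) = nullable' (unvar (nullable' f r) f) s

-- Closed expressions have no free variables, so the context is absurd.
nullable :: RE Void -> Bool
nullable = nullable' absurd

-- let r = a* in r r
ex2 :: RE Void
ex2 = Let (Star (Ch 'a')) (App (Var B) (Var B))

-- let r = a in r r  (hypothetical example; not nullable)
exNonNull :: RE Void
exNonNull = Let (Ch 'a') (App (Var B) (Var B))
```

    Here nullable ex2 is True because the bound expression a* is nullable, while nullable exNonNull is False.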

    How to extend derivative to cover the new cases requires a bit more thinking. The idea is similar: we want to add to the context whatever we need to know about the variables. The key insight is to replace every Let binding by two Let bindings, one copying the original, and one binding to the derivative of the let-bound variable. Because the number of let bindings changes, we have to carefully re-index variables as we go.

    Therefore, the context for derivative consists of three pieces of information per variable:

    • whether the variable is nullable (we need it for derivative of App),
    • the variable denoting the derivative of the original variable,
    • the re-indexed variable denoting the original value.

    The top-level function derivative :: Char -> RE Void -> RE Void now makes use of a local helper function

    derivative' :: (Eq a, Eq b) => (a -> (Bool, b, b)) -> RE a -> RE b

    which takes this context. Note that, as discussed above, derivative' changes the indices of the variables. However, at the top-level, both a and b are Void, and the environment can be trivially instantiated to the function with empty domain.

    The derivative' case for Var is simple: we just look up the derivative of the Var in the context.

    derivative' f (Var a) = Var (sndOf3 (f a))

    The case for Let is quite interesting:

    derivative' f (Let r s)
        = let_ (fmap (trdOf3 . f) r)       -- rename variables in r
        $ let_ (fmap F (derivative' f r))  -- binding for derivative of r
        $ derivative' (\case
            B   -> (nullable' (fstOf3 . f) r, B, F B)
            F x -> bimap (F . F) (F . F) (f x))
        $ s

    As a formula it looks like:

    \D{\rerestr{c}}{\rereletin{\rerevar{x}}{\rerevar{r}}{\rerevar{s}}} = \begin{rerealignedlet} \rereleteqn{\rerevar{x}}{\rerevar{r}} \rereleteqn{\rerevarsub{x}{\rerestr{c}}}{\D{\rerestr{c}}{\rerevar{r}}} \rereletbody{\D{\rerestr{c}}{\rerevar{s}} \quad\text{where}\quad \D{\rerestr{c}}{\rerevar{x}} = \rerevarsub{x}{\rerestr{c}}} \end{rerealignedlet}

    For our running example \exII or Let (star (Ch 'a')) (Var B <> Var B), we call derivative' recursively with an argument of type RE (Var a), corresponding to the one variable \rerevar{x} , and we get back a RE (Var (Var b)), corresponding to the two variables \rerevar{x} and \rerevarsub{x}{\rerestr{c}} .

    The careful reader will also have noticed the smart constructor let_, which does a number of standard rewritings on the fly (which I explain in a Do you have a problem? Write a compiler! talk). These are justified by the properties of substitution:

    -- let-from-let
    let x = (let y = a in b) in c
    -- ==>
    let y = a; x = b in c

    -- inlining of cheap bindings
    let x = a in b
    -- ==>
    b [ x -> a ] -- when a is cheap, i.e. Empty, Eps, Ch or Var

    -- used once, special case
    let x = a in x
    -- ==>
    a

    -- unused binding
    let x = a in b
    -- ==>
    b -- when x is unused in b

    And importantly, we employ a quick form of common-subexpression elimination (CSE):

    let x = a in f x a
    -- ==>
    let x = a in f x x

    This form of CSE is easy and fast to implement, as we don’t introduce new lets, only consider what we already bound and try to increase sharing.

    It’s time for examples: Recall again ex2 which was defined as \rereletin{\rerevar{r}}{\rerestar{\rerelit{a}}}{\rereconcat{\rerevar{r}}{\rerevar{r}}} or

    ex2 :: RE Void
    ex2 = Let (star (Ch 'a')) (Var B <> Var B)

    Let’s try to observe the match of the string \rerestr{aaa} step by step:

    \begin{reretrace} \reretraceline[]{\rerestr{aaa}}{\rereletin{\rerevar{r}}{\rerestar{\rerelit{a}}}{\rereconcat{\rerevar{r}}{\rerevar{r}}}} \reretraceline[]{\rerestr{aa}}{\rereletin{\rerevar{r}}{\rerestar{\rerelit{a}}}{\rerealt{\rerevar{r}}{\rereconcat{\rerevar{r}}{\rerevar{r}}}}} \reretraceline[]{\rerestr{a}}{\rereletin{\rerevar{r}}{\rerestar{\rerelit{a}}}{\rerealt{\rerevar{r}}{\rereconcat{\rerevar{r}}{\rerevar{r}}}}} \reretraceline[]{\rereeps}{\rereletin{\rerevar{r}}{\rerestar{\rerelit{a}}}{\rerealt{\rerevar{r}}{\rereconcat{\rerevar{r}}{\rerevar{r}}}}} \end{reretrace}

    As our smart constructors are quite smart, the automaton stays in a single state after the first step. The union comes from the derivative of App: as r is nullable, we get derivative 'a' r \/ derivative 'a' r <> r. And as derivative 'a' r = r, we don’t see any additional let bindings.


    Now we are ready for the main topic of the post: recursion. We add one more constructor to our datatype of regular expressions:

    data RE a
        ...
        | Fix (RE (Var a))

    The Fix construct looks similar to Let, except that the bound variable is semantically equivalent to the whole expression. We can unroll each \FIX expression by substituting it into itself:

    \prftree[r]{\rulename{Unroll}} {\matches{\str\Gamma}{\subst{\var{r}}{\var{x}}{\fix{\var{x}}{\var{r}}}}} {\matches{\str\Gamma}{\fix{\var{x}}{\var{r}}}}

    The Fix constructor subsumes the Kleene star, as \rerestar{\rerevar{r}} can now be expressed as \rerefix{\rerevar{x}}{\rerealt{\rereeps}{\rereconcat{\rerevar{r}}{\rerevar{x}}}} , which feels like a very natural definition indeed. For example ex1 previously defined using Kleene star as \rerestar{(\rereconcat{\rerelit{a}}{\rerelit{b}})} could also be re-defined as \rerefix{\rerevar{x}}{\rerealt{\rereeps}{\rereconcat{\rerelit{a}}{\rereconcat{\rerelit{b}}{\rerevar{x}}}}} . That looks like

    ex3 :: RE Void
    ex3 = Fix (Eps \/ Ch 'a' <> Ch 'b' <> Var B)

    in code.

    The problem is now the same as with Let: How to define nullable and derivative? Fortunately, we have most of the required machinery already in place from the addition of Var and Let.

    Nullability of Fix relies on Kleene’s theorem to compute the least fixed point of a monotonic recursive definition, like in Parsing with Derivatives. The idea is to unroll Fix once, and to pretend that the nullability of the recursive occurrence of the bound variable in Fix is False:

    nullable' :: (a -> Bool) -> RE a -> Bool
    nullable' f (Fix r)     = nullable' (unvar False f) r

    In other words, we literally assume that the nullability of the new binding is False, and see what comes out. We don’t need to iterate more than once, as False will flip to True right away, or will never do so even with further unrollings.
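    A minimal sketch of this rule in isolation follows. The type is cut down to the constructors needed here, and the three example expressions (exRight, exLeft, exLoop) are hypothetical names chosen for illustration.

```haskell
import Data.Void (Void, absurd)

data Var a = B | F a

data RE a
  = Empty
  | Eps
  | Ch Char
  | App (RE a) (RE a)
  | Alt (RE a) (RE a)
  | Var a
  | Fix (RE (Var a))

unvar :: r -> (a -> r) -> Var a -> r
unvar b _ B     = b
unvar _ f (F x) = f x

nullable' :: (a -> Bool) -> RE a -> Bool
nullable' _ Empty     = False
nullable' _ Eps       = True
nullable' _ (Ch _)    = False
nullable' f (App r s) = nullable' f r && nullable' f s
nullable' f (Alt r s) = nullable' f r || nullable' f s
nullable' f (Var a)   = f a
-- Unroll once, assuming the recursive occurrence is not nullable.
nullable' f (Fix r)   = nullable' (unvar False f) r

nullable :: RE Void -> Bool
nullable = nullable' absurd

-- fix x = eps | a b x   (right recursion, nullable via eps)
exRight :: RE Void
exRight = Fix (Alt Eps (App (Ch 'a') (App (Ch 'b') (Var B))))

-- fix x = x a | eps     (left recursion, still nullable via eps)
exLeft :: RE Void
exLeft = Fix (Alt (App (Var B) (Ch 'a')) Eps)

-- fix x = a x           (no base case, so never nullable)
exLoop :: RE Void
exLoop = Fix (App (Ch 'a') (Var B))
```

    Note that the left-recursive exLeft poses no problem: the recursive occurrence is looked up in the context rather than unfolded, so the computation terminates.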

    Following a similar idea, our smart constructor fix_ is capable of recognising an Empty fixed point by substituting Empty for the recursive occurrence in the unrolling:

    fix_ :: RE (Var a) -> RE a
    fix_ r | (r >>>= unvar Empty Var) == Empty = Empty

    This works because Empty is a bottom of the language-inclusion lattice (just as False is a bottom of the Bool lattice).

    The extension of derivative is again a bit more involved, but it resembles what we did for Let: As the body \rerevar{r} of a \rereFIX contains self references \rerevar{x} , the derivative of a \rereFIX will also be a \rereFIX . Thus, when we need to compute the derivative of \rerevar{x} , we’ll use \rerevarsub{x}{c} . It is important that not all occurrences of \rerevar{x} in the body of a \rereFIX will turn into references to its derivative (e.g., if they appear to the right of an App, or in a Star), so we need to save the value of \rerevar{x} in a let binding – how fortunate that we just introduced those … Schematically, the transformation looks as follows:

    \D{c}{\fix{\var{x}}{\containing{\var{r}}{\var{x}}}} = \begin{aligned}[t] \LET~&\var{x} = \fix{\var{x_1}}{\containing{\var{r}}{\var{x_1}}} \\ \IN~&\fix{\var{x_c}}{\D{c}{\containing{\var{r}}{\var{x}}}} \quad\text{where}\quad \D{c}{\var{x}} = \var{x_c} \end{aligned}

    In the rest, we will use a shorthand notation for a let binding to a \rereFIX , as in \LET~\var{x} = \fix{\var{x_1}}{\containing{\var{r}}{\var{x_1}}} . We will write such a binding more succinctly as \LET~\var{x} =_{R} \containing{\var{r}}{\var{x}} with the R subscript indicating that the binding is recursive. We prefer this notation over introducing \textbf{letrec} , because in a cascade of \LET expressions, we can have individual bindings being recursive, but we still cannot forward-reference to later bindings.

    Applying the abbreviation to our derivation rule above yields

    \D{c}{\fix{\var{x}}{\containing{\var{r}}{\var{x}}}} = \begin{aligned}[t] \LET~&\var{x} =_{R} \containing{\var{r}}{\var{x}} \\ \IN~&\fix{\var{x_c}}{\D{c}{\containing{\var{r}}{\var{x}}}} \quad\text{where}\quad \D{c}{\var{x}} = \var{x_c} \end{aligned}

    Let’s compare this to the let case, rearranged slightly, to establish the similarity:

    \D{c}{\letin{\var{x}}{\var{r}}{\var{s}}} = \begin{aligned}[t] \LET~&\var{x} = \var{r} \\ \IN~&\letin{\var{x_c}}{\D{c}{\var{r}}}{\D{c}{\var{s}}} \quad\text{where}\quad \D{c}{\var{x}} = \var{x_c} \end{aligned}

    Consequently, the implementation in Haskell also looks similar to the Let case:

    derivative' f r0@(Fix r)
        = let_ (fmap (trdOf3 . f) r0)
        $ fix_
        $ derivative' (\case
            B   -> (nullable' (fstOf3 . f) r0, B, F B)
            F x -> bimap (F . F) (F . F) (f x))
        $ r

    Let’s see how it works in practice. We observe the step-by-step matching of ex3 on abab, which was ex1 defined using a fixed point rather than the Kleene star:

    \begin{reretrace} \reretraceline[]{\rerestr{abab}}{\rerefix{\rerevar{x}}{\rerealt{\rereeps}{\rereconcat{\rerelit{a}}{\rereconcat{\rerelit{b}}{\rerevar{x}}}}}} \reretraceline[]{\rerestr{bab}}{\rereletrecin{\rerevar{x}}{\rerealt{\rereeps}{\rereconcat{\rerelit{a}}{\rereconcat{\rerelit{b}}{\rerevar{x}}}}}{\rereconcat{\rerelit{b}}{\rerevar{x}}}} \reretraceline[]{\rerestr{ab}}{\rerefix{\rerevar{x}}{\rerealt{\rereeps}{\rereconcat{\rerelit{a}}{\rereconcat{\rerelit{b}}{\rerevar{x}}}}}} \reretraceline[]{\rerestr{b}}{\rereletrecin{\rerevar{x}}{\rerealt{\rereeps}{\rereconcat{\rerelit{a}}{\rereconcat{\rerelit{b}}{\rerevar{x}}}}}{\rereconcat{\rerelit{b}}{\rerevar{x}}}} \reretraceline[]{\rereeps}{\rerefix{\rerevar{x}}{\rerealt{\rereeps}{\rereconcat{\rerelit{a}}{\rereconcat{\rerelit{b}}{\rerevar{x}}}}}} \end{reretrace}

    We couldn’t wish for a better outcome. We see the same two-state ping-pong behavior as we got using the Kleene star.

    More examples

    The \rereFIX / Fix is a much more powerful construction than the Kleene star. Let’s look at some examples …


    Probably the simplest non-regular language is some number of \rerestr{a} s followed by the same number of \rerestr{b} s:

    L = \{ \str{a}^n \str{b}^n \mid n \in \mathbb{N} \}

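    We can describe that language using our library, thanks to the presence of fixed points: \rerefix{\rerevar{x}}{\rerealt{\rereeps}{\rereconcat{\rerelit{a}}{\rereconcat{\rerevar{x}}{\rerelit{b}}}}} (note the variable \rerevar{x} in between the literal symbols). Transcribed to Haskell code, this is: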

    ex4 :: RE Void
    ex4 = Fix (Eps \/ Ch 'a' <> Var B <> Ch 'b')

    And we can test the expression on a string in the language, for example "aaaabbbb":

    \begin{reretrace} \reretraceline[]{\rerestr{aaaabbbb}}{\rerefix{\rerevar{x}}{\rerealt{\rereeps}{\rereconcat{\rerelit{a}}{\rereconcat{\rerevar{x}}{\rerelit{b}}}}}} \reretraceline[]{\rerestr{aaabbbb}}{\rereletrecin{\rerevar{x}}{\rerealt{\rereeps}{\rereconcat{\rerelit{a}}{\rereconcat{\rerevar{x}}{\rerelit{b}}}}}{\rereconcat{\rerevar{x}}{\rerelit{b}}}} \reretraceline[]{\rerestr{aabbbb}}{\begin{rerealignedlet}\rereletreceqn{\rerevar{x}}{\rerealt{\rereeps}{\rereconcat{\rerelit{a}}{\rereconcat{\rerevar{x}}{\rerelit{b}}}}}\rereleteqn{\rerevarsub{x}{\rerestr{a}}}{\rereconcat{\rerevar{x}}{\rerelit{b}}}\rereletbody{\rereconcat{\rerevarsub{x}{\rerestr{a}}}{\rerelit{b}}}\end{rerealignedlet}} \reretraceline[]{\rerestr{abbbb}}{\begin{rerealignedlet}\rereletreceqn{\rerevar{x}}{\rerealt{\rereeps}{\rereconcat{\rerelit{a}}{\rereconcat{\rerevar{x}}{\rerelit{b}}}}}\rereleteqn{\rerevarsub{x}{\rerestr{a}}}{\rereconcat{\rerevar{x}}{\rerelit{b}}}\rereleteqn{\rerevarsub{x}{\rerestr{aa}}}{\rereconcat{\rerevarsub{x}{\rerestr{a}}}{\rerelit{b}}}\rereletbody{\rereconcat{\rerevarsub{x}{\rerestr{aa}}}{\rerelit{b}}}\end{rerealignedlet}} \reretraceline[]{\rerestr{bbbb}}{\begin{rerealignedlet}\rereletreceqn{\rerevar{x}}{\rerealt{\rereeps}{\rereconcat{\rerelit{a}}{\rereconcat{\rerevar{x}}{\rerelit{b}}}}}\rereleteqn{\rerevarsub{x}{\rerestr{a}}}{\rereconcat{\rerevar{x}}{\rerelit{b}}}\rereleteqn{\rerevarsub{x}{\rerestr{aa}}}{\rereconcat{\rerevarsub{x}{\rerestr{a}}}{\rerelit{b}}}\rereleteqn{\rerevarsub{x}{\rerestr{aaa}}}{\rereconcat{\rerevarsub{x}{\rerestr{aa}}}{\rerelit{b}}}\rereletbody{\rereconcat{\rerevarsub{x}{\rerestr{aaa}}}{\rerelit{b}}}\end{rerealignedlet}} \reretraceline[]{\rerestr{bbb}}{\rereletin{\rerevarsub{x}{\rerestr{aaab}}}{\rereconcat{\rerelit{b}}{\rerelit{b}}}{\rereconcat{\rerevarsub{x}{\rerestr{aaab}}}{\rerelit{b}}}} \reretraceline[]{\rerestr{bb}}{\rereconcat{\rerelit{b}}{\rerelit{b}}} \reretraceline[]{\rerestr{b}}{\rerelit{b}} \reretraceline[]{\rereeps}{\rereeps} \end{reretrace}

    Now things become more interesting. We can see how in the trace of this not-so-regular expression, we obtain let bindings resembling the stack of a pushdown automaton.

    From the trace one can relatively easily see that if we “forget” one b at the end of the input string, then the “state” b isn’t nullable, so the string won’t be recognized.
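    For readers who want to experiment, here is a self-contained sketch of the whole matcher with Var, Let and Fix. It is not the rere implementation: it passes the character explicitly to derivative', uses plain Let and Fix instead of the let_ and fix_ smart constructors (so intermediate expressions grow and do not look like the traces above), and the helpers app, alt and rename are assumptions made for this sketch. The App and Star cases, which the post does not show for the extended type, rename the untouched subexpression into the new variable scope.

```haskell
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE LambdaCase #-}

import Data.Bifunctor (bimap)
import Data.Void (Void, absurd)

data Var a = B | F a
  deriving (Functor)

data RE a
  = Empty | Eps | Ch Char
  | App (RE a) (RE a) | Alt (RE a) (RE a) | Star (RE a)
  | Var a
  | Let (RE a) (RE (Var a))
  | Fix (RE (Var a))
  deriving (Functor)

unvar :: r -> (a -> r) -> Var a -> r
unvar b _ B     = b
unvar _ f (F x) = f x

fstOf3 :: (a, b, c) -> a
fstOf3 (a, _, _) = a

sndOf3 :: (a, b, c) -> b
sndOf3 (_, b, _) = b

trdOf3 :: (a, b, c) -> c
trdOf3 (_, _, c) = c

-- Simplifying constructors: only the Empty and Eps laws.
app, alt :: RE a -> RE a -> RE a
app Empty _     = Empty
app _     Empty = Empty
app Eps   r     = r
app r     Eps   = r
app r     s     = App r s
alt Empty r     = r
alt r     Empty = r
alt r     s     = Alt r s

nullable' :: (a -> Bool) -> RE a -> Bool
nullable' _ Empty     = False
nullable' _ Eps       = True
nullable' _ (Ch _)    = False
nullable' f (App r s) = nullable' f r && nullable' f s
nullable' f (Alt r s) = nullable' f r || nullable' f s
nullable' _ (Star _)  = True
nullable' f (Var a)   = f a
nullable' f (Let r s) = nullable' (unvar (nullable' f r) f) s
nullable' f (Fix r)   = nullable' (unvar False f) r

-- Re-index an expression that is carried over unchanged.
rename :: (a -> (Bool, b, b)) -> RE a -> RE b
rename f = fmap (trdOf3 . f)

-- Context per variable: (nullable?, derivative variable, renamed original).
derivative' :: Char -> (a -> (Bool, b, b)) -> RE a -> RE b
derivative' _ _ Empty = Empty
derivative' _ _ Eps   = Empty
derivative' c _ (Ch x)
  | c == x    = Eps
  | otherwise = Empty
derivative' c f (App r s)
  | nullable' (fstOf3 . f) r = derivative' c f s `alt` (derivative' c f r `app` rename f s)
  | otherwise                = derivative' c f r `app` rename f s
derivative' c f (Alt r s)   = derivative' c f r `alt` derivative' c f s
derivative' c f r0@(Star r) = derivative' c f r `app` rename f r0
derivative' _ f (Var a)     = Var (sndOf3 (f a))
derivative' c f (Let r s)
  = Let (rename f r)
  $ Let (fmap F (derivative' c f r))
  $ derivative' c (\case
      B   -> (nullable' (fstOf3 . f) r, B, F B)
      F x -> bimap (F . F) (F . F) (f x)) s
derivative' c f r0@(Fix r)
  = Let (rename f r0)
  $ Fix
  $ derivative' c (\case
      B   -> (nullable' (fstOf3 . f) r0, B, F B)
      F x -> bimap (F . F) (F . F) (f x)) r

match :: RE Void -> String -> Bool
match r = nullable' absurd . foldl (\e c -> derivative' c absurd e) r

-- fix x = eps | a x b
ex4 :: RE Void
ex4 = Fix (Alt Eps (App (Ch 'a') (App (Var B) (Ch 'b'))))
```

    With this sketch, match ex4 accepts "aabb" and rejects "aab", in line with the observation above that a missing b leaves a non-nullable state.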

    Left recursion

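    Previously in this post, we have rewritten \rerestar{(\rereconcat{\rerelit{a}}{\rerelit{b}})} as \rerefix{\rerevar{x}}{\rerealt{\rereeps}{\rereconcat{\rerelit{a}}{\rereconcat{\rerelit{b}}{\rerevar{x}}}}} . But another option is to use recursion on the left, i.e., to write \rerefix{\rerevar{x}}{\rerealt{\rereeps}{\rereconcat{\rerevar{x}}{\rereconcat{\rerelit{a}}{\rerelit{b}}}}} instead:

    ex5 :: RE Void
    ex5 = Fix (Eps \/ Var B <> Ch 'a' <> Ch 'b')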

    This automaton works as well. In fact, in some sense it works better than the right-recursive one: we can see (as an artifact of variable naming), that we get the derivatives as output of each step. We do save the original expression in a \rereLET , but as it is unused in the result, our smart constructors will drop it:

    \begin{reretrace} \reretraceline[]{\rerestr{abab}}{\rerefix{\rerevar{x}}{\rerealt{\rereeps}{\rereconcat{\rerevar{x}}{\rereconcat{\rerelit{a}}{\rerelit{b}}}}}} \reretraceline[]{\rerestr{bab}}{\rerefix{\rerevarsub{x}{\rerestr{a}}}{\rerealt{\rerelit{b}}{\rereconcat{\rerevarsub{x}{\rerestr{a}}}{\rereconcat{\rerelit{a}}{\rerelit{b}}}}}} \reretraceline[]{\rerestr{ab}}{\rerefix{\rerevarsub{x}{\rerestr{ab}}}{\rerealt{\rereeps}{\rereconcat{\rerevarsub{x}{\rerestr{ab}}}{\rereconcat{\rerelit{a}}{\rerelit{b}}}}}} \reretraceline[]{\rerestr{b}}{\rerefix{\rerevarsub{x}{\rerestr{aba}}}{\rerealt{\rerelit{b}}{\rereconcat{\rerevarsub{x}{\rerestr{aba}}}{\rereconcat{\rerelit{a}}{\rerelit{b}}}}}} \reretraceline[]{\rereeps}{\rerefix{\rerevarsub{x}{\rerestr{abab}}}{\rerealt{\rereeps}{\rereconcat{\rerevarsub{x}{\rerestr{abab}}}{\rereconcat{\rerelit{a}}{\rerelit{b}}}}}} \end{reretrace}

    Arithmetic expressions

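    Another go-to example of context-free grammars is arithmetic expressions:

    \rereletin{\rerevar{n}}{\rereconcat{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}{\rerestar{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}}}{\rerefix{\rerevar{e}}{\rerealt{\rereconcat{\rerelit{\rerecharpopen}}{\rereconcat{\rerevar{e}}{\rerelit{\rerecharpclose}}}}{\rerealt{\rerevar{n}}{\rerealt{\rereconcat{\rerevar{e}}{\rereconcat{\rerelit{\rerecharplus}}{\rerevar{e}}}}{\rereconcat{\rerevar{e}}{\rereconcat{\rerelit{\rerecharstar}}{\rerevar{e}}}}}}}}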

    The Haskell version is slightly more inconvenient to write due to the use of de Bruijn indices, but otherwise straight-forward:

    ex6 :: RE Void
    ex6 = let_ (Ch "0123456789")
        $ let_ (Var B <> star_ (Var B))
        $ fix_
        $ ch_ '(' <> Var B <> ch_ ')'
        \/ Var (F B)
        \/ Var B <> ch_ '+' <> Var B
        \/ Var B <> ch_ '*' <> Var B

    Here is an (abbreviated) trace of matching the input string \rerestr{1\rerecharstar\rerecharpopen20\rerecharplus3\rerecharpclose} :

    \begin{reretrace} \reretraceline[]{\rerestr{1\rerecharstar\rerecharpopen20\rerecharplus3\rerecharpclose}}{\rereletin{\rerevar{n}}{\rereconcat{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}{\rerestar{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}}}{\rerefix{\rerevar{e}}{\rerealt{\rereconcat{\rerelit{\rerecharpopen}}{\rereconcat{\rerevar{e}}{\rerelit{\rerecharpclose}}}}{\rerealt{\rerevar{n}}{\rerealt{\rereconcat{\rerevar{e}}{\rereconcat{\rerelit{\rerecharplus}}{\rerevar{e}}}}{\rereconcat{\rerevar{e}}{\rereconcat{\rerelit{\rerecharstar}}{\rerevar{e}}}}}}}}} \reretraceline[]{\rerestr{\rerecharstar\rerecharpopen20\rerecharplus3\rerecharpclose}}{\begin{rerealignedlet}\rereleteqn{\rerevar{n}}{\rereconcat{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}{\rerestar{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}}}\rereleteqn{\rerevarsub{n}{\rerestr{1}}}{\rerestar{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}}\rereletreceqn{\rerevar{e}}{\rerealt{\rereconcat{\rerelit{\rerecharpopen}}{\rereconcat{\rerevar{e}}{\rerelit{\rerecharpclose}}}}{\rerealt{\rerevar{n}}{\rerealt{\rereconcat{\rerevar{e}}{\rereconcat{\rerelit{\rerecharplus}}{\rerevar{e}}}}{\rereconcat{\rerevar{e}}{\rereconcat{\rerelit{\rerecharstar}}{\rerevar{e}}}}}}}\rereletbody{\rerefix{\rerevarsub{e}{\rerestr{1}}}{\rerealt{\rerevarsub{n}{\rerestr{1}}}{\rerealt{\rereconcat{\rerevarsub{e}{\rerestr{1}}}{\rereconcat{\rerelit{\rerecharplus}}{\rerevar{e}}}}{\rereconcat{\rerevarsub{e}{\rerestr{1}}}{\rereconcat{\rerelit{\rerecharstar}}{\rerevar{e}}}}}}}\end{rerealignedlet}} 
\reretraceline[]{\rerestr{\rerecharpopen20\rerecharplus3\rerecharpclose}}{\begin{rerealignedlet}\rereleteqn{\rerevar{n}}{\rereconcat{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}{\rerestar{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}}}\rereletreceqn{\rerevar{e}}{\rerealt{\rereconcat{\rerelit{\rerecharpopen}}{\rereconcat{\rerevar{e}}{\rerelit{\rerecharpclose}}}}{\rerealt{\rerevar{n}}{\rerealt{\rereconcat{\rerevar{e}}{\rereconcat{\rerelit{\rerecharplus}}{\rerevar{e}}}}{\rereconcat{\rerevar{e}}{\rereconcat{\rerelit{\rerecharstar}}{\rerevar{e}}}}}}}\rereletbody{\rerefix{\rerevarsub{e}{\rerestr{1\rerecharstar}}}{\rerealt{\rereconcat{\rerevarsub{e}{\rerestr{1\rerecharstar}}}{\rereconcat{\rerelit{\rerecharplus}}{\rerevar{e}}}}{\rerealt{\rerevar{e}}{\rereconcat{\rerevarsub{e}{\rerestr{1\rerecharstar}}}{\rereconcat{\rerelit{\rerecharstar}}{\rerevar{e}}}}}}}\end{rerealignedlet}} &\ \ \ \vdots\\
\reretraceline[]{\rerestr{3\rerecharpclose}}{\begin{rerealignedlet}\rereleteqn{\rerevar{n}}{\rereconcat{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}{\rerestar{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}}}\rereletreceqn{\rerevar{e}}{\rerealt{\rereconcat{\rerelit{\rerecharpopen}}{\rereconcat{\rerevar{e}}{\rerelit{\rerecharpclose}}}}{\rerealt{\rerevar{n}}{\rerealt{\rereconcat{\rerevar{e}}{\rereconcat{\rerelit{\rerecharplus}}{\rerevar{e}}}}{\rereconcat{\rerevar{e}}{\rereconcat{\rerelit{\rerecharstar}}{\rerevar{e}}}}}}}\rereletreceqn{\rerevarsub{e}{\rerestr{20\rerecharplus}}}{\rerealt{\rerevar{e}}{\rerealt{\rereconcat{\rerevarsub{e}{\rerestr{20\rerecharplus}}}{\rereconcat{\rerelit{\rerecharplus}}{\rerevar{e}}}}{\rereconcat{\rerevarsub{e}{\rerestr{20\rerecharplus}}}{\rereconcat{\rerelit{\rerecharstar}}{\rerevar{e}}}}}}\rereletreceqn{\rerevarsub{e}{\rerestr{\rerecharpopen20\rerecharplus}}}{\rerealt{\rereconcat{\rerevarsub{e}{\rerestr{20\rerecharplus}}}{\rerelit{\rerecharpclose}}}{\rerealt{\rereconcat{\rerevarsub{e}{\rerestr{\rerecharpopen20\rerecharplus}}}{\rereconcat{\rerelit{\rerecharplus}}{\rerevar{e}}}}{\rereconcat{\rerevarsub{e}{\rerestr{\rerecharpopen20\rerecharplus}}}{\rereconcat{\rerelit{\rerecharstar}}{\rerevar{e}}}}}}\rereletbody{\rerefix{\rerevarsub{e}{\rerestr{1\rerecharstar\rerecharpopen20\rerecharplus}}}{\rerealt{\rereconcat{\rerevarsub{e}{\rerestr{1\rerecharstar\rerecharpopen20\rerecharplus}}}{\rereconcat{\rerelit{\rerecharplus}}{\rerevar{e}}}}{\rerealt{\rerevarsub{e}{\rerestr{\rerecharpopen20\rerecharplus}}}{\rereconcat{\rerevarsub{e}{\rerestr{1\rerecharstar\rerecharpopen20\rerecharplus}}}{\rereconcat{\rerelit{\rerecharstar}}{\rerevar{e}}}}}}}\end{rerealignedlet}} 
\reretraceline[]{\rerestr{\rerecharpclose}}{\begin{rerealignedlet}\rereleteqn{\rerevar{n}}{\rereconcat{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}{\rerestar{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}}}\rereleteqn{\rerevarsub{n}{\rerestr{3}}}{\rerestar{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}}\rereletreceqn{\rerevar{e}}{\rerealt{\rereconcat{\rerelit{\rerecharpopen}}{\rereconcat{\rerevar{e}}{\rerelit{\rerecharpclose}}}}{\rerealt{\rerevar{n}}{\rerealt{\rereconcat{\rerevar{e}}{\rereconcat{\rerelit{\rerecharplus}}{\rerevar{e}}}}{\rereconcat{\rerevar{e}}{\rereconcat{\rerelit{\rerecharstar}}{\rerevar{e}}}}}}}\rereletreceqn{\rerevarsub{e}{\rerestr{3}}}{\rerealt{\rerevarsub{n}{\rerestr{3}}}{\rerealt{\rereconcat{\rerevarsub{e}{\rerestr{3}}}{\rereconcat{\rerelit{\rerecharplus}}{\rerevar{e}}}}{\rereconcat{\rerevarsub{e}{\rerestr{3}}}{\rereconcat{\rerelit{\rerecharstar}}{\rerevar{e}}}}}}\rereletreceqn{\rerevarsub{e}{\rerestr{20\rerecharplus3}}}{\rerealt{\rerevarsub{e}{\rerestr{3}}}{\rerealt{\rereconcat{\rerevarsub{e}{\rerestr{20\rerecharplus3}}}{\rereconcat{\rerelit{\rerecharplus}}{\rerevar{e}}}}{\rereconcat{\rerevarsub{e}{\rerestr{20\rerecharplus3}}}{\rereconcat{\rerelit{\rerecharstar}}{\rerevar{e}}}}}}\rereletreceqn{\rerevarsub{e}{\rerestr{\rerecharpopen20\rerecharplus3}}}{\rerealt{\rereconcat{\rerevarsub{e}{\rerestr{20\rerecharplus3}}}{\rerelit{\rerecharpclose}}}{\rerealt{\rereconcat{\rerevarsub{e}{\rerestr{\rerecharpopen20\rerecharplus3}}}{\rereconcat{\rerelit{\rerecharplus}}{\rerevar{e}}}}{\rereconcat{\rerevarsub{e}{\rerestr{\rerecharpopen20\rerecharplus3}}}{\rereconcat{\rerelit{\rerecharstar}}{\rerevar{e}}}}}}\rereletbody{\rerefix{\rerevarsub{e}{\rerestr{1\rerecharstar\rerecharpopen20\rerecharplus3}}}{\rerealt{\rereconcat{\rerevarsub{e}{\rerestr{1\rerecharstar\rerecharpopen20\rerecharplus3}}}{\rereconcat{\rerelit{\rerecharplus}}{\rerevar{e}}}}{\rerealt{\rerevarsub{e}{\rerestr{\rerecharpopen20\rerecharplus3}}}{\rereconcat{\rerevarsub{e}{\re
restr{1\rerecharstar\rerecharpopen20\rerecharplus3}}}{\rereconcat{\rerelit{\rerecharstar}}{\rerevar{e}}}}}}}\end{rerealignedlet}} \reretraceline[]{\rereeps}{\begin{rerealignedlet}\rereleteqn{\rerevar{n}}{\rereconcat{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}{\rerestar{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}}}\rereletreceqn{\rerevar{e}}{\rerealt{\rereconcat{\rerelit{\rerecharpopen}}{\rereconcat{\rerevar{e}}{\rerelit{\rerecharpclose}}}}{\rerealt{\rerevar{n}}{\rerealt{\rereconcat{\rerevar{e}}{\rereconcat{\rerelit{\rerecharplus}}{\rerevar{e}}}}{\rereconcat{\rerevar{e}}{\rereconcat{\rerelit{\rerecharstar}}{\rerevar{e}}}}}}}\rereletreceqn{\rerevarsub{e}{\rerestr{\rerecharpopen20\rerecharplus3\rerecharpclose}}}{\rerealt{\rereeps}{\rerealt{\rereconcat{\rerevarsub{e}{\rerestr{\rerecharpopen20\rerecharplus3\rerecharpclose}}}{\rereconcat{\rerelit{\rerecharplus}}{\rerevar{e}}}}{\rereconcat{\rerevarsub{e}{\rerestr{\rerecharpopen20\rerecharplus3\rerecharpclose}}}{\rereconcat{\rerelit{\rerecharstar}}{\rerevar{e}}}}}}\rereletbody{\rerefix{\rerevarsub{e}{\rerestr{1\rerecharstar\rerecharpopen20\rerecharplus3\rerecharpclose}}}{\rerealt{\rereconcat{\rerevarsub{e}{\rerestr{1\rerecharstar\rerecharpopen20\rerecharplus3\rerecharpclose}}}{\rereconcat{\rerelit{\rerecharplus}}{\rerevar{e}}}}{\rerealt{\rerevarsub{e}{\rerestr{\rerecharpopen20\rerecharplus3\rerecharpclose}}}{\rereconcat{\rerevarsub{e}{\rerestr{1\rerecharstar\rerecharpopen20\rerecharplus3\rerecharpclose}}}{\rereconcat{\rerelit{\rerecharstar}}{\rerevar{e}}}}}}}\end{rerealignedlet}} \end{reretrace}

    One can see that the final state is nullable and the word is therefore accepted: one of the options in the final \rereFIX is \rerevarsub{e}{\rerestr{\rerecharpopen20\rerecharplus3\rerecharpclose}} , which is nullable in itself because it is a union containing \rereeps .

    The other two options are “continuations” starting with \rerestr{\rerecharplus} or \rerestr{\rerecharstar} , as the arithmetic expression could indeed continue with these two characters.
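
    The acceptance check used above (take the derivative one character at a time, then test whether the final state is nullable) can be illustrated on plain regular expressions. The following is only a minimal sketch, not the library's implementation: it leaves out Let and Fix entirely, and the names RE, nullable, derivative and match are chosen here for illustration.

```haskell
-- Plain regular expressions only; the recursive constructors are left out.
data RE
    = Null          -- matches nothing
    | Eps           -- matches only the empty string
    | Ch Char       -- matches a single character
    | App RE RE     -- concatenation
    | Alt RE RE     -- union
    | Star RE       -- Kleene star
  deriving (Eq, Show)

-- | Does the expression accept the empty string?
nullable :: RE -> Bool
nullable Null      = False
nullable Eps       = True
nullable (Ch _)    = False
nullable (App r s) = nullable r && nullable s
nullable (Alt r s) = nullable r || nullable s
nullable (Star _)  = True

-- | Brzozowski derivative: what remains to be matched after consuming @c@.
derivative :: Char -> RE -> RE
derivative _ Null      = Null
derivative _ Eps       = Null
derivative c (Ch x)    = if c == x then Eps else Null
derivative c (App r s)
    | nullable r       = Alt (App (derivative c r) s) (derivative c s)
    | otherwise        = App (derivative c r) s
derivative c (Alt r s) = Alt (derivative c r) (derivative c s)
derivative c (Star r)  = App (derivative c r) (Star r)

-- | A word is accepted when the state left after all characters is nullable.
match :: RE -> String -> Bool
match r = nullable . foldl (flip derivative) r
```

    For instance, with ab = App (Ch 'a') (Star (Ch 'b')), match ab "abb" holds while match ab "ba" does not. A union is nullable as soon as one of its branches is, which is the situation described for the final state above.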

    Conversion from context-free grammars

    Can all context-free languages be expressed in this framework? Is there some algorithm to rewrite a usual context-free grammar into the formalism presented here? The answer to both these questions is yes.

    For example, the following non-ambiguous grammar for arithmetic expressions

    \begin{rerecfg} \rerecfgproduction{\rerevar{digit}}{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}} \rerecfgproduction{\rerevar{digits}}{\rerevar{digit}\,\rerestar{\rerevar{digit}}} \rerecfgproduction{\rerevar{term}}{\rerealt{\rerevar{digits}}{\rerelit{\rerecharpopen{}}\rerevar{expr}\rerelit{\rerecharpclose{}}}} \rerecfgproduction{\rerevar{mult}}{\rerealt{\rerevar{term}\rerelit{\rerecharplus{}}\rerevar{mult}}{\rerevar{term}}} \rerecfgproduction{\rerevar{expr}}{\rerealt{\rerevar{mult}\rerelit{\rerecharstar{}}\rerevar{expr}}{\rerevar{mult}}} \end{rerecfg}

    can be converted into the following “recursive regular expression”:

    \begin{rerealignedlet}\rereleteqn{\rerevar{digit}}{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}\rereleteqn{\rerevar{digits}}{\rerevar{digit}\,\rerestar{\rerevar{digit}}}\rereletbody{\rerefix{\rerevar{expr}}{\begin{rerealignedlet}\rereleteqn{\rerevar{term}}{\rerealt{\rerevar{digits}}{\rerelit{\rerecharpopen{}}\rerevar{expr}\rerelit{\rerecharpclose{}}}}\rereletreceqn{\rerevar{mult}}{\rerealt{\rerevar{term}\rerelit{\rerecharplus{}}\rerevar{mult}}{\rerevar{term}}}\rereletbody{\rerealt{\rerevar{mult}\rerelit{\rerecharstar{}}\rerevar{expr}}{\rerevar{mult}}}\end{rerealignedlet}}}\end{rerealignedlet}


    And it works – it’s fascinating to see how the “state expression” evolves during a match:

    \begin{reretrace} \reretraceline[]{\rerestr{1\rerecharstar{}\rerecharpopen{}20\rerecharplus{}3\rerecharpclose{}}}{\begin{rerealignedlet}\rereleteqn{\rerevar{digit}}{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}\rereleteqn{\rerevar{digits}}{\rerevar{digit}\,\rerestar{\rerevar{digit}}}\rereletbody{\rerefix{\rerevar{expr}}{\begin{rerealignedlet}\rereleteqn{\rerevar{term}}{\rerealt{\rerevar{digits}}{\rerelit{\rerecharpopen{}}\rerevar{expr}\rerelit{\rerecharpclose{}}}}\rereletreceqn{\rerevar{mult}}{\rerealt{\rerevar{term}\rerelit{\rerecharplus{}}\rerevar{mult}}{\rerevar{term}}}\rereletbody{\rerealt{\rerevar{mult}\rerelit{\rerecharstar{}}\rerevar{expr}}{\rerevar{mult}}}\end{rerealignedlet}}}\end{rerealignedlet}} \reretraceline[]{\rerestr{\rerecharstar{}\rerecharpopen{}20\rerecharplus{}3\rerecharpclose{}}}{\begin{rerealignedlet}\rereleteqn{\rerevar{digits}}{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}\rerestar{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}}\rereleteqn{\rerevarsub{digits}{\rerestr{1}}}{\rerestar{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}}\rereletreceqn{\rerevar{expr}}{\begin{rerealignedlet}\rereleteqn{\rerevar{term}}{\rerealt{\rerevar{digits}}{\rerelit{\rerecharpopen{}}\rerevar{expr}\rerelit{\rerecharpclose{}}}}\rereletreceqn{\rerevar{mult}}{\rerealt{\rerevar{term}\rerelit{\rerecharplus{}}\rerevar{mult}}{\rerevar{term}}}\rereletbody{\rerealt{\rerevar{mult}\rerelit{\rerecharstar{}}\rerevar{expr}}{\rerevar{mult}}}\end{rerealignedlet}}\rereleteqn{\rerevarsub{term}{\rerestr{}1}}{\rerealt{\rerevar{digits}}{\rerelit{\rerecharpopen{}}\rerevar{expr}\rerelit{\rerecharpclose{}}}}\rereletreceqn{\rerevarsub{mult}{\rerestr{}1}}{\rerealt{\rerevarsub{term}{\rerestr{}1}\rerelit{\rerecharplus{}}\rerevarsub{mult}{\rerestr{}1}}{\rerevarsub{term}{\rerestr{}1}}}\rereleteqn{\rerevarsub{mult}{\rerestr{1}}}{\rerealt{\rerevarsub{digits}{\rerestr{1}}\rerelit{\rerecharplus{}}\rerevarsub{mult}{\rerestr{}1}}{\rerevarsub{digits}{\rerestr{1}}}}\rerel
etbody{\rerealt{\rerevarsub{mult}{\rerestr{1}}\rerelit{\rerecharstar{}}\rerevar{expr}}{\rerevarsub{mult}{\rerestr{1}}}}\end{rerealignedlet}} &\ \ \ \vdots\\ %\reretraceline[]{\rerestr{\rerecharpopen{}20\rerecharplus{}3\rerecharpclose{}}}{\rereletin{\rerevar{digits}}{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}\rerestar{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}}{\rerefix{\rerevar{expr}}{\begin{rerealignedlet}\rereleteqn{\rerevar{term}}{\rerealt{\rerevar{digits}}{\rerelit{\rerecharpopen{}}\rerevar{expr}\rerelit{\rerecharpclose{}}}}\rereletreceqn{\rerevar{mult}}{\rerealt{\rerevar{term}\rerelit{\rerecharplus{}}\rerevar{mult}}{\rerevar{term}}}\rereletbody{\rerealt{\rerevar{mult}\rerelit{\rerecharstar{}}\rerevar{expr}}{\rerevar{mult}}}\end{rerealignedlet}}}} %\reretraceline[]{\rerestr{20\rerecharplus{}3\rerecharpclose{}}}{\begin{rerealignedlet}\rereleteqn{\rerevar{digits}}{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}\rerestar{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}}\rereletreceqn{\rerevar{expr}}{\begin{rerealignedlet}\rereleteqn{\rerevar{term}}{\rerealt{\rerevar{digits}}{\rerelit{\rerecharpopen{}}\rerevar{expr}\rerelit{\rerecharpclose{}}}}\rereletreceqn{\rerevar{mult}}{\rerealt{\rerevar{term}\rerelit{\rerecharplus{}}\rerevar{mult}}{\rerevar{term}}}\rereletbody{\rerealt{\rerevar{mult}\rerelit{\rerecharstar{}}\rerevar{expr}}{\rerevar{mult}}}\end{rerealignedlet}}\rereleteqn{\rerevarsub{term}{\rerestr{}1}}{\rerealt{\rerevar{digits}}{\rerelit{\rerecharpopen{}}\rerevar{expr}\rerelit{\rerecharpclose{}}}}\rereleteqn{\rerevarsub{term}{\rerestr{\rerecharpopen{}}}}{\rerevar{expr}\rerelit{\rerecharpclose{}}}\rereletreceqn{\rerevarsub{mult}{\rerestr{}1}}{\rerealt{\rerevarsub{term}{\rerestr{}1}\rerelit{\rerecharplus{}}\rerevarsub{mult}{\rerestr{}1}}{\rerevarsub{term}{\rerestr{}1}}}\rereleteqn{\rerevarsub{mult}{\rerestr{\rerecharpopen{}}}}{\rerealt{\rerevarsub{term}{\rerestr{\rerecharpopen{}}}\rerelit{\rerecharplus{}}\rerevarsub{mult}{\rerestr{}
1}}{\rerevarsub{term}{\rerestr{\rerecharpopen{}}}}}\rereletbody{\rerealt{\rerevarsub{mult}{\rerestr{\rerecharpopen{}}}\rerelit{\rerecharstar{}}\rerevar{expr}}{\rerevarsub{mult}{\rerestr{\rerecharpopen{}}}}}\end{rerealignedlet}} %\reretraceline[]{\rerestr{0\rerecharplus{}3\rerecharpclose{}}}{\begin{rerealignedlet}\rereleteqn{\rerevar{digits}}{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}\rerestar{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}}\rereleteqn{\rerevarsub{digits}{\rerestr{2}}}{\rerestar{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}}\rereletreceqn{\rerevar{expr}}{\begin{rerealignedlet}\rereleteqn{\rerevar{term}}{\rerealt{\rerevar{digits}}{\rerelit{\rerecharpopen{}}\rerevar{expr}\rerelit{\rerecharpclose{}}}}\rereletreceqn{\rerevar{mult}}{\rerealt{\rerevar{term}\rerelit{\rerecharplus{}}\rerevar{mult}}{\rerevar{term}}}\rereletbody{\rerealt{\rerevar{mult}\rerelit{\rerecharstar{}}\rerevar{expr}}{\rerevar{mult}}}\end{rerealignedlet}}\rereleteqn{\rerevarsub{term}{\rerestr{}1}}{\rerealt{\rerevar{digits}}{\rerelit{\rerecharpopen{}}\rerevar{expr}\rerelit{\rerecharpclose{}}}}\rereletreceqn{\rerevarsub{mult}{\rerestr{}1}}{\rerealt{\rerevarsub{term}{\rerestr{}1}\rerelit{\rerecharplus{}}\rerevarsub{mult}{\rerestr{}1}}{\rerevarsub{term}{\rerestr{}1}}}\rereleteqn{\rerevarsub{mult}{\rerestr{2}}}{\rerealt{\rerevarsub{digits}{\rerestr{2}}\rerelit{\rerecharplus{}}\rerevarsub{mult}{\rerestr{}1}}{\rerevarsub{digits}{\rerestr{2}}}}\rereleteqn{\rerevarsub{expr}{\rerestr{2}}}{\rerealt{\rerevarsub{mult}{\rerestr{2}}\rerelit{\rerecharstar{}}\rerevar{expr}}{\rerevarsub{mult}{\rerestr{2}}}}\rereleteqn{\rerevarsub{term}{\rerestr{\rerecharpopen{}2}}}{\rerevarsub{expr}{\rerestr{2}}\rerelit{\rerecharpclose{}}}\rereleteqn{\rerevarsub{mult}{\rerestr{\rerecharpopen{}2}}}{\rerealt{\rerevarsub{term}{\rerestr{\rerecharpopen{}2}}\rerelit{\rerecharplus{}}\rerevarsub{mult}{\rerestr{}1}}{\rerevarsub{term}{\rerestr{\rerecharpopen{}2}}}}\rereletbody{\rerealt{\rerevarsub{mult}{\re
restr{\rerecharpopen{}2}}\rerelit{\rerecharstar{}}\rerevar{expr}}{\rerevarsub{mult}{\rerestr{\rerecharpopen{}2}}}}\end{rerealignedlet}} %\reretraceline[]{\rerestr{\rerecharplus{}3\rerecharpclose{}}}{\begin{rerealignedlet}\rereleteqn{\rerevar{digits}}{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}\rerestar{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}}\rereleteqn{\rerevarsub{digits}{\rerestr{0}}}{\rerestar{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}}\rereletreceqn{\rerevar{expr}}{\begin{rerealignedlet}\rereleteqn{\rerevar{term}}{\rerealt{\rerevar{digits}}{\rerelit{\rerecharpopen{}}\rerevar{expr}\rerelit{\rerecharpclose{}}}}\rereletreceqn{\rerevar{mult}}{\rerealt{\rerevar{term}\rerelit{\rerecharplus{}}\rerevar{mult}}{\rerevar{term}}}\rereletbody{\rerealt{\rerevar{mult}\rerelit{\rerecharstar{}}\rerevar{expr}}{\rerevar{mult}}}\end{rerealignedlet}}\rereleteqn{\rerevarsub{term}{\rerestr{}1}}{\rerealt{\rerevar{digits}}{\rerelit{\rerecharpopen{}}\rerevar{expr}\rerelit{\rerecharpclose{}}}}\rereletreceqn{\rerevarsub{mult}{\rerestr{}1}}{\rerealt{\rerevarsub{term}{\rerestr{}1}\rerelit{\rerecharplus{}}\rerevarsub{mult}{\rerestr{}1}}{\rerevarsub{term}{\rerestr{}1}}}\rereleteqn{\rerevarsub{mult}{\rerestr{20}}}{\rerealt{\rerevarsub{digits}{\rerestr{0}}\rerelit{\rerecharplus{}}\rerevarsub{mult}{\rerestr{}1}}{\rerevarsub{digits}{\rerestr{0}}}}\rereleteqn{\rerevarsub{expr}{\rerestr{20}}}{\rerealt{\rerevarsub{mult}{\rerestr{20}}\rerelit{\rerecharstar{}}\rerevar{expr}}{\rerevarsub{mult}{\rerestr{20}}}}\rereleteqn{\rerevarsub{term}{\rerestr{\rerecharpopen{}20}}}{\rerevarsub{expr}{\rerestr{20}}\rerelit{\rerecharpclose{}}}\rereleteqn{\rerevarsub{mult}{\rerestr{\rerecharpopen{}20}}}{\rerealt{\rerevarsub{term}{\rerestr{\rerecharpopen{}20}}\rerelit{\rerecharplus{}}\rerevarsub{mult}{\rerestr{}1}}{\rerevarsub{term}{\rerestr{\rerecharpopen{}20}}}}\rereletbody{\rerealt{\rerevarsub{mult}{\rerestr{\rerecharpopen{}20}}\rerelit{\rerecharstar{}}\rerevar{expr}}{\rerevarsub{mult}
{\rerestr{\rerecharpopen{}20}}}}\end{rerealignedlet}} %\reretraceline[]{\rerestr{3\rerecharpclose{}}}{\begin{rerealignedlet}\rereleteqn{\rerevar{digits}}{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}\rerestar{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}}\rereletreceqn{\rerevar{expr}}{\begin{rerealignedlet}\rereleteqn{\rerevar{term}}{\rerealt{\rerevar{digits}}{\rerelit{\rerecharpopen{}}\rerevar{expr}\rerelit{\rerecharpclose{}}}}\rereletreceqn{\rerevar{mult}}{\rerealt{\rerevar{term}\rerelit{\rerecharplus{}}\rerevar{mult}}{\rerevar{term}}}\rereletbody{\rerealt{\rerevar{mult}\rerelit{\rerecharstar{}}\rerevar{expr}}{\rerevar{mult}}}\end{rerealignedlet}}\rereleteqn{\rerevarsub{term}{\rerestr{}1}}{\rerealt{\rerevar{digits}}{\rerelit{\rerecharpopen{}}\rerevar{expr}\rerelit{\rerecharpclose{}}}}\rereletreceqn{\rerevarsub{mult}{\rerestr{}1}}{\rerealt{\rerevarsub{term}{\rerestr{}1}\rerelit{\rerecharplus{}}\rerevarsub{mult}{\rerestr{}1}}{\rerevarsub{term}{\rerestr{}1}}}\rereleteqn{\rerevarsub{expr}{\rerestr{20\rerecharplus{}}}}{\rerealt{\rerevarsub{mult}{\rerestr{}1}\rerelit{\rerecharstar{}}\rerevar{expr}}{\rerevarsub{mult}{\rerestr{}1}}}\rereleteqn{\rerevarsub{term}{\rerestr{\rerecharpopen{}20\rerecharplus{}}}}{\rerevarsub{expr}{\rerestr{20\rerecharplus{}}}\rerelit{\rerecharpclose{}}}\rereleteqn{\rerevarsub{mult}{\rerestr{\rerecharpopen{}20\rerecharplus{}}}}{\rerealt{\rerevarsub{term}{\rerestr{\rerecharpopen{}20\rerecharplus{}}}\rerelit{\rerecharplus{}}\rerevarsub{mult}{\rerestr{}1}}{\rerevarsub{term}{\rerestr{\rerecharpopen{}20\rerecharplus{}}}}}\rereletbody{\rerealt{\rerevarsub{mult}{\rerestr{\rerecharpopen{}20\rerecharplus{}}}\rerelit{\rerecharstar{}}\rerevar{expr}}{\rerevarsub{mult}{\rerestr{\rerecharpopen{}20\rerecharplus{}}}}}\end{rerealignedlet}} 
\reretraceline[]{\rerestr{\rerecharpclose{}}}{\begin{rerealignedlet}\rereleteqn{\rerevar{digits}}{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}\rerestar{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}}\rereleteqn{\rerevarsub{digits}{\rerestr{3}}}{\rerestar{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}}\rereletreceqn{\rerevar{expr}}{\begin{rerealignedlet}\rereleteqn{\rerevar{term}}{\rerealt{\rerevar{digits}}{\rerelit{\rerecharpopen{}}\rerevar{expr}\rerelit{\rerecharpclose{}}}}\rereletreceqn{\rerevar{mult}}{\rerealt{\rerevar{term}\rerelit{\rerecharplus{}}\rerevar{mult}}{\rerevar{term}}}\rereletbody{\rerealt{\rerevar{mult}\rerelit{\rerecharstar{}}\rerevar{expr}}{\rerevar{mult}}}\end{rerealignedlet}}\rereleteqn{\rerevarsub{term}{\rerestr{}1}}{\rerealt{\rerevar{digits}}{\rerelit{\rerecharpopen{}}\rerevar{expr}\rerelit{\rerecharpclose{}}}}\rereletreceqn{\rerevarsub{mult}{\rerestr{}1}}{\rerealt{\rerevarsub{term}{\rerestr{}1}\rerelit{\rerecharplus{}}\rerevarsub{mult}{\rerestr{}1}}{\rerevarsub{term}{\rerestr{}1}}}\rereleteqn{\rerevarsub{mult}{\rerestr{3}}}{\rerealt{\rerevarsub{digits}{\rerestr{3}}\rerelit{\rerecharplus{}}\rerevarsub{mult}{\rerestr{}1}}{\rerevarsub{digits}{\rerestr{3}}}}\rereleteqn{\rerevarsub{expr}{\rerestr{20\rerecharplus{}3}}}{\rerealt{\rerevarsub{mult}{\rerestr{3}}\rerelit{\rerecharstar{}}\rerevar{expr}}{\rerevarsub{mult}{\rerestr{3}}}}\rereleteqn{\rerevarsub{term}{\rerestr{\rerecharpopen{}20\rerecharplus{}3}}}{\rerevarsub{expr}{\rerestr{20\rerecharplus{}3}}\rerelit{\rerecharpclose{}}}\rereleteqn{\rerevarsub{mult}{\rerestr{\rerecharpopen{}20\rerecharplus{}3}}}{\rerealt{\rerevarsub{term}{\rerestr{\rerecharpopen{}20\rerecharplus{}3}}\rerelit{\rerecharplus{}}\rerevarsub{mult}{\rerestr{}1}}{\rerevarsub{term}{\rerestr{\rerecharpopen{}20\rerecharplus{}3}}}}\rereletbody{\rerealt{\rerevarsub{mult}{\rerestr{\rerecharpopen{}20\rerecharplus{}3}}\rerelit{\rerecharstar{}}\rerevar{expr}}{\rerevarsub{mult}{\rerestr{\rerecharpopen{}20\rerecharplus{}3
}}}}\end{rerealignedlet}} \reretraceline[]{\rereeps}{\begin{rerealignedlet}\rereleteqn{\rerevar{digits}}{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}\rerestar{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}}\rereletreceqn{\rerevar{expr}}{\begin{rerealignedlet}\rereleteqn{\rerevar{term}}{\rerealt{\rerevar{digits}}{\rerelit{\rerecharpopen{}}\rerevar{expr}\rerelit{\rerecharpclose{}}}}\rereletreceqn{\rerevar{mult}}{\rerealt{\rerevar{term}\rerelit{\rerecharplus{}}\rerevar{mult}}{\rerevar{term}}}\rereletbody{\rerealt{\rerevar{mult}\rerelit{\rerecharstar{}}\rerevar{expr}}{\rerevar{mult}}}\end{rerealignedlet}}\rereleteqn{\rerevarsub{term}{\rerestr{}1}}{\rerealt{\rerevar{digits}}{\rerelit{\rerecharpopen{}}\rerevar{expr}\rerelit{\rerecharpclose{}}}}\rereletreceqn{\rerevarsub{mult}{\rerestr{}1}}{\rerealt{\rerevarsub{term}{\rerestr{}1}\rerelit{\rerecharplus{}}\rerevarsub{mult}{\rerestr{}1}}{\rerevarsub{term}{\rerestr{}1}}}\rereleteqn{\rerevarsub{mult}{\rerestr{\rerecharpopen{}20\rerecharplus{}3\rerecharpclose{}}}}{\rerealt{\rerelit{\rerecharplus{}}\rerevarsub{mult}{\rerestr{}1}}{\rereeps}}\rereletbody{\rerealt{\rerevarsub{mult}{\rerestr{\rerecharpopen{}20\rerecharplus{}3\rerecharpclose{}}}\rerelit{\rerecharstar{}}\rerevar{expr}}{\rerevarsub{mult}{\rerestr{\rerecharpopen{}20\rerecharplus{}3\rerecharpclose{}}}}}\end{rerealignedlet}} \end{reretrace}

    In general, the conversion from a context-free grammar to a recursive regular expression makes use of the following theorem by Bekić:

    Bisection lemma [Bekic1984]:
    For monotone f : P \times Q \to P and g : P \times Q \to Q

    \left(\rereFIX_{P\times Q} ( x,y ) = (f(x,y) , g(x,y) )\right) = (x_0, y_0)

    where

    \begin{aligned} x_0 & = (\rereFIX_P\,x = f(x, y_0)) \\ y_0 & = (\rereFIX_Q\,y = g(x_0, y)) \end{aligned}

    (There is a constructive proof for this lemma.)

    We can use the bisection lemma to eliminate simultaneous recursion in a CFG, reducing it to a recursive regular expression. A CFG with n productions is a fixed point of h_n : \text{RE}^n \to \text{RE}^n . We can assume that there is at least one production, the one for the starting symbol. If the CFG has only a single production, then we can convert it to a recursive regular expression using \rereFIX . Otherwise, n = 1 + m . We then take P = \text{RE} and Q = \text{RE}^m , extract functions f : \text{RE} \times \text{RE}^m \to \text{RE} and g : \text{RE} \times \text{RE}^m \to \text{RE}^m , and define

    h_m(\bar{y}) = \begin{rerealignedlet} \rereleteqn{x_0}{\rerefix{x}{f(x, \bar{y})}} \rereletbody{g(x_0, \bar{y})} \end{rerealignedlet}

    where \bar{y} represents a vector of m distinct \text{RE} variables.

    By iterating this process, we can reduce the number of top-level productions until we have a single production left and can apply the base case.
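
    As a small worked instance of this procedure (the grammar is made up for illustration and does not appear elsewhere in this post), take two productions with starting symbol T:

    \begin{aligned} S & = \mathtt{a}\,T \mid \varepsilon \\ T & = \mathtt{b}\,S \end{aligned}

    Here n = 2 and m = 1 . Taking x for S and y for T gives f(x, y) = \mathtt{a}\,y \mid \varepsilon and g(x, y) = \mathtt{b}\,x . Since x does not occur in f(x, y) , the inner fixed point is simply x_0 = \mathtt{a}\,y \mid \varepsilon , and

    h_1(y) = g(x_0, y) = \mathtt{b}\,(\mathtt{a}\,y \mid \varepsilon)

    Only one production is left, so the base case applies and the whole grammar becomes \rereFIX\,y = \mathtt{b}\,(\mathtt{a}\,y \mid \varepsilon) , which describes the words b, bab, babab, and so on.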

    Note on performance

    Let me start by saying that I haven’t really measured very much. The examples above run at interactive speed in GHCi, and I suspect that collecting and pretty-printing the traces is not free.

    I wrote a parser for unambiguous arithmetic expressions based on the example above, using parsec:

    ex7parsec :: P.Parser ()
    ex7parsec = expr where
        expr   = void $ P.try (mult >> P.char '*' >> expr) <|> mult
        mult   = void $ P.try (term >> P.char '+' >> mult) <|> term
        term   = P.try digits <|> void (P.char '(' *> expr *> P.char ')')
        digits = void $ some digit
        digit  = P.satisfy (\c -> c >= '0' && c <= '9')

    Note that this parser only recognises the input, i.e., it doesn’t build a parse tree. I also didn’t follow good practices for writing parsec parsers, but rather translated the CFG as directly as possible.

    The result is salty. The recursive-regexp approach is 1000 to 10000 times slower, and the gap widens as the input string gets longer. This is not really surprising, as the matching algorithm recomputes a lot of things on each character, but it is still unfortunate.

    We can get a 100x speedup (but still be 100x slower than parsec) by introducing explicit sharing instead of Fix (and Let). In the end we are doing the same as Might, Darais and Spiewak, the difference being that our public interface is non-opaque.

    For this, we take the original regular expression we started with, and add a new constructor Ref:

    -- | Knot-tied recursive regular expression.
    data RR
        = Eps
        | Ch CS.CharSet
        | App RR RR
        | Alt RR RR
        | Star RR
        | Ref !Int RR

    This structure can now be circular, as long as cycles use Ref. Conversion from RE with Fix to RR is a direct mapping of constructors; the interesting part happens with Fix, where we have to use mfix (and a lazy state monad):

        go :: RE RR -> State Int RR
        go (R.Fix _ r) = mfix $ \res -> do
            i <- newId
            r' <- go (fmap (unvar res id) r)
            return (Ref i r')
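
    To see the knot-tying in isolation, here is a self-contained sketch of the same idea. It makes several simplifying assumptions that are not taken from the library: the state monad is hand-rolled so that only base is needed, the source type uses plain String names for variables instead of the parameterised RE with unvar, and the names convert, example and innerId are hypothetical.

```haskell
import Control.Monad.Fix (MonadFix (..))
import Data.Maybe (fromMaybe)

-- Hand-rolled lazy state monad (a stand-in for a library one).
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
    fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
    pure a = State $ \s -> (a, s)
    State f <*> State g = State $ \s ->
        let (h, s1) = f s
            (a, s2) = g s1
        in (h a, s2)

instance Monad (State s) where
    State g >>= k = State $ \s -> let (a, s') = g s in runState (k a) s'

-- The lazy let pattern is what allows the result to refer to itself.
instance MonadFix (State s) where
    mfix f = State $ \s -> let (a, s') = runState (f a) s in (a, s')

-- | Hand out the current counter value and increment the counter.
newId :: State Int Int
newId = State $ \i -> (i, i + 1)

-- Simplified source syntax: variables are plain names (an assumption).
data RE
    = REps | RCh Char | RApp RE RE | RAlt RE RE
    | RVar String | RFix String RE

-- Knot-tied target: a Ref marks the entry point of a possible cycle.
data RR = Eps | Ch Char | App RR RR | Alt RR RR | Ref !Int RR

convert :: RE -> RR
convert re = fst (runState (go [] re) 0)
  where
    go :: [(String, RR)] -> RE -> State Int RR
    go _   REps       = pure Eps
    go _   (RCh c)    = pure (Ch c)
    go env (RApp r s) = App <$> go env r <*> go env s
    go env (RAlt r s) = Alt <$> go env r <*> go env s
    go env (RVar n)   = pure (fromMaybe (error ("unbound: " ++ n)) (lookup n env))
    -- The bound name maps to the final result, which does not exist yet.
    go env (RFix n r) = mfix $ \res -> do
        i  <- newId
        r' <- go ((n, res) : env) r
        pure (Ref i r')

-- | fix x = eps | 'a' x
example :: RR
example = convert (RFix "x" (RAlt REps (RApp (RCh 'a') (RVar "x"))))

-- | The identifier found after following the cycle once.
innerId :: RR -> Maybe Int
innerId (Ref _ (Alt _ (App _ (Ref j _)))) = Just j
innerId _                                 = Nothing
```

    In this sketch innerId example evaluates to Just 0: following the variable leads back to the very same Ref node, so the structure is a genuine cycle in memory rather than an unfolded copy. This only works because nothing inspects res while the structure is being built.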

    The implementation is still relatively simple and, importantly, not unacceptably slow. In a simple artificial benchmark, parsing the arithmetic expression 1000*(2020+202)*(20+3)*((30+20)*10000)+123123123*12313, the results are

    benchmarking parsec
    time                 22.31 μs   (21.44 μs .. 23.66 μs)
    benchmarking rere
    time                 237.2 ms   (207.5 ms .. 266.1 ms)
    benchmarking ref
    time                 6.029 ms   (5.486 ms .. 6.801 ms)
    benchmarking derp
    time                 20.31 μs   (18.37 μs .. 22.07 μs)

    The Haskell used in this post is simple enough so that the library can easily be ported to Miranda3. With this version, our example runs in

    % echo "bench" | time mira rere.m
    mira rere.m  0,32s user 0,02s system 99% cpu 0,343 total

    Perhaps surprisingly, it is not much slower than GHC.


    A less artificial benchmark is parsing JSON. I took the JSON syntax definition from and directly translated it to CFG:

    \begin{rerecfg} \rerecfgproduction{\rerevar{ws}}{\rerealt{\rereeps}{\rerealt{\rerelit{\rerecharcode{32}}\rerevar{ws}}{\rerealt{\rerelit{\rerecharcode{10}}\rerevar{ws}}{\rerealt{\rerelit{\rerecharcode{13}}\rerevar{ws}}{\rerelit{\rerecharcode{9}}\rerevar{ws}}}}}} \rerecfgproduction{\rerevar{sign}}{\rerealt{\rereeps}{\rerealt{\rerelit{\rerecharplus{}}}{\rerelit{\rerecharminus{}}}}} \rerecfgproduction{\rerevar{exponent}}{\rerealt{\rereeps}{\rerealt{\rerelit{E}\rerevar{sign}\,\rerevar{digits}}{\rerelit{e}\rerevar{sign}\,\rerevar{digits}}}} \rerecfgproduction{\rerevar{fraction}}{\rerealt{\rereeps}{\rerelit{.}\rerevar{digits}}} \rerecfgproduction{\rerevar{onenine}}{\rerelitset{\rerelitrange{\rerelit{1}}{\rerelit{9}}}} \rerecfgproduction{\rerevar{digit}}{\rerealt{\rerelit{0}}{\rerevar{onenine}}} \rerecfgproduction{\rerevar{digits}}{\rerealt{\rerevar{digit}}{\rerevar{digit}\,\rerevar{digits}}} \rerecfgproduction{\rerevar{integer}}{\rerealt{\rerevar{digit}}{\rerealt{\rerevar{onenine}\,\rerevar{digits}}{\rerealt{\rerelit{\rerecharminus{}}\rerevar{digit}}{\rerelit{\rerecharminus{}}\rerevar{onenine}\,\rerevar{digits}}}}} \rerecfgproduction{\rerevar{number}}{\rerevar{integer}\,\rerevar{fraction}\,\rerevar{exponent}} \rerecfgproduction{\rerevar{hex}}{\rerevar{digit}\rerelitset{\rerelitrange{\rerelit{A}}{\rerelit{F}}, \rerelitrange{\rerelit{a}}{\rerelit{f}}}} \rerecfgproduction{\rerevar{escape}}{\rerealt{\rerelitset{\rerelit{"}, \rerelit{/}, \rerelit{\rerecharbackslash{}}, \rerelit{b}, \rerelit{f}, \rerelit{n}, \rerelit{r}, \rerelit{t}}}{\rerelit{u}\rerevar{hex}\,\rerevar{hex}\,\rerevar{hex}\,\rerevar{hex}}} \rerecfgproduction{\rerevar{character}}{\rerealt{\rerelitsetcomplement{\rerelitrange{\rerelit{\rerecharcode{0}}}{\rerelit{\rerecharcode{31}}}, \rerelit{"}, \rerelit{\rerecharbackslash{}}}}{\rerelit{\rerecharbackslash{}}\rerevar{escape}}} \rerecfgproduction{\rerevar{characters}}{\rerealt{\rereeps}{\rerevar{character}\,\rerevar{characters}}} 
\rerecfgproduction{\rerevar{string}}{\rerelit{"}\rerevar{characters}\rerelit{"}} \rerecfgproduction{\rerevar{element}}{\rerevar{ws}\,\rerevar{value}\,\rerevar{ws}} \rerecfgproduction{\rerevar{elements}}{\rerealt{\rerevar{element}}{\rerevar{element}\rerelit{,}\rerevar{elements}}} \rerecfgproduction{\rerevar{array}}{\rerealt{\rerelit{\rerecharbopen{}}\rerevar{ws}\rerelit{\rerecharbclose{}}}{\rerelit{\rerecharbopen{}}\rerevar{elements}\rerelit{\rerecharbclose{}}}} \rerecfgproduction{\rerevar{member}}{\rerevar{ws}\,\rerevar{string}\,\rerevar{ws}\rerelit{:}\rerevar{element}} \rerecfgproduction{\rerevar{members}}{\rerealt{\rerevar{member}}{\rerevar{member}\rerelit{,}\rerevar{members}}} \rerecfgproduction{\rerevar{object}}{\rerealt{\rerelit{\{}\rerevar{ws}\rerelit{\}}}{\rerelit{\{}\rerevar{members}\rerelit{\}}}} \rerecfgproduction{\rerevar{value}}{\rerealt{\rerevar{object}}{\rerealt{\rerevar{array}}{\rerealt{\rerevar{string}}{\rerealt{\rerevar{number}}{\rerealt{\rerelit{t}\rerelit{r}\rerelit{u}\rerelit{e}}{\rerealt{\rerelit{f}\rerelit{a}\rerelit{l}\rerelit{s}\rerelit{e}}{\rerelit{n}\rerelit{u}\rerelit{l}\rerelit{l}}}}}}}} \rerecfgproduction{\rerevar{json}}{\rerevar{element}} \end{rerecfg}

    which is then translated into a recursive regular expression:

    \begin{rerealignedlet}\rereletreceqn{\rerevar{ws}}{\rerealt{\rereeps}{\rerealt{\rerelit{\rerecharcode{32}}\rerevar{ws}}{\rerealt{\rerelit{\rerecharcode{10}}\rerevar{ws}}{\rerealt{\rerelit{\rerecharcode{13}}\rerevar{ws}}{\rerelit{\rerecharcode{9}}\rerevar{ws}}}}}}\rereleteqn{\rerevar{hex}}{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}\rerelitset{\rerelitrange{\rerelit{A}}{\rerelit{F}}, \rerelitrange{\rerelit{a}}{\rerelit{f}}}}\rereleteqn{\rerevar{escape}}{\rerealt{\rerelitset{\rerelit{"}, \rerelit{/}, \rerelit{\rerecharbackslash{}}, \rerelit{b}, \rerelit{f}, \rerelit{n}, \rerelit{r}, \rerelit{t}}}{\rerelit{u}\rerevar{hex}\,\rerevar{hex}\,\rerevar{hex}\,\rerevar{hex}}}\rereleteqn{\rerevar{character}}{\rerealt{\rerelitsetcomplement{\rerelitrange{\rerelit{\rerecharcode{0}}}{\rerelit{\rerecharcode{31}}}, \rerelit{"}, \rerelit{\rerecharbackslash{}}}}{\rerelit{\rerecharbackslash{}}\rerevar{escape}}}\rereletreceqn{\rerevar{characters}}{\rerealt{\rereeps}{\rerevar{character}\,\rerevar{characters}}}\rereleteqn{\rerevar{string}}{\rerelit{"}\rerevar{characters}\rerelit{"}}\rereletreceqn{\rerevar{digits}}{\rerealt{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}\rerevar{digits}}}\rereleteqn{\rerevar{integer}}{\rerealt{\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}{\rerealt{\rerelitset{\rerelitrange{\rerelit{1}}{\rerelit{9}}}\rerevar{digits}}{\rerealt{\rerelit{\rerecharminus{}}\rerelitset{\rerelitrange{\rerelit{0}}{\rerelit{9}}}}{\rerelit{\rerecharminus{}}\rerelitset{\rerelitrange{\rerelit{1}}{\rerelit{9}}}\rerevar{digits}}}}}\rereleteqn{\rerevar{fraction}}{\rerealt{\rereeps}{\rerelit{.}\rerevar{digits}}}\rereleteqn{\rerevar{sign}}{\rerealt{\rereeps}{\rerelitset{\rerelit{\rerecharplus{}}, 
\rerelit{\rerecharminus{}}}}}\rereleteqn{\rerevar{exponent}}{\rerealt{\rereeps}{\rerealt{\rerelit{E}\rerevar{sign}\,\rerevar{digits}}{\rerelit{e}\rerevar{sign}\,\rerevar{digits}}}}\rereleteqn{\rerevar{number}}{\rerevar{integer}\,\rerevar{fraction}\,\rerevar{exponent}}\rereletreceqn{\rerevar{value}}{\begin{rerealignedlet}\rereleteqn{\rerevar{element}}{\rerevar{ws}\,\rerevar{value}\,\rerevar{ws}}\rereleteqn{\rerevar{member}}{\rerevar{ws}\,\rerevar{string}\,\rerevar{ws}\rerelit{:}\rerevar{element}}\rereletreceqn{\rerevar{members}}{\rerealt{\rerevar{member}}{\rerevar{member}\rerelit{,}\rerevar{members}}}\rereleteqn{\rerevar{object}}{\rerealt{\rerelit{\{}\rerevar{ws}\rerelit{\}}}{\rerelit{\{}\rerevar{members}\rerelit{\}}}}\rereletreceqn{\rerevar{elements}}{\rerealt{\rerevar{element}}{\rerevar{element}\rerelit{,}\rerevar{elements}}}\rereleteqn{\rerevar{array}}{\rerealt{\rerelit{\rerecharbopen{}}\rerevar{ws}\rerelit{\rerecharbclose{}}}{\rerelit{\rerecharbopen{}}\rerevar{elements}\rerelit{\rerecharbclose{}}}}\rereletbody{\rerealt{\rerevar{object}}{\rerealt{\rerevar{array}}{\rerealt{\rerevar{string}}{\rerealt{\rerevar{number}}{\rerealt{\rerelit{t}\rerelit{r}\rerelit{u}\rerelit{e}}{\rerealt{\rerelit{f}\rerelit{a}\rerelit{l}\rerelit{s}\rerelit{e}}{\rerelit{n}\rerelit{u}\rerelit{l}\rerelit{l}}}}}}}}\end{rerealignedlet}}\rereletbody{\rerevar{ws}\,\rerevar{value}\,\rerevar{ws}}\end{rerealignedlet}

    I used a simple 1611-byte JSON file (a package.json definition). aeson parses it in about a dozen microseconds. The “optimised” implementation of this section needs about a dozen seconds, i.e., it is a million times slower. That is slow, but acceptable for use in tests, to verify the implementation of a “fast” production variant. Unfortunately, the derp parser generated from the above regular expression loops, so we cannot know how it would perform.


    We know that regular expressions are closed under intersection, and it’s possible to define a conversion from an RE that makes use of the And constructor to an RE that does not. Context-free grammars are not closed under intersection, but our recursive RE can still be extended with an additional And constructor, and everything discussed above will continue to work.

    data RE a
        = ...
        | And (RE a) (RE a)

    We can also add a Full constructor that matches anything.

    The extension of nullable and derivative is so simple that you might think something will break; yet nothing does:

    nullable' Full      = True
    nullable' (And r s) = nullable' r && nullable' s
    derivative' _ Full      = Full
    derivative' f (And r s) = derivative' f r /\ derivative' f s
    (/\) :: Ord a => RE a -> RE a -> RE a
    r /\ s = And r s
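To see the two new cases at work in isolation, here is a minimal, self-contained sketch. It is not the RE type from this post: it assumes a plain Char-only regular expression type without Fix or Let and without smart constructors, and the names `containsAB`, `endsWithA` and `both` are made up for the illustration.

```haskell
-- Plain regular expressions (no Fix/Let), extended with And and Full.
-- This is a simplified stand-in, not the type used in the post.
data RE
    = Null          -- matches nothing
    | Eps           -- matches only the empty string
    | Ch Char       -- matches a single character
    | App RE RE     -- concatenation
    | Alt RE RE     -- union
    | Star RE       -- Kleene star
    | And RE RE     -- intersection
    | Full          -- matches every string
    deriving (Eq, Show)

nullable :: RE -> Bool
nullable Null      = False
nullable Eps       = True
nullable (Ch _)    = False
nullable (App r s) = nullable r && nullable s
nullable (Alt r s) = nullable r || nullable s
nullable (Star _)  = True
nullable (And r s) = nullable r && nullable s
nullable Full      = True

-- Brzozowski derivative with respect to a single character.
derivative :: Char -> RE -> RE
derivative _ Null      = Null
derivative _ Eps       = Null
derivative c (Ch d)    = if c == d then Eps else Null
derivative c (App r s)
    | nullable r       = Alt (App (derivative c r) s) (derivative c s)
    | otherwise        = App (derivative c r) s
derivative c (Alt r s) = Alt (derivative c r) (derivative c s)
derivative c (Star r)  = App (derivative c r) (Star r)
derivative c (And r s) = And (derivative c r) (derivative c s)
derivative _ Full      = Full

-- Take derivatives character by character, then check nullability.
match :: RE -> String -> Bool
match r = nullable . foldl (flip derivative) r

-- Hypothetical example languages.
containsAB, endsWithA, both :: RE
containsAB = App Full (App (Ch 'a') (App (Ch 'b') Full))
endsWithA  = App Full (Ch 'a')
both       = And containsAB endsWithA
```

With these definitions `match both "aba"` holds, while `"ab"` and `"ba"` are rejected because each fails one of the two branches. Without smart constructors the derivatives grow quickly, so this is only suitable for short inputs.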

    As an example, we consider the intersection of two languages:

    \begin{aligned} X &= \{ \rerestr{a}^n \rerestr{b}^n \rerestr{c}^m \mid n, m \in \mathbb{N} \} & Y &= \{ \rerestr{a}^m \rerestr{b}^n \rerestr{c}^n \mid n, m \in \mathbb{N} \} \end{aligned}

    whose intersection is known not to be context-free:

    L = X \cap Y = \{ \rerestr{a}^n \rerestr{b}^n \rerestr{c}^n \mid n \in \mathbb{N} \}

    However, we can simply match against it:

    \begin{reretrace} \reretraceline[]{\rerestr{aaabbbccc}}{\rereintersect{\rerestar{\rerelit{a}}(\rerefix{\rerevar{bc}}{\rerealt{\rereeps}{\rerelit{b}\rerevar{bc}\rerelit{c}}})}{(\rerefix{\rerevar{ab}}{\rerealt{\rereeps}{\rerelit{a}\rerevar{ab}\rerelit{b}}})\rerestar{\rerelit{c}}}} \reretraceline[]{\rerestr{aabbbccc}}{\rereletrecin{\rerevar{ab}}{\rerealt{\rereeps}{\rerelit{a}\rerevar{ab}\rerelit{b}}}{\rereintersect{\rerestar{\rerelit{a}}(\rerefix{\rerevar{bc}}{\rerealt{\rereeps}{\rerelit{b}\rerevar{bc}\rerelit{c}}})}{\rerevar{ab}\rerelit{b}\rerestar{\rerelit{c}}}}} \reretraceline[]{\rerestr{abbbccc}}{\begin{rerealignedlet}\rereletreceqn{\rerevar{ab}}{\rerealt{\rereeps}{\rerelit{a}\rerevar{ab}\rerelit{b}}}\rereleteqn{\rerevarsub{ab}{\rerestr{a}}}{\rerevar{ab}\rerelit{b}}\rereletbody{\rereintersect{\rerestar{\rerelit{a}}(\rerefix{\rerevar{bc}}{\rerealt{\rereeps}{\rerelit{b}\rerevar{bc}\rerelit{c}}})}{\rerevarsub{ab}{\rerestr{a}}\rerelit{b}\rerestar{\rerelit{c}}}}\end{rerealignedlet}} \reretraceline[]{\rerestr{bbbccc}}{\begin{rerealignedlet}\rereletreceqn{\rerevar{ab}}{\rerealt{\rereeps}{\rerelit{a}\rerevar{ab}\rerelit{b}}}\rereleteqn{\rerevarsub{ab}{\rerestr{a}}}{\rerevar{ab}\rerelit{b}}\rereleteqn{\rerevarsub{ab}{\rerestr{aa}}}{\rerevarsub{ab}{\rerestr{a}}\rerelit{b}}\rereletbody{\rereintersect{\rerestar{\rerelit{a}}(\rerefix{\rerevar{bc}}{\rerealt{\rereeps}{\rerelit{b}\rerevar{bc}\rerelit{c}}})}{\rerevarsub{ab}{\rerestr{aa}}\rerelit{b}\rerestar{\rerelit{c}}}}\end{rerealignedlet}} \reretraceline[]{\rerestr{bbccc}}{\rereletrecin{\rerevar{bc}}{\rerealt{\rereeps}{\rerelit{b}\rerevar{bc}\rerelit{c}}}{\rereintersect{\rerevar{bc}\rerelit{c}}{\rerelit{b}\rerelit{b}\rerestar{\rerelit{c}}}}} 
\reretraceline[]{\rerestr{bccc}}{\begin{rerealignedlet}\rereletreceqn{\rerevar{bc}}{\rerealt{\rereeps}{\rerelit{b}\rerevar{bc}\rerelit{c}}}\rereleteqn{\rerevarsub{bc}{\rerestr{b}}}{\rerevar{bc}\rerelit{c}}\rereletbody{\rereintersect{\rerevarsub{bc}{\rerestr{b}}\rerelit{c}}{\rerelit{b}\rerestar{\rerelit{c}}}}\end{rerealignedlet}} \reretraceline[]{\rerestr{ccc}}{\begin{rerealignedlet}\rereletreceqn{\rerevar{bc}}{\rerealt{\rereeps}{\rerelit{b}\rerevar{bc}\rerelit{c}}}\rereleteqn{\rerevarsub{bc}{\rerestr{b}}}{\rerevar{bc}\rerelit{c}}\rereleteqn{\rerevarsub{bc}{\rerestr{bb}}}{\rerevarsub{bc}{\rerestr{b}}\rerelit{c}}\rereletbody{\rereintersect{\rerevarsub{bc}{\rerestr{bb}}\rerelit{c}}{\rerestar{\rerelit{c}}}}\end{rerealignedlet}} \reretraceline[]{\rerestr{cc}}{\rereintersect{\rerelit{c}\rerelit{c}}{\rerestar{\rerelit{c}}}} \reretraceline[]{\rerestr{c}}{\rereintersect{\rerelit{c}}{\rerestar{\rerelit{c}}}} \reretraceline[]{\rereeps}{\rereeps} \end{reretrace}

    This is of course cheating somewhat, as And / \cap occurs here only at the top level, and not, e.g., inside Fix / \rereFIX . Even there it wouldn’t pose problems for the algorithm, but it is difficult to come up with meaningful examples. Also recall that we interpret \rereFIX as the least fixed point, so for example, even though


    has \rereeps as a fixed point, we have \rerenull as another fixed point, and that is the least one. Therefore, the above expression behaves as (and indeed is automatically simplified to) \rerenull .

    However, And / \cap adds expressive power to the language, so it cannot be eliminated the way it can be in pure regular expressions (where eliminating it causes a combinatorial explosion of expression size, so it may not be a good idea there either).

    There’s one clear problem with And, however: languages defined using And cannot be easily generated. Even if we use the first branch of an intersection to generate a candidate, the candidate must still match the second branch. I don’t know whether the language inhabitation problem for this class of languages is decidable (for CFGs it is, but we’re now outside of CFGs). Consider the intersection of two simple regular expressions, strings of \rerestr{a} of odd and even length:


    Their intersection is empty, but it’s not structurally obvious. For example if we match on \rerestr{aaa} , the expression will stay in a single non-nullable state, but it won’t simplify to \rerenull :

    \begin{reretrace} \reretraceline[]{\rerestr{aaa}}{\begin{rerealignedlet}\rereleteqn{\rerevar{odd}}{\rerelit{a}\rerestar{(\rerelit{a}\rerelit{a})}}\rereleteqn{\rerevar{even}}{\rerestar{(\rerelit{a}\rerelit{a})}}\rereletbody{\rereintersect{\rerevar{even}}{\rerevar{odd}}}\end{rerealignedlet}} \reretraceline[]{\rerestr{aa}}{\begin{rerealignedlet}\rereleteqn{\rerevarsub{odd}{\rerestr{a}}}{\rerestar{(\rerelit{a}\rerelit{a})}}\rereleteqn{\rerevarsub{even}{\rerestr{a}}}{\rerelit{a}\rerevarsub{odd}{\rerestr{a}}}\rereletbody{\rereintersect{\rerevarsub{even}{\rerestr{a}}}{\rerevarsub{odd}{\rerestr{a}}}}\end{rerealignedlet}} \reretraceline[]{\rerestr{a}}{\begin{rerealignedlet}\rereleteqn{\rerevarsub{odd}{\rerestr{a}}}{\rerestar{(\rerelit{a}\rerelit{a})}}\rereleteqn{\rerevarsub{odd}{\rerestr{aa}}}{\rerelit{a}\rerevarsub{odd}{\rerestr{a}}}\rereletbody{\rereintersect{\rerevarsub{odd}{\rerestr{a}}}{\rerevarsub{odd}{\rerestr{aa}}}}\end{rerealignedlet}} \reretraceline[]{\rereeps}{\begin{rerealignedlet}\rereleteqn{\rerevarsub{odd}{\rerestr{a}}}{\rerestar{(\rerelit{a}\rerelit{a})}}\rereleteqn{\rerevarsub{odd}{\rerestr{aa}}}{\rerelit{a}\rerevarsub{odd}{\rerestr{a}}}\rereletbody{\rereintersect{\rerevarsub{odd}{\rerestr{aa}}}{\rerevarsub{odd}{\rerestr{a}}}}\end{rerealignedlet}} \end{reretrace}
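The generate-then-check idea for intersections mentioned above can be sketched as follows. This is only an illustration under simplifying assumptions: the first branch is given as a lazy list of candidates and the second branch as a predicate, rather than both being RE values, and all names (`evens`, `isOdd`, `isMultipleOf3`, `witnesses`) are hypothetical.

```haskell
-- First branch as a generator: strings of 'a' of even length, shortest first.
evens :: [String]
evens = [replicate (2 * n) 'a' | n <- [0 ..]]

-- Second branch as a recogniser: strings of 'a' of odd length.
isOdd :: String -> Bool
isOdd s = all (== 'a') s && odd (length s)

-- Another recogniser, used for contrast: length divisible by three.
isMultipleOf3 :: String -> Bool
isMultipleOf3 s = all (== 'a') s && length s `mod` 3 == 0

-- Generate from the first branch and keep what the second branch accepts.
-- Only the first `budget` candidates are inspected, so the search always
-- terminates, even when the intersection is empty.
witnesses :: Int -> [String] -> (String -> Bool) -> [String]
witnesses budget candidates accepts = filter accepts (take budget candidates)
```

Here `witnesses 1000 evens isOdd` is empty, but that only says no witness was found within the budget; it does not prove the intersection is empty. For comparison, `witnesses 100 evens isMultipleOf3` starts with the strings of length 0, 6 and 12.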


    It was nice to combine known things in a new way, and the result is interesting.

    We now know that we can add fixed points to regular expressions or recursive types to non-commutative intuitionistic linear logic. The systems are still well-behaved, but many problems become harder. As CFG equivalence is undecidable, so is term synthesis in NCILL with recursive types (as synthesizing a term would yield a proof of equivalence).

    We can use recursive REs not only to match strings, but also to generate them. Therefore we can use them to statistically test grammar equivalence (which is not a new idea).
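A rough sketch of what such a comparison might look like. Note the swap: instead of random generation, it exhaustively enumerates all strings up to a length bound, which keeps the example free of dependencies. The two recognisers are plain functions rather than RE values, and every name here is an assumption made for the illustration.

```haskell
-- All strings over the alphabet of length at most n, shortest first.
stringsUpTo :: Int -> [Char] -> [String]
stringsUpTo n alphabet = concatMap ofLength [0 .. n]
  where
    ofLength :: Int -> [String]
    ofLength 0 = [""]
    ofLength k = [c : s | c <- alphabet, s <- ofLength (k - 1)]

-- The first string (if any) on which the two recognisers disagree.
counterexample
    :: Int -> [Char] -> (String -> Bool) -> (String -> Bool) -> Maybe String
counterexample n alphabet p q =
    case [s | s <- stringsUpTo n alphabet, p s /= q s] of
        []      -> Nothing
        (s : _) -> Just s

-- S ::= eps | a S b, written as a recursive recogniser.
anbnGrammar :: String -> Bool
anbnGrammar ""           = True
anbnGrammar ('a' : rest) =
    not (null rest) && last rest == 'b' && anbnGrammar (init rest)
anbnGrammar _            = False

-- The same language, recognised by counting.
anbnCount :: String -> Bool
anbnCount s = all (== 'b') bs && length as == length bs
  where
    (as, bs) = span (== 'a') s

-- A different language: a*b*.
aStarBStar :: String -> Bool
aStarBStar = all (== 'b') . dropWhile (== 'a')
```

`counterexample 6 "ab" anbnGrammar anbnCount` finds no disagreement, whereas comparing `anbnGrammar` with `aStarBStar` immediately yields `"a"`. A `Nothing` result is only evidence up to the bound, never a proof of equivalence.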

    Finally, this is not only for fun. I’m trying to formalize the grammars of fields in .cabal files. The parsec parsers are the definition, but we now have more declarative definitions too, and can compare the two using QuickCheck. Additionally, we get nice-looking \text{\LaTeX} grammar definitions that are (hopefully) human-readable. If you ever wondered what the complete and precise syntax for version ranges in .cabal files is, here is what it looks like at the time of writing this post:

    The notation is described in detail in the Cabal user manual, where you can also find more grammar definitions.

    Code in the Cabal Parsec instances is accumulating historical baggage, and is written to produce helpful error messages, not necessarily with clarity of the grammar in mind. However, we can compare it (and its companion Pretty instance) with its RE counterpart to find possible inconsistencies. Also, Cabal has a history of not handling whitespace well, either always requiring, completely forbidding, or allowing it where it shouldn’t be allowed. The RE-derived generator can be amended to produce slightly skewed strings, for example inserting or removing whitespace, to help identify and overcome such problems.
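As an illustration of the "slightly skewed strings" idea, here is a small sketch that perturbs an already generated string by a single whitespace edit. It does not reflect how the actual generator is or would be amended; `insertSpace`, `removeSpace`, `splits` and `skew` are hypothetical helpers.

```haskell
import Data.Char (isSpace)
import Data.List (inits, tails)

-- Every way of splitting a string into a prefix and a suffix.
splits :: String -> [(String, String)]
splits s = zip (inits s) (tails s)

-- All variants with one extra space inserted at some position.
insertSpace :: String -> [String]
insertSpace s = [pre ++ " " ++ post | (pre, post) <- splits s]

-- All variants with one existing whitespace character removed.
removeSpace :: String -> [String]
removeSpace s = [pre ++ post | (pre, c : post) <- splits s, isSpace c]

-- All single-edit whitespace perturbations of a string.
skew :: String -> [String]
skew s = insertSpace s ++ removeSpace s
```

For example, `removeSpace ">= 1.2"` gives just `">=1.2"`, and `insertSpace "ab"` gives `" ab"`, `"a b"` and `"ab "`. Feeding such variants to a parser and to the corresponding RE shows whether the two agree on where whitespace is allowed.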


    [Bekic1984] Bekič, Hans: Definable operations in general algebras, and the theory of automata and flowcharts. In: Jones, C. B. (ed.): Programming languages and their definition: H. Bekič (1936–1982). Berlin, Heidelberg : Springer Berlin Heidelberg, 1984 — ISBN 978-3-540-38933-0, pp. 30–55.

    [Might2011] Might, Matthew; Darais, David; Spiewak, Daniel: Parsing with derivatives: A functional pearl. In: Proceedings of the 16th ACM Sigplan International Conference on Functional Programming, ICFP ’11. New York, NY, USA : Association for Computing Machinery, 2011 — ISBN 9781450308656, pp. 189–195.

    [Owens2009] Owens, Scott; Reppy, John; Turon, Aaron: Regular-expression derivatives re-examined. In: Journal of Functional Programming vol. 19 (2). USA, Cambridge University Press (2009), pp. 173–190.

    1. There is an implementation in the derp package, which uses Data.IORef and unsafePerformIO.↩︎

    2. Using smart constructors in this approach, we obtain relatively small automata, but they are not minimal. To actually obtain minimal ones, kleene can compare regular expressions for equivalence, e.g. concluding that \rerestar{\rerelit{a}} and \rereconcat{\rerestar{\rerelit{a}}}{\rerestar{\rerelit{a}}} are in fact equivalent.↩︎

    3. Miranda was first released in 1985, and in January 2020 it was also released under the BSD-2-Clause license. Now anyone can play with it. The lack of type classes makes the code a bit more explicit, but otherwise it doesn’t look much different.↩︎

    by oleg at June 04, 2020 12:00 AM