Planet Haskell

December 07, 2018

Serokell Blog

Introduction to Tagless Final

I’m Vasiliy Kevroletin, and I work at Serokell with a lot of different people. Teams consist not only of Haskell experts (who contribute to GHC and Haskell libraries), but also of people like me who have less Haskell experience, but strive hard to learn and expand their Haskell knowledge.

Recently, my team decided to implement an eDSL using the tagless final style for one of our new projects. Although it’s a fairly well-known technique, I had zero experience with tagless final, and I found the terms tagless, final, and eDSL themselves confusing.

In preparation for the task, I’ve done some research and organized the learned material into a small HowTo. Now I want to share it with others.

Prerequisites

I assume that the reader is fairly comfortable with MTL because I will use a lot of analogies with MTL.

Gist

Recall your everyday MTL-style programming. Forget about concrete monad transformers and concentrate on type classes. Without transformers, there are only two things that are left:

  1. Functions are declared using type constraints instead of concrete types:

    getUser :: (MonadReader r m, Has DatabaseConfig r, MonadIO m) => Name -> m User
    
  2. Instantiation of a polymorphic function to a concrete type (aka interpreter) happens somewhere “in the end”:

    runReaderT (getUser (Name "Pedro")) env
    

That’s all. We’ve just covered the tagless final style:

  1. Write code using overloaded functions.
  2. Run code using any of the suitable implementations (aka interpreters).

All good (and bad) in tagless final comes from ad-hoc polymorphism and type classes (i.e. overloading). Your output depends directly on your commitment to overload things.

A distinct feature of tagless final is extensibility in two dimensions, which is, in fact, a significant achievement (see The Expression Problem Revisited for an explanation of why it’s hard to achieve this property).

Let’s discuss extensibility while keeping this function signature in mind:

wimble :: (MonadReader Env m, MonadState State m) => m ()

We can run wimble using a custom new implementation of MonadReader and MonadState (by defining a new data type and defining instances for it). This is an extension in the first dimension: a new interpreter. Furthermore, we can use a new set of operations, say MonadWriter, and use wimble in a new function which uses all 3 classes: MonadReader, MonadState and MonadWriter (i.e. old and new operations). This is an extension in the second dimension: a new set of operations.

From my point of view, available learning resources show two different approaches to using tagless final:

  1. Define operations abstracted over a monad

    In that case, we can use do notation.

  2. Define an Abstract Syntax Tree using overloaded functions

    In that case, potentially, we can pretty print, inspect and optimize AST.

People who have experience with the technique might say that the two approaches are exactly the same. After learning about tagless final, this opinion makes sense to me. But earlier, when I had just started searching through available learning resources, I was confused by the difference in the look and feel of the resulting code. Also, some people say that having do notation is enough to call something an eDSL, while others say that an eDSL should define an AST. So, by saying tagless final, different people might mean slightly different approaches, which can look like completely different techniques to a novice Haskell programmer. We will briefly explore programming with monads and defining ASTs using tagless final, and will also touch on a few other relevant topics.

Application Monad

It’s common among Haskell programmers to organize effectful application code using monads. Details vary between implementations but the basic idea is to define a monad together with a set of operations available in this monad. Similarly to this blog post, I’ll call a monad for organizing effectful application code an application monad.

Tagless final is a suitable technique for defining application monads. In fact, thanks to MTL, it is one of the most widely used tools for that task.

Let’s take a simplified problem of fetching/deleting a user from a database as an example to demonstrate how tagless final can be used to define operations available in do notation. Our application monad will provide two operations: getUser and deleteUser. Applying the tagless final approach, we will define a set of overloaded functions and provide their implementation later. Right from the start there is a design decision to make: which operations to overload. We can define a new typeclass with getUser/deleteUser operations, or we can use more generic functions from MTL and build on top of them. Although in practice I would often choose the 2nd option, here I’ll show the 1st one because in our particular case it leads to shorter code:

data Name = Name String
data User = User { name :: Name, age :: Int }

class Monad m => MonadDatabase m where
    getUser    :: Name -> m User
    deleteUser :: User -> m ()

Using operations given above we can define some logic like this:

import Control.Monad (when)

test :: MonadDatabase m => m ()
test = do user <- getUser (Name "Pedro")
          when (age user < 18) (deleteUser user)

Note that the test function is abstract: it can be executed using a different implementation of MonadDatabase. Now, let’s define a MonadDatabase instance suitable to run that test function. One way to do it is to build on top of MTL transformers. I’ve assumed that getUser/deleteUser functions can be implemented on top of Reader and IO monads and I’ve also omitted some implementation details (marked by ...):

data DatabaseConfig = DatabaseConfig { ... }

-- Deriving these instances requires the GeneralizedNewtypeDeriving extension.
newtype AppM a =
    AppM { unAppM :: ReaderT DatabaseConfig IO a }
    deriving (Functor, Applicative, Monad, MonadIO, MonadReader DatabaseConfig)

instance MonadDatabase AppM where
    getUser name = do cfg <- ask
                      ...

    deleteUser user = do cfg <- ask
                         ...

runAppM :: AppM a -> DatabaseConfig -> IO a
runAppM app config = runReaderT (unAppM app) config

Now, we can execute the abstract test function using a particular AppM implementation:

main = do cfg <- ...
          runAppM test cfg

By using tagless final style, we have separated the definition of abstract operations from their implementation which gives us extensibility. With such an approach it is possible to define a new set of operations and use it together with MonadDatabase. It is also possible to add a new interpretation (for example, for testing purposes).
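As a sketch of the last point, here is one possible pure interpreter that could be used in tests. `TestM` and its in-memory list of users are assumptions made for illustration (they do not come from any library), the fallback for unknown users is arbitrary, and the `Eq`/`Show` instances are added only so that the result can be compared and printed.

```haskell
import Control.Monad (when)

data Name = Name String deriving (Eq, Show)
data User = User { name :: Name, age :: Int } deriving (Eq, Show)

class Monad m => MonadDatabase m where
    getUser    :: Name -> m User
    deleteUser :: User -> m ()

-- The same abstract logic as in the post.
test :: MonadDatabase m => m ()
test = do user <- getUser (Name "Pedro")
          when (age user < 18) (deleteUser user)

-- Hypothetical pure interpreter: a hand-rolled state monad whose state
-- is an in-memory list of users standing in for the database.
newtype TestM a = TestM { runTestM :: [User] -> (a, [User]) }

instance Functor TestM where
    fmap f (TestM g) = TestM $ \db -> let (a, db') = g db in (f a, db')

instance Applicative TestM where
    pure a = TestM $ \db -> (a, db)
    TestM f <*> TestM g = TestM $ \db ->
        let (h, db')  = f db
            (a, db'') = g db'
        in (h a, db'')

instance Monad TestM where
    TestM g >>= k = TestM $ \db ->
        let (a, db') = g db in runTestM (k a) db'

instance MonadDatabase TestM where
    -- Assumption for this sketch: an unknown user is reported with age 0.
    getUser n = TestM $ \db ->
        case filter ((== n) . name) db of
            (u : _) -> (u, db)
            []      -> (User n 0, db)
    deleteUser u = TestM $ \db -> ((), filter ((/= name u) . name) db)

main :: IO ()
main = do
    let db = [User (Name "Pedro") 17, User (Name "Ana") 30]
    -- Print the names that remain after running the abstract logic.
    print (map name (snd (runTestM test db)))
```

Because `test` is only constrained by `MonadDatabase`, the same definition runs unchanged against `AppM` in production and against `TestM` here.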

Even with such a small example, there were many possibilities to organize code. The first question: how to choose the set of overloaded operations? Is it better to define a new typeclass such as MonadDatabase with application-specific functions, or is it better to stick to MTL typeclasses and define operations on top of more generic functions? The second question is: how to write the implementation? Are there practical alternatives to MTL transformers? Although it’s very tempting to discuss those and several other questions here, I don’t know all the answers, and the topic of proper application architecture is too broad. For more in-depth resources on application architecture, you can visit other blogs (1, 2, 3, 4, 5, 6).

Mini Q&A

Q. I heard you were describing tagless final using MTL. What about the famous n^2 problem?

A. The n^2 problem appears in the implementation of transformers (because transformers need to propagate the methods of their child monads). Transformers have nothing to do with tagless final. We were only talking about type constraints and the freedom to switch between implementations.

If you are still wondering about the n^2 problem, here is a small trick to mitigate it (export method implementations as separate functions with a hope that other instances will use your implementation).

If you create many similar implementations, tagless final causes some effort duplication. In that case, you might want to use transformers, which leads to the n^2 problem.

Q. You were talking about an “Application” monad and MTL style. Is it really an eDSL?

A. Even if there is a canonical scientific definition for the term eDSL, people use the term to talk about different things. Opinions range from “eDSL is a completely distinct language with its own semantics” to “eDSL is a library with a nice, consistent interface”. Here are the answers I got in several public Haskell-related channels to the question “what are good examples of eDSLs implemented in Haskell?”: SBV, diagrams, accelerate, blaze, esqueleto, shake, lens, gloss, streaming libraries (pipes, streamly, etc.), Servant, opaleye, frp-arduino, HHDL, ivory, pandoc. As you can see, those answers clearly show that the term eDSL is vague. But anyway, tagless final can be used to create both “true” eDSLs and nice library interfaces (probably with monads).

eDSLs

The most complete discussion of the tagless final approach was done by Oleg Kiselyov and his colleagues. He talks mostly about the embedding of different versions of typed lambda calculus using the tagless final encoding. He achieves very motivating results, such as embedding lambda calculus with linear types and transforming ASTs.

Let’s pick a simple language as an example and explore two ways to encode an AST: using Initial and Final encodings. The chosen language has integer constants, an addition operation, and lambda functions. On the one hand, it’s simple enough that most of the implementation fits in this blog post. On the other hand, it’s complicated enough to discuss two versions of Initial encodings and to demonstrate the extensibility of the tagless final approach.

Initial encoding

Initial encoding means that we represent an AST using values of a given algebraic data type. The term “Initial encoding” was inspired by category theory: it follows from the observation that an inductive data type can be viewed as an “initial algebra”. Bartosz Milewski gives a gentle description of what an initial algebra is and why an inductive data structure can be viewed as an initial algebra.

Tagged initial encoding

Here is one way to represent an Abstract Syntax Tree for our simple language. For simplicity, we re-use Haskell lambda functions in the definition of Lambda, so that we don’t need to implement identifier assignment and lookup ourselves; this approach is called higher-order abstract syntax:

data Expr = IntConst Int
          | Lambda   (Expr -> Expr)
          | Apply    Expr Expr
          | Add      Expr Expr

This representation allows us to define well-formed eDSL expressions like these:

-- 10 + 20
t1 = IntConst 10 `Add` IntConst 20

-- (\x -> 10 + x) 20
t2 = Apply (Lambda $ \x -> IntConst 10 `Add` x) (IntConst 20)

Unfortunately, it also allows us to define malformed expressions like these:

-- Trying to call integer constant as a function
e1 = Apply (IntConst 10) (IntConst 10)

-- Trying to add lambda functions
e2 = Add f f where f = Lambda (\x -> x)

Evaluation of Expr values can produce errors because the representation of our eDSL allows encoding malformed expressions. Consequently, the interpreter eval has to check for type errors as it runs. To be more precise, it should pattern-match on intermediate results at runtime to ensure that the Add operation is applied to integer constants and that the Apply operation is used with a lambda function. We define the Result data type to represent possible resulting values and use Maybe Result to represent possible errors:

data Result = IntResult Int
            | LambdaResult (Expr -> Expr)

eval :: Expr -> Maybe Result
eval (IntConst x)   = Just (IntResult x)
eval (Lambda   f)   = Just (LambdaResult f)
eval (Apply f0 arg) = do
    f1  <- eval f0
    case f1 of
        LambdaResult f -> eval (f arg)
        _              -> Nothing
eval (Add l0 r0) = do
    l1 <- eval l0
    r1 <- eval r0
    case (l1, r1) of
        (IntResult l, IntResult r) -> Just $ IntResult (l + r)
        _                          -> Nothing

The technique is called “tagged” because sum types in Haskell are tagged sum types. At runtime, such values are represented as a pair (tag, payload), and the tag is used to perform pattern matches. The eval function pattern-matches on IntResult and LambdaResult to perform type checking and error checking; in other words, it uses tags at runtime. Hence the name.
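To see the runtime checks in action, the following self-contained sketch repeats the definitions above and evaluates both the well-formed and the malformed examples. The `render` helper is a hypothetical addition for printing, since `Result` contains a function and cannot derive `Show`.

```haskell
data Expr = IntConst Int
          | Lambda   (Expr -> Expr)
          | Apply    Expr Expr
          | Add      Expr Expr

data Result = IntResult Int
            | LambdaResult (Expr -> Expr)

eval :: Expr -> Maybe Result
eval (IntConst x)   = Just (IntResult x)
eval (Lambda   f)   = Just (LambdaResult f)
eval (Apply f0 arg) = do
    f1 <- eval f0
    case f1 of
        LambdaResult f -> eval (f arg)   -- tag check: must be a lambda
        _              -> Nothing
eval (Add l0 r0) = do
    l1 <- eval l0
    r1 <- eval r0
    case (l1, r1) of
        (IntResult l, IntResult r) -> Just (IntResult (l + r))  -- tag check: two ints
        _                          -> Nothing

-- Hypothetical helper: turns an evaluation outcome into a printable string.
render :: Maybe Result -> String
render (Just (IntResult n))    = "Int " ++ show n
render (Just (LambdaResult _)) = "<lambda>"
render Nothing                 = "type error"

main :: IO ()
main = mapM_ (putStrLn . render . eval)
    [ IntConst 10 `Add` IntConst 20                             -- t1
    , Apply (Lambda $ \x -> IntConst 10 `Add` x) (IntConst 20)  -- t2
    , Apply (IntConst 10) (IntConst 10)                         -- e1
    , let f = Lambda (\x -> x) in Add f f                       -- e2
    ]
```

The two well-formed terms print `Int 30`, while the two malformed ones are only detected during evaluation and print `type error`.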

Tagless initial encoding

The idea is that we can use GADTs to add information about values to the Expr type and use it to make malformed eDSL expressions unrepresentable. We no longer need a Result data type, and there is no more runtime type checking in the eval function. In the Finally Tagless, Partially Evaluated paper, the authors refer to their versions of the data constructors IntResult and LambdaResult as “tags”. And because the GADTs-based approach has no tags, they call it the “tagless initial” encoding.

The GADTs-based AST definition and the corresponding interpreter eval are given below. The new AST is capable of representing examples t1 and t2 from the previous section while making e1 and e2 unrepresentable. The idea of the Expr a data type is that the type parameter a holds the type to which a given expression should evaluate. IntConst and Lambda just repeat their field types in that parameter because evaluating such a value just means unwrapping it. For the Add constructor, the parameter is Int, which means that Add evaluates to an integer. Apply evaluates to the result of the passed lambda function.

data Expr a where
    IntConst :: Int                     -> Expr Int
    Lambda   :: (Expr a -> Expr b)      -> Expr (Expr a -> Expr b)
    Apply    :: Expr (Expr a -> Expr b) -> Expr a -> Expr b
    Add      :: Expr Int -> Expr Int    -> Expr Int

eval :: Expr a -> a
eval (IntConst x) = x
eval (Lambda f)   = f
eval (Apply f x)  = eval (eval f x)
eval (Add l r)    = (eval l) + (eval r)
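As a quick check, here is the same definition as a complete module, with the GADTs extension enabled and the earlier examples given explicit types. The commented-out lines are the malformed terms from the tagged section; the comments describe why the type checker would be expected to reject them.

```haskell
{-# LANGUAGE GADTs #-}

data Expr a where
    IntConst :: Int                     -> Expr Int
    Lambda   :: (Expr a -> Expr b)      -> Expr (Expr a -> Expr b)
    Apply    :: Expr (Expr a -> Expr b) -> Expr a -> Expr b
    Add      :: Expr Int -> Expr Int    -> Expr Int

eval :: Expr a -> a
eval (IntConst x) = x
eval (Lambda f)   = f
eval (Apply f x)  = eval (eval f x)
eval (Add l r)    = eval l + eval r

-- 10 + 20
t1 :: Expr Int
t1 = IntConst 10 `Add` IntConst 20

-- (\x -> 10 + x) 20
t2 :: Expr Int
t2 = Apply (Lambda $ \x -> IntConst 10 `Add` x) (IntConst 20)

-- e1 = Apply (IntConst 10) (IntConst 10)   -- Expr Int is not a function expression
-- e2 = Add f f where f = Lambda (\x -> x)  -- Add only accepts Expr Int arguments

main :: IO ()
main = print (eval t1, eval t2)
```

Note that `eval` needs its type signature: pattern matching on a GADT refines the type `a` in each branch, and GHC does not infer such types on its own.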

Final encoding

Although the term “Initial” came from category theory, the term “Final” didn’t. Oleg shows that “the final and initial typed tagless representations are related by bijection” which means that these approaches are equivalent in some sense and both are “Initial” from the category theorist’s point of view. The Finally Tagless paper states “We call this approach final (in contrast to initial) because we represent each object term not by its abstract syntax, but by its denotation in a semantic algebra”. My best guess is that the name “Final” was chosen to differentiate from the term Initial as much as possible.

With tagless final, we build expressions using overloaded functions instead of data constructors. An expression in the style of the previous section (here, a function that adds 20 to its argument) will look like this:

test = lambda (\x -> add x (intConst 20))

Machinery to make it work consists of two parts.

  1. Combinators definition:

    class LambdaSYM repr where
        intConst :: Int -> repr Int
        lambda   :: (repr a -> repr b) -> repr (a -> b)
        apply    :: repr (a -> b) -> repr a -> repr b
    
  2. Interpreter implementation:

    data R a = R { unR :: a }
    
    instance LambdaSYM R where
        intConst x = R x
        lambda f   = R $ \x -> unR (f (R x))
        apply f a  = R $ (unR f) (unR a)
    
    eval :: R a -> a
    eval x = unR x
    

Applying interpreter:

testSmall :: LambdaSYM repr => repr Int
testSmall = apply (lambda (\x -> x)) (intConst 10)

main = print (eval testSmall) -- 10

Interesting points:

  1. The eval function instantiates the testSmall expression to a concrete type R Int (aka interpreter).

  2. It’s easy to define other interpreters. For example, a pretty printer. There is a little twist, though: a pretty printer needs to allocate names for the variables bound by lambda and keep track of allocated names, so the printing interpreter will pass an environment and it will look very similar to a Reader monad.

  3. It’s extremely easy to extend the language with new operations.

Adding a new add operation to our previous example requires only defining a new type class and implementing a new instance for each interpreter. Functions which use the new add operation should add an additional AddSYM repr constraint to their types.

class AddSYM repr where
    add :: repr Int -> repr Int -> repr Int

instance AddSYM R where
    add a b = R $ (unR a) + (unR b)

test :: (LambdaSYM repr, AddSYM repr) => repr Int
test = apply (apply (lambda (\y -> lambda (\x -> x `add` y))) (intConst 10)) (intConst 20)

Please note that in this particular case we are lucky because it’s possible to write instance AddSYM R. Or, in other words, it’s possible to implement the new operation on top of an existing interpreter. Sometimes we will need to extend existing interpreters or write new ones.
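The pretty printer mentioned among the interesting points can be sketched along the same lines. The interpreter `P` below is an assumption made for illustration, not code from Oleg’s papers: it ignores the type index and carries a counter of already bound variables, which plays the role of the Reader-like environment.

```haskell
class LambdaSYM repr where
    intConst :: Int -> repr Int
    lambda   :: (repr a -> repr b) -> repr (a -> b)
    apply    :: repr (a -> b) -> repr a -> repr b

class AddSYM repr where
    add :: repr Int -> repr Int -> repr Int

-- Hypothetical printing interpreter. The Int argument is the number of
-- variables bound so far; it is used to generate fresh variable names.
newtype P a = P { unP :: Int -> String }

instance LambdaSYM P where
    intConst x = P $ \_ -> show x
    lambda f   = P $ \n ->
        let v = "x" ++ show n   -- fresh name for the bound variable
        in "(\\" ++ v ++ " -> " ++ unP (f (P (const v))) (n + 1) ++ ")"
    apply f a  = P $ \n -> "(" ++ unP f n ++ " " ++ unP a n ++ ")"

instance AddSYM P where
    add a b = P $ \n -> "(" ++ unP a n ++ " + " ++ unP b n ++ ")"

pretty :: P a -> String
pretty e = unP e 0

test :: (LambdaSYM repr, AddSYM repr) => repr Int
test = apply (apply (lambda (\y -> lambda (\x -> x `add` y))) (intConst 10)) (intConst 20)

main :: IO ()
main = putStrLn (pretty test)
-- prints: (((\x0 -> (\x1 -> (x1 + x0))) 10) 20)
```

The `test` term is the same one that the `R` interpreter evaluates; only the type at which it is instantiated changes.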

Introspection. Host vs target language

In Oleg’s papers, he pretty prints and transforms tagless final AST. It’s very counterintuitive to expect the existence of such utilities because combinators are functions, and we are used to manipulating values of Algebraic Data Types. Yet, it is possible to extract facts about the structure of Final Tagless ASTs (i.e. introspection is possible) and to transform them. For the proof of that claim, check section 3.4 (page 28) of Oleg’s course where he presents pretty-printer and transformer.

The pretty-printer and the transformer of ASTs are just tagless final interpreters which keep track of some additional information and propagate it during interpreting from parents to children. Both are extensible in the same ways as other tagless final interpreters.

However, if we return to our first section and think about applying the tagless final approach to the simple case of defining an application monad, then we will quickly find out that we can’t inspect and transform the resulting monads. Consider our simple example:

class Monad m => HasDatabaseConfig m where
    getDatabaseConfig :: m DatabaseConfig

getUser :: (HasDatabaseConfig m, MonadIO m) => Name -> m User
getUser = ...

test :: (HasDatabaseConfig m, MonadIO m) => m String
test = do user <- getUser (Name "Pedro")
          if age user > 3 then pure "Fuzz"
                          else pure "Buzz"

Although the getDatabaseConfig function is overloaded, a lot of logic is expressed using functions and other constructions which are not overloaded. Therefore, there is no way to statically inspect the resulting monadic value. This is an important point: if you want introspection and transformation of ASTs, then you need to keep track of what’s overloaded and what’s not. Oleg obtained his great results because he overloaded everything and expressed embedded lambda calculus by using only overloaded functions. In other words, the power of tagless final depends on how far you want to go with overloading.

Relation to Free monads

People often compare tagless final and free monads. Both approaches give you machinery to define overloaded operations inside a monadic context. I am not an expert in free monads, but tagless final:

  • is faster;
  • is extensible (one can easily add new operations);
  • requires less boilerplate.

One argument for free monads is that it’s possible to statically introspect free monads. That is not completely true. Yes, you can easily execute actions one by one, and it helps to combine monadic values by interleaving actions (we can achieve a similar thing by interpreting into continuation with tagless final). But here is a blog post which describes the difficulties of free monad introspection (we’ve already covered the gist of the problem in the previous section). Also, see this blog post where the author describes difficulties associated with free monads and suggests using tagless final instead.

Here is a very good overview of free monad performance challenges. Here Edward Kmett gives his perspective on the same problem.

In a few words:

  1. A simple implementation of free monads has O(n^2) asymptotic complexity for left-associated monadic binds. It always adds one element to a “list” like this: [1, 2, 3] ++ [4].
  2. Using continuations (similarly to the DList package) gives O(n) binds, but makes some operations slow (for example combining two sequences of commands by interleaving them).
  3. Using a technology similar to the Seq data structure leads to a good asymptotic behaviour for all operations, but also gives significant constant overhead.

Performance

General techniques for optimizing Haskell code with polymorphic functions apply here. In a few words: using overloaded functions sometimes causes the compiler to generate code that dispatches calls through method dictionaries. The compiler often knows how to specialize functions and get rid of the dictionaries, but module boundaries can prevent that from happening. To help the compiler, we need to read the “Specializing” section of this document and then use an INLINEABLE pragma like this:

getUser :: (MonadReader r m, Has DatabaseConfig r, MonadIO m) => Name -> m User
...
{-# INLINEABLE  getUser #-}

Limitations

Haskell lacks first-class polymorphism (aka impredicative polymorphism), which means that we can’t specialize an existing data type to hold a polymorphic value like this:

Maybe (LambdaSYM repr => repr Int)

It follows that we can’t interpret such a polymorphic value twice (although this situation doesn’t arise very frequently when we just want an application monad with some overloaded operations). This is an issue when, for example, we parse some text file, obtain a tagless final AST, and want to interpret it twice: to evaluate and to pretty print. There is a limited workaround: define a newtype wrapper around a polymorphic value. The wrapper specifies concrete type constraints and hence kills one extensibility dimension of tagless final.
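A minimal sketch of the newtype workaround, assuming the RankNTypes extension. The `Term` wrapper, the node-counting `Size` interpreter and the toy `parseTerm` function are hypothetical names introduced for this example only.

```haskell
{-# LANGUAGE RankNTypes #-}

class LambdaSYM repr where
    intConst :: Int -> repr Int
    lambda   :: (repr a -> repr b) -> repr (a -> b)
    apply    :: repr (a -> b) -> repr a -> repr b

-- Evaluating interpreter, as in the "Final encoding" section.
newtype R a = R { unR :: a }

instance LambdaSYM R where
    intConst x = R x
    lambda f   = R $ \x -> unR (f (R x))
    apply f a  = R $ unR f (unR a)

-- Hypothetical second interpreter: counts the nodes of the term.
newtype Size a = Size { unSize :: Int }

instance LambdaSYM Size where
    intConst _ = Size 1
    lambda f   = Size $ 1 + unSize (f (Size 1))
    apply f a  = Size $ 1 + unSize f + unSize a

-- The wrapper stores a polymorphic term, but fixes the set of
-- constraints to LambdaSYM: terms that need AddSYM no longer fit.
newtype Term a = Term { unTerm :: forall repr. LambdaSYM repr => repr a }

-- Stand-in for a real parser that may fail.
parseTerm :: String -> Maybe (Term Int)
parseTerm "id 10" = Just (Term (apply (lambda (\x -> x)) (intConst 10)))
parseTerm _       = Nothing

main :: IO ()
main = case parseTerm "id 10" of
    Nothing   -> putStrLn "parse error"
    -- The wrapped term is interpreted twice, at two different types.
    Just term -> print (unR (unTerm term), unSize (unTerm term))
```

The comment on `Term` is where the lost extensibility dimension shows up: supporting a new class of operations means defining a new wrapper (or changing this one) and touching the parser.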

Oleg’s paper also presents another workaround: a special “duplicating” interpreter. Unfortunately, it is presented using a simple eDSL without lambda functions, and I failed to apply the same technique to a more complicated AST with lambdas. I mention it here just for the sake of completeness.

Also, note that sometimes people want to change implementations (aka interpreters) at runtime rather than at compile time, or even change only a part of the existing behaviour. For example, change the data source but leave all other application-specific logic intact. Tagless final can support this by implementing an interpreter that is configurable at runtime and uses some sort of method dictionary (see the handle pattern).
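One way this could look, as a sketch only: a single interpreter `HandleM` whose behaviour is determined by a record of functions passed in at runtime. `DatabaseHandle`, `HandleM` and the in-memory handle are assumptions made for this example; `HandleM` is a hand-rolled equivalent of a reader over IO, written out so that the sketch depends only on base.

```haskell
import Control.Monad (when)
import Data.IORef (newIORef, modifyIORef, readIORef)

data Name = Name String deriving (Eq, Show)
data User = User { name :: Name, age :: Int }

class Monad m => MonadDatabase m where
    getUser    :: Name -> m User
    deleteUser :: User -> m ()

-- Hypothetical handle: a method dictionary that is an ordinary value.
data DatabaseHandle = DatabaseHandle
    { hGetUser    :: Name -> IO User
    , hDeleteUser :: User -> IO ()
    }

-- One interpreter, parameterised by a handle chosen at runtime.
newtype HandleM a = HandleM { runHandleM :: DatabaseHandle -> IO a }

instance Functor HandleM where
    fmap f (HandleM g) = HandleM (fmap f . g)

instance Applicative HandleM where
    pure a = HandleM (\_ -> pure a)
    HandleM f <*> HandleM g = HandleM (\h -> f h <*> g h)

instance Monad HandleM where
    HandleM g >>= k = HandleM (\h -> g h >>= \a -> runHandleM (k a) h)

instance MonadDatabase HandleM where
    getUser n    = HandleM (\h -> hGetUser h n)
    deleteUser u = HandleM (\h -> hDeleteUser h u)

test :: MonadDatabase m => m ()
test = do user <- getUser (Name "Pedro")
          when (age user < 18) (deleteUser user)

main :: IO ()
main = do
    deleted <- newIORef []
    -- An in-memory data source; a real one would talk to a database.
    let inMemory = DatabaseHandle
            { hGetUser    = \n -> pure (User n 17)
            , hDeleteUser = \u -> modifyIORef deleted (name u :)
            }
    runHandleM test inMemory
    readIORef deleted >>= print
```

Swapping the data source means building a different `DatabaseHandle` value, for example based on a configuration file, while `test` and the `MonadDatabase` instance stay untouched.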

Conclusion

Thanks to MTL, the tagless final style of programming has been battle-tested and is widely adopted. In my opinion, it’s quite a natural way to write Haskell code because it utilizes a very basic Haskell feature: type classes. It also goes far beyond MTL: it can be used both for writing application-specific logic (with or without monad transformers) and for “true” eDSLs with their own semantics.

I also found that it’s not a hard concept to grasp, so it can safely be used in a large team of developers with different backgrounds.

That’s all, I hope my post will help others to grasp the main idea of tagless final and to use it in their projects.

Acknowledgement

Many thanks to Gints Dreimanis, Vlad Zavialov and others from Serokell for their help in writing this article. Without their reviews and suggestions, this post would not have happened.

by Vasiliy Kevroletin (hi+vasiliykevroletin@serokell.io) at December 07, 2018 12:00 AM

January 19, 2019

Haskell at Work

Purely Functional GTK+, Part 2: TodoMVC

In the last episode we built a "Hello, World" application using gi-gtk-declarative. It's now time to convert it into a to-do list application, in the style of TodoMVC.

To convert the “Hello, World!” application to a to-do list application, we begin by adjusting our data types. The Todo data type represents a single item, with a Text field for its name. We also need to import the Text type from Data.Text.

data Todo = Todo
  { name :: Text
  }

Our state will no longer be (), but a data type holding a Vector of Todo items. This means we also need to import Vector from Data.Vector.

data State = State
  { todos :: Vector Todo
  }

As the run function returns the last state value of the state reducer loop, we need to discard that return value in main. We wrap the run action in void, imported from Control.Monad.

Let’s rewrite our view function. We change the title to “TodoGTK+” and replace the label with a todoList, which we’ll define in a where binding. We use container to declare a Gtk.Box, with vertical orientation, containing all the to-do items. Using fmap and a typed hole, we see that we need a function Todo -> BoxChild Event.

view' :: State -> AppView Gtk.Window Event
view' s = bin
  Gtk.Window
  [#title := "TodoGTK+", on #deleteEvent (const (True, Closed))]
  todoList
  where
    todoList = container Gtk.Box
                         [#orientation := Gtk.OrientationVertical]
                         (fmap _ (todos s))

The todoItem will render a Todo value as a Gtk.Label displaying the name.

view' :: State -> AppView Gtk.Window Event
view' s = bin
  Gtk.Window
  [#title := "TodoGTK+", on #deleteEvent (const (True, Closed))]
  todoList
  where
    todoList = container Gtk.Box
                         [#orientation := Gtk.OrientationVertical]
                         (fmap todoItem (todos s))
    todoItem todo = widget Gtk.Label [#label := name todo]

Now, GHC tells us there’s a “non-type variable argument in the constraint”. The type of todoList requires us to add the FlexibleContexts language extension.

{-# LANGUAGE FlexibleContexts  #-}
{-# LANGUAGE OverloadedLabels  #-}
{-# LANGUAGE OverloadedLists   #-}
{-# LANGUAGE OverloadedStrings #-}
module Main where

The remaining type error is in the definition of main, where the initial state cannot be a () value. We construct a State value with an empty vector.

main :: IO ()
main = void $ run App
  { view         = view'
  , update       = update'
  , inputs       = []
  , initialState = State {todos = mempty}
  }

Adding New To-Do Items

While our application type-checks and runs, there are no to-do items to display, and there’s no way of adding new ones. We need to implement a form, where the user inserts text and hits the Enter key to add a new to-do item. To represent these events, we’ll add two new constructors to our Event type.

data Event
  = TodoTextChanged Text
  | TodoSubmitted
  | Closed

TodoTextChanged will be emitted each time the text in the form changes, carrying the current text value. The TodoSubmitted event will be emitted when the user hits Enter.

When the to-do item is submitted, we need to know the current text to use, so we add a currentText field to the state type.

data State = State
  { todos       :: Vector Todo
  , currentText :: Text
  }

We modify the initialState value to include an empty Text value.

main :: IO ()
main = void $ run App
  { view         = view'
  , update       = update'
  , inputs       = []
  , initialState = State {todos = mempty, currentText = mempty}
  }

Now, let’s add the form. We wrap our todoList in a vertical box, containing the todoList and a newTodoForm widget.

view' :: State -> AppView Gtk.Window Event
view' s = bin
  Gtk.Window
  [#title := "TodoGTK+", on #deleteEvent (const (True, Closed))]
  (container Gtk.Box
             [#orientation := Gtk.OrientationVertical]
             [todoList, newTodoForm]
  )
  where
    ...

The form consists of a Gtk.Entry widget, with the currentText of our state as its text value. The placeholder text will be shown when the entry isn’t focused. We use onM to attach an effectful event handler to the changed signal.

view' :: State -> AppView Gtk.Window Event
view' s = bin
  Gtk.Window
  [#title := "TodoGTK+", on #deleteEvent (const (True, Closed))]
  (container Gtk.Box
             [#orientation := Gtk.OrientationVertical]
             [todoList, newTodoForm]
  )
  where
    ...
    newTodoForm = widget
      Gtk.Entry
      [ #text := currentText s
      , #placeholderText := "What needs to be done?"
      , onM #changed _
      ]

The typed hole tells us we need a function Gtk.Entry -> IO Event. The reason we use onM is to have that IO action returning the event, instead of having a pure function. We need it to query the underlying GTK+ widget for its current text value. By using entryGetText, and mapping our event constructor over that IO action, we get a function of the correct type.

    ...
    newTodoForm = widget
      Gtk.Entry
      [ #text := currentText s
      , #placeholderText := "What needs to be done?"
      , onM #changed (fmap TodoTextChanged . Gtk.entryGetText)
      ]

It is often necessary to use onM and effectful GTK+ operations in event handlers, as the callback type signatures rarely have enough information in their arguments. But for the next event, TodoSubmitted, we don’t need any more information, and we can use on to declare a pure event handler for the activate signal.

    ...
    newTodoForm = widget
      Gtk.Entry
      [ #text := currentText s
      , #placeholderText := "What needs to be done?"
      , onM #changed (fmap TodoTextChanged . Gtk.entryGetText)
      , on #activate TodoSubmitted
      ]

Moving to the next warning, we see that the update' function is no longer total. We are missing cases for our new events. Let’s give the arguments names and pattern match on the event. The case for Closed will be the same as before.

update' :: State -> Event -> Transition State Event
update' s e = case e of
  Closed -> Exit

When the to-do text value changes, we’ll update the currentText state using a Transition. The first argument is the new state, and the second argument is an action of type IO (Maybe Event). We don’t want to emit any new event, so we use (pure Nothing).

update' :: State -> Event -> Transition State Event
update' s e = case e of
  TodoTextChanged t -> Transition s { currentText = t } (pure Nothing)
  Closed -> Exit

For the TodoSubmitted event, we define a newTodo value with the currentText as its name, and transition to a new state with the newTodo item appended to the todos vector. We also reset the currentText to be empty.
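The case described above can be sketched as follows. To keep the sketch runnable with only base, it uses stand-ins that are assumptions of this example: String instead of Text, a list instead of Vector (so `++` replaces `Vector.snoc`), and a local `Transition` type that mirrors the two constructors used in this post. The `step` helper is hypothetical and only serves to exercise `update'` without a GUI.

```haskell
-- Stand-ins for the real types (Text, Vector, and the library's Transition).
data Todo  = Todo  { name :: String } deriving (Eq, Show)
data State = State { todos :: [Todo], currentText :: String } deriving (Eq, Show)

data Event
    = TodoTextChanged String
    | TodoSubmitted
    | Closed

data Transition state event
    = Transition state (IO (Maybe event))
    | Exit

update' :: State -> Event -> Transition State Event
update' s e = case e of
    TodoTextChanged t -> Transition s { currentText = t } (pure Nothing)
    TodoSubmitted ->
        -- Append a new item named after the current text, then clear the text.
        let newTodo = Todo { name = currentText s }
        in  Transition
              s { todos = todos s ++ [newTodo], currentText = mempty }
              (pure Nothing)
    Closed -> Exit

-- Hypothetical helper: applies an event and ignores the IO action.
step :: State -> Event -> State
step s e = case update' s e of
    Transition s' _ -> s'
    Exit            -> s

main :: IO ()
main = print (foldl step (State [] "") [TodoTextChanged "Buy milk", TodoSubmitted])
```

Since `update'` is a pure function, folding a list of events over an initial state like this is also a convenient way to test the state logic.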

To use Vector.snoc, we need to add a qualified import.

import           Control.Monad                 (void)
import           Data.Text                     (Text)
import           Data.Vector                   (Vector)
import qualified Data.Vector                   as Vector
import qualified GI.Gtk                        as Gtk
import           GI.Gtk.Declarative
import           GI.Gtk.Declarative.App.Simple

Running the application, we can start adding to-do items.

Improving the Layout

Our application doesn’t look very good yet, so let’s improve the layout a bit. We’ll begin by left-aligning the to-do items.

todoItem todo =
  widget
    Gtk.Label
    [#label := name todo, #halign := Gtk.AlignStart]

To push the form down to the bottom of the window, we’ll wrap the todoList in a BoxChild, and override the defaultBoxChildProperties to have the child widget expand and fill all the available space of the box.

todoList =
  BoxChild defaultBoxChildProperties { expand = True, fill = True }
    $ container Gtk.Box
                [#orientation := Gtk.OrientationVertical]
                (fmap todoItem (todos s))

We re-run the application, and see it has a nicer layout.

Completing To-Do Items

There’s one very important thing missing: being able to mark a to-do item as completed. We add a Bool field called completed to the Todo data type.

data Todo = Todo
  { name      :: Text
  , completed :: Bool
  }

When creating new items, we set it to False.

update' :: State -> Event -> Transition State Event
update' s e = case e of
  ...
  TodoSubmitted ->
    let newTodo = Todo {name = currentText s, completed = False}
    in  Transition
          s { todos = todos s `Vector.snoc` newTodo, currentText = mempty }
          (pure Nothing)
  ...

Instead of simply rendering the name, we’ll use strike-through markup if the item is completed. We define completedMarkup, and using guards we’ll either render the new markup or render the plain name. To make it strike-through, we wrap the text value in <s> tags.

widget
  Gtk.Label
    [ #label := completedMarkup todo
    , #halign := Gtk.AlignStart
    ]
  where
    completedMarkup todo
      | completed todo = "<s>" <> name todo <> "</s>"
      | otherwise      = name todo

For this to work, we need to enable markup for the label by setting #useMarkup to True.

widget
  Gtk.Label
    [ #label := completedMarkup todo
    , #useMarkup := True
    , #halign := Gtk.AlignStart
    ]
  where
    completedMarkup todo
      | completed todo = "<s>" <> name todo <> "</s>"
      | otherwise      = name todo
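
One caveat with markup enabled: characters such as & and < in a to-do name are parsed as markup, so the name should be escaped before it is wrapped in <s> tags. The following is a minimal, GTK-free sketch of that idea. The escapeMarkup helper is hypothetical (it is not part of gi-gtk-declarative), and the sketch uses String instead of Text to stay self-contained.

```haskell
-- Simplified stand-in for the article's Todo type (String instead of Text).
data Todo = Todo { name :: String, completed :: Bool }

-- Hypothetical helper: escape the characters that markup treats specially.
escapeMarkup :: String -> String
escapeMarkup = concatMap escapeChar
  where
    escapeChar '&' = "&amp;"
    escapeChar '<' = "&lt;"
    escapeChar '>' = "&gt;"
    escapeChar c   = [c]

-- Same shape as the article's completedMarkup, with the name escaped first.
completedMarkup :: Todo -> String
completedMarkup todo
  | completed todo = "<s>" <> escapeMarkup (name todo) <> "</s>"
  | otherwise      = escapeMarkup (name todo)
```

With this in place, a to-do named "a<b" renders as literal text instead of breaking the label's markup.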

In order for the user to be able to toggle the completed status, we wrap the label in a Gtk.CheckButton bin. The #active property will be set to the current completed status of the Todo value. When the check button is toggled, we want to emit a new event called TodoToggled.

todoItem i todo =
  bin Gtk.CheckButton
      [#active := completed todo, on #toggled (TodoToggled i)]
    $ widget
        Gtk.Label
        [ #label := completedMarkup todo
        , #useMarkup := True
        , #halign := Gtk.AlignStart
        ]

Let’s add the new constructor to the Event data type. It will carry the index of the to-do item.

data Event
  = TodoTextChanged Text
  | TodoSubmitted
  | TodoToggled Int
  | Closed

To get the corresponding index of each Todo value, we’ll iterate using Vector.imap instead of using fmap.

    todoList =
      BoxChild defaultBoxChildProperties { expand = True, fill = True }
        $ container Gtk.Box
                    [#orientation := Gtk.OrientationVertical]
                    (Vector.imap todoItem (todos s))
    todoItem i todo =
      ...

The pattern match on events in the update' function is now missing a case for the new event constructor. Again, we’ll do a transition where we update the todos somehow.

update' :: State -> Event -> Transition State Event
update' s e = case e of
  ...
  TodoToggled i -> Transition s { todos = _ (todos s) } (pure Nothing)
  ...

We need a function Vector Todo -> Vector Todo that modifies the value at the index i. There’s no handy function like that available in the vector package, so we’ll create our own. Let’s call it mapAt.

update' :: State -> Event -> Transition State Event
update' s e = case e of
  ...
  TodoToggled i -> Transition s { todos = mapAt i _ (todos s) } (pure Nothing)
  ...

It will take as arguments the index, a mapping function, and a Vector a, and return a Vector a.

mapAt :: Int -> (a -> a) -> Vector a -> Vector a

We implement it using Vector.modify, and actions on the mutable representation of the vector. We overwrite the value at i with the result of mapping f over the existing value at i.

mapAt :: Int -> (a -> a) -> Vector a -> Vector a
mapAt i f = Vector.modify (\v -> MVector.write v i . f =<< MVector.read v i)

To use mutable vector operations through the MVector name, we add the qualified import.

import qualified Data.Vector.Mutable           as MVector
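
To try mapAt in isolation, here is a small self-contained sketch. The mapAt definition is the one from above. The mapAtSafe wrapper is a hypothetical addition, not part of the article: since MVector.read is bounds-checked, an out-of-range index raises a runtime error, and the wrapper leaves the vector unchanged instead.

```haskell
import           Data.Vector         (Vector)
import qualified Data.Vector         as Vector
import qualified Data.Vector.Mutable as MVector

-- Same definition as in the article.
mapAt :: Int -> (a -> a) -> Vector a -> Vector a
mapAt i f = Vector.modify (\v -> MVector.write v i . f =<< MVector.read v i)

-- Hypothetical variant: ignore indices that fall outside the vector.
mapAtSafe :: Int -> (a -> a) -> Vector a -> Vector a
mapAtSafe i f v
  | i >= 0 && i < Vector.length v = mapAt i f v
  | otherwise                     = v
```

Whether a stray index should be ignored or treated as a bug is a design choice; in this application the indices come from Vector.imap, so they should always be in range.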

Finally, we implement the function to map, called toggleCompleted.

toggleCompleted :: Todo -> Todo
toggleCompleted todo = todo { completed = not (completed todo) }

update' :: State -> Event -> Transition State Event
update' s e = case e of
  ...
  TodoToggled i -> Transition s { todos = mapAt i toggleCompleted (todos s) } (pure Nothing)
  ...

Now, we run our application, add some to-do items, and mark or unmark them as completed. We’re done!
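
Because the update logic is a pure function of state and event, it can also be exercised without GTK. The sketch below is a simplified model under stated assumptions: step is a hypothetical stand-in for update' that returns only the next state (no Transition wrapper, no Closed event), it uses String instead of Text, and it toggles with Vector.imap rather than mapAt.

```haskell
import           Data.Vector (Vector)
import qualified Data.Vector as Vector

data Todo = Todo { name :: String, completed :: Bool }
  deriving (Eq, Show)

data State = State { todos :: Vector Todo, currentText :: String }
  deriving (Eq, Show)

data Event
  = TodoTextChanged String
  | TodoSubmitted
  | TodoToggled Int

-- Hypothetical pure core of update': compute only the next state.
step :: State -> Event -> State
step s e = case e of
  TodoTextChanged t -> s { currentText = t }
  TodoSubmitted ->
    s { todos       = todos s `Vector.snoc` Todo (currentText s) False
      , currentText = ""
      }
  TodoToggled i -> s { todos = Vector.imap (toggleIf i) (todos s) }
  where
    -- Flip the completed flag only for the item at the requested index.
    toggleIf i j todo
      | i == j    = todo { completed = not (completed todo) }
      | otherwise = todo
```

Folding a list of events over an initial state, for example with foldl step, then gives a quick way to check sequences of user actions.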

Learning More

Building our to-do list application, we have learned the basics of gi-gtk-declarative and the “App.Simple” architecture. There’s more to learn, though, and I recommend checking out the project documentation. There are also a bunch of examples in the Git repository.

Please note that this project is very young, and that APIs are not necessarily stable yet. I think, however, that it’s a much nicer way to build GTK+ applications using Haskell than the underlying APIs provided by the auto-generated bindings.

Now, have fun building your own functional GTK+ applications!

by Oskar Wickström at January 19, 2019 12:00 AM

October 14, 2019

Monday Morning Haskell

Different Feature Schemes


In last week's edition of our Maze AI series, we explored the mechanics of supervised learning. We took the training data we'd been building up and trained an agent on it. We had one set of data to make the AI follow our own human moves, and another to follow our hand-crafted AI. This wasn't particularly successful. The resulting agent had a lot of difficulty navigating the maze and using its stun at the right times.

This week, we'll explore a couple different ways we can expand the feature set. First, we'll try encoding the legality of moves in our feature set. Second, we'll try expanding the feature set to include more data about the grid. This will motivate some other approaches to the problem. We'll conclude by taking the specifics of grid navigation out. We'll let our agent go to work on an empty grid to validate that this is at least a reasonable approach.

For some more reading on using Haskell and AI, take a look at our Haskell AI Series. We explore some reasons why Haskell could be a good fit for AI and machine learning problems. It will also help you through some of the basics of using Haskell and Tensor Flow.

Our supervised agent uses our current feature set. Let's remind ourselves what these features are. We have five different directions we can go (up, down, left, right, stand still). And in each of these directions, we calculate 8 different features.

  1. The maze distance to the goal
  2. The manhattan distance to the goal
  3. Whether the location contains an active enemy
  4. The number of enemies on the shortest path to the goal from that location
  5. The distance to the nearest enemy from that location
  6. The number of nearby enemies in manhattan distance terms
  7. Whether our stun is available
  8. The number of drills we have after the move

Some of these features are higher level. We do non-trivial calculations to figure them out. This gives our agent some idea of strategy. But there's not a ton of lower level information available! We zero out the features for a particular spot if it's past the world boundary. But we can't immediately tell from these features if a particular move is legal.

This is a big oversight. It's possible for our AI to learn about the legality of moves from the higher level training data. But it would take a lot more data and a lot more time.

So let's add a feature for how "easy" a move is. A value of 0 will indicate an illegal move, either past the world boundary or through a wall when we don't have a drill. A value of 1 will indicate a move that requires a drill. A value of 2 will indicate a normal move.

We'll add the extra feature into the LocationFeatures type. We'll also add an extra parameter to our produceLocationFeatures function. This boolean indicates whether a drill would be necessary. Note, we don't need to account for WorldBoundary. The value will get zeroed out in that case. We'll call this feature lfMoveEase since a higher value indicates less effort.

data LocationFeatures = LocationFeatures
  { …
  , lfMoveEase :: Int
  }

produceLocationFeatures :: Location -> World -> Bool -> LocationFeatures
produceLocationFeatures location@(lx, ly) w needsDrill = LocationFeatures
  …
  moveEase
  where
    moveEase = if not needsDrill then 2
      else if drillsRemaining > 0 then 1 else 0
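
As a standalone illustration of the three-level scale, here is a minimal sketch. The Boundary type below is a simplified, hypothetical stand-in for the boundary types in the series (its constructors carry no locations), and moveEaseFor is an illustrative name, not a function from the project.

```haskell
-- Hypothetical, simplified boundary type for illustration only.
data Boundary = WorldBoundary | Wall | Open

-- 0 = illegal move, 1 = legal but needs a drill, 2 = normal move.
moveEaseFor :: Int -> Boundary -> Int
moveEaseFor drillsLeft boundary = case boundary of
  WorldBoundary -> 0
  Open          -> 2
  Wall
    | drillsLeft > 0 -> 1
    | otherwise      -> 0
```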

It's easy to add the extra parameter to the function call in produceWorldFeatures. We already use case statements on the boundary types. Now we need to account for it when vectorizing our world.

vectorizeWorld :: World -> V.Vector Float
vectorizeWorld w = V.fromList (fromIntegral <$>
  [ ...
  , lfMoveEase standStill
  ...
  , zeroIfNull (lfMoveEase <$> up)
  ...
  , zeroIfNull (lfMoveEase <$> right)
  ...
  , zeroIfNull (lfMoveEase <$> down)
  ...
  , zeroIfNull (lfMoveEase <$> left)
  ])

When we train with this feature set, we actually get a good training error, down to around 10%. Thus it can learn our data a bit better. Yet it still can't navigate right.

Expanding the Feature Set

Another option we can try is to serialize the world in a more raw state. We currently use more strategic features. But what about using the information on the board?

Here's a different way to look at it. Let's fix it so that the grid must be 10x10, there must be 2 enemies, and we must start with 2 drill powerups on the map. Let's get these features about the world:

  1. 100 features for the grid cells. Each feature will be the integer value corresponding to the wall-shape of that cell. These are hexadecimal, like we have when serializing the maze.
  2. 4 features for the player. Get the X and Y coordinates for the position, the current stun delay, and the number of drills remaining.
  3. 3 features for each enemy. Again, X and Y coordinates, as well as a stun timer.
  4. 2 coordinate features for each drill location. Once a drill gets taken, we'll use -1 and -1.

This will give us a total of 114 features. Here's how it breaks down.

vectorizeWorld :: World -> V.Vector Float
vectorizeWorld w = gridFeatures V.++ playerFeatures V.++
                     enemyFeatures V.++ drillFeatures
  where

    -- 1. Features for the Grid
    mazeStr = Data.Text.unpack $ dumpMaze (worldBoundaries w)
    gridFeatures = V.fromList $
      fromIntegral <$> digitToInt <$> mazeStr

    player = worldPlayer w
    enemies = worldEnemies w

    -- 2. Features for the player
    playerFeatures = V.fromList $ fromIntegral <$>
      [ fst . playerLocation $ player
      , snd . playerLocation $ player
      , fromIntegral $ playerCurrentStunDelay player
      , fromIntegral $ playerDrillsRemaining player
      ]

    -- 3. Features for the two enemies
    enemy1 = worldEnemies w !! 0
    enemy2 = worldEnemies w !! 1
    enemyFeatures = V.fromList $ fromIntegral <$>
      [ fst . enemyLocation $ enemy1
      , snd . enemyLocation $ enemy1
      , fromIntegral $ enemyCurrentStunTimer enemy1
      , fst . enemyLocation $ enemy2
      , snd . enemyLocation $ enemy2
      , fromIntegral $ enemyCurrentStunTimer enemy2
      ]

    -- 4. Features for the drill locations
    drills = worldDrillPowerUpLocations w
    drillFeatures = V.fromList $ fromIntegral <$>
      if length drills == 0 then [-1, -1, -1, -1]
        else if length drills == 1
          then [fst (head drills), snd (head drills), -1, -1]
          else [ fst (head drills), snd (head drills)
               , fst (drills !! 1), snd (drills !! 1)
               ]
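
The drill features above pad missing drills with -1. One way to sketch the same padding without length, head and !! is shown below. This is an alternative formulation, not the code from the series, and drillFeaturesFrom is a hypothetical name that works on a plain list of coordinates.

```haskell
-- Flatten up to two drill locations into four features, padding with -1.
drillFeaturesFrom :: [(Int, Int)] -> [Float]
drillFeaturesFrom drills = take 4 (concatMap coords drills ++ repeat (-1))
  where
    coords (x, y) = [fromIntegral x, fromIntegral y]
```

Because take 4 fixes the output length, the function always yields exactly four features, which keeps the total at 100 + 4 + 6 + 4 = 114.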

As an optimization, we can make the grid features part of the world since they will not change.

Still though, our model struggles to complete the grid when training off this data. Compared to the high-level features, the model doesn't even learn very well. We get training errors around 25-30%, but a test error close to 50%. With more data and time, our model might be able to draw the connection between various features.

We could attempt to make our model more sophisticated. We're working with grid data, which is a little like an image. Image processing algorithms use concepts such as convolution and pooling. This allows them to derive patterns arising from how the grid actually looks. We're only looking at the data as a flat vector.

It's unlikely though that convolution and pooling would help us with this feature set. Our secondary features don't fit into the grid. So we would actually want to add them in at a later stage in the process. Besides, we won't get that much data from taking the average value or the max value in a 2x2 segment of the maze. (This is what pooling does).
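
To make the pooling step concrete, here is a minimal sketch of 2x2 pooling over a grid stored as a list of rows. It is written with plain lists for illustration and is not how a Tensor Flow pooling layer is invoked; pool2x2 and its helpers are hypothetical names.

```haskell
-- Group a list into consecutive pairs, dropping a trailing odd element.
pairs :: [a] -> [(a, a)]
pairs (x : y : rest) = (x, y) : pairs rest
pairs _              = []

-- Summarize every non-overlapping 2x2 block of the grid with one value.
pool2x2 :: ([a] -> b) -> [[a]] -> [[b]]
pool2x2 summarize grid =
  [ [ summarize [a, b, c, d]
    | ((a, b), (c, d)) <- zip (pairs top) (pairs bottom)
    ]
  | (top, bottom) <- pairs grid
  ]

maxPool2x2 :: [[Int]] -> [[Int]]
maxPool2x2 = pool2x2 maximum

avgPool2x2 :: [[Double]] -> [[Double]]
avgPool2x2 = pool2x2 (\block -> sum block / 4)
```

A 10x10 grid pooled this way shrinks to 5x5, which is why little information survives when each cell only encodes a wall shape.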

If we simplify the problem though, we might find a situation where they'll help.

A Simpler Problem

We're having a lot of difficulty with getting our agent to navigate the maze. So let's throw away the problem of navigation for a second. Can we train an agent that will navigate the empty maze? This should be doable.

Let's start with a bare bones feature set with the goal and current location highlighted in a grid. We'll give a value of 10 for our player's location, and a value of 100 for the target location. We start with a vector of all zeros, and use Vector.// to modify the proper values:

vectorizeWorld :: World -> V.Vector Float
vectorizeWorld w = finalFeatures
  where
    initialGrid = V.fromList $ take 100 (repeat 0.0)
    (px, py) = playerLocation (worldPlayer w)
    (gx, gy) = endLocation w
    playerLocIndex = (9 - py) * 10 + px
    goalLocIndex = (9 - gy) * 10 + gx
    finalFeatures = initialGrid V.//
      [(playerLocIndex, 10), (goalLocIndex, 100)]

Our AI bot will always follow the same path in this grid, so it will be quite easy for our agent to learn this path! Even if we use our own moves and vary the path a little bit, the agent can still learn it. It'll achieve 100% accuracy on the AI data. It can't get that high on our data, since we might choose different moves for different squares. But we can still train it so it wins every time.
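
The index arithmetic used for the empty-grid features can be checked on its own. The sketch below uses plain lists and hypothetical names (cellIndex, emptyGridFeatures), and assumes the same layout as above: 10x10 cells, with the top row (y = 9) stored first.

```haskell
gridSize :: Int
gridSize = 10

-- Row-major index with the top row (largest y) first.
cellIndex :: (Int, Int) -> Int
cellIndex (x, y) = (gridSize - 1 - y) * gridSize + x

-- 100 features: 10 at the player, 100 at the goal, 0 elsewhere.
-- If both coincide, the goal value wins, matching the update order above.
emptyGridFeatures :: (Int, Int) -> (Int, Int) -> [Float]
emptyGridFeatures player goal =
  [ value i | i <- [0 .. gridSize * gridSize - 1] ]
  where
    value i
      | i == cellIndex goal   = 100
      | i == cellIndex player = 10
      | otherwise             = 0
```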

Conclusion

So our results are still not looking great. But next week we'll take this last idea and run a little further with it. We'll keep it so that our features only come from the grid itself. But we'll add a few more complications with enemies. We might find that convolution and pooling are useful in that case.

If you're interested in using Haskell for AI but don't know where to start, read our Haskell AI Series! We discuss some important ideas like why Haskell is a good AI language. We also get into the basics of Tensor Flow with Haskell.

by James Bowen at October 14, 2019 02:30 PM

Philip Wadler

The Next 7000 Programming Languages


The Next 7000 Programming Languages, by Chatley, Donaldson, and Mycroft, appears in a book marking 10,000 volumes of LNCS. Though they riff on Landin's title, the authors consider something quite different: Darwinian evolution in the context of programming languages. I've long thought we need a theory of the economics of programming languages, to explain why the most popular language is not always the one that one might consider best suited to a task. But, until now, it's not been clear to me what such a theory might consist of, other than the observation that network effects apply to programming languages. This paper points in the direction of a theory of programming languages as a whole, drawing on evolutionary theory and with a potential grounding in empirical measures, such as scraping GitHub to measure which languages are more or less popular.

It is useful here to distinguish between the success of a species of plant (or a programming language) and that of a gene (or programming language concept). For example, while pure functional languages such as Haskell have been successful in certain programming niches the idea (gene) of passing side-effect-free functions to map, reduce, and similar operators for data processing, has recently been acquired by many mainstream programming languages and systems; we later ascribe this partly to the emergence of multi-core processors.

This last example highlights perhaps the most pervasive form of competition for niches (and for languages, or plants, to evolve in response): climate change. Ecologically, an area becoming warmer or drier might enable previously non-competitive species to get a foothold. Similarly, even though a given programming task has not changed, we can see changes in available hardware and infrastructure as a form of climate change—what might be a great language for solving a programming problem on a single-core processor may be much less suitable for multi-core processors or data-centre solutions.

Amusingly, other factors which encourage language adoption (e.g. libraries, tools, etc.) have a plant analogy as symbiotes—porting (or creating) a wide variety of libraries for a language enhances its prospects.

by Philip Wadler (noreply@blogger.com) at October 14, 2019 01:58 PM

Neil Mitchell

Monads as Graphs

Summary: You can describe type classes like monads by the graphs they allow.

In the Build Systems a la Carte paper we described build systems in terms of the type class their dependencies could take. This post takes the other view point - trying to describe type classes (e.g. Functor, Applicative, Monad) by the graphs they permit.

Functor

The Functor class has one operation: given Functor m, we have fmap :: (a -> b) -> m a -> m b. Consequently, if we want to end up with an m b, we need to start with an m a and apply fmap to it, and can repeatedly apply multiple fmap calls. The kind of graph that produces looks like:

We've used circles for the values m a/m b etc and lines to represent the fmap that connects them. Functor supplies no operations to "merge" two circles, so our dependencies form a linear tree. Thinking as a build system, this represents Docker, where base images can be extended to form new images.

Applicative

The Applicative class has two fundamental operations - pure :: a -> m a (which we ignore because it's pretty simple) and liftA2 :: (a -> b -> c) -> m a -> m b -> m c (most people think of <*> as the other fundamental operation, but liftA2 is equivalent in power). Thinking from a graph perspective, we now have the ability to create a graph node that points at two children, and uses the function argument to liftA2 to merge them. Since Applicative is a superset of Functor, we still have the ability to point at one child if we want. Children can also be pointed at by multiple parents, which just corresponds to reusing a value. We can visualise that with:

The structure of an Applicative graph can be calculated before any values on the graph have been calculated, which can be more efficient for tasks like parsing or build systems. When viewed as a build system, this represents build systems like Make or Buck, where all dependencies are given up front.
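
As a small illustration of computing the structure before any values, the sketch below writes one computation against an arbitrary Applicative and then interprets it twice: once with Const to collect the keys it depends on, and once with Maybe to actually compute a result. The names total, dependencies and result are hypothetical, chosen only for this example.

```haskell
import Data.Functor.Const (Const (..))

-- A computation over any Applicative, given a way to fetch a key.
total :: Applicative f => (String -> f Int) -> f Int
total fetch = (+) <$> fetch "a" <*> ((*) <$> fetch "b" <*> fetch "c")

-- Static analysis: record every key without computing any value.
dependencies :: [String]
dependencies = getConst (total (\key -> Const [key]))

-- Evaluation: look each key up in a concrete store.
result :: Maybe Int
result = total (`lookup` [("a", 1), ("b", 2), ("c", 3)])
```

Here dependencies lists the three keys without touching the store, which is the property a Make-style build system relies on.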

Selective

The next type class we look at is Selective, which can be characterised by the operation ifS :: m Bool -> m a -> m a -> m a. From a graph perspective, Selective interrogates the value of the first node, and then selects either the second or third node. We can visualise that as:

We use two arrows with arrow heads to indicate that we must point at one of the nodes, but don't know which. Unlike before, we don't know exactly what the final graph structure will be until we have computed the value on the first node of ifS. However, we can statically over-approximate the graph by assuming both branches will be taken. In build system terms, this graph corresponds to something like Dune.

Monad

The final type class is Monad which can be characterised with the operation (>>=) :: m a -> (a -> m b) -> m b. From a graph perspective, Monad interrogates the value of the first node, and then does whatever it likes to produce a second node. It can point at some existing node, or create a brand new node using the information from the first. We can visualise that as:

The use of an arrow pointing nowhere seems a bit odd, but it represents the unlimited options that the Monad provides. Before we always knew all the possible structures of the graph in advance. Now we can't know anything beyond a monad-node at all. As a build system, this graph represents a system like Shake.
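
A minimal sketch of such a value-dependent edge: the second key to fetch is only known once the first fetch has produced its value. The names indirect, store and result are hypothetical.

```haskell
-- The key for the second fetch is the *value* returned by the first.
indirect :: Monad m => (String -> m String) -> m String
indirect fetch = fetch "pointer" >>= fetch

store :: [(String, String)]
store = [("pointer", "target"), ("target", "value")]

result :: Maybe String
result = indirect (`lookup` store)
```

Const has no Monad instance, so the static dependency listing that works for an Applicative computation cannot be applied to indirect. The edge to "target" exists only at run time.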

by Neil Mitchell (noreply@blogger.com) at October 14, 2019 09:12 AM

Well-Typed.Com

A Summer of Runtime Performance

Need for speed

GHC produces pretty fast code by most standards. After Well-Typed put some development effort towards faster code it’s now even faster, with a reduction in runtime of 3-4%.

As a disclaimer, these numbers are based on nofib benchmarks. Actual benefits for individual programs can vary. However I’m confident that this will benefit any reasonably large program to some degree.

These changes were implemented over the summer by me (Andreas Klebinger) for Well-Typed. The changes will be in GHC 8.10 which is scheduled to be released early next year. So hopefully users won’t have to wait long to get these benefits.

So what did I change?

Here is a list of potentially user-facing changes as a result of this work. Not included are GHC internal changes which shouldn’t be user visible.

In the rest of this post, I will go over these changes in more detail …

1. Code layout redux

I already worked on code layout as part of Google Summer of Code last year. This year I picked this back up and improved it further.

Architecture of GHC’s code layout.

GHC’s pipeline has many stages, but relevant for code layout are the four below.

        Cmm
         v
Instruction Selection
         v
 Register Allocation
         v
    Code Layout
  • We start to think about code layout once we get to Cmm. Cmm is a C-like intermediate language which contains meta information about some branches indicating which branch of an if/else is more likely. We take this information and build an explicit control flow graph data structure from it.
  • From there we perform instruction selection, which can introduce new basic blocks and assigns virtual registers. This can change control flow in certain cases.
  • Register allocation can introduce new basic blocks as an implementation detail, but does not change control flow.
  • And the last thing we do is place all basic blocks in a sequence.

For code layout much of the functionality I added can be viewed as two independent parts:

  • Heuristics to estimate runtime behavior based on the control flow graph and code. Essentially we annotate the code with additional information at various stages and keep this information until code layout.
  • An algorithm taking the annotated code as input in order to produce a performant code layout.

This year I focused almost exclusively on the first part: how to produce more accurate estimates of runtime behavior.

The second part was for the most part already implemented by me last year during Google Summer of Code. Last year, we tried to compute a good code layout quickly, based on estimates of how often each control flow path is taken.

The main goal in the second step is to minimize the number of jump instructions, while placing jump targets closer to the jump sources. This is a hard optimization problem, but good heuristics take us very far for this part.

Estimating runtime behavior – loops

Loops are important for two reasons:

  • They are good predictors of runtime behavior.
  • Most execution time is spent in loops.

Combined, this means identifying loops allows for some big wins. Not only can we do a better job of optimizing the code involving them, but that code is also responsible for most of the instructions executed, which makes the gains even larger.

Last year I made the native backend “loop aware”. In practice this meant GHC would perform strongly connected components (SCC) analysis on the control flow graph.

  • This allowed us to identify blocks and control flow edges which are part of a loop.
  • In turn this means we can optimize loops at the cost of non-looping code for a net performance benefit.
  • However, SCC analysis cannot determine loop headers, back edges or the nesting level of nested loops, which means we miss out on some potential optimizations.
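
To illustrate what SCC analysis does and does not tell us, here is a small sketch using Data.Graph from the containers package on a toy control flow graph. The graph and block names are invented for this example, and this is not GHC's internal implementation.

```haskell
import Data.Graph (SCC (AcyclicSCC), flattenSCC, stronglyConnComp)

-- Toy CFG: (block name, block id, successor ids).
cfg :: [(String, Int, [Int])]
cfg =
  [ ("entry",  0, [1])
  , ("header", 1, [2, 4])
  , ("body",   2, [3])
  , ("latch",  3, [1])
  , ("exit",   4, [])
  ]

-- Every component that is not a single acyclic block is part of a loop.
loops :: [[String]]
loops = map flattenSCC (filter isLoop (stronglyConnComp cfg))
  where
    isLoop (AcyclicSCC _) = False
    isLoop _              = True
```

The result is a single component containing header, body and latch, but nothing in it says which block is the header or which edge is the back edge.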

This meant we sometimes ended up placing the loop header in the middle of the loop code. That is, the code blocks would be laid out in the order 2->3->1. This isn’t as horrible as it sounds. Loops tend to be repeated many times and it only adds two jumps overhead at worst. But sometimes we bail out of loops early and then the constant overhead matters. We also couldn’t optimize for inner loops as SCC is not enough to determine nested loops.

Nevertheless, being aware of loops at all far outweighed the drawbacks of this approach. As a result, this optimisation made it into GHC 8.8.

This year I fixed these remaining issues. Based on dominator analysis we can now not only determine if a block is part of a loop. We can also answer what loop it is, how deeply nested that loop is and determine the loop header.

As a consequence we can prioritize the inner most loops for optimizations, and can also estimate the frequency with which all control flow edges in a given function are taken with reasonable accuracy.

Estimating runtime behavior – more heuristics

The paper Static Branch Frequency and Program Profile Analysis by Youfeng Wu and James R. Larus contains a number of heuristics which can be used to estimate run time as well as an algorithm to compute global edge frequencies for a control flow graph. As far as I know this paper was based on work in/for gcc, however much of it was directly applicable to GHC with slight modifications.

In order for their approach to be usable we need to have certain meta-information about branches available, which is no longer inferable from the assembly code which GHC performs code layout on. For this reason we get the required information during the various pipeline stages and keep it around until we can use it for the code layout.

This allows us not only to implement heuristics from the paper, but also to implement GHC specific heuristics. For example we identify heap and stack checks which are easy to predict statically.

Results

Let’s start with the numbers:

Benchmark suite    Runtime difference (lower is better)
nofib              −1.9%
containers         −1.1%
megaparsec         −1.1%

This is a substantial improvement for the code generator!

Nofib is the default way to benchmark GHC changes.

In my experience containers and megaparsec respond very differently to code layout changes, so it is nice to see that this approach works well for both.

2. Linker woes

GHC has its own linker implementation used to load static libraries for e.g. GHCi and TH on Windows.

Some offsets when doing linker things are limited to 32 bit which is not an issue, as linkers can work around this as long as the overflow is recognized. So obviously we check for this case. However the check was wrong, leading to runtime crashes in rare circumstances as jump offsets overflowed.

In the end this is a very rare error, usually only happening when at least TH and large packages are involved on Windows. So far I only know of one other person who ran into this issue. But it reliably occurred for me when working on GHC itself.

I identified the issue a while ago, but while running benchmarks I had time to write up a patch and fix it. It will be in 8.10 and will be backported to 8.8 as well.

For the curious

In detail the issue is that there are no 64bit relative jumps on x64. So if possible a linker will use a 32bit offset to the target location. If this is not feasible the linker has to work around this by jumping to another piece of code, which will then use an absolute jump to go to the destination.

This works, however the check was faulty so failed sometimes if the value was close to the end of the 32bit range.

If you are curious, here is the old (faulty) check, and the new (working) check.

  // Old check
  if ((v >> 32) && ((-v) >> 32))

  // New check
  if ((v > (intptr_t) INT32_MAX) || (v < (intptr_t) INT32_MIN))
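
To see where the two checks disagree, both can be modelled in Haskell. This sketch assumes v is a signed 64-bit value and that >> is an arithmetic shift, as it is for Int64 with shiftR; oldCheck and newCheck are names chosen for this illustration.

```haskell
import Data.Bits (shiftR)
import Data.Int (Int32, Int64)

-- Model of the old check: True means "does not fit in 32 bits".
oldCheck :: Int64 -> Bool
oldCheck v = (v `shiftR` 32) /= 0 && (negate v `shiftR` 32) /= 0

-- Model of the new check: compare against the Int32 bounds directly.
newCheck :: Int64 -> Bool
newCheck v =
  v > fromIntegral (maxBound :: Int32) || v < fromIntegral (minBound :: Int32)
```

Under these assumptions the old check only fires once the magnitude reaches 2^32, so an offset such as 2^31 slips through even though it no longer fits in a signed 32-bit field.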

3. Smaller interface files

GHC stores a lot of Int/Int32/… numbers in interface files. Most of these however represent small values. In fact, most of the numbers we store could fit into a single byte. So I changed the encoding to one which allows us to do so.

I wrote up the details a while ago so if you are interested check it out.
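
For readers unfamiliar with variable-length integer encodings, here is a minimal sketch of one common scheme, unsigned LEB128, where each byte carries 7 bits of payload and the high bit signals that more bytes follow. It is shown only to illustrate the idea; this section does not say which exact encoding GHC's interface files use, and encodeVar and decodeVar are hypothetical names.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word64, Word8)

-- Emit 7 bits per byte, least significant group first.
encodeVar :: Word64 -> [Word8]
encodeVar n
  | n < 0x80  = [fromIntegral n]
  | otherwise = (fromIntegral (n .&. 0x7f) .|. 0x80) : encodeVar (n `shiftR` 7)

-- Rebuild the number from the 7-bit groups, ignoring continuation bits.
decodeVar :: [Word8] -> Word64
decodeVar = foldr step 0
  where
    step byte acc = (acc `shiftL` 7) .|. fromIntegral (byte .&. 0x7f)
```

Values below 128 take a single byte, which is where the savings come from when most stored numbers are small.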

4. Precomputed Int closures

GHC uses a great hack to reduce GC overhead when dealing with Char or Int values. Ömer Sinan Ağacan wrote about this in the past so I won’t reproduce the details here.

The short version is that a large number of Int and Char values will be close to zero. So during garbage collection GHC replaces all heap-allocated ASCII-range Char values and certain Int values with references to ones built into the runtime. This means we do not have to copy them during garbage collection, which speeds up collections. Further, we can get away with a single 1 :: Int value instead of many, so we also reduce memory usage.

This is great, but the Int range was only from -16 to 16 which seemed quite small to me.

I tried a few things and settled on -16 to 255 as new range. The impact (based on nofib) was surprisingly large. From the MR:

Effects as I measured them:

RTS Size: +0.1%
Compile times: -0.5%
Runtime nofib: -1.1%

Nofib overrepresents Int usage. But even GHC itself got 0.5% faster so I was quite happy with how this change turned out. And there will be follow up work improving things further when I get around to it.

5. Switch statements

Here I optimized two patterns.

Eliminating redundant switches

The simpler change is that if all branches are equal we can eliminate the switch entirely. This happens rarely as this pattern is usually eliminated earlier in the pipeline. But sometimes it only arises after certain optimizations have been applied so GHC will check for this later in the pipeline (at the Cmm stage) as well.

This pattern is fairly rare. However when it happens in an inner loop it can result in a massive performance improvement, so it’s good to patch this.
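
The idea can be sketched on a toy statement type. This is not GHC's Cmm representation; Stmt and simplify are hypothetical, and the sketch assumes that evaluating the scrutinee has no side effects, so dropping the switch is safe.

```haskell
-- Toy control flow: jump to a label, or switch on a named scrutinee.
data Stmt
  = Goto String
  | Switch String [(Int, Stmt)] Stmt   -- scrutinee, cases, default
  deriving (Eq, Show)

-- Replace a switch whose branches are all identical by that branch.
simplify :: Stmt -> Stmt
simplify (Switch scrutinee alts def)
  | all ((== def') . snd) alts' = def'
  | otherwise                   = Switch scrutinee alts' def'
  where
    alts' = [ (tag, simplify body) | (tag, body) <- alts ]
    def'  = simplify def
simplify stmt = stmt
```

Simplifying bottom-up means a nested switch that collapses can in turn make its parent's branches equal.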

Avoid expression duplication

The other change was slightly more complex. For this we need to look at some Cmm code.

We start off with a Cmm switch statement like the one in this macro:

#define ENTER_(ret,x)                                   \
 again:                                                 \
  W_ info;                                              \
  LOAD_INFO(ret,x)                                      \
  /* See Note [Heap memory barriers] in SMP.h */        \
  prim_read_barrier;                                    \
  switch [INVALID_OBJECT .. N_CLOSURE_TYPES]            \
         (TO_W_( %INFO_TYPE(%STD_INFO(info)) )) {       \
  case                                                  \
    IND,                                                \
    IND_STATIC:                                         \
   {                                                    \
      x = StgInd_indirectee(x);                         \
      goto again;                                       \
   }                                                    \
  case                                                  \
    FUN,                                                \
    FUN_1_0,                                            \
    FUN_0_1,                                            \
    FUN_2_0,                                            \
    FUN_1_1,                                            \
    FUN_0_2,                                            \
    FUN_STATIC,                                         \
    BCO,                                                \
    PAP:                                                \
   {                                                    \
       ret(x);                                          \
   }                                                    \
  default:                                              \
   {                                                    \
       x = UNTAG_IF_PROF(x);                            \
       jump %ENTRY_CODE(info) (x);                      \
   }                                                    \
  }

GHC is fairly smart about how it compiles these kinds of constructs. In this case it will use a binary search tree as one can see in the control flow graph:

So far so good, however it also decides to inline the expression we branch on into each node.

Here is one snippet taken from GHC’s RTS where we duplicate I32[_c3::I64 - 8]. From a code size point of view this isn’t that bad, but we also duplicate the work in each node of our binary search tree as a result. This is especially bad if we end up duplicating memory reads.

==================== Optimised Cmm ====================
stg_ap_0_fast() { //  [R1]
        { []
        }
    {offset
      ...
      c6: // global
          _c3::I64 = I64[_c1::P64];   // CmmAssign
          if (I32[_c3::I64 - 8] < 26 :: W32) goto ub; else goto ug;   // CmmCondBranch
      ub: // global
          if (I32[_c3::I64 - 8] < 15 :: W32) goto uc; else goto ue;   // CmmCondBranch
      uc: // global
          if (I32[_c3::I64 - 8] < 8 :: W32) goto c7; else goto ud;   // CmmCondBranch
      ...
    }
}

If we go through four nodes in the search tree this means we perform 3 redundant loads from memory. Even when we hit L1 cache this still incurs a latency overhead of 15 cycles on my desktop machine.

For comparison, if the branches are predicted correctly and we remove this overhead, we should be able to process all 4 search tree nodes in 2-4 cycles total.
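The cost of re-reading the scrutinee at every node can be modelled outside of Cmm. The following Haskell sketch is only an illustration and not GHC's code generator: `Tree`, `walkReloading`, `walkOnce`, `countingLoad`, and the leaf labels are hypothetical, and the pivots 26, 15, and 8 are borrowed from the dump above. It counts how often the "memory read" happens in each strategy.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)

-- A binary search tree over integer tags, like the one built for a switch.
-- Node pivot l r: go to l when tag < pivot, otherwise to r.
data Tree = Leaf String | Node Int Tree Tree

-- Re-reads the scrutinee at every node (the duplicated-load pattern).
walkReloading :: IO Int -> Tree -> IO String
walkReloading _ (Leaf label) = pure label
walkReloading load (Node pivot lo hi) = do
  tag <- load
  walkReloading load (if tag < pivot then lo else hi)

-- Reads the scrutinee once and reuses the value at every node.
walkOnce :: IO Int -> Tree -> IO String
walkOnce load tree = do
  tag <- load
  let go (Leaf label) = label
      go (Node pivot lo hi) = go (if tag < pivot then lo else hi)
  pure (go tree)

-- A fake memory read that returns a fixed tag, paired with an action
-- that reports how many reads have happened so far.
countingLoad :: Int -> IO (IO Int, IO Int)
countingLoad tag = do
  counter <- newIORef (0 :: Int)
  pure (modifyIORef' counter (+ 1) >> pure tag, readIORef counter)

tree :: Tree
tree =
  Node 26
    (Node 15 (Node 8 (Leaf "<8") (Leaf "8..14")) (Leaf "15..25"))
    (Leaf ">=26")

main :: IO ()
main = do
  (load1, reads1) <- countingLoad 3
  r1 <- walkReloading load1 tree
  n1 <- reads1
  (load2, reads2) <- countingLoad 3
  r2 <- walkOnce load2 tree
  n2 <- reads2
  print (r1, n1) -- ("<8",3)
  print (r2, n2) -- ("<8",1)
```

Both walks reach the same leaf; only the number of reads differs, which is the overhead the change removes.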

Only very rarely does GHC emit code following this pattern when compiling Haskell code. However, GHC also supports compilation of hand-written Cmm code, a feature used heavily by the RTS. Some of the more common macros used in the RTS lead to this pattern. As a consequence, most if not all programs will benefit from this change, as the RTS will perform better.

For nofib it reduced runtime by 1.0%, so definitely a meaningful improvement.

Conclusion

I think it’s great that GHC does not only add new features, but also keeps improving things for existing code. And I’m glad I was able to add some improvements myself for the community.

by andreask at October 14, 2019 12:00 AM

October 13, 2019

Sandy Maguire

New Book: Design and Interpretation of Haskell Programs

I’m writing a new book, on how to write good, real-world Haskell applications! The announcement copy is below.


Hi there! My name is Sandy Maguire — you might know me from my work on Polysemy and Thinking with Types.

One of purely functional programming’s greatest strengths is its powerful abstraction capabilities. We proudly exclaim that our functions are referentially transparent, and because of that, our bugs will always be shallow. And this is often true.

10x is often cited as the magic number beyond which technology is good enough to overcome network effects. I’m personally convinced that writing Haskell is 10x better than any other popular programming language I’ve tried. But if functional programming is so good, why hasn’t it yet taken over the world?

This is a very serious question. If we’re right about this, why haven’t we won?

Design and Interpretation of Haskell Programs is my answer to this question. Haskell hasn’t taken market share because we collectively don’t yet know how to write real applications with it. Abstraction is our language’s greatest strength, but all of our “best practices” evangelize doing everything directly in IO. Is it really any wonder that nonbelievers aren’t convinced when we show them an imperative C program that just happens to be compiled with GHC?

Instead of giving up, this book encourages us to take a heavy focus on designing leak-free abstractions, on building programs that can be reasoned about algebraically, and on completely separating business logic from interpretation details.

But I can’t do it all by myself. Writing a book is a hard, gruelling process, that’s made much easier by knowing that people care about the end result. If you’re a conscientious engineer, unhappy with the status-quo of large, unmaintainable, “production-grade” Haskell, then this book is for you. By pledging, you let me know that this book is worth writing. In addition, your early feedback will help make this book the best it can possibly be.

Not sure if this is the book for you? Take a look at the sample before committing to anything!

With your help, together we can tame software complexity and write codebases we’re proud of.

One love, Sandy

October 13, 2019 02:14 PM

Philip Wadler

Toki Pona


At Haskell Exchange, Stephan Schneider introduced me to Toki Pona, an extremely simple artificial language, based on 120 symbols. You can read more on Wikipedia.




by Philip Wadler (noreply@blogger.com) at October 13, 2019 01:12 PM

Brent Yorgey

Competitive Programming in Haskell: reading large inputs with ByteString

In my last post in this series, we looked at building a small Scanner combinator library for lightweight input parsing. It uses String everywhere, and usually this is fine, but occasionally it’s not.

A good example is the Kattis problem Army Strength (Hard). There are a number of separate test cases; each test case consists of two lines of positive integers which record the strengths of monsters in two different armies. Supposedly the armies will have a sequence of battles, where the weakest monster dies each time, with some complex-sounding rules about how to break ties. It sounds way more complicated than it really is, though: a bit of thought reveals that to find out who wins we really just need to see which army’s maximum-strength monster is strongest.

So our strategy for each test case is to read in the two lists of integers, find the maximum of each list, and compare. Seems pretty straightforward, right? Something like this:

import           Control.Arrow
import           Data.List.Split

main = interact $
  lines >>> drop 1 >>> chunksOf 4 >>>
  map (drop 2 >>> map (words >>> map read) >>> solve) >>>
  unlines

solve :: [[Int]] -> String
solve [gz, mgz] = case compare (maximum gz) (maximum mgz) of
  LT -> "MechaGodzilla"
  _  -> "Godzilla"

Note I didn’t actually use the Scanner abstraction here, though I could have; it’s actually easier to just ignore the numbers telling us how many test cases there are and the length of each line, and just split up the input by lines and go from there.

This seems straightforward enough, but sadly, it results in a Time Limit Exceeded (TLE) error on the third of three test cases. Apparently this program takes longer than the allowed 1 second. What’s going on?

If we look carefully at the limits for the problem, we see that there could be up to 50 test cases, each test case could have two lists of length 10^5, and the numbers in the lists can be up to 10^9. If all those are maxed out (as they probably are in the third, secret test case), we are looking at an input file many megabytes in size. At this point the time to simply read the input is a big factor. Reading the input as a String has a lot of overhead: each character gets its own cons cell; breaking the input into lines and words requires traversing over these cons cells one by one. We need a representation with less overhead.

Now, if this were a real application, we would reach for Text, which is made for representing textual information and can correctly handle unicode encodings and all that good stuff. However, this isn’t a real application: competitive programming problems always limit the input and output strictly to ASCII, so characters are synonymous with bytes. Therefore we will commit a “double no-no”: not only are we going to use ByteString to represent text, we’re going to use Data.ByteString.Lazy.Char8 which simply assumes that each 8 bits is one character. As explained in a previous post, however, I think this is one of those things that is usually a no-no but is completely justified in this context.

Let’s start by just replacing some of our string manipulation with corresponding ByteString versions:

import           Control.Arrow
import qualified Data.ByteString.Lazy.Char8 as C
import           Data.List.Split

main = C.interact $
  C.lines >>> drop 1 >>> chunksOf 4 >>>
  map (drop 2 >>> map (C.words >>> map (C.unpack >>> read)) >>> solve) >>>
  C.unlines

solve :: [[Int]] -> C.ByteString
solve [gz, mgz] = case compare (maximum gz) (maximum mgz) of
  LT -> C.pack "MechaGodzilla"
  _  -> C.pack "Godzilla"

This already helps a lot: this version is actually accepted, taking 0.66 seconds. (Note there’s no way to find out how long our first solution would take if allowed to run to completion: once it goes over the time limit Kattis just kills the process. So we really don’t know how much of an improvement this is, but hey, it’s accepted!)

But we can do even better: it turns out that read also has a lot of overhead, and if we are specifically reading Int values we can do something much better. The ByteString module comes with a function

readInt :: C.ByteString -> Maybe (Int, C.ByteString)

Since, in this context, we know we will always get an integer with nothing left over, we can replace C.unpack >>> read with C.readInt >>> fromJust >>> fst. Let’s try it:

import           Control.Arrow
import qualified Data.ByteString.Lazy.Char8 as C
import           Data.List.Split
import           Data.Maybe (fromJust)

main = C.interact $
  C.lines >>> drop 1 >>> chunksOf 4 >>>
  map (drop 2 >>> map (C.words >>> map readInt) >>> solve) >>>
  C.unlines

  where
    readInt = C.readInt >>> fromJust >>> fst

solve :: [[Int]] -> C.ByteString
solve [gz, mgz] = case compare (maximum gz) (maximum mgz) of
  LT -> C.pack "MechaGodzilla"
  _  -> C.pack "Godzilla"

Now we’re talking — this version completes in a blazing 0.04 seconds!
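If relying on fromJust feels too brittle, the same readInt function can drive a parser for a whole run of integers without any partial functions. This is a small sketch rather than part of the accepted solution; `readInts` is a name made up for the example. It stops quietly at the first token that is not an integer.

```haskell
import qualified Data.ByteString.Lazy.Char8 as C
import Data.Char (isSpace)
import Data.List (unfoldr)

-- Skip leading whitespace, read one Int, and repeat on the remainder.
-- unfoldr stops as soon as readInt returns Nothing (end of input or a
-- token that does not start with an integer).
readInts :: C.ByteString -> [Int]
readInts = unfoldr (C.readInt . C.dropWhile isSpace)

main :: IO ()
main = print (readInts (C.pack "3 1 4\n1 5 9\n")) -- [3,1,4,1,5,9]
```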

We can take these principles and use them to make a variant of the Scanner module from last time which uses (lazy, ASCII) ByteString instead of String, including the use of the readInt functions to read Int values quickly. You can find it here.
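For readers who do not want to follow the link, here is a rough guess at what such a module could look like. It is not the linked code: it assumes a Scanner that is simply a State monad over a list of ByteString tokens, and the names `runScanner`, `token`, `int`, and `intList` are chosen for this sketch. It needs the mtl package.

```haskell
import Control.Monad (replicateM)
import Control.Monad.State (State, evalState, get, put)
import qualified Data.ByteString.Lazy.Char8 as C
import Data.Maybe (fromJust)

-- A scanner consumes whitespace-separated tokens one at a time.
type Scanner = State [C.ByteString]

runScanner :: Scanner a -> C.ByteString -> a
runScanner s = evalState s . C.words

-- Take the next token, failing loudly if the input is exhausted.
token :: Scanner C.ByteString
token = do
  ts <- get
  case ts of
    (t : rest) -> put rest >> pure t
    [] -> error "Scanner: unexpected end of input"

-- Read the next token as an Int using the fast ByteString reader.
int :: Scanner Int
int = fst . fromJust . C.readInt <$> token

-- A length-prefixed list: a count n followed by n integers.
intList :: Scanner [Int]
intList = int >>= \n -> replicateM n int

main :: IO ()
main =
  print (runScanner (int >>= \t -> replicateM t intList)
                    (C.pack "2\n3 10 20 30\n2 5 6\n"))
  -- [[10,20,30],[5,6]]
```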

by Brent at October 13, 2019 02:46 AM

October 11, 2019

Mark Jason Dominus

More fair cake-cutting

In a recent article about fair cake division, I said:

Grandma can use the same method … to divide a regular 17-gonal cake into 23 equally-iced equal pieces.

I got to wondering what that would look like, and the answer is, not very interesting. A regular 17-gon is pretty close to a circle, and the 23 pieces, which are quite narrow, look very much like equal wedges of a circle:

A 17-gon divided into 23 equal pieces, as described in the previous paragraph

This is generally true, and it becomes more nearly so both as the number of sides of the polygon increases (it becomes more nearly circular) and as the number of pieces increases (the very small amount of perimeter included in each piece is not very different from a short circular arc).

Recall my observation from last time that even in the nearly extreme case of a square divided into three slices, the central angles deviate from equality by only a few percent.

Of particular interest to me is this series of demonstrations of how to cut four pieces from a cake with an odd number of sides:

I think this shows that the whole question is a little bit silly: if you just cut the cake into equiangular wedges, the resulting slices are very close in volume and in frosting. If the nearly-horizontal cuts in the pentagon above had been perfectly straight and along the x-axis, they would have intersected the pentagon only 3% of a radius-length lower than they should have.

Some of the simpler divisions of simpler cakes are interesting. A solution to the original problem (of dividing a square cake into nine pieces) is highlighted.

The method as given works regardless of where you make the first cut. But the results do not look very different in any case:

The original SVG files are also available, as is the program that wrote them.

by Mark Dominus (mjd@plover.com) at October 11, 2019 06:06 PM

Tweag I/O

Ormolu: Announcing First Release

Mark Karpov, Utku Demir

We're happy to announce the first release of Ormolu, a formatter for Haskell source code. Some may remember our first post from a couple months ago where we disclosed our work on the Ormolu project—but carefully called it “vaporware” then. Times have changed; it's not anymore.

Functionality

We've run Ormolu on large real-world projects, legacy codebases, and most popular packages. We consider Ormolu usable:

  • It formats all Haskell constructs and handles all language extensions.
  • It places comments correctly.
  • It performs some normalization of language pragmas, GHC/Haddock option pragmas, and import lists.
  • It's fast enough to format large real-world codebases in seconds.
  • Its output is almost always idempotent. We'll get idempotence 100% right in the following releases.

Style

Ormolu's original goal was to implement a formatting style close to one people already use. We also wanted a style that minimizes diffs. We met both goals in the first release, but you may notice some unexpected stylistic decisions.

Let's look at an example:

{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE RecordWildCards #-}

-- | A formatter for Haskell source code.
module Ormolu
  ( ormolu,
    ormoluFile,
    ormoluStdin,
    -- ...
    withPrettyOrmoluExceptions,
  )
where

-- ...

-- | Load a file and format it. The file stays intact and the rendered
-- version is returned as 'Text'.
ormoluFile ::
  MonadIO m =>
  -- | Ormolu configuration
  Config ->
  -- | Location of source file
  FilePath ->
  -- | Resulting rendition
  m Text
ormoluFile cfg path =
  liftIO (readFile path) >>= ormolu cfg path

The snippet should look conventional except Ormolu placed commas and function arrows at the ends of lines. Let's see why we decided to place them there.

Commas

While this popular formatting choice

( "foo"
, "bar"
, "baz"
)

works in expressions, it's a parse error if used in a pattern, because everything in a multiline pattern should be more indented than the opening parenthesis. That's why we make an exception in our rendering rules—we move the closing parenthesis one indentation level to the right on the rare occasions it's necessary. Re-arranging or shifting all commas would be too inconsistent in that case, so we went with commas at the end of lines.
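To make the layout problem concrete, here is a small illustration written for this purpose; `describe` is a hypothetical function and the exact shape of Ormolu's output may differ. A pattern in a case alternative opens at the layout column of the block, so any continuation line that starts at that same column is read as the beginning of a new alternative.

```haskell
-- Trailing commas and a shifted closing parenthesis keep every continuation
-- line of the pattern to the right of the column where the alternative starts.
describe :: (Int, Int) -> String
describe point =
  case point of
    ( 0,
      _
      ) -> "on the y-axis"
    ( _,
      0
      ) -> "on the x-axis"
    _ -> "elsewhere"

-- The leading-comma layout is rejected here, because ", _" and ")" would
-- start at the same column as the alternative itself:
--
--   case point of
--     ( 0
--     , _
--     ) -> "on the y-axis"

main :: IO ()
main = mapM_ (putStrLn . describe) [(0, 5), (3, 0), (2, 2)]
```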

Did you notice that we also add trailing commas where possible, for example, in export lists? Our ability to do this comes from a relatively new feature in Haskell—it helps with Ormolu's goal of achieving minimal diffs too. If we try to remember where leading commas come from, Johan Tibell's style guide comes to mind. The author said later:

[…] I designed [Haskell style guide] to work with the lack of support for a trailing comma. If we supported a trailing comma my style guide would probably be different.

There you have it: Ormolu supports a trailing comma now, so it's not unreasonable to start putting commas at the end of lines. This style is also more familiar to programmers who come to Haskell from other languages.

Function arrows

We faced another dilemma with placement of function arrows. The familiar style is this:

traverse
  :: Applicative f
  => (a -> f b)
  -> t a
  -> f (t b)

There is nothing wrong with it. It works perfectly well… with Haskell98. As soon as we start adding more features to the type system, it's no longer clear what is the best way to format type signatures:

reassociateOpTreeWith
  :: forall ty op.
  [(RdrName, Fixity)]
  -> (op -> Maybe RdrName)
  -> OpTree ty op
  -> OpTree ty op

Here, we have had a hard time deciding how to format the type signature because of forall ty op.. If we leave [(RdrName, Fixity)] like this, it's not aligned with other arguments and looks quite different because it's not prefixed by (->).

We could try this:

reassociateOpTreeWith
  :: forall ty op.
     [(RdrName, Fixity)]
  -> (op -> Maybe RdrName)
  -> OpTree ty op
  -> OpTree ty op

But then the first argument starts at a column that is not a multiple of our indentation step. We could also try to put . on the same line as [(RdrName, Fixity)] but . belongs to forall ty op., so it's not perfect.

In the future, there will be more additions to the type system:

  • Linear types will add a new type of arrow. It's clear that the new arrow (#->) will be at least three characters long and won't align with :: and other arrows. What's more, (#->) is shorthand. In general linear arrows can have multiplicity annotations, like p in Int #p-> Bool. Multiplicities characterize use of the function argument, whose type is given immediately before the multiplicity, so it makes sense to group the argument type and the arrow on the same line.

  • Dependent Haskell is going to add new constructions on the type level as well. They may bring us problems similar to the existing forall.

We found that all these problems get solved if we put arrows in trailing position:

reassociateOpTreeWith ::
  forall ty op.
  [(RdrName, Fixity)] ->
  (op -> Maybe RdrName) ->
  OpTree ty op ->
  OpTree ty op

This makes sense especially because (->) is right-associative.
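A tiny example of that associativity, with names invented for the illustration: the two signatures below denote the same type, because `a -> a -> a` groups as `a -> (a -> a)`, so each trailing arrow reads as "and then the rest of the function".

```haskell
{-# LANGUAGE ExplicitForAll #-}

-- Trailing-arrow layout: one argument per line, result last.
applyTwice ::
  forall a.
  (a -> a) ->
  a ->
  a
applyTwice f = f . f

-- The same type with the implicit right-nested parentheses written out.
applyTwice' :: forall a. (a -> a) -> (a -> a)
applyTwice' = applyTwice

main :: IO ()
main = print (applyTwice (+ 3) 10, applyTwice' (* 2) 5) -- (16,20)
```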

The only problem with trailing arrows is per-argument Haddocks. They cannot be placed after (->) so there are two options:

reassociateOpTreeWith ::
  forall ty op.
  [(RdrName, Fixity)] {- ^ Fixity map for operators -} ->
  (op -> Maybe RdrName) {- ^ How to get the name of an operator -} ->
  OpTree ty op {- ^ Original 'OpTree' -} ->
  OpTree ty op {- ^ Re-associated 'OpTree' -}

or

reassociateOpTreeWith ::
  forall ty op.
  -- | Fixity map for operators
  [(RdrName, Fixity)] ->
  -- | How to get the name of an operator
  (op -> Maybe RdrName) ->
  -- | Original 'OpTree'
  OpTree ty op ->
  -- | Re-associated 'OpTree'
  OpTree ty op

We went with the second version, which seems clearer and arguably encourages writing more detailed Haddocks.

Configuration and language extensions

Ormolu aims to have only one style, as noted in the first post. That means no configuration and no configuration file to keep.

Most language extensions co-exist peacefully so they're turned on by default for every file. This way, Ormolu always works with syntax that's enabled by the extensions—it doesn't need to search for Cabal files to figure out which extensions to use—which simplifies the usage.

There are a few exceptions though. You can find out which extensions are not enabled by default like this:

$ ormolu --manual-exts
AlternativeLayoutRule
AlternativeLayoutRuleTransitional
Arrows
BangPatterns
Cpp
MagicHash
MonadComprehensions
PatternSynonyms
RecursiveDo
StaticPointers
TemplateHaskellQuotes
TransformListComp
TypeApplications
UnboxedSums
UnboxedTuples
UnicodeSyntax

Those should be enabled either on top of each file (recommended) or passed with the --ghc-opt option.
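As a minimal illustration of the recommended route, a file that uses one of the listed extensions (TypeApplications here) simply declares it at the top. The module contents are made up for the example.

```haskell
{-# LANGUAGE TypeApplications #-}

-- The pragma above tells both GHC and the formatter that "@Int" below is a
-- type application.
answer :: Int
answer = read @Int "42" + 1

main :: IO ()
main = print answer -- 43
```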

The next steps

Ormolu is now in beta stage, and it's available here to download and try today. Next, we're going to concentrate on a few idempotence bugs. They're low severity, but we've made them high priority, and we're confident we'll fix them.

Want to help improve Ormolu? Please suggest improvements, make contributions, and report any issues here.

October 11, 2019 12:00 AM

October 10, 2019

Mark Jason Dominus

Incenters of chocolate-iced cakes

A MathOverflow post asks:

Puzzle 1: Grandma made a cake whose base was a square of size 30 by 30 cm and the height was 10 cm. She wanted to divide the cake fairly among her 9 grandchildren. How should she cut the cake?

Okay, this is obvious.

Puzzle 2: Grandma made a cake whose base was a square of size 30 by 30 cm and the height was 10 cm. She put chocolate icing on top of the cake and on the sides, but not on the bottom. She wanted to divide the cake fairly among her 9 grandchildren so that each child would get an equal amount of the cake and the icing. How should she cut the cake?

This one stumped me; the best I could do was to cut the cake into 27 slabs, each 10/3 × 10 × 10 cm, and each with between 1 and 5 units of icing. Then we can give three slabs to each grandkid, taking care that each kid's slabs have a total of 7 units of icing. This seems like it might work for an actual cake, but I suspected that it wasn't the solution that was wanted, because the problem seems like a geometry problem and my solution is essentially combinatorial.

Indeed, there is a geometric solution, which is more interesting, and which cuts the cake into only 9 pieces.

I eventually gave up and looked at the answer, which I will discuss below. Sometimes when I give up I feel that if I had thought a little harder or given up a little later, I would have gotten the answer, but not this time. It depends on an elementary property of squares that I had been completely unaware of.

This is your last chance to avoid spoilers.


The solution given is this: Divide the perimeter of the square cake into 9 equal-length segments, each of length 13⅓ cm. Some of these will be straight and others may have right angles; it does not matter. Cut from the center of the cake to the endpoints of these segments; the resulting pieces will satisfy the requirements.

“Wat.” I said. “If the perimeter lengths are equal, then the areas are equal? How can that be?”

This is obviously true for two pieces; if you cut the square from the center into two pieces that divide the perimeter equally, then of course they are the same size and shape. But surely that is not the case for three pieces?

I could not believe it until I got home and drew some pictures on graph paper. Here Grandma has cut her cake into three pieces in the prescribed way:

A square with vertices at ‹±3, ±3› and center at ‹0,0›. Three regions are marked on it: a blue kite with vertices ‹0,0›, ‹1,3›, ‹-3,3›, ‹-3,-1›; a pink irregular quadrilateral with vertices ‹0,0›, ‹3,-3›, ‹3,3›, ‹1,3›; a green irregular quadrilateral congruent to the pink one with vertices ‹0,0›, ‹-3,-1›, ‹-3,-3›, ‹3,-3›.  The three regions completely partition the square with no overlap and nothing left over.

The three pieces are not the same shape! But each one contains one-third of the square's outer perimeter, and each has an area of 12 square units. (Note, by the way, that although the central angles may appear equal, they are not; the blue one is around 126.9° and the pink and green ones are only 116.6°.)
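The numbers are easy to check mechanically. The sketch below is not from the original post: it applies the standard shoelace formula to the three regions described in the figure, and `area` and `centralAngle` are helper names invented here.

```haskell
type Point = (Double, Double)

-- Shoelace formula: area of a simple polygon from its vertices in order.
area :: [Point] -> Double
area ps = abs (sum (zipWith cross ps (drop 1 (cycle ps)))) / 2
  where
    cross (x1, y1) (x2, y2) = x1 * y2 - x2 * y1

-- Angle in degrees at the origin between the directions to two points.
centralAngle :: Point -> Point -> Double
centralAngle (x1, y1) (x2, y2) =
  acos (dot / (norm1 * norm2)) * 180 / pi
  where
    dot = x1 * x2 + y1 * y2
    norm1 = sqrt (x1 * x1 + y1 * y1)
    norm2 = sqrt (x2 * x2 + y2 * y2)

blue, pink, green :: [Point]
blue = [(0, 0), (1, 3), (-3, 3), (-3, -1)]
pink = [(0, 0), (3, -3), (3, 3), (1, 3)]
green = [(0, 0), (-3, -1), (-3, -3), (3, -3)]

main :: IO ()
main = do
  print (map area [blue, pink, green]) -- [12.0,12.0,12.0]
  print (centralAngle (1, 3) (-3, -1)) -- blue: roughly 126.9
  print (centralAngle (3, -3) (1, 3))  -- pink: roughly 116.6
```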

And indeed, any piece cut by Grandma from the center that includes one-third of the square's perimeter will have an area of one-third of the whole square:

A square with vertices at ‹±3, ±3› and center at ‹0,0›. Marked on it is an orange trapezoid with vertices ‹0,0›, ‹-3,-2›, ‹-3,3›, ‹0,3›. Also a pink pentagon with vertices ‹0,0›, ‹2,-3›, ‹3,-3›, ‹3,3›, ‹2,3›. Both polygons include 8 units of the square’s 24 perimeter units, and both have an area of 12 square units.

The proof that this works is actually quite easy. Consider a triangle OAB where O is the center of the square and A and B are points on one of the square's edges.

The triangle's area is half its height times its base. The base is of course the length of the segment AB, and the height is the length of the perpendicular from O to the edge of the square. So for any such triangle, its area is proportional to the length of AB.

No two of the five triangles below are congruent, but each has the same base and height, and so each has the same area.

Five wedges radiate downward in different directions from the center of the square, each arriving at a different part of the edge but each with a base of 1 unit.

Since the center of the square is the same distance from each of the four edges, the same is true for any two triangles, regardless of which edge they arise from: the area of each triangle is proportional to the length of the square's perimeter at its base. Any piece Grandma cuts in this way, from the center of the cake to the edge, is a disjoint union of triangular pieces of this type, so the total area of any such piece is also proportional to the length of the square's perimeter that it includes.
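The whole argument fits in one line. The symbols here are introduced only for this summary: h is the common distance from the center to each edge, and b_1, …, b_k are the lengths of the perimeter segments that form the bases of the triangles making up a piece.

```latex
\[
  \operatorname{Area}(\text{piece})
    = \sum_{i=1}^{k} \tfrac{1}{2}\, h\, b_i
    = \tfrac{1}{2}\, h \sum_{i=1}^{k} b_i
    = \tfrac{1}{2}\, h \cdot (\text{perimeter contained in the piece})
\]
```

For the 6×6 square in the pictures, h = 3, so a piece containing 8 units of perimeter has area ½ · 3 · 8 = 12, matching the figures above.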

That's the crucial property of the square that I had not known before: if you make cuts from the center of a square, the area of the piece you get is proportional to the length of the perimeter that it contains. Awesome!

Here Grandma has used the same method to cut a pair of square cakes into ten equal-sized pieces that all have the same amount of icing.

A 10×10 square divided into five pieces from the center. The pieces are three different shapes, but each piece contains 8 units of the square's perimeter and has an area of 20 square units.

The crucial property here was that the square’s center is the same distance from each of its four edges. This is really obvious, but not every polygon has an analogous point. The center of a regular polygon always has this property, and every triangle has a unique point, called its incenter, which enjoys this property. So Grandma can use the same method to divide a triangular cake into 7 equally-iced equal pieces, if she can find its incenter, or to divide a regular 17-gonal cake into 23 equally-iced equal pieces.

Not every polygon does have an incenter, though. Rhombuses and kites always do, but rectangles do not, except when they are square. If Grandma tries this method with a rectangular sheet cake, someone will get shortchanged. I learned today that polygons that have incenters are known as tangential polygons. They are the only polygons in which one can inscribe a single circle that is tangent to every side of the polygon. This property is easy to detect: these are exactly the polygons in which all the angle bisectors meet at a single point. Grandma should be able to fairly divide the cake and icing for any tangential polygon.
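To see how a sheet cake goes wrong, take a hypothetical 6×2 rectangle cut from its center: the long edges are 1 unit from the center and the short edges are 3 units away. The sketch below (with `wedgeArea` a name invented here) compares two wedges that each get 2 units of perimeter.

```haskell
-- Area of a triangular wedge with its apex at the centre of the cake and a
-- base of the given length lying on an edge at the given distance.
wedgeArea :: Double -> Double -> Double
wedgeArea distance base = distance * base / 2

main :: IO ()
main = do
  print (wedgeArea 1 2) -- 2 units of perimeter on a long edge:  1.0
  print (wedgeArea 3 2) -- 2 units of perimeter on a short edge: 3.0
```

Equal icing along the perimeter no longer means equal cake: the wedge on the short edge is three times as large.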

I have probably thought about this before, perhaps in high-school geometry but perhaps not since. Suppose you have two lines, m and n, that cross at an acute angle at Q, and you consider the set of points that are equidistant from both m and n. Let b be a line through Q which bisects the angle between m and n; clearly any point on b will be equidistant from m and n by a straightforward argument involving congruent triangles.

Now consider a triangle ABC. Let P be the intersection of the angle bisectors of angles A and B.

The triangle, as described, with the two angle bisectors drawn and their intersection at P.  Centered at P is a circle that is tangent to all of AB, BC, and CA at the same time.

P is the same distance from both AB and AC because it is on the angle bisector of A, and similarly it is the same distance from both AB and BC because it is on the angle bisector of B.

Therefore P is the same distance from both AC and BC and it must be on the angle bisector of angle C also. We have just shown that a triangle's three angle bisectors are concurrent! I knew this before, but I don't think I knew a proof.
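A numeric check of the conclusion, under assumptions that are not in the post: it uses the standard formula for the incenter as the average of the vertices weighted by the lengths of the opposite sides, a 3-4-5 right triangle picked for convenience, and helper names (`dist`, `distToLine`, `incenter`) invented for the sketch.

```haskell
type Point = (Double, Double)

dist :: Point -> Point -> Double
dist (x1, y1) (x2, y2) = sqrt ((x2 - x1) * (x2 - x1) + (y2 - y1) * (y2 - y1))

-- Distance from the point p to the line through a and b.
distToLine :: Point -> Point -> Point -> Double
distToLine a@(ax, ay) b@(bx, by) (px, py) =
  abs ((bx - ax) * (py - ay) - (by - ay) * (px - ax)) / dist a b

-- Incenter: vertices weighted by the lengths of the opposite sides.
incenter :: Point -> Point -> Point -> Point
incenter a@(ax, ay) b@(bx, by) c@(cx, cy) =
  ((la * ax + lb * bx + lc * cx) / s, (la * ay + lb * by + lc * cy) / s)
  where
    la = dist b c
    lb = dist c a
    lc = dist a b
    s = la + lb + lc

main :: IO ()
main = do
  let a = (0, 0)
      b = (4, 0)
      c = (0, 3)
      p = incenter a b c
  print p -- (1.0,1.0)
  -- The distances from p to AB, BC, and CA all agree.
  print [distToLine a b p, distToLine b c p, distToLine c a p] -- [1.0,1.0,1.0]
```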

[ Addendum 20191011: Many illustrated examples. ]

by Mark Dominus (mjd@plover.com) at October 10, 2019 09:04 PM

October 09, 2019

Tweag I/O

Bazel, Cabal, Stack: Why choose when you can have them all?

Mathieu Boespflug, Andreas Herrmann

No new product created in Haskell ever starts from scratch. Hackage hosts millions of lines of third-party code, neatly and independently redistributable as Cabal packages. Now, Bazel has native support for building Cabal packages since the 0.10 release of rules_haskell.

Cabal packages themselves seldom start from scratch. That's why packages typically have dozens of dependencies. Resolving version bounds declared for all dependencies in the package metadata to a set of concrete versions, and then downloading these dependencies, is a painstaking task if done manually. Bazel can now use Stack to do this all automatically—the user only needs to provide the name of a Stackage snapshot and the names of the packages they want to reuse for their project.

Users frequently ask which build tool to use for their next project. It turns out that "all of them at once" is a compelling answer (including Nix, though we covered that previously and won't be rehearsing that in this post).

A Bazel primer

Bazel is a build tool originally created by Google. The key attributes of Bazel are:

  • Bazel is a polyglot build tool, supporting many different programming languages. This enables Bazel to be fast, because it can have a global and fine-grained view of the dependency graph, even across language boundaries.
  • Bazel tries hard to guarantee build correctness. This means that after making a few localized changes to your source code, you don't need to start your build from scratch to be confident that others get the same result. Incremental builds are guaranteed to yield the same result as full builds (under mild conditions we won't discuss here). This also means that it's safe to distribute builds to a large cluster of remote machines to make it finish fast. You still get the same result.
  • Bazel is extensible. You can teach Bazel to build code in new programming languages that it didn't know about out-of-the-box. Doing so requires getting familiar with a simple Python-like language called Starlark. Unlike Make or Shake rules, mechanisms and conventions exist to easily reuse Bazel rules across projects, leading to the emergence of an entire ecosystem of rules that build on each other.

Internally, Google uses a variant of Bazel to build most of their billions of lines of source code, thousands of times a day. If your project has lots of components in a variety of different languages and you don't want the hassle of lots of build systems too, or if you simply want your builds to remain fast no matter how big your project grows, you should probably be using Bazel (or Buck, Facebook's equivalent).

The tool expects two types of files in your project:

  • One or more BUILD files. Each BUILD file declares a set of targets. Each target is an instance of a build rule, like haskell_library for any reusable component in your project, or haskell_binary for the executables, or miscellaneous other build rules (like API documentation). See the tutorial for a longer introduction.
  • A WORKSPACE file that allows you to invoke macros that perform some autodiscovery and automatically generate BUILD files from, say, third-party package metadata.

Here's how our solution to build third-party code works:

  1. We define two new build rules: haskell_cabal_library and haskell_cabal_binary. These are like haskell_library and haskell_binary, respectively, except that Cabal is used to build the targets, rather than calling GHC (the Glasgow Haskell Compiler) directly.
  2. A macro called stack_snapshot generates a BUILD file that declares a target for each Cabal package in the given snapshot that we'll be using in our project, directly or indirectly.

Building a Cabal package

Let's say you have an existing Cabal library in your project. Perhaps you would like it to be a Cabal library so that you can publish it on Hackage. To expose it to downstream Haskell code that uses Bazel as the build tool, you can write the following rule in a BUILD file:

haskell_cabal_library(
    name = "mylib",
    version = "0.1",
    srcs = ["mylib.cabal", "Lib.hs"],
)

A binary could now depend on this Cabal library as well as on base (which ships with the GHC toolchain):

haskell_toolchain_library(name = "base")

haskell_binary(
    name = "myexe",
    srcs = ["Main.hs"],
    deps = [":base", ":mylib"],
)

In the above, we have three targets, each designated with a "label": :mylib, :base and :myexe. The label is derived from the name attribute that is mandatory for each target. rules_haskell is a set of build rules for Bazel. The build rules tell Bazel that it needs to call Cabal to build a haskell_cabal_library target. Performing this action produces several outputs, including on most platforms a static library and a shared library (called libHSmylib-0.1.a and libHSmylib-0.1.so, respectively). You don't need to remember the names of any of the outputs, since you can simply pass a target as a dependency to another, using the target's label. The build rules tell Bazel which outputs from each one of a target's dependencies it needs to build the target. In this case, we are building a binary with :mylib statically linked (the default), so the libHSmylib-0.1.a output from that target is needed to build the :myexe target.

Building a Stackage snapshot

The ability to build libraries with or without Cabal given a short target definition is great. But in practice, even these short target definitions get tiring to write, for two reasons:

  1. We don't want to have to write out the version numbers of each Cabal package explicitly. The great thing about Stackage snapshots is that a single snapshot name determines the version number for all packages. If only we could tell Bazel which snapshot we want to use, explicit version numbers for each package would no longer be necessary.
  2. Cabal libraries on Hackage typically have many dependencies, which in turn have dependencies of their own. The full dependency graph can get very large, in the order of hundreds of nodes. Writing it out in full in the form of target definitions like above would be tiresome indeed.

Stack already knows how to resolve a snapshot name to a specific set of package versions. Stack also already knows where to find these packages, on Hackage or any of its mirrors. Finally, Stack already knows what the dependency graph looks like. So the solution is to just call Stack. We added a workspace macro called stack_snapshot. An example:

stack_snapshot(
    name = "stackage",
    packages = ["conduit", "lens", "zlib-0.6.2"],
    snapshot = "lts-14.7",
)

The above generates a BUILD file behind the scenes with one haskell_cabal_library per package listed in the packages attribute and any transitive dependencies thereof. The result is essentially the output of stack dot, which outputs a dependency graph, munged into a form cromulent for Bazel. This means that Bazel sees the same dependency graph as Stack does, and can therefore parallelize the build on multiple cores in exactly the same way Stack does. But because this is Bazel, we can even distribute the build on multiple machines at once (see below).

Shared cache for Cabal libraries

The upshot is that you can now write polyglot projects that include Haskell code and hundreds of third-party dependencies without sweating. By building it with Bazel, you get the benefit of correct caching to accelerate all your build jobs. The Bazel cache can be remote and shared among all of your continuous integration worker machines and even shared with all of your developers. Bazel's correctness guarantees make this safe to do. If a branch was published and the build succeeded, then any developer that checks out the branch now benefits from fast builds.

Conclusions

It's an interesting story that to achieve correct, reproducible, and cacheable builds, we gainfully combined Haskell's three main build technologies:

  • Bazel to run build actions in parallel or distributed on many nodes in a build cluster,
  • Cabal to interpret the metadata of existing third-party code and correctly construct shared and static libraries, and
  • Stack to inform Bazel about where to find the source code for the third-party dependencies, what versions to use, and tell it what the dependency graph looks like.

Another interesting observation is that emulating Cabal is hard. We previously collaborated with Formation on Hazel, an effort to reimplement Cabal as a Bazel ruleset. It turned out that getting the Cabal semantics exactly right for all packages on all platforms (especially Windows) was exceedingly difficult. With the current approach, we lose a few build parallelization opportunities, but wrapping Cabal instead of reimplementing it leads to a much simpler solution overall.

Have a look at Digital Asset's DAML repository. DAML is an example of a large Haskell project powered by Bazel and rules_haskell. It's an open source smart contract language for building distributed applications. You can build the project using this new Stack and Cabal support on Linux and macOS. Windows support is in progress. The repository has around 150 direct Hackage dependencies and makes use of advanced features such as a custom stack snapshot, custom package flags, C library dependencies, and injecting vendored packages into Stack's dependency graph.

October 09, 2019 12:00 AM

October 07, 2019

Monday Morning Haskell

Using Our Data with Supervised Learning

Our aim these last couple weeks has been to try a supervised learning approach to our game. In last week's article we gathered training data for playing the game. We had two different sources. First, we played the game ourselves and recorded our moves. Second, we let our AI play the game and recorded it. This gave us a few CSV files. Each line in these is a record containing the 40 "features" of the game board at the time and the move we chose.

This week, we're going to explore how to build a machine-learning agent based on this data. This will use supervised learning techniques. Look at the supervised-learning branch on our GitHub repository for more details.

To get started with Haskell and Tensor Flow, download our Haskell Tensor Flow Guide. This library is a little tricky to work with, so you want to make sure you know what you're doing!

Defining Our Model

For our supervised model, we're going to use a fully connected neural network with a single hidden layer. We'll have 40 input features, 100 hidden units, and then our 10 output values for the different move scores. We'll be following a very similar pattern to one we explored in this older article, using the basic Iris data set. We'll copy a lot of code from that article. We won't go over a lot of the helper code in this article, so feel free to check out that one for some help with that!

We define each layer with a "weights" matrix and a "bias" vector. We multiply the input by the weights and then add the bias vector. Let's explore how we can build a single layer of the network. This will take the input and output size, as well as the input tensor. It will have three results. One variable for the weights, one for the biases, and then a final "output" tensor:

buildNNLayer :: Int64 -> Int64 -> Tensor v Float
  -> Build (Variable Float, Variable Float, Tensor Build Float)

The definition is pretty simple. We'll initialize random variables for the weights and bias. We'll produce the result tensor by multiplying by the weights and adding the bias.

buildNNLayer :: Int64 -> Int64 -> Tensor v Float
  -> Build (Variable Float, Variable Float, Tensor Build Float)
buildNNLayer inputSize outputSize input = do
  weights <- truncatedNormal (vector [inputSize, outputSize])
    >>= initializedVariable
  bias <- truncatedNormal (vector [outputSize])
    >>= initializedVariable
  let results = (input `matMul` readValue weights)
        `add` readValue bias
  return (weights, bias, results)
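
In matrix form, the layer above can be sketched as follows. The symbols $X$, $W$, $b$ and $B$ (the batch size) are notation introduced here for illustration; they are not names from the code:

$$\text{results} = XW + b, \qquad X \in \mathbb{R}^{B \times \text{inputSize}},\quad W \in \mathbb{R}^{\text{inputSize} \times \text{outputSize}},\quad b \in \mathbb{R}^{\text{outputSize}}$$

The bias is broadcast across the $B$ rows, so the result has shape $B \times \text{outputSize}$. With the sizes quoted earlier, the hidden layer maps 40 features to 100 units, and the final layer maps those 100 units to 10 move scores.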

Now that we understand the layers a little better, it's easier to define our model. First, we'll want to include both sets of weights and biases in the model, so we can output them later:

data Model = Model
  { w1 :: Variable Float
  , b1 :: Variable Float
  , w2 :: Variable Float
  , b2 :: Variable Float
  ...
  }

Now we want two different "steps" we can run. The training step will take a batch of data and determine what our network produces for the inputs. It will compare our network's output with the expected output. Then it will train to minimize the loss function. The error rate step will simply produce the error rate on the given data. That is, it will tell us what percentage of the moves we are getting correct. Both of these will be Session actions that take two inputs. First, the TensorData for the features, and then the TensorData for the correct moves:

data Model = Model
  { 
  … -- Weights and biases
  , train :: TensorData Float
          -> TensorData Int64
          -> Session ()
  , errorRate :: TensorData Float
              -> TensorData Int64
              -> Session (V.Vector Float) -- Produces a single number
  }

Let's see how we put this all together.

Building Our Model

To start, let's make placeholders for our input features and expected output results. A dimension of -1 means we can provide any size we like:

createModel :: Build Model
createModel = do
  let batchSize = -1
  (inputs :: Tensor Value Float) <-
    placeholder [batchSize, moveFeatures]
  (outputs :: Tensor Value Int64) <-
    placeholder [batchSize]
  ...

Now we build the layers of our neural network using our helper. We'll apply relu, an activation function, on the results of our hidden layer. This helps our model deal with interaction effects and non-linearities:

createModel :: Build Model
createModel = do
  ...
  (hiddenWeights, hiddenBiases, hiddenResults) <-
    buildNNLayer moveFeatures hiddenUnits inputs
  let rectifiedHiddenResults = relu hiddenResults
  (finalWeights, finalBiases, finalResults) <-
    buildNNLayer hiddenUnits moveLabels rectifiedHiddenResults
  ...

Now to get our error rate, we need a couple steps. We'll get the best move from each predicted result using argMax. We can then compare these to the training data using equal. By using reduceMean we'll get the percentage of our moves that match. Subtracting this from 1 gives our error rate:

createModel :: Build Model
createModel = do
  ...
  (actualOutput :: Tensor Value Int64) <- render $
    argMax finalResults (scalar (1 :: Int64))
  let (correctPredictions :: Tensor Build Float) = cast $
        equal actualOutput outputs
  (errorRate_ :: Tensor Value Float) <- render $
    1 - (reduceMean correctPredictions)
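
Written as a formula, the computation above is roughly the following. Here $N$ (the number of records in the batch), $y_{ij}$ (the predicted score of move $j$ for record $i$) and $t_i$ (the recorded move) are symbols assumed for this sketch:

$$\text{errorRate} = 1 - \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\!\left[\arg\max_j\, y_{ij} = t_i\right]$$

For instance, if 73 of the 100 records in a batch have their highest-scoring move equal to the recorded move, the mean of the matches is 0.73 and the error rate is 0.27.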

Now we need our training step. We'll compare outputs. This involves the softmaxCrossEntropyWithLogits function. We train our model by selecting our variables for training, and using minimizeWith. This will update the variables to reduce the value of the loss function:

createModel :: Build Model
createModel = do
  ...
  let outputVectors = oneHot outputs (fromIntegral moveLabels) 1 0
  let loss = reduceMean $ fst $
        softmaxCrossEntropyWithLogits finalResults outputVectors
  let params =
        [hiddenWeights, hiddenBiases, finalWeights, finalBiases]
  train_ <- minimizeWith adam loss params
  ...
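
As a sketch of what this loss measures, write $y_{ij}$ for the final-layer score of move $j$ on record $i$, $t_i$ for the recorded move, and $N$ for the batch size (all notation assumed here, not taken from the code). Because each target vector is one-hot, only the probability assigned to the recorded move contributes:

$$p_{ij} = \frac{e^{y_{ij}}}{\sum_k e^{y_{ik}}}, \qquad \text{loss} = -\frac{1}{N}\sum_{i=1}^{N} \log p_{i,t_i}$$

The loss is 0 only when the network puts all of its probability on the recorded move for every record, and it grows as that probability shrinks.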

We conclude by creating our functions. These take the tensor data as parameters. Then they use runWithFeeds to put the data into our placeholders:

createModel :: Build Model
createModel = do
  ...
  return $ Model
    { train = \inputFeed outputFeed ->
        runWithFeeds
          [ feed inputs inputFeed
          , feed outputs outputFeed
          ]
          train_
    , errorRate = \inputFeed outputFeed ->
        runWithFeeds
          [ feed inputs inputFeed
          , feed outputs outputFeed
          ]
          errorRate_
    , w1 = hiddenWeights
    , b1 = hiddenBiases
    , w2 = finalWeights
    , b2 = finalBiases
    }

Running Our Tests

Now let's run our tests. We'll read the move record data from the file, shuffle them, and set aside a certain proportion as our test set. Then we'll build our model:

runTraining totalFile = runSession $ do
  initialRecords <- liftIO $ readRecordFromFile totalFile
  shuffledRecords <- liftIO $ shuffleM (V.toList initialRecords)
  let testRecords = V.fromList $ take 2000 shuffledRecords
  let trainingRecords = V.fromList $ drop 2000 shuffledRecords
  model <- build createModel
  ...

Then we run our iterations (we'll do 50000, as an example). We select some random records (100 per batch), and then convert them to data. Then we run our train step. Finally, every 100 iterations or so, we'll get a gauge of the training error on this set. This involves the errorRate step. Note our error rate returns a vector with a single wrapped value. So we need to unwrap it with !.

runTraining totalFile = runSession $ do
  ...
  forM_ ([0..50000] :: [Int]) $ \i -> do
    trainingSample <- liftIO $ chooseRandomRecords trainingRecords
    let (trainingInputs, trainingOutputs) =
          convertRecordsToTensorData trainingSample
    (train model) trainingInputs trainingOutputs
    when (i `mod` 100 == 0) $ do
      err <- (errorRate model) trainingInputs trainingOutputs
      liftIO $ putStrLn $
        (show i) ++ " : current error " ++ show ((err V.! 0) * 100)

Now to run the final test, we use the errorRate step again, this time on our test data:

runTraining totalFile = runSession $ do
  ...

  -- Testing
  let (testingInputs, testingOutputs) =
        convertRecordsToTensorData testRecords
  testingError <- (errorRate model) testingInputs testingOutputs
  liftIO $ putStrLn $
    "test error " ++ show ((testingError V.! 0) * 100)

Results

When it comes to testing our system, we should use proper validation techniques. We want a model that will represent our training data well. But it should also generalize well to other reasonable examples. If our model represents the training data too well, we're in danger of "overfitting" our data. To check this, we'll hold back roughly 20% of the data. This will be our "test" data. We'll train our model on the other 80% of the data. Every 100 steps or so, we print out the training error on that batch of data. We hope this figure drops. But then at the very end, we'll run the model on the other 20% of the data, and we'll see what the error rate is. This will be the true test of our system.

We know we have overfitting if we see figures on training error that are lower than the testing error. When training on human moves for 50000 iterations, the training error drops to the high teens and low 20's. But the test error is often still close to 50%. This suggests we shouldn't be training quite as long.

The AI moves provide a little more consistency though. The training error seems to stabilize around the mid 20's and low 30's, and we end up with a test error of about 34%.

Conclusion

Our error rate isn't terrible. But it's not great either. And worse, testing shows it doesn't appear to capture the behaviors well enough to win the game. A case like this suggests our model isn't sophisticated enough to capture the problem. It could also suggest our data is too noisy, and the patterns we hoped to find aren't there. The feature set we have might not capture all the important information about the graph.

For our final look at this problem, we're going to try a new serialization technique. Instead of deriving our own features, we're going to serialize the entire game board! The "feature space" will be much, much larger now. It will include the structure of the graph and information about enemies and drills. This will call for a more sophisticated model. A pure fully connected network will take a long time to learn things like how walls allow moves or not. A big drawback of this technique is that it will not generalize to arbitrary mazes. It will only work for a certain size and number of enemies. But with enough training time we may find that interesting patterns emerge. So come back next week to see how this works!

by James Bowen at October 07, 2019 02:30 PM

FP Complete

Ownership and impl Trait

There's a common pattern in Rust APIs: returning a relatively complex data type which provides a trait implementation we want to work with. One of the first places many Rust newcomers encounter this is with iterators. For example, if I want to provide a function that returns the range of numbers 1 to 10, it may look like this:

use std::ops::RangeInclusive;

fn one_to_ten() -> RangeInclusive<i32> {
    1..=10i32
}

This obscures the iterator-ness of what's happening here. However, the situation gets worse as you start making things more complicated, e.g.:

use std::iter::Filter;

fn is_even(x: &i32) -> bool {
    x % 2 == 0
}

fn evens() -> Filter<RangeInclusive<i32>, for<'r> fn(&'r i32) -> bool> {
    one_to_ten().filter(is_even)
}

Or even more crazy:

use std::iter::Map;

fn double(x: i32) -> i32 {
    x * 2
}

fn doubled() ->
    Map<
        Filter<
               RangeInclusive<i32>,
               for<'r> fn(&'r i32) -> bool
              >,
        fn(i32) -> i32
       > {
    evens().map(double)
}

This is clearly not the code we want to write! Fortunately, we now have a more elegant way to state our intention: impl Trait. This feature allows us to say that a function returns a value which is an implementation of some trait, without needing to explicitly state the concrete type. We can rewrite the signatures above with:

fn one_to_ten() -> impl Iterator<Item = i32> {
    1..=10i32
}

fn is_even(x: &i32) -> bool {
    x % 2 == 0
}

fn evens() -> impl Iterator<Item = i32> {
    one_to_ten().filter(is_even)
}

fn double(x: i32) -> i32 {
    x * 2
}

fn doubled() -> impl Iterator<Item = i32> {
    evens().map(double)
}

fn main() {
    for x in doubled() {
        println!("{}", x);
    }
}
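
As a further illustration, here is a minimal sketch of a case where the concrete type cannot be written out at all: closures have anonymous types, so an iterator built from them can only be returned behind impl Trait (or a boxed trait object). The function name doubled_evens and its limit parameter are hypothetical and not part of the code above.

```rust
// Hypothetical helper: the same pipeline as `doubled`, built from closures.
// The closure types have no names, so the full Map<Filter<...>> type
// cannot be spelled in the signature; impl Trait hides it instead.
fn doubled_evens(limit: i32) -> impl Iterator<Item = i32> {
    (1..=limit)
        .filter(|x| x % 2 == 0) // keep the even numbers
        .map(|x| x * 2) // then double each of them
}
```

Callers only see an iterator of i32 values and can, for example, collect it into a Vec<i32>.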

This can be a boon for development, especially when we get to more complicated cases (like futures and tokio heavy code). However, I'd like to present one case where impl Trait demonstrates a limitation. Hopefully this will help explain some of the nuances of ownership and its interaction with this feature.

Introducing the riddle

Have a look at this code, which does not compile:

// Try replacing with (_: &String)
fn make_debug<T>(_: T) -> impl std::fmt::Debug {
    42u8
}

fn test() -> impl std::fmt::Debug {
    let value = "value".to_string();

    // try removing the ampersand to get this to compile
    make_debug(&value)
}

pub fn main() {
    println!("{:?}", test());
}

In this code, we have a make_debug function, which takes any value at all, entirely ignores that value, and returns a u8. However, instead of including the u8 in the function signature, I say impl Debug (which is fully valid: u8 does in fact implement Debug). The test function produces its own impl Debug by passing in a &String to make_debug.

When I try to compile this, I get the error message:

error[E0597]: `value` does not live long enough
  --> src/main.rs:10:16
   |
6  | fn test() -> impl std::fmt::Debug {
   |              -------------------- opaque type requires that `value` is borrowed for `'static`
...
10 |     make_debug(&value)
   |                ^^^^^^ borrowed value does not live long enough
11 | }
   | - `value` dropped here while still borrowed

Before we try to understand this error message, I want to deepen the riddle here. There are a large number of changes I can make to this code to get it to compile. For example:

  • If I replace the T parameter on make_debug with &String (or the more idiomatic &str), the code compiles. For some reason, being polymorphic causes a problem.
  • Perhaps even stranger, changing the signature from make_debug<T>(_: T) to make_debug<T>(_: &T) fixes it too! What's weird about this is that T allows references to be passed in, so why does &T fix anything?
  • And finally, in the call to make_debug, if we pass the value (via a move) instead of a reference to the value, everything compiles, e.g. make_debug(value) instead of make_debug(&value). At least intuitively, I would expect to get fewer lifetime errors when using references.

Something subtle is going on here. Let's try to understand it, bit by bit.

Lifetimes with concrete types

Let's simplify our make_debug function to explicitly take a String:

fn make_debug(_: String) -> impl std::fmt::Debug {
    42u8
}

What's the lifetime of that parameter? Well, make_debug consumes the value completely and then drops it. The value cannot be used outside of the function any more. Interestingly though, the fact that make_debug drops it is not really reflected in the type signature of the function; it just says we return an impl Debug. To prove the point a bit, we can instead return the parameter itself instead of our 42u8:

fn make_debug(message: String) -> impl std::fmt::Debug {
    //42u8
    message
}

In this case, the ownership of the message transfers from the make_debug function itself to the returned impl Debug value. That's an interesting and important observation which we'll get back to in a bit. Let's keep exploring, and instead look at a make_debug that accepts a &String:

fn make_debug(_: &String) -> impl std::fmt::Debug {
    42u8
}

What's the lifetime of that reference? Thanks to lifetime elision, we don't have to state it explicitly. But the implied lifetime is within the lifetime of the function itself. In other words, our borrow of the String expires completely when our function exits. We can prove that point a bit more by trying to return the reference:

fn make_debug(message: &String) -> impl std::fmt::Debug {
    //42u8
    message
}

The error message we get is a bit surprising, but quite useful:

error: cannot infer an appropriate lifetime
 --> src/main.rs:4:5
  |
2 | fn make_debug(message: &String) -> impl std::fmt::Debug {
  |                                    -------------------- this return type evaluates to the `'static` lifetime...
3 |     //42u8
4 |     message
  |     ^^^^^^^ ...but this borrow...
  |
note: ...can't outlive the anonymous lifetime #1 defined on the function body at 2:1
 --> src/main.rs:2:1
  |
2 | / fn make_debug(message: &String) -> impl std::fmt::Debug {
3 | |     //42u8
4 | |     message
5 | | }
  | |_^
help: you can add a constraint to the return type to make it last less than `'static` and match the anonymous lifetime #1 defined on the function body at 2:1
  |
2 | fn make_debug(message: &String) -> impl std::fmt::Debug + '_ {
  |                                    ^^^^^^^^^^^^^^^^^^^^^^^^^

What's happening is we have essentially two lifetimes in our signature. The implied lifetime for message is the lifetime of the function, whereas the lifetime for impl Debug is 'static, meaning it either borrows no data or only borrows values that last the entire program (such as a string literal). We can even try to follow through with the recommendation and add some explicit lifetimes:

fn make_debug<'a>(message: &'a String) -> impl std::fmt::Debug + 'a {
    message
}

fn test() -> impl std::fmt::Debug {
    let value = "value".to_string();
    make_debug(&value)
}

While this fixes make_debug itself, we can no longer call make_debug successfully from test:

error[E0597]: `value` does not live long enough
  --> src/main.rs:11:16
   |
7  | fn test() -> impl std::fmt::Debug {
   |              -------------------- opaque type requires that `value` is borrowed for `'static`
...
11 |     make_debug(&value)
   |                ^^^^^^ borrowed value does not live long enough
12 | }
   | - `value` dropped here while still borrowed

In other words, our return value from test() is supposed to outlive test itself, but value does not outlive test.

Challenge question Make sure you can explain to yourself (or a rubber duck): why did returning message work when we were passing by value but not by reference?

For the concrete type versions of make_debug, we essentially have a two-by-two matrix: whether we pass by value or reference, and whether we return the provided parameter or a dummy 42u8 value. Let's get this clearly recorded:

              | By value                                     | By reference
Use message   | Success: parameter owned by return value     | Failure: return value outlives reference
Use dummy 42  | Success: return value doesn't need parameter | Success: return value doesn't need reference
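
The "use message" row can be sketched in code as follows. This is an illustration under assumptions, not code from the walkthrough above: the function names are hypothetical, and the by-reference variant only works here because its result is consumed before value goes out of scope.

```rust
use std::fmt::Debug;

// By value: the String moves into the returned impl Debug.
fn make_debug_owned(message: String) -> impl Debug {
    message
}

// By reference: the returned impl Debug is tied to the lifetime 'a.
fn make_debug_borrowed<'a>(message: &'a String) -> impl Debug + 'a {
    message
}

// Fine: the result owns its data, so it may outlive this function.
fn test_owned() -> impl Debug {
    let value = "value".to_string();
    make_debug_owned(value)
}

// Also fine: the borrowed result is turned into an owned String
// while `value` is still alive, so nothing borrowed escapes.
fn test_borrowed() -> String {
    let value = "value".to_string();
    let debug = make_debug_borrowed(&value);
    format!("{:?}", debug)
}
```

Returning debug itself from test_borrowed would reproduce the failure in the top-right cell of the table.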

Hopefully the story with concrete types just described makes sense. But that leaves us with the question...

Why does polymorphism break things?

We see in the bottom row that, when returning the dummy 42 value, we're safe with both pass-by-value and pass-by-reference, since the returned value doesn't need the parameter at all. But for some reason, when we use a parameter T instead of String or &String, we get an error message. Let's refresh our memory a bit with the code:

fn make_debug<T>(_: T) -> impl std::fmt::Debug {
    42u8
}

fn test() -> impl std::fmt::Debug {
    let value = "value".to_string();
    make_debug(&value)
}

And the error message:

error[E0597]: `value` does not live long enough
  --> src/main.rs:10:16
   |
6  | fn test() -> impl std::fmt::Debug {
   |              -------------------- opaque type requires that `value` is borrowed for `'static`
...
10 |     make_debug(&value)
   |                ^^^^^^ borrowed value does not live long enough
11 | }
   | - `value` dropped here while still borrowed

From within make_debug, we can readily see that the parameter is ignored. However, and this is the important bit: the function signature of make_debug doesn't tell us that explicitly! Instead, here's what we know:

  • make_debug takes a parameter of type T
  • T may contain references with non-static lifetimes, we really don't know (also: very important!)
  • The return value is an impl Debug
  • We don't know what concrete type this return value has, but it must have a 'static lifetime
  • The impl Debug may rely upon data inside the T parameter

The outcome of this is: if T has any references, then their lifetime must be at least as large as the lifetime of the return impl Debug, which would mean it must be a 'static lifetime. Which sure enough is the error message we get:

opaque type requires that `value` is borrowed for `'static`

Notice that this occurs at the call to make_debug, not inside make_debug. Our make_debug function is perfectly valid as-is, it simply has an implied lifetime. We can be more explicit with:

fn make_debug<T: 'static>(_: T) -> impl std::fmt::Debug + 'static

Why the workarounds work

We previously fixed the compilation failure by making the type of the parameter concrete. There are two relatively easy ways to work around the compilation failure and still keep the type polymorphic. They are:

  1. Change the parameter from _: T to _: &T
  2. Change the call site from make_debug(&value) to make_debug(value)

Challenge Before reading the explanations below, try to figure out for yourself why these changes fix the compilation based on what we've explained so far.

Change parameter to &T

Our implicit requirement of T is that any references it contains have a static lifetime. This is because we cannot see from the type signature whether the impl Debug is holding onto data inside T. However, by making the parameter itself a reference, we change the ballgame completely. Suddenly, instead of just a single implied lifetime of 'static on T, we have two implied lifetimes:

  • The lifetime of the reference, which we'll call 'a
  • The lifetime of the T value and the impl Debug, which are both still 'static

More explicitly:

fn make_debug<'a, T: 'static>(_: &'a T) -> impl std::fmt::Debug + 'static

While we cannot see from this type signature whether the impl Debug depends on data inside the T, we do know—by the definition of the impl Trait feature itself—that it does not depend on the 'a lifetime. Therefore, the only requirement for the reference is that it live as long as the call to make_debug itself, which is in fact true.

Change call to pass-by-value

If, on the other hand, we keep the parameter as T (instead of &T), we can fix the compilation issue by passing by value with make_debug(value) (instead of make_debug(&value)). This is because the requirement of the T value passed in is that it have 'static lifetime, and values without references do have such a lifetime (since they are owned by the function). More intuitively: make_debug takes ownership of the T, and if the impl Debug uses that T, it will take ownership of it away from make_debug. Otherwise, when we leave make_debug, the T will be dropped.
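
A minimal sketch of this workaround, with the caller renamed to the hypothetical test_by_value to keep it apart from the earlier test function:

```rust
use std::fmt::Debug;

fn make_debug<T>(_: T) -> impl Debug {
    42u8
}

// `value` is moved into make_debug, so T = String. A String contains no
// borrowed data, so it satisfies the implied 'static requirement.
fn test_by_value() -> impl Debug {
    let value = "value".to_string();
    make_debug(value)
}
```

Printing the result with {:?} shows 42, since the returned impl Debug is still the dummy u8.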

Review by table

To sum up the polymorphic case, let's break out another table, this time comparing whether the parameter is T or &T, and whether the call is make_debug(value) or make_debug(&value):

                    | Parameter is T                                         | Parameter is &T
make_debug(value)   | Success: lifetime of the String is 'static             | Type error: passing a String when a reference expected
make_debug(&value)  | Lifetime error: &String doesn't have lifetime 'static  | Success: lifetime of the reference is 'a

Conclusion

Personally I found this behavior of impl Trait initially confusing. However, walking through the steps above helped me understand ownership in this context a bit better. impl Trait is a great feature in the Rust language. However, there may be some cases where we need to be more explicit about the lifetimes of values, and then reverting to the original big type signature approach may be warranted. Hopefully those cases are few and far between. And often, an explicit clone—while inefficient—can save a lot of work.
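
The two fallbacks mentioned here could look roughly like this. It is a sketch under assumptions: the names make_debug_concrete, test_concrete and test_clone are hypothetical, and u8 stands in for whatever "big type signature" a real function would need.

```rust
use std::fmt::Debug;

// Fallback 1: name the concrete return type. The signature now shows
// that the result cannot hold on to anything inside T.
fn make_debug_concrete<T>(_: T) -> u8 {
    42u8
}

fn test_concrete() -> impl Debug {
    let value = "value".to_string();
    // Passing a reference is fine: a u8 cannot borrow from `value`.
    make_debug_concrete(&value)
}

// Fallback 2: keep impl Trait and hand over an owned clone.
fn make_debug<T>(_: T) -> impl Debug {
    42u8
}

fn test_clone() -> impl Debug {
    let value = "value".to_string();
    let debug = make_debug(value.clone());
    // The original is still available after the call.
    let _still_usable: &str = &value;
    debug
}
```

The clone costs an allocation, but it leaves the polymorphic signature untouched and keeps the original value usable.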

Learn more

Read more information on Rust at FP Complete and see our other learning material. If you're interested in getting help with your projects, check out our consulting and training offerings.

FP Complete specializes in server side software, with expertise in Rust, Haskell, and DevOps. You can learn more about our mission. If you're interested in learning about how we can help your team succeed, please reach out for a free consultation with one of our engineers.

October 07, 2019 03:48 AM

October 06, 2019

Chris Smith 2

This would be a valid mathematical function with no fixpoint, but you cannot define it in Haskell.

This would be a valid mathematical function with no fixpoint, but you cannot define it in Haskell. Here are two perspectives on why:

  1. You cannot pattern-match on ⊥, nor compare to it. Although it’s useful to think of ⊥ as a value, remember that really it just represents a computation that doesn’t terminate. Asking the computer to determine whether a value is ⊥ or not is, quite literally, asking to solve the halting problem!
  2. Equivalently, the problem is that your function is not monotone in the definedness order. If f(⊥) = 1, then anything more defined than ⊥ — and that is everything at all — must map to something more defined than 1. (Since 1 is completely defined, that means they must map to 1 exactly. So if f(⊥) = 1, then f must be a constant function.)

Another way to look at being monotone in the definedness order is to think of ⊥ as meaning “I don’t know yet!”. If you later learn more about the input, you can still learn more about the output; but you cannot contradict what you already supposedly knew. That would be logically inconsistent. Saying f(⊥) = 1 is saying that if you don’t yet know anything at all about the input, then you still already know that the output is 1. Fine, but then you cannot turn around and say that f(1) = 2. By contrast, it’s fine if, say, g(⊥) = 1 : ⊥, but g(1) = 1 : []. That just says that without knowing anything about the input, you already know the output is a list that starts with 1; but if you know the input is 1, then you learn something more: the 1 is the only element of the output list. It doesn’t contradict what you already knew. That’s what it means for the function to be monotone. In Haskell, you can only define monotone functions.
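
Spelled out with $\sqsubseteq$ for "is at most as defined as" (notation assumed here), monotonicity is the condition:

$$x \sqsubseteq y \;\implies\; f(x) \sqsubseteq f(y)$$

Taking $x = \bot$ and $y = 1$, we have $\bot \sqsubseteq 1$, so $f$ would need $f(\bot) \sqsubseteq f(1)$, that is $1 \sqsubseteq 2$. That fails, because two different fully defined numbers are incomparable. For $g$, the same check asks for $1 : \bot \sqsubseteq 1 : []$, which holds because $\bot \sqsubseteq []$.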

by Chris Smith at October 06, 2019 05:47 PM

October 04, 2019

Mark Jason Dominus

Addenda to recent articles 201910

Several people have written in with helpful remarks about recent posts:

  • Regarding online tracking of legislation:

    • Ed Davies directed my attention to www.legislation.gov.uk, an official organ of the British government, which says:

    The aim is to publish legislation on this site simultaneously, or at least within 24 hours, of its publication in printed form.

    M. Davies is impressed. So am I. Here is the European Union (Withdrawal) Act 2018.

    This then led me to Standardizing the World’s Legislative Information — One hackathon at a time on the LII's VOXPOPULII blog.

    (Reminder to readers: I do not normally read Twitter, and it is not a reliable way to contact me.)

  • Regarding the mysteriously wide letter ‘O’ on the Yeadon firehouse. I had guessed that it was not in the same family as the others, perhaps because the original one had been damaged. I asked Jonathan Hoefler, a noted font expert; he agreed.

    But one reader, Steve Nicholson, pointed out that it is quite common, in Art Deco fonts, for the ‘O’ to be circular even when that makes it much wider than the other letters. He provided ten examples, such as Haute Corniche.

    I suggested this to M. Hoefler, but he rejected the theory decisively:

    True; it's a Deco mannerism to have 'modulated capitals'… . But this isn't a deco font, or a deco building, and in any case it would have been HIGHLY unlikely for a municipal sign shop to spec something like this for any purpose, let alone a firehouse. It's a wrong sort O, probably installed from the outset.

    (The letter spacing suggests that this is the original ‘O’.)

  • Several people wrote to me about the problem of taking half a pill every day, in which I overlooked that the solution was simply the harmonic numbers.

    • Robin Houston linked to this YouTube video, “the frog problem”, which has the same solution, and observed that the two problems are isomorphic, proceeding essentially as Jonathan Dushoff does below.

    • Shreevatsa R. wrote a long blog article detailing their thoughts about the solution. I have not yet read the whole thing carefully but knowing M. Shreevatsa, it is well worth reading. M. Shreevatsa concludes, as I did, that a Markov chain approach is unlikely to be fruitful, but then finds an interesting approach to the problem using probability generating functions, and then another reformulating it as a balls-in-bins problem.

    • Jonathan Dushoff sent me a very clear and elegant solution and kindly gave me permission to publish it here:

    The first key to my solution is the fact that you can add expectations even when variables are not independent.

    In this case, that means that each time we break a pill we can calculate the probability that the half pill we produce will "survive" to be counted at the endpoint. That's the same as the expectation of the number of half-pills that pill will contribute to the final total. We can then just add these expectations to get the answer! A little counter-intuitive, but absolutely solid.

    The next key is symmetry. If I break a half pill and there are $j$ whole pills left, the only question for that half pill is the relative order in which I pick those $j+1$ objects. In particular, any other half pills that exist or might be generated can be ignored for the purpose of this part of the question. By symmetry, any of these $j+1$ objects is equally likely to be last, so the survival probability is $\frac1{j+1}$.

    If I start with $n$ pills and break one, I have $n-1$ whole pills left, so the probability of that pill surviving is $\frac1n$. Going through to the end we get the answer:

    $$\frac1n + \frac1{n-1} + \ldots + 1.$$

  • I have gotten feedback from several people about my Haskell type constructor clutter, which I will write up separately, probably, once I digest it.
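
The harmonic-number answer to the half-pill problem is easy to check by brute force for small bottles. Here is a minimal Haskell sketch, assuming the process is: draw one object uniformly at random each day, put a half back whenever a whole pill is drawn, and stop once no whole pills remain. The names `expectedHalves` and `harmonic` are made up for illustration.

```haskell
import Data.Ratio ((%))

-- | Expected number of half pills in the bottle at the moment the last
-- whole pill has been broken, starting from @w@ whole and @h@ half pills.
-- Assumption: each day one object is drawn uniformly at random.
expectedHalves :: Integer -> Integer -> Rational
expectedHalves 0 h = fromInteger h
expectedHalves w h = pWhole * afterWhole + pHalf * afterHalf
  where
    total      = w + h
    pWhole     = w % total
    pHalf      = h % total
    -- Drawing a whole pill: one fewer whole pill, one more half pill.
    afterWhole = expectedHalves (w - 1) (h + 1)
    -- Drawing a half pill: it is consumed. Guard h == 0 so the count
    -- never goes negative (that branch has probability zero anyway).
    afterHalf  = if h == 0 then 0 else expectedHalves w (h - 1)

-- | The n-th harmonic number, 1 + 1/2 + ... + 1/n.
harmonic :: Integer -> Rational
harmonic n = sum [1 % k | k <- [1 .. n]]
```

The recursion is not memoized, so it is only practical for small `n`, but comparing `expectedHalves n 0` with `harmonic n` for the first few values of `n` is enough to see the two agree exactly, with no floating-point noise.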

Thanks to everyone who wrote in, even people I forgot to mention above, and even to the Twitter person who didn't actually write in.

by Mark Dominus (mjd@plover.com) at October 04, 2019 05:09 PM

October 03, 2019

Mark Jason Dominus

Neatness counts

I recently mentioned a citation listing on one of the pages of the United States Code at LII. It said:

(Pub. L. 85–767, Aug. 27, 1958, 72 Stat. 904; Pub. L. 86–342, title I, § 106, Sept. 21, 1959, 73 Stat. 612; Pub. L. 87–61, title I, § 106, June 29, 1961, 75 Stat. 123; Pub. L. 88–157, § 5, Oct. 24, 1963, 77 Stat. 277; Pub. L. 89–285, title I, § 101, Oct. 22, 1965, 79 Stat. 1028; Pub. L. 89–574, § 8(a), Sept. 13, 1966, 80 Stat. 768; Pub. L. 90–495, § 6(a)–(d), Aug. 23, 1968, 82 Stat. 817; Pub. L. 91–605, title I, § 122(a), Dec. 31, 1970, 84 Stat. 1726; Pub. L. 93–643, § 109, Jan. 4, 1975, 88 Stat. 2284; Pub. L. 94–280, title I, § 122, May 5, 1976, 90 Stat. 438; Pub. L. 95–599, title I, §§ 121, 122, Nov. 6, 1978, 92 Stat. 2700, 2701; Pub. L. 96–106, § 6, Nov. 9, 1979, 93 Stat. 797; Pub. L. 102–240, title I, § 1046(a)–(c), Dec. 18, 1991, 105 Stat. 1995, 1996; Pub. L. 102–302, § 104, June 22, 1992, 106 Stat. 253; Pub. L. 104–59, title III, § 314, Nov. 28, 1995, 109 Stat. 586; Pub. L. 105–178, title I, § 1212(a)(2)(A), June 9, 1998, 112 Stat. 193; Pub. L. 112–141, div. A, title I, §§ 1519(c)(6), formerly 1519(c)(7), 1539(b), July 6, 2012, 126 Stat. 576, 587, renumbered § 1519(c)(6), Pub. L. 114–94, div. A, title I, § 1446(d)(5)(B), Dec. 4, 2015, 129 Stat. 1438.)

My comment was “Whew”.

But this wouldn't have been so awful if LII had made even a minimal effort to clean it up:

  • Pub. L. 85–767, Aug. 27, 1958, 72 Stat. 904
  • Pub. L. 86–342, title I, § 106, Sept. 21, 1959, 73 Stat. 612
  • Pub. L. 87–61, title I, § 106, June 29, 1961, 75 Stat. 123
  • Pub. L. 88–157, § 5, Oct. 24, 1963, 77 Stat. 277
  • Pub. L. 89–285, title I, § 101, Oct. 22, 1965, 79 Stat. 1028
  • Pub. L. 89–574, § 8(a), Sept. 13, 1966, 80 Stat. 768
  • Pub. L. 90–495, § 6(a)–(d), Aug. 23, 1968, 82 Stat. 817
  • Pub. L. 91–605, title I, § 122(a), Dec. 31, 1970, 84 Stat. 1726
  • Pub. L. 93–643, § 109, Jan. 4, 1975, 88 Stat. 2284
  • Pub. L. 94–280, title I, § 122, May 5, 1976, 90 Stat. 438
  • Pub. L. 95–599, title I, §§ 121, 122, Nov. 6, 1978, 92 Stat. 2700, 2701
  • Pub. L. 96–106, § 6, Nov. 9, 1979, 93 Stat. 797
  • Pub. L. 102–240, title I, § 1046(a)–(c), Dec. 18, 1991, 105 Stat. 1995, 1996
  • Pub. L. 102–302, § 104, June 22, 1992, 106 Stat. 253
  • Pub. L. 104–59, title III, § 314, Nov. 28, 1995, 109 Stat. 586
  • Pub. L. 105–178, title I, § 1212(a)(2)(A), June 9, 1998, 112 Stat. 193
  • Pub. L. 112–141, div. A, title I, §§ 1519(c)(6), formerly 1519(c)(7), 1539(b), July 6, 2012, 126 Stat. 576, 587, renumbered § 1519(c)(6), Pub. L. 114–94, div. A, title I, § 1446(d)(5)(B), Dec. 4, 2015, 129 Stat. 1438.

That's the result of s/; /<li>/g, nothing more.

(I wonder if that long citation at the end is actually two citations.)
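
For readers who would rather not use a regex, the same cleanup can be sketched in a few lines of Haskell. This is only an illustration of the `s/; /<li>/g` idea, using the bullet style of the list above instead of `<li>`. The names `splitOn` and `bulletCitations` are mine (a `splitOn` also exists in the separate `split` package, but this version needs only `base`), and it makes no attempt to decide whether the last entry is really two citations.

```haskell
import Data.List (isPrefixOf)

-- | Split a string on every occurrence of a non-empty separator.
splitOn :: String -> String -> [String]
splitOn sep = go
  where
    go s = case breakOn s of
      (chunk, Nothing)   -> [chunk]
      (chunk, Just rest) -> chunk : go rest
    -- Return the text before the first separator and, if a separator
    -- was found, the text after it.
    breakOn s
      | sep `isPrefixOf` s = ("", Just (drop (length sep) s))
      | otherwise = case s of
          []     -> ("", Nothing)
          c : cs -> let (chunk, rest) = breakOn cs in (c : chunk, rest)

-- | Render a "; "-separated citation string as one bullet per citation.
bulletCitations :: String -> String
bulletCitations = unlines . map ("  • " ++) . splitOn "; "
```

Feeding the parenthesized citation string (minus the parentheses) to `bulletCitations` produces one line per Public Law, which is all the cleanup I was asking for.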

by Mark Dominus (mjd@plover.com) at October 03, 2019 05:46 PM

The pain of tracking down changes in U.S. law

Last month when I was researching my article about the free coffee provision in U.S. federal highway law, I spent a great deal of time writing this fragment:

under the Federal-Aid Highway Act of 1978

I knew that the provision was in 23 USC §131, but I should explain what this means.

The body of U.S. statutory law can be considered a single giant document, which is "codified" as the United States Code, or USC for short. USC is divided into fifty or sixty “titles” or subject areas, of which the relevant one here, title 23, concerns “Highways”. The titles are then divided into sections (the free coffee is in section 131), paragraphs, sub-paragraphs, and so on, each with an identifying letter. The free coffee is 23 USC §131 (c)(5).

But this didn't tell me when the coffee exception was introduced or in what legislation. Most of Title 23 dates from 1958, but the coffee sign exception was added later. When Congress amends a law, they do it by specifying a patch to the existing code. My use of the programmer jargon term “patch” here is not an analogy. The portion of the Federal-Aid Highway Act of 1978 that enacted the “free coffee” exception reads as follows:

ADVERTISING BY NONPROFIT ORGANIZATIONS

Sec. 121. Section 131(c) of title 23, United States Code, is amended—
  (1) by striking out “and (4)” and inserting in lieu thereof “(4)”; and
  (2) by striking out the period at the end thereof and inserting in lieu thereof a comma and the following: “and (5) signs, displays, and devices advertising the distribution by nonprofit organizations of free coffee […]”.

(The “[…]” is my elision. The Act includes the complete text that was to be inserted.)

The act is not phrased as a high-level functional description, such as “extend the list of exceptions to include: ... ”. It says “replace the text ‘and (4)’ with the text ‘(4)’; then replace the period with a comma; then …”, just as if Congress were preparing a patch in a version control system.
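
To make the patch analogy concrete, here is a rough Haskell sketch that treats an amendment as a list of textual edits applied in order, failing (like a rejected patch) if any edit does not apply. The `Amendment` type, the function names, and the sample text are all hypothetical; in particular the inserted text is a short stand-in, not the wording of the Act.

```haskell
import Control.Monad (foldM)
import Data.List (stripPrefix)

-- | One amendment instruction, phrased the way the Act phrases it.
data Amendment
  = StrikeAndInsert String String -- strike the first occurrence of the old text, insert the new
  | ReplaceFinalPeriod String     -- strike the period at the end, insert the new text

-- | Replace the first occurrence of @old@ with @new@, or fail if absent.
replaceFirst :: String -> String -> String -> Maybe String
replaceFirst old new = go
  where
    go s | Just rest <- stripPrefix old s = Just (new ++ rest)
    go []       = Nothing
    go (c : cs) = (c :) <$> go cs

applyAmendment :: Amendment -> String -> Maybe String
applyAmendment (StrikeAndInsert old new) text = replaceFirst old new text
applyAmendment (ReplaceFinalPeriod new) text =
  case reverse text of
    '.' : rest -> Just (reverse rest ++ new)
    _          -> Nothing

-- | Apply every instruction in order; Nothing means the patch was rejected.
applyAll :: [Amendment] -> String -> Maybe String
applyAll amendments text = foldM (flip applyAmendment) text amendments

-- | A toy version of section 121, with placeholder text for the insertion.
section121 :: [Amendment]
section121 =
  [ StrikeAndInsert "and (4)" "(4)"
  , ReplaceFinalPeriod ", and (5) free coffee signs."
  ]
```

Applied to a toy sentence such as `"(3) local signs, and (4) landmark signs."`, `applyAll section121` moves the “and” and appends the new item, which is the same two-step edit the Act spells out in prose.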

Unfortunately, the lack of an actual version control system makes it quite hard to find out when any particular change was introduced. The code page I read is provided by the Legal Information Institute at Cornell University. At the bottom of the page, there is a listing of the changes that went into this particular section:

(Pub. L. 85–767, Aug. 27, 1958, 72 Stat. 904; Pub. L. 86–342, title I, § 106, Sept. 21, 1959, 73 Stat. 612; Pub. L. 87–61, title I, § 106, June 29, 1961, 75 Stat. 123; Pub. L. 88–157, § 5, Oct. 24, 1963, 77 Stat. 277; Pub. L. 89–285, title I, § 101, Oct. 22, 1965, 79 Stat. 1028; Pub. L. 89–574, § 8(a), Sept. 13, 1966, 80 Stat. 768; Pub. L. 90–495, § 6(a)–(d), Aug. 23, 1968, 82 Stat. 817; Pub. L. 91–605, title I, § 122(a), Dec. 31, 1970, 84 Stat. 1726; Pub. L. 93–643, § 109, Jan. 4, 1975, 88 Stat. 2284; Pub. L. 94–280, title I, § 122, May 5, 1976, 90 Stat. 438; Pub. L. 95–599, title I, §§ 121, 122, Nov. 6, 1978, 92 Stat. 2700, 2701; Pub. L. 96–106, § 6, Nov. 9, 1979, 93 Stat. 797; Pub. L. 102–240, title I, § 1046(a)–(c), Dec. 18, 1991, 105 Stat. 1995, 1996; Pub. L. 102–302, § 104, June 22, 1992, 106 Stat. 253; Pub. L. 104–59, title III, § 314, Nov. 28, 1995, 109 Stat. 586; Pub. L. 105–178, title I, § 1212(a)(2)(A), June 9, 1998, 112 Stat. 193; Pub. L. 112–141, div. A, title I, §§ 1519(c)(6), formerly 1519(c)(7), 1539(b), July 6, 2012, 126 Stat. 576, 587, renumbered § 1519(c)(6), Pub. L. 114–94, div. A, title I, § 1446(d)(5)(B), Dec. 4, 2015, 129 Stat. 1438.)

Whew.

Each of these is a citation of a particular Act of Congress. For example, the first one

Pub. L. 85–767, Aug. 27, 1958, 72 Stat. 904

refers to “Public law 85–767”, the 767th law enacted by the 85th Congress, which met during the Eisenhower administration, from 1957–1959. The U.S. Congress has a useful web site that contains a list of all the public laws, with links — but it only goes back to the 93rd Congress of 1973–1974.

And anyway, just knowing that it is Public law 85–767 is not (or was not formerly) enough to tell you how to look up its text. The laws must be published somewhere before they are codified, and scans of these publications, the United States Statutes at Large, are online back to the 82nd Congress. That is what the “72 Stat. 904” means: the publication was in volume 72 of the Statutes at Large, page 904. This citation style was obviously designed at a time when the best (or only) way to find the statute was to go down to the library and pull volume 72 off the shelf. It is well-designed for that purpose. Now, not so much.

Here's a screengrab of the relevant portion of the 1978 act:

Screengrab of scan of the text quoted earlier, ADVERTISING BY NONPROFIT ORGANIZATIONS

The citation for this was:

Pub. L. 95–599, title I, §§ 121, 122, Nov. 6, 1978, 92 Stat. 2700, 2701

(Note that “title I, §§ 121, 122” here refers to the sections of the act itself, not the section of the US Code that was being amended; that was title 23, §131, remember.)

To track this down, I had no choice but to grovel over each of the links to the Statutes at Large, download each scan, and search over each one looking for the coffee provision. I kept written notes so that I wouldn't mix up the congressional term numbers with the Statutes volume numbers.

It ought to be possible, at least in principle, to put the entire U.S. Code into a version control system, with each Act of Congress represented as one or more commits, maybe as a merged topic branch. The commit message could contain the citation, something like this:

    commit a4e2b2a1ca2d5245c275ddef55bf8169d72580df
    Merge: 6829b2dd986 836108c2ba0
    Author: ... <...>
    Date:   Mon Nov 6 00:00:00 1978 -0400

        Surface Transportation Assistance Act of 1978

        P.L. 95–599
        92 Stat. 2689–2762
        H.R. 11733   

        Merge branch `pl-95-599` to `master`

    commit 836108c2ba0d5245c275ddef55bf8169d72580df
    Author: ... <...>
    Date:   Mon Nov 6 00:00:00 1978 -0400

        Federal-Aid Highway Act of 1978 (section 121)

        (Surface Transportation Assistance Act of 1978, title I)
        P.L. 95–599
        92 Stat. 2689–2762
        H.R. 11733

        Signs advertising free coffee are no longer prohibited
        within 660 feet of a federal highway.

    diff --git a/USC/title23.md b/USC/title23.md
    index 084bfc2..caa5a53 100644
    --- a/USC/title23.md
    +++ b/USC/title23.md
    @@ -20565,11 +20565,16 @@ 23 U.S. Code § 131. Control of outdoor advertising
     be changed at reasonable intervals by electronic process or by remote
     control, advertising activities conducted on the property on which
    -they are located, and (4) signs lawfully in existence on October 22,
    +they are located, (4) signs lawfully in existence on October 22,
     1965, determined by the State, subject to the approval of the
     Secretary, to be landmark signs, including signs on farm structures or
     natural surfaces, or historic or artistic significance the
     preservation of which would be consistent with the purposes of this
    -section.
    +section, and (5) signs, displays, and devices advertising the
    +distribution by nonprofit organizations of free coffee to individuals
    +traveling on the Interstate System or the primary system. For the
    +purposes of this subsection, the term “free coffee” shall include
    +coffee for which a donation may be made, but is not required.
    +
     *(d)* In order to promote the reasonable, orderly and effective 

Or maybe the titles would be directories and the sections would be numbered files in those directories. Whatever. If this existed, I would be able to do something like:

  git log -Scoffee -p -- USC/title23.md

and the Act that I wanted would pop right out.

Preparing a version history of the United States Code would be a dauntingly large undertaking, but gosh, so useful. A good VCS enables you to answer questions that you previously wouldn't have even thought of asking.

Steve Buscemi in _Reservoir Dogs_ is playing the world's smallest violin.

This article started as a lament about how hard it was for me to track down the provenance of the coffee exception. But it occurs to me that this is the response of someone who has been spoiled by plenty. A generation ago it would have been unthinkable for me even to try to track this down. I would have had to start by reading a book about legal citations and learning what “79 Stat. 1028” meant, instead of just picking it up on the fly. Then I would have had to locate a library with a set of the Statutes at Large and travel to it. And here I am complaining about how I had to click 18 links and do an (automated!) text search on 18 short, relevant excerpts of the Statutes at Large, all while sitting in my chair.

My kids can't quite process the fact that in my childhood, you simply didn't know what the law was and you had no good way to find out. You could go down to the library, take the pertinent volumes of the USC off the shelf, and hope you had looked in all the appropriate places for the relevant statutes, but you could never be sure you hadn't overlooked something. OK, well, you still can't be sure, but now you can do keyword search, and you can at least read what it does say without having to get on a train.

Truly, we live in an age of marvels.

[ Addendum 20191004: More about this ]

by Mark Dominus (mjd@plover.com) at October 03, 2019 11:37 AM

October 02, 2019

Chris Smith 2

The “many meanings” of variables

What is a variable, anyway? Math educators have plenty of complex answers.

In their article “On Developing a Rich Conception of Variable” (part of this volume on undergraduate math education), Maria Trigueros and Sally Jacobs write, “Unlike the concept of function, for example, variable has no precise mathematical definition. It has come to be a catch-all term to cover a variety of uses of letters in expressions and equations.” It appears from a survey of different writing that this point of view — that there is no single definition of variable and each use must be understood in a sort of piecemeal case analysis — is widely accepted.

In this article, I’d like to challenge this notion. In fact, a very simple definition suffices: a variable is just a name that represents a value. Of course, variables are still used in different contexts and for different purposes. Instead of introducing variable as a fuzzy word with many definitions, though, educators would do better to nail down a simple definition, and then move on to considering how this unifying idea can be used to communicate and reason in various ways.

Enumerating variable meanings is an obstacle to comprehension and learning.

A typical analysis of the meaning of variables lists several definitions, classifying each use of variables as one of them. For instance:

  • Unknowns are variables that stand for a value that isn’t known yet, and should be discovered by solving an equation.
  • Generalized numbers are variables that stand for any number to make general statements like (a + b) + c = a + (b + c).
  • Covarying values are variables whose values change together, so that as one changes the other changes with it.
  • Parameters are variables whose values identify one from among a family of expressions or functions.

These words definitely get at various conventional ways to use variables. As separate definitions, though, these often come up short. After solving for x, for instance, it changes from an “unknown” to a “known”. Does it really cease to be a variable? Few would be willing to accept that x is a variable for students who did not do their homework, but it is not a variable for those who did! This may seem facetious, yet similar problems arise from the other definitions in the list. They have little to say about what a variable is, and change the subject toward what students should do with them to get an answer. This is ultimately just another instance of the central problem of math education: procedural knowledge without conceptual understanding.

The problem with choosing definitions in terms of what students should do is that as they progress in their level of sophistication, the kinds of things we ask them to do change. If the only way they understood earlier content was based on how to compute answers to certain problem types, then they must now relearn, and come away with the notion that somehow math changed on them. If students are clear on the difference between the definition of a concept and how it’s used in computations in this class, then the latter can change, and they still have their knowledge of the former.

We often want students to use variables in multiple ways even within a single class, unit, or problem! When introducing the standard form ax² + bx + c for quadratics, a, b, and c act as parameters. But if you ask a student for which values of a, b and c the expression ax² + bx + c is the square of a polynomial (and this is actually an important question!), the student must now think of a, b, and c as quantities that vary together, under constraints, and manipulate them algebraically just as they do with unknowns. No single role of variables is enough to solve this problem.

If students need to solve problems like this, it won’t help to tell them that each of these roles is fundamentally a different kind of variable, and try to sort our variables into these buckets. Instead, students must eventually develop a fluid understanding of the ways variables are used in mathematical reasoning, often switching quickly and easily between them. The most interesting problem-solving involves comparing the situation from more than one point of view, where the variables may play different roles, and combining the insights from each.

Variables belong to language, not mathematics.

If defining variables by enumerating the roles they can play in problems is a dead end, surely we must still say something! It’s true, after all, that a variable like x might sometimes be used to generalize over any number, but other times represent a single well-defined number that students can determine through solving an equation. And what of Trigueros and Jacobs’ claim that variables have no precise mathematical definition?

The key, I believe, is to understand that variables are not mathematical objects at all. They are a notation — a linguistic object instead of a mathematical one — and as such, their definition is simple, but their many senses and uses arise from the context of the communication. We know what a variable is: it’s the name, like x or a, that represents a mathematical object in an expression! You can point to it, and circle it on the page. There is no mystery at all in what x is.

When it comes to using variables, it is not that students have a weak understanding of what they are, but that they lack the communication skills to interpret the surrounding meaning: the qualifying phrases, quantification, and implied setting in what they read.

When we ask a student to solve an equation, we mean: suppose that there exists a value called x, and that this equation is true. What might the value of x be? The “unknown” comes from the existential quantifier on the value of x, not the nature of variables themselves. Because we supposed that the value x exists instead of defining it, we simply do not know — a priori — which it is.

I’ve sometimes heard educators do worse by suggesting that x could have multiple values, or none! This is unfortunate, since it tries to change what is really a familiar situation of not knowing into a more complex definition. If there’s not enough information in the equation to determine the value of x, one can just say that we do not know its value. Perhaps it could be either 3 or 5, for instance, but we cannot tell which. Perhaps the equation is contradictory and so cannot be true at all. But it’s plainly nonsense to say that x has both values at once, or none at all. Moving the complexity to the definition like this prevents students from applying what they already know about not knowing. A common error, for instance, is for a student to believe they have the answer from solving an equation, and not consider whether each possible value of x is consistent with other information they have besides the equation.

There are other kinds of quantification, as well. When we state the distributive property, we mean: for all numbers x, y, and z, this equation is true. The fact that x, y, and z this time generalize over all numbers — something which many authors call a definition of variable — instead arises from the universal quantifier that says so! On the other hand, when we write that F = ma in physics, or y = x² as the equation of a parabola, we mean: the relevant situations — accelerating masses under a force, or pairs of numbers (x, y) — are those for which this equation holds. This is bounded quantification. In general, the crucial phenomenon of covariation arises from bounded quantification of values, not from a third meaning of variable.

As you can see, the variation in the meaning of variables is actually variation in the stated or implied context of the communication. More often than not, that context involves some kind of quantification, and it’s the meaning of the quantifier, not the concept of variable, that changes.
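
Written out with explicit quantifiers, the three readings discussed above might look as follows. The equation 2x + 1 = 7 is an illustrative choice, not taken from any particular textbook, and the set-builder form in the last line is only one possible way to render what is called bounded quantification here.

$$\begin{aligned}
&\exists x.\ 2x + 1 = 7 && \text{(an unknown: suppose such an } x \text{ exists)}\\
&\forall x, y, z.\ x(y + z) = xy + xz && \text{(a generalized number: the distributive property)}\\
&\{(x, y) \mid y = x^2\} && \text{(covariation: only the pairs for which the equation holds)}
\end{aligned}$$

In all three lines the letters are doing the same job, naming values. What differs is the symbol in front of them.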

Teaching language comprehension and communication is not easy.

Correcting this misunderstanding doesn’t make the task easy. Certainly, no one would claim that teaching students language comprehension is an easy task. Neither does it mean we can throw up our hands and leave language arts teachers to the job. Mathematics is its own language, both in the formal language of algebraic expressions themselves, and in the heavy use of semi-formal logical phrases and qualifiers attached to English or other natural languages. Nested and alternating quantifiers, for instance, add significant complexity to mathematics, and are far more rare in other common uses of language. Addressing students’ language comprehension difficulties is still part of teaching mathematics.

Unfortunately, there are incidental challenges, as well. In current curricula and resources, these implied phrases and qualifiers are left out for students far too early. Students cannot be blamed for missing the details of communication when those details are not stated! Authors or teachers are too familiar with common types or patterns of math problems, and don’t always take the time to express the structure of what they are asking. Perhaps they don’t consciously understand it themselves; many of us get by with an intuitive understanding of language that we picked up second-hand from our own teachers and textbooks, but have never analyzed ourselves.

Consider the notation f(x) = 3x+5. In writing this, we usually mean: “Let the function f be defined such that for all numbers x, f(x) = 3x+5.” I find that most math educators struggle to express this meaning. A substantial number believe that “f(x)” is itself the full formal name of the function, and f just an abbreviation! This leads to ad hoc rules about why f(t) = 3t+5 is the same function, why the equal sign makes sense even though 3x+5 is a number, why it’s okay to write f ∘ g but not f(x) ∘ g(x), and so on. Understanding of the implied quantifiers makes all of this clear, but as many students have discovered, math gets bewilderingly complex when you’re compensating for incorrect fundamentals.

(I say usually above because one could write the same equation in a context where x is defined elsewhere, such as “If x is prime, then f(x) = 3x+5”. Then, different quantifiers would be inferred from that context. In short, inferred context is not compositional: the meaning of a statement is quite ambiguous without full context.)
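
Programming languages make the same point quite directly. As a small sketch in Haskell (the names `f'`, `fAsLambda`, `g`, and `fAfterG` are made up for this illustration), the function is named `f`, the letter on the left of the equals sign is only a bound name, and composition acts on functions, not on numbers:

```haskell
-- "For all numbers x, f(x) = 3x + 5": the bound name is arbitrary.
f :: Integer -> Integer
f x = 3 * x + 5

-- The same function, written with a different bound variable.
f' :: Integer -> Integer
f' t = 3 * t + 5

-- The function can be referred to without mentioning any argument.
fAsLambda :: Integer -> Integer
fAsLambda = \x -> 3 * x + 5

-- A second function to compose with; an arbitrary choice.
g :: Integer -> Integer
g x = x * x

-- Composition combines the functions f and g, not the numbers f x and g x.
fAfterG :: Integer -> Integer
fAfterG = f . g
```

Writing `f x . g x` instead would be rejected by the type checker, because `f x` and `g x` are numbers and there is nothing to compose. That is the same reason f(x) ∘ g(x) is not meaningful notation.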

We can teach fluid comprehension of variables.

Then a proper understanding of variables depends on interpreting complex, multi-level, and sometimes implicit use of mathematical language. This is not trivial, but it’s important to get right anyway. Mathematics is a language for communication, and however hard we try, we cannot divorce mathematics from the usual concerns of human communication, such as reading comprehension, and drawing inferences of intent and context.

The question arises of how math educators might help students build the skill of communicating and understanding the use of variables in mathematics, including the various kinds of quantification and implied phrases. Though I do not claim to have all the answers, I can share some thoughts in this direction. I’m also intensely interested in the ideas of others.

Teachers should model variables as a communication technique.

When variables are only introduced as part of new procedural learning such as solving equations, it’s not surprising that students connect their meaning to those procedural skills. But successful students, and professional mathematicians as well, often start with defining variables so that they will have vocabulary to reason with, before they know what they’ll be doing with them. Teachers can model this by introducing variables purely for didactic reasons in class.

For a simple example, consider the usual presentation of the so-called “distance formula” (essentially the Pythagorean theorem, but stated using Cartesian coordinates instead of right triangles):

The “distance formula”, as usually presented in textbooks

Many younger students find this rather daunting as an expression, because there are so many levels of nesting. It can also be written like this:

The “distance formula”, after naming interesting subexpressions

The two new variables here are non-essential, since the expression can just as well be written as it usually is, without them. Adding these variables, though, gives names to interesting quantities in the original expression, making the top-level expression shorter and clearer. This is a communication technique, and nothing more. Nevertheless, it’s an interesting conversation to have with students: Is the first or second form clearer? Easier to apply? Better for specific purposes?
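
The same contrast shows up when the formula is written as code. The sketch below is an illustration only: the names `dx` and `dy` are my choice and may not match the symbols in the figures, and both functions take the two points as four separate coordinates for simplicity.

```haskell
-- | Nested form, roughly as the formula is usually printed.
distance :: Double -> Double -> Double -> Double -> Double
distance x1 y1 x2 y2 = sqrt ((x2 - x1) ^ 2 + (y2 - y1) ^ 2)

-- | The same computation after naming the interesting subexpressions.
distance' :: Double -> Double -> Double -> Double -> Double
distance' x1 y1 x2 y2 = sqrt (dx ^ 2 + dy ^ 2)
  where
    dx = x2 - x1 -- horizontal change
    dy = y2 - y1 -- vertical change
```

Both definitions compute the same number for every input. The second simply has a shorter top-level expression, which is the whole point of the extra names.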

Students should define their own variables.

We can also encourage students to make up their own variables by choosing an unclaimed symbol and explaining which quantity it represents. This dispels the myth that the variables are inherent in the problem itself, when in fact any interesting mathematical situation is likely to have dozens of different quantities involved. The choice of which to name as variables, and which to leave as unnamed and implied, is a communication decision, and the resulting variables are a working vocabulary which can make problem-solving easier or harder. (As an advanced example, consider the decision of a physicist to work with momentum instead of mass and velocity.)

Students can define variables with informal language, or by writing equations that define them, or of course preferably both! They can justify their choices and explain why they feel one quantity is more informative or important than another. And they should do so before they have solved a problem.

(It is interesting to note that in computer science, it’s typical to start out writing a complex definition by defining variables — for a moment ignoring that the word means something different there, because the difference isn’t relevant now — to capture simple but interesting quantities first. This organization and communication technique is taught explicitly under the name top-down decomposition. Mathematics students, though, are often expected to pick up skills like this on their own, without direct instruction.)

Quantification should be taught explicitly.

Logical quantification is sometimes viewed as an advanced topic in either philosophy or mathematical logic, and reserved for classes like [pre-]calculus or even the university level. But we can see here that quantification is implicit in much of the basic middle school curriculum! Therefore, we cannot get away with not teaching it.

There is no need to do so with a great deal of formalism. Instead, teachers should understand that they are teaching language comprehension, and that students have brains hard-wired for intuitive understanding of language. Instruction should focus on statements in plain language, not logical connectives! What is the difference between saying “For all birds, there is a nest they live in.” and “There is a nest that all birds live in.”? Can the first be true without the second? Can the second be true without the first?
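
For readers who like to experiment, the two bird sentences can be checked against small made-up worlds. The following Haskell sketch is only an illustration: the types, the function names, and the sample worlds are all hypothetical, and the swap of `all` and `any` mirrors the swap of “for all” and “there is”.

```haskell
type Bird = String
type Nest = Int

-- | "For all birds, there is a nest they live in."
everyBirdHasANest :: [Bird] -> [Nest] -> (Bird -> Nest -> Bool) -> Bool
everyBirdHasANest birds nests livesIn =
  all (\b -> any (\n -> livesIn b n) nests) birds

-- | "There is a nest that all birds live in."
oneNestForAllBirds :: [Bird] -> [Nest] -> (Bird -> Nest -> Bool) -> Bool
oneNestForAllBirds birds nests livesIn =
  any (\n -> all (\b -> livesIn b n) birds) nests

-- | A made-up world in which each bird has its own nest.
separateNests :: Bird -> Nest -> Bool
separateNests "robin" 1 = True
separateNests "wren"  2 = True
separateNests _       _ = False

-- | A made-up world in which every bird shares nest 1.
sharedNest :: Bird -> Nest -> Bool
sharedNest _ n = n == 1
```

With birds `["robin", "wren"]` and nests `[1, 2]`, the first sentence holds in both worlds, while the second holds only in the shared-nest world. Students can be asked to invent a world that separates the two sentences before seeing one.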

Students should also practice writing out the quantification in their math problems explicitly. As I did above, students should write out whether a statement claims that such a value exists, or that it’s true for all values, or perhaps that it defines certain associations between values (such as all pairs of x and y that are part of a curve or region of the plane). Again, the focus must be on communicating, so it helps to ask them to write for a specific audience, such as a hypothetical student (the foil) who expresses some incorrect understanding, and to convince that student.

Background and Credits

My thoughts here originally stem from my project of the last eight years, teaching an enrichment curriculum about creative expression with mathematics. The curriculum attempts to answer the question: what if algebra were taught as a language for creative self-expression? The implications are exciting:

  • The emphasis shifts from calculating answers to expressing abstract ideas, and students use computers to do any calculation so they can see the consequences of their descriptions.
  • Students no longer seek to answer unmotivated questions posed by others, but instead find ways to express their own ideas. They experience mathematics as a creative medium, which gives them autonomy and empowers them.

In effect, this turns the entire middle-school mathematics curriculum inside-out. Even without stretching, one easily reaches around 70% of middle school math concepts (for instance, as listed in the Common Core math standards, or other standards documents). And yet, most of these ideas are seen from a radically different perspective. It makes memorized procedural knowledge quite irrelevant, to the extent that some teachers express skepticism that it is even real mathematics in the first place. But students understand ideas like variables, expressions, functions, operators, and so on with a kind of concreteness that is missing from a typical math education.

I often find that the way ideas are taught in conventional mathematics is incompatible with what I’m doing, because the ideas are too tied to traditional math problems and procedures, rather than the big shared ideas.

For this article in particular, I’m also grateful to Maria Trigueros and Sally Jacobs, whose writing on the subject helped focus my thoughts. Though I treated their central claim as a foil to some extent, they also address the importance of a fluid understanding of the uses of variables in mathematics, and how this causes problems for students whose conceptual knowledge is too limited. Henri Picciotto, who has authored a large number of resources on the teaching of algebra, was also helpful in responding to my earlier rants on Twitter! Thanks to you all.

by Chris Smith at October 02, 2019 04:21 AM

Donnacha Oisín Kidney

What is Good About Haskell?

Posted on October 2, 2019
Tags: Haskell

Update 5/10/2019: check the bottom of this post for some links to comments and discussion.

Beginners to Haskell are often confused as to what’s so great about the language. Much of the proselytizing online focuses on pretty abstract (and often poorly defined) concepts like “purity”, “strong types”, and (god forbid) “monads”. These things are difficult to understand, somewhat controversial, and not obviously beneficial (especially when you’ve only been using the language for a short amount of time).

The real tragedy is that Haskell (and other ML-family languages) are packed with simple, decades-old features like pattern matching and algebraic data types which have massive, clear benefits and few (if any) downsides. Some of these ideas are finally filtering in to mainstream languages (like Swift and Rust) where they’re used to great effect, but the vast majority of programmers out there haven’t yet been exposed to them.

This post aims to demonstrate some of these features in a simple (but hopefully not too simple) example. I’m going to write and package up a simple sorting algorithm in both Haskell and Python, and compare the code in each. I’m choosing Python because I like it and beginners like it, but also because it’s missing most of the features I’ll be demonstrating. It’s important to note I’m not comparing Haskell and Python as languages: the Python code is just there as a reference for people less familiar with Haskell. What’s more, the comparison is unfair, as the example deliberately plays to Haskell’s strengths (so I can show off the features I’m interested in): it wouldn’t be difficult to pick an example that makes Python look good and Haskell look poor.

This post is not meant to say “Haskell is great, and your language sucks”! It’s not even really about Haskell: much of what I’m talking about here applies equally well to OCaml, Rust, etc. I’m really writing this as a response to the notion that functional features are somehow experimental, overly complex, or ultimately compromised. As a result of that idea, I feel like these features are left out of a lot of modern languages which would benefit from them. There exists a small set of simple, battle-tested PL ideas, which have been used for nearly forty years now: this post aims to demonstrate them, and argue for their inclusion in every general-purpose programming language that’s being designed today.

The Algorithm

We’ll be using a skew heap to sort lists in both languages. The basic idea is to repeatedly insert stuff into the heap, and then repeatedly “pop” the smallest element from the heap until it’s empty. It’s not in-place, but it is O(n log n), and actually performs pretty well in practice.

A Tree

A Skew Heap is represented by a binary tree:

I want to point out the precision of the Haskell definition: a tree is either a leaf (an empty tree), or a node, with a payload and two children. There are no special cases, and it took us one line to write (spread to 3 here for legibility on smaller screens).
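The definition being described is not reproduced in this text. A minimal sketch of what such a type might look like follows, assuming the constructor names Leaf and Node. The deriving clause and the singleton helper are additions for illustration only.

```haskell
-- A binary tree: either an empty leaf, or a node holding
-- a payload and two child trees.
data Tree a
  = Leaf
  | Node a (Tree a) (Tree a)
  deriving (Show, Eq)

-- Hypothetical helper: a tree containing exactly one element.
singleton :: a -> Tree a
singleton x = Node x Leaf Leaf
```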

In Python, we have to write a few more lines [1]. In this representation, the _is_node field is False for an empty tree (a leaf). If it’s True, the other fields are filled. We write some helper functions to give us constructors like the leaf and node ones for the Haskell example.

This isn’t the standard definition of a binary tree in Python; in fact, it might look a little weird to most Python people. Let’s run through some alternatives and their issues.

  1. The standard definition:

    Instead of having a separate field for “is this a leaf or a node”, the empty tree is simply None:

    With this approach, if we define any methods on a tree, they won’t work on the empty tree!

  2. We’ll do inheritance! Python even has a handy abc library to help us with some of this:

    Methods will now work on an empty tree, but we’re faced with 2 problems: first, this is very verbose, and pretty complex. Secondly, we can’t write a mutating method which changes a tree from a leaf to a node. In other words, we can’t write an insert method!

  3. We won’t represent a leaf as the whole tree being None, just the data!

    This (surprisingly) pops up in a few places. While it solves the problem of methods, and the mutation problem, it has a serious bug. We can’t have None as an element in the tree! In other words, if we ask our eventual algorithm to sort a list which contains None, it will silently discard some of the list, returning the wrong answer.

There are yet more options (using a wrapper class), none of them ideal. Another thing to point out is that, even with our definition with a tag, we can only represent types with 2 possible states. If there was another type of node in the tree, we couldn’t simply use a boolean tag: we’d have to switch to integers (and remember the meaning of each integer), or strings! Yuck!

What Python is fundamentally missing here is algebraic data types. This is a way of building up all of your types out of products (“my type has this and this”) and sums (“my type is this or this”). Python can do products perfectly well: that’s what classes are. The tree class itself is the product of Bool, data, Tree, and Tree. However it’s missing an entire half of the equation! This is why you just can’t express binary trees as cleanly as you can in Swift, Haskell, OCaml, etc. Python, as well as a host of other languages like Go, Java, etc, will let you express one kind of “sum” type: “or nothing” (the null pointer). However, it’s clunky and poorly handled in all of those languages (the method problems above demonstrate the issues in Python), and doesn’t work for anything other than that one special case.

Again, there’s nothing about algebraic data types that makes them ill-suited to mainstream or imperative languages. Swift uses them, and people love them!

A Function

The core operation on skew heaps is the skew merge.
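The code for this section is not included in this text. As a rough sketch, here is one common way to write a skew merge in Haskell. The Tree type and the name merge are assumptions carried over from the description above, and this is not necessarily the author's exact code.

```haskell
-- Assumed tree type: an empty leaf, or a node with a payload and two children.
data Tree a
  = Leaf
  | Node a (Tree a) (Tree a)
  deriving (Show, Eq)

-- Skew merge: the smaller root stays on top, the other heap is merged
-- into its right child, and the children are swapped.
merge :: Ord a => Tree a -> Tree a -> Tree a
merge Leaf ys = ys
merge xs Leaf = xs
merge xs@(Node x xl xr) ys@(Node y yl yr)
  | x <= y    = Node x (merge ys xr) xl
  | otherwise = Node y (merge xs yr) yl
```

Each clause names one case outright, and the fields x, xl and xr are only bound in the clause where both arguments are known to be nodes.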

The standout feature here is pattern matching. In Haskell, we’re able to write the function as we might describe it: “in this case, I’ll do this, in this other case, I’ll do this, etc.”. In Python, we are forced to think of the truth tables and sequential testing. What do I mean by truth tables? Consider the following version of the Python function above:

You may even write this version first: it initially seems more natural (because _is_node is used in the positive). Here’s the question, though: does it do the same thing as the previous version? Are you sure? Which else is connected to which if? Does every if have an else? (some linters will suggest you remove some of the elses above, since the if-clause has a return statement in it!)

The fact of the matter is that we are forced to do truth tables of every condition in our minds, rather than saying what we mean (as we do in the Haskell version).

The other thing we’re saved from in the Haskell version is accessing undefined fields. In the Python function, we know accessing lhs._data is correct since we verified that lhs is a node. But the logic to do this verification is complex: we checked if it wasn’t a node, and returned if that was true… so if it is true that lhs isn’t a node, we would have returned, but we didn’t, so…

Bear in mind all of these logic checks happened four lines before the actual access: this can get much uglier in practice! Compare this to the Haskell version: we only get to bind variables if we’re sure they exist. The syntax itself prevents us from accessing fields which aren’t defined, in a simple way.

Pattern matching has existed for years in many different forms: even C has switch statements. The added feature of destructuring is available in languages like Swift, Rust, and the whole ML family. Ask for it in your language today!

Now that we have that function, we get to define others in terms of it:

A Word on Types

I haven’t mentioned Haskell’s type system so far, as it’s been quite unobtrusive in the examples. And that’s kind of the point: despite more complex examples you’ll see online demonstrating the power of type classes and higher-kinded types, Haskell’s type system excels in these simpler cases.

Without much ceremony, this signature tells us:

  1. The function takes two trees, and returns a third.
  2. Both trees have to be filled with the same types of elements.
  3. Those elements must have an order defined on them.

Type Inference

I feel a lot of people miss the point of this particular feature. Technically speaking, this feature allows us to write fewer type signatures, as Haskell will be able to guess most of them. Coming from something like Java, you might think that that’s an opportunity to shorten some verbose code. It’s not! You’ll rarely find a Haskell program these days missing top-level type signatures: it’s easier to read a program with explicit type signatures, so people are advised to include them wherever possible.

(Amusingly, I often find older Haskell code snippets which are entirely devoid of type signatures. It seems that programmers were so excited about Hindley-Milner type inference that they would put it to the test as often as they could.)

Type inference in Haskell is actually useful in a different way. First, if I write the implementation of the merge function, the compiler will tell me the signature, which is extremely helpful for more complex examples. Take the following, for instance:

Remembering precisely which numeric type x needs to be is a little difficult (Floating? Real? Fractional?), but if I just ask the compiler it will tell me without difficulty.
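The snippet this paragraph refers to is missing from this text. As a stand-in (an assumed example, not the author's), here is a small numeric function written without a signature. The name hypotenuse is hypothetical.

```haskell
-- No type signature is written here: GHC infers one.
-- Asking GHCi with `:type hypotenuse` reports
--   hypotenuse :: Floating a => a -> a -> a
-- because sqrt requires a Floating instance.
hypotenuse x y = sqrt (x * x + y * y)
```

Once the compiler has reported the signature, it can be pasted above the definition so that readers get the explicit type.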

The second use is kind of the opposite: if I have a hole in my program where I need to fill in some code, Haskell can help me along by telling me the type of that hole automatically. This is often enough information to figure out the entire implementation! In fact, there are some programs which will use this capability of the type checker to fill in the hole with valid programs, synthesising your code for you.

So often strong type systems can make you feel like you’re fighting more and more against the compiler. I hope these couple of examples show that it doesn’t have to be that way.

When Things Go Wrong

The next function is “pop-min”:

At first glance, this function should be right at home in Python. It mutates its input, and it has an error case. The code we’ve written here for Python is pretty idiomatic, also: other than the ugly deep copy, we’re basically just mutating the object, and using an exception for the exceptional state (when the tree is empty). Even the exception we use is the same exception as when you try and pop() from an empty list.

The Haskell code here mainly demonstrates a difference in API style you’ll see between the two languages. If something isn’t found, we just use Maybe. And instead of mutating the original variable, we return the new state in the second part of a tuple. What’s nice about this is that we’re only using simple core features like algebraic data types to emulate pretty complex features like exceptions in Python.
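The function itself is not reproduced in this text. A sketch of the Maybe-and-tuple style described above might look like the following. It repeats the assumed Tree and merge definitions so that it stands alone, and all names are assumptions.

```haskell
-- Assumed tree type and skew merge, repeated so the sketch is self-contained.
data Tree a
  = Leaf
  | Node a (Tree a) (Tree a)
  deriving (Show, Eq)

merge :: Ord a => Tree a -> Tree a -> Tree a
merge Leaf ys = ys
merge xs Leaf = xs
merge xs@(Node x xl xr) ys@(Node y yl yr)
  | x <= y    = Node x (merge ys xr) xl
  | otherwise = Node y (merge xs yr) yl

-- "Exception" case: Nothing for the empty heap.
-- "Mutation": the updated heap is the second component of the tuple.
popMin :: Ord a => Tree a -> Maybe (a, Tree a)
popMin Leaf         = Nothing
popMin (Node x l r) = Just (x, merge l r)
```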

You may have heard that “Haskell uses monads to do mutation and exceptions”. This is not true. Yes, state and exceptions have patterns which technically speaking are “monadic”. But make no mistake: when we want to model “exceptions” in Haskell, we really just return a maybe (or an either). And when we want to do “mutation”, we return a tuple, where the second element is the updated state. You don’t have to understand monads to use them, and you certainly don’t “need” monads to do them. To drive the point home, the above code could actually equivalently have a type which mentions “the state monad” and “the maybe monad”:
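The original signature is not shown in this text. To illustrate the idea with a smaller stand-in, the sketch below uses a pop on plain lists rather than the heap (an assumption made to keep it short) and wraps it in StateT from the transformers package. The names pop and popM are hypothetical.

```haskell
import Control.Monad.Trans.State (StateT (..))

-- Plain style: Maybe for failure, a tuple for the updated state.
pop :: [a] -> Maybe (a, [a])
pop []       = Nothing
pop (x : xs) = Just (x, xs)

-- The very same function, with a type that mentions the state
-- and maybe monads. StateT merely wraps a function s -> m (a, s).
popM :: StateT [a] Maybe a
popM = StateT pop
```

Running popM with runStateT hands back exactly what pop returns, which is why the wrapper adds nothing here.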

But there’s no need to!

Gluing It All Together

The main part of our task is now done: all that is left is to glue the various bits and pieces together. Remember, the overall algorithm builds up the heap from a list, and then tears it down using popMin. First, then, to build up the heap.

To my eye, the Haskell code here is significantly more “readable” than the Python. I know that’s a very subjective judgement, but foldr is a function so often used that it’s immediately clear what’s happening in this example.

Why didn’t we use a similar function in Python, then? We actually could have: Python does have an equivalent to foldr, called reduce (it’s been relegated to functools since Python 3; also, technically it’s equivalent to foldl, not foldr). We’re encouraged not to use it, though: the more pythonic code uses a for loop. Also, it wouldn’t work for our use case: the insert function we wrote is mutating, which doesn’t gel well with reduce.

I think this demonstrates another benefit of simple, functional APIs. If you keep things simple, and build things out of functions, they’ll tend to glue together well, without having to write any glue code yourself. The for loop, in my opinion, is “glue code”. The next function, heapToList, illustrates this even more so:

Again, things are kept simple in the Haskell example. We’ve stuck to data types and functions, and these data types and functions mesh well with each other. You might be aware that there’s some deep and interesting mathematics behind the foldr and unfoldr functions going on, and how they relate. We don’t need to know any of that here, though: they just work together well.

Again, Python does have a function which is equivalent to unfoldr: iter has an overload which will repeatedly call a function until it hits a sentinel value. But this doesn’t fit with the rest of the iterator model! Most iterators are terminated with the StopIteration exception; ours (like the pop function on lists) is terminated by the IndexError exception; and this function expects a third version, terminated by a sentinel!

Finally, let’s write sort:

This is just driving home the point: programs work well when they’re built out of functions, and you want your language to encourage you to build things out of functions. In this case, the sort function is built out of two smaller ones: it’s the essence of function composition.
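The code for these last few steps is not included in this text. Putting the pieces together, the whole pipeline could be sketched as follows, under the same assumed Tree, merge and popMin definitions as described earlier. The helper names insert, listToHeap and heapToList follow the prose, but the bodies are a reconstruction, not the author's code.

```haskell
import Data.List (unfoldr)

data Tree a
  = Leaf
  | Node a (Tree a) (Tree a)

merge :: Ord a => Tree a -> Tree a -> Tree a
merge Leaf ys = ys
merge xs Leaf = xs
merge xs@(Node x xl xr) ys@(Node y yl yr)
  | x <= y    = Node x (merge ys xr) xl
  | otherwise = Node y (merge xs yr) yl

-- Inserting is merging with a one-element heap.
insert :: Ord a => a -> Tree a -> Tree a
insert x = merge (Node x Leaf Leaf)

popMin :: Ord a => Tree a -> Maybe (a, Tree a)
popMin Leaf         = Nothing
popMin (Node x l r) = Just (x, merge l r)

-- Build the heap up with a fold...
listToHeap :: Ord a => [a] -> Tree a
listToHeap = foldr insert Leaf

-- ...and tear it down with an unfold.
heapToList :: Ord a => Tree a -> [a]
heapToList = unfoldr popMin

-- The sort is the composition of the two.
sort :: Ord a => [a] -> [a]
sort = heapToList . listToHeap
```

Note that unfoldr wants exactly the shape popMin already returns (Maybe of an element and the remaining state), so no adapter code is needed between them.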

Laziness

So I fully admit that laziness is one of the features of Haskell that does have downsides. I don’t think every language should be lazy, but I did want to say a little about it in regards to the sorting example here.

I tend to think that people overstate how hard it makes reasoning about space: it actually follows pretty straightforward rules, which you can generally step through yourself (compared to, for instance, rewrite rules, which are often black magic!)

In modern programming, people tend to use laziness anyway. Python is a great example: the itertools library is almost entirely lazy. Actually making use of the laziness, though, is difficult and error-prone. Above, for instance, the heapToList function is lazy in Haskell, but strict in Python. Converting it to a lazy version is not the most difficult thing in the world:

But now, suddenly, the entire list API won’t work. What’s more, if we try and access the first element of the returned value, we mutate the whole thing: anyone else looking at the output of the generator will have it mutated out from under them!

Laziness fundamentally makes this more reusable. Take our popMin function: if we just want to view the smallest element, without reconstructing the rest of the tree, we can actually use popMin as-is. If we don’t use the second element of the tuple we don’t pay for it. In Python, we need to write a second function.
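A small sketch of that claim, using the same assumed Tree, merge and popMin as before. The name findMin is hypothetical. Because the merged remainder is never demanded, it is never computed.

```haskell
data Tree a
  = Leaf
  | Node a (Tree a) (Tree a)

merge :: Ord a => Tree a -> Tree a -> Tree a
merge Leaf ys = ys
merge xs Leaf = xs
merge xs@(Node x xl xr) ys@(Node y yl yr)
  | x <= y    = Node x (merge ys xr) xl
  | otherwise = Node y (merge xs yr) yl

popMin :: Ord a => Tree a -> Maybe (a, Tree a)
popMin Leaf         = Nothing
popMin (Node x l r) = Just (x, merge l r)

-- Reuse popMin as-is: only the first component of the tuple is forced,
-- so the `merge l r` thunk is discarded without being evaluated.
findMin :: Ord a => Tree a -> Maybe a
findMin = fmap fst . popMin
```

One way to convince yourself is to call findMin on a node whose children are undefined: the minimum still comes back, since the children are never inspected.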

Testing

Testing the sort function in Haskell is ridiculously easy. Say we have an example sorting function that we trust, maybe a slow but obvious insertion sort, and we want to make sure that our fast heap sort here does the same thing. This is the test:

In that single line, the QuickCheck library will automatically generate random input, run each sort function on it, and compare the two outputs, giving a rich diff if they don’t match.

Conclusion

This post was meant to show a few features like pattern-matching, algebraic data types, and function-based APIs in a good light. These ideas aren’t revolutionary any more, and plenty of languages have them, but unfortunately several languages don’t. Hopefully the example here illustrates a little why these features are good, and pushes back against the idea that algebraic data types are too complex for mainstream languages.

Update 5/10/2019

This got posted to /r/haskell and hackernews. You can find me arguing in the comments there a little bit: I’m oisdk on hackernews and u/foBrowsing on reddit.

There are two topics that came up a bunch that I’d like to add to this post. First I’ll just quote one of the comments from Beltiras:

Friend of mine is always trying to convert me. Asked me to read this yesterday evening. This is my take on the article:

Most of my daily job goes into gluing services (API endpoints to databases or other services, some business logic in the middle). I don’t need to see yet another exposition of how to do algorithmic tasks. Haven’t seen one of those since doing my BSc. Show me the tools available to write a daemon, an http server, API endpoints, ORM-type things and you will have provided me with tools to tackle what I do. I’ll never write a binary tree or search or a linked list at work.

If you want to convince me, show me what I need to know to do what I do.

and my response:

I wasn’t really trying to convince anyone to use Haskell at their day job: I am just a college student, after all, so I would have no idea what I was talking about!

I wrote the article a while ago after being frustrated using a bunch of Go and Python at an internship. Often I really wanted simple algebraic data types and pattern-matching, but when I looked up why Go didn’t have them I saw a lot of justifications that amounted to “functional features are too complex and we’re making a simple language. Haskell is notoriously complex”. In my opinion, the res, err := fun(); if err != nil (for example) pattern was much more complex than the alternative with pattern-matching. So I wanted to write an article demonstrating that, while Haskell has a lot of out-there stuff in it, there’s a bunch of simple ideas which really shouldn’t be missing from any modern general-purpose language.

As to why I used a binary tree as the example, I thought it was pretty self-contained, and I find skew heaps quite interesting.

The second topic was basically people having a go at my ugly Python; to which I say: fair enough! It is not my best. I wasn’t trying necessarily to write the best Python I could here, though, rather I was trying to write the “normal” implementation of a binary tree. If I was to implement a binary tree of some sort myself, though, I would certainly write it in an immutable style rather than the style here. Bear in mind as well that much of what I’m arguing for is stylistic: I think (for instance) that it would be better to use reduce in Python more, and I think the move away from it is a bad thing. So of course I’m not going to use reduce when I’m showing the Python version: I’m doing a compare and contrast!


  1. Yes, I know about the new dataclasses feature. However, it’s wrapped up with the (also new) type hints module, and as such is much more complicated to use. As the purpose of the Python code here is to provide something of a lingua franca for non-Haskellers, I decided against using it. That said, the problems outlined are not solved by dataclasses.

by Donnacha Oisín Kidney at October 02, 2019 12:00 AM

October 01, 2019

Philip Wadler

Instead of flight shaming, let's be thoughtful and selective about all travel



Fly, drive, train? Here's a resource to help you decide. Spotted by Michael J. Oghia of the ACM Climate group.

by Philip Wadler (noreply@blogger.com) at October 01, 2019 11:51 AM

September 30, 2019

Monday Morning Haskell

Gathering Smart Data


Last week we made a few more fixes to our Q-Learning algorithm. Ultimately though, it still seems to fall short for even basic versions of our problem.

Q-learning is a reinforcement learning technique, and in that sense an "unsupervised" approach. We don't tell the machine learning algorithm what the "correct" moves are. We give it rewards when it wins the game (and negative rewards when it loses). But it needs to figure out how to play to get those rewards. With supervised learning, we'll have specific examples of what it should do! We'll have data points saying, "for this feature set, we should make this move." We'll determine a way to record the moves we make in our game, both as a human player and with our manual AI algorithm! This will become our "training" data for the supervised learning approach.

This week's code is all on the Gloss side of things. You can find it on our Github repository under the branch record-player-ai. Next week, we'll jump back into Tensor Flow. If you're not familiar yet with how to use Haskell and Tensor Flow, download our Haskell Tensor Flow Guide!

Recording Moves

To gather training data, we first need a way to record moves in the middle of the game. Gloss doesn't give us access to the IO monad in our update functions. So we'll unfortunately have to resort to unsafePerformIO for this, since we need the data in a file. (We did the same thing when saving game states). Here's the skeleton of our function:

unsafeSaveMove :: Int -> World -> World -> World
unsafeSaveMove moveChoice prevWorld nextWorld = unsafePerformIO $ do
  ...

The first parameter will be a representation of our move, an integer from 0-9. This follows the format we had with serialization.

0 -> Move Up
1 -> Move Right
2 -> Move Down
3 -> Move Left
4 -> Stand Still
X + 5 -> Move direction X and use the stun
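The numbering scheme above can be sketched as a pair of small functions. The Direction type and the names encodeMove and decodeMove are hypothetical stand-ins for illustration, not the types used in the game's code.

```haskell
-- Hypothetical direction type, ordered to match the table above
-- (0 = up, 1 = right, 2 = down, 3 = left, 4 = stand still).
data Direction = DirUp | DirRight | DirDown | DirLeft | DirNone
  deriving (Show, Eq, Enum, Bounded)

-- Encode a direction and a stun flag as an integer from 0 to 9.
encodeMove :: Direction -> Bool -> Int
encodeMove dir useStun = fromEnum dir + (if useStun then 5 else 0)

-- Decode a move number; anything outside 0-9 is rejected.
decodeMove :: Int -> Maybe (Direction, Bool)
decodeMove n
  | n < 0 || n > 9 = Nothing
  | otherwise      = Just (toEnum (n `mod` 5), n >= 5)
```

Under this sketch, stunning while standing still encodes to 9, and decoding any valid number gives back the direction and stun flag it came from.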

The first World parameter will be the world under which we made the move. The second world will be the resulting world. This parameter only exists as a pass-through, because of how unsafePerformIO works.

Given these parameters, our function is pretty straightforward. We want to record a single line that has the serialized world state values and our final move choice. These will go in a comma separated list. We'll save everything to a file called moves.csv. So let's open that file and get the list of numbers. We'll immediately convert the numbers to strings with show.

unsafeSaveMove :: Int -> World -> World -> World
unsafeSaveMove moveChoice prevWorld nextWorld = unsafePerformIO $ do
  handle <- openFile "moves.csv" AppendMode
  let numbers = show <$>
        (Vector.toList (vectorizeWorld prevWorld) ++
          [fromIntegral moveChoice])
  ...

Now that our values are all strings, we can get them in a comma separated format with intercalate. We'll write this string to the file and close the handle!

unsafeSaveMove :: Int -> World -> World -> World
unsafeSaveMove moveChoice prevWorld nextWorld = unsafePerformIO $ do
  handle <- openFile "moves.csv" AppendMode
  -- Serialized world features followed by the chosen move, all as strings
  let numbers = show <$>
        (Vector.toList (vectorizeWorld prevWorld) ++
          [fromIntegral moveChoice])
  let csvString = intercalate "," numbers
  hPutStrLn handle csvString
  hClose handle
  -- Pass the resulting world through unchanged
  return nextWorld
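The string-building part of this function is pure, so it can be pulled out and checked on its own. Below is a minimal sketch, assuming the world features arrive as a list of Float values. The name csvLine is hypothetical, and the real code works on the vector returned by vectorizeWorld.

```haskell
import Data.List (intercalate)

-- Render the feature values plus the chosen move as one CSV row.
-- The move is converted with fromIntegral so that every column
-- has the same numeric type before calling show.
csvLine :: [Float] -> Int -> String
csvLine features moveChoice =
  intercalate "," (show <$> (features ++ [fromIntegral moveChoice]))
```

Keeping the formatting separate like this leaves only the file append inside unsafePerformIO.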

Now let's figure out how to call this function!

Saving Human Moves

Saving the moves we make as a human is pretty easy. All we need to do is hook into the inputHandler. Recall this section, that receives moves from arrow keys and makes our move:

inputHandler :: Event -> World -> World
inputHandler event w
  ...
  | otherwise = case event of
      (EventKey (SpecialKey KeyUp) Down (Modifiers _ _ Down) _) ->
        drillLocation upBoundary breakUpWall breakDownWall w
      (EventKey (SpecialKey KeyUp) Down _ _) ->
        updatePlayerMove upBoundary
      (EventKey (SpecialKey KeyDown) Down (Modifiers _ _ Down) _) ->
        drillLocation downBoundary breakDownWall breakUpWall w
      (EventKey (SpecialKey KeyDown) Down _ _) ->
        updatePlayerMove downBoundary
      (EventKey (SpecialKey KeyRight) Down (Modifiers _ _ Down) _) ->
        drillLocation rightBoundary breakRightWall breakLeftWall w
      (EventKey (SpecialKey KeyRight) Down _ _) ->
        updatePlayerMove rightBoundary
      (EventKey (SpecialKey KeyLeft) Down (Modifiers _ _ Down) _) ->
        drillLocation leftBoundary breakLeftWall breakRightWall w
      (EventKey (SpecialKey KeyLeft) Down _ _) ->
        updatePlayerMove leftBoundary
      (EventKey (SpecialKey KeySpace) Down _ _) ->
        if playerCurrentStunDelay currentPlayer /= 0
          then w
          else w
            { worldPlayer =
                activatePlayerStun currentPlayer playerParams
            , worldEnemies = stunEnemyIfClose <$> worldEnemies w
            , stunCells = stunAffectedCells
            }
  …

All these lines return World objects! So we just need to wrap them as the final argument to unsafeSaveMove. Then we add the appropriate move choice number. The strange part is that we cannot move AND stun at the same time when playing as a human. So using the stun will always be 9, which means stunning while standing still. Here are the updates:

inputHandler :: Event -> World -> World
inputHandler event w
  ...
  | otherwise = case event of
      (EventKey (SpecialKey KeyUp) Down (Modifiers _ _ Down) _) -> 
        unsafeSaveMove 0 w $
          drillLocation upBoundary breakUpWall breakDownWall w
      (EventKey (SpecialKey KeyUp) Down _ _) ->
        unsafeSaveMove 0 w $ updatePlayerMove upBoundary
      (EventKey (SpecialKey KeyDown) Down (Modifiers _ _ Down) _) -> 
        unsafeSaveMove 2 w $
          drillLocation downBoundary breakDownWall breakUpWall w
      (EventKey (SpecialKey KeyDown) Down _ _) ->
        unsafeSaveMove 2 w $ updatePlayerMove downBoundary
      (EventKey (SpecialKey KeyRight) Down (Modifiers _ _ Down) _) -> 
        unsafeSaveMove 1 w $
          drillLocation rightBoundary breakRightWall breakLeftWall w
      (EventKey (SpecialKey KeyRight) Down _ _) ->
        unsafeSaveMove 1 w $ updatePlayerMove rightBoundary
      (EventKey (SpecialKey KeyLeft) Down (Modifiers _ _ Down) _) ->
        unsafeSaveMove 3 w $
          drillLocation leftBoundary breakLeftWall breakRightWall w
      (EventKey (SpecialKey KeyLeft) Down _ _) ->
        unsafeSaveMove 3 w $ updatePlayerMove leftBoundary
      (EventKey (SpecialKey KeySpace) Down _ _) ->
        if playerCurrentStunDelay currentPlayer /= 0
          then w
          else unsafeSaveMove 9 w $ w
            { worldPlayer =
                activatePlayerStun currentPlayer playerParams
            , worldEnemies = stunEnemyIfClose <$> worldEnemies w
            , stunCells = stunAffectedCells
            }
  …

And now whenever we play the game, it will save our moves! Keep in mind though, it takes a lot of training data to get good results when using supervised learning. I played for an hour and got around 10000 data points. We'll see if this is enough!

Saving AI Moves

While the game is at least a little fun, it's also exhausting to keep playing it to generate data! So now let's consider how we can get the AI to play the game itself and generate data. The first step is to reset the game automatically on winning or losing:

updateFunc :: Float -> World -> World
updateFunc _ w
  | (worldResult w == GameWon || worldResult w == GameLost) &&
       (usePlayerAI params) =
    ...

The rest will follow the other logic we have for resetting the game. Now we must examine where to insert our call to unsafeSaveMove. The answer is our updateWorldForPlayerMove function. We can see that we get the move (and our player's cached memory) as part of makePlayerMove:

updateWorldForPlayerMove :: World -> World
updateWorldForPlayerMove w = …
  where
    (move, memory) = makePlayerMove w
    ...

We'll want a quick function to convert our move into the number choice:

moveNumber :: PlayerMove -> Int
moveNumber (PlayerMove md useStun dd) =
  let directionFactor = case (md, dd) of
        (DirectionUp, _) -> 0
        (_, DirectionUp) -> 0
        (DirectionRight, _) -> 1
        (_, DirectionRight) -> 1
        (DirectionDown, _) -> 2
        (_, DirectionDown) -> 2
        (DirectionLeft, _) -> 3
        (_, DirectionLeft) -> 3
        _ -> 4
  in  if useStun then directionFactor + 5 else directionFactor

Our saving function requires a pass-through world parameter. So we'll do the saving on our first new World calculation. This comes from modifyWorldForPlayerDrill:

updateWorldForPlayerMove :: World -> World
updateWorldForPlayerMove w = …
  where
    (move, memory) = makePlayerMove w

    worldAfterDrill = unsafeSaveMove (moveNumber move) w
     (modifyWorldForPlayerDrill …)
    ...

And that's all! Now our AI will play the game by itself, gathering data for hours on end if we like! We'll get some different data for different cases, such as 4 enemies 4 drills, 8 enemies 5 drills, and so on. This is much faster and easier than playing the game ourselves! It will automatically get 12-15 thousand data points an hour if we let it!

Conclusion

With a little bit of persistence, we can now get a lot of data for the decisions a smarter agent will make. Next week, we'll take the data we've acquired and use it to write a supervised learning algorithm! Instead of using Q-learning, we'll make the weights reflect the decisions that we (or the AI) would make.

Supervised learning is not without its pitfalls! It won't necessarily perform optimally. It will perform like the training data. So even if we're successful, our algorithm will replicate our own mistakes! It'll be interesting to see how this plays out, so stay tuned!

For more information on using Haskell in AI, take a look at our Haskell AI Series. Plus, download our Haskell Tensor Flow Guide to learn more about using this library!

by James Bowen at September 30, 2019 02:30 PM

September 29, 2019

Joey Hess

watch me program for half an hour

In this screencast, I implement a new feature in git-annex. I spend around 10 minutes writing haskell code, 10 minutes staring at type errors, and 10 minutes writing documentation. A normal coding session for me. I give a play-by-play, and some thoughts of what programming is like for me these days.

git-annex coding in haskell.ogg (38 MB) | on vimeo

Not shown is the hour I spent the next day changing the "optimize" subcommand implemented here into "--auto" options that can be passed to git-annex's get and drop commands.

watched it all, liked it (59%)


watched some, boring (9%)


too long for me (4%)


too haskell for me (15%)


not interested (13%)


Total votes: 100

September 29, 2019 07:26 AM

September 26, 2019

Oleg Grenrus

Do you have a problem? Write a compiler!

Posted on 2019-09-26 by Oleg Grenrus

These are notes of the talk I gave at ClojuTre 2019. There is a video recording of the talk, and the slide deck as a PDF.

This work is licensed under a “CC BY SA 4.0” license.

Hello ClojuTre. I'm Oleg from Helsinki.


Imagine you are writing a cool new rogue-like game. So cool many have no idea what's going on. A definitive character of rogue-likes is procedural generation of content. You'll need a random number generator for that.


SplitMix is a fast, splittable pseudorandom number generator. Being able to split a generator into two independent generators is a property you'll want if you use the generator in a functional programming language. Look it up. We'll concentrate on the being fast part. Obviously you want things to be fast.

SplitMix is fast. It does only 9 operations per generated number: one addition to advance the random seed, then xor, shift, multiply, xor, shift, multiply, xor and shift. 9 in total.
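To make the operation count concrete, here is a sketch of a 32-bit generator step in Haskell. The shift amounts, the second multiplier and the increment are assumptions (they are the familiar MurmurHash3 finalizer constants and a golden-ratio increment), and only 0x85ebca6b appears later in these notes. The names gamma, mix and next are hypothetical.

```haskell
import Data.Bits (shiftR, xor)
import Data.Word (Word32)

-- Assumed odd increment used to advance the seed.
gamma :: Word32
gamma = 0x9e3779b9

-- Mixing function: xor, shift, multiply, twice, then a final xor and shift.
-- Word32 arithmetic wraps around, so each product keeps its low 32 bits.
mix :: Word32 -> Word32
mix z0 = z2 `xor` (z2 `shiftR` 16)                  -- xor, shift
  where
    z1 = (z0 `xor` (z0 `shiftR` 16)) * 0x85ebca6b   -- xor, shift, multiply
    z2 = (z1 `xor` (z1 `shiftR` 13)) * 0xc2b2ae35   -- xor, shift, multiply

-- One step: a single addition advances the seed, then the new seed is mixed.
-- Returns the generated number together with the next seed.
next :: Word32 -> (Word32, Word32)
next seed = (mix seed', seed')
  where
    seed' = seed + gamma
```

Counting the operators gives one addition plus eight mixing operations, matching the nine mentioned above.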

However, for the maximum reach, we want our game to run in the web browsers.


  • JavaScript is a great platform
  • But a terrible programming language

Look carefully. When we multiply two "big" odd numbers, we get an even result. We shouldn't: the product of two odd numbers is odd.
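The same effect can be reproduced outside the browser, because JavaScript numbers are IEEE double-precision floats. The sketch below multiplies two odd 32-bit constants, once exactly and once as Double. The choice of constants and all names are assumptions for illustration.

```haskell
-- Two "big" odd numbers that each fit in 32 bits.
a, b :: Integer
a = 0x85ebca6b
b = 0xc2b2ae35

-- Exact product using arbitrary-precision integers.
exactProduct :: Integer
exactProduct = a * b

-- The same product computed in double precision, as JavaScript would.
-- The true result needs more than 53 bits, so the low bits are rounded away.
doubleProduct :: Integer
doubleProduct = truncate (fromIntegral a * fromIntegral b :: Double)
```

The exact product is odd, while the double-precision one comes out even, because doubles of that magnitude are spaced far more than one apart.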

JavaScript numbers are a mess. Next a bit of arithmetics.


Instead of multiplying two big numbers, we split them into high and low parts and multiply many small numbers. Like in elementary school. Then we'll have enough precision.


Since we are multiplying 32-bit numbers and are interested only in the lower 32 bits of the 64-bit result, we need only three multiplications.

In the end, we get correct results, i.e. good random numbers, even in JavaScript.
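
The arithmetic can be sketched as follows (in Haskell rather than the talk's Clojure; `mulLo32` is a name chosen for illustration). Writing a = u·2^16 + v and b = x·2^16 + y, the u·x term is multiplied by 2^32 and disappears from the low 32 bits, which leaves three products of 16-bit halves.

```haskell
import Data.Bits (shiftL, shiftR, (.&.))
import Data.Word (Word32)

-- | Low 32 bits of a 32x32-bit product, using only products of 16-bit halves.
mulLo32 :: Word32 -> Word32 -> Word32
mulLo32 a b = (cross `shiftL` 16) + lo
  where
    u = a `shiftR` 16                    -- high 16 bits of a
    v = a .&. 0xffff                     -- low 16 bits of a
    x = b `shiftR` 16                    -- high 16 bits of b
    y = b .&. 0xffff                     -- low 16 bits of b
    lo    = v * y                        -- first multiplication
    cross = (u * y + v * x) .&. 0xffff   -- second and third; only the low
                                         -- 16 bits survive the shift by 16

main :: IO ()
main = do
  let a = 0x85ebca6b
      b = 0xc2b2ae35
  -- Both lines should print the same number.
  print (mulLo32 a b)
  print (a * b)
```

Each partial product is below 2^32, so it would also be exact in a 53-bit double mantissa, which is the point of the trick in JavaScript.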


Next, if we replace the multiplication in xor-shift-multiply with a macro doing the right thing, actually change everything to be a macro, and expand, we'll get...


a lot of code which barely fits on the slide.

I had to shorten bit-shift-left to bsl and unsigned-bit-shift-right to ubsr.


Look closely. Would you write this kind of code?


We bind ("assign") a constant value to the uv variable.


And then calculate the high and low 16-bit parts of it. Something we could write directly: let [u 0x85eb v 0xca6b], i.e. optimize by hand. Should we do so?


No. Let's rather write an (optimizing) compiler. We don't have time to optimize by hand.


Recall, we have a very specific problem. We work with a very, very tiny subset of Clojure: some simple arithmetic and bit mangling of 32-bit numbers.


We'd like to add a little magic, which would make the program magically run faster. In my toy micro-benchmarks the speed-up is 10 percent. It's definitely worth it.


What is this magic?


A little cute macro. Of course.


We get a form and convert it into an internal representation on the way in, and back to Clojure on the way out.

Working with the whole Clojure syntax directly is an insane task. We want to deal only with a small, sane subset of it, and in a format that is easier to manipulate. Clojure, as a pragmatic language, is still optimized for writing and reading, not so much for machine manipulation.


The next step is to expand a multiplication using the trick we have seen previously. We could have done it already in the from-clojure step, but it's good engineering to have only one thing per step.


And the important part is the optimise function.


The internal representation I used is nested vectors with a node type as the first element. A uniform representation made it way easier to do everything else. Note how the literals 1 and 3 are wrapped in a vector too. We can simply look at the head of a vector to know what we are dealing with.

Code is Data.


A difficult part of Code is Data is local variables. I chose to use de Bruijn indices, so instead of names (x and y, or a and b) there are numbers counting towards the corresponding let (which binds only one variable at a time, by the way).

de Bruijn indices are tricky to grok. Look at the colors, they are there to help. The blue :var 1 references "one away" let.

I don't expect you to understand them. It's one way to represent bindings. Ideally there would be a library, so you don't need to think about low-level details.
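
For readers who prefer types to colors, here is a hypothetical Haskell analogue of such a representation (the talk itself uses tagged Clojure vectors; the constructor names below are made up for this sketch). An index says how many enclosing lets to skip, so index 0 is the nearest let and index 1 is "one away".

```haskell
import Data.Word (Word32)

-- | A tiny expression language with nameless variables.
data Expr
  = Lit Word32        -- literal
  | Var Int           -- de Bruijn index: how many lets to skip outwards
  | Let Expr Expr     -- binds exactly one (nameless) variable in the body
  | Add Expr Expr
  deriving (Eq, Show)

-- | (let [x 1] (let [y 2] (+ x y))): y is bound by the nearest let
-- (index 0), x by the let one further out (index 1).
example :: Expr
example = Let (Lit 1) (Let (Lit 2) (Add (Var 1) (Var 0)))

-- | Evaluate with an environment; the innermost binding is at the head.
eval :: [Word32] -> Expr -> Word32
eval _   (Lit n)   = n
eval env (Var i)   = env !! i
eval env (Let e b) = eval (eval env e : env) b
eval env (Add a b) = eval env a + eval env b

main :: IO ()
main = print (eval [] example)
```

The evaluator shows why the indices are convenient for a machine: looking up a variable is just indexing into the list of enclosing bindings, with no names to compare or rename.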


Once we get rid of names, we can still keep them around using metadata. That's a really cool feature in Clojure, I have to admit. The metadata is there, but not in your way. And having names around is actually useful when you try to debug things.


Now we have the setup done. Let's jump into optimizations.


Recall our code snippet. There's uv, which is bound to a constant. And then it's used in an expression which could be simplified if we do this and that...


Let's keep it super simple.

  • We can have a small set of simple local rewrite rules. Local meaning we don't need to look around, only at one subexpression at a time.
  • Then we try to match the rules everywhere, and if one matches, perform the rewrite.
  • And loop until there's nothing to do.

The first optimization is inlining. In a sense it's the most powerful one, as it creates opportunities for other optimizations to fire, even though on its own it doesn't do much.

So if we have a let-binding, then in some cases we perform a substitution: replace all xs inside the body with the bound expression.

(let [x 1 y 2] (+ x y)) becomes the much simpler (+ 1 2). But nothing more, just that.


I need to point out that optimizing is somewhat of an art. Sometimes it works, sometimes it doesn't.

For example, we don't want to duplicate an expensive (fibonacci 100) expression. We want to evaluate it once and share the result.

On the other hand, if someone already went and computed the value, then we can push it to the leaves of an expression tree.

Heuristics are tricky.

Luckily for our needs simple heuristics work well.


When is inlining a valid rewrite?


Not in every language. Consider this not-so-functional example.

If we substitute everything, (do-foo) and (do-bar) would run in a different order. And (do-quux) would be gone completely; hopefully it didn't do anything important!

We need a language where there are no side effects and where execution order doesn't make much of a difference. Clojure as a whole isn't such a language. Our tiny subset is.

I have heard there are programming languages which behave like our small one, but are more general purpose!

OK. Let's move to the next optimization.


When we have something as simple as (+ 1 2), let us evaluate it already at compile time.

For every primitive operation, if the arguments are known constants, just do it.


Now, I ask you when constant folding is a valid rewrite.


Well, that was a trick question. It really depends on the primitives, whether it makes sense to perform them at compile time. (Even pure languages have primitives to print stuff on a screen.)

But you could think about that precalculate example. When does it make sense to perform the calculation (assuming we somehow know it terminates):

  • At compile time?
  • At start up?
  • At first access?

It really depends, and there is no single simple answer.

Again, optimization is an art.


Because we work with a nice internal representation, writing individual optimizations is so nice.

  • if a node is not one of special nodes
  • and all node arguments are constants
  • evaluate it.

The code is shorter than my explanation. And still understandable.
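
The whole scheme (local rules, tried everywhere, looped until nothing changes) can be sketched in Haskell as below. This is a hypothetical analogue, not the talk's Clojure code: the data type, the rule set, and the heuristic of inlining only literals are assumptions made to keep the example small.

```haskell
import Data.Maybe (fromMaybe)
import Data.Word (Word32)

data Expr
  = Lit Word32
  | Var Int            -- de Bruijn index
  | Let Expr Expr
  | Add Expr Expr
  | Mul Expr Expr
  deriving (Eq, Show)

-- | Replace variable k by a literal and lower the indices that pointed
-- past the let being removed.
substLit :: Int -> Word32 -> Expr -> Expr
substLit k n expr = case expr of
  Lit m   -> Lit m
  Var i
    | i == k    -> Lit n
    | i > k     -> Var (i - 1)
    | otherwise -> Var i
  Let e b -> Let (go k e) (go (k + 1) b)
  Add a b -> Add (go k a) (go k b)
  Mul a b -> Mul (go k a) (go k b)
  where
    go j = substLit j n

-- | Local rewrite rules: each one inspects a single node.
rule :: Expr -> Maybe Expr
rule (Let (Lit n) body)    = Just (substLit 0 n body)  -- inlining (literals only)
rule (Add (Lit a) (Lit b)) = Just (Lit (a + b))        -- constant folding
rule (Mul (Lit a) (Lit b)) = Just (Lit (a * b))        -- constant folding
rule _                     = Nothing

-- | One bottom-up sweep: rewrite the children, then try the rules on the node.
pass :: Expr -> Expr
pass expr = fromMaybe expr' (rule expr')
  where
    expr' = case expr of
      Let e b -> Let (pass e) (pass b)
      Add a b -> Add (pass a) (pass b)
      Mul a b -> Mul (pass a) (pass b)
      leaf    -> leaf

-- | Loop until there is nothing left to do.
optimise :: Expr -> Expr
optimise e
  | e' == e   = e
  | otherwise = optimise e'
  where
    e' = pass e

main :: IO ()
main = print (optimise (Let (Lit 1) (Let (Lit 2) (Add (Var 1) (Var 0)))))
```

On the example, the first sweep inlines both literals and the second one folds the addition, which is the sense in which inlining "creates opportunities" for constant folding. The loop terminates here because every rule removes at least one node.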


With these two optimizations, inlining and constant folding, we get from this big (and repetitive) code blob to...


Something which actually fits on the slide without font size scaling. It's not super-pretty, but it's hard to spot if something can be done there.


We can make it look nice with one more optimization. When a let-expression is bound to a variable in an outer let-expression, we can float out the inner one.

A very old idea it is.


And then we get (in my opinion) very nice, direct code. Six intermediate results to get the final one.


That's what I wanted to tell you.


  • Implementing small (domain specific) languages is fun.
  • If you approach problems with a "let's write a programming language to describe them" attitude, there are a lot of big hammers at your disposal. A lot of wheels have already been invented.
  • Languages don't need to be only about numerics; they could be about HTTP routing, authorisation rules, UI workflows, CI scripts (I could bash about bash), data descriptions, you name it.
  • But make your languages typed, lazy, pure, total, or even dependent for extra fun and interesting new problems. ;)

Thank you.

September 26, 2019 12:00 AM

September 25, 2019

Tweag I/O

Bazel's Persistent Worker Mode for GHC: An Industrial Internship

Artem Pelenitsyn

I got the opportunity to work on Bazel's Persistent Worker Mode for Haskell GHC during my internship at Tweag. Let's begin with some context. The rules_haskell project adds support for Haskell components in software based on the Bazel build system. By default, compiling an individual Haskell target triggers a separate sandboxed GHC invocation. This approach is not optimal for two reasons: the recurring cost of compiler startups and the potential loss in incremental builds. Bazel has a special mode of communication with a compiler to resolve the issue. My internship goal was to improve the method of communication between Bazel and the Haskell GHC compiler by adding support for this persistent worker mode in rules_haskell. Let's explore what I learned and what I was able to accomplish.

Call for Persistent Compilers

Consider the following example of a C++ application build script in Bazel.

cc_library(
    name = "Lib",
    srcs = ["A.cpp", "B.cpp"],
)
cc_binary(
    name = "Bin",
    srcs = ["Main.cpp"],
    deps = [":Lib"],
)

Bazel's built-in cc_library and cc_binary are rules to describe C++ build targets. We have two targets in this application, called Lib and Bin. The Lib library target depends on two source files, A.cpp and B.cpp; the Bin binary target depends on Lib and the Main.cpp source file.

Bazel controls the order in which targets are built, and this order does not depend on the programming language in question. How a target is built does, hence the names like cc_binary, haskell_binary, etc. For instance, each Haskell target is built with just a pair of calls to a compiler (one for compiling and one for linking). In the case of C++, however, every file is compiled by a separate call to a compiler. On the one hand, the one-call-per-file strategy wastes time on repetitive compiler startups, which may form a significant cost in languages like Scala or Haskell but is not a big deal for C++. On the other hand, this strategy creates an additional opportunity for improved incremental building. Let's consider each of these two observations in more detail.

Startup Times Saved

The opportunity to save on startup times was pointed out in Persistent Worker Processes for Bazel, the original Bazel Blog post on the topic. The Bazel team demonstrated the significant benefits of using a persistent compiler process for Java, where startups are expensive and JIT needs runtime stats to do its job effectively. Other JVM languages, e.g., Scala and Kotlin, followed this path.

Today, many of the main Bazel-enabled languages hosted under bazelbuild have persistent workers, but not all of them benefit from warm startup and caching JIT data as much as the JVM-based languages do. Luckily, there is another way to improve performance with a persistent compiler process, namely, reusing auxiliary build artifacts that did not change since the last build—incremental builds.

Chasing Incremental Builds

Fast, correct incremental builds are such a fundamental Bazel goal they're the first thing mentioned in Why Bazel? at the top of Bazel's homepage. To fulfill this promise, though, Bazel needs sufficient knowledge about dependencies between the build artifacts. Let's get back to our example to explain this better.

After the Bin target has been built once, any change to Main.cpp would require recompiling Main.cpp and relinking Bin, but does not require rebuilding Lib. This is the inter-target incrementality supported by Bazel universally: no specific knowledge about the programming language is needed, and the logic fully translates to rules_haskell and its haskell_library/haskell_binary rules.

The difference comes when after a full build you make a change in, e.g., A.cpp. As we know, .cpp files are compiled separately, and the knowledge is encoded in the cc_library; therefore, Bazel would only recompile A.cpp, but not B.cpp. In contrast, the recompilation strategy for Haskell is rather subtle: whether you need to recompile the module B in similar circumstances roughly depends on whether there is an import of A inside B. This goes beyond the knowledge of haskell_library, and the rule will simply recompile all the modules in the Lib target. The bottom line is: due to differences in the nature of the languages, Bazel supports sub-target incrementality for C++ components but not for Haskell components.

Is it possible to get better incremental builds for Haskell components? Almost certainly, yes. In fact, this was one of the driving powers of the project. The persistent worker mode opens an opportunity for, first, the sub-target dependency analysis using GHC API and, second, caching of auxiliary build artifacts (e.g., .hi and .o files) to save work during rebuilds.

Unfortunately, it's hard to get the implementation of incremental builds right, and only a few Bazel-aware languages support sub-target incremental builds (e.g., rules_swift and one of several rules_scala forks that employ the Zinc compiler). I did not get to implementing incremental builds in my project. I spent most of my time finding the right way to integrate the worker mode into rules_haskell.

Worker Mechanics

First Step: Bazel Interface

When Bazel encounters a compile-like command for the first time, it spawns a persistent process. This child process gets its stdin/stdout redirected to talk directly to Bazel. The process listens on its stdin for a compilation request and upon processing one, sends back a response. Bazel speaks to its workers in a simple Protobuf-based protocol: a request is a list of filepaths with the corresponding file hashes; a response is an exit code and a string with other textual output the worker decides to report (e.g., compiler's warning messages).

All in all, this scheme looks straightforward except an IPC solution based on stdin/stdout complicates debugging by an order of magnitude. For example, sometimes GHC sends diagnostic messages to stdout instead of stderr, and sometimes you cannot mute such messages (I solved one particularly annoying instance of the problem during this work). One might hope redirecting stdout helps, but some standard tricks may fail for all sorts of reasons; e.g., concurrency employed by Bazel bit me when running rules_haskell's test suite under the worker strategy.

Second Step: Protobuf for Haskell

Several libraries implement support for Protobuf in the Haskell ecosystem. We chose proto-lens which allows us to generate Haskell definitions from a .proto description and conveniently access data with lenses.

One obstacle with proto-lens was that they silently (and unconsciously, it seems) dropped support for parsing messages from an unbounded stream. That means once you have a handle to read in a message from, you have to specify the size of the bytestring you're going to read before the parser can get its hands on it. The length of a message is variable and encoded as a variable-length integer sent in front of every message. The proto-lens library had internal machinery to read varints but lacked a reasonable interface to employ it when receiving messages. I fixed this.
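
To illustrate the framing, here is a minimal sketch of reading one length-prefixed message from a stream of bytes. It uses plain lists of bytes and only `base`; it is not the proto-lens API or the code of the fix, and the names `decodeVarint` and `readFrame` are made up for this example.

```haskell
import Data.Bits (shiftL, testBit, (.&.), (.|.))
import Data.Word (Word64, Word8)

-- | Decode a base-128 varint: 7 payload bits per byte, least significant
-- group first; a set high bit means "more bytes follow".
decodeVarint :: [Word8] -> Maybe (Word64, [Word8])
decodeVarint = go 0 0
  where
    go _ _ [] = Nothing                        -- input ended mid-number
    go shift acc (b : rest)
      | shift > 63  = Nothing                  -- too long for 64 bits
      | testBit b 7 = go (shift + 7) acc' rest -- continuation bit set
      | otherwise   = Just (acc', rest)
      where
        acc' = acc .|. (fromIntegral (b .&. 0x7f) `shiftL` shift)

-- | Split one length-prefixed message off the front of a byte stream,
-- returning the message body and the remaining bytes.
readFrame :: [Word8] -> Maybe ([Word8], [Word8])
readFrame bytes = do
  (len, rest) <- decodeVarint bytes
  let n = fromIntegral len
  if length (take n rest) < n
    then Nothing                               -- body not fully available yet
    else Just (splitAt n rest)

main :: IO ()
main = do
  print (decodeVarint [0xAC, 0x02])
  print (readFrame [0x02, 0x68, 0x69, 0xff])
```

The key point is that the reader learns the body length only after consuming the varint, so a parser that wants the whole bytestring up front cannot be applied to the stream directly.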

Third Step: GHC API

The worker application is a simple single-threaded server creating a fresh GHC session for every request. One issue I hit when applying the GHC API to the rules_haskell use case is that we use separate GHC calls to compile and then link. The Hello-World example for using the GHC API in the GHC User Guide does not cover the latter use case (where you should run GHC in the "one-shot" instead of the --make mode), and I ended up copying some parts of the GHC driver to support this use case, since GHC doesn't export its driver module, unfortunately.

Integration and Tests

Since version 0.27 (June 2019), Bazel picks the worker strategy as the default one if it is available for an action at all. Finding a convenient way to override the default was not straightforward; I had to rework the solution several times through both of my PRs to rules_haskell.

The final version of the interface to activate the worker mode consists (as now described in the rules_haskell docs) of a couple of actions: one has to, first, load the worker's dependencies in the WORKSPACE file, and, second, pass a command-line argument when starting the build:

bazel build my_target --define use_worker=True

The --define syntax is heavyweight due to Bazel's initial reluctance to provide user-defined command-line arguments. Recently, Bazel added special support for this feature but, as I discovered, the implementation of the feature has issues in realistic applications going beyond Hello-World.

The worker mode passed the whole test suite without a glitch on the first try; this is impressive given that the test suite contains tricky examples, e.g., with GHCi, C dependencies, and even GHC plugins. There's only one gotcha to take into account: the worker is not sandboxed by default. In some cases, GHC prefers rebuilding a perfectly valid target when it has access to the target's source. This will fail if GHC is not provided with sufficient dependencies. There were about 4 test cases out of 96 that failed due to this. The solution is to always use sandboxing by passing --worker_sandboxing on the command line.

We did not get to rigorous performance measurements, but there are some promising observations even for the current implementation lacking sub-target incrementality. First of all, I assembled a sample ten-module project where each module held one function calling the function from the previous module. Every module turned into a separate target in the BUILD script, forming a deep target dependency tree. For this setup, I observed 10–15% speedup for the worker-enabled version of rules_haskell (excluding the time for building the worker). On the other hand, running the rules_haskell test suite did not show significant improvements on the worker-enabled version (2–3% speedup). I attribute this difference to two features of the test suite: first, the suite holds a fair amount of non-Haskell code, which dims the worker effect on build time; second, the suite represents a very shallow dependency graph, unlike in the first experiment with ten modules. Overall, there is hope for a speedup in projects with deep dependency graphs.

GHC Persistent Worker in Context

There are many efforts underway to make GHC friendlier in various client-server kind of scenarios including IDEs (e.g., HIE and now hie-bios in ghcide), interactive sessions (e.g., this issue and this PR), and, finally, build systems (e.g., the recent Extended Dependency Generation proposal by David Eichmann and his work on cloud builds for the Shake-based GHC build system Hadrian). Indeed, we can now see some level of the convergence foreseen by Edward Z. Yang among compilers, build systems, and IDEs happening in the Haskellverse today.

What about incremental builds? David's findings suggest: GHC's ability to communicate detailed source file dependencies allows for fine-grained control over the build process. Under his proposal, the build system might get to decide when to call GHC and only ever call it in the "one-shot" mode. This decision logic could hardly fit in rules_haskell main code but seems perfectly relevant to the worker implementation.

Although I did not get to incremental builds, I did some experiments with warm GHC startups. None of those ended up in the current implementation since I think there is room for improvement here. I believe one of the possible ways to improve GHC session startup times is caching package data for packages loaded from the global package database. To me, this loading stage looks like the most expensive action during GHC session initialization. Sadly, I found there were not enough utilities exported from GHC's Packages.hs to tune this behavior.

Acknowledgments

I'm grateful to Tweag for giving me an exciting opportunity to work in an industrial setting; to Mathieu Boespflug for suggesting a project that not only kindled my Haskell passion but also pushed me outside my comfort zone to learn a new thing (Bazel); to Andreas Herrmann, my mentor in this project, for providing endless insightful help and feedback; to all other Tweagers, who either directly helped me or brought me excitement by demonstrating their exceptional engineering skills as well as creativity.

Annotated References

Here's a summary of my contributions and some pointers to possible future directions for improvement of the persistent worker mode in rules_haskell.

  1. Worker pull requests to rules_haskell: [1], [2]. The second one adds the worker sources and reworks a good part of the first because the initial strategy to implement the switch between the regular and the worker modes forced the user to download worker dependencies anyway. The current strategy based on config_setting/select does not have the flaw. It can be improved when the Bazel Custom keys issue is resolved.

  2. The initial worker repository. Unlike its replica inside rules_haskell, which just holds static Protobuf descriptions in Haskell, the repository implements proper generation of those descriptions from a .proto file. Notably, the repository contains the reuse-ghc-session branch, which explores a warm startup of a GHC session. It is blocked because once all package databases are loaded into a session, they cannot be easily unloaded with just the utilities exported from GHC's Packages.hs.

  3. GHC's hDuplicate issue, which makes it harder to design protocols around stdin/stdout if you want to intercept certain writes to stdout (or reads from stdin).

  4. My PR to proto-lens fixing the issue with no support for streaming reads.

  5. My PR to GHC that allows muting the Loading package environment message with -v0; it required more refactoring than one might imagine.

  6. David Eichmann's Extended Dependency Generation (EDG) GHC proposal suggesting that no build system or IDE should ever call GHC in the --make mode. Instead, GHC should be able to dump all the necessary information about dependencies into a machine-readable format. Once you have that file, with dependencies recorded, you only need GHC to compile individual files, the "one-shot" mode, and never the normal --make mode. This approach would liberate us from certain shortcomings of the --make mode like timestamp-based recompilation checking.

September 25, 2019 12:00 AM

September 24, 2019

Well-Typed.Com

Eventful GHC

What can we do when it takes GHC a lot of time to compile a given module? Where is it spending its time? Where can we start to get a 10,000-foot view of what GHC is doing? This blog post covers one possible answer, using the eventlog mechanism.

(This post has also been published on the GHC blog.)

Eventlog?

GHC has a mechanism that allows us to record all sorts of “events” to an eventlog (a simple binary file), during a program’s execution, attaching a timestamp to it to later allow tools to reconstruct as much of a program’s execution as the events allow. This includes RTS events (garbage collection, HEC/thread activity) but also user defined events, where “user” designates the author of a Haskell library or program; Debug.Trace in base provides functions that anyone can use to emit events, in addition to the ones that the RTS itself will emit.

Those functions are implemented in terms of primitive operations that are backed by GHC’s runtime system. Its design and implementation is discussed, along with other topics, in Parallel Performance Tuning for Haskell by Don Jones Jr., Simon Marlow and Satnam Singh.

After the program’s execution, one can then use libraries like ghc-events or tools like ghc-events-analyze, or threadscope to consume the eventlog in order to gain some insights into where time was spent.

While profiling lets us gather more detailed information about where time is spent and where allocations are made, it requires rebuilding our program’s code and using a dedicated RTS flavour. The code generated by GHC for a profiled program is quite different, adding lots of instrumentation to support profiling the execution of Haskell code. The extra code generated to support profiling can get in the way of some optimisations and can therefore drastically affect the performance of a program. On the other hand, generating the eventlog for a program only requires re-linking it (against a flavour of the RTS that supports tracing) with -eventlog and running it with +RTS -l. The eventlog mechanism also has much lower impact on runtime performance, since emitting events is merely about putting a few values in a buffer that the RTS will then regularly flush to the eventlog file or whatever the destination is. The aforementioned paper has some precise numbers on the overhead of the eventlog mechanism, but it’s of course quite low.

This can therefore be an interesting solution when you want to get a big picture without having to rebuild your whole program and some of its dependencies. All you have to do is set up some events that cover the fragments of the program’s execution that you’re particularly interested in.

Events are in general emitted using one of the following 4 functions from Debug.Trace:

  • traceEvent :: String -> a -> a
  • traceEventIO :: String -> IO ()
  • traceMarker :: String -> a -> a
  • traceMarkerIO :: String -> IO ()

The traceEvent[IO] functions should be used for all sorts of events, particularly the ones that are likely to happen a lot during the execution of your program, while traceMarker[IO] are generally used to mark certain points or phases in the execution and see that visually in the profile. This is particularly helpful with tools like eventlog2html (see the last section of this post) that allow you to visualize heap profiles, drawing those “marker events” on top so that users can get a sense of when some particular allocations or deallocations take place, with user-supplied labels instead of trying to guess from timestamps.
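
As a small illustration, the following program uses three of those functions. The labels and the `sumTo` function are made up for this sketch; only the Debug.Trace functions come from base.

```haskell
import Control.Exception (evaluate)
import Debug.Trace (traceEvent, traceEventIO, traceMarkerIO)

-- | Pure code can emit an event at the moment its result is demanded.
sumTo :: Int -> Int
sumTo n = traceEvent ("sumTo " ++ show n) (sum [1 .. n])

main :: IO ()
main = do
  traceMarkerIO "phase: compute"          -- marks the start of a phase
  total <- evaluate (sumTo 1000000)       -- force the computation here
  traceEventIO ("total = " ++ show total) -- an ordinary user event
  traceMarkerIO "phase: report"
  print total
```

As described above, the events only end up in a file when the program is linked with -eventlog and run with +RTS -l; otherwise the calls do nothing observable.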

For more about eventlogs in general, see GHC’s users guide or the event-log page on the GHC wiki.

GHC events

Starting with this commit, by Ben Gamari about 3 months ago, GHC started emitting eventlog entries for all calls to withTiming, which is a function we use to measure how long various parts of GHC take. We then started adding a few more withTiming calls:

  • 688a1b89 added tracing around calls to various external tools (C compiler, linker, assembler, etc);
  • 0c5cd771 added tracing around all the individual passes that make up the codegen pipeline (we previously only tracked codegen as a whole, single event);
  • e3cbe319 added tracing around the package database initialization.

And other similar patches later on. As a result, we can trace most of the execution of GHC when compiling a trivial hello world module:

-- hello.hs
main :: IO ()
main = putStrLn "hello, world!"

First, you need to get a somewhat recent checkout of GHC’s master branch - e3cbe319 or newer. Then you need to build that source tree and make sure that the stage 2 GHC is linked with -eventlog:

$ ./boot && ./configure

# Build with Hadrian:
$ hadrian/build.sh -j "stage1.ghc-bin.ghc.link.opts += -eventlog"
# stage 2 ghc at: _build/stage1/bin/ghc

# Build with Make:
$ make -j GhcStage2HcOpts+=-eventlog
# stage 2 ghc at: inplace/bin/ghc-stage2

# If you have a GHC build around already, both of those commands should not
# cause a lot of work to be done: just linking the GHC executable against a
# slightly different RTS flavour! No recompilation needed.

You can then build any module, library or program with the resulting stage 2 executable as you would normally do, except that if you pass +RTS -l with one of -v2 or -ddump-timings, GHC will produce an eventlog at ghc.eventlog with all the standard RTS events, but also events for each pass in GHC’s pipeline. Let’s see this in action by compiling hello.hs from earlier.

# use inplace/bin/ghc-stage2 if you built GHC with Make
$ _build/stage1/bin/ghc -ddump-timings hello.hs -o hello +RTS -l

We now have an eventlog. But what can we do with it? You’ve got several options:

  • The ghc-events library and program, used by all the other options, provide primitives for decoding the eventlog and extracting data from it. The accompanying program provides various commands for querying eventlog files.
  • The threadscope program lets you visualize eventlogs with a GTK+ frontend.
  • ghc-events-analyze produces per-label totals, SVG visualizations and per-label timing data given an eventlog file as input.
  • … and maybe other options that I’m not remembering or aware of at all.

I personally use ghc-events-analyze a lot, since it can produce pretty SVG pictures that I can then embed in GitLab comments or… blog posts. :-)

To get the timings and SVG picture, you need to install ghc-events-analyze, and then ask it to process ghc.eventlog with the right “markers”:

$ ghc-events-analyze --timed --timed-txt --totals \
                     --start "GHC:started:" --stop "GHC:finished:" \
                     ghc.eventlog

# --timed:     produce SVG visualization of the eventlog, in ghc.timed.svg
# --timed-txt: produce per-label groupings of timings that report when the events
#              were emitted, in ghc.timed.txt
# --totals:    produce per-label totals, reporting how much time was spent in
#              a given label, in total
# --start:     all events that we're interested in are wrapped in
# --stop:      GHC:started:... / GHC:finished:... events, so we just tell
#              ghc-events-analyze that it should be looking for those markers
#              and report about those events.

Here are the totals that I get for our hello world program compilation.

GC                                389461503ns   0.389s

USER EVENTS (user events are corrected for GC)
 systool:linker                  2386891920ns   2.387s
 systool:cc                       801347228ns   0.801s
 systool:as                       145128851ns   0.145s
 Renamer/typechecker [Main]        45709853ns   0.046s
 initializing package database     20877412ns   0.021s
 CodeGen [Main]                    20754058ns   0.021s
 CoreTidy [Main]                    8262122ns   0.008s
 NCG                                7566252ns   0.008s
 Chasing dependencies               2441212ns   0.002s
 Cmm pipeline                       2040174ns   0.002s
 Desugar [Main]                     1657607ns   0.002s
 STG -> Cmm                         1103737ns   0.001s
 Simplifier [Main]                  1045768ns   0.001s
 Cmm -> Raw Cmm                      319442ns   0.000s
 Parser [Main]                       286350ns   0.000s
 CorePrep [Main]                      88795ns   0.000s
TOTAL                            3445520781ns   3.446s

THREAD EVENTS
1                                    460138ns   0.000s
IOManager on cap 0:2                 670044ns   0.001s
TimerManager:3                       143699ns   0.000s
4                                 153050783ns   0.153s
weak finalizer thread:5               43518ns   0.000s
weak finalizer thread:6               10677ns   0.000s
weak finalizer thread:7               33126ns   0.000s
weak finalizer thread:8               23787ns   0.000s
weak finalizer thread:9                1534ns   0.000s
weak finalizer thread:10               8142ns   0.000s
weak finalizer thread:11               1352ns   0.000s
weak finalizer thread:12              10080ns   0.000s
weak finalizer thread:13              10603ns   0.000s
weak finalizer thread:14               8767ns   0.000s
weak finalizer thread:15              10849ns   0.000s
16                                    69745ns   0.000s
17                                   307420ns   0.000s
18                                    83017ns   0.000s
19                                    76866ns   0.000s
weak finalizer thread:20              10083ns   0.000s
21                                    96582ns   0.000s
22                                   103373ns   0.000s
23                                    62562ns   0.000s
24                                    97655ns   0.000s
weak finalizer thread:25              11676ns   0.000s
26                                   238116ns   0.000s
27                                   245821ns   0.000s
weak finalizer thread:28               8268ns   0.000s
29                                    80235ns   0.000s
30                                    86759ns   0.000s
31                                    21571ns   0.000s
weak finalizer thread:32                396ns   0.000s
TOTAL                             156087244ns   0.156s

And the SVG image:

(click here for full size picture)

We can see two time windows where we don’t have an “active” label:

  • towards the beginning, where GHC likely parses settings, sets everything up and reads the input file;
  • after code generation and before calling the assembler, where it turns out we spend most of the time getting some information about the C compiler being used.

We otherwise have a pretty good picture of what GHC is doing the rest of the time. It might be nice in the future to add a few more events, e.g. to track when we’re running Template Haskell code.

If you want GHC to only emit so-called “user events” (= non-RTS events) to the eventlog, pass +RTS -l-au instead of +RTS -l. More options can be found in the relevant section of the GHC user manual.

Wrapping up

Of course, if you’re looking into a performance problem, once you narrow down the part of the execution that is taking too long, you will quite likely start using slightly fancier profiling methods; nothing new there. You can, however, emit your own events when working on GHC, so as to collect some simple timing data about the codepaths that you are working on, without affecting GHC’s performance much. You simply have to use compiler/main/ErrUtils.hs:withTiming:

withTiming <action to get dynflags> (text "<label>") <forcing function> $
  <action to time>

where the forcing function is used to force the evaluation of the action’s result just as much as the user wants; the time it takes to perform this evaluation is included in the timings reported by withTiming. Your <label> would then show up in the eventlog reports, if the corresponding code path is entered. See Note [withTiming] for more explanations about the withTiming function and how to use it.
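
To make the shape of such a helper concrete outside of GHC's code base, here is a minimal standalone sketch in the same spirit. It is not GHC's withTiming: the names timed and forceInts and the marker strings are made up for illustration, and it takes no DynFlags. It only relies on traceEventIO, evaluate and getMonotonicTimeNSec from base.

```haskell
import Control.Exception (evaluate)
import Data.Word (Word64)
import Debug.Trace (traceEventIO)
import GHC.Clock (getMonotonicTimeNSec)

-- | Hypothetical analogue of withTiming for ordinary IO code.
-- Emits a user event before and after the action, forces the result
-- with the supplied forcing function, and returns the elapsed
-- nanoseconds (time spent forcing is included).
timed :: String -> (a -> ()) -> IO a -> IO (a, Word64)
timed label force action = do
  traceEventIO (label ++ ": started")   -- marker format is an assumption
  start  <- getMonotonicTimeNSec
  result <- action
  _      <- evaluate (force result)     -- evaluate exactly as much as asked
  end    <- getMonotonicTimeNSec
  traceEventIO (label ++ ": finished")
  pure (result, end - start)

-- | Example forcing function: force a list of Ints by summing it.
forceInts :: [Int] -> ()
forceInts xs = sum xs `seq` ()
```

A call such as timed "squares" forceInts (pure (map (^ 2) [1 .. 1000])) returns the list together with the measured time; the two events only show up in an eventlog when the program is linked and run with eventlog support.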

All the eventlog markers that withTiming emits can then be integrated in heap profiles using eventlog2html, written by Matthew Pickering; see here for an example. If you put your mouse on one of the vertical lines, you will see the label of the phase that begins or ends at that moment.

Before ending this post, I would like to mention that we have some infrastructure to collect timings and allocations for various compilation phases in GHC, over the head.hackage package set. This can be used locally or triggered manually from the CI interface of the GHC repository, and is set up to run nightly against GHC’s master branch. The resulting data can be loaded in PostgreSQL and queried at will. We plan on collecting more data about GHC’s behaviour when compiling all those packages and on relying more on the eventlog than GHC output parsing to do so, with two major goals in mind:

  • have an easy way to track GHC’s performance over time and in particular provide some initial data about where a performance regression might be coming from;
  • expose enough data to be able to identify “real world modules” where GHC doesn’t perform as well as we would like it to, which could in turn lead to an investigation and possibly a patch that improves GHC’s performance on the said module.

Happy profiling!

by alp at September 24, 2019 02:08 PM

September 23, 2019

Monday Morning Haskell

Tweaks, Fixes, and Some Results

In last week's episode of this AI series, we added random exploration to our algorithm. This helped us escape certain "traps" and local minimums in the model that could keep us rooted in bad spots. But it still didn't improve results too much.

This week we'll explore a couple more ways we can fix and improve our algorithm. For the first time, we'll see some positive outcomes. Even so, we'll find our approach still isn't great.

To get started with Tensor Flow and Haskell, download our guide! It's a complex process so you'll want some help! You should also check out our Haskell AI Series to learn more about why Haskell is a good choice as an AI language!

Improvements

To start out, there are a few improvements we can make to how we do q-learning. Let's recall the basic outline of running a world iteration. There are three steps. We get our "new" move from the "input" world. Then we apply that move, and get our "next" move against the "next" world. Then we use the possible reward to create our target actions, and use that to train our model.

runWorldIteration model = do
  (prevWorld, _, _) <- get

  -- Get the next move on the current world (with random chance)
  let inputWorldVector = … -- vectorize prevWorld
  currentMoveWeights <- lift $ lift $
    (iterateWorldStep model) inputWorldVector
  let bestMove = moveFromOutput currentMoveWeights
  let newMove = chooseRandomMoveWithChance …

  -- Get the next world using this move, and produce our next move
  let nextWorld = stepWorld newMove prevWorld
  let nextWorldVector = vectorizeWorld nextWorld
  nextMoveVector <- lift $ lift $
    (iterateWorldStep model) nextWorldVector

  -- Use these to get "target action values" and use them to train!
  let (bestNextMoveIndex, maxScore) =
          (V.maxIndex nextMoveVector, V.maximum nextMoveVector)
  let targetActionData = encodeTensorData (Shape [10, 1]) $
          nextMoveVector V.//
            [(bestNextMoveIndex, newReward + maxScore)]
  lift $ lift $ (trainStep model) nextWorldVector targetActionData

There are a couple issues here. First, we want to substitute based on the first new move, not the later move. We want to learn from the move we are taking now, since we assess its result now. Thus we want to substitute for that index. We'll re-write our randomizer to account for this and return the index it chooses.

Next, when training our model, we should use the original world, instead of the next world. That is, we want inputWorldVector instead of nextWorldVector. Our logic is this. We get our "future" action, which accounts for the game's reward. We want our current action on this world to be more like the future action. Here's what the changes look like:

runWorldIteration model = do
  (prevWorld, _, _) <- get

  -- Get the next move on the current world (with random chance)
  let inputWorldVector = … -- vectorize prevWorld
  currentMoveWeights <- lift $ lift $
    (iterateWorldStep model) inputWorldVector
  let bestMove = moveFromOutput currentMoveWeights
  let (newMove, newMoveIndex) = chooseRandomMoveWithChance …

  -- Get the next world using this move, and produce our next move
  let nextWorld = stepWorld newMove prevWorld
  let nextWorldVector = vectorizeWorld nextWorld
  nextMoveVector <- lift $ lift $
    (iterateWorldStep model) nextWorldVector

  -- Use these to get "target action values" and use them to train!
  let maxScore = V.maximum nextMoveVector
  let targetActionData = encodeTensorData (Shape [10, 1]) $
          nextMoveVector V.//
            [(newMoveIndex, newReward + maxScore)]
  lift $ lift $ (trainStep model) inputWorldVector targetActionData
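
The target construction at the end of this function can be looked at in isolation. The following is only a sketch under assumptions: it uses plain lists instead of Data.Vector and tensors, and the names replaceAt and targetActionValues are hypothetical. It assumes the list of scores is non-empty.

```haskell
-- | Replace the element at index i; the list is unchanged when i is out of range.
replaceAt :: Int -> a -> [a] -> [a]
replaceAt i x xs = [ if j == i then x else y | (j, y) <- zip [0 ..] xs ]

-- | Build the training target: the predicted scores for the next world,
-- except that the entry for the move we actually took is overwritten with
-- the observed reward plus the best predicted future score.
targetActionValues :: Int -> Float -> [Float] -> [Float]
targetActionValues takenMoveIndex reward nextMoveScores =
  replaceAt takenMoveIndex (reward + maximum nextMoveScores) nextMoveScores
```

For example, targetActionValues 1 0.5 [0.25, 0.5, 1.0] overwrites only the entry at index 1, with 0.5 plus the maximum score 1.0.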

Another change we can make is to provide some rewards based on whether the selected move was legal or not. To do this, we'll need to update the stepWorld game API to return this boolean value:

stepWorld :: PlayerMove -> World -> (World, Bool)

Then we can add a small amount (0.01) to our reward value if we get a legal move, and subtract this otherwise.

As a last flourish, we should also add a timeout condition. Our next step will be to test on simple mazes that have no enemies. This means we'll never get eaten, so we need some loss condition if we get stuck. This timeout condition should have the same negative reward as losing.
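
Put together, the legality bonus and the timeout rule could look roughly like the sketch below. The StepOutcome type, the function names and the base reward magnitudes of 1.0 and -1.0 are assumptions for illustration; the text only fixes the 0.01 adjustment and the requirement that a timeout costs the same as losing.

```haskell
-- | Possible outcomes of one step (hypothetical type for this sketch).
data StepOutcome = Won | Lost | TimedOut | Ongoing
  deriving (Eq, Show)

-- | Base reward for an outcome. The magnitudes are assumptions; a timeout
-- is deliberately given the same negative reward as losing.
outcomeReward :: StepOutcome -> Float
outcomeReward Won      = 1.0
outcomeReward Lost     = -1.0
outcomeReward TimedOut = -1.0
outcomeReward Ongoing  = 0.0

-- | Add 0.01 for a legal move and subtract 0.01 for an illegal one.
shapedReward :: Bool -> StepOutcome -> Float
shapedReward moveWasLegal outcome = outcomeReward outcome + legalityBonus
  where
    legalityBonus = if moveWasLegal then 0.01 else -0.01
```

Here the Bool argument is the value that the updated stepWorld returns alongside the new world.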

Results

Now that we've made some improvements, we'll train on a very basic maze that's only 5x5 and has no walls and no enemies. Whereas we used to struggle to even finish this maze, we now achieve the goal a fair amount of the time. One of our training iterations achieved the goal around 2/3 of the time.

However, our bot is still useless against enemies! It loses every time if we try to train from scratch on a map with a single enemy. One attempt to circumvent this is to first train our weights to solve the empty maze. Then we can start with these weights as we attempt to avoid the enemy. That way, we have some pre-existing knowledge, and we don't have to learn everything at once. Still though, it doesn't result in much improvement. Typical runs only succeeded 40-50 times out of 2000 iterations.

Limiting Features

One conclusion we can draw is that we actually have too many features! Our intuition is that a larger feature set would take more iterations to learn. If the features aren't chosen carefully, they'll introduce noise.

So instead of tracking 8 features for each possible direction of movement, let's stick with 3. We'll see if the enemy is on the location, check the distance to the end, and count the number of nearby enemies. When we do this, we get comparable results on the empty maze. But when it comes to avoiding enemies, we do a little better, surviving 150-250 iterations out of 2000. These statistics are all very rough, of course. If we wanted a more thorough analysis, we'd use multiple maze configurations and a lot more runs using the finalized weights.

Conclusions

We can't draw too many conclusions from this yet. Our model is still failing to solve simple versions of our problem. It's quite possible that our model is too simplistic. After all, all we're doing is a simple matrix multiplication on our features. In theory, this should be able to solve the problem, but it may take a lot more iterations. The results stream we see also suggests local minimums are a big problem. Logging information reveals that we often die in the same spot in the maze many times in a row. The negative rewards aren't enough to draw us out, and we are often relying on random moves to find better outcomes.

So next week we're going to start changing our approach. We'll explore a way to introduce supervised learning into our process. This depends on "correct" data. We'll try a couple different ways to get that data. We'll use our own "human" input, as well as the good AI we've written in the past to solve this problem. All we need is a way to record the moves we make! So stay tuned!

by James Bowen at September 23, 2019 02:30 PM

September 20, 2019

Tweag I/O

Probabilistic Programming with monad‑bayes, Part 1: First Steps

Siddharth Bhat, Simeon Carstens, Matthias Meschede

In this blog post series, we're going to lead you through Bayesian modeling in Haskell with the monad-bayes library. We start this series gradually with some simple binary models, move next to linear regression, and finish by building a simple neural network that we "train" with a Metropolis-Hastings sampler. You don't need any prior knowledge of Bayesian modeling to understand and learn from these posts—and we keep the code simple and understandable for Haskell newcomers.

Want to make this post interactive? Try our notebook version. It includes a Nix shell, the required imports, and some helper routines for plotting. Let's start modeling!

Sampling

In this first part of the series, we introduce two fundamental concepts of monad-bayes: sampling and scoring. We examine them based on one of the simplest probabilistic models that we can think of—a model that represents a True or False choice. You can use it, for example, to describe the answer to a question such as "Did it rain yesterday?".

The model is parameterized by a boolean b, and in this simple case, b is also directly the model output. Without additional information, we assign equal probabilities 0.5 to each value that b can take (50% True, 50% False). In other words, we get the model parameter b from a discrete uniform prior distribution.

Let's see what the model looks like in the monad-bayes library:

model1 :: MonadSample m => m Bool
model1 = do
    b <- uniformD [False, True]
    return b

In monad-bayes a model is expressed through the typeclass MonadSample. The MonadSample typeclass provides a function random to our model1 that returns a random sample from it. The type of the sample itself is set to Bool in our case.

We define our model by binding together other basic MonadSample models. In this case, for example, we build our model from the uniformD distribution that is provided by monad-bayes. The model, that is, the chain of actions that are bound together in MonadSample, is not executed until we start sampling from the model. Once we sample, the resulting chain of actions that is executed is simple: draw b from a discrete uniform distribution (uniformD) and then return its value.

Sampling can be executed with:

sampleIOfixed model1
False

We can get a list of samples with Haskell's replicateM function:

nsamples = 1000
samples <- sampleIOfixed $ replicateM nsamples model1

Then plot the result afterwards with Vega-Lite. You can find our custom plotting functions for Vega-Lite in the notebook.

vlShow $ plot (200, 100) [barPlot "b"] [("b", VL.Booleans samples)]

So far, so good: we now have a model that represents a distribution of True/False values and we can draw samples from it. But how can we include observations into this model?

Scoring

Consider again model1 as the answer to "Did it rain yesterday?". What if we found out "Yes, it did rain yesterday!"? To include this new knowledge, we need to update the distribution of the model parameter b.

In monad-bayes the function score is responsible for making samples more or less likely—by a factor. Don't worry if you are mystified by this explanation, we'll explain more in a moment. But first check out model2 that uses score to include the observation:

model2 :: MonadInfer m => m Bool
model2 = do
    b <- uniformD [False, True]
    score (if b then 1.0 else 0.0)
    return b

Notice the new typeclass MonadInfer that allows us to use score in addition to sampling. Instead of a single operation, the model is now a chain of actions: sample, score, …

Here's a naive idea of how this could work: assume that the representation of the probability distribution of b was a list of tuples [(0.5, True), (0.5, False)]. We would multiply this distribution, value by value, by the distribution of our observation "Yes, it did rain yesterday!", that is by [(1, True), (0, False)]. Then we would normalize the updated probabilities such that they sum to one, and the job is done.
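
This naive update is small enough to write out in full. The sketch below is not how monad-bayes works; the Dist type and the names update, rainPrior and rainObserved are made up for illustration. It also assumes that at least one value keeps a non-zero probability, since otherwise the normalization divides by zero.

```haskell
-- | A finite distribution as (probability, value) pairs, as in the text.
type Dist a = [(Double, a)]

-- | Multiply each prior probability by the likelihood of its value,
-- then rescale so that the probabilities sum to one.
update :: (a -> Double) -> Dist a -> Dist a
update likelihood prior = [ (p / total, x) | (p, x) <- weighted ]
  where
    weighted = [ (p * likelihood x, x) | (p, x) <- prior ]
    total    = sum (map fst weighted)

-- | Uniform prior over the answer to "Did it rain yesterday?".
rainPrior :: Dist Bool
rainPrior = [(0.5, True), (0.5, False)]

-- | The observation "Yes, it did rain yesterday!" as a likelihood.
rainObserved :: Bool -> Double
rainObserved b = if b then 1 else 0
```

Evaluating update rainObserved rainPrior gives [(1.0, True), (0.0, False)]: all probability mass ends up on True.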

So are we secretly tracking the probability of all samples—the full distribution—at every step (in the Haskell world a.k.a. some variant of the Dist monad)? Granted, in the case of a True/False question this might be a good approach—but this is not what is happening here for very good reasons:

To update a sample's probability as described above we need to track all samples with their probabilities and run global normalization operations on them. We are essentially running computations with full distributions over all possible values. This quickly becomes intractable because a model can basically be any Haskell function with lots and lots of possible outcomes.

monad-bayes and similar probabilistic frameworks use an elegant approach that does what we want without dragging around full distributions. The trick is to approximate the outcome distribution by drawing successive samples from it. It turns out that it is enough to know the relative probability of two samples at a time to do this. The relative probability of the two samples is independent of the probability of other samples, so computations on full distributions are reduced to computations on samples. With appropriate sampling algorithms (MCMC) we can approximate the outcome distribution despite this limited knowledge. It is difficult to overstate the implications: with MCMC we can address all kinds of sampling-related problems in a probabilistic manner that would be completely inaccessible otherwise.
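
To see why relative probabilities are enough, consider a bare-bones Metropolis-Hastings step with a symmetric proposal. This is a sketch under assumptions, not the monad-bayes implementation: the function names are hypothetical, the proposals and uniform random draws are passed in as plain values instead of being generated, and a current state with weight 0 is always abandoned by convention.

```haskell
-- | Acceptance probability for a symmetric proposal. It depends only on
-- the ratio of the two unnormalized weights, never on the full distribution.
acceptanceProbability :: Double -> Double -> Double
acceptanceProbability wCurrent wProposed
  | wCurrent <= 0 = 1                          -- always leave a zero-weight state
  | otherwise     = min 1 (wProposed / wCurrent)

-- | One step: move to the proposed state when the uniform draw u from [0, 1)
-- falls below the acceptance probability, otherwise stay put.
mhStep :: (a -> Double) -> a -> (a, Double) -> a
mhStep weight current (proposed, u)
  | u < acceptanceProbability (weight current) (weight proposed) = proposed
  | otherwise                                                    = current

-- | Run a chain over a supplied list of (proposal, uniform draw) pairs.
-- The result includes the initial state.
mhChain :: (a -> Double) -> a -> [(a, Double)] -> [a]
mhChain weight = scanl (mhStep weight)
```

With the weight function \b -> if b then 1 else 0, a chain started at False moves to True at the first opportunity and then never accepts a proposal of False again.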

Let's get back to the score function that modifies the probability of pulling a certain sample from the distribution. We now understand that it multiplies the relative probability of observing a sample, compared to any other, by a factor. This means that the sample's probability is left untouched if this factor is 1. If this factor is 10, the sample's relative probability is increased tenfold with respect to any other sample. If this factor is 0, the sample's probability is set to 0.

Here is model2 expressed in words: (a) draw a sample from [True, False] with equal probability, and (b) multiply the relative probability of a sample with value True by 1 and of a sample with value False by 0.

An appropriate sampler can trace and accumulate the score factor of a sample to compare with the score factor of other samples. The accumulated score factors give us access to the relative probability of the two samples, and then we can use MCMC to start sampling. We won't go into detail here about how tracing and accumulating works or which two samples we are actually comparing. monad-bayes provides a few samplers that can go through this process in different ways. Here we apply the prior and mh (Metropolis-Hastings sampler) functions before our well-known sampleIOfixed function to use an MCMC sampler. Hopefully, the general idea is clear enough by now. We'll provide more of the details later in this series.
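
The accumulation part can be illustrated with a toy monad that carries a running product of score factors next to the value. This is a simplified stand-in, not the machinery monad-bayes uses; the Scored type and scoredModel2 are hypothetical names, and the random draw of b is replaced by a function argument so that the sketch needs no random number generator.

```haskell
-- | A value together with the product of all score factors seen so far.
newtype Scored a = Scored { runScored :: (a, Double) }

instance Functor Scored where
  fmap f (Scored (x, w)) = Scored (f x, w)

instance Applicative Scored where
  pure x = Scored (x, 1)                      -- no scoring yet: factor 1
  Scored (f, w1) <*> Scored (x, w2) = Scored (f x, w1 * w2)

instance Monad Scored where
  Scored (x, w1) >>= k =
    let Scored (y, w2) = k x
    in  Scored (y, w1 * w2)                   -- factors multiply along the chain

-- | Multiply the accumulated weight by a factor.
score :: Double -> Scored ()
score factor = Scored ((), factor)

-- | The scoring half of model2, with b supplied by the caller.
scoredModel2 :: Bool -> Scored Bool
scoredModel2 b = do
  score (if b then 1.0 else 0.0)
  return b
```

Running runScored (scoredModel2 True) yields (True, 1.0) and runScored (scoredModel2 False) yields (False, 0.0); these two accumulated factors are exactly what a sampler would compare.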

The final result looks like this:

nsamples = 1000
samples <- sampleIOfixed $ prior $ mh nsamples model2
vlShow $ plot (200, 100) [barPlot "b"] [("b", VL.Booleans samples)]

Voilà, we wrote down a model (answer to "Did it rain yesterday?") with an uninformed (uniform) prior, and updated it based on the observation "It rained yesterday!". The distribution of parameter b after scoring—its posterior distribution—has probability 1 for True and probability 0 for False. The operations that we needed to figure out the posterior distribution were random and score.

Multiple parameters

Let's move onwards to more complex models: What if we considered a model with two parameters? Both parameters are drawn independently from uniform continuous distributions between -1 and 1. Again, we want to score this model. This time, we use the function condition that is a short form for scoring with 1 or 0 based on a condition. The new model becomes:

model3 :: MonadInfer m => m (Double, Double)
model3 = do
    b <- uniform (-1) 1
    m <- uniform (-1) 1
    condition $ (b-m) > 0
    return (b, m)

The approach is the same in principle: pick sample b, pick sample m, modify the joint sample probability based on a condition, and return the values of both samples in a tuple. If we run this through the Metropolis-Hastings sampler, we get:

nsamples = 5000
modelsamples <- sampleIOfixed $ prior $ mh nsamples model3
(xValues, yValues) = unzip modelsamples
vlShow $ plot (600, 300)
              [density2DPlot "b" "m" (-1.1,1.1) (-1.1,1.1)]
              [("b", VL.Numbers xValues), ("m", VL.Numbers yValues)]

The resulting distribution in the plot above is 0 where b<m, and approximately uniform when b>m, as we'd expect. You might spot the initial state of the Markov chain as a faint rectangle in the b<m region.

How about multiple conditions? Remember, we can freely bind operations together so it shouldn't be a problem. This model chains two sampling and two condition operations:

model4 :: MonadInfer m => m (Double, Double)
model4 = do
    b <- uniform (-1) 1
    m <- uniform (-1) 1
    condition $ (b-m) > 0
    condition $ (b+m) > 0
    return (b, m)

And it produces the expected result:

nsamples = 5000
modelsamples <- sampleIOfixed $ prior $ mh nsamples model4
(xValues, yValues) = unzip modelsamples
vlShow $ plot (600, 300)
              [density2DPlot "b" "m" (-1.1,1.1) (-1.1,1.1)]
              [("b", VL.Numbers xValues), ("m", VL.Numbers yValues)]
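
The two conditions of model4 together say that b > |m|, a wedge that covers a quarter of the square. As a rough sanity check that is independent of monad-bayes, one can count how many points of a regular grid pass both conditions. The grid construction and all names below are assumptions made for this sketch; integer coordinates are used so that the comparisons are exact.

```haskell
-- | Grid coordinates: the odd integers -(n-1), -(n-3), .., n-1 stand for
-- n evenly spaced points k/n strictly inside (-1, 1).
gridPoints :: Int -> [Int]
gridPoints n = [-(n - 1), -(n - 3) .. n - 1]

-- | The two conditions of model4, expressed on grid coordinates.
accepted :: Int -> Int -> Bool
accepted b m = (b - m) > 0 && (b + m) > 0

-- | Fraction of grid points that both conditions let through.
acceptedFraction :: Int -> Double
acceptedFraction n = fromIntegral hits / fromIntegral (n * n)
  where
    hits = length [ () | b <- gridPoints n, m <- gridPoints n, accepted b m ]
```

For a 100 by 100 grid the fraction comes out close to 0.25, slightly below it because points that lie exactly on a diagonal are rejected.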

Conclusions

We learned how to build models and examine related probability distributions by drawing samples and modifying their relative probabilities with the score function. The MCMC approach taken by monad-bayes and similar frameworks avoids computations with full distributions, and works with individual samples to enormously simplify life. monad-bayes can, therefore, be used to approximate large and complex distributions—something that quickly comes in handy. We can use monad-bayes to approximate the distribution of the return values of basically any Haskell function for a given input distribution. This could even be the return values of entire programs. Even better—we can use score to infer the input distribution if we have a way of scoring samples based on observations.

We hope you enjoyed this first post in our Probabilistic Programming with monad‑bayes Series and learned lots! Now, you're ready to build more general statistical models using these building blocks, and proceed to linear regression in our next post. We hope you join us!

Notes

We use this GitHub version of monad-bayes in our posts and notebooks since it's neither on Hackage nor Stackage right now. Here are two original articles you may want to check out:

September 20, 2019 12:00 AM

Chris Penner

Optics + Regex: Greater than the sum of their parts

The library presented in this post is one of many steps towards getting everyone interested in the amazing world of Optics! If you're at all interested in learning their ins & outs; check out the comprehensive book I'm writing on the topic: Optics By Example


Regardless of the programming language, regular expressions have always been a core tool in the programmer's toolbox. Though some have a distaste for their difficult to maintain nature, they're an adaptable quick'n'dirty way to get things done.

As much love as I have for Regular Expressions, they've become an incredibly hacky thing; they support a lot of options and a lot of different behaviours, so the interfaces to regular expressions in all languages tend to leave a bit to be desired.

The Status Quo

I don't know about you, but I've found almost every regular expression interface I've ever used in any language to be a bit clunky and inelegant; that's not meant to insult or demean any of those libraries, I think it's because regular expressions have a complex set of possible operations and the combination of them makes it tough to design a clean interface. Here are just a few reasons why it's hard to design a regex interface:

  • Regular expressions can be used to either get or set
  • Sometimes you want only one match, sometimes a few, sometimes you want all of them!
  • Sometimes you want just the match groups; sometimes you want the whole match, sometimes you want BOTH!
  • Regular Expression searching is expensive; we want to be lazy to avoid work!
  • Regular expressions patterns are written as text; what if it's not valid?

Luckily Haskell has a few tricks that help make some of these inherently difficult things a bit easier. Inherently lazy data structures and computations allow us to punt off laziness to the language rather than worrying about how to do the minimal amount of work possible. TemplateHaskell allows us to statically check Regular Expressions to ensure they're valid at compile time, and could even possibly allow us to statically analyze the existence of match groups. But that still leaves a lot of surface area to cover! It's easy to see how come these interfaces are complicated!

Think about designing a single interface which can support ALL of the following operations performantly and elegantly:

  • Get me the second match group from the first three matches
  • Replace only the first match with this text
  • Get me all groups AND match text from ALL matches
  • Replace the first match with this value, the next with this one, and so on...
  • Lazily get me the full match text of the first 2 matches where match-group 1 has a certain property.

Yikes... That's going to take either a lot of methods or a lot of options!

In a language like Haskell which doesn't have keyword or optional arguments it means we have to either overload operators with a lot of different meanings based on context; or provide a LOT of functions that the user has to learn, increasing our API's surface area. You may be familiar with the laughably overloaded "do everything" regex operator in many Haskell regex libs:

(=~) :: ( RegexMaker Regex CompOption ExecOption source2
        , RegexContext Regex source1 target
        ) => source1 -> source2 -> target

And even that doesn't handle replacement!

Overloading is one approach, but as it turns out, it requires a lot of spelunking through types and documentation to even find out what the valid possible uses are! I'm going to rule out this approach as unwieldy and tough to reason about. That leaves us with the other option; add a whole bunch of methods or options, which doesn't sound great either, mainly because I don't want someone to need to learn a dozen specialized functions just to use my library. If only there was some existing vocabulary of operations which could be composed in different permutations to express complex ideas!

Something Different

Introducing lens-regex-pcre; a Haskell regular expression library which uses optics as its primary interface.

Think about what regular expressions are meant to do; they're an interface which allows you to get or set zero or more small pieces of text in a larger whole. This is practically the dictionary definition of a Traversal in optics! Interop with optics means you instantly benefit from the plethora of existing optics combinators! In fact, optics fit this problem so nicely that the lensy wrapper I built supports more features, with less code, and runs faster (for replacements) than the regex library it wraps! Stay tuned for more on how that's even possible near the end!

Using optics as an interface has the benefit that the user is either already familiar with most of the combinators and tools they'll need from using optics previously, or that everything they learn here is transferable into work with optics in the future! As more optics are discovered, added, and optimized, the regex library passively benefits without any extra work from anyone!

I don't want to discount the fact that optics can be tough to work with; I'm aware that they have a reputation of being too hard to learn and sometimes have poor type-inference and tricky error messages. I'm doing my best to address those problems through education, and there are new optics libraries coming out every year that improve error messages and usability! Despite current inconveniences, optics are fundamental constructions which model problems well; I believe optics are inevitable! So rather than shying away from an incredibly elegant solution because of a few temporary issues with the domain I'd rather push through them, use all the power the domain provides me, and continue to do all I can to chip away at the usability problems over time.

Optics are inevitable.

Okay! I'll put my soapbox away, now it's time to see how this all actually works. Notice how most of the following examples actually read roughly like a sentence!

Examples

lens-regex-pcre provides regex, match, group and groups in the following examples; everything else is regular ol' optics from the lens library!

We'll search through this text in the following examples:

txt :: Text
txt = "raindrops on roses and whiskers on kittens"

First off, let's check if a pattern exists in the text:

>>> has [regex|wh.skers|] txt
True

Looks like we found it!

regex is a QuasiQuoter which constructs a traversal over all the text that matches the pattern you pass to it; behind the scenes it compiles the regex with pcre-heavy and will check your regex for you at compile time! Look; if we give it a bad pattern we find out right away!

-- Search
>>> has [regex|?|] txt

<interactive>:1:12: error:
    • Exception when trying to run compile-time code:
        Text.Regex.PCRE.Light: Error in regex: nothing to repeat

Handy!

Okay! Moving on, what if we just want to find the first match? We can use firstOf from lens to get Just the first focus or Nothing.

Here we use a fun regex to return the first word with doubles of a letter inside; it turns out kittens has a double t!

We use match to say we want to extract the text that was matched.

>>> firstOf ([regex|\w*(\w)\1\w*|] . match) txt
Just "kittens"

-- Alias: ^?
>>> txt ^? [regex|\w*(\w)\1\w*|] . match
Just "kittens"

Next we want to get ALL the matches for a pattern, this one is probably the most common task we want to perform, luckily it's common when working with optics too!

Let's find all the words starting with r using toListOf:

>>> toListOf ([regex|\br\w*|] . match) txt
["raindrops","roses"]

-- ALIAS: ^..
>>> txt ^.. [regex|\br\w*|] . match
["raindrops","roses"]

What if we want to count the number of matches instead?

>>> lengthOf [regex|\br\w*|] txt
2

Basically anything you can think to ask is already provided by lens:

-- Do any matches contain "drop"?
>>> anyOf ([regex|\br\w*|] . match) (T.isInfixOf "drop") txt
True

-- Are all of our matches greater than 3 chars?
>>> allOf ([regex|\br\w*|] . match) ((>3) . T.length) txt
True

-- "Is 'roses' one of our matches"
>>> elemOf ([regex|\br\w*|] . match) "roses" txt
True

Substitutions and replacements

But that's not all! We can edit and mutate our matches in-place! This is something that the lensy interface does much better than any regex library I've ever seen. Hold my beer.

We can do the boring basic regex replace without even breaking a sweat:

>>> set ([regex|\br\w*|] . match) "brillig" txt
"brillig on brillig and whiskers on kittens"

-- Alias .~
>>> txt & [regex|\br\w*|] . match .~ "brillig"
"brillig on brillig and whiskers on kittens"

Now for the fun stuff; we can mutate a match in-place!

Let's reverse all of our matches:

>>> over ([regex|\br\w*|] . match) T.reverse txt
"spordniar on sesor and whiskers on kittens"

-- Alias %~
>>> txt & [regex|\br\w*|] . match %~ T.reverse
"spordniar on sesor and whiskers on kittens"

Want to replace matches using a list of substitutions? No problem! We can use partsOf to edit our matches as a list!

>>> txt & partsOf ([regex|\br\w*|] . match) .~ ["one", "two"]
"one on two and whiskers on kittens"

-- Providing too few simply leaves extras alone
>>> txt & partsOf ([regex|\br\w*|] . match) .~ ["one"]
"one on roses and whiskers on kittens"

-- Providing too many performs as many substitutions as it can
>>> txt & partsOf ([regex|\br\w*|] . match) .~ ["one", "two", "three"]
"one on two and whiskers on kittens"

We can even do updates which require effects!

Let's find and replace variables in a block of text with values from environment variables using IO!

Note that %%~ is the combinator for running a traverse over the targets. We could also use traverseOf.

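{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE QuasiQuotes #-}

import qualified Data.Text as T
import Control.Lens
import Control.Lens.Regex
import System.Environment
import Data.Text.Lens

src :: T.Text
src = "Hello $NAME, how's your $THING?"

-- Each match such as "$NAME" is viewed as a String via 'unpacked';
-- 'tail' drops the leading '$' before the variable is looked up.
replaceWithEnv :: T.Text -> IO T.Text
replaceWithEnv = [regex|\$\w+|] . match . unpacked %%~ getEnv . tail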

Let's run it:

>>> setEnv "NAME" "Joey"
>>> setEnv "THING" "dog"
>>> replaceWithEnv src
"Hello Joey, how's your dog?"

When you think about what we've managed to do with replaceWithEnv in a single line of code I think it's pretty impressive.

And we haven't even looked at groups yet!

Using Groups

Any sufficiently tricky regex problem will need groups eventually! lens-regex-pcre supports that!

Instead of using match after regex we just use groups instead! It's that easy.

Let's say we want to collect only the names of every variable in a template string:

template :: T.Text
template = "Hello $NAME, glad you came to $PLACE"

>>> toListOf ([regex|\$(\w+)|] . group 0) template
["NAME","PLACE"]

You can substitute/edit groups too!

What if we got all of our area codes and local numbers messed up in our phone numbers? We can fix that in one fell swoop:

phoneNumbers :: T.Text
phoneNumbers = "555-123-4567, 999-876-54321"

-- 'reverse' will switch the first and second groups in the list of groups matches!
>>> phoneNumbers & [regex|(\d{3})-(\d{3})|] . groups %~ Prelude.reverse
"123-555-4567, 876-999-54321"

Bringing it in

So with this new vocabulary how do we solve all the problems we posed earlier?

  • Get me the second match group from the first three matches
>>> "a:b, c:d, e:f, g:h" ^.. taking 3 ([regex|(\w):(\w)|] . group 1)
["b","d","f"]

You can replace the call to taking with a simple Prelude.take 3 on the whole list of matches if you prefer; it'll lazily do the minimum amount of work!

  • Replace only the first match with this text
>>> "one two three" & [regex|\w+|] . index 0 . match .~ "new"
"new two three"
  • Get me all groups AND match text from ALL matches
>>> "a:b, c:d, e:f" ^.. [regex|(\w):(\w)|] . matchAndGroups
[("a:b",["a","b"]),("c:d",["c","d"]),("e:f",["e","f"])]
  • Replace the first match with this value, the next with this one, and so on...
-- If we get more matches than replacements it just leaves the extras alone
>>> "one two three four" & partsOf ([regex|\w+|] . match) .~ ["1", "2", "3"]
"1 2 3 four"
  • Lazily get me the full match text of the first 2 matches where match-group 1 has a certain property.
-- The resulting list will be lazily evaluated!
>>> "a:b, c:d, e:f, g:h" 
      ^.. [regex|(\w):(\w)|] 
      . filtered (has (group 0 . filtered (> "c"))) 
      . match
["e:f","g:h"]

Anyways, at this point I'm rambling, but I hope you see that this is too useful of an abstraction for us to give up!

Huge thanks to everyone who has done work on pcre-light and pcre-heavy; and of course everyone who helped to build lens too! This wouldn't be possible without both of them!

The library has a Text interface which supports Unicode, and a ByteString interface for when you've gotta go fast!

Performance

Typically one would expect that the more expressive an interface, the worse it performs; in this case the opposite is true! lens-regex-pcre uses pcre-heavy ONLY for regex compilation and for finding match positions with scanRanges, that's it! In fact, I don't use pcre-heavy's built-in support for replacements at all! After finding the match positions it lazily walks over the full ByteString, splitting it into chunks. Chunks are tagged with whether they're a match or not, then the "match" chunks are split further to represent whether the text is in a group or not. This allows us to implement all of our regex operations as a simple traversal over a nested list of Eithers. These traversals are the ONLY things we actually need to implement; all other functionality, including listing matches, filtering matches, and even setting or updating matches, already exists in lens as generic optics combinators!

This means I didn't need to optimize for replacements or for viewing separately, because I didn't optimize for specific actions at all! I just built a single Traversal, and everything else follows from that.

You heard that right! I didn't write ANY special logic for viewing, updating, setting, or anything else! I just provided the appropriate traversals, optics combinators do the rest, and it's still performant!
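To make the "one traversal, everything else for free" idea concrete, here is a minimal sketch using only base. It is not the library's actual implementation: it works on String rather than ByteString, ignores groups, and the names chunksFromRanges, matches, listMatches and overMatches are made up for illustration. The match ranges are assumed to be sorted, non-overlapping (start, length) pairs, roughly what a range scanner would return.

```haskell
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- | A chunk of the input: 'Left' is unmatched text, 'Right' is a match.
type Chunk = Either String String

-- | Split the input at (start, length) match ranges.
-- Assumes the ranges are sorted and do not overlap.
chunksFromRanges :: [(Int, Int)] -> String -> [Chunk]
chunksFromRanges = go 0
  where
    go _ [] rest = [Left rest | not (null rest)]
    go pos ((start, len) : more) rest =
      let (before, fromMatch) = splitAt (start - pos) rest
          (matched, after) = splitAt len fromMatch
       in [Left before | not (null before)]
            ++ Right matched : go (start + len) more after

-- | The single traversal: visit every matched chunk, leave the rest alone.
matches :: Applicative f => (String -> f String) -> [Chunk] -> f [Chunk]
matches = traverse . traverse

-- | Listing matches is the traversal run with 'Const'.
listMatches :: [Chunk] -> [String]
listMatches = getConst . matches (\m -> Const [m])

-- | Updating matches is the same traversal run with 'Identity',
-- followed by gluing the chunks back together.
overMatches :: (String -> String) -> [Chunk] -> String
overMatches f = concatMap (either id id) . runIdentity . matches (Identity . f)
```

With chunks = chunksFromRanges [(0, 3), (8, 5)] "one and three", listMatches chunks gives ["one","three"] and overMatches reverse chunks gives "eno and eerht". Neither operation needed its own splitting logic, which is the point the post is making.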

There was a little bit of fiddly logic involved with splitting the text up into chunks, but after that it all gets pretty easy to reason about. To optimize the Traversal itself I was easily able to refactor things to use ByteString 'Builder's, which have much better concatenation performance than full ByteStrings.

With the caveat that I don't claim to be an expert at benchmarks (please take a look and tell me if I'm making any critical mistakes!), this single change took lens-regex-pcre from being about half the speed of pcre-heavy to being within 0.6% of equal for search, and ~10% faster for replacements. It's just as fast for arbitrary pure or effectful modifications, which is something other regex libraries simply don't support. If there's a need for it, it can also trivially support things like inverting the match to operate over all unmatched text, or things like splitting up a text on matches, etc.

I suspect that these performance improvements are simple enough they could also be back-ported to pcre-heavy if anyone has the desire to do so, I'd be curious if it works just as well for pcre-heavy as it did for lens-regex-pcre.

You can try out the library here! Make sure you're using v1.0.0.0.

Hopefully you learned something 🤞! Did you know I'm currently writing a book? It's all about Lenses and Optics! It takes you all the way from beginner to optics-wizard and it's currently in early access! Consider supporting it, and more posts like this one, by pledging on my Patreon page! It takes quite a bit of work to put these things together; if I managed to teach you something or even just entertain you for a minute or two maybe send a few bucks my way for a coffee? Cheers!

September 20, 2019 12:00 AM

September 19, 2019

Magnus Therning

Haskell, ghcide, and Spacemacs

The other day I read Chris Penner’s post on Haskell IDE Support and thought I’d make an attempt to use it with Spacemacs.

After running stack build hie-bios ghcide haskell-lsp --copy-compiler-tool I had a look at the instructions on using haskell-ide-engine with Spacemacs. After a bit of trial and error I came up with these changes to my ~/.spacemacs:

The slightly weird looking lsp-haskell-process-wrapper-function is removing the pesky --lsp inserted by this line.

That seems to work. Though I have to say I’m not ready to switch from intero just yet. Two things in particular didn’t work with ghcide/LSP:

  1. Switching from the Main.hs in one executable to the Main.hs of another executable in the same project didn’t work as expected – I had hints and types in the first, but nothing in the second.
  2. Jump to the definition of a function defined in the package didn’t work – I’m not willing to use GNU GLOBAL or some other source tagging system.

September 19, 2019 12:00 AM

September 18, 2019

Chris Penner

Slick 1.0 Release - Now with a quick and easy template!

TLDR; Build a site with slick 1.0: fork the slick-template.

Hey folks! Slick has been around for a while already, it's a light wrapper over Shake which allows for blazing fast static site builds! It provides Pandoc helpers to load in pages or posts as markdown, or ANYTHING that Pandoc can read (which is pretty much EVERYTHING nowadays). It offers support for Mustache templates as well!

Shake was always great as a build tool, but its Makefile-style of dependency targets was always a little backwards for building a site. Slick 1.0 switches to recommending using Shake's FORWARD discoverable build style. This means you can basically write normal Haskell code in the Action monad to build and render your site, and Shake will automagically cache everything for you with proper and efficient cache-busting! A dream come true.

Slick lets you build and deploy a static website using Github Pages (or literally any static file host) very easily while remaining completely open to extension. You can use any shake compatible lib, or even just IO if you want; Shake's forward build tools can even detect caching rules when running arbitrary external processes (caveat emptor).

Hope you like it! In case you're curious what a site might be like; this very blog is built with slick!

Here's a full snippet of code for building a simple blog (with awesome caching) from markdown files; check it out:

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE OverloadedStrings #-}

module Main where

import           Control.Lens
import           Control.Monad
import           Data.Aeson                 as A
import           Data.Aeson.Lens
import           Development.Shake
import           Development.Shake.Classes
import           Development.Shake.Forward
import           Development.Shake.FilePath
import           GHC.Generics               (Generic)
import           Slick
import qualified Data.Text                  as T

outputFolder :: FilePath
outputFolder = "docs/"

-- | Data for the index page
data IndexInfo =
  IndexInfo
    { posts :: [Post]
    } deriving (Generic, Show, FromJSON, ToJSON)

-- | Data for a blog post
data Post =
    Post { title   :: String
         , author  :: String
         , content :: String
         , url     :: String
         , date    :: String
         , image   :: Maybe String
         }
    deriving (Generic, Eq, Ord, Show, FromJSON, ToJSON, Binary)

-- | given a list of posts this will build a table of contents
buildIndex :: [Post] -> Action ()
buildIndex posts' = do
  indexT <- compileTemplate' "site/templates/index.html"
  let indexInfo = IndexInfo {posts = posts'}
      indexHTML = T.unpack $ substitute indexT (toJSON indexInfo)
  writeFile' (outputFolder </> "index.html") indexHTML

-- | Find and build all posts
buildPosts :: Action [Post]
buildPosts = do
  pPaths <- getDirectoryFiles "." ["site/posts//*.md"]
  forP pPaths buildPost

-- | Load a post, process metadata, write it to output, then return the post object
-- Detects changes to either post content or template
buildPost :: FilePath -> Action Post
buildPost srcPath = cacheAction ("build" :: T.Text, srcPath) $ do
  liftIO . putStrLn $ "Rebuilding post: " <> srcPath
  postContent <- readFile' srcPath
  -- load post content and metadata as JSON blob
  postData <- markdownToHTML . T.pack $ postContent
  let postUrl = T.pack . dropDirectory1 $ srcPath -<.> "html"
      withPostUrl = _Object . at "url" ?~ String postUrl
  -- Add additional metadata we've been able to compute
  let fullPostData = withPostUrl $ postData
  template <- compileTemplate' "site/templates/post.html"
  writeFile' (outputFolder </> T.unpack postUrl) . T.unpack $ substitute template fullPostData
  -- Convert the metadata into a Post object
  convert fullPostData

-- | Copy all static files from the listed folders to their destination
copyStaticFiles :: Action ()
copyStaticFiles = do
    filepaths <- getDirectoryFiles "./site/" ["images//*", "css//*", "js//*"]
    void $ forP filepaths $ \filepath ->
        copyFileChanged ("site" </> filepath) (outputFolder </> filepath)

-- | Specific build rules for the Shake system
--   defines workflow to build the website
buildRules :: Action ()
buildRules = do
  allPosts <- buildPosts
  buildIndex allPosts
  copyStaticFiles

-- | Kick it all off
main :: IO ()
main = do
  let shOpts = forwardOptions $ shakeOptions { shakeVerbosity = Chatty}
  shakeArgsForward shOpts buildRules

See you next time!

September 18, 2019 12:00 AM

September 16, 2019

Monday Morning Haskell

Adding Random Exploration

Last week, we finally built a pipeline to use machine learning on our maze game. We made a Tensor Flow graph that could train a "brain" with weights so we could navigate the maze. This week, we'll see how our training works, or rather how it doesn't work. We'll consider how randomizing moves during training might help.

Our machine learning code lives in this repository. For this article, you'll want to look at the randomize-moves branch. Take a look here for the original game code. You'll want the q-learning branch in the main repo.

This part of the series uses Haskell and Tensor Flow. To learn more about using these together, download our Haskell Tensor Flow Guide!

Unsupervised Machine Learning

With a few tweaks, we can run our game using the new output weights. But what we'll find as we train the weights is that our bot never seems to win! It always seems to do the same thing! It might move up and then get stuck because it can't move up anymore. It might stand still the whole time and let the enemies come grab it. Why would this happen?

Remember that reinforcement learning depends on being able to reinforce good behaviors. Thus at some point, we have to hope our AI will win the game. Then it will get the good reward so that it can change its behavior to adapt and get good results more often. But if it never gets a good result in the whole training process, it will never learn good behaviors!

This is part of the challenge of unsupervised learning. In a supervised learning algorithm, we have specific good examples to learn from. One way to approach this would be to record our own moves of playing the game. Then the AI could learn directly from us! We'll probably try this approach in the future!

But q-learning is an unsupervised algorithm. We're forcing our AI to explore the world and learn on its own. But right now, it's only making moves that it thinks are "optimal." But with a random set of weights, the "optimal" moves aren't very optimal at all! Part of a good "exploration" plan means letting it choose moves from time to time that don't seem optimal.

Adding a Random Choice

As our first attempt to fix this, we'll add a "random move chance" to our training process. At each training step, our network chooses its "best" move, and we use that to update the world state. From now on, whenever we do this, we'll roll the dice. And if we get a number below our random chance, we'll pick a random move instead of our "best" move.

Over the course of training though, we want to decrease this random chance. In theory, our AI should be better as we train the network. So as we get closer to the end of training, we'll want to make fewer random decisions, and more "best" decisions. We'll aim to start this parameter as 1 in 5, and reduce it down to 1 in 50 as training continues. So how do we implement this?

First of all, we want to keep track of a value representing our chance of making a random move. Our runAllIterations function should be stateful in this parameter.

-- Third "Float" parameter is the random chance
runAllIterations :: Model -> World
  -> StateT ([Float], Int, Float) Session ()
...

trainGame :: World -> Session (Vector Float)
trainGame w = do
  model <- buildModel
  let initialRandomChance = 0.2
  (finalReward, finalWinCount, _) <- execStateT
    (runAllIterations model w)
    ([], 0, initialRandomChance)
  run (readValue $ weightsT model)

Then within runAllIterations, we'll make two changes. First, we'll make a new random generator for each training game. Then, we'll update the random chance, reducing it with the number of iterations:

runAllIterations :: Model -> World
  -> StateT ([Float], Int, Float) Session ()
runAllIterations model initialWorld = do
  let numIterations = 2000
  forM [1..numIterations] $ \i -> do
    gen <- liftIO newStdGen  -- newStdGen splits the global generator, so each game gets a fresh one
    (wonGame, (_, finalReward, _)) <- runStateT
      (runWorldIteration model)
      (initialWorld, 0.0, gen)
    (prevRewards, prevWinCount, randomChance) <- get
    let modifiedRandomChance = 1.0 / ((fromIntegral i / 40.0) + 5)
    put (newRewards, newWinCount, modifiedRandomChance)
  return ()
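
To see what this schedule does over a full run, the formula can be pulled out into a small pure function. This is only a sketch: the name randomChanceAt is made up here, and the formula is copied from the loop above.

```haskell
-- | Chance of making a random move at training iteration i,
-- using the same formula as the training loop.
randomChanceAt :: Int -> Float
randomChanceAt i = 1.0 / ((fromIntegral i / 40.0) + 5)

-- | The schedule sampled at a few points of a 2000-iteration run.
sampleSchedule :: [(Int, Float)]
sampleSchedule = [(i, randomChanceAt i) | i <- [0, 500, 1000, 1800, 2000]]
```

The chance starts at 1 in 5 (0.2) for iteration 0, passes 1 in 50 at iteration 1800 and ends at 1 in 55 after 2000 iterations, so it lands slightly below the 1 in 50 target.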

Making Random Moves

We can see now that runWorldIteration must now be stateful in the random generator. We'll retrieve that as well as the random chance at the start of the operation:

runWorldIteration :: Model -> StateT (World, Float, StdGen)
  (StateT ([Float], Int, Float) Session) Bool
runWorldIteration model = do
  (prevWorld, prevReward, gen) <- get
  (_, _, randomChance) <- lift get
  ...

Now let's refactor our serialization code a bit. We want to be able to make a new move based on the index, without needing the weights:

moveFromIndex :: Int -> PlayerMove
moveFromIndex bestMoveIndex =
  PlayerMove moveDirection useStun moveDirection
  where
    moveDirection = case bestMoveIndex `mod` 5 of
      0 -> DirectionUp
      1 -> DirectionRight
      2 -> DirectionDown
      3 -> DirectionLeft
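      4 -> DirectionNone
    -- Assumption: indices 5-9 are the stun variants of the five directions
    useStun = bestMoveIndex > 4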

Now we can add a function that will run the random generator and give us a random move if it's low enough. Otherwise, it will keep the best move.

chooseMoveWithRandomChance ::
  PlayerMove -> StdGen -> Float -> (PlayerMove, StdGen)
chooseMoveWithRandomChance bestMove gen randomChance =
  let (randVal, gen') = randomR (0.0, 1.0) gen
      (randomIndex, gen'') = randomR (0, 9) gen'  -- assumes 10 move indices (5 directions, with or without stun)
      randomMove = moveFromIndex randomIndex
  in  if randVal < randomChance
        then (randomMove, gen'')
        else (bestMove, gen')

Now it's a simple matter of applying this function, and we're all set!

runWorldIteration :: Model -> StateT (World, Float, StdGen)
  (StateT ([Float], Int, Float) Session) Bool
runWorldIteration model = do
  (prevWorld, prevReward, gen) <- get
  (_, _, randomChance) <- lift get
  ...
  let bestMove = ...
  let (newMove, newGen) = chooseMoveWithRandomChance
                            bestMove gen randomChance
  ...
  put (nextWorld, prevReward + newReward, newGen)
  continuationAction

Conclusion

When we test our bot, it has a bit more variety in its moves now, but it's still not succeeding. So what do we want to do about this? It's possible that something is wrong with our network or the algorithm. But that is hard to diagnose when the problem space itself is so complex. After all, we're expecting this agent to navigate a complex maze AND avoid/stun enemies.

It might help to break this process down a bit. Next week, we'll start looking at simpler examples of mazes. We'll see if our current approach can be effective at navigating an empty grid. Then we'll see if we can take some of the weights we learned and use them as a starting point for harder problems. We'll try to navigate a true maze, and see if we get better weights. Then we'll look at an empty grid with enemies. And so on. This approach will make it more obvious if there are flaws with our machine learning method.

If you've never programmed in Haskell before, it might be a little hard to jump into machine learning. Check out our Beginners Checklist and our Liftoff Series to get started!

by James Bowen at September 16, 2019 02:30 PM

Tim Docker

Using ADL from haskell

The ADL system has proven valuable at Helix. We use it in most of our projects, as a strongly typed schema language for specifying:

  • http apis (in lieu of openapi/swagger)
  • database schemas (in lieu of sql)
  • configuration files
  • user interface forms

and then as the base for code generation in haskell, java, rust, c++ and typescript.

But, because ADL has a variety of uses, the path to getting started can be unclear. As a small stand alone example, this post shows how ADL can be used to specify the syntax of a yaml configuration file, and automate its parsing into haskell.

To follow along with this project, you’ll need the ADL compiler installed and on your shell PATH.

We’ll assume that our project is some sort of server which will load a yaml configuration at startup. Jumping right in, we can specify the config schema in a file adl/config.adl:

module config {

struct ServerConfig {
  Int32 port;
  Protocol protocol = "http";
  LogLevel logLevel = "info";
};

union Protocol {
  Void http;
  SslConfiguration https;
};

struct SslConfiguration {
  FilePath certificate;
  FilePath certificateKey;
};

type FilePath = String;

union LogLevel {
  Void error;
  Void warn;
  Void info;
  Void debug;
  Void trace;
};

};

Being minimal, our ServerConfig has a port, some protocol information, and a logging level. The port has no default value, so is required in the configuration. The other fields are optional, with the given defaults being used in their absence. Note the protocol field is a union (aka a sum type). If it is http then no other information is required. However, if the protocol is https then paths for ssl certificate details are required. The full syntax and meaning of ADL is in the language documentation.
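
The required-versus-defaulted behaviour can be sketched in plain Haskell. This is a hypothetical, simplified stand-in and not what the ADL compiler generates: the Config type, the fromFields function, the flat key/value input and the error strings are all assumptions made for illustration, and the union-typed fields are reduced to plain strings.

```haskell
import Data.Maybe (fromMaybe)
import Text.Read (readMaybe)

-- | Hypothetical, flattened mirror of the ADL ServerConfig struct.
data Config = Config
  { port     :: Int
  , protocol :: String
  , logLevel :: String
  } deriving (Eq, Show)

-- | Build a Config from raw key/value pairs.
-- "port" has no default, so it must be present; the other two
-- fields fall back to the defaults declared in config.adl.
fromFields :: [(String, String)] -> Either String Config
fromFields fields = do
  rawPort <- maybe (Left "expected field port") Right (lookup "port" fields)
  p <- maybe (Left ("invalid port: " ++ rawPort)) Right (readMaybe rawPort)
  pure Config
    { port     = p
    , protocol = fromMaybe "http" (lookup "protocol" fields)
    , logLevel = fromMaybe "info" (lookup "logLevel" fields)
    }
```

Here fromFields [("port", "8080")] yields a Config with protocol "http" and logLevel "info", while fromFields [] fails because the one field without a default is missing.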

We’ve specified the data type for the server configuration, and we could now run the compiler to generate the corresponding haskell types and support code. The compiler does its best to generate idiomatic code in the target languages, but additional language specific information can improve the generated code. ADL annotations are used for this. Such annotations can be included in-line in the adl source code, though this gets a little noisy when annotations are included for multiple targets – it gets hard to see the core type definitions themselves in a sea of annotations.

Hence ADL has a standard pattern for language specific annotations: such annotations for an ADL file x.adl are kept in the file x.adl-lang. Hence the adl compiler, when reading config.adl to generate haskell code, will look for and include the adl file config.adl-hs for haskell related annotations.

In this example, config.adl-hs is straightforward:

module config {

import adlc.config.haskell.*;

annotation ServerConfig HaskellFieldPrefix "sc_";
annotation Protocol HaskellFieldPrefix "p_";
annotation SslConfiguration HaskellFieldPrefix "ssl_";
annotation LogLevel HaskellFieldPrefix "log_";
};

Recent language extensions notwithstanding, haskell’s record system is somewhat primitive (try a google search for "haskell record problem"). A key issue is that record field names need to be unique in their containing module. To ensure this, by default, the haskell ADL code generator prefixes each field with its type name. Hence the ServerConfig declaration would generate:

data ServerConfig = ServerConfig
    { serverConfig_port :: Data.Int.Int32
    , serverConfig_protocol :: Protocol
    , serverConfig_logLevel :: LogLevel
    }

Whilst this guarantees that the generated code will compile, those field names are unwieldy. Hence the HaskellFieldPrefix annotation allows a custom (or no) prefix to be used. With the above config.adl-hs annotations, we get a more friendly:

data ServerConfig = ServerConfig
    { sc_port :: Data.Int.Int32
    , sc_protocol :: Protocol
    , sc_logLevel :: LogLevel
    }

With the ADL written it’s time to run the ADL compiler to generate the haskell code:

adlc haskell \
  --outputdir src \
  --package ADL \
  --rtpackage ADL.Core \
  --include-rt \
  --searchdir adl \
  adl/*.adl

The --include-rt and --rtpackage arguments tell the code generator to include the runtime support files, making the generated code self contained. See the haskell backend documentation for details.

I generally check the generated code into the source repository. Whilst this approach has some drawbacks, it has benefits too:

  • you don’t need the ADL compiler installed to build the package
  • you can build with your off-the-shelf standard build system (cabal, cargo, tsc etc)

The main downside is that changing the source ADL requires explicitly rerunning the ADL compiler. In most projects I have a scripts/generate-adl.sh script to automate this step. Of course, if your build system is up to it, you may wish to generate the ADL derived code on demand.

We can now write some haskell code!

ADL’s core serialization schema is json (an alternate binary scheme is planned). In the generated haskell, every ADL value is an instance of the AdlValue type class, and then the library has helper functions to automate deserialization:

adlFromByteString :: AdlValue a => LBS.ByteString -> ParseResult a
adlFromJsonFile :: AdlValue a => FilePath -> IO (ParseResult a)
decodeAdlParseResult :: AdlValue a => T.Text -> ParseResult a -> Either T.Text a

If one wished to have a configuration file in json format, the latter two functions are sufficient to read and parse such a file. But json is less than ideal for human written configuration, due to its lack of support for comments, and its rigid syntax. The ADL core doesn’t have yaml support, but conveniently the haskell Data.Yaml package can parse yaml into json values, which the ADL core can then parse into ADL values. This is the approach we will take, and we write a yaml specific function to load an arbitrary ADL value:

import qualified Data.ByteString.Lazy as LBS
import qualified Data.Text as T
import qualified Data.Yaml as Y
import ADL.Core(runJsonParser, decodeAdlParseResult, AdlValue(..), ParseResult(..))

adlFromYamlFile :: AdlValue a => FilePath -> IO (Either T.Text a)
adlFromYamlFile file = (decodeAdlParseResult from . adlFromYamlByteString) <$> (LBS.readFile file)
  where
    adlFromYamlByteString :: (AdlValue a) => LBS.ByteString -> (ParseResult a)
    adlFromYamlByteString lbs = case Y.decodeEither' (LBS.toStrict lbs) of
      (Left e) -> ParseFailure ("Invalid yaml:" <> T.pack (Y.prettyPrintParseException e)) []
      (Right jv) -> runJsonParser jsonParser [] jv

    from = " from " <> T.pack file

Hopefully this is fairly self explanatory. It:

  • reads the input file contents as a bytestring
  • parses the yaml into an in-memory json value
  • parses the in-memory json value into an adl value

whilst turning parse failures at either level into user friendly error messages.

With this helper function, the scaffolding for our server process is straightforward. We read an environment variable for the configuration file path, use the adlFromYamlFile written previously, and launch our (dummy) server code.

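import Control.Concurrent (threadDelay)
import System.Environment (lookupEnv)
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)
-- ServerConfig(..) and Protocol(..) come from the adlc-generated module

main :: IO ()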
main = do
  let configEnvVar = "CONFIG_PATH"
  mEnvPath <- lookupEnv configEnvVar
  case mEnvPath of
    Nothing -> exitWithError (configEnvVar <> " not set in environment")
    (Just envPath) -> do
      eConfig <- adlFromYamlFile envPath
      case eConfig of
        (Left emsg) -> exitWithError (T.unpack emsg)
        (Right config) -> startServer config

exitWithError :: String -> IO ()
exitWithError emsg = do
  hPutStrLn stderr emsg
  exitFailure
  
startServer :: ServerConfig -> IO ()
startServer sc = do
  case sc_protocol sc of
    P_http -> putStrLn ("Starting http server on port " ++ (show (sc_port sc)))
    P_https{} -> putStrLn ("Starting https server on port " ++ (show (sc_port sc)))
  threadDelay 1000000000

The simplest configuration yaml specifies just the port, relying on the ADL defaults for other fields:

port: 8080

An example that overrides the protocol, and hence must provide additional information:

port: 8443
protocol:
  https:
    certificate: /tmp/certificate.crt
    certificateKey: /tmp/certificate.key

The ADL json/yaml serialization schema is straightforward. One point of note is that ADL unions (like Protocol in the example) are serialized as single element objects. See the serialisation documentation for details.
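
As a rough illustration of that shape, the sketch below renders a Protocol value by hand. It is not the ADL runtime's serializer: protocolJson is a made-up helper, the types are simplified copies of what the annotations above would suggest, and it assumes that a Void branch is written as a bare string (as the default "http" in config.adl suggests) and that the paths contain nothing needing JSON-specific escaping.

```haskell
-- | Simplified stand-ins for the generated types (field and
-- constructor names assumed from the HaskellFieldPrefix annotations).
data SslConfiguration = SslConfiguration
  { ssl_certificate    :: FilePath
  , ssl_certificateKey :: FilePath
  }

data Protocol
  = P_http
  | P_https SslConfiguration

-- | Render a Protocol in the single-element-object style.
-- 'show' is used for quoting, which matches JSON for plain ASCII paths.
protocolJson :: Protocol -> String
protocolJson P_http = show "http"
protocolJson (P_https ssl) =
  "{\"https\":{\"certificate\":" ++ show (ssl_certificate ssl)
    ++ ",\"certificateKey\":" ++ show (ssl_certificateKey ssl)
    ++ "}}"
```

The https case produces one object with a single key naming the union branch, which is the same nesting as the protocol section of the yaml example above.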

The parser provides helpful error messages. In the above example config, if you leave out the last line and fail to set the SSL key, the error is:

Unable to parse a value of type config.ServerConfig from demo-server-example3.yaml:
expected field certificateKey at protocol.https

Hopefully this post has given a simple but useful demonstration of ADL usage from haskell. It’s really only a starting point – the ADL system’s value increases dramatically when used to ensure consistent types between systems written in multiple languages.

The complete code for this demonstration, including build and dependency configuration, can be found in its github repo.

by Tim Docker at September 16, 2019 02:54 AM

September 15, 2019

Magnus Therning

Nested tmux

I’ve finally gotten around to sorting out running nested tmux instances. I found the base for the configuration in the article Tmux in practice: local and nested remote tmux sessions, which links a few other related resources.

What I ended up with was this:

# Toggle tmux keybindings on/off, for use with inner tmux
# https://is.gd/slxE45
bind -T root F12  \
  set prefix None \;\
  set key-table off \;\
  set status-left "#[fg=black,bg=blue,bold] OFF " \;\
  refresh-client -S

bind -T off F12 \
  set -u prefix \;\
  set -u key-table \;\
  set -u status-left \;\
  refresh-client -S

It’s slightly simpler than what’s in the article above, but it works and it fits rather nicely with the nord theme.

September 15, 2019 12:00 AM

September 13, 2019

Functional Jobs

Compiler Engineer at Axoni (Full-time)

Compiler Engineer

Headquartered in New York City, we are currently working intensely on what we expect to become the most widely used programming language for blockchain smart contracts: AxLang. Based on Scala, AxLang enables secure and full featured smart contract development by supporting both functional programming and formal verification. Its design is driven by the rigorous requirements for solutions serving the world's largest financial institutions.

AxLang is part of Axoni's blockchain infrastructure, which underpins the broadest reaching and most ambitious permissioned ledger production projects in the world, including $11 trillion of credit derivatives, the world's leading foreign exchange connectivity network, and various other industry implementations.

By joining our growing compiler team, you will help us build cutting edge compiler technology that targets the Ethereum open-source community. Overall, this is a unique opportunity to contribute to one of the most promising technologies today by helping us revolutionize the way smart contracts are developed in the Ethereum ecosystem.

The ideal candidate should be comfortable with programming language concepts such as type systems and formal methods and compiler design concepts such as abstract syntax tree (AST) and source-to-source transformations.

Relevant Skills and Experience

  • 2+ years of industry experience on a functional programming language such as Scala, Haskell, OCaml
  • Experience with compiler development a plus
  • Experience with formal verification and/or semantic analysis a plus
  • Research in the field of programming languages, compilers, and formal verification a plus
  • Knowledge of (Ethereum) smart contracts a plus
  • Familiar with best practices for Agile and Test Driven Development
  • Strong communication skills and a collaborative team member

Get information on how to apply for this position.

September 13, 2019 07:56 PM

Lead engineer at Coinweb, ltd (Full-time)

At Coinweb we are looking for talented software engineers who love distributed systems and obsess about code quality.

We are a funded startup with a mission to bring real innovation to Fintech using blockchain. Our small friendly team works in Barcelona and Kiev, along with team members in Buenos Aires, Bangkok and Stockholm.

Our core is implemented in Haskell and we are hiring junior and senior engineers with functional programming or DevOps experience.

If you or someone you know is interested, please contact us!

Get information on how to apply for this position.

September 13, 2019 10:45 AM

Lysxia's blog

September 12, 2019

Tweag I/O

War Stories of Asterius: Numerics & Debugging

Siddharth Bhat

I got the opportunity to work on Asterius, a new Haskell to WebAssembly compiler, during my internship at Tweag. My task was to get anything numerics-related stabilized in its compiled code. Generally, this meant experimenting with all the conversion routines between Float, Double, Rational, Int, and the ton of intrinsics that the Glasgow Haskell Compiler (GHC) provides for these values.

TLDR I helped integrate a part of GHC's test suite with Asterius during my internship at Tweag. Now, it can pass almost all of the numerics test suite. It was a really fun experience—I got to read a bunch of GHC sources, fight with the garbage collector, and come out knowing a lot more than I did when I went in. We also ended up making some modest contributions upstream, to binaryen, tasty, and GHC.

First steps: getting up to speed · PR #114

I spent my first week getting familiar with the Asterius codebase so I could fix a bug with coercion from Int64/Int32 values to Int8 values. This task forced me to explore both the Asterius codebase and the corresponding GHC sources that were responsible for this bug. I continued making other small, localized fixes that enabled me to get to know the rest of the codebase.

Integrating the GHC test suite · PR #132

The next major thing we worked on was integrating a part of the GHC test suite into Asterius—to enable us to sanity check our runtime against GHC's battle-hardened test suite. The original GHC test suite is a large Python and Makefile based project which needs significant work to reuse for Asterius, so we chose an alternative approach: copy the single-module should-run tests into our source tree, write a custom tasty driver to run each test, compare against expected output, and write results into a CSV report. The CSV reports are available as CI artifacts, so by diffing against the reports of previous commits, regressions can be quickly observed.
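
As a rough illustration of the report-and-diff idea above, here is a minimal sketch in plain Haskell. It is not the actual Asterius driver (which is built on tasty); the Outcome and Report types, the CSV layout, and all function names are assumptions made for this example.

```haskell
-- Hypothetical result of running a single test case.
data Outcome = Pass | Fail deriving (Eq, Show)

-- A report maps test names to outcomes (an association list keeps this base-only).
type Report = [(String, Outcome)]

-- A test passes when its actual output matches the expected output exactly.
judge :: String -> String -> Outcome
judge expected actual
  | expected == actual = Pass
  | otherwise          = Fail

-- Render a report as CSV with a header row and one row per test.
renderCsv :: Report -> String
renderCsv report =
  unlines ("test,outcome" : [name ++ "," ++ show outcome | (name, outcome) <- report])

-- Tests that passed in the old report but fail in the new one.
regressions :: Report -> Report -> [String]
regressions old new =
  [name | (name, Fail) <- new, lookup name old == Just Pass]
```

Feeding regressions the report of an earlier commit and the report of the current one lists the tests that used to pass and now fail.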

The initial numbers were interesting: of all 706 tests integrated so far, if we were optimists, then 380 were passing, which was about half of all tests. And if we were pessimists...

We rolled up our sleeves and got to work: I decided that by the end of the three months, I wanted all tests in the numerics/ subfolder stabilized. These tests mostly deal with everything related to numbers including conversions between Float, Double, Rational, Int, Word, the various intrinsics in GHC.Prim for numbers, and tests for Integer support (implemented in Asterius with JavaScript BigInts).

Stabilizing numerics

The workflow to crush a bug was essentially:

  1. Pick a failing test case that looks like a low-hanging fruit
  2. Read the relevant GHC sources
  3. Pin down what's going wrong by repeatedly shrinking and logging
  4. Fix it, ensure the test case turns green
  5. Goto step 1

It sounds quite repetitive, but it was anything but. I learned a lot of interesting details about GHC's internals. For example, some of the ones I remember fondly are:

At the end of all of this, I had in total 59 PRs, 40 of which got merged into Asterius. The rest are either closed experimental branches, or open PRs waiting to be merged.

Our failure rate on the GHC test suite has changed as follows:

  • Then: 327/707 failures in total, 36/50 failures in numerics (collected from commit 6290d24)
  • Now: 168/707 failures in total, 3/50 failures in numerics (collected from commit 222858b)

Most of the remaining bugs we categorized appear to fall into the following classes:

  • A lack of runtime features like multi-threading, STM, etc.
  • A lack of full Unicode support
  • Subtle issues in various parts of the runtime, e.g., the storage manager

Aside: GC bug hunting

Eventually, we found ourselves running into bugs in our implementation of the garbage collector in Asterius. These are usually incredibly painful to debug. Two major sources of the pain are:

  • When the GC malfunctions, the heap is already in an inconsistent state, but the final crash site can be quite far away. All we're left with is the final error message of the crash. We need to work backwards for a long time to locate the crime scene; worse, judging from a seemingly irrelevant error message, it's not always obvious that the root cause of the bug lies in the GC.
  • The WebAssembly platform still lacks a good debugging story. Well-established tools like gdb aren't available; we don't even have standardized DWARF sections yet! Plain old logging is still the central way of hunting bugs, if not the only one.

Other than the seemingly endless loop of adding more logging logic and rerunning tests, we do have some debugging-related infrastructure. Since Asterius is a whole program compiler on the Cmm level, it's possible to implement aggressive link-time rewriting passes to add tracing or asserting logic. For example, we implement the "memory trap" feature when the debugging mode is enabled: it replaces all wasm load/store instructions with calls to our read/write barrier functions implemented in JavaScript. These functions utilize the information in the block allocator and check whether an address points to a region which is already recycled by the copying GC. The memory trap is quite useful in making GC-related bugs crash as early as possible, and we indeed spotted use-after-free bugs with its help.
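
The real barriers are JavaScript functions wired in by a link-time rewriting pass; the following is only a toy Haskell model of the check they perform. The Block type, the half-open range convention, and checkAccess are hypothetical names chosen for this sketch.

```haskell
-- A block of memory as a half-open address range [blockStart, blockEnd).
data Block = Block { blockStart :: Int, blockEnd :: Int } deriving (Eq, Show)

-- Does the address fall inside the block?
contains :: Block -> Int -> Bool
contains (Block start end) addr = start <= addr && addr < end

-- Model of a read/write barrier: given the blocks already recycled by the
-- copying GC, reject any access that lands inside one of them.
checkAccess :: [Block] -> Int -> Either String Int
checkAccess recycled addr
  | any (`contains` addr) recycled =
      Left ("memory trap: access to recycled address " ++ show addr)
  | otherwise = Right addr
```

Failing at the first bad access, rather than at some later crash site, is what makes this kind of check useful for use-after-free bugs.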

Another approach to check whether a runtime bug is GC-related: not running GC at all! We implemented a "YOLO mode" in the runtime which disables all evacuating/scavenging logic, so the only thing the GC does is allocate new space for the nursery and resume Haskell execution. By running the test suite with and without the YOLO flag and diffing the reports, we can quickly tell whether a test failure is likely related to GC.

I also eventually ended up writing helpers to structure the heap and figure out what arbitrary bit patterns I was looking at—bollu/biter was written during an afternoon of debugging some messed up floating-point representation bug caused by incorrect bit manipulation.

Similarly, another technique I began using was to create debug logs that would emit Python code. This Python code would then render the state of the heap at that given point in time: this is a much saner way to see what's going on than viewing raw numbers or bit strings. For example:

The fact that the gray region overlaps with the green region is a Bad Thing since we were freeing up some memory that is actually still kept alive by the higher-level heap allocator. Without visualization, this sort of thing is tough to recognize when you're staring at nothing but pointers which look like:

9007160601084160, 9007160602132736, 9007160603181312, 9007160604229888, 9007160601084160...

So, the root cause of the bug above was some missing synchronization logic between our two levels of allocators. We have a low-level block allocator which allocates and frees large blocks of memory, growing the wasm linear memory when needed; above that comes the heap allocator which keeps a pool of blocks to serve as nurseries of Haskell heap objects. After a round of GC, we have a set of "live" blocks which make up the "to-space" of copying GC, and we free all memory outside the set. But we should have also been keeping alive the blocks in the pools owned by the heap allocator; otherwise a piece of already "freed" memory can be provided as nurseries without proper initialization. Look at the Python-generated graph, and the simple yet deadly problem is made obvious.
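
The missing synchronization can be modelled in a few lines of Haskell. This is a simplified sketch, not the Asterius runtime code: blocks are reduced to integer ids, and freeableBuggy and freeableFixed are hypothetical names.

```haskell
import Data.List (union, (\\))

-- Blocks are identified by a plain integer id in this model.
type BlockId = Int

-- Buggy policy: after GC, free every block outside the live "to-space" set.
freeableBuggy :: [BlockId] -> [BlockId] -> [BlockId]
freeableBuggy allBlocks live = allBlocks \\ live

-- Fixed policy: blocks held in the heap allocator's pool must survive too.
freeableFixed :: [BlockId] -> [BlockId] -> [BlockId] -> [BlockId]
freeableFixed allBlocks live pool = allBlocks \\ (live `union` pool)
```

With blocks 1 to 6, live blocks 1 and 2, and pooled blocks 3 and 4, the buggy policy frees 3 and 4 even though the heap allocator may later hand them out as nurseries; the fixed policy frees only 5 and 6.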

In general, debugging the GC took lots of patience and code. Entire branches' worth of debugging history never got merged into master.

Wrapping up

I loved working on Asterius at Tweag! I got to contribute stuff upstream and got my hands dirty with the garbage collector and the low-level cbits (C functions for various standard libraries), all while helping a real-world project! I hope to continue working on this and get the number of bugs down to zero.

Finally, Tweag's Paris office is a fun place to work! I picked up (very little) French, a bunch about sampling and Markov chain Monte Carlo (MCMC) techniques, tidbits of category theory and type theory, some differential geometry, and enjoyed lunch conversations about topics ranging from physics to history. It was a delightful, rewarding experience—both personally and professionally!

September 12, 2019 12:00 AM

September 07, 2019

Chris Penner

Haskell IDE Support (hie-core lsp Sept. 2019)

EDIT: This project has been renamed to ghcide now; you can find it here!

Here's a super quick guide on adding hie-core to your workflow!

Disclaimer; this post depends on the state of the world as of Saturday Morning, Sept. 7th 2019; it's likely changed since then. I'm not a maintainer of any of these libraries, and this is a complicated and confusing process. There's a good chance this won't work for you, but I'm afraid I can't support every possible set up. Use it as a guide-post, but you'll probably need to fix a few problems yourself. Feel free to let me know if things are broken, but I make no guarantees that I can help, sorry! Good luck!

This is a guide for using it with stack projects, or at least using the stack tool. If your project isn't a stack project, you can probably just run stack init first.

hie-core currently requires a whole suite of tools to run, including hie-bios, hie-core, and haskell-lsp. Each of these needs to be installed against the proper GHC version and LTS that you'll be using in your project. This is a bit annoying of course, but the end result is worth it.

We need separate binaries for every GHC version, so to avoid getting them all confused, we'll install everything in LTS specific sandboxes!

  • First navigate to the project you want to run hie-core with
  • Now stack update; sometimes stack doesn't keep your hackage index up-to-date and most of the packages we'll be using are pretty new.
  • stack build hie-bios hie-core haskell-lsp --copy-compiler-tool
    • We need these three executables installed, using stack build doesn't install them globally (which is what we want to avoid conflicts), but --copy-compiler-tool allows us to share binaries with other projects of the same LTS.
    • This will probably FAIL the first time you run it, stack will suggest that you add extra-deps to your stack.yaml; go ahead and do that and try again. Repeat this process until success!

If you've got all those running, time to go for a walk, or make a cup of tea. It'll take a while.

If you're using an LTS OLDER than 14.1 then haskell-lsp will probably be too old to work with hie-core; you can try to fix it by adding the following to your extra-deps:

extra-deps:
- haskell-lsp-0.15.0.0
- haskell-lsp-types-0.15.0.0

If that doesn't work, sorry, I really have no idea :'(

Okay, so now that we've got all the tools installed we can start configuring the editor. I can't tell you how to install it for every possible editor, but the key part to know is that it's a language server, so search for integrations for your editor that handle that protocol. Usually "$MyEditorName lsp" is a good google search. Once you find a plugin you need to configure it. Typically there's a spot in the settings to associate file-types with the language server binary. Punch in the Haskell filetype or extensions accordingly; the lsp binary is stack exec hie-core -- --lsp; this'll use the hie-core you installed specifically for this LTS, and will add the other dependencies to the path properly. You'll likely need to specify the binary and arguments separately; see the following vim setup for an example.

Vim Setup

Here's my setup for using hie-core with Neovim using the amazing Coc plugin. Note that you'll need to install Neovim from latest HEAD to get proper pop-up support, if you're on a Mac you can do that with brew unlink neovim; brew install --HEAD neovim.

Follow the instructions in the Coc README for installing that however you like; then run :CocConfig inside neovim to open up the config file.

Here's my current config:

{
  "languageserver": {
    "haskell": {
      "command": "stack",
      "args": ["exec", "hie-core", "--", "--lsp"],
      "rootPatterns": [
        "stack.yaml",
        "cabal.config",
        "package.yaml"
      ],
      "filetypes": [
        "hs",
        "lhs",
        "haskell"
      ],
      "initializationOptions": {
        "languageServerHaskell": {
        }
      }
    }
  }
}

Also make sure to read the Sample Vim Configuration for Coc to set up bindings and such.

After you've done all that, I hope it's working for you, if not, something crazy has probably changed and you're probably on your own. Good luck!

PS; I have a little bash script I use for installing this in every new project in case you want to see how terrible I am at writing BASH. It includes a helper which auto-adds all the necessary extra-deps for you: My crappy bash script

You'll probably need to run the script more than once as it attempts to add all the needed extra-deps. Hopefully this'll get better as these tools get added to stackage.

Hopefully you learned something 🤞! Did you know I'm currently writing a book? It's all about Lenses and Optics! It takes you all the way from beginner to optics-wizard and it's currently in early access! Consider supporting it, and more posts like this one, by pledging on my Patreon page! It takes quite a bit of work to put these things together; if I managed to teach you something or even just entertain you for a minute or two, maybe send a few bucks my way for a coffee? Cheers!


September 07, 2019 12:00 AM

September 05, 2019

Well-Typed.Com

Remote Interactive Courses

Given the success of introducing an online version of our “Type level programming with GHC” course, we’re offering this again alongside two of our other courses: “Compact Introduction to Haskell” and “Performance and Optimisation”. These courses are now available to book online on a first come, first served basis. If you want to book a ticket but they have sold out, please sign up to the waiting list, in case one becomes available.

Training course details

All of these courses will be a mixture of lectures, discussions and live coding delivered via Google Meet. The maximum course size is deliberately kept small (up to 8 participants) so that it is still possible to ask and discuss individual questions. These will be delivered by Andres Löh, who has more than two decades of Haskell experience and has taught many courses to varied audiences.

Type-level Programming with GHC

An overview of Haskell language extensions designed for type-level programming and expressing more properties of your programs statically

14-15th October 2019, 0900-1300 BST (2 sessions, each 4 hours)

Haskell Performance and Optimisation

Lambda calculus and GHC’s Core language, reasoning about evaluation, space leaks and optimisations

22-23rd October 2019, 0900-1300 BST (2 sessions, each 4 hours)

Compact Introduction to Haskell

Functional programming with Haskell from scratch up to applicative functors and monads

4-7th November 2019, 0900-1300 GMT (4 sessions, each 4 hours)

Other Well-Typed training courses

If you are interested in the format, but not the topic or cannot make the time, feel free to drop us a line with requests for courses on other topics or at other times. We can also do courses on-site for your company, on the topics you are most interested in and individually tailored to your needs. Check out more detailed information on our training services or just contact us.

by christine, andres at September 05, 2019 08:01 AM

September 03, 2019

Jasper Van der Jeugt

The ZuriHac registration system

Introduction

I am one of the organizers of ZuriHac, and last year, we hand-rolled our own registration system for the event in Haskell. This blogpost explains why we decided to go this route, and we dip our toes into its design and implementation just a little bit.

I hope that the second part is especially useful to less experienced Haskellers, since it is a nice example of a small but useful standalone application. In fact, this was more or less an explicit side-purpose of the project: I worked on this together with Charles Till since he’s a nice human being and I like mentoring people in day-to-day practical Haskell code.

In theory, it should also be possible to reuse this system for other events – not too much of it is ZuriHac specific, and it’s all open source.

Relaxing at Lake ZuriHac (formerly known as Lake Zurich) after a long day of hacking and talks

Why?

Before 2019, ZuriHac registration worked purely based on Google tools and manual labor:

  • Google Forms for the registration form
  • Google Groups to contact registrants
  • Google Sheets to manage the registrants, waitlist, T-Shirt numbers, …

Apart from the fact that the manual labor wasn’t scaling above roughly 300 people, there were a number of practical issues with these tools. The biggest issue was managing the waiting list and cancellations.

You see, ZuriHac is a free event, which means that the barrier to signing up for it is (intentionally and with good reason!) extremely low. Unfortunately, this will always result in a significant number of people who sign up for the event, but do not actually attend. We try compensating for that by overbooking and offering cancellations; but sometimes it turns out to be hard to get people to cancel as well – especially if it’s hard to reach them.

Google Groups is not great for the purpose we’re using it for: first of all, attendees actually need to go and accept the invitation to join the group. Secondly, do you need a Google Account to join? I still don’t know and have seen conflicting information over the years. Anyway, it’s all a bit ad-hoc and confusing.

So one of the goals for the new registration system (in addition to reducing work on our side) was to be able to track participant numbers better and improve communication. We wanted to work with an explicit confirmation that you’re attending the event; or with a downloadable ticket so that we could track how many people downloaded this 1.

I looked into a few options (Eventbrite, eventlama, and others…) but none of these ticked all the boxes. Aside from being free (since we have a limited budget), some features that I wanted were:

  • complete privacy for our attendees
  • a custom “confirmation” workflow, or just being able to customize the registration flow in general
  • and some sort of JSON or CSV export option

With these things in mind, I set out to solve this problem the same way I usually solve problems: write some Haskell code.

How?

The ZuriHac Registration system (zureg) is a “serverless” application that runs on AWS. It was designed to fit almost entirely in the free tier of AWS; which is why I, for example, picked DynamoDB over a database that’s actually nice to use. We used Brendan Hay’s excellent and extensive amazonka libraries to talk to AWS.

The total cost of having this running for a year, including during ZuriHac itself, came to 0.61 Swiss Francs, so I would say that worked out well price-wise!

There are two big parts to the application: a fat lambda 2 function that provides a number of different endpoints, and a bunch of command line utilities that talk to the different services directly.

All these parts, however, are part of one monolithic codebase which makes it very easy to share code and ensure all behaviour is consistent – globally coherent as some would call it. One big “library” that has well-defined module boundaries and multiple lightweight “executables” is how I like to design applications in Haskell (and other languages).

Building and deploying

First, I’d like to go into how the project is built and compiled. It’s not something I’m proud of, but I do think it makes a good cookbook on how to do things the hard way.

The main hurdle is that we wanted to run our Haskell code on Lambda, since this is much cheaper than using an EC2 instance: the server load is very bursty with long periods (days up to weeks) of complete inactivity.

I wrote a bunch of the zureg code before some Haskell-on-Lambda solutions popped up, so it is all done from scratch – and it’s surprisingly short. However, if I were to start a new project, I would probably use one of these frameworks:

Converting zureg to use one of these frameworks is something I would like to look into at some point, if I find the time. The advantage of doing things from scratch, however, is that it serves the educational purposes of this blogpost very well!

Our entire serverless framework is currently contained in a single 138-line file.

From a bird’s eye view:

  1. We define a docker image that’s based on Amazon Linux – this ensures we’re using the same base operating system and system libraries as Lambda, so our binary will work there.

  2. We compile our code inside a docker container and copy out the resulting executable to the host.

  3. We zip this up together with a python script that just forwards requests to the Haskell process.

  4. We upload this zip to S3 and our cloudformation takes care of setting up the rest of the infrastructure.

I think this current situation is still pretty manageable since the application is so small; but porting it to something nicer like Nix is definitely on the table.

The database

The data model is not too complex. We’re using an event sourcing approach: this means that our source of truth is really an append-only series of events rather than a traditional row in a database that we update. These events are stored as plain JSON, and we can define them in pure Haskell:

lib/Zureg/Model.hs

And then we just have a few handwritten functions in the database module:

lib/Zureg/Database.hs

This gives us a few things for free; most importantly if something goes wrong we can go in and check what events led the user to get into this invalid state.

This code is backed by the eventful and eventful-dynamodb libraries, in addition to some custom queries.
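
To make the event-sourcing idea concrete, here is a minimal sketch in plain Haskell. It does not reproduce the actual zureg model or the eventful API: the Event constructors, the Registrant record, and the apply and replay functions are hypothetical and chosen only for illustration.

```haskell
import Data.List (foldl')

-- Hypothetical events; the source of truth is an append-only list of these.
data Event
  = Register String -- registrant name
  | Confirm
  | Cancel
  deriving (Eq, Show)

-- The current view of a registrant, derived entirely from past events.
data Registrant = Registrant
  { rName      :: String
  , rConfirmed :: Bool
  , rCancelled :: Bool
  } deriving (Eq, Show)

-- Apply one event to the current state; Nothing means "not registered yet".
apply :: Maybe Registrant -> Event -> Maybe Registrant
apply _        (Register name) = Just (Registrant name False False)
apply (Just r) Confirm         = Just r { rConfirmed = True }
apply (Just r) Cancel          = Just r { rCancelled = True }
apply Nothing  _               = Nothing

-- Rebuild the state by replaying the whole history from the start.
replay :: [Event] -> Maybe Registrant
replay = foldl' apply Nothing
```

Because the state is always recomputed from the history, replaying a prefix of the events shows exactly how a registrant ended up in an unexpected state.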

The lambda

While our admins can interact with the system using the CLI tooling, registrants interact with the system using the webapp. The web application is powered by a fat lambda.

Using this web app, registrants can do a few things:

  • Register for the event (powered by a huge web 1.0 form using digestive-functors);
  • View their ticket (including a QR code generated by qrcode);
  • Confirm their registration;
  • Cancel their registration.

In addition to these routes used by participants, there’s a route used for ticket scans – which we’ll talk about next.

The scanning

Now that we have participant tickets, we need some way to process them at the event itself.

scanner.js is a small JavaScript tool that does this for us. It uses the device’s webcam to scan QR codes – which is nice because this means we can use phones, tablets or laptops to scan tickets at the event; the device just needs a modern browser. It’s built on top of jsQR.

The scanner intentionally doesn’t do much processing – it just displays a full-screen video of the webcam and searches for a QR code using an external library. Once we get a hit for a QR code, we poll the lambda again to retrieve some information (participant name, T-Shirt size) and overlay that on top of the video.

Testing the scanner

This is useful because now the people working at the registration desk can see, as demonstrated in the image above, that I registered too late and therefore should only pick up a T-Shirt on the second day.

What is next?

There is a lot of room for improvement, but the fact that it had zero technical issues during registration or the event makes me very happy. Off the top of my head, here are some TODOs for next year:

  • We should have a CRON-style Lambda that handles the waiting list automation even further.
  • It should be easier for attendees to update their information.

Other than that, there are some non-functional TODOs:

  • Can we make the build/deploy a bit easier?
  • Should we port zureg to use one of the existing Haskell-on-Lambda frameworks?
  • I’m currently using somewhat fancy image scaling to get a sharp scaled up QR image, but this does not work if someone saves it on their phone – we should just do the scaling on the backend.

Any contributions in these areas are of course welcome!

Lastly, there’s the question of whether or not it makes sense for other events to use this. I discussed this briefly with Franz Thoma, one of the organizers of Munihac, who expressed similar gripes about Eventbrite.

As it currently stands, zureg is not an off-the-shelf solution and requires some customization for your event – meaning it only really makes sense for Haskell events. On the other hand, there are a few people who prefer doing this over mucking around in settings dashboards that are hugely complicated but still do not provide the necessary customization.


  1. I realize this is a bit creepy, and fortunately it turned out not to be necessary since we could do the custom confirmation flow.

  2. In serverless terminology, it seems to be common to refer to lambdas that deal with more than one specific endpoint or purpose as “fat lambdas”. I think this distracts from the issue a bit, since it’s more important to focus on how the code works and whether or not you can re-use it rather than how it is deployed – but coming from a functional programming perspective I very much enjoy the sound of “fat lambda”.

by Jasper Van der Jeugt at September 03, 2019 12:00 AM

September 02, 2019

Well-Typed.Com

Announcing the optics library

We are delighted to announce the first Hackage release of optics, a Haskell library for defining and using lenses, traversals, prisms and other optic kinds. The optics library is broadly similar in functionality to the well-established lens library, but uses an abstract interface rather than exposing the underlying implementation of each optic kind. It aims to be easier to understand than lens, with clearer interfaces, simpler types and better error messages, while retaining as much functionality as possible. It is being developed by Andrzej Rybczak and Adam Gundry, with significant contributions from Oleg Grenrus and Andres Löh, and much copy-pasting of code and selective copying of ideas from lens.

Example of optics types and error messages

Let’s dive straight into an example of using optics in GHCi. What is a lens?

*Optics> :info Lens
type Lens s t a b = Optic A_Lens NoIx s t a b

The Optic newtype unifies different optic kinds such as lenses, traversals and prisms. Its first type parameter, here A_Lens, indicates the optic kind in use. The second, NoIx, means that this is not an indexed optic (we will mostly ignore indexed optics for the purposes of this post). As in lens, the s and t parameters represent the types of the outer structure (before and after a type-changing update), and the a and b parameters represent the types of the inner field.

A lens can be constructed using, naturally enough, the lens function, which takes getter and setter functions and returns a Lens (i.e. an Optic A_Lens):

*Optics> :type lens
lens :: (s -> a) -> (s -> b -> t) -> Lens s t a b
*Optics> let l = lens (\(x,_) -> x) (\(_,y) x -> (x,y))
l :: Lens (a1, b) (a2, b) a1 a2

Given a lens we can use it to view the inner value within the outer structure, or set a new value:

*Optics> :type view
view :: Is k A_Getter => Optic' k is s a -> s -> a
*Optics> :type set
set :: Is k A_Setter => Optic k is s t a b -> b -> s -> t

Notice that these types are polymorphic in the optic kind k they accept, but specify very clearly what kind of optic they require.1 You can apply view to any optic kind k that can be converted to (i.e. is a subtype of) a Getter. The Is constraint implements subtyping using the typeclass system. In particular, we have instances for Is A_Lens A_Getter and Is A_Lens A_Setter so our lens l can be used with both operators:

*Optics> view l ('a','b')
'a'
*Optics> set l 'c' ('a','b')
('c','b')

If you try to use an optic kind that is not a subtype of the required type, a clear error message is given:

*Optics> :type sets
sets :: ((a -> b) -> s -> t) -> Setter s t a b
*Optics> :type view (sets fmap)

<interactive>:1:1: error:
A_Setter cannot be used as A_Getter
In the expression: view (sets fmap)

Composing optics

Optics are not functions, so they cannot be composed with the (.) operator. This may be viewed as a price to pay for the improved type inference and clearer type errors, but it is conceptually important: we regard optics as an abstract concept distinct from possible representations using functions, so it does not make sense to compose them with function composition or apply them with function application.2

Instead of (.), a separate composition operator (%) is provided:3

*Optics> :type l % l
l % l :: Optic A_Lens '[] ((a, b1), b2) ((b3, b1), b2) a b3
*Optics> view (l % l) (('x','y'),'z')
'x'

Composing optics of different kinds is fine, provided they have a common supertype, which the composition returns:

*Optics> :type l % sets fmap
l % sets fmap
  :: Functor f => Optic A_Setter '[] (f a, b1) (f b2, b1) a b2

However, some optic kinds do not have a common supertype, in which case a type error results from trying to compose them:

*Optics> :type to
to :: (s -> a) -> Getter s a
*Optics> :type to fst % sets fmap

<interactive>:1:1: error:
A_Getter cannot be composed with A_Setter
In the expression: to fst % sets fmap

The type of (%) itself is not entirely trivial. It relies on a type family Join to calculate the least upper bound of a pair of optic kinds:

*Optics> :type (%)
(%)
  :: (Is k (Join k l), Is l (Join k l)) =>
     Optic k is s t u v
     -> Optic l js u v a b -> Optic (Join k l) (Append is js) s t a b

However, you rarely work with (%) directly, and see only the results. The Join type family can be evaluated directly to determine how two optic kinds compose:

*Optics> :kind! Join A_Lens A_Setter
Join A_Lens A_Setter :: *
= A_Setter
*Optics> :kind! Join A_Getter A_Setter
Join A_Getter A_Setter :: *
= (TypeError ...)
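
A closed type family with a custom type error is enough to sketch how such a Join can work. The following is a cut-down illustration covering only three optic kinds; it is not the definition used by optics, and the KindName class is a hypothetical helper added only so the result can be observed at runtime.

```haskell
{-# LANGUAGE DataKinds, TypeFamilies, TypeOperators, UndecidableInstances #-}
import Data.Kind (Type)
import Data.Proxy (Proxy (..))
import GHC.TypeLits (ErrorMessage (..), TypeError)

-- Empty types used purely as tags for optic kinds.
data A_Lens
data A_Getter
data A_Setter

-- Least upper bound of two optic kinds; equations are tried top to bottom.
type family Join (k :: Type) (l :: Type) :: Type where
  Join k k = k
  Join A_Lens A_Getter = A_Getter
  Join A_Getter A_Lens = A_Getter
  Join A_Lens A_Setter = A_Setter
  Join A_Setter A_Lens = A_Setter
  -- No common supertype: report a readable error instead of getting stuck.
  Join k l =
    TypeError ('ShowType k ':<>: 'Text " cannot be composed with " ':<>: 'ShowType l)

-- Hypothetical helper: reflect a kind tag back to a string.
class KindName k where
  kindName :: Proxy k -> String

instance KindName A_Lens where kindName _ = "A_Lens"
instance KindName A_Getter where kindName _ = "A_Getter"
instance KindName A_Setter where kindName _ = "A_Setter"
```

Asking for kindName (Proxy :: Proxy (Join A_Lens A_Setter)) resolves to the A_Setter instance, while using Join A_Getter A_Setter in the same position is rejected at compile time with the custom message.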

A little lens comparison

For comparison, let’s try the same sequence of commands with lens. Here the underlying implementation using the van Laarhoven representation is rapidly visible:

Control.Lens> :info Lens
type Lens s t a b =
  forall (f :: * -> *). Functor f => (a -> f b) -> s -> f t
Control.Lens> :type lens
lens
  :: Functor f => (s -> a) -> (s -> b -> t) -> (a -> f b) -> s -> f t
Control.Lens> let l = lens (\(x,_) -> x) (\(_,y) x -> (x,y))
l :: Functor f => (a1 -> f a2) -> (a1, b) -> f (a2, b)

Using view and set is not much different:4

Control.Lens> :type view
view
  :: Control.Monad.Reader.Class.MonadReader s m =>
     Getting a s a -> m a
Control.Lens> :type set
set :: ASetter s t a b -> b -> s -> t
Control.Lens> view l ('a','b')
'a'
Control.Lens> set l 'c' ('a','b')
('c','b')
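
The van Laarhoven representation can be reproduced in a few lines using only base, which helps explain the types GHCi prints above. This is a from-scratch sketch, not the lens library's code: view and set here are simplified (the real view is generalised over MonadReader), and _1 is defined locally.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- A lens is a function that is polymorphic in the functor f.
type Lens s t a b = forall f. Functor f => (a -> f b) -> s -> f t

-- Build a lens from a getter and a setter.
lens :: (s -> a) -> (s -> b -> t) -> Lens s t a b
lens getter setter f s = fmap (setter s) (f (getter s))

-- Choosing f = Const a makes the lens collect the focused value.
view :: Lens s t a b -> s -> a
view l s = getConst (l Const s)

-- Choosing f = Identity makes the lens rebuild the structure.
set :: Lens s t a b -> b -> s -> t
set l b s = runIdentity (l (\_ -> Identity b) s)

-- Focus on the first component of a pair.
_1 :: Lens (a, c) (b, c) a b
_1 = lens fst (\(_, y) x -> (x, y))
```

Since a lens is an ordinary function here, _1 . _1 type-checks with plain function composition, which is exactly the behaviour that the abstract interface in optics gives up.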

However, attempting to use a Setter where a Getter is expected does not report an error immediately, and when it does, the message is somewhat inscrutable:

Control.Lens> :type sets
sets
  :: (Profunctor p, Profunctor q, Settable f) =>
     (p a b -> q s t) -> Optical p q f s t a b
Control.Lens> :type view (sets fmap)
view (sets fmap)
  :: (Control.Monad.Reader.Class.MonadReader (f b) m,
      Settable (Const b), Functor f) =>
     m b
Control.Lens> view (sets fmap) ('x','y')

<interactive>:82:7: error:
No instance for (Settable (Const Char))
        arising from a use of ‘sets’
...

Somewhat magically, lens uses the (.) function composition operator for optic composition:

Control.Lens> :type l . l
l . l
  :: Functor f => (a1 -> f a2) -> ((a1, b1), b2) -> f ((a2, b1), b2)
Control.Lens> view (l . l) (('x','y'),'z')
'x'

Even more magically, this automatically selects the appropriate supertype when composing different optic kinds:

Control.Lens> :type l . sets fmap
l . sets fmap
  :: (Settable f1, Functor f2) =>
     (a -> f1 b1) -> (f2 a, b2) -> f1 (f2 b1, b2)

Once more, however, illegitimate compositions are not detected immediately but lead to a type with class constraints that can never be usefully satisfied:

Control.Lens> :type to
to :: (Profunctor p, Contravariant f) => (s -> a) -> Optic' p f s a
Control.Lens> :type to fst . sets fmap
to fst . sets fmap
  :: (Contravariant f1, Settable f1, Functor f2) =>
     (b1 -> f1 b1) -> (f2 b1, b2) -> f1 (f2 b1, b2)

Overloaded labels

Suppose we define two datatypes with the same field name:

data Human = Human { name :: String } deriving Show
data Pet = Pet { name :: String } deriving Show

Now we have a problem if we try to use name as a record selector or in a record update, because it is ambiguous which datatype is meant. The DuplicateRecordFields GHC extension can help with this to some extent, but it makes very limited use of type information to resolve the ambiguity. For example, name (Human "Peter" :: Human) will work but name (Human "Peter") is still considered ambiguous.

The GHC OverloadedLabels extension is intended to help in this situation, by providing a new syntax #name for an “overloaded label” whose interpretation is determined by its type. In particular, we can use overloaded labels as optics by giving instances of the LabelOptic class, with a few GHC extensions and a bit of boilerplate (see footnote 5):

{-# LANGUAGE OverloadedLabels DataKinds FlexibleInstances MultiParamTypeClasses
             UndecidableInstances TypeFamilies #-}
instance (a ~ String, b ~ String) => LabelOptic "name" A_Lens Human Human a b where
  labelOptic = lens (\ (Human n) -> n) (\ _h n -> Human n )
instance (a ~ String, b ~ String) => LabelOptic "name" A_Lens Pet Pet a b where
  labelOptic = lens (\ (Pet n) -> n) ( \ _p n -> Pet n )

Now we can use #name as a Lens, and the types will determine which field of which record is intended:

*Optics> view #name (Human "Peter")
"Peter"
*Optics> set #name "Goldie" (Pet "Sparky")
Pet {name = "Goldie"}

For more details on the support for overloaded labels in optics, check out the Haddocks for Optics.Label.

The hierarchy of optics

In optics, the hierarchy of optic kinds is closed, i.e. it is not possible to discover and make use of new optic kinds without modifying the library. Our aim is to make it easier to understand the interfaces and uses of different optic kinds, but this comes at the cost of obscuring some of the underlying common structure of the van Laarhoven or profunctor representations. One concrete limitation relative to lens is that we have not yet explored support for non-empty folds and traversals (Fold1 and Traversal1).

The diagram below shows the hierarchy of optic kinds supported by the initial release. Each arrow points from a subtype to its immediate supertype, e.g. every Lens can be used as a Getter:

Optics hierarchy

The details of how indexed optics work are beyond the scope of this blog post (see the indexed optics Haddocks if you are interested), but the diagram below shows that every optic above Lens in the subtype hierarchy has an accompanying indexed variant:

Indexed optics

Summary

What are the key ideas underpinning the optics library?

  • Every optic kind has a clear separation between interface and implementation, with a newtype abstraction boundary. This means the types reflect concepts such as lenses directly, rather than encoding them using higher-rank polymorphism. This leads to good type inference behaviour and (hopefully) clear error messages.

  • The interface of each optic kind is clearly and systematically documented. See the documentation for Optics.Lens as an example.

  • Since optics are not functions, they cannot be composed with the (.) operator. Instead a separate composition operator (%) is provided.

  • Subtyping between different optic kinds (e.g. using a lens as a traversal) is accomplished using typeclasses. This is mostly automatic, although explicit casts are possible and occasionally necessary.

  • Optics work with the OverloadedLabels GHC extension to allow the same name to be used for fields in different datatypes.

  • Under the hood, optics uses the indexed profunctor encoding (rather than the van Laarhoven encoding used by lens). This allows us to support affine optics (which have at most one target). We provide conversions between the optics and lens representations; for isomorphisms and prisms these are in a separate package optics-vl as this incurs a dependency on profunctors.

  • Indexed optics have a generally similar user experience to lens, but with different ergonomics (e.g. all optics are index-preserving, and there is no separate Conjoined class).

  • The main Optics module exposes only a restricted selection of operators, making inevitably opinionated choices about which operators are the most generally useful.

  • Sometimes functions in optics have a more specific type than the most general type possible, in the interests of simplicity and reducing the likelihood of errors. For example, view does not work on folds; instead there is a separate function foldOf to eliminate folds, or gview if you really want additional polymorphism.

  • For library writers who wish to define optics as part of their library interface, we provide a cut-down optics-core package with significant functionality but minimal dependencies (only GHC boot libraries). Unlike lens, it is not possible to define lenses without depending on at least optics-core.
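
A few of these points can be illustrated with a short sketch. It assumes the optics package is installed and uses only names that the main Optics module is expected to export (view, over, toListOf, foldOf, _1, _2, traversed and %). The top-level names are invented for illustration.

```haskell
import Optics

-- Two lenses composed with (%) give another lens, so view applies.
firstOfFirst :: Char
firstOfFirst = view (_1 % _1) (('x', 'y'), 'z')

-- A traversal composed with a lens is a traversal; over still works
-- because every traversal can be used as a setter.
bumped :: [(Int, Char)]
bumped = over (traversed % _1) (+ 1) [(1, 'a'), (2, 'b')]

-- view is not offered for traversals or folds, so the targets are
-- collected with toListOf or combined monoidally with foldOf.
targets :: [Int]
targets = toListOf (traversed % _1) [(1, 'a'), (2, 'b')]

combined :: String
combined = foldOf (traversed % _2) [(1 :: Int, "ab"), (2, "cd")]
```

Replacing toListOf with view in targets should be rejected at compile time, which is the more specific typing described above.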

For a full introduction to optics, check out the Haddocks for the main Optics module. We welcome feedback and contributions on the GitHub well-typed/optics repo.

Acknowledgements

I would like to thank my coauthors Andrzej Rybczak, Oleg Grenrus and Andres Löh for all their work on optics. Edsko de Vries, Alp Mestanogullari, Ömer Sinan Ağacan and other colleagues at Well-Typed gave helpful feedback on the library in general and this blog post in particular. Thanks are also due to Edward Kmett for his work on lens and for critiquing (though not necessarily endorsing!) the ideas behind this library.


  1. They are also polymorphic in is, so they can be used with both indexed and unindexed optics.

  2. Neither do optics form a Category, because this would rule out optics with type-changing update or composition of optics of different kinds.

  3. An implementation detail leaks through here: the empty list '[] corresponds to NoIx and represents the empty list of indices, meaning that this optic is not indexed.

  4. lens generalises view over any MonadReader, and permits it to work on folds, whereas optics chooses not to by default. We provide a gview function in Optics.View that can be used similarly to view from lens.

  5. The boilerplate can be generated by Template Haskell now, and we are exploring making use of Generic instead. In the future we may be able to use a planned but not-yet-implemented addition to the GHC HasField class.

by adam at September 02, 2019 09:42 AM

August 30, 2019

Theory Lunch (Institute of Cybernetics, Tallinn)

Positive expansivity is impossible for reversible cellular automata

On Thursday 29 August 2019 I gave a talk about expansivity. I focused on positive expansivity and discussed a general statement which has a most remarkable consequence for cellular automata theory.

The talk is available on my personal blog.

by Silvio Capobianco at August 30, 2019 02:12 PM

Matt Parsons

Why 'Functor' Doesn't Matter

Alternative, less click-baity title: Names Do Not Transmit Meaning

People often complain about the names for concepts that are commonly used in Functional Programming, especially Haskell. Functor, monoid, monad, foldable, traversable, arrow, optics, etc. They’re weird words! Functor comes from category theory, Monoid comes from abstract algebra. Arrow comes from – well it’s just kind of made up! Optics, lenses, prisms, etc are all somewhat strange metaphors for what’s going on. What’s the deal? Why can’t they just pick simple names that mean what they are? Why can’t they use practical and ordinary terms, the way that Object Oriented Programming does?

Some people strengthen their complaint with moral urgency: Functor is a confusing word,

  • and it makes it more difficult for people to learn Haskell,
  • and this makes Haskell an elitist, non-inclusive language,
  • and if we just used “my favorite term” instead, it wouldn’t be a problem!

So, let’s talk about names. What are they? What do they do? How do they matter, and why?

What’s in a name?

My name is Matthew Parsons. If you google “Matthew Parsons”, you’ll see a bunch of footballers, a doctor, a professor, a radio personality. My blog is the last entry on the second page of Google results for my name, which I’m pretty proud of.

My name isn’t globally unique - there are a lot of Matt Parsons running around. Many of them think that they have my email, and sign me up for all kinds of silly stuff (and some serious stuff, too). If you further qualify the name - by appending ‘Haskell’ to the search query - then you get a ton of stuff that points to me. I appear to be the most prominent Haskell programmer named Matt Parsons (for now).

If I’m in a group of folks, and you say the word “Matt,” I’m going to assume you’re trying to get my attention. Unless there’s another Matt in the group, at which point you’ll probably say “Matt Parsons” or similar to disambiguate. I’m about to go on a bikepacking trip with two other Matts. I suspect my first name will be dropped entirely on this trip.

What does my name tell you about me? Almost nothing. I’m an English speaking male, probably of British descent. But it doesn’t tell you that I like kittens, that I like Haskell, that I like to ride bikes, or that I dislike the color red. It’s merely an imperfect, globally duplicated pointer.

What’s a name good for?

It’s a pointer with an ambiguous address space. It’s a key in a nondeterministic map. It’s a database ID column with an index, but not a unique index.

They’re bad! They don’t scale, at all. I think of a concept, I say a word, and hopefully this points to the same concept in your brain. We use names as shortcuts for communicating common concepts. If we need to learn more about a concept, we can look up the name and try to find relationships to other names.

Names can’t transmit meaning. They just point to concepts. Concepts must be explained and understood, usually in terms of a large quantity of simpler or more familiar names. If we want a name to fully describe the concept it points to, then it must be a very simple concept indeed.

How can we judge a name?

Names can’t transmit meaning, and so a name shouldn’t be judged on how well it transmits meaning. That doesn’t mean that names can’t be judged at all - there are good and bad aspects to names.

  • How reliably does it point to the right concept?
  • How memorable is it?
  • How easy is it to pronounce (for the language it originated in)?
  • How aesthetically appealing is it?

The last three are pretty subjective - I used to find it difficult to remember the difference between Monoid and Monad because they both have the form mon*d, and because people kept saying things like “Monads are monoids for functors.” I find a word like ‘illuminate’ pretty and ‘buzzfeed’ gross, which is totally just because I am weird and have opinions about this.

Reliability is also subjective. It all depends on context and familiarity. It’s essentially impossible to have fully unique names for ideas - even if you pick something totally novel, someone else can come along and use your unique name for a totally different concept.

Functor gets picked on a lot, so let’s look at that. The word functor has three meanings, one in linguistics, one in object oriented programming, and one in category theory. This is pretty good - only three collisions, and it is usually pretty clear what you mean based on context clues.

The best (but still extremely bad) alternative name to Functor is Mappable. It’s the best alternative because it is the least misleading - I’ve seen people suggest Iterable, Streaming, Container, Lift, and they’re all dramatically more misleading.

The Wiktionary page relates it to maps, suggesting that it means you “can make a geographical map of a thing,” or that you can construct a “mapping” between two sets of things. Grammatically, it implies that you can use a verb “map” over the thing.

So let’s look at the Wiktionary entry for ‘map’. At a first glance, it’s a way bigger page than Functor. There’s a common understanding: geographic maps, like you use to navigate a new city. The next most common understanding is more abstract:

A graphical representation of the relationships between objects, components or themes.

The third definition is from math, and makes it a synonym for ‘function’. But that’s not quite right - after all, Mappable x implies “I can map x,” and you can apply a function to any value in Haskell. So that doesn’t really give any additional clarity.

The other meanings are completely out-of-bounds, and it’s unlikely that someone would be confused. This name, Mappable, points to a bunch of potential meanings already, and none of them are really right. So we can create a new meaning that Mappable points to, and hope that people infer the right one.

But we still have to explain what a Mappable is. “What’s a Mappable? Well, it’s something you can map over!” is a terrible explanation! It’s literally just the grammatical expansion of the word. All it does is move the question one bit further:

What does it mean to be able to map over something?

Wait, “map over”? This is unfamiliar terminology. I know about “maps” like Google Maps. I know that I can ‘map’ a space out and provide information about how to get from here to there.

You might think that because you already learned map from JavaScript or Python that it’s a good enough name. But that’s an argument from familiarity. Rubyists and Smalltalkers are more familiar with the name collect for this operation. If they want to call it Collectable, then who are we to stop them?

Fact is, “mapping” isn’t an easy concept, no matter what you call it. We could call it “florbing” and it would require the exact same amount of instruction and understanding for the concept to work out.

Worse yet, a Mappable is a more permissive concept than a Functor. There are things that are Mappable that are not a Functor, because a Functor is structure preserving. This means that the following laws must hold:

composition:
    fmap f . fmap g = fmap (f . g)

identity:
    fmap id = id

A Set datatype (collection of unique objects) is not a functor, because it is possible for a choice of f and g to violate the composition law. Likewise, a datatype Counter that keeps track of how many times you call fmap on it is not a Functor because fmap id would not be equal to id. These laws are important, because they allow us to perform refactoring and simplify the possibilities when thinking about code.
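
Both counterexamples can be written down concretely. The sketch below assumes Data.Set from the containers package, and the Mod3 wrapper and Counter type are hypothetical, made up for illustration. The Set case relies on an Eq instance that treats distinguishable values as equal, which is what lets a choice of f and g break the composition law.

```haskell
import qualified Data.Set as Set

-- A wrapper whose Eq and Ord only look at the value modulo 3, so
-- distinct Ints can compare as equal.
newtype Mod3 = Mod3 Int

unMod3 :: Mod3 -> Int
unMod3 (Mod3 n) = n

instance Eq Mod3 where
  Mod3 a == Mod3 b = a `mod` 3 == b `mod` 3

instance Ord Mod3 where
  compare (Mod3 a) (Mod3 b) = compare (a `mod` 3) (b `mod` 3)

input :: Set.Set Int
input = Set.fromList [0, 3]

-- Mapping in two steps: Mod3 0 and Mod3 3 collapse into one element.
twoSteps :: Set.Set Int
twoSteps = (Set.map unMod3 . Set.map Mod3) input

-- Mapping once with the composed function, which is the identity on Int.
oneStep :: Set.Set Int
oneStep = Set.map (unMod3 . Mod3) input

-- False: the two sides of the composition law differ in size.
compositionHolds :: Bool
compositionHolds = twoSteps == oneStep

-- A Counter that records how many times it has been mapped over.
data Counter a = Counter Int a
  deriving (Eq, Show)

instance Functor Counter where
  fmap f (Counter n a) = Counter (n + 1) (f a)

-- False: fmap id changes the count, so it is not the identity.
identityHolds :: Bool
identityHolds = fmap id (Counter 0 'x') == Counter 0 'x'
```

GHC accepts the Functor instance for Counter without complaint. The laws are a contract that the compiler does not check, which is why they have to be part of what the name points to.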

Mappable is an OK concept, but it ain’t a Functor, and there’s no way I’m trading the name for StructurePreservingMappable.

So what makes a name bad?

Names can’t transmit meaning. They can transmit a pointer, though, which might point to some meaning. If that meaning isn’t the right meaning, then the recipient will misunderstand. Misunderstandings like this can be difficult to track down, because our brains don’t give us a type error with a line and column number to look at. Instead, we just feel confused, and we have to dig through our concept graph to figure out what’s missing or wrong.

Object Oriented Programming is littered with terrible names, precisely because they mislead and cause a false familiarity. Object, Class, Visitor, Factory, Command, Strategy, Interface, Adapter, Bridge, Composite. All of these are common English words with a relatively familiar understanding to them. And all of them are misleading.

“Object” is possibly the most reasonable name choice - the English word ‘object’ just refers to any random physical thing, or grammatically speaking, the target of a subject - something we act upon. After that, it’s misleading names causing confusion.

What’s a class? It’s a blueprint for objects! Why not call it an ObjectBlueprint? Uhhh… And how does that relate to static class variables, and other attributes of classes?

What’s the visitor pattern? What does it mean for my code to “visit” another piece of code? You need to understand the abstract meaning of these words before “visitor pattern” means anything to you.

What’s the difference between an interface, a bridge, and an adapter? These three terms are all idioms for the same sort of concept in English, but they have rather different precise meanings in programming.

Monad - now that’s a name that didn’t mean anything to me when I first read it. I read it, and I immediately knew that I didn’t understand the underlying concept. At the time, I was so tired of reading familiar words, assuming they meant something that I understood, and stepping on abstract landmines that betrayed my lack of understanding. Monad was a breath of fresh air. A new concept, and a new name to go with it!

Concepts are hard. Names don’t make them any easier or harder to understand. Names are only useful in their value as pointers, and to establish relationships between concepts.

Functor is hard to learn. It is not hard to learn because it is named Functor. If you renamed it to anything else, you’d have just as hard of a time, and you’d be cutting off your student from all of the resources and information currently using the word “functor” to refer to that concept.

August 30, 2019 12:00 AM

August 29, 2019

FP Complete

Command line parsing with clap

This post is part of a series on implementing SortaSecret.com.

clap is a library which provides the ability to parse command line options. For SortaSecret.com, we have relatively simple parsing needs: two subcommands and some options. Some of the options are, well, optional, while others are required. And one of the subcommands must be selected. We'll demonstrate how to parse that with clap.

Keep in mind that in addition to the clap interface I'll be using below, there is also a structopt library that more directly parses arguments into structures. This article will not cover structopt at all; a future article may do that instead. Also, in the future it looks like structopt functionality will be merged into clap.

Also, final note: the API docs for clap are really good and include plenty of worked examples. Please check those out as well!

Reader prerequisites

This blog post will assume basic knowledge of the Rust programming language, and that you have the command line tooling (rustup, cargo, etc) installed. If you'd like more information, see our Rust getting started guide.

Simple example

Let's write a simple command line argument parser to make sure everything's working. Start off with a cargo new clap1 --bin to start a new project, and then add clap = "2.33" underneath [dependencies] in Cargo.toml. Inside the generated clap1 directory, run cargo run, which should build clap and its dependencies, and then print Hello, world!. Or more fully, on my machine:

First run of clap1

Of course, this isn't using clap yet. Let's fix that. We're going to parse the command line option --name to find out who to say hello to. To do this with clap, we're going to follow the basic steps of:

  • Defining a new App value in builder style
  • Continuing with this builder style, add in arguments we want parsed
  • Get the matches for these arguments against the actual command line arguments
  • Extract the matched values and use them

Here's our code doing all of this:

extern crate clap;

use clap::{Arg, App};

fn main() {
    // basic app information
    let app = App::new("hello-clap")
        .version("1.0")
        .about("Says hello")
        .author("Michael Snoyman");

    // Define the name command line option
    let name_option = Arg::with_name("name")
        .long("name") // allow --name
        .takes_value(true)
        .help("Who to say hello to")
        .required(true);

    // now add in the argument we want to parse
    let app = app.arg(name_option);

    // extract the matches
    let matches = app.get_matches();

    // Extract the actual name
    let name = matches.value_of("name")
        .expect("This can't be None, we said it was required");

    println!("Hello, {}!", name);
}

You can run this with cargo run, which should result in something like:

$ cargo run
   Compiling clap1 v0.1.0 (/Users/michael/Desktop/clap1)
    Finished dev [unoptimized + debuginfo] target(s) in 1.06s
     Running `target/debug/clap1`
error: The following required arguments were not provided:
    --name <name>

USAGE:
    clap1 --name <name>

For more information try --help

To pass command line options to our executable, we need to add an extra -- after cargo run, to tell cargo that the remainder of the command line options should be passed verbatim to the clap1 executable and not parsed by cargo itself. For example:

$ cargo run -- --help
    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/clap1 --help`
hello-clap 1.0
Michael Snoyman
Says hello

USAGE:
    clap1 --name <name>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
        --name <name>    Who to say hello to

And, of course:

$ cargo run -- --name Rust
    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/clap1 --name Rust`
Hello, Rust!

Exercises

  • Modify this program to make the name optional, and provide a reasonable default name
  • Remove the intermediate names for app and name_option and instead use full-on builder style. You'll end up with something like let matches = App::new...
  • Support a short name version of the name argument, so that cargo run -- -n Rust works

Extracting to a struct

It's all well and good to get the matches. But one of the best draws of Rust in my opinion is strongly typed applications. To make that a reality, my preference is to have an intermediate step between the matching and the actual application code where we extract the command line options into a struct. This isolates our parsing logic to one area, and lets the compiler help us everywhere else.

By the way, this is exactly the kind of thing that structopt does for us. But today, we'll do it directly with clap.

The first step is to define the struct. In our example, we had exactly one real argument, a name, which was a required String. So our struct will look like:

struct HelloArgs {
    name: String,
}

And then, we can essentially copy-paste our original code into an impl for this struct, something like:

impl HelloArgs {
    fn new() -> Self {
        // basic app information
        let app = App::new("hello-clap")
        ...
        // Extract the actual name
        let name = matches.value_of("name")
            .expect("This can't be None, we said it was required");

        HelloArgs { name: name.to_string() }
    }
}

And we can use it with

fn main() {
    let hello = HelloArgs::new();

    println!("Hello, {}!", hello.name);
}

Since the code is almost entirely unchanged, I won't include it inline here, but you can see it as a GitHub Gist.

Exercises

  • Like before, modify the program to make the name optional. But do it by changing the struct HelloArgs to have an Option<String> field.

Better testing

The above refactoring doesn't really give us much, especially on such a small program. However, we can already leverage this to start doing some testing. To make that work, we first want to modify our new method to take the command line arguments as an argument, instead of reading them from the process environment. We'll also need to modify the return value to return a Result to deal with failed parses. Up until now, we've been relying on clap itself to print an error message and exit the process automatically.

First, let's look at what our signature is going to be (OsString needs a use std::ffi::OsString; at the top of the file):

fn new_from<I, T>(args: I) -> Result<Self, clap::Error>
where
    I: Iterator<Item = T>,
    T: Into<OsString> + Clone,

And using clap::Error's exit() method, we can recover our original new function working off of the actual command line arguments and exiting the program on a bad parse fairly easily:

fn new() -> Self {
    Self::new_from(std::env::args_os().into_iter()).unwrap_or_else(|e| e.exit())
}

And within our new_from method, we just need to replace the call to get_matches with:

// extract the matches
let matches = app.get_matches_from_safe(args)?;

And wrap the final value with Ok:

Ok(HelloArgs {
    name: name.to_string(),
})

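Awesome, now we're ready to write some tests. First, one note: when we provide the list of arguments, the first argument is always the executable name. Also, unwrap_err and assert_eq! need HelloArgs to implement Debug and PartialEq, so the struct gets #[derive(Debug, PartialEq)], as in the full code at the end. Our first test will make sure that we get an argument parse error when no arguments are provided, which looks like this: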

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn test_no_args() {
        HelloArgs::new_from(["exename"].iter()).unwrap_err();
    }
}

We can also test that using the --name option without a value doesn't parse:

#[test]
fn test_incomplete_name() {
    HelloArgs::new_from(["exename", "--name"].iter()).unwrap_err();
}

And finally test that things work when a name is provided:

#[test]
fn test_complete_name() {
    assert_eq!(
        HelloArgs::new_from(["exename", "--name", "Hello"].iter()).unwrap(),
        HelloArgs { name: "Hello".to_string() }
    );
}

Property checking

There's still one piece of this whole puzzle that bothers me. The caller of new or new_from knows it has received a nicely typed value. However, within new_from, we are using expect for cases that should be impossible. For example, our name_option sets required to true, and therefore we know the matches.value_of("name") call will not return a None. However, what guarantees do we have that we remembered to set required to true?

One approach to improve the situation is to use property testing. In the case of the argument parsing, I can state a simple property: for all possible strings I can send as input, the parse will either return a valid HelloArgs or generate a clap::Error. Under no circumstances, however, should it panic. And using the quickcheck and quickcheck_macros crates, we can test exactly this! First we add the following to the top of our file:

#[cfg(test)]
extern crate quickcheck;
#[cfg(test)]
#[macro_use(quickcheck)]
extern crate quickcheck_macros;

And then write the nice little property:

#[quickcheck]
fn prop_never_panics(args: Vec<String>) {
    let _ignored = HelloArgs::new_from(args.iter());
}

And sure enough, if you set required to false, this property will fail.

I'll include the full code for this at the end of the article.

Just scratching the surface

There are lots of other things you may want to do with clap, and we're not going to cover all of them here. It's a great library, and produces wonderful CLIs. You may be a bit surprised that this article claims to be part of the series on implementing SortaSecret.com. This is where SortaSecret comes in. The source code includes the cli module, which has an example of a subcommand and therefore uses an enum to handle the different variants.

Full code

extern crate clap;

use clap::{App, Arg};
use std::ffi::OsString;

#[cfg(test)]
extern crate quickcheck;
#[cfg(test)]
#[macro_use(quickcheck)]
extern crate quickcheck_macros;

#[derive(Debug, PartialEq)]
struct HelloArgs {
    name: String,
}

impl HelloArgs {
    fn new() -> Self {
        Self::new_from(std::env::args_os().into_iter()).unwrap_or_else(|e| e.exit())
    }

    fn new_from<I, T>(args: I) -> Result<Self, clap::Error>
    where
        I: Iterator<Item = T>,
        T: Into<OsString> + Clone,
    {
        // basic app information
        let app = App::new("hello-clap")
            .version("1.0")
            .about("Says hello")
            .author("Michael Snoyman");

        // Define the name command line option
        let name_option = Arg::with_name("name")
            .long("name") // allow --name
            .short("n") // allow -n
            .takes_value(true)
            .help("Who to say hello to")
            .required(true);

        // now add in the argument we want to parse
        let app = app.arg(name_option);
        // extract the matches
        let matches = app.get_matches_from_safe(args)?;

        // Extract the actual name
        let name = matches
            .value_of("name")
            .expect("This can't be None, we said it was required");

        Ok(HelloArgs {
            name: name.to_string(),
        })
    }
}

fn main() {
    let hello = HelloArgs::new();

    println!("Hello, {}!", hello.name);
}

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn test_no_args() {
        HelloArgs::new_from(["exename"].iter()).unwrap_err();
    }

    #[test]
    fn test_incomplete_name() {
        HelloArgs::new_from(["exename", "--name"].iter()).unwrap_err();
    }

    #[test]
    fn test_complete_name() {
        assert_eq!(
            HelloArgs::new_from(["exename", "--name", "Hello"].iter()).unwrap(),
            HelloArgs { name: "Hello".to_string() }
        );
    }

    #[test]
    fn test_short_name() {
        assert_eq!(
            HelloArgs::new_from(["exename", "-n", "Hello"].iter()).unwrap(),
            HelloArgs { name: "Hello".to_string() }
        );
    }

    /* This property will fail, can you guess why?
    #[quickcheck]
    fn prop_any_name(name: String) {
        assert_eq!(
            HelloArgs::new_from(["exename", "-n", &name].iter()).unwrap(),
            HelloArgs { name }
        );
    }
    */

    #[quickcheck]
    fn prop_never_panics(args: Vec<String>) {
        let _ignored = HelloArgs::new_from(args.iter());
    }
}

What's next?

I'd recommend checking out the rest of our SortaSecret content, and having a look at our Rust homepage.

August 29, 2019 02:52 AM

August 26, 2019

Chris Smith 2

Some editing updates to CodeWorld

I spent the last week in Berlin for the 2019 International Conference on Functional Programming. While I was there, though, I also took some time to work on outstanding CodeWorld issues. Here are a few updates.

Automatic indents

One of the earliest things students struggle with in CodeWorld is managing indentation of equations. The rule is simple enough: new equations must start at the left margin, and any line wraps are indented. But students still forget this rule often, and the error messages aren’t always easy to understand. This is one of the most common mistakes that students make in writing Haskell code, simply because there are so many chances to make it!
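
As an illustration of the rule, here is a small sketch in plain Haskell (not CodeWorld's own dialect; the function names are invented for the example). Every equation and type signature starts in the first column, and every continuation line is indented.

```haskell
-- Each new equation starts at the left margin.
area :: Double -> Double -> Double
area width height =
  width * height -- a wrapped line is indented, so it continues the equation

perimeter :: Double -> Double -> Double
perimeter width height =
  2 * width
    + 2 * height -- a line that starts with an operator must be indented too
```

If the line starting with + were moved to the left margin, the compiler would try to read it as the start of a new declaration and report a parse error, which is the mistake described above.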

Instead of trying to improve the error messages, I decided to make the error harder to make in the first place. I’m most of the way there.

Demo of auto-indent for CodeWorld

The new indent mode does what is needed, at least. When you press Enter in most situations where it’s known that you need to continue the current statement, such as if you’re in parentheses or have just typed an operator that needs another operand, the indent is automatic. When you finish an expression, the new line is no longer indented. If you start a line with an operator, the indent is added to make it part of the previous line. It even does the right thing when keywords force a layout context to end, so typing in or where will often drop some indent on the line so that the new expression starts in the right place.

Note that this is not a source beautifier. It doesn’t pay any attention to how to make the source code look good; only to adding indents where necessary to keep the compiler working. Tweaking the indent rules to look better is an open area for improvement. So is making the Tab and Backspace keys work more intuitively by skipping to indent markers. But I believe this will reduce student errors significantly as they get started.

After all, the best kind of error is the one you never make!

Cleaner holes, and other things

The second thing I did was clean up GHC error messages for the typed holes feature. Typed holes are a feature of GHC that lets you type a single underscore in place of any expression. When you do this, you get a compile warning that identifies the hole and tells you what type it has, and which defined variables could fit there.

Typed holes in a CodeWorld program
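
For readers who have not used the feature, here is a minimal sketch in plain Haskell rather than CodeWorld's dialect, with an invented function name. The hole version is shown in a comment so that the block still compiles as written.

```haskell
-- With a hole, the definition would read:
--
--   total xs = foldr _ 0 xs
--
-- GHC then reports that the hole needs type Int -> Int -> Int and
-- suggests bindings in scope that fit. Filling the hole with (+)
-- gives the finished definition below.
total :: [Int] -> Int
total xs = foldr (+) 0 xs
```

The type signature is what makes the report useful: from it, the compiler can work out exactly what kind of expression belongs in the blank.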

Typed holes are a GHC feature, so they already worked in CodeWorld. But the messages printed were daunting, for several reasons. Among them:

  • They included a list of “relevant bindings” that students would not understand.
  • They contained long-winded explanatory text about arguments to pass to GHC to control the display of information.
  • They included notation with visible type applications when suggesting polymorphic functions.
  • They just triggered a lot of bad corner cases in CodeWorld error rewriting that mangled the messages a lot.

It took a lot of fiddling with regular expressions, but I’ve cleaned up these problems so that the message pretty much says what you care about most: there’s a hole that needs to be filled, and here are some things you can fill it with.

So why did I suddenly care? Actually, it has to do with an ICFP session I attended: a tutorial session on teaching functional programming led by Michael Sperber. In the session, Michael talked a lot about how he asks his students to explicitly write “…” in their code as a placeholder for something that should be filled in later. The idea is to jolt students out of the habit of trying to put everything together in one pass and type the finished code, by instead asking (even requiring!) them to type fragments of incomplete code first, and then put it together later.

When doing this in Haskell, one could still type “…”, but it seems a bit of a waste when there’s already a language feature that’s just as good (instead of an ellipsis, it’s a blank to be filled in), which will also guide the student to find an appropriate term. I am now interested in trying this out in teaching, to see if students will learn better when they are instructed to first put blanks where the function arguments go, and then type them again as a separate step.

by Chris Smith at August 26, 2019 04:51 AM

August 25, 2019

Russell O'Connor

Counterfactual Definiteness and the EPR paradox

Many articles have been written on the EPR paradox and Bell’s inequality. I want to write down, for my own reference, what the crux of the paradox is, how it relates to counterfactual definiteness, what the various philosophical resolutions are, and why I feel that Everett’s many-worlds interpretation is the least objectionable. By and large, I will be following Guy Blaylock’s paper “The EPR paradox, Bell’s inequality, and the question of locality”, and you probably ought to be reading that paper instead of this blog post.

Counterfactual definiteness is the claim that experiments that were not performed but could have been performed would have had definite outcomes if they had been performed. For most people, counterfactual definiteness is intuitive; after all, science is all about making predictions about the outcomes of experiments that may or may not actually be performed. However, counterfactual definiteness is problematic in the face of predictions made by quantum mechanics and special relativity, as we shall see.

Let us set up a standard EPR thought experiment. Suppose Alice and Bob are placed very far apart from each other and at rest relative to each other and have synchronized their clocks. They are each sent a stream of photons entangled with the other party’s stream; let us say a thousand pairs of entangled photons. While this experiment could be analyzed with just a single pair of entangled photons, the paradox is clearer with a stream of entangled photons. Alice and Bob simultaneously choose an angle to measure their photon streams at, and then simultaneously measure the polarization of their stream of photons at their chosen angle. Let us say they end up choosing the same angle, which we will label as measuring at 0°. We postulate Alice and Bob are far enough apart that all of Alice’s measurements are performed in a space-like separated manner from all of Bob’s measurements. Alice and Bob record the results of their measurements and travel to meet up afterward to compare notes.

Let us say Alice tabulated the following results for her measurements: +---+--+-+--+--+…, where + means the photon was measured as parallel to the alignment of her detector and - means the photon was measured perpendicular to the alignment of her detector. Bob will have recorded the following result: +---+--+-+--+--+…. Their results are identical because they both performed measurements of entangled photons at the same measurement angle. Nothing surprising here.

But suppose, counterfactually, Alice had decided to measure her photons at an angle of 41.4° instead. What would have happened at Alice and Bob’s meeting? Presumably since Bob’s experiment has not changed, and his experiment was space-like separated from Alice’s experiment, his results do not depend on what experiment Alice decides to perform, so his notes would still record the result: +---+--+-+--+--+…. Quantum mechanics predicts that counterfactual Alice and Bob’s results should differ in about 25% of the entries in this counterfactual scenario. So counterfactual Alice’s notes would have perhaps recorded something like -+--+--+--+-+--+…, or perhaps something different. But whatever she recorded, it would be something that differed in about 25% of the entries when compared to Bob’s result. So far so good.

Now suppose, counterfactually, Bob decided to measure at an angle of -41.4° and it was Alice who kept her measurement at 0°. What would have happened at Alice and Bob’s meeting in this case? By the same logic, Alice’s measurements at 0° would still get the result +---+--+-+--+--+…, and it is counterfactual Bob whose reported measurements differ in 25% of the entries. Because counterfactual Bob measures at a negative angle, we don’t expect his notes to necessarily agree with the previous notes of counterfactual Alice. Maybe counterfactual Bob’s notes would have recorded something like ++--++-+-+--+--+…, or perhaps something different, or perhaps they might even agree with the notes of counterfactual Alice. Everything is still okay, but maybe we are getting a little nervous.

Finally, let us suppose, counterfactually, Alice had decided to measure her photons at an angle of 41.4° and Bob had decided to measure his photons at an angle of -41.4°. What would have happened at Alice and Bob’s meeting in this case? Alice and Bob’s experiments are space-like separated so neither of their choices should influence the outcomes of each other’s experiments. Presumably Alice’s notes would be the same as what we wrote above for counterfactual Alice’s notes: -+--+--+--+-+--+…. Similarly Bob’s notes would be the same as what we wrote above for counterfactual Bob’s notes: ++--++-+-+--+--+…. Here is the crux of the EPR paradox. Quantum mechanics predicts that counterfactual Alice and counterfactual Bob’s notes ought to differ in approximately 87.5% of the entries in this scenario! But no matter how we rearrange counterfactual Alice and counterfactual Bob’s notes, they can only differ between 0% and 50% of their entries on average. This is what it means for Bell’s inequality to be violated.
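The 0% to 50% bound is the triangle inequality for the number of differing entries: if counterfactual Alice’s notes differ from the shared 0° record in about 25% of the entries, and counterfactual Bob’s notes do too, then the two sets of notes cannot differ from each other in more than about 50%. Here is a minimal sketch in Haskell, using the illustrative 16-entry prefixes written above as sample data. The names hamming, base, aliceCf, and bobCf are made up for this example, and the sample strings are only the "something like" records from the text, not predicted outcomes.

```haskell
module Main where

-- Number of positions at which two records of equal length differ.
hamming :: String -> String -> Int
hamming xs ys = length (filter id (zipWith (/=) xs ys))

base, aliceCf, bobCf :: String
base    = "+---+--+-+--+--+"  -- both parties measuring at 0 degrees
aliceCf = "-+--+--+--+-+--+"  -- counterfactual Alice at 41.4 degrees
bobCf   = "++--++-+-+--+--+"  -- counterfactual Bob at -41.4 degrees

main :: IO ()
main = do
  let dAlice = hamming aliceCf base
      dBob   = hamming bobCf base
      dBoth  = hamming aliceCf bobCf
  print (dAlice, dBob, dBoth)
  -- Triangle inequality: the Alice-versus-Bob distance can never exceed
  -- the sum of the two distances to the shared record.
  print (dBoth <= dAlice + dBob)
```

The inequality holds for any three records of equal length, which is why no choice of definite counterfactual notes can reach a difference of 87.5%.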

Clearly something is wrong in our naive description of the hypothetical experiments above. What are some proposed philosophical resolutions to this EPR paradox?

One possible resolution is that Alice’s choice in her measurement does somehow affect the outcome of Bob’s experiment! The problem with this is that Alice and Bob’s experiments are space-like separated. This implies that an observer traveling rapidly towards Bob and away from Alice will observe that Bob’s experiments conclude before Alice even begins her experiment, when she makes her choice of whether to measure at angle 0° or 41.4°. According to this resolution, this observer sees Alice’s choices affecting the outcome of already completed experiments!

A symmetric possible resolution is that it is Bob’s choice in his measurement that affects the outcome of Alice’s experiment. But we have the same issue as above. There still exists an observer, this time traveling towards Alice and away from Bob, who observes that Alice completes her experiments before Bob begins his experiment.

Non-local interpretations of quantum mechanics, including the Copenhagen interpretation and hidden variable interpretations such as pilot wave theory, resolve the EPR paradox in the above manner. They suggest there is some special global reference frame that is used to absolutely decide which of Alice and Bob’s experiments is performed first, and whichever experiment comes first in this special reference frame is the one whose outcome affects the other’s experiment. They suggest that the rest of the laws of physics conspire to keep all agents in the dark about which reference frame is this special global reference frame, as there are no experiments that can determine which reference frame is the special one. In particular, we cannot actually perform an experiment where we go back in time to see what would have happened if Alice or Bob had chosen a different angle of measurement.

Furthermore, in general relativity, I suspect it is more difficult, and probably impossible, to come up with any globally consistent universal reference frame to resolve the order of all events.

Of course, it could also be the case that both Alice’s and Bob’s choices affect the outcome of each other’s experiments. But this only makes the problem worse, as it would mean that in every reference frame there are future events affecting past outcomes.

Another resolution to the EPR paradox is that Alice and Bob could not have chosen different angles of polarization; if they both measure at angle 0° then that is the only choice they could have made and Alice and Bob do not have free choices in the matter. This resolution is called superdeterminism. We can make our thought experiment more extreme by taking Alice and Bob’s free choice out of the picture. Instead we have Alice first measure the polarization of a CMB photon coming from the constellation Leo and choose her measurement setting, 0° or 41.4°, based on the outcome of that measurement. We have Bob measure the polarization of a CMB photon from Aquarius on the opposite side of the visible universe to choose his setting, 0° or -41.4°. Now superdeterminism requires that the universe has been conspiring since near the beginning of time so that the plasma of the early universe would cause two photons to be released and travel for 13 billion years to a point where life developed and would be setting up a quantum correlation experiment, and pass through their measuring devices in such a way as to force them to align their measurement settings to get exactly the correlation in their records that is predicted by quantum mechanics.

Furthermore, in a superdeterministic world there could be arbitrarily extreme violations of Bell inequalities, even beyond the violations predicted by quantum mechanics. Yet for some reason the cosmic conspiracy chooses never to produce statistical results that exceed the Bell-style violations predicted by quantum mechanics.

A third resolution to the EPR paradox is to say that the question of what would have happened if Alice or Bob had done a different experiment is not a well-formed question. This is the resolution captured by the “shut-up and calculate” interpretation of quantum mechanics. There is not much else to say about this resolution beyond saying that I do not find the rejection of the very question to be particularly satisfying.

Lastly we come to Everett’s many-worlds interpretation. This interpretation resolves the EPR paradox by saying that all possible experimental outcomes of Alice and Bob’s experiment happen and they exist together in a superposition. The phase of Alice and Bob’s superposition changes based on the polarization they choose to make their measurements with, but no matter their measurement choice, all 2^1000 possible outcomes of Alice’s experiment happen and exist together, and similarly for Bob. Later, when Alice and Bob meet to compare their notes, the superposition of Alices and the superposition of Bobs interfere with each other and split up in such a way that "most" of the Alices meet up with a version of Bob whose recorded outcomes have the correlations predicted by quantum mechanics (or in the case of perfect phase alignment "all" of the Alices meet up with the corresponding Bob who has identical recorded notes).

This resolution violates counterfactual definiteness because it does not predict any specific outcome for counterfactual Alice. It predicts a similar superposition of Alices but in a different phase. In that situation, the various Bobs in superposition could have met up with any number of possible different counterfactual Alices when they interfered. Furthermore, the different phase that the counterfactual Bobs would have been in would definitely influence this interference when meeting up with the superpositions of counterfactual Alices. It is not the case that Bob’s experimental choices affect Alice’s results, but his choices do affect the interference that happens when Alice and Bob meet, and do influence which version of the superposition of Alices he (or rather they) meet up with.

The many-worlds interpretation is not without its own problems. If multiple worlds are all equally real, why is it that we assign less probability to those worlds with the lesser probability amplitudes? After all, those worlds are, in some sense, just as real as the worlds associated with larger probability amplitudes. A better way of phrasing the problem might be: why is it rational to behave as if we expect outcomes with probability in accordance with the probability amplitudes of quantum mechanics?

August 25, 2019 06:38 PM

August 24, 2019

Joachim Breitner

ICFP 2019

ICFP 2019 in Berlin ended yesterday, and it was – as always – a great pleasure. This year was particularly noteworthy for the quite affordable conference hotel and the absolutely amazing food during the coffee breaks.

Since I am no longer a proper academic, I unsurprisingly did not have real research to present. Luckily I found ways to not just be a passive participant this year:

  • At FARM, I presented Kaleidogen, a small game (or toy, some would say) of mine. The room was packed with people, so thanks for all your interest! If you missed it, you can soon see the recording or read the demo abstract.

  • At PLMW, the mentoring workshop for young researchers, I ran the “Social event” together with Niki Vazou. Like last year, we randomly grouped the students and held a little competition where they had to match program listings to languages and algorithms. This was great fun, and we even managed to solve the sudden problem of two ties in an ad-hoc extra quiz.

  • During his “State of GHC” speech, Simon Peyton Jones asked me to speak about the GHC Proposal Process for a few slides.

  • And since that is not enough stage time, I secured two spots in local stand-up comedy open mics on Monday and Friday, and even dragged sizable crowds of ICFP participants to these venues. One was a boat, and the other one a pretty dodgy bar in Neukölln, so that alone was a memorable experience. And the host was visibly surprised when his joke “I couldn’t be a software developer – I can’t commit” was met by such a roaring response…

Anyways, ICFP is over, and it is back to disappearing in the churn of everyday work. I hope to see you all next year.

by Joachim Breitner (mail@joachim-breitner.de) at August 24, 2019 06:35 AM

August 19, 2019

Magnus Therning

Hedgehog on a REST API, part 3

In my previous post on using Hedgehog on a REST API, Hedgehog on a REST API, part 2, I ran the test a few times and adjusted the model to deal with the incorrect assumptions I had initially made. In particular, I had to adjust how I modelled the User ID. Because of the simplicity of the API that wasn’t too difficult. However, that kind of completely predictable ID isn’t found in all APIs. In fact, it’s not uncommon to have completely random IDs in APIs (often they are UUIDs).

So, I set out to try to deal with that. I’m still using the simple API from the previous posts, but this time I’m pretending that I can’t build the ID into the model myself, or, put another way, I’m capturing the ID from the responses.

The model state

When capturing the ID it’s no longer possible to use a simple Map Int Text for the state, because I don’t actually have the ID until I have an HTTP response. However, the ID plays an important role in constructing a sequence of actions. The trick is to use Var Int v instead of an ordinary Int. As I understand it (and I believe that understanding is good enough to make use of Hedgehog), this way the ID is an opaque blob in the construction phase, and it’s turned into a concrete value during execution. When in the opaque state it implements enough type classes to be useful for my purposes.
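A minimal sketch of what such a state type could look like follows. It assumes the hedgehog, containers, and text packages, and that the Hedgehog module exports Var, Concrete, and concrete as it did around the time of writing. The helper concreteIds and the sample user are made up for this illustration.

```haskell
{-# LANGUAGE KindSignatures #-}
module Main where

import           Data.Kind (Type)
import qualified Data.Map as M
import           Data.Text (Text, pack)
import           Hedgehog (Concrete (..), Var (..), concrete)

-- Model state keyed by an ID that is symbolic while actions are being
-- generated and concrete while they are executed.
newtype State (v :: Type -> Type) = State (M.Map (Var Int v) Text)

-- Hypothetical helper: during execution the keys can be unwrapped
-- with 'concrete'.
concreteIds :: State Concrete -> [Int]
concreteIds (State m) = map concrete (M.keys m)

main :: IO ()
main = print (concreteIds (State (M.fromList [(Var (Concrete 7), pack "alice")])))
```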

The API calls: add user

A closer look at the Callback type shows that not all the callbacks get the state in the same form, opaque or concrete, and one of them, Update, actually receives the state in both forms depending on the phase of execution. This has the most impact on the add user action. To deal with it the code needs to be rearranged a bit. To be specific, commandExecute can no longer return a tuple of both the ID and the status of the HTTP response, because the update function can’t reach into the tuple, which it would need to do to update the state.

That means the commandExecute function will have to do tests too. It is nice to keep all tests in the callbacks, but by sticking a MonadTest m constraint on the commandExecute it turns into a nice solution anyway.

I found that once I’d come around to folding the Ensure callback into the commandExecute function the rest fell out from the types.

The API calls: delete user

The other actions, deleting a user and getting a user, required only minor changes and the changes were rather similar in both cases.

Now the type for the action needs to take a Var Int v instead of just a plain Int.

This in turn affects the implementation of HTraversable.
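As a hedged sketch of those two pieces: the action type below wraps a Var Int v, and its HTraversable instance simply traverses that one variable. This assumes a Hedgehog release from around the time of this post that exports HTraversable from the Hedgehog module; other releases may differ. The main function is only there to exercise the instance.

```haskell
{-# LANGUAGE KindSignatures #-}
module Main where

import Data.Kind (Type)
import Hedgehog (Concrete (..), HTraversable (..), Var (..), concrete)

-- Input of the delete action: the ID is a 'Var', not a plain Int.
newtype DeleteUser (v :: Type -> Type) = DeleteUser (Var Int v)

-- Traversing the action just traverses the single variable it holds.
instance HTraversable DeleteUser where
  htraverse f (DeleteUser vi) = DeleteUser <$> htraverse f vi

main :: IO ()
main = do
  -- A trivial traversal that leaves the variable concrete.
  DeleteUser vi <- htraverse pure (DeleteUser (Var (Concrete 42)))
  print (concrete vi)
```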

Then the changes to the Command mostly comprise use of concrete in places where the real ID is needed.

deleteUser :: (MonadGen n, MonadIO m) => Command n m State
deleteUser = Command gen exec [ Update u
                              , Require r
                              , Ensure e
                              ]
  where
    -- Pick one of the IDs known to the model; with no users there is nothing to delete.
    gen (State m) = case M.keys m of
      [] -> Nothing
      ks -> Just $ DeleteUser <$> Gen.element ks

    -- During execution the ID is concrete, so it can be put into the URL.
    exec (DeleteUser vi) = liftIO $ do
      mgr <- newManager defaultManagerSettings
      delReq <- parseRequest $ "DELETE http://localhost:3000/users/" ++ show (concrete vi)
      delResp <- httpNoBody delReq mgr
      return $ responseStatus delResp

    -- Remove the user from the model.
    u (State m) (DeleteUser i) _ = State $ M.delete i m

    -- Only delete users that the model knows about.
    r (State m) (DeleteUser i) = i `elem` M.keys m

    -- The API is expected to answer with status 200.
    e _ _ (DeleteUser _) r = r === status200

Conclusion

This post concludes my playing around with state machines in Hedgehog for now. I certainly hope I find the time to put it to use on some larger API soon. In particular I’d love to put it to use at work; I think it’d be an excellent addition to the integration tests we currently have.

August 19, 2019 12:00 AM

August 18, 2019

Michael Snoyman

Haskell kata: withTryFileLock

This is the first Haskell code kata I’ve put on this blog (to my knowledge). The idea is to present a self-contained, relatively small coding challenge to solidify some skills with Haskell. If people like this and would like to see more, let me know. Caveat: these will almost certainly be supply driven. As I notice examples like this in my code, I’ll try to extract them as I did for this blog post.

OK, here’s the story. The filelock library provides a set of functions for working with locked files. Some of these will block until a file lock is available. However, some will instead return a Maybe value and use Nothing to represent the case where a lock is not available.

What’s interesting about this is the withTryFileLock function, which is a rare combination of the bracket pattern and potential failure. Its signature is:

withTryFileLock
  :: FilePath
  -> SharedExclusive
  -> (FileLock -> IO a)
  -> IO (Maybe a)

The FilePath parameter says which file to try and lock. SharedExclusive says the type of lock to take. The third parameter is the function to run with the file lock, which returns an IO a action. If the lock is taken, the resulting a value ends up wrapped in a Just constructor and returned from withTryFileLock. If the lock could not be taken, then Nothing is returned.
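For orientation, here is a small usage sketch of this first signature, written as a Stack script in the same style as the kata below. The file name example.lock and the inner action are hypothetical and chosen only for illustration.

```haskell
#!/usr/bin/env stack
-- stack --resolver lts-14.1 script
import System.FileLock (SharedExclusive (..), withTryFileLock)

main :: IO ()
main = do
  -- "example.lock" is a made-up file name used only for this sketch.
  result <- withTryFileLock "example.lock" Exclusive $ \_lock ->
    pure (length "work done while holding the lock")
  case result of
    Just n  -> putStrLn ("Lock taken, inner action returned " ++ show n)
    Nothing -> putStrLn "Lock not available, inner action was skipped"
```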

The thing is, there’s an alternative function signature we could have instead, which would provide a Maybe FileLock to the inner action. It looks like this:

withTryFileLock
  :: FilePath
  -> SharedExclusive
  -> (Maybe FileLock -> IO a)
  -> IO a

Why would you want one versus the other? It’s not the topic I’m focusing on today, and it honestly doesn’t matter that much. Here’s the code kata:

Implement the second version in terms of the first, and the first version in terms of the second.

To complete this code kata:

  1. Copy/paste the code snippet below into a file called Main.hs
  2. Make sure you have Stack installed.
  3. Make tweaks to Main.hs.
  4. Run stack Main.hs.
  5. If you get an error in step 4, go back to 3.
  6. Congratulations, you’ve successfully fixed the program and parsed my BASIC-esque goto statement!

Bonus points: generalize version1 and version2 to work in any MonadUnliftIO.

#!/usr/bin/env stack
-- stack --resolver lts-14.1 script
import System.FileLock (FileLock, SharedExclusive (..), withTryFileLock)

-- We've imported this function:
--
-- withTryFileLock
--   :: FilePath
--   -> SharedExclusive
--   -> (FileLock -> IO a)
--   -> IO (Maybe a)

-- | Implement this function by using the 'withTryFileLock' imported above.
version1
  :: FilePath
  -> SharedExclusive
  -> (Maybe FileLock -> IO a)
  -> IO a
version1 = _

-- | And now turn it back into the original type signature. Use the
-- 'version1' function we just defined above.
version2
  :: FilePath
  -> SharedExclusive
  -> (FileLock -> IO a)
  -> IO (Maybe a)
version2 = _

-- | Just a simple test harness
main :: IO ()
main = do
  version1 "version1.txt" Exclusive $ \(Just _lock) ->
    version1 "version1.txt" Exclusive $ \Nothing ->
    putStrLn "Yay, it worked!"

  Just _ <- version2 "version2.txt" Exclusive $ \_lock -> do
    Nothing <- version2 "version2.txt" Exclusive $
      error "Should not be called"
    pure ()
  putStrLn "Yay, it worked!"

August 18, 2019 12:04 PM

August 16, 2019

Functional Jobs

Developer - Erlang, Elm, and Haskell at Driebit (Full-time)

Are you convinced of the benefits of functional programming? Are you always thinking about what could be the simplest and most elegant way to achieve something? And are you interested in art, culture, sustainability, and education? Then you’re the person we’re looking for!

FULL-TIME / PART-TIME · AMSTERDAM · NO REMOTE

Our team

We are a team of fifteen people. Three directors coordinate the various subdivisions, but also contribute substantively to the work we do.

We treasure an open culture in which personal opinions are appreciated. That shows in our projects, but also at the lunch table. We like to talk about what we do, and there is always room for new ideas.

A free culture like ours can only flourish because we take our responsibilities seriously. For good relationships with our clients, for the things we build, and for you; so you can have a healthy work-life balance.

What you do at Driebit

  • You build websites! Either by yourself or together with your colleagues, you strive to find the most elegant solutions to your problems.
  • You think hard about the simple way to build things, and you are capable of explaining to clients why that’s different from the easy way.
  • Because we believe in the right tools for the right job, you feel at liberty to introduce interesting technical solutions, and you do so freely and happily.
  • You improve existing websites by fixing bugs or adding new features.
  • You work on tools that help you and your colleagues be fitter, happier, more productive.
  • You may organise and host meetups, about functional programming for example!

Why you like it

  • At Driebit we work with Erlang, Haskell, and Elm.
  • Your input, talent, and knowledge are all put towards a positive contribution to society.
  • Because of longstanding relationships with our clients, you will develop interesting connections.
  • We always aim for the highest quality, and sometimes even win prizes!
  • We have lunch together every day, with nice bread and organic hagelslag!
  • We have a beautiful, sun-lit office in the best area of Amsterdam, offering a pleasant view of the canal and an excellent launch pad for afternoon walks.

What are we looking for in you?

  • Someone who loves FP and especially Haskell <3
  • Keep It Simple, Silly. Someone who’d rather do three things very well, than ten things so-so.
  • Someone for whom meaningful work is not a wish, but a prerequisite.
  • Someone who’s available 4-5 days a week.
  • A basic understanding of Dutch
  • You live in the Netherlands, preferably in or around Amsterdam

So what’s next?

We would like to get to know you a little better. Please write a short summary of why you think this job fits you, and what sort of colleague you are.

If you have any questions, please do not hesitate to contact Dorien Drees (dorien [at] driebit [dot] nl).

IMPORTANT MESSAGE TO RECRUITERS: please, please, please do not contact us.

Get information on how to apply for this position.

August 16, 2019 09:12 AM

August 12, 2019

Functional Jobs

Scala developer for NetLogo at Northwestern University (Full-time)

The Center for Connected Learning at Northwestern University is looking for a full-time Scala/Java Software Developer to work on the NetLogo desktop application, a computational modeling environment widely used in both education and research. This might be a good opportunity for you if you would enjoy working on:

  • A software project with thousands of users all over the world.
  • A programming language compiler and runtime.
  • A graphical simulation and modeling environment.
  • An open-source project with a growing public ecosystem.

This Software Developer position is based at Northwestern University's Center for Connected Learning and Computer-Based Modeling (CCL), working in a small collaborative development team in a university research group that also includes professors, postdocs, graduate students, and undergraduates, supporting the needs of multiple research projects. Many of the projects undertaken by the lab use the NetLogo software.

NetLogo is a programming language and an agent-based modeling environment. The NetLogo language is a dialect of Logo/Lisp specialized for building agent-based simulations of natural and social phenomena. NetLogo has hundreds of thousands of users ranging from grade school students to advanced researchers. NetLogo also features an expansive API that members of the NetLogo community use to extend the language to integrate with software like GIS databases, Python, R, and Mathematica, and to interface with hardware devices like Arduino boards and video cameras.

Application information:

The Northwestern campus is in Evanston, Illinois on the Lake Michigan shore, adjacent to Chicago and easily reachable by public transportation.

Specific Responsibilities:

  • Independently implements NetLogo features and bug fixes in Scala and Java, including doing code analysis to identify clean designs and architecture changes as needed.
  • Collaborates with the NetLogo development team and principal research investigators in planning and designing enhancements for NetLogo and other related projects.
  • Interacts with lab members and the NetLogo user community including responding to bug reports, questions, and suggestions, and reviewing open-source contributions; provides feedback and guidance to student workers.
  • Performs other duties as required or assigned.

Minimum Qualifications:

  • A bachelor's degree in computer science or a closely related field or the equivalent combination of education, training and experience from which comparable skills and abilities can be acquired.
  • Two or more years of software development experience, with demonstrated efforts at improving software development skills and knowledge.

Preferred Qualifications:

  • Experience working effectively as part of a small software development team, including maintaining close collaboration, using distributed version control, and implementing automated testing.
  • Experience with at least one JVM language, Scala strongly preferred.
  • Experience developing GUI applications, especially Java Swing-based applications.
  • Experience with programming language design and implementation, functional programming (especially Haskell or Lisp), and compilers.
  • Interest in and experience with computer-based modeling and simulation, especially agent-based simulation.
  • Experience working on research projects in an academic environment.
  • Experience with open-source software development and supporting the growth of an open-source community.
  • Interest in education and an understanding of secondary school math and science content.

As per Northwestern University policy, this position requires a criminal background check. Successful applicants will need to submit to a criminal background check prior to employment.

Visa sponsorship may be available for qualified candidates.

Northwestern University is an Equal Opportunity, Affirmative Action Employer of all protected classes including veterans and individuals with disabilities.

Get information on how to apply for this position.

August 12, 2019 06:28 PM

August 11, 2019

Shayne Fletcher

Partitions of a set

Calculating the partitions of a set

Having "solved" a bunch of these divide & conquer problems, I'm the first to admit to having been lulled into a false sense of security. At first glance, the problem of this post seemed deceptively simple and consequently I struggled with it, sort of "hand-waving", not really engaging my brain and getting more and more frustrated at how the dang thing wouldn't yield to my experience! I think the moral of the story is that math doesn't care about your previous successes, so don't let your past practice trick you into laziness. Be guided by your experience but fully apply yourself to the problem at hand!

Suppose a set of two elements {2, 3}. There are only two ways it can be partitioned: (23), (3)(2). For meaning, you might think of these two partitions like this: in the first partition, there is a connection between the elements 2 and 3; in the second, 2 and 3 are isolated from each other.

Suppose a set of elements {1, 2, 3}. There are five partitions of this set: (123), (23)(1), (13)(2), (3)(21), (3)(2)(1) (I've carefully written them out this way to help with the elucidation). Maybe you want to break here and see if you can write an algorithm for calculating them before reading on?

Observe that we can get the partitions of {1, 2, 3} from knowledge of the partitions of {2, 3} by looking at each partition of {2, 3} in turn and considering the partitions that would result by inclusion of the element 1. So, for example, the partition (23) gives rise to the partitions (123) and (23)(1). Similarly, the partition (3)(2) gives rise to the partitions (13)(2), (3)(21) and (3)(2)(1). We might characterize this process of computing new partitions of {1, 2, 3} from a partition p of {2, 3} as "extending" p.

Suppose then we write a function extend x p to capture the above idea. Let's start with the signature of extend. What would it be? Taking (23)(1) as an exemplar, we see that a component of a partition can be represented as [a] and so a partition itself then as [[a]]. We know that extend takes an element and a partition and returns a list of (new) partitions so it must have signature extend :: a -> [[a]] -> [[[a]]] (yes, lists of lists of lists are somehow easy to get confused about).

Now for writing the body of extend. The base case is the easiest of course - extending the empty partition:

extend x [] = [[[x]]]

That is, a singleton list of partitions where that one partition has one component. The inductive case is the partition obtained by "pushing" x into the first component of p together with the extensions that leave the first component of p alone.

extend x (h : tl) = ((x : h) : tl) : map (h :) (extend x tl)

We can now phrase the function partition with signature partition :: [a] -> [[[a]]] like this:

    partition [] = [[]]
    partition (h : tl) = concatMap (extend h) (partition tl)

The base case says that the only partition of the empty set is the empty partition.

Wrapping it all up, the algorithm in entirety is

    partition :: [a] -> [[[a]]]
    partition [] = [[]]
    partition (h : tl) = concatMap (extend h) (partition tl)
      where
        extend :: a -> [[a]] -> [[[a]]]
        extend x [] = [[[x]]]
        extend x (h : tl) = ((x : h) : tl) : map (h :) (extend x tl)
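One way to sanity-check the algorithm is to count its results: the number of partitions of an n-element set is the Bell number B(n), which starts 1, 1, 2, 5, 15, 52. The sketch below repeats the algorithm as a self-contained module and adds a small counting helper. The helper partitionCount is an assumption for illustration, not part of the post, and the inner variables are renamed only to avoid shadowing.

```haskell
-- All partitions of a list, treating it as a set of distinct elements.
partition :: [a] -> [[[a]]]
partition [] = [[]]
partition (h : tl) = concatMap (extend h) (partition tl)
  where
    -- Every way of adding x to a partition: push x into one of the
    -- existing components, or add x as a new singleton component.
    extend :: a -> [[a]] -> [[[a]]]
    extend x [] = [[[x]]]
    extend x (c : cs) = ((x : c) : cs) : map (c :) (extend x cs)

-- Hypothetical helper (not in the post): the number of partitions of {1..n}.
partitionCount :: Int -> Int
partitionCount n = length (partition [1 .. n])
```

In GHCi, map partitionCount [0 .. 5] evaluates to [1,1,2,5,15,52], and partition [1, 2, 3] yields the five partitions listed earlier, in the same order.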

by Shayne Fletcher (noreply@blogger.com) at August 11, 2019 04:55 PM

Oleg Grenrus

ANN: cabal-fmt

Posted on 2019-08-11 by Oleg Grenrus

As Cabal-3.0.0.0 is now released, I uploaded the cabal-fmt tool to Hackage. I have been using cabal-fmt for over half a year now for my own Haskell projects, and have been happy with this minimal, yet useful tool. cabal-fmt formats .cabal files, preserving the field ordering and comments.

cabal-fmt is based on Distribution.Fields functionality; cabal-fmt is a thin addition on top. The same Distribution.Fields (and the related Distribution.FieldsGrammar) is also used in haskell-ci to parse and print .cabal-like files. I also use it in other tools to implement configuration files. In my opinion the lexical structure of .cabal files is more flexible and human-writing-friendly than YAML or JSON. YMMV. For example the header for this post is written as

with quotes needed to disambiguate YAML. That's silly :) Cabal-like syntax would be

However, enough bashing YAML.

cabal-fmt is an opinionated tool: it formats a few fields to my liking. Let us see how.

build-depends

build-depends modules are formatted comma first (with cabal-version: 2.2 also with a leading comma), tabulated, sorted, and ^>= preferred when it can be used. For example:

or (for older cabal-version):

Single build-depends are formatted as a single line, like

nub & sort

exposed-modules, other-modules, default-extensions and other-extensions are sorted and duplicates removed. For example.

Sometimes, you'll prefer some module to be the first, for cabal repl. In that case I would use two exposed-modules fields.

tested-with

tested-with is one more field where I don't like the default formatting. This field drives the job selection in haskell-ci. cabal-fmt combines version ranges for compilers, and prints GHC and GHCJS in upper case.

The generated line is long, especially for packages supporting a lot of GHC versions. I don't yet have a clear preference for how to handle that.

Extra: expand exposed-modules and other-modules

A recent addition is the ability to (re)write field contents while formatting. There's an old, ongoing discussion about allowing wildcard specification of exposed-modules in the .cabal format. I'm against that change. Instead, I would rather have cabal-fmt (or an imaginary IDE) regenerate parts of the .cabal file given some commands.

cabal-fmt: expand <directory> is one such command (the only one at the moment).

cabal-fmt will look into directory for files, turn filenames into module names and append to the contents of exposed-modules. As the field is then nubbed and sorted, expanding is idempotent. For example cabal-fmt itself has:

The functionality is simple. There is no removal of other-modules or main-is. I think that using a different directory for these is a good enough workaround, and may make things clearer: one directory for public modules and another for private ones.
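The idea behind expanding (turn file names into module names, append them to the field, then nub and sort) can be sketched in a few lines of Haskell. This is an illustration only, not cabal-fmt's implementation: the names moduleName, nubSort and expand are made up here, the file list is passed in rather than read from disk, and only `/` separators and the `.hs` extension are handled.

```haskell
import Data.List (group, sort, stripPrefix)
import Data.Maybe (mapMaybe)

-- Turn a path such as "src/Data/Foo.hs" into "Data.Foo", given the
-- directory prefix "src/". Files outside the directory, or without a
-- ".hs" extension, give Nothing.
moduleName :: FilePath -> FilePath -> Maybe String
moduleName dir path = do
  rel <- stripPrefix dir path
  base <- stripSuffix ".hs" rel
  pure (map slashToDot base)
  where
    slashToDot c = if c == '/' then '.' else c
    stripSuffix suf s = reverse <$> stripPrefix (reverse suf) (reverse s)

-- Sort and remove duplicates.
nubSort :: Ord a => [a] -> [a]
nubSort = concatMap (take 1) . group . sort

-- Append the modules found under dir to the existing field contents.
expand :: FilePath -> [FilePath] -> [String] -> [String]
expand dir files existing =
  nubSort (existing ++ mapMaybe (moduleName dir) files)
```

Because the result is always nubbed and sorted, applying expand a second time to its own output with the same file list changes nothing, which is the idempotence mentioned above.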

Conclusion

And that's all that cabal-fmt does. Formatting of other fields comes directly from Cabal. I have a few ideas for what else could be done, e.g.

  • cabal-fmt: expand for extra-source-files
  • formatting of reexported-modules
  • sorting of fields, e.g. putting type and default-language to the top of the component stanzas.

But these don't bother me enough yet, so they are not there.

The implicit goal of the project is to iterate independently of cabal-install: find out what could be useful and how it can be done, and later merge it into cabal-install's cabal format functionality, by then providing enough configuration knobs to not be so opinionated.

August 11, 2019 12:00 AM

August 10, 2019

Magnus Therning

Architecture of a service

Early this summer it was finally time to put this one service I’ve been working on into our sandbox environment. It’s been running without hiccups, so last week I turned it on for production as well. In this post I thought I’d document the how and why of the service in the hope that someone will find it useful.

The service functions as an interface to external SMS-sending services, offering a single place to change if we find that we are unhappy with the service we’re using.[1] This service replaces an older one, written in Ruby, which no one really dares touch. Hopefully the Haskell version will prove to be a joy to work with over time.

Overview of the architecture

The service is split into two parts: a web server using scotty, and streaming data processing using conduit. Persistent storage is provided by a PostgreSQL database. The general idea is that events are picked up from the database and acted upon, which in turn results in other events that are written to the database. Those are then picked up, and round and round we go. The web service accepts requests, turns them into events and writes them to the database.

Hopefully this crude diagram clarifies it somewhat.

[Figure: diagram of the service architecture]

There are a few things that might need some explanation:

  • In the past we’ve wanted to have the option to use multiple external SMS services at the same time. One is randomly chosen as the request comes in. There’s also a possibility to configure the frequency for each external service.

    Picker implements the random picking and I’ve written about that earlier in Choosing a conduit randomly.

    Success and fail are dummy senders. They don’t actually send anything, and the former succeeds at it while the latter fails. I found them useful for manual testing.

  • Successfully sending off a request to an external SMS service, getting status 200 back, doesn’t actually mean that the SMS has been sent, or even that it ever will be. Due to the nature of SMS messaging there are no guarantees of timeliness at all. Since we are interested in finding out whether an SMS actually is sent, a delayed action is scheduled, which will fetch the status of a sent SMS after a certain time (currently 2 minutes). If an SMS hasn’t been sent after that time it might as well never be – it’s too slow for our end-users.

    This is what report-fetcher and fetcher-func do.

  • The queue sink and queue src are actually sinkTQueue and sourceTQueue. Splitting the stream like that makes it trivial to push in events by using writeTQueue.

  • I use sequenceConduits in order to send a single event to multiple Conduits and then combine all their results back into a single stream. The ease with which this can be done in conduit is one of the main reasons why I chose to use it.[2]
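The random choice with a configurable frequency per service, mentioned in the first point above, boils down to a weighted pick. The sketch below is not the service's Picker (that one is a conduit). It is a plain function using only base, and the name pickWeighted, as well as the idea of passing the random number in as an argument, are assumptions made for illustration.

```haskell
-- Pick an item from a list of (weight, item) pairs. The Int argument
-- plays the role of a random number supplied by the caller; it is
-- reduced modulo the total weight, so any Int is acceptable.
pickWeighted :: [(Int, a)] -> Int -> Maybe a
pickWeighted choices r
  | total <= 0 = Nothing
  | otherwise = go (r `mod` total) positive
  where
    -- Entries with non-positive weight can never be picked.
    positive = filter ((> 0) . fst) choices
    total = sum (map fst positive)
    go _ [] = Nothing
    go n ((w, x) : rest)
      | n < w = Just x
      | otherwise = go (n - w) rest
```

With weights [(3, "a"), (1, "b")], three out of every four consecutive numbers select "a" and the fourth selects "b"; the dummy success and fail senders could be given weights in the same way during manual testing.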

Effects and tests

I started out writing everything based on a type like ReaderT <my cfg type> IO and using liftIO for effects that needed lifting. This worked nicely while I was setting up the basic structure of the service, but as soon as I hooked in the database I really wanted to do some testing also of the effectful code.

After reading Introduction to Tagless Final and The ReaderT Design Pattern, playing a bit with both approaches, and writing Tagless final and Scotty and The ReaderT design pattern or tagless final?, I finally chose to go down the route of tagless final. There’s no strong reason for that decision; maybe it was just because I read about it first and found it very easy to move in that direction in small steps.
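To make the testing benefit concrete, here is a minimal tagless final sketch using only base. Everything in it is hypothetical: the class MonadSms, the function notifyAll and the Recorder test monad are made up for illustration and are not taken from the service.

```haskell
-- A hypothetical effect: sending an SMS to a number. Code is written
-- against this class instead of against IO.
class Monad m => MonadSms m where
  sendSms :: String -> String -> m Bool

-- Effectful logic, polymorphic in the monad. Returns how many sends
-- succeeded.
notifyAll :: MonadSms m => [String] -> String -> m Int
notifyAll numbers msg = do
  results <- mapM (\n -> sendSms n msg) numbers
  pure (length (filter id results))

-- Interpreter for real runs (here it only prints).
instance MonadSms IO where
  sendSms number msg = do
    putStrLn ("sending to " ++ number ++ ": " ++ msg)
    pure True

-- Interpreter for tests: records the numbers instead of doing I/O.
newtype Recorder a = Recorder {runRecorder :: ([String], a)}

instance Functor Recorder where
  fmap f (Recorder (w, a)) = Recorder (w, f a)

instance Applicative Recorder where
  pure a = Recorder ([], a)
  Recorder (w1, f) <*> Recorder (w2, a) = Recorder (w1 ++ w2, f a)

instance Monad Recorder where
  Recorder (w1, a) >>= k =
    let Recorder (w2, b) = k a in Recorder (w1 ++ w2, b)

-- In tests, sending to an empty number fails; everything else succeeds.
instance MonadSms Recorder where
  sendSms number _ = Recorder ([number], number /= "")
```

A unit test can then evaluate runRecorder (notifyAll ["123", "", "456"] "hi") and inspect both the recorded numbers and the success count without touching the network.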

There’s a split between property tests and unit tests:

  • Data types, their instances (like JSON (de-)serialisation), pure functions and a few effects are tested using properties. I’m using QuickCheck for that. I’ve since looked a little closer at hedgehog and if I were to do a major overhaul of the property tests I might be tempted to rewrite them using that library instead.

  • Most of the Conduits are tested using HUnit.

Configuration

The service will be run in a container and we try to follow the 12-factor app rules, where the third one says that configuration should be stored in the environment. All previous Haskell projects I’ve worked on have been command line tools where configuration is done (mostly) using command line arguments. For that I usually use optparse-applicative, but it’s not applicable in this setting.

After a bit of searching on hackage I settled on etc. It turned out to be nice and easy to work with. The configuration is written in JSON and only specifies environment variables. It’s then embedded in the executable using file-embed. The only thing I miss is a ToJSON instance for Config – we’ve found it quite useful to log the active configuration when starting a service and that log entry would become a bit nicer if the message was JSON rather than the (somewhat difficult to read) string that Config’s Show instance produces.

Logging

There are two requirements we have when it comes to logging:

  1. All log entries tied to a request should have a correlation ID.
  2. Requests and responses should be logged.

I’ve written about correlation IDs before, in Using a configuration in Scotty.

Logging requests and responses is an area where I’m not very happy with scotty. It feels natural to solve it using WAI middleware (i.e. using scotty’s middleware function), but the representation, especially of responses, is a bit complicated, so for the time being I’ve skipped logging the body of both. I’d be most interested to hear of libraries that could make that easier.

Data storage and picking up new events

The data stream processing depends heavily on being able to pick up when new events are written to the database, especially when there is more than one instance running (we usually have at least two instances running in the production environment). To get that working I’ve used postgresql-simple’s support for LISTEN and NOTIFY via the function getNotification.

When I wrote about this earlier, in Conduit and PostgreSQL, I got some really good feedback that made my solution more robust.

Delayed actions

Some things in Haskell feel almost like cheating. The light-weight threading makes me confident that a forkIO followed by a threadDelay (or in my case, the ones from unliftio) will suffice.
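As a rough sketch of that pattern, using the base versions from Control.Concurrent rather than the unliftio ones, a delayed action can be as small as the following. The names delayed and demo are made up for this example, and the delay is taken in microseconds because that is what threadDelay expects.

```haskell
import Control.Concurrent (ThreadId, forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Run an action on its own lightweight thread after a delay given in
-- microseconds. The caller is not blocked.
delayed :: Int -> IO () -> IO ThreadId
delayed micros action = forkIO (threadDelay micros >> action)

-- Demonstration: schedule a stand-in for "fetch the SMS status" and wait
-- for it to report back through an MVar.
demo :: IO String
demo = do
  result <- newEmptyMVar
  _ <- delayed 100000 (putMVar result "status fetched")
  takeMVar result
```

A two-minute delay would be delayed (2 * 60 * 1000000) action. Note that this sketch does nothing about exceptions thrown inside the forked thread or about actions that are lost if the process restarts; a real service has to decide how to handle both.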


  1. It has happened in the past that we’ve changed SMS service after finding that they weren’t living up to our expectations.

  2. A while ago I was experimenting with other streaming libraries, but I gave up on getting re-combination to work – Zipping streams

August 10, 2019 12:00 AM

August 09, 2019

Oliver Charles

Who Authorized These Ghosts!?

Recently at CircuitHub we’ve been making some changes to how we develop our APIs. We previously used Yesod with a custom router, but we’re currently exploring Servant for API modelling, in part due to its potential for code generation for other clients (e.g., our Elm frontend). Along the way, this is requiring us to rethink and reinvent previously established code, and one of those areas is authorization.

To recap, authorization is

the function of specifying access rights/privileges to resources related to information security and computer security in general and to access control in particular.

This is in contrast to authentication, which is the act of showing that someone is who they claim to be.

Authorization is a very important process, especially in a business like CircuitHub where we host many confidential projects. Accidentally exposing this data could be catastrophic to both our business and customers, so we take it very seriously.

Out of the box, Servant has experimental support for authentication, which is a good start. servant-server gives us Servant.Server.Experimental.Auth which makes it a doddle to plug in our existing authentication mechanism (cookies & Redis). But that only shows that we know who is asking for resources; how do we check that they are allowed to access the resources?

As a case study, I want to have a look at a particular end-point, /projects/:id/price. This endpoint calculates the pricing options CircuitHub can offer a project, and there are a few important points to how this endpoint works:

  1. The pricing for a project depends on the user viewing it. This is because some users can consign parts so CircuitHub won’t order them. Naturally, this affects the price, so pricing is viewer dependent.
  2. Some projects are owned by organizations, and should be priced by the organization as a whole. If a user is a member of the organization that owns the project pricing has been requested for, return the pricing for the organization. If the user is not in the organization, return their own custom pricing.
  3. Private projects should only expose their pricing to superusers, the owner of the project, and any members of the project’s organization (if it’s owned by an organization).

This specification is messy and complicated, but that’s just reality doing its thing.

Our first approach was to try and represent this in Servant’s API type. We start with the “vanilla” route, with no authentication or authorization:

Next, we add authentication:

At this point, we’re on our own - Servant offers no authorization primitives (though there are discussions on this topic).

My first attempt to add authorization to this was:

There are two new routing combinators here: AuthorizeWith and CanView. The idea is AuthorizeWith somehow captures the result of authenticating, and provides that information to CanView. CanView itself does some kind of authorization using a type class based on its argument - here Capture "id" ProjectId. The result is certainly something that worked, but I was unhappy with both the complexity to implement it (which leaves scope to get it wrong), and the lack of actual evidence of authorization.

The latter point needs some expanding. What I mean by “lacking evidence” is that with the current approach, the authorization is essentially like writing the following code:

If I later add more resource access into doThings, what will hold me accountable to checking authorization on those resources? The answer is… nothing! This is similar to boolean blindness - we performed a logical check, only to throw all the resulting evidence away immediately.

At this point I wanted to start exploring some different options. While playing around with ideas, I was reminded of the wonderful paper “Ghosts of Departed Proofs”, and it got me thinking… can we use these techniques for authorization?

Ghosts of Departed Proofs

The basic idea of GDP is to name values using higher-rank quantification, and then - in trusted modules - produce proofs that refer to these names. To name values, we introduce a Named type, and the higher-ranked function name to name things:

Note that the only way to construct a Named value outside of this module is to use name, which introduces a completely distinct name for a limited scope. Within this scope, we can construct proofs that refer to these names. As a basic example, we could use GDP to prove that a number is prime:

Here we have our first proof witness - IsPrime. We can witness whether or not a named Int is prime using checkPrime - like the boolean value isPrime this determines if a number is or isn’t prime, but we get evidence that we’ve checked a specific value for primality.
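A minimal sketch of these pieces, following the descriptions above, might look as follows. The names Named, name, IsPrime, checkPrime and isPrime come from the text; the exact definitions, the forget accessor and the describe example are assumptions, and in a real code base the constructors Named and IsPrime would be hidden behind a module export list so that only trusted code can create them.

```haskell
{-# LANGUAGE RankNTypes #-}

-- A value of type a tagged with a phantom name n.
newtype Named n a = Named a

-- Drop the name again.
forget :: Named n a -> a
forget (Named x) = x

-- Give a value a fresh name that only exists inside the continuation.
name :: a -> (forall n. Named n a -> r) -> r
name x k = k (Named x)

-- A proof witness: evidence that the value named n is prime.
data IsPrime n = IsPrime

-- Plain boolean primality test by trial division.
isPrime :: Int -> Bool
isPrime k =
  k > 1 && all (\d -> k `mod` d /= 0) (takeWhile (\d -> d * d <= k) [2 ..])

-- Like isPrime, but a positive answer is evidence about this specific
-- named value.
checkPrime :: Named n Int -> Maybe (IsPrime n)
checkPrime named
  | isPrime (forget named) = Just IsPrime
  | otherwise = Nothing

-- Example use: name a number, then ask for evidence about it.
describe :: Int -> String
describe k =
  name
    k
    ( \named -> case checkPrime named of
        Just _proof -> show (forget named) ++ " is prime"
        Nothing -> show (forget named) ++ " is not prime"
    )
```

Because the name n is universally quantified in the argument of name, a proof obtained for one named value cannot be passed off as a proof about another: the type checker rejects the mix-up.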

This is the whirlwind tour of GDP; I highly recommend reading the paper for a more thorough explanation. Also, the library justified-containers explores these ideas in the context of maps, where we have proofs that specific items are in the map (giving us total lookups, rather than partial lookups).

GDP and Authorization

This is all well and good, but how does this help with authorization? The basic idea is that authorization is itself a proof - a proof that we can view or interact with resources in a particular way. First, we have to decide which functions need authorization - these functions will be modified to require proof values that refer to the function arguments. In this example, we’ll assume our Servant handler is going to itself make a call to the price :: ProjectId -> UserId -> m Price function. However, given the specification above, we need to make sure that user and project are compatible. To do this, we’ll name the arguments, and then introduce a proof that the user in question can view the project:

But what is this CanViewProject proof?

A first approximation is to treat it as some kind of primitive or axiom. A blessed function can postulate this proof with no further evidence:

This is a good start! Our price function can only be called with a CanViewProject that matches the named arguments, and the only way to construct such a value is to use canViewProject. Of course we could get the implementation of this wrong, so we should focus our testing efforts to make sure it’s doing the right thing.

However, the Agda programmer in me is a little unhappy about just blindly postulating CanViewProject at the end. We’ve got a bit of vision back from our boolean blindness, but the landscape is still blurry. Fortunately, all we have to do is recruit more of the same machinery so far to subdivide this proof into smaller ones:

Armed with these smaller authorization primitives, we can build up our richer authorization scheme:

Now canViewProject just calls out to the other authorization routines to build its proof. Furthermore, there’s something interesting here. CanViewProject doesn’t postulate anything - everything is attached with a proof of the particular authorization case. This means that we can actually open up the whole CanViewProject module to the world - there’s no need to keep anything private. By doing this and allowing people to pattern match on CanViewProject, authorization results become reusable - if something else only cares that a user is a super user, we might be able to pull this directly out of CanViewProject - no need for any redundant database checks!
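Here is one way the scheme described so far could be sketched end to end. Only canViewProject, CanViewProject and price are names from the text; the smaller proofs, the constructor names, the in-memory stand-ins for database checks, the pure Int price and the priceFor wrapper are all assumptions made so that the example is self-contained.

```haskell
{-# LANGUAGE RankNTypes #-}

newtype Named n a = Named a

forget :: Named n a -> a
forget (Named x) = x

name :: a -> (forall n. Named n a -> r) -> r
name x k = k (Named x)

newtype UserId = UserId Int deriving (Eq, Show)
newtype ProjectId = ProjectId Int deriving (Eq, Show)

-- Small proofs. Only the checks below should construct these.
data IsSuperUser user = IsSuperUser
data UserOwnsProject user project = UserOwnsProject

-- The composite proof postulates nothing; it carries the evidence for
-- whichever case applied, so callers can pattern match on it.
data CanViewProject user project
  = ViaSuperUser (IsSuperUser user)
  | ViaOwnership (UserOwnsProject user project)

-- Hypothetical stand-ins for database lookups.
isSuperUser :: Named user UserId -> Maybe (IsSuperUser user)
isSuperUser u
  | forget u == UserId 0 = Just IsSuperUser
  | otherwise = Nothing

userOwnsProject ::
  Named user UserId -> Named project ProjectId -> Maybe (UserOwnsProject user project)
userOwnsProject u p
  | (forget u, forget p) `elem` owners = Just UserOwnsProject
  | otherwise = Nothing
  where
    owners = [(UserId 1, ProjectId 10)]

canViewProject ::
  Named user UserId -> Named project ProjectId -> Maybe (CanViewProject user project)
canViewProject u p =
  case isSuperUser u of
    Just proof -> Just (ViaSuperUser proof)
    Nothing -> ViaOwnership <$> userOwnsProject u p

-- price demands a proof about exactly these two arguments. The prices
-- are made-up numbers.
price :: Named project ProjectId -> Named user UserId -> CanViewProject user project -> Int
price _ _ (ViaSuperUser _) = 100
price _ _ (ViaOwnership _) = 90

-- Handler-style use: name the arguments, witness the proof, call price.
priceFor :: UserId -> ProjectId -> Maybe Int
priceFor uid pid =
  name uid (\u -> name pid (\p -> price p u <$> canViewProject u p))
```

The point of the design is in the type of price: forgetting the check, or checking a different user or project than the one being priced, is a compile-time error rather than a bug waiting to be found in review.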

In fact, this very idea can help us implement the final part of our original specification:

Some projects are owned by organizations, and should be priced by the organization as a whole. If a user is a member of the organization that owns the project pricing has been requested for, return the pricing for the organization. If the user is not in the organization, return their own custom pricing.

If we refine our UserBelongsToProjectOrganization proof, we can actually maintain a bit of extra evidence:

Now whenever we have a proof UserBelongsToProjectOrganization, we can pluck out the actual organization that we’re talking about. We also have evidence that the organization owns the project, so we can easily construct a new CanViewProject proof - proofs generate more proofs!

Relationship to Servant

At the start of this post, I mentioned that the goal was to integrate this with Servant. So far, we’ve looked at adding authorization to a single function, so how does this interact with Servant? Fortunately, it requires very little to change. The Servant API type is authorization free, but does mention authentication.

It’s only when we call our price function that we need to have performed some authorization, and this happens in the server-side handler. We do this by naming the respective arguments, witnessing the authorization proof, and then calling price:

Conclusion

That’s where I’ve got to so far. It’s early days, but the approach is promising. What I really like is that there is almost a virtual slider between ease and rigour. It can be easy to get carried away, naming absolutely everything and trying to find the most fundamental proofs possible. I’ve found so far that it’s better to back off a little bit - are you really going to get some set membership checks wrong? Maybe. But a property check is probably going to be enough to keep that function in check. We’re not in a formal proof engine setting; pretending we are just makes things harder than they need to be.

by Oliver Charles at August 09, 2019 12:00 AM

August 07, 2019

Philip Wadler

IOHK is hiring!


The Plutus team at IOHK is headed by Manuel Chakravarty and consists of a small but strong team of developers; I work on it as a consultant. We are designing a smart contract language, based on Haskell (for both offchain and onchain user-level programming) and System F (as the core code that runs onchain, the equivalent of the EVM for Ethereum). IOHK, unlike any other firm I know, is committed to building on peer-reviewed research, so publication is encouraged. We are hiring!
As a Functional Compiler Engineer at IOHK you will work closely with our programming language theory and cryptography researchers, our formal methods team, and our engineering team throughout the smart contracts development programme involving design, coding, testing and integrating of new smart scripting languages into our blockchain technology. This also includes the design and implementation of relevant domain specific languages (DSLs). You will have a strong understanding of programming language design, type systems, operational semantics, interpreters, and compiler implementation techniques.
Applications from folk who will increase our diversity are encouraged. Details here.

by Philip Wadler (noreply@blogger.com) at August 07, 2019 09:07 AM

August 05, 2019

Manuel M T Chakravarty

Functional Blockchain Contracts

Check out the draft of the paper describing the principles underlying Plutus Platform. Here is the abstract:

Distributed cryptographic ledgers —aka blockchains —should be a functional programmer’s dream. Their aim is immutability: once a block has been added to the chain it should not be altered or removed. The seminal blockchain, Bitcoin, uses a graph-based model that is purely functional in nature. But Bitcoin has limited support for smart contracts and distributed applications. The seminal smart-contract platform, Ethereum, uses an imperative and object-oriented model of accounts. Ethereum has been subject to numerous exploits, often linked to its use of shared mutable state by way of its imperative and object-oriented features in a concurrent and distributed system. Coding a distributed application for Ethereum requires two languages: Javascript to run off-chain, which submits transaction written in Solidity to run on-chain.

This paper describes Plutus Platform, a functional blockchain smart contract system for coding distributed applications on top of the Cardano blockchain. Most blockchain programming platforms depend on a custom language, such as Ethereum’s Solidity, but Plutus is provided as a set of libraries for Haskell. Both off-chain and on-chain code are written in Haskell: off-chain code using the Plutus library, and on-chain code in a subset of Haskell using Template Haskell. On-chain code is compiled to a tiny functional language called Plutus Core, which is System Fω with iso-recursive types and suitable primitives.

Plutus and Cardano are available open source, and Plutus Playground provides a web-based IDE that enables users to try out the system and to develop simple applications.

August 05, 2019 10:34 AM