# PolyConf 2015

Last year I spoke at PolyConf about client/server Haskell webapps, and especially on GHCJS. This year's PolyConf has just opened up a Call for Proposals.

I really enjoyed my time at the conference last year: it's a great place to meet and interact with people doing cool things in other languages, and can be a great source of inspiration for ways we can do things better. Also, as I've mentioned recently, I think it's a great thing for us Haskellers to put out content accessible by the non-Haskell world, and a conference like this is a perfect way to do so.

If you have a topic that you think polyglot programmers would be interested in, please take the time to submit a proposal for the conference.

# A Type Safe Reverse or Some Hasochism

Conor McBride was not joking when he and his co-author entitled their paper about dependent typing in Haskell “Hasochism”: Lindley and McBride (2013).

In trying to resurrect the Haskell package yarr, it seemed that a dependently typed reverse function needed to be written. Writing such a function turns out to be far from straightforward. How GHC determines that a proof (program) discharges a proposition (type signature) is rather opaque and perhaps not surprisingly the error messages one gets if the proof is incorrect are far from easy to interpret.

I’d like to thank all the folk on StackOverflow whose answers and comments I have used freely below. Needless to say, any errors are entirely mine.

Here are two implementations, each starting from different axioms (NB: I have never seen type families referred to as axioms but it seems helpful to think about them in this way).

> {-# OPTIONS_GHC -Wall                     #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults   #-}
> {-# OPTIONS_GHC -fno-warn-unused-do-bind  #-}
> {-# OPTIONS_GHC -fno-warn-missing-methods #-}
> {-# OPTIONS_GHC -fno-warn-orphans         #-}

> {-# LANGUAGE GADTs                        #-}
> {-# LANGUAGE KindSignatures               #-}
> {-# LANGUAGE DataKinds                    #-}
> {-# LANGUAGE TypeFamilies                 #-}
> {-# LANGUAGE UndecidableInstances         #-}
> {-# LANGUAGE ExplicitForAll               #-}
> {-# LANGUAGE TypeOperators                #-}
> {-# LANGUAGE ScopedTypeVariables          #-}


For both implementations, we need propositional equality: if a :~: b is inhabited by some terminating value, then the type a is the same as the type b. Further we need the normal form of an equality proof: Refl :: a :~: a and a function, gcastWith which allows us to use internal equality (:~:) to discharge a required proof of external equality (~). Readers familiar with topos theory, for example see Lambek and Scott (1988), will note that the notation is reversed.

> import Data.Type.Equality ( (:~:) (Refl), gcastWith )


For the second of the two approaches adumbrated we will need

> import Data.Proxy


The usual natural numbers:

> data Nat = Z | S Nat


We need some axioms:

1. A left unit and
2. Restricted commutativity.
> type family (n :: Nat) :+ (m :: Nat) :: Nat where
>     Z   :+ m = m
>     S n :+ m = n :+ S m


We need the usual singleton for Nat to tie types and terms together.

> data SNat :: Nat -> * where
>   SZero :: SNat Z
>   SSucc :: SNat n -> SNat (S n)


Now we can prove some lemmas.

First a lemma showing we can push :+ inside a successor, S.

> succ_plus_id :: SNat n1 -> SNat n2 -> (((S n1) :+ n2) :~: (S (n1 :+ n2)))
> succ_plus_id SZero _ = Refl
> succ_plus_id (SSucc n) m = gcastWith (succ_plus_id n (SSucc m)) Refl


This looks nothing like a standard mathematical proof and it’s hard to know what ghc is doing under the covers but presumably something like this:

• For SZero
1. S Z :+ n2 = Z :+ S n2 (by axiom 2) = S n2 (by axiom 1) and
2. S (Z + n2) = S n2 (by axiom 1)
3. So S Z :+ n2 = S (Z + n2)
• For SSucc
1. SSucc n :: SNat (S k) so n :: SNat k and m :: SNat i so SSucc m :: SNat (S i)
2. succ_plus id n (SSucc m) :: k ~ S p => S p :+ S i :~: S (p :+ S i) (by hypothesis)
3. k ~ S p => S p :+ S i :~: S (S p :+ i) (by axiom 2)
4. k :+ S i :~: S (k :+ i) (by substitution)
5. S k :+ i :~: S (k :+ i) (by axiom 2)

Second a lemma showing that Z is also the right unit.

> plus_id_r :: SNat n -> ((n :+ Z) :~: n)
> plus_id_r SZero = Refl
> plus_id_r (SSucc n) = gcastWith (plus_id_r n) (succ_plus_id n SZero)

• For SZero
1. Z :+ Z = Z (by axiom 1)
• For SSucc
1. SSucc n :: SNat (S k) so n :: SNat k
2. plus_id_r n :: k :+ Z :~: k (by hypothesis)
3. succ_plus_id n SZero :: S k :+ Z :~: S (k + Z) (by the succ_plus_id lemma)
4. succ_plus_id n SZero :: k :+ Z ~ k => S k :+ Z :~: S k (by substitution)
5. plus_id_r n :: k :+ Z :~: k (by equation 2)

Now we can defined vectors which have their lengths encoded in their type.

> infixr 4 :::

> data Vec a n where
>     Nil   :: Vec a Z
>     (:::) :: a -> Vec a n -> Vec a (S n)


We can prove a simple result using the lemma that Z is a right unit.

> elim0 :: SNat n -> (Vec a (n :+ Z) -> Vec a n)
> elim0 n x = gcastWith (plus_id_r n) x


Armed with this we can write an ${\mathcal{O}}(n)$ reverse function in which the length of the result is guaranteed to be the same as the length of the argument.

> size :: Vec a n -> SNat n
> size Nil         = SZero
> size (_ ::: xs)  = SSucc $size xs  > accrev :: Vec a n -> Vec a n > accrev x = elim0 (size x)$ go Nil x where
>     go :: Vec a m -> Vec a n -> Vec a (n :+ m)
>     go acc  Nil       = acc
>     go acc (x ::: xs) = go (x ::: acc) xs

> toList :: Vec a n -> [a]
> toList  Nil       = []
> toList (x ::: xs) = x : toList xs

> test0 :: [String]
> test0 = toList $accrev$ "a" ::: "b" ::: "c" ::: Nil

ghci> test0
["c","b","a"]


For an alternative approach, let us change the axioms slightly.

> type family (n1 :: Nat) + (n2 :: Nat) :: Nat where
>   Z + n2 = n2
>   (S n1) + n2 = S (n1 + n2)


Now the proof that Z is a right unit is more straightforward.

> plus_id_r1 :: SNat n -> ((n + Z) :~: n)
> plus_id_r1 SZero = Refl
> plus_id_r1 (SSucc n) = gcastWith (plus_id_r1 n) Refl


For the lemma showing we can push + inside a successor, S, we can use a Proxy.

> plus_succ_r1 :: SNat n1 -> Proxy n2 -> ((n1 + (S n2)) :~: (S (n1 + n2)))
> plus_succ_r1 SZero _ = Refl
> plus_succ_r1 (SSucc n1) proxy_n2 = gcastWith (plus_succ_r1 n1 proxy_n2) Refl


Now we can write our reverse function without having to calculate sizes.

> accrev1 :: Vec a n -> Vec a n
> accrev1 Nil = Nil
> accrev1 list = go SZero Nil list
>   where
>     go :: SNat n1 -> Vec a n1 -> Vec a n2 -> Vec a (n1 + n2)
>     go snat acc Nil = gcastWith (plus_id_r1 snat) acc
>     go snat acc (h ::: (t :: Vec a n3)) =
>       gcastWith (plus_succ_r1 snat (Proxy :: Proxy n3))
>                 (go (SSucc snat) (h ::: acc) t)

> test1 :: [String]
> test1 = toList $accrev1$ "a" ::: "b" ::: "c" ::: Nil

ghci> test0
["c","b","a"]


## Bibliography

Lambek, J., and P.J. Scott. 1988. Introduction to Higher-Order Categorical Logic. Cambridge Studies in Advanced Mathematics. Cambridge University Press. http://books.google.co.uk/books?id=6PY_emBeGjUC.

Lindley, Sam, and Conor McBride. 2013. “Hasochism: The Pleasure and Pain of Dependently Typed Haskell Programming.” In Proceedings of the 2013 ACM SIGPLAN Symposium on Haskell, 81–92. Haskell ’13. New York, NY, USA: ACM. doi:10.1145/2503778.2503786.

# Ok, Internet, I heard you. I need a new name for the JS type inferer/checker.

I knew a new name is needed for this JS type inferer/checker. Many people have pointed out that sweet.js uses the same short name (sjs). I could just stick with “Safe JS” but not make it short. Or I could pick a new name, to make it less boring.

Thanks!

## January 24, 2015

### Christopher Done

I spend a lot of my time on Haskell tooling, both for my hobbies and my job. Almost every project I work on sparks a desire for another piece of tooling. Much of the time, I’ll follow that wish and take a detour to implement that thing (Fay, structured-haskell-mode, hindent, are some Haskell-specific examples). But in the end it means less time working on the actual domain problem I’m interested in, so a while ago I intentionally placed a quota on the amount of time I can spend on this.

So this page will contain a list of things I’d work on if I had infinite spare time, and that I wish someone else would make. I’ll update it from time to time as ideas come to the fore.

These projects are non-trivial but are do-able by one person who has enough free time and motivation. There is a common theme among the projects listed, which is that they are things that Haskell among most other well known languages is particularly well suited for and yet we don’t have such tooling as standard tools in the Haskell tool box. They should be!

## An equational reasoning assistant

Equational reasoning lets you prove properties about your functions by following a simple substitution model to state that one term is equal to another. The approach I typically take is to expand and reduce until both sides of the equation are the same.

Here is an example. I have a data type, Consumer. Here is an instance of Functor:

instance Functor (Consumer s d) where
fmap f (Consumer d p) =
Consumer d
(\s ->
case p s of
(Failed e,s') -> (Failed e,s')
(Continued e,s') -> (Continued e,s')
(Succeeded a,s') ->
(Succeeded (f a),s'))

I want to prove that it is a law-abiding instance of Functor, which means proving that fmap id ≡ id. You don’t need to know anything about the Consumer type itself, just this implementation. Here are some very mechanical steps one can take to prove this:

id ≡ fmap id
≡ \(Consumer d p) ->
Consumer d
(\s ->
case p s of
(Failed e,s') -> (Failed e,s')
(Continued e,s') -> (Continued e,s')
(Succeeded a,s') -> (Succeeded (id a),s'))
≡ \(Consumer d p) ->
Consumer d
(\s ->
case p s of
(Failed e,s') -> (Failed e,s')
(Continued e,s') -> (Continued e,s')
(Succeeded a,s') -> (Succeeded a,s'))
≡ \(Consumer d p) ->
Consumer d
(\s -> p s)
≡ \(Consumer d p) ->
Consumer d p
≡ id

So that’s:

• Expand the fmap id into the instance’s implementation.
• Reduce by applying the property that id x ≡ x.
• Reason that if every branch of a case returns the original value of the case, then that whole case is an identity and can be dropped.
• Eta-reduce.
• Again, pattern-matching lambdas are just syntactic sugar for cases, so by the same rule this can be considered identity.
• End up with what we wanted to prove: fmap id ≡ id

These are pretty mechanical steps. They’re also pretty laborious and error-prone. Of course, if you look at the first step, it’s pretty obvious the whole thing is an identity, but writing the steps out provides transformations that can be statically checked by a program. So it’s a good example, because it’s easily understandable and you can imagine proving something more complex would require a lot more steps and a lot more substitutions. Proof of identity for Applicative has substantially more steps, but is equally mechanical.

Wouldn’t it be nice if there was a tool which given some expression would do the following?

• Suggest a list of in-place expansions.
• Suggest a list of reductions based on a set of pre-defined rules (or axioms).

Then I could easily provide an interactive interface for this from Emacs.

In order to do expansion, you need the original source of the function name you want to expand. So in the case of id, that’s why I suggested stating an axiom (id a ≡ a) for this. Similarly, I could state the identity law for Monoids by saying mappend mempty a ≡ a, mappend a mempty ≡ a. I don’t necessarily need to expand the source of all functions. Usually just the ones I’m interested in.

Given such a system, for my example above, the program could actually perform all those steps automatically and spit out the steps so that I can read them if I choose, or otherwise accept that the proof was derived sensibly.

In fact, suppose I have my implementation again, and I state what must be satisfied by the equational process (and, perhaps, some axioms that might be helpful for doing it, but in this case our axioms are pretty standard), I might write it like this:

instance Functor (Consumer s d) where
fmap f (Consumer d p) = ...
proof [|fmap id ≡ id :: Consumer s d a|]

This template-haskell macro proof would run the steps above and if the equivalence is satisfied, the program compiles. If not, it generates a compile error, showing the steps it performed and where it got stuck. TH has limitations, so it might require writing it another way.

Such a helpful tool would also encourage people (even newbies) to do more equational reasoning, which Haskell is often claimed to be good at but you don’t often see it in evidence in codebases. In practice isn’t a standard thing.

Promising work in this area:

## Catch for GHC

Ideally, we would never have inexhaustive patterns in Haskell. But a combination of an insufficient type system and people’s insistence on using partial functions leads to a library ecosystem full of potential landmines. Catch is a project by Neil Mitchell which considers how a function is called when determining whether its patterns are exhaustive or not. This lets us use things like head and actually have a formal proof that our use is correct, or a formal proof that our use, or someone else’s use, will possibly crash.

map head . group

This is an example which is always correct, because group returns a list of non-empty lists.

Unfortunately, it currently works for a defunct Haskell compiler, but apparently it can be ported to GHC Core with some work. I would very much like for someone to do that. This is yet another project which is the kind of thing people claim is possible thanks to Haskell’s unique properties, but in practice it isn’t a standard thing, in the way that QuickCheck is.

## A substitution stepper

This is semi-related, but different, to the proof assistant. I would like a program which can accept a Haskell module of source code and an expression to evaluate in the context of that module and output the same expression, as valid source code, with a single evaluation step performed. This would be fantastic for writing new algorithms, for understanding existing functions and algorithms, writing proofs, and learning Haskell. There was something like this demonstrated in Inventing on Principle. The opportunities for education and general development practice are worth such a project.

Note: A debugger stepper is not the same thing.

Example:

foldr (+) 0 [1, 2, 3, 4]

foldr (+) 0 (1 : [2, 3, 4])

1 + foldr (+) 0 [2, 3, 4]

1 + foldr (+) 0 (2 : [3, 4])

1 + (2 + foldr (+) 0 [3, 4])

1 + (2 + foldr (+) 0 (3 : [4]))

1 + (2 + (3 + foldr (+) 0 [4]))

1 + (2 + (3 + foldr (+) 0 (4 : [])))

1 + (2 + (3 + (4 + foldr (+) 0 [])))

1 + (2 + (3 + (4 + 0)))

1 + (2 + (3 + 4))

1 + (2 + 7)

1 + 9

10

Comparing this with foldl immediately shows the viewer how they differ in structure:

foldl (+) 0 [1, 2, 3, 4]

foldl (+) 0 (1 : [2, 3, 4])

foldl (+) ((+) 0 1) [2, 3, 4]

foldl (+) ((+) 0 1) (2 : [3, 4])

foldl (+) ((+) ((+) 0 1) 2) [3, 4]

foldl (+) ((+) ((+) 0 1) 2) (3 : [4])

foldl (+) ((+) ((+) ((+) 0 1) 2) 3) [4]

foldl (+) ((+) ((+) ((+) 0 1) 2) 3) (4 : [])

foldl (+) ((+) ((+) ((+) ((+) 0 1) 2) 3) 4) []

(+) ((+) ((+) ((+) 0 1) 2) 3) 4

1 + 2 + 3 + 4

3 + 3 + 4

6 + 4

10

Each step in this is a valid Haskell program, and it’s just simple substitution.

If the source for a function isn’t available, there are a couple options for what to do:

• Have special-cases for things like (+), as above.
• Just perform no substitution for that function, it will still be a legitimate program.

It’s another project I could easily provide see-as-you-type support for in Emacs, given an engine to query.

Again, this is just one more project which should just be a standard thing Haskell can do. It’s a pure language. It’s used to teach equational reasoning and following a simple lambda calculus substitution model. But there is no such tool. Haskell is practically waving in our faces with this opportunity.

Existing work in this area:

• stepeval - a prototype which nicely demonstrates the idea. It’s based on HSE and only supports a tiny subset. There aren’t any plans to move this forward at the moment. I’ll update the page if this changes.

# Writing a low level graph database

I've been interested in graph databases for a long time, and I've developed several applications that offer an API close enough to a graph API, but with relational storage. I've also played with off the shelf graph databases, but I thought it would be fun to try an implement my own, in Haskell of course.
I found that in general literature is quite succinct on how database products manage their physical storage, so I've used some of the ideas behind the Neo4J database, as explained in the Graph Databases books and in a few slideshows online.
So I've written the start of a very low level graph database, writing directly on disk via Handles and some Binary instances. I try to use fixed length record so that their IDs translate easily into offsets in the file. Mostly everything ends up looking like linked lists on disk: vertices have a pointer to their first property and their first edge, and in turn these have pointers to the next property or edge. Vertex have pointers to edges linking to and from them.
I've also had some fun trying to implement an index trie on disk.
All in all, it's quite fun, even though I realize my implementations are quite naive, and I just hope that the OS disk caching is enough to make performance acceptable. I've written a small benchmark using the Hackage graph of packages as sample data, but I would need to write the same with a relational backend.

If anybody is interested in looking at the code or even participate, everything is of course on Github!

# Commercial Haskell Special Interest Group

At FP Complete, we’re constantly striving to improve the quality of the Haskell ecosystem, with a strong emphasis on making Haskell a viable tool for commercial users. Over the past few years we’ve spoken with many companies either currently using Haskell or considering doing so, worked with a number of customers in making Haskell a reality for their software projects, and released tooling and libraries to the community.

We’re also aware that we’re not the only company trying to make Haskell a success, and that others are working on similar projects to our own. We believe that there’s quite a lot of room to collaborate on identifying problems, discussing options, and creating solutions.

Together with a few other companies and individuals, we are happy to announce the launch of a Commercial Haskell Special Interest Group.

If you're interested in using Haskell in a commercial context, please join the mailing list. I know that we have some projects we think are worth immediate collaboration, and we'll kick off discussions on those after people have time to join the mailing list. And I'm sure many others have ideas too. I look forward to hearing them!

# Combining inputs in conduit

The code the post on using my simplistic state machine together with conduit will happily wait forever for input, and the only way to terminate it is pressing Ctrl-C. In this series I’m writing a simple adder machine, but my actual use case for these ideas are protocols for communication (see the first post), and waiting forever isn’t such a good thing; I want my machine to time out if input doesn’t arrive in a timely fashion.

I can think of a few ways to achieve this:

1. Use a watchdog thread that is signalled from the main thread, e.g. by a Conduit that sends a signal as each Char passes through it.
2. As each Char is read kick off a “timeout thread” which throws an exception back to the main thread, unless the main thread kills it before the timeout expires.
3. Run a thread that creates ticks that then can be combined with the Chars read and fed into the state machine itself.

Since this is all about state machines I’ve opted for option 3. Furthermore, since I’ve recently finished reading the excellent Parallel and Concurrent Programming in Haskell I decided to attempt writing the conduit code myself instead of using something like stm-conduit.

The idea is to write two functions:

1. one Source to combine two Sources, and
2. one Sink that writes its input into a TMVar.

The latter of the two is the easiest one. Given a TMVar it just awaits input, stores it in the TMVar and then calls itself:

sinkTMVar :: MonadIO m => TMVar a -> Sink a m ()
sinkTMVar tmv = forever $do v <- await case v of Nothing -> return () Just v' -> liftIO (atomically$ putTMVar tmv v')

The other one is only slightly more involved:

whyTMVar :: MonadIO m => Source (ResourceT IO) a -> Source (ResourceT IO) a -> Source m a
whyTMVar src1 src2 = do
t1 <- liftIO newEmptyTMVarIO
t2 <- liftIO newEmptyTMVarIO
void $liftIO$ async $fstProc t1 void$ liftIO $async$ sndProc t2
forever $liftIO (atomically$ takeTMVar t1 orElse takeTMVar t2) >>= C.yield

where
fstProc t = runResourceT $src1 $$sinkTMVar t sndProc t = runResourceT src2$$ sinkTMVar t Rather short and sweet I think. However, there are a few things that I’m not completely sure of yet. forkIO vs. async vs. resourceForkIO There is a choice between at least three functions when it comes to creating the threads and I haven’t looked into which one is better to use. AFAIU there may be issues around exception handling and with resources. For now I’ve gone with async for no particular reason at all. Using TMVar In this example the input arrives rather slowly, which means having room for a single piece at a time is enough. If the use case changes and input arrives quicker then this decision has to be revisited. I’d probably choose to use stm-conduit in that case since it uses TMChan internally. Combining only two Sources Again, this use case doesn’t call for more than two Sources, at least at this point. If the need arises for using more Sources I’ll switch to stm-conduit since it already defines a function to combine a list of Sources. The next step will be to modify the conduit process and the state machine. ## January 21, 2015 ### mightybyte # LTMT Part 3: The Monad Cookbook ### Introduction The previous two posts in my Less Traveled Monad Tutorial series have not had much in the way of directly practical content. In other words, if you only read those posts and nothing else about monads, you probably wouldn't be able to use monads in real code. This was intentional because I felt that the practical stuff (like do notation) had adequate treatment in other resources. In this post I'm still not going to talk about the details of do notation--you should definitely read about that elsewhere--but I am going to talk about some of the most common things I have seen beginners struggle with and give you cookbook-style patterns that you can use to solve these issues. ### Problem: Getting at the pure value inside the monad This is perhaps the most common problem for Haskell newcomers. It usually manifests itself as something like this: main = do lineList <- lines$ readFile "myfile.txt"
-- ... do something with lineList here


That code generates the following error from GHC:

    Couldn't match type IO String' with [Char]'
Expected type: String
Actual type: IO String
In the return type of a call of readFile'


Many newcomers seem puzzled by this error message, but it tells you EXACTLY what the problem is. The return type of readFile has type IO String, but the thing that is expected in that spot is a String. (Note: String is a synonym for [Char].) The problem is, this isn't very helpful. You could understand that error completely and still not know how to solve the problem. First, let's look at the types involved.

readFile :: FilePath -> IO String
lines :: String -> [String]


Both of these functions are defined in Prelude. These two type signatures show the problem very clearly. readFile returns an IO String, but the lines function is expecting a String as its first argument. IO String != String. Somehow we need to extract the String out of the IO in order to pass it to the lines function. This is exactly what do notation was designed to help you with.

### Solution #1

main :: IO ()
main = do
let lineList = lines contents
-- ... do something with lineList here


This solution demonstrates two things about do notation. First, the left arrow lets you pull things out of the monad. Second, if you're not pulling something out of a monad, use "let foo =". One metaphor that might help you remember this is to think of "IO String" as a computation in the IO monad that returns a String. A do block lets you run these computations and assign names to the resulting pure values.

### Solution #2

We could also attack the problem a different way. Instead of pulling the result of readFile out of the monad, we can lift the lines function into the monad. The function we use to do that is called liftM.

liftM :: Monad m => (a -> b) -> m a -> m b
liftM :: Monad m => (a -> b) -> (m a -> m b)


The associativity of the -> operator is such that these two type signatures are equivalent. If you've ever heard Haskell people saying that all functions are single argument functions, this is what they are talking about. You can think of liftM as a function that takes one argument, a function (a -> b), and returns another function, a function (m a -> m b). When you think about it this way, you see that the liftM function converts a function of pure values into a function of monadic values. This is exactly what we were looking for.

main :: IO ()
main = do
lineList <- liftM lines (readFile "myfile.txt")
-- ... do something with lineList here


This is more concise than our previous solution, so in this simple example it is probably what we would use. But if we needed to use contents in more than one place, then the first solution would be better.

### Problem: Making pure values monadic

Consider the following program:

import Control.Monad
import System.Environment
main :: IO ()
main = do
args <- getArgs
output <- case args of
[] -> "cat: must specify some files"
fs -> liftM concat (mapM readFile fs)
putStrLn output


This program also has an error. GHC actually gives you three errors here because there's no way for it to know exactly what you meant. But the first error is the one we're interested in.

    Couldn't match type []' with IO'
Expected type: IO Char
Actual type: [Char]
In the expression: "cat: must specify some files"


Just like before, this error tells us exactly what's wrong. We're supposed to have an IO something, but we only have a String (remember, String is the same as [Char]). It's not convenient for us to get the pure result out of the readFile functions like we did before because of the structure of what we're trying to do. The two patterns in the case statement must have the same type, so that means that we need to somehow convert our String into an IO String. This is exactly what the return function is for.

### Solution: return

return :: a -> m a


This type signature tells us that return takes any type a as input and returns "m a". So all we have to do is use the return function.

import Control.Monad
import System.Environment
main :: IO ()
main = do
args <- getArgs
output <- case args of
[] -> return "cat: must specify some files"
fs -> liftM concat (mapM readFile fs)
putStrLn output


The 'm' that the return function wraps its argument in, is determined by the context. In this case, main is in the IO monad, so that's what return uses.

### Problem: Chaining multiple monadic operations

import System.Environment
main :: IO ()
main = do
[from,to] <- getArgs
writeLBS $Aeson.encode x The question is: where do we put this function? In small projects, these seem to inevitably end up inside the well-known Utils module. In larger, or more well-organised projects, it might end up in Foo.Web or Foo.Web.Utils. However, if we think outside of the box, and disregard dependency problems and libraries including every possible utility function one can write, it is clearer where this function should go: in Snap.Core. Putting it in Snap.Core is obviously not a solution – imagine the trouble library maintainers would have to deal with in order to include all these utility functions. # The basic scheme The scheme we use to solve this is simple yet powerful: in our own application’s non-exposed modules list, we add Snap.Core.Extended. src/Snap/Core/Extended.hs: {-# LANGUAGE OverloadedStrings #-} module Snap.Core.Extended ( module Snap.Core , json ) where import qualified Data.Aeson as Aeson import Snap.Core json :: (MonadSnap m, Aeson.ToJSON a) => a -> m () json x = do modifyResponse$ setContentType "application/json"
writeLBS $Aeson.encode x The important thing to notice here is the re-export of module Snap.Core. This means that, everywhere in our application, we can use import Snap.Core.Extended as a drop-in replacement for import Snap.Core. This also makes sharing code in a team easier. For example, say that you are looking for a catMaybes for Data.Vector. Before, I would have considered either defining this in a where clause, or locally as a non-exported function. This works for single-person projects, but not when different people are working on different modules: you end up with five implementations of this method, scattered throughout the codebase. With this scheme, however, it’s clear where to look for such a method: in Data.Vector.Extended. If it’s not there, you add it. Aside from utility functions, this scheme also works great for orphan instances. For example, if we want to serialize a HashMap k v by converting it to [(k, v)], we can add a Data.HashMap.Strict.Extended module. src/Data/HashMap/Strict/Extended.hs: {-# OPTIONS_GHC -fno-warn-orphans #-} module Data.HashMap.Strict.Extended ( module Data.HashMap.Strict ) where import Data.Binary (Binary (..)) import Data.Hashable (Hashable) import Data.HashMap.Strict instance (Binary k, Binary v, Eq k, Hashable k) => Binary (HashMap k v) where put = put . toList get = fmap fromList get A special case of these .Extended modules is Prelude.Extended. Since you will typically import Prelude.Extended into almost all modules in your application, it is a great way to add a bunch of (very) common imports from base, so import noise is reduced. This is, of course, quite subjective. Some might want to add a few specific functions to Prelude (as illustrated below), and others might prefer to add all of Control.Applicative, Data.List, Data.Maybe, and so on. src/Prelude/Extended.hs: module Prelude.Extended ( module Prelude , foldl' , fromMaybe ) where import Data.List (foldl') import Data.Maybe (fromMaybe) import Prelude # Scaling up The basic scheme breaks once our application consists of several cabal packages. If we have a package acmecorp-web, which depends on acmecorp-core, we would have to expose Data.HashMap.Strict.Extended from acmecorp-core, which feels weird. A simple solution is to create an unordered-containers-extended package (which is not uploaded to the public Hackage for obvious reasons). Then, you can export Data.HashMap.Strict.Extended from there. This solution creates quite a lot of overhead. Having many modules is fine, since they are easy to manage – they are just files after all. Managing many packages, however, is harder: every package introduces a significant amount of overhead: for example, repos need to be maintained, and dependencies need to be managed explicitly in the cabal file. An alternative solution is to simply put all of these modules together in a hackage-extended package. This solves the maintenance overhead and still gives you a very clean module hierarchy. # Conclusion After using this scheme for over year in a large, constantly evolving Haskell application, it is clear to me that this is a great way to organise and share code in a team. A side-effect of this scheme is that it becomes very convenient to consider some utility functions from these .Extended modules for inclusion in their respective libraries, since they all live in the same place. If they do get added, just remove the originals from hackage-extended, and the rest of your code doesn’t even break! Thanks to Alex Sayers for proofreading! ## January 19, 2015 ### The GHC Team # GHC Weekly News - 2015/01/19 Hi *, It's time for some more GHC news! The GHC 7.10 release is closing in, which has been the primary place we're focusing our attention. In particular, we're hoping RC2 will be Real Soon Now. Some notes from the past GHC HQ meetings this week: • GHC 7.10 is still rolling along smoothly, and it's expected that RC2 will be cut this Friday, January 23rd. Austin sent out an email about this to ghc-devs, so we can hopefully get all the necessary fixes in. • Currently, GHC HQ isn't planning on focusing many cycles on any GHC 7.10 tickets that aren't highest priority. We're otherwise going to fix things as we see fit, at our leisure - but a highest priority bug is a showstopper for us. This means if you have something you consider a showstopper for the next release, you should bump the priority on the ticket and yell at us! • We otherwise think everything looks pretty smooth for 7.10.1 RC2 - our libraries are updated, and most of the currently queued patches (with a few minor exceptions) are done and merged. Some notes from the mailing list include: • Austin has alerted everyone that soon, Phabricator will run all builds with ./validate --slow, which will increase the time taken for most builds, but will catch a wider array of bugs in commits and submitted patches - there are many cases the default ./validate script still doesn't catch. https://www.haskell.org/pipermail/ghc-devs/2015-January/008030.html • Johan Tibell asked about some clarifications for the HsBang datatype inside GHC. In response, Simon came back with some clarifications, comments, and refactorings, which greatly helped Johan. ttps://www.haskell.org/pipermail/ghc-devs/2015-January/007905.html • Richard Eisenberg had a question about the vectoriser: can we disable it? DPH seems to have stagnated a bit recently, bringing into question the necessity of keeping it on. There hasn't been anything done yet, but it looks like the build will get lighter, with a few more modules soon: https://www.haskell.org/pipermail/ghc-devs/2015-January/007986.html • Jan Stolarek has a simple question: what English spelling do we aim for in GHC? It seems that while GHC supports an assortment of British and American english syntactic literals (e.g. SPECIALIZE and SPECIALISE), the compiler sports an assortment of British/American identifiers on its own! https://www.haskell.org/pipermail/ghc-devs/2015-January/007999.html Closed tickets the past few weeks include: #9966, #9904, #9969, #9972, #9934, #9967, #9875, #9900, #9973, #9890, #5821, #9984, #9997, #9998, #9971, #10000, #10002, #9243, #9889, #9384, #8624, #9922, #9878, #9999, #9957, #7298, and #9836. ### Philip Wadler # My email is a monster My New Year's resolution is to look at my e-mail at most once a day. If you need a response in less than a day or two, please arrange it with me in advance or use a different medium. Cartoon courtesy of Oatmeal. ## January 17, 2015 ### Matthew Sackman # Free speech values In the aftermath of the attacks on Charlie Hebdo in Paris, there has been some high quality thinking and writing. There's also been some really stupid things said, from the usual protagonists. It's an interesting facilitation that the internet now provides: as I no longer watch any news on TV (in fact I don't watch any news at all), nor subscribe to any newspaper, I'm used to reading articles from a wide range of sources. Equally, it's much easier for me to avoid opinion I disagree (right-wing press) with or trivialised dumbed-down reporting (e.g. BBC news). Because of this ease of reading what you want to (in both the good and bad sense), I thought a lot of the reaction was measured and sensible. Turns out I was just unaware of most of the reaction going on. Anyway there seems to be virtually nothing left to say on this, so this post is really little more than a bunch of links to pieces I thought were thoughtful and well written. I am an agnostic. I personally don't believe in any religion but I accept I can't prove that every religion is false and so I may be wrong. I tend to treat the beliefs of any religion as arbitrary and abstract ideas. Thus one place to start is the acknowledgement that the laws of any country or civilisation are as arbitrary and ad-hoc as the rules or teachings of any religion. They are just things that people choose to believe in, or follow, or not violate. In the UK, some of our law is based in Christianity (e.g. thou shalt not murder - though I've no idea whether ideas like that actually predate Christianity; I wouldn't be surprised if they do) though other parts of Christianity are not encoded in law (adultery is not illegal, for example). Many have careless labelled these attacks as an attack of free speech and thus that the reaction is about defending free speech. As such, much has been written about how it's possible to defend the right Charlie Hebdo has to publish anything they want, even if it's offensive, even whilst criticising their choice of content. Freedom of speech is, I believe, essential for any sort of democracy to work. This doesn't suggest that if you have freedom of speech then you have a democracy (I don't believe in the UK, or arguably anywhere in Europe or north America there is functioning democracy; merely systemic corporate dictatorships disguised by corrupt elected faux-representatives), but without freedom of speech, you certainly don't have any ability to transparently hold people to account and thus corruption and abuse will obviously take hold. But freedom of speech is a choice, it's not something axiomatic to existence; it's something many people have chosen to attach huge importance to and as such will defend. But just because there are widely agreed reasons to defend the concept of freedom of speech doesn't mean that it's self-evidently a "better" idea than not having freedom of speech. To judge something as "better" than something else requires all manner of discussion of the state of human existence. Consequently, criticising people for not holding freedom of speech in the same regard as we claim to do is no different from criticising people for their choice of clothing, or housing, or diet, or career, or religious views. Of course in the UK, we don't have freedom of speech as an absolute concept nor does most of Europe. In the UK, "incitement to racial hatred" was established as an offence by the provisions of §§ 17-29 of the Public Order Act 1986, and The Criminal Justice and Public Order Act 1994 made publication of material that incited racial hatred an arrestable offence. It's not difficult to find stories of people getting arrested for saying things on twitter, the easiest is of course making bomb threats. UKIP (an extremist right wing political party in the UK) even managed to get police to visit a man who'd posted a series of tweets fact-checking UKIP policies. Thankfully, he wasn't arrested. The USA appears to hold freedom of speech as a much more inviolable concept. For example: The ACLU vigorously defends the right of neo-Nazis to march through a community filled with Holocaust survivors in Skokie, Illinois, but does not join the march; they instead vocally condemn the targeted ideas as grotesque while defending the right to express them. But whilst the outpouring in Paris and the crowds gathered as a statement of unity were warmly defiant, it is somewhat likely that rather more than physical violence that was being defied, and more than freedom of speech defended by the crowd. As Brian Klug wrote: Here is a thought experiment: Suppose that while the demonstrators stood solemnly at Place de la Republique the other night, holding up their pens and wearing their “je suis charlie” badges, a man stepped out in front brandishing a water pistol and wearing a badge that said “je suis cherif” (the first name of one of the two brothers who gunned down the Charlie Hebdo staff). Suppose he was carrying a placard with a cartoon depicting the editor of the magazine lying in a pool of blood, saying, “Well I’ll be a son of a gun!” or “You’ve really blown me away!” or some such witticism. How would the crowd have reacted? Would they have laughed? Would they have applauded this gesture as quintessentially French? Would they have seen this lone individual as a hero, standing up for liberty and freedom of speech? Or would they have been profoundly offended? And infuriated. And then what? Perhaps many of them would have denounced the offender, screaming imprecations at him. Some might have thrown their pens at him. One or two individuals — two brothers perhaps — might have raced towards him and (cheered on by the crowd) attacked him with their fists, smashing his head against the ground. All in the name of freedom of expression. He would have been lucky to get away with his life. It seems that some things you can say and governments will try and protect you. It would appear in much of Europe that blaspheming Islam is legally OK. But, as noted above, saying other things will get you arrested. The French "comedian" Dieudonné was arrested just 48 hours after the march through Paris on charges of "defending terrorism". Whilst not arrested, the UK Liberal Democrat MP David ward tweeted "Je suis Palestinian" during the Paris marches and criticised the presence of the Israeli prime minister, Netanyahu, subsequently eliciting a complaint from the Israeli ambassador to Britain. Of course the "world leaders" who gathered in Paris have a wonderful record themselves on "protecting" free speech. Charlie Hebdo did not practise satire by mocking select, specific targets, nor did they mock all major religions equally. It's been widely reported that they sacked a cartoonist for making an anti-Semitic remark and that: Jyllands-Posten, the Danish newspaper that published caricatures of the Prophet in 2005, reportedly rejected cartoons mocking Christ because they would "provoke an outcry" and proudly declared it would "in no circumstances ... publish Holocaust cartoons". But of course it comes down to the content of the publication. In this case the cartoons exist to ridicule, make fun of and offend members of one of the world's largest religions, Islam, by mocking their prophet Mohammed. As Amanda Taub writes: Dalia Mogahed, the Director of Research at the Institute for Social Policy and Understanding, explained that Mohammed is a beloved figure to Muslims, and "it is a human impulse to want to protect what's sacred to you". Mogahed compared the cartoons to the issue of flag-burning in the United States, noting that a majority of Americans favour a constitutional amendment to ban flag-burning for similar reasons: the flag is an important symbol of a national identity, and many Americans see flag-burning as an attack on that identity, or even on the country itself. That's not extremism or backwardness; it's about protecting something you cherish. In any large group of people, there will be the vast majority of sound mind and thought, and a small minority who are not. This is just the fact that all over the earth, humans are not all the same: there is some variance in health, intelligence, and every other aspect of what a human is. Any large sampling of humans will show the same set of variations. So if you offend a huge group of people, you will offend tall people and short people, rich people and poor people, fat people and thin people, violent people and peaceful people. Unsurprisingly, it would appear that the background of these killers suggests there is little to do with Islam there, and more to do with the their upbringing, family, education and integration with society. Thus even if you feel Charlie Hebdo's publications of the cartoons served some purpose (given their biased choice of target, their purpose does not seem to be an exercise in itself of freedom of speech), it should be obvious that by offending so many people, they were placing themselves in danger. The same is true of any sustained, systemic, deliberate offence to any of this planet's major religions, races, nationalities or any other grouping of humans which share values. So it becomes a balancing act between how much do you believe in the message you're publishing versus the risk you're putting yourself in. You can view the actions of Edward Snowden in this same context: he felt that the message he was delivering on the abuses of surveillance power carried out by governments across the world outweighed the significant danger he was putting himself in, and so both delivered the message and accepted the need to flee from his country, probably never to return, in fear of the consequences of his actions. Thankfully, throughout history, there have been people who have chosen to put themselves in the path of great harm (often losing their lives as a result) in order to report, expose, document and publicise matters which the wider public needed to know. Governments, monarchies and empires have crumbled when faced with popular revolt. So freedom of speech requires consideration. It is perfectly reasonable not to say something because you anticipate you won't enjoy the consequences. Most of us do not conduct our lives by going around saying anything and everything we want to our friends and family: if we did, we'd rapidly lose a lot of friends. The expression "biting your tongue" exists for a reason. Equally, it's perfectly reasonable for a news outlet to decide not to re-publish the Charlie Hebdo cartoons if they fear a violent response that they suspect the local police forces cannot prevent; not to mention just not wanting to offend so many people. I view as daft the idea that people should never choose not to publish something out of fear. People absolutely should choose not to publish, if they feel the risk to the things they hold dear is not outweighed by the message they're delivering. Everything in life is a trade-off and every action has consequences. Whilst I agree with the right to free speech, that does not imply saying anything you like is free of consequences. If it were, it would require that words have no meaning, and subsequently all communication is void: if anything you say has no consequence then you can say nothing. I am certainly not suggesting the murders were in any way justified, or that Islam or any other religion or collection of humans should be beyond criticism or even ridicule. At the end of the day, no human is perfect, and as such we can all benefit from a thorough dose of criticism once in a while. Every article I've linked to in this post repeats that such violence, regardless of the provocation, is abhorrent, and I agree with that: murder is never an acceptable response to any drawing, written or spoken word. But that doesn't mean that these events weren't predictable. Finally then we get to the insanely idiotic response from the UK government. That MI5 should have more powers that they don't need (they probably just need more money), and that we must deny terrorists "safe space" to communicate online. Which means banning encryption, which means it's impossible to use the internet for anyone. The home secretary, Theresa May said: We are determined that as far as possible there should be no safe spaces for terrorists to communicate. I would have thought that that should be a principle ... that could have been held by everybody, across all parties in this House of Commons. So of course, if terrorists can't communicate in private then no one can. Quickly, we've immediately gone from lazy labelling of events as an attack on free speech to a knee jerk response of "free speech yes, but you certainly can't have free speech in private, because you might be a terrorist". Again, it's a trade-off. I doubt that having such restrictions on communication will make the earth or this country safer for anyone and of course the impossibility of a controlled study means it cannot be proven one way or another. No security service is ever going to be flawless and from time to time very horrible things will continue to happen. I think most people are aware of this and accept this; we're all going to die after all. The loss of civil liberties though is certainly far more worrying to me. In theory, I would think these proposals so lunatic as to never see the light of day (it would be completely impossible to enforce for one thing - terrorists along with everyone else would learn to use stenography to encode their messages in pictures of cats, thus rendering their traffic no different to that of everyone else). Sadly, Labour have stated they don't believe their position to be that far away from the Conservatives, which is deeply worrying. Labour don't exactly have a great record on this area either given their previous ID card schemes and the introduction of detention-without-charge. What is needed is some transparency. We need an informed debate, with MI5 and GCHQ providing some independently verifiable facts and figures that demonstrate how they are being thwarted in what they're trying to do. We need to understand properly what the risk to us is, and most importantly we need to understand why these threats exist and what else we can do to make them decrease. I've never seen it said that any UK Government policy in the last 15 years has made the UK less of a target for such attacks. Maybe we should look at that before we start subjecting ourselves to Orwellian control. ## January 15, 2015 ### Functional Jobs # Functional Software Developer at Moixa Technology (Full-time) Green energy / IoT startup Moixa Technology is seeking to add to our software team. We're a small team dedicated to developing technology that enables a radical change in the way that energy is used in the home. We are just ramping up to deliver a project (working with large energy companies) demonstrating how communities can use the Maslow system across a group of homes to share energy in that community. We will need to deploy software that addresses some challenging problems to make this possible. The code that we develop is built around providing services based on the hardware systems that we are deploying and so needs to work with the constraints and characteristics of the hardware & low level firmware we have built. We're looking for individuals with generalist approach that are willing and and able to participate in all aspect of design, implementation and operation of our platform. The candidate must be happy in a small team, and also able to work autonomously. We are expanding as we ramp up to deliver the next generation of control software to our increasing number of deployed system. This is an exciting moment for the company. Tasks: • Design & implemetation of all parts of our software stack (web frontend & backend, data analytics, high-level code on IoT devices) • Operations support (expected <20% of time) Our current stack involves: • Scala (Akka, Play) / ClojureScript / Haskell • Postgres, neo4j • Raspberry PI / Arch Linux • PIC32 microcontroller / C Skills and Requirements: • Experience in one or more functional languages • Familiarity with at least one database paradigm • Linux scripting and operation Advantages: • Familiarity with (strongly) typed functional languages (Haskell/ML/Scala) • Embedded programming experience • Experience in data analytics (Spark or similiar) • Experience in IoT development • Open Source contributions Moixa technology is based in central London (Primrose Hill), Salary depending on experience + performance based share options. Get information on how to apply for this position. ## January 14, 2015 ### Magnus Therning # Thought on JavaScript Over the holidays I’ve been reading Douglas Crockford’s excellent book JavaScript: The Good Parts. About halfway through I came to the conclusion that JavaScript is an “anti-LISP”. There are many reasons to learn LISP, but none of them is “LISP is widely used in industry.” As Eric Raymond is famous words claim, knowing LISP will make you a better programmer. On the other hand there seems to be almost no reasons to learn JavaScript. It sure doesn’t seem to teach anything that’ll make you a better programmer. The only reason I can come up with is “JavaScript is widely used in industry.” ## January 13, 2015 ### Neil Mitchell # User Interfaces for Users Summary: When designing a user interface, think about the user, and what they want to know. Don't just present the information you know. As part of my job I've ended up writing a fair amount of user interface code, and feedback from users has given me an appreciation of some common mistakes. Many user interfaces are designed to present information, and one of the key rules is to think about what the user wants to see, not just showing the information you have easily to hand. I was reminded of this rule when I was expecting a parcel. On the morning of Saturday 10/1/2015 I used a track my parcel interface to find: The interface presents a lot of information, but most of it is interesting to the delivery company, not to me. I have basically one question: when will my parcel arrive. From the interface I can see: • The parcel is being shipped with "Express AM" delivery. No link to what that means. Searching leads me to a page saying that guarantees delivery before 1pm on a business day (some places said noon, some said 1pm). That is useful information if I want to enter into a contract with the delivery service, but not when I'm waiting for a parcel. What happens on Saturday? Do they still deliver, just not guarantee the time? Do they wait until the next business day? Do they do either, as they see fit? • My parcel has been loaded onto a vehicle. Has the vehicle has left the depot, or is that a separate step? How many further steps are there between loading and delivery? This question is easy to answer after the parcel has been delivered, since the additional steps show up in the list, but difficult to answer before. On Saturday morning my best guess about when the parcel would arrive was between then and Monday 1pm. Having been through the complete process, I now know the best answer was between some still unknown time on Monday morning and Monday 1pm. With that information, I'd have taken my son to the park rather than keeping him cooped up indoors. I suggest the company augment their user interface with the line "Your parcel will be delivered on Monday, between 9am and 1pm". It's information they could have computed easily, and answers my question. The eagle-eyed may notice that there is a plus to expand to show more information. Alas, all that shows is: I think they're trying to say my puny iPhone can't cope with the bold font that is essential to tell me the status and I should get an iPhone plus... Checking the desktop website also showed no further information. ### FP Complete # FP Complete's software pipeline FP Complete's mission is easily expressed: increase the commercial adoption of Haskell. We firmly believe that- in many domains- Haskell is the best way to write high quality, robust, performant code in a productive (read: fast-to-market) way. Those of you who, like me, are members of the Haskell community are probably using Haskell because you believe the same thing, and believe- like us- that we can make the world a better place by getting more software to be written in Haskell. By the way: FP Complete is looking to expand its team of Haskell developers. If this idea excites you, don't forget to send us your resumes! There are two interesting groups that I've spelled out in that paragraph: commercial users of Haskell, and the Haskell community. I want to clarify how our software development process interacts with these two groups, in part to more fully answer a question from Reddit. The Haskell community has created, and continues to create, amazing software. As a company, our main objective is to help other companies find this software, understand its value, and knock down any hurdles to adoption that they may have. These hurdles include: • Lack of expertise in Haskell at the company • Limitations in the ecosystem, such as missing libraries for a particular domain • Providing commercial support, such as high-availability hosted solutions and on-call engineers to provide answers and fix problems • Provide tooling as needed by commercial users You can already see quite a bit of what we've done, for example: • Create FP Haskell Center, which addressed requests from companies to provide a low-barrier-to-entry Haskell development environment for non-Haskell experts • Put together School of Haskell to enable interactive documentation to help both new Haskellers, and those looking to improve their skills • Start the LTS Haskell project as a commercial-grade Haskell package set, based on our previous work with FP Haskell Center libraries and Stackage • Provide Stackage Server as a high-availability means of hosting package sets, both official (like Stackage) and unofficial (like experimental package releases) There's something all of these have in common, which demonstrates what our software pipeline looks like: 1. We start off gathering requirements from companies- both those that are and are not our customers- to understand needs 2. We create a product in a closed-source environment 3. Iterate with our customers on the new product to make sure it addresses all of their needs 4. After the product reaches a certain point of stability (a very subjective call), we decide to release it to the community, which involves: 1. Polishing it 2. Discussing with relevant members/leaders in the community 3. Making it officially available Not every product we work on goes through all of these steps. For example, we might decide that the product is too specialized to be generally useful. That's why we sometimes hold our cards a bit close to our chest: we don't want to talk about every new idea we have, because we know some of them may be duds. Some people may ask why we go through that fourth step I listed above. After all, taking a product from "it works well for individual companies with ongoing support from us" to "it's a generally viable product for commercial and non-commercial users" is an arduous process, and doesn't directly make us any money. The answer is simple, and I already alluded to it above: the great value in Haskell comes from the wonderful work the community does. If we're to succeed in our mission of getting Haskell to improve software development in general, we need all the help we can get. So that's our strategy. You're going to continue seeing new products released from us as we perfect them with our customers. We want to find every way we can to help the community succeed even more. I'm also making a small tweak to our strategy today: I want to be more open with the community about this process. While not everything we do should be broadcast to the world (because, like I said, some things may be duds), I can share some of our directions earlier than I have previously. So let me lay out some of the directions we're working on now: 1. Better build tools. LTS Haskell is a huge step in that direction, providing a sane development environment. But there's still quite a bit of a manual process involved. We want to automate this even more. (And to directly answer hastor's question on Reddit: yes, we're going to release a Docker image.) 2. Better code inspection. We've developed a lot of functionality as part of FP Haskell Center to inspect type signature, identifier locations, usage, etc. We want to unlock that power and make it available outside of FP Haskell Center as well. 3. In a completely different direction: we're working on more powerful distributed computing capabilities. This is still early stage, so I can't say much more yet. Outside of products themselves, we want to get other companies on board with our goal of increased Haskell adoption as well. We believe many companies using Haskell today, and even more so companies considering making the jump, have a huge amount of ideas to add to the mix. We're still ironing out the details of what that will look like, but expect to hear some more from us in the next few months about this. And I'm giving you all a commitment: expect to see much more transparency about what we're doing. I intend to be sharing things with the community as we go along. Chris Done and Mathieu Boespflug will be part of this effort as well. If you have questions, ask. We want to do all we can to make the community thrive. # We're hiring: Software Engineer FP Complete is expanding yet again! We are looking to hire for several new engineers to join our Haskell development team, both to build great new core products and in partnership with our clients to apply Haskell at a large scale. Some of our recently announced core products include the Integrated Analysis Platform, Stackage and LTS Haskell, with much more to come. If you’d like to be part of our team and shape the future of Haskell, please send a resume or CV to [email protected]. Any existing work - either a running website or an open source codebase - which you can include as links be greatly appreciated as well. We will want you to start right away. Depending on your current jurisdiction, this will either be a full-time contractor position, or an employee position. This is a telecommute position: you can work from home or wherever you choose, with little or no travel. Location in North America preferred; but you will work with colleagues who are both on North American and European hours. Activities: • distribute existing and new code over large clusters, • code parallelization and performance tuning, • interface with foreign math and scientific libraries, • relentlessly refactor for composability, testability and clarity, • identify performance bottlenecks and debug known issues, • implementing unit tests, integration tests, acceptance tests, • write clear and concise documentation. Skills: • strong experience writing process driven application code in Haskell. • experience building reusable components/packages/libraries and demonstrated ability to write well structured and well documented code, • ability to evolve and refactor large code bases, • experience working under test driven methodologies, • Ability to interact with a distributed development team, and to communicate clearly on issues, in bug reports and emails. These further skills are a plus: • Bachelor’s or Master’s degree in computer science or mathematics, • experience developing products in a regulated environment (avionics/medical/finance). • experience building scalable server/Web applications, • (web) UI design and implementation experience, • an understanding of the implementation of GHC’s multithreaded runtime, • experience as an architect and/or a creator of technical specs. ### Manuel M T Chakravarty # Using Haskell as part of a Mac app Using Haskell as part of a Mac app: Here are the YouTube video and slides of my Haskell Symposium 2014 talk on using Objective-C inline in Haskell programs. This provides a fairly convenient way to integrate Haskell code into the model layer of a Mac app written in Swift or Objective-C. ### Magnus Therning # State machine with conduit Previously I wrote about a simple state machine in Haskell and then offered a simple example, an adder. As I wrote in the initial post I’ve been using this to implement communication protocols. That means IO, and for that I’ve been using conduit. In this post I thought I’d show how easy it is to wrap up the adder in conduit, read input from stdin and produce the result on stdout. At the top level is a function composing three parts: 1. input from stdin 2. processing in the adder 3. output on stdout This can be written as this process :: ResourceT IO () process = source$= calc output where source = stdinC output = stdoutC calc = concatMapC unpack == wrapMachine calcMachine == filterC f == mapC (pack . show) where f (CalcResult _) = True f _ = False The input to the adder is ordinary Chars so the ByteStrings generated by stdinC need to be converted before being handed off to the state machine, which is wrapped to make it a Conduit (see below). The state machine is generating more signals that I want printed, so I filter out everything but the CalcResults. Then it’s passed to show and turned into a ByteString before sent to stdoutC. I think the implementation of wrapMachine shows off the simplicity of conduit rather well: wrapMachine :: (Monad m) => Machine state event signal -> Conduit event m signal wrapMachine mach = do maybeEvent <- await case maybeEvent of Nothing -> return () Just event -> let (newMach, signals) = stepMachine mach event in do yield signals wrapMachine newMach The main function is as short as this: main :: IO () main = do hSetBuffering stdin NoBuffering hSetBuffering stdout NoBuffering runResourceT process The result is about as exciting as expected % runhaskell Calculator.hs 56 42 +CalcResult 98 17 56 +CalcResult 73 It never exits on its own so one has to use Ctrl-C to terminate. I thought I’d try to address that in the next post by adding a time-out to the adder machine. ## January 12, 2015 ### JP Moresmau # EclipseFP 2.6.3 released A new version of EclipseFP, the Eclipse plugins for Haskell development, has been released. The release notes can be found here. As usual install or update from Eclipse by using the update site http://eclipsefp.sf.net/updates. Happy Haskell Hacking! ### Antti-Juhani Kaijanaho (ibid) # Planet Haskell email is broken (UPDATE: fixed) I became aware about a week ago that Planet Haskell’s email address had not received any traffic for a while. It turns out the community.haskell.org email servers are misconfigured. The issue remains unfixed as of this writing. Update: this issue has been fixed. Please direct your Planet Haskell related mail directly to me ([email protected]) for the duration of the problem. ### Keegan McAllister # Continuations in C++ with fork [Update, Jan 2015: I've translated this code into Rust.] While reading "Continuations in C" I came across an intriguing idea: It is possible to simulate call/cc, or something like it, on Unix systems with system calls like fork() that literally duplicate the running process. The author sets this idea aside, and instead discusses some code that uses setjmp/longjmp and stack copying. And there are several other continuation-like constructs available for C, such as POSIX getcontext. But the idea of implementing call/cc with fork stuck with me, if only for its amusement value. I'd seen fork used for computing with probability distributions, but I couldn't find an implementation of call/cc itself. So I decided to give it a shot, using my favorite esolang, C++. Continuations are a famously mind-bending idea, and this article doesn't totally explain what they are or what they're good for. If you aren't familiar with continuations, you might catch on from the examples, or you might want to consult another source first (1, 2, 3, 4, 5, 6). # Small examples I'll get to the implementation later, but right now let's see what these fork-based continuations can do. The interface looks like this. template <typename T>class cont {public: void operator()(const T &x);};template <typename T>T call_cc( std::function< T (cont<T>) > f ); std::function is a wrapper that can hold function-like values, such as function objects or C-style function pointers. So call_cc<T> will accept any function-like value that takes an argument of type cont<T> and returns a value of type T. This wrapper is the first of several C++11 features we'll use. call_cc stands for "call with current continuation", and that's exactly what it does. call_cc(f) will call f, and return whatever f returns. The interesting part is that it passes to f an instance of our cont class, which represents all the stuff that's going to happen in the program after f returns. That cont object overloads operator() and so can be called like a function. If it's called with some argument x, the program behaves as though f had returned x. The types reflect this usage. The type parameter T in cont<T> is the return type of the function passed to call_cc. It's also the type of values accepted by cont<T>::operator(). Here's a small example. int f(cont<int> k) { std::cout << "f called" << std::endl; k(1); std::cout << "k returns" << std::endl; return 0;}int main() { std::cout << "f returns " << call_cc<int>(f) << std::endl;} When we run this code we get: f calledf returns 1 We don't see the "k returns" message. Instead, calling k(1) bails out of f early, and forces it to return 1. This would happen even if we passed k to some deeply nested function call, and invoked it there. This nonlocal return is kind of like throwing an exception, and is not that surprising. More exciting things happen if a continuation outlives the function call it came from. boost::optional< cont<int> > global_k;int g(cont<int> k) { std::cout << "g called" << std::endl; global_k = k; return 0;}int main() { std::cout << "g returns " << call_cc<int>(g) << std::endl; if (global_k) (*global_k)(1);} When we run this, we get: g calledg returns 0g returns 1 g is called once, and returns twice! When called, g saves the current continuation in a global variable. After g returns, main calls that continuation, and g returns again with a different value. What value should global_k have before g is called? There's no such thing as a "default" or "uninitialized" cont<T>. We solve this problem by wrapping it with boost::optional. We use the resulting object much like a pointer, checking for "null" and then dereferencing. The difference is that boost::optional manages storage for the underlying value, if any. Why isn't this code an infinite loop? Because invoking a cont<T> also resets global state to the values it had when the continuation was captured. The second time g returns, global_k has been reset to the "null" optional value. This is unlike Scheme's call/cc and most other continuation systems. It turns out to be a serious limitation, though it's sometimes convenient. The reason for this behavior is that invoking a continuation is implemented as a transfer of control to another process. More on that later. # Backtracking We can use continuations to implement backtracking, as found in logic programming languages. Here is a suitable interface. bool guess();void fail(); We will use guess as though it has a magical ability to predict the future. We assume it will only return true if doing so results in a program that never calls fail. Here is the implementation. boost::optional< cont<bool> > checkpoint;bool guess() { return call_cc<bool>( [](cont<bool> k) { checkpoint = k; return true; } );}void fail() { if (checkpoint) { (*checkpoint)(false); } else { std::cerr << "Nothing to be done." << std::endl; exit(1); }} guess invokes call_cc on a lambda expression, which saves the current continuation and returns true. A subsequent call to fail will invoke this continuation, retrying execution in a world where guess had returned false instead. In Scheme et al, we would store a whole stack of continuations. But invoking our cont<bool> resets global state, including the checkpoint variable itself, so we only need to explicitly track the most recent continuation. Now we can implement the integer factoring example from "Continuations in C". int integer(int m, int n) { for (int i=m; i<=n; i++) { if (guess()) return i; } fail();}void factor(int n) { const int i = integer(2, 100); const int j = integer(2, 100); if (i*j != n) fail(); std::cout << i << " * " << j << " = " << n << std::endl;} factor(n) will guess two integers, and fail if their product is not n. Calling factor(391) will produce the output 17 * 23 = 391 after a moment's delay. In fact, you might see this after your shell prompt has returned, because the output is produced by a thousand-generation descendant of the process your shell created. # Solving a maze For a more substantial use of backtracking, let's solve a maze. const int maze_size = 15;char maze[] = "X-------------+\n" " | |\n" "|--+ | | | |\n" "| | | | --+ |\n" "| | | |\n" "|-+---+--+- | |\n" "| | | |\n" "| | | ---+-+- |\n" "| | | |\n" "| +-+-+--| |\n" "| | | |--- |\n" "| | |\n" "|--- -+-------|\n" "| \n" "+------------- \n";void solve_maze() { int x=0, y=0; while ((x != maze_size-1) || (y != maze_size-1)) { if (guess()) x++; else if (guess()) x--; else if (guess()) y++; else y--; if ( (x < 0) || (x >= maze_size) || (y < 0) || (y >= maze_size) ) fail(); const int i = y*(maze_size+1) + x; if (maze[i] != ' ') fail(); maze[i] = 'X'; } for (char c : maze) { if (c == 'X') std::cout << "\e[1;32mX\e[0m"; else std::cout << c; }} Whether code or prose, the algorithm is pretty simple. Start at the upper-left corner. As long as we haven't reached the lower-right corner, guess a direction to move. Fail if we go off the edge, run into a wall, or find ourselves on a square we already visited. Once we've reached the goal, we iterate over the char array and print it out with some rad ANSI color codes. Once again, we're making good use of the fact that our continuations reset global state. That's why we see 'X' marks not on the failed detours, but only on a successful path through the maze. Here's what it looks like. X-------------+XXXXXXXX| ||--+ |X| | || | |X| --+ || |XXXXX| ||-+---+--+-X| || |XXX | XXX|| |X|X---+-+-X||XXX|XXXXXX|XX||X+-+-+--|XXX ||X| | |--- ||XXXX | ||---X-+-------|| XXXXXXXXXXX+-------------X # Excess backtracking We can run both examples in a single program. int main() { factor(391); solve_maze();} If we change the maze to be unsolvable, we'll get: 17 * 23 = 39123 * 17 = 391Nothing to be done. Factoring 391 a different way won't change the maze layout, but the program doesn't know that. We can add a cut primitive to eliminate unwanted backtracking. void cut() { checkpoint = boost::none;}int main() { factor(391); cut(); solve_maze();} # The implementation For such a crazy idea, the code to implement call_cc with fork is actually pretty reasonable. Here's the core of it. template <typename T>// staticT cont<T>::call_cc(call_cc_arg f) { int fd[2]; pipe(fd); int read_fd = fd[0]; int write_fd = fd[1]; if (fork()) { // parent close(read_fd); return f( cont<T>(write_fd) ); } else { // child close(write_fd); char buf[sizeof(T)]; if (read(read_fd, buf, sizeof(T)) < ssize_t(sizeof(T))) exit(0); close(read_fd); return *reinterpret_cast<T*>(buf); }}template <typename T>void cont<T>::impl::invoke(const T &x) { write(m_pipe, &x, sizeof(T)); exit(0);} To capture a continuation, we fork the process. The resulting processes share a pipe which was created before the fork. The parent process will call f immediately, passing a cont<T> object that holds onto the write end of this pipe. If that continuation is invoked with some argument x, the parent process will send x down the pipe and then exit. The child process wakes up from its read call, and returns x from call_cc. There are a few more implementation details. • If the parent process exits, it will close the write end of the pipe, and the child's read will return 0, i.e. end-of-file. This prevents a buildup of unused continuation processes. But what if the parent deletes the last copy of some cont<T>, yet keeps running? We'd like to kill the corresponding child process immediately. This sounds like a use for a reference-counted smart pointer, but we want to hide this detail from the user. So we split off a private implementation class, cont<T>::impl, with a destructor that calls close. The user-facing class cont<T> holds a std::shared_ptr to a cont<T>::impl. And cont<T>::operator() simply calls cont<T>::impl::invoke through this pointer. • It would be nice to tell the compiler that cont<T>::operator() won't return, to avoid warnings like "control reaches end of non-void function". GCC provides the noreturn attribute for this purpose. • We want the cont<T> constructor to be private, so we had to make call_cc a static member function of that class. But the examples above use a free function call_cc<T>. It's easiest to implement the latter as a 1-line function that calls the former. The alternative is to make it a friend function of cont<T>, which requires some forward declarations and other noise. There are a number of limitations too. • As noted, the forked child process doesn't see changes to the parent's global state. This precludes some interesting uses of continuations, like implementing coroutines. In fact, I had trouble coming up with any application other than backtracking. You could work around this limitation with shared memory, but it seemed like too much hassle. • Each captured continuation can only be invoked once. This is easiest to observe if the code using continuations also invokes fork directly. It could possibly be fixed with additional forking inside call_cc. • Calling a continuation sends the argument through a pipe using a naive byte-for-byte copy. So the argument needs to be Plain Old Data, and had better not contain pointers to anything not shared by the two processes. This means we can't send continuations through other continuations, sad to say. • I left out the error handling you would expect in serious code, because this is anything but. • Likewise, I'm assuming that a single write and read will suffice to send the value. Robust code will need to loop until completion, handle EINTR, etc. Or use some higher-level IPC mechanism. • At some size, stack-allocating the receive buffer will become a problem. • It's slow. Well, actually, I'm impressed with the speed of fork on Linux. My machine solves both backtracking problems in about a second, forking about 2000 processes along the way. You can speed it up more with static linking. But it's still far more overhead than the alternatives. As usual, you can get the code from GitHub. ## January 11, 2015 ### Gabriel Gonzalez # total-1.0.0: Exhaustive pattern matching using traversals, prisms, and lenses The lens library provides Prisms, which are a powerful way to decouple a type's interface from its internal representation. For example, consider the Either type from the Haskell Prelude: data Either a b = Left a | Right b Instead of exposing the Left and Right constructors, you could instead expose the following two Prisms: _Left :: Prism (Either a b) (Either a' b ) a a'_Right :: Prism (Either a b) (Either a b') b b' Everything you can do with a constructor, you can do with the equivalent Prism. For example, if you want to build a value using a Prism, you use review: review _Left :: a -> Either a breview _Right :: b -> Either a b You can also kind-of pattern match on the Prism using preview, returning a Just if the match succeeds or Nothing if the match fails: preview _Left :: Either a b -> Maybe apreview _Right :: Either a b -> Maybe b However, preview is not quite as good as real honest-to-god pattern matching, because if you wish to handle every branch you can't prove in the types that your pattern match was exhaustive: nonTypeSafe :: Either Char Int -> StringnonTypeSafe e = case preview _Left e of Just c -> replicate 3 c Nothing -> case preview _Right e of Just n -> replicate n '!' Nothing -> error "Impossible!" -- Unsatisfying However, this isn't a flaw with Prisms, but rather a limitation of preview. Prisms actually preserve enough information in the types to support exhaustive pattern matching! The trick for this was described a year ago by Tom Ellis, so I decided to finally implement his idea as the total library. #### Example Here's an example of a total pattern matching using the total library: import Lens.Family.Totalimport Lens.Family.Stocktotal :: Either Char Int -> String -- Same as:total = _case -- total = \case & on _Left (\c -> replicate 3 c ) -- Left c -> replicate 3 c & on _Right (\n -> replicate n '!') -- Right n -> replicate n '!' The style resembles LambdaCase. If you want to actually supply a value in the same expression, you can also write: e & (_case & on _Left ... & on _Right ... ) There's nothing unsafe going on under the hood. on is just 3 lines of code: on p f g x = case p Left x of Left l -> f l Right r -> g r (&) is just an operator for post-fix function application: x & f = f x ... and _case is just a synonym for a type class method that checks if a type is uninhabited. class Empty a where impossible :: a -> x_case = impossible The rest of the library is just code to auto-derive Empty instances using Haskell generics. The entire library is about 40 lines of code. #### Generics Here's an example of exhaustive pattern matching on a user-defined data type: {-# LANGUAGE DeriveGeneric #-}{-# LANGUAGE TemplateHaskell #-}import Control.Lens.TH (makePrisms)import GHC.Generics (Generic)import Lens.Family.Totaldata Example a b c = C1 a | C2 b | C3 c deriving (Generic)makePrisms ''Exampleinstance (Empty a, Empty b, Empty c) => Empty (Example a b c)example :: Example String Char Int -> String -- Same as:example = _case -- example = \case & on _C1 (\s -> s ) -- C1 s -> s & on _C2 (\c -> replicate 3 c ) -- C2 c -> replicate 3 c & on _C3 (\n -> replicate n '!') -- C3 n -> replicate n '!' This uses two features to remove most of the boiler-plate: • deriving (Generic) auto-generates the code for the Empty instance • makePrisms uses Template Haskell to auto-generate Prisms for our type. For those who prefer the lens-family-* suite of libraries over lens, I opened up an issue against lens-family-th so that Lens.Family.TH.makeTraversals can be used in place of makePrisms for exhaustive pattern matching. When we write this instance in our original code: instance (Empty a, Empty b, Empty c) => Empty (Example a b c) ... that uses Haskell generics to auto-generate the implementation of the impossible function: instance (Empty a, Empty b, Empty c) => Empty (Example a b c) impossible (C1 a) = impossible a impossible (C2 b) = impossible b impossible (C3 c) = impossible c That says that the type Example a b c is uninhabited if and only if the types a, b, and c are themselves uninhabited. When you write makePrisms ''Example, that creates the following three prisms: _C1 :: Prism (Example a b c) (Example a' b c ) a a'_C2 :: Prism (Example a b c) (Example a b' c ) b b'_C3 :: Prism (Example a b c) (Example a b c') c c' These Prisms are useful in their own right. You can export them in lieu of your real constructors and they are more powerful in several respects. #### Backwards compatibility There is one important benefit to exporting Prisms and hiding constructors: if you export Prisms you can change your data type's internal representation without changing your API. For example, let's say that I change my mind and implement Example as two nested Eithers: type Example a b c = Either a (Either b c) Had I exposed my constructors, this would be a breaking change for my users. However, if I expose Prisms, then I can preserve the old API by just changing the Prism definition. Instead of auto-generating them using Template Haskell, I can just write: import Lens.Family.Stock (_Left, _Right)_C1 = _Left_C2 = _Right . _Left_C3 = _Right . _Right Done! That was easy and the user of my Prisms are completely unaffected by the changes to the internals. #### Under the hood The trick that makes total work is that every branch of the pattern match subtly alters the type. The value that you match on starts out as: Example String Char Int ... and every time you pattern match against a branch, the on function Voids one of the type parameters. The pattern matching flows from bottom to top: example = _case & on _C1 (\s -> s ) -- Step 3: Example Void Void Void & on _C2 (\c -> replicate 3 c ) -- Step 2: Example String Void Void & on _C3 (\n -> replicate n '!') -- Step 1: Example String Char Void -- Step 0: Example String Char Int Finally, _case just checks that the final type (Example Void Void Void in this case) is uninhabited. All it does is check if the given type is an instance of the Empty type class, which only provides instances for uninhabited types. In this case Example Void Void Void is definitely uninhabited because Void (from Data.Void) is itself uninhabited: instance Empty Void where impossible = absurd #### Lenses What's neat is that this solution works not only for Prisms and Traversals, but for Lenses, too! Check this out: example :: (Int, Char) -> String -- Same as:example = _case -- example = \case & on _1 (\n -> replicate n '1') -- (n, _) -> replicate n '1' ... and the identity lens works (which is just id) works exactly the way you'd expect: example :: (Int, Char) -> String -- Same as:example = _case -- example = \case & on id (\(n, c) -> replicate n c) -- (n, c) -> replicate n c I didn't intend that when I wrote the library: it just fell naturally out of the types. #### Conclusion The total library now allows library authors to hide their constructors and export a backwards compatible Prism API all while still preserving exhaustive pattern matching. Note that I do not recommend switching to a lens-only API for all libraries indiscriminately. Use your judgement when deciding whether perfect backwards compatibility is worth the overhead of a lens-only API, both for the library maintainers and your users. I put the code under the Lens.Family.Total module, not necessarily because I prefer the Lens.Family.* hierarchy, but rather because I would like to see Edward Kmett add these utilities to lens, leaving my library primarily for users of the complementary lens-family-* ecosystem. You can find the total library on Hackage or on GitHub. ### Russell O'Connor # Secure diffie-hellman-group-exchange-sha256 Recently I have been working on purging DSA from my computer systems. The problem with DSA and ECDSA is that they fail catastrophically with when nonces are accidentally reused, or if the randomly generated nonces are biased. At about the same time, I was pleased to discover an article on securing SSH. It gives further advice in setting up SSH and I have proceeded to apply most of the recommendation listed there. For key exchange algorithms, the article suggests using curve25519-sha256 and falling back to diffie-hellman-group-exchange-sha256 for compatibility purposed if you must. The diffie-hellman-group-exchange-sha256 protocol allows the client and server to negotiate a prime field to perform the key exchange in. In order to avoid using the smaller prime fields, the article suggests deleting prime numbers less than 2000 bits in size from /etc/ssh/moduli. The problem with this advice is that only the SSH server reads /etc/ssh/moduli; touching this file does nothing to secure your SSH client from using small prime fields during key negotiation. Securing the client is the important use case for diffie-hellman-group-exchange-sha256, because if you can control the server, then it means you will probably use curve25519-sha256 instead. However, the protocol for diffie-hellman-group-exchange-sha256 does allow the client to negotiate the field side. The problem is that this ability is not exposed for configuration in SSH. To address this, I created a patch for OpenSSH that raises the minimum field size allowed for the diffie-hellman-group-exchange-sha256 key exchange for both the client and server. This means you do not need to edit the /etc/ssh/moduli file to increase the minimum field size for the server, but it will not hurt to do so either. If you are running NixOS you can download the patch and add it to your /etc/nixos/configuration.nix file with the following attribute.  nixpkgs.config.packageOverrides = oldpkgs: { openssh = pkgs.lib.overrideDerivation oldpkgs.openssh (oldAttrs: { patches = oldAttrs.patches ++ [ ./openssh-dh-grp-min.patch ]; }); }; As an aside, I noticed that this key exchange protocol has a design flaw in it. The hash signed by the server is the hash of V_C || V_S || I_C || I_S || K_S || min || n || max || p || g || e || f || K. The details of what those variables stand for is not important. What is important is that there is a older format of the protocol that is supported for backwards compatibility where the hash signed by the server is the hash of V_C || V_S || I_C || I_S || K_S || n || p || g || e || f || K. In this older protocol, the client only requests a field size without specifying the minimum and maximum allowed bounds. This is why the variables min and max are do not appear in the hash of the older protocol. A short header is sent by the client to determine which of these two versions of this protocol it is using. The problem is that this header is not part of the hashed data. This little crack has potential to be an exploit. A MITM attacker could replace the header the client sends with the old protocol header, and then try to manipulate the remaining communication between the client and server so that both the client and server hash the same serialized byte string allowing the server to appear to be authenticated to the client, but where the client and server are interpreting that serialized byte string in two different ways. In particular the MITM wants the client to not be doing computation modulo some safe prime, but instead do modular arithmetic over a different ring entirely. Fortunately this particular little crack does not appear to be wide enough to exploit. The incidental properties of the serialization format do not allow a successful manipulation, at least not in practical SSH configurations. When one is signing a serialized blob of data, it is important that the data can be unambiguously parsed using only the contents of that data. This principle is violated here because one cannot decide if one will parse min and max without knowing the header, which is not part of the serialized blob of data. The reason this failure can so easily creep into the protocol is that neither the client nor the server actually have to parse that blob of data. They both serialize the data and then create or verify the signature, but parsing is never done. Therefore, parsing is never implemented. Since parsing is not implemented, it is easy to miss that the data serialization format is ambigous. ### Keegan McAllister # 151-byte static Linux binary in Rust Part of the sales pitch for Rust is that it's "as bare metal as C".1 Rust can do anything C can do, run anywhere C can run,2 with code that's just as efficient, and at least as safe (but usually much safer). I'd say this claim is about 95% true, which is pretty good by the standards of marketing claims. A while back I decided to put it to the test, by making the smallest, most self-contained Rust program possible. After resolving a fewissues along the way, I ended up with a 151-byte, statically linked executable for AMD64 Linux. With the release of Rust 1.0-alpha, it's time to show this off. Here's the Rust code: #![crate_type="rlib"]#![allow(unstable)]#[macro_use] extern crate syscall;use std::intrinsics;fn exit(n: usize) -> ! { unsafe { syscall!(EXIT, n); intrinsics::unreachable() }}fn write(fd: usize, buf: &[u8]) { unsafe { syscall!(WRITE, fd, buf.as_ptr(), buf.len()); }}#[no_mangle]pub fn main() { write(1, "Hello!\n".as_bytes()); exit(0);} This uses my syscall library, which provides the syscall! macro. We wrap the underlying system calls with Rust functions, each exposing a safe interface to the unsafe syscall! macro. The main function uses these two safe functions and doesn't need its own unsafeannotation. Even in such a small program, Rust allows us to isolate memory unsafety to a subset of the code. Because of crate_type="rlib", rustc will build this as a static library, from which we extract a single object file tinyrust.o:  rustc tinyrust.rs \ -O -C no-stack-check -C relocation-model=static \ -L syscall.rs/target ar x libtinyrust.rlib tinyrust.o objdump -dr tinyrust.o0000000000000000 <main>: 0: b8 01 00 00 00 mov 0x1,%eax 5: bf 01 00 00 00 mov 0x1,%edi a: be 00 00 00 00 mov 0x0,%esi b: R_X86_64_32 .rodata.str1625 f: ba 07 00 00 00 mov 0x7,%edx 14: 0f 05 syscall 16: b8 3c 00 00 00 mov 0x3c,%eax 1b: 31 ff xor %edi,%edi 1d: 0f 05 syscall  We disable stack exhaustion checking, as well as position-independent code, in order to slim down the output. After optimization, the only instructions that survive come from inline assembly blocks in the syscall library. Note that main doesn't end in a ret instruction. The exit function (which gets inlined) is marked with a "return type" of !, meaning "doesn't return". We make good on this by invoking the unreachableintrinsic after syscall!. LLVM will optimize under the assumption that we can never reach this point, making no guarantees about the program behavior if it is reached. This represents the fact that the kernel is actually going to kill the process before syscall!(EXIT, n) can return. Because we use inline assembly and intrinsics, this code is not going to work on a stable-channel build of Rust 1.0. It will require an alpha or nightly build until such time as inline assembly and intrinsics::unreachable are added to the stable language of Rust 1.x. Note that I didn't even use #![no_std]! This program is so tiny that everything it pulls from libstd is a type definition, macro, or fully inlined function. As a result there's nothing of libstd left in the compiler output. In a larger program you may need #![no_std], although its role is greatly reduced following the removal of Rust's runtime. # Linking This is where things get weird. Whether we compile from C or Rust,3 the standard linker toolchain is going to include a bunch of junk we don't need. So I cooked up my own linker script: SECTIONS { . = 0x400078; combined . : AT(0x400078) ALIGN(1) SUBALIGN(1) { *(.text*) *(.data*) *(.rodata*) *(.bss*) }} We smash all the sections together, with no alignment padding, then extract that section as a headerless binary blob:  ld --gc-sections -e main -T script.ld -o payload tinyrust.o objcopy -j combined -O binary payload payload.bin Finally we stick this on the end of a custom ELF header. The header is written in NASM syntax but contains no instructions, only data fields. The base address 0x400078 seen above is the end of this header, when the whole file is loaded at 0x400000. There's no guarantee that ld will put main at the beginning of the file, so we need to separately determine the address of main and fill that in as the e_entry field in the ELF file header.  ENTRY=(nm -f posix payload | grep '^main ' | awk '{print 3}') nasm -f bin -o tinyrust -D entry=0xENTRY elf.s chmod +x ./tinyrust ./tinyrustHello! It works! And the size:  wc -c < tinyrust158 Seven bytes too big! # The final trick To get down to 151 bytes, I took inspiration from this classic article, which observes that padding fields in the ELF header can be used to store other data. Like, say, a string constant. The Rust code changes to access this constant: use std::{mem, raw};#[no_mangle]pub fn main() { let message: &'static [u8] = unsafe { mem::transmute(raw::Slice { data: 0x00400008 as *const u8, len: 7, }) }; write(1, message); exit(0);} A Rust slicelike &[u8] consists of a pointer to some memory, and a length indicating the number of elements that may be found there. The module std::raw exposes this as an ordinary struct that we build, then transmute to the actual slice type. The transmute function generates no code; it just tells the type checker to treat our raw::Slice<u8> as if it were a &[u8]. We return this value out of the unsafe block, taking advantage of the "everything is an expression" syntax, and then print the message as before. Trying out the new version:  rustc tinyrust.rs \ -O -C no-stack-check -C relocation-model=static \ -L syscall.rs/target ar x libtinyrust.rlib tinyrust.o objdump -dr tinyrust.o0000000000000000 <main>: 0: b8 01 00 00 00 mov 0x1,%eax 5: bf 01 00 00 00 mov 0x1,%edi a: be 08 00 40 00 mov 0x400008,%esi f: ba 07 00 00 00 mov 0x7,%edx 14: 0f 05 syscall 16: b8 3c 00 00 00 mov 0x3c,%eax 1b: 31 ff xor %edi,%edi 1d: 0f 05 syscall ... wc -c < tinyrust151 ./tinyrustHello! The object code is the same as before, except that the relocation for the string constant has become an absolute address. The binary is smaller by 7 bytes (the size of "Hello!\n") and it still works! You can find the full code on GitHub. The code in this article works on rustc 1.0.0-dev (44a287e6e 2015-01-08). If I update the code on GitHub, I will also update the version number printed by the included build script. I'd be curious to hear if anyone can make my program smaller! 1. C is not really "bare metal", but that's another story 2. From a pure language perspective. If you want to talk about availability of compilers and libraries, then Rust still has a bit of a disadvantage ;) 3. In fact, this code grew out of an earlier experiment with really small binaries in C. ## January 10, 2015 ### Philip Wadler # Are the Greens or UKIP a major party? Have your say. Ofcom has issued a draft ruling that the Greens are not a 'major party' but that UKIP is. Hard to justify, one might think, given that Carolyn Lucas has been a sitting MP since September 2008, while UKIP did not acquire its first MP until a by-election in October 2014. One consequence of Ofcom's decision is that the Greens may be shut out of any televised national election debates, while Farage is given a seat. It's a draft ruling, and there is still time to have your say. My own response to Ofcom is below. Question 1: Please provide your views on: a) the evidence of current support laid out in Annex 2, and b) whether there is any other relevant evidence which you consider Ofcom should take into account for the purposes of the 2015 review of the list of major parties: The Green Party has had a sitting MP since September 2008, while UKIP has only had a sitting MP since October 2014. This relevant point appears nowhere in the annex. Question 2: Do you agree with our assessment in relation to each of: a) The existing major parties, b) Traditional Unionist Voice in Northern Ireland, c) The Green Party (including the Scottish Green Party), and d) UKIP? Please provide reasons for your views: It is a scandal to include UKIP as a major party while excluding the Greens. Either both should be included, or both excluded. I would prefer to see both included. While UKIP enjoys stronger support than the Greens in current opinion polls, this is a shortlived phenomenon in part driven by the increased coverage recently given to UKIP by news media. Ofcom's role should be to dampen the effect of media focus on the current bandwagon, not to amplify it. Ofcom should ensure that all serious contenders have an opportunity to make their views heard. The cutoff for being a 'serious contender' should sit at support from around 5% of the electorate. Question 3: Do you agree with the proposed amendment to Rule 9 of the PPRB Rules Procedures outlined in paragraph 3.7 above? Please provide reasons for your views: I do not agree with the proposed amendment. It is stated the parties 'might'' raise unsustainable complaints, but no evidence is provided that this is a serious problem. It is more democratic to leave the decision to the Election Commission than to give Ofcom the ability to refuse complaints without any right of appeal. [I note also that the reference on the web form to 'paragraph 3.7 above' is confusing. Not only does the relevant paragraph not appear on the web page with the question, the web page does not even contain a link to the document containing the paragraph.] # Joe Sacco: On Satire Joe Sacco is my favourite graphic-novel journalist. All of us deplore the attack on Charlie Hebdo, but his response is the most thoughtful analysis I've seen. ## January 09, 2015 ### Danny Gratzer # Why Constructive Logic Posted on January 9, 2015 Tags: types Continuing on my quest of writing about my poorly thought out comments, let’s talk about constructive logic. A lot of people in and around the Haskell/FP community will make statements like The Curry-Howard isomorphism means that you’re proving things in constructive logic. Usually absent from these remarks is a nice explanation of why constructive logic matches up with the programming we know and love. In this post I’d like to highlight what constructive logic is intended to capture and why this corresponds so nicely with programming. ## A Bit of History First things first, let’s discuss the actual origin of constructive logic. It starts with a mathematician and philosopher named Brouwer. He was concerned trying to give an answer to the question “What does it mean to know something to be true” where something is defined as a mathematical proposition. He settled on the idea of proof being a sort of subjective and personal thing. I know something is true if and only if I can formulate some intuitive proof of it. When viewed this way, the proof I scribble down on paper doesn’t actually validate something’s truthfulness. It’s merely a serialization of my thought process for validating its truthfulness. Notice that this line of reasoning doesn’t actually specify a precise definition of what verifying something intuitively means. I interpret this idea as something slightly more meta then any single formal system. Rather, when looking a formal system, you ought to verify that its axioms are admissible by your own intuition and then you may go on to accept proofs built off of these axioms. Now after Brouwer started talking about these ideas Arend Heyting decided to try to write down a logic that captured this notion of “proof is intuition”. The result was this thing called intuitionistic logic. This logic is part of a broader family of logics called “constructive logics”. ## Constructive Logic The core idea of constructive logic is replacing the notion of truth found in classical logic with an intuitionist version. In a classical logic each proposition is either true or false, regardless of what we know about it. In our new constructive system, a formula cannot be assigned either until we have direct evidence of it. It’s not that there’s a magical new boolean value, {true, false, i-don’t-know}, it’s just not a meaningful question to ask. It doesn’t make sense in these logics to say “A is true” without having a proof of A. There isn’t necessarily this Platonic notion of truthfulness, just things we as logicians can prove. This is sometimes why constructive logic is called “logic for humans”. The consequences of dealing with things in this way can be boils down to a few things. For example, we now know that 1. If ∃x. A(x) can be proven, then there is some term which we can readily produce t so that A(t) is provable 2. If A ∨ B can be proven then either A or B is provable and we know which. (note that ∨ is the symbol for OR) These make sense when you realize that ∃x. A(x) can only be proven if we have a direct example of it. We can’t indirectly reason that it really ought to exist or merely claim that it must be true in one of a set of cases. We actually need to introduce it by proving an example of it. When our logic enforces this of course we can produce that example! The same goes for A ∨ B, in our logic the only way to prove A ∨ B is to either provide a proof of A or provide a proof of B. If this is the only way to build a ∨ we can always just point to how it was introduced! If we extend this to and, ∧: The only way to prove A ∧ B is to prove both A and B. If this is the only way to get to a proof of A ∧ B then of course we can get a proof of A from A ∧ B. ∧ is just behaving like a pair of proofs. All of this points at one thing: our logic is structured so that we can only prove something when we directly prove it, that’s the spirit of Brouwer’s intuitionism that we’re trying to capture. There are a lot of different incarnations of constructive logic, in fact pretty much every logic has a constructive cousin. They all share this notion of “We need a direct proof to be true” however. One thing to note that is that some constructive logics conflict a bit with intuitionism. While intuitionism might have provided some of the basis for constructive logics gradually people have poked and pushed the boundaries away from just Brouwer’s intuitionism. For example both Markov’s principle and Church’s thesis state something about all computable functions. While they may be reasonable statements we can’t give a satisfactory proof for them. This is a little confusing I know and I’m only going to talk about constructive logics that Brouwer would approve of. I encourage the curious reader to poke further at this, it’s rather cool math. ## Who on Earth Cares? Now while constructive logic probably sounds reasonable, if weird, it doesn’t immediately strike me as particularly useful! Indeed, the main reason why computer science cares about constructivism is because we all use it already. To better understand this, let’s talk about the Curry-Howard isomorphism. It’s that thing that wasn’t really invented by either Curry or Howard and some claim isn’t best seen as an isomorphism, naming is hard. The Curry-Howard isomorphism states that there’s a mapping from a type to a logical proposition and from a program to a proof. To show some of the mappings for types  CH(Either a b) = CH(a) ∨ CH(b) CH((a, b)) = CH(a) ∧ CH(b) CH( () ) = ⊤ -- True CH(Void) = ⊥ -- False CH(a -> b) = CH(a) → CH(b) So a program with the type (a, b) is really a proof that a ∧ b is true. Here the truthfulness of a proposition really means that the corresponding type can be occupied by a program. Now, onto why this logic we get is constructive. Recall our two conditions for a logic being constructive, first is that if ∃x. A(x) is provable then there’s a specific t where A(t) is provable. Under the Curry Howard isomorphism, ∃ is mapped to existential types (I wonder how that got its name :). That means that a proof of ∃x. A(x) is something like  -- Haskell ex. syntax is a bit gnaryl :/ data Exists f = forall x. Exists f x ourProof :: Exists F ourProof = ... Now we know the only way to construct an Exists F is to use the constructor Exists. This constructor means that there is at least one specific type for which we could prove f x. We can also easily produce this term as well!  isProof :: Exists f -> (f x -> c) -> c isProof (Exists x) cont = cont x We can always access the specific “witness” we used to construct this Exists type with pattern matching. The next law is similar. If we have a proof of a ∨ b we’re supposed to immediately be able to produce a proof of a or a proof of b. In programming terms, if we have a program Either a b we’re supposed to be able to immediately tell whether this returns Right or Left! We can make some argument that one of these must be possible to construct but we’re not sure which since we have to be able to actually run this program! If we evaluate a program with the type Either a b we’re guaranteed to get either Left a or Right b. ## The Self-Sacrificing Definition of Constructive Logic There are a few explanations of constructive logic that basically describe it as “Classical logic - the law of excluded middle”. More verbosely, a constructive logic is just one that forbids 1. ∀ A. A ∨ ¬ A being provable (the law of excluded middle, LEM) 2. ∀ A. ¬ (¬ A) → A being provable (the law of double negation) I carefully chose the words “being provable” because we can easily introduce these as a hypothesis to a proof and still have a sound system. Indeed this is not uncommon when working in Coq or Agda. They’re just not a readily available tool. Looking at them, this should be apparent as they both let us prove something without directly proving it. This isn’t really a defining aspect of constructivism, just a natural consequence. If we need a proof of A to show A to be true if we admit A ∨ ¬ A by default it defeats the point. We can introduce A merely by showing ¬ (¬ A) which isn’t a proof of A! Just a proof that it really ought to be true. In programming terms this is saying we can’t write these two functions.  data Void doubleNeg :: ((a -> Void) -> Void) -> a doubleNeg = ... lem :: Either a (a -> Void) lem = ... For the first one we have to choices, either we use this (a -> Void) -> Void term we’re given or we construct an a without it. Constructing an arbitrary a without the function is just equivalent to forall a. a which we know to be unoccupied. That means we have to use (a -> Void) -> Void which means we have to build an a -> Void. We have no way of doing something interesting with that supplied a however so we’re completely stuck! The story is similar with lem. In a lot of ways this definition strikes me in the same way that describing functional programming as Oh it’s just programming where you don’t have variables or objects. Or static typing as It’s just dynamic typed programming where you can’t write certain correct programs I have a strong urge to say “Well.. yes but no!”. ## Wrap Up Hopefully this helps clarify what exactly people mean when they say Haskell corresponds to a constructive logic or programs are proofs. Indeed this constructivism gives rise to a really cool thing called “proof relevant mathematics”. This is mathematics done purely with constructive proofs. One of the latest ideas to trickle from mathematics to computers is homotopy type theory where we take a proof relevant look at identity types. Before I wrap up I wanted to share one funny little thought I heard. Constructive mathematics has found a home in automated proof systems. Imagine Brouwer’s horror at hearing we do “intuitionist” proofs that no one will ever look at or try to understand beyond some random mechanical proof assistant! Thanks to Jon Sterling and Darryl McAdams for the advice and insight <script type="text/javascript"> var disqus_shortname = 'codeco'; (function() { var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js'; (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq); })(); </script> <noscript>Please enable JavaScript to view the comments powered by Disqus.</noscript> comments powered by Disqus ## January 08, 2015 ### Danny Gratzer # A Crash Course on ML Modules Posted on January 8, 2015 Tags: sml, haskell I was having lunch with a couple of Haskell programmers the other day and the subject of the ML family came up. I’ve been writing a lot of ML lately and mentioned that I thought *ML was well worth learning for the average Haskeller. When pressed why the best answer I could come up with was “Well.. clean language, Oh! And an awesome module system” which wasn’t my exactly most compelling response. I’d like to outline a bit of SML module system here to help substantiate why looking at an ML is A Good Thing. All the code here should be translatable to OCaml if that’s more your taste. ## Concepts In ML languages modules are a well thought out portion of the language. They aren’t just “Oh we need to separate these names… modules should work”. Like any good language they have methods for abstraction and composition. Additionally, like any good part of an ML language, modules have an expressive type language for mediating how composition and abstraction works. So to explain how this module system functions as a whole, we’ll cover 3 subjects 1. Structures 2. Signatures 3. Functors Giving a cursory overview of what each thing is and how it might be used. ## Structures Structures are the values in the module language. They are how we actually create a module. The syntax for them is  struct fun flip f x y = f y x datatype 'a list = Con of ('a * 'a list) | Nil ... end A quick note to Haskellers, in ML types are lower case and type variables are written with ’s. Type constructors are applied “backwards” so List a is 'a list. So they’re just a bunch of a declarations stuffed in between a struct and end. This is a bit useless if we can’t bind it to a name. For that there’s  structure M = struct val x = 1 end And now we have a new module M with a single member, x : int. This is just like binding a variable in the term language except a “level up” if you like. We can use this just like you would use modules in any other language.  val x' = M.x + 1 Since struct ... end can contain any list of declarations we can nest module bindings.  structure M' = struct structure NestedM = M end And access this using ..  val sum = M'.NestedM.x + M.x As you can imagine, it would get a bit tedious if we needed to . our way to every single module access. For that we have open which just dumps a module’s exposed contents into our namespace. What’s particularly interesting about open is that it is a “normal” declaration and can be nested with let.  fun f y = let open M in x + y end OCaml has gone a step further and added special syntax for small opens. The “local opens” would turn our code into  let f y = M.(x + y) This already gives us a lot more power than your average module system. Structures basically encapsulate what we’d expect in a module system, but 1. Structures =/= files 2. Structures can be bound to names 3. Structures can be nested Up next is a look at what sort of type system we can impose on our language of structures. ## Signatures Now for the same reason we love types in the term language (safety, readability, insert-semireligious-rant) we’d like them in the module language. Happily ML comes equipped with a feature called signatures. Signature values look a lot like structures  sig val x : int datatype 'a list = Cons of ('a * 'a list) | Nil end So a signature is a list of declarations without any implementations. We can list algebraic data types, other modules, and even functions and values but we won’t provide any actual code to run them. I like to think of signatures as what most documentation rendering tools show for a module. As we had with structures, signatures can be given names.  signature MSIG = sig val x : int end On their own signatures are quite useless, the whole point is that we can apply them to modules after all! To do this we use : just like in the term language.  structure M : MSIG = struct val x = 1 end When compiled, this will check that M has at least the field x : int inside its structure. We can apply signatures retroactively to both module variables and structure values themselves.  structure M : MSIG = struct val x = 1 end : MSIG One interesting feature of signatures is the ability to leave certain types abstract. For example, when implementing a map the actual implementation of the core data type doesn’t belong in the signature.  signature MAP = sig type key type 'a table val empty : 'a table val insert : key -> 'a -> 'a table -> 'a table val lookup : key -> 'a table -> 'a option end Notice that the type of keys and tables are left abstract. When someone applies a signature they can do so in two ways, weak or strong ascription. Weak ascription (:) means that the constructors of abstract types are still accessible, but the signature does hide all unrelated declarations in the module. Strong ascription (:>) makes the abstract types actually abstract. Every once in a while we need to modify a signature. We can do this with the keywords where type. For example, we might implement a specialization of MAP for integer keys and want our signature to express this  structure IntMap :> MAP where type key = int = struct ... end This incantation leaves the type of the table abstract but specializes the keys to an int. Last but not least, let’s talk about abstraction in module land. ## Functors Last but not least let’s talk about the “killer feature” of ML module systems: functors. Functors are the “lifting” of functions into the module language. A functor is a function that maps modules with a certain signature to functions of a different signature. Jumping back to our earlier example of maps, the equivalent in Haskell land is Data.Map. The big difference is that Haskell gives us maps for all keys that implement Ord. Our signature doesn’t give us a clear way to associate all these different modules, one for each Orderable key, that are really the same thing. We can represent this relationship in SML with  signature ORD = sig type t val compare : t * t -> order end functor RBTree (O : ORD) : MAP where type key = O.t = struct open O .... end Which reads as “For any module implementing Ord, I can give you a module implementing MAP which keys of type O.t”. We can then instantiate these  structure IntOrd = struct type t = int val compare = Int.compare end end structure IntMap = RBTree(IntOrd) Sadly SML’s module language isn’t higher order. This means we can’t assign functors a type (there isn’t an equivalent of ->) and we can’t pass functors to functors. Even with this restriction functors are tremendously useful. One interesting difference between SML and OCaml is how functors handle abstract types. Specifically, is it the case that F(M).t = F(M).t In SML the answer is (surprisingly) no! Applying a functor generates brand new abstract types. This is actually beneficial when you remember SML and OCaml aren’t pure. For example you might write a functor for handling symbol tables and internally use a mutable symbol table. One nifty trick would be to keep of type of symbols abstract. If you only give back a symbol upon registering something in the table, this would mean that all symbols a user can supply are guaranteed to correspond to some entry. This falls apart however if functors are extensional. Consider the following REPL session  > structure S1 = SymbolTable(WhateverParameters) > structure S2 = SymbolTable(WhateverParameters) > val key = S1.register "I'm an entry" > S2.lookup key Error: no such key! This will not work if S1 and S2 have separate key types. To my knowledge, the general conclusion is that generative functors (ala SML) are good for impure code, but applicative functors (ala OCaml and BackPack) really shine with pure code. ## Wrap Up We’ve covered a lot of ground in this post. This wasn’t an exhaustive tour of every feature of ML module systems, but hopefully I got the jist across. If there’s one point to take home: In a lot of languages modules are clearly a bolted on construction. They’re something added on later to fix “that library problem” and generally consist of the same “module <-> file” and “A module imports others to bring them into scope”. In ML that’s simply not the case. The module language is a rich, well thought out thing with it’s own methods of abstraction, composition, and even a notion of types! I wholeheartedly recommend messing around a bit with OCaml or SML to see how having these things impacts your thought process. I think you’ll be pleasantly surprised. <script type="text/javascript"> var disqus_shortname = 'codeco'; (function() { var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js'; (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq); })(); </script> <noscript>Please enable JavaScript to view the comments powered by Disqus.</noscript> comments powered by Disqus ## January 07, 2015 ### The GHC Team # GHC Weekly News - 2015/01/07 Hi *, it's time for another GHC Weekly News! This week's edition will actually be covering the last two/three weeks; your editor has missed the past few editions due to Holiday madness (and also some relaxation, which is not madness). It's also our first news posting in 2015! So let's get going without any further delay! GHC HQ met this week after the Christmas break; some of our notes include: • Since Austin is back, he'll be spending some time finishing up all the remaining binary distributions for GHC 7.8.4 and GHC 7.10.1 RC1 (mostly, FreeBSD and OS X builds). • We've found that 7.10.1 RC1 is working surprisingly well for users so far; to help users accomodate the changes, Herbert has conveniently written a migration guide for users for their most common problems when upgrading to 7.10.1: https://ghc.haskell.org/trac/ghc/wiki/Migration/7.10 • We're aiming to release the 2nd Release Candidate for GHC 7.10.1 on January 19th. We're hoping this will be the last RC, with 7.10.1 final popping up in the middle of February. • GHC HQ may tentatively be working to release another GHC 7.8 release, but only for a specific purpose: to allow it to compile with 7.10.1. This will make it significantly easier for users to compile old GHCs (perhaps on newer platforms). However, we're not yet 100% decided on this, and we will likely only do a 'very minor release' of the source tarball, should this be the case. Thanks to Edward Yang for helping with this. • For future GHC releases on Windows, we're looking into adopting Neil Mitchell's new binary distribution of GHC, which is a nice installer that includes Cabal, MSYS and GHC. This should significantly lower the burden for Windows users to use GHC and update, ship or create packages. While we're not 100% sure we'll be able to have it ready for 7.10.1, it looks promising. Thanks Neil! (For more info, read Neil's blog post here: http://neilmitchell.blogspot.co.at/2014/12/beta-testing-windows-minimal-ghc.html ) There's also been some movement and chatter on the mailing lists, as usual. • Joachim Breitner made an exciting announcement: he's working on a new performance dashboard for GHC, so we can more easily track and look at performance results over time. The current prototype looks great, and Joachim and Austin are working together to make this an official piece of GHC's infrastructure: https://www.haskell.org/pipermail/ghc-devs/2015-January/007885.html • Over the holiday, Simon went and implemented a nice new feature for GHC: detection of redundant constraints. This means if you mention Ord in a type signature, but actually use nothing which requires that constraint, GHC can properly warn about it. This will be going into 7.12: https://www.haskell.org/pipermail/ghc-devs/2015-January/007892.html • Now that GHC 7.10 will feature support for DWARF based debugging information, Johan Tibell opened a very obvious discussion thread: what should we do about shipping GHC and its libraries with debug support? Peter chimed in with some notes - hopefully this will all be sorted out in time for 7.10.1 proper: https://www.haskell.org/pipermail/ghc-devs/2015-January/007851.html Closed tickets the past few weeks include: #8984, #9880, #9732, #9783, #9575, #9860, #9316, #9845, #9913, #9909, #8650, #9881, #9919, #9732, #9783, #9915, #9914, #9751, #9744, #9879, #9876, #9032, #7473, #9764, #9067, #9852, #9847, #9891, #8909, #9954, #9508, #9586, and #9939. ### FP Complete # Announcing: mutable-containers 0.2 As part of our high-performance computing work, I recently found myself in need of some fast mutable containers. The code is now available on both Hackage and Stackage. The code is pretty young, and is open to design changes still. That said, the currently released version (0.2.0) is well tested and performs fairly well. If there are ideas for improvement, please let me know! Below is the content of the README file, which gives a good overview of the package, as well as benchmark numbers and test coverage statistics (spoiler: 100%). As always, you can see the README on Github or on Stackage. One of Haskell's strengths is immutable data structures. These structures make it easier to reason about code, simplify concurrency and parallelism, and in some cases can improve performance by allowing sharing. However, there are still classes of problems where mutable data structures can both be more convenient, and provide a performance boost. This library is meant to provide such structures in a performant, well tested way. It also provides a simple abstraction over such data structures via typeclasses. Before anything else, let me provide the caveats of this package: • Don't use this package unless you have a good reason to! Immutable data structures are a better approach most of the time! • This code is intentionally not multithread safe. If you need something like a concurrent queue, there are many options on Hackage, from Chan to TChan, to chaselev-deque. We'll first talk about the general approach to APIs in this package. Next, there are two main sets of abstractions provided, which we'll cover in the following two sections, along with their concrete implementations. Finally, we'll cover benchmarks. ## API structure The API takes heavy advantage of the PrimMonad typeclass from the primitive package. This allows our data structures to work in both IO and ST code. Each data structure has an associated type, MCState, which gives the primitive state for that structure. For example, in the case of IORef, that state is RealWorld, whereas for STRef s, it would be s. This associated type is quite similar to the PrimState associated type from primitive, and in many type signatures you'll see an equality constraint along the lines of: PrimState m ~ MCState c For those who are wondering, MCState stands for "mutable container state." All actions are part of a typeclass, which allows for generic access to different types of structures quite easily. In addition, we provide type hint functions, such as asIORef, which can help specify types when using such generic functions. For example, a common idiom might be: ioref <- fmap asIORef  newRef someVal Wherever possible, we stick to well accepted naming and type signature standards. For example, note how closely modifyRef and modifyRef' match modifyIORef and modifyIORef'. ## Single cell references The base package provides both IORef and STRef as boxed mutable references, for storing a single value. The primitive package also provides MutVar, which generalizes over both of those and works for any PrimMonad instance. The MutableRef typeclass in this package abstracts over all three of those. It has two associated types: MCState for the primitive state, and RefElement to specify what is contained by the reference. You may be wondering: why not just take the reference as a type parameter? That wouldn't allow us to have monomorphic reference types, which may be useful under some circumstances. This is a similar motivation to how the mono-traversable package works. In addition to providing an abstraction over IORef, STRef, and MutVar, this package provides four additional single-cell mutable references. URef, SRef, and BRef all contain a 1-length mutable vector under the surface, which is unboxed, storable, and boxed, respectively. The advantage of the first two over boxed standard boxed references is that it can avoid a significant amount of allocation overhead. See the relevant Stack Overflow discussion and the benchmarks below. While BRef doesn't give this same advantage (since the values are still boxed), it was trivial to include it along with the other two, and does actually demonstrate a performance advantage. Unlike URef and SRef, there is no restriction on the type of value it can store. The final reference type is PRef. Unlike the other three mentioned, it doesn't use vectors at all, but instead drops down directly to a mutable bytearray to store values. This means it has slightly less overhead (no need to store the size of the vector), but also restricts the types of things that can be stored (only instances of Prim). You should benchmark your program to determine the most efficient reference type, but generally speaking PRef will be most performant, followed by URef and SRef, and finally BRef. ## Collections Collections allow you to push and pop values to the beginning and end of themselves. Since different data structures allow different operations, each operation goes into its own typeclass, appropriately named MutablePushFront, MutablePushBack, MutablePopFront, and MutablePopBack. There is also a parent typeclass MutableCollection which provides: 1. The CollElement associated type to indicate what kinds of values are in the collection. 2. The newColl function to create a new, empty collection. The mono-traversable package provides a typeclass IsSequence which abstracts over sequence-like things. In particular, it provides operations for cons, snoc, uncons, and unsnoc. Using this abstraction, we can provide an instance for all of the typeclasses listed above for any mutable reference containing an instance of IsSequence, e.g. IORef [Int] or BRef s (Seq Double). Note that the performance of some of these combinations is terrible. In particular, pushBack or popBack on a list requires traversing the entire list, and any push operations on a Vector requires copying the entire contents of the vector. Caveat emptor! If you must use one of these structures, it's highly recommended to use Seq, which gives the best overall performance. However, in addition to these instances, this package also provides two additional data structures: double-ended queues and doubly-linked lists. The former is based around mutable vectors, and therefore as unboxed (UDeque), storable (SDeque), and boxed (BDeque) variants. Doubly-linked lists have no such variety, and are simply DLists. For general purpose queue-like structures, UDeque or SDeque is likely to give you best performance. As usual, benchmark your own program to be certain, and see the benchmark results below. ## Benchmark results The following benchmarks were performed on January 7, 2015, against version 0.2.0. ### Ref benchmark benchmarking IORef time 4.322 μs (4.322 μs .. 4.323 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 4.322 μs (4.322 μs .. 4.323 μs) std dev 1.401 ns (1.114 ns .. 1.802 ns) benchmarking STRef time 4.484 μs (4.484 μs .. 4.485 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 4.484 μs (4.484 μs .. 4.484 μs) std dev 941.0 ps (748.5 ps .. 1.164 ns) benchmarking MutVar time 4.482 μs (4.482 μs .. 4.483 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 4.482 μs (4.482 μs .. 4.483 μs) std dev 843.2 ps (707.9 ps .. 1.003 ns) benchmarking URef time 2.020 μs (2.019 μs .. 2.020 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 2.020 μs (2.019 μs .. 2.020 μs) std dev 955.2 ps (592.2 ps .. 1.421 ns) benchmarking PRef time 2.015 μs (2.014 μs .. 2.015 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 2.014 μs (2.014 μs .. 2.015 μs) std dev 901.3 ps (562.8 ps .. 1.238 ns) benchmarking SRef time 2.231 μs (2.230 μs .. 2.232 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 2.231 μs (2.230 μs .. 2.231 μs) std dev 1.938 ns (1.589 ns .. 2.395 ns) benchmarking BRef time 4.279 μs (4.279 μs .. 4.279 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 4.279 μs (4.279 μs .. 4.279 μs) std dev 1.281 ns (1.016 ns .. 1.653 ns) ### Deque benchmark time 8.371 ms (8.362 ms .. 8.382 ms) 1.000 R² (1.000 R² .. 1.000 R²) mean 8.386 ms (8.378 ms .. 8.398 ms) std dev 29.25 μs (20.73 μs .. 42.47 μs) benchmarking IORef (Seq Int) time 142.9 μs (142.7 μs .. 143.1 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 142.7 μs (142.6 μs .. 142.9 μs) std dev 542.8 ns (426.5 ns .. 697.0 ns) benchmarking UDeque time 107.5 μs (107.4 μs .. 107.6 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 107.5 μs (107.4 μs .. 107.6 μs) std dev 227.4 ns (171.8 ns .. 297.8 ns) benchmarking SDeque time 97.82 μs (97.76 μs .. 97.89 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 97.82 μs (97.78 μs .. 97.89 μs) std dev 169.5 ns (110.6 ns .. 274.5 ns) benchmarking BDeque time 113.5 μs (113.4 μs .. 113.6 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 113.6 μs (113.5 μs .. 113.7 μs) std dev 300.4 ns (221.8 ns .. 424.1 ns) benchmarking DList time 156.5 μs (156.3 μs .. 156.6 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 156.4 μs (156.3 μs .. 156.6 μs) std dev 389.5 ns (318.3 ns .. 502.8 ns) ## Test coverage As of version 0.2.0, this package has 100% test coverage. If you look at the report yourself, you'll see some uncovered code; it's just the automatically derived Show instance needed for QuickCheck inside the test suite itself. ## January 06, 2015 ### Philip Wadler # A contrary view to 'Antidotes to the Imitation Game' For five years, Barry Cooper has run the Alan Turing Year (ATY) mailing list, tracking worldwide activity related to the Turing Centennial and after. If you want a contrast to the view given by Antidotes to The Imitation Game, see Barry's most recent ATY post. While I agree that fiction can sometimes can closer to truth than nonfiction, I disagree with Barry's claim that the 'higher-order' interpretation of the film accurately captures the arc of Turing's life. I suspect the real Turing differs hugely from the film's version, despite the effort and skill Tyldum, Cumberbatch, and others invested in the film. Many computing scientists are disappointed by the divergence from history in The Imitation Game, while others think that if it does well at the Academy Awards that the popularisation of Turing will be good for our profession. There is something to be said for both points. Hollywood's attempts at biography seems to inevitably involve gross oversimplification or distortion. Is this really necessary for a film to succeed? Does anyone have favourite examples of films that did not grossly distort their subject matter? ### Magnus Therning # Adder as a state machine In my previous post I wrote about a, probably rather naïve, approach to constructing state machines in Haskell. I ended it with a saying that the pattern matching of Haskell makes it rather simple to manually write the step function required to create a working state machines. Hopefully this post can convince you I’m right. The vehicle I’ve chosen is a very simple machine capable of reading two integers and adding them. In slightly more detail it’s a machine that: 1. reads a total of three parts, two integers followed by a plus sign (+) 2. each part is separated by whitespace 3. on errors the machine resets and starts over The signals (a.k.a. the output alphabet) is the following type data CalcSignal = CalcNothing | CalcResult Int | CalcError CalcStates String deriving (Eq, Show) The events (a.k.a. the input alphabet) is simply Char. The states are data CalcStates = Start | ReadFirst Int | ReadSecond Int Int | ReadOperator Int Int deriving (Eq, Show) where the names hopefully are self explanatory. The type of the machine is then type CalcMachine = Machine CalcStates Char CalcSignal The machine itself can then be written like this: calcMachine :: CalcMachine calcMachine = createMachine Start go where go Start e | isNumber e = (ReadFirst (read [e]), CalcNothing) | otherwise = (Start, CalcError Start "No number") go s@(ReadFirst i) e | isNumber e = (ReadFirst (10 * i + read [e]), CalcNothing) | isSpace e = (ReadSecond i 0, CalcNothing) | otherwise = (Start, CalcError s "Bad format") go s@(ReadSecond l i) e | isNumber e = (ReadSecond l (10 * i + read [e]), CalcNothing) | isSpace e = (ReadOperator l i, CalcNothing) | otherwise = (Start, CalcError s "Bad format") go s@(ReadOperator i j) e | e == '+' = (Start, CalcResult (i + j)) | isSpace e = (s, CalcNothing) | otherwise = (Start, CalcError s "Bad operator") That’s rather simple and easy to read I find. Though I’m not sure it scales too well to much larger machines. I’ve not really used any DSLs to create large machines either, so I don’t know how well any method scales ;) To do a bit of exploratory testing it’s handy to create the following function calculate :: String -> IO () calculate = foldM_ go calcMachine where go mach c = do let (m, s) = stepMachine mach c print s return m Using that function it’s easy to check if the machine works as intended. > calculate "56 67 +" CalcNothing CalcNothing CalcNothing CalcNothing CalcNothing CalcNothing CalcResult 123 So far so good. What about the behaviour on errors? > calculate "5a6 67 +" CalcNothing CalcError (ReadFirst 5) "Bad format" CalcNothing CalcNothing CalcNothing CalcNothing CalcNothing CalcResult 73 That looks good enough to me. Though there is (at least) one detail of how the machine works that might be surprising and hence should be fixed, but I’ll leave that as an exercise for the reader ;) As I mentioned in the previous post I’ve been using this method for writing state machines to implement two different protocols. For the IO I used conduit which means I had to turn the state machine into a conduit. I’ll write about that in a later post though. ### FP Complete # A New Release for the New Year: Announcing Release 3.2 # A New Release for the New Year We recently released version 3.2 of FP Haskell Center. We want to take this opportunity to list some of the new features and highlight the additions of a Hosted Haddocks button and the ability to set up personal RSS feeds within each account. ## 3.2 Features List • Support for downloading extra packages from arbitrary.tar and .gz urls (available from the “extra packages” tab of the settings page) • Auto-insert functionality works for error messages (used to only work for warnings) • Toggle executable bit in UI. You can now make data files executable inside the IDE. • Updated hlint version for better functionality • Hosted Haddock button • Per-user RSS feed /user-feed/username to access ## More about the features we think you’ll be the most interested in ### Hosted Haddock Button Often times when you’re working on a codebase, it’s convenient to generate the Haddocks to get an idea of what’s going on. It’s also useful to be able to share those generated Haddocks with others. FP Haskell Center now allows you to do both with a single click. Inside the deployment menu, you can now generated Haddocks for your current project. Links to dependencies will be created correctly, and the generated URL is fully shareable with others. ### Per-user RSS Feed Users now have the ability to set up personal RSS Feeds within their accounts. This answers a request from some users to be able to more easily let people stay up-to-date with their content. This ties in nicely with our previous addition of Disqus comments. ## Feedback is always appreciated We are proud of every improvement we make to FP Haskell Center and look forward to your feedback. With each release we are continuing to raise the quality of our product. ## January 05, 2015 ### Functional Jobs # Senior Software Engineer at McGraw-Hill Education (Full-time) This Senior Software Engineer position is with the new LearnSmart team at McGraw-Hill Education's new and growing Research & Development center in Boston's Innovation District. We make software that helps college students study smarter, earn better grades, and retain more knowledge. The LearnSmart adaptive engine powers the products in our LearnSmart Advantage suite — LearnSmart, SmartBook, LearnSmart Achieve, LearnSmart Prep, and LearnSmart Labs. These products provide a personalized learning path that continuously adapts course content based on a student’s current knowledge and confidence level. On our team, you'll get to: • Move textbooks and learning into the digital era • Create software used by millions of students • Advance the state of the art in adaptive learning technology • Make a real difference in education Our team's products are built with Flow, a functional language in the ML family. Flow lets us write code once and deliver it to students on multiple platforms and device types. Other languages in our development ecosystem include especially JavaScript, but also C++, SWF (Flash), and Haxe. If you're interested in functional languages like Scala, Swift, Erlang, Clojure, F#, Lisp, Haskell, and OCaml, then you'll enjoy learning Flow. We don't require that you have previous experience with functional programming, only enthusiasm for learning it. But if you have do some experience with functional languages, so much the better! (On-the-job experience is best, but coursework, personal projects, and open-source contributions count too.) We require only that you: • Have a solid grasp of CS fundamentals (languages, algorithms, and data structures) • Be comfortable moving between multiple programming languages • Be comfortable with modern software practices: version control (Git), test-driven development, continuous integration, Agile Get information on how to apply for this position. ### Douglas M. Auclair (geophf) # December 2014 1HaskellADay Problems and Solutions December 2014 • December 30th, 2014: Why can't we all just consensus along? for today's Haskell problem lpaste.net/117487 Pro-, con-, anti-, ... it's all sensus to me! lpaste.net/117490 • December 29th, 2014: Uh, wait! What? It's another day? We need another #haskell puzzler? Here ya go! Rosalind subs (not grinders) http://lpaste.net/117421 Rose petals falling ... and a soution to the #rosalind subs problem http://lpaste.net/117481 • December 24th, 2014: A||B||C == Merry Christmas for today's haskell problem http://lpaste.net/117213 • December 23rd, 2014: Convergence, but not the movie, is today's #haskell problem http://lpaste.net/117089 A rational solution was posted by @sheshanaag at http://lpaste.net/117232 • December 22rd, 2014: Yesterday was about the zeroes, today's haskell problem is about the squares http://lpaste.net/116990 • December 21nd, 2014: In which we want to have the most zeros. Fancy that! http://lpaste.net/116932 (I guess we're not playing Dodgeball, then!) • December 19th, 2014: A question for today's #haskell problem: how do YOUclear a Writer monad's log? http://lpaste.net/116849 A FlushableWriter solutionwas posted by @aditcr8 at http://lpaste.net/117115 • December 18th, 2014: Why can't you spell 'PROTEIN' with proteins?lpaste.net/116723 The solution for this problem is to Eat your proteins, young man! lpaste.net/116808 • December 17th, 2014: In which we find out that Mendel was into the Domme-scene for today's #haskell problem http://lpaste.net/116683 Whether hetero- or homo- a zygous is still a zygous... whatever that is. A solution to today's #haskell-zygous problem is posted at http://lpaste.net/116711 • December 16th, 2014: All actors should relate to today's #haskell problem; it's all about H-A-M-M! http://lpaste.net/116578 Go HAMM it up; #Rosalind will love you. And the solution is so HAMMy! http://lpaste.net/116585 • December 15th, 2014: Wanna Date? In which Jane Austen writes today's #haskell problem in her modern book titled: "The Hook Up" http://lpaste.net/116486 aka Pride and Prejudice. To get to our solution space, first we have to write an l-expr scanner/parser, of course. http://lpaste.net/116558 • December 12th, 2014: In this case 'GC' does NOT mean 'garbage collection' for today's #rosalind #haskell problem lpaste.net/116305. Yeah, not garbage-collected, but a solution, nonetheless http://lpaste.net/116335 • December 11th, 2014: Why ... you ... wascally wabbit! Today's #haskell problem has a solution multiplying like Fibonaccis! http://lpaste.net/116216 I mean: rabbits. And speaking of rabid rabbits http://lpaste.net/116303, the solution is there. BLARGH! • December 10th, 2014: "Hello, Strand, you're looking lovely in that dress!" Complimenting DNA for today's #haskell problem. http://lpaste.net/116147 No, wait: 'comPLEmenting' a strand of DNA. Sorry, Miss Rosalind! A nicely complimented solution to the DNA-complement problem posted at http://lpaste.net/116151 • December 9th, 2014: I think this is my week with dear Rosalind: DNA-to-RNA transcriber for today's #haskell problem http://lpaste.net/116068 Another #TweetedHaskellSolution posted to http://lpaste.net/116071 for today's DNA-to-RNA transcriber • December 8th, 2014: Rosalind, dear Rosalie, wherefore art thou givest me bioinformatic puzzles to solve, counting nucleotides? http://lpaste.net/115961 #haskell Nucleotide-counter, HO! A nucleotide-counting solution is posted at http://lpaste.net/116011 • December 5th, 2014: This visit to Liar's-berg (or is it Truth-town?) we encounter only two women, so this should be 2x as easy to solve http://lpaste.net/114437 By coercion, we find that we are in Liarsberg, after all: http://lpaste.net/115769 • December 4th, 2014: Gadzooks! It's (past) that time of day again for the daily #haskell problem. Here: eat some glow worms! http://lpaste.net/114435 • December 3rd, 2014: Substantive citizens (with some substantive hints, for a change) for today's #haskell problem http://lpaste.net/114438 • December 2nd, 2014: There are some curious fractions to find fromprojecteuler.net for today's #haskell problem at http://lpaste.net/114208 Asolution to this curious fraction problem is posted at lpaste.net/114209 • December 1st, 2014: Today's Haskell problem is all about lambda-terms. It has NOTHING to do with for-loops and if-statements, ... NOT ONE BIT http://lpaste.net/115420 Solution: lpaste.net/115452 ### Christopher Done # Measuring duration in Haskell Happy new year, everyone. It’s a new year and time for new resolutions. Let’s talk about time. Specifically, measuring it in Haskell. ## A wrong solution How do you measure how long something takes in Haskell? Here’s a naive attempt: import Control.Exception import Data.Time main = do start <- getCurrentTime evaluate (sum [1 .. 1000000]) end <- getCurrentTime print (diffUTCTime end start) Running it, we see that it does what we expect: λ> main 0.316653s ## Inaccurate measuring Here’s what’s wrong with this implementation: • The clock can be changed by the user at any moment. • Time synchronization services regularly update time. If you’re on an Ubuntu desktop, time is updated when you first boot up from NTP servers. If you’re on a server, likely there is a daily cron job to update your time, because you don’t tend to reboot servers. My laptop has been on for 34 days:  uptime 21:13:47 up 34 days, 2:06, 3 users, load average: 0.74, 0.83, 0.84 If I run a manual update, it adjusts my clock by 500 milliseconds:  sudo ntpdate ntp.ubuntu.com 5 Jan 21:11:53 ntpdate[4805]: adjust time server x.x.x.x offset 0.517166 sec Because there is a certain amount of “drift” that occurs over time. Additionally, leap seconds can be introduced at any time and cannot be predicated systematically, but there is at least a 6 months in advance notice for time servers. In 2015 there will be an extra second added to time in-between the 30th of June to the 1st of July. These factors mean that if our main function is run during an update, the reported time could be completely wrong. For something simple like the above, maybe it doesn’t matter. For long term logging and statistics gathering, this would represent an anomaly. For a one-off, maybe it’s forgivable, because it’s convenient. But above all, it’s simply inaccurate reporting. ## Accurate measuring Readers familiar with this problem will think back to measuring time in C; it requires inspecting the system clock and dividing by clocks per second. In fact there are a couple solutions around that use this: • The timeit package. This is good if your use-case is simple. • In turn, that package uses System.CPUTime from base, which is also handy. These are more reliable, because the time cannot be changed. But they are limited, as both only measure CPU time and not IO time. So if your program takes 10 seconds but only does 5 seconds of CPU processing and 5 seconds of waiting for the disk, then you will not have the real time. Also known as wall time. In the Criterion package, there’s need for fine-grained, fast, accurate measuring of both real and CPU time, so it includes its own cross-platform implementations: That’s nice, but it’s embedded in a specific package built for benchmarking, which we may not necessarily be doing. For example, I am dabbling with a program to measure the speed of my key presses. It turns out there is a package that does similarly to Criterion, already prepared and similarly cross-platform and only depends on base and ghc-prim. ## The clock package I discovered this really nice package called clock which has the option for several time measurements: • Monotonic: a monotonic but not-absolute time which never changes after start-up. • Realtime: an absolute Epoch-based time (which is the system clock and can change). • ProcessCPUTime: CPU time taken by the process. • ThreadCPUTime: CPU time taken by the thread. Let’s rewrite our example using this package and the formatting package (which provides a handy TimeSpec formatter as of 6.1): {-# LANGUAGE OverloadedStrings #-} import Control.Exception import Formatting import Formatting.Clock import System.Clock main = do start <- getTime Monotonic evaluate (sum [1 .. 1000000]) end <- getTime Monotonic fprint (timeSpecs % "\n") start end Running it, we see we get similar information as above, but now it’s accurate. λ> main 276.05 ms If you just want CPU time for the process, or the OS thread, just provide a different argument to getTime. ## Summary So next time you want to measure how long something takes, unless you’re doing benchmarking, check out the clock package! ## January 03, 2015 ### Magnus Therning # Simple state machines in Haskell During this past autumn I’ve had to work with products implementing two small and simple protocols, one was USB based, the other completely proprietary. In both cases I found it very helpful to write clients where the central behaviour was modelled as a state machine. In the case of the proprietary protocol I’ve ended up writing a couple of servers implementing a subset of the behaviour of the real product, also here I found using a state machine as the central piece to be very useful. I started out having a look at [Hackage][] to see if there was already some tool or DSL that would help me create machines. Surprisingly I found nothing, which probably just means I didn’t look carefully enough. Anyway, after reading the Wikipedia articles on Finite-state machine, finite state transducer, Moore machine and Mealy machine I decided to write my own naïve implementation of something resembling Mealy machines. A Mealy machine is a 6-tuple comprising • a finite set of states • a start state • a finite set called the input alphabet • a finite set called the output alphabet • a transition function mapping a state and an input symbol to the next state • an output function mapping a state and an input symbol to an output symbol If the transition and output functions are combined into a single function, and we don’t care about being able to reset a machine, it’s possible to use a representation as simple as this: data Machine state event signal = Machine { mCurState :: state , mTransFunction :: state -> event -> (state, signal) } I’ve opted to call the input alphabet event and the output alphabet signal. Stepping a machine is then performed by passing the current state and an event to the combined function and updating the machine with the result. Of course the generated signal has to be dealt with too. Or in other words: stepMachine :: Machine state event signal -> event -> (Machine state event signal, signal) stepMachine machine event = (machine {mCurState = newState}, output) where curState = mCurState machine (newState, output) = mTransFunction machine curState event I also decided to add a function for creating a machine given a start state and a function. With the definition above it becomes trivial: createMachine :: state -> (state -> event -> (state, signal)) -> Machine state event signal createMachine = Machine That’s it, really. Except of course that the actual transition function still has to be written. However, with the pattern matching of Haskell I’ve found that to be rather simple, but I’ll keep that for the next post. ## January 02, 2015 ### Ken T Takusagawa # [okucbyxc] Happy 2015! The factorial of 2015 expressed in base 147 gives a nice random-looking blob of symbols. (Ironically, it ends in even more exclamation points.) We use the wbr HTML tag to break lines to the width of your browser window. See source code in Haskell. Naive radix conversion is an unfold, though divide and conquer would be more efficient. radix_convert :: Integer -> Integer -> [Integer]; radix_convert base = unfoldr  \n -> if n==0 then Nothing else Just  swap  divMod n base;  The 147 digits are a subset of Latin-1: !<wbr>#<wbr><wbr>%<wbr>&<wbr>(<wbr>)<wbr>*<wbr>+<wbr>-<wbr>/<wbr>0<wbr>1<wbr>2<wbr>3<wbr>4<wbr>5<wbr>6<wbr>7<wbr>8<wbr>9<wbr>:<wbr>;<wbr><<wbr>=<wbr>><wbr>?<wbr>@<wbr>A<wbr>B<wbr>C<wbr>D<wbr>E<wbr>F<wbr>G<wbr>H<wbr>I<wbr>J<wbr>K<wbr>L<wbr>M<wbr>N<wbr>O<wbr>P<wbr>Q<wbr>R<wbr>S<wbr>T<wbr>U<wbr>V<wbr>W<wbr>X<wbr>Y<wbr>Z<wbr>[<wbr>\<wbr>]<wbr>^<wbr>a<wbr>b<wbr>c<wbr>d<wbr>e<wbr>f<wbr>g<wbr>h<wbr>i<wbr>j<wbr>k<wbr>l<wbr>m<wbr>n<wbr>o<wbr>p<wbr>q<wbr>r<wbr>s<wbr>t<wbr>u<wbr>v<wbr>w<wbr>x<wbr>y<wbr>z<wbr>{<wbr>|<wbr>}<wbr>~<wbr>¡<wbr>¢<wbr>£<wbr>¤<wbr>¥<wbr>¦<wbr>§<wbr>©<wbr>«<wbr>¬<wbr>®<wbr>±<wbr>µ<wbr>¶<wbr>·<wbr>»<wbr>¼<wbr>½<wbr>¾<wbr>¿<wbr>Æ<wbr>Ç<wbr>Ð<wbr>×<wbr>Ø<wbr>Þ<wbr>ß<wbr>à<wbr>á<wbr>â<wbr>ã<wbr>ä<wbr>å<wbr>æ<wbr>ç<wbr>è<wbr>é<wbr>ê<wbr>ë<wbr>ì<wbr>í<wbr>î<wbr>ï<wbr>ð<wbr>ñ<wbr>ò<wbr>ó<wbr>ô<wbr>õ<wbr>ö<wbr>÷<wbr>ø<wbr>ù<wbr>ú<wbr>û<wbr>ü<wbr>ý<wbr>þ<wbr>ÿ 2015! = %<wbr>4<wbr>p<wbr>µ<wbr>ò<wbr>q<wbr>g<wbr>ê<wbr>0<wbr>)<wbr>\<wbr>8<wbr>Ç<wbr>ã<wbr>t<wbr>m<wbr>4<wbr>ç<wbr>×<wbr>F<wbr>2<wbr>u<wbr>Z<wbr>ó<wbr>|<wbr>f<wbr>µ<wbr>}<wbr>å<wbr>ï<wbr>v<wbr>Y<wbr>Ð<wbr>2<wbr>b<wbr>t<wbr>¢<wbr>á<wbr>V<wbr>ò<wbr>ø<wbr>ï<wbr>Z<wbr>¼<wbr>h<wbr>ç<wbr>k<wbr>ð<wbr>=<wbr>»<wbr>Z<wbr>?<wbr>¬<wbr>®<wbr>S<wbr>O<wbr>X<wbr>1<wbr>¿<wbr>\<wbr>¶<wbr>×<wbr>Ø<wbr>1<wbr>I<wbr>ü<wbr>@<wbr>[<wbr>R<wbr>§<wbr>}<wbr>u<wbr>Ø<wbr>æ<wbr>5<wbr>¢<wbr>ý<wbr>ç<wbr>t<wbr>®<wbr>]<wbr>Þ<wbr>P<wbr>)<wbr>l<wbr>[<wbr>¡<wbr>Æ<wbr>x<wbr>ì<wbr>G<wbr>@<wbr>ß<wbr>à<wbr>ï<wbr>½<wbr>©<wbr>P<wbr>O<wbr>O<wbr><<wbr>æ<wbr>b<wbr>k<wbr>S<wbr>ñ<wbr>û<wbr>)<wbr>0<wbr>±<wbr>¦<wbr>q<wbr>æ<wbr>)<wbr>N<wbr>w<wbr>)<wbr>2<wbr>I<wbr>â<wbr>â<wbr>/<wbr>g<wbr>u<wbr>í<wbr>å<wbr>å<wbr>/<wbr>w<wbr>7<wbr>ç<wbr>¶<wbr>û<wbr>\<wbr>w<wbr>q<wbr>ç<wbr>J<wbr>ù<wbr>å<wbr>g<wbr>D<wbr>/<wbr>¡<wbr>#<wbr>n<wbr>E<wbr>·<wbr>O<wbr>ý<wbr>ý<wbr>í<wbr>s<wbr>õ<wbr>T<wbr>y<wbr>(<wbr>8<wbr>b<wbr>V<wbr>[<wbr>ô<wbr>B<wbr>w<wbr>í<wbr>è<wbr>M<wbr>¿<wbr>¢<wbr>d<wbr>¶<wbr>~<wbr>L<wbr>y<wbr>A<wbr>A<wbr>L<wbr>3<wbr>ñ<wbr>]<wbr>/<wbr>/<wbr>¼<wbr>0<wbr>~<wbr>F<wbr>K<wbr>î<wbr>¡<wbr>d<wbr>¬<wbr>z<wbr>ú<wbr>ý<wbr>v<wbr>x<wbr>á<wbr>Ð<wbr>ú<wbr>ý<wbr>m<wbr>þ<wbr>!<wbr>l<wbr>T<wbr>û<wbr>¤<wbr>ç<wbr>4<wbr>ñ<wbr>*<wbr>^<wbr>=<wbr>ø<wbr>><wbr>ô<wbr>y<wbr>O<wbr>ô<wbr>÷<wbr>¼<wbr>S<wbr>f<wbr>H<wbr>Ø<wbr>ó<wbr>×<wbr>j<wbr>9<wbr>M<wbr>J<wbr>(<wbr>+<wbr>F<wbr>{<wbr>×<wbr>¤<wbr>^<wbr>|<wbr>û<wbr><wbr>e<wbr>L<wbr>[<wbr>K<wbr>[<wbr>®<wbr>E<wbr>®<wbr>D<wbr>ì<wbr>ã<wbr>8<wbr>H<wbr>=<wbr>ô<wbr>~<wbr>%<wbr>¿<wbr>m<wbr>¬<wbr>Z<wbr>H<wbr>*<wbr>M<wbr>«<wbr>×<wbr>¬<wbr>u<wbr>0<wbr>2<wbr>i<wbr>ÿ<wbr>ä<wbr>q<wbr>¶<wbr>ð<wbr>-<wbr>u<wbr>î<wbr>ß<wbr>@<wbr>8<wbr>à<wbr>ä<wbr>#<wbr>ü<wbr>U<wbr>D<wbr>ô<wbr><<wbr>æ<wbr>Z<wbr>¬<wbr>å<wbr>/<wbr>d<wbr>B<wbr>#<wbr>ø<wbr>-<wbr>h<wbr>Þ<wbr>ç<wbr>#<wbr>6<wbr>ð<wbr>h<wbr>é<wbr>u<wbr>¶<wbr>ï<wbr>+<wbr>¢<wbr>p<wbr>û<wbr>s<wbr>{<wbr>w<wbr>X<wbr>à<wbr>B<wbr>f<wbr>©<wbr>;<wbr>¢<wbr>d<wbr>\<wbr>ö<wbr>t<wbr>D<wbr>Ø<wbr>{<wbr>c<wbr>9<wbr>ð<wbr>t<wbr>¦<wbr>o<wbr>¤<wbr>¤<wbr>ó<wbr>/<wbr>C<wbr>¡<wbr>/<wbr>4<wbr>·<wbr>§<wbr>¢<wbr>æ<wbr><<wbr>G<wbr>«<wbr>û<wbr>j<wbr>P<wbr>ë<wbr>R<wbr><<wbr>Ø<wbr>¦<wbr>x<wbr>ê<wbr>5<wbr>è<wbr>a<wbr>Q<wbr>-<wbr>¾<wbr>-<wbr>á<wbr>¬<wbr>¡<wbr>ñ<wbr>û<wbr>j<wbr>y<wbr>+<wbr>V<wbr>n<wbr>±<wbr>4<wbr>ô<wbr>f<wbr>ö<wbr>·<wbr>¼<wbr>6<wbr>ú<wbr>F<wbr>ö<wbr>»<wbr>1<wbr>¤<wbr>j<wbr>L<wbr>|<wbr>N<wbr>×<wbr>ÿ<wbr>p<wbr>m<wbr>T<wbr>£<wbr>h<wbr>ï<wbr>ð<wbr>ý<wbr>q<wbr>ì<wbr>ö<wbr>¦<wbr>3<wbr>¾<wbr>0<wbr>G<wbr>1<wbr>ã<wbr>m<wbr>Ð<wbr>?<wbr>ü<wbr>¤<wbr>d<wbr>W<wbr>s<wbr>@<wbr>%<wbr>2<wbr>¶<wbr>V<wbr>é<wbr>O<wbr>B<wbr>è<wbr>m<wbr>Q<wbr>ê<wbr>@<wbr>e<wbr>4<wbr>!<wbr>E<wbr>z<wbr>ß<wbr>Ø<wbr>w<wbr>:<wbr>¾<wbr>î<wbr>b<wbr>æ<wbr>v<wbr>%<wbr>7<wbr>í<wbr>!<wbr>¡<wbr>&<wbr>×<wbr>×<wbr>Q<wbr>M<wbr>Z<wbr>£<wbr>g<wbr>1<wbr>ô<wbr>¡<wbr>â<wbr>~<wbr>ö<wbr>I<wbr>*<wbr>g<wbr>Ç<wbr>ô<wbr>*<wbr>|<wbr><<wbr>G<wbr>Ð<wbr>¢<wbr>¡<wbr>F<wbr>8<wbr>±<wbr>0<wbr>Z<wbr>±<wbr>©<wbr>C<wbr>×<wbr>%<wbr>Q<wbr>C<wbr>ò<wbr>÷<wbr>T<wbr>ê<wbr>8<wbr>{<wbr>u<wbr>i<wbr>¤<wbr>ý<wbr>Þ<wbr>ð<wbr>H<wbr>÷<wbr>½<wbr>ò<wbr>¶<wbr>B<wbr>a<wbr>ç<wbr>m<wbr>j<wbr>t<wbr>R<wbr>I<wbr>A<wbr>(<wbr>£<wbr>¾<wbr>Æ<wbr>+<wbr>ù<wbr>¾<wbr>z<wbr>w<wbr>S<wbr>ê<wbr>h<wbr>Æ<wbr>ô<wbr>à<wbr>)<wbr>-<wbr>E<wbr>Þ<wbr><wbr>·<wbr>h<wbr>×<wbr>m<wbr>W<wbr>s<wbr>ú<wbr>v<wbr>}<wbr>¾<wbr>Ð<wbr>M<wbr>â<wbr>·<wbr>k<wbr><<wbr>¦<wbr>Ø<wbr>ç<wbr>Ç<wbr>î<wbr>9<wbr>¦<wbr>&<wbr>&<wbr>¾<wbr>s<wbr>é<wbr>û<wbr>Z<wbr>D<wbr>Y<wbr>ó<wbr>s<wbr>n<wbr>r<wbr>o<wbr>£<wbr>%<wbr>×<wbr>¢<wbr>½<wbr>~<wbr>e<wbr>ê<wbr>1<wbr>a<wbr>?<wbr>S<wbr>a<wbr>U<wbr>ç<wbr>è<wbr>q<wbr><<wbr>H<wbr>=<wbr>ê<wbr>2<wbr>ð<wbr>3<wbr>ö<wbr>Ø<wbr>y<wbr>p<wbr>|<wbr>Ð<wbr>±<wbr>5<wbr>¢<wbr>;<wbr>J<wbr>m<wbr>÷<wbr>þ<wbr>Þ<wbr>z<wbr>®<wbr><wbr>3<wbr>Þ<wbr>><wbr>ê<wbr>;<wbr>v<wbr>8<wbr>/<wbr>~<wbr>ú<wbr>t<wbr>ú<wbr>p<wbr>1<wbr>{<wbr>C<wbr>ô<wbr>L<wbr>ë<wbr>H<wbr>å<wbr>[<wbr>¤<wbr>n<wbr>F<wbr>3<wbr>\<wbr>¿<wbr>(<wbr>«<wbr>ã<wbr>í<wbr>\<wbr>ñ<wbr>F<wbr>!<wbr>ë<wbr>Y<wbr>Ø<wbr>Y<wbr>e<wbr>P<wbr>¡<wbr>£<wbr>å<wbr>m<wbr>U<wbr>F<wbr>¦<wbr>à<wbr>m<wbr>#<wbr>Ç<wbr>ü<wbr>ø<wbr>ã<wbr>±<wbr>3<wbr>Y<wbr>p<wbr>û<wbr>®<wbr>ó<wbr>=<wbr>8<wbr>â<wbr>¬<wbr>¢<wbr>y<wbr><<wbr>µ<wbr>d<wbr>+<wbr>[<wbr>F<wbr>C<wbr>Y<wbr>á<wbr>¥<wbr>G<wbr>ù<wbr>+<wbr>¿<wbr>)<wbr>è<wbr>M<wbr>ó<wbr>i<wbr>U<wbr>)<wbr>o<wbr>5<wbr>c<wbr>K<wbr>|<wbr>B<wbr>6<wbr>:<wbr>w<wbr>0<wbr>q<wbr>Q<wbr>j<wbr>¶<wbr>&<wbr>:<wbr>d<wbr>¦<wbr>O<wbr>K<wbr>õ<wbr>±<wbr>i<wbr>µ<wbr>+<wbr>D<wbr>l<wbr>=<wbr>£<wbr>í<wbr>6<wbr>0<wbr>ë<wbr>}<wbr>n<wbr>-<wbr>3<wbr>ë<wbr>ë<wbr>¼<wbr>©<wbr>c<wbr>e<wbr>¶<wbr>t<wbr>O<wbr><wbr>y<wbr>O<wbr>C<wbr>¾<wbr>~<wbr>a<wbr>à<wbr>÷<wbr>®<wbr>!<wbr>ä<wbr>u<wbr>¿<wbr>Ç<wbr>\<wbr>F<wbr>õ<wbr>C<wbr>?<wbr>O<wbr>Q<wbr>b<wbr>S<wbr>¿<wbr>Æ<wbr>þ<wbr>þ<wbr>Y<wbr>á<wbr>é<wbr>X<wbr>*<wbr>¾<wbr>;<wbr>G<wbr>k<wbr>¡<wbr>ù<wbr>j<wbr>=<wbr>z<wbr>ê<wbr>@<wbr>W<wbr>l<wbr>k<wbr>2<wbr>O<wbr>¢<wbr>M<wbr>K<wbr>/<wbr>ú<wbr>T<wbr>A<wbr>Q<wbr>ì<wbr>Ø<wbr>m<wbr>ü<wbr>g<wbr>ô<wbr>W<wbr>¼<wbr>P<wbr>¤<wbr>!<wbr>#<wbr>O<wbr>M<wbr>u<wbr>S<wbr>~<wbr>X<wbr>Y<wbr>Ç<wbr>o<wbr>}<wbr>j<wbr>ï<wbr>4<wbr>£<wbr>«<wbr>é<wbr>ï<wbr>æ<wbr>l<wbr>R<wbr>r<wbr>M<wbr>h<wbr>à<wbr>3<wbr>Ç<wbr>ö<wbr>k<wbr>5<wbr>L<wbr>><wbr>/<wbr>¢<wbr>4<wbr>à<wbr>^<wbr>®<wbr>w<wbr>t<wbr>[<wbr>ç<wbr>/<wbr>e<wbr>V<wbr>×<wbr>ú<wbr>¬<wbr>ü<wbr>(<wbr>µ<wbr>{<wbr>¿<wbr>R<wbr>5<wbr>j<wbr>5<wbr>÷<wbr>^<wbr>ã<wbr>½<wbr>l<wbr>#<wbr>]<wbr>§<wbr>|<wbr>;<wbr>3<wbr>]<wbr>§<wbr>÷<wbr>1<wbr>^<wbr>©<wbr>e<wbr>B<wbr>G<wbr>æ<wbr>8<wbr>3<wbr>f<wbr>k<wbr>~<wbr>]<wbr>Z<wbr>j<wbr>¿<wbr>í<wbr>¿<wbr>m<wbr>õ<wbr>(<wbr>£<wbr><<wbr>G<wbr>}<wbr>=<wbr>e<wbr>v<wbr>Æ<wbr>]<wbr>¼<wbr>¡<wbr>÷<wbr>#<wbr>ô<wbr>Ø<wbr>è<wbr>]<wbr>¾<wbr>Q<wbr>3<wbr>Y<wbr>ù<wbr>é<wbr>B<wbr>2<wbr>+<wbr>ì<wbr>×<wbr>ä<wbr>}<wbr>h<wbr>O<wbr>g<wbr>ÿ<wbr>í<wbr>Þ<wbr>ø<wbr>:<wbr>â<wbr>s<wbr>z<wbr>Ç<wbr>3<wbr>Ø<wbr>a<wbr>?<wbr>S<wbr>H<wbr>ú<wbr>X<wbr>><wbr>¾<wbr>S<wbr>ø<wbr>W<wbr>(<wbr>[<wbr>û<wbr>u<wbr>7<wbr>~<wbr>X<wbr>ó<wbr>X<wbr>:<wbr>O<wbr>R<wbr>§<wbr>F<wbr>«<wbr>T<wbr>^<wbr>D<wbr>A<wbr>ñ<wbr>¦<wbr>(<wbr>q<wbr>\<wbr>ù<wbr>Z<wbr>þ<wbr>I<wbr>N<wbr>½<wbr>G<wbr>æ<wbr>~<wbr>3<wbr>n<wbr>à<wbr>D<wbr>\<wbr>F<wbr>X<wbr>¶<wbr>/<wbr>©<wbr>ç<wbr>y<wbr>f<wbr>*<wbr>¡<wbr>a<wbr>r<wbr>m<wbr>l<wbr>þ<wbr>ú<wbr>/<wbr>£<wbr>9<wbr>B<wbr>i<wbr>»<wbr>1<wbr>><wbr>¦<wbr>(<wbr>x<wbr>I<wbr>]<wbr>»<wbr>o<wbr>(<wbr>±<wbr>;<wbr>:<wbr>¦<wbr>©<wbr>¤<wbr>B<wbr>i<wbr>@<wbr>¬<wbr>¥<wbr>ç<wbr>Æ<wbr>T<wbr>Æ<wbr>;<wbr>G<wbr>®<wbr>m<wbr>ö<wbr>q<wbr>ú<wbr>G<wbr>ø<wbr>e<wbr>u<wbr>Þ<wbr>®<wbr>«<wbr>4<wbr>ñ<wbr>0<wbr>4<wbr>5<wbr>é<wbr>a<wbr>4<wbr>ç<wbr>z<wbr>u<wbr>í<wbr>©<wbr>9<wbr>ü<wbr>Ç<wbr>ü<wbr>V<wbr>H<wbr>;<wbr>F<wbr>a<wbr>V<wbr>}<wbr>G<wbr>&<wbr>n<wbr>©<wbr>ï<wbr>n<wbr>ì<wbr>p<wbr>â<wbr>!<wbr>ç<wbr>4<wbr>æ<wbr>/<wbr>!<wbr>Þ<wbr>B<wbr>¥<wbr>ü<wbr>r<wbr>)<wbr>ß<wbr>?<wbr>¿<wbr>N<wbr>{<wbr>[<wbr>Y<wbr>Þ<wbr>ñ<wbr>·<wbr>)<wbr>D<wbr>L<wbr>3<wbr>«<wbr>g<wbr>1<wbr>(<wbr>o<wbr>Y<wbr>E<wbr>à<wbr>p<wbr>N<wbr>ê<wbr>d<wbr>Ø<wbr>ï<wbr>@<wbr>l<wbr>ê<wbr>#<wbr>ó<wbr>-<wbr>¤<wbr>+<wbr>g<wbr>û<wbr>ñ<wbr>!<wbr>j<wbr>î<wbr>@<wbr>ë<wbr>!<wbr>¦<wbr>e<wbr>«<wbr>c<wbr>3<wbr>×<wbr>|<wbr>a<wbr>ø<wbr>ï<wbr>j<wbr>\<wbr>H<wbr>m<wbr>A<wbr>ä<wbr>û<wbr>¢<wbr>[<wbr>a<wbr>m<wbr>N<wbr>[<wbr>®<wbr>U<wbr>U<wbr>ä<wbr>¢<wbr>¬<wbr>@<wbr>¼<wbr>9<wbr>ì<wbr>ï<wbr>5<wbr>s<wbr>¼<wbr>@<wbr>×<wbr>Y<wbr>p<wbr>å<wbr>H<wbr>~<wbr>N<wbr>u<wbr>+<wbr>þ<wbr>d<wbr>Ð<wbr>Ð<wbr>æ<wbr>I<wbr>å<wbr>\<wbr>f<wbr>ý<wbr>ü<wbr>¢<wbr>¦<wbr>f<wbr>M<wbr>ï<wbr>2<wbr>d<wbr>÷<wbr>á<wbr>¿<wbr>N<wbr>L<wbr>¶<wbr>§<wbr>«<wbr>!<wbr>£<wbr>ç<wbr>u<wbr>¥<wbr>ù<wbr>Ð<wbr>ç<wbr>÷<wbr>ì<wbr>b<wbr>i<wbr>A<wbr>1<wbr>y<wbr>!<wbr>y<wbr>Q<wbr>J<wbr>1<wbr>×<wbr>«<wbr>¼<wbr>p<wbr>·<wbr>B<wbr>F<wbr>+<wbr>¾<wbr>v<wbr>»<wbr>^<wbr>L<wbr>G<wbr>@<wbr>1<wbr>¶<wbr>r<wbr>%<wbr>|<wbr>×<wbr>B<wbr>â<wbr>L<wbr>S<wbr>£<wbr>T<wbr>ý<wbr><wbr><wbr>µ<wbr>N<wbr>|<wbr>é<wbr>ã<wbr>f<wbr>ë<wbr>G<wbr>A<wbr>A<wbr>a<wbr>C<wbr>ù<wbr>q<wbr>¾<wbr>ò<wbr>×<wbr>ø<wbr>5<wbr>î<wbr>{<wbr>c<wbr>n<wbr>ó<wbr>¿<wbr>)<wbr>ñ<wbr>u<wbr>û<wbr>9<wbr>÷<wbr>E<wbr>¾<wbr>2<wbr>ê<wbr>6<wbr>R<wbr>ø<wbr>{<wbr>ô<wbr>ð<wbr>b<wbr>¢<wbr>?<wbr>P<wbr>m<wbr>v<wbr>ì<wbr>R<wbr>K<wbr>¤<wbr>><wbr>q<wbr>#<wbr>ñ<wbr>n<wbr>ÿ<wbr>©<wbr>©<wbr>¾<wbr>6<wbr>¬<wbr>j<wbr>A<wbr>à<wbr>m<wbr>õ<wbr>ã<wbr>a<wbr>}<wbr>R<wbr>U<wbr>R<wbr>#<wbr>®<wbr>£<wbr>ø<wbr>8<wbr>Ð<wbr>3<wbr>ó<wbr>s<wbr>©<wbr>µ<wbr>{<wbr>¤<wbr>L<wbr>ò<wbr>á<wbr>5<wbr>5<wbr>)<wbr>(<wbr>ä<wbr>k<wbr>><wbr>c<wbr>ù<wbr>ê<wbr>W<wbr>a<wbr>ö<wbr>õ<wbr>R<wbr>e<wbr>D<wbr>è<wbr>z<wbr>r<wbr>¾<wbr>Ø<wbr>y<wbr>c<wbr>â<wbr>2<wbr>§<wbr>}<wbr>Æ<wbr>x<wbr>á<wbr>¼<wbr>A<wbr>ï<wbr>:<wbr>ý<wbr>~<wbr>><wbr>Q<wbr>6<wbr>R<wbr>i<wbr>ß<wbr>-<wbr>S<wbr>p<wbr>K<wbr>6<wbr>§<wbr>ô<wbr>¢<wbr>Z<wbr>f<wbr>»<wbr>P<wbr>P<wbr>&<wbr>¼<wbr>s<wbr>±<wbr>*<wbr>ã<wbr>]<wbr>ô<wbr>~<wbr>J<wbr>z<wbr>P<wbr>5<wbr>à<wbr>r<wbr>q<wbr><<wbr>\<wbr>·<wbr>N<wbr>µ<wbr>N<wbr>J<wbr>{<wbr>r<wbr>v<wbr>^<wbr>6<wbr>¦<wbr>ý<wbr>ï<wbr>¤<wbr>-<wbr>M<wbr>x<wbr>!<wbr>ì<wbr>[<wbr>b<wbr>å<wbr>#<wbr>à<wbr>í<wbr>{<wbr>x<wbr>Æ<wbr>â<wbr>p<wbr>à<wbr>{<wbr>Ø<wbr>W<wbr>}<wbr>t<wbr>á<wbr>-<wbr>ì<wbr>=<wbr>ç<wbr>¢<wbr>¾<wbr>N<wbr>E<wbr>ì<wbr>M<wbr>G<wbr>{<wbr>í<wbr>c<wbr>ÿ<wbr>á<wbr>z<wbr>þ<wbr><<wbr>J<wbr>®<wbr>#<wbr>q<wbr>u<wbr>h<wbr>é<wbr>ú<wbr>P<wbr>w<wbr>£<wbr>:<wbr>à<wbr>%<wbr>µ<wbr>ñ<wbr>(<wbr>ë<wbr>Æ<wbr>q<wbr><<wbr>t<wbr>ì<wbr>ê<wbr>¤<wbr>®<wbr>s<wbr>û<wbr>a<wbr>d<wbr>F<wbr>ñ<wbr>9<wbr>þ<wbr>ü<wbr>C<wbr>:<wbr>û<wbr>¡<wbr>2<wbr>å<wbr>E<wbr>¾<wbr>-<wbr>><wbr>ÿ<wbr>3<wbr>¦<wbr><wbr>à<wbr>!<wbr>}<wbr>ë<wbr>µ<wbr>D<wbr>¶<wbr>5<wbr>ñ<wbr>i<wbr>V<wbr>Æ<wbr>Ð<wbr>><wbr>þ<wbr>%<wbr>B<wbr>ñ<wbr>L<wbr>@<wbr>f<wbr>ð<wbr>§<wbr>¬<wbr>6<wbr>|<wbr>ñ<wbr>ß<wbr>9<wbr>¥<wbr>¦<wbr>s<wbr>þ<wbr>g<wbr>]<wbr>;<wbr>m<wbr>/<wbr>©<wbr>b<wbr>=<wbr>-<wbr>/<wbr>ê<wbr>d<wbr>ö<wbr>ç<wbr>2<wbr><<wbr>t<wbr>á<wbr>í<wbr>ñ<wbr>j<wbr>R<wbr>:<wbr>P<wbr>9<wbr>q<wbr>)<wbr>ò<wbr>ã<wbr>p<wbr>I<wbr>q<wbr>/<wbr>e<wbr>¾<wbr>(<wbr>í<wbr>X<wbr>y<wbr>V<wbr>U<wbr>+<wbr>-<wbr>w<wbr>¢<wbr>w<wbr>ñ<wbr>V<wbr>õ<wbr>ñ<wbr>r<wbr>±<wbr>ê<wbr>q<wbr>u<wbr>±<wbr>w<wbr>®<wbr>z<wbr>H<wbr>l<wbr>ý<wbr>4<wbr>+<wbr>w<wbr>è<wbr>Þ<wbr>W<wbr>{<wbr>ü<wbr>Þ<wbr>@<wbr>¶<wbr>k<wbr>n<wbr>ß<wbr>i<wbr>:<wbr>ß<wbr>@<wbr>1<wbr>¤<wbr>þ<wbr>4<wbr>7<wbr>%<wbr>R<wbr>Y<wbr>(<wbr>@<wbr>®<wbr>Ç<wbr>m<wbr>¿<wbr>V<wbr>ð<wbr>¶<wbr>b<wbr>M<wbr>N<wbr>ø<wbr>q<wbr>ö<wbr>ó<wbr>¦<wbr>s<wbr>ú<wbr>e<wbr>o<wbr>]<wbr>Y<wbr>M<wbr>æ<wbr>ë<wbr>M<wbr>ð<wbr>í<wbr>c<wbr>è<wbr>©<wbr>)<wbr>m<wbr>4<wbr>u<wbr>/<wbr>6<wbr>=<wbr>8<wbr>½<wbr>:<wbr>W<wbr>*<wbr>Q<wbr>î<wbr>ü<wbr>G<wbr>n<wbr>x<wbr>ç<wbr>x<wbr>ï<wbr>K<wbr>C<wbr>½<wbr>)<wbr>|<wbr>ñ<wbr>I<wbr>]<wbr>5<wbr>·<wbr>d<wbr>H<wbr>d<wbr>ç<wbr>T<wbr>C<wbr>l<wbr>ã<wbr>#<wbr>¿<wbr>U<wbr>ä<wbr>W<wbr>í<wbr>á<wbr>|<wbr>§<wbr>µ<wbr>q<wbr>x<wbr>D<wbr>á<wbr>þ<wbr>µ<wbr>ò<wbr>ô<wbr>½<wbr>ä<wbr>i<wbr>d<wbr>F<wbr>Y<wbr>#<wbr>)<wbr>ê<wbr>I<wbr>±<wbr>å<wbr>p<wbr>4<wbr>9<wbr>Ç<wbr>x<wbr>V<wbr>¦<wbr>â<wbr>\<wbr>C<wbr>ï<wbr>8<wbr>»<wbr>!<wbr>®<wbr>d<wbr>¼<wbr>k<wbr>®<wbr>¬<wbr>l<wbr>£<wbr>·<wbr><<wbr>¦<wbr>C<wbr>J<wbr>W<wbr>q<wbr>q<wbr>î<wbr>A<wbr>}<wbr>ã<wbr>7<wbr>T<wbr>T<wbr>S<wbr>]<wbr>ý<wbr>÷<wbr>\<wbr>%<wbr>B<wbr>S<wbr>f<wbr>î<wbr>ô<wbr>p<wbr>|<wbr>ñ<wbr>A<wbr>n<wbr>ë<wbr>ë<wbr>?<wbr>ì<wbr>%<wbr>q<wbr>N<wbr>ÿ<wbr>~<wbr>ô<wbr>Ç<wbr>y<wbr>î<wbr>«<wbr>¶<wbr>G<wbr>l<wbr>F<wbr>¾<wbr>w<wbr>¡<wbr>P<wbr>L<wbr>;<wbr>4<wbr>m<wbr>Þ<wbr>3<wbr>Y<wbr>ù<wbr>ß<wbr>Ç<wbr>«<wbr>Ø<wbr>á<wbr>G<wbr>¬<wbr>î<wbr>ä<wbr>6<wbr>D<wbr>¡<wbr>k<wbr>l<wbr>à<wbr>å<wbr>{<wbr>&<wbr>X<wbr>S<wbr>y<wbr>ù<wbr>6<wbr>ø<wbr>#<wbr>N<wbr>4<wbr>f<wbr>µ<wbr>m<wbr>§<wbr>x<wbr>@<wbr>e<wbr>ü<wbr>f<wbr>{<wbr>â<wbr>à<wbr>µ<wbr>¾<wbr>¼<wbr>*<wbr>k<wbr>=<wbr>·<wbr>ò<wbr>a<wbr>y<wbr>÷<wbr>X<wbr>S<wbr>§<wbr>õ<wbr>i<wbr>4<wbr>ø<wbr>2<wbr>¢<wbr>1<wbr>á<wbr>è<wbr>R<wbr>¦<wbr>i<wbr>å<wbr>ú<wbr>0<wbr>o<wbr>q<wbr>J<wbr>H<wbr>d<wbr>x<wbr>%<wbr>á<wbr>·<wbr>Ø<wbr>><wbr>¿<wbr>ë<wbr>¼<wbr>¡<wbr>W<wbr>ß<wbr>Y<wbr>ò<wbr>ä<wbr>l<wbr>L<wbr>ä<wbr>ç<wbr>ë<wbr>+<wbr>2<wbr>·<wbr>f<wbr>(<wbr>O<wbr>¬<wbr>á<wbr>ý<wbr>K<wbr>q<wbr>H<wbr>j<wbr>(<wbr>/<wbr>)<wbr>g<wbr>á<wbr>ý<wbr>¥<wbr>X<wbr>Ç<wbr>P<wbr>+<wbr>3<wbr>4<wbr>z<wbr>ð<wbr>C<wbr>8<wbr>3<wbr>¢<wbr>¬<wbr>à<wbr>ø<wbr>v<wbr>¬<wbr>j<wbr>J<wbr>f<wbr>x<wbr>K<wbr>P<wbr>#<wbr>6<wbr>±<wbr>ú<wbr>ß<wbr>M<wbr>t<wbr>E<wbr>0<wbr>½<wbr>h<wbr>û<wbr>6<wbr>Ð<wbr>V<wbr>ò<wbr>µ<wbr>Z<wbr>J<wbr>ã<wbr>ð<wbr>ï<wbr>7<wbr>÷<wbr>ê<wbr>I<wbr>ø<wbr>o<wbr>ð<wbr>T<wbr>j<wbr>><wbr>v<wbr>+<wbr>~<wbr>x<wbr>ö<wbr>%<wbr>}<wbr>¡<wbr>ã<wbr>ô<wbr>ü<wbr>a<wbr>f<wbr>b<wbr>m<wbr><wbr>|<wbr>F<wbr>ã<wbr>¼<wbr>¬<wbr>Ç<wbr>Æ<wbr>A<wbr>ä<wbr>><wbr>Þ<wbr>L<wbr>±<wbr>*<wbr>3<wbr><<wbr>©<wbr>C<wbr>½<wbr>}<wbr>¤<wbr>ç<wbr>]<wbr>w<wbr>ý<wbr>¬<wbr>ü<wbr>l<wbr>:<wbr>¢<wbr>c<wbr>}<wbr>X<wbr>r<wbr>ø<wbr>y<wbr>A<wbr>o<wbr>@<wbr>o<wbr>à<wbr>G<wbr>!<wbr>r<wbr>a<wbr>Ø<wbr>«<wbr>N<wbr>8<wbr>¼<wbr>B<wbr>d<wbr>S<wbr>2<wbr>O<wbr>U<wbr>¥<wbr>A<wbr>=<wbr>|<wbr>Q<wbr>¥<wbr>f<wbr>i<wbr>Ð<wbr>ò<wbr>é<wbr>7<wbr>G<wbr>#<wbr>I<wbr>§<wbr>Q<wbr>H<wbr>H<wbr>¿<wbr>Æ<wbr>0<wbr>a<wbr>F<wbr>ï<wbr>?<wbr>W<wbr>B<wbr>×<wbr>L<wbr>><wbr>N<wbr>^<wbr>§<wbr>[<wbr>]<wbr>í<wbr>¿<wbr>¤<wbr>»<wbr>ø<wbr>R<wbr>ë<wbr>i<wbr>«<wbr>¥<wbr>¾<wbr>v<wbr>}<wbr>©<wbr>×<wbr>¦<wbr>3<wbr>j<wbr>/<wbr>L<wbr>v<wbr>/<wbr>7<wbr>Y<wbr>ê<wbr>a<wbr>!<wbr>»<wbr>¤<wbr>µ<wbr>|<wbr>©<wbr>s<wbr>?<wbr>æ<wbr>à<wbr>î<wbr>0<wbr>«<wbr>ô<wbr>g<wbr>ë<wbr>j<wbr>j<wbr>h<wbr>å<wbr>Q<wbr>Ø<wbr>s<wbr>å<wbr>p<wbr>ð<wbr>|<wbr>»<wbr>z<wbr>C<wbr>P<wbr>h<wbr>s<wbr>ø<wbr>2<wbr>§<wbr>k<wbr>¢<wbr>y<wbr>ï<wbr>¥<wbr>X<wbr>é<wbr>;<wbr>k<wbr>o<wbr>T<wbr>ì<wbr>F<wbr>ä<wbr>ú<wbr>¶<wbr>¾<wbr>¼<wbr>y<wbr>/<wbr>ì<wbr>a<wbr>Þ<wbr>/<wbr>6<wbr>±<wbr>N<wbr>;<wbr>¿<wbr><wbr>n<wbr>®<wbr>Ð<wbr>s<wbr>^<wbr>å<wbr>í<wbr>[<wbr>u<wbr>æ<wbr>·<wbr>ê<wbr>ú<wbr>a<wbr>o<wbr>p<wbr>r<wbr>U<wbr>p<wbr>þ<wbr>~<wbr>ö<wbr>ê<wbr>a<wbr>û<wbr>?<wbr>ä<wbr>=<wbr>ë<wbr>:<wbr>§<wbr>|<wbr>5<wbr>z<wbr>±<wbr>í<wbr>v<wbr>©<wbr>J<wbr>Ø<wbr>Æ<wbr>e<wbr>S<wbr>¬<wbr>C<wbr>!<wbr>+<wbr>Ð<wbr>ï<wbr>D<wbr>|<wbr>9<wbr>L<wbr>4<wbr>[<wbr>L<wbr>C<wbr>^<wbr>G<wbr>@<wbr>í<wbr><<wbr>b<wbr>*<wbr>n<wbr>2<wbr>T<wbr>ú<wbr>M<wbr>x<wbr>^<wbr>ÿ<wbr>®<wbr>|<wbr>m<wbr>(<wbr>Ç<wbr>@<wbr>ó<wbr>1<wbr>¢<wbr>V<wbr>(<wbr>L<wbr>K<wbr>t<wbr>Þ<wbr>ç<wbr>o<wbr>X<wbr>*<wbr>ó<wbr>t<wbr>í<wbr>í<wbr>m<wbr>L<wbr>ý<wbr>T<wbr>N<wbr>¶<wbr>V<wbr>ü<wbr>M<wbr>í<wbr>}<wbr>O<wbr>~<wbr>u<wbr>j<wbr><wbr>a<wbr>F<wbr>8<wbr>x<wbr>~<wbr>X<wbr>Ç<wbr>ý<wbr>E<wbr>M<wbr>Z<wbr>ò<wbr>0<wbr>¢<wbr>q<wbr>*<wbr>(<wbr>L<wbr>2<wbr>ù<wbr>×<wbr>¾<wbr>+<wbr>><wbr>â<wbr>X<wbr>P<wbr>g<wbr>h<wbr>*<wbr>7<wbr>q<wbr>á<wbr>×<wbr>O<wbr>g<wbr>l<wbr>ã<wbr>ó<wbr>[<wbr>K<wbr>x<wbr>ö<wbr>¶<wbr>Q<wbr>a<wbr>¼<wbr>!<wbr>§<wbr>E<wbr>»<wbr>¥<wbr>ï<wbr>ß<wbr>&<wbr>å<wbr>A<wbr>K<wbr>9<wbr>0<wbr>a<wbr>k<wbr>e<wbr>Þ<wbr>W<wbr>?<wbr>o<wbr>¶<wbr>N<wbr>ý<wbr>T<wbr>M<wbr>5<wbr>O<wbr>!<wbr>U<wbr>á<wbr>ë<wbr>¢<wbr>Q<wbr>ý<wbr>X<wbr>ï<wbr>j<wbr>]<wbr>O<wbr>p<wbr>§<wbr>Y<wbr>ó<wbr>-<wbr>><wbr>%<wbr>*<wbr>æ<wbr>¦<wbr>ø<wbr><<wbr>6<wbr>¥<wbr>j<wbr>g<wbr>@<wbr>t<wbr>å<wbr>h<wbr>W<wbr>ù<wbr>½<wbr>p<wbr>¿<wbr>þ<wbr>@<wbr>4<wbr>@<wbr>\<wbr>«<wbr>C<wbr>(<wbr>ê<wbr>ç<wbr>÷<wbr>¥<wbr>×<wbr>I<wbr>K<wbr>ç<wbr>µ<wbr>å<wbr>ï<wbr>t<wbr>g<wbr>¤<wbr>P<wbr>N<wbr>s<wbr>¡<wbr>X<wbr>/<wbr>5<wbr>ã<wbr>d<wbr>ê<wbr>½<wbr>|<wbr>ê<wbr>W<wbr>¿<wbr>;<wbr>®<wbr>¶<wbr>z<wbr>·<wbr>q<wbr>z<wbr>7<wbr>A<wbr>v<wbr>]<wbr>ô<wbr>U<wbr>o<wbr>7<wbr>û<wbr>û<wbr>w<wbr>{<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>! ## January 01, 2015 ### Roman Cheplyaka # Lexical analysis with parser combinators When writing a programming language parser in Haskell, the usual dilemma is whether to use lexer/parser generators (Alex+Happy), or make a single parser (using a parser combinator library like Parsec) without an explicit tokenizer. In this article, I’ll explain why you may want to use a separate lexer built with applicative parser combinators, and how you might go about writing one. ## Alex Alex, the Haskell lexer generator, is a nice tool, but it has a weakness. With Alex, you cannot observe the internal structure of a token. The most common instance of this problem is parsing string interpolation, think "Hello, {name}!". Such a string is a single token, yet it has some internal structure to it. To analyze this with Alex, you’ll probably need to have two different lexers, and run one inside the other. Another instance is date literals (useful for many DSLs), such as 2015-01-02 (or d"2015-01-02" to avoid confusion with infix subtraction.) You can recognize such a literal with an Alex regular expression d+\-d+\-d+. However, you won’t be able to get hold of the individual d+ pieces — Alex regexps lack capture groups. So you’ll need to use a separate date parsing function to parse the same string second time. ## Parsec On the other hand, we have parser combinators that solve this problem nicely. You define parsers for numbers; then you glue them with dashes, and you get a parser for dates. However, using Parsec (or similar) without a separate lexer is not the best choice: 1. Dealing with whitespace and comments is awkward in the parser; you need to wrap everything in a token combinator. (If you decide to do that, at least use a free applicative functor to ensure that you don’t forget to consume that whitespace). 2. Separating parsing into lexical and syntax analysis is just a good engineering practice. It reduces the overall complexity through «divide and conquer». The two phases usually have well-defined responsibilities and a simple interface — why not take advantage of that? 3. If a language needs the maximal munch rule, it’s hard to impossible to encode that with Parsec or similar libraries. 4. Tokens enable the syntax analyzer to report better errors. This is because you can tell which token you didn’t expect. In a Char-based Parsec parser, you can only tell which character (or an arbitrary number of characters) you didn’t expect, because you don’t know what constitutes a token. 5. Potential performance considerations. If a parser has to try several syntax tree alternatives, it reparses low-level lexical tokens anew every time. From this perspective, the lexical analyzer provides a natural «caching layer». ## regex-applicative My regex-applicative library provides applicative parser combinators with regexp semantics. We can write a simple lexer on top of it that would give us the benefits of both Alex and Parsec. {-# LANGUAGE OverloadedStrings, OverloadedLists #-} import qualified Data.Text as T import qualified Data.HashSet as HS import Text.Regex.Applicative import Text.Regex.Applicative.Common (decimal) import Data.Char import Data.Time For example, here’s a parser for dates. (For simplicity, this one doesn’t check dates for validity.) pDate :: RE Char Day pDate = fromGregorian <> decimal <* "-" <*> decimal <* "-" <*> decimal And here’s a parser for templates — strings that may include interpolated variables and escaping. pTemplate :: RE Char [Either T.Text T.Text] -- Left = text, Right = variable pTemplate = "\"" *> some piece <* "\"" where -- piece of text or interpolated variable piece = (Left . T.pack <> some ch) <|> (Right <> var) -- interpolated variable var = "{" *> pVar <* "}" -- normal character, plain or escaped ch = psym (\c -> not  c HS.member escapable) <|> ("\\" *> psym (\c -> c HS.member escapable)) -- set of escapable characters escapable = ['"', '\\', ''] pVar = T.pack <> some (psym isAlpha) Individual parsers are merged into a single pToken parser: data Token = Template [Either T.Text T.Text] | Date Day -- | ... pToken :: RE Char Token pToken = (Template <> pTemplate) <|> (Date <> pDate) -- <|> ... Finally, a simple tokenizing function might look something like this: tokens :: String -> [Token] tokens "" = [] tokens s = let re = (Just <> pToken) <|> (Nothing < some (psym isSpace)) in case findLongestPrefix re s of Just (Just tok, rest) -> tok : tokens rest Just (Nothing, rest) -> tokens rest -- whitespace Nothing -> error "lexical error" The resulting token list can be further parsed with Parsec or Happy. This simple function can be extended to do a lot more: 1. Track position in the input string, use it for error reporting, include it into tokens 2. Consume input incrementally 3. Get feedback from the syntax analyzer, and, depending on the syntactic context, perform different lexical analyses. This is useful for more complex features of programming languages, such as the off-side rule in Haskell or arbitrarily-nested command substitution and «here documents» in POSIX Shell. (This is another thing Alex cannot do, by the way.) ## December 31, 2014 ### Gabriel Gonzalez # Shortcut fusion for pipes Rewrite rules are a powerful tool that you can use to optimize Haskell code without breaking backwards compatibility. This post will illustrate how I use rewrite rules to implement a form of shortcut fusion for my pipes stream programming library. I compare pipes performance before and after adding shortcut fusion and I also compare the peformance of pipes-4.0.2 vs. conduit-1.0.10 and io-streams-1.1.4.0. This post also includes a small introduction to Haskell's rewrite rule system for those who are interested in optimizing their own libraries. Edit: This post originally used the term "stream fusion", but Duncan Coutts informed me that the more appropriate term for this is probably "short-cut fusion". #### Rule syntax The following rewrite rule from pipes demonstrates the syntax for defining optimization rules. {-# RULES "for p yield" forall p . for p yield = p #-}-- ^^^^^^^^^^^^^ ^ ^^^^^^^^^^^^^^^-- Label | Substitution rule-- |-- p can be anything -+ All rewrite rules are substitution rules, meaning that they instruct the compiler to replace anything that matches the left-hand side of the rule with the right-hand side. The above rewrite rule says to always replace for p yield with p, no matter what p is, as long as everything type-checks before and after substitution. Rewrite rules are typically used to substitute code with equivalent code of greater efficiency. In the above example, for p yield is a for loop that re-yields every element of p. The re-yielded stream is behaviorally indistinguishable from the original stream, p, because all that for p yield does is replace every yield in p with yet another yield. However, while both sides of the equation behave identically their efficiency is not the same; the left-hand side is less efficient because for loops are not free. #### Safety Rewrite rules are not checked for correctness. The only thing the compiler does is verify that the left-hand side and right-hand side of the equation both type-check. The programmer who creates the rewrite rule is responsible for proving that the substitution preserves the original behavior of the program. In fact, rewrite rules can be used to rewrite terms from other Haskell libraries without limitation. For this reason, modules with rewrite rules are automatically marked unsafe by Safe Haskell and they must be explicitly marked TrustWorthy to be used by code marked Safe. You can verify a rewrite rule is safe using equational reasoning. If you can reach the right-hand side of the equation from the left-hand side using valid code substitutions, then you can prove (with some caveats) that the two sides of the equation are functionally identical. pipes includes a complete set of proofs for its rewrite rules for this purpose. For example, the above rewrite rule is proven here, where (//>) is an infix synonym for for and respond is a synonym for yield: for = (//>)yield = respond This means that equational reasoning is useful for more than just proving program correctness. You can also use derived equations to optimize your program , assuming that you already know which side of the equation is more efficient. #### Laws Rewrite rules have a significant limitation: we cannot possibly anticipate every possible expression that downstream users might build using our libraries. So how can we optimize as much code as possible without an excessive proliferation of rewrite rules? I anecdotally observe that equations inspired from category theory prove to be highly versatile optimizations that fit within a small number of rules. These equations include: • Category laws • Functor laws • Natural transformation laws (i.e. free theorems) The first half of the shortcut fusion optimization consists of three monad laws, which are a special case of category laws. For those new to Haskell, the three monad laws are: -- Left Identityreturn x >>= f = f x-- Right Identitym >>= return = m-- Associativity(m >>= f) >>= g = m >>= (\x -> f x >>= g) If you take these three laws and replace (>>=)/return with for/yield (and rename m to p, for 'p'ipe), you get the following "for loop laws": -- Looping over a yield simplifies to function applicationfor (yield x) f = f x-- Re-yielding every element returns the original streamfor p yield = p-- You can transform two passes over a stream into a single passfor (for p f) g = for p (\x -> for (f x) g) This analogy to the monad laws is precise because for and yield are actually (>>=) and return for the ListT monad when you newtype them appropriately, and they really form a Monad in the Haskell sense of the word. What's amazing is that these monad laws also double as shortcut fusion optimizations when we convert them to rewrite rules. We already encountered the second law as our first rewrite rule, but the other two laws are useful rewrite rules, too: {-# RULES "for (yield x) f" forall x f . for (yield x) f = f x ; "for (for p f) g" forall p f g . for (for p f) g = for p (\x -> for (f x) g) #-} Note that the RULES pragma lets you group multiple rules together, as long as you separate them by semicolons. Also, there is no requirement that the rule label must match the left-hand side of the equation, but I use this convention since I'm bad at naming rewrite rules. This labeling convention also helps when diagnosing which rules fired (see below) without having to consult the original rule definitions. #### Free theorems These three rewrite rules alone do not suffice to optimize most pipes code. The reason why is that most idiomatic pipes code is not written in terms of for loops. For example, consider the map function from Pipes.Prelude: map :: Monad m => (a -> b) -> Pipe a b m rmap f = for cat (\x -> yield (f x)) The idiomatic way to transform a pipe's output is to compose the map pipe downstream: p >-> map f We can't optimize this using our shortcut fusion rewrite rules unless we rewrite the above code to the equivalent for loop: for p (\y -> yield (f y)) In other words, we require the following theorem: p >-> map f = for p (\y -> yield (f y)) This is actually a special case of the following "free theorem": -- Exercise: Derive the previous equation from this onep1 >-> for p2 (\y -> yield (f y)) = for (p1 >-> p2) (\y -> yield (f y)) A free theorem is an equation that you can prove solely from studying the types of all terms involved. I will omit the proof of this free theorem for now, but I will discuss how to derive free theorems in detail in a follow-up post. For now, just assume that the above equations are correct, as codified by the following rewrite rule: {-# RULES "p >-> map f" . forall p f . p >-> map f = for p (\y -> yield (f y)) #-} With this rewrite rule the compiler can begin to implement simple map fusion. To see why, we'll compose two map pipes and then pretend that we are the compiler, applying rewrite rules at every opportunity. Every time we apply a rewrite rule we will refer to the rule by its corresponding string label: map f >-> map g-- "p >-> map f" rule fired= for (map f) (\y -> yield (g y))-- Definition of map= for (for cat (\x -> yield (f x))) (\y -> yield (g y))-- "for (for p f) g" rule fired= for cat (\x -> for (yield (f x)) (\y -> yield (g y)))-- "for (yield x) f" rule fired= for cat (\x -> yield (g (f x))) This is identical to a single map pass, which we can prove by equational reasoning: for cat (\x -> yield (g (f x)))-- Definition of (.), in reverse= for cat (\x -> yield ((g . f) x))-- Definition of map, in reverse= map (g . f) So those rewrite rules sufficed to fuse the two map passes into a single pass. You don't have to take my word for it, though. For example, let's say that we want to prove that these rewrite rules fire for the following sample program, which increments, doubles, and then discards every number from 1 to 100000000: -- map-fusion.hsimport Pipesimport qualified Pipes.Prelude as Pmain = runEffect  for (each [1..10^8] >-> P.map (+1) >-> P.map (*2)) discard The -ddump-rule-firings flag will output every rewrite rule that fires during compilation, identifying each rule with the string label accompanying the rule:  ghc -O2 -ddump-rule-firings map-fusion.hs[1 of 1] Compiling Main ( test.hs, test.o )...Rule fired: p >-> map fRule fired: for (for p f) gRule fired: for (yield x) f... I've highlighted the rule firings that correspond to map fusion, although there are many other rewrite rules that fire (including more shortcut fusion rule firings). #### Shortcut fusion We don't have to limit ourselves to just fusing maps. Many pipes in Pipes.Prelude has an associated free theorem that rewrites pipe composition into an equivalent for loop. After these rewrites, the "for loop laws" go to town on the pipeline and fuse it into a single pass. For example, the filter pipe has a rewrite rule similar to map: {-# RULES "p >-> filter pred" forall p pred . p >-> filter pred = for p (\y -> when (pred y) (yield y)) #-} So if we combine map and filter in a pipeline, they will also fuse into a single pass: p >-> map f >-> filter pred-- "p >-> map f" rule firesfor p (\x -> yield (f x)) >-> filter pred-- "p >-> filter pred" rule firesfor (for p (\x -> yield (f x))) (\y -> when (pred y) (yield y))-- "for (for p f) g" rule firesfor p (\x -> for (yield (f x)) (\y -> when (pred y) (yield y)))-- for (yield x) f" rule firesfor p (\x -> let y = f x in when (pred y) (yield y)) This is the kind of single pass loop we might have written by hand if we were pipes experts, but thanks to rewrite rules we can write high-level, composable code and let the library automatically rewrite it into efficient and tight loops. Note that not all pipes are fusible in this way. For example the take pipe cannot be fused in this way because it cannot be rewritten in terms of a for loop. #### Benchmarks These rewrite rules make fusible pipe stages essentially free. To illustrate this I've set up a criterion benchmark testing running time as a function of the number of map stages in a pipeline: import Criterion.Mainimport Data.Functor.Identity (runIdentity)import Pipesimport qualified Pipes.Prelude as Pn :: Intn = 10^6main = defaultMain [ bench' "1 stage "  \n -> each [1..n] >-> P.map (+1) , bench' "2 stages"  \n -> each [1..n] >-> P.map (+1) >-> P.map (+1) , bench' "3 stages"  \n -> each [1..n] >-> P.map (+1) >-> P.map (+1) >-> P.map (+1) , bench' "4 stages"  \n -> each [1..n] >-> P.map (+1) >-> P.map (+1) >-> P.map (+1) >-> P.map (+1) ] where bench' label f = bench label  whnf (\n -> runIdentity  runEffect  for (f n) discard) (10^5 :: Int) Before shortcut fusion (i.e. pipes-4.0.0), the running time scales linearly with the number of map stages: warming upestimating clock resolution...mean is 24.53411 ns (20480001 iterations)found 80923 outliers among 20479999 samples (0.4%) 32461 (0.2%) high severeestimating cost of a clock call...mean is 23.89897 ns (1 iterations)benchmarking 1 stage mean: 4.480548 ms, lb 4.477734 ms, ub 4.485978 ms, ci 0.950std dev: 19.42991 us, lb 12.11399 us, ub 35.90046 us, ci 0.950benchmarking 2 stagesmean: 6.304547 ms, lb 6.301067 ms, ub 6.310991 ms, ci 0.950std dev: 23.60979 us, lb 14.01610 us, ub 37.63093 us, ci 0.950benchmarking 3 stagesmean: 10.60818 ms, lb 10.59948 ms, ub 10.62583 ms, ci 0.950std dev: 61.05200 us, lb 34.79662 us, ub 102.5613 us, ci 0.950benchmarking 4 stagesmean: 13.74065 ms, lb 13.73252 ms, ub 13.76065 ms, ci 0.950std dev: 61.13291 us, lb 29.60977 us, ub 123.3071 us, ci 0.950 Stream fusion (added in pipes-4.0.1) makes additional map stages essentially free: warming upestimating clock resolution...mean is 24.99854 ns (20480001 iterations)found 1864216 outliers among 20479999 samples (9.1%) 515889 (2.5%) high mild 1348320 (6.6%) high severeestimating cost of a clock call...mean is 23.54777 ns (1 iterations)benchmarking 1 stage mean: 2.427082 ms, lb 2.425264 ms, ub 2.430500 ms, ci 0.950std dev: 12.43505 us, lb 7.564554 us, ub 20.11641 us, ci 0.950benchmarking 2 stagesmean: 2.374217 ms, lb 2.373302 ms, ub 2.375435 ms, ci 0.950std dev: 5.394149 us, lb 4.270983 us, ub 8.407879 us, ci 0.950benchmarking 3 stagesmean: 2.438948 ms, lb 2.436673 ms, ub 2.443006 ms, ci 0.950std dev: 15.11984 us, lb 9.602960 us, ub 23.05668 us, ci 0.950benchmarking 4 stagesmean: 2.372556 ms, lb 2.371644 ms, ub 2.373949 ms, ci 0.950std dev: 5.684231 us, lb 3.955916 us, ub 9.040744 us, ci 0.950 In fact, once you have just two stages in your pipeline, pipes greatly outperforms conduit and breaks roughly even with io-streams. To show this I've written up a benchmark comparing pipes performance against these libraries for both pure loops and loops that are slightly IO-bound (by writing to /dev/null): import Criterion.Mainimport Data.Functor.Identity (runIdentity)import qualified System.IO as IOimport Data.Conduitimport qualified Data.Conduit.List as Cimport Pipesimport qualified Pipes.Prelude as Pimport qualified System.IO.Streams as Scriterion :: Int -> IO ()criterion n = IO.withFile "/dev/null" IO.WriteMode  \h -> defaultMain [ bgroup "pure" [ bench "pipes"  whnf (runIdentity . pipes) n , bench "conduit"  whnf (runIdentity . conduit) n , bench "io-streams"  nfIO (iostreams n) ] , bgroup "io" [ bench "pipes"  nfIO (pipesIO h n) , bench "conduit"  nfIO (conduitIO h n) , bench "iostreams"  nfIO (iostreamsIO h n) ] ]pipes :: Monad m => Int -> m ()pipes n = runEffect  for (each [1..n] >-> P.map (+1) >-> P.filter even) discardconduit :: Monad m => Int -> m ()conduit n = C.enumFromTo 1 n = C.map (+1) = C.filter even C.sinkNulliostreams :: Int -> IO ()iostreams n = do    is0 <- S.fromList [1..n]    is1 <- S.map (+1) is0    is2 <- S.filter even is1    S.skipToEof is2pipesIO :: IO.Handle -> Int -> IO ()pipesIO h n = runEffect $each [1..n] >-> P.map (+1) >-> P.filter even >-> P.map show >-> P.toHandle hconduitIO :: IO.Handle -> Int -> IO ()conduitIO h n = C.enumFromTo 1 n$= C.map (+1)    $= C.filter even$= C.map show     C.mapM_ (IO.hPutStrLn h)iostreamsIO :: IO.Handle -> Int -> IO ()iostreamsIO h n = do    is0 <- S.fromList [1..n]    is1 <- S.map (+1) is0    is2 <- S.filter even is1    is3 <- S.map show is2    os  <- S.makeOutputStream $\ma -> case ma of Just str -> IO.hPutStrLn h str _ -> return () S.connect is3 osmain = criterion (10^6) The benchmarks place pipes neck-and-neck with io-streams on pure loops and 10% slower on slightly IO-bound code. Both libraries perform faster than conduit: warming upestimating clock resolution...mean is 24.50726 ns (20480001 iterations)found 117040 outliers among 20479999 samples (0.6%) 45158 (0.2%) high severeestimating cost of a clock call...mean is 23.89208 ns (1 iterations)benchmarking pure/pipesmean: 24.04860 ms, lb 24.02136 ms, ub 24.10872 ms, ci 0.950std dev: 197.3707 us, lb 91.05894 us, ub 335.2267 us, ci 0.950benchmarking pure/conduitmean: 172.8454 ms, lb 172.6317 ms, ub 173.1824 ms, ci 0.950std dev: 1.361239 ms, lb 952.1500 us, ub 1.976641 ms, ci 0.950benchmarking pure/io-streamsmean: 24.16426 ms, lb 24.12789 ms, ub 24.22919 ms, ci 0.950std dev: 242.5173 us, lb 153.9087 us, ub 362.4092 us, ci 0.950benchmarking io/pipesmean: 267.7021 ms, lb 267.1789 ms, ub 268.4542 ms, ci 0.950std dev: 3.189998 ms, lb 2.370387 ms, ub 4.392541 ms, ci 0.950benchmarking io/conduitmean: 310.3034 ms, lb 309.8225 ms, ub 310.9444 ms, ci 0.950std dev: 2.827841 ms, lb 2.194127 ms, ub 3.655390 ms, ci 0.950benchmarking io/iostreamsmean: 239.6211 ms, lb 239.2072 ms, ub 240.2354 ms, ci 0.950std dev: 2.564995 ms, lb 1.879984 ms, ub 3.442018 ms, ci 0.950 I hypothesize that pipes performs slightly slower on IO compared to io-streams because of the cost of calling lift, whereas iostreams operates directly within the IO monad at all times. These benchmarks should be taken with a grain of salt. All three libraries are most frequently used in strongly IO-bound scenarios, where the overhead of each library is pretty much negligible. However, this still illustrates how big of an impact shortcut fusion can have on pure code paths. #### Conclusions pipes is a stream programming library with a strong emphasis on theory and the library's contract with the user is a set of laws inspired by category theory. My original motivation behind proving these laws was to fulfill the contract, but I only later realized that the for loop laws doubled as fortuitous shortcut fusion optimizations. This is a recurring motif in Haskell: thinking mathematically pays large dividends. For this reason I like to think of Haskell as applied category theory: I find that many topics I learn from category theory directly improve my Haskell code. This post shows one example of this phenomenon, where shortcut fusion naturally falls out of the monad laws for ListT. ### Well-Typed.Com # Simple SMT solver for use in an optimizing compiler This is a second blog post in a series about engineering optimizing compilers; the previous was Quasi-quoting DSLs for free. In this blog we will see how we might write a very simple SMT solver that will allow us to write an optimization pass that can turn something like if a == 0 then if !(a == 0) && b == 1 then write 1 else write 2 else write 3 into if a == 0 then write 2 else write 3 without much effort at all. ## Preliminaries For the sake of this blog post we will consider a very simple imperative object language, defined as data Expr = -- Arithmetic expressions ExprInt Integer -- ^ Integer constants | ExprAdd Expr Expr -- ^ Addition -- Boolean expressions | ExprBool Bool -- ^ Boolean constants | ExprEq Expr Expr -- ^ Equality | ExprNot Expr -- ^ Negation | ExprAnd Expr Expr -- ^ Logical conjunction | ExprOr Expr Expr -- ^ Logical disjunction -- Variables | ExprVar VarName -- ^ Read from a variable | ExprAssign VarName Expr -- ^ Write to a variable -- Control flow | ExprSeq Expr Expr -- ^ Sequential composition | ExprIf Expr Expr Expr -- ^ Conditional -- Input/output | ExprRead -- ^ Read an integer from the console | ExprWrite Expr -- ^ Write an integer to the console We will assume the existence of a quasi-quoter for this language so that we can write Haskell fragments such as [expr| if a == 0 then read else write b |] instead of ExprIf (ExprEq (ExprVar a) (ExprInt 0)) ExprRead (ExprWrite (ExprVar b)) How you can write such a quasi-quoter was the topic of the previous blog post. You should however be able to read this blog post without having read the previous post; hopefully the mapping from the concrete syntax to the constructors of Expr is pretty obvious. ## Simplifying assumption Our goal will be to write a function provable :: Expr -> Bool Let’s consider some examples: • The expression a || True should be provable: no matter what value variable a has, a || True is always True. • Likewise, the expression a || !a should be provable, for the same reason. • However, expression a should not be provable (a might be False) • Likewise, expression !a should also not be provable (a might be True) Note that this means that it’s perfectly possible for both an expression and the negation of that expression to be unprovable. What about an expression !(n == 0 && n == 1)? Morally speaking, this expression should be provable. However, we will be making the following very important simplifying assumption: provable is allowed to give up: when provable returns False, this should be interpreted as “failed to prove”, rather than “there exist a counterexample”. From a compiler perspective, if something is not statically provable, that simply means that an optimization may not be applied even though it could: that is, we miss an opportunity to make the program go faster, but we will not break the code. ## An evaluator We don’t want (or need) to embed a full blown theorem prover into our compiler: ideally we write something simple that will still catch a lot of the common cases. Moreover, we would prefer to reuse as much of our existing compiler infrastructure as we can. Our optimizer is likely to contain an interpreter for the object language, so that we can attempt to statically evaluate expressions. We are going to adapt this interpreter so that we can also use it in our solver. In fact, as we shall see, the solver will be a one-liner. But we’re getting ahead of ourselves. Let’s consider how to write the interpreter first. The interpreter will be running in an Eval monad; for our first attempt, let’s define this monad as a a simple wrapper around the list monad: newtype Eval a = Eval { unEval :: [a] } deriving ( Functor , Applicative , Alternative , Monad , MonadPlus ) runEval :: Eval a -> [a] runEval act = unEval act We will use the monad for failure: throwError :: Eval a throwError = Eval [] We can provide error recovery through onError :: Eval a -> Eval a -> Eval a onError (Eval act) (Eval handler) = Eval$
case act of
[] -> handler
rs -> rs

We will see why we need the list monad when we discuss the evaluator for boolean expressions; but let’s consider the evaluator for integer expressions first:

evalInt :: Expr -> Eval Integer
evalInt = go
where
go (ExprInt i)    = return i
go (ExprAdd a b)  = (+) <$> evalInt a <*> evalInt b go (ExprIf c a b) = do cond <- evalBool c evalInt (if cond then a else b) go _ = throwError  Hopefully this should be pretty self explanatory; our toy language only has a few integer-valued expressions, so there isn’t much to do. The interpreter for boolean expressions is more interesting: evalBool :: Expr -> Eval Bool evalBool = \e -> go e onError guess e where go (ExprBool a) = return a go (ExprEq a b) = (==) <$> evalInt a <*> evalInt b
go (ExprNot  a)   = not  <$> evalBool a go (ExprAnd a b) = (&&) <$> evalBool a <*> evalBool b
go (ExprOr   a b) = (||) <$> evalBool a <*> evalBool b go (ExprIf c a b) = do cond <- evalBool c evalBool (if cond then a else b) go _ = throwError guess _e = return True <|> return False The definition of go contains no surprises, and follows the definition of go in evalInt very closely. However, the top-level definition evalBool = \e -> eval e onError guess e is more interesting. If for some reason we fail to evaluate a boolean expression (for example, because it contains a variable) then guess returns both True and False. Let’s consider some examples: runEval$ evalBool [expr| True |]

evalutes to [True];

runEval $evalBool [expr| a |] evaluates to [True, False] because we don’t know what value a has, but runEval$ evalBool [expr| a || True |]

evaluates to [True, True]: we still have two guesses for a, but no matter what we guess a || True always evaluates to True.

## Satisfiability

We can now write our SMT solver; as promised, it will be a single line:

satisfiable :: Expr -> Bool
satisfiable = or . runEval . evalBool

Function satisfiable (the “S” in SMT) checks if the expression is true for some values of the variables in the expression. How do we check this? Well, we just run our evaluator which, when it encounters a variable, will return both values for the variable. Hence, if any of the values returned by the evaluator is True, then the expression is true at least for one value for each variable.

Once we have an implementation of satisfiability, we can implement provable very easily: an expression is provable if its negation is not satisfiable:

provable :: Expr -> Bool
provable = not . satisfiable . ExprNot

If we consider the three examples from the previous section, we will find that both True and a || True are provable, but a by itself is not, as expected.

## Inconsistent guesses

The careful reader might at this point find his brow furrowed, because something is amiss:

runEval $evalBool [expr| a || !a |] evaluates to [True, True, False, True] This happens because the evaluator will make two separate independent guesses about the value of a. As a consequence, a || !a will be considered not provable. Is this a bug? No, it’s not. Recall that we made the simplifying assumption that provable is allowed to give up: it’s allowed to say that an expression is not provable even when it morally is. The corresponding (dual) simplifying assumption for satisfability, and hence the interpreter, is: The interpreter, and hence satisfiability, is allowed to make inconsistent guesses. Making an inconsistent guess is equivalent to assuming False: anything follows from False and hence any expression will be considered satisfiable once we make an inconsistent guess. As a consequence, this means that once we make inconsistent guesses, we will consider the expression as not provable. ## More precision We can however do better: whenever we make a guess that a particular expression evaluates to True or False, then if we see that same expression again we can safely make the same guess, rather than making an independent guess. To implement this, we extend our Eval monad with some state to remember the guesses we made so far: newtype Eval a = Eval { unEval :: StateT [(Expr, Bool)] [] a } deriving ( Functor , Applicative , Alternative , Monad , MonadPlus , MonadState [(Expr, Bool)] ) runEval :: Eval a -> [a] runEval act = evalStateT (unEval act) [] throwError :: Eval a throwError = Eval$ StateT $\_st -> [] onError :: Eval a -> Eval a -> Eval a onError (Eval act) (Eval handler) = Eval$ StateT $\st -> case runStateT act st of [] -> runStateT handler st rs -> rs The implementation of evalInt does not change at all. The implementation of evalBool also stays almost the same; the only change is the definition of guess: guess e = do st <- get case lookup e st of Just b -> return b Nothing -> (do put ((e, True) : st) ; return True) <|> (do put ((e, False) : st) ; return False) This implements the logic we just described: when we guess the value for an expression e, we first look up if we already made a guess for this expression. If so, we return the previous guess; otherwise, we make a guess and record that guess. Once we make this change runEval$ evalBool [expr| a || !a |]

will evaluate to [True,True] and consequently a || !a will be considered provable.

## Example: folding nested conditionals

As an example of how one might use this infrastructure, we will consider a simple pass that simplifies nested conditionals (if-statements). We can use provable to check if one expression implies the other:

(~>) :: Expr -> Expr -> Bool
(~>) a b = provable [expr| ! $a ||$b |]

The simplifier is now very easy to write:

simplifyNestedIf :: Expr -> Expr
simplifyNestedIf = everywhere (mkT go)
where
go [expr| if $c1 then if$c2 then
$e1 else$e2
else
$e3 |] | c1 ~> c2 = [expr| if$c1 then $e1 else$e3 |]
| c1 ~> [expr| ! $c2 |] = [expr| if$c1 then $e2 else$e3 |]
go e = e

The auxiliary function go pattern matches against any if-statement that has another if-statement as its “then” branch. If we can prove that the condition of the outer if-statement implies the condition of the inner if-statement, we can collapse the inner if-statement to just its “then” branch. Similarly, if we can prove that the condition of the outer if-statement implies the negation of the condition of the inner if-statement, we can collapse the inner if-statement to just its “else” branch. In all other cases, we return the expression unchanged. Finally, we use the Scrap Your Boilerplate operators everywhere and mkT to apply this transformation everywhere in an expression (rather than just applying it top-level). For example,

simplifyNestedIf [expr|
if a == 0 then
if !(a == 0) && b == 1 then
write 1
else
write 2
else
write 3
|]

evaluates to

if a == 0 then
write 2
else
write 3

as promised. Incidentally, perhaps you are wondering why such an optimization would be useful; after all, which self respecting programmer writes such code? However, code like the above may result from other compiler optimizations such as inlining: imagine that the inner if-statement came from a inlined function. A lot of compiler optimizations are designed to clean up other compiler optimizations.

## Conclusion

We can implement a very simple but useful SMT solver for use in an optimizing compiler by making a small change to the interpreter, which we need anyway.

Note that the increase in precision gained from recording previous guesses is a gradual increase only; satisfiability may still make some inconsistent guesses. For example

runEval $evalBool [expr| !(n == 0 && n == 1) |] will evaluate to [False,True,True,True] because it is making independent guesses about n == 0 and n == 1; consequently !(n == 0 && n == 1) will not be considered provable. We can increase the precision of our solver by making guess smarter (the “MT” or “modulo theories” part of SMT). For example, we could record some information about the guesses about integer valued variables. At the extreme end of the scale we would be implementing a full decision procedure for first order arithmetic, which is probably optimization gone too far. However, the approach we took above where we start from the basis that we are allowed to make inconsistent guesses gives us a way to implement a simple solver that addresses the most important cases, even if it misses a few—without requiring a lot of complicated infrastructure in the compiler. And as long as we make sure that our evaluator never returns too few values (even if it may return too many) we don’t have to worry that we might generate invalid code: the worst that can happen is that we miss an optimization. ## December 30, 2014 ### Roman Cheplyaka # Denotational design does not work Conal Elliott in his paper Denotational design with type class morphisms, as well as in the recent podcast, advocates denotational design as a principle of software design. Unfortunately, in my experience it never works for realistically complex problems. ## On simplicity First, I’ll formulate Conal’s approach as I understand it. For any given entity of your model, you should come up with a simple mathematical object — the denotation — that faithfully represents that entity. The implementation of the type may vary, presumably to maximize its runtime efficiency, but it should not expose any more information than the chosen denotation has. That is considered an «abstraction leak». Conal specifically talks about that in the podcast (31m50s, for example). Here I need to stress an important but subtle point in Conal’s principle: simplicity. You only follow Conal’s principle if you find a simple denotation, not just any denotation. This point is important because without it any design is denotational design, trivially. Universal algebra tells us that for any set of operations and any (non-contradictory) laws about them there exists a model that satisfies these laws. For any Haskell module, we can interpret its types in the set theory (or a more complex domain if needed), and call that our denotation. But that’s not what Conal is after. His approach is interesting exactly because he argues that it is possible to find simple denotations. This subtle point makes Conal’s approach simultaneously attractive and unrealistic. I’ll demonstrate this with two examples from my own work experience. ## DSLs At Barclays I worked on FPF, an embedded domain-specific language for describing financial instruments. In his paper, Conal shows how a denotation for such a DSL can quickly grow in complexity when requirements change. When variables and errors are introduced, the denotation changes from a to Env -> Result a. Still, this is a very simple DSL that only supports evaluation. In reality, the main reason people make DSLs instead of using general-purpose languages is the ability to analyze DSL programs. One important feature of FPF is that it could pretty-print a program into a nice PDF. That poses an obvious problem — not every two semantically equivalent programs (under the interpretation semantics) result in equally nice PDFs. Inlining is a semantically sound transformation, but when our users get PDFs with all the definitions inlined, they get angry. Sure, we could say that now our denotation becomes the domain product (Env -> Result a, String), where String is the pretty printer output. But in reality we have a dozen different analyses, and most of them are not expressible in terms of each other, or any single simple model. They also do not satisfy many laws. For instance, one day a user (quant or trader) could come and tell us that the barrier classifier should classify two mathematically equivalent expressions as different barriers because those expressions follow certain conventions. And even though the quant is mathematically inclined, denotations and type class morphism would be the last thing he wants to hear about in response to his feature request. So, in practice, the best denotation for the DSL expressions was the AST itself. Which, according to my interpretation of Conal’s principles, is not an example of a denotational design, but a failure to apply one. ## Machines At my current job (Signal Vine), I work on a platform for scripted interaction with students via text messages. For every student enrolled in a messaging campaign, we send a message, receive a reply, process it, and the cycle repeats. This is very similar to FRP; perhaps not the FRP Conal prefers (in the podcast he stresses the importance of continuous functions as opposed to events), but the kind of discrete FRP that Justin Le models with Mealy machines. So it would seem that I should model a student as newtype Student = Student (InboundSMS -> (OutboundSMS, Student)) That would be an exemplary case of denotational design. But that would be far from our customers’ needs. Every student has a set of profile variables that are filled when the student responds to a text, and our customers (counselors who work with that student) want to see those variables. They also want to see which messages were sent, what the student’s replies were, and even what messages will be sent to the student in the future. These requirements defeat the attempt to model a student in a simple, abstract way. Instead, I need to store all the information I have about the student because sooner or later I’ll need to expose that information to the user. ## Conclusions Denotational design is a very neat idea, but I believe that it only works in simple cases and when requirements are static. In real-world commercial programming, it breaks for two main reasons: 1. Users often want maximum insight into what’s going on, and you need to break the abstraction to deliver that information. 2. Requirements change, and an innocent change in requirements may lead to a drastic change and increase in complexity of the denotation. It is certainly useful to think about denotations of your entities in specific, simple contexts (like the evaluation semantics for a DSL); such thought experiments may help you better understand the problem or even find a flaw in your implementation. But when you are implementing the actual type, your best bet is to create an algebraic data type with all the information you have, so that when you need to extend the interface (or «leak the abstraction»), it won’t cause you too much pain. ### Edward Kmett # Fast Circular Substitution Emil Axelsson and Koen Claessen wrote a functional pearl last year about Using Circular Programs for Higher-Order Syntax. About 6 months ago I had an opportunity to play with this approach in earnest, and realized we can speed it up a great deal. This has kept coming up in conversation ever since, so I've decided to write up an article here. In my bound library I exploit the fact that monads are about substitution to make a monad transformer that manages substitution for me. Here I'm going to take a more coupled approach. To have a type system with enough complexity to be worth examining, I'll adapt Dan Doel's UPTS, which is a pure type system with universe polymorphism. I won't finish the implementation here, but from where we get it should be obvious how to finish the job. Unlike Axelsson and Claessen I'm not going to bother to abstract over my name representation. To avoid losing the original name from the source, we'll just track names as strings with an integer counting the number of times it has been 'primed'. The name is purely for expository purposes, the real variable identifier is the number. We'll follow the Axelsson and Claessen convention of having the identifier assigned to each binder be larger than any one bound inside of it. If you don't need he original source names you can cull them from the representation, but they can be useful if you are representing a syntax tree for something you parsed and/or that you plan to pretty print later.  data Name = Name String Int deriving (Show,Read) hint :: Name -> String hint (Name n _) = n nameId :: Name -> Int nameId (Name _ i) = i instance Eq Name where (==) = (==) on nameId instance Ord Name where compare = compare on nameId prime :: String -> Int -> Name prime n i = Name n (i + 1)  So what is the language I want to work with?  type Level = Int data Constant = Level | LevelLiteral {-# UNPACK #-} !Level | Omega deriving (Eq,Ord,Show,Read,Typeable) data Term a = Free a | Bound {-# UNPACK #-} !Name | Constant !Constant | Term a :+ {-# UNPACK #-} !Level | Max [Term a] | Type !(Term a) | Lam {-# UNPACK #-} !Name !(Term a) !(Term a) | Pi {-# UNPACK #-} !Name !(Term a) !(Term a) | Sigma {-# UNPACK #-} !Name !(Term a) !(Term a) | App !(Term a) !(Term a) | Fst !(Term a) | Snd !(Term a) | Pair !(Term a) !(Term a) !(Term a) deriving (Show,Read,Eq,Ord,Functor,Foldable,Traversable,Typeable)  That is perhaps a bit paranoid about remaining strict, but it seemed like a good idea at the time. We can define capture avoiding substitution on terms:  subst :: Eq a => a -> Term a -> Term a -> Term a subst a x y = y >>= \a' -> if a == a' then x else return a'  Now we finally need to implement Axelsson and Claessen's circular programming trick. Here we'll abstract over terms that allow us to find the highest bound value within them:  class Bindable t where bound :: t -> Int  and instantiate it for our Term type  instance Bindable (Term a) where bound Free{} = 0 bound Bound{} = 0 -- intentional! bound Constant{} = 0 bound (a :+ _) = bound a bound (Max xs) = foldr (\a r -> bound a max r) 0 xs bound (Type t) = bound t bound (Lam b t _) = nameId b max bound t bound (Pi b t _) = nameId b max bound t bound (Sigma b t _) = nameId b max bound t bound (App x y) = bound x max bound y bound (Fst t) = bound t bound (Snd t) = bound t bound (Pair t x y) = bound t max bound x max bound y  As in the original pearl we avoid traversing into the body of the binders, hence the _'s in the code above. Now we can abstract over the pattern used to create a binder in the functional pearl, since we have multiple binder types in this syntax tree, and the code would get repetitive.  binder :: Bindable t => (Name -> t) -> (Name -> t -> r) -> String -> (t -> t) -> r binder bd c n e = c b body where body = e (bd b) b = prime n (bound body) lam, pi, sigma :: String -> Term a -> (Term a -> Term a) -> Term a lam s t = binder Bound (Lam t) s pi s t = binder Bound (Pi t) s sigma s t = binder Bound (Sigma t) s  We may not always want to give names to the variables we capture, so let's define: lam_, pi_, sigma_ :: Term a -> (Term a -> Term a) -> Term a lam_ = lam "_" pi_ = pi "_" sigma_ = sigma "_"  Now, here's the interesting part. The problem with Axelsson and Claessen's original trick is that every substitution is being handled separately. This means that if you were to write a monad for doing substitution with it, it'd actually be quite slow. You have to walk the syntax tree over and over and over. We can fuse these together by making a single pass:  instantiate :: Name -> t -> IntMap t -> IntMap t instantiate = IntMap.insert . nameId rebind :: IntMap (Term b) -> Term a -> (a -> Term b) -> Term b rebind env xs0 f = go xs0 where go = \case Free a -> f a Bound b -> env IntMap.! nameId b Constant c -> Constant c m :+ n -> go m :+ n Type t -> Type (go t) Max xs -> Max (fmap go xs) Lam b t e -> lam (hint b) (go t)$ \v ->
rebind (instantiate b v env) e f
Pi b t e     -> pi    (hint b) (go t) $\v -> rebind (instantiate b v env) e f Sigma b t e -> sigma (hint b) (go t)$ \v ->
rebind (instantiate b v env) e f
App x y      -> App (go x) (go y)
Fst x        -> Fst (go x)
Snd x        -> Snd (go x)
Pair t x y   -> Pair (go t) (go x) (go y)


Note that the Lam, Pi and Sigma cases just extend the current environment.

With that now we can upgrade the pearl's encoding to allow for an actual Monad in the same sense as bound.


instance Applicative Term where
pure = Free
(< *>) = ap

return = Free
(>>=) = rebind IntMap.empty


To show that we can work with this syntax tree representation, let's write an evaluator from it to weak head normal form:

First we'll need some helpers:


apply :: Term a -> [Term a] -> Term a
apply = foldl App

rwhnf :: IntMap (Term a) ->
[Term a] -> Term a -> Term a
rwhnf env stk     (App f x)
= rwhnf env (rebind env x Free:stk) f
rwhnf env (x:stk) (Lam b _ e)
= rwhnf (instantiate b x env) stk e
rwhnf env stk (Fst e)
= case rwhnf env [] e of
Pair _ e' _ -> rwhnf env stk e'
e'          -> Fst e'
rwhnf env stk (Snd e)
= case rwhnf env [] e of
Pair _ _ e' -> rwhnf env stk e'
e'          -> Snd e'
rwhnf env stk e
= apply (rebind env e Free) stk


Then we can start off the whnf by calling our helper with an initial starting environment:


whnf :: Term a -> Term a
whnf = rwhnf IntMap.empty []


So what have we given up? Well, bound automatically lets you compare terms for alpha equivalence by quotienting out the placement of "F" terms in the syntax tree. Here we have a problem in that the identifiers we get assigned aren't necessarily canonical.

But we can get the same identifiers out by just using the monad above:


alphaEq :: Eq a => Term a -> Term a -> Bool
alphaEq = (==) on liftM id


It makes me a bit uncomfortable that our monad is only up to alpha equivalence and that liftM swaps out the identifiers used throughout the entire syntax tree, and we've also lost the ironclad protection against exotic terms.

But overall, this is a much faster version of Axelsson and Claessen's trick and it can be used as a drop-in replacement for something like bound in many cases, and unlike bound, it lets you use HOAS-style syntax for constructing lam, pi and sigma terms.

With pattern synonyms you can prevent the user from doing bad things as well. Once 7.10 ships you'd be able to use a bidirectional pattern synonym for Pi, Sigma and Lam to hide the real constructors behind. I'm not yet sure of the "best practices" in this area.

Here's the code all in one place:

Happy Holidays,
-Edward

# Higher order functions and interface files.

Work on DDC progresses, though recently it's been straightforward compiler engineering rather than anything novel, so I haven't been blogging. DDC now drops interface files when compiling modules, multi-module compilation works, and the front-end supports unicode string literals. I've also implemented a lambda lifter and support for higher order functions. This part is well baked enough to implement simple stream functions, as used in the Haskell Data.Vector library, though the compiler doesn't do the associated fusion yet. For example:
data Stream (s a : Data) where        MkStream : (s -> Step s a) -> s -> Stream s adata Step (s a : Data) where        Yield   : a -> s -> Step s a        Skip    : s -> Step s a         Done    : Step s a-- | Apply a function to every element of a stream.smap (f : a -> b) (ss : Stream s a) : Stream s b = case ss of      MkStream stepA sA0        -> let stepB q = case stepA q of                                Yield x sA1     -> Yield (f x) sA1                                Skip sA2        -> Skip sA2                                Done            -> Done           in  MkStream stepB sA0-- | Take the given number of elements from the front of a stream.stake (n : Nat) (ss : Stream s a) : Stream (Tup2 s Nat) a = case ss of     MkStream fA sA0      -> let stepB q = case q of                       T2 sA ix                         -> if ix >= n                                then Done                                else case fA sA of                                        Yield x sA2 -> Yield x (T2 sA2 (ix + 1))                                        Skip sA3    -> Skip  (T2 sA3 ix)                                        Done        -> Done        in   MkStream stepB (T2 sA0 0)
As we can see, work on larger demos is starting to be hampered by the lack of pattern matching sugar in the source language, so I'm going to add that next. After adding pattern matching, I'm going to spend a couple of weeks bug fixing, and should get the DDC 0.5 release out in early February 2015.

# All Hammers are Terrible

“All computers are terrible.”

On social media, this turn of phrase is almost a meme. The sentiment is reasonable at some level—computers are complicated, buggy, and insecure. But the general public is largely indifferent, or least, they accept the situation for what it is: Computer freezes? turn it on then off again. Smart phone insecure? Well, the convenience of digital wallets outweighs the risk. My parents, who are not computer scientists, never say that all computers are terrible.

Rather, this is a complaint from within, by computer scientists themselves.  What drives this thinking? (While not specifically addressing the cause of the sentiment, this blog post provides a nice counterpoint.) No immediately obvious hypotheses seem particularly explanatory:

• Ludditism: perhaps despite (or because of) our work in technology, computer scientists abhor it. But I find this explanation wanting, particularly because the focus of contempt is specifically about computers, and not about technological advancement generally.
• Pessimism: perhaps there is a correlation between computer scientists and pessimists. Faced with the challenge of building good systems, they focus on the flaws rather than the progress. But it does not adequately explain the focus of the pessimism on computers rather than life in general—I don’t see posts stating that “all relationships are terrible”.
• Realism: perhaps computers are terrible. Computer science history is measured in decades, and the approach to building systems is immature compared to other engineering disciplines. I tore down my old garage this summer. The city required a blueprint, erosion plan, a drainage plan, and a few hundred dollars for me to do so. They inspected the work afterwards. Contrast this with software: how much oversight does writing security-critical code get? Just about anyone can throw code up that purports to solve a problem. Still, given that programs are among the most complex artifacts built by humans, we’re doing pretty well.
• Narcissism: finally, perhaps a tinge of narcissism drives these comments, something along the lines of, “where others have failed, I will succeed.” It is given as a tongue-in-cheek put-down regarding the work of fellow professionals. The motivation is there: if we thought that the tools of computer science were “good enough”, then the world wouldn’t need new programming languages, tools, or libraries. On the other hand, these kind of complaints are not just about directly competing work, and there will always be new inventions to make.

I tend not to hear the same kind of complaints from members of other professions. Yes, we all complain about our jobs, but not necessarily about our tools. Carpenters don’t say, “All hammers are terrible,” even if they complain about their company, their coworkers, the architect, the inspectors. Doctors don’t complain about X-ray machines, MRIs, or scalpels (but they might complain about their patients, insurance, and administration). Farmers don’t complain about the design of combines and tractors. Now, they might complain about a specific instance—“My tractor is terrible; it breaks down all the time.” But not about tractors, in general.

Other professions largely take their tools as a given. Indeed, most practitioners know so little about how the tools they use are built, it is difficult to critique them. A carpenter does not know how a skill saw is built and a doctor does not know how an X-rays are taken. Just whether they work or not and if they are useful or not.

Computer science is unique insofar as the tool makers are so closely connected with the tool users. There’s another glib quip in computer science: “patches welcome” meaning that if you (the user) thinks a tool should be improved, then it is your responsibility to make those improvements yourself. In other fields, the saying does not exist; it would be laughable! A taxi driver can’t be expected to invent a more fuel-efficient automobile; an architect can’t be expected to build better modeling software.

Computer science is unique in this respect, and it has philosophical, psychological, and sociological implications. Imagine, if you will, you are a carpenter, and you have some understanding of how the tools that you have are designed and built. Your skill saw is right-handed, but you are left-handed; with a few modifications, you realize its design can be abstracted, to accommodate either handedness. When your drill dies, instead of simply replacing it, you open it up, find the faulty circuit, and realize there’s a weakness in the wiring that allows it to overheat. When each of your tools fails you in some way, you uncover some flaw and realize they could be improved. Finally, you come to expect failure from every tool you might ever use: “All hammers are terrible.”

We all know that technology—whether we understand how it is built or not—has flaws. We know tradeoffs are made between cost, generality, performance, and reliability. There are more and less reliable, performant, and costly cars, saws, X-ray machines.

The general public certainly has opinions and a sense of the tradeoffs between the tools they use, e.g., Windows vs. Mac OS, Android vs. iOS. But the sentiment is that these are tools, with flaws and lifespans, just like a cars, appliances, and power tools.

I don’t have a strong opinion about whether computer scientists should complain or not about their tools, except, obviously, that general complaints don’t effect much change. My point is more philosophical: we find ourselves in an interesting profession, where we can open our very tools up, peer inside, and change them.

It’s a fascinating prospect.

# Nursery sizes.

Intel i5-3210M cpu, 3072 KB L3 cache. Not sure why the CPU stalls with the tiny nurseries.

# Simple Combinators for Manipulating CGPoint/CGSize/CGRect with Swift

One of the most painful things about Objective-C was having to modify CGPoint, CGSize or CGRect values. The clunky struct interface made even simple modifications verbose and ugly, since struct expressions were read-only:

CGRect imageBounds = self.view.bounds;
imageBounds.size.height -= self.footer.bounds.size.height;

self.imageView.bounds = imageBounds;

Even though we have auto-layout, I often find myself doing this kind of arithmetic with points, size or rects. In Objective-C, it required either generating dummy variables so you can modify members (as above), or really messy struct initialization syntax:

self.imageView.bounds = (CGRect) {
.origin = self.view.bounds.origin,
.size = CGSizeMake(self.view.bounds.size.width, self.view.bounds.size.height -
self.footer.bounds.size.height) };

Fortunately, none of this boilerplate is necessary with Swift. Since Swift lets you extend even C structures with new methods, I wrote a handful of combinators that eliminate this kind of code. The above snippet can now be replaced with:

self.imageView.bounds = self.view.bounds.mapHeight { $0 - self.footer.size.height } I can easily enlarge a scroll view's content size to hold its pages: self.scrollView.contentSize = self.scrollView.bounds.size.mapWidth {$0 * CGFloat(pages.count) }

I can do calculations that previously would've required dozens of lines of code in just one or two:

let topHalfFrame = self.view.bounds.mapHeight { $0 / 2 } let bottomHalfFrame = topHalfFrame.mapY {$0 + topHalfFrame.size.height }

These two lines will give me two frames that each take up half of the height of their parent view.

In cases where I simply need to set a value, I use the primitive "with..." functions:

self.view.bounds.withX(0).withY(0).withSize(0).withHeight(0)

Note that these methods can all be chained to create complex expressions.

The code for these methods is trivial, yet they give you a huge boost in expressive power.

GitHub projecthttps://github.com/moreindirection/SwiftGeometry

## Code

extension CGPoint {
func mapX(f: (CGFloat -> CGFloat)) -> CGPoint {
return self.withX(f(self.x))
}

func mapY(f: (CGFloat -> CGFloat)) -> CGPoint {
return self.withY(f(self.y))
}

func withX(x: CGFloat) -> CGPoint {
return CGPoint(x: x, y: self.y)
}

func withY(y: CGFloat) -> CGPoint {
return CGPoint(x: self.x, y: y)
}
}

extension CGSize {
func mapWidth(f: (CGFloat -> CGFloat)) -> CGSize {
return self.withWidth(f(self.width))
}

func mapHeight(f: (CGFloat -> CGFloat)) -> CGSize {
return self.withHeight(f(self.height))
}

func withWidth(width: CGFloat) -> CGSize {
return CGSize(width: width, height: self.height)
}

func withHeight(height: CGFloat) -> CGSize {
return CGSize(width: self.width, height: height)
}
}

extension CGRect {
func mapX(f: (CGFloat -> CGFloat)) -> CGRect {
return self.withX(f(self.origin.x))
}

func mapY(f: (CGFloat -> CGFloat)) -> CGRect {
return self.withY(f(self.origin.y))
}

func mapWidth(f: (CGFloat -> CGFloat)) -> CGRect {
return self.withWidth(f(self.size.width))
}

func mapHeight(f: (CGFloat -> CGFloat)) -> CGRect {
return self.withHeight(f(self.size.height))
}

func withX(x: CGFloat) -> CGRect {
return CGRect(origin: self.origin.withX(x), size: self.size)
}

func withY(y: CGFloat) -> CGRect {
return CGRect(origin: self.origin.withY(y), size: self.size)
}

func withWidth(width: CGFloat) -> CGRect {
return CGRect(origin: self.origin, size: self.size.withWidth(width))
}

func withHeight(height: CGFloat) -> CGRect {
return CGRect(origin: self.origin, size: self.size.withHeight(height))
}
}

# Examining Hackage: folds

Posted on December 27, 2014

In keeping with the rest of the “Examining Hackage” series I’d like to go through the source folds package today. We’ll try to go through most of the code in an attempt to understand what exactly folds does and how it does it. To be honest, I hadn’t actually heard of this one until someone mentioned it to me on /r/haskell but it looks pretty cool. It also has the word “comonadic” in the description, how can I resist?

It’s similar to Gabriel’s foldl library, but it also seems to provide a wider suite of types folds. In retrospect, folds has a general framework for talking about types of folds and composing them where as foldl defines only 2 types of folds, but defines a whole heap of prebuilt (left) folds.

## Poking Around

After grabbing the source and looking at the files we see that folds is actually reasonable large

~$cabal get folds && cd folds-0.6.2 && ag -g "hs$"
src/Data/Fold.hs
src/Data/Fold/L.hs
src/Data/Fold/L'.hs
src/Data/Fold/Class.hs
src/Data/Fold/M1.hs
src/Data/Fold/L1.hs
src/Data/Fold/R.hs
src/Data/Fold/Internal.hs
src/Data/Fold/L1'.hs
src/Data/Fold/R1.hs
src/Data/Fold/M.hs
Setup.lhs
tests/hlint.hs

One that jumps out at me is Internal since it likely doesn’t depend on anything. We’ll start there.

## Internal

Looking at the top gives a hint for what we’re in for

    {-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE UndecidableInstances #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE DeriveDataTypeable #-}
module Data.Fold.Internal
( SnocList(..)
, SnocList1(..)
, List1(..)
, Maybe'(..), maybe'
, Pair'(..)
, N(..)
, Tree(..)
, Tree1(..)
, An(..)
, Box(..)
) where

This module seems to be mostly a bunch of (presumably useful) data types + their instances for Foldable, Functor, and Traversable. Since all 3 of these are simple enough you can actually just derive them I’ll elide them in most cases.

First up is SnocList, if the name didn’t give it away it is a backwards list (snoc is cons backwards)

    data SnocList a = Snoc (SnocList a) a | Nil
deriving (Eq,Ord,Show,Read,Typeable,Data)

Then we have the boilerplatey instances for Functor and Foldable. What’s a bit odd is that both foldl and foldMap are implemented where we only need foldl. Presumably this is because just foldMap gives worse performance but that’s a little disappointing.

Next is SnocList1 and List1 which are quite similar.

    data SnocList1 a = Snoc1 (SnocList1 a) a | First a

data List1 a = Cons1 a (List1 a) | Last a

If you’ve never seen this before, notice how instead of Nil we have a constructor which requires an element. This means that no matter how we construct a list we need to supply at least element. Among other things this means that head would be safe.

We also have a couple strict structures. Notice that these cannot be functors since they break fmap f . fmap g = fmap (f . g) (why?). We have

    data Maybe' a = Nothing' | Just' !a

data Pair' a b = Pair' !a !b

And we have the obvious instance for Foldable Maybe' and Monoid (a, b). Now it may seem a little silly to define these types, but from experience I can say anything that makes strictness a bit more explicit is wonderfully helpful. Now we can just use seq on a Pair' and know that both components will be forced.

Next we define a type for trees. One thing I noticed was the docs mentioned that this type reflects the structure of a foldMap

    data Tree a
= Zero
| One a
| Two (Tree a) (Tree a)
deriving (Eq,Ord,Show,Read,Typeable,Data)

When we foldMap each One should be an element of the original collection. From there we can fmap with the map part of foldMap, and we can imagine traversing the tree and replacing Two l r with l <> r, each Zero with mempty, and each One a with a.

So that’s rather nifty. On top of this we have Foldable, Traversable, and Functor instances.

We also have Tree1 which is similar but elides the Zero

    data Tree1 a = Bin1 (Tree1 a) (Tree1 a) | Tip1 a

As you’d expect, this implements the same type classes as Tree.

Now is where things get a bit weird. First up is a type for reifying monoids using reflection. I actually was thinking about doing a post on it and then I discovered Austin Seipp has done an outstanding one. So we have this N type with the definition

    newtype N a s = N { runN :: a }
deriving (Eq,Ord,Show,Read,Typeable,Data)

Now with reflection there are two key components, there’s the type class instance floating around and a fresh type s that keys it. If we have s then we can easily demand a specific instance with reflect (Proxy :: Proxy s). That’s exactly what we do here. We can create a monoid instance using this trick with

    instance Reifies s (a -> a -> a, a) => Monoid (N a s) where
mempty = N $snd$ reflect (Proxy :: Proxy s)
mappend (N a) (N b) = N $fst (reflect (Proxy :: Proxy s)) a b So at each point we use our s to grab the tuple of monoid operations we expect to be around and use them in the obvious manner. The only reason I could imagine doing this is if we had a structure which we want to use as a monoid in a number of different ways. I suppose we also could have just passed the dictionary around but maybe this was extremely ugly. We shall see later I suppose. Last comes two data types I do not understand at all. There’s An and Box. The look extremely boring.  data Box a = Box a newtype An a = An a Their instances are the same everywhere as well.. I have no clue what these are for. Grepping shows they are used though so hopefully this mystery will become clearer as we go. ## Class Going in order of the module DAG gives us Data.Fold.Class.hs. This exports two type classes and one function  module Data.Fold.Class ( Scan(..) , Folding(..) , beneath ) where One thing that worries me a little is that this imports Control.Lens which I don’t understand nearly as well as I’d like to.. We’ll see how this turns out. Our first class is  class Choice p => Scan p where prefix1 :: a -> p a b -> p a b postfix1 :: p a b -> a -> p a b run1 :: a -> p a b -> b interspersing :: a -> p a b -> p a b So right away we notice this is a subclass of Choice which is in turn a subclass of Profunctor. Choice captures the ability to pull an Either through our profunctor.  left' :: p a b -> p (Either a c) (Either b c) right' :: p a b -> p (Either c a) (Either c b) Note that we can’t do this with ordinary profunctors since we’d need a function from Either a c -> a which isn’t complete. Back to Scan p. Scan p takes a profunctor which apparently represents our folds. We then can prefix the input we supply, postfix the input we supply, and run our fold on a single element of input. This is a bit weird to me, I’m not sure if the intention is to write something like  foldList :: Scan p => [a] -> p a b -> b foldList [x] = run1 x foldList (x : xs) = foldList xs . prefix1 x or something else entirely. Additionally this doesn’t really conform to my intuition of what a scan is. I’d expect a scan to produce all of the intermediate output involved in folding. At this point, with no instances in scope, it’s a little tricky to see what’s supposed to be happening here. There are a bunch of default-signature based implementations of these methods if your type implements Foldable. Since this is the next type class in the module let’s look at that and then skip back to the defaults.  class Scan p => Folding p where prefix :: Foldable t => t a -> p a b -> p a b prefixOf :: Fold s a -> s -> p a b -> p a b postfix :: Foldable t => p a b -> t a -> p a b postfixOf :: Fold s a -> p a b -> s -> p a b run :: Foldable t => t a -> p a b -> b runOf :: Fold s a -> s -> p a b -> b filtering :: (a -> Bool) -> p a b -> p a b At this point I looked at a few of the types and my first thought was “Oh dammit lens..” but it’s actually not so bad! The first thing to do is ignore the *Of functions which work across lens’s Fold type. There seems to be a nice pair for each “running” function where it can work across a Foldable container or lens’s notion of a fold.  prefix :: Foldable t => t a -> p a b -> p a b postfix :: Foldable t => p a b -> t a -> p a b run :: Foldable t => t a -> p a b -> b The first two functions let us create a new fold that will accept some input and supplement it with a bunch of other inputs. prefix gives the supplemental input followed by the new input and postfix does the reverse. We can actually supply input and run the whole thing with run. All of these are defined with folded from lens which reifies a foldable container into a Fold. so foo = fooOf folded is the default implementation for all of these. Now for the corresponding fold functions I’m reading them as “If you give me a lens to treat s as a container that I can get elements from and a fold, I’ll feed the elements of s into the fold.” The types are tricky, but this type class seems to capture what it means to run a fold across some type of structure. Now that we’ve seen how An comes in handy. It’s used as a single object Foldable container. Since it’s newtyped, this should basically run the same as just passing a single element in.  prefix1 = prefix . An run1 = run . An postfix1 p = postfix p . An So a Scan here apparently means a fold over a single element at a time. Still not sure why this is deserving of the name Scan but there you are. Last but not least we have a notion of dragging a fold through an optic with beneath.  beneath :: Profunctor p => Optic p Identity s t a b -> p a b -> p s t beneath l f = runIdentity #. l (Identity #. f) Those #.’s are like lmaps but only work when the function we apply is a “runtime identity”. Basically this means we should be able to tell whether or not we applied the function or just used unsafeCoerce when running the code. Otherwise all we do is set up our fold f to work across Identity and feed it into the optic. ## Concrete Implementations Now a lot of the rest of the code is implementing those two type classes we went over. To figure out where all these implementations are I just ran ~$ cabal repl
> :info Scan
....
instance Scan R1 -- Defined at src/Data/Fold/R1.hs:25:10
instance Scan R -- Defined at src/Data/Fold/R.hs:27:10
instance Scan M1 -- Defined at src/Data/Fold/M1.hs:25:10
instance Scan M -- Defined at src/Data/Fold/M.hs:33:10
instance Scan L1' -- Defined at src/Data/Fold/L1'.hs:24:10
instance Scan L1 -- Defined at src/Data/Fold/L1.hs:25:10
instance Scan L' -- Defined at src/Data/Fold/L'.hs:33:10
instance Scan L -- Defined at src/Data/Fold/L.hs:33:10

Looking at the names, I really don’t want to go through each of these with this much detail. Instead I’ll skip all the *1’s and go over R, L', and M to get a nice sampling of the sort of folds we get.

### R.hs

Up first is R.hs. This defines the first type for a fold we’ve seen.

    data R a b = forall r. R (r -> b) (a -> r -> r) r

Reading this as “a right fold from a to b” we notice a few parts here. It looks like that existential r encodes our fold’s inner state and r -> b maps the current state into the result of the fold. That leaves a -> r -> r as the stepping function. All in all this doesn’t look too different from

    foldAndPresent :: (a -> r -> r) -> r -> (r -> b) -> [a] -> b
foldAndPresent f z p = p . foldr f z

The rest of this module is devoted to making a lot of instances for R. Some of these are really uninteresting like Bind, but quite a few are enlightening. To start with, Profunctor.

    instance Profunctor R where
dimap f g (R k h z) = R (g . k) (h . f) z
rmap g (R k h z) = R (g . k) h z
lmap f (R k h z) = R k (h . f) z

This should more or less by what you expect since it’s really the only the way to get the types to fit together. We fit the map from b -> d onto the presentation piece of the fold and stick the map from a -> c onto the stepper so it can take the new pieces of input.

Next we have the instance for Choice.

    instance Choice R where
left' (R k h z) = R (_Left %~ k) step (Left z) where
step (Left x) (Left y) = Left (h x y)
step (Right c) _ = Right c
step _ (Right c) = Right c

right' (R k h z) = R (_Right %~ k) step (Right z) where
step (Right x) (Right y) = Right (h x y)
step (Left c) _ = Left c
step _ (Left c) = Left c

This was slightly harder for me to read, but it helps to remember that here _Left %~ and _Right %~ are just mapping over the left and right sides of an Either. That clears up the presentation bit. For the initial state, when we’re pulling our computation through the left side we wrap it in a Left, when we’re pulling it through the right, we wrap it in Right.

The interesting bit is the new step function. It short circuits if either our state or our new value is the wrong side of an Either otherwise it just applies our stepping function and wraps it back up as an Either.

In addition to being a profunctor, R is also a monad and comonad as well as a whole bunch of more finely grained classes built around those two. I’ll just show the Monad Applicative, and Comonad instance here.

    instance Applicative (R a) where
pure b = R (\() -> b) (\_ () -> ()) ()
R xf bxx xz <*> R ya byy yz = R
(\(Pair' x y) -> xf x $ya y) (\b ~(Pair' x y) -> Pair' (bxx b x) (byy b y)) (Pair' xz yz) instance Comonad (R a) where extract (R k _ z) = k z duplicate (R k h z) = R (R k h) h z instance Monad (R a) where return b = R (\() -> b) (\_ () -> ()) () m >>= f = R (\xs a -> run xs (f a)) (:) [] <*> m Looking at the Comonad instance nesting a fold within a fold doesn’t change the accumulator, only the presentation. A nested fold is one that runs and returns a new fold which is identical except the starting state is the result of the old fold. The <*> operator here is kind of nifty. First off it zips both folds together using the strict Pair'. Finally when we get to the presentation stage we map the final state for the left which gives us a function, and the final state for the right maps to its argument. Applying these two gives us our final result. Notice that there’s some craziness happening with irrefutable patterns. When we call this function we won’t attempt to force the second argument until bxx forces x or byy forces y. This is important because it makes sure that <*> preserves short circuiting. The monad instance has a suitably boring return and >>= is a bit odd. We have one machine which accumulates all the elements it’s given in a list, this is an “identity fold” of sorts. From there our presentation function returns a lambda which expects an a and runs f a with all the input we’ve saved. We combine this with m by running it in parallel with <*> and feeding the result of m back into the lambda generated by the right. Now we’re finally in a position to define our Scan and Folding instances. Since the Scan instance can be determined from the Folding one I’ll show Folding.  instance Folding R where run t (R k h z) = k (foldr h z t) prefix s = extend (run s) postfix t s = run s (duplicate t) runOf l s (R k h z) = k (foldrOf l h z s) prefixOf l s = extend (runOf l s) postfixOf l t s = runOf l s (duplicate t) filtering p (R k h z) = R k (\a r -> if p a then h a r else r) z It took some time, but I understand how this works! The first thing to notice is that actually running a fold just relies on the foldr we have from Foldable. Postfixing a fold is particularly slick with right folds. Remember that z represents the accumulated state for the remainder of the items in our sequence. Therefore, to postfix a number of elements all we need do is run the fold on the container we’re given and store the results as the new initial state. This is precisely what happens with run s (duplicate t). Now prefix is the inefficient one here. To prefix an element we want to change how presentation works. Instead of just using the default presentation function, we actually want to take the final state we get and run the fold again using this prefixing sequence and then presenting the result. For this we have another helpful comonandic function, extend. This leaks because it holds on to the sequence a lot longer than it needs to. The rest of these functions are basically the same thing except maybe postfixing (ha) a function with Of here and there. ## L’.hs Next up is (strict) left folds. As with right folds this module is just a data type and a bunch of instances for it.  forall r. L' (r -> b) (r -> a -> r) r One thing that surprised me here was that our state r isn’t stored strictly! That’s a bit odd but presumably there’s a good reason for this. Now all the instances for L' are the same as those for R up to isomorphism because the types are well.. isomorphic. The real difference comes in the instances for Scan and Folding. Remember how Folding R used foldr, well here we just use foldl'. This has the upshot that all the strictness and whatnot is handled entirely by the foldable instance!  instance Folding L' where run t (L' k h z) = k$! foldl' h z t
prefix s             = run s . duplicate
postfix t s          = extend (run s) t

runOf l s (L' k h z) = k $! foldlOf' l h z s prefixOf l s = runOf l s . duplicate postfixOf l t s = extend (runOf l s) t filtering p (L' k h z) = L' k (\r a -> if p a then h r a else r) z So everywhere we had foldr we have foldl'. The other interesting switch is that our definitions of prefix and postfix are almost perfectly swapped! This actually makes perfect sense when you think about it. In a left fold the state is propagating from the beginning to the end versus a right fold where it propagates from the end to the beginning! So to prefix something when folding to the left we add it to the initial state and when postfixing we use the presentation function to take our final state and continue to fold with it. If you check above, you’ll find this to be precisely the opposite of what we had for right folds and since they both have the same comonad instance, we can swap the two implementations. In fact, having read the implementation for right folds I’m noticing that almost everything in this file is so close to what we had before. It really seems like there is a clever abstraction just waiting to break out. ## M.hs Now that we’ve seen how left and right folds are more or less the same, let’s try something completely different! M.hs captures the notion of a foldMap and looks pretty different than what we’ve seen before. First things first, here’s the type in question.  data M a b = forall m. M (m -> b) (a -> m) (m -> m -> m) m We still have a presentation function m -> b, and we still have an internal state m. However, we also have a conversion function to map our inputted values onto the values we know how to fold together and we have a tensor operation m -> m -> m. Now as before we have a profunctor instance  instance Profunctor M where dimap f g (M k h m e) = M (g.k) (h.f) m e rmap g (M k h m e) = M (g.k) h m e lmap f (M k h m e) = M k (h.f) m e Which might start to look familiar from what we’ve seen so far. Next we have a Choice instance which is still a little intimidating.  instance Choice M where left' (M k h m z) = M (_Left %~ k) (_Left %~ h) step (Left z) where step (Left x) (Left y) = Left (m x y) step (Right c) _ = Right c step _ (Right c) = Right c right' (M k h m z) = M (_Right %~ k) (_Right %~ h) step (Right z) where step (Right x) (Right y) = Right (m x y) step (Left c) _ = Left c step _ (Left c) = Left c As before we use prisms and %~ to drag our presentation and conversion functions into Either, similarly our starting state is wrapped in the appropriate constructor and we define a new stepping function with similar characteristic’s to what we’ve seen before. As before, we’ve got a wonderful world of monads and comonads to dive into now. We’ll start with monads here to mix it up.  instance Applicative (M a) where pure b = M (\() -> b) (\_ -> ()) (\() () -> ()) () M xf bx xx xz <*> M ya by yy yz = M (\(Pair' x y) -> xf x$ ya y)
(\b -> Pair' (bx b) (by b))
(\(Pair' x1 y1) (Pair' x2 y2) -> Pair' (xx x1 x2) (yy y1 y2))
(Pair' xz yz)

return = pure
m >>= f = M (\xs a -> run xs (f a)) One Two Zero <*> m

Our return/pure just instantiates a trivial fold that consumes ()s and outputs the value we gave it. For <*> we run both machines strictly next to each other and apply the final result of one to the final result of the other.

Bind creates a new fold that creates a tree. This tree contains every input fed to it as it’s folding and stores each merge a node in the tree. While we run this, we also run the original m we were given. Finally, when we reach the end, we apply f to the result of m and run this over the tree we’ve created which is foldable. If you remember back to the comment of Tree a capturing foldMap this is what was meant by it: we’re using a tree to suspend a foldMap until we’re in a position to run it.

    instance Comonad (M a) where
extract (M k _ _ z) = k z
duplicate (M k h m z) = M (\n -> M (k . m n) h m z) h m z

We can be pleasantly surprised that most of this code is the same. Extraction grabs our current state and presents it. Duplication creates a fold which will run and return a new fold. This new fold has the same initial state as the original fold, but when it goes to present its results it will merge it with the final state of the outer fold. This is very different from before and I suspect it will significantly impact our Folding instance.

    instance Folding M where
run s (M k h m (z :: m)) = reify (m, z) $\ (_ :: Proxy s) -> k$ runN (foldMap (N #. h) s :: N m s)
prefix s (M k h m (z :: m)) = reify (m, z) $\ (_ :: Proxy s) -> case runN (foldMap (N #. h) s :: N m s) of x -> M (\y -> k (m x y)) h m z postfix (M k h m (z :: m)) s = reify (m, z)$
\ (_ :: Proxy s) -> case runN (foldMap (N #. h) s :: N m s) of
y -> M (\x -> k (m x y)) h m z
filtering p (M k h m z) = M k (\a -> if p a then h a else z) m z

This was a little intimidating so I took the liberty of ignoring *Of functions which are pretty much the same as what we have here.

To run a fold we use foldMap, but foldMap wants to work over monoids and we only have z and m. To promote this to a type class we use reify and N. Remember N from way back when? It’s the data type that uses reflection to yank a tuple out of our context and treat it as a monoid instance. In all of this code we use reify to introduce a tuple to our environment and N as a pseudo-monoid that uses m and z.

with this in mind, this code uses N #. h which uses the normal conversion function to introduce something into the N monoid. Then foldMap takes care of the rest and all we need do is call runN to extract the results.

prefix and postfix are actually markedly similar. They both start by running the fold over the supplied structure which reduces it to an m. From there, we create a new fold which is identical in all respects except the presentation function. The new presentation function uses m to combine the pre/post-fixed result with the new result. If we’re postfixing, the postfixed result is on the right, if we’re prefixing, the left.

What’s particularly stunning is that neither of these leak! We don’t need to hold onto the structure in our new fold so we can prefix and postfix in constant memory.

## Fold.hs

Now that we’ve gone through a bunch of instances of Folding and Scanning, we’re in a position to actually look at what Data.Fold exports.

    module Data.Fold
( Scan(..)
, Folding(..)
, beneath
, L1(..)  -- lazy Mealy machine
, L1'(..) -- strict Mealy machine
, M1(..) -- semigroup reducer
, R1(..) -- reversed lazy Mealy machine
, L(..) -- lazy Moore machine
, L'(..) -- strict Moore machine
, M(..) -- monoidal reducer
, R(..) -- reversed lazy Moore machine
, AsRM1(..)
, AsL1'(..)
, AsRM(..)
, AsL'(..)
) where

So aside from the folds we’ve examined before, there are 4 new classes, AsRM[1], and AsL[1]'. We’ll look at the non-1 versions.

    class AsRM1 p => AsRM p where
asM :: p a b -> M a b
asR :: p a b -> R a b

So this class covers the class of p’s that know how to convert themselves to middle and right folds. Most of these instances are what you’d expect if you’ve ever done the “write foldl as foldr” trick or similar shenanigans.

For M

    instance AsRM M where
asR (M k h m z) = R k (m.h) z
asM = id

asM is trivially identity and since m is expected to be associative we don’t really care that R is going to associate it strictly to the right. We just glue h onto the front to map the next piece of input into something we know how to merge.

Next is R

    instance AsRM R where
asM (R k h z) = M (\f -> k (f z)) h (.) id
asR = id

For right folds we do something a bit different. We transform each value into a function of type m -> m which is the back half of a folding function. We can compose these associatively with . since they are just functions. Finally, when we need to present this, we apply this giant pipeline to the initial state and present the result. Notice here how we took a nonassociative function and bludgeoned it into associativity by partially applying it.

For L' we do something similar

    instance AsRM L' where
asR (L' k h z) = R (\f -> k (f z)) (\b g x -> g $! h x b) id asM = asM . asR We once again build up a pipeline of functions to make everything associative and apply it at the end. We can’t just use . though for composition because we need to force intermediate results. That’s why you see \b g x -> g$! h x b, it’s just strict composition.

It makes sense that we’d bundle right and monoidal folds together because every right fold can be converted to a monoidal and every monoidal fold to a right. That means that every time we can satisfy one of these functions we can build the second.

This isn’t the case for left folds because we can’t convert a monoidal or right fold to a left one. For the people who are dubious of this, foldl doesn’t let us capture the same amount of laziness we need. I forgot about this too and subsequently hung my machine trying to prove Edward Kmett wrong.

This means that the AsL' is a fairly boring class,

    class (AsRM p, AsL1' p) => AsL' p where
asL' :: p a b -> L' a b

instance AsL' L where
asL' (L k h z) = L' (\(Box r) -> k r) (\(Box r) a -> Box (h r a)) (Box z)

Now we finally see the point of Box, it’s designed to stubbornly block attempts at making its contents strict. You can see this because all the instance for L does is wrap everything in Boxes! Since L' is the same as L with some extra seqs, we can use Box to nullify those attempts at strictness and give us a normal left fold.

That’s it! We’re done!

## Wrap Up

Now that we’ve gone through a few concrete implementations and the overall structures in this package hopefully this has come together for you. I must say, I’m really quite surprised at how effectively comonadic operations can capture compositional folds. I’m certainly going to make an effort to use this package or Gabriel’s foldl a bit more in my random “tiny Haskell utility programs”.

If you’re as entranced by these nice little folding libraries as I am, I’d recommend

Trivia fact: this is the longest article out of all 52 posts on Code & Co.

Update: I decided it might be helpful to write some utility folds for folds. I figured this might be interesting to some.

## December 25, 2014

### Roman Cheplyaka

Bas van Dijk has recently released monad-control-1.0.0.0, the main purpose of which is to replace associated data types with associated type synonyms. The change caused minor breakages here and there, so people might wonder whether and why it was worth it. Let me show you a simple example that demonstrates the difference.

Let’s say we are writing a web application. wai defines an application as

type Application =
Request ->
IO ResponseReceived

Our web app will need a database connection, which we’ll pass using the ReaderT transformer:

type ApplicationM m =
Request ->

myApp :: ApplicationM (ReaderT DbConnection IO)

However, warp can only run an Application, not ApplicationM:

run :: Port -> Application -> IO ()

Can we build runM :: Port -> ApplicationM m -> m () on top of the simple run function? Solving the problems like this one is exactly the purpose of monad-control.

Here’s how such a function might look like:

runM
=> Port -> ApplicationM m -> m ()
runM port app = do
liftBaseWith $\runInIO -> run port$ \req ack -> runInIO $app req (liftBase . ack) What’s going on here? liftBaseWith, like liftM or liftBase, allows to run a primitive monadic action in a complex monad stack. The difference is that it also gives us a function, here named runInIO, which lets to “lower” complex actions to primitive ones. Here we use runInIO to translate the return value of our app, m (), into a basic IO () value that the run function can digest. All is well, except… Could not deduce (StM m ResponseReceived ~ ResponseReceived) Expected type: IO ResponseReceived Actual type: IO (StM m ResponseReceived) Relevant bindings include runInIO :: RunInBase m IO In the expression: runInIO$ app req (liftBase . ack)

The type of runInIO is forall a . m a -> IO (StM m a) (a.k.a. RunInBase m IO), while we would like a simple forall a . m a -> IO a. The purpose of StM is to encompass any “output effects”, such as state or error.

In our case, we don’t have any “output effects” (nor would we be allowed to), so StM (ReaderT DbConnection IO) ResponseReceived is really isomorphic to ResponseReceived.

In monad-control 0.x, StM used to be an associated data family, and its constructors for the standard monad transformers were hidden. Even though we knew that the above two types were isomorphic, we still couldn’t resolve the error nicely.

Not anymore! Since in monad-control 1.0 StM is an associated type synonym, StM (ReaderT DbConnection IO) ResponseReceived and ResponseReceived are not just hypothetically isomorphic; they are literally the same type. After we add the corresponding equality constraint to runM

runM
=> Port -> ApplicationM m -> m ()

our app compiles!

This example is not just an isolated case. The general problem with monad-control is that it is all too easy to discard the output effects as Edward Yang shows.

Monads for which StM m a ~ a provide a “safe subset” of monad-control. Previously, it was hard to tell apart safe and unsafe uses, because the output effects or absence thereof hid behind the opaque StM data family.

Now not only is it transparent when the output effects are absent, but we can actually encode that fact right in the type system! As an example, Mitsutoshi Aoe and I are experimenting with a safe lifted async module.

One may wonder if this subset is too boring, since it only includes monads that are isomorphic to a reader transformer over the base monad. While that is technically true, there are a lot of things you can do with a reader. The ZoomT and CustomWriterT transformers that I described in another article, as well as the Proxied transformer they’re based upon, are reader-like and thus safe to use with monad-control.

# Stopping Times

Let $S$ an $T$ be stopping times and let the filtration on which they are defined be right continuous. Then

1. $S \land T = \min(S, T)$,
2. $S \lor T = \max(S, T)$,
3. $S + T$ and
4. $\alpha T$

are stopping times where $\alpha > 1$.

For the first we have $\{S \land T \leq t\} = \{S \leq t\} \cup \{T \leq t\}$ and both the latter are in ${\mathcal{F}_t}$ by the definition of a stopping time.

Similarly for the second $\{S \lor T \leq t\} = \{S \leq t\} \cap \{T \leq t\}$.

For the fourth we have $\{\alpha T \leq t\} = \{T \leq \frac{t}{\alpha}\} \in {\mathcal{F}}_{t / \alpha} \subset {\mathcal{F}}_{t}$ since $\alpha > 1$.

The third is slightly trickier. For $\omega \in \Omega$, $S(\omega) +T(\omega) < t$ if and only if for some rational $q$, we have $S(\omega) + T(\omega) < q < t$. We can thus we can find $r \in \mathbb{Q}$ such that $S(\omega) < r < q - T(\omega)$. Writing $s \triangleq q - r \in \mathbb{Q}$ we also have $T(\omega) < q - r = s$. Thus we have $S(\omega) + T(\omega) < t$ if and only if there exist $r, s \in \mathbb{Q}$ and $r, s > 0$ such that $r + s < t$ and $S(\omega) < r$ and $T(\omega) < s$. In other words

$\displaystyle \{S +T < t\} = \bigcup_{r, s \in \mathbb{Q}^+} (\{S < r\} \cap \{T < s\})$

By right continuity (Protter 2004 Theorem 1) of the filtration, we know the terms on the right hand side are in ${\mathcal{F}}_r \subset {\mathcal{F}}_t$ and ${\mathcal{F}}_s \subset {\mathcal{F}}_t$ so that the whole right hand side is in ${\mathcal{F}}_t$. We thus know that the left hand side is in ${\mathcal{F}}_t$ and using right continuity again that therefore $S + T$ must be a stopping time.

# Bibliography

Protter, P.E. 2004. Stochastic Integration and Differential Equations: Version 2.1. Applications of Mathematics. Springer. http://books.google.co.uk/books?id=mJkFuqwr5xgC.

# Announcing: hastache 0.6.1

Happy holidays, everyone!

I would like to announce a new version of the Hastache library, version 0.6.1. Some interesting and useful changes, as well as improvements and bugfixes are included in the release. See below for an extended changelog.

Hastache is a Haskell implementation of the mustache templating system.

# Quick start

cabal update
cabal install hastache

A simple example:

import Text.Hastache
import Text.Hastache.Context
import qualified Data.Text.Lazy.IO as TL

main = hastacheStr defaultConfig (encodeStr template) (mkStrContext context)
>>= TL.putStrLn

context "unread" = MuVariable (100 :: Int)

# Whats’s new in 0.6.1?

Most of the new features in this release deal with generic contexts.

## Context merging

composeCtx is a left-leaning composition of contexts. Given contexts c1 and c2, the behaviour of (c1 <> c2) x is following: if c1 x produces ‘MuNothing’, then the result is c2 x. Otherwise the result is c1 x. Even if c1 x is ‘MuNothing’, the monadic effects of c1 are still to take place.

## Generic contexts for more datatypes

The mkGenericContext function now supports additional datatypes like Maybe (with Nothing being an empty/null value) and Either.

## Context modifiers and renaming

The new mkGenericContext' is a generalized version of mkGenericContext and it takes two addition arguments. The first one, of type (String -> String) is simply a renaming function, similar to fieldLabelModifier of aeson. To see this feature in action, consider the following example:

{-# LANGUAGE DeriveDataTypeable #-}

import Text.Hastache
import Text.Hastache.Context

import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.IO as TL

import Data.Data (Data, Typeable)
import Data.Decimal
import Data.Generics.Aliases (extQ)

data Test = Test {f :: Int}
deriving (Data, Typeable)

val1 :: Test
val1 = Test 1

val2 :: Test
val2 = Test 2

r "f" = "foo"
r x   = x

example :: Test -> IO TL.Text
example v = hastacheStr defaultConfig
(encodeStr template)
(mkGenericContext' r defaultExt v)

template = "An integer: {{foo}}"

main = do
example val1 >>= TL.putStrLn
example val2 >>= TL.putStrLn

In the example we use the renaming function r to rename a field “f” to “foo”.

The second additional argument is a query extension, of type Ext:

type Ext = forall b. (Data b, Typeable b) => b -> String

A query extension is a way of turning arbitrary datatypes into strings. This might come in very handy, if you want to generate mustache contexts from records/datatypes that contain non-primitive datatypes (from non-base modules) that you want to display. Before 0.6.1, if you had a record that contained, for example, a Decimal field, and you wanted to convert it to a context and access that field, you were simply out of luck. With this release you can basically extend the mkGenericContext' function to support any datatypes you want! Once again, I believe an example is worth a thousand words, so let us consider a slightly modified version of the example above:

{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE StandaloneDeriving #-}

-- Custom extension function for types that are not supported out of
-- the box in generic contexts
import Text.Hastache
import Text.Hastache.Context

import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.IO as TL

import Data.Data (Data, Typeable)
import Data.Decimal
import Data.Generics.Aliases (extQ)

data DecimalOrInf = Inf | Dec Decimal deriving (Data, Typeable)
deriving instance Data Decimal
data Test = Test {n::Int, m::DecimalOrInf} deriving (Data, Typeable)

val1 :: Test
val1 = Test 1 (Dec $Decimal 3 1500) val2 :: Test val2 = Test 2 Inf query :: Ext query = defaultExt extQ f where f Inf = "+inf" f (Dec i) = show i r "m" = "moo" r x = x example :: Test -> IO TL.Text example v = hastacheStr defaultConfig (encodeStr template) (mkGenericContext' r query v) template = concat [ "An int: {{n}}\n", "{{#moo.Dec}}A decimal number: {{moo.Dec}}{{/moo.Dec}}", "{{#moo.Inf}}An infinity: {{moo.Inf}}{{/moo.Inf}}" ] main = do example val1 >>= TL.putStrLn example val2 >>= TL.putStrLn As you can see, the query extensions are combined using the extQ function from Data.Generics, and the “unit” of this whole thing is defaultExt function. # Links # Acknowledgments This release would not have been possible without Tobias Florek, Edsko de Vries, Janne Hellsten, @clinty on Github, Stefan Kersten, Herbert Valerio Riedel, and other people who were submitting issues, patches, and requests. Tagged: haskell, hastache ### Danny Gratzer # Examining Hackage: operational Posted on December 25, 2014 Tags: haskell In this installment of “jozefg is confused by other people’s code” we turn to operational. This is a package that’s a little less known than I’d like. It provides a monad for transforming an ADT of instructions, a monad that can be used with do notation and separates out interpretation. Most people familiar with free monads are wondering what the difference is between operational’s approach and using free monads. Going into this, I have no clue. Hopefully this will become clear later on. ## Diving Into The Source Let’s get started shall we ~$ cabal get operational

Happily enough, there’s just one (small) file so we’ll go through that.

To start with Control.Monad.Operational exports

    module Control.Monad.Operational (
Program, singleton, ProgramView, view,
ProgramT, ProgramViewT(..), viewT,
liftProgram,
) where

Like with most “provides a single monad” packages, I’m most interested in how Program works. Looking at this, we see that it’s just a synonym

    type Program instr = ProgramT instr Identity

Just like the mtl, this is defined in terms of a transformer. So what’s this transformer?

    data ProgramT instr m a where
Lift   :: m a -> ProgramT instr m a
Bind   :: ProgramT instr m b -> (b -> ProgramT instr m a)
-> ProgramT instr m a
Instr  :: instr a -> ProgramT instr m a

So ProgramT is a GADT, this is actually important because Bind has an existential type variable: b. Otherwise this is really just a plain tree, I assume (>>=) = Bind and return = Lift . return in the monad instance for this. And finally we can see that instructions are also explicitly supported with Instr.

We can confirm that the Monad instance is as boring as we’d expect with

    instance Monad m => Monad (ProgramT instr m) where
return = Lift . return
(>>=)  = Bind

lift   = Lift

instance Monad m => Functor (ProgramT instr m) where
fmap   = liftM

instance Monad m => Applicative (ProgramT instr m) where
pure   = return
(<*>)  = ap

So clearly there’s no interesting computation happening here. Looking at the export list again, we see that there’s a helpful combinator singleton for building up these Program[T]s since they’re kept abstract.

    singleton :: instr a -> ProgramT instr m a
singleton = Instr

Which once again is very boring.

So this is a lot like free monads it seems since neither one of these actually does much in its monad instance. Indeed the equivalent with free monads would be

    data Free f a = Pure a | Free (f (Free f a))
instance Functor f => Monad (Free f) where
return = Pure
Pure a >>= f = f a
(Free a) >>= f = Free (fmap (>>= f) a)

singleton :: Functor f => f a -> Free f a
singleton = Free . fmap Pure

The obvious differences is that

1. Free requires a functor while Program doesn’t
2. Frees monad instance automatically guarantees laws

2 is the bigger one for me. Free has a tighter set of constraints on its f so it can guarantee the monad laws. This is clearly false with Program since return a >>= f introduces an extra Bind instead of just giving f a.

This would explain why ProgramT is kept abstract, it’s hopelessly broken just to expose it in its raw form. Instead what we have to do is somehow partially normalize it before we present it to the user.

Indeed that’s exactly what ProgramViewT is representing. It’s a simpler data type

    data ProgramViewT instr m a where
Return :: a -> ProgramViewT instr m a
(:>>=) :: instr b
-> (b -> ProgramT instr m a)
-> ProgramViewT instr m a

This apparently “compiles” a Program so that everything is either binding an instruction or a pure value. What’s interesting is that this seems to get rid of all Lift’s as well.

How do we produce one of these? Well that seems to be viewT’s job.

    viewT :: Monad m => ProgramT instr m a -> m (ProgramViewT instr m a)
viewT (Lift m)                = m >>= return . Return
viewT ((Lift m)     Bind g) = m >>= viewT . g
viewT ((m Bind g) Bind h) = viewT (m Bind (\x -> g x Bind h))
viewT ((Instr i)    Bind g) = return (i :>>= g)
viewT (Instr i)               = return (i :>>= return)

Note that this function returns an m (ProgramViewT instr m a), not just a plain ProgramViewT. This makes sense because we have to get rid of the lifts. What I think is particularly interesting here is that the 2nd and 3rd cases are just the monad laws!

The second one says binding to a computation is just applying the function to it in the obvious manner. The third re-associates bind in a way guaranteed by the monad laws.

This means that while ProgramT isn’t going to satisfy the monad laws, we can’t tell because all the things said to be equal by the monad laws will compile to the same view. Terribly clever stuff.

The rest of the module is mostly boring stuff like Monad* instances. The last interesting functions is interpretWithMonad

    interpretWithMonad :: forall instr m b.
Monad m => (forall a. instr a -> m a) -> (Program instr b -> m b)
interpretWithMonad f = eval . view
where
eval :: forall a. ProgramView instr a -> m a
eval (Return a) = return a
eval (m :>>= k) = f m >>= interpretWithMonad f . k

This nicely highlights how you’re supposed to write an interpreter for a Program. eval handles the two cases of the view using the mapping to a monad we provided and view handles actually compiling the program into these two cases. All in all, not too shabby.

## Surprise, There Were Docs The Whole Time!

Now I assume that most people didn’t actually download the source to operational, but you really should! Inside you’ll find a whole directory, doc. It contains a few markdown files with explanations and references to the appropriate papers as well as a couple examples of actually building things with operational.

Now that you understand how the current implementation works, you should be able to understand most of what is being said there.

## Wrap Up

So operational illustrates a neat trick I rather like: using modularity to provide an O(1) implementation of >>= and hide its rule breaking with a view.

This package also drops the positivity requirement that Free implies with its functor constraint. Which I suppose means you could have

    data Foo a where
Bar :: (a -> ...) -> Bar a

Which is potentially useful.

Last but not least, operational` really exemplifies having a decent amount of documentation even though there’s only ~100 lines of code. I think the ratio of documentation : code is something like 3 : 1 which I really appreciate.

# 24 Days of GHC Extensions: Thanks!

Wow, another year out! After 24 days of frantic blogging, Christmas is finally upon us, and I’d like to take a moment to send a huge thank you to this years guest posters. To recap, the following authors submitted their work to this year’s series:

I feel the guest posts have added a lot of variety to the series, and this year each post has consistently gone above and beyond my expectations, delivering incredibly high quality content. Once again, thank you all for your hard work - 24 DOGE wouldn’t be the same without you!

Over the course of the month, we’ve looked at just over 20 extensions - but as I mentioned in the opening post, the story certainly doesn’t stop there. GHC is full of many more interesting extensions - I was hoping to get on to looking at GADTs and data kinds, but alas - there are only so many days in the month. For an example of how these extensions all interact when we write “real-world” software, readers may be interested in viewing my recent Skills Matter talk - strongly typed publish/subscribe over websockets via singleton types.

I’ve been really happy to see comments this year from people who have learnt about new extensions, seen previous extensions in a different light, or simply formed a deeper understanding of extensions they were already using. While I was a little nervous about the series at the start, I’m now confident it’s been a great success. A huge thank you to everyone who participated in the discussions - as with 24 Days of Hackage in previous years, I feel the discussion around these posts is just as important.

Finally, a thank you to everyone who donated during the series - these tokens of appreciate are greatly appreciated.

To close 24 DOGE, well… a picture speaks a thousand words.

# New version of BigPixel

New version of BigPixel:

New version of BigPixel — my iPad app for pixel art and retro sprites (with a new colour picker and sprite preview). I use BigPixel when teaching kids programming by writing small games. Creating art and programming works nicely in tandem — for details of what I do with Haskell, see http://justtesting.org/post/70871612766/lets-program