Planet Haskell

January 28, 2015

Well-Typed.Com

How we might abolish Cabal Hell, part 2

In part 1 we looked at an overview of all the various problems that make up “Cabal hell”. We also looked at an overview of a few solutions and how they overlap.

In part 2 and part 3 we’ll look in more detail at the two major solutions to Cabal hell. In this post we’ll look at Nix-style package management, and in the next post we’ll look at curated package collections.

A reminder about what the problems are

You might recall from part 1 this slightly weird diagram of the symptoms of Cabal Hell:

Cabal Hell: the symptoms 

In this part we’re going to pick out a couple of the problems and look at them in a bit more detail:

  • Breaking re-installations
  • Type errors when using packages together

Breaking re-installations

There are situations where Cabal’s chosen solution would involve reinstalling an existing version of a package but built with different dependencies.

For example, suppose I have an application app and it depends on zlib and bytestring. The zlib package also depends on bytestring. Initially the versions I am using for my app-1.0 are zlib-0.5 and bytestring-0.9 (Versions simplified for presentation.) Now I decide in the next version of my app, app-1.1, that I want to use a nice new feature in bytestring version 0.10. So I ask cabal to build my application using a more recent bytestring. The cabal tool will come up with a solution using the new bytestring-0.10 and the same old version of zlib-0.5. Building this solution will involve installing bytestring-0.10 and rebuilding zlib-0.5 against it.

Example breakage 1 

What is the problem here?

The problem is the rebuilding of zlib-0.5.

Why is this a problem?

It is a problem because when we install the instance “zlib-0.5 built against bytestring-0.10” we have to delete the pre-existing instance “zlib-0.5 built against bytestring-0.9”. Anything that depended on that previous instance now has a dangling reference and so is effectively broken.

Why do we have to delete the previous instance?

The way installed packages are managed is such that each GHC package database can only have one instance of each package version. Note that having two different versions would be allowed, e.g. zlib-0.4 and zlib-0.5, but having two instances of zlib-0.5 is not allowed. Not deleting the previous instance would involve having two instances of zlib-0.5 (each instance built against different versions of bytestring).

So the root of the problem is having to delete – or if you like, mutate – package instances when installing new packages. Mutable state strikes again! And this is due to the limitation of only being able to have one instance of a package version installed at once.

Type errors when using packages together

The second, orthogonal, problem is that it is possible to install two packages and then load them both in GHCi and find that you get type errors when composing things defined in the two different packages. Effectively you cannot use these two particular installed packages together.

The reason is that the two packages have been built using different versions of some common dependency. For example, I might have zlib built against bytestring-0.9 and binary built against bytestring-0.10. Now if I try to use them together in GHCi (e.g. compressing the result of binary serialisation) then I will get a type error from GHC saying that bytestring-0.9:Data.ByteString.ByteString is not the same type as bytestring-0.10:Data.ByteString.ByteString. And GHC is not wrong here, we really cannot pretend that these are the same types (at least not in general).

Example breakage 2 
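
To make the scenario concrete, here is a hedged sketch of the kind of code involved (the modules and functions are the real ones from the zlib, binary and bytestring packages; the mismatched build configuration is the hypothetical one described above):

    import Codec.Compression.Zlib (compress)   -- from zlib
    import Data.Binary            (encode)     -- from binary
    import Data.ByteString.Lazy   (ByteString)

    -- Compress the result of binary serialisation. In a consistent environment
    -- this compiles fine; with zlib built against bytestring-0.9 and binary
    -- built against bytestring-0.10, GHC rejects the composition because the
    -- two ByteString types come from different installed bytestring instances.
    compressEncoded :: Int -> ByteString
    compressEncoded = compress . encode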

This is a variant on the classic “diamond dependency problem” that cabal solved years ago. So why do we still get it? In fact we never hit this problem when building a package with cabal, because cabal’s solver looks for “consistent” solutions. (Recall from part 1, that when we say a “consistent” solution we mean one that uses only one version of any package.)

We can still hit this problem when we install things separately and then use the packages directly with ghc or ghci. This is because cabal does not enforce consistency in the developer’s environment. It only enforces consistency within any set of packages it installs simultaneously.

The fundamental problem is that developers expect to be able to use combinations of their installed packages together, but the package tools do not enforce consistency of the developer’s environment.

In practice this class of problem is currently relatively rare. In part that is because one would often hit the problem above involving re-installing and breaking packages. If however we lifted the limitation on installing multiple instances of the same version of a package, then we would always be able to install new versions and we would instead hit this problem much more frequently.

Nix-style package management

The ideas behind Nix are now over 10 years old. When first reading the published papers on Nix some years ago I was struck by how simple and elegant they are. They also appear to work well in practice, with a full Linux distribution based on them, plus a number of other well-regarded tools.

The key ideas are:

  • A persistent package store. That is, persistent in the sense of an immutable functional data type. Packages are added but never mutated. Old packages can be garbage collected. Just like immutable data structures in Haskell, immutable package stores have several advantages.

  • Working environment(s) that are “views” into the package store. A working environment has a number of packages available, and this is just a subset of all the packages that are installed in the package store. There can be many of these that co-exist and switching the view is simple and cheap.

Contrast this to a traditional approach, e.g. what we have now with Cabal, or Linux distros based on .rpm or .deb. In the traditional approach there is a single environment which is exactly the same as the full set of installed packages. So compared to a traditional approach, we have an extra level of indirection. Having views lets us have a much larger collection of packages installed than we have available in the working environment, and allows multiple environments.

A good illustration of what these ideas give us is to see what happens when we want to add a new package:

  1. We start with our initial environment which points to a bunch of packages from the store.

  2. We compile and install a new package into the store. So far this changes very little, which is a big feature! In particular it cannot break anything. The new installed package can co-exist with all the existing ones. There can be no conflicts (the install paths are arranged to guarantee this). No existing packages have been modified or broken. No environments have yet been changed.

  3. Now we have choices about what to do with environments. Our existing environment is unchanged and does not contain the new package. We can create a new environment that consists of the old environment with the extra package added. In principle both the old and the new environments exist. This is very much like a persistent functional data structure:

    let env' = Map.insert pkgname pkg env

    Both exist, the old one is not changed, and we can decide if we are only interested in the new one or if we want to keep both. (A slightly fuller sketch of this store/environment picture follows the list.)

  4. Finally we can, if we want, switch our “current” environment to be the new environment with the new package. So while multiple environments can exist, only one of them is active in our current shell session.
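
Putting the store/environment picture above into a toy Haskell model (the names and types here are purely illustrative, not cabal’s actual ones): the store maps opaque package-instance ids to built packages, an environment is just a view mapping package names to chosen instance ids, and both are ordinary persistent maps.

    import qualified Data.Map as Map

    type InstanceId = String                      -- opaque id: name, version, deps hash
    type Store pkg  = Map.Map InstanceId pkg      -- everything ever installed
    type Env        = Map.Map String InstanceId   -- one chosen instance per package name

    -- Step 2: extend the store; nothing existing is touched, so nothing breaks.
    addToStore :: InstanceId -> pkg -> Store pkg -> Store pkg
    addToStore = Map.insert

    -- Step 3: derive a new environment from an old one; the old one persists.
    extendEnv :: String -> InstanceId -> Env -> Env
    extendEnv = Map.insert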

So what have we gained from the added indirection of a persistent store + views?

  • We can install new things without breaking anything, guaranteed.

  • We get atomic rollback if we don’t like the changes we made.

  • Multiple independent environments.

  • Environments that are very cheap to create.

The multiple environments effectively gives us sandboxes for free. In fact it’s better because we can easily share artefacts between sandboxes when we want to. That means far fewer rebuilds, and easy global installation of things built in an isolated environment.

Nix has a few other good ideas, like its functional package description language and some clever tricks for dealing with the messy details of system packages. The key ideas however are the ones I’ve just outlined, and they are the ones that we should steal for GHC/Cabal.

Mechanisms, policies and workflows

The package store and the multiple environments are just a mechanism, not a user interface. The mechanisms are also mostly policy free.

Generally we should start from user requirements, use cases, workflows and the like, and work out a user interface and then decide on mechanisms that will support that user interface. That said, I think it is clear that the Nix mechanisms are sound and sufficiently general and flexible that they will cover pretty much any user interface we decide we want.

So I think our design process should be:

  1. Look at the packaging tool requirements, use cases, workflows etc, and work out a user interface design.

  2. Then figure out how the actions in that user interface translate into operations on the Nix package store and environments.

Addressing the Cabal Hell problems

The Nix approach deals nicely with the problem of breaking re-installations. The Nix mechanisms guarantee that installed packages are never mutated, so no existing installed packages ever break.

The Nix mechanisms do not deal directly with the issue of type errors when using packages together. As we noted before, that requires enforcing the consistency of the developer’s environment. In Nix terms this is a policy about how we manage the Nix environment(s). The policy would be that each environment contains only one version of each package and it would guarantee that all packages in an environment can be used together.

Without wishing to prejudge the future user interface for cabal, I think this is a policy that we should adopt.

Enforcing consistency does have implications for the user interface. There will be situations where one wants to install a new package, but it is impossible to add it to the current environment while keeping all of the existing packages. For example, suppose we have two different web stacks that have many packages in common but that require different versions of some common package. In that case we could not have a consistent environment that contains both. Thus the user interface will have to do something when the user asks to add the second web stack to an environment that already contains the first. The user interface could minimise the problem by encouraging a style of use where most environments are quite small, but it cannot be avoided in general.

While what I am suggesting for consistency is relatively strong, we cannot get away without enforcing some restrictions on the environment. For example if our environment did contain two instances of the same version of a package then which one would we get when we launch GHCi? So my view is that given that we cannot avoid the user interface issues with environment consistency, it is better to go for the stronger and more useful form.

In fact we’ve already been experimenting in this direction. The current cabal sandbox feature does enforce consistency within each sandbox. This seems to work ok in practice because each sandbox environment is relatively small and focused on one project. Interestingly we have had some pressure to relax this constraint due to the cost of creating new sandboxes in terms of compiling. (Allowing some inconsistencies in the environment allows the common packages to be shared and thus only compiled once.) Fortunately this issue is one that is nicely solved by Nix environments which are extremely cheap to create because they allow sharing of installed packages.

Implementation progress

We’ve been making small steps towards the Nix design for many years now. Several years ago we changed GHC’s package database to use the long opaque package identifiers that are necessary to distinguish package instances.

More recently Philipp Schuster did a GSoC project looking into the details of what we need to do to incorporate the Nix ideas within GHC and Cabal. You can see the slides and video of his HiW presentation. We learned a lot, including that there’s quite a lot of work left to do.

Last year Edward Yang and Simon PJ (with advice from Simon Marlow and myself) started working on implementing the “Backpack” package system idea within GHC and Cabal. Backpack also needs to be able to manage multiple instances of packages with the same version (but different deps) and so it overlaps quite a lot with what we need for Nix-style package management in GHC. So Edward’s work has dealt with many of the issues in GHC that will be needed for Nix-style package management.

Another small step is that in GHC 7.10 we finally have the ability to register multiple instances of the same version of a package, and we have the basic mechanism in GHC to support multiple cheap environments (using environment files). Both of these new GHC features are opt-in so that they do not break existing tools.

The remaining work is primarily in the cabal tool. In particular we have to think carefully about the new user interface and how it maps into the Nix mechanisms.

So there has been a lot of progress and if we can keep making small useful steps then I think we can get there. Of course it would help to focus development efforts on it, perhaps with a hackathon or two.

Conclusion

Implementing the Nix approach in GHC/Cabal would cover a major part of the Cabal Hell problems.

In the next post we’ll look at curated package collections, which solves a different (but slightly overlapping) set of Cabal Hell problems. Nix-style package management and curated package collections are mostly complementary and we want both.

by duncan at January 28, 2015 01:29 PM

Dominic Orchard

Constraint kinds in Haskell, finally bringing us constraint families

Back in 2009 Tom Schrijvers and I wrote a paper entitled Haskell Type Constraints Unleashed [1] which appeared at FLOPS 2010 in April. In the paper we fleshed out the idea of adding constraint synonyms and constraint families to GHC/Haskell, building upon various existing proposals for class families/indexed constraints. The general idea in our paper, and in the earlier proposals, is to add a mechanism to GHC/Haskell to allow constraints, such as type class or equality constraints, to be indexed by a type in the same way that type families and data families allow types to be indexed by types.

As an example of why constraint families are useful, consider the following type class which describes a simple, polymorphic, embedded language in Haskell (in the “finally tagless” style [2]) (this example appears in [1]):

class Expr sem where
    constant :: a -> sem a
    add :: sem a -> sem a -> sem a

Instances of Expr provide different evaluation semantics for the language. For example, we might like to evaluate the language for numerical values, so we might try and write the following instance:

data E a = E {eval :: a}

instance Expr E where
     constant c = E c
     add e1 e2 = E $ (eval e1) + (eval e2)

However, this instance does not type check. GHC returns the type error:

    No instance for (Num a)
      arising from a use of `+'
    In the second argument of `($)', namely `(eval e1) + (eval e2)'
    ...

The + operation requires the Num a constraint, yet the signature for add states no constraints on type variable a, thus the constraint is never satisfied in this instance. We could add the Num a constraint to the signature of add, but this would restrict the polymorphism of the language: further instances would have this constraint forced upon them. Other useful semantics for the language may require other constraints e.g. Show a for pretty printing. Should we just add more and more constraints to the class? By no means!
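
To see the restriction concretely, here is a hedged, standalone sketch of the rejected alternative (the class is renamed to ExprNum so it is not confused with the real definitions in this post):

class ExprNum sem where
    constant :: Num a => a -> sem a
    add      :: Num a => sem a -> sem a -> sem a

-- A pretty-printing instance would need Show a to render the constant, but
-- these signatures only ever provide Num a, so `show` remains unusable and
-- every instance pays for a constraint it may not want.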

Constraint families, as we describe in [1], provide a solution: by associating a constraint family to the class we can vary, or index, the constraints in the types of add and constant by the type of an instance of Expr. The solution we suggest looks something like:

class Expr sem where
    constraint Pre sem a
    constant :: Pre sem a => a -> sem a
    add :: Pre sem a => sem a -> sem a -> sem a

instance Expr E where
    constraint Pre E a = Num a
    ... -- methods as before

Pre is the name of a constraint family that takes two type parameters and returns a constraint, where the first type parameter is the type parameter of the Expr class.

We could add some further instances:

data P a = P {prettyP :: String}

instance Expr P where
    constraint Pre P a = Show a
    constant c = P (show c)
    add e1 e2 = P $ prettyP e1 ++ " + " ++ prettyP e2

e.g.

*Main> prettyP (add (constant 1) (constant 2))
"1 + 2"

At the time of writing the paper I had only a prototype implementation in the form of a preprocessor that desugared constraint families into some equivalent constructions (which were significantly more awkward and ugly of course). There has not been a proper implementation in GHC, or of anything similar. Until now.

At CamHac, the Cambridge Haskell Hackathon, last month, Max Bolingbroke started work on an extension for GHC called “constraint kinds”. The new extension unifies types and constraints such that the only distinction is that constraints have a special kind, denoted Constraint. Thus, for example, the Show class constructor is actually a type constructor, of kind:

Show :: * -> Constraint

For type signatures of the form C => t, the left-hand side is now a type term of kind Constraint. As another example, the equality constraint constructor ~ now has kind:

~ :: * -> * -> Constraint

i.e. it takes two types and returns a constraint.

Since constraints are now just types, existing type system features on type terms, such as synonyms and families, can be reused on constraints. Thus we can now define constraint synonyms via standard type synonyms e.g.

type ShowNum a = (Num a, Show a)
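
As a small usage example (the function name here is made up), the synonym can appear anywhere an ordinary constraint can:

describeNum :: ShowNum a => a -> String
describeNum x = show (x + x) ++ " is twice " ++ show x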

And most excitingly, constraint families can be defined via type families of return kind Constraint. Our previous example can be written:

class Expr sem where
    type Pre sem a :: Constraint
    constant :: Pre sem a => a -> sem a
    add :: Pre sem a => sem a -> sem a -> sem a

instance Expr E where
    type Pre E a = Num a
    ...

Thus, Pre is a type family of kind * -> * -> Constraint. And it all works!
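
For completeness, here is a minimal self-contained version of the example as a single module (a sketch assuming GHC 7.4 or later; the Constraint kind is imported from GHC.Exts, and FlexibleContexts is switched on for good measure):

{-# LANGUAGE ConstraintKinds, TypeFamilies, FlexibleContexts #-}
module FinallyTagless where

import GHC.Exts (Constraint)

class Expr sem where
    type Pre sem a :: Constraint
    constant :: Pre sem a => a -> sem a
    add      :: Pre sem a => sem a -> sem a -> sem a

data E a = E { eval :: a }
data P a = P { prettyP :: String }

instance Expr E where
    type Pre E a = Num a
    constant c = E c
    add e1 e2  = E (eval e1 + eval e2)

instance Expr P where
    type Pre P a = Show a
    constant c = P (show c)
    add e1 e2  = P (prettyP e1 ++ " + " ++ prettyP e2)

-- ghci> prettyP (add (constant (1 :: Int)) (constant 2))   -- "1 + 2"
-- ghci> eval    (add (constant (1 :: Int)) (constant 2))   -- 3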

The constraint kinds extension can be turned on via the pragma:

{-# LANGUAGE ConstraintKinds #-}

Max has written about the extension over at his blog, which has some more examples, so do go check that out. As far as I am aware the extension should be hitting the streets with version 7.4 of GHC. But if you can’t wait it is already in the GHC HEAD revision so you can check out a development snapshot and give it a whirl.

In my next post I will be showing how we can use the constraint kinds extension to describe abstract structures from category theory in Haskell that are defined upon subcategories. I already have a draft note about this (edit July 2012: the draft note has been subsumed by my TFP 2012 submission) if you can’t wait!

 

References


[1] Orchard, D., Schrijvers, T.: Haskell Type Constraints Unleashed, FLOPS 2010 [author’s copy with corrections] [On SpringerLink]

[2] Carette, J., Kiselyov, O., Shan, C. C.: Finally Tagless, Partially Evaluated, APLAS 2007


by dorchard at January 28, 2015 12:16 PM

Don Stewart (dons)

Contracting Haskell development role at Standard Chartered

The Strats team at Standard Chartered is hiring a developer for a 1 year contracting role in London.

The role is to develop and extend our parsing and validation library for FpML, using the FpML Haskell library to parse and build financial product data into our internal Haskell data types. You will extend our library to support more products, as well as building testing and validation tools (e.g. in QuickCheck) to confirm the soundness and completeness of the implementation.

Ideally you have at least a year’s experience with Haskell. A recent computer science graduate with a few Haskell (or similar typed FP language) courses and perhaps some open source contributions would also be a good fit. Experience with Parsec-like combinator libraries and/or QuickCheck is an advantage.

You will join an expert team in London, and this is a great opportunity to gain experience using Haskell in production as part of a very experienced team. (We have more than 2 million lines of Haskell, and our own Haskell compiler).

The role requires physical presence on the trading floor in London. Remote work isn’t an option, though some work from home may be possible. No financial background is required.

More info about our development process is in the 2012 PADL keynote, and a 2013 HaskellCast interview.

If this sounds exciting to you, please send CVs to me – donald.stewart @ sc.com


Tagged: jobs, london

by Don Stewart at January 28, 2015 11:44 AM

January 27, 2015

The GHC Team

GHC Weekly News - 2015/01/27

Hi *,

It's time for some GHC Weekly news!

  • Austin took time this past week to check `./validate --slow` failures, and is planning on filing bugs and fixes for the remaining failures soon. Afterwards, we'll immediately begin enabling --slow on Phabricator, so developers get their patches tested more thoroughly.
  • The 7.10 release looks like it will likely not have a 3rd Release Candidate, and will be released in late February 2015, as we originally expected.
  • The 7.10 branch currently has two show-stopping bugs we plan on fixing before the final release. And we'd really like users to test it so we can catch more!
  • Austin Seipp will likely be gone for the coming week on a trip to New York City from the 28th to the 4th, meaning (much to the dismay of cheering crowds) you'd better catch him beforehand if you need him! (Alternatively, Austin may be held back by an intense snowstorm developing in NYC. So, we'll see!)
  • Austin is planning on helping improve the LLVM support in HEAD soon; after coordinating with Ben Gamari, we're hoping to ship GHC 7.12 with (at least) LLVM 3.6 as an officially supported backend, based on the documentation described in https://ghc.haskell.org/trac/ghc/wiki/ImprovedLLVMBackend - lots of thanks to Ben for working with upstream to file bugs and improve things!

And in other news, through chatter on the mailing list and Phabricator, we have:

  • Peter Trommler posted his first version of a native Linux/PowerPC 64-bit code generator! There's still a lot more work to do, but this is a significantly improved situation over the unregisterised C backend. Curious developers can see the patch at Phab:D629.
  • A long, ongoing thread started by Richard Eisenberg about the long-term plans for the vectorisation code has been posted. The worry is that the vectoriser as well as DPH have stagnated in development, which costs other developers time whenever they need to build GHC, make larger changes, or keep code clean. There have been a lot of varied proposals in the thread, from removing the code to commenting it out to keeping it. It's unclear what the future holds, but the discussion still rages on. https://www.haskell.org/pipermail/ghc-devs/2015-January/007986.html
  • Alexander Vershilov made a proposal to the GHC team: can we remove the transformers dependency? It turns out to be a rather painful dependency for users of the GHC API and of packages depending on transformers, as you cannot link against any version other than the one GHC ships. The alternative proposal involves splitting off the transformers dependency into a package of orphan instances. The final decision isn't clear yet, nor is a winner in sight! https://www.haskell.org/pipermail/ghc-devs/2015-January/008058.html
  • Simon Marlow has started a long thread about the fate of records in future GHC versions. Previously, Adam Gundry had worked on OverloadedRecordFields. And now Nikita Volkov has introduced his records library which sits in a slightly different spot in the design space. But now the question is - how do we proceed? Despite all prior historical precedent, it looks like there's finally some convergence on a reasonable design that can hit GHC in the future. https://www.haskell.org/pipermail/ghc-devs/2015-January/008049.html

Closed tickets the past two weeks include: #9889, #9384, #8624, #9922, #9878, #9999, #9957, #7298, #9836, #10008, #9856, #9975, #10013, #9949, #9953, #9856, #9955, #9867, #10015, #9961, #5364, #9928, and #10028.

by thoughtpolice at January 27, 2015 09:03 PM


Noam Lewis

SJS is now known as Inferno

Inferno is a type inference engine for a safe subset of JS. It was formerly known as SJS.

Happy happy joy joy. The poll results are in. The interwebz has said its word, and that word is Inferno.

Inferno can stand for “Inference, Noam!” but who cares.

Here are the top results for those of you waiting for something to compile:

9 inferno
8 Safe JS (current name)
8 infer.js
7 saner.js
6 type.js
6 honest
6 antic (A Noam’s Type Inference and Checker :)
5 sane.js
5 entypen
4 inferity
4 entype
4 confer

 


by sinelaw at January 27, 2015 07:51 AM

Neil Mitchell

Hoogle 5 is coming

Summary: I'm working on Hoogle 5. If you like unfinished software, you can try it.

For the last month I've been working on the next version of Hoogle, version 5. It isn't finished, and it isn't the "official" version of Hoogle yet, but it's online, you can try it out, and anyone who wants to hack on Hoogle (or things like hoogle-index) should probably take a look now.

How do I try it?

The purpose of this blog post isn't to solicit beta testers, it's not ready for that - I'm mostly reaching out to Hoogle developers. That said, I know some people will want to try it, so it's online at hoogle.haskell.org. Beware, it isn't finished, and haskell.org/hoogle remains the "official" version to use (but hey, use whichever you want, or both, or neither).

What's new for users of the website?

  • It isn't finished. In particular, type search hasn't been implemented. But there are also lots of other pieces in all the corners that don't work properly.
  • It searches all the packages on Stackage by default.
  • There is a drop-down to allow you to restrict focus to only a single package, or an author, or a tag.

What's new for everyone else?

  • There is no library, it's not on Hackage and the command line tool isn't designed for users. These things will be coming over time.
  • It's hosted on a haskell.org virtual machine, served through Warp rather than as a CGI program. Thanks to the Haskell Infrastructure team for all their help. As a result, I'm free to experiment with Hoogle without accidentally making the Haskell homepage say "moo".
  • Generating a database for all of Stackage (excluding download time) takes about 40s and uses less than 1GB of memory. The databases are built directly out of the .tar.gz files, without unpacking them.

What's next?

Hoogle 5 is a complete rewrite of Hoogle, stealing bits as they were useful. At the moment Hoogle 4 is hosted at https://github.com/ndmitchell/hoogle while version 5 is at https://github.com/ndmitchell/hogle. The next step is to rename and move github repos so they are sensible once again. My best guess is that I should rename hoogle to hoogle4, then rename hogle to hoogle and move the issue tickets. I'm open to other suggestions.

Once that's resolved, I need to continue fixing and improving Hoogle so all the things that aren't finished become finished. If you are interested in helping, I recommend you create a github ticket describing what you want to do and we can take it from there.

by Neil Mitchell ([email protected]) at January 27, 2015 07:32 AM

Ian Ross

Non-diffusive atmospheric flow #8: flow pattern distribution


Up to this point, all the analysis that we’ve done has been what might be called “normal”, or “pedestrian” (or even “boring”). In climate data analysis, you almost always need to do some sort of spatial and temporal subsetting and you very often do some sort of anomaly processing. And everyone does PCA! So there’s not really been anything to get excited about yet.

Now that we have our PCA-transformed Z500 anomalies though, we can start to do some more interesting things. In this article, we’re going to look at how we can use the new representation of atmospheric flow patterns offered by the PCA eigenpatterns to reduce the dimensionality of our data, making it much easier to handle. We’ll then look at our data in an interesting geometrical way that allows us to focus on the patterns of flow while ignoring the strengths of different flows, i.e. we’ll be treating strong and weak blocking events as being the same, and strong and weak “normal” flow patterns as being the same. This simplification of things will allow us to do some statistics with our data to get an idea of whether there are statistically significant (in a sense we’ll define) flow patterns visible in our data.

Dimensionality reduction by projection to a sphere

First let’s think about how we might use the PCA-transformed data we generated in the previous article – we know that the PCA eigenpatterns are the Z500 anomaly patterns that explain the biggest fraction of the total Z500 anomaly variance: the first PCA eigenpattern is the pattern with the biggest variance, the second is the pattern orthogonal to the first with the biggest variance, the third is the pattern orthogonal to the first two patterns with the biggest variance, and so on.

In order to go from the 72 × 15 = 1080-dimensional Z500 anomaly data to something that’s easier to handle, both in terms of manipulation and in terms of visualisation, we’re going to take the seemingly radical step of discarding everything but the first three PCA eigenpatterns. Why three? Well, for one thing, three dimensions is about all we can hope to visualise. The first three PCA eigenpatterns respectively explain 8.86%, 7.46% and 6.27% of the total Z500 anomaly variance, for a total of 22.59%. That doesn’t sound like a very large fraction of the overall variance, but it’s important to remember that we’re interested in large-scale features of the flow here, and most of the variance in the later PCA eigenpatterns is small-scale “weather”, which we’d like to suppress from our analysis anyway.

Let’s make it quite clear what we’re doing here. At each time step, we have a 72 × 15 map of Z500 anomaly values. We transform this map into the PCA basis, which doesn’t lose any information, but then we truncate the vector of PCA projected components, retaining only the first three. So instead of having 72 × 15 = 1080 numbers (i.e. the individual grid-point Z500 anomaly values), we have just three numbers, the first three PCA projected components for the time step. We can thus think of our reduced dimensionality Z500 anomaly “map” as a point in three-dimensional space.

Three dimensions is still a little tricky to visualise, so we’re going to do something slightly sneaky. We’re going to take the data points in the three-dimensional space spanned by the first three PCA components, and we’re going to project those three-dimensional points onto the unit sphere. This yields two-dimensional points that we can represent by standard polar coordinates – if we think of a coordinate system where the x-, y- and z-axes lie respectively along the directions of the e1, e2 and e3 components in our three-dimensional space, then the standard colatitude θ and longitude ϕ are suitable coordinates to represent our data points. Because there’s real potential for confusion here, I’m going to be careful from now on to talk about coordinates on this “PCA projection sphere” only as θ or ϕ, reserving the words “latitude” and “longitude” for spatial points on the real Earth in the original Z500 data.

The following figures show how this works. Figure 1 shows our three-dimensional points, colour coded by distance from the origin. In fact, we normalise each of the PC components by the standard deviation of the whole set of PC component values – recall that we are using normalised PCA eigenpatterns here, so that the PCA projected component time series carry the units of Z500. That means that for the purposes of looking at patterns of Z500 variability, it makes sense to normalise the projected component time series somehow. Figure 2 shows the three-dimensional points with the unit sphere in this three-dimensional space, and Figure 3 shows each of the original three-dimensional points projected onto this unit sphere. We can then represent each Z500 pattern as a point on this sphere, parameterised by θ and ϕ, the angular components of the usual spherical coordinates in this three-dimensional space.

Original data points.

Data points with projection sphere.

Points projected onto unit sphere.
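
A small sketch of the projection step (assuming the three normalised PC components for a time step are already in hand; the function name is illustrative, not the one in my code):

-- Project a point in the three-dimensional PCA space onto the unit sphere,
-- returning the colatitude theta (measured from the e3 axis) and the azimuth
-- phi (in the e1-e2 plane), both in radians.
toSphere :: (Double, Double, Double) -> (Double, Double)
toSphere (x, y, z) = (theta, phi)
  where
    r     = sqrt (x * x + y * y + z * z)
    theta = acos (z / r)
    phi   = atan2 y x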


Kernel density estimation

Once we’ve projected our PCA data onto the unit sphere as described above, we can look at the distribution of data points in terms of the polar coordinates θ and ϕ:


Projected points


Note that we’re doing two kinds of projection here: first we’re transforming the original Z500 data (a time series of spatial maps) to the PCA basis (a time series of PCA projected components, along with the PCA eigenpatterns), then we’re taking the first three PCA components, normalising them and projecting to the unit sphere in the space spanned by the first three PCA eigenpatterns. Each point in the plot above thus represents a single spatial configuration of Z500 at a single point in time.

Looking at this plot, it’s not all that clear whether there exist spatial patterns of Z500 that are more common than others. There’s definitely some clustering of points in some areas of the plot, but it’s quite hard to assess because of the distortion introduced by plotting the spherical points on this kind of rectangular plot. What we’d really like is a continuous probability distribution where we can see regions of higher density of Z500 points as “bumps” in the distribution.

We’re going to use a method called kernel density estimation (KDE) to get at such a continuous distribution. As well as being better for plotting and identifying interesting patterns in the data, this will also turn out to give us a convenient route to use to determine the statistical significance of the “bumps” that we find.

We’ll start by looking at how KDE works in a simple one-dimensional case. Then we’ll write some Haskell code to do KDE on the two-dimensional sphere onto which we’ve projected our data. This is a slightly unusual use of KDE, but it turns out not to be much harder to do than the more “normal” cases.

The basic idea of KDE is pretty simple to explain. Suppose we have a sample of one-dimensional data points {x_i} for i = 1, …, N. We can think of this sample as defining a probability distribution for x as

p(x) = \frac{1}{N} \sum_{i=1}^{N} \delta(x - x_i) \qquad (1)

where δ(x) is the usual Dirac δ-function. What (1) is saying is that our data defines a PDF with a little probability mass (of weight 1/N) concentrated at each data point. In kernel density estimation, all that we do is to replace the function δ(x) by a kernel function K(x), where we require that

\int K(x) \, dx = 1.

The intuition here is pretty clear: the δ-functions in (1) imply that our knowledge of the x_i is perfect, so replacing the δ-functions with a “smeared out” kernel K(x) represents a lack of perfect knowledge of the x_i. This has the effect of smoothing the “spikey” δ-function to give something closer to a continuous probability density function.

This is (of course!) a gross simplification. Density estimation is a complicated subject – if you’re interested in the details, the (very interesting) book by Bernard Silverman is required reading.

So, we’re going to estimate a probability density function as

p(x) = \frac{1}{N} \sum_{i=1}^{N} K(x - x_i), \qquad (2)

which raises the fundamental question of what to use for the kernel, K(x)? This is more or less the whole content of the field of density estimation. We’re not going to spend a lot of time talking about the possible choices here because that would take us a bit off track, but what we have to choose basically comes down to two factors: what shape is the kernel and how “wide” is it?

A natural choice for the kernel in one-dimensional problems might be a Gaussian PDF, so that we would estimate our PDF as

p(x) = \frac{1}{N} \sum_{i=1}^{N} \phi(x - x_i, \sigma^2),

where ϕ(μ, σ²) is a Gaussian PDF with mean μ and standard deviation σ. Here, the standard deviation measures how “wide” the kernel is. In general, people talk about the “bandwidth” of a kernel, and usually write a kernel as something like K(x; h), where h is the bandwidth (which means something different for different types of kernel, but is generally supposed to be some sort of measure of how spread out the kernel is). In fact, in most cases, it turns out to be better (for various reasons: see Silverman’s book for the details) to choose a kernel with compact support. We’re going to use something called the Epanechnikov kernel:

K(u) = \frac{3}{4} (1 - u^2) \, \mathbf{1}\{|u| \le 1\}, \qquad (3)

where u = x/h and 1{|u| ≤ 1} is the indicator function for |u| ≤ 1, i.e. a function that is one for all points where |u| ≤ 1 and zero for all other points (this just ensures that our kernel has compact support and is everywhere non-negative).

This figure shows how KDE works in practice in a simple one-dimensional case: We randomly sample ten points from the range [1,9] (red impulses in the figure) and use an Epanechnikov kernel with bandwidth h=2 centred around each of the sample points. Summing the contributions from each of the kernels gives the thick black curve as an estimate of the probability density function from which the sample points were drawn.

The Haskell code to generate the data from which the figure is drawn looks like this (as usual, the code is in a Gist):

module Main where

import Prelude hiding (enumFromThenTo, length, map, mapM_, replicate, zipWith)
import Data.Vector hiding ((++))
import System.Random
import System.IO

-- Number of sample points.
n :: Int
n = 10

-- Kernel bandwidth.
h :: Double
h = 2.0

-- Ranges for data generation and output.
xgenmin, xgenmax, xmin, xmax :: Double
xgenmin = 1 ; xgenmax = 9
xmin = xgenmin - h ; xmax = xgenmax + h

-- Output step.
dx :: Double
dx = 0.01

main :: IO ()
main = do
  -- Generate sample points.
  samplexs <- replicateM n $ randomRIO (xgenmin, xgenmax)
  withFile "xs.dat" WriteMode $ \h -> forM_ samplexs (hPutStrLn h . show)
  let outxs = enumFromThenTo xmin (xmin + dx) xmax
      -- Calculate values for a single kernel.
      doone n h xs x0 = map (/ (fromIntegral n)) $ map (kernel h x0) xs
      -- Kernels for all sample points.
      kernels = map (doone n h outxs) samplexs
      -- Combined density estimate.
      pdf = foldl1' (zipWith (+)) kernels
      pr h x s = hPutStrLn h $ (show x) ++ " " ++ (show s)
      kpr h k = do
        zipWithM_ (pr h) outxs k
        hPutStrLn h ""
  withFile "kernels.dat" WriteMode $ \h -> mapM_ (kpr h) kernels
  withFile "kde.dat" WriteMode $ \h -> zipWithM_ (pr h) outxs pdf


-- Epanechnikov kernel.
kernel :: Double -> Double -> Double -> Double
kernel h x0 x
  | abs u <= 1 = 0.75 * (1 - u^2)
  | otherwise = 0
  where u = (x - x0) / h

The density estimation part of the code is basically a direct transcription of (2) and (3) into Haskell. We have to choose a resolution at which we want to output samples from the PDF (the value dx in the code) and we have to use that to generate x values to output the PDF (outxs in the code), but once we’ve done that, it’s just a matter of calculating the values of kernels centred on our random sample points for each of the output points, then combining the kernels to get the final density estimate. We’re going to do essentially the same thing in our spherical PDF case, although we have to think a little bit about the geometry of the problem, since we’re going to need a two-dimensional version of the Epanechnikov kernel on the unit sphere to replace (3). That’s just a detail though, and conceptually the calculation we’re going to do is identical to the one-dimensional example.

Calculating and visualising the spherical PDF

The code to calculate our spherical PDF using kernel density estimation is in the linked file. We follow essentially the same approach that we used in the one-dimensional case described above, except that we use a two-dimensional Epanechnikov kernel defined in terms of angular distances between points on the unit sphere. We calculate PDF values on a grid of points g_ij, where i and j label the θ and ϕ coordinate directions of our N_θ × N_ϕ grid on the unit sphere, so that i = 1, …, N_θ and j = 1, …, N_ϕ. Given a set of data points x_k, k = 1, …, N, we define the angular distance between a grid point and a data point as

\delta_{ij,k} = \cos^{-1} (\hat{g}_{ij} \cdot \hat{x}_k),

where â is a unit vector in the direction of a vector a (this is the distance function in the code).

We can then define an angular Epanechnikov kernel as

K(\delta) = A (1 - u^2) \, \mathbf{1}\{u \le 1\},

where u = δ/h for a bandwidth h (which is an angular bandwidth here) and where A is a normalisation constant. The inband function in the code calculates an unnormalised version of this kernel for all grid points closer to a given data point than the specified bandwidth. We accumulate these unnormalised kernel values for all grid points using the hmatrix build function.
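
A rough sketch of these two ingredients in isolation (the real code works over the hmatrix grid as described; the function names here are illustrative, not the ones in the linked file):

-- Angular distance between a grid point and a data point, both given as
-- unit 3-vectors; the dot product is clamped to guard against rounding error.
angularDistance :: (Double, Double, Double) -> (Double, Double, Double) -> Double
angularDistance (gx, gy, gz) (xx, xy, xz) = acos (max (-1) (min 1 d))
  where d = gx * xx + gy * xy + gz * xz

-- Unnormalised angular Epanechnikov kernel with (angular) bandwidth h.
angularKernel :: Double -> Double -> Double
angularKernel h delta
  | u <= 1    = 1 - u * u
  | otherwise = 0
  where u = delta / h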

To make life a little simpler, we deal with normalisation of the spherical PDF after we accumulate all of the kernel values, calculating an overall normalisation constant from the unnormalised PDF p_u(θ, ϕ) as

C = \int_{S^2} p_u(\theta, \phi) \, \sin\theta \, d\theta \, d\phi

(the value int in the code) and dividing all of the accumulated kernel values by this normalisation constant to give the final normalised spherical PDF (called norm in the code).
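
The normalisation integral can be approximated on the grid itself; a midpoint-rule sketch (again with illustrative names, not the ones in the linked code):

-- Approximate C by summing p_u(theta, phi) sin(theta) dtheta dphi over the
-- N_theta x N_phi grid, with theta taken at cell midpoints.
normalisationConstant :: Int -> Int -> (Int -> Int -> Double) -> Double
normalisationConstant ntheta nphi pU =
  sum [ pU i j * sin (theta i) * dtheta * dphi
      | i <- [0 .. ntheta - 1], j <- [0 .. nphi - 1] ]
  where
    dtheta  = pi / fromIntegral ntheta
    dphi    = 2 * pi / fromIntegral nphi
    theta i = (fromIntegral i + 0.5) * dtheta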

Most of the rest of the code is then concerned with writing the results to a NetCDF file for further processing.

The spherical PDF that we get from this KDE calculation is shown in the next plot, parametrised by spherical polar coordinates θ and ϕ: darker colours show regions of greater probability density.


Spherical PDF


We can now see quite clearly that the distribution of spatial patterns of Z500 in (θ,ϕ) space does appear to be non-uniform, with some preferred regions and some less preferred regions. In the next article but one, we’ll address the question of the statistical significance of these apparent “bumps” in the PDF, before we go on to look at what sort of flow regimes the preferred regions of (θ,ϕ) space represent. (Four of the more prominent “bumps” are labelled in the figure for reference later on.)

Before that though, in the next article we’ll do some optimisation of the spherical PDF KDE code so that it’s fast enough to use for the sampling-based significance testing approach we’re going to follow.

<script src="http://zor.livefyre.com/wjs/v3.0/javascripts/livefyre.js" type="text/javascript"></script> <script type="text/javascript"> (function () { var articleId = fyre.conv.load.makeArticleId(null); fyre.conv.load({}, [{ el: 'livefyre-comments', network: "livefyre.com", siteId: "290329", articleId: articleId, signed: false, collectionMeta: { articleId: articleId, url: fyre.conv.load.makeCollectionUrl(), } }], function() {}); }()); </script>

January 27, 2015 07:09 AM

Danny Gratzer

Observations about -XStaticPointers

Posted on January 27, 2015
Tags: haskell, types

For those who haven’t heard, GHC 7.10 is making a brave foray into the exciting world of distributed computing. To this end, it has gained a new language extension called -XStaticPointers to support Cloud Haskell in a pleasant, first-class manner.

If you haven’t heard of static pointers before now, it’s worth glancing through the nice tutorial from ocharles’ 24 days of $(Haskell Related Thing).

The long and short of it is that -XStaticPointers gives us this new keyword static. We apply static to an expression and if there are no closured variables (to be formalized momentarily) then we get back a StaticPtr a. This gives us a piece of data that we can serialize and ship over the wire because it has no dependencies.

Now to expand upon this “no closured variables”. A thing can only be fed to static if the free variables in the expression are all top level variables. This forbids us from writing something like

    foo :: StaticPtr a
    foo a = static a
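
By contrast, a body whose free variables are all top-level names is accepted (a sketch assuming GHC 7.10 with StaticPointers enabled and StaticPtr imported from GHC.StaticPtr):

    double :: Int -> Int
    double x = x + x

    -- Fine: `double` is a top-level binding, so the body of `static` is closed.
    sDouble :: StaticPtr (Int -> Int)
    sDouble = static double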

Now in all honesty, I’m not super interested in Cloud Haskell. It’s not my area of expertise and I’m already terrified of trying to do things on one machine. What does interest me a lot though is this notion of having “I have no free variables” in the type of an expression. It’s an invariant we didn’t really have before in Haskell.

In fact, as I looked more closely it reminded me of something called box from modal logic.

A Quick Summary of Modal Logic

I’m not trying to give you a full understanding of modal logic, just a brief taste.

Modal logic extends our vanilla logic (in Haskell land this is constructive logic) with modalities. Modalities are little prefixes we tack on the front of a proposition to qualify its meaning slightly.

For example we might say something like

If it is possible that it is raining, then I will need an umbrella.

Here we used the modality possible to indicate we’re not assuming that it is raining, only that it’s conceivable that it is. Because I’m a witch and will melt in the rain, even the possibility of death raining from the sky will force me to pack my umbrella.

To formalize this a bit, we have our inductive definition of propositions

P = ⊥
  | ⊤
  | P ∧ P
  | P ∨ P
  | P ⇒ P
  | □ P

This is the syntax of a particular modal logic with one modality. Everything looks quite normal up until the last proposition form, which is the “box” modality applied to some proposition.

The box modality (the one we really care about for later) means “necessarily”. I almost think of it as a truthier truth if you can buy that. □ forbids us from using any hypotheses saying something like A is true inside of it. Since it represents a higher standard of proof we can’t use the weaker notion that A is true! The rule for creating a box looks like this to a first approximation

 • ⊢ A
———————
Γ ⊢ □ A

So in order to prove a box something under a set of assumptions Γ, we have to prove it assuming none of those assumptions. In fact, we find that this is a slightly overly restrictive form for this judgment: if we have a □ A we know it was proved without assumptions, so when we have to introduce a □ B we should be able to use the assumption that A is true in that proof, because we could always construct an A without any assumptions and just copy-paste that proof in.

This causes us to create a second context, one of the hypotheses that A is valid, usually notated with a Δ. We then get the rules

   Δ; • ⊢ A          Δ; Γ ⊢ A valid     A valid ∈ Δ
———————————————      ——————————————   ———————————————
Δ; Γ ⊢ □ A true       Δ; Γ ⊢ A true    Δ; Γ ⊢ A valid


Δ; Γ ⊢ □ A   Δ, A valid; Γ ⊢ B
——————————————————————————————
         Δ; Γ ⊢ B

What you should take away from these scary looking symbols is

  1. A valid is much stronger than A true
  2. Anything inside a □ can depend on valid stuff, but not true stuff
  3. □ A true is the same as A valid.

This presentation glosses over a fair amount; if you’re so inclined I’d suggest looking at Frank Pfenning’s lecture notes from his class entitled “Modal Logic”. These actually go at a reasonable pace and introduce the groundwork for someone who isn’t super familiar with logic.

Now that we’ve established that there is an interesting theoretical backing for modal logic, I’m going to drop it on the floor and look at what Haskell actually gives us.

That “Who Cares” Bit

Okay, so how does this pertain to StaticPtr? Well I noticed that just like how box drops hypotheses that are “merely true”, static drops variables that are introduced by our local context!

This made me think that perhaps StaticPtrs are a useful equivalent to the □ modality! This shouldn’t be terribly surprising for PL people; indeed the course I linked to above expressly mentions □ to notate “movable code”. What’s really exciting about this is that there are a lot more applications of □ than just movable code! We can use it to notate staged computation for example.

Alas however, it was not to be. Static pointers are missing one essential component that makes them unsuitable for being □, we can’t eliminate them properly. In modal logic, we have a rule that lets other boxes depend on the contents of some box. The elimination rule is much stronger than just “If you give me a □ A, I’ll give you an A” because it’s much harder to construct a □ A in the first place! It’s this asymmetry that makes static pointers not quite kosher. With static pointers there isn’t a notion that we can take one static pointer and use it in another.

For example, we can’t write

    applyS :: StaticPtr a -> StaticPtr (a -> b) -> StaticPtr b
    applyS sa sf = static (deRefStaticPtr sf (deRefStaticPtr sa))

My initial reaction was that -XStaticPointers is missing something, perhaps a notion of a “static pattern”. This would let us say something like

    applyS :: StaticPtr a -> StaticPtr (a -> b) -> StaticPtr b
    applyS sa sf =
      let static f = sf
          static a = sa
      in static (f a)

So this static pattern keyword would allow us to hoist a variable into the realm of things we’re allowed to leave free in a static pointer.

This makes sense from my point of view, but less so from that of Cloud Haskell. The whole point of static pointers is to show a computation is dependency free, after all; static patterns introduce a (limited) set of dependencies on the thunk that make our lives complicated. It’s not obvious to me how to desugar things so that static patterns can be compiled how we want them to be; it looks like it would require some runtime code generation, which is a no-no for Haskell.

My next thought was that maybe Closure was the answer, but that doesn’t actually work either! We can introduce a closure from an arbitrary serializable term which is exactly what we don’t want from a model of □! Remember, we want to model closed terms so allowing us to accept an arbitrary term defeats the point.

It’s painfully clear that StaticPtrs are very nearly □s, but not quite! Whatever Box ends up being, we’d want the following interface

    data Box a

    intoBox :: StaticPtr a -> Box a
    closeBox :: Box (a -> b) -> Box a -> Box b
    outBox :: Box a -> a

The key difference from StaticPtr is closeBox. Basically this gives us a way to say “I have something that’s closed except for one dependency” and we can fill that dependency with some other closed term.

This turns something like

    let static x = sx in static y

into

    intoBox (static (\x -> y)) `closeBox` intoBox sx

If you read the tutorial, you’ll notice that this is most of the implementation of Closure! Following our noses we define

    data Box a where
      Pure  :: StaticPtr a -> Box a
      Close :: Box (a -> b) -> Box a -> Box b

This is literally the dumbest implementation of Box I think is possible, but it actually works just fine.

    intoBox = Pure
    closeBox = Close

    outBox :: Box a -> a
    outBox (Pure a) = deRefStaticPtr a
    outBox (Close f a) = outBox f (outBox a)

which would seem to be modal logic in Haskell.
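
As a quick usage sketch of this interface (a hypothetical example assuming the definitions above, GHC 7.10 with StaticPointers enabled, and deRefStaticPtr imported from GHC.StaticPtr):

    addOne :: Int -> Int
    addOne = (+ 1)

    fortyOne :: Int
    fortyOne = 41

    -- Build a closed computation out of two closed pieces, then run it locally.
    boxed :: Box Int
    boxed = intoBox (static addOne) `closeBox` intoBox (static fortyOne)

    main :: IO ()
    main = print (outBox boxed)   -- prints 42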

Wrap Up

To be honest, I’m not sure yet how this is useful. I’m kinda swamped with coursework at the moment (new semester at CMU) but it seems like a new and fun thing to play with.

I’ve stuck the code at jozefg/modal if you want to play with it. Fair warning that it only compiles with GHC >= 7.10 because we need static pointers.

Finally, since the idea of modalities for sendable code is not a new one, I should leave these links

Cheers.

<script type="text/javascript"> var disqus_shortname = 'codeco'; (function() { var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js'; (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq); })(); </script> <noscript>Please enable JavaScript to view the comments powered by Disqus.</noscript> comments powered by Disqus

January 27, 2015 12:00 AM

January 25, 2015

Yesod Web Framework

PolyConf 2015

Last year I spoke at PolyConf about client/server Haskell webapps, and especially on GHCJS. This year's PolyConf has just opened up a Call for Proposals.

I really enjoyed my time at the conference last year: it’s a great place to meet and interact with people doing cool things in other languages, and can be a great source of inspiration for ways we can do things better. Also, as I’ve mentioned recently, I think it’s a great thing for us Haskellers to put out content accessible to the non-Haskell world, and a conference like this is a perfect way to do so.

If you have a topic that you think polyglot programmers would be interested in, please take the time to submit a proposal for the conference.

January 25, 2015 01:30 PM

Dominic Steinitz

A Type Safe Reverse or Some Hasochism

Conor McBride was not joking when he and his co-author entitled their paper about dependent typing in Haskell “Hasochism”: Lindley and McBride (2013).

In trying to resurrect the Haskell package yarr, it seemed that a dependently typed reverse function needed to be written. Writing such a function turns out to be far from straightforward. How GHC determines that a proof (program) discharges a proposition (type signature) is rather opaque and perhaps not surprisingly the error messages one gets if the proof is incorrect are far from easy to interpret.

I’d like to thank all the folk on StackOverflow whose answers and comments I have used freely below. Needless to say, any errors are entirely mine.

Here are two implementations, each starting from different axioms (NB: I have never seen type families referred to as axioms but it seems helpful to think about them in this way).

> {-# OPTIONS_GHC -Wall                     #-}
> {-# OPTIONS_GHC -fno-warn-name-shadowing  #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults   #-}
> {-# OPTIONS_GHC -fno-warn-unused-do-bind  #-}
> {-# OPTIONS_GHC -fno-warn-missing-methods #-}
> {-# OPTIONS_GHC -fno-warn-orphans         #-}
> {-# LANGUAGE GADTs                        #-}
> {-# LANGUAGE KindSignatures               #-}
> {-# LANGUAGE DataKinds                    #-}
> {-# LANGUAGE TypeFamilies                 #-}
> {-# LANGUAGE UndecidableInstances         #-}
> {-# LANGUAGE ExplicitForAll               #-}
> {-# LANGUAGE TypeOperators                #-}
> {-# LANGUAGE ScopedTypeVariables          #-}

For both implementations, we need propositional equality: if a :~: b is inhabited by some terminating value, then the type a is the same as the type b. Further, we need the normal form of an equality proof, Refl :: a :~: a, and a function, gcastWith, which allows us to use internal equality (:~:) to discharge a required proof of external equality (~). Readers familiar with topos theory (see, for example, Lambek and Scott (1988)) will note that the notation is reversed.

> import Data.Type.Equality ( (:~:) (Refl), gcastWith )

For the second of the two approaches adumbrated we will need

> import Data.Proxy

The usual natural numbers:

> data Nat = Z | S Nat

We need some axioms:

  1. A left unit and
  2. Restricted commutativity.

> type family (n :: Nat) :+ (m :: Nat) :: Nat where
>     Z   :+ m = m
>     S n :+ m = n :+ S m

We need the usual singleton for Nat to tie types and terms together.

> data SNat :: Nat -> * where
>   SZero :: SNat Z
>   SSucc :: SNat n -> SNat (S n)

Now we can prove some lemmas.

First a lemma showing we can push :+ inside a successor, S.

> succ_plus_id :: SNat n1 -> SNat n2 -> (((S n1) :+ n2) :~: (S (n1 :+ n2)))
> succ_plus_id SZero _ = Refl
> succ_plus_id (SSucc n) m = gcastWith (succ_plus_id n (SSucc m)) Refl

This looks nothing like a standard mathematical proof and it’s hard to know what GHC is doing under the covers, but presumably something like this:

  • For SZero
  1. S Z :+ n2 = Z :+ S n2 (by axiom 2) = S n2 (by axiom 1) and
  2. S (Z :+ n2) = S n2 (by axiom 1)
  3. So S Z :+ n2 = S (Z :+ n2)
  • For SSucc
  1. SSucc n :: SNat (S k) so n :: SNat k and m :: SNat i so SSucc m :: SNat (S i)
  2. succ_plus_id n (SSucc m) :: k ~ S p => S p :+ S i :~: S (p :+ S i) (by hypothesis)
  3. k ~ S p => S p :+ S i :~: S (S p :+ i) (by axiom 2)
  4. k :+ S i :~: S (k :+ i) (by substitution)
  5. S k :+ i :~: S (k :+ i) (by axiom 2)

Second a lemma showing that Z is also the right unit.

> plus_id_r :: SNat n -> ((n :+ Z) :~: n)
> plus_id_r SZero = Refl
> plus_id_r (SSucc n) = gcastWith (plus_id_r n) (succ_plus_id n SZero)

  • For SZero
  1. Z :+ Z = Z (by axiom 1)
  • For SSucc
  1. SSucc n :: SNat (S k) so n :: SNat k
  2. plus_id_r n :: k :+ Z :~: k (by hypothesis)
  3. succ_plus_id n SZero :: S k :+ Z :~: S (k :+ Z) (by the succ_plus_id lemma)
  4. succ_plus_id n SZero :: k :+ Z ~ k => S k :+ Z :~: S k (by substitution)
  5. plus_id_r n :: k :+ Z :~: k (by equation 2)

Now we can define vectors which have their lengths encoded in their type.

> infixr 4 :::
> data Vec a n where
>     Nil   :: Vec a Z
>     (:::) :: a -> Vec a n -> Vec a (S n)

We can prove a simple result using the lemma that Z is a right unit.

> elim0 :: SNat n -> (Vec a (n :+ Z) -> Vec a n)
> elim0 n x = gcastWith (plus_id_r n) x

Armed with this we can write an O(n) reverse function in which the length of the result is guaranteed to be the same as the length of the argument.

> size :: Vec a n -> SNat n
> size Nil         = SZero
> size (_ ::: xs)  = SSucc $ size xs
> accrev :: Vec a n -> Vec a n
> accrev x = elim0 (size x) $ go Nil x where
>     go :: Vec a m -> Vec a n -> Vec a (n :+ m)
>     go acc  Nil       = acc
>     go acc (x ::: xs) = go (x ::: acc) xs
> toList :: Vec a n -> [a]
> toList  Nil       = []
> toList (x ::: xs) = x : toList xs
> test0 :: [String]
> test0 = toList $ accrev $ "a" ::: "b" ::: "c" ::: Nil
ghci> test0
  ["c","b","a"]

For an alternative approach, let us change the axioms slightly.

> type family (n1 :: Nat) + (n2 :: Nat) :: Nat where
>   Z + n2 = n2
>   (S n1) + n2 = S (n1 + n2)

Now the proof that Z is a right unit is more straightforward.

> plus_id_r1 :: SNat n -> ((n + Z) :~: n)
> plus_id_r1 SZero = Refl
> plus_id_r1 (SSucc n) = gcastWith (plus_id_r1 n) Refl

For the lemma showing we can push + inside a successor, S, we can use a Proxy.

> plus_succ_r1 :: SNat n1 -> Proxy n2 -> ((n1 + (S n2)) :~: (S (n1 + n2)))
> plus_succ_r1 SZero _ = Refl
> plus_succ_r1 (SSucc n1) proxy_n2 = gcastWith (plus_succ_r1 n1 proxy_n2) Refl

Now we can write our reverse function without having to calculate sizes.

> accrev1 :: Vec a n -> Vec a n
> accrev1 Nil = Nil
> accrev1 list = go SZero Nil list
>   where
>     go :: SNat n1 -> Vec a n1 -> Vec a n2 -> Vec a (n1 + n2)
>     go snat acc Nil = gcastWith (plus_id_r1 snat) acc
>     go snat acc (h ::: (t :: Vec a n3)) =
>       gcastWith (plus_succ_r1 snat (Proxy :: Proxy n3))
>                 (go (SSucc snat) (h ::: acc) t)
> test1 :: [String]
> test1 = toList $ accrev1 $ "a" ::: "b" ::: "c" ::: Nil
ghci> test1
  ["c","b","a"]

Bibliography

Lambek, J., and P.J. Scott. 1988. Introduction to Higher-Order Categorical Logic. Cambridge Studies in Advanced Mathematics. Cambridge University Press. http://books.google.co.uk/books?id=6PY_emBeGjUC.

Lindley, Sam, and Conor McBride. 2013. “Hasochism: The Pleasure and Pain of Dependently Typed Haskell Programming.” In Proceedings of the 2013 ACM SIGPLAN Symposium on Haskell, 81–92. Haskell ’13. New York, NY, USA: ACM. doi:10.1145/2503778.2503786.


by Dominic Steinitz at January 25, 2015 09:07 AM

Noam Lewis

Ok, Internet, I heard you. I need a new name for the JS type inferer/checker.

I knew a new name was needed for this JS type inferer/checker. Many people have pointed out that sweet.js uses the same short name (sjs). I could just stick with “Safe JS” but not make it short. Or I could pick a new name, to make it less boring.

Help me help you!

Please take this quick poll.

Thanks!


by sinelaw at January 25, 2015 08:50 AM

January 24, 2015

Christopher Done

My Haskell tooling wishlist

I spend a lot of my time on Haskell tooling, both for my hobbies and my job. Almost every project I work on sparks a desire for another piece of tooling. Much of the time, I’ll follow that wish and take a detour to implement that thing (Fay, structured-haskell-mode, and hindent are some Haskell-specific examples). But in the end it means less time working on the actual domain problem I’m interested in, so a while ago I intentionally placed a quota on the amount of time I can spend on this.

So this page will contain a list of things I’d work on if I had infinite spare time, and that I wish someone else would make. I’ll update it from time to time as ideas come to the fore.

These projects are non-trivial but are do-able by one person who has enough free time and motivation. There is a common theme among the projects listed: they are things that Haskell, more than most other well-known languages, is particularly well suited for, and yet we don’t have them as standard tools in the Haskell tool box. They should be!

An equational reasoning assistant

Equational reasoning lets you prove properties about your functions by following a simple substitution model to state that one term is equal to another. The approach I typically take is to expand and reduce until both sides of the equation are the same.

Here is an example. I have a data type, Consumer. Here is an instance of Functor:

instance Functor (Consumer s d) where
  fmap f (Consumer d p) =
    Consumer d
             (\s ->
                case p s of
                  (Failed e,s') -> (Failed e,s')
                  (Continued e,s') -> (Continued e,s')
                  (Succeeded a,s') ->
                    (Succeeded (f a),s'))

I want to prove that it is a law-abiding instance of Functor, which means proving that fmap id ≡ id. You don’t need to know anything about the Consumer type itself, just this implementation. Here are some very mechanical steps one can take to prove this:

id ≡ fmap id
   ≡ \(Consumer d p) ->
        Consumer d
         (\s ->
            case p s of
              (Failed e,s') -> (Failed e,s')
              (Continued e,s') -> (Continued e,s')
              (Succeeded a,s') -> (Succeeded (id a),s'))
   ≡ \(Consumer d p) ->
        Consumer d
         (\s ->
            case p s of
              (Failed e,s') -> (Failed e,s')
              (Continued e,s') -> (Continued e,s')
              (Succeeded a,s') -> (Succeeded a,s'))
   ≡ \(Consumer d p) ->
        Consumer d
         (\s -> p s)
   ≡ \(Consumer d p) ->
        Consumer d p
   ≡ id

So that’s:

  • Expand the fmap id into the instance’s implementation.
  • Reduce by applying the property that id x ≡ x.
  • Reason that if every branch of a case returns the original value of the case, then that whole case is an identity and can be dropped.
  • Eta-reduce.
  • Again, pattern-matching lambdas are just syntactic sugar for cases, so by the same rule this can be considered identity.
  • End up with what we wanted to prove: fmap id ≡ id

These are pretty mechanical steps. They’re also pretty laborious and error-prone. Of course, if you look at the first step, it’s pretty obvious the whole thing is an identity, but writing the steps out provides transformations that can be statically checked by a program. So it’s a good example, because it’s easily understandable and you can imagine proving something more complex would require a lot more steps and a lot more substitutions. Proof of identity for Applicative has substantially more steps, but is equally mechanical.

Wouldn’t it be nice if there was a tool which given some expression would do the following?

  • Suggest a list of in-place expansions.
  • Suggest a list of reductions based on a set of pre-defined rules (or axioms).

Then I could easily provide an interactive interface for this from Emacs.

In order to do expansion, you need the original source of the function name you want to expand. So in the case of id, that’s why I suggested stating an axiom (id a ≡ a) for this. Similarly, I could state the identity law for Monoids by saying mappend mempty a ≡ a, mappend a mempty ≡ a. I don’t necessarily need to expand the source of all functions. Usually just the ones I’m interested in.
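A rough sketch of how such axioms might be represented (this is entirely my own guess at an interface, using haskell-src-exts; the post doesn’t pin one down):

import Language.Haskell.Exts (Exp, ParseResult (..), parseExp)

-- A named, left-to-right rewrite rule, e.g. axiom "id" "id a" "a".
data Axiom = Axiom
  { axiomName :: String
  , axiomLHS  :: Exp
  , axiomRHS  :: Exp
  }

axiom :: String -> String -> String -> Axiom
axiom name lhs rhs = Axiom name (parse lhs) (parse rhs)
  where
    parse s = case parseExp s of
      ParseOk e       -> e
      ParseFailed _ m -> error m

-- The assistant's job would then be: given an expression, list every subterm
-- that an axiom's left-hand side matches, and offer the corresponding rewrites.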

Given such a system, for my example above, the program could actually perform all those steps automatically and spit out the steps so that I can read them if I choose, or otherwise accept that the proof was derived sensibly.

In fact, suppose I have my implementation again, and I state what must be satisfied by the equational process (and, perhaps, some axioms that might be helpful for doing it, but in this case our axioms are pretty standard), I might write it like this:

instance Functor (Consumer s d) where
  fmap f (Consumer d p) = ...
proof [|fmap id ≡ id :: Consumer s d a|]

This template-haskell macro proof would run the steps above and if the equivalence is satisfied, the program compiles. If not, it generates a compile error, showing the steps it performed and where it got stuck. TH has limitations, so it might require writing it another way.

Such a helpful tool would also encourage people (even newbies) to do more equational reasoning, which Haskell is often claimed to be good at, but which you don’t often see in evidence in codebases. In practice it isn’t a standard thing.

Promising work in this area:

Update 2015-01-25: Andrew Gill got back to me that HERMIT is the continuation of HERA. It seems that you can get inlining, general reduction and a whole bunch of case rewrites from this project. Check the KURE paper for the DSL used to do these rewrites; it looks pretty awesome. So if anyone’s thinking of working on this, I’d probably start with reading HERMIT.Shell or HERMIT.Plugin and see how to get it up and running. It’s a pity it has to work on Core, that’s a little sad, but as far as trade-offs go it’s not too bad. Doing proofs on things more complicated than Core might be hard anyway. It does mean you’ll probably want to make a rewrite that does a global variable replacement: x and y are a little easier to read than x0_6 and the like that you get in Core.

Catch for GHC

Ideally, we would never have inexhaustive patterns in Haskell. But a combination of an insufficient type system and people’s insistence on using partial functions leads to a library ecosystem full of potential landmines. Catch is a project by Neil Mitchell which considers how a function is called when determining whether its patterns are exhaustive or not. This lets us use things like head and actually have a formal proof that our use is correct, or a formal proof that our use, or someone else’s use, will possibly crash.

map head . group

This is an example which is always correct, because group returns a list of non-empty lists.
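To spell that out a little (my own illustration of the kinds of call sites such a checker would have to reason about, not Catch’s actual interface or output):

import Data.List (group)

-- Safe: group never produces an empty sublist, so every head here is
-- applied to a non-empty list. GHC's own checks cannot see this.
runHeads :: Eq a => [a] -> [a]
runHeads = map head . group

-- Potentially unsafe: nothing stops a caller passing [], so a Catch-style
-- checker would flag this use of head unless every call site is proven safe.
firstArg :: [String] -> String
firstArg = head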

Unfortunately, it currently only works with a defunct Haskell compiler, but apparently it can be ported to GHC Core with some work. I would very much like for someone to do that. This is yet another project which is the kind of thing people claim is possible thanks to Haskell’s unique properties, but which in practice isn’t a standard thing in the way that QuickCheck is.

A substitution stepper

This is semi-related to, but different from, the proof assistant. I would like a program which can accept a Haskell module of source code and an expression to evaluate in the context of that module, and output the same expression, as valid source code, with a single evaluation step performed. This would be fantastic for writing new algorithms, for understanding existing functions and algorithms, for writing proofs, and for learning Haskell. There was something like this demonstrated in Inventing on Principle. The opportunities for education and general development practice make such a project worthwhile.

Note: A debugger stepper is not the same thing.

Example:

foldr (+) 0 [1, 2, 3, 4]

foldr (+) 0 (1 : [2, 3, 4])

1 + foldr (+) 0 [2, 3, 4]

1 + foldr (+) 0 (2 : [3, 4])

1 + (2 + foldr (+) 0 [3, 4])

1 + (2 + foldr (+) 0 (3 : [4]))

1 + (2 + (3 + foldr (+) 0 [4]))

1 + (2 + (3 + foldr (+) 0 (4 : [])))

1 + (2 + (3 + (4 + foldr (+) 0 [])))

1 + (2 + (3 + (4 + 0)))

1 + (2 + (3 + 4))

1 + (2 + 7)

1 + 9

10

Comparing this with foldl immediately shows the viewer how they differ in structure:

foldl (+) 0 [1, 2, 3, 4]

foldl (+) 0 (1 : [2, 3, 4])

foldl (+) ((+) 0 1) [2, 3, 4]

foldl (+) ((+) 0 1) (2 : [3, 4])

foldl (+) ((+) ((+) 0 1) 2) [3, 4]

foldl (+) ((+) ((+) 0 1) 2) (3 : [4])

foldl (+) ((+) ((+) ((+) 0 1) 2) 3) [4]

foldl (+) ((+) ((+) ((+) 0 1) 2) 3) (4 : [])

foldl (+) ((+) ((+) ((+) ((+) 0 1) 2) 3) 4) []

(+) ((+) ((+) ((+) 0 1) 2) 3) 4

1 + 2 + 3 + 4

3 + 3 + 4

6 + 4

10

Each step in this is a valid Haskell program, and it’s just simple substitution.
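As a toy illustration of the idea (my own sketch over a tiny expression type; the real tool would have to work over full Haskell source):

-- One substitution step over a tiny expression language; every element
-- of steps is itself a valid, printable expression.
data Expr = Lit Int | Add Expr Expr deriving Show

step :: Expr -> Maybe Expr
step (Lit _)               = Nothing             -- already a value
step (Add (Lit a) (Lit b)) = Just (Lit (a + b))  -- reduce the redex
step (Add l r)             = case step l of
  Just l' -> Just (Add l' r)                     -- step inside the left operand
  Nothing -> Add l <$> step r                    -- otherwise step inside the right

steps :: Expr -> [Expr]
steps e = e : maybe [] steps (step e)

-- ghci> mapM_ print (steps (Add (Lit 1) (Add (Lit 2) (Lit 3))))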

If the source for a function isn’t available, there are a couple options for what to do:

  • Have special-cases for things like (+), as above.
  • Just perform no substitution for that function; it will still be a legitimate program.

It’s another project I could easily provide see-as-you-type support for in Emacs, given an engine to query.

Again, this is just one more project which should just be a standard thing Haskell can do. It’s a pure language. It’s used to teach equational reasoning and following a simple lambda calculus substitution model. But there is no such tool. Haskell is practically waving in our faces with this opportunity.

Existing work in this area:

  • stepeval - a prototype which nicely demonstrates the idea. It’s based on HSE and only supports a tiny subset. There aren’t any plans to move this forward at the moment. I’ll update the page if this changes.

January 24, 2015 12:00 AM

January 23, 2015

JP Moresmau

Writing a low level graph database

I've been interested in graph databases for a long time, and I've developed several applications that offer an API close enough to a graph API, but with relational storage. I've also played with off the shelf graph databases, but I thought it would be fun to try an implement my own, in Haskell of course.
I found that, in general, the literature is quite succinct on how database products manage their physical storage, so I've used some of the ideas behind the Neo4J database, as explained in the Graph Databases book and in a few slideshows online.
So I've written the start of a very low level graph database, writing directly on disk via Handles and some Binary instances. I try to use fixed length records so that their IDs translate easily into offsets in the file. Mostly everything ends up looking like linked lists on disk: vertices have a pointer to their first property and their first edge, and in turn these have pointers to the next property or edge. Vertices have pointers to the edges linking to and from them.
I've also had some fun trying to implement an index trie on disk.
All in all, it's quite fun, even though I realize my implementations are quite naive, and I just hope that the OS disk caching is enough to make performance acceptable. I've written a small benchmark using the Hackage graph of packages as sample data, but I would need to write the same with a relational backend.

If anybody is interested in looking at the code or even participate, everything is of course on Github!

by JP Moresmau ([email protected]) at January 23, 2015 04:10 PM

January 22, 2015

FP Complete

Commercial Haskell Special Interest Group

At FP Complete, we’re constantly striving to improve the quality of the Haskell ecosystem, with a strong emphasis on making Haskell a viable tool for commercial users. Over the past few years we’ve spoken with many companies either currently using Haskell or considering doing so, worked with a number of customers in making Haskell a reality for their software projects, and released tooling and libraries to the community.

We’re also aware that we’re not the only company trying to make Haskell a success, and that others are working on similar projects to our own. We believe that there’s quite a lot of room to collaborate on identifying problems, discussing options, and creating solutions.

Together with a few other companies and individuals, we are happy to announce the launch of a Commercial Haskell Special Interest Group.

If you're interested in using Haskell in a commercial context, please join the mailing list. I know that we have some projects we think are worth immediate collaboration, and we'll kick off discussions on those after people have time to join the mailing list. And I'm sure many others have ideas too. I look forward to hearing them!

January 22, 2015 10:00 PM

Magnus Therning

Combining inputs in conduit

The code from the post on using my simplistic state machine together with conduit will happily wait forever for input, and the only way to terminate it is pressing Ctrl-C. In this series I’m writing a simple adder machine, but my actual use case for these ideas is protocols for communication (see the first post), and waiting forever isn’t such a good thing; I want my machine to time out if input doesn’t arrive in a timely fashion.

I can think of a few ways to achieve this:

  1. Use a watchdog thread that is signalled from the main thread, e.g. by a Conduit that sends a signal as each Char passes through it.
  2. As each Char is read kick off a “timeout thread” which throws an exception back to the main thread, unless the main thread kills it before the timeout expires.
  3. Run a thread that creates ticks that then can be combined with the Chars read and fed into the state machine itself.

Since this is all about state machines I’ve opted for option 3. Furthermore, since I’ve recently finished reading the excellent Parallel and Concurrent Programming in Haskell I decided to attempt writing the conduit code myself instead of using something like stm-conduit.

The idea is to write two functions:

  1. one Source to combine two Sources, and
  2. one Sink that writes its input into a TMVar.

The latter of the two is the easiest one. Given a TMVar it just awaits input, stores it in the TMVar and then calls itself:

sinkTMVar :: MonadIO m => TMVar a -> Sink a m ()
sinkTMVar tmv = forever $ do
    v <- await
    case v of
        Nothing -> return ()
        Just v' -> liftIO (atomically $ putTMVar tmv v')

The other one is only slightly more involved:

whyTMVar :: MonadIO m => Source (ResourceT IO) a -> Source (ResourceT IO) a -> Source m a
whyTMVar src1 src2 = do
    t1 <- liftIO newEmptyTMVarIO 
    t2 <- liftIO newEmptyTMVarIO
    void $ liftIO $ async $ fstProc t1
    void $ liftIO $ async $ sndProc t2
    forever $ liftIO (atomically $ takeTMVar t1 `orElse` takeTMVar t2) >>= C.yield
    
    where
        fstProc t = runResourceT $ src1 $$ sinkTMVar t
        sndProc t = runResourceT $ src2 $$ sinkTMVar t

Rather short and sweet I think. However, there are a few things that I’m not completely sure of yet.

forkIO vs. async vs. resourceForkIO
There is a choice between at least three functions when it comes to creating the threads and I haven’t looked into which one is better to use. AFAIU there may be issues around exception handling and with resources. For now I’ve gone with async for no particular reason at all.
Using TMVar
In this example the input arrives rather slowly, which means having room for a single piece at a time is enough. If the use case changes and input arrives quicker then this decision has to be revisited. I’d probably choose to use stm-conduit in that case since it uses TMChan internally.
Combining only two Sources
Again, this use case doesn’t call for more than two Sources, at least at this point. If the need arises for using more Sources I’ll switch to stm-conduit since it already defines a function to combine a list of Sources.

The next step will be to modify the conduit process and the state machine.

January 22, 2015 12:00 AM

January 21, 2015

mightybyte

LTMT Part 3: The Monad Cookbook

Introduction

The previous two posts in my Less Traveled Monad Tutorial series have not had much in the way of directly practical content. In other words, if you only read those posts and nothing else about monads, you probably wouldn't be able to use monads in real code. This was intentional because I felt that the practical stuff (like do notation) had adequate treatment in other resources. In this post I'm still not going to talk about the details of do notation--you should definitely read about that elsewhere--but I am going to talk about some of the most common things I have seen beginners struggle with and give you cookbook-style patterns that you can use to solve these issues.

Problem: Getting at the pure value inside the monad

This is perhaps the most common problem for Haskell newcomers. It usually manifests itself as something like this:

main = do
    lineList <- lines $ readFile "myfile.txt"
    -- ... do something with lineList here

That code generates the following error from GHC:

    Couldn't match type `IO String' with `[Char]'
    Expected type: String
      Actual type: IO String
    In the return type of a call of `readFile'

Many newcomers seem puzzled by this error message, but it tells you EXACTLY what the problem is. The return type of readFile has type IO String, but the thing that is expected in that spot is a String. (Note: String is a synonym for [Char].) The problem is, this isn't very helpful. You could understand that error completely and still not know how to solve the problem. First, let's look at the types involved.

readFile :: FilePath -> IO String
lines :: String -> [String]

Both of these functions are defined in Prelude. These two type signatures show the problem very clearly. readFile returns an IO String, but the lines function is expecting a String as its first argument. IO String != String. Somehow we need to extract the String out of the IO in order to pass it to the lines function. This is exactly what do notation was designed to help you with.

Solution #1

main :: IO ()
main = do
    contents <- readFile "myfile.txt"
    let lineList = lines contents
    -- ... do something with lineList here

This solution demonstrates two things about do notation. First, the left arrow lets you pull things out of the monad. Second, if you're not pulling something out of a monad, use "let foo =". One metaphor that might help you remember this is to think of "IO String" as a computation in the IO monad that returns a String. A do block lets you run these computations and assign names to the resulting pure values.

Solution #2

We could also attack the problem a different way. Instead of pulling the result of readFile out of the monad, we can lift the lines function into the monad. The function we use to do that is called liftM.

liftM :: Monad m => (a -> b) -> m a -> m b
liftM :: Monad m => (a -> b) -> (m a -> m b)

The associativity of the -> operator is such that these two type signatures are equivalent. If you've ever heard Haskell people saying that all functions are single argument functions, this is what they are talking about. You can think of liftM as a function that takes one argument, a function (a -> b), and returns another function, a function (m a -> m b). When you think about it this way, you see that the liftM function converts a function of pure values into a function of monadic values. This is exactly what we were looking for.

main :: IO ()
main = do
    lineList <- liftM lines (readFile "myfile.txt")
    -- ... do something with lineList here

This is more concise than our previous solution, so in this simple example it is probably what we would use. But if we needed to use contents in more than one place, then the first solution would be better.
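As an aside, since IO is also a Functor, fmap does the same job as liftM here. A minimal variant (the mapM_ putStrLn line is just a stand-in for "do something with lineList"):

main :: IO ()
main = do
    lineList <- fmap lines (readFile "myfile.txt")
    mapM_ putStrLn lineList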

Problem: Making pure values monadic

Consider the following program:

import Control.Monad
import System.Environment
main :: IO ()
main = do
    args <- getArgs
    output <- case args of
                [] -> "cat: must specify some files"
                fs -> liftM concat (mapM readFile fs)
    putStrLn output

This program also has an error. GHC actually gives you three errors here because there's no way for it to know exactly what you meant. But the first error is the one we're interested in.

    Couldn't match type `[]' with `IO'
    Expected type: IO Char
      Actual type: [Char]
    In the expression: "cat: must specify some files"

Just like before, this error tells us exactly what's wrong. We're supposed to have an IO something, but we only have a String (remember, String is the same as [Char]). It's not convenient for us to get the pure result out of the readFile functions like we did before because of the structure of what we're trying to do. The two patterns in the case statement must have the same type, so that means that we need to somehow convert our String into an IO String. This is exactly what the return function is for.

Solution: return

return :: a -> m a

This type signature tells us that return takes any type a as input and returns "m a". So all we have to do is use the return function.

import Control.Monad
import System.Environment
main :: IO ()
main = do
    args <- getArgs
    output <- case args of
                [] -> return "cat: must specify some files"
                fs -> liftM concat (mapM readFile fs)
    putStrLn output

The 'm' that the return function wraps its argument in is determined by the context. In this case, main is in the IO monad, so that's what return uses.
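To see the context picking a different monad, here's a small example of my own (not from the post) where the surrounding type makes return mean Just:

halveEven :: Int -> Maybe Int
halveEven n
    | even n    = return (n `div` 2)  -- in Maybe, return is Just
    | otherwise = Nothing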

Problem: Chaining multiple monadic operations

import System.Environment
main :: IO ()
main = do
    [from,to] <- getArgs
    writeFile to $ readFile from

As you probably guessed, this function also has an error. Hopefully you have an idea of what it might be. It's the same problem of needing a pure value when we actually have a monadic one. You could solve it like we did in solution #1 on the first problem (you might want to go ahead and give that a try before reading further). But this particular case has a pattern that makes a different solution work nicely. Unlike the first problem, you can't use liftM here.

Solution: bind

When we used liftM, we had a pure function lines :: String -> [String]. But here we have writeFile :: FilePath -> String -> IO (). We've already supplied the first argument, so what we actually have is writeFile to :: String -> IO (). And again, readFile returns IO String instead of the pure String that we need. To solve this we can use another function that you've probably heard about when people talk about monads...the bind function.

(=<<) :: Monad m => (a -> m b) -> m a -> m b
(=<<) :: Monad m => (a -> m b) -> (m a -> m b)

Notice how the pattern here is different from the first example. In that example we had (a -> b) and we needed to convert it to (m a -> m b). Here we have (a -> m b) and we need to convert it to (m a -> m b). In other words, we're only adding an 'm' onto the 'a', which is exactly the pattern we need here. Here are the two patterns next to each other to show the correspondence.

writeFile to :: String -> IO ()
                     a ->  m b

From this we see that "writeFile to" is the first argument to the =<< function. readFile from :: IO String fits perfectly as the second argument to =<<, and then the return value is the result of the writeFile. It all fits together like this:

import System.Environment
main :: IO ()
main = do
    [from,to] <- getArgs
    writeFile to =<< readFile from

Some might point out that this third problem is really the same as the first problem. That is true, but I think it's useful to see the varying patterns laid out in this cookbook style so you can figure out what you need to use when you encounter these patterns as you're writing code. Everything I've said here can be discovered by carefully studying the Control.Monad module. There are lots of other convenience functions there that make working with monads easier. In fact, I already used one of them: mapM.

When you're first learning Haskell, I would recommend that you keep the documentation for Control.Monad close by at all times. Whenever you need to do something new involving monadic values, odds are good that there's a function in there to help you. I would not recommend spending 10 hours studying Control.Monad all at once. You'll probably be better off writing lots of code and referring to it whenever you think there should be an easier way to do what you want to do. Over time the patterns will sink in as you form new connections between different concepts in your brain.

It takes effort. Some people do pick these things up more quickly than others, but I don't know anyone who just read through Control.Monad and then immediately had a working knowledge of everything in there. The patterns you're grappling with here will almost definitely be foreign to you because no other mainstream language enforces this distinction between pure values and side effecting values. But I think the payoff of being able to separate pure and impure code is well worth the effort.

by mightybyte ([email protected]) at January 21, 2015 05:09 PM

January 20, 2015

Robin KAY

HsQML 0.3.3.0 released: Control those Contexts

Happy New Year! Another year and another new release of HsQML is out, the Haskell binding to the Qt Quick framework that's kind to your skin. As usual, it's available for download from Hackage and immediate use adding a graphical user-interface to your favourite Haskell program.

The major new feature in this release is the addition of the OpenGLContextControl QML item to the HsQML.Canvas module. Previously, the OpenGL canvas support introduced in 0.3.2.0 left programs at the mercy of Qt to configure the context on their behalf and there was no way to influence this process. That was a problem if you wanted to use the latest OpenGL features, because they require you to obtain a newfangled Core profile context whereas Qt appears to default to the Compatibility profile (or just plain OpenGL 2.x if that's all you have).

To use it, simply place an OpenGLContextControl item in your QML document inside the window you want to control and set the properties to the desired values. For example, the following snippet of code would request the system provide it with a context supporting at least the OpenGL 4.1 Core profile:

import HsQML.Canvas 1.0
...

OpenGLContextControl {
    majorVersion: 4;
    minorVersion: 1;
    contextType: OpenGLContextControl.OpenGL;
    contextProfile: OpenGLContextControl.CoreProfile;
}

The supported properties are all detailed in the Haddock documentation for the Canvas module. There's also a more sophisticated example in the corresponding new release of the hsqml-demo-samples package. This example, hsqml-opengl2, displays the current context settings and allows you to experiment with requesting different values.

This graphics chip-set has seen better days.

Also new in this release: i) the defSignalNamedParams function allows you to give names to your signal parameters and ii) the EngineConfig record has been extended to allow setting additional search paths for QML modules and native plugins.

The first point is an interesting one because, harking back, my old blog post on the Connections item doesn't actually demonstrate passing parameters to the signal handler, and that's because you couldn't ordinarily. You could connect a function to the signal manually using the connect() method in QML code and access arguments positionally that way, or write the handler to index into the arguments array for its parameters if you were willing to stoop that low. Now, you can give the parameters names and they will automatically be available in the handler's scope.

Finally, the Template Haskell shims inside Setup.hs have been extended to support the latest version of the Cabal API shipping with version 1.22. The Template-free SetupNoTH.hs remains, supporting 1.18 ≤ n < 1.22, and will continue to do so at least until Debian upgrades their Cabal package. Setup.hs will now try to set QT_SELECT if you're running a recent enough version of GHC to support setting environment variables, and this can prevent some problems with qtchooser(1).

release-0.3.3.0 - 2015.01.20

  * Added support for Cabal 1.22 API.

  * Added facility for controlling the OpenGL context.
  * Added defSignal variant with ability to set parameter names.
  * Added option for setting the module and plugin search paths.
  * Changed Setup script to set QT_SELECT (base >= 4.7).
  * Fixed crash resizing canvas in Inline mode.
  * Fixed leaking stable pointers when objects are collected.
  * Fixed Canvas delegate marshaller to fail on invalid values.
  * Fixed discrepancy between kinds of type conversion.

by Robin KAY ([email protected]) at January 20, 2015 11:17 PM

Philip Wadler

Democracy vs the 1%


To celebrate the 750th anniversary of the first meeting of the British parliament, the BBC Today programme sponsored a special edition of The Public Philosopher, asking the question Why Democracy? The programme spent much time wondering why folk felt disenfranchised but spent barely two minutes on the question of how wealth distorts politics. (Three cheers to Shirley Williams for raising the issue.) An odd contrast, if you compare it to yesterday's story that the wealthiest 1% now own as much as the other 99% combined; or to Lawrence Lessig's Mayday campaign to stop politicians slanting their votes to what will help fund their reelection; or to Thomas Piketty's analysis of why the wealthy inevitably get wealthier. (tl;dr: "Piketty's thesis has been shorthanded as r > g: that the rate of return on capital today -- and through most of history -- has been higher than general economic growth. This means that simply having money is the best way to get more money.")

by Philip Wadler ([email protected]) at January 20, 2015 08:37 PM

wren gayle romano

Back in action

I don't think I ever mentioned it here but, last semester I took a much-needed sabbatical. The main thing was to take a break from all the pressures of work and grad school and get back into a healthy headspace. Along the way I ended up pretty much dropping off the internet entirely. So if you've been missing me from various mailing lists and online communities, that's why. I'm back now. If you've tried getting in touch by email, irc, etc, and don't hear from me in the next few weeks, feel free ping me again.

This semester I'm teaching foundations of programming language theory with Jeremy Siek, and work on the dissertation continues apace. Over the break I had a few breakthrough moments thinking about the type system for chiastic lambda-calculi— which should help to clean up the normal forms for terms, as well as making room for extending the theory to include eta-conversion. Once the dust settles a bit I'll write some posts about it, as well as continuing the unification-fd tutorial I started last month.




January 20, 2015 05:39 AM

Noam Lewis

Introducing SJS, a type inferer and checker for JavaScript (written in Haskell)

TL;DR: SJS is a type inferer and checker for JavaScript, in early development. The core inference engine is working, but various features and support for the full browser JS environment and libraries are in the works.

SJS (Haskell source on github) is an ongoing effort to produce a practical tool for statically verifying JavaScript code. The type system is designed to support a safe subset of JS, not a superset of JS. That is, sometimes, otherwise valid JS code will not pass type checking with SJS. The reason for not allowing the full dynamic behavior of JS, is to guarantee more safety and (as a bonus) allows fully unambiguous type inference.

The project is still in early development, but the core inference engine is more or less feature complete and the main thing that’s missing is support for all of JS’s builtin functions / methods and those that are present in a browser environment.

Compare to:

  • Google Closure Compiler, whose primary goal is “making JavaScript download and run faster”, but also has a pretty complex type-annotation centric type-checking feature. The type system is rather Java-like, with “shallow” or local type inference. Generics are supported at a very basic level. I should write a blog post about the features and limitations of closure. It’s a very stable project in production use at Google for several years now. I’ve used it myself on a few production projects. Written in Java. They seem to be working on a new type inference engine, but I don’t know what features it will have.
  • Facebook Flow, which was announced a few weeks ago (just as I was putting a finishing touch on my core type checker code!), has a much more advanced (compared to closure) type checker, and seems to be based on data flow analysis. I haven’t gotten around to exploring what exactly flow does, but it seems to be much closer in design to SJS, and obviously as a project has many more resources. There are certain differences in the way flow infers types, I’ll explore those in the near future.
  • TypeScript: a superset of JS that translates into plain JS. Being a superset of JS means that it includes all of the awful parts of JS! I’ve asked about disabling those bad features a while back (around version 0.9); from what I’ve checked, version 1.4 still seems to include them.
  • Other something-to-JS languages, such as PureScript, Roy, Haste, and GHCJS (a full Haskell to JS compiler). These all have various advantages. SJS is aimed at being able to run the code you wrote in plain JS. There are many cases where this is either desired or required.

Of all existing tools, Flow seems to be the closest to what I aim to achieve with SJS. However, SJS supports type system features such as polymorphism which are not currently supported by Flow. On the other hand, Flow has Facebook behind it, and will surely evolve in the near future.

Closure seems to be designed for adapting an existing JS code base. They include features such as implicit union types and/or a dynamic “any” type, and as far as I know don’t infer polymorphic types. The fundamental difference between SJS and some of the alternatives is that I’ve designed SJS for more safety, by supporting a (wide) subset of JS and disallowing certain dynamic typing idioms, such as assigning values of different types to the same variable (in the future this may be relaxed a bit when the types are used in different scopes, but that’s a whole other story).

Ok, let’s get down to the good stuff:

Features:

  • Full type inference: no type annotations necessary.
  • Parametric polymorphism (aka “generics”), based on Hindley-Milner type inference.
  • Row-type polymorphism, otherwise known as “static duck typing”.
  • Recursive types for true representation of object-oriented methods.
  • Correct handling of JS’s this dynamic scoping rules.

Support for type annotations for specifically constraining or for documentation is planned.

Polymorphism is value restricted, ML-style.

Equi-recursive types are constrained to at least include a row type in the recursion to prevent inference of evil recursive types.

Examples

Note: An ongoing goal is to improve readability of type signatures and error messages.

Basic

JavaScript:

var num = 2;
var arrNums = [num, num];

SJS infers (for arrNums):

[TNumber]

That is, an array of numbers.

Objects:

var obj = { something: 'hi', value: num };

Inferred type:

{something: TString, value: TNumber}

That is, an object with two properties: ‘something’, of type string, and ‘value’ of type number.

Functions and this

In JS, this is one truly awful part. this is a dynamically scoped variable that takes on values depending on how the current function was invoked. SJS knows about this (pun intended) and infers types for functions indicating what this must be.

For example:

function useThisData() {
    return this.data + 3;
}

SJS infers:

(this: {data: TNumber, ..l} -> TNumber)

In words: a function which expects this to be an object with at least one property, “data” of type number. It returns a number.

If we call a function that needs this incorrectly, SJS will be angry:

> useThisData();
Error: Could not unify: {data: TNumber, ..a} with TUndefined

Because we called useThisData without a preceding object property access (e.g. obj.useThisData), it will get undefined for this. SJS is telling us that our expected type for this is not unifiable with the type undefined.

Polymorphism

Given the following function:

function makeData(x) {
    return {data: x};
}

SJS infers the following type:

((this: a, b) -> {data: b})

In words: A function that takes anything for its this, and an argument of any type, call it b. It returns an object containing a single field, data of the same type b as the argument.

Row-type polymorphism (static duck typing)

Given the following function:

function getData(obj) {
    return obj.data;
}

SJS infers:

((this: h, {data: i, ..j}) -> i)

In words: a function taking any type for this, and a parameter that contains at least one property, named “data” that has some type i (could be any type). The function returns the same type i as the data property.

SJS is an ongoing project – I hope to blog about specific implementation concerns or type system features soon.


by sinelaw at January 20, 2015 12:12 AM

Jasper Van der Jeugt

Haskell Design Patterns: .Extended Modules

Introduction

For a long time, I have wanted to write a series of blogposts about Design Patterns in Haskell. This never really worked out. It is hard to write about Design Patterns.

First off, I have been writing Haskell for a long time, so mostly things feel natural and I do not really think about code in terms of Design Patterns.

Additionally, I think there is a very, very thin line between what we call “Design Patterns” and what we call “Common Sense”. Too much on one side of the line, and you sound like a complete idiot. Too much on the other side of the line, and you sound like a pretentious fool who needs five UML diagrams in order to write a 100-line program.

However, in the last year, I have both been teaching more Haskell, and I have been reading even more code written by other people. The former made me think harder about why I do things, and the latter made me notice patterns I hadn’t thought of before, in particular if they were formulated in another way.

This has given me a better insight into these patterns, so I hope to write a couple of blogposts like this over the next couple of months. We will see how it goes – I am not exactly a prolific blogger.

The first blogpost deals with what I call .Extended Modules. While the general idea has probably been around for a while, the credit for this specific scheme goes to Bas van Dijk, Simon Meier, and Thomas Schilling.

.Extended Modules: the problem

This problem mainly revolves around organisation of code.

Haskell allows for building complex applications out of small functions that compose well. Naturally, if you are building a large application, you end up with a lot of these small functions.

Imagine we are building some web application, and we have a small function that takes a value and then sends it to the browser as JSON:

json :: (MonadSnap m, Aeson.ToJSON a) => a -> m ()
json x = do
    modifyResponse $ setContentType "application/json"
    writeLBS $ Aeson.encode x

The question is: where do we put this function? In small projects, these seem to inevitably end up inside the well-known Utils module. In larger, or more well-organised projects, it might end up in Foo.Web or Foo.Web.Utils.

However, if we think outside of the box, and disregard dependency problems and libraries including every possible utility function one can write, it is clearer where this function should go: in Snap.Core.

Putting it in Snap.Core is obviously not a solution – imagine the trouble library maintainers would have to deal with in order to include all these utility functions.

The basic scheme

The scheme we use to solve this is simple yet powerful: in our own application’s non-exposed modules list, we add Snap.Core.Extended.

src/Snap/Core/Extended.hs:

{-# LANGUAGE OverloadedStrings #-}
module Snap.Core.Extended
    ( module Snap.Core
    , json
    ) where

import qualified Data.Aeson as Aeson
import           Snap.Core

json :: (MonadSnap m, Aeson.ToJSON a) => a -> m ()
json x = do
    modifyResponse $ setContentType "application/json"
    writeLBS $ Aeson.encode x

The important thing to notice here is the re-export of module Snap.Core. This means that, everywhere in our application, we can use import Snap.Core.Extended as a drop-in replacement for import Snap.Core.

This also makes sharing code in a team easier. For example, say that you are looking for a catMaybes for Data.Vector.

Before, I would have considered either defining this in a where clause, or locally as a non-exported function. This works for single-person projects, but not when different people are working on different modules: you end up with five implementations of this method, scattered throughout the codebase.

With this scheme, however, it’s clear where to look for such a method: in Data.Vector.Extended. If it’s not there, you add it.
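As an illustration, such a module might look something like this (a sketch; the catMaybes implementation here is my own assumption of what you might write):

src/Data/Vector/Extended.hs:

module Data.Vector.Extended
    ( module Data.Vector
    , catMaybes
    ) where

import           Data.Maybe  (fromJust, isJust)
import           Data.Vector

-- Keep the Just values, dropping the Nothings.
catMaybes :: Vector (Maybe a) -> Vector a
catMaybes = Data.Vector.map fromJust . Data.Vector.filter isJust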

Aside from utility functions, this scheme also works great for orphan instances. For example, if we want to serialize a HashMap k v by converting it to [(k, v)], we can add a Data.HashMap.Strict.Extended module.

src/Data/HashMap/Strict/Extended.hs:

{-# OPTIONS_GHC -fno-warn-orphans #-}
module Data.HashMap.Strict.Extended
    ( module Data.HashMap.Strict
    ) where

import           Data.Binary         (Binary (..))
import           Data.Hashable       (Hashable)
import           Data.HashMap.Strict

instance (Binary k, Binary v, Eq k, Hashable k) => Binary (HashMap k v) where
    put = put . toList
    get = fmap fromList get

A special case of these .Extended modules is Prelude.Extended. Since you will typically import Prelude.Extended into almost all modules in your application, it is a great way to add a bunch of (very) common imports from base, so import noise is reduced.

This is, of course, quite subjective. Some might want to add a few specific functions to Prelude (as illustrated below), and others might prefer to add all of Control.Applicative, Data.List, Data.Maybe, and so on.

src/Prelude/Extended.hs:

module Prelude.Extended
    ( module Prelude
    , foldl'
    , fromMaybe
    ) where

import           Data.List  (foldl')
import           Data.Maybe (fromMaybe)
import           Prelude

Scaling up

The basic scheme breaks once our application consists of several cabal packages.

If we have a package acmecorp-web, which depends on acmecorp-core, we would have to expose Data.HashMap.Strict.Extended from acmecorp-core, which feels weird.

A simple solution is to create an unordered-containers-extended package (which is not uploaded to the public Hackage for obvious reasons). Then, you can export Data.HashMap.Strict.Extended from there.

This solution creates quite a lot of overhead. Having many modules is fine, since they are easy to manage – they are just files after all. Managing many packages, however, is harder: every package introduces a significant amount of overhead: for example, repos need to be maintained, and dependencies need to be managed explicitly in the cabal file.

An alternative solution is to simply put all of these modules together in a hackage-extended package. This solves the maintenance overhead and still gives you a very clean module hierarchy.

Conclusion

After using this scheme for over a year in a large, constantly evolving Haskell application, it is clear to me that this is a great way to organise and share code in a team.

A side-effect of this scheme is that it becomes very convenient to consider some utility functions from these .Extended modules for inclusion in their respective libraries, since they all live in the same place. If they do get added, just remove the originals from hackage-extended, and the rest of your code doesn’t even break!

Thanks to Alex Sayers for proofreading!

January 20, 2015 12:00 AM

January 19, 2015

The GHC Team

GHC Weekly News - 2015/01/19

Hi *,

It's time for some more GHC news! The GHC 7.10 release is closing in, which has been the primary place we're focusing our attention. In particular, we're hoping RC2 will be Real Soon Now.

Some notes from the past GHC HQ meetings this week:

  • GHC 7.10 is still rolling along smoothly, and it's expected that RC2 will be cut this Friday, January 23rd. Austin sent out an email about this to ghc-devs, so we can hopefully get all the necessary fixes in.
  • Currently, GHC HQ isn't planning on focusing many cycles on any GHC 7.10 tickets that aren't highest priority. We're otherwise going to fix things as we see fit, at our leisure - but a highest priority bug is a showstopper for us. This means if you have something you consider a showstopper for the next release, you should bump the priority on the ticket and yell at us!
  • We otherwise think everything looks pretty smooth for 7.10.1 RC2 - our libraries are updated, and most of the currently queued patches (with a few minor exceptions) are done and merged.

Some notes from the mailing list include:

  • Austin has alerted everyone that soon, Phabricator will run all builds with ./validate --slow, which will increase the time taken for most builds, but will catch a wider array of bugs in commits and submitted patches - there are many cases the default ./validate script still doesn't catch. https://www.haskell.org/pipermail/ghc-devs/2015-January/008030.html
  • Johan Tibell asked about some clarifications for the HsBang datatype inside GHC. In response, Simon came back with some clarifications, comments, and refactorings, which greatly helped Johan. https://www.haskell.org/pipermail/ghc-devs/2015-January/007905.html
  • Richard Eisenberg had a question about the vectoriser: can we disable it? DPH seems to have stagnated a bit recently, bringing into question the necessity of keeping it on. There hasn't been anything done yet, but it looks like the build will get lighter, with a few more modules soon: https://www.haskell.org/pipermail/ghc-devs/2015-January/007986.html
  • Jan Stolarek has a simple question: what English spelling do we aim for in GHC? It seems that while GHC supports an assortment of British and American English syntactic literals (e.g. SPECIALIZE and SPECIALISE), the compiler sports an assortment of British/American identifiers on its own! https://www.haskell.org/pipermail/ghc-devs/2015-January/007999.html

Closed tickets the past few weeks include: #9966, #9904, #9969, #9972, #9934, #9967, #9875, #9900, #9973, #9890, #5821, #9984, #9997, #9998, #9971, #10000, #10002, #9243, #9889, #9384, #8624, #9922, #9878, #9999, #9957, #7298, and #9836.

by thoughtpolice at January 19, 2015 09:35 PM

January 17, 2015

Matthew Sackman

Free speech values

In the aftermath of the attacks on Charlie Hebdo in Paris, there has been some high quality thinking and writing. There have also been some really stupid things said, from the usual protagonists. It's an interesting facility that the internet now provides: as I no longer watch any news on TV (in fact I don't watch any news at all), nor subscribe to any newspaper, I'm used to reading articles from a wide range of sources. Equally, it's much easier for me to avoid opinion I disagree with (right-wing press) or trivialised dumbed-down reporting (e.g. BBC news). Because of this ease of reading what you want to (in both the good and bad sense), I thought a lot of the reaction was measured and sensible. Turns out I was just unaware of most of the reaction going on.

Anyway there seems to be virtually nothing left to say on this, so this post is really little more than a bunch of links to pieces I thought were thoughtful and well written.

I am an agnostic. I personally don't believe in any religion but I accept I can't prove that every religion is false and so I may be wrong. I tend to treat the beliefs of any religion as arbitrary and abstract ideas. Thus one place to start is the acknowledgement that the laws of any country or civilisation are as arbitrary and ad-hoc as the rules or teachings of any religion. They are just things that people choose to believe in, or follow, or not violate. In the UK, some of our law is based in Christianity (e.g. thou shalt not murder - though I've no idea whether ideas like that actually predate Christianity; I wouldn't be surprised if they do) though other parts of Christianity are not encoded in law (adultery is not illegal, for example).

Many have carelessly labelled these attacks as an attack on free speech and thus the reaction as being about defending free speech. As such, much has been written about how it's possible to defend the right Charlie Hebdo has to publish anything they want, even if it's offensive, whilst still criticising their choice of content.

Freedom of speech is, I believe, essential for any sort of democracy to work. This doesn't suggest that if you have freedom of speech then you have a democracy (I don't believe in the UK, or arguably anywhere in Europe or north America there is functioning democracy; merely systemic corporate dictatorships disguised by corrupt elected faux-representatives), but without freedom of speech, you certainly don't have any ability to transparently hold people to account and thus corruption and abuse will obviously take hold. But freedom of speech is a choice, it's not something axiomatic to existence; it's something many people have chosen to attach huge importance to and as such will defend. But just because there are widely agreed reasons to defend the concept of freedom of speech doesn't mean that it's self-evidently a "better" idea than not having freedom of speech. To judge something as "better" than something else requires all manner of discussion of the state of human existence. Consequently, criticising people for not holding freedom of speech in the same regard as we claim to do is no different from criticising people for their choice of clothing, or housing, or diet, or career, or religious views.

Of course in the UK, we don't have freedom of speech as an absolute concept nor does most of Europe. In the UK, "incitement to racial hatred" was established as an offence by the provisions of §§ 17-29 of the Public Order Act 1986, and The Criminal Justice and Public Order Act 1994 made publication of material that incited racial hatred an arrestable offence. It's not difficult to find stories of people getting arrested for saying things on twitter, the easiest is of course making bomb threats. UKIP (an extremist right wing political party in the UK) even managed to get police to visit a man who'd posted a series of tweets fact-checking UKIP policies. Thankfully, he wasn't arrested.

The USA appears to hold freedom of speech as a much more inviolable concept. For example:

The ACLU vigorously defends the right of neo-Nazis to march through a community filled with Holocaust survivors in Skokie, Illinois, but does not join the march; they instead vocally condemn the targeted ideas as grotesque while defending the right to express them.

But whilst the outpouring in Paris and the crowds gathered as a statement of unity were warmly defiant, it is somewhat likely that rather more than physical violence was being defied, and more than freedom of speech defended, by the crowd. As Brian Klug wrote:

Here is a thought experiment: Suppose that while the demonstrators stood solemnly at Place de la Republique the other night, holding up their pens and wearing their “je suis charlie” badges, a man stepped out in front brandishing a water pistol and wearing a badge that said “je suis cherif” (the first name of one of the two brothers who gunned down the Charlie Hebdo staff). Suppose he was carrying a placard with a cartoon depicting the editor of the magazine lying in a pool of blood, saying, “Well I’ll be a son of a gun!” or “You’ve really blown me away!” or some such witticism. How would the crowd have reacted? Would they have laughed? Would they have applauded this gesture as quintessentially French? Would they have seen this lone individual as a hero, standing up for liberty and freedom of speech? Or would they have been profoundly offended? And infuriated. And then what? Perhaps many of them would have denounced the offender, screaming imprecations at him. Some might have thrown their pens at him. One or two individuals — two brothers perhaps — might have raced towards him and (cheered on by the crowd) attacked him with their fists, smashing his head against the ground. All in the name of freedom of expression. He would have been lucky to get away with his life.

It seems that some things you can say and governments will try to protect you. It would appear in much of Europe that blaspheming Islam is legally OK. But, as noted above, saying other things will get you arrested. The French "comedian" Dieudonné was arrested just 48 hours after the march through Paris on charges of "defending terrorism". Whilst not arrested, the UK Liberal Democrat MP David Ward tweeted "Je suis Palestinian" during the Paris marches and criticised the presence of the Israeli prime minister, Netanyahu, subsequently eliciting a complaint from the Israeli ambassador to Britain. Of course the "world leaders" who gathered in Paris have a wonderful record themselves on "protecting" free speech.

Charlie Hebdo did not practise satire by mocking select, specific targets, nor did they mock all major religions equally. It's been widely reported that they sacked a cartoonist for making an anti-Semitic remark and that:

Jyllands-Posten, the Danish newspaper that published caricatures of the Prophet in 2005, reportedly rejected cartoons mocking Christ because they would "provoke an outcry" and proudly declared it would "in no circumstances ... publish Holocaust cartoons".

But of course it comes down to the content of the publication. In this case the cartoons exist to ridicule, make fun of and offend members of one of the world's largest religions, Islam, by mocking their prophet Mohammed. As Amanda Taub writes:

Dalia Mogahed, the Director of Research at the Institute for Social Policy and Understanding, explained that Mohammed is a beloved figure to Muslims, and "it is a human impulse to want to protect what's sacred to you".

Mogahed compared the cartoons to the issue of flag-burning in the United States, noting that a majority of Americans favour a constitutional amendment to ban flag-burning for similar reasons: the flag is an important symbol of a national identity, and many Americans see flag-burning as an attack on that identity, or even on the country itself. That's not extremism or backwardness; it's about protecting something you cherish.

In any large group of people, there will be a vast majority who are of sound mind and thought, and a small minority who are not. This is just the fact that all over the earth, humans are not all the same: there is some variance in health, intelligence, and every other aspect of what a human is. Any large sampling of humans will show the same set of variations. So if you offend a huge group of people, you will offend tall people and short people, rich people and poor people, fat people and thin people, violent people and peaceful people. Unsurprisingly, it would appear that the background of these killers suggests there is little to do with Islam there, and more to do with their upbringing, family, education and integration with society.

Thus even if you feel Charlie Hebdo's publications of the cartoons served some purpose (given their biased choice of target, their purpose does not seem to be an exercise in itself of freedom of speech), it should be obvious that by offending so many people, they were placing themselves in danger. The same is true of any sustained, systemic, deliberate offence to any of this planet's major religions, races, nationalities or any other grouping of humans which share values. So it becomes a balancing act between how much do you believe in the message you're publishing versus the risk you're putting yourself in. You can view the actions of Edward Snowden in this same context: he felt that the message he was delivering on the abuses of surveillance power carried out by governments across the world outweighed the significant danger he was putting himself in, and so both delivered the message and accepted the need to flee from his country, probably never to return, in fear of the consequences of his actions.

Thankfully, throughout history, there have been people who have chosen to put themselves in the path of great harm (often losing their lives as a result) in order to report, expose, document and publicise matters which the wider public needed to know. Governments, monarchies and empires have crumbled when faced with popular revolt.

So freedom of speech requires consideration. It is perfectly reasonable not to say something because you anticipate you won't enjoy the consequences. Most of us do not conduct our lives by going around saying anything and everything we want to our friends and family: if we did, we'd rapidly lose a lot of friends. The expression "biting your tongue" exists for a reason. Equally, it's perfectly reasonable for a news outlet to decide not to re-publish the Charlie Hebdo cartoons if they fear a violent response that they suspect the local police forces cannot prevent; not to mention just not wanting to offend so many people.

I view as daft the idea that people should never choose not to publish something out of fear. People absolutely should choose not to publish, if they feel the risk to the things they hold dear is not outweighed by the message they're delivering. Everything in life is a trade-off and every action has consequences. Whilst I agree with the right to free speech, that does not imply saying anything you like is free of consequences. If it were, it would require that words have no meaning, and subsequently all communication is void: if anything you say has no consequence then you can say nothing.

I am certainly not suggesting the murders were in any way justified, or that Islam or any other religion or collection of humans should be beyond criticism or even ridicule. At the end of the day, no human is perfect, and as such we can all benefit from a thorough dose of criticism once in a while. Every article I've linked to in this post repeats that such violence, regardless of the provocation, is abhorrent, and I agree with that: murder is never an acceptable response to any drawing, written or spoken word. But that doesn't mean that these events weren't predictable.

Finally then we get to the insanely idiotic response from the UK government. That MI5 should have more powers that they don't need (they probably just need more money), and that we must deny terrorists "safe space" to communicate online. Which means banning encryption, which means it's impossible to use the internet for anyone. The home secretary, Theresa May said:

We are determined that as far as possible there should be no safe spaces for terrorists to communicate. I would have thought that that should be a principle ... that could have been held by everybody, across all parties in this House of Commons.

So of course, if terrorists can't communicate in private then no one can. Quickly, we've immediately gone from lazy labelling of events as an attack on free speech to a knee jerk response of "free speech yes, but you certainly can't have free speech in private, because you might be a terrorist". Again, it's a trade-off. I doubt that having such restrictions on communication will make the earth or this country safer for anyone and of course the impossibility of a controlled study means it cannot be proven one way or another. No security service is ever going to be flawless and from time to time very horrible things will continue to happen. I think most people are aware of this and accept this; we're all going to die after all. The loss of civil liberties though is certainly far more worrying to me.

In theory, I would think these proposals so lunatic as to never see the light of day (it would be completely impossible to enforce, for one thing: terrorists, along with everyone else, would learn to use steganography to encode their messages in pictures of cats, thus rendering their traffic no different to that of everyone else). Sadly, Labour have stated they don't believe their position to be that far away from the Conservatives, which is deeply worrying. Labour don't exactly have a great record in this area either, given their previous ID card schemes and the introduction of detention-without-charge. What is needed is some transparency. We need an informed debate, with MI5 and GCHQ providing some independently verifiable facts and figures that demonstrate how they are being thwarted in what they're trying to do. We need to understand properly what the risk to us is, and most importantly we need to understand why these threats exist and what else we can do to make them decrease.

I've never seen it said that any UK Government policy in the last 15 years has made the UK less of a target for such attacks. Maybe we should look at that before we start subjecting ourselves to Orwellian control.

January 17, 2015 05:35 PM

January 15, 2015

Functional Jobs

Functional Software Developer at Moixa Technology (Full-time)

Green energy / IoT startup Moixa Technology is seeking to add to our software team.

We're a small team dedicated to developing technology that enables a radical change in the way that energy is used in the home. We are just ramping up to deliver a project (working with large energy companies) demonstrating how communities can use the Maslow system across a group of homes to share energy in that community. We will need to deploy software that addresses some challenging problems to make this possible.

The code that we develop is built around providing services based on the hardware systems that we are deploying and so needs to work with the constraints and characteristics of the hardware & low level firmware we have built.

We're looking for individuals with a generalist approach who are willing and able to participate in all aspects of design, implementation and operation of our platform. The candidate must be happy in a small team, and also able to work autonomously. We are expanding as we ramp up to deliver the next generation of control software to our increasing number of deployed systems. This is an exciting moment for the company.

Tasks:

  • Design & implementation of all parts of our software stack (web frontend & backend, data analytics, high-level code on IoT devices)
  • Operations support (expected <20% of time)

Our current stack involves:

  • Scala (Akka, Play) / ClojureScript / Haskell
  • Postgres, neo4j
  • Raspberry PI / Arch Linux
  • PIC32 microcontroller / C

Skills and Requirements:

  • Experience in one or more functional languages
  • Familiarity with at least one database paradigm
  • Linux scripting and operation

Advantages:

  • Familiarity with (strongly) typed functional languages (Haskell/ML/Scala)
  • Embedded programming experience
  • Experience in data analytics (Spark or similar)
  • Experience in IoT development
  • Open Source contributions

Moixa Technology is based in central London (Primrose Hill). Salary depends on experience, plus performance-based share options.

Get information on how to apply for this position.

January 15, 2015 07:07 AM

January 14, 2015

Magnus Therning

Thought on JavaScript

Over the holidays I’ve been reading Douglas Crockford’s excellent book JavaScript: The Good Parts. About halfway through I came to the conclusion that JavaScript is an “anti-LISP”. There are many reasons to learn LISP, but none of them is “LISP is widely used in industry.” As Eric Raymond’s famous words claim, knowing LISP will make you a better programmer. On the other hand there seem to be almost no reasons to learn JavaScript. It sure doesn’t seem to teach anything that’ll make you a better programmer. The only reason I can come up with is “JavaScript is widely used in industry.”

January 14, 2015 12:00 AM

January 13, 2015

Neil Mitchell

User Interfaces for Users

Summary: When designing a user interface, think about the user, and what they want to know. Don't just present the information you know.

As part of my job I've ended up writing a fair amount of user interface code, and feedback from users has given me an appreciation of some common mistakes. Many user interfaces are designed to present information, and one of the key rules is to think about what the user wants to see, not just showing the information you have easily to hand. I was reminded of this rule when I was expecting a parcel. On the morning of Saturday 10/1/2015 I used a track my parcel interface to find:

The interface presents a lot of information, but most of it is interesting to the delivery company, not to me. I have basically one question: when will my parcel arrive. From the interface I can see:

  • The parcel is being shipped with "Express AM" delivery. No link to what that means. Searching leads me to a page saying that it guarantees delivery before 1pm on a business day (some places said noon, some said 1pm). That is useful information if I want to enter into a contract with the delivery service, but not when I'm waiting for a parcel. What happens on Saturday? Do they still deliver, just not guarantee the time? Do they wait until the next business day? Do they do either, as they see fit?
  • My parcel has been loaded onto a vehicle. Has the vehicle left the depot, or is that a separate step? How many further steps are there between loading and delivery? This question is easy to answer after the parcel has been delivered, since the additional steps show up in the list, but difficult to answer before.

On Saturday morning my best guess about when the parcel would arrive was between then and Monday 1pm. Having been through the complete process, I now know the best answer was between some still unknown time on Monday morning and Monday 1pm. With that information, I'd have taken my son to the park rather than keeping him cooped up indoors.

I suggest the company augment their user interface with the line "Your parcel will be delivered on Monday, between 9am and 1pm". It's information they could have computed easily, and answers my question.

The eagle-eyed may notice that there is a plus to expand to show more information. Alas, all that shows is:

I think they're trying to say my puny iPhone can't cope with the bold font that is essential to tell me the status and I should get an iPhone plus... Checking the desktop website also showed no further information.

by Neil Mitchell ([email protected]) at January 13, 2015 06:56 PM

FP Complete

FP Complete's software pipeline

FP Complete's mission is easily expressed: increase the commercial adoption of Haskell. We firmly believe that, in many domains, Haskell is the best way to write high quality, robust, performant code in a productive (read: fast-to-market) way. Those of you who, like me, are members of the Haskell community are probably using Haskell because you believe the same thing, and believe, like us, that we can make the world a better place by getting more software to be written in Haskell.

By the way: FP Complete is looking to expand its team of Haskell developers. If this idea excites you, don't forget to send us your resumes!

There are two interesting groups that I've spelled out in that paragraph: commercial users of Haskell, and the Haskell community. I want to clarify how our software development process interacts with these two groups, in part to more fully answer a question from Reddit.

The Haskell community has created, and continues to create, amazing software. As a company, our main objective is to help other companies find this software, understand its value, and knock down any hurdles to adoption that they may have. These hurdles include:

  • Lack of expertise in Haskell at the company
  • Limitations in the ecosystem, such as missing libraries for a particular domain
  • Providing commercial support, such as high-availability hosted solutions and on-call engineers to provide answers and fix problems
  • Providing tooling as needed by commercial users

You can already see quite a bit of what we've done, for example:

  • Created FP Haskell Center, which addressed requests from companies to provide a low-barrier-to-entry Haskell development environment for non-Haskell experts
  • Put together School of Haskell to enable interactive documentation to help both new Haskellers, and those looking to improve their skills
  • Started the LTS Haskell project as a commercial-grade Haskell package set, based on our previous work with FP Haskell Center libraries and Stackage
  • Provided Stackage Server as a high-availability means of hosting package sets, both official (like Stackage) and unofficial (like experimental package releases)

There's something all of these have in common, which demonstrates what our software pipeline looks like:

  1. We start off gathering requirements from companies (both those that are and are not our customers) to understand their needs
  2. We create a product in a closed-source environment
  3. Iterate with our customers on the new product to make sure it addresses all of their needs
  4. After the product reaches a certain point of stability (a very subjective call), we decide to release it to the community, which involves:
    1. Polishing it
    2. Discussing with relevant members/leaders in the community
    3. Making it officially available

Not every product we work on goes through all of these steps. For example, we might decide that the product is too specialized to be generally useful. That's why we sometimes hold our cards a bit close to our chest: we don't want to talk about every new idea we have, because we know some of them may be duds.

Some people may ask why we go through that fourth step I listed above. After all, taking a product from "it works well for individual companies with ongoing support from us" to "it's a generally viable product for commercial and non-commercial users" is an arduous process, and doesn't directly make us any money. The answer is simple, and I already alluded to it above: the great value in Haskell comes from the wonderful work the community does. If we're to succeed in our mission of getting Haskell to improve software development in general, we need all the help we can get.

So that's our strategy. You're going to continue seeing new products released from us as we perfect them with our customers. We want to find every way we can to help the community succeed even more. I'm also making a small tweak to our strategy today: I want to be more open with the community about this process. While not everything we do should be broadcast to the world (because, like I said, some things may be duds), I can share some of our directions earlier than I have previously.

So let me lay out some of the directions we're working on now:

  1. Better build tools. LTS Haskell is a huge step in that direction, providing a sane development environment. But there's still quite a bit of a manual process involved. We want to automate this even more. (And to directly answer hastor's question on Reddit: yes, we're going to release a Docker image.)
  2. Better code inspection. We've developed a lot of functionality as part of FP Haskell Center to inspect type signatures, identifier locations, usage, etc. We want to unlock that power and make it available outside of FP Haskell Center as well.
  3. In a completely different direction: we're working on more powerful distributed computing capabilities. This is still early stage, so I can't say much more yet.

Outside of products themselves, we want to get other companies on board with our goal of increased Haskell adoption as well. We believe many companies using Haskell today, and even more so companies considering making the jump, have a huge amount of ideas to add to the mix. We're still ironing out the details of what that will look like, but expect to hear some more from us in the next few months about this.

And I'm giving you all a commitment: expect to see much more transparency about what we're doing. I intend to be sharing things with the community as we go along. Chris Done and Mathieu Boespflug will be part of this effort as well. If you have questions, ask. We want to do all we can to make the community thrive.

January 13, 2015 01:00 PM

We're hiring: Software Engineer

FP Complete is expanding yet again! We are looking to hire several new engineers to join our Haskell development team, both to build great new core products and in partnership with our clients to apply Haskell at a large scale. Some of our recently announced core products include the Integrated Analysis Platform, Stackage and LTS Haskell, with much more to come. If you’d like to be part of our team and shape the future of Haskell, please send a resume or CV to [email protected]. Any existing work (either a running website or an open source codebase) which you can include as links would be greatly appreciated as well.

We will want you to start right away. Depending on your current jurisdiction, this will either be a full-time contractor position, or an employee position. This is a telecommute position: you can work from home or wherever you choose, with little or no travel. Location in North America is preferred, but you will work with colleagues on both North American and European hours.

Activities:

  • distribute existing and new code over large clusters,
  • parallelize code and tune performance,
  • interface with foreign math and scientific libraries,
  • relentlessly refactor for composability, testability and clarity,
  • identify performance bottlenecks and debug known issues,
  • implement unit tests, integration tests, and acceptance tests,
  • write clear and concise documentation.

Skills:

  • strong experience writing process-driven application code in Haskell,
  • experience building reusable components/packages/libraries and demonstrated ability to write well structured and well documented code,
  • ability to evolve and refactor large code bases,
  • experience working under test-driven methodologies,
  • ability to interact with a distributed development team, and to communicate clearly on issues, in bug reports and emails.

These further skills are a plus:

  • Bachelor’s or Master’s degree in computer science or mathematics,
  • experience developing products in a regulated environment (avionics/medical/finance),
  • experience building scalable server/Web applications,
  • (web) UI design and implementation experience,
  • an understanding of the implementation of GHC’s multithreaded runtime,
  • experience as an architect and/or a creator of technical specs.

January 13, 2015 12:30 PM

Manuel M T Chakravarty

Using Haskell as part of a Mac app

Using Haskell as part of a Mac app:

Here are the YouTube video and slides of my Haskell Symposium 2014 talk on using Objective-C inline in Haskell programs. This provides a fairly convenient way to integrate Haskell code into the model layer of a Mac app written in Swift or Objective-C.

January 13, 2015 03:51 AM

Magnus Therning

State machine with conduit

Previously I wrote about a simple state machine in Haskell and then offered a simple example, an adder. As I wrote in the initial post I’ve been using this to implement communication protocols. That means IO, and for that I’ve been using conduit. In this post I thought I’d show how easy it is to wrap up the adder in conduit, read input from stdin and produce the result on stdout.

At the top level is a function composing three parts:

  1. input from stdin
  2. processing in the adder
  3. output on stdout

This can be written like this:

process :: ResourceT IO ()
process = source $= calc $$ output
    where
        source = stdinC

        output = stdoutC

        calc = concatMapC unpack =$= wrapMachine calcMachine =$= filterC f =$= mapC (pack . show)
            where
                f (CalcResult _) = True
                f _ = False

The input to the adder is ordinary Chars so the ByteStrings generated by stdinC need to be converted before being handed off to the state machine, which is wrapped to make it a Conduit (see below). The state machine generates more signals than I want printed, so I filter out everything but the CalcResults. Then it’s passed to show and turned into a ByteString before being sent to stdoutC.
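
For readers who have not seen the earlier posts, here is a minimal sketch of the state-machine interface that the snippets above assume. The names Machine, stepMachine, calcMachine and CalcResult are taken from the code above; the field names and the CalcSignal type are my assumptions, not necessarily how the earlier post defines them. The C-suffixed combinators (stdinC, mapC and friends) appear to come from the conduit-combinators package, with pack and unpack from Data.ByteString.Char8.

-- A sketch of the assumed interface; the field names and CalcSignal are
-- illustrative assumptions, not the earlier post's actual definitions.
data Machine state event signal = Machine
    { mState :: state
    , mStep  :: state -> event -> (state, signal)
    }

-- Feed one event to the machine, returning the updated machine and a signal.
stepMachine :: Machine state event signal -> event -> (Machine state event signal, signal)
stepMachine m e = (m { mState = s' }, sig)
    where
        (s', sig) = mStep m (mState m) e

-- The adder's signal type: only CalcResult values survive the filter above.
data CalcSignal = CalcNothing | CalcResult Int
    deriving Show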

I think the implementation of wrapMachine shows off the simplicity of conduit rather well:

wrapMachine :: (Monad m) => Machine state event signal -> Conduit event m signal
wrapMachine mach = do
    maybeEvent <- await
    case maybeEvent of
        Nothing -> return ()
        Just event -> let
                (newMach, signals) = stepMachine mach event
            in do
                yield signals
                wrapMachine newMach

The main function is as short as this:

main :: IO ()
main = do
    hSetBuffering stdin NoBuffering
    hSetBuffering stdout NoBuffering
    runResourceT process

The result is about as exciting as expected

% runhaskell Calculator.hs
56 42 +CalcResult 98
17 56 +CalcResult 73

It never exits on its own so one has to use Ctrl-C to terminate. I thought I’d try to address that in the next post by adding a time-out to the adder machine.

January 13, 2015 12:00 AM

January 12, 2015

Antti-Juhani Kaijanaho (ibid)

Planet Haskell email is broken (UPDATE: fixed)

I became aware about a week ago that Planet Haskell’s email address had not received any traffic for a while. It turns out the community.haskell.org email servers are misconfigured. The issue remains unfixed as of this writing. Update: this issue has been fixed.

Please direct your Planet Haskell related mail directly to me ([email protected]) for the duration of the problem.

by Antti-Juhani Kaijanaho at January 12, 2015 11:54 AM

Keegan McAllister

Continuations in C++ with fork

[Update, Jan 2015: I've translated this code into Rust.]

While reading "Continuations in C" I came across an intriguing idea:

It is possible to simulate call/cc, or something like it, on Unix systems with system calls like fork() that literally duplicate the running process.

The author sets this idea aside, and instead discusses some code that uses setjmp/longjmp and stack copying. And there are several other continuation-like constructs available for C, such as POSIX getcontext. But the idea of implementing call/cc with fork stuck with me, if only for its amusement value. I'd seen fork used for computing with probability distributions, but I couldn't find an implementation of call/cc itself. So I decided to give it a shot, using my favorite esolang, C++.

Continuations are a famously mind-bending idea, and this article doesn't totally explain what they are or what they're good for. If you aren't familiar with continuations, you might catch on from the examples, or you might want to consult another source first (1, 2, 3, 4, 5, 6).

Small examples

I'll get to the implementation later, but right now let's see what these fork-based continuations can do. The interface looks like this.

template <typename T>
class cont {
public:
    void operator()(const T &x);
};

template <typename T>
T call_cc( std::function< T (cont<T>) > f );

std::function is a wrapper that can hold function-like values, such as function objects or C-style function pointers. So call_cc<T> will accept any function-like value that takes an argument of type cont<T> and returns a value of type T. This wrapper is the first of several C++11 features we'll use.

call_cc stands for "call with current continuation", and that's exactly what it does. call_cc(f) will call f, and return whatever f returns. The interesting part is that it passes to f an instance of our cont class, which represents all the stuff that's going to happen in the program after f returns. That cont object overloads operator() and so can be called like a function. If it's called with some argument x, the program behaves as though f had returned x.

The types reflect this usage. The type parameter T in cont<T> is the return type of the function passed to call_cc. It's also the type of values accepted by cont<T>::operator().

Here's a small example.

int f(cont<int> k) {
    std::cout << "f called" << std::endl;
    k(1);
    std::cout << "k returns" << std::endl;
    return 0;
}

int main() {
    std::cout << "f returns " << call_cc<int>(f) << std::endl;
}

When we run this code we get:

f called
f returns 1

We don't see the "k returns" message. Instead, calling k(1) bails out of f early, and forces it to return 1. This would happen even if we passed k to some deeply nested function call, and invoked it there.

This nonlocal return is kind of like throwing an exception, and is not that surprising. More exciting things happen if a continuation outlives the function call it came from.

boost::optional< cont<int> > global_k;

int g(cont<int> k) {
    std::cout << "g called" << std::endl;
    global_k = k;
    return 0;
}

int main() {
    std::cout << "g returns " << call_cc<int>(g) << std::endl;

    if (global_k)
        (*global_k)(1);
}

When we run this, we get:

g called
g returns 0
g returns 1

g is called once, and returns twice! When called, g saves the current continuation in a global variable. After g returns, main calls that continuation, and g returns again with a different value.

What value should global_k have before g is called? There's no such thing as a "default" or "uninitialized" cont<T>. We solve this problem by wrapping it with boost::optional. We use the resulting object much like a pointer, checking for "null" and then dereferencing. The difference is that boost::optional manages storage for the underlying value, if any.

Why isn't this code an infinite loop? Because invoking a cont<T> also resets global state to the values it had when the continuation was captured. The second time g returns, global_k has been reset to the "null" optional value. This is unlike Scheme's call/cc and most other continuation systems. It turns out to be a serious limitation, though it's sometimes convenient. The reason for this behavior is that invoking a continuation is implemented as a transfer of control to another process. More on that later.

Backtracking

We can use continuations to implement backtracking, as found in logic programming languages. Here is a suitable interface.

bool guess();
void fail();

We will use guess as though it has a magical ability to predict the future. We assume it will only return true if doing so results in a program that never calls fail. Here is the implementation.

boost::optional< cont<bool> > checkpoint;

bool guess() {
    return call_cc<bool>( [](cont<bool> k) {
        checkpoint = k;
        return true;
    } );
}

void fail() {
    if (checkpoint) {
        (*checkpoint)(false);
    } else {
        std::cerr << "Nothing to be done." << std::endl;
        exit(1);
    }
}

guess invokes call_cc on a lambda expression, which saves the current continuation and returns true. A subsequent call to fail will invoke this continuation, retrying execution in a world where guess had returned false instead. In Scheme et al, we would store a whole stack of continuations. But invoking our cont<bool> resets global state, including the checkpoint variable itself, so we only need to explicitly track the most recent continuation.

Now we can implement the integer factoring example from "Continuations in C".

int integer(int m, int n) {
    for (int i=m; i<=n; i++) {
        if (guess())
            return i;
    }
    fail();
}

void factor(int n) {
    const int i = integer(2, 100);
    const int j = integer(2, 100);

    if (i*j != n)
        fail();

    std::cout << i << " * " << j << " = " << n << std::endl;
}

factor(n) will guess two integers, and fail if their product is not n. Calling factor(391) will produce the output

17 * 23 = 391

after a moment's delay. In fact, you might see this after your shell prompt has returned, because the output is produced by a thousand-generation descendant of the process your shell created.

Solving a maze

For a more substantial use of backtracking, let's solve a maze.

const int maze_size = 15;
char maze[] =
"X-------------+\n"
" | |\n"
"|--+ | | | |\n"
"| | | | --+ |\n"
"| | | |\n"
"|-+---+--+- | |\n"
"| | | |\n"
"| | | ---+-+- |\n"
"| | | |\n"
"| +-+-+--| |\n"
"| | | |--- |\n"
"| | |\n"
"|--- -+-------|\n"
"| \n"
"+------------- \n";

void solve_maze() {
    int x=0, y=0;

    while ((x != maze_size-1)
        || (y != maze_size-1)) {

        if (guess()) x++;
        else if (guess()) x--;
        else if (guess()) y++;
        else y--;

        if ( (x < 0) || (x >= maze_size) ||
             (y < 0) || (y >= maze_size) )
            fail();

        const int i = y*(maze_size+1) + x;
        if (maze[i] != ' ')
            fail();
        maze[i] = 'X';
    }

    for (char c : maze) {
        if (c == 'X')
            std::cout << "\e[1;32mX\e[0m";
        else
            std::cout << c;
    }
}

Whether code or prose, the algorithm is pretty simple. Start at the upper-left corner. As long as we haven't reached the lower-right corner, guess a direction to move. Fail if we go off the edge, run into a wall, or find ourselves on a square we already visited.

Once we've reached the goal, we iterate over the char array and print it out with some rad ANSI color codes.

Once again, we're making good use of the fact that our continuations reset global state. That's why we see 'X' marks not on the failed detours, but only on a successful path through the maze. Here's what it looks like.


X-------------+
XXXXXXXX| |
|--+ |X| | |
| | |X| --+ |
| |XXXXX| |
|-+---+--+-X| |
| |XXX | XXX|
| |X|X---+-+-X|
|XXX|XXXXXX|XX|
|X+-+-+--|XXX |
|X| | |--- |
|XXXX | |
|---X-+-------|
| XXXXXXXXXXX
+-------------X

Excess backtracking

We can run both examples in a single program.

int main() {
factor(391);
solve_maze();
}

If we change the maze to be unsolvable, we'll get:

17 * 23 = 391
23 * 17 = 391
Nothing to be done.

Factoring 391 a different way won't change the maze layout, but the program doesn't know that. We can add a cut primitive to eliminate unwanted backtracking.

void cut() {
    checkpoint = boost::none;
}

int main() {
    factor(391);
    cut();
    solve_maze();
}

The implementation

For such a crazy idea, the code to implement call_cc with fork is actually pretty reasonable. Here's the core of it.

template <typename T>
// static
T cont<T>::call_cc(call_cc_arg f) {
    int fd[2];
    pipe(fd);
    int read_fd = fd[0];
    int write_fd = fd[1];

    if (fork()) {
        // parent
        close(read_fd);
        return f( cont<T>(write_fd) );
    } else {
        // child
        close(write_fd);
        char buf[sizeof(T)];
        if (read(read_fd, buf, sizeof(T)) < ssize_t(sizeof(T)))
            exit(0);
        close(read_fd);
        return *reinterpret_cast<T*>(buf);
    }
}

template <typename T>
void cont<T>::impl::invoke(const T &x) {
    write(m_pipe, &x, sizeof(T));
    exit(0);
}

To capture a continuation, we fork the process. The resulting processes share a pipe which was created before the fork. The parent process will call f immediately, passing a cont<T> object that holds onto the write end of this pipe. If that continuation is invoked with some argument x, the parent process will send x down the pipe and then exit. The child process wakes up from its read call, and returns x from call_cc.

There are a few more implementation details.

  • If the parent process exits, it will close the write end of the pipe, and the child's read will return 0, i.e. end-of-file. This prevents a buildup of unused continuation processes. But what if the parent deletes the last copy of some cont<T>, yet keeps running? We'd like to kill the corresponding child process immediately.

    This sounds like a use for a reference-counted smart pointer, but we want to hide this detail from the user. So we split off a private implementation class, cont<T>::impl, with a destructor that calls close. The user-facing class cont<T> holds a std::shared_ptr to a cont<T>::impl. And cont<T>::operator() simply calls cont<T>::impl::invoke through this pointer.

  • It would be nice to tell the compiler that cont<T>::operator() won't return, to avoid warnings like "control reaches end of non-void function". GCC provides the noreturn attribute for this purpose.

  • We want the cont<T> constructor to be private, so we had to make call_cc a static member function of that class. But the examples above use a free function call_cc<T>. It's easiest to implement the latter as a 1-line function that calls the former. The alternative is to make it a friend function of cont<T>, which requires some forward declarations and other noise.

There are a number of limitations too.

  • As noted, the forked child process doesn't see changes to the parent's global state. This precludes some interesting uses of continuations, like implementing coroutines. In fact, I had trouble coming up with any application other than backtracking. You could work around this limitation with shared memory, but it seemed like too much hassle.

  • Each captured continuation can only be invoked once. This is easiest to observe if the code using continuations also invokes fork directly. It could possibly be fixed with additional forking inside call_cc.

  • Calling a continuation sends the argument through a pipe using a naive byte-for-byte copy. So the argument needs to be Plain Old Data, and had better not contain pointers to anything not shared by the two processes. This means we can't send continuations through other continuations, sad to say.

  • I left out the error handling you would expect in serious code, because this is anything but.

  • Likewise, I'm assuming that a single write and read will suffice to send the value. Robust code will need to loop until completion, handle EINTR, etc. Or use some higher-level IPC mechanism.

  • At some size, stack-allocating the receive buffer will become a problem.

  • It's slow. Well, actually, I'm impressed with the speed of fork on Linux. My machine solves both backtracking problems in about a second, forking about 2000 processes along the way. You can speed it up more with static linking. But it's still far more overhead than the alternatives.

As usual, you can get the code from GitHub.

by keegan ([email protected]) at January 12, 2015 06:18 AM

January 11, 2015

Gabriel Gonzalez

total-1.0.0: Exhaustive pattern matching using traversals, prisms, and lenses

The lens library provides Prisms, which are a powerful way to decouple a type's interface from its internal representation. For example, consider the Either type from the Haskell Prelude:

data Either a b = Left a | Right b

Instead of exposing the Left and Right constructors, you could instead expose the following two Prisms:

_Left  :: Prism (Either a b) (Either a' b ) a a'

_Right :: Prism (Either a b) (Either a b') b b'

Everything you can do with a constructor, you can do with the equivalent Prism. For example, if you want to build a value using a Prism, you use review:

review _Left  :: a -> Either a b
review _Right :: b -> Either a b

You can also kind-of pattern match on the Prism using preview, returning a Just if the match succeeds or Nothing if the match fails:

preview _Left  :: Either a b -> Maybe a
preview _Right :: Either a b -> Maybe b

However, preview is not quite as good as real honest-to-god pattern matching, because if you wish to handle every branch you can't prove in the types that your pattern match was exhaustive:

nonTypeSafe :: Either Char Int -> String
nonTypeSafe e =
    case preview _Left e of
        Just c  -> replicate 3 c
        Nothing -> case preview _Right e of
            Just n  -> replicate n '!'
            Nothing -> error "Impossible!"  -- Unsatisfying

However, this isn't a flaw with Prisms, but rather a limitation of preview. Prisms actually preserve enough information in the types to support exhaustive pattern matching! The trick for this was described a year ago by Tom Ellis, so I decided to finally implement his idea as the total library.

Example

Here's an example of a total pattern matching using the total library:

import Lens.Family.Total
import Lens.Family.Stock

total :: Either Char Int -> String          -- Same as:
total = _case                               -- total = \case
    & on _Left  (\c -> replicate 3 c  )     --     Left  c -> replicate 3 c
    & on _Right (\n -> replicate n '!')     --     Right n -> replicate n '!'

The style resembles LambdaCase. If you want to actually supply a value in the same expression, you can also write:

e & (_case
    & on _Left  ...
    & on _Right ... )

There's nothing unsafe going on under the hood. on is just 3 lines of code:

on p f g x = case p Left x of
    Left  l -> f l
    Right r -> g r

(&) is just an operator for post-fix function application:

x & f = f x

... and _case is just a synonym for a type class method that checks if a type is uninhabited.

class Empty a where
    impossible :: a -> x

_case = impossible

The rest of the library is just code to auto-derive Empty instances using Haskell generics. The entire library is about 40 lines of code.

Generics

Here's an example of exhaustive pattern matching on a user-defined data type:

{-# LANGUAGE DeriveGeneric   #-}
{-# LANGUAGE TemplateHaskell #-}

import Control.Lens.TH (makePrisms)
import GHC.Generics (Generic)
import Lens.Family.Total

data Example a b c = C1 a | C2 b | C3 c deriving (Generic)

makePrisms ''Example

instance (Empty a, Empty b, Empty c) => Empty (Example a b c)

example :: Example String Char Int -> String    -- Same as:
example = _case                                 -- example = \case
    & on _C1 (\s -> s               )           --     C1 s -> s
    & on _C2 (\c -> replicate 3 c   )           --     C2 c -> replicate 3 c
    & on _C3 (\n -> replicate n '!' )           --     C3 n -> replicate n '!'

This uses two features to remove most of the boiler-plate:

  • deriving (Generic) auto-generates the code for the Empty instance
  • makePrisms uses Template Haskell to auto-generate Prisms for our type.

For those who prefer the lens-family-* suite of libraries over lens, I opened up an issue against lens-family-th so that Lens.Family.TH.makeTraversals can be used in place of makePrisms for exhaustive pattern matching.

When we write this instance in our original code:

instance (Empty a, Empty b, Empty c) => Empty (Example a b c)

... that uses Haskell generics to auto-generate the implementation of the impossible function:

instance (Empty a, Empty b, Empty c) => Empty (Example a b c) where
    impossible (C1 a) = impossible a
    impossible (C2 b) = impossible b
    impossible (C3 c) = impossible c

That says that the type Example a b c is uninhabited if and only if the types a, b, and c are themselves uninhabited.
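
The post doesn't show the generic machinery itself, but the idea behind deriving Empty with GHC.Generics can be sketched roughly as below. This is an illustration only, not the total library's actual code; in particular, the product case is simplified to look at just the first field.

{-# LANGUAGE DefaultSignatures, EmptyCase, FlexibleContexts, TypeOperators #-}

import GHC.Generics

-- Sketch only: not the total library's implementation.
class Empty a where
    impossible :: a -> x
    default impossible :: (Generic a, GEmpty (Rep a)) => a -> x
    impossible = gimpossible . from

class GEmpty f where
    gimpossible :: f p -> x

instance GEmpty V1 where                                 -- no constructors at all
    gimpossible v = case v of {}

instance (GEmpty f, GEmpty g) => GEmpty (f :+: g) where  -- every constructor must be empty
    gimpossible (L1 x) = gimpossible x
    gimpossible (R1 x) = gimpossible x

instance GEmpty f => GEmpty (f :*: g) where              -- simplification: use only the first field
    gimpossible (x :*: _) = gimpossible x

instance Empty c => GEmpty (K1 i c) where                -- defer to the field's Empty instance
    gimpossible (K1 x) = impossible x

instance GEmpty f => GEmpty (M1 i m f) where             -- ignore metadata wrappers
    gimpossible (M1 x) = gimpossible x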

When you write makePrisms ''Example, that creates the following three prisms:

_C1 :: Prism (Example a b c) (Example a' b  c ) a a'
_C2 :: Prism (Example a b c) (Example a b' c ) b b'
_C3 :: Prism (Example a b c) (Example a b c') c c'

These Prisms are useful in their own right. You can export them in lieu of your real constructors and they are more powerful in several respects.

Backwards compatibility

There is one important benefit to exporting Prisms and hiding constructors: if you export Prisms you can change your data type's internal representation without changing your API.

For example, let's say that I change my mind and implement Example as two nested Eithers:

type Example a b c = Either a (Either b c)

Had I exposed my constructors, this would be a breaking change for my users. However, if I expose Prisms, then I can preserve the old API by just changing the Prism definition. Instead of auto-generating them using Template Haskell, I can just write:

import Lens.Family.Stock (_Left, _Right)

_C1 = _Left
_C2 = _Right . _Left
_C3 = _Right . _Right

Done! That was easy, and the users of my Prisms are completely unaffected by the changes to the internals.

Under the hood

The trick that makes total work is that every branch of the pattern match subtly alters the type. The value that you match on starts out as:

Example String Char Int

... and every time you pattern match against a branch, the on function Voids one of the type parameters. The pattern matching flows from bottom to top:

example = _case
    & on _C1 (\s -> s               )   -- Step 3: Example Void   Void Void
    & on _C2 (\c -> replicate 3 c   )   -- Step 2: Example String Void Void
    & on _C3 (\n -> replicate n '!' )   -- Step 1: Example String Char Void
                                        -- Step 0: Example String Char Int

Finally, _case just checks that the final type (Example Void Void Void in this case) is uninhabited. All it does is check if the given type is an instance of the Empty type class, which only provides instances for uninhabited types. In this case Example Void Void Void is definitely uninhabited because Void (from Data.Void) is itself uninhabited:

instance Empty Void where
    impossible = absurd
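
The practical payoff is that a forgotten branch becomes a compile-time error rather than a runtime "Impossible!". As a sketch (deliberately incomplete, and therefore expected to be rejected by the compiler):

-- Intentionally missing the _C3 branch: _case would need an
-- Empty (Example Void Void Int) instance, which in turn needs Empty Int.
-- No such instance exists, so this definition does not compile.
incomplete :: Example String Char Int -> String
incomplete = _case
    & on _C1 (\s -> s)
    & on _C2 (\c -> replicate 3 c)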

Lenses

What's neat is that this solution works not only for Prisms and Traversals, but for Lenses, too! Check this out:

example :: (Int, Char) -> String     -- Same as:
example = _case                         -- example = \case
    & on _1 (\n -> replicate n '1')     --     (n, _) -> replicate n '1'

... and the identity lens (which is just id) works exactly the way you'd expect:

example :: (Int, Char) -> String        -- Same as:
example = _case                            -- example = \case
    & on id (\(n, c) -> replicate n c)     --     (n, c) -> replicate n c

I didn't intend that when I wrote the library: it just fell naturally out of the types.

Conclusion

The total library now allows library authors to hide their constructors and export a backwards compatible Prism API all while still preserving exhaustive pattern matching.

Note that I do not recommend switching to a lens-only API for all libraries indiscriminately. Use your judgement when deciding whether perfect backwards compatibility is worth the overhead of a lens-only API, both for the library maintainers and your users.

I put the code under the Lens.Family.Total module, not necessarily because I prefer the Lens.Family.* hierarchy, but rather because I would like to see Edward Kmett add these utilities to lens, leaving my library primarily for users of the complementary lens-family-* ecosystem.

You can find the total library on Hackage or on GitHub.

by Gabriel Gonzalez ([email protected]) at January 11, 2015 07:21 PM

Russell O'Connor

Secure diffie-hellman-group-exchange-sha256

Recently I have been working on purging DSA from my computer systems. The problem with DSA and ECDSA is that they fail catastrophically when nonces are accidentally reused, or if the randomly generated nonces are biased. At about the same time, I was pleased to discover an article on securing SSH. It gives further advice on setting up SSH and I have proceeded to apply most of the recommendations listed there.

For key exchange algorithms, the article suggests using curve25519-sha256 and falling back to diffie-hellman-group-exchange-sha256 for compatibility purposes if you must. The diffie-hellman-group-exchange-sha256 protocol allows the client and server to negotiate a prime field to perform the key exchange in. In order to avoid using the smaller prime fields, the article suggests deleting prime numbers less than 2000 bits in size from /etc/ssh/moduli. The problem with this advice is that only the SSH server reads /etc/ssh/moduli; touching this file does nothing to secure your SSH client from using small prime fields during key negotiation. Securing the client is the important use case for diffie-hellman-group-exchange-sha256, because if you can control the server, then it means you will probably use curve25519-sha256 instead.

However, the protocol for diffie-hellman-group-exchange-sha256 does allow the client to negotiate the field size. The problem is that this ability is not exposed for configuration in SSH. To address this, I created a patch for OpenSSH that raises the minimum field size allowed for the diffie-hellman-group-exchange-sha256 key exchange for both the client and server. This means you do not need to edit the /etc/ssh/moduli file to increase the minimum field size for the server, but it will not hurt to do so either.

If you are running NixOS you can download the patch and add it to your /etc/nixos/configuration.nix file with the following attribute.

  nixpkgs.config.packageOverrides = oldpkgs: {
    openssh = pkgs.lib.overrideDerivation oldpkgs.openssh (oldAttrs: {
      patches = oldAttrs.patches ++ [ ./openssh-dh-grp-min.patch ];
    });
  };

As an aside, I noticed that this key exchange protocol has a design flaw in it. The hash signed by the server is the hash of V_C || V_S || I_C || I_S || K_S || min || n || max || p || g || e || f || K. The details of what those variables stand for are not important. What is important is that there is an older format of the protocol that is supported for backwards compatibility where the hash signed by the server is the hash of V_C || V_S || I_C || I_S || K_S || n || p || g || e || f || K. In this older protocol, the client only requests a field size without specifying the minimum and maximum allowed bounds. This is why the variables min and max do not appear in the hash of the older protocol. A short header is sent by the client to determine which of these two versions of this protocol it is using.

The problem is that this header is not part of the hashed data. This little crack has potential to be an exploit. A MITM attacker could replace the header the client sends with the old protocol header, and then try to manipulate the remaining communication between the client and server so that both the client and server hash the same serialized byte string allowing the server to appear to be authenticated to the client, but where the client and server are interpreting that serialized byte string in two different ways. In particular the MITM wants the client to not be doing computation modulo some safe prime, but instead do modular arithmetic over a different ring entirely.

Fortunately this particular little crack does not appear to be wide enough to exploit. The incidental properties of the serialization format do not allow a successful manipulation, at least not in practical SSH configurations.

When one is signing a serialized blob of data, it is important that the data can be unambiguously parsed using only the contents of that data. This principle is violated here because one cannot decide if one will parse min and max without knowing the header, which is not part of the serialized blob of data. The reason this failure can so easily creep into the protocol is that neither the client nor the server actually has to parse that blob of data. They both serialize the data and then create or verify the signature, but parsing is never done. Therefore, parsing is never implemented. Since parsing is not implemented, it is easy to miss that the data serialization format is ambiguous.

January 11, 2015 04:05 AM

Keegan McAllister

151-byte static Linux binary in Rust

Part of the sales pitch for Rust is that it's "as bare metal as C".1 Rust can do anything C can do, run anywhere C can run,2 with code that's just as efficient, and at least as safe (but usually much safer).

I'd say this claim is about 95% true, which is pretty good by the standards of marketing claims. A while back I decided to put it to the test, by making the smallest, most self-contained Rust program possible. After resolving a few issues along the way, I ended up with a 151-byte, statically linked executable for AMD64 Linux. With the release of Rust 1.0-alpha, it's time to show this off.

Here's the Rust code:


#![crate_type="rlib"]
#![allow(unstable)]

#[macro_use] extern crate syscall;

use std::intrinsics;

fn exit(n: usize) -> ! {
    unsafe {
        syscall!(EXIT, n);
        intrinsics::unreachable()
    }
}

fn write(fd: usize, buf: &[u8]) {
    unsafe {
        syscall!(WRITE, fd, buf.as_ptr(), buf.len());
    }
}

#[no_mangle]
pub fn main() {
    write(1, "Hello!\n".as_bytes());
    exit(0);
}

This uses my syscall library, which provides the syscall! macro. We wrap the underlying system calls with Rust functions, each exposing a safe interface to the unsafe syscall! macro. The main function uses these two safe functions and doesn't need its own unsafe annotation. Even in such a small program, Rust allows us to isolate memory unsafety to a subset of the code.

Because of crate_type="rlib", rustc will build this as a static library, from which we extract a single object file tinyrust.o:

$ rustc tinyrust.rs \
-O -C no-stack-check -C relocation-model=static \
-L syscall.rs/target
$ ar x libtinyrust.rlib tinyrust.o
$ objdump -dr tinyrust.o
0000000000000000 <main>:
0: b8 01 00 00 00 mov $0x1,%eax
5: bf 01 00 00 00 mov $0x1,%edi
a: be 00 00 00 00 mov $0x0,%esi
b: R_X86_64_32 .rodata.str1625
f: ba 07 00 00 00 mov $0x7,%edx
14: 0f 05 syscall
16: b8 3c 00 00 00 mov $0x3c,%eax
1b: 31 ff xor %edi,%edi
1d: 0f 05 syscall

We disable stack exhaustion checking, as well as position-independent code, in order to slim down the output. After optimization, the only instructions that survive come from inline assembly blocks in the syscall library.

Note that main doesn't end in a ret instruction. The exit function (which gets inlined) is marked with a "return type" of !, meaning "doesn't return". We make good on this by invoking the unreachable intrinsic after syscall!. LLVM will optimize under the assumption that we can never reach this point, making no guarantees about the program behavior if it is reached. This represents the fact that the kernel is actually going to kill the process before syscall!(EXIT, n) can return.

Because we use inline assembly and intrinsics, this code is not going to work on a stable-channel build of Rust 1.0. It will require an alpha or nightly build until such time as inline assembly and intrinsics::unreachable are added to the stable language of Rust 1.x.

Note that I didn't even use #![no_std]! This program is so tiny that everything it pulls from libstd is a type definition, macro, or fully inlined function. As a result there's nothing of libstd left in the compiler output. In a larger program you may need #![no_std], although its role is greatly reduced following the removal of Rust's runtime.

Linking

This is where things get weird.

Whether we compile from C or Rust,3 the standard linker toolchain is going to include a bunch of junk we don't need. So I cooked up my own linker script:

SECTIONS {
    . = 0x400078;

    combined . : AT(0x400078) ALIGN(1) SUBALIGN(1) {
        *(.text*)
        *(.data*)
        *(.rodata*)
        *(.bss*)
    }
}

We smash all the sections together, with no alignment padding, then extract that section as a headerless binary blob:

$ ld --gc-sections -e main -T script.ld -o payload tinyrust.o
$ objcopy -j combined -O binary payload payload.bin

Finally we stick this on the end of a custom ELF header. The header is written in NASM syntax but contains no instructions, only data fields. The base address 0x400078 seen above is the end of this header, when the whole file is loaded at 0x400000. There's no guarantee that ld will put main at the beginning of the file, so we need to separately determine the address of main and fill that in as the e_entry field in the ELF file header.

$ ENTRY=$(nm -f posix payload | grep '^main ' | awk '{print $3}')
$ nasm -f bin -o tinyrust -D entry=0x$ENTRY elf.s
$ chmod +x ./tinyrust
$ ./tinyrust
Hello!

It works! And the size:

$ wc -c < tinyrust
158

Seven bytes too big!

The final trick

To get down to 151 bytes, I took inspiration from this classic article, which observes that padding fields in the ELF header can be used to store other data. Like, say, a string constant. The Rust code changes to access this constant:


use std::{mem, raw};

#[no_mangle]
pub fn main() {
    let message: &'static [u8] = unsafe {
        mem::transmute(raw::Slice {
            data: 0x00400008 as *const u8,
            len: 7,
        })
    };

    write(1, message);
    exit(0);
}

A Rust slice like &[u8] consists of a pointer to some memory, and a length indicating the number of elements that may be found there. The module std::raw exposes this as an ordinary struct that we build, then transmute to the actual slice type. The transmute function generates no code; it just tells the type checker to treat our raw::Slice<u8> as if it were a &[u8]. We return this value out of the unsafe block, taking advantage of the "everything is an expression" syntax, and then print the message as before.

Trying out the new version:

$ rustc tinyrust.rs \
-O -C no-stack-check -C relocation-model=static \
-L syscall.rs/target
$ ar x libtinyrust.rlib tinyrust.o
$ objdump -dr tinyrust.o
0000000000000000 <main>:
0: b8 01 00 00 00 mov $0x1,%eax
5: bf 01 00 00 00 mov $0x1,%edi
a: be 08 00 40 00 mov $0x400008,%esi
f: ba 07 00 00 00 mov $0x7,%edx
14: 0f 05 syscall
16: b8 3c 00 00 00 mov $0x3c,%eax
1b: 31 ff xor %edi,%edi
1d: 0f 05 syscall

...
$ wc -c < tinyrust
151
$ ./tinyrust
Hello!

The object code is the same as before, except that the relocation for the string constant has become an absolute address. The binary is smaller by 7 bytes (the size of "Hello!\n") and it still works!

You can find the full code on GitHub. The code in this article works on rustc 1.0.0-dev (44a287e6e 2015-01-08). If I update the code on GitHub, I will also update the version number printed by the included build script.

I'd be curious to hear if anyone can make my program smaller!


  1. C is not really "bare metal", but that's another story

  2. From a pure language perspective. If you want to talk about availability of compilers and libraries, then Rust still has a bit of a disadvantage ;) 

  3. In fact, this code grew out of an earlier experiment with really small binaries in C. 

by keegan ([email protected]) at January 11, 2015 02:03 AM

January 10, 2015

Philip Wadler

Are the Greens or UKIP a major party? Have your say.

Ofcom has issued a draft ruling that the Greens are not a 'major party' but that UKIP is. Hard to justify, one might think, given that Caroline Lucas has been a sitting MP since May 2010, while UKIP did not acquire its first MP until a by-election in October 2014. One consequence of Ofcom's decision is that the Greens may be shut out of any televised national election debates, while Farage is given a seat.

It's a draft ruling, and there is still time to have your say.
My own response to Ofcom is below.

Question 1: Please provide your views on:
a) the evidence of current support laid out in Annex 2, and
b) whether there is any other relevant evidence which you consider Ofcom should take into
account for the purposes of the 2015 review of the list of major parties:

The Green Party has had a sitting MP since May 2010, while UKIP has only had a sitting MP since October 2014. This relevant point appears nowhere in the annex.

Question 2: Do you agree with our assessment in relation to each of:
a) The existing major parties,
b) Traditional Unionist Voice in Northern Ireland,
c) The Green Party (including the Scottish Green Party), and
d) UKIP?
Please provide reasons for your views:

It is a scandal to include UKIP as a major party while excluding the Greens. Either both should be included, or both excluded. I would prefer to see both included.

While UKIP enjoys stronger support than the Greens in current opinion polls, this is a short-lived phenomenon driven in part by the increased coverage recently given to UKIP by news media. Ofcom's role should be to dampen the effect of media focus on the current bandwagon, not to amplify it.

Ofcom should ensure that all serious contenders have an opportunity to make their views heard. The cutoff for being a 'serious contender' should sit at support from around 5% of the electorate.

Question 3: Do you agree with the proposed amendment to Rule 9 of the PPRB Rules Procedures outlined in paragraph 3.7 above? Please provide reasons for your views:

I do not agree with the proposed amendment. It is stated that parties 'might' raise unsustainable complaints, but no evidence is provided that this is a serious problem. It is more democratic to leave the decision to the Electoral Commission than to give Ofcom the ability to refuse complaints without any right of appeal.

[I note also that the reference on the web form to 'paragraph 3.7 above' is confusing. Not only does the relevant paragraph not appear on the web page with the question, the web page does not even contain a link to the document containing the paragraph.] 

by Philip Wadler ([email protected]) at January 10, 2015 03:33 PM

January 09, 2015

Danny Gratzer

Why Constructive Logic

Posted on January 9, 2015
Tags: types

Continuing on my quest of writing about my poorly thought out comments, let’s talk about constructive logic. A lot of people in and around the Haskell/FP community will make statements like

The Curry-Howard isomorphism means that you’re proving things in constructive logic.

Usually absent from these remarks is a nice explanation of why constructive logic matches up with the programming we know and love.

In this post I’d like to highlight what constructive logic is intended to capture and why this corresponds so nicely with programming.

A Bit of History

First things first, let's discuss the actual origin of constructive logic. It starts with a mathematician and philosopher named Brouwer. He was concerned with trying to give an answer to the question “What does it mean to know something to be true?”, where the something in question is a mathematical proposition.

He settled on the idea of proof being a sort of subjective and personal thing. I know something is true if and only if I can formulate some intuitive proof of it. When viewed this way, the proof I scribble down on paper doesn’t actually validate something’s truthfulness. It’s merely a serialization of my thought process for validating its truthfulness.

Notice that this line of reasoning doesn't actually specify a precise definition of what verifying something intuitively means. I interpret this idea as something slightly more meta than any single formal system. Rather, when looking at a formal system, you ought to verify that its axioms are admissible by your own intuition, and then you may go on to accept proofs built off of these axioms.

Now, after Brouwer started talking about these ideas, Arend Heyting decided to try to write down a logic that captured this notion of “proof is intuition”. The result was this thing called intuitionistic logic. This logic is part of a broader family of logics called “constructive logics”.

Constructive Logic

The core idea of constructive logic is replacing the notion of truth found in classical logic with an intuitionist version. In a classical logic each proposition is either true or false, regardless of what we know about it.

In our new constructive system, a formula cannot be assigned either truth value until we have direct evidence for it. It's not that there's a magical new boolean value, {true, false, i-don't-know}; it's just not a meaningful question to ask. It doesn't make sense in these logics to say “A is true” without having a proof of A. There isn't necessarily this Platonic notion of truthfulness, just things we as logicians can prove. This is sometimes why constructive logic is called “logic for humans”.

The consequences of dealing with things in this way boil down to a few things. For example, we now know that

  1. If ∃x. A(x) can be proven, then there is some term t, which we can readily produce, so that A(t) is provable
  2. If A ∨ B can be proven then either A or B is provable and we know which. (note that ∨ is the symbol for OR)

These make sense when you realize that ∃x. A(x) can only be proven if we have a direct example of it. We can’t indirectly reason that it really ought to exist or merely claim that it must be true in one of a set of cases. We actually need to introduce it by proving an example of it. When our logic enforces this of course we can produce that example!

The same goes for A ∨ B: in our logic the only way to prove A ∨ B is to either provide a proof of A or provide a proof of B. If this is the only way to build a proof of A ∨ B, we can always just point to how it was introduced!

If we extend this to and (∧): the only way to prove A ∧ B is to prove both A and B. If this is the only way to get to a proof of A ∧ B, then of course we can get a proof of A from A ∧ B. ∧ is just behaving like a pair of proofs.

All of this points at one thing: our logic is structured so that we can only prove something when we directly prove it, that’s the spirit of Brouwer’s intuitionism that we’re trying to capture.

There are a lot of different incarnations of constructive logic; in fact pretty much every logic has a constructive cousin. They all share this notion of “we need a direct proof for something to be true”, however. One thing to note is that some constructive logics conflict a bit with intuitionism. While intuitionism might have provided some of the basis for constructive logics, gradually people have poked and pushed the boundaries away from just Brouwer's intuitionism. For example, both Markov's principle and Church's thesis state something about all computable functions. While they may be reasonable statements, we can't give a satisfactory intuitive proof for them. This is a little confusing, I know, and I'm only going to talk about constructive logics that Brouwer would approve of.

I encourage the curious reader to poke further at this, it’s rather cool math.

Who on Earth Cares?

Now while constructive logic probably sounds reasonable, if weird, it doesn’t immediately strike me as particularly useful! Indeed, the main reason why computer science cares about constructivism is because we all use it already.

To better understand this, let's talk about the Curry-Howard isomorphism. It's that thing that wasn't really invented by either Curry or Howard, and that some claim isn't best seen as an isomorphism; naming is hard. The Curry-Howard isomorphism states that there's a mapping from a type to a logical proposition and from a program to a proof.

To show some of the mappings for types

    CH(Either a b) = CH(a) ∨ CH(b)
    CH((a, b))     = CH(a) ∧ CH(b)
    CH( () )       = ⊤ -- True
    CH(Void)       = ⊥ -- False
    CH(a -> b)     = CH(a) → CH(b)

So a program with the type (a, b) is really a proof that a ∧ b is true. Here the truthfulness of a proposition really means that the corresponding type can be occupied by a program.
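To make that concrete, here's a small sketch: a pair really is a proof of a conjunction, and the projections are its elimination rules (andElimL and andElimR are just illustrative names).

    -- A proof of A ∧ B is literally a pair; eliminating it is just projection.
    andElimL :: (a, b) -> a
    andElimL (a, _) = a

    andElimR :: (a, b) -> b
    andElimR (_, b) = b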

Now, onto why this logic we get is constructive. Recall our two conditions for a logic being constructive, first is that if ∃x. A(x) is provable then there’s a specific t where A(t) is provable.

Under the Curry-Howard isomorphism, ∃ is mapped to existential types (I wonder how that got its name :). That means that a proof of ∃x. A(x) is something like

    -- Haskell ex. syntax is a bit gnarly :/
    data Exists f = forall x. Exists (f x)

    ourProof :: Exists F
    ourProof = ...

Now we know the only way to construct an Exists F is to use the constructor Exists. This constructor means that there is at least one specific type for which we could prove f x. We can also easily produce this term as well!

    isProof :: Exists f -> (forall x. f x -> c) -> c
    isProof (Exists x) cont = cont x

We can always access the specific “witness” we used to construct this Exists type with pattern matching.

The next law is similar. If we have a proof of a ∨ b we’re supposed to immediately be able to produce a proof of a or a proof of b.

In programming terms, if we have a program Either a b we're supposed to be able to immediately tell whether this returns Right or Left! We can't get away with some hand-wavy argument that one of these must be constructible without knowing which, since we have to be able to actually run this program! If we evaluate a program with the type Either a b we're guaranteed to get either Left a or Right b.
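As a tiny sketch of that point (whichSide is just an illustrative name): once we've evaluated the Either value, pattern matching tells us which disjunct we hold.

    whichSide :: Either a b -> String
    whichSide (Left _)  = "we hold a proof of the left disjunct"
    whichSide (Right _) = "we hold a proof of the right disjunct"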

The Self-Sacrificing Definition of Constructive Logic

There are a few explanations of constructive logic that basically describe it as “Classical logic - the law of excluded middle”. More verbosely, a constructive logic is just one that forbids

  1. ∀ A. A ∨ ¬ A being provable (the law of excluded middle, LEM)
  2. ∀ A. ¬ (¬ A) → A being provable (the law of double negation)

I carefully chose the words “being provable” because we can easily introduce these as a hypothesis to a proof and still have a sound system. Indeed this is not uncommon when working in Coq or Agda. They’re just not a readily available tool. Looking at them, this should be apparent as they both let us prove something without directly proving it.

This isn’t really a defining aspect of constructivism, just a natural consequence. If we need a proof of A to show A to be true if we admit A ∨ ¬ A by default it defeats the point. We can introduce A merely by showing ¬ (¬ A) which isn’t a proof of A! Just a proof that it really ought to be true.

In programming terms this is saying we can’t write these two functions.

    data Void

    doubleNeg :: ((a -> Void) -> Void) -> a
    doubleNeg = ...

    lem :: Either a (a -> Void)
    lem = ...

For the first one we have two choices: either we use this (a -> Void) -> Void term we're given, or we construct an a without it. Constructing an arbitrary a without the function is just equivalent to forall a. a, which we know to be unoccupied. That means we have to use (a -> Void) -> Void, which means we have to build an a -> Void. We have no way of doing anything interesting with that supplied a, however, so we're completely stuck! The story is similar with lem.
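As a side note in the same spirit: while lem itself is not definable, its double negation is, which is one way to see why assuming LEM as a hypothesis stays sound. A minimal sketch (noMagic is an illustrative name, reusing the Void type above):

    -- The double negation of LEM is constructively provable: we never have to
    -- actually produce an a or an a -> Void, only respond to a consumer of them.
    noMagic :: (Either a (a -> Void) -> Void) -> Void
    noMagic k = k (Right (\a -> k (Left a)))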

In a lot of ways this definition strikes me in the same way that describing functional programming as

Oh it’s just programming where you don’t have variables or objects.

Or static typing as

It’s just dynamic typed programming where you can’t write certain correct programs

I have a strong urge to say “Well.. yes but no!”.

Wrap Up

Hopefully this helps clarify what exactly people mean when they say Haskell corresponds to a constructive logic or programs are proofs. Indeed this constructivism gives rise to a really cool thing called “proof relevant mathematics”. This is mathematics done purely with constructive proofs. One of the latest ideas to trickle from mathematics to computers is homotopy type theory where we take a proof relevant look at identity types.

Before I wrap up I wanted to share one funny little thought I heard. Constructive mathematics has found a home in automated proof systems. Imagine Brouwer’s horror at hearing we do “intuitionist” proofs that no one will ever look at or try to understand beyond some random mechanical proof assistant!

Thanks to Jon Sterling and Darryl McAdams for the advice and insight


January 09, 2015 12:00 AM

January 08, 2015

Danny Gratzer

A Crash Course on ML Modules

Posted on January 8, 2015
Tags: sml, haskell

I was having lunch with a couple of Haskell programmers the other day and the subject of the ML family came up. I've been writing a lot of ML lately and mentioned that I thought *ML was well worth learning for the average Haskeller. When pressed why, the best answer I could come up with was “Well.. clean language. Oh! And an awesome module system”, which wasn't exactly my most compelling response.

I’d like to outline a bit of SML module system here to help substantiate why looking at an ML is A Good Thing. All the code here should be translatable to OCaml if that’s more your taste.

Concepts

In ML languages modules are a well thought out portion of the language. They aren’t just “Oh we need to separate these names… modules should work”. Like any good language they have methods for abstraction and composition. Additionally, like any good part of an ML language, modules have an expressive type language for mediating how composition and abstraction works.

So to explain how this module system functions as a whole, we’ll cover 3 subjects

  1. Structures
  2. Signatures
  3. Functors

Giving a cursory overview of what each thing is and how it might be used.

Structures

Structures are the values in the module language. They are how we actually create a module. The syntax for them is

    struct
      fun flip f x y = f y x
      datatype 'a list = Con of ('a * 'a list) | Nil
      ...
    end

A quick note to Haskellers: in ML, types are lower case and type variables are written with a leading ' (so 'a rather than a). Type constructors are applied “backwards”, so List a is 'a list.

So they’re just a bunch of a declarations stuffed in between a struct and end. This is a bit useless if we can’t bind it to a name. For that there’s

    structure M = struct val x = 1 end

And now we have a new module M with a single member, x : int. This is just like binding a variable in the term language except a “level up” if you like. We can use this just like you would use modules in any other language.

    val x' = M.x + 1

Since struct ... end can contain any list of declarations we can nest module bindings.

    structure M' =
      struct
        structure NestedM = M
      end

And we access the nested structure using the . notation again:

    val sum = M'.NestedM.x + M.x

As you can imagine, it would get a bit tedious if we needed to . our way to every single module access. For that we have open which just dumps a module’s exposed contents into our namespace. What’s particularly interesting about open is that it is a “normal” declaration and can be nested with let.

    fun f y =
      let open M in
        x + y
      end

OCaml has gone a step further and added special syntax for small opens. The “local opens” would turn our code into

    let f y = M.(x + y)

This already gives us a lot more power than your average module system. Structures basically encapsulate what we’d expect in a module system, but

  1. Structures =/= files
  2. Structures can be bound to names
  3. Structures can be nested

Up next is a look at what sort of type system we can impose on our language of structures.

Signatures

Now for the same reason we love types in the term language (safety, readability, insert-semireligious-rant) we’d like them in the module language. Happily ML comes equipped with a feature called signatures. Signature values look a lot like structures

    sig
      val x : int
      datatype 'a list = Cons of ('a * 'a list) | Nil
    end

So a signature is a list of declarations without any implementations. We can list algebraic data types, other modules, and even functions and values but we won’t provide any actual code to run them. I like to think of signatures as what most documentation rendering tools show for a module.

As we had with structures, signatures can be given names.

    signature MSIG = sig val x : int end

On their own signatures are quite useless, the whole point is that we can apply them to modules after all! To do this we use : just like in the term language.

    structure M : MSIG = struct val x = 1 end

When compiled, this will check that M has at least the field x : int inside its structure. We can apply signatures retroactively to both module variables and structure values themselves.

    structure M : MSIG = struct val x = 1 end : MSIG

One interesting feature of signatures is the ability to leave certain types abstract. For example, when implementing a map the actual implementation of the core data type doesn’t belong in the signature.

    signature MAP =
      sig
        type key
        type 'a table

        val empty : 'a table
        val insert : key -> 'a -> 'a table -> 'a table
        val lookup : key -> 'a table -> 'a option
      end

Notice that the type of keys and tables are left abstract. When someone applies a signature they can do so in two ways, weak or strong ascription. Weak ascription (:) means that the constructors of abstract types are still accessible, but the signature does hide all unrelated declarations in the module. Strong ascription (:>) makes the abstract types actually abstract.

Every once in a while we need to modify a signature. We can do this with the keywords where type. For example, we might implement a specialization of MAP for integer keys and want our signature to express this

    structure IntMap :> MAP where type key = int =
      struct ... end

This incantation leaves the type of the table abstract but specializes the keys to an int.

Last but not least, let’s talk about abstraction in module land.

Functors

Last but not least let’s talk about the “killer feature” of ML module systems: functors. Functors are the “lifting” of functions into the module language. A functor is a function that maps modules with a certain signature to functions of a different signature.

Jumping back to our earlier example of maps, the equivalent in Haskell land is Data.Map. The big difference is that Haskell gives us maps for all keys that implement Ord. Our signature doesn’t give us a clear way to associate all these different modules, one for each Orderable key, that are really the same thing. We can represent this relationship in SML with

    signature ORD =
      sig
        type t
        val compare : t * t -> order
      end

    functor RBTree (O : ORD) : MAP where type key = O.t =
      struct
        open O
        ....
      end

Which reads as “For any module implementing ORD, I can give you a module implementing MAP with keys of type O.t”. We can then instantiate these:

    structure IntOrd =
      struct
        type t = int
        val compare = Int.compare
      end
    structure IntMap = RBTree(IntOrd)
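For comparison, a rough Haskell sketch of the same example: the Ord class plays the role of the ORD signature, and the class constraint on Data.Map's operations replaces the explicit functor application.

    import qualified Data.Map as Map

    -- The Ord Int instance is picked implicitly; no functor application needed.
    intMap :: Map.Map Int String
    intMap = Map.insert 2 "two" (Map.insert 1 "one" Map.empty)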

Sadly SML’s module language isn’t higher order. This means we can’t assign functors a type (there isn’t an equivalent of ->) and we can’t pass functors to functors. Even with this restriction functors are tremendously useful.

One interesting difference between SML and OCaml is how functors handle abstract types. Specifically, is it the case that

F(M).t = F(M).t

In SML the answer is (surprisingly) no! Applying a functor generates brand new abstract types. This is actually beneficial when you remember SML and OCaml aren't pure. For example you might write a functor for handling symbol tables and internally use a mutable symbol table. One nifty trick would be to keep the type of symbols abstract. If you only give back a symbol upon registering something in the table, this would mean that all symbols a user can supply are guaranteed to correspond to some entry.

This falls apart however if functors are extensional. Consider the following REPL session

    > structure S1 = SymbolTable(WhateverParameters)
    > structure S2 = SymbolTable(WhateverParameters)
    > val key = S1.register "I'm an entry"
    > S2.lookup key
    Error: no such key!

This will not work if S1 and S2 have separate key types.

To my knowledge, the general conclusion is that generative functors (à la SML) are good for impure code, but applicative functors (à la OCaml and Backpack) really shine with pure code.

Wrap Up

We’ve covered a lot of ground in this post. This wasn’t an exhaustive tour of every feature of ML module systems, but hopefully I got the jist across.

If there’s one point to take home: In a lot of languages modules are clearly a bolted on construction. They’re something added on later to fix “that library problem” and generally consist of the same “module <-> file” and “A module imports others to bring them into scope”. In ML that’s simply not the case. The module language is a rich, well thought out thing with it’s own methods of abstraction, composition, and even a notion of types!

I wholeheartedly recommend messing around a bit with OCaml or SML to see how having these things impacts your thought process. I think you’ll be pleasantly surprised.


January 08, 2015 12:00 AM

January 07, 2015

The GHC Team

GHC Weekly News - 2015/01/07

Hi *, it's time for another GHC Weekly News! This week's edition will actually be covering the last two/three weeks; your editor has missed the past few editions due to Holiday madness (and also some relaxation, which is not madness). It's also our first news posting in 2015!

So let's get going without any further delay!

GHC HQ met this week after the Christmas break; some of our notes include:

  • Since Austin is back, he'll be spending some time finishing up all the remaining binary distributions for GHC 7.8.4 and GHC 7.10.1 RC1 (mostly, FreeBSD and OS X builds).
  • We've found that 7.10.1 RC1 is working surprisingly well for users so far; to help users accommodate the changes, Herbert has conveniently written a migration guide for users for their most common problems when upgrading to 7.10.1: https://ghc.haskell.org/trac/ghc/wiki/Migration/7.10
  • We're aiming to release the 2nd Release Candidate for GHC 7.10.1 on January 19th. We're hoping this will be the last RC, with 7.10.1 final popping up in the middle of February.
  • GHC HQ may tentatively be working to release another GHC 7.8 release, but only for a specific purpose: to allow it to compile with 7.10.1. This will make it significantly easier for users to compile old GHCs (perhaps on newer platforms). However, we're not yet 100% decided on this, and we will likely only do a 'very minor release' of the source tarball, should this be the case. Thanks to Edward Yang for helping with this.
  • For future GHC releases on Windows, we're looking into adopting Neil Mitchell's new binary distribution of GHC, which is a nice installer that includes Cabal, MSYS and GHC. This should significantly lower the burden for Windows users to use GHC and update, ship or create packages. While we're not 100% sure we'll be able to have it ready for 7.10.1, it looks promising. Thanks Neil! (For more info, read Neil's blog post here: http://neilmitchell.blogspot.co.at/2014/12/beta-testing-windows-minimal-ghc.html )

There's also been some movement and chatter on the mailing lists, as usual.

  • Joachim Breitner made an exciting announcement: he's working on a new performance dashboard for GHC, so we can more easily track and look at performance results over time. The current prototype looks great, and Joachim and Austin are working together to make this an official piece of GHC's infrastructure: https://www.haskell.org/pipermail/ghc-devs/2015-January/007885.html
  • Over the holiday, Simon went and implemented a nice new feature for GHC: detection of redundant constraints. This means if you mention Ord in a type signature, but actually use nothing which requires that constraint, GHC can properly warn about it. This will be going into 7.12: https://www.haskell.org/pipermail/ghc-devs/2015-January/007892.html
  • Now that GHC 7.10 will feature support for DWARF based debugging information, Johan Tibell opened a very obvious discussion thread: what should we do about shipping GHC and its libraries with debug support? Peter chimed in with some notes - hopefully this will all be sorted out in time for 7.10.1 proper: https://www.haskell.org/pipermail/ghc-devs/2015-January/007851.html

Closed tickets the past few weeks include: #8984, #9880, #9732, #9783, #9575, #9860, #9316, #9845, #9913, #9909, #8650, #9881, #9919, #9732, #9783, #9915, #9914, #9751, #9744, #9879, #9876, #9032, #7473, #9764, #9067, #9852, #9847, #9891, #8909, #9954, #9508, #9586, and #9939.

by thoughtpolice at January 07, 2015 11:04 PM

FP Complete

Announcing: mutable-containers 0.2

As part of our high-performance computing work, I recently found myself in need of some fast mutable containers. The code is now available on both Hackage and Stackage. The code is pretty young, and is open to design changes still. That said, the currently released version (0.2.0) is well tested and performs fairly well. If there are ideas for improvement, please let me know!

Below is the content of the README file, which gives a good overview of the package, as well as benchmark numbers and test coverage statistics (spoiler: 100%). As always, you can see the README on Github or on Stackage.


One of Haskell's strengths is immutable data structures. These structures make it easier to reason about code, simplify concurrency and parallelism, and in some cases can improve performance by allowing sharing. However, there are still classes of problems where mutable data structures can both be more convenient, and provide a performance boost. This library is meant to provide such structures in a performant, well tested way. It also provides a simple abstraction over such data structures via typeclasses.

Before anything else, let me provide the caveats of this package:

  • Don't use this package unless you have a good reason to! Immutable data structures are a better approach most of the time!
  • This code is intentionally not multithread safe. If you need something like a concurrent queue, there are many options on Hackage, from Chan to TChan, to chaselev-deque.

We'll first talk about the general approach to APIs in this package. Next, there are two main sets of abstractions provided, which we'll cover in the following two sections, along with their concrete implementations. Finally, we'll cover benchmarks.

API structure

The API takes heavy advantage of the PrimMonad typeclass from the primitive package. This allows our data structures to work in both IO and ST code. Each data structure has an associated type, MCState, which gives the primitive state for that structure. For example, in the case of IORef, that state is RealWorld, whereas for STRef s, it would be s. This associated type is quite similar to the PrimState associated type from primitive, and in many type signatures you'll see an equality constraint along the lines of:

PrimState m ~ MCState c

For those who are wondering, MCState stands for "mutable container state."

All actions are part of a typeclass, which allows for generic access to different types of structures quite easily. In addition, we provide type hint functions, such as asIORef, which can help specify types when using such generic functions. For example, a common idiom might be:

ioref <- fmap asIORef $ newRef someVal

Wherever possible, we stick to well accepted naming and type signature standards. For example, note how closely modifyRef and modifyRef' match modifyIORef and modifyIORef'.
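For concreteness, here is a minimal usage sketch built from the names mentioned above (newRef, asIORef, modifyRef', plus readRef from the same typeclass); treat the exact module name and signatures as assumptions rather than copied documentation.

import Data.Mutable  -- assumed module name

main :: IO ()
main = do
    -- asIORef pins the generic newRef down to a concrete IORef
    ref <- fmap asIORef $ newRef (0 :: Int)
    modifyRef' ref (+ 1)
    readRef ref >>= print  -- prints 1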

Single cell references

The base package provides both IORef and STRef as boxed mutable references, for storing a single value. The primitive package also provides MutVar, which generalizes over both of those and works for any PrimMonad instance. The MutableRef typeclass in this package abstracts over all three of those. It has two associated types: MCState for the primitive state, and RefElement to specify what is contained by the reference.

You may be wondering: why not just take the reference as a type parameter? That wouldn't allow us to have monomorphic reference types, which may be useful under some circumstances. This is a similar motivation to how the mono-traversable package works.

In addition to providing an abstraction over IORef, STRef, and MutVar, this package provides four additional single-cell mutable references. URef, SRef, and BRef all contain a 1-length mutable vector under the surface, which is unboxed, storable, and boxed, respectively. The advantage of the first two over standard boxed references is that they can avoid a significant amount of allocation overhead. See the relevant Stack Overflow discussion and the benchmarks below.

While BRef doesn't give this same advantage (since the values are still boxed), it was trivial to include it along with the other two, and does actually demonstrate a performance advantage. Unlike URef and SRef, there is no restriction on the type of value it can store.

The final reference type is PRef. Unlike the other three mentioned, it doesn't use vectors at all, but instead drops down directly to a mutable bytearray to store values. This means it has slightly less overhead (no need to store the size of the vector), but also restricts the types of things that can be stored (only instances of Prim).

You should benchmark your program to determine the most efficient reference type, but generally speaking PRef will be most performant, followed by URef and SRef, and finally BRef.

Collections

Collections allow you to push and pop values to the beginning and end of themselves. Since different data structures allow different operations, each operation goes into its own typeclass, appropriately named MutablePushFront, MutablePushBack, MutablePopFront, and MutablePopBack. There is also a parent typeclass MutableCollection which provides:

  1. The CollElement associated type to indicate what kinds of values are in the collection.
  2. The newColl function to create a new, empty collection.

The mono-traversable package provides a typeclass IsSequence which abstracts over sequence-like things. In particular, it provides operations for cons, snoc, uncons, and unsnoc. Using this abstraction, we can provide an instance for all of the typeclasses listed above for any mutable reference containing an instance of IsSequence, e.g. IORef [Int] or BRef s (Seq Double).

Note that the performance of some of these combinations is terrible. In particular, pushBack or popBack on a list requires traversing the entire list, and any push operation on a Vector requires copying the entire contents of the vector. Caveat emptor! If you must use one of these structures, it's highly recommended to use Seq, which gives the best overall performance.

However, in addition to these instances, this package also provides two additional data structures: double-ended queues and doubly-linked lists. The former is based around mutable vectors, and therefore has unboxed (UDeque), storable (SDeque), and boxed (BDeque) variants. Doubly-linked lists have no such variety, and are simply DLists.

For general purpose queue-like structures, UDeque or SDeque is likely to give you best performance. As usual, benchmark your own program to be certain, and see the benchmark results below.
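A similarly hedged sketch for the collection side, assuming the same Data.Mutable module and an asUDeque type hint analogous to asIORef (the push/pop names come from the typeclasses above):

import Data.Mutable  -- assumed module name

queueDemo :: IO ()
queueDemo = do
    q <- fmap asUDeque newColl  -- asUDeque is assumed to exist alongside asIORef
    pushBack q (1 :: Int)
    pushBack q 2
    popFront q >>= print  -- Just 1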

Benchmark results

The following benchmarks were performed on January 7, 2015, against version 0.2.0.

Ref benchmark

benchmarking IORef
time                 4.322 μs   (4.322 μs .. 4.323 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 4.322 μs   (4.322 μs .. 4.323 μs)
std dev              1.401 ns   (1.114 ns .. 1.802 ns)

benchmarking STRef
time                 4.484 μs   (4.484 μs .. 4.485 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 4.484 μs   (4.484 μs .. 4.484 μs)
std dev              941.0 ps   (748.5 ps .. 1.164 ns)

benchmarking MutVar
time                 4.482 μs   (4.482 μs .. 4.483 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 4.482 μs   (4.482 μs .. 4.483 μs)
std dev              843.2 ps   (707.9 ps .. 1.003 ns)

benchmarking URef
time                 2.020 μs   (2.019 μs .. 2.020 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.020 μs   (2.019 μs .. 2.020 μs)
std dev              955.2 ps   (592.2 ps .. 1.421 ns)

benchmarking PRef
time                 2.015 μs   (2.014 μs .. 2.015 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.014 μs   (2.014 μs .. 2.015 μs)
std dev              901.3 ps   (562.8 ps .. 1.238 ns)

benchmarking SRef
time                 2.231 μs   (2.230 μs .. 2.232 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.231 μs   (2.230 μs .. 2.231 μs)
std dev              1.938 ns   (1.589 ns .. 2.395 ns)

benchmarking BRef
time                 4.279 μs   (4.279 μs .. 4.279 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 4.279 μs   (4.279 μs .. 4.279 μs)
std dev              1.281 ns   (1.016 ns .. 1.653 ns)

Deque benchmark

time                 8.371 ms   (8.362 ms .. 8.382 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 8.386 ms   (8.378 ms .. 8.398 ms)
std dev              29.25 μs   (20.73 μs .. 42.47 μs)

benchmarking IORef (Seq Int)
time                 142.9 μs   (142.7 μs .. 143.1 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 142.7 μs   (142.6 μs .. 142.9 μs)
std dev              542.8 ns   (426.5 ns .. 697.0 ns)

benchmarking UDeque
time                 107.5 μs   (107.4 μs .. 107.6 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 107.5 μs   (107.4 μs .. 107.6 μs)
std dev              227.4 ns   (171.8 ns .. 297.8 ns)

benchmarking SDeque
time                 97.82 μs   (97.76 μs .. 97.89 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 97.82 μs   (97.78 μs .. 97.89 μs)
std dev              169.5 ns   (110.6 ns .. 274.5 ns)

benchmarking BDeque
time                 113.5 μs   (113.4 μs .. 113.6 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 113.6 μs   (113.5 μs .. 113.7 μs)
std dev              300.4 ns   (221.8 ns .. 424.1 ns)

benchmarking DList
time                 156.5 μs   (156.3 μs .. 156.6 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 156.4 μs   (156.3 μs .. 156.6 μs)
std dev              389.5 ns   (318.3 ns .. 502.8 ns)

Test coverage

As of version 0.2.0, this package has 100% test coverage. If you look at the report yourself, you'll see some uncovered code; it's just the automatically derived Show instance needed for QuickCheck inside the test suite itself.

January 07, 2015 11:00 AM

January 06, 2015

Philip Wadler

A contrary view to 'Antidotes to the Imitation Game'

For five years, Barry Cooper has run the Alan Turing Year (ATY) mailing list, tracking worldwide activity related to the Turing Centennial and after. If you want a contrast to the view given by Antidotes to The Imitation Game, see Barry's most recent ATY post.

While I agree that fiction can sometimes come closer to truth than nonfiction, I disagree with Barry's claim that the 'higher-order' interpretation of the film accurately captures the arc of Turing's life. I suspect the real Turing differs hugely from the film's version, despite the effort and skill Tyldum, Cumberbatch, and others invested in the film.

Many computing scientists are disappointed by the divergence from history in The Imitation Game, while others think that if it does well at the Academy Awards that the popularisation of Turing will be good for our profession. There is something to be said for both points.

Hollywood's attempts at biography seem inevitably to involve gross oversimplification or distortion. Is this really necessary for a film to succeed? Does anyone have favourite examples of films that did not grossly distort their subject matter?

by Philip Wadler ([email protected]) at January 06, 2015 02:02 PM

Magnus Therning

Adder as a state machine

In my previous post I wrote about a, probably rather naïve, approach to constructing state machines in Haskell. I ended it by saying that the pattern matching of Haskell makes it rather simple to manually write the step function required to create a working state machine. Hopefully this post can convince you I'm right.

The vehicle I’ve chosen is a very simple machine capable of reading two integers and adding them. In slightly more detail it’s a machine that:

  1. reads a total of three parts, two integers followed by a plus sign (+)
  2. each part is separated by whitespace
  3. on errors the machine resets and starts over

The signals (a.k.a. the output alphabet) is the following type

data CalcSignal = CalcNothing | CalcResult Int | CalcError CalcStates String
    deriving (Eq, Show)

The events (a.k.a. the input alphabet) is simply Char. The states are

data CalcStates = Start | ReadFirst Int | ReadSecond Int Int | ReadOperator Int Int
    deriving (Eq, Show)

where the names hopefully are self explanatory. The type of the machine is then

type CalcMachine = Machine CalcStates Char CalcSignal

The machine itself can then be written like this:

import Data.Char (isNumber, isSpace)

calcMachine :: CalcMachine
calcMachine = createMachine Start go
    where
        go Start e
            | isNumber e = (ReadFirst (read [e]), CalcNothing)
            | otherwise = (Start, CalcError Start "No number")

        go s@(ReadFirst i) e
            | isNumber e = (ReadFirst (10 * i + read [e]), CalcNothing)
            | isSpace e = (ReadSecond i 0, CalcNothing)
            | otherwise = (Start, CalcError s "Bad format")

        go s@(ReadSecond l i) e
            | isNumber e = (ReadSecond l (10 * i + read [e]), CalcNothing)
            | isSpace e = (ReadOperator l i, CalcNothing)
            | otherwise = (Start, CalcError s "Bad format")

        go s@(ReadOperator i j) e
            | e == '+' = (Start, CalcResult (i + j))
            | isSpace e = (s, CalcNothing)
            | otherwise = (Start, CalcError s "Bad operator")

That’s rather simple and easy to read I find. Though I’m not sure it scales too well to much larger machines. I’ve not really used any DSLs to create large machines either, so I don’t know how well any method scales ;)

To do a bit of exploratory testing it’s handy to create the following function

import Control.Monad (foldM_)

calculate :: String -> IO ()
calculate = foldM_ go calcMachine
    where 
        go mach c = do
            let (m, s) = stepMachine mach c
            print s 
            return m

Using that function it’s easy to check if the machine works as intended.

> calculate "56 67 +"
CalcNothing
CalcNothing
CalcNothing
CalcNothing
CalcNothing
CalcNothing
CalcResult 123

So far so good. What about the behaviour on errors?

> calculate "5a6 67 +"
CalcNothing
CalcError (ReadFirst 5) "Bad format"
CalcNothing
CalcNothing
CalcNothing
CalcNothing
CalcNothing
CalcResult 73

That looks good enough to me. Though there is (at least) one detail of how the machine works that might be surprising and hence should be fixed, but I’ll leave that as an exercise for the reader ;)

As I mentioned in the previous post I’ve been using this method for writing state machines to implement two different protocols. For the IO I used conduit which means I had to turn the state machine into a conduit. I’ll write about that in a later post though.

January 06, 2015 12:00 AM

FP Complete

A New Release for the New Year: Announcing Release 3.2

A New Release for the New Year

We recently released version 3.2 of FP Haskell Center. We want to take this opportunity to list some of the new features and highlight the additions of a Hosted Haddocks button and the ability to set up personal RSS feeds within each account.

3.2 Features List

  • Support for downloading extra packages from arbitrary .tar and .gz URLs (available from the “extra packages” tab of the settings page)
  • Auto-insert functionality works for error messages (used to only work for warnings)
  • Toggle executable bit in UI. You can now make data files executable inside the IDE.
  • Updated hlint version for better functionality
  • Hosted Haddock button
  • Per-user RSS feed /user-feed/username to access

More about the features we think you’ll be the most interested in

Hosted Haddock Button

Often times when you’re working on a codebase, it’s convenient to generate the Haddocks to get an idea of what’s going on. It’s also useful to be able to share those generated Haddocks with others. FP Haskell Center now allows you to do both with a single click. Inside the deployment menu, you can now generated Haddocks for your current project. Links to dependencies will be created correctly, and the generated URL is fully shareable with others.

Per-user RSS Feed

Users now have the ability to set up personal RSS Feeds within their accounts. This answers a request from some users to be able to more easily let people stay up-to-date with their content. This ties in nicely with our previous addition of Disqus comments.

Feedback is always appreciated

We are proud of every improvement we make to FP Haskell Center and look forward to your feedback. With each release we are continuing to raise the quality of our product.

January 06, 2015 12:00 AM

January 05, 2015

Functional Jobs

Senior Software Engineer at McGraw-Hill Education (Full-time)

This Senior Software Engineer position is with the new LearnSmart team at McGraw-Hill Education's new and growing Research & Development center in Boston's Innovation District.

We make software that helps college students study smarter, earn better grades, and retain more knowledge.

The LearnSmart adaptive engine powers the products in our LearnSmart Advantage suite — LearnSmart, SmartBook, LearnSmart Achieve, LearnSmart Prep, and LearnSmart Labs. These products provide a personalized learning path that continuously adapts course content based on a student’s current knowledge and confidence level.

On our team, you'll get to:

  • Move textbooks and learning into the digital era
  • Create software used by millions of students
  • Advance the state of the art in adaptive learning technology
  • Make a real difference in education

Our team's products are built with Flow, a functional language in the ML family. Flow lets us write code once and deliver it to students on multiple platforms and device types. Other languages in our development ecosystem include especially JavaScript, but also C++, SWF (Flash), and Haxe.

If you're interested in functional languages like Scala, Swift, Erlang, Clojure, F#, Lisp, Haskell, and OCaml, then you'll enjoy learning Flow. We don't require that you have previous experience with functional programming, only enthusiasm for learning it. But if you do have some experience with functional languages, so much the better! (On-the-job experience is best, but coursework, personal projects, and open-source contributions count too.)

We require only that you:

  • Have a solid grasp of CS fundamentals (languages, algorithms, and data structures)
  • Be comfortable moving between multiple programming languages
  • Be comfortable with modern software practices: version control (Git), test-driven development, continuous integration, Agile

Get information on how to apply for this position.

January 05, 2015 02:15 PM

Douglas M. Auclair (geophf)

December 2014 1HaskellADay Problems and Solutions


December 2014
  • December 30th, 2014: Why can't we all just consensus along? for today's Haskell problem lpaste.net/117487 Pro-, con-, anti-, ... it's all sensus to me! lpaste.net/117490
  • December 29th, 2014: Uh, wait! What? It's another day? We need another #haskell puzzler? Here ya go! Rosalind subs (not grinders) http://lpaste.net/117421 Rose petals falling ... and a solution to the #rosalind subs problem http://lpaste.net/117481
  • December 24th, 2014: A||B||C == Merry Christmas for today's haskell problem http://lpaste.net/117213
  • December 23rd, 2014: Convergence, but not the movie, is today's #haskell problem http://lpaste.net/117089 A rational solution was posted by @sheshanaag at http://lpaste.net/117232 
  • December 22nd, 2014: Yesterday was about the zeroes, today's haskell problem is about the squares http://lpaste.net/116990
  • December 21st, 2014: In which we want to have the most zeros. Fancy that! http://lpaste.net/116932 (I guess we're not playing Dodgeball, then!)
  • December 19th, 2014: A question for today's #haskell problem: how do YOU clear a Writer monad's log? http://lpaste.net/116849 A FlushableWriter solution was posted by @aditcr8 at http://lpaste.net/117115
  • December 18th, 2014: Why can't you spell 'PROTEIN' with proteins? lpaste.net/116723 The solution for this problem is to Eat your proteins, young man! lpaste.net/116808
  • December 17th, 2014: In which we find out that Mendel was into the Domme-scene for today's #haskell problem http://lpaste.net/116683 Whether hetero- or homo- a zygous is still a zygous... whatever that is. A solution to today's #haskell-zygous problem is posted at http://lpaste.net/116711
  • December 16th, 2014: All actors should relate to today's #haskell problem; it's all about H-A-M-M! http://lpaste.net/116578 Go HAMM it up; #Rosalind will love you. And the solution is so HAMMy! http://lpaste.net/116585
  • December 15th, 2014: Wanna Date? In which Jane Austen writes today's #haskell problem in her modern book titled: "The Hook Up" http://lpaste.net/116486 aka Pride and Prejudice. To get to our solution space, first we have to write an l-expr scanner/parser, of course. http://lpaste.net/116558
  • December 12th, 2014: In this case 'GC' does NOT mean 'garbage collection' for today's #rosalind #haskell problem lpaste.net/116305. Yeah, not garbage-collected, but a solution, nonetheless http://lpaste.net/116335 
  • December 11th, 2014: Why ... you ... wascally wabbit! Today's #haskell problem has a solution multiplying like Fibonaccis! http://lpaste.net/116216 I mean: rabbits. And speaking of rabid rabbits http://lpaste.net/116303, the solution is there. BLARGH! 
  • December 10th, 2014: "Hello, Strand, you're looking lovely in that dress!" Complimenting DNA for today's #haskell problem. http://lpaste.net/116147 No, wait: 'comPLEmenting' a strand of DNA. Sorry, Miss Rosalind! A nicely complimented solution to the DNA-complement problem posted at http://lpaste.net/116151
  • December 9th, 2014: I think this is my week with dear Rosalind: DNA-to-RNA transcriber for today's #haskell problem http://lpaste.net/116068 Another #TweetedHaskellSolution posted to http://lpaste.net/116071 for today's DNA-to-RNA transcriber
  • December 8th, 2014: Rosalind, dear Rosalie, wherefore art thou givest me bioinformatic puzzles to solve, counting nucleotides? http://lpaste.net/115961 #haskell Nucleotide-counter, HO! A nucleotide-counting solution is posted at http://lpaste.net/116011 
  • December 5th, 2014: This visit to Liar's-berg (or is it Truth-town?) we encounter only two women, so this should be 2x as easy to solve http://lpaste.net/114437 By coercion, we find that we are in Liarsberg, after all: http://lpaste.net/115769
  • December 4th, 2014: Gadzooks! It's (past) that time of day again for the daily #haskell problem. Here: eat some glow worms! http://lpaste.net/114435
  • December 3rd, 2014: Substantive citizens (with some substantive hints, for a change) for today's #haskell problem http://lpaste.net/114438
  • December 2nd, 2014: There are some curious fractions to find from projecteuler.net for today's #haskell problem at http://lpaste.net/114208 A solution to this curious fraction problem is posted at lpaste.net/114209
  • December 1st, 2014: Today's Haskell problem is all about lambda-terms. It has NOTHING to do with for-loops and if-statements, ... NOT ONE BIT http://lpaste.net/115420 Solution: lpaste.net/115452 

by geophf ([email protected]) at January 05, 2015 12:31 AM

Christopher Done

Measuring duration in Haskell

Happy new year, everyone. It’s a new year and time for new resolutions. Let’s talk about time. Specifically, measuring it in Haskell.

A wrong solution

How do you measure how long something takes in Haskell? Here’s a naive attempt:

import Control.Exception
import Data.Time

main = do
    start <- getCurrentTime
    evaluate (sum [1 .. 1000000])
    end <- getCurrentTime
    print (diffUTCTime end start)

Running it, we see that it does what we expect:

λ> main
0.316653s

Inaccurate measuring

Here’s what’s wrong with this implementation:

  • The clock can be changed by the user at any moment.
  • Time synchronization services regularly update time.

If you’re on an Ubuntu desktop, time is updated when you first boot up from NTP servers. If you’re on a server, likely there is a daily cron job to update your time, because you don’t tend to reboot servers. My laptop has been on for 34 days:

$ uptime
21:13:47 up 34 days,  2:06,  3 users,  load average: 0.74, 0.83, 0.84

If I run a manual update, it adjusts my clock by 500 milliseconds:

$ sudo ntpdate ntp.ubuntu.com
5 Jan 21:11:53 ntpdate[4805]: adjust time server x.x.x.x offset 0.517166 sec

Because there is a certain amount of “drift” that occurs over time.

Additionally, leap seconds can be introduced at any time and cannot be predicted systematically, but there is at least six months' advance notice for time servers. In 2015 an extra second will be added between the 30th of June and the 1st of July.

These factors mean that if our main function is run during an update, the reported time could be completely wrong. For something simple like the above, maybe it doesn’t matter. For long term logging and statistics gathering, this would represent an anomaly. For a one-off, maybe it’s forgivable, because it’s convenient. But above all, it’s simply inaccurate reporting.

Accurate measuring

Readers familiar with this problem will think back to measuring time in C; it requires inspecting the system clock and dividing by clocks per second. In fact there are a couple solutions around that use this:

  • The timeit package. This is good if your use-case is simple.
  • In turn, that package uses System.CPUTime from base, which is also handy.

These are more reliable, because the time cannot be changed. But they are limited, as both only measure CPU time and not IO time. So if your program takes 10 seconds but only does 5 seconds of CPU processing and 5 seconds of waiting for the disk, then you will not have the real time. Also known as wall time.
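For reference, a minimal sketch of the System.CPUTime approach mentioned above; getCPUTime reports picoseconds of CPU time, so the 5 seconds of disk waiting in that scenario simply would not show up:

import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)

main :: IO ()
main = do
    start <- getCPUTime
    _ <- evaluate (sum [1 .. 1000000 :: Integer])
    end <- getCPUTime
    -- getCPUTime is in picoseconds; divide by 10^9 for milliseconds
    print (fromIntegral (end - start) / 1e9 :: Double)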

In the Criterion package, there's a need for fine-grained, fast, accurate measuring of both real and CPU time, so it includes its own cross-platform implementations.

That’s nice, but it’s embedded in a specific package built for benchmarking, which we may not necessarily be doing. For example, I am dabbling with a program to measure the speed of my key presses. It turns out there is a package that does similarly to Criterion, already prepared and similarly cross-platform and only depends on base and ghc-prim.

The clock package

I discovered this really nice package called clock which has the option for several time measurements:

  • Monotonic: a monotonic but not-absolute time which never changes after start-up.
  • Realtime: an absolute Epoch-based time (which is the system clock and can change).
  • ProcessCPUTime: CPU time taken by the process.
  • ThreadCPUTime: CPU time taken by the thread.

Let’s rewrite our example using this package and the formatting package (which provides a handy TimeSpec formatter as of 6.1):

{-# LANGUAGE OverloadedStrings #-}
import Control.Exception
import Formatting
import Formatting.Clock
import System.Clock

main =
  do start <- getTime Monotonic
     evaluate (sum [1 .. 1000000])
     end <- getTime Monotonic
     fprint (timeSpecs % "\n") start end

Running it, we see we get similar information as above, but now it’s accurate.

λ> main
276.05 ms

If you just want CPU time for the process, or the OS thread, just provide a different argument to getTime.
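For example, measuring only the CPU time used by the process is just a matter of swapping the clock; a small variation on the program above, reusing its imports (cpuOnly is an illustrative name):

cpuOnly :: IO ()
cpuOnly =
  do start <- getTime ProcessCPUTime
     _ <- evaluate (sum [1 .. 1000000 :: Integer])
     end <- getTime ProcessCPUTime
     fprint (timeSpecs % "\n") start end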

Summary

So next time you want to measure how long something takes, unless you’re doing benchmarking, check out the clock package!

January 05, 2015 12:00 AM

January 03, 2015

Magnus Therning

Simple state machines in Haskell

During this past autumn I’ve had to work with products implementing two small and simple protocols, one was USB based, the other completely proprietary. In both cases I found it very helpful to write clients where the central behaviour was modelled as a state machine. In the case of the proprietary protocol I’ve ended up writing a couple of servers implementing a subset of the behaviour of the real product, also here I found using a state machine as the central piece to be very useful.

I started out having a look at Hackage to see if there was already some tool or DSL that would help me create machines. Surprisingly I found nothing, which probably just means I didn’t look carefully enough. Anyway, after reading the Wikipedia articles on Finite-state machine, finite state transducer, Moore machine and Mealy machine I decided to write my own naïve implementation of something resembling Mealy machines.

A Mealy machine is a 6-tuple comprising

  • a finite set of states
  • a start state
  • a finite set called the input alphabet
  • a finite set called the output alphabet
  • a transition function mapping a state and an input symbol to the next state
  • an output function mapping a state and an input symbol to an output symbol

If the transition and output functions are combined into a single function, and we don’t care about being able to reset a machine, it’s possible to use a representation as simple as this:

data Machine state event signal = Machine
    { mCurState :: state
    , mTransFunction :: state -> event -> (state, signal)
    }

I’ve opted to call the input alphabet event and the output alphabet signal.

Stepping a machine is then performed by passing the current state and an event to the combined function and updating the machine with the result. Of course the generated signal has to be dealt with too. Or in other words:

stepMachine :: Machine state event signal -> event -> (Machine state event signal, signal)
stepMachine machine event = (machine {mCurState = newState}, output)
    where
        curState = mCurState machine
        (newState, output) = mTransFunction machine curState event

I also decided to add a function for creating a machine given a start state and a function. With the definition above it becomes trivial:

createMachine :: state -> (state -> event -> (state, signal)) -> Machine state event signal
createMachine = Machine

That’s it, really. Except of course that the actual transition function still has to be written. However, with the pattern matching of Haskell I’ve found that to be rather simple, but I’ll keep that for the next post.
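
For a taste of what such a transition function can look like, here is a toy two-state toggle (my example, not from the post), together with a few steps of the machine, reusing the Machine, createMachine and stepMachine definitions above:

data ToggleState = Off | On deriving Show
data ToggleEvent = Press deriving Show

-- Combined transition/output function, one equation per (state, event) pair
toggle :: ToggleState -> ToggleEvent -> (ToggleState, String)
toggle Off Press = (On,  "turned on")
toggle On  Press = (Off, "turned off")

main :: IO ()
main = do
    let m0       = createMachine Off toggle
        (m1, s1) = stepMachine m0 Press
        (m2, s2) = stepMachine m1 Press
    mapM_ putStrLn [s1, s2]   -- "turned on", then "turned off"
    print (mCurState m2)      -- Off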

January 03, 2015 12:00 AM

January 02, 2015

Ken T Takusagawa

[okucbyxc] Happy 2015!

The factorial of 2015 expressed in base 147 gives a nice random-looking blob of symbols. (Ironically, it ends in even more exclamation points.) We use the wbr HTML tag to break lines to the width of your browser window. See source code in Haskell. Naive radix conversion is an unfold, though divide and conquer would be more efficient.

import Data.List (unfoldr);
import Data.Tuple (swap);
radix_convert :: Integer -> Integer -> [Integer];
radix_convert base = unfoldr $ \n -> if n==0 then Nothing else Just $ swap $ divMod n base;
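
For example (a quick sanity check of my own, not from the original post), note that the digits come out least-significant first:

radix_convert 10 2015   -- [5,1,0,2]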

The 147 digits are a subset of Latin-1: !<wbr>#<wbr>$<wbr>%<wbr>&<wbr>(<wbr>)<wbr>*<wbr>+<wbr>-<wbr>/<wbr>0<wbr>1<wbr>2<wbr>3<wbr>4<wbr>5<wbr>6<wbr>7<wbr>8<wbr>9<wbr>:<wbr>;<wbr><<wbr>=<wbr>><wbr>?<wbr>@<wbr>A<wbr>B<wbr>C<wbr>D<wbr>E<wbr>F<wbr>G<wbr>H<wbr>I<wbr>J<wbr>K<wbr>L<wbr>M<wbr>N<wbr>O<wbr>P<wbr>Q<wbr>R<wbr>S<wbr>T<wbr>U<wbr>V<wbr>W<wbr>X<wbr>Y<wbr>Z<wbr>[<wbr>\<wbr>]<wbr>^<wbr>a<wbr>b<wbr>c<wbr>d<wbr>e<wbr>f<wbr>g<wbr>h<wbr>i<wbr>j<wbr>k<wbr>l<wbr>m<wbr>n<wbr>o<wbr>p<wbr>q<wbr>r<wbr>s<wbr>t<wbr>u<wbr>v<wbr>w<wbr>x<wbr>y<wbr>z<wbr>{<wbr>|<wbr>}<wbr>~<wbr>¡<wbr>¢<wbr>£<wbr>¤<wbr>¥<wbr>¦<wbr>§<wbr>©<wbr>«<wbr>¬<wbr>®<wbr>±<wbr>µ<wbr>¶<wbr>·<wbr>»<wbr>¼<wbr>½<wbr>¾<wbr>¿<wbr>Æ<wbr>Ç<wbr>Ð<wbr>×<wbr>Ø<wbr>Þ<wbr>ß<wbr>à<wbr>á<wbr>â<wbr>ã<wbr>ä<wbr>å<wbr>æ<wbr>ç<wbr>è<wbr>é<wbr>ê<wbr>ë<wbr>ì<wbr>í<wbr>î<wbr>ï<wbr>ð<wbr>ñ<wbr>ò<wbr>ó<wbr>ô<wbr>õ<wbr>ö<wbr>÷<wbr>ø<wbr>ù<wbr>ú<wbr>û<wbr>ü<wbr>ý<wbr>þ<wbr>ÿ

2015! = %<wbr>4<wbr>p<wbr>µ<wbr>ò<wbr>q<wbr>g<wbr>ê<wbr>0<wbr>)<wbr>\<wbr>8<wbr>Ç<wbr>ã<wbr>t<wbr>m<wbr>4<wbr>ç<wbr>×<wbr>F<wbr>2<wbr>u<wbr>Z<wbr>ó<wbr>|<wbr>f<wbr>µ<wbr>}<wbr>å<wbr>ï<wbr>v<wbr>Y<wbr>Ð<wbr>2<wbr>b<wbr>t<wbr>¢<wbr>á<wbr>V<wbr>ò<wbr>ø<wbr>ï<wbr>Z<wbr>¼<wbr>h<wbr>ç<wbr>k<wbr>ð<wbr>=<wbr>»<wbr>Z<wbr>?<wbr>¬<wbr>®<wbr>S<wbr>O<wbr>X<wbr>1<wbr>¿<wbr>\<wbr>¶<wbr>×<wbr>Ø<wbr>1<wbr>I<wbr>ü<wbr>@<wbr>[<wbr>R<wbr>§<wbr>}<wbr>u<wbr>Ø<wbr>æ<wbr>5<wbr>¢<wbr>ý<wbr>ç<wbr>t<wbr>®<wbr>]<wbr>Þ<wbr>P<wbr>)<wbr>l<wbr>[<wbr>¡<wbr>Æ<wbr>x<wbr>ì<wbr>G<wbr>@<wbr>ß<wbr>à<wbr>ï<wbr>½<wbr>©<wbr>P<wbr>O<wbr>O<wbr><<wbr>æ<wbr>b<wbr>k<wbr>S<wbr>ñ<wbr>û<wbr>)<wbr>0<wbr>±<wbr>¦<wbr>q<wbr>æ<wbr>)<wbr>N<wbr>w<wbr>)<wbr>2<wbr>I<wbr>â<wbr>â<wbr>/<wbr>g<wbr>u<wbr>í<wbr>å<wbr>å<wbr>/<wbr>w<wbr>7<wbr>ç<wbr>¶<wbr>û<wbr>\<wbr>w<wbr>q<wbr>ç<wbr>J<wbr>ù<wbr>å<wbr>g<wbr>D<wbr>/<wbr>¡<wbr>#<wbr>n<wbr>E<wbr>·<wbr>O<wbr>ý<wbr>ý<wbr>í<wbr>s<wbr>õ<wbr>T<wbr>y<wbr>(<wbr>8<wbr>b<wbr>V<wbr>[<wbr>ô<wbr>B<wbr>w<wbr>í<wbr>è<wbr>M<wbr>¿<wbr>¢<wbr>d<wbr>¶<wbr>~<wbr>L<wbr>y<wbr>A<wbr>A<wbr>L<wbr>3<wbr>ñ<wbr>]<wbr>/<wbr>/<wbr>¼<wbr>0<wbr>~<wbr>F<wbr>K<wbr>î<wbr>¡<wbr>d<wbr>¬<wbr>z<wbr>ú<wbr>ý<wbr>v<wbr>x<wbr>á<wbr>Ð<wbr>ú<wbr>ý<wbr>m<wbr>þ<wbr>!<wbr>l<wbr>T<wbr>û<wbr>¤<wbr>ç<wbr>4<wbr>ñ<wbr>*<wbr>^<wbr>=<wbr>ø<wbr>><wbr>ô<wbr>y<wbr>O<wbr>ô<wbr>÷<wbr>¼<wbr>S<wbr>f<wbr>H<wbr>Ø<wbr>ó<wbr>×<wbr>j<wbr>9<wbr>M<wbr>J<wbr>(<wbr>+<wbr>F<wbr>{<wbr>×<wbr>¤<wbr>^<wbr>|<wbr>û<wbr>$<wbr>e<wbr>L<wbr>[<wbr>K<wbr>[<wbr>®<wbr>E<wbr>®<wbr>D<wbr>ì<wbr>ã<wbr>8<wbr>H<wbr>=<wbr>ô<wbr>~<wbr>%<wbr>¿<wbr>m<wbr>¬<wbr>Z<wbr>H<wbr>*<wbr>M<wbr>«<wbr>×<wbr>¬<wbr>u<wbr>0<wbr>2<wbr>i<wbr>ÿ<wbr>ä<wbr>q<wbr>¶<wbr>ð<wbr>-<wbr>u<wbr>î<wbr>ß<wbr>@<wbr>8<wbr>à<wbr>ä<wbr>#<wbr>ü<wbr>U<wbr>D<wbr>ô<wbr><<wbr>æ<wbr>Z<wbr>¬<wbr>å<wbr>/<wbr>d<wbr>B<wbr>#<wbr>ø<wbr>-<wbr>h<wbr>Þ<wbr>ç<wbr>#<wbr>6<wbr>ð<wbr>h<wbr>é<wbr>u<wbr>¶<wbr>ï<wbr>+<wbr>¢<wbr>p<wbr>û<wbr>s<wbr>{<wbr>w<wbr>X<wbr>à<wbr>B<wbr>f<wbr>©<wbr>;<wbr>¢<wbr>d<wbr>\<wbr>ö<wbr>t<wbr>D<wbr>Ø<wbr>{<wbr>c<wbr>9<wbr>ð<wbr>t<wbr>¦<wbr>o<wbr>¤<wbr>¤<wbr>ó<wbr>/<wbr>C<wbr>¡<wbr>/<wbr>4<wbr>·<wbr>§<wbr>¢<wbr>æ<wbr><<wbr>G<wbr>«<wbr>û<wbr>j<wbr>P<wbr>ë<wbr>R<wbr><<wbr>Ø<wbr>¦<wbr>x<wbr>ê<wbr>5<wbr>è<wbr>a<wbr>Q<wbr>-<wbr>¾<wbr>-<wbr>á<wbr>¬<wbr>¡<wbr>ñ<wbr>û<wbr>j<wbr>y<wbr>+<wbr>V<wbr>n<wbr>±<wbr>4<wbr>ô<wbr>f<wbr>ö<wbr>·<wbr>¼<wbr>6<wbr>ú<wbr>F<wbr>ö<wbr>»<wbr>1<wbr>¤<wbr>j<wbr>L<wbr>|<wbr>N<wbr>×<wbr>ÿ<wbr>p<wbr>m<wbr>T<wbr>£<wbr>h<wbr>ï<wbr>ð<wbr>ý<wbr>q<wbr>ì<wbr>ö<wbr>¦<wbr>3<wbr>¾<wbr>0<wbr>G<wbr>1<wbr>ã<wbr>m<wbr>Ð<wbr>?<wbr>ü<wbr>¤<wbr>d<wbr>W<wbr>s<wbr>@<wbr>%<wbr>2<wbr>¶<wbr>V<wbr>é<wbr>O<wbr>B<wbr>è<wbr>m<wbr>Q<wbr>ê<wbr>@<wbr>e<wbr>4<wbr>!<wbr>E<wbr>z<wbr>ß<wbr>Ø<wbr>w<wbr>:<wbr>¾<wbr>î<wbr>b<wbr>æ<wbr>v<wbr>%<wbr>7<wbr>í<wbr>!<wbr>¡<wbr>&<wbr>×<wbr>×<wbr>Q<wbr>M<wbr>Z<wbr>£<wbr>g<wbr>1<wbr>ô<wbr>¡<wbr>â<wbr>~<wbr>ö<wbr>I<wbr>*<wbr>g<wbr>Ç<wbr>ô<wbr>*<wbr>|<wbr><<wbr>G<wbr>Ð<wbr>¢<wbr>¡<wbr>F<wbr>8<wbr>±<wbr>0<wbr>Z<wbr>±<wbr>©<wbr>C<wbr>×<wbr>%<wbr>Q<wbr>C<wbr>ò<wbr>÷<wbr>T<wbr>ê<wbr>8<wbr>{<wbr>u<wbr>i<wbr>¤<wbr>ý<wbr>Þ<wbr>ð<wbr>H<wbr>÷<wbr>½<wbr>ò<wbr>¶<wbr>B<wbr>a<wbr>ç<wbr>m<wbr>j<wbr>t<wbr>R<wbr>I<wbr>A<wbr>(<wbr>£<wbr>¾<wbr>Æ<wbr>+<wbr>ù<wbr>¾<wbr>z<wbr>w<wbr>S<wbr>ê<wbr>h<wbr>Æ<wbr>ô<wbr>à<wbr>)<wbr>-<wbr>E<wbr>Þ<wbr>$<wbr>·<wbr>h<wbr>×<wbr>m<wbr>W<wbr>s<wbr>ú<wbr>v<wbr>}<wbr>¾<wbr>Ð<wbr>M<wbr>â<wbr>·<wbr>k<wbr><<wbr>¦<wbr>Ø<wbr>ç<wbr>Ç<wbr>î<wbr>9<wbr>¦<wbr>&<wbr>&<wbr>¾<wbr>s<wbr>é<wbr>û<wbr>Z<wbr>D<wbr>Y<wbr>ó<wbr>s<wbr>n<wbr>r<wbr>o<wbr>£<wbr>%<wbr>×<wbr>¢<wbr>½<wbr>~<wbr>e<wbr>ê<wbr>1<wbr>a<wbr>?<wbr>S
<wbr>a<wbr>U<wbr>ç<wbr>è<wbr>q<wbr><<wbr>H<wbr>=<wbr>ê<wbr>2<wbr>ð<wbr>3<wbr>ö<wbr>Ø<wbr>y<wbr>p<wbr>|<wbr>Ð<wbr>±<wbr>5<wbr>¢<wbr>;<wbr>J<wbr>m<wbr>÷<wbr>þ<wbr>Þ<wbr>z<wbr>®<wbr>$<wbr>3<wbr>Þ<wbr>><wbr>ê<wbr>;<wbr>v<wbr>8<wbr>/<wbr>~<wbr>ú<wbr>t<wbr>ú<wbr>p<wbr>1<wbr>{<wbr>C<wbr>ô<wbr>L<wbr>ë<wbr>H<wbr>å<wbr>[<wbr>¤<wbr>n<wbr>F<wbr>3<wbr>\<wbr>¿<wbr>(<wbr>«<wbr>ã<wbr>í<wbr>\<wbr>ñ<wbr>F<wbr>!<wbr>ë<wbr>Y<wbr>Ø<wbr>Y<wbr>e<wbr>P<wbr>¡<wbr>£<wbr>å<wbr>m<wbr>U<wbr>F<wbr>¦<wbr>à<wbr>m<wbr>#<wbr>Ç<wbr>ü<wbr>ø<wbr>ã<wbr>±<wbr>3<wbr>Y<wbr>p<wbr>û<wbr>®<wbr>ó<wbr>=<wbr>8<wbr>â<wbr>¬<wbr>¢<wbr>y<wbr><<wbr>µ<wbr>d<wbr>+<wbr>[<wbr>F<wbr>C<wbr>Y<wbr>á<wbr>¥<wbr>G<wbr>ù<wbr>+<wbr>¿<wbr>)<wbr>è<wbr>M<wbr>ó<wbr>i<wbr>U<wbr>)<wbr>o<wbr>5<wbr>c<wbr>K<wbr>|<wbr>B<wbr>6<wbr>:<wbr>w<wbr>0<wbr>q<wbr>Q<wbr>j<wbr>¶<wbr>&<wbr>:<wbr>d<wbr>¦<wbr>O<wbr>K<wbr>õ<wbr>±<wbr>i<wbr>µ<wbr>+<wbr>D<wbr>l<wbr>=<wbr>£<wbr>í<wbr>6<wbr>0<wbr>ë<wbr>}<wbr>n<wbr>-<wbr>3<wbr>ë<wbr>ë<wbr>¼<wbr>©<wbr>c<wbr>e<wbr>¶<wbr>t<wbr>O<wbr>$<wbr>y<wbr>O<wbr>C<wbr>¾<wbr>~<wbr>a<wbr>à<wbr>÷<wbr>®<wbr>!<wbr>ä<wbr>u<wbr>¿<wbr>Ç<wbr>\<wbr>F<wbr>õ<wbr>C<wbr>?<wbr>O<wbr>Q<wbr>b<wbr>S<wbr>¿<wbr>Æ<wbr>þ<wbr>þ<wbr>Y<wbr>á<wbr>é<wbr>X<wbr>*<wbr>¾<wbr>;<wbr>G<wbr>k<wbr>¡<wbr>ù<wbr>j<wbr>=<wbr>z<wbr>ê<wbr>@<wbr>W<wbr>l<wbr>k<wbr>2<wbr>O<wbr>¢<wbr>M<wbr>K<wbr>/<wbr>ú<wbr>T<wbr>A<wbr>Q<wbr>ì<wbr>Ø<wbr>m<wbr>ü<wbr>g<wbr>ô<wbr>W<wbr>¼<wbr>P<wbr>¤<wbr>!<wbr>#<wbr>O<wbr>M<wbr>u<wbr>S<wbr>~<wbr>X<wbr>Y<wbr>Ç<wbr>o<wbr>}<wbr>j<wbr>ï<wbr>4<wbr>£<wbr>«<wbr>é<wbr>ï<wbr>æ<wbr>l<wbr>R<wbr>r<wbr>M<wbr>h<wbr>à<wbr>3<wbr>Ç<wbr>ö<wbr>k<wbr>5<wbr>L<wbr>><wbr>/<wbr>¢<wbr>4<wbr>à<wbr>^<wbr>®<wbr>w<wbr>t<wbr>[<wbr>ç<wbr>/<wbr>e<wbr>V<wbr>×<wbr>ú<wbr>¬<wbr>ü<wbr>(<wbr>µ<wbr>{<wbr>¿<wbr>R<wbr>5<wbr>j<wbr>5<wbr>÷<wbr>^<wbr>ã<wbr>½<wbr>l<wbr>#<wbr>]<wbr>§<wbr>|<wbr>;<wbr>3<wbr>]<wbr>§<wbr>÷<wbr>1<wbr>^<wbr>©<wbr>e<wbr>B<wbr>G<wbr>æ<wbr>8<wbr>3<wbr>f<wbr>k<wbr>~<wbr>]<wbr>Z<wbr>j<wbr>¿<wbr>í<wbr>¿<wbr>m<wbr>õ<wbr>(<wbr>£<wbr><<wbr>G<wbr>}<wbr>=<wbr>e<wbr>v<wbr>Æ<wbr>]<wbr>¼<wbr>¡<wbr>÷<wbr>#<wbr>ô<wbr>Ø<wbr>è<wbr>]<wbr>¾<wbr>Q<wbr>3<wbr>Y<wbr>ù<wbr>é<wbr>B<wbr>2<wbr>+<wbr>ì<wbr>×<wbr>ä<wbr>}<wbr>h<wbr>O<wbr>g<wbr>ÿ<wbr>í<wbr>Þ<wbr>ø<wbr>:<wbr>â<wbr>s<wbr>z<wbr>Ç<wbr>3<wbr>Ø<wbr>a<wbr>?<wbr>S<wbr>H<wbr>ú<wbr>X<wbr>><wbr>¾<wbr>S<wbr>ø<wbr>W<wbr>(<wbr>[<wbr>û<wbr>u<wbr>7<wbr>~<wbr>X<wbr>ó<wbr>X<wbr>:<wbr>O<wbr>R<wbr>§<wbr>F<wbr>«<wbr>T<wbr>^<wbr>D<wbr>A<wbr>ñ<wbr>¦<wbr>(<wbr>q<wbr>\<wbr>ù<wbr>Z<wbr>þ<wbr>I<wbr>N<wbr>½<wbr>G<wbr>æ<wbr>~<wbr>3<wbr>n<wbr>à<wbr>D<wbr>\<wbr>F<wbr>X<wbr>¶<wbr>/<wbr>©<wbr>ç<wbr>y<wbr>f<wbr>*<wbr>¡<wbr>a<wbr>r<wbr>m<wbr>l<wbr>þ<wbr>ú<wbr>/<wbr>£<wbr>9<wbr>B<wbr>i<wbr>»<wbr>1<wbr>><wbr>¦<wbr>(<wbr>x<wbr>I<wbr>]<wbr>»<wbr>o<wbr>(<wbr>±<wbr>;<wbr>:<wbr>¦<wbr>©<wbr>¤<wbr>B<wbr>i<wbr>@<wbr>¬<wbr>¥<wbr>ç<wbr>Æ<wbr>T<wbr>Æ<wbr>;<wbr>G<wbr>®<wbr>m<wbr>ö<wbr>q<wbr>ú<wbr>G<wbr>ø<wbr>e<wbr>u<wbr>Þ<wbr>®<wbr>«<wbr>4<wbr>ñ<wbr>0<wbr>4<wbr>5<wbr>é<wbr>a<wbr>4<wbr>ç<wbr>z<wbr>u<wbr>í<wbr>©<wbr>9<wbr>ü<wbr>Ç<wbr>ü<wbr>V<wbr>H<wbr>;<wbr>F<wbr>a<wbr>V<wbr>}<wbr>G<wbr>&<wbr>n<wbr>©<wbr>ï<wbr>n<wbr>ì<wbr>p<wbr>â<wbr>!<wbr>ç<wbr>4<wbr>æ<wbr>/<wbr>!<wbr>Þ<wbr>B<wbr>¥<wbr>ü<wbr>r<wbr>)<wbr>ß<wbr>?<wbr>¿<wbr>N<wbr>{<wbr>[<wbr>Y<wbr>Þ<wbr>ñ<wbr>·<wbr>)<wbr>D<wbr>L<wbr>3<wbr>«<wbr>g<wbr>1<wbr>(<wbr>o<wbr>Y<wbr>E<wbr>à<wbr>p<wbr>N<wbr>ê<wbr>d<wbr>Ø<wbr>ï<wbr>@<wbr>l<wbr>ê<wbr>#<wbr>ó<wbr>-<wbr>¤<wbr>+<wbr>g<wbr>û<wbr>ñ<wbr>!<wbr>j<wbr>î<wbr>@<wbr>ë<wbr>!<wbr>¦<wbr>e<wbr>«<wbr>c<wbr>3<wbr>×<wbr>|<wbr>a<wbr>ø<wbr>ï<wbr>j<wbr>\<wbr>H<wbr>m<wbr>A<wbr>ä<wbr>û<wbr>¢<wbr>[<wbr>a<wbr>m<wbr>N<wb
r>[<wbr>®<wbr>U<wbr>U<wbr>ä<wbr>¢<wbr>¬<wbr>@<wbr>¼<wbr>9<wbr>ì<wbr>ï<wbr>5<wbr>s<wbr>¼<wbr>@<wbr>×<wbr>Y<wbr>p<wbr>å<wbr>H<wbr>~<wbr>N<wbr>u<wbr>+<wbr>þ<wbr>d<wbr>Ð<wbr>Ð<wbr>æ<wbr>I<wbr>å<wbr>\<wbr>f<wbr>ý<wbr>ü<wbr>¢<wbr>¦<wbr>f<wbr>M<wbr>ï<wbr>2<wbr>d<wbr>÷<wbr>á<wbr>¿<wbr>N<wbr>L<wbr>¶<wbr>§<wbr>«<wbr>!<wbr>£<wbr>ç<wbr>u<wbr>¥<wbr>ù<wbr>Ð<wbr>ç<wbr>÷<wbr>ì<wbr>b<wbr>i<wbr>A<wbr>1<wbr>y<wbr>!<wbr>y<wbr>Q<wbr>J<wbr>1<wbr>×<wbr>«<wbr>¼<wbr>p<wbr>·<wbr>B<wbr>F<wbr>+<wbr>¾<wbr>v<wbr>»<wbr>^<wbr>L<wbr>G<wbr>@<wbr>1<wbr>¶<wbr>r<wbr>%<wbr>|<wbr>×<wbr>B<wbr>â<wbr>L<wbr>S<wbr>£<wbr>T<wbr>ý<wbr>$<wbr>$<wbr>µ<wbr>N<wbr>|<wbr>é<wbr>ã<wbr>f<wbr>ë<wbr>G<wbr>A<wbr>A<wbr>a<wbr>C<wbr>ù<wbr>q<wbr>¾<wbr>ò<wbr>×<wbr>ø<wbr>5<wbr>î<wbr>{<wbr>c<wbr>n<wbr>ó<wbr>¿<wbr>)<wbr>ñ<wbr>u<wbr>û<wbr>9<wbr>÷<wbr>E<wbr>¾<wbr>2<wbr>ê<wbr>6<wbr>R<wbr>ø<wbr>{<wbr>ô<wbr>ð<wbr>b<wbr>¢<wbr>?<wbr>P<wbr>m<wbr>v<wbr>ì<wbr>R<wbr>K<wbr>¤<wbr>><wbr>q<wbr>#<wbr>ñ<wbr>n<wbr>ÿ<wbr>©<wbr>©<wbr>¾<wbr>6<wbr>¬<wbr>j<wbr>A<wbr>à<wbr>m<wbr>õ<wbr>ã<wbr>a<wbr>}<wbr>R<wbr>U<wbr>R<wbr>#<wbr>®<wbr>£<wbr>ø<wbr>8<wbr>Ð<wbr>3<wbr>ó<wbr>s<wbr>©<wbr>µ<wbr>{<wbr>¤<wbr>L<wbr>ò<wbr>á<wbr>5<wbr>5<wbr>)<wbr>(<wbr>ä<wbr>k<wbr>><wbr>c<wbr>ù<wbr>ê<wbr>W<wbr>a<wbr>ö<wbr>õ<wbr>R<wbr>e<wbr>D<wbr>è<wbr>z<wbr>r<wbr>¾<wbr>Ø<wbr>y<wbr>c<wbr>â<wbr>2<wbr>§<wbr>}<wbr>Æ<wbr>x<wbr>á<wbr>¼<wbr>A<wbr>ï<wbr>:<wbr>ý<wbr>~<wbr>><wbr>Q<wbr>6<wbr>R<wbr>i<wbr>ß<wbr>-<wbr>S<wbr>p<wbr>K<wbr>6<wbr>§<wbr>ô<wbr>¢<wbr>Z<wbr>f<wbr>»<wbr>P<wbr>P<wbr>&<wbr>¼<wbr>s<wbr>±<wbr>*<wbr>ã<wbr>]<wbr>ô<wbr>~<wbr>J<wbr>z<wbr>P<wbr>5<wbr>à<wbr>r<wbr>q<wbr><<wbr>\<wbr>·<wbr>N<wbr>µ<wbr>N<wbr>J<wbr>{<wbr>r<wbr>v<wbr>^<wbr>6<wbr>¦<wbr>ý<wbr>ï<wbr>¤<wbr>-<wbr>M<wbr>x<wbr>!<wbr>ì<wbr>[<wbr>b<wbr>å<wbr>#<wbr>à<wbr>í<wbr>{<wbr>x<wbr>Æ<wbr>â<wbr>p<wbr>à<wbr>{<wbr>Ø<wbr>W<wbr>}<wbr>t<wbr>á<wbr>-<wbr>ì<wbr>=<wbr>ç<wbr>¢<wbr>¾<wbr>N<wbr>E<wbr>ì<wbr>M<wbr>G<wbr>{<wbr>í<wbr>c<wbr>ÿ<wbr>á<wbr>z<wbr>þ<wbr><<wbr>J<wbr>®<wbr>#<wbr>q<wbr>u<wbr>h<wbr>é<wbr>ú<wbr>P<wbr>w<wbr>£<wbr>:<wbr>à<wbr>%<wbr>µ<wbr>ñ<wbr>(<wbr>ë<wbr>Æ<wbr>q<wbr><<wbr>t<wbr>ì<wbr>ê<wbr>¤<wbr>®<wbr>s<wbr>û<wbr>a<wbr>d<wbr>F<wbr>ñ<wbr>9<wbr>þ<wbr>ü<wbr>C<wbr>:<wbr>û<wbr>¡<wbr>2<wbr>å<wbr>E<wbr>¾<wbr>-<wbr>><wbr>ÿ<wbr>3<wbr>¦<wbr>$<wbr>à<wbr>!<wbr>}<wbr>ë<wbr>µ<wbr>D<wbr>¶<wbr>5<wbr>ñ<wbr>i<wbr>V<wbr>Æ<wbr>Ð<wbr>><wbr>þ<wbr>%<wbr>B<wbr>ñ<wbr>L<wbr>@<wbr>f<wbr>ð<wbr>§<wbr>¬<wbr>6<wbr>|<wbr>ñ<wbr>ß<wbr>9<wbr>¥<wbr>¦<wbr>s<wbr>þ<wbr>g<wbr>]<wbr>;<wbr>m<wbr>/<wbr>©<wbr>b<wbr>=<wbr>-<wbr>/<wbr>ê<wbr>d<wbr>ö<wbr>ç<wbr>2<wbr><<wbr>t<wbr>á<wbr>í<wbr>ñ<wbr>j<wbr>R<wbr>:<wbr>P<wbr>9<wbr>q<wbr>)<wbr>ò<wbr>ã<wbr>p<wbr>I<wbr>q<wbr>/<wbr>e<wbr>¾<wbr>(<wbr>í<wbr>X<wbr>y<wbr>V<wbr>U<wbr>+<wbr>-<wbr>w<wbr>¢<wbr>w<wbr>ñ<wbr>V<wbr>õ<wbr>ñ<wbr>r<wbr>±<wbr>ê<wbr>q<wbr>u<wbr>±<wbr>w<wbr>®<wbr>z<wbr>H<wbr>l<wbr>ý<wbr>4<wbr>+<wbr>w<wbr>è<wbr>Þ<wbr>W<wbr>{<wbr>ü<wbr>Þ<wbr>@<wbr>¶<wbr>k<wbr>n<wbr>ß<wbr>i<wbr>:<wbr>ß<wbr>@<wbr>1<wbr>¤<wbr>þ<wbr>4<wbr>7<wbr>%<wbr>R<wbr>Y<wbr>(<wbr>@<wbr>®<wbr>Ç<wbr>m<wbr>¿<wbr>V<wbr>ð<wbr>¶<wbr>b<wbr>M<wbr>N<wbr>ø<wbr>q<wbr>ö<wbr>ó<wbr>¦<wbr>s<wbr>ú<wbr>e<wbr>o<wbr>]<wbr>Y<wbr>M<wbr>æ<wbr>ë<wbr>M<wbr>ð<wbr>í<wbr>c<wbr>è<wbr>©<wbr>)<wbr>m<wbr>4<wbr>u<wbr>/<wbr>6<wbr>=<wbr>8<wbr>½<wbr>:<wbr>W<wbr>*<wbr>Q<wbr>î<wbr>ü<wbr>G<wbr>n<wbr>x<wbr>ç<wbr>x<wbr>ï<wbr>K<wbr>C<wbr>½<wbr>)<wbr>|<wbr>ñ<wbr>I<wbr>]<wbr>5<wbr>·<wbr>d<wbr>H<wbr>d<wbr>ç<wbr>T<wbr>C<wbr>l<wbr>ã<wbr>#<wbr>¿<wbr>U<wbr>ä<wbr>W<wbr>í<wbr>á<wbr>|<wbr>§<wbr>µ<wbr>q<wbr>x<wbr>D<wbr>á<wbr>þ<wbr>µ<wbr>ò<wbr>ô<wbr>½<wbr>ä<wbr>i<wbr>d<wbr>F<wbr>Y<wbr>#<wbr>)<wbr>ê<wbr>I<wbr>±<wbr>å<wbr>p<wbr>4
<wbr>9<wbr>Ç<wbr>x<wbr>V<wbr>¦<wbr>â<wbr>\<wbr>C<wbr>ï<wbr>8<wbr>»<wbr>!<wbr>®<wbr>d<wbr>¼<wbr>k<wbr>®<wbr>¬<wbr>l<wbr>£<wbr>·<wbr><<wbr>¦<wbr>C<wbr>J<wbr>W<wbr>q<wbr>q<wbr>î<wbr>A<wbr>}<wbr>ã<wbr>7<wbr>T<wbr>T<wbr>S<wbr>]<wbr>ý<wbr>÷<wbr>\<wbr>%<wbr>B<wbr>S<wbr>f<wbr>î<wbr>ô<wbr>p<wbr>|<wbr>ñ<wbr>A<wbr>n<wbr>ë<wbr>ë<wbr>?<wbr>ì<wbr>%<wbr>q<wbr>N<wbr>ÿ<wbr>~<wbr>ô<wbr>Ç<wbr>y<wbr>î<wbr>«<wbr>¶<wbr>G<wbr>l<wbr>F<wbr>¾<wbr>w<wbr>¡<wbr>P<wbr>L<wbr>;<wbr>4<wbr>m<wbr>Þ<wbr>3<wbr>Y<wbr>ù<wbr>ß<wbr>Ç<wbr>«<wbr>Ø<wbr>á<wbr>G<wbr>¬<wbr>î<wbr>ä<wbr>6<wbr>D<wbr>¡<wbr>k<wbr>l<wbr>à<wbr>å<wbr>{<wbr>&<wbr>X<wbr>S<wbr>y<wbr>ù<wbr>6<wbr>ø<wbr>#<wbr>N<wbr>4<wbr>f<wbr>µ<wbr>m<wbr>§<wbr>x<wbr>@<wbr>e<wbr>ü<wbr>f<wbr>{<wbr>â<wbr>à<wbr>µ<wbr>¾<wbr>¼<wbr>*<wbr>k<wbr>=<wbr>·<wbr>ò<wbr>a<wbr>y<wbr>÷<wbr>X<wbr>S<wbr>§<wbr>õ<wbr>i<wbr>4<wbr>ø<wbr>2<wbr>¢<wbr>1<wbr>á<wbr>è<wbr>R<wbr>¦<wbr>i<wbr>å<wbr>ú<wbr>0<wbr>o<wbr>q<wbr>J<wbr>H<wbr>d<wbr>x<wbr>%<wbr>á<wbr>·<wbr>Ø<wbr>><wbr>¿<wbr>ë<wbr>¼<wbr>¡<wbr>W<wbr>ß<wbr>Y<wbr>ò<wbr>ä<wbr>l<wbr>L<wbr>ä<wbr>ç<wbr>ë<wbr>+<wbr>2<wbr>·<wbr>f<wbr>(<wbr>O<wbr>¬<wbr>á<wbr>ý<wbr>K<wbr>q<wbr>H<wbr>j<wbr>(<wbr>/<wbr>)<wbr>g<wbr>á<wbr>ý<wbr>¥<wbr>X<wbr>Ç<wbr>P<wbr>+<wbr>3<wbr>4<wbr>z<wbr>ð<wbr>C<wbr>8<wbr>3<wbr>¢<wbr>¬<wbr>à<wbr>ø<wbr>v<wbr>¬<wbr>j<wbr>J<wbr>f<wbr>x<wbr>K<wbr>P<wbr>#<wbr>6<wbr>±<wbr>ú<wbr>ß<wbr>M<wbr>t<wbr>E<wbr>0<wbr>½<wbr>h<wbr>û<wbr>6<wbr>Ð<wbr>V<wbr>ò<wbr>µ<wbr>Z<wbr>J<wbr>ã<wbr>ð<wbr>ï<wbr>7<wbr>÷<wbr>ê<wbr>I<wbr>ø<wbr>o<wbr>ð<wbr>T<wbr>j<wbr>><wbr>v<wbr>+<wbr>~<wbr>x<wbr>ö<wbr>%<wbr>}<wbr>¡<wbr>ã<wbr>ô<wbr>ü<wbr>a<wbr>f<wbr>b<wbr>m<wbr>$<wbr>|<wbr>F<wbr>ã<wbr>¼<wbr>¬<wbr>Ç<wbr>Æ<wbr>A<wbr>ä<wbr>><wbr>Þ<wbr>L<wbr>±<wbr>*<wbr>3<wbr><<wbr>©<wbr>C<wbr>½<wbr>}<wbr>¤<wbr>ç<wbr>]<wbr>w<wbr>ý<wbr>¬<wbr>ü<wbr>l<wbr>:<wbr>¢<wbr>c<wbr>}<wbr>X<wbr>r<wbr>ø<wbr>y<wbr>A<wbr>o<wbr>@<wbr>o<wbr>à<wbr>G<wbr>!<wbr>r<wbr>a<wbr>Ø<wbr>«<wbr>N<wbr>8<wbr>¼<wbr>B<wbr>d<wbr>S<wbr>2<wbr>O<wbr>U<wbr>¥<wbr>A<wbr>=<wbr>|<wbr>Q<wbr>¥<wbr>f<wbr>i<wbr>Ð<wbr>ò<wbr>é<wbr>7<wbr>G<wbr>#<wbr>I<wbr>§<wbr>Q<wbr>H<wbr>H<wbr>¿<wbr>Æ<wbr>0<wbr>a<wbr>F<wbr>ï<wbr>?<wbr>W<wbr>B<wbr>×<wbr>L<wbr>><wbr>N<wbr>^<wbr>§<wbr>[<wbr>]<wbr>í<wbr>¿<wbr>¤<wbr>»<wbr>ø<wbr>R<wbr>ë<wbr>i<wbr>«<wbr>¥<wbr>¾<wbr>v<wbr>}<wbr>©<wbr>×<wbr>¦<wbr>3<wbr>j<wbr>/<wbr>L<wbr>v<wbr>/<wbr>7<wbr>Y<wbr>ê<wbr>a<wbr>!<wbr>»<wbr>¤<wbr>µ<wbr>|<wbr>©<wbr>s<wbr>?<wbr>æ<wbr>à<wbr>î<wbr>0<wbr>«<wbr>ô<wbr>g<wbr>ë<wbr>j<wbr>j<wbr>h<wbr>å<wbr>Q<wbr>Ø<wbr>s<wbr>å<wbr>p<wbr>ð<wbr>|<wbr>»<wbr>z<wbr>C<wbr>P<wbr>h<wbr>s<wbr>ø<wbr>2<wbr>§<wbr>k<wbr>¢<wbr>y<wbr>ï<wbr>¥<wbr>X<wbr>é<wbr>;<wbr>k<wbr>o<wbr>T<wbr>ì<wbr>F<wbr>ä<wbr>ú<wbr>¶<wbr>¾<wbr>¼<wbr>y<wbr>/<wbr>ì<wbr>a<wbr>Þ<wbr>/<wbr>6<wbr>±<wbr>N<wbr>;<wbr>¿<wbr>$<wbr>n<wbr>®<wbr>Ð<wbr>s<wbr>^<wbr>å<wbr>í<wbr>[<wbr>u<wbr>æ<wbr>·<wbr>ê<wbr>ú<wbr>a<wbr>o<wbr>p<wbr>r<wbr>U<wbr>p<wbr>þ<wbr>~<wbr>ö<wbr>ê<wbr>a<wbr>û<wbr>?<wbr>ä<wbr>=<wbr>ë<wbr>:<wbr>§<wbr>|<wbr>5<wbr>z<wbr>±<wbr>í<wbr>v<wbr>©<wbr>J<wbr>Ø<wbr>Æ<wbr>e<wbr>S<wbr>¬<wbr>C<wbr>!<wbr>+<wbr>Ð<wbr>ï<wbr>D<wbr>|<wbr>9<wbr>L<wbr>4<wbr>[<wbr>L<wbr>C<wbr>^<wbr>G<wbr>@<wbr>í<wbr><<wbr>b<wbr>*<wbr>n<wbr>2<wbr>T<wbr>ú<wbr>M<wbr>x<wbr>^<wbr>ÿ<wbr>®<wbr>|<wbr>m<wbr>(<wbr>Ç<wbr>@<wbr>ó<wbr>1<wbr>¢<wbr>V<wbr>(<wbr>L<wbr>K<wbr>t<wbr>Þ<wbr>ç<wbr>o<wbr>X<wbr>*<wbr>ó<wbr>t<wbr>í<wbr>í<wbr>m<wbr>L<wbr>ý<wbr>T<wbr>N<wbr>¶<wbr>V<wbr>ü<wbr>M<wbr>í<wbr>}<wbr>O<wbr>~<wbr>u<wbr>j<wbr>$<wbr>a<wbr>F<wbr>8<wbr>x<wbr>~<wbr>X<wbr>Ç<wbr>ý<wbr>E<wbr>M<wbr>Z<wbr>ò<wbr>0<wbr>¢<wbr>q<wbr>*<wbr>(<wbr>L<wbr>2<wbr>ù<wbr>×<wbr>¾<wbr>+<wbr>><wbr>â<wbr>X<wbr>P<wbr>g<wbr>h<wbr>*<wbr>7<wbr>q<wb
r>á<wbr>×<wbr>O<wbr>g<wbr>l<wbr>ã<wbr>ó<wbr>[<wbr>K<wbr>x<wbr>ö<wbr>¶<wbr>Q<wbr>a<wbr>¼<wbr>!<wbr>§<wbr>E<wbr>»<wbr>¥<wbr>ï<wbr>ß<wbr>&<wbr>å<wbr>A<wbr>K<wbr>9<wbr>0<wbr>a<wbr>k<wbr>e<wbr>Þ<wbr>W<wbr>?<wbr>o<wbr>¶<wbr>N<wbr>ý<wbr>T<wbr>M<wbr>5<wbr>O<wbr>!<wbr>U<wbr>á<wbr>ë<wbr>¢<wbr>Q<wbr>ý<wbr>X<wbr>ï<wbr>j<wbr>]<wbr>O<wbr>p<wbr>§<wbr>Y<wbr>ó<wbr>-<wbr>><wbr>%<wbr>*<wbr>æ<wbr>¦<wbr>ø<wbr><<wbr>6<wbr>¥<wbr>j<wbr>g<wbr>@<wbr>t<wbr>å<wbr>h<wbr>W<wbr>ù<wbr>½<wbr>p<wbr>¿<wbr>þ<wbr>@<wbr>4<wbr>@<wbr>\<wbr>«<wbr>C<wbr>(<wbr>ê<wbr>ç<wbr>÷<wbr>¥<wbr>×<wbr>I<wbr>K<wbr>ç<wbr>µ<wbr>å<wbr>ï<wbr>t<wbr>g<wbr>¤<wbr>P<wbr>N<wbr>s<wbr>¡<wbr>X<wbr>/<wbr>5<wbr>ã<wbr>d<wbr>ê<wbr>½<wbr>|<wbr>ê<wbr>W<wbr>¿<wbr>;<wbr>®<wbr>¶<wbr>z<wbr>·<wbr>q<wbr>z<wbr>7<wbr>A<wbr>v<wbr>]<wbr>ô<wbr>U<wbr>o<wbr>7<wbr>û<wbr>û<wbr>w<wbr>{<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!<wbr>!

by Ken at January 02, 2015 01:09 AM

January 01, 2015

Roman Cheplyaka

Lexical analysis with parser combinators

When writing a programming language parser in Haskell, the usual dilemma is whether to use lexer/parser generators (Alex+Happy), or make a single parser (using a parser combinator library like Parsec) without an explicit tokenizer.

In this article, I’ll explain why you may want to use a separate lexer built with applicative parser combinators, and how you might go about writing one.

Alex

Alex, the Haskell lexer generator, is a nice tool, but it has a weakness. With Alex, you cannot observe the internal structure of a token.

The most common instance of this problem is parsing string interpolation, think "Hello, ${name}!". Such a string is a single token, yet it has some internal structure to it. To analyze this with Alex, you’ll probably need to have two different lexers, and run one inside the other.

Another instance is date literals (useful for many DSLs), such as 2015-01-02 (or d"2015-01-02" to avoid confusion with infix subtraction.) You can recognize such a literal with an Alex regular expression $d+\-$d+\-$d+. However, you won’t be able to get hold of the individual $d+ pieces — Alex regexps lack capture groups. So you’ll need to use a separate date parsing function to parse the same string a second time.

Parsec

On the other hand, we have parser combinators that solve this problem nicely. You define parsers for numbers; then you glue them with dashes, and you get a parser for dates.

However, using Parsec (or similar) without a separate lexer is not the best choice:

  1. Dealing with whitespace and comments is awkward in the parser; you need to wrap everything in a token combinator. (If you decide to do that, at least use a free applicative functor to ensure that you don’t forget to consume that whitespace).

  2. Separating parsing into lexical and syntax analysis is just a good engineering practice. It reduces the overall complexity through «divide and conquer». The two phases usually have well-defined responsibilities and a simple interface — why not take advantage of that?

  3. If a language needs the maximal munch rule, it’s hard to impossible to encode that with Parsec or similar libraries.

  4. Tokens enable the syntax analyzer to report better errors. This is because you can tell which token you didn’t expect. In a Char-based Parsec parser, you can only tell which character (or an arbitrary number of characters) you didn’t expect, because you don’t know what constitutes a token.

  5. Potential performance considerations. If a parser has to try several syntax tree alternatives, it reparses low-level lexical tokens anew every time. From this perspective, the lexical analyzer provides a natural «caching layer».

regex-applicative

My regex-applicative library provides applicative parser combinators with regexp semantics. We can write a simple lexer on top of it that would give us the benefits of both Alex and Parsec.

{-# LANGUAGE OverloadedStrings, OverloadedLists #-}

import qualified Data.Text as T
import qualified Data.HashSet as HS
import Text.Regex.Applicative
import Text.Regex.Applicative.Common (decimal)
import Data.Char
import Data.Time

For example, here’s a parser for dates. (For simplicity, this one doesn’t check dates for validity.)

pDate :: RE Char Day
pDate = fromGregorian <$> decimal <* "-" <*> decimal <* "-" <*> decimal

And here’s a parser for templates — strings that may include interpolated variables and escaping.

pTemplate :: RE Char [Either T.Text T.Text] -- Left = text, Right = variable
pTemplate = "\"" *> some piece <* "\""
  where
    -- piece of text or interpolated variable
    piece = 
      (Left . T.pack <$> some ch) <|>
      (Right <$> var)

    -- interpolated variable
    var = "${" *> pVar <* "}"

    -- normal character, plain or escaped
    ch =
      psym (\c -> not $ c `HS.member` escapable) <|>
      ("\\" *> psym (\c -> c `HS.member` escapable))

    -- set of escapable characters
    escapable = ['"', '\\', '$']

    pVar = T.pack <$> some (psym isAlpha)

Individual parsers are merged into a single pToken parser:

data Token
   = Template [Either T.Text T.Text]
   | Date Day
-- | ...

pToken :: RE Char Token
pToken =
   (Template <$> pTemplate) <|>
   (Date <$> pDate)
-- <|> ...

Finally, a simple tokenizing function might look something like this:

tokens :: String -> [Token]
tokens "" = []
tokens s =
  let re = (Just <$> pToken) <|> (Nothing <$ some (psym isSpace)) in
  case findLongestPrefix re s of
    Just (Just tok, rest) -> tok : tokens rest
    Just (Nothing, rest) -> tokens rest -- whitespace
    Nothing -> error "lexical error"

The resulting token list can be further parsed with Parsec or Happy.
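
As a quick sanity check (my example, not from the article; assume Show instances are derived for Token, and recall that Day is shown in its usual YYYY-MM-DD form), one would expect something along these lines:

example :: [Token]
example = tokens "2015-01-02 \"Hello, ${name}!\""
-- expected: [Date 2015-01-02, Template [Left "Hello, ", Right "name", Left "!"]]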

This simple function can be extended to do a lot more:

  1. Track position in the input string, use it for error reporting, include it into tokens
  2. Consume input incrementally
  3. Get feedback from the syntax analyzer, and, depending on the syntactic context, perform different lexical analyses. This is useful for more complex features of programming languages, such as the off-side rule in Haskell or arbitrarily-nested command substitution and «here documents» in POSIX Shell. (This is another thing Alex cannot do, by the way.)

January 01, 2015 10:00 PM

December 31, 2014

Gabriel Gonzalez

Shortcut fusion for pipes

Rewrite rules are a powerful tool that you can use to optimize Haskell code without breaking backwards compatibility. This post will illustrate how I use rewrite rules to implement a form of shortcut fusion for my pipes stream programming library. I compare pipes performance before and after adding shortcut fusion and I also compare the performance of pipes-4.0.2 vs. conduit-1.0.10 and io-streams-1.1.4.0.

This post also includes a small introduction to Haskell's rewrite rule system for those who are interested in optimizing their own libraries.

Edit: This post originally used the term "stream fusion", but Duncan Coutts informed me that the more appropriate term for this is probably "short-cut fusion".

Rule syntax

The following rewrite rule from pipes demonstrates the syntax for defining optimization rules.

{-# RULES "for p yield" forall p . for p yield = p #-}
--        ^^^^^^^^^^^^^        ^   ^^^^^^^^^^^^^^^
--        Label                |   Substitution rule
--                             |
--        `p` can be anything -+

All rewrite rules are substitution rules, meaning that they instruct the compiler to replace anything that matches the left-hand side of the rule with the right-hand side. The above rewrite rule says to always replace for p yield with p, no matter what p is, as long as everything type-checks before and after substitution.

Rewrite rules are typically used to substitute code with equivalent code of greater efficiency. In the above example, for p yield is a for loop that re-yields every element of p. The re-yielded stream is behaviorally indistinguishable from the original stream, p, because all that for p yield does is replace every yield in p with yet another yield. However, while both sides of the equation behave identically, their efficiency is not the same; the left-hand side is less efficient because for loops are not free.

Safety

Rewrite rules are not checked for correctness. The only thing the compiler does is verify that the left-hand side and right-hand side of the equation both type-check. The programmer who creates the rewrite rule is responsible for proving that the substitution preserves the original behavior of the program.

In fact, rewrite rules can be used to rewrite terms from other Haskell libraries without limitation. For this reason, modules with rewrite rules are automatically marked unsafe by Safe Haskell and they must be explicitly marked Trustworthy to be used by code marked Safe.

You can verify a rewrite rule is safe using equational reasoning. If you can reach the right-hand side of the equation from the left-hand side using valid code substitutions, then you can prove (with some caveats) that the two sides of the equation are functionally identical.

pipes includes a complete set of proofs for its rewrite rules for this purpose. For example, the above rewrite rule is proven here, where (//>) is an infix synonym for for and respond is a synonym for yield:

for   = (//>)

yield = respond

This means that equational reasoning is useful for more than just proving program correctness. You can also use derived equations to optimize your program, assuming that you already know which side of the equation is more efficient.

Laws

Rewrite rules have a significant limitation: we cannot possibly anticipate every possible expression that downstream users might build using our libraries. So how can we optimize as much code as possible without an excessive proliferation of rewrite rules?

I anecdotally observe that equations inspired by category theory prove to be highly versatile optimizations that fit within a small number of rules. These equations include:

  • Category laws

  • Functor laws

  • Natural transformation laws (i.e. free theorems)

The first half of the shortcut fusion optimization consists of three monad laws, which are a special case of category laws. For those new to Haskell, the three monad laws are:

-- Left Identity
return x >>= f = f x

-- Right Identity
m >>= return = m

-- Associativity
(m >>= f) >>= g = m >>= (\x -> f x >>= g)

If you take these three laws and replace (>>=)/return with for/yield (and rename m to p, for 'p'ipe), you get the following "for loop laws":

-- Looping over a yield simplifies to function application
for (yield x) f = f x

-- Re-yielding every element returns the original stream
for p yield = p

-- You can transform two passes over a stream into a single pass
for (for p f) g = for p (\x -> for (f x) g)

This analogy to the monad laws is precise because for and yield are actually (>>=) and return for the ListT monad when you newtype them appropriately, and they really form a Monad in the Haskell sense of the word.
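
For the curious, here is a sketch of that newtype (simplified, and renamed here to avoid clashing with the ListT that pipes itself exports):

import Pipes

newtype ListT' m a = Select' { enumerate' :: Producer a m () }

instance Monad m => Functor (ListT' m) where
    fmap f l = Select' (for (enumerate' l) (yield . f))

instance Monad m => Applicative (ListT' m) where
    pure x  = Select' (yield x)                  -- pure/return is yield
    f <*> x = Select' (for (enumerate' f) (\g ->
                  for (enumerate' x) (yield . g)))

instance Monad m => Monad (ListT' m) where
    l >>= f = Select' (for (enumerate' l) (enumerate' . f))   -- (>>=) is for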

What's amazing is that these monad laws also double as shortcut fusion optimizations when we convert them to rewrite rules. We already encountered the second law as our first rewrite rule, but the other two laws are useful rewrite rules, too:

{-# RULES
  "for (yield x) f" forall x f .
      for (yield x) f = f x
; "for (for p f) g" forall p f g .
      for (for p f) g = for p (\x -> for (f x) g)
  #-}

Note that the RULES pragma lets you group multiple rules together, as long as you separate them by semicolons. Also, there is no requirement that the rule label must match the left-hand side of the equation, but I use this convention since I'm bad at naming rewrite rules. This labeling convention also helps when diagnosing which rules fired (see below) without having to consult the original rule definitions.

Free theorems

These three rewrite rules alone do not suffice to optimize most pipes code. The reason why is that most idiomatic pipes code is not written in terms of for loops. For example, consider the map function from Pipes.Prelude:

map :: Monad m => (a -> b) -> Pipe a b m r
map f = for cat (\x -> yield (f x))

The idiomatic way to transform a pipe's output is to compose the map pipe downstream:

p >-> map f

We can't optimize this using our shortcut fusion rewrite rules unless we rewrite the above code to the equivalent for loop:

for p (\y -> yield (f y))

In other words, we require the following theorem:

p >-> map f = for p (\y -> yield (f y))

This is actually a special case of the following "free theorem":

-- Exercise: Derive the previous equation from this one

p1 >-> for p2 (\y -> yield (f y))
= for (p1 >-> p2) (\y -> yield (f y))

A free theorem is an equation that you can prove solely from studying the types of all terms involved. I will omit the proof of this free theorem for now, but I will discuss how to derive free theorems in detail in a follow-up post. For now, just assume that the above equations are correct, as codified by the following rewrite rule:

{-# RULES
  "p >-> map f" forall p f .
      p >-> map f = for p (\y -> yield (f y))
  #-}

With this rewrite rule the compiler can begin to implement simple map fusion. To see why, we'll compose two map pipes and then pretend that we are the compiler, applying rewrite rules at every opportunity. Every time we apply a rewrite rule we will refer to the rule by its corresponding string label:

map f >-> map g

-- "p >-> map f" rule fired
= for (map f) (\y -> yield (g y))

-- Definition of `map`
= for (for cat (\x -> yield (f x))) (\y -> yield (g y))

-- "for (for p f) g" rule fired
= for cat (\x -> for (yield (f x)) (\y -> yield (g y)))

-- "for (yield x) f" rule fired
= for cat (\x -> yield (g (f x)))

This is identical to a single map pass, which we can prove by equational reasoning:

for cat (\x -> yield (g (f x)))

-- Definition of `(.)`, in reverse
= for cat (\x -> yield ((g . f) x))

-- Definition of `map`, in reverse
= map (g . f)

So those rewrite rules sufficed to fuse the two map passes into a single pass. You don't have to take my word for it, though. For example, let's say that we want to prove that these rewrite rules fire for the following sample program, which increments, doubles, and then discards every number from 1 to 100000000:

-- map-fusion.hs

import Pipes
import qualified Pipes.Prelude as P

main = runEffect $
    for (each [1..10^8] >-> P.map (+1) >-> P.map (*2)) discard

The -ddump-rule-firings flag will output every rewrite rule that fires during compilation, identifying each rule with the string label accompanying the rule:

$ ghc -O2 -ddump-rule-firings map-fusion.hs
[1 of 1] Compiling Main ( test.hs, test.o )
...
Rule fired: p >-> map f
Rule fired: for (for p f) g
Rule fired: for (yield x) f
...

The rule firings shown above are the ones that correspond to map fusion, although there are many other rewrite rules that fire (including more shortcut fusion rule firings).

Shortcut fusion

We don’t have to limit ourselves to just fusing maps. Many pipes in Pipes.Prelude have an associated free theorem that rewrites pipe composition into an equivalent for loop. After these rewrites, the "for loop laws" go to town on the pipeline and fuse it into a single pass.

For example, the filter pipe has a rewrite rule similar to map:

{-# RULES
  "p >-> filter pred" forall p pred .
      p >-> filter pred =
          for p (\y -> when (pred y) (yield y))
  #-}

So if we combine map and filter in a pipeline, they will also fuse into a single pass:

p >-> map f >-> filter pred

-- "p >-> map f" rule fires
for p (\x -> yield (f x)) >-> filter pred

-- "p >-> filter pred" rule fires
for (for p (\x -> yield (f x))) (\y -> when (pred y) (yield y))

-- "for (for p f) g" rule fires
for p (\x -> for (yield (f x)) (\y -> when (pred y) (yield y)))

-- "for (yield x) f" rule fires
for p (\x -> let y = f x in when (pred y) (yield y))

This is the kind of single pass loop we might have written by hand if we were pipes experts, but thanks to rewrite rules we can write high-level, composable code and let the library automatically rewrite it into efficient and tight loops.

Note that not all pipes are fusible in this way. For example the take pipe cannot be fused in this way because it cannot be rewritten in terms of a for loop.

Benchmarks

These rewrite rules make fusible pipe stages essentially free. To illustrate this I've set up a criterion benchmark testing running time as a function of the number of map stages in a pipeline:

import Criterion.Main
import Data.Functor.Identity (runIdentity)
import Pipes
import qualified Pipes.Prelude as P

n :: Int
n = 10^6

main = defaultMain
    [ bench' "1 stage " $ \n ->
            each [1..n]
        >-> P.map (+1)
    , bench' "2 stages" $ \n ->
            each [1..n]
        >-> P.map (+1)
        >-> P.map (+1)
    , bench' "3 stages" $ \n ->
            each [1..n]
        >-> P.map (+1)
        >-> P.map (+1)
        >-> P.map (+1)
    , bench' "4 stages" $ \n ->
            each [1..n]
        >-> P.map (+1)
        >-> P.map (+1)
        >-> P.map (+1)
        >-> P.map (+1)
    ]
  where
    bench' label f = bench label $
        whnf (\n -> runIdentity $ runEffect $ for (f n) discard)
             (10^5 :: Int)

Before shortcut fusion (i.e. pipes-4.0.0), the running time scales linearly with the number of map stages:

warming up
estimating clock resolution...
mean is 24.53411 ns (20480001 iterations)
found 80923 outliers among 20479999 samples (0.4%)
32461 (0.2%) high severe
estimating cost of a clock call...
mean is 23.89897 ns (1 iterations)

benchmarking 1 stage
mean: 4.480548 ms, lb 4.477734 ms, ub 4.485978 ms, ci 0.950
std dev: 19.42991 us, lb 12.11399 us, ub 35.90046 us, ci 0.950

benchmarking 2 stages
mean: 6.304547 ms, lb 6.301067 ms, ub 6.310991 ms, ci 0.950
std dev: 23.60979 us, lb 14.01610 us, ub 37.63093 us, ci 0.950

benchmarking 3 stages
mean: 10.60818 ms, lb 10.59948 ms, ub 10.62583 ms, ci 0.950
std dev: 61.05200 us, lb 34.79662 us, ub 102.5613 us, ci 0.950

benchmarking 4 stages
mean: 13.74065 ms, lb 13.73252 ms, ub 13.76065 ms, ci 0.950
std dev: 61.13291 us, lb 29.60977 us, ub 123.3071 us, ci 0.950

Shortcut fusion (added in pipes-4.0.1) makes additional map stages essentially free:

warming up
estimating clock resolution...
mean is 24.99854 ns (20480001 iterations)
found 1864216 outliers among 20479999 samples (9.1%)
515889 (2.5%) high mild
1348320 (6.6%) high severe
estimating cost of a clock call...
mean is 23.54777 ns (1 iterations)

benchmarking 1 stage
mean: 2.427082 ms, lb 2.425264 ms, ub 2.430500 ms, ci 0.950
std dev: 12.43505 us, lb 7.564554 us, ub 20.11641 us, ci 0.950

benchmarking 2 stages
mean: 2.374217 ms, lb 2.373302 ms, ub 2.375435 ms, ci 0.950
std dev: 5.394149 us, lb 4.270983 us, ub 8.407879 us, ci 0.950

benchmarking 3 stages
mean: 2.438948 ms, lb 2.436673 ms, ub 2.443006 ms, ci 0.950
std dev: 15.11984 us, lb 9.602960 us, ub 23.05668 us, ci 0.950

benchmarking 4 stages
mean: 2.372556 ms, lb 2.371644 ms, ub 2.373949 ms, ci 0.950
std dev: 5.684231 us, lb 3.955916 us, ub 9.040744 us, ci 0.950

In fact, once you have just two stages in your pipeline, pipes greatly outperforms conduit and breaks roughly even with io-streams. To show this I've written up a benchmark comparing pipes performance against these libraries for both pure loops and loops that are slightly IO-bound (by writing to /dev/null):

import Criterion.Main
import Data.Functor.Identity (runIdentity)
import qualified System.IO as IO

import Data.Conduit
import qualified Data.Conduit.List as C

import Pipes
import qualified Pipes.Prelude as P

import qualified System.IO.Streams as S

criterion :: Int -> IO ()
criterion n = IO.withFile "/dev/null" IO.WriteMode $ \h ->
    defaultMain
        [ bgroup "pure"
            [ bench "pipes"      $ whnf (runIdentity . pipes)   n
            , bench "conduit"    $ whnf (runIdentity . conduit) n
            , bench "io-streams" $ nfIO (iostreams n)
            ]
        , bgroup "io"
            [ bench "pipes"     $ nfIO (pipesIO     h n)
            , bench "conduit"   $ nfIO (conduitIO   h n)
            , bench "iostreams" $ nfIO (iostreamsIO h n)
            ]
        ]

pipes :: Monad m => Int -> m ()
pipes n = runEffect $
    for (each [1..n] >-> P.map (+1) >-> P.filter even) discard

conduit :: Monad m => Int -> m ()
conduit n =
    C.enumFromTo 1 n $= C.map (+1) $= C.filter even $$ C.sinkNull

iostreams :: Int -> IO ()
iostreams n = do
    is0 <- S.fromList [1..n]
    is1 <- S.map (+1) is0
    is2 <- S.filter even is1
    S.skipToEof is2

pipesIO :: IO.Handle -> Int -> IO ()
pipesIO h n = runEffect $
        each [1..n]
    >-> P.map (+1)
    >-> P.filter even
    >-> P.map show
    >-> P.toHandle h

conduitIO :: IO.Handle -> Int -> IO ()
conduitIO h n =
       C.enumFromTo 1 n
    $= C.map (+1)
    $= C.filter even
    $= C.map show
    $$ C.mapM_ (IO.hPutStrLn h)

iostreamsIO :: IO.Handle -> Int -> IO ()
iostreamsIO h n = do
    is0 <- S.fromList [1..n]
    is1 <- S.map (+1) is0
    is2 <- S.filter even is1
    is3 <- S.map show is2
    os  <- S.makeOutputStream $ \ma -> case ma of
        Just str -> IO.hPutStrLn h str
        _        -> return ()
    S.connect is3 os

main = criterion (10^6)

The benchmarks place pipes neck-and-neck with io-streams on pure loops and 10% slower on slightly IO-bound code. Both libraries perform faster than conduit:

warming up
estimating clock resolution...
mean is 24.50726 ns (20480001 iterations)
found 117040 outliers among 20479999 samples (0.6%)
45158 (0.2%) high severe
estimating cost of a clock call...
mean is 23.89208 ns (1 iterations)

benchmarking pure/pipes
mean: 24.04860 ms, lb 24.02136 ms, ub 24.10872 ms, ci 0.950
std dev: 197.3707 us, lb 91.05894 us, ub 335.2267 us, ci 0.950

benchmarking pure/conduit
mean: 172.8454 ms, lb 172.6317 ms, ub 173.1824 ms, ci 0.950
std dev: 1.361239 ms, lb 952.1500 us, ub 1.976641 ms, ci 0.950

benchmarking pure/io-streams
mean: 24.16426 ms, lb 24.12789 ms, ub 24.22919 ms, ci 0.950
std dev: 242.5173 us, lb 153.9087 us, ub 362.4092 us, ci 0.950

benchmarking io/pipes
mean: 267.7021 ms, lb 267.1789 ms, ub 268.4542 ms, ci 0.950
std dev: 3.189998 ms, lb 2.370387 ms, ub 4.392541 ms, ci 0.950

benchmarking io/conduit
mean: 310.3034 ms, lb 309.8225 ms, ub 310.9444 ms, ci 0.950
std dev: 2.827841 ms, lb 2.194127 ms, ub 3.655390 ms, ci 0.950

benchmarking io/iostreams
mean: 239.6211 ms, lb 239.2072 ms, ub 240.2354 ms, ci 0.950
std dev: 2.564995 ms, lb 1.879984 ms, ub 3.442018 ms, ci 0.950

I hypothesize that pipes performs slightly slower on IO compared to io-streams because of the cost of calling lift, whereas iostreams operates directly within the IO monad at all times.

These benchmarks should be taken with a grain of salt. All three libraries are most frequently used in strongly IO-bound scenarios, where the overhead of each library is pretty much negligible. However, this still illustrates how big of an impact shortcut fusion can have on pure code paths.

Conclusions

pipes is a stream programming library with a strong emphasis on theory and the library's contract with the user is a set of laws inspired by category theory. My original motivation behind proving these laws was to fulfill the contract, but I only later realized that the for loop laws doubled as fortuitous shortcut fusion optimizations. This is a recurring motif in Haskell: thinking mathematically pays large dividends.

For this reason I like to think of Haskell as applied category theory: I find that many topics I learn from category theory directly improve my Haskell code. This post shows one example of this phenomenon, where shortcut fusion naturally falls out of the monad laws for ListT.

by Gabriel Gonzalez at December 31, 2014 05:00 PM

Well-Typed.Com

Simple SMT solver for use in an optimizing compiler

This is a second blog post in a series about engineering optimizing compilers; the previous was Quasi-quoting DSLs for free. In this blog we will see how we might write a very simple SMT solver that will allow us to write an optimization pass that can turn something like

if a == 0 then
  if !(a == 0) && b == 1 then
    write 1
  else
    write 2
else
  write 3

into

if a == 0 then
  write 2
else
  write 3

without much effort at all.

Preliminaries

For the sake of this blog post we will consider a very simple imperative object language, defined as

data Expr =
    -- Arithmetic expressions
    ExprInt Integer           -- ^ Integer constants
  | ExprAdd Expr Expr         -- ^ Addition
    -- Boolean expressions
  | ExprBool Bool             -- ^ Boolean constants
  | ExprEq Expr Expr          -- ^ Equality
  | ExprNot Expr              -- ^ Negation
  | ExprAnd Expr Expr         -- ^ Logical conjunction
  | ExprOr Expr Expr          -- ^ Logical disjunction
    -- Variables
  | ExprVar VarName           -- ^ Read from a variable
  | ExprAssign VarName Expr   -- ^ Write to a variable
    -- Control flow
  | ExprSeq Expr Expr         -- ^ Sequential composition
  | ExprIf Expr Expr Expr     -- ^ Conditional
    -- Input/output
  | ExprRead                  -- ^ Read an integer from the console
  | ExprWrite Expr            -- ^ Write an integer to the console

We will assume the existence of a quasi-quoter for this language so that we can write Haskell fragments such as

[expr| if a == 0 then read else write b |]

instead of

ExprIf (ExprEq (ExprVar "a") (ExprInt 0))
       ExprRead
       (ExprWrite (ExprVar "b"))

How you can write such a quasi-quoter was the topic of the previous blog post. You should however be able to read this blog post without having read the previous post; hopefully the mapping from the concrete syntax to the constructors of Expr is pretty obvious.

Simplifying assumption

Our goal will be to write a function

provable :: Expr -> Bool

Let’s consider some examples:

  • The expression a || True should be provable: no matter what value variable a has, a || True is always True.

  • Likewise, the expression a || !a should be provable, for the same reason.

  • However, expression a should not be provable (a might be False)

  • Likewise, expression !a should also not be provable (a might be True)

Note that this means that it’s perfectly possible for both an expression and the negation of that expression to be unprovable.

What about an expression !(n == 0 && n == 1)? Morally speaking, this expression should be provable. However, we will be making the following very important simplifying assumption:

provable is allowed to give up: when provable returns False, this should be interpreted as “failed to prove”, rather than “there exists a counterexample”.

From a compiler perspective, if something is not statically provable, that simply means that an optimization may not be applied even though it could: that is, we miss an opportunity to make the program go faster, but we will not break the code.

An evaluator

We don’t want (or need) to embed a full blown theorem prover into our compiler: ideally we write something simple that will still catch a lot of the common cases. Moreover, we would prefer to reuse as much of our existing compiler infrastructure as we can. Our optimizer is likely to contain an interpreter for the object language, so that we can attempt to statically evaluate expressions. We are going to adapt this interpreter so that we can also use it in our solver. In fact, as we shall see, the solver will be a one-liner.

But we’re getting ahead of ourselves. Let’s consider how to write the interpreter first. The interpreter will be running in an Eval monad; for our first attempt, let’s define this monad as a simple wrapper around the list monad:

newtype Eval a = Eval { unEval :: [a] }
  deriving ( Functor
           , Applicative
           , Alternative
           , Monad
           , MonadPlus
           )

runEval :: Eval a -> [a]
runEval act = unEval act

We will use the monad for failure:

throwError :: Eval a
throwError = Eval []

We can provide error recovery through

onError :: Eval a -> Eval a -> Eval a
onError (Eval act) (Eval handler) = Eval $
    case act of
      [] -> handler
      rs -> rs

We will see why we need the list monad when we discuss the evaluator for boolean expressions; but let’s consider the evaluator for integer expressions first:

evalInt :: Expr -> Eval Integer
evalInt = go
  where
    go (ExprInt i)    = return i
    go (ExprAdd a b)  = (+) <$> evalInt a <*> evalInt b
    go (ExprIf c a b) = do cond <- evalBool c
                           evalInt (if cond then a else b)
    go _              = throwError 

Hopefully this should be pretty self-explanatory; our toy language only has a few integer-valued expressions, so there isn’t much to do. The interpreter for boolean expressions is more interesting:

evalBool :: Expr -> Eval Bool
evalBool = \e -> go e `onError` guess e
  where
    go (ExprBool a)   = return a
    go (ExprEq   a b) = (==) <$> evalInt a <*> evalInt b
    go (ExprNot  a)   = not  <$> evalBool a
    go (ExprAnd  a b) = (&&) <$> evalBool a <*> evalBool b
    go (ExprOr   a b) = (||) <$> evalBool a <*> evalBool b
    go (ExprIf c a b) = do cond <- evalBool c
                           evalBool (if cond then a else b)
    go _              = throwError 

    guess _e = return True <|> return False

The definition of go contains no surprises, and follows the definition of go in evalInt very closely. However, the top-level definition

evalBool = \e -> go e `onError` guess e

is more interesting. If for some reason we fail to evaluate a boolean expression (for example, because it contains a variable) then guess returns both True and False. Let’s consider some examples:

runEval $ evalBool [expr| True |]

evaluates to [True];

runEval $ evalBool [expr| a |]

evaluates to [True, False] because we don’t know what value a has, but

runEval $ evalBool [expr| a || True |]

evaluates to [True, True]: we still have two guesses for a, but no matter what we guess a || True always evaluates to True.

Satisfiability

We can now write our SMT solver; as promised, it will be a single line:

satisfiable :: Expr -> Bool
satisfiable = or . runEval . evalBool

Function satisfiable (the “S” in SMT) checks if the expression is true for some values of the variables in the expression. How do we check this? Well, we just run our evaluator which, when it encounters a variable, will return both values for the variable. Hence, if any of the values returned by the evaluator is True, then the expression is true for at least one assignment of values to its variables.

Once we have an implementation of satisfiability, we can implement provable very easily: an expression is provable if its negation is not satisfiable:

provable :: Expr -> Bool
provable = not . satisfiable . ExprNot

If we consider the three examples from the previous section, we will find that both True and a || True are provable, but a by itself is not, as expected.
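
Concretely, with the expr quasi-quoter from the previous post in scope, one would expect a GHCi session roughly along these lines (a sketch, not verbatim output):

provable [expr| True |]       -- True
provable [expr| a || True |]  -- True
provable [expr| a |]          -- False
provable [expr| a || !a |]    -- False with this first version; see below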

Inconsistent guesses

The careful reader might at this point find his brow furrowed, because something is amiss:

runEval $ evalBool [expr| a || !a |]

evaluates to

[True, True, False, True]

This happens because the evaluator will make two separate independent guesses about the value of a. As a consequence, a || !a will be considered not provable.

Is this a bug? No, it’s not. Recall that we made the simplifying assumption that provable is allowed to give up: it’s allowed to say that an expression is not provable even when it morally is. The corresponding (dual) simplifying assumption for satisfiability, and hence the interpreter, is:

The interpreter, and hence satisfiability, is allowed to make inconsistent guesses.

Making an inconsistent guess is equivalent to assuming False: anything follows from False and hence any expression will be considered satisfiable once we make an inconsistent guess. As a consequence, this means that once we make inconsistent guesses, we will consider the expression as not provable.

More precision

We can however do better: whenever we make a guess that a particular expression evaluates to True or False, then if we see that same expression again we can safely make the same guess, rather than making an independent guess. To implement this, we extend our Eval monad with some state to remember the guesses we made so far:

newtype Eval a = Eval { unEval :: StateT [(Expr, Bool)] [] a }
  deriving ( Functor
           , Applicative
           , Alternative
           , Monad
           , MonadPlus
           , MonadState [(Expr, Bool)]
           )
           
runEval :: Eval a -> [a]
runEval act = evalStateT (unEval act) []

throwError :: Eval a
throwError = Eval $ StateT $ \_st -> []

onError :: Eval a -> Eval a -> Eval a
onError (Eval act) (Eval handler) = Eval $ StateT $ \st ->
    case runStateT act st of
      [] -> runStateT handler st
      rs -> rs

The implementation of evalInt does not change at all. The implementation of evalBool also stays almost the same; the only change is the definition of guess:

guess e = do
  st <- get
  case lookup e st of
    Just b  -> return b
    Nothing -> (do put ((e, True)  : st) ; return True)
           <|> (do put ((e, False) : st) ; return False)

This implements the logic we just described: when we guess the value for an expression e, we first look up if we already made a guess for this expression. If so, we return the previous guess; otherwise, we make a guess and record that guess.

Once we make this change

runEval $ evalBool [expr| a || !a |]

will evaluate to [True,True] and consequently a || !a will be considered provable.

Example: folding nested conditionals

As an example of how one might use this infrastructure, we will consider a simple pass that simplifies nested conditionals (if-statements). We can use provable to check if one expression implies the other:

(~>) :: Expr -> Expr -> Bool
(~>) a b = provable [expr| ! $a || $b |]
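
For intuition, here is roughly how (~>) behaves on a couple of small examples (again assuming the quasi-quoter is in scope):

[expr| a == 0 |] ~> [expr| a == 0 || b == 1 |]   -- True
[expr| a == 0 |] ~> [expr| b == 1 |]             -- False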

The simplifier is now very easy to write:

simplifyNestedIf :: Expr -> Expr
simplifyNestedIf = everywhere (mkT go)
  where
    go [expr| if $c1 then
                 if $c2 then
                   $e1
                 else
                   $e2
               else
                 $e3 |]
      | c1 ~> c2              = [expr| if $c1 then $e1 else $e3 |]
      | c1 ~> [expr| ! $c2 |] = [expr| if $c1 then $e2 else $e3 |]
    go e = e

The auxiliary function go pattern matches against any if-statement that has another if-statement as its “then” branch. If we can prove that the condition of the outer if-statement implies the condition of the inner if-statement, we can collapse the inner if-statement to just its “then” branch. Similarly, if we can prove that the condition of the outer if-statement implies the negation of the condition of the inner if-statement, we can collapse the inner if-statement to just its “else” branch. In all other cases, we return the expression unchanged. Finally, we use the Scrap Your Boilerplate operators everywhere and mkT to apply this transformation everywhere in an expression (rather than just applying it top-level). For example,

simplifyNestedIf [expr| 
    if a == 0 then 
      if !(a == 0) && b == 1 then 
        write 1 
      else 
        write 2 
    else 
      write 3 
  |]

evaluates to

if a == 0 then
  write 2
else
  write 3

as promised. Incidentally, perhaps you are wondering why such an optimization would be useful; after all, which self-respecting programmer writes such code? However, code like the above may result from other compiler optimizations such as inlining: imagine that the inner if-statement came from an inlined function. A lot of compiler optimizations are designed to clean up other compiler optimizations.

Conclusion

We can implement a very simple but useful SMT solver for use in an optimizing compiler by making a small change to the interpreter, which we need anyway.

Note that the increase in precision gained from recording previous guesses is a gradual increase only; satisfiability may still make some inconsistent guesses. For example

runEval $ evalBool [expr| !(n == 0 && n == 1) |]

will evaluate to

[False,True,True,True]

because it is making independent guesses about n == 0 and n == 1; consequently !(n == 0 && n == 1) will not be considered provable. We can increase the precision of our solver by making guess smarter (the “MT” or “modulo theories” part of SMT). For example, we could record some information about the guesses about integer valued variables.

At the extreme end of the scale we would be implementing a full decision procedure for first order arithmetic, which is probably optimization gone too far. However, the approach we took above where we start from the basis that we are allowed to make inconsistent guesses gives us a way to implement a simple solver that addresses the most important cases, even if it misses a few—without requiring a lot of complicated infrastructure in the compiler. And as long as we make sure that our evaluator never returns too few values (even if it may return too many) we don’t have to worry that we might generate invalid code: the worst that can happen is that we miss an optimization.

by edsko at December 31, 2014 01:36 PM

December 30, 2014

Roman Cheplyaka

Denotational design does not work

Conal Elliott in his paper Denotational design with type class morphisms, as well as in the recent podcast, advocates denotational design as a principle of software design. Unfortunately, in my experience it never works for realistically complex problems.

On simplicity

First, I’ll formulate Conal’s approach as I understand it. For any given entity of your model, you should come up with a simple mathematical object — the denotation — that faithfully represents that entity.

The implementation of the type may vary, presumably to maximize its runtime efficiency, but it should not expose any more information than the chosen denotation has. That is considered an «abstraction leak». Conal specifically talks about that in the podcast (31m50s, for example).

Here I need to stress an important but subtle point in Conal’s principle: simplicity. You only follow Conal’s principle if you find a simple denotation, not just any denotation.

This point is important because without it any design is denotational design, trivially. Universal algebra tells us that for any set of operations and any (non-contradictory) laws about them there exists a model that satisfies these laws. For any Haskell module, we can interpret its types in the set theory (or a more complex domain if needed), and call that our denotation.

But that’s not what Conal is after. His approach is interesting exactly because he argues that it is possible to find simple denotations. This subtle point makes Conal’s approach simultaneously attractive and unrealistic. I’ll demonstrate this with two examples from my own work experience.

DSLs

At Barclays I worked on FPF, an embedded domain-specific language for describing financial instruments. In his paper, Conal shows how a denotation for such a DSL can quickly grow in complexity when requirements change. When variables and errors are introduced, the denotation changes from a to Env -> Result a. Still, this is a very simple DSL that only supports evaluation.
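
Spelled out in Haskell, that change of denotation looks roughly like this (a hedged sketch; the names Env, Result and Meaning are mine, not FPF's):

import Data.Map (Map)

-- Before variables and errors, the meaning of an expression is just a value.
-- After, it is a function from environments to possibly-failing results.
type Env = Map String Double

data Result a = Failure String | Success a

newtype Meaning a = Meaning (Env -> Result a)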

In reality, the main reason people make DSLs instead of using general-purpose languages is the ability to analyze DSL programs. One important feature of FPF is that it could pretty-print a program into a nice PDF. That poses an obvious problem — not every two semantically equivalent programs (under the interpretation semantics) result in equally nice PDFs. Inlining is a semantically sound transformation, but when our users get PDFs with all the definitions inlined, they get angry.

Sure, we could say that now our denotation becomes the domain product (Env -> Result a, String), where String is the pretty printer output. But in reality we have a dozen different analyses, and most of them are not expressible in terms of each other, or any single simple model. They also do not satisfy many laws. For instance, one day a user (quant or trader) could come and tell us that the barrier classifier should classify two mathematically equivalent expressions as different barriers because those expressions follow certain conventions. And even though the quant is mathematically inclined, denotations and type class morphisms would be the last thing he wants to hear about in response to his feature request.

So, in practice, the best denotation for the DSL expressions was the AST itself. Which, according to my interpretation of Conal’s principles, is not an example of a denotational design, but a failure to apply one.

Machines

At my current job (Signal Vine), I work on a platform for scripted interaction with students via text messages. For every student enrolled in a messaging campaign, we send a message, receive a reply, process it, and the cycle repeats.

This is very similar to FRP; perhaps not the FRP Conal prefers (in the podcast he stresses the importance of continuous functions as opposed to events), but the kind of discrete FRP that Justin Le models with Mealy machines.

So it would seem that I should model a student as

newtype Student = Student (InboundSMS -> (OutboundSMS, Student))

That would be an exemplary case of denotational design. But that would be far from our customers’ needs. Every student has a set of profile variables that are filled when the student responds to a text, and our customers (counselors who work with that student) want to see those variables. They also want to see which messages were sent, what the student’s replies were, and even what messages will be sent to the student in the future. These requirements defeat the attempt to model a student in a simple, abstract way. Instead, I need to store all the information I have about the student because sooner or later I’ll need to expose that information to the user.

Conclusions

Denotational design is a very neat idea, but I believe that it only works in simple cases and when requirements are static. In real-world commercial programming, it breaks for two main reasons:

  1. Users often want maximum insight into what’s going on, and you need to break the abstraction to deliver that information.
  2. Requirements change, and an innocent change in requirements may lead to a drastic change and increase in complexity of the denotation.

It is certainly useful to think about denotations of your entities in specific, simple contexts (like the evaluation semantics for a DSL); such thought experiments may help you better understand the problem or even find a flaw in your implementation.

But when you are implementing the actual type, your best bet is to create an algebraic data type with all the information you have, so that when you need to extend the interface (or «leak the abstraction»), it won’t cause you too much pain.

December 30, 2014 10:00 PM

Edward Kmett

Fast Circular Substitution

Emil Axelsson and Koen Claessen wrote a functional pearl last year about Using Circular Programs for Higher-Order Syntax.

About 6 months ago I had an opportunity to play with this approach in earnest, and realized we can speed it up a great deal. This has kept coming up in conversation ever since, so I've decided to write up an article here.

In my bound library I exploit the fact that monads are about substitution to make a monad transformer that manages substitution for me.

Here I'm going to take a more tightly coupled approach.

To have a type system with enough complexity to be worth examining, I'll adapt Dan Doel's UPTS, which is a pure type system with universe polymorphism. I won't finish the implementation here, but from the point we reach it should be obvious how to finish the job.

Unlike Axelsson and Claessen I'm not going to bother to abstract over my name representation.

To avoid losing the original name from the source, we'll just track names as strings with an integer counting the number of times the name has been 'primed'. The name is purely for expository purposes; the real variable identifier is the number. We'll follow the Axelsson and Claessen convention of having the identifier assigned to each binder be larger than any one bound inside of it. If you don't need the original source names you can cull them from the representation, but they can be useful if you are representing a syntax tree for something you parsed and/or that you plan to pretty print later.

 
-- Pragmas and imports assumed by the snippets in this post:
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable,
             DeriveDataTypeable, LambdaCase #-}
 
import Control.Monad (ap, liftM)
import Data.Foldable (Foldable)
import Data.Function (on)
import Data.IntMap (IntMap)
import qualified Data.IntMap as IntMap
import Data.Traversable (Traversable)
import Data.Typeable (Typeable)
import Prelude hiding (pi)
 
data Name = Name String Int
   deriving (Show,Read)
 
hint :: Name -> String
hint (Name n _) = n
 
nameId :: Name -> Int
nameId (Name _ i) = i
 
instance Eq Name where
  (==) = (==) `on` nameId
 
instance Ord Name where
  compare = compare `on` nameId
 
prime :: String -> Int -> Name
prime n i = Name n (i + 1)
 

So what is the language I want to work with?

 
type Level = Int
 
data Constant
  = Level
  | LevelLiteral {-# UNPACK #-} !Level
  | Omega
  deriving (Eq,Ord,Show,Read,Typeable)
 
data Term a
  = Free a
  | Bound {-# UNPACK #-} !Name
  | Constant !Constant
  | Term a :+ {-# UNPACK #-} !Level
  | Max  [Term a]
  | Type !(Term a)
  | Lam   {-# UNPACK #-} !Name !(Term a) !(Term a)
  | Pi    {-# UNPACK #-} !Name !(Term a) !(Term a)
  | Sigma {-# UNPACK #-} !Name !(Term a) !(Term a)
  | App !(Term a) !(Term a)
  | Fst !(Term a)
  | Snd !(Term a)
  | Pair !(Term a) !(Term a) !(Term a)
  deriving (Show,Read,Eq,Ord,Functor,Foldable,Traversable,Typeable)
 

That is perhaps a bit paranoid about remaining strict, but it seemed like a good idea at the time.

We can define capture avoiding substitution on terms:

 
subst :: Eq a => a -> Term a -> Term a -> Term a
subst a x y = y >>= \a' ->
  if a == a'
    then x
    else return a'
 

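As a quick sanity check (an example of mine, relying on the Monad instance defined further below), subst replaces only the named free variable:

 
-- subst "x" (Constant Omega) (App (Free "x") (Free "y"))
--   == App (Constant Omega) (Free "y")
substExample :: Term String
substExample = subst "x" (Constant Omega) (App (Free "x") (Free "y"))
 
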
Now we finally need to implement Axelsson and Claessen's circular programming trick. Here we'll abstract over terms that allow us to find the highest bound value within them:

 
class Bindable t where
  bound :: t -> Int
 

and instantiate it for our Term type

 
instance Bindable (Term a) where
  bound Free{}        = 0
  bound Bound{}       = 0 -- intentional!
  bound Constant{}    = 0
  bound (a :+ _)      = bound a
  bound (Max xs)      = foldr (\a r -> bound a `max` r) 0 xs
  bound (Type t)      = bound t
  bound (Lam b t _)   = nameId b `max` bound t
  bound (Pi b t _)    = nameId b `max` bound t
  bound (Sigma b t _) = nameId b `max` bound t
  bound (App x y)     = bound x `max`  bound y
  bound (Fst t)       = bound t
  bound (Snd t)       = bound t
  bound (Pair t x y)  = bound t `max` bound x `max` bound y
 

As in the original pearl we avoid traversing into the body of the binders, hence the _'s in the code above.

Now we can abstract over the pattern used to create a binder in the functional pearl, since we have multiple binder types in this syntax tree, and the code would get repetitive.

 
binder :: Bindable t =>
  (Name -> t) ->
  (Name -> t -> r) ->
  String -> (t -> t) -> r
binder bd c n e = c b body where
  body = e (bd b)
  b = prime n (bound body)
 
lam, pi, sigma :: String -> Term a -> (Term a -> Term a) -> Term a
lam s t   = binder Bound (`Lam` t) s
pi s t    = binder Bound (`Pi` t) s
sigma s t = binder Bound (`Sigma` t) s
 

We may not always want to give names to the variables we capture, so let's define:

lam_, pi_, sigma_ :: Term a -> (Term a -> Term a) -> Term a
lam_   = lam "_"
pi_    = pi "_"
sigma_ = sigma "_"
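
As a small usage sketch of these helpers (my example, not from the post; spelling the universe as Type (Constant (LevelLiteral 0)) is a guess at the intended encoding), here is the dependent function type Pi (a : Type 0). a -> a together with its inhabitant:

 
star :: Term a
star = Type (Constant (LevelLiteral 0))
 
-- Pi (a : Type 0). a -> a
idType :: Term a
idType = pi "a" star $ \a -> pi_ a (const a)
 
-- \(a : Type 0). \(x : a). x
idTerm :: Term a
idTerm = lam "a" star $ \a -> lam "x" a $ \x -> x
 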

Now, here's the interesting part. The problem with Axelsson and Claessen's original trick is that every substitution is being handled separately. This means that if you were to write a monad for doing substitution with it, it'd actually be quite slow. You have to walk the syntax tree over and over and over.

We can fuse these together by making a single pass:

 
instantiate :: Name -> t -> IntMap t -> IntMap t
instantiate = IntMap.insert . nameId
 
rebind :: IntMap (Term b) -> Term a -> (a -> Term b) -> Term b
rebind env xs0 f = go xs0 where
  go = \case
    Free a       -> f a
    Bound b      -> env IntMap.! nameId b
    Constant c   -> Constant c
    m :+ n       -> go m :+ n
    Type t       -> Type (go t)
    Max xs       -> Max (fmap go xs)
    Lam b t e    -> lam   (hint b) (go t) $ \v ->
      rebind (instantiate b v env) e f
    Pi b t e     -> pi    (hint b) (go t) $ \v ->
      rebind (instantiate b v env) e f
    Sigma b t e  -> sigma (hint b) (go t) $ \v ->
      rebind (instantiate b v env) e f
    App x y      -> App (go x) (go y)
    Fst x        -> Fst (go x)
    Snd x        -> Snd (go x)
    Pair t x y   -> Pair (go t) (go x) (go y)
 

Note that the Lam, Pi and Sigma cases just extend the current environment.

With that in place, we can now upgrade the pearl's encoding to allow for an actual Monad in the same sense as bound.

 
instance Applicative Term where
  pure = Free
  (<*>) = ap
 
instance Monad Term where
  return = Free
  (>>=) = rebind IntMap.empty
 

To show that we can work with this syntax tree representation, let's write an evaluator from it to weak head normal form:

First we'll need some helpers:

 
apply :: Term a -> [Term a] -> Term a
apply = foldl App
 
rwhnf :: IntMap (Term a) ->
  [Term a] -> Term a -> Term a
rwhnf env stk     (App f x)
  = rwhnf env (rebind env x Free:stk) f
rwhnf env (x:stk) (Lam b _ e)
  = rwhnf (instantiate b x env) stk e
rwhnf env stk (Fst e)
  = case rwhnf env [] e of
  Pair _ e' _ -> rwhnf env stk e'
  e'          -> Fst e'
rwhnf env stk (Snd e)
  = case rwhnf env [] e of
  Pair _ _ e' -> rwhnf env stk e'
  e'          -> Snd e'
rwhnf env stk e
  = apply (rebind env e Free) stk
 

Then we can start off the whnf by calling our helper with an initial starting environment:

 
whnf :: Term a -> Term a
whnf = rwhnf IntMap.empty []
 

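As a quick check that beta reduction goes through (again my example, not from the post):

 
-- The domain annotation is arbitrary; whnf never looks at it.
identityFn :: Term a
identityFn = lam_ (Constant Omega) $ \x -> x
 
-- whnf (App identityFn (Free "x"))  ==  Free "x"
whnfExample :: Term String
whnfExample = whnf (App identityFn (Free "x"))
 
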
So what have we given up? Well, bound automatically lets you compare terms for alpha equivalence by quotienting out the placement of "F" terms in the syntax tree. Here we have a problem in that the identifiers we get assigned aren't necessarily canonical.

But we can get the same identifiers out by just using the monad above:

 
alphaEq :: Eq a => Term a -> Term a -> Bool
alphaEq = (==) `on` liftM id
 

It makes me a bit uncomfortable that our monad is only up to alpha equivalence and that liftM swaps out the identifiers used throughout the entire syntax tree, and we've also lost the ironclad protection against exotic terms.

But overall, this is a much faster version of Axelsson and Claessen's trick and it can be used as a drop-in replacement for something like bound in many cases, and unlike bound, it lets you use HOAS-style syntax for constructing lam, pi and sigma terms.

With pattern synonyms you can prevent the user from doing bad things as well. Once 7.10 ships you'd be able to use a bidirectional pattern synonym for Pi, Sigma and Lam to hide the real constructors behind. I'm not yet sure of the "best practices" in this area.

Here's the code all in one place:

[Download Circular.hs]

Happy Holidays,
-Edward

by Edward Kmett at December 30, 2014 08:43 PM

Disciple/DDC

Higher order functions and interface files.

Work on DDC progresses, though recently it's been straightforward compiler engineering rather than anything novel, so I haven't been blogging. DDC now drops interface files when compiling modules, multi-module compilation works, and the front-end supports unicode string literals. I've also implemented a lambda lifter and support for higher order functions. This part is well baked enough to implement simple stream functions, as used in the Haskell Data.Vector library, though the compiler doesn't do the associated fusion yet. For example:

data Stream (s a : Data) where
        MkStream : (s -> Step s a) -> s -> Stream s a

data Step (s a : Data) where
        Yield : a -> s -> Step s a
        Skip  : s -> Step s a
        Done  : Step s a

-- | Apply a function to every element of a stream.
smap (f : a -> b) (ss : Stream s a) : Stream s b
 = case ss of
        MkStream stepA sA0
         -> let stepB q = case stepA q of
                                Yield x sA1 -> Yield (f x) sA1
                                Skip sA2    -> Skip sA2
                                Done        -> Done
            in MkStream stepB sA0

-- | Take the given number of elements from the front of a stream.
stake (n : Nat) (ss : Stream s a) : Stream (Tup2 s Nat) a
 = case ss of
        MkStream fA sA0
         -> let stepB q = case q of
                           T2 sA ix
                            -> if ix >= n
                                then Done
                                else case fA sA of
                                        Yield x sA2 -> Yield x (T2 sA2 (ix + 1))
                                        Skip sA3    -> Skip (T2 sA3 ix)
                                        Done        -> Done
            in MkStream stepB (T2 sA0 0)

As we can see, work on larger demos is starting to be hampered by the lack of pattern matching sugar in the source language, so I'm going to add that next. After adding pattern matching, I'm going to spend a couple of weeks bug fixing, and should get the DDC 0.5 release out in early February 2015.

by Ben Lippmeier ([email protected]) at December 30, 2014 05:40 AM

December 29, 2014

Lee Pike

All Hammers are Terrible

“All computers are terrible.”

On social media, this turn of phrase is almost a meme. The sentiment is reasonable at some level—computers are complicated, buggy, and insecure. But the general public is largely indifferent, or at least, they accept the situation for what it is: Computer freezes? Turn it off and on again. Smart phone insecure? Well, the convenience of digital wallets outweighs the risk. My parents, who are not computer scientists, never say that all computers are terrible.

Rather, this is a complaint from within, by computer scientists themselves.  What drives this thinking? (While not specifically addressing the cause of the sentiment, this blog post provides a nice counterpoint.) No immediately obvious hypotheses seem particularly explanatory:

  • Ludditism: perhaps despite (or because of) our work in technology, computer scientists abhor it. But I find this explanation wanting, particularly because the focus of contempt is specifically about computers, and not about technological advancement generally.
  • Pessimism: perhaps there is a correlation between computer scientists and pessimists. Faced with the challenge of building good systems, they focus on the flaws rather than the progress. But it does not adequately explain the focus of the pessimism on computers rather than life in general—I don’t see posts stating that “all relationships are terrible”.
  • Realism: perhaps computers are terrible. Computer science history is measured in decades, and the approach to building systems is immature compared to other engineering disciplines. I tore down my old garage this summer. The city required a blueprint, an erosion plan, a drainage plan, and a few hundred dollars for me to do so. They inspected the work afterwards. Contrast this with software: how much oversight does writing security-critical code get? Just about anyone can throw code up that purports to solve a problem. Still, given that programs are among the most complex artifacts built by humans, we’re doing pretty well.
  • Narcissism: finally, perhaps a tinge of narcissism drives these comments, something along the lines of, “where others have failed, I will succeed.” It is given as a tongue-in-cheek put-down regarding the work of fellow professionals. The motivation is there: if we thought that the tools of computer science were “good enough”, then the world wouldn’t need new programming languages, tools, or libraries. On the other hand, these kinds of complaints are not just about directly competing work, and there will always be new inventions to make.

I tend not to hear the same kind of complaints from members of other professions. Yes, we all complain about our jobs, but not necessarily about our tools. Carpenters don’t say, “All hammers are terrible,” even if they complain about their company, their coworkers, the architect, the inspectors. Doctors don’t complain about X-ray machines, MRIs, or scalpels (but they might complain about their patients, insurance, and administration). Farmers don’t complain about the design of combines and tractors. Now, they might complain about a specific instance—“My tractor is terrible; it breaks down all the time.” But not about tractors, in general.

Other professions largely take their tools as a given. Indeed, most practitioners know so little about how the tools they use are built that it is difficult for them to critique those tools. A carpenter does not know how a skill saw is built, and a doctor does not know how an X-ray is taken. They just know whether the tools work and whether they are useful.

Computer science is unique insofar as the tool makers are so closely connected with the tool users. There’s another glib quip in computer science: “patches welcome,” meaning that if you (the user) think a tool should be improved, then it is your responsibility to make those improvements yourself. In other fields, the saying does not exist; it would be laughable! A taxi driver can’t be expected to invent a more fuel-efficient automobile; an architect can’t be expected to build better modeling software.

Computer science is unique in this respect, and it has philosophical, psychological, and sociological implications. Imagine, if you will, you are a carpenter, and you have some understanding of how the tools that you have are designed and built. Your skill saw is right-handed, but you are left-handed; with a few modifications, you realize its design can be abstracted, to accommodate either handedness. When your drill dies, instead of simply replacing it, you open it up, find the faulty circuit, and realize there’s a weakness in the wiring that allows it to overheat. When each of your tools fails you in some way, you uncover some flaw and realize they could be improved. Finally, you come to expect failure from every tool you might ever use: “All hammers are terrible.”

We all know that technology—whether we understand how it is built or not—has flaws. We know tradeoffs are made between cost, generality, performance, and reliability. There are more and less reliable, performant, and costly cars, saws, X-ray machines.

The general public certainly has opinions and a sense of the tradeoffs between the tools they use, e.g., Windows vs. Mac OS, Android vs. iOS. But the sentiment is that these are tools, with flaws and lifespans, just like cars, appliances, and power tools.

I don’t have a strong opinion about whether computer scientists should complain or not about their tools, except, obviously, that general complaints don’t effect much change. My point is more philosophical: we find ourselves in an interesting profession, where we can open our very tools up, peer inside, and change them.

It’s a fascinating prospect.


by Lee Pike at December 29, 2014 05:20 PM