## January 01, 2001

### Alessandro Vermeulen

Alessandro Vermeulen is a Computer Sciences (Masters) graduate from Utrecht University specialized in Software Technology. Alessandro obtained his honorary Bachelors degree in Computer Science also at Utrecht University. Among his core interests are Functional Programming (in both Haskell and Scala), compilers and automatic program analysis. Alessandro is currently employed at ING where he focuses on incorporating functional programming, reactive systems, and other software technology best practices in daily practice.

# Dhall Survey Results (2019-2019)

2018-2019-dhall-survey

The results from the latest Dhall survey are in, which you can view here:

â€¦ and I would like to thank everybody who took the time to participate in the survey!

This year 61 people completed the survey (compared to 19 last year), so we have a greater sample size to inform the future direction of the language.

Here is the breakdown of how often people used Dhall:

• `07 (11.7%)` - Never used it
• `22 (36.7%)` - Briefly tried it out
• `11 (18.3%)` - Use it for my personal projects
• `19 (31.7%)` - Use it at work
• `01 (01.7%)` - Trying to convince work people to use it

I was pleasantly surprised by the fact that more people use Dhall at work than those who use Dhall solely for personal projects. That suggests to me that people who do enjoy using Dhall do not have difficulty getting permission from their manager or coworkers to also use Dhall at work. I could be wrong, though, because there might be sampling bias or insufficient data to get accurate numbers. Those numbers also donâ€™t necessarily imply that people have convinced their coworkers to use Dhall and I plan to update the survey next year to ask about that, too.

Let me know if you think Iâ€™m wrong and you have difficulty getting permission to use Dhall at work. Providing a smooth adoption path has always been a high priority for me because I know from experience the difficulty of introducing new tools at work.

Most people confirmed that they use Dhall for ops-related use cases:

Kubernetes only at the moment, but more to follow

To generate Kubernetes YAML configuration files

Kubernetes, Terraform and application configs

Iâ€™m still evaluating it. Currently Iâ€™m generating Prometheus (YAML) configuration from it.

As of now I am trying it to use it for generating concourse pipelines. I work at a very ops heavy company, I can see couple of our proprietary tools could also leverage dhall.

Kubernetes, custom config for Haskell projects

configuring GoCD yaml pipelines, https://github.com/tomzo/gocd-yaml-config-plugin

Generating config for Terraform, Packer and for our application configuration.

Iâ€™m trialing it for configuring various tools configured with yaml or json like docker-compose and some internal haskell tools.

Description language for a Terraform replacement tool.

â€¦ generate yaml files then read by ansible

We use it to validate we have all the required (and no excess) variables set for our deploy scripts via type checking. If we add a variable to one environment, but miss others, the builds abort before being shipped to the world. â€¦

Configuration files for Elm, Packer â€¦

Replace Nix code with something saner and generate configuration for Kubernetes.

Kubernetes, Kops, Concourse, Terraform, application config

The other specific use cases I noticed were:

• configuring build tools
• command-line interface
• backend service configuration
• wire format for transmitting code

This feedback is consistent with my understanding of how Dhall is used in the wild.

Each year I try port a difficult configuration file format to Dhall to stress test the language design. Last year I ported the `nethack` configuration format to Dhall and this year I plan to port an ops-related configuration format with a weakly-typed schema to inform the language design process.

## Document best practices

Survey respondents very commonly requested use cases, cook books, design patterns, real-world examples, and project structure guidelines. This was far-and-away the most consistent feedback from the survey:

More real world examples, like CLI application config files. Or web server-client communication examples.

More blog posts/tutorials on use cases

Resources on the next steps beyond learning syntax: how to structure Dhall code, how to organise your files, design patterns, etc.

Guides/pointers to regular things like newtypes, String equality, Homogeneous record/map constraints?

A set of documented best practices for doing common things

What would make me really happy is to see some guidelines, patterns or examples for how to support evolving schemas for clients you donâ€™t control. â€¦

Full list of possible uses with detailed examples and comparisons with similar tools.

Add several complete realistic examples (besides the existing snippets).

Learning curve through examples, starter apps, tutorials

â€¦ docs, examples

â€¦ basic usage patterns (newtypes, sums, comparisons)

A use-case.

I think Dhall should develop on making its way into common use cases boosting its clout in the industry.

End-user ergonomics and patterns. Pain points will come up rapidly if you start writing libraries to configure various popular tools. Those pain points should guide possible changes to the language, and the patterns developed need to be put front and center in a cookbook or something because there are many ways to tackle problems in Dhall and the best ways arenâ€™t always obvious.

Widely used, typical use cases. Most people configure something â€œsimpleâ€�, show me how Dhall can improve my workflow (validation against a â€œschemaâ€�, deduplication, sharing of common config between applications). The initial impression from the Dhall Github and website is that its very powerful but most configuration is dead simple, just very verboseâ€¦

Documentation.

Docs, â€¦

examples how to approach a domain genericallyâ€¦

Last time I checked, the Dhall doc was very focused on the language itself, and the configuration part was kind of forgotten. Also, having the example with unicode syntax, while cool, make them hard/impossible to type along. I think having a few more docs about basic usecase to translate an existing json/yaml config into dhall would help adoption.

Nothing fancy with language feature, simply having a statically typed configuration, and optionally a type safe way to read it and map it to host language structure would be a very good start already.

Documentation should provide more pointers to idiomatic code. We based most of our current development off the dhall-kubernetes model which leads to dozens of boilerplate type imports at the beginning of files just to see how dhall-terraform makes all of their types available in a single expression ( https://github.com/blast-hardcheese/dhall-terraform/blob/master/Terraform/Providers/Datadog/Types.dhall ). Writing a cookbook would help with this.

This is also consistent with the highest rated funding mechanism: â€œBooks and/or merchandiseâ€� (See below). I suspect that most people who selected that option were interested in the book.

This feedback surprised me because I was still overly focused on improving the documentation for lower-level language features. So thank you to everybody who pointed this out because this was a huge blind spot of mine.

So my plan is to initially focus on documenting the current state-of-the-art best practices as one of my highest priorities and keeping that document up to date as the language evolves. This sort of technical writing is also the kind of thing I enjoy doing anyway! :)

## Language bindings and/or integrations

People also very commonly requested more language bindings and integrations with existing tools:

python library. having haskell is great, but we have a lot of python and C# as well and it would be great to use it from there too

Bindings for languages I have to work with (Java/Python).

Integration with NixOS

Tighter integration with Nix

â€¦ would like to use it for nix

Scala integration, complete Kubernetes API in dhall-kubernetes

More DSLs a-la dhall-kubernetes. IMHO the assurance to have a well-formed (in the sense of API conformance) config, is a huge plus.

A complete Scala implementation, as we use almost only Scala at work

A JSON->dhall process would be a big help to import complex data sources. There is a dhall-terraform effort underway, but we have decently large number of terraform (HCL) files, and being able to convert HCL->json->dhall would mean Iâ€™m completely free of some really annoying restrictions HCL imposes on me.

Import from Jason/yaml

Python interpreter and syntax highlight in github

â€¦ more language/tool integration

Language Bindings â€¦

More language bindings (e.g.Â Go and Python are two big Ops markets)

I think golang bindings would potentially help with lots of the things I care about in my day job: terraform, prometheus, kubernetes, cloud foundryâ€¦

other language integration

Bindings/libraries for other languages.

Language bindings, â€¦

Getting more software to adopt dhall is a novel goal. But I see the modern ops world is full of golang software which would require golang bindings for dhall. Itâ€™s probably not a good language to write the bindings (or anything really), but itâ€™s popularity might mean that for dhallâ€™s success golang bindings are necessary.

Having more good language implementation, into industrial programming languages, and get rid of Yaml.

More languagesâ€¦Ocaml maybe?

Cleanbindings https://wiki.clean.cs.ru.nl/

More integrations.

Using dhall from more languages would help adoption IMO. A statically-compiled lib with bindings in JS, python, java/scala would be good (and less work than implementing the language itself in all those languages). More projects like dhall-kubernetes are also a good way to drive adoption.

Being able to more quickly add it to existing code bases would be helpful. Going from JSON->dhall and then dhall being able to spit out nix/json, without needing a linter for the down stream languages makes life easier. â€¦

I do plan to address the request to import Dhall values from JSON this year and this issue tracks the discussion and work on that:

On the other hand, I do not yet plan to personally contribute a new language binding, mainly because I want to distribute control of the language evolution process. Each reimplementation of the language gets a vote on proposed language changes and if I contribute more than one language binding then I get a disproportionate voice. Instead, I focus on making it as easy as possible for others to port the language by improving the documentation and automation surrounding the language standard.

I have no intention of becoming a benevolent dictator-for-life over the language. For example, the other voting member (Fabrizio Ferrai, who maintains the Clojure bindings to Dhall) plans to author their own interpretation of the survey feedback to parallel this post. Also, hopefully there will be two new voting members this year since the Python and PureScript language bindings are getting close to completion.

However, I can still help recruit donations for both new and existing language bindings. If you work on any language binding to Dhall and you would like to be paid for your work then set up a Patreon account or similar donation mechanism and I will help advertise that. This is an area where I recommend using distributed, non-corporate sources of funding to ensure that the language evolution process remains as democratic as possible.

## Performance

Performance was another common theme:

Faster performance; â€¦

Speed. Iâ€™d really like to mix it into things more freely as a templating language, but itâ€™s too noticeable a slow down.

If dhall-kubernetes finally becomes fast, our whole infrastructure config set (at Wire) can move there

Performance and UX. Performance currently is very bad for large dhall projects (like dhall-kubernetes)

Performance for larger scale usage (such as nix)

Speed.

I agree with this feedback and improving performance is generally a never-ending process because every time I improve the interpreter performance people begin using the language for even more ambitious projects that strain the interpreter (myself included).

Because performance is an open-ended problem this is one of the areas Iâ€™m most likely to solicit donations to fund somebody to improve interpreter performance. That would also free me up to do more technical writing for the Dhall ecosystem.

If you are able and willing to improve the performance of the interpreter then let me know and Iâ€™ll work with you to secure some donation mechanism to fund your work. I am reasonably confident there are a few companies using Dhall that would fund improvements to interpreter performance. Similarly, if you are a company that can spare some budget to fund performance improvements, also reach out to me :)

## Default record values

Another common theme was that people are struggling to port Dhall to some configuration formats that have optionally present keys:

Optionally present keys in a dictionary

â€¦ Defaults for records

Having optionals that donâ€™t need to be maked with None when not present

Iâ€™d love Dhall to become a type-safe version of Nix (the language) or a handy language for configuration files, but there are no optionally present keys in dictionaries now, so I cannot use Dhall for config files yet. Currently Iâ€™ll just stick with YAML

Dhall actually does have a design pattern for this sort of idiom, which is to override a default record with the present values, like this:

``defaultRecord // { foo = 1 }``

â€¦ so this might just be another case for needing better documentation for best practices. However, if you are not satisfied with that idiom then I invite you to open an issue on the issue tracker with your thoughts on the subject:

## Language stability / migration

People are beginning to request greater language stability as they begin to adopt Dhall at work:

Greater stability in the core language and prelude. There have been a bunch of changes last year, that while I agree with their purpose, I cranked down on adding more usage of Dhall until the ecosystem is more stable to avoid version brittleness

Smoother transitions between standard versions. Maybe if the last two versions of standard were supported we could provide deprecation warnings, and have something like `rehash` subcommand. This would allow us to update hash of existing imports only if existing (old) hash is valid.

1. Stabilizing the language standard; 2. Stabilizing the prelude API.

â€¦ stability of upstream packages

There is one feature in the pipeline which should improve things here, which is support for stable hashes (i.e.Â semantic integrity checks):

This pain point was one of the most commonly cited issues with upgrading to the next language version and this will be available in the next release of the language standard and the Haskell interpreter (along with backwards-compatible support for older semantic integrity checks).

Besides that, I donâ€™t expect new language features to be too disruptive at this point. At this point I expect most new features to the language to be additive to the language and the most disruption that users might run into is some identifier becoming a reserved keyword.

Also, the Haskell implementation takes pains to make the migration process smoother by providing `dhall lint` support for migrating old code to new idioms. For example, `dhall lint` will automatically rewrite the old syntax for `Optional` literals to use the newer syntax.

One of the features of the Dhall languageâ€™s standardization process is that every new language binding adds a new vote on proposed changes, which raises the bar for changing the language. So as the language grows more mainstream I expect the language standard to stabilize due to this effect.

## No Unicode in examples

Three respondents disliked the use of Unicode in tutorials and examples. This was an unusually specific point of feedback that recurred a few times throughout the responses:

Get rid of unicode. It is cool but it really scares off beginners to see examples with unicode.

Better documentation, without unicode symbols.

removing unicode syntax

I donâ€™t have any objection to switching examples to use ASCII instead of Unicode, so I will do that. I think Unicode is prettier, but if it gets in the way of the learning process then Iâ€™ll switch to ASCII.

For people who hate Unicode syntax, period, the `dhall` command-line tool also supports an `--ascii` flag for emitting ASCII when formatting or linting code.

## Developer experience

Most of the developer experience feedback revolved around editor support, although there were a few other recommendations:

Better error messages (I mean, I can tell a lot of work has gone into them, which i really appreciate, but I am still often baffled by them)

A community mailing list where I could ask dumb questions when I get stuck. Thereâ€™s a stackoverflow tag but it seems quite low-traffic so I was put off by that.

â€¦ editor support

â€¦ editor integrations, â€¦

Release on OSX/Windows â€¦

Type / syntax errors that are easier to visually parse.

Ergonomics - language server, support for auto-complete in editors, seeing type errors in the editor without needing to run the dhall executable separately

I love Gabrielâ€™s idea on LSP support to provide support for all editors in his state of Dhall address blog post. Making the development experience (including stability of the language) will be key to letting the configs loose outside of the services team which is a little more adept at employing functional and typed methods and tools than our Ruby applications developers on the adjacent team we support.

Iâ€™d love to be able to specify a dhall file as an argument to dhall/dhall-to-* rather than feeding to STDIN. Pipes and redirection can get clumsy when incorporating into scripts.

As I mentioned in my previous post, one goal we have for this year is to add a Dhall implementation of the language server protocol:

â€¦ and as Iâ€™m writing up this post a contributor has already done some initial groundwork on this (see the above issue).

## Other language features

The most commonly requested language feature was (bidirectional) type inference:

Type inference

Also, having to literally specify type parameters to everything forces you to name and define all sorts of intermediate types, which makes things very messy. I donâ€™t know enough about Dhall to know if this is possible/desirable to avoid

Lack of gradual typing for type arguments is the biggest deal breaker - it breaks the whole â€œgradual typingâ€� story

Easing migrating existing large structures to Dhall by ensuring gradual typing is viable

I donâ€™t expect Dhall to get bidirectional type inference at this point. The two main reasons are:

• This is harder to standardize and port to multiple languages
• This would be very disruptive at a point where we are trying to stabilize the language

So in my opinion that ship has sailed, although I no longer have veto power over proposed changes so there is still always the possibility that somebody puts forth a compelling proposal for type inference that proves me wrong.

Building maps in yaml/json with a dynamic set of keys is a challenge. The other side of the fence is less strongly typed and so they define fields that may or may not be present, or are keyed on userland values. This sucks but if dhall is going to target tools that use yaml/json, there needs to be a way with good ergonomics to build those values. â€¦

I do plan on adding better support for working with weakly-typed dictionaries. As I mentioned earlier, I plan to port one of the more difficult ops-related configuration formats to Dhall to guide the language design and this will inform how I propose to support these weakly-typed dictionaries.

Turing completeness

I donâ€™t plan on making the language Turing-complete. The absence of Turing completeness is Dhallâ€™s sine qua non.

The ability to to simple math on anything other than Nats. Being able to add or multiply two Doubles would vastly increase Dhallâ€™s usefulness to me.

Iâ€™d recommend opening an issue for this if you are interested in `Double` arithmetic. There is a bit to discuss here that wonâ€™t fit in this post:

1. Simpler syntax for input data to write configuration files. 2. Recursion.

Iâ€™m not sure if (1) would be solved by the planned support for importing from JSON (as JSON syntax is still a bit clumsy for people to author in my opinion), but at the moment there arenâ€™t any other plans along those lines.

I also donâ€™t think Dhall will get native language support for recursion (or at least not anytime soon). Recursion will likely remain a design pattern, as described in this document:

â€¦ rather than a language feature.

Two people specifically requested support for fixing a very specific bug, which is that the Haskell implementation swallows all comments except the leading comment when formatting code:

Preserving comments in the output of `dhall lint`.

I hope to get to this soon, because I understand what a pain this is (it bites me, too).

## Packaging

Better â€œpackageâ€� story. â€¦

Packaging/versioning. Importing from URL makes builds prone to flakiness, local caching helps but is insufficient. Currently using Make to clone repositories at given tags and import locally, which can be a hassle since Make wonâ€™t fetch transitive dependencies for the types/expressions that are cloned. Expressions which import types off local disk refer to relative locations, which are prone to breakage if those locations change i.e.Â due to refactoring. Some kind of standard which allows for a Cargo-style definition of imported types, locked to a version, with a separate lifecycle for resolving imports would improve stability.

Iâ€™m a bit resistant to this feedback (although I might have misunderstood it). I think the traditional packaging model adds a lot of unnecessary complexity to distributing code. I view Dhall as the â€œBash of functional programmingâ€�, where the way you author code is you just save a file and the way you consume code is you refer directly to the file you want (except that Dhall fixes some mistakes Bash made in this regard).

The other reason I question the traditional packaging model is that I donâ€™t see the value that it adds above hosting Dhall code on GitHub or GitLab, especially given that Dhall can import from URLs and resolve relative references correctly.

However, I do think there is value in making Dhall packages easier to discover, browse, and document (i.e.Â like Hackage for the Haskell ecosystem).

## Possibly old interpreters

â€¦ less verbose support for using union types more easily

Making it simpler to declare Sum types

I believe these respondents may be using an older version of the interpreter because this should be addressed in the latest release of the language and Haskell interpreter. See this page for more details:

However, reading this makes me realize that next year I should add a question asking respondents what implementation and what version they use. That would also help me gauge how long to support deprecated features (such as the `constructors` keyword).

## Funding

â€œBooks and/or merchandiseâ€� was the clear leader for funding mechanism, although I suspect thatâ€™s primarily because people want a book (paid or not) based on the overwhelming feedback in favor of documentation:

• Books and/or merchandise - 24 (52.2%)
• Crowdfunding (recurring) - i.e.Â Patreon - 18 (39.1%)
• Donation button - 14 (30.4%)
• Project bounties - 14 (30.4%)
• Crowdfunding (one-time) i.e.Â Kickstarter - 9 (19.6%)
• Opening PRS myself :) - 1 (2.2%)
• Consulting - 1 (2.2%)
• Open source sponsorship - 1 (2.2%)
• Company donation for time/moral license - 1 (2.2%)

Note that I also plan to apply for grants for open source work. I didnâ€™t list that as a funding option because my mind was already made up on that.

The funding mechanism that surprised me the most was â€œDonation buttonâ€�. I thought that was something that people didnâ€™t really do any more (i.e.Â not â€œhipâ€�), but it was tied for third-most popular funding mechanism.

I didnâ€™t list consulting as one of the funding mechanisms (which one respondent had to write in), because Iâ€™ve heard from a few sources that consulting can create perverse incentives because simplifying things means fewer billable hours. However, on more reflection I think it might be worth getting corporate sponsorship for performance improvements to the language (as previously mentioned), because thatâ€™s less likely to create the wrong incentives or lead to undue corporate influence over the language evolution.

## Conclusions

This post doesnâ€™t include the comments of praise in the interest of modesty, but I did read them all and really appreciated them! ğŸ™‚

Hopefully this gives people an idea of where my current priorities are at and also helps others understand how they might be able to contribute to the Dhall ecosystem, too!

Also, if this post is the first you are hearing about the survey, you can still complete the survey and Iâ€™ll read your response even if it wonâ€™t be summarized in this post. I still get an e-mail notification for each new submission.

You can also use the Dhall languageâ€™s issue tracker to provide feedback of any kind, too. For more details on how to discuss, propose, or contribute changes, see the following guide:

# Clodl:build, copy, and run

Facundo Dominguez, Mathieu Boespflug

Have you ever needed to execute in another computer a binary that you have built, only to find that it doesn't work because it depends on shared libraries that aren't installed? Or even worse, you don't have admin rights on the machine, so you cannot install the correct dependencies?

In this post we present `clodl`, a simple tool to build self-contained closures of dynamically linked executables and shared libraries. The closure is a file that can be copied elsewhere and executed or loaded as a library without depending on any pre-installed library.

## Deploying binaries

While copying executables is the simplest way to deploy a binary, it depends strongly on a correctly configured computing environment. Other options, such as copying all the dependencies to a Docker container, or building instead a fully static binary executable alleviate this problem. However, there are cases where these methods are not appropriate.

One such case is feeding native code to the Java Virtual Machine (JVM), since it can only load shared libraries. If we are to deploy a hybrid application requiring both the JVM and native code, we need to be sure that all the native code dependencies are shipped or the JVM won't be able to load it.

A possibility is creating a Docker image with the JVM and all the libraries. This can work as long as we have control over how the application is launched, which was not the case in one of our projects.

In this project, we wanted to integrate Haskell programs with Spark, a system for running distributed computations. Among other things, Spark interfaces with systems which manage resources in a cluster and orchestrates the creation of JVM instances to run distributed computations. Computations are submitted to Spark as self-contained JAR files containing the code to execute. In this scenario, we must be able to encapsulate dependencies.

Another situation where `clodl` has been useful for us is when we need a self-contained (static-like) executable, but we are using a compiler that doesn't build fully static executables by default (e.g. GHC). Depending on the application, doing so might not be trivial. The `clodl` approach offers a stop-gap solution, or even a permanent one if fully static builds don't work (some dependencies often make this difficult).

## What is a closure?

In essence, the closure of a shared library or executable is a file containing this initial library or executable and its dependencies. In the case of `clodl` this takes the form of a zip archive, though any container format could be used.

In addition, the closure must have some provisions to execute or load the code contained within. For example, a Java application could uncompress the closure and load the initial shared library. However, this can cause problems, since there is the danger that the dependencies would be searched at other locations where incompatible libraries could be found. Unix-like systems have a flexible mechanism for finding the dependencies of a library, but this mechanism is also hard to override when we want to force the library to use a specific set of dependencies.

Because of this, the closure has a wrapper library created solely for the purpose of loading it. This wrapper library is the result of linking dynamically the initial library and all of its dependencies, and therefore all the libraries in the closure are direct dependencies of it. Moreover, the runtime library path of this wrapper library instructs the dynamic linker to search for dependencies at the same location where the wrapper is.

If we want the closure to be executed rather than loaded, we create a wrapper executable instead of a wrapper library, which is linked in the same way.

## Creating closures with clodl

`clodl` is implemented with a mix of bash scripts and Bazel rules. Bazel is a build system which is able to express dependencies between source files in various languages (e.g. Haskell and Java). While we have been using `clodl` with it, it would be possible to implement the same approach in other tools or even as a standalone shell script.

In order to build a closure, one has to write rules in a `BUILD` file which can be fed to Bazel. For instance

``````# bring into scope the library_closure rule
"@io_tweag_clodl//clodl:clodl.bzl",
"library_closure",
)

# create a C library
cc_library(
name = "lib-cc",
srcs = ["lib.c"],
)

# create a shared library from the C library
cc_binary(
name = "libhello-cc.so",
# include a main function if you intend to make
# the closure executable
srcs = ["main.c"],
deps = ["lib-cc"],
)

# create the closure of the shared library
library_closure(
name = "clotest-cc",
srcs = ["libhello-cc.so"],
)
``````

The `library_closure` rule here produces a file `clotest-cc.zip` containing all the shared libraries needed to load `libhello-cc.so`. `clodl` finds the dependencies using whatever version of `ldd` is on the PATH when Bazel runs the rules.

We can make the the closure executable by replacing the `library_closure` rule with

``````binary_closure(
name = "clotestbin-cc",
src = "libhello-cc.so",
)
``````

This will produce a file `clotestbin-cc.sh` that has the closure zip file prepended with a bash script that invokes the main function in `main.c`. The `clodl` repository has full examples. At the time of this writing, we only have tested `clodl` on Linux.

## Tools for deployment

`clodl` is not the first tool on Linux designed to deploy binaries. There is Flatpack and Snap, for instance. These are build tools that start from the source code to build cross-linux packages. Another such tool is nix-bundle, which requires the application to be packaged with nix in order to provide a binary which bundles all of its dependencies.

These tools require specifying the dependencies, the source code and any build settings. `clodl`, in contrast, starts from the executable binary or shared library, written in any language, built by any compiler.

However, there are tools that follow this approach as well. One of these is a commercial tool called ermine, which produces a self-contained executable from a dynamically-linked application. Unfortunately, it provides no clues on the structure of the final executable. `clodl` produces an easy to handle self extracting zip archive. JAR files are also zip files, so this means that the output of `clodl` can also be loaded in the JVM.

Finally, there is a tool called exodus which produces a `tar` archive from a dynamically linked executable, its dependencies and the dynamic linker. Most notably, it also includes a launcher program that will invoke the linker with command line flags that constrain the directories in which to search for libraries. This approach is rather frugal. Unlike `clodl`, it doesn't need to build a wrapper executable. However, `exodus` will not work for building closures of libraries. When we load libraries in the JVM, we assume no control over the flags passed to the dynamic linker that the JVM is using.

## Summary

In this post we have presented `clodl`, a tool to build closures of dynamically linked executables and shared libraries. These closures are easy to copy and load or execute in other hosts. We expect these capabilites to be useful to various projects. You are welcome to get in touch if you would like to contribute or have `clodl` features enhanced.

# PyData Hong Kong - Making the Big Data ecosystem work together with Python: Apache Arrow, Spark, Flink, Beam, and Dask @ PyData Hong Kong

Come join me on Tuesday 19 February @ 02:00 at PyData Hong Kong 2019 Hong Kong for PyData Hong Kong - Making the Big Data ecosystem work together with Python: Apache Arrow, Spark, Flink, Beam, and Dask.I'll update this post with the slides soon.Come see to the talk or comment bellow to join in the discussion :).Talk feedback is appreciated at http://bit.ly/holdenTalkFeedback

## February 18, 2019

### Michael Snoyman

Early this week, I merged a commit which essentially shuts down the haskell-lang.org website. Besides a few rarely viewed pages without obvious replacements, visiting pages on https://haskell-lang.org will automatically redirect you to an appropriate page on https://haskell.fpcomplete.com. Also, consider this an announcement that there’s a new site around, https://haskell.fpcomplete.com!

The site is still being refined. However, to avoid confusion, someone requested that I make an announcement now. That request makes sense. In this post, I want to explain:

• Why I shut down haskell-lang.org
• What the goal of haskell.fpcomplete.com is
• What makes this new site different from haskell-lang.org
• Plans for the future

Onward!

I’ve been playing with the idea of shutting down haskell-lang.org for a while now. I didn’t want to simply turn it off, since there’s a lot of useful content there that I regularly use myself and point others to for training (we’ll get to that in a bit). It wasn’t a huge amount of work to put together the replacement site, but it did take some dedicated writing time. I got that time last week and did most of the work then.

I first publicly proposed shutting down the site in December of last year. I put together a proposal to get a bunch of people together to work on a new Commercial Haskell site. There wasn’t much interest in such collaboration, so I went with option B, which I’ll explain in the next section.

As for why haskell-lang.org should be shut down, I’ll quote the history section of the new site’s README:

This website replaces some previous false starts (and mistakes) at trying to create an opinionated hub of Haskell information. Previously, some FP Complete members participated in haskell-lang.org as another vision of an opinionated Haskell site. Later, we sought out collaborators for creating a more complete Commercial Haskell website.

haskell-lang.org was a mistake, and there was insufficient interest in commercialhaskell.com. Given that the branding for haskell-lang.org was incorrect, there was little interest from others in the general concept of creating an opinionated site, and we at FP Complete were still passionate about this topic, we decided to focus efforts on a site under our own branding and control.

This site is unapologetically opinionated, and follows what we have found to be the best route towards getting productive with Haskell quickly.

## The need for a site

haskell-lang.org has been collecting solid tutorials on general Haskell techniques and specific libraries. The collection is opinionated, so that I can use it for training courses I give, and point new users to it without needing to give any caveats about which approach to follow. And the site is backed by a Git repository using Markdown files, which I’m a huge fan of.

So first goal of this site would be: retain the technical content and educational approach provided by haskell-lang.org, without the bad history that goes along with that name.

## Other goals

At FP Complete, we’ve done a few surveys of the Haskell community to get an idea of what we can do to most help more companies adopt Haskell. From our last survey, it seems that more educational content is top of the list, followed by helping companies generally feel comfortable adopting Haskell. I covered the education aspect above, and we’ll continue to put efforts into improving that situation.

On the more nebulous adoption point, we’re adding two more goals to the new site:

• Provide promotion material: content that explains what makes Haskell a good choice for businesses, together with some potential drawbacks people should be aware of.
• Introduce a Success Program at FP Complete, providing affordable commercial Haskell support including training and mentoring. We believe this may help more companies adopt Haskell.

Enrolling in the Success Program is a paid service at FP Complete (though we are pricing it as low as we can, to maximize adoption without actually losing money). We’re hoping that the presence of a clearly defined and publicly priced commercial support package will help reduce perceived risk with Haskell further and allow more adoption.

## The future

The new site is still a work in progress. Overall, the styling still needs a bit of work, I want to refine the language some (and likely scale back the prose). I also want to refresh a bunch of the technical content to be in line with our current recommendations. This will also affect the Applied Haskell training I’ll be giving later this year. (Feel free to check out the current version of the course online.)

I still have some questions up in the air, like whether we’ll host a blog on this site or simply continue all FP Complete blogging activitiy on our corporate site. I’ve started putting together a bit of a philosophy page explaining the FP Complete approach to commercial Haskell development, and that needs expanding. And I’d like to get a more content on the contribute page to allow people to find a project they can cut their teeth on.

I hope that this new site not only allows for creation of and access to great Haskell content. I also hope that this is taken as a positive message from the rest of the community and an indication of how we at FP Complete, and I personally, intend to interact with the broader Haskell community going forward. We’ll remain opinionated on our technical recommendations, as I believe we should be. But hopefully this change in naming and purpose of the site remove any adversarial nature to sharing these opinions.

In the last year or so, I haven't actually written as much Haskell as I'd like to. There are a few reasons for this. First, I switched to a non-Haskell job, meaning I was now using my Windows laptop for all my Haskell coding. Even on my previous work laptop (a Mac), things weren't easy. Haskell doesn't have a real IDE, like IntelliJ for Java or XCode for iOS development.

Besides not having an IDE, Windows presents extra pains compared to the more developer-friendly Mac. And it finally dawned on me. If I, as an experienced developer, was having this much friction, it must be a nightmare for beginners. Many people must be trying to both learn the language AND fight against their dev setup. So I decided to take some time to improve how I program so that it'll be easier for me to actually do it.

I wanted good general functionality, but also some Haskell-specific functions. I did a decent amount of research and settled on Atom as my editor of choice. In this article, we'll explore the basics of setting up Atom, what it offers, and the Haskell tools we can use within it. If you're just starting out with Haskell, I hope you can take these tips to make your Haskell Journey easier.

## Goals

It's always good to begin with the end in mind. So before we start out, let's establish some goals for our development environment. A lot of these are basic items we should have regardless of what language we're using.

1. Autocomplete. Must have for terms within the file. Nice to have for extra library functions and types.
2. Syntax highlighting.
3. Should be able to display at least two code files side-by-side, should also be able to open extra files in tabs.
4. Basic file actions should only need the keyboard. These include opening new files to new tabs or splitting the window and opening a new file in the pane.
5. Should be able to build code using the keyboard only. Should be able to examine terminal output and code at the same time.
6. Should be able to format code automatically (using, for instance, Hindent)
7. Some amount of help filling in library functions and basic types. Should be able to coordinate types from other files.
8. Partial compilation. If I make an obvious mistake, the IDE should let me know immediately.
9. Vim keybindings (depends on your preference of course)

With these goals in mind, let's go ahead and see how Atom can help us.

## Basics of Atom

Luckily, the installation process for Atom is pretty painless. Using the Windows installer comes off without a hitch for me. Out of the box, Atom fulfills most of the basic requirements we'd have for an IDE. In fact, we get all our 1-4 goals without putting in any effort. The trick is that we have to learn a few keybindings. The following are what you'll need to open files.

1. Ctrl+P - Open a new tab with a file using fuzzy find
2. Ctrl+K + Direction (left/right/up/down arrow) - Open a new pane (will initially have the same file as before).
3. Ctrl+K + Ctrl+Direction - Switch pane focus

Those commands solve requirements 3 and 4 from our goals list.

Another awesome thing about Atom is the extensive network of easy-to-install plugins. We'll look at some Haskell specific items below. But to start, we can use the package manager to install `vim-mode-improved`. This allows most Vim keybindings, fulfilling requirement 9 from above. There are a few things to re-learn with different keystrokes, but it works all right.

Since Atom is so Hackable, you can also add your own keybindings and change ones you don't like. We'll do one simple example here, but you can also check out the documentation for some more ideas. One thing we'll need for goal #5 is to make it easier to bring up the bottom panel within atom. This is where terminal output goes when we run a command. You'll first want to open up `keymap.cson`, which you can do by going to the file menu and click `Keymapâ€¦`.

Then you can add the following lines at the bottom:

``````'atom-workspace':
'ctrl-shift-down': 'window:toggle-bottom-dock'
'ctrl-shift-up': 'window:toggle-bottom-dock'``````

First, we scope the command to the entire atom workspace. (We'll see an example below of a command with a more limited scope). Then we assign the Ctrl+Shift+Down Arrow key combination to toggle the bottom dock. Since it's a toggle command, we could repeat the command to move it both up and down. But this isn't very intuitive, so we add the second line so that we can also use the up arrow to bring it up.

A super helpful tool is the key binding resolver. At any point, you can use `ctrl+.` (control key plus the period key) to bring up the resolver. Then pressing any key combination will bring up the commands Atom will run for it. It will highlight the one it will pick in case of conflicts. This is great for finding unassigned key combinations!

Now let's start looking at adding some Haskell functionality to our editor. We'll start by installing a few different Haskell-related packages in Atom. You don't need all these, but this is a list of the core packages suggested in the Atom documentation.

``````language-haskell

The trickier part of getting Haskell functionality is the binary dependencies. A couple of the packages we added depend on having a couple programs installed. The most prominent of these is `ghc-mod`, which exposes some functionality of GHC. You'll also want a code formatter, such as hindent, or stylish-haskell installed.

At the most basic level, it's easy to install these programs with Stack. You can run the command:

``stack install ghc-mod stylish-haskell``

However, `ghc-mod` matches up with a specific version of GHC. The command above installs the binaries at a system-wide level. This means you can only have the version for one GHC version installed at a time. So imagine you have one project using GHC 8.0, and another project using GHC 8.2. You won't be able to get Haskell features for each one at the same time using this approach. You would need to re-install the proper version whenever you switched projects.

As a note, there are a couple ways to ensure you know what version you've installed. First, you can run the `stack install ghc-mod` command from within the particular project directory. This will use that project's LTS to resolve which version you need. You can also modify the install command like so:

``stack --resolver lts-9 install ghc-mod``

There is an approach where you can install different, compiler specific versions of the binary on your system, and have Atom pick the correct one. I haven't been able to make this approach work yet. But you can read about that approach on Alexis King's blog post here.

## Keybinding for Builds

Once we have that working, we'll have met most of our feature goals. We'll have partial compilation and some Haskell specific autocompletion. There are other packages, such as `haskell-hoogle` that you can install for even more features.

There's one more feature we want though, which is to be able to build our project from the keyboard. When we installed our Haskell packages, Atom added a "Haskell IDE" menu at the top. We can use this to build our project with "Haskell IDE" -> "Builder" -> "Build Project". We can add a keybinding for this command like so.

``````'atom-text-editor[data-grammer~/"haskell"]':
...

Notice that we added a namespace here, so this command will only run on Haskell files. Now we can build our project at any time with Ctrl+Shift+Alt+B, which will really streamline our development!

## Weaknesses

The biggest weakness with Atom Haskell-mode is binary dependencies and GHC versions. The idea behind Stack is that switching to a different project with a different compiler shouldn't be hard. But there are a lot of hoops to jump through to get editor support. To be fair though, these problems are not exclusive to Atom.

Another weakness is that the Haskell plugins for Atom currently only support up through LTS 9 (GHC 8). This is a big weakness if you're looking to use new features from the cutting edge of GHC development. So Atom Haskell-mode might not be fully-featured for industry projects or experimental work.

As a further note, the Vim mode in Atom doesn't give all the keybindings you might expect from Vim. For example, I could no longer use the colon key plus a number to jump to a line. Of course, Atom has its own bindings for these things. But it takes a little while to re-learn the muscle memory.

## Alternatives

There are, of course, alternatives to the approach I've laid out in this article. Many plugins/packages exist enabling you to get good Haskell features with Emacs and Vim. For Emacs, you should look at haskell-mode. For Vim, I made the most progress following this article from Stephen Diehl. I'll say for my part that I haven't tried the Emacs approach, and ran into problems a couple times with Vim. But with enough time and effort, you can get them to work!

If you use Visual Studio, there are a couple packages for Haskell: Haskelly and Haskero. I haven't used either of these, but they both seem provide a lot of nice features.

## Conclusion

Having a good development environment is one of the keys to programming success. More popular languages have full-featured IDE's that make programming a lot easier. Haskell doesn't have this level of support. But there's enough community help that you can use a hackable editor like Atom to get most of what you want. Since I fixed this glaring weakness, I've been able to write Haskell much more efficiently. If you're starting out with the language, this can make or break your experience! So it's worth investing at least a little bit of time and effort to ensure you've got a smooth system to work with.

Of course, having an editor setup for Haskell is meaningless if you've never used the language! Download our Beginners Checklist or read our Liftoff Series to get going!

# Freer Monads: Too Fast, Too Free

I agree the performance argument is way less important than the frequency at which it's thrown around makes it seem. The reason freer performance sucks is that you're repeatedly constructing and deconstructing trees at runtime. However, that is only a consequence of the implementation of freer as a GADT (initial encoding). I bet the final encoding can do wonders:

`newtype Freer f a = Freer (forall m. Monad m => (forall t. f t -> m t) -> m a)`

I spent a few days working through the implications of this, and it turns out to be a particularly compelling insight. Behold the microbenchmarks between `freer-simple` and an equivalent program written against `mtl`:

``````benchmarking freer-simple
time                 745.6 μs   (741.9 μs .. 749.4 μs)
1.000 R²   (0.999 R² .. 1.000 R²)
mean                 745.1 μs   (742.2 μs .. 748.5 μs)
std dev              10.68 μs   (8.167 μs .. 14.23 μs)

benchmarking mtl
time                 10.96 μs   (10.93 μs .. 10.98 μs)
1.000 R²   (1.000 R² .. 1.000 R²)
mean                 10.95 μs   (10.92 μs .. 10.99 μs)
std dev              119.3 ns   (93.42 ns .. 153.7 ns)``````

Not so good; `freer-simple` is like 75x worse in this case! But the same program again when written in this final encoding is pretty damn fast:

``````benchmarking too-fast-too-free
time                 24.23 μs   (24.10 μs .. 24.37 μs)
1.000 R²   (1.000 R² .. 1.000 R²)
mean                 24.27 μs   (24.15 μs .. 24.40 μs)
std dev              448.8 ns   (355.8 ns .. 586.1 ns)``````

It's roughly 2x slower than `mtl`, which is AKA 35x faster than `freer-simple`. This is pretty sweet, and it comes with the benefit of getting to keep the underlying semantics of `freer-simple`.

So without further ado, I'd like to share my work-in-progress with you, tentatively named `too-fast-too-free`. This is ready for prime-time, but I'd prefer to merge it to someone upstream rather than pollute hackage with yet another free(r) monad extensible effects package.

I'll do it if I have to, but the performance is fair game for anyone who wants it. If I don't hear from anyone by next week, I'll publish a new package to hackage and begin the freer monad revolution we've all been waiting for.

## What the Heck Is Any of this Stuff Anyway?

Let's investigate this finally-encoded type and see where this performance comes from:

``````newtype Freer f a = Freer
{ runFreer :: forall m. Monad m => (forall t. f t -> m t) -> m a
}``````

The type of `runFreer` is saying "if you give me a `Freer f a` and a natural transformation from `f` to some monad `m`, then I can give you back an `m a`." Sounds promising, right?

`Freer`'s instance for `Monad` is written in terms of this final `m`, and so short of shunting around some functions, we're not really paying any cost for binds compared to just writing in terms of `m`:

``````instance Monad (Freer f) where
Freer ma >>= f = Freer \$ \k -> do
a <- ma k
runFreer (f a) k``````

Compare this to the approach used by `freer-simple` which needs to allocate leafs in a tree for every bind (and for every `fmap` and `ap`!) That's a huge win already.

Turning `Freer` into `Eff` uses the same trick as `freer-simple`---let `Eff r` be `Freer (Union r)`, where `Union r` is a value that can be any effect in `r`. A natural transformation `forall m. Monad m => (forall t. Union r t -> m t)` must therefore handle every possible effect in `r`, and so we haven't lost any capabilities with our new encoding.

The challenging part was figuring out how to plumb state through the encoding of `Freer f a`---after all, many interesting interpreters are necessarily stateful.

Fortunately there's a trick. Because `Eff (e ': r) a` can be interpreted in terms of any `Monad` `m`, we can choose `m ~ StateT s (Eff r)`, and get our statefulness from `StateT` directly. Because `StateT`'s bind is written in terms of its underlying monad, this trick doesn't cost us anything more than shunting another few functions around.

We can achieve short-circuiting interpreters similarly by evaluating them via `ExceptT (Eff r)`. In fact, this pattern turns out to be pretty common---and it generalizes thusly:

``````transform
, MFunctor t
)
=> (forall m. Monad m => t m a -> m b)
-- ^ The strategy for getting out of the monad transformer.
-> (eff ~> t (Eff r))
-> Eff (eff ': r) a
-> Eff r b
transform lower f (Freer m) = Freer \$ \k -> lower \$ m \$ \u ->
case decomp u of
Left  x -> lift \$ k x
Right y -> hoist (usingFreer k) \$ f y``````

Admittedly the type is a little terrifying, but library code can specialize it down to less offensive combinators.

At the end of the day, this final encoding means that `Freer` code specializes down to its eventual result anyway, giving us the "fusion" of `fused-effects` without the boilerplate.

Hopefully these results are enough to finally put the "free monads have bad performance" argument to rest. I'll have some promising results on the `bracket` front as well that require some refinement, but hopefully those will come sooner than later.

### Douglas M. Auclair (geophf)

• May 27th, 2018:
data F = F { a, b, c :: String }
data Stamped a = { time :: Day, stamped :: a }

f :: Stamped F -> String

output of f x is  "time a b c" for the respective values of time, a, b, c

Is there some monadic / applicative elegant definition that does this?
• Nickolay Kudasov @crazy_fizruk
f = intercalate “ “ . sequence [ show.time, a.stamped, b.stamped, c.stamped ]
• Nickolay Kudasov @crazy_fizruk
f = intercalate " " <\$> http://pure.show .time <> (sequence [a, b, c]).stamped
A bit trickier, but shorter and uses stamped once.
• April 13th, 2018: given f :: [a] -> b -> [c]
where c is a derived from g :: a -> b -> c

You have [a] and [b]

Write a function h :: [a] -> [b] -> [c] from f

# Quadratic "deriving Generic" Compile Times

Summary: For large data types, `deriving Generic` can take a long time to compile.

I was building GHC using Hadrian, and part of that process involves compiling Cabal multiple times - once to build Hadrian itself, and then once per GHC stage thereafter (typically 3 stages). Compiling Cabal takes quite a long time, but one thing I spotted was that compiling the LicenseId file alone was taking 30 seconds! Given that it seemed to be on the critical path, and usually gets compiled 4 times, that seemed worth investigating.

Looking at the file, it defines an enumeration with 354 license types in it. Pulling apart the file, I found that with only the data type and `deriving Generic` it was taking about 9 seconds without optimisation or 22 seconds with optimisation, using GHC 8.6.3. I then wrote a test program that generated and compiled instances of the program:

``{-# LANGUAGE DeriveGeneric #-}module Sample whereimport GHC.Genericsdata Sample = Ctor1 | Ctor2 | ...   deriving Generic``

The resulting graph shows that compilation time is quadratic in the number of constructors:

This phenomenon could either be because the `deriving Generic` itself is quadratic, or more likely that the output generated by `deriving Generic` provokes quadratic behaviour in some subsequent part of the compiler. I'm going to report this as a GHC bug.

Update: There's already GHC bug 5642 which explains that it might just be a fundamental property of how `Generic` instances look.

For Cabal the fix might be somewhat simpler - the `deriving Generic` is only used to define a `Binary` instance, which can be written in 2 lines for an `Enum` anyway - discussion on the issue tracker. Hopefully this change will speed up compiling Cabal, and thus speed up compiling GHC.

As I watched the GHC output go by I noticed that both `Parser` and `DynFlags` in GHC itself were also very slow, with the latter taking nearly 2 minutes. I'm sure there's more scope for optimising GHC builds, so Andrey Mokhov and I have proposed a Google Summer of Code project to tackle other low-hanging fruit.

The quick program I wrote for generating the data in this blog post is below:

``import Control.Monadimport Data.Listimport System.Time.Extraimport System.Process.Extraimport Numeric.Extramain = do    forM_ [1..] \$ \n -> do        writeFile "Sample.hs" \$ unlines \$            ["{-# LANGUAGE DeriveGeneric #-}"            ,"module Sample where"            ,"import GHC.Generics"            ,"data Sample = " ++ intercalate " | " ["Ctor" ++ show i | i <- [1..n]]            ,"   deriving Generic"            ]        xs <- forM [0..2] \$ \o ->            fmap fst \$ duration \$ systemOutput_ \$ "ghc -fforce-recomp Sample.hs -O" ++ show o        putStrLn \$ unwords \$ show n : map (showDP 2) xs``

# Choosing a conduit randomly

Lately I’ve been playing around conduit. One thing I wanted to try out was to set up processing where one processing step was chosen on random from a number of components, based on weights. In short I guess I wanted a function with a type something like this

``foo :: [(Int, ConduitT i o m r)] -> ConduitT i o m r``

I have to admit I don’t even know where to start writing such a function1 but after a little bit of thinking I realised I could get the same effect by controlling how chunks of data is routed. That is, instead of choosing a component randomly, I can choose a route randomly. It would look something like when choosing from three components

``````                        +---------+   +----------+   +-------------+
| Filter  |   | Drop tag |   | Component A |
+-->| Value-0 |-->|          |-->|             |--+
|   +---------+   +----------+   +-------------+  |
+----------------+  |   +---------+   +----------+   +-------------+  |
| Choose random  |  |   | Filter  |   | Drop tag |   | Component B |  |
| value based on +----->| Value-1 |-->|          |-->|             |----->
| weights        |  |   +---------+   +----------+   +-------------+  |
+----------------+  |   +---------+   +----------+   +-------------+  |
|   | Filter  |   | Drop tag |   | Component C |  |
+-->| Value-2 |-->|          |-->|             |--+
+---------+   +----------+   +-------------+

``````

That is

1. For each chunk that comes in, choose a value randomly based on weights and tag the chunk with the choosen value, then
2. split the processing into one route for each component,
3. in each route filter out chunks tagged with a single value, and
4. remove the tag, then
5. pass the chunk to the component, and finally
6. bring the routes back together again.

Out of these steps all but the very first one are already available in conduit:

What’s left is the beginning. I started with a function to pick a value on random based on weights2

``````pickByWeight :: [(Int, b)] -> IO b
pickByWeight xs = randomRIO (1, tot) >>= \ n -> return (pick n xs)
where
tot = sum \$ map fst xs

pick n ((k, x):xs)
| n <= k = x
| otherwise = pick (n - k) xs
pick _ _ = error "pick error"``````

Using that I then made a component that tags chunks

``````picker ws = do
evt <- await
case evt of
Nothing -> return ()
Just e -> do
p <- liftIO \$ pickByWeight ws
yield (p, e)
picker ws``````

I was rather happy with this…

1. Except maybe by using Template Haskell to generate the code I did come up with.

2. I used `Quickcheck`’s `frequency` as inspiration for writing it.

# Worstsort

Thanks for the responses to my previous post about finding roots of polynomials; I now have some new avenues to explore. But today I want to write about something completely different. I recently stumbled across this fun paper by Miguel Lerna. I realized a Haskell implementation would be very elegant, and I couldn’t pass up the opportunity to share.

``````> import Data.List (permutations, insert)
>
> badsort :: Ord a => Integer -> [a] -> [a]
> badsort 0 = foldr insert []
``````

Claim: `badsort k` is a correct sorting algorithm for any natural number `k`. Before reading further I recommend staring at that until you understand what it’s doing and why it works.

• `badsort 0` is just plain old insertion sort.

• `badsort k xs` creates the list of all permutations of `xs`, sorts them into lexicographic order, and selects the first. This works because the lexicographically smallest permutation of `xs` is, in fact, the one which is sorted.

Oh, and of course, sorting the permutations lexicographically is done by a recursive call to `badsort (k-1)`. (As an aside, I like how seamless this is in Haskell with polymorphic recursion—each recursive call is at a different type.)

Here are a few examples to show that it works:

``````ghci> badsort 0 [3,1,2]
[1,2,3]

ghci> badsort 1 [3,1,2]  -- generates 6 permutations
[1,2,3]

ghci> badsort 2 [3,1,2]  -- generates 720 permutations of 6 permutations
[1,2,3]
``````

`badsort 3 [3,1,2]`, if we tried it (not recommended!!), would generate all possible permutations of the list of 720 permutations of the list of 6 permutations of `[3,1,2]`. The number of such permutations is, of course, $720!$, which has $1747$ decimal digits; there is literally not enough space in the universe to store all those permutations.

In general, `badsort k` is a correct sorting algorithm1 which takes time $\Omega((n!^{(k)})^2)$, where $n!^{(k)}$ denotes the $k$-fold iterated factorial of $n$, that is, $n!^{(0)} = n$ and $n!^{(k+1)} = (n!^{(k)})!$. (This doesn’t even take into account the time for accessing memory; at this scale we certainly can’t assume memory access takes constant time. Fetching memory from a data center in another galaxy takes a while, you know? =)

## It gets worse

``````> worstsort :: Ord a => (Integer -> Integer) -> [a] -> [a]
> worstsort f xs = badsort (f n) xs
>   where
>     n = fromIntegral \$ length xs
``````

Worstsort is parameterized by a function on natural numbers, and calls `badsort` with a recursion depth given by the function $f$ applied to the length of the list. Oh my.

Just for fun, let’s try, oh, say, the Ackermann function.

``````> ack :: Integer -> Integer -> Integer
> ack 0 n = n+1
> ack m 0 = ack (m-1) 1
> ack m n = ack (m-1) (ack m (n-1))
>
> diabolicalsort :: Ord a => [a] -> [a]
> diabolicalsort = worstsort (\n -> ack n n)
``````

Here are some fun properties of `diabolicalsort` (and any other instantiation of `worstsort`):

• It will provably terminate in a finite amount of time for any input! Although probably the words “terminate” and “finite” should be in scare quotes.

• In some sense I can’t quite define formally but still believe in my heart, it “doesn’t cheat” in the sense that it is always “making real progress” towards sorting the input list. If you are trying to design a slow sorting algorithm, it would be cheating, for example, to make an algorithm that spins in a useless loop for a thousand years and then does insertion sort.

• It works in practice on lists of length 1 or 2, but length 3 is completely hopeless. `ack 3 3 = 61`, so we are looking at the 61-fold iterated factorial of 3, which is a… rather large number.

• `ack 4 4` is $2^{2^{2^{65536}}} - 3$; there are not enough atoms in the universe to even write down this number in base 10. And then of course we take that number and iterate factorial that many times on $4$. Sheesh.

• Let us not even speak of lists of length 5.

The upshot of this, in the end, is that it is possible to make a “non-cheating” sorting algorithm whose running time grows faster than any computable function you care to choose (proof: take your chosen computable function and substitute it for `f`).

1. It might be a fun exercise to prove this formally using a proof assistant.

# Senior Haskell / Full Stack Developer at PRODA Ltd (Full-time)

Position summary

We are looking for a full-time senior software engineer to join our team in London or work remotely.

We want you to be someone who is looking to really help shape the future of the development team, have real impact on strategy, architecture and be the lead on our most important projects.

You will expand PRODA's capabilities in data ingestion, data exports, data standardisation and machine learning. We work in a functional programming stack in Haskell and Elm, together with machine learning components written in Python. Our development process includes CI/CD and automated tests. All developers work closely with our Chief Product Officer.

Skills & experience required

• Significant experience in a functional programming language, for instance: Haskell, Scala, F#, Ocaml, Clojure, etc.
• Extensive knowledge working with a major RDBMS: PostgreSQL, MySQL, MS SQL Server.
• Significant exposure to Javascript (node.js) and SPAs.
• Good understanding of Linux and Shell Scripting in Bash.
• Experience with basic machine learning algorithms, scikit-learn, Python, numPy, and pandas is highly beneficial.
• Experience with AWS Cloud services, particularly EC2, RDS, Codedeploy, Codebuild, Codepipelines, S3.
• Knowledge of build tools: Webpack, Makefiles.
• Experience with web servers: basics of load balancers, Nginx configuration, etc.
• Familiarity with Git and GitHub.
• Familiarity with Docker.

PRODA is a PropTech start-up founded in January 2017, backed by large Prop-tech players and real estate investors. We focus on solving core data processing pain points in commercial real estate, and believe that unstructured data is a key barrier to digital transformation in the industry.

PRODA is combining real estate expertise with data science to develop machine leaning-enabled software, with the aim of truly automating many analysis and reporting tasks for real estate professionals.

Our software automatically captures property data from spreadsheets, before standardising, error-checking & securely storing it. This enables users to standardise and access all their data, regardless of the country, asset class or company that it originated from.

Once standardised, many exciting machine-learning analyses, track changes and machine written reports are possible.

Please find press releases and the PRODA Whitepaper in the PRODA news section

Based in a smart We-work in London's South-bank PRODA has:

• Modern, open office with communal areas, quite spaces, table games, unlimited teas and coffees and beers in the evenings
• Focus on getting in specialists, encouraging meetups, conferences, training and learning for software engineers
• Regular team meals and activities
• Culture of smart individuals who collaborate to solve interesting problems using interesting technologies

Get information on how to apply for this position.

# Finding roots of polynomials in Haskell?

tl;dr: There are several Haskell packages one can use to find an individual root of a function on a certain interval. But I’ve had less luck finding a suitable package for finding all the roots of a given polynomial. This blog post consists of my notes and a plea for help.

The `diagrams-solve` package contains miscellaneous numeric solving routines used in diagrams, including tridiagonal and cyclic tridiagonal linear system solvers (used for generating cubic splines and emulating Metafont paths) as well as functions for finding roots of low-dimensional polynomials (quadratic, cubic, and quartic). Solving quadratics is used in a bunch of places; solving cubics is needed specifically for doing interior/exterior testing on closed loops built from cubic Bezier curves; thankfully we have never needed to solve a quartic or higher.

Unfortunately, the polynomial root solvers are pretty naive: I simply transcribed some formulas (which is why it only goes up to quartics). This works OK a lot of the time, but the formulas are very numerically unstable, and it’s not hard to come up with example inputs where the returned roots are off by quite a bit. In fact, people regularly run into this when running the test suite. I am not specifically aware of any diagrams bugs that have arisen in actual practice due to the cubic solutions being off, but it’s probably just a matter of time.

So I decided it was finally time to look into better root-finding methods. This blog post is both a plea for help and a place to write down some of the things I’ve learned so far.

## There’s root finding, and then there’s root finding

The first thing to get out of the way is that when you talk about “root finding” there are (at least!) two pretty different things you could mean:

1. Given a function $f$ and some particular interval, or an initial guess, find a value $x$ in the interval/close to the initial guess for which $f(x) = 0$.
2. Given a polynomial with {real, complex} coefficients, find all its {real, complex} roots.

If you want to do (1), there are several nice Haskell packages you could use. The `Numeric.RootFinding` module from the `math-functions` package is probably your best bet; it implements both Ridders’ method and the Newton-Raphson method, which both attempt to find a single root of a function on a given interval. They both work on any continuous `Double -> Double` function, not just polynomials (Newton-Raphson also needs to know the first derivative). But they don’t work well if you don’t already have an interval in mind to search; and if you want to find all the roots you have to call these multiple times (somehow coming up with an appropriate interval each time).

As for (2), I haven’t been able to find anything that would work well for `diagrams-solve`. Here are my notes:

• The `dsp` package has a module `Polynomial.Roots` containing an implementation of Laguerre’s method, which finds all the (complex) roots of a polynomial. However, the `dsp` package is a rather heavyweight dependency to pull in just for the root-finding code; it’s also licensed under the GPL and I’d like to avoid having to “infect” the entire diagrams ecosystem with the GPL.

• Laguerre’s method seems like it should be fairly easy to implement myself—but writing my own solver from scratch is how I got here in the first place; I’d really like to avoid it if possible. I am far from being an expert on numerical analysis, floating-point computation, etc.

• The `hmatrix-gsl` package has the `Numeric.GSL.Polynomials` module, which has an interface to a root finding algorithm from GSL (apparently using something called “balanced-QR reduction”), but I’d like to avoid pulling in a C library as as dependency, and also, again, GPL.

• From the Wikipedia page for Laguerre’s method I learned that the Jenkins-Traub algorithm is another widely used method for polynomial root-finding, and often preferred over Laguerre’s method. However, it seems rather complex, and the only Haskell implementation of Jenkins-Traub I have been able to fnid is this one which seems to be just a toy implementation; I don’t even know if it works correctly.

If you know of a good place where I can find polynomial solving code in Haskell, can you point me to it? Or if you know more about numerics than me, could you maybe whip up a quick implementation of Laguerre’s method and put it up on Hackage?

# Randomization Testing for an SQL Translator - FP Complete

Not all SQLs are created equal. Iâ€™ll say even more, none of the SQL dialects are even close to being equal. In particular, when talking about Microsoft SQL Server and PostgreSQL, their syntax might look similar. However, in their semantics, they are mountains apart. Today I will describe

# Freer Monads, More Better Programs

If you consider yourself a Haskell beginner, this post is not aimed at you! You're going to want to understand `DataKinds` and `RankNTypes` in order to get things done. Feel free to read anyway, but keep in mind that the technical solutions described here are tricky.

Every two weeks in the functional programming slack, we get into a big argument about "the right way to structure programs." The debate goes around and around in circles; names get called; feelings get hurt. We never get anywhere, and the whole process starts again 14 days later. Frankly, it's exhausting.

As best I can tell, the community roughly fragments itself along four lines---those who like mtl, those who say "just do everything in Reader IO", those who like the three layer cake, and those who think free(r) monads are worth their weight in gold.

Being in the latter camp is frustrating, because everyone has strongly negative opinions on freer monads, and as best I can tell, nobody seems to have ever actually used them.

As one of the few people in the Haskell-sphere who has actually used freer monads in both anger and production, I wanted to set the record straight. So today, let's talk about freer monads---what they are, what they buy you, how they work, and what wide-scale adoption could buy us. Yes, I'll also talk about what their weaknesses are.

## Criminally Misunderstood

Freer monads are amazingly powerful. Much more so than I think anyone realizes---including many of the people who maintain the libraries. There's a lot of free, super-common, crazy-generic functionality that exists, but isn't anywhere useful.

Freer monads are so much more than just a different way of expressing monad transformers. They're a completely orthogonal means of composition that doesn't really exist anywhere else in the Haskell ecosystem. By not using them, you are condemning yourself to writing a significant amount more of significantly more complicated code than you need to be.

Traditional monad stacks can be understood as "a small, established toolbox for side effects." You say "I want some state, so I will add a `StateT` transformer," with the understanding that this is isomorphic to `s -> (s, a)`. That means it only ever does local state.

I'd suggest we instead think about freer monads as "implementation mix-ins," or equivalently, as "compiler passes." Code written against freer monads is exceptionally high-level and doesn't mix concerns. It's all "business logic", where the implementation details are handled in layers, each one performed by a pass that simplifies the representation.

These passes are type-safe, independent and composable. You can mix-and-match which ones you want---which means testing is often just swapping in a test pass somewhere near the bottom of the stack. By mocking out different layers, it's easy to get 100% test coverage, without ever needing to write full-scale integration tests.

The beauty of this system is that the testability itself also composes. If your program runs properly under the test pass, and you can prove that both your test and real pass are also correct, this correctness composes to the entire program.

Having an exceptionally high-level representation of your program's intent is valuable in another way. Let's take a `State` capability as an example. This thing might correspond to local state, or a database, or maybe even just `GET`/`POST` HTTP requests. Who knows? But also, who cares?

Most of the people reading the code, most of the time, don't actually care what are the semantics behind the state. Those semantics are implementation details, interesting only to the people who care about the implementation. If you're tracing a program flow, and aren't interested in the database side of things, it's a lot nicer to not need to wade through a bunch of irrelevant database code.

In short, freer monads let you separate the high-level "what am I trying to do" from the low-level "how to actually do it."

The `Eff` monad is parameterized by a type-level list of effects (or capabilities as I will also call them.) This list is kept polymorphic, and constraints are enforced on it to ensure that certain effects are available.

For example, the type `StateT String (ReaderT Int IO) Bool` is analogous to `Eff '[State String, Reader Int, IO] Bool`.

However, the type `(MonadState String m, MonadReader Int m, MonadIO m) => m Bool` in the `mtl` style also has an analogue: `(Member (State String) r, Member (Reader Int) r, Member IO r) => Eff r Bool`.

Freer monads are extensible in their effects---that means you can write your own, and use them completely interchangeably with existing effects. It's trivial to write a new effect, as they're just GADTs:

``````data Teletype a where
GetLine ::           Teletype String
PutLine :: String -> Teletype ()``````

This is all it takes to define a new effect. We now have a `Teletype` effect, and we can use it after a small amount of (freely derivable) boilerplate:

``````getLine :: Member Teletype r => Eff r String
getLine = send GetLine

putLine :: Member Teletype r => String -> Eff r ()
putLine = send . PutLine``````

Notice that the `a` in `Teletype a` describes the type you get back from calling this operation.

Our new `Teletype` effect corresponds to a domain specific language that can talk about reading and writing lines on a teletype. It's important to keep in mind that there is no meaning associated with this effect. We get no semantics, other than an implicit, unverified guarantee that "it probably does what you expect."

However, this lack of pre-established semantics is a feature, rather than a bug. The semantics are given after the fact by interpretations of the effects. One interpretation of `Teletype` might be to perform it in `IO`, interacting directly with the console. Another might be in the form of `POST`ing `putLine`s to an HTTP server, and returning the results of a `GET` for `getLine`. Another could just do everything purely in memory.

Freer monads are extensible not only in their effects, but also in their interpretations. You can give new interpretations for existing effects, and for your own.

`freer-simple` offers several combinators for constructing new effects, which we'll explore in the example below.

## Solving Problems with Freer Monads

Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations.

-- Melvin Conway

Freer monads are a data representation of your program, which then gets interpreted at finer-and-finer grained resolution until it's just code.

In other words, they enforce a clean boundary between "saying what you mean" and "saying how to do it." They let you focus on writing business logic, and relegate the implementation details to library code.

They give you composable, plug-and-play functionality for transforming a high-level business logic spec into an actual implementation.

As an example of how this works on a real-life application, let's write a program that fetches a CSV file from FTP, decrypts it, streams its contents to an external pipeline, and tracks its stats in Redis.

``````ingest
:: ( Member (Input Record) r
, Member (Output [Record]) r
, Member (Output Stat) r
)
=> Eff r ()
ingest = nextInput >>= \case
Nothing     -> pure ()
Just record -> do
output [record]
output ProcessedRecordStat
ingest``````

Done!1 Pretty easy, hey?

"Now hold on a minute, Sandy! That program you wrote doesn't do what you said! It doesn't fetch files from FTP, it doesn't decrypt them, and it doesn't do anything with Redis."

That's right. It doesn't. What this does is exactly what the business people say they want---it moves data from one place to somewhere else, and lets you know about it. The rest are implementation details, and aren't relevant to anyone except the particular engineers responsible for this piece of the system. Engineers on other teams in your organization probably don't even care about the implementation details.

Written like this it's easy for people to get a sense of what you're trying to accomplish, without needing to know the nitty-gritty details connection management, credential management, performance enhancements, error handling, or database details. It concisely describes the goal, and leaves the irrelevant bits out of sight and out of mind.

Of course; not everyone wants this high-level picture. The people responsible for this service really and truly do care about how the thing actually works. At least, at some level of abstraction. The people whose job it is to ingest data probably care about the service's performance and error handling, but likely don't have strong opinions on the semantics of fetching data, the encryption schemes used, the database layout or the choice of the external streaming pipeline. They probably don't even care that they're ingesting CSV files---they'd just as happily consume real-time JSON requests.

The goal is to make it easy for people to analyze the pieces they understand and are responsible for, and hide the noise of the underlying details to someone else.

So, how to do we get from our high-level description to a real program? We transform it into a slightly less-high-level program. For example, in order to get our `Input` of `Record`s, we do actually need to parse a CSV file. You'll notice that such a problem really doesn't care where the file comes from; it just wants something to read.

So we write an interpretation of `Input Record` in terms of CSV files. This suggests we have some sort of `FileProvider` capability, whose job it is to actually get use the file in question.

``````csvInput
:: ( Member FileProvider r
, FromCSVRow i
)
=> FilePath
-> Eff (Input i ': r) a
-> Eff r a
csvInput file m = do
contents <- getFile file
let csvData = toList \$ parseCSV contents
handleRelayS csvData (const pure) bind m  -- discussed immediately after
where
--  bind :: [i] -> Input i x -> ([i] -> x -> Eff r a) -> Eff r a
bind (row : rows) NextInput k = k rows \$ Just row
bind rows@[]      NextInput k = k rows Nothing``````

`csvInput` takes a file name, reads that file in terms of some abstract `FileProvider` capability, and then returns one row of the result every time `nextInput` is called in the higher-level application.

This function is implemented in terms of the `handleRelayS` combinator. You can think of `handleRelayS` as being parameterized on what `return` and `(>>=)` mean for the effect being interpreted. In addition, it allows you to thread a piece of state between binds and the final `return`.

Our implementation of `bind` is to return a new row of the CSV file for every subsequent call to `nextInput` in the original program. We accomplish this by returning the head of the list of rows, and then passing the tail as the next piece of state.

In effect, we've described what it means to have an `Input` capability in terms of what it means to have a `FileProvider` capability. Notice that this isn't the only interpretation of `Input`---it could just as well be implemented by reading from a streaming source, or by always giving back the same result, or by cycling through a static list.

The point is that the people writing the service don't care where this data is coming from. All they care is that they can read it and pipe it to the right place. In fact, they might want to test that their service works by calling it on a constant stream of data---so instead they can interpret it purely:

``````pureInput
:: [i]
-> Eff (Input i ': r) a
-> Eff r a
pureInput is = handleRelayS is (const pure) bind
where
--  bind :: [i] -> Input i x -> ([i] -> x -> Eff r a) -> Eff r a
bind (row : rows) NextInput k = k rows \$ Just row
bind rows@[]      NextInput k = k rows Nothing``````

(for bonus points, you can implement `csvInput` in terms of `pureInput`)

Ok, great! The next step towards a working program is to give an interpretation of a `FileProvider`. We'll write two---one in terms of a lower level `FTP` capability, and one in terms of regular everyday `IO`:

``````ftpFileProvider
:: Member FTP r
=> Eff (FileProvider ': r) a
-> Eff r a
ftpFileProvider = interpret \$ \(GetFile filename) ->
ftpGet filename``````
``````localFileProvider
:: Member IO r
=> Eff (FileProvider ': r) a
-> Eff r a
localFileProvider = interpret \$ \(GetFile filename) ->

Often you just want to reinterpret an effect in terms of some other effect you already have (here, `FTP` and `IO`, respectively). In this case, it's sufficient to just use the `interpret` combinator, which takes implements your interpretation via a natural transformation (something of the form `forall x. effect x -> Eff r x`.)

For testing, you might also want a mock filesystem---`pureFileProvider :: Map FilePath ByteString -> _`.

Our program can now provide provide an `Input` capability via a `FileProvider` capability, via `IO` directly or via an `FTP` capability. You get the picture.

Something we haven't yet handled is file decryption. It's worth noting that this concern is largely orthogonal to `FileProvider`s; we'd like to be able to mix-in the capability to deal with encrypted files regardless of what the actual mechanism for files looks like.

For that, we're exposed to yet another combinator for writing interpretations; `interpose`. This combinator allows us to interpret a capability in terms of itself. Which means, we can intercept calls to a capability without necessarily handling them. Providing decrypted files is a good use case for this---we can intercept requests for files, and silently decrypt them before giving them back.

``````decryptFileProvider
:: Member Encryption r
=> Eff (FileProvider ': r) a
-> Eff (FileProvider ': r) a
decryptFileProvider =
interpose \$ \(GetFile filename) -> do
cyphertext <- getFile filename
decrypt cyphertext``````

We've gained the ability to inject logic around other interpretations!

Assuming we have an FTP implementation, the `Input` side of the coin is done. Now to deal with the `Output`s of our ingestion program. Remember, we want to put our records into some external streaming service. We can naively provide an interpreter that `POST`s these records against our service.

``````postOutput
:: ( Member HTTP r
, ToJSON i
)
=> (i -> HttpRequest 'POST)
-> Eff (Output i ': r) a
-> Eff r a
postOutput mkReq = interpret \$ \Output i ->
postHttp \$ mkReq i``````

Assuming we have another interpretation `HTTP ~> IO`, we're now good to go!

This works, but accounting comes back a few days later and complains---our streaming bill is suddenly really big. Apparently we pay per API call. Uh oh. The good news is that the API can handle up to 500 records per `POST`. So, we can just write another interpret that batches writes before posting them.

``````batch
:: Int
-> Eff (Output [i] ': r) a
-> Eff (Output [i] ': r) a
batch size = interposeS (0, []) finalize bind
where
-- finalize :: (Int, [i]) -> a -> Eff (Writer [i] ': r) a
finalize (_, acc) a = do
output acc
pure a

-- bind
--     :: (Int, [i])
--     -> Output [i] x
--     -> ((Int, [i]) -> x -> Eff (Writer [i] ': r) a)
--     -> Eff (Writer [i] ': r) a
bind (nacc, acc) (Output o) k = do
let no     = length o
total  = acc <> o
ntotal = nacc + no
if (ntotal >= size)
then do
let (emit, acc') = splitAt size total
output emit
k (ntotal - size, acc') ()
else k (ntotal, total) ()``````

Cool. Now sticking a `batch 500` pass before `postOutput` will batch all of our transactions sent to the API. Again, our business-logic doesn't change, because it need to care about this implementation detail.

We could continue on, but at this point you've seen most of the machinery freer monads give us. At the end of the day, `main` will end up looking like this:

``````main :: IO ()
main = runM
. runRedis
. runFTP
. runHTTP
. runEncryption
. redisOuput @Stat   mkRedisKey
. postOutput @Record mkApiCall
. batch      @Record 500
. ftpFileProvider
. decryptFileProvider
. csvInput "file.csv"
\$ ingest``````

It composes nicely, and the compiler will yell at you if you forget to handle any of required capabilities.

Behavior can be mixed in at will; some other common things you might want include retry-with-backoff, service discovery, chaos-injection, etc.

Over time and scale, you'll realize that most of your application code is the same crap over and over again---read configuration, connect to a database, deal with retry, shuffle data from one buffer to another. It's often hard to see this when it's written with a traditional monad stack, because traditional monad stacks don't give you the tools to abstract it away.

As you get into the habit of building new effects and interpretations for those effects, you'll see that new applications are often ready to ship after 25 lines of business logic and another 25 lines of choosing the right interpretations for it.

There are several arguments against freer monad, some of which are good, but most of which are terrible.

Free monads suffer from O(n2) complexity when used naively, which is unfortunately what you get by default. Freer monads are optimized via a queue which provides constant-time construction of the default case.

Yes, freer monads are today somewhere around 30x slower than the equivalent `mtl` code. That's roughly on par with Python, but be honest, you've deployed Python services in the past and they were fast enough. And besides, the network speed already dominates your performance---you're IO-bound anyway.

If you are writing real-time services maybe this will be an issue, but you're probably not. And if you are, optimizing Haskell is likely a skill you already have.

A subtle point to notice is that it's the monadic bits of the code that are 30x slower. Not "your program is 30x slower if you import `Control.Monad.Freer`"---but simply that you will spend more time in binds than you would in another monad. But your program isn't only monadic in `Eff`; it also needs to compute expressions and wait for `IO` and all of that stuff.

If it makes you feel better, I recently got a 15% performance increase by just more aggressively inlining some of the combinators. This suggests there's a lot of low-hanging optimization wins for anyone willing to go through the work to pluck it.

In short: worry about writing good code first, and deal with performance if it becomes an issue.

### Purescript Abandoned Eff

Purescript had a thing called `Eff`, but it was not the same as this. From the `purescript-eff` readme:

As of PureScript 0.12 the default type for handling effects is Effect from purescript-effect. This differs from Eff by removing the row of effect types. This decision was made as getting the effect rows to line up was sometimes quite tricky, without providing a great deal of benefit.

There is also purescript-run now, which uses a similar effect row mechanic but provides true algebraic effect handling. [emphasis mine]

The `Eff` described in this document is equivalent to `purescript-run`.

## Reasonably Good Arguments Against Freer Monads

### `ContT` is Not an Algebraic Effect

I never really understood this one as stated---I've never actually used `ContT` in a real monad stack. Have you?

But the sentiment behind this argument is better stated in human as "`Eff` is unable to model resource bracketing." Which is to say, it's hard to make sure an `Eff` program calls all of its finalizers.

The good news is that there's a solution if your allocation and cleanup code only requires `IO`---you can just interpret your entire `Eff` monad directly into `ResourceT`:

``````bracket
:: Member (ResourceT IO) r
=> IO a
-> (a -> IO ())
-> (a -> Eff r b)
-> Eff r b
bracket alloc dealloc m = do
(key, a) <- send \$ allocate alloc dealloc
result   <- m a
send \$ release key
pure result``````

Specialize `bracket` with your own first two parameters to taste.

More annoyingly, the lack of `ContT`-support means that it's hard to write effects that imply asynchronicity. That's not to say it's impossible, merely that it doesn't compose in the same nice way that synchronous effects do.

This is bad, but not disastrously so. You can spin up a thread pool elsewhere, and add a capability that sends effects to it:

``````data AsyncEff capabilities a where
AsyncEff
:: Members capabilities r
=> Eff r a
-> AsyncEff capabilities ()

:: Members capabilities r
=> (forall x. Eff r x -> IO x)
-- ^ An interpretation stack from `Eff r` into `IO`
-> IO (InChan (AsyncEff capabilities))
(in, out) <- newChan 10
void . async . forever \$ do
void . async \$ runEff m
pure in

asyncEff
:: Member IO r
=> InChan (AsyncEff capabilities)
-> Eff (AsyncEff capabilities ': r) a
-> Eff r a
asyncEff chan = interpret \$ send . writeChan chan``````

Changing the interface to fill an `MVar` upon completion of the task and make it available to the original `Eff` program is an exercise left to the reader.

### The Error Messages Are Bad / It's Too Complicated

This has historically been true. While `freer-simple` makes the situation significantly better, there is definitely room for improvement on this front.

First things first, `Eff` eschews the functional dependencies that `mtl` has. This means you can have multiple `Writer` effects in the same stack in `Eff` (but not in `mtl`) at the cost of type-inference.

This is both a feature, and, I won't lie to you, amazingly annoying at times. It's a feature because lots of things are just `Writer` effects. It's annoying as heck because polymorphism makes it eat shit.

For example, consider the following innocuous looking program:

``````foo :: Member (Writer Int) r => Eff r ()
foo = tell 15``````

Seems fine, right? Wrong. Because `15` is actually `fromInteger 15 :: Num a => a`, this program will complain about not having a `Writer a` capability. You as a human know what should happen here, but the compiler is stupid.

Thankfully the solution is simple, but it requires knowing what's wrong and how to fix it.

``````foo' :: Member (Writer Int) r => Eff r ()
foo' = tell @Int 15``````

If you're going to be doing a lot of work with polymorphic effects, a low-energy solution is to just provide a locally-bound monomorphic type:

``````foo'' :: Member (Writer Int) r => Eff r ()
foo'' = do
let tellInt = tell @Int
tellInt 1
tellInt 2
tellInt 3``````

All of this is much less user-friendly than it should be. However, in my experience, people quickly learn how to debug problems like this. It was enough to have an "Eff mentor" on our team, whose job it was was to promptly reply to "I don't know why this doesn't work."

### Jesus Help Me There Are A Lot of Unmaintained Free(r) Monad Packages

Tell me about it. Even as someone who is keenly interested in this stuff I have a hard time keeping up with the situation.

Here's the skinny---I'd strongly recommend `freer-simple`. Failing that, if you really, really, really need the performance, take a look at `fused-effects`.

Ignore the other ones.2

## Conclusion

Freer monads are fucking sick and you'd be foolish to not at least consider them for your next project.

Furthermore, if you're going to continue insisting on saying that \$technology is better, I strongly encourage you to write up a similar argument stating your case. My mind is open on this; if you make a strong argument, I'm more than happy to denounce this article and jump on the \$technology train too.

It's worth keeping in mind that despite our small differences; we're all on the same team here. We all love functional programming and want to do our best to make it more popular. As best I can tell, the best strategy towards that aim is to come up with a consensus on how to do things, and to stop the needless infighting.

One love.

1. `Input` and `Output` are called `Reader` and `Writer` respectively in `freer-simple`. I decided to not use this terminology in order to prevent people from thinking that these are the same monads they're used to.

2. If you're the maintainer of another effects package and want me to include it here, shoot me an email and make an argument!

# The types got you

Mark Karpov

Haskell is in a way a unique language. It is, to be clear, a language people and businesses use for building “serious stuff”, yet it is also a language which remains a platform for experimentation with functional programming. This means that the teams which develop in Haskell have access to expressivity and flexibility few other languages allow.

As the GHC compiler evolves and grows its feature set, one couldn't help but wonder about the connection between power provided by a programming language and the process of making design decisions when developing software.

We could start by stating that power and freedom are dangerous. Indeed, in human societies these have been continuously restricted or contained on different levels with tools like law, tradition, culture and religion. Which is not at all bad, because for most people it's hard to work, live, and coexist without preset constraints. Freedom is generally confusing. The statement holds in software systems as well: a task is always more feasible if some questions have been answered definitively before you start working. With each question answered for you, a possibility to screw up is removed.

Haskellers know and seem to actively acknowledge that limiting expressivity and freedom is good for them. They like DSLs and can argue endlessly how to model monadic effects in a way that a piece of code can only do exactly what is necessary and not more. So why not take a step back and look at the bigger picture? And by “bigger picture” I mean the language as a whole, as implemented by GHC.

No, I'm not suggesting removing features from the compiler. But with great power comes great responsibility. Or in our case, rather a need for great caution. Using advanced features of the type system takes a fair bit of judgment to get right. Judgement comes from experience. But who really has this experience? Who are we but a relatively small group of engineers trying to build production software using tools that relatively few people have used for this before, and even fewer people have used successfully?

Some features may be so new that no one yet truly knows what to do with them. And by “what to do with them” I don't mean that people don't know how to make fancy stuff compile. I assure you, this is accessible to many. The problem is in predicting what use of the features will give you in long term, is it worth it? Will it actually slow down the development? Will it make it harder for new people on the team to start working on your code? Are you actually catching more bugs at the compile time, or do you just think you do? Are you thinking about the practical results or about intricate niceties of your code?

Developing in Haskell is hard because it's easy to take a wrong turn that may be fatal. I'm writing this on the basis of my experience as a consultant. The anecdata of a single person might not be convincing to you, but if there were to be just one common theme across many of our projects at Tweag in the past year, it would be this: we have witnessed the high cost of entirely incidental complexity many times over. Our data suggests this issue is real and very common.

Unfortunately, the problem is not limited to code in the language itself. There seems to be a trend for disregarding common engineering practices, reinventing solutions, preferring solutions written in Haskell, or solutions which just seem to be “nicer” without doing unbiased comparison and analysis.

In the essay The bipolar Lisp programmer Mark Tarver made the point that the Lisp programming language attracts a certain kind of personality. I think it is generally true that there is a connection between personalities and the tools that people choose. Perhaps this is most prominently revealed in niche communities that are made up from enthusiastic people, not just people who have to pay the bill.

Haskell does attract a certain kind of personality. Who in their sane mind would be entirely oblivious to the many barbaric and funny words part of our lingua franca in documentation, tutorials and books today? Probably someone who is either very convinced in the benefits of typed FP or just someone who has, let's admit it, a bit adventurous and curious mind. It is easy to forget the timeless principles of robust engineering in this intellectual pursuit, perhaps even without noticing it.

Good music producers listen. They judge everything by how it sounds, because when the result is rendered and saved, people won't know how many effects, plugins, advanced techniques from some magazine were used. Nope, they really could not care less about that. They only care about the sound. For a beginner though, it is tempting to throw in a lot of processing and clever tricks, and in doing so they often overdo and produce a track that sucks.

The amateur producer from our example tried to make a record better, but he has not yet understood that his focus is misplaced. I think the situation is often the same with development in Haskell.

At the most basic level, development is about efficiently turning money into software that does something while remaining modifiable and maintainable. Haskell98 with some “benign” extensions is often the optimal solution in the trade-off between safety and simplicity. With conventional Haskell, developers effortlessly can stay in the pit of success.

Even though Haskell is a wonderful language by default, there are ways to get out of the pit. Ironically, these ways are exceptionally attractive to Haskellers for exactly the same reason Haskell itself is attractive to them: exploration and the endless pursuit of correctness. There are probably far more people who will go with anything that promises an improvement with respect to type-safety than those who are capable of sober estimation and balancing.

In the end, the main thought here is the following: build using simple proven techniques even if you're using a technology like Haskell. For anything extra (like dependent types, formal verification, etc.) you might want to think twice and thrice. And still remain uncertain.

# Full Stack Sr. Software Engineer (Haskell) at Interos (Full-time)

Interos Solutions, Inc. is a fast-growing venture backed company with an AI-driven SaaS application that delivers our commercial and government customers unique insight into their ever-changing business ecosystems. The core of our platform is the compilation, analysis and visualization of dynamically changing big data collected across open source, proprietary and public data sources. By continuously analyzing our "real-time" data stream we provide our customers with unique insights into their business relations, supply chains and other third-party activities. To accelerate the growth of our application, we are putting together a team of Haskell engineers, data analysts, data scientists, UI/UX professionals and product managers.  We are committed to building a world class product organization that leverages tools like Haskell and latest machine learning techniques to achieve outsized results as individuals and as a team.

Summary

You will design, code, implement and maintain both front-end and back-end technologies. As one of our first Full-stack Engineers, you will have room to shape your work and have an outsized impact on our product and our culture. You will work across our tech stack to develop our enterprise-grade application, help institute effective processes for ensuring our products have quality code and minimal defects, collaborate with our VP of engineering, founder and our customers on product features, and more. We're looking for engineers who relish solving the hard-technical and analytics challenges and diving into the subtle details that make products amazing.  Being part of and leading an innovative engineering team, you'll need solid hands-on experience in Haskell as well as web technologies (HTML/CSS/JS, REST, JSON/XML) and databases (Relational and NoSQL).  Experience with machine learning, big data and cloud infrastructure are a plus.

Responsibilities include:

• Working with the management team and customers to understand desired features, capabilities and testing scenarios
• Developing and maintaining the Interos customer-facing platform, machine learning capabilities and infrastructure
• Working within and across Agile teams to design, develop, test, implement, and support technical solutions across a full-stack of development tools and technologies
• Leading the craftsmanship, availability, resilience, and scalability of Interos' application
• Bringing a passion to stay current with technology, experimenting with and learning new technologies as well as a desire to train developers without Haskell experience

Requirements/Qualifications: (This is who we are looking to hire)

• At least 5 years of hands on experience in designing and architecting applications in a functional style and a strong working foundation in machine learning, big data and AWS infrastructure
• Bachelor's degree in Computer Science, Engineering, Mathematics, Physics or another scientific field is preferred
• Experience working within a distributed micro-services architecture Enough experience with Haskell to be comfortable with things like: Monad Transformers, Rank-N Types, Functional Dependencies, Type Families, GADTs, Lenses, Type-Level Programming
• Experience with either of the following would be a plus: Reflex / Reflex.Dom / Obelisk, Data Visualization
• Experience and interest in training developers without Haskell/FP experience
• Completed projects that you can discuss in depth or code samples (ideally in Haskell) that you can share are a big plus

Location: Arlington, VA (15 minutes from downtown DC)

We realize this is a lot to ask and we are hiring for a lot of similar roles soon so if this opportunity sounds interesting to you, feel free to contact us even if you don't fit the bill exactly.

Get information on how to apply for this position.

# Full Stack Software Engineer (Haskell experience preferred) at Interos Solutions (Full-time)

Interos Solutions, Inc. is a fast-growing venture backed company with an AI-driven SaaS application that delivers our commercial and government customers unique insight into their ever-changing business ecosystems. The core of our platform is the compilation, analysis and visualization of dynamically changing big data collected across open source, proprietary and public data sources. By continuously analyzing our "real-time" data stream we provide our customers with unique insights into their business relations, supply chains and other third-party activities. To accelerate the growth of our application, we are putting together a team of Haskell engineers, data analysts, data scientists, UI/UX professionals and product managers.  We are committed to building a world class product organization that leverages tools like Haskell and latest machine learning techniques to achieve outsized results as individuals and as a team.

Summary

You will code, implement and maintain both front-end and back-end technologies. Being part of an innovative engineering team, you'll need solid experience in web technologies (HTML/CSS/JS, REST, JSON/XML) and databases (Relational and NoSQL databases).  Experience with machine learning, big data, and (AWS) cloud infrastructure are a plus.  If you have experience in Haskell even better, but if not, you must have a desire to learn Haskell in "real time".

Responsibilities include:

• Working with the management team and customers to understand desired features, capabilities and testing scenarios
• Developing and maintaining the Interos customer-facing platform, machine learning capabilities and infrastructure
• Working within and across Agile teams to design, develop, test, implement, and support technical solutions across a full-stack of development tools and technologies
• Driving the craftsmanship, availability, resilience, and scalability of the Interos application
• Bringing a passion to stay current with technology, experiment with and learn new technologies such as Haskell

Requirements/Qualifications: (This is who we are looking to hire)

• At least 2 years of hands on experience and a strong foundation in coding, big data and AWS infrastructure

• Bachelor's degree in Computer Science, Engineering, Mathematics, Physics or another scientific field is preferred

• Passionate about software engineering and comfortable with front-end and back-end development

• Interested (if not experienced) in Haskell and Functional Programming

• Experienced with a variety of SQL and NoSQL databases

• Experience with any of the following would be a plus: Haskell or other FP languages (PureScript, Elm, OCaml, Clojure, etc) Python, Reflex / Reflex.Dom / Obelisk, Data Visualization

• Experience working within a distributed micro-services architecture

• Working understanding of machine learning concepts and AWS.

• Ability to work in a team as well as independently and deliver on aggressive schedules and goals

• Completed projects that you can discuss in depth or code samples that you can share are a big plus

Location: Arlington, VA (15 min from downtown DC)

Get information on how to apply for this position.

# WebAssembly in Rust: A Primer

We are excited that we have been able to keep up with the pace of our monthly webinar series. More importantly, we have been able to offer a variety of topics and this month's webinar certainly illustrates our commitment to webinar diversity.  We are pleased that Aniket Deshpande (Software and DevOps Engineer at FP Complete) was able to discuss and demonstrate "WebAssembly in Rust: A Primer." for this webinar. We had 216 people registered for the event which aired on Wednesday, Jan 30th, 2019 at 10:00 am PST.  As we always do, we recorded the event to enable anyone who could not tune in live to enjoy the webinar at their one pace. Check it out below.

### Yesod Web Framework

As I'm sure many readers are aware, Google+ is shutting down. This affects Yesod's authentication system. In particular, any users of the `Yesod.Auth.GoogleEmail2` module will need to migrate.

Fortunately for all of us, Patrick Brisbin has written both the code for using Google Sign-in, but has also put together a great migration blog post. If you're affected, please check that out for a relatively painless migration.

Earlier today, I migrated Haskellers.com in two commits: the first did all of the work, and the second made things more future-proof.

As you may be able to tell from the `GoogleEmail2` module, this isn't the first time we've had to migrate between Google authentication APIs. Hopefully this one will stick.

# Call for new Stack issue triager

For those unfamiliar with it, Stack has a team called the “Stack Issue Triagers.” If you’ve been interacting on the Stack issue tracker recently, it’s likely that those are the people you’ve seen answering questions. We’ve been slowly ramping up the team, and it seems to have been quite successful so far at helping Stack users, as well as escalating issues and getting them resolved as necessary.

We’re looking to expand the team again. This team is a volunteer effort, with each person taking a week at a time of responding to issues. We have a lot more information on this in the team process Stack documentation. Working with this team is not only a great service to the community, but also gives you a chance to learn more about maintaining a large open source Haskell codebase, and gives you many opportunities to interact with the Stack maintainers.

For now, we’re adding 1 new person to the team, though we’ll likely add another one in three months time, and continue the ramp up process from there. If you’re interested in taking part, please send me a private email to `michael at snoyman dot com`.

I’ll collect applicants and discuss with the team members. If you have questions about the process, please feel free to ask on the Gitter Stack chat!

# Haskell command-line utility using GHC generics

Today, Justin Woo wrote a post about writing a simple Haskell command-line utility with minimal dependencies. The utility is a small wrapper around the `nix-prefetch-git` command.

In the post he called out people who recommend overly complex solutions on Twitter:

However, I hope to show that we can simplify his original solution by taking advantage of just one feature: Haskell’s support for generating code from data-type definitions. My aim is to convince you that this Haskell feature improves code clarity without increasing the difficulty. If anything, I consider this version less difficult both to read and write.

Without much ado, here is my solution to the same problem (official Twitter edition):

``{-# LANGUAGE DeriveAnyClass        #-}{-# LANGUAGE DeriveGeneric         #-}{-# LANGUAGE DuplicateRecordFields #-}{-# LANGUAGE OverloadedStrings     #-}{-# LANGUAGE RecordWildCards       #-}import Data.Aeson (FromJSON, ToJSON)import Data.Text (Text)import Options.Generic (Generic, ParseRecord)import qualified Data.Aesonimport qualified Data.ByteString.Lazyimport qualified Data.Text.Encodingimport qualified Data.Text.IOimport qualified Options.Genericimport qualified Turtledata Options = Options    { branch   :: Bool    , fetchgit :: Bool    , hashOnly :: Bool    , owner    :: Text    , repo     :: Text    , rev      :: Maybe Text    } deriving (Generic, ParseRecord)data NixPrefetchGitOutput = NixPrefetchGitOutput    { url             :: Text    , rev             :: Text    , date            :: Text    , sha256          :: Text    , fetchSubmodules :: Bool    } deriving (Generic, FromJSON)data GitTemplate = GitTemplate    { url    :: Text    , sha256 :: Text    } deriving (Generic, ToJSON)data GitHubTemplate = GitHubTemplate    { owner  :: Text    , repo   :: Text    , rev    :: Text    , sha256 :: Text    } deriving (Generic, ToJSON)main :: IO ()main = do    Options {..} <- Options.Generic.getRecord "Wrapper around nix-prefetch-git"    let revisionFlag = case (rev, branch) of            (Just r , True ) -> "--rev origin/" <> r            (Just r , False) -> "--rev " <> r            (Nothing, _    ) -> ""    let url = "https://github.com/" <> owner <> "/" <> repo <> ".git/"    let command =            "GIT_TERMINAL_PROMPT=0 nix-prefetch-git " <> url <> " --quiet " <> revisionFlag    text <- Turtle.strict (Turtle.inshell command Turtle.empty)    let bytes = Data.Text.Encoding.encodeUtf8 text    NixPrefetchGitOutput {..} <- case Data.Aeson.eitherDecodeStrict bytes of        Left  string -> fail string        Right result -> return result    if hashOnly    then Data.Text.IO.putStrLn sha256    else if fetchgit    then Data.ByteString.Lazy.putStr (Data.Aeson.encode (GitTemplate {..}))    else Data.ByteString.Lazy.putStr (Data.Aeson.encode (GitHubTemplate {..}))``

This solution takes advantage of two libraries:

• `optparse-generic`

This is a library I authored which auto-generates a command-line interface (i.e. argument parser) from a Haskell datatype definition.

• `aeson`

This is a library that generates JSON encoders/decoders from Haskell datatype definitions.

Both libraries take advantage of GHC’s support for generating code statically from datatype definitions. This support is known as “GHC generics”. While a bit tricky for a library author to support, it’s very easy for a library user to consume.

All a user has to do is enable two extensions:

``{-# LANGUAGE DeriveAnyClass #-}{-# LANGUAGE DeriveGeneric  #-}``

… and then they can auto-generate an instance for any typeclass that implements GHC generics support by adding a line like this to the end of their data type:

``    deriving (Generic, SomeTypeClass)``

You can see that in the above example, replacing `SomeTypeClass` with `FromJSON`, `ToJSON`, and `ParseRecord`.

And that’s it. There’s really not much more to it than that. The result is significantly shorter than the original example (which still omitted quite a bit of code) and (in my opinion) easier to follow because actual program logic isn’t diluted by superficial encoding/decoding concerns.

I will note that the original solution only requires using libraries that are provided as part of a default GHC installation. However, given that the example is a wrapper around `nix-prefetch-git` then that implies that the user already has Nix installed, so they can obtain the necessary libraries by running this command:

``\$ nix-shell --packages \  'haskellPackages.ghcWithPackages (p: [ p.turtle p.optparse-generic p.aeson ])'``

… which is one of the reasons I like to use Nix.

## February 11, 2019

This week we're taking a quick break from new content. We've added our new series on Haskell's data system to our permanent collection. You can find it under the beginners panel or check it out here! This series had five parts. Let's take a quick review:

1. In part 1 we reviewed the basic way to construct data types in Haskell. We compared this to the syntax of other langauges like Java and Python.
2. Part 2 showed the simple way we can extend our Haskell types to make them sum types! We saw that this is a more difficult process in other languages. In fact, we resorted to making different inherited types in object oriented languages.
3. Next, we demonstrated the concept of parametric types in part 3. We saw how little we needed to add to Haskell's definitions to make this work. Again, we looked at comparable examples in other languages as well.
4. In part 4, we delved into Haskell's typeclasses. We compared them against inherited types from OO languages and noted some pros and cons.
5. Finally, in part 5 we concluded the series by exploring the idea of type families. Our code was more complicated than we'd need in other languages. And yet, our code contains a lot more behavioral guarantees in Haskell than it does elsewhere. And we achieved this while still having a good deal of flexibility. Type families have a definite learning curve, but they're a useful concept to know.

As always keeping coming back every Monday morning for some new Haskell content! For more updates and our monthly newsletter, make sure you Subscribe! This will also give you access to our Subscriber Resources!

# Rust helped me escape Cthulhu!

I just downloaded the game Lovecraft Quest on my Android tablet, and was enjoying the puzzles, until I got to the final one, and that got me, ahem, *frustrated*. I found a walk-through where the reviewers admitted to more or less clicking randomly on the screen for one hour until it worked:

I actually managed to solve the puzzle the first time quickly with sheer luck, but on the second run I was stumped. So I followed the only reasonable course of action: let the computer solve the puzzle for me!

The puzzle is as follows: there is a square (4 rows, 4 columns) of gold bars that can be either in horizontal or vertical position. You can make one bar pivot, but when you do that all the bars in the same row and column pivot too! You need to get all bars to the vertical position for the door to open to reveal... Cthulhu itself!

So let us use Rust for this puzzle, and see if a brute force approach can work.
Let's first define a few types. A grid is a 4x4 array of booleans and a position is a tuple. We'll have positions ranging from (0,0) to (3,3). A grid is solved is all elements are true (so true means vertical position for a bar).
Then let's implement the swapping operation. We'll take one Grid reference and return a new Grid with the proper bars swapped.
Note that we swap again the position we chose since we swapped it twice already, one for the columns and once for the rows. Not very elegant maybe but simple

The main solving function is as follows: we're given a grid and we return a list of positions to swap to solve the puzzle. If the grid is already solved we return an empty vec. Otherwise we keep two structures: a map of grids to a vector of positions (showing how we got to that particular grid from the start grid) and a list of grids to process. We use a Deque here so we can put new grids at the end but process grids from the start, to do a breadth-first traversal of all possible grids.
The try_all function will do all possible swaps on the given grid and store all the results with the path used to get to them. We store for all grids the shortest path we used to get to it. New grids get added at the end of the todo list. We return the path if we end up with a winning grid.
To simplify calling the program, we'll just pass a string made of 16 characters, with 1s and 0s indicating the position of the bars we see on the screen. Parsing this string into a Grid instance is easy enough with the help of chunks():
And we can pass then that string as a argument to our binary program, and print the result:
In my case, the program ran and gave me the solution: only 12 moves!

This Rust code is probably fairly naive in places, since I only have a couple of weeks of evenings learning it. I'm not too happy with the number of clone() calls I had to add to make the borrow checker happy, on top of the ones I thought I needed anyway. Maybe immutable structures would in fact be easier to use here! Of course this code could be made parallel but since it gave me my answer in a few seconds I didn't need to improve it further.

Happy Rust Hacking!

# Using stack to get around upstream bugs

Recently I bumped into a bug in amazonka.1 I can’t really sit around waiting for Amazon to fix it, and then for amazonka to use the fixed documentation to generate the code and make another release.

Luckily `slack` contains features that make it fairly simple to work around this bug until it’s properly fixed. Here’s how.

1. Put the upstream code in a git repository of your own. In my case I simply forked the amazonka repository on github (my fork is here).
2. Fix the bug and commit the change. My change to amazonka-codepipeline was simply to remove the missing fields – it was easier than trying to make them optional (i.e. wrapping them in `Maybe`s).
3. Tell `slack` to use the code from your modified git repository. In my case I added the following to my `slack.yaml`:

`````` extra-deps:
- github: magthe/amazonka
subdirs:
- amazonka-codepipeline``````

That’s it!

1. The guilty party is Amazon, not amazonka, though I was a little surprised that there doesn’t seem to be any established way to modify the Amazon API documentation before it’s used to autogenerate the Haskell code.

# Pedal to the Metal: starting with Rust

After hearing for years about Rust, I finally decided to give it a go. This is my journey so far

I'm currently using Visual Studio Code with the Rustextension. It's sufficient for the little exercises I've been doing so far, but I notices that code completion is not very accurate and does not seem to give me everything that's possible.  I've installed the IntelliJ Rust extension in my community IDEA, I'll see how well it works when I start doing bigger projects.

Rust is obviously a different kettle of fish than the last language I picked up, which was Go. It's a lot more complex and rich and hopefully powerful, and consequently takes a lot more time to master. It feels a bit like a cross between C for the low level aspects and Haskell for the structural and functional parts. I was a bit surprised straight away to read about variable shadowing since it seems to take away the security somehow (you cannot mutate a variable but you can redefine it), but from what I read it doesn't seem to be an issue practically since the compiler holds your hand at all time.

Most languages I've worked with have a Garbage Collector, so I'd say dealing with the borrow checker is going to be fun in larger code bases, but so far in the small exercises I've done it hasn't been an issue.

Hopefully I can get productive with Rust quickly, I'd like to contribute to projects like Rusty-Machine to do neural networks and ML with Rust, which would seem a perfect match with its performance and safety. We'll see!

Happy Rust Hacking!

# How to Write Technical Posts (so people will read them)

One of today's best bloggers in the functional programming space is Chris Penner. Despite this, Chris is criminally underrated---nobody I talk to seems to know of his work. Go check him out as soon as you finish reading this post!

This is reasonably pervasive phenomenon in our subculture; there's lots of fantastic work being produced, but for one reason or another, it falls between the cracks. I'd propose the biggest obstacle is that most FP writing isn't optimized to be read.

So I'd like to take some time today and talk about common failure modes in technical writing. If you don't have a checklist you run through before publishing a post, you'll probably get a lot of benefit from internalizing this advice.

The value? You'll make it easier for people to understand what you're trying to tell them. Which is why you're writing in the first place, right? The good news is that none of this is magic---just some simple guidelines for structuring your content.

## The Guiding Principle

Here's the biggest thing to keep in mind: your reader doesn't really care what you have to say. You get maybe four sentences to convince them that your essay is worth their time. If you haven't made your case by then, you've probably lost them to the next tab they binge-opened.

Even after you've convinced them that it's a valuable read, you need to continue assuring them that you're not wasting their time. If you take too long to get to the point, or if you make the information too hard to find on a quick skim, you've lost them again.

As a result, alongside your technical content, you have two primary goals to focus on:

• Provide strong, concise motivations for everything you want to talk about.
• Make it easy to skim.

If you can do these two things, you're already most of the way to better engagement.

## Writing Strong Motivations

People care about solutions to problems they have, or expect to have one day. They care about other things too, but it's not really relevant to us today.

With this in mind, you want to tailor your motivations in terms of solutions to problems.

Here are some good examples of problems that you've probably run into (and links to their solutions for you to read later):

Bad examples of motivating documents are things that assume you care simply because they exist. This is pretty common of libraries' tutorials. The same document that convinces you to use a technology should also show you how.

As a more general rule, focusing on why is going to be more valuable than on how.

Why should someone care, and why your solution is a good one.

"How" without a "why" suggests a contrived solution to a made-up problem. In other words, it's easily read as "who cares?"

People who spend lots of time reading have good heuristics for skipping lots of text. If you understand these heuristics, you can make it easy for people to find what they're looking for. And remember: they're looking for reasons to continue reading.

There are two behaviors here. The first is what I call "skimming for concepts"---which is that people will attempt to determine what text is about at a high-level. They're looking for what you're trying to tell them, as opposed to what you're actually saying.

When skimming, people are likely to read only the headings and the first two sentences of a paragraph. If they're convinced they know what you have to say, they'll skip to the next paragraph. If several paragraphs in a row don't seem relevant, they'll skip to the next heading.

If the next heading also fails to grab their attention, they'll probably give up.

The solution is to structure your argument as a tree. Headings must provide enough information to let someone know whether or not they care about the following prose.

This also means that the first sentence of each paragraph should be sufficient to understand the rest of the paragraph. The remaining sentences are only allowed to reinforce or give details of the first sentence. One paragraph equals one idea.

Roughly speaking, the hierarchy of your document should look like this:

• Document
• First sentence
• Rest of paragraph

Maybe you feel like it's hard to know how much knowledge to assume your readership knows. Understanding how people read makes this a non-issue. Present as much information as is necessary to understand your point, but make it easy to skip if people already know it. Those who don't yet know will learn, those who do can skip past, and both camps will appreciate it.

After you've convinced your reader that they care what you have to say, they're more willing to read what you actually have to say. When the reader is ready to dive in, that's when you can make the finer points of your argument.

It's unlikely that anyone is going to read every word of your essay. It's even less likely that they'll read them all in order.

## Stumbling Blocks

Now that you've got people ready to listen to you, it's important to keep them on the same page as you. You really need to stay on top of anything that might break their focus.

Use short sentences. If it's too hard to parse, people won't.

Make sure your spelling and grammar have no egregious problems. You don't need to have perfect command of the language, but you need to be able to signal that you're trustworthy-enough to listen to. Don't underestimate how much credibility you'll lose from blatantly bad spelling and grammar.

This stuff is relatively common-sense.

What might be less so is that you also need to stay on top of conceptual stumbling blocks. If your argument makes a jump that feels poorly motivated or references something potentially unfamiliar, expect that your reader will experience vertigo.

Most of the ideas we talk about in functional programming are not easy, and it doesn't do anyone any favors to pretend this isn't so. Expect that your readers will be pretty similar to you; if you had a problem understanding a piece of your topic, call that out. Point out the obstacle. Point out what they might be thinking, and then very explicitly show them what they should be thinking instead.

For example, if your code sample uses a functional dependency, it might be worth a short sentence saying "that funny bar thing is a functional dependency." Give a tiny summary of its purpose, and provide a link for them to learn more if they need to.

Another illustration: if you originally got confused that an "algebra in recursion schemes" is not the same thing as the algebra you learned in high-school, your readers probably will too. The first time you say "algebra" in the new context, say "this is a misleading term. This has nothing to do with solving equations like you did in high-school."

More important than what you're saying is what you're not saying. If you're giving examples of something that fits a pattern, make sure you give examples of things that do not fit the pattern. A concept that is applicable everywhere is useful nowhere.

Give lots of examples. Nobody I know learns via mathematical statements, nor do they learn via long, wordy, abstract prose. People learn by seeing lots of examples, and being told explicitly how these examples relate to one another. The mathematical statements are only useful after you have the intuition, so save them for an appropriate time.

## Conclusion

The takeaway advice from this essay is that if you want lots of readers, you must make it easy for them to read.

To that end, pay lots of attention to motivation. Why should people care what you have to say? What value does it give them?

Focus your energy on the beginnings---both of the essay as a whole, and of each paragraph. People who are unconvinced by your essay's value will skim their way through it, and they will do that by reading only the beginnings of things.

Make your information easy to find. Structure your argument in a tree, and make sure it supports random-access. Nobody is going to read the whole thing from start to finish. Instead they're going to jump around, ignoring the pieces they already know, and looking for the bits they don't.

Use lots of short paragraphs. A paragraph should correspond to a single idea.

Anticipate which parts of your argument will be difficult for your readers, and be proactive in trying to assuage those difficulties. Give lots of examples of what you're talking about, and more importantly, what you're not talking about.

Point out where you went into the weeds while learning this stuff. Steer readers away from common mistakes and misconceptions.

And finally, end on a high note. Leave people with a good feeling, and it will incentivize them to get to the bottom of your next piece. Inspirational calls to action are a good way to go out.

Tell them what you're going to tell them. Then tell them. Finally, tell them what you told them.

-Unknown

Funny and poignant quotes are too.

# CodeWorld Update — February, 2019

### CodeWorld Update — February, 2019

I’ve been quiet on social media lately, but it’s been an active time for the CodeWorld project. Here’s a brief summary of some of the things happening in the last few months.

### Platform features

There have been some significant updates lately in the CodeWorld platform. Here are some of the big ones.

#### The Requirements Framework

Together with Ashish Myles and with some good advice from Fernando Alegre, I started building the CodeWorld Requirements Framework. This is a language embedded into CodeWorld for expressing requirements for assigned student code. For example, you might ask students to write at least 4 functions in their project, to use all five built-in picture transformations (translation, rotation, dilation, non-uniform scaling, and coloring), and to make sure there are no unused parameters in their functions. The requirements language lets you express all of these.

The framework implements a few one-off rules for some common requirements, but based on some inspired advice from Fernando, it also includes a pattern matcher over Haskell syntax, which lets you define patterns that should (or should not) occur in the code. By abusing template haskell syntax, you can write a pattern like f \$var = sqrt \$any, and this will match any function called f whose body is defined by applying sqrt to any expression. It’s surprising how many rules can be expressed in this form. Ashish and I alpha-tested the framework when teaching functions to middle school students, and it was very successful.

That said, though, the Requirements Framework is still in experimental stages. There are a lot more rule types I’d like to implement, including common lint rules, UX improvements, and even some ambitious projects like integrating runtime validation testing and compile-time inspection testing into the framework. The status of the project is at https://github.com/google/codeworld/projects/9.

#### Tooling work

Artem Kanev has recently been contributing to the tool support in CodeWorld, and has made a lot of progress. You might notice that CodeWorld’s education mode now offers documentation for known variables when you hover over them, or when you select them in the autocomplete UI. There’s also a lot more work done to include symbols defined in the current module in these features. These simple changes make a huge difference to students as they use the platform.

I know Artem has big plans coming up, to continue working on the student programming tools and turn the CodeWorld environment into a world-class user interface (but without sacrificing its essential simplicity!). Coming soon are continuous highlighting of compile errors, better error message presentation, and deferred type errors. After that, who knows? You can definitely try out the new features and send bugs and feature requests with your feedback.

As we learn more from teaching with CodeWorld, we continue to tweak the exported API so that future classes go better. A few API changes are now in progress.

First, CodeWorld is moving toward using pattern synonyms for a lot more constants. This includes the built-in colors, and the constant pi, for example. The reason for this is that there’s a nice educational progression from listing inputs and outputs of a function in a table, then writing them in a sequence of equations for specific inputs, and finally generalizing to a pattern with a formal parameter. But that doesn’t work when the input isn’t a literal. Bidirectional pattern synonyms act like literals in most ways, so students can write equations like f(Blue) = 5, and they act as they are supposed to.

This is a big change, especially for colors, and the old names for colors will be around for at least a year and half, and possibly longer. But I encourage you to teach with the new names in the future.

At the same time, I’m continuing to update documentation and curriculum. I’ve taken another pass at the beginning of the guide over the last several months, and expanded much of the content. There are also new folders in the CodeWorld Drive folder with worksheets and resources I’ve created for my new classes. This is still very much a work in progress, but things are getting better.

I’m also resuming work on the CodeWorld comic book. This was a cute project I began many years ago, to convey some of the humor and programming culture, as well as introduce the language of error messages, in a fun and casual way. The story is finished, but we’re filling out the book with creative and personality-filled reference pages that students can use as they build their own projects. Stay tuned!

### Classes and community

Of course, none of this would be worth it without the kids! I’m also continuing to teach. I’ve finished a class for 8th grade students in the previous semester, and I’m taking on 9th graders for the first time this semester. It’s a new experience for me, so keep your fingers crossed for me.

Others have also continued their teaching. In particular, I’m still working with Louisiana State University to investigate results measured in our classes, and I know others are teaching classes everywhere from elementary school through university classes around the world.

### Getting involved

To stay up to date on CodeWorld, feel free to subscribe to the codeworld-discuss mailing list. We’ve got a lot of cool things coming up. I’m hopeful that we can work with another student or two in the upcoming Google Summer of Code season, and share some of our research results over the next several months. Stay tuned!

# Mapping a Universeof Open Source Software

Matthias Meschede

## Introduction

The repositories of distributions such as Debian and Nixpkgs are among the largest collections of open source (and some unfree) software. They are complex systems that connect and organize many interdependent packages. Interestingly, these systems grow out of the individual design choices of thousands of contributors but few of them have larger design goals in mind.

Is it possible to capture the large scale features of such a repository in an image? Are there common design choices of the contributors? Did they lead to any emergent structure?

Plenty of studies (e.g. here, here, or here) have examined these questions from different viewpoints and for various software ecosystems. In this blog post I'll try to shed some light on them from the perspective of Nixpkgs, mostly with visualizations of its complete dependency graph but also with a very short examination of its statistical properties (its degree distribution). But wait, you might think: "visualizing a graph with 50000 nodes and 300000 edges? - All you are going to see is black!". Well, it turns out that this is not the case because large scale structures appear when we look at the graph in the right way.

## An overview of the Nixpkgs dependency graph

Before explaining in detail what I did, let's start with a teaser. Check out the structure in this fascinating image where each little dot represents a single software package in the Nixpkgs repository:

What looks remotely like the section of a mouse brain actually represents around 46000 software packages. Each dot in this image corresponds to a different one, more precisely to a Nix derivation in the Nixpkgs repository. Each package is invisibly linked to those that it depends on (ingoing dependencies) or those that depend on it (outgoing dependencies). These packages are the nodes, and their invisible links the incoming and outgoing edges of the directed dependency graph of Nixpkgs.

The total number of dependencies that a node has is called the node's degree. It is subdivided in the in degree which counts the number of ingoing dependencies and the out degree which counts the number of outgoing dependencies. I have colored and sized the nodes in the above image by their in degree: a brighter larger node means that more derivations depend on it (high in degree) than on a darker smaller node (low in degree). Size and color are scaled non-linearly and should only be interpreted qualitatively. We will see later from the distribution of these degrees why this makes sense.

The graph was laid out and visualized with the program gephi using a force simulation algorithm to project each node to a display positions in the image. This force model consists of attractive and repulsive forces that drag the nodes into their final positions that you can see in the image.

The attractive force pulls nodes towards their dependencies. Therefore nodes with similar dependencies end up close to each other. Looking back at the image, we see the effect of this force in the form of node clusters with visible gaps between them. The repulsive force pushes nodes away from dependencies that have high degree. High degree nodes therefore appear with a lot of space around them. Low degree nodes, on the other hand, are tightly packed.

As a final addition, a gravitational force keeps the whole graph together and squeezes it into roughly circular form. Check out the appendix at the end of this post for a detailed explanation, including the code and the data that are required to reproduce this image yourself.

The following image of the dependency graph was laid out in the same way, but shows the package name of each node explicitely. As before, the package names are scaled non-linearly with the in degree of the node. The names of the largest nodes - we'll call them hub nodes - are well readable but to read the labels of the smaller nodes you should click on the image and use your browsers zoom functionality.

Most central to the Nixpkgs repository are the primary hub nodes bash and the Nix stdenv derivation. Unsurprisingly, most derivations depend on them - at least at build time - because bash is one of the most commonly used Unix shells and stdenv is a nix package that groups very fundamental build dependencies such as GNU make. Both are used as default environments in Nix.

Slightly smaller secondary hub nodes can also be identified: curl and mirrors-list seem to be central to the upper part of the graph and indicate that it has something to do with the internet. On the right side, Python2.7 and Python3.6 are visible as distinct clusters and grouped around a central hook derivation that is used by Nix to setup a Python environment. On the left side, different versions of Perl can be seen. In the bottom, the Linux library manager tool pkg-config forms a little aura that matches its importance for the open source ecosystem.

The hub nodes indicate that some clusters in this graph correspond to the environments of different programming languages. Naturally packages in these environments will have a common set of dependencies and therefore appear close to each other in the graph.

## Ego networks: exploring hub node environments

Let's dive further into this. A hub node forms a little subgraph together with its ingoing and outgoing dependencies that is sometimes called its ego network (this term seems to originate in studies of social networks). In the following plot, the ego networks of a few selected hub nodes are colored in green. The corresponding hub node's name is indicated above each subplot. It illustrates nicely how clusters form around a compiler, an interpreter or just an important tool:

The ego networks confirm our previous intuition that some clusters correspond to different programming languages. They also provide basic orientation on this graph. Here is a short description of the different ego networks:

Most of the upper part of the graph is dominated by curl. Closer inspection shows that most of these derivations are archives such as .tar.gz files or language specific archives such as .gem files. These archives are downloaded from the internet before other packages are derived from them. Another large, but less tightly packed region is the pkg-config zone. It dominates the bottom of the graph and it is characterized by lots of smaller hub nodes indicating that pkg-config organizes many packages that are important for other packages. The ego network of cmake is mixed into a similar region of the graph.

Python2.7 and Python3.6 form neat clusters with three central hub nodes that correpond to the python interpreter, the package managment tool pip, and the build and installation tool setuptools. Python2.7 is used more often by packages outside of its cluster than Python3.6. In contrast to Python, Perl has only a single hub node for each specific version but it forms similarly dense clusters. From the different Perl versions, Perl-5.28.0 is reused most ouside of its own cluster by a lot of other packages in the pkg-config zone. This indicates that the specific version of the Nixpkgs repository that I have evaluated uses this particular Perl version as the default for other packages.

Lisp (sbcl), Haskell (ghc), Ruby, OCaml, Go, and Rust derivations also accumulate in tight clusters around their respective compiler.

Ego networks can only give a glimpse into the structure of the graph of Nixpkgs. Not all environments are organized around a central hub. Some packages are self-contained and independent. For example, the Go packages form a tight independent group because they often include copies of third party packages that they depend on. There are also many smaller clusters that correspond to particular tools such as terraform, maven or even the vim plugins. If you have time and patience, you can try to find them in the labelled graph image above.

## The degree distribution of the Nixpkgs dependency graph

Another perspective on the global properties of the Nixpkgs repository is the degree distribution of its nodes. It shows how many nodes exist with a certain in or out degree (as a reminder: the in degree is the number of packages that depend on the node; the out degree is the number of packages that the node depends on). Certainly there are a lot more nodes with low in degree than with high in degree. More interesting is how the number of nodes with a certain degree changes between the extremes.

A whole bunch of studies (for example here, here) have found that the in degree distribution of software repository dependencies follows a power-law distribution, similar to the degree distributions of networks in Geology, Biology, Economics, or the Internet (here is a short introduction on power-laws). Such a power-law distribution can be generated by several mechanisms. The hottest candidate mechanism in the case of software repositories seems to be preferential attachment, a.k.a. "the rich get richer". In other words, a new package is more likely to depend on existing packages with a lot than on packages with few dependencies. Other mechanisms such as self-organized criticality might also play a role.

The (complementary) cumulative in and out degree distributions show how many packages exist above a certain degree. Looking at the cumulative distribution is typically better than looking at its derivative, the degree distribution. As an integral, the cumulative distribution has a better behavior when the density of packages gets low at high degrees. We plot it here for the in and out degrees in log-log scale such that a power-law is just a straight line:

The cumulative in degree distribution is a very clean power-law decay with an exponent of -0.9. The in degree distribution itself has an exponent of -1.9, because it is the derivative of the cumulative distribution. The value that we observe falls in the range of values that have been observed for the debian repository (around -2.0). Interestingly, a power-law distribution is also a scale-free distribution: For example, at the largest scale of Nixpkgs, a node with an approximate (order of magnitude) in degree of 2000 such as a python2.7 (3033), perl-5.28 (2406), or git (2394), is about 80 times (=10^1.9) less common than a node with in degree 200, which could be an important library such as python2.7-mock (212), python3.6-six (321), or freetype (303). On the other hand, such a node with degree 200 is also about 80 times less common than a node with approximate in degree 20, such as python3.6-flake8 (23), ruby2.5.3-nokogiri (20), or erlang (19). In some sense, the large scale ecosystems around python, perl, or git are therefore scaled up versions of the small scale ecosystems around a minor programming language or library. This shows that there is more to a software repository such as Nixpkgs than just a few primary and secondary hub nodes. It is a complex growing ecosystem.

The out degree of a package counts how many other packages the node depends on. Its cumulative distribution looks a bit different than the one for the in degrees. It has a cutoff at around degree 3, which means that only few nodes depend on three or less packages. At degree 4, the distribution drops markedly, which means that there are lots of packages that depend on 4 other packages. The decay afterwards is also not quite as regular as for the in degrees. If we interpret this decay as a power-law, the decay rate corresponds to an exponent of around -2.9. This means that a node that depends on 200 packages is about 800 times (=10^2.9) less common than a node that depends on 20 packages. Although this gives some intuition about the out degree distribution, it is unclear if a power-law is really a good model for it. Preferential attachment is less plausible as generating mechanism behind the out degree distribution because the number of dependencies that a package depends on is often directly chosen by the developers. Other distributions, such as multiple overlapping log-normals, could potentially fit the distribution equally well and also account for different mechanisms.

## Conclusions

The graph visualizations give an overview of the structure of the enormous Nixpkgs repository that can be difficult to grasp in its entirety otherwise. The degree distribution of the nodes is another macro perspective of the repository that is difficult to obtain with other means. It would also be interesting to examine whether the degree distributions of different language environments differ. Taking a step back, I hope that this blog post can raise some awareness for the complexity and beauty of a system that was generated by thousands of individual contributions but that ultimately becomes more than the sum of its parts.

## Appendix: do it yourself

If you want to explore the Nixpkgs dependency graph yourself, you can download the gephi project file that will allow you to start immediately. The Nixpkgs graph in dot format can be used with graphviz and many other libraries. Finally, I have also uploaded the Nixpkgs graph as a sparse adjacency matrix.

You can also regenerate the graph from the beginning:

A Nix derivation is a recipe with all meta information that is required to build a package. This includes links to other required derivations, hashed source code, shell commands, compiler flags, environment variables, and so on. Derivations are not written by hand but generated automatically by evaluating code that is written in the Nix language. The Nixpkgs repository is a collection of such Nix code and generates by default around 46000 derivation files with build recipes. These derivation files are stored as human readable text files with the extension `.drv` in the Nix store folder (usually under `/nix/store`).

To build the dependency graph, I traversed and evaluated the whole Nixpkgs repository with a few lines of Nix code. This process is quite fast (<5min) because no package has to be actually built. The derivation files only gather the meta information that is needed to build them. I then extracted the package dependencies from each derivation file in the store with a small Haskell code and saved them as a graph `.dot` file (here is the Nix and Haskell code). Due to the graph's size, I merged derivations with the same package and version name, ignoring other build characeristics that are typically identified with a unique hash by Nix.

The graph was laid out and visualized with the program gephi using a force simulation algorithm to project the graph nodes to display positions in the image. The simulation starts by putting the nodes at random initial positions. Afterwards it updates the node positions stepwise according to a force model until they find a steady place. The force model, called Force Atlas 2, consists of an attractive and of a repulsive force between connected nodes. The attractive force is proportional to the distance and the repulsive force inversely proportional to the distance of the connected nodes. In addition, the repulsive force is proportional to the product of the degrees of the connected nodes, that is to the number of edges that are incident on them. The nodes are therefore pushed to an equilibrium distance where the forces are balanced. The equilibrium distance is smaller for nodes with low degree and larger for nodes with high degrees.

Getting a nice visualization is then a matter of fine-tuning the correponding parameters of the force simulation in gephi, together with the color scheme and the node sizes, until the structure of the graph becomes most evident.

# Validating Big Data Jobs An exploration with @ApacheSpark & @ApacheAirflow (+ friends) @ FOSDEM

Thanks for joining me on 2019-02-03 at FOSDEM 2019 Brussels, Belgium for Validating Big Data Jobs An exploration with @ApacheSpark & @ApacheAirflow (+ friends).I'll update this post with the slides soon.Comment bellow to join in the discussion :).Talk feedback is appreciated at http://bit.ly/holdenTalkFeedback

# Introducing @Kubeflow (w. Special Guests Tensorflow and @ApacheSpark) @ FOSDEM

Thanks for joining us (@holdenkarau,@rawkintrevo) on 2019-02-03 at FOSDEM 2019 Brussels, Belgium for Introducing @Kubeflow (w. Special Guests Tensorflow and @ApacheSpark).I'll update this post with the slides soon.Comment bellow to join in the discussion :).Talk feedback is appreciated at http://bit.ly/holdenTalkFeedback

# OCaml server-side developer at Ahrefs (Full-time)

## What we need

Ahrefs is looking for a backend developer with a deep understanding of networks, distributed systems, OS fundamentals and taste for simple and efficient architectural designs. Our backend is implemented mostly in OCaml and some C++, as such proficiency in OCaml is very much appreciated, otherwise a strong inclination to intensively learn OCaml in a short term will be required. Understanding of functional programming in general and/or experience with other FP languages (F#,Haskell,Scala,Scheme,etc) will help a lot. Knowledge of C++ and/or Rust is a plus.

Every day the candidate will have to deal with:

• 20+ petabytes of live data
• OCaml
• linux
• git

The ideal candidate is expected to:

• Independently deal with bugs, schedule tasks and investigate code
• Make argumented technical choice and take responsibility for it
• Understand the whole technology stack at all levels : from network and userspace code to OS internals and hardware
• Handle full development cycle of a single component - i.e. formalize task, write code and tests, setup and support production (devops), resolve user requests
• Approach problems with practical mindset and suppress perfectionism when time is a priority
• Write flexible maintainable code and adapt to post-launch requirementsâ€™ tweaks

These requirements stem naturally from our approach to development with fast feedback cycle, highly-focused personal areas of responsibility and strong tendency to vertical component splitting.

## Who we are

Ahrefs runs an internet-scale bot that crawls the whole Web 24/7, storing huge volumes of information to be indexed and structured in a timely fashion. Backend system is powered by a custom petabyte-scale distributed key-value storage to accommodate all that data coming in at high speed. The storage system is implemented in OCaml with thin performance-critical low-level part in C++. On top of that Ahrefs is building various analytical services for end-users.

We are a small team and strongly believe in better technology leading to better solutions for real-world problems. We worship functional languages and static typing, extensively employ code generation and meta-programming, value code clarity and predictability, and are constantly seeking to automate repetitive tasks and eliminate boilerplate, guided by DRY and following KISS. If there is any new technology that will make our life easier - no doubt, we'll give it a try. We rely heavily on opensource code (as the only viable way to build maintainable system) and contribute back, see e.g. https://github.com/ahrefs . It goes without saying that our team is all passionate and experienced OCaml programmers, ready to lend a hand and explain that intricate ocamlbuild rule or track a CPU bug.

Our motto is "first do it, then do it right, then do it better".

## What you get

We provide:

• Competitive salary
• Informal and thriving atmosphere
• First-class workplace equipment (hardware, tools)
• Medical insurance

## Locations

Remote

Singapore : modern office in CBD

Get information on how to apply for this position.

# Announcing ghc-lib

On behalf of Digital Asset I'm delighted to announce ghc-lib, a repackaging of the GHC API to allow it to be used on different GHC versions. The GHC API allows you use the GHC compiler as a library, so you can parse, analyze and compile Haskell code.

The GHC API comes pre-installed with GHC, and is tied to that GHC version - if you are using GHC 8.6.3, you get version 8.6.3 of the API, and can't change it. The `ghc-lib` package solves that problem, letting you mix and match versions of the GHC compiler and GHC API. Why might you want that?

• Imagine you are writing a tool to work with several versions of the GHC compiler. The GHC API changes significantly between each version, so doing this would require writing a lot of C preprocessor code to support it. An alternative is to use one version of `ghc-lib` which works across multiple versions of GHC.
• Imagine you are modifying the GHC API or want features from GHC HEAD. With `ghc-lib` you can depend on the revised GHC API, without upgrading the compiler used to build everything, speeding up iteration.

While `ghc-lib` provides the full GHC API, it doesn't contain a runtime system, nor does it create a package database. That means you can't run code produced by `ghc-lib` (no runtime), and compiling off-the-shelf code is very hard (no package database containing the `base` library). What you can do:

• Parse Haskell code, making `ghc-lib` a potential replacement for haskell-src-exts. See the demo mini-hlint in the `ghc-lib` repo;
• Compile Haskell code as far as GHC's Core language, which includes renaming and type checking. See the demo mini-compile in the ghc-lib repo, and the carefully tailored file it compiles.

The package `ghc-lib` is released on Hackage, and can be used like any normal package, e.g. `cabal install ghc-lib`. Since `ghc-lib` conflicts perfectly with the GHC API and `template-haskell`, you may wish to `ghc-pkg hide ghc-lib` and use the language extension `PackageImports` to do `import "ghc-lib" GHC`. There will be two release streams within the `ghc-lib` name:

• Version 8.8.1 will be the version of ghc-lib produced against the released GHC 8.8.1, once GHC 8.8.1 is released. There is no release against GHC 8.6.3 because we had to make changes to GHC to enable ghc-lib, which were only upstreamed in the last few months.
• Version 0.20190204 is the version of ghc-lib using GHC HEAD on the date 2019-02-04.

We've been developing and using `ghc-lib` internally at Digital Asset for the last six months. The use of `ghc-lib` has enabled us to track GHC HEAD for certain projects, and develop improvements to GHC itself, and then integrate them without requiring us to rebuild all Haskell source code on every step. Smoothing out that development loop has been a massive productivity boon to us.

While this is Digital Asset's first open source project in a while, we have been making lots of contributions behind the scenes - it's no coincidence several of my recent posts involve my colleague Shayne. In particular, our engineers have been behind several GHC proposals.

While I'm announcing the project, much of the work has been done by Shayne Fletcher and Jussi Maki, but neither of them have active blogs just yet!

# Why Haskell V: Type Families

Welcome to the conclusion of our series on Haskell data types! We've gone over a lot of things in this series that demonstrated Haskell's simplicity. We compared Haskell against other languages where we saw more cumbersome syntax. In this final part, we'll see something a bit more complicated though. We'll do a quick exploration of the idea of type families. We'll start by tracing the evolution of some related type ideas, and then look at a quick example.

Type families are a rather advanced concept. But if you're more of a beginner, we've got plenty of other resources to help you out! Take a look at our Getting Started Checklist or our Liftoff Series!

## Different Kinds of Type Holes

In this series so far, we've seen a couple different ways to "plug in a hole", as far as a type or class definition goes. In the third part of this series we explored parametric types. These have type variables as part of their definition. We can view each type variable as a hole we need to fill in with another type.

Then in the fourth part, we explored the concept of typeclasses. For any instance of a typeclass, we're plugging in the holes of the function definitions of that class. We fill in each hole with an implementation of the function for that particular type.

This week, we're going to combine these ideas to get type families! A type family is an enhanced class where one or more of the "holes" we fill in is actually a type! This allows us to associate different types with each other. The result is that we can write special kinds of polymorphic functions.

## A Basic Logger

First, here's a contrived example to use through this article. We want to have a logging typeclass. We'll call it `MyLogger`. We'll have two main functions in this class. We should be able to get all the messages in the log in chronological order. Then we should be able to log a new message while sending some sort of effect. A first pass at this class might look like this:

``````class MyLogger logger where
prevMessages :: logger -> [String]
logString :: String -> logger -> logger``````

We can make a slight change that would use the `State` monad instead of passing the logger as an argument:

``````class MyLogger logger where
prevMessages :: logger -> [String]
logString :: String -> State logger ()``````

But this class is deficient in an important way. We won't be able to have any effects associated with our logging. What if we want to save the log message in a database, send it over network connection, or log it to the console? We could allow this, while still keeping `prevMessages` pure like so:

``````class MyLogger logger where
prevMessages :: logger -> [String]
logString :: String -> StateT logger IO ()``````

Now our `logString` function can use arbitrary effects. But this has the obvious downside that it forces us to introduce the `IO` monad places where we don't need it. If our logger doesn't need `IO`, we don't want it. So what do we do?

## Type Family Basics

One answer is to make our class a type family. W do this with the `type` keyword in the class defintion. First, we need a few language pragmas to allow this:

``````{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE AllowAmbiguousTypes #-}``````

Now we'll make a `type` within our class that refers to the monadic effect type of the `logString` function. We have to describe the "kind" of the type with the definition. Since it's a monad, its kind is `* -> *`. This indicates that it requires another type parameter. Here's what our definition looks like:

``````class MyLogger logger where
type LoggerMonad logger :: * -> *
prevMessages :: logger -> [String]
logString :: String -> (LoggerMonad logger) ()``````

## Some Simple Instances

Now that we have our class, let's make an instance that does NOT involve `IO`. We'll use a simple wrapper type for our logger. Our "monad" will contain the logger in a `State`. Then all we do when logging a string is change the state!

``````newtype ListWrapper = ListWrapper [String]
instance MyLogger ListWrapper where
type (LoggerMonad ListWrapper) = State ListWrapper
prevMessages (ListWrapper msgs) = reverse msgs
logString s = do
(ListWrapper msgs) <- get
put \$ ListWrapper (s : msgs)``````

Now we can make a version of this that starts involving `IO`, but without any extra "logging" effects. Instead of using a list for our state, we'll use a mapping from timestamps to the messages. When we log a string, we'll use `IO` to get the current time and store the string in the map with that time.

``````newtype StampedMessages = StampedMessages (Data.Map.Map UTCTime String)
instance MyLogger StampedMessages where
type (LoggerMonad StampedMessages) = StateT StampedMessages IO
prevMessages (StampedMessages msgs) = Data.Map.elems msgs
logString s = do
(StampedMessages msgs) <- get
currentTime <- lift getCurrentTime
put \$ StampedMessages (Data.Map.insert currentTime s msgs)``````

## More IO

Now for a couple examples that use `IO` in a traditional logging way while also storing the messages. Our first example is a `ConsoleLogger`. It will save the message in its `State` but also log the message to the console.

``````newtype ConsoleLogger = ConsoleLogger [String]
instance MyLogger ConsoleLogger where
type (LoggerMonad ConsoleLogger) = StateT ConsoleLogger IO
prevMessages (ConsoleLogger msgs) = reverse msgs
logString s = do
(ConsoleLogger msgs) <- get
lift \$ putStrLn s
put \$ ConsoleLogger (s : msgs)``````

Another option is to write our messages to a file! We'll store the file name as part of our state, though we could use the `Handle` if we wanted.

``````newtype FileLogger = FileLogger (String, [String])
instance MyLogger FileLogger where
type (LoggerMonad FileLogger) = StateT FileLogger IO
prevMessages (FileLogger (_, msgs)) = reverse msgs
logString s = do
(FileLogger (filename, msgs)) <- get
handle <- lift \$ openFile filename AppendMode
lift \$ hPutStrLn handle s
lift \$ hClose handle
put \$ FileLogger (filename, s : msgs)``````

And we can imagine that we would have a similar situation if we wanted to send the logs over the network. We would use our `State` to store information about the destination server. Or else we could add something like Servant's `ClientM` monad to our stack in the `type` definition.

## Using Our Logger

By defining our class like this, we can now write a polymorphic function that will work with any of our loggers!

``````runComputations :: (Logger logger, Monad (LoggerMonad logger)) => InputType -> (LoggerMonad logger) ResultType
runComputations input = do
logString "Starting Computation!"
let x = firstFunction input
logString "Finished First Computation!"
let y = secondFunction x
logString "Finished Second Computation!"
return y``````

This is awesome because our code is now abstracted away from the needed effects. We could call this with or without the `IO` monad.

## Comparing to Other Languages

Now, to be fair, this is one area of Haskell's type system that makes it a bit more difficult to use than other languages. Arbitrary effects can happen anywhere in Java or Python. Because of this, we don't have to worry about matching up effects with types.

But let's not forget about the benefits! For all parts of our code, we know what effects we can use. This lets us determine at compile time where certain problems can arise.

And type families give us the best of both worlds! They allow us to write polymorphic code that can work either with or without `IO` effects!

## Conclusion

That's all for our series on Haskell's data system! We've now seen a wide range of elements, from the simple to the complex. We compared Haskell against other languages. Again, the simplicity with which one can declare data in Haskell and use it polymorphically was a key selling point for me!

Hopefully this series has inspired you to get started with Haskell if you haven't already! Download our Getting Started Checklist or read our Liftoff Series to get going!

# Predicting areas for PR Comments based on Code Vectors & Mailing List Data @ FOSDEM

Thanks for joining us (@holdenkarau,@krisnova) on 2019-02-03 at FOSDEM 2019 Brussels, Belgium for Predicting areas for PR Comments based on Code Vectors & Mailing List Data.I'll update this post with the slides soon.Comment bellow to join in the discussion :).Talk feedback is appreciated at http://bit.ly/holdenTalkFeedback

# Apache Spark on Kubernetes -- Avoiding the pain of YARN @ Pre-FOSDEM Belgium Kubernetes Meetup

Thanks for joining me on 2019-02-02 for Apache Spark on Kubernetes -- Avoiding the pain of YARN.The talk covered:

Apache Spark is one of the most popular big data tools, and starting last year has had integrated support for running on Kubernetes. This talk will introduce some of the use cases of Apache Spark quickly (machine learning, ETL, etc.) and then look at the current cluster managers Spark runs on and their limitations. Most of the focus will be around running non-Java code, and the challenges associated with dependencies along with general challenges like scale-up & down. Its not all sunshine and roses though, I will talk about some of the limitations of our current approach and the work being done to improve this.

.I'll update this post with the slides soon.Comment bellow to join in the discussion :).Talk feedback is appreciated at http://bit.ly/holdenTalkFeedback

# The ReaderT design pattern or tagless final?

The other week I read V. Kevroletin’s Introduction to Tagless Final and realised that a couple of my projects, both at work and at home, would benefit from a refactoring to that approach. All in all I was happy with the changes I made, even though I haven’t made use of all the way. In particular there I could further improve the tests in a few places by adding more typeclasses. For now it’s good enough and I’ve clearly gotten some value out of it.

I found mr. Kevroletin’s article to be a good introduction so I’ve been passing it on when people on the Functional programming slack bring up questions about how to organize their code as applications grow. In particular if they mention that they’re using monad transformers. I did exactly that just the other day _@solomon_ wrote

so i’ve created a rats nest of IO where almost all the functions in my program are in `ReaderT Env IO ()` and I’m not sure how to purify everything and move the IO to the edge of the program

I proposed tagless final and passed the URL on, and then I got a pointer to the article The ReaderT Design Patter which I hadn’t seen before.

The two approches are similar, at least to me, and I can’t really judge if one’s better than the other. Just to get a feel for it I thought I’d try to rewrite the example in the `ReaderT` article in a tagless final style.

## A slightly changed example of `ReaderT` design pattern

I decided to make a few changes to the example in the article:

• I removed the `modify` function, instead the code uses the typeclass function `modifyBalance` directly.
• I separated the instances needed for the tests spatially in the code just to make it easier to see what’s “production” code and what’s test code.
• I combined the `main` functions from the various examples to that both an example (`main0`) and the test (`main1`) are run.
• I switched from `Control.Concurrent.Async.Lifted.Safe` (from `monad-control`) to `UnliftIO.Async` (from `unliftio`)

After that the code looks like this

``````{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}

import           Control.Concurrent.STM
import           Say
import           Test.Hspec
import           UnliftIO.Async

data Env = Env
{ envLog :: !(String -> IO ())
, envBalance :: !(TVar Int)
}

class HasLog a where
getLog :: a -> (String -> IO ())

instance HasLog Env where
getLog = envLog

class HasBalance a where
getBalance :: a -> TVar Int

instance HasBalance Env where
getBalance = envBalance

modifyBalance :: (Int -> Int) -> m ()

modifyBalance f = do
liftIO \$ atomically \$ modifyTVar' (getBalance env) f

logSomething msg = do
liftIO \$ getLog env msg

main0 :: IO ()
main0 = do
ref <- newTVarIO 4
let env = Env { envLog = sayString , envBalance = ref }
(concurrently_
(modifyBalance (+ 1))
(logSomething "Increasing account balance"))
env
sayString \$ "Final balance: " ++ show balance

instance HasLog (String -> IO ()) where
getLog = id

instance HasBalance (TVar Int) where
getBalance = id

modifyBalance = State.modify

main1 :: IO ()
main1 = hspec \$ do
describe "modify" \$ do
it "works, IO" \$ do
var <- newTVarIO (1 :: Int)
res `shouldBe` 3
it "works, pure" \$ do
let res = State.execState (modifyBalance (+ 2)) (1 :: Int)
res `shouldBe` 3
describe "logSomething" \$
it "works" \$ do
var <- newTVarIO ""
let logFunc msg = atomically \$ modifyTVar var (++ msg)
msg1 = "Hello "
msg2 = "World\n"
runReaderT (logSomething msg1 >> logSomething msg2) logFunc
res `shouldBe` (msg1 ++ msg2)

main :: IO ()
main = main0 >> main1``````

I think the distinguising features are

• The application environmant, `Env` will contain configuraiton values (not in this example), state, `envBalance`, and functions we might want to vary, `envLog`
• There is no explicit type representing the execution context
• Typeclasses are used to abstract over application environment, `HasLog` and `HasBalance`
• Typeclasses are used to abstract over operations, `MonadBalance`
• Typeclasses are implemented for both the application environment, `HasLog` and `HasBalance`, and the execution context, `MonadBalance`

In the end this makes for code with very loose couplings; there’s not really any single concrete type that implements all the constraints to work in the “real” main function (`main0`). I could of course introduce a type synonym for it

``type App = ReaderT Env IO``

but it brings no value – it wouldn’t be used explicitly anywhere.

## A tagless final version

In order to compare the `ReaderT` design pattern to tagless final (as I understand it) I made an attempt to translate the code above. The code below is the result.1

``````{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeFamilies #-}

import           Control.Concurrent.STM
import           Say
import           Test.Hspec
import           UnliftIO.Async

newtype Env = Env {envBalance :: TVar Int}

newtype AppM a = AppM {unAppM :: ReaderT Env IO a}

runAppM :: Env -> AppM a -> IO a
runAppM env app = runReaderT (unAppM app) env

class Monad m => ModifyM m where
mModify :: (Int -> Int) -> m ()

class Monad m => LogSomethingM m where
mLogSomething :: String -> m()

instance ModifyM AppM where
mModify f = do
liftIO \$ atomically \$ modifyTVar' ref f

instance LogSomethingM AppM where
mLogSomething = liftIO . sayString

main0 :: IO ()
main0 = do
ref <- newTVarIO 4
let env = Env ref
runAppM env
(concurrently_
(mModify (+ 1))
(mLogSomething "Increasing account balance"))
sayString \$ "Final balance: " ++ show balance

newtype ModifyAppM a = ModifyAppM {unModifyAppM :: State.StateT Int Id.Identity a}

runModifyAppM :: Int -> ModifyAppM a -> (a, Int)
runModifyAppM s app = Id.runIdentity \$ State.runStateT (unModifyAppM app) s

instance ModifyM ModifyAppM where
mModify = State.modify'

newtype LogAppM a = LogAppM {unLogAppM :: ReaderT (TVar String) IO a}

runLogAppM :: TVar String -> LogAppM a -> IO a
runLogAppM env app = runReaderT (unLogAppM app) env

instance LogSomethingM LogAppM where
mLogSomething msg = do
liftIO \$ atomically \$ modifyTVar var (++ msg)

main1 :: IO ()
main1 = hspec \$ do
describe "mModify" \$ do
it "works, IO" \$ do
var <- newTVarIO 1
runAppM (Env var) (mModify (+ 2))
res `shouldBe` 3
it "works, pure" \$ do
let (_, res) = runModifyAppM 1 (mModify (+ 2))
res `shouldBe` 3
describe "mLogSomething" \$
it "works" \$ do
var <- newTVarIO ""
runLogAppM var (mLogSomething "Hello" >> mLogSomething "World!")
res `shouldBe` "HelloWorld!"

main :: IO ()
main = main0 >> main1``````

The steps for the “real” part of the program were

1. Introduce an execution type, `AppM`, with a convenience function for running it, `runAppM`
2. Remove the log function from the environment type, `envLog` in `Env`
3. Remove all the `HasX` classes
4. Create a new operations typeclass for logging, `LogSomethingM`
5. Rename the operations typeclass for modifying the balance to match the naming found in the tagless article a bit better, `ModifyM`
6. Implement instances of both operations typeclasses for `AppM`

For testing the steps were

1. Define an execution type for each test, `ModifyAppM` and `LogAppM`, with some convenience functions for running them, `runModifyAppM` and `runLogAppM`
2. Write instances for the operations typeclasses, one for each

So I think the distinguising features are

• There’s both an environment type, `Env`, and an execution type `AppM` that wraps it
• The environment holds only configuration values (none in this example), and state (`envBalance`)
• Typeclasses are used to abstract over operations, `LogSomethingM` and `ModifyM`
• Typeclasses are only implemented for the execution type

This version has slightly more coupling, the execution type specifies the environment to use, and the operations are tied directly to the execution type. However, this coupling doesn’t really make a big difference – looking at the pure modify test the amount of code don’t differ by much.

### A short note (mostly to myself)

I did write it using `monad-control` first, and then I needed an instance for `MonadBaseControl IO`. Deriving it automatically requires `UndecidableInstances` and I didn’t really dare turn that on, so I ended up writing the instance. After some help on haskell-cafe it ended up looking like this

``````instance MonadBaseControl IO AppM where
type StM AppM a = a
liftBaseWith f = AppM (liftBaseWith \$ \ run -> f (run . unAppM))
restoreM = return``````

## Conclusion

My theoretical knowledge isn’t anywhere near good enough to say anything objectively about the difference in expressiveness of the two design patterns. That means that my conclusion comes down to taste, do you like the `readerT` patter or tagless final better?

I like the slightly looser coupling I get with the `ReaderT` pattern. Loose coupling is (almost) always a desirable goal. However, I can see that tying the typeclass instances directly to a concrete execution type results in the intent being communicated a little more clearly. Clearly communicating intent in code is also a desirable goal. In particular I suspect it’ll result in more actionable error messages when making changes to the code – the error will tell me that my execution type lacks an instance of a specific typeclass, instead of it telling me that a particular transformer stack does. On the other hand, in the `ReaderT` pattern that stack is very shallow.

One possibility would be that one pattern is better suited for libraries and the other for applications. I don’t think that’s the case though as in both cases the library code would be written in a style that results in typeclass constraints on the caller and providing instances for those typeclasses is roughly an equal amount of work for both styles.

1. Please do point out any mistakes I’ve made in this, in particular if they stem from me misunderstanding tagless final completely.

# Elixir Developers & Functional Programmers at CareDox (Full-time)

Love functional programming? Want to keep millions of children healthy? CareDox is hiring Elixir developers and functional programmers interested in learning Elixir.

We're a diverse, collaborative team of thoughtful, friendly engineers driven by CareDox's mission of working with schools and school nurses to immunize all children -- even the uninsured! -- against diseases like the flu and to help children with chronic illnesses (like asthma) manage their care.

We take pride in our Elixir code and use continuous integration tools to guarantee that all code going into production has high test coverage and clean compiler, Dialyzer, Credo, and Sobelow results.

We're conveniently located between Bryant Park and Times Square, walking distance from both Grand Central and Penn Station.

Requirements: * Functional programming experience (Haskell, Elm, Clojure, Lisp, Scheme, F#, Scala, OCaml, etc.) * Strong Unit Testing Skills

Bonus points for experience with: * Elixir, Phoenix, Erlang, and Ecto * The actor model (esp. OTP, Akka, or similar) * React & JavaScript (we use React) * SQL and Data modeling * Modern DevOps (CI/CD deployment, Docker, Kubernetes, etc.) * Working with Redis, Redshift, Elasticsearch, DynamoDB, GraphQL

Get information on how to apply for this position.

# Implementing Union in Esqueleto I

We use the SQL `UNION` operator at IOHK in one of our `beam` queries, and `esqueleto` does not support it. To make porting the IOHK SQL code more straightforward, I decided to implement `UNION`. This blog post series will delve into implementing this feature, in a somewhat stream-of-thought manner.

# Background

`esqueleto` is a SQL library that builds on the `persistent` library for database definitions and simple queries. It attempts to provide an embedded DSL that allows you to use SQL and Haskell together. In my opinion, it has less complicated types than `beam` and an easier to learn UX than `opaleye`. The `persistent` quasiquoter model definitions save a bunch of boilerplate, too.

`esqueleto` is implemented in a somewhat convoluted manner – we have a type class `Esqueleto query expr backend` that everything is defined in terms of. However, the functional depenencies on the class essentially only permit a single instance. The `query` type is always fixed to `SqlQuery`, a `WriterT [Clauses] (StateT IdentInfo)` monad. The `expr` type is always `SqlExpr`, which is a GADT that provides a structure for SQL expressions.

It’s kind of a tagless final encoding paired with a GADT initial encoding. Neat.

# Goal

Alright, so let’s start with the SQL. We want to be able to write a Haskell thing that translates to this SQL:

``````SELECT name
FROM person
UNION
SELECT title
FROM blog_post
``````

Let’s just write the syntax out that `esqueleto` usually uses, and see where that takes us:

``````unionTest = (
select \$
from \$ \person ->
return (person ^. PersonName)
) `union` (
select \$
from \$ \blog ->
return (blog ^. BlogPostTitle)
)
``````

This is a pleasing looking API! Can it work? What type would `union` need to have?

Well, probably not. Let’s look at the type of `select`:

``````select
:: (SqlSelect a r, MonadIO m)
=> SqlQuery a
``````

Once something has become a `SqlReadT` value, we can’t really introspect on the query structure anymore. So we can’t have this syntax :(

Let’s try something else:

``````unionTest =
select \$
union
( from \$ \person -> do
pure (person ^. PersonName)
)
( from \$ \blog -> do
pure (blog ^. BlogPostTitle)
)
``````

This means that `union` will end up returning a `SqlQuery a`. It takes two arguments, each of which is a query returning the same thing. We have our first attempt at a type to implement!

``````union
:: SqlQuery a
-> SqlQuery a
-> SqlQuery a
union query0 query1 = undefined
``````

# First Attempt

Alright, so, uh, how do we make values of type `SqlQuery`? Let’s first look at what the type actually is:

``````-- | SQL backend for @esqueleto@ using 'SqlPersistT'.
newtype SqlQuery a = Q
{ unQ :: W.WriterT SideData (S.State IdentState) a
}

-- | Side data written by 'SqlQuery'.
data SideData = SideData
{ sdDistinctClause :: !DistinctClause
, sdFromClause     :: ![FromClause]
, sdSetClause      :: ![SetClause]
, sdWhereClause    :: !WhereClause
, sdGroupByClause  :: !GroupByClause
, sdHavingClause   :: !HavingClause
, sdOrderByClause  :: ![OrderByClause]
, sdLimitClause    :: !LimitClause
, sdLockingClause  :: !LockingClause
}

-- | List of identifiers already in use and supply of temporary
-- identifiers.
newtype IdentState = IdentState
{ inUse :: HS.HashSet T.Text
}

initialIdentState :: IdentState
initialIdentState = IdentState mempty
``````

So, we use the `WriterT SideData` to accumulate information about the query we’re building. And then we use `IdentState` to keep track of identifiers in use.

Let’s look at some things that return `SqlQuery` values. I searched through the `Database.Esqueleto.Internal.Sql` module for `-> SqlQuery` and got some interesting results. The only function that returns a `SqlQuery` value in the whole module is this:

``````-- line 497
withNonNull :: PersistField typ
=> SqlExpr (Value (Maybe typ))
-> (SqlExpr (Value typ) -> SqlQuery a)
-> SqlQuery a
withNonNull field f = do
where_ \$ not_ \$ isNothing field
f \$ veryUnsafeCoerceSqlExprValue field
``````

Okay, so `where_` is a `SqlQuery` function. Let’s look for it’s definition:

``````class (Monad query) =>
Esqueleto query expr backend
| query -> expr backend
, expr -> query backend
where
-- snip...

-- in Database.Esqueleto.Internal.Language, line 93
-- | @WHERE@ clause: restrict the query's result.
where_ :: expr (Value Bool) -> query ()
``````

The class definition has functional dependencies that basically make it so you can determine any type variable from any other. Since `persistent` uses the `SqlBackend` type for the `backend`, you end up needing to totally pick `Esqueleto SqlQuery SqlExpr SqlBackend`, and you can’t vary any of those types.

Okay, let’s find the instance definition:

``````-- line 452 in Database.Esqueleto.Internal.Sql
where_ expr = Q \$ W.tell mempty { sdWhereClause = Where expr }

on expr = Q \$ W.tell mempty { sdFromClause = [OnClause expr] }

groupBy expr = Q \$ W.tell mempty { sdGroupByClause = GroupBy \$ toSomeValues expr }

having expr = Q \$ W.tell mempty { sdHavingClause = Where expr }

locking kind = Q \$ W.tell mempty { sdLockingClause = Monoid.Last (Just kind) }
``````

There’s actually a bunch, and they mostly just `tell` about a part of the query we’re building. Cool. This may be useful soon, but it’s not immediately obvious to me how.

I spent some time perusing the rest of the library, and I found another combinator that takes a `SqlQuery` value and produces something else:

``````sub
:: PersistField a
=> Mode
-> SqlQuery (SqlExpr (Value a))
-> SqlExpr (Value a)
sub mode query = ERaw Parens \$ \info -> toRawSql mode info query
``````

This looks useful! Let’s look at the `ERaw` constructor, which is from the `SqlExpr` datatype:

``````  -- Raw expression: states whether parenthesis are needed
-- around this expression, and takes information about the SQL
-- connection (mainly for escaping names) and returns both an
-- string ('TLB.Builder') and a list of values to be
-- interpolated by the SQL backend.
ERaw
:: NeedParens
-> (IdentInfo -> (TLB.Builder, [PersistValue]))
-> SqlExpr (Value a)
``````

Okay, so we can start with this approach and just generate the raw SQL we need. This is probably the wrong approach, but it might work, and working is better than imaginary.

``````union
:: PersistField a
=> SqlQuery (SqlExpr (Value a))
-> SqlQuery (SqlExpr (Value a))
-> SqlQuery (SqlExpr (Value a))
union query0 query1 =
pure \$ ERaw Parens \$ \info ->
let
(q0, v0) = toRawSql SELECT info query0
(q1, v1) = toRawSql SELECT info query1
in
(q0 <> " UNION " <> q1, v0 <> v1)
``````

This is basically what `sub` does. We just concatenate them with the `UNION` operator in between. Let’s write a test and see how it works!

## Testing the First Approach

I hop into the `esqueleto` test suite and start writing my test:

``````testCaseUnion :: Run -> Spec
testCaseUnion run = do
describe "union" \$ do
it "works" \$ do
run \$ do
let
names =
[ "john", "joe", "jordan", "james" ]
blogs =
[ "asdf", "qwer", "berty", "nopex" ]
(pid:_) <- forM names \$ \name ->
insert (Person name Nothing Nothing 3)

forM_ blogs \$ \blog ->
insert (BlogPost blog pid)

res <- select \$
( from \$ \person -> do
pure (person ^. PersonName)
)
`union`
( from \$ \blog -> do
pure (blog ^. BlogPostTitle)
)

liftIO \$
L.sort (map unValue res)
`shouldBe`
L.sort (names <> blogs)
``````

We insert four blogs and people into the database, and then get the `UNION` of their names and titles. Does it work?

No :(

``````  test/Common/Test.hs:1421:11:
1) Tests that are common to all backends, union, works
expected: ["asdf","berty","james","joe","john","jordan","nopex","qwer"]
but got: ["asdf"]
``````

Okay, that’s a bit weird. Why does it only pick the first? Let’s test our understanding and try a raw query. I’ll add a line that runs the raw SQL and then I’m going to `error` out to see the output in the test suite:

``````-- ...
res' <- rawSql
(  "SELECT Person.name FROM Person "
<> "UNION "
<> "SELECT BlogPost.title FROM BlogPost"
)
[]
error (show \$ map unSingle (res' :: [Single String]))
-- ...
``````

Now, we get an error that shows what that query returned:

``````  test/Common/Test.hs:1417:9:
1) Tests that are common to all backends, union, works
uncaught exception: ErrorCall
["asdf","berty","james","joe","john","jordan","nopex","qwer"]
CallStack (from HasCallStack):
error, called at test/Common/Test.hs:1417:9 in main:Common.Test
``````

Okay, that’s exactly what I expected to see! So there’s something weird about how the query is being generated. I want to get a textual representation of the query, and `toRawSql` is the function to do that. I’m going to make a wrapper around it:

``````renderQuery
:: (Monad m, EI.SqlSelect a r)
=> SqlQuery a
-> SqlPersistT m TL.Text
renderQuery q = do
pure (queryToText conn q)

queryToText :: EI.SqlSelect a r => SqlBackend -> SqlQuery a -> TL.Text
queryToText conn q =
let (tlb, _) = EI.toRawSql EI.SELECT (conn, EI.initialIdentState) q
in TLB.toLazyText tlb
``````

And we’ll render the query:

``````-- ...snip
let
q =
( from \$ \person -> do
pure (person ^. PersonName)
)
`union`
( from \$ \blog -> do
pure (blog ^. BlogPostTitle)
)
res <- select q
error . show =<< renderQuery q
-- snip...
``````

Now, what do we get?

``````  test/Common/Test.hs:1415:9:
1) Tests that are common to all backends, union, works
uncaught exception: ErrorCall
"SELECT (SELECT \"Person\".\"name\"\nFROM \"Person\"\n UNION SELECT \"BlogPost\".\"title\"\nFROM \"BlogPost\"\n)\n"
CallStack (from HasCallStack):
error, called at test/Common/Test.hs:1415:9 in main:Common.Test
``````

Okay, so it’s doing something that we don’t want. We want this:

``````SELECT name
FROM person
UNION
SELECT title
FROM blog_post
``````

And it’s doing this:

``````SELECT (
SELECT name
FROM person
UNION
SELECT title
FROM blog_post
)
``````

Which explains our problem! We actually need it do `SELECT * FROM (the union query)`. Or removing the outer SELECT entirely. So, this suggests that this isn’t the right approach.

Next post, I’ll attempt to find another way to implement it, and write down the stream-of-thought process on how I got there.

# Defining exceptions in Haskell - FP Complete

Let’s talk about exceptions. Programs do a thing successfully all the time, except sometimes when things didn’t work out. So we have “exceptions”, which, like anything fun in programming languages, was invented in Lisp in the 60’s .

# Defining exceptions in Haskell - FP Complete

Let’s talk about exceptions. Programs do a thing successfully all the time, except sometimes when things didn’t work out. So we have “exceptions”, which, like anything fun in programming languages, was invented in Lisp in the 60’s .

# Go Gotchas

As I wrote my first non trivial project in Go, I had a few head-scratching moments. Nothing that a quick search could clear up, but worth keeping in mind...

#### Nil is not nil.

You may run into that one quickly enough in tests. An instance of an interface can have a nil value but since it carries the interface type information, it does not equal to nil!! A lot has been written about it so make sure to read about it!

#### Redeclarations

The fact that you can use := to declare variables easily is a double edged sword, as declaring a variable with the same name as a previously declared one is allowed if it's in a different scope, and it shadows the original variable. So:

This code will only print True at the end. The "err" variable inside the if block has been redefined so even though it's not nil inside the if block, it IS nil once you exit it! So if you want to chain several operations that may return errors and then at the end handle whatever error occurred in a consistent way, you have to be careful.

#### Closures

This one gave more some Javascript PTSD:

This prints :
4
4
4

Which was not what I expected! One easy fix is to add a line saying i:=i before creating the func in the first loop, which by virtue of the previous gotcha redeclares a variable in the scope of the for loop body.

#### Errors

I'm not convinced in the end by the standard error reporting in Go. A lot has been written on that topic too. I feel that the code is peppered with error handling that gets in the way of readability and implementing cleanly the business logic. I'm sure there are some design tricks you can use, but you shouldn't have to resort to tricks to do clean error handling in a language, because at the end of the day it's how you handle errors that make up the quality of your software, not only how you handle the happy path.

These things you can watch out for and probably code around pretty easily, but they do impact a bit the ease of use of Go, even though I still think it's quite a simple elegant language to quickly be productive in.

Happy Go Hacking!

# Why Haskell IV: Typeclasses vs. Inheritance

Welcome to part four of our series comparing Haskell's data types to other languages. As I've expressed before, the type system is one of the key reasons I enjoy programming in Haskell. And this week, we're going to get to the heart of the matter. We'll compare Haskell's typeclass system with the idea of inheritance used by object oriented languages.

## Typeclasses Review

Before we get started, let's do a quick review of the concepts we're discussing. First, let's remember how typeclasses work. A typeclass describes a behavior we expect. Different types can choose to implement this behavior by creating an instance.

One of the most common classes is the `Functor` typeclass. The behavior of a functor is that it contains some data, and we can map a type transformation over that data.

In the raw code definition, a typeclass is a series of function names with type signatures. There's only one function for `Functor`: `fmap`:

``````class Functor f where
fmap :: (a -> b) -> f a -> f b``````

A lot of different container types implement this typeclass. For example, lists implement it with the basic `map` function:

``````instance Functor [] where
fmap = map``````

But now we can write a function that assumes nothing about one of its inputs except that it is a functor:

``stringify :: (Functor f) -> f Int -> f String``

We could pass a list of ints, an `IO` action returning an `Int`, or a `Maybe Int` if we wanted. This function would still work! This is the core idea of how we can get polymorphic code in Haskell.

## Inheritance Basics

As we saw in previous parts, object oriented languages like Java, C++, and Python tend to use inheritance to achieve polymorphism. With inheritance, we make a new class that extends the functionality of a parent class. The child class can access the fields and functions of the parent. We can call functions from the parent class on the child object. Here's an example:

``````public class Person {
public String firstName;
public String lastName;
public int age;

public Person(String fn, String ln, int age) {
this.firstName = fn;
this.lastName = ln;
this.age = age;
}

public String getFullName() {
return this.firstName + " " + this.lastName;
}
}

public class Employee extends Person {
public String company;
public String email;
public int salary;

public Employee(String fn,
String ln,
int age,
String company,
String em,
int sal) {
super(fn, ln, age);
this.company = company;
this.email = em;
this.sal = sal;
}
}``````

Inheritance expresses an "Is-A" relationship. An `Employee` "is a" `Person`. Because of this, we can create an `Employee`, but pass it to any function that expects a `Person`. We can also call the `getFullName` function from `Person` on our `Employee` type.

``````public void printPerson(Person p) {
...
}

public void main {
printPerson(e);
String s = e.getFullName();
}``````

Here's another trick. We can put items constructed as either `Person` or `Employee` in the same array, if that array has type `Person[]`:

``````public void main {
Person p = Person("Katie", "Johnson", 25);
Person[] people = {e, p};
}``````

This provides a useful kind of polymorphism we can't get in Haskell.

## Benefits

Inheritance does have a few benefits. It allows us to reuse code. The `Employee` class can use the `getFullName` function without having to define it. If we wanted, we could override the definition in the `Employee` class, but we don't have to.

Inheritance also allows a degree of polymorphism, as we saw in the code examples above. If the circumstances only require us to use a `Person`, we can use an `Employee` or any other subclass of `Person` we make.

We can also use inheritance to hide variables away when they aren't needed by subclasses. In our example above, we made all our instance variables `public`. This means an `Employee` function can still call `this.firstName`. But if we make them `private` instead, the subclasses can't use them in their functions. This helps to encapsulate our code.

## Drawbacks

Inheritance is not without its downsides though. One unpleasant consequence is that it creates a tight coupling between classes. If we change the parent class, we run the risk of breaking all child classes. If the interface to the parent class changes, we'll have to change any subclass that overrides the function.

Another potential issue is that your interface could deform to accommodate child classes. There might be some parameters only a certain child class needs, and some only the parent needs. But you'll end up having all parameters in all versions because the API needs to match.

A final problem comes from trying to understand source code. There's a yo-yo effect that can happen when you need to hunt down what function definition your code is using. For example your child class can call a parent function. That parent function might call another function in its interface. But if the child has overridden it, you'd have to go back to the child. And this pattern can continue, making it difficult to keep track of what's happening. It gets even worse the more levels of a hierarchy you have.

I was a mobile developer for a couple years, using Java and Objective C. These kinds of flaws were part of what turned me off OO-focused languages.

## Typeclasses as Inheritance

Now, Haskell doesn't allow you to "subclass" a type. But we can still get some of the same effects of inheritance by using typeclasses. Let's see how this works with the `Person` example from above. Instead of making a separate `Person` data type, we can make a `Person` typeclass. Here's one approach:

``````class Person a where
firstName :: a -> String
lastName :: a -> String
age :: a -> Int
getFullName :: a -> String

data Employee = Employee
{ employeeFirstName :: String
, employeeLastName :: String
, employeeAge :: Int
, company :: String
, email :: String
, salary :: Int
}

instance Person Employee where
firstName = employeeFirstName
lastName = employeeLastName
age = employeeAge
getFullName e = employeeFirstName e ++ " " ++ employeeLastName e``````

We can one interesting observation here. Multiple inheritance is now trivial. After all, a type can implement as many typeclasses as it wants. Python and C++ allows multiple inheritance. But it presents enough conceptual pains that languages like Java and Objective C do not allow it.

Looking at this example though, we can see a big drawback. We won't get much code reusability out of this. Every new type will have to define `getFullName`. That will get tedious. A different approach could be to only have the data fields in the interface. Then we could have a library function as a default implementation:

``````class Person a where
firstName :: a -> String
lastName :: a -> String
age :: a -> Int

getFullName :: (Person a) => a -> String
getFullName p = firstName p ++ " " ++ lastName p

data Employee = ...

instance Person Employee where
...``````

This allows code reuse. But it does not allow overriding, which the first example would. So you'd have to choose on a one-off basis which approach made more sense for your type. And no matter what, we can't place different types into the same array, as we could in Java.

So while we could do inheritance in Haskell, it's a pattern you should avoid. Stick to using typeclasses in the intended way.

## Comparisons

Object oriented inheritance has some interesting uses. But at the end of the day, I found the warts very annoying. Tight coupling between classes seems to defeat the purpose of abstraction. Meanwhile, restrictions like single inheritance feel like a code smell to me. The existence of that restriction suggests a design flaw. Finally, the issue of figuring out which version of a function you're using can be quite tricky. This is especially true when your class hierarchy is large.

Typeclasses express behaviors. And as long as our types implement those behaviors, we get access to a lot of useful code. It can be a little tedious to flesh out a new instance of a class for every type you make. But there are all kinds of ways to derive instances, and this can reduce the burden. I find typeclasses a great deal more intuitive and less restrictive. Whenever I see a requirement expressed through a typeclass, it feels clean and not clunky. This distinction is one of the big reasons I prefer Haskell over other languages.

## Conclusion

That wraps up our comparison of typeclasses and inheritance! There's one more topic I'd like to cover in this series. It goes a bit beyond the "simplicity" of Haskell into some deeper ideas. We've seen concepts like parametric types and typeclasses. These force us to fill in "holes" in a type's definition. We can expand on this idea by looking at type families. Next week, we'll explore this more advanced concept and see what it's useful for.

If you want to stay up to date with our blog, make sure to subscribe! That will give you access to our subscriber only resources page!

# A missing piece in my Emacs/Spacemacs setup for Haskell development

With the help of a work mate I’ve finally found this gem that’s been missing from my Spacemacs setup

``````(with-eval-after-load 'intero

# Hacking around on an FV-1 based guitar pedal

This is a diary-style post where I chronicle what I learn digging into a digital guitar effects pedal for the first time. I’m not sure how useful or interesting it will be to others.

I picked up an oddball modulation pedal from a “boutique” maker (being coy here) and decided there were a bunch of things I could improve about it, so I thought I’d try to tweak it. I know almost nothing about electronics.

I took the guts out of the enclosure and first googled the chips on the board. The largest was labeled “SpinSemiconductor FV-1” This is a “batteries included” DSP chip and the only real product from Spin.

IYI (IF YOU’RE INTERESTED): KEITH BARR AND THE FV-1:

One of the pleasures of this project has been learning a little about the designer of the FV-1, Keith Barr. He passed away in 2010 and some of his contributions are summarized in this tribute: cofounder of MXR and creator of the Phase 90, founder of Alesis, pioneer in bringing digital recording and then digital reverb to the masses, etc.

The FV-1 was developed and released in the mid-2000s and is responsible for the boom in “boutique” reverb and (along with the PT2399) non-BBD delay pedals, being used by the likes of: Old Blood Noise, Catalinbread, Neunaber, Earthquaker, Red Panda, Keeley, Dr. Scientist, Walrus Audio, etc, etc.

I’m new to the space but it seems like Keith was the only person doing this sort of work (cheap DSP chip that acts like another analog component, accessible to hobbyists) and with his death there is sadly no “next-gen” FV-1 coming along.

I also noticed what turned out to be an EEPROM chip And a spot marked “pgm” where a 5-pin connector header could be soldered, presumably for programming. I guessed the EEPROM was where the code itself was stored (having read this was a capability).

I traced the “pgm” solder pads carefully (using the setting on my multimeter that beeps when a circuit is formed), and found they all connected to the EEPROM. I drew up a little picture showing how the header and EEPROM chip relate.

At this point I was feeling pretty hopeful that I might be able to both write and possibly dump/decompile code from the EEPROM and tweak the functionality to my liking (maybe… eventually).

## Dumping the dang ROM

I didn’t have any kind of special programmer, but found that reading and writing to an EEPROM is easily done with an arduino

However I got concerned looking at the hardware setup: they connect 5v to pin A2, but all of mine are grounded.

I looked at the data sheet for the EEPROM and was surprised to find it really readable even for someone without much embedded background! I learned (sec 5.0 of the manual) these pins were for configuring the IÂ²C address for the chip (this is clear in the tutorial, but I didn’t get it at first). So my EEPROM was hardwired to communicate at a different address (0x50).

Another thing that confused me for a second: the manual states the EEPROM’s write-protect pin needs to be connected to either Vcc or Vss (ground), but it didn’t seem to be connected to either (according to my multimeter). But after looking closer I found it was connected to Vcc via a 1k resistor (I guess too much resistance for my multimeter to see continuity). I think the idea is that the corresponding pin from the header could be connected to ground and current would flow there overriding the write-protect). I’m sure this is probably a common technique.

I have an older “Duemilanove” arduino and had to look up the SDA and SCL pins on my board (pretty easy to find on the arduino site).

I hoped the pedal’s circuitboard would have the necessary pullup resistors soldered on so I didn’t have to use a breadboard at all but that didn’t seem to be the case. The arduino’s ATmega’s built-in pull-up resistors also apparently won’t work reliably for i2c communication, so I just wired things up like the SparkFun article above shows.

Here’s what that looked like (mostly for my own future reference); mystery pedal on the left, Duemilanove in the background:

(the dangling green wire can be connected to `Gnd` on the arduino to make the EEPROM writeable).

I used the arduino sketches from the sparkfun tutorial above, just tweaking some basics:

• modified `EEPROM_ADR` (zero out least sig bits, since my EEPROM was hardwired to 000 since all three address pins were grounded)
• loop over 32k bits with `#define EEPROM_SIZE 4096`

For developing and uploading the arduino sketch I used this this vim plugin. I just had to add to my `.vimrc`:

``````let g:arduino_board = 'arduino:avr:diecimila'
``````

…then an `:ArduinoUpload` in VIM builds and uploads the sketch to the board.

And the following seemed to be the simplest way to listen on the arduino serial port for the dumped EEPROM bytes. Note the baud passed to `-s` here should match `Serial.begin(<baud>)` in the arduino code:

``````\$ busybox microcom -s 115200 /dev/ttyUSB0 > pedal.rom
``````

After running this for the first time and inspecting the output with `hexdump` the project went off the rails for a while…: the output repeated every 512 bytes (so was this actually a 4096 bit EEPROM?) Microchip’s docs describe it as “a single block of 4K x 8-bit memory”

IYI: in fact “4k” here means 4096; they also write “kbit” to mean 1024 bits, which is not what I understand the term to mean. Bafflingly nowhere in the datasheet are they unambiguous about the actual size of the EEPROM. Also AFAIU there is no way to determine the size of an EEPROM without writing to it and reading back.

Or was it communicating with the arduino’s internal EEPROM on the same i2c address or something? (I unplugged one of the lines going to the EEPROM and got garbage, so ruled this out)

Or is the reader program actually buggy and skipping every 8 bytes, and then wrapping?

After inspecting the code I became more and more convinced the latter was what was happening:

• Arduino’s `Wire` library doesn’t look like i2c at all (in fact it’s communicating with a separate chip that does i2c asynchronously (although the Wire library doesn’t take advantage of this))
• the library is an undocumented mess, though this helped a litte
• most of the EEPROM code I tried didn’t actually match the documented spec as far as I could tell (e.g. no repeated START)

In short fertile ground for wasting a bunch of time chasing red herrings…

What was really happening is the chip actually contained 8 banks of identical programs, which is what the FV-1 in fact expects (they can be eight different programs obviously). Had I done a little more initial basic research about the FV-1, or taken the time to quickly rule out the idea that the EEPROM dump was correct despite the fact that I thought that was unlikely (easily done by writing to the last byte and reading again, which is what I eventually did), I would have saved myself a lot of time. This is like bayesian reasoning and it’s really easy to not do.

Also, oops. I had 5v running to the EEPROM (connected to the FV-1), although FV-1 takes 3.3v. Luckily this didn’t seem to fry the board…

## Assembling / Disassembling

Someone named Igor has written an FV1 decompiler presumably as a PHP script. Unfortunately the code doesn’t seem to be open source, and I wasn’t immediately sure how to contact the author. After uploading my ROM I got back a legitimate looking but at-this-point-totally-meaningless-to-me bunch of annotated assembly, like:

``````   CHO     RDAL, SIN0      ; { ACC = SIN0 } ; Get LFO value
WRAX    REG7 , 0        ; { REG7 = ACC; ACC = 0 * ACC ; PACC = ACC } ; * Note: C=0(16 bit) - s1.14 format
OR      0x20000         ; { ACC = ACC | %00000000000000100000000000000000 }
RDAX    REG7 , 1        ; { ACC = ACC + REG7 * 1 ; PACC = ACC } ; * Note: C=1(16 bit) - s1.14 format
``````

So that’s cool. Now I need to get an assembler and I’m off to the races.

The standard SpinASM assembler is Windows only (I haven’t tried it with Wine yet), so I tried an alternate one github user ndf-zz has written in python called asfv1:

``````\$ pip3 install asfv1
\$ asfv1 -b pedal.rom.disassembled pedal.rom.reassembled
``````

I got some errors which seemed related to a documented quirk that I didn’t totally understand. After adding decimals to the problematic literals I was able to successfully assemble my disassembled program!

Unfortunately comparing the first 512 bytes dumped from the EEPROM with my disassembled/reassembled output from asfv1 showed some differences. I upload the new ROM to the disassembler service again and looked at a diff and it appeared many immediate values of `1` were turned into e.g. `6.103515625E-5` or `0.001953125` after going through `asfv1` (and the mystery blockbox disassembler).

I re-read the `asfv1` README more carefully (I’d read “fixed-point” as “floating point”), did a little research and looked at a diff of the hexdumps of the roms and what was happening was pretty obvious:

`asfv1` compiled the first line of assembly below to `0x0001`, while the binary output for the original dumped ROM was achieved by using the decimal literal, as in the second line below:

``````                 `1` as unsigned 16-bit int

-
/   \
RDAX   ADCL , 1           00 01 02 84 ...etc
RDAX   ADCL , 1.0         40 00 02 84 ...etc
\   /
-

`1` as s1.14 fixed point value:
0b0100000000000000
\ fractional /
``````

I wasn’t familiar with the notation “s1.14” used in the asfv1 and FV-1 ASM docs, but it quite simply means a fixed-point real value represented by: a sign bit, followed by `1` integer bit, followed by `14` fractional bits (totalling 16 bits).

I dug into the asfv1 code and tweaked things so that, for real arguments, we treat decimal literals as real literals and values entered in hexadecimal or binary notation as a raw bit value.

With my fork I successfully assembled the output I got from the blackbox disassembler, and miraculously the output from the new `asfv1` matches the original program we dumped from the EEPROM (`head -c 512 pedal.rom`)!

``````\$ md5sum * | sort
9d13dcb79754603e85eca19cbf405c4a  pedal.rom.reassembled
...
``````

I did quick hack job on `asfv1` without understanding the code or SpinASM very deeply, so beware, but if you want to try out the fork you can do:

``````\$ pip3 install git+https://github.com/jberryman/asfv1.git
``````

I get the impression many if not most pedal makers are developing their algorithms with the GPL’d drag-and-drop editor SpinCAD Designer.

It seems difficult to build, but `luisfcorreia` has a fork where they’ve made it buildable with `maven`. Here’s what I had to do to get it to build and run successfully on my debian stretch box:

``````\$ git clone https://github.com/HolyCityAudio/SpinCAD-Designer.git
\$ apt install maven
\$ git branch maven
\$ cd spincad-designer  # seems this work was done on a copy of the codebase to a different dir in the repo...?
\$ _JAVA_OPTIONS=-Djdk.net.URLClassPath.disableClassPathURLCheck=true  mvn package
``````

SpinCAD is able to import “Spin Hex” which apparently means the Intel HEX encoded ROM data. This is I guess a common format to feed to EEPROM writers, programmers, etc.

After some trial and error I was able to convert the binary image into HEX in such a way that “File > Open Hex” in SpinCAD didn’t choke:

``````\$ sudo apt install srecord
\$ srec_cat pedal.rom.1 -binary -output pedal.rom.1.hex -Intel --line-length=19
``````

I was curious if SpinCAD would be able to disassemble and recognize the idioms from a spin ROM but unsurprisingly it does not. I probably won’t be using SpinCAD for this project, but the library of “modules” in the source code might be really valuable to learn from, or maybe I’ll build a visual “disassembler” myself using them some day.

A nice write-up/experience report from a first-time FV-1 developer, with good advice and links:

Easy DIY dev boards for FV-1 (why aren’t there more of these? What is the cheapest commercial FV-1 based pedal available?)

A alternatives to the FV-1 and some history:

Great resources for DIY pedal building generally to be found here. I found discussions of the FV-1 on all of these, though most builders focusing on analog electronics:

# Surprises of the Haskell module system (part 2)

It’s been 6 years since I published Surprises of the Haskell module system (part 1), so I thought it was time to follow through on the implicit promise and publish part 2.

## Exporting constructors without data type

Is it possible to export a data constructor without exporting its parent data type? To be clear, there’s probably no good reason to want to do that, but it’s a fun puzzle, and we’ll learn something while trying to solve it.

Surprisingly, the answer depends on how our constructor and type are named. First, consider the case when they are named differently, as in

``data T = C``

The naive attempt

``````module M (C) where
data T = C``````

does not work. The Haskell 2010 standard says this about data types in the export list:

An algebraic datatype T declared by a data or newtype declaration may be named in one of three ways:
• The form T names the type but not the constructors or field names. The ability to export a type without its constructors allows the construction of abstract datatypes (see Section 5.8).
• The form T(c1,,cn), names the type and some or all of its constructors and field names.
• The abbreviated form T(..) names the type and all its constructors and field names that are currently in scope (whether qualified or not).

Therefore, there seems to be no way to export a constructor without exporting its parent type at the same time. But here’s the trick. First, let’s move the data type definition from `M` to a different module, `Internal`:

``````module Internal where
data T = C``````

Then, in our module `M`, import `Internal` like this:

``````module M where
import Internal hiding (T)``````

This will import `C` but not `T`. Here’s Haskell 2010 on the semantics of `hiding` lists:

Entities can be excluded by using the form hiding(import1 ,  , importn ), which specifies that all entities exported by the named module should be imported except for those named in the list. Data constructors may be named directly in hiding lists without being prefixed by the associated type. Thus, in

import M hiding (C)

any constructor, class, or type named C is excluded. In contrast, using C in an import list names only a class or type.

So now, within module `M`, we have only `C` in scope but not `T`. But `M` does not export any of them yet, and it seems like we need `T` in scope in order to add `C` to the export list.

Here we use a second trick: instead of naming `C` specifically, we export our `Internal` module as a whole:

``````module M (module Internal) where
import Internal hiding (T)``````

And this achieves the desired result: `M` now exports the data constructor `C` but not the data type `T`. See Part 1 for more discussion of module exports.

This all works because the data type and its constructor are named differently. If we have

``data T = T``

this won’t work, because

``import Internal hiding (T)``

will hide both the data type and the data constructor. I actually don’t know a way to export the constructor `T` without the type `T`; if you do, please let me know.

## Exporting fields without constructor

A similar question is that of exporting a field name without exporting its parent constructor, as in

``````module M (T, field) where
data T = C { field :: Bool }``````

The above code works. But what does it mean? Field names can be used as functions (e.g. `field :: T -> Bool`), but they can also be used in the record construction (`C { field = True }`) and record update (`x { field = True }`) syntaxes.

Since we don’t export the data constructor, clearly record construction won’t work. But what about record update? Does `field` lose its magic and become a simple function when exported this way—either because it wasn’t exported as `T(field)`, or because the data constructor is not in scope? The answer is no, it is still a valid field name, and `x { field = True }` still works. Here’s Haskell 2010:

It makes no difference to an importing module how an entity was exported. For example, a field name f from data type T may be exported individually (f, item (1) above); or as an explicitly-named member of its data type (T(f), item (2)); or as an implicitly-named member (T(..), item(2)); or by exporting an entire module (module M, item (5)).

And there’s no requirement that the constructor be in scope in order to use the record update syntax—even though its desugaring would require all of the constructors to be in scope.

Some authors use this to their advantage: for instance, Michael Snoyman suggests exporting fields but not constructors for settings types.

But more often than not, we want the opposite: to make the type abstract and export a getter for it. If you think the module `M` above exports an abstract type `T` and a function `field :: T -> Bool`, then you may believe you are free to change the implementation of `T` without your users noticing. So in the next version you may change `T` to

``````module M (T, field) where
data T = C { items :: [Int] }
field :: T -> Bool
field = not . null . items``````

This new module still exports a type `T` and a function `field :: T -> Bool`, but that function is no longer a record field. If someone was using `field` in a record update, `x { field = True }`, their code is going to break.

There’s no way to export a field as a plain function or otherwise strip it of its magic. You can either define a getter yourself, as in

``````data T = C Bool
field :: T -> Bool
field = \case
C a -> a``````

or create a copy of a field, as in

``````data T = C { _field :: Bool }
field :: T -> Bool
field = _field``````

## Hidden types and type synonyms

Consider this example:

``````module M (x) where

type T = Int

x :: T
x = 1 ``````

What type does `x` have in a module that imports `M`? On the one hand, `T` is `Int`, so `x :: Int`. On the other hand, since `T` is not in scope, there’s no way to know that `T = Int`, so it might seem that `x` is now an opaque value of some abstract type `T` that we know nothing about. In fact, the former view is correct and the latter isn’t.

The type of an exported entity is unaffected by non-exported type synonyms. For example, in

module M(x) where
type T = Int
x :: T
x = 1

the type of x is both T and Int; these are interchangeable even when T is not in scope. That is, the definition of T is available to any module that encounters it whether or not the name T is in scope. The only reason to export T is to allow other modules to refer it by name; the type checker finds the definition of T if needed whether or not it is exported.

Similarly, you may wonder what does it mean to export a function but hide some of the types that are part of its signature. Here’s what Haskell 2010 has to say about that:

[E]ntities that the compiler requires for type checking or other compile time analysis need not be imported if they are not mentioned by name. The Haskell compilation system is responsible for finding any information needed for compilation without the help of the programmer. That is, the import of a variable x does not require that the datatypes and classes in the signature of x be brought into the module along with x unless these entities are referenced by name in the user program. The Haskell system silently imports any information that must accompany an entity for type checking or any other purposes. Such entities need not even be explicitly exported: the following program is valid even though T does not escape M1:

module M1(x) where
data T = T
x = T

module M2 where
import M1(x)
y = x

In this example, there is no way to supply an explicit type signature for y since T is not in scope. Whether or not T is explicitly exported, module M2 knows enough about T to correctly type check the program.

## Self-exporting module

By default—when no export list is present in a module—all values, types and classes defined in the module are exported, but not those that are imported. With an explicit export list, we can export a mix of entities from the module itself and the modules it imports. But now you have to explicitly list everything defined in the current module and remember to update the list when you define a new function or type. Or do you?

There’s a nice trick to have an explicit export list and automatically export everything from the current module at the same time: just let the module export itself.

``````-- exports x, y, and z
module M (module M, Foo.x, Bar.y) where

import Foo
import Bar

z = undefined``````

Thanks to the clear semantics of module exports (which we discussed in Part 1), self-exporting is well-defined and doesn’t involve any recursion.

# HLint Unused Extension Hints

Summary: HLint detects unused extensions in LANGUAGE pragmas, including over 17,000 on Hackage.

HLint has detected unused `LANGUAGE` pragmas for a while - where you enable an extension (e.g. `{-# LANGUAGE EmptyDataDecls #-}`) but don't use it. HLint v2.1.13 includes some improvements from Yair and myself making these hints even more powerful. As a result, I thought it worth showing some examples of what HLint can do in this area. I started by running HLint on all of Hackage, which found 17,718 "Unused LANGUAGE pragma" hints, including the examples in this post.

Detecting unused extensions

For extensions that show up as syntax (e.g. `EmptyDataDecls`, `ViewPatterns` etc), HLint has rules saying which constructs require which extensions. For extensions that aren't syntax directed (e.g. `AllowAmbiguousTypes` or `IncoherentInstances`), HLint can't detect whether they are used or not. In all, HLint has rules for detecting 36 different unused extensions. Taking a look at some examples from Hackage:

``abcBridge-0.15\src\Data\ABC\AIG.hs:18:1: Warning: Unused LANGUAGE pragmaFound:  {-# LANGUAGE EmptyDataDecls #-}Perhaps you should remove it.mallard-0.6.1.1\lib\Database\Mallard\Validation.hs:4:1: Warning: Unused LANGUAGE pragmaFound:  {-# LANGUAGE TemplateHaskell #-}Perhaps you should remove it.scholdoc-texmath-0.1.0.1\src\Text\TeXMath\Writers\TeX.hs:1:1: Warning: Unused LANGUAGE pragmaFound:  {-# LANGUAGE GeneralizedNewtypeDeriving, ViewPatterns, GADTs #-}Perhaps:  {-# LANGUAGE GeneralizedNewtypeDeriving, GADTs #-}``

As we can see, HLint can spot entirely redundant extension declarations, and also prune those that are partly redundant.

Duplicate extensions

Sometimes extension are simply duplicated, and HLint detects these, either between two separate pragmas, or within a single pragma.

``ghcjs-base-stub-0.2.0.0\src\GHCJS\Marshal\Pure.hs:3:1: Warning: Use fewer LANGUAGE pragmasFound:  {-# LANGUAGE DefaultSignatures #-}  {-# LANGUAGE DefaultSignatures #-}Perhaps:  {-# LANGUAGE DefaultSignatures #-}abstract-deque-tests-0.3\Data\Concurrent\Deque\Tests.hs:1:1: Warning: Use fewer LANGUAGE pragmasFound:  {-# LANGUAGE BangPatterns, RankNTypes, CPP, BangPatterns #-}Perhaps:  {-# LANGUAGE BangPatterns, RankNTypes, CPP #-}``

Implied extensions

The new feature for v2.1.13 is that extension are detected as redundant if they are implied by other extensions. For example, if you have `PolyKinds` defined then that implies `KindSignatures`. HLint now features a list of such implications, which it uses to detect redundant extensions.

``AERN-RnToRm-0.5.0.1\src\Data\Number\ER\RnToRm\UnitDom\Base.hs:1:1: Warning: Unused LANGUAGE pragmaFound:  {-# LANGUAGE MultiParamTypeClasses #-}Perhaps you should remove it.Note: Extension MultiParamTypeClasses is implied by FunctionalDependenciesattoparsec-0.13.2.2\Data\Attoparsec\ByteString\Char8.hs:1:1: Warning: Unused LANGUAGE pragmaFound:  {-# LANGUAGE BangPatterns, CPP, FlexibleInstances, TypeFamilies,    TypeSynonymInstances, GADTs #-}Perhaps:  {-# LANGUAGE BangPatterns, CPP, FlexibleInstances, TypeFamilies,    GADTs #-}Note: Extension TypeSynonymInstances is implied by FlexibleInstances``

Redundant extensions that imply non-redundant extensions

Sometimes there is an extension that you can tell is unused (e.g. `RecordWildCards`), which implies an extension that is either being used or can't be detected (e.g. `DisambiguateRecordFields`). In such cases HLint gives a note that the implied extension might now need to be provided explicitly, although usually it won't be necessary. As examples:

``gogol-maps-engine-0.3.0\gen\Network\Google\Resource\MapsEngine\Projects\List.hs:7:1: Warning: Unused LANGUAGE pragmaFound:  {-# LANGUAGE RecordWildCards #-}Perhaps you should remove it.Note: may require `{-# LANGUAGE DisambiguateRecordFields #-}` adding to the top of the filemanifolds-0.5.0.1\Data\Function\Affine.hs:14:1: Warning: Unused LANGUAGE pragmaFound:  {-# LANGUAGE FunctionalDependencies #-}Perhaps you should remove it.Note: may require `{-# LANGUAGE MultiParamTypeClasses #-}` adding to the top of the file``

Being wrong

Finally, sometimes HLint gets it a bit wrong. As an example:

``shake-0.17.1\src\Development\Shake\Internal\FileInfo.hs:1:1: Warning: Unused LANGUAGE pragmaFound:  {-# LANGUAGE GeneralizedNewtypeDeriving, DeriveDataTypeable, CPP,    ForeignFunctionInterface #-}Perhaps:  {-# LANGUAGE GeneralizedNewtypeDeriving, DeriveDataTypeable, CPP    #-}``

Here HLint has detected that `ForeignFunctionInterface` is not used, but in fact it is, although only under one branch of an `#ifdef`. To fix the hint we can put the extension itself under CPP, adjust the CPP definitions given to HLint, or ignore the hint.

# A New Ring Solver for Agda

Posted on January 25, 2019
Tags: Agda

I’m finally a the point where I feel like I can make the project I’ve been working on for the past few months public: A Ring Solver for Agda. The focus of the project is ergonomics and ease-of-use: hopefully the interface to the solver is simpler and more friendly than the one that’s already there. It can do step-by-step solutions (like Wolfram Alpha). It’s also asymptotically faster than the old solver (and actually faster! The usual optimizations you might apply don’t actually work here, so this bit definitely took the most work).

Anyway, this work is all for my undergrad final year project, but I’m hoping to submit it to a conference or something in the next few weeks.

# Mismatched global packages

This blog post covers an issue with how packages are shipped with GHC, and an edge case of how this can negatively interact with Stack. The overall point about the package contents mismatch will apply more broadly, but I’m mostly focused in this post on how to better handle this in Stack.

## Intro

GHC ships with a few wired in packages which cannot be reinstalled. Examples are `ghc`, `base`, and `template-haskell`. The situation with these for all build tools is simple: the GHC version determines the package version.

There are also packages which are not shipped with GHC at all. All build tools get to determine whatever version of these packages they want, assuming compatibility with the GHC version and the packages that are wired in with it.

There’s an in-between category: packages like `process` and `transformers` are shipped with GHC, but can also be easily reinstalled from Hackage. Some tools may decide to eagerly recompile the latest version of them. Stackage takes a different approach: in order to avoid invalidating the `ghc` package, it avoids recompiling any of these packages, and sticks by default with the global version. Stack is designed to work with that.

## When recompilation happens

Suppose you have the following `stack.yaml` file:

``````resolver: ghc-8.6.3
packages:
- .
``````

And suppose your local package depends on the `unix` package. On non-Windows systems, `unix` ships with GHC. Running `stack build` will therefore result in only building the local package, not `unix`, since `unix` is in the global package database.

Now let’s suppose that, instead of using a `ghc-8.6.3` resolver, you use an LTS Haskell resolver:

``````resolver: lts-13.4
packages:
- .
``````

Same thing: LTS Haskell snapshots never specify versions of global packages. Instead, they are implicitly inherited from the GHC global package database. In a few specific cases (like the `stack init` command), we want to avoid requiring that GHC is already installed, so we instead use a global hints file to specify the versions of packages which ship with GHC.

The `unix` package depends on the `time` package. Let’s suppose your `stack.yaml` file instead said:

``````resolver: lts-13.4
packages:
- .
extra-deps:
- time-1.9.2
``````

And suppose your local package depends on both `unix` and `time`. If you run `stack build`, Stack has three basic options:

1. Only recompile `time`, but keep the original `unix`. That’s a bad idea: you’ll now have two different versions of the datatypes in the `time` package floating around. People may remember extremely confusing error messages about “cannot match `ByteString` with `ByteString`” or similar. It comes from this kind of a build plan.
2. Fail to compile at all. The `unix` in the global database is “invalidated” by the presence of a newly compiled `time`, and your package depends on `unix`. Just give up.
3. Automatically add an `extra-dep` version of `unix` to the build plan, and recompile both `time` and `unix`.

(1) is a really bad idea, and (2) is pretty annoying. So today, Stack follows (3), and automatically/implicitly adds the invalidated global packages to the build plan.

## Where do global packages come from?

Look at the following simplified output of a command describing the global `process` package shipped with GHC 8.6.3:

``````\$ stack --resolver ghc-8.6.3 exec -- ghc-pkg describe process
name: process
version: 1.6.3.0
...
depends:
base-4.12.0.0 deepseq-1.4.4.0 directory-1.3.3.0 filepath-1.4.2.1
unix-2.7.2.2
``````

Important bits:

• We’re using `process-1.6.3.0`
• We’re depending on `base-4.12.0.0`

Now check out the `process.cabal` file for `process-1.6.3.0` on Hackage and on Github. You’ll notice the following bound:

``````    build-depends: base      >= 4.4 && < 4.12,
``````

These pieces of information are contradictory: the cabal file says `base < 4.12`, but GHC ships with this version of `process` and `base-4.12.0.0`! How is this possible?

Well, it turns out, when GHC says it has `process-1.6.3.0`, it’s not code that’s downloaded from Hackage. Instead, it’s established via a submodule. Specifically, you can see on Github that GHC 8.6.3 is using `process` at Git commit 36a3ad5. By contrast, the `v1.6.3.0` tag points to Git commit 7c0b5814.

This isn’t a recent revelation, and not only my revelation. I peripherally knew about this, but I’ve heard of three different people discovering this problem in the past month. I’m writing this blog post to point out how it negatively affects Stack, and what to do about it.

## How it breaks Stack

`process` depends on the `directory` package. Let’s use an extra-dep to provide a new version of `directory`, and then try to build:

``````\$ cat stack.yaml
resolver: ghc-8.6.3
packages: []
extra-deps:
- directory-1.3.3.1
\$ stack build process

Error: While constructing the build plan, the following exceptions were encountered:

In the dependencies for process-1.6.3.0:
base-4.12.0.0 from stack configuration does not match >=4.4 && <4.12  (latest matching version is 4.11.1.0)
needed since process is a build target.
``````

Well that’s darn confusing. I changed a version of `directory`, and suddenly `process` has a bounds error?!? This makes perfect sense given the above (though it’s certainly unsatisfying):

• GHC ships with a different version of `process-1.6.3.0`
• `process` needs to be recompiled when we change the `directory` package
• Stack assumes that it can simply use `process-1.6.3.0` to replace the global `process` package
• However, the `process` on Hackage differs slightly from the one shipped with GHC

This is a relatively simple problem which results in an annoying bounds error, and can be easily fixed by adding a new `process` extra-dep. However, other problems are much more severe:

• What if GHC is compiling one of these packages with a special cabal flag? We won’t be able to reproduce that. In fact, there are comments in the Stack code base about this already.
• What if the code on Hackage is fundamentally different from what’s shipped with GHC? We could get silent but significant behavior changes.

## Solution: make no assumptions

The core problem here is that Stack assumes `process-1.6.3.0` provided by GHC is the same as what’s on Hackage. That assumption plays itself out in our selection of option (3) above. I propose: get rid of the assumption! Instead, move to option (2): if we replace a global package, prune all global packages that depend on it, and require adding those as explicit `extra-dep` entries.

This will break some existing build plans, but I’m proposing this change for Stack 2.0, which is already allowing some slight breakage along those lines. This will add some annoyance to users, but with the result of more clarity on what’s happening in a build plan. And perhaps we can even make an error message for these cases which points right at this post!

# An in-depth look at quickcheck-state-machine

Stateful APIs are everywhere: file systems, databases, widget libraries, the list goes on. Automated testing of such APIs requires generating sequences of API calls, and when we find a failing test, ideally shrinking such a sequence to a minimal test case. Neither the generation nor the shrinking of such sequences is trivial. After all, it is the very nature of stateful systems that later calls may depend on earlier calls: we can only add rows to a database table after we create it, we can only write to a file after we open it, etc. Such dependencies need to be tracked carefully. Moreover, in order to verify the responses we get back from the system, the test needs to maintain some kind of internal representation of what it thinks the internal state of the system is: when we read from a file, we need to know what was in the file in order to be able to verify if the response was correct or not.

In this blog post we will take an in-depth look at `quickcheck-state-machine`, a library for testing stateful code. Our running example will be the development of a simple mock file system that should behave identically to a real file system. Although simple, the example will be large enough to give us an opportunity to discuss how we can verify that our generator is producing all test cases we need, and how we can inspect whether the shrinker is doing a good job; in both cases, test case labelling will turn out to be essential. Throughout we will also discuss design patterns for `quickcheck-state-machine` tests which improve separation of concerns and reduce duplication. It should probably be pointed out that this is an opinionated piece: there are other ways to set things up than we present here.

We will not show the full development in this blog post, and instead focus on explaining the underlying concepts. If you want to follow along, the code is
available for download. We will assume version 0.6 of `quickcheck-state-machine`, which was recently released. If you are using an older version, it is recommended to upgrade, since the newer version includes some important bug fixes, especially in the shrinker.

## Introducing the running example

Our running example will be the development of a simple mock file system; the intention is that its behaviour is identical to the real file system, within the confines of what it needs to support. We will represent the state of the file system as

``````data Mock = M {
dirs  :: Set Dir
, files :: Map File String
, open  :: Map MHandle File
, next  :: MHandle
}

type MHandle = Int

emptyMock :: Mock
emptyMock = M (Set.singleton (Dir [])) Map.empty Map.empty 0``````

We record which directories (folders) exist, the contents of all files on the system, the currently open handles (where mock handles are just integers), and the next available mock handle. To avoid confusion between files and directories we do not use `FilePath` but instead use

``````data Dir  = Dir [String]
data File = File {dir :: Dir, name :: String}``````

As one example, here is the mock equivalent of `readFile`:

``````type MockOp a = Mock -> (Either Err a, Mock)

mRead :: File -> MockOp String
mRead f m@(M _ fs hs _)
| alreadyOpen               = (Left Busy         , m)
| Just s <- Map.lookup f fs = (Right s           , m)
| otherwise                 = (Left DoesNotExist , m)
where
alreadyOpen = f `List.elem` Map.elems hs``````

We first check if there is an open handle to the file; if so, we disallow reading this file (“resource busy”); if the file exists, we return its content; otherwise we report a “does not exist” error. The implementation of the other mock functions is similar; the full API is

``````mMkDir :: Dir               -> MockOp ()
mOpen  :: File              -> MockOp MHandle
mWrite :: MHandle -> String -> MockOp ()
mClose :: MHandle           -> MockOp ()
mRead  :: File              -> MockOp String``````

Finally, we should briefly talk about errors; the errors that the mock file system can report are given by

``data Err = AlreadyExists | DoesNotExist | HandleClosed | Busy``

and they capture a subset of the IO exceptions1

``````fromIOError :: IOError -> Maybe Err
fromIOError e =
case ioeGetErrorType e of
GHC.NoSuchThing      -> Just DoesNotExist
GHC.ResourceBusy     -> Just Busy
GHC.IllegalOperation -> Just HandleClosed
_otherwise           -> Nothing``````

## Testing

Typically we are developing some stateful code, and then write a pure (mock) implementation of the same thing to test it, making sure that the stateful implementation and the simpler pure model compute the same things. Here we are doing the opposite: we are adjusting the model (the mock file system) to match what the real file system does. Either way, the process is the same: we write tests, execute them both in the real thing and in the model, and compare results.

If we were writing unit tests, we might write tests such as

• Write to two different files
• Write to a file and then read it
• etc.

However, as John Hughes of QuviQ – and one of the original authors of QuickCheck – likes to point out, as systems grow, it becomes impossible to write unit tests that test the interaction between all the features of the system. So, don’t write unit tests. Instead, generate tests, and verify properties.

To generate tests for our mock file system, we have to generate sequences of calls into the API; “open this file, open that file, write to the first file we opened, …”. We then execute this sequence both against the mock file system and against the real thing, and compare results. If something goes wrong, we end up with a test case that we can inspect. Ideally, we should then try to reduce this test to try and construct a minimal test case that illustrates the bug. We have to be careful when shrinking: for example, when we remove a call to `open` from a test case, then any subsequent writes that used that file handle must also be removed. A library such as `quickcheck-state-machine` can be used both to help with generating such sequences and, importantly, with shrinking them.

## Reifying the API

It is important that we generate the test before executing it. In other words, the test generation should not depend on any values that we only get when we run the test. Such a dependency makes it impossible to re-run the same test multiple times (no reproducible test cases) or shrink tests to obtain minimal examples. In order to do this, we need to reify the API: we need to define a data type whose constructors correspond to the API calls:

``````data Cmd h =
MkDir Dir
| Open File
| Write h String
| Close h

`Cmd` is polymorphic in the type of handles `h`; this is important, because we should be able to execute commands both against the mock file system and against the real file system:

``````runMock ::       Cmd MHandle -> Mock -> (.. Mock)
runIO   :: .. -> Cmd IO.Handle -> IO ..``````

What should the return type of these functions be? After all, different functions return different things: `Open` returns a new handle, `Read` returns a string, the other functions return unit. To solve this problem we will simply introduce a union type2 for successful responses

``data Success h = Unit () | Handle h | String String``

A response is then either a succesful response or an error:

``newtype Resp h = Resp (Either Err (Success h))``

It is now easy to implement `runMock`: we just map all the constructors in `Cmd` to the corresponding API calls, and wrap the result in the appropriate constructor of `Success`:

``````runMock :: Cmd MHandle -> Mock -> (Resp MHandle, Mock)
runMock (MkDir d)   = first (Resp . fmap Unit)   . mMkDir d
runMock (Open  f)   = first (Resp . fmap Handle) . mOpen  f
runMock (Write h s) = first (Resp . fmap Unit)   . mWrite h s
runMock (Close h)   = first (Resp . fmap Unit)   . mClose h
runMock (Read  f)   = first (Resp . fmap String) . mRead  f``````

where `first :: (a -> b) -> (a, x) -> (b, x)` comes from `Data.Bifunctor`.

We can write a similar interpreter for IO; it will take a `FilePath` as an additional argument that it will use as a prefix for all paths; we will use this to run the IO test in some temporary directory.

``runIO :: FilePath -> Cmd IO.Handle -> IO (Resp IO.Handle)``

## References

Our interpreter for IO takes real IO handles as argument; but will not have any real handles until we actually run the test. We need a way to generate commands that run in IO but don’t use real handles (yet). Here is where we see the first bit of infrastructure provided by `quickcheck-state-machine`, references:

``data Reference a r = Reference (r a)``

where we will instantiate that `r` parameter to either `Symbolic` or `Concrete`:3

``````data Symbolic a = Symbolic Var
data Concrete a = Concrete a``````

In other words, a `Reference a Concrete` is really just a wrapper around an `a`; indeed, `quickcheck-state-machine` provides

``````reference :: a -> Reference a Concrete
concrete  :: Reference a Concrete -> a``````

However, a `Reference a Symbolic` is a variable:

``newtype Var = Var Int``

An example of a program using symbolic references is

``````openThenWrite :: [Cmd (Reference IO.Handle Symbolic)]
openThenWrite = [
Open (File (Dir []) "a")
, Open (File (Dir []) "b")
, Write (Reference (Symbolic (Var 0))) "Hi"
]``````

This program corresponds precisely to our example from earlier: “open this file, open that file, then write to the first file we opened”. Commands can return as many symbolic references in their result values as they want4; in our simple example, only `Open` creates a new reference, and so `Var 0` returns to the handle returned by the first call to `Open`.

When we execute the test, those variables will be instantiated to their real values, turning symbolic references into concrete references. We will of course not write programs with symbolic references in them by hand; as we will see later, `quickcheck-state-machine` provides infrastructure for doing so.

Since we will frequently need to instantiate `Cmd` and `Resp` with references to handles, we will introduce some special syntax for this:

``````newtype At f r = At (f (Reference IO.Handle r))
type    f :@ r = At f r``````

For example, here is a wrapper around `runIO` that we will need that executes a command with concrete references:

``````semantics :: FilePath -> Cmd :@ Concrete -> IO (Resp :@ Concrete)
semantics root (At c) = (At . fmap reference) <\$>
runIO root (concrete <\$> c)``````

This is really just a call to `runIO`, with some type wrapping and unwrapping.

## Relating the two implementations

When we run our tests, we will execute the same set of commands against the mock implementation and in real IO, and compare the responses we get after each command. In order to compare, say, a command “write to this `MHandle`” against the mock file system to a command “write to this `IOHandle`” in IO, we need to know the relation between the mock handles and the IO handles. As it turns out, the most convenient way to store this mapping is as a mapping from references to the IO handles (either concrete or symbolic) to the corresponding mock handles.

``````type HandleRefs r = [(Reference IO.Handle r, MHandle)]

(!) :: Eq k => [(k, a)] -> k -> a
env ! r = fromJust (lookup r env)``````

Then to compare the responses from the mock file system to the responses from IO we need to keep track of the state of the mock file system and this mapping; we will refer to this as the model for our test:

``````data Model r = Model Mock (HandleRefs r)

initModel :: Model r
initModel = Model emptyMock []``````

The model must be polymorphic in `r`: during test generation we will instantiate `r` to `Symbolic`, and during test execution we will instantiate `r` to `Concrete`.

## Stepping the model

We want to work towards a function

``transition :: Eq1 r => Model r -> Cmd :@ r -> Resp :@ r -> Model r``

to step the model; we will gradually build up towards this. First, we can use the model to translate from commands or responses in terms of references to the corresponding commands or responses against the mock file system:

``````toMock :: (Functor f, Eq1 r) => Model r -> f :@ r -> f MHandle
toMock (Model _ hs) (At fr) = (hs !) <\$> fr``````

Specifically, this can be instantiated to

``toMock :: Eq1 r => Model r -> Cmd :@ r -> Cmd MHandle``

which means that if we have a command in terms of references, we can translate that command to the corresponding command for the mock file system and execute it:

``````step :: Eq1 r => Model r -> Cmd :@ r -> (Resp MHandle, Mock)
step m@(Model mock _) c = runMock (toMock m c) mock``````

In order to construct the full new model however we also need to know how to extend the handle mapping. We can compute this by comparing the response we get from the “real” semantics (`Resp :@ r`) to the response we get from the mock semantics (from `step`), and simply zip the handles from both responses together to obtain the new mapping. We wrap all this up into an event:

``````data Event r = Event {
before   :: Model  r
, cmd      :: Cmd :@ r
, after    :: Model  r
, mockResp :: Resp MHandle
}``````

and we construct an event from a model, the command we executed, and the response we got from the real implementation:

``````lockstep :: Eq1 r
=> Model   r
-> Cmd  :@ r
-> Resp :@ r
-> Event   r
lockstep m@(Model _ hs) c (At resp) = Event {
before   = m
, cmd      = c
, after    = Model mock' (hs <> hs')
, mockResp = resp'
}
where
(resp', mock') = step m c
hs' = zip (toList resp) (toList resp')``````

The function we mentioned at the start of this section is now easily derived:

``````transition :: Eq1 r => Model r -> Cmd :@ r -> Resp :@ r -> Model r
transition m c = after . lockstep m c``````

as well as a function that compares the mock response and the response from the real file system and checks that they are the same:

``````postcondition :: Model   Concrete
-> Cmd  :@ Concrete
-> Resp :@ Concrete
-> Logic
postcondition m c r = toMock (after e) r .== mockResp e
where
e = lockstep m c r``````

We will pass this function to `quickcheck-state-machine` to be run after every command it executes to make sure that the model and the real system do indeed return the same responses; it therefore does not need to be polymorphic in `r`. (`Logic` is a type introduced by `quickcheck-state-machine`; think of it as a boolean with some additional information, somewhat similar to `QuickCheck`’s `Property` type.)

Events will also be very useful when we label our tests; more on that later.

## Constructing symbolic responses

We mentioned above that we will not write programs with symbolic references in it by hand. Instead what will happen is that we execute commands in the mock file system, and then replace any of the generated handles by new variables. Most of this happens behind the scenes by `quickcheck-state-machine`, but we do need to give it this function to construct symbolic responses:

``````symbolicResp :: Model Symbolic
-> Cmd :@ Symbolic
-> GenSym (Resp :@ Symbolic)
symbolicResp m c = At <\$> traverse (const genSym) resp
where
(resp, _mock') = step m c``````

This function does what we just described: we use `step` to execute the command in the mock model, and then traverse the response, constructing a new (fresh) variable for each handle. `GenSym` is a monad defined in `quickcheck-state-machine` for the sole purpose of generating variables; we won’t use it anywhere else except in this function.

## Generating commands

To generate commands, `quickcheck-state-machine` requires a function that produces the next command given the current model; this function will be a standard `QuickCheck` generator. For our running example, the generator is easy to write:

``````generator :: Model Symbolic -> Maybe (Gen (Cmd :@ Symbolic))
generator (Model _ hs) = Just \$ QC.oneof \$ concat [
withoutHandle
, if null hs then [] else withHandle
]
where
withoutHandle :: [Gen (Cmd :@ Symbolic)]
withoutHandle = [
fmap At \$ MkDir <\$> genDir
, fmap At \$ Open  <\$> genFile
, fmap At \$ Read  <\$> genFile
]

withHandle :: [Gen (Cmd :@ Symbolic)]
withHandle = [
fmap At \$ Write <\$> genHandle <*> genString
, fmap At \$ Close <\$> genHandle
]

genDir :: Gen Dir
genDir = do
n <- QC.choose (0, 3)
Dir <\$> replicateM n (QC.elements ["x", "y", "z"])

genFile :: Gen File
genFile = File <\$> genDir <*> QC.elements ["a", "b", "c"]

genString :: Gen String
genString = QC.sized \$ \n -> replicateM n (QC.elements "ABC")

genHandle :: Gen (Reference IO.Handle Symbolic)
genHandle = QC.elements (map fst hs)``````

A few comments on the generator:

• When we generate paths, we choose from a very small set of directory and file names. We are not really interested in testing that, for example, our implementation is Unicode-safe here; by picking from a small known set we generate tests that are more likely to contain multiple references to the same file, without having to add special provisions to do so.
• We cannot, of course, generate handles out of nowhere; fortunately, the model tells us which handles we have available, and so `genHandle` just picks one at random from that. We do not limit the generator to picking open handles: it is important to also test the behaviour of the system in the error case where a program attempts to write to a closed handle.
• The `elements` function from `QuickCheck`, which picks a random element from a list, is partial: it will throw an error if the list is empty. We must be careful not to generate any commands requiring handles when we have no handles yet.
• The reason for the `Maybe` in the type signature is that in some circumstances a generator might be unable to generate any commands at all given a particular model state. We don’t need to take advantage of this feature and therefore always return `Just` a generator.

We can also define a shrinker for commands, but we can start with a shrinker that says that individual commands cannot be shrunk:

``````shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
shrinker _ _ = []``````

## Putting it all together

In order to actually start generating and running tests, `quickcheck-state-machine` wants us to bundle all of this functionality up in a single record:

``````sm :: FilePath -> StateMachine Model (At Cmd) IO (At Resp)
sm root = StateMachine {
initModel     = initModel
, transition    = transition
, precondition  = precondition
, postcondition = postcondition
, invariant     = Nothing
, generator     = generator
, distribution  = Nothing
, shrinker      = shrinker
, semantics     = semantics root
, mock          = symbolicResp
}``````

We have defined all of these functions above, with the exception of `precondition`. When `quickcheck-state-machine` finds a failing test, it will try to shrink it to produce a smaller test case. It does this by trying to remove commands from the program and then checking if the resulting program still “makes sense”: does the precondition of all commands still hold?

The precondition should be as liberal as possible; for instance, the precondition should not require that a file handle is open before writing to it, because we should also check that errors are handled correctly. Typically, the only thing the precondition should verify is that commands do not refer to non-existent handles (i.e., if we remove a call to open, then subsequent uses of the handle returned by that call to open simply cannot be executed anymore). Thus, we will define:

``````precondition :: Model Symbolic -> Cmd :@ Symbolic -> Logic
precondition (Model _ hs) (At c) =
forall (toList c) (`elem` map fst hs)``````

Aside. Although we do not need it for our running example, one other thing a precondition might do is rule out test cases that we explicitly don’t want to test. For example, if our mock file system throws an error in some cases because a particular combination of arguments to a call is simply not supported, then we don’t want to flag this as a bug. A clean way to implement this is to extend the error type with a field that marks an error as an intentional limitation; then the `precondition` can simply rule out any commands that (in the model) would return an error with this mark. This keeps the definition of the precondition itself simple, and the logic of what is and what isn’t an intentional limitation lives in the implementation of the model itself.

## Running the tests

Apart from some additional type class instances (see the code), we are now ready to define and execute the actual test:

``````prop_sequential :: FilePath -> QC.Property
prop_sequential tmpDir =
forAllCommands (sm rootUnused) Nothing \$ \cmds ->
tstTmpDir <- liftIO \$ createTempDirectory tmpDir "QSM"
let sm' = sm tstTmpDir
(hist, _model, res) <- runCommands sm' cmds
prettyCommands sm' hist
\$ checkCommandNames cmds
\$ res QC.=== Ok

rootUnused :: FilePath
rootUnused = error "root not used during command generation"``````

All functions prefixed `QC` are standard `QuickCheck` functions. The others come from `quickcheck-state-machine`:

• `forAllCommands` uses `QuickCheck`’s `forAllShrink` and instantiates it will the command generation and shrinking infrastructure from `quickcheck-state-machine`
• `runCommands` then executes the generated commands, validating the postcondition at every step.
• `prettyCommands` renders those commands in case of a test failure.
• `checkCommandNames` adds some statistics about the distribution of commands in the generated tests.

We can run the test in `ghci`:

``````> quickCheck (prop_sequential "./tmp")
+++ OK, passed 100 tests:
56% coverage

3% []
2% [("MkDir",1)]
...``````

It tells us that all tests passed, and gives us some statistics about the tests that were executed: in 3% of the cases, they contained a single `MkDir` and a single `Read`, 3% were completely empty, 2% contained one call to `MkDir`, one call to `Open`, one call to `Read`, and so on.

## Labelling

At this point you might conclude that you’re done. We have the real implementation, we have the mock implementation, they return identical results for all tests, what else could we want?

Let’s think back to those unit tests we were on the verge of writing, but stopped just in time because we remembered that we should generate unit tests, not write them:

• Write to two different files
• Write to a file and then read it

How do we know that our generated tests include these two cases (and all the other unit tests that we would have written)? We get some statistics from `quickcheck-state-machine`, but it’s not terribly helpful. For example, the first line above tells us that 3% of our test cases contain one call to `MkDir` and one call to `Read`; but we know that that call to `Read` must fail, because if these are the only two commands we execute, there aren’t any files for that `Read` to read.

The solution is to label tests. For example, we might introduce labels, or tags, that correspond to the two unit tests above:

``data Tag = OpenTwo | SuccessfulRead``

We then need to write a function that looks at a particular test case and checks which labels apply. It is important to realize that this does not mean we are bringing back the same unit tests under a different guise: programs that will be labelled `OpenTwo` must write to two different files, but may also do a whole bunch of other things.

We can use use the `foldl` package to write such a labelling function in a convenient manner. The labelling function will look at the `Event`s and produce `Tag`s. To check if we open (at least) two different files, we keep track of all the successful calls to open; for `SucessfulRead` we simply remember if we have seen a call to `Read` with a non-error result:

``````tag :: [Event Symbolic] -> [Tag]
tag = Foldl.fold \$ catMaybes <\$> sequenceA [
openTwo
]
where
openTwo :: Fold (Event Symbolic) (Maybe Tag)
openTwo = Fold update Set.empty extract
where
update :: Set File -> Event Symbolic -> Set File
update opened ev =
case (cmd ev, mockResp ev) of
(At (Open f), Resp (Right _)) ->
Set.insert f opened
_otherwise ->
opened

extract :: Set File -> Maybe Tag
extract opened = do
guard (Set.size opened >= 2)
return \$ OpenTwo

successfulRead :: Fold (Event Symbolic) (Maybe Tag)
successfulRead = Fold update False extract
where
update :: Bool -> Event Symbolic -> Bool
case (cmd ev, mockResp ev) of
(At (Read _), Resp (Right _)) ->
True
_otherwise ->
False

extract :: Bool -> Maybe Tag

(For a read to be successful, we must have created the file first – this is a property of the semantics, we don’t need to enforce this in the labeller.)

The commands we get back from the `forAllCommands` function of `quickcheck-state-machine` are of type `Commands`. This is a simple wrapper around a list of `Command`; `Command` in turn bundles a command (`Cmd`) along with its response (and the variables in that response). We can therefore easily turn this into a list of events:

``````execCmd :: Model Symbolic
-> Command (At Cmd) (At Resp)
-> Event Symbolic
execCmd model (Command cmd resp _vars) =
lockstep model cmd resp

execCmds :: Commands (At Cmd) (At Resp) -> [Event Symbolic]
execCmds = \(Commands cs) -> go initModel cs
where
go :: Model Symbolic
-> [Command (At Cmd) (At Resp)]
-> [Event Symbolic]
go _ []       = []
go m (c : cs) = e : go (after e) cs
where
e = execCmd m c``````

We can then define a variant on `prop_sequential` above that replaces `checkCommandNames` with our own labelling function (you could also use both, if you wanted to):

``````prop_sequential' :: FilePath -> QC.Property
prop_sequential' tmpDir =
forAllCommands (sm rootUnused) Nothing \$ \cmds ->
tstTmpDir <- liftIO \$ createTempDirectory tmpDir "QSM"
let sm' = sm tstTmpDir
(hist, _model, res) <- runCommands sm' cmds
prettyCommands sm' hist
\$ QC.tabulate "Tags" (map show \$ tag (execCmds cmds))
\$ res QC.=== Ok``````

Here `tabulate` is the standard QuickCheck function for adding multiple labels to a test case. If we now re-run our tests, we get

``````> quickCheck (prop_sequential' "./tmp")
+++ OK, passed 100 tests.

Tags (57 in total):
63% OpenTwo

The numbers here are a bit awkward to interpret: it says that across all tests a total of 57 labels were found, and of those 57 labels, 63% were `OpenTwo` and 37% were `SuccessfulRead`. In other words, 63% * 57 = 36 tags were `OpenTwo` (36 tests out of 100 were labelled as `OpenTwo`), and 37% * 57 = 21 tags were `SuccessfulRead` (21 tests out of 100 were labelled as `SuccessfulRead`). Note that it is perfectly possible for these two sets of tests to overlap (i.e., a single program can be tagged as both `OpenTwo` and `SuccessfulRead`).

## Inspecting the labelling function

So we have a generator, and we have labels for the unit tests that we didn’t write; have we covered our bases now? Well, how do we know that our labelling function is correct? We might seem to descent into infinite regress here, testing the tests that tests the tests… but it would be good to at least have a look at some test case examples and how they are labelled. Fortunately, this is functionality that QuickCheck provides out of the box through a function called `labelledExamplesWith`. We can define a simple wrapper around it that gives us the possibility to specify a particular seed (which will be useful once we start working on shrinking):

``````showLabelledExamples :: Maybe Int -> IO ()
showLabelledExamples mReplay = do
replaySeed <- case mReplay of
Nothing   -> getStdRandom \$ randomR (1, 999999)
Just seed -> return seed

putStrLn \$ "Using replaySeed " ++ show replaySeed

let args = QC.stdArgs {
QC.maxSuccess = 10000
, QC.replay     = Just (QC.mkQCGen replaySeed, 0)
}

QC.labelledExamplesWith args \$
forAllCommands (sm rootUnused) Nothing \$ \cmds ->
repeatedly QC.collect (tag . execCmds \$ cmds) \$
QC.property True``````

Instead of `tabulate` we must use `collect` to label examples when constructing examples; `collect` takes a single label at a time though, so we use my all-time favourite combinator to repeatedly call `collect` for all tags:

``````repeatedly :: (a -> b -> b) -> ([a] -> b -> b)
repeatedly = flip . List.foldl' . flip``````

(I wonder if I can patent this function?)

When we run this, we see something like

``````> showLabelledExamples Nothing
Using replaySeed 288187
*** Found example of OpenTwo
Commands [
.. Open (File (Dir []) "b")
, .. Open (File (Dir []) "a")
]

Commands [
.. Open (File (Dir []) "b")
, .. Close (Reference (Symbolic (Var 0)))
, .. Read (File (Dir []) "b")
]``````

(I’ve tidied up the output a bit and removed some redundant information.)

This is looking good: both of these look like the kind of examples we would have written ourselves for these tags.

## Standard shrinking

In fact, if we look at those test cases at the end of the previous section, you might be thinking that those examples look surprisingly good: not only are they indeed instances of those tags, but they are very small test cases: they are pretty much the unit tests that we would have written if we were so misguided as to think we need to write unit tests. Were we simply lucky?

No, we were not. QuickCheck’s `labelledExamples` not only searches for labelled test cases, it also tries to shrink them when it finds one, and continues to shrink them until it can shrink them no further without losing the label. The shrinker itself is implemented in `quickcheck-state-machine`; if we disable it altogether and re-run the search for labelled examples, we might find an example such as the following for `SuccessfulRead`, where for clarity I’ve marked all failing commands with an asterisk `(*)`:

``````Commands [
.. Open (File (Dir ["z", "x", "y"]) "c")    (*)
, .. MkDir (Dir ["z", "x")                    (*)
, .. Read (File (Dir ["z", "y", "y"]) "c")    (*)
, .. Read (File (Dir []) "c")                 (*)
, .. MkDir (Dir [ "x" , "x" ])                (*)
, .. Read (File (Dir ["x", "z", "x"]) "b")    (*)
, .. MkDir (Dir ["y"])
, .. MkDir (Dir [])                           (*)
, .. Open (File (Dir ["z"]) "b")              (*)
, .. Open (File (Dir []) "a")
, .. MkDir (Dir [])                           (*)
, .. Open (File (Dir ["x"]) "b")              (*)
, .. Close (Reference (Symbolic (Var 0)))
, .. Close (Reference (Symbolic (Var 0)))
, .. Open (File (Dir ["x", "y"]) "b")         (*)
, .. Open (File (Dir []) "b")
, .. MkDir (Dir ["y"])                        (*)
, .. Read (File (Dir []) "a")
, .. MkDir (Dir ["z"])
, .. Close (Reference (Symbolic (Var 0)))
, .. Close (Reference (Symbolic (Var 1)))
, .. Open (File (Dir ["z" , "z" , "y"]) "a")  (*)
, .. Open (File (Dir ["x" , "x" , "z"]) "c")  (*)
, .. Close (Reference (Symbolic (Var 0)))
, .. Close (Reference (Symbolic (Var 0)))
, .. Open (File (Dir ["y", "y"]) "b")         (*)
, .. Read (File (Dir ["x", "y", "x"]) "a")    (*)
]``````

This is looking significantly less ideal! If there was a bug in `read`, then this would certainly not be a very good minimal test case, and not something you would want to debug. So how does `quickcheck-state-machine` shrink tests? The basic approach is quite simple: it simply removes commands from the program. If the resulting program contains commands whose precondition is not satisfied (remember, for our running example this means that those commands would use handles that are no longer created) then it discards it as a possible shrink candidate; otherwise, it reruns the test, and if it still fails, repeats the process.

The large percentage of commands that are unsuccessful can easily be removed by `quickcheck-state-machine`:

``````Commands [
.. MkDir (Dir ["y"])
, .. Open (File (Dir []) "a")
, .. Close (Reference (Symbolic (Var 0)))
, .. Close (Reference (Symbolic (Var 0)))
, .. Open (File (Dir []) "b")
, .. Read (File (Dir []) "a")
, .. MkDir (Dir ["z"])
, .. Close (Reference (Symbolic (Var 0)))
, .. Close (Reference (Symbolic (Var 1)))
, .. Close (Reference (Symbolic (Var 0)))
, .. Close (Reference (Symbolic (Var 0)))
]``````

Both calls to `MkDir` can easily be removed, and the resulting program would still be tagged as `SuccessfulRead`; and the same is true for repeated closing of the same handle:

``````Commands [
.. Open (File (Dir []) "a")
, .. Close (Reference (Symbolic (Var 0)))
, .. Open (File (Dir []) "b")
, .. Read (File (Dir []) "a")
, .. Close (Reference (Symbolic (Var 1)))
]``````

At this point the shrinker cannot remove the second call to `Open` because the second call to `Close` depends on it, but it can first remove that second call to `Close` and then remove that second call to `Open`, and we end up with the minimal test case that we saw in the previous section:

``````Commands [
.. Open (File (Dir []) "a")
, .. Close (Reference (Symbolic (Var 0)))
, .. Read (File (Dir []) "a")
]``````

Perfect.

## Improving shrinking

Unfortunately, this does not mean that we can depend on `quickcheck-state-machine` to solve all our shrinking problems. Consider this run of `showLabelledExamples`:

``````> showLabelledExamples (Just 166205)
Using replaySeed 166205
*** Found example of OpenTwo
Commands [
.. MkDir (Dir ["x"])
, .. Open (File (Dir [])    "c")
, .. Open (File (Dir ["x"]) "a")
]``````

This is indeed an example of a program in which we open at least two files; however, why is that call to `MkDir` still there? After all, if there was a bug in opening more than one file, and the “minimal” test case would include a call to `MkDir`, that would be a red herring which might send the person debugging the problem down the wrong path.

The reason that `quickcheck-state-machine` did not remove the call to `MkDir`, of course, is that without it the second call to `Open` would fail: it tries to create a file in directory `x`, and if that directory does not exist, it would fail. To fix this, we need to tell `quickcheck-state-machine` how to shrink individual commands; so far we have been using

``````shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
shrinker _ _ = []``````

which says that individual commands cannot be shrunk at all. So how might we shrink an `Open` call? One idea might be to shrink the directory, replacing for instance `/x/y/a` by `/x/a` or indeed just `/a`. We can implement this using

``````shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
shrinker _ (At cmd) =
case cmd of
Open (File (Dir d) f) ->
[At \$ Open (File (Dir d') f) | d' <- QC.shrink d]
_otherwise ->
[]``````

If we use this shrinker and run `showLabelledExamples` a number of times, we will find that all the examples of `OpenTwo` are now indeed minimal… until it finds an example that isn’t:

``````> showLabelledExamples (Just 980402)
Using replaySeed 980402
*** Found example of OpenTwo
Commands [
.. MkDir (Dir ["x"]))
, .. Open (File (Dir [])    "a")
, .. Open (File (Dir ["x"]) "a")
]``````

In this example we cannot shrink the directory to `[]` because the resulting program would try to open the same file twice, which is not allowed (“resource busy”). We need a better way to shrink this program.

What we want to implement is “try to replace the file path by some file in the (local) root. It is important to realize however that a shrinker such as

``````shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
shrinker _ (At cmd) =
case cmd of
Open _ -> -- BAD EXAMPLE
[At \$ Open (File (Dir []) f') | f' <- ["t1", "t2"]]
_otherwise ->
[]``````

is a bad idea. This shrinker tries to replace any file path with either `/t1` or `/t2`. However, this means that shrinking now never stops:

``Open (File (Dir [] "t1"))``

can be shrunk to

``Open (File (Dir [] "t2"))``

which can then be shrunk back to

``Open (File (Dir [] "t1"))``

and QuickCheck will loop when trying to shrink the test case. It is important that there is a clear direction to shrinking.

An approach that does work is the following: any file path that doesn’t start with a `t` can be replaced by path `/t100`; moreover, any `/tN` (for some number N) can be replaced by `/tN'` for some number N’ < N:

``````shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
shrinker _ (At cmd) =
case cmd of
Open (File (Dir []) ('t' : n)) ->
[openTemp n' | n' <- QC.shrink (read n)]
Open _ ->
[openTemp 100]
_otherwise ->
[]
where
openTemp :: Int -> Cmd :@ Symbolic
openTemp n = At \$ Open (File (Dir []) ('t' : show n))``````

Now

``````Commands [
.. MkDir (Dir ["x"]))
, .. Open (File (Dir [])    "a")
, .. Open (File (Dir ["x"]) "a")
]``````

can shrink to

``````Commands [
.. MkDir (Dir ["x"]))
, .. Open (File (Dir []) "a")
, .. Open (File (Dir []) "t100")
]``````

at which point the call to `MkDir` can be removed; this will eventually shrink down to

``````Commands [
.. Open (File (Dir []) "t0")
, .. Open (File (Dir []) "t1")
]``````

## Dependencies between commands

It’s been a long road, but we are almost there. The last thing we need to discuss is how to shrink programs with dependencies. The “open at least two” example above was relatively easy to shrink because we could shrink one command at the time. Sometimes however there are dependencies between commands. For example, consider this “minimal” example of “successful read”:

``````> showLabelledExamples (Just 617213)
Using replaySeed 617213
Commands [
.. MkDir (Dir [ "z" ])
, .. Open (File (Dir ["z"]) "b")
, .. Close (Reference (Symbolic (Var 0)))
, .. Read (File (Dir ["z"]) "b")
]``````

As before, we have a call to `MkDir` which should ideally be removed. However, if we tried to change the path in the call to `Open`, the call to `Read` would fail: both of these commands should refer to the same path. But shrinking can only change one command at a time, and this is important to keep the computational complexity (runtime cost) of shrinking down. What to do?

The problem is that we want that call to `Read` to use the same path as the call to `Open`, but we have no way to express this. The solution is to make this expressible. After all, we can already express “the handle returned by that call to `Open`”; all we need to do is introduce a second kind of reference: a reference to a path.

The language of commands changes to

``data Cmd fp h = Read (Expr fp) | -- rest as before``

`Cmd` gets an additional type argument `fp` to record the types of paths, and instead of a `File`, `Read` now takes an `Expr` as argument:

``data Expr fp = Val File | Var fp``

We can of course still use a concrete path, as before, but we can also use a variable: “use the same path as used in that call to open”. This means that `Open` must return that reference, so `Success` and `Resp` get adjusted accordingly:

``````data Success fp h = Handle fp h | -- rest as before
newtype Resp fp h = Resp (Either Err (Success fp h))``````

Just like was the case for handles, when we actually run code all variables have been resolved, so the interpreter isn’t any more difficult:

``````runMock :: Cmd File MHandle -> Mock -> (Resp File MHandle, Mock)
runMock (Open f) = first (Resp . fmap (Handle f)) . mOpen f
runMock (Read e) = first (Resp . fmap String)     . mRead (eval e)
-- rest as before

eval :: Expr File -> File
eval (Val f) = f
eval (Var f) = f``````

The IO interpreter is modified similarly. Most of the rest of the changes to the code are minor and mostly automatic. For example, `At` must now instantiate both parameters

``newtype At f r = At (f (Reference File r) (Reference IO.Handle r))``

the model must record the mapping from file references now too

``````type FileRefs r = [(Reference File r, File)]
data Model    r = Model Mock (FileRefs r) (HandleRefs r)``````

See `Version2.hs` in the example package for details.

Crucially, we can now take advantage of this in the shrinker: when we see a call to `Read` with a file path that we have seen before, we can shrink that to use a variable instead:

``````shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
shrinker (Model _ fs _) (At cmd) =
case cmd of
[At \$ Read (Var r) | r <- mapMaybe (matches f) fs]
-- other cases as before
where
matches :: File -> (r, File) -> Maybe r
matches f (r, f') | f == f'   = Just r
| otherwise = Nothing``````

This means that the problematic example5

``````> showLabelledExamples (Just 617213)
Using replaySeed 617213
Commands [
.. MkDir (Dir [ "z" ])
, .. Open (File (Dir ["z"]) "b")
, .. Close (Reference (Symbolic (Var 1)))
, .. Read (Val (File (Dir ["z"]) "b"))
]``````

can now shrink to

``````Commands [
.. MkDir (Dir [ "z" ])
, .. Open (File (Dir ["z"]) "b")
, .. Close (Reference (Symbolic (Var 1)))
]``````

At this point the shrinking for `Open` that we defined above can kick in, replacing `z/b` with `/t100`, making the call to `MkDir` redundant, and the example can ultimately shrink to

``````Commands [
.. Open (File (Dir []) "t0")
, .. Close (Reference (Symbolic (Var 1)))
]``````

## Conclusions

Writing unit tests by hand is problematic for at least two reasons. First, as the system grows in complexity the number of interactions between the various features will be impossible to capture in hand written unit tests. Second, unit tests can only exercise the system in ways that the author of the unit tests foresees; often, bugs arise when the system is exercised in ways that were not foreseen. It is therefore important not to write unit tests, but rather generate them. We have seen how a library such as `quickcheck-state-machine` can assist in generating and shrinking meaningful sequences of API calls.

We have also seen why it is important to label test cases, how to inspect the labelling function, and how to use labelling to improve shrinking. Without writing a shrinker for individual commands, the standard `quickcheck-state-machine` shrinker already does a really good job at removing redundant commands. However, if we are serious about producing minimal test cases without red herrings that might lead debugging efforts astray (and we should be serious about that), then we also need to put some thought into shrinking individual commands.

Finally, we have seen how we can improve shrinking by making dependencies between commands explicit. This also serves as an example of a language with multiple types of references; the approach we put forth in this blog post essentially scales to an arbitrary number of types of references without much difficulty.

We have not talked about running parallel tests at all in this blog post. This is an interesting topic in its own right, but affects only the very outermost level of the test: `prop_sequential` would get an analogue `prop_parallel`, and nothing else would be affected; the `quickcheck-state-machine` documentation shows how to do this; the original paper describing the approach (in Erlang) by John Hughes and others is also well worth a read. Finally, `quickcheck-state-machine` is not the only library providing this functionality in Haskell; in particular, hedgehog, an alternative to QuickCheck, does also.

The way we set up the tests in this blog post is not the only way possible, but is one that we believe leads to a clean design. The running example in this blog post is a simplified version of a mock file system that we use in the new Ouroboros consensus layer, where we use it to simulate file system errors when testing a blockchain database. The tests for that blockchain database in turn also use the design patterns laid out in this blog post, and we have used those design patterns also in a test we added to `quickcheck-state-machine` itself to test the shrinker. Of course, being Haskellers, we would prefer to turn design patterns into actual code; indeed, we believe this to be possible, but it may require some more advanced type system features. This is something we want to investigate further.

#### Footnotes

1. Mapping from `IllegalOperation` to `HandleClosed` is bit too coarse, but suffices for the sake of this blog post.

2. This has a somewhat unpleasant untyped feel to it. However, if the design patterns proposed in this blog post are used, this barely affects any code that we write: we never (or almost never) have to pattern match on that `Success` type.

3. I’m eliding a `Typeable` constraint here.

4. Technically, `quickcheck-state-machine` uses a function `Resp :@ Symbolic -> [Var]` to find all the references “bound” by a command.

5. The `Close` call now refers to `Var 1` instead of `Var 0`, because `Open` now creates two variables: the reference to the path and the reference to the handle.

### Tweag I/O

Matthias Meschede, Juan Simões

## Introduction

Haskell and data science - on first sight a great match: native function composition, lazy evaluation, fast execution times, and lots of code checks. These sound like ingredients for scalable, production-ready data transformation pipelines. What is missing then? Why is Haskell not widely used in data science?

One of the reasons is that Haskell lacks a standardized data analysis environment. For example, Python has a de facto standard library set with `numpy`, `pandas` and `scikit-learn` that form the backbone, and many other well-supported specialized libraries such as `keras` and `tensorflow` that are easily accessible. These libraries are distributed with user friendly package managers and explained in a plethora of tutorials, Stack Overflow questions and millions of Jupyter notebooks. Most problems from beginner to advanced level can be solved by adapting and combining these existing solutions.

This post presents Jupyter and JupyterLab - both important ingredients of the Python ecosystem - and shows how we can take advantage of these tools for interactive data analysis in Haskell.

## Jupyter and exploratory data analysis

Project Jupyter became famous through the browser-based notebook app that allows to execute code in various compute environments and interlace it with text and media elements (example gallery).

More generally, Project Jupyter standardizes the interactions between Jupyter frontends such as the notebook and Jupyter kernels, the compute environments. Typically a kernel receives a message from a frontend, executes some code, and responds with a rich media message. The frontend can render the response message as text, images, videos or small applications. All exchanged messages are standardized by the Jupyter protocol and therefore independent of the specific frontend or kernel that is used. Various frontends and kernels for many different languages, like Python, Haskell, R, C++, Julia, etc, exist.

Quick REPL-like interaction of a user with a compute environment via a frontend is very useful for exploratory data analysis. The user interrogates and transforms the data with little code snippets and receives immediate feedback through rich media responses. Different algorithms (expressed as short code snippets) can rapidly be prototyped and visualized. Long multistep dialogues with the kernel can be assembled into a sequential notebook. Notebooks mix explanatory text, code and media elements and can therefore be used as human-readable reports (such as this blogpost). This REPL workflow has become one of the most popular ways for exploratory data analysis.

## Conversations with a Jupyter kernel

IHaskell is the name of the Jupyter kernel for Haskell. It contains a little executable `ihaskell` that can receive and respond to messages in the Jupyter protocoll (via ZeroMQ. Here is a little dialogue that is initiated by sending the following code snippet from the notebook frontend to `ihaskell`:

``````take 10 \$ (^2) <\$> [1..]
``````

And here is the rendered answer that the frontend received from the kernel

`[1,4,9,16,25,36,49,64,81,100]`

In Jupyter parlance the above dialogue corresponds to the following `execute_request`:

``````>> shell.execute_request (8be63d5c-1170-495d-82da-e56272052faf) <<

session: "32fe9cd0-8c37-450e-93c0-6fbd45bfdcd9",
msg_id: "8be63d5c-1170-495d-82da-e56272052faf",
msg_type: "execute_request"}
channel: "shell"
content: {silent: false, store_history: true, user_expressions: Object,
allow_stdin: true, stop_on_error: true,
code: "take 10 \$ (^2) <\$> [1..]"}   <<<<< LOOK HERE
buffers: Array[0]
``````

and to the following `display_data` message that is received as a response

``````<< iopub.display_data (68cce1e7-4d60-4a20-a707-4bf352c4d8d2) >>

msg_id: "68cce1e7-4d60-4a20-a707-4bf352c4d8d2",
session: "32fe9cd0-8c37-450e-93c0-6fbd45bfdcd9",
date: "2018-08-02T08:14:10.245877Z"}
msg_id: "68cce1e7-4d60-4a20-a707-4bf352c4d8d2"
msg_type: "display_data"
msg_id: "8be63d5c-1170-495d-82da-e56272052faf",
session: "32fe9cd0-8c37-450e-93c0-6fbd45bfdcd9"}
content: {data: {text/plain: "[1,4,9,16,25,36,49,64,81,100]"},  <<<<< LOOK HERE
buffers: Array[0]
channel: "iopub"
``````

`ihaskell` can import Haskell libraries dynamically and has some special commands to enable language extensions, print type information or to use Hoogle.

## JupyterLab

JupyterLab is the newest animal in the Jupyter frontend zoo, and it is arguably the most powerful: console, notebook, terminal, text-editor, or image viewer, JupyterLab integrates these data science building blocks into a single web based user interface. JupyterLab is setup as a modular system that can be extended. A module assembles the base elements, changes or add new features to build an IDE, a classical notebook or even a GUI where all interactions with the underlying execution kernels are hidden behind graphical elements.

How can Haskell take advantage of JupyterLab's capacities? To begin with, JupyterLab provides plenty of out-of-the-box renderers that can be used for free by Haskell. From the default renderers, the most interesting is probably Vega plotting. But also `geojson`, `plotly` or and many other formats are available from the list of extensions, which will certainly grow. Another use case might be to using the JupyterLab extension system makes it easy to build a simple UI that interact with an execution environment. Finally, Jupyter and associated workflows are known by a large community. Using Haskell through these familiar environments softens the barrier that many encounter when exploring Haskell for serious data science.

Let's get into a small example that shows how to use the JupyterLab VEGA renderer with IHaskell.

## Wordclouds using Haskell, Vega and JupyterLab

We will use here the word content of all blog posts of `tweag.io`, which are written in markdown text files. The following little code cell that reads all `.md` files in the `posts` folder and concatenates them into a single long string from which we remove some punctuation characters. This code cell is then sent to the `ihaskell` kernel, which responds to the last `take` function with a simple text response.

``````:ext QuasiQuotes
import System.Directory
import Data.List

fnames <- getDirectoryContents "../../posts"
paths = ("../../posts/"++) <\$> fnames
md_files = filter (isSuffixOf ".md") paths
text <- mconcat (readFile <\$> md_files)
cleanedText = filter (not . (`elem` "\n,.?!-:;\"\'")) text
take 50 cleanedText
``````
``````"title Were hiring<br>(Software engineer / devops)p"
``````

The VEGA visualization specification for a wordcloud can be defined as a JSON string that is filled with our text data. A convenient way to write longer multiline strings in Haskell are `QuasiQuotes`. We use `fString` QuasiQuotes from the `PyF` package. Note that `{}` fills in template data and `{{` corresponds to an escaped `{`.

``````import Formatting
import PyF
import Data.String.QQ

let vegaString = [fString|{{
"\$schema": "https://vega.github.io/schema/vega/v4.json",
"width": 800,
"height": 400,

"data": [
{{
"name": "table",
"values": [
"{take 20000 cleanedText}"
],
"transform": [
{{
"type": "countpattern",
"field": "data",
"case": "upper",
"pattern": "[\\\\w']{{3,}}",
}},
{{
"type": "formula", "as": "angle",
"expr": "[0, 90][~~(random() * 3)]"
}},
{{
"type": "formula", "as": "weight",
"expr": "if(datum.text=='VEGA', 600, 300)"
}}
]
}}
],

"scales": [
{{
"name": "color",
"type": "ordinal",
"range": ["#3e4593", "#bc3761", "#39163d", "#2a1337"]
}}
],

"marks": [
{{
"type": "text",
"from": {{"data": "table"}},
"encode": {{
"enter": {{
"text": {{"field": "text"}},
"align": {{"value": "center"}},
"baseline": {{"value": "alphabetic"}},
"fill": {{"scale": "color", "field": "text"}}
}},
"update": {{
"fillOpacity": {{"value": 1}}
}},
"hover": {{
"fillOpacity": {{"value": 0.5}}
}}
}},
"transform": [
{{
"type": "wordcloud",
"size": [800, 400],
"text": {{"field": "text"}},
"rotate": {{"field": "datum.angle"}},
"font": "Helvetica Neue, Arial",
"fontSize": {{"field": "datum.count"}},
"fontWeight": {{"field": "datum.weight"}},
"fontSizeRange": [12, 56],
}}
]
}}
]
}}|]
``````

We display this JSON string with the native Jupyterlab JSON renderer here for convenience. The Display.json function from IHaskell annotates the content of the display message as `application/json` and `ihaskell` sends the annotated display message to Jupyterlab. In consequence, Jupyterlab knows that it should use its internal `application/json` renderer to display the message in the frontend.

``````import qualified IHaskell.Display as D
D.Display [D.json vegaString]
``````

In a similar way, we can annotate the display message with the `application/vnd.vegalite.v2+json` MIME renderer type. The VEGA string that we have generated earlier is then rendered with the internal JupyterLab javascript VEGA code:

``````D.Display [D.vegalite vegaString]
``````

## Conclusion

JupyterLab provides a REPL-like workflow that is convenient for quick exploratory data analysis and reporting. Haskell can benefit in particular from JupyterLab's rendering capacities, its pleasant user interface and its familiarity in the data science community. If you want to try it yourself, this blog post has been written directly as a notebook that is stored in this folder (with a dataset from Wikipedia). The IHaskell setup with this example notebook is available as a docker image that you can run with:

``````docker run -p 8888:8888 tweag/jupyterwith:latest
``````

Alternatively, if you are a Nix user, you can try our declarative JupyterLab-on-Nix setup that is the subject of our next post.

# Release delays with Stackage

Summary: There are two steps that delay new versions of packages in Stackage.

I aim to get the latest version of my software out to as many people as quickly as possible. Older versions have bugs, new versions have new features - that's why I release new versions. Unfortunately there are two steps in Stackage that slow down this process.

Taking an example, HLint depends on `haskell-src-exts`, and is tightly coupled, so (to a first approximation) every 0.1 bump to `haskell-src-exts` requires changing HLint. There are also lots of other packages that depend on `haskell-src-exts`. This situation leads to two delays in getting HLint to Stackage users, both of which are on display in bug 4214:

Issue 1: Reluctance to remove packages

Stackage has a policy that if a new package (e.g. `haskell-src-exts`) is released which breaks your package (e.g. `haskell-src-meta`) you have an unspecified amount of time to release an update. My experience is either packages are updated quickly (all upgrades on that ticket happened within 12 days) or the package maintainers never reply (46 days later no other maintainer has even left a comment).

It used to be the case that there were hard time limits (maximum one month), but my experience was those were never enforced. Unfortunately this lag can cause a significant delay until Stackage Nightly picks up an upgrade. It seems like a more mechanical rule (e.g. after 2 weeks with no update, or 6 weeks total) might keep the process ticking faster. I appreciate it's hard to break people's work, which is why making it come down to human judgement seems to lengthen the process significantly.

Delay imposed: up to 2 months, and sometimes requires chasing.

Issue 2: Existence of Stackage LTS

While the first issue is very much a trade off, the second one is (in my view) just a bad design of Stackage, as I've said before. There is Stackage Nightly which has the latest code. There is Stackage LTS which has older and therefore buggier code, up to 2-3 months older. Having two options is fine, but the `stack` tool and documentation direct people towards LTS as a preference. LTS is useful if you view the act of upgrading between 0.0.1 versions as low risk (which it isn't) or you find it easier to fix multiple distinct breaking changes when they are overlapped (which it isn't). Unfortunately Stackage LTS users won't get a new version of HLint until a new Stackage LTS version is created, even after it gets merged. On the plus side, this process happens automatically without any intervention by package authors.

Delay imposed: 2-3 months.

PS. While I criticise Stackage, that's because I want to make it better, since it is a very useful distribution channel for many people, and I'm grateful for the work the Stackage maintainers do to keep the process ticking along.

# Why Haskell III: Parametric Types

Welcome back to our series on the simplicity of Haskell's data declarations. Last week, we looked at how to express sum types in different languages. We saw that they fit very well within Haskell's data declaration system. For Java and Python, we ended up using inheritance, which presents some interesting benefits and drawbacks. We'll explore those more next week. But first, we should wrap our heads around one more concept: parametric types.

We'll see how each of these languages allows for the concept of parametric types. In my view, Haskell does have the cleanest syntax. But other compiled languages do pretty well to incorporate the concept. Dynamica languages though, provide insufficient guarantees for my liking.

This all might seem a little wild if you haven't done any Haskell at all yet! Read our Liftoff Series to get started!

Let's remember how easy it is to do parametric types in Haskell. When we want to parameterize a type, we'll add a type variable after its name in the definition. Then we can use this variable as we would any other type. Remember our `Person` type from the first week? Here's what it looks like if we parameterize the `occupation` field.

``````data Person o = Person
{ personFirstName :: String
, personLastName :: String
, personEmail :: String
, personAge :: Int
, personOccupation :: o
}``````

We add the `o` at the start, and then we can use `o` in place of our former `String` type. Now whenever we use the `Person` type, we have to specify a type parameter to complete the definition.

``````data Occupation = Lawyer | Doctor | Engineer

person1 :: Person String
person1 = Person "Michael" "Smith" "msmith@gmail.com" 27 "Lawyer"

person2 :: Person Occupation
person2 = Person "Katie" "Johnson" "kjohnson@gmail.com" 26 Doctor``````

When we define functions, we can use a specific version of our parameterized type if we want to constrain it. We can also use a generic type if it doesn't matter.

``````salesMessage :: Person Occupation -> String
salesMessage p = case personOccupation p of
Lawyer -> "We'll get you the settlement you deserve"
Doctor -> "We'll get you the care you need"
Engineer -> "We'll build that app for you"

fullName :: Person o -> String
fullName p = personFirstName p ++ " " ++ personLastName p``````

Last of all, we can use a typeclass constraint on the parametric type if we only need certain behaviors:

``````sortOnOcc :: (Ord o) => [Person o] -> [Person o]
sortOnOcc = sortBy (\p1 p2 -> compare (personOccupation p1) (personOccupation p2)``````

## Java Generic Types

Java has a comparable concept called generics. The syntax for defining generic types is pretty clean. We define a type variable in brackets. Then we can use that variable as a type freely throughout the class definition.

``````public class Person<T> {
private String firstName;
private String lastName;
private String email;
private int age;
private T occupation;

public Person(String fn, String ln, String em, int age, T occ) {
this.firstName = fn;
this.lastName = ln;
this.email = em;
this.age = age;
this.occupation = occ;
}

public T getOccupation() { return this.occupation; }
public void setOccupation(T occ) { this.occupation = occ; }
...
}``````

There's a bit of a wart in how we pass constraints. This comes from the Java distinction of interfaces from classes. Normally, when you define a class and state the subclass, you would use the `extends` keyword. But when your class uses an interface, you use the `implements` keyword.

But with generic type constraints, you only use `extends`. You can chain constraints together with `&`. But if one of the constraints is a subclass, it must come first.

``public class Person<T extends Number & Comparable & Serializable> {``

In this example, our template type `T` must be a subclass of `Number`. It must then implement the `Comparable` and `Serializable` interfaces. If we mix the order up and put an interface before the parent class, it will not compile:

``public class Person<T extends Comparable & Number & Serializable> {``

## C++ Templates

For the first time in this series, we'll reference a little bit of C++ code. C++ has the idea of "template types" which are very much like Java's generics. Here's how we can create our user type as a template:

``````template <class T>
class Person {
public:
string firstName;
string lastName;
string email;
int age;
T occupation;

bool compareOccupation(const T& other);
};``````

There's a bit more overhead with C++ though. C++ function implementations are typically defined outside the class definition. Because of this, you need an extra leading line for each of these stating that `T` is a template. This can get a bit tedious.

``````template <class T>
bool Person::compareOccupation(const T& other) {
...
}``````

One more thing I'll note from my experience with C++ templates. The error messages from template types can be verbose and difficult to parse. For example, you could forget the `template` line above. This alone could cause a very confusing message. So there's definitely a learning curve. I've always found Haskell's error messages easier to deal with.

## Python - The Wild West!

Since Python isn't compiled, there aren't type constraints when you construct an object. Thus, there is no need for type parameters. You can pass whatever object you want to a constructor. Take this example with our user and occupation:

``````class Person(object):

# This definition hasn't changed!
def __init__(self, fn, ln, em, age, occ):
self.firstName = fn
self.lastName = ln
self.email = em
self.age = age
self.occupation = occ

stringOcc = "Lawyer"
person1 = Person(
"Michael",
"Smith",
"msmith@gmail.com",
27,
stringOcc)

class Occupation(object):
â€¦

classOcc = Occupation()

# Still works!
person2 = Person(
"Katie",
"Johnson",
"kjohnson@gmail.com",
26,
classOcc)``````

Of course, with this flexibility comes great danger. If you expect there are different types you might pass for the occupation, your code must handle them all! Without compilation, it can be tricky to know you can do this. So while you can do polymorphic code in Python, you're more limited. You shouldn't get too carried away, because it is more likely to blow up in your face.

## Conclusion

Now that we know about parametric types, we have more intuition for the idea of filling in type holes. This will come in handy next week as we look at Haskell's typeclass system for sharing behaviors. We'll compare the object oriented notion of inheritance and Haskell's typeclasses. This distinction gets to the core of why I've come to prefer Haskell as a language. You won't want to miss it!

If these comparisons have intrigued you, you should give Haskell a try! Download our Beginners Checklist to get started!

# Tagless final and Scotty

For a little while I’ve been playing around with event sourcing in Haskell using `Conduit` and `Scotty`. I’ve come far enough that the basic functionality I’m after is there together with all those little bits that make it a piece of software that’s fit for deployment in production (configuration, logging, etc.). There’s just one thing that’s been nagging me, testability.

The app is built of two main parts, a web server (`Scotty`) and a pipeline of stream processing components (`Conduit`). The part using `Scotty` is utilising a simple monad stack, `ReaderT Config IO`, and the `Conduit` part is using `Conduit In Out IO`. This means that in both parts the outer edge, the part dealing with the outside world, is running in `IO` directly. Something that isn’t really aiding in testing.

I started out thinking that I’d rewrite what I have using a free monad with a bunch of interpreters. Then I remembered that I have “check out tagless final”. This post is a record of the small experiments I did to see how to use it with `Scotty` to achieve (and actually improve) on the code I have in my production-ready code.

## 1 - Use tagless final with `Scotty`

As a first simple little experiment I wrote a tiny little web server that would print a string to `stdout` when receiving the request to `GET /route0`.

The printing to `stdout` is the operation I want to make abstract.

``````class Monad m => MonadPrinter m where
mPutStrLn :: Text -> m ()``````

I then created an application type that is an instance of that class.

``````newtype AppM a = AppM { unAppM :: IO a }

mPutStrLn t = liftIO \$ putStrLn (unpack t)``````

Then I added a bit of `Scotty` boilerplate. It’s not strictly necessary, but does make the code a bit nicer to read.

``````type FooM = ScottyT Text AppM
type FooActionM = ActionT Text AppM

foo :: MonadIO m => Port -> ScottyT Text AppM () -> m ()
foo port = scottyT port unAppM``````

With that in place the web server itself is just a matter of tying it all together.

``````main :: IO ()
main = do
foo 3000 \$ do
get "/route0" \$ do
lift \$ mPutStrLn "getting /route0"
json \$ object ["route0" .= ("ok" :: String)]

That was simple enough.

In order to try out how to deal with configuration I added a class for doing some simple logging

``````class Monad m => MonadLogger m where
mLog :: Text -> m ()``````

The straight forward way to deal with configuration is to create a monad stack with `ReaderT` and since it’s logging I want to do the configuration consists of a single `LoggerSet` (from `fast-logger`).

``````newtype AppM a = AppM { unAppM :: ReaderT LoggerSet IO a }

That means the class instance can be implemented like this

``````instance MonadLogger AppM where
mLog msg = do
liftIO \$ pushLogStrLn ls \$ toLogStr msg``````

Of course `foo` has to be changed too, and it becomes a little easier with a wrapper for `runReaderT` and `unAppM`.

``````foo :: MonadIO m => LoggerSet -> Port -> ScottyT Text AppM () -> m ()
foo ls port = scottyT port (`runAppM` ls)

runAppM :: AppM a -> LoggerSet -> IO a
runAppM app ls = runReaderT (unAppM app) ls``````

With that in place the printing to `stdout` can be replaced by a writing to the log.

``````main :: IO ()
main = do
ls <- newStdoutLoggerSet defaultBufSize
foo ls 3000 \$ do
get "/route0" \$ do
lift \$ mLog "log: getting /route0"
json \$ object ["route0" .= ("ok" :: String)]

Not really a big change, I’d say. Extending the configuration is clearly straight forward too.

## 3 - Per-request configuration

At work we use correlation IDs1 and I think that the most convenient way to deal with it is to put the correlation ID into the configuration after extracting it. That is, I want to modify the configuration on each request. Luckily it turns out to be possible to do that, despite using `ReaderT` for holding the configuration.

I can’t be bothered with a full implementation of correlation ID for this little experiment, but as long as I can get a new `AppM` by running a function on the configuration it’s just a matter of extracting the correct header from the request. For this experiment it’ll do to just modify an integer in the configuration.

I start with defining a type for the configuration and changing `AppM`.

``````type Config = (LoggerSet, Int)

newtype AppM a = AppM { unAppM :: ReaderT Config IO a }

The logger instance has to be changed accordingly of course.

``````instance MonadLogger AppM where
mLog msg = do
liftIO \$ pushLogStrLn ls \$ toLogStr msg <> toLogStr (":" :: String) <> toLogStr (show i)``````

The `get` function that comes with `scotty` isn’t going to cut it, since it has no way of modifying the configuration, so I’ll need a new one.

``````mGet :: ScottyError e => RoutePattern -> ActionT e AppM () -> ScottyT e AppM ()
mGet p a = get p \$ do
withCfg (\ (ls, i) -> (ls, succ i)) a``````

The tricky bit is in the `withCfg` function. It’s indeed not very easy to read, I think

``````withCfg = mapActionT . withAppM
where
mapActionT f (ActionT a) = ActionT \$ (mapExceptT . mapReaderT . mapStateT) f a
withAppM f a = AppM \$ withReaderT f (unAppM a)``````

Basically it reaches into the guts of `scotty`’s `ActionT` type (the details are exposed in `Web.Scotty.Internal.Types`, thanks for not hiding it completely), and modifies the `ReaderT Config` I’ve supplied.

The new server has two routes, the original one and a new one at `GET /route1`.

``````main :: IO ()
main = do
putStrLn "Starting"
ls <- newStdoutLoggerSet defaultBufSize
foo (ls, 0) 3000 \$ do
get "/route0" \$ do
lift \$ mLog "log: getting /route0"
json \$ object ["route0" .= ("ok" :: String)]
mGet "/route1" \$ do
lift \$ mLog "log: getting /route1"
json \$ object ["route1" .= ("bar" :: String)]

It’s now easy to verify that the original route, `GET /route0`, logs a string containing the integer ‘0’, while the new route, `GET /route1`, logs a string containing the integer ‘1’.

1. If you don’t know what it is you’ll find multiple sources by searching for “http correlation-id”. A consistent approach to track correlation IDs through microservices is as good a place to start as any.

# Kids Coding, Part 4

Previous lessons have all been focused on teaching our ten and eight year olds some coding, since our six year old (Yakov) is still working on reading and writing in English. However, Yakov’s been home sick all week, and he asked me today to teach him some programming. So I came up with a simplified version focused solely on the GHCi prompt.

I started off with:

``````> x = 32
> x + x
``````

And then I asked him what he thought the answer would be. He quickly came back with 64. Then I typed in:

``````> y = 5
> x + y
``````

And he got 37 pretty quickly. We played around with a few more bits of simple arithmetic (he hasn’t really mastered multiplication yet). I introduced defining variables in terms of other variables:

``````> x = 2 * 3
> y = x * 2
> y + 3
``````

This took him a bit longer, but entirely due to the multiplication! This showed me a few things:

1. Reassigning variables was not confusing for him in the least
2. Variable replacement was similarly trivial

Finally, I decided to push things just a bit further and introduce functions:

``````> f x = x + 3
> f 7
``````

This confused him at first, but once I explained that this was applying the function `f` to the number 7, he got it, and said “oh, it’s the +3 function.” (Remember from last time that he’s been playing the function game for a while.) Next I hit:

``````> x = f 0
> f x
``````

This was easy: it’s 6! Finally I gave him two more challenging ones:

``````> f (f 0)
> f (f (f 10))
``````

I fully expected confusion about parentheses. I was shocked: he wasn’t bothered by them at all. He immediately got both answers, and was very proud of himself.

Total time: less than 10 minutes, probably closer to 5. Which is good, because he’s got a short attention span and wanted to play some Nintendo with me too. Overall, I was very happy with how many concepts he was able to absorb.

(Thanks to my `~/.ghc/ghci_history` file for reminding me what we covered today.)

# Ignoring HLint

Summary: HLint now has more ways to ignore hints you don't like.

HLint makes suggestions about how to improve your Haskell code. But not everyone likes all the suggestions all the time, so HLint comes with ways of ignoring those annoying hints, and HLint 2.1.11 provides even more mechanisms. Without further ado, let's take a quick tour - full details are in the HLint README.

Method 1: the `--default` flag

To ignore all hints your code currently generates run `hlint` as normal, but passing the `--default` flag, which will generate a config file with all hints that fire set to ignore. Typically, when approaching a new code base to run HLint on, I start by doing:

``hlint . --default > .hlint.yaml``

After that, it's easy to remove ignored hints from `.hlint.yaml` one by one and fix the code.

Method 2: Add `-ignore` directives

In the `.hlint.yaml` file you can write:

``- ignore: {name: Eta reduce}``

This directive ignores the named hint, and is what `--default` generates. There are also more refined ways of ignoring a hint in certain modules, or ignoring all hints in certain modules (see the README).

Method 3: Add a `{- comment -}`

Method 3 actually has 3 sub-methods, you can write any of:

• `{-# ANN module "HLint: ignore Eta reduce" #-}`
• `{-# HLINT ignore "Eta reduce" #-}`
• `{- HLINT ignore "Eta reduce" -}`

For `ANN` pragmas it is important to put them after any `import` statements. If you have the `OverloadedStrings` extension enabled you will need to give an explicit type to the annotation, e.g. `{-# ANN module ("HLint: ignore Eta reduce" :: String) #-}`. The `ANN` pragmas can also increase compile times or cause more recompilation than otherwise required, since they are evaluated by `TemplateHaskell`.

For `{-# HLINT #-}` pragmas GHC may give a warning about an unrecognised pragma, which can be supressed with `-Wno-unrecognised-pragmas`.

For `{- HLINT -}` comments they are likely to be treated as comments in syntax highlighting, which can lead to them being overlooked.

My current preference is `{- HLINT -}`, but I think GHC should just special case `{-# HLINT #-}` and then in a few GHC releases we could use that. Unfortunately, other people disagree with me, so `{- HLINT -}` is the best we have.

Method 4: Using the C Pre Processor

`hlint` defines the `__HLINT__` preprocessor definition (with value `1`), so problematic definitions (including those that don't parse) can be hidden with:

``#ifndef __HLINT__foo = ( -- HLint would fail to parse this#endif``

# Dhall - Year in review (2018-2019)

dhall-2018

The Dhall configuration language is now two years old and this post will review progress in 2018 and the future direction of the language in 2019.

If you’re not familiar with Dhall, you might want to visit the official website for the language, which is the recommended starting point. This post assumes familiarity with the language.

Also, I want to use this post to advertise a short survey that you can take if you are interested in the language and would like to provide feedback:

## Progress in 2018

This section will review the highlights of what we accomplished over the last year. These highlights are not exhaustive and I focus on improvements that might encourage people to revisit the language if they were on the fence a year ago.

If you’re already familiar with recent progress in the language and you are more interested in where the language is going then you can jump to the Future direction section.

#### New language bindings

Several contributors stepped up to the plate to begin three new actively-maintained language bindings to Dhall.

Of these three the Clojure bindings are the ones closest to completion:

• Mainly missing:

• HTTP imports (coming soon in the 0.3 release)

The Clojure bindings are sufficiently close to completion that they currently get an official vote on proposed changes to the language standard, giving them an equal voice in the language evolution process.

This is a complete reimplementation of the language entirely in Clojure that allows you to marshal Dhall expressions, including Dhall functions, directly into Clojure:

``(require '[dhall-clj.core :refer [input input-ast]]);; We can run compile and run Dhall expression in Clojure with the `input`;; function.;;;; Note that the result of the evaluation is a Clojure value(input "True && False");; => false;; We can even import functions from Dhall..;; (the following compiles a Dhall function into a Clojure function)(def build-info (input "λ(major : Natural) → { version = \"\${Natural/show major}.0\" }"));; ..and run them in Clojure!(build-info 1);; => {"version" "1.0"}``

The Clojure bindings pave the way for making the Dhall configuration language a first class citizen on the JVM.

Additionally, two other language bindings have gotten pretty far along:

• Mainly missing:

• Import resolution
• Mainly missing:

• Import resolution

These latter two language bindings haven’t announced yet as they are works in progress, but I still wanted to recognize their work so far.

Also, I want to mention that adding a conformance test suite (thanks to Fabrizio Ferrai) helped drive parallel implementations by providing implementors with a tangible measure of progress towards the goal of 100% standard coverage.

Thanks to the work of Oliver Charles you can generate `.cabal` files with Dhall by using `dhall-to-cabal`.

This part of the project’s `README` sold me on the motivation for doing so:

We can go beyond Cabal files. If Cabal is a domain specific language for building Haskell projects, what does a domain specific language for building Haskell web applications look like? Does the separate of library, executable, and test-suite make sense here? Maybe we’d rather:

``servant-project {  api-route = "My.API.Route"  server = "My.API.Server"  models = [ "My.API.Pancake", "My.API.Waffle" ]}``

… and have this take care of some other details.

When you think about this it makes perfect sense: Haskell programmers use Cabal/Hackage to package and distribute Haskell code, but then what do they use to package and distribute “Cabal code”? The answer is a language like Dhall that builds in its own code distribution mechanism instead of relying on a separate build tool. This closes the loop so that you don’t need to maintain a growing tower of build tools as your project expands.

I’m pretty sure Cabal was the first “heavy duty” configuration format tested with Dhall because this project prompted the first swell of feature requests related to interpreter performance and usability improvements for working with giant schemas.

Also, the `dhall-to-cabal` project includes the entire Cabal schema encoded as a Dhall type, which you can find here:

This comes in handy when you want a systematic listing of all Cabal configuration features. If you have the `dhall` interpreter installed you can also view the normal form of the schema in all its glory by running:

``\$ dhall <<< 'https://gist.githubusercontent.com/Gabriel439/81855ec23c679e23e556a3cdad7757cd/raw/ca49082a32fc976b9ce2d23134240568d3534fc0/package.dhall'``

You can also migrate an existing project using `cabal-to-dhall`, a tool which converts a `.cabal` file to the equivalent `.dhall` file.

#### Eta

Javier Neira with the support of TypeLead added Dhall as a supported file format for configuring Eta packages by building on top of the `dhall-to-cabal` project.

That project has also produced work-in-progress Eta and Java bindings to Dhall bindings along the way by using Eta to compile the Haskell implementation of Dhall to the JVM. When those are complete you will have yet another option for using the Dhall configuration language on the JVM

If you are interested, you can follow the progress on those bindings via this GitHub issue:

#### Kubernetes

Dhall is commonly used for ops, and the first project to systematically integrate Dhall into a widely used ops tool is the `dhall-kubernetes` project, thanks to the work of Arian van Putten, Fabrizio Ferrai, and Thomas Scholtes.

I’ve never used Kubernetes, but everybody tells me that Kubernetes configurations are large, repetitive, and error-prone YAML files, which are the perfect use case for Dhall.

#### PureScript - Spago

The Spago project builds on top of `psc-packages` to assemble a PureScript package set to build using Dhall as the configuration format.

This tool takes advantage of Dhall’s import system so that the package set can be split over multiple files, which you can see in the top-level package set here:

… and users can easily import that and easily override packages locally without dealing with the headache of rebasing their local changes whenever the upstream package set changes.

#### Complete language standard

Last year I promised to upstream all features from the Haskell implementation into the language standard, since at the time a few import-related features were implementation-defined. This was a top priority because I didn’t want to treat other language bindings as second-class citizens.

This year we successfully standardized everything, meaning that there should no longer be any implementation-defined features. Additionally, all new functionality now begins with a change to the standard followed by a change to each implementation of the language, meaning that the Haskell implementation is no longer treated as a distinguished implementation.

#### Unsigned `Natural` literals

Earlier this year Greg Pfeil proposed to fix the obligatory `+` sign preceding `Natural` number literals, which bothered a lot of newcomers to the language. He proposed require a leading `+` for non-negative `Integer`s instead of `Natural` numbers.

We knew this would be a highly breaking change, but we were all tired of the `+` signs which littered our code. The `Natural` type is much more natural to use (pun intended) than the `Integer` type, so why not optimize the syntax for `Natural` numbers?

So we made the change and now instead of writing an expression like:

``+2 + +2``

``2 + 2``

This change also improved code comprehension, because before this change an expression like this:

``  λ(a : Type)→ λ(f : Natural → a)→ [ f +0, f +1, f +2, f +3, f +4 ]``

… could be misconstrued as adding `f` to various numbers, but after this change:

``  λ(a : Type)→ λ(f : Natural → a)→ [ f 0, f 1, f 2, f 3, f 4 ]``

… the reader can more easily discern that `f` is being applied as a function to numeric arguments.

#### Type synonyms

Previously, users couldn’t create new types using `let` expressions and had to work around this limitation by using the import system to reuse types, like this:

``\$ cat ./Point2D.dhall{ x : Double, y : Double }\$ cat ./Point3D.dhall./Point2D.dhall ⩓ { z : Double }\$ cat ./project.dhalllet project : ./Point3D.dhall → ./Point2D.dhall      = λ(p : ./Point3D.dhall) → p.{ x, y }in  project``

Now users can define new types inline within the same file using an ordinary `let` expression:

``let Point2D = { x : Double, y : Double }  let Point3D = Point2D ⩓ { z : Double }let project : Point3D → Point2D      = λ(p : Point3D) → p.{ x, y }in  project``

#### Simpler `Optional` literals

`Optional` literals used to resemble lists, like this:

``[ 1 ] : Optional Natural[] : Optional Natural``

Now you can use `Some` and `None` instead, like this:

``Some 1None Natural``

In particular, a present `Optional` literal no longer requires a type since `Some` can infer the type from the provided argument. This simplifies the common idiom of overriding an absent `Optional` value within a record of defaults:

``let defaults = { age = None Natural, name = None Text }in  defaults ⫽ { name = Some "Amy" }``

The old list-like syntax is still supported but is deprecated and will be dropped. `dhall lint` will also automatically replace the old list-like `Optional` literals with their new `Some`/`None` equivalents.

#### Union constructors

Unions used to be one of the major pain points when using the language, due to having to specify all alternatives for a union literal, like this:

``[ < Left = 0 | Right : Bool >, < Right = True | Left : Natural > ]``

My first attempt to improve this introduced the `constructors` keyword which took a union type as an argument and returned a record of functions to create each constructor. This changed the above code example to:

``let Either = < Left : Natural | Right : Bool >let either = constructors Eitherin  [ either.Left 0, either.Right True ]``

However, this initial solution introduced two new problem:

• A lot of `constructors`-related boilerplate at the beginning of Dhall files
• Performance issues due to these large intermediate records of constructors

A follow-up change resolved both issues by overloading the `.` operator to also access constructor functions directly from a union type (as if it were already a record), like this:

``let Either = < Left : Natural | Right : Bool >in  [ Either.Left 0, Either.Right True ]``

This solved both the performance issue (by eliminating the need for an intermediate record of constructors) and eliminated the `constructors` keyword boilerplate. Also, this simplification plays nicely with the auto-complete support provided by the `dhall repl` since you can now auto-complete constructors using the `.` operator.

#### Import caching

In 2017 Dhall added support for semantic integrity checks, where you tag an import with a SHA256 hash of a standard binary encoding of an expression’s normal form. This integrity check protects against tampering by rejecting any expression with a different normal form, guaranteeing that the import would never change.

Several astute users pointed out that you could locally cache any import protected by such a check indefinitely. Even better, the SHA256 hash makes for a natural lookup key within that cache.

We standardized and implemented exactly that idea and now any import protected by an integrity check is permanently cached locally using the standard directory prescribed by the XDG Base Directory Specification.

For example, you can now import the entire Prelude as a package protected by an integrity check:

``https://prelude.dhall-lang.org/package.dhall  sha256:26e13b153cb428366610110d4d8f0c135e22b20179d5478bb16b1b83b3f2ca13``

The first time you resolve the Prelude the import may take a bit (~7 seconds on my machine) to fetch the entire package, but the normal form is then stored locally in a 5 KB file:

``\$ wc -c ~/.cache/dhall/534e4a9e687ba74bfac71b30fc27aa269c0465087ef79bf483e876781602a454 5133 …``

… and then subsequent attempts to import the same Prelude resolve much more quickly (~80 ms on my machine).

This means that you can now cheaply import the entire Prelude in every file instead of separately importing each function that you use.

#### Alternative imports

The language now provides support for fallback imports using the `?` operator if import resolution fails.

For example, you can use this feature to import an environment variable if present but gracefully fallback to another value if absent:

``\$ dhall <<< 'Some (env:HOME as Text) ? None Text'Some "/Users/gabriel"\$ dhall <<< 'Some (env:FOO as Text) ? None Text'None Text``

Or you can use the `?` operator to provide alternative locations for obtaining an imported expression:

``  -- Let the user override the Prelude via an environment variable if they want  env:Prelude  -- Next, try to obtain the Prelude from a local installation if present? /usr/local/share/dhall/Prelude/package.dhall  -- Fall back to importing remotely if the neither of the above imports succeed? https://prelude.dhall-lang.org/package.dhall  {- Fall back to importing directly from GitHub if the reverse proxy at     `prelude.dhall-lang.org` is not working  -}? https://github.com/dhall-lang/dhall-lang/blob/master/Prelude/package.dhall``

#### Multi-`let` expressions

You can now define multiple values within a single `let` expression instead of nesting `let` expressions. In other words, instead of this:

``    let job = { department = "Data Platform", title = "Software Engineer" }in  let john = { age = 23, name = "John Doe", position = job }in  let alice = { age = 24, name = "Alice Smith", position = job }in  [ john, alice ]``

… you can now write this:

``let job = { department = "Data Platform", title = "Software Engineer" }let john = { age = 23, name = "John Doe", position = job }let alice = { age = 24, name = "Alice Smith", position = job }in  [ john, alice ]``

`dhall lint` will also automatically simplify any code using the old nested `let` style to use the new “multi-`let`” style.

The Haskell implementation of Dhall strives to be like the “Bash of typed functional programming”, but in order to do so the implementation needs to small, statically linked, and portable so that sysadmins don’t object to widely installing Dhall. In fact, if the executable satisfies those criteria then you don’t even need your sysadmin’s permission to try Dhall out within your own workspace.

Niklas Hambüchen made this possible through this through his general-purpose work on fully static Haskell executables built using Nix. Now Dhall’s continuous integration system produces small (< 3 MB) Linux executables that have no dependency footprint whatsoever.

#### Major performance improvements

The Haskell implementation of Dhall has made dramatic strides in performance improvements over the last year, motivated by projects with very large schemas, such as:

… as well as Formation’s internal use of Dhall which has led to them upstreaming many performance improvements to handle large Dhall programs.

Thanks to the work of Fintan Halpenny, Greg Pfeil, `@quasicomputational`, and others the Haskell implementation is between 1 to 3 orders of magnitude faster than it was a year ago, depending on the configuration file that you benchmark.

We’re also not done improving performance! We continue to improve as new projects continue to stretch the boundaries of what the language can do.

#### Type diffs

Large projects like these also led to usability improvements when working with gigantic types. The Haskell implementation now displays concise “type diffs” whenever you get a type mismatch so that you can quickly narrow down the problem no matter how much your configuration schema grows. This works no matter how deeply nested the error is.

For example, the following contrived example introduces four deeply nested errors in a gigantic schema (where the type is over 6000 lines long) and the error message still zeroes in on every error:

``\$ dhall <<< '../dhall-to-cabal/dhall-to-cabal.dhall : ./type.dhall'Use "dhall --explain" for detailed errorsError: Expression doesn't match annotation{ - license2 : …, + license : …,   library : …              (   ∀(… : { arch :   ∀(… : < S390 : - Bool                                                  + {}                                         | …                                         >                                   )                                 → …                        , …                        }                  )                → { build-tools : …                                  { - version2 : …                                  , + version : …                                  , …                                  }                  , default-extensions : …                                         < - NamedWildCards2 : …                                         | - UnboxedSums : …                                         | + DataKinds : …                                         | + NamedWildCards : …                                         | …                                         >                  , …                  }              ), …}``

#### `dhall repl`

The Haskell implementation also added a REPL contributed by Oliver Charles that you can use to interactively interpret Dhall code, including sophisticated auto-completion support contributed by Basile Henry:

The REPL comes in handy when exploring large values or types, as illustrated by the `dhall-nethack` tutorial which uses the REPL:

#### `dhall lint`

The Haskell implementation also provides a useful `dhall lint` subcommand that you can use to not only format code but to also automatically improve the code in non-controversial ways.

For example, `dhall lint` will automatically remove unused `let` bindings and will simplify nested `let` expressions to instead take advantage of the newest multi-`let` feature.

#### `dhall resolve --dot`

Basile Henry also contributed support for visualizing the dependency tree of a Dhall expression like this:

The following tweet illustrates how to use this feature along with example output:

#### `dhall freeze`

Thanks to Tobias Pflug you can also automatically take advantage of Dhall’s semantic integrity checks using the `dhall freeze` subcommand. This command fetches all imports within a Dhall expression and then automatically tags all of them with semantic integrity checks.

For example:

``\$ dhall format < ./example.dhall let replicate = https://prelude.dhall-lang.org/List/replicatein  replicate 10 Text "!"\$ dhall freeze < ./example.dhall let replicate =      https://prelude.dhall-lang.org/List/replicate sha256:69c3f2b38ab6829b056d82e7976cecbee068fe5aec55990fd27ae8eeefe57341in  replicate 10 Text "!"``

#### `dhall-lang.org`

A while ago, Neuman Vong advised me that if you want your open source project to take off, you need a logo, a website, and a live demo in the browser.

So I took that advice to heart and now Dhall has all three! You can try out the language live in your browser by visiting:

This allows people to “try before they buy” and the site links to several other useful resources, such as the …

#### Dhall wiki

The Dhall wiki contains several useful educational resources for learning the language. The organization of the wiki closely follows the guidelines from this handy post on writing documentation:

The main thing that is missing is to migrate the Haskell tutorial into a language-agnostic tutorial.

You can also now follow the official Twitter account for the language:

This account regularly posts news and tips about the language and ecosystem that you can use to stay abreast of recent progress.

#### Switch from IPFS to GitHub

Early on in the language history we used IPFS to distribute the Dhall Prelude, but due to reliability issues we’ve switched to using GitHub for hosting Dhall code.

There’s even a convenient link you can use to browse the Prelude:

## Future direction

Phew! That was a lot to recap and I’m grateful to all the contributors who made that possible. Now we can review where the language is going.

First, I’m no longer benevolent dictator-for-life of the language. Each new reimplementation of the language gets a vote on the language standard and now that the Clojure implementation of Dhall is essentially complete they get an equal say on the evolution of the language. Similarly, once the PureScript bindings and Python bindings are close to complete they will also get a vote on the language standard, too.

However, I can still use this post to outline my opinion of where the language should go.

#### Crossing the Chasm

A colleague introduced me to the book Crossing the Chasm, which heavily influenced my approach to designing and marketing the language. The book was originally written for startups trying to gain mainstream adoption, but the book also strongly resonated with my experience doing open source evangelism (first for Haskell, and now Dhall).

The book explains that you need to first build a best-in-class solution for a narrowly-defined market. This in turn requires that you think carefully about what market you are trying to address and strategically allocate your limited resources to address that market.

So what “market” should Dhall try to address?

#### YAML

One of the clearest signals I’ve gotten from users is that Dhall is “the YAML killer”, for the following reasons:

• Dhall solves many of the problems that pervade enterprise YAML configuration, including excessive repetition and templating errors

• Dhall still provides many of the good parts of YAML, such as multi-line strings and comments, except with a sane standard

• Dhall can be converted to YAML using a tiny statically linked executable, which provides a smooth migration path for “brownfield” deployments

Does that mean that Dhall is clearly the best-in-class solution for people currently using YAML?

Not quite. The key thing Dhall is missing for feature parity with YAML is a wide array of native language bindings for interpreting Dhall configuration files. Many people would prefer to use Dhall without having to invoke an external executable to convert their Dhall configuration file to YAML.

This is one of the reasons I’ve slowed down the rate of evolution of the standard so that many of the new language bindings have an opportunity to implement the full standard. Also, I expect that once more language bindings have votes on the standard evolution that will further stabilize the language since new features proposals will have a higher bar to clear.

That’s not to say that we will freeze the language, but instead we will focus on strategically spending our “complexity budget” on features that help displace YAML. If we spend our complexity budget on unrelated features then we will increase the difficulty of porting Dhall to new languages without addressing the initial use case that will help Dhall gain mainstream traction.

#### JSON integration

One of YAML’s features is that all JSON is also valid YAML, by definition. In fact, some people use YAML just for the fact that it supports both JSON and comments.

This suggests that Dhall, like YAML, should also natively support JSON in some way. Dhall’s issue tracker contains a few issues along these lines and the one I would most like to see completed this year is adding support for importing JSON files as Dhall expressions:

#### Editor support

Another thing Dhall is missing compared to YAML is widespread editor support. This is why another one of my goals for this year is to create a Dhall language server so that any editor that supports the language server protocol (basically all of them) would get Dhall support for free.

#### Ops

We can actually narrow down Dhall’s “market” further if we really want to be selective about what we work on. Dhall has also grown in popularity for simplifying ops-related configurations, providing several features that ops engineers care about:

• Strong normalization

Ops commonly suffers from the dilemma that too much repetition is error prone, but too much abstraction is also error prone if readers of the code can’t effectively audit what is going on. One of Dhall’s unique features is that all code is strongly normalizing, meaning that every expression can be reduced to an abstraction-free normal form. This is made possible by the fact that Dhall is not Turing-complete (another feature favored by Ops).

• Absolute type safety

Ops engineers care about reliability since they maintain the software that the rest of their company relies on and any outages can have devastating effects on both the product and the productivity of other engineers.

This is one of the reason for the Ops flight from Turing-complete languages to inert configuration files like JSON/YAML because Turing-complete languages give you the tools to shoot yourself in the foot. However, Dhall strikes a balance between being programmable while still not being Turing-complete and having a type system with no escape hatches, so you’re incapable of shooting yourself in the foot.

• Built-in support for importing code

Another reason that Ops people hate programmable configuration files is that the programming language they pick typically comes with an external build tool for the language that adds one more layer to the tower of build tools that they have to maintain. Now they’ve just replaced one problem (a repetitive configuration file for their infrastructure) which a new problem (a repetitive configuration file for the build tool for the programming language they used to reduce the original repetition).

Dhall solves this problem well by providing built-in language support for importing other code (similar to Bash and Nix, both also heavily used for Ops use cases). This means that Dhall provides a solid foundaton for their tower of automation because they don’t need to introduce another tool to support a growing Dhall codebase.

• Dhall displaces YAML well

YAML configuration files are incredibly common in Ops and “infrastructure as code”. Example tools that use a YAML configuration are:

• Kubernetes
• Docker Compose
• Concourse
• Ansible
• Travis

YAML is so common that Ops engineers sometimes half-jokingly refer to themselves as “YAML engineers”.

As already mentioned above, Dhall provides a sane alternative to YAML.

We’ve already seen one Dhall integration for an Ops tool emerge last year with the `dhall-kubernetes` project and this year I hope we continue along those lines and add at least one more Ops-related integration.

I think the next promising integration is the `dhall-terraform` project which is still a work in progress that would benefit from contributions.

#### Funding

Finally, I would like to experiment with various ways to fund open source work on Dhall now that the language has a growing userbase. In particular, I’d like to fund:

• packaging Dhall for various software distributions (i.e. `.rpm`/`.deb`)